Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
354 result(s) for "Numerical analysis in abstract spaces"
Stability selection
2010
Estimation of structure, such as in variable selection, graphical modelling or cluster analysis, is notoriously difficult, especially for high-dimensional data. We introduce stability selection, which is based on subsampling in combination with (high-dimensional) selection algorithms. As such, the method is extremely general and has a very wide range of applicability. Stability selection provides finite-sample control of some error rates of false discoveries and hence a transparent principle for choosing a proper amount of regularization for structure estimation. Variable selection and structure estimation improve markedly for a range of selection methods if stability selection is applied. We prove for the randomized lasso that stability selection will be variable-selection consistent even if the necessary conditions for consistency of the original lasso method are violated. We demonstrate stability selection for variable selection and Gaussian graphical modelling, using real and simulated data.
Journal Article
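The abstract above outlines a concrete recipe: run a selection algorithm on many random subsamples and keep the variables chosen with high frequency. A minimal sketch in Python, using the lasso as the base selector; the subsample count, regularization level `alpha`, and threshold `pi_thr` are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_subsamples=100, pi_thr=0.6, seed=None):
    """Selection frequencies of lasso-selected variables over random subsamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)   # subsample half the data
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        counts += coef != 0                               # record selected variables
    freq = counts / n_subsamples                          # selection frequencies
    return np.flatnonzero(freq >= pi_thr)                 # the "stable" set
```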
Anderson Acceleration for Fixed-Point Iterations
2011
This paper concerns an acceleration method for fixed-point iterations that originated in work of D. G. Anderson [J. Assoc. Comput. Mach., 12 (1965), pp. 547–560], which we accordingly call Anderson acceleration here. This method has enjoyed considerable success and wide usage in electronic structure computations, where it is known as Anderson mixing; however, it seems to have been untried or underexploited in many other important applications. Moreover, while other acceleration methods have been extensively studied by the mathematics and numerical analysis communities, this method has received relatively little attention from these communities over the years. A recent paper by H. Fang and Y. Saad [Numer. Linear Algebra Appl., 16 (2009), pp. 197–221] has clarified a remarkable relationship of Anderson acceleration to quasi-Newton (secant updating) methods and extended it to define a broader Anderson family of acceleration methods. In this paper, our goals are to shed additional light on Anderson acceleration and to draw further attention to its usefulness as a general tool. We first show that, on linear problems, Anderson acceleration without truncation is "essentially equivalent" in a certain sense to the generalized minimal residual (GMRES) method. We also show that the Type 1 variant in the Fang-Saad Anderson family is similarly essentially equivalent to the Arnoldi (full orthogonalization) method. We then discuss practical considerations for implementing Anderson acceleration and illustrate its performance through numerical experiments involving a variety of applications.
Journal Article
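The scheme described above is easy to state concretely: keep a short window of recent iterates, solve a small least-squares problem over residual differences, and mix the fixed-point updates accordingly. A minimal sketch of that common residual-difference formulation; the window size `m` and iteration count are illustrative assumptions.

```python
import numpy as np

def anderson(g, x0, m=5, n_iter=50):
    """Anderson acceleration of the fixed-point iteration x = g(x)."""
    x = np.asarray(x0, dtype=float)
    G, F = [], []                       # histories of g(x_k) and residuals f_k
    for _ in range(n_iter):
        gx = g(x)
        f = gx - x
        G.append(gx)
        F.append(f)
        if len(F) > m + 1:              # keep only a window of m differences
            G.pop(0)
            F.pop(0)
        if len(F) == 1:
            x = gx                      # plain fixed-point step at the start
            continue
        dF = np.column_stack([F[i + 1] - F[i] for i in range(len(F) - 1)])
        dG = np.column_stack([G[i + 1] - G[i] for i in range(len(G) - 1)])
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)  # least-squares mixing weights
        x = gx - dG @ gamma
    return x
```

For example, `anderson(np.cos, np.array([1.0]))` converges to the fixed point of cos(x) = x much faster than the plain iteration.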
Estimation of (Near) Low-Rank Matrices with Noise and High-Dimensional Scaling
2011
We study an instance of high-dimensional inference in which the goal is to estimate a matrix Θ* ∈ ℝ^{m₁×m₂} on the basis of N noisy observations. The unknown matrix Θ* is assumed to be either exactly low-rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider a standard M-estimator based on regularization by the nuclear or trace norm over matrices, and analyze its performance under high-dimensional scaling. We define the notion of restricted strong convexity (RSC) for the loss function, and use it to derive nonasymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate consequences of this general theory for a number of specific matrix models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. These results involve nonasymptotic random matrix theory to establish that the RSC condition holds, and to determine an appropriate choice of regularization parameter. Simulation results show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
Journal Article
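Nuclear-norm regularization as described above has a well-known computational core: soft-thresholding of singular values. A minimal sketch, assuming the simplest observation model (a fully observed noisy matrix), for which the thresholded SVD solves the regularized problem exactly; the paper's more general observation models would instead call for an iterative proximal scheme, and the penalty level `lam` is an illustrative assumption.

```python
import numpy as np

def svt(M, tau):
    """Soft-threshold the singular values of M at level tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_estimate(Y, lam):
    """Exact solution of min_Theta 0.5*||Y - Theta||_F^2 + lam*||Theta||_*
    in the fully observed (denoising) special case."""
    return svt(Y, lam)
```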
The Benefit of Group Sparsity
2010
This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly group-sparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are confirmed by simulation studies.
Journal Article
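To see what group-sparse regularization does computationally: the group lasso penalty is handled by a block soft-thresholding proximal step that zeroes out entire groups at once. A minimal proximal-gradient sketch; the groups (index arrays assumed to partition the columns of X), penalty `lam`, and iteration count are illustrative assumptions.

```python
import numpy as np

def block_soft_threshold(b, tau):
    """Shrink the group b toward zero; zero it out entirely if its norm <= tau."""
    nrm = np.linalg.norm(b)
    return np.zeros_like(b) if nrm <= tau else (1.0 - tau / nrm) * b

def group_lasso(X, y, groups, lam=0.1, n_iter=500):
    """Proximal gradient for 0.5*||y - X beta||^2 + lam * sum_g ||beta_g||_2;
    groups is a list of index arrays partitioning the columns of X."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2          # 1/L for the quadratic loss
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y))    # gradient step
        for g in groups:                            # prox is separable across groups
            beta[g] = block_soft_threshold(z[g], step * lam)
    return beta
```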
Bayesian Inverse Problems with Gaussian Priors
2011
The posterior distribution in a nonparametric inverse problem is shown to contract to the true parameter at a rate that depends on the smoothness of the parameter, and the smoothness and scale of the prior. Correct combinations of these characteristics lead to the minimax rate. The frequentist coverage of credible sets is shown to depend on the combination of prior and true parameter, with smoother priors leading to zero coverage and rougher priors to conservative coverage. In the latter case credible sets are of the correct order of magnitude. The results are numerically illustrated by the problem of recovering a function from observation of a noisy version of its primitive.
Journal Article
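The setting above has a clean conjugate form once the problem is diagonalized by the SVD of the forward operator. A minimal sketch of the coordinatewise Gaussian posterior in that sequence-space form; the singular-value decay κ_i = 1/i (a stand-in for a mildly ill-posed operator such as the primitive map) and the prior smoothness `alpha` are illustrative assumptions.

```python
import numpy as np

def gaussian_posterior(Y, n, alpha=1.0):
    """Coordinatewise posterior for Y_i ~ N(kappa_i * theta_i, 1/n)
    under the prior theta_i ~ N(0, i^(-1-2*alpha))."""
    i = np.arange(1, len(Y) + 1)
    kappa = 1.0 / i                        # singular values of the forward operator
    lam = i ** (-1.0 - 2.0 * alpha)        # prior variances (smoothness alpha)
    post_var = lam / (1.0 + n * kappa**2 * lam)
    post_mean = n * kappa * lam * Y / (1.0 + n * kappa**2 * lam)
    return post_mean, post_var
```

Larger `alpha` (a smoother prior) shrinks the posterior harder, which is the mechanism behind the coverage behavior described in the abstract.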
Support Union Recovery in High-Dimensional Multivariate Regression
2011
In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B* ∈ ℝ^{p×K} of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the 𝓁₁/𝓁₂ norm is used for support union recovery, or recovery of the set of s rows for which B* is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter θ(n, p, s) := n/[2ψ(B*) log(p − s)]. Here n is the sample size, and ψ(B*) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θ_u, and fails for sequences such that θ(n, p, s) lies below a critical level θ_𝓁. For the special case of the standard Gaussian ensemble, we show that θ_𝓁 = θ_u, so that the characterization is sharp. The sparsity-overlap function ψ(B*) reveals that, if the design is uncorrelated on the active rows, 𝓁₁/𝓁₂ regularization for multivariate regression never harms performance relative to an ordinary Lasso approach and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
Journal Article
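The 𝓁₁/𝓁₂ block regularization above acts on whole rows of the coefficient matrix, which is what makes it recover the support union. A minimal sketch of the corresponding row-wise proximal operator; the threshold `tau` is an illustrative assumption and the small epsilon guards against division by zero.

```python
import numpy as np

def row_soft_threshold(B, tau):
    """l1/l2 prox on a coefficient matrix: each row is jointly shrunk,
    so a row is either zeroed out entirely or kept with reduced norm."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * B
```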
BART: Bayesian Additive Regression Trees
by George, Edward I., Chipman, Hugh A., McCulloch, Robert E.
in Bayesian backfitting, boosting, burn-in
2010
We develop a Bayesian "sum-of-trees" model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART's many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
Journal Article
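To make the sum-of-trees idea concrete, here is a minimal sketch of the backfitting structure only: each tree is cyclically refit to the partial residual left by the others. This is a deterministic simplification for illustration; BART itself draws each tree from a posterior via MCMC under a regularization prior rather than refitting greedily, and every parameter here is an assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sum_of_trees_backfit(X, y, m=50, depth=2, n_sweeps=10):
    """Greedy backfitting of m shallow trees to y (not BART's MCMC)."""
    trees = [DecisionTreeRegressor(max_depth=depth).fit(X, y / m) for _ in range(m)]
    preds = np.column_stack([t.predict(X) for t in trees])
    for _ in range(n_sweeps):
        for j, t in enumerate(trees):
            partial_resid = y - preds.sum(axis=1) + preds[:, j]
            t.fit(X, partial_resid)         # refit tree j to its partial residual
            preds[:, j] = t.predict(X)
    return trees
```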
The Mathematics of Atmospheric Dispersion Modeling
The Gaussian plume model is a standard approach for studying the transport of airborne contaminants due to turbulent diffusion and advection by the wind. This paper reviews the assumptions underlying the model, its derivation from the advection-diffusion equation, and the key properties of the plume solution. The results are then applied to solving an inverse problem in which emission source rates are determined from a given set of ground-level contaminant measurements. This source identification problem can be formulated as an overdetermined linear system of equations that is most easily solved using the method of least squares. Various generalizations of this problem are discussed, and we illustrate our results with an application to the study of zinc emissions from a smelting operation.
Journal Article
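The source-identification step described above reduces to ordinary least squares. A minimal sketch, where `A` stands in for the plume (forward) matrix mapping k source emission rates to m ground-level measurements `c`; both are placeholders for model output and data.

```python
import numpy as np

def estimate_source_rates(A, c):
    """Least-squares emission rates q for the overdetermined system A q ~= c,
    with A of shape (m, k), m >= k."""
    q, residuals, rank, _ = np.linalg.lstsq(A, c, rcond=None)
    return q
```

Since emission rates are physically non-negative, a natural refinement is a non-negative least-squares solver such as scipy.optimize.nnls.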
Optimal Selection of Reduced Rank Estimators of High-Dimensional Matrices
2011
We introduce a new criterion, the Rank Selection Criterion (RSC), for selecting the optimal reduced rank estimator of the coefficient matrix in multivariate response regression models. The corresponding RSC estimator minimizes the Frobenius norm of the fit plus a regularization term proportional to the number of parameters in the reduced rank model. The rank of the RSC estimator provides a consistent estimator of the rank of the coefficient matrix; in general, the rank of our estimator is a consistent estimate of the effective rank, which we define to be the number of singular values of the target matrix that are appropriately large. The consistency results are valid not only in the classic asymptotic regime, when n, the number of responses, and p, the number of predictors, stay bounded, and m, the number of observations, grows, but also when either, or both, n and p grow, possibly much faster than m. We establish minimax optimal bounds on the mean squared errors of our estimators. Our finite sample performance bounds for the RSC estimator show that it achieves the optimal balance between the approximation error and the penalty term. Furthermore, our procedure has very low computational complexity, linear in the number of candidate models, making it particularly appealing for large scale problems. We contrast our estimator with the nuclear norm penalized least squares (NNP) estimator, which has an inherently higher computational complexity than RSC, for multivariate regression models. We show that NNP has estimation properties similar to those of RSC, albeit under stronger conditions. However, it is not as parsimonious as RSC. We offer a simple correction of the NNP estimator which leads to consistent rank estimation. We verify and illustrate our theoretical findings via an extensive simulation study.
Journal Article
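The rank-selection idea above can be sketched directly: truncate the SVD of the least-squares fitted values at each candidate rank and keep the rank minimizing Frobenius fit plus a penalty. The penalty form `mu * k * (p + q)` below is an illustrative stand-in for "proportional to the number of parameters in the reduced rank model", and `mu` is an assumed tuning constant.

```python
import numpy as np

def rank_select(X, Y, mu):
    """Pick the rank minimizing ||Y - fit_k||_F^2 + mu * k * (p + q),
    where fit_k is the rank-k truncation of the OLS fitted values."""
    p, q = X.shape[1], Y.shape[1]
    fitted = X @ np.linalg.lstsq(X, Y, rcond=None)[0]    # OLS fitted values
    U, s, Vt = np.linalg.svd(fitted, full_matrices=False)
    best_k, best_crit = 0, np.linalg.norm(Y) ** 2        # k = 0 baseline
    for k in range(1, len(s) + 1):
        fit_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]       # rank-k approximation
        crit = np.linalg.norm(Y - fit_k) ** 2 + mu * k * (p + q)
        if crit < best_crit:
            best_k, best_crit = k, crit
    return best_k
```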
A Fast Randomized Algorithm for Computing a Hierarchically Semiseparable Representation of a Matrix
2011
Randomized sampling has recently been proven a highly efficient technique for computing approximate factorizations of matrices that have low numerical rank. This paper describes an extension of such techniques to a wider class of matrices that are not themselves rank-deficient but have off-diagonal blocks that are; specifically, the class of so-called hierarchically semiseparable (HSS) matrices. HSS matrices arise frequently in numerical analysis and signal processing, particularly in the construction of fast methods for solving differential and integral equations numerically. The HSS structure allows algebraic operations (matrix-vector multiplications, matrix factorizations, matrix inversion, etc.) to be performed very rapidly, but only once the HSS representation of the matrix has been constructed. How to rapidly compute this representation in the first place is much less well understood. The present paper demonstrates that if an $N\times N$ matrix can be applied to a vector in $O(N)$ time, and if individual entries of the matrix can be computed rapidly, then provided that an HSS representation of the matrix exists, it can be constructed in $O(N\,k^{2})$ operations, where $k$ is an upper bound for the numerical rank of the off-diagonal blocks. The point is that when legacy codes (based on, e.g., the fast multipole method) can be used for the fast matrix-vector multiply, the proposed algorithm can be used to obtain the HSS representation of the matrix, and then well-established techniques for HSS matrices can be used to invert or factor the matrix.
Journal Article
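The randomized sampling primitive underlying the construction above is the randomized range finder: apply the matrix to a few random vectors and orthonormalize the result to capture a low-rank (off-diagonal) block. A minimal sketch; `apply_A` stands in for the fast black-box matrix-vector multiply the paper assumes, and the oversampling parameter `p` is an illustrative choice.

```python
import numpy as np

def randomized_range(apply_A, n, k, p=10, seed=None):
    """Orthonormal basis approximating the range of a matrix with n columns
    and numerical rank ~k, accessed only through products apply_A(Omega)."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((n, k + p))   # random test vectors
    Y = apply_A(Omega)                        # one block of matrix-vector products
    Q, _ = np.linalg.qr(Y)                    # orthonormalize the sample
    return Q
```

For a dense stand-in `A`, one would call `randomized_range(lambda O: A @ O, A.shape[1], k)`; in the HSS setting the same primitive is applied blockwise through the fast multiply.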