5,027 results for "Linear inference, regression"
Regression shrinkage and selection via the lasso: a retrospective
In the paper I give a brief review of the basic idea and some history, and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.
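For context, the criterion the retrospective revisits is the standard lasso objective for a linear model with predictors $x_i$ and response $y_i$:

    \hat{\beta}^{\mathrm{lasso}}
      = \arg\min_{\beta}\;
        \frac{1}{2}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^{2}
        + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert .

The $\ell_1$ penalty both shrinks coefficients and sets some of them exactly to zero, which is the selection property that most of the later entries in this list build on.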
Model-Free Feature Screening for Ultrahigh-Dimensional Data
With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework, which covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on regression functions, and thus is particularly appealing to ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model forms. We demonstrate that, with the number of predictors growing at an exponential rate of the sample size, the proposed procedure possesses consistency in ranking, which is both useful in its own right and can lead to consistency in selection. The new procedure is computationally efficient and simple, and exhibits a competent empirical performance in our intensive simulations and real data analysis.
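The screening idea itself is easy to sketch: compute a marginal utility for every candidate predictor, rank the predictors, and keep the top few. The absolute Pearson correlation below is only a stand-in used for illustration; it is not the model-free ranking statistic the paper proposes.

    # Illustrative marginal screening: rank predictors by a marginal utility and
    # keep the top d. The absolute Pearson correlation used here is a stand-in;
    # the paper's own model-free ranking statistic is different.
    import numpy as np

    def marginal_screen(X, y, d):
        # utility of each column: |corr(X_j, y)|
        Xc = X - X.mean(axis=0)
        yc = y - y.mean()
        utility = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
        return np.argsort(utility)[::-1][:d]   # indices of the d top-ranked predictors

    # typical choice in the screening literature: keep d of order n / log(n) predictors
    # keep = marginal_screen(X, y, d=int(n / np.log(n)))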
High-Dimensional Variable Selection
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as "screening" and the last stage as "cleaning." We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.
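A rough sketch of the screen-and-clean pipeline with lasso screening is given below; the even sample split, the cross-validated penalty, and the Bonferroni cleaning threshold are illustrative choices, not the paper's exact prescriptions.

    # Rough screen-and-clean sketch: split the sample, screen with the lasso on
    # one half, then "clean" on the other half by keeping only coefficients whose
    # OLS t-test survives a Bonferroni cut. Assumes the screened set is small
    # relative to the second half of the sample.
    import numpy as np
    import statsmodels.api as sm
    from sklearn.linear_model import LassoCV
    from sklearn.model_selection import train_test_split

    def screen_and_clean(X, y, alpha=0.05):
        X1, X2, y1, y2 = train_test_split(X, y, test_size=0.5, random_state=0)
        screened = np.nonzero(LassoCV(cv=5).fit(X1, y1).coef_)[0]   # screening stage
        if screened.size == 0:
            return screened
        ols = sm.OLS(y2, sm.add_constant(X2[:, screened])).fit()    # cleaning stage
        pvals = ols.pvalues[1:]                                     # drop the intercept
        return screened[pvals < alpha / screened.size]              # Bonferroni cut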
COORDINATE DESCENT ALGORITHMS FOR NONCONVEX PENALIZED REGRESSION, WITH APPLICATIONS TO BIOLOGICAL FEATURE SELECTION
A number of variable selection methods have been proposed involving nonconvex penalty functions. These methods, which include the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), have been demonstrated to have attractive theoretical properties, but model fitting is not a straightforward task, and the resulting solutions may be unstable. Here, we demonstrate the potential of coordinate descent algorithms for fitting these models, establishing theoretical convergence properties and demonstrating that they are significantly faster than competing approaches. In addition, we demonstrate the utility of convexity diagnostics to determine regions of the parameter space in which the objective function is locally convex, even though the penalty is not. Our simulation study and data examples indicate that nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso in many applications. In particular, our numerical results suggest that MCP is the preferred approach among the three methods.
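A minimal sketch of the coordinate-descent update for the MCP case is shown below, assuming standardized columns with $x_j^{\top}x_j/n = 1$; the fixed $\gamma$, the penalty level, and the plain cyclic sweep are simplifications for illustration, not the authors' full algorithm with convexity diagnostics.

    # One-dimensional MCP update used inside coordinate descent, assuming the
    # columns of X are standardized so that x_j'x_j / n = 1. gamma and lam are
    # illustrative values.
    import numpy as np

    def soft_threshold(z, lam):
        return np.sign(z) * max(abs(z) - lam, 0.0)

    def mcp_update(z, lam, gamma):
        # closed-form minimizer of 0.5*(b - z)^2 + MCP(b; lam, gamma), gamma > 1
        if abs(z) <= gamma * lam:
            return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
        return z                               # large coefficients are left unshrunk

    def coordinate_descent_mcp(X, y, lam, gamma=3.0, n_iter=100):
        n, p = X.shape
        beta = np.zeros(p)
        r = y - X @ beta                       # current residual
        for _ in range(n_iter):
            for j in range(p):
                z = X[:, j] @ r / n + beta[j]      # partial residual correlation
                b_new = mcp_update(z, lam, gamma)
                r -= X[:, j] * (b_new - beta[j])   # keep the residual in sync
                beta[j] = b_new
        return beta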
NEARLY UNBIASED VARIABLE SELECTION UNDER MINIMAX CONCAVE PENALTY
We propose MC+, a fast, continuous, nearly unbiased and accurate method of penalized variable selection in high-dimensional linear regression. The LASSO is fast and continuous, but biased. The bias of the LASSO may prevent consistent variable selection. Subset selection is unbiased but computationally costly. The MC+ has two elements: a minimax concave penalty (MCP) and a penalized linear unbiased selection (PLUS) algorithm. The MCP provides the convexity of the penalized loss in sparse regions to the greatest extent given certain thresholds for variable selection and unbiasedness. The PLUS computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the penalized loss. Its output is a continuous piecewise linear path running from the origin (infinite penalty) to a least squares solution (zero penalty). We prove that at a universal penalty level, the MC+ has high probability of matching the signs of the unknowns, and thus correct selection, without assuming the strong irrepresentable condition required by the LASSO. This selection consistency applies to the case of p ≫ n, and is proved to hold for exactly the MC+ solution among possibly many local minimizers. We prove that the MC+ attains certain minimax convergence rates in probability for the estimation of regression coefficients in $\ell_r$ balls. We use the SURE method to derive degrees of freedom and $C_p$-type risk estimates for general penalized LSE, including the LASSO and MC+ estimators, and prove their unbiasedness. Based on the estimated degrees of freedom, we propose an estimator of the noise level for proper choice of the penalty level. For full rank designs and general sub-quadratic penalties, we provide necessary and sufficient conditions for the continuity of the penalized LSE. Simulation results overwhelmingly support our claim of superior variable selection properties and demonstrate the computational efficiency of the proposed method.
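For reference, the minimax concave penalty referred to above is commonly written, for a scalar coefficient $t$ and tuning parameters $\lambda \ge 0$ and $\gamma > 1$, as

    \rho(t; \lambda, \gamma) =
    \begin{cases}
      \lambda\lvert t\rvert - \dfrac{t^{2}}{2\gamma}, & \lvert t\rvert \le \gamma\lambda,\\[6pt]
      \dfrac{\gamma\lambda^{2}}{2}, & \lvert t\rvert > \gamma\lambda,
    \end{cases}

so the penalty's slope tapers from $\lambda$ down to zero and coefficients larger than $\gamma\lambda$ are left unpenalized, which is the source of the near-unbiasedness claimed in the abstract.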
THE COMPOSITE ABSOLUTE PENALTIES FAMILY FOR GROUPED AND HIERARCHICAL VARIABLE SELECTION
Extracting useful information from high-dimensional data is an important focus of today's statistical research and practice. Penalized loss function minimization has been shown to be effective for this task both theoretically and empirically. With the virtues of both regularization and sparsity, the L₁-penalized squared error minimization method Lasso has been popular in regression models and beyond. In this paper, we combine different norms including L₁ to form an intelligent penalty in order to add side information to the fitting of a regression or classification model to obtain reasonable estimates. Specifically, we introduce the Composite Absolute Penalties (CAP) family, which allows given grouping and hierarchical relationships between the predictors to be expressed. CAP penalties are built by defining groups and combining the properties of norm penalties at the across-group and within-group levels. Grouped selection occurs for nonoverlapping groups. Hierarchical variable selection is reached by defining groups with particular overlapping patterns. We propose using the BLASSO and cross-validation to compute CAP estimates in general. For a subfamily of CAP estimates involving only the L₁ and L∞ norms, we introduce the iCAP algorithm to trace the entire regularization path for the grouped selection problem. Within this subfamily, unbiased estimates of the degrees of freedom (df) are derived so that the regularization parameter is selected without cross-validation. CAP is shown to improve on the predictive performance of the LASSO in a series of simulated experiments, including cases with p ≫ n and possibly mis-specified groupings. When the complexity of a model is properly calculated, iCAP is seen to be parsimonious in the experiments.
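The construction the abstract describes can be summarized schematically as follows (the indexing below is a plausible rendering of the CAP definition for orientation, not a quotation of the paper): with predictor groups $G_1,\dots,G_K$, an across-group norm index $\gamma_0$ and within-group indices $\gamma_k$, the penalty is

    T(\beta) = \sum_{k=1}^{K} \bigl\lVert \beta_{G_k} \bigr\rVert_{\gamma_k}^{\gamma_0},
    \qquad
    \bigl\lVert \beta_{G_k} \bigr\rVert_{\gamma_k}
      = \Bigl(\sum_{j \in G_k} \lvert\beta_j\rvert^{\gamma_k}\Bigr)^{1/\gamma_k}.

Choosing $\gamma_0 = 1$ with $\gamma_k = 2$ gives a group-lasso-type penalty, while $\gamma_k = \infty$ corresponds to the L₁/L∞ subfamily that the iCAP algorithm mentioned in the abstract handles.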
The group lasso for logistic regression
The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm that is especially suitable for high-dimensional problems and can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than the sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.
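To make the groupwise structure concrete, here is a small sketch of group soft-thresholding, the proximal step underlying most group-lasso solvers; the $\sqrt{|g|}$ weights and the standalone helper are common conventions used for illustration, not the paper's own block coordinate gradient algorithm.

    # Group soft-thresholding: the proximal operator of the group-lasso penalty
    # lam * sum_g w_g * ||beta_g||_2, which shrinks each group toward zero as a
    # unit and can zero out a whole group at once.
    import numpy as np

    def group_soft_threshold(beta, groups, lam):
        # groups: list of index arrays, one per (non-overlapping) group
        out = beta.copy()
        for g in groups:
            w = np.sqrt(len(g))                # common weight choice: sqrt(group size)
            norm = np.linalg.norm(beta[g])
            scale = max(1.0 - lam * w / norm, 0.0) if norm > 0 else 0.0
            out[g] = scale * beta[g]           # shrink the whole group together
        return out

    # e.g. group_soft_threshold(b, [np.arange(0, 3), np.arange(3, 7)], lam=0.2)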
The Sparsity and Bias of the Lasso Selection in High-Dimensional Linear Regression
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [J. Mach. Learn. Res. 7 (2006) 2541-2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the $\ell_\alpha$-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
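For orientation, the strong irrepresentable condition being relaxed here is usually stated as follows (writing $C = X^{\top}X/n$ for the Gram matrix, $S$ for the true support, and partitioning $C$ accordingly; this is the standard textbook form rather than a quotation of either paper):

    \bigl\lvert C_{S^{c}S}\, C_{SS}^{-1}\, \operatorname{sign}(\beta_S) \bigr\rvert
      \;\le\; \mathbf{1} - \eta
    \quad \text{element-wise, for some } \eta > 0.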
Scaled sparse linear regression
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization.
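The iteration the abstract describes can be sketched as follows, using scikit-learn's Lasso for the inner fit; the initial noise guess, the choice of lam0, and the stopping rule are illustrative, not the paper's prescriptions.

    # Sketch of the scaled-lasso iteration: alternate a lasso fit whose penalty is
    # proportional to the current noise estimate with re-estimation of the noise
    # level from the mean residual square.
    import numpy as np
    from sklearn.linear_model import Lasso

    def scaled_lasso(X, y, lam0, n_iter=20, tol=1e-6):
        n = len(y)
        sigma = np.std(y)                        # crude initial noise level
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            # sklearn's Lasso minimizes ||y - Xb||^2 / (2n) + alpha * ||b||_1
            beta = Lasso(alpha=sigma * lam0).fit(X, y).coef_
            sigma_new = np.linalg.norm(y - X @ beta) / np.sqrt(n)
            if abs(sigma_new - sigma) < tol:
                sigma = sigma_new
                break
            sigma = sigma_new
        return beta, sigma

    # a commonly cited default for the penalty scale: lam0 = sqrt(2 * log(p) / n)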
Unconditional Quantile Regressions
We propose a new regression method to evaluate the impact of changes in the distribution of the explanatory variables on quantiles of the unconditional (marginal) distribution of an outcome variable. The proposed method consists of running a regression of the (recentered) influence function (RIF) of the unconditional quantile on the explanatory variables. The influence function, a widely used tool in robust estimation, is easily computed for quantiles, as well as for other distributional statistics. Our approach, thus, can be readily generalized to other distributional statistics.
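A minimal sketch of the RIF-regression recipe for a single quantile tau is given below; the Gaussian kernel density estimator and its default bandwidth are illustrative choices, not prescribed by the paper.

    # Unconditional (RIF) quantile regression at quantile tau:
    # 1) estimate q_tau and the density f_Y(q_tau),
    # 2) form RIF(y) = q_tau + (tau - 1{y <= q_tau}) / f_Y(q_tau),
    # 3) regress the RIF on the covariates by ordinary least squares.
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import gaussian_kde

    def rif_quantile_regression(X, y, tau):
        q = np.quantile(y, tau)
        f_q = gaussian_kde(y)(q)[0]                      # density of y at the quantile
        rif = q + (tau - (y <= q).astype(float)) / f_q   # recentered influence function
        return sm.OLS(rif, sm.add_constant(X)).fit()

    # e.g. res = rif_quantile_regression(X, y, tau=0.9); print(res.params)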