Catalogue Search | MBRL
13,747 result(s) for "Wainwright, Martin"
STATISTICAL GUARANTEES FOR THE EM ALGORITHM: FROM POPULATION TO SAMPLE-BASED ANALYSIS
by Wainwright, Martin J.; Balakrishnan, Sivaraman; Yu, Bin
in Algorithms; Demography; Iterative methods
2017
The EM algorithm is a widely used tool in maximum-likelihood estimation in incomplete data problems. Existing theoretical work has focused on conditions under which the iterates or likelihood values converge, and the associated rates of convergence. Such guarantees do not distinguish whether the ultimate fixed point is a near global optimum or a bad local optimum of the sample likelihood, nor do they relate the obtained fixed point to the global optima of the idealized population likelihood (obtained in the limit of infinite data). This paper develops a theoretical framework for quantifying when and how quickly EM-type iterates converge to a small neighborhood of a given global optimum of the population likelihood. For correctly specified models, such a characterization yields rigorous guarantees on the performance of certain two-stage estimators in which a suitable initial pilot estimator is refined with iterations of the EM algorithm. Our analysis is divided into two parts: a treatment of the EM and first-order EM algorithms at the population level, followed by results that apply to these algorithms on a finite set of samples. Our conditions allow for a characterization of the region of convergence of EM-type iterates to a given population fixed point, that is, the region of the parameter space over which convergence is guaranteed to a point within a small neighborhood of the specified population fixed point. We verify our conditions and give tight characterizations of the region of convergence for three canonical problems of interest: symmetric mixture of two Gaussians, symmetric mixture of two regressions and linear regression with covariates missing completely at random.
Journal Article
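To make the analysis concrete, here is a minimal sketch of the sample-based EM iteration for the symmetric mixture of two Gaussians, one of the canonical models treated in the abstract above: x ~ ½N(θ*, σ²I) + ½N(−θ*, σ²I). The function names, pilot initialization, and parameter values are illustrative assumptions, not the authors' code.

```python
import numpy as np

def em_symmetric_gmm(X, theta0, sigma=1.0, n_iters=50):
    """Run EM iterates from a pilot estimate theta0; returns the final iterate."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        # E-step: posterior probability that each sample came from the +theta component
        w = 1.0 / (1.0 + np.exp(-2.0 * (X @ theta) / sigma**2))
        # M-step: average of the samples weighted by signed responsibilities (2w - 1)
        theta = ((2.0 * w - 1.0)[:, None] * X).mean(axis=0)
    return theta

# Toy usage: data from the model, EM started in a neighborhood of theta*
rng = np.random.default_rng(0)
d, n = 5, 2000
theta_star = np.ones(d)
signs = rng.choice([-1.0, 1.0], size=n)
X = signs[:, None] * theta_star + rng.normal(size=(n, d))
theta_hat = em_symmetric_gmm(X, theta0=theta_star + 0.5 * rng.normal(size=d))
```

The two-stage estimators described in the abstract correspond to computing theta0 with a suitable pilot estimator and then refining it with these EM iterates.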
RANDOMIZED SKETCHES FOR KERNELS: FAST AND OPTIMAL NONPARAMETRIC REGRESSION
2017
Kernel ridge regression (KRR) is a standard method for performing nonparametric regression over reproducing kernel Hilbert spaces. Given n samples, the time and space complexity of computing the KRR estimate scale as 𝓞(n³) and 𝓞(n²), respectively, and so are prohibitive in many cases. We propose approximations of KRR based on m-dimensional randomized sketches of the kernel matrix, and study how small the projection dimension m can be chosen while still preserving minimax optimality of the approximate KRR estimate. For various classes of randomized sketches, including those based on Gaussian and randomized Hadamard matrices, we prove that it suffices to choose the sketch dimension m proportional to the statistical dimension (modulo logarithmic factors). Thus, we obtain fast and minimax optimal approximations to the KRR estimate for nonparametric regression. In doing so, we prove a novel lower bound on the minimax risk of kernel regression in terms of the localized Rademacher complexity.
Journal Article
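A minimal sketch of Gaussian-sketched KRR in the spirit of the abstract above: the n-dimensional coefficient vector is replaced by Sᵀα for an m × n random sketch S, so only an m × m system has to be solved. The Gaussian (RBF) kernel, bandwidth, and regularization level are illustrative assumptions.

```python
import numpy as np

def sketched_krr(X, y, m, lam=1e-2, bandwidth=1.0, seed=None):
    """Sketched kernel ridge regression; returns fitted values at the training points."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 h^2))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * bandwidth**2))
    # m x n Gaussian sketch; coefficients are restricted to the form w = S^T alpha
    S = rng.normal(size=(m, n)) / np.sqrt(m)
    KS = S @ K                                   # m x n
    A = KS @ KS.T / n + lam * (KS @ S.T)         # m x m system matrix
    alpha = np.linalg.solve(A, KS @ y / n)       # sketched coefficients
    return K @ (S.T @ alpha)                     # fitted values K w

# Toy usage on a one-dimensional regression problem
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=500)
y_hat = sketched_krr(X, y, m=50, seed=2)
```

The abstract's central question is how small m can be; its answer is a sketch dimension proportional to the statistical dimension of the kernel, up to logarithmic factors, rather than to n.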
Minimax Optimal Procedures for Locally Private Estimation
by Wainwright, Martin J.; Duchi, John C.; Jordan, Michael I.
in Blogs; Density; Differential privacy
2018
Working under a model of privacy in which data remain private even from the statistician, we study the tradeoff between privacy guarantees and the risk of the resulting statistical estimators. We develop private versions of classical information-theoretical bounds, in particular those due to Le Cam, Fano, and Assouad. These inequalities allow for a precise characterization of statistical rates under local privacy constraints and the development of provably (minimax) optimal estimation procedures. We provide a treatment of several canonical families of problems: mean estimation and median estimation, generalized linear models, and nonparametric density estimation. For all of these families, we provide lower and upper bounds that match up to constant factors, and exhibit new (optimal) privacy-preserving mechanisms and computationally efficient estimators that achieve the bounds. Additionally, we present a variety of experimental results for estimation problems involving sensitive data, including salaries, censored blog posts and articles, and drug abuse; these experiments demonstrate the importance of deriving optimal procedures. Supplementary materials for this article are available online.
Journal Article
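A minimal sketch of a locally private estimator in the setting of the abstract above: each data holder releases only a Laplace-perturbed report satisfying ε-local differential privacy, and the statistician averages the reports. This is a simple baseline mechanism for one-dimensional mean estimation, not the paper's minimax-optimal procedure; the bound B on the data and the value of ε are assumptions.

```python
import numpy as np

def locally_private_mean(x, epsilon=1.0, B=1.0, seed=None):
    """epsilon-locally-private mean of values assumed to lie in [-B, B]."""
    rng = np.random.default_rng(seed)
    x = np.clip(x, -B, B)                        # enforce the boundedness assumption
    # Each report x_i + Lap(2B / epsilon) is epsilon-differentially private on its own,
    # so the raw data never reach the statistician.
    noisy = x + rng.laplace(scale=2 * B / epsilon, size=x.shape)
    return noisy.mean()

# Toy usage: the privacy constraint shows up as extra estimation error at small epsilon
rng = np.random.default_rng(2)
data = rng.uniform(-1, 1, size=10_000)
print(data.mean(), locally_private_mean(data, epsilon=0.5, seed=3))
```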
SUPPORT RECOVERY WITHOUT INCOHERENCE: A CASE FOR NONCONVEX REGULARIZATION
by Loh, Po-Ling; Wainwright, Martin J.
in Generalized linear models; Incoherence; Least squares method
2017
We develop a new primal-dual witness proof framework that may be used to establish variable selection consistency and ℓ∞-bounds for sparse regression problems, even when the loss function and regularizer are nonconvex. We use this method to prove two theorems concerning support recovery and ℓ∞-guarantees for a regression estimator in a general setting. Notably, our theory applies to all potential stationary points of the objective and certifies that the stationary point is unique under mild conditions. Our results provide a strong theoretical justification for the use of nonconvex regularization: For certain nonconvex regularizers with vanishing derivative away from the origin, any stationary point can be used to recover the support without requiring the typical incoherence conditions present in ℓ1-based methods. We also derive corollaries illustrating the implications of our theorems for composite objective functions involving losses such as least squares, nonconvex modified least squares for errors-in-variables linear regression, the negative log likelihood for generalized linear models and the graphical Lasso. We conclude with empirical studies that corroborate our theoretical predictions.
Journal Article
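As one concrete instance of the nonconvex regularizers covered by the theory in the abstract above (those whose derivative vanishes away from the origin), here is a hedged sketch of least squares with the MCP penalty fit by composite (proximal) gradient descent. The MCP parameters, step size, and toy data are illustrative assumptions.

```python
import numpy as np

def mcp_prox(z, lam, gamma, eta):
    """Componentwise proximal operator of the MCP penalty (requires gamma > eta)."""
    out = z.copy()
    inner = np.abs(z) <= gamma * lam             # region where the penalty still acts
    soft = np.sign(z) * np.maximum(np.abs(z) - eta * lam, 0.0) / (1.0 - eta / gamma)
    out[inner] = soft[inner]
    return out

def mcp_regression(X, y, lam=0.1, gamma=3.0, n_iters=500):
    """Proximal gradient on (1/2n)||y - X theta||^2 + sum_j MCP(theta_j; lam, gamma)."""
    n, d = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2          # step size <= 1 / smoothness constant
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n
        theta = mcp_prox(theta - eta * grad, lam, gamma, eta)
    return theta

# Toy usage: sparse ground truth, no incoherence condition checked anywhere
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 50))
beta = np.zeros(50); beta[:5] = 2.0
y = X @ beta + 0.5 * rng.normal(size=200)
theta_hat = mcp_regression(X, y)
```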
ESTIMATION OF (NEAR) LOW-RANK MATRICES WITH NOISE AND HIGH-DIMENSIONAL SCALING
2011
We study an instance of high-dimensional inference in which the goal is to estimate a matrix Θ* ∈ ℝ^(m₁ × m₂) on the basis of N noisy observations. The unknown matrix Θ* is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider a standard M-estimator based on regularization by the nuclear or trace norm over matrices, and analyze its performance under high-dimensional scaling. We define the notion of restricted strong convexity (RSC) for the loss function, and use it to derive nonasymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate consequences of this general theory for a number of specific matrix models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes and recovery of low-rank matrices from random projections. These results involve nonasymptotic random matrix theory to establish that the RSC condition holds, and to determine an appropriate choice of regularization parameter. Simulation results show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
Journal Article
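A minimal sketch of the nuclear-norm-regularized M-estimator from the abstract above, specialized to low-rank multi-task regression (one of the listed matrix models) and solved by proximal gradient with singular value thresholding. The step size and regularization level are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def nuclear_norm_regression(X, Y, lam=0.5, n_iters=300):
    """Minimize (1/2n)||Y - X Theta||_F^2 + lam ||Theta||_* by proximal gradient."""
    n = X.shape[0]
    eta = n / np.linalg.norm(X, 2) ** 2          # step size <= 1 / smoothness constant
    Theta = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iters):
        grad = X.T @ (X @ Theta - Y) / n
        Theta = svt(Theta - eta * grad, eta * lam)
    return Theta

# Toy usage: rank-2 coefficient matrix, relative Frobenius error of the estimate
rng = np.random.default_rng(5)
n, p, q, r = 300, 40, 30, 2
Theta_star = rng.normal(size=(p, r)) @ rng.normal(size=(r, q))
X = rng.normal(size=(n, p))
Y = X @ Theta_star + 0.5 * rng.normal(size=(n, q))
Theta_hat = nuclear_norm_regression(X, Y)
print(np.linalg.norm(Theta_hat - Theta_star) / np.linalg.norm(Theta_star))
```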
A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers
by Yu, Bin; Negahban, Sahand N.; Wainwright, Martin J.
in Analytical estimating; Convexity; Covariance matrices
2012
High-dimensional statistical inference deals with models in which the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of low-dimensional structure, including sparse vectors, sparse and structured matrices, low-rank matrices and combinations thereof. In such settings, a general approach to estimation is to solve a regularized optimization problem, which combines a loss function measuring how well the model fits the data with some regularization function that encourages the assumed structure. This paper provides a unified framework for establishing consistency and convergence rates for such regularized M-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive some existing results, and also to obtain a number of new results on consistency and convergence rates, in both ℓ₂-error and related norms. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure corresponding regularized M-estimators have fast convergence rates and which are optimal in many well-studied cases.
Journal Article
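To illustrate the regularized M-estimator template from the abstract above with a decomposable regularizer, here is a hedged sketch of the group Lasso (a sum of blockwise Euclidean norms) fit by proximal gradient. The group structure, λ, and step size are illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(z, groups, tau):
    """Proximal operator of tau * sum_g ||z_g||_2 (block soft-thresholding)."""
    out = np.zeros_like(z)
    for g in groups:
        norm = np.linalg.norm(z[g])
        if norm > tau:
            out[g] = (1.0 - tau / norm) * z[g]
    return out

def group_lasso(X, y, groups, lam=0.2, n_iters=500):
    """Minimize (1/2n)||y - X theta||^2 + lam * sum_g ||theta_g||_2."""
    n, d = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2          # step size <= 1 / smoothness constant
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n
        theta = group_soft_threshold(theta - eta * grad, groups, eta * lam)
    return theta

# Toy usage: 20 groups of size 5, only the first two groups active
rng = np.random.default_rng(6)
groups = [list(range(5 * g, 5 * g + 5)) for g in range(20)]
beta = np.zeros(100); beta[:10] = 1.0
X = rng.normal(size=(150, 100))
y = X @ beta + 0.3 * rng.normal(size=150)
theta_hat = group_lasso(X, y, groups)
```

Here the regularizer splits additively across groups, which is the kind of structure the paper's decomposability condition formalizes.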
ON THE COMPUTATIONAL COMPLEXITY OF HIGH-DIMENSIONAL BAYESIAN VARIABLE SELECTION
by Yang, Yun; Wainwright, Martin J.; Jordan, Michael I.
in Algorithms; Bayesian analysis; Feature selection
2016
We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis-Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
Journal Article
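A hedged sketch of a single-flip Metropolis–Hastings sampler over sparse models γ ∈ {0,1}^d, in the spirit of the MCMC schemes whose mixing the abstract above analyzes. The g-prior marginal likelihood (with unit noise variance) and the simple truncated sparsity prior below are illustrative assumptions, not the paper's exact prior.

```python
import numpy as np

def log_posterior(gamma, X, y, g=100.0, s_max=10):
    """Unnormalized log posterior of a model gamma under a g-prior marginal likelihood."""
    k = int(gamma.sum())
    if k > s_max:
        return -np.inf                           # truncated sparsity prior
    log_prior = -k * np.log(X.shape[1])          # roughly p^(-|gamma|)
    if k == 0:
        return log_prior - 0.5 * (y @ y)
    Xg = X[:, gamma.astype(bool)]
    proj = Xg @ np.linalg.solve(Xg.T @ Xg, Xg.T @ y)     # projection P_gamma y
    quad = y @ y - (g / (1.0 + g)) * (y @ proj)
    return log_prior - 0.5 * k * np.log(1.0 + g) - 0.5 * quad

def mh_variable_selection(X, y, n_steps=5000, seed=None):
    """Metropolis-Hastings over {0,1}^d with single-coordinate flip proposals."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    gamma = np.zeros(d)
    lp = log_posterior(gamma, X, y)
    for _ in range(n_steps):
        j = rng.integers(d)
        prop = gamma.copy(); prop[j] = 1 - prop[j]       # flip one coordinate
        lp_prop = log_posterior(prop, X, y)
        if np.log(rng.uniform()) < lp_prop - lp:
            gamma, lp = prop, lp_prop
    return gamma

# Toy usage: three active covariates out of forty
rng = np.random.default_rng(10)
X = rng.normal(size=(150, 40))
beta = np.zeros(40); beta[:3] = 2.0
y = X @ beta + rng.normal(size=150)
support = mh_variable_selection(X, y, seed=11)
```

The paper's question is how many such flip steps are needed for the chain to mix; under its stated conditions the answer is linear in the number of covariates up to a logarithmic factor.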
FAST GLOBAL CONVERGENCE OF GRADIENT METHODS FOR HIGH-DIMENSIONAL STATISTICAL RECOVERY
2012
Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the ambient dimension d to grow with (and possibly exceed) the sample size n. Our theory identifies conditions under which projected gradient descent enjoys globally linear convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter θ* and an optimal solution θ̂. By establishing these conditions with high probability for numerous statistical models, our analysis applies to a wide range of M-estimators, including sparse linear regression using Lasso; group Lasso for block sparsity; loglinear models with regularization; low-rank matrix recovery using nuclear norm regularization; and matrix decomposition using a combination of the nuclear and ℓ₁ norms. Overall, our analysis reveals interesting connections between statistical and computational efficiency in high-dimensional estimation.
Journal Article
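A minimal sketch of the composite gradient method from the abstract above, applied to the first M-estimator on its list, sparse linear regression with the Lasso. The geometric decrease of the optimization error down to the statistical precision is the behavior the paper's theory predicts; the step size, λ, and toy data here are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def lasso_composite_gradient(X, y, lam=0.1, n_iters=500):
    """Minimize (1/2n)||y - X theta||^2 + lam ||theta||_1 by composite gradient steps."""
    n, d = X.shape
    eta = n / np.linalg.norm(X, 2) ** 2          # step size <= 1 / Lipschitz constant
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n         # gradient of the smooth loss
        theta = soft_threshold(theta - eta * grad, eta * lam)
    return theta

# Toy usage: the ambient dimension d exceeds the sample size n
rng = np.random.default_rng(7)
n, d = 100, 300
beta = np.zeros(d); beta[:5] = 1.0
X = rng.normal(size=(n, d))
y = X @ beta + 0.1 * rng.normal(size=n)
theta_hat = lasso_composite_gradient(X, y)
```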
ACTIVE RANKING FROM PAIRWISE COMPARISONS AND WHEN PARAMETRIC ASSUMPTIONS DO NOT HELP
by Ramchandran, Kannan; Heckel, Reinhard; Shah, Nihar B.
in Algorithms; Confidence intervals; Lower bounds
2019
We consider sequential or active ranking of a set of n items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of prespecified sizes according to their scores. This notion of ranking includes as special cases the identification of the top-k items and the total ordering of the items. We first analyze a sequential ranking algorithm that counts the number of comparisons won, and uses these counts to decide whether to stop, or to compare another pair of items, chosen based on confidence intervals specified by the data collected up to that point. We prove that this algorithm succeeds in recovering the ranking using a number of comparisons that is optimal up to logarithmic factors. This guarantee does not depend on whether the underlying pairwise probability matrix satisfies any particular structural property, unlike a significant body of past work on pairwise ranking based on parametric models such as the Thurstone or Bradley–Terry–Luce models. It has been a long-standing open question as to whether or not imposing these parametric assumptions allows for improved ranking algorithms. For stochastic comparison models, in which the pairwise probabilities are bounded away from zero, our second contribution is to resolve this issue by proving a lower bound for parametric models. This shows, perhaps surprisingly, that these popular parametric modeling choices offer at most logarithmic gains for stochastic comparisons.
Journal Article
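A hedged, simplified sketch of the counting-based active ranking idea from the abstract above: each still-active item is compared against uniformly random opponents, its empirical win rate gets a Hoeffding-style confidence interval, and it is committed to the top-k set or its complement once the intervals separate. The stopping rule, constants, and the `compare(i, j)` oracle (returning True when i beats j) are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def active_top_k(n, k, compare, delta=0.05, max_rounds=20000, seed=None):
    """Return items confidently identified as the top k (may be incomplete if max_rounds is hit)."""
    rng = np.random.default_rng(seed)
    wins, counts = np.zeros(n), np.zeros(n)
    active, top, bottom = set(range(n)), set(), set()
    for t in range(1, max_rounds + 1):
        for i in list(active):
            j = rng.choice([x for x in range(n) if x != i])   # uniformly random opponent
            wins[i] += compare(i, j); counts[i] += 1
        # Hoeffding-style confidence radius around each empirical win rate
        rad = np.sqrt(np.log(4 * n * t**2 / delta) / (2 * np.maximum(counts, 1)))
        tau = wins / np.maximum(counts, 1)
        lcb, ucb = tau - rad, tau + rad
        for i in list(active):
            better = sum(lcb[i] > ucb[j] for j in range(n) if j != i)
            worse = sum(ucb[i] < lcb[j] for j in range(n) if j != i)
            if better >= n - k:                  # confidently above n - k items: in the top k
                top.add(i); active.discard(i)
            elif worse >= k:                     # confidently below k items: outside the top k
                bottom.add(i); active.discard(i)
        if not active:
            break
    return top

# Toy usage: six items with Bradley-Terry-style win probabilities, top 2 requested
scores = np.array([1.0, 1.0, 1.0, 1.0, 4.0, 4.0])
P = scores[:, None] / (scores[:, None] + scores[None, :])
rng = np.random.default_rng(8)
print(active_top_k(6, 2, lambda i, j: rng.uniform() < P[i, j], seed=9))
```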