Catalogue Search | MBRL

Geometric median and robust estimation in Banach spaces

by MINSKER, STANISLAV in distributed computing , heavy-tailed noise , large deviations

2015

In many real-world applications, collected data are contaminated by noise with a heavy-tailed distribution and might contain outliers of large magnitude. In this situation, it is necessary to apply methods which produce reliable outcomes even if the input contains corrupted measurements. We describe a general method which allows one to obtain estimators with tight concentration around the true parameter of interest taking values in a Banach space. The suggested construction relies on the fact that the geometric median of a collection of independent "weakly concentrated" estimators satisfies a much stronger deviation bound than each individual element in the collection. Our approach is illustrated through several examples, including sparse linear regression and low-rank matrix recovery problems.
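
The construction described in this abstract can be illustrated in a simple Euclidean setting. The sketch below is not taken from the article: it assumes the "weakly concentrated" estimators are sample means computed on disjoint blocks, and it approximates the geometric median with Weiszfeld iterations. The function names `geometric_median` and `median_of_block_means` and the parameters `n_iter` and `tol` are hypothetical.

```python
import numpy as np

def geometric_median(points, n_iter=200, tol=1e-9):
    """Approximate the geometric median of the rows of `points` (Weiszfeld iterations)."""
    points = np.asarray(points, dtype=float)
    z = points.mean(axis=0)  # start from the ordinary mean
    for _ in range(n_iter):
        dist = np.linalg.norm(points - z, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        w = 1.0 / dist
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

def median_of_block_means(sample, k):
    """Split the sample into k disjoint blocks, average each block,
    and return the geometric median of the k block means."""
    sample = np.asarray(sample, dtype=float)
    blocks = np.array_split(sample, k)  # splits along the first axis
    means = np.stack([b.mean(axis=0) for b in blocks])
    return geometric_median(means)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(300, 2))
    x[0] = [1e6, 1e6]  # a single gross outlier
    print("sample mean:          ", x.mean(axis=0))
    print("median of block means:", median_of_block_means(x, 15))
```

In this toy run the single outlier drags the sample mean far from the origin, while it affects only one of the 15 block means, so the geometric median of the block means stays close to the true mean.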

Journal Article

SUB-GAUSSIAN ESTIMATORS OF THE MEAN OF A RANDOM MATRIX WITH HEAVY-TAILED ENTRIES

by Minsker, Stanislav in Covariance matrix , Empirical analysis , Matrix

2018

Estimation of the covariance matrix has attracted a lot of attention from the statistical research community over the years, partially due to important applications such as principal component analysis. However, the frequently used empirical covariance estimator and its modifications are very sensitive to the presence of outliers in the data. As P. Huber wrote [Ann. Math. Stat. 35 (1964) 73–101], “… This raises a question which could have been asked already by Gauss, but which was, as far as I know, only raised a few years ago (notably by Tukey): what happens if the true distribution deviates slightly from the assumed normal one? As is now well known, the sample mean then may have a catastrophically bad performance….” Motivated by Tukey’s question, we develop a new estimator of the (element-wise) mean of a random matrix, which includes the covariance estimation problem as a special case. Assuming that the entries of a matrix possess only a finite second moment, this new estimator admits sub-Gaussian or sub-exponential concentration around the unknown mean in the operator norm. We explain the key ideas behind our construction, and discuss applications to covariance estimation and matrix completion problems.

Journal Article

Generalized median of means principle for Bayesian inference

by Minsker, Stanislav , Yao, Shunan in Algorithms , Artificial Intelligence , Bayesian analysis

2025

The topic of robustness is experiencing a resurgence of interest in the statistical and machine learning communities. In particular, robust algorithms making use of the so-called median of means estimator were shown to satisfy strong performance guarantees for many problems, including estimation of the mean, covariance structure as well as linear regression. In this work, we propose an extension of the median of means principle to the Bayesian framework, leading to the notion of the robust posterior distribution. In particular, we (a) quantify robustness of this posterior to outliers, (b) show that it satisfies a version of the Bernstein-von Mises theorem that connects Bayesian credible sets to the traditional confidence intervals, and (c) demonstrate that our approach performs well in applications.

Journal Article

ROBUST ESTIMATION OF COVARIANCE MATRICES

by Minsker, Stanislav , Wang, Lang

2024

We consider the problem of estimating the covariance structure of a random vector Y ∈ ℝ^d from an independent and identically distributed (i.i.d.) sample Y_1, ..., Y_n. We are interested in the situation in which d is large relative to n, but the covariance matrix Σ of interest has (exactly or approximately) low rank. We assume that the given sample is either (a) ε-adversarially corrupted, meaning that an ε-fraction of the observations can be replaced by arbitrary vectors, or (b) i.i.d., but the underlying distribution is heavy-tailed, meaning that the norm of Y possesses only finite fourth moments. We propose estimators that are adaptive to the potential low-rank structure of the covariance matrix and to the proportion of contaminated data, and that admit tight deviation guarantees, despite rather weak underlying assumptions. Finally, we show that the proposed construction leads to numerically efficient algorithms that require minimal tuning from the user, and demonstrate the performance of such methods under various models of contamination.

Journal Article

Robust modifications of U-statistics and applications to covariance estimation problems

by MINSKER, STANISLAV , WEI, XIAOHAN

2020

Let Y be a d-dimensional random vector with unknown mean µ and covariance matrix Σ. This paper is motivated by the problem of designing an estimator of Σ that admits exponential deviation bounds in the operator norm under minimal assumptions on the underlying distribution, such as existence of only 4th moments of the coordinates of Y. To address this problem, we propose robust modifications of the operator-valued U-statistics, obtain non-asymptotic guarantees for their performance, and demonstrate the implications of these results to the covariance estimation problem under various structural assumptions.

Journal Article

User-Friendly Covariance Estimation for Heavy-Tailed Distributions

by Ke, Yuan , Minsker, Stanislav , Zhou, Wen-Xin in Confidence intervals , Covariance matrix , Estimating techniques

2019

We provide a survey of recent results on covariance estimation for heavy-tailed distributions. By unifying ideas scattered in the literature, we propose user-friendly methods that facilitate practical implementation. Specifically, we introduce elementwise and spectrumwise truncation operators, as well as their M-estimator counterparts, to robustify the sample covariance matrix. Different from the classical notion of robustness that is characterized by the breakdown property, we focus on the tail robustness which is evidenced by the connection between nonasymptotic deviation and confidence level. The key insight is that estimators should adapt to the sample size, dimensionality and noise level to achieve optimal tradeoff between bias and robustness. Furthermore, to facilitate practical implementation, we propose data-driven procedures that automatically calibrate the tuning parameters. We demonstrate their applications to a series of structured models in high dimensions, including the bandable and low-rank covariance matrices and sparse precision matrices. Numerical studies lend strong support to the proposed methods.
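
As a rough illustration of what elementwise truncation of a sample covariance matrix can look like, the sketch below clips each product of coordinates at a threshold before averaging. It is a simplified reading and not the article's procedure: it assumes the data have mean zero, takes the threshold `tau` as a fixed user input instead of calibrating it from the data, and uses the hypothetical names `truncate` and `elementwise_truncated_cov`.

```python
import numpy as np

def truncate(u, tau):
    """Elementwise truncation: sign(u) * min(|u|, tau)."""
    return np.clip(u, -tau, tau)

def elementwise_truncated_cov(x, tau):
    """Average of truncated products x[i, k] * x[i, l] over observations i.

    Assumes the rows of `x` are mean-zero observations (an assumption of
    this sketch); `tau` is a fixed truncation level chosen by the caller.
    """
    x = np.asarray(x, dtype=float)
    # products[i, k, l] = x[i, k] * x[i, l]
    products = x[:, :, None] * x[:, None, :]
    return truncate(products, tau).mean(axis=0)

if __name__ == "__main__":
    x = np.array([[1.0, 2.0], [3.0, -4.0]])
    print(elementwise_truncated_cov(x, tau=5.0))
```

With `tau` set to infinity the function reduces to the usual second-moment matrix `x.T @ x / n`; smaller values trade some bias for reduced sensitivity to extreme observations.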

Journal Article

Active Clinical Trials for Personalized Medicine

by Minsker, Stanislav , Zhao, Ying-Qi , Cheng, Guang in Active learning , Clinical research , Clinical trial

2016

Individualized treatment rules (ITRs) tailor treatments according to individual patient characteristics. They can significantly improve patient care and are thus becoming increasingly popular. The data collected during randomized clinical trials are often used to estimate the optimal ITRs. However, these trials are generally expensive to run, and, moreover, they are not designed to efficiently estimate ITRs. In this article, we propose a cost-effective estimation method from an active learning perspective. In particular, our method recruits only the "most informative" patients (in terms of learning the optimal ITRs) from an ongoing clinical trial. Simulation studies and real-data examples show that our active clinical trial method significantly improves on competing methods. We derive risk bounds and show that they support these observed empirical advantages. Supplementary materials for this article are available online.

Journal Article

Efficient median of means estimator

by Minsker, Stanislav

2023

The goal of this note is to present a modification of the popular median of means estimator that achieves sub-Gaussian deviation bounds with nearly optimal constants under minimal assumptions on the underlying distribution. We build on recent work on the topic by the author, and prove that the desired guarantees can be attained under weaker requirements.
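
For orientation, the classical median of means estimator that this note modifies can be sketched in a few lines. The code below shows only the standard textbook version for scalar data, not the modification proposed in the note; the function name `median_of_means` and the block count `k` are illustrative choices.

```python
import numpy as np

def median_of_means(x, k):
    """Classical median of means: split the data into k disjoint blocks,
    average within each block, and return the median of the block averages."""
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, k)
    return float(np.median([b.mean() for b in blocks]))

if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 1000]
    print(median_of_means(data, 3))  # block means are 1.5, 3.5 and 502.5
```

The number of blocks controls the tradeoff: more blocks tolerate more outliers, but each block mean is then based on fewer observations.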

Paper

Asymptotic normality of robust risk minimizers

by Minsker, Stanislav in Algorithms , Asymptotic properties , Convergence

2023

This paper investigates asymptotic properties of algorithms that can be viewed as robust analogues of the classical empirical risk minimization. These strategies are based on replacing the usual empirical average by a robust proxy of the mean, such as (a version of) the median of means estimator. It is well known by now that the excess risk of resulting estimators often converges to zero at optimal rates under much weaker assumptions than those required by their "classical" counterparts. However, less is known about the asymptotic properties of the estimators themselves, for instance, whether robust analogues of the maximum likelihood estimators are asymptotically efficient. We make a step towards answering these questions and show that for a wide class of parametric problems, minimizers of the appropriately defined robust proxy of the risk converge to the minimizers of the true risk at the same rate, and often have the same asymptotic variance, as the estimators obtained by minimizing the usual empirical risk.

Paper
