Catalogue Search | MBRL
Explore the vast range of titles available.
179 result(s) for "Oracle inequality"
DEVIATION OPTIMAL LEARNING USING GREEDY Q-AGGREGATION
2012
Given a finite family of functions, the goal of model selection aggregation is to construct a procedure that mimics the function from this family that is the closest to an unknown regression function. More precisely, we consider a general regression model with fixed design and measure the distance between functions by the mean squared error at the design points. While procedures based on exponential weights are known to solve the problem of model selection aggregation in expectation, they are, surprisingly, sub-optimal in deviation. We propose a new formulation called Q-aggregation that addresses this limitation; namely, its solution leads to sharp oracle inequalities that are optimal in a minimax sense. Moreover, based on the new formulation, we design greedy Q-aggregation procedures that produce sparse aggregation models achieving the optimal rate. The convergence and performance of these greedy procedures are illustrated and compared with other standard methods on simulated examples.
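For orientation, a sharp oracle inequality in deviation of the kind this abstract refers to typically takes the following form (notation illustrative, not taken from the record): for an aggregate $\hat f$ built from a finite dictionary $f_1,\dots,f_M$ in fixed-design regression, with probability at least $1-\delta$,
\[ \|\hat f - f\|_n^2 \;\le\; \min_{1\le j\le M} \|f_j - f\|_n^2 \;+\; C\,\frac{\sigma^2\bigl(\log M + \log(1/\delta)\bigr)}{n}, \]
where $\|g\|_n^2 = n^{-1}\sum_{i=1}^n g(x_i)^2$ is the mean squared error at the design points. The leading constant $1$ in front of the minimum is what makes the inequality sharp, and the high-probability (deviation) form is where exponential weighting falls short and Q-aggregation does not.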
Journal Article
Mirror averaging with sparsity priors
by
Tsybakov, Alexandre B.
,
Dalalyan, Arnak S.
in
Aggregation
,
aggregation of estimators
,
Approximation
2012
We consider the problem of aggregating the elements of a possibly infinite dictionary for building a decision procedure that aims at minimizing a given criterion. Along with the dictionary, an independent identically distributed training sample is available, on which the performance of a given procedure can be tested. In a fairly general set-up, we establish an oracle inequality for the Mirror Averaging aggregate with any prior distribution. By choosing an appropriate prior, we apply this oracle inequality in the context of prediction under a sparsity assumption for the problems of regression with random design, density estimation and binary classification.
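As a rough sketch of the type of aggregate involved (notation illustrative, not taken from the record): given dictionary elements $f_1,\dots,f_M$, a prior $\pi_1,\dots,\pi_M$ and empirical risks $\widehat R_1,\dots,\widehat R_M$, an exponentially weighted aggregate with a prior has the form
\[ \hat f \;=\; \sum_{j=1}^{M} w_j f_j, \qquad w_j \;=\; \frac{\pi_j \exp(-\widehat R_j/\beta)}{\sum_{k=1}^{M} \pi_k \exp(-\widehat R_k/\beta)}, \]
with temperature parameter $\beta > 0$; mirror averaging additionally averages such weights computed along the successive sub-samples of the training data.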
Journal Article
BAYESIAN FRACTIONAL POSTERIORS
by
Yang, Yun
,
Pati, Debdeep
,
Bhattacharya, Anirban
in
Bayes Theorem
,
Bayesian analysis
,
Divergence
2019
We consider the fractional posterior distribution that is obtained by updating a prior distribution via Bayes theorem with a fractional likelihood function, that is, the usual likelihood function raised to a fractional power. First, we analyze the contraction property of the fractional posterior in a general misspecified framework. Our contraction results only require a prior mass condition on a certain Kullback–Leibler (KL) neighborhood of the true parameter (or the KL divergence minimizer in the misspecified case), and obviate the constructions of test functions and sieves commonly used in the literature for analyzing the contraction property of a regular posterior. We show through a counterexample that some condition controlling the complexity of the parameter space is necessary for the regular posterior to contract, which allows additional flexibility in the choice of the prior for the fractional posterior. Second, we derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in misspecified models. Our derivation reveals several advantages of averaging-based Bayesian procedures over optimization-based frequentist procedures. As an application of the Bayesian oracle inequality, we derive a sharp oracle inequality in multivariate convex regression problems. We also illustrate the theory in Gaussian process regression and density estimation problems.
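The defining object is simple to write down; in the notation of a generic model (symbols illustrative, not taken from the record), the fractional posterior with exponent $\alpha \in (0,1)$ is
\[ \pi_\alpha(\theta \mid X_{1:n}) \;\propto\; L_n(\theta)^{\alpha}\,\pi(\theta), \]
where $L_n(\theta)$ is the likelihood of the sample $X_{1:n}$ and $\pi$ is the prior; taking $\alpha = 1$ recovers the ordinary posterior.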
Journal Article
HIGH-DIMENSIONAL A-LEARNING FOR OPTIMAL DYNAMIC TREATMENT REGIMES
2018
Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients’ responses to previous treatments. Such an adaptive strategy is referred to as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinarily large number of prognostic factors, such as a patient’s genetic information, demographic characteristics, medical history and clinical measurements over time, are available, but not all of them are necessary for making treatment decisions. This makes variable selection an emerging need in precision medicine.
In this paper, we propose a penalized multi-stage A-learning approach for deriving the optimal dynamic treatment regime when the number of covariates is of nonpolynomial (NP) order in the sample size. To preserve the double robustness property of the A-learning method, we adopt the Dantzig selector, which directly penalizes the A-learning estimating equations. Oracle inequalities for the proposed estimators of the parameters in the optimal dynamic treatment regime, and error bounds on the difference between the value functions of the estimated optimal dynamic treatment regime and the true optimal dynamic treatment regime, are established. Empirical performance of the proposed approach is evaluated by simulations and illustrated with an application to data from the STAR*D study.
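As a hedged sketch of the penalization scheme referred to above (symbols illustrative, not from the record): a Dantzig-selector-type estimator constrains the estimating equations directly,
\[ \hat\beta \;\in\; \arg\min_{\beta} \|\beta\|_1 \quad \text{subject to} \quad \bigl\| S_n(\beta) \bigr\|_\infty \le \lambda, \]
where $S_n(\beta)$ denotes the (here, A-learning) estimating equations evaluated on the sample and $\lambda > 0$ is a tuning parameter; in the classical linear regression case $S_n(\beta) = X^{\top}(y - X\beta)/n$, which gives the original Dantzig selector.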
Journal Article
ORACLE INEQUALITIES FOR NETWORK MODELS AND SPARSE GRAPHON ESTIMATION
by
Tsybakov, Alexandre B.
,
Klopp, Olga
,
Verzelen, Nicolas
in
Approximation
,
Automatic Control Engineering
,
Computer Science
2017
Inhomogeneous random graph models encompass many network models such as stochastic block models and latent position models. We consider the problem of statistical estimation of the matrix of connection probabilities based on observations of the adjacency matrix of the network. Taking the stochastic block model as an approximation, we construct estimators of network connection probabilities: the ordinary block constant least squares estimator and its restricted version. We show that they satisfy oracle inequalities with respect to the block constant oracle. As a consequence, we derive optimal rates of estimation of the probability matrix. Our results cover the important setting of sparse networks. Another consequence is a set of upper bounds on the minimax risks for graphon estimation in the L₂ norm when the probability matrix is sampled according to a graphon model. These bounds include an additional term accounting for the "agnostic" error induced by the variability of the latent unobserved variables of the graphon model. In this setting, the optimal rates are influenced not only by the bias and variance components, as in usual nonparametric problems, but also by a third component, the agnostic error. The results shed light on the differences between estimation under the empirical loss (probability matrix estimation) and under the integrated loss (graphon estimation).
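As a hedged illustration of the block constant least squares construction (notation illustrative, not from the record): given the observed adjacency matrix $A \in \{0,1\}^{n \times n}$ and a number of blocks $k$, one searches over label assignments $z : \{1,\dots,n\} \to \{1,\dots,k\}$ and symmetric matrices $Q \in [0,1]^{k \times k}$,
\[ (\hat z, \hat Q) \;\in\; \arg\min_{z,\,Q} \; \sum_{i \ne j} \bigl( A_{ij} - Q_{z(i)\,z(j)} \bigr)^2, \]
and estimates the connection probabilities by $\hat\Theta_{ij} = \hat Q_{\hat z(i)\,\hat z(j)}$; the block constant oracle in the abstract is the best such block constant approximation of the true probability matrix.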
Journal Article
ISOTONIC REGRESSION IN GENERAL DIMENSIONS
by
Wang, Tengyao
,
Han, Qiyang
,
Chatterjee, Sabyasachi
in
Asymptotic methods
,
Convergence
,
Cubic lattice
2019
We study the least squares regression function estimator over the class of real-valued functions on [0, 1]^d that are increasing in each coordinate. For uniformly bounded signals and with a fixed, cubic lattice design, we establish that the estimator achieves the minimax rate of order n^{−min{2/(d+2), 1/d}} in the empirical L₂ loss, up to polylogarithmic factors. Further, we prove a sharp oracle inequality, which reveals in particular that when the true regression function is piecewise constant on k hyperrectangles, the least squares estimator enjoys a faster, adaptive rate of convergence of (k/n)^{min(1, 2/d)}, again up to polylogarithmic factors. Previous results are confined to the case d ≤ 2. Finally, we establish corresponding bounds (which are new even in the case d = 2) in the more challenging random design setting. There are two surprising features of these results: first, they demonstrate that it is possible for a global empirical risk minimisation procedure to be rate optimal up to polylogarithmic factors even when the corresponding entropy integral for the function class diverges rapidly; second, they indicate that the adaptation rate for shape-constrained estimators can be strictly worse than the parametric rate.
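As a hedged sketch of the estimator under study (notation illustrative, not from the record): with design points $x_1,\dots,x_n$ on the lattice and $\mathcal{M}$ the class of functions on $[0,1]^d$ that are increasing in each coordinate, the least squares estimator is
\[ \hat f_n \;\in\; \arg\min_{f \in \mathcal{M}} \; \frac{1}{n}\sum_{i=1}^{n} \bigl( Y_i - f(x_i) \bigr)^2, \]
and the empirical $L_2$ loss in which the rates above are stated is $\|\hat f_n - f_0\|_n^2 = n^{-1}\sum_{i=1}^n (\hat f_n(x_i) - f_0(x_i))^2$ for the true regression function $f_0$.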
Journal Article
Optimal cross-validation in density estimation with the $L^{2}$-loss
2014
We analyze the performance of cross-validation (CV) in the density estimation framework with two purposes: (i) risk estimation and (ii) model selection. The main focus is on the so-called leave-p-out CV procedure (Lpo), where p denotes the cardinality of the test set. Closed-form expressions are derived for the Lpo estimator of the risk of projection estimators. These expressions provide a great improvement upon V-fold cross-validation in terms of variability and computational complexity.
From a theoretical point of view, the closed-form expressions also make it possible to study the Lpo performance in terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo with p = 1, is proved among CV procedures used for risk estimation. Two model selection frameworks are also considered: estimation, as opposed to identification. For estimation with finite sample size n, optimality is achieved for p large enough [with p/n = o(1)] to balance the overfitting resulting from the structure of the model collection. For identification, model selection consistency is established for Lpo as long as p/n is suitably related to the rate of convergence of the best estimator in the collection: (i) p/n → 1 as n → +∞ with a parametric rate, and (ii) p/n = o(1) with some nonparametric estimators. These theoretical results are validated by simulation experiments.
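For orientation, the criterion behind leave-p-out CV in density estimation can be sketched as follows (notation illustrative, not from the record): since the $L^2$ risk of an estimator $\hat f$ expands as
\[ \int (\hat f - f)^2 \;=\; \int \hat f^{\,2} \;-\; 2\int \hat f f \;+\; \int f^2, \]
and the last term does not depend on $\hat f$, the Lpo procedure averages, over all $\binom{n}{p}$ splits into a training set of size $n-p$ and a test set of size $p$, the quantity $\int \hat f_{\mathrm{tr}}^{\,2} - \tfrac{2}{p}\sum_{i \in \mathrm{test}} \hat f_{\mathrm{tr}}(X_i)$; the closed-form expressions mentioned in the abstract remove the need to enumerate these splits for projection estimators.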
Journal Article
ROBUST LOW-RANK MATRIX ESTIMATION
by
Elsener, Andreas
,
van de Geer, Sara
in
Asymptotic properties
,
Data analysis
,
Estimating techniques
2018
Many results have been proved for various nuclear norm penalized estimators of the uniform sampling matrix completion problem. However, most of these estimators are not robust: in most cases the quadratic loss function or one of its modifications is used. We consider robust nuclear norm penalized estimators using two well-known robust loss functions: the absolute value loss and the Huber loss. Under several conditions on the sparsity of the problem (i.e., the rank of the parameter matrix) and on the regularity of the risk function, sharp and nonsharp oracle inequalities for these estimators are shown to hold with high probability. As a consequence, the asymptotic behavior of the estimators is derived. Similar error bounds are obtained under the assumption of weak sparsity, that is, the case where the matrix is assumed to be only approximately low-rank. All of our results are stated in a high-dimensional setting, which here means that we assume n ≤ pq. Finally, various simulations confirm our theoretical results.
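As a hedged sketch of the kind of robust estimator considered (notation illustrative, not from the record): for observed entries $Y_i$ at sampled positions encoded by $X_i$, a Huber-loss nuclear norm penalized estimator solves
\[ \hat B \;\in\; \arg\min_{B \in \mathbb{R}^{p \times q}} \; \frac{1}{n}\sum_{i=1}^{n} \rho_{\kappa}\bigl( Y_i - \langle X_i, B \rangle \bigr) \;+\; \lambda \|B\|_*, \]
where $\rho_{\kappa}(u) = u^2/2$ for $|u| \le \kappa$ and $\kappa|u| - \kappa^2/2$ otherwise, $\|B\|_*$ is the nuclear norm (the sum of the singular values), and $\lambda > 0$ is the regularization parameter; replacing $\rho_{\kappa}(u)$ by $|u|$ gives the absolute value loss version.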
Journal Article
A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems
2012
Concave regularization methods provide natural procedures for sparse recovery. However, they are difficult to analyze in the high-dimensional setting. Only recently have a few sparse recovery results been established for some specific local solutions obtained via specialized numerical procedures. Still, fundamental questions about these solutions, such as whether they are identical and how they relate to the global minimizer of the underlying nonconvex formulation, remain open. The current paper fills this conceptual gap by presenting a general theoretical framework showing that, under appropriate conditions, the global solution of nonconvex regularization leads to desirable recovery performance; moreover, under suitable conditions, the global solution corresponds to the unique sparse local solution, which can be obtained via different numerical procedures. Under this unified framework, we present an overview of existing results and discuss their connections. The unified view of this work leads to a more satisfactory treatment of concave high-dimensional sparse estimation procedures, and serves as a guideline for developing further numerical procedures for concave regularization.
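As a hedged illustration of the class of procedures discussed (notation illustrative, not from the record): a concave-regularized sparse regression estimator solves
\[ \hat\beta \;\in\; \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2n}\|y - X\beta\|_2^2 \;+\; \sum_{j=1}^{p} \rho_{\lambda}(|\beta_j|), \]
where $\rho_{\lambda}$ is a concave, nondecreasing penalty with $\rho_{\lambda}(0) = 0$ (SCAD and MCP are standard examples); the concavity reduces the shrinkage of large coefficients relative to the $\ell_1$ penalty, and the framework described in the abstract concerns when the global and sparse local minimizers of this nonconvex objective coincide and enjoy good recovery guarantees.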
Journal Article