Catalogue Search | MBRL
185 result(s) for "oracle inequalities"
DEVIATION OPTIMAL LEARNING USING GREEDY Q-AGGREGATION
2012
Given a finite family of functions, the goal of model selection aggregation is to construct a procedure that mimics the function from this family that is the closest to an unknown regression function. More precisely, we consider a general regression model with fixed design and measure the distance between functions by the mean squared error at the design points. While procedures based on exponential weights are known to solve the problem of model selection aggregation in expectation, they are, surprisingly, sub-optimal in deviation. We propose a new formulation called Q-aggregation that addresses this limitation; namely, its solution leads to sharp oracle inequalities that are optimal in a minimax sense. Moreover, based on the new formulation, we design greedy Q-aggregation procedures that produce sparse aggregation models achieving the optimal rate. The convergence and performance of these greedy procedures are illustrated and compared with other standard methods on simulated examples.
Journal Article
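The exponential-weights baseline mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Q-aggregation procedure of the paper; it only shows the generic exponential-weights idea for fixed-design regression. The function names, the `temperature` parameter, and the uniform default prior are assumptions made for illustration.

```python
import math

def exponential_weights(y, candidates, temperature, prior=None):
    """Weights proportional to prior_j * exp(-squared_error_j / temperature).

    y          : observed responses at the design points
    candidates : list of candidate prediction vectors, one per function
    temperature: positive tuning parameter (hypothetical name)
    prior      : optional prior weights; uniform if omitted (assumption)
    """
    m = len(candidates)
    if prior is None:
        prior = [1.0 / m] * m
    # Squared error of each candidate at the design points.
    losses = [sum((yi - fi) ** 2 for yi, fi in zip(y, f)) for f in candidates]
    # Subtract the smallest loss for numerical stability; weights are unchanged.
    best = min(losses)
    raw = [p * math.exp(-(loss - best) / temperature)
           for p, loss in zip(prior, losses)]
    total = sum(raw)
    return [r / total for r in raw]

def aggregate(candidates, weights):
    """Convex combination of the candidate prediction vectors."""
    n = len(candidates[0])
    return [sum(w * f[i] for w, f in zip(weights, candidates)) for i in range(n)]

y = [1.0, 2.0, 3.0]
candidates = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
weights = exponential_weights(y, candidates, temperature=1.0)
fitted = aggregate(candidates, weights)
```

With a small temperature the weights concentrate on the candidate with the smallest squared error; a large temperature keeps them close to the prior.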
Mirror averaging with sparsity priors
by TSYBAKOV, ALEXANDRE B.; DALALYAN, ARNAK S.
in Aggregation, aggregation of estimators, Approximation
2012
We consider the problem of aggregating the elements of a possibly infinite dictionary for building a decision procedure that aims at minimizing a given criterion. Along with the dictionary, an independent identically distributed training sample is available, on which the performance of a given procedure can be tested. In a fairly general set-up, we establish an oracle inequality for the Mirror Averaging aggregate with any prior distribution. By choosing an appropriate prior, we apply this oracle inequality in the context of prediction under sparsity assumption for the problems of regression with random design, density estimation and binary classification.
Journal Article
On the prediction performance of the Lasso
by DALALYAN, ARNAK S.; LEDERER, JOHANNES; HEBIRI, MOHAMED
in Economics and Finance, Humanities and Social Sciences
2017
Although the Lasso has been extensively studied, the relationship between its prediction performance and the correlations of the covariates is not fully understood. In this paper, we give new insights into this relationship in the context of multiple linear regression. We show, in particular, that the incorporation of a simple correlation measure into the tuning parameter can lead to a nearly optimal prediction performance of the Lasso even for highly correlated covariates. However, we also reveal that for moderately correlated covariates, the prediction performance of the Lasso can be mediocre irrespective of the choice of the tuning parameter. We finally show that our results also lead to near-optimal rates for the least-squares estimator with total variation penalty.
Journal Article
BAYESIAN FRACTIONAL POSTERIORS
by Yang, Yun; Pati, Debdeep; Bhattacharya, Anirban
in Bayes Theorem, Bayesian analysis, Divergence
2019
We consider the fractional posterior distribution that is obtained by updating a prior distribution via Bayes theorem with a fractional likelihood function, a usual likelihood function raised to a fractional power. First, we analyze the contraction property of the fractional posterior in a general misspecified framework. Our contraction results only require a prior mass condition on certain Kullback–Leibler (KL) neighborhood of the true parameter (or the KL divergence minimizer in the misspecified case), and obviate constructions of test functions and sieves commonly used in the literature for analyzing the contraction property of a regular posterior. We show through a counterexample that some condition controlling the complexity of the parameter space is necessary for the regular posterior to contract, rendering additional flexibility on the choice of the prior for the fractional posterior. Second, we derive a novel Bayesian oracle inequality based on a PAC-Bayes inequality in misspecified models. Our derivation reveals several advantages of averaging based Bayesian procedures over optimization based frequentist procedures. As an application of the Bayesian oracle inequality, we derive a sharp oracle inequality in multivariate convex regression problems. We also illustrate the theory in Gaussian process regression and density estimation problems.
Journal Article
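The fractional posterior described in the abstract above (a prior updated with the likelihood raised to a fractional power) has a closed form in conjugate models. The sketch below uses a Beta prior with Bernoulli observations, which is an illustrative choice and not an example taken from the paper; the function names and the symbol `alpha` for the fractional power are assumptions.

```python
def fractional_beta_posterior(a, b, successes, failures, alpha):
    """Fractional posterior for a Bernoulli parameter under a Beta(a, b) prior.

    The likelihood theta^s * (1 - theta)^f raised to the power alpha is
    theta^(alpha*s) * (1 - theta)^(alpha*f), so the update stays in the
    Beta family with the counts scaled by alpha.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return a + alpha * successes, b + alpha * failures

def beta_mean(a, b):
    return a / (a + b)

def beta_variance(a, b):
    return a * b / ((a + b) ** 2 * (a + b + 1))

# Uniform prior Beta(1, 1), 7 successes and 3 failures.
regular = fractional_beta_posterior(1, 1, 7, 3, alpha=1.0)     # usual posterior
fractional = fractional_beta_posterior(1, 1, 7, 3, alpha=0.5)  # half-power likelihood
```

Setting `alpha=1.0` recovers the usual posterior. In this toy model a smaller `alpha` discounts the data, so the fractional posterior is more spread out than the regular one.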
BANDWIDTH SELECTION IN KERNEL DENSITY ESTIMATION: ORACLE INEQUALITIES AND ADAPTIVE MINIMAX OPTIMALITY
2011
We address the problem of density estimation with $\mathbb{L}_s$-loss by selection of kernel estimators. We develop a selection procedure and derive corresponding $\mathbb{L}_s$-risk oracle inequalities. It is shown that the proposed selection rule leads to the estimator being minimax adaptive over a scale of the anisotropic Nikol'skii classes. The main technical tools used in our derivations are uniform bounds on the $\mathbb{L}_s$-norms of empirical processes developed recently by Goldenshluger and Lepski [Ann. Probab. (2011), to appear].
Journal Article
Simpler PAC-Bayesian bounds for hostile data
2018
PAC-Bayesian learning bounds are of the utmost interest to the learning community. Their role is to connect the generalization ability of an aggregation distribution ρ to its empirical risk and to its Kullback-Leibler divergence with respect to some prior distribution π. Unfortunately, most of the available bounds typically rely on heavy assumptions such as boundedness and independence of the observations. This paper aims at relaxing these constraints and provides PAC-Bayesian learning bounds that hold for dependent, heavy-tailed observations (hereafter referred to as hostile data). In these bounds the Kullback-Leibler divergence is replaced with a general version of Csiszár’s f-divergence. We prove a general PAC-Bayesian bound, and show how to use it in various hostile settings.
Journal Article
The Dantzig Selector: Statistical Estimation When p Is Much Larger than n
2007
In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Xβ + z, where $\beta \in \mathbf{R}^{p}$ is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n ≪ p, and the $z_{i}$'s are i.i.d. N(0, σ²). Is it possible to estimate β reliably based on the noisy data y? To estimate β, we introduce a new estimator (we call it the Dantzig selector), which is a solution to the ℓ₁-regularization problem $\min_{\tilde{\beta} \in \mathbf{R}^{p}} \|\tilde{\beta}\|_{\ell_{1}}$ subject to $\|X^{\ast} r\|_{\ell_{\infty}} \leq (1 + t^{-1}) \sqrt{2 \log p} \cdot \sigma$, where r is the residual vector $y - X\tilde{\beta}$ and t is a positive scalar. We show that if X obeys a uniform uncertainty principle (with unit-normed columns) and if the true parameter vector β is sufficiently sparse (which here roughly guarantees that the model is identifiable), then with very large probability, $\|\hat{\beta} - \beta\|_{\ell_{2}}^{2} \leq C^{2} \cdot 2 \log p \cdot \left(\sigma^{2} + \sum_{i} \min(\beta_{i}^{2}, \sigma^{2})\right)$. Our results are nonasymptotic and we give values for the constant C. Even though n may be much smaller than p, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information about which coordinates are nonzero, and which were above the noise level. In multivariate regression and from a model selection viewpoint, our result says that it is possible nearly to select the best subset of variables by solving a very simple convex program, which, in fact, can easily be recast as a convenient linear program (LP).
Journal Article
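The closing remark of the abstract above, that the Dantzig selector can be recast as a linear program, can be written out as follows. This is a standard reformulation sketched under assumed notation: the auxiliary vector $u$ and the shorthand $\lambda$ do not appear in the abstract and are introduced here for illustration.

```latex
% Assumed shorthand: \lambda = (1 + t^{-1}) \sqrt{2 \log p} \cdot \sigma
% u is an auxiliary vector bounding |\tilde{\beta}| coordinate-wise.
\begin{aligned}
\min_{\tilde{\beta},\, u \in \mathbf{R}^{p}} \quad & \sum_{i=1}^{p} u_{i} \\
\text{subject to} \quad & -u \leq \tilde{\beta} \leq u, \\
& -\lambda \mathbf{1} \leq X^{\ast}(y - X\tilde{\beta}) \leq \lambda \mathbf{1}.
\end{aligned}
```

At an optimum $u_{i} = |\tilde{\beta}_{i}|$, so the objective equals $\|\tilde{\beta}\|_{\ell_{1}}$, and the second pair of inequalities is the constraint $\|X^{\ast} r\|_{\ell_{\infty}} \leq \lambda$ written coordinate by coordinate. Both the objective and the constraints are linear in $(\tilde{\beta}, u)$.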
ELASTIC-NET REGULARIZED HIGH-DIMENSIONAL NEGATIVE BINOMIAL REGRESSION
2022
We study a sparse negative binomial regression (NBR) for count data by showing the non-asymptotic advantages of using the elastic-net estimator. Two types of oracle inequalities are derived for the NBR’s elastic-net estimates by using the Compatibility Factor Condition and the Stabil Condition. The second type of oracle inequality is for the random design and can be extended to many ℓ1 + ℓ2 regularized M-estimations, with the corresponding empirical process having stochastic Lipschitz properties. We derive the concentration inequality for the suprema of empirical processes for the weighted sum of negative binomial variables to show some high-probability events. We apply the method by showing the sign consistency, provided that the nonzero components in the true sparse vector are larger than a proper choice of the weakest signal detection threshold. In the second application, we show the grouping effect inequality with high probability. Third, under some assumptions for a design matrix, we can recover the true variable set with a high probability if the weakest signal detection threshold is larger than the tuning parameter up to a known constant. Lastly, we briefly discuss the de-biased elastic-net estimator, and numerical studies are given to support the proposal.
Journal Article
ORACLE INEQUALITIES AND OPTIMAL INFERENCE UNDER GROUP SPARSITY
2011
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error for mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is provided by the problem of estimating multiple regression equations simultaneously or multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
Journal Article