35 results for "Audibert, Jean-Yves"
Regret in Online Combinatorial Optimization
We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the minimal loss she would have achieved by picking, in hindsight, the best possible action. Our goal is to understand the magnitude of the best possible (minimax) regret. We study the problem under three different assumptions for the feedback the decision maker receives: full information, and the partial information models of the so-called "semi-bandit" and "bandit" problems. In the full information case we show that the standard exponentially weighted average forecaster is a provably suboptimal strategy. For the semi-bandit model, by combining the Mirror Descent algorithm and the INF (Implicitly Normalized Forecaster) strategy, we are able to prove the first optimal bounds. Finally, in the bandit case we discuss existing results in light of a new lower bound, and suggest a conjecture on the optimal regret in that case.
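For orientation, here is a minimal sketch of the exponentially weighted average forecaster in the full-information setting, run over an illustrative binary action set (the one-hot vectors in dimension d); the action set, uniform losses, and learning rate eta are assumptions for illustration, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n = 5, 1000                                # dimension and number of rounds
actions = np.eye(d)                           # assumed binary action set: one-hot vectors
eta = np.sqrt(8 * np.log(len(actions)) / n)   # standard tuning for losses in [0, 1]

cum_loss = np.zeros(len(actions))             # cumulative loss of each fixed action
realized = 0.0
for t in range(n):
    # Gibbs weights over actions (shifted by the min for numerical stability).
    w = np.exp(-eta * (cum_loss - cum_loss.min()))
    p = w / w.sum()
    k = rng.choice(len(actions), p=p)         # draw an action from the weights

    loss_vec = rng.uniform(0, 1, size=d)      # environment's loss vector this round
    realized += actions[k] @ loss_vec         # loss actually suffered
    cum_loss += actions @ loss_vec            # full information: every action's loss is observed

regret = realized - cum_loss.min()            # realized loss minus best fixed action in hindsight
print(f"regret after {n} rounds: {regret:.2f}")
```

The regret printed here is measured against the best fixed action in hindsight, matching the definition in the abstract; the paper's point is that this standard forecaster is suboptimal for general combinatorial action sets.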
Robust Linear Least Squares Regression
We consider the problem of robustly predicting as well as the best linear combination of d given functions in least squares regression, and variants of this problem including constraints on the parameters of the linear combination. For the ridge estimator and the ordinary least squares estimator, and their variants, we provide new risk bounds of order d/n without logarithmic factor unlike some standard results, where n is the size of the training data. We also provide a new estimator with better deviations in the presence of heavy-tailed noise. It is based on truncating differences of losses in a min-max framework and satisfies a d/n risk bound both in expectation and in deviations. The key common surprising factor of these results is the absence of exponential moment condition on the output distribution while achieving exponential deviations. All risk bounds are obtained through a PAC-Bayesian analysis on truncated differences of losses. Experimental results strongly back up our truncated min-max estimator.
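The following minimal sketch shows the two baseline estimators the abstract compares, ordinary least squares and ridge, under an assumed heavy-tailed data model; the paper's truncated min-max estimator is not reproduced here, and the design, noise, and regularization level lam are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 10
X = rng.normal(size=(n, d))                   # values of the d given functions at the data points
theta = rng.normal(size=d)                    # unknown best linear combination
y = X @ theta + rng.standard_t(df=3, size=n)  # heavy-tailed noise, as in the abstract's setting

# Ordinary least squares: argmin_b ||y - X b||^2.
ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge estimator: argmin_b ||y - X b||^2 + lam ||b||^2.
lam = 1.0
ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Empirical proxy for the excess risk of each estimator.
print("OLS:  ", np.mean((X @ (ols - theta)) ** 2))
print("ridge:", np.mean((X @ (ridge - theta)) ** 2))
```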
Fast Learning Rates in Statistical Inference through Aggregation
We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set $\mathcal{G}$, up to the smallest possible additive term, called the convergence rate. When the reference set is finite and when $n$ denotes the size of the training data, we provide minimax convergence rates of the form $C(\frac{\log|\mathcal{G}|}{n})^{v}$ with a tight evaluation of the positive constant $C$ and with exact $0 < v \leq 1$, the latter value depending on the convexity of the loss function and on the level of noise in the output distribution. The risk upper bounds are based on a sequential randomized algorithm, which at each step concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. Our analysis puts forward the links between the probabilistic and worst-case viewpoints, and allows us to obtain risk bounds unachievable with the standard statistical learning approach. One of the key ideas of this work is to use probabilistic inequalities with respect to appropriate (Gibbs) distributions on the prediction function space, instead of using them with respect to the distribution generating the data. The risk lower bounds are based on refinements of the Assouad lemma that take into account, in particular, the properties of the loss function. Our key example to illustrate the upper and lower bounds is the $L_q$-regression setting, for which an exhaustive analysis of the convergence rates is given as $q$ ranges over $[1, +\infty)$.
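As a rough illustration of the Gibbs-weighting idea described above, here is a sketch of sequential aggregation over a small finite reference set under squared loss; the reference functions, inverse temperature beta, and data stream are assumptions, and this generic exponential-weights rule only approximates, rather than reproduces, the paper's specific sequential algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.1 * rng.normal(size=n)

# Finite reference set G of candidate prediction functions (assumed for illustration).
G = [lambda t: 0.0 * t, lambda t: t, lambda t: np.sin(3 * t), lambda t: t ** 2]
beta = 4.0                                   # inverse temperature of the Gibbs weights

cum = np.zeros(len(G))                       # cumulative squared loss of each g in G
preds = np.empty(n)
for i in range(n):
    w = np.exp(-beta * (cum - cum.min()))    # Gibbs distribution over the reference set
    w /= w.sum()
    preds[i] = sum(wk * g(x[i]) for wk, g in zip(w, G))   # weighted-average prediction
    cum += np.array([(g(x[i]) - y[i]) ** 2 for g in G])   # then observe the losses

print("aggregate MSE:    ", np.mean((preds - y) ** 2))
print("best single g MSE:", cum.min() / n)
```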
Fast Learning Rates for Plug-In Classifiers
It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are incorrect. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
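A plug-in classifier in the sense used here estimates the regression function η(x) = P(Y = 1 | X = x) and thresholds the estimate at 1/2; the sketch below does this with an assumed Nadaraya-Watson kernel estimator and bandwidth h, chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.uniform(0, 1, size=n)
eta = 0.5 + 0.4 * np.sin(2 * np.pi * X)          # true regression function P(Y=1 | X)
Y = (rng.uniform(size=n) < eta).astype(float)

def eta_hat(x, h=0.1):
    """Nadaraya-Watson estimate of P(Y = 1 | X = x), Gaussian kernel, bandwidth h."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return (w @ Y) / w.sum()

def plug_in(x):
    """Plug-in classifier: threshold the estimated regression function at 1/2."""
    return 1 if eta_hat(x) >= 0.5 else 0

x0 = 0.3
print(f"eta_hat({x0}) = {eta_hat(x0):.3f} -> predict {plug_in(x0)}")
```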
Supplement to "Robust linear least squares regression"
This supplementary material provides the proofs of Theorems 2.1, 2.2 and 3.1 of the article "Robust linear least squares regression".
Bandit View on Noisy Optimization
In this chapter, we investigate the problem of function optimization with a finite number of noisy evaluations. While at first one may think that simple repeated sampling can overcome the difficulty introduced by noisy evaluations, it is far from being an optimal strategy. Indeed, to make the best use of the evaluations, one may want to estimate the seemingly best options more precisely, while for bad options a rough estimate might be enough. This reasoning leads to non-trivial algorithms, which depend on the objective criterion that we set and on how we define the budget constraint on the number of evaluations.
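The allocation idea in this abstract, spending the evaluation budget on the seemingly best options, can be illustrated with a successive-halving style rule; the option means, Gaussian noise, and budget split below are assumptions for illustration, not the chapter's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
means = np.array([0.1, 0.3, 0.5, 0.55, 0.9])     # unknown values of the options
budget = 1000                                    # total number of noisy evaluations allowed

def evaluate(i):
    """One noisy evaluation of option i."""
    return means[i] + rng.normal(scale=1.0)

alive = list(range(len(means)))
rounds = int(np.ceil(np.log2(len(means))))
per_round = budget // rounds
while len(alive) > 1:
    pulls = per_round // len(alive)              # evaluations per surviving option
    est = [np.mean([evaluate(i) for _ in range(pulls)]) for i in alive]
    order = np.argsort(est)[::-1]                # best-looking options first
    alive = [alive[k] for k in order[: max(1, len(alive) // 2)]]  # drop the worse half

print("selected option:", alive[0], "| true best:", int(np.argmax(means)))
```

Each halving round concentrates the remaining budget on fewer options, so the surviving candidates are estimated ever more precisely, exactly the intuition the abstract describes.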
Optimization for Machine Learning
The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields.

Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.
Robustness of stochastic bandit policies
This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy such that, with probability at least 1-1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular UCB1 policy of Auer et al. [3]. This work first answers an open question: it extends this negative result to any anytime policy (i.e., any policy that does not take the number of plays n into account). Another contribution of this paper is to design robust anytime policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms. We also show that, for any policy (i.e., even when the number of plays n is known), the regret is of order log n with probability at least 1-1/n, so that the policy of Audibert et al. has the best possible deviation properties.
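For context, here is a minimal sketch of the UCB1 policy of Auer et al. that the paper's deviation analysis concerns: pull the arm maximizing the empirical mean plus a sqrt(2 log t / pulls) confidence width. The Bernoulli arms and horizon are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
means = np.array([0.4, 0.5, 0.7])      # unknown Bernoulli arm means
n = 10_000                             # total number of plays

counts = np.zeros(len(means))          # times each arm was pulled
sums = np.zeros(len(means))            # total reward collected per arm
for t in range(1, n + 1):
    if t <= len(means):
        a = t - 1                      # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))        # optimistic arm choice
    sums[a] += float(rng.uniform() < means[a])
    counts[a] += 1

pseudo_regret = ((means.max() - means) * counts).sum()
print(f"pseudo-regret after {n} plays: {pseudo_regret:.1f}")
```

UCB1's regret is of order log n in expectation, but, as the abstract notes, its regret distribution has heavier deviations than the anytime-robust policies designed in the paper.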