1,999 result(s) for "stochastic approximation algorithms"
CENTRAL LIMIT THEOREMS OF A RECURSIVE STOCHASTIC ALGORITHM WITH APPLICATIONS TO ADAPTIVE DESIGNS
Stochastic approximation algorithms have been the subject of an enormous body of literature, both theoretical and applied. Recently, Laruelle and Pagès [Ann. Appl. Probab. 23 (2013) 1409-1436] presented a link between stochastic approximation and response-adaptive designs in clinical trials based on the randomized urn models investigated in Bai and Hu [Stochastic Process. Appl. 80 (1999) 87-101; Ann. Appl. Probab. 15 (2005) 914-940], and derived the asymptotic normality, or central limit theorem, of the normalized procedure using a central limit theorem for the stochastic approximation algorithm. However, the classical central limit theorem for the stochastic approximation algorithm does not cover all cases of its regression function, creating a gap between the results of Laruelle and Pagès [Ann. Appl. Probab. 23 (2013) 1409-1436] and those of Bai and Hu [Ann. Appl. Probab. 15 (2005) 914-940] for randomized urn models. In this paper, we establish new central limit theorems for the stochastic approximation algorithm under the popular Lindeberg condition to fill this gap. Moreover, we prove that the process generated by the algorithm can be approximated by a Gaussian process that solves a stochastic differential equation. As an application, we investigate a more involved family of urn models and related adaptive designs in which balls may be removed from the urn and the expected total number of balls updated at each stage need not be constant. The asymptotic properties are derived under much less stringent assumptions than those in Bai and Hu [Stochastic Process. Appl. 80 (1999) 87-101; Ann. Appl. Probab. 15 (2005) 914-940] and Laruelle and Pagès [Ann. Appl. Probab. 23 (2013) 1409-1436].
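The recursion underlying this abstract is the classical Robbins-Monro scheme. A minimal sketch (the example function and parameters are illustrative, not taken from the paper):

```python
import random

def robbins_monro(h, theta0, n_steps=20000, noise_std=0.1, seed=0):
    """Minimal Robbins-Monro recursion theta_{n+1} = theta_n + gamma_n * Y_n,
    where Y_n = h(theta_n) + noise is a noisy evaluation of the regression
    function h and gamma_n = 1/n are the classical step sizes. Under the
    usual conditions the iterates converge to a root of h; the central limit
    theorems discussed above describe the fluctuations around that root."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, n_steps + 1):
        y = h(theta) + rng.gauss(0.0, noise_std)
        theta += y / n
    return theta

# Hypothetical example: find the root of h(x) = 2 - x from noisy evaluations.
root = robbins_monro(lambda x: 2.0 - x, theta0=0.0)
```

With the linear `h` above, the iterate is exactly a running average of the noisy targets, which is why the 1/n step sizes are the natural choice here.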
On Information Distortions in Online Ratings
Consumer reviews and ratings of products and services have become ubiquitous on the Internet. Given the sequential nature of reviews and the limited feedback they provide, this paper analyzes the information content that past reviews communicate to future customers. We consider a model with heterogeneous customers who buy a product of unknown quality, and we focus on two informational settings. In the first, customers observe the whole history of past reviews; in the second, they observe only the sample mean of past reviews. We examine under which conditions, in each setting, customers can recover the true quality of the product from the feedback they observe. In the case of total monitoring, if consumers adopt a fully rational Bayesian updating paradigm, they asymptotically learn the unknown quality. With access to only the sample mean of past reviews, inference becomes intricate, and it is not clear if, when, and how social learning can take place. We first analyze the setting in which customers interpret the mean as a proxy for quality. We show that in the long run the sample mean of reviews stabilizes and that, in general, customers overestimate the underlying quality of the product. We establish properties of this bias, which stems from the selection effect of observing only the reviews of customers who purchase. We then show the existence of a simple non-Bayesian quality-inference rule that leads to social learning when all customers use it. The results point to the strong information content of even limited statistics of past reviews, as long as customers have minimal sophistication.
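The selection bias described here can be seen in a toy simulation. This is not the paper's model; the taste distribution, purchase rule, and parameters below are all invented for illustration:

```python
import random

def observed_mean(quality=1.0, price=1.0, n_customers=20000, seed=0):
    """Toy illustration of review selection bias: customer i has private
    taste t_i ~ N(0, 1), sees only the current sample mean of past reviews,
    and buys when mean + t_i > price. Buyers then report quality + t_i, so
    the displayed mean aggregates only the (optimistic) purchasers and
    settles above the true quality."""
    rng = random.Random(seed)
    total, count, mean = 0.0, 0, quality   # start from an unbiased prior
    for _ in range(n_customers):
        taste = rng.gauss(0.0, 1.0)
        if mean + taste > price:           # purchase decision uses the mean
            total += quality + taste       # the review inherits the buyer's taste
            count += 1
            mean = total / count
    return mean

biased_mean = observed_mean()  # settles well above the true quality of 1.0
```

In this sketch the mean stabilizes near a fixed point above the true quality, mirroring the abstract's claim that the sample mean stabilizes but overestimates quality.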
RECURSIVE NONPARAMETRIC REGRESSION ESTIMATION FOR INDEPENDENT FUNCTIONAL DATA
We propose an automatic selection of the bandwidth for the recursive nonparametric estimation of the regression function defined by a stochastic approximation algorithm. Here the explanatory data are curves and the response is real-valued. We compare our recursive estimators with the nonrecursive estimator proposed by Ferraty and Vieu (2002). Both methods rely on a wild bootstrap approach, in which resampling is done from a suitably estimated residual distribution. Moreover, we establish a central limit theorem for the proposed recursive estimators. The wild bootstrap is used to select the bandwidth and some special step sizes. As a result, the proposed recursive estimators are competitive in terms of estimation error but much better in terms of computational cost. The proposed estimators are applied to simulated and real functional data sets.
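The computational advantage of a recursive estimator comes from O(1) per-observation updates. A sketch with a scalar covariate (the paper treats functional covariates, i.e. curves; a real-valued X and a fixed bandwidth schedule keep the example short, and neither matches the paper's data-driven bandwidth selection):

```python
import math
import random

def recursive_regression(stream, x):
    """Recursive kernel estimate of r(x) = E[Y | X = x]. The numerator and
    denominator are running averages updated one observation at a time, so
    each new sample costs O(1) instead of recomputing over the whole
    sample, which is the recursive scheme's computational advantage."""
    gauss_kernel = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    num = den = 0.0
    for n, (xi, yi, h) in enumerate(stream, start=1):
        w = gauss_kernel((x - xi) / h) / h       # kernel weight, bandwidth h
        num += (w * yi - num) / n                # running average of w * Y
        den += (w - den) / n                     # running average of w
    return num / den if den > 0.0 else 0.0

# Hypothetical data: Y = X**2 + noise, with bandwidths h_n = n**(-1/5).
rng = random.Random(1)
data = []
for n in range(1, 5001):
    xn = rng.uniform(-2.0, 2.0)
    data.append((xn, xn ** 2 + rng.gauss(0.0, 0.1), n ** -0.2))
estimate = recursive_regression(data, x=1.0)     # true value r(1.0) = 1.0
```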
Construction of Bayesian deformable models via a stochastic approximation algorithm: A convergence study
The problem of the definition and estimation of generative models based on deformable templates from raw data is of particular importance for modeling non-aligned data affected by various types of geometric variability. This is especially true in shape modeling in the computer vision community or in probabilistic atlas building in computational anatomy. A first coherent statistical framework modeling geometric variability as hidden variables was described in Allassonnière, Amit and Trouvé [J. R. Stat. Soc. Ser. B Stat. Methodol. 69 (2007) 3-29]. The present paper gives a theoretical proof of convergence of effective stochastic approximation expectation strategies to estimate such models and shows the robustness of this approach against noise through numerical experiments in the context of handwritten digit modeling.
An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using a linear function approximation architecture, with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ a multi-timescale stochastic approximation variant of the popular cross-entropy optimization method, a model-based search method for finding the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regard to computational efficiency, accuracy, and stability.
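The prediction problem this abstract addresses is classically solved by TD(0) with linear function approximation, itself a stochastic approximation recursion. A minimal sketch on a toy two-state Markov reward process (the process, features, and step-size schedule are invented for illustration; the paper's algorithms layer a cross-entropy search on top of this kind of recursion):

```python
import random

def td0_linear(n_steps=50000, seed=0):
    """TD(0) with linear function approximation, V(s) ~ w . phi(s), on a
    2-state Markov reward process. Each step performs the stochastic
    approximation update w <- w + alpha_n * delta_n * phi(s_n), where
    delta_n is the temporal-difference error."""
    rng = random.Random(seed)
    phi = {0: (1.0, 0.0), 1: (0.0, 1.0)}       # one-hot features
    P = {0: [0.9, 0.1], 1: [0.2, 0.8]}         # transition probabilities
    R = {0: 1.0, 1: 0.0}                       # deterministic rewards
    gamma, w, s = 0.9, [0.0, 0.0], 0
    for n in range(1, n_steps + 1):
        s2 = rng.choices([0, 1], weights=P[s])[0]
        v = sum(wi * pi for wi, pi in zip(w, phi[s]))
        v2 = sum(wi * pi for wi, pi in zip(w, phi[s2]))
        delta = R[s] + gamma * v2 - v          # temporal-difference error
        alpha = 0.5 / (1.0 + n / 1000.0)       # diminishing step size
        w = [wi + alpha * delta * pi for wi, pi in zip(w, phi[s])]
        s = s2
    return w

weights = td0_linear()   # with one-hot features, w approaches V = (I - gamma*P)^{-1} R
```

With one-hot features the fixed point is the exact value function, here approximately V = (7.57, 4.86).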
STRONGLY REINFORCED PÓLYA URNS WITH GRAPH-BASED COMPETITION
We introduce a class of reinforcement models where, at each time step t, one first chooses a random subset At of colours (independently of the past) from n colours of balls, and then chooses a colour i from this subset with probability proportional to the number of balls of colour i in the urn raised to the power α > 1. We consider stability of equilibria for such models and establish the existence of phase transitions in a number of examples, including when the colours are the edges of a graph; a context which is a toy model for the formation and reinforcement of neural connections. We conjecture that for any graph G and all α sufficiently large, the set of stable equilibria is supported on so-called whisker-forests, which are forests whose components have diameter between 1 and 3.
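The monopolisation effect of strong reinforcement is easy to simulate. A sketch with every colour competing at every step, i.e. A_t is always the full colour set; the paper's graph-based random subsets A_t are the refinement on top of this dynamic:

```python
import random

def reinforced_urn(alpha, n_colours=2, steps=20000, seed=0):
    """Strongly reinforced Polya urn sketch: colour i is drawn with
    probability proportional to (number of balls of colour i)**alpha, and
    one ball of the drawn colour is added. For alpha > 1 the reinforcement
    is strong enough that a single colour eventually receives almost every
    draw."""
    rng = random.Random(seed)
    counts = [1] * n_colours
    for _ in range(steps):
        weights = [c ** alpha for c in counts]
        i = rng.choices(range(n_colours), weights=weights)[0]
        counts[i] += 1
    return counts

final = reinforced_urn(alpha=2.0)
dominant_share = max(final) / sum(final)   # close to 1: one colour monopolises
```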
Adaptive Regulation for Hammerstein and Wiener Systems with Event-Triggered Observations
This paper considers adaptive regulation for Hammerstein and Wiener systems with event-triggered observations. The authors adopt a direct approach: without identifying the unknown parameters and functions within the systems, adaptive regulators are designed directly from the event-triggered observations of the regulation errors. The adaptive regulators belong to the class of stochastic approximation algorithms and, under moderate assumptions, the authors prove that they are optimal for both Hammerstein and Wiener systems in the sense that the squared regulation errors are asymptotically minimized. The theoretical results are also verified through simulation studies.
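A rough sketch of the direct, identification-free idea on a hypothetical toy plant (the plant, trigger rule, and step sizes below are invented; the paper's systems, event mechanism, and optimality analysis are more involved):

```python
import math
import random

def event_triggered_regulator(plant, target, steps=20000, threshold=0.5, seed=0):
    """Direct adaptive regulation sketch: the plant is an unknown increasing
    nonlinearity observed with noise and is never identified. The input u
    is corrected by a Robbins-Monro step on the regulation error, but only
    at event times, i.e. when the observed error exceeds the trigger
    threshold; otherwise the last input is held."""
    rng = random.Random(seed)
    u, events = 0.0, 0
    for _ in range(steps):
        y = plant(u, rng)
        error = y - target
        if abs(error) > threshold:       # event: the observation is used
            events += 1
            u -= error / events          # stochastic-approximation correction
    return u

# Unknown monotone plant f(u) = 2u + 0.5*sin(u) plus observation noise;
# the input solving f(u) = 3 is approximately u = 1.26.
plant = lambda u, rng: 2.0 * u + 0.5 * math.sin(u) + rng.gauss(0.0, 0.2)
u_final = event_triggered_regulator(plant, target=3.0)
```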
Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization
We propose a multi-time-scale quasi-Newton based smoothed functional (QN-SF) algorithm for stochastic optimization, both with and without inequality constraints. The algorithm combines the smoothed functional (SF) scheme for estimating the gradient with the quasi-Newton method to solve the optimization problem. Newton algorithms typically update the Hessian at each instant and subsequently (a) project it onto the space of positive definite and symmetric matrices, and (b) invert the projected Hessian. The latter operation is computationally expensive. To save computational effort, our QN-SF algorithm instead uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update rule. In Bhatnagar (ACM Trans. Model. Comput. Simul. 18(1): 27–62, 2007), a Jacobi variant of Newton SF (JN-SF) was proposed and implemented to save computational effort. We compare our QN-SF algorithm with the gradient SF (G-SF) and JN-SF algorithms on two different problems: a simple stochastic function minimization problem and optimal routing in a queueing network. We observe from the experiments that QN-SF performs significantly better than both G-SF and JN-SF on both problem settings. Next, we extend the QN-SF algorithm to the case of constrained optimization; here too, QN-SF performs much better than JN-SF. Finally, we present the proof of convergence for the QN-SF algorithm in both the unconstrained and constrained settings.
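The SF gradient estimator at the core of these methods needs only function evaluations. A sketch of the gradient-only baseline (G-SF); the objective and step-size schedule are invented for illustration, and the paper's QN-SF additionally maintains a BFGS Hessian approximation on top of the same estimator:

```python
import random

def sf_minimise(f, x0, steps=5000, beta=0.1, seed=0):
    """One-sided smoothed functional (SF) gradient scheme: the gradient is
    estimated from function evaluations alone by perturbing x with a
    standard Gaussian vector eta and using
        grad_hat = (eta / beta) * (f(x + beta*eta) - f(x)),
    then descending along grad_hat with diminishing steps."""
    rng = random.Random(seed)
    x = list(x0)
    for n in range(1, steps + 1):
        eta = [rng.gauss(0.0, 1.0) for _ in x]
        x_pert = [xi + beta * e for xi, e in zip(x, eta)]
        diff = f(x_pert) - f(x)
        a = 1.0 / (n + 50.0)                   # tempered diminishing steps
        x = [xi - a * (e / beta) * diff for xi, e in zip(x, eta)]
    return x

# Hypothetical noisy objective centred at (1, -2).
noise_rng = random.Random(42)
objective = lambda z: (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2 + noise_rng.gauss(0.0, 0.01)
minimiser = sf_minimise(objective, [0.0, 0.0])
```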
An Introduction to Development of Centralized and Distributed Stochastic Approximation Algorithm with Expanding Truncations
The stochastic approximation algorithm (SAA), originating with the pioneering work of Robbins and Monro in the 1950s, has been successfully applied in systems and control, statistics, machine learning, and so forth. In this paper, we review the development of SAA in China, specifically the stochastic approximation algorithm with expanding truncations (SAAWET) developed by Han-Fu Chen and his colleagues over the past 35 years. We first review the historical development of the centralized algorithm, including the probabilistic method (PM) and the ordinary differential equation (ODE) method for SAA and the trajectory-subsequence method for SAAWET. We then give an application of SAAWET to recursive principal component analysis. We also introduce recent progress on SAAWET in a networked and distributed setting, namely the distributed SAAWET (DSAAWET).
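The expanding-truncation mechanism can be sketched in a few lines (a scalar toy version with an invented truncation schedule M_k = 2**k, not the general algorithm or its conditions):

```python
import random

def saawet(h_noisy, x_reset=0.0, steps=20000, seed=0):
    """Sketch of stochastic approximation with expanding truncations
    (SAAWET): a Robbins-Monro recursion in which an iterate that would
    leave the current truncation region [-M_k, M_k] is reset to a fixed
    point x_reset while the region is expanded (k <- k + 1). Boundedness of
    the iterates is thus enforced without knowing a bound on the sought
    root in advance."""
    rng = random.Random(seed)
    x, k = x_reset, 0
    for n in range(1, steps + 1):
        candidate = x + (1.0 / n) * h_noisy(x, rng)
        if abs(candidate) > 2.0 ** k:    # truncation violated:
            x, k = x_reset, k + 1        # restart and expand the region
        else:
            x = candidate
    return x

# Noisy observations of h(x) = 10 - x; the root 10 lies far outside the
# initial truncation region [-1, 1], which the algorithm expands as needed.
root = saawet(lambda x, rng: (10.0 - x) + rng.gauss(0.0, 1.0))
```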
Distributed dynamic stochastic approximation algorithm over time-varying networks
In this paper, a distributed stochastic approximation algorithm is proposed to track the dynamic root of a sum of time-varying regression functions over a network. Each agent updates its estimate using its local observation, the dynamic information of the global root, and information received from its neighbors. Compared with similar work in the optimization literature, we allow the observations to be noise-corrupted, and the noise condition is much weaker. Furthermore, instead of an upper bound on the estimation error, we present an asymptotic convergence result for the algorithm, establishing both consensus and convergence of the estimates. Finally, the algorithm is applied to a distributed target tracking problem, and a numerical example is presented to demonstrate its performance.
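A consensus-plus-local-step sketch of the distributed idea, in a simplified static-root setting (the paper tracks a time-varying root; the network, weights, and local functions below are invented for illustration):

```python
import random

def distributed_sa(local_h, W, steps=20000, seed=0):
    """Distributed stochastic approximation sketch: each agent first
    averages its neighbours' estimates with consensus weights W, then takes
    a local Robbins-Monro step using only a noisy evaluation of its own
    regression function h_i. Jointly, the network seeks the root of
    sum_i h_i."""
    rng = random.Random(seed)
    m = len(local_h)
    x = [0.0] * m
    for n in range(1, steps + 1):
        mixed = [sum(W[i][j] * x[j] for j in range(m)) for i in range(m)]
        x = [mixed[i] + (1.0 / n) * (local_h[i](mixed[i]) + rng.gauss(0.0, 0.1))
             for i in range(m)]
    return x

# Three agents on a path graph with Metropolis (doubly stochastic) weights;
# h_i(x) = b_i - x with b = (1, 2, 6), so the root of the sum is their mean, 3.
W = [[2/3, 1/3, 0.0],
     [1/3, 1/3, 1/3],
     [0.0, 1/3, 2/3]]
h = [lambda x: 1.0 - x, lambda x: 2.0 - x, lambda x: 6.0 - x]
estimates = distributed_sa(h, W)   # every agent approaches 3.0
```

No single agent knows the global function, yet the consensus step drives all estimates to the root of the sum.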