12,803 result(s) for "Prediction error"
Subsecond dopamine fluctuations in human striatum encode superposed error signals about actual and counterfactual reward
In the mammalian brain, dopamine is a critical neuromodulator whose actions underlie learning, decision-making, and behavioral control. Degeneration of dopamine neurons causes Parkinson’s disease, whereas dysregulation of dopamine signaling is believed to contribute to psychiatric conditions such as schizophrenia, addiction, and depression. Experiments in animal models suggest the hypothesis that dopamine release in human striatum encodes reward prediction errors (RPEs) (the difference between actual and expected outcomes) during ongoing decision-making. Blood oxygen level-dependent (BOLD) imaging experiments in humans support the idea that RPEs are tracked in the striatum; however, BOLD measurements cannot be used to infer the action of any one specific neurotransmitter. We monitored dopamine levels with subsecond temporal resolution in humans (n = 17) with Parkinson’s disease while they executed a sequential decision-making task. Participants placed bets and experienced monetary gains or losses. Dopamine fluctuations in the striatum did not encode RPEs alone, as a large body of work in model organisms had led us to anticipate. Instead, subsecond dopamine fluctuations encode an integration of RPEs with counterfactual prediction errors, the latter defined by how much better or worse the experienced outcome could have been. How dopamine fluctuations combine the actual and counterfactual is unknown. One possibility is that this process is the normal behavior of reward-processing dopamine neurons, which previously had not been tested by experiments in animal models. Alternatively, this superposition of error terms may result from an additional yet-to-be-identified subclass of dopamine neurons.
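The two error terms in this abstract can be made concrete with a toy calculation (our own illustration; the function names and dollar amounts are invented, not taken from the paper):

```python
# Toy illustration (invented names/values, not the paper's model): the RPE
# compares the outcome with the expectation, while the counterfactual PE
# compares it with what the forgone option would have paid.
def reward_prediction_error(actual, expected):
    return actual - expected

def counterfactual_prediction_error(actual, best_alternative):
    return actual - best_alternative

# A bet pays $20 when $10 was expected, but the unchosen option paid $50:
rpe = reward_prediction_error(20, 10)          # +10: better than expected
cpe = counterfactual_prediction_error(20, 50)  # -30: worse than the alternative
superposed = rpe + cpe                         # a simple additive superposition
```

Under this toy superposition, an outcome can be better than expected yet still register negatively once the counterfactual is taken into account.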
Fixed rank kriging for very large spatial data sets
Spatial statistics for very large spatial data sets is challenging. The size of the data set, n, causes problems in computing optimal spatial predictors such as kriging, since its computational cost is of order n^3. In addition, a large data set is often defined on a large spatial domain, so the spatial process of interest typically exhibits non-stationary behaviour over that domain. A flexible family of non-stationary covariance functions is defined by using a set of basis functions that is fixed in number, which leads to a spatial prediction method that we call fixed rank kriging. Specifically, fixed rank kriging is kriging within this class of non-stationary covariance functions. It relies on computational simplifications when n is very large, for obtaining the spatial best linear unbiased predictor and its mean-squared prediction error for a hidden spatial process. A method based on minimizing a weighted Frobenius norm yields best estimators of the covariance function parameters, which are then substituted into the fixed rank kriging equations. The new methodology is applied to a very large data set of total column ozone data, observed over the entire globe, where n is of the order of hundreds of thousands.
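The computational savings behind fixed-rank covariance structure can be sketched in miniature (our own toy, using a rank-1 term in place of a full basis-function expansion; this is the Sherman–Morrison special case of the matrix identities such methods exploit, not the paper's algorithm):

```python
# With a covariance Sigma = D + u u^T (D diagonal, rank-1 part standing in
# for a fixed set of basis functions), Sherman-Morrison solves Sigma^{-1} b
# in O(n) operations instead of the O(n^3) of a dense solve.
def solve_diag_plus_rank1(d, u, b):
    db = [bi / di for bi, di in zip(b, d)]                  # D^{-1} b
    du = [ui / di for ui, di in zip(u, d)]                  # D^{-1} u
    alpha = sum(ui * dbi for ui, dbi in zip(u, db))         # u^T D^{-1} b
    denom = 1.0 + sum(ui * dui for ui, dui in zip(u, du))   # 1 + u^T D^{-1} u
    return [dbi - dui * alpha / denom for dbi, dui in zip(db, du)]

# Tiny check against the dense system (D + u u^T) x = b:
d = [2.0, 4.0]
u = [1.0, 1.0]
b = [3.0, 5.0]
x = solve_diag_plus_rank1(d, u, b)
lhs = [d[i] * x[i] + u[i] * sum(u[j] * x[j] for j in range(2)) for i in range(2)]
```

With r basis functions instead of one, the same idea (Sherman–Morrison–Woodbury) costs O(n r^2), which is what makes a fixed, small rank attractive when n is in the hundreds of thousands.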
Neural dissociation between reward and salience prediction errors through the lens of optimistic bias
The question of how the brain represents reward prediction errors is central to reinforcement learning and adaptive, goal‐directed behavior. Previous studies have revealed prediction error representations in multiple electrophysiological signatures, but it remains elusive whether these electrophysiological correlates underlying prediction errors are sensitive to valence (in a signed form) or to salience (in an unsigned form). One possible reason concerns the loose correspondence between objective probability and subjective prediction resulting from the optimistic bias, that is, the tendency to overestimate the likelihood of encountering positive future events. In the present electroencephalography (EEG) study, we approached this question by directly measuring participants' idiosyncratic, trial‐to‐trial prediction errors elicited by subjective and objective probabilities across two experiments. We adopted monetary gain and loss feedback in Experiment 1 and positive and negative feedback as communicated by the same zero‐value feedback in Experiment 2. We provide electrophysiological evidence in the time and time‐frequency domains supporting both reward and salience prediction error signals. Moreover, we show that these electrophysiological signatures were highly flexible and sensitive to an optimistic bias and various forms of salience. Our findings shed new light on multiple representations of prediction error in the human brain, which differ in format and functional role. We measured subjective and objective prediction error signals across two EEG tasks. We found prediction error representations in multiple neural signatures. The variety of neural prediction errors is further modulated by optimistic bias.
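The signed-versus-unsigned distinction in this abstract reduces to a one-line difference, illustrated here with invented numbers (our construction, not the study's model):

```python
# A signed ("reward") prediction error keeps valence; an unsigned
# ("salience") prediction error keeps only the magnitude of the surprise.
def signed_pe(outcome, prediction):
    return outcome - prediction

def unsigned_pe(outcome, prediction):
    return abs(outcome - prediction)

# An optimistic bias inflates the subjective prediction above the objective one,
# so the same loss produces a larger negative surprise subjectively:
objective_prediction = 0.5    # true win probability
subjective_prediction = 0.8   # overestimated under optimism
loss = 0.0
pe_subjective = signed_pe(loss, subjective_prediction)  # more negative
pe_objective = signed_pe(loss, objective_prediction)
salience = unsigned_pe(loss, subjective_prediction)     # magnitude only
```

This is why measuring idiosyncratic subjective predictions matters: the signed PE to an identical objective event differs across optimistically biased participants, while the unsigned PE tracks only how surprising it was.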
Long-memory recursive prediction error method for identification of continuous-time fractional models
This paper deals with recursive continuous-time system identification using fractional-order models. A long-memory recursive prediction error method is proposed for recursive estimation of all parameters of fractional-order models. When the differentiation orders are assumed known, least squares and prediction error methods, being direct extensions to fractional-order models of the classic methods used for integer-order models, are compared to our new method, the long-memory recursive prediction error method. Given the long-memory property of fractional models, Monte Carlo simulations demonstrate the efficiency of the proposed algorithm. When the differentiation orders are unknown, two-stage algorithms are necessary for joint parameter and differentiation-order estimation. The performance of the proposed recursive algorithm is studied through Monte Carlo simulations. Finally, the algorithm is validated on a biological example where heat transfer in the lungs is modeled using thermal two-port network formalism with fractional models.
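The recursive prediction-error idea underlying this line of work can be shown in its simplest integer-order, scalar form (our own textbook-style simplification; the paper's contribution is extending such recursions to fractional, long-memory models):

```python
# Scalar recursive least squares: each new sample updates the parameter
# estimate in proportion to the current prediction error.
def rls_scalar(ys, phis, theta0=0.0, p0=100.0):
    theta, p = theta0, p0
    for y, phi in zip(ys, phis):
        err = y - phi * theta                  # prediction error
        gain = p * phi / (1.0 + phi * phi * p)
        theta += gain * err                    # error-driven update
        p -= gain * phi * p                    # shrink the "covariance"
    return theta

# Noise-free data generated by y = 2 * phi; the estimate converges toward 2.
theta_hat = rls_scalar([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
```

A fractional-order version cannot truncate the regressor history this aggressively, which is exactly the long-memory difficulty the abstract refers to.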
Credit assignment in movement-dependent reinforcement learning
When a person fails to obtain an expected reward from an object in the environment, they face a credit assignment problem: Did the absence of reward reflect an extrinsic property of the environment or an intrinsic error in motor execution? To explore this problem, we modified a popular decision-making task used in studies of reinforcement learning, the two-armed bandit task. We compared a version in which choices were indicated by key presses, the standard response in such tasks, to a version in which the choices were indicated by reaching movements, which affords execution failures. In the key press condition, participants exhibited a strong risk aversion bias; strikingly, this bias reversed in the reaching condition. This result can be explained by a reinforcement model wherein movement errors influence decision-making, either by gating reward prediction errors or by modifying an implicit representation of motor competence. Two further experiments support the gating hypothesis. First, we used a condition in which we provided visual cues indicative of movement errors but informed the participants that trial outcomes were independent of their actual movements. The main result was replicated, indicating that the gating process is independent of participants’ explicit sense of control. Second, individuals with cerebellar degeneration failed to modulate their behavior between the key press and reach conditions, providing converging evidence of an implicit influence of movement error signals on reinforcement learning. These results provide a mechanistically tractable solution to the credit assignment problem.
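The gating hypothesis described above can be sketched as a minimal update rule (our own toy model with an invented learning rate, not the authors' fitted model):

```python
# Gating hypothesis sketch: a detected motor execution error blocks the
# reward prediction error, so a missed reach assigns no credit (or blame)
# to the chosen option itself.
def update_value(value, reward, motor_error, alpha=0.2):
    if motor_error:
        return value                    # RPE gated out: no update
    return value + alpha * (reward - value)

v_gated = update_value(0.5, reward=0.0, motor_error=True)   # unchanged
v_open = update_value(0.5, reward=0.0, motor_error=False)   # moves toward 0
```

Under this rule, unrewarded trials attributed to execution failure leave the option's value intact, which is one way to produce the reversal of risk aversion seen in the reaching condition.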
Risk prediction error signaling: A two-component response?
Organisms use rewards to navigate and adapt to (uncertain) environments. Error-based learning about rewards is supported by the dopaminergic system, which is thought to signal reward prediction errors to make adjustments to past predictions. More recently, the phasic dopamine response was suggested to have two components: the first, rapid component is thought to signal the detection of a potentially rewarding stimulus; the second, slightly later component characterizes the stimulus by its reward prediction error. Error-based learning signals have also been found for risk. However, whether the neural generators of these signals employ a two-component coding scheme like the dopaminergic system is unknown. Here, using human high-density EEG, we ask whether risk learning, or more generally surprise-based learning under uncertainty, similarly comprises two temporally dissociable components. Using a simple card game, we show that the risk prediction error is reflected in the amplitude of the P3b component. This P3b modulation is preceded by an earlier component that is modulated by stimulus salience. Source analyses are compatible with the idea that both the early salience signal and the later risk prediction error signal are generated in insular, frontal, and temporal cortex. The identified sources are parts of the risk processing network that receives input from noradrenergic cells in the locus coeruleus. Finally, the P3b amplitude modulation is mirrored by an analogous modulation of pupil size, which is consistent with the idea that both the P3b and pupil size indirectly reflect locus coeruleus activity.
• Dopaminergic neurons first signal detection, then reward prediction error.
• P3b ERP component amplitude correlates with risk prediction error magnitude.
• Earlier components signal stimulus salience.
• Both components share common sources in the noradrenergic risk processing network.
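One common formalization of a risk prediction error treats risk as the expected squared reward prediction error; the risk PE is then the realized squared surprise minus that expectation. This is an assumption on our part for illustration (the card-game paper may define it differently):

```python
# Risk = anticipated variability (expected squared RPE); the risk prediction
# error is how much the realized squared surprise exceeds that anticipation.
def reward_pe(outcome, expected_value):
    return outcome - expected_value

def risk_pe(outcome, expected_value, expected_risk):
    return reward_pe(outcome, expected_value) ** 2 - expected_risk

# A card pays 10 or 0 with equal odds: EV = 5, variance (expected risk) = 25.
surprise = risk_pe(0.0, 5.0, 25.0)   # 0.0: a loss this size was fully expected
```

Note that under this definition either extreme outcome (0 or 10) carries zero risk PE, while a hypothetical middling payoff of 5 would yield a negative one: the environment proved less variable than predicted.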
Separate mesocortical and mesolimbic pathways encode effort and reward learning signals
Optimal decision making mandates that organisms learn the relevant features of choice options. Likewise, knowing how much effort we should expend can assume paramount importance. A mesolimbic network supports reward learning, but it is unclear whether other choice features, such as effort learning, rely on this same network. Using computational fMRI, we show parallel encoding of effort and reward prediction errors (PEs) within distinct brain regions, with effort PEs expressed in dorsomedial prefrontal cortex and reward PEs in ventral striatum. We show a common mesencephalic origin for these signals, evident in overlapping but spatially dissociable dopaminergic midbrain regions expressing both types of PE. During action anticipation, reward and effort expectations were integrated in ventral striatum, consistent with a computation of an overall net benefit of a stimulus. Thus, we show that motivationally relevant stimulus features are learned in parallel dopaminergic pathways, with formation of an integrated utility signal at choice.
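The parallel-pathway account can be sketched as two independent error-driven learners whose expectations are combined into a net-benefit signal (our own construction with an invented learning rate, not the study's fitted model):

```python
# Two parallel learners (our toy): one tracks expected reward, one tracks
# expected effort; each is updated by its own prediction error.
def update(expectation, outcome, alpha=0.3):
    pe = outcome - expectation
    return expectation + alpha * pe, pe

r_exp, e_exp = 0.0, 0.0
r_exp, r_pe = update(r_exp, 1.0)   # reward delivered
e_exp, e_pe = update(e_exp, 0.4)   # effort actually required

# Integration at anticipation: overall net benefit of the stimulus.
net_benefit = r_exp - e_exp
```

Keeping the two error signals separate, as the fMRI dissociation suggests, lets each feature be learned at its own rate before being folded into a single utility at choice time.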
Sumca
We propose a simple, unified, Monte-Carlo-assisted approach (called ‘Sumca’) to second-order unbiased estimation of the mean-squared prediction error (MSPE) of a small area predictor. The proposed MSPE estimator is easy to derive, has a simple expression, and applies to a broad range of predictors, including as special cases the traditional empirical best linear unbiased predictor, the empirical best predictor, and their post-model-selection counterparts. Furthermore, the leading term of the proposed MSPE estimator is guaranteed positive; the lower-order term corresponds to a bias correction, which can be evaluated via a Monte Carlo method. The computational burden of the Monte Carlo evaluation is much lower than that of other Monte-Carlo-based methods that have been used to produce second-order unbiased MSPE estimators, such as the double bootstrap and the Monte Carlo jackknife. The Sumca estimator also has a desirable stability property. Theoretical and empirical results demonstrate the properties and advantages of the Sumca estimator.
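The structure described above (a positive closed-form leading term plus a Monte Carlo bias correction) can be shown schematically. This is our paraphrase of the abstract's description, with an invented bias distribution, not the paper's actual formulas:

```python
# Schematic Sumca-style estimate: analytic leading term (positive by
# construction) + a lower-order bias correction averaged over Monte Carlo
# replicates. The bias_draw callable here is a hypothetical stand-in.
import random

def mspe_estimate(leading_term, bias_draw, n_mc=2000, seed=1):
    rng = random.Random(seed)
    correction = sum(bias_draw(rng) for _ in range(n_mc)) / n_mc
    return leading_term + correction

# Invented example: noisy draws around a small true bias of 0.05.
est = mspe_estimate(1.0, lambda rng: 0.05 + rng.gauss(0.0, 0.2))
```

Because the correction is only a lower-order term, its Monte Carlo error matters far less than it would if the whole estimator were simulated, which is where the claimed computational advantage over the double bootstrap comes from.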
Effects of Direct Social Experience on Trust Decisions and Neural Reward Circuitry
The human striatum is integral to reward processing and supports learning by linking experienced outcomes with prior expectations. Recent endeavors implicate the striatum in processing outcomes of social interactions, such as social approval/rejection, as well as in learning the reputations of others. Interestingly, social impressions often influence our behavior with others during interactions. Information about an interaction partner's moral character acquired from biographical information hinders updating of expectations after interactions via top-down modulation of reward circuitry. An outstanding question is whether initial impressions formed through experience similarly modulate the ability to update social impressions at the behavioral and neural level. We investigated the role of experienced social information on trust behavior and reward-related BOLD activity. Participants played a computerized ball-tossing game with three fictional partners manipulated to be perceived as good, bad, or neutral. Participants then played an iterated trust game as investors with these same partners while undergoing fMRI. Unbeknownst to participants, partner behavior in the trust game was random and unrelated to their ball-tossing behavior. Participants' trust decisions were influenced by their prior experience in the ball-tossing game, investing less often with the bad partner compared to the good and neutral ones. Reinforcement learning models revealed that participants were more sensitive to updating their beliefs about good and bad partners when experiencing outcomes consistent with initial experience. Increased striatal and anterior cingulate BOLD activity for positive versus negative trust game outcomes emerged, which further correlated with model-derived prediction error learning signals. These results suggest that initial impressions formed from direct social experience can be continually shaped by consistent information through reward learning mechanisms.
Robust estimation of mean squared prediction error in small-area estimation
The nested-error regression model is one of the best-known models in small area estimation. A small area mean is often expressed as a linear combination of fixed effects and realized values of random effects. In such analyses, prediction is made by borrowing strength from other related areas or sources, and mean-squared prediction error (MSPE) is often used as a measure of uncertainty. In this article, we propose a bias-corrected analytical estimation of MSPE as well as a moment-match jackknife method to estimate the MSPE without specific assumptions about the distributions of the data. Theoretical and empirical studies are carried out to investigate the performance of the proposed methods in comparison with existing procedures.
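The jackknife building block behind such MSPE estimators is the set of leave-one-out re-estimates. Here is the generic textbook form (our illustration; the paper's moment-matched variant adds corrections beyond this basic recipe):

```python
# Plain delete-one jackknife: re-estimate with each observation (or area)
# left out, then combine the spread of those re-estimates.
def jackknife_variance(data, estimator):
    n = len(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]  # leave-one-out
    mean_loo = sum(loo) / n
    return (n - 1) / n * sum((e - mean_loo) ** 2 for e in loo)

mean = lambda xs: sum(xs) / len(xs)
# For the sample mean this reproduces the classic s^2 / n variance estimate.
v = jackknife_variance([1.0, 2.0, 3.0, 4.0], mean)
```

In small area estimation the same leave-one-area-out re-estimates feed the MSPE formula instead of a simple variance, which is what lets the method avoid explicit distributional assumptions.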