17 result(s) for "Solway, Alec"
Conflict and competition between model-based and model-free control
A large literature has accumulated suggesting that human and animal decision making is driven by at least two systems, and that important functions of these systems can be captured by reinforcement learning algorithms. The “model-free” system caches and uses stimulus–value or stimulus–response associations, and the “model-based” system implements more flexible planning using a model of the world. However, it is not clear how the two systems interact during deliberation and how a single decision emerges from this process, especially when they disagree. Most previous work has assumed that while the systems operate in parallel, they do so independently, and they combine linearly to influence decisions. Using an integrated reinforcement learning/drift-diffusion model, we tested the hypothesis that the two systems interact in a non-linear fashion similar to other situations with cognitive conflict. We differentiated two forms of conflict: action conflict, a binary state representing whether the systems disagreed on the best action, and value conflict, a continuous measure of the extent to which the two systems disagreed on the difference in value between the available options. We found that decisions with greater value conflict were characterized by reduced model-based control and increased caution both with and without action conflict. Action conflict itself (the binary state) acted in the opposite direction, although its effects were less prominent. We also found that between-system conflict was highly correlated with within-system conflict, and although it is less clear a priori why the latter might influence the strength of each system above its standard linear contribution, we could not rule it out. Our work highlights the importance of non-linear conflict effects, and provides new constraints for more detailed process models of decision making. It also presents new avenues to explore with relation to disorders of compulsivity, where an imbalance between systems has been implicated.
Information uncertainty influences learning strategy from sequentially delayed rewards
When receiving a reward after a sequence of multiple events, how do we determine which event caused the reward? This problem, known as temporal credit assignment, can be difficult for humans to solve given the temporal uncertainty in the environment. Research to date has attempted to isolate dimensions of delay and reward during decision-making, but algorithmic solutions to temporal learning problems and the effect of uncertainty on learning remain underexplored. To further our understanding, we adapted a reward learning task that creates a temporal credit assignment problem by combining sequentially delayed rewards, intervening events, and varying uncertainty via the amount of information presented during feedback. Using computational modeling, two learning strategies were developed: an eligibility trace, whereby previously selected actions are updated as a function of the temporal sequence, and a tabular update, whereby only systematically related past actions (rather than unrelated intervening events) are updated. We hypothesized that reduced information uncertainty would correlate with increased use of the tabular strategy, given the model’s capacity to incorporate additional feedback information. Both models effectively learned the task, and predicted choices made by participants (N = 142) as well as specific behavioral signatures of credit assignment. Consistent with our hypothesis, the tabular model outperformed the eligibility model under low information uncertainty, as evidenced by more accurate predictions of participants’ behavior and an increase in tabular weight. These findings provide new insights into the mechanisms implemented by humans to solve temporal credit assignment and adapt their strategy in varying environments.
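The eligibility-trace idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the function name, the exponential decay rule, and the parameters alpha (learning rate) and decay are assumptions chosen only to show how a delayed reward might be credited to earlier actions in proportion to their recency.

```python
def eligibility_trace_update(values, chosen_actions, reward, alpha=0.1, decay=0.5):
    """Credit one delayed reward to a sequence of past actions.

    values:          dict mapping action -> current value estimate
    chosen_actions:  actions in the order they were taken (oldest first)
    reward:          the reward delivered after the whole sequence
    alpha, decay:    hypothetical learning rate and trace decay (assumptions)
    """
    updated = dict(values)  # leave the caller's dict untouched
    n = len(chosen_actions)
    for i, action in enumerate(chosen_actions):
        steps_back = n - 1 - i          # 0 for the most recent action
        trace = decay ** steps_back     # older actions get a weaker trace
        # Move the value toward the reward, scaled by the trace.
        updated[action] += alpha * trace * (reward - updated[action])
    return updated


if __name__ == "__main__":
    start = {"A": 0.0, "B": 0.0, "C": 0.0}
    print(eligibility_trace_update(start, ["A", "B", "C"], reward=1.0))
```

In this sketch the most recent action receives the full update and each earlier action receives a geometrically smaller one. A tabular strategy of the kind the abstract contrasts it with would instead update only the past actions known to be related to the reward.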
Neural Activity in Human Hippocampal Formation Reveals the Spatial Context of Retrieved Memories
In many species, spatial navigation is supported by a network of place cells that exhibit increased firing whenever an animal is in a certain region of an environment. Does this neural representation of location form part of the spatiotemporal context into which episodic memories are encoded? We recorded medial temporal lobe neuronal activity as epilepsy patients performed a hybrid spatial and episodic memory task. We identified place-responsive cells active during virtual navigation and then asked whether the same cells activated during the subsequent recall of navigation-related memories without actual navigation. Place-responsive cell activity was reinstated during episodic memory retrieval. Neuronal firing during the retrieval of each memory was similar to the activity that represented the locations in the environment where the memory was initially encoded.
Direct recordings of grid-like neuronal activity in human spatial navigation
Grid cell activity in the rodent and non-human primate entorhinal cortex is thought to provide spatial location information to the hippocampus for navigation and spatial processing. Here, Jacobs et al. examined single neuron spiking activities from human subjects performing a virtual spatial navigation task and show the presence of grid-like firing activity. Grid cells in the entorhinal cortex appear to represent spatial location via a triangular coordinate system. Such cells, which have been identified in rats, bats and monkeys, are believed to support a wide range of spatial behaviors. Recording neuronal activity from neurosurgical patients performing a virtual-navigation task, we identified cells exhibiting grid-like spiking patterns in the human brain, suggesting that humans and simpler animals rely on homologous spatial-coding schemes.
Evidence integration in model-based tree search
Research on the dynamics of reward-based, goal-directed decision making has largely focused on simple choice, where participants decide among a set of unitary, mutually exclusive options. Recent work suggests that the deliberation process underlying simple choice can be understood in terms of evidence integration: Noisy evidence in favor of each option accrues over time, until the evidence in favor of one option is significantly greater than the rest. However, real-life decisions often involve not one, but several steps of action, requiring a consideration of cumulative rewards and a sensitivity to recursive decision structure. We present results from two experiments that leveraged techniques previously applied to simple choice to shed light on the deliberation process underlying multistep choice. We interpret the results from these experiments in terms of a new computational model, which extends the evidence accumulation perspective to multiple steps of action.
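The evidence-integration account of simple choice summarized in this abstract can be sketched as a small simulation. This is an illustrative assumption, not the model from the paper: the function name, the Gaussian noise, the stopping rule (leader exceeds runner-up by a fixed threshold), and all parameter values are hypothetical.

```python
import random


def race_to_threshold(drifts, threshold=1.0, noise_sd=0.1, dt=0.01,
                      max_steps=10000, seed=None):
    """Accumulate noisy evidence for each option until one clearly leads.

    drifts:    mean evidence rate per option (needs at least two options)
    threshold: required lead of the best option over the runner-up
    Returns (index of chosen option, number of steps), or (None, max_steps)
    if no option reaches the required lead in time.
    """
    rng = random.Random(seed)
    evidence = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        for i, drift in enumerate(drifts):
            # Deterministic drift plus Gaussian noise scaled by sqrt(dt).
            evidence[i] += drift * dt + rng.gauss(0.0, noise_sd) * dt ** 0.5
        ranked = sorted(evidence, reverse=True)
        if ranked[0] - ranked[1] >= threshold:
            return evidence.index(ranked[0]), step
    return None, max_steps


if __name__ == "__main__":
    choice, steps = race_to_threshold([1.0, 0.6, 0.2], seed=1)
    print(choice, steps)
```

With larger differences between the drift rates, the leading option tends to separate from the rest sooner, so simulated decisions are faster. The multistep extension described in the abstract is not attempted here.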
Optimal Behavioral Hierarchy
Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well established benefits, allowing behaviors to be represented efficiently by the brain, and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, the specific way in which tasks are carved up into subtasks. We provide a mathematical account for what makes some hierarchies better than others, an account that allows an optimal hierarchy to be identified for any set of tasks. We then present results from four behavioral experiments, suggesting that human learners spontaneously discover optimal action hierarchies.
Loss Aversion Correlates With the Propensity to Deploy Model-Based Control
Reward-based decision making is thought to be driven by at least two different types of decision systems: a simple stimulus–response cache-based system which embodies the common-sense notion of 'habit', for which model-free reinforcement learning serves as a computational substrate, and a more deliberate, prospective, model-based planning system. Previous work has shown that loss aversion, a well-studied measure of how much more on average individuals weigh losses relative to gains during decision making, is reduced when participants take all possible decisions and outcomes into account including future ones, relative to when they myopically focus on the current decision. Model-based control offers a putative mechanism for implementing such foresight. Using a well-powered data set (N = 117) in which participants completed two different tasks designed to measure each of the two quantities of interest, and four models of choice data for these tasks, we found consistent evidence of a relationship between loss aversion and model-based control but in the direction opposite to that expected based on previous work: loss aversion had a positive relationship with model-based control. We did not find evidence for a relationship between either decision system and risk aversion, a related aspect of subjective utility.
Transfer of information across repeated decisions in general and in obsessive–compulsive disorder
Real-life decisions are often repeated. Whether considering taking a job in a new city, or doing something mundane like checking if the stove is off, decisions are frequently revisited even if no new information is available. This mode of behavior takes a particularly pathological form in obsessive–compulsive disorder (OCD), which is marked by individuals’ redeliberating previously resolved decisions. Surprisingly, little is known about how information is transferred across decision episodes in such circumstances, and whether and how such transfer varies in OCD. In two experiments, data from a repeated decision-making task and computational modeling revealed that both implicit and explicit memories of previous decisions affected subsequent decisions by biasing the rate of evidence integration. Further, we replicated previous work demonstrating impairments in baseline decision-making as a function of self-reported OCD symptoms, and found that information transfer effects specifically due to implicit memory were reduced, offering computational insight into checking behavior.
Simulating future value in intertemporal choice
The laboratory study of how humans and other animals trade off value and time has a long and storied history, and is the subject of a vast literature. However, there is still no agreed-upon mechanistic explanation of how intertemporal choice preferences arise. Several theorists have recently proposed model-based reinforcement learning as a candidate framework. This framework describes a suite of algorithms by which a model of the environment, in the form of a state transition function and reward function, can be converted on-line into a decision. The state transition function allows the model-based system to make decisions based on projected future states, while the reward function assigns value to each state, together capturing the necessary components for successful intertemporal choice. Empirical work has also pointed to a possible relationship between increased prospection and reduced discounting. In the current paper, we look for direct evidence of a relationship between temporal discounting and model-based control in a large new data set (n = 168). However, testing the relationship under several different modeling formulations revealed no indication that the two quantities are related.
Positional and temporal clustering in serial order memory
The well-known finding that responses in serial recall tend to be clustered around the position of the target item has bolstered positional-coding theories of serial order memory. In the present study, we show that this effect is confounded with another well-known finding—that responses in serial recall tend to also be clustered around the position of the prior recall (temporal clustering). The confound can be alleviated by conditioning each analysis on the positional accuracy of the previously recalled item. The revised analyses show that temporal clustering is much more prevalent in serial recall than is positional clustering. A simple associative chaining model with asymmetric neighboring, remote associations, and a primacy gradient can account for these effects. Using the same parameter values, the model produces reasonable serial position curves and captures the changes in item and order information across study-test trials. In contrast, a prominent positional coding model cannot account for the pattern of clustering uncovered by the new analyses.