Catalogue Search | MBRL
Explore the vast range of titles available.
59 result(s) for "Khamassi, Mehdi"
Strong and weak alignment of large language models with human values
2024
Minimizing negative impacts of Artificial Intelligence (AI) systems on human societies without human supervision requires them to be able to align with human values. However, most current work only addresses this issue from a technical point of view, e.g., improving current methods relying on reinforcement learning from human feedback, neglecting what it means and is required for alignment to occur. Here, we propose to distinguish strong and weak value alignment. Strong alignment requires cognitive abilities (either human-like or different from humans) such as understanding and reasoning about agents’ intentions and their ability to causally produce desired effects. We argue that this is required for AI systems like large language models (LLMs) to be able to recognize situations presenting a risk that human values may be flouted. To illustrate this distinction, we present a series of prompts showing ChatGPT’s, Gemini’s and Copilot’s failures to recognize some of these situations. We moreover analyze word embeddings to show that the nearest neighbors of some human values in LLMs differ from humans’ semantic representations. We then propose a new thought experiment that we call “the Chinese room with a word transition dictionary”, extending John Searle’s famous proposal. We finally mention current promising research directions towards a weak alignment, which could produce statistically satisfying answers in a number of common situations, though so far without ensuring any truth value.
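The nearest-neighbour comparison of value words in embedding space mentioned above can be sketched with cosine similarity. The toy vectors and words below are illustrative assumptions, not the paper's data or models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_neighbors(word, embeddings, k=1):
    """Return the k words whose vectors lie closest to `word`."""
    scored = [(cosine(embeddings[word], v), w)
              for w, v in embeddings.items() if w != word]
    return [w for _, w in sorted(scored, reverse=True)[:k]]

# Toy two-dimensional embeddings (purely illustrative).
toy = {"honesty": [1.0, 0.9], "truth": [0.9, 1.0], "table": [-1.0, 0.1]}
print(nearest_neighbors("honesty", toy))  # ['truth']
```

Comparing such neighbour lists between a model's embedding space and human semantic norms is one way to quantify the divergence the authors report.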
Journal Article
Contextual modulation of value signals in reward and punishment learning
by Joffily, Mateus; Khamassi, Mehdi; Palminteri, Stefano
2015
Compared with reward seeking, punishment avoidance learning is less clearly understood at both the computational and neurobiological levels. Here we demonstrate, using computational modelling and fMRI in humans, that learning option values on a relative (context-dependent) scale offers a simple computational solution for avoidance learning. The context (or state) value sets the reference point to which an outcome should be compared before updating the option value. Consequently, in contexts with an overall negative expected value, successful punishment avoidance acquires a positive value, thus reinforcing the response. As revealed by post-learning assessment of option values, contextual influences are enhanced when subjects are informed about the result of the forgone alternative (counterfactual information). This is mirrored at the neural level by a shift in negative outcome encoding from the anterior insula to the ventral striatum, suggesting that value contextualization also limits the need to mobilize an opponent punishment learning system.
In contrast to predictions from learning theory, humans learn to seek rewards and avoid punishments equally well. Here the authors offer an elegant solution to this problem by demonstrating that humans learn option values relative to a reference point subserved by a common neural substrate.
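The relative-scale update described in the abstract can be sketched in a few lines. The learning rate and context value below are illustrative assumptions, not the fitted parameters from the study.

```python
def relative_update(q, v_context, outcome, alpha=0.1):
    """Update an option value on a relative scale: the outcome is first
    re-centred on the context (state) value before updating, so
    successfully avoiding punishment (outcome 0) in a context with a
    negative expected value yields a positive teaching signal."""
    relative_outcome = outcome - v_context
    return q + alpha * (relative_outcome - q)

# Punishment-avoidance context with negative expected value (-0.5).
q = 0.0
for _ in range(200):
    q = relative_update(q, v_context=-0.5, outcome=0.0)  # punishment avoided
print(round(q, 2))  # 0.5 -- avoidance acquires a positive value
```

The positive asymptotic value is what lets avoidance reinforce the response, without needing a separate opponent punishment-learning system.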
Journal Article
Global reward state affects learning and activity in raphe nucleus and anterior insula in monkeys
by Fouragnan, Elsa; Khamassi, Mehdi; Chau, Bolton K. H.
2020
People and other animals learn the values of choices by observing the contingencies between them and their outcomes. However, decisions are not guided by choice-linked reward associations alone; macaques also maintain a memory of the general, average reward rate – the global reward state – in an environment. Remarkably, global reward state affects the way that each choice outcome is valued and influences future decisions so that the impact of both choice success and failure is different in rich and poor environments. Successful choices are more likely to be repeated but this is especially the case in rich environments. Unsuccessful choices are more likely to be abandoned but this is especially likely in poor environments. Functional magnetic resonance imaging (fMRI) revealed two distinct patterns of activity, one in anterior insula and one in the dorsal raphe nucleus, that track global reward state as well as specific outcome events.
Wittmann and colleagues show that not only single outcome events but also the global reward state (GRS) impact learning in macaques; low GRS drives explorative choices. Analysis of the macaque BOLD signal reveals that GRS impacts activity in the anterior insula as well as the dorsal raphe nucleus.
Journal Article
Dopamine blockade impairs the exploration-exploitation trade-off in rats
by Aklil, Nassim; Girard, Benoît; Fresno, Virginie
2019
In a volatile environment where rewards are uncertain, successful performance requires a delicate balance between exploitation of the best option and exploration of alternative choices. It has theoretically been proposed that dopamine contributes to the control of this exploration-exploitation trade-off, specifically that the higher the level of tonic dopamine, the more exploitation is favored. We demonstrate here that there is a formal relationship between the rescaling of dopamine positive reward prediction errors and the exploration-exploitation trade-off in simple non-stationary multi-armed bandit tasks. We further show in rats performing such a task that systemically antagonizing dopamine receptors greatly increases the number of random choices without affecting learning capacities. Simulations and comparison of a set of different computational models (an extended Q-learning model, a directed exploration model, and a meta-learning model) fitted on each individual confirm that, independently of the model, decreasing dopaminergic activity does not affect learning rate but is equivalent to an increase in random exploration rate. This study shows that dopamine could adapt the exploration-exploitation trade-off in decision-making when facing changing environmental contingencies.
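A simplified illustration of the kind of equivalence the abstract invokes: in softmax action selection, scaling the learned values by a factor k is identical to scaling the inverse temperature by k, so shrinking value signals behaves like an increase in random exploration. The numbers below are arbitrary, and the paper's formal result concerns rescaled positive prediction errors, which this toy does not reproduce.

```python
import math

def softmax(q_values, beta):
    """Softmax choice probabilities; lower beta (inverse temperature)
    means flatter probabilities, i.e. more random exploration."""
    exps = [math.exp(beta * q) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

q = [0.2, 0.7, 0.4]
k = 0.5  # e.g. a hypothetical shrinking of value signals
scaled_values = softmax([k * v for v in q], beta=3.0)
scaled_beta = softmax(q, beta=3.0 * k)
print(all(abs(a - b) < 1e-12 for a, b in zip(scaled_values, scaled_beta)))  # True
```

Under this identity, blocking dopamine (shrinking value signals) leaves the learning rule untouched while flattening choice probabilities, matching the behavioural dissociation the rats show.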
Journal Article
Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences
2018
In economics and perceptual decision-making, contextual effects are well documented: decision weights are adjusted as a function of the distribution of stimuli. Yet, in the reinforcement learning literature, whether and how contextual information pertaining to decision states is integrated in learning algorithms has received comparably little attention. Here, we investigate reinforcement learning behavior and its computational substrates in a task where we orthogonally manipulate outcome valence and magnitude, resulting in systematic variations in state-values. Model comparison indicates that subjects’ behavior is best accounted for by an algorithm which includes both reference point-dependence and range-adaptation, two crucial features of state-dependent valuation. In addition, we find that state-dependent outcome valuation progressively emerges, is favored by increasing outcome information and correlated with explicit understanding of the task structure. Finally, our data clearly show that, while being locally adaptive (for instance in negative valence and small magnitude contexts), state-dependent valuation comes at the cost of seemingly irrational choices when options are extrapolated out from their original contexts.
Humans often make sub-optimal decisions, choosing options that are less advantageous than available alternatives. Using computational modeling of behavior, the authors demonstrate that such irrational choices can arise from context dependence in reinforcement learning.
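A minimal sketch of the two features the winning model combines: subtracting a context's reference point and dividing by its outcome range. The reference points and ranges below are illustrative, not the task's actual payoffs.

```python
def contextual_value(outcome, ref_point, outcome_range):
    """Reference-point centering (subtract the context's reference
    point) plus range adaptation (divide by the context's spread)."""
    return (outcome - ref_point) / outcome_range

# Context A offers outcomes {0, 1}; context B offers {0, 10}.
best_a = contextual_value(1, ref_point=0, outcome_range=1)    # 1.0
best_b = contextual_value(10, ref_point=0, outcome_range=10)  # 1.0
# Both best options acquire the same subjective value, so when options
# are extrapolated out of their original contexts the agent may
# irrationally treat a 1-point reward as equivalent to a 10-point one.
print(best_a == best_b)  # True
```

This is how locally adaptive normalization produces the out-of-context preference errors the abstract describes.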
Journal Article
Modeling awake hippocampal reactivations with model-based bidirectional search
2020
Hippocampal offline reactivations during reward-based learning, usually categorized as replay events, have been found to be important for performance improvement over time and for memory consolidation. Recent computational work has linked these phenomena to the need to transform reward information into state-action values for decision making and to propagate it to all relevant states of the environment. Nevertheless, it is still unclear whether an integrated reinforcement learning mechanism could account for the variety of awake hippocampal reactivations, including variety in order (forward and reverse reactivated trajectories) and variety in the location where they occur (reward site or decision-point). Here, we present a model-based bidirectional search model which accounts for a variety of hippocampal reactivations. The model combines forward trajectory sampling from the current position and backward sampling through prioritized sweeping from states associated with large reward prediction errors until the two trajectories connect. This is repeated until stabilization of state-action values (convergence), which could explain why hippocampal reactivations drastically diminish when the animal’s performance stabilizes. Simulations in a multiple T-maze task show that forward reactivations are prominently found at decision-points while backward reactivations are exclusively generated at reward sites. Finally, the model can generate imaginary trajectories that the agent is not allowed to take during task performance. We raise some experimental predictions and implications for future studies of the role of the hippocampo–prefronto–striatal network in learning.
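The forward/backward connection scheme can be sketched on a toy linear track. The state names and transition maps below are hypothetical, and this sketch omits the prioritized-sweeping ordering by reward prediction error that the full model uses.

```python
def bidirectional_replay(start, reward_state, succ, pred):
    """Grow a forward trajectory from the current position and a
    backward trajectory from the reward site, stopping once the two
    trajectories connect."""
    forward, backward = [start], [reward_state]
    while forward[-1] not in backward:
        grew = False
        if forward[-1] in succ:
            forward.append(succ[forward[-1]])
            grew = True
        if backward[-1] in pred:
            backward.append(pred[backward[-1]])
            grew = True
        if not grew:  # no connection possible in this toy environment
            break
    return forward, backward

# Toy linear track 0 -> 1 -> 2 -> 3 -> 4, with reward at state 4.
succ = {0: 1, 1: 2, 2: 3, 3: 4}
pred = {4: 3, 3: 2, 2: 1, 1: 0}
fwd, bwd = bidirectional_replay(0, 4, succ, pred)
print(fwd, bwd)  # [0, 1, 2] [4, 3, 2] -- the trajectories meet at state 2
```

The forward leg corresponds to forward reactivations at decision-points and the backward leg to reverse reactivations at the reward site.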
Journal Article
Replay of rule-learning related neural patterns in the prefrontal cortex during sleep
by Peyrache, Adrien; Wiener, Sidney I.; Khamassi, Mehdi
2009
During sleep, neural patterns reflecting previously acquired information are replayed in the hippocampus. Here, the authors report that there is reactivation of learning-related patterns of activity in the medial prefrontal cortex during sleep following rule acquisition that coincided with hippocampal sharp wave/ripple complexes.
Slow-wave sleep (SWS) is important for memory consolidation. During sleep, neural patterns reflecting previously acquired information are replayed. One possible reason for this is that such replay exchanges information between hippocampus and neocortex, supporting consolidation. We recorded neuron ensembles in the rat medial prefrontal cortex (mPFC) to study memory trace reactivation during SWS following learning and execution of cross-modal strategy shifts. In general, reactivation of learning-related patterns occurred in distinct, highly synchronized transient bouts, mostly simultaneous with hippocampal sharp wave/ripple complexes (SPWRs), when hippocampal ensemble reactivation and cortico-hippocampal interaction is enhanced. During sleep following learning of a new rule, mPFC neural patterns that appeared during response selection replayed prominently, coincident with hippocampal SPWRs. This was learning dependent, as the patterns appeared only after rule acquisition. Therefore, learning, or the resulting reliable reward, influenced which patterns were most strongly encoded and successively reactivated in the hippocampal/prefrontal network.
Journal Article
Increased cortical plasticity leads to memory interference and enhanced hippocampal-cortical interactions
2023
Our brain is continuously challenged by daily experiences. How, then, does it avoid systematically erasing previously encoded memories? While it has been proposed that a dual-learning system with ‘slow’ learning in the cortex and ‘fast’ learning in the hippocampus could protect previous knowledge from interference, this has never been observed in the living organism. Here, we report that increasing plasticity via the viral-induced overexpression of RGS14414 in the prelimbic cortex leads to better one-trial memory, but that this comes at the price of increased interference in semantic-like memory. Indeed, electrophysiological recordings showed that this manipulation also resulted in shorter NonREM-sleep bouts, smaller delta-waves and decreased neuronal firing rates. In contrast, hippocampal-cortical interactions in the form of theta coherence during wake and REM-sleep, as well as oscillatory coupling during NonREM-sleep, were enhanced. Thus, we provide the first experimental evidence for the long-standing and unproven fundamental idea that high thresholds for plasticity in the cortex protect preexisting memories, and that modulating these thresholds affects both memory encoding and consolidation mechanisms.
Journal Article
The object space task shows cumulative memory expression in both mice and rats
by Schröder, Tim; Khamassi, Mehdi; Battaglia, Francesco
2019
Declarative memory encompasses representations of specific events as well as knowledge extracted by accumulation over multiple episodes. To investigate how these different sorts of memories are created, we developed a new behavioral task in rodents. The task consists of 3 distinct conditions (stable, overlapping, and random). Rodents are exposed to multiple sample trials, in which they explore objects in specific spatial arrangements, with object identity changing from trial to trial. In the stable condition, the locations are constant during all sample trials even though the objects themselves change; in the test trial, 1 object's location is changed. In the random condition, object locations are presented in the sample phase without a specific spatial pattern. In the overlapping condition, 1 location is shared (overlapping) between all trials, while the other location changes during sample trials. We show that in the overlapping condition, instead of only remembering the last sample trial, rodents form a cumulative memory of the sample trials. Here, we could show that both mice and rats can accumulate information across multiple trials and express a long-term abstracted memory.
Journal Article
Interactions of spatial strategies producing generalization gradient and blocking: A computational approach
2018
We present a computational model of spatial navigation comprising different learning mechanisms in mammals, i.e., associative, cognitive mapping and parallel systems. This model is able to reproduce a large number of experimental results in different variants of the Morris water maze task, including standard associative phenomena (spatial generalization gradient and blocking), as well as navigation based on cognitive mapping. Furthermore, we show that competitive and cooperative patterns between different navigation strategies in the model allow us to explain previous, apparently contradictory results supporting either associative or cognitive mechanisms for spatial learning. The key computational mechanism to reconcile experimental results showing different influences of distal and proximal cues on behavior, different learning times, and different abilities of individuals to alternatively perform spatial and response strategies relies on the dynamic coordination of navigation strategies, whose performance is evaluated online with a common currency through a modular approach. We provide a set of concrete experimental predictions to further test the computational model. Overall, this computational work sheds new light on inter-individual differences in navigation learning, and provides a formal and mechanistic approach to test various theories of spatial cognition in mammals.
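The online coordination of strategies through a common currency can be sketched as a running performance estimate per strategy. The strategy names, the moving-average rule and the greedy selection are illustrative assumptions, not the authors' implementation.

```python
def update_performance(perf, strategy, reward, alpha=0.2):
    """Track each strategy's recent performance in a common currency
    (here, an exponential moving average of obtained reward)."""
    perf[strategy] += alpha * (reward - perf[strategy])
    return perf

def select_strategy(perf):
    """Give behavioural control to the currently best-scoring strategy."""
    return max(perf, key=perf.get)

perf = {"cue-guided (associative)": 0.0, "place-based (cognitive map)": 0.0}
for _ in range(10):
    perf = update_performance(perf, "place-based (cognitive map)", reward=1.0)
    perf = update_performance(perf, "cue-guided (associative)", reward=0.2)
print(select_strategy(perf))  # place-based (cognitive map)
```

Because the currency tracks recent success, control shifts between strategies as cue reliability or task demands change, which is the coordination dynamic the model exploits.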
Journal Article