Catalogue Search | MBRL

Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: A Short Survey

by Karch, Tristan , Colas, Cédric , Sigaud, Olivier in Agents (artificial intelligence) , Algorithms , Artificial Intelligence

2022

Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem: the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest for intrinsically motivated skills acquisition.

Journal Article

Language and culture internalization for human-like autotelic AI

by Karch, Tristan , Colas, Cédric , Oudeyer, Pierre-Yves in 4014/4009 , 639/705/117 , Agents (artificial intelligence)

2022

Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals—autotelic agents. But despite recent progress, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalization or skill composition. This Perspective calls for the immersion of autotelic agents into rich socio-cultural worlds, an immensely important attribute of our environment that shapes human cognition but is mostly omitted in modern AI. Inspired by the seminal work of Vygotsky, we propose Vygotskian autotelic agents—agents able to internalize their interactions with others and turn them into cognitive tools. We focus on language and show how its structure and informational content may support the development of new cognitive functions in artificial agents as it does in humans. We justify the approach by uncovering several examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, we highlight future opportunities and challenges for Vygotskian autotelic AI research, including the use of language models as cultural models supporting artificial cognitive development. A goal of AI is to develop autonomous artificial agents with a wide set of skills. The authors propose the immersion of intrinsically motivated agents within rich socio-cultural worlds, focusing on language as a way for artificial agents to develop new cognitive functions.

Journal Article

EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models

by Colas, Cédric , Rouillon, Sebastien , Thiébaut, Rodolphe in Algorithms , Artificial Intelligence , Artificial neural networks

2021

Modeling the dynamics of epidemics helps to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lockdown, vaccination, etc). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty of predicting long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning methods such as deep reinforcement learning might bring significant value. However, the specificity of each domain (epidemic modeling or solving optimization problems) requires strong collaborations between researchers from different fields of expertise. This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lockdown control under the optimization of the death toll and economic recess using a Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19. Using EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, cost functions and optimization algorithms to address important choices to be made by health decision-makers. Trained models can be explored by experts and non-experts via a web interface. This article is part of the special track on AI and COVID-19.
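To make the SEIR model and the on-off lockdown idea in this abstract concrete, here is a minimal sketch of a basic SEIR simulation with a day-by-day lockdown switch. It is not EpidemiOptim's API or its COVID-19 model: the function name `simulate_seir`, the `lockdown` callback, the `lockdown_factor` and all parameter values are assumptions chosen only for illustration.

```python
def simulate_seir(beta, sigma, gamma, population, initial_infectious,
                  days, lockdown=None, lockdown_factor=0.3, dt=0.1):
    """Euler-integrate a basic SEIR model; returns one (S, E, I, R) tuple per day."""
    s = float(population - initial_infectious)
    e, i, r = 0.0, float(initial_infectious), 0.0
    steps_per_day = round(1 / dt)
    history = [(s, e, i, r)]
    for day in range(days):
        # Hypothetical on-off control: an active lockdown scales the contact rate.
        locked = lockdown is not None and lockdown(day)
        rate = beta * lockdown_factor if locked else beta
        for _ in range(steps_per_day):
            new_exposed = rate * s * i / population * dt   # S -> E
            new_infectious = sigma * e * dt                # E -> I
            new_removed = gamma * i * dt                   # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Illustrative parameters only (not fitted to any real epidemic).
free = simulate_seir(0.5, 0.2, 0.1, 1_000_000, 10, days=200)
locked = simulate_seir(0.5, 0.2, 0.1, 1_000_000, 10, days=200,
                       lockdown=lambda day: day >= 30)
peak_free = max(i for _, _, i, _ in free)
peak_locked = max(i for _, _, i, _ in locked)
```

In an optimization setting of the kind the abstract describes, the `lockdown` callback would be replaced by a learned policy, and quantities such as the death toll and economic cost would be computed from the trajectory.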

Journal Article

Convolutional neural network, personalised, closed-loop Brain-Computer Interfaces for multi-way control mode switching in real-time

by Ortega, Pablo , Faisal, Aldo , Colas, Cedric in Bioengineering , Brain , Classification

2018

Exoskeletons and robotic devices are, for many motor-disabled people, the only way to interact with their environment. Our lab previously developed a gaze-guided assistive robotic system for grasping. It is well known that the same natural task can require different interactions described by different dynamical systems, which in turn require different robotic controllers and their selection by the user in a self-paced way. Therefore, we investigated different ways to achieve transitions between multiple states, finding that eye blinks were the most reliable way to transition from off to control modes (binary classification) compared to voice and electromyography. In this paper we expanded on this work by investigating brain signals as sources for control mode switching. We developed a Brain-Computer Interface (BCI) that allows users to switch between four control modes in a self-paced way in real time. Since the system is devised to be used in domestic environments in a user-friendly way, we selected non-invasive electroencephalographic (EEG) signals and convolutional neural networks (ConvNets), known for their capability to find the optimal features for a classification task, which we hypothesised would add flexibility to the system in terms of which mental activities the user could perform to control it. We tested our system using the Cybathlon BrainRunners computer game, which represents all the challenges inherent to real-time control. Our preliminary results show that an efficient architecture (SmallNet) composed of a convolutional layer, a fully connected layer and a sigmoid classification layer is able to classify 4 mental activities that the user chose to perform. For the user's preferred mental activities, we ran and validated the system online and retrained the system using online collected EEG data. We achieved 47.6% accuracy in online operation in the 4-way classification task.
In particular, we found that models trained with online collected data better predicted the behaviour of the system in real time, suggesting, as a side note, that similar (ConvNet-based) offline classification methods in the literature might suffer a decay in performance when applied online. To the best of our knowledge, this is the first time such an architecture has been tested in an online operation task. Compared to our previous method relying on blinks, accuracy dropped by a factor of 1.6 (a reduction of less than half), but the number of states among which we can transition increased by 2, bringing the opportunity for finer control of specific subtasks composing natural grasping in a self-paced way.

Paper

Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

by Pourcel, Julien , Colas, Cédric , Oudeyer, Pierre-Yves in Effectiveness , Sampling , Searching

2026

Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in single attempts. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement finetuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set. Our code is open-sourced at: https://github.com/flowersteam/SOAR

Paper

Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI

by Pourcel, Julien , Colas, Cédric , Oudeyer, Pierre-Yves in Effectiveness , Sampling , Searching

2025

Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in single attempts. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement finetuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set. Our code is open-sourced at: https://github.com/flowersteam/SOAR

Paper

Language and Culture Internalisation for Human-Like Autotelic AI

by Karch, Tristan , Colas, Cédric , Moulin-Frier, Clément in Agents (artificial intelligence) , Algorithms , Artificial intelligence

2022

Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals - autotelic agents. But despite recent progress, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalisation or skill composition. This perspective calls for the immersion of autotelic agents into rich socio-cultural worlds, an immensely important attribute of our environment that shapes human cognition but is mostly omitted in modern AI. Inspired by the seminal work of Vygotsky, we propose Vygotskian autotelic agents - agents able to internalise their interactions with others and turn them into cognitive tools. We focus on language and show how its structure and informational content may support the development of new cognitive functions in artificial agents as it does in humans. We justify the approach by uncovering several examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, we highlight future opportunities and challenges for Vygotskian Autotelic AI research, including the use of language models as cultural models supporting artificial cognitive development.

Paper

Goal-Conditioned Agents that Learn Everything All at Once

by Foerster, Jakob , Fujimoto, Scott , Colas, Cédric in Source code

2026

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for learning off-policy with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing for efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything all at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up compared to all-goals relabelling. We then go on to show that this approach can be made even more powerful by using LEO as a teacher network, rather than a direct actor. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open source our code.
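The idea of an all-goals update can be illustrated with a small tabular analogue: a single transition is used to update the value estimate for every goal, not only the commanded one. This is a sketch under stated assumptions, not the LEO network or its training code. It assumes goals are states, a sparse reward of 1 on reaching the goal, and episode termination at the goal; the names `make_q` and `all_goals_update` are hypothetical.

```python
def make_q(n_states, n_actions):
    """Q-table indexed as q[state][goal][action]; goals are states here."""
    return [[[0.0] * n_actions for _ in range(n_states)]
            for _ in range(n_states)]


def all_goals_update(q, s, a, s_next, alpha=0.5, gamma=0.9):
    """Apply one off-policy Q-learning update per goal from a single transition."""
    n_goals = len(q[s])
    for g in range(n_goals):
        reached = s_next == g
        reward = 1.0 if reached else 0.0
        # Assumption: reaching the goal ends the episode, so no bootstrapping.
        bootstrap = 0.0 if reached else gamma * max(q[s_next][g])
        q[s][g][a] += alpha * (reward + bootstrap - q[s][g][a])


# Three states in a chain; action 1 moves right.
q = make_q(n_states=3, n_actions=2)
all_goals_update(q, s=0, a=1, s_next=1)  # learns about goal 1
all_goals_update(q, s=1, a=1, s_next=2)  # learns about goal 2
all_goals_update(q, s=0, a=1, s_next=1)  # now also propagates value for goal 2
```

In the tabular case the loop over goals is explicit. The abstract describes replacing such per-goal work with a single network pass that outputs values for every goal at once.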

Paper

Towards Teachable Autotelic Agents

by Chetouani, Mohamed , Colas, Cédric , Akakzia, Ahmed in Agents (artificial intelligence) , Artificial intelligence , Autonomy

2023

Autonomous discovery and direct instruction are two distinct sources of learning in children, but education sciences demonstrate that mixed approaches such as assisted discovery or guided play result in improved skill acquisition. In the field of Artificial Intelligence, these extremes respectively map to autonomous agents learning from their own signals and interactive learning agents fully taught by their teachers. In between should stand teachable autotelic agents (TAA): agents that learn from both internal and teaching signals to benefit from the higher efficiency of assisted discovery. Designing such agents will enable real-world non-expert users to orient the learning trajectories of agents towards their expectations. More fundamentally, this may also be a key step to build agents with human-level intelligence. This paper presents a roadmap towards the design of teachable autonomous agents. Building on developmental psychology and education sciences, we start by identifying key features enabling assisted discovery processes in child-tutor interactions. This leads to the production of a checklist of features that future TAA will need to demonstrate. The checklist allows us to precisely pinpoint the various limitations of current reinforcement learning agents and to identify the promising first steps towards TAA. It also shows the way forward by highlighting key research directions towards the design of autonomous agents that can be taught by ordinary people via natural pedagogy.

Paper

Scaling MAP-Elites to Deep Neuroevolution

by Madhavan, Vashisht , Colas, Cédric , Clune, Jeff in Algorithms , Artificial neural networks , Control tasks

2020

Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have proven very useful for a broad range of applications including enabling real robots to recover quickly from joint damage, solving strongly deceptive maze tasks or evolving robot morphologies to discover new gaits. However, present implementations of MAP-Elites and other QD algorithms seem to be limited to low-dimensional controllers with far fewer parameters than modern deep neural network models. In this paper, we propose to leverage the efficiency of Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers parameterized by large neural networks. We design and evaluate a new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage recovery in a difficult high-dimensional control task where traditional ME fails. Additionally, we show that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms in high-dimensional control tasks with strongly deceptive rewards.
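For readers unfamiliar with MAP-Elites, the following is a minimal sketch of the basic algorithm with simple Gaussian mutation. It is not ME-ES, which the abstract describes as replacing this kind of mutation with Evolution Strategies. The toy fitness and descriptor functions (`sphere`, `first_coord`), the one-dimensional grid and all parameter values are assumptions made for illustration.

```python
import random


def map_elites(fitness, descriptor, dim, n_bins, iterations,
               n_init=20, sigma=0.1, seed=0):
    """Basic MAP-Elites on [0, 1]^dim with a 1-D behaviour grid.

    `descriptor` must return a value in [0, 1]; each grid cell keeps only the
    best solution (the elite) found for that behaviour so far.
    """
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)
    for it in range(iterations):
        if it < n_init or not archive:
            # Bootstrap the archive with random solutions.
            candidate = [rng.random() for _ in range(dim)]
        else:
            # Pick a random elite and perturb it with clipped Gaussian noise.
            _, parent = rng.choice(list(archive.values()))
            candidate = [min(1.0, max(0.0, x + rng.gauss(0.0, sigma)))
                         for x in parent]
        cell = min(int(descriptor(candidate) * n_bins), n_bins - 1)
        score = fitness(candidate)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, candidate)
    return archive


def sphere(x):
    """Toy fitness: higher is better, maximal at the centre of the cube."""
    return -sum((xi - 0.5) ** 2 for xi in x)


def first_coord(x):
    """Toy behaviour descriptor: the first coordinate of the solution."""
    return x[0]


archive = map_elites(sphere, first_coord, dim=3, n_bins=10, iterations=500)
```

The result is a collection of diverse elites, one per behaviour cell, instead of a single best solution.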

Paper
