64 result(s) for "Colas, Cédric"
Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: A Short Survey
Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has led to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem: the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skill acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We close the paper by discussing some open challenges in the quest for intrinsically motivated skill acquisition.
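The framework above has an autotelic agent select which of its own goals to practise next. One criterion used in parts of this literature is absolute learning progress (LP): goals whose success rate is changing fastest get sampled more often. The sketch below is a minimal illustration under that assumption only; the class `LPGoalSampler`, its sliding-window LP estimate and the `epsilon` exploration rate are hypothetical choices, not the survey's framework.

```python
import random
from collections import deque


class LPGoalSampler:
    """Hypothetical goal sampler that favours goals with high absolute learning progress.

    Learning progress (LP) for a goal is estimated as the absolute difference between
    the success rate over the most recent `window` attempts and over the `window`
    attempts before that.
    """

    def __init__(self, goals, window=20, epsilon=0.2, seed=0):
        self.goals = list(goals)
        self.window = window
        self.epsilon = epsilon  # probability of picking a goal uniformly at random
        self.history = {g: deque(maxlen=2 * window) for g in self.goals}
        self.rng = random.Random(seed)

    def update(self, goal, success):
        # Record the outcome of the goal-achievement function for one attempt.
        self.history[goal].append(1.0 if success else 0.0)

    def learning_progress(self, goal):
        h = list(self.history[goal])
        if len(h) < 2 * self.window:
            return 0.0  # not enough attempts yet to compare two windows
        older, recent = h[: self.window], h[self.window:]
        return abs(sum(recent) - sum(older)) / self.window

    def sample(self):
        lps = [self.learning_progress(g) for g in self.goals]
        if sum(lps) == 0.0 or self.rng.random() < self.epsilon:
            return self.rng.choice(self.goals)
        return self.rng.choices(self.goals, weights=lps, k=1)[0]
```

Taking the absolute value means a goal is also revisited when its success rate drops (forgetting), and the `epsilon` fraction of uniform draws keeps goals with no measured progress from being abandoned entirely.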
Language and culture internalization for human-like autotelic AI
Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals: autotelic agents. But despite recent progress, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalization or skill composition. This Perspective calls for the immersion of autotelic agents into rich socio-cultural worlds, an immensely important attribute of our environment that shapes human cognition but is mostly omitted in modern AI. Inspired by the seminal work of Vygotsky, we propose Vygotskian autotelic agents: agents able to internalize their interactions with others and turn them into cognitive tools. We focus on language and show how its structure and informational content may support the development of new cognitive functions in artificial agents as it does in humans. We justify the approach by uncovering several examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, we highlight future opportunities and challenges for Vygotskian autotelic AI research, including the use of language models as cultural models supporting artificial cognitive development. A goal of AI is to develop autonomous artificial agents with a wide set of skills. The authors propose the immersion of intrinsically motivated agents within rich socio-cultural worlds, focusing on language as a way for artificial agents to develop new cognitive functions.
EpidemiOptim: A Toolbox for the Optimization of Control Policies in Epidemiological Models
Modeling the dynamics of epidemics helps to propose control strategies based on pharmaceutical and non-pharmaceutical interventions (contact limitation, lockdown, vaccination, etc.). Hand-designing such strategies is not trivial because of the number of possible interventions and the difficulty of predicting long-term effects. This task can be cast as an optimization problem where state-of-the-art machine learning methods such as deep reinforcement learning might bring significant value. However, the specificity of each domain (epidemic modeling or solving optimization problems) requires strong collaborations between researchers from different fields of expertise. This is why we introduce EpidemiOptim, a Python toolbox that facilitates collaborations between researchers in epidemiology and optimization. EpidemiOptim turns epidemiological models and cost functions into optimization problems via a standard interface commonly used by optimization practitioners (OpenAI Gym). Reinforcement learning algorithms based on Q-Learning with deep neural networks (DQN) and evolutionary algorithms (NSGA-II) are already implemented. We illustrate the use of EpidemiOptim to find optimal policies for dynamical on-off lockdown control under the optimization of the death toll and economic recession using a Susceptible-Exposed-Infectious-Removed (SEIR) model for COVID-19. Using EpidemiOptim and its interactive visualization platform in Jupyter notebooks, epidemiologists, optimization practitioners and others (e.g. economists) can easily compare epidemiological models, cost functions and optimization algorithms to address important choices to be made by health decision-makers. Trained models can be explored by experts and non-experts via a web interface. This article is part of the special track on AI and COVID-19.
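To make the "epidemic model as optimization problem" framing concrete, here is a minimal sketch of a discrete-time SEIR model with a binary on/off lockdown action and two costs (deaths and economic). It only loosely mimics a Gym-style `reset()`/`step()` convention and does not use EpidemiOptim's API or the `gym` package; the names `SEIRParams`, `ToySEIREnv` and `rollout` and all parameter values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class SEIRParams:
    # Illustrative values only: not calibrated to COVID-19 and not taken from EpidemiOptim.
    beta: float = 0.5              # transmission rate per day without lockdown
    sigma: float = 1 / 5           # 1 / mean incubation period (days)
    gamma: float = 1 / 7           # 1 / mean infectious period (days)
    lockdown_effect: float = 0.7   # fraction of transmission removed by lockdown
    ifr: float = 0.005             # fraction of removed individuals who die
    population: float = 1_000_000.0


class ToySEIREnv:
    """Hypothetical environment with a Gym-like reset()/step() convention.

    step(lockdown) returns (state, cost, done, info), where cost mixes a death
    cost and an economic cost with a fixed weight.
    """

    def __init__(self, params=None, horizon=200, weight=0.5):
        self.p = params or SEIRParams()
        self.horizon = horizon
        self.weight = weight

    def reset(self, initial_infected=100.0):
        self.t = 0
        self.state = (self.p.population - initial_infected, 0.0, initial_infected, 0.0)
        return self.state

    def step(self, lockdown):
        s, e, i, r = self.state
        p = self.p
        beta = p.beta * (1 - p.lockdown_effect) if lockdown else p.beta
        new_exposed = beta * s * i / p.population   # S -> E
        new_infectious = p.sigma * e                 # E -> I
        new_removed = p.gamma * i                    # I -> R
        self.state = (s - new_exposed,
                      e + new_exposed - new_infectious,
                      i + new_infectious - new_removed,
                      r + new_removed)
        self.t += 1
        deaths = p.ifr * new_removed
        economic = 1.0 if lockdown else 0.0  # one cost unit per lockdown day
        cost = (1 - self.weight) * deaths + self.weight * economic
        done = self.t >= self.horizon
        return self.state, cost, done, {"deaths": deaths, "economic": economic}


def rollout(env, policy):
    """Run one episode; policy(t, state) -> bool (True means lockdown on)."""
    state, done = env.reset(), False
    totals = {"deaths": 0.0, "economic": 0.0}
    while not done:
        state, _, done, info = env.step(policy(env.t, state))
        for key in totals:
            totals[key] += info[key]
    return totals
```

A policy such as `lambda t, s: 30 <= t < 90` switches lockdown on between days 30 and 89. An optimizer (DQN, NSGA-II or any other method) would search over such policies to trade off the two returned totals.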
Convolutional neural network, personalised, closed-loop Brain-Computer Interfaces for multi-way control mode switching in real-time
Exoskeletons and robotic devices are, for many people with motor disabilities, the only way to interact with their environment. Our lab previously developed a gaze-guided assistive robotic system for grasping. It is well known that the same natural task can require different interactions described by different dynamical systems, which would require different robotic controllers and their selection by the user in a self-paced way. Therefore, we investigated different ways to achieve transitions between multiple states, finding that eye blinks were the most reliable way to transition from off to control modes (binary classification) compared to voice and electromyography. In this paper we expand on this work by investigating brain signals as sources for control mode switching. We developed a Brain-Computer Interface (BCI) that allows users to switch between four control modes in a self-paced way in real time. Since the system is intended for use in domestic environments in a user-friendly way, we selected non-invasive electroencephalographic (EEG) signals and convolutional neural networks (ConvNets), known for their capability to find the optimal features for a classification task, which we hypothesised would add flexibility to the system in terms of which mental activities the user could perform to control it. We tested our system using the Cybathlon BrainRunners computer game, which represents all the challenges inherent to real-time control. Our preliminary results show that an efficient architecture (SmallNet), composed of a convolutional layer, a fully connected layer and a sigmoid classification layer, is able to classify 4 mental activities that the user chose to perform. For the user's preferred mental activities, we ran and validated the system online and retrained it using EEG data collected online. We achieved 47.6% accuracy in online operation in the 4-way classification task.
In particular, we found that models trained with online-collected data better predicted the behaviour of the system in real time, suggesting, as a side note, that similar (ConvNet-based) offline classification methods in the literature might suffer a drop in performance when applied online. To the best of our knowledge, this is the first time such an architecture has been tested in an online operation task. Compared with our previous blink-based method, this approach reduces accuracy by a factor of less than two (1.6 times) but doubles the number of states among which the user can transition, opening the opportunity for finer, self-paced control of the specific subtasks that compose natural grasping.
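The SmallNet description (one convolutional layer, one fully connected layer, a sigmoid classification layer over 4 mental activities) can be read as the forward pass sketched below. This is a hypothetical interpretation, not the authors' code: the class name `SmallNetSketch`, the temporal-convolution layout over a `[channels][time]` EEG window, the ReLU activations, the layer sizes and the random initialisation are all assumptions, and training is omitted. Pure Python is used only to keep the example self-contained; a real implementation would use a deep-learning framework.

```python
import math
import random


def conv1d_relu(x, kernels, biases):
    """Valid temporal convolution of a [channels][time] EEG window, followed by ReLU."""
    n_channels, n_time = len(x), len(x[0])
    out = []
    for w, b in zip(kernels, biases):  # w has shape [channels][kernel_size]
        k = len(w[0])
        row = []
        for t in range(n_time - k + 1):
            acc = b + sum(w[c][j] * x[c][t + j] for c in range(n_channels) for j in range(k))
            row.append(max(0.0, acc))
        out.append(row)
    return out


def dense(v, weights, biases):
    """Fully connected layer: one weight row per output unit."""
    return [b + sum(wi * vi for wi, vi in zip(w, v)) for w, b in zip(weights, biases)]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SmallNetSketch:
    """Hypothetical conv -> fully connected -> sigmoid classifier for 4 mental activities.

    Layer sizes, ReLU activations and random initialisation are assumptions.
    """

    def __init__(self, n_channels, n_time, n_filters=4, kernel=5, n_hidden=16, n_classes=4, seed=0):
        rng = random.Random(seed)
        self.kernels = [rand_matrix(rng, n_channels, kernel) for _ in range(n_filters)]
        self.conv_b = [0.0] * n_filters
        n_features = n_filters * (n_time - kernel + 1)
        self.fc_w, self.fc_b = rand_matrix(rng, n_hidden, n_features), [0.0] * n_hidden
        self.out_w, self.out_b = rand_matrix(rng, n_classes, n_hidden), [0.0] * n_classes

    def forward(self, x):
        features = [v for row in conv1d_relu(x, self.kernels, self.conv_b) for v in row]
        hidden = [max(0.0, h) for h in dense(features, self.fc_w, self.fc_b)]
        logits = dense(hidden, self.out_w, self.out_b)
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # one sigmoid score per class

    def predict(self, x):
        scores = self.forward(x)
        return max(range(len(scores)), key=scores.__getitem__)
```

With one independent sigmoid per class, the predicted control mode here is simply the class with the highest score.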
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
Many program synthesis tasks prove too challenging for even state-of-the-art language models to solve in single attempts. Search-based evolutionary methods offer a promising alternative by exploring solution spaces iteratively, but their effectiveness remains limited by the fixed capabilities of the underlying generative model. We propose SOAR, a method that learns program synthesis by integrating language models into a self-improving evolutionary loop. SOAR alternates between (1) an evolutionary search that uses an LLM to sample and refine candidate solutions, and (2) a hindsight learning phase that converts search attempts into valid problem-solution pairs used to fine-tune the LLM's sampling and refinement capabilities, enabling increasingly effective search in subsequent iterations. On the challenging ARC-AGI benchmark, SOAR achieves significant performance gains across model scales and iterations, leveraging positive transfer between the sampling and refinement fine-tuning tasks. These improvements carry over to test-time adaptation, enabling SOAR to solve 52% of the public test set. Our code is open-sourced at: https://github.com/flowersteam/SOAR
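The hindsight learning phase rests on a simple observation: a candidate program that fails the original task still solves, exactly, the task whose outputs are the ones it actually produced. The sketch below illustrates only that relabelling step under stated assumptions; grids are tuples of tuples, each attempt is a hypothetical `(source_code, callable)` pair, and the names `Task`, `solves` and `hindsight_relabel` are illustrative, not SOAR's code. The fine-tuning step that consumes the pairs is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = Tuple[Tuple[int, ...], ...]


@dataclass
class Task:
    inputs: List[Grid]
    outputs: List[Grid]


def solves(program, task):
    """True if the program maps every task input to the expected output."""
    try:
        return all(program(x) == y for x, y in zip(task.inputs, task.outputs))
    except Exception:
        return False


def hindsight_relabel(task, attempts):
    """Turn search attempts into (task, solution) pairs that are correct by construction.

    `attempts` is a list of (source_code, callable) pairs produced during search.
    A wrong program still solves the task whose outputs are its own outputs.
    """
    pairs = []
    for source, program in attempts:
        try:
            produced = [program(x) for x in task.inputs]
        except Exception:
            continue  # crashing programs yield no usable pair
        pairs.append((Task(inputs=task.inputs, outputs=produced), source))
    return pairs
```

A program that does solve the original task is relabelled onto the original task itself, so successful and failed attempts both become valid training pairs.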
Language and Culture Internalisation for Human-Like Autotelic AI
Building autonomous agents able to grow open-ended repertoires of skills across their lives is a fundamental goal of artificial intelligence (AI). A promising developmental approach recommends the design of intrinsically motivated agents that learn new skills by generating and pursuing their own goals: autotelic agents. But despite recent progress, existing algorithms still show serious limitations in terms of goal diversity, exploration, generalisation or skill composition. This perspective calls for the immersion of autotelic agents into rich socio-cultural worlds, an immensely important attribute of our environment that shapes human cognition but is mostly omitted in modern AI. Inspired by the seminal work of Vygotsky, we propose Vygotskian autotelic agents: agents able to internalise their interactions with others and turn them into cognitive tools. We focus on language and show how its structure and informational content may support the development of new cognitive functions in artificial agents as it does in humans. We justify the approach by uncovering several examples of new artificial cognitive functions emerging from interactions between language and embodiment in recent works at the intersection of deep reinforcement learning and natural language processing. Looking forward, we highlight future opportunities and challenges for Vygotskian autotelic AI research, including the use of language models as cultural models supporting artificial cognitive development.
Goal-Conditioned Agents that Learn Everything All at Once
A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal. All-goals learning, where each transition is used for learning off-policy with respect to every goal, allows agents to extract maximal information; however, it is usually computationally infeasible when done via naive relabelling. This can be overcome by jointly outputting values and actions for every goal at once, allowing for efficient, parallel all-goals updates with a single pass through the network, in a process we call Learning Everything All at Once (LEO). We show that this approach significantly outperforms other methods on goal-conditioned Craftax and is competitive with existing baselines on continuous control environments, while achieving a >250x speed-up compared to all-goals relabelling. We then show that this approach can be made even more powerful by using LEO as a teacher network, rather than a direct actor. We hope that, by unlocking all-goals learning at scale, LEO can serve as a useful tool for RL practitioners in complex environments. We open source our code.
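All-goals learning reuses every transition as an off-policy update for every goal. The sketch below shows the idea in its simplest tabular form on a hypothetical 5-state chain where every state is also a goal; it is not LEO. In LEO a network produces values for all goals in a single pass, whereas here a Python loop over goals in `all_goals_q_update` plays that role. The environment, the reward rule (1 on reaching the goal, then termination) and the hyperparameters are assumptions made for illustration.

```python
import random

N_STATES = 5
MOVES = (-1, +1)  # action 0 moves left, action 1 moves right


def env_step(s, a):
    return min(max(s + MOVES[a], 0), N_STATES - 1)


def all_goals_q_update(Q, s, a, s_next, goals, alpha=0.5, gamma=0.9):
    """Update Q[s][a][g] for every goal g from a single transition (s, a, s_next).

    Assumed goal-achievement rule: reward 1 and termination when s_next == g.
    """
    for g in goals:
        if s_next == g:
            target = 1.0
        else:
            target = gamma * max(Q[s_next][b][g] for b in range(len(MOVES)))
        Q[s][a][g] += alpha * (target - Q[s][a][g])


def train(n_steps=5000, seed=0):
    rng = random.Random(seed)
    goals = range(N_STATES)  # every state is also a goal
    Q = [[[0.0 for _ in goals] for _ in MOVES] for _ in range(N_STATES)]
    s = 0
    for _ in range(n_steps):
        a = rng.randrange(len(MOVES))  # random behaviour policy: learning is off-policy
        s_next = env_step(s, a)
        all_goals_q_update(Q, s, a, s_next, goals)
        s = s_next
    return Q


def greedy(Q, s, g):
    return max(range(len(MOVES)), key=lambda a: Q[s][a][g])
```

Even though the behaviour policy never commands a goal, a single random walk yields a greedy policy for every goal, which is the information that on-policy updates for one commanded goal would discard.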
Towards Teachable Autotelic Agents
Autonomous discovery and direct instruction are two distinct sources of learning in children, but education sciences demonstrate that mixed approaches such as assisted discovery or guided play result in improved skill acquisition. In the field of Artificial Intelligence, these extremes respectively map to autonomous agents learning from their own signals and interactive learning agents fully taught by their teachers. In between should stand teachable autotelic agents (TAA): agents that learn from both internal and teaching signals to benefit from the higher efficiency of assisted discovery. Designing such agents will enable real-world non-expert users to orient the learning trajectories of agents towards their expectations. More fundamentally, this may also be a key step to build agents with human-level intelligence. This paper presents a roadmap towards the design of teachable autonomous agents. Building on developmental psychology and education sciences, we start by identifying key features enabling assisted discovery processes in child-tutor interactions. This leads to the production of a checklist of features that future TAA will need to demonstrate. The checklist allows us to precisely pinpoint the various limitations of current reinforcement learning agents and to identify the promising first steps towards TAA. It also shows the way forward by highlighting key research directions towards the design of autonomous agents that can be taught by ordinary people via natural pedagogy.
Scaling MAP-Elites to Deep Neuroevolution
Quality-Diversity (QD) algorithms, and MAP-Elites (ME) in particular, have proven very useful for a broad range of applications including enabling real robots to recover quickly from joint damage, solving strongly deceptive maze tasks or evolving robot morphologies to discover new gaits. However, present implementations of MAP-Elites and other QD algorithms seem to be limited to low-dimensional controllers with far fewer parameters than modern deep neural network models. In this paper, we propose to leverage the efficiency of Evolution Strategies (ES) to scale MAP-Elites to high-dimensional controllers parameterized by large neural networks. We design and evaluate a new hybrid algorithm called MAP-Elites with Evolution Strategies (ME-ES) for post-damage recovery in a difficult high-dimensional control task where traditional ME fails. Additionally, we show that ME-ES performs efficient exploration, on par with state-of-the-art exploration algorithms in high-dimensional control tasks with strongly deceptive rewards.
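MAP-Elites keeps an archive holding the best solution found in each cell of a behaviour-descriptor grid; ME-ES replaces MAP-Elites' usual random mutation with an Evolution Strategies (ES) step that estimates a gradient from Gaussian perturbations. The sketch below is a deliberately tiny illustration of that combination, assuming a 10-parameter "controller", a quadratic fitness and a descriptor built from the first two parameters (all hypothetical). Only a fitness-following ES step with antithetic (mirrored) sampling is shown; exploration-oriented steps, for example following novelty rather than fitness, are not reproduced.

```python
import random


def es_gradient(f, theta, rng, sigma=0.1, n_pairs=20):
    """Antithetic Evolution Strategies estimate of the gradient of f at theta."""
    d = len(theta)
    grad = [0.0] * d
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(d)]
        f_plus = f([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = f([t - sigma * e for t, e in zip(theta, eps)])
        for i in range(d):
            grad[i] += (f_plus - f_minus) * eps[i]
    return [g / (2 * n_pairs * sigma) for g in grad]


def fitness(theta):
    return -sum(t * t for t in theta)  # hypothetical objective, maximal at the origin


def descriptor(theta, bins=5, low=-2.0, high=2.0):
    """Hypothetical behaviour descriptor: grid cell of the first two parameters."""
    cell = []
    for t in theta[:2]:
        u = (min(max(t, low), high - 1e-9) - low) / (high - low)
        cell.append(int(u * bins))
    return tuple(cell)


def me_es(dim=10, iterations=200, lr=0.5, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, theta): one elite per behaviour cell

    def try_insert(theta):
        cell, fit = descriptor(theta), fitness(theta)
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, theta)

    for _ in range(5):  # random initial controllers
        try_insert([rng.uniform(-2.0, 2.0) for _ in range(dim)])
    for _ in range(iterations):
        _, parent = archive[rng.choice(list(archive))]  # uniform selection over elites
        grad = es_gradient(fitness, parent, rng)
        child = [t + lr * g for t, g in zip(parent, grad)]  # ES step instead of random mutation
        try_insert(child)
    return archive
```

The archive-insertion rule is the usual MAP-Elites one; only the way offspring are generated changes, which is what lets the approach scale to parameter counts where random mutation becomes ineffective.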