43 result(s) for "Merel, Josh"
Hierarchical motor control in mammals and machines
Advances in artificial intelligence are stimulating interest in neuroscience. However, most attention is given to discrete tasks with simple action spaces, such as board games and classic video games. Less discussed in neuroscience are parallel advances in “synthetic motor control”. While motor neuroscience has recently focused on optimization of single, simple movements, AI has progressed to the generation of rich, diverse motor behaviors across multiple tasks, at humanoid scale. It is becoming clear that specific, well-motivated hierarchical design elements repeatedly arise when engineering these flexible control systems. We review these core principles of hierarchical control, relate them to hierarchy in the nervous system, and highlight research themes that we anticipate will be critical in solving challenges at this disciplinary intersection. Recent research in motor neuroscience has focused on optimal feedback control of single, simple tasks while robotics and AI are making progress towards flexible movement control in complex environments employing hierarchical control strategies. Here, the authors argue for a return to hierarchical models of motor control in neuroscience.
Sensing flow gradients is necessary for learning autonomous underwater navigation
Aquatic animals are much better at underwater navigation than robotic vehicles. Robots face major challenges in deep water because of their limited access to global positioning signals and flow maps. These limitations, and the changing nature of water currents, support the use of reinforcement learning approaches, where the navigator learns through trial-and-error interactions with the flow environment. But is it feasible to learn underwater navigation in the agent’s Umwelt, without any land references? Here, we tasked an artificial swimmer with learning to reach a specific destination in unsteady flows by relying solely on egocentric observations, collected through on-board flow sensors in the agent’s body frame, with no reference to a geocentric inertial frame. We found that while sensing local flow velocities is sufficient for geocentric navigation, successful egocentric navigation requires additional information of local flow gradients. Importantly, egocentric navigation strategies obey rotational symmetry and are more robust in unfamiliar conditions and flows not experienced during training. Our work expands underwater robot-centric learning, helps explain why aquatic organisms have arrays of flow sensors that detect gradients, and provides physics-based guidelines for transfer learning of learned policies to unfamiliar and diverse flow environments. Aquatic animals outperform robotic vehicles in underwater navigation due to robots’ limited access to GPS and flow maps in deep water. The authors report that to successfully learn navigation, an agent must sense both local flows and flow gradients, enabling adaptable and robust policies under unfamiliar conditions.
Spatiotemporal receptive fields of barrel cortex revealed by reverse correlation of synaptic input
To investigate the sensory contributions of barrel cortex, the authors estimate spatiotemporal receptive fields by reverse correlation of multi-whisker stimulation to synaptic inputs. Complex stimuli revealed dramatically sharpened receptive fields, largely due to adaptation, and suggest the potential importance of surround facilitation through adaptation for discriminating complex shapes and textures during natural sensing. Of all of the sensory areas, barrel cortex is among the best understood in terms of circuitry, yet least understood in terms of sensory function. We combined intracellular recording in rats with a multi-directional, multi-whisker stimulator system to estimate receptive fields by reverse correlation of stimuli to synaptic inputs. Spatiotemporal receptive fields were identified orders of magnitude faster than by conventional spike-based approaches, even for neurons with little spiking activity. Given a suitable stimulus representation, a linear model captured the stimulus-response relationship for all neurons with high accuracy. In contrast with conventional single-whisker stimuli, complex stimuli revealed markedly sharpened receptive fields, largely as a result of adaptation. This phenomenon allowed the surround to facilitate rather than to suppress responses to the principal whisker. Optimized stimuli enhanced firing in layers 4–6, but not in layers 2/3, which remained sparsely active. Surround facilitation through adaptation may be required for discriminating complex shapes and textures during natural sensing.
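The reverse-correlation approach described in this abstract can be illustrated with a generic ridge-regularized linear estimate of a spatiotemporal filter from white-noise stimuli. The dimensions, noise level, and simulated responses below are illustrative choices for a minimal sketch, not the paper's actual stimulus system or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 5 "whiskers" x 8 time lags
n_whiskers, n_lags, n_samples = 5, 8, 5000

# White-noise multi-whisker stimulus and a known ground-truth filter
stim = rng.standard_normal((n_samples, n_whiskers))
true_rf = rng.standard_normal((n_lags, n_whiskers)) * np.hanning(n_lags)[:, None]

# Design matrix of stimulus history (one row per time point)
X = np.stack([np.roll(stim, lag, axis=0) for lag in range(n_lags)], axis=1)
X[:n_lags] = 0.0                       # discard wrap-around history
X = X.reshape(n_samples, -1)

# Simulated continuous (synaptic-input-like) response: filter output plus noise
y = X @ true_rf.ravel() + 0.5 * rng.standard_normal(n_samples)

# Ridge-regularized reverse correlation (whitened stimulus-response average)
lam = 1.0
rf_hat = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
rf_hat = rf_hat.reshape(n_lags, n_whiskers)

corr = np.corrcoef(rf_hat.ravel(), true_rf.ravel())[0, 1]
print(f"correlation with ground truth: {corr:.3f}")
```

Because the regression target is a continuous input signal rather than sparse spikes, the filter is recoverable from far less data, which is the efficiency argument the abstract makes for synaptic (rather than spike-based) reverse correlation.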
Encoder-Decoder Optimization for Brain-Computer Interfaces
Neuroprosthetic brain-computer interfaces are systems that decode neural activity into useful control signals for effectors, such as a cursor on a computer screen. It has long been recognized that both the user and decoding system can adapt to increase the accuracy of the end effector. Co-adaptation is the process whereby a user learns to control the system in conjunction with the decoder adapting to learn the user's neural patterns. We provide a mathematical framework for co-adaptation and relate co-adaptation to the joint optimization of the user's control scheme ("encoding model") and the decoding algorithm's parameters. When the assumptions of that framework are respected, co-adaptation cannot yield better performance than that obtainable by an optimal initial choice of fixed decoder, coupled with optimal user learning. For a specific case, we provide numerical methods to obtain such an optimized decoder. We demonstrate our approach in a model brain-computer interface system using an online prosthesis simulator, a simple human-in-the-loop psychophysics setup which provides a non-invasive simulation of the BCI setting. These experiments support two claims: that users can learn encoders matched to fixed, optimal decoders and that, once learned, our approach yields expected performance advantages.
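The notion of a decoder matched to a fixed encoder can be sketched with a linear-Gaussian toy model. The encoder, dimensions, and noise level below are our own illustrative assumptions, not the paper's formal framework:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear-Gaussian BCI: the user encodes a 2-D intended velocity v into
# n_neurons firing rates r = E v + noise; a fixed linear decoder D
# reconstructs v_hat = D r.
n_neurons, n_dim, n_trials = 20, 2, 4000
E = rng.standard_normal((n_neurons, n_dim))        # user's encoding model
noise_sd = 1.0

V = rng.standard_normal((n_trials, n_dim))         # intended velocities
R = V @ E.T + noise_sd * rng.standard_normal((n_trials, n_neurons))

# Least-squares decoder matched to this fixed encoder
D = np.linalg.lstsq(R, V, rcond=None)[0].T         # shape (n_dim, n_neurons)
mse_opt = np.mean((R @ D.T - V) ** 2)

# A mismatched (random) decoder for comparison
D_bad = rng.standard_normal(D.shape)
mse_bad = np.mean((R @ D_bad.T - V) ** 2)
print(f"matched decoder MSE: {mse_opt:.3f}, mismatched: {mse_bad:.3f}")
```

In this toy setting no amount of later co-adaptation of `D` can beat the matched decoder, which mirrors the abstract's claim that an optimal fixed decoder plus user learning bounds co-adaptive performance.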
Neuroprosthetic Decoder Training as Imitation Learning
Neuroprosthetic brain-computer interfaces function via an algorithm which decodes neural activity of the user into movements of an end effector, such as a cursor or robotic arm. In practice, the decoder is often learned by updating its parameters while the user performs a task. When the user's intention is not directly observable, recent methods have demonstrated value in training the decoder against a surrogate for the user's intended movement. Here we show that training a decoder in this way is a novel variant of an imitation learning problem, where an oracle or expert is employed for supervised training in lieu of direct observations, which are not available. Specifically, we describe how a generic imitation learning meta-algorithm, dataset aggregation (DAgger), can be adapted to train a generic brain-computer interface. By deriving existing learning algorithms for brain-computer interfaces in this framework, we provide a novel analysis of regret (an important metric of learning efficacy) for brain-computer interfaces. This analysis allows us to characterize the space of algorithmic variants and bounds on their regret rates. Existing approaches for decoder learning have been performed in the cursor control setting, but the available design principles for these decoders are such that it has been impossible to scale them to naturalistic settings. Leveraging our findings, we then offer an algorithm that combines imitation learning with optimal control, which should allow for training of arbitrary effectors for which optimal control can generate goal-oriented control. We demonstrate this novel and general BCI algorithm with simulated neuroprosthetic control of a 26 degree-of-freedom model of an arm, a sophisticated and realistic end effector.
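The dataset aggregation (DAgger) meta-algorithm the abstract builds on can be sketched in a toy scalar control problem. The expert policy, dynamics, and student model below are illustrative stand-ins, not the paper's BCI decoder:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy DAgger loop: the expert's action is -0.5 * state; the student fits a
# linear gain on states visited under its OWN policy, relabeled by the expert.

def expert(s):
    return -0.5 * s

def rollout(gain, s0=2.0, steps=20):
    """Run the student policy and record the states it visits."""
    states, s = [], s0
    for _ in range(steps):
        states.append(s)
        s = s + gain * s + 0.05 * rng.standard_normal()
    return np.array(states)

# DAgger: aggregate expert-labeled data from student rollouts, then refit.
X, Y = np.empty(0), np.empty(0)
gain = 0.0                                   # initial (poor) student
for _ in range(5):
    states = rollout(gain)
    X = np.concatenate([X, states])
    Y = np.concatenate([Y, expert(states)])  # expert relabels visited states
    gain = float(np.dot(X, Y) / np.dot(X, X))  # least-squares refit

print(f"learned gain: {gain:.3f}")
```

Training on the student's own state distribution, rather than only on expert trajectories, is what gives DAgger its regret guarantees; the abstract's contribution is recasting surrogate-intention decoder training as exactly this kind of loop.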
A virtual rodent predicts the structure of neural activity across behaviours
Animals have exquisite control of their bodies, allowing them to perform a diverse range of behaviours. How such control is implemented by the brain, however, remains unclear. Advancing our understanding requires models that can relate principles of control to the structure of neural activity in behaving animals. Here, to facilitate this, we built a ‘virtual rodent’, in which an artificial neural network actuates a biomechanically realistic model of the rat [1] in a physics simulator [2]. We used deep reinforcement learning [3-5] to train the virtual agent to imitate the behaviour of freely moving rats, thus allowing us to compare neural activity recorded in real rats to the network activity of a virtual rodent mimicking their behaviour. We found that neural activity in the sensorimotor striatum and motor cortex was better predicted by the virtual rodent’s network activity than by any features of the real rat’s movements, consistent with both regions implementing inverse dynamics [6]. Furthermore, the network’s latent variability predicted the structure of neural variability across behaviours and afforded robustness in a way consistent with the minimal intervention principle of optimal feedback control [7]. These results demonstrate how physical simulation of biomechanically realistic virtual animals can help interpret the structure of neural activity across behaviour and relate it to theoretical principles of motor control. We built an artificial neural network to control a biomechanically realistic virtual rodent, which, when trained to imitate real rats, predicts neural activity and variability across natural behaviours.
Flow Currents Support Simple and Versatile Trail-Tracking Strategies
Aquatic animals offer compelling evidence that flow sensing alone, without vision, is sufficient to guide a swimming organism to the source of an unsteady hydrodynamic trail. However, the sensory feedback strategies that allow these remarkable trail tracking abilities remain opaque. Here, by integrating mechanistic flow simulations with reinforcement learning techniques, we discovered two simple and equally effective strategies for hydrodynamic trail following. Though not a priori obvious, these strategies possess parsimonious interpretations, analogous to Braitenberg’s simplest vehicles, where the agent senses local flow signals and turns away from or toward the direction of stronger signals. A rigorous stability analysis shows that the effectiveness of these strategies in robustly tracking flow currents is independent of the type of sensor but depends on sensor placement and the traveling nature of the flow signal. Importantly, these results inform a suite of versatile strategies for hydrodynamic trail following applicable to both vortical and turbulent flows. These insights support the future design and implementation of adaptive real-time sensory feedback strategies for autonomous robots in dynamic flow environments.
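The "turn toward the stronger signal" strategy class the abstract likens to Braitenberg's vehicles can be illustrated with a minimal two-sensor agent in a static scalar field. The field, gains, and geometry below are our own illustrative choices, not the paper's flow simulations:

```python
import math

# Braitenberg-style sketch: two lateral sensors sample a scalar field peaked
# at the origin (a stand-in for the hydrodynamic signal), and the agent turns
# toward the side with the stronger reading.

def signal(x, y):
    return 1.0 / (1.0 + x * x + y * y)         # illustrative signal field

def step(x, y, heading, speed=0.1, offset=0.2, turn_gain=4.0):
    # sensor positions to the left/right of the heading
    lx = x + offset * math.cos(heading + math.pi / 2)
    ly = y + offset * math.sin(heading + math.pi / 2)
    rx = x + offset * math.cos(heading - math.pi / 2)
    ry = y + offset * math.sin(heading - math.pi / 2)
    # turn toward the stronger signal, then advance
    heading += turn_gain * (signal(lx, ly) - signal(rx, ry))
    return x + speed * math.cos(heading), y + speed * math.sin(heading), heading

x, y, heading = 1.0, 0.5, 0.0
min_dist = math.hypot(x, y)
for _ in range(200):
    x, y, heading = step(x, y, heading)
    min_dist = min(min_dist, math.hypot(x, y))

print(f"closest approach to source: {min_dist:.3f}")
```

As the abstract's stability analysis suggests, the behaviour of such a rule depends on sensor placement (`offset`) and the structure of the signal, which is easy to probe by varying those parameters in this sketch.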
Data augmentation for efficient learning from parametric experts
We present a simple, yet powerful data-augmentation technique to enable data-efficient learning from parametric experts for reinforcement and imitation learning. We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert or expert policy to inform the behavior of a student policy. This setting arises naturally in a number of problems, for instance as variants of behavior cloning, or as a component of other algorithms such as DAgger, policy distillation or KL-regularized RL. Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories, thus dramatically reducing the environment interactions required for successful cloning of the expert. We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degree-of-freedom control problems. We demonstrate the benefit of our method in the context of several existing and widely used algorithms that include policy cloning as a constituent part. Moreover, we highlight the benefits of our approach in two practically relevant settings (a) expert compression, i.e. transfer to a student with fewer parameters; and (b) transfer from privileged experts, i.e. where the expert has a different observation space than the student, usually including access to privileged information.
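The core synthetic-state idea can be sketched in one dimension: perturb each visited state, query the parametric expert at every perturbed state, and clone on the augmented set. The expert, scales, and student model here are our own illustrative assumptions, not the paper's APC implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parametric expert: a linear policy we can query anywhere.
def expert(s):
    return -0.8 * s

# A short trajectory of states visited by the student
traj = np.array([1.0, 0.4, 0.1])

# Augment each visited state with synthetic states in a neighbourhood,
# querying the expert at every synthetic state (no extra env interaction).
noise = 0.1 * rng.standard_normal((len(traj), 16))
states = (traj[:, None] + noise).ravel()
actions = expert(states)

# Behaviour cloning on the augmented dataset (least-squares linear student)
gain = float(np.dot(states, actions) / np.dot(states, states))
print(f"student gain: {gain:.3f}")
```

Because the expert is parametric and can be queried at arbitrary states, the augmentation costs no environment steps, which is where the claimed data efficiency comes from: the student also learns how the expert reacts *around* the trajectory, not just on it.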
Learning to swim in potential flow
Fish swim by undulating their bodies. These propulsive motions require coordinated shape changes of a body that interacts with its fluid environment, but the specific shape coordination that leads to robust turning and swimming motions remains unclear. To address the problem of underwater motion planning, we propose a simple model of a three-link fish swimming in a potential flow environment and we use model-free reinforcement learning for shape control. We arrive at optimal shape changes for two swimming tasks: swimming in a desired direction and swimming towards a known target. This fish model belongs to a class of problems in geometric mechanics, known as driftless dynamical systems, which allow us to analyze the swimming behavior in terms of geometric phases over the shape space of the fish. These geometric methods are less intuitive in the presence of drift. Here, we use the shape space analysis as a tool for assessing, visualizing, and interpreting the control policies obtained via reinforcement learning in the absence of drift. We then examine the robustness of these policies to drift-related perturbations. Although the fish has no direct control over the drift itself, it learns to take advantage of the presence of moderate drift to reach its target.
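The model-free reinforcement learning ingredient in the abstract can be illustrated with generic tabular Q-learning on a toy "move to a target" chain. This is a stand-in for the paper's shape-space control problem, with states, rewards, and hyperparameters chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Tabular Q-learning sketch: 1-D chain of 7 states, target at the right end.
n_states, target = 7, 6
Q = np.zeros((n_states, 2))               # actions: 0 = left, 1 = right
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                      # episodes from random start states
    s = int(rng.integers(n_states))
    for _ in range(20):
        # epsilon-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if s2 == target else 0.0
        # standard Q-learning temporal-difference update
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s2 == target:
            break

policy = Q.argmax(axis=1)
print(f"greedy policy: {policy}")
```

The paper's contribution is then to interpret such learned policies through geometric phases over the fish's shape space, a structure this generic sketch of course does not capture.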