Catalogue Search | MBRL

Hierarchical reinforcement learning for kinematic control tasks with parameterized action spaces

by Sun, Changyin; Dong, Lu; Cao, Jingyu in Algorithms, Artificial Intelligence, Computational Biology/Bioinformatics

2024

Most existing reinforcement learning (RL) algorithms apply only to scenarios with a purely discrete or a purely continuous action space. However, in certain real-world kinematic control tasks that involve robot control based on kinematic properties, the action space is parameterized: actions are represented by a combination of discrete actions and continuous parameters. In this paper, we propose a hierarchical RL architecture designed specifically for handling parameterized action spaces. Our architecture consists of two levels: the higher level (discrete actor network) selects the discrete action, and the lower level (continuous actor networks) determines the corresponding continuous parameters. These components work in tandem to generate an action-parameters vector used to interact with the environment. Both the higher and lower levels share the environmental reward feedback and the critic networks to update the network weights. The soft actor-critic and deep deterministic policy gradient algorithms are adopted to update the higher-level and lower-level policies, respectively. Through simulation experiments conducted on different kinematic control tasks with parameterized action spaces, we demonstrate the effectiveness of our proposed algorithm.
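The two-level action selection described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the names `select_action`, `ParameterizedAction`, `toy_logits`, and `toy_params` are hypothetical, and the toy functions stand in for the trained discrete and continuous actor networks.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

State = Sequence[float]


@dataclass
class ParameterizedAction:
    discrete: int        # index chosen by the higher level
    params: List[float]  # continuous parameters chosen by the lower level


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    state: State,
    discrete_logits_fn: Callable[[State], Sequence[float]],
    param_fns: Sequence[Callable[[State], Sequence[float]]],
    rng: random.Random,
    greedy: bool = False,
) -> ParameterizedAction:
    """Higher level picks a discrete action; lower level fills in its parameters."""
    probs = softmax(discrete_logits_fn(state))
    if greedy:
        k = max(range(len(probs)), key=probs.__getitem__)
    else:
        k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Only the parameter head that belongs to the chosen discrete action is used.
    return ParameterizedAction(k, list(param_fns[k](state)))


# Hypothetical stand-ins for trained actor networks (assumptions for illustration).
def toy_logits(state: State) -> List[float]:
    return [state[0], -state[0]]


toy_params = [
    lambda s: [math.tanh(s[0])],       # action 0 takes one parameter
    lambda s: [math.tanh(s[1]), 0.5],  # action 1 takes two parameters
]

action = select_action([1.0, -2.0], toy_logits, toy_params, random.Random(0), greedy=True)
```

Note that each discrete action may carry a different number of continuous parameters, which is why the sketch keeps one parameter function per discrete action.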

Journal Article

Smart Transparency: A User-Centered Approach to Improving Human–Machine Interaction in High-Risk Supervisory Control Tasks

by Hou, Wenjun; Wang, Keran; Hong, Leyi in Artificial intelligence, Automation, Cognition

2025

In supervisory control tasks, particularly in high-risk fields, operators need to collaborate with automated intelligent agents to manage dynamic, time-sensitive, and uncertain information. Effective human–agent collaboration relies on transparent interface communication to align with the operator’s cognition and enhance trust. This paper proposes a human-centered adaptive transparency information design framework (ATDF), which dynamically adjusts the display of transparency information based on the operator’s needs and the task type. This ensures that information is accurately conveyed at critical moments, thereby enhancing trust, task performance, and interface usability. Additionally, the paper introduces a novel user research method, Heu–Kano, to explore the prioritization of transparency needs and presents a model based on eye-tracking and machine learning to identify different types of human–agent interactions. This research provides new insights into human-centered explainability in supervisory control tasks.

Journal Article

Data-efficient model-based reinforcement learning with trajectory discrimination

by Zhang, Junge; Zhao, Bo; Qu, Tuo in Algorithms, Complexity, Computational Intelligence

2024

Deep reinforcement learning has been widely used to solve high-dimensional complex sequential decision problems. However, one of the biggest challenges for reinforcement learning is sample efficiency, especially for high-dimensional complex problems. Model-based reinforcement learning can address this problem with a learned world model, but its performance is limited by the imperfect world model, so it usually has worse approximate performance than model-free reinforcement learning. In this paper, we propose a novel model-based reinforcement learning algorithm called World Model with Trajectory Discrimination (WMTD). We learn the representation of temporal dynamics information by adding a trajectory discriminator to the world model, and then compute the weight of state value estimation based on the trajectory discriminator to optimize the policy. Specifically, we augment the trajectories to generate negative samples and train a trajectory discriminator that shares the feature extractor with the world model. Experimental results demonstrate that our method improves the sample efficiency and achieves state-of-the-art performance on DeepMind control tasks.
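The idea of augmenting trajectories to obtain negative samples for a discriminator can be illustrated with a small sketch. The abstract does not say which augmentation is used, so the time-order shuffle below is purely an assumption chosen for illustration, and `make_negative` and `build_batch` are hypothetical names rather than part of WMTD.

```python
import random
from typing import List, Sequence, Tuple

Trajectory = List[Sequence[float]]


def make_negative(traj: Trajectory, rng: random.Random) -> Trajectory:
    """Return a copy of `traj` with its time order shuffled (assumed augmentation)."""
    if len(traj) < 2:
        raise ValueError("need at least two steps to change the time order")
    order = list(range(len(traj)))
    identity = list(order)
    while order == identity:  # reshuffle until the order actually changes
        rng.shuffle(order)
    return [traj[i] for i in order]


def build_batch(trajs: List[Trajectory], rng: random.Random) -> List[Tuple[Trajectory, int]]:
    """Pair each real trajectory (label 1) with one shuffled negative (label 0)."""
    batch: List[Tuple[Trajectory, int]] = []
    for traj in trajs:
        batch.append((list(traj), 1))
        batch.append((make_negative(traj, rng), 0))
    return batch


rng = random.Random(0)
real = [[(0.0,), (1.0,), (2.0,)], [(5.0,), (6.0,), (7.0,), (8.0,)]]
batch = build_batch(real, rng)
```

A discriminator trained on such a batch would have to rely on temporal structure to tell real from shuffled trajectories, which is the kind of signal the abstract refers to as temporal dynamics information.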

Journal Article

Mastering diverse control tasks through world models

by Lillicrap, Timothy; Pasukonis, Jurgis; Hafner, Danijar in 639/705, 639/705/117, Algorithms

2025

Developing a general algorithm that learns to solve tasks across a wide range of applications has been a fundamental challenge in artificial intelligence. Although current reinforcement-learning algorithms can be readily applied to tasks similar to what they have been developed for, configuring them for new application domains requires substantial human expertise and experimentation [1, 2]. Here we present the third generation of Dreamer, a general algorithm that outperforms specialized methods across over 150 diverse tasks, with a single configuration. Dreamer learns a model of the environment and improves its behaviour by imagining future scenarios. Robustness techniques based on normalization, balancing and transformations enable stable learning across domains. Applied out of the box, Dreamer is, to our knowledge, the first algorithm to collect diamonds in Minecraft from scratch without human data or curricula. This achievement has been posed as a substantial challenge in artificial intelligence that requires exploring farsighted strategies from pixels and sparse rewards in an open world [3]. Our work allows solving challenging control problems without extensive experimentation, making reinforcement learning broadly applicable. A general reinforcement-learning algorithm, called Dreamer, outperforms specialized expert algorithms across diverse tasks by learning a model of the environment and improving its behaviour by imagining future scenarios.
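The abstract does not name the transformations it refers to. One transformation associated with the third generation of Dreamer is the symlog function, sign(x)·ln(|x| + 1), with inverse symexp; it compresses large magnitudes while staying close to the identity near zero. The sketch below is only an illustration of that pair of functions under this assumption, not a reproduction of the authors' code.

```python
import math


def symlog(x: float) -> float:
    """sign(x) * ln(|x| + 1): compresses large values, near-identity around 0."""
    return math.copysign(math.log1p(abs(x)), x)


def symexp(y: float) -> float:
    """Inverse of symlog: sign(y) * (exp(|y|) - 1)."""
    return math.copysign(math.expm1(abs(y)), y)


# Rewards of very different scales end up in a comparable range.
compressed = [symlog(r) for r in (-1000.0, -1.0, 0.0, 1.0, 1000.0)]
```

Because the function is symmetric and invertible, predictions made in the compressed space can be mapped back to the original scale with `symexp`.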

Journal Article

Man-in-the-Loop Control and Mission Planning for Unmanned Underwater Vehicles

by Han, Mengxue; Zhang, Qianqian; Wang, Hongjian in Aircraft, Algorithms, Autonomous underwater vehicles

2024

Unmanned underwater vehicles (UUVs) perform tasks in the marine environment under direction from a commander through a mother ship control system. For cases where communication is available, a UUV task re-planning system was designed to ensure task completion despite uncertain events faced by UUVs. First, XML is used to standardize the expression of UUV task elements. Second, considering the time sequence and spatial path planning requirements of human-supervised UUV control tasks, time sequence planning based on a genetic algorithm and spatial path planning based on an improved genetic algorithm were designed to plan near-optimal spatial paths for control tasks. Third, uncertainties encountered during UUV task execution were classified so that the commander could adjust according to the situation or invoke the control task re-planning algorithm to re-plan. Finally, a simulation platform was built using the Qt development environment to simulate human-supervised UUV control task planning and re-planning, verifying the effectiveness of the algorithm design.

Journal Article

Model-Free Quantum Control with Reinforcement Learning

by Sivak, V. V.; Liu, H.; Devoret, M. H. in Adaptive control, Bias, Control tasks

2022

Model bias is an inherent limitation of the current dominant approach to optimal quantum control, which relies on a system simulation for optimization of control policies. To overcome this limitation, we propose a circuit-based approach for training a reinforcement learning agent on quantum control tasks in a model-free way. Given a continuously parametrized control circuit, the agent learns its parameters through trial-and-error interaction with the quantum system, using measurement outcomes as the only source of information about the quantum state. Focusing on control of a harmonic oscillator coupled to an ancilla qubit, we show how to reward the learning agent with measurements of experimentally available observables. We train the agent to prepare various nonclassical states via both unitary control and control with adaptive measurement-based quantum feedback, and to execute logical gates on encoded qubits. The agent does not rely on averaging for state tomography or fidelity estimation, and significantly outperforms widely used model-free methods in terms of sample efficiency. Our numerical work is of immediate relevance to superconducting circuits and trapped ions platforms where such training can be implemented in experiment, allowing complete elimination of model bias and the adaptation of quantum control policies to the specific system in which they are deployed.

Journal Article

Research on Hybrid Force Control of Redundant Manipulator with Reverse Task Priority

by Su, Yu; Liu, Haiyan; Lin, Chunlan in Accuracy, Algorithms, Cartesian coordinates

2022

This paper presents reverse priority impedance control of manipulators, with reference to redundant robots performing a given task. The reverse priority kinematic control of redundant manipulators is first described in detail: the motion in the joint space is derived in the opposite order to that of the classical task priority–based solution. Cartesian impedance control is then combined with reverse priority impedance control to obtain reverse hierarchical impedance control, so that the Cartesian impedance behavior can be divided into primary priority impedance control and secondary priority impedance control. Furthermore, the secondary impedance control task does not disturb the primary impedance control task. The motion in the joint space is affected in the opposite order, acting through the corresponding projection operators. The primary impedance control tasks are implemented last, so as to avoid possible deformations caused by singularities occurring in the secondary impedance control tasks. Hence, the proposed reverse priority impedance control of the manipulator can achieve the desired impedance control tasks with the proper hierarchy. Simulation experiments on the manipulator verify the proposed reverse priority control algorithm.

Journal Article

Cognitive computational neuroscience

by Douglas, Pamela K; Kriegeskorte, Nikolaus in Artificial intelligence, Brain, Cognition

2018

To learn how cognition is implemented in the brain, we must build computational models that can perform cognitive tasks, and test such models with brain and behavioral experiments. Cognitive science has developed computational models that decompose cognition into functional components. Computational neuroscience has modeled how interacting neurons can implement elementary components of cognition. It is time to assemble the pieces of the puzzle of brain computation and to better integrate these separate disciplines. Modern technologies enable us to measure and manipulate brain activity in unprecedentedly rich ways in animals and humans. However, experiments will yield theoretical insight only when employed to test brain-computational models. Here we review recent work in the intersection of cognitive science, computational neuroscience and artificial intelligence. Computational models that mimic brain information processing during perceptual, cognitive and control tasks are beginning to be developed and tested with brain and behavioral data.

Journal Article

Expectation effects in working memory training

by Parong, Jocelyn; Jaeggi, Susanne M.; Green, C. Shawn in Cognition, Cognitive ability, Cognitive tasks

2022

There is a growing body of research focused on developing and evaluating behavioral training paradigms meant to induce enhancements in cognitive function. It has recently been proposed that one mechanism through which such performance gains could be induced involves participants’ expectations of improvement. However, no work to date has evaluated whether it is possible to cause changes in cognitive function in a long-term behavioral training study by manipulating expectations. In this study, positive or negative expectations about cognitive training were both explicitly and associatively induced before either a working memory training intervention or a control intervention. Consistent with previous work, a main effect of the training condition was found, with individuals trained on the working memory task showing larger gains in cognitive function than those trained on the control task. Interestingly, a main effect of expectation was also found, with individuals given positive expectations showing larger cognitive gains than those who were given negative expectations (regardless of training condition). No interaction effect between training and expectations was found. Exploratory analyses suggest that certain individual characteristics (e.g., personality, motivation) moderate the size of the expectation effect. These results highlight aspects of methodology that can inform future behavioral interventions and suggest that participant expectations could be capitalized on to maximize training outcomes.

Journal Article

Libsignal: an open library for traffic signal control

by Da, Longchao; Wei, Hua; Mei, Hao in Artificial Intelligence, Computer Science, Control

2024

This paper introduces a library for cross-simulator comparison of reinforcement learning models in traffic signal control tasks. The library is developed to implement recent state-of-the-art reinforcement learning models with extensible interfaces and unified cross-simulator evaluation metrics. It supports commonly used simulators in traffic signal control tasks, including Simulation of Urban MObility (SUMO) and CityFlow, and multiple benchmark datasets for fair comparisons. We conducted experiments to validate our implementation of the models and to calibrate the simulators so that experiments from one simulator could serve as a reference for the other. Based on the validated models and calibrated environments, this paper compares and reports the performance of current state-of-the-art RL algorithms across different datasets and simulators. This is the first time that these methods have been compared fairly on the same datasets with different simulators.

Journal Article
