861 results for "state representation learning"
Open-Ended Learning: A Conceptual Framework Based on Representational Redescription
Reinforcement learning (RL) aims at building a policy that maximizes a task-related reward within a given domain. When the domain is known, i.e., when its states, actions and reward are defined, Markov Decision Processes (MDPs) provide a convenient theoretical framework to formalize RL. But in an open-ended learning process, an agent or robot must solve an unbounded sequence of tasks that are not known in advance, and the corresponding MDPs cannot be built at design time. This defines the main challenge of open-ended learning: how can the agent learn to behave appropriately when adequate state, action and reward representations are not given? In this paper, we propose a conceptual framework to address this question. We assume an agent endowed with low-level perception and action capabilities. This agent receives an external reward when it faces a task. It must discover the state and action representations that will let it cast the tasks as MDPs in order to solve them by RL. The relevance of the action or state representation is critical for the agent to learn efficiently. Considering that the agent starts with low-level, task-agnostic state and action spaces based on its low-level perception and action capabilities, we describe open-ended learning as the challenge of building adequate representations of states and actions, i.e., of redescribing the available representations. We suggest an iterative approach to this problem based on several successive Representational Redescription processes, and highlight the corresponding challenges, in which intrinsic motivations play a key role.
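To make the MDP framing concrete, here is a minimal sketch of casting a task as an MDP and solving it with tabular Q-learning once state and action representations have been discovered. The class layout, the episode horizon, and all hyperparameters are illustrative assumptions, not taken from the paper:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple
import random

@dataclass
class MDP:
    """A task cast as an MDP once state/action representations are chosen.
    Assumes discrete, hashable states and actions."""
    states: list
    actions: list
    transition: Callable[[object, object], object]  # (s, a) -> s'
    reward: Callable[[object, object], float]       # (s, a) -> r

def q_learning(mdp: MDP, episodes: int = 500, alpha: float = 0.1,
               gamma: float = 0.95, eps: float = 0.1) -> Dict[Tuple, float]:
    """Tabular Q-learning over the discovered representation."""
    q = {(s, a): 0.0 for s in mdp.states for a in mdp.actions}
    for _ in range(episodes):
        s = random.choice(mdp.states)
        for _ in range(100):  # assumed episode horizon
            a = (random.choice(mdp.actions) if random.random() < eps
                 else max(mdp.actions, key=lambda act: q[(s, act)]))
            s2, r = mdp.transition(s, a), mdp.reward(s, a)
            best_next = max(q[(s2, a2)] for a2 in mdp.actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q
```

The point of the framework is that `states` and `actions` here are not given up front; they are the product of the Representational Redescription processes the abstract describes.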
Policy-Gradient and Actor-Critic Based State Representation Learning for Safe Driving of Autonomous Vehicles
In this paper, we propose an environment perception framework for autonomous driving using state representation learning (SRL). Unlike existing Q-learning-based methods for efficient environment perception and object detection, our proposed method takes the learning loss into account under deterministic as well as stochastic policy gradients. Through a combination of a variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC), we focus on uninterrupted and reasonably safe autonomous driving without steering off the track for a considerable driving distance. Our proposed technique exhibits learning in autonomous vehicles under complex interactions with the environment, without being explicitly trained on driving datasets. To ensure the effectiveness of the scheme over a sustained period of time, we employ a reward-penalty system in which a negative reward is assigned to unfavourable actions and a positive reward to favourable ones. The results obtained through simulations on the DonKey simulator show the effectiveness of our proposed method by examining the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.
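The 'VAE+DDPG' pipeline the abstract describes can be sketched as a VAE encoder compressing camera frames into a latent state that a DDPG actor maps to a steering command. The network sizes, latent dimension, and single-output action head below are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Compress a camera frame into a low-dimensional latent state (SRL step)."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(), nn.Flatten())
        self.mu = nn.LazyLinear(latent_dim)
        self.logvar = nn.LazyLinear(latent_dim)

    def forward(self, x):
        h = self.conv(x)
        mu, logvar = self.mu(h), self.logvar(h)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization

class Actor(nn.Module):
    """DDPG actor: maps the VAE latent to a steering command in [-1, 1]."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, z):
        return self.net(z)

frame = torch.rand(1, 3, 64, 64)        # one camera frame
steer = Actor()(VAEEncoder()(frame))    # state -> action
```

Swapping the DDPG actor for an SAC policy head gives the 'VAE+SAC' variant; the encoder side stays the same.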
Robot skill learning and the data dilemma it faces: a systematic review
Purpose: Compared with traditional methods relying on manual teaching or system modeling, data-driven learning methods such as deep reinforcement learning and imitation learning show more promising potential to cope with the challenges brought by increasingly complex tasks and environments, and have become a hot research topic in robot skill learning. However, the contradiction between the difficulty of collecting robot-environment interaction data and the low data efficiency of these methods causes all of them to face a serious data dilemma, which has become one of the key issues restricting their development. This paper therefore aims to comprehensively sort out and analyze the causes of and solutions for the data dilemma in robot skill learning. Design/methodology/approach: First, this review analyzes the causes of the data dilemma based on a classification and comparison of data-driven methods for robot skill learning; then, the existing methods used to solve the data dilemma are introduced in detail; finally, the review discusses the remaining open challenges and promising research topics for solving the data dilemma in the future. Findings: This review shows that simulation-reality combination, state representation learning and knowledge sharing are crucial for overcoming the data dilemma in robot skill learning. Originality/value: To the best of the authors' knowledge, no existing surveys systematically and comprehensively sort out and analyze the data dilemma in robot skill learning. It is hoped that this review can help better address the data dilemma in robot skill learning in the future.
Reinforcement Learning-Based Multimodal Model for the Stock Investment Portfolio Management Task
Machine learning has been applied by more and more scholars in the field of quantitative investment, but traditional machine learning methods cannot provide high returns and strong stability at the same time. In this paper, a multimodal model based on reinforcement learning (RL) is constructed for the stock investment portfolio management task. Most previous RL-based methods have chosen value-based RL, but a growing body of research has shown policy-gradient-based RL methods to be superior. Commonly used policy-gradient-based methods are DDPG, TD3, SAC, and PPO. We conducted comparative experiments to select the most suitable method for the dataset in this paper; the final choice was DDPG. Furthermore, prior methods rarely refine the raw data before training the agent. The stock market produces a large amount of complex data, and if the raw stock market data are fed directly to the agent, the agent cannot learn the information in the data efficiently and quickly. We use state representation learning (SRL) to process the raw stock data and then feed the processed data to the agent. Training the agent on stock data alone is not enough; we also added comment text data and image data. The comment text data come from investors' comments on stock discussion forums. The image data are derived from pictures that represent the overall direction of the market. We conducted experiments on three datasets and compared our proposed model with 11 other methods using three evaluation metrics. Taken together, our proposed model performs best.
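A minimal sketch of the multimodal SRL step: encode each modality (price series, comment text, market image) separately and fuse the encodings into one state vector for the DDPG agent. The encoders, dimensions, and fusion layer below are assumptions; the paper's actual SRL networks and text/image embeddings are not specified here:

```python
import torch
import torch.nn as nn

class MultimodalState(nn.Module):
    """Fuse price, comment-text, and market-image features into one RL state.
    All layer sizes are illustrative assumptions."""
    def __init__(self, price_dim=30, text_dim=768, img_dim=512, state_dim=64):
        super().__init__()
        self.price_enc = nn.Linear(price_dim, state_dim)  # SRL on raw price features
        self.text_enc = nn.Linear(text_dim, state_dim)    # e.g. a pooled text embedding
        self.img_enc = nn.Linear(img_dim, state_dim)      # e.g. a CNN image embedding
        self.fuse = nn.Linear(3 * state_dim, state_dim)

    def forward(self, price, text, img):
        parts = [self.price_enc(price), self.text_enc(text), self.img_enc(img)]
        return torch.tanh(self.fuse(torch.cat(parts, dim=-1)))

state = MultimodalState()(torch.rand(1, 30), torch.rand(1, 768), torch.rand(1, 512))
# `state` is then the observation consumed by the DDPG actor and critic.
```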
Solving Partially Observable 3D-Visual Tasks with Visual Radial Basis Function Network and Proximal Policy Optimization
Visual Reinforcement Learning (RL) has been widely investigated in recent decades. Existing approaches are often composed of multiple networks requiring massive computational power to solve partially observable tasks from high-dimensional data such as images. State Representation Learning (SRL) has been shown to improve the performance of visual RL by reducing the high-dimensional data to a compact representation, but it still often relies on deep networks and on the specific environment. In contrast, we propose a lighter, more generic method to extract sparse and localized features from raw images without training. We achieve this using a Visual Radial Basis Function Network (VRBFN), which offers significant practical advantages, including efficient and accurate training with minimal complexity thanks to its two linear layers. For real-world applications, scalability and resilience to noise are essential, as real sensors are subject to change and noise. Unlike CNNs, which may require extensive retraining, this network might only need minor fine-tuning. We test the efficiency of the VRBFN representation on different RL tasks using Proximal Policy Optimization (PPO). We present a large study comparing our extraction method with five classical visual RL and SRL approaches on five different first-person, partially observable scenarios. We show that this approach offers appealing features such as sparsity and robustness to noise, and that RL agents trained on it outperform the other tested methods on four of the five scenarios.
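The core idea, fixed radial basis functions extracting localized features from raw pixels with no training, can be sketched as follows. The grid placement, the single Gaussian width, and the weighted-intensity readout are assumptions for illustration; the paper's exact basis-function scheme is not reproduced:

```python
import numpy as np

def vrbf_features(img: np.ndarray, grid: int = 8, sigma: float = 0.15) -> np.ndarray:
    """Sparse, localized features from a grayscale image via fixed Gaussian
    radial basis functions on a regular grid -- no training involved."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    yy, xx = yy / (h - 1), xx / (w - 1)          # normalized pixel coordinates
    centers = [(cy, cx) for cy in np.linspace(0, 1, grid)
                        for cx in np.linspace(0, 1, grid)]
    feats = []
    for cy, cx in centers:
        weight = np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
        feats.append((weight * img).sum() / weight.sum())  # local weighted intensity
    return np.array(feats)                                 # grid*grid feature vector

obs = vrbf_features(np.random.rand(64, 64))  # 64-dim observation for the PPO policy
```

Because the basis functions are fixed, only the downstream linear layers of the policy need training, which is where the claimed efficiency comes from.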
A2C: Attention-Augmented Contrastive Learning for State Representation Extraction
Reinforcement learning (RL) faces a series of challenges, including learning efficiency and generalization. The state representation used to train RL is one of the important factors behind these challenges. In this paper, we explore providing a more efficient state representation for RL, using contrastive learning as the representation extraction method. We propose an attention mechanism implementation and extend an existing contrastive learning method by embedding this attention mechanism, obtaining an attention-augmented contrastive learning method called A2C. Using the state representation from A2C, the robot achieves better learning efficiency and generalization than with state-of-the-art representations. Moreover, our attention mechanism is shown to capture correlations between pixels at arbitrary distances, which helps capture more accurate obstacle information. In an ablation in which the attention mechanism is removed from A2C, the attainable rewards drop by more than 70%, indicating the important role of the attention mechanism.
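A sketch of the combination the abstract describes: an encoder with self-attention over spatial positions (so any two pixels can interact regardless of distance) trained with a contrastive objective on two augmented views. The patch size, head count, and the InfoNCE loss as the contrastive objective are assumptions; the specific contrastive method A2C extends is not named here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnEncoder(nn.Module):
    """Encoder with global self-attention over spatial positions."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, 4, stride=4)   # 64x64 image -> 16x16 patches
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Linear(dim, dim)

    def forward(self, x):
        h = self.conv(x).flatten(2).transpose(1, 2)  # (B, 256, dim) patch tokens
        h, _ = self.attn(h, h, h)                    # attention across all positions
        return self.head(h.mean(dim=1))              # pooled state representation

def info_nce(q, k, tau: float = 0.1):
    """Contrastive loss: match each query with its own augmented view."""
    logits = q @ k.t() / tau
    return F.cross_entropy(logits, torch.arange(len(q)))

enc = AttnEncoder()
view1, view2 = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)  # two augmentations
loss = info_nce(enc(view1), enc(view2))
```

The ablation in the abstract corresponds to deleting the `self.attn` call and pooling the patch tokens directly.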
Predictive Modeling of Aircraft Dynamics Using Neural Networks
Fighter pilots must study models of aircraft dynamics before learning complex maneuvers and tactics. Similarly, autonomous fighter aircraft applications may benefit from a model-based learning approach. Instead of using a preexisting physics model of a given aircraft, a machine learning system can learn a predictive model of the aircraft physics from training data. Furthermore, it can model interactions between multiple friendly aircraft, enemy aircraft, and the environment. Such a system can also learn to represent state variables that are not directly observable, as well as dynamics that are not hard coded. Existing model-based methods use a deep neural network that takes observable state information and agent actions as input and provides predictions of future observations as output. The proposed method builds upon this approach by adding a residual feedforward skip connection from some of the inputs to all of the outputs of the deep neural network. Further innovations address numerical conditioning issues as well as periodic discontinuities of angular quantities such as bearing or heading. The methods in this article also extend techniques from model-based reinforcement learning control to the domain of adversarial multi-agent environments; in previous literature, these model-based methods have only been used for controlling individual agents. Instead of using a traditional Recurrent Neural Network (RNN) to learn a representation of the world state, the novel method uses a compressive encoding scheme based on an augmented version of the same neural network that is used for predictive modeling.
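Two of the ideas here admit compact sketches: the residual skip connection (predict the next state as the current state plus a learned delta) and the standard sin/cos encoding that removes the 2*pi discontinuity in angular quantities. The dimensions and layer sizes are assumptions, and the skip here runs from the state inputs only, a simplification of the paper's "some inputs to all outputs" connection:

```python
import torch
import torch.nn as nn

def encode_angle(theta: torch.Tensor) -> torch.Tensor:
    """Represent a periodic angle (bearing, heading) as (sin, cos) to avoid
    the discontinuity at the 0/2*pi wraparound."""
    return torch.cat([torch.sin(theta), torch.cos(theta)], dim=-1)

class ResidualDynamics(nn.Module):
    """Predictive model with a residual skip connection: the network learns
    only the change in state, which improves conditioning for small deltas."""
    def __init__(self, state_dim: int = 12, action_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, state_dim))

    def forward(self, state, action):
        delta = self.net(torch.cat([state, action], dim=-1))
        return state + delta  # skip connection from inputs to outputs

model = ResidualDynamics()
next_state = model(torch.rand(1, 12), torch.rand(1, 4))
```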
Neuro-symbolic artificial intelligence: a survey
The goal of the growing discipline of neuro-symbolic artificial intelligence (AI) is to develop AI systems with more human-like reasoning capabilities by combining symbolic reasoning with connectionist learning. We survey the literature on neuro-symbolic AI over the last two decades, including books, monographs, review papers, contributed pieces, opinion articles, foundational workshops and talks, and related PhD theses. Four main features of neuro-symbolic AI are discussed: representation, learning, reasoning, and decision-making. Finally, we discuss the many applications of neuro-symbolic AI, including question answering, robotics, computer vision, healthcare, and more. Scalability, explainability, and ethical considerations are also covered, as are other difficulties and limitations of neuro-symbolic AI. This study summarizes the current state of the art in neuro-symbolic AI.
Autoencoding slow representations for semi-supervised data-efficient regression
The slowness principle is a concept inspired by the visual cortex of the brain. It postulates that the underlying generative factors of a quickly varying sensory signal change on a different, slower time scale. By applying this principle to state-of-the-art unsupervised representation learning methods, one can learn a latent embedding that makes supervised downstream regression tasks more data-efficient. In this paper, we compare different approaches to unsupervised slow representation learning, such as Lp-norm-based slowness regularization and the SlowVAE, and propose a new term based on Brownian motion, used in our method, the S-VAE. We empirically evaluate these slowness regularization terms with respect to their downstream-task performance and data efficiency in state estimation and behavioral cloning tasks. We find that slow representations yield large performance improvements in settings where only sparse labeled training data are available. Furthermore, we present a theoretical and empirical comparison of the discussed slowness regularization terms. Finally, we discuss how the Fréchet Inception Distance (FID), commonly used to assess the generative capabilities of GANs, can predict the performance of trained models on supervised downstream tasks.
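The Lp-norm slowness regularization mentioned above has a direct form: penalize the distance between latents of consecutive frames, added to the usual VAE reconstruction + KL objective. The weighting below is an assumed value, and the S-VAE's Brownian-motion term is not reproduced here:

```python
import torch

def slowness_loss(z_t: torch.Tensor, z_next: torch.Tensor, p: float = 2.0) -> torch.Tensor:
    """Lp-norm slowness penalty: latents of consecutive frames should change
    slowly, reflecting slowly varying generative factors."""
    return (z_next - z_t).abs().pow(p).sum(dim=-1).mean()

z_t, z_next = torch.randn(16, 8), torch.randn(16, 8)  # latents of frames t and t+1
reg = 0.1 * slowness_loss(z_t, z_next)                # 0.1 is an assumed weight
# total objective (schematically): reconstruction + KL + reg
```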
Learning Low Dimensional Convolutional Neural Networks for High-Resolution Remote Sensing Image Retrieval
Learning powerful feature representations for image retrieval has always been a challenging task in the field of remote sensing. Traditional methods focus on extracting low-level hand-crafted features, which are not only time-consuming to design but also tend to achieve unsatisfactory performance due to the complexity of remote sensing images. In this paper, we investigate how to extract deep feature representations based on convolutional neural networks (CNNs) for high-resolution remote sensing image retrieval (HRRSIR). To this end, several effective schemes are proposed to generate powerful feature representations for HRRSIR. In the first scheme, a CNN pre-trained on a different problem is treated as a feature extractor, since there are no sufficiently large remote sensing datasets to train a CNN from scratch. In the second scheme, we learn features specific to our problem by first fine-tuning the pre-trained CNN on a remote sensing dataset and then proposing a novel CNN architecture based on convolutional layers and a three-layer perceptron. The novel CNN has fewer parameters than the pre-trained and fine-tuned CNNs and can learn low-dimensional features from limited labelled images. The schemes are evaluated on several challenging, publicly available datasets. The results indicate that the proposed schemes, particularly the novel CNN, achieve state-of-the-art performance.
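A sketch of the two schemes: a pre-trained CNN used as a frozen feature extractor, followed by a three-layer perceptron mapping to a low-dimensional retrieval descriptor. ResNet-18 is a stand-in backbone and the layer sizes are assumptions; the paper's actual architectures are not reproduced here:

```python
import torch
import torch.nn as nn
from torchvision import models

# Scheme 1: a CNN pre-trained on a different problem as a fixed feature extractor.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()          # drop the classifier, keep 512-d features
backbone.eval()

# Scheme 2 (sketch): a three-layer perceptron head that maps CNN features to a
# low-dimensional descriptor suitable for nearest-neighbor retrieval.
head = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 32))              # 32-d descriptor; dimension is an assumption

with torch.no_grad():
    descriptor = head(backbone(torch.rand(1, 3, 224, 224)))
```

Retrieval then ranks the database by distance between query and database descriptors, which is why a compact, discriminative descriptor matters more here than classification accuracy.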