757 result(s) for "reward function"
Deep Reinforcement Learning for Indoor Mobile Robot Path Planning
This paper proposes a novel incremental training mode to address the problem of Deep Reinforcement Learning (DRL) based path planning for a mobile robot. First, we evaluate related graph search algorithms and Reinforcement Learning (RL) algorithms in a lightweight 2D environment. Then, we design the DRL-based algorithm, including observation states, reward function, network structure and parameter optimization, in a 2D environment to circumvent the time-consuming work a 3D environment would require. We transfer the designed algorithm to a simple 3D environment for retraining to obtain converged network parameters, including the weights and biases of the deep neural network (DNN). Using these parameters as initial values, we continue to train the model in a complex 3D environment. To improve the generalization of the model across different scenes, we propose to combine the DRL algorithm Twin Delayed Deep Deterministic policy gradients (TD3) with the traditional global path planning algorithm Probabilistic Roadmap (PRM) as a novel path planner (PRM+TD3). Experimental results show that the incremental training mode can notably improve development efficiency. Moreover, the PRM+TD3 path planner can effectively improve the generalization of the model.
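The incremental training mode described in this abstract, where converged parameters from a simpler environment seed training in a harder one, can be sketched generically. The function and argument names (`incremental_training`, `train_fn`) are hypothetical illustrations, not the paper's code:

```python
def incremental_training(train_fn, envs, init_params=None):
    """Train the same model across a curriculum of environments.

    The converged parameters of each stage are carried forward as the
    initial values for the next stage (e.g. lightweight 2D sketch ->
    simple 3D -> complex 3D), mirroring the incremental training mode
    outlined in the abstract.
    """
    params = init_params
    history = []
    for env in envs:
        # train_fn trains in `env`, starting from `params`, and returns
        # the converged parameters for that stage.
        params = train_fn(env, params)
        history.append((env, params))
    return params, history
```

A stage here is opaque: `train_fn` could wrap any RL training loop, as long as it accepts and returns the network parameters being transferred.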
Deep reinforcement learning for imbalanced classification
Data in real-world applications often exhibit a skewed class distribution, which poses an intense challenge for machine learning. Conventional classification algorithms are not effective on imbalanced data, and may fail when the distribution is highly imbalanced. To address this issue, we propose a general imbalanced classification model based on deep reinforcement learning, in which we formulate the classification problem as a sequential decision-making process and solve it with a deep Q-learning network. In our model, the agent performs a classification action on one sample at each time step, and the environment evaluates the classification action and returns a reward to the agent. The reward for a minority-class sample is larger, so the agent is more sensitive to the minority class. The agent finally finds an optimal classification policy for imbalanced data under the guidance of the specific reward function and a beneficial simulated environment. Experiments have shown that the proposed model outperforms other imbalanced classification algorithms and identifies more minority samples with better classification performance.
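The reward scheme this abstract describes, a larger-magnitude reward for minority-class samples, can be sketched as a small Python function. The specific weight values below are illustrative assumptions, not the paper's:

```python
def classification_reward(pred, label, minority_label,
                          minority_weight=1.0, majority_weight=0.1):
    """Reward for one classification action in the sequential setting.

    Correct predictions earn a positive reward and incorrect ones a
    negative reward; the magnitude is larger for minority-class samples,
    making the agent more sensitive to the minority class. The weights
    here are illustrative, not the values used in the paper.
    """
    w = minority_weight if label == minority_label else majority_weight
    return w if pred == label else -w
```

In practice the majority-class weight is often tied to the imbalance ratio of the dataset rather than fixed, so the expected reward is balanced across classes.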
Efficient Path Planning for Mobile Robot Based on Deep Deterministic Policy Gradient
When the traditional Deep Deterministic Policy Gradient (DDPG) algorithm is used for mobile robot path planning, the limited observable environment of the robot makes training of the path planning model inefficient and its convergence slow. In this paper, Long Short-Term Memory (LSTM) is introduced into the DDPG network, the previous and current states of the mobile robot are combined to determine its actions, and a Batch Norm layer is added after each layer of the Actor network. At the same time, the reward function is optimized to guide the mobile robot toward the target point faster. To improve learning efficiency, different normalization methods are used to normalize the distance and angle between the mobile robot and the target point, which serve as inputs of the DDPG network model. When the model outputs the next action of the mobile robot, mixed noise composed of Gaussian noise and Ornstein–Uhlenbeck (OU) noise is added. Finally, experiments are carried out in a simulation environment built with a ROS system and the Gazebo platform. The results show that the proposed algorithm accelerates the convergence of DDPG, improves the generalization ability of the path planning model, and increases the efficiency and success rate of mobile robot path planning.
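The input normalization and mixed exploration noise described above might look roughly like the following sketch. The scaling choices, noise parameters, and names (`normalize_goal_state`, `OUNoise`, `noisy_action`) are illustrative assumptions, not the paper's implementation:

```python
import math
import random

def normalize_goal_state(distance, angle, max_range=10.0):
    # Map the robot-to-goal distance to [0, 1] and the bearing angle
    # (in radians, within [-pi, pi]) to [-1, 1], so both network inputs
    # share a comparable scale.
    d = min(distance, max_range) / max_range
    a = angle / math.pi
    return d, a

class OUNoise:
    """Ornstein–Uhlenbeck process: temporally correlated exploration noise."""
    def __init__(self, theta=0.15, sigma=0.2, dt=1e-2):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.x = 0.0

    def sample(self):
        # Euler–Maruyama step of dx = -theta * x * dt + sigma * dW.
        self.x += (-self.theta * self.x * self.dt
                   + self.sigma * math.sqrt(self.dt) * random.gauss(0.0, 1.0))
        return self.x

def noisy_action(action, ou, gauss_sigma=0.1):
    # Mixed exploration noise: OU noise plus independent Gaussian noise,
    # as described in the abstract.
    return action + ou.sample() + random.gauss(0.0, gauss_sigma)
```

The OU component keeps consecutive exploration steps correlated (useful for momentum-like control), while the Gaussian component adds uncorrelated jitter.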
On the origin of the reward function: Exploring the role of conditioned reinforcement and social learning
Influential cognitive science theories postulate that decision-making is based on treating expected outcomes as incentives according to a reward function. Yet a systematic analysis of the learning processes that determine the reward function remains to be carried out. The paper fills this gap by examining the contribution of two fundamental learning processes: conditioned reinforcement, occurring either via direct or via vicarious experience, and imitative incentive learning, at play when an agent appropriates the incentives sought by another individual. From an evolutionary perspective, the two processes appear to be adaptive insofar as conditioned reinforcement might have evolved to simplify decision-making, while imitative incentive learning might have arisen to harness the full potential of social learning and to facilitate cooperation. The paper contributes to research on decision-making by offering a detailed analysis of the learning mechanisms that drive acquisition of the reward function.
Sleep and circadian contributions to adolescent alcohol use disorder
Adolescence is a time of marked changes across sleep, circadian rhythms, brain function, and alcohol use. Starting at puberty, adolescents' endogenous circadian rhythms and preferred sleep times shift later, often leading to a mismatch with the schedules imposed by secondary education. This mismatch induces circadian misalignment and sleep loss, which have been associated with affect dysregulation, increased drug and alcohol use, and other risk-taking behaviors in adolescents and adults. In parallel to developmental changes in sleep, adolescent brains are undergoing structural and functional changes in the circuits subserving the pursuit and processing of rewards. These developmental changes in reward processing likely contribute to the initiation of alcohol use during adolescence. Abundant evidence indicates that sleep and circadian rhythms modulate reward function, suggesting that adolescent sleep and circadian disturbance may contribute to altered reward function and, in turn, alcohol involvement. In this review, we summarize the relevant evidence and propose that these parallel developmental changes in sleep, circadian rhythms, and neural processing of reward interact to increase risk for alcohol use disorder (AUD).
  • Later sleep/circadian timing during adolescence is at odds with school start times.
  • Adolescents consequently suffer from circadian misalignment and sleep problems.
  • Circadian misalignment and sleep problems are linked to increased alcohol use.
  • Circadian rhythms and sleep modulate reward-related behavior and brain function.
  • Sleep/circadian effects on reward function may increase risk for adolescent AUDs.
Survey of Model-Based Reinforcement Learning: Applications on Robotics
Reinforcement learning is an appealing approach for allowing robots to learn new tasks. The relevant literature reveals a plethora of methods, but at the same time makes clear the lack of implementations that deal with real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the currently limited adaptability of robotic systems can be expanded. Model-based reinforcement learning also exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, this survey covers model-based methods that have been applied in robotics. We categorize them based on the derivation of the optimal policy, the definition of the returns function, the type of the transition model, and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches in new applications, taking into consideration the state of the art in both algorithms and hardware.
Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot
One of the critical challenges in deploying cleaning robots is achieving coverage of the entire area. Current tiling robots for area coverage have fixed forms and are limited to cleaning only certain areas. A reconfigurable system is a creative answer to this optimal coverage problem. The tiling robot's goal is complete coverage of the entire area by reconfiguring into different shapes according to the area's needs. In sequencing the navigation, it is essential to have a structure that allows the robot to extend its coverage range while saving energy during navigation. This implies that the robot should be able to cover larger areas entirely with the fewest required actions. This paper presents a coverage path planning (CPP) method for hTetran, a polyabolo tiled robot, based on a TSP-based reinforcement learning optimization. This structure simultaneously produces robot shapes and sequential trajectories while maximizing the reward of the trained reinforcement learning (RL) model within the predefined polyabolo-based tileset. To this end, a reinforcement learning-based traveling salesman problem (TSP) formulation was trained with the proximal policy optimization (PPO) algorithm, using the complementary learning computation of the TSP sequencing. The results of the proposed RL-TSP-based CPP for hTetran were compared, in terms of energy and time spent, with conventional tiled hypothetical models that solve the TSP through an evolutionary ant colony optimization (ACO) approach. The CPP demonstrates an ability to generate a near Pareto-optimal trajectory that enhances the robot's navigation in the real environment with less energy and time spent than the conventional techniques.
Navigation of Mobile Robots Based on Deep Reinforcement Learning: Reward Function Optimization and Knowledge Transfer
This paper presents an end-to-end online learning navigation method based on deep reinforcement learning (DRL) for mobile robots, whose objective is for mobile robots to avoid obstacles and reach the target point in an unknown environment. Specifically, double deep Q-networks (Double DQN), dueling deep Q-networks (Dueling DQN) and prioritized experience replay (PER) are combined into the prioritized experience replay-double dueling deep Q-networks (PER-D3QN) algorithm to realize high-efficiency navigation of mobile robots. Moreover, considering the problem of sparse rewards in the traditional reward function, an artificial potential field is introduced into the reward function to guide robots toward fulfilling the navigation task through the change of potential energy. Furthermore, in order to accelerate the training of mobile robots in complex environments, a knowledge transfer training method is proposed, which migrates knowledge from simple to complex environments and learns quickly on the basis of the prior knowledge. Finally, the performance is validated in a three-dimensional simulator, which shows that the mobile robot can obtain higher rewards and achieve higher success rates and shorter navigation times, indicating that the proposed approaches are feasible and efficient.
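The potential-field reward shaping mentioned above could be sketched as follows, with the reward defined as the decrease in total potential energy between consecutive steps. The gain constants and function names are illustrative assumptions, not the paper's exact formulation:

```python
import math

def attractive_potential(pos, goal, k_att=1.0):
    # Quadratic attractive potential toward the goal.
    return 0.5 * k_att * math.dist(pos, goal) ** 2

def repulsive_potential(pos, obstacles, k_rep=1.0, d0=1.0):
    # Repulsive potential that is active only within distance d0 of an obstacle.
    u = 0.0
    for obs in obstacles:
        d = math.dist(pos, obs)
        if 0 < d < d0:
            u += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u

def shaped_reward(prev_pos, pos, goal, obstacles):
    # Dense shaping reward: the decrease in total potential energy, so
    # moving toward the goal (and away from obstacles) yields a positive
    # reward even before the sparse goal reward is reached.
    u_prev = attractive_potential(prev_pos, goal) + repulsive_potential(prev_pos, obstacles)
    u_now = attractive_potential(pos, goal) + repulsive_potential(pos, obstacles)
    return u_prev - u_now
```

This kind of potential-difference shaping addresses the sparse-reward problem the abstract names: the agent receives informative feedback at every step, not only on reaching the target or colliding.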
Robotic Arm Trajectory Planning in Dynamic Environments Based on Self-Optimizing Replay Mechanism
In complex dynamic environments, robotic arms face multiple challenges such as real-time environmental changes, high-dimensional state spaces, and strong uncertainties. Trajectory planning based on deep reinforcement learning (DRL) suffers from the difficulty of acquiring human expert strategies, low experience utilization (leading to slow convergence), and poorly designed reward functions. To address these issues, this paper designs a neural network-based expert-guided triple experience replay mechanism (NETM) and proposes an improved reward function adapted to dynamic environments. The replay mechanism integrates imitation learning's fast data fitting with DRL's self-optimization to expand limited expert demonstrations and algorithm-generated successes into optimized expert experiences. Experimental results show that the expanded expert experience accelerates convergence: in dynamic scenarios, NETM improves accuracy by over 30% and the safe rate by 2.28% compared to baseline algorithms.
Multi-Objective Energy Management Strategy for Hybrid Electric Vehicles Based on TD3 with Non-Parametric Reward Function
The energy management system (EMS) plays a pivotal role in improving the stability and cost-effectiveness of future hybridized and electrified vehicles. Existing efforts mainly concentrate on specific optimization targets, such as fuel consumption, without sufficiently taking into account the degradation of on-board power sources. In this context, a novel multi-objective energy management strategy based on deep reinforcement learning is proposed for a hybrid electric vehicle (HEV), explicitly conscious of lithium-ion battery (LIB) wear. Specifically, this paper makes three main contributions. First, a non-parametric reward function is introduced, for the first time, into the twin-delayed deep deterministic policy gradient (TD3) strategy to facilitate the optimality and adaptability of the proposed energy management strategy and to mitigate the effort of parameter tuning. Then, to cope with the problem of state redundancy, state space refinement techniques are included in the proposed strategy. Finally, battery health is incorporated into this multi-objective energy management strategy. The efficacy of the framework is validated, in terms of training efficiency, optimality and adaptability, under various standard driving tests.