Catalogue Search | MBRL

Multi-robot path planning based on a deep reinforcement learning DQN algorithm

by Juntao, Li , Yang, Yang , Lingling, Peng in algorithmic process , Algorithms , Automation

2020

The unmanned warehouse dispatching system of the ‘goods to people’ model uses a structure mainly based on handling robots, which saves considerable manpower and improves the efficiency of the warehouse picking operation. However, such a system places high demands on the performance of its scheduling algorithm. This study uses a deep Q-network (DQN) algorithm, a deep reinforcement learning method that combines the Q-learning algorithm, an experience replay mechanism, and a convolutional neural network that generates target Q-values, to solve the problem of multi-robot path planning. The aim is to address two shortcomings of the Q-learning algorithm in robot path planning: slow convergence and excessive randomness. Before the algorithm starts, prior knowledge and prior rules are used to improve the DQN algorithm. Simulation results show that the improved DQN algorithm converges faster than the classic deep reinforcement learning algorithm and learns solutions to path-planning problems more quickly. This improves the efficiency of multi-robot path planning.
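
As an illustration of two DQN ingredients named in the abstract, the following minimal Python sketch shows an experience replay buffer and the target Q-value computation. It is not the authors' implementation: the class name `ReplayBuffer`, the function `td_target`, and the default discount factor are assumptions chosen for illustration, and a plain list of next-state Q-values stands in for the output of a target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(reward, next_q_values, done, gamma=0.9):
    """Target Q-value: r if the episode ended, else r + gamma * max_a' Q(s', a').

    next_q_values stands in for the target network's output at s' (assumption).
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

In a full DQN, the sampled batch and these targets would drive a gradient step on the online network; that part is omitted here.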

Journal Article

A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm

by Zhang, Shenglan , Zhang, Zuqiong , Tang, Liu in Adaptive algorithms , Algorithms , Attenuation

2021

To address problems of the traditional Q-Learning algorithm such as heavy repetition and unbalanced exploration, a reinforcement-exploration strategy replaces the decayed ε-greedy strategy of traditional Q-Learning, yielding a novel self-adaptive reinforcement-exploration Q-Learning (SARE-Q) algorithm. First, the concept of a behavior utility trace is introduced, and the probability of choosing each action is adjusted according to this trace to improve the efficiency of exploration. Second, the attenuation of the exploration factor ε is split into two phases: the first centers on exploration and the second shifts the focus from exploration to exploitation, with the exploration rate adjusted dynamically according to the success rate. Finally, a list of state access counts is maintained, and the exploration factor of the current state is adjusted adaptively according to the number of times that state has been accessed. A symmetric grid map environment was built on the OpenAI Gym platform to run symmetrical simulation experiments on the Q-Learning algorithm, the self-adaptive Q-Learning (SA-Q) algorithm, and the SARE-Q algorithm. The experimental results show that the proposed algorithm has clear advantages over the other two algorithms in the average number of turns, the average inside success rate, and the number of times the shortest route is planned.
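
The idea of adapting the exploration factor to how often a state has been accessed can be sketched as follows. The abstract does not give the SARE-Q formula, so the decay rule below (ε divided by one plus the visit count, with a floor) and the name `VisitCountEpsilon` are purely hypothetical stand-ins, not the published method.

```python
from collections import defaultdict


class VisitCountEpsilon:
    """Per-state exploration factor that shrinks as a state is visited more often.

    The rule eps(s) = max(eps_min, eps_start / (1 + visits[s])) is an
    illustrative assumption, not the formula used by SARE-Q.
    """

    def __init__(self, eps_start=1.0, eps_min=0.05):
        self.eps_start = eps_start
        self.eps_min = eps_min
        self.visits = defaultdict(int)  # the "list of state access times"

    def record_visit(self, state):
        self.visits[state] += 1

    def epsilon(self, state):
        return max(self.eps_min, self.eps_start / (1 + self.visits[state]))
```

A rarely visited state keeps a high exploration rate, while a frequently visited one is mostly exploited, which is the behaviour the abstract describes.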

Journal Article

Navigating Intelligence: A Survey of Google OR‐Tools and Machine Learning for Global Path Planning in Autonomous Vehicles

by Asef, Pedram , Benoit, Alexandre in Algorithms , Application programming interface , Applications programs

2024

We offer a new in‐depth investigation of global path planning (GPP) for unmanned ground vehicles, focusing on an autonomous mining sampling robot named ROMIE. GPP is essential for ROMIE's optimal performance and translates into solving the traveling salesman problem, a complex graph theory challenge that is crucial for determining the most effective route to cover all sampling locations in a mining field. This problem is central to enhancing ROMIE's operational efficiency and competitiveness against human labor by optimizing cost and time. The primary aim of this research is to advance GPP by developing, evaluating, and improving a cost‐efficient software and web application. We delve into an extensive comparison and analysis of Google operations research (OR)‐Tools optimization algorithms. Our study is driven by the goal of applying and testing the limits of OR‐Tools capabilities by integrating Reinforcement Learning techniques for the first time. This enables us to compare these methods with OR‐Tools, assessing their computational effectiveness and real‐world application efficiency. Our analysis seeks to provide insights into the effectiveness and practical application of each technique. Our findings indicate that Q‐Learning stands out as the best-performing strategy, deviating only 1.2% on average from the optimal solutions across our datasets. Advanced global path planning algorithms are studied for transforming geochemical mining sampling with autonomous vehicles. Cutting‐edge algorithms are harnessed to solve the intricate traveling salesman problem, optimizing route efficiency. A novel analysis of operations research tools and reinforcement learning techniques is presented, demonstrating Q‐learning's superior efficiency (codes provided for benchmarking). Technological advancements with a new benchmark for autonomous mining operations are provided.

Journal Article

Adaptive PID controller based on Q-learning algorithm

by Lam, Hak-Keung , Shi, Qian , Xiao, Bo in Adaptive algorithms , Adaptive control , adaptive PID controller

2018

An adaptive proportional–integral–derivative (PID) controller based on the Q-learning algorithm is proposed to balance the cart–pole system in a simulation environment. The controller was trained using Q-learning and uses the learned Q-tables to change the gains of linear PID controllers according to the state of the system during the control process. Trained from a set of fixed initial positions, the controller was able to balance the system from a series of initial positions different from those used in training, achieving equivalent or better performance than the conventional PID controller and a controller that uses only Q-learning. This indicates the advantage of the adaptive PID controller both in generality, balancing the cart–pole system from a relatively wide range of initial positions, and in stabilisability, achieving a smaller steady-state error.
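
The gain-switching idea can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the authors' controller: the candidate gain sets in `GAIN_SETS`, the discretisation of the pole angle, and the layout of the Q-table are all hypothetical.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error, dt):
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical candidate (kp, ki, kd) sets; each Q-table action picks one.
GAIN_SETS = [(1.0, 0.0, 0.1), (5.0, 0.1, 0.5), (10.0, 0.5, 1.0)]


def discretize(angle, bin_width=0.1, n_bins=5):
    """Map a pole angle in radians to a bin index in [0, n_bins - 1], centred on zero."""
    idx = int(round(angle / bin_width)) + n_bins // 2
    return min(max(idx, 0), n_bins - 1)


def select_gains(q_table, state):
    """Return the gain set whose action has the highest learned Q-value in this state."""
    values = q_table[state]
    best = max(range(len(values)), key=values.__getitem__)
    return GAIN_SETS[best]
```

During control, the selected gains would be written into the PID object at each step, so the linear controller stays in the loop while the learned Q-table only schedules its gains.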

Journal Article

Reinforcement learning method for plug-in electric vehicle bidding

by Wei, Wei , Shafie-khah, Miadreza , Catalão, João P.S. in aggregator role , Algorithms , Alternative energy sources

2019

This study proposes a novel multi-agent method for electric vehicle (EV) owners who will take part in the electricity market. Each EV is treated as an agent, and all the EVs have vehicle-to-grid capability. These agents aim to minimise the charging cost and to increase the privacy of EV owners by omitting the aggregator role from the system. Each agent has two independent decision cores for buying and selling energy. These cores are based on a reinforcement learning (RL) algorithm, namely Q-learning, chosen for its high efficiency and appropriate performance in multi-agent methods. With the proposed method, agents can buy and sell energy with the goal of cost minimisation while always keeping enough energy for the trip, considering the uncertain behaviours of EV owners. Numeric simulations on an illustrative example with one agent and a testing system with 500 agents demonstrate the effectiveness of the proposed method.

Journal Article

Path Planning for Wheeled Mobile Robot in Partially Known Uneven Terrain

by Bai, Xiaoshan , Li, Guobin , Khan, Awais in Algorithms , A⋆ algorithm , Cameras

2022

Path planning for wheeled mobile robots on partially known uneven terrain is an open challenge since robot motions can be strongly influenced by terrain with incomplete environmental information such as locally detected obstacles and impassable terrain areas. This paper proposes a hierarchical path planning approach for a wheeled robot moving in partially known uneven terrain. First, we model the partially known uneven terrain environment with respect to its terrain features, including slope, step, and unevenness. Second, facilitated by the terrain model, we use the A⋆ algorithm to plan a global path for the robot based on the partially known map. Finally, the Q-learning method is employed for local path planning to avoid locally detected obstacles at close range as well as impassable terrain areas while the robot tracks the global path. The simulation and experimental results show that, compared with the classical A⋆ algorithm and the artificial potential field method, the designed approach provides satisfactory paths that avoid locally detected obstacles and impassable areas in partially known uneven terrain.
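
The global planning step relies on the A⋆ algorithm. The sketch below is a generic grid version, not the paper's planner: it assumes a 4-connected grid of free (0) and blocked (1) cells with unit step cost and a Manhattan-distance heuristic, whereas the paper derives costs from slope, step, and unevenness. The function name `astar` is chosen here for illustration.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None if unreachable.

    grid is a list of rows with 0 = free and 1 = blocked; cells are (row, col).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route to this cell was found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + h(nxt), new_cost, nxt))
    return None
```

In a hierarchical scheme like the one described, a path returned by such a planner would serve as the reference that a local Q-learning policy tracks and locally deviates from.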

Journal Article

Stochastic games for power grid coordinated defence against coordinated attacks

by Feng, Xiaomeng , Sun, Qiuye in Algorithms , attack-and-defence actions , bad data detectors

2020

As the worst-case interacting false data to the power system state estimation (SE), cyber data attacks can avoid being filtered out by most bad data detectors. In this study, coordinated attacks (unobservable attack and logic bomb attack) and coordinated defences (honeypot and weakening vision) are used to analyse attackers’ and defenders’ behaviours, respectively. To quantify the potential physical influences (attack-and-defence) benefits, the residual of the expected state is devised. Subsequently, a zero-sum stochastic game is utilised to model the interaction between the cyber-physical power system and the external attack-and-defence actions. This game is demonstrated to admit a Nash equilibrium and the minimax Q-learning algorithm is introduced to enable the two players to reach their equilibrium strategies while maximising their respective minimum rewards in a sequence of stages. Numerous simulations of the stochastic game model on the IEEE 14-bus system show that while resisting the isolated or coordinated attacks, the optimal coordinated defences are more effective than those of isolated attacks.

Journal Article

Dynamic matching strategy for college students’ innovative training projects based on reinforcement learning optimization

by Ding, Yumo , Ju, Xiao in Academic achievement , Accuracy , Algorithms

2026

Matching college students with appropriate innovative training projects is a challenging task that often relies on static assignment techniques, which overlook individual interests, skills, and learning styles. Traditional methods lead to mismatched assignments, decreased engagement, and lower innovation output. This research presents an intelligent, dynamic matching algorithm that optimizes the allocation of students to innovative training projects, using reinforcement learning (RL) to analyze interactions and continually improve assignment decisions. The collected data comprise student profiles, interest and skill ratings, and project metadata. The data are pre-processed to normalize and encode categorical features. Principal Component Analysis (PCA) is used for feature extraction, dimensionality reduction, and identification of significant matching signals. The Q-Learning Algorithm Tuned Dueling Deep Q-Network (QLA-D2QNet) was developed to dynamically learn optimal matching policies through interaction with the environment and reward feedback. QLA is used to learn optimal assignment strategies by trial and error. D2QNet separates value and advantage functions to enhance policy learning stability. The model constantly adjusts matching policies based on feedback on project success and student satisfaction. The experimental results indicate that QLA-D2QNet significantly outperforms traditional manual approaches. The best results include a Skill-Interest Fit of 89.10%, a Project Completion Rate of 94.00%, and a Skill Improvement Score of 31.40%. The suggested QLA-D2QNet model provides a scalable, flexible, and successful technique for dynamically matching students to training projects, resulting in dramatically improved educational outcomes in creative learning environments.
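
The separation of value and advantage functions can be illustrated with the aggregation step commonly used in dueling deep Q-networks, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The abstract does not state which aggregation QLA-D2QNet uses, so the mean-subtraction form and the function name `dueling_q_values` below are assumptions.

```python
def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Subtracting the mean advantage makes the decomposition identifiable:
    adding a constant to every advantage leaves the Q-values unchanged.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]
```

For example, with V(s) = 2.0 and advantages [1.0, 0.0, -1.0], the mean advantage is zero, so the Q-values are simply the state value shifted by each advantage.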

Journal Article

An integrated scheduling approach considering dispatching strategy and conflict-free route of AMRs in flexible job shop

by Liu, Jiaojiao , Chen, Yuqi , Sun, Baofeng in Genetic algorithms , Job shop scheduling , Job shops

2023

To reveal the profound impact of the dispatching strategy and routes of autonomous mobile robots (AMRs) on scheduling in a flexible job shop with AMRs, this study presents a bi-level programming model for integrated scheduling with machines and AMRs (ISMV), distinguishing between a distributed shared dispatching strategy (DSDS) and a following dispatching strategy (FDS) for AMRs. The integrated scheduling model is developed at the upper level with the objective of minimizing the makespan, and the AMR conflict-free route planning (CFRP) model is formulated at the lower level to minimize travel time. To solve the model, a novel algorithmic framework (SLGA-D) composed of a self-learning genetic algorithm (SLGA) and Dijkstra with time window (DijkstraTW) is designed. The SLGA is formed by embedding Q-learning into a genetic algorithm to intelligently adjust the crossover and mutation probabilities. Several experiments are carried out to validate the SLGA-D and the proposed model. The results show that the reinforcement learning mechanism of Q-learning enhances the global search ability of the algorithm, and that the model optimizes production scheduling and distribution routes synergistically. Sensitivity analyses reveal that, as the number of AMRs rises, the makespan decreases rapidly and then tends to stabilize. Once the number of AMRs exceeds a certain threshold (w=5 in this study), adding further AMRs no longer shortens the makespan significantly. When the number of production tasks to be processed does not exceed the number of AMRs, the following dispatching strategy is more favorable; otherwise, the distributed shared dispatching strategy is superior. The results provide guidelines and inspiration for managers concerned with manufacturing scheduling and control.

Journal Article

Research on path planning algorithm of mobile robot based on reinforcement learning

by Pan, Guoqian , Zhou, Xinzhi , Xiang, Yong in Artificial Intelligence , Computational Intelligence , Control

2022

To address the low learning efficiency and slow convergence of reinforcement learning when a mobile robot plans paths in a complex environment, a reinforcement learning method based on the path planning result of each round is proposed. First, the algorithm adds an obstacle learning matrix to improve the success rate of path planning and introduces a heuristic reward to speed up learning by reducing the search space. It then dynamically adjusts the exploration factor to balance exploration and exploitation in path planning, further improving the performance of the algorithm. Finally, simulation experiments in a grid environment show that, compared with the Q-learning algorithm, the improved algorithm not only shortens the average path length to the target position but also learns more efficiently, so that the robot finds the optimal path more quickly. The code of the EPRQL algorithm proposed in this paper has been published on GitHub: https://github.com/panpanpanguoguoqian/mypaper1.git .

Journal Article
