Catalogue Search | MBRL

Power Allocation and Energy Cooperation for UAV-Enabled MmWave Networks: A Multi-Agent Deep Reinforcement Learning Approach

by Domingo, Mari Carmen in Algorithms, Alternative energy sources, Computer Simulation

2021

Unmanned Aerial Vehicle (UAV)-assisted cellular networks over the millimeter-wave (mmWave) frequency band can meet the requirements of a high data rate and flexible coverage in next-generation communication networks. However, higher propagation loss and the use of a large number of antennas in mmWave networks give rise to high energy consumption, and UAVs are constrained by their low-capacity onboard batteries. Energy harvesting (EH) is a viable solution to reduce the energy cost of UAV-enabled mmWave networks. However, the random nature of renewable energy makes it challenging to maintain robust connectivity in UAV-assisted terrestrial cellular networks. Energy cooperation allows UAVs to send their excess energy to other UAVs with reduced energy. In this paper, we propose a power allocation algorithm based on energy harvesting and energy cooperation to maximize the throughput of a UAV-assisted mmWave cellular network. Since there is channel-state uncertainty and the amount of harvested energy can be treated as a stochastic process, we propose an optimal multi-agent deep reinforcement learning (DRL) algorithm named Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to solve the renewable energy resource allocation problem for throughput maximization. The simulation results show that the proposed algorithm outperforms the Random Power (RP), Maximal Power (MP) and value-based Deep Q-Learning (DQL) algorithms in terms of network throughput.

Journal Article

An Improved Approach towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

by Wan, Kaifang; Li, Bo; Gao, Xiaoguang in adversarial learning, Algorithms, Collaboration

2021

A pursuit–evasion game is a classical maneuver confrontation problem in the multi-agent systems (MASs) domain. An online decision technique based on deep reinforcement learning (DRL) was developed in this paper to address the problem of environment sensing and decision-making in pursuit–evasion games. A control-oriented framework developed from the DRL-based multi-agent deep deterministic policy gradient (MADDPG) algorithm was built to implement multi-agent cooperative decision-making to overcome the limitation of the tedious state variables required for the traditionally complicated modeling process. To address the effects of errors between a model and a real scenario, this paper introduces adversarial disturbances. It also proposes a novel adversarial attack trick and adversarial learning MADDPG (A2-MADDPG) algorithm. By introducing an adversarial attack trick for the agents themselves, uncertainties of the real world are modeled, thereby optimizing robust training. During the training process, adversarial learning was incorporated into our algorithm to preprocess the actions of multiple agents, which enabled them to properly respond to uncertain dynamic changes in MASs. Experimental results verified that the proposed approach provides superior performance and effectiveness for pursuers and evaders, and both can learn the corresponding confrontational strategy during training.

Journal Article

Frequency Diversity Array Radar and Jammer Intelligent Frequency Domain Power Countermeasures Based on Multi-Agent Reinforcement Learning

by Gong, Jian; Wang, Chunyang; Tan, Ming in Algorithms, Countermeasures, domain

2024

With the development of electronic warfare technology, the intelligent jammer dramatically reduces the performance of traditional radar anti-jamming methods. A key issue is how to actively adapt radar to complex electromagnetic environments and design anti-jamming strategies to deal with intelligent jammers. The space of the electromagnetic environment is dynamically changing, and the transmitting power of the jammer and frequency diversity array (FDA) radar in each frequency band is continuously adjustable. Both can learn the optimal strategy by interacting with the electromagnetic environment. Considering that the competition between the FDA radar and the jammer is a confrontation process of two agents, we find the optimal power allocation strategy for both sides by using the multi-agent deep deterministic policy gradient (MADDPG) algorithm based on multi-agent reinforcement learning (MARL). Finally, the simulation results show that the power allocation strategy of the FDA radar and the jammer can converge and effectively improve the performance of the FDA radar and the jammer in the intelligent countermeasure environment.

Journal Article

Task Assignment of UAV Swarms Based on Deep Reinforcement Learning

by Wang, Changhong; Liu, Bo; Li, Qinghua in Algorithms, Analysis, Assignment problem

2023

UAV swarm applications are critical for the future, and their mission-planning and decision-making capabilities have a direct impact on their performance. However, creating a dynamic and scalable assignment algorithm that can be applied to various groups and tasks is a significant challenge. To address this issue, we propose the Extensible Multi-Agent Deep Deterministic Policy Gradient (Ex-MADDPG) algorithm, which builds on the MADDPG framework. The Ex-MADDPG algorithm improves the robustness and scalability of the assignment algorithm by incorporating local communication, mean simulation observation, a synchronous parameter-training mechanism, and a scalable multiple-decision mechanism. Our approach has been validated for effectiveness and scalability through both simulation experiments in the Multi-Agent Particle Environment (MPE) and a real-world experiment. Overall, our results demonstrate that the Ex-MADDPG algorithm is effective in handling various groups and tasks and can scale well as the swarm size increases. Therefore, our algorithm holds great promise for mission planning and decision-making in UAV swarm applications.

Journal Article

MADDPG-D2: An Intelligent Dynamic Task Allocation Algorithm Based on Multi-Agent Architecture Driven by Prior Knowledge

by Li, Tengda; Fu, Qiang; Wang, Gang in Algorithms, Differential thermal analysis, Multiagent systems

2024

To address the low solution accuracy and high decision pressure that a single agent faces in large-scale dynamic task allocation (DTA) with a high-dimensional decision space, this paper draws on deep reinforcement learning (DRL) theory and proposes an improved Multi-Agent Deep Deterministic Policy Gradient algorithm (MADDPG-D2), built on a multi-agent architecture with a dual experience replay pool and dual noise, to improve the efficiency of DTA. The algorithm is based on the traditional Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. It introduces a dual noise mechanism to enlarge the action exploration space in the early stage of training and a dual experience pool to improve data utilization. To speed up agent training and solve the cold-start problem, prior knowledge is also applied during training. Finally, the MADDPG-D2 algorithm is compared and analyzed on a digital battlefield of ground and air confrontation. The experimental results show that agents trained by the MADDPG-D2 algorithm achieve higher win rates and average rewards, use resources more reasonably, and better handle the high-dimensional decision spaces that are difficult for traditional single-agent algorithms. The MADDPG-D2 algorithm proposed in this paper, based on a multi-agent architecture, offers certain advantages and is a reasonable choice for DTA.

Journal Article

MW-MADDPG: a meta-learning based decision-making method for collaborative UAV swarm

by Liu, XiangYu; Guo, Xiangke; Wang, Gang in Algorithms, Cluster analysis, Collaboration

2023

Unmanned Aerial Vehicles (UAVs) have gained popularity due to their low lifecycle cost and minimal human risk, resulting in their widespread use in recent years. In the UAV swarm cooperative decision domain, multi-agent deep reinforcement learning has significant potential. However, current approaches are challenged by the multivariate mission environment and mission time constraints. In light of this, the present study proposes a meta-learning based multi-agent deep reinforcement learning approach that provides a viable solution to this problem. This paper presents an improved MAML-based multi-agent deep deterministic policy gradient (MADDPG) algorithm that achieves an unbiased initialization network by automatically assigning weights to meta-learning trajectories. In addition, a Reward-TD prioritized experience replay technique is introduced, which takes into account immediate reward and TD-error to improve the resilience and sample utilization of the algorithm. Experiment results show that the proposed approach effectively accomplishes the task in the new scenario, with significantly improved task success rate, average reward, and robustness compared to existing methods.
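The abstract does not give the Reward-TD priority formula, so the following is only a minimal sketch of the general idea: a replay priority that mixes immediate reward and TD-error, followed by standard proportional prioritized sampling. The function names, the convex mix controlled by `beta`, and the exponent `alpha` are all assumptions made for illustration, not the authors' definitions.

```python
import random


def reward_td_priorities(rewards, td_errors, beta=0.5, eps=1e-6):
    """Hypothetical priority: convex mix of |reward| and |TD error|.

    This is an illustrative assumption, not the formula from the paper.
    eps keeps every transition sampleable even when both terms are zero.
    """
    return [beta * abs(r) + (1.0 - beta) * abs(d) + eps
            for r, d in zip(rewards, td_errors)]


def sample_indices(priorities, batch_size, alpha=0.6, rng=random):
    """Proportional prioritized sampling: P(i) is proportional to priority_i ** alpha."""
    weights = [p ** alpha for p in priorities]
    return rng.choices(range(len(priorities)), weights=weights, k=batch_size)


if __name__ == "__main__":
    prios = reward_td_priorities([1.0, 0.0, 0.2], [0.1, 2.0, 0.0])
    print(sample_indices(prios, batch_size=4, rng=random.Random(0)))
```

With `alpha = 0` the sampling becomes uniform, and larger values concentrate replay on high-priority transitions.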

Journal Article

An unmanned tank combat game driven by FPSO-MADDPG algorithm

by Yan, Dan; Wang, Fei; Zhou, Yudong in Algorithms, Artificial intelligence, Back propagation

2024

With the development of artificial intelligence and unmanned technology, unmanned vehicles have been used in a variety of situations that may be hazardous to human beings, even on real battlefields. An intelligent unmanned vehicle can be aware of its surroundings and make appropriate decisions in response. For this purpose, this paper applies the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to vehicle situation awareness and decision making, within which a Fast Particle Swarm Optimization (FPSO) algorithm is proposed to calculate the optimal vehicle attitude and position; the result is an improved deep reinforcement learning algorithm, FPSO-MADDPG. A specific advantage function is designed for the FPSO portion, which considers angle, distance, and outflanking encirclement. A dedicated reward is designed for the MADDPG portion, which considers key factors such as angle, distance, and damage. Finally, FPSO-MADDPG is used in a combat game to operate unmanned tanks. Simulation results show that our method obtains not only a higher winning rate but also higher reward and faster convergence than the DDPG and MADDPG algorithms.

Journal Article

Research on Decision-Making Strategies for Multi-Agent UAVs in Island Missions Based on Rainbow Fusion MADDPG Algorithm

by Yang, Chaofan; Wang, Qi; Zhang, Meng in Algorithms, Analysis, Behavior

2025

To address the limitations of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm in autonomous control tasks (low convergence efficiency, poor training stability, inadequate adaptability of confrontation strategies, and challenges in handling sparse reward tasks), this paper proposes an enhanced algorithm by integrating the Rainbow module. The proposed algorithm improves long-term reward optimization through prioritized experience replay (PER) and multi-step TD updating mechanisms. Additionally, a dynamic reward allocation strategy is introduced to enhance the collaborative and adaptive decision-making capabilities of agents in complex adversarial scenarios. Furthermore, behavioral cloning is employed to accelerate convergence during the initial training phase. Extensive experiments are conducted on the MaCA simulation platform for 5 vs. 5 to 10 vs. 10 UAV island capture missions. The results demonstrate that the Rainbow-MADDPG outperforms the original MADDPG in several key metrics: (1) The average reward value improves across all confrontation scales, with notable enhancements in 6 vs. 6 and 7 vs. 7 tasks, achieving reward values of 14, representing 6.05-fold and 2.5-fold improvements over the baseline, respectively. (2) The convergence speed increases by 40%. (3) The combat effectiveness preservation rate doubles that of the baseline. Moreover, the algorithm achieves the highest average reward value in quasi-rectangular island scenarios, demonstrating its strong adaptability to large-scale dynamic game environments. This study provides an innovative technical solution to address the challenges of strategy stability and efficiency imbalance in multi-agent autonomous control tasks, with significant application potential in UAV defense, cluster cooperative tasks, and related fields.
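The multi-step TD updating mentioned in the abstract can be illustrated with the standard n-step target, which sums n discounted rewards and then bootstraps from a critic estimate. This is a minimal sketch of that textbook quantity only; the function name, the default discount factor, and the way the bootstrap value is supplied are assumptions, and the paper's actual update may differ.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Standard n-step TD target for the first state of a trajectory segment.

    rewards: the n rewards r_t, ..., r_{t+n-1} observed along the segment.
    bootstrap_value: critic estimate for the state reached after n steps
                     (assumed to come from a target network; use 0.0 if terminal).
    """
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


if __name__ == "__main__":
    # 1 + 0.5 * (1 + 0.5 * 0) = 1.5
    print(n_step_target([1.0, 1.0], bootstrap_value=0.0, gamma=0.5))
```

Compared with a one-step target, the n-step form propagates a sparse reward back over several transitions in a single update, which is the usual motivation for using it in sparse-reward tasks.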

Journal Article

Three-Dimensional Trajectory and Resource Allocation Optimization in Multi-Unmanned Aerial Vehicle Multicast System: A Multi-Agent Reinforcement Learning Method

by Hou, Yanzhao; Yu, Hongda; Wang, Dongyu in Algorithms, Altitude, Channel capacity

2023

Unmanned aerial vehicles (UAVs) are able to act as movable aerial base stations to enhance wireless coverage for edge users with poor ground communication quality. However, in urban environments, the link between UAVs and ground users can be blocked by obstacles, especially when complicated terrestrial infrastructures increase the probability of non-line-of-sight (NLoS) links. In this paper, in order to improve the average throughput, we propose a multi-UAV multicast system, where a multi-agent reinforcement learning method is utilized to help UAVs determine the optimal altitude and trajectory. Intelligent reflective surfaces (IRSs) are also employed to reflect signals to solve the blocking problem. Furthermore, since the UAV’s onboard power is limited, this paper aims to minimize the UAVs’ energy consumption and maximize the transmission rate for edge users by jointly optimizing the UAVs’ 3D trajectory and transmit power. First, we deduce the channel capacity of ground users in different multicast groups. Subsequently, the K-medoids algorithm is utilized for the multicast grouping problem of edge users based on transmission rate requirements. Then, we employ the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm to learn an optimal solution and eliminate the non-stationarity of multi-agent training. Finally, the simulation results show that the proposed system can increase the average throughput by approximately 14% compared to the non-grouping system, and the MADDPG algorithm can achieve a 20% improvement in reducing the energy consumption of UAVs compared to traditional deep reinforcement learning (DRL) methods.
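As a rough illustration of the grouping step, the sketch below runs a simple alternating K-medoids procedure on one-dimensional rate requirements. Everything specific here is an assumption: the abstract does not state the feature space, distance measure, or initialisation used, so the absolute-difference distance, the deterministic spread initialisation, and the sample rates are hypothetical.

```python
def k_medoids(values, k, max_iter=100):
    """Alternating k-medoids on 1-D values with absolute-difference distance."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    # Hypothetical deterministic initialisation: spread medoids over the sorted values.
    medoids = [order[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = [min(range(k), key=lambda j: abs(values[i] - values[medoids[j]]))
                  for i in range(n)]
        # Update step: in each cluster, pick the member with the lowest total distance.
        new_medoids = []
        for j in range(k):
            members = [i for i in range(n) if labels[i] == j]
            if not members:
                new_medoids.append(medoids[j])  # keep the old medoid for an empty cluster
                continue
            best = min(members,
                       key=lambda c: sum(abs(values[c] - values[m]) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


if __name__ == "__main__":
    # Made-up rate requirements, for illustration only.
    rates = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
    print(k_medoids(rates, k=2))
```

Unlike K-means, each group representative is an actual user's requirement rather than an average, so the groups are less sensitive to a single outlying value.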

Journal Article

Advanced Cooperative Formation Control in Variable-Sweep Wing UAVs via the MADDPG–VSC Algorithm

by Cao, Zhengyang; Chen, Gang in Adaptability, Algorithms, Control algorithms

2024

UAV technology is advancing rapidly, and variable-sweep wing UAVs are increasingly valuable because they can adapt to different flight conditions. However, conventional control methods often struggle with managing continuous action spaces and responding to dynamic environments, making them inadequate for complex multi-UAV cooperative formation control tasks. To address these challenges, this study presents an innovative framework that integrates dynamic modeling with morphing control, optimized by the multi-agent deep deterministic policy gradient for two-sweep control (MADDPG–VSC) algorithm. This approach enables real-time sweep angle adjustments based on current flight states, significantly enhancing aerodynamic efficiency and overall UAV performance. The precise motion state model for wing morphing developed in this study underpins the MADDPG–VSC algorithm’s implementation. The algorithm not only optimizes multi-UAV formation control efficiency but also improves obstacle avoidance, attitude stability, and decision-making speed. Extensive simulations and real-world experiments consistently demonstrate that the proposed algorithm outperforms contemporary methods in multiple aspects, underscoring its practical applicability in complex aerial systems. This study advances control technologies for morphing-wing UAV formation and offers new insights into multi-agent cooperative control, with substantial potential for real-world applications.

Journal Article
