Catalogue Search | MBRL
1,015 result(s) for "Multi-agent reinforcement learning"
Hierarchical multi‐agent reinforcement learning for multi‐aircraft close‐range air combat
2023
Close‐range autonomous air combat has gained significant attention from researchers working on applications of artificial intelligence (AI). Most previous studies on autonomous air combat focused on one‐on‐one scenarios; however, modern air combat is mostly conducted in formations. To address this gap, a novel hierarchical maneuvering control architecture is introduced for the multi‐aircraft close‐range air combat scenario, which can handle air combat scenarios with variable‐size formations. Subsequently, three air combat sub‐tasks are designed, and the recurrent soft actor‐critic (RSAC) algorithm combined with competitive self‐play (SP) is used to learn the sub‐strategies. A novel hierarchical multi‐agent reinforcement learning (HMARL) algorithm is proposed to obtain the high‐level strategy for target and sub‐strategy selection. The training performance of the algorithms for the sub‐strategies and the high‐level strategy is evaluated in different air combat scenarios. The obtained strategies are analyzed, and it is found that the formations exhibit effective cooperative behavior in symmetric and asymmetric scenarios. Finally, ideas for the engineering implementation of the maneuvering control architecture are given. The study provides a solution for future multi‐aircraft autonomous air combat. To address maneuvering control in homogeneous multi‐aircraft close‐range air combat, a hierarchical maneuvering control architecture is proposed that supports formations with varying numbers of aircraft, exhibits good performance and symmetry, and outputs smooth control commands. We also present some ideas for the engineering realization.
Journal Article
Multi‐Agent Reinforcement Learning Algorithm Based on Local Observation Imitation Learning
2025
This paper investigates the error accumulation problem in centralized training and decentralized execution (CTDE) policy‐based multi‐agent reinforcement learning (MARL) algorithms, which arises from local observation inaccuracies. To address this issue, we propose a novel MARL algorithm that incorporates imitation learning using local observations. Firstly, by analysing the multi‐agent proximal policy optimization algorithm and examining the problems arising when global states are replaced with local observations, it is proved that insufficient observations can lead to information loss, thereby introducing errors of advantage function, and it is demonstrated that the generalized advantage estimation method accumulates errors during the training process. Then, imitation learning is introduced and a novel training framework that combines reinforcement learning and imitation learning is proposed. During the reinforcement learning phase, an MARL agent trained with global observations acts as an expert. Subsequently, imitation learning is applied to train another agent that mimics the expert's decisions using only local observations. Finally, the effectiveness of this algorithm is verified in some commonly used multi‐agent environments, which demonstrates its superior performance compared to traditional multi‐agent reinforcement learning algorithms. In this work, we propose a novel multi‐agent reinforcement learning algorithm that leverages local observation imitation learning to effectively mitigate the error accumulation caused by local observation errors in centralized training and decentralized execution frameworks. By combining reinforcement learning with imitation learning techniques, our algorithm demonstrates superior performance compared to traditional MARL methods, especially in scenarios with incomplete local observations.
Journal Article
GHQ: grouped hybrid Q-learning for cooperative heterogeneous multi-agent reinforcement learning
2024
Previous deep multi-agent reinforcement learning (MARL) algorithms have achieved impressive results, typically in symmetric and homogeneous scenarios. However, asymmetric heterogeneous scenarios are prevalent and usually harder to solve. In this paper, the main discussion is about the cooperative heterogeneous MARL problem in asymmetric heterogeneous maps of the Starcraft Multi-Agent Challenges (SMAC) environment. Recent mainstream approaches use policy-based actor-critic algorithms to solve the heterogeneous MARL problem with various individual agent policies. However, these approaches lack formal definition and further analysis of the heterogeneity problem. Therefore, a formal definition of the Local Transition Heterogeneity (LTH) problem is first given. Then, the LTH problem in SMAC environment can be studied. To comprehensively reveal and study the LTH problem, some new asymmetric heterogeneous maps in SMAC are designed. It has been observed that baseline algorithms fail to perform well in the new maps. Then, the authors propose the Grouped Individual-Global-Max (GIGM) consistency and a novel MARL algorithm, Grouped Hybrid Q-Learning (GHQ). GHQ separates agents into several groups and keeps individual parameters for each group. To enhance cooperation between groups, GHQ maximizes the mutual information between trajectories of different groups. A novel hybrid structure for value factorization in GHQ is also proposed. Finally, experiments on the original and the new maps show the fabulous performance of GHQ compared to other state-of-the-art algorithms.
Journal Article
Applications of Multi-Agent Deep Reinforcement Learning: Models and Algorithms
by Abdikarim Mohamed Ibrahim, Kok-Lim Alvin Yau, Yung-Wey Chong
in Algorithms, applied reinforcement learning, Biology (General)
2021
Recent advancements in deep reinforcement learning (DRL) have led to its application in multi-agent scenarios to solve complex real-world problems, such as network resource allocation and sharing, network routing, and traffic signal controls. Multi-agent DRL (MADRL) enables multiple agents to interact with each other and with their operating environment, and learn without the need for external critics (or teachers), thereby solving complex problems. Significant performance enhancements brought about by the use of MADRL have been reported in multi-agent domains; for instance, it has been shown to provide higher quality of service (QoS) in network resource allocation and sharing. This paper presents a survey of MADRL models that have been proposed for various kinds of multi-agent domains, in a taxonomic approach that highlights various aspects of MADRL models and applications, including objectives, characteristics, challenges, applications, and performance measures. Furthermore, we present open issues and future directions of MADRL.
Journal Article
Solving Action Semantic Conflict in Physically Heterogeneous Multi-Agent Reinforcement Learning with Generalized Action-Prediction Optimization
2025
Traditional multi-agent reinforcement learning (MARL) algorithms typically implement global parameter sharing across various types of heterogeneous agents without meticulously differentiating between different action semantics. This approach results in the action semantic conflict problem, which decreases the generalization ability of policy networks across heterogeneous types of agents and decreases the cooperation among agents in intricate scenarios. Conversely, completely independent agent parameters significantly escalate computational costs and training complexity. To address these challenges, we introduce an adaptive MARL algorithm named Generalized Action-Prediction Optimization (GAPO). First, we introduce the Generalized Action Space (GAS), which represents the union of all agent actions with distinct semantics. All agents first compute their unified representation in the GAS, and then generate their heterogeneous action policies with different available action masks. Second, in order to further improve cooperation between heterogeneous groups, we propose a Cross-Group Prediction (CGP) loss, which adaptively predicts the action policies of other groups by leveraging trajectory information. We integrate the GAPO into both value-based and policy-based MARL algorithms, giving rise to two practical algorithms: G-QMIX and G-MAPPO. Experimental results obtained within the SMAC, MPE, MAMuJoCo, and RPE environments demonstrate the superiority of G-QMIX and G-MAPPO over several state-of-the-art MARL methods, validating the effectiveness of our proposed adaptive generalized MARL approach.
Journal Article
Power Allocation and Energy Cooperation for UAV-Enabled MmWave Networks: A Multi-Agent Deep Reinforcement Learning Approach
Unmanned Aerial Vehicle (UAV)-assisted cellular networks over the millimeter-wave (mmWave) frequency band can meet the requirements of a high data rate and flexible coverage in next-generation communication networks. However, higher propagation loss and the use of a large number of antennas in mmWave networks give rise to high energy consumption and UAVs are constrained by their low-capacity onboard battery. Energy harvesting (EH) is a viable solution to reduce the energy cost of UAV-enabled mmWave networks. However, the random nature of renewable energy makes it challenging to maintain robust connectivity in UAV-assisted terrestrial cellular networks. Energy cooperation allows UAVs to send their excessive energy to other UAVs with reduced energy. In this paper, we propose a power allocation algorithm based on energy harvesting and energy cooperation to maximize the throughput of a UAV-assisted mmWave cellular network. Since there is channel-state uncertainty and the amount of harvested energy can be treated as a stochastic process, we propose an optimal multi-agent deep reinforcement learning algorithm (DRL) named Multi-Agent Deep Deterministic Policy Gradient (MADDPG) to solve the renewable energy resource allocation problem for throughput maximization. The simulation results show that the proposed algorithm outperforms the Random Power (RP), Maximal Power (MP) and value-based Deep Q-Learning (DQL) algorithms in terms of network throughput.
Journal Article
Multi-Task Multi-Agent Reinforcement Learning for Real-Time Scheduling of a Dual-Resource Flexible Job Shop with Robots
by Xie, Zhiqiang; Zhu, Xiaofei; Wang, Yaping
in Algorithms, Artificial intelligence, Decision making
2023
In this paper, a real-time scheduling problem of a dual-resource flexible job shop with robots is studied. Multiple independent robots and their supervised machine sets form their own work cells. First, a mixed integer programming model is established, which considers the scheduling problems of jobs and machines in the work cells, and of jobs between work cells, based on the process plan flexibility. Second, in order to make real-time scheduling decisions, a framework of multi-task multi-agent reinforcement learning based on centralized training and decentralized execution is proposed. Each agent interacts with the environment and completes three decision-making tasks: job sequencing, machine selection, and process planning. In the process of centralized training, the value network is used to evaluate and optimize the policy network to achieve multi-agent cooperation, and the attention mechanism is introduced into the policy network to realize information sharing among multiple tasks. In the process of decentralized execution, each agent performs multiple task decisions through local observations according to the trained policy network. Then, observation, action, and reward are designed. Rewards include global and local rewards, which are decomposed into sub-rewards corresponding to tasks. The reinforcement learning training algorithm is designed based on a double-deep Q-network. Finally, the scheduling simulation environment is derived from benchmarks, and the experimental results show the effectiveness of the proposed method.
Journal Article
A survey of multi-agent deep reinforcement learning with communication
by Zhu, Changxi; Dastani, Mehdi; Wang, Shihan
in Artificial Intelligence, Communication, Computer Science
2024
Communication is an effective mechanism for coordinating the behaviors of multiple agents, broadening their views of the environment, and supporting their collaboration. In the field of multi-agent deep reinforcement learning (MADRL), agents can improve the overall learning performance and achieve their objectives through communication. Agents can communicate various types of messages, either to all agents or to specific agent groups, or conditioned on specific constraints. Despite the growing body of research on MADRL with communication (Comm-MADRL), there is a lack of a systematic and structural approach to distinguish and classify existing Comm-MADRL approaches. In this paper, we survey recent works in the Comm-MADRL field and consider various aspects of communication that can play a role in designing and developing multi-agent reinforcement learning systems. With these aspects in mind, we propose 9 dimensions along which Comm-MADRL approaches can be analyzed, developed, and compared. By projecting existing works into the multi-dimensional space, we discover interesting trends. We also propose some novel directions for designing future Comm-MADRL systems through exploring possible combinations of the dimensions.
Journal Article
Adaptive Policy Switching for Multi-Agent ASVs in Multi-Objective Aquatic Cleaning Environments
by Marín, Sergio Toral; Yanes-Luis, Samuel; Seck, Dame
in autonomous surface vehicles, Communication, Decision making
2026
Plastic pollution in aquatic environments is a major ecological problem requiring scalable autonomous solutions for cleanup. This study addresses the coordination of multiple Autonomous Surface Vehicles by formulating the problem as a Partially Observable Markov Game and decoupling the mission into two tasks: exploration to maximize coverage and cleaning to collect trash. These tasks share navigation requirements but present conflicting goals, motivating a multi-objective learning approach. The proposed multi-agent deep reinforcement learning framework uses a single Multitask Deep Q-network shared by all the agents, with a convolutional backbone and two heads, one dedicated to exploration and the other to cleaning. Parameter sharing and egocentric state design leverage agent homogeneity and enable experience aggregation across tasks. An adaptive mechanism governs task switching, combining task-specific rewards with a weighted aggregation and selecting tasks via a reward-greedy strategy. This enables the construction of Pareto fronts capturing non-dominated solutions. The framework demonstrates improvements over fixed-phase approaches, improving hypervolume and uniformity metrics by 14% and 300%, respectively. It also adapts to diverse initial trash distributions, providing decision-makers with a portfolio of effective and adaptive strategies for autonomous plastic cleanup.
Journal Article
Learning team-based navigation: a review of deep reinforcement learning techniques for multi-agent pathfinding
by Younes, Younes Al; Chung, Jaehoon; Najjaran, Homayoun
in Agents, Algorithms, Artificial Intelligence
2024
Multi-agent pathfinding (MAPF) is a critical field in many large-scale robotic applications, often being the fundamental step in multi-agent systems. The increasing complexity of MAPF in complex and crowded environments, however, critically diminishes the effectiveness of existing solutions. In contrast to other studies that have either presented a general overview of the recent advancements in MAPF or extensively reviewed Deep Reinforcement Learning (DRL) within multi-agent system settings independently, our work presented in this review paper focuses on highlighting the integration of DRL-based approaches in MAPF. Moreover, we aim to bridge the current gap in evaluating MAPF solutions by addressing the lack of unified evaluation indicators and providing comprehensive clarification on these indicators. Finally, our paper discusses the potential of model-based DRL as a promising future direction and provides its required foundational understanding to address current challenges in MAPF. Our objective is to assist readers in gaining insight into the current research direction, providing unified indicators for comparing different MAPF algorithms and expanding their knowledge of model-based DRL to address the existing challenges in MAPF.
Journal Article