Catalogue Search | MBRL
2 result(s) for "Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3)"
Research on a Cooperative Grasping Method for Heterogeneous Objects in Unstructured Scenarios of Mine Conveyor Belts Based on an Improved MATD3
by Gao, Rui; Du, Jingyi; Wu, Xudong
in Algorithms, Artificial intelligence, Coal-mining machinery
2025
Underground coal mine conveying systems operate in unstructured environments. Influenced by geological and operational factors, coal conveyors are frequently contaminated by foreign objects such as coal gangue and anchor bolts. These contaminants disrupt conveying stability and pose challenges to safe mining operations, making their effective removal critical. Given the significant heterogeneity and unpredictability of these objects in shape, size, and orientation, precise manipulation requires dual-arm cooperative control. Traditional control algorithms rely on precise dynamic models and fixed parameters, lacking robustness in such unstructured environments. To address these challenges, this paper proposes a cooperative grasping method tailored for heterogeneous objects in unstructured environments. The MATD3 algorithm is employed to cooperatively perform dual-arm trajectory planning and grasping tasks. A multi-factor reward function is designed to accelerate convergence in continuous action spaces, optimize real-time grasping trajectories for foreign objects, and ensure stable robotic arm positioning. Furthermore, prioritized experience replay (PER) is integrated into the MATD3 framework to enhance experience utilization and accelerate convergence toward optimal policies. For slender objects, a sequential cooperative optimization strategy is developed to improve the stability and reliability of grasping and placement. Experimental results demonstrate that the P-MATD3 algorithm significantly improves grasping success rates and efficiency in unstructured environments. In single-arm tasks, compared to MATD3 and MADDPG, P-MATD3 increases grasping success rates by 7.1% and 9.94%, respectively, while reducing the number of steps required to reach the pre-grasping point by 11.44% and 12.77%. In dual-arm tasks, success rates increased by 5.58% and 9.84%, respectively, while step counts decreased by 11.6% and 18.92%. Robustness testing under Gaussian noise demonstrated that P-MATD3 maintains high stability even with varying noise intensities. Finally, ablation and comparative experiments comprehensively validated the proposed method’s effectiveness in simulated environments.
Journal Article
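The abstract above mentions integrating experience replay with priorities into MATD3 but does not describe the mechanism. As a rough illustration only, the following is a minimal sketch of the commonly used proportional variant of prioritized replay, written in plain Python. The class name, the `alpha`, `beta`, and `eps` defaults, and the rule of giving new transitions the current maximum priority are all assumptions for illustration and are not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical sketch: sample transition i with P(i) = p_i**alpha / sum_k p_k**alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # keeps updated priorities strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # Assumption: new transitions start at the current maximum priority.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        return [s / total for s in scaled]

    def sample(self, batch_size, beta=0.4):
        probs = self.probabilities()
        n = len(self.data)
        idx = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # they are normalised by the largest possible weight so none exceeds 1.
        max_w = (n * min(probs)) ** (-beta)
        weights = [((n * probs[i]) ** (-beta)) / max_w for i in idx]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, indices, td_errors):
        # After a learning step, priority follows the magnitude of the TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop of this kind, the critic loss for each sampled transition would typically be multiplied by its importance weight, and `update_priorities` would be called with the fresh TD errors after each update.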
Optimizing uplink power control for energy efficiency in mmWave user-centric cell-free massive MIMO with deep reinforcement learning
by Absaloms, Heywood Ouma; Langat, Philip Kibet; Diarra, Dramane
in Adaptability, Cell-Free 6G Networks, Cell-free massive MIMO (CF-mMIMO)
2026
User-centric (UC) Cell-Free massive Multiple-Input Multiple-Output (CF-mMIMO) millimeter-wave (mmWave) networks are a promising solution to meet the performance requirements of next-generation wireless systems. However, maximizing energy efficiency in dense deployments remains challenging due to coordination overhead and highly dynamic propagation conditions. This work addresses uplink power control in UC CF-mMIMO networks and proposes a Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) approach trained under a centralized training and decentralized execution (CTDE) paradigm. The simulations are implemented in PyTorch and rely on the 3GPP TR 38.901 specification for the mmWave channel model over a UC architecture with 35 user equipments (UEs) and 100 distributed access points (APs). Simulation results indicate clear gains over both DRL baselines and conventional optimization methods. In particular, the proposed scheme reaches an energy efficiency of up to 380 Mbit/joule and maintains spectral efficiencies above 18 bps/Hz. Moreover, the method preserves user-level reliability: the median minimum per-user spectral efficiency remains above 9 bps/Hz, and the Jain fairness index reaches 0.96, preventing resource starvation while maintaining strict QoS guarantees. These findings demonstrate that multi-agent cooperation enables robust and energy-efficient power control policies, paving the way for cost-effective and scalable UC CF-mMIMO deployments.
Journal Article
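The abstract above reports a Jain fairness index of 0.96. For readers unfamiliar with the metric, the standard definition for n non-negative allocations x_1..x_n is J = (sum x_i)^2 / (n * sum x_i^2), which ranges from 1/n (one user receives everything) to 1 (all users receive the same). The sketch below computes it in plain Python. The function name and the handling of an all-zero input are assumptions for illustration, and the sample rates are made up rather than taken from the paper.

```python
def jain_fairness_index(values):
    """Jain's index: (sum x)^2 / (n * sum x^2) for non-negative allocations."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one allocation is required")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0:
        # Convention chosen here (an assumption): all-zero allocations are equal.
        return 1.0
    return (total * total) / (n * sum_sq)


# Hypothetical per-user spectral efficiencies in bps/Hz, for illustration only.
rates = [10.0, 12.0, 9.5, 11.0]
fairness = jain_fairness_index(rates)
```

A value close to 1 means the per-user rates are nearly equal, which is how the reported figure supports the claim that no user is starved of resources.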