2 results for "Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3)"
Research on a Cooperative Grasping Method for Heterogeneous Objects in Unstructured Scenarios of Mine Conveyor Belts Based on an Improved MATD3
Underground coal mine conveying systems operate in unstructured environments. Influenced by geological and operational factors, coal conveyors are frequently contaminated by foreign objects such as coal gangue and anchor bolts. These contaminants disrupt conveying stability and pose challenges to safe mining operations, making their effective removal critical. Because these objects vary widely and unpredictably in shape, size, and orientation, precise manipulation requires dual-arm cooperative control. Traditional control algorithms rely on precise dynamic models and fixed parameters, and they lack robustness in such unstructured environments. To address these challenges, this paper proposes a cooperative grasping method tailored for heterogeneous objects in unstructured environments. The MATD3 algorithm is employed to cooperatively perform dual-arm trajectory planning and grasping tasks. A multi-factor reward function is designed to accelerate convergence in continuous action spaces, optimize real-time grasping trajectories for foreign objects, and ensure stable robotic arm positioning. Furthermore, prioritized experience replay (PER) is integrated into the MATD3 framework to enhance experience utilization and accelerate convergence toward optimal policies; the resulting algorithm is referred to as P-MATD3. For slender objects, a sequential cooperative optimization strategy is developed to improve the stability and reliability of grasping and placement. Experimental results demonstrate that the P-MATD3 algorithm significantly improves grasping success rates and efficiency in unstructured environments. In single-arm tasks, compared to MATD3 and MADDPG, P-MATD3 increases grasping success rates by 7.1% and 9.94%, respectively, while reducing the number of steps required to reach the pre-grasping point by 11.44% and 12.77%. In dual-arm tasks, success rates increase by 5.58% and 9.84%, respectively, while step counts decrease by 11.6% and 18.92%. Robustness testing under Gaussian noise demonstrates that P-MATD3 remains stable across varying noise intensities. Finally, ablation and comparative experiments validate the proposed method's effectiveness in simulated environments.
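The abstract above does not say how experience replay is prioritized. As a rough illustration only, the sketch below assumes the common proportional variant: transitions are sampled with probability proportional to priority raised to a power alpha, and importance-sampling weights controlled by beta correct the resulting bias. The names `priority_from_td_error` and `per_sample` and the default values of `alpha`, `beta`, and `eps` are hypothetical and are not taken from the paper.

```python
import random


def priority_from_td_error(td_error, eps=1e-6):
    """Assumed priority rule: absolute TD error plus a small constant."""
    return abs(td_error) + eps


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample buffer indices proportionally to priority**alpha.

    Returns (indices, weights), where weights are importance-sampling
    weights normalized by the largest weight in the sampled batch
    (a common simplification, assumed here).
    """
    if not priorities:
        raise ValueError("priorities must be non-empty")
    rng = rng or random.Random()

    # Turn priorities into a sampling distribution.
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]

    # Draw indices with replacement according to that distribution.
    n = len(priorities)
    indices = rng.choices(range(n), weights=probs, k=batch_size)

    # Importance-sampling weights: (N * P(i)) ** (-beta), scaled to at most 1.
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    weights = [w / max_w for w in raw]
    return indices, weights
```

In a training loop, the sampled indices would select transitions from the replay buffer, and the weights would scale each transition's critic loss before the priorities are refreshed from the new TD errors.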
Optimizing uplink power control for energy efficiency in mmWave user-centric cell-free massive MIMO with deep reinforcement learning
User-centric (UC) Cell-Free massive Multiple-Input Multiple-Output (CF-mMIMO) millimeter-wave (mmWave) networks are a promising solution to meet the performance requirements of next-generation wireless systems. However, maximizing energy efficiency in dense deployments remains challenging due to coordination overhead and highly dynamic propagation conditions. This work addresses uplink power control in UC CF-mMIMO networks and proposes a Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (MATD3) approach trained under a centralized training and decentralized execution (CTDE) paradigm. The simulations are implemented in PyTorch and use the 3GPP TR 38.901 specification for the mmWave channel model, over a UC architecture with 35 user equipments (UEs) and 100 distributed access points (APs). Simulation results indicate clear gains over both DRL baselines and conventional optimization methods. In particular, the proposed scheme reaches an energy efficiency of up to 380 Mbit/Joule and maintains spectral efficiencies above 18 bps/Hz. The method also preserves user-level reliability: the median minimum per-user spectral efficiency remains above 9 bps/Hz, and the Jain fairness index reaches 0.96, preventing resource starvation while maintaining strict QoS guarantees. These findings demonstrate that multi-agent cooperation enables robust and energy-efficient power control policies, paving the way for cost-effective and scalable UC CF-mMIMO deployments.
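For readers unfamiliar with the fairness metric cited above, the Jain fairness index of n non-negative allocations x_1..x_n is (sum of x_i)^2 / (n * sum of x_i^2). It equals 1 when every user receives the same allocation and falls to 1/n when a single user receives everything. The sketch below computes it for a list of per-user spectral efficiencies. The function name and sample values are illustrative and do not come from the paper.

```python
def jain_fairness_index(values):
    """Jain fairness index: (sum x)^2 / (n * sum x^2), in [1/n, 1]."""
    values = [float(v) for v in values]
    if not values:
        raise ValueError("values must be non-empty")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0.0:
        # All-zero allocations are equal; returning 1.0 is a convention
        # assumed here, since the formula itself is undefined (0/0).
        return 1.0
    return (total * total) / (len(values) * sum_sq)


# Hypothetical per-user spectral efficiencies in bps/Hz.
example_rates = [9.5, 10.0, 11.0, 12.5]
example_index = jain_fairness_index(example_rates)
```

An index of 0.96, as reported in the abstract, therefore indicates that per-user rates are close to uniform across the UEs.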