Catalogue Search | MBRL
53 result(s) for "twin delayed deep deterministic policy gradient"
Multi-Objective Energy Management Strategy for Hybrid Electric Vehicles Based on TD3 with Non-Parametric Reward Function
by Du, Changqing; Wang, Jinhai; Yan, Fuwu
in Artificial intelligence; Automobiles, Electric; Consumption
2023
The energy management system (EMS) of hybridization and electrification plays a pivotal role in improving the stability and cost-effectiveness of future vehicles. Existing efforts mainly concentrate on specific optimization targets, like fuel consumption, without sufficiently taking into account the degradation of on-board power sources. In this context, a novel multi-objective energy management strategy based on deep reinforcement learning is proposed for a hybrid electric vehicle (HEV), explicitly conscious of lithium-ion battery (LIB) wear. To be specific, this paper mainly contributes to three points. Firstly, a non-parametric reward function is introduced, for the first time, into the twin-delayed deep deterministic policy gradient (TD3) strategy, to facilitate the optimality and adaptability of the proposed energy management strategy and to mitigate the effort of parameter tuning. Then, to cope with the problem of state redundancy, state space refinement techniques are included in the proposed strategy. Finally, battery health is incorporated into this multi-objective energy management strategy. The efficacy of this framework is validated, in terms of training efficiency, optimality and adaptability, under various standard driving tests.
Journal Article
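The records in this listing all build on TD3 without spelling out its update rule. As a rough orientation only, the following minimal sketch shows the three ingredients of the standard TD3 formulation: a clipped double-Q target, target policy smoothing, and delayed actor updates. The function names, scalar actions, and default hyperparameters are illustrative assumptions, not the implementation of any paper listed here.

```python
import random


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: r + gamma * (1 - done) * min(Q1', Q2').

    Taking the minimum of the two target critics counters the
    overestimation bias of a single critic.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def smoothed_target_action(action, noise_std=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    action, then clip the result to the valid action range.

    Hypothetical scalar-action version; real agents use vectors.
    """
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(low, min(high, action + noise))


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: refresh the actor (and target networks)
    only once every `policy_delay` critic updates."""
    return step % policy_delay == 0
```

In a full agent these pieces sit inside the training loop: the critics regress toward `td3_target` evaluated at the smoothed target action, and the actor is updated only on steps where `should_update_actor` returns true.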
Efficient TD3 based path planning of mobile robot in dynamic environments using prioritized experience replay and LSTM
2025
To address the challenges of sample utilization efficiency and temporal dependencies, this paper proposes an efficient path planning method for mobile robots in dynamic environments based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm. The proposed method, named PL-TD3, integrates prioritized experience replay (PER) and long short-term memory (LSTM) neural networks, which improve both sample efficiency and the ability to handle time-series data. To verify the effectiveness of the proposed method, simulation and practical experiments were designed and conducted. The simulation experiments included both static and dynamic obstacles in the test environment, along with experiments to assess generalization capabilities. The algorithm demonstrated superior performance in terms of both execution time and path efficiency. The practical experiments, based on the assumptions from the simulation tests, further confirmed that PL-TD3 improves the effectiveness and robustness of path planning for mobile robots in dynamic environments.
Journal Article
Integrating self-attention and LSTM into TD3 for robust mobile robot navigation in dynamic environments
2026
Mobile robot path planning in dynamic environments is challenging because existing deep reinforcement learning methods lack temporal memory, suffer from inefficient sample utilization under uniform replay, and face credit assignment difficulties with sparse rewards. This paper proposes the Self-Attention LSTM TD3 (SAL-TD3) algorithm, which integrates LSTM networks and multi-head self-attention into the TD3 framework to capture temporal dependencies for proactive obstacle avoidance. A rank-based prioritized experience replay with n-step returns improves sample efficiency, and a composite reward function provides dense feedback for efficient policy learning. Experiments show that SAL-TD3 achieves a 91% success rate (vs. 77% for TD3), reduces path length by 16.6%, and lowers collision rate from 23% to 9%. Generalization tests and real-world robot deployment confirm robust sim-to-real transfer performance.
Journal Article
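The abstract above mentions rank-based prioritized experience replay combined with n-step returns. A minimal sketch of those two calculations, under stated assumptions, might look as follows. The function names, the exponent `alpha`, and the plain-list data layout are hypothetical choices for illustration and are not taken from the SAL-TD3 paper.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step return:
    r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1) + gamma^n * bootstrap_value.
    """
    g = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def rank_based_probabilities(td_errors, alpha=0.7):
    """Rank-based prioritization: a transition's priority is
    (1 / rank) ** alpha, where rank 1 is the largest |TD error|.
    Returns sampling probabilities in the original transition order.
    """
    order = sorted(range(len(td_errors)),
                   key=lambda i: abs(td_errors[i]), reverse=True)
    priorities = [0.0] * len(td_errors)
    for rank, idx in enumerate(order, start=1):
        priorities[idx] = (1.0 / rank) ** alpha
    total = sum(priorities)
    return [p / total for p in priorities]
```

Ranking by error magnitude rather than using the raw magnitudes makes the sampling distribution less sensitive to outlier TD errors, which is the usual motivation for the rank-based variant.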
Visual Target-Driven Robot Crowd Navigation with Limited FOV Using Self-Attention Enhanced Deep Reinforcement Learning
2025
Navigating crowded environments poses significant challenges for mobile robots, particularly as traditional Simultaneous Localization and Mapping (SLAM)-based methods often struggle with dynamic and unpredictable settings. This paper proposes a visual target-driven navigation method using self-attention enhanced deep reinforcement learning (DRL) to overcome these limitations. The navigation policy is developed based on the Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm, enabling efficient obstacle avoidance and target pursuit. We utilize a single RGB-D camera with a limited field of view (FOV) for target detection and surrounding sensing, where environmental features are extracted from depth data via a convolutional neural network (CNN). A self-attention network (SAN) is employed to compensate for the limited FOV, enhancing the robot’s capability of searching for the target when it is temporarily lost. Experimental results show that our method achieves a higher success rate and shorter average target-reaching time in dynamic environments, while offering hardware simplicity, cost-effectiveness, and ease of deployment in real-world applications.
Journal Article
Reinforcement learning-driven model predictive control for optimizing counter-rotating permanent magnet synchronous motor in submarine propulsion system
by Dulecha, Kejela Adane; Ararso, Zawde Tolossa; Delelew, Eliyab Yosef
in 639/166; 639/4077; Alternative energy sources
2026
Counter-rotating permanent magnet synchronous motors (CRPMSMs) are increasingly favored in underwater applications because of their high torque density, efficiency, and ability to cancel yaw-inducing moments by using dual rotors that spin in opposite directions. However, keeping the rotors synchronized under varying loads and dynamic underwater conditions poses significant control challenges. To address these limitations, this research proposes a Reinforcement Learning-Driven Model Predictive Control (RL-MPC) scheme for optimizing the performance of CRPMSMs in submarine propulsion systems. The RL-MPC control architecture uses a Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning agent. The system is modeled in MATLAB/Simulink, with the CRPMSM represented in the d-q reference frame and driven by a voltage source inverter (VSI) using Space Vector Pulse Width Modulation (SVPWM). The RL-MPC controller is evaluated under three conditions: constant speed with a variable balanced load, variable speed with a constant load, and constant speed with unbalanced load variation. Simulation results confirm that, compared with standalone MPC, RL-MPC improves motor performance by enhancing speed tracking, reducing torque ripple, maintaining rotor synchronization, and improving transient response. Quantitatively, during unbalanced-load resynchronization the total harmonic distortion (THD) of the stator current fell from 9.3% (MPC) to 3.4% (RL-MPC), overshoot decreased from 30% to 16.6%, and settling time shortened from 1.4 s to 0.7 s. These figures correspond to a 63.4% reduction in THD, a 45% reduction in overshoot, and a 50% reduction in settling time under unbalanced load conditions. Finally, a Lyapunov-based stability analysis confirms the closed-loop stability of the system.
Journal Article
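The percentage improvements quoted in the abstract above follow from the reported before and after values. As a quick check, taking the relative reduction as (MPC value − RL-MPC value) / MPC value:

```latex
\begin{align*}
\text{THD:} \quad & \frac{9.3 - 3.4}{9.3} \approx 0.634 = 63.4\% \\
\text{Overshoot:} \quad & \frac{30 - 16.6}{30} \approx 0.447 \approx 45\% \\
\text{Settling time:} \quad & \frac{1.4 - 0.7}{1.4} = 0.50 = 50\%
\end{align*}
```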
Adaptive Non-Singular Fast Terminal Sliding Mode Trajectory Tracking Control for Robotic Manipulator with Novel Configuration Based on TD3 Deep Reinforcement Learning and Nonlinear Disturbance Observer
2026
This work proposes a non-singular fast terminal sliding mode control (NFTSMC) strategy based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and a nonlinear disturbance observer (NDO) to address the issues of modeling errors, motion disturbances, and transmission friction in robotic manipulators. Firstly, a novel modular serial 5-DOF robotic manipulator configuration is designed, and its kinematic and dynamic models are established. Secondly, a nonlinear disturbance observer is employed to estimate the total disturbance of the system and apply feedforward compensation. Based on boundary layer technology, an improved NFTSMC method is proposed to accelerate the convergence of tracking errors, reduce chattering, and avoid singularity issues inherent in traditional terminal sliding mode control. The stability of the designed control system is proved using Lyapunov stability theory. Subsequently, a deep reinforcement learning (DRL) agent based on the TD3 algorithm is trained to adaptively adjust the control gains of the non-singular fast terminal sliding mode controller. The dynamic information of the robotic manipulator is used as the input to the TD3 agent, which searches for optimal controller parameters within a continuous action space. A composite reward function is designed to ensure the stable and efficient learning of the TD3 agent. Finally, the motion characteristics of three joints for the designed 5-DOF robotic manipulator are analyzed. 
The results show that compared to the non-singular fast terminal sliding mode control algorithm based on a nonlinear disturbance observer (NDONFT), the non-singular fast terminal sliding mode control algorithm integrating a nonlinear disturbance observer and the Twin Delayed Deep Deterministic Policy Gradient algorithm (TD3NDONFT) reduces the mean absolute error of position tracking for the three joints by 7.14%, 19.94%, and 6.14%, respectively, and reduces the mean absolute error of velocity tracking by 1.78%, 9.10%, and 2.11%, respectively. These results verify the effectiveness of the proposed algorithm in enhancing the trajectory tracking accuracy of the robotic manipulator under unknown time-varying disturbances and demonstrate its strong robustness against sudden disturbances.
Journal Article
Optimization of broadband metamaterial absorber using twin delayed deep deterministic policy gradient reinforcement learning technique
by Obayya, Salah S. A.; Mahmoud, Basant E.; Hameed, Mohamed Farhat O.
in 639/166; 639/301; 639/624
2026
This paper presents a new reinforcement learning (RL)-driven inverse design strategy that leverages the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm for the efficient optimization of photonic structures, with a focus on metamaterial absorbers (MAs) and cross polarization converters (CPC) as demonstrative applications. Unlike conventional heuristic or surrogate-based optimization methods, the proposed RL approach autonomously learns the optimal geometric configuration through direct interaction with the simulation environment, without requiring gradient information or pre-built surrogate models. Initially, the TD3 model is used to optimize the geometric parameters of an existing MA based on an L-shaped resonator, significantly enhancing its absorption performance to be greater than 90% in the frequency range from 12.2 GHz to 22.4 GHz in only 23 iterations. Then, a novel CPC design is proposed, optimized using the same RL framework, and subsequently fabricated. The fabricated structure achieves high polarization conversion ratio (PCR) above 90% over a wide frequency range from 11.8 GHz to 24.2 GHz, covering the full Ku band and most of the K band. Furthermore, over most of the frequency range, the converter maintains strong performance under oblique incidence, with PCR levels above 80% up to an angle of 50°. These results validate the effectiveness of the TD3-based RL framework in discovering high-performance and fabrication-ready designs, while also establishing a scalable and generalizable optimization paradigm for advanced photonic devices.
Journal Article
Data-driven model identification and control of the quasi-zero-stiffness system
2025
Nonlinear vibration isolation is widely employed for vibration suppression. An integrated identification and control method based on data-driven approaches is proposed for solving the optimal control law of a nonlinear time-continuous dynamic system. A dynamic surrogate model of the quasi-zero-stiffness (QZS) vibration isolation system is established from the input and output signals of the original model, using an identification algorithm that combines a physical information neural network with the Runge–Kutta method. Two approximate optimal controllers are trained, one through particle swarm optimization with a ‘loser-out’ skill and one through the twin delayed deep deterministic policy gradient (TD3), with respect to a self-defined objective function; the controllers interact with the dynamic surrogate model during training. Then, the comprehensive performance under variable load and Gaussian noise excitation, as well as the displacement transmissibility, is tested on the original model. The results show that the identified surrogate model accurately reproduces the dynamic characteristics of the original model and that the trained controllers accomplish the control tasks successfully with a certain adaptability, further enhancing the low-frequency vibration isolation of the QZS isolator.
Journal Article
RIS Wireless Network Optimization Based on TD3 Algorithm in Coal-Mine Tunnels
2025
As an emerging technology, Reconfigurable Intelligent Surfaces (RIS) offers an efficient communication performance optimization solution for the complex and spatially constrained environment of coal mines by effectively controlling signal-propagation paths. This study investigates the channel attenuation characteristics of a semi-circular arch coal-mine tunnel with a dual RIS reflection link. By jointly optimizing the base-station beamforming matrix and the RIS phase-shift matrix, an improved Twin Delayed Deep Deterministic Policy Gradient (TD3)-based algorithm with a Noise Fading (TD3-NF) propagation optimization scheme is proposed, effectively improving the sum rate of the coal-mine wireless communication system. Simulation results show that when the transmit power is 38 dBm, the average link rate of the system reaches 11.1 bps/Hz, representing a 29.07% improvement compared to Deep Deterministic Policy Gradient (DDPG). The average sum rate of the 8 × 8 structure RIS is 3.3 bps/Hz higher than that of the 4 × 4 structure. The research findings provide new solutions for optimizing mine communication quality and applying artificial intelligence technology in complex environments.
Journal Article
Research on a Cooperative Grasping Method for Heterogeneous Objects in Unstructured Scenarios of Mine Conveyor Belts Based on an Improved MATD3
by Gao, Rui; Du, Jingyi; Wu, Xudong
in Algorithms; Artificial intelligence; Coal-mining machinery
2025
Underground coal mine conveying systems operate in unstructured environments. Influenced by geological and operational factors, coal conveyors are frequently contaminated by foreign objects such as coal gangue and anchor bolts. These contaminants disrupt conveying stability and pose challenges to safe mining operations, making their effective removal critical. Given the significant heterogeneity and unpredictability of these objects in shape, size, and orientation, precise manipulation requires dual-arm cooperative control. Traditional control algorithms rely on precise dynamic models and fixed parameters, lacking robustness in such unstructured environments. To address these challenges, this paper proposes a cooperative grasping method tailored for heterogeneous objects in unstructured environments. The MATD3 algorithm is employed to cooperatively perform dual-arm trajectory planning and grasping tasks. A multi-factor reward function is designed to accelerate convergence in continuous action spaces, optimize real-time grasping trajectories for foreign objects, and ensure stable robotic arm positioning. Furthermore, priority experience replay (PER) is integrated into the MATD3 framework to enhance experience utilization and accelerate convergence toward optimal policies. For slender objects, a sequential cooperative optimization strategy is developed to improve the stability and reliability of grasping and placement. Experimental results demonstrate that the P-MATD3 algorithm significantly improves grasping success rates and efficiency in unstructured environments. In single-arm tasks, compared to MATD3 and MADDPG, P-MATD3 increases grasping success rates by 7.1% and 9.94%, respectively, while reducing the number of steps required to reach the pre-grasping point by 11.44% and 12.77%. In dual-arm tasks, success rates increased by 5.58% and 9.84%, respectively, while step counts decreased by 11.6% and 18.92%. 
Robustness testing under Gaussian noise demonstrated that P-MATD3 maintains high stability even with varying noise intensities. Finally, ablation and comparative experiments comprehensively validated the proposed method’s effectiveness in simulated environments.
Journal Article