Catalogue Search | MBRL

Deep Reinforcement Learning (DRL)-Driven Intelligent Scheduling of Virtual Power Plants

by Zhou Jiren, Sun Yuqin, Kang, Zheng in Accuracy, Algorithms, Artificial intelligence

2025

Driven by the global energy transition and carbon-neutrality goals, virtual power plants (VPPs) are expected to aggregate distributed energy resources and participate in multiple electricity markets while achieving economic efficiency and low carbon emissions. However, the strong volatility of wind and photovoltaic generation, together with the coupling between electric and thermal loads, makes real-time VPP scheduling challenging. Existing deep reinforcement learning (DRL)-based methods still suffer from limited predictive awareness and insufficient handling of physical and carbon-related constraints. To address these issues, this paper proposes an improved model, termed SAC-LAx, based on the Soft Actor–Critic (SAC) deep reinforcement learning algorithm for intelligent VPP scheduling. The model integrates an Attention–xLSTM prediction module and a Linear Programming (LP) constraint module: the former performs multi-step forecasting of loads and renewable generation to construct an extended state representation, while the latter projects raw DRL actions onto a feasible set that satisfies device operating limits, energy balance, and carbon trading constraints. These two modules work together with the SAC algorithm to form a closed perception–prediction–decision–control loop. A campus integrated-energy virtual power plant is adopted as the case study. The system consists of a gas–steam combined-cycle power plant (CCPP), battery storage, a heat pump, a thermal storage unit, wind turbines, photovoltaic arrays, and a carbon trading mechanism. Comparative simulation results show that, at the forecasting level, the Attention–xLSTM (Ax) module reduces the day-ahead electric load Mean Absolute Percentage Error (MAPE) from 4.51% and 5.77% obtained by classical Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models to 2.88%, significantly improving prediction accuracy. 
At the scheduling level, the SAC-LAx model achieves an average reward of approximately 1440 and converges within around 2500 training episodes, outperforming other DRL algorithms such as Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), and Proximal Policy Optimization (PPO). Under the SAC-LAx framework, the daily net operating cost of the VPP is markedly reduced. With the carbon trading mechanism, the total carbon emission cost decreases by about 49% compared with the no-trading scenario, while electric–thermal power balance is maintained. These results indicate that integrating prediction enhancement and LP-based safety constraints with deep reinforcement learning provides a feasible pathway for low-carbon intelligent scheduling of VPPs.
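
The abstract describes the LP constraint module only in outline: raw DRL actions are projected onto a set that satisfies device operating limits and energy balance. The sketch below illustrates that idea on a toy case and is not the authors' formulation. It assumes one power-balance equality plus per-unit lower and upper limits, and it minimises the L1 distance to the raw action. For that special case the linear program can be solved by clipping each setpoint and then spreading the remaining imbalance in one direction, so no solver is needed. The function name `project_to_feasible`, the three units, and all numbers are hypothetical.

```python
from typing import List, Sequence


def project_to_feasible(
    raw: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    demand: float,
    tol: float = 1e-9,
) -> List[float]:
    """Return setpoints within [lower, upper] that sum to demand,
    at minimum L1 distance from the raw action (toy single-balance case)."""
    if not (sum(lower) - tol <= demand <= sum(upper) + tol):
        raise ValueError("demand cannot be met within the unit limits")

    # Step 1: clip each raw setpoint into its own operating range.
    x = [min(max(a, lo), hi) for a, lo, hi in zip(raw, lower, upper)]

    # Step 2: spread the remaining imbalance over units that still have room.
    # Every unit moves in the same direction, which keeps the L1 cost minimal.
    residual = demand - sum(x)
    for i in range(len(x)):
        if abs(residual) <= tol:
            break
        if residual > 0:
            step = min(residual, upper[i] - x[i])
        else:
            step = max(residual, lower[i] - x[i])
        x[i] += step
        residual -= step
    return x


# Hypothetical raw actor output in kW: [CCPP, battery discharge, grid import]
raw_action = [520.0, -30.0, 100.0]
lower = [0.0, 0.0, 0.0]
upper = [400.0, 100.0, 300.0]
safe_action = project_to_feasible(raw_action, lower, upper, demand=600.0)
print(safe_action)
```

With richer constraints such as coupled electric and thermal balances, storage dynamics, or carbon trading terms, the greedy step no longer applies, and a general LP solver (for example `scipy.optimize.linprog`) would be the more natural choice.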

Journal Article
