Trust-Region Diffusion Policies for Massively Parallel On-Policy RL

by Le, Huy; Blessing, Denis; Voelcker, Claas A; Hoang, Tai; Neumann, Gerhard; Celik, Onur; Brunnbauer, Axel; Richter, Felix; Volpp, Michael

2026

Paper

Overview

Reinforcement learning with massively parallel simulations has become a standard framework for developing robust, deployable policies; however, most existing approaches still rely on simple Gaussian policy parameterizations. Diffusion models provide a more expressive policy class and have shown strong performance on challenging control problems, yet most diffusion-based RL methods are designed for offline or off-policy training. In this work, we ask whether diffusion policies can be trained effectively in the massively parallel, on-policy regime. To this end, we introduce Trust-region Diffusion Policies (TruDi), a method that enables diffusion policies for on-policy RL with massively parallel simulations. This setting is particularly challenging because the data distribution changes quickly across updates, making stable training with complex policies difficult. TruDi addresses this by integrating a trust-region optimization rule that enforces a KL-divergence constraint over the entire diffusion trajectory. Empirically, we evaluate TruDi on a diverse set of four massively parallel RL benchmarks comprising a total of 73 tasks. Across these tasks, TruDi consistently matches or outperforms strong baselines on standard tasks and achieves clear gains on more challenging humanoid control tasks, establishing a strong new baseline for massively parallel on-policy RL.
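
As a rough illustration of what a KL constraint "over the entire diffusion trajectory" can mean, the toy sketch below may help. It is not the TruDi algorithm, whose update rule this record does not describe. It assumes that every denoising step is an isotropic Gaussian whose standard deviation is shared by the old and the new policy, and that both chains start from the same noise distribution. Under those assumptions the chain rule of KL divergence reduces the trajectory-level KL to a sum of per-step Gaussian KLs, estimated here along one sampled trajectory. The function names, the `kl_bound` parameter, and the accept/reject check are all hypothetical.

```python
import numpy as np


def gaussian_step_kl(mu_new, mu_old, sigma):
    """KL(N(mu_new, sigma^2 I) || N(mu_old, sigma^2 I)) for a shared std sigma."""
    diff = np.asarray(mu_new, dtype=float) - np.asarray(mu_old, dtype=float)
    return float(np.sum(diff ** 2) / (2.0 * sigma ** 2))


def trajectory_kl(mus_new, mus_old, sigmas):
    """Sum the per-step KLs along one denoising trajectory.

    mus_new[t] and mus_old[t] are the step means that the new and old
    policies predict at the same intermediate sample (assumed inputs).
    """
    return sum(
        gaussian_step_kl(m_new, m_old, s)
        for m_new, m_old, s in zip(mus_new, mus_old, sigmas)
    )


def within_trust_region(mus_new, mus_old, sigmas, kl_bound):
    """Hypothetical check: is the trajectory-level KL inside the bound?"""
    return trajectory_kl(mus_new, mus_old, sigmas) <= kl_bound


# Two denoising steps on a 2-D action, with made-up numbers.
mus_new = [np.array([1.0, 0.0]), np.array([0.0, 0.0])]
mus_old = [np.array([0.0, 0.0]), np.array([0.0, 2.0])]
sigmas = [1.0, 2.0]

total_kl = trajectory_kl(mus_new, mus_old, sigmas)  # 0.5 + 0.5 = 1.0
print(total_kl, within_trust_region(mus_new, mus_old, sigmas, kl_bound=1.5))
```

The point of the sketch is only that the bound applies to the sum over all denoising steps, not to each step separately. How a violation is handled (for example by projection, a penalty, or rejecting the update) is left open here.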

Publisher

Cornell University Library, arXiv.org

Subject

Control tasks / Policies