Asset Details

Leveraging exploration in off-policy algorithms via normalizing flows

by Hjelm, R Devon; Pineau, Joelle; Mazoure, Bogdan; Doan, Thang; Durand, Audrey

in Algorithms / Domains / Exploration / Normalizing (statistics) / Optimization / Performance enhancement

2019

Paper

Overview

The ability to discover approximately optimal policies in domains with sparse rewards is crucial to applying reinforcement learning (RL) in many real-world scenarios. Approaches such as neural density models and continuous exploration (e.g., Go-Explore) have been proposed to maintain the high exploration rate necessary to find high-performing and generalizable policies. Soft actor-critic (SAC) is another method for improving exploration; it aims to combine efficient learning via off-policy updates with maximization of the policy entropy. In this work, we extend SAC to a richer class of probability distributions (e.g., multimodal ones) through normalizing flows (NF) and show that this significantly improves performance by accelerating the discovery of good policies while using much smaller policy representations. Our approach, which we call SAC-NF, is a simple, efficient, easy-to-implement modification of SAC that improves on it in continuous control benchmarks such as the MuJoCo and PyBullet Roboschool domains. Finally, SAC-NF achieves this while being significantly more parameter-efficient, using as few as 5.5% of the parameters of an equivalent SAC model.
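
The core idea described above is to make the SAC policy more expressive than a diagonal Gaussian by passing its samples through a normalizing flow. The NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It uses planar flows (Rezende and Mohamed, 2015) as the flow family, which may differ from the flows used in SAC-NF. It also applies the tanh squashing that standard SAC uses to bound actions. The names `PlanarFlow` and `sample_action` are hypothetical. The sketch shows only sampling and the change-of-variables log-density that SAC's entropy term needs; the networks that produce `mu` and `log_std` from the state, and all training, are omitted.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)


class PlanarFlow:
    """Hypothetical planar flow layer: f(z) = z + u_hat * tanh(w . z + b)."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=0.5, size=dim)
        self.u = rng.normal(scale=0.5, size=dim)
        self.b = 0.0

    def _u_hat(self):
        # Rescale u so that w . u_hat > -1, which keeps the flow invertible.
        wu = self.w @ self.u
        m = -1.0 + softplus(wu)
        return self.u + (m - wu) * self.w / (self.w @ self.w)

    def forward(self, z):
        """Map a batch z of shape (batch, dim); also return log|det Jacobian|."""
        u_hat = self._u_hat()
        h = np.tanh(z @ self.w + self.b)             # shape (batch,)
        z_new = z + np.outer(h, u_hat)               # shape (batch, dim)
        psi = np.outer(1.0 - h ** 2, self.w)         # d tanh(w.z + b)/dz, (batch, dim)
        log_det = np.log(np.abs(1.0 + psi @ u_hat))  # shape (batch,)
        return z_new, log_det


def sample_action(mu, log_std, flows, rng):
    """Sample tanh-squashed actions and their log-probabilities."""
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(log_std) * eps                   # reparameterized Gaussian sample
    # Log-density of the diagonal Gaussian base distribution.
    log_prob = -0.5 * np.sum(eps ** 2 + 2.0 * log_std + np.log(2.0 * np.pi), axis=-1)
    for flow in flows:
        z, log_det = flow.forward(z)
        log_prob = log_prob - log_det                # change of variables
    action = np.tanh(z)                              # squash actions into [-1, 1]
    # Correction for the tanh squashing, as in standard SAC.
    log_prob = log_prob - np.sum(np.log(1.0 - action ** 2 + 1e-6), axis=-1)
    return action, log_prob


rng = np.random.default_rng(0)
flows = [PlanarFlow(dim=2, rng=rng) for _ in range(3)]
mu = np.zeros((4, 2))        # stand-in for a policy network's mean output
log_std = np.zeros(2)        # stand-in for its log standard deviation
a, lp = sample_action(mu, log_std, flows, rng)
print(a.shape, lp.shape)     # (4, 2) (4,)
```

Each flow layer subtracts its log-determinant from the base Gaussian log-density, so `lp` remains a valid log-probability for the entropy bonus. Stacking more layers can, in principle, reshape the Gaussian into a multimodal action distribution that a single diagonal Gaussian cannot represent.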

Publisher

Cornell University Library, arXiv.org