Catalogue Search | MBRL

Grandmaster level in StarCraft II using multi-agent reinforcement learning

by McKinney, Katrina , Lillicrap, Timothy , Chung, Junyoung in 639/705/117 , 639/705/531 , Actors

2019

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions [1–3], the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems [4]. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks [5, 6]. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players. AlphaStar uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.

Journal Article

'Seldom Scene' tackles the clarinet

by Horgan, Dan in Clarinet music

2021

Newsletter

WATCH 'The Seldom Scene!' What's smaller than but looks like a tuba?

by Horgan, Dan

2021

Newsletter

The Mandalorian Steals the Bounty on Disney Plus

by Horgan, Dan in Bounty hunters , Favreau, Jon

2020

Newsletter

Vision-Language Models as a Source of Rewards

by Masoom, Hussain , Nikulin, Dmitry , Rocktäschel, Tim in Vision

2024

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.
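As a rough illustration of how a reward might be derived from a vision-language model, the sketch below compares an image embedding with a goal-text embedding and emits a binary reward when they are similar enough. The abstract does not specify the reward formula, so the cosine-similarity rule, the `threshold` value, and the function names are all assumptions made for this example; the embeddings are plain lists standing in for the outputs of a CLIP-style encoder, and no real model is called.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def vlm_reward(image_embedding, goal_embedding, threshold=0.5):
    """Hypothetical binary reward: 1.0 when the observation embedding is
    close enough to the language-goal embedding, else 0.0.
    The threshold is an assumed hyperparameter, not a value from the paper."""
    similarity = cosine_similarity(image_embedding, goal_embedding)
    return 1.0 if similarity >= threshold else 0.0


# Toy embeddings: an observation aligned with the goal, and one orthogonal to it.
print(vlm_reward([1.0, 0.0], [1.0, 0.0]))
print(vlm_reward([1.0, 0.0], [0.0, 1.0]))
```

In a real setup the two embeddings would come from the image and text towers of the same pretrained model, and the reward would be computed once per environment step.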

Paper

Distributed Prioritized Experience Replay

by Barth-Maron, Gabriel , van Hasselt, Hado , Hessel, Matteo in Algorithms , Machine learning , Neural networks

2018

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
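The split between actors and learner described above can be pictured with a toy shared replay memory, sketched below under stated assumptions. It uses proportional prioritization (sampling probability proportional to priority raised to a power `alpha`), which is one common form of prioritized replay; the abstract does not give the exact scheme, and the class name, the `alpha` default, and the first-in-first-out eviction are illustrative choices rather than the authors' implementation.

```python
import random


class PrioritizedReplay:
    """Toy proportional prioritized replay buffer (illustrative only)."""

    def __init__(self, capacity, alpha=0.6, seed=0):
        self.capacity = capacity
        self.alpha = alpha      # assumed exponent: 0 gives uniform sampling
        self.items = []         # stored transitions
        self.priorities = []    # one priority per stored transition
        self.rng = random.Random(seed)

    def add(self, transition, priority):
        """Called by actors: store new experience with an initial priority."""
        if len(self.items) >= self.capacity:
            # Evict the oldest transition once the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size):
        """Called by the learner: draw indices with probability
        proportional to priority ** alpha."""
        weights = [p ** self.alpha for p in self.priorities]
        return self.rng.choices(range(len(self.items)), weights=weights, k=batch_size)

    def update(self, index, priority):
        """Called by the learner after a gradient step, e.g. with a new TD error."""
        self.priorities[index] = float(priority)


buffer = PrioritizedReplay(capacity=3)
buffer.add("low-error transition", priority=0.1)
buffer.add("high-error transition", priority=5.0)
batch = buffer.sample(batch_size=4)
print([buffer.items[i] for i in batch])
```

In a distributed setting many actor processes would call `add` concurrently while a single learner calls `sample` and `update`; the sketch leaves out that concurrency and any importance-sampling correction.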

Paper

Unicorn: Continual Learning with a Universal, Off-policy Agent

by Barreto, André , Schaul, Tom , Mankowitz, Daniel J in Curricula , Domains , Initiatives

2018

Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that has resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.

Paper

Distributed Distributional Deterministic Policy Gradients

by Lillicrap, Timothy , Hoffman, Matthew W , Dabney, Will in Algorithms , Control tasks , Locomotion

2018

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of N-step returns and prioritized experience replay. Experimentally we examine the contribution of each of these individual components, and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks the D4PG algorithm achieves state-of-the-art performance.
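The N-step returns mentioned above can be illustrated with a short sketch. It computes the standard scalar N-step target: the discounted sum of the next N rewards plus a discounted bootstrap value from the critic. The function name and arguments are assumptions for this example, and the sketch covers only the scalar case, not the distributional update that D4PG applies.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Scalar N-step target:
    r_0 + gamma*r_1 + ... + gamma^(N-1)*r_(N-1) + gamma^N * bootstrap_value,
    where N = len(rewards) and bootstrap_value is a critic estimate
    for the state reached after N steps."""
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# Three rewards of 1, a bootstrap estimate of 10, and a discount of 0.5:
# 1 + 0.5*1 + 0.25*1 + 0.125*10 = 3.0
print(n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5))
```

Larger N propagates reward information faster at the cost of relying on more off-policy action choices, which is the usual trade-off when picking N.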

Paper

COLUMN: Help is available for alcohol, drug addiction

by Horgan, Dan in Drug addiction

2007

Newspaper Article

Psychologically speaking: Bipolar disorder is a treatable condition

by Horgan, Dan in Behavior , Bipolar disorder , Medical disorders

2007

The average age of onset is in the late teens or early 20s. Because of better understanding of the disorder, we are now able to identify its presence in younger adolescents and in children as well.

Newspaper Article
