Catalogue Search | MBRL

Grandmaster level in StarCraft II using multi-agent reinforcement learning

by McKinney, Katrina; Lillicrap, Timothy; Chung, Junyoung in 639/705/117, 639/705/531, Actors

2019

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions [1–3], the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems [4]. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks [5, 6]. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players. AlphaStar uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.

Journal Article

Allosteric nanobodies uncover a role of hippocampal mGlu2 receptor homodimers in contextual fear consolidation

by Moreno-Delgado, David; Mathieu, Michaël; Mathis, Gérard in 631/378/340, 631/61/338, 692/4017

2017

Antibodies have enormous therapeutic and biotechnological potential. G protein-coupled receptors (GPCRs), the main targets in drug development, are of major interest in antibody development programs. Metabotropic glutamate receptors are dimeric GPCRs that can control synaptic activity in a multitude of ways. Here we identify llama nanobodies that specifically recognize mGlu2 receptors among the eight mGlu receptor subtypes. Among these nanobodies, DN10 and DN13 are positive allosteric modulators (PAMs) of homodimeric mGlu2, and DN10 also displays significant partial agonist activity. DN10 and DN13 have no effect on mGlu2-3 and mGlu2-4 heterodimers. These PAMs enhance the inhibitory action of the orthosteric mGlu2/mGlu3 agonist DCG-IV at mossy fiber terminals in the CA3 region of hippocampal slices. DN13 also impairs contextual fear memory when injected into the CA3 region of the hippocampus. These data highlight the potential of developing antibodies with allosteric actions on GPCRs to better define their roles in vivo. G protein-coupled receptors are considered promising therapeutic targets. Here, the authors have identified nanobodies, or single-domain llama antibodies, that specifically enhance agonist-induced activity of a type of G protein-coupled receptor, the mGlu2 receptor.

Journal Article

The Accuracy or Inaccuracy of Affective Forecasts Depends on How Accuracy Is Indexed: A Meta-Analysis of Past Studies

by Gosling, Samuel D.; Mathieu, Michael Tyler in Affect, Affectivity. Emotion, Analytical forecasting

2012

Journal Article

Unsupervised Learning under Uncertainty

by Mathieu, Michaël in Artificial intelligence, Computer science

2017

Deep learning, in particular neural networks, has achieved remarkable success in recent years. However, most of this success is based on supervised learning and relies on ever-larger datasets and immense computing power. One step towards general artificial intelligence is to build a model of the world with enough knowledge to acquire a kind of "common sense". Representations learned by such a model could be reused in a number of other tasks. This would reduce the need for labelled samples and possibly yield a deeper understanding of the problem. The vast quantity of knowledge required to build common sense precludes the use of supervised learning and suggests relying on unsupervised learning instead. The concept of uncertainty is central to unsupervised learning. The task is usually to learn a complex, multimodal distribution. Density estimation and generative models aim at representing the whole distribution of the data, while predictive learning consists of predicting the state of the world given the context; more often than not, the prediction is not unique. That may be because the model lacks the capacity or the computing power to make a certain prediction, or because the future depends on parameters that are not part of the observation. Lastly, the world itself can be chaotic or truly stochastic. Representing complex, multimodal continuous distributions with deep neural networks is still an open problem. In this thesis, we first assess the difficulties of representing probabilities in high-dimensional spaces and review the related work in this domain. We then introduce two methods to address the problem of video prediction, first using a novel form of linearizing auto-encoders and latent variables, and second using Generative Adversarial Networks (GANs). We show how GANs can be seen as trainable loss functions to represent uncertainty, then how they can be used to disentangle factors of variation. Finally, we explore a new non-probabilistic framework for GANs.

Dissertation

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

by Gulcehre, Caglar; Mathieu, Michaël; Czarnecki, Wojciech Marian in Algorithms, Cloning, Datasets

2023

StarCraft II is one of the most challenging simulated reinforcement learning environments: it is partially observable, stochastic and multi-agent, and mastering it requires strategic planning over long time horizons combined with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that dataset to establish a benchmark, called AlphaStar Unplugged, which introduces unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning and offline variants of actor-critic and MuZero. We improve the state of the art for agents using only offline data, achieving a 90% win rate against the previously published AlphaStar behavior cloning agent.

Paper

Fast Approximation of Rotations and Hessians matrices

by Mathieu, Michael; LeCun, Yann in Approximation, Covariance matrix, Hessian matrices

2014

A new method to represent and approximate rotation matrices is introduced. The method represents approximations of a rotation matrix \(Q\) with linearithmic complexity, i.e. with \(\frac{1}{2}n\lg(n)\) rotations over pairs of coordinates, arranged in an FFT-like fashion. The approximation is "learned" using gradient descent. It can represent symmetric matrices \(H\) as \(QDQ^T\), where \(D\) is a diagonal matrix. It can be used to approximate the covariance matrices of Gaussian models in order to speed up inference, or to estimate and track the inverse Hessian of an objective function by relating changes in parameters to changes in gradient along the trajectory followed by the optimization procedure. Experiments were conducted to approximate synthetic matrices, covariance matrices of real data, and Hessian matrices of objective functions involved in machine learning problems.

Paper
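
As a rough illustration of the FFT-like arrangement described in the abstract above, the Python sketch below applies (n/2) lg(n) plane (Givens) rotations, grouped into lg(n) stages of n/2 disjoint coordinate pairs, and uses them to multiply by H = Q D Q^T without forming any dense matrix. The pairing rule (stage s rotates coordinates i and i + 2^s), the function names, and the random angles are assumptions for illustration only; the gradient-descent fitting of the angles that the abstract describes is not shown.

```python
import math
import random


def butterfly_stages(n):
    """Coordinate pairs for each of the lg(n) stages; n must be a power of two.

    Assumed pairing: stage s rotates (i, i + 2**s) for every i whose bit s is 0,
    as in a radix-2 FFT butterfly. Each stage touches every coordinate exactly once.
    """
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    stages, stride = [], 1
    while stride < n:
        stages.append([(i, i + stride) for i in range(n) if (i & stride) == 0])
        stride *= 2
    return stages


def apply_q(x, stages, angles):
    """Compute Q x, where Q is the product of all stage rotations (stage 0 applied first)."""
    y = list(x)
    for pairs, thetas in zip(stages, angles):
        for (i, j), t in zip(pairs, thetas):
            c, s = math.cos(t), math.sin(t)
            y[i], y[j] = c * y[i] - s * y[j], s * y[i] + c * y[j]
    return y


def apply_qt(x, stages, angles):
    """Compute Q^T x by undoing the stages in reverse order with transposed 2x2 rotations."""
    y = list(x)
    for pairs, thetas in zip(reversed(stages), reversed(angles)):
        for (i, j), t in zip(pairs, thetas):
            c, s = math.cos(t), math.sin(t)
            y[i], y[j] = c * y[i] + s * y[j], -s * y[i] + c * y[j]
    return y


def sym_matvec(x, stages, angles, diag):
    """Compute H x with H = Q D Q^T using O(n lg n) work and no dense n x n matrix."""
    z = apply_qt(x, stages, angles)
    z = [d * v for d, v in zip(diag, z)]
    return apply_q(z, stages, angles)


n = 8
stages = butterfly_stages(n)
rng = random.Random(0)
angles = [[rng.uniform(-math.pi, math.pi) for _ in pairs] for pairs in stages]
print(sum(len(pairs) for pairs in stages))  # 12 rotations = (8/2) * lg(8)
```

Because each stage touches every coordinate once, both Q x and Q^T x cost O(n lg n) rather than the O(n^2) of a dense matrix-vector product, which is presumably where the speed-up for inference comes from.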

Physical Simulation of Inarticulate Robots

by Mathieu, Michaël; Naccache, David; Claret, Guillaume in Physical simulation, Robots

2011

In this note we study the structure and the behavior of inarticulate robots. We introduce a robot that moves by successive revolutions. The robot's structure is analyzed, simulated and discussed in detail.

Paper

Energy-based Generative Adversarial Network

by Zhao, Junbo; Mathieu, Michael; LeCun, Yann in Architecture, Generative adversarial networks, Image resolution

2017

We introduce the "Energy-based Generative Adversarial Network" model (EBGAN), which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to probabilistic GANs, the generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to assign high energies to these generated samples. Viewing the discriminator as an energy function allows a wide variety of architectures and loss functionals to be used in addition to the usual binary classifier with logistic output. Among them, we show one instantiation of the EBGAN framework that uses an auto-encoder architecture in place of the discriminator, with the energy being the reconstruction error. We show that this form of EBGAN exhibits more stable behavior than regular GANs during training. We also show that a single-scale architecture can be trained to generate high-resolution images.

Paper
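
The auto-encoder instantiation described in the abstract can be sketched in a few lines of Python: the energy of a sample is its reconstruction error, the discriminator lowers the energy of real data and raises that of generated samples, and the generator lowers the energy of its own samples. This is a minimal sketch under stated assumptions: the energy is taken here as the mean squared reconstruction error, the hinge-with-margin form of the discriminator loss follows the EBGAN paper rather than the abstract, and the toy "auto-encoder" (a projection onto the line x1 = x2, standing in for a learned data manifold) and all function names are hypothetical.

```python
def energy(x, autoencoder):
    """Energy of a sample: mean squared reconstruction error under the auto-encoder."""
    recon = autoencoder(x)
    return sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)


def discriminator_loss(real, fake, autoencoder, margin):
    """Low energy on real data; push generated samples up to (but not past) the margin."""
    return energy(real, autoencoder) + max(0.0, margin - energy(fake, autoencoder))


def generator_loss(fake, autoencoder):
    """The generator is rewarded for samples the auto-encoder reconstructs well."""
    return energy(fake, autoencoder)


def toy_autoencoder(x):
    """Hypothetical stand-in: projects onto the line x1 == x2 (the 'data manifold')."""
    mean = sum(x) / len(x)
    return [mean] * len(x)


real = [1.0, 1.0]   # on the manifold: zero energy
fake = [1.0, -1.0]  # off the manifold: energy 1.0
print(discriminator_loss(real, fake, toy_autoencoder, margin=2.0))  # 1.0
print(generator_loss(fake, toy_autoencoder))                        # 1.0
```

Once a generated sample's energy exceeds the margin, the hinge term is zero and that sample no longer pushes the discriminator; for example, fake = [3.0, -3.0] has energy 9.0 and contributes nothing to the discriminator loss with margin 2.0.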

Deep multi-scale video prediction beyond mean square error

by Couprie, Camille; Mathieu, Michael; LeCun, Yann in Artificial neural networks, Computer vision, Learning

2016

Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has been a well-studied problem in computer vision for a long time, future frame prediction is rarely approached. Still, many vision applications could benefit from knowledge of the next frames of a video, which does not require the complexity of tracking every pixel's trajectory. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to different published results based on recurrent neural networks on the UCF101 dataset.

Paper
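
The image gradient difference loss mentioned in the abstract penalizes a prediction whose local intensity differences do not match those of the target, which is exactly what an averaged, blurry prediction fails at. A minimal pure-Python sketch, assuming the form sum over pixels of | |Y(i,j) - Y(i-1,j)| - |P(i,j) - P(i-1,j)| |^alpha plus the analogous horizontal term (Y the target, P the prediction), with images as lists of rows; the example images and function names are made up for illustration.

```python
def gradient_difference_loss(target, pred, alpha=1.0):
    """Sum of | |target gradient| - |predicted gradient| |**alpha over neighboring pixels."""
    rows, cols = len(target), len(target[0])
    loss = 0.0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # vertical neighbor pair (i-1, j), (i, j)
                gt = abs(target[i][j] - target[i - 1][j])
                gp = abs(pred[i][j] - pred[i - 1][j])
                loss += abs(gt - gp) ** alpha
            if j > 0:  # horizontal neighbor pair (i, j-1), (i, j)
                gt = abs(target[i][j] - target[i][j - 1])
                gp = abs(pred[i][j] - pred[i][j - 1])
                loss += abs(gt - gp) ** alpha
    return loss


def mse(target, pred):
    """Mean squared error over all pixels."""
    diffs = [(t - p) ** 2 for tr, pr in zip(target, pred) for t, p in zip(tr, pr)]
    return sum(diffs) / len(diffs)


target = [[0.0, 1.0], [0.0, 1.0]]     # a sharp vertical edge
blurry = [[0.5, 0.5], [0.5, 0.5]]     # the "average" guess MSE tends to favor
sharp = [[0.5, 1.5], [0.5, 1.5]]      # wrong brightness, but the edge is kept
print(mse(target, blurry), gradient_difference_loss(target, blurry))  # 0.25 2.0
print(mse(target, sharp), gradient_difference_loss(target, sharp))    # 0.25 0.0
```

Both predictions have the same MSE, but only the blurry one is penalized by the gradient term. Since the abstract presents it as one of three complementary strategies, such a term would typically be added to a reconstruction loss rather than used on its own.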

Learning to Linearize Under Uncertainty

by Goroshin, Ross; Mathieu, Michael; LeCun, Yann in Computer vision, Hierarchies, Training

2015

Training deep feature hierarchies to solve supervised learning tasks has achieved state-of-the-art performance on many problems in computer vision. However, a principled way in which to train such hierarchies in the unsupervised setting has remained elusive. In this work we suggest a new architecture and loss for training deep feature hierarchies that linearize the transformations observed in unlabeled natural video sequences. This is done by training a generative model to predict video frames. We also address the problem of inherent uncertainty in prediction by introducing into the network architecture latent variables that are non-deterministic functions of the input.

Paper
