Catalogue Search | MBRL

Grandmaster level in StarCraft II using multi-agent reinforcement learning

by McKinney, Katrina , Lillicrap, Timothy , Chung, Junyoung in 639/705/117 , 639/705/531 , Actors

2019

Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions 1 – 3 , the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems 4 . Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks 5 , 6 . We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players. AlphaStar uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.

Journal Article

Share this book

Add to My Shelf

Learning agents that acquire representations of social groups

by Vezhnevets, Alexander Sasha , Eckstein, Maria K. , Leibo, Joel Z. in Bias , Decision making , Deep learning

2022

Humans are learning agents that acquire social group representations from experience. Here, we discuss how to construct artificial agents capable of this feat. One approach, based on deep reinforcement learning, allows the necessary representations to self-organize. This minimizes the need for hand-engineering, improving robustness and scalability. It also enables “virtual neuroscience” research on the learned representations.

Journal Article

Share this book

Add to My Shelf

What is the simplest model that can account for high-fidelity imitation?

by Sunehag, Peter , Vezhnevets, Alexander Sasha , Duénez-Guzmán, Edgar A. in Algorithms , Artificial intelligence , Cooperation

2022

What inductive biases must be incorporated into multi-agent artificial intelligence models to get them to capture high-fidelity imitation? We think very little is needed. In the right environments, both instrumental- and ritual-stance imitation can emerge from generic learning mechanisms operating on non-deliberative decision architectures. In this view, imitation emerges from trial-and-error learning and does not require explicit deliberation.

Journal Article

Share this book

Add to My Shelf

A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings

by Agapiou, John P , Duéñez-Guzmán, Edgar , Leibo, Joel Z in Learning , Motivation , Multiagent systems

2022

Society is characterized by the presence of a variety of social norms: collective patterns of sanctioning that can prevent miscoordination and free-riding. Inspired by this, we aim to construct learning dynamics where potentially beneficial social norms can emerge. Since social norms are underpinned by sanctioning, we introduce a training regime where agents can access all sanctioning events but learning is otherwise decentralized. This setting is technologically interesting because sanctioning events may be the only available public signal in decentralized multi-agent systems where reward or policy-sharing is infeasible or undesirable. To achieve collective action in this setting we construct an agent architecture containing a classifier module that categorizes observed behaviors as approved or disapproved, and a motivation to punish in accord with the group. We show that social norms emerge in multi-agent systems containing this agent and investigate the conditions under which this helps them achieve socially beneficial outcomes.

Paper

Share this book

Add to My Shelf

Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia

by Matyas, Jayd , Avia Aharon , Agapiou, John P in Agent-based models , Application programming interface , Associative memory

2023

Agent-based modeling has been around for decades, and applied widely across the social and natural sciences. The scope of this research method is now poised to grow dramatically as it absorbs the new affordances provided by Large Language Models (LLM)s. Generative Agent-Based Models (GABM) are not just classic Agent-Based Models (ABM)s where the agents talk to one another. Rather, GABMs are constructed using an LLM to apply common sense to situations, act \"reasonably\", recall common semantic knowledge, produce API calls to control digital technologies like apps, and communicate both within the simulation and to researchers viewing it from the outside. Here we present Concordia, a library to facilitate constructing and working with GABMs. Concordia makes it easy to construct language-mediated simulations of physically- or digitally-grounded environments. Concordia agents produce their behavior using a flexible component system which mediates between two fundamental operations: LLM calls and associative memory retrieval. A special agent called the Game Master (GM), which was inspired by tabletop role-playing games, is responsible for simulating the environment where the agents interact. Agents take actions by describing what they want to do in natural language. The GM then translates their actions into appropriate implementations. In a simulated physical world, the GM checks the physical plausibility of agent actions and describes their effects. In digital environments simulating technologies such as apps and services, the GM may handle API calls to integrate with external tools such as general AI assistants (e.g., Bard, ChatGPT), and digital apps (e.g., Calendar, Email, Search, etc.). Concordia was designed to support a wide array of applications both in scientific research and for evaluating performance of real digital services by simulating users and/or generating synthetic data.

Paper

Share this book

Add to My Shelf

Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas

by Agapiou, John P , Hughes, Edward , Thomas, Anthony in Policies

2023

In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others. In reinforcement learning, SVO has been instantiated as an intrinsic motivation that remaps an agent's rewards based on particular target distributions of group reward. Prior studies show that groups of agents endowed with heterogeneous SVO learn diverse policies in settings that resemble the incentive structure of Prisoner's dilemma. Our work extends this body of results and demonstrates that (1) heterogeneous SVO leads to meaningfully diverse policies across a range of incentive structures in sequential social dilemmas, as measured by task-specific diversity metrics; and (2) learning a best response to such policy diversity leads to better zero-shot generalization in some situations. We show that these best-response agents learn policies that are conditioned on their co-players, which we posit is the reason for improved zero-shot generalization results.

Paper

Share this book

Add to My Shelf

Melting Pot 2.0

by Agapiou, John P , Mordatch, Igor , Kopparapu, Kavya in Agents (artificial intelligence) , Algorithms , Artificial intelligence

2023

Multi-agent artificial intelligence research promises a path to develop intelligent technologies that are more human-like and more human-compatible than those produced by \"solipsistic\" approaches, which do not consider interactions between agents. Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios. Each scenario pairs a physical environment (a \"substrate\") with a reference set of co-players (a \"background population\"), to create a social situation with substantial interdependence between the individuals involved. For instance, some scenarios were inspired by institutional-economics-based accounts of natural resource management and public-good-provision dilemmas. Others were inspired by considerations from evolutionary biology, game theory, and artificial life. Melting Pot aims to cover a maximally diverse set of interdependencies and incentives. It includes the commonly-studied extreme cases of perfectly-competitive (zero-sum) motivations and perfectly-cooperative (shared-reward) motivations, but does not stop with them. As in real-life, a clear majority of scenarios in Melting Pot have mixed incentives. They are neither purely competitive nor purely cooperative and thus demand successful agents be able to navigate the resulting ambiguity. Here we describe Melting Pot 2.0, which revises and expands on Melting Pot. We also introduce support for scenarios with asymmetric roles, and explain how to integrate them into the evaluation protocol. This report also contains: (1) details of all substrates and scenarios; (2) a complete description of all baseline algorithms and results. Our intention is for it to serve as a reference for researchers using Melting Pot 2.0.

Paper

Share this book

Add to My Shelf

Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

by Matyas, Jayd , Agapiou, John P , Graepel, Thore in Cooperation , Deduction , Game theory

2022

A key challenge in the study of multiagent cooperation is the need for individual agents not only to cooperate effectively, but to decide with whom to cooperate. This is particularly critical in situations when other agents have hidden, possibly misaligned motivations and goals. Social deduction games offer an avenue to study how individuals might learn to synthesize potentially unreliable information about others, and elucidate their true motivations. In this work, we present Hidden Agenda, a two-team social deduction game that provides a 2D environment for studying learning agents in scenarios of unknown team alignment. The environment admits a rich set of strategies for both teams. Reinforcement learning agents trained in Hidden Agenda show that agents can learn a variety of behaviors, including partnering and voting without need for communication in natural language.

Paper

Share this book

Add to My Shelf

Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

by Matyas, Jayd , Agapiou, John P , Mordatch, Igor in Algorithms , Melting , Multiagent systems

2021

Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks). Our contribution, Melting Pot, is a MARL evaluation suite that fills this gap, and uses reinforcement learning to reduce the human labor required to create novel test scenarios. This works because one agent's behavior constitutes (part of) another agent's environment. To demonstrate scalability, we have created over 80 unique test scenarios covering a broad range of research topics such as social dilemmas, reciprocity, resource sharing, and task partitioning. We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.

Paper

Share this book

Add to My Shelf

Language Selector

MBRLGlobalSearch

Language Selector

Catalogue Search | MBRL

Search Results Heading

Explore the vast range of titles available.

MBRLSearchResults

MBRLHappinessMeter