39 result(s) for "Cai, Trevor"
Grandmaster level in StarCraft II using multi-agent reinforcement learning
Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions [1–3], the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems [4]. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks [5,6]. We evaluated our agent, AlphaStar, in the full game of StarCraft II, through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players. AlphaStar uses a multi-agent reinforcement learning algorithm and has reached Grandmaster level, ranking among the top 0.2% of human players for the real-time strategy game StarCraft II.
Red Teaming Language Models with Language Models
Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using human annotators to hand-write test cases. However, human annotation is expensive, limiting the number and diversity of test cases. In this work, we automatically find cases where a target LM behaves in a harmful way, by generating test cases ("red teaming") using another LM. We evaluate the target LM's replies to generated test questions using a classifier trained to detect offensive content, uncovering tens of thousands of offensive replies in a 280B parameter LM chatbot. We explore several methods, from zero-shot generation to reinforcement learning, for generating test cases with varying levels of diversity and difficulty. Furthermore, we use prompt engineering to control LM-generated test cases to uncover a variety of other harms, automatically finding groups of people that the chatbot discusses in offensive ways, personal and hospital phone numbers generated as the chatbot's own contact info, leakage of private training data in generated text, and harms that occur over the course of a conversation. Overall, LM-based red teaming is one promising tool (among many needed) for finding and fixing diverse, undesirable LM behaviors before impacting users.
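The generate–reply–classify loop the abstract describes can be sketched in a few lines. The three callables here (question generator, target model, offensiveness classifier) are hypothetical placeholders standing in for the paper's actual components:

```python
# Minimal sketch of LM-vs-LM red teaming: one model generates test
# questions, the target model answers, and a classifier flags replies.
# All three callables are illustrative stand-ins, not the paper's models.

def red_team(generate_question, target_lm, is_offensive, n_cases=100):
    """Return (question, reply) pairs whose replies the classifier flags."""
    failures = []
    for _ in range(n_cases):
        question = generate_question()   # e.g. zero-shot or RL-tuned generator
        reply = target_lm(question)      # target chatbot's response
        if is_offensive(reply):          # learned offensiveness classifier
            failures.append((question, reply))
    return failures
```

The paper's methods differ mainly in how `generate_question` is produced (zero-shot prompting through reinforcement learning), trading off diversity against the rate of flagged replies.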
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
Training Compute-Optimal Large Language Models
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
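The equal-scaling rule in the abstract can be made concrete with a small sketch. It uses the common approximation C ≈ 6·N·D for training FLOPs and anchors the constants at Chinchilla's headline configuration (70B parameters, 1.4T tokens); both are assumptions drawn from the paper's summary numbers, not fitted coefficients:

```python
# Sketch of the compute-optimal rule: parameters N and training tokens D
# each scale as C^0.5, so 4x compute buys 2x params and 2x tokens.
# Anchoring at (70e9 params, 1.4e12 tokens) is an illustrative assumption.

def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Return (params, tokens) splitting a FLOP budget equally in both axes."""
    n_ref, d_ref = 70e9, 1.4e12        # Chinchilla: 70B params, 1.4T tokens
    c_ref = 6 * n_ref * d_ref          # approximate training FLOPs, C = 6*N*D
    scale = (compute_flops / c_ref) ** 0.5
    return n_ref * scale, d_ref * scale

# Quadrupling compute doubles both model size and token count:
n1, d1 = compute_optimal(6e23)
n2, d2 = compute_optimal(4 * 6e23)
assert abs(n2 / n1 - 2.0) < 1e-9 and abs(d2 / d1 - 2.0) < 1e-9
```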
Unified Scaling Laws for Routed Language Models
The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.
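The two-axis scaling law and the Effective Parameter Count the abstract mentions can be illustrated with a toy functional form. The power law and its coefficients below are invented for illustration, not the paper's fitted law:

```python
# Toy two-variable scaling law: loss as a power law in parameter count N
# and expert count E. Collapsing both axes into one "effective" count
# N_eff = N * E**(beta/alpha) gives the same predicted loss as a dense
# (single-expert) model. Coefficients a, alpha, beta are illustrative.

def predicted_loss(n_params: float, n_experts: float,
                   a=8.0, alpha=0.08, beta=0.02) -> float:
    """L = a * N^-alpha * E^-beta (assumed form, not the paper's fit)."""
    return a * n_params**-alpha * n_experts**-beta

def effective_params(n_params: float, n_experts: float,
                     alpha=0.08, beta=0.02) -> float:
    """Single axis along which routed and dense models scale identically."""
    return n_params * n_experts ** (beta / alpha)
```

By construction, `predicted_loss(N, E)` equals `predicted_loss(effective_params(N, E), 1)`, which is the sense in which all models "scale at the same rate" along the effective axis.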
Improving language models by retrieving from trillions of tokens
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a 2 trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25× fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
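The retrieval step the abstract describes (fetch the database chunks most similar to each input chunk, then condition generation on them) can be sketched as follows. The lexical-overlap similarity here is a crude stand-in for RETRO's frozen-BERT embeddings, purely for illustration:

```python
# Sketch of chunk-level nearest-neighbour retrieval. RETRO embeds chunks
# with a frozen BERT; this bag-of-words Jaccard similarity is an assumed
# placeholder so the example stays self-contained.

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; stand-in for embedding similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def retrieve_neighbours(chunk: str, database: list[str], k: int = 2) -> list[str]:
    """Return the k database chunks most similar to the input chunk."""
    return sorted(database, key=lambda d: similarity(chunk, d), reverse=True)[:k]
```

In the full model these retrieved neighbours are encoded and attended to via chunked cross-attention rather than simply concatenated to the input.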
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.
RPG interacts with E3-ligase CERBERUS to mediate rhizobial infection in Lotus japonicus
Symbiotic interactions between rhizobia and legumes result in the formation of root nodules, which fix nitrogen that can be used for plant growth. Rhizobia usually invade legume roots through a plant-made tunnel-like structure called an infection thread (IT). RPG (Rhizobium-directed polar growth) encodes a coiled-coil protein that has been identified in Medicago truncatula as required for root nodule infection, but the function of RPG remains poorly understood. In this study, we identified and characterized RPG in Lotus japonicus and determined that it is required for IT formation. RPG was induced by Mesorhizobium loti or purified Nodulation factor and displayed an infection-specific expression pattern. Nodule inception (NIN) bound to the RPG promoter and induced its expression. We showed that RPG displayed punctate subcellular localization in L. japonicus root protoplasts and in root hairs infected by M. loti. The N-terminal predicted C2 lipid-binding domain of RPG was not required for this subcellular localization or for function. CERBERUS, a U-box E3 ligase which is also required for rhizobial infection, was found to be localized similarly in puncta. RPG co-localized and directly interacted with CERBERUS in the early endosome (TGN/EE) compartment and near the nuclei in root hairs after rhizobial inoculation. Our study sheds light on an RPG-CERBERUS protein complex that is involved in an exocytotic pathway mediating IT elongation.
Neutralizing antibody vaccine for pandemic and pre-emergent coronaviruses
Betacoronaviruses caused the outbreaks of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome, as well as the current pandemic of SARS coronavirus 2 (SARS-CoV-2) [1–4]. Vaccines that elicit protective immunity against SARS-CoV-2 and betacoronaviruses that circulate in animals have the potential to prevent future pandemics. Here we show that the immunization of macaques with nanoparticles conjugated with the receptor-binding domain of SARS-CoV-2, and adjuvanted with 3M-052 and alum, elicits cross-neutralizing antibody responses against bat coronaviruses, SARS-CoV and SARS-CoV-2 (including the B.1.1.7, P.1 and B.1.351 variants). Vaccination of macaques with these nanoparticles resulted in a 50% inhibitory reciprocal serum dilution (ID50) neutralization titre of 47,216 (geometric mean) for SARS-CoV-2, as well as in protection against SARS-CoV-2 in the upper and lower respiratory tracts. Nucleoside-modified mRNAs that encode a stabilized transmembrane spike or monomeric receptor-binding domain also induced cross-neutralizing antibody responses against SARS-CoV and bat coronaviruses, albeit at lower titres than achieved with the nanoparticles. These results demonstrate that current mRNA-based vaccines may provide some protection from future outbreaks of zoonotic betacoronaviruses, and provide a multimeric protein platform for the further development of vaccines against multiple (or all) betacoronaviruses. Immunization of macaques with nanoparticle-conjugated receptor-binding domain of SARS-CoV-2 adjuvanted with 3M-052 and alum results in cross-neutralizing antibodies against bat coronaviruses, SARS-CoV and SARS-CoV-2 variants, and may provide a platform for developing pan-coronavirus vaccines.
The multimodality cell segmentation challenge: toward universal solutions
Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multimodality cell segmentation benchmark, comprising more than 1,500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging. Cell segmentation is crucial in many image analysis pipelines. This analysis compares many tools on a multimodal cell segmentation benchmark. A Transformer-based model performed best in terms of performance and general applicability.