Catalogue Search | MBRL

by Plotkin, Joshua B. , Otwinowski, Jakub , McCandlish, David M. in Biological Sciences , Epistasis , Epistasis, Genetic

2018

Genotype–phenotype relationships are notoriously complicated. Idiosyncratic interactions between specific combinations of mutations occur and are difficult to predict. Yet it is increasingly clear that many interactions can be understood in terms of global epistasis. That is, mutations may act additively on some underlying, unobserved trait, and this trait is then transformed via a nonlinear function to the observed phenotype as a result of subsequent biophysical and cellular processes. Here we infer the shape of such global epistasis in three proteins, based on published high-throughput mutagenesis data. To do so, we develop a maximum-likelihood inference procedure using a flexible family of monotonic nonlinear functions spanned by an I-spline basis. Our analysis uncovers dramatic nonlinearities in all three proteins; in some proteins a model with global epistasis accounts for virtually all of the measured variation, whereas in others we find substantial local epistasis as well. This method allows us to test hypotheses about the form of global epistasis and to distinguish variance components attributable to global epistasis, local epistasis, and measurement error.
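The global-epistasis idea in this abstract can be illustrated with a minimal sketch. It assumes made-up mutation names and effect sizes, and it swaps in a logistic curve for the I-spline family the authors use; it is not their inference procedure. It only shows how mutations that are exactly additive on a latent trait look epistatic once the trait is passed through a monotonic nonlinearity.

```python
import math

# Hypothetical additive effects on the latent trait (illustrative values only).
ADDITIVE_EFFECTS = {"A1V": -1.0, "G5D": -1.5, "K9R": 0.5}
WILD_TYPE_LATENT = 2.0  # assumed latent value of the wild type


def latent_trait(mutations):
    """Latent trait: wild-type value plus the sum of per-mutation effects."""
    return WILD_TYPE_LATENT + sum(ADDITIVE_EFFECTS[m] for m in mutations)


def observed_phenotype(mutations):
    """Pass the latent trait through a monotonic nonlinearity.

    A logistic curve is an assumption made for this sketch; the abstract
    describes a flexible monotonic family built from an I-spline basis.
    """
    return 1.0 / (1.0 + math.exp(-latent_trait(mutations)))


def pairwise_epistasis(m1, m2):
    """Deviation of the double mutant from additivity on the observed scale."""
    wild_type = observed_phenotype([])
    return (observed_phenotype([m1, m2]) - observed_phenotype([m1])
            - observed_phenotype([m2]) + wild_type)


# Additive on the latent scale, yet non-additive on the observed scale.
epistasis = pairwise_epistasis("A1V", "G5D")
print(f"latent trait of double mutant: {latent_trait(['A1V', 'G5D'])}")
print(f"observed-scale epistasis: {epistasis:.4f}")
```

In this toy case, the double mutant is less fit than the sum of the single-mutant effects would suggest, even though no pair-specific interaction term exists anywhere in the model.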

Journal Article

Minimum epistasis interpolation for sequence-function relationships

by McCandlish, David M. , Zhou, Juannan in 45/23 , 45/70 , 49/61

2020

Massively parallel phenotyping assays have provided unprecedented insight into how multiple mutations combine to determine biological function. While such assays can measure phenotypes for thousands to millions of genotypes in a single experiment, in practice these measurements are not exhaustive, so that there is a need for techniques to impute values for genotypes whose phenotypes have not been directly assayed. Here, we present an imputation method based on inferring the least epistatic possible sequence-function relationship compatible with the data. In particular, we infer the reconstruction where mutational effects change as little as possible across adjacent genetic backgrounds. The resulting models can capture complex higher-order genetic interactions near the data, but approach additivity where data is sparse or absent. We apply the method to high-throughput transcription factor binding assays and use it to explore a fitness landscape for protein G. High-throughput combinatorial mutagenesis assays are useful to screen the function of many different sequences but they are not exhaustive. Here, Zhou and McCandlish develop a method to impute such missing genotype-phenotype data based on inferring the least epistatic sequence-function relationship.
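As a rough illustration of the "least epistatic" criterion, the sketch below handles only the simplest case: a complete biallelic landscape with exactly one unmeasured genotype. Under that assumption, minimizing the sum of squared epistatic coefficients over the two-site faces that contain the missing genotype has a closed form: the average of the additive predictions from each face. The function names and the toy data are hypothetical, and the published method handles many missing genotypes and larger alphabets, which this sketch does not attempt.

```python
from itertools import combinations, product


def flip(genotype, site):
    """Return the genotype with the allele at `site` toggled (0 <-> 1)."""
    flipped = list(genotype)
    flipped[site] = 1 - flipped[site]
    return tuple(flipped)


def impute_single_missing(phenotypes, missing):
    """Impute one missing genotype so that squared epistasis is minimal.

    Each pair of sites (i, j) defines a face with corners
    missing, flip_i, flip_j, flip_ij. Zero epistasis on that face predicts
    f(missing) = f(flip_i) + f(flip_j) - f(flip_ij); the least-squares
    compromise across all faces is the mean of these predictions.
    """
    predictions = []
    for i, j in combinations(range(len(missing)), 2):
        g_i = flip(missing, i)
        g_j = flip(missing, j)
        g_ij = flip(g_i, j)
        predictions.append(phenotypes[g_i] + phenotypes[g_j] - phenotypes[g_ij])
    return sum(predictions) / len(predictions)


# Toy additive landscape on three sites (illustrative effect sizes).
effects = (1.0, 2.0, 3.0)
missing_genotype = (1, 1, 1)
observed = {
    g: sum(e * allele for e, allele in zip(effects, g))
    for g in product((0, 1), repeat=3)
    if g != missing_genotype
}
imputed = impute_single_missing(observed, missing_genotype)
print(imputed)
```

When the measured data are purely additive, as in this toy landscape, every face gives the same prediction and the imputed value is simply the additive one.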

Journal Article

Contingency and entrenchment in protein evolution under purifying selection

by Joshua B. Plotkin , Shah, Premal , David M. McCandlish in amino acid substitution , Biological Sciences , coevolution

2015

The phenotypic effect of an allele at one genetic site may depend on alleles at other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations and shape the patterns of protein divergence across species. Whereas epistasis between adaptive substitutions has been studied extensively, relatively little is known about epistasis under purifying selection. Here we use computational models of thermodynamic stability in a ligand-binding protein to explore the structure of epistasis in simulations of protein sequence evolution. Even though the predicted effects on stability of random mutations are almost completely additive, the mutations that fix under purifying selection are enriched for epistasis. In particular, the mutations that fix are contingent on previous substitutions: Although nearly neutral at their time of fixation, these mutations would be deleterious in the absence of preceding substitutions. Conversely, substitutions under purifying selection are subsequently entrenched by epistasis with later substitutions: They become increasingly deleterious to revert over time. Our results imply that, even under purifying selection, protein sequence evolution is often contingent on history and so it cannot be predicted by the phenotypic effects of mutations assayed in the ancestral background.

Significance: How large a role does history play in evolution? Do later events depend critically on specific earlier events, or do all events occur more or less independently? If a change occurs early in evolution, does it become easier or harder to revert the change as time proceeds? Here, we explore these ideas in the context of protein evolution, by simulating sequence evolution under purifying selection and then systematically permuting the order of amino acid substitutions. Our results suggest that the amino acid substitutions that occur in evolution are typically contingent on the presence of prior substitutions, and that substitutions that occur early in evolution become entrenched and difficult to modify as subsequent substitutions accrue.

Journal Article

MAVE-NN: learning genotype-phenotype maps from multiplex assays of variant effect

by Posfai, Anna , Tareen, Ammar , Kinney, Justin B. in Animal Genetics and Genomics , Bioinformatics , Biological Assay

2022

Multiplex assays of variant effect (MAVEs) are a family of methods that includes deep mutational scanning experiments on proteins and massively parallel reporter assays on gene regulatory sequences. Despite their increasing popularity, a general strategy for inferring quantitative models of genotype-phenotype maps from MAVE data is lacking. Here we introduce MAVE-NN, a neural-network-based Python package that implements a broadly applicable information-theoretic framework for learning genotype-phenotype maps—including biophysically interpretable models—from MAVE datasets. We demonstrate MAVE-NN in multiple biological contexts, and highlight the ability of our approach to deconvolve mutational effects from otherwise confounding experimental nonlinearities and noise.

Journal Article

Robust genetic codes enhance protein evolvability

by Payne, Joshua L. , McCandlish, David M. , Rozhoňová, Hana in Accessibility , Adaptation , Amino acid sequence

2024

The standard genetic code defines the rules of translation for nearly every life form on Earth. It also determines the amino acid changes accessible via single-nucleotide mutations, thus influencing protein evolvability—the ability of mutation to bring forth adaptive variation in protein function. One of the most striking features of the standard genetic code is its robustness to mutation, yet it remains an open question whether such robustness facilitates or frustrates protein evolvability. To answer this question, we use data from massively parallel sequence-to-function assays to construct and analyze 6 empirical adaptive landscapes under hundreds of thousands of rewired genetic codes, including those of codon compression schemes relevant to protein engineering and synthetic biology. We find that robust genetic codes tend to enhance protein evolvability by rendering smooth adaptive landscapes with few peaks, which are readily accessible from throughout sequence space. However, the standard genetic code is rarely exceptional in this regard, because many alternative codes render smoother landscapes than the standard code. By constructing low-dimensional visualizations of these landscapes, which each comprise more than 16 million mRNA sequences, we show that such alternative codes radically alter the topological features of the network of high-fitness genotypes. Whereas the genetic codes that optimize evolvability depend to some extent on the detailed relationship between amino acid sequence and protein function, we also uncover general design principles for engineering nonstandard genetic codes for enhanced and diminished evolvability, which may facilitate directed protein evolution experiments and the bio-containment of synthetic organisms, respectively.

Journal Article

Gauge fixing for sequence-function relationships

by Posfai, Anna , Kinney, Justin B. , McCandlish, David M. in Algorithms , Binding proteins , Biological research

2025

Quantitative models of sequence-function relationships are ubiquitous in computational biology, e.g., for modeling the DNA binding of transcription factors or the fitness landscapes of proteins. Interpreting these models, however, is complicated by the fact that the values of model parameters can often be changed without affecting model predictions. Before the values of model parameters can be meaningfully interpreted, one must remove these degrees of freedom (called “gauge freedoms” in physics) by imposing additional constraints (a process called “fixing the gauge”). However, strategies for fixing the gauge of sequence-function relationships have received little attention. Here we derive an analytically tractable family of gauges for a large class of sequence-function relationships. These gauges are derived in the context of models with all-order interactions, but an important subset of these gauges can be applied to diverse types of models, including additive models, pairwise-interaction models, and models with higher-order interactions. Many commonly used gauges are special cases of gauges within this family. We demonstrate the utility of this family of gauges by showing how different choices of gauge can be used both to explore complex activity landscapes and to reveal simplified models that are approximately correct within localized regions of sequence space. The results provide practical gauge-fixing strategies and demonstrate the utility of gauge-fixing for model exploration and interpretation.
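The notion of a gauge freedom can be made concrete with a small additive model. In the sketch below, adding a constant to every parameter at one position and subtracting it from the constant term leaves all predictions unchanged, so the raw parameter values are not uniquely defined. One common way to remove that freedom, often called the zero-sum gauge, is to make the parameters at each position sum to zero. The parameter values and function names are hypothetical, and this is only one familiar gauge, not the general family derived in the paper.

```python
from itertools import product

ALPHABET = "ACGT"

# Hypothetical parameters for a length-2 additive model (illustrative values only).
theta0 = 0.5
theta = [
    {"A": 1.0, "C": 2.0, "G": 3.0, "T": 6.0},
    {"A": -1.0, "C": 0.0, "G": 0.0, "T": 5.0},
]


def predict(constant, position_params, seq):
    """Additive model: constant term plus one parameter per position."""
    return constant + sum(position_params[pos][ch] for pos, ch in enumerate(seq))


def fix_zero_sum_gauge(constant, position_params):
    """Shift each position's mean into the constant term.

    Afterwards the parameters at every position sum to zero, while the
    model's predictions are exactly the same as before.
    """
    new_constant = constant
    new_params = []
    for params in position_params:
        mean = sum(params.values()) / len(params)
        new_constant += mean
        new_params.append({ch: value - mean for ch, value in params.items()})
    return new_constant, new_params


fixed_theta0, fixed_theta = fix_zero_sum_gauge(theta0, theta)

# Largest change in any prediction across all 16 sequences (should be ~0).
max_change = max(
    abs(predict(theta0, theta, seq) - predict(fixed_theta0, fixed_theta, seq))
    for seq in product(ALPHABET, repeat=2)
)
print(fixed_theta0, max_change)
```

In this gauge the constant term is the mean prediction over all sequences, and each position parameter is the average effect of that character relative to that mean, which is what makes the values interpretable.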

Journal Article

Specificity, synergy, and mechanisms of splice-modifying drugs

by McCandlish, David M. , Hanson, Sonya M. , Kinney, Justin B. in 45/88 , 45/91 , 631/154/556

2024

Drugs that target pre-mRNA splicing hold great therapeutic potential, but the quantitative understanding of how these drugs work is limited. Here we introduce mechanistically interpretable quantitative models for the sequence-specific and concentration-dependent behavior of splice-modifying drugs. Using massively parallel splicing assays, RNA-seq experiments, and precision dose-response curves, we obtain quantitative models for two small-molecule drugs, risdiplam and branaplam, developed for treating spinal muscular atrophy. The results quantitatively characterize the specificities of risdiplam and branaplam for 5′ splice site sequences, suggest that branaplam recognizes 5′ splice sites via two distinct interaction modes, and contradict the prevailing two-site hypothesis for risdiplam activity at SMN2 exon 7. The results also show that anomalous single-drug cooperativity, as well as multi-drug synergy, are widespread among small-molecule drugs and antisense-oligonucleotide drugs that promote exon inclusion. Our quantitative models thus clarify the mechanisms of existing treatments and provide a basis for the rational development of new therapies. Two small-molecule drugs, risdiplam and branaplam, have been developed for treating spinal muscular atrophy. Here the authors develop quantitative modeling methods for the sequence-specific and concentration-dependent effects of these and other splice-modifying drugs.

Journal Article

Designed active-site library reveals thousands of functional GFP variants

by Hoch, Shlomo Yakir , Petrovich-Kopitman, Ekaterina , Fleishman, Sarel J. in 49/23 , 49/31 , 49/40

2023

Mutations in a protein active site can lead to dramatic and useful changes in protein activity. The active site, however, is sensitive to mutations due to a high density of molecular interactions, substantially reducing the likelihood of obtaining functional multipoint mutants. We introduce an atomistic and machine-learning-based approach, called high-throughput Functional Libraries (htFuncLib), that designs a sequence space in which mutations form low-energy combinations that mitigate the risk of incompatible interactions. We apply htFuncLib to the GFP chromophore-binding pocket, and, using fluorescence readout, recover >16,000 unique designs encoding as many as eight active-site mutations. Many designs exhibit substantial and useful diversity in functional thermostability (up to 96 °C), fluorescence lifetime, and quantum yield. By eliminating incompatible active-site mutations, htFuncLib generates a large diversity of functional sequences. We envision that htFuncLib will be used in one-shot optimization of activity in enzymes, binders, and other proteins. Mutations in a protein active site can alter function in useful ways, but the active site is sensitive to changes. Here the authors present a general strategy to design combinatorial mutation libraries. Applied to GFP, the authors isolate thousands of fluorescent designs that exhibit large and useful changes in spectral properties.

Journal Article

The role of mutation bias in adaptive molecular evolution: insights from convergent changes in protein function

by Stoltzfus, Arlin , Natarajan, Chandrasekhar , Witt, Christopher C. in Adaptation, Physiological , Altitude , Animals

2019

An underexplored question in evolutionary genetics concerns the extent to which mutational bias in the production of genetic variation influences outcomes and pathways of adaptive molecular evolution. In the genomes of at least some vertebrate taxa, an important form of mutation bias involves changes at CpG dinucleotides: if the DNA nucleotide cytosine (C) is immediately 5′ to guanine (G) on the same coding strand, then—depending on methylation status—point mutations at both sites occur at an elevated rate relative to mutations at non-CpG sites. Here, we examine experimental data from case studies in which it has been possible to identify the causative substitutions that are responsible for adaptive changes in the functional properties of vertebrate haemoglobin (Hb). Specifically, we examine the molecular basis of convergent increases in Hb–O₂ affinity in high-altitude birds. Using a dataset of experimentally verified, affinity-enhancing mutations in the Hbs of highland avian taxa, we tested whether causative changes are enriched for mutations at CpG dinucleotides relative to the frequency of CpG mutations among all possible missense mutations. The tests revealed that a disproportionate number of causative amino acid replacements were attributable to CpG mutations, suggesting that mutation bias can influence outcomes of molecular adaptation. This article is part of the theme issue ‘Convergent evolution in the genomics era: new insights and directions’.
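The CpG definition given in this abstract (a C immediately 5′ to a G on the same strand) is easy to state in code. The sketch below only locates the positions that belong to CpG dinucleotides in a sequence; the function name and the example sequence are hypothetical, and it does not reproduce the enrichment test the authors describe.

```python
def cpg_positions(sequence):
    """Return 0-based indices of every base that is part of a CpG dinucleotide.

    A CpG is a cytosine immediately followed (5' to 3') by a guanine on the
    same strand, so both the C and the G position are reported.
    """
    sequence = sequence.upper()
    positions = []
    for i in range(len(sequence) - 1):
        if sequence[i] == "C" and sequence[i + 1] == "G":
            positions.extend([i, i + 1])
    return positions


# Made-up example sequence with two CpG dinucleotides.
example = "ACGTCGGA"
print(cpg_positions(example))
```

A point mutation would then count as a CpG mutation if its position appears in this list, which is the classification that an enrichment comparison would start from.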

Journal Article
