Catalogue Search | MBRL

Protein 3D Structure Computed from Evolutionary Sequence Variation

by Sander, Chris , Zecchina, Riccardo , Sheridan, Robert in Amino acid sequence , Amino acids , Analysis

2011

The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing.

In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy.

We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org).
This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
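The coupling-inference step described above can be illustrated with mean-field direct coupling analysis, the variant that the FreeContact entry in this listing calls EVfold-mfDCA: pair couplings are approximated by the negative inverse of the pseudocount-regularised covariance matrix of the one-hot-encoded alignment. The following is a minimal sketch under stated assumptions, not the EVfold implementation. It assumes an integer-encoded alignment with q = 21 states, uniform sequence weights (no redundancy reweighting) and a pseudocount weight of 0.5, and it ranks pairs by the Frobenius norm with average product correction (APC) instead of the direct-information score. The function names are hypothetical.

```python
import numpy as np


def mean_field_couplings(msa, q=21, pseudocount=0.5):
    """Mean-field couplings e[i, j, a, b] from an (N, L) integer alignment.

    States are 0..q-1; the last state (q-1) is dropped as the reference (gauge).
    Assumption: all sequences get equal weight (no redundancy reweighting).
    """
    n_seq, L = msa.shape
    d = q - 1
    onehot = np.zeros((n_seq, L, q))
    onehot[np.arange(n_seq)[:, None], np.arange(L)[None, :], msa] = 1.0
    x = onehot[:, :, :d].reshape(n_seq, L * d)
    # Single-site and pair frequencies, mixed with a uniform pseudocount.
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x) / n_seq + pseudocount / q**2
    for i in range(L):
        block = slice(i * d, (i + 1) * d)
        fij[block, block] = np.diag(fi[block])  # f_ii(a, b) = f_i(a) * delta_ab
    cov = fij - np.outer(fi, fi)
    couplings = -np.linalg.inv(cov)  # mean-field approximation: e = -C^{-1}
    return couplings.reshape(L, d, L, d).transpose(0, 2, 1, 3)


def apc_scores(couplings, q=21):
    """Frobenius norm of zero-sum-gauged couplings, with average product correction."""
    L, _, d, _ = couplings.shape
    full = np.zeros((L, L, q, q))
    full[:, :, :d, :d] = couplings  # the reference state has zero coupling
    # Zero-sum gauge: remove row, column and overall means of each q x q block.
    full = (full - full.mean(axis=2, keepdims=True) - full.mean(axis=3, keepdims=True)
            + full.mean(axis=(2, 3), keepdims=True))
    fn = np.sqrt((full ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(fn, 0.0)
    row_mean = fn.sum(axis=1) / (L - 1)
    overall_mean = fn.sum() / (L * (L - 1))
    scores = fn - np.outer(row_mean, row_mean) / overall_mean
    np.fill_diagonal(scores, 0.0)
    return scores


# Synthetic alignment: column 3 always equals column 0 shifted by one state.
rng = np.random.default_rng(0)
msa = rng.integers(0, 20, size=(300, 8))
msa[:, 3] = (msa[:, 0] + 1) % 20
scores = apc_scores(mean_field_couplings(msa))
i, j = np.unravel_index(np.argmax(scores), scores.shape)
print(sorted((int(i), int(j))))  # prints [0, 3]
```

A single matrix inversion yields every pair coupling at once, which is the main appeal of the mean-field approximation; in the synthetic alignment the planted co-varying pair (0, 3) ranks first.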

Journal Article

Protein structure prediction from sequence variation

by Hopf, Thomas A , Sander, Chris , Marks, Debora S in 631/114/2411 , 631/61/475 , Agriculture

2012

Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.

Journal Article

Mutation effects predicted from sequence co-variation

by Poelwijk, Frank J , Hopf, Thomas A , Sander, Chris in 631/114/2397 , 631/114/2410 , 631/181/735

2017

The global effects of epistasis on protein and RNA function are revealed by an unsupervised model of amino acid co-conservation in evolutionary sequence variation.

Many high-throughput experimental technologies have been developed to assess the effects of large numbers of mutations (variation) on phenotypes. However, designing functional assays for these methods is challenging, and systematic testing of all combinations is impossible, so robust methods to predict the effects of genetic variation are needed. Most prediction methods exploit evolutionary sequence conservation but do not consider the interdependencies of residues or bases. We present EVmutation, an unsupervised statistical method for predicting the effects of mutations that explicitly captures residue dependencies between positions. We validate EVmutation by comparing its predictions with outcomes of high-throughput mutagenesis experiments and measurements of human disease mutations and show that it outperforms methods that do not account for epistasis. EVmutation can be used to assess the quantitative effects of mutations in genes of any organism. We provide pre-computed predictions for ∼7,000 human proteins at http://evmutation.org/.
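How a model that "captures residue dependencies between positions" scores mutations can be illustrated with a pairwise statistical energy, in which a mutation's score depends on both a single-site term and couplings to the residues at other positions. This is a minimal sketch, assuming that fields h and couplings J have already been inferred from an alignment and that sequence probability is proportional to exp(E), so a more negative ΔE suggests a more damaging mutation. The array layout, function names and toy numbers are hypothetical and are not the EVmutation API.

```python
import numpy as np


def statistical_energy(seq, h, J):
    """E(seq) = sum_i h[i, a_i] + sum_{i<j} J[i, j, a_i, a_j].

    seq: sequence of integer states; h: (L, q) fields;
    J: (L, L, q, q) couplings, of which only the i < j entries are read.
    """
    L = len(seq)
    energy = sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy += J[i, j, seq[i], seq[j]]
    return float(energy)


def delta_energy(wild_type, mutations, h, J):
    """Delta E = E(mutant) - E(wild type); mutations is a list of (position, new_state)."""
    mutant = list(wild_type)
    for pos, state in mutations:
        mutant[pos] = state
    return statistical_energy(mutant, h, J) - statistical_energy(wild_type, h, J)


# Toy model (hypothetical numbers): 3 positions, 2 states.
L, q = 3, 2
h = np.zeros((L, q))
h[2, 1] = -1.0        # state 1 at position 2 is disfavoured on its own
J = np.zeros((L, L, q, q))
J[0, 1, 1, 1] = 2.0   # states (1, 1) at positions 0 and 1 are favoured together
wt = [0, 0, 0]
single_0 = delta_energy(wt, [(0, 1)], h, J)
single_1 = delta_energy(wt, [(1, 1)], h, J)
double = delta_energy(wt, [(0, 1), (1, 1)], h, J)
single_2 = delta_energy(wt, [(2, 1)], h, J)
print(single_0, single_1, double, single_2)  # prints 0.0 0.0 2.0 -1.0
```

Neither single mutation at positions 0 or 1 changes the energy, but the double mutant does, so its effect is not the sum of the single-mutant effects; a site-independent conservation model could not express this.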

Journal Article

A deep proteome and transcriptome abundance atlas of 29 healthy human tissues

by Meng, Chen , Hahne, Hannes , Eraslan, Basak in Amino acids , Biomarkers , Brain research

2019

Genome‐, transcriptome‐ and proteome‐wide measurements provide insights into how biological systems are regulated. However, fundamental aspects relating to which human proteins exist, where they are expressed and in which quantities are not fully understood. Therefore, we generated a quantitative proteome and transcriptome abundance atlas of 29 paired healthy human tissues from the Human Protein Atlas project, representing human genes by 18,072 transcripts and 13,640 proteins, including 37 without prior protein‐level evidence. The analysis revealed that hundreds of proteins, particularly in testis, could not be detected even for highly expressed mRNAs, that few proteins show tissue‐specific expression, that strong differences between mRNA and protein quantities exist within and across tissues, and that protein expression is often more stable across tissues than that of transcripts. Only 238 of 9,848 amino acid variants found by exome sequencing could be confidently detected at the protein level, showing that proteogenomics remains challenging, needs better computational methods and requires rigorous validation. Many uses of this resource can be envisaged, including the study of gene/protein expression regulation and biomarker specificity evaluation.

Synopsis: Proteome and transcriptome quantification across tissues reveals which human genes exist as transcripts and proteins, where they are expressed and in which approximate quantities. Tissue‐specific protein expression is found to be a rare and quantitative rather than a qualitative characteristic. The study presents the most comprehensive atlas of protein expression to date, across 29 healthy human tissues. Protein‐level evidence is provided for 13,640 genes and 15,257 isoforms, including 37 missing proteins. Proteogenomics is still challenging and needs rigorous validation by synthetic peptides.

Journal Article

Structure of the peptidoglycan polymerase RodA resolved by evolutionary coupling analysis

by Sjodt, Megan , Srisuknimit, Veerasak , Bernhardt, Thomas G. in 631/326/1320 , 631/45/535 , Antibiotics

2018

Evolutionary coupling-enabled molecular replacement determination of the structure of Thermus thermophilus RodA reveals a highly conserved cavity in its transmembrane domain, and mutagenesis experiments in Bacillus subtilis and Escherichia coli show that perturbation of this cavity abolishes RodA function.

Structure of a new class of bacterial cell wall polymerases: The SEDS (shape, elongation, division and sporulation) proteins are a large family of bacterial proteins important for cell wall synthesis. Following the discovery of a new family of peptidoglycan polymerases among the SEDS family, Andrew Kruse and colleagues report the first crystal structure of a member of this family, RodA. The team developed a new phasing methodology, revealing that RodA adopts a ten-pass transmembrane fold, and carried out mutagenesis work showing that a highly conserved cavity in the transmembrane domain contains key residues and structural determinants that are important for RodA function.

The shape, elongation, division and sporulation (SEDS) proteins are a large family of ubiquitous and essential transmembrane enzymes with critical roles in bacterial cell wall biology. The exact function of SEDS proteins was for a long time poorly understood, but recent work (refs 1–3) has revealed that the prototypical SEDS family member RodA is a peptidoglycan polymerase, a role previously attributed exclusively to members of the penicillin-binding protein family (ref. 4). This discovery has made RodA and other SEDS proteins promising targets for the development of next-generation antibiotics. However, little is known regarding the molecular basis of SEDS activity, and no structural data are available for RodA or any homologue thereof. Here we report the crystal structure of Thermus thermophilus RodA at a resolution of 2.9 Å, determined using evolutionary covariance-based fold prediction to enable molecular replacement. The structure reveals a ten-pass transmembrane fold with large extracellular loops, one of which is partially disordered. The protein contains a highly conserved cavity in the transmembrane domain, reminiscent of ligand-binding sites in transmembrane receptors. Mutagenesis experiments in Bacillus subtilis and Escherichia coli show that perturbation of this cavity abolishes RodA function both in vitro and in vivo, indicating that this cavity is catalytically essential. These results provide a framework for understanding bacterial cell wall synthesis and SEDS protein function.

Journal Article

Amino acid coevolution reveals three-dimensional structure and functional domains of insect odorant receptors

by Morinaga, Satoshi , Ihara, Sayoko , Marks, Debora S. in 631/114 , 631/208/212/2304 , 631/378/2624/2625

2015

Insect odorant receptors (ORs) comprise an enormous protein family that translates environmental chemical signals into neuronal electrical activity. These heptahelical receptors are proposed to function as ligand-gated ion channels and/or to act metabotropically as G protein-coupled receptors (GPCRs). Resolving their signalling mechanism has been hampered by the lack of tertiary structural information and of primary sequence similarity to other proteins. We use amino acid evolutionary covariation across these ORs to define restraints on structural proximity of residue pairs, which permit de novo generation of three-dimensional models. The validity of our analysis is supported by the location of functionally important residues in highly constrained regions of the protein. Importantly, insect OR models exhibit a transmembrane domain packing arrangement distinct from that of canonical GPCRs, establishing the structural unrelatedness of these receptor families. The evolutionary couplings and models predict odour binding and ion conduction domains, and provide a template for rational structure-activity dissection.

The structure of insect odorant receptors (ORs) has remained elusive due to their lack of homology to other proteins and the inability to obtain OR crystals. Here, the authors use amino acid evolutionary covariation patterns to fold these proteins de novo and generate the first three-dimensional models of insect ORs.

Journal Article

Sequence co-evolution gives 3D contacts and structures of protein complexes

by Hopf, Thomas A , Sander, Chris , Rodrigues, João P G L M in Bioinformatics , Biology , co-evolution

2014

Protein–protein interactions are fundamental to many biological processes. Experimental screens have identified tens of thousands of interactions, and structural biology has provided detailed functional insight for select 3D protein complexes. An alternative rich source of information about protein interactions is the evolutionary sequence record. Building on earlier work, we show that analysis of correlated evolutionary sequence changes across proteins identifies residues that are close in space with sufficient accuracy to determine the three-dimensional structure of protein complexes. We evaluate prediction performance in blinded tests on 76 complexes of known 3D structure, predict protein–protein contacts in 32 complexes of unknown structure, and demonstrate how evolutionary couplings can be used to distinguish between interacting and non-interacting protein pairs in a large complex. With the current growth of sequences, we expect that the method can be generalized to genome-wide elucidation of protein–protein interaction networks and used for interaction predictions at residue resolution.

DNA is often referred to as the ‘blueprint of life’, as this molecule contains the instructions that are required to build a living organism from a single cell. But these instructions largely play out through the proteins that DNA encodes; and most proteins do not work alone. Instead they come together in different combinations, or complexes, and a single protein may participate in many complexes with different activities. Proteins are so small that it is difficult to get clear information about what they look like. Visualizing protein complexes is even harder. Most protein–protein interactions remain poorly understood, even in the best-studied organisms such as humans, yeast, and bacteria.

Proteins are made from smaller molecules, called amino acids, strung together one after the other. The order in which different amino acids are arranged in a protein determines the protein’s shape and ultimately its function. Like DNA, protein sequences can change over time. Sometimes, the sequence of one protein changes in a way that prevents it from binding to another protein. If these two proteins must work together for an organism to survive, the second protein will often develop a compensating change that allows the protein–protein complex to re-form. Finding such paired changes in the sequences of two proteins therefore suggests that the proteins interact and gives some information about how they fit together.

Different species can have copies of the same proteins that have slightly different sequences. Since the DNA sequences of many different organisms are already known, there are now many opportunities to find sites in pairs of proteins that have evolved together, or co-evolved, over time. To find sites that seem to have co-evolved, Hopf et al. used a computer program based on an approach from statistical physics to look at pairs of proteins that were already known to form complexes. Co-evolving sites were found in over 300 pairs of proteins, including 76 where the structure of the complex was already known. When sites predicted to be co-evolving were mapped onto these known complex structures, they were remarkably close to the true protein–protein contacts. This indicates that the information from the co-evolved sequences is sufficient to show how two proteins fit together.

Hopf et al. then turned their attention to 82 pairs of proteins that were thought to interact, but for which no structure was available. For 32 of these pairs, structures of the entire complex could be predicted, showing how the two proteins might interact. Furthermore, when other researchers subsequently worked out the structure of one of these complexes, the prediction was a good match to the solved structure.

The machinery of life is largely made up of proteins, which must interact in ever-changing but precise ways. The methods developed by Hopf et al. provide a new way to discover and investigate the details of these interactions.

Journal Article

FreeContact: fast and free software for protein contact prediction from residue co-evolution

by Rost, Burkhard , Hopf, Thomas A , Kalaš, Matúš in Algorithms , Bioinformatics , Biology

2014

Background: Twenty years of improved technology and growing sequence databases have made residue-residue contact constraints, derived from correlated mutations in large protein families, accurate enough to drive de novo predictions of protein three-dimensional structure. The method EVfold broke new ground using mean-field Direct Coupling Analysis (EVfold-mfDCA); the method PSICOV applied a related concept by estimating a sparse inverse covariance matrix. Both methods (EVfold-mfDCA and PSICOV) are publicly available, but both require too much CPU time for interactive applications. In addition, EVfold-mfDCA depends on proprietary software.

Results: Here, we present FreeContact, a fast, open source implementation of EVfold-mfDCA and PSICOV. On a test set of 140 proteins, FreeContact was almost eight times faster than PSICOV without decreasing prediction performance. The EVfold-mfDCA implementation of FreeContact was over 220 times faster than PSICOV with negligible performance decrease. The original EVfold-mfDCA was unavailable for testing due to its dependency on proprietary software. FreeContact is implemented as the free C++ library “libfreecontact”, complete with command line tool “freecontact”, as well as Perl and Python modules. All components are available as Debian packages. FreeContact supports the BioXSD format for interoperability.

Conclusions: FreeContact provides the opportunity to compute reliable contact predictions in any environment (desktop or cloud).

Journal Article

Quantification and discovery of sequence determinants of protein‐per‐mRNA amount in 29 human tissues

by Hahne, Hannes , Eraslan, Basak , Uhlén, Mathias in Amino acid sequence , Assaying , Binding

2019

Despite their importance in determining protein abundance, a comprehensive catalogue of sequence features controlling protein‐to‐mRNA (PTR) ratios and a quantification of their effects are still lacking. Here, we quantified PTR ratios for 11,575 proteins across 29 human tissues using matched transcriptomes and proteomes. We estimated by regression the contribution of known sequence determinants of protein synthesis and degradation, in addition to 45 mRNA and 3 protein sequence motifs that we found by association testing. While PTR ratios span more than 2 orders of magnitude, our integrative model predicts PTR ratios at a median precision of 3.2‐fold. A reporter assay provided functional support for two novel UTR motifs, and an immobilized mRNA affinity competition‐binding assay identified motif‐specific bound proteins for one motif. Moreover, our integrative model led to a new metric of codon optimality that captures the effects of codon frequency on protein synthesis and degradation. Altogether, this study shows that a large fraction of PTR ratio variation in human tissues can be predicted from sequence, and it identifies many new candidate post‐transcriptional regulatory elements.

Synopsis: Protein‐to‐mRNA (PTR) ratios are quantified across 29 human tissues using matched transcriptomes and proteomes. Sequence‐based predictions of tissue‐specific PTR ratios reveal novel post‐transcriptional regulatory elements and yield a new metric of codon optimality. A sequence‐based model predicts protein‐to‐mRNA ratios for 29 human tissues at a median precision across genes of 3.2‐fold. Reporter assays provide functional support for two novel UTR motifs, and a proteome‐wide competition‐binding assay identifies motif‐specific bound proteins for one motif. The protein‐to‐mRNA adaptation index (PTR‐AI), a new metric of codon optimality, captures the effects of codon frequency on protein synthesis and degradation.
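A "median precision of 3.2-fold" is easiest to read as a typical multiplicative error between predicted and measured PTR ratios, which is natural for a quantity spanning more than two orders of magnitude. The sketch below assumes one such definition, ten raised to the median absolute log10 difference; the paper's exact metric may differ, and the function name and toy values are hypothetical.

```python
import math
from statistics import median


def median_fold_error(predicted, measured):
    """Median fold deviation between predicted and measured positive ratios.

    Assumed definition: 10 ** median(|log10(pred) - log10(obs)|).
    """
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    log_errors = [abs(math.log10(p) - math.log10(m))
                  for p, m in zip(predicted, measured)]
    return 10 ** median(log_errors)


# Toy values (not data from the study): per-gene errors of 10-fold, 1-fold and 2-fold.
print(median_fold_error([10.0, 1.0, 100.0], [1.0, 1.0, 50.0]))  # close to 2.0
```

Working on the log scale makes over- and under-prediction by the same factor count equally, so a 3.2-fold precision means the median gene is predicted within a factor of about 3.2 in either direction.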

Journal Article

Protein structure determination by combining sparse NMR data with evolutionary couplings

by Hopf, Thomas A , Sander, Chris , Montelione, Gaetano T in 101/6 , 631/114/2411 , 631/1647/2258/878/1263

2015

A hybrid method that combines sparse NMR spectroscopy data with evolutionary residue-residue coupling information is used to solve accurate structures of large proteins.

Accurate determination of protein structure by NMR spectroscopy is challenging for larger proteins, for which experimental data are often incomplete and ambiguous. Evolutionary sequence information, together with advances in maximum entropy statistical methods, provides a rich complementary source of structural constraints. We have developed a hybrid approach (evolutionary coupling–NMR spectroscopy; EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings and demonstrate accurate structure determination for several proteins 6–41 kDa in size.

Journal Article
