Catalogue Search | MBRL

A Combinatorial Amino Acid Code for RNA Recognition by Pentatricopeptide Repeat Proteins

by Fujii, Sota , Barkan, Alice , Small, Ian in Agriculture , Amino Acid Sequence , Amino acids

2012

The pentatricopeptide repeat (PPR) is a helical repeat motif found in an exceptionally large family of RNA-binding proteins that functions in mitochondrial and chloroplast gene expression. PPR proteins harbor between 2 and 30 repeats and typically bind single-stranded RNA in a sequence-specific fashion. However, the basis for sequence-specific RNA recognition by PPR tracts has been unknown. We used computational methods to infer a code for nucleotide recognition involving two amino acids in each repeat, and we validated this model by recoding a PPR protein to bind novel RNA sequences in vitro. Our results show that PPR tracts bind RNA via a modular recognition mechanism that differs from previously described RNA-protein recognition modes and that underpins a natural library of specific protein/RNA partners of unprecedented size and diversity. These findings provide a significant step toward the prediction of native binding sites of the enormous number of PPR proteins found in nature. Furthermore, the extraordinary evolutionary plasticity of the PPR family suggests that the PPR scaffold will be particularly amenable to redesign for new sequence specificities and functions.

Journal Article

Neural networks to learn protein sequence–function relationships from deep mutational scanning data

by Gitter, Anthony , Fahlberg, Sarah A. , Gelman, Sam in Algorithms , Amino acid sequence , Amino Acid Sequence - genetics

2021

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.

Journal Article

Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny

by Palkopoulou, Eleftheria , Martínez-Navarro, Bienvenido , Sandoval Velasco, Marcela in 631/181/2474 , 631/181/414 , 631/208/182

2019

The sequencing of ancient DNA has enabled the reconstruction of speciation, migration and admixture events for extinct taxa [1]. However, the irreversible post-mortem degradation [2] of ancient DNA has so far limited its recovery, outside permafrost areas, to specimens that are not older than approximately 0.5 million years (Myr) [3]. By contrast, tandem mass spectrometry has enabled the sequencing of approximately 1.5-Myr-old collagen type I [4], and suggested the presence of protein residues in fossils of the Cretaceous period [5], although with limited phylogenetic use [6]. In the absence of molecular evidence, the speciation of several extinct species of the Early and Middle Pleistocene epoch remains contentious. Here we address the phylogenetic relationships of the Eurasian Rhinocerotidae of the Pleistocene epoch [7–9], using the proteome of dental enamel from a Stephanorhinus tooth that is approximately 1.77-Myr old, recovered from the archaeological site of Dmanisi (South Caucasus, Georgia) [10]. Molecular phylogenetic analyses place this Stephanorhinus as a sister group to the clade formed by the woolly rhinoceros (Coelodonta antiquitatis) and Merck’s rhinoceros (Stephanorhinus kirchbergensis). We show that Coelodonta evolved from an early Stephanorhinus lineage, and that this latter genus includes at least two distinct evolutionary lines. The genus Stephanorhinus is therefore currently paraphyletic, and its systematic revision is needed. We demonstrate that sequencing the proteome of Early Pleistocene dental enamel overcomes the limitations of phylogenetic inference based on ancient collagen or DNA. Our approach also provides additional information about the sex and taxonomic assignment of other specimens from Dmanisi.
Our findings reveal that proteomic investigation of ancient dental enamel (which is the hardest tissue in vertebrates [11] and is highly abundant in the fossil record) can push the reconstruction of molecular evolution further back into the Early Pleistocene epoch, beyond the currently known limits of ancient DNA preservation. Palaeoproteomic analysis of dental enamel from an Early Pleistocene Stephanorhinus resolves the phylogeny of Eurasian Rhinocerotidae, by enabling the reconstruction of molecular evolution beyond the limits of ancient DNA preservation.

Journal Article

ECNet is an evolutionary context-integrated deep learning framework for protein engineering

by Zhao, Huimin , Jiang, Guangde , Luo, Yunan in 42/70 , 631/114/1305 , 631/92/469

2021

Machine learning has been increasingly used for protein engineering. However, because the general sequence contexts they capture are not specific to the protein being engineered, the accuracy of existing machine learning algorithms is rather limited. Here, we report ECNet (evolutionary context-integrated neural network), a deep-learning algorithm that exploits evolutionary contexts to predict functional fitness for protein engineering. This algorithm integrates local evolutionary context from homologous sequences that explicitly model residue-residue epistasis for the protein of interest with the global evolutionary context that encodes rich semantic and structural features from the enormous protein sequence universe. As such, it enables accurate mapping from sequence to function and provides generalization from low-order mutants to higher-order mutants. We show that ECNet predicts the sequence-function relationship more accurately as compared to existing machine learning algorithms by using ~50 deep mutational scanning and random mutagenesis datasets. Moreover, we used ECNet to guide the engineering of TEM-1 β-lactamase and identified variants with improved ampicillin resistance with high success rates. Protein engineering is an active area of research in which machine learning has proven quite powerful. Here, the authors present a deep learning method that integrates both general and protein-specific sequence representations to improve the engineering of one’s protein of interest.

Journal Article

Structural basis for specific single-stranded RNA recognition by designer pentatricopeptide repeat proteins

by Fan, Shilong , Wang, Xiang , Zou, Tingting in 631/45/535 , 631/61/338/469 , 82/16

2016

As a large family of RNA-binding proteins, pentatricopeptide repeat (PPR) proteins mediate multiple aspects of RNA metabolism in eukaryotes. Binding to their target single-stranded RNAs (ssRNAs) in a modular and base-specific fashion, PPR proteins can serve as designable modules for gene manipulation. However, the structural basis for nucleotide-specific recognition by designer PPR (dPPR) proteins remains to be elucidated. Here, we report four crystal structures of dPPR proteins in complex with their respective ssRNA targets. The dPPR repeats are assembled into a right-handed superhelical spiral shell that embraces the ssRNA. Interactions between different PPR codes and RNA bases are observed at the atomic level, revealing the molecular basis for the modular and specific recognition patterns of the RNA bases U, C, A and G. These structures not only provide insights into the functional study of PPR proteins but also open a path towards the potential design of synthetic sequence-specific RNA-binding proteins. Pentatricopeptide repeat (PPR) proteins bind RNA and are involved in the regulation of RNA metabolism in eukaryotes. Here, the authors examine the capability of these proteins as modules for gene manipulation using structural biology methods.

Journal Article

c-Src and c-Abl kinases control hierarchic phosphorylation and function of the CagA effector protein in Western and East Asian Helicobacter pylori strains

by Mueller, Doreen , Smolka, Adam , Wessler, Silja in Amino Acid Motifs , Amino Acid Sequence , Antigens, Bacterial

2012

Many bacterial pathogens inject into host cells effector proteins that are substrates for host tyrosine kinases such as Src and Abl family kinases. Phosphorylated effectors eventually subvert host cell signaling, aiding disease development. In the case of the gastric pathogen Helicobacter pylori, which is a major risk factor for the development of gastric cancer, the only known effector protein injected into host cells is the oncoprotein CagA. Here, we followed the hierarchic tyrosine phosphorylation of H. pylori CagA as a model system to study early effector phosphorylation processes. Translocated CagA is phosphorylated on Glu-Pro-Ile-Tyr-Ala (EPIYA) motifs EPIYA-A, EPIYA-B, and EPIYA-C in Western strains of H. pylori and EPIYA-A, EPIYA-B, and EPIYA-D in East Asian strains. We found that c-Src only phosphorylated EPIYA-C and EPIYA-D, whereas c-Abl phosphorylated EPIYA-A, EPIYA-B, EPIYA-C, and EPIYA-D. Further analysis revealed that CagA molecules were phosphorylated on 1 or 2 EPIYA motifs, but never simultaneously on 3 motifs. Furthermore, none of the phosphorylated EPIYA motifs alone was sufficient for inducing AGS cell scattering and elongation. The preferred combination of phosphorylated EPIYA motifs in Western strains was EPIYA-A and EPIYA-C, either across 2 CagA molecules or simultaneously on 1. Our study thus identifies a tightly regulated hierarchic phosphorylation model for CagA starting at EPIYA-C/D, followed by phosphorylation of EPIYA-A or EPIYA-B. These results provide insight for clinical H. pylori typing and clarify the role of phosphorylated bacterial effector proteins in pathogenesis.

Journal Article

Mutations in the pale aleurone color1 regulatory gene of the Zea mays anthocyanin pathway have distinct phenotypes relative to the functionally similar TRANSPARENT TESTA GLABRA1 gene in Arabidopsis thaliana

by Selinger, D.A , Chandler, V.L , Carey, C.C in Alleles , Amino Acid Sequence , amino acid sequences

2004

The pale aleurone color1 (pac1) locus, required for anthocyanin pigment in the aleurone and scutellum of the Zea mays (maize) seed, was cloned using Mutator transposon tagging. pac1 encodes a WD40 repeat protein closely related to anthocyanin regulatory proteins ANTHOCYANIN11 (AN11) (Petunia hybrida [petunia]) and TRANSPARENT TESTA GLABRA1 (TTG1) (Arabidopsis thaliana). Introduction of a 35S-Pac1 transgene into A. thaliana complemented multiple ttg1 mutant phenotypes, including ones nonexistent in Z. mays. Hybridization of Z. mays genomic BAC clones with the pac1 sequence identified an additional related gene, mp1. PAC1 and MP1 deduced protein sequences were used as queries to build a phylogenetic tree of homologous WD40 repeat proteins, revealing an ancestral gene duplication leading to two clades in plants, the PAC1 clade and the MP1 clade. Subsequent duplications within each clade have led to additional WD40 repeat proteins in particular species, with all mutants defective in anthocyanin expression contained in the PAC1 clade. Substantial differences in pac1, an11, and ttg1 mutant phenotypes suggest the evolutionary divergence of regulatory mechanisms for several traits that cannot be ascribed solely to divergence of the dicot and monocot protein sequences.

Journal Article

Transcriptome-wide identification of R2R3-MYB transcription factors in barley with their boron responsive expression analysis

by Tombuloglu, Huseyin , Kekec, Guzin , Sakcali, Mehmet Serdal in Agricultural production , Amino Acid Sequence , amino acid sequences

2013

The MYB family of transcription factors (TFs) is one of the largest transcription factor families in plants and is represented in all eukaryotes. MYB TFs include highly conserved MYB repeats (1R, R2R3, 3R, and 4R) in the N-terminus. In addition, they have diverse C-terminal sequences that give the proteins a wide range of distinct functions, such as controlling development, secondary metabolism, hormonal regulation and response to biotic and abiotic stress. Stress-responsive roles of the MYB TFs have been reported for drought, salt, wounding, cold, freezing, dehydration and osmotic stresses. This study describes the identification of barley R2R3-MYB TFs, including their expression analysis in tissues under control and boron (B) toxic conditions. Conserved motifs for MYB proteins were searched for in barley full-transcriptome RNA-seq data, and a total of 320 protein sequences were identified as MYB TFs, of which 51 corresponded to R2R3 MYB TFs. Using various bioinformatics tools, their conserved domain structures, chromosomal distributions, gene duplications, comparative functional analysis, and phylogenetic relations with Arabidopsis thaliana were analyzed. Besides the RNA-seq data-based expression pattern analysis of the 51 R2R3 MYB TFs, the expression of selected R2R3 MYB TF genes was quantified in control and B-stressed root and leaf tissues. Critical B-induced R2R3 MYB TFs were identified. It was concluded that the results would be useful for functional characterization of R2R3-type MYB transcription factors that are possibly involved in both B stress and divergent regulation mechanisms in plants.

Journal Article

Structural Determinants at the Interface of the ARC2 and Leucine-Rich Repeat Domains Control the Activation of the Plant Immune Receptors Rx1 and Gpa2

by Pomp, Rikus , Roosien, Jan , Bakker, Erin in amino acid motifs , Amino Acid Sequence , amino acid sequences

2013

Many plant and animal immune receptors have a modular nucleotide-binding-leucine-rich repeat (NB-LRR) architecture in which a nucleotide-binding switch domain, NB-ARC, is tethered to a LRR sensor domain. The cooperation between the switch and sensor domains, which regulates the activation of these proteins, is poorly understood. Here, we report structural determinants governing the interaction between the NB-ARC and LRR in the highly homologous plant immune receptors Gpa2 and Rx1, which recognize the potato cyst nematode Globodera pallida and Potato virus X, respectively. Systematic shuffling of polymorphic sites between Gpa2 and Rx1 showed that a minimal region in the ARC2 and N-terminal repeats of the LRR domain coordinate the activation state of the protein. We identified two closely spaced amino acid residues in this region of the ARC2 (positions 401 and 403) that distinguish between autoactivation and effector-triggered activation. Furthermore, a highly acidic loop region in the ARC2 domain and basic patches in the N-terminal end of the LRR domain were demonstrated to be required for the physical interaction between the ARC2 and LRR. The NB-ARC and LRR domains dissociate upon effector-dependent activation, and the complementary-charged regions are predicted to mediate a fast reassociation, enabling multiple rounds of activation. Finally, we present a mechanistic model showing how the ARC2, NB, and N-terminal half of the LRR form a clamp, which regulates the dissociation and reassociation of the switch and sensor domains in NB-LRR proteins.

Journal Article

Genomic Characterization of a Newly Discovered Coronavirus Associated with Acute Respiratory Distress Syndrome in Humans

by Gorbalenya, Alexander E. , Raj, V. Stalin , Bestebroer, Theo M. in Amino acid sequence , amino acid sequences , Betacoronavirus

2012

A novel human coronavirus (HCoV-EMC/2012) was isolated from a man with acute pneumonia and renal failure in June 2012. This report describes the complete genome sequence, genome organization, and expression strategy of HCoV-EMC/2012 and its relation with known coronaviruses. The genome contains 30,119 nucleotides and contains at least 10 predicted open reading frames, 9 of which are predicted to be expressed from a nested set of seven subgenomic mRNAs. Phylogenetic analysis of the replicase gene of coronaviruses with completely sequenced genomes showed that HCoV-EMC/2012 is most closely related to Tylonycteris bat coronavirus HKU4 (BtCoV-HKU4) and Pipistrellus bat coronavirus HKU5 (BtCoV-HKU5), which prototype two species in lineage C of the genus Betacoronavirus. In accordance with the guidelines of the International Committee on Taxonomy of Viruses, and in view of the 75% and 77% amino acid sequence identity in 7 conserved replicase domains with BtCoV-HKU4 and BtCoV-HKU5, respectively, we propose that HCoV-EMC/2012 prototypes a novel species in the genus Betacoronavirus. HCoV-EMC/2012 may be most closely related to a coronavirus detected in Pipistrellus pipistrellus in The Netherlands, but because only a short sequence from the most conserved part of the RNA-dependent RNA polymerase-encoding region of the genome was reported for this bat virus, its genetic distance from HCoV-EMC remains uncertain. HCoV-EMC/2012 is the sixth coronavirus known to infect humans and the first human virus within betacoronavirus lineage C. IMPORTANCE Coronaviruses are capable of infecting humans and many animal species. Most infections caused by human coronaviruses are relatively mild. However, the outbreak of severe acute respiratory syndrome (SARS) caused by SARS-CoV in 2002 to 2003 and the fatal infection of a human by HCoV-EMC/2012 in 2012 show that coronaviruses are able to cause severe, sometimes fatal disease in humans.
We have determined the complete genome of HCoV-EMC/2012 using an unbiased virus discovery approach involving next-generation sequencing techniques, which enabled subsequent state-of-the-art bioinformatics, phylogenetics, and taxonomic analyses. By establishing its complete genome sequence, HCoV-EMC/2012 was characterized as a new genotype which is closely related to bat coronaviruses that are distant from SARS-CoV. We expect that this information will be vital to rapid advancement of both clinical and vital research on this emerging pathogen.

Journal Article
