Catalogue Search | MBRL

Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

by Ahmed, Shehab S. , Pérez-Palma, Eduardo , Wagner, Florence F. in Amino Acid Sequence , Amino acids , Biological Sciences

2020

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: we observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.

Journal Article

Characterization of intrinsically disordered regions in proteins informed by human genetic diversity

by Ahmed, Shehab S. , Rahman, M. Sohel , Rifat, Zaara T. in Amino Acid Sequence , Amino acids , Annotations

2022

All proteomes contain both proteins and polypeptide segments that don’t form a defined three-dimensional structure yet are biologically active; these are called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation, limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase (“UniProt features”: active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12 times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understanding of the role of IDRs in biological processes and disease mechanisms.

Journal Article

Differential functional consequences of GRIN2A mutations associated with schizophrenia and neurodevelopmental disorders

by Baez-Nieto, David , Farsi, Zohreh , Campbell, Arthur J. in 631/208 , 631/378 , Epilepsy

2024

Human genetic studies have revealed rare missense and protein-truncating variants in GRIN2A, which encodes the GluN2A subunit of NMDA receptors, that confer significant risk for schizophrenia (SCZ). Mutations in GRIN2A are also associated with epilepsy and developmental delay/intellectual disability (DD/ID). However, it remains enigmatic how alterations to the same protein can result in diverse clinical phenotypes. Here, we performed functional characterization of human GluN1/GluN2A heteromeric NMDA receptors that contain SCZ-linked GluN2A variants, and compared them to NMDA receptors with GluN2A variants associated with epilepsy or DD/ID. Our findings demonstrate that SCZ-associated GRIN2A variants were predominantly loss-of-function (LoF), whereas epilepsy and DD/ID-associated variants resulted in both gain- and loss-of-function phenotypes. We additionally show that M653I and S809R, LoF GRIN2A variants associated with DD/ID, exert a dominant-negative effect when co-expressed with wild-type GluN2A, whereas E58Ter and Y698C, SCZ-linked LoF variants, and A727T, an epilepsy-linked LoF variant, do not. These data offer a potential mechanism by which SCZ/epilepsy and DD/ID-linked variants can cause different effects on receptor function and therefore result in divergent pathological outcomes.

Journal Article

DisPredict: A Predictor of Disordered Protein Using Optimized RBF Kernel

by Iqbal, Sumaiya , Hoque, Md Tamjidul in Algorithms , Amino Acid Sequence , Analysis

2015

Intrinsically disordered proteins or regions perform important biological functions through their dynamic conformations during binding. Thus, accurate identification of these disordered regions has significant implications for proper annotation of function, induced fold prediction and drug design to combat critical diseases. We introduce DisPredict, a disorder predictor that employs a single support vector machine with RBF kernel and novel features for reliable characterization of protein structure. DisPredict yields effective performance. In addition to 10-fold cross validation, training and testing of DisPredict were conducted with independent test datasets. The results were consistent, with both the training and test errors minimal. The use of multiple data sources makes the predictor generic. The datasets used in developing the model include disordered regions of various lengths, categorized as short and long, with different compositions and different types of disorder, ranging from fully to partially disordered regions as well as completely ordered regions. Through comparison with other state-of-the-art approaches and case studies, DisPredict is found to be a useful tool with competitive performance. DisPredict is available at https://github.com/tamjidul/DisPredict_v1.0.
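
For readers unfamiliar with the RBF kernel named in this abstract, the following is a minimal sketch of the standard kernel function, K(x, y) = exp(-gamma * ||x - y||^2). It is not DisPredict's implementation: the function name `rbf_kernel`, the default `gamma`, and the plain-list feature vectors are illustrative assumptions, and the abstract does not state the features or the optimized parameter values.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance).

    x, y  -- equal-length sequences of numeric features (hypothetical inputs)
    gamma -- kernel width parameter; the value here is illustrative only
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

Identical vectors give a kernel value of 1, and the value decays toward 0 as the vectors move apart; a larger `gamma` makes that decay sharper.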

Journal Article

Estimation of Position Specific Energy as a Feature of Protein Residues from Sequence Alone for Structural Classification

by Iqbal, Sumaiya , Hoque, Md Tamjidul in Acids , Amino Acid Sequence , Amino acids

2016

A set of features computed from the primary amino acid sequence of proteins is crucial in the process of inducing a machine learning model that is capable of accurately predicting three-dimensional protein structures. Solutions for existing protein structure prediction problems are in need of features that can capture the complexity of molecular level interactions. With a view to this, we propose a novel approach to estimate position specific estimated energy (PSEE) of a residue using contact energy and predicted relative solvent accessibility (RSA). Furthermore, we demonstrate PSEE can be reasonably estimated based on sequence information alone. PSEE is useful in identifying the structured as well as the unstructured, or intrinsically disordered, regions of a protein by computing favorable and unfavorable energy respectively, characterized by an appropriate threshold. The most intriguing finding, verified empirically, is the indication that the PSEE feature can effectively classify disordered versus ordered residues and can segregate different secondary structure type residues by computing the constituent energies. PSEE values for each amino acid strongly correlate with the hydrophobicity value of the corresponding amino acid. Further, PSEE can be used to detect the existence of critical binding regions that essentially undergo disorder-to-order transitions to perform crucial biological functions. Towards an application of disorder prediction using the PSEE feature, we have rigorously tested and found that a support vector machine model informed by a set of features including PSEE consistently outperforms a model with an identical set of features with PSEE removed. In addition, the new disorder predictor, DisPredict2, shows competitive performance in predicting protein disorder when compared with six existing disordered protein predictors.

Journal Article

Mapping MAVE data for use in human genomics applications

by Da, Estelle Y. , Stevenson, James S. , Riehle, Kevin in Animal Genetics and Genomics , Bioinformatics , Biomedical and Life Sciences

2025

Background: Experimental data from functional assays have a critical role in interpreting the impact of genetic variants. Assay data must be unambiguously mapped to a reference genome to make it accessible, but it is often reported relative to assay-specific sequences, complicating downstream use and integration of variant data across resources. To make multiplexed assays of variant effect (MAVE) data more broadly available to the research and clinical communities, the Atlas of Variant Effects Alliance mapped MAVE data from the MaveDB community database to human reference sequences, creating an extensive set of machine-readable homology mappings that are incorporated into widely used human genomics applications. Results: Here, we map approximately 9.0 million individual protein and nucleotide variants in MaveDB to the human genome, describing the examined variants with respect to human reference sequences while preserving the data provenance of the original MAVE sequences. We then disseminate the results to major genomic resources including the Genomics 2 Proteins Portal, UCSC Genome Browser, Ensembl Variant Effect Predictor, and DECIPHER platform. Within these applications, MAVE variants can now be visualized and integrated with other relevant clinical and biological data, making additional knowledge available when performing variant interpretation and conducting other research activities. Conclusions: Mapping MAVE variants to human reference sequences and sharing the mapped dataset with several key human genomics applications enables a new and diverse set of applications for MAVE data. This study provides increased access to functional data that can assist in clinical variant interpretation pipelines and enable biomedical research and discovery.

Journal Article

Allosteric inhibition of PPM1D serine/threonine phosphatase via an altered conformational state

by Qian, Yue , Sperling, Adam S. , Vernier, Camille in 13/31 , 38/44 , 38/70

2022

PPM1D encodes a serine/threonine phosphatase that regulates numerous pathways including the DNA damage response and p53. Activating mutations and amplification of PPM1D are found across numerous cancer types. GSK2830371 is a potent and selective allosteric inhibitor of PPM1D, but its mechanism of binding and inhibition of catalytic activity are unknown. Here we use computational, biochemical and functional genetic studies to elucidate the molecular basis of GSK2830371 activity. These data confirm that GSK2830371 binds an allosteric site of PPM1D with high affinity. By further incorporating data from hydrogen deuterium exchange mass spectrometry and sedimentation velocity analytical ultracentrifugation, we demonstrate that PPM1D exists in an equilibrium between two conformations that are defined by the movement of the flap domain, which is required for substrate recognition. A hinge region was identified that is critical for switching between the two conformations and was directly implicated in the high-affinity binding of GSK2830371 to PPM1D. We propose that the two conformations represent active and inactive forms of the protein reflected by the position of the flap, and that binding of GSK2830371 shifts the equilibrium to the inactive form. Finally, we found that C-terminal truncating mutations proximal to residue 400 result in destabilization of the protein via loss of a stabilizing N- and C-terminal interaction, consistent with the observation from human genetic data that nearly all PPM1D mutations in cancer are truncating and occur distal to residue 400. Taken together, our findings elucidate the mechanism by which binding of a small molecule to an allosteric site of PPM1D inhibits its activity and provide insights into the biology of PPM1D. In this work, the authors report a sophisticated combination of genetic, biophysical, and biochemical analyses to identify the cycling conformational states of PPM1D. The findings reveal how an allosteric inhibitor locks the protein into a conformationally inactive state, and explain the distribution of PPM1D activating mutations in cancer.

Journal Article

PRICKLE2 revisited—further evidence implicating PRICKLE2 in neurodevelopmental disorders

by Zweier, Christiane , Iqbal, Sumaiya , Kraus, Cornelia in Amino acids , Autism , Computational neuroscience

2021

PRICKLE2 encodes a member of a highly conserved family of proteins that are involved in the non-canonical Wnt and planar cell polarity signaling pathway. Prickle2 localizes to the post-synaptic density, and interacts with post-synaptic density protein 95 and the NMDA receptor. Loss-of-function variants in prickle2 orthologs cause seizures in flies and mice but evidence for the role of PRICKLE2 in human disease is conflicting. Our goal is to provide further evidence for the role of this gene in humans and define the phenotypic spectrum of PRICKLE2-related disorders. We report a cohort of six subjects from four unrelated families with heterozygous rare PRICKLE2 variants (NM_198859.4). Subjects were identified through an international collaboration. Detailed phenotypic and genetic assessment of the subjects was carried out and, in addition, we assessed the variant pathogenicity using bioinformatic approaches. We identified two missense variants (c.122C>T; p.(Pro41Leu), c.680C>G; p.(Thr227Arg)), one nonsense variant (c.214C>T; p.(Arg72*)) and one frameshift variant (c.1286_1287delGT; p.(Ser429Thrfs*56)). While the p.(Ser429Thrfs*56) variant segregated with disease in a family with three affected females, the three remaining variants occurred de novo. Subjects shared a mild phenotype characterized by global developmental delay, behavioral difficulties ± epilepsy, autistic features, and attention deficit hyperactivity disorder. Computational analysis of the missense variants suggests that the altered amino acid residues are likely to be located in protein regions important for function. This paper demonstrates that PRICKLE2 is involved in human neuronal development and that pathogenic variants in PRICKLE2 cause neurodevelopmental delay, behavioral difficulties and epilepsy in humans.

Journal Article

Critical assessment of protein intrinsic disorder prediction

by Marino-Buslje, Cristina , Veljkovic, Nevena , Lobanov, Michail Yu in Binding , Computer applications , Datasets

2021

Intrinsically disordered proteins, defying the traditional protein structure–function paradigm, are a challenge to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in prediction of intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 following filtering out of bona fide structured regions. Disordered binding regions remain hard to predict, with Fmax = 0.231. Interestingly, computing times among methods can vary by up to four orders of magnitude. Results are presented from the first Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment, a community-based blind test to determine the state of the art in predicting intrinsically disordered regions in proteins.
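
The Fmax values quoted in this abstract are usually understood as the maximum F1-score (harmonic mean of precision and recall) over all decision thresholds applied to a predictor's per-residue scores. The sketch below illustrates that definition only; the function name `f_max`, the toy inputs, and the convention that a residue is predicted disordered when its score is at or above the threshold are assumptions, not the CAID evaluation code.

```python
def f_max(scores, labels):
    """Return the maximum F1-score over all thresholds taken from `scores`.

    scores -- per-residue disorder scores (higher means more disordered)
    labels -- ground truth per residue: 1 = disordered, 0 = ordered
    A residue is predicted disordered when score >= threshold (assumed convention).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    best = 0.0
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        if tp == 0:
            continue  # precision and recall are both zero; F1 contributes nothing
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

For example, `f_max([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])` evaluates four thresholds and peaks at threshold 0.3, where precision is 2/3 and recall is 1, giving an F1 of 0.8.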

Journal Article

Genomics 2 Proteins portal: a resource and discovery tool for linking genetic screening outputs to protein sequences and structures

by Burgin, Alex , Nguyen, Duyen T. , Rubin, Alan F. in 631/114/129 , 631/114/2398 , 631/114/2401

2024

Recent advances in AI-based methods have revolutionized the field of structural biology. Concomitantly, high-throughput sequencing and functional genomics have generated genetic variants at an unprecedented scale. However, efficient tools and resources are needed to link disparate data types: to ‘map’ variants onto protein structures, to better understand how the variation causes disease, and thereby design therapeutics. Here we present the Genomics 2 Proteins portal (https://g2p.broadinstitute.org/): a human proteome-wide resource that maps 20,076,998 genetic variants onto 42,413 protein sequences and 77,923 structures, with a comprehensive set of structural and functional features. Additionally, the Genomics 2 Proteins portal allows users to interactively upload protein residue-wise annotations (for example, variants and scores) as well as protein structures beyond those in databases, to establish the connection between genomics and proteins. The portal serves as an easy-to-use discovery tool for researchers and scientists to hypothesize the structure–function relationship between natural or synthetic variations and their molecular phenotypes. The Genomics 2 Proteins portal is an open-source tool for proteome-wide linking of human genetic variants to protein sequences and structures. The portal serves as a discovery tool to hypothesize the structure–function relationship between natural or synthetic variations and their molecular phenotypes.

Journal Article
