Catalogue Search | MBRL
56 result(s) for "Babbitt, Patricia C"
Evolutionary and molecular foundations of multiple contemporary functions of the nitroreductase superfamily
by Akiva, Eyal; Tokuriki, Nobuhiko; Babbitt, Patricia C.
in Biological evolution, Biological Sciences, Biophysics and Computational Biology
2017
Insight regarding how diverse enzymatic functions and reactions have evolved from ancestral scaffolds is fundamental to understanding chemical and evolutionary biology, and for the exploitation of enzymes for biotechnology. We undertook an extensive computational analysis using a unique and comprehensive combination of tools that include large-scale phylogenetic reconstruction to determine the sequence, structural, and functional relationships of the functionally diverse flavin mononucleotide-dependent nitroreductase (NTR) superfamily (>24,000 sequences from all domains of life, 54 structures, and >10 enzymatic functions). Our results suggest an evolutionary model in which contemporary subgroups of the superfamily have diverged in a radial manner from a minimal flavin-binding scaffold. We identified the structural design principle for this divergence: Insertions at key positions in the minimal scaffold that, combined with the fixation of key residues, have led to functional specialization. These results will aid future efforts to delineate the emergence of functional diversity in enzyme superfamilies, provide clues for functional inference for superfamily members of unknown function, and facilitate rational redesign of the NTR scaffold.
Journal Article
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
by Dodevski, Igor; Babbitt, Patricia C.; Brown, Shoshana D.
in Accuracy, Biocatalysis, Biochemistry/Bioinformatics
2009
Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
Journal Article
Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies
by Ferrin, Thomas E.; Babbitt, Patricia C.; Atkinson, Holly J.
in Abundance, Accessibility, Acids
2009
The dramatic increase in heterogeneous types of biological data--in particular, the abundance of new protein sequences--requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity--GPCRs and kinases from humans, and the crotonase superfamily of enzymes--we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships.
Journal Article
An Atlas of the Thioredoxin Fold Class Reveals the Complexity of Function-Enabling Adaptations
2009
The group of proteins that contain a thioredoxin (Trx) fold is huge and diverse. Assessment of the variation in catalytic machinery of Trx fold proteins is essential in providing a foundation for understanding their functional diversity and predicting the function of the many uncharacterized members of the class. The proteins of the Trx fold class retain common features, including variations on a dithiol CxxC active site motif, that lead to delivery of function. We use protein similarity networks to guide an analysis of how structural and sequence motifs track with catalytic function and taxonomic categories for 4,082 representative sequences spanning the known superfamilies of the Trx fold. Domain structure in the fold class is varied and modular, with 2.8% of sequences containing more than one Trx fold domain. Most member proteins are bacterial. The fold class exhibits many modifications to the CxxC active site motif: only 56.8% of proteins have both cysteines, and no functional groupings have absolute conservation of the expected catalytic motif. Only a small fraction of Trx fold sequences have been functionally characterized. This work provides a global view of the complex distribution of domains and catalytic machinery throughout the fold class, showing that each superfamily contains remnants of the CxxC active site. The unifying context provided by this work can guide the comparison of members of different Trx fold superfamilies to gain insight about their structure-function relationships, illustrated here with the thioredoxins and peroxiredoxins.
Journal Article
Large-Scale Determination of Sequence, Structure, and Function Relationships in Cytosolic Glutathione Transferases across the Biosphere
by Hillerich, Brandan; Stead, Mark; Vetting, Matthew W.
in Amino Acid Sequence, Base Sequence, Binding Sites
2014
The cytosolic glutathione transferase (cytGST) superfamily comprises more than 13,000 nonredundant sequences found throughout the biosphere. Their key roles in metabolism and defense against oxidative damage have led to thousands of studies over several decades. Despite this attention, little is known about the physiological reactions they catalyze and most of the substrates used to assay cytGSTs are synthetic compounds. A deeper understanding of relationships across the superfamily could provide new clues about their functions. To establish a foundation for expanded classification of cytGSTs, we generated similarity-based subgroupings for the entire superfamily. Using the resulting sequence similarity networks, we chose targets that broadly covered unknown functions and report here experimental results confirming GST-like activity for 82 of them, along with 37 new 3D structures determined for 27 targets. These new data, along with experimentally known GST reactions and structures reported in the literature, were painted onto the networks to generate a global view of their sequence-structure-function relationships. The results show how proteins of both known and unknown function relate to each other across the entire superfamily and reveal that the great majority of cytGSTs have not been experimentally characterized or annotated by canonical class. A mapping of taxonomic classes across the superfamily indicates that many taxa are represented in each subgroup and highlights challenges for classification of superfamily sequences into functionally relevant classes. Experimental determination of disulfide bond reductase activity in many diverse subgroups illustrates a theme common for many reaction types. Finally, sequence comparison between an enzyme that catalyzes a reductive dechlorination reaction relevant to bioremediation efforts with some of its closest homologs reveals differences among them likely to be associated with evolution of this unusual reaction.
Interactive versions of the networks, associated with functional and other types of information, can be downloaded from the Structure-Function Linkage Database (SFLD; http://sfld.rbvi.ucsf.edu).
Journal Article
Biases in the Experimental Annotations of Protein Function and Their Effect on Our Understanding of Protein Function Space
by Friedberg, Iddo; Thorman, Alexander W.; Babbitt, Patricia C.
in Amino acid sequence, Animals, Biology
2013
The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent is the "few articles - many proteins" phenomenon. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass-spectrometry, microscopy and RNAi experiments dominate high throughput experiments. Consequently, the functional information derived from these experiments is mostly of the subcellular location of proteins, and of the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps by a large amount. We also show that the information provided by high throughput experiments is less specific than that provided by low throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.
Journal Article
Discovery of new enzymes and metabolic pathways by using structure and genome context
by Kumar, Ritesh; Vetting, Matthew W.; Sakai, Ayano
in 631/114/2410, 631/92/607, ABC transporters
2013
Pathway docking (in silico docking of metabolites to several enzymes and binding proteins in a metabolic pathway) enables the discovery of a catabolic pathway for the osmolyte trans-4-hydroxy-L-proline betaine.
Structural key to predicting enzyme function
Overprediction and database annotation errors in genome-sequencing projects have caused much confusion because of the difficulty of assigning valid functions to the proteins identified. These authors use structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster to correctly predict the in vitro activity of an enzyme of unknown function and identify the catabolic pathway in which it participates in cells. The substrate-liganded pose predicted by virtual library screening was confirmed experimentally, enzyme activities in the predicted pathway were confirmed by in vitro assays and genetic analyses, the intermediates were identified by metabolomics, and repression of the genes encoding the pathway by high salt concentrations was established by transcriptomics. This study establishes the utility of structure-guided functional predictions for the discovery of new metabolic pathways.
Assigning valid functions to proteins identified in genome projects is challenging: overprediction and database annotation errors are the principal concerns [1]. We and others [2] are developing computation-guided strategies for functional discovery with ‘metabolite docking’ to experimentally derived [3] or homology-based [4] three-dimensional structures. Bacterial metabolic pathways often are encoded by ‘genome neighbourhoods’ (gene clusters and/or operons), which can provide important clues for functional assignment. We recently demonstrated the synergy of docking and pathway context by ‘predicting’ the intermediates in the glycolytic pathway in Escherichia coli [5]. Metabolite docking to multiple binding proteins and enzymes in the same pathway increases the reliability of in silico predictions of substrate specificities because the pathway intermediates are structurally similar. Here we report that structure-guided approaches for predicting the substrate specificities of several enzymes encoded by a bacterial gene cluster allowed the correct prediction of the in vitro activity of a structurally characterized enzyme of unknown function (PDB 2PMQ), 2-epimerization of trans-4-hydroxy-L-proline betaine (tHyp-B) and cis-4-hydroxy-D-proline betaine (cHyp-B), and also the correct identification of the catabolic pathway in which Hyp-B 2-epimerase participates. The substrate-liganded pose predicted by virtual library screening (docking) was confirmed experimentally. The enzymatic activities in the predicted pathway were confirmed by in vitro assays and genetic analyses; the intermediates were identified by metabolomics; and repression of the genes encoding the pathway by high salt concentrations was established by transcriptomics, confirming the osmolyte role of tHyp-B. This study establishes the utility of structure-guided functional predictions to enable the discovery of new metabolic pathways.
Journal Article
Molecular Diversity of Terpene Synthases in the Liverwort Marchantia polymorpha
by Bell, Stephen A.; Linscott, Kristin B.; Jia, Qidong
in Alkyl and Aryl Transferases - genetics, Alkyl and Aryl Transferases - metabolism, Evolution, Molecular
2016
Marchantia polymorpha is a basal terrestrial land plant, which like most liverworts accumulates structurally diverse terpenes believed to serve in deterring disease and herbivory. Previous studies have suggested that the mevalonate and methylerythritol phosphate pathways, present in evolutionarily diverged plants, are also operative in liverworts. However, the genes and enzymes responsible for the chemical diversity of terpenes have yet to be described. In this study, we resorted to a HMMER search tool to identify 17 putative terpene synthase genes from M. polymorpha transcriptomes. Functional characterization identified four diterpene synthase genes phylogenetically related to those found in diverged plants and nine rather unusual monoterpene and sesquiterpene synthase-like genes. The presence of separate monofunctional diterpene synthases for ent-copalyl diphosphate and ent-kaurene biosynthesis is similar to orthologs found in vascular plants, pushing the date of the underlying gene duplication and neofunctionalization of the ancestral diterpene synthase gene family to >400 million years ago. By contrast, the mono- and sesquiterpene synthases represent a distinct class of enzymes, not related to previously described plant terpene synthases and only distantly so to microbial-type terpene synthases. The absence of a Mg2+ binding, aspartate-rich, DDXXD motif places these enzymes in a noncanonical family of terpene synthases.
Journal Article
DIVERGENT EVOLUTION OF ENZYMATIC FUNCTION: Mechanistically Diverse Superfamilies and Functionally Distinct Suprafamilies
by Gerlt, John A.; Babbitt, Patricia C.
in Amidohydrolases - chemistry, Amidohydrolases - metabolism, Catalytic Domain
2001
The protein sequence and structure databases are now sufficiently representative that strategies nature uses to evolve new catalytic functions can be identified. Groups of divergently related enzymes whose members catalyze different reactions but share a common partial reaction, intermediate, or transition state (mechanistically diverse superfamilies) have been discovered, including the enolase, amidohydrolase, thiyl radical, crotonase, vicinal-oxygen-chelate, and Fe-dependent oxidase superfamilies. Other groups of divergently related enzymes whose members catalyze different overall reactions that do not share a common mechanistic strategy (functionally distinct suprafamilies) have also been identified: (a) functionally distinct suprafamilies whose members catalyze successive transformations in the tryptophan and histidine biosynthetic pathways and (b) functionally distinct suprafamilies whose members catalyze different reactions in different metabolic pathways. An understanding of the structural bases for the catalytic diversity observed in super- and suprafamilies may provide the basis for discovering the functions of proteins and enzymes in new genomes as well as provide guidance for in vitro evolution/engineering of new enzymes.
Journal Article