Catalogue Search | MBRL

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

by Dodevski, Igor; Babbitt, Patricia C.; Brown, Shoshana D. in Accuracy, Biocatalysis, Biochemistry/Bioinformatics

2009

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%-63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.

Journal Article

Bridging gaps in traditional research training with iBiology Courses

by Vale, Ronald D.; Nguyen, Thi A.; Behrman, Shannon L. in Biology, Biology and Life Sciences, Careers

2024

iBiology Courses provide trainees with just-in-time learning resources to become effective researchers. These courses can help scientists build core research skills, plan their research projects and careers, and learn from scientists with diverse backgrounds.

Journal Article

Biases in the Experimental Annotations of Protein Function and Their Effect on Our Understanding of Protein Function Space

by Friedberg, Iddo; Thorman, Alexander W.; Babbitt, Patricia C. in Amino acid sequence, Animals, Biology

2013

The ongoing functional annotation of proteins relies upon the work of curators to capture experimental findings from scientific literature and apply them to protein sequence and structure data. However, with the increasing use of high-throughput experimental assays, a small number of experimental studies dominate the functional protein annotations collected in databases. Here, we investigate just how prevalent the "few articles - many proteins" phenomenon is. We examine the experimentally validated annotation of proteins provided by several groups in the GO Consortium, and show that the distribution of proteins per published study is exponential, with 0.14% of articles providing the source of annotations for 25% of the proteins in the UniProt-GOA compilation. Since each of the dominant articles describes the use of an assay that can find only one function or a small group of functions, this leads to substantial biases in what we know about the function of many proteins. Mass-spectrometry, microscopy and RNAi experiments dominate high-throughput experiments. Consequently, the functional information derived from these experiments is mostly of the subcellular location of proteins, and of the participation of proteins in embryonic developmental pathways. For some organisms, the information provided by different studies overlaps by a large amount. We also show that the information provided by high-throughput experiments is less specific than that provided by low-throughput experiments. Given the experimental techniques available, certain biases in protein function annotation due to high-throughput experiments are unavoidable. Knowing that these biases exist and understanding their characteristics and extent is important for database curators, developers of function annotation programs, and anyone who uses protein function annotation data to plan experiments.

Journal Article

Broadening the impact of plant science through innovative, integrative, and inclusive outreach

by Macintosh, Gustavo C.; Sun, Ying; Ayalew, Mentewab in Arabidopsis, Climate change, expert opinion

2021

Population growth and climate change will impact food security and potentially exacerbate the environmental toll that agriculture has taken on our planet. These existential concerns demand that a passionate, interdisciplinary, and diverse community of plant science professionals is trained during the 21st century. Furthermore, societal trends that question the importance of science and expert knowledge highlight the need to better communicate the value of rigorous fundamental scientific exploration. Engaging students and the general public in the wonder of plants, and science in general, requires renewed efforts that take advantage of advances in technology and new models of funding and knowledge dissemination. In November 2018, funded by the National Science Foundation through the Arabidopsis Research and Training for the 21st century (ART 21) research coordination network, a symposium and workshop were held that included a diverse panel of students, scientists, educators, and administrators from across the US. The purpose of the workshop was to re‐envision how outreach programs are funded, evaluated, acknowledged, and shared within the plant science community. One key objective was to generate a roadmap for future efforts. We hope that this document will serve as such, by providing a comprehensive resource for students and young faculty interested in developing effective outreach programs. We also anticipate that this document will guide the formation of community partnerships to scale up currently successful outreach programs, and lead to the design of future programs that effectively engage with a more diverse student body and citizenry.

Journal Article

A large-scale evaluation of computational protein function prediction

by Džeroski, Sašo; Kihara, Daisuke; Rentzsch, Robert in 631/114/2410, 631/1647/48, Algorithms

2013

Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.

Journal Article

A large-scale evaluation of computational protein function prediction

by Džeroski, Sašo; Kihara, Daisuke; Rentzsch, Robert in annotation, BASIC BIOLOGICAL SCIENCES, Biochemistry & Molecular Biology

2013

Journal Article

Bridging gaps in traditional research training with iBiology Courses

by Alexandra M. Schnoes , Thi A. Nguyen , Shannon L. Behrman

2024

Journal Article

Biases in the Experimental Annotations of Protein Function and Their Effect on Our Understanding of Protein Function Space

by Schnoes, Alexandra M; Friedberg, Iddo; Babbitt, Patricia C in Experiments, Genomes, Molecular biology

2013

Journal Article

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

by Schnoes, Alexandra M; Brown, Shoshana D; Dodevski, Igor in Accuracy, Bioinformatics, Genomes

2009

Journal Article

Biases in the Experimental Annotations of Protein Function and their Effect on Our Understanding of Protein Function Space

by Schnoes, Alexandra M; Friedberg, Iddo; Babbitt, Patricia C in Annotations, Consortia, Experiments

2013

Paper
