Catalogue Search | MBRL
Explore the vast range of titles available.
70,744 result(s) for "Annotation"
eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale
by Huerta-Cepas, Jaime; Letunic, Ivica; Hernández-Plaza, Ana
in Annotations; Databases, Genetic; Flavored milk drinks
2021
Abstract
Even though automated functional annotation of genes represents a fundamental step in most genomic and metagenomic workflows, it remains challenging at large scales. Here, we describe a major upgrade to eggNOG-mapper, a tool for functional annotation based on precomputed orthology assignments, now optimized for vast (meta)genomic data sets. Improvements in version 2 include a full update of both the genomes and functional databases to those from eggNOG v5, as well as several efficiency enhancements and new features. Most notably, eggNOG-mapper v2 now allows for: 1) de novo gene prediction from raw contigs, 2) built-in pairwise orthology prediction, 3) fast protein domain discovery, and 4) automated GFF decoration. eggNOG-mapper v2 is available as a standalone tool or as an online service at http://eggnog-mapper.embl.de.
Journal Article
GFF Utilities: GffRead and GffCompare version 2; peer review: 3 approved
2020
GTF (Gene Transfer Format) and GFF (General Feature Format) are popular file formats used by bioinformatics programs to represent and exchange information about various genomic features, such as gene and transcript locations and structure. GffRead and GffCompare are open source programs that provide extensive and efficient solutions to manipulate files in a GTF or GFF format. While GffRead can convert, sort, filter, transform, or cluster genomic features, GffCompare can be used to compare and merge different gene annotations.
Availability and implementation: GFF utilities are implemented in C++ for Linux and OS X and released as open source under an MIT license (https://github.com/gpertea/gffread, https://github.com/gpertea/gffcompare).
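To make the file format described in this abstract concrete, here is a minimal sketch of reading one GFF3 feature line in Python. It is not how GffRead or GffCompare work internally; the names `GffFeature` and `parse_gff3_line` are hypothetical, and the sketch assumes GFF3-style `key=value` attributes (GTF uses a different attribute syntax).

```python
from typing import NamedTuple


class GffFeature(NamedTuple):
    """One feature row of a GFF3 file (hypothetical helper type)."""
    seqid: str
    source: str
    type: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    score: str      # "." when undefined
    strand: str     # "+", "-", "." or "?"
    phase: str      # "0", "1", "2" or "."
    attributes: dict


def parse_gff3_line(line: str) -> GffFeature:
    """Split a single tab-separated GFF3 line into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs in GFF3.
    attrs = {}
    for pair in cols[8].split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)


# Illustrative record; the coordinates and identifiers are made up.
example = "chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=demo"
feature = parse_gff3_line(example)
```

Comment lines (starting with `#`) and URL-escaped attribute values are not handled here; a real parser would need both.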
Journal Article
Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models
2021
As the number of single‐cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of scRNA‐seq data, while accounting for uncertainty caused by biological and measurement noise. We also introduce single‐cell ANnotation using Variational Inference (scANVI), a semi‐supervised variant of scVI designed to leverage existing cell state annotations. We demonstrate that scVI and scANVI compare favorably to state‐of‐the‐art methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings. In contrast to existing methods, scVI and scANVI integrate multiple datasets with a single generative model that can be directly used for downstream tasks, such as differential expression. Both methods are easily accessible through scvi‐tools.
SYNOPSIS
This study demonstrates the ability of scVI to integrate single‐cell RNA‐seq datasets in a variety of settings and presents scANVI, a new development based on scVI for automated annotation of cell types and states.
In scVI, datasets from different labs and technologies are integrated in a joint latent space.
In scANVI, cell type annotations are transferred between datasets and across different scenarios.
Uncertainties of differential gene expression in multiple samples are quantified.
The performance of scVI and scANVI in data integration and cell state annotation is superior to other related methods.
Journal Article
MicrobeAnnotator: a user-friendly, comprehensive functional annotation pipeline for microbial genomes
by Ruiz-Perez, Carlos A.; Konstantinidis, Konstantinos T.; Conrad, Roth E.
in Algorithms; Annotations; Antibiotics
2021
Background
High-throughput sequencing has increased the number of available microbial genomes recovered from isolates, single cells, and metagenomes. Accordingly, fast and comprehensive functional gene annotation pipelines are needed to analyze and compare these genomes. Although several approaches exist for genome annotation, these are typically not designed for easy incorporation into analysis pipelines, do not combine results from different annotation databases or offer easy-to-use summaries of metabolic reconstructions, and typically require large amounts of computing power for high-throughput analysis not available to the average user.
Results
Here, we introduce MicrobeAnnotator, a fully automated, easy-to-use pipeline for the comprehensive functional annotation of microbial genomes that combines results from several reference protein databases and returns the matching annotations together with key metadata such as the interlinked identifiers of matching reference proteins from multiple databases [KEGG Orthology (KO), Enzyme Commission (E.C.), Gene Ontology (GO), Pfam, and InterPro]. Further, the functional annotations are summarized into Kyoto Encyclopedia of Genes and Genomes (KEGG) modules as part of a graphical output (heatmap) that allows the user to quickly detect differences among (multiple) query genomes and cluster the genomes based on their metabolic similarity. MicrobeAnnotator is implemented in Python 3 and is freely available under an open-source Artistic License 2.0 from https://github.com/cruizperez/MicrobeAnnotator.
Conclusions
We demonstrated the capabilities of MicrobeAnnotator by annotating 100 Escherichia coli and 78 environmental Candidate Phyla Radiation (CPR) bacterial genomes and comparing the results to those of other popular tools. We showed that the use of multiple annotation databases allows MicrobeAnnotator to recover more annotations per genome compared to faster tools that use reduced databases and is computationally efficient for use in personal computers. The output of MicrobeAnnotator can be easily incorporated into other analysis pipelines while the results of other annotation tools can be seamlessly incorporated into MicrobeAnnotator to generate summary plots.
Journal Article
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
by Chougule, Kapeel; Agda, Jireh R. A.; Ou, Shujun
in Accuracy; Animal Genetics and Genomics; Animals
2019
Background
Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.
Results
We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.
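The six performance metrics named in these results have standard definitions in terms of true/false positives and negatives. The sketch below shows those textbook definitions only; the abstract does not say how the counts are obtained, so the function name `classification_metrics` and the example counts are illustrative assumptions rather than the authors' benchmarking code.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard confusion-matrix metrics (hypothetical helper)."""
    return {
        "sensitivity": _ratio(tp, tp + fn),            # true positive rate
        "specificity": _ratio(tn, tn + fp),            # true negative rate
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "fdr": _ratio(fp, tp + fp),                    # equals 1 - precision
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),        # harmonic mean of precision and sensitivity
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=80, fp=20, tn=890, fn=10)
```

Reporting FDR alongside precision is redundant in a strict sense, since the two sum to one, but listing both makes over-annotation easy to read off directly.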
Conclusions
The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
Journal Article
The Ensembl Variant Effect Predictor
by Ritchie, Graham R. S.; Cunningham, Fiona; Riat, Harpreet Singh
in Animal Genetics and Genomics; Annotations; Bioinformatics
2016
The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
Journal Article
Review on the Computational Genome Annotation of Sequences Obtained by Next-Generation Sequencing
2020
Next-Generation Sequencing (NGS) has made it easier to obtain genome-wide sequence data and it has shifted the research focus into genome annotation. The challenging tasks involved in annotation rely on the currently available tools and techniques to decode the information contained in nucleotide sequences. This information will improve our understanding of general aspects of life and evolution and improve our ability to diagnose genetic disorders. Here, we present a summary of both structural and functional annotations, as well as the associated comparative annotation tools and pipelines. We highlight visualization tools that immensely aid the annotation process and the contributions of the scientific community to the annotation. Further, we discuss quality-control practices and the need for re-annotation, and highlight the future of annotation.
Journal Article
Shifting the limits in wheat research and breeding using a fully annotated reference genome
by
Institut des Sciences des Plantes de Paris-Saclay (IPS2 (UMR_9213 / UMR_1403)) ; Institut National de la Recherche Agronomique (INRA)-Université Paris-Sud - Paris 11 (UP11)-Université Paris Diderot - Paris 7 (UPD7)-Université d'Évry-Val-d'Essonne (UEVE)-Centre National de la Recherche Scientifique (CNRS)
,
European Project: 212019,EC:FP7:KBBE,FP7-KBBE-2007-1,TRITICEAEGENOME
,
Alaux, Michael
in Adaptation; Agricultural production; Agricultural research
2018
An annotated reference sequence representing the hexaploid bread wheat genome in 21 pseudomolecules has been analyzed to identify the distribution and genomic context of coding and noncoding elements across the A, B, and D subgenomes. With an estimated coverage of 94% of the genome and containing 107,891 high-confidence gene models, this assembly enabled the discovery of tissue- and developmental stage-related coexpression networks by providing a transcriptome atlas representing major stages of wheat development. Dynamics of complex gene families involved in environmental adaptation and end-use quality were revealed at subgenome resolution and contextualized to known agronomic single-gene or quantitative trait loci. This community resource establishes the foundation for accelerating wheat research and application through improved understanding of wheat biology and genomics-assisted breeding.
Journal Article
The status of the human gene catalogue
2023
Scientists have been trying to identify every gene in the human genome since the initial draft was published in 2001. In the years since, much progress has been made in identifying protein-coding genes, currently estimated to number fewer than 20,000, with an ever-expanding number of distinct protein-coding isoforms. Here we review the status of the human gene catalogue and the efforts to complete it in recent years. Besides the ongoing annotation of protein-coding genes, their isoforms and pseudogenes, the invention of high-throughput RNA sequencing and other technological breakthroughs have led to a rapid growth in the number of reported non-coding RNA genes. For most of these non-coding RNAs, the functional relevance is currently unclear; we look at recent advances that offer paths forward to identifying their functions and towards eventually completing the human gene catalogue. Finally, we examine the need for a universal annotation standard that includes all medically significant genes and maintains their relationships with different reference genomes for the use of the human gene catalogue in clinical settings.
Although the catalogue of human protein-coding genes is nearing completion, the number of non-coding RNA genes remains highly uncertain, and for all genes much work remains to be done to understand their functions.
Journal Article
Assessing the performance of different approaches for functional and taxonomic annotation of metagenomes
by Tamames, Javier; Puente-Sánchez, Fernando; Cobo-Simón, Marta
in Algorithms; Animal Genetics and Genomics; Annotations
2019
Background
Metagenomes can be analysed using different approaches and tools. One of the most important distinctions is how taxonomic and functional assignment is performed: either with assembly algorithms, or by direct analysis of raw sequence reads through homology searching, k-mer analysis, or detection of marker genes. Many instances of each approach can be found in the literature, but to the best of our knowledge no evaluation of their relative performance has been carried out, and we question whether their results are comparable.
Results
We have analysed several real and mock metagenomes using different methodologies and tools, and compared the resulting taxonomic and functional profiles. Our results show that database completeness (the representation of diverse organisms and taxa in it) is the main factor determining the performance of the methods relying on direct read assignment, whether by homology, k-mer composition or similarity to marker genes, while methods relying on assembly and assignment of predicted genes are most influenced by metagenome size, which in turn determines the completeness of the assembly (the percentage of reads that were assembled).
Conclusions
Although differences exist, taxonomic profiles are rather similar between raw read assignment and assembly assignment methods, while they are more divergent for methods based on k-mers and marker genes. Regarding functional annotation, analysis of raw reads retrieves more functions, but it also makes a substantial number of over-predictions. Assembly methods become more advantageous as the size of the metagenome grows.
Journal Article