4,034 result(s) for "Taxonomic classification"
Taxonomic classification of DNA sequences beyond sequence similarity using deep neural networks
Taxonomic classification, that is, the assignment to biological clades with shared ancestry, is a common task in genetics, mainly based on a genome similarity search of large genome databases. The classification quality depends heavily on the database, since representative relatives must be present. Many genomic sequences cannot be classified at all or only with a high misclassification rate. Here we present BERTax, a deep neural network program based on natural language processing to precisely classify the superkingdom and phylum of DNA sequences taxonomically without the need for a known representative relative from a database. We show BERTax to be at least on par with the state-of-the-art approaches when taxonomically similar species are part of the training data. For novel organisms, however, BERTax clearly outperforms any existing approach. Finally, we show that BERTax can also be combined with database approaches to further increase the prediction quality in almost all cases. Since BERTax is not based on similar entries in databases, it allows precise taxonomic classification of a broader range of genomic sequences, thus increasing the overall information gain.
SILVA, RDP, Greengenes, NCBI and OTT — how do these taxonomies compare?
Background A key step in microbiome sequencing analysis is read assignment to taxonomic units. This is often performed using one of four taxonomic classifications, namely SILVA, RDP, Greengenes or NCBI. It is unclear how similar these are and how to compare analysis results that are based on different taxonomies. Results We provide a method and software for mapping taxonomic entities from one taxonomy onto another. We use it to compare the four taxonomies and the Open Tree of life Taxonomy (OTT). Conclusions While we find that SILVA, RDP and Greengenes map well into NCBI, and all four map well into the OTT, mapping the two larger taxonomies on to the smaller ones is problematic.
RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification
In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
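As a rough illustration of why denser databases push assignments up the tree, the lowest common ancestor (LCA) step can be sketched as the longest shared prefix of the lineages a read matches. This is a minimal sketch, not the code of any tool named in the abstract; the function name and the toy lineages are assumptions made for illustration.

```python
from typing import Sequence


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> list[str]:
    """Return the longest root-to-leaf prefix shared by all lineages."""
    if not lineages:
        return []
    shared = []
    # zip(*lineages) walks the lineages rank by rank, stopping at the shortest.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical database hits for a single read (toy lineages, not real output).
hits = [
    ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia coli"],
    ["Bacteria", "Proteobacteria", "Escherichia", "Escherichia fergusonii"],
]
print(lowest_common_ancestor(hits))
# prints ['Bacteria', 'Proteobacteria', 'Escherichia']
```

With only the first hit in the database the read would be assigned at the species level; once a second species from the same genus is added, the shared prefix ends at the genus, which matches the trend the abstract reports.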
A Bayesian taxonomic classification method for 16S rRNA gene sequences with improved species-level accuracy
Background Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. Results We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Conclusions Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. 
Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA.
High-throughput sequencing for community analysis: the promise of DNA barcoding to uncover diversity, relatedness, abundances and interactions in spider communities
Large-scale studies on community ecology are highly desirable but often difficult to accomplish due to the considerable investment of time, labor, and money required to characterize richness, abundance, relatedness, and interactions. Nonetheless, such large-scale perspectives are necessary for understanding the composition, dynamics, and resilience of biological communities. Small invertebrates play a central role in ecosystems, occupying critical positions in the food web and performing a broad variety of ecological functions. However, it has been particularly difficult to adequately characterize communities of these animals because of their exceptionally high diversity and abundance. Spiders in particular fulfill key roles as both predator and prey in terrestrial food webs and are hence an important focus of ecological studies. In recent years, large-scale community analyses have benefitted tremendously from advances in DNA barcoding technology. High-throughput sequencing (HTS), particularly DNA metabarcoding, enables community-wide analyses of diversity and interactions at unprecedented scales and at a fraction of the cost that was previously possible. Here, we review the current state of the application of these technologies to the analysis of spider communities. We discuss amplicon-based DNA barcoding and metabarcoding for the analysis of community diversity and molecular gut content analysis for assessing predator-prey relationships. We also highlight applications of third-generation sequencing technology for long-read and portable DNA barcoding. We then address the development of theoretical frameworks for community-level studies, and finally highlight critical gaps and future directions for DNA analysis of spider communities.
A study on software fault prediction techniques
Software fault prediction aims to identify fault-prone software modules by using some underlying properties of the software project before the actual testing process begins. It helps in obtaining desired software quality with optimized cost and effort. Initially, this paper provides an overview of the software fault prediction process. Next, different dimensions of the software fault prediction process are explored and discussed. This review aims to help with the understanding of various elements associated with the fault prediction process and to explore various issues involved in software fault prediction. We search through various digital libraries and identify all the relevant papers published since 1993. The reviews of these papers are grouped into three classes: software metrics, fault prediction techniques, and data quality issues. For each class, a taxonomic classification of the different techniques and our observations are also presented. The review and a summary in tabular form are also given. At the end of the paper, the statistical analysis, observations, challenges, and future directions of software fault prediction are discussed.
Dereplication strategies in natural product research: How many tools and methodologies behind the same concept?
The development of new drugs will certainly benefit from an ever-improving knowledge of the chemistry of living beings. However, identification of druggable molecules within the immense biodiversity of forests, soils or oceans still requires considerable investments in technical equipment, time and human resources. An important part of this process is the quick identification of known substances in order to concentrate the efforts on the discovery of new ones. A range of “dereplication” procedures are currently emerging to meet this challenge as key strategies to improve the performance of natural product screening programs. Initially defined in 1990 as “a process of quickly identifying known chemotypes”, dereplication is today a not so univocal concept and has evolved over the last years in different ways. The present review covers all dereplication-related studies in natural product research from 1990 to 2014. Its writing brought to light five distinct dereplication workflows that can be characterized by the nature of starting materials, by the selected analytical technique, and above all by the final objective. Dereplication can be used as an untargeted workflow for the rapid identification of the major compounds whatever their chemical class in a single sample or for the acceleration of bioactivity-guided fractionation procedures. In other cases dereplication is fully integrated in metabolomic studies for the untargeted chemical profiling of natural extract collections or for the targeted identification of a predetermined class of metabolites. Finally, a quite distinct dereplication approach mainly based on gene-sequence analyses is frequently used for the taxonomic identification of microbial strains.
A comprehensive fungi-specific 18S rRNA gene sequence primer toolkit suited for diverse research issues and sequencing platforms
Background Several fungi-specific primers target the 18S rRNA gene sequence, one of the prominent markers for fungal classification. The design of most primers goes back to the last decades. Since then, the number of sequences in public databases increased leading to the discovery of new fungal groups and changes in fungal taxonomy. However, no reevaluation of primers was carried out and relevant information on most primers is missing. With this study, we aimed to develop an 18S rRNA gene sequence primer toolkit allowing an easy selection of the best primer pair appropriate for different sequencing platforms, research aims (biodiversity assessment versus isolate classification) and target groups. Results We performed an intensive literature research, reshuffled existing primers into new pairs, designed new Illumina-primers, and annealing blocking oligonucleotides. A final number of 439 primer pairs were subjected to in silico PCRs. Best primer pairs were selected and experimentally tested. The most promising primer pair with a small amplicon size, nu-SSU-1333-5′/nu-SSU-1647-3′ (FF390/FR-1), was successful in describing fungal communities by Illumina sequencing. Results were confirmed by a simultaneous metagenomics and eukaryote-specific primer approach. Co-amplification occurred in all sample types but was effectively reduced by blocking oligonucleotides. Conclusions The compiled data revealed the presence of an enormous diversity of fungal 18S rRNA gene primer pairs in terms of fungal coverage, phylum spectrum and co-amplification. Therefore, the primer pair has to be carefully selected to fulfill the requirements of the individual research projects. The presented primer toolkit offers comprehensive lists of 164 primers, 439 primer combinations, 4 blocking oligonucleotides, and top primer pairs holding all relevant information including primer’s characteristics and performance to facilitate primer pair selection.
Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities
Background The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method’s accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions. Results An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class—identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed. 
Conclusions The accuracy of shotgun metagenomics classification methods varies widely. No one program clearly outperformed others in all evaluation scenarios; rather, the results illustrate the strengths of different methods for different purposes. Researchers must appreciate method differences, choosing the program best suited for their particular analysis to avoid very misleading results. Use of standardized datasets for method comparisons is encouraged, as is use of mock microbial community controls suitable for a particular metagenomic analysis.
SpeciateIT and vSpeciateDB: novel, fast, and accurate per sequence 16S rRNA gene taxonomic classification of vaginal microbiota
Background Clustering of sequences into operational taxonomic units (OTUs) and denoising methods are a mainstream stopgap to taxonomically classifying large numbers of 16S rRNA gene sequences. Environment-specific reference databases generally yield optimal taxonomic assignment. Results We developed SpeciateIT, a novel taxonomic classification tool which rapidly and accurately classifies individual amplicon sequences (https://github.com/Ravel-Laboratory/speciateIT). We also present vSpeciateDB, a custom reference database for the taxonomic classification of 16S rRNA gene amplicon sequences from vaginal microbiota. We show that SpeciateIT requires minimal computational resources relative to other algorithms and, when combined with vSpeciateDB, affords accurate species level classification in an environment-specific manner. Conclusions Herein, two resources with new and practical importance are described. The novel classification algorithm, SpeciateIT, is based on 7th order Markov chain models and allows for fast and accurate per-sequence taxonomic assignments (as little as 10 min for 10^7 sequences). vSpeciateDB, a meticulously tailored reference database, stands as a vital and pragmatic contribution. Its significance lies in the superiority of this environment-specific database to provide more species-resolution over its universal counterparts.
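The general idea of per-sequence classification with Markov chain models can be sketched as follows: fit one chain per taxon on its reference sequences, then assign a query to the taxon whose chain gives it the highest log-likelihood. This is a toy sketch under stated assumptions, not SpeciateIT's implementation; the function names, the add-one smoothing, the order of 2 (the abstract describes 7th order models), and the training sequences are all illustrative choices.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train(seqs, order):
    """Count next-base occurrences for every context of length `order`."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    return counts


def log_likelihood(seq, counts, order):
    """Log-probability of `seq` under the chain, with add-one smoothing."""
    total = 0.0
    for i in range(order, len(seq)):
        context_counts = counts.get(seq[i - order:i], {})
        seen = sum(context_counts.values())
        prob = (context_counts.get(seq[i], 0) + 1) / (seen + len(ALPHABET))
        total += math.log(prob)
    return total


def classify(seq, models, order):
    """Return the taxon whose model scores the query highest."""
    return max(models, key=lambda taxon: log_likelihood(seq, models[taxon], order))


# Toy reference data (hypothetical taxa and sequences).
ORDER = 2
models = {
    "taxon_A": train(["ACGTACGTACGTACGT"], ORDER),
    "taxon_B": train(["AAAATTTTAAAATTTT"], ORDER),
}
print(classify("ACGTACGT", models, ORDER))  # prints taxon_A
print(classify("AAATTT", models, ORDER))    # prints taxon_B
```

Each query is scored independently against the fitted models, so no clustering into OTUs is needed before assignment; a higher order captures longer sequence context at the cost of more parameters per taxon.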