Catalogue Search | MBRL

The nf-core framework for community-curated bioinformatics pipelines

by Garcia, Maxime Ulysse , Patel, Harshil , Di Tommaso, Paolo in 631/114 , 706/648 , Agriculture

2020

Journal Article

Reproducibility standards for machine learning in the life sciences

by Lee, Su-In , Hicks, Stephanie C , Hoffman, Michael M in Automation , Best practice , Learning algorithms

2021

To make machine-learning analyses in the life sciences more computationally reproducible, we propose standards based on data, model and code publication, programming best practices and workflow automation. By meeting these standards, the community of researchers applying machine-learning methods in the life sciences can ensure that their analyses are worthy of trust.

Journal Article

Multi-omics approaches to disease

by Lusis, Aldons , Seldin, Marcus , Hasin, Yehudit in Animal Genetics and Genomics , Arrays , as Revealed Through Genomics

2017

High-throughput technologies have revolutionized medical research. The advent of genotyping arrays enabled large-scale genome-wide association studies and methods for examining global transcript levels, which gave rise to the field of “integrative genetics”. Other omics technologies, such as proteomics and metabolomics, are now often incorporated into the everyday methodology of biological researchers. In this review, we provide an overview of such omics technologies and focus on methods for their integration across multiple omics layers. As compared to studies of a single omics type, multi-omics offers the opportunity to understand the flow of information that underlies disease.

Journal Article

Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues

by Ashton, Philip M , Suresh, Anita , Utpatel, Christian in Bioinformatics , Gene sequencing , Genomes

2019

Whole genome sequencing (WGS) of Mycobacterium tuberculosis has rapidly progressed from a research tool to a clinical application for the diagnosis and management of tuberculosis and in public health surveillance. This development has been facilitated by drastic drops in cost, advances in technology and concerted efforts to translate sequencing data into actionable information. There is, however, a risk that, in the absence of a consensus and international standards, the widespread use of WGS technology may result in data and processes that lack harmonization, comparability and validation. In this Review, we outline the current landscape of WGS pipelines and applications, and set out best practices for M. tuberculosis WGS, including standards for bioinformatics pipelines, curated repositories of resistance-causing variants, phylogenetic analyses, quality control and standardized reporting.

Journal Article

Highly accurate protein structure prediction with AlphaFold

by Nikolov, Stanislav , Senior, Andrew W. , Zielinski, Michal in 631/114/1305 , 631/114/2411 , 631/535

2021

Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1–4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6, 7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’ [8]) has been an important open research problem for more than 50 years [9]. Despite recent progress [10–14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm. AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.

Journal Article

A practical guide to methods controlling false discoveries in computational biology

by Korthauer, Keegan , Kimes, Patrick K. , Reyes, Alejandro in Animal Genetics and Genomics , Benchmarking Studies , Bioinformatics

2019

Background: In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology. Results: Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses. Conclusions: Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.

Journal Article

Ten Simple Rules for Reproducible Computational Research

by Sandve, Geir Kjetil , Nekrutenko, Anton , Taylor, James in Biomedical Research - standards , Computational biology , Computational Biology - standards

2013

There will for a given analysis be an exponential number of possible combinations of software versions, parameter values, pre-processing steps, and so on, meaning that a failure to take notes may make exact reproduction essentially impossible. For Every Result, Keep Track of How It Was Produced: Whenever a result may be of potential interest, keep track of how it was produced. Although the results of analyses and their corresponding textual interpretations are clearly interconnected at the conceptual level, they tend to live quite separate lives in their representations: results usually live on a data area on a server or personal computer, while interpretations live in text documents in the form of personal notes or emails to collaborators.

Journal Article

DOME: recommendations for supervised machine learning validation in biology

by Garcia-Gasulla, Dario , Del Conte, Alessio , Capella-Gutierrez, Salvador in Domes , Learning algorithms , Machine learning

2021

DOME is a set of community-wide recommendations for reporting supervised machine learning–based analyses applied to biological studies. Broad adoption of these recommendations will help improve machine learning assessment and reproducibility.

Journal Article

OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies

by Gault, David , Linkert, Melissa , Burel, Jean-Marie in Accessibility , Datasets , Format

2021

The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used in all these vessels can deliver truly findable, accessible, interoperable and reusable bioimaging data. OME’s next-generation file format (OME-NGFF) provides a cloud-native complement to OME-TIFF and HDF5 for storing and accessing bioimaging data at scale and works toward the goal of findable, accessible, interoperable and reusable bioimaging data.

Journal Article

Benchmarking of cell type deconvolution pipelines for transcriptomics data

by Powell, Joseph E. , De Preter, Katleen , Avila Cobos, Francisco in 631/114 , 631/208/212/2019 , Animals

2020

Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance. Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.

Journal Article
