Catalogue Search | MBRL
Explore the vast range of titles available.
3,952 result(s) for "Computational Biology - standards"
Reproducibility standards for machine learning in the life sciences
by
Lee, Su-In
,
Hicks, Stephanie C
,
Hoffman, Michael M
in
Automation
,
Best practice
,
Learning algorithms
2021
To make machine-learning analyses in the life sciences more computationally reproducible, we propose standards based on data, model and code publication, programming best practices and workflow automation. By meeting these standards, the community of researchers applying machine-learning methods in the life sciences can ensure that their analyses are worthy of trust.
Journal Article
Multi-omics approaches to disease
by
Lusis, Aldons
,
Seldin, Marcus
,
Hasin, Yehudit
in
Animal Genetics and Genomics
,
Arrays
,
as Revealed Through Genomics
2017
High-throughput technologies have revolutionized medical research. The advent of genotyping arrays enabled large-scale genome-wide association studies and methods for examining global transcript levels, which gave rise to the field of “integrative genetics”. Other omics technologies, such as proteomics and metabolomics, are now often incorporated into the everyday methodology of biological researchers. In this review, we provide an overview of such omics technologies and focus on methods for their integration across multiple omics layers. As compared to studies of a single omics type, multi-omics offers the opportunity to understand the flow of information that underlies disease.
Journal Article
Whole genome sequencing of Mycobacterium tuberculosis: current standards and open issues
by
Ashton, Philip M
,
Suresh, Anita
,
Utpatel, Christian
in
Bioinformatics
,
Gene sequencing
,
Genomes
2019
Whole genome sequencing (WGS) of Mycobacterium tuberculosis has rapidly progressed from a research tool to a clinical application for the diagnosis and management of tuberculosis and in public health surveillance. This development has been facilitated by drastic drops in cost, advances in technology and concerted efforts to translate sequencing data into actionable information. There is, however, a risk that, in the absence of a consensus and international standards, the widespread use of WGS technology may result in data and processes that lack harmonization, comparability and validation. In this Review, we outline the current landscape of WGS pipelines and applications, and set out best practices for M. tuberculosis WGS, including standards for bioinformatics pipelines, curated repositories of resistance-causing variants, phylogenetic analyses, quality control and standardized reporting.
Journal Article
Highly accurate protein structure prediction with AlphaFold
by
Nikolov, Stanislav
,
Senior, Andrew W.
,
Zielinski, Michal
in
631/114/1305
,
631/114/2411
,
631/535
2021
Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence (the structure prediction component of the ‘protein folding problem’) has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14), demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
AlphaFold predicts protein structures with an accuracy competitive with experimental structures in the majority of cases using a novel deep learning architecture.
Journal Article
A practical guide to methods controlling false discoveries in computational biology
by
Korthauer, Keegan
,
Kimes, Patrick K.
,
Reyes, Alejandro
in
Animal Genetics and Genomics
,
Benchmarking Studies
,
Bioinformatics
2019
Background
In high-throughput studies, hundreds to millions of hypotheses are typically tested. Statistical methods that control the false discovery rate (FDR) have emerged as popular and powerful tools for error rate control. While classic FDR methods use only p values as input, more modern FDR methods have been shown to increase power by incorporating complementary information as informative covariates to prioritize, weight, and group hypotheses. However, there is currently no consensus on how the modern methods compare to one another. We investigate the accuracy, applicability, and ease of use of two classic and six modern FDR-controlling methods by performing a systematic benchmark comparison using simulation studies as well as six case studies in computational biology.
Results
Methods that incorporate informative covariates are modestly more powerful than classic approaches, and do not underperform classic approaches, even when the covariate is completely uninformative. The majority of methods are successful at controlling the FDR, with the exception of two modern methods under certain settings. Furthermore, we find that the improvement of the modern FDR methods over the classic methods increases with the informativeness of the covariate, total number of hypothesis tests, and proportion of truly non-null hypotheses.
Conclusions
Modern FDR methods that use an informative covariate provide advantages over classic FDR-controlling procedures, with the relative gain dependent on the application and informativeness of available covariates. We present our findings as a practical guide and provide recommendations to aid researchers in their choice of methods to correct for false discoveries.
Journal Article
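As an illustration of what "classic FDR methods use only p values as input" means in practice, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. The abstract above does not name the classic methods it benchmarks, so the choice of Benjamini-Hochberg is an assumption made here for illustration; the function name `benjamini_hochberg` and the default `alpha` are likewise hypothetical and not taken from the article.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis using the BH step-up rule.

    Illustrative sketch only: the only input is the list of p values,
    with no covariates, which is what makes this a "classic" method.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the hypotheses, ordered by ascending p value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Reject every hypothesis ranked at or below the cutoff (step-up rule).
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
    print(benjamini_hochberg(example))
```

The covariate-aware methods described in the abstract differ in that they use extra per-hypothesis information to weight or group the tests; this sketch does not attempt that.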
Ten Simple Rules for Reproducible Computational Research
by
Sandve, Geir Kjetil
,
Nekrutenko, Anton
,
Taylor, James
in
Biomedical Research - standards
,
Computational biology
,
Computational Biology - standards
2013
There will for a given analysis be an exponential number of possible combinations of software versions, parameter values, pre-processing steps, and so on, meaning that a failure to take notes may make exact reproduction essentially impossible. For Every Result, Keep Track of How It Was Produced: Whenever a result may be of potential interest, keep track of how it was produced. Although the results of analyses and their corresponding textual interpretations are clearly interconnected at the conceptual level, they tend to live quite separate lives in their representations: results usually live on a data area on a server or personal computer, while interpretations live in text documents in the form of personal notes or emails to collaborators.
Journal Article
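One way to act on "keep track of how it was produced" is to store a small provenance record next to each result. The sketch below is an assumption about what such a record might contain (parameters, interpreter version, and a hash of the input data); the function name `provenance_record` and its fields are hypothetical and are not prescribed by the article.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from typing import Any, Dict


def provenance_record(result_name: str,
                      parameters: Dict[str, Any],
                      input_bytes: bytes) -> Dict[str, Any]:
    """Build a JSON-serializable note describing how a result was produced."""
    return {
        "result": result_name,
        # Timestamp in UTC so records from different machines are comparable.
        "created_utc": datetime.now(timezone.utc).isoformat(),
        # Software version: here only the Python interpreter is recorded.
        "python_version": platform.python_version(),
        # Parameter values used for this run.
        "parameters": parameters,
        # Hash of the input data, so the exact input can be verified later.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


if __name__ == "__main__":
    record = provenance_record("figure_1", {"threshold": 0.05}, b"abc")
    print(json.dumps(record, indent=2))
```

A fuller record would also list the versions of the analysis packages and the pre-processing steps, since the excerpt above names those as sources of irreproducibility.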
DOME: recommendations for supervised machine learning validation in biology
by
Garcia-Gasulla, Dario
,
Del Conte, Alessio
,
Capella-Gutierrez, Salvador
in
Domes
,
Learning algorithms
,
Machine learning
2021
DOME is a set of community-wide recommendations for reporting supervised machine learning–based analyses applied to biological studies. Broad adoption of these recommendations will help improve machine learning assessment and reproducibility.
Journal Article
OME-NGFF: a next-generation file format for expanding bioimaging data-access strategies
2021
The rapid pace of innovation in biological imaging and the diversity of its applications have prevented the establishment of a community-agreed standardized data format. We propose that complementing established open formats such as OME-TIFF and HDF5 with a next-generation file format such as Zarr will satisfy the majority of use cases in bioimaging. Critically, a common metadata format used in all these vessels can deliver truly findable, accessible, interoperable and reusable bioimaging data.
OME’s next-generation file format (OME-NGFF) provides a cloud-native complement to OME-TIFF and HDF5 for storing and accessing bioimaging data at scale and works toward the goal of findable, accessible, interoperable and reusable bioimaging data.
Journal Article
Benchmarking of cell type deconvolution pipelines for transcriptomics data
by
Powell, Joseph E.
,
De Preter, Katleen
,
Avila Cobos, Francisco
in
631/114
,
631/208/212/2019
,
Animals
2020
Many computational methods have been developed to infer cell type proportions from bulk transcriptomics data. However, an evaluation of the impact of data transformation, pre-processing, marker selection, cell type composition and choice of methodology on the deconvolution results is still lacking. Using five single-cell RNA-sequencing (scRNA-seq) datasets, we generate pseudo-bulk mixtures to evaluate the combined impact of these factors. Both bulk deconvolution methodologies and those that use scRNA-seq data as reference perform best when applied to data in linear scale and the choice of normalization has a dramatic impact on some, but not all methods. Overall, methods that use scRNA-seq data have comparable performance to the best performing bulk methods whereas semi-supervised approaches show higher error values. Moreover, failure to include cell types in the reference that are present in a mixture leads to substantially worse results, regardless of the previous choices. Altogether, we evaluate the combined impact of factors affecting the deconvolution task across different datasets and propose general guidelines to maximize its performance.
Inferring cell type proportions from transcriptomics data is affected by data transformation, normalization, choice of method and the markers used. Here, the authors use single-cell RNAseq datasets to evaluate the impact of these factors and propose guidelines to maximise deconvolution performance.
Journal Article