Catalogue Search | MBRL
316 result(s) for "EMBO10"
Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models
2021
As the number of single‐cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of scRNA‐seq data, while accounting for uncertainty caused by biological and measurement noise. We also introduce single‐cell ANnotation using Variational Inference (scANVI), a semi‐supervised variant of scVI designed to leverage existing cell state annotations. We demonstrate that scVI and scANVI compare favorably to state‐of‐the‐art methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings. In contrast to existing methods, scVI and scANVI integrate multiple datasets with a single generative model that can be directly used for downstream tasks, such as differential expression. Both methods are easily accessible through scvi‐tools.
SYNOPSIS
This study demonstrates the ability of scVI to integrate single‐cell RNA‐seq datasets in a variety of settings and presents scANVI, a new development based on scVI for automated annotation of cell types and states.
In scVI, datasets from different labs and technologies are integrated in a joint latent space.
In scANVI, cell type annotations are transferred between datasets and across different scenarios.
Uncertainties of differential gene expression in multiple samples are quantified.
The performance of scVI and scANVI in data integration and cell state annotation is superior to other related methods.
Journal Article
Predicting cellular responses to complex perturbations in high‐throughput screens
by Shendure, Jay; Günnemann, Stephan; Lopez‐Paz, David
in Combinatorial analysis, Computational Biology, Datasets
2023
Recent advances in multiplexed single‐cell transcriptomics experiments facilitate the high‐throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep‐learning approaches for single‐cell response modeling. CPA learns to in silico predict transcriptional perturbation response at the single‐cell level for unseen dosages, cell types, time points, and species. Using newly generated single‐cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single‐cell Perturb‐seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single‐cell level and thus accelerate therapeutic applications using single‐cell technologies.
Synopsis
The compositional perturbation autoencoder (CPA) is a deep learning model for predicting the transcriptomic responses of single cells to single or combinatorial treatments from drugs and genetic manipulations.
CPA can be trained on highly multiplexed, single‐cell experiments with thousands of conditions to predict unmeasured phenotypes (e.g., specific dose responses).
It can generalize to predict responses to small molecules never seen in the training by adding priors on chemical space.
Validations using a newly generated combinatorial drug perturbation dataset demonstrate the accuracy of CPA in predicting unseen drug combinations.
CPA is also applicable to genetic combinatorial screens, as shown by imputing in silico 5,329 missing combinations in a single‐cell perturb‐seq experiment with diverse genetic interactions.
Journal Article
RNA velocity—current challenges and future perspectives
2021
RNA velocity has enabled the recovery of directed dynamic information from single‐cell transcriptomics by connecting measurements to the underlying kinetics of gene expression. This approach has opened up new ways of studying cellular dynamics. Here, we review the current state of RNA velocity modeling approaches, discuss various examples illustrating limitations and potential pitfalls, and provide guidance on how the ensuing challenges may be addressed. We then outline future directions on how to generalize the concept of RNA velocity to a wider variety of biological systems and modalities.
Graphical Abstract
This Review discusses the emerging challenges and potential pitfalls of current RNA velocity modeling approaches and provides guidance on how to address them.
Journal Article
Benchmarking AlphaFold‐enabled molecular docking predictions for antibiotic discovery
by Zheng, Erica J; Manson, Abigail L; Krishnan, Aarti
in Accuracy, AlphaFold2, Anti-Bacterial Agents - pharmacology
2022
Efficient identification of drug mechanisms of action remains a challenge. Computational docking approaches have been widely used to predict drug binding targets; yet, such approaches depend on existing protein structures, and accurate structural predictions have only recently become available from AlphaFold2. Here, we combine AlphaFold2 with molecular docking simulations to predict protein‐ligand interactions between 296 proteins spanning Escherichia coli's essential proteome, and 218 active antibacterial compounds and 100 inactive compounds, respectively, pointing to widespread compound and protein promiscuity. We benchmark model performance by measuring enzymatic activity for 12 essential proteins treated with each antibacterial compound. We confirm extensive promiscuity, but find that the average area under the receiver operating characteristic curve (auROC) is 0.48, indicating weak model performance. We demonstrate that rescoring of docking poses using machine learning‐based approaches improves model performance, resulting in average auROCs as large as 0.63, and that ensembles of rescoring functions improve prediction accuracy and the ratio of true‐positive rate to false‐positive rate. This work indicates that advances in modeling protein‐ligand interactions, particularly using machine learning‐based approaches, are needed to better harness AlphaFold2 for drug discovery.
Synopsis
Assessing molecular docking simulations based on AlphaFold2‐predicted structures with high‐throughput measurements of protein‐ligand interactions reveals weak model performance. Machine learning‐based approaches improve performance and better harness AlphaFold2 for drug discovery.
AlphaFold2‐based molecular docking predictions for 296 Escherichia coli proteins, 218 active antibacterial compounds and 100 inactive compounds predict widespread promiscuity and similar distributions of binding affinities between active and inactive compounds.
Quantitative enzymatic inhibition assays for 12 essential E. coli proteins treated with each of the 218 antibacterial compounds confirm extensive promiscuity.
The enzymatic inhibition dataset reveals that the performance of the molecular docking model is weak.
Rescoring of docking poses using machine learning‐based scoring functions improves model performance.
Journal Article
Deep learning for computational biology
by Angermueller, Christof; Parts, Leopold; Stegle, Oliver
in Artificial intelligence, Biology, cellular imaging
2016
Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology.
Graphical Abstract
Deep learning, a class of modern machine learning methods, has become a go‐to approach for analysing large‐scale high‐dimensional data. This review discusses its applications in biology, focusing on regulatory genomics and cellular imaging, and gives guidelines for practitioners.
Journal Article
Multi‐Omics Factor Analysis—a framework for unsupervised integration of multi‐omics data sets
by Buettner, Florian; Huber, Wolfgang; Velten, Britta
in Antineoplastic Agents - therapeutic use, Axes (reference lines), Biological activity
2018
Multi‐omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi‐Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi‐omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy‐chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single‐cell multi‐omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
Synopsis
Multi‐Omics Factor Analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. MOFA is a broadly applicable approach for multi‐omics data integration.
The inferred latent factors represent the underlying principal axes of heterogeneity across the samples. Factors can be shared by multiple data modalities or can be data‐type specific.
The model flexibly handles missing values and different data types.
In an application to Chronic Lymphocytic Leukaemia, MOFA discovers a low dimensional space spanned by known clinical markers and underappreciated axes of variation such as oxidative stress.
In an application to multi‐omics profiles from single‐cells, MOFA recovers differentiation trajectories and identifies coordinated variation between the transcriptome and the epigenome.
Journal Article
SBML Level 3: an extensible format for the exchange and reuse of biological models
by Dharuri, Harish; ANSYS France SAS ; ANSYS Inc. (United States); Wrzodek, Finja
in Animals, Biological models (mathematics), Biology
2020
Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multi-scale models of whole cells and organs, and new data sources such as single-cell measurements and live imaging, has precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.
Journal Article
Updated benchmarking of variant effect predictors using deep mutational scanning
2023
The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top‐performing VEPs are unsupervised methods including EVE, DeepSequence and ESM‐1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
Synopsis
Common sources of bias in variant effect predictor benchmarking are assessed using data from deep mutational scanning experiments. ESM‐1v, EVE and DeepSequence are among the top performers on both functionally validated and clinically observed variants.
Deep mutational scanning datasets from 26 human proteins are used to benchmark 55 computational predictors of missense variant effect.
The top‐performing methods include several very recent predictors and are based mostly on unsupervised machine learning methodologies.
There is a strong correlation between predictor performance when benchmarked against deep mutational scanning data and clinical variants.
Journal Article
Integrated intra‐ and intercellular signaling knowledge for multicellular omics analysis
2021
Molecular knowledge of biological processes is a cornerstone in omics data analysis. Applied to single‐cell data, such analyses provide mechanistic insights into individual cells and their interactions. However, knowledge of intercellular communication is scarce, scattered across resources, and not linked to intracellular processes. To address this gap, we combined over 100 resources covering interactions and roles of proteins in inter‐ and intracellular signaling, as well as transcriptional and post‐transcriptional regulation. We added protein complex information and annotations on function, localization, and role in diseases for each protein. The resource is available for human, and via homology translation for mouse and rat. The data are accessible via OmniPath’s web service (https://omnipathdb.org/), a Cytoscape plug‐in, and packages in R/Bioconductor and Python, providing access options for computational and experimental scientists. We created workflows with tutorials to facilitate the analysis of cell–cell interactions and affected downstream intracellular signaling processes. OmniPath provides a single access point to knowledge spanning intra‐ and intercellular processes for data analysis, as we demonstrate in applications studying SARS‐CoV‐2 infection and ulcerative colitis.
SYNOPSIS
Over 100 resources are integrated into OmniPath, a comprehensive knowledge base of intra‐ and inter‐cellular signaling. Workflows are provided and illustrated in case studies analyzing omics data in SARS‐CoV‐2 infection and ulcerative colitis.
OmniPath includes 4,000,000 annotations for over 20,000 proteins.
A new framework defining transmitter and receiver roles generalizes the concepts of ligand and receptor.
Integrated analysis of intra‐ and intercellular signaling can be performed to study how cells affect each other in healthy and diseased conditions.
Software tools and workflows in R and Python facilitate the analysis of bulk and single‐cell omics data using tools such as CellPhoneDB, NicheNet and CARNIVAL.
Journal Article
Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations
2020
To deal with the huge number of novel protein‐coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top‐ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses.
Synopsis
Data from deep mutational scans is used to benchmark computational protein variant effect predictors using fully independent data. The performance of deep mutational scanning is also compared to computational predictors for identifying pathogenic variants.
DeepSequence is the method that correlates the best with deep mutational scanning data for human proteins.
Predictor performance depends heavily on the protein and fitness metric. For this reason, using results from multiple predictors is recommended. Other recommended predictors include SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL.
Deep mutational scanning is generally superior to variant effect predictors for distinguishing pathogenic from benign variants.
Journal Article