Catalogue Search | MBRL

Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models

by Jordan, Michael I , Xu, Chenling , Lopez, Romain in Adaptability , annotation , Annotations

2021

As the number of single‐cell transcriptomics datasets grows, the natural next step is to integrate the accumulating data to achieve a common ontology of cell types and states. However, it is not straightforward to compare gene expression levels across datasets and to automatically assign cell type labels in a new dataset based on existing annotations. In this manuscript, we demonstrate that our previously developed method, scVI, provides an effective and fully probabilistic approach for joint representation and analysis of scRNA‐seq data, while accounting for uncertainty caused by biological and measurement noise. We also introduce single‐cell ANnotation using Variational Inference (scANVI), a semi‐supervised variant of scVI designed to leverage existing cell state annotations. We demonstrate that scVI and scANVI compare favorably to state‐of‐the‐art methods for data integration and cell state annotation in terms of accuracy, scalability, and adaptability to challenging settings. In contrast to existing methods, scVI and scANVI integrate multiple datasets with a single generative model that can be directly used for downstream tasks, such as differential expression. Both methods are easily accessible through scvi‐tools. SYNOPSIS This study demonstrates the ability of scVI to integrate single‐cell RNA‐seq datasets in a variety of settings and presents scANVI, a new development based on scVI for automated annotation of cell types and states. In scVI, datasets from different labs and technologies are integrated in a joint latent space. In scANVI, cell type annotations are transferred between datasets and across different scenarios. Uncertainties of differential gene expression in multiple samples are quantified. The performance of scVI and scANVI in data integration and cell state annotation is superior to other related methods. 

Journal Article

Predicting cellular responses to complex perturbations in high‐throughput screens

by Shendure, Jay , Günnemann, Stephan , Lopez‐Paz, David in Combinatorial analysis , Computational Biology , Datasets

2023

Recent advances in multiplexed single‐cell transcriptomics experiments facilitate the high‐throughput study of drug and genetic perturbations. However, an exhaustive exploration of the combinatorial perturbation space is experimentally unfeasible. Therefore, computational methods are needed to predict, interpret, and prioritize perturbations. Here, we present the compositional perturbation autoencoder (CPA), which combines the interpretability of linear models with the flexibility of deep‐learning approaches for single‐cell response modeling. CPA learns to in silico predict transcriptional perturbation response at the single‐cell level for unseen dosages, cell types, time points, and species. Using newly generated single‐cell drug combination data, we validate that CPA can predict unseen drug combinations while outperforming baseline models. Additionally, the architecture's modularity enables incorporating the chemical representation of the drugs, allowing the prediction of cellular response to completely unseen drugs. Furthermore, CPA is also applicable to genetic combinatorial screens. We demonstrate this by imputing in silico 5,329 missing combinations (97.6% of all possibilities) in a single‐cell Perturb‐seq experiment with diverse genetic interactions. We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single‐cell level and thus accelerate therapeutic applications using single‐cell technologies. Synopsis The compositional perturbation autoencoder (CPA) is a deep learning model for predicting the transcriptomic responses of single cells to single or combinatorial treatments from drugs and genetic manipulations. CPA can be trained on highly multiplexed, single‐cell experiments with thousands of conditions to predict unmeasured phenotypes (e.g., specific dose responses). 
It can generalize to predict responses to small molecules never seen in the training by adding priors on chemical space. Validations using a newly generated combinatorial drug perturbation dataset demonstrate the accuracy of CPA in predicting unseen drug combinations. CPA is also applicable to genetic combinatorial screens, as shown by imputing in silico 5,329 missing combinations in a single‐cell perturb‐seq experiment with diverse genetic interactions.

Journal Article

RNA velocity—current challenges and future perspectives

by Bergen, Volker , Soldatov, Ruslan A , Kharchenko, Peter V in Bone marrow , challenges , dynamics

2021

RNA velocity has enabled the recovery of directed dynamic information from single‐cell transcriptomics by connecting measurements to the underlying kinetics of gene expression. This approach has opened up new ways of studying cellular dynamics. Here, we review the current state of RNA velocity modeling approaches, discuss various examples illustrating limitations and potential pitfalls, and provide guidance on how the ensuing challenges may be addressed. We then outline future directions on how to generalize the concept of RNA velocity to a wider variety of biological systems and modalities. Graphical Abstract This Review discusses the emerging challenges and potential pitfalls of current RNA velocity modeling approaches and provides guidance on how to address them.

Journal Article

Benchmarking AlphaFold‐enabled molecular docking predictions for antibiotic discovery

by Zheng, Erica J , Manson, Abigail L , Krishnan, Aarti in Accuracy , AlphaFold2 , Anti-Bacterial Agents - pharmacology

2022

Efficient identification of drug mechanisms of action remains a challenge. Computational docking approaches have been widely used to predict drug binding targets; yet, such approaches depend on existing protein structures, and accurate structural predictions have only recently become available from AlphaFold2. Here, we combine AlphaFold2 with molecular docking simulations to predict protein‐ligand interactions between 296 proteins spanning Escherichia coli's essential proteome, and 218 active antibacterial compounds and 100 inactive compounds, respectively, pointing to widespread compound and protein promiscuity. We benchmark model performance by measuring enzymatic activity for 12 essential proteins treated with each antibacterial compound. We confirm extensive promiscuity, but find that the average area under the receiver operating characteristic curve (auROC) is 0.48, indicating weak model performance. We demonstrate that rescoring of docking poses using machine learning‐based approaches improves model performance, resulting in average auROCs as large as 0.63, and that ensembles of rescoring functions improve prediction accuracy and the ratio of true‐positive rate to false‐positive rate. This work indicates that advances in modeling protein‐ligand interactions, particularly using machine learning‐based approaches, are needed to better harness AlphaFold2 for drug discovery. Synopsis Assessing molecular docking simulations based on AlphaFold2‐predicted structures with high‐throughput measurements of protein‐ligand interactions reveals weak model performance. Machine learning‐based approaches improve performance and better harness AlphaFold2 for drug discovery. AlphaFold2‐based molecular docking predictions for 296 Escherichia coli proteins, 218 active antibacterial compounds and 100 inactive compounds predict widespread promiscuity and similar distributions of binding affinities between active and inactive compounds.
Quantitative enzymatic inhibition assays for 12 essential E. coli proteins treated with each of the 218 antibacterial compounds confirm extensive promiscuity. The enzymatic inhibition dataset reveals that the performance of the molecular docking model is weak. Rescoring of docking poses using machine learning‐based scoring functions improves model performance.

Journal Article

Deep learning for computational biology

by Angermueller, Christof , Parts, Leopold , Stegle, Oliver in Artificial intelligence , Biology , cellular imaging

2016

Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology. Graphical Abstract Deep learning, a class of modern machine learning methods, has become a go‐to approach for analysing large‐scale high‐dimensional data. This review discusses its applications in biology, focusing on regulatory genomics and cellular imaging, and gives guidelines for practitioners.

Journal Article

Multi‐Omics Factor Analysis—a framework for unsupervised integration of multi‐omics data sets

by Buettner, Florian , Huber, Wolfgang , Velten, Britta in Antineoplastic Agents - therapeutic use , Axes (reference lines) , Biological activity

2018

Multi‐omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi‐Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi‐omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy‐chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single‐cell multi‐omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation. Synopsis Multi‐Omics Factor Analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. MOFA is a broadly applicable approach for multi‐omics data integration. The inferred latent factors represent the underlying principal axes of heterogeneity across the samples. Factors can be shared by multiple data modalities or can be data‐type specific. The model flexibly handles missing values and different data types. 
In an application to Chronic Lymphocytic Leukaemia, MOFA discovers a low dimensional space spanned by known clinical markers and underappreciated axes of variation such as oxidative stress. In an application to multi‐omics profiles from single‐cells, MOFA recovers differentiation trajectories and identifies coordinated variation between the transcriptome and the epigenome.

Journal Article

SBML Level 3: an extensible format for the exchange and reuse of biological models

by Dharuri, Harish , ANSYS France SAS ; ANSYS Inc. (États-Unis) , Wrzodek, Finja in Animals , Biological models (mathematics) , Biology

2020

Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multi-scale models of whole cells and organs, and new data sources such as single-cell measurements and live imaging, has precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.

Journal Article

Updated benchmarking of variant effect predictors using deep mutational scanning

by Livesey, Benjamin J , Marsh, Joseph A in Amino acids , Benchmark , Benchmarking

2023

The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top‐performing VEPs are unsupervised methods including EVE, DeepSequence and ESM‐1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking. Synopsis Common sources of bias in variant effect predictor benchmarking are assessed using data from deep mutational scanning experiments. ESM‐1v, EVE and DeepSequence are among the top performers on both functionally validated and clinically observed variants. Deep mutational scanning datasets from 26 human proteins are used to benchmark 55 computational predictors of missense variant effect. The top‐performing methods include several very recent predictors and are based mostly on unsupervised machine learning methodologies. There is a strong correlation between predictor performance when benchmarked against deep mutational scanning data and clinical variants. 

Journal Article

Integrated intra‐ and intercellular signaling knowledge for multicellular omics analysis

by Ölbei, Márton , Módos, Dezső , Türei, Dénes in Animals , Annotations , Biological activity

2021

Molecular knowledge of biological processes is a cornerstone in omics data analysis. Applied to single‐cell data, such analyses provide mechanistic insights into individual cells and their interactions. However, knowledge of intercellular communication is scarce, scattered across resources, and not linked to intracellular processes. To address this gap, we combined over 100 resources covering interactions and roles of proteins in inter‐ and intracellular signaling, as well as transcriptional and post‐transcriptional regulation. We added protein complex information and annotations on function, localization, and role in diseases for each protein. The resource is available for human, and via homology translation for mouse and rat. The data are accessible via OmniPath’s web service (https://omnipathdb.org/), a Cytoscape plug‐in, and packages in R/Bioconductor and Python, providing access options for computational and experimental scientists. We created workflows with tutorials to facilitate the analysis of cell–cell interactions and affected downstream intracellular signaling processes. OmniPath provides a single access point to knowledge spanning intra‐ and intercellular processes for data analysis, as we demonstrate in applications studying SARS‐CoV‐2 infection and ulcerative colitis. SYNOPSIS Over 100 resources are integrated into OmniPath, a comprehensive knowledge base of intra‐ and inter‐cellular signaling. Workflows are provided and illustrated in case studies analyzing omics data in SARS‐CoV‐2 infection and ulcerative colitis. OmniPath includes 4,000,000 annotations for over 20,000 proteins. A new framework defining transmitter and receiver roles generalizes the concepts of ligand and receptor. Integrated analysis of intra‐ and intercellular signaling can be performed to study how cells affect each other in healthy and diseased conditions.
Software tools and workflows in R and Python facilitate the analysis of bulk and single‐cell omics data using tools such as CellPhoneDB, NicheNet and CARNIVAL.

Journal Article

Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations

by Livesey, Benjamin J , Marsh, Joseph A in Amino acids , Benchmarks , Computer applications

2020

To deal with the huge number of novel protein‐coding variants identified by genome and exome sequencing studies, many computational variant effect predictors (VEPs) have been developed. Such predictors are often trained and evaluated using different variant data sets, making a direct comparison between VEPs difficult. In this study, we use 31 previously published deep mutational scanning (DMS) experiments, which provide quantitative, independent phenotypic measurements for large numbers of single amino acid substitutions, in order to benchmark and compare 46 different VEPs. We also evaluate the ability of DMS measurements and VEPs to discriminate between pathogenic and benign missense variants. We find that DMS experiments tend to be superior to the top‐ranking predictors, demonstrating the tremendous potential of DMS for identifying novel human disease mutations. Among the VEPs, DeepSequence clearly stood out, showing both the strongest correlations with DMS data and having the best ability to predict pathogenic mutations, which is especially remarkable given that it is an unsupervised method. We further recommend SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL based upon their performance in these analyses. Synopsis Data from deep mutational scans is used to benchmark computational protein variant effect predictors using fully independent data. The performance of deep mutational scanning is also compared to computational predictors for identifying pathogenic variants. DeepSequence is the method that correlates the best with deep mutational scanning data for human proteins. Predictor performance depends heavily on the protein and fitness metric. For this reason, using results from multiple predictors is recommended. Other recommended predictors include SNAP2, DEOGEN2, SNPs&GO, SuSPect and REVEL. Deep mutational scanning is generally superior to variant effect predictors for distinguishing pathogenic from benign variants. 

Journal Article
