Catalogue Search | MBRL
212 result(s) for "Marioni, John C."
Stabilized mosaic single-cell data integration using unshared features
by Ghazanfar, Shila; Guibentif, Carolina; Marioni, John C.
in 631/114/2397, 631/114/2401, Agriculture
2024
Currently available single-cell omics technologies capture many unique features with different biological information content. Data integration aims to place cells, captured with different technologies, onto a common embedding to facilitate downstream analytical tasks. Current horizontal data integration techniques use a set of common features, thereby ignoring non-overlapping features and losing information. Here we introduce StabMap, a mosaic data integration technique that stabilizes mapping of single-cell data by exploiting the non-overlapping features. StabMap first infers a mosaic data topology based on shared features, then projects all cells onto supervised or unsupervised reference coordinates by traversing shortest paths along the topology. We show that StabMap performs well in various simulation contexts, facilitates ‘multi-hop’ mosaic data integration where some datasets do not share any features and enables the use of spatial gene expression features for mapping dissociated single-cell data onto a spatial transcriptomic reference.
Mosaic data integration is improved by capturing features that do not overlap between datasets.
Journal Article
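The two steps named in the StabMap abstract above (inferring a topology from shared features, then traversing shortest paths to a reference) can be illustrated with a small sketch. This is not the StabMap implementation: the dataset names, feature names and helper functions below are hypothetical, and the sketch only finds the chain of datasets a projection would pass through, not the projection itself.

```python
from collections import deque

def mosaic_topology(features):
    """Link two datasets whenever they share at least one feature."""
    names = list(features)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if features[a] & features[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest chain of datasets or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical datasets: "rna" and "atac" share no features directly.
features = {
    "rna": {"g1", "g2", "g3"},
    "multiome": {"g3", "p1", "p2"},
    "atac": {"p2", "p3"},
}
graph = mosaic_topology(features)
path = shortest_path(graph, "atac", "rna")
print(path)
```

In this toy example the "atac" dataset reaches the "rna" reference only through "multiome", which corresponds to the 'multi-hop' situation the abstract mentions.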
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors
2018
Differences in gene expression between individual cells of the same type are measured across batches and used to correct technical artifacts in single-cell RNA-sequencing data.
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
Journal Article
MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data
by Velten, Britta; Marioni, John C.; Arnol, Damien
in Animal Genetics and Genomics, Animals, Bioinformatics
2020
Technological advances have enabled the profiling of multiple molecular layers at single-cell resolution, assaying cells from multiple samples or conditions. Consequently, there is a growing need for computational strategies to analyze data from complex experimental designs that include multiple data modalities and multiple groups of samples. We present Multi-Omics Factor Analysis v2 (MOFA+), a statistical framework for the comprehensive and scalable integration of single-cell multi-modal data. MOFA+ reconstructs a low-dimensional representation of the data using computationally efficient variational inference and supports flexible sparsity constraints, allowing joint modelling of variation across multiple sample groups and data modalities.
Journal Article
Computational and analytical challenges in single-cell transcriptomics
by Teichmann, Sarah A.; Stegle, Oliver; Marioni, John C.
in 631/114/2415, 631/1647/2017, 631/1647/514/1949
2015
Key Points
Until recently, RNA profiling was limited to ensemble-based approaches, which average over bulk populations of cells. Technological advances in single-cell RNA sequencing (scRNA-seq) now enable the transcriptomes of large numbers of individual cells to be assayed in an unbiased manner.
To ensure that scRNA-seq data are fully exploited and interpreted correctly, it is important to apply appropriate computational and statistical approaches. Methods and principles previously developed for bulk RNA sequencing can be reused for this purpose; however, scRNA-seq data analysis poses several unique challenges that require new analytical strategies.
At the experimental design stage, unique molecular identifiers and quantitative standards such as spike-ins need to be considered to allow accurate normalization and quality control of the raw data.
Prior to using scRNA-seq data for biological discovery, it is important to consider both technical variability and confounding factors such as batch effects, the cell cycle or apoptosis. Computational methods that account for technical variation and remove confounding factors are beginning to emerge.
The processed and normalized scRNA-seq data provide unique analysis opportunities that allow novel biological discoveries to be made. These include identification and characterization of cell types and the study of their organization in space and/or time; inference of gene regulatory networks and their robustness across individual cells; and characterization of the stochastic component of transcription.
High-throughput RNA sequencing (RNA-seq) is a powerful method for transcriptome-wide analysis that has recently been applied to single cells. This Review discusses the analytical and computational challenges of processing and analysing single-cell RNA-seq data, paying special consideration to differences relative to the analysis of RNA-seq data generated from bulk cell populations and discussing how single-cell-specific biological insights can be obtained.
The development of high-throughput RNA sequencing (RNA-seq) at the single-cell level has already led to profound new discoveries in biology, ranging from the identification of novel cell types to the study of global patterns of stochastic gene expression. Alongside the technological breakthroughs that have facilitated the large-scale generation of single-cell transcriptomic data, it is important to consider the specific computational and analytical challenges that still have to be overcome. Although some tools for analysing RNA-seq data from bulk cell populations can be readily applied to single-cell RNA-seq data, many new computational strategies are required to fully exploit this data type and to enable a comprehensive yet detailed study of gene expression at the single-cell level.
Journal Article
Multi‐Omics Factor Analysis—a framework for unsupervised integration of multi‐omics data sets
by Buettner, Florian; Huber, Wolfgang; Velten, Britta
in Antineoplastic Agents - therapeutic use, Axes (reference lines), Biological activity
2018
Multi‐omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi‐Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi‐omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities and those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy‐chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single‐cell multi‐omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
Synopsis
Multi‐Omics Factor Analysis (MOFA) is a computational framework for unsupervised discovery of the principal axes of biological and technical variation when multiple omics assays are applied to the same samples. MOFA is a broadly applicable approach for multi‐omics data integration.
The inferred latent factors represent the underlying principal axes of heterogeneity across the samples. Factors can be shared by multiple data modalities or can be data‐type specific.
The model flexibly handles missing values and different data types.
In an application to Chronic Lymphocytic Leukaemia, MOFA discovers a low dimensional space spanned by known clinical markers and underappreciated axes of variation such as oxidative stress.
In an application to multi‐omics profiles from single‐cells, MOFA recovers differentiation trajectories and identifies coordinated variation between the transcriptome and the epigenome.
Journal Article
Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender
by Chaffin, Mark D.; Arduini, Alessandro; Babadi, Mehrtash
in 631/114/1305, 631/114/2397, 631/1647/794
2023
Droplet-based single-cell assays, including single-cell RNA sequencing (scRNA-seq), single-nucleus RNA sequencing (snRNA-seq) and cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq), generate considerable background noise counts, the hallmark of which is nonzero counts in cell-free droplets and off-target gene expression in unexpected cell types. Such systematic background noise can lead to batch effects and spurious differential gene expression results. Here we develop a deep generative model based on the phenomenology of noise generation in droplet-based assays. The proposed model accurately distinguishes cell-containing droplets from cell-free droplets, learns the background noise profile and provides noise-free quantification in an end-to-end fashion. We implement this approach in the scalable and robust open-source software package CellBender. Analysis of simulated data demonstrates that CellBender operates near the theoretically optimal denoising limit. Extensive evaluations using real datasets and experimental benchmarks highlight enhanced concordance between droplet-based single-cell data and established gene expression patterns, while the learned background noise profile provides evidence of degraded or uncaptured cell types.
Using a deep generative model, CellBender models and denoises droplet-based single-cell data and improves multiple downstream analyses.
Journal Article
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
by Richardson, Sylvia; Marioni, John C.; Vallejos, Catalina A.
in Algorithms, Animals, Bayes Theorem
2015
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.
Journal Article
EmptyDrops: distinguishing cells from empty droplets in droplet-based single-cell RNA sequencing data
by Dao, The Phuong; Lun, Aaron T. L.; Riesenfeld, Samantha
in Animal Genetics and Genomics, Bioinformatics, Biomarkers - metabolism
2019
Droplet-based single-cell RNA sequencing protocols have dramatically increased the throughput of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that EmptyDrops has greater power than existing approaches while controlling the false discovery rate among detected cells. Our method also retains distinct cell types that would have been discarded by existing methods in several real data sets.
Journal Article
Using single‐cell genomics to understand developmental processes and cell fate decisions
by Scialdone, Antonio; Griffiths, Jonathan A; Marioni, John C
in Biology, Cell Differentiation - genetics, Cell fate
2018
High‐throughput ‐omics techniques have revolutionised biology, allowing for thorough and unbiased characterisation of the molecular states of biological systems. However, cellular decision‐making is inherently a unicellular process to which “bulk” ‐omics techniques are poorly suited, as they capture ensemble averages of cell states. Recently developed single‐cell methods bridge this gap, allowing high‐throughput molecular surveys of individual cells. In this review, we cover core concepts of analysis of single‐cell gene expression data and highlight areas of developmental biology where single‐cell techniques have made important contributions. These include understanding of cell‐to‐cell heterogeneity, the tracing of differentiation pathways, quantification of gene expression from specific alleles, and the future directions of cell lineage tracing and spatial gene expression analysis.
Graphical Abstract
Single‐cell genomic techniques have advanced our understanding of several developmental processes. This Review summarises advances related to generating and analyzing single‐cell transcriptome data and discusses areas of developmental biology that benefited from such technologies.
Journal Article
Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis
2019
Male gametes are generated through a specialised differentiation pathway involving a series of developmental transitions that are poorly characterised at the molecular level. Here, we use droplet-based single-cell RNA-Sequencing to profile spermatogenesis in adult animals and at multiple stages during juvenile development. By exploiting the first wave of spermatogenesis, we both precisely stage germ cell development and enrich for rare somatic cell-types and spermatogonia. To capture the full complexity of spermatogenesis including cells that have low transcriptional activity, we apply a statistical tool that identifies previously uncharacterised populations of leptotene and zygotene spermatocytes. Focusing on post-meiotic events, we characterise the temporal dynamics of X chromosome re-activation and profile the associated chromatin state using CUT&RUN. This identifies a set of genes strongly repressed by H3K9me3 in spermatocytes, which then undergo extensive chromatin remodelling post-meiosis, thus acquiring an active chromatin state and spermatid-specific expression.
The transcriptional regulation of murine spermatogenesis is not well understood. Here, the authors use single-cell and bulk RNA-Sequencing of juvenile and adult mice to characterise somatic and germ cell development, and chromatin profile the X chromosome to show that spermatid-specific genes are repressed by H3K9me3 during meiosis.
Journal Article