Catalogue Search | MBRL
21 results for "Yan, Guanao"
RA3 is a reference-guided approach for epigenetic characterization of single cells
2021
Recent advances in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling of the epigenetic landscapes of thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, a high degree of sparsity and high technical variation, make computational analysis challenging. Reference-guided approaches, which use the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which draws on the information in massive existing bulk chromatin accessibility data and annotated scCAS data. RA3 simultaneously models (1) the biological variation shared between the scCAS data and the reference data, and (2) the biological variation unique to the scCAS data, which identifies distinct subpopulations. We show that RA3 achieves superior performance on several scCAS datasets and with references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data.
Methods for profiling differences between individual cells are constantly expanding. Here, the authors present a computational framework for analyzing chromatin accessibility data at the single-cell level that takes into account prior knowledge and data-specific characteristics.
Journal Article
Categorization of 34 computational methods to detect spatially variable genes from spatially resolved transcriptomics data
2025
In the analysis of spatially resolved transcriptomics data, detecting spatially variable genes (SVGs) is crucial. Numerous computational methods exist, but differing SVG definitions and methodologies lead to incomparable results. We review 34 state-of-the-art methods, classifying SVGs into three categories: overall, cell-type-specific and spatial-domain-marker SVGs. Our review explains the intuitions underlying these methods, summarizes their applications, and categorizes the hypothesis tests they use according to the trade-off between generality and specificity in SVG detection. We discuss challenges in SVG detection and propose future directions for improvement. Our review offers insights for method developers and users, advocating category-specific benchmarking.
In spatial transcriptomics data analysis, identifying spatially variable genes (SVGs) is crucial for understanding tissue organization and function. The authors categorize 34 computational methods for SVG detection, exploring their definitions, methodologies—including statistical approaches—and applications, while proposing future research directions.
Journal Article
scReadSim: a single-cell RNA-seq and ATAC-seq read simulator
2023
Benchmarking single-cell RNA-seq (scRNA-seq) and single-cell Assay for Transposase-Accessible Chromatin using sequencing (scATAC-seq) computational tools requires simulators that generate realistic sequencing reads. However, none of the few existing read simulators aims to mimic real data. To fill this gap, we introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that allows user-specified ground truths and generates synthetic sequencing reads (in a FASTQ or BAM file) by mimicking real data. At both the read-sequence and read-count levels, scReadSim mimics real scRNA-seq and scATAC-seq data. Moreover, scReadSim provides ground truths, including unique molecular identifier (UMI) counts for scRNA-seq and open chromatin regions for scATAC-seq. In particular, scReadSim allows users to design cell-type-specific ground-truth open chromatin regions for scATAC-seq data generation. In benchmark applications of scReadSim, we show that UMI-tools achieves the highest accuracy in scRNA-seq UMI deduplication, and that HMMRATAC and MACS3 achieve the best performance in scATAC-seq peak calling.
Benchmarking computational tools for the analysis of single-cell sequencing data requires the simulation of realistic sequencing reads. However, none of the few existing read simulators aims to mimic real data. Here, the authors introduce scReadSim, a single-cell RNA-seq and ATAC-seq read simulator that works by mimicking real data.
Journal Article
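The scReadSim records report UMI-tools as the most accurate tool for scRNA-seq UMI deduplication. The Python sketch below is not UMI-tools' algorithm: it shows only the simplest exact-match strategy (one molecule per distinct UMI within a cell and gene) and how the ground truth supplied by a simulator exposes its weakness, since a UMI carrying a single sequencing error is counted as an extra molecule. The read tuples, gene names, `count_umis`, `fraction_exact` and the exact-match accuracy measure are hypothetical illustrations.

```python
from collections import defaultdict

def count_umis(reads):
    """Naive exact-match deduplication: one molecule per distinct UMI.

    reads: iterable of (cell_barcode, umi, gene) tuples (hypothetical input layout).
    Returns {(cell_barcode, gene): molecule_count}.
    """
    umis = defaultdict(set)  # (cell, gene) -> distinct UMI sequences seen
    for cell, umi, gene in reads:
        umis[(cell, gene)].add(umi)
    return {key: len(seqs) for key, seqs in umis.items()}

def fraction_exact(estimated, truth):
    """Share of (cell, gene) pairs whose estimated count equals the ground truth."""
    keys = set(estimated) | set(truth)
    return sum(estimated.get(k, 0) == truth.get(k, 0) for k in keys) / len(keys)

reads = [
    ("AAAC", "GTCA", "Actb"),
    ("AAAC", "GTCA", "Actb"),   # PCR duplicate: same cell, UMI and gene
    ("AAAC", "GTCC", "Actb"),   # one-base sequencing error in the UMI above
    ("AAAC", "TTGA", "Actb"),
    ("AAAC", "GTCA", "Gapdh"),
    ("CCGT", "GTCA", "Actb"),
]
# Ground truth a simulator would know: the erroneous UMI is not a new molecule.
truth = {("AAAC", "Actb"): 2, ("AAAC", "Gapdh"): 1, ("CCGT", "Actb"): 1}

counts = count_umis(reads)
print(counts)                        # the UMI error inflates ("AAAC", "Actb") to 3
print(fraction_exact(counts, truth))
```

UMI-tools also provides error-aware grouping methods, such as `directional`, that merge UMIs differing by a single base; a ground-truth comparison like this one is what rewards that behavior.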
CellScope: high-performance cell atlas workflow with tree-structured representation
2025
Single-cell sequencing enables comprehensive profiling of individual cells, revealing cellular heterogeneity and function with unprecedented resolution. However, current analysis frameworks lack the ability to simultaneously explore and visualize cellular hierarchies at multiple biological levels. To address these limitations, we present CellScope, a promising framework for constructing high-resolution cell atlases at multiple clustering levels. CellScope employs a two-stage manifold fitting process for gene selection and noise reduction, followed by agglomerative clustering, and integrates UMAP visualization with hierarchical clustering to intuitively represent cellular relationships simultaneously at multiple levels—such as cell lineage, cell type, and cell subtype levels. Compared to established pipelines such as Seurat and Scanpy, CellScope comprehensively improves clustering performance, visualization clarity, computational efficiency, and algorithm interpretability, while reducing dependence on hyperparameters across a multitude of single-cell datasets. Most importantly, it can reveal biological insights that other contemporary methods are unable to detect, thereby deepening our understanding of cellular heterogeneity and function, and potentially informing disease research.
Li and colleagues present CellScope, a tree-structured framework that reveals multi-level cellular hierarchies and gene functions in single-cell data. This approach provides clear clustering, intuitive visualization, and deep biological insights into cell types and functions.
Journal Article
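The CellScope records describe agglomerative clustering combined with a tree representation that shows several levels at once (for example, cell lineage, cell type and cell subtype). The Python sketch below illustrates only the generic idea of reading nested labels off one agglomerative hierarchy at more than one cut; average linkage, Euclidean distance, cutting by cluster count, and the toy 2-D points standing in for a reduced embedding are all assumptions, and the sketch does not reproduce CellScope's manifold fitting, gene selection or tree construction. `agglomerative_levels` is a hypothetical name.

```python
import math
from itertools import combinations

def agglomerative_levels(points, levels):
    """Average-linkage agglomerative clustering on points, recording labels at each
    requested number of clusters (e.g. a coarse level and a finer level)."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    wanted = set(levels)
    snapshots = {}

    def avg_dist(a, b):
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    def current_labels():
        # Number clusters in order of their smallest member so labels are stable.
        labels = [0] * len(points)
        for label, members in enumerate(sorted(clusters.values(), key=min)):
            for m in members:
                labels[m] = label
        return labels

    if len(clusters) in wanted:
        snapshots[len(clusters)] = current_labels()
    while len(clusters) > 1:
        # Merge the closest pair of clusters under average linkage.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters.pop(b)
        if len(clusters) in wanted:
            snapshots[len(clusters)] = current_labels()
    return snapshots

# Toy embedding: two well-separated groups, each made of two tighter subgroups.
pts = [(0, 0), (0, 1), (5, 0), (5, 1), (20, 0), (20, 1), (25, 0), (25, 1)]
cuts = agglomerative_levels(pts, levels=(2, 4))
print(cuts[2])  # coarse level: [0, 0, 0, 0, 1, 1, 1, 1]
print(cuts[4])  # finer level, nested in the coarse one: [0, 0, 1, 1, 2, 2, 3, 3]
```

The sketch searches all cluster pairs at every merge to stay short; for real data, `scipy.cluster.hierarchy.linkage` followed by `fcluster` with `criterion="maxclust"` performs the same kind of multi-level cut efficiently.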
Systematic benchmarking of computational methods to identify spatially variable genes
by Pinello, Luca; Patel, Zain M.; Yasa, Sai Nirmayi
in Algorithms; Animal Genetics and Genomics; Benchmarking
2025
Background
Spatially resolved transcriptomics offers unprecedented insight by profiling gene expression within the intact spatial context of cells, adding a new and essential dimension to data interpretation. To detect spatial structures of interest efficiently, an essential step in analyzing such data is identifying spatially variable genes (SVGs). Although researchers have developed several computational methods for this task, the lack of a comprehensive benchmark evaluating their performance remains a considerable gap in the field.
Results
Here, we systematically evaluate 14 methods using 96 spatial datasets and 6 metrics. We compare the methods in terms of gene ranking and classification based on real spatial variation, statistical calibration and computational scalability, and investigate the impact of the identified SVGs on downstream applications such as spatial domain detection. Finally, we explore the applicability of the methods to spatial ATAC-seq data to examine their effectiveness in identifying spatially variable peaks (SVPs). Overall, SPARK-X outperforms the other benchmarked methods, and Moran’s I achieves competitive performance, making it a strong baseline for future method development. Moreover, our results reveal that most methods are poorly calibrated and that more specialized algorithms are needed to identify spatially variable peaks.
Conclusions
Our benchmarking provides a detailed comparison of SVG detection methods and serves as a reference for both users and method developers.
Journal Article
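Both SVG benchmarking records single out Moran's I as a strong baseline. For expression values x_1, ..., x_N at N spots with spatial weights w_ij, Moran's I is (N / W) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)², where W = Σ_i Σ_j w_ij. The minimal Python sketch below computes this statistic for one gene. Its binary weights (w_ij = 1 when two distinct spots lie within `radius`), the toy 4 × 4 grid and the name `morans_i` are assumptions; the benchmarked implementations may define weights differently, and the sketch computes no p-value.

```python
import math

def morans_i(values, coords, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 if spots i != j lie within `radius`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant expression: Moran's I is undefined")
    cross = 0.0    # sum_ij w_ij * (x_i - mean) * (x_j - mean)
    w_total = 0.0  # W = sum_ij w_ij
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                cross += dev[i] * dev[j]
                w_total += 1.0
    # I = (n / W) * cross / sum_i (x_i - mean)^2
    return (n / w_total) * cross / denom

# 4 x 4 grid of spots; radius 1 links each spot to its up/down/left/right neighbours.
coords = [(r, c) for r in range(4) for c in range(4)]
left_half = [1.0 if c < 2 else 0.0 for r, c in coords]  # spatially clustered pattern
checker = [float((r + c) % 2) for r, c in coords]      # alternating pattern
print(morans_i(left_half, coords))
print(morans_i(checker, coords))
```

On this grid the left/right split gives 2/3 (neighboring spots mostly resemble each other), while the checkerboard gives −1 because every neighboring pair has opposite deviations from the mean.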
scDesign3 generates realistic in silico data for multimodal single-cell and spatial omics
2024
We present a statistical simulator, scDesign3, to generate realistic single-cell and spatial omics data, including various cell states, experimental designs and feature modalities, by learning interpretable parameters from real data. Using a unified probabilistic model for single-cell and spatial omics data, scDesign3 infers biologically meaningful parameters; assesses the goodness-of-fit of inferred cell clusters, trajectories and spatial locations; and generates in silico negative and positive controls for benchmarking computational tools.
The challenge of simulating multiomic single-cell data is addressed by a probabilistic model.
Journal Article
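scDesign3 learns interpretable parameters from real data and then generates in silico data from a probabilistic model. The Python sketch below shows only that "fit, then simulate" idea for a single gene, in a drastically simplified form: a negative binomial (NB) fitted by the method of moments and sampled as a gamma-Poisson mixture. The NB choice, the moment-based fit, the toy counts and the function names are assumptions for illustration; scDesign3's unified model is far richer and additionally accounts for cell states, trajectories and spatial locations.

```python
import math
import random

def fit_nb_moments(counts):
    """Method-of-moments NB fit: returns (mean, size r), or (mean, None) for Poisson."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    # NB variance is mu + mu**2 / r, so r = mu**2 / (var - mu) when var > mu.
    # Without overdispersion, fall back to a Poisson model (r = None).
    r = mu * mu / (var - mu) if var > mu else None
    return mu, r

def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_gene(mu, r, n_cells, rng):
    """Draw counts for one gene from the fitted model (gamma-Poisson = NB)."""
    out = []
    for _ in range(n_cells):
        # Gamma(shape=r, scale=mu/r) has mean mu; used as a Poisson rate it yields NB(mu, r).
        lam = rng.gammavariate(r, mu / r) if r is not None else mu
        out.append(sample_poisson(lam, rng))
    return out

rng = random.Random(0)
observed = [0, 2, 1, 0, 7, 3, 0, 1, 9, 2]  # toy counts for one gene across 10 cells
mu, r = fit_nb_moments(observed)
synthetic = simulate_gene(mu, r, 1000, rng)
print(mu, round(r, 3))
print(sum(synthetic) / len(synthetic))  # should lie close to mu
```

Fixing the random seed makes the synthetic data reproducible, which matters when simulated data serve as negative or positive controls for benchmarking.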
Robust estimation and shrinkage in ultrahigh dimensional expectile regression with heavy tails and variance heterogeneity
2022
High-dimensional data subject to heavy-tailed phenomena and heterogeneity are commonly encountered in various scientific fields and pose new challenges to classical statistical methods. In this paper, we combine the asymmetric squared loss with a Huber-type robust technique to develop robust expectile regression for ultrahigh-dimensional, heavy-tailed, heterogeneous data. Unlike the classical Huber method, we introduce two different tuning parameters, one on each side, to account for possible asymmetry, and allow them to diverge to reduce the bias induced by the robust approximation. Within the regularized framework, we adopt a general folded concave penalty function, such as the SCAD or MCP penalty, for the sake of bias reduction. We investigate the finite-sample properties of the corresponding estimator and characterize how our method trades off estimation accuracy against the heavy-tailedness of the distribution. Based on our theoretical study, we also propose an efficient first-order optimization algorithm applied after a local linear approximation of the non-convex problem. Simulation studies under various distributions and a real data example demonstrate the satisfactory performance of our method in coefficient estimation, model selection and heterogeneity detection.
Journal Article
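The expectile regression abstract combines the asymmetric squared loss with a Huber-type robustification that has a separate tuning parameter on each side of zero. The Python sketch below illustrates that idea in its simplest intercept-only form, under assumed definitions: residuals u ≥ 0 receive weight τ and residuals u < 0 weight 1 − τ, and each side uses a Huber-type loss equal to u² for |u| ≤ α and 2α|u| − α² beyond, with thresholds α₊ and α₋ for the two sides. It omits covariates, the SCAD/MCP penalty and the local linear approximation, and it uses bisection instead of the paper's first-order algorithm; all function names are hypothetical.

```python
def huber_grad(u, alpha):
    """Derivative of the Huber-type loss: u**2 (|u| <= alpha), 2*alpha*|u| - alpha**2 otherwise."""
    return max(-2.0 * alpha, min(2.0 * alpha, 2.0 * u))

def asym_grad(u, tau, alpha_pos, alpha_neg):
    """Expectile-style weights tau / (1 - tau), with a separate threshold on each side."""
    if u >= 0:
        return tau * huber_grad(u, alpha_pos)
    return (1.0 - tau) * huber_grad(u, alpha_neg)

def robust_expectile(xs, tau, alpha_pos, alpha_neg, iters=200):
    """Intercept-only fit: minimise sum_i phi(x_i - theta) by bisection.

    g(theta) = sum_i phi'(x_i - theta) is non-increasing in theta, >= 0 at
    theta = min(xs) and <= 0 at theta = max(xs), so its root is bracketed.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(asym_grad(x - mid, tau, alpha_pos, alpha_neg) for x in xs)
        if g > 0:   # the loss still decreases as theta grows
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # one gross outlier
plain = robust_expectile(sample, tau=0.5, alpha_pos=1e9, alpha_neg=1e9)
robust = robust_expectile(sample, tau=0.5, alpha_pos=1.0, alpha_neg=1.0)
print(plain, robust)
```

With very large thresholds the loss reduces to the plain asymmetric squared loss, and with τ = 0.5 the fit is the sample mean (22 here), dragged upward by the outlier; with both thresholds set to 1 the fit stays at 3, inside the bulk of the data.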
Advancing Statistical Rigor in Single-Cell and Spatial Omics Analysis Through In Silico Control Data
2025
Over the past decade, single-cell and spatial transcriptomics technologies have transformed our ability to study cellular diversity and tissue organization. These advances have led to the rapid development of computational methods for analyzing high-dimensional omics data. However, benchmarking these methods and ensuring their statistical rigor remain challenging, largely because of the absence of realistic synthetic data with ground truths and the conceptual ambiguity in defining key biological features such as spatially variable genes (SVGs). This dissertation addresses these gaps through two simulation frameworks and a comprehensive review that improve the statistical rigor and interpretability of tool development and evaluation.
My first project introduces scReadSim, a simulator designed to generate realistic synthetic data for single-cell RNA sequencing (scRNA-seq) and single-cell chromatin accessibility profiling (scATAC-seq). It produces simulated sequencing reads in standard formats by mimicking the characteristics of real datasets, while allowing users to specify key ground truths, such as transcript abundance for scRNA-seq and cell-type-specific open chromatin regions for scATAC-seq. scReadSim supports flexible simulation settings, including varying cell numbers and sequencing depths, and enables systematic benchmarking of preprocessing tools. Using scReadSim, we show that UMI-tools achieves higher accuracy in transcript quantification for scRNA-seq, while HMMRATAC and MACS3 perform best in peak calling for scATAC-seq.
My second project presents scIsoSim, a simulator that generates single-cell RNA sequencing data with known isoform structures and corresponding expression levels. A single gene can give rise to multiple isoforms, that is, different versions of RNA transcripts, through a biological process called alternative splicing, in which segments of RNA are included or excluded in various combinations. scIsoSim supports widely used experimental protocols, including Smart-seq2 and the 10x Genomics 3’ and 5’ platforms, and captures realistic splicing patterns observed in real datasets. This tool enables systematic evaluation of computational methods for quantifying isoform expression and detecting alternative splicing events. Benchmarking results show that bulk RNA-seq tools such as Salmon perform accurately on Smart-seq2 data with high computational efficiency. In contrast, Scasa, the only existing tool for 10x 3’ data, shows limited accuracy because of data sparsity. Among splicing analysis tools, brie achieves better overall accuracy than outrigger but is less effective at detecting cell-specific splicing events.
My third project is a review of 34 state-of-the-art SVG detection methods for spatial transcriptomics data. The review introduces a new categorization framework that defines SVGs as overall, cell-type-specific or spatial-domain-marker genes, based on their spatial expression patterns and analytic objectives. It summarizes the underlying assumptions and statistical hypothesis tests used by each method and discusses trade-offs between power and specificity. The review also identifies limitations in existing benchmarks, such as inappropriate method comparisons and oversimplified simulation designs, and calls for category-specific benchmarking using well-annotated datasets and realistic simulators.
Dissertation
Benchmarking computational methods to identify spatially variable genes and peaks
by Pinello, Luca; Li, Zhijian; Li, Jingyi Jessica
in Bioinformatics; Computer applications; Gene expression
2023
Spatially resolved transcriptomics offers unprecedented insight by profiling gene expression within the intact spatial context of cells, adding a new and essential dimension to data interpretation. To detect spatial structures of interest efficiently, an essential step in analyzing such data is identifying spatially variable genes. Although researchers have developed several computational methods for this task, the lack of a comprehensive benchmark evaluating their performance remains a considerable gap in the field. Here, we present a systematic evaluation of 14 methods using 60 simulated datasets generated by four different simulation strategies, 12 real-world transcriptomics datasets and three spatial ATAC-seq datasets. We find that spatialDE2 consistently outperforms the other benchmarked methods, and that Moran's I achieves competitive performance across different experimental settings. Moreover, our results reveal that more specialized algorithms are needed to identify spatially variable peaks.
Journal Article
CellScope: High-Performance Cell Atlas Workflow with Tree-Structured Representation
2025
Single-cell sequencing enables comprehensive profiling of individual cells, revealing cellular heterogeneity and function with unprecedented resolution. However, current analysis frameworks lack the ability to simultaneously explore and visualize cellular hierarchies at multiple biological levels. To address these limitations, we present CellScope, an innovative framework for constructing high-resolution cell atlases at multiple clustering levels. CellScope employs a two-step manifold fitting process for gene selection and noise reduction, followed by agglomerative clustering, and uniquely integrates UMAP visualization with hierarchical clustering to intuitively represent cellular relationships simultaneously at multiple levels, such as the cell lineage, cell type and cell subtype levels. Compared with established pipelines such as Seurat and Scanpy, CellScope comprehensively improves clustering performance, visualization clarity, computational efficiency and algorithm interpretability, while reducing dependence on hyperparameters across a multitude of single-cell datasets. Most importantly, it can reveal new biological insights that other contemporary methods are unable to detect, thereby deepening our understanding of cellular heterogeneity and function and potentially informing disease research.
Competing Interest Statement: The authors have declared no competing interest.