220 result(s) for "Batch integration"
Unsupervised spatially embedded deep representation of spatial transcriptomics
Optimal integration of transcriptomics data and associated spatial information is essential for fully exploiting spatial transcriptomics to dissect tissue heterogeneity and map out inter-cellular communications. We present SEDR, which uses a deep autoencoder coupled with a masked self-supervised learning mechanism to construct a low-dimensional latent representation of gene expression, which is then simultaneously embedded with the corresponding spatial information through a variational graph autoencoder. SEDR achieved higher clustering performance on manually annotated 10x Visium datasets and better scalability on high-resolution spatial transcriptomics datasets than existing methods. Additionally, we show SEDR's ability to impute and denoise gene expression (https://github.com/JinmiaoChenLab/SEDR/).
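A minimal sketch of the general idea of embedding expression jointly with spatial neighborhoods, not SEDR itself: instead of a masked autoencoder plus a variational graph autoencoder, it uses PCA as a stand-in latent and one round of averaging over a spatial k-NN graph. All data, dimensions, and the 50/50 mixing weight are illustrative assumptions.

```python
# Sketch (not SEDR): smooth a low-dimensional expression latent over a
# k-nearest-neighbor graph built from spot coordinates.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(500, 2000)).astype(float)   # toy spots x genes
coords = rng.uniform(0, 100, size=(500, 2))                # toy spatial coordinates

# Low-dimensional latent of gene expression (stand-in for SEDR's autoencoder).
latent = PCA(n_components=20).fit_transform(np.log1p(expr))

# Spatial k-NN graph over spot coordinates.
k = 6
nn = NearestNeighbors(n_neighbors=k + 1).fit(coords)
_, idx = nn.kneighbors(coords)                             # idx[:, 0] is the spot itself

# One round of neighborhood averaging injects spatial context into the latent.
spatial_latent = latent[idx[:, 1:]].mean(axis=1)
embedding = 0.5 * latent + 0.5 * spatial_latent            # joint expression + spatial embedding
print(embedding.shape)                                     # (500, 20)
```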
Pattern Learning and Knowledge Distillation for Single-Cell Data Annotation
Transferring cell type annotations from a reference dataset to a query dataset is a fundamental problem in AI-based single-cell data analysis. However, single-cell measurement techniques introduce domain gaps between batches or datasets. Existing deep learning methods give little consideration to batch integration when learning reference annotations, which is a challenge for cell type annotation across multiple query batches. For cell representation, batch integration can not only eliminate the gaps between batches or datasets but also improve the heterogeneity of cell clusters. In this study, we propose PLKD, a cell type annotation method based on pattern learning and knowledge distillation. PLKD consists of a Teacher (Transformer) and a Student (MLP). The Teacher groups all input genes (features) into different gene sets (patterns), each representing a specific biological function. This design enables the model to focus on interactions between biologically relevant functions rather than gene-level expression, which is susceptible to batch gaps. In addition, knowledge distillation makes the lightweight Student resistant to noise, allowing it to infer quickly and robustly. Furthermore, PLKD supports multi-modal cell type annotation, multi-modal integration, and other tasks. Benchmark experiments demonstrate that PLKD achieves accurate and robust cell type annotation.
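A hedged sketch of the knowledge-distillation step only: a lightweight MLP "student" is trained to match a "teacher"'s soft class probabilities plus hard labels. The architectures, dimensions, temperature, and loss weighting are illustrative placeholders, not PLKD's pattern-learning Transformer or its actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_genes, n_types, temperature = 2000, 10, 2.0

student = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, n_types))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    # Soft targets from the teacher (KL divergence at temperature T) plus the
    # usual hard-label cross-entropy on annotated reference cells.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy batch: in practice the teacher logits would come from the trained Transformer.
x = torch.randn(64, n_genes)
teacher_logits = torch.randn(64, n_types)
labels = torch.randint(0, n_types, (64,))

loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
optimizer.step()
```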
scMC learns biological variation through the alignment of multiple single-cell genomics datasets
Distinguishing biological from technical variation is crucial when integrating and comparing single-cell genomics datasets across different experiments. Existing methods lack the ability to explicitly distinguish these two sources of variation, often leading to the removal of both. Here, we present scMC, an integration method that removes technical variation while preserving intrinsic biological variation. scMC learns biological variation via variance analysis to subtract technical variation inferred in an unsupervised manner. Application of scMC to both simulated and real datasets from single-cell RNA-seq and ATAC-seq experiments demonstrates its capability of detecting context-shared and context-specific biological signals via accurate alignment.
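For context only, a sketch of the simplest form of technical-variation removal, regressing a known batch covariate out of each gene (in the spirit of limma's removeBatchEffect). This is not scMC's algorithm, which infers the technical component in an unsupervised way; the function name and toy data are illustrative assumptions.

```python
import numpy as np

def regress_out_batch(expr, batch):
    """expr: cells x genes matrix; batch: integer batch label per cell."""
    design = np.eye(batch.max() + 1)[batch]          # one-hot batch design (cells x batches)
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    fitted = design @ coef                           # batch-specific gene means
    return expr - fitted + expr.mean(axis=0)         # re-center on overall gene means

rng = np.random.default_rng(1)
batch = rng.integers(0, 2, size=300)
expr = rng.normal(size=(300, 100)) + batch[:, None] * 2.0   # toy batch shift
corrected = regress_out_batch(expr, batch)
```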
A benchmark of batch-effect correction methods for single-cell RNA sequencing data
Background: Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal. Results: We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression. Conclusion: Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
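A hedged sketch of two of the metrics named above, computed with scikit-learn on toy inputs: ARI compares post-correction clusters to known cell types, and ASW (average silhouette width) scores separation in a corrected embedding. kBET and LISI require dedicated packages and are omitted; the toy arrays are assumptions.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(2)
embedding = rng.normal(size=(1000, 20))          # batch-corrected embedding (toy)
cell_type = rng.integers(0, 5, size=1000)        # ground-truth cell types (toy)
cluster = rng.integers(0, 5, size=1000)          # clusters found after correction (toy)

ari = adjusted_rand_score(cell_type, cluster)            # 1 = perfect agreement
asw_celltype = silhouette_score(embedding, cell_type)    # higher = better separation
print(f"ARI={ari:.3f}  cell-type ASW={asw_celltype:.3f}")
```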
Why Batch Effects Matter in Omics Data, and How to Avoid Them
Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead. Effectively dealing with batch effects will be the next frontier in large-scale biological data analysis, particularly when integrating different data sets. Because batch-effect correction can exaggerate cross-validation outcomes, cross-validation is increasingly regarded as a less authoritative form of evaluation. Batch effect-resistant methods will become important in the future, alongside existing batch effect-correction methods.
Alternative empirical Bayes models for adjusting for batch effects in genomic studies
Background: Combining genomic data sets from multiple studies is advantageous to increase statistical power in studies where logistical considerations restrict sample size or require the sequential generation of data. However, significant technical heterogeneity is commonly observed across multiple batches of data that are generated from different processing or reagent batches, experimenters, protocols, or profiling platforms. These so-called batch effects often confound true biological relationships in the data, reducing the power benefits of combining multiple batches, and may even lead to spurious results in some combined studies. Therefore, there is a significant need for effective methods and software tools that account for batch effects in high-throughput genomic studies. Results: Here we contribute multiple methods and software tools for improved combination and analysis of data from multiple batches. In particular, we provide batch effect solutions for cases where the severity of the batch effects is not extreme, and for cases where one high-quality batch can serve as a reference, such as the training set in a biomarker study. We illustrate our approaches and software in both simulated and real data scenarios. Conclusions: We demonstrate the value of these new contributions compared to currently established approaches in the specified batch correction situations.
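A hedged sketch of the location-scale core that empirical Bayes adjusters such as ComBat build on: per gene, align each batch's mean and variance to the pooled values. The paper's actual contributions (shrinkage variants, a reference-batch mode) are not reproduced here; the function and toy data are illustrative assumptions.

```python
import numpy as np

def location_scale_adjust(expr, batch, eps=1e-8):
    """expr: samples x genes; batch: integer batch label per sample."""
    adjusted = expr.astype(float).copy()
    grand_mean = expr.mean(axis=0)
    grand_std = expr.std(axis=0) + eps
    for b in np.unique(batch):
        rows = batch == b
        mu = expr[rows].mean(axis=0)
        sd = expr[rows].std(axis=0) + eps
        # Standardize within the batch, then map onto the pooled mean/variance.
        adjusted[rows] = (expr[rows] - mu) / sd * grand_std + grand_mean
    return adjusted

rng = np.random.default_rng(3)
batch = rng.integers(0, 3, size=90)
expr = rng.normal(size=(90, 50)) + batch[:, None]     # toy batch shifts
corrected = location_scale_adjust(expr, batch)
```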
Benchmarking clustering, alignment, and integration methods for spatial transcriptomics
Background: Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice, and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses, remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development. Results: In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets. Conclusions: Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guiding future method development.
SCITUNA: single-cell data integration tool using network alignment
Background: As single-cell genomics experiments increase in complexity and scale, the need to integrate multiple datasets has grown. Such integration enhances cellular feature identification by leveraging larger data volumes. However, batch effects (technical variations arising from differences in labs, times, or protocols) pose a significant challenge. Despite numerous proposed batch correction methods, many still have limitations, such as outputting only dimension-reduced data, relying on computationally intensive models, or overcorrecting batches with diverse cell type composition. Results: We introduce a novel method for batch effect correction named SCITUNA, a Single-Cell data Integration Tool Using Network Alignment. We perform evaluations on 39 individual batches from four real datasets and a simulated dataset, including both scRNA-seq and scATAC-seq data and spanning multiple organisms and tissues. A thorough comparison of existing batch correction methods using 13 metrics reveals that SCITUNA outperforms current approaches and is successful at preserving the biological signals present in the original data. In particular, SCITUNA performs better than the current methods in all comparisons except the multiple-batch integration of the lung dataset, where the difference is 0.004. Conclusion: SCITUNA effectively removes batch effects while retaining the biological signals present in the data. Our extensive experiments indicate that SCITUNA will be a valuable tool for diverse integration tasks.
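For orientation, a hedged sketch of the kind of cross-batch matching such methods rely on: finding mutual nearest neighbors between two batches to anchor corresponding cells. This is the classic MNN idea, not SCITUNA's network-alignment algorithm; the function name, k, and toy data are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mutual_nearest_pairs(batch_a, batch_b, k=10):
    """Return (i, j) pairs where cell i in A and cell j in B are mutual k-NN."""
    nn_ab = NearestNeighbors(n_neighbors=k).fit(batch_b)
    nn_ba = NearestNeighbors(n_neighbors=k).fit(batch_a)
    _, ab = nn_ab.kneighbors(batch_a)     # neighbors of A-cells among B-cells
    _, ba = nn_ba.kneighbors(batch_b)     # neighbors of B-cells among A-cells
    b_of_a = {i: set(row) for i, row in enumerate(ab)}
    a_of_b = {j: set(row) for j, row in enumerate(ba)}
    return [(i, j) for i, js in b_of_a.items() for j in js if i in a_of_b[j]]

rng = np.random.default_rng(4)
pairs = mutual_nearest_pairs(rng.normal(size=(200, 30)),
                             rng.normal(size=(250, 30)) + 0.5)
print(len(pairs), "mutual-neighbor anchors")
```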
Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method
Background: Batch effects are notoriously common technical variations in multiomics data and may result in misleading outcomes if uncorrected or over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations are not adequately assessed in terms of omics types, performance metrics, and application scenarios. Results: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples into their own donors. The ratio-based method, i.e., scaling absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than others, especially when batch effects are completely confounded with biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies. Conclusions: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
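A minimal sketch of the ratio idea described above: within each batch, express every study sample's feature values relative to the mean profile of reference-material samples measured in that same batch. The function name, the log2 transform, and the toy design are illustrative assumptions, not the Quartet pipeline itself.

```python
import numpy as np

def ratio_correct(values, batch, is_reference, eps=1e-8):
    """values: samples x features; batch: batch label; is_reference: bool mask."""
    corrected = np.empty_like(values, dtype=float)
    for b in np.unique(batch):
        rows = batch == b
        ref_mean = values[rows & is_reference].mean(axis=0)   # per-batch reference profile
        corrected[rows] = np.log2((values[rows] + eps) / (ref_mean + eps))
    return corrected

rng = np.random.default_rng(5)
batch = np.repeat([0, 1, 2], 40)
is_reference = np.tile(np.arange(40) < 4, 3)          # 4 reference samples per batch
values = rng.lognormal(mean=batch[:, None] * 0.3, sigma=0.5, size=(120, 200))
corrected = ratio_correct(values, batch, is_reference)
```

Because each sample is scaled against references from its own batch, batch-specific multiplicative shifts cancel in the ratio, which is why the approach remains usable even when batch and biology are confounded.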
scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data
scRNA-seq dataset integration occurs in different contexts, such as the identification of cell type-specific differences in gene expression across conditions or species, or batch effect correction. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping, or a complete set of cell labels, and estimate per-cell differences in gene expression across datasets. scAlign performance is state-of-the-art and robust to cross-dataset variation in cell type-specific expression and cell type composition. We demonstrate that scAlign reveals gene expression programs for rare populations of malaria parasites. Our framework is widely applicable to integration challenges in other domains.