Catalogue Search | MBRL
1,615 result(s) for "batch effect"
Batch effect detection and correction in RNA-seq data using machine-learning-based automated assessment of quality
by Fontaine, Jean-Fred; Andrade-Navarro, Miguel A.; Sprang, Maximilian
in Algorithms; Automation; Batch effect
2022
Background
The constant evolution and development of next-generation sequencing techniques leads to high-throughput data composed of datasets that include large numbers of biological samples. Although large sample sets are usually processed experimentally in batches, scientific publications are often elusive about this information, which can greatly impact the quality of the samples and confound further statistical analyses. Because dedicated bioinformatics methods developed to detect unwanted sources of variance in the data can wrongly detect real biological signals, such methods could benefit from a quality-aware approach.
Results
We recently developed statistical guidelines and a machine learning tool to automatically evaluate the quality of a next-generation-sequencing sample. We leveraged this quality assessment to detect and correct batch effects in 12 publicly available RNA-seq datasets with available batch information. We were able to distinguish batches by our quality score and used it to correct for some batch effects in sample clustering. Overall, the correction was evaluated as comparable to or better than the reference method that uses a priori knowledge of the batches (in 10 and 1 datasets of 12, respectively; total = 92%). When coupled to outlier removal, the correction was more often evaluated as better than the reference (comparable or better in 5 and 6 datasets of 12, respectively; total = 92%).
Conclusions
In this work, we show the capability of our software to detect batches in public RNA-seq datasets from differences in the predicted quality of their samples. We also use these insights to correct the batch effect and observe the relation between sample quality and batch effects. These observations reinforce our expectation that while batch effects do correlate with differences in quality, batch effects also arise from other artifacts and are more suitably corrected statistically in well-designed experiments.
Journal Article
Propensity scores as a novel method to guide sample allocation and minimize batch effects during the design of high throughput experiments
by Vigers, Tim; Litkowski, Elizabeth; Vanderlinden, Lauren A.
in Algorithms; Batch effect adjustment; Batch effects
2023
Background
We developed a novel approach to minimize batch effects when assigning samples to batches. Our algorithm selects a batch allocation, among all possible ways of assigning samples to batches, that minimizes differences in average propensity score between batches. This strategy was compared to randomization and stratified randomization in a case–control study (30 per group) with a covariate (case vs control, represented as β1, set to be null) and two biologically relevant confounding variables (age, represented as β2, and hemoglobin A1c (HbA1c), represented as β3). Gene expression values were obtained from a publicly available dataset of expression data obtained from pancreas islet cells. Batch effects were simulated as twice the median biological variation across the gene expression dataset and were added to the publicly available dataset to simulate a batch effect condition. Bias was calculated as the absolute difference between observed betas under the batch allocation strategies and the true beta (no batch effects). Bias was also evaluated after adjustment for batch effects using ComBat as well as a linear regression model. In order to understand performance of our optimal allocation strategy under the alternative hypothesis, we also evaluated bias at a single gene associated with both age and HbA1c levels in the ‘true’ dataset (CAPN13 gene).
Results
Pre-batch correction, under the null hypothesis (β1), maximum absolute bias and root mean square (RMS) of maximum absolute bias, were minimized using the optimal allocation strategy. Under the alternative hypothesis (β2 and β3 for the CAPN13 gene), maximum absolute bias and RMS of maximum absolute bias were also consistently lower using the optimal allocation strategy. ComBat and the regression batch adjustment methods performed well as the bias estimates moved towards the true values in all conditions under both the null and alternative hypotheses. Although the differences between methods were less pronounced following batch correction, estimates of bias (average and RMS) were consistently lower using the optimal allocation strategy under both the null and alternative hypotheses.
Conclusions
Our algorithm provides an extremely flexible and effective method for assigning samples to batches by exploiting knowledge of covariates prior to sample allocation.
Journal Article
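The core allocation idea in the abstract above — among all possible ways of assigning samples to batches, pick the one that minimizes the difference in average propensity score between batches — can be sketched with a brute-force two-batch search. This is an illustrative toy, not the authors' implementation; the function name and interface are assumptions.

```python
from itertools import combinations

def best_two_batch_split(propensity):
    """Split samples (given per-sample propensity scores, even count)
    into two equal batches minimizing the gap in mean propensity score.
    Brute-force sketch: enumerates every possible half-sized batch."""
    n = len(propensity)
    idx = range(n)
    best, best_gap = None, float("inf")
    for batch1 in combinations(idx, n // 2):
        batch2 = [i for i in idx if i not in batch1]
        m1 = sum(propensity[i] for i in batch1) / len(batch1)
        m2 = sum(propensity[i] for i in batch2) / len(batch2)
        gap = abs(m1 - m2)
        if gap < best_gap:
            best, best_gap = (list(batch1), batch2), gap
    return best, best_gap
```

Enumeration is exponential in the number of samples, which is why it only illustrates the objective; a practical tool would search or optimize over allocations rather than enumerate them all.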
A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data
by Shi, T; Shi, W; Tillinghast, G
in 631/1647/2017/2079; Algorithms; Biomedical and Life Sciences
2010
Batch effects are the systematic non-biological differences between batches (groups) of samples in microarray experiments due to various causes, such as differences in sample preparation and hybridization protocols. Previous work focused mainly on the development of methods for effective batch-effect removal. However, their impact on cross-batch prediction performance, one of the most important goals in microarray-based applications, has not been addressed. This paper uses a broad selection of datasets from the Microarray Quality Control Phase II (MAQC-II) effort, generated on three microarray platforms with different causes of batch effects, to assess the efficacy of their removal. Two datasets from cross-tissue and cross-platform experiments are also included. Of the 120 cases studied, using support vector machine (SVM) and k-nearest neighbor (KNN) classifiers and the Matthews correlation coefficient (MCC) as the performance metric, we find that the Ratio-G, Ratio-A, EJLR, mean-centering, and standardization methods perform better than or equivalently to no batch-effect removal in 89%, 85%, 83%, 79%, and 75% of the cases, respectively, suggesting that the application of these methods is generally advisable and ratio-based methods are preferred.
Journal Article
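One of the simpler methods compared above, per-batch mean-centering, can be sketched in a few lines of NumPy. This is a hedged toy illustration of the general technique (subtract each gene's within-batch mean so all batches share a zero mean), not the paper's code; array and argument names are assumptions.

```python
import numpy as np

def mean_center_batches(expr, batches):
    """expr: genes x samples array; batches: per-sample batch labels.
    Returns a copy where each gene is mean-centered within each batch."""
    corrected = expr.astype(float).copy()
    for b in set(batches):
        cols = [i for i, lab in enumerate(batches) if lab == b]
        # subtract the per-gene mean computed over this batch's samples
        corrected[:, cols] -= corrected[:, cols].mean(axis=1, keepdims=True)
    return corrected
```

Mean-centering removes additive location shifts between batches but not scale differences; that is what the standardization and ratio-based variants in the comparison above additionally address.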
The BAMBOO method for correcting batch effects in high throughput proximity extension assays for proteomic studies
2025
The proximity extension assay (PEA) enables large-scale proteomic investigations across numerous proteins and samples. However, discrepancies between measurements, known as batch effects, can skew downstream statistical analyses and increase the risk of false discoveries. While implementing bridging controls (BCs) on each plate has been proposed to mitigate these effects, a clear method for utilizing this strategy remains elusive. Here, we characterized batch effects in PEA proteomics and identified three types: protein-specific, sample-specific, and plate-wide. We developed a robust regression-based method called BAMBOO (Batch Adjustments using Bridging cOntrOls) to correct them. Simulations comparing BAMBOO with established correction techniques (median centering, median of the difference (MOD), and ComBat) revealed that median centering and ComBat were significantly impacted by outliers within the BCs, whereas BAMBOO and MOD were more robust when no plate-wide effects were introduced. Optimal batch correction was achieved with 10–12 BCs. We validated the simulation results using experimental data and found that BAMBOO and MOD had a reduced incidence of false discoveries compared to alternative methods. Our findings emphasize the prevalence of batch effects in PEA proteomic studies and advocate for BAMBOO as a robust and effective tool to enhance the reliability of large-scale analyses in the proteomic field.
Journal Article
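The median-of-the-difference (MOD) comparator mentioned above is simple enough to sketch: shift each plate by the median difference between its bridging-control measurements and those of a reference plate. This toy stand-in illustrates the bridging-control idea only; it is not BAMBOO (which is regression-based), and all names here are assumptions.

```python
import statistics

def mod_correct(plate_values, bc_values, ref_bc_values):
    """Shift a plate's measurements by the median of the paired
    differences between its bridging controls and a reference plate's
    bridging controls (same controls, same order on both plates)."""
    shift = statistics.median(
        b - r for b, r in zip(bc_values, ref_bc_values)
    )
    return [v - shift for v in plate_values]
```

Using the median rather than the mean of the paired differences is what gives this family of corrections its robustness to a few outlying bridging-control measurements.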
Extent, impact, and mitigation of batch effects in tumor biomarker studies using tissue microarrays
by Mucci, Lorelei A; Penney, Kathryn L; Stopsack, Konrad H
in Automation; batch effect; batchtma
2021
Tissue microarrays (TMAs) have been used in thousands of cancer biomarker studies. To what extent batch effects, i.e., measurement error in biomarker levels between slides, affect TMA-based studies has not been assessed systematically. We evaluated 20 protein biomarkers on 14 TMAs with prospectively collected tumor tissue from 1448 primary prostate cancers. In half of the biomarkers, more than 10% of biomarker variance was attributable to between-TMA differences (range, 1–48%). We implemented different methods to mitigate batch effects (R package batchtma), tested in plasmode simulation. Biomarker levels were more similar between mitigation approaches compared to uncorrected values. For some biomarkers, associations with clinical features changed substantially after addressing batch effects. Batch effects and the resulting bias are not an error of an individual study but an inherent feature of TMA-based protein biomarker studies. They always need to be considered during study design and addressed analytically in studies using more than one TMA. To understand cancer, researchers need to know which molecules tumor cells use. These so-called ‘biomarkers’ tag cancer cells as being different from healthy cells, and can be used to predict how aggressive a tumor may be, or how well it might respond to treatment. A popular technique for assessing biomarkers across multiple tumors is to use tissue microarrays. This involves taking samples from different tumors and embedding them in a block of wax, which is then cut into micro-thin slices and stained with reagents that can detect specific biomarkers, such as proteins. Each block contains hundreds of samples, which all experience the same conditions. So, any patterns detected in the staining are likely to represent real variations in the biomarkers present.
Many cancer studies, however, often compare samples from multiple tissue microarrays, which may increase the risk of technical artifacts: for example, staining may look stronger in one batch of tissue samples than another, even though the amount of biomarker present in these different arrays is roughly the same. These ‘batch effects’ could potentially bias the results of the experiment and lead to the identification of misleading patterns. To evaluate how batch effects impact tissue microarray studies, Stopsack et al. examined 14 wax blocks which contained tumor samples from 1,448 men with prostate cancer. This revealed that for some biomarkers, but not others, there were noticeable differences between tissue microarrays that were clearly the result of batch effects. Stopsack et al. then tested six different ways of fixing these discrepancies using statistical methods. All six approaches were successful, even if the arrays included tumors with different characteristics, such as tumors that had been diagnosed more or less recently. This work highlights the importance of considering batch effects when using tissue microarrays to study cancer. Stopsack et al. have used their statistical approaches to develop freely available software which can reduce the biases that sometimes arise from these technical artifacts. This could help researchers avoid misleading patterns in their data and make it easier to detect real variations in the biomarkers present between tumor samples.
Journal Article
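The "more than 10% of biomarker variance attributable to between-TMA differences" quantity above is a between-group variance fraction. A toy sketch, assuming a simple decomposition of between-batch over total (population) variance; this is illustrative only, not the batchtma package's estimator:

```python
import statistics

def between_batch_variance_fraction(values_by_batch):
    """values_by_batch: list of lists, one inner list of biomarker
    values per batch (TMA). Returns between-batch variance divided
    by total population variance of the pooled values."""
    all_vals = [v for batch in values_by_batch for v in batch]
    grand_mean = statistics.fmean(all_vals)
    n = len(all_vals)
    # between-batch component: squared deviation of batch means,
    # weighted by batch size
    between = sum(
        len(batch) * (statistics.fmean(batch) - grand_mean) ** 2
        for batch in values_by_batch
    ) / n
    total = statistics.pvariance(all_vals)
    return between / total
```

A fraction near 0 means batches barely differ in their means; a fraction approaching 1 means nearly all observed variation is between batches rather than between samples within a batch.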
A benchmark of batch-effect correction methods for single-cell RNA sequencing data
by Ang, Kok Siong; Goh, Michelle; Zhang, Xiaomeng
in Algorithms; Animal Genetics and Genomics; Animals
2020
Background
Large-scale single-cell transcriptomic datasets generated using different technologies contain batch-specific systematic variations that present a challenge to batch-effect removal and data integration. With continued growth expected in scRNA-seq data, achieving effective batch integration with available computational resources is crucial. Here, we perform an in-depth benchmark study on available batch correction methods to determine the most suitable method for batch-effect removal.
Results
We compare 14 methods in terms of computational runtime, the ability to handle large datasets, and batch-effect correction efficacy while preserving cell type purity. Five scenarios are designed for the study: identical cell types with different technologies, non-identical cell types, multiple batches, big data, and simulated data. Performance is evaluated using four benchmarking metrics: kBET, LISI, ASW, and ARI. We also investigate the use of batch-corrected data to study differential gene expression.
Conclusion
Based on our results, Harmony, LIGER, and Seurat 3 are the recommended methods for batch integration. Due to its significantly shorter runtime, Harmony is recommended as the first method to try, with the other methods as viable alternatives.
Journal Article
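Of the four metrics named above, average silhouette width (ASW) is the easiest to illustrate: it is the mean silhouette coefficient over cells, here computed with scikit-learn on a tiny made-up 2-D embedding. The embedding and labels are toy assumptions, not data from the benchmark.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Toy 2-D embedding of four cells: two per batch, batches far apart.
embedding = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
batch_labels = [0, 0, 1, 1]

# ASW over batch labels: values near 1 mean batches form well-separated
# clusters (a strong residual batch effect in this toy embedding);
# values near 0 mean batches are well mixed.
asw = silhouette_score(embedding, batch_labels)
```

Benchmarks typically report ASW both over batch labels (lower is better after correction) and over cell-type labels (higher is better), which captures the trade-off between removing batch structure and preserving biology.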
RaMBat: Accurate identification of medulloblastoma subtypes from diverse data sources with severe batch effects
2026
As the most common pediatric brain malignancy, medulloblastoma (MB) includes multiple distinct molecular subtypes characterized by clinical heterogeneity and genetic alterations. Accurate identification of MB subtypes is essential for downstream risk stratification and tailored therapeutic design. Existing MB subtyping approaches perform poorly due to limited cohorts and severe batch effects when integrating various MB data sources. To address these concerns, we propose a novel approach called RaMBat for accurate MB subtyping from diverse data sources with severe batch effects. Benchmarking tests based on 13 datasets with severe batch effects suggested that RaMBat achieved a median accuracy of 99%, significantly outperforming state-of-the-art MB subtyping approaches and conventional machine learning classifiers. RaMBat efficiently dealt with batch effects and clearly separated subtypes of MB samples from diverse data sources. We believe RaMBat will have a direct positive impact on downstream MB risk stratification and tailored treatment design.
Journal Article
Why Batch Effects Matter in Omics Data, and How to Avoid Them
2017
Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.
Effectively dealing with batch effects will be the next frontier in large-scale biological data analysis, particularly involving the integration of different data sets.
Because batch-effect correction can exaggerate cross-validation outcomes, cross-validation is increasingly considered a less authoritative form of evaluation.
Batch effect-resistant methods will become important in the future, alongside existing batch effect-correction methods.
Journal Article
MetaboAnalystR 3.0: Toward an Optimized Workflow for Global Metabolomics
by Chong, Jasmine; Pang, Zhiqiang; Xia, Jianguo
in batch effects; global metabolomics; pathway activity prediction
2020
Liquid chromatography coupled to high-resolution mass spectrometry platforms is increasingly employed to comprehensively measure metabolome changes in systems biology and complex diseases. Over the past decade, several powerful computational pipelines have been developed for spectral processing, annotation, and analysis. However, significant obstacles remain with regard to parameter settings, computational efficiency, batch effects, and functional interpretation. Here, we introduce MetaboAnalystR 3.0, a significantly improved pipeline with three key new features: (1) efficient parameter optimization for peak picking; (2) automated batch effect correction; and (3) more accurate pathway activity prediction. Our benchmark studies showed that this workflow was 20–100× faster than other well-established workflows and produced more biologically meaningful results. In summary, MetaboAnalystR 3.0 offers an efficient pipeline to support high-throughput global metabolomics in the open-source R environment.
Journal Article
Population structure discovery in meta-analyzed microbial communities and inflammatory bowel disease using MMUPHin
by Nguyen, Long H.; Mallick, Himel; Schirmer, Melanie
in Acinetobacter; Animal Genetics and Genomics; Batch effect
2022
2022
Microbiome studies of inflammatory bowel diseases (IBD) have achieved a scale for meta-analysis of dysbioses among populations. To enable microbial community meta-analyses generally, we develop MMUPHin for normalization, statistical meta-analysis, and population structure discovery using microbial taxonomic and functional profiles. Applying it to ten IBD cohorts, we identify consistent associations, including novel taxa such as Acinetobacter and Turicibacter, and additional exposure and interaction effects. A single gradient of dysbiosis severity is favored over discrete types to summarize IBD microbiome population structure. These results provide a benchmark for characterization of IBD and a framework for meta-analysis of any microbial communities.
Journal Article