Search Results

12 results for "Non-detects"
Improved batch correction in untargeted MS-based metabolomics
Introduction: Batch effects in large untargeted metabolomics experiments are almost unavoidable, especially when sensitive detection techniques like mass spectrometry (MS) are employed. To obtain peak intensities that are comparable across all batches, corrections need to be performed. Since non-detects, i.e., signals with an intensity too low to be detected with certainty, are common in metabolomics studies, batch correction methods need to take these into account.
Objectives: This paper compares several batch correction methods and investigates the effect of different strategies for handling non-detects.
Methods: Batch correction methods usually consist of regression models, possibly also accounting for trends within batches. To fit these models, quality control samples (QCs), injected at regular intervals, can be used. Study samples can also be used, provided that the injection order is properly randomized. Normalization methods, which use no information on batch labels or injection order, can correct for batch effects as well. Introducing two easy-to-use quality criteria, we assess the merits of these batch correction strategies using three large LC–MS and GC–MS data sets of samples from Arabidopsis thaliana.
Results: The three data sets have very different characteristics, leading to clearly distinct behaviour of the batch correction strategies studied. Explicit inclusion of information on batch and injection order in general leads to very good corrections; when enough QCs are available, general normalization approaches also perform well. Several approaches are shown to be able to handle non-detects; replacing them with very small numbers such as zero seems the worst of the approaches considered.
Conclusion: The use of quality control samples for batch correction leads to good results when enough QCs are available. If an experiment is properly set up, batch correction using the study samples usually leads to a similarly high-quality correction, but has the advantage that more metabolites are corrected. The strategy for handling non-detects is important: choosing small values like zero can lead to suboptimal batch corrections.
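To make the QC-based strategy concrete, the per-batch regression correction the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it assumes a linear intensity drift over injection order within each batch and corrects one metabolite at a time.

```python
import numpy as np

def qc_batch_correct(intensity, batch, order, is_qc):
    """Per-batch, QC-based drift correction for a single metabolite.

    intensity : raw peak intensities (1-D array)
    batch     : batch label per injection
    order     : injection order per injection
    is_qc     : boolean mask marking the quality control samples
    """
    intensity = np.asarray(intensity, dtype=float)
    corrected = intensity.copy()
    grand_qc_level = intensity[is_qc].mean()
    for b in np.unique(batch):
        in_b = batch == b
        qc_b = in_b & is_qc
        # Fit the within-batch intensity trend on the QCs only;
        # non-detects should be excluded from this fit, not set to zero.
        slope, intercept = np.polyfit(order[qc_b], intensity[qc_b], deg=1)
        trend = slope * order[in_b] + intercept
        # Divide out the trend and rescale to the overall QC level.
        corrected[in_b] = intensity[in_b] / trend * grand_qc_level
    return corrected
```

The study-sample variant the abstract mentions would fit the trend on the study samples themselves instead of the QCs, which is valid when the injection order is properly randomized.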
Management of left‐censored data in dietary exposure assessment of chemical substances
Within the general framework of chemical risk assessment, a difficult step in dietary exposure assessment is the handling of concentration data reported to be below the limit of detection (LOD). These data are known as non‐detects, and the resulting distribution of occurrence values is left‐censored. Handling left‐censored data represents a challenge for EFSA's collection and statistical analysis of chemical occurrence data. EFSA has so far treated left‐censored data with widely used substitution methods recommended by international organisations. The appropriateness of this approach has a natural limitation in the computation of percentiles and in the application of statistical techniques. An EFSA working group was established to estimate the accuracy of the methods currently used and to propose recommendations for more advanced alternative statistical approaches. Based on a simulation study and on analyses of real data, an ad hoc evaluation was carried out to assess the performance of different statistical methods for handling non‐detects, i.e. parametric maximum likelihood (ML) models, the log‐probit regression method and the non‐parametric Kaplan‐Meier (KM) method. Results showed that the number of samples had a relatively limited impact on the accuracy and precision of estimates, but the degree of censoring had a large effect. When analysing a complex set of data, it was also shown to be essential to identify possible sources of heterogeneity in a dataset, such as country of sample collection/origin, food group, laboratory, etc. Statistical analyses should either be conducted separately for these factors or, to explicitly account for this heterogeneity, fixed/random-effect ML models could be used. For different minimum numbers of available samples and different censoring percentages, the working group outlined recommendations, including the use of appropriate statistical tests, to handle left‐censored distributions of chemical contaminant data in the context of exposure assessment.
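As a concrete illustration of why the working group compared substitution with parametric ML, the sketch below fits a left-censored lognormal by maximum likelihood: detects contribute the log-density, and non-detects contribute the log-probability of falling below the LOD. The data, the LOD, and the lognormal assumption are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
true = rng.lognormal(mean=0.0, sigma=1.0, size=500)   # hypothetical occurrence data
lod = 1.0
observed = true >= lod                                # non-detects fall below the LOD

# Widely used substitution: replace non-detects with LOD/2.
subst = np.where(observed, true, lod / 2)
print("substitution mean:", subst.mean())

# Censored ML: detects contribute the log-density, non-detects the
# log-probability of falling below the LOD (left-censoring).
x = np.log(true[observed])
n_cens = (~observed).sum()

def negloglik(theta):
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    ll = norm.logpdf(x, mu, sigma).sum()
    ll += n_cens * norm.logcdf(np.log(lod), mu, sigma)
    return -ll

fit = minimize(negloglik, x0=[0.0, 0.0])
mu, sigma = fit.x[0], np.exp(fit.x[1])
print("ML lognormal mean:", np.exp(mu + sigma**2 / 2))
```

The same likelihood structure extends to the log-probit and KM approaches the abstract names; only the distributional assumptions change.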
Combining statistical methods for detecting potential outliers in groundwater quality time series
Quality control of large-scale monitoring networks requires automatic procedures that detect potential outliers in an unambiguous and reproducible manner. This paper describes a methodology that combines existing statistical methods to accommodate the specific characteristics of measurement data obtained from groundwater quality monitoring networks: the measurement series show a large variety of dynamics and often comprise few (< 25) measurements, the measurement data are not normally distributed, measurement series may contain several outliers, there may be trends in the series, and/or some measurements may be below detection limits. Furthermore, the detection limits may vary in time. The methodology for outlier detection described in this paper uses robust regression on order statistics (ROS) to deal with measured values below the detection limit. In addition, a biweight location estimator is applied to filter out any temporal trends from the series. The subsequent outlier detection is done in z-score space. Tuning parameters are used to attune the robustness and accuracy to the given dataset and the user requirements. The method has been applied to data from the Dutch national groundwater quality monitoring network, which consists of approximately 350 monitoring wells. It proved to work well in general, detecting outliers at the top and bottom of the regular measurement range and around the detection limit. Given the diversity exhibited by the measurement series, the method cannot be expected to give 100% satisfactory results. Measured values identified by the method as potential outliers will therefore always need to be further assessed on the basis of expert knowledge, consistency with other measurement data and/or additional research.
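The z-score step of such a procedure can be sketched with a Tukey biweight location estimate and MAD-based robust z-scores. This simplified sketch omits the ROS handling of censored values and the trend filtering; the cutoff and tuning constants are illustrative.

```python
import numpy as np

def biweight_location(x, c=6.0, max_iter=20):
    """Tukey's biweight (bisquare) location estimate, iterated from the median."""
    mu = np.median(x)
    for _ in range(max_iter):
        mad = max(np.median(np.abs(x - mu)), 1e-12)
        u = (x - mu) / (c * mad)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        mu = np.sum(w * x) / np.sum(w)
    return mu

def flag_outliers(series, z_crit=3.5):
    """Flag values whose robust z-score exceeds z_crit."""
    mu = biweight_location(series)
    mad = max(np.median(np.abs(series - mu)), 1e-12)
    z = 0.6745 * (series - mu) / mad   # MAD-based z-scores
    return np.abs(z) > z_crit

series = np.array([3.1, 3.3, 2.9, 3.0, 18.0, 3.2, 3.1, 2.8])
print(flag_outliers(series))   # the 18.0 should be flagged
```

The 0.6745 factor rescales the MAD so the z-scores are comparable to standard-deviation units under normality.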
Accounting for Non-Detects: Application to Satellite Ammonia Observations
This paper presents a methodology to explicitly identify and account for cloud-free satellite measurements below a sensor's detection level. Such low signals are common in satellite observations of minor atmospheric species with weak spectral signals (e.g., ammonia (NH3)). Not accounting for these non-detects can bias averaged measurements high in locations that exhibit conditions below the detection limit of the sensor. The approach taken here is to use the information content of the satellite signal to explicitly identify non-detects and then account for them in a consistent way. The methodology is applied to the CrIS Fast Physical Retrieval (CFPR) ammonia product and results in a more realistic averaged dataset under conditions where there are a significant number of non-detects. The results show that in larger emission source regions (i.e., surface values > 7.5 ppbv) the non-detects occur less than 5% of the time and have a relatively small impact (decreases of less than 5%) on the gridded averaged values (e.g., annual ammonia source regions). However, in regions with low ammonia concentrations (i.e., surface values < 1 ppbv) the fraction of non-detects can exceed 70%, and accounting for these values can decrease annual gridded averaged values by over 50% and bring the distributions closer to what is expected based on surface station observations.
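The high-bias mechanism is easy to reproduce numerically: averaging only the retrievals above the detection level overstates the mean wherever a large fraction of the true values sits below it. A toy sketch with hypothetical numbers (the CFPR product identifies non-detects from the retrieval's information content, which this does not attempt to model):

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical surface NH3 concentrations (ppbv) in a clean region.
truth = rng.gamma(shape=1.5, scale=0.4, size=10_000)
detection_limit = 0.5                      # illustrative sensor floor
detected = truth >= detection_limit

print("true mean:                    ", truth.mean())
print("detects-only mean (biased):   ", truth[detected].mean())
# One simple accounting scheme: assign non-detects a fixed sub-limit value.
filled = np.where(detected, truth, detection_limit / 2)
print("mean with non-detects at DL/2:", filled.mean())
```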
Multiple imputation and direct estimation for qPCR data with non-detects
Background: Quantitative real-time PCR (qPCR) is one of the most widely used methods to measure gene expression. An important aspect of qPCR data that has been largely ignored is the presence of non-detects: reactions failing to exceed the quantification threshold and therefore lacking a measurement of expression. While most current software replaces these non-detects with a value representing the limit of detection, this introduces substantial bias in the estimation of both absolute and differential expression. Single imputation procedures, while an improvement on previously used methods, underestimate residual variance, which can lead to anti-conservative inference.
Results: We propose to treat non-detects as non-random missing data, model the missing-data mechanism, and use this model to impute missing values or obtain direct estimates of model parameters. To account for the uncertainty inherent in the imputation, we propose a multiple imputation procedure, which provides a set of plausible values for each non-detect. We assess the proposed methods via simulation studies and demonstrate the applicability of these methods to three experimental data sets. We compare our methods to mean imputation, single imputation, and a penalized EM algorithm incorporating non-random missingness (PEMM). The developed methods are implemented in the R/Bioconductor package nondetects.
Conclusions: The statistical methods introduced here reduce discrepancies in gene expression values derived from qPCR experiments in the presence of non-detects, providing increased confidence in downstream analyses.
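The multiple-imputation idea can be sketched in a simplified form: draw each non-detect from the tail of a fitted distribution beyond the quantification cycle, analyze each completed dataset, and pool with Rubin's rules. Unlike the nondetects package, this sketch ignores the non-random missing-data mechanism; the Ct values and the 40-cycle cutoff are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)
ct = np.array([36.2, 37.5, 35.9, np.nan, 38.1, np.nan, 36.8])  # NaN = non-detect
cutoff = 40.0            # assumed quantification cycle limit

obs = ct[~np.isnan(ct)]
mu, sd = obs.mean(), obs.std(ddof=1)
n_mis = np.isnan(ct).sum()

M = 50
means, variances = [], []
for _ in range(M):
    # Draw plausible Ct values beyond the cutoff (truncated normal tail).
    a = (cutoff - mu) / sd
    draws = truncnorm.rvs(a, np.inf, loc=mu, scale=sd, size=n_mis, random_state=rng)
    completed = np.concatenate([obs, draws])
    means.append(completed.mean())
    variances.append(completed.var(ddof=1) / completed.size)

# Rubin's rules: pooled point estimate and total variance.
qbar = np.mean(means)
within = np.mean(variances)
between = np.var(means, ddof=1)
total_var = within + (1 + 1 / M) * between
print(f"pooled mean Ct = {qbar:.2f}, pooled SE = {np.sqrt(total_var):.2f}")
```

The key contrast with single imputation is the between-imputation term, which keeps the pooled standard error from being anti-conservatively small.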
Analysis of German BSE Surveillance Data: Estimation of the Prevalence of Confirmed Cases versus the Number of Infected, but Non-Detected, Cattle to Assess Confidence in Freedom from Infection
Quantitative risk assessments for bovine spongiform encephalopathy (BSE) require estimates of key parameters such as the prevalence of infection, the probability of absence of infection in defined birth cohorts, and the number of BSE-infected but non-detected cattle entering the food chain. We estimated these three key parameters, with adjustment for misclassification, from the German BSE surveillance data, using a Gompertz model for latent (i.e., unobserved) age-dependent detection probabilities and a Poisson response model for the number of BSE cases for the birth cohorts 1999 to 2015. The models were combined in a Bayesian framework. We estimated the median true BSE prevalence at between 3.74 and 0.216 cases per 100,000 animals for the birth cohorts 1990 to 2001 and observed a peak for the 1996 birth cohort, with a point estimate of 16.41 cases per 100,000 cattle. For the birth cohorts 2002 to 2013, the estimated median prevalence was below one case per 100,000 animals. The calculated confidence in freedom from disease (design prevalence 1 in 100,000) was above 99.5% for the birth cohorts 2002 to 2006. In conclusion, BSE surveillance in the healthy slaughtered cattle chain was extremely sensitive at the time when BSE repeatedly occurred in Germany (2000–2009), because the entry of BSE-infected cattle into the food chain could virtually be prevented by the extensive surveillance program during these years and until 2015 (estimated non-detected cases per 100,000 [95% credible interval] in 2000, 2009, and 2015: 0.64 [0.5, 0.8], 0.05 [0.01, 0.14], and 0.19 [0.05, 0.61], respectively).
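The model structure described here, latent age-dependent detection probabilities following a Gompertz curve feeding a Poisson model for case counts, can be sketched outside the Bayesian framework. All numbers below are illustrative placeholders, not the paper's estimates.

```python
import numpy as np

def gompertz(age, a, b, c):
    """Gompertz curve for age-dependent detection probability (illustrative)."""
    return a * np.exp(-b * np.exp(-c * age))

ages = np.array([2.0, 4.0, 6.0, 8.0])          # years at testing (hypothetical)
p_detect = gompertz(ages, a=0.9, b=8.0, c=0.6)

prevalence = 5e-5                               # 5 infected per 100,000 (hypothetical)
n_tested = np.array([2e5, 3e5, 2.5e5, 1e5])     # animals tested per age group

# Poisson model: expected detected cases per age group, and the complement
# gives the expected number of infected but non-detected animals.
lam = prevalence * p_detect * n_tested
print("expected detected cases:    ", lam.round(2))
print("expected non-detected cases:", (prevalence * (1 - p_detect) * n_tested).round(2))
```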
Sewage-specific enterococcal bacteriophages and multiple water quality parameters for coastal water quality assessment
Coastal water quality is deteriorating worldwide. Water quality monitoring is therefore essential for public health risk evaluation and for the management of water bodies. This study investigated the feasibility of using bacteriophages of Enterococcus faecalis as sewage-specific faecal indicators, together with physicochemical (dissolved oxygen, pH, temperature and total suspended solids) and biological parameters, to assess coastal water quality using multivariate analysis incorporating non-detects. Principal component and cluster analyses demonstrated that coastal water quality was mostly influenced by biological parameters, including Escherichia coli and total coliforms, which were found at all 31 sampling sites, and enterococci, which were found at all but two sampling sites. The enterococcal bacteriophages AIM06 and SR14 were detected in 17 and 18 samples, at concentrations up to 1,815 and 2,790 PFU/100 mL, respectively. Both bacteriophages were present together in approximately 80% of phage-positive samples, and their concentrations at each site were not significantly different. Overall, either bacteriophage could be used to differentiate high- and low-level coastal water pollution, as grouped by cluster analysis. This study is the first to investigate the suitability of sewage-specific bacteriophages of E. faecalis for monitoring coastal water quality, and it emphasises the importance of multivariate analysis with non-detects for coastal water quality monitoring and management.
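A minimal version of such a multivariate workflow, using the simplest way of incorporating non-detects (LOD/2 substitution before log-transformation, which may differ from the scheme used in the study), could look like this; the matrix and LOD are simulated stand-ins:

```python
import numpy as np

rng = np.random.default_rng(11)
# Hypothetical monitoring matrix: rows = sites, columns = parameters
# (e.g., E. coli, total coliforms, enterococci, phage AIM06, phage SR14).
X = rng.lognormal(mean=2.0, sigma=1.0, size=(31, 5))
lod = 1.5
X[X < lod] = np.nan                       # mark non-detects

# Simple incorporation of non-detects: substitute LOD/2 before transformation.
X = np.where(np.isnan(X), lod / 2, X)
Z = np.log10(X)                           # counts span orders of magnitude
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize parameters

# Principal components via SVD of the standardized matrix.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
explained = s**2 / np.sum(s**2)
scores = U * s                            # site scores on each component
print("variance explained:", explained.round(3))
```

The site scores would then feed the cluster analysis that groups high- and low-pollution sites.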
Maximum Pairwise Pseudo-likelihood Estimation of the Covariance Matrix from Left-Censored Data
Toxicological studies often depend on laboratory assays that have thresholds below which environmental pollutants cannot be measured with accuracy. Exposure levels below this limit of detection may well be toxic, so it is vital to use data-analytic methods that handle such left-censored data with as little estimation bias as possible. In the ongoing study for which our methodology was developed, levels of residential exposure to polychlorinated biphenyls (PCBs) and the interrelationships of their subtypes (congeners) are characterized. In any given sample, many of the congeners may fall below the detection limit. The main problem tackled in this paper is the estimation of mean exposure levels and the corresponding covariance and correlation matrices for a large number of potentially left-censored measures, with very low bias and at feasible computational cost. The proposed methods are likelihood based, using marginal likelihoods for means and variances and pairwise pseudo-likelihoods for correlations and covariances. In the simple bivariate case, head-to-head comparisons show the proposed methods to be computationally more stable than ordinary maximum likelihood estimates (MLEs) while maintaining comparable bias. When the number of variables is much larger than two, the proposed methods are far more computationally feasible than MLE. Furthermore, they exhibit much less bias than popular imputation procedures. Analysis of the PCB data uncovered interesting correlational structures.
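The pairwise step can be sketched for one pair of left-censored variables: with the marginal means and SDs held fixed (e.g., from marginal censored ML fits), the correlation is found by maximizing a bivariate censored likelihood with four cases, depending on which of the two values are detected. This is a sketch assuming normal margins, not the paper's exact formulation; looping it over all variable pairs would assemble the covariance matrix.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm, multivariate_normal

def pairwise_correlation(x, y, lod_x, lod_y, mx, sx, my, sy):
    """Maximize the bivariate censored pseudo-likelihood over the correlation,
    holding marginal means/SDs fixed.  x, y contain np.nan for non-detects."""
    cx, cy = np.isnan(x), np.isnan(y)

    def negloglik(rho):
        cov = [[sx**2, rho * sx * sy], [rho * sx * sy, sy**2]]
        ll = 0.0
        for xi, yi, mcx, mcy in zip(x, y, cx, cy):
            if not mcx and not mcy:            # both detected: joint density
                ll += multivariate_normal.logpdf([xi, yi], [mx, my], cov)
            elif mcx and mcy:                  # both censored: P(X<LODx, Y<LODy)
                ll += np.log(multivariate_normal.cdf([lod_x, lod_y], [mx, my], cov))
            elif mcx:                          # x censored: f(y) * P(X<LODx | y)
                mc = mx + rho * sx / sy * (yi - my)
                sc = sx * np.sqrt(1 - rho**2)
                ll += norm.logpdf(yi, my, sy) + norm.logcdf(lod_x, mc, sc)
            else:                              # y censored, symmetric case
                mc = my + rho * sy / sx * (xi - mx)
                sc = sy * np.sqrt(1 - rho**2)
                ll += norm.logpdf(xi, mx, sx) + norm.logcdf(lod_y, mc, sc)
        return -ll

    res = minimize_scalar(negloglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x
```

Each pair only ever requires a one-dimensional optimization, which is what keeps the approach feasible when the number of congeners is large.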
Quantile regression for the statistical analysis of immunological data with many non-detects
Background: Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects.
Methods and results: Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an application to real data from a clinical trial. We show that by using quantile regression, groups can be compared and meaningful linear trends can be computed, even if more than half of the data consists of non-detects.
Conclusion: Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects.
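The practical trick that makes this work: the quantile-regression fit depends on the residual signs of the censored observations, not their magnitudes, so non-detects can be assigned any value below the LOD without changing the fitted median, provided the modeled quantile lies above the LOD. A sketch with simulated data, assuming statsmodels is available; all numbers are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
dose = rng.uniform(0, 10, n)                       # hypothetical covariate
response = np.exp(1.5 + 0.15 * dose + rng.normal(0, 0.6, n))
lod = 4.0
# Substitute non-detects with any sub-LOD value; LOD/2 is used here,
# but the median fit only requires that they stay below the LOD.
y = np.where(response < lod, lod / 2, response)

X = sm.add_constant(dose)
median_fit = sm.QuantReg(np.log(y), X).fit(q=0.5)
print(median_fit.params)   # slope should be close to the true 0.15
```

For even heavier censoring, the same call with q=0.75 or q=0.9 models a higher percentile, which is how the method tolerates datasets where most values are non-detects.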