6,095 result(s) for "Multiple testing"
Estimation of the false discovery proportion with unknown dependence
Large-scale multiple testing with correlated test statistics arises frequently in many areas of scientific research. Incorporating correlation information in approximating the false discovery proportion (FDP) has attracted increasing attention in recent years. When the covariance matrix of the test statistics is known, Fan and colleagues provided an accurate approximation of the FDP under arbitrary dependence structure and a sparsity assumption. However, the covariance matrix is often unknown in applications, and such dependence information must be estimated before the FDP can be approximated; the estimation accuracy can greatly affect the FDP approximation. In the current paper, we study theoretically the effect of unknown dependence on the testing procedure and establish a general framework under which the FDP can be well approximated. Unknown dependence affects the approximation of the FDP in two major ways: through the estimation of eigenvalues and eigenvectors, and through the estimation of marginal variances. To address these two challenges, we first develop general requirements on estimates of eigenvalues and eigenvectors for a good approximation of the FDP. We then give conditions on the structure of the covariance matrix that satisfy these requirements; such dependence structures include banded or sparse covariance matrices and (conditional) sparse precision matrices. Within this framework, we also consider a special example to illustrate our method, in which data are sampled from an approximate factor model, a setting that encompasses most practical situations, and we provide a good approximation of the FDP by exploiting this specific dependence structure. The results are further generalized to the situation where the multivariate normality assumption is relaxed. Our results are demonstrated by simulation studies and some real data applications.
False discovery control in large‐scale spatial multiple testing
The paper develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both pointwise and clusterwise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, the false discovery exceedance and the false cluster rate. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple-testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for the analysis of large spatial data sets. Numerical results show that the proposed procedures lead to more accurate error control and better power than conventional methods. We demonstrate our methods by analysing time trends in tropospheric ozone in the eastern USA.
A Framework for Monte Carlo based Multiple Testing
We are concerned with a situation in which we would like to test multiple hypotheses with tests whose p-values cannot be computed explicitly but can be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We are interested in obtaining the same rejections and non-rejections as would be obtained if the p-values for all hypotheses were available. The present article introduces a framework for this scenario by providing a generic algorithm for a general multiple testing procedure. We establish conditions which guarantee that the rejections and non-rejections obtained through Monte Carlo simulations are identical to the ones obtained with the p-values. Our framework is applicable to a general class of step-up and step-down procedures, which includes many established multiple testing corrections such as those of Bonferroni, Holm, Sidak, Hochberg or Benjamini–Hochberg. Moreover, we show how to use our framework to improve algorithms available in the literature so that they yield theoretical guarantees on their results. These modifications can easily be implemented in practice and lead to a particular way of reporting multiple testing results, as three sets together with an error bound on their correctness, which we demonstrate using a real biological dataset.
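For a concrete picture of the setting this abstract describes, the sketch below computes Monte Carlo p-values for a toy battery of one-sided mean tests and notes where they would feed into a step-up correction. It is an illustration of the scenario only, not the paper's algorithm; the test statistic, sample sizes, and the +1 correction are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_pvalue(observed_stat, simulate_null_stat, n_sim=10_000):
    """Monte Carlo p-value: the share of simulated null statistics at least as
    extreme as the observed one, with a +1 correction so it is never zero."""
    sims = np.array([simulate_null_stat(rng) for _ in range(n_sim)])
    return (1 + np.sum(sims >= observed_stat)) / (n_sim + 1)

# Toy usage: 20 one-sided tests of a sample mean against a standard normal null.
samples = [rng.normal(0.3, 1.0, 50) for _ in range(20)]
observed = [np.sqrt(50) * x.mean() for x in samples]
pvals = [mc_pvalue(t, lambda r: np.sqrt(50) * r.normal(0.0, 1.0, 50).mean())
         for t in observed]
# These approximate p-values could then be passed to Bonferroni, Holm, or
# Benjamini-Hochberg; the paper's contribution is certifying when the resulting
# decisions agree with those based on the exact p-values.
```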
On the Benjamini-Hochberg Method
We investigate the properties of the Benjamini-Hochberg method for multiple testing and of a variant of Storey's generalization of it, extending and complementing the asymptotic and exact results available in the literature. Results are obtained under two different sets of assumptions and include asymptotic and exact expressions and bounds for the proportion of rejections, the proportion of incorrect rejections out of all rejections and two other proportions used to quantify the efficacy of the method.
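As a reference point for this abstract, here is a minimal sketch of the Benjamini-Hochberg step-up procedure it studies, together with a simple Storey-type adaptive variant. The lambda = 0.5 threshold and the particular pi0 estimator in storey_bh are assumptions of this sketch and need not match the exact variant analysed in the paper.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """BH step-up: reject the k smallest p-values, where k is the largest i
    with p_(i) <= i * alpha / m."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])          # largest index meeting the bound
        rejected[order[:k + 1]] = True
    return rejected

def storey_bh(pvalues, alpha=0.05, lam=0.5):
    """Storey-type adaptive variant: estimate the null proportion pi0 from
    p-values above lam, then run BH at the inflated level alpha / pi0_hat."""
    p = np.asarray(pvalues, dtype=float)
    pi0_hat = min(1.0, (np.sum(p > lam) + 1) / (len(p) * (1 - lam)))
    return benjamini_hochberg(p, alpha / pi0_hat)
```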
Outlier Detection Algorithms Over Fuzzy Data with Weighted Least Squares
In the classical leave-one-out procedure for outlier detection in regression analysis, we exclude an observation and then construct a model on the remaining data. If the difference between the predicted and observed values is high, we declare the observation an outlier. As a rule, such procedures use single-comparison testing. The problem becomes much harder when each observation carries a degree of membership to an underlying population, so that outlier detection must be generalized to operate over fuzzy data. We present a new approach for outlier detection over fuzzy data that uses two inter-related algorithms. Because of the way outliers enter the observation sample, they may differ by orders of magnitude, so we divide the outlier detection procedure into cycles. Each cycle consists of two phases. In Phase 1, we apply a leave-one-out procedure to each non-outlier in the dataset. In Phase 2, all previously declared outliers are subjected to the Benjamini–Hochberg step-up multiple testing procedure, which controls the false discovery rate, and non-confirmed outliers return to the dataset. Finally, we construct a regression model over the resulting set of non-outliers. In this way, Phase 1 yields a reliable, high-quality regression model because the leave-one-out procedure, with its single-comparison tests, readily purges dubious observations. At the same time, confirming outlier status against the newly obtained high-quality model is much harder because of the multiple testing procedure, so only the true outliers remain outside the data sample. The two phases in each cycle are a good trade-off between the desire to construct a high-quality model (i.e., over informative data points) and the desire to use as many data points as possible (thus leaving as many observations as possible in the data sample). The number of cycles is user-defined, but the procedure can stop early once a cycle detects no new outliers. We offer one illustrative example and two practical case studies (from real-life thrombosis studies) that demonstrate the application and strengths of our algorithms. In the concluding section, we discuss several limitations of our approach and offer directions for future research.
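The cycle structure described above can be sketched for the ordinary (non-fuzzy, unweighted) regression case as follows. The normal-theory residual p-values, the (n, p) shape assumed for X, and the helper names fit_ols and bh_reject are choices of this illustration; the paper's fuzzy-membership weighting is omitted.

```python
import numpy as np
from scipy import stats

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and the
    residual standard deviation."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return beta, np.sqrt(resid @ resid / max(len(y) - Xd.shape[1], 1))

def bh_reject(pvals, alpha):
    """Benjamini-Hochberg step-up decisions for a vector of p-values."""
    m = len(pvals)
    order = np.argsort(pvals)
    ok = pvals[order] <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if ok.any():
        rejected[order[:np.max(np.nonzero(ok)[0]) + 1]] = True
    return rejected

def loo_outlier_cycle(X, y, alpha=0.05):
    """One cycle of the two-phase scheme, for plain regression (X is (n, p))."""
    n = len(y)
    pvals = np.empty(n)
    # Phase 1: leave each point out, refit, and test its prediction error with
    # an ordinary (single-comparison) two-sided normal test.
    for i in range(n):
        mask = np.arange(n) != i
        beta, sigma = fit_ols(X[mask], y[mask])
        pred = np.concatenate(([1.0], X[i])) @ beta
        pvals[i] = 2 * stats.norm.sf(abs(y[i] - pred) / sigma)
    flagged = pvals < alpha
    if not flagged.any():
        return flagged
    # Phase 2: refit on the non-flagged points and confirm the flagged ones
    # jointly with Benjamini-Hochberg; unconfirmed points rejoin the sample.
    beta, sigma = fit_ols(X[~flagged], y[~flagged])
    idx = np.where(flagged)[0]
    preds = np.column_stack([np.ones(len(idx)), X[idx]]) @ beta
    p2 = 2 * stats.norm.sf(np.abs(y[idx] - preds) / sigma)
    confirmed = np.zeros(n, dtype=bool)
    confirmed[idx[bh_reject(p2, alpha)]] = True
    return confirmed
```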
Sequential selection procedures and false discovery rate control
We consider a multiple‐hypothesis testing setting where the hypotheses are ordered and one is only permitted to reject an initial contiguous block H1,…,Hk of hypotheses. A rejection rule in this setting amounts to a procedure for choosing the stopping point k. This setting is inspired by the sequential nature of many model selection problems, where choosing a stopping point or a model is equivalent to rejecting all hypotheses up to that point and none thereafter. We propose two new testing procedures and prove that they control the false discovery rate in the ordered testing setting. We also show how the methods can be applied to model selection by using recent results on p‐values in sequential model selection settings.
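One well-known stopping rule for this ordered setting is the ForwardStop rule, which rejects the longest initial block whose running average of -log(1 - p_i) stays at or below alpha. The abstract does not say whether it coincides with either of the two procedures proposed in the paper, so the sketch below is offered only to illustrate what an ordered rejection rule looks like.

```python
import numpy as np

def forward_stop(pvalues, alpha=0.10):
    """ForwardStop-type rule: return the largest k such that the average of
    -log(1 - p_i) over the first k ordered hypotheses is at most alpha."""
    p = np.asarray(pvalues, dtype=float)
    transformed = -np.log1p(-p)                              # -log(1 - p_i)
    running_avg = np.cumsum(transformed) / np.arange(1, len(p) + 1)
    admissible = np.nonzero(running_avg <= alpha)[0]
    return admissible[-1] + 1 if len(admissible) else 0      # number of rejections

# Toy usage: the first five hypotheses are strong signals, the rest are null.
rng = np.random.default_rng(1)
pvals = np.concatenate([rng.uniform(0, 1e-3, 5), rng.uniform(0, 1, 15)])
k = forward_stop(pvals, alpha=0.10)   # reject H1, ..., Hk and nothing after
```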
Control of Generalized Error Rates in Multiple Testing
Consider the problem of testing s hypotheses simultaneously. The usual approach restricts attention to procedures that control the probability of even one false rejection, the familywise error rate (FWER). If s is large, one might be willing to tolerate more than one false rejection, thereby increasing the ability of the procedure to correctly reject false null hypotheses. One possibility is to replace control of the FWER by control of the probability of k or more false rejections, which is called the k-FWER. We derive both single-step and step-down procedures that control the k-FWER in finite samples or asymptotically, depending on the situation. We also consider the false discovery proportion (FDP) defined as the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289-300] controls E(FDP). Here, the goal is to construct methods which satisfy, for a given γ and α, P{FDP > γ} ≤ α, at least asymptotically. In contrast to the proposals of Lehmann and Romano [Ann. Statist. 33 (2005) 1138-1154], we construct methods that implicitly take into account the dependence structure of the individual test statistics in order to further increase the ability to detect false null hypotheses. This feature is also shared by related work of van der Laan, Dudoit and Pollard [Stat. Appl. Genet. Mol. Biol. 3 (2004) article 15], but our methodology is quite different. Like the work of Pollard and van der Laan [Proc. 2003 International Multi-Conference in Computer Science and Engineering, METMBS'03 Conference (2003) 3-9] and Dudoit, van der Laan and Pollard [Stat. Appl. Genet. Mol. Biol. 3 (2004) article 13], we employ resampling methods to achieve our goals. Some simulations compare finite sample performance to currently available methods.
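The paper's resampling-based procedures are not easily condensed, but the Lehmann-Romano k-FWER rules it builds on and compares against are simple to state. The sketch below gives the single-step and step-down versions; the choices k = 2 and alpha = 0.05 are arbitrary defaults for illustration.

```python
import numpy as np

def kfwer_single_step(pvalues, k=2, alpha=0.05):
    """Generalized Bonferroni bound for the k-FWER: reject H_i whenever
    p_i <= k * alpha / s, where s is the number of hypotheses."""
    p = np.asarray(pvalues, dtype=float)
    return p <= k * alpha / len(p)

def kfwer_step_down(pvalues, k=2, alpha=0.05):
    """Lehmann-Romano step-down rule for the k-FWER: compare the ordered
    p-values p_(i) with k*alpha/s for i <= k and k*alpha/(s + k - i) for
    i > k, and reject everything before the first failure."""
    p = np.asarray(pvalues, dtype=float)
    s = len(p)
    order = np.argsort(p)
    idx = np.arange(1, s + 1)
    crit = np.where(idx <= k, k * alpha / s, k * alpha / (s + k - idx))
    ok = p[order] <= crit
    n_reject = s if ok.all() else int(np.argmin(ok))   # stop at the first failure
    rejected = np.zeros(s, dtype=bool)
    rejected[order[:n_reject]] = True
    return rejected
```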
When to adjust alpha during multiple testing
Scientists often adjust their significance threshold (alpha level) during null hypothesis significance testing in order to take into account multiple testing and multiple comparisons. This alpha adjustment has become particularly relevant in the context of the replication crisis in science. The present article considers the conditions in which this alpha adjustment is appropriate and the conditions in which it is inappropriate. A distinction is drawn between three types of multiple testing: disjunction testing, conjunction testing, and individual testing. It is argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis. Alpha adjustment is inappropriate in the case of conjunction testing, in which all relevant results must be significant in order to reject the joint null hypothesis. Alpha adjustment is also inappropriate in the case of individual testing, in which each individual result must be significant in order to reject each associated individual null hypothesis. The conditions under which each of these three types of multiple testing is warranted are examined. It is concluded that researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing. Illustrations are provided in relation to joint studywise hypotheses and joint multiway ANOVAwise hypotheses.
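For the disjunction-testing case in which the article argues adjustment is warranted, the per-test alpha follows from a Bonferroni or Sidak correction. The small sketch below shows the arithmetic; the Sidak formula assumes independent constituent tests, and the default of three tests is only an example.

```python
def adjusted_alpha(alpha_joint=0.05, n_tests=3, method="sidak"):
    """Per-test alpha so that the chance of at least one false positive across
    n_tests (independent, for Sidak) tests of a true joint null is alpha_joint."""
    if method == "bonferroni":
        return alpha_joint / n_tests
    return 1 - (1 - alpha_joint) ** (1 / n_tests)

# With three constituent tests and a joint alpha of .05:
# Bonferroni gives about .0167 per test, Sidak about .0170.
print(adjusted_alpha(method="bonferroni"), adjusted_alpha(method="sidak"))
```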
Multiplicity Eludes Peer Review: The Case of COVID-19 Research
Multiplicity arises when data analysis involves multiple simultaneous inferences, increasing the chance of spurious findings. It is a widespread problem that is frequently ignored by researchers. In this paper, we perform an exploratory analysis of the Web of Science database for COVID-19 observational studies. We examined the 100 top-cited COVID-19 peer-reviewed articles based on p-values, which reported up to 7,100 simultaneous tests; 50% included more than 34 tests and 20% more than 100. We found that the larger the number of tests performed, the larger the number of significant results (r = 0.87, p < 10⁻⁶). The number of p-values in the abstracts was not related to the number of p-values in the papers. However, the number of highly significant results (p < 0.001) in the abstracts was strongly correlated (r = 0.61, p < 10⁻⁶) with the number of p < 0.001 significances in the papers. Furthermore, the abstracts included a higher proportion of significant results (0.91 vs. 0.50), and 80% reported only significant results. Only one reviewed paper addressed multiplicity-induced type I error inflation, pointing to potentially spurious results bypassing the peer-review process. We conclude that special attention needs to be paid to the increased chance of false discoveries in observational studies, including non-replicated striking discoveries with a potentially large social impact, and we propose some easy-to-implement measures to assess and limit the effects of multiplicity.
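A back-of-the-envelope calculation shows why the test counts reported above matter. Assuming independent tests of true null hypotheses at alpha = 0.05, the chance of at least one spurious significance is 1 - (1 - alpha)^m:

```python
# Familywise chance of at least one false positive among m independent tests
# of true nulls at alpha = 0.05.
alpha = 0.05
for m in (1, 34, 100):                 # 34 and 100 echo the test counts cited above
    print(m, round(1 - (1 - alpha) ** m, 3))
# 1 -> 0.05,  34 -> 0.825,  100 -> 0.994
```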
To adjust, or not to adjust, for multiple comparisons
Questions often arise concerning when, whether, and how we should adjust our interpretation of the results from multiple hypothesis tests. Strong arguments have been put forward in the epidemiological literature against any correction or adjustment for multiplicity, but regulatory requirements (particularly for pharmaceutical trials) can sometimes trump other concerns. The formal basis for adjustment is often the control of error rates, and hence the problems of multiplicity may seem rooted in a purely frequentist paradigm, though this can be a restrictive viewpoint. Commentators may never wholly agree on any of these things. This article draws some of the key threads from the discussion and suggests further reading.