Catalogue Search | MBRL

ONLY CLOSED TESTING PROCEDURES ARE ADMISSIBLE FOR CONTROLLING FALSE DISCOVERY PROPORTIONS

by Goeman, Jelle J. , Hemerik, Jesse , Solari, Aldo in Algorithms , Errors , False information

2021

We consider the class of all multiple testing methods controlling tail probabilities of the false discovery proportion, either for one random set or simultaneously for many such sets. This class encompasses methods controlling familywise error rate, generalized familywise error rate, false discovery exceedance, joint error rate, simultaneous control of all false discovery proportions, and others, as well as gene set testing in genomics and cluster inference in neuroimaging. We show that all such methods are either equivalent to a closed testing procedure, or are uniformly improved by one. Moreover, we show that a closed testing method is admissible if and only if all its local tests are admissible. This implies that, when designing methods, it is sufficient to restrict attention to closed testing. We demonstrate the practical usefulness of this design principle by obtaining more informative inferences from the method of higher criticism, and by constructing a uniform improvement of a recently proposed method.
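
The closed testing principle named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Bonferroni local tests, a brute-force enumeration of all intersection hypotheses (feasible only for a handful of hypotheses), and hypothetical function names. An elementary hypothesis is rejected only when every intersection hypothesis containing it is rejected by its local test.

```python
from itertools import combinations


def bonferroni_local_test(pvals, subset, alpha):
    """Local test (an assumption for this sketch): reject the intersection
    hypothesis over `subset` if its smallest p-value is <= alpha / |subset|."""
    return min(pvals[i] for i in subset) <= alpha / len(subset)


def closed_testing(pvals, alpha=0.05, local_test=bonferroni_local_test):
    """Return the indices of elementary hypotheses rejected by closed testing.

    Brute force over all 2^m - 1 non-empty subsets, so only suitable for small m.
    """
    m = len(pvals)
    all_subsets = [
        frozenset(s)
        for size in range(1, m + 1)
        for s in combinations(range(m), size)
    ]
    # Intersection hypotheses rejected by their local test.
    locally_rejected = {s for s in all_subsets if local_test(pvals, s, alpha)}
    # H_i is rejected only if every subset containing i is locally rejected.
    return [
        i
        for i in range(m)
        if all(s in locally_rejected for s in all_subsets if i in s)
    ]


if __name__ == "__main__":
    print(closed_testing([0.01, 0.04, 0.03, 0.20]))
```

With Bonferroni local tests this procedure coincides with Holm's step-down method; other local tests give other closed testing procedures.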

Journal Article

GAUSSIAN APPROXIMATION FOR HIGH DIMENSIONAL TIME SERIES

by Wu, Wei Biao , Zhang, Danna in Approximation , Confidence intervals , Covariance matrix

2017

We consider the problem of approximating sums of high dimensional stationary time series by Gaussian vectors, using the framework of functional dependence measure. The validity of the Gaussian approximation depends on the sample size n, the dimension p, the moment condition and the dependence of the underlying processes. We also consider an estimator for long-run covariance matrices and study its convergence properties. Our results allow constructing simultaneous confidence intervals for mean vectors of high-dimensional time series with asymptotically correct coverage probabilities. As an application, we propose a Kolmogorov–Smirnov-type statistic for testing distributions of high-dimensional time series.

Journal Article

LASSO-DRIVEN INFERENCE IN TIME AND SPACE

by Huang, Chen , Wang, Weining , Chernozhukov, Victor in Bootstrap method , Estimating techniques , Inference

2021

We consider the estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak temporal dependence. A sequence of regressions with many regressors using LASSO (Least Absolute Shrinkage and Selection Operator) is applied for variable selection purpose, and an overall penalty level is carefully chosen by a block multiplier bootstrap procedure to account for multiplicity of the equations and dependencies in the data. Correspondingly, oracle properties with a jointly selected tuning parameter are derived. We further provide high-quality de-biased simultaneous inference on the many target parameters of the system. We provide bootstrap consistency results of the test procedure, which are based on a general Bahadur representation for the Z-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure. Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.

Journal Article

Simultaneous Inference for High-Dimensional Linear Models

by Zhang, Xianyang , Cheng, Guang in Americans , Desparsifying Lasso , dimensions

2017

This article proposes a bootstrap-assisted procedure to conduct simultaneous inference for high-dimensional sparse linear models based on the recent desparsifying Lasso estimator. Our procedure allows the dimension of the parameter vector of interest to be exponentially larger than sample size, and it automatically accounts for the dependence within the desparsifying Lasso estimator. Moreover, our simultaneous testing method can be naturally coupled with the margin screening to enhance its power in sparse testing with a reduced computational cost, or with the step-down method to provide a strong control for the family-wise error rate. In theory, we prove that our simultaneous testing procedure asymptotically achieves the prespecified significance level, and enjoys certain optimality in terms of its power even when the model errors are non-Gaussian. Our general theory is also useful in studying the support recovery problem. To broaden the applicability, we further extend our main results to generalized linear models with convex loss functions. The effectiveness of our methods is demonstrated via simulation studies. Supplementary materials for this article are available online.

Journal Article

Extremal Depth for Functional Data and Applications

by Narisetty, Naveen N. , Nair, Vijayan N. in Central regions , Data , Data depth

2016

We propose a new notion called "extremal depth" (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme "outlyingness." ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: (a) the central region achieves the nominal (desired) simultaneous coverage probability; (b) there is a correspondence between ED-based (simultaneous) central regions and appropriate pointwise central regions; and (c) the method is resistant to certain classes of functional outliers. The article examines the performance of ED and compares it with other depth notions. Its usefulness is demonstrated through applications to constructing central regions, functional boxplots, outlier detection, and simultaneous confidence bands in regression problems. Supplementary materials for this article are available online.

Journal Article

A general framework for multiple testing dependence

by Leek, Jeffrey T , Storey, John D in Algorithms , brain , Computer Simulation

2008

We develop a general framework for performing large-scale significance testing in the presence of arbitrarily strong dependence. We derive a low-dimensional set of random vectors, called a dependence kernel, that fully captures the dependence structure in an observed high-dimensional dataset. This result shows a surprising reversal of the "curse of dimensionality" in the high-dimensional hypothesis testing setting. We show theoretically that conditioning on a dependence kernel is sufficient to render statistical tests independent regardless of the level of dependence in the observed data. This framework for multiple testing dependence has implications in a variety of common multiple testing problems, such as in gene expression studies, brain imaging, and spatial epidemiology.

Journal Article

VALID POST-SELECTION INFERENCE IN MODEL-FREE LINEAR REGRESSION

by Buja, Andreas , Kuchibhotla, Arun K. , Brown, Lawrence D. in Asymptotic properties , Computational efficiency , Confidence

2020

Modern data-driven approaches to modeling make extensive use of covariate/model selection. Such selection incurs a cost: it invalidates classical statistical inference. A conservative remedy to the problem was proposed by Berk et al. (Ann. Statist. 41 (2013) 802–837) and further extended by Bachoc, Preinerstorfer and Steinberger (2016). These proposals, labeled “PoSI methods,” provide valid inference after arbitrary model selection. They are computationally NP-hard and have limitations in their theoretical justifications. We therefore propose computationally efficient confidence regions, named “UPoSI,” and prove large-p asymptotics for them. We do this for linear OLS regression allowing misspecification of the normal linear model, for both fixed and random covariates, and for independent as well as some types of dependent data. We start by proving a general equivalence result for the post-selection inference problem and a simultaneous inference problem in a setting that strips inessential features still present in a related result of Berk et al. (Ann. Statist. 41 (2013) 802–837). We then construct valid PoSI confidence regions that are the first to have vastly improved computational efficiency in that the required computation times grow only quadratically rather than exponentially with the total number p of covariates. These are also the first PoSI confidence regions with guaranteed asymptotic validity when the total number of covariates p diverges (almost exponentially) with the sample size n. Under standard tail assumptions, we only require (log p)⁷ = o(n) and k = o(n / log p), where k (≤ p) is the largest number of covariates (model size) considered for selection. We study various properties of these confidence regions, including their Lebesgue measures, and compare them theoretically with those proposed previously.

Journal Article

Simultaneous confidence bands for multiple comparisons of several percentile lines

by Zhang, Yu , Zhou, Sanyu in Random variables , Regression analysis , Simulation

2025

In practice, it is often necessary to compare several percentile lines. To that end, a set of simultaneous confidence bands has been constructed. The contributions of this research are as follows: (1) the proposed bands are constructed and used for multiple comparisons of several percentile lines for the first time; (2) they allow various comparisons to be drawn: pairwise, successive and many-to-one; and (3) the comparisons can be drawn on any interval of interest, and provide more information on both the magnitude and the direction of the difference. In addition, practical applications are presented.

Journal Article

Size, Power and False Discovery Rates

by Efron, Bradley in 62G07 , 62J07 , Arrays

2007

Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the false discovery rate (fdr) analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr's, theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why nonnull cases might easily fail to appear on a list of "significant" discoveries.

Journal Article

The Positive False Discovery Rate: A Bayesian Interpretation and the q-Value

by Storey, John D. in 62F03 , Bayesian analysis , Classification theory

2003

Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding several significant results among many tests. In this work, we introduce a modified version of the FDR called the "positive false discovery rate" (pFDR). We discuss the advantages and disadvantages of the pFDR and investigate its statistical properties. When assuming the test statistics follow a mixture distribution, we show that the pFDR can be written as a Bayesian posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. Also, a new quantity called the "q-value" is introduced and investigated, which is a natural "Bayesian posterior p-value," or rather the pFDR analogue of the p-value.
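
The quantities named in this abstract can be written out as follows. The abstract itself defines no symbols, so the notation is an assumption: V is the number of false positives, R the total number of rejections, Γ a rejection region, T a test statistic, H = 0 indicates a true null hypothesis, and π₀ = P(H = 0) is the prior null probability in the mixture model.

```latex
\begin{align*}
\mathrm{FDR}  &= \mathbb{E}\!\left[\frac{V}{R} \,\middle|\, R > 0\right] \Pr(R > 0), \\
\mathrm{pFDR} &= \mathbb{E}\!\left[\frac{V}{R} \,\middle|\, R > 0\right], \\
\mathrm{pFDR}(\Gamma) &= \Pr(H = 0 \mid T \in \Gamma)
  = \frac{\pi_0 \Pr(T \in \Gamma \mid H = 0)}{\Pr(T \in \Gamma)}
  \quad \text{(mixture model)}, \\
q\text{-value}(t) &= \inf_{\Gamma \,:\, t \in \Gamma} \mathrm{pFDR}(\Gamma).
\end{align*}
```

The third line is the Bayesian posterior form the abstract refers to, and the last line shows the q-value as the pFDR counterpart of the p-value: the smallest pFDR attainable over rejection regions that contain the observed statistic t.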

Journal Article
