188 results for "sequential hypothesis testing"
Controlling the False Discovery Rate via Knockoffs
In many fields of science, we observe a response variable together with a large number of potential explanatory variables, and would like to be able to discover which variables are truly associated with the response. At the same time, we need to know that the false discovery rate (FDR)—the expected fraction of false discoveries among all discoveries—is not too high, in order to assure the scientist that most of the discoveries are indeed true and replicable. This paper introduces the knockoff filter, a new variable selection procedure controlling the FDR in the statistical linear model whenever there are at least as many observations as variables. This method achieves exact FDR control in finite sample settings no matter the design or covariates, the number of variables in the model, or the amplitudes of the unknown regression coefficients, and does not require any knowledge of the noise level. As the name suggests, the method operates by manufacturing knockoff variables that are cheap—their construction does not require any new data—and are designed to mimic the correlation structure found within the existing variables, in a way that allows for accurate FDR control, beyond what is possible with permutation-based methods. The method of knockoffs is very general and flexible, and can work with a broad class of test statistics. We test the method in combination with statistics from the Lasso for sparse regression, and obtain empirical results showing that the resulting method has far more power than existing selection rules when the proportion of null variables is high.
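As an illustration of the selection step described above, here is a minimal sketch of the knockoff+ thresholding rule, assuming the per-variable knockoff statistics W (e.g., Lasso coefficient differences between each variable and its knockoff copy) have already been computed; the function name, the target level q, and the use of NumPy are illustrative choices, not part of the paper.

```python
import numpy as np

def knockoff_select(W, q=0.1):
    """Select variables from knockoff statistics W using the knockoff+ threshold.

    W[j] > 0 suggests the original variable j looks more important than its
    knockoff copy; the symmetry of null W[j] around zero is what allows the
    false discovery proportion to be estimated from the negative tail.
    """
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Estimated FDP at threshold t: knockoffs beating t vs. originals beating t.
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return np.where(W >= t)[0]  # indices of selected variables
    return np.array([], dtype=int)      # nothing passes the threshold
```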
Accumulation Tests for FDR Control in Ordered Hypothesis Testing
Multiple testing problems arising in modern scientific applications can involve simultaneously testing thousands or even millions of hypotheses, with relatively few true signals. In this article, we consider the multiple testing problem where prior information is available (for instance, from an earlier study under different experimental conditions), that can allow us to test the hypotheses as a ranked list to increase the number of discoveries. Given an ordered list of n hypotheses, the aim is to select a data-dependent cutoff k and declare the first k hypotheses to be statistically significant while bounding the false discovery rate (FDR). Generalizing several existing methods, we develop a family of "accumulation tests" to choose a cutoff k that adapts to the amount of signal at the top of the ranked list. We introduce a new method in this family, the HingeExp method, which offers higher power to detect true signals compared to existing techniques. Our theoretical results prove that these methods control a modified FDR on finite samples, and characterize the power of the methods in the family. We apply the tests to simulated data, including a high-dimensional model selection problem for linear regression. We also compare accumulation tests to existing methods for multiple testing on a real data problem of identifying differential gene expression over a dosage gradient. Supplementary materials for this article are available online.
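A minimal sketch of the generic accumulation-test cutoff rule described above, using the ForwardStop-style accumulation function h(p) = log(1/(1 − p)) as one member of the family (the HingeExp function introduced in the paper is not reproduced here); the function name and default level are illustrative.

```python
import numpy as np

def accumulation_test(pvals, alpha=0.1, h=lambda p: np.log(1.0 / (1.0 - p))):
    """Given p-values ordered by prior information, return the cutoff k:
    the largest k whose running average of h(p_1), ..., h(p_k) stays at or
    below alpha. The first k hypotheses are then declared significant."""
    acc = np.cumsum(h(np.asarray(pvals, dtype=float)))
    avg = acc / np.arange(1, len(pvals) + 1)
    passing = np.where(avg <= alpha)[0]
    return (passing[-1] + 1) if passing.size else 0  # number of rejections
```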
Detecting concept change in dynamic data streams
In this research we present a novel approach to the concept change detection problem. Change detection is a fundamental issue in data stream mining, as the classification models generated need to be updated when significant changes in the underlying data distribution occur. A number of change detection approaches have been proposed, but they all suffer from limitations with respect to one or more key performance factors such as high computational complexity, poor sensitivity to gradual change, or the opposite problem of a high false positive rate. Our approach uses reservoir sampling to build a sequential change detection model that offers statistically sound guarantees on false positive and false negative rates but has much smaller computational complexity than the ADWIN concept drift detector. Extensive experimentation on a wide variety of datasets reveals that the scheme also has a smaller false detection rate while maintaining a true detection rate competitive with ADWIN.
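The abstract names reservoir sampling as the building block of the detector; the following is a standard Algorithm R sketch of that building block only, not the paper's full change-detection scheme, and the function name and seed are illustrative.

```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Maintain a uniform random sample of size k over a data stream (Algorithm R).
    The change detector described above builds further machinery on top of
    such samples; only the sampling step is sketched here."""
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            reservoir.append(x)
        else:
            j = rng.randint(0, i)     # uniform index in [0, i]
            if j < k:
                reservoir[j] = x      # keep each item with probability k/(i+1)
    return reservoir
```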
Online Multi-Layer FDR Control
When hypotheses are tested in a stream and real-time decision-making is needed, online sequential hypothesis testing procedures are required. Furthermore, these hypotheses are commonly partitioned into groups by their nature. For example, RNA nanocapsules can be partitioned based on the therapeutic nucleic acids (siRNAs) being used, as well as the delivery nanocapsules. When selecting effective RNA nanocapsules, simultaneous false discovery rate control at multiple partition levels is needed. In this paper, we develop hypothesis testing procedures which control the false discovery rate (FDR) simultaneously for multiple partitions of hypotheses in an online fashion. We provide rigorous proofs of their FDR or modified FDR (mFDR) control properties and use extensive simulations to demonstrate their performance.
Estimation of an Extent of Sinusoidal Voltage Waveform Distortion Using Parametric and Nonparametric Multiple-Hypothesis Sequential Testing in Devices for Automatic Control of Power Quality Indices
Deviations of power quality indices (PQI) from standard values in power supply systems of industrial consumers lead to defective products, complete shutdown of production processes, and significant damage. At the same time, the PQI requirements vary depending on the industrial consumer, which is due to different kinds, types, and composition of essential electrical loads. To ensure their reliable operation, it is crucial to introduce automatic PQI control devices, which evaluate the extent of distortion of the sinusoidal voltage waveform of a three-phase system. This allows the power dispatchers of grid companies and industrial enterprises to quickly make decisions on the measures to be taken in external and internal power supply networks to ensure that the PQI values are within the acceptable range. This paper proposes the use of an integrated indicator to assess the extent of distortion of the sinusoidal voltage waveform in a three-phase system. This indicator is based on the use of the magnitude of the ratio of complex amplitudes of the forward and reverse rotation of the space vector. In the study discussed, block diagrams of algorithms and flowcharts of automatic PQI control devices are developed, which implement parametric and nonparametric multiple-hypothesis sequential analysis using an integrated indicator. In this case, Palmer’s algorithm and the nearest neighbor method are used. The calculations demonstrate that the developed algorithms have high speed and high performance in detecting deviations of the electrical power quality.
Multiple Hypothesis Tests Controlling Generalized Error Rates for Sequential Data
The γ-FDP and k-FWER multiple testing error metrics, which are tail probabilities of the respective error statistics, have become popular recently as alternatives to the FDR and FWER. We propose general and flexible stepup and stepdown procedures for testing multiple hypotheses about sequential (or streaming) data that simultaneously control both the type I and II versions of γ-FDP, or k-FWER. The error control holds regardless of the dependence between data streams, which may be of arbitrary size and shape. All that is needed is a test statistic for each data stream that controls the conventional type I and II error probabilities, and no information or assumptions are required about the joint distribution of the statistics or data streams. The procedures can be used with sequential, group sequential, truncated, or other sampling schemes. We give recommendations for the procedures' implementation including closed-form expressions for the needed critical values in some commonly-encountered testing situations. The proposed sequential procedures are compared with each other and with comparable fixed sample size procedures in the context of strongly positively correlated Gaussian data streams. For this setting we conclude that both the stepup and stepdown sequential procedures provide substantial savings over the fixed sample procedures in terms of expected sample size, and the stepup procedure performs slightly but consistently better than the stepdown for γ-FDP control, with the relationship reversed for k-FWER control.
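The procedures above assume, for each data stream, a test statistic that controls the conventional type I and II error probabilities; Wald's SPRT is the canonical such test, so a single-stream SPRT sketch is shown below for orientation (the stepup/stepdown combination across streams is not reproduced). The function name and default error levels are illustrative.

```python
import numpy as np

def sprt(llr_increments, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for a single data stream.

    llr_increments: per-observation log-likelihood ratios log f1(x)/f0(x).
    Returns ('H1' or 'H0', samples used), or ('continue', n) if the stream
    ends before either boundary is crossed.
    """
    upper = np.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = np.log(beta / (1 - alpha))   # accept H0 at or below this
    s, n = 0.0, 0
    for n, inc in enumerate(llr_increments, start=1):
        s += inc
        if s >= upper:
            return "H1", n
        if s <= lower:
            return "H0", n
    return "continue", n
```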
Universal scheme for optimal search and stop
The problem of universal search and stop using an adaptive search policy is considered. When the unique target location is searched, the observation is distributed according to the target distribution; otherwise, it is distributed according to the absence distribution. A universal scheme for search and stop is proposed using only knowledge of the absence distribution, and its asymptotic performance is analyzed. The universal test is shown to yield a vanishing error probability, and to achieve the optimal reliability when the target is present, universally for every target distribution. Consequently, it is established that knowledge of the target distribution is only useful for improving the reliability of detecting that the target is missing. It is also shown that a multiplicative gain for the search reliability equal to the number of searched locations is achieved by allowing adaptivity in the search.
Real-Time Detection and Classification of Power Quality Disturbances
This paper considers the problem of real-time detection and classification of power quality disturbances in power delivery systems. We propose a sequential and multivariate disturbance detection method aiming for quick and accurate detection. Our proposed detector follows a non-parametric and supervised approach, i.e., it learns nominal and anomalous patterns from training data involving clean and disturbance signals. The multivariate nature of the method enables joint processing of data from multiple meters, facilitating quicker detection as a result of the cooperative analysis. We further extend our supervised sequential detection method to a multi-hypothesis setting, which aims to classify the disturbance events as quickly and accurately as possible in a real-time manner. The multi-hypothesis method requires a training dataset per hypothesis, i.e., one for each disturbance type as well as for the 'no disturbance' case. The proposed classification method is demonstrated to quickly and accurately detect and classify power disturbances.
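For orientation, a generic CUSUM-style sequential detector is sketched below as one standard form of real-time disturbance detection; the paper's detector is non-parametric and supervised, which this sketch is not, and the score stream, threshold, and function name are illustrative assumptions.

```python
def cusum_detector(score_stream, threshold=8.0):
    """Generic CUSUM-style sequential detector.

    score_stream yields per-sample scores that are negative on average under
    nominal behavior and positive under a disturbance (e.g., a log-likelihood
    ratio or a learned anomaly score). Returns the first alarm time, or None.
    """
    s = 0.0
    for t, score in enumerate(score_stream):
        s = max(0.0, s + score)   # clip at zero so old evidence does not mask a change
        if s > threshold:
            return t              # alarm: disturbance declared at sample t
    return None
```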
A Rejection Principle for Sequential Tests of Multiple Hypotheses Controlling Familywise Error Rates
We present a unifying approach to multiple testing procedures for sequential (or streaming) data by giving sufficient conditions for a sequential multiple testing procedure to control the familywise error rate (FWER). Together, we call these conditions a 'rejection principle for sequential tests', which we then apply to some existing sequential multiple testing procedures to give simplified understanding of their FWER control. Next, the principle is applied to derive two new sequential multiple testing procedures with provable FWER control, one for testing hypotheses in order and another for closed testing. Examples of these new procedures are given by applying them to a chromosome aberration data set and finding the maximum safe dose of a treatment.
Investment Timing with Incomplete Information and Multiple Means of Learning
We consider a firm that can use one of several costly learning modes to dynamically reduce uncertainty about the unknown value of a project. Each learning mode incurs cost at a particular rate and provides information of a particular quality. In addition to dynamic decisions about its learning mode, the firm must decide when to stop learning and either invest or abandon the project. Using a continuous-time Bayesian framework, and assuming a binary prior distribution for the project’s unknown value, we solve both the discounted and undiscounted versions of this problem. In the undiscounted case, the optimal learning policy is to choose the mode that has the smallest cost per signal quality. When the discount rate is strictly positive, we prove that an optimal learning and investment policy can be summarized by a small number of critical values, and the firm only uses learning modes that lie on a certain convex envelope in cost-rate-versus-signal-quality space. We extend our analysis to consider a firm that can choose multiple learning modes simultaneously, which requires the analysis of both investment timing and dynamic subset selection decisions. We solve both the discounted and undiscounted versions of this problem and explicitly identify sets of learning modes that are used under the optimal policy.
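The undiscounted rule quoted in the abstract (choose the mode with the smallest cost per unit of signal quality) and the binary-prior Bayesian updating it relies on can be sketched as follows; the mode names, numbers, and helper functions are purely illustrative and do not reproduce the paper's continuous-time analysis.

```python
def pick_learning_mode(modes):
    """Undiscounted rule from the abstract: choose the mode with the smallest
    cost rate per unit of signal quality. `modes` maps a name to a
    (cost_rate, signal_quality) pair; both fields are illustrative placeholders."""
    return min(modes, key=lambda m: modes[m][0] / modes[m][1])

def update_log_odds(log_odds, log_likelihood_ratio):
    """Bayesian update of the binary prior on the project's value: add the
    observed signal's log-likelihood ratio to the current log-odds."""
    return log_odds + log_likelihood_ratio

# Illustrative usage
modes = {"survey": (2.0, 0.5), "pilot": (10.0, 4.0)}
print(pick_learning_mode(modes))   # 'pilot': 2.5 per unit quality vs. 4.0 for 'survey'
```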