Catalogue Search | MBRL
Search Results
Explore the vast range of titles available.
131,888 result(s) for "Confidence interval"
EXACT POST-SELECTION INFERENCE, WITH APPLICATION TO THE LASSO
by Sun, Dennis L.; Sun, Yuekai; Lee, Jason D.
in Confidence interval; Confidence intervals; Estimators
2016
We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form valid confidence intervals for the selected coefficients and test whether all relevant variables have been included in the model.
Journal Article
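The entry above concerns valid inference after model selection. As an illustrative contrast (not the paper's conditional method), a minimal Python sketch of the simpler sample-splitting alternative: select a feature on one half of the data, then form a confidence interval on the held-out half, so the selection event cannot bias the inference. All names and the simulated data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated null data: n observations, p pure-noise features.
n, p = 200, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Naive approach would select AND test on the full sample, inflating
# significance. Splitting: select on the first half only.
half = n // 2
corr_sel = np.abs(X[:half].T @ y[:half])
j = int(np.argmax(corr_sel))          # selected feature index

# Least-squares slope and a 95% CI computed on the held-out half.
x2, y2 = X[half:, j], y[half:]
beta = (x2 @ y2) / (x2 @ x2)
resid = y2 - beta * x2
se = np.sqrt(resid @ resid / (len(y2) - 1)) / np.sqrt(x2 @ x2)
ci = (beta - 1.96 * se, beta + 1.96 * se)
```

Splitting is valid but sacrifices half the data for selection; the paper's contribution is exact inference that conditions on the selection event instead.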
Estimation methods for the variance of Birnbaum-Saunders distribution containing zero values with application to wind speed data in Thailand
by Ratasukharom, Natchaya; Niwitpong, Sa-Aat; Niwitpong, Suparat
in Air pollution; Air Pollution - analysis; Air quality
2024
Thailand is currently grappling with a severe problem of air pollution, especially from small particulate matter (PM), which poses considerable threats to public health. The speed of the wind is pivotal in spreading these harmful particles across the atmosphere. Given the inherently unpredictable wind speed behavior, our focus lies in establishing the confidence interval (CI) for the variance of wind speed data. To achieve this, we will employ the delta-Birnbaum-Saunders (delta-BirSau) distribution. This statistical model allows for analyzing wind speed data and offers valuable insights into its variability and potential implications for air quality. The intervals are derived from ten different methods: generalized confidence interval (GCI), bootstrap confidence interval (BCI), generalized fiducial confidence interval (GFCI), and normal approximation (NA). Specifically, we apply GCI, BCI, and GFCI while considering the estimation of the proportion of zeros using the variance stabilized transformation (VST), Wilson, and Hannig methods. To evaluate the performance of these methods, we conduct a simulation study using Monte Carlo simulations in the R statistical software. The study assesses the coverage probabilities and average widths of the proposed confidence intervals. The simulation results reveal that GFCI based on the Wilson method is optimal for small sample sizes, GFCI based on the Hannig method excels for medium sample sizes, and GFCI based on the VST method stands out for large sample sizes. To further validate the practical application of these methods, we employ daily wind speed data from an industrial area in Prachin Buri and Rayong provinces, Thailand.
Journal Article
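The study above ranks interval methods by Monte Carlo coverage probability and average width. A minimal Python sketch of that evaluation loop, using the ordinary normal-approximation CI for a mean as a stand-in for the paper's variance intervals (the function name and constants are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

def coverage(n=30, mu=5.0, sigma=2.0, reps=2000, z=1.96):
    """Monte Carlo coverage probability and average width of the
    normal-approximation CI for a mean (illustrative stand-in)."""
    hits, widths = 0, 0.0
    for _ in range(reps):
        x = rng.normal(mu, sigma, size=n)
        half = z * x.std(ddof=1) / np.sqrt(n)
        lo, hi = x.mean() - half, x.mean() + half
        hits += (lo <= mu <= hi)       # did the interval cover the truth?
        widths += hi - lo
    return hits / reps, widths / reps

cov, width = coverage()
```

A good method keeps coverage near the nominal 95% while keeping the average width small; the paper applies exactly this criterion to its ten delta-BirSau interval constructions.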
The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective
by Kruschke, John K.; Liddell, Torrin M.
in Bayes Theorem; Bayesian analysis; Behavioral Science and Psychology
2018
In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed “the New Statistics” (Cumming, 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.
Journal Article
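The abstract above contrasts confidence intervals with Bayesian credible intervals. A minimal conjugate sketch in Python (prior values and data are made up) computes both for a normal mean with known variance, where the posterior is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)

sigma = 1.0                      # known data standard deviation
mu0, tau0 = 0.0, 10.0            # vague normal prior on the mean
x = rng.normal(3.0, sigma, size=50)

# Normal-normal conjugate update: posterior precision is the sum of
# prior precision and data precision.
n = len(x)
post_prec = 1 / tau0**2 + n / sigma**2
post_mean = (mu0 / tau0**2 + x.sum() / sigma**2) / post_prec
post_sd = post_prec ** -0.5
cred = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)

# Frequentist 95% CI for comparison.
half = 1.96 * sigma / np.sqrt(n)
ci = (x.mean() - half, x.mean() + half)
```

With a vague prior the two intervals nearly coincide numerically, but only the credible interval carries the direct probability interpretation the article argues for.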
Twelve-Year Analysis of NO2 Concentration Measurements at Belisario Station (Quito, Ecuador) Using Statistical Inference Techniques
by Mendez, Alfredo; Hernandez, Wilmar
in Air pollution; Artificial intelligence; classic analysis
2020
In this paper, a robust analysis of nitrogen dioxide (NO2) concentration measurements taken at Belisario station (Quito, Ecuador) was performed. The data used for the analysis constitute a set of measurements taken from 1 January 2008 to 31 December 2019. Furthermore, the analysis was carried out in a robust way, defining variables that represent years, months, days and hours, and classifying these variables based on estimates of the central tendency and dispersion of the data. The estimators used here were classic, nonparametric, based on a bootstrap method, and robust. Additionally, confidence intervals based on these estimators were built, and these intervals were used to categorize the variables under study. The results of this research showed that the NO2 concentration at Belisario station is not harmful to humans. Moreover, it was shown that this concentration tends to be stable across the years, changes slightly during the days of the week, and varies greatly when analyzed by months and hours of the day. Here, the precision provided by both nonparametric and robust statistical methods served to comprehensively prove the aforementioned. Finally, it can be concluded that the city of Quito is progressing on the right path in terms of improving air quality, because it has been shown that there is a decreasing tendency in the NO2 concentration across the years. In addition, according to the Quito Air Quality Index, most of the observations are in either the desirable level or acceptable level of air pollution, and the number of observations that are in the desirable level of air pollution increases across the years.
Journal Article
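The study above builds bootstrap-based confidence intervals around robust estimators. A minimal Python sketch of a percentile bootstrap CI for a robust location estimate (the median), on a skewed toy sample standing in for a concentration series; all names and constants are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def bootstrap_ci(data, stat=np.median, reps=2000, alpha=0.05):
    """Percentile bootstrap CI for a (possibly robust) statistic."""
    n = len(data)
    boots = [stat(rng.choice(data, size=n, replace=True))
             for _ in range(reps)]
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Skewed toy sample: one year of daily "concentration" values.
sample = rng.lognormal(mean=2.0, sigma=0.5, size=365)
lo, hi = bootstrap_ci(sample)
```

Because the bootstrap makes no normality assumption, intervals like this remain sensible for the skewed, heavy-tailed data typical of pollution measurements.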
Exact Post-Selection Inference for Sequential Regression Procedures
by Tibshirani, Robert; Taylor, Jonathan; Tibshirani, Ryan J.
in Confidence interval; Confidence intervals; equations
2016
We propose new inference tools for forward stepwise regression, least angle regression, and the lasso. Assuming a Gaussian model for the observation vector y, we first describe a general scheme to perform valid inference after any selection event that can be characterized as y falling into a polyhedral set. This framework allows us to derive conditional (post-selection) hypothesis tests at any step of forward stepwise or least angle regression, or any step along the lasso regularization path, because, as it turns out, selection events for these procedures can be expressed as polyhedral constraints on y. The p-values associated with these tests are exactly uniform under the null distribution, in finite samples, yielding exact Type I error control. The tests can also be inverted to produce confidence intervals for appropriate underlying regression parameters. The R package selectiveInference, freely available on the CRAN repository, implements the new inference tools described in this article. Supplementary materials for this article are available online.
Journal Article
Confidence intervals for low dimensional parameters in high dimensional linear models
by Zhang, Cun-Hui; Zhang, Stephanie S.
in Analysis of covariance; Asymptotic methods; Asymptotic properties
2014
The purpose of this paper is to propose methodologies for statistical inference of low dimensional parameters with high dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results that are presented provide sufficient conditions for the asymptotic normality of the proposed estimators along with a consistent estimator for their finite dimensional covariance matrices. These sufficient conditions allow the number of variables to exceed the sample size and the presence of many small non‐zero coefficients. Our methods and theory apply to interval estimation of a preconceived regression coefficient or contrast as well as simultaneous interval estimation of many regression coefficients. Moreover, the method proposed turns the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding. The simulation results that are presented demonstrate the accuracy of the coverage probability of the confidence intervals proposed as well as other desirable properties, strongly supporting the theoretical results.
Journal Article
Cronbach's alpha reliability: Interval estimation, hypothesis testing, and sample size planning
by Bonett, Douglas G.; Wright, Thomas A.
in Confidence; confidence interval; Confidence intervals
2015
Cronbach’s alpha is one of the most widely used measures of reliability in the social and organizational sciences. Current practice is to report the sample value of Cronbach’s alpha reliability, but a confidence interval for the population reliability value also should be reported. The traditional confidence interval for the population value of Cronbach’s alpha makes an unnecessarily restrictive assumption that the multiple measurements have equal variances and equal covariances. We propose a confidence interval that does not require equal variances or equal covariances. The results of a simulation study demonstrated that the proposed method performed better than alternative methods. We also present some sample size formulas that approximate the sample size requirements for desired power or desired confidence interval precision. R functions are provided that can be used to implement the proposed confidence interval and sample size methods.
Journal Article
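The paper above derives a closed-form interval for Cronbach's alpha; the abstract mentions accompanying R functions. As a language-neutral sketch, the Python below computes the standard sample alpha and a generic percentile-bootstrap CI — not Bonett and Wright's formula — on a made-up five-item scale:

```python
import numpy as np

rng = np.random.default_rng(4)

def cronbach_alpha(items):
    """Sample Cronbach's alpha for an (n_subjects, k_items) matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Toy scale: 5 items driven by one common factor plus noise.
n, k = 200, 5
factor = rng.normal(size=(n, 1))
items = factor + 0.8 * rng.normal(size=(n, k))

alpha = cronbach_alpha(items)

# Percentile bootstrap CI over resampled subjects (a generic stand-in
# for the paper's closed-form interval, which further relaxes the
# equal-variance/equal-covariance assumption analytically).
boots = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boots.append(cronbach_alpha(items[idx]))
ci = tuple(np.quantile(boots, [0.025, 0.975]))
```

Reporting `ci` alongside the point estimate is exactly the practice the abstract recommends over reporting the sample alpha alone.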
High-Dimensional Inference: Confidence Intervals, p-Values and R-Software hdi
by Meier, Lukas; Meinshausen, Nicolai; Dezeure, Ruben
in Clustering; confidence interval; Confidence intervals
2015
We present a (selective) review of recent frequentist high-dimensional inference methods for constructing p-values and confidence intervals in linear and generalized linear models. We include a broad, comparative empirical study which complements the viewpoint from statistical methodology and theory. Furthermore, we introduce and illustrate the R-package hdi which easily allows the use of different methods and supports reproducibility.
Journal Article
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
by Wager, Stefan; Athey, Susan
in Adaptive nearest neighbors matching; Algorithms; Asymptotic methods
2018
Many scientific and engineering challenges-ranging from personalized medicine to customized marketing recommendations-require an understanding of treatment effect heterogeneity. In this article, we develop a nonparametric causal forest for estimating heterogeneous treatment effects that extends Breiman's widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution. We also discuss a practical method for constructing asymptotic confidence intervals for the true treatment effect that are centered at the causal forest estimates. Our theoretical results rely on a generic Gaussian theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows any type of random forest, including classification and regression forests, to be used for provably valid statistical inference. In experiments, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially in the presence of irrelevant covariates.
Journal Article
SEMI-SUPERVISED INFERENCE
by Zhang, Anru; Brown, Lawrence D.; Cai, T. Tony
in Asymptotic methods; Asymptotic properties; Confidence intervals
2019
We propose a general semi-supervised inference framework focused on the estimation of the population mean. As usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses (“labels”). Otherwise, the formulation is “assumption-lean” in that no major conditions are imposed on the statistical or functional form of the data. We consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available.
Estimators are proposed along with corresponding confidence intervals for the population mean. Theoretical analysis on both the asymptotic distribution and ℓ₂-risk for the proposed procedures are given. Surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. The simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. The method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. The proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.
Journal Article
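The abstract above describes least-squares estimators of a population mean that exploit unlabeled covariates. A minimal Python sketch in that spirit (the data-generating model and all names are made up): regress y on x in the small labeled sample, then shift the labeled mean by the slope times the gap between the full-sample and labeled-sample covariate means.

```python
import numpy as np

rng = np.random.default_rng(5)

# Covariates observed for everyone; labels only for a small subsample.
N, n = 10000, 200
x_all = rng.normal(size=N)
y_all = 2.0 + 1.5 * x_all + rng.normal(size=N)
lab = rng.choice(N, size=n, replace=False)
x_lab, y_lab = x_all[lab], y_all[lab]

# Least-squares slope on the labeled sample.
beta = np.cov(x_lab, y_lab, ddof=1)[0, 1] / x_lab.var(ddof=1)

# Semi-supervised mean: correct the labeled mean using the far more
# precise covariate mean from the full sample.
mu_ss = y_lab.mean() + beta * (x_all.mean() - x_lab.mean())
mu_naive = y_lab.mean()
```

When x predicts y, the correction removes the part of the labeled-sample error driven by covariate imbalance, which is the source of the improvement over the ordinary sample mean that the abstract reports.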