Catalogue Search | MBRL
Explore the vast range of titles available.
21,272 result(s) for "Sampling distributions"
The Unit-Root Revolution Revisited: Where Do Non-Standard Sampling Distributions and Related Conundrums Stem From?
The primary objective of the paper is twofold. First, to answer the question posed in the title by arguing that the conundrums: [C1] the non-standard sampling distributions, [C2] the low power of unit-root tests for α_1 ∈ [0.9, 1], and [C3] their size distortions, [C4] issues in handling the initial condition, and [C5] the framing of the null and alternative hypotheses in testing α_1 = 1, as well as [C6] two competing parametrizations for the AR(1) models, (B) y_t = α_0 + α_1·y_{t−1} + ε_t, (C) y_t = α_0 + γ·t + α_1·y_{t−1} + ε_t, stem from viewing these models as a Priori Postulated (aPP) stochastic difference equations driven by the error process {ε_t, t ∈ ℕ}. Second, to use R.A. Fisher's model-based statistical perspective to unveil the statistical models implicit in each of the AR(1): (B)-(C) models, specified entirely in terms of probabilistic assumptions assigned to the observable process {y_t, t ∈ ℕ} underlying the data, which is all that matters for inference. The key culprit behind [C1]–[C6] is the presumption that the AR(1) nests the unit root [UR(1)] model when α_1 = 1, which is shown to belie Kolmogorov's existence theorem as it relates to {y_t, t ∈ ℕ}. Fisher's statistical perspective reveals that the statistical AR(1) and UR(1) models are grounded on (i) two distinct observable processes, with (ii) different probabilistic assumptions and (iii) statistical parametrizations, (iv) rendering them non-nested, and (v) their respective likelihood-based inferential components are free from conundrums [C1]–[C6]. The claims (i)–(v) are affirmed by analytical derivations and simulations, as well as by proposing a non-stationary AR(1) model that nests the related UR(1) model, where testing α_1 = 1 relies on likelihood-based tests free from conundrums [C1]–[C6].
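The non-standard sampling distribution behind conundrum [C1] is easy to reproduce by simulation. The sketch below is a hypothetical illustration, not the paper's derivations: it compares the Monte Carlo distribution of the normalized OLS coefficient T·(α̂_1 − α_1) in a unit-root AR(1) against a stationary one; under the unit root the distribution is markedly left-skewed rather than Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_slope(y):
    # OLS estimate of alpha_1 in y_t = alpha_1 * y_{t-1} + e_t (no intercept)
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

def normalized_stats(alpha1, T=200, reps=2000):
    # Monte Carlo draws of T * (alpha1_hat - alpha1)
    stats = np.empty(reps)
    for r in range(reps):
        e = rng.standard_normal(T)
        y = np.empty(T)
        y[0] = e[0]
        for t in range(1, T):
            y[t] = alpha1 * y[t - 1] + e[t]
        stats[r] = T * (ols_slope(y) - alpha1)
    return stats

unit_root = normalized_stats(1.0)   # Dickey-Fuller-type distribution
stationary = normalized_stats(0.5)  # approximately Gaussian, centred near 0
# Under the unit root the distribution is shifted left and asymmetric:
print(np.median(unit_root), np.median(stationary))
```

The long left tail of `unit_root` relative to its right tail is the familiar Dickey-Fuller shape that standard normal or t critical values cannot approximate.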
Journal Article
Monte Carlo Simulation for Lasso-Type Problems by Estimator Augmentation
2014
Regularized linear regression under the ℓ1 penalty, such as the Lasso, has been shown to be effective in variable selection and sparse modeling. The sampling distribution of an ℓ1-penalized estimator β̂ is hard to determine, as the estimator is defined by an optimization problem that in general can only be solved numerically, and many of its components may be exactly zero. Let S be the subgradient of the ℓ1 norm of the coefficient vector β evaluated at β̂. We find that the joint sampling distribution of β̂ and S, together called an augmented estimator, is much more tractable and has a closed-form density under a normal error distribution in both low-dimensional (p ⩽ n) and high-dimensional (p > n) settings. Given β and the error variance σ², one may employ standard Monte Carlo methods, such as Markov chain Monte Carlo (MCMC) and importance sampling (IS), to draw samples from the distribution of the augmented estimator and calculate expectations with respect to the sampling distribution of β̂. We develop a few concrete Monte Carlo algorithms and demonstrate with numerical examples that our approach may offer huge advantages and great flexibility in studying sampling distributions in ℓ1-penalized linear regression. We also establish nonasymptotic bounds on the difference between the true sampling distribution of β̂ and its estimator obtained by plugging in estimated parameters, which justifies the validity of Monte Carlo simulation from an estimated sampling distribution even when p ≫ n → ∞.
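The point mass at zero that makes the Lasso's sampling distribution awkward is visible even in the simplest case. The sketch below is a hypothetical one-predictor illustration, not the paper's estimator-augmentation algorithm: it uses the closed-form soft-thresholding solution and plain Monte Carlo to show that a sizeable fraction of the draws of β̂ are exactly zero.

```python
import numpy as np

rng = np.random.default_rng(1)

def soft_threshold(z, lam):
    # closed-form lasso solution for one standardized predictor:
    # argmin_b 0.5*(z - b)**2 + lam*|b|
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

beta, sigma, lam, reps = 0.1, 1.0, 1.0, 20000
z = rng.normal(beta, sigma, reps)        # least-squares estimates
beta_hat = soft_threshold(z, lam)        # corresponding lasso estimates
frac_zero = np.mean(beta_hat == 0.0)
print(frac_zero)  # a large share of the sampling distribution sits exactly at 0
```

This mixed discrete-continuous shape is why the augmented pair (β̂, S), rather than β̂ alone, admits a tractable closed-form density.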
Journal Article
Learning to Optimize via Posterior Sampling
2014
This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multiarmed bandit problems. The algorithm, also known as Thompson Sampling and as probability matching, offers significant advantages over the popular upper confidence bound (UCB) approach, and can be applied to problems with finite or infinite action spaces and complicated relationships among action rewards. We make two theoretical contributions. The first establishes a connection between posterior sampling and UCB algorithms. This result lets us convert regret bounds developed for UCB algorithms into Bayesian regret bounds for posterior sampling. Our second theoretical contribution is a Bayesian regret bound for posterior sampling that applies broadly and can be specialized to many model classes. This bound depends on a new notion we refer to as the eluder dimension, which measures the degree of dependence among action rewards. Compared to UCB algorithm Bayesian regret bounds for specific model classes, our general bound matches the best available for linear models and is stronger than the best available for generalized linear models. Further, our analysis provides insight into performance advantages of posterior sampling, which are highlighted through simulation results that demonstrate performance surpassing recently proposed UCB algorithms.
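For Bernoulli bandits the posterior sampling (Thompson Sampling) loop described above is very compact: sample a mean from each arm's Beta posterior, play the argmax, and update the posterior with the observed reward. A minimal sketch with hypothetical arm probabilities:

```python
import numpy as np

rng = np.random.default_rng(2)
true_means = [0.3, 0.5, 0.7]     # hypothetical Bernoulli arm reward probabilities
K, T = len(true_means), 5000
succ = np.ones(K)                # Beta(1, 1) priors: one pseudo-success...
fail = np.ones(K)                # ...and one pseudo-failure per arm
pulls = np.zeros(K, dtype=int)
for _ in range(T):
    theta = rng.beta(succ, fail)          # one sample from each posterior
    a = int(np.argmax(theta))             # play the arm that looks best
    r = rng.random() < true_means[a]      # Bernoulli reward
    succ[a] += r
    fail[a] += 1 - r
    pulls[a] += 1
print(pulls)  # pulls concentrate on the best (0.7) arm
```

Because arms are chosen with probability equal to their posterior probability of being optimal, exploration fades automatically as the posteriors sharpen.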
Journal Article
An empirical evaluation of sampling methods for the classification of imbalanced data
2022
In numerous classification problems, class distribution is not balanced. For example, positive examples are rare in the fields of disease diagnosis and credit card fraud detection. General machine learning methods are known to be suboptimal for such imbalanced classification. One popular solution is to balance training data by oversampling the underrepresented (or undersampling the overrepresented) classes before applying machine learning algorithms. However, despite its popularity, the effectiveness of sampling has not been rigorously and comprehensively evaluated. This study assessed combinations of seven sampling methods and eight machine learning classifiers (56 varieties in total) using 31 datasets with varying degrees of imbalance. We used the areas under the precision-recall curve (AUPRC) and receiver operating characteristic curve (AUROC) as the performance measures. The AUPRC is known to be more informative for imbalanced classification than the AUROC. We observed that sampling significantly changed the performance of the classifier (paired t-tests P < 0.05) only for a few cases (12.2% in AUPRC and 10.0% in AUROC). Surprisingly, sampling was more likely to reduce rather than improve the classification performance. Moreover, the adverse effects of sampling were more pronounced in AUPRC than in AUROC. Among the sampling methods, undersampling performed worse than others. Also, sampling was more effective for improving linear classifiers. Most importantly, we did not need sampling to obtain the optimal classifier for most of the 31 datasets. In addition, we found two interesting examples in which sampling significantly reduced AUPRC while significantly improving AUROC (paired t-tests P < 0.05). In conclusion, the applicability of sampling is limited because it could be ineffective or even harmful. Furthermore, the choice of the performance measure is crucial for decision making.
Our results provide valuable insights into the effect and characteristics of sampling for imbalanced classification.
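Random oversampling, one of the family of methods evaluated above, simply duplicates minority-class rows (with replacement) until the classes balance. A minimal numpy sketch, illustrative rather than the study's exact implementations:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_oversample(X, y):
    # duplicate minority-class rows (with replacement) until every class
    # has as many rows as the largest class
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        ci = np.flatnonzero(y == c)
        extra = rng.choice(ci, size=n_max - ci.size, replace=True)
        idx.extend(ci)
        idx.extend(extra)
    idx = np.array(idx)
    return X[idx], y[idx]

X = rng.standard_normal((100, 2))
y = np.array([0] * 90 + [1] * 10)      # 9:1 imbalance
Xb, yb = random_oversample(X, y)
print(np.bincount(yb))                 # balanced class counts
```

Note that such balancing is applied to the training split only; the evaluation data must keep the original class distribution for AUPRC/AUROC to be meaningful.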
Journal Article
The Ubiquitous Ewens Sampling Formula
2016
Ewens's sampling formula exemplifies the harmony of mathematical theory, statistical application, and scientific discovery. The formula not only contributes to the foundations of evolutionary molecular genetics, the neutral theory of biodiversity, Bayesian nonparametrics, combinatorial stochastic processes, and inductive inference but also emerges from fundamental concepts in probability theory, algebra, and number theory. With an emphasis on its far-reaching influence throughout statistics and probability, we highlight these and many other consequences of Ewens's seminal discovery.
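One of the combinatorial stochastic processes mentioned above is the Chinese restaurant process, whose induced partition of n items follows Ewens's sampling formula. The sketch below (assuming a concentration parameter θ, a hypothetical choice) simulates the process and checks the known expectation E[K_n] = Σ_{i=0}^{n−1} θ/(θ+i) for the number of distinct types:

```python
import numpy as np

rng = np.random.default_rng(4)

def crp(n, theta):
    # Chinese restaurant process: returns table (block) sizes for n customers;
    # customer i joins a table of size s with prob s/(theta+i),
    # or opens a new table with prob theta/(theta+i)
    tables = []
    for i in range(n):
        probs = np.array(tables + [theta], dtype=float)
        probs /= theta + i
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables

n, theta, reps = 50, 2.0, 2000
kbar = np.mean([len(crp(n, theta)) for _ in range(reps)])
expect = sum(theta / (theta + i) for i in range(n))
print(kbar, expect)  # simulated mean number of blocks vs. theory
```

The same θ plays the role of the scaled mutation rate in the population-genetics reading of the formula.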
Journal Article
A Monte Carlo method for computing the marginal likelihood in nondecomposable Gaussian graphical models
by
Atay-Kayis, Aliye
,
Massam, Hélène
in
Applications
,
Biology, psychology, social sciences
,
Combinatorics
2005
A centred Gaussian model that is Markov with respect to an undirected graph G is characterised by the parameter set of its precision matrices which is the cone M+(G) of positive definite matrices with entries corresponding to the missing edges of G constrained to be equal to zero. In a Bayesian framework, the conjugate family for the precision parameter is the distribution with Wishart density with respect to the Lebesgue measure restricted to M+(G). We call this distribution the G-Wishart. When G is nondecomposable, the normalising constant of the G-Wishart cannot be computed in closed form. In this paper, we give a simple Monte Carlo method for computing this normalising constant. The main feature of our method is that the sampling distribution is exact and consists of a product of independent univariate standard normal and chi-squared distributions that can be read off the graph G. Computing this normalising constant is necessary for obtaining the posterior distribution of G or the marginal likelihood of the corresponding graphical Gaussian model. Our method also gives a way of sampling from the posterior distribution of the precision matrix.
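For the complete graph the G-Wishart reduces to the ordinary Wishart, where the same idea — an exact sampler built from independent standard normals and chi-squared variates — is the classical Bartlett decomposition. A sketch of that special case (illustrative; the paper's construction reads the relevant degrees of freedom off the graph G):

```python
import numpy as np

rng = np.random.default_rng(5)

def wishart_bartlett(df, p):
    # Bartlett decomposition: lower-triangular A with
    # A[i, i] = sqrt of a chi-squared(df - i) draw and A[i, j] ~ N(0, 1)
    # for j < i; then W = A @ A.T ~ Wishart(df, I_p)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        A[i, :i] = rng.standard_normal(i)
    return A @ A.T

p, df, reps = 3, 10, 4000
mean_W = np.mean([wishart_bartlett(df, p) for _ in range(reps)], axis=0)
print(mean_W)  # should be close to df * I_3, since E[W] = df * Sigma
```

Zeroing entries of A that correspond to missing edges of G is, loosely, how the graph enters the construction; the paper makes the exact sampling distribution of the free entries precise.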
Journal Article
Reconsidering Baron and Kenny: Myths and Truths about Mediation Analysis
2010
Baron and Kenny’s procedure for determining if an independent variable affects a dependent variable through some mediator is so well known that it is used by authors and requested by reviewers almost reflexively. Many research projects have been terminated early in a research program or later in the review process because the data did not conform to Baron and Kenny’s criteria, impeding theoretical development. While the technical literature has disputed some of Baron and Kenny’s tests, this literature has not diffused to practicing researchers. We present a nontechnical summary of the flaws in the Baron and Kenny logic, some of which have not been previously noted. We provide a decision tree and a step‐by‐step procedure for testing mediation, classifying its type, and interpreting the implications of findings for theory building and future research.
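The core of regression-based mediation testing reduces to two regressions: M on X (path a) and Y on X and M (path b), with indirect effect a·b and the Sobel standard error. A minimal numpy sketch on simulated data (illustrative; the paper's decision tree covers more cases than this single test):

```python
import numpy as np

rng = np.random.default_rng(6)

def ols(y, X):
    # OLS coefficients and standard errors (X includes an intercept column)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, se

# simulated data with a true indirect effect: X -> M -> Y, a*b = 0.5*0.4 = 0.2
n = 500
x = rng.standard_normal(n)
m = 0.5 * x + rng.standard_normal(n)
y = 0.4 * m + 0.1 * x + rng.standard_normal(n)

ones = np.ones(n)
a_coef, a_se = ols(m, np.column_stack([ones, x]))      # path a: M ~ X
b_coef, b_se = ols(y, np.column_stack([ones, x, m]))   # path b: Y ~ X + M
a, sa = a_coef[1], a_se[1]
b, sb = b_coef[2], b_se[2]
indirect = a * b
sobel_se = np.sqrt(a**2 * sb**2 + b**2 * sa**2)
print(indirect, indirect / sobel_se)  # indirect effect and Sobel z-statistic
```

The article's point is that the significance of a·b (here via the Sobel z; bootstrapping is the usual modern alternative) is what matters, not the sequence of Baron-Kenny step conditions.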
Journal Article
Partial stratified ranked set sampling scheme for estimation of population mean and median
by
Ismail, Muhammad
,
Cheema, Ammara Nawaz
,
M, Maria
in
Computer and Information Sciences
,
COVID-19
,
COVID-19 - epidemiology
2023
Ranked set sampling (RSS) is an alternative to simple random sampling that requires less money and time. RSS is modified to obtain a more efficient and cost-effective estimator of population parameters. This paper aims to provide a more efficient and cost-effective design than stratified ranked set sampling and simple random sampling. For some distributions, the suggested method uses fewer sample units than stratified ranked set sampling and gives more efficient estimates of population parameters. For symmetric distributions, the proposed design, called "partial stratified ranked set sampling", yields an unbiased estimator of the population mean. The design is illustrated with practical data on confirmed COVID-19 cases.
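In basic ranked set sampling, each measured unit is the i-th judged order statistic from an independent set of m units, so m² units are ranked per cycle but only m are measured. A small numpy sketch of basic RSS (not the paper's partial stratified design) showing its variance advantage over a simple random sample of the same measured size:

```python
import numpy as np

rng = np.random.default_rng(7)

def rss_mean(pop_sampler, m, cycles):
    # ranked set sampling: in each cycle draw m independent sets of m units,
    # and measure the i-th smallest unit from the i-th set
    obs = []
    for _ in range(cycles):
        for i in range(m):
            s = np.sort(pop_sampler(m))
            obs.append(s[i])
    return np.mean(obs)

sampler = lambda k: rng.normal(10.0, 2.0, k)           # hypothetical population
reps = 1000
rss = [rss_mean(sampler, 3, 4) for _ in range(reps)]   # 12 measured units each
srs = [np.mean(sampler(12)) for _ in range(reps)]      # same measured size
print(np.var(rss), np.var(srs))  # RSS mean has the smaller variance
```

Both estimators are unbiased here; the gain comes from the ranking step spreading the measured units across the distribution, which is why RSS pays off when ranking is cheap but measurement is expensive.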
Journal Article
Estimation of finite population distribution function with dual use of auxiliary information under non-response
2020
In this paper, we propose two new families of estimators for estimating the finite population distribution function in the presence of non-response under simple random sampling. The proposed estimators require information on the sample distribution functions of the study and auxiliary variables, and additional information on either the sample mean or the ranks of the auxiliary variable. We consider two situations of non-response: (i) non-response on both the study and auxiliary variables, and (ii) non-response on the study variable only. The performance of the proposed estimators is compared with existing estimators available in the literature, both theoretically and numerically. It is also observed that the proposed estimators are more precise than the adapted distribution function estimators in terms of percentage relative efficiency.
Journal Article
Particle Markov chain Monte Carlo methods
by
Doucet, Arnaud
,
Holenstein, Roman
,
Andrieu, Christophe
in
Algorithms
,
Approximation
,
Bayesian analysis
2010
Markov chain Monte Carlo and sequential Monte Carlo methods have emerged as the two main tools to sample from high dimensional probability distributions. Although asymptotic convergence of Markov chain Monte Carlo algorithms is ensured under weak assumptions, the performance of these algorithms is unreliable when the proposal distributions that are used to explore the space are poorly chosen and/or if highly correlated variables are updated independently. We show here how it is possible to build efficient high dimensional proposal distributions by using sequential Monte Carlo methods. This allows us not only to improve over standard Markov chain Monte Carlo schemes but also to make Bayesian inference feasible for a large class of statistical models where this was not previously so. We demonstrate these algorithms on a non-linear state space model and a Lévy-driven stochastic volatility model.
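The sequential Monte Carlo building block inside these methods is the bootstrap particle filter, whose running weight average also yields the unbiased likelihood estimate that particle MCMC embeds inside an MCMC chain. A minimal sketch for a linear Gaussian state-space model with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(8)

# hypothetical model: x_t = 0.9*x_{t-1} + v_t,  y_t = x_t + w_t,  v, w ~ N(0, 1)
T, N = 100, 1000
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
y = x + rng.standard_normal(T)

particles = rng.standard_normal(N)   # draws from the x_0 prior
est = np.zeros(T)
loglik = 0.0                         # log-likelihood estimate (used by PMCMC)
for t in range(T):
    if t > 0:                        # propagate through the state equation
        particles = 0.9 * particles + rng.standard_normal(N)
    logw = -0.5 * (y[t] - particles) ** 2            # log N(y_t|x_t,1), up to const
    w = np.exp(logw - logw.max())
    loglik += logw.max() + np.log(w.mean()) - 0.5 * np.log(2 * np.pi)
    w /= w.sum()
    est[t] = w @ particles                           # filtered mean E[x_t | y_1:t]
    particles = particles[rng.choice(N, N, p=w)]     # multinomial resampling
print(np.mean((est - x) ** 2))  # filtering error well below observation noise
```

In a particle marginal Metropolis-Hastings sampler, `loglik` evaluated at proposed parameter values would replace the intractable exact likelihood in the acceptance ratio.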
Journal Article