Catalogue Search | MBRL
Explore the vast range of titles available.
7,139 result(s) for "Statistical discrepancies"
ROBUST HYPERPARAMETER ESTIMATION PROTECTS AGAINST HYPERVARIABLE GENES AND IMPROVES POWER TO DETECT DIFFERENTIAL EXPRESSION
by
Lee, Stanley
,
Phipson, Belinda
,
Alexander, Warren S.
in
B lymphocytes
,
Degrees of freedom
,
Estimators
2016
One of the most common analysis tasks in genomic research is to identify genes that are differentially expressed (DE) between experimental conditions. Empirical Bayes (EB) statistical tests using moderated genewise variances have been very effective for this purpose, especially when the number of biological replicate samples is small. The EB procedures can, however, be heavily influenced by a small number of genes with very large or very small variances. This article improves the differential expression tests by robustifying the hyperparameter estimation procedure. The robust procedure has the effect of decreasing the informativeness of the prior distribution for outlier genes while increasing its informativeness for other genes. This effect has the double benefit of reducing the chance that hypervariable genes will be spuriously identified as DE while increasing statistical power for the main body of genes. The robust EB algorithm is fast and numerically stable. The procedure allows exact small-sample null distributions for the test statistics and reduces exactly to the original EB procedure when no outlier genes are present. Simulations show that the robustified tests have similar performance to the original tests in the absence of outlier genes but have greater power and robustness when outliers are present. The article includes case studies for which the robust method correctly identifies and downweights genes associated with hidden covariates and detects more genes likely to be scientifically relevant to the experimental conditions. The new procedure is implemented in the limma software package freely available from the Bioconductor repository.
Journal Article
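The shrinkage step this abstract describes can be illustrated with a short sketch. This is a hypothetical Python rendering of the standard empirical Bayes moderated-variance formula; the published method lives in the R/Bioconductor package limma, where the hyperparameters (prior degrees of freedom and prior variance, written d0 and s0² here) are estimated from the data — robustly, in this paper's proposal:

```python
import numpy as np

def moderate_variances(s2, df, d0, s0_sq):
    """Empirical Bayes shrinkage of genewise sample variances.

    s2    : genewise sample variances
    df    : residual degrees of freedom of each variance
    d0    : prior degrees of freedom (hyperparameter, assumed given here)
    s0_sq : prior variance (hyperparameter, assumed given here)

    Returns the posterior (moderated) variances: a df-weighted average
    of the prior variance and each gene's own sample variance.
    """
    s2 = np.asarray(s2, dtype=float)
    return (d0 * s0_sq + df * s2) / (d0 + df)

# A hypervariable gene (third entry) is pulled strongly toward the prior.
# Non-robust hyperparameter estimates let such outliers distort d0 and
# s0_sq for every gene, which is what the robust procedure guards against.
s2 = np.array([0.1, 0.2, 5.0])
post = moderate_variances(s2, df=4, d0=4.0, s0_sq=0.2)
```

With equal prior and residual degrees of freedom, each moderated variance is simply the midpoint of the prior variance and the observed one, so the outlier at 5.0 shrinks to 2.6.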
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research
by
Asendorpf, Jens B.
,
Kaplan, David
,
van de Schoot, Rens
in
Bayes Theorem
,
Bayesian analysis
,
Bayesian method
2014
Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret the results properly. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided.
Journal Article
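The "ingredients" this introduction walks through — prior, data, posterior — can be shown in the simplest conjugate case. This is an illustrative sketch (not the paper's own example): a Beta prior on a success probability updated by binomial data, showing how an informative prior pulls the posterior toward its own mean:

```python
# Beta(a, b) prior on a success probability; after observing k successes
# in n trials, the posterior is Beta(a + k, b + n - k) by conjugacy.
def beta_binomial_update(a, b, k, n):
    return a + k, b + n - k

def beta_mean(a, b):
    return a / (a + b)

# Same data (7 successes in 10 trials), two priors:
post_weak = beta_binomial_update(1, 1, 7, 10)     # uniform prior
post_info = beta_binomial_update(20, 20, 7, 10)   # informative prior at 0.5
```

The weak prior gives posterior mean 8/12 ≈ 0.67, close to the sample proportion 0.7; the informative prior gives 27/50 = 0.54, illustrating the paper's point that prior specification can materially change conclusions.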
PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing?
by
Walsh, Daniel C. I.
,
Anderson, Marti J.
in
Animal and plant ecology
,
Animal, plant and microbial ecology
,
ANOSIM
2013
ANOSIM, PERMANOVA, and the Mantel test are all resemblance-based permutation methods widely used in ecology. Here, we report the results of the first simulation study, to our knowledge, specifically designed to examine the effects of heterogeneity of multivariate dispersions on the rejection rates of these tests and on a classical MANOVA test (Pillai's trace). Increasing differences in dispersion among groups were simulated under scenarios of changing sample sizes, correlation structures, error distributions, numbers of variables, and numbers of groups for balanced and unbalanced one-way designs. The power of these tests to detect environmental impacts or natural large-scale biogeographic gradients was also compared empirically under simulations based on parameters derived from real ecological data sets.
Overall, ANOSIM and the Mantel test were very sensitive to heterogeneity in dispersions, with ANOSIM generally being more sensitive than the Mantel test. In contrast, PERMANOVA and Pillai's trace were largely unaffected by heterogeneity for balanced designs. PERMANOVA was also unaffected by differences in correlation structure, unlike Pillai's trace. For unbalanced designs, however, all of the tests were (1) too liberal when the smaller group had greater dispersion and (2) overly conservative when the larger group had greater dispersion, especially ANOSIM and the Mantel test. For simulations based on real ecological data sets, PERMANOVA was generally, but not always, more powerful than the others to detect changes in community structure, and the Mantel test was usually more powerful than ANOSIM. Both the error distributions and the resemblance measure affected results concerning power.
Differences in the underlying construction of these test statistics result in important differences in the nature of the null hypothesis they are testing, their sensitivity to heterogeneity, and their power to detect important changes in ecological communities. For balanced designs, PERMANOVA and PERMDISP can be used to rigorously identify location vs. dispersion effects, respectively, in the space of the chosen resemblance measure. ANOSIM and the Mantel test can be used as more "omnibus" tests, being sensitive to differences in location, dispersion or correlation structure among groups. Unfortunately, none of the tests (PERMANOVA, Mantel, or ANOSIM) behaved reliably for unbalanced designs in the face of heterogeneity.
Journal Article
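The PERMANOVA statistic compared in this study can be computed directly from a distance matrix. The sketch below is a minimal one-way version (my own simplification, not the authors' implementation): the pseudo-F is built from sums of squared distances, and the null distribution comes from permuting group labels:

```python
import numpy as np

def permanova(D, labels, n_perm=999, rng=None):
    """One-way PERMANOVA on a symmetric distance matrix D.

    SS_total  = sum of squared distances over all pairs, divided by N;
    SS_within = the same quantity computed within each group, divided
                by that group's size, summed over groups.
    """
    rng = np.random.default_rng(rng)
    D2 = np.asarray(D, dtype=float) ** 2
    labels = np.asarray(labels)
    N = len(labels)
    groups = np.unique(labels)
    a = len(groups)

    def pseudo_f(lab):
        ss_total = D2[np.triu_indices(N, 1)].sum() / N
        ss_within = 0.0
        for g in groups:
            idx = np.where(lab == g)[0]
            sub = D2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (N - a))

    f_obs = pseudo_f(labels)
    count = 1  # include the observed statistic in the null reference set
    for _ in range(n_perm):
        if pseudo_f(rng.permutation(labels)) >= f_obs:
            count += 1
    return f_obs, count / (n_perm + 1)

# Two clearly separated groups on a line (toy data):
x = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
D = np.abs(x[:, None] - x[None, :])
labels = np.array(["a"] * 3 + ["b"] * 3)
f_obs, p = permanova(D, labels, n_perm=999, rng=0)
```

Note that with only six observations the attainable p-value is bounded below by the number of distinct label arrangements, a practical limit the simulation literature routinely points out.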
Identifiability of Gaussian structural equation models with equal error variances
2014
We consider structural equation models in which variables can be written as a function of their parents and noise terms, which are assumed to be jointly independent. Corresponding to each structural equation model is a directed acyclic graph describing the relationships between the variables. In Gaussian structural equation models with linear functions, the graph can be identified from the joint distribution only up to Markov equivalence classes, assuming faithfulness. In this work, we prove full identifiability in the case where all noise variables have the same variance: the directed acyclic graph can be recovered from the joint Gaussian distribution. Our result has direct implications for causal inference: if the data follow a Gaussian structural equation model with equal error variances, then, assuming that all variables are observed, the causal structure can be inferred from observational data only. We propose a statistical method and an algorithm based on our theoretical findings.
Journal Article
Strong rules for discarding predictors in lasso-type problems
by
Hastie, Trevor
,
Taylor, Jonathan
,
Tibshirani, Ryan J.
in
Approximation
,
Causal analysis
,
Coefficients
2012
We consider rules for discarding predictors in lasso regression and related problems, for computational efficiency. El Ghaoui and his colleagues have proposed 'SAFE' rules, based on univariate inner products between each predictor and the outcome, which guarantee that a coefficient will be 0 in the solution vector. This provides a reduction in the number of variables that need to be entered into the optimization. We propose strong rules that are very simple and yet screen out far more predictors than the SAFE rules. This great practical improvement comes at a price: the strong rules are not foolproof and can mistakenly discard active predictors, i.e. predictors that have non-zero coefficients in the solution. We therefore combine them with simple checks of the Karush-Kuhn-Tucker conditions to ensure that the exact solution to the convex problem is delivered. Of course, any (approximate) screening method can be combined with the Karush-Kuhn-Tucker conditions to ensure the exact solution; the strength of the strong rules lies in the fact that, in practice, they discard a very large number of the inactive predictors and almost never commit mistakes. We also derive conditions under which they are foolproof. Strong rules provide substantial savings in computational time for a variety of statistical optimization problems.
Journal Article
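The basic strong rule from this paper is a one-line screen. The sketch below implements it for the single-λ case with standardized predictors (a minimal rendering, not the authors' full sequential rule, which screens at λ_k using the residuals from the solution at λ_{k-1}):

```python
import numpy as np

def basic_strong_rule(X, y, lam):
    """Indices of predictors *kept* by the basic strong rule for the
    lasso at penalty lam (lam <= lam_max).

    Predictor j is discarded when |x_j' y| < 2*lam - lam_max, where
    lam_max = max_j |x_j' y| is the smallest penalty at which all
    coefficients are zero.  As the paper stresses, the rule is not
    foolproof: survivors must still be checked against the KKT
    conditions, since an active predictor can occasionally be discarded.
    """
    c = np.abs(X.T @ y)
    lam_max = c.max()
    return np.where(c >= 2 * lam - lam_max)[0]

# Toy example: inner products [10, 6, 1], lam_max = 10, lam = 7,
# so the discard threshold is 2*7 - 10 = 4 and predictor 2 is screened out.
kept = basic_strong_rule(np.eye(3), np.array([10.0, 6.0, 1.0]), lam=7.0)
```

The point of the construction is exactly the trade-off the abstract names: a much more aggressive screen than the SAFE bound, repaired afterward by cheap KKT checks.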
The Bayesian Lasso
2008
The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the regression parameters have independent Laplace (i.e., double-exponential) priors. Gibbs sampling from this posterior is possible using an expanded hierarchy with conjugate normal priors for the regression parameters and independent exponential priors on their variances. A connection with the inverse-Gaussian distribution provides tractable full conditional distributions. The Bayesian Lasso provides interval estimates (Bayesian credible intervals) that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso parameter. Slight modifications lead to Bayesian versions of other Lasso-related estimation methods, including bridge regression and a robust variant.
Journal Article
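The Gibbs sampler the abstract refers to cycles through the three full conditionals of the expanded hierarchy. The sketch below is a bare-bones version under the stated scale-mixture representation (my own simplification: fixed λ, a dense matrix inverse each sweep, and no rescaling of y or X, so it is only suitable for small problems):

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=1000, burn=200, seed=0):
    """Gibbs sampler for the Bayesian lasso: Laplace priors on beta
    written as normals with exponentially distributed variances tau^2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.zeros(p)
    sigma2, inv_tau2 = 1.0, np.ones(p)
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}),  A = X'X + diag(1/tau^2)
        A_inv = np.linalg.inv(XtX + np.diag(inv_tau2))
        cov = sigma2 * (A_inv + A_inv.T) / 2  # symmetrize against rounding
        beta = rng.multivariate_normal(A_inv @ Xty, cov)
        # sigma2 | rest ~ inverse-gamma
        resid = y - X @ beta
        shape = (n - 1) / 2 + p / 2
        rate = resid @ resid / 2 + beta @ (inv_tau2 * beta) / 2
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        # 1/tau_j^2 | rest ~ inverse-Gaussian (the tractable conditional
        # the abstract mentions); guard beta_j ~ 0 to avoid overflow
        mu = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mu, lam**2)
        if it >= burn:
            draws.append(beta)
    return np.array(draws)

# Toy check: one strong coefficient, one null coefficient.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 2))
y = X @ np.array([2.0, 0.0]) + 0.1 * rng.standard_normal(60)
draws = bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=600, burn=100, seed=2)
post_mean = draws.mean(axis=0)  # should sit near (2, 0)
```

The posterior draws give exactly the interval estimates the abstract highlights: credible intervals for each coefficient fall out of `draws` directly.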
Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach
by
Hainmueller, Jens
,
Hazlett, Chad
in
Artificial intelligence
,
Classification
,
Estimating techniques
2014
We propose the use of Kernel Regularized Least Squares (KRLS) for social science modeling and inference problems. KRLS borrows from machine learning methods designed to solve regression and classification problems without relying on linearity or additivity assumptions. The method constructs a flexible hypothesis space that uses kernels as radial basis functions and finds the best-fitting surface in this space by minimizing a complexity-penalized least squares problem. We argue that the method is well-suited for social science inquiry because it avoids strong parametric assumptions, yet allows interpretation in ways analogous to generalized linear models while also permitting more complex interpretation to examine nonlinearities, interactions, and heterogeneous effects. We also extend the method in several directions to make it more effective for social inquiry, by (1) deriving estimators for the pointwise marginal effects and their variances, (2) establishing unbiasedness, consistency, and asymptotic normality of the KRLS estimator under fairly general conditions, (3) proposing a simple automated rule for choosing the kernel bandwidth, and (4) providing companion software. We illustrate the use of the method through simulations and empirical examples.
Journal Article
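The core fitting step of KRLS is closed-form. The sketch below shows a Gaussian-kernel version (a minimal illustration, not the authors' companion software: their package also supplies the marginal-effect estimators and an automated bandwidth rule, whereas here the bandwidth simply defaults to the number of features):

```python
import numpy as np

def krls_fit(X, y, sigma2=None, lam=1.0):
    """Kernel regularized least squares with a Gaussian kernel.

    The penalized problem has the closed-form solution
    c = (K + lam I)^{-1} y, with fitted surface f(x) = sum_i c_i k(x, x_i).
    """
    n = X.shape[0]
    if sigma2 is None:
        sigma2 = X.shape[1]  # simple default bandwidth: number of features
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / sigma2)
    c = np.linalg.solve(K + lam * np.eye(n), y)
    return c, X, sigma2

def krls_predict(model, Xnew):
    c, Xtr, sigma2 = model
    sq = ((Xnew[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / sigma2) @ c

# The flexibility claim in one line: fit a nonlinear target with no
# linearity or additivity assumption.
x = np.linspace(-3, 3, 61)[:, None]
y = np.sin(x[:, 0])
model = krls_fit(x, y, lam=1e-3)
pred = krls_predict(model, x)
```

Interpretation in the GLM-like sense the abstract describes comes from differentiating the fitted surface — pointwise marginal effects — which the paper derives estimators and variances for.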
EXACT AND ASYMPTOTICALLY ROBUST PERMUTATION TESTS
2013
Given independent samples from P and Q, two-sample permutation tests allow one to construct exact level tests when the null hypothesis is P = Q. On the other hand, when comparing or testing particular parameters θ of P and Q, such as their means or medians, permutation tests need not be level α, or even approximately level α in large samples. Under very weak assumptions for comparing estimators, we provide a general test procedure whereby the asymptotic validity of the permutation test holds while retaining the exact rejection probability α in finite samples when the underlying distributions are identical. The ideas are broadly applicable and special attention is given to the k-sample problem of comparing general parameters, whereby a permutation test is constructed which is exact level α under the hypothesis of identical distributions, but has asymptotic rejection probability α under the more general null hypothesis of equality of parameters. A Monte Carlo simulation study is performed as well. A quite general theory is possible based on a coupling construction, as well as a key contiguity argument for the multinomial and multivariate hypergeometric distributions.
Journal Article
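The paper's central device — permuting a studentized statistic so the test stays asymptotically valid for comparing parameters, while remaining exact under P = Q — can be sketched for the two-sample mean problem (an illustrative simplification of the general construction):

```python
import numpy as np

def studentized_perm_test(x, y, n_perm=999, seed=0):
    """Two-sample permutation test for equality of means using the
    studentized (Welch-type) statistic.  Permuting a studentized
    statistic is what restores asymptotic level alpha when the two
    distributions differ in ways other than their means."""
    rng = np.random.default_rng(seed)

    def welch_t(a, b):
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        return (a.mean() - b.mean()) / se

    t_obs = abs(welch_t(x, y))
    pooled = np.concatenate([x, y])
    n = len(x)
    count = 1  # include the observed statistic
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if abs(welch_t(perm[:n], perm[n:])) >= t_obs:
            count += 1
    return t_obs, count / (n_perm + 1)

# Clear mean shift: the test should reject.
rng = np.random.default_rng(3)
x = rng.standard_normal(30)
y = rng.standard_normal(30) + 2.0
t_obs, p = studentized_perm_test(x, y, n_perm=999, seed=4)
```

Under identical distributions the same procedure has exact rejection probability α, which is the dual guarantee the abstract emphasizes.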
Expected Stock Returns and Variance Risk Premia
by
Tauchen, George
,
Zhou, Hao
,
Bollerslev, Tim
in
Economic models
,
Equilibrium
,
Expected returns
2009
Motivated by the implications from a stylized self-contained general equilibrium model incorporating the effects of time-varying economic uncertainty, we show that the difference between implied and realized variation, or the variance risk premium, is able to explain a nontrivial fraction of the time-series variation in post-1990 aggregate stock market returns, with high (low) premia predicting high (low) future returns. Our empirical results depend crucially on the use of "model-free," as opposed to Black-Scholes, options implied volatilities, along with accurate realized variation measures constructed from high-frequency intraday as opposed to daily data. The magnitude of the predictability is particularly strong at the intermediate quarterly return horizon, where it dominates that afforded by other popular predictor variables, such as the P/E ratio, the default spread, and the consumption-wealth ratio.
Journal Article
Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations
2009
Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
Journal Article
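INLA is far more elaborate than a single Laplace approximation — it nests them over a latent Gaussian field — but the core building block is the classical one: replace a posterior by a Gaussian matched at its mode. A minimal one-dimensional sketch (my own illustration, using finite-difference Newton steps; the paper's machinery is not reducible to this):

```python
def laplace_approx(logpost, x0, eps=1e-5):
    """Gaussian (Laplace) approximation to a 1-D posterior:
    locate the mode with Newton steps on finite-difference derivatives,
    then match the curvature there.  Returns (mode, approx variance),
    where variance = -1 / (d^2 log p / dx^2 at the mode)."""
    x = x0
    for _ in range(100):
        g = (logpost(x + eps) - logpost(x - eps)) / (2 * eps)
        h = (logpost(x + eps) - 2 * logpost(x) + logpost(x - eps)) / eps**2
        step = g / h
        x -= step
        if abs(step) < 1e-10:
            break
    h = (logpost(x + eps) - 2 * logpost(x) + logpost(x - eps)) / eps**2
    return x, -1.0 / h

# Sanity check on an unnormalized log N(3, 4) density, where the
# approximation is exact: mode 3, variance 4.
logpost = lambda t: -(t - 3.0) ** 2 / 8.0
mode, var = laplace_approx(logpost, x0=0.0)
```

The computational claim in the abstract follows from this shape: each approximation is a deterministic optimization plus a curvature evaluation, which is why the nested scheme runs in seconds where MCMC needs hours.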