Search Results

757 results for "Bayesian variable selection"
Scalable importance tempering and Bayesian variable selection
We propose a Monte Carlo algorithm to sample from high-dimensional probability distributions that combines Markov chain Monte Carlo and importance sampling. We provide a careful theoretical analysis, including guarantees on robustness to high dimensionality, explicit comparison with standard Markov chain Monte Carlo methods, and illustrations of the potential improvements in efficiency. Simple and concrete intuition is provided for when the novel scheme is expected to outperform standard schemes. When applied to Bayesian variable-selection problems, the novel algorithm is orders of magnitude more efficient than available alternative sampling schemes and enables fast and reliable fully Bayesian inference with tens of thousands of regressors.
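For readers skimming this abstract, the importance-sampling half of such a scheme can be illustrated generically. The sketch below is a plain self-normalized importance sampler, not the authors' importance-tempering algorithm; the target and proposal densities are toy choices made up for this example.

```python
import math
import random

def snis_mean(target_logpdf, proposal_sample, proposal_logpdf, n=50_000, rng=None):
    """Self-normalized importance sampling estimate of E_target[x].

    Draw x_i from the proposal, weight by w_i = p(x_i)/q(x_i), and return
    the weighted mean.  Only the ratio of densities matters, so both
    log-densities may be unnormalized.
    """
    rng = rng or random.Random(0)
    xs = [proposal_sample(rng) for _ in range(n)]
    logw = [target_logpdf(x) - proposal_logpdf(x) for x in xs]
    m = max(logw)                          # stabilize the exponentials
    w = [math.exp(lw - m) for lw in logw]
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)

# Toy target: N(1, 1); proposal: the wider N(0, 2), so weights stay bounded.
est = snis_mean(
    target_logpdf=lambda x: -0.5 * (x - 1.0) ** 2,
    proposal_sample=lambda rng: rng.gauss(0.0, 2.0),
    proposal_logpdf=lambda x: -0.5 * (x / 2.0) ** 2,
)
print(round(est, 2))  # close to the true mean 1.0
```

A heavier-tailed proposal than the target is essential here; reversing the roles would give weights with unbounded variance.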
Joint Analysis of Strain and Parent-of-Origin Effects for Recombinant Inbred Intercrosses Generated from Multiparent Populations with the Collaborative Cross as an Example
Multiparent populations (MPPs) have become popular resources for complex trait mapping because of their wider allelic diversity and larger population size compared with traditional two-way recombinant inbred (RI) strains. In mice, the Collaborative Cross (CC) is one of the most popular MPPs and is derived from eight genetically diverse inbred founder strains. The strategy of generating RI intercrosses (RIX) from MPPs in general, and from the CC in particular, can produce a large number of completely reproducible heterozygous genomes that better represent the (outbred) human population. Since both maternal and paternal haplotypes of each RIX are readily available, RIX populations are a powerful resource for studying both standing genetic and epigenetic variation in complex traits, in particular parent-of-origin (PoO) effects, which are important contributors to many complex traits. Furthermore, most complex traits are affected by more than one gene, so mapping multiple quantitative trait loci simultaneously can be advantageous. In this paper, for MPP-RIX data, taking CC-RIX as a working example, we propose a general Bayesian variable selection procedure to simultaneously search for multiple genes with founder allelic effects and PoO effects. The proposed model respects the complex relationships among RIX samples, and the performance of the proposed method is examined by extensive simulations.
ON THE COMPUTATIONAL COMPLEXITY OF HIGH-DIMENSIONAL BAYESIAN VARIABLE SELECTION
We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively mild conditions on the design matrix. We then demonstrate that the statistical criterion of posterior concentration need not imply the computational desideratum of rapid mixing of the MCMC algorithm. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee both variable-selection consistency and rapid mixing of a particular Metropolis-Hastings algorithm. The mixing time is linear in the number of covariates up to a logarithmic factor. Our proof controls the spectral gap of the Markov chain by constructing a canonical path ensemble that is inspired by the steps taken by greedy algorithms for variable selection.
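The Metropolis-Hastings samplers over model space analyzed in this abstract move between binary inclusion vectors. The sketch below is not the paper's truncated-prior algorithm: it targets a deliberately simple product-form distribution over inclusion vectors (an assumed toy posterior, chosen so the answer is checkable), but the proposal kernel, flipping one coordinate or swapping an included variable for an excluded one, is the standard baseline in this literature.

```python
import math
import random

def log_post(gamma, logit):
    """Toy log-posterior over inclusion vectors with independent coordinates.

    In a real variable-selection problem this would be the (marginal) log
    posterior of model gamma; independence keeps the example verifiable.
    """
    return sum(l for g, l in zip(gamma, logit) if g)

def add_delete_swap(logit, iters=20_000, seed=0):
    """Metropolis-Hastings over binary inclusion vectors.

    Proposals: flip one coordinate (add/delete), or exchange an included
    variable with an excluded one (swap).  Both moves are symmetric, so the
    acceptance ratio reduces to a ratio of posteriors.
    """
    rng = random.Random(seed)
    p = len(logit)
    gamma = [0] * p
    counts = [0] * p
    for _ in range(iters):
        prop = gamma[:]
        if rng.random() < 0.5:                 # add/delete: flip one coordinate
            prop[rng.randrange(p)] ^= 1
        else:                                  # swap an active and an inactive index
            ones = [j for j in range(p) if gamma[j]]
            zeros = [j for j in range(p) if not gamma[j]]
            if ones and zeros:
                prop[rng.choice(ones)] = 0
                prop[rng.choice(zeros)] = 1
        if math.log(rng.random() + 1e-300) < log_post(prop, logit) - log_post(gamma, logit):
            gamma = prop
        for j in range(p):
            counts[j] += gamma[j]
    return [c / iters for c in counts]

# Empirical inclusion frequencies should approach sigmoid(logit),
# i.e. roughly (0.88, 0.12, 0.50) for logits (2, -2, 0).
freq = add_delete_swap([2.0, -2.0, 0.0])
print([round(f, 2) for f in freq])
```

On this independent toy target the chain mixes trivially; the paper's point is precisely that on realistic posteriors such chains can mix slowly unless the prior is designed with care.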
An RShiny app for modelling environmental DNA data: accounting for false positive and false negative observation error
Environmental DNA (eDNA) surveys have become a popular tool for assessing the distribution of species. However, it is known that false positive and false negative observation error can occur at both stages of eDNA surveys, namely the field sampling stage and laboratory analysis stage. We present an RShiny app that implements the Griffin et al. (2020) statistical method, which accounts for false positive and false negative errors in both stages of eDNA surveys that target single species using quantitative PCR methods. Following Griffin et al. (2020), we employ a Bayesian approach and perform efficient Bayesian variable selection to identify important predictors for the probability of species presence as well as the probabilities of observation error at either stage. We demonstrate the RShiny app using a data set on great crested newts collected by Natural England in 2018, and we identify water quality, pond area, fish presence, macrophyte cover and frequency of drying as important predictors for species presence at a site. The state-of-the-art statistical method that we have implemented is the only one that has specifically been developed for the purposes of modelling false negative and false positive observation error in eDNA data. Our RShiny app is user-friendly, requires no prior knowledge of R and fits the models very efficiently. Therefore, it should be part of the tool-kit of any researcher or practitioner who is collecting or analysing eDNA data.
A Selective Review of Multi-Level Omics Data Integration Using Variable Selection
High-throughput technologies have been used to generate a large amount of omics data. In the past, single-level analysis has been extensively conducted, where the omics measurements at different levels, including mRNA, microRNA, CNV and DNA methylation, are analyzed separately. As the molecular complexity of disease etiology exists at all of these levels, integrative analysis offers an effective way to borrow strength across multi-level omics data and can be more powerful than single-level analysis. In this article, we focus on reviewing existing multi-omics integration studies, paying special attention to variable selection methods. We first summarize published reviews on integrating multi-level omics data. Next, after a brief overview of variable selection methods, we review existing supervised, semi-supervised and unsupervised integrative analyses within parallel and hierarchical integration studies, respectively. The strengths and limitations of the methods are discussed in detail; no existing integration method dominates the rest. Computational aspects are also investigated. The review concludes with possible limitations and future directions for multi-level omics data integration.
Is Seeing Believing? A Practitioner’s Perspective on High-Dimensional Statistical Inference in Cancer Genomics Studies
Variable selection methods have been extensively developed for and applied to cancer genomics data to identify important omics features associated with complex disease traits, including cancer outcomes. However, the reliability and reproducibility of the findings are in question if valid inferential procedures are not available to quantify the uncertainty of the findings. In this article, we provide a gentle but systematic review of high-dimensional frequentist and Bayesian inferential tools under sparse models which can yield uncertainty quantification measures, including confidence (or Bayesian credible) intervals, p values and false discovery rates (FDR). Connections in high-dimensional inferences between the two realms have been fully exploited under the “unpenalized loss function + penalty term” formulation for regularization methods and the “likelihood function × shrinkage prior” framework for regularized Bayesian analysis. In particular, we advocate for robust Bayesian variable selection in cancer genomics studies due to its ability to accommodate disease heterogeneity in the form of heavy-tailed errors and structured sparsity while providing valid statistical inference. The numerical results show that robust Bayesian analysis incorporating exact sparsity has yielded not only superior estimation and identification results but also valid Bayesian credible intervals under nominal coverage probabilities compared with alternative methods, especially in the presence of heavy-tailed model errors and outliers.
Bayesian Sparse Group Selection
This article proposes a Bayesian approach for the sparse group selection problem in the regression model. In this problem, the variables are partitioned into different groups. It is assumed that only a small number of groups are active for explaining the response variable, and it is further assumed that within each active group only a small number of variables are active. We adopt a Bayesian hierarchical formulation, where each candidate group is associated with a binary variable indicating whether the group is active or not. Within each group, each candidate variable is likewise associated with a binary indicator. Thus, the sparse group selection problem can be solved by sampling from the posterior distribution of the two layers of indicator variables. We adopt a group-wise Gibbs sampler for posterior sampling. We demonstrate the proposed method by simulation studies as well as real examples. The simulation results show that the proposed method performs better than the sparse group Lasso in terms of selecting the active groups as well as identifying the active variables within the selected groups. Supplementary materials for this article are available online.
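The two-layer indicator formulation in this abstract has a simple structure worth spelling out: a variable enters the model only when both its group-level and its variable-level indicator are on. The sketch below uses invented group and variable names purely to illustrate that encoding; it is not taken from the paper.

```python
# Hypothetical partition of eight variables into three groups.
groups = {"g1": ["x1", "x2", "x3"], "g2": ["x4", "x5"], "g3": ["x6", "x7", "x8"]}

# Layer 1: is the group active?  Layer 2: is the variable active within its group?
# (In the Bayesian formulation these are the two layers of binary indicators
# sampled by the group-wise Gibbs sampler; here they are fixed for illustration.)
group_ind = {"g1": 1, "g2": 0, "g3": 1}
var_ind = {"x1": 1, "x2": 0, "x3": 1, "x4": 1, "x5": 1, "x6": 0, "x7": 1, "x8": 0}

# A variable is in the model only if BOTH indicators are on: g2's variables
# drop out wholesale even though x4 and x5 have their own indicators set.
active = [v for g, vs in groups.items() for v in vs if group_ind[g] and var_ind[v]]
print(active)  # ['x1', 'x3', 'x7']
```

This product structure is what distinguishes sparse group selection from ordinary variable selection: turning one group indicator off zeroes an entire block at once.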
STRONG SELECTION CONSISTENCY OF BAYESIAN VECTOR AUTOREGRESSIVE MODELS BASED ON A PSEUDO-LIKELIHOOD APPROACH
Vector autoregressive (VAR) models aim to capture linear temporal interdependencies among multiple time series. They have been widely used in macroeconomics and financial econometrics and more recently have found novel applications in functional genomics and neuroscience. These applications have also accentuated the need to investigate the behavior of the VAR model in a high-dimensional regime, which will provide novel insights into the role of temporal dependence for regularized estimates of the model's parameters. However, hardly anything is known regarding posterior model selection consistency for Bayesian VAR models in such regimes. In this work, we develop a pseudo-likelihood based Bayesian approach for consistent variable selection in high-dimensional VAR models by considering hierarchical normal priors on the autoregressive coefficients, as well as on the model space. We establish strong selection consistency of the proposed method, namely that the posterior probability assigned to the true underlying VAR model converges to one under high-dimensional scaling where the dimension p of the VAR system grows nearly exponentially with the sample size n. Further, the result is established under mild regularity conditions on the problem parameters. Finally, as a by-product of these results, we also establish strong selection consistency for the sparse high-dimensional linear regression model with serially correlated regressors and errors.
A Sparse Latent Class Model for Cognitive Diagnosis
Cognitive diagnostic models (CDMs) are latent variable models developed to infer latent skills, knowledge, or personalities that underlie responses to educational, psychological, and social science tests and measures. Recent research focused on theory and methods for using sparse latent class models (SLCMs) in an exploratory fashion to infer the latent processes and structure underlying responses. We report new theoretical results about sufficient conditions for generic identifiability of SLCM parameters. An important contribution for practice is that our new generic identifiability conditions are more likely to be satisfied in empirical applications than existing conditions that ensure strict identifiability. Learning the underlying latent structure can be formulated as a variable selection problem. We develop a new Bayesian variable selection algorithm that explicitly enforces generic identifiability conditions and monotonicity of item response functions to ensure valid posterior inference. We present Monte Carlo simulation results to support accurate inferences and discuss the implications of our findings for future SLCM research and educational testing.
Adaptive MCMC for Bayesian Variable Selection in Generalised Linear Models and Survival Models
Developing an efficient computational scheme for high-dimensional Bayesian variable selection in generalised linear models and survival models has always been a challenging problem due to the absence of closed-form solutions to the marginal likelihood. The Reversible Jump Markov Chain Monte Carlo (RJMCMC) approach can be employed to jointly sample models and coefficients, but the effective design of the trans-dimensional jumps of RJMCMC can be challenging, making it hard to implement. Alternatively, the marginal likelihood can be derived conditional on latent variables using a data-augmentation scheme (e.g., Pólya-gamma data augmentation for logistic regression) or using other estimation methods. However, suitable data-augmentation schemes are not available for every generalised linear model and survival model, and estimating the marginal likelihood using a Laplace approximation or a correlated pseudo-marginal method can be computationally expensive. In this paper, three main contributions are presented. Firstly, we present an extended Pointwise implementation of the Adaptive Random Neighbourhood Informed proposal (PARNI) to efficiently sample models directly from the marginal posterior distributions of generalised linear models and survival models. Secondly, in light of the recently proposed approximate Laplace approximation, we describe an efficient and accurate estimation method for the marginal likelihood that involves adaptive parameters. Additionally, we describe a new method to adapt the algorithmic tuning parameters of the PARNI proposal by replacing Rao-Blackwellised estimates with the combination of a warm-start estimate and the ergodic average. We present numerous numerical results from simulated data and eight high-dimensional genetic mapping datasets to showcase the efficiency of the novel PARNI proposal compared with the baseline add-delete-swap proposal.