Catalogue Search | MBRL

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

by Nishimura, Akihiko; Dunson, David B.; Lu, Jianfeng in Algorithms, Continuity (mathematics), Embedding

2020

Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling from ordinal parameters through the embedding of probability mass functions into continuous spaces. We motivate our approach through a theory of discontinuous Hamiltonian dynamics and develop a corresponding numerical solver. The proposed solver is the first of its kind, with a remarkable ability to exactly preserve the Hamiltonian. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.

Journal Article
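Below is a one-dimensional sketch of the kind of integrator the abstract describes. It is written from our understanding of the construction and is not the authors' code. It assumes a Laplace momentum with kinetic energy K(p) = |p|/m. The parameter moves by ε·sign(p)/m only if K exceeds the jump ΔU in potential energy, and K then drops by exactly ΔU. Otherwise the momentum is reflected. The following are all illustrative assumptions:

- the ordinal target, a Poisson pmf with a hypothetical rate `LAM = 3`, embedded as a step density on [0, ∞) via floor;
- the step-size jitter range;
- every function name.

```python
import math
import random

LAM = 3.0  # hypothetical Poisson rate for the toy ordinal target


def potential(theta):
    """U(theta) = -log pmf(floor(theta)): a Poisson(LAM) pmf embedded as a step density on [0, inf)."""
    if theta < 0.0:
        return math.inf
    k = math.floor(theta)
    return -(k * math.log(LAM) - LAM - math.lgamma(k + 1))


def coordinate_step(theta, p, eps, mass=1.0):
    """One step for a parameter with Laplace momentum, i.e. kinetic energy K(p) = |p| / mass."""
    direction = 1.0 if p > 0 else -1.0
    proposal = theta + eps * direction / mass
    delta_u = potential(proposal) - potential(theta)
    if abs(p) / mass > delta_u:
        # Enough kinetic energy to cross any jump: move and pay delta_u out of K.
        return proposal, p - direction * mass * delta_u
    # Not enough energy (or the proposal leaves the support): reflect the momentum.
    return theta, -p


def run_dhmc(n_iter, theta0=0.5, n_steps=10, eps_range=(0.5, 1.5), seed=0):
    """Toy 1-D sampler; returns integer draws floor(theta) and the largest |H_end - H_start|."""
    rng = random.Random(seed)
    theta, draws, max_energy_error = theta0, [], 0.0
    for _ in range(n_iter):
        p = rng.expovariate(1.0) * rng.choice((-1.0, 1.0))  # Laplace(0, 1) momentum
        eps = rng.uniform(*eps_range)  # jittered step size
        h_start = potential(theta) + abs(p)
        for _ in range(n_steps):
            theta, p = coordinate_step(theta, p, eps)
        h_end = potential(theta) + abs(p)
        max_energy_error = max(max_energy_error, abs(h_end - h_start))
        # H is conserved up to rounding, so the Metropolis acceptance probability is 1.
        draws.append(math.floor(theta))
    return draws, max_energy_error
```

Each step either trades ΔU for kinetic energy exactly or reflects. As a result, H = U + K is unchanged up to rounding, which is why the sketch accepts every trajectory. With several parameters, the same update would be applied one coordinate at a time.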


Robust Bayesian Inference via Coarsening

by Dunson, David B.; Miller, Jeffrey W. in Algorithms, Autoregressive models, Bayesian analysis

2019

The standard approach to Bayesian inference is based on the assumption that the distribution of the data belongs to the chosen model class. However, even a small violation of this assumption can have a large impact on the outcome of a Bayesian procedure. We introduce a novel approach to Bayesian inference that improves robustness to small departures from the model: rather than conditioning on the event that the observed data are generated by the model, one conditions on the event that the model generates data close to the observed data, in a distributional sense. When closeness is defined in terms of relative entropy, the resulting "coarsened" posterior can be approximated by simply tempering the likelihood, that is, by raising the likelihood to a fractional power. Thus, inference can usually be implemented via standard algorithms, and one can even obtain analytical solutions when using conjugate priors. Some theoretical properties are derived, and we illustrate the approach with real and simulated data using mixture models and autoregressive models of unknown order. Supplementary materials for this article are available online.

Journal Article
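For a conjugate model the tempering shortcut is easy to see. The sketch below assumes a Binomial likelihood with a Beta(a, b) prior (our choice of example). It treats the power ζ as a user-supplied number in (0, 1]. How ζ should be chosen from the coarsening tolerance is not shown here. Function names are hypothetical.

```python
def beta_binomial_posterior(a, b, successes, trials, zeta=1.0):
    """Beta posterior parameters when the Binomial likelihood is raised to the power zeta.

    zeta = 1 gives the standard posterior; 0 < zeta < 1 gives a tempered ("power")
    posterior, the form the abstract says approximates the coarsened posterior.
    """
    if not 0.0 < zeta <= 1.0:
        raise ValueError("zeta must lie in (0, 1]")
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    failures = trials - successes
    # Conjugacy survives tempering: p^k (1-p)^(n-k) becomes p^(zeta*k) (1-p)^(zeta*(n-k)).
    return a + zeta * successes, b + zeta * failures


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var
```

With ζ = 1 the function returns the usual Beta(a + k, b + n - k) posterior. A smaller ζ keeps roughly the same center but widens the posterior, as if the data carried less information.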


Dirichlet–Laplace Priors for Optimal Shrinkage

by Pillai, Natesh S.; Dunson, David B.; Pati, Debdeep in Bayesian, Bayesian analysis, Bayesian method

2015

Penalized regression methods, such as L₁ regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In contrast to the frequentist literature, little is known about the properties of such priors and the convergence and concentration of the corresponding posterior distribution. In this article, we propose a new class of Dirichlet–Laplace priors, which possess optimal posterior concentration and lead to efficient posterior computation. Finite sample performance of Dirichlet–Laplace priors relative to alternatives is assessed in simulated and real data examples.

Journal Article
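To make the prior concrete, a draw from a Dirichlet–Laplace prior can be simulated in a few lines. The hierarchy is our recollection of the paper's DL_a construction and should be checked against it:

- φ ~ Dir(a, ..., a);
- τ ~ Gamma(na, rate 1/2);
- θ_j | φ, τ ~ DE(φ_j τ).

The value of `a`, the seed, and the function name are illustrative.

```python
import random


def sample_dirichlet_laplace(n, a=0.5, seed=None):
    """Draw one vector theta of length n from a Dirichlet-Laplace prior DL_a.

    Assumed hierarchy (verify against the paper before relying on it):
      phi ~ Dirichlet(a, ..., a)
      tau ~ Gamma(shape = n * a, rate = 1/2)
      theta_j | phi, tau ~ DoubleExponential(scale = phi_j * tau)
    """
    rng = random.Random(seed)
    # Dirichlet draw via normalized independent Gamma(a, 1) variables.
    g = [rng.gammavariate(a, 1.0) for _ in range(n)]
    total = sum(g)
    phi = [x / total for x in g]
    # random.gammavariate takes (shape, scale); rate 1/2 corresponds to scale 2.
    tau = rng.gammavariate(n * a, 2.0)
    theta = []
    for p in phi:
        # Laplace(0, scale): exponential magnitude with mean `scale` and a random sign.
        magnitude = p * tau * rng.expovariate(1.0)
        theta.append(magnitude if rng.random() < 0.5 else -magnitude)
    return phi, tau, theta
```

Smaller values of `a` concentrate the Dirichlet weights φ on a few coordinates. Most θ_j are then shrunk close to zero while a few remain large.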


Generalized Double Pareto Shrinkage

by Dunson, David B.; Lee, Jaeyong; Armagan, Artin in A posteriori knowledge, Analytical estimating, Density estimation

2013

We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inferences in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero like the Laplace density, it also has a Student's t-like tail behavior. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. We investigate the properties of the maximum a posteriori estimator, as sparse estimation plays an important role in many problems, reveal connections with some well-established regularization procedures, and show some asymptotic results. The performance of the prior is tested through simulations and an application.

Journal Article
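Assume the generalized double Pareto density is parameterized as f(β | ξ, α) = (2ξ)^(-1) (1 + |β|/(αξ))^(-(α+1)), with scale ξ > 0 and shape α > 0. This is our reading of the paper and should be verified. Under that assumption its CDF has a closed form, which gives an inverse-CDF sampler. The sketch shows the finite cusp at zero, 1/(2ξ), and the polynomial (Student's t-like) tails mentioned in the abstract. Function names are hypothetical.

```python
import random


def gdp_pdf(beta, xi=1.0, alpha=1.0):
    """Assumed generalized double Pareto density with scale xi > 0 and shape alpha > 0."""
    return (1.0 / (2.0 * xi)) * (1.0 + abs(beta) / (alpha * xi)) ** (-(alpha + 1.0))


def gdp_cdf(beta, xi=1.0, alpha=1.0):
    """Closed-form CDF obtained by integrating gdp_pdf; symmetric about zero."""
    tail = (1.0 + abs(beta) / (alpha * xi)) ** (-alpha)  # equals P(|B| > |beta|)
    return 1.0 - 0.5 * tail if beta >= 0 else 0.5 * tail


def gdp_sample(xi=1.0, alpha=1.0, rng=random):
    """Inverse-CDF draw: |B| solves (1 + |B|/(alpha*xi))^(-alpha) = U with U ~ Uniform(0, 1]."""
    u = 1.0 - rng.random()  # in (0, 1], avoids u = 0
    magnitude = alpha * xi * (u ** (-1.0 / alpha) - 1.0)
    return magnitude if rng.random() < 0.5 else -magnitude
```

This sampler only draws from the prior. Posterior computation in the paper uses a Gibbs sampler that this sketch does not reproduce.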


Nonparametric Bayes Modeling of Populations of Networks

by Dunson, David B.; Durante, Daniele; Vogelstein, Joshua T. in Bayesian analysis, Bayesian nonparametrics, Bayesian theory

2017

Replicated network data are increasingly available in many research fields. For example, in connectomic applications, interconnections among brain regions are collected for each patient under study, motivating statistical models which can flexibly characterize the probabilistic generative mechanism underlying these network-valued data. Available models for a single network are not designed specifically for inference on the entire probability mass function of a network-valued random variable and therefore lack flexibility in characterizing the distribution of relevant topological structures. We propose a flexible Bayesian nonparametric approach for modeling the population distribution of network-valued data. The joint distribution of the edges is defined via a mixture model that reduces dimensionality and efficiently incorporates network information within each mixture component by leveraging latent space representations. The formulation leads to an efficient Gibbs sampler and provides simple and coherent strategies for inference and goodness-of-fit assessments. We provide theoretical results on the flexibility of our model and illustrate improved performance, compared to state-of-the-art models, in simulations and in an application to human brain networks. Supplementary materials for this article are available online.

Journal Article
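The abstract's key structural idea can be written down directly for small undirected networks: a mixture over components, each of which assigns a probability to every edge. Each component's edge probabilities take a logistic latent-space form. It combines a shared edge-level term `z` with a weighted inner product of node coordinates `x` under weights `lam`. That form is an assumption for illustration, not necessarily the paper's exact parameterization. The mixture formula P(A) = Σ_h ν_h Π_l π_l^(h)^A_l (1 - π_l^(h))^(1 - A_l) is the generic mixture of product-Bernoulli models.

```python
import itertools
import math


def edge_probs_latent_space(z, x, lam):
    """Hypothetical component edge probabilities: logistic(z_uv + sum_r lam_r * x_ur * x_vr)."""
    probs = {}
    for u, v in itertools.combinations(range(len(x)), 2):
        eta = z[(u, v)] + sum(l * x[u][r] * x[v][r] for r, l in enumerate(lam))
        probs[(u, v)] = 1.0 / (1.0 + math.exp(-eta))
    return probs


def network_pmf(adj, weights, components):
    """Mixture pmf of one undirected network.

    adj maps each node pair (u, v), u < v, to 0 or 1; components is a list of
    edge-probability dicts; edges are independent within a component.
    """
    total = 0.0
    for nu, probs in zip(weights, components):
        lik = 1.0
        for edge, p in probs.items():
            lik *= p if adj[edge] else 1.0 - p
        total += nu * lik
    return total
```

The pmf sums to one over all 2^(V(V-1)/2) networks on V nodes, which can only be checked by enumeration for very small V.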


PPA: Principal parcellation analysis for brain connectomes and multiple traits

by Dunson, David B.; Li, Meng; Liu, Rongjie in Brain - diagnostic imaging, Brain networks, Brain parcellation

2023

• We propose a new approach called principal parcellation analysis (PPA) for predicting human traits using the brain connectome.
• PPA uses tractography-based brain connectome representation, clustering fiber endpoints to create a data-driven white matter parcellation.
• With vector-valued representation, PPA facilitates easier statistical analysis, improving trait prediction power and model parsimony.
• Applications to data from the Human Connectome Project demonstrate the effectiveness of PPA.

Our understanding of the structure of the brain and its relationships with human traits is largely determined by how we represent the structural connectome. Standard practice divides the brain into regions of interest (ROIs) and represents the connectome as an adjacency matrix having cells measuring connectivity between pairs of ROIs. Statistical analyses are then heavily driven by the (largely arbitrary) choice of ROIs. In this article, we propose a human trait prediction framework utilizing a tractography-based representation of the brain connectome, which clusters fiber endpoints to define a data-driven white matter parcellation targeted to explain variation among individuals and predict human traits. This leads to Principal Parcellation Analysis (PPA), representing individual brain connectomes by compositional vectors building on a basis system of fiber bundles that captures the connectivity at the population level. PPA eliminates the need to choose atlases and ROIs a priori, and provides a simpler, vector-valued representation that facilitates easier statistical analysis compared to the complex graph structures encountered in classical connectome analyses. We illustrate the proposed approach through applications to data from the Human Connectome Project (HCP) and show that PPA connectomes improve power in predicting human traits over state-of-the-art methods based on classical connectomes, while dramatically improving parsimony and maintaining interpretability. Our PPA package is publicly available on GitHub and can be implemented routinely for diffusion image data.

Journal Article
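The compositional representation in the PPA abstract can be sketched as follows. It assumes population-level bundle centroids have already been obtained by clustering pooled fiber endpoints. That clustering step, and how the actual PPA package performs it, is not shown. Each fiber is summarized by its two 3-D endpoints and assigned to the nearest centroid, irrespective of endpoint order. A subject's connectome then becomes the vector of fractions of fibers per bundle. All names are hypothetical.

```python
import math


def endpoint_distance(fiber, centroid):
    """Distance between 6-vectors (x1, y1, z1, x2, y2, z2), ignoring which end is listed first."""
    flipped = tuple(fiber[3:]) + tuple(fiber[:3])
    return min(math.dist(fiber, centroid), math.dist(flipped, centroid))


def compositional_vector(fibers, centroids):
    """Fraction of one subject's fibers assigned to each population-level bundle centroid."""
    counts = [0] * len(centroids)
    for fiber in fibers:
        best = min(range(len(centroids)), key=lambda k: endpoint_distance(fiber, centroids[k]))
        counts[best] += 1
    return [c / len(fibers) for c in counts]
```

The output is a simple vector on the probability simplex, so standard regression or prediction tools can be applied to it, in line with the abstract's point about easier statistical analysis.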


Mutual information: Measuring nonlinear dependence in longitudinal epidemiological data

by van den Boom, Willem; Raghunathan, Karthik; Dunson, David B. in Analysis, Biology and Life Sciences, Blood pressure

2023

Given a large clinical database of longitudinal patient information including many covariates, it is computationally prohibitive to consider all types of interdependence between patient variables of interest. This challenge motivates the use of mutual information (MI), a statistical summary of data interdependence with appealing properties that make it a suitable alternative or addition to correlation for identifying relationships in data. MI: (i) captures all types of dependence, both linear and nonlinear, (ii) is zero only when random variables are independent, (iii) serves as a measure of relationship strength (similar to but more general than R²), and (iv) is interpreted the same way for numerical and categorical data. Unfortunately, MI typically receives little to no attention in introductory statistics courses and is more difficult than correlation to estimate from data. In this article, we motivate the use of MI in the analyses of epidemiologic data, while providing a general introduction to estimation and interpretation. We illustrate its utility through a retrospective study relating intraoperative heart rate (HR) and mean arterial pressure (MAP). We: (i) show postoperative mortality is associated with decreased MI between HR and MAP and (ii) improve existing postoperative mortality risk assessment by including MI and additional hemodynamic statistics.

Journal Article
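A minimal plug-in estimator illustrates properties (i) and (iv) from the abstract. It works unchanged on numbers or category labels, and it detects a purely nonlinear relationship. This is an illustrative sketch for discrete data, not the estimator used in the article. Continuous signals such as HR and MAP would first need binning or a dedicated continuous estimator. The plug-in estimate is also biased upward in small samples.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in (empirical) mutual information, in nats, between two discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # I(X;Y) = sum p(x,y) log[p(x,y) / (p(x) p(y))], written with raw counts:
    # p(x,y) / (p(x) p(y)) = c * n / (count_x * count_y).
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())
```

Take xs = [-2, -1, 0, 1, 2] and ys = [x * x for x in xs]. The Pearson correlation is exactly zero, yet the estimated MI is positive because y is a deterministic function of x.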


MCMC for Imbalanced Categorical Data

by Johndrow, James E.; Smith, Aaron; Dunson, David B. in Adaptive algorithms, Advertisements, Advertising

2019

Many modern applications collect highly imbalanced categorical data, with some categories relatively rare. Bayesian hierarchical models combat data sparsity by borrowing information, while also quantifying uncertainty. However, posterior computation presents a fundamental barrier to routine use; a single class of algorithms does not work well in all settings and practitioners waste time trying different types of Markov chain Monte Carlo (MCMC) approaches. This article was motivated by an application to quantitative advertising in which we encountered extremely poor computational performance for data augmentation MCMC algorithms but obtained excellent performance for adaptive Metropolis. To obtain a deeper understanding of this behavior, we derive theoretical results on the computational complexity of commonly used data augmentation algorithms and the Random Walk Metropolis algorithm for highly imbalanced binary data. In this regime, our results show computational complexity of Metropolis is logarithmic in sample size, while data augmentation is polynomial in sample size. The root cause of this poor performance of data augmentation is a discrepancy between the rates at which the target density and MCMC step sizes concentrate. Our methods also show that MCMC algorithms that exhibit a similar discrepancy will fail in large samples, a result with substantial practical impact. Supplementary materials for this article are available online.

Journal Article
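Here is a toy version of the imbalanced regime. It uses an intercept-only logistic model with `successes` events among `trials` observations and a N(0, 10²) prior on the intercept. The posterior is sampled by random walk Metropolis with a fixed step size. The model, prior and step size are our assumptions. The article's adaptive Metropolis and data augmentation samplers are not reproduced. Because the likelihood depends on the data only through the two counts, the example runs quickly even with 100,000 observations.

```python
import math
import random


def log_posterior(beta, successes, trials, prior_sd=10.0):
    """Intercept-only logistic log posterior (up to a constant) with a N(0, prior_sd^2) prior."""
    # Numerically stable log(1 + e^beta).
    softplus = beta + math.log1p(math.exp(-beta)) if beta > 0 else math.log1p(math.exp(beta))
    return successes * beta - trials * softplus - 0.5 * (beta / prior_sd) ** 2


def random_walk_metropolis(successes, trials, n_iter=20000, step=0.5, beta0=0.0, seed=0):
    """Fixed-step Gaussian random walk Metropolis; returns draws and the acceptance rate."""
    rng = random.Random(seed)
    beta = beta0
    current = log_posterior(beta, successes, trials)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = beta + step * rng.gauss(0.0, 1.0)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, exp(candidate - current)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter
```

With 10 events in 100,000 trials the draws settle near logit(10/100000) ≈ -9.2 after a short burn-in.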


Bayesian Manifold Regression

by Yang, Yun; Dunson, David B. in 62-07, 62H30, 65U05

2016

There is increasing interest in the problem of nonparametric regression with high-dimensional predictors. When the number of predictors D is large, one encounters a daunting problem in attempting to estimate a D-dimensional surface based on limited data. Fortunately, in many applications, the support of the data is concentrated on a d-dimensional subspace with d ≪ D. Manifold learning attempts to estimate this subspace. Our focus is on developing computationally tractable and theoretically supported Bayesian nonparametric regression methods in this context. When the subspace corresponds to a locally-Euclidean compact Riemannian manifold, we show that a Gaussian process regression approach can be applied that leads to the minimax optimal adaptive rate in estimating the regression function under some conditions. The proposed model bypasses the need to estimate the manifold, and can be implemented using standard algorithms for posterior computation in Gaussian processes. Finite sample performance is illustrated in a data analysis example.

Journal Article
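A small example illustrates the point that Gaussian process regression can run directly in the ambient space, without estimating the manifold. It has three ingredients:

- data on a circle (a 1-dimensional manifold) embedded in R^10;
- a squared-exponential kernel on ambient Euclidean distance;
- the standard GP posterior mean k_*ᵀ (K + σ²I)^(-1) y.

The kernel, the hand-fixed bandwidth and noise variance, and the test function are illustrative assumptions. The sketch makes no claim about the adaptive rates in the paper.

```python
import math


def sq_exp_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel on ambient Euclidean distance (no manifold estimation)."""
    return math.exp(-math.dist(a, b) ** 2 / (2.0 * lengthscale ** 2))


def cholesky(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite matrix m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_cholesky(low, y):
    """Solve (L L^T) x = y by forward then backward substitution."""
    n = len(y)
    z = [0.0] * n
    for i in range(n):
        z[i] = (y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior_mean(train_x, train_y, test_x, noise_var=1e-3, lengthscale=0.5):
    """Posterior mean of a zero-mean GP at each test point."""
    n = len(train_x)
    k = [[sq_exp_kernel(train_x[i], train_x[j], lengthscale) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(k), train_y)
    return [sum(sq_exp_kernel(t, train_x[i], lengthscale) * alpha[i] for i in range(n))
            for t in test_x]


def circle_point(t, ambient_dim=10):
    """A point on the unit circle (a 1-D manifold) embedded in R^ambient_dim."""
    return [math.cos(t), math.sin(t)] + [0.0] * (ambient_dim - 2)
```

The predictor only ever sees 10-dimensional coordinates. Its accuracy comes from the data lying on a low-dimensional set, not from any explicit estimate of that set.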


Inferring taxonomic placement from DNA barcoding aiding in discovery of new taxa

by Dunson, David B.; Zito, Alessandro; Rigon, Tommaso in Algorithms, Arthropods, Bayesian analysis

2023

Predicting the taxonomic affiliation of DNA sequences collected from biological samples is a fundamental step in biodiversity assessment. This task is performed by leveraging existing databases containing reference DNA sequences endowed with a taxonomic identification. However, environmental sequences can be from organisms that are either unknown to science or for which there are no reference sequences available. Thus, taxonomic novelty of a sequence needs to be accounted for when doing classification. We propose Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow unobserved taxa to be discovered at each taxonomic rank. Using a simple product multinomial likelihood with conjugate Dirichlet priors at the lowest rank, a highly flexible supervised algorithm is developed to provide a probabilistic prediction of the taxa placement of each sequence at each rank. As an illustration, we run our algorithm on a carefully annotated library of Finnish arthropods (FinBOL). To assess the ability of BayesANT to recognize novelty and to predict known taxonomic affiliations correctly, we test it on two training‐test splitting scenarios, each with a different proportion of taxa unobserved in training. We show how our algorithm attains accurate predictions and reliably quantifies classification uncertainty, especially when many sequences in the test set are affiliated to taxa unknown in training. By enabling taxonomic predictions for DNA barcodes to identify unseen branches, we believe BayesANT will be of broad utility as a tool for DNA metabarcoding within bioinformatics pipelines.

Journal Article
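Below is a heavily simplified, single-rank sketch of the ingredients named in the abstract. It combines a product-multinomial likelihood over aligned loci, with a symmetric Dirichlet prior at each locus, and a species-sampling prior that reserves mass for a taxon absent from the library. We use the Pitman-Yor predictive rule as one example of such a prior and assume equal-length aligned sequences. The hierarchy across ranks, the choice of `theta`, `sigma` and `alpha`, and all names are hypothetical rather than BayesANT's actual implementation.

```python
import math

NUCLEOTIDES = "ACGT"


def log_predictive(seq, refs, alpha=1.0):
    """Log posterior-predictive probability of `seq` under a taxon with reference sequences `refs`.

    Each locus has an independent categorical distribution over A/C/G/T with a symmetric
    Dirichlet(alpha) prior, so the predictive at a locus is (count + alpha) / (n_refs + 4 alpha).
    With no references this reduces to the prior predictive 1/4 per locus.
    """
    total = 0.0
    for locus, base in enumerate(seq):
        count = sum(1 for r in refs if r[locus] == base)
        total += math.log((count + alpha) / (len(refs) + len(NUCLEOTIDES) * alpha))
    return total


def classify(seq, library, theta=1.0, sigma=0.0, alpha=1.0):
    """Posterior probabilities over known taxa plus a 'NEW' taxon under a Pitman-Yor style prior."""
    n = sum(len(refs) for refs in library.values())
    k = len(library)
    log_scores = {}
    for taxon, refs in library.items():
        prior = (len(refs) - sigma) / (theta + n)
        log_scores[taxon] = math.log(prior) + log_predictive(seq, refs, alpha)
    log_scores["NEW"] = math.log((theta + sigma * k) / (theta + n)) + log_predictive(seq, [], alpha)
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(weights.values())
    return {t: w / z for t, w in weights.items()}
```

A query that matches the reference sequences of a known taxon is assigned to it. A query unlike every reference puts most of its mass on `NEW`.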
