96 result(s) for "Prünster, Igor"
DISTRIBUTION THEORY FOR HIERARCHICAL PROCESSES
Hierarchies of discrete probability measures are remarkably popular as nonparametric priors in applications, arguably due to two key properties: (i) they naturally represent multiple heterogeneous populations; (ii) they produce ties across populations, resulting in a shrinkage property often described as “sharing of information.” In this paper, we establish a distribution theory for hierarchical random measures that are generated via normalization, thus encompassing both the hierarchical Dirichlet and hierarchical Pitman–Yor processes. These results provide a probabilistic characterization of the induced (partially exchangeable) partition structure, including the distribution and the asymptotics of the number of partition sets, and a complete posterior characterization. They are obtained by representing hierarchical processes in terms of completely random measures, and by applying a novel technique for deriving the associated distributions. Moreover, they also serve as building blocks for new simulation algorithms, and we derive marginal and conditional algorithms for Bayesian inference.
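The normalization construction described above can be sketched in a truncated, finite-dimensional form: in a hierarchical Dirichlet process, group-level weights are drawn from a Dirichlet distribution centered on shared global stick-breaking weights, which is precisely what produces ties across populations. The parameter values below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(concentration, K, rng):
    """Truncated stick-breaking weights of a Dirichlet process."""
    v = rng.beta(1.0, concentration, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()  # renormalize to absorb the truncation error

K, gamma_, alpha = 20, 5.0, 3.0
beta = stick_breaking(gamma_, K, rng)        # global weights of the base G0
pi = rng.dirichlet(alpha * beta, size=4)     # group weights: G_j | G0 ~ DP(alpha, G0)

# Shared atoms induce ties across the four populations:
samples = [rng.choice(K, size=100, p=row) for row in pi]
shared = set(samples[0]) & set(samples[1])   # atoms common to groups 0 and 1
```

Because all groups reuse the atoms of G0, observations from different populations can coincide, which is the "sharing of information" property the abstract refers to.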
MEASURING DEPENDENCE IN THE WASSERSTEIN DISTANCE FOR BAYESIAN NONPARAMETRIC MODELS
The proposal and study of dependent Bayesian nonparametric models has been one of the most active research lines in the last two decades, with random vectors of measures representing a natural and popular tool to define them. Nonetheless, a principled approach to understanding and quantifying the associated dependence structure is still missing. We devise a general, and not model-specific, framework to achieve this task for random measure based models, which consists in: (a) quantifying the dependence of a random vector of probabilities in terms of its closeness to exchangeability, which corresponds to the maximally dependent coupling with the same marginal distributions, that is, the comonotonic vector; (b) recasting the problem in terms of the underlying random measures (in the same Fréchet class) and quantifying the closeness to comonotonicity; (c) defining a distance based on the Wasserstein metric, which is ideally suited for spaces of measures, to measure the dependence in a principled way. Several results, among the first in this area, are obtained. In particular, useful bounds in terms of the underlying Lévy intensities are derived by relying on compound Poisson approximations. These are then specialized to popular models in the Bayesian literature, leading to interesting insights.
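In one dimension, the Wasserstein distance between two empirical measures of equal size has a simple closed form via the comonotonic (sorted) coupling mentioned in the abstract. The following sketch is illustrative only; it is not the paper's random-measure construction.

```python
import numpy as np

def wasserstein_1d(x, y):
    # For equal-size empirical measures, the 1-D W1 distance is the
    # average absolute difference of the sorted samples, i.e. the
    # cost of the comonotonic (maximally dependent) coupling.
    x, y = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(x - y)))

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, 10000)
b = rng.normal(0.5, 1.0, 10000)
d = wasserstein_1d(a, b)  # close to the mean shift of 0.5
```

The comonotonic coupling attains the minimum transport cost in one dimension, which is why sorting suffices here.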
Robustifying Bayesian Nonparametric Mixtures for Count Data
Our motivating application stems from surveys of natural populations and is characterized by large spatial heterogeneity in the counts, which makes parametric approaches to modeling local animal abundance too restrictive. We adopt a Bayesian nonparametric approach based on mixture models and innovate with respect to the popular Dirichlet process mixture of Poisson kernels by increasing model flexibility at the level of both the kernel and the nonparametric mixing measure. This allows us to derive accurate and robust estimates of the distribution of local animal abundance and of the corresponding clusters. The application and a simulation study for different scenarios also yield some general methodological implications. Adding flexibility solely at the level of the mixing measure does not improve inferences, since its impact is severely limited by the rigidity of the Poisson kernel, with considerable consequences in terms of bias. However, once a kernel more flexible than the Poisson is chosen, inferences can be robustified by choosing a prior more general than the Dirichlet process. Therefore, to improve the performance of Bayesian nonparametric mixtures for count data, one has to enrich the model simultaneously at both levels: the kernel and the mixing measure.
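The rigidity of the Poisson kernel discussed above stems from its forced equality of mean and variance. A quick sketch with hypothetical overdispersed counts (negative binomial, a common more flexible alternative) makes the gap visible:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical overdispersed counts with variance >> mean, mimicking
# spatially heterogeneous animal-abundance data:
counts = rng.negative_binomial(n=2, p=0.2, size=5000)
mean, var = counts.mean(), counts.var()
# A Poisson kernel forces variance == mean; the ratio below measures
# how badly that assumption fails (here it is 1/p = 5 in expectation).
overdispersion = var / mean
```

A Poisson fit to such data would have to absorb the excess variance through the mixing measure alone, which, as the abstract notes, introduces bias.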
SURVIVAL ANALYSIS VIA HIERARCHICALLY DEPENDENT MIXTURE HAZARDS
Hierarchical nonparametric processes are popular tools for defining priors on collections of probability distributions, which induce dependence across multiple samples. In survival analysis problems, one is typically interested in modeling the hazard rates, rather than the probability distributions themselves, and the currently available methodologies are not applicable. Here, we fill this gap by introducing a novel, and analytically tractable, class of multivariate mixtures whose distribution acts as a prior for the vector of sample-specific baseline hazard rates. The dependence is induced through a hierarchical specification of the mixing random measures that ultimately corresponds to a composition of random discrete combinatorial structures. Our theoretical results allow us to develop a full Bayesian analysis for this class of models, which can also account for right-censored survival data and covariates, and we also show posterior consistency. In particular, we emphasize that the posterior characterization we achieve is the key for devising both marginal and conditional algorithms for evaluating Bayesian inferences of interest. The effectiveness of our proposal is illustrated through some synthetic and real data examples.
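A mixture hazard of the kind modeled above takes the form h(t) = ∫ k(t; y) μ(dy) for a kernel k and a random measure μ. The finite sketch below uses the classical Dykstra–Laud kernel k(t; y) = 1{t ≥ y} with hypothetical jumps, showing how a discrete random measure induces a nondecreasing hazard and the corresponding survival function; it is an illustration, not the paper's hierarchical construction.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical jumps and locations of a truncated random measure mu:
J = rng.gamma(0.5, 1.0, size=30)      # jump sizes
y = rng.uniform(0.0, 10.0, size=30)   # jump locations

def hazard(t):
    # Dykstra-Laud kernel k(t; y) = 1{t >= y}: a nondecreasing step hazard
    return float(np.sum(J * (t >= y)))

def survival(t, grid=2000):
    # S(t) = exp(-integral_0^t h(s) ds), via the trapezoidal rule
    ts = np.linspace(0.0, t, grid)
    h = np.array([hazard(s) for s in ts])
    cum = np.sum(0.5 * (h[1:] + h[:-1]) * np.diff(ts))
    return float(np.exp(-cum))

s_early, s_late = survival(0.5), survival(5.0)
```

In the hierarchical version, the mixing measures of several samples share a common random base measure, which induces the dependence across the sample-specific hazards.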
Bayesian inference with dependent normalized completely random measures
The proposal and study of dependent prior processes has been a major research focus in the recent Bayesian nonparametric literature. In this paper, we introduce a flexible class of dependent nonparametric priors, investigate their properties and derive a suitable sampling scheme which allows their concrete implementation. The proposed class is obtained by normalizing dependent completely random measures, where the dependence arises by virtue of a suitable construction of the Poisson random measures underlying the completely random measures. We first provide general distributional results for the whole class of dependent completely random measures and then we specialize them to two specific priors, which represent the natural candidates for concrete implementation due to their analytic tractability: the bivariate Dirichlet and normalized σ-stable processes. Our analytical results, and in particular the partially exchangeable partition probability function, also form the basis for the determination of a Markov chain Monte Carlo algorithm for drawing posterior inferences, which reduces to the well-known Blackwell-MacQueen Pólya urn scheme in the univariate case. Such an algorithm can be used for density estimation and for analyzing the clustering structure of the data and is illustrated through a real two-sample dataset example.
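The normalization step at the heart of this construction can be sketched in a finite-dimensional approximation: normalizing the jumps of a gamma completely random measure yields (approximately) the weights of a Dirichlet process, whose discreteness produces ties among the draws. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, K = 2.0, 2000
# Jumps of a finite approximation to a gamma completely random measure:
jumps = rng.gamma(alpha / K, 1.0, size=K)
weights = jumps / jumps.sum()        # normalization: a random probability measure
atoms = rng.normal(size=K)           # iid atoms from a standard-normal base measure
draws = rng.choice(atoms, size=1000, p=weights)
n_distinct = len(set(draws))         # ties occur with positive probability
```

Other choices of the jump distribution (e.g. σ-stable) give other normalized random measures; the dependent versions in the paper couple the underlying Poisson random measures across samples.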
Bayesian nonparametric inference beyond the Gibbs-type framework
The definition and investigation of general classes of nonparametric priors has recently been an active research line in Bayesian statistics. Among the various proposals, the Gibbs-type family, which includes the Dirichlet process as a special case, stands out as the most tractable class of nonparametric priors for exchangeable sequences of observations. This is the consequence of a key simplifying assumption on the learning mechanism, which, however, has no justification other than ensuring mathematical tractability. In this paper, we remove such an assumption and investigate a general class of random probability measures going beyond the Gibbs-type framework. More specifically, we present a nonparametric hierarchical structure based on transformations of completely random measures, which extends the popular hierarchical Dirichlet process. This class of priors preserves a good degree of tractability, given that we are able to determine the fundamental quantities for Bayesian inference. In particular, we derive the induced partition structure and the prediction rules and characterize the posterior distribution. These theoretical results are also crucial to devise both a marginal and a conditional algorithm for posterior inference. An illustration concerning prediction in genomic sequencing is also provided.
Asymptotic behavior of the number of distinct values in a sample from the geometric stick-breaking process
Discrete random probability measures are a key ingredient of Bayesian nonparametric inference. A sample generates ties with positive probability and a fundamental object of both theoretical and applied interest is the corresponding number of distinct values. The growth rate can be determined from the rate of decay of the small frequencies, implying that, when the decreasingly ordered frequencies admit a tractable form, the asymptotics of the number of distinct values can be conveniently assessed. We focus on the geometric stick-breaking process and investigate the effect of the distribution for the success probability on the asymptotic behavior of the number of distinct values. A whole range of logarithmic behaviors is obtained by appropriately tuning the prior. A two-term expansion is also derived and illustrated in a comparison with a larger family of discrete random probability measures having an additional parameter given by the scale of the negative binomial distribution.
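For a fixed success probability p (a simplification: the paper puts a prior on p), the geometric stick-breaking weights are p(1-p)^(k-1), so atom indices can be sampled directly as geometric variables and the logarithmic growth of the number of distinct values checked empirically:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 0.25, 20000
# Atom k of the geometric stick-breaking process has weight p*(1-p)**(k-1),
# so sampled atom indices are iid geometric random variables:
labels = rng.geometric(p, size=n)
k_n = len(np.unique(labels))            # number of distinct values in the sample
# Logarithmic growth: K_n grows like log(n) / log(1/(1-p))
predicted = np.log(n) / np.log(1.0 / (1.0 - p))
```

Placing a prior on p, as the paper does, modulates the constant and produces the whole range of logarithmic behaviors mentioned above.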
Bayesian modeling via discrete nonparametric priors
The availability of complex-structured data has sparked new research directions in statistics and machine learning. Bayesian nonparametrics is at the forefront of this trend thanks to two crucial features: its coherent probabilistic framework, which naturally leads to principled prediction and uncertainty quantification, and its infinite-dimensionality, which frees it from parametric restrictions and ensures full modeling flexibility. In this paper, we provide a concise overview of Bayesian nonparametrics starting from its foundations and the Dirichlet process, the most popular nonparametric prior. We describe the use of the Dirichlet process in species discovery, density estimation, and clustering problems. Among the many generalizations of the Dirichlet process proposed in the literature, we single out the Pitman–Yor process, and compare it to the Dirichlet process. Their different features are showcased with real-data illustrations. Finally, we consider more complex data structures, which require dependent versions of these models. One of the most effective strategies to achieve this goal is represented by hierarchical constructions. We highlight the role of the dependence structure in the borrowing of information and illustrate its effectiveness on unbalanced datasets.
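The different clustering behavior of the two priors compared above shows up in their predictive (Chinese-restaurant-type) urn schemes: the Dirichlet process yields logarithmically many clusters in the sample size, while the Pitman–Yor process yields polynomially many. A minimal simulation sketch with illustrative parameter values:

```python
import numpy as np

def crp_clusters(n, theta, sigma, rng):
    """Number of clusters after n draws from the two-parameter CRP.
    sigma = 0 recovers the Dirichlet process; sigma in (0, 1) gives
    the Pitman-Yor process."""
    sizes = []  # current cluster sizes
    for i in range(n):
        # Existing cluster j: (n_j - sigma) / (theta + i);
        # new cluster: (theta + sigma * K) / (theta + i).
        probs = np.array(sizes + [theta + sigma * len(sizes)], dtype=float)
        probs[:len(sizes)] -= sigma
        probs /= theta + i
        k = rng.choice(len(probs), p=probs)
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return len(sizes)

rng = np.random.default_rng(5)
n = 2000
k_dp = crp_clusters(n, theta=1.0, sigma=0.0, rng=rng)  # ~ log(n) clusters
k_py = crp_clusters(n, theta=1.0, sigma=0.5, rng=rng)  # ~ n**sigma clusters
```

The extra discount parameter sigma also changes the tail behavior of the cluster sizes, one of the features the overview contrasts on real data.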
Modeling with Normalized Random Measure Mixture Models
The Dirichlet process mixture model and more general mixtures based on discrete random probability measures have been shown to be flexible and accurate models for density estimation and clustering. The goal of this paper is to illustrate the use of normalized random measures as mixing measures in nonparametric hierarchical mixture models and to point out how possible computational issues can be successfully addressed. To this end, we first provide a concise and accessible introduction to normalized random measures with independent increments. Then, we explain in detail a particular way of sampling from the posterior using the Ferguson-Klass representation. We develop a thorough comparative analysis for location-scale mixtures that considers a set of alternatives for the mixture kernel and for the nonparametric component. Simulation results indicate that normalized random measure mixtures potentially represent a valid default choice for density estimation problems. As a byproduct of this study, an R package to fit these models was produced and is available in the Comprehensive R Archive Network (CRAN).
A New Estimator of the Discovery Probability
Species sampling problems have a long history in ecological and biological studies, and a number of issues need to be addressed, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety. Such inferential problems have also recently emerged in genomic applications; however, these exhibit some peculiar features that make them more challenging: specifically, one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes), of which only a small portion has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it allows one to achieve the degree of flexibility typically needed in this framework. Based on an observed sample of size n, the focus is on predicting a key aspect of the outcome of an additional sample of size m, namely, the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n + m + 1)th observation, species that have been observed with any given frequency in the enlarged sample of size n + m. Such an estimator admits a closed-form expression that can be evaluated exactly. The result we obtain allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species as m increases. Natural applications are represented by the estimation of the probability of discovering rare genes within genomic libraries, and the results are illustrated by means of two expressed sequence tags datasets.
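As a classical frequentist benchmark for the quantity studied above (not the paper's Bayesian nonparametric estimator), the Turing-Good estimate of the discovery probability, i.e., the chance that the next observation is a previously unseen species, is m1/n, where m1 is the number of species observed exactly once:

```python
import numpy as np
from collections import Counter

def good_turing_new_species(sample):
    """Turing-Good estimate of the probability that the next observation
    is a new species: m1 / n, with m1 the number of singletons.
    (A classical benchmark, not the estimator derived in the paper.)"""
    freqs = Counter(sample).values()
    m1 = sum(1 for f in freqs if f == 1)
    return m1 / len(sample)

rng = np.random.default_rng(6)
# Hypothetical heavy-tailed species abundances, as in a genomic library:
sample = rng.zipf(2.0, size=1000)
p_new = good_turing_new_species(sample)
```

The estimator in the paper generalizes this idea: it conditions on the basic sample of size n and covers discovery at any future step n + m + 1 and any frequency, not just unseen species at the next draw.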