Catalogue Search | MBRL

Sufficientness Postulates for Gibbs-Type Priors and Hierarchical Generalizations

by Battiston, M.; Favaro, S.; Bacallado, S. in Bayesian analysis, Dirichlet problem, Probability

2017

A fundamental problem in Bayesian nonparametrics consists of selecting a prior distribution by assuming that the corresponding predictive probabilities obey certain properties. An early discussion of such a problem, although in a parametric framework, dates back to the seminal work by English philosopher W. E. Johnson, who introduced a noteworthy characterization for the predictive probabilities of the symmetric Dirichlet prior distribution. This is typically referred to as Johnson's "sufficientness" postulate. In this paper, we review some nonparametric generalizations of Johnson's postulate for a class of nonparametric priors known as species sampling models. In particular, we revisit and discuss the "sufficientness" postulate for the two parameter Poisson–Dirichlet prior within the more general framework of Gibbs-type priors and their hierarchical generalizations.
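
For orientation, the predictive probabilities the abstract refers to can be written out as follows. This is a sketch of the standard textbook forms, not a quotation from the paper; the symbols (\alpha, \sigma, \theta, n_j, k, k_n) are notation assumed here.

```latex
% Johnson's parametric setting: symmetric Dirichlet prior with parameter
% \alpha > 0 over k categories; n_j counts how many of the first n
% observations fall in category j.
\Pr(X_{n+1} = j \mid X_1, \dots, X_n) = \frac{n_j + \alpha}{n + k\alpha}

% Two-parameter Poisson--Dirichlet prior with \sigma \in [0, 1) and
% \theta > -\sigma; k_n distinct species have been observed so far,
% with frequencies n_1, \dots, n_{k_n}.
\Pr(\text{new species} \mid X_1, \dots, X_n) = \frac{\theta + k_n \sigma}{\theta + n},
\qquad
\Pr(\text{species } j \mid X_1, \dots, X_n) = \frac{n_j - \sigma}{\theta + n}
```

In both cases the probability of re-observing category or species j depends on the sample only through n_j and n, which is the property the "sufficientness" postulate formalizes.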

Journal Article

Effects of sampling protocol on the shapes of species richness curves

by Dengler, Jürgen; Oldeland, Jens in Animal and plant ecology, Animal, plant and microbial ecology, Biogeography

2010

Scheiner (Journal of Biogeography, 2009, 36, 2005-2008) criticized several issues regarding the typology and analysis of species richness curves that were brought forward by Dengler (Journal of Biogeography, 2009, 36, 728-744). In order to test these two sets of views in greater detail, we used a simulation model of ecological communities to demonstrate the effects of different sampling schemes on the shapes of species richness curves and their extrapolation capability. We simulated five random communities with 100 species on a 64 x 64 grid using random fields. Then we sampled species-area relationships (SARs, contiguous plots) as well as species-sampling relationships (SSRs, non-contiguous plots) from these communities, both for the full extent and the central quarter of the grid. Finally, we fitted different functions (power, quadratic power, logarithmic, Michaelis-Menten, Lomolino) to the obtained data and assessed their goodness-of-fit (Akaike weights) and their extrapolation capability (deviation of the predicted value from the true value). We found that power functions gave the best fit for SARs, while for SSRs saturation functions performed better. Curves constructed from data of 32² grid cells gave reasonable extrapolations for 64² grid cells for SARs, irrespective of whether samples were gathered from the full extent or the centre only. By contrast, SSRs worked well for extrapolation only in the latter case. SARs and SSRs have fundamentally different curve shapes. Both sampling strategies can be used for extrapolation of species richness to a target area, but only SARs allow for extrapolation to a larger area than that sampled. These results confirm a fundamental difference between SARs and area-based SSRs and thus support their typological differentiation.

Journal Article

Which function describes the species-area relationship best? A review and empirical evaluation

by Dengler, Jürgen in Analytical methods, Animal and plant ecology, Animal, plant and microbial ecology

2009

The aims of this study are to resolve terminological confusion around different types of species-area relationships (SARs) and their delimitation from species sampling relationships (SSRs), to provide a comprehensive overview of models and analytical methods for SARs, to evaluate these theoretically and empirically, and to suggest a more consistent approach for the treatment of species-area data. The study locations are the Curonian Spit in north-west Russia and archipelagos world-wide. First, I review various typologies for SARs and SSRs as well as mathematical models, fitting procedures and goodness-of-fit measures applied to SARs. This results in a list of 23 function types, which are applicable both for untransformed (S) and for log-transformed (log S) species richness. Then, example data sets for nested plots in continuous vegetation (n = 14) and islands (n = 6) are fitted to a selection of 12 function types (linear, power, logarithmic, saturation, sigmoid) both for S and for log S. The suitability of these models is assessed with Akaike's information criterion for S and log S, and with a newly proposed metric that addresses extrapolation capability. SARs, which provide species numbers for different areas and have no upper asymptote, must be distinguished from SSRs, which approach the species richness of one single area asymptotically. Among SARs, nested plots in continuous ecosystems, non-nested plots in continuous ecosystems, and isolates can be distinguished. For the SARs of the empirical data sets, the normal and quadratic power functions as well as two of the sigmoid functions (Lomolino, cumulative beta-P) generally performed well. The normal power function (fitted for S) was particularly suitable for predicting richness values over ten-fold increases in area. Linear, logarithmic, convex saturation and logistic functions generally were inappropriate. However, the two sigmoid models produced unstable results with arbitrary parameter estimates, and the quadratic power function resulted in decreasing richness values for large areas. Based on theoretical considerations and empirical results, I suggest that the power law should be used to describe and compare any type of SAR while at the same time testing whether the exponent z changes with spatial scale. In addition, one should be aware that power-law parameters are significantly influenced by methodology.
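
As a rough illustration of the power-law model recommended above, S = c · A^z, the following sketch fits the two parameters by ordinary least squares in log-log space (the "log S" variant) and then extrapolates to a ten-fold larger area. The function names and the data are hypothetical and synthetic; this is not the fitting procedure or the data of the study, which also considers fits for untransformed S.

```python
import math


def fit_power_law(areas, richness):
    """Least-squares fit of log S = log c + z * log A; returns (c, z)."""
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    z = sxy / sxx                       # slope = exponent z
    c = math.exp(mean_y - z * mean_x)   # intercept = log c
    return c, z


def predict_richness(c, z, area):
    """Evaluate the fitted power law S = c * A**z at a given area."""
    return c * area ** z


# Synthetic nested-plot data generated from S = 10 * A**0.25 (made up).
areas = [1.0, 4.0, 16.0, 64.0, 256.0]
richness = [10.0 * a ** 0.25 for a in areas]

c, z = fit_power_law(areas, richness)
extrapolated = predict_richness(c, z, 10 * max(areas))
print(f"c = {c:.3f}, z = {z:.3f}, S at ten-fold area = {extrapolated:.1f}")
```

Fitting the same data separately over small and large area ranges and comparing the resulting z values is one simple way to probe whether the exponent changes with spatial scale.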

Journal Article

DISTRIBUTION THEORY FOR HIERARCHICAL PROCESSES

by Camerlenghi, Federico; Prünster, Igor; Orbanz, Peter in Algorithms, Bayesian analysis, Computer simulation

2019

Hierarchies of discrete probability measures are remarkably popular as nonparametric priors in applications, arguably due to two key properties: (i) they naturally represent multiple heterogeneous populations; (ii) they produce ties across populations, resulting in a shrinkage property often described as “sharing of information.” In this paper, we establish a distribution theory for hierarchical random measures that are generated via normalization, thus encompassing both the hierarchical Dirichlet and hierarchical Pitman–Yor processes. These results provide a probabilistic characterization of the induced (partially exchangeable) partition structure, including the distribution and the asymptotics of the number of partition sets, and a complete posterior characterization. They are obtained by representing hierarchical processes in terms of completely random measures, and by applying a novel technique for deriving the associated distributions. Moreover, they also serve as building blocks for new simulation algorithms, and we derive marginal and conditional algorithms for Bayesian inference.

Journal Article

Inferring taxonomic placement from DNA barcoding aiding in discovery of new taxa

by Dunson, David B.; Zito, Alessandro; Rigon, Tommaso in Algorithms, Arthropods, Bayesian analysis

2023

Predicting the taxonomic affiliation of DNA sequences collected from biological samples is a fundamental step in biodiversity assessment. This task is performed by leveraging existing databases containing reference DNA sequences endowed with a taxonomic identification. However, environmental sequences can be from organisms that are either unknown to science or for which there are no reference sequences available. Thus, taxonomic novelty of a sequence needs to be accounted for when doing classification. We propose Bayesian nonparametric taxonomic classifiers, BayesANT, which use species sampling model priors to allow unobserved taxa to be discovered at each taxonomic rank. Using a simple product multinomial likelihood with conjugate Dirichlet priors at the lowest rank, a highly flexible supervised algorithm is developed to provide a probabilistic prediction of the taxa placement of each sequence at each rank. As an illustration, we run our algorithm on a carefully annotated library of Finnish arthropods (FinBOL). To assess the ability of BayesANT to recognize novelty and to predict known taxonomic affiliations correctly, we test it on two training‐test splitting scenarios, each with a different proportion of taxa unobserved in training. We show how our algorithm attains accurate predictions and reliably quantifies classification uncertainty, especially when many sequences in the test set are affiliated to taxa unknown in training. By enabling taxonomic predictions for DNA barcodes to identify unseen branches, we believe BayesANT will be of broad utility as a tool for DNA metabarcoding within bioinformatics pipelines.

Journal Article

Slowdowns in Diversification Rates from Real Phylogenies May Not be Real

by Cusimano, Natalie; Renner, Susanne S. in Biodiversity, Biological taxonomies, Diversification rate

2010

Studies of diversification patterns often find a slowing in lineage accumulation toward the present. This seemingly pervasive pattern of rate downturns has been taken as evidence for adaptive radiations, density-dependent regulation, and metacommunity species interactions. The significance of rate downturns is evaluated with statistical tests (the γ statistic and Monte Carlo constant rates (MCCR) test; birth–death likelihood models and Akaike Information Criterion [AIC] scores) that rely on null distributions, which assume that the included species are a random sample of the entire clade. Sampling in real phylogenies, however, often is nonrandom because systematists try to include early-diverging species or representatives of previous intrataxon classifications. We studied the effects of biased sampling, structured sampling, and random sampling by experimentally pruning simulated trees (60 and 150 species) as well as a completely sampled empirical tree (58 species) and then applying the γ statistic/MCCR test and birth–death likelihood models/AIC scores to assess rate changes. For trees with random species sampling, the true model (i.e., the one fitting the complete phylogenies) could be inferred in most cases. Oversampling deep nodes, however, strongly biases inferences toward downturns, with simulations of structured and biased sampling suggesting that this occurs when sampling percentages drop below 80%. The magnitude of the effect and the sensitivity of diversification rate models is such that a useful rule of thumb may be not to infer rate downturns from real trees unless they have >80% species sampling.

Journal Article

Long term effect of dune fixation by three shrub species on vegetation and Orthoptera communities in arid dunes of Algeria

by Djemai, Imene; Guendouz Benrima, Atika; Sba, Bent El Hadi in Analysis, Orthoptera, Plantations

2025

The present work analyzed flora and Orthoptera communities in three different plantations in the arid area of El Mesrane, Djelfa Wilaya, Algeria, that had been established 31 years earlier as a means of dune fixation. The plantations consisted of retam (Retama raetam (Forssk.) Webb), tamarix (Tamarix gallica L.) and prickly pear cactus or opuntia (Opuntia ficus-indica (L.) Miller). Samplings performed in 2014 recorded 46 plant species and 32 Orthoptera species, of which members of the Acrididae family were dominant. Community indices showed a decreasing gradient for both flora and Orthoptera from the retam to the opuntia to the tamarix plantation. Likewise, a growing proportion of plants were linked to mobile sands along the same gradient. The highest flora diversity, found in the retam plantation, is mainly due to the therophytes of European open tufts and plants belonging to perennial xerophile fallow lands. The rarest species of grasshoppers are mainly present in the retam plantation, in contrast to the tamarix plantation where half of the characteristic species are among the most common, suggesting that specialist and generalist species are not distributed randomly but rather according to the stability of the habitat.

Journal Article

A New Estimator of the Discovery Probability

by Prünster, Igor; Favaro, Stefano; Lijoi, Antonio in Algorithms, Bayesian nonparametrics, Biodiversity

2012

Species sampling problems have a long history in ecological and biological studies and a number of issues, including the evaluation of species richness, the design of sampling experiments, and the estimation of rare species variety, are to be addressed. Such inferential problems have recently emerged also in genomic applications, where, however, they exhibit some peculiar features that make them more challenging: specifically, one has to deal with very large populations (genomic libraries) containing a huge number of distinct species (genes) and only a small portion of the library has been sampled (sequenced). These aspects motivate the Bayesian nonparametric approach we undertake, since it allows one to achieve the degree of flexibility typically needed in this framework. Based on an observed sample of size n, focus will be on prediction of a key aspect of the outcome from an additional sample of size m, namely, the so-called discovery probability. In particular, conditionally on an observed basic sample of size n, we derive a novel estimator of the probability of detecting, at the (n + m + 1)th observation, species that have been observed with any given frequency in the enlarged sample of size n + m. Such an estimator admits a closed-form expression that can be exactly evaluated. The result we obtain allows us to quantify both the rate at which rare species are detected and the achieved sample coverage of abundant species, as m increases. Natural applications are represented by the estimation of the probability of discovering rare genes within genomic libraries and the results are illustrated by means of two expressed sequence tags datasets.

Journal Article

A New Method for Handling Missing Species in Diversification Analysis Applicable to Randomly or Nonrandomly Sampled Phylogenies

by Cusimano, Natalie; Renner, Susanne S.; Stadler, Tanja in Araceae - classification, Araceae - genetics, Asia, Southeastern

2012

Chronograms from molecular dating are increasingly being used to infer rates of diversification and their change over time. A major limitation in such analyses is incomplete species sampling that moreover is usually nonrandom. While the widely used γ statistic with the Monte Carlo constant-rates test or the birth-death likelihood analysis with the ΔAICrc test statistic are appropriate for comparing the fit of different diversification models in phylogenies with random species sampling, no objective automated method has been developed for fitting diversification models to nonrandomly sampled phylogenies. Here, we introduce a novel approach, CorSiM, which involves simulating missing splits under a constant rate birth-death model and allows the user to specify whether species sampling in the phylogeny being analyzed is random or nonrandom. The completed trees can be used in subsequent model-fitting analyses. This is fundamentally different from previous diversification rate estimation methods, which were based on null distributions derived from the incomplete trees. CorSiM is automated in an R package and can easily be applied to large data sets. We illustrate the approach in two Araceae clades, one with a random species sampling of 52% and one with a nonrandom sampling of 55%. In the latter clade, the CorSiM approach detects and quantifies an increase in diversification rate, whereas classic approaches prefer a constant rate model; in the former clade, results do not differ among methods (as indeed expected since the classic approaches are valid only for randomly sampled phylogenies). The CorSiM method greatly reduces the type I error in diversification analysis, but type II error remains a methodological problem.

Journal Article

Bayesian non-parametric inference for species variety with a two-parameter Poisson-Dirichlet process prior

by Favaro, Stefano; Mena, Ramsés H.; Prünster, Igor in Asymptotics, Bayesian analysis, Bayesian method

2009

A Bayesian non-parametric methodology has been recently proposed to deal with the issue of prediction within species sampling problems. Such problems concern the evaluation, conditional on a sample of size n, of the species variety featured by an additional sample of size m. Genomic applications pose the additional challenge of having to deal with large values of both n and m. In such a case the computation of the Bayesian non-parametric estimators is cumbersome and prevents their implementation. We focus on the two-parameter Poisson-Dirichlet model and provide completely explicit expressions for the corresponding estimators, which can be easily evaluated for any sizes of n and m. We also study the asymptotic behaviour of the number of new species conditionally on the observed sample: such an asymptotic result, combined with a suitable simulation scheme, allows us to derive asymptotic highest posterior density intervals for the estimates of interest. Finally, we illustrate the implementation of the proposed methodology by the analysis of five expressed sequence tags data sets.
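
To make the idea of an explicit estimator concrete, the sketch below computes the posterior expected number of new species in an additional sample of size m under the two-parameter Poisson-Dirichlet model, given j distinct species in a first sample of size n. It assumes the standard predictive rule in which a new species appears with probability (θ + kσ)/(θ + n) when k species have been seen in n observations. The closed form used here follows from that rule by induction; it is not quoted from the abstract, and the function names and parameter values are hypothetical.

```python
import math


def expected_new_species(n, j, m, sigma, theta):
    """Closed form: (j + theta/sigma) * ((theta+n+sigma)_m / (theta+n)_m - 1),
    where (x)_m denotes the rising factorial. Requires 0 < sigma < 1,
    theta > -sigma and n >= 1."""
    if not 0.0 < sigma < 1.0 or theta <= -sigma or n < 1:
        raise ValueError("need 0 < sigma < 1, theta > -sigma, n >= 1")
    a = theta + n
    # Log of the ratio of rising factorials, computed with log-gamma so that
    # large n and m do not overflow.
    log_ratio = (math.lgamma(a + sigma + m) - math.lgamma(a + sigma)
                 - math.lgamma(a + m) + math.lgamma(a))
    return (j + theta / sigma) * math.expm1(log_ratio)


def expected_new_species_recursive(n, j, m, sigma, theta):
    """Same quantity, accumulated step by step from the predictive rule.
    Valid because the new-species probability is linear in the species count."""
    expected = 0.0
    for i in range(m):
        expected += (theta + sigma * (j + expected)) / (theta + n + i)
    return expected


# Hypothetical inputs: 30 distinct species among n = 100 observations.
closed = expected_new_species(n=100, j=30, m=50, sigma=0.5, theta=1.0)
stepwise = expected_new_species_recursive(n=100, j=30, m=50, sigma=0.5, theta=1.0)
print(closed, stepwise)
```

The log-gamma form costs the same for any n and m, whereas the step-by-step version needs m iterations; the latter is included only as a cross-check.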

Journal Article
