Catalogue Search | MBRL
Explore the vast range of titles available.
11,263 result(s) for "Binomial distributions"
Using the negative binomial distribution to model overdispersion in ecological count data
by Mäntyniemi, Samu; Lindén, Andreas
in aggregation behavior, Animal and plant ecology, Animal Migration
2011
A Poisson process is a commonly used starting point for modeling stochastic variation of ecological count data around a theoretical expectation. However, data typically show more variation than implied by the Poisson distribution. Such overdispersion is often accounted for by using models with different assumptions about how the variance changes with the expectation. The choice of these assumptions can naturally have apparent consequences for statistical inference. We propose a parameterization of the negative binomial distribution, where two overdispersion parameters are introduced to allow for various quadratic mean–variance relationships, including the ones assumed in the most commonly used approaches. Using bird migration as an example, we present hypothetical scenarios on how overdispersion can arise due to sampling, flocking behavior or aggregation, environmental variability, or combinations of these factors. For all considered scenarios, mean–variance relationships can be appropriately described by the negative binomial distribution with two overdispersion parameters. To illustrate, we apply the model to empirical migration data with a high level of overdispersion, gaining clearly different model fits with different assumptions about mean–variance relationships. The proposed framework can be a useful approximation for modeling marginal distributions of independent count data in likelihood-based analyses.
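As a rough illustration of the kind of quadratic mean–variance relationship the abstract describes (a sketch, not the authors' parameterization; the overdispersion parameters a and b below are hypothetical names), the following Python snippet draws counts from a negative binomial whose variance is a quadratic function of its mean:

```python
# Sketch: negative binomial counts with a quadratic mean-variance
# relationship, var = mu + a*mu + b*mu^2 (hypothetical parameter names;
# the article's exact parameterization may differ).
import numpy as np
from scipy import stats

def nbinom_from_mean_var(mu, var):
    """Map a (mean, variance) pair with var > mu to SciPy's (n, p)."""
    p = mu / var
    n = mu * mu / (var - mu)
    return n, p

a, b = 0.5, 0.1                      # hypothetical overdispersion parameters
mu = 20.0                            # expected count
var = mu + a * mu + b * mu ** 2      # quadratic mean-variance relationship
n, p = nbinom_from_mean_var(mu, var)
counts = stats.nbinom.rvs(n, p, size=10_000, random_state=0)
print(counts.mean(), counts.var())   # close to mu and var, respectively
```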
Journal Article
Faster permutation inference in brain imaging
by Winkler, Anderson M.; Smith, Stephen M.; Ridgway, Gerard R.
in Algorithms, Binomial distribution, Brain - diagnostic imaging
2016
Permutation tests are increasingly being used as a reliable method for inference in neuroimaging analysis. However, they are computationally intensive. For small, non-imaging datasets, recomputing a model thousands of times is seldom a problem, but for large, complex models this can be prohibitively slow, even with the availability of inexpensive computing power. Here we exploit properties of statistics used with the general linear model (GLM) and their distributions to obtain accelerations irrespective of generic software or hardware improvements. We compare the following approaches: (i) performing a small number of permutations; (ii) estimating the p-value as a parameter of a negative binomial distribution; (iii) fitting a generalised Pareto distribution to the tail of the permutation distribution; (iv) computing p-values based on the expected moments of the permutation distribution, approximated from a gamma distribution; (v) direct fitting of a gamma distribution to the empirical permutation distribution; and (vi) permuting a reduced number of voxels, with completion of the remainder using low rank matrix theory. Using synthetic data we assessed the different methods in terms of their error rates, power, agreement with a reference result, and the risk of taking a different decision regarding the rejection of the null hypotheses (known as the resampling risk). We also conducted a re-analysis of a voxel-based morphometry study as a real-data example. All methods yielded exact error rates. Likewise, power was similar across methods. Resampling risk was higher for methods (i), (iii) and (v). For comparable resampling risks, the method in which no permutations are done (iv) was the absolute fastest. All methods produced visually similar maps for the real data, with stronger effects being detected in the family-wise error rate corrected maps by (iii) and (v), and generally similar to the results seen in the reference set. Overall, for uncorrected p-values, method (iv) was found to be the best as long as symmetric errors can be assumed. In all other settings, including for family-wise error corrected p-values, we recommend the tail approximation (iii). The methods considered are freely available in the tool PALM (Permutation Analysis of Linear Models).
Highlights:
• Permutation methods can be accelerated through additional statistical approaches.
• Six approaches are described and assessed.
• Methods can be 100 times faster than in the non-accelerated case.
• Recommendations are provided for various common scenarios.
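A minimal sketch of approach (ii) from this abstract, under the assumption that permutations can be stopped once a fixed number of exceedances of the observed statistic has been seen (`draw_null_stat` is a hypothetical user-supplied function; this is not the PALM implementation):

```python
# Sketch: sequential (negative binomial) estimation of a permutation
# p-value. Permute until r exceedances of the observed statistic occur,
# then estimate p ~= r / (number of permutations used).
import numpy as np

def sequential_perm_pvalue(observed, draw_null_stat, r=10, max_perms=100_000):
    exceedances, n_perms = 0, 0
    while exceedances < r and n_perms < max_perms:
        n_perms += 1
        if draw_null_stat() >= observed:
            exceedances += 1
    if exceedances == r:                      # stopped early: NB estimator
        return r / n_perms
    return (exceedances + 1) / (n_perms + 1)  # fallback: standard MC estimate

rng = np.random.default_rng(1)
print(sequential_perm_pvalue(1.5, rng.standard_normal))  # toy null statistic
```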
Journal Article
ESC: an efficient error-based stopping criterion for kriging-based reliability analysis methods
by Shafieezadeh, Abdollah; Wang, Zeyu
in Accuracy, Binomial distribution, Computational Mathematics and Numerical Analysis
2019
The ever-increasing complexity of numerical models and associated computational demands have challenged classical reliability analysis methods. Surrogate model-based reliability analysis techniques, and in particular those using kriging meta-models, have gained considerable attention recently for their ability to achieve high accuracy and computational efficiency. However, existing stopping criteria, which are used to terminate the training of surrogate models, do not directly relate to the error in estimated failure probabilities. This limitation can lead to high computational demands because of unnecessary calls to costly performance functions (e.g., involving finite element models) or potentially inaccurate estimates of failure probability due to premature termination of the training process. Here, we propose the error-based stopping criterion (ESC) to address these limitations. First, it is shown that the total number of wrong sign estimations of the performance function for candidate design samples by kriging, S, follows a Poisson binomial distribution. This finding is then used to estimate the lower and upper bounds of S for a given confidence level for sets of candidate design samples classified by kriging as safe and unsafe. An upper bound on the error of the estimated failure probability is subsequently derived from the probabilistic properties of the Poisson binomial distribution, and this bound is implemented in the kriging-based reliability analysis method as the stopping criterion. The efficiency and robustness of ESC are investigated here using five benchmark reliability analysis problems. Results indicate that the proposed method achieves the set accuracy target and substantially reduces the computational demand, in some cases by over 50%.
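To make the Poisson binomial idea concrete (a toy illustration with made-up misclassification probabilities, not the authors' ESC code): if kriging misclassifies the sign of sample i with probability p_i, then S has mean sum(p_i) and variance sum(p_i * (1 - p_i)), and a normal approximation gives quick confidence bounds on S:

```python
# Toy illustration: approximate confidence bounds on a Poisson binomial
# count S with per-sample success probabilities p_i (hypothetical values).
import numpy as np
from scipy import stats

p = np.array([0.02, 0.10, 0.05, 0.30, 0.01])  # hypothetical misclassification probs
mean_s = p.sum()                              # E[S]
var_s = (p * (1 - p)).sum()                   # Var[S]
z = stats.norm.ppf(0.975)                     # 95% two-sided confidence level
lower = max(0.0, mean_s - z * np.sqrt(var_s))
upper = mean_s + z * np.sqrt(var_s)
print(lower, upper)                           # approximate bounds on S
```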
Journal Article
Quasi-Poisson vs. Negative Binomial Regression: How Should We Model Overdispersed Count Data?
2007
Quasi-Poisson and negative binomial regression models have equal numbers of parameters, and either could be used for overdispersed count data. While they often give similar results, there can be striking differences in estimating the effects of covariates. We explain when and why such differences occur. The variance of a quasi-Poisson model is a linear function of the mean, while the variance of a negative binomial model is a quadratic function of the mean. These variance relationships affect the weights in the iteratively weighted least-squares algorithm used to fit models to data. Because the variance is a function of the mean, large and small counts are weighted differently in quasi-Poisson and negative binomial regression. We provide an example using harbor seal counts from aerial surveys. These counts are affected by date, time of day, and time relative to low tide. We present results on a data set that showed a dramatic difference in estimated abundance of harbor seals when using quasi-Poisson vs. negative binomial regression. This difference is described and explained in light of the different weighting used in each regression method. A general understanding of weighting can help ecologists choose between these two methods.
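The contrast the abstract describes can be reproduced on simulated data using statsmodels (a sketch, not the harbor seal analysis; data and parameter values are made up):

```python
# Sketch: quasi-Poisson (variance = phi * mu) vs. negative binomial
# (variance = mu + alpha * mu^2) fits to the same overdispersed counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 2.0, 500)
X = sm.add_constant(x)
mu = np.exp(0.5 + 1.0 * x)
alpha = 0.7                                   # NB2 overdispersion
counts = rng.negative_binomial(1 / alpha, 1 / (1 + alpha * mu))

# Quasi-Poisson: Poisson GLM with scale estimated by Pearson chi-square.
quasi = sm.GLM(counts, X, family=sm.families.Poisson()).fit(scale="X2")
# Negative binomial by maximum likelihood (alpha estimated from the data).
negbin = sm.NegativeBinomial(counts, X).fit(disp=False)

# Differences in the slope estimates reflect the different weighting of
# large and small counts discussed in the abstract.
print(quasi.params, quasi.scale)
print(negbin.params)
```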
Journal Article
Estimating Species Occurrence, Abundance, and Detection Probability Using Zero-Inflated Distributions
2008
Researchers have developed methods to account for imperfect detection of species with either occupancy (presence-absence) or count data using replicated sampling. We show how these approaches can be combined to simultaneously estimate occurrence, abundance, and detection probability by specifying a zero-inflated distribution for abundance. This approach may be particularly appropriate when patterns of occurrence and abundance arise from distinct processes operating at differing spatial or temporal scales. We apply the model to two data sets: (1) previously published data for a species of duck, Anas platyrhynchos, and (2) data for a stream fish species, Etheostoma scotti. We show that in these cases, an incomplete-detection zero-inflated modeling approach yields a better fit to the data than other models. We propose that zero-inflated abundance models accounting for incomplete detection be considered when replicate count data are available.
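As a minimal sketch of the zero-inflated idea (hypothetical intercept-only data, ignoring the replicated-sampling detection layer; not the duck or fish analyses):

```python
# Sketch: zero-inflated Poisson counts mix a point mass at zero
# (unoccupied sites) with a Poisson for abundance at occupied sites.
import numpy as np
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(3)
n_sites = 1000
occupied = rng.random(n_sites) < 0.6                       # occurrence process
counts = np.where(occupied, rng.poisson(3.0, n_sites), 0)  # abundance process

exog = np.ones((n_sites, 1))                               # intercept-only model
fit = ZeroInflatedPoisson(counts, exog, exog_infl=exog).fit(disp=False)
print(fit.params)  # inflation intercept near logit(0.4), mean near log(3)
```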
Journal Article
Generalized additive models for location, scale and shape
2005
A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton-Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.
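A toy version of the GAMLSS idea, with both the location and the scale of a normal response modeled as functions of a covariate and fitted jointly by maximum likelihood (a sketch of the concept only, not the authors' penalized backfitting algorithm; all data and parameter values are made up):

```python
# Sketch: model location (mu) and scale (log sigma) as linear functions
# of a covariate and fit all four coefficients by maximum likelihood.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 300)
y = (1.0 + 2.0 * x) + np.exp(-0.5 + 1.0 * x) * rng.standard_normal(300)

def negative_log_likelihood(theta):
    b0, b1, g0, g1 = theta
    mu = b0 + b1 * x              # location model
    sigma = np.exp(g0 + g1 * x)   # scale model with a log link
    return -stats.norm.logpdf(y, loc=mu, scale=sigma).sum()

fit = optimize.minimize(negative_log_likelihood, np.zeros(4), method="BFGS")
print(fit.x)  # roughly [1.0, 2.0, -0.5, 1.0]
```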
Journal Article
Levels of Confidence and Utility for Binary Classifiers
Two performance measures for binary tree classifiers are introduced: the level of confidence and the level of utility. Both measures are probabilities of desirable events in the construction process of a classifier and hence are easily and intuitively interpretable. The statistical estimation of these measures is discussed. The usual maximum likelihood estimators are shown to have upward biases, and an entropy-based bias-reducing methodology is proposed. Along the way, the basic question of appropriate sample sizes at tree nodes is considered.
Journal Article
What can occupancy models gain from time-to-detection data?
2022
The time taken to detect a species during site occupancy surveys contains information about the observation process, and accounting for that process leads to better inference about site occupancy. We explore the gain in efficiency that can be obtained from time-to-detection (TTD) data and show that this model type has a significant benefit for estimating the parameters related to detection intensity. However, for estimating occupancy probability parameters, the efficiency improvement is generally very minor. To explore whether TTD data could add valuable information when detection intensities vary between sites and surveys, we developed a mixed exponential TTD occupancy model. This new model can simultaneously estimate the detection intensity and aggregation parameters when the number of detectable individuals at the site follows a negative binomial distribution. We found that this model provided a much better description of the occupancy patterns than conventional detection/nondetection methods for data on 63 bird species from the Karoo region of South Africa. Ignoring the heterogeneity of detection intensity in the TTD model generally yielded a negative bias in the estimated occupancy probability. Using simulations, we briefly explore study-design trade-offs between the numbers of sites and surveys for different occupancy modeling strategies.
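One way to picture the mixed exponential TTD mechanism (a speculative reading of the abstract, with hypothetical parameter values): the count of detectable individuals N at an occupied site is negative binomial, and the first detection time is exponential with a rate proportional to N, censored at the survey length:

```python
# Sketch: simulate time-to-detection data where per-site detection
# intensity scales with a negative binomial number of individuals.
import numpy as np

rng = np.random.default_rng(4)
n_sites, psi, rate_per_ind, t_max = 400, 0.5, 0.3, 10.0
occupied = rng.random(n_sites) < psi                       # occupancy process
n_indiv = np.where(occupied, rng.negative_binomial(2, 0.4, n_sites), 0)

ttd = np.full(n_sites, np.inf)                  # inf = nothing to detect
has_indiv = n_indiv > 0
rates = n_indiv[has_indiv] * rate_per_ind       # site detection intensity
ttd[has_indiv] = rng.exponential(1.0 / rates)   # first detection time
detected = ttd <= t_max                         # censor at survey length
print(detected.mean())
```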
Journal Article
Overdispersion and Poisson Regression
2008
This article discusses the use of regression models for count data. A claim is often made in criminology applications that the negative binomial distribution is the conditional distribution of choice when a count response variable shows evidence of overdispersion. Some go on to assert that the overdispersion problem can be "solved" when the negative binomial distribution is used instead of the more conventional Poisson distribution. In this paper, we review the assumptions required for both distributions and show that only under very special circumstances are these claims true.
Journal Article
Interval Estimation for a Binomial Proportion
by Brown, Lawrence D.; DasGupta, Anirban; Cai, T. Tony
in Approximation, Bayes, binomial distribution
2001
We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.
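For reference, the two intervals the abstract recommends for small n can be computed directly (a sketch, not the paper's code; the boundary cases x = 0 and x = n receive special treatment in the paper):

```python
# Wilson and equal-tailed Jeffreys intervals for a binomial proportion,
# given x successes in n trials.
import numpy as np
from scipy import stats

def wilson_interval(x, n, conf=0.95):
    z = stats.norm.ppf(1.0 - (1.0 - conf) / 2.0)
    phat = x / n
    center = (phat + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * np.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2))
    return center - half, center + half

def jeffreys_interval(x, n, conf=0.95):
    a = (1.0 - conf) / 2.0
    lo = stats.beta.ppf(a, x + 0.5, n - x + 0.5)       # Beta(x+1/2, n-x+1/2)
    hi = stats.beta.ppf(1.0 - a, x + 0.5, n - x + 0.5)
    return lo, hi

print(wilson_interval(3, 10))    # e.g., 3 successes in 10 trials
print(jeffreys_interval(3, 10))
```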
Journal Article