88 result(s) for "Warton, David I."
Equivalence of MAXENT and Poisson Point Process Models for Species Distribution Modeling in Ecology
Modeling the spatial distribution of a species is a fundamental problem in ecology. A number of modeling methods have been developed, an extremely popular one being MAXENT, a maximum entropy modeling approach. In this article, we show that MAXENT is equivalent to a Poisson regression model and hence is related to a Poisson point process model, differing only in the intercept term, which is scale‐dependent in MAXENT. We illustrate a number of improvements to MAXENT that follow from these relations. In particular, a point process model approach facilitates methods for choosing the appropriate spatial resolution, assessing model adequacy, and choosing the LASSO penalty parameter, all currently unavailable to MAXENT. The equivalence result represents a significant step in the unification of the species distribution modeling literature.
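A minimal sketch of the device that makes this equivalence usable in practice: a Poisson point process model can be fitted with base R's glm() via down-weighted Poisson regression over presence points plus quadrature (background) points. The data layout and covariate names (x1, x2) are hypothetical, not taken from the paper.

    # pres: presence points with covariates x1, x2
    # quad: quadrature/background points with the same covariates
    # area: total area of the study region
    fit_ppm <- function(pres, quad, area) {
      dat <- rbind(cbind(pres, z = 1), cbind(quad, z = 0))
      # Quadrature weights: background points share the region's area;
      # presences get a negligible weight.
      dat$w <- ifelse(dat$z == 1, 1e-8, area / nrow(quad))
      # Response z/w with weights w recovers the point process likelihood;
      # glm() will warn about non-integer responses, which is expected here.
      glm(z / w ~ x1 + x2, family = poisson(), weights = w, data = dat)
    }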
The arcsine is asinine: the analysis of proportions in ecology
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
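A short sketch of the comparison the abstract describes, on simulated binomial data (all names hypothetical): the legacy arcsine analysis, the recommended logistic regression, a crude overdispersion check, and a random-effects variant via lme4 if overdispersion is found.

    library(lme4)  # for the random-effects variant

    set.seed(1)
    x <- rnorm(100)
    grp <- gl(20, 5)                       # hypothetical grouping factor
    y <- rbinom(100, 20, plogis(0.5 * x))  # successes out of n = 20 trials
    dat <- data.frame(y, n = 20, x, grp)

    m_arcsine <- lm(asin(sqrt(y / n)) ~ x, data = dat)       # legacy approach
    m_logit   <- glm(cbind(y, n - y) ~ x, family = binomial, data = dat)

    # Crude overdispersion check: residual deviance far above its df
    deviance(m_logit) / df.residual(m_logit)

    # If overdispersed, account for it with a random effect:
    m_mixed <- glmer(cbind(y, n - y) ~ x + (1 | grp),
                     family = binomial, data = dat)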
Model-Based Control of Observer Bias for the Analysis of Presence-Only Data in Ecology
Presence-only data, where information is available concerning species presence but not species absence, are subject to bias due to observers being more likely to visit and record sightings at some locations than others (hereafter "observer bias"). In this paper, we describe and evaluate a model-based approach to accounting for observer bias directly: by modelling presence locations as a function of known observer-bias variables (such as accessibility variables) in addition to environmental variables, then conditioning on a common level of bias to make predictions of species occurrence free of such observer bias. We implement this idea using point process models with a LASSO penalty, a new presence-only method related to maximum entropy modelling that implicitly addresses the "pseudo-absence problem" of where to locate pseudo-absences (and how many). The proposed method of bias correction is evaluated using systematically collected presence/absence data for 62 plant species endemic to the Blue Mountains near Sydney, Australia. It is shown that modelling and controlling for observer bias significantly improves the accuracy of predictions made using presence-only data, and usually improves predictions compared to pseudo-absence or "inventory" methods of bias correction based on absences from non-target species. Future research will consider the potential for improving the proposed bias-correction approach by estimating the observer bias simultaneously across multiple species.
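The conditioning step can be sketched in a few lines. This toy version uses an ordinary logistic GLM as a stand-in for the paper's penalised point process model, and the bias covariate d_road (distance to road) is hypothetical: fit with both environment and bias terms, then predict with the bias covariate held at a common value.

    set.seed(1)
    dat <- data.frame(x1 = rnorm(500), d_road = rexp(500))
    # Sightings become rarer far from roads, mimicking observer bias
    dat$z <- rbinom(500, 1, plogis(dat$x1 - dat$d_road))

    fit <- glm(z ~ x1 + d_road, family = binomial, data = dat)

    # Predict at a common accessibility level so the result reflects
    # the environment (x1) only
    newdat <- data.frame(x1 = seq(-2, 2, length.out = 50), d_road = 0)
    pred <- predict(fit, newdata = newdat, type = "response")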
The PIT-trap—A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (and thus not amenable to resampling); common examples include logistic or Poisson regression and generalisations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of the data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals preserves correlation in the data without the need for it to be modelled, a key point of difference compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties compared to competing resampling methods.
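The mechanics can be sketched by hand for a Poisson model (an illustration of the idea, not the authors' implementation). For discrete y, the PIT residual is drawn uniformly on [F(y-1), F(y)], which is asymptotically standard uniform under a correct model; resampling whole rows of these residuals preserves cross-column correlation, and inverting the PIT yields bootstrap responses.

    pit_resid <- function(y, mu) {
      matrix(runif(length(y), ppois(y - 1, mu), ppois(y, mu)),
             nrow = nrow(y))
    }

    pit_trap_once <- function(y_mat, mu_mat) {
      u <- pit_resid(y_mat, mu_mat)
      # Resample whole rows (sites), keeping correlation across columns
      u_star <- u[sample(nrow(u), replace = TRUE), , drop = FALSE]
      # Invert the PIT at the original fitted means
      matrix(qpois(u_star, mu_mat), nrow = nrow(y_mat))
    }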
Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure
Ecological data often show temporal, spatial, hierarchical (random effects), or phylogenetic structure. Modern statistical approaches are increasingly accounting for such dependencies. However, when performing cross-validation, these structures are regularly ignored, resulting in serious underestimation of predictive error. One cause of the poor performance of uncorrected (random) cross-validation, often noted by modellers, is dependence structures in the data that persist as dependence structures in model residuals, violating the assumption of independence. Even more concerning, because often overlooked, is that structured data also provide ample opportunity for overfitting with non-causal predictors. This problem can persist even if remedies such as autoregressive models, generalized least squares, or mixed models are used. Block cross-validation, where data are split strategically rather than randomly, can address these issues. However, the blocking strategy must be carefully considered. Blocking in space, time, random effects or phylogenetic distance, while accounting for dependencies in the data, may also unwittingly induce extrapolation by restricting the ranges or combinations of predictor variables available for model training, thus overestimating interpolation errors. On the other hand, deliberate blocking in predictor space may also improve error estimates when extrapolation is the modelling goal. Here, we review the ecological literature on non-random and blocked cross-validation approaches. We also provide a series of simulations and case studies, in which we show that, for all instances tested, block cross-validation is nearly universally more appropriate than random cross-validation if the goal is predicting to new data or predictor space, or selecting causal predictors. We recommend that block cross-validation be used wherever dependence structures exist in a dataset, even if no correlation structure is visible in the fitted model residuals, or if the fitted models account for such correlations.
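A compact sketch of spatially blocked cross-validation on simulated data (layout and names hypothetical): blocks are grid cells, and whole blocks are held out together rather than random rows.

    set.seed(1)
    dat <- data.frame(lon = runif(200), lat = runif(200),
                      x1 = rnorm(200), x2 = rnorm(200))
    dat$y <- 1 + dat$x1 + rnorm(200)

    block <- interaction(cut(dat$lon, 3), cut(dat$lat, 3), drop = TRUE)
    fold  <- (as.integer(block) - 1) %% 5 + 1   # blocks assigned to 5 folds

    rmse <- sapply(1:5, function(k) {
      m <- glm(y ~ x1 + x2, data = dat[fold != k, ])
      sqrt(mean((dat$y[fold == k] - predict(m, dat[fold == k, ]))^2))
    })
    mean(rmse)  # compare with the same loop using fold <- sample(1:5, 200, TRUE)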
Efficient estimation of generalized linear latent variable models
Generalized linear latent variable models (GLLVMs) are popular tools for modeling multivariate, correlated responses. Such data are often encountered, for instance, in ecological studies, where presence-absences, counts, or biomass of interacting species are collected from a set of sites. Until very recently, the main challenge in fitting GLLVMs has been the lack of computationally efficient estimation methods. For likelihood-based estimation, several closed-form approximations for the marginal likelihood of GLLVMs have been proposed, but efficient implementations of them have been lacking in the literature. To fill this gap, we show in this paper how to obtain computationally convenient estimation algorithms based on a combination of either the Laplace approximation method or the variational approximation method with automatic optimization techniques implemented in R software. An extensive set of simulation studies is used to assess the performance of the different methods, showing that the variational approximation method, used in conjunction with automatic optimization, offers a powerful tool for estimation.
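A hedged usage sketch with the gllvm package for R, which implements the approximations described here; the argument names follow my reading of the package and may differ across versions. abund (a site-by-species count matrix) and env (a data frame of site covariates) are hypothetical.

    library(gllvm)

    fit_va <- gllvm(y = abund, X = env, family = poisson(),
                    num.lv = 2, method = "VA")  # variational approximation
    fit_la <- gllvm(y = abund, X = env, family = poisson(),
                    num.lv = 2, method = "LA")  # Laplace approximation
    summary(fit_va)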
A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization
In regression modelling, measurement error models are often needed to correct for uncertainty arising from measurements of covariates/predictor variables. The literature on measurement error (or errors-in-variables) modelling is plentiful; however, general algorithms and software for maximum likelihood estimation of models with measurement error are not as readily available, in a form that can be used by applied researchers without relatively advanced statistical expertise. In this study, we develop a novel algorithm for measurement error modelling which could, in principle, take any regression model fitted by maximum likelihood, or penalised likelihood, and extend it to account for uncertainty in covariates. This is achieved by exploiting an interesting property of the Monte Carlo Expectation-Maximization (MCEM) algorithm, namely that it can be expressed as an iteratively reweighted maximisation of complete-data likelihoods (formed by imputing the missing values). Thus we can take any regression model for which we have an algorithm for (penalised) likelihood estimation when covariates are error-free, nest it within our proposed iteratively reweighted MCEM algorithm, and thereby account for uncertainty in covariates. The approach is demonstrated on examples involving generalized linear models, point process models, generalized additive models and capture–recapture models. Because the proposed method uses maximum (penalised) likelihood, it inherits advantageous optimality and inferential properties, as illustrated by simulation. We also study the robustness of the method to violations of distributional assumptions on the predictors. Software is provided as the refitME package for R, whose key function behaves like a refit() function, taking a fitted regression model object and re-fitting it with a pre-specified amount of measurement error.
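A sketch of the intended workflow with refitME (argument names, such as sigma.sq.u for the measurement-error variance and B for the number of Monte Carlo replicates, follow my reading of the package and should be checked against its documentation): fit any regression model ignoring measurement error, then hand it to the MCEM re-fitter.

    library(refitME)

    # Naive fit: x_noisy is measured with error (hypothetical data/names)
    naive <- glm(y ~ x_noisy + z, family = binomial, data = dat)

    # Re-fit, accounting for a pre-specified measurement-error variance
    corrected <- refitME(naive, sigma.sq.u = 0.25, B = 50)
    summary(corrected)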
Penalized Normal Likelihood and Ridge Regularization of Correlation and Covariance Matrices
High dimensionality causes problems in various areas of statistics. A particular situation that has rarely been considered is the testing of hypotheses about multivariate regression models in which the dimension of the multivariate response is large. In this article a ridge regularization approach is proposed in which either the covariance or the correlation matrix is regularized to ensure nonsingularity irrespective of the dimensionality of the data. It is shown that the proposed approach can be derived through a penalized likelihood approach, which suggests cross-validation of the likelihood function as a natural approach for estimating the ridge parameter. Useful properties of this likelihood estimator are derived, discussed, and demonstrated by simulation. For a class of test statistics commonly used in multivariate analysis, the proposed regularization approach is compared with some obvious alternative regularization approaches using a generalized inverse and data reduction through principal components analysis. Essentially, the approaches considered differ in how they shrink the eigenvalues of sample covariance and correlation matrices. This leads to predictable differences in power properties when comparing the use of different regularization approaches, as demonstrated by simulation. The proposed ridge approach has relatively good power compared with the alternatives considered. In particular, a generalized inverse is shown to perform poorly and cannot be recommended in practice. Finally, the proposed approach is used in an analysis of data on macroinvertebrate biomasses that have been classified to species.
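A toy sketch of the ridge idea, not the paper's exact estimator: shrink the sample correlation matrix toward the identity (which pulls its eigenvalues toward 1) and choose the ridge parameter by cross-validating the multivariate normal likelihood.

    ridge_cor <- function(R, lambda) (1 - lambda) * R + lambda * diag(nrow(R))

    cv_loglik <- function(X, lambda, K = 5) {
      folds <- sample(rep(1:K, length.out = nrow(X)))
      sum(sapply(1:K, function(k) {
        # Regularized correlation from the training folds...
        S <- ridge_cor(cor(X[folds != k, , drop = FALSE]), lambda)
        # ...evaluated as a normal likelihood on the held-out fold
        sum(mvtnorm::dmvnorm(scale(X[folds == k, , drop = FALSE]),
                             sigma = S, log = TRUE))
      }))
    }

    # lambdas <- seq(0.01, 0.99, by = 0.02)
    # best <- lambdas[which.max(sapply(lambdas, cv_loglik, X = X))]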
A fast method for fitting integrated species distribution models
Integrated distribution models (IDMs) predict where species might occur using data from multiple sources, a technique thought to be especially useful when data from any individual source are scarce. Recent advances allow us to fit such models with latent terms to account for dependence within and between data sources, but they are computationally challenging to fit. We propose a fast new methodology for fitting integrated distribution models using presence/absence and presence-only data, via a spatial random effects approach combined with automatic differentiation. We have written an R package (called scampr) for straightforward implementation of our approach. We use simulation to demonstrate that our approach has comparable performance to INLA, a common framework for fitting IDMs, but with computation times up to an order of magnitude faster. We also use simulation to look at when IDMs can be expected to outperform models fitted to a single data source, and find that the benefit gained from using an IDM is a function of the relative amount of additional information available from incorporating a second data source into the model. We apply our method to predict the occurrence of 29 plant species in NSW, Australia, and find particular benefit in predictive performance when data from a single source are scarce, and when comparing against models fitted to presence-only data alone. Our faster methods for fitting IDMs make it feasible to explore the model space more deeply (e.g. comparing different ways to model latent terms) and, in future work, to consider extensions to more complex models, for example the multi-species setting.
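Separately from the scampr implementation (whose interface is not reproduced here), the joint likelihood an IDM maximises can be sketched directly: a Poisson point process term for the presence-only data and a complementary log-log binomial term for the presence/absence data share one coefficient vector. All argument names are hypothetical.

    nll_idm <- function(beta, X_po, w_po, z_po, X_pa, y_pa, area_pa) {
      eta_po <- X_po %*% beta
      # Point process term via the quadrature (down-weighted Poisson) form:
      # z_po is 1 for presences, 0 for quadrature points with weights w_po
      nll1 <- -sum(z_po * eta_po - w_po * exp(eta_po))
      # cloglog term: P(present in site of given area) = 1 - exp(-exp(eta))
      p <- 1 - exp(-exp(X_pa %*% beta + log(area_pa)))
      nll2 <- -sum(dbinom(y_pa, 1, p, log = TRUE))
      nll1 + nll2
    }
    # fit <- optim(rep(0, ncol(X_po)), nll_idm, X_po = X_po, w_po = w_po,
    #              z_po = z_po, X_pa = X_pa, y_pa = y_pa, area_pa = area_pa)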
Global patterns in plant height
1. Plant height is a central part of plant ecological strategy. It is strongly correlated with life span, seed mass and time to maturity, and is a major determinant of a species' ability to compete for light. Plant height is also related to critical ecosystem variables such as animal diversity and carbon storage capacity. However, remarkably little is known about global patterns in plant height. Here, we use maximum height data for 7084 plant Species x Site combinations to provide the first global, cross-species quantification of the latitudinal gradient in plant height. 2. The mean maximum height of species growing within 15° of the equator (7.8 m) was 29 times greater than the height of species between 60° and 75° N (27 cm), and 31 times greater than the height of species between 45° and 60° S (25 cm). There was no evidence that the latitudinal gradient in plant height was different in the northern hemisphere than in the southern hemisphere (P = 0.29). A 2.4-fold drop in plant height at the edge of the tropics (P = 0.006) supports the idea that there might be a switch in plant strategy between temperate and tropical zones. 3. We investigated 22 environmental variables to determine which factors underlie the latitudinal gradient in plant height. We found that species with a wide range of height strategies were present in cold, dry, low productivity systems, but there was a noticeable lack of very short species in wetter, warmer, more productive sites. Variables that capture information about growing conditions during the harsh times of the year were relatively poor predictors of height. The best model for global patterns in plant height included only one term: precipitation in the wettest month (R² = 0.256). 4. Synthesis. We found a remarkably steep relationship between latitude and height, indicating a major difference in plant strategy between high and low latitude systems. We also provide new, surprising information about the correlations between plant height and environmental variables.