Catalogue Search | MBRL

by Mary E. Blair , Steven J. Phillips , Robert E. Schapire in Abundance , Black boxes , Computer programs

2017

This software note announces a new open-source release of the Maxent software for modeling species distributions from occurrence records and environmental data, and describes a new R package for fitting such models. The new release (ver. 3.4.0) will be hosted online by the American Museum of Natural History, along with future versions. It contains small functional changes, most notably use of a complementary log-log (cloglog) transform to produce an estimate of occurrence probability. The cloglog transform derives from the recently published interpretation of Maxent as an inhomogeneous Poisson process (IPP), giving it a stronger theoretical justification than the logistic transform which it replaces by default. In addition, the new R package, maxnet, fits Maxent models using the glmnet package for regularized generalized linear models. We discuss the implications of the IPP formulation in terms of model inputs and outputs, treating occurrence records as points rather than grid cells and interpreting the exponential Maxent model (raw output) as an estimate of relative abundance. With these two open-source developments, we invite others to freely use and contribute to the software.
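As a rough illustration of the two transforms named in this abstract, the sketch below maps a linear predictor to a value in (0, 1) with a generic inverse complementary log-log link and with a logistic function. The function names and the `offset` parameter are assumptions made for this example; they are not taken from the Maxent or maxnet source code, and the software may scale its raw output differently.

```python
import math

def cloglog_transform(eta, offset=0.0):
    """Inverse complementary log-log link: 1 - exp(-exp(eta + offset))."""
    return 1.0 - math.exp(-math.exp(eta + offset))

def logistic_transform(eta, offset=0.0):
    """Inverse logit link: 1 / (1 + exp(-(eta + offset)))."""
    return 1.0 / (1.0 + math.exp(-(eta + offset)))

# Compare the two transforms on a few hypothetical linear-predictor values.
for eta in (-2.0, 0.0, 2.0):
    print(eta, round(cloglog_transform(eta), 4), round(logistic_transform(eta), 4))
```

Both functions are increasing in `eta`, so they rank sites identically; they differ only in how the values are spread between 0 and 1.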

Journal Article

A statistical explanation of MaxEnt for ecologists

by Hastie, Trevor , Phillips, Steven J. , Chee, Yung En in Absence , Animal, plant and microbial ecology , Applied ecology

2011

MaxEnt is a program for modelling species distributions from presence-only species records. This paper is written for ecologists and describes the MaxEnt model from a statistical perspective, making explicit links between the structure of the model, decisions required in producing a modelled distribution, and knowledge about the species and the data that might affect those decisions. To begin we discuss the characteristics of presence-only data, highlighting implications for modelling distributions. We particularly focus on the problems of sample bias and lack of information on species prevalence. The keystone of the paper is a new statistical explanation of MaxEnt which shows that the model minimizes the relative entropy between two probability densities (one estimated from the presence data and one from the landscape) defined in covariate space. For many users, this viewpoint is likely to be a more accessible way to understand the model than previous ones that rely on machine learning concepts. We then step through a detailed explanation of MaxEnt describing key components (e.g. covariates and features, and definition of the landscape extent), the mechanics of model fitting (e.g. feature selection, constraints and regularization) and outputs. Using case studies for a Banksia species native to south-west Australia and a riverine fish, we fit models and interpret them, exploring why certain choices affect the result and what this means. The fish example illustrates use of the model with vector data for linear river segments rather than raster (gridded) data. Appropriate treatments for survey bias, unprojected data, locally restricted species, and predicting to environments outside the range of the training data are demonstrated, and new capabilities discussed. Online appendices include additional details of the model and the mathematical links between previous explanations and this one, example code and data, and further information on the case studies.
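The relative entropy mentioned in this abstract can be made concrete with a small discrete sketch. The example below assumes a single covariate binned into three classes and uses made-up densities; the names `presence_density` and `landscape_density` are hypothetical, and the code only evaluates the quantity, it does not fit a MaxEnt model.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) = sum_i p_i * log(p_i / q_i)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # a bin with p_i = 0 contributes nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical densities over three covariate bins (each sums to 1).
presence_density = [0.1, 0.2, 0.7]
landscape_density = [1 / 3, 1 / 3, 1 / 3]

print(relative_entropy(presence_density, landscape_density))
```

The value is zero only when the two densities coincide and grows as the presence density concentrates on conditions that are rare in the landscape.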

Journal Article

Shifts in Arctic vegetation and associated feedbacks under climate change

by Loranty, Michael M. , Goetz, Scott J. , Phillips, Steven J. in 704/106/694 , Albedo , Biosphere

2013

This study shows that climate change could lead to a major redistribution of vegetation across the Arctic, with important implications for biosphere–atmosphere interactions, as well as for biodiversity conservation and ecosystem services. Woody vegetation is predicted to expand substantially over coming decades, causing more Arctic warming through positive climate feedbacks than previously thought. Climate warming has led to changes in the composition, density and distribution of Arctic vegetation in recent decades [1, 2, 3, 4]. These changes cause multiple opposing feedbacks between the biosphere and atmosphere [5, 6, 7, 8, 9], the relative magnitudes of which will have globally significant consequences but are unknown at a pan-Arctic scale [10]. The precise nature of Arctic vegetation change under future warming will strongly influence climate feedbacks, yet Earth system modelling studies have so far assumed arbitrary increases in shrubs (for example, +20%; refs 6, 11), highlighting the need for predictions of future vegetation distribution shifts. Here we show, using climate scenarios for the 2050s and models that utilize statistical associations between vegetation and climate, the potential for extremely widespread redistribution of vegetation across the Arctic. We predict that at least half of vegetated areas will shift to a different physiognomic class, and woody cover will increase by as much as 52%. By incorporating observed relationships between vegetation and albedo, evapotranspiration and biomass, we show that vegetation distribution shifts will result in an overall positive feedback to climate that is likely to cause greater warming than has previously been predicted. Such extensive changes to Arctic vegetation will have implications for climate, wildlife and ecosystem services.

Journal Article

Sample Selection Bias and Presence-Only Distribution Models: Implications for Background and Pseudo-Absence Data

by Phillips, Steven J. , Ferrier, Simon , Elith, Jane in Animals , Applied ecology , background data

2009

Most methods for modeling species distributions from occurrence records require additional data representing the range of environmental conditions in the modeled region. These data, called background or pseudo-absence data, are usually drawn at random from the entire region, whereas occurrence collection is often spatially biased toward easily accessed areas. Since the spatial bias generally results in environmental bias, the difference between occurrence collection and background sampling may lead to inaccurate models. To correct the estimation, we propose choosing background data with the same bias as occurrence data. We investigate theoretical and practical implications of this approach. Accurate information about spatial bias is usually lacking, so explicit biased sampling of background sites may not be possible. However, it is likely that an entire target group of species observed by similar methods will share similar bias. We therefore explore the use of all occurrences within a target group as biased background data. We compare model performance using target-group background and randomly sampled background on a comprehensive collection of data for 226 species from diverse regions of the world. We find that target-group background improves average performance for all the modeling methods we consider, with the choice of background data having as large an effect on predictive performance as the choice of modeling method. The performance improvement due to target-group background is greatest when there is strong bias in the target-group presence records. Our approach applies to regression-based modeling methods that have been adapted for use with occurrence data, such as generalized linear or additive models and boosted regression trees, and to Maxent, a probability density estimation method. 
We argue that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions.
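The difference between the two background strategies compared in this abstract can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: sites are hypothetical (x, y) grid cells, the species names and function names are invented for the example, and duplicate sites are kept when pooling, which may or may not match the authors' procedure.

```python
import random

def target_group_background(occurrences_by_species):
    """Pool the occurrence sites of every species in the target group."""
    background = []
    for sites in occurrences_by_species.values():
        background.extend(sites)  # duplicates are kept (an assumption)
    return background

def random_background(all_sites, n, seed=0):
    """Draw n distinct sites uniformly at random from the whole region."""
    rng = random.Random(seed)
    return rng.sample(all_sites, n)

# Hypothetical occurrence records for three species in one target group.
occurrences_by_species = {
    "species_a": [(1, 1), (1, 2)],
    "species_b": [(1, 2), (2, 1)],
    "species_c": [(2, 2)],
}
# Hypothetical study region: a 10 x 10 grid of cells.
all_sites = [(x, y) for x in range(10) for y in range(10)]

tg = target_group_background(occurrences_by_species)
rb = random_background(all_sites, len(tg))
print(len(tg), len(rb))
```

The pooled sites inherit whatever spatial bias the surveys had, whereas the random draw covers the region evenly; that contrast is what the target-group approach exploits.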

Journal Article

Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation

by Dudík, Miroslav , Phillips, Steven J. in Animal and plant ecology , Animal, plant and microbial ecology , biogeography

2008

Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use "default settings", tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce "hinge features" that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore "background sampling" strategies that cope with sample selection bias and decrease model-building time.
Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model performance; 3) logistic output improves model calibration, so that large differences in output values correspond better to large differences in suitability; 4) "target-group" background sampling can give much better predictive performance than random background sampling; 5) random background sampling results in a dramatic decrease in running time, with no decrease in model performance.
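To give a feel for what a hinge feature is, the sketch below implements one common piecewise-linear form: zero below a knot, then rising linearly to one at the maximum of the covariate. The function name, the parameters `knot` and `x_max`, and the scaling to [0, 1] are assumptions made for illustration; the abstract does not give the definition, and the one used in Maxent may differ in detail.

```python
def forward_hinge(x, knot, x_max):
    """Piecewise-linear feature: 0 up to the knot, then linear up to 1 at x_max."""
    if x_max <= knot:
        raise ValueError("x_max must be greater than knot")
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)

# Evaluate the feature on a hypothetical covariate scaled to [0, 1].
for x in (0.0, 0.5, 0.75, 1.0):
    print(x, forward_hinge(x, knot=0.5, x_max=1.0))
```

A weighted sum of several such features with different knots yields a piecewise-linear response curve, which is how hinge features can capture more complex relationships than a single linear term.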

Journal Article

Transferability, sample selection bias and background data in presence-only modelling: a response to Peterson et al. (2007)

by Phillips, Steven J.

2008

Journal Article

On estimating probability of presence from use-availability or presence-background data

by Elith, Jane , Phillips, Steven J. in Animal and plant ecology , Animal, plant and microbial ecology , Animals

2013

A fundamental ecological modeling task is to estimate the probability that a species is present in (or uses) a site, conditional on environmental variables. For many species, available data consist of "presence" data (locations where the species [or evidence of it] has been observed), together with "background" data, a random sample of available environmental conditions. Recently published papers disagree on whether probability of presence is identifiable from such presence-background data alone. This paper aims to resolve the disagreement, demonstrating that additional information is required. We defined seven simulated species representing various simple shapes of response to environmental variables (constant, linear, convex, unimodal, S-shaped) and ran five logistic model-fitting methods using 1000 presence samples and 10 000 background samples; the simulations were repeated 100 times. The experiment revealed a stark contrast between two groups of methods: those based on a strong assumption that species' true probability of presence exactly matches a given parametric form had highly variable predictions and much larger RMS error than methods that take population prevalence (the fraction of sites in which the species is present) as an additional parameter. For six species, the former group grossly under- or overestimated probability of presence. The cause was not model structure or choice of link function, because all methods were logistic with linear and, where necessary, quadratic terms. Rather, the experiment demonstrates that an estimate of prevalence is not just helpful, but is necessary (except in special cases) for identifying probability of presence. We therefore advise against use of methods that rely on the strong assumption, due to Lele and Keim (recently advocated by Royle et al.) and Lancaster and Imbens. The methods are fragile, and their strong assumption is unlikely to be true in practice.
We emphasize, however, that we are not arguing against standard statistical methods such as logistic regression, generalized linear models, and so forth, none of which requires the strong assumption. If probability of presence is required for a given application, there is no panacea for lack of data. Presence-background data must be augmented with an additional datum, e.g., species' prevalence, to reliably estimate absolute (rather than relative) probability of presence.

Journal Article

POC plots: calibrating species distribution models with presence-only data

by Elith, Jane , Phillips, Steven J. in Animal and plant ecology , Animal, plant and microbial ecology , background

2010

Statistical models are widely used for predicting species' geographic distributions and for analyzing species' responses to climatic and other predictor variables. Their predictive performance can be characterized in two complementary ways: discrimination, the ability to distinguish between occupied and unoccupied sites, and calibration, the extent to which a model correctly predicts conditional probability of presence. The most common measures of model performance, such as the area under the receiver operating characteristic curve (AUC), measure only discrimination. In contrast, we introduce a new tool for measuring model calibration: the presence-only calibration plot, or POC plot. This tool relies on presence-only evaluation data, which are more widely available than presence-absence evaluation data, to determine whether predictions are proportional to conditional probability of presence. We generalize the predicted/expected curves of Hirzel et al. to produce a presence-only analogue of traditional (presence-absence) calibration curves. POC plots facilitate visual exploration of model calibration, and can be used to recalibrate badly calibrated models. We demonstrate their use by recalibrating models made by the DOMAIN modeling method on a comprehensive set of 226 species from six regions of the world, significantly improving DOMAIN's predictive performance.

Journal Article

Another Disaster: The Closing of the National Library of Medicine's Disaster Information Management Research Center

by Phillips, Steven J. in Access to information , Collaboration , Conferences

2022

Journal Article

The Ongoing Syrian Arab Republic Health Care Crisis

by Phillips, Steven J. in Brief Reports , Civil war , Geneva Conventions

2018

Prior to the Syrian civil war, access and delivery of health care and health care information over the past 4 decades had steadily improved. The life expectancy of the average Syrian in 2012 was 75.7 years, compared to 56 years in 1970. As a result of the civil war, this trend has reversed, with the life expectancy reduced by 20 years from the 2012 level. The Syrian government and its allies have specifically targeted the health care infrastructure not under government control. (Disaster Med Public Health Preparedness. 2018;12:23–25)

Journal Article
