Catalogue Search | MBRL

The effect of sample size on the accuracy of species distribution models: considering both presences and pseudo‐absences or background sites

by Newell, Graeme; Liu, Canran; White, Matt in Accuracy, Arches, ensemble model

2019

Most high‐performing species distribution modelling techniques require both presences and either absences, pseudo‐absences or background points. In this paper, we explore the effect of sample size, with the aim of developing improved modelling strategies. We generated 1800 virtual species with three levels of prevalence and modelled them with ten modelling techniques, while varying the number of training presences (NTP) and the number of random points (NRP, representing pseudo‐absences or background sites). For five of the ten modelling techniques we built two versions of each model: one with an equal total weight (ETW) setting, in which the total weight of the pseudo‐absences equals the total weight of the presences, and another with an unequal total weight (UTW) setting, in which the two totals are not required to be equal. We compared two strategies for NRP: a small multiplier strategy (i.e. setting NRP to a few times NTP) and a large number strategy (i.e. using numerous random points). We produced ensemble models (by averaging the predictions of 30 models built with the same set of training presences and different sets of random points of equal size) for three NTP magnitudes and two NRP strategies. We found that model accuracy changed as NRP increased, following four distinct patterns: increasing, decreasing, arch‐shaped and horizontal. In most cases ETW improved model performance. Ensemble models were more accurate than the corresponding single models, and this improvement was most pronounced when NTP was low. We conclude that a large NRP is not always an appropriate strategy; the best choice of NRP depends on the modelling technique, species prevalence and NTP. We recommend building ensemble models instead of single models, using the small multiplier strategy for NRP with ETW, especially when only a small number of species presence records are available.
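
A minimal Python sketch of the two mechanics described in this abstract: equal-total-weight (ETW) case weights and a simple averaging ensemble. It assumes each presence carries weight 1 and each random point carries NTP/NRP, so that both classes sum to NTP; how each modelling technique consumes these weights is not specified here, and the names `etw_weights` and `ensemble_mean` are hypothetical.

```python
from statistics import fmean


def etw_weights(n_presences: int, n_random: int) -> list[float]:
    """Case weights under an equal-total-weight (ETW) setting (assumed scheme).

    Each presence gets weight 1; each random point (pseudo-absence or background)
    gets n_presences / n_random, so both classes have total weight n_presences.
    Under UTW, by contrast, every point would simply keep weight 1.
    """
    if n_presences <= 0 or n_random <= 0:
        raise ValueError("counts must be positive")
    w_random = n_presences / n_random
    return [1.0] * n_presences + [w_random] * n_random


def ensemble_mean(predictions: list[list[float]]) -> list[float]:
    """Average per-site predictions from several single models.

    `predictions` holds one list per model (e.g. 30 models sharing the same
    presences but different random-point sets), all covering the same sites.
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_sites = len(predictions[0])
    if any(len(p) != n_sites for p in predictions):
        raise ValueError("all models must predict the same sites")
    return [fmean(site_values) for site_values in zip(*predictions)]


# Usage: 20 presences with NRP set to a small multiple (5x) of NTP.
weights = etw_weights(20, 100)
print(sum(weights[:20]), round(sum(weights[20:]), 9))  # both classes total 20.0
print(ensemble_mean([[0.2, 0.9], [0.4, 0.7], [0.6, 0.8]]))
```

With ETW, increasing NRP spreads the same total weight over more random points, so the presence/random balance seen by the model does not shift as NRP grows.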

Journal Article

Shifting ranges and conservation challenges for lemurs in the face of climate change

by Yoder, Anne D.; Brown, Jason L. in ANUSPLIN, Biodiversity, Biodiversity conservation

2015

Geospatial modeling is one of the most powerful tools available to conservation biologists for estimating the current ranges of Earth's biodiversity. Now, with the advantage of predictive climate models, these methods can be deployed to understand future impacts on threatened biota. Here, we employ predictive modeling under a conservative estimate of future climate change to examine impacts on the future abundance and geographic distributions of Malagasy lemurs. Using distribution data from the primary literature, we employed ensemble species distribution models and geospatial analyses to predict future changes in species distributions. Current species distribution models (SDMs) were created within the BIOMOD2 framework, which capitalizes on ten widely used modeling techniques. Future and current SDMs were then subtracted from each other, and areas of contraction, expansion, and stability were calculated. Model overprediction is a common issue associated with Malagasy taxa. Accordingly, we introduce novel methods for incorporating biological data on dispersal potential to better inform the selection of pseudo‐absence points. This study predicts that 60% of the 57 species examined will experience considerable range reductions in the next seventy years due to future climate change alone. For these species, range sizes are predicted to decrease by an average of 59.6%. Nine lemur species (16%) are predicted to expand their ranges, and the distributions of 13 species (22.8%) are predicted to remain stable through time. Species ranges will experience severe shifts, typically contractions, and for the majority of lemur species, geographic distributions will be considerably altered. We identify three areas in dire need of protection, concluding that strategically managed forest corridors must be a key component of lemur and other biodiversity conservation strategies.
This recommendation is all the more urgent given that the results presented here do not take into account patterns of ongoing habitat destruction relating to human activities. Major distribution patterns predicted for lemurs resulting from future climate change. Our results predict that most lemurs will experience considerable range shifts into the future.
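
A minimal sketch of the range-change step described above, assuming each SDM has already been thresholded to a binary (0/1) map over the same grid cells. The thresholding rule and the percentage formula (net change relative to current range size) are assumptions rather than details taken from the paper, and `range_change` is a hypothetical helper.

```python
def range_change(current: list[int], future: list[int]) -> dict[str, float]:
    """Compare current and future binary (0/1) range maps cell by cell.

    future - current gives -1 (contraction: suitable now, unsuitable later),
    +1 (expansion) or 0; cells that are 1 in both maps count as stable.
    """
    if len(current) != len(future):
        raise ValueError("maps must cover the same cells")
    if any(v not in (0, 1) for v in current + future):
        raise ValueError("maps must be binary (0/1)")
    diff = [f - c for c, f in zip(current, future)]
    lost = diff.count(-1)
    gained = diff.count(1)
    stable = sum(1 for c, f in zip(current, future) if c == 1 and f == 1)
    current_size = sum(current)
    # Net change as a percentage of the current range (assumed definition).
    pct_change = 100.0 * (gained - lost) / current_size if current_size else float("nan")
    return {"contraction": lost, "expansion": gained, "stable": stable, "pct_change": pct_change}


# Usage: five grid cells, three of them currently suitable.
print(range_change([1, 1, 1, 0, 0], [1, 0, 0, 1, 0]))
```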

Journal Article

Predicting Species Distributions from Museum and Herbarium Records Using Multiresponse Models Fitted with Multivariate Adaptive Regression Splines

by Leathwick, John; Elith, Jane in Biodiversity conservation, Biodiversity Research, biogeography

2007

Because the majority of species distribution records exist as presence-only data (e.g. from museums and herbaria), and because there is an established need for predictions of species distributions, scientists and conservation managers seek to develop robust methods for using these data. Such methods must, in particular, accommodate the difficulties caused by the lack of reliable information about sites where species are absent. Here we test two approaches for overcoming these difficulties, analysing a range of data sets using the technique of multivariate adaptive regression splines (MARS). MARS is closely related to regression techniques such as generalized additive models (GAMs) that are commonly and successfully used in modelling species distributions, but has particular advantages in its analytical speed and the ease of transferring analysis results to other computational environments such as a Geographic Information System. MARS also has the advantage that it can model multiple responses, meaning that it can combine information from a set of species to determine the dominant environmental drivers of variation in species composition. We use data from 226 species from six regions of the world, and demonstrate the use of MARS for distribution modelling using presence-only data. We test whether (1) the type of data used to represent absence or background and (2) the signal from multiple species affect predictive performance, by evaluating predictions at completely independent sites where genuine presence-absence data were recorded. Models developed with absences inferred from the total set of presence-only sites for a biological group, and using simultaneous analysis of multiple species to inform the choice of predictor variables, performed better than models in which species were analysed singly, or in which pseudo-absences were drawn randomly from the study area.
The methods are fast, relatively simple to understand, and useful for situations where data are limited. A tutorial is included.
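
MARS models are built from piecewise-linear hinge functions, which is why a fitted model is easy to move into a GIS: only the knots and coefficients need to be exported. In multiresponse MARS, one set of basis functions is shared by all species while the coefficients are estimated per species. The sketch below only evaluates such a model; the knots, coefficients, variable names and logistic link are hypothetical, and fitting (knot selection and pruning) is not shown.

```python
import math

# A basis function is (variable, knot, direction): direction +1 gives
# max(0, x - knot), direction -1 gives max(0, knot - x).
Basis = tuple[str, float, int]


def hinge(x: float, knot: float, direction: int) -> float:
    """MARS hinge function."""
    return max(0.0, direction * (x - knot))


def mars_predict(site: dict[str, float], basis: list[Basis], coefs: list[float]) -> float:
    """Predicted probability of presence for one species at one site.

    coefs[0] is the intercept and coefs[i + 1] multiplies basis[i]; the linear
    predictor is passed through a logistic link (an assumption of this sketch).
    """
    if len(coefs) != len(basis) + 1:
        raise ValueError("need one coefficient per basis function plus an intercept")
    eta = coefs[0] + sum(
        b * hinge(site[var], knot, d) for b, (var, knot, d) in zip(coefs[1:], basis)
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Multiresponse use: basis functions shared by all species, coefficients per species.
shared_basis: list[Basis] = [("temp", 10.0, 1), ("rain", 800.0, -1)]
species_coefs = {"sp_a": [-1.0, 0.5, 0.0], "sp_b": [0.0, -0.3, 0.01]}
site = {"temp": 14.0, "rain": 600.0}
for name, coefs in species_coefs.items():
    print(name, round(mars_predict(site, shared_basis, coefs), 3))
```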

Journal Article

Sample Selection Bias and Presence-Only Distribution Models: Implications for Background and Pseudo-Absence Data

by Phillips, Steven J.; Ferrier, Simon; Elith, Jane in Animals, Applied ecology, background data

2009

Most methods for modeling species distributions from occurrence records require additional data representing the range of environmental conditions in the modeled region. These data, called background or pseudo-absence data, are usually drawn at random from the entire region, whereas occurrence collection is often spatially biased toward easily accessed areas. Since the spatial bias generally results in environmental bias, the difference between occurrence collection and background sampling may lead to inaccurate models. To correct the estimation, we propose choosing background data with the same bias as occurrence data. We investigate theoretical and practical implications of this approach. Accurate information about spatial bias is usually lacking, so explicit biased sampling of background sites may not be possible. However, it is likely that an entire target group of species observed by similar methods will share similar bias. We therefore explore the use of all occurrences within a target group as biased background data. We compare model performance using target-group background and randomly sampled background on a comprehensive collection of data for 226 species from diverse regions of the world. We find that target-group background improves average performance for all the modeling methods we consider, with the choice of background data having as large an effect on predictive performance as the choice of modeling method. The performance improvement due to target-group background is greatest when there is strong bias in the target-group presence records. Our approach applies to regression-based modeling methods that have been adapted for use with occurrence data, such as generalized linear or additive models and boosted regression trees, and to Maxent, a probability density estimation method. 
We argue that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions.
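
A minimal sketch contrasting target-group background with uniformly random background. It assumes occurrences are stored per species as coordinate tuples and that every species in the dictionary belongs to the target group; the function names are hypothetical.

```python
import random
from typing import Optional

Site = tuple[float, float]  # (x, y) coordinates of an occurrence


def target_group_background(
    records: dict[str, list[Site]], n: Optional[int] = None, seed: int = 0
) -> list[Site]:
    """Background sites drawn from all occurrences of a target group.

    Pools the occurrence sites of every species in the group (observed with
    similar methods, hence assumed to share the same sampling bias), removes
    duplicates, and optionally draws n of them without replacement.
    """
    pooled = sorted({s for sites in records.values() for s in sites})
    if n is None or n >= len(pooled):
        return pooled
    return random.Random(seed).sample(pooled, n)


def uniform_background(
    extent: tuple[float, float, float, float], n: int, seed: int = 0
) -> list[Site]:
    """Conventional alternative: n points drawn uniformly over (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = extent
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


# Usage: two species from the same target group share one surveyed site.
records = {"sp_a": [(1.0, 1.0), (2.0, 2.0)], "sp_b": [(2.0, 2.0), (3.0, 3.0)]}
print(target_group_background(records))  # [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
```

Because target-group sites inherit the surveyors' spatial bias, the background and the occurrence data share that bias, which is the correction the abstract argues for.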

Journal Article

Global and regional evaluation of Corythucha marmorata distribution under different spatial modeling conditions

by Lee, Wang-Hee; Byeon, Dae-hyeon in 631/158, 631/601, Accuracy

2026

The performance of species distribution models is influenced by the modeling algorithm and by the form of the occurrence/non-occurrence data. Selecting an appropriate approach based on the objective and on the type and size of the modeling data is therefore essential for reducing model uncertainty. In this study, we used a range of algorithm-based single models to predict the habitat suitability of Corythucha marmorata (chrysanthemum lace bug) worldwide and developed ensemble models using different methods, including mean, median, committee averaging, and weighted mean, so that they could be further applied to a specific region (South Korea). In addition, we compared three pseudo-absence data generation methods (random, surface range envelope, and Disk) in combination with the ensemble modeling methods in terms of model performance. The highest TSS values were obtained with the surface range envelope method combined with the committee averaging (0.980) and weighted mean (0.977) ensembles. These models were used to predict the potential distribution of C. marmorata in South Korea, which showed a high probability of occurrence throughout the country except on the southernmost island. Through this study, we aim to provide insights into the methodological use of species distribution modeling that incorporates various algorithm-based models, ensemble methods, and data preprocessing techniques.
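
A minimal sketch of the evaluation metric and the four ensemble methods named above. TSS is sensitivity + specificity - 1. The per-model thresholds used for committee averaging and the TSS-proportional weights used for the weighted mean are assumptions for illustration, since the paper's exact settings are not given here; `tss` and `combine` are hypothetical names.

```python
from statistics import fmean, median


def tss(observed: list[int], predicted: list[int]) -> float:
    """True skill statistic = sensitivity + specificity - 1 for binary 0/1 data."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both presences and absences")
    return tp / (tp + fn) + tn / (tn + fp) - 1


def combine(
    preds: list[list[float]], thresholds: list[float], scores: list[float], method: str
) -> list[float]:
    """Combine per-site predictions (one list per single model) into an ensemble."""
    per_site = list(zip(*preds))
    if method == "mean":
        return [fmean(v) for v in per_site]
    if method == "median":
        return [median(v) for v in per_site]
    if method == "committee":
        # Fraction of models predicting presence after binarising each model
        # with its own threshold.
        return [fmean([1.0 if p >= t else 0.0 for p, t in zip(v, thresholds)]) for v in per_site]
    if method == "weighted":
        # Weights proportional to each model's evaluation score, e.g. its TSS.
        total = sum(scores)
        return [sum(p * s for p, s in zip(v, scores)) / total for v in per_site]
    raise ValueError(f"unknown method: {method}")


# Usage: three single models predicting two sites.
print(tss([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5 (sensitivity 0.5 + specificity 1.0 - 1)
preds = [[0.8, 0.2], [0.6, 0.4], [0.3, 0.9]]
for m in ("mean", "median", "committee", "weighted"):
    print(m, [round(x, 3) for x in combine(preds, [0.5, 0.5, 0.5], [1.0, 1.0, 2.0], m)])
```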

Journal Article

Mapping large-scale bird distributions using occupancy models and citizen data with spatially biased sampling effort

by Ono, Satoru; Koizumi, Itsuro; Yabuhara, Yuki in Aquatic birds, bias, BIODIVERSITY RESEARCH

2015

Aim: Although data collected by citizen scientists have received a great deal of attention for assessing species distributions over large extents, their sampling effort is usually spatially biased. We assessed whether the bias arising from spatially varying sampling effort in opportunistic citizen data can be corrected using occupancy models that incorporate observation processes. Location: Hokkaido Island, northern Japan. Methods: We applied occupancy models to citizen data with spatially biased sampling effort to model and map the large-scale distributions of 52 forest and 23 grassland/wetland bird species. We used estimated species richness (summed occupancy probabilities across species) to represent the aggregated distributional pattern of each species group, and compared it across two occupancy models (i.e. single-species and multispecies occupancy models), two conventional logistic regression models and Maxlike, the last three of which do not explicitly deal with observation processes. Results: Conventional logistic regression models and Maxlike predicted inappropriate patterns, such as forest species preferring lowland non-forested areas where most of the data were collected. Occupancy models, however, showed more appropriate results, indicating that forest species preferred lowland forested areas. The predictions of the logistic models were somewhat improved by using spatially biased non-detection data as absence data; however, their estimates of species richness were still much lower than those of the occupancy models. Differences in model outputs were evident for the forest species but not for the grassland/wetland species, because citizen data covered virtually all environmental niches of grassland/wetland species. Results of the single-species and multispecies occupancy models were nearly identical, but in some cases the single-species models did not converge or produced estimates that deviated notably from those of other species, compared with the estimates of the multispecies model.
Main conclusions: We found that citizen data with spatially biased sampling effort can be appropriately utilized for large-scale biodiversity distribution modelling with the use of occupancy models, which encourages data collection by citizen scientists.
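
What distinguishes an occupancy model from logistic regression is that it separates whether a site is occupied (probability psi) from whether the species is detected on a given visit (probability p). A minimal sketch of the standard single-season site likelihood, plus the summed-occupancy richness measure described above, follows. Per-visit detection probabilities stand in for the effort covariates a real model would estimate, the function names are hypothetical, and the single-species and multispecies models compared in the paper are not reproduced.

```python
from math import prod


def site_likelihood(history: list[int], psi: float, p: list[float]) -> float:
    """Likelihood of one site's detection history under a single-season occupancy model.

    history[j] is 1 if the species was detected on visit j, else 0; p[j] is the
    detection probability on that visit (it can vary with sampling effort).
    An all-zero history can arise from true absence or from missed detections.
    """
    if len(history) != len(p):
        raise ValueError("one detection probability per visit")
    detect = prod(pj if y == 1 else 1.0 - pj for y, pj in zip(history, p))
    never_detected = 1.0 if sum(history) == 0 else 0.0
    return psi * detect + (1.0 - psi) * never_detected


def estimated_richness(occupancy_probs: list[float]) -> float:
    """Expected number of species at a site: the sum of their occupancy probabilities."""
    return sum(occupancy_probs)


# Usage: two visits with detection probability 0.5 each, occupancy 0.5.
print(site_likelihood([0, 0], 0.5, [0.5, 0.5]))  # 0.625
print(site_likelihood([1, 0], 0.5, [0.5, 0.5]))  # 0.125
```

The all-zero case shows why heavily surveyed lowland sites dominate a naive logistic model: an occupancy model credits part of each non-detection to low detection probability rather than treating it as a true absence.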

Journal Article

Environmental predictive models for shark attacks in Australian waters

by Peddemors, Vic; Lynch, Samantha K.; Slip, David J. in Additives, Animal behavior, Carcharhinus

2019

Shark attacks are rare but traumatic events that generate social and economic costs and often lead to calls for enhanced attack mitigation strategies that are detrimental to sharks and other wildlife. Improved understanding of the influence of environmental conditions on shark attack risk may help to inform shark management strategies. Here, we developed predictive models for the risk of attack by white (Carcharodon carcharias), tiger (Galeocerdo cuvier), and bull/whaler (Carcharhinus spp.) sharks in Australian waters based on location, sea surface temperature (SST), rainfall, and distance to river mouth. A generalised additive model analysis was performed using shark attack data and randomly generated pseudo-absence (non-attack) data. White shark attack risk was significantly higher at warmer SSTs, increased closer to a river mouth (>10 km), and peaked at a mean monthly rainfall of 100 mm. Whaler shark attack risk increased significantly within 1 km of a river mouth and peaked in the summer months. Tiger shark attack risk increased significantly with rainfall. We performed additional temporal and spatio-temporal analyses to test the hypothesis that SST anomaly (SSTanom) influences white shark attack risk, and found that attacks tend to occur at locations where the SSTanom is lower (i.e. the water is relatively cooler) than in surrounding areas. On the far north coast of eastern Australia (an attack hotspot), a strengthening of the East Australian Current may cause white sharks to move into cooler upwelling waters close to this stretch of coast and increase the risk of an attack.
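
A minimal sketch of assembling a presence/pseudo-absence table of the kind a binomial GAM would be fitted to: attack records are labelled 1 and randomly drawn non-attack records are labelled 0. The (location, month) unit of analysis, the candidate set and the function name are hypothetical; the paper's actual pseudo-absence sampling design and the GAM fit itself are not shown.

```python
import random

Record = tuple[str, int]  # (location id, month 1-12): a hypothetical unit of analysis


def presence_pseudo_absence(
    attacks: list[Record], candidates: list[Record], n_pseudo: int, seed: int = 0
) -> list[tuple[Record, int]]:
    """Label attack records 1 and randomly drawn non-attack records 0."""
    attack_set = set(attacks)
    # Pseudo-absences may only come from candidates with no recorded attack.
    pool = [c for c in candidates if c not in attack_set]
    if n_pseudo > len(pool):
        raise ValueError("not enough candidate non-attack records")
    pseudo = random.Random(seed).sample(pool, n_pseudo)
    return [(a, 1) for a in attacks] + [(c, 0) for c in pseudo]


# Usage: three locations observed in every month, two recorded attacks.
candidates = [(loc, month) for loc in ("A", "B", "C") for month in range(1, 13)]
data = presence_pseudo_absence([("A", 1), ("B", 7)], candidates, n_pseudo=6)
for record, label in data:
    print(record, label)
```

Covariates such as SST, rainfall and distance to river mouth would then be joined to each record before model fitting.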

Journal Article

Use of taxonomy to delineate spatial extent of atlas data for species distribution models

by Toxopeus, Albertus G.; Niamir, Aidin; Real, Raimundo in biogeography, Calibration, data collection

2016

AIM: The use of atlas data in combination with a variety of modelling approaches has become common practice in species distribution studies. The area over which species distribution models (SDMs) are fitted (i.e. the spatial extent) is often arbitrary and coincides with the extent of the atlases. To develop reliable SDMs from species atlas data, we propose an approach that uses the taxonomy of species to delineate the spatial extent of SDMs. LOCATION: Mainland Spain. METHODS: We used atlas data to generate taxonomically delineated datasets for 365 terrestrial species. The presence records in these datasets were identical to those in the atlas, while absence records were restricted to atlas cells where at least one species of the same family or order was present. We also generated two randomly delineated datasets of the same size as the taxonomically delineated datasets. We assessed the predictive performance of the SDMs by examining model calibration (Miller's statistic) and discrimination capacity (area under the receiver operating characteristic curve), along with the geographical similarity of the predicted maps. RESULTS: Models trained on the taxonomically delineated datasets were significantly better calibrated, while their discrimination capacity was no greater than that of models trained on the atlas dataset. The improvements in calibration obtained with the taxonomically delineated datasets were significantly greater than those obtained with random absence sets. MAIN CONCLUSION: Delineating the spatial extent using taxonomic information leads to a significant improvement in the performance of SDMs. This restriction can reduce the effect of environmental events beyond the species' history during model parameterization, thus allowing the resulting models to depict the potential distribution of the species more precisely.
We therefore recommend considering the delineation of spatial extent using species taxonomy when atlas data are employed in SDMs.
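
A minimal sketch of the delineation rule described in the Methods, assuming atlas records are stored as a set of cell identifiers per species. The data structures and the name `taxonomic_dataset` are hypothetical, and family is used here where the paper also allows order.

```python
def taxonomic_dataset(
    atlas: dict[str, set[str]], family: dict[str, str], focal: str
) -> tuple[set[str], set[str]]:
    """Presences and taxonomically delineated absences for one focal species.

    atlas maps each species to the set of atlas cells where it was recorded;
    family maps each species to its family (an order could be used instead).
    Absences are restricted to cells where at least one other species of the
    focal species' family was recorded but the focal species was not.
    """
    presences = atlas[focal]
    relatives = [sp for sp, fam in family.items() if fam == family[focal] and sp != focal]
    family_cells = set().union(*(atlas.get(sp, set()) for sp in relatives))
    absences = family_cells - presences
    return presences, absences


# Usage: sp_a, sp_b and sp_c share a family; sp_d belongs to another family.
atlas = {"sp_a": {"c1", "c2"}, "sp_b": {"c2", "c3"}, "sp_c": {"c4"}, "sp_d": {"c5"}}
family = {"sp_a": "F1", "sp_b": "F1", "sp_c": "F1", "sp_d": "F2"}
pres, absn = taxonomic_dataset(atlas, family, "sp_a")
print(sorted(pres), sorted(absn))  # ['c1', 'c2'] ['c3', 'c4']
```

Using the whole atlas instead would also treat c5 as an absence for sp_a, even though no relative of sp_a was ever recorded there.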

Journal Article

Distribution models for koalas in South Australia using citizen science‐collected data

by Daniels, Christopher B.; Bradshaw, Corey J. A.; Baker, Andrew K. in Biodiversity, Citizen science, climate

2014

The koala (Phascolarctos cinereus) occurs in the eucalypt forests of eastern and southern Australia and is currently threatened by habitat fragmentation, climate change, sexually transmitted diseases, and low genetic variability throughout most of its range. Using data collected during the Great Koala Count (a 1‐day citizen science project in the state of South Australia), we developed generalized linear mixed‐effects models to predict habitat suitability across South Australia while accounting for potential errors associated with the dataset. We derived spatial environmental predictors for vegetation (based on dominant species of Eucalyptus or other vegetation), topographic water features, rain, elevation, and temperature range. We also included predictors accounting for human disturbance based on transport infrastructure (sealed and unsealed roads). We generated random pseudo‐absences to account for the high prevalence bias typical of citizen‐collected data. We accounted for biased sampling effort along sealed and unsealed roads by including an offset for distance to transport infrastructure. The model with the highest statistical support (wAICc ~ 1) included all variables except rain, which was highly correlated with elevation. The same model also explained the highest deviance (61.6%), had high marginal and conditional R² values (R²m = 76.4, R²c = 81.0), and performed well according to Cohen's κ (0.46). Cross‐validation error was low (~ 0.1). Temperature range, elevation, and rain were the best predictors of koala occurrence. Our models predict high habitat suitability on Kangaroo Island, along the Mount Lofty Ranges, and at the tips of the Eyre, Yorke and Fleurieu Peninsulas. In the highest‐density region (5576 km²) of the Adelaide–Mount Lofty Ranges, a density–suitability relationship predicts a population of 113,704 (95% confidence interval: 27,685–199,723; average density = 5.0–35.8 km⁻²).
We demonstrate the power of citizen science data for predicting species' distributions, provided that the statistical approaches applied account for some uncertainties and potential biases. One future improvement that would give citizen science surveys better data on search effort is to have participants activate a smartphone app at the start of each search. The results of our models provide preliminary ranges of habitat suitability and population size for a species for which such data have previously been difficult or impossible to gather. Predicted habitat suitability for koalas in South Australia derived from the generalized linear mixed‐effects models.
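
Two of the model-selection and evaluation quantities reported above can be computed as follows: Akaike weights (wAICc) from AICc values, and Cohen's κ from observed versus predicted presence/absence. This is a minimal sketch with hypothetical function names and illustrative inputs; the mixed-effects model itself, with its offset for distance to roads, is not reproduced.

```python
import math


def cohens_kappa(observed: list[int], predicted: list[int]) -> float:
    """Cohen's kappa for binary presence/absence: (po - pe) / (1 - pe)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need equal-length, non-empty inputs")
    po = sum(1 for o, p in zip(observed, predicted) if o == p) / n  # observed agreement
    obs_yes = sum(observed) / n
    pred_yes = sum(predicted) / n
    pe = obs_yes * pred_yes + (1 - obs_yes) * (1 - pred_yes)  # agreement expected by chance
    if pe == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return (po - pe) / (1 - pe)


def akaike_weights(aicc: list[float]) -> list[float]:
    """Model weights from AICc values: exp(-delta / 2), normalised to sum to 1."""
    best = min(aicc)
    rel = [math.exp(-(a - best) / 2) for a in aicc]
    total = sum(rel)
    return [r / total for r in rel]


# Usage
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
print([round(w, 4) for w in akaike_weights([210.0, 230.0, 245.0])])
```

A weight near 1, as reported for the best koala model, arises whenever the competing models have AICc values several units higher than the top model.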

Journal Article

POC plots: calibrating species distribution models with presence-only data

by Elith, Jane; Phillips, Steven J. in Animal and plant ecology, Animal, plant and microbial ecology, background

2010

Statistical models are widely used for predicting species' geographic distributions and for analyzing species' responses to climatic and other predictor variables. Their predictive performance can be characterized in two complementary ways: discrimination, the ability to distinguish between occupied and unoccupied sites, and calibration, the extent to which a model correctly predicts conditional probability of presence. The most common measures of model performance, such as the area under the receiver operating characteristic curve (AUC), measure only discrimination. In contrast, we introduce a new tool for measuring model calibration: the presence-only calibration plot, or POC plot. This tool relies on presence-only evaluation data, which are more widely available than presence-absence evaluation data, to determine whether predictions are proportional to conditional probability of presence. We generalize the predicted/expected curves of Hirzel et al. to produce a presence-only analogue of traditional (presence-absence) calibration curves. POC plots facilitate visual exploration of model calibration, and can be used to recalibrate badly calibrated models. We demonstrate their use by recalibrating models made by the DOMAIN modeling method on a comprehensive set of 226 species from six regions of the world, significantly improving DOMAIN's predictive performance.
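
The idea behind presence-only calibration can be sketched with the binned predicted/expected (P/E) ratio in the spirit of Hirzel et al.: compare how often evaluation presences fall in each prediction bin with how often background sites do. This is a simplified illustration, not the POC-plot construction itself (the paper's smoothing and scaling are not reproduced); equal-width bins, predictions in [0, 1] and the name `pe_curve` are assumptions.

```python
from typing import Optional


def pe_curve(
    presence_preds: list[float], background_preds: list[float], n_bins: int = 10
) -> list[tuple[float, Optional[float]]]:
    """Binned predicted/expected (P/E) ratios from presence-only evaluation data.

    Predictions are assumed to lie in [0, 1]. For each bin, P is the fraction of
    evaluation presences whose prediction falls in the bin and E is the fraction
    of background sites (a proxy for area) that do. P/E is proportional to the
    conditional probability of presence, so for a model whose predictions are
    proportional to that probability, P/E should rise roughly linearly with the
    bin midpoint.
    """
    if not presence_preds or not background_preds:
        raise ValueError("need presence and background predictions")

    def bin_of(v: float) -> int:
        return min(int(v * n_bins), n_bins - 1)

    pres_counts = [0] * n_bins
    bg_counts = [0] * n_bins
    for v in presence_preds:
        pres_counts[bin_of(v)] += 1
    for v in background_preds:
        bg_counts[bin_of(v)] += 1

    curve: list[tuple[float, Optional[float]]] = []
    for i in range(n_bins):
        midpoint = (i + 0.5) / n_bins
        if bg_counts[i] == 0:
            curve.append((midpoint, None))  # no background in this bin: ratio undefined
        else:
            p = pres_counts[i] / len(presence_preds)
            e = bg_counts[i] / len(background_preds)
            curve.append((midpoint, p / e))
    return curve


# Usage: two bins, three evaluation presences, four background sites.
print(pe_curve([0.9, 0.8, 0.1], [0.9, 0.1, 0.1, 0.5], n_bins=2))
```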

Journal Article
