Catalogue Search | MBRL

Selecting thresholds for the prediction of species occurrence with presence-only data

by Newell, Graeme; Liu, Canran; White, Matt in Animal and plant ecology; Animal, plant and microbial ecology; Biogeography

2013

Aim: Species distribution models have been widely used to tackle ecological, evolutionary and conservation problems. Most species distribution modelling techniques produce continuous suitability predictions, but many real applications (e.g. reserve design, species invasion and climate change impact assessment) and model evaluations require binary outputs, and thresholds are needed for these transformations. Although there are many threshold selection methods for presence/absence data, it is unclear whether these are suitable for presence-only data. In this paper, we investigate mathematically and empirically which of the existing threshold selection methods can be used confidently with presence-only data. Location: We used real spatially explicit environmental data derived from the western part of the state of Victoria, south-eastern Australia, and simulated species distributions within this area. Methods: Thirteen existing threshold selection methods were investigated mathematically to see whether the same threshold can be produced using either presence/absence data or presence-only data. We further adopted a simulation approach, created many virtual species with differing prevalences in a real landscape in south-eastern Australia, generated data sets with different proportions of pseudo-absences, built eight types of models with four modelling techniques, and investigated the behaviours of four threshold selection methods in these situations. Results: Three threshold selection methods were not affected by pseudo-absences, including max SSS (which is based on maximizing the sum of sensitivity and specificity), the prevalence of model training data and the mean predicted value of a set of random points. Max SSS produced higher sensitivity in most cases and higher true skill statistic and kappa in many cases than the other methods. The other methods produced different thresholds from presence-only data to those determined from presence/absence data. 
Main conclusions: Max SSS is a promising method for threshold selection when only presence data are available.
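
The max SSS criterion named in the abstract (choose the threshold that maximizes sensitivity plus specificity) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `max_sss_threshold`, the toy scores, and the tie-breaking rule (keep the lowest tied threshold) are all assumptions made for the example.

```python
def max_sss_threshold(presence_scores, absence_scores):
    """Return (threshold, sensitivity + specificity) for the best candidate.

    A site is predicted 'present' when its score is >= threshold.
    Candidates are the observed scores; ties keep the lowest threshold
    (an arbitrary choice made for this sketch).
    """
    candidates = sorted(set(presence_scores) | set(absence_scores))
    best_threshold, best_sum = None, -1.0
    for t in candidates:
        # Sensitivity: share of presences predicted present.
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        # Specificity: share of absences (or pseudo-absences) predicted absent.
        specificity = sum(s < t for s in absence_scores) / len(absence_scores)
        if sensitivity + specificity > best_sum:
            best_threshold, best_sum = t, sensitivity + specificity
    return best_threshold, best_sum


# Toy suitability scores (hypothetical values, not data from the paper).
threshold, score = max_sss_threshold([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```

With presence-only data the second argument would hold scores at pseudo-absence or random background points; the abstract reports that the threshold chosen this way is not affected by that substitution.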

Journal Article

Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model

by Hijmans, Robert J. in Algorithms; Animal and plant ecology; Animal, plant and microbial ecology

2012

Species distribution models are usually evaluated with cross-validation. In this procedure evaluation statistics are computed from model predictions for sites of presence and absence that were not used to train (fit) the model. Using data for 226 species, from six regions, and two species distribution modeling algorithms (Bioclim and MaxEnt), I show that this procedure is highly sensitive to "spatial sorting bias": the difference between the geographic distance from testing-presence to training-presence sites and the geographic distance from testing-absence (or testing-background) to training-presence sites. I propose the use of pairwise distance sampling to remove this bias, and the use of a null model that only considers the geographic distance to training sites to calibrate cross-validation results for remaining bias. Model evaluation results (AUC) were strongly inflated: the null model performed better than MaxEnt for 45% and better than Bioclim for 67% of the species. Spatial sorting bias and area under the receiver-operator curve (AUC) values increased when using partitioned presence data and random-absence data instead of independently obtained presence-absence testing data from systematic surveys. Pairwise distance sampling removed spatial sorting bias, yielding null models with an AUC close to 0.5, such that AUC was the same as null model calibrated AUC (cAUC). This adjustment strongly decreased AUC values and changed the ranking among species. Cross-validation results for different species are only comparable after removal of spatial sorting bias and/or calibration with an appropriate null model.
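
The two distances that the abstract compares can be sketched as below. This is an illustration under stated assumptions rather than the author's procedure: it assumes planar coordinates with Euclidean distance, it measures each testing site's distance to its nearest training-presence site and averages those values, and the helper name `mean_nearest_distance` and all coordinates are hypothetical.

```python
import math


def mean_nearest_distance(sites, training_presences):
    """Mean distance from each site to its nearest training-presence site.

    Assumes planar (x, y) coordinates and Euclidean distance; real
    longitude/latitude data would need a geodesic distance instead.
    """
    nearest = [
        min(math.dist(site, train) for train in training_presences)
        for site in sites
    ]
    return sum(nearest) / len(nearest)


# Hypothetical coordinates, chosen only to make the contrast visible.
training_presences = [(0.0, 0.0), (1.0, 0.0)]
testing_presences = [(0.0, 1.0), (1.0, 1.0)]
testing_absences = [(0.0, 5.0), (4.0, 0.0)]

d_presence = mean_nearest_distance(testing_presences, training_presences)
d_absence = mean_nearest_distance(testing_absences, training_presences)
```

When `d_presence` is much smaller than `d_absence`, as in this toy layout, testing presences sit closer to the training data than testing absences do, which is the situation the abstract calls spatial sorting bias.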

Journal Article

Predicting Species Distributions from Small Numbers of Occurrence Records: A Test Case Using Cryptic Geckos in Madagascar

by Peterson, A. Townsend; Raxworthy, Christopher J.; Pearson, Richard G. in Animal and plant ecology; Animal, plant and microbial ecology; Biodiversity conservation

2007

Aim: Techniques that predict species potential distributions by combining observed occurrence records with environmental variables show much potential for application across a range of biogeographical analyses. Some of the most promising applications relate to species for which occurrence records are scarce, due to cryptic habits, locally restricted distributions or low sampling effort. However, the minimum sample sizes required to yield useful predictions remain difficult to determine. Here we developed and tested a novel jackknife validation approach to assess the ability to predict species occurrence when fewer than 25 occurrence records are available. Location: Madagascar. Methods: Models were developed and evaluated for 13 species of secretive leaf-tailed geckos (Uroplatus spp.) that are endemic to Madagascar, for which available sample sizes range from 4 to 23 occurrence localities (at 1 km² grid resolution). Predictions were based on 20 environmental data layers and were generated using two modelling approaches: a method based on the principle of maximum entropy (Maxent) and a genetic algorithm (GARP). Results: We found high success rates and statistical significance in jackknife tests with sample sizes as low as five when the Maxent model was applied. Results for GARP at very low sample sizes (less than c. 10) were less good. When sample sizes were experimentally reduced for those species with the most records, variability among predictions using different combinations of localities demonstrated that models were greatly influenced by exactly which observations were included. Main conclusions: We emphasize that models developed using this approach with small sample sizes should be interpreted as identifying regions that have similar environmental conditions to where the species is known to occur, and not as predicting actual limits to the range of a species.
The jackknife validation approach proposed here enables assessment of the predictive ability of models built using very small sample sizes, although use of this test with larger sample sizes may lead to overoptimistic estimates of predictive power. Our analyses demonstrate that geographical predictions developed from small numbers of occurrence records may be of great value, for example in targeting field surveys to accelerate the discovery of unknown populations and species.

Journal Article

Utility in Willingness to Pay Space: A Tool to Address Confounding Random Scale Effects in Destination Choice to the Alps

by Scarpa, Riccardo; Train, Kenneth; Thiene, Mara in Agricultural economics; Alps; Alternative approaches

2008

We compare two approaches for estimating the distribution of consumers' willingness to pay (WTP) in discrete choice models. The usual procedure is to estimate the distribution of the utility coefficients and then derive the distribution of WTP, which is the ratio of coefficients. The alternative is to estimate the distribution of WTP directly. We apply both approaches to data on site choice in the Alps. We find that the alternative approach fits the data better, reduces the incidence of exceedingly large estimated WTP values, and provides the analyst with greater control in specifying and testing the distribution of WTP.
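
The "usual procedure" described in the abstract, deriving WTP as a ratio of coefficients, can be illustrated with a small sketch. The coefficient values and the function name `wtp_from_coefficients` are hypothetical, and the price coefficient is written as a positive magnitude for simplicity; this is not the authors' estimation code.

```python
def wtp_from_coefficients(attribute_coefs, price_coefs):
    """Derive per-respondent WTP as attribute coefficient / price coefficient.

    price_coefs holds the magnitude of each respondent's price coefficient
    (assumed strictly positive in this sketch).
    """
    return [b / a for b, a in zip(attribute_coefs, price_coefs)]


# Hypothetical coefficients: identical taste for the attribute, but the
# third respondent has a price coefficient close to zero.
attribute_coefs = [1.0, 1.0, 1.0]
price_coefs = [1.0, 0.5, 0.01]

derived_wtp = wtp_from_coefficients(attribute_coefs, price_coefs)
```

Because the price coefficient sits in the denominator, a value near zero produces a very large derived WTP even when the attribute coefficient is unremarkable. Specifying the distribution of WTP directly, the alternative the abstract describes, avoids taking this ratio.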

Journal Article

Paired-End Analysis of Transcription Start Sites in Arabidopsis Reveals Plant-Specific Promoter Signatures

by Ohler, Uwe; Li, Song; Carda, Alexa in Arabidopsis; Arabidopsis - genetics; Arabidopsis - metabolism

2014

Understanding plant gene promoter architecture has long been a challenge due to the lack of relevant large-scale data sets and analysis methods. Here, we present a publicly available, large-scale transcription start site (TSS) data set in plants using a high-resolution method for analysis of 5' ends of mRNA transcripts. Our data set is produced using the paired-end analysis of transcription start sites (PEAT) protocol, providing millions of TSS locations from wild-type Columbia-0 Arabidopsis thaliana whole root samples. Using this data set, we grouped TSS reads into "TSS tag clusters" and categorized clusters into three spatial initiation patterns: narrow peak, broad with peak, and weak peak. We then designed a machine learning model that predicts the presence of TSS tag clusters with outstanding sensitivity and specificity for all three initiation patterns. We used this model to analyze the transcription factor binding site content of promoters exhibiting these initiation patterns. In contrast to the canonical notions of TATA-containing and more broad "TATA-less" promoters, the model shows that, in plants, the vast majority of transcription start sites are TATA free and are defined by a large compendium of known DNA sequence binding elements. We present results on the usage of these elements and provide our Plant PEAT Peaks (3PEAT) model that predicts the presence of TSSs directly from sequence.

Journal Article

A Statistical Model of Facial Attractiveness

by Todorov, Alexander; Said, Christopher P. in Accounts; Alternative approaches; Beauty

2011

Previous research has identified facial averageness and sexual dimorphism as important factors in facial attractiveness. The averageness and sexual dimorphism accounts provide important first steps in understanding what makes faces attractive, and should be valued for their parsimony. However, we show that they explain relatively little of the variance in facial attractiveness, particularly for male faces. As an alternative to these accounts, we built a regression model that defines attractiveness as a function of a face's position in a multidimensional face space. The model provides much more predictive power than the averageness and sexual dimorphism accounts and reveals previously unreported components of attractiveness. The model shows that averageness is attractive in some dimensions but not in others and resolves previous contradictory reports about the effects of sexual dimorphism on the attractiveness of male faces.

Journal Article

Systems Model of Signaling Identifies a Molecular Basis Set for Cytokine-Induced Apoptosis

by Sorger, Peter K; Janes, Kevin A; Lauffenburger, Douglas A in Analysis; Apoptosis; Autocrine Communication

2005

Signal transduction pathways control cellular responses to stimuli, but it is unclear how molecular information is processed as a network. We constructed a systems model of 7980 intracellular signaling events that directly links measurements to 1440 response outputs associated with apoptosis. The model accurately predicted multiple time-dependent apoptotic responses induced by a combination of the death-inducing cytokine tumor necrosis factor with the prosurvival factors epidermal growth factor and insulin. By capturing the role of unsuspected autocrine circuits activated by transforming growth factor-α and interleukin-1α, the model revealed new molecular mechanisms connecting signaling to apoptosis. The model derived two groupings of intracellular signals that constitute fundamental dimensions (molecular "basis axes") within the apoptotic signaling network. Projection along these axes captures the entire measured apoptotic network, suggesting that cell survival is determined by signaling through this canonical basis set.

Journal Article

A New Resource-Constrained Multicommodity Flow Model for Conflict-Free Train Routing and Scheduling

by Fuchsberger, M.; Laumanns, M.; Zenklusen, R. in Algorithms; Alternative approaches; Applied sciences

2011

This paper addresses the problem of generating conflict-free train schedules on a microscopic model of the railway infrastructure. Conflicts arise if two or more trains are scheduled to block the same track section at the same time. A standard model for this problem is the so-called conflict graph, where each considered train path corresponds to a vertex, and edges represent pairwise conflicts so that a conflict-free schedule corresponds to a maximum independent set. Because the linear programming relaxation of the conflict graph formulation is typically very weak, we develop an alternative model using the sequence of resources that each train path passes, encoded in a resource tree. For each resource, we can efficiently determine the maximal conflict cliques by scanning through the blocking times of all train paths and use these cliques as strong cutting planes in an integer linear programming formulation. We show that the number of maximal conflict cliques is linear in the number of train paths, so the ILP formulation uses far fewer but stronger constraints compared to the conflict graph model. In tests with real-world data from the Swiss Federal Railways, the new Resource Tree Conflict Graph model generates conflict-free schedules for major stations within seconds, even though the underlying model contains about half a million binary variables. This corresponds to a reduction of the computation time of roughly two orders of magnitude when compared to previous approaches and thus allows us to tackle considerably larger problem instances.
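
The step of finding maximal conflict cliques on a single resource by scanning blocking times can be sketched with a standard sweep over interval endpoints. This is a generic interval-graph technique offered as an illustration, not the authors' implementation: the function name `maximal_conflict_cliques`, the half-open interval convention, and the example blocking times are assumptions.

```python
def maximal_conflict_cliques(blocking_times):
    """Find maximal sets of train paths that block one resource simultaneously.

    blocking_times maps a path id (string) to a half-open interval
    (start, end). Two paths conflict when their intervals overlap.
    Returns maximal cliques of size >= 2, in time order.
    """
    events = []
    for path, (start, end) in blocking_times.items():
        events.append((start, 1, path))  # 1 = blocking starts
        events.append((end, 0, path))    # 0 = blocking ends; sorts before a
                                         # start at the same time, so touching
                                         # intervals do not count as a conflict
    events.sort()

    active = set()   # paths currently blocking the resource
    cliques = []
    grew = False     # True if a path was added since the last release
    for _, kind, path in events:
        if kind == 1:
            active.add(path)
            grew = True
        else:
            # The active set is maximal just before the first release
            # that follows one or more additions.
            if grew and len(active) >= 2:
                cliques.append(frozenset(active))
            grew = False
            active.discard(path)
    return cliques


# Hypothetical blocking times on one track section.
cliques = maximal_conflict_cliques(
    {"A": (0, 10), "B": (5, 15), "C": (12, 20), "D": (20, 25)}
)
```

Each release event emits at most one clique, so the number of cliques found on a resource cannot exceed the number of train paths, which matches the linear bound stated in the abstract.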

Journal Article

Modelling Ecological Niches with Support Vector Machines

by Guisan, Antoine; Randin, Christophe; Drake, John M. in Applied ecology; Datasets; Ecological invasion

2006

1. The ecological niche is a fundamental biological concept. Modelling species' niches is central to numerous ecological applications, including predicting species invasions, identifying reservoirs for disease, nature reserve design and forecasting the effects of anthropogenic and natural climate change on species' ranges. 2. A computational analogue of Hutchinson's ecological niche concept (the multi-dimensional hyperspace of species' environmental requirements) is the support of the distribution of environments in which the species persist. Recently developed machine-learning algorithms can estimate the support of such high-dimensional distributions. We show how support vector machines can be used to map ecological niches using only observations of species presence to train distribution models for 106 species of woody plants and trees in a montane environment using up to nine environmental covariates. 3. We compared the accuracy of three methods that differ in their approaches to reducing model complexity. We tested models with independent observations of both species presence and species absence. We found that the simplest procedure, which uses all available variables and no pre-processing to reduce correlation, was best overall. Ecological niche models based on support vector machines are theoretically superior to models that rely on simulating pseudo-absence data and are comparable in empirical tests. 4. Synthesis and applications. Accurate species distribution models are crucial for effective environmental planning, management and conservation, and for unravelling the role of the environment in human health and welfare. Models based on distribution estimation rather than classification overcome theoretical and practical obstacles that pervade species distribution modelling. 
In particular, ecological niche models based on machine-learning algorithms for estimating the support of a statistical distribution provide a promising new approach to identifying species' potential distributions and to project changes in these distributions as a result of climate change, land use and landscape alteration.

Journal Article

ModEco: an integrated software package for ecological niche modeling

by Guo, Qinghua; Liu, Yu in Abundance; Animal and plant ecology; Animal, plant and microbial ecology

2010

ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as presence and absence data, presence-only data, and abundance data; 2) it provides a range of models when dealing with presence-only data, such as presence-only models, pseudo-absence models, background vs presence data models, and ensemble models; and 3) it includes relatively comprehensive tools for data visualization, feature selection, and accuracy assessment.

Journal Article
