Title
Pseudo-absences, pseudo-models and pseudo-niches: pitfalls of model selection based on the area under the curve
Abstract
The area under the curve (AUC) of the receiver operator characteristic (ROC) graph is regarded as an objective measure of the discrimination accuracy of predictive models. AUC scores calculated from background values, or pseudo-absences, have been proposed as a method of model selection for species distribution models (SDMs) fitted to presence-only data. However, the utility of AUC as a measure of model performance when data on confirmed absence are unavailable has not been fully investigated. We fitted SDMs using informative climatic variables for 2000 species of Mesoamerican trees. As a reference, we also built ‘pseudo-models’ using Gaussian random fields with no biological meaning. AUC correctly selected SDMs fitted to single environmental variables over ‘pseudo-models’ fitted to single random fields in almost all cases. However, when all seven variables were included in the models, AUC erroneously selected complex pseudo-models over complex climate models in 17% of the cases. The spatial distribution patterns predicted by the pseudo-models differed from the results derived from climate-based models, even when overall AUC scores were similar. Both model and pseudo-model AUC values increased when presence points were few and spatially aggregated. The results show that AUC calculated from presence-only data can be an unreliable guide for model selection. Pseudo-absences have ill-defined properties that challenge the interpretation of AUC values. Inference on multidimensional niche spaces should not be supported by AUC values calculated using pseudo-absences.
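As a rough illustration of the quantity discussed in the abstract, the sketch below computes AUC from scores at presence cells and at pseudo-absence (background) cells, using a spatially smoothed noise grid as the only "predictor". It is not the authors' procedure: the function names, the grid size, the box-smoothing shortcut standing in for a Gaussian random field, and the clustered presence cells are all assumptions made for illustration, and the raw field is used directly as the score rather than a fitted model.

```python
import random


def auc_presence_background(presence_scores, background_scores):
    """AUC as the probability that a presence cell outscores a background cell.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def smoothed_noise_field(size, passes, rng):
    """Crude stand-in for a Gaussian random field: white noise, box-smoothed."""
    field = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    for _ in range(passes):
        smoothed = []
        for i in range(size):
            row = []
            for j in range(size):
                # Average each cell with its (up to 8) neighbours.
                neighbours = [
                    field[a][b]
                    for a in range(max(0, i - 1), min(size, i + 2))
                    for b in range(max(0, j - 1), min(size, j + 2))
                ]
                row.append(sum(neighbours) / len(neighbours))
            smoothed.append(row)
        field = smoothed
    return field


rng = random.Random(0)  # fixed seed so the sketch is repeatable
field = smoothed_noise_field(size=30, passes=3, rng=rng)

# Hypothetical presences: a small, spatially aggregated cluster of cells.
presence_cells = [(i, j) for i in range(10, 13) for j in range(10, 13)]
# Pseudo-absences: cells drawn at random from the whole grid.
background_cells = [(rng.randrange(30), rng.randrange(30)) for _ in range(200)]

presence_scores = [field[i][j] for i, j in presence_cells]
background_scores = [field[i][j] for i, j in background_cells]
auc = auc_presence_background(presence_scores, background_scores)
print(f"AUC of a meaningless field against pseudo-absences: {auc:.2f}")
```

Because the presence cells are few and clustered, they tend to share similar field values, so the printed AUC can land well away from 0.5 in either direction even though the field carries no biological information. A fitted model is free to exploit whichever direction the data happen to favour.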
Year
2012
DOI
10.1080/13658816.2012.719626
Venue
International Journal of Geographical Information Science
Keywords
complex pseudo-models, overall auc score, auc score, curve auc, model performance, complex climate model, auc value, pseudo-model auc, model selection, climate-based model, species distribution modelling
Field
Data mining, Climate model, Environmental niche modelling, Random field, Receiver operating characteristic, Inference, Model selection, Gaussian, Statistics, Mathematics, Area under the curve
DocType
Journal
Volume
26
Issue
11
ISSN
1365-8816
Citations
7
PageRank
0.69
References
2
Authors
4
Name              Order   Citations   PageRank
Duncan Golicher   1       7           0.69
Andrew Ford       2       7           0.69
Luis Cayuela      3       7           0.69
Adrian Newton     4       7           0.69