The WhyWhere paper, “Improving ecological niche models by data mining large environmental datasets for surrogate models” by David R.B. Stockwell, Ecological Modelling 192 (2006) 188–196, is finally available here. Note that the source for the application is temporarily available here, because of a bad file on the main site.

The WhyWhere algorithm (and its accompanying database of environmental data) grew out of concern about two issues:

  • Is eliminating the large number of possible correlates for species distributions justified?
  • Is it justified to ignore the range of possible distributions of species responses to those variables?

It is easy to find justifications for the inverse of these questions: why certain variables, such as annual averages of temperature and rainfall, are included in models, and why certain distributions, such as a bell-shaped curve, might be preferred. But is there any real reason for excluding the alternatives, particularly when their value can be determined objectively by maximum accuracy or some other measure of utility?

Questioning these assumptions with the WhyWhere algorithm showed that many variables not usually included in ecological niche models (ENMs), such as monthly averages of climate, do indeed maximize accuracy. At other times, a surprising variable provides the best model, such as the distribution of beef cattle density for Yellow Star Thistle. In addition, examples have been found of variables that maximize accuracy but do not have a bell-shaped distribution, such as asymptotic (the Brown Tree Snake) and bimodal (house price increase) distributions. The conclusion is that the preference for annual averages and unimodal distributions in most ENM studies is questionable, and may produce models of species distributions with less than maximum accuracy.
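
To make the idea concrete, here is a minimal sketch of screening candidate variables by accuracy without assuming any response shape. It is not the WhyWhere implementation: the function names, the equal-width binning, the variable names and the toy numbers are all assumptions made up for illustration. Background points are treated as absences, and accuracy is measured on the same points used to fit the model.

```python
from collections import Counter

def bin_index(value, lo, hi, n_bins):
    """Map a value to one of n_bins equal-width bins over [lo, hi]."""
    if hi == lo:
        return 0
    i = int((value - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)

def score_variable(presence, background, n_bins=4):
    """Accuracy of a shape-free, binned model for one variable.

    A bin is predicted 'present' when it holds a larger share of the
    presence points than of the background points, so unimodal,
    asymptotic and bimodal responses can all be represented.
    """
    values = presence + background
    lo, hi = min(values), max(values)
    p_counts = Counter(bin_index(v, lo, hi, n_bins) for v in presence)
    b_counts = Counter(bin_index(v, lo, hi, n_bins) for v in background)
    suitable = {
        b for b in range(n_bins)
        if p_counts[b] / len(presence) > b_counts[b] / len(background)
    }
    # Correct = presences in suitable bins + background in unsuitable bins.
    correct = sum(p_counts[b] for b in suitable)
    correct += sum(b_counts[b] for b in range(n_bins) if b not in suitable)
    return correct / (len(presence) + len(background))

def best_variable(candidates, n_bins=4):
    """Return (name, accuracy) of the highest-scoring candidate variable."""
    scores = {
        name: score_variable(pres, back, n_bins)
        for name, (pres, back) in candidates.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]

# Hypothetical toy data: variable name -> (values at presence points,
# values at background points). The second variable has a bimodal response.
candidates = {
    "annual_mean_temp": ([2.0, 4.0, 6.0, 8.0],
                         [1.0, 3.0, 5.0, 7.0, 9.0, 0.0, 10.0, 5.0]),
    "july_rainfall": ([0.5, 1.0, 9.0, 9.5],
                      [0.0, 3.0, 4.0, 5.0, 6.0, 7.0, 10.0, 5.5]),
}

name, acc = best_variable(candidates)
print(name, round(acc, 2))  # july_rainfall 0.83
```

In this toy case the monthly variable with a bimodal response wins, which a model restricted to annual averages or to a single bell-shaped curve would not pick up.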