I suddenly received a number of emails about [tag]WhyWhere[/tag], and I thought I would answer them all here with an update on the progress of the new version. This is my highest priority now, and it should be available as a beta in a week or so. The old version was too hard to maintain, having been built up by a number of students and postdocs over many years. The new version will be in [tag]R[/tag] and so have far fewer lines of code. It will also be more consistent with subscription trends. It will consist Read more [...] none
The Australian Institute of Geoscientists News has published online my article "Reconstruction of past climate using series with red noise" on page 14. Many thanks to Louis Hissink, the editor, for the rapidity of this publication. It is actually a very interesting newsletter, with articles on the IPCC and a summary of the state of the hockey stick (or hokey stick). There are also articles on the K-T boundary controversy and on how to set up an exploration company. Reconstructing the hokey stick with random Read more [...] 6 com
This week I am posting another quiz, although no-one has yet solved the Spaghetti Graph Quiz. This one, suggested by Demetris Koutsoyiannis, may require some statistical analysis to solve. I have plotted the points and converted them to an R statement below. The Quiz: The following numbers are synthetic, generated by a mathematical model. Can anybody decompose them into components such as trends, periodicities or whatever, and can one infer the type of the generating model? data <- c(0.057,0.204,0.469,0.108,0.422,0.046,0.437,0.175,0.371,0.085,0.487,0.602,0.633,0.854,0.529,0.579,0.260,0.695,0.564,0.181,0.991,0.679,0.657,0.648,0.392,0.543,0.293,0.769,0.183,0.932,0.538,0.339,0.335,0.978,0.732,0.325,0.760,0.821,0.651,0.554,0.374,0.692,0.982,0.922,0.604,0.815,0.969,0.986,0.859,0.940) Read more [...] 55 com
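A minimal first pass at the quiz series might look like the following sketch. It is written in Python rather than the R of the post, and the two helper functions (`linear_trend`, `autocorrelation`) are illustrative names, not part of any solution the post gives. It only estimates a least-squares trend and the autocorrelation of the detrended values, which is one reasonable starting point and says nothing definitive about the generating model.

```python
# Quiz series copied from the post (50 synthetic values).
data = [
    0.057, 0.204, 0.469, 0.108, 0.422, 0.046, 0.437, 0.175, 0.371, 0.085,
    0.487, 0.602, 0.633, 0.854, 0.529, 0.579, 0.260, 0.695, 0.564, 0.181,
    0.991, 0.679, 0.657, 0.648, 0.392, 0.543, 0.293, 0.769, 0.183, 0.932,
    0.538, 0.339, 0.335, 0.978, 0.732, 0.325, 0.760, 0.821, 0.651, 0.554,
    0.374, 0.692, 0.982, 0.922, 0.604, 0.815, 0.969, 0.986, 0.859, 0.940,
]


def linear_trend(y):
    """Least-squares slope and intercept of y against its index 0..n-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def autocorrelation(y, lag):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    y_mean = sum(y) / n
    var = sum((v - y_mean) ** 2 for v in y)
    cov = sum((y[i] - y_mean) * (y[i + lag] - y_mean) for i in range(n - lag))
    return cov / var


slope, intercept = linear_trend(data)
# Remove the fitted trend so that any persistence or periodicity is easier to see.
residuals = [v - (intercept + slope * i) for i, v in enumerate(data)]

print(f"trend per step: {slope:.4f}")
for lag in range(1, 6):
    print(f"lag {lag}: residual autocorrelation {autocorrelation(residuals, lag):.3f}")
```

A strongly negative lag-1 value in the residuals would hint at an alternating component, while slowly decaying positive values would hint at persistence; either reading is only a lead to follow up, not an answer.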
Demetris Koutsoyiannis contributed the following excellent piece as a comment on a previous post. I have made it into a post to ensure it gets the widest distribution. Hurst, Joseph, colours and noises: The importance of names in an important natural behaviour “What’s in a name? That which we call a rose / By any other name would smell as sweet.” William Shakespeare, “Romeo and Juliet”, Act 2, Scene 2. Is the name given to a physical phenomenon or in a scientific concept (e.g. a Read more [...] 40 com
There are many reasons a scientist might start a blog:
  • Prepublication of work in progress to enable review by others
  • Outreach to the general community
  • Dissemination of notes
  • Provide a review of the literature
  • Advocate a position or idea
  • Facilitate project management
  • Make money
Of these, the last is probably the trickiest, but I will say something about that too. After deciding to start a blog, the next question is how to do it. There is a range of possibilities available. Following are my notes on the experience. Read more [...] 28 com
A new temperature reconstruction has certainly resonated with many people. Here is a summary of what some of the blogs have been saying, and my corrections of some small inaccuracies. Read more [...] 4 com
Here is the 'spaghetti graph' of a number of prominent reconstructions, with a two-sigma confidence interval. The CRU calibration temperatures are the solid black line. Can you find the random reconstruction? (Thanks to Steve McIntyre at http://www.climateaudit.org/?p=566 for recon data.) Here I am just applying the same red noise technique to the 'select and average' method in use by many, including Esper et al., prior to the MBH98 Principal Components method examined in detail in MM05 Read more [...] 9 com
To recap, previous posts (http://www.climateaudit.org/?p=566) were about replicating the cross-validation procedure used in MBH98 to assess the reconstruction skill of randomly generated series on raw and filtered CRU temperatures. The RE statistic correctly indicated no skill for the reconstruction in both the raw and filtered temperature data. The R2 statistic indicated no skill on the raw temperature data but skill at predicting the filtered temperature data. The importance of these 'tests' is that they Read more [...] none
To follow up on the last post, I have calculated the RE as well as the R2 statistics for the reconstruction from the random series. The same approach was used, i.e. generate 1000 sequences with LTP, select those with positive slope and R2>0.1, calibrate on a linear model, and average. Here is the reconstruction again, with the test and training periods marked with a horizontal dashed line (test period to the left, training to the right of the temperature values): The table below adds the RE statistic Read more [...] none
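The select-and-average procedure described above can be sketched roughly as follows. This is not the code behind the post, and it rests on several assumptions. It uses Python rather than R. AR(1) red noise (coefficient 0.9) stands in for the LTP series. A made-up trend-plus-noise series stands in for the CRU temperatures. "Positive slope and R2>0.1" is read as the slope and squared correlation of temperature regressed on each random series over the training years. The 1901-1990 training and 1856-1900 test split is taken from the MBH98 description elsewhere on this page. RE is computed in its usual form, one minus the ratio of the squared prediction errors to the squared departures from the calibration-period mean.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y):
    """Squared Pearson correlation between x and y."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def reduction_of_error(obs, pred, calib_mean):
    """RE = 1 - SSE(prediction) / SSE(calibration-period mean used as the prediction)."""
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ref = sum((o - calib_mean) ** 2 for o in obs)
    return 1 - sse / ref


def ar1_series(n, phi, rng):
    """AR(1) red noise: a simple stand-in for the LTP series of the post."""
    x = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x


rng = random.Random(0)
years = list(range(1856, 1991))
# Hypothetical temperature anomalies (NOT the CRU data): weak trend plus noise.
temp = [0.005 * (y - 1856) + rng.gauss(0, 0.2) for y in years]

split = years.index(1901)
test, train = slice(0, split), slice(split, None)  # 1856-1900 and 1901-1990

selected = []
for _ in range(1000):
    s = ar1_series(len(years), 0.9, rng)
    slope, intercept = fit_line(s[train], temp[train])
    if slope > 0 and r_squared(s[train], temp[train]) > 0.1:
        # Calibrate: rescale the whole random series into temperature units.
        selected.append([intercept + slope * v for v in s])

# Average the calibrated series year by year to form the "reconstruction".
recon = [mean(column) for column in zip(*selected)]

re_test = reduction_of_error(temp[test], recon[test], mean(temp[train]))
r2_test = r_squared(recon[test], temp[test])
print(f"selected {len(selected)} of 1000 series")
print(f"test-period RE = {re_test:.3f}, R2 = {r2_test:.3f}")
```

Because the selected series contain no climate signal, any apparent fit in the training period comes from the screening step alone, and the two test-period statistics show how such a reconstruction fares on the years held back.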
As a follow-up on the previous post, I have examined the correlation statistics for the reconstruction of past climate from random series with red noise. I have tried to use the same approach as MBH98, where the model is tested over data for years held back from the main analysis and model development. Different intervals of years could be chosen, but in the case of MBH98, the model is trained on years 1901-1990 and tested on years 1856-1900. The distribution of R2 values is as follows: Figure Read more [...] one
In honor of the National Research Council of the National Academies committee to study "Surface Temperature Reconstructions for the Past 1,000-2,000 Years" meeting at this moment, I offer my own climate reconstruction based on the methods blessed by dendroclimatology. The graph below shows reconstructed temperature anomalies over 2000 years, with the surface temperature measurements from 1850 from CRU as black dots, the individual series in blue and the climate reconstruction in black. I think Read more [...] 53 com
The paper on WhyWhere entitled "Improving ecological niche models by data mining large environmental datasets for surrogate models" by David R.B. Stockwell, Ecological Modelling 192 (2006) 188–196 is finally available here. Note the source for the application is temporarily available here, due to a bad file on the main site. The WhyWhere algorithm (and accompanying database of environmental data) developed from concern with two issues: Is eliminating the large number of possible correlates Read more [...] none
Below is an investigation of scale invariance or long term persistence (LTP) in time series including tree-ring proxies – the recognition, quantification and implications for analysis – drawn largely from Koutsoyiannis [2] (preprints available here). In researching this topic, I found a lot of misconceptions about LTP phenomena, such as LTP implying a long term memory process, and a lack of recognition of the implications of LTP. As to implications, the standard error of the mean of Read more [...] 11 com
Predicting real estate is like any other geospatial problem - all you need is data - e.g. see Zillow. If locations such as cities and their house prices or increases are correlated with environmental variables, then a model can be developed. Here I address the question - what environmental variables predict the increase in house prices in metro areas of the US? Applying WhyWhere (WW) to a topic other than predicting species produces some interesting results: the best predictor of areas with high Read more [...] 5 com
The major scientific journals are often regarded as the touchstones of scientific truth. However, their reputation has been tarnished with yet another major scientific fraud unfolding over South Korean researcher Hwang Woo-suk's peer-reviewed and published Stem Cell research. Should the publication of these results be viewed as simple 'mistakes', a crime by a deviant individual, or a broader conspiracy aided by lax reviewing and journal oversight? Blogs were apparently instrumental in uncovering Read more [...] 9 com
There are two main forms of data about species occurrences: lists of locations where a species has been found, called presence-only (P) data, and lists of locations where species are recorded as either present or absent, called presence-absence (PA) data. In developing ENMs, PA data are often said to be preferable to P data (e.g. Austin and Meyers 1996), and some have shown empirical results supporting this view (e.g. Broto et al. 2004). But is there an intrinsic advantage to PA data? The problem of presence-only data arises largely from Read more [...] none
I have become interested in checking out dendroclimatology from the ENM point of view - particularly evaluating the model used for functional responses of alpine trees to temperature. All studies in Briffa et al. 2001 (figure below) use a linear model, an OLS fit of the proxy to temperature, be it tree ring width (TRW) or density (MXD). It is of course not possible for tree growth to increase indefinitely with temperature increases - it has to be limited. The obvious choice for a more accurate Read more [...] none
by David Stockwell and Bing Zhu, for SRB Workshop, February 2-3, 2006, San Diego, due Dec 15th. Here we describe the use of the Storage Resource Broker (SRB) to support new data intensive approaches to Environmental Niche Modeling (ENM) by providing access to cropped images from a remote SRB data store of almost 1000 global coverage data sets. The basic architecture of the system is illustrated in the figure below. Figure 1. Illustration of the components and operation of the SRB WhyWhere data Read more [...] 2 com
The WhyWhere system has integrated a lot of environmental data sets of many different kinds with a robust method. This allows you to search for correlates of any geographic points, not just species. The user does not have to prepare the data sets, just enter the coordinates. I thought it would be interesting to see what correlated with recent temperature anomalies. We all know average annual temperatures have increased in the last 30 years, but the spatial pattern of those increases is less well Read more [...] none
Some have been asking for an explanation of WhyWhere and how it fits in relation to other methods, particularly GARP. Although the details are in the paper, they are in a more academic form and I thought I would try to explain it here. Here is a nice schematic prepared by Jean Tate describing the basic one-dimensional model output from a run on the Yellow Star Thistle, illustrated as a frequency histogram. A 2D model would be similar, just columns with two environmental dimensions. The blue Read more [...] one
I have just placed the Lifemapper paper onto the arXiv pre-print archive here. The use of the GARP genetic algorithm and internet grid computing in the Lifemapper world atlas of species biodiversity Authors: David R.B. Stockwell, James H. Beach, Aimee Stewart, Gregory Vorontsov, David Vieglais, Ricardo Scachetti Pereira Comments: 17 pages, 4 figures, in press at Ecological Modelling Subj-class: Quantitative Methods; Other Lifemapper (this http URL) is a predictive electronic atlas of the Earth's Read more [...] none
I have just posted the WhyWhere paper to arXiv here. Improving ecological niche models by data mining large environmental datasets for surrogate models Authors: David R.B. Stockwell Comments: 16 pages, 4 figures, to appear in Ecological Modelling Subj-class: Quantitative Methods WhyWhere is a new ecological niche modeling (ENM) algorithm for mapping and explaining the distribution of species. The algorithm uses image processing methods to efficiently sift through large amounts of data to find the Read more [...] one
There are a number of issues that arise in the analysis of spatial data points, insufficient data and spatial autocorrelation being two that are often raised. As a general principle, the external accuracy (on the test set) will increase asymptotically as the amount of data increases, and the internal accuracy (on the training set) will decrease asymptotically as the amount of data increases. We are most often interested in external accuracy, so more data is better, but the returns are diminishing. There are also Read more [...] none