Screening on the dependent, auto-correlated variable

To screen or not to screen? The question arises in the context of selecting which sets of tree-rings to use for millennial temperature reconstructions. One side, represented by CA, says screening is just plain wrong:

In the last few days, readers have drawn attention to relevant articles discussing closely related statistical errors under terms like “selecting on the dependent variable” or “double dipping – the use of the same data set for selection and selective analysis”.

Another side, represented by Jim Boulden say, says screening is just fine.

So, once again, if you are proposing that a random, red noise process with no actual relationship to the environmental variable of interest (seasonal temperature) causes a spurious correlation with that variable over the instrumental period, then I defy you to show how such a process, with ANY level of lag 1 auto-correlation, operating on individual trees, will lead to what you claim. And if it won’t produce spurious correlations at individual sites, then it won’t produce spurious correlations with larger networks of site either.

Furthermore, whatever extremely low probabilities for such a result might occur for a “typical” site having 20 too 30 cores, is rendered impossible in any practical sense of the word, by the much higher numbers of cores collected in each of the 11 tree ring sites they used. So your contention that this study falls four square within this so called “Screening Fallacy” is just plain wrong, until you demonstrate conclusively otherwise. Instead of addressing this issue–which is the crux issue of your argument–you don’t, you just go onto to one new post after another.

Yet another side, represented by Gergis, says screening is OK providing some preparation such as linear detrending is imposed:

For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

I always find guidance in going back to fundamentals, which people never seem to do in statistics. Firstly, what does “records that were significantly (p<0.05) correlated with the detrended instrumental target" mean? It states that they expect that 95% of the records in their sample are responding to temperature as they want, and that 5% are spurious, bogus, wring-ins, undesirable, caused by something else. It is implicit that being wrong about 5% is good enough for the purposes of their study.

For example, imagine a population of trees where some respond to rainfall and some respond to temperature. Both temperature and rainfall are autocorrelated, and for the sake of simplicity, lets assume they vary independently. If we want to screen those that respond to temperature-only with 95% confidence we can do that by correlating their growth with temperature. But we do have to make sure that the screen we use is sufficiently powerful to eliminate the other, autocorrelated rainfall responders.

The question that arises from autocorrelation in the records -- the tendency for temperature to follow-on and trend even though they are random -- is that the proportion of spurious records, in most tests, may be much higher than 5%. That would be unacceptable for the study. The onus is on the author, by monte-carlo simulation or some other method, to show that the 5% failure rate is really 5%, and not something larger, like 50%, which would invalidate the whole study.

As the tendency of autocorrelated records is to fool us into thinking the proportion of spurious records is lower than it is, then the simplest, most straightforward remedy is to increase the critical value so that the actual proportion of spurious records is once again, around the desired 5% level. This might mean adopting a 99% critical value, a 99.9% or a 99.999% critical value depending on the degree of autocorrelation.

So I would argue that it not correct that screening is an error in all cases. Tricky, but not an error. It is also not correct to impost ad-hocery such as correlating on the detrended variable, as this might simply result in select a different set of spurious records. It is also not correct to apply screening blindly.

What you need to do, is do the modeling stock-standard correctly, as argued in my book Niche Modeling. Work out the plausible error model, or some set of error models if you are uncertain, and establish robust bounds for your critical values using monte-carlo simulation. In the case of tree studies, you might create a population of two or more sets of highly autocorrelated records to test that the screening method performs to the desired tolerance. In my view, the correlation coefficient is fine as it is as good as anything else in this situation.

You get into a lot less trouble that way.