Edward Vul, Christine Harris, Piotr Winkielman, & Harold Pashler have published research that offers useful insight into ‘cherry-picking’, the prior selection of desirable results that leads to exaggerated significance. They also demonstrate the effect in a comprehensive survey of studies in the field of social neuroscience.

To further ‘pin the thumbs of researchers to the table’, and to make sure the point is noticed rather than ignored, they name every study explicitly, listing both those that exaggerate significance and those that don’t. This is a great example of how not to win friends while influencing people, and it gets 5 stars from me. Here is the back story on the brow-chewing response from their colleagues.

The statistical basis of the paper is this: the strength of the correlation observed between measures A and B, $r_{ObsA,ObsB}$, reflects not only the strength of the relationship between the traits underlying A and B, $r_{A,B}$, but also the reliability of the measures of A and B, $\mathrm{reliability}_A$ and $\mathrm{reliability}_B$. In general,

$$r_{ObsA,ObsB} = r_{A,B} \sqrt{\mathrm{reliability}_A \cdot \mathrm{reliability}_B}$$

Because $r_{A,B}$ can be at most 1 (a perfect correlation), the reliabilities of the two measures set an upper bound on the correlation that can be observed between them (Nunnally, 1970).

The problem is that many reported correlations are implausibly high. For example, a subject’s proneness to anxiety reactions (Carver and White, 1994; reference omitted from the paper) was reported to correlate with a neuroimaging measure at a very high r = .96. Measures of personality and emotion evidently do not often have reliabilities greater than .8, and neuroimaging measures seem typically to be reliable at .7 or less. Even assuming a perfect underlying correlation, the maximum observable correlation would be $\sqrt{0.8 \times 0.7} \approx 0.75$. A correlation of .96 is therefore impossible.
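The ceiling arithmetic above can be checked in a few lines of Python. The helper names are hypothetical. Note that this check ignores sampling error: in a small sample a single reported r can exceed the ceiling by chance, so it flags a result as suspect rather than proving misconduct.

```python
import math


def max_observable_r(rel_a: float, rel_b: float) -> float:
    """Ceiling on the observed correlation: the attenuation formula with r_true = 1."""
    return math.sqrt(rel_a * rel_b)


def exceeds_ceiling(reported_r: float, rel_a: float, rel_b: float) -> bool:
    """True if a reported correlation is larger than the two reliabilities allow."""
    return abs(reported_r) > max_observable_r(rel_a, rel_b)


ceiling = max_observable_r(0.8, 0.7)
print(f"ceiling = {ceiling:.3f}")        # ceiling = 0.748
print(exceeds_ceiling(0.96, 0.8, 0.7))  # True
```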

The result must therefore have been achieved by some process of ‘data-peeking’ or ‘cherry-picking’: keeping only the results whose correlations pass a high threshold, discarding those that are weakly correlated, and reporting the correlation of the survivors.

An analogous situation arises in climate science when a subset of tree-ring proxies is selected by calibrating them against global temperatures. This procedure alone can produce a hockey-stick-shaped temperature history from random data series. It should be possible to estimate the maximum expected calibration correlation from (1) the reliability of tree-ring signals as indicators of climate change and (2) the reliability of the climate measurements themselves. The square root of the product of these two reliabilities should set an upper limit on the calibration correlation. If published calibration statistics exceed this figure, then a sample of poorly correlated trees must have been discarded to inflate the correlation.

Using this approach, it should be possible to quantify the degree of ‘cherry-picking’ that has taken place, resolving one of the main points of contention that skeptics have had with this field.

HT to Geoff Sherrington.


References:

Nunnally, JC. Introduction to Psychological Measurement. New York: McGraw-Hill; 1970.