Announce: New fraud detection website

Detecting ‘massaging’ of data by human hands is an area of statistical analysis I have been working on for some time, and I devoted one chapter of my book, Niche Modeling, to its application to environmental data sets.

The WikiChecks web site now incorporates a script for doing a Benford’s analysis of digit frequency, sometimes used in numerical analysis of tax and other financial data.
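For context, here is a minimal sketch of the digit distributions involved. Benford's law proper gives the expected frequency of the leading digit as log10(1 + 1/d); the second digit is already much flatter, and later digits approach a uniform 10% each. The table of results further down compares final digits against equal expected counts. The function names are my own and are not taken from the WikiChecks script.

```python
import math

def benford_first_digit(d: int) -> float:
    """Benford probability that the leading digit equals d (d = 1..9)."""
    return math.log10(1 + 1 / d)

def benford_second_digit(d: int) -> float:
    """Benford probability that the second digit equals d (d = 0..9)."""
    # Sum over every possible leading digit k = 1..9.
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))

first = [benford_first_digit(d) for d in range(1, 10)]
second = [benford_second_digit(d) for d in range(10)]
uniform = [0.1] * 10  # the limit approached by later digits

for d in range(10):
    lead = f"{first[d - 1]:.3f}" if d >= 1 else "  -  "
    print(d, lead, f"{second[d]:.3f}", f"{uniform[d]:.3f}")
```

The leading digit 1 is expected about 30% of the time, while a second digit of 0 is expected only about 12% of the time, which is already close to uniform.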

I have posted some initial tests on the site: random numbers and the like. I also ran each of the major monthly global temperature indices through the site: GISS, RSS, UAH and CRU. The results are listed below, from lowest deviation to highest.

RSS – Pr<1
UAH – Pr<1 based on global data series; Pr<0.001 for whole file (see note)
GISS – Pr<0.05
CRU – Pr<0.01

Numbers such as missing values in the UAH data (-99.990) may have caused its high deviation. I don't know about the others.
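To illustrate how a missing-value code can distort a final-digit count, here is a small sketch. The sample values are made up for illustration (only the -99.990 sentinel comes from the UAH file), and `final_digit_counts` is a hypothetical helper, not part of the site's script.

```python
from collections import Counter

def final_digit_counts(tokens, skip=()):
    """Count the last digit of each numeric token, ignoring tokens listed in skip."""
    counts = Counter({str(d): 0 for d in range(10)})
    for tok in tokens:
        if tok in skip:
            continue
        last = tok[-1]
        if last.isdigit():
            counts[last] += 1
    return counts

# Made-up sample: four ordinary anomalies and three missing-value codes.
sample = ["0.123", "-0.047", "-99.990", "0.281", "-99.990", "0.366", "-99.990"]

raw = final_digit_counts(sample)
clean = final_digit_counts(sample, skip={"-99.990"})

print("zeros with sentinels:", raw["0"])
print("zeros without sentinels:", clean["0"])
```

Every sentinel contributes a final 0, so a file with many missing values will show an excess of zeros that has nothing to do with the measurements themselves.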

Table of results for GISS monthly global temperature data.

Frequency of each final digit: observed vs. expected

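| Digit | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Totals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Observed | 297 | 300 | 267 | 268 | 243 | 262 | 255 | 227 | 253 | 235 | 2607 |
| Expected | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 260 | 2607 |
| Variance | 4.92 | 5.77 | 0.13 | 0.18 | 1.13 | 0.00 | 0.10 | 4.23 | 0.20 | 2.44 | 19.10 |
| Significant | * | * | | | | | | * | | | |

| Statistic | DF | Obtained | Prob | Critical |
|---|---|---|---|---|
| Chi Square | 9 | 19.10 | <0.05 | 16.92 |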

RESULT: Significant management detected.


Significant variation in digit 0: (Pr<0.05) indicates rounding up or down.
Significant variation in digit 1: (Pr<0.05) indicates management.
Significant variation in digit 7: (Pr<0.05) indicates management.
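The figures above can be re-derived from the observed counts. The sketch below assumes the per-digit values are Yates-corrected chi-square terms (the test is described in the comments as chi-square with DF=9 and Yates correction) and that the expected count is the total divided by ten, 260.7, which the table displays as 260. The function name is my own.

```python
observed = [297, 300, 267, 268, 243, 262, 255, 227, 253, 235]

def yates_chi_square(counts):
    """Return per-category Yates-corrected chi-square terms and their sum,
    testing against equal expected counts."""
    n = sum(counts)
    expected = n / len(counts)
    terms = [max(abs(o - expected) - 0.5, 0.0) ** 2 / expected for o in counts]
    return terms, sum(terms)

terms, chi2 = yates_chi_square(observed)

CRITICAL_DF9_P05 = 16.92  # critical value quoted in the results table

for digit, term in enumerate(terms):
    print(digit, f"{term:.2f}")
print(f"chi-square = {chi2:.2f}, significant = {chi2 > CRITICAL_DF9_P05}")
```

Under those assumptions the terms round to the values in the table, and the sum comes to about 19.10, above the 16.92 critical value.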

One of the main sources of global warming information, the GISS data set from NASA, showed significant management, particularly an excess of zeros and ones (297 and 300 observed against about 260 expected). Interestingly, the moving window mode of the algorithm identified two years, 1940 and 1968 (see here).
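The post does not spell out how the moving window mode works. One plausible reading, sketched here purely as an assumption, is to slide a fixed-length window along the series, run the same final-digit chi-square test in each window, and report the windows that exceed the critical value. The window length, function names and demonstration series are all hypothetical.

```python
def last_digit_chi_square(tokens):
    """Yates-corrected chi-square of final digits against a uniform distribution."""
    counts = [0] * 10
    for tok in tokens:
        counts[int(tok[-1])] += 1
    expected = len(tokens) / 10
    return sum(max(abs(c - expected) - 0.5, 0.0) ** 2 / expected for c in counts)

def moving_window_flags(tokens, window, critical=16.92):
    """Return (start_index, statistic) for every window whose statistic exceeds critical."""
    flagged = []
    for start in range(len(tokens) - window + 1):
        stat = last_digit_chi_square(tokens[start:start + window])
        if stat > critical:
            flagged.append((start, stat))
    return flagged

# Made-up series: evenly spread final digits, with a block of rounded values in the middle.
even = [f"0.1{d}" for d in range(10)] * 10   # 100 values, each final digit 10 times
rounded = ["0.50"] * 50                      # 50 values all ending in 0
series = even + rounded + even

flags = moving_window_flags(series, window=50)
starts = [start for start, _ in flags]
print("first flagged window starts at index", starts[0])
```

In this toy series only the windows overlapping the rounded block are flagged, which is how a localized anomaly in a long record could be pinned to a particular period.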

Considerable controversy has surrounded the 1940 period, related to possible adjustments for bucket sampling of water temperatures. I am not aware of controversy surrounding 1968 temperature measurements, although 1968 was a year marked by violent protests and the assassinations of Martin Luther King Jr. and Senator Robert Kennedy.

At this stage I am in exploratory mode. The chi-square test is prone to produce false positives for small samples. Also, there are a number of innocent reasons that digit frequency may diverge from expected. However, the tests are very sensitive. Even if arithmetic operations are performed on data after the manipulations, the ‘fingerprint’ of human intervention can remain.

Update:

Thanks to Luboš Motl, who checked this data, UAH was confirmed to be manipulation-free.

53 Comments

  1. Luboš Motl January 14, 2009 6:07 pm

    Dear David, a great idea. I’ve reproduced your qualitative results but obtained an even stronger signal i.e. lower probability that this non-uniformity appeared by chance, namely 0.4 percent or so for GISS. Click my name to see the details, including a Mathematica notebook.

    http://motls.blogspot.com/2009/01/final-digit-and-cheating-giss.html

  4. admin January 14, 2009 8:49 pm

    Dear Luboš,

    Thanks for checking these results for GISS and UAH. The reason for the high UAH result is that the whole file contains data, such as the number of days in the month, that will give a signal for non-uniform digit frequency. When I run on just the global data series (after extraction into Excel), I get the same result as you, which clears UAH of manipulation.

    This illustrates another way to get false positives. Rounding errors are another.

    GISS and CRU divergences are significant, and the files don’t have these data inhomogeneities. The localization of the GISS divergences at 1940, the year where the controversy over the ‘warmest year of the century’ erupted, is intriguing though.

    The test I use is the standard Chi-square with DF=9 and Yates correction for small samples. I am looking for a more reliable test for small samples.

    http://landshape.org/enm

  5. Nick Stokes January 14, 2009 10:48 pm

    David,
    As an exercise, I put the numbers from your table of results above through the check. The result: Pr<0.001 – “Extremely significant management detected”!

    What have you been up to?

  7. David Stockwell January 14, 2009 11:12 pm

    Nick, Oh right, very funny. You have all the repeated 0 digits from the expected value of 260.

    http://landshape.org/enm

  9. Nathan January 15, 2009 1:56 am

    You should do this on all the lotto draws from last year. You may find THEY have been massaged too.

  11. Nathan January 15, 2009 1:58 am

    I like the disclaimer at the bottom of the web page:

    “Disclaimer: Statistical forensic methods are prone to false positives. Findings must be verified independently. No responsibility is taken for misuse of the tools on this website. ”

    tee hee…

  13. Bishop Hill January 15, 2009 6:05 am

    Would it be possible to run this on raw station data too? I’m guessing the intervention (if there is any) could be there rather than at GISS.

    http://bishophill.squarespace.com

  16. David Stockwell January 15, 2009 6:18 am

    Bishop, I would work backwards from the final data towards the station data, analysing intermediate data sets and trying to show how they change from stage to stage.

    http://landshape.org/enm

  17. Jan Pompe January 15, 2009 6:40 am

    “You should do this on all the lotto draws from last year. You may find THEY have been massaged too.”

    I’m sure they have been :- I never win!!!

    All very interesting.

  19. Rich January 15, 2009 1:15 pm

    Ang on a sec. Benford’s law is about the distribution of initial digits which follow a power law distribution. The test above is chi-squared against an assumed uniform distribution. Not the same animal at all.

  21. John January 15, 2009 4:14 pm

    Can you make your script available? I don’t understand how the Pr values were derived.

  24. davids January 15, 2009 5:42 pm

    #Rich, Benford’s Law tends to uniform in the subsequent digits. Anyway, it doesn’t apply to measurement data, which can have a constant initial digit.

    #John, http://landshape.org/check/check.txt

    http://landshape.org/enm

  25. Steven Talbot January 16, 2009 2:27 am

    David Stockwell,

    You state yourself that “there are a number of innocent reasons that digit frequency may diverge from expected”. Why, then, do you consider it proper to refer to “fraud detection” in your title and “cheating” in your subtitle? It seems to me that you are improperly insinuating such motives on the part of GISS, without respecting the responsibility of actually presenting any proof to support your implications. Evidence of divergence is not proof of fraud or cheating.

  28. admin January 16, 2009 6:16 am

    Steven: “Evidence of divergence is not proof of fraud or cheating.” That is right and I have said that. No insinuation in the title, detecting cheating is what the web site and these methods are mainly for.

    Here is an analogy. You go to your doctor and he suggests a blood test for a condition, prostate cancer say. He tells you the test is not definitive, but if it comes back negative you are clear. The test comes back positive, and he suggests more extensive testing.

    That is what is happening here. Do you accuse the doctor of insinuating you have prostate cancer? If it turns out you don’t have prostate cancer, do you tell him he’s wearing a tin hat for reporting that the first test came back positive?

    Thanks for your comment anyway.

    http://landshape.org/enm

  29. Rich January 16, 2009 9:43 am

    #16 davids

    That was rather my point. If what you’re doing is testing against a uniform distribution (and the table of results shows that you are) why mention Benford’s Law at all? True, the distribution of the Nth digit tends to uniform as N increases but so what?

    No, it doesn’t matter but it confused me at the outset.

  32. admin January 16, 2009 10:39 am

    Rich: “why mention Benford’s Law at all?” Well on the plus side it puts it into context so people can research it. First digit has been used a lot, see Nigrini, second digit I have used in my book on measurement data. This is the first time I have seen last digits used. It’s not a law really.

    http://landshape.org/enm

  33. Geoff Sherrington January 16, 2009 12:04 pm

    In my laboratory times in the 70s, instruments often had meter readouts that the operator had to approximate while they moved gently to and fro. Unknown to the operators, we developed signatures for each of them based on the last digit. It was not infallible, but when you asked “Why did you do XYZ’s readings yesterday?” the body language and sometimes the admission often suggested you were right. So I explained to the staff what we were doing and the practice soon died out. It was sometimes complicated by log scales on the meters.
