Warning: This is very preliminary. The main reason for releasing it now is to make it available for comment to anyone who is brave enough to try it out.

This includes an R module for checking geophysical data for ‘results management’. Because the expected distribution of digit frequencies is, in some cases, described by Benford’s Law, the observed frequencies can be compared with that expected distribution to look for deviations.

Installation

The package can be downloaded from here: audit.tar.gz. Install it as you would any R package. Alternatively, you can just source the file audit/R/benford.R from the R console, and read the documentation in the included PDF or HTML files.
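As a rough sketch of the two routes just described, the snippet below installs the tarball if it is found and otherwise falls back to sourcing the script. The helper name load_audit is hypothetical, and the file locations are assumed to be relative to the current working directory.

```r
# Hypothetical helper: try the package tarball first, then the bare script.
load_audit <- function(tarball = "audit.tar.gz",
                       script = "audit/R/benford.R") {
  if (file.exists(tarball)) {
    # Install a local source package (no repository lookup).
    install.packages(tarball, repos = NULL, type = "source")
    return("installed")
  }
  if (file.exists(script)) {
    # Define benford() in the global environment without installing.
    source(script)
    return("sourced")
  }
  "not found"
}

status <- load_audit()
print(status)
```

If the result is "not found", check the working directory with getwd() before trying again.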

Usage

The R module provides a single function with the following signature. See the documentation file for more details on the parameters.

benford <- function(x, dir=".", files="", type="auto", plot=FALSE,
                    save="result.csv", n=0)

There are two modes. When n=0, the first and second digit frequencies of the numeric vector x are plotted and statistics are calculated. An example of the distribution of the first two digits of random numbers is shown below.

chap12-003.png
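The comparison made in the n=0 mode can be sketched independently of the package. The code below is a minimal illustration, not the package's own implementation: the helper leading_digits is hypothetical, and the log-uniform test data is an assumption chosen because it is known to follow Benford's Law closely (it is not necessarily the data used for the figure).

```r
# Benford's Law: expected frequencies of the first digit (1-9)
# and of the second digit (0-9).
benford_first <- log10(1 + 1 / (1:9))
benford_second <- sapply(0:9, function(d2) sum(log10(1 + 1 / (10 * (1:9) + d2))))

# Hypothetical helper: first two significant digits of each non-zero value,
# read from scientific notation to avoid floating-point rounding surprises.
leading_digits <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])
  s <- sprintf("%.14e", x)
  data.frame(first = as.integer(substr(s, 1, 1)),
             second = as.integer(substr(s, 3, 3)))
}

set.seed(1)
x <- 10^runif(5000, 0, 3)   # assumed test data: log-uniform over three decades
d <- leading_digits(x)

# Observed relative frequencies of each digit.
obs_first <- tabulate(d$first, nbins = 9) / nrow(d)
obs_second <- tabulate(d$second + 1, nbins = 10) / nrow(d)

print(round(rbind(observed = obs_first, expected = benford_first), 3))
print(round(rbind(observed = obs_second, expected = benford_second), 3))
```

Note that plain uniform random numbers, such as runif(5000), do not follow Benford's Law for the first digit, which is why the sketch draws the exponent uniformly instead.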

When n>0, the values in the vector are plotted, and Chi-square statistics of the 2nd digit (the most reliable indicator) are calculated on a moving window along the series. This helps locate where the series deviates from Benford’s Law. Below are two figures from this mode, showing the detection of fabricated numbers in the center (between the blue lines) via deviation from the expected 2nd digit frequency, with random numbers on either side.

chap12-006.png
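The moving-window idea can be sketched as follows. This is an illustration under stated assumptions, not the package code: the function names are hypothetical, the window width is assumed to play the role of n, the input is assumed to contain only finite non-zero values, and the "fabricated" block is an artificial series whose second digit is always 5.

```r
# Expected second-digit frequencies (digits 0-9) under Benford's Law.
benford_second <- sapply(0:9, function(d2) sum(log10(1 + 1 / (10 * (1:9) + d2))))

# Second significant digit of each value, read from scientific notation.
second_digit <- function(x) as.integer(substr(sprintf("%.14e", abs(x)), 3, 3))

# Chi-square statistic of the second digit in every window of `width` values.
moving_chisq <- function(x, width) {
  d2 <- second_digit(x)
  expected <- width * benford_second
  starts <- seq_len(length(d2) - width + 1)
  sapply(starts, function(i) {
    observed <- tabulate(d2[i:(i + width - 1)] + 1, nbins = 10)
    sum((observed - expected)^2 / expected)
  })
}

set.seed(1)
natural <- function(k) 10^runif(k, 1, 3)      # assumed stand-in for genuine data
fabricated <- rep(c(150, 250, 350, 450), 25)  # second digit is always 5
x <- c(natural(200), fabricated, natural(200))

stat <- moving_chisq(x, width = 50)

if (interactive()) {
  plot(stat, type = "l", xlab = "window start", ylab = "chi-square (2nd digit)")
  abline(v = c(201, 251), col = "blue")  # windows wholly inside the fabricated block
}
```

Windows that lie entirely inside the fabricated block give a far larger statistic than windows over the surrounding data, which is the pattern the figures illustrate.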

Analysis

The function benford(x) attempts to transform the vector of values x
into a form suitable for extracting the distribution of digits. There are
a number of ways to do this. The R code block below shows the
default “auto” procedure. The main aim is to produce data in which
all values are positive and lie between 10 and 1000.

c <- range(r)  # Extract the range of values
r <- r - c[1]  # Shift so the minimum is zero
s <- 10^(ceiling(log10(mean(r))) - 2)  # Power of ten that puts the mean between 10 and 100
r <- r / s  # Apply the scale factor
r <- r[r > 9]  # Keep only values greater than 9
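To make the effect of these steps concrete, here they are wrapped in a function and applied to a small made-up vector. The name rescale_auto is hypothetical and the input values are invented for illustration. Note that a vector of identical values would give a mean of zero after the shift, which this sketch does not guard against.

```r
# The "auto" steps wrapped in a function (the name is hypothetical).
rescale_auto <- function(r) {
  r <- r - min(r)                         # shift so the minimum is zero
  s <- 10^(ceiling(log10(mean(r))) - 2)   # power of ten scaling the mean into (10, 100]
  r <- r / s                              # apply the scale factor
  r[r > 9]                                # keep only values greater than 9
}

x <- c(1200, 3400, 5600, 7800, 25000)     # made-up input
rescale_auto(x)
# The shifted values 0, 2200, 4400, 6600, 23800 have mean 7400, so s = 100.
# Result: 22 44 66 238
```

The shifted minimum always becomes zero, so at least one value is dropped by the final filter.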

The basic analysis compares the distribution of first and second digits with the frequencies predicted by Benford’s Law. These are shown in a histogram and a graph, with a line indicating the expected frequencies for comparison. As well as Chi-squared statistics, the sum of the norm of the difference between the observed and expected frequencies for each digit is calculated.
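The two statistics mentioned here can be sketched in a few lines. This is an assumption-laden illustration rather than the package's code: digit_stats is a hypothetical name, the "sum of the norm of the difference" is read as the sum of absolute differences between observed and expected relative frequencies, and the counts are made up.

```r
# Expected first-digit frequencies under Benford's Law.
benford_first <- log10(1 + 1 / (1:9))

# Compare observed digit counts with expected probabilities.
digit_stats <- function(counts, expected_p) {
  n <- sum(counts)
  expected <- n * expected_p
  chisq <- sum((counts - expected)^2 / expected)
  list(
    chisq = chisq,
    p_value = pchisq(chisq, df = length(counts) - 1, lower.tail = FALSE),
    abs_diff = sum(abs(counts / n - expected_p))  # assumed reading of "sum of the norm"
  )
}

# Made-up counts: every first digit equally common, which is far from Benford.
uniform_counts <- rep(100, 9)
res <- digit_stats(uniform_counts, benford_first)
print(res)
```

The same function works for second digits if it is given ten counts (digits 0 to 9) and the corresponding ten expected probabilities.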

A more detailed description of these analyses is here.

Disclaimer

This is the first release of the software and is not warranted for any purpose, nor is it claimed that it will produce reliable indications of ‘results management’ fraud or any other thing. Use your common sense and triple check everything, including independent verification of any results.