Perth 1940 Max Min Daily Temps

Previous posts have introduced the work that Chris Gillham is doing in spot auditing the accuracy of the Bureau of Meteorology’s temperature records. He has now re-recorded the daily max and min temperatures from one Australian weather station for one year, Perth 9034 in 1940, using original sources in The West Australian newspaper.

Below is an initial look at the historic data (in red) compared to the BoM’s “unadjusted” or “raw” records (grey) for the station.

It's fairly clear that there are a lot of errors. The minimum temperatures, however, are the real shockers. Each of the red lines visible in the lower series above is an error in the daily minimum, mostly downwards.

Mean of the max differences = +0.20C
Mean of the min differences = -1.18C
Average max all differences = +0.04C
Average min all differences = -0.33C

While the erroneous max temperatures are out by an average of only +0.2C, the erroneous min temperatures are out by a whopping -1.18C on average! Over the whole year that shifts the annual minimum temperature by -0.33C.

The diurnal range in the BoM raw record is increased by an average of about 0.4C relative to the newspaper record. While these errors are from only one year at one station, it is noteworthy that their magnitude is similar to the change in the diurnal range attributed to global warming.

The data file is here – perth-1940-actual-raw. You need to open it in Excel and save it as a CSV file.

The code below should run on the datafile.

# Read the year of daily data and treat it as a daily time series
P1940=ts(read.csv("perth-1940-actual-raw.csv"),start=1940,freq=365)
l=2 # line width for plotting
# Newspaper (red) vs BoM raw (grey): max in columns 3 and 4, min in columns 7 and 8
plot(P1940[,3],col=2,ylim=c(0,45),main="Perth Regional Office 9034",ylab="Temperature C",lwd=l)
lines(P1940[,4],col="gray",lwd=l)
lines(P1940[,7],col=2,lwd=l)
lines(P1940[,8],col="gray",lwd=l)
# Mean difference (BoM raw minus newspaper) on the days that disagree
maxErrs=P1940[P1940[,3]!=P1940[,4],]
print(mean(maxErrs[,4]-maxErrs[,3]))
minErrs=P1940[P1940[,7]!=P1940[,8],]
print(mean(minErrs[,8]-minErrs[,7]))
# Mean difference over all days of the year
print(mean(P1940[,4]-P1940[,3]))
print(mean(P1940[,8]-P1940[,7]))
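As a quick cross-check on the diurnal range figure quoted above, one extra line can be appended to the script. It assumes the same column layout as the plotting code (newspaper max and min in columns 3 and 7, BoM raw max and min in columns 4 and 8):

print(mean((P1940[,4]-P1940[,8])-(P1940[,3]-P1940[,7]))) # diurnal range, BoM raw minus newspaper, about +0.4C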

Perth 1940 Jan-Dec – Errors

Chris Gillham has completed re-digitising one year's worth of the daily temperature records for Perth in 1940 (perth-1940-actual-raw). These are digitised for all of 1940 at Perth Regional Office 9034 from temperatures published in The West Australian newspaper.

While the majority of the temperatures agree with contemporary BoM data, up to a third of the temperatures in some months disagree, sometimes by over 1C! This is a very strange pattern of errors, and difficult to explain.

I will be doing more detailed analysis, but Chris reports that overall, the annual average of actual daily Perth max temperatures in 1940, as published in the newspaper, was the same as the BoM raw daily max. The annual average of newspaper daily min temperatures was .3C warmer than the BoM raw daily min. ACORN max interpreted 1940 as 1.3C warmer than both the actual newspaper max and the BoM raw max, with ACORN min 1.5C cooler than the actual newspaper min and 1.2C cooler than the BoM raw min.

Anything above a .1C newspaper/raw difference is highlighted.

Chris notes:

It took a couple of days wading through about 310 newspapers to find all the weather reports and although it would be great to have all years from all locations (those with decimalised F newspaper sources) to confirm the Perth 1940 results, it’s a huge task. It would certainly be easier if the BoM just provided the temps from the old logbooks.

Rewriting the Temperature Records – Adelaide 1912

Record temperatures always make the news, with climate alarmists trumpeting any record hot day. But what if the historic record temperatures recorded by the BoM were adjusted down, and recent records were not records at all? More detective work using old newspapers by Chris Gillham, in Adelaide this time.

The BoM claims the hottest ever Feb max at West Terrace was 43.4C on 1 February 1912. They got the date sort of right except the Adelaide Advertiser below shows Feb 1 at 112.5F (44.7C) and Feb 2 at 112.8F (44.9C). The BoM cut Feb 2 to 43.3C in raw.
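For reference, the Fahrenheit-to-Celsius conversions quoted above can be checked with a throwaway snippet of R (the one-decimal rounding is my assumption):

f2c=function(f) round((f-32)*5/9,1) # convert degrees F to degrees C, one decimal place
f2c(c(112.5,112.8)) # 44.7 44.9, matching the Advertiser figures above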

Perth 1940 Jan-Mar Historic Comparisons

Continuing the comparison of historic sources of temperature and contemporary records, Chris Gillham has compiled a list of maximum and minimum daily temperatures for Perth for the months of January, February and March 1940 and uncovered some strange discrepancies (highlighted – all months at perth-newspapers-mar-qtr-1940).

Chris notes that while BoM’s contemporary temperatures largely agree with temperatures reported in newspapers of the day, a couple of temperatures in each month disagree by up to a degree C!

File attached comparing the March quarter 1940 daily newspaper and BoM raw data for Perth Regional Office 9034 (Perth Observatory atop Mt Eliza at the time), plus an ACORN average for each month.

Combining all days in the March 1940 quarter, the average max in The West Australian newspaper was 29.51C and the average BoM raw max was 29.56C. The average min in the newspaper was 17.38C and the average BoM raw min was 17.15C. Rounded, that is max up about .1C and min down about .2C in BoM raw compared to what was reported in 1940. There seems to be a tendency for just two or three temps each month to be adjusted in raw, sometimes up but obviously with a downward bias in min.
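A rough version of these quarterly averages can also be pulled from the annual file loaded in the R code at the top of the page, again assuming newspaper max and min in columns 3 and 7 and BoM raw max and min in columns 4 and 8 (the figures quoted above come from the separate March-quarter spreadsheet, so treat this only as a sketch):

Q1=window(P1940,start=1940,end=1940+90/365) # roughly the first 91 days of 1940
round(colMeans(Q1[,c(3,4,7,8)],na.rm=TRUE),2) # newspaper max, raw max, newspaper min, raw min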

ACORN-SAT judged the three months to have an average max of 31.32C and an average min of 16.17C. So max has been pushed up about 1.8C and min has been pushed down about 1.2C or 1C, depending on your point of view :-).

It always pays to go back to the source data.

Should the ABS take over the BoM?

I read an interesting article about Peter Martin, head of the Australian Bureau of Statistics.

He has a refreshing, mature attitude to his job.

‘I want people to challenge our data – that’s a good thing, it helps us pick things up,’ he says.

Big contrast to the attitude of Climate Scientists. Examples of their belief that they cannot be challenged are legion, from meetings to peer review. For example, emails expressing disagreement with the science are treated as threatening, as shown by the text of eleven emails released under ‘roo shooter’ FOI by the Climate Institute at Australian National University.

Australia’s Chief statistician is also egalitarian. In response to a complaint by the interviewer about employment figures, he responds:

He says he doesn’t believe there is a problem, but gives every indication he’ll put my concerns to his staff, giving them as much weight as if they came from the Treasurer.

This is a far cry from the stated policy of the CSIRO/BoM (Bureau of Meteorology) to respond only to peer-reviewed publications. Even when one does publish statistical audits identifying problems with datasets, as I have done, the likely response is a curt review stating that “this paper should be thrown out because its only purpose is criticism”. It takes a certain type of editor to proceed with publication under those circumstances.

When the Federal Government changes this time, as appears inevitable, one initiative it might consider is a greater role for the ABS in overseeing the BoM's responsibilities. Although the BoM is tasked with the collection of weather and water data by Acts of Parliament, it would benefit from an audit and ongoing supervision by the ABS, IMHO.

Dynamical vs Statistical Models Battle Over ENSO

There is a battle brewing between dynamical and statistical models. The winner will be determined when the currently neutral ENSO conditions resolve, or fail to resolve, into an El Nino over the coming months.

The International Research Institute for Climate and Society compares the predictions of ensembles of each type of model here.

Although most of the set of dynamical and statistical model predictions issued during late April and early May 2012 predict continuation of neutral ENSO conditions through the middle of northern summer (i.e., June-August), slightly more than half of the models predict development of El Nino conditions around the July-September season, continuing through the remainder of 2012. Still, a sizable 40-45% of the models predict a continuation of ENSO-neutral conditions throughout 2012. Most of the models predicting El Nino development are dynamical, while most of those predicting persistence of neutral conditions are statistical.

The figure above shows forecasts of dynamical (solid) and statistical (hollow) models for sea surface temperature (SST) in the Nino 3.4 region for nine overlapping 3-month periods. While differences among the forecasts of the models reflect both differences in model design, and actual uncertainty in the forecast of the possible future SST scenario, the divergence between dynamical and statistical models is clear.

This question fascinates me so much, I studied it for three years “Machine learning and the problem of prediction and explanation in ecological modelling” (1992). Why is there a distinction between dynamical and statistical models? What does it mean for prediction? What does it mean if one set of models are wrong?

For example, what if ENSO remains in a neutral or even La Nina state, thus ‘disproving’ the dynamical models? These models are based on the current understanding of physics (with a number of necessary approximations). Such an outcome would say that something in that understanding of the climate system is wrong.

Alternatively, what if the currently neutral ENSO resolves into an El Nino, ‘disproving’ the statistical models? These models are based on past correlative relationships between variables. It would mean that some important physical feature of the system, missing from the correlative variables, has suddenly come into play.

Why should there be a distinction between dynamical and statistical models at all? I have always argued that good, robust prediction requires no distinction. More precisely, the set of good predictive models lies at the intersection of the statistical and the dynamical models.

To achieve this intersection, from the starting point of a statistical model, each of the parameters and their relationships should be physically measurable. That is, if you use a simple linear regression model, each of the coefficients needs to be physically measurable and the physical relationships between them additive.

From the starting point of a dynamical model, the gross, robust features of the system should be properly described and, if necessary, statistically parameterized. This usually entails a first- or second-order differential equation as the model.
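As an illustration of that intersection (simulated data only, with a decay time I have assumed for the example), a first-order dynamical model dx/dt = -x/tau + noise, discretised at monthly steps, is statistically just an AR(1) process, so the fitted statistical coefficient maps straight back to a physically measurable quantity:

set.seed(42)
tau=6 # assumed physical decay time, in months
phi=exp(-1/tau) # implied lag-1 (monthly) autoregressive coefficient
x=arima.sim(list(ar=phi),n=600) # simulate an SST-like anomaly series
fit=arima(x,order=c(1,0,0)) # fit the equivalent statistical (AR1) model
print(-1/log(coef(fit)["ar1"])) # recovered decay time, close to the assumed tau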

This dynamical/statistical model is then positioned to incorporate both meaningful physical structure, and accurate correlative relationships.

It amazes me that most research models are developed along either dynamical or statistical lines, while ignoring the other.

Screening on the dependent, auto-correlated variable

To screen or not to screen? The question arises in the context of selecting which sets of tree-rings to use for millennial temperature reconstructions. One side, represented by CA, says screening is just plain wrong:

In the last few days, readers have drawn attention to relevant articles discussing closely related statistical errors under terms like “selecting on the dependent variable” or “double dipping – the use of the same data set for selection and selective analysis”.

Another side, represented by Jim Boulden, says screening is just fine.

So, once again, if you are proposing that a random, red noise process with no actual relationship to the environmental variable of interest (seasonal temperature) causes a spurious correlation with that variable over the instrumental period, then I defy you to show how such a process, with ANY level of lag 1 auto-correlation, operating on individual trees, will lead to what you claim. And if it won’t produce spurious correlations at individual sites, then it won’t produce spurious correlations with larger networks of site either.

Furthermore, whatever extremely low probabilities for such a result might occur for a “typical” site having 20 too 30 cores, is rendered impossible in any practical sense of the word, by the much higher numbers of cores collected in each of the 11 tree ring sites they used. So your contention that this study falls four square within this so called “Screening Fallacy” is just plain wrong, until you demonstrate conclusively otherwise. Instead of addressing this issue–which is the crux issue of your argument–you don’t, you just go onto to one new post after another.

Yet another side, represented by Gergis et al., says screening is OK provided some preparation, such as linear detrending, is applied first:

For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record. Only records that were significantly (p<0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.

I always find guidance in going back to fundamentals, which people never seem to do in statistics. Firstly, what does “records that were significantly (p<0.05) correlated with the detrended instrumental target” mean? It states that they expect that 95% of the records in their sample are responding to temperature as they want, and that 5% are spurious, bogus, ring-ins, undesirable, caused by something else. It is implicit that being wrong about 5% of the time is good enough for the purposes of their study.

For example, imagine a population of trees where some respond to rainfall and some respond to temperature. Both temperature and rainfall are autocorrelated, and for the sake of simplicity, let's assume they vary independently. If we want to screen for those that respond only to temperature, with 95% confidence, we can do that by correlating their growth with temperature. But we do have to make sure that the screen we use is sufficiently powerful to eliminate the other, autocorrelated rainfall responders.

The problem that arises from autocorrelation in the records (the tendency of a series to trend and persist even when the underlying process is random) is that the proportion of spurious records passing most tests may be much higher than 5%. That would be unacceptable for the study. The onus is on the authors, by Monte Carlo simulation or some other method, to show that the 5% failure rate really is 5%, and not something larger, like 50%, which would invalidate the whole study.

Because autocorrelated records fool us into thinking the proportion of spurious records is lower than it really is, the simplest, most straightforward remedy is to tighten the critical value so that the actual proportion of spurious records is once again around the desired 5% level. This might mean adopting a 99%, a 99.9% or even a 99.999% critical value, depending on the degree of autocorrelation.

So I would argue that it is not correct to say that screening is an error in all cases. Tricky, but not an error. It is also not correct to impose ad-hocery such as correlating on the detrended variable, as this might simply result in selecting a different set of spurious records. Nor is it correct to apply screening blindly.

What you need to do is the modelling, stock-standard and correct, as argued in my book Niche Modeling. Work out the plausible error model, or some set of error models if you are uncertain, and establish robust bounds for your critical values using Monte Carlo simulation. In the case of tree-ring studies, you might create a population of two or more sets of highly autocorrelated records to test that the screening method performs to the desired tolerance, as in the sketch below. In my view, the correlation coefficient is fine, as it is as good as anything else in this situation.
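Here is a minimal Monte Carlo sketch of that kind of check, in the style of the R code earlier on this page (the series length, autocorrelation and number of trials are all assumptions chosen for illustration). Red-noise 'proxies' with no real relationship to an autocorrelated 'temperature' target are screened at the nominal p<0.05 level, and far more than 5% of them typically pass:

set.seed(1)
n.yrs=70 # length of the screening period in years
phi=0.7 # assumed lag-1 autocorrelation of both target and proxies
target=arima.sim(list(ar=phi),n=n.yrs) # autocorrelated pseudo-temperature target
pass=replicate(1000, {
  proxy=arima.sim(list(ar=phi),n=n.yrs) # red noise with no actual link to the target
  cor.test(as.numeric(proxy),as.numeric(target))$p.value<0.05
})
print(mean(pass)) # proportion passing the screen; well above the nominal 0.05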

You get into a lot less trouble that way.

“This is commonly referred to as ‘research’” – Gergis

Just what is the ‘research’ that Gergis et al. claim to have done? And what are the sceptics complaining about?

The ‘research’ claimed by the Gergis et al. team is captured in the following graphical representation of the past temperature of the Australasian region.

The hockey stick shape has also been produced using similar methods and random data, as shown in my AIG news article in 2006, and also in chapter 11 of my 2007 book “Niche Modeling“.

It is obvious that if the same result can be achieved with random data as with real-world data, then whatever is seen in the real-world data cannot be distinguished from randomness. That is, the patterns seen have not been proven to be significant by the yardsticks of rigorous statistical methods.
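For the record, here is a rough sketch of the kind of demonstration referred to, in R; all the parameters (number of proxies, persistence, correlation threshold) are arbitrary choices of mine for illustration, not those of the AIG article. Red-noise series screened on their correlation with a trending 'instrumental' period and then averaged produce a hockey stick from pure noise:

set.seed(2)
n.yrs=1000 # pseudo-proxy years, say 1000-1999
n.prox=500 # number of random (red noise) pseudo-proxies
instr=seq(0,1,length.out=100) # assumed trending 'instrumental' target over the last century
proxies=replicate(n.prox,as.numeric(arima.sim(list(ar=0.95),n=n.yrs)))
r=apply(proxies[(n.yrs-99):n.yrs,],2,cor,y=instr) # screen on correlation over the final century
recon=rowMeans(proxies[,r>0.3]) # average the pseudo-proxies that pass the screen
plot(1000:1999,recon,type="l",xlab="Year",ylab="Reconstruction (arbitrary units)")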

These problems have been widely discussed at ClimateAudit since 2006, and my publications probably grew out of those discussions. Moreover, the circular argument has become commonly known as the “Screening Fallacy” and widely discussed in relation to this area of research ever since.

To claim results when they could equally be achieved with random numbers would get you laughed off the podium in most areas of science. Gergis et al. informed Steve McIntyre, superciliously, however, that this is commonly referred to as ‘research’.

One of the co-authors, Ailie Gallant, stars in the cringe-worthy We Are Climate Scientists, a pretentious rap-video proclaiming they are “fucking climate scientists” and “their work is peer reviewed” in dollar-store sunglasses and lab coats. They have no reason to act superior, and this recent effort proves the point.

Of course, Gergis et al. claimed to have detrended the data before performing the correlations, and whether this ad-hocery would mitigate the circularity above is questionable. Whether by oversight or intent, it appears the detrending was not performed anyway. I don’t know whether this is the reason for the paper being pulled. We shall find out in time. The paper appears to be the result of a three-year research program, announced on Gergis’ personal blog.

The project, funded by the Australian Research Council’s Linkage scheme, is worth a total of $950K and will run from mid-2009 to mid-2012.

It gives me a job for three years and money to bring a PhD student, research assistant and part time project manager on board.

More importantly, it will go a long way in strengthening the much needed ties between the sciences and humanities scrambling to understand climate change.

Who is contributing more to research: unpaid bloggers, or a million-dollar, three-year, tied-with-the-humanities fiasco?

Gergis’ hockeystick “on hold”

You may by now have heard here or here that “Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium” by Joelle Gergis, Raphael Neukom, Stephen Phipps, Ailie Gallant and David Karoly, has been put “on-hold” by the Journal, due to “an issue” in the processing of the data used in the study.

It is illuminating to review the crowing commentary by the Australian science intelligentsia and the press reaction to the paper.

ABC’s AM show, “Australia’s most informative (government funded) morning current affairs program. AM sets the agenda for the nation’s daily news and current affairs coverage.”

TONY EASTLEY: For the first time scientists have provided the most complete climate record of the last millennium and they’ve found that the last five decades in Australia have been the warmest.

He then speaks for the IPCC:

The Australian researchers used 27 different natural indicators like tree rings and ice cores to come to their conclusion which will be a part of the next report by the United Nations Intergovernmental Panel on Climate Change.

The Gergis paper was proof enough for the ABC Science Show, which gives “fascinating insights into all manner of things from the physics of cricket”.

Robyn Williams: Did you catch that research published last week analysing the last 1,000 years of climate indicators in Australia? It confirmed much of what climate scientists have been warning us about.

Here is another, via an ABC Statewide Drive tweet.

Dr Joelle Gergis from @unimelb: We are as confident that the warming in the post 1950 period is unprecedent in the past 1000 years.

Such shallow and gullible commentary is no better than blogs such as Gerry’s blogistic digression (“I’ve got a blog and I’m gonna use it”).

It’s official: Australia is warming and it is your fault.

The tone of the Real Scientists from realclimate is no better, jubilant that the “hockey-stick” has now been seen in Australia.

First, a study by Gergis et al., in the Journal of Climate uses a proxy network from the Australasian region to reconstruct temperature over the last millennium, and finds what can only be described as an Australian hockey stick.

As Steve Mosher said, such papers cannot be trusted. Putting aside questions of the methodology (which I will get to later), the reviewers in climate science don’t check the data, don’t check that the numbers produce the graphs and tables published, and don’t check that the numbers actually do what the text describes.

Yet they approve the paper for publication.

He is stunned this has to be explained to anyone. Apparently it does.