Errors of Global Warming Effects Modeling

Since 2006, in between promoting numeracy in education and using topical issues from the theory of Anthropogenic Global Warming (AGW) as examples of simple statistics, I have asked the question “Have these models been validated?” in blog posts and occasionally in submissions to journals. This post summarizes those efforts.

Species Extinctions

Predictions of massive species extinctions due to AGW came into prominence with a January 2004 paper in Nature, Extinction Risk from Climate Change, by Chris Thomas et al. They made the following prediction:

“we predict, on the basis of mid-range climate-warming scenarios for 2050, that 15–37% of species in our sample of regions and taxa will be ‘committed to extinction’.”

Subsequently, three communications appeared in Nature in July 2004. Two raised technical problems, including one by the eminent ecologist Joan Roughgarden. Opinions ranged from “Dangers of Crying Wolf over Risk of Extinctions”, concerned with the damage alarmism does to conservation, through criticism of poorly written press releases by the scientists themselves, to “Extinction risk [press] coverage is worth the inaccuracies”, which stated that “we believe the benefits of the wide release greatly outweighed the negative effects of errors in reporting”.

Among those who believe gross scientific inaccuracies are not justified, and that such attitudes diminish the standing of scientists, I was invited to a meeting of a multidisciplinary group of 19 scientists, including Dan Botkin from UC Santa Barbara, the mathematician Matt Sobel, Craig Loehle and others, at the Copenhagen base of Bjørn Lomborg, author of The Skeptical Environmentalist. This resulted in Forecasting the Effects of Global Warming on Biodiversity, published in BioScience in 2007. We were particularly concerned by the cavalier attitude to model validation in the Thomas paper, and in the field in general:

Of the modeling papers we have reviewed, only a few were validated. Commonly, these papers simply correlate present distribution of species with climate variables, then replot the climate for the future from a climate model and, finally, use one-to-one mapping to replot the future distribution of the species, without any validation using independent data. Although some are clear about some of their assumptions (mainly equilibrium assumptions), readers who are not experts in modeling can easily misinterpret the results as valid and validated. For example, Hitz and Smith (2004) discuss many possible effects of global warming on the basis of a review of modeling papers, and in this kind of analysis the unvalidated assumptions of models would most likely be ignored.

The paper observed that few mass extinctions have been seen over recent rapid climate changes, suggesting something must be wrong with the models to get such high rates of extinctions. They speculated that species may survive in refugia, suitable habitats below the spatial scale of the models.

Another example of an unvalidated assumption that could bias results in the direction of extinctions was described in chapter 7 of my book Niche Modeling.


When climate change shifts a species’ niche over a landscape (dashed to solid circle), the response of that species can be described in three ways: dispersing to the new range (migration), local extirpation (intersection), or expansion (union). Given that the probability of extinction falls as range size grows, extinctions will show no change (migration), an increase (intersection), or a decrease (union), depending on the dispersal type assumed. Thomas et al. failed to consider range expansion (union), a behavior that predominates in many groups. Consequently, the methodology was inherently biased towards extinctions.
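The three dispersal assumptions are easy to sketch with hypothetical ranges held as sets of grid cells (all cell counts here are illustrative, not from Thomas et al.):

```python
# Hypothetical 1-D species range as a set of occupied grid cells.
old_range = set(range(0, 10))     # cells 0..9 are currently suitable
new_niche = set(range(4, 14))     # climate change shifts suitability to 4..13

migration    = new_niche                 # full dispersal: the species tracks its niche
intersection = old_range & new_niche     # no dispersal: only the overlap survives
union        = old_range | new_niche     # range expansion: old cells plus new cells

# With extinction risk falling as range size grows, the three assumptions give
# no change, increased risk, and decreased risk respectively.
print(len(migration), len(intersection), len(union))   # 10 6 14
```

Whichever assumption the modeler makes, it directly sets the projected range size, and hence the projected extinction risk.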

One of the many errors in this work was a failure to evaluate the impact of such assumptions.

The prevailing view now, according to Stephen Williams, coauthor of the Thomas paper, Director of the Centre for Tropical Biodiversity and Climate Change, and author of such classics as “Climate change in Australian tropical rainforests: an impending environmental catastrophe”, appears to be the following:

Many unknowns remain in projecting extinctions, and the values provided in Thomas et al. (2004) should not be taken as precise predictions. … Despite these uncertainties, Thomas et al. (2004) believe that the consistent overall conclusions across analyses establish that anthropogenic climate warming at least ranks alongside other recognized threats to global biodiversity.

So how precise are the figures? Williams suggests we should simply trust the beliefs of Thomas et al., an approach referred to disparagingly in the forecasting literature as a judgmental forecast rather than a scientific forecast (Green & Armstrong 2007). These simple models gloss over numerous problems in validating extinction models, including the propensity of so-called extinct species to reappear. Usually they are small, hard to find, and no one is really looking for them.


The Hockey Stick

One of the pillars of AGW is the view that 20th-century warmth is exceptional in the context of the past 1200 years, illustrated by the famous hockey-stick graph, as seen in movies and government reports to this day.

Claims that 20th-century warming is ‘exceptional’ rely on the selection of so-called temperature ‘proxies’ such as tree rings, and on statistical tests of the significance of changes in growth. I modelled the proxy selection process here and showed you can get a hockey-stick shape using random numbers (with serial correlation). When the numbers trend, and are then selected based on correlation with recent temperatures, the result is inevitably ‘hockey stick’ shaped: a distinct uptick where the random series correlate with recent temperatures, and a long straight shaft as the series revert back to the mean. My reconstruction was similar to many other reconstructions, with a low-variance medieval warm period (MWP).
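A minimal sketch of that screening effect (the parameters are illustrative, and AR(1) red noise is assumed here in place of the fractional differencing I originally used): generate serially correlated random series, screen them by correlation with a rising ‘instrumental’ record, and average the survivors.

```python
import numpy as np

rng = np.random.default_rng(42)
n_series, n_years, calib = 1000, 600, 50   # illustrative proxy network

# Red noise: AR(1) series with strong serial correlation
noise = rng.standard_normal((n_series, n_years))
proxies = np.zeros_like(noise)
for t in range(1, n_years):
    proxies[:, t] = 0.9 * proxies[:, t - 1] + noise[:, t]

# A rising 'instrumental temperature' over the calibration period
target = np.linspace(0.0, 2.0, calib)

# Ex-post screening: keep only proxies that correlate with recent temperature
r = np.array([np.corrcoef(p[-calib:], target)[0, 1] for p in proxies])
recon = proxies[r > 0.3].mean(axis=0)

# The screened average has an uptick in the calibration window and a flat
# shaft elsewhere, as the unselected portions revert to their mean of zero.
print(round(recon[:-calib].mean(), 2), round(recon[-1], 2))
```

The shaft hovers near zero while the calibration window rises: a hockey stick manufactured entirely from noise.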


It is an error to underestimate the effect on uncertainty of ex-post selection based on correlation, or ‘cherry picking’. Cherry picking has been much criticised at ClimateAudit. In February 2009, Steve McIntyre and Ross McKitrick published a comment, citing my AIG article, in criticism of an article by Michael Mann, saying:

Numerous other problems undermine their conclusions. Their CPS reconstruction screens proxies by calibration-period correlation, a procedure known to generate ‘‘hockey sticks’’ from red noise (4).

The response by Michael Mann acknowledged that such screening was common and was used in their reconstructions, but claimed the objection was ‘unsupported’ in the literature:

McIntyre and McKitrick’s claim that the common procedure (6) of screening proxy data (used in some of our reconstructions) generates ‘‘hockey sticks’’ is unsupported in peer-reviewed literature and reflects an unfamiliarity with the concept of screening regression/validation.

In fact, it is supported in the peer-reviewed literature: Gerd Bürger raised the same objection in a 29 June 2007 Science comment on “The Spatial Extent of 20th-Century Warmth in the Context of the Past 1200 Years” by Osborn and Briffa, finding 20th-century warming not exceptional:

However, their finding that the spatial extent of 20th-century warming is exceptional ignores the effect of proxy screening on the corresponding significance levels. After appropriate correction, the significance of the 20th-century warming anomaly disappears.

The National Academy of Sciences agreed that the uncertainty was greater than appreciated, and shortened the hockey stick of the time by 600 years (contrary to assertions in the press).

Long Term Persistence (LTP)

One of my first PHP applications was a fractional differencing climate simulation. Reload to see a new simulation below, together with measures of correlation (R2 and RE) with some monthly climate figures of the time.

This little application gathered a lot of interest, I think because fractional differencing is an inherently interesting technique: it creates realistic temperature simulations and is a very elegant way to generate series with long-term persistence (LTP), a statistical property that produces natural ‘trendiness’. One of the persistent errors in climate science has been the failure to take into account the autocorrelation in climate data, leading to inflated significance values.
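For readers who want to experiment, the idea can be sketched in Python rather than PHP (the parameter d = 0.3 is illustrative): white noise filtered with the binomial weights of the operator (1 − B)^(−d) gives a fractionally integrated, long-term persistent series.

```python
import numpy as np

def fracdiff_sim(n, d=0.3, seed=0):
    """Simulate a fractionally integrated (LTP) series by filtering white
    noise with the binomial weights of the operator (1 - B)^(-d)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    psi = np.empty(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + d) / k   # recursion for the weights
    return np.convolve(e, psi)[:n]              # x_t = sum_k psi_k * e_(t-k)

x = fracdiff_sim(1200, d=0.3)   # e.g. a century of monthly 'temperatures'
# LTP inflates the autocorrelation relative to white noise, which is why
# naive significance tests on trendy climate series overstate certainty.
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(round(r1, 2))
```

The lag-1 autocorrelation is well above zero even though nothing but random noise went in, which is exactly why significance tests that assume independent errors mislead.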

It has been noted that there are no requirements for verified accuracy for climate models to be incorporated into the IPCC. Perhaps if I got my random model published it would qualify. It would be a good benchmark.

Extreme Sensitivity

“According to a new U.N. report, the global warming outlook is much worse than originally predicted. Which is pretty bad when they originally predicted it would destroy the planet.” –Jay Leno

The paper by Rahmstorf et al. must rank as one of the most quotable of all time.

The data available for the period since 1990 raise concerns that the climate system, in particular sea level, may be responding more quickly to climate change than our current generation of models indicates.

This claim, made without the benefit of any statistical analysis or significance testing, is widely quoted to justify claims that the climate system is “responding more strongly than we thought”. I debated this paper with Stefan at RealClimate, and succeeded in demonstrating that they had grossly underestimated the uncertainty.

His main defense was that the end-point uncertainty would only affect the last 5 points of the smoothed trend line with an 11-point embedding. Here the global temperatures were smoothed using a complex method called Singular Spectrum Analysis (SSA). I gave examples using SSA and other methods where the end-point uncertainty affected virtually ALL points in the smoothed trend line, and certainly more than the last 5. Stefan clearly had little idea of how SSA worked. His final message, without an argument, was:

[Response: If you really think you’d come to a different conclusion with a different analysis method, I suggest you submit it to a journal, like we did. I am unconvinced, though. -stefan]
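The point is easy to check with a minimal SSA of one's own (a bare-bones rank-1 SSA on made-up data, not Rahmstorf's exact procedure): re-estimate the trend after deleting the last five data points and see how far back the changes propagate.

```python
import numpy as np

def ssa_trend(x, window=11):
    """Minimal SSA: embed the series, keep the leading eigentriple,
    then diagonal-average back to a trend estimate."""
    n = len(x)
    k = n - window + 1
    X = np.column_stack([x[i:i + k] for i in range(window)])  # trajectory matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X1 = s[0] * np.outer(U[:, 0], Vt[0])                      # rank-1 reconstruction
    trend, counts = np.zeros(n), np.zeros(n)
    for i in range(window):                                   # diagonal averaging
        trend[i:i + k] += X1[:, i]
        counts[i:i + k] += 1
    return trend / counts

rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(120))   # a random-walk stand-in for temperature
full = ssa_trend(x)
trunc = ssa_trend(x[:-5])                 # same analysis minus the last 5 points

# The two trend estimates differ over nearly the whole common span, not just
# near the end: the SVD step makes every smoothed value data-dependent.
diff = np.abs(full[:115] - trunc)
print((diff > 1e-6).sum(), "of", len(diff), "points changed")
```

Because the singular decomposition is computed from the whole series, adding or removing end points perturbs every reconstructed value, not merely the last few.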

But to add insult to injury, this paper figured prominently in the Interim Report of the Garnaut Review where I put in a submission.

“Developments in mainstream scientific opinion on the relationship between emissions, accumulations and climate outcomes, and the Review’s own work on future business-as-usual global emissions, suggest that the world is moving towards high risks of dangerous climate change more rapidly than has generally been understood.”

As time moves on and more data become available, a trend line computed with the same technique is regressing to the mean. It is increasingly clear that the apparent upturn was probably due to the 1998 El Niño. It is an error to regard a short-term deviation as an important indication of heightened climate sensitivity.

More Droughts

The CSIRO Climate Adaptation Flagship produced a Drought Exceptional Circumstances Report (DECR), suggesting among other things that droughts would double in the coming decades. Released in the middle of a major drought in Southern Australia, this glossy report had all the hallmarks of promotional literature. I clashed with CSIRO firstly over release of their data, and then in attempting to elicit a formal response to issues raised. My main concern was that there was no apparent attempt demonstrating the climate models used in the report were fit for the purpose of modeling drought, particularly rainfall.

One of the main results of my review of the data is summed up in the following graph, comparing the predicted frequency and severity of low rainfall over the last hundred years, with the observed frequency and severity of low rainfall. It is quite clear that the models are inversely related to the observations.


A comment I submitted to the Australian Meteorological Magazine was recently rejected. There I tested the models and observations following the approach of Rybski, analyzing differences between the discrete periods 1900-1950 and 1950-2000. The table below shows that while observed drought decreased significantly between the periods, modeled drought increased significantly.

Table 1: Mean percentage area of exceptionally low rainfall over the time periods suggested by KB09. A Mann-Whitney rank-sum test (wilcox.test(x, y) in R) shows significant differences between periods.

                          1900-2007  1900-1950  1951-2007  P (1900-2007 vs. 1951-2007)  P (1900-1950 vs. 1951-2007)
Observed % area drought   5.6±0.5    6.2±0.7    4.9±0.6    0.10                         0.004
Modelled % area drought   5.5±0.1    4.8±0.2    6.2±0.2    0.006                        <0.001
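For readers wanting to reproduce the flavor of Table 1, here is a self-contained sketch of the Mann-Whitney rank-sum test (the same test as R's wilcox.test(x, y)), coded with the normal approximation and applied to made-up samples, since the underlying annual series are not reproduced in this post.

```python
import math
import numpy as np

def mann_whitney(x, y):
    """Two-sided Mann-Whitney rank-sum test using the normal approximation
    (no tie correction; fine for continuous data)."""
    nx, ny = len(x), len(y)
    ranks = np.argsort(np.argsort(np.concatenate([x, y]))) + 1.0
    u = ranks[:nx].sum() - nx * (nx + 1) / 2     # U statistic for sample x
    mu, sigma = nx * ny / 2, math.sqrt(nx * ny * (nx + ny + 1) / 12)
    z = (u - mu) / sigma
    return u, 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

rng = np.random.default_rng(0)
early = rng.normal(6.2, 1.5, 51)   # stand-in for observed 1900-1950 values
late = rng.normal(4.9, 1.5, 57)    # stand-in for observed 1951-2007 values

u, p = mann_whitney(early, late)
print(p < 0.05)   # a significant difference between the two periods
```

The rank-sum test makes no normality assumption about the data themselves, which is why it suits skewed quantities like percentage area in drought.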

Moreover I showed that while similar results were reported for temperature in the DECR (where models and observations are more consistent), they were not reported for rainfall.

The reviewers did not comment on the statistical proof that the models were useless at predicting drought. Instead, they pointed to Fig. 10 in the DECR, a rough graphic, claiming “the models did a reasonable job of simulating the variability”. I am not aware of any statistical basis for validating a model by casually matching the variability of observations to models. The widespread acceptance of such low standards of model validation is apparently a feature of climate science.

Ian Castles, former Head of the Australian Bureau of Statistics, solicited a review by independent Accredited Statisticians at the ANU, Brewer and others. They concurred that the models in the DECR required validation (along with other interesting points).

Dr Stockwell has argued that the GCMs should be subject to testing of their adequacy using historical or external data. We agree that this should be undertaken as a matter of course by all modelers. It is not clear from the DECR whether or not any such validation analyses have been undertaken by CSIRO/BoM. If they have, we urge CSIRO/BoM make the results available so that readers can make their own judgments as to the accuracy of the forecasts. If they have not, we urge them to undertake some.

A persistent error in climate science is using models when they have not been shown to be ‘fit for purpose’.


Constant Optical Depth

Recently a paper came out that potentially undermines the central assumptions of climate modeling. Supported by extensive empirical validation, it suggests that ‘optical depth’ in the atmosphere is maintained at an optimal, constant value (on average over the long term). Finding an initially negligible sensitivity of 0.24°C surface temperature increase to a doubling of CO2, it goes on to suggest constraints that ensure equilibrium will eventually be established, giving no increase in temperature, due to reversion to the constant optical depth. The paper, Greenhouse effect in semi-transparent planetary atmospheres by Ferenc Miskolczi, was published in the Quarterly Journal of the Hungarian Meteorological Service, January–March 2007.

I was initially impressed by the extensive validation of his theory using empirical data. Despite a furious debate online, there has been no peer-reviewed rebuttal to date. The pro-AGW blog RealClimate promised a rebuttal by “students” but to date has produced none. This suggests that it is either being carefully ignored, or not transparently flawed.

Quite recently, Ken Gregory encouraged Ferenc to run his model using actual recorded water vapor data, which decline in the upper atmosphere over the last few decades. While there are large uncertainties associated with these data, they do show a decline consistent with Ferenc’s theory that water vapor (a greenhouse gas) will decline to compensate for increased CO2. The results of Miskolczi’s calculations using his line-by-line HARTCODE program are given here.

The theoretical aspects of Ferenc’s theory have been furiously debated online. I am not sure that any conclusions have been reached, but nor has his theory been disproved.


Summary

What often happens is that a publication appears and gets a lot of excited attention. Some time later, rather quietly, subsequent work is published that questions the claim or substantially weakens it. But that gets no headlines, and the citation rate typically runs 10:1 in favor of the alarmist claims. It does not help that the IPCC report selectively cites studies and presents unvalidated projections as ‘highly likely’, which shows they are largely expert forecasts, not scientific forecasts.

All of the ‘errors’ here can be attributed to exaggeration of the significance of findings, due to inadequate rigor in the validation of models. The view that this is a growing problem is shared by new studies of rigor from the intelligence community, and it applies even more to results derived so easily from computer modeling.

The proliferation of data accessibility has exacerbated the risk of shallowness in information analysis, making it increasingly difficult to tell when analysis is sufficient for making decisions or changing plans, even as it becomes increasingly easy to find seemingly relevant data.

I also agree with John P. A. Ioannidis, who in a wide-ranging study of medical journals found that Most Published Research Findings Are False. To my mind, when the methodologies underlying AGW are scrutinized, the findings seem to match the prevailing bias. To make matters worse, in most cases the response of the scientific community has been to carefully ignore, dissemble, or attack dissenters ad hominem, instead of initiating vigorous programs to improve rigor in problem areas.

We need to adopt more practices from clinical research, such as the structured review, whereby the basis for evaluating evidence for or against an issue is well defined. In this view, the IPCC is simply a review of the literature, one among reviews by competing groups (such as the NIPCC’s 2008 report Nature, Not Human Activity, Rules the Climate). In other words, stop pretending scientists are unbiased, and put systems in place to help prevent ‘group-think’ and promote more vigorous testing of models against reality.

If the very slow to nonexistent rate of increase in global temperature continues, we will be treated to the spectacle of otherwise competent researchers clinging to extreme AGW while the public becomes more cynical and uninterested. This could have been avoided if they had been confronted with: “Are these models validated? If they are, by all means make your forecasts; if not, don’t.”

Jan Pompe Science Project

Some time ago I had a brief discussion with Leif Svalgaard on ClimateAudit blog inspired by an exchange between Leif and David Archibald when the latter complained that Leif’s TSI reconstruction was “too flat”.

The sunspot record exhibits cyclic variability in the frequency of its cycles. Most thermostats work by pulse-width modulation, and some digital audio works by pulse-frequency modulation. Both operate in a similar manner: the thermal inertia of whatever the thermostat controls smooths the temperature variability, while the pulse-frequency demodulator is a simple low-pass filter, often just a series resistor and a shunt capacitor. In both cases only the duty cycle or the frequency varies, never the amplitude. Below is a description of how this behaviour can be simulated with an electrical circuit emulator called ‘qucs’.
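The same demonstration can be sketched numerically instead of in qucs (all component values here are illustrative): a fixed-amplitude PWM square wave fed into a series-resistor, shunt-capacitor low-pass filter settles at a level set by the duty cycle alone.

```python
import numpy as np

def rc_lowpass(signal, dt, tau):
    """First-order RC low-pass (tau = R*C) via the update v += (dt/tau)*(in - v)."""
    v, out = 0.0, []
    alpha = dt / tau
    for s in signal:
        v += alpha * (s - v)
        out.append(v)
    return np.array(out)

dt, period, tau = 1e-5, 1e-3, 1e-2      # 10 us step, 1 kHz PWM, RC = 10 ms
t = np.arange(0.0, 0.2, dt)             # 200 ms: plenty of time to settle

def pwm(duty):
    """Unit-amplitude pulse train: only the duty cycle varies, never the amplitude."""
    return ((t % period) < duty * period).astype(float)

low = rc_lowpass(pwm(0.25), dt, tau)
high = rc_lowpass(pwm(0.75), dt, tau)
# The filtered outputs sit near 0.25 and 0.75: the demodulated level tracks
# the duty cycle, just as thermal inertia smooths a thermostat's switching.
print(round(low[-1], 2), round(high[-1], 2))
```

Because the filter time constant is much longer than the pulse period, the ripple is small and the output is effectively the time-average of the pulse train.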

Continue reading Jan Pompe Science Project

Nir Shaviv

The theory of this Israeli astrophysicist has gained traction as the great white hope of climate skeptics. Below are some sources of background reading.

Shaviv champions the solar-wind-modulated cosmic ray flux (CRF) hypothesis, which was suggested by Ney, discussed by Dickinson, and furthered by Svensmark (see CO2 Science). The evidence consists of correlations between CRF variations and cloud cover, correlations between non-solar CRF variations and temperature over geological timescales, and experimental results showing that the formation of small condensation nuclei can be bottlenecked by the number density of atmospheric ions.

Basically, high CRF ionizes particles that seed more clouds, causing cooling. Low CRF produces brighter, cloud-free conditions, resulting in warming.

Recently, he reported in GRL that three independent data sets show the oceans absorb and emit an order of magnitude more heat than could be expected from variations in total solar irradiance alone, implying the existence of an amplification mechanism. Shaviv says this predicts the correct radiation imbalance observed in the cloud cover variations needed to produce the magnitude of the net heat flux into the oceans associated with the 11-year solar cycle.

The Reference Frame also ran an article about Shaviv recently, taking the significant pushback by RealClimate as evidence that the CRF theory is a viable alternative to GHG warming as the main explanation for recent warmth.

By the way, despite all the huge pro-greenhouse bias in the journals and elsewhere, the Shaviv-Veizer paper has 91 citations right now, while the almost immediate alarmist reply by 11 authors, including RealClimate’s Rahmstorf, Archer, and Schmidt, only has 24 citations.

A very instructive exchange ensued in May 2006 at the RealClimate post “Thankyou for Emitting” where Shaviv challenged masterfully (starting at post 37), until the team eventually threw in the towel around post 125.

On the subject of Rahmstorf, Shaviv’s own blog ScienceBits responds to RealClimate in a post, “More slurs from RealClimate”, pinning them as bleeding hearts and intellectual lightweights: RealClimate continues with its same line of attack; its writers try again and again to concoct what appear to be deep critiques against skeptic arguments, but end up doing a very shallow job. All in the name of saving the world. How gallant of them.

Since there is no evidence which proves that 20th century warming is human in origin, the only logically possible way to convict humanity is to prove that there is no alternative explanation to the warming (e.g., see here). My motivation (as is the motivation of my serious colleagues) is simply to do the science as good as I can.

But Nir is not an extremist who discounts all effects of greenhouse gases.

In fact, my best estimate for climate sensitivity implies that anthropogenic radiative forcing explain about 1/3 of the 20th century warming, in particular over the past few decades.

Some of the flavor of the debate between them can be seen from the following two comments at Shaviv’s blog:

Rasmus: You are wrong about the motivation about our critisism, Shaviv; we are primarily interested in doing good sicence. We want to unravel the facts behind climate variability. In science, one challenge other views if one finds them strange or not credible. This is what we habve done. You make claims based on your own subjective belief og based on far-fetched speculations. The fact is that the claim that the recent global warming is due to GCR is not supported be any real evidence; there is no credible trend in the solar activity or GCR in the last ~50 years.

Shaviv: Perhaps you’re right. But if so, then it means you should have the integrity to add at the end of your post (and not buried in the discussion below), an addendum saying that this particular critique turned out to be wrong, as Kranz et al. is not applicable to the Milky Way. I for my part would add a similar addendum to my response, specifying that my comments about motives was wrong.

Second, over all, there was a large increase in the solar activity over the 20th century, even if you discard the Yakutsk data (used in the Ahluwalia plot), and this increase explains a large fraction of the 20th century temperature increase if the CRF/climate link is real. As for the temperature increase over the 1990’s, see my response above. Some of the warming is due to the fact that although there was a decrease in the indirect solar forcing over the last cycle, it is still notably above the current forcing/temperature equilibrium (and therefore causes warming), and of course, some of the warming is anthropogenic.

The scientific issues are not settled.

Examples of Research Bias

The Financial Times recently reported on the Australian bushfires, linking them to increases in greenhouse gases. We take another look at the data in the DECR and find Australia is getting wetter, not drier:

Scientists say Australia, with its harsh environment, is set to be one of the nations most affected by climate change.

“Continued increases in greenhouse gases will lead to further warming and drier conditions in southern Australia, so the [fire] risks are likely to slightly worsen,” Kevin Hennessy at the Commonwealth Scientific and Industrial Research Centre told Reuters.

Bob Brown, the senator who leads the Greens party, said the bushfires provided stark evidence of what climate change could mean.

“Global warming is predicted to make this sort of event happen 25 per cent, 50 per cent more,” he said. “It’s a sobering reminder of the need for this nation and the whole world to act and put at a priority our need to tackle climate change.”

The Drought Exceptional Circumstances Report, which I have been reviewing in this series, promoted these conclusions. Let’s look at another analysis, this time using simple quantile analysis of the data in Table 3. This table contains the average percentage area having exceptionally low rainfall years for selected 40-year periods and the most recent decade (1998-2007).

Region     1900-1939  1910-1949  1920-1959  1930-1969  1940-1979  1950-1989  1960-1999  1968-2007  1998-2007
Qld              9.5        6.5        5.5        4.1        3.3        3.1        2.7        2.6        4.7
NSW              5.7        6.9        5.7        6.2        5.8        4.3        4.0        3.8        6.4
Vic&Tas          5.3        6.0        4.2        6.1        5.1        5.0        5.3        5.2        8.5
SW               5.2        7.1        7.2        6.9        7.9        5.9        4.9        4.4        3.4
NW*              6.3        5.3        6.5        7.5        6.5        6.1        4.7        3.5        3.3
MDB              6.1        7.2        5.8        6.4        5.7        4.1        3.5        3.5        6.9
SWWA             2.5        4.7        4.1        6.5        8.3        6.1        6.3        8.5        8.9
Australia        6.4        6.4        6.6        6.4        6.3        5.3        4.6        3.5        3.1

Using the function ‘quantile’ in R, we output the percentage areas for each probability in each 40-year period. Then we look up the probability for each region using the most recent 40-year period, 1968-2007.

5% 10% 50% 90% 95%
3.25 4.05 5.85 7.15 7.60

Region, % area in 1968-2007, and probability that drought area has increased:
Qld 2.6 <5%
NSW 3.8 <10%
Vic&Tas 5.2 NS
SW 4.4 NS
NW* 3.5 <10%
MDB 3.5 95%
Australia 3.5 <10%
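The lookup behind this table can be sketched with numpy, whose default quantile interpolation matches R's quantile type 7. The exact pooling used in the post is not spelled out, so this version pools the seven regional rows across the eight 40-year periods; the resulting quantiles are therefore close to, but not identical with, the row printed above.

```python
import numpy as np

# Table 3 regional rows, eight 40-year periods ending with 1968-2007.
table3 = {
    "Qld":     [9.5, 6.5, 5.5, 4.1, 3.3, 3.1, 2.7, 2.6],
    "NSW":     [5.7, 6.9, 5.7, 6.2, 5.8, 4.3, 4.0, 3.8],
    "Vic&Tas": [5.3, 6.0, 4.2, 6.1, 5.1, 5.0, 5.3, 5.2],
    "SW":      [5.2, 7.1, 7.2, 6.9, 7.9, 5.9, 4.9, 4.4],
    "NW":      [6.3, 5.3, 6.5, 7.5, 6.5, 6.1, 4.7, 3.5],
    "MDB":     [6.1, 7.2, 5.8, 6.4, 5.7, 4.1, 3.5, 3.5],
    "SWWA":    [2.5, 4.7, 4.1, 6.5, 8.3, 6.1, 6.3, 8.5],
}
pooled = np.array([v for row in table3.values() for v in row])
q05, q10, q50, q90, q95 = np.quantile(pooled, [0.05, 0.10, 0.50, 0.90, 0.95])

for region, row in table3.items():
    recent = row[-1]    # the 1968-2007 column
    if recent < q10:
        verdict = "below the 10% quantile: drought area has decreased"
    elif recent > q90:
        verdict = "above the 90% quantile: drought area has increased"
    else:
        verdict = "no significant change"
    print(f"{region}: {recent} ({verdict})")
```

Individual regions near a quantile boundary can flip category under a different pooling, which is why the post reports probabilities rather than hard cutoffs.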

The results show that over the last 40 years, regions Qld, NSW, NW, and MDB have had significantly less area under drought. Only in SWWA has the drought area increased significantly, while Vic&Tas (the region of recent bushfires) and SW have no significant change.

The ‘inconvenient’ results were reported in the DECR text as follows:

Observed trends in exceptionally low rainfall years are highly dependent on the period of analysis due to large variability between decades.

Despite these highly significant DECR results showing Australia getting wetter, not drier, CSIRO scientists continue to report in the media that Australia will get drier.

It only takes a moment’s thought to realize that wetter conditions can pose greater fire risks, through greater production of fuel in the wet season and more dangerous conditions when it dries out. Drier conditions lead to a more open grassland environment in Australia, much like the African savannah, with cooler grass fires rather than the hot forest fires suffered recently in Victoria. You simply cannot look at environmental factors in isolation.

But don’t tell CSIRO, or the next thing we will hear is that greenhouse gases are causing more fires by making it wetter.

Climate Flagship Response

I reported a number of familiar tests often used to evaluate the performance of models: R2 correlation, Nash-Sutcliffe efficiency, and similarity of trends and return periods, noting little evidence of skill in the DECR models compared with observations on any of them. I also said what a better treatment might entail, but left that for another time:
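Of these, the Nash-Sutcliffe efficiency is perhaps the least familiar. A sketch with made-up numbers: NSE = 1 is a perfect match, NSE ≤ 0 means the model predicts no better than the observed mean, and a model inversely related to the observations scores deeply negative.

```python
import numpy as np

def nash_sutcliffe(obs, model):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    obs, model = np.asarray(obs, float), np.asarray(model, float)
    return 1.0 - np.sum((obs - model) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = np.array([5.6, 6.2, 4.9, 7.1, 3.8, 5.0])   # made-up '% area in drought'
good = obs + 0.2                  # a model tracking the observations closely
inverse = 2 * obs.mean() - obs    # a model inversely related to the observations

print(round(nash_sutcliffe(obs, good), 2))      # close to 1
print(round(nash_sutcliffe(obs, inverse), 2))   # well below 0: worse than the mean
```

An inversely related model, like the one in the graph above, fails this test badly, which is precisely the point of applying it.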

The percentage of droughted area appears to be a ’bounded extreme value, peaks over threshold’ or bounded POT statistic. The distribution resembles a Pareto (power) law, but due to the boundedness where predicted extent of drought approaches 100% becomes more like a beta distribution (shown for SW-WA on Fig 2). Recasting the drought modeling problem into known statistical methods might salvage some data from the DEC report. Aggregating the percentage area under drought to the whole of Australia might reduce the boundedness of the distribution, and might also improve the efficiency of the models.

Aware that the tests I applied were not the last word, due to the idiosyncratic nature of the data, the conclusion in the summary was deliberately nuanced: as there was no demonstration of skill at modeling drought (in either the DECR or my tests), and as validation of models is necessary for credibility, there is no credible basis:

Therefore there is no credible basis for the claims of increasing frequency of Exceptional Circumstances declarations made in the report.

What is needed to provide credibility is demonstrated evidence of model skill.

Andrew Ash of the CSIRO Climate Flagship sent a response on 18 Dec 2008. This was to fulfill an obligation he made on 16 Sep 2008 to provide “a formal response to your review of the Drought Exceptional Circumstances report (dated 3 Sep 2008)”, after my many requests for details of the validation of skill at modelling droughts.

I must say I was very pleased to see there was no confidentiality text at the end of the email; I feel much more inclined to be friendly without it. I can understand confidentiality inside organizations, but sending out stock riders to outsiders is picayune. The sender should be prepared to stand by what they say in a public forum and not hide behind a legal disclaimer. Good on him for that.

The gist of the email was that he felt less compelled to respond because of the comment I had sent to the Australian Meteorological Magazine on 23 Sep 2008. As it was, I was still waiting for the review from the AMM on 18 December when I received this response. Kevin Hennessy relayed some advice from Dr Bill Venables, a prominent CSIRO statistician, and the following, which didn’t add anything:

However, we have looked at the 3 Sep 2008 version of your review. The four climate model validation tests selected in your analysis are inappropriate and your conclusions are flawed.

* The trend test is invalidly applied because (i) there is a requirement that the trends are linear and (ii) the t-test assumes the residuals are normally distributed. We undertook a more appropriate statistical test. Across 13 models and seven regions, there are no significant differences (at the 5% level) between the observed and simulated trends in exceptionally low rainfall, except for four models in Queensland and one model in NW Australia.

It is well known that different tests can give different results. It is also true that some tests may be better or more reliable than others. Without more details of their test, it is hard to say anything, except that lack of significance does not demonstrate skill. The variability of the climate model outputs could be so high that they allow ‘anything to be possible’, as often seems to be the case.

* The correlation and efficiency tests are based on an assumption that the climate models are being used to hindcast historical weather. This assumption is incorrect. As a result, the tests selected are inapplicable to the problem being addressed. This in turn leads to false conclusions.

This would be true if these were the only tests, and if correlation and efficiency depended entirely on short-term fluctuations. They do not: they capture skill at modeling both short AND long term fluctuations. This is also why I placed more emphasis on skill at modelling trends over climatically relevant time scales. He is also not specific about which conclusions. The conclusion of ‘no credible basis’ is not falsified by lack of evidence.

It should also be noted that the DECR also considered return periods (Tables 8 and 10), so any criticism of return periods applies equally to the DECR.

* The return period test is based on your own definition of ‘regional return period’, which is different from the definition used in the DEC report. Nevertheless, your analysis does highlight the importance of data being collected or produced at different resolutions and the effect this has on interpretations of the frequency of drought. The observed data have the shortest return period as they have the finest spatial resolution and the model based return regions have increasingly larger mean return periods, inversely related to the spatial resolution at which they are reported. We were well aware of this issue prior to the commencement of the study and spent a considerable amount of time designing an analysis that would be robust to take this effect into account.

I appreciate the explanation for the lack of skill at modelling return period, which measures drought frequency, as opposed to drought intensity measured by efficiency. Nevertheless, the lack of demonstrated skill at modelling drought frequency stands.

Note that they continue to be unresponsive to requests for evidence that the climate models have skill at modelling droughts. Where we stand at the moment is that, irrespective of the reliability of my tests, there is still no evidence of skill to be seen, at the short term or the long term, in drought intensity or drought frequency, and so my claim of “no credible basis for the claims of increasing frequency of Exceptional Circumstances declarations” still stands. The Climate Flagship has steadfastly avoided presenting validation evidence. While the concerns expressed have relevance to the quality of the tests (which are widely used, but problematic due to the strange data), they were not precise about which conclusions or claims they were trying to rebut.

Further Work

I came across a recent drought study that also finds no statistically significant difference between modelled and observed drought occurrence. It was actually Ref 27 of the DECR: Sheffield, J. & Wood, E. F. Projected changes in drought occurrence under future global warming from multi-model, multi-scenario, IPCC AR4 simulations. Climate Dynamics 31, 79–105 (2008).

Although the predicted future changes in drought occurrence are essentially monotonic increasing globally and in many regions, they are generally not statistically different from contemporary climate (as estimated from the 1961-1990 period of the 20C3M simulations) or natural variability (as estimated from the PICNTRL simulations) for multiple decades, in contrast to primary climate variables, such as global mean surface air temperature and precipitation.

Below is a plot of the observations for the drought statistic, the percentage of area experiencing exceptionally low rainfall (the lowest 5% of years, which can lead to an Exceptional Circumstances drought declaration). You can see how ‘peaky’ it is, even when the average is taken (black).


In some ways it might have been better to just knuckle down and develop a peaks-over-threshold (POT) model right from the start, as it might have allowed me to produce a less nuanced response. I have been doing that, but have had to upgrade R. The recompilation took all night and cost me my graphical interface to R. Even then, the package VGAM doesn’t compile for some reason, so I have to look for other packages.

Assumptions for linear regression

One of the main assumptions of linear regression is, ahem, linearity. Here is an example, drawn from dendroclimatology (the reconstruction of past climates using tree rings), of the trouble one can get into by blindly assuming linearity. The subject was dealt with some time ago at ClimateAudit in Upside-Down Quadratic Proxy Response.

From the Summary of chapter 9 of my book, niche-modeling-chap-9

9.2 Summary
These results demonstrate that procedures with linear assumptions are unreliable when applied to the non-linear responses of niche models. Reliability of reconstruction of past climates depends, at minimum, on the correct specification of a model of response that holds over the whole range of the proxy, not just the calibration period. Use of a linear model of non-linear response can cause apparent growth decline with higher temperatures, signal degradation with latitudinal variation, temporal shifts in peaks, period doubling, and depressed long time-scale amplitude.

Niche Modeling: Predictions from Statistical Distributions. Chapman & Hall/CRC, Boca Raton, FL., 2007.

I notice Craig Loehle converged on similar results in a post about a publication on the Divergence Problem. In the abstract, Craig finds a similar quantitative depression in the range of the recovered signal.

If trees show a nonlinear growth response, the result is to potentially truncate any historical temperatures higher than those in the calibration period, as well as to reduce the mean and range of reconstructed values compared to actual.

By far the most interesting result I find is the introduction of ‘doubling’ from assuming a linear response of a non-linear variable. This is illustrated by Craig’s figure here:

Because over the course of one climate cycle, the tree passes through two optimal growth periods, the tree is, in electrical terms, a frequency doubler. This would create enormous difficulties in trying to detect major features such as Medieval Warm Periods and Little Ice Ages from such a responder.
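The frequency doubling is easy to reproduce numerically. Below is a minimal sketch (an idealized sine-wave ‘climate’ and an invented parabolic niche response, not Craig’s actual model): the growth signal oscillates at exactly twice the climate frequency.

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 1024, endpoint=False)  # two full climate cycles
temp = np.sin(t)              # idealized climate signal
growth = 1.0 - temp**2        # parabolic niche response, optimum at temp = 0

def dominant_cycles(x):
    """Index of the strongest FFT bin = cycles per window (DC bin excluded)."""
    return int(np.argmax(np.abs(np.fft.rfft(x - x.mean()))[1:])) + 1

# The tree passes through its growth optimum twice per climate cycle,
# so the recorded signal runs at double the climate frequency
assert dominant_cycles(temp) == 2
assert dominant_cycles(growth) == 4
```

Any reconstruction that regresses linearly on such a proxy would be fitting a signal at the wrong frequency entirely.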

But the problems do not end there. Depending on the latitudinal (or altitudinal) location of the tree relative to its optimal growth zone, the location of the doubled peaks is shifted temporally. This shifting of the peaks is illustrated in the figure below, taken from my chapter.

If one then imposes two non-linear responses, such as temperature and rainfall, the response becomes even more choppy, as shown in another graphic from the chapter.

The recovery of a climate signal in the face of non-linearity of response is fraught with difficulties. When the fundamental growth response of trees, and indeed of all living things, is known to be a non-linear, niche-like response, the onus is on modelers to prove their methods are adequate.

While the field is not unaware of the problem, risky statistical prediction methods are most often used in climate (and ecological) science with inadequate validation, or, as with the CSIRO drought modeling efforts here, results from GCMs are used with no attempt to demonstrate they are ‘fit for purpose’ at all. Rob Wilson argues at ClimateAudit that while linear modelling of tree-growth relationships is not ideal, the field is ripe for some fancy non-linear modelling. Given the range of exotic features introduced by non-linearities, as I showed above, I would argue that fancy non-linear modeling would more surely lead to self-deception; a better path is robust validation.

Rahmstorf Revisited

The sharp-eyed UC, who keeps a good technical blog on signal theory, alerted me to this intelligent reference in the Finnish media to Rahmstorf et al. 2007. This is a paper I have reviewed previously, and I had words with Stefan at RealClimate demonstrating that they had grossly underestimated the uncertainty at the end points. This flawed paper is widely quoted to justify claims that the climate system is “responding more strongly than we thought”.

Who said statistics lie?

Translation: “VO: The updated trend is just as significant as the original one, done by Rahmstorf with other top scientists of the IPCC – it was calculated with the very same method.”


The Rahmstorf curve was mentioned in a Finnish TV documentary on Monday. It’s in Finnish, and not visible outside Finland, but the manuscript is in English here,

“MOT asked a statistical expert to update the curve with real climate data up to July 2008. And here’s the result: the trend has sunk fast and is now at the lower end of the range of model predictions. So, if we take the modellers’ own method, we discover that warming has clearly been more modest than predicted.”

see the updated version of the curve !
( I had nothing to do with this )

Rahmstorf 7 Finale

“According to a new U.N. report, the global warming outlook is much worse than originally predicted. Which is pretty bad when they originally predicted it would destroy the planet.” –Jay Leno

If ever there was a good example of alarmist views being given a free ride by a major journal, then the publication in Science of “Recent Climate Observations Compared to Projections” by Stefan Rahmstorf, Anny Cazenave, John A. Church, James E. Hansen, Ralph F. Keeling, David E. Parker, and Richard C. J. Somerville is it.

This paper claimed to show that:

The data available for the period since 1990 raise concerns that the climate system, in particular sea level, may be responding more quickly to climate change than our current generation of models indicates.

By way of recap, this paper figured prominently in the Interim Report of the Garnaut Review where it is clearly used as a source of mainstream scientific opinion:

“Developments in mainstream scientific opinion on the relationship between emissions, accumulations and climate outcomes, and the Review’s own work on future business-as-usual global emissions, suggest that the world is moving towards high risks of dangerous climate change more rapidly than has generally been understood.”

Interest in the current weather has been growing as people observe either sharp declines in temperatures since last year, or relative stability in temperatures over about the last 10 years, and wonder how these fit into the picture of global warming. I did some posts putting this into context, showing that last year’s temperature drop was not unusual here, that a particular 10-year period has been flat here, and that a number of climate indicators are showing decadal stability here.

The Blackboard has been spearheading rigorous statistical methods for checking IPCC projections, finding post-2001 TAR projections consistently falsified by climate trends.

Contradicting these findings was Rahmstorf et al. 2007, published in Science by seven leading members of the IPCC scientific team. So I started to audit the paper to see whether it does in fact provide a more reliable perspective on whether climate is changing faster or slower than expected.

A number of bloggers ‘raised concerns’ about the vague description of the methodology, and argued at Niche Modeling and The Blackboard that there were important sources of uncertainty unaccounted for. Other blogs picked up the issue including Peter Gallagher and Mark Lawson.

Stefan Rahmstorf and I exchanged comments at RealClimate and here.

His main defense was that the end point uncertainty would only affect the last 5 points of the smoothed trend line with an 11-point embedding. Here the global temperatures were smoothed using a complex method called Singular Spectrum Analysis (SSA). I gave examples of SSA and other methods where the end point uncertainty affected virtually ALL points in the smoothed trend line, and certainly more than the last 5 points. Stefan clearly had little idea of how SSA worked. His final message, without an argument, was:

[Response: If you really think you’d come to a different conclusion with a different analysis method, I suggest you submit it to a journal, like we did. I am unconvinced, though. -stefan]

So much for the recap. Keep in mind that the purpose of a scientific exchange like this is to clarify the points of agreement and disagreement and attempt to arrive at a resolution on the validity of the claims. Note the problem I raised is not the only obvious problem either, but just one I worked on. This is not meant to be a personal process. I am grateful for someone to point out errors in my work and would try to understand them, as I would rather not be blowing smoke unintentionally.

This example highlights the power of numbers to resolve an issue. Stefan can have his opinion, and I have opinions too, but the thing I love is the power of numbers to arbitrate and discriminate, and ultimately eliminate the unjustified ones.

I also wanted to address the Garnaut Review, as I feel they are abdicating a duty of diligence by not paying more critical attention to papers such as these. Here was an opportunity to give a specific example of a paper with flaws so obvious that it SHOULD have been dismissed by anyone with statistical training or background knowledge.

So thank you readers for your patience with this process. I have put a submission into the Garnaut Review supported by documentation from the web sites involved.

Here is a good example of the use of blogs. As the time for comments has closed, I could not submit a critique to Science. It is also better to have a thorough and open discussion of the issues at hand before rushing to publish critical comments, so both sides can gain a deeper understanding of the finer points. It is unfortunate that Stefan cut the discussion off, but to his credit he was responsive to the actual concerns in the replies he did make.

Examples of simple smoothers

How much error is there in smoothed climatic and financial series? How much does variability at the ends of a series affect the trend? Previously we showed that certain ways of treating the end points introduce a lot of variability. Here we show that in certain smoothers, variability at the ends can affect the whole smooth!

Below are three different methods with slightly varying end point treatments. Two are causal smoothers (SSA and spline) and one is acausal (moving average). Causal smoothers, in the sense used here, do not need future data to extend the trend to the end point of the series. Acausal smoothers (such as moving averages) need both past and future data, and so stop half a window short of the end point (see wiki).
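The stopping-short behaviour of an acausal smoother can be seen in a toy calculation (the 20-point ramp and 5-point window are arbitrary choices for illustration):

```python
import numpy as np

x = np.arange(20, dtype=float)   # any series; a simple linear ramp here
w = 5                            # window length

# centered (acausal) moving average: only 'valid' positions get a value
smooth = np.convolve(x, np.ones(w) / w, mode="valid")

# the smooth stops (w - 1) // 2 = 2 points short at each end
assert len(smooth) == len(x) - (w - 1)

# for a linear ramp the centered window average equals the center value,
# so the interior of the smooth reproduces the series exactly
assert np.allclose(smooth, x[2:-2])
```

An 11-point window, as used in the figures below, would leave the last 5 years of the series without a smoothed value, which is exactly why end-padding schemes are invoked.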

All data are global temperature data from GISS from 1973 to 2006.

1. Singular Spectrum Analysis.

Below is the result of two approaches using CaterpillarSSA with an 11-year embedding period. The red curve is the result of padding the end with data reflected around the final 2006 value, the so-called ‘minimum roughness criterion’ or MRC. The blue trend is without padding. The green line is the simple linear regression over the 34 years.


The two different approaches differ throughout the whole length, except where the two curves meet at year 1999. The last seven points deviate quite a lot, illustrating the extra uncertainty at the end. Further discussion of this here.

2. Smooth spline

The figure below shows a smoothing spline fit and another approach to estimating uncertainty. It fits a higher-order non-linear regression line with 11 degrees of freedom to the points. In this figure, the last point at 2006 has been altered to either the top or bottom of the 95% channel range. That is, the last point covers the range of random variation that might reasonably be expected in 2006.


The two curves differ again, but this time they flex about the 11th point from the end. Further discussion of this method here.
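The flexing of a regression-type smoother when only the final point changes can be sketched numerically. Here a cubic polynomial stands in for the 11-df spline, and the series, noise, and 0.3 perturbation are all invented for illustration; because a least-squares fit is linear in the data, the shift it induces is exact and largest at the perturbed end:

```python
import numpy as np

x = np.arange(34, dtype=float)                 # 34 "years" of data
rng = np.random.default_rng(3)
y = 0.02 * x + rng.normal(scale=0.1, size=34)  # synthetic trending series

def fit(values):
    # cubic polynomial as a stand-in for a regression-type smoother
    return np.polyval(np.polyfit(x, values, 3), x)

y2 = y.copy()
y2[-1] += 0.3                                  # perturb only the final value

diff = np.abs(fit(y2) - fit(y))
# the whole smooth shifts, but the change is largest at the perturbed end,
# where the leverage of a polynomial fit peaks
assert diff.argmax() == len(x) - 1
assert diff[-1] > diff[len(x) // 2]
```

This is the regression analogue of the spline behaviour in the figure: end-point variation propagates through the whole fitted curve, concentrated at the end.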

3. Moving average

The final figure below shows the result of running a moving average with an end point at 2006 of 0.6 and 0.3.


The moving average stops 5 points short of the end of the series, and the last point of the trend varies as a result of this variation at 2006.


So this shows that both the choice of method and the variability of the data produce uncertainty in the trend line, and the uncertainty is particularly pronounced at the end points.

There is a difference between the causal and acausal smoothers used here. In regression-type causal smoothers, the end variation can be propagated throughout the whole series. But the regression smoothers have the advantage of extending the smooth all the way to the end of the series (and further, if predictions are made). In moving averages the variation is more localized, but the smooth stops short of the end.

Below is a plot of monthly global temperatures from Hadley and GISS with their smooth splines (11df) and regression lines. This is suggestive of temperatures fluctuating more or less randomly above and below a long term trend line.


Thanks to Stefan Rahmstorf for prompting this comparison here.

Comments are closed to allow discussion here.

Rahmstorf et al. 2007 Update

Well, it is almost 24 hours since I posted the comment below to RealClimate at the post by Stefan Rahmstorf, about the inconsistency in the methodology used in their Science Brevia article to show that climate is trending higher than IPCC models. As yet the post has not appeared. I can’t see how it breaches their moderation policy, so I guess I am being told to go pound sand.

Update: Stefan Rahmstorf replied at the post here.

To provide a bit more clarity, I have drawn a couple of lines on the figure at issue to illustrate possible trajectories of the trend. The thin red line is where I think the trend should have gone if the method described in the figure caption had been used — SSA+MRC. The thin blue line is where I think the trend line should have gone if SSA only had been used.


Figure: Annotated Rahmstorf et al. 2007 Science Brevia figure showing global temperatures and trend line. SSA is where the trend line should be for SSA method only, SSA+MRC is the trend line for SSA with the ‘minimum roughness criterion’ applied. The published trend line passes between these possible outcomes.

The actual trend on the figure passes between these two obvious choices. So at this stage I don’t know what method was used. It seems clear that if they had used the SSA+‘minimum roughness criterion’ method as described, the trend line would not have supported their argument that ‘temperatures may be responding more quickly to climate change than our current generation of models indicates’.

Below is my post to RealClimate:


I would be grateful if you would clarify for me a puzzling aspect of your Rahmstorf et al. ’07 Science paper. You state in the figure caption that the ‘minimum roughness criterion’ was used to get the temperature trend line. Use of this method of data padding as described by Mann 2004 should ‘pin’ the trend line to the 2006 temperature value. However, while the 2006 value lies in the center of the IPCC range, the trend line shown on the figure lies above the 2006 value, in the upper IPCC range.

I would like to clarify this apparent inconsistency. This is an important paper for the case that ‘the climate system is responding more quickly than climate models indicate’ and it is important to verify its technical correctness. More details and graphs can be found

Confidence Limits of Minimum Roughness Criterion

Here I show more humorous effects of smoothed trend lines with the ‘minimum roughness condition’ (MRC). The confidence limits blow out.

Fitting a straight line to data such as global temperatures is a common linear regression example. Linear regression of stock prices tells you your rate of appreciation. Smoothing (or filtering) is used to give a smooth, curved trend instead of a straight regression line. Instead of a linear regression model, many techniques, such as moving averages, splines, or singular spectrum analysis (SSA), can give a smooth trend line. One problem with these methods is what to do at the ends, where the data run out.

One way of handling end points is the MRC. The MRC is referenced in papers including Rahmstorf et al. 2007, who state the nonlinear trend lines “were computed with an embedding period of 11 years and a minimum roughness criterion at the end (Moore 2006)”. The MRC is described in a paper by Michael Mann (2004) as follows: “[O]ne pads the series with the values within one filter width of the boundary reflected vertically (i.e. about the y axis) relative to the final value.” He states that the intent of MRC padding the end of a time series is to ensure a smooth trend line up to the end of the series.

But, as was noted in “Mannomatic Smoothing and Pinned End-points”, MRC causes the trend line to pass through the value of the final point of the series (the pin). Willis Eschenbach also notes that his paper on the pinning property had been twice rejected by GRL.

When I wrote a little routine to implement Mannomatic smoothing, I noticed something really funny. I know that it seems bizarre that there can be humor in smoothing algorithms, but hey, this is the Team. Think about what happens with the Mannomatic smooth: you reflect the series around the final value both horizontally and vertically. Accordingly with a symmetric filter (as these things tend to be), everything cancels out except the final value. The Mannomatic pins the series on the end-point exactly the same as Emanuel’s “incorrect” smoothing.
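The cancellation can be checked directly. Below is a minimal sketch (the random-walk series and 11-point boxcar filter are arbitrary choices, not the Mann code) of the reflect-both-ways padding, confirming that a symmetric filter centered on the end point returns exactly the final value:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=50))   # any series; the values don't matter
half = 5                             # half-width of a symmetric filter

# MRC-style padding: reflect the last `half` values about the final point,
# both in time and in value: pad[k] = 2*x[-1] - x[-2-k]
pad = 2 * x[-1] - x[-2 : -2 - half : -1]
padded = np.concatenate([x, pad])

# symmetric boxcar filter, weights summing to 1, centered on the end point
w = np.ones(2 * half + 1) / (2 * half + 1)
smooth_at_end = np.dot(w, padded[len(x) - 1 - half : len(x) + half])

# each reflected value cancels its mirror image, leaving only the final value:
# the smooth is pinned exactly to the end point
assert np.isclose(smooth_at_end, x[-1])
```

The algebra behind the assertion: for each lag j, the padded term contributes x[n-j] + (2x[n] − x[n-j]) = 2x[n], so the weighted sum collapses to x[n] for any symmetric filter whose weights sum to one.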

Well if you take the pinning property a step further and estimate the confidence interval of the trend line, another humorous thing happens.

The figure below shows the confidence limits of the calculated trends in global temperature from GISS (shown as solid black) and the MRC padding (dashed black lines). The upper and lower trend lines (red) were calculated using MRC padding originating at the limits of the 95% confidence intervals at year 2006. The blue line is the linear regression of the GISS trend from 1975 to 2001.


Figure: Confidence intervals of a smooth spline trend line with ‘minimum roughness criterion’ padding of endpoints.

With MRC, the confidence limits of the trend expand to the width of the confidence interval of a single value, rather than of a mean value (solid red lines). This is considerably greater than the uncertainty of a trend line (dashed red line). The only effect of the MRC is to replace the narrow confidence interval of a trend line with the large confidence interval of a single point!
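The difference in widths is easy to quantify: the standard error of a mean of n values is sd/√n, while a single value retains the full sd. A small Monte Carlo sketch (the normal data and n = 100 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
data = rng.normal(size=(20000, n))   # 20000 synthetic series of length n

se_of_mean = data.mean(axis=1).std()   # spread of a mean-type estimate
sd_of_point = data[:, -1].std()        # spread of a single end value

# pinning to a single point widens the interval by roughly sqrt(n) = 10x
assert 8 < sd_of_point / se_of_mean < 12
```

A confidence interval pinned to one observation is therefore an order of magnitude wider than one built on the averaging a trend estimate performs.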

Why would you want to do this? If you want to emphasize the direction the final point is heading, then MRC provides a strong bias on the trend, but you would only get away with it if you did not present the full uncertainty. Such practices border on academic fraud.

To be fair, Mann 2004 cautions against applying MRC where it is sensitive to outliers, and suggests careful evaluation of goodness of fit, providing a pathological example of MRC padding from climate science. In general, however, ad hoc methodologies such as MRC should be avoided; uncertainty limits and formal tests of significance should be presented to support claims.


[1] Stefan Rahmstorf, Anny Cazenave, John A. Church, James E. Hansen, Ralph F. Keeling, David E. Parker, and Richard C. J. Somerville. Recent Climate Observations Compared to Projections. Science, 316(5825):709, 2007.

[2] J. C. Moore, A. Grinsted, and S. Jevrejeva. New tools for analyzing time series relationships and trends. Eos, 86(24), 2005.

[3] M. E. Mann. On smoothing potentially non-stationary climate time series. Geophys. Res. Lett., 31:L07214, doi:10.1029/2004GL019569, 2004.

Rahmstorf et al. 2007 IPCC Error?

There appears to be an error in the influential paper by Rahmstorf et al. (2007). Rahmstorf (Science Brevia, 4 May 2007, p. 709 [1]) reports that the trends of global mean surface temperature and sea level raise concerns that the climate system “may be responding more quickly to climate change than our current generation of models indicates”. At least one major study, the Interim Report of the Garnaut Review, relies on the paper in advocating prompt and extreme action on carbon emissions, one of its major conclusions (Section 2.4, Consequences of Climate Change, Observed Climate Change). But there seems to be a problem.

As previously reported here, the conclusions of Rahmstorf’s 7 (Rahmstorf, Cazenave, Church, Hansen, Keeling, Parker, and Somerville) rely on a trend line lying above the IPCC projections in their Figure 1, shown enlarged below. No statistical tests are performed; the basis for their claim is purely visual. Their Figure 1 is below, with the key part of the image containing the IPCC projection enlarged.



Figures 1 and 2: Rahmstorf et al. (2007) Figure 1. Whole and enlarged.

Rahmstorf’s 7 state in the figure caption that “All trends are nonlinear trend lines and are computed with an embedding period of 11 years and a minimum roughness criterion at the end (Moore 2006 [2])”. On reading Moore’s paper, it would appear the nonlinear methodology used was Singular Spectrum Analysis (SSA). The Moore paper suggests the minimum roughness criterion (MRC) would follow the Mann 2004 [3] recipe of padding the end of the series with data reflected about the final value.



Figure 3 and 4: GISS temperature with SSA trend, unpadded and padded.

The peculiar property of MRC of “pinning” the trend line to the final end point of the series has been noted in the post “Mannomatic Smoothing and Pinned End-points” at ClimateAudit.

The comparison of MRC-padded and unpadded series is shown in figures captured from CaterpillarSSA. The first figure, “unpadded.png”, shows an SSA trend line that approximates the result in the Rahmstorf figure. The second figure, “padded.png”, shows the SSA trend line for an MRC-padded GISS series, passing directly through the 2006 value. This is as it should be, as the MRC effectively ‘pins’ the trend line to the final value due to the symmetry about the final value.

Was a direct application of the SSA trend line used, and not an MRC-padded series as described?

If an MRC-padded series had been used in the figure, it would have been end-pinned to the 2006 value, at the center of the IPCC projections. The figure would then not have conveyed the impression that temperatures are in the upper range of the IPCC projections, as claimed.

As it was in 2006, it appears that SSA without MRC padding produces a higher trend line than with MRC padding, which was necessary to support their claim.

An additional puzzling factor is the reference to MRC padding at all. Padding the end of the series is not actually necessary to ensure an SSA trend line is drawn to the end of the series. Padding is only necessary for acausal filters, such as moving averages, that stop a window length short of the end of the series.

To date, attempts to contact Prof. Rahmstorf to clarify the actual methodology used have been unsuccessful.


[1] Stefan Rahmstorf, Anny Cazenave, John A. Church, James E. Hansen, Ralph F. Keeling, David E. Parker, and Richard C. J. Somerville. Recent Climate Observations Compared to Projections. Science, 316(5825):709, 2007.

[2] J. C. Moore, A. Grinsted, and S. Jevrejeva. New tools for analyzing time series relationships and trends. Eos, 86(24), 2005.

[3] M. E. Mann. On smoothing potentially non-stationary climate time series. Geophys. Res. Lett., 31:L07214, doi:10.1029/2004GL019569, 2004.