Show us your tests – Australian climate projections

My critique of models used in a major Australian drought study appeared in Energy and Environment last month (read Critique-of-DECR-EE here). It deals with the validation of climate models (the subject of a recent post by Judith Curry) and with the disagreement between regional models and rainfall observations (see the post by Willis here).

The main purpose is summed up in the last sentence of the abstract:

The main conclusion and purpose of the paper is to provide a case study showing the need for more rigorous and explicit validation of climate models if they are to advise government policy.

It is well known that, despite persistent attempts and claims in the press, general circulation models are virtually worthless at projecting changes in regional rainfall; the IPCC says so, and the Australian Academy of Science agrees. The most basic statistical tests in the paper demonstrate this: the simulated drought trends are statistically inconsistent with the trend of the observations, a simple mean value shows more skill than any of the models, and drought frequency has dropped below the 95% confidence limit (CL) of the simulations (see Figure).
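To make the flavour of such a consistency test concrete, here is a minimal sketch in Python. The series and the function are mine, not the paper's (the data below are synthetic stand-ins for the DECR drought-area series): the idea is simply to fit a least-squares trend to the observed series, do the same for each simulated series, and ask whether the observed trend falls inside the 95% range of the simulated trends.

```python
import numpy as np

def trend(series, years):
    """Least-squares slope of a yearly series (units per year)."""
    slope, _intercept = np.polyfit(years, series, deg=1)
    return slope

# Hypothetical data: percent of area in drought, 1900-2007,
# for the observations and a handful of model simulations.
rng = np.random.default_rng(0)
years = np.arange(1900, 2008)
observed = 10 + rng.normal(0, 3, years.size)            # stand-in for the observed series
models = [10 + 0.05 * (years - 1900) + rng.normal(0, 3, years.size)
          for _ in range(13)]                           # stand-ins for model runs

obs_trend = trend(observed, years)
sim_trends = np.array([trend(m, years) for m in models])

# Two-sided 95% range of the simulated trends.
lo, hi = np.percentile(sim_trends, [2.5, 97.5])
consistent = lo <= obs_trend <= hi
print(f"observed trend {obs_trend:+.4f}/yr, simulated 95% range [{lo:+.4f}, {hi:+.4f}]")
print("observations consistent with simulations?", consistent)
```

If the observed trend falls outside that range, the simulations are statistically inconsistent with the observations, which is the situation the paper reports for the drought models.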

Rainfall has increased in tropical and subtropical areas of Australia since the 1970s, while some areas of the country, particularly the major population centres to the south-east and south-west, have experienced multi-year rainfall deficits. Overall, Australian rainfall is increasing.

The larger issue is acknowledging that there will always be worthless models, and that the task of genuinely committed modellers is to identify and eliminate them. It is not convincing to argue that validation is too hard for climate models, that they are justified by physical realism, or that a calibrated-eyeball assessment is sufficient. The study shows that the obvious testing regimes, had they been performed, would have eliminated these drought models from contention.

While scientists are mainly interested in the relative skill of models, where statistical measures such as root mean square (RMS) error are appropriate, decision-makers are (or should be) concerned with whether the models should be used at all, that is, whether they are fit-for-use. Because of this, model testing regimes for decision-makers must have the potential to completely reject some or all models if they do not rise above a predetermined standard, or benchmark.
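To illustrate the difference, here is a hedged sketch of what such a benchmark might look like (again with made-up series and hypothetical model names, not the paper's actual scoring): each model is scored by its RMS error against the observations and compared with the RMS error of simply predicting the observed long-term mean. A model that cannot beat that trivial benchmark is rejected outright rather than merely ranked lower.

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between two yearly series."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

# Hypothetical series: observations and a few model runs of the same length.
rng = np.random.default_rng(1)
observed = 10 + rng.normal(0, 3, 100)
models = {f"model_{i}": 10 + rng.normal(0, 4, 100) for i in range(4)}
models["model_4"] = observed + rng.normal(0, 1, 100)   # one run that tracks observations

# The trivial benchmark: always predict the observed climatological mean.
benchmark = rmse(np.full(observed.size, observed.mean()), observed)

for name, series in models.items():
    score = rmse(series, observed)
    verdict = "fit-for-use" if score < benchmark else "REJECT"
    print(f"{name}: RMSE {score:.2f} vs benchmark {benchmark:.2f} -> {verdict}")
```

The point of the design is the hard reject: a relative ranking always produces a "best" model, whereas a predetermined benchmark can fail every model on the table.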

There are a number of ways that benchmarking can be set up, which engineers and others in critical disciplines would be familiar with, usually involving a degree of independent inspection, documentation of expected standards, and so on. My study makes the case that climate science needs to start adopting more rigorous validation practices. Until it does, regional climate projections should not be taken seriously by decision-makers.

It is up to the customers of these studies not to rely on the say-so of the IPCC, the CSIRO and the BoM, but to ask “Show me your tests”, as would be expected of any economic, medical or engineering study where the costs of making the wrong decision are high. Their duty of care requires that they be confident that all reasonable means have been taken to validate all of the models that support the key conclusions.