Guess the direction of the RSS global temperature in November 2008?

Up Down
Nickname:
Information

Niche Modeling

The power of numeracy

Guess the direction of the RSS global temperature in November 2008?

(1) Up Down (8)
Results
Niche Modeling header image

Scale invariance for Dummies

March 15th, 2006 by davids · 8 Comments

Below is an investigation of scale invariance or long term persistence (LTP) in time series including tree-ring proxies – the recognition, quantification and implications for analysis – drawn largely from Koutsoyiannis [2] (preprints available here). In researching this topic, I found a lot of misconceptions about LTP phenomena, such as LTP implying a long term memory process, and a lack of recognition of the implications of LTP. As to implications, the standard error of the mean of global temperatures at 30 data points is 4 times larger than the usual estimate for normal errors. Given that LTP is a fact of nature – attributed by Koutsoyiannis to the maximum entropy (ME) principle – this strongly suggests the H should be considered in all hypothesis testing.

There are a number of ways of modeling natural data, which leads to the question: What is an appropriate model? While basic features such as the mean, standard deviation and linear trends are usually the basis of analysis or data, little attention is paid to their autocorrelation properties, and how they scale. Most natural series have a very interesting character, and climate in particular seems to have long term internal correlations. These internal features have a bearing at a minimum on the information needed to predict, and the significance levels for testing of hypotheses. Understanding how these series are constructed helps to guide choice for modeling phenomena.

Here we construct a set of the basic series to simulate the global temperature series from CRU and use a number of tools to examine their properties. We then examine a set of proxy series provided at climateaudit.com as listed in Table 1.





names Reference



1 year year
2 CRU Climate Research Unit
3 J98 Jones et al. 1998 Holocene
4 MBH99 Mann et al. 1999 Geophys Res Lett
5 MJ03 Mann and Jones 2003
6 CL00 Crowley and Lowery 2000 Ambio
7 BJ00 Briffa 2000 Quat Sci Rev
8 BJ01 Briffa et al. 2001 J Geophys Res
9 Esp02 Esper 2002 Science
10 Mob05 Moberg 2005 Science



Table 1: Global temperatures and temperature reconstructions


1.1 Types

1.1.1 Identically distributed error models (IID)

An IID series is the simplest and most familiar series consisting of independent random numbers with a distribution such as the normal distribution. In order to predict future terms in the series, we could make use of the long term mean and variance of past data (hence identical), but the previous term is of no use in predicting the next term (hence independence). We first determine the parameters of the global temperature series to use for generating simulated data. The mean (or drift) in values is not of interest as all values are standardized, but the standard deviation (or dispersion) is estimated from the differenced values to eliminate the drift from the CRU values. A series of random numbers with a normal distribution and a standard deviation equal to CRU is then generated and appended to the data frame. The standard deviation is estimated as 0.15.


> sd <- sqrt(var(diff(na.omit(d$CRU))))
> u <- rnorm(1990, sd = sd)
> d <- cbind(d, list(iid = u))

1.1.2 Moving average models (MA)

Moving averages are probably the next most familiar series, where the average of a limited set or window of values is calculated for every position in the series. In R this is done with the filter command, the filter being determined by a list of numbers to use as coefficients in a summation – in this case 30 values of 1/30 provide a 30 year moving average for CRU. In a MA model the previous value of the series will help predict the next value. While a MA is often called a low frequency bandpass filter, suppressing high frequency fluctuations while passing the long frequency ones. It is also ‘non-stationary’ as the mean is not constant. We need to change the length of the filtered series to match the original series, as filtering reduces the length of the series.


> x <- filter(d$CRU, filter = rep(1, 30)/30)
> length(x) <- length(d$CRU)
> d <- cbind(d, list(CRU30 = x))

1.1.3 Autoregressive models (AR)

An auto-regression model is the next most familiar. Here each term in the series is determined by the previous terms plus some random error. In an AR(1) (or Markov) model only the previous term is useful in predicting the next term. An AR(1) series has the equation X(t) = e + aX(t - 1) where a is a coefficient and e is a random error term. A random walk is the simplest form where a = 1. It can be generated from the series of random numbers by taking the cumulative sum of the random series. We can estimate the value of a with the ar() function and generate an AR(1) model using the R facility arima.sim and the parameters from the CRU temperature data. The coefficient is a = 0.67 and standard deviation is sd = 0.15 for the AR(1) model of CRU.

1.1.4 Fractional Gaussian Noise (FGN)

The next and final series goes by many names: self similar, fractal, roughness, fractional Gaussian noise model (FGN), long term persistence (LTP), clustering or simple scaling series (SSS). Specifically they are characterized as having constantly scaling variance (or standard deviation) over all time or spatial scales. The well known proportionality of the variance of FGN series is:

V ar(k)  oc k2H

where k is the scale, and H is a constant called the Hurst exponent. On a log-log plot of standard deviation this equation is a straight line from which H can be estimated.

log(StDev(k)) = c + Hlog(k)

Among other things, Koutsoyiannis shows a number of different methods of producing models that produce this scaling behavior [1]. He also shows that FGN models of natural climatic phenomena satisfy the maximum entropy (ME) condition, which provides a satisfying link to thermodynamic principles [3]. While there are a number of ways of approximating the FGN series one of the easiest is to generate a series with Gaussian errors and additional large and medium scale random fluctuations of the mean. The timing of the larger scale fluctuations is determined by a random variable with a logarithmic distribution in order to ensure no periodicity.

1.2 Characteristics

In Figure 1 the simulated series are plotted w.r.t. time. The AR(1) and the SSS series resemble quite closely the CRU natural series. However the IID series does not to capture the longer time scale fluctuations. In comparison, the random walk has very large long time scale fluctuations. While it can be seen by eye on Figure 1 that IID and random walk are not good models for the natural series more insightful methods are needed to distinguish the LTP properties. The FGN models are described unflatteringly as having ‘fat tails’. This refers to the way the distribution of less frequent difference values fades out into a thicker tail (power-type) rather than the exponential form of a normal distribution. When these distributions are plotted on Figure 1 it is hard to see which are Gaussian and which are not. We need more powerful ways to examine the data.

PIC

Figure 1: A: Plots of the global temperature (CRU), and the simulated series random, walk, ar(1), and sss. B: Probability distributions for the differenced variables.


One of the main tools for examining the autocorrelation structure of data is the autocorrelation function or ACF. The ACF provides a set of correlations for each distance between numbers in the series, or lags. The autocorrelation decays in a characteristic fashion for each series as the lags get longer as shown in Figure 2. It can be seen that IID series decays very quickly (no long term correlation), the AR(1) model decays fairly quickly, the SSS next and the random walk most slowly. Another way to look at the autocorrelation properties is to plot the ACF at a constant lag, but at increasing aggregate time (or spatial scales). The aggregate is calculated as follows. For example, given a series of numbers X, the aggregated series X1, X2 and X3 is as follows.


> x <- seq(0, 1, by = 0.1)
> agg(x, 1)

[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

> agg(x, 2)

[1] 0.1 0.5 0.9 1.3 1.7

> agg(x, 3)

[1] 0.3 1.2 2.1

This produces the plot on Figure 2B showing more clearly the decay in the autocorrelation of the series.

PIC

Figure 2: Autocorrelation function (ACF) of the simulated series for lags 1 to 40 with the 1/lag (solid line) for comparison. B: Lag 1 ACF of the aggregated processes at time scales 1 to 40.


A plot of the logarithm of standard deviation of the simulated series against the logarithm of the level of aggregation or time scale on Figure 3 shows scale invariant series as straight lines. The plot of random numbers is a straight line of low slope (0.5). The random walk is also a straight line of higher slope (1) and hence with scale invariant standard deviation. The CRU and SSS are also a straight lines. The AR(1) model deviates from the initial slope of the SSS, towards the slope of the random line, showing decreasing standard deviation at larger scales and demonstrating the AR(1) model is not scale invariant.

The difference between the AR(1) model and the SSS model is seen even more clearly on a plot of the standard deviation of the mean against the lag-one autocorrelations against time scale (Figure 3). Here the implications of high self similarity or H value begin to be seen. Series such as the SSS, CRU and the random walk would maintain high standard deviations for very large numbers of data, while the s.d. of the IID series declines very quickly with more data This means that where a series has a high H, increasing numbers of data do not decrease our uncertainty in the mean very much. At the level of the CRU of H = 0.95 the uncertainty in a mean value of 30 points is almost as high as the uncertainty in a mean of a few points. It is this feature of LTP series that is of great concern where accurate estimates of confidence are needed.

PIC

Figure 3: A: Log-log plot of the s.d. of the aggregated simulated processes vs. time scale k. B. Log-log plot of the s.d. of the mean of the aggregated simulated processes vs. time scale k.


We can calculate the H values for the simulated series from the slope of the regression lines on the log-log plot as listed in Table 2. The random series with no persistence has a Hurst exponent of about 0.5. The H of the AR(1) model is 0.67 while the SSS model we generated has an H of 0.83. The global temperature has a high H of 0.94 and the random walk is close to one.





names H



1 CRU 0.94
2 J98 0.87
3 MBH99 0.87
4 MJ03 0.92
5 CL00 0.97
6 BJ00 0.87
7 BJ01 0.84
8 Esp02 0.91
9 Mob05 0.91
10 iid 0.52
11 CRU30 0.95
12 ar1.67 0.65
13 walk 1.00
14 sss 0.83



Table 2: Hurst exponent for all series


2 Real Data

What do the proxies look like? Figure 4 below is a log-log plot of the standard error of the mean of the proxies w.r.t scale. The lines indicate high levels of scale invariance although the slopes of the line differ slightly. In Figure 4 the lag-one ACF against scale for the proxies shows persistent correlation at long time scales. The Hurst exponents range from 0.82 to 0.97 as shown on Table 2.

PIC

Figure 4: A: Log-log plot of the s.d. of the proxy series vs time scale. B: Plot of the lag one ACF vs. time scale k.


3 Implications

Which model should we use? The conclusions are clear.

1. Natural series are better modeled by SSS type models than AR(1) or IID models.

What are the consequences? The variance in an an SSS model is greater than IID models at all scales. Thus using IID variance estimates will lead to Type I errors, spurious significance, and the danger of asserting false claims. As variance of the natural LTP series is much higher than other series, the significance of trends in natural series is likely to be greatly underestimated by assuming IID errors and also underestimated by assuming AR(1) behaviour. The normal relationship for IID data for the standard error of the mean with number of data n

SE[IID] = s/sqrt(n)

has been generalized to the following form in [1]:

SE[LTP] = s/n1-H

Where the errors are IID then H = 0.5 and the generalized from becomes identical to the upper IID form. How much larger is the standard deviation for non IID errors? By dividing and simplifying the equations above we get

SE[LTP]/SE[IID] = nH-0.5

This is plotted by n at a number of values of H below. It can be seen that at higher H values the SE[LTP] can be many times the SE[IID] Figure 5. When the 30 year running mean of temperature is plotted against the CRU temperatures it can be seen that the temperature increase from 1990 to 1950 is just outside the 95% confidence intervals for the FGN model (dotted line). The CIs for the IID model are very narrow however (dashed line),

PIC

Figure 5: A: Number of times by which the s.e. for FGN model exceeds s.e. for IID models at different H values. B: Confidence intervals for the 30 year mean temperature anomaly under IID assumptions (dashed line) and FGN assumptions (dotted lines).


Even without this practical consequence, the relationship of FGN to maximum entropy provides a justification for their use. The self-similar property of natural series amounts to maximization of entropy simultaneously at all scales [3]. This provides a link to the processes that may be responsible for the series consistent with the second law of thermodynamics, which states that overall entropy is never decreasing. While FGN models seem more complex than IID, they are justified simply because they are the ways things are. Other forms of model examined here do not necessarily maximize entropy. It is shown in [4] that the use of an annual AR(1) or similar model amounts to an assumption of a preferred (annual) time scale, but there is neither a justification nor evidence for a preferred time scale in natural series. Also, it is noted that the natural proxy series have only a slightly lower value for H than the CRU temperatures. While this provides additional support for the high values of the temperature series, it also may indicate a loss of some long term persistence in the temperature reconstructions by proxies.

References

[1] D. Koutsoyiannis. Nonstationary versus scaling in hydrology. Journal of Hydrology, in press, 2005.

[2] Demetris Koutsoyiannis. The hurst phenomenon and fractional gaussian noise made easy. Hydrological Sciences Journal, 47(4):573–596, 2002.

[3] Demetris Koutsoyiannis. Uncertainty, entropy, scaling and hydrological stochastics, 2, time dependence of hydrological processes and time scaling. Hydrological Sciences Journal, 50(3):405–426, 2005.

[4] Demetris Koutsoyiannis. An entropic-stochastic representation of rainfall intermittency: The origin of clustering and persistence. Water Resources Research, 42, 2006.

Tags:

8 responses so far ↓

  • 1 Demetris Koutsoyiannis // Mar 15, 2006 at 9:02 pm

    This investigation by David Stockwell is extremely interesting and useful. Not only does it provide evidence that all available proxy series of temperature exhibit scaling behaviour. It also shows how inappropriate the classical (IID) statistical model is for them: in all cases the Hurst coefficient H is as high as 0.90±0.07, far from 0.5. The last part of the article correctly points out the significant implications. I have attempted, too, in several cases to emphasize these implications and expressed the opinion that statistical hydrology needs to be rectified in order to harmonize with this complex nature of physical processes. Now this investigation, with the large climatological data base it uses, points out that the same is true for climatology, given that climatology has been traditionally based on the classical (IID) statistical model, too.

    However, probably the implications are even “worse” than described by David Stockwell. In fact, the formula SE[LTP]/SE[IID] = n^(H-0.5) he derives and the relevant plot indicate the increase of uncertainty under the SSS behaviour, if H is known a priori. Note that in the IID case there is no H (or H = 0.5 a priori) and, thus, no uncertainty about it. In the SSS case, H is typically estimated from the data, so there is more uncertainty due to statistical estimation error. This, however, is difficult (but not intractable) to quantify. And in the case of proxy data, there is additional uncertainty due to the proxy character of the data. This is even more difficult to quantify.

    My conclusion is that the world is more uncertain and more indeterministic than modelled using classical statistics. In my opinion this is not bad; in contrast, this makes life more fascinating. Just think of the difference of watching, say, a football game in real time (live and thus indeterministic) or one day after (with known outcome and thus deterministic).

  • 2 John A // Mar 15, 2006 at 9:02 pm

    Dave,

    Let’s see if I can work through this. What you’re basically showing is that the statistical model (IID) used in multiproxy studies is fundamentally wrong, and leads to conclusions which are false or at the very least misleading. Is this correct?

    [BTW, if possible I’d recommend you find a Wordpress theme that spans the entire page, as this weblog is seriously hard on the eyes.]

  • 3 davids // Mar 15, 2006 at 9:03 pm

    Yes. Most natural series seem to have high LTP, including the reconstructions. The conclusion should be that the confidence intervals are greatly expanded by LTP —that excessive confidence is misplaced—but the conclusions are not necessarily false. For more about falsification, see “A new climate reconstruction?.

    Yeah, I am just using the default theme. Supposedly the width is controlled by a couple of images. If I could only find out which ones and delete them.

  • 4 Surf » Home // Apr 25, 2006 at 5:35 am

    [...] Scale invariance for dummies. [...]

  • 5 Niche Modeling » Options for ACF in R // Jul 14, 2006 at 6:57 pm

    [...] July 14th - Options for ACF in R [...]

  • 6 Niche Modeling » Example of Simple Linear Regression - global warming trends // Mar 18, 2008 at 8:43 pm

    [...] > pnorm(0.135,sd=0.155) [1] 0.8081141 Below is a table of data and results for all the four main global temperature indices. Source10yr_trend(t)SD0.2-tpIPCC Confidence UAH+0.0450.1350.1550.81Likely RSS+0.0090.1220.1910.94Very Likely CRU+0.0980.2160.1020.68Likely GISS+0.1730.1420.0270.58Medium Likelihood As of this month, the trend in temperatures for the last 10 years is so low, that an increase of 0.2C per decade could be rejected in 3 out of 4 indices with some level of confidence. In one case, using the IPCC terminology, these results suggest IPCC projection of global warming this century are very unlikely (1-10% chance) to be correct. This is a controversial result contradicting the IPCC ‘consensus’ position. To any controversial result there are objections. One is that ten years is too short a time to test climate trends. This is of course a statistical nonsense as a trend of any length in a time series can be tested providing appropriate uncertainty is used. With short trends the SD simply becomes much larger. For example, the SD for the 5 year trend is closer to 0.5C, so a trend deviation of more than twice as much would be needed to reject the null hypothesis at the same level as a 10 year trend. Another objection here is that a ten year trend is cherry picking. However, to cherry pick one needs options to pick from. Treating data from the present day 10 years back is not cherry picking as we are talking about temperatures now, and it is impossible to cherry pick the present, there is only one present. The choice of length of trend is the only free variable. Another objection is that increasing temperatures may being masked by factors such as volcanic eruptions, El Niños, sulphates, and trade winds. However, the masking effect was not part of the IPCC predictions of 0.2C per decade, and the results concern those predictions. Also, the masking argument tries to convince us that the models’ magnitude of warming is certain, and explains failure by reference to uncertain factors. Such excuses for lack of predictive skill get tired very quickly. One could also argue that uncertainty limits may be wider than classical statistics and this finite empirical process for estimating SD suggest. Climatic processes exhibit a scaling invariant behavior, also known as long-range persistence (LTP) or the Hurst phenomenon which produces random long term trends. A combination of analytical and Monte Carlo methods such as described in Uncertainty assessment of future hydroclimatic predictions: A comparison of probabilistic and scenario-based approaches (2007) may extend uncertainty limits more and reduce the confidence in the judgements. I think it is a remarkable testament to the power of numbers, that one of the most complex and contentious issues of the time could potentially be brought down by such a simple statistical analysis. The IPCC model projections were only published in 2001 and are already looking very shaky. These projections are central to the IPCC mission. If the current stable temperature trend continues, put the AGW agenda on hold. [...]

  • 7 Niche Modeling » Hurst Coefficient Software // Apr 23, 2008 at 8:20 am

    [...] Long-range dependence is being identified many disciplines such as, networking, databases, economics, climate and biodiversity. LTP is competing with the sexy “long tail” for top spot as a theory of cultural consumption. Thus, the need for software offering complete long-range dependence analysis is crucial. While there are some steps towards this direction, none are yet completely satisfactory. For one, the Hurst exponent cannot be calculated in a definitive way, it can only be estimated. Second, there are several different methods to estimate the Hurst exponent, but they often produce conflicting estimates, and it is not clear which of the estimators are most accurate. A first step towards a systematic approach in estimating self-similarity and long-range dependence is the java tool called SELFIS, a java based tool that will automate the self-similarity analysis. In R there is the fracdiff package on CRAN for fitting frARIMA(p,d,q) models with fractional “d” and there’s a one-to-one relationship between ‘d’ and the Hurst parameter for these models: d = H - 1/2. These are better than the R/S method known to be far from optimal. A useful summary of the issues and reference to other resources is Estimating the Hurst Exponent. Demonstrating just how pervasive are these concepts in our daily life is the Physics of Fashion Fluctuations by R. Donangeloa, A. Hansenb,c, K. Sneppenb and S. R. Souzaa. Here a simple model for emergence of fashions — goods that become popular not due to any intrinsic value, but simply because “everybody wants it” — in markets where people trade goods shows spontaneous emergence of random products as money. The model supports collectively driven fluctuations characterized by a Hurst exponent of about 0.7. Scale Invariance for Dummies is an investigation of scale invariance or long term persistence (LTP) in time series including tree-ring proxies – the recognition, quantification and implications for analysis – drawn largely from Koutsoyiannis. [...]

  • 8 Niche Modeling » CSIRO Data Policy: Go Pound Sand // Jul 15, 2008 at 12:42 pm

    [...] I’m not able to hand over the data from the 13 models, due to restrictions on Intellectual Property, but I can describe the methods used to determine statistical significance. Dewi Kirono says: · I have used a number of statistical tests (parametric and non-parametric) and found that most of them show agreement. I used the 5% significance level. One marginal case was the change in percentage area for exceptionally low rainfall in NSW, in which the T-test was insignificant at the 5% level while Kolmogorov-Smirnov test was significant 5% level. I feel the non-parametric test is more objective since it doesn’t assume a Normal distribution. · For the percentage area (temp, rain), the 13-model-mean sample is the 108 yr time series for 1900-2007 and the 31 yr time series for 2010-2040. For percentage area (soil moisture), the sample is the 50 yr time series for 1957-2006 and the same 50 yr time series modified for a period centred on 2030 · For the frequency (temp and rain), the sample is the number of models (13) as each period (i.e. 1900-2007 and 2010-2040) only produces one return period value. · For soil moisture frequency, I cannot perform the test as we only have one value for the obs (1957-2006). · At the moment I’ve only applied the tests to the “mean” data not the “90th” and 10th” percentiles. This is because we cannot do that for soil moisture and because we deal with lots of zero values for the 10th percentile. Regards Kevin Hennessy I then asked for further clarification of how the statistical tests were performed, and asked again for the data. Further explanation of the statistical tests reveals that they consisted simply of comparison of the means for two time periods, where % area in individual years were the single data points. This simple test assumes the data points are independent, but due to autocorrelation is an unjustified assumption. Failing to account for autocorrelation grossly overestimates the power of the test to detect significant differences (see my post Scale Invariance for Dummies or Chapter 10 of my book). Also see the results of Breusch and Vahid 2008 from the Draft Garnaut Report (reviewed here), where t-test scores for rate of temperature increase dropped from more than 4 to less than 2 when autocorrelation was taken into account. I don’t know the exact autocorrelation, as I can’t get the data, but the temperature and rainfall variables that produce it have very high autocorrelation (or ‘bursty’) behaviour, and so these data must inherit that character. Dear David, [...]

Leave a Comment