Examples of simple smoothers


How much error is there in smoothed climatic and financial series, and how much does variability at the ends of a series affect the trend? Previously we showed that certain ways of treating the end points introduce a lot of variability. Here we show that in certain smoothers, variability at the ends can affect the whole smooth!

Below are three methods with slightly different end-point treatments. Two are what we call causal smoothers (SSA and spline) and one is acausal (moving average). Causal smoothers, in this sense, do not need future data to extend the trend to the end point of the series. Acausal smoothers such as centered moving averages need both past and future data, and so stop half a window short of the end point (see wiki).

All data are annual global temperature data from GISS for 1973 to 2006.

1. Singular Spectrum Analysis

Below are the results of two approaches using CaterpillarSSA with an 11-year embedding period (window length). The red curve results from padding the end of the series with data reflected about the final 2006 value, the so-called 'minimum roughness condition' (MRC). The blue trend uses no padding. The green line is the simple linear regression over the 34 years.
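A minimal sketch of the two SSA variants, assuming basic SSA (embed the series in a trajectory matrix, take the SVD, reconstruct by diagonal averaging) with the trend taken as the leading component only. Which components CaterpillarSSA grouped into the trend is not stated, so `k=1` is an assumption, and `ssa_trend` and `mrc_pad` are hypothetical helpers, not the CaterpillarSSA interface. The series is a synthetic stand-in, not the GISS data.

```python
import numpy as np

def ssa_trend(x, L, k=1):
    """Reconstruct the leading k SSA components of x with window length L."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    K = N - L + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + L]
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Rank-k approximation from the leading singular triples
    Xk = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Diagonal averaging turns the matrix back into a series of length N
    rec = np.zeros(N)
    counts = np.zeros(N)
    for i in range(L):
        for j in range(K):
            rec[i + j] += Xk[i, j]
            counts[i + j] += 1
    return rec / counts

def mrc_pad(x, m):
    """Append m points reflected about the final value in both time and
    amplitude (point reflection), as in the minimum roughness condition."""
    x = np.asarray(x, dtype=float)
    tail = 2 * x[-1] - x[-2:-m - 2:-1]
    return np.concatenate([x, tail])

rng = np.random.default_rng(0)
years = np.arange(1973, 2007)                     # 34 annual points
x = 0.018 * (years - 1973) + rng.normal(0, 0.1, years.size)  # synthetic, NOT GISS
L = 11
plain = ssa_trend(x, L)                           # "blue": no padding
padded = ssa_trend(mrc_pad(x, L - 1), L)[:x.size] # "red": MRC-padded, then trimmed
slope, intercept = np.polyfit(years, x, 1)        # "green": linear regression
line = slope * years + intercept
```

Comparing `plain` and `padded` point by point shows how the end treatment alters the trend along its whole length, not just at the end.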

ssafig.png

The two trends differ along the whole length of the series, except where they cross in 1999. The last seven points deviate considerably, illustrating the extra uncertainty at the end. Further discussion of this here.

2. Smooth spline

The figure below shows a smoothing spline fit and another approach to estimating uncertainty. The spline is a non-linear regression curve with 11 degrees of freedom fitted to the points. In this figure, the last point (2006) has been moved to either the top or the bottom of the 95% channel, so the two curves span the range of random variation that might reasonably be expected in 2006.
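The experiment can be sketched as follows. This swaps in a least-squares natural cubic regression spline with 11 knots (hence 11 degrees of freedom) for the penalized smoothing spline, and assumes the 95% channel is ±1.96 residual standard deviations about the fit; both choices, the helper names and the synthetic data are assumptions, not the method behind the figure.

```python
import numpy as np

def natural_spline_basis(t, knots):
    """Natural cubic spline basis in truncated-power form (one column per knot)."""
    t = np.asarray(t, dtype=float)
    K = len(knots)

    def d(k):
        # Scaled difference of truncated cubics; linear beyond the last knot
        return ((np.clip(t - knots[k], 0, None) ** 3
                 - np.clip(t - knots[-1], 0, None) ** 3)
                / (knots[-1] - knots[k]))

    cols = [np.ones_like(t), t] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

def spline_fit(t, y, df=11):
    """Least-squares natural cubic regression spline with df knots at quantiles of t."""
    knots = np.quantile(t, np.linspace(0, 1, df))
    B = natural_spline_basis(t, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B @ coef

rng = np.random.default_rng(1)
t = np.arange(34, dtype=float)                    # stand-in for 1973..2006
y = 0.018 * t + rng.normal(0, 0.1, t.size)        # synthetic, NOT GISS
base = spline_fit(t, y)
half_width = 1.96 * np.std(y - base, ddof=11)     # assumed 95% channel half-width
hi, lo = y.copy(), y.copy()
hi[-1] = base[-1] + half_width                    # last point at top of channel
lo[-1] = base[-1] - half_width                    # last point at bottom of channel
gap = spline_fit(t, hi) - spline_fit(t, lo)       # how far apart the two fits sit
```

Because the fit is a linear function of the data, `gap` is the last column of the hat matrix scaled by the size of the perturbation: largest at the end, but generally nonzero everywhere, with sign changes where the two curves cross.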

article-001.png

The two curves differ again, but this time they pivot about the 11th point from the end. Further discussion of this method here.

3. Moving average

The final figure below shows the result of running a moving average with the 2006 end value set to 0.6 and to 0.3.

mvfig.png

The moving average stops 5 points short of the end of the series, and the last point of the trend shifts in response to the change in the 2006 value.
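A short sketch of the same experiment. It assumes a centered 11-point window (inferred from the smooth stopping 5 points short) and uses a synthetic series in place of the GISS data; `centered_moving_average` is a hypothetical helper.

```python
import numpy as np

def centered_moving_average(y, window=11):
    """Centered (acausal) moving average. Each output needs window//2 points on
    both sides, so the smooth stops half a window short of each end."""
    assert window % 2 == 1, "use an odd window so the average is centered"
    y = np.asarray(y, dtype=float)
    half = window // 2
    smooth = np.convolve(y, np.ones(window) / window, mode="valid")
    idx = np.arange(half, len(y) - half)          # time index of each smoothed value
    return idx, smooth

rng = np.random.default_rng(2)
y = 0.018 * np.arange(34) + rng.normal(0, 0.1, 34)   # synthetic, NOT GISS
a, b = y.copy(), y.copy()
a[-1], b[-1] = 0.6, 0.3                           # the two 2006 end values
idx, sa = centered_moving_average(a)
_, sb = centered_moving_average(b)
print(idx[-1])   # 28: the last smoothed point sits 5 short of index 33
print(sa - sb)   # zero except the final smoothed value, which moves by 0.3/11
```

Only the final smoothed value moves, by the end-point difference divided by the window length, which is why the moving-average effect stays local.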

Summary

So both the choice of method and the variability of the data introduce uncertainty into the trend line, and that uncertainty is most pronounced at the end points.

There is a difference between the causal and acausal smoothers used here. In causal, regression-type smoothers, variation at the end can propagate through the whole series. On the other hand, regression smoothers have the advantage of extending the smooth all the way to the end of the series (and beyond, if predictions are made). In moving averages the effect of end variation is more localized, but the smooth stops short of the end.
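One way to see this contrast, assuming both smoothers are linear (smoothed = S @ y): the last column of S gives the response of every smoothed value to a unit change in the final observation. The sketch compares a centered 11-point moving average with a least-squares natural cubic regression spline (11 df), used here as a stand-in for the smoothing spline; the helper names are hypothetical.

```python
import numpy as np

def ma_matrix(n, window=11):
    """Smoother matrix of a centered moving average; one row per defined output."""
    half = window // 2
    S = np.zeros((n - 2 * half, n))
    for r in range(n - 2 * half):
        S[r, r:r + window] = 1.0 / window   # row r smooths time index r + half
    return S

def spline_hat_matrix(t, df=11):
    """Hat matrix H (fitted = H @ y) of a least-squares natural cubic spline."""
    t = np.asarray(t, dtype=float)
    kn = np.quantile(t, np.linspace(0, 1, df))

    def d(k):
        return ((np.clip(t - kn[k], 0, None) ** 3
                 - np.clip(t - kn[-1], 0, None) ** 3) / (kn[-1] - kn[k]))

    B = np.column_stack([np.ones_like(t), t] + [d(k) - d(df - 2) for k in range(df - 2)])
    return B @ np.linalg.pinv(B)            # projection onto the spline basis

t = np.arange(34, dtype=float)
ma_influence = ma_matrix(t.size)[:, -1]          # nonzero only in the last row
spline_influence = spline_hat_matrix(t)[:, -1]   # nonzero throughout the series
```

In the moving average a change in the last point touches a single smoothed value; in the regression spline it reaches every fitted value, though its size typically shrinks with distance from the end.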

Below is a plot of monthly global temperatures from Hadley and GISS, with their smoothing splines (11 df) and regression lines. It suggests that temperatures fluctuate more or less randomly above and below a long-term trend line.

hadgisstemp.png

Thanks to Stefan Rahmstorf for prompting this comparison here.
