-
12
Aug
Here is an example of doing statistics in R, illustrating a pitfall of simple linear regression, using the global temperatures by satellite from 1979 to 2008. I have never seen the problem of ‘spurious compensation’ mentioned, but it is a common problem when trying to develop a model from a set of additive factors.
The linear regression below ‘invents’ large positive and negative factors that when added together, create a small difference that is a very good fit. The factors themselves are pronounced ‘highly significant’. Could ‘spurious compensation’ explain high positive forcing from CO2 and high negative forcing from aerosols found in some attribution studies?
Below I have fit the lower troposphere temperature to a linear regression made up of a list of the linear, square, and cubic multiples of time:
y = a + bx + cx2 + dx3
The resulting curve is the red line in the figure below. Nice fit. However, each of the components are shown as colored lines. The linear and cubic terms are dropping faster than mortgage loan companies Fannie Mae and Freddie Mac. The squared term is rising faster than female viagra. Who would like to ‘attribute’ a physical factor to each of these components?
The summary listing indicates this is a model with high significance, overall and in its components.
> source(“script.R”)
> l|t|)
(Intercept) -1.333e-02 3.634e-02 -0.367 0.714
x -4.265e-02 1.059e-02 -4.027 6.92e-05 ***
I(x^2) 4.789e-03 8.290e-04 5.777 1.68e-08 ***
I(x^3) -1.042e-04 1.837e-05 -5.671 2.98e-08 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Residual standard error: 0.1693 on 351 degrees of freedom
Multiple R-Squared: 0.4559, Adjusted R-squared: 0.4512
F-statistic: 98.03 on 3 and 351 DF, p-value: < 2.2e-16
The code for generating this plot is below. Comment out the line readRSS() after the first time so that it does not keep reading the data in from the remote website.
TLT< -"ftp://ftp.ssmi.com/msu/monthly_time_series/rss_monthly_msu_amsu_channel_tlt_anomalies_land_and_ocean_v03_1.txt"
#tlt<-readRSS(file=TLT)do<-function(Y) {
x<-(1:length(Y))/12
Y1 <- as.vector(Y)
l <- lm(Y1 ~ x + I(x^2) + I(x^3))
plot(x,Y1,type="l",ylim=c(-1,1),ylab="Anomaly DegC",xlab="Years")
lines(x,predict(l),col="red")
lines(x,l$coefficients[2]*x,col="blue")
lines(x,l$coefficients[3]*x^2,col="green")
lines(x,l$coefficients[4]*x^3,col="orange")
print(summary(l))
l
}
Update: Here is the result for the fifth and sixth order in x linear regression of tropospheric temperatures. Note that the coefficients all have very low significance in the fifth order version, while the coefficients have mild significance in the sixth order.
Call:
lm(formula = Y1 ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5))Residuals:
Min 1Q Median 3Q Max
-0.392083 -0.122727 -0.007861 0.092455 0.721341Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.179e-02 5.482e-02 -0.580 0.562
x -4.572e-02 3.715e-02 -1.231 0.219
I(x^2) 9.152e-03 7.734e-03 1.183 0.237
I(x^3) -7.418e-04 6.599e-04 -1.124 0.262
I(x^4) 3.144e-05 2.451e-05 1.283 0.200
I(x^5) -5.024e-07 3.288e-07 -1.528 0.127Residual standard error: 0.1679 on 349 degrees of freedom
Multiple R-Squared: 0.4681, Adjusted R-squared: 0.4605
F-statistic: 61.43 on 5 and 349 DF, p-value: < 2.2e-16
Call:
lm(formula = Y1 ~ x + I(x^2) + I(x^3) + I(x^4) + I(x^5) + I(x^6))Residuals:
Min 1Q Median 3Q Max
-0.384645 -0.121876 -0.006046 0.093070 0.731062Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.067e-01 6.420e-02 -1.662 0.0975 .
x 5.739e-02 5.954e-02 0.964 0.3357
I(x^2) -2.531e-02 1.740e-02 -1.455 0.1465
I(x^3) 3.889e-03 2.197e-03 1.770 0.0776 .
I(x^4) -2.608e-04 1.345e-04 -1.939 0.0534 .
I(x^5) 8.160e-06 3.935e-06 2.073 0.0389 *
I(x^6) -9.733e-08 4.406e-08 -2.209 0.0278 *
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Residual standard error: 0.167 on 348 degrees of freedom
Multiple R-Squared: 0.4755, Adjusted R-squared: 0.4664
F-statistic: 52.57 on 6 and 348 DF, p-value: < 2.2e-16
Finally, here is the result for the second order in x linear regression. Note that the squared term has a small coefficient and is not significant. The linear term has a coefficient of 0.012 or +0.12 degC per decade.
Call:
lm(formula = Y1 ~ x + I(x^2))Residuals:
Min 1Q Median 3Q Max
-0.431278 -0.118049 -0.007253 0.107826 0.753653Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1504715 0.0282913 -5.319 1.86e-07 ***
x 0.0124321 0.0044039 2.823 0.00503 **
I(x^2) 0.0001537 0.0001438 1.069 0.28560
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Residual standard error: 0.1767 on 352 degrees of freedom
Multiple R-Squared: 0.406, Adjusted R-squared: 0.4027
F-statistic: 120.3 on 2 and 352 DF, p-value: < 2.2e-16
Without the squared term the linear term is +0.17 degC per decade.
Call:
lm(formula = Y1 ~ x)Residuals:
Min 1Q Median 3Q Max
-0.418624 -0.119757 -0.006886 0.108515 0.745554Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.173087 0.018798 -9.208 <2e-16 ***
x 0.016993 0.001098 15.472 <2e-16 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1Residual standard error: 0.1767 on 353 degrees of freedom
Multiple R-Squared: 0.4041, Adjusted R-squared: 0.4024
F-statistic: 239.4 on 1 and 353 DF, p-value: < 2.2e-16
- Published by david stockwell in: All
- If you like this blog please take a second from your precious time and subscribe to my rss feed!



