Linear Regression R Squared

One of the tests of climate models predicting drought in my review of the Drought Exceptional Circumstances Report was the correlation of predicted area under drought with actual observed area under drought. Lazar criticized my inclusion of the R-Squared (r2) coefficient, an issue I didn’t follow up at the time.

… correlating model predictions for individual years of exceptional rainfall with observed years of exceptional rainfall! This ignores noise (internal variability in the climate system and GCM climate simulations) and that the CSIRO report predicted frequency. Steve MicIntrye and the auditors repeat this mistake here, with the obligatory snark from Steve

The objection is that it is unreasonable to expect climate models to predict ‘year-to-year variation’ in drought using this test. To set the record straight, I have run a small test demonstrating conclusively that the r2 does detect trends in the frequency of intermittent events (as opposed to trends in actual values), and consequently the test does not rely only on year-to-year variations.

Below is a short R script where I represent a trend of increasing drought frequency with two independent sequences of 0s and 1s. These are plotted below. The results of fitting a linear regression to the sequences follow.


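# NOTE: the assignments and comparisons were lost from the published script.
# The "< (1:100)/100" comparison is a reconstruction (an assumption): event
# probability rising linearly over time, consistent with the logical
# predictor (oTRUE) in the output below.
o <- runif(100) < (1:100)/100   # first 0/1 sequence
e <- runif(100) < (1:100)/100   # second sequence, independent of the first
l <- lm(e ~ o)                  # regress one sequence on the other
plot(o, col="blue", type="l")
lines(e, col="red")
print(summary(l))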


> source("rtest.R")

Call:
lm(formula = e ~ o)

Residuals:
    Min      1Q  Median      3Q     Max
-0.6739 -0.3148 -0.3148  0.3261  0.6852

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.31481    0.06412   4.910 3.64e-06 ***
oTRUE        0.35910    0.09454   3.798 0.000253 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4712 on 98 degrees of freedom
Multiple R-Squared: 0.1283, Adjusted R-squared: 0.1194
F-statistic: 14.43 on 1 and 98 DF, p-value: 0.0002527

The run shown produced a non-zero R-squared of 0.128 and a significant slope (***), demonstrating that a shared trend in the frequency of events does produce a positive r2. In comparison, the r2 between the modeled and observed drought data was essentially zero, indicating no detectable common trend in drought frequency using this method.
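A single run can be lucky, so it may be worth repeating the experiment many times. The following is a minimal sketch, not the script used above: it assumes the event probability rises linearly from near 0 to 1 over 100 steps for the trending case, and stays at 0.5 for the no-trend case. The names `r2_pair`, `p_trend` and `p_flat` are illustrative only.

```r
# Sketch: compare r2 for paired 0/1 event sequences with and without a
# shared trend in event frequency. All names and settings are assumptions.
set.seed(42)                     # arbitrary seed, for reproducibility only
n <- 100
p_trend <- (1:n) / n             # assumed: event probability rises linearly
p_flat  <- rep(0.5, n)           # no trend: constant event probability

# r2 from regressing one independent 0/1 sequence on another.
# For a one-predictor linear regression, r2 equals the squared correlation.
r2_pair <- function(p) {
  o <- as.numeric(runif(length(p)) < p)
  e <- as.numeric(runif(length(p)) < p)
  if (sd(o) == 0 || sd(e) == 0) return(NA_real_)  # guard against constant series
  cor(o, e)^2
}

r2_trend <- replicate(1000, r2_pair(p_trend))
r2_flat  <- replicate(1000, r2_pair(p_flat))

print(mean(r2_trend, na.rm = TRUE))
print(mean(r2_flat, na.rm = TRUE))
```

Under these assumptions the trending pairs should give a clearly larger average r2 than the flat pairs, even though the two sequences in each pair are generated independently of each other.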

Niche Modeling. Chapter Summary

Here is a summary of the chapters in my upcoming book Niche Modeling to be published by CRC Press. Many of the topics have been introduced as posts on the blog. My deepest thanks to everyone who has commented and so helped in the refinement of ideas, and particularly in providing motivation and focus.

Writing a book is a huge task, much of it a slog, and it's not over yet. But I hope to get it to the publishers so it will be available at the end of this year. Here is the dustjacket blurb:

Through theory, applications, and examples of inferences, this book shows how to conduct and evaluate ecological niche modeling (ENM) projects in any area of application. It features a series of theoretical and practical exercises in developing and evaluating ecological niche models using a range of software supplied on an accompanying CD. These cover geographic information systems, multivariate modeling, artificial intelligence methods, data handling, and information infrastructure. The author then features applications of predictive modeling methods with reference to valid inference from assumptions. This is a seminal reference for ecologists as well as a superb hands-on text for students.

Part 1: Informatics

Functions: This chapter summarizes major types, operations and relationships encountered in the book and in niche modeling. This and the following two chapters could be treated as a tutorial in R. For example, the main functions for representing the inverted ‘U’ shape characteristic of a niche (step, Gaussian, quadratic and ramp functions) are illustrated in both graphical form and R code. The chapter concludes with the ACF and lag plots, in one or two dimensions.
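To give a flavour of what such functions might look like, here is a rough sketch in R. It is not taken from the book: the function names, parameters and defaults are my assumptions, and the ramp is interpreted here as a symmetric tent shape.

```r
# Illustrative niche response functions on one environmental variable x.
# Names, parameters and defaults are assumptions, not the book's code.
step_fn  <- function(x, lo = -1, hi = 1) as.numeric(x >= lo & x <= hi)
gauss_fn <- function(x, mu = 0, s = 1) exp(-(x - mu)^2 / (2 * s^2))
quad_fn  <- function(x, mu = 0, w = 2) pmax(0, 1 - ((x - mu) / w)^2)
ramp_fn  <- function(x, mu = 0, w = 2) pmax(0, 1 - abs(x - mu) / w)

# Plot all four on a common axis; each peaks at 1 at the niche optimum
x <- seq(-3, 3, length.out = 201)
plot(x, gauss_fn(x), type = "l", col = "blue", ylim = c(0, 1), ylab = "response")
lines(x, step_fn(x), col = "red")
lines(x, quad_fn(x), col = "darkgreen")
lines(x, ramp_fn(x), col = "orange")
```

Each function returns a response between 0 and 1 that is highest near the optimum `mu` (or inside the interval for the step) and falls to zero away from it, which is the inverted ‘U’ in its simplest forms.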

Data: This chapter demonstrates how to manage simple biodiversity databases using R. By using data frames as tables, it is possible to replicate the basic spreadsheet and relational database operations with R’s powerful indexing functions. While a database is necessary for large-scale data management, R can eliminate conversion problems as data is moved between systems.
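As a small illustration of the idea, the sketch below treats two data frames as relational tables. The tables, column names and values are entirely made up for the example and are not from the book.

```r
# Hypothetical tables; all column names and values are invented for illustration.
species <- data.frame(sp_id = c(1, 2, 3),
                      name  = c("sp_a", "sp_b", "sp_c"),
                      stringsAsFactors = FALSE)
records <- data.frame(sp_id = c(1, 1, 2, 3, 3, 3),
                      lat   = c(-35.1, -34.8, -20.3, -12.5, -13.0, -12.9),
                      lon   = c(149.0, 149.3, 148.7, 131.0, 130.8, 131.2))

# Selection (like SQL WHERE): rows south of 30 degrees S, via logical indexing
south <- records[records$lat < -30, ]

# Join: attach species names to occurrence records by the shared key
joined <- merge(records, species, by = "sp_id")

# Aggregate (like SQL GROUP BY): number of records per species
counts <- aggregate(lat ~ name, data = joined, FUN = length)
names(counts)[2] <- "n_records"
print(counts)
```

Logical indexing, `merge` and `aggregate` cover the select, join and group-by operations that most simple occurrence databases need.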

Spatial: R and image processing operations can perform many of the elementary spatial operations necessary for niche modeling. While these do not replace a GIS, they demonstrate that generalizing arithmetic concepts to images can implement simple spatial operations efficiently.

Part 2: Modeling

Theory: Set theory helps to identify the basic assumptions underlying niche modeling, and the relationships and constraints between these assumptions. The chapter shows that the standard definition of the niche as environmental envelopes is equivalent to a box topology. It is proven that when extended to infinite dimensions of environmental variables this definition loses the property of continuity between environmental and geographic spaces. Using the product topology for niches would retain this property.


RE of random reconstructions

To follow up on the last post, I have calculated the RE as well as the R2 statistics for the reconstruction from the random series. The same approach was used, i.e. generate 1000 sequences with LTP, select those with positive slope and R2>0.1, calibrate on a linear model, and average. Here is the reconstruction again, with the test and training periods marked with a horizontal dashed line (test period to the left, training to right of temperature values):
