8 Aug
Reading CRU data is an opportunity to demonstrate some of the features available for programming in R. The Climatic Research Unit (CRU) data is a record of global, northern hemisphere and southern hemisphere temperatures, compiled from temperature sources around the globe over the last 150 years. The files are located at http://www.cru.uea.ac.uk:80/cru/data/temperature/ and look like the lines below. Each year has two lines: one with the year, the twelve monthly values and the annual average at the end, and one with the year and the monthly percentage coverage:
1856 -0.405 -0.486 -0.985 -0.277 -0.140 0.313 -0.009 -0.220 -0.391 -0.540 -0.985 -0.357 -0.374
1856 13 18 16 16 15 16 14 14 17 14 16 16
1857 -0.520 -0.035 -0.536 -0.848 -0.582 -0.186 -0.044 0.032 -0.402 -0.619 -0.791 0.341 -0.349
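To see how R treats this layout, here is a small sketch that feeds the three sample lines above straight to read.table through its text argument instead of the URL. The variable name cru_sample is just an illustration. The coverage line has one field fewer than the temperature lines, so fill = TRUE is needed; without it read.table stops with an error about line 2 not having 14 elements.

```r
# The three sample lines: temperatures, coverage, temperatures (hypothetical variable name)
cru_sample <- c(
  "1856 -0.405 -0.486 -0.985 -0.277 -0.140 0.313 -0.009 -0.220 -0.391 -0.540 -0.985 -0.357 -0.374",
  "1856 13 18 16 16 15 16 14 14 17 14 16 16",
  "1857 -0.520 -0.035 -0.536 -0.848 -0.582 -0.186 -0.044 0.032 -0.402 -0.619 -0.791 0.341 -0.349"
)

# fill = TRUE pads the shorter coverage line with NA so every row has 14 columns
f <- read.table(text = cru_sample, fill = TRUE)

dim(f)     # 3 rows, 14 columns
f[2, 14]   # NA: the coverage line has no annual value
f[1, 14]   # -0.374: the annual average for 1856
```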
The obvious brain-dead approach is given on the web site: download a file such as crutem3gl.txt and read it line by line in a loop until the end of the file, as in the FORTRAN pseudo-code below. Actual code in C would need two more loops to read the values to the end of each line, plus code to handle the incomplete line for the current year.
for year = 1850 to endyear
    format(i5,13f7.3) year, 12 * monthly values, annual value
    format(i5,12i7) year, 12 * percentage coverage of hemisphere or globe
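For comparison, the loop approach translated literally into R might look like the sketch below. The function name readCRULoop is hypothetical, and it takes a character vector of lines (in practice something like readLines(source) would supply them). It steps over the temperature lines, skips each coverage line, and keeps the twelve monthly fields; indexing past the end of a short line yields NA, which mimics what fill = TRUE does for the incomplete current year.

```r
# Loop-based reader in the spirit of the FORTRAN pseudo-code (hypothetical readCRULoop)
readCRULoop <- function(lines) {
  temps <- numeric(0)
  # Visit lines 1, 3, 5, ...: the temperature lines, skipping the coverage lines
  for (i in seq(1, length(lines), by = 2)) {
    fields <- scan(text = lines[i], quiet = TRUE)  # year, monthly values, annual value
    temps <- c(temps, fields[2:13])                # months only; missing months become NA
  }
  temps
}

cru_sample <- c(
  "1856 -0.405 -0.486 -0.985 -0.277 -0.140 0.313 -0.009 -0.220 -0.391 -0.540 -0.985 -0.357 -0.374",
  "1856 13 18 16 16 15 16 14 14 17 14 16 16",
  "1857 -0.520 -0.035 -0.536 -0.848 -0.582 -0.186 -0.044 0.032 -0.402 -0.619 -0.791 0.341 -0.349"
)
d <- readCRULoop(cru_sample)
```

Even in R this needs an explicit loop and manual bookkeeping, which is what the vectorised version avoids.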
Here is my example of the same function, extracting and plotting the global temperature using R features. Some R sugar: first, the data is read directly from a URL rather than from a downloaded file. Second, the function readCRU is defined with default values that can be overridden if required.
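readCRU <- function(source = "http://www.cru.uea.ac.uk:80/cru/data/temperature/crutem3gl.txt",
                    temps = 2:13, plot = TRUE) {
  # Read the whole file at once; fill = TRUE pads short lines with NA
  f <- read.table(source, fill = TRUE)
  # Odd rows are the temperature lines; temps picks the monthly columns.
  # Transposing first makes the vector unroll year by year, month by month.
  d <- as.vector(t(as.matrix(f[seq(1,length(f[,1]),by=2),temps])))
  if (plot) plot(d, type = "l")
  return(d)
}
readCRU()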
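Most importantly, using the matrix and vector capabilities of R, the entire dataset is read into a data.frame with a single call to read.table (fill=TRUE pads the shorter coverage lines and the incomplete current-year line with NA). In the next line, R's powerful indexing selects every second row starting with the first, i.e. the temperature lines, together with the monthly columns given by temps (f[seq(1,length(f[,1]),by=2),temps]). The result is converted to a matrix and transposed, so that unrolling it into a vector, which R does column by column, gives the monthly values in chronological order, ready for a single plot command.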
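The transpose is the step that is easiest to get wrong, so here is a tiny sketch with made-up numbers (two "years" of three "months", one year per row) showing why it is needed. as.vector reads a matrix column by column, so without t() the months of different years end up interleaved.

```r
# Two years x three months of made-up values, one year per row
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

as.vector(m)     # 1 4 2 5 3 6: column-major order interleaves the years
as.vector(t(m))  # 1 2 3 4 5 6: transposing first gives chronological order
```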
R allows all of the loops to be eliminated. Features such as reading from URLs, default parameters and one-line plotting make data analysis quick. Does anyone want to post a shorter solution?
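One possible shorter variant, offered only as a sketch and not as the original code: a logical index c(TRUE, FALSE) is recycled down the rows to pick the odd (temperature) lines, c() drops the matrix dimensions after t(), and wrapping the result in ts() with the first year as the start makes plot() label the x axis in years. The name readCRU2 is hypothetical, and the example call uses a textConnection over the sample lines so it runs without network access.

```r
# Shorter variant (hypothetical readCRU2): recycled logical index + ts for a time axis
readCRU2 <- function(source = "http://www.cru.uea.ac.uk:80/cru/data/temperature/crutem3gl.txt",
                     plot = TRUE) {
  f <- read.table(source, fill = TRUE)
  # c(TRUE, FALSE) selects rows 1, 3, 5, ...; columns 2:13 are the months
  d <- ts(c(t(f[c(TRUE, FALSE), 2:13])), start = c(f[1, 1], 1), frequency = 12)
  if (plot) plot(d)
  d
}

cru_sample <- c(
  "1856 -0.405 -0.486 -0.985 -0.277 -0.140 0.313 -0.009 -0.220 -0.391 -0.540 -0.985 -0.357 -0.374",
  "1856 13 18 16 16 15 16 14 14 17 14 16 16",
  "1857 -0.520 -0.035 -0.536 -0.848 -0.582 -0.186 -0.044 0.032 -0.402 -0.619 -0.791 0.341 -0.349"
)
d <- readCRU2(textConnection(cru_sample), plot = FALSE)
```

Returning a ts rather than a bare vector keeps the dates attached to the values, at the cost of assuming the file starts in January of its first year.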
- Published by david stockwell in: All