# Geographic models with R and netpbm

Geographic information is a major component of niche modeling in any
spatial science such as ecology.
Geographic Information Systems (GIS) are the tool of choice when
the main purpose is managing geographic information.

As in the previous chapter, where R was used as a relational database,
R can be used to perform simple spatial tasks.
This both avoids the need for a separate GIS when one is not necessary,
and helps to build knowledge of advanced use of the R language.

R is not very efficient for some of these operations, because the data
must be held in a form suitable for mathematical operations,
and this limits the size of the data that can be handled.
A more efficient way to perform basic ecological niche modeling (ENM) functions
on large sets of data is to use image processing.
A good image processing package for this purpose is netpbm,
and examples are given of using its utilities to perform the fundamental
analytical operations of modeling.

# Data types

Two main types of data are used for
representing geographic information and preparing maps.
The first type is the raster — a regular
grid of numbers, where each number represents the value of a
variable in a regular area, or cell.

A raster can be represented in R either as a matrix or a vector.
The contents of the cells can be integers, floating point numbers,
characters or raw bytes. Below are examples of two ways we
might generate a raster to use in analysis: simulation or input from
a data file.

In the first we generate a matrix with a sequence of numbers
and display it with the image command. In the second we read in a raw
raster dataset and display it.
The image data is in a gray scale format, where each cell,
called a pixel in an image, holds a single byte.

```
> par(mfcol = c(1, 2))
> palette(gray(seq(0, 0.9, len = 30)))
> m <- ...
> image(m, col = 1:30)
> x <- ...
> image(matrix(as.numeric(x), 124, 52), ylim = c(1, 0), col = 1:30)
```
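As a minimal sketch of the two routes described above, the following builds one raster by simulation and mimics the single-byte cells of a raw raster. The 124 by 52 dimensions come from the text. The simulated bytes stand in for a data file whose name and contents are not given here, and the names `m_sim`, `bytes` and `r` are illustrative assumptions.

```r
# Simulation: fill a 5 x 6 matrix with the sequence 1..30 (column by column).
m_sim <- matrix(1:30, nrow = 5, ncol = 6)

# Stand-in for a raw raster file: 124 * 52 single-byte cells.
# With a real file one would typically call something like
#   readBin(path, what = "raw", n = 124 * 52)
# where `path` is a hypothetical file name.
set.seed(1)
bytes <- as.raw(sample(0:255, 124 * 52, replace = TRUE))

# Raw bytes must be converted to numbers before arithmetic or plotting.
r <- matrix(as.numeric(bytes), nrow = 124, ncol = 52)
```

Either matrix can then be passed to `image()` for display, as in the transcript above.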

Images of this kind are an efficient way of storing large
amounts of data, although they have the disadvantage of not containing
the geographic information needed to align them with other maps and points. Here is an example of the
portable gray map (pgm) format used in netpbm, which has many
similarities to most image formats. The first code, 'P2', is called a magic number
and identifies the type of the file. The next line is an optional comment line.
The dimensions of the image (width, then height) follow, then the maximum gray value in the image. Finally
the value of each pixel is listed. While the plain-text pgm format shown here is inefficient for storing large
amounts of data, and single-byte values are limited to the range 0 to 255,
it has the advantage of being very simple and easy to
manipulate.

```
P2
# feep.pgm
24 7
15
0  0  0  0  0  0  0  0 ...
```
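Because the plain pgm layout is so simple, it can be written and read with a few lines of R. The sketch below is an illustration under stated assumptions, not part of netpbm: the helper names `write_pgm` and `read_pgm` are hypothetical, and the reader only handles comments that occupy a whole line.

```r
# Write a numeric matrix as a plain (P2) pgm file.
write_pgm <- function(m, path, maxval = 255) {
  # Header: magic number, "width height", maximum gray value.
  header <- c("P2", paste(ncol(m), nrow(m)), as.character(maxval))
  # One text line per image row, values separated by spaces.
  rows <- apply(m, 1, paste, collapse = " ")
  writeLines(c(header, rows), path)
}

# Read a plain pgm file back into a matrix.
read_pgm <- function(path) {
  lines <- readLines(path)
  lines <- lines[!grepl("^#", lines)]   # drop whole-line comments
  # Everything after the magic number is whitespace-separated numbers.
  tokens <- scan(text = paste(lines[-1], collapse = " "), quiet = TRUE)
  width <- tokens[1]
  height <- tokens[2]
  # tokens[3] is the maximum gray value; pixel values follow row by row.
  matrix(tokens[-(1:3)], nrow = height, ncol = width, byrow = TRUE)
}

m <- matrix(c(0, 3, 7, 15, 11, 0), nrow = 2, byrow = TRUE)
path <- tempfile(fileext = ".pgm")
write_pgm(m, path, maxval = 15)
back <- read_pgm(path)
```

The round trip returns the original values, which makes the format convenient for passing small rasters between R and the netpbm utilities.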

The second main type of data is point locations.
Sets of related points can be used to form lines, such as roads and streams,
or polygons representing areas. The ordering of the coordinates
defines the connections in the line or shape. Points can be represented in R
either as a matrix with two columns, one each for the x and y coordinates,
or as a vector of complex numbers.
The two forms of data, rasters and points,
together make a map. R also makes it easy
to draw contours from a matrix of data.

```
> par(mfcol = c(1, 2))
> pts <- ...
> z <- ...
> plot(z, type = "l")
> points(z, cex = 1:10)
> contour(1:124, 1:52, matrix(as.numeric(x), 124, 52), ylim = c(51, 1))
```
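The two point representations can be sketched as follows. The coordinates are made up for illustration, and the names `pts_xy`, `z_path` and `seg_len` are assumptions rather than the values used in the transcript above.

```r
# Five vertices of a path, stored as a two-column matrix (x, y).
pts_xy <- cbind(x = c(1, 4, 6, 9, 12), y = c(2, 5, 3, 8, 6))

# The same path as a complex vector: real part = x, imaginary part = y.
z_path <- complex(real = pts_xy[, "x"], imaginary = pts_xy[, "y"])

# The order of the points defines the segments, so segment lengths
# follow directly from complex arithmetic on consecutive points.
seg_len <- Mod(diff(z_path))
total_len <- sum(seg_len)
```

One convenience of the complex form is that `plot(z_path, type = "l")` draws the path directly, with the real part on the x axis and the imaginary part on the y axis.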

# Operations

While many operations in a professional GIS are devoted to
producing professional-looking maps, basic R is devoted
more to analytical operations.
A range of mathematical operations can be applied to matrices and vectors
to help prepare spatial data and answer statistical questions.

### Rasterizing

The first operation is rasterizing: plotting points onto a raster.
This is exceedingly easy using the indexing operations in R,
as shown below.

Histograms are an essential construct for examining the
distributions of values. In the histogram of classes in a raster layer,
a large proportion of values are 255. This is due to using the value
255 to represent ocean.

```
> par(mfcol = c(1, 2))
> points <- ...
> t1 <- ...
> t1[points] <- ...
> image(t1, col = 1:30)
> hist(as.numeric(x))
```
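Rasterizing by indexing can be sketched as below. The point locations are hypothetical, and the names `grid`, `pts_rc` and `counts` are illustrative. The sketch relies on the fact that indexing a matrix with a two-column matrix addresses one cell per row of the index.

```r
# Empty 124 x 52 raster.
grid <- matrix(0, nrow = 124, ncol = 52)

# Hypothetical point locations as (row, column) pairs; one is duplicated.
pts_rc <- cbind(row = c(10, 60, 60, 120), col = c(5, 26, 26, 50))

# Mark presence: each row of pts_rc selects a single cell.
grid[pts_rc] <- 1

# To count points per cell instead, tabulate the linear cell index,
# since repeated assignment would record a duplicate only once.
cell <- (pts_rc[, "col"] - 1) * nrow(grid) + pts_rc[, "row"]
counts <- matrix(tabulate(cell, nbins = length(grid)), nrow = nrow(grid))
```

The presence raster marks three cells, while the count raster keeps the information that two points fall in the same cell.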

In ecological niche modeling of spatial data it
is usually necessary to mask out values that are irrelevant to the analysis.
One way to do this is to set the masked cells of a mask vector to a specific value, such as zero,
and then use an arithmetic operation, such as multiplication, to nullify the
corresponding values in another vector.

```
> par(mfcol = c(1, 2))
> t1[t1 == 255] <- ...
> comb <- ...
> image(comb, col = 1:30)
> comb <- ...
> hist(comb)
```
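A minimal sketch of masking by multiplication follows, assuming as in the text that 255 codes ocean. The small matrices and the names `layer`, `other`, `mask` and `masked` are made up for illustration.

```r
# A small layer in which 255 codes ocean.
layer <- matrix(c( 12, 255,  40,
                  255, 255,   7,
                    3,  18, 255), nrow = 3, byrow = TRUE)

# A second variable on the same grid (values are made up).
other <- matrix(1:9 * 10, nrow = 3, byrow = TRUE)

# Mask: 0 over ocean, 1 elsewhere.
mask <- ifelse(layer == 255, 0, 1)

# Multiplying by the mask nullifies the ocean cells of any aligned layer.
masked <- other * mask
```

Because the mask is just another matrix on the same grid, it can be reused for every layer that enters the analysis.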

When using image processing operations, masking can be achieved
with the netpbm utility pamarith, which applies an arithmetic operation to two images.

```
pamarith -add | -subtract | -multiply | -divide | -difference | -minimum | -maximum | -mean | -compare | -and | -or | -nand | -nor | -xor | -shiftleft | -shiftright pamfile1 pamfile2
```

For example, a masking operation can make use of the limited
range of the single byte in each cell. If the areas to be masked have the
value 255 (white) in mask.ppm, then after the following addition the same areas of the image to be masked,
back.ppm, will also have the value 255, because sums are clipped at the maximum value.

```
pamarith -add back.ppm mask.ppm > masked.ppm
```
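The effect of this addition can be imitated in R. The sketch below assumes that sums are clipped at the maximum value of 255, which is the behaviour the masking trick relies on. The function name `add_clipped` and the small matrices are illustrative.

```r
# Saturating addition, as on single-byte images: sums are clipped at maxval.
add_clipped <- function(a, b, maxval = 255) pmin(a + b, maxval)

back_img <- matrix(c(10, 200, 90, 45), nrow = 2)   # image to be masked
mask_img <- matrix(c(0, 255, 0, 255), nrow = 2)    # 255 marks masked cells

masked_img <- add_clipped(back_img, mask_img)
```

Cells where the mask is 0 keep their original value, and cells where the mask is 255 become 255 whatever they held before.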

### Proximity

Proximity is often an important relationship to capture in ecological
analysis. Convolution is the operation whereby neighboring values are used to
determine the value of the central cell. In R a filter can be applied to
achieve convolution in one direction. The matrix can then
be transposed to apply the filter in the other direction.

```
> par(mfcol = c(1, 2))
> f <- ...
> image(matrix(f, 124, 52), col = 1:30)
> f1 <- ...
> f2 <- ...
> image(matrix(t(f2), 124, 52), col = 1:30)
```
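The filter-then-transpose idea can be sketched with `stats::filter`, which filters each column of a matrix as a separate series. The 3-cell moving average and the name `smooth3` are assumptions chosen for illustration and are not necessarily the filter used in the transcript above.

```r
# Smooth a raster with a 3-cell moving average in both directions.
smooth3 <- function(m) {
  k <- rep(1 / 3, 3)
  # Filter down each column, returning a plain matrix of the same shape.
  run <- function(a) {
    matrix(as.numeric(stats::filter(a, k, sides = 2)), nrow = nrow(a))
  }
  # Columns first, then transpose so the rows become columns, filter again,
  # and transpose back to the original orientation.
  t(run(t(run(m))))
}

m <- matrix(0, nrow = 5, ncol = 5)
m[3, 3] <- 9
s <- smooth3(m)
```

The single value of 9 is spread evenly over its 3 by 3 neighborhood. Cells on the border are `NA` because the centered filter has no complete neighborhood there.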

Image processing packages usually have a smoothing operation
that achieves the same purpose. The programs in netpbm are pnmsmooth
and pnmconvol. For pnmsmooth, the parameters width and height specify the
dimensions of the convolution matrix.
The utility pnmconvol instead takes a convolution matrix file,
which in netpbm is specified as a pgm image, as follows.

```
pnmsmooth [-width=cols] [-height=rows] [-dump=dumpfile] [pnmfile]
```

```
pnmconvol convolution_matrix_file [-nooffset] [pnmfile]
```

```
P2
3 3
18
2 2 2
2 2 2
2 2 2
```

### Cropping

Cropping refers to trimming unwanted parts of a 2D matrix to
leave the parts necessary for analysis. For example, a continental map may be
cropped to the area surrounding the set of points where a species occurs, so
as to develop more specific regional models.

This is done by pamcut in netpbm as follows:

```
pamcut [-left colnum] [-right colnum] [-top rownum] [-bottom rownum] [-width cols] [-height rows] [-pad] [-verbose] [left top width height] [pnmfile]
```

The approach in R is to build an array of indices for each of the cells in
the required rectangle with a command called expand.grid. This is, however,
not suitable for large data matrices.
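A minimal sketch of cropping with `expand.grid` follows. The matrix and the chosen rectangle are made up, and the names `rows`, `cols`, `idx`, `cropped` and `direct` are illustrative.

```r
# Crop rows 2..4 and columns 3..5 from a small raster.
m <- matrix(1:42, nrow = 6, ncol = 7)
rows <- 2:4
cols <- 3:5

# One (row, col) pair per cell in the rectangle; the row index varies fastest.
idx <- as.matrix(expand.grid(row = rows, col = cols))

# Matrix indexing returns the selected cells as a vector, which is reshaped.
cropped <- matrix(m[idx], nrow = length(rows), ncol = length(cols))

# Direct range indexing gives the same result without building the index.
direct <- m[rows, cols]
```

The index array has one row for every cell in the rectangle, which is why this approach becomes costly for large matrices. Range indexing avoids that cost when the region is a simple rectangle.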

### Generalization or model development

All models are essentially generalizations, or simplifications that enable theories
to be expressed in mathematical terms. One of the simplest forms of generalization
is categorization, or clustering, where a large number of dissimilar items are sorted into a smaller
number of bins based on their similarity. Once a set of bins or categories is established,
and there is a basis for deciding into which bin new items should go, new items can
be categorized. In this way, a categorization or clustering can serve as a predictive
model.

In R the basic operation for clustering is called kmeans. In kmeans, the data to be clustered
is partitioned into k groups such that the sum
of squares from points to the assigned cluster centres is minimized.
At the minimum, all cluster centres are at the mean of
the set of data points with the same category.
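The following sketch applies `kmeans` to simulated cell values and then assigns new values to the nearest centre. The simulated data and the helper `predict_cluster` are assumptions for illustration, since base R's `kmeans` does not itself categorize new items.

```r
set.seed(42)
# Simulated single-band cell values drawn from two well-separated groups.
values <- c(rnorm(50, mean = 20, sd = 2), rnorm(50, mean = 200, sd = 5))

# Partition into k = 2 clusters; nstart repeats the random initialisation.
fit <- kmeans(values, centers = 2, nstart = 5)

# At the minimum, each centre is the mean of the values assigned to it.
centre_check <- tapply(values, fit$cluster, mean)

# Categorize new values by their nearest centre, which is how a
# clustering can act as a simple predictive model.
predict_cluster <- function(new, centres) {
  apply(abs(outer(new, as.numeric(centres), "-")), 1, which.min)
}
new_class <- predict_cluster(c(18, 205), fit$centers)
```

The cluster labels themselves are arbitrary, so only the grouping matters: the two new values fall into different clusters, each alongside the simulated group it resembles.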

A similar operation in image processing is called color quantization
or color reduction. Reducing the number of colors compresses
an image at the expense of color detail. The utility in netpbm
for this purpose is called ppmquant.

```
ppmquant colors pnmfile
```

### Prediction

Prediction can be achieved very efficiently with index mapping where
the dependent variable is one dimensional.
Index mapping is another fundamental operation for
mapping the values of a matrix into another set of values.
This operation can be used to apply the results of a model,
for example, changing the values in the cells from their
original value, to a probability for each specific class.

In image processing, index mapping is called palette mapping,
a process in which the original colors are replaced by a new set.
Here a palette file defining the mapping is used.

```
> par(mfcol = c(1, 2))
> f1 <- ...
> prob = seq(255, 1, -1)
> result <- ...
> image(matrix(result, 124, 52), col = 1:30)
> prob = c(seq(0, 255, 20), rep(0, 100))
> result <- ...
> image(matrix(result, 124, 52), col = 1:30)
```
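Index mapping amounts to using the cell values as indices into a lookup vector. The sketch below uses simulated class codes and a made-up lookup table in which code v maps to v / 255. The names `codes`, `lookup` and `mapped` are illustrative, and a fitted model would supply the real table.

```r
# Raster of class codes 0..255 (simulated here).
set.seed(7)
codes <- matrix(sample(0:255, 124 * 52, replace = TRUE), nrow = 124)

# Lookup table with one entry per possible code (made-up values).
lookup <- (0:255) / 255

# Codes start at 0 but R indexes from 1, hence the + 1.
mapped <- matrix(lookup[codes + 1], nrow = nrow(codes))
```

The whole raster is remapped in a single vectorized step, and only the 256-entry table changes when a different model is applied.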

The utility in the netpbm package is pamlookup, invoked with an image as
a lookup table for mapping the old colors to the new.

```
pamlookup -lookupfile=lookupfile -missingcolor=color [-fit] indexfile
```

Note that this is a very efficient operation as the data in the image
does not change, only the small set of values in the palette of the image.

## Summary

Here we have shown that R and image processing operations can perform the
elementary operations necessary for niche modeling.
They do not replace a GIS for accurate work, but they demonstrate
that the basic concepts can be implemented in a number of
different ways, with different tools.

It is worth noting that all analysis represents a theoretical
approximation of the target concept. As such, any approach
has strengths and weaknesses. It is important to be aware of
them when performing analysis, and to try to understand and verify
the correctness of any analytical tool.

Now we move on to actual applications of niche modeling using the principles and
methods developed in the last three chapters.