Spatial Data
Robert J. Hijmans
1 Introduction
2 Spatial data
2.1 Introduction
2.2 Vector data
2.3 Raster data
2.4 Simple representation of spatial data
3 Vector data
3.1 Introduction
3.2 SpatialPoints
3.3 SpatialLines and SpatialPolygons
4 Raster data
4.1 Introduction
4.2 RasterLayer
4.3 RasterStack and RasterBrick
7.1.3 Merge
7.1.4 Records
7.2 Append and aggregate
7.3 Append
7.4 Aggregate
7.5 Overlay
7.5.1 Erase
7.5.2 Intersect
7.5.3 Union
7.5.4 Cover
7.5.5 Difference
7.6 Spatial queries
9 Maps
9.1 Vector data
9.1.1 Base plots
9.1.2 spplot
9.2 Raster
9.3 Specialized packages
CHAPTER ONE
INTRODUCTION
This is an introduction to spatial data manipulation with R. In this context “spatial data” refers to data about geograph-
ical locations, that is, places on earth. So to be more precise, we should speak about “geospatial” data, but we use the
shorthand “spatial”.
This is the introductory part of a set of resources for learning about spatial analysis and modeling with R. Here we
cover the basics of data manipulation. When you are done with this section, you can continue with the introduction to
spatial data analysis.
You need to know some of the basics of the R language before you can work with spatial data in R. If you
have not worked with R before, or not recently, have a look at this brief introduction to R.
You can download this manual as a pdf.
Spatial Data in R
CHAPTER TWO
SPATIAL DATA
2.1 Introduction
Spatial phenomena can generally be thought of as either discrete objects with clear boundaries or as a continuous
phenomenon that can be observed everywhere, but does not have natural boundaries. Discrete spatial objects may
refer to a river, road, country, town, or a research site. Examples of continuous phenomena, or “spatial fields”, include
elevation, temperature, and air quality.
Spatial objects are usually represented by vector data. Such data consists of a description of the “geometry” or “shape”
of the objects, and normally also includes additional variables. For example, a vector data set may describe the borders
of the countries of the world (geometry), and also store their names and the size of their population in 2015; or the
geometry of the roads in an area, as well as their type and names. These additional variables are often referred to as
“attributes”. Continuous spatial data (fields) are usually represented with a raster data structure. We discuss these two
data types in turn.
A map of point locations is not that different from a basic x-y scatter plot. Here I make a plot (a map in this case) that
shows the location of the weather stations, and the size of the dots is proportional to the amount of precipitation. The
point size is set with argument cex.
# add a legend
breaks <- c(100, 500, 1000, 2000)
legend("topright", legend=breaks, pch=20, pt.cex=psize, col='red', bg='gray')
Note that the data are represented by “longitude, latitude”, in that order. Do not use “latitude, longitude”: on
most maps latitude (North/South) is used for the vertical (y) axis and longitude (East/West) for the horizontal (x) axis,
so longitude comes first, just as x comes first in an x-y plot. This is important to keep in mind, as it is a very
common source of mistakes!
We can add multiple sets of points to the plot, and even draw lines and polygons:
plot(stations, main='Precipitation')
The above illustrates how numeric vectors representing locations can be used to draw simple maps. It also shows how
points can be (and typically are) represented by pairs of numbers, and a line or a polygon by a number of these points.
A polygon must be “closed”, i.e. the first point coincides with the last point, but the polygon function took
care of that for us.
There are cases where a simple approach like this may suffice and you may come across this in older R code or
packages. Likewise, raster data could be represented by a matrix or higher-order array. Particularly when only dealing
with point data such an approach may be practical. For example, a spatial data set representing points and attributes
could be made by combining geometry and attributes in a single data.frame.
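As a minimal sketch of this idea, the block below combines coordinates and attributes in one data.frame. The coordinates are the station coordinates used elsewhere in this manual; the station names and the precipitation values are hypothetical placeholders generated here for illustration.

```r
# Coordinates of ten weather stations
longitude <- c(-116.7, -120.4, -116.7, -113.5, -115.5,
               -120.8, -119.5, -113.7, -113.7, -110.7)
latitude <- c(45.3, 42.6, 38.9, 42.1, 35.7, 38.9, 36.2, 39, 41.6, 36.9)

# Hypothetical attributes: station names and precipitation values
name <- LETTERS[1:10]
set.seed(0)
precip <- round(runif(length(latitude), min=100, max=2000))

# Geometry (first two columns) and attributes in a single table
wst <- data.frame(longitude, latitude, name, precip)
head(wst)
```

Nothing in this table marks the first two columns as coordinates; that is only a convention the user has to remember.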
However, wst is a data.frame and R does not automatically understand the special meaning of the first two columns,
or to what coordinate reference system it refers (longitude/latitude, or perhaps UTM zone 17S, or . . . .?).
Moreover, it is non-trivial to do some basic spatial operations. For example, the blue polygon drawn on the map
above might represent a state, and a next question might be which of the 10 stations fall within that polygon. And
how about any other operation on spatial data, including reading from and writing data to files? To facilitate such
operations, a number of R packages have been developed that define new spatial data types that can be used for this type
of specialized operation. The most important packages that define such spatial data structures are sp and raster.
These data types are discussed in the next chapters.
CHAPTER THREE
VECTOR DATA
3.1 Introduction
Package sp is the central package supporting spatial data analysis in R. sp defines a set of classes to represent spatial
data. A class defines a particular data type. The data.frame is an example of a class. Any particular data.frame
you create is an object (instantiation) of that class.
The main reason for defining classes is to create a standard representation of a particular data type to make it easier
to write functions (also known as ‘methods’) for them. In fact, the sp package does not provide many functions
to modify or analyze spatial data; but the classes it defines are used in more than 100 other R packages that provide
specific functionality. (See Hadley Wickham’s Advanced R or John Chambers’ Software for data analysis for a detailed
discussion of the use of classes in R.)
We will be using the sp package here. Note that this package will eventually be replaced by the newer sf package —
but sp is still more commonly used.
Package sp introduces a number of classes with names that start with Spatial. For vector data, the ba-
sic types are the SpatialPoints, SpatialLines, and SpatialPolygons. These classes only repre-
sent geometries. To also store attributes, classes are available with these names plus DataFrame, for exam-
ple, SpatialPolygonsDataFrame and SpatialPointsDataFrame. When referring to any object with
a name that starts with Spatial, it is common to write Spatial*. When referring to a SpatialPolygons or
SpatialPolygonsDataFrame object it is common to write SpatialPolygons*. The Spatial classes (and
their use) are described in detail by Bivand, Pebesma and Gómez-Rubio.
It is possible to create Spatial* objects from scratch with R code. That can be very useful for creating a small,
self-contained example to illustrate something, for example to ask a question about how to do a particular operation
without needing to give access to the real data you are using (which is always cumbersome). But in real life you will
usually read these from a file or database, for example from a “shapefile” (see Chapter 5).
To get started, let’s make some Spatial objects from scratch anyway, using the same data as were used in the previous
chapter.
3.2 SpatialPoints
longitude <- c(-116.7, -120.4, -116.7, -113.5, -115.5, -120.8, -119.5, -113.7, -113.7, -110.7)
latitude <- c(45.3, 42.6, 38.9, 42.1, 35.7, 38.9, 36.2, 39, 41.6, 36.9)
lonlat <- cbind(longitude, latitude)
library(sp)
pts <- SpatialPoints(lonlat)
class (pts)
## [1] "SpatialPoints"
## attr(,"package")
## [1] "sp"
showDefault(pts)
## An object of class "SpatialPoints"
## Slot "coords":
## longitude latitude
## [1,] -116.7 45.3
## [2,] -120.4 42.6
## [3,] -116.7 38.9
## [4,] -113.5 42.1
## [5,] -115.5 35.7
## [6,] -120.8 38.9
## [7,] -119.5 36.2
## [8,] -113.7 39.0
## [9,] -113.7 41.6
## [10,] -110.7 36.9
##
## Slot "bbox":
## min max
## longitude -120.8 -110.7
## latitude 35.7 45.3
##
## Slot "proj4string":
## CRS arguments: NA
So we see that the object has the coordinates we supplied, but also a bbox. This is a ‘bounding box’, or the ‘spatial
extent’ that was computed from the coordinates. There is also a “proj4string”. This stores the coordinate reference
system (“crs”, discussed in more detail later). We did not provide the crs so it is unknown (NA). That is not good, so
let’s recreate the object, and now provide a crs.
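A sketch of that step: the CRS is described with a PROJ.4 string, wrapped in a CRS object, and passed through the proj4string argument of SpatialPoints. The name crdref is chosen here for illustration, and the coordinates are repeated so that the block runs on its own.

```r
library(sp)

longitude <- c(-116.7, -120.4, -116.7, -113.5, -115.5,
               -120.8, -119.5, -113.7, -113.7, -110.7)
latitude <- c(45.3, 42.6, 38.9, 42.1, 35.7, 38.9, 36.2, 39, 41.6, 36.9)
lonlat <- cbind(longitude, latitude)

# Describe the CRS: longitude/latitude relative to the WGS84 datum
crdref <- CRS("+proj=longlat +datum=WGS84")

# Recreate the points, now with a known CRS
pts <- SpatialPoints(lonlat, proj4string=crdref)
```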
library(raster)
pts
## class : SpatialPoints
## features : 10
## extent : -120.8, -110.7, 35.7, 45.3 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
We can use the SpatialPoints object to create a SpatialPointsDataFrame object. First we need a
data.frame with the same number of rows as there are geometries.
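A sketch of how the ptsdf object inspected below could be built. The attribute values are random numbers generated here for illustration, so they need not match the output shown; the name atts is hypothetical, and pts is recreated so that the block runs on its own.

```r
library(sp)

lonlat <- cbind(
  longitude = c(-116.7, -120.4, -116.7, -113.5, -115.5,
                -120.8, -119.5, -113.7, -113.7, -110.7),
  latitude  = c(45.3, 42.6, 38.9, 42.1, 35.7, 38.9, 36.2, 39, 41.6, 36.9))
pts <- SpatialPoints(lonlat, proj4string=CRS("+proj=longlat +datum=WGS84"))

# One row of attributes per point
precipvalue <- runif(nrow(lonlat), min=0, max=100)
atts <- data.frame(ID=1:nrow(lonlat), precip=precipvalue)

# Combine geometry and attributes
ptsdf <- SpatialPointsDataFrame(pts, data=atts)
```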
str(ptsdf)
## Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
## ..@ data :'data.frame': 10 obs. of 2 variables:
## .. ..$ ID : int [1:10] 1 2 3 4 5 6 7 8 9 10
## .. ..$ precip: num [1:10] 6.18 20.6 17.66 68.7 38.41 ...
## ..@ coords.nrs : num(0)
## ..@ coords : num [1:10, 1:2] -117 -120 -117 -114 -116 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:2] "longitude" "latitude"
## ..@ bbox : num [1:2, 1:2] -120.8 35.7 -110.7 45.3
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:2] "longitude" "latitude"
## .. .. ..$ : chr [1:2] "min" "max"
## ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
## .. .. ..@ projargs: chr "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
Or
showDefault(ptsdf)
## An object of class "SpatialPointsDataFrame"
## Slot "data":
## ID precip
## 1 1 6.178627
## 2 2 20.597457
## 3 3 17.655675
## 4 4 68.702285
## 5 5 38.410372
## 6 6 76.984142
## 7 7 49.769924
## 8 8 71.761851
## 9 9 99.190609
## 10 10 38.003518
##
## Slot "coords.nrs":
## numeric(0)
##
## Slot "coords":
## longitude latitude
## [1,] -116.7 45.3
## [2,] -120.4 42.6
## [3,] -116.7 38.9
## [4,] -113.5 42.1
## [5,] -115.5 35.7
The structure of the SpatialPolygons class is somewhat complex as it needs to accommodate the possibility of multiple
polygons, each consisting of multiple sub-polygons, some of which may be “holes”.
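To make that nesting concrete, here is a minimal sketch that builds a SpatialPolygons object with the sp constructors directly. The coordinates, the ID "zone1", and the object names are hypothetical; this is not the pols object inspected below.

```r
library(sp)

# A closed ring in clockwise order: first and last coordinate pair are identical
xy <- cbind(lon=c(-117, -117, -112, -112, -117),
            lat=c(  38,   42,   42,   38,   38))

ring  <- Polygon(xy, hole=FALSE)            # a single ring (could also be a hole)
multi <- Polygons(list(ring), ID="zone1")   # one feature: one or more rings
sq    <- SpatialPolygons(list(multi),
                         proj4string=CRS("+proj=longlat +datum=WGS84"))

# Walk down the same levels that str() shows
length(sq@polygons)                    # number of features
length(sq@polygons[[1]]@Polygons)      # number of rings in the first feature
sq@polygons[[1]]@Polygons[[1]]@hole    # is this ring a hole?
```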
str(pols)
## Formal class 'SpatialPolygons' [package "sp"] with 4 slots
## ..@ polygons :List of 1
## .. ..$ :Formal class 'Polygons' [package "sp"] with 5 slots
## .. .. .. ..@ Polygons :List of 1
## .. .. .. .. ..$ :Formal class 'Polygon' [package "sp"] with 5 slots
## .. .. .. .. .. .. ..@ labpt : num [1:2] -114.7 40.1
## .. .. .. .. .. .. ..@ area : num 19.7
## .. .. .. .. .. .. ..@ hole : logi FALSE
## .. .. .. .. .. .. ..@ ringDir: int 1
## .. .. .. .. .. .. ..@ coords : num [1:8, 1:2] -117 -114 -113 -112 -114 ...
## .. .. .. ..@ plotOrder: int 1
## .. .. .. ..@ labpt : num [1:2] -114.7 40.1
Fortunately, you do not need to understand how these structures are organized. The main take home message is that
they store geometries (coordinates), the name of the coordinate reference system, and attributes.
We can use the generic R function plot to make a map.
CHAPTER FOUR
RASTER DATA
4.1 Introduction
The sp package supports raster (gridded) data with the SpatialGridDataFrame and
SpatialPixelsDataFrame classes. However, we will focus on classes from the raster package for
raster data. The raster package is built around a number of classes of which the RasterLayer, RasterBrick,
and RasterStack classes are the most important. When discussing methods that can operate on all three of these
objects, they are referred to as ‘Raster*’ objects.
The raster package has functions for creating, reading, manipulating, and writing raster data. The package pro-
vides, among other things, general raster data manipulation functions that can easily be used to develop more specific
functions. For example, there are functions to read a chunk of raster values from a file or to convert cell numbers to
coordinates and back. The package also implements raster algebra and many other functions for raster data manipula-
tion.
4.2 RasterLayer
A RasterLayer object represents single-layer (variable) raster data. A RasterLayer object always stores a
number of fundamental parameters that describe it. These include the number of columns and rows, the spatial extent,
and the Coordinate Reference System. In addition, a RasterLayer can store information about the file in which the
raster cell values are stored (if there is such a file). A RasterLayer can also hold the raster cell values in memory.
Here I create a RasterLayer from scratch. But note that in most cases where real data is analyzed, these objects
are created from a file.
library(raster)
r <- raster(ncol=10, nrow=10, xmx=-80, xmn=-150, ymn=20, ymx=60)
r
## class : RasterLayer
## dimensions : 10, 10, 100 (nrow, ncol, ncell)
## resolution : 7, 4 (x, y)
## extent : -150, -80, 20, 60 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Object r only has the skeleton of a raster data set. That is, it knows about its location, resolution, etc., but there are no
values associated with it. Let’s assign some values. In this case I assign a vector of random numbers with a length that
is equal to the number of cells of the RasterLayer.
values(r) <- runif(ncell(r))
r
## class : RasterLayer
## dimensions : 10, 10, 100 (nrow, ncol, ncell)
## resolution : 7, 4 (x, y)
You can also assign cell numbers (in this case overwriting the previous values)
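A short sketch of that assignment, with r recreated so that the block runs on its own:

```r
library(raster)

r <- raster(ncol=10, nrow=10, xmn=-150, xmx=-80, ymn=20, ymx=60)

# Cell numbers start at 1 in the top-left cell and increase row by row
values(r) <- 1:ncell(r)
```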
plot(r)
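The RasterBrick shown next is made from a RasterStack s. A sketch of how such a stack could be built, assuming that r holds the cell numbers 1 to 100 and that the other two layers are derived from it. The choice of r * r and sqrt(r) is an assumption; it is consistent with the minimum and maximum values printed below.

```r
library(raster)

r <- raster(ncol=10, nrow=10, xmn=-150, xmx=-80, ymn=20, ymx=60)
values(r) <- 1:ncell(r)

# Two layers derived from r with raster algebra
r2 <- r * r
r3 <- sqrt(r)

# A RasterStack collects layers that share the same extent and resolution
s <- stack(r, r2, r3)
nlayers(s)
```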
b <- brick(s)
b
## class : RasterBrick
## dimensions : 10, 10, 100, 3 (nrow, ncol, ncell, nlayers)
## resolution : 7, 4 (x, y)
## extent : -150, -80, 20, 60 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer.1, layer.2, layer.3
## min values : 1, 1, 1
## max values : 100, 10000, 10
CHAPTER FIVE
5.1 Introduction
Reading and writing spatial data is complicated by the fact that there are many different file formats. However, there are a
few formats that are most common, and we discuss those here.
5.2.1 Reading
We use the system.file function to get the full path name of the file’s location. We need to do this as the location
of this file depends on where the raster package is installed. You should not use the system.file function for your
own files. It only serves for creating examples with data that ships with R.
library(raster)
filename <- system.file("external/lux.shp", package="raster")
filename
## [1] "C:/soft/R/R-3.5.2/library/raster/external/lux.shp"
Now that we have the filename, we use the shapefile function. This function comes with the raster package.
For it to work you must also have the rgdal package.
s <- shapefile(filename)
s
## class : SpatialPolygonsDataFrame
## features : 12
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 5
## names : ID_1, NAME_1, ID_2, NAME_2, AREA
## min values : 1, Diekirch, 1, Capellen, 76
## max values : 3, Luxembourg, 9, Wiltz, 312
5.2.2 Writing
You can also write shapefiles using the shapefile method. Instead of a filename, you need to provide a vector
type Spatial* object as first argument and a new filename as a second argument. You can add argument
overwrite=TRUE if you want to overwrite an existing file.
outfile <- 'test.shp'
shapefile(s, outfile, overwrite=TRUE)
For other formats, you can use the writeOGR function in package rgdal.
5.3.1 Reading
Again we need to get a filename for an example file.
f <- system.file("external/rlogo.grd", package="raster")
f
## [1] "C:/soft/R/R-3.5.2/library/raster/external/rlogo.grd"
Now we can do
r1 <- raster(f)
r1
## class : RasterLayer
## band : 1 (of 3 bands)
## dimensions : 77, 101, 7777 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 0, 101, 0, 77 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=merc +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : C:\soft\R\R-3.5.2\library\raster\external\rlogo.grd
## names : red
## values : 0, 255 (min, max)
Note that r1 is a RasterLayer of the first “band” (layer) in the file (out of three bands (layers)). We can request another
layer.
r2 <- raster(f, band=2)
r2
## class : RasterLayer
## band : 2 (of 3 bands)
## dimensions : 77, 101, 7777 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : 0, 101, 0, 77 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=merc +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : C:\soft\R\R-3.5.2\library\raster\external\rlogo.grd
## names : green
## values : 0, 255 (min, max)
More commonly, you would want all layers in a single object. For that you can use the brick function.
b <- brick(f)
b
## class : RasterBrick
Or you can use stack, but that is less efficient in most cases.
s <- stack(f)
s
## class : RasterStack
## dimensions : 77, 101, 7777, 3 (nrow, ncol, ncell, nlayers)
## resolution : 1, 1 (x, y)
## extent : 0, 101, 0, 77 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=merc +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## names : red, green, blue
## min values : 0, 0, 0
## max values : 255, 255, 255
The same approach holds for other raster file formats, including GeoTiff, NetCDF, Imagine, and ESRI Grid formats.
5.3.2 Writing
Use writeRaster to write raster data. You must provide a Raster* object and a filename. The file format will
be guessed from the filename extension (if that does not work you can provide an argument like format="GTiff").
Note the argument overwrite=TRUE and see ?writeRaster for more arguments, such as datatype= to set
the datatype (e.g., integer, float).
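A minimal sketch of a writeRaster call. It writes a small example raster to a temporary file in the package's native format (.grd), so that it does not depend on additional file format drivers; the file name is arbitrary.

```r
library(raster)

r <- raster(ncol=10, nrow=10)
values(r) <- 1:ncell(r)

# The .grd extension selects the native raster format
outfile <- file.path(tempdir(), "test.grd")

# overwrite=TRUE replaces the file if it already exists;
# datatype sets how values are stored on disk (here 4-byte signed integers)
x <- writeRaster(r, filename=outfile, overwrite=TRUE, datatype="INT4S")
x
```

The returned object x is a Raster* object that refers to the new file.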
CHAPTER SIX
6.1 Introduction
A very important aspect of spatial data is the coordinate reference system (CRS) that is used. For example, a location
of (140, 12) is not meaningful if you do not know where the origin is and whether the x-coordinate is 140 meters, kilometers, or
perhaps degrees away from it (in the x direction).
Obviously we cannot actually measure these angles. But we can estimate them. To do so, you need a model of the
shape of the earth. Such a model is called a ‘datum’. The simplest datums are based on a spheroid (a sphere that is ‘flattened’
at the poles and bulges at the equator). More complex datums allow for more variation in the earth’s shape. The most
commonly used datum is called WGS84 (World Geodetic System 1984). This is very similar to NAD83 (the North
American Datum of 1983). Other, local datums exist to more precisely record locations for a single country or region.
So the basic way to record a location is a coordinate pair in degrees and a reference datum. (Sometimes people say that
their coordinates are “in WGS84”. That is meaningless; but they typically mean to say that they are longitude/latitude
relative to the WGS84 datum).
6.2.2 Projections
A major question in spatial analysis and cartography is how to transform this three dimensional angular system to a two
dimensional planar (sometimes called “Cartesian”) system. A planar system is easier to use for certain calculations
and required to make maps (unless you have a 3-d printer). The different types of planar coordinate reference systems
are referred to as ‘projections’. Examples are ‘Mercator’, ‘UTM’, ‘Robinson’, ‘Lambert’, ‘Sinusoidal’,
and ‘Albers’.
There is not one best projection. Some projections can be used for a map of the whole world; other projections are
appropriate for small areas only. One of the most important characteristics of a map projection is whether it is “equal
area” (the scale of areas is constant, so regions of equal size on the globe have equal size on the map) or “conformal”
(the shapes of the geographic features are as they are seen on
a globe). No two dimensional map projection can be both conformal and equal-area (but they can be approximately
both for smaller areas, e.g. UTM, or Lambert Equal Area for a larger area), and some are neither.
6.2.3 Notation
A planar CRS is defined by a projection, datum, and a set of parameters. The parameters determine things like where
the center of the map is. The number of parameters depends on the projection. It is therefore not trivial to document a
projection used, and several systems exist. In R we use the PROJ.4 (ftp://ftp.remotesensing.org/proj/OF90-284.pdf)
notation. PROJ.4 is the name of an open source software library that is commonly used for CRS transformation.
Here is a list of commonly used projections and their parameters in PROJ.4 notation. You can find many more of these
on spatialreference.org.
Most commonly used CRSs have been assigned an “EPSG code” (EPSG stands for European Petroleum Survey
Group). This is a unique ID that can be a simple way to identify a CRS. For example EPSG:27561 is equivalent
to +proj=lcc +lat_1=49.5 +lat_0=49.5 +lon_0=0 +k_0=0.999877341 +x_0=600000 +y_0=200000
+a=6378249.2 +b=6356515 +towgs84=-168,-60,320,0,0,0,0 +pm=paris +units=m
+no_defs. However, EPSG:27561 is opaque and should not be used outside of databases. In R, use the
PROJ.4 notation, as that can be readily interpreted without relying on software.
Below is an illustration of how to find a particular projection you may need (in this example, a list of projections for
France).
library(rgdal)
epsg <- make_EPSG()
i <- grep("France", epsg$note, ignore.case=TRUE)
# first three
epsg[i[1:3], ]
## code note
## 684 2192 # ED50 / France EuroLambert (deprecated)
## 4408 27561 # NTF (Paris) / Lambert Nord France
## 4409 27562 # NTF (Paris) / Lambert Centre France
##                                                     prj4
## 684 +proj=lcc +lat_1=46.8 +lat_0=46.8 +lon_0=2.337229166666667 +k_0=0.99987742 +x_0=600000 +y_0=2200000 +ellps=intl +towgs84=-87,-
library(raster)
library(rgdal)
f <- system.file("external/lux.shp", package="raster")
p <- shapefile(f)
p
## class : SpatialPolygonsDataFrame
## features : 12
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 5
## names : ID_1, NAME_1, ID_2, NAME_2, AREA
## min values : 1, Diekirch, 1, Capellen, 76
## max values : 3, Luxembourg, 9, Wiltz, 312
crs(p)
## CRS arguments:
## +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
pp <- p
crs(pp) <- NA
crs(pp)
## CRS arguments: NA
crs(pp) <- CRS("+proj=longlat +datum=WGS84")
crs(pp)
## CRS arguments:
## +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Note that you should not use this approach to change the CRS of a data set from what it is to what you want it to
be. Assigning a CRS is like labeling something. You need to provide the label that corresponds to the item, not the
label you would like it to have. For example, if you label a bicycle, you can write “bicycle”. Perhaps you would prefer
a car, and you can label your bicycle as “car”, but that would not do you any good. It is still a bicycle. You can try to
transform your bicycle into a car. That would not be easy. Transforming spatial data is easier.
Now use it
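A sketch of a transformation with spTransform, assuming a Robinson projection as the target. The target CRS and the names newcrs and rob are illustrative. To keep the block self-contained it transforms a few points rather than the polygons p; the call is the same for polygons. spTransform needs a projection backend to be installed next to sp (rgdal in older installations).

```r
library(sp)

lonlat <- cbind(longitude=c(-116.7, -120.4, -116.7),
                latitude=c(45.3, 42.6, 38.9))
pts <- SpatialPoints(lonlat, proj4string=CRS("+proj=longlat +datum=WGS84"))

# Target CRS: a Robinson projection
newcrs <- CRS("+proj=robin +datum=WGS84")

# Transform the coordinates (this is not the same as relabeling them)
rob <- spTransform(pts, newcrs)
coordinates(rob)
```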
After the transformation, the units of the geometry are no longer in degrees, but in meters away from (longitude=0,
latitude=0). The spatial extent of the data is also in these units.
We can backtransform to longitude/latitude:
Simplest approach
But to have more control, provide an existing Raster object. That is generally the best way to project raster data,
because your newly projected data will then align perfectly with that object. In this example we
do not have an existing Raster object, so we create one using projectExtent.
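Both approaches can be sketched as follows. The example raster, the target projection, and the cell size of 200000 m are assumptions made for illustration. projectRaster needs a projection backend to be installed next to raster (rgdal in older installations).

```r
library(raster)

# A small longitude/latitude raster with cell numbers as values
r <- raster(xmn=-110, xmx=-90, ymn=40, ymx=60, ncols=40, nrows=40)
values(r) <- 1:ncell(r)

# Target CRS: a Lambert conformal conic projection (illustrative parameters)
newproj <- "+proj=lcc +lat_1=48 +lat_2=33 +lon_0=-100 +datum=WGS84"

# Simplest approach: let projectRaster choose extent and resolution
pr1 <- projectRaster(r, crs=newproj)

# More control: create an empty template in the new CRS,
# set its cell size (in meters), then project onto that template
pr2 <- projectExtent(r, newproj)
res(pr2) <- 200000
pr3 <- projectRaster(r, pr2)
```

The result pr3 has exactly the geometry of the template pr2, which is what makes it easy to combine with other layers projected onto the same template.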
For raster based analysis it is often important to use equal area projections, particularly when large areas are analyzed.
This will ensure that the grid cells are all of the same size, and therefore comparable to each other.
CHAPTER SEVEN
Example SpatialPolygons
7.1 Basics
Basic operations are pretty much like working with a data.frame.
7.1.2 Variables
Extracting a variable.
p$NAME_2
## [1] "Clervaux" "Diekirch" "Redange"
## [4] "Vianden" "Wiltz" "Echternach"
## [7] "Remich" "Grevenmacher" "Capellen"
## [10] "Esch-sur-Alzette" "Luxembourg" "Mersch"
Sub-setting by variable. Note how this is different from the above example. Above a vector of values is returned. With
the approach below you get a new SpatialPolygonsDataFrame with only one variable.
p[, 'NAME_2']
## class : SpatialPolygonsDataFrame
## features : 12
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 1
## names : NAME_2
## min values : Capellen
## max values : Wiltz
set.seed(0)
p$new <- sample(letters, length(p))
p
## class : SpatialPolygonsDataFrame
## features : 12
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 6
## names : ID_1, NAME_1, ID_2, NAME_2, AREA, new
## min values : 1, Diekirch, 1, Capellen, 76, a
## max values : 3, Luxembourg, 9, Wiltz, 312, x
7.1.3 Merge
You can join a table (data.frame) with a Spatial* object with merge.
dfr <- data.frame(District=p$NAME_1, Canton=p$NAME_2, Value=round(runif(length(p), 100, 1000)))
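A self-contained sketch of such a join. It uses a tiny SpatialPointsDataFrame built here for illustration; the names spdf, tab, and pm and all values are hypothetical. The same kind of call applies to the polygons p and the table dfr, with by.x and by.y naming the key columns on each side.

```r
library(sp)

# A small Spatial*DataFrame with a key column
xy <- cbind(x=c(1, 2, 3), y=c(1, 2, 3))
spdf <- SpatialPointsDataFrame(xy, data=data.frame(Canton=c("A", "B", "C")))

# A plain table with the same keys under another column name, in a different row order
tab <- data.frame(Name=c("C", "A", "B"), Value=c(30, 10, 20))

# Join on the key; each geometry stays with its matching attribute row
pm <- merge(spdf, tab, by.x="Canton", by.y="Name")
pm@data
```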
7.1.4 Records
Selecting rows (records).
i <- which(p$NAME_1 == 'Grevenmacher')
g <- p[i,]
g
## class : SpatialPolygonsDataFrame
## features : 3
## extent : 6.169137, 6.528252, 49.46498, 49.85403 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 5
## names : ID_1, NAME_1, ID_2, NAME_2, AREA
## min values : 2, Grevenmacher, 12, Echternach, 129
## max values : 2, Grevenmacher, 7, Remich, 210
It is also possible to interactively select and query records by clicking on a plotted dataset. That is difficult to show
here. See ?select for interactively selecting spatial features and ?click to identify attributes by clicking on a plot
(map).
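Several of the examples that follow use a second polygon object, z, and a single polygon z2 taken from it. Their definitions are not shown here. The sketch below assumes that z is a 2 by 2 grid of zones covering the extent of p, which is consistent with the Zone values 1 to 4 in the output further on. Reading the shapefile needs the same supporting package as in the chapter on reading files.

```r
library(raster)

p <- shapefile(system.file("external/lux.shp", package="raster"))

# A 2 x 2 raster with the extent and CRS of p, holding the values 1 to 4
z <- raster(p, nrow=2, ncol=2, vals=1:4)
names(z) <- "Zone"

# Convert the four cells to polygons with a single attribute, Zone
z <- as(z, "SpatialPolygonsDataFrame")

# One of the zones, for examples that need a single polygon
z2 <- z[2, ]
```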
To append Spatial* objects of the same (vector) type you can use bind:
b <- bind(p, z)
head(b)
## ID_1 NAME_1 ID_2 NAME_2 AREA Zone
## 1 1 Diekirch 1 Clervaux 312 NA
## 2 1 Diekirch 2 Diekirch 218 NA
## 3 1 Diekirch 3 Redange 259 NA
## 4 1 Diekirch 4 Vianden 76 NA
## 5 1 Diekirch 5 Wiltz 263 NA
## 6 2 Grevenmacher 6 Echternach 188 NA
tail(b)
Note how bind allows you to append Spatial* objects with different attribute names.
7.4 Aggregate
pa <- aggregate(p, by='NAME_1')
za <- aggregate(z)
plot(za, col='light gray', border='light gray', lwd=5)
plot(pa, add=TRUE, col=rainbow(3), lwd=3, border='white')
You can also aggregate by providing a second Spatial object (see ?sp::aggregate)
Aggregate without dissolve
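A sketch of that call. It assumes a 2 by 2 grid of zone polygons like the z used in this chapter, rebuilt here with an illustrative extent so that no file needs to be read. With dissolve=FALSE the polygons are combined into one feature, but the borders between them are kept.

```r
library(raster)

# Hypothetical stand-in for z: four rectangular zones
z <- raster(nrow=2, ncol=2, xmn=5.7, xmx=6.5, ymn=49.4, ymx=50.2, vals=1:4)
names(z) <- "Zone"
z <- as(z, "SpatialPolygonsDataFrame")

# Combine all polygons into a single feature without merging their outlines
zag <- aggregate(z, dissolve=FALSE)
length(zag)                          # number of features
length(zag@polygons[[1]]@Polygons)   # number of parts in that feature
```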
This is a structure that is similar to what you may get for an archipelago: multiple polygons represented as one entity
(one row). Use disaggregate to split these up into their parts.
zd <- disaggregate(zag)
zd
## class : SpatialPolygons
## features : 4
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
7.5 Overlay
7.5.1 Erase
Erase a part of a SpatialPolygons object
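A sketch of the erase call, assuming that z2 is one cell of a 2 by 2 grid of zones laid over p (the definition of z2 is an assumption). Overlay functions such as erase need a geometry engine to be installed next to raster (rgeos in older installations).

```r
library(raster)

p <- shapefile(system.file("external/lux.shp", package="raster"))

# Assumed definition of z2: the second cell of a 2 x 2 grid over p
z <- as(raster(p, nrow=2, ncol=2, vals=1:4), "SpatialPolygonsDataFrame")
z2 <- z[2, ]

# Remove from p the area that is covered by z2
e <- erase(p, z2)
```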
This is equivalent to
e <- p - z2
plot(e)
7.5.2 Intersect
Intersect SpatialPolygons
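A sketch of the intersect call, again assuming that z2 is one cell of a 2 by 2 grid of zones laid over p. The result keeps only the parts of p that fall inside z2, with the attributes of both objects.

```r
library(raster)

p <- shapefile(system.file("external/lux.shp", package="raster"))

# Assumed definition of z2: the second cell of a 2 x 2 grid over p
z <- as(raster(p, nrow=2, ncol=2, vals=1:4), "SpatialPolygonsDataFrame")
z2 <- z[2, ]

# Keep only the area shared by p and z2
i <- intersect(p, z2)
plot(i)
```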
This is equivalent to
i <- p * z2
7.5.3 Union
Get the union of two SpatialPolygons* objects.
u <- union(p, z)
This is equivalent to
u <- p + z
Note that there are many more polygons now. One for each unique combination of polygons (and attributes in this
case).
u
## class : SpatialPolygonsDataFrame
## features : 28
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 6
## names : Zone, ID_1, NAME_1, ID_2, NAME_2, AREA
## min values : 1, 1, Diekirch, 1, Capellen, 76
## max values : 4, 3, Luxembourg, 9, Wiltz, 312
set.seed(5)
plot(u, col=sample(rainbow(length(u))))
7.5.4 Cover
Cover is a combination of intersect and union
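A sketch of the cover call, assuming that z is a 2 by 2 grid of zones laid over p (the definition of z and the name cov are assumptions).

```r
library(raster)

p <- shapefile(system.file("external/lux.shp", package="raster"))

# Assumed definition of z: a 2 x 2 grid of zones over the extent of p
z <- raster(p, nrow=2, ncol=2, vals=1:4)
names(z) <- "Zone"
z <- as(z, "SpatialPolygonsDataFrame")

# Combine the geometries and attributes of p and z
cov <- cover(p, z)
cov
```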
7.5.5 Difference
The symmetrical difference of two SpatialPolygons* objects
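A sketch of how the object dif printed below could be obtained with symdif, assuming that z is a 2 by 2 grid of zones laid over p. The symmetrical difference is the area that belongs to one of the two objects but not to both.

```r
library(raster)

p <- shapefile(system.file("external/lux.shp", package="raster"))

# Assumed definition of z: a 2 x 2 grid of zones over the extent of p
z <- raster(p, nrow=2, ncol=2, vals=1:4)
names(z) <- "Zone"
z <- as(z, "SpatialPolygonsDataFrame")

# Parts of z and p that do not overlap with the other object
dif <- symdif(z, p)
plot(dif, col=rainbow(length(dif)))
```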
dif
## class : SpatialPolygonsDataFrame
## features : 4
## extent : 5.74414, 6.528252, 49.44781, 50.18162 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## variables : 1
## names : Zone
## min values : 1
## max values : 4
pts <- matrix(c(6, 6.1, 5.9, 5.7, 6.4, 50, 49.9, 49.8, 49.7, 49.5), ncol=2)
spts <- SpatialPoints(pts, proj4string=crs(p))
plot(z, col='light blue', lwd=2)
points(spts, col='light gray', pch=20, cex=6)
text(spts, 1:nrow(pts), col='red', font=2, cex=1.5)
lines(p, col='blue', lwd=2)
over(spts, p)
## ID_1 NAME_1 ID_2 NAME_2 AREA
## 1 1 Diekirch 5 Wiltz 263
## 2 1 Diekirch 2 Diekirch 218
## 3 1 Diekirch 3 Redange 259
## 4 NA <NA> <NA> <NA> NA
## 5 NA <NA> <NA> <NA> NA
over(spts, z)
## Zone
## 1 1
## 2 1
## 3 3
## 4 NA
## 5 4
extract is generally used for queries between Spatial* and Raster* objects, but it can also be used here.
extract(z, pts)
## point.ID poly.ID Zone
## 1 1 1 1
## 2 2 1 1
## 3 3 3 3
CHAPTER EIGHT
8.1 Introduction
In this chapter general aspects of the design of the raster package are discussed, notably the structure of the main
classes, and what they represent. The use of the package is illustrated in subsequent sections. raster has a large
number of functions. Not all of them are discussed here, and those that are discussed are mentioned only briefly. See
the help files of the package for more information on individual functions and help("raster-package") for an
index of functions by topic.
library(raster)
# RasterLayer with the default parameters
x <- raster()
x
## class : RasterLayer
## dimensions : 180, 360, 64800 (nrow, ncol, ncell)
## resolution : 1, 1 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
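The resolution printed next does not match the default 1 x 1 degree raster, so x was presumably redefined with other parameters at this point. A minimal sketch of a geometry that is consistent with the numbers that follow (36 columns and 18 rows over a 2000 x 1000 unit extent); the exact extent coordinates are an assumption:

```r
library(raster)

# Assumed geometry: only the width (2000) and height (1000) of the extent
# are implied by the printed resolution; the coordinates are illustrative
x <- raster(ncol=36, nrow=18, xmn=-1000, xmx=1000, ymn=-100, ymx=900)
dim(x)
```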
res(x)
## [1] 55.55556 55.55556
res(x) <- 100
res(x)
## [1] 100 100
ncol(x)
## [1] 20
ncol(x) <- 18
ncol(x)
## [1] 18
res(x)
## [1] 111.1111 100.0000
Set the coordinate reference system (CRS) (i.e., define the projection).
The object x created in the examples above consists only of the raster ‘geometry’: we have defined the number
of rows and columns, and where the raster is located in geographic space, but there are no cell values associated with
it. Setting and accessing values is illustrated below.
First, another example of an empty raster geometry.
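The code that creates r is not shown here. A sketch that is consistent with the output further down (10 rows and 10 columns, a resolution of 36 x 18), assuming the default global extent of raster():

```r
library(raster)

# 10 x 10 cells over the default extent (-180, 180, -90, 90)
r <- raster(ncol=10, nrow=10)
ncell(r)       # 100
hasValues(r)   # FALSE: geometry only, no cell values yet

# Values can be assigned with the values() replacement function
values(r) <- 1:ncell(r)
hasValues(r)   # TRUE
```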
Another example.
set.seed(0)
values(r) <- runif(ncell(r))
hasValues(r)
## [1] TRUE
inMemory(r)
## [1] TRUE
values(r)[1:10]
## [1] 0.8966972 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897
## [8] 0.9446753 0.6607978 0.6291140
plot(r, main='Raster with 100 cells')
In some cases, for example when you change the number of columns or rows, you will lose the values associated with
the RasterLayer if there were any (or the link to a file if there was one). The same applies, in most cases, if you
change the resolution directly (as this can affect the number of rows or columns). Values are not lost when changing
the extent as this change adjusts the resolution, but does not change the number of rows or columns.
hasValues(r)
## [1] TRUE
res(r)
## [1] 36 18
dim(r)
## [1] 10 10 1
xmax(r)
## [1] 180
Now change the maximum x coordinate of the extent (bounding box) of the RasterLayer.
xmax(r) <- 0
hasValues(r)
## [1] TRUE
res(r)
## [1] 18 18
dim(r)
## [1] 10 10 1
ncol(r) <- 6
hasValues(r)
## [1] FALSE
res(r)
## [1] 30 18
dim(r)
## [1] 10 6 1
xmax(r)
## [1] 0
The function raster also allows you to create a RasterLayer from another object, including another
RasterLayer, RasterStack and RasterBrick, as well as from a SpatialPixels* and SpatialGrid*
object (defined in the sp package), an Extent object, a matrix, an im object (spatstat package), and others.
It is more common, however, to create a RasterLayer object from a file. The raster package can use raster files in
several formats, including some ‘natively’ supported formats and other formats via the rgdal package. Supported
formats for reading include GeoTIFF, ESRI, ENVI, and ERDAS. Most formats supported for reading can also be written
to. Here is an example using the ‘Meuse’ dataset (taken from the sp package), using a file in the native ‘raster-file’
format.
A notable feature of the raster package is that it can work with raster datasets that are stored on disk and are too
large to be loaded into memory (RAM). The package can work with large files because the objects it creates from
these files only contain information about the structure of the data, such as the number of rows and columns, the
spatial extent, and the filename, but it does not attempt to read all the cell values in memory. In computations with
these objects, data is processed in chunks. If no output filename is specified to a function, and the output raster is too
large to keep in memory, the results are written to a temporary file.
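What chunked processing looks like when done by hand can be sketched with blockSize and getValues. The example file and the choice of summing the cell values are for illustration only; the functions in the package handle the chunking internally.

```r
library(raster)

# Example file shipped with the raster package; the cell values stay on disk
r <- raster(system.file("external/test.grd", package="raster"))

# Ask the package for a reasonable division of the rows into blocks
bs <- blockSize(r)

# Read and process one block of rows at a time
total <- 0
for (i in seq_len(bs$n)) {
  v <- getValues(r, row=bs$row[i], nrows=bs$nrows[i])
  total <- total + sum(v, na.rm=TRUE)
}

# The block-wise total should agree with the sum computed by cellStats
total
cellStats(r, "sum")
```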
For this example, we first get the name of an example file installed with the package. Do not use this
system.file construction for your own files (just type the file name; don’t forget the forward slashes).
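The line that defines filename is not shown. Judging by the path printed below, it presumably points to the test.grd file in the package's external folder, which can be located like this:

```r
library(raster)

# Locate an example file that is installed with the raster package
filename <- system.file("external/test.grd", package="raster")
filename
```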
r <- raster(filename)
filename(r)
## [1] "C:\\soft\\R\\R-3.5.2\\library\\raster\\external\\test.grd"
hasValues(r)
## [1] TRUE
inMemory(r)
## [1] FALSE
plot(r, main='RasterLayer from file')
Multi-layer objects can be created in memory (from RasterLayer objects) or from files.
Create three identical RasterLayer objects
r1 <- r2 <- r3 <- raster(nrow=10, ncol=10)
# Assign random cell values
values(r1) <- runif(ncell(r1))
values(r2) <- runif(ncell(r2))
values(r3) <- runif(ncell(r3))
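The object s used on the next line is not defined in the code shown. A minimal sketch, assuming it is a RasterStack built from the three layers above (the name b1 is illustrative):

```r
library(raster)

r1 <- r2 <- r3 <- raster(nrow=10, ncol=10)
values(r1) <- runif(ncell(r1))
values(r2) <- runif(ncell(r2))
values(r3) <- runif(ncell(r3))

# Combine the three layers into a RasterStack
s <- stack(r1, r2, r3)
nlayers(s)   # 3

# A RasterBrick can also be built from the layers directly
b1 <- brick(r1, r2, r3)
class(b1)
```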
b2 <- brick(s)
In this case, that would be equivalent to creating it from disk with a band=2 argument.
s <- r + 10
s <- sqrt(s)
s <- s * r + 5
r[] <- runif(ncell(r))
r <- round(r)
r <- r == 1
If you use multiple Raster objects (in functions where this is relevant, such as range), these must have the same
resolution and origin. The origin of a Raster object is the point closest to (0, 0) that you could get if you moved
from a corner of a Raster object toward that point in steps of the x and y resolution. Normally these objects would
also have the same extent, but if they do not, the returned object covers the spatial intersection of the objects used.
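A small sketch of these rules, using two made-up rasters that share resolution and origin but only partly overlap. The extents and cell values are arbitrary; the warning that raster emits about the differing extents is suppressed here.

```r
library(raster)

# Same resolution (1 x 1) and origin, but different extents
a <- raster(ncol=10, nrow=10, xmn=0, xmx=10, ymn=0, ymx=10)
b <- raster(ncol=10, nrow=10, xmn=5, xmx=15, ymn=5, ymx=15)
values(a) <- 1
values(b) <- 2

origin(a)   # 0 0
origin(b)   # 0 0

# The result covers only the area where a and b overlap
ab <- suppressWarnings(a + b)
extent(ab)  # 5, 10, 5, 10 (xmin, xmax, ymin, ymax)
dim(ab)     # 5 5 1
```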
When you use multiple multi-layer objects with different numbers of layers, the ‘shorter’ objects are ‘recycled’. For
example, if you multiply a 4-layer object (a1, a2, a3, a4) with a 2-layer object (b1, b2), the result is a four-layer object
(a1b1, a2b2, a3b1, a4b2).
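The pairing of layers under recycling can be sketched with plain index vectors in base R. This only illustrates which layers are matched up; it does not call the raster package, and the layer names are the hypothetical ones from the text.

```r
# Hypothetical layer names of a 4-layer and a 2-layer object
a_layers <- c("a1", "a2", "a3", "a4")
b_layers <- c("b1", "b2")

# Recycle the shorter index to the length of the longer object
idx <- rep_len(seq_along(b_layers), length(a_layers))
idx   # 1 2 1 2

# Layer pairs that are multiplied
pairs <- paste(a_layers, b_layers[idx], sep="*")
pairs # "a1*b1" "a2*b2" "a3*b1" "a4*b2"
```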
Summary functions (min, max, mean, prod, sum, Median, cv, range, any, all) always return a RasterLayer object.
Perhaps this is not obvious when using functions like min, sum or mean.
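The objects r and s used below are not defined in the code shown. A sketch that is consistent with the printed results, assuming r is a 5 x 5 layer of ones and s stacks r with r + 1:

```r
library(raster)

# Assumed inputs: a single layer of ones and a two-layer stack
r <- raster(ncol=5, nrow=5)
r[] <- 1
s <- stack(r, r + 1)

# Each call returns a single RasterLayer, whatever the mix of inputs
class(mean(r, s, 10))
class(sum(r, s))
```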
a <- mean(r,s,10)
b <- sum(r,s)
st <- stack(r, s, a, b)
sst <- sum(st)
sst
## class : RasterLayer
## dimensions : 5, 5, 25 (nrow, ncol, ncell)
## resolution : 72, 36 (x, y)
## extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 11.5, 11.5 (min, max)
Use cellStats if instead of a RasterLayer you want a single number summarizing the cell values of each layer.
cellStats(st, 'sum')
## layer.1.1 layer.1.2 layer.2.1 layer.2.2 layer.3
## 25.0 25.0 50.0 87.5 100.0
cellStats(sst, 'sum')
## [1] 287.5
r <- raster()
r[] <- 1:ncell(r)
ra <- aggregate(r, 20)
rd <- disaggregate(ra, 20)
flip lets you flip the data (reverse order) in horizontal or vertical direction – typically to correct for a ‘communication
problem’ between different R packages or a misinterpreted file. rotate lets you rotate longitude/latitude rasters that
have longitudes from 0 to 360 degrees (often used by climatologists) to the standard -180 to 180 degrees system. With
t you can rotate a Raster object 90 degrees.
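A brief sketch of these three functions on small, made-up rasters. The cell values are just the cell numbers, so that the reordering is easy to follow.

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)

# Reverse the order of the rows (vertical flip)
getValues(flip(r, direction="y"))   # 4 5 6 1 2 3

# Reverse the order of the columns (horizontal flip)
getValues(flip(r, direction="x"))   # 3 2 1 6 5 4

# Transpose: the rows become columns
dim(t(r))                           # 3 2 1

# A global raster with longitudes running from 0 to 360
g <- raster(ncol=36, nrow=18, xmn=0, xmx=360, ymn=-90, ymx=90)
g[] <- 1:ncell(g)
gr <- rotate(g)
extent(gr)                          # -180, 180, -90, 90
```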
8.4.2 Overlay
The overlay function can be used as an alternative to the raster algebra discussed above. Overlay, like the functions
discussed in the following subsections, provides either an easy-to-use shorthand or more efficient computation for large
(file-based) objects.
With overlay you can combine multiple Raster objects (e.g. multiply them). The related function mask removes
all values from one layer that are NA in another layer, and cover combines two layers by taking the values of the first
layer except where these are NA.
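A minimal sketch of overlay, mask and cover on a six-cell raster. The object names and values are made up for illustration.

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)

# A copy of r in which the first row is missing
s <- r
s[1:3] <- NA

# overlay: combine layers with a function (here a cell-by-cell product)
o <- overlay(r, r, fun=function(x, y) x * y)
getValues(o)    # 1 4 9 16 25 36

# mask: drop the values of r where s is NA
m <- mask(r, s)
getValues(m)    # NA NA NA 4 5 6

# cover: take s, and fill its NA cells with values from another layer
cv <- cover(s, r * 10)
getValues(cv)   # 10 20 30 4 5 6
```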
8.4.3 Calc
calc allows you to do a computation for a single Raster object by providing a function. If you supply a
RasterLayer, another RasterLayer is returned. If you provide a multi-layer object you get a (single-layer)
RasterLayer if you use a summary-type function (e.g. sum), but a RasterBrick if multiple layers are returned.
stackApply computes summary-type layers for subsets of a RasterStack or RasterBrick.
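The different return types can be sketched as follows. The three-layer stack and the grouping indices are made up for illustration.

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)
s <- stack(r, r * 2, r * 3)

# Single layer in, single layer out
x1 <- calc(r, fun=function(x) x * 10)
class(x1)      # RasterLayer

# Summary-type function on a multi-layer object: one layer
x2 <- calc(s, fun=sum)
getValues(x2)  # 6 12 18 24 30 36

# Function that returns one value per layer: a RasterBrick
x3 <- calc(s, fun=function(x) x * 10)
nlayers(x3)    # 3

# stackApply: sum layers 1 and 2 together, keep layer 3 as its own group
x4 <- stackApply(s, indices=c(1, 1, 2), fun=sum)
nlayers(x4)    # 2
```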
8.4.4 Reclassify
You can use cut or reclassify to replace ranges of values with single values, or subs to substitute (replace)
single values with other values.
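A short sketch of the three functions on a six-cell raster. The class breaks and the substitution table (with hypothetical column names id and v) are made up for illustration.

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)

# reclassify: each row of the matrix is (from, to, becomes);
# by default the intervals are open on the left and closed on the right
rcl <- matrix(c(0, 2, 1,
                2, 4, 2,
                4, 6, 3), ncol=3, byrow=TRUE)
getValues(reclassify(r, rcl))   # 1 1 2 2 3 3

# cut: class numbers based on break points
getValues(cut(r, breaks=c(0, 3, 6)))   # 1 1 1 2 2 2

# subs: replace single values via a lookup table; unmatched values become NA
lookup <- data.frame(id=c(1, 2), v=c(10, 20))
getValues(subs(r, lookup))      # 10 20 NA NA NA NA
```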
r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)
getValues(r)
## [1] 1 2 3 4 5 6
Divide the first raster with two times the square root of the second raster and add five.
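The code for this step, and for the objects s and w used below, is not shown. A sketch that is consistent with the outputs that follow, assuming s is a copy of r in which values below 4 are set to NA:

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)

# Assumed definition of s: values below 4 become NA
s <- calc(r, fun=function(x) { x[x < 4] <- NA; x })
as.matrix(s)

# r divided by two times the square root of s, plus five
w <- overlay(r, s, fun=function(x, y) { x / (2 * sqrt(y)) + 5 })
as.matrix(w)
```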
u <- mask(r, w)
as.matrix(u)
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] 4 5 6
v <- u==s
as.matrix(v)
## [,1] [,2] [,3]
## [1,] NA NA NA
## [2,] TRUE TRUE TRUE
8.4.6 Distance
There are a number of distance related functions. distance computes the shortest distance to cells that are not NA.
pointDistance computes the shortest distance to any point in a set of points. gridDistance computes the
distance when following grid cells that can be traversed (e.g. excluding water bodies). direction computes the
direction toward (or from) the nearest cell that is not NA. adjacency determines which cells are adjacent to other
cells. See the gdistance package for more advanced distance calculations (cost distance, resistance distance).
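A small sketch of distance and pointDistance with planar coordinates. The 3 x 3 grid with 1000-unit cells is made up, and its extent is deliberately outside the longitude/latitude range so that it is not treated as geographic.

```r
library(raster)

# 3 x 3 planar grid, cell size 1000; only the center cell has a value
r <- raster(ncol=3, nrow=3, xmn=0, xmx=3000, ymn=0, ymx=3000)
r[] <- NA
r[5] <- 1

# Distance from each NA cell to the nearest cell that is not NA
d <- distance(r)
as.matrix(d)

# Planar distance between two points
pointDistance(c(0, 0), c(3, 4), lonlat=FALSE)   # 5
```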
8.4.8 Predictions
The package has two functions to make model predictions to (potentially very large) rasters. predict takes a
multilayer raster and a fitted model as arguments. Fitted models can be of various classes, including glm, gam, and
randomForest. The function interpolate is similar but is for models that use coordinates as predictor variables,
for example in Kriging and spline interpolation.
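A minimal sketch of predict with a linear model. The predictor layer, its name x1 and the simulated training data are all made up; the point is only that the layer names of the raster must match the variable names used to fit the model.

```r
library(raster)

set.seed(1)

# One predictor layer, named like the variable in the model
x1 <- raster(ncol=5, nrow=5)
x1[] <- runif(ncell(x1))
preds <- stack(x1)
names(preds) <- "x1"

# Simulated training data and a fitted model
train <- data.frame(x1=runif(50))
train$y <- 2 * train$x1 + 1 + rnorm(50, sd=0.1)
m <- lm(y ~ x1, data=train)

# Apply the model to every cell of the raster
p <- predict(preds, m)
p
```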
rasterToPoints is reasonably efficient and allows you to provide a function to subset the output before it is
produced (which can be necessary for very large rasters as the point object is created in memory).
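A short sketch of subsetting with rasterToPoints. The six-cell raster and the threshold of 4 are made up for illustration.

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)

# Only cells for which the function returns TRUE are turned into points
pts <- rasterToPoints(r, fun=function(x) x > 4)
pts   # a matrix with the x and y coordinates and the cell value
```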
Zonal stats
s <- r
s[] <- round(runif(ncell(r)) * 5)
zonal(r, s, 'mean')
## zone mean
## [1,] 0 0.5144431
## [2,] 1 0.5480089
## [3,] 2 0.5249257
## [4,] 3 0.5194031
## [5,] 4 0.4853966
## [6,] 5 0.5218401
Count cells
freq(s)
## value count
## [1,] 0 54
## [2,] 1 102
## [3,] 2 139
## [4,] 3 148
## [5,] 4 133
## [6,] 5 72
freq(s, value=3)
## [1] 148
Cross-tabulate
You can also read values using cell numbers or coordinates (xy) using the extract method.
cells <- cellFromRowCol(r, 50, 35:39)
cells
## [1] 3955 3956 3957 3958 3959
extract(r, cells)
## [1] 456.878 485.538 550.788 580.339 590.029
xy <- xyFromCell(r, cells)
xy
## x y
## [1,] 179780 332020
## [2,] 179820 332020
You can also extract values using SpatialPolygons* or SpatialLines*. The default approach for extracting
raster values with polygons is that a polygon has to cover the center of a cell, for the cell to be included. However, you
can use argument “weights=TRUE” in which case you get, apart from the cell values, the percentage of each cell that
is covered by the polygon, so that you can apply, e.g., a “50% area covered” threshold, or compute an area-weighted
average.
In the case of lines, any cell that is crossed by a line is included. For lines and points, a cell that is only ‘touched’ is
included when it is below or to the right (or both) of the line segment/point (except for the bottom row and right-most
column).
In addition, you can use standard R indexing to access values, or to replace values (assign new values to cells) in a
raster object. If you replace a value in a raster object based on a file, the connection to that file is lost (because
it now is different from that file). Setting raster values for very large files will be very slow with this approach as each
time a new (temporary) file, with all the values, is written to disk. If you want to overwrite values in an existing file,
you can use update (with caution!).
r[cells]
## [1] 456.878 485.538 550.788 580.339 590.029
r[1:4]
## [1] NA NA NA NA
filename(r)
## [1] "C:\\soft\\R\\R-3.5.2\\library\\raster\\external\\test.grd"
r[2:3] <- 10
r[1:4]
## [1] NA 10 10 NA
filename(r)
## [1] ""
Note that in the above examples values are retrieved using cell numbers. That is, a raster is represented as a (one-
dimensional) vector. Values can also be inspected using a (two-dimensional) matrix notation. As for R matrices, the
first index represents the row number, the second the column number.
r[1]
## [1] NA
r[2,2]
## [1] NA
r[1,]
## [1] NA 10 10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [24] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [47] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [70] NA NA NA NA NA NA NA NA NA NA NA
r[,2]
## [1] 10.000 NA NA NA NA NA NA NA
## [9] NA NA NA NA NA NA NA NA
## [17] NA NA NA NA NA NA NA NA
## [25] NA NA NA NA NA NA NA NA
## [33] NA NA NA NA NA NA NA NA
## [41] NA NA NA NA NA NA NA NA
## [49] NA NA NA NA NA NA NA NA
## [57] NA NA NA NA NA NA NA NA
Accessing values through this type of indexing should be avoided inside functions as it is less efficient than accessing
values via functions like getValues.
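The relation between cell numbers, matrix-style indices and getValues can be sketched on a six-cell raster (values made up for illustration):

```r
library(raster)

r <- raster(ncol=3, nrow=2)
r[] <- 1:ncell(r)

# Cell numbers run row by row, starting in the upper-left corner
cellFromRowCol(r, 2, 2)   # 5

r[5]                      # 5, vector-style index
r[2, 2]                   # 5, matrix-style index (row, column)

# The function-based alternative: read a whole row at once
getValues(r, 2)           # 4 5 6
getValues(r, 2)[2]        # 5
```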
NINE
MAPS
As with other plots, there are different approaches to making maps in R. You can use “base plot” in many cases.
Alternatively use levelplot, either via the spplot function (implemented in sp and raster) or via the rasterVis
package.
Here are some brief examples about making maps. More examples for spplot and rasterVis can be found elsewhere on
the Internet.
n <- length(p)
plot(p, col=rainbow(n))
u <- unique(p$NAME_1)
u
## [1] "Diekirch" "Grevenmacher" "Luxembourg"
m <- match(p$NAME_1, u)
plot(p, col=rainbow(n)[m])
text(p, 'NAME_2', cex=.75, halo=TRUE)
9.1.2 spplot
spplot(p, 'AREA')
9.2 Raster
Example data
library(raster)
b <- brick(system.file("external/rlogo.grd", package="raster"))
Several generic functions have been implemented for Raster* objects to create maps and other plot types. Use ‘plot’
to create a map of a Raster* object. When plot is used with a RasterLayer, it calls the function ‘rasterImage’ (but,
by default, adds a legend; using code from fields::image.plot). It is also possible to directly call image. You
can zoom in using ‘zoom’ and clicking on the map twice (to indicate where to zoom to). With click it is possible to
interactively query a Raster* object by clicking once or several times on a map plot.
After plotting a RasterLayer you can add vector type spatial data (points, lines, polygons). You can do this with
functions points, lines, polygons if you are using the basic R data structures or plot(object, add=TRUE) if you are
using Spatial* objects as defined in the sp package. When plot is used with a multi-layer Raster* object, all layers are
plotted (up to 16), unless the layers desired are indicated with an additional argument.
plot(r)
plot(p, add=TRUE)
image does not provide a legend and that can be advantageous in some cases.
image(r)
plot(p, add=TRUE)
plot(b)
They can also be combined into a single image, by assigning individual layers to one of the three color channels (red,
green and blue):
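The code for this step is not shown. A sketch using plotRGB from the raster package, assuming that the first, second and third layers of the example brick hold the red, green and blue channels:

```r
library(raster)

b <- brick(system.file("external/rlogo.grd", package="raster"))

# Map layers 1, 2 and 3 to the red, green and blue channels
plotRGB(b, r=1, g=2, b=3)
```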
spplot(b, layout=c(3,1))
The rasterVis package has several other lattice based plotting functions for Raster* objects. The rasterVis
package also facilitates creating a map from a RasterLayer with the ggplot2 package.
You can also use a number of other plotting functions with a raster object as argument, including hist, persp,
contour, and density. See the help files for more info.