Spatial Data with R

Diego LEGROS

2022-10-10

Spatial Data
By spatial data (also called geospatial data) we mean data that contain locational as well as attribute information. Spatial data is data about objects, events, or phenomena that have a location on the surface of the earth. The location may be static in the short term (e.g., the location of a road, an earthquake event, children living in poverty) or dynamic (e.g., a moving vehicle or pedestrian, the spread of an infectious disease).
Geospatial data combines location information (usually coordinates on the earth), attribute information (the characteristics of the object, event, or phenomenon concerned), and often also temporal information (the time or life span at which the location and attributes exist).
To manage spatial data, a Geographic Information System (GIS) is very useful. A GIS is a multi-component environment used to create, manage, visualize and analyse spatial data, i.e. data with information about their locations (address, longitude and latitude, Cartesian coordinates (X, Y), etc.). In a GIS there are two main types of data: vector and raster.

Vector data
Vector data can be thought of as the digital version of describing locations with coordinates. People once shared spatial information by writing coordinates on paper, and now they share it by writing coordinates to files. It is as simple as that.
There are three subcategories of vector data: point, line, and polygon. A single coordinate pair is a point. Houses, cars, and the places where a particular incident occurs are usually represented by points. A series of points connected to each other is a line. Roads, rivers, cables, and pipelines are good examples of line data. An enclosed area created by connecting a number of lines is a polygon. Neighborhoods, cities, and countries are examples of polygon data.
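The three geometry types are easier to picture as plain coordinates. This minimal base-R sketch stores a point as a single (x, y) pair, a line as a matrix of connected vertices, and a polygon as a closed ring. The coordinates are made-up planar values. line_length() and polygon_area() are hypothetical helpers and do not come from any spatial package. Real packages also keep track of the coordinate reference system.

```r
# Hypothetical coordinates (x, y) in arbitrary planar units, for illustration only
point <- c(x = 2, y = 3)

# A line: a sequence of connected points, one row per vertex
line <- rbind(c(0, 0), c(3, 4), c(6, 8))

# A polygon: a closed ring, so the first vertex is repeated at the end
polygon <- rbind(c(0, 0), c(4, 0), c(4, 3), c(0, 3), c(0, 0))

# Hypothetical helper: length of a line as the sum of its segment lengths
line_length <- function(coords) {
  steps <- diff(coords)            # differences between consecutive vertices
  sum(sqrt(rowSums(steps^2)))      # Euclidean length of each segment, summed
}

# Hypothetical helper: area of a closed ring with the shoelace formula
# (valid for planar coordinates only, not for longitude/latitude)
polygon_area <- function(ring) {
  x <- ring[, 1]
  y <- ring[, 2]
  n <- nrow(ring)
  abs(sum(x[-n] * y[-1] - x[-1] * y[-n])) / 2
}

line_length(line)      # 10
polygon_area(polygon)  # 12
```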

Raster Data
Raster data, also known as grid data, is a spatial data type that is often created by taking images of the earth from above. Raster data is stored as a grid of pixels (sometimes called cells), where the grid is an array of rows and columns. Satellite images and aerial photographs are typical examples of raster data.
Raster data comes in two forms: single-band and multi-band (or single-layer and multi-layer). A raster with only one grid of pixels is called a single-band raster. A raster that stores more than one variable (for example, the red, green and blue bands of a satellite image) holds one grid per variable. These grids are all the same size and are stacked on top of each other, and the result is called a multi-band raster.
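Here is a minimal base-R sketch of the two raster forms described above, using made-up cell values. A single-band raster is just a matrix of rows by columns. A multi-band raster can be held as a three-dimensional array with one layer per band. The band names are hypothetical. Dedicated packages such as terra or raster add georeferencing (extent, resolution, coordinate system) on top of this grid idea.

```r
# A single-band raster: one grid of hypothetical cell values (3 rows x 4 columns)
single_band <- matrix(c(10, 12, 15, 20,
                        11, 14, 18, 22,
                        13, 16, 19, 25),
                      nrow = 3, byrow = TRUE)

# A multi-band raster: grids of the same size stacked on top of each other,
# stored as a rows x columns x bands array (band names are hypothetical)
multi_band <- array(
  c(single_band, single_band * 2, single_band * 3),
  dim = c(3, 4, 3),
  dimnames = list(NULL, NULL, c("red", "green", "blue"))
)

dim(single_band)           # 3 4
dim(multi_band)            # 3 4 3
multi_band[2, 3, "green"]  # cell in row 2, column 3 of the green band: 36
```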

Working with spatial data in R

The most common file format for vector data is the shapefile, with the extension .shp. In the GIS world, you will encounter many different file formats. Some are unique to specific GIS applications, and others are universal. For this course, we will focus on a subset of spatial data file formats: shapefiles for vector data.
The shapefile format is a popular geospatial vector data format for geographic information system (GIS) software. It stores the location, shape, and attributes of geographic features.
A shapefile consists of several files that share the same core filename but have different suffixes (i.e. file extensions). The format is developed and regulated by Esri as a (mostly) open specification for data interoperability among Esri and other GIS software products.
A shapefile is stored in a set of related files and contains one feature class. It is by far the most common geospatial file type you'll encounter. To use a shapefile, you need the complete set of its mandatory files.
A shapefile can contain the following files (only .shp, .shx and .dbf are mandatory):
.shp is the mandatory main file that gives features their geometry. Every shapefile has its own .shp file holding the spatial vector data, for example the points, lines or polygons of a map.
.shx is the mandatory shape index file. It records the position of each feature's geometry in the .shp file, so software can move forwards and backwards through the features quickly.
.dbf is a standard dBASE database file used to store attribute data and object IDs. A .dbf file is mandatory for shapefiles. You can open .dbf files in Microsoft Access or Excel.
.prj is an optional file that contains the metadata describing the shapefile's coordinate reference system and projection. In ArcGIS, a shapefile without a .prj file is reported as having an "unknown coordinate system". To fix this, use the "Define Projection" tool, which generates a .prj file.
.xml contains the metadata associated with the shapefile. If you delete this file, you essentially delete your metadata. You can open and edit this optional file in any text editor.
.sbn is an optional spatial index file that optimizes spatial queries. It is saved together with an .sbx file.
.sbx works with the .sbn file. Together the two files form a spatial index that speeds up spatial queries and loading times.
.cpg is an optional plain-text file that specifies the character encoding used to create the shapefile. If your shapefile has no .cpg file, it is read with the system's default encoding.
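Because a shapefile is really a set of files sharing one core name, it can help to check which parts are present before trying to read it. The sketch below is a hypothetical helper written with base R only. It takes the path of a .shp file and looks for the companion files listed above. The .xml metadata file is left out of the check to keep the sketch simple.

```r
# Hypothetical helper: report which companion files exist next to a .shp file
check_shapefile_parts <- function(shp_path) {
  base <- tools::file_path_sans_ext(shp_path)   # core filename without ".shp"
  mandatory <- c("shp", "shx", "dbf")
  optional  <- c("prj", "sbn", "sbx", "cpg")    # .xml deliberately not checked
  exts <- c(mandatory, optional)
  data.frame(
    extension = exts,
    mandatory = exts %in% mandatory,
    present   = file.exists(paste0(base, ".", exts))
  )
}

# Hypothetical helper: TRUE when all mandatory files are present
is_complete_shapefile <- function(shp_path) {
  parts <- check_shapefile_parts(shp_path)
  all(parts$present[parts$mandatory])
}
```

For example, `is_complete_shapefile("path/to/your/folder/filename.shp")` returns FALSE if the .shx or .dbf file is missing, even when the .shp file itself exists.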

Working with points, lines and polygons in R

There are many, many packages devoted to the manipulation, presentation and analysis of spatial data. Some of these are complementary (essentially adding functionality to other packages), and others provide alternatives. So numerous are these packages that it is easy to become confused and overwhelmed by choice. In this tutorial, we will cover the basic usage of points, lines and polygons in R. Some key packages are:
1. sp: basic package defining spatial objects
2. rgdal: import/export of spatial objects
3. rgeos: geometric manipulation
4. cartography: producing analysis maps
The sf package brings together the functionality of sp, rgdal and rgeos in a single package. There are many other spatial packages in R. Some are compatible with the sf spatial data classes, and others are not. The rgeos package deserves a special mention: it contains a range of spatial manipulation functions (area, perimeter, unions, intersections, and so on).
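The rgdal and rgeos packages listed above were retired from CRAN in 2023, and sf now covers their role. The sketch below assumes the sf package is installed. It reads a shapefile with sf::st_read(), which, like readOGR(), takes a data source (dsn) and a layer name. read_with_sf() is a hypothetical wrapper. The example uses the North Carolina counties shapefile that ships with sf, not the Columbus data.

```r
# Hypothetical wrapper: sf::st_read() plays the role of rgdal's readOGR()
read_with_sf <- function(dsn, layer) {
  sf::st_read(dsn = dsn, layer = layer, quiet = TRUE)
}

# Runs only if sf is installed
if (requireNamespace("sf", quietly = TRUE)) {
  nc_path <- system.file("shape/nc.shp", package = "sf")  # example data shipped with sf
  nc <- read_with_sf(dsn = nc_path, layer = "nc")
  print(class(nc))            # "sf" "data.frame": attribute table plus a geometry column
  print(nrow(nc))             # one row per county
  print(sf::st_crs(nc)$input) # coordinate reference system read from the .prj file
}
```

An sf object is a data frame with a geometry column, not a set of slots. The attribute columns are used directly (nc$AREA rather than nc@data$AREA).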

Data structures for vector layers in R

In this section, we will review the architecture of the vector layer classes defined in the sp package. Spatial vector layers have two components:
1. the geometry
2. the attribute table
The geometry component holds the spatial coordinates and information about how they are arranged into separate features. The attribute table holds additional information about each feature. For example, in a point layer of capital cities, the record for London may be composed of a geometric component (a point coordinate, such as 51.5072°N, 0.1275°W) and a row in the attribute table holding additional data about the city (for example, population size, built-up area, and so on).
The geometry part of a vector layer is obligatory, and there are three types of geometries: points, lines, and polygons. The attribute table is optional. The sp package defines six classes of spatial vector layers, one for each combination of these two properties. They are summarized in the following table:

Geometry type   Attribute table   Class
Points          No                SpatialPoints
Points          Yes               SpatialPointsDataFrame
Lines           No                SpatialLines
Lines           Yes               SpatialLinesDataFrame
Polygons        No                SpatialPolygons
Polygons        Yes               SpatialPolygonsDataFrame
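The geometry/attribute split can be sketched with plain base-R structures. This toy point layer keeps the coordinates of two capitals in a matrix and their attributes in a data frame, one row per feature in the same order. The list layout and get_feature() are hypothetical and only mirror the idea. In sp, a constructor such as SpatialPointsDataFrame() bundles the two parts into a single object.

```r
# Hypothetical toy layer: geometry and attribute table kept side by side
capitals <- list(
  # Geometry: one row of coordinates per feature (longitude, latitude in degrees)
  geometry = rbind(
    London = c(lon = -0.1275, lat = 51.5072),
    Paris  = c(lon =  2.3522, lat = 48.8566)
  ),
  # Attribute table: one row per feature, same order as the geometry
  data = data.frame(name = c("London", "Paris"),
                    country = c("United Kingdom", "France"))
)

# Hypothetical helper: a feature's record = its geometry row + its attribute row
get_feature <- function(layer, i) {
  list(coords = layer$geometry[i, ], attributes = layer$data[i, , drop = FALSE])
}

get_feature(capitals, 1)
```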

Loading spatial objects from shapefiles

There are a number of ways to load spatial objects from shapefiles. A common one is the readOGR function from the rgdal package. It automatically reads the geometry, the attribute data and, when a .prj file is present, the coordinate reference system. The rgdal package is R's interface to the Geospatial Data Abstraction Library (GDAL), which is also used by other open source GIS software such as QGIS. Through it, R can handle a broad range of spatial data formats.
library(rgdal)
# dsn: folder containing the shapefile; layer: file name without the .shp extension
shp <- readOGR(dsn = "path/to/your/folder", layer = "filename")

The readOGR function has two key arguments, dsn and layer. Exactly what you pass to them depends on what kind of data you are reading in. For a shapefile, dsn is the path to the directory in which the file is stored, and layer is the filename of the shapefile (without any extension). The arguments are separated by a comma. If you do not name them, their order matters: R matches unnamed arguments by position. You therefore do not have to type dsn = ... or layer = ..., as long as you give the values in that order. For clarity, it is good practice to include argument names when learning a new function, so we will continue to do so.
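A frequent stumbling block is turning the full path of a .shp file into the two values readOGR() expects. The sketch below is a hypothetical base-R helper that does this split. It assumes the Columbus shapefile is named columbus.shp inside the folder used in the example that follows.

```r
# Hypothetical helper: split a .shp path into readOGR()'s dsn and layer
shp_to_dsn_layer <- function(shp_path) {
  list(
    dsn   = dirname(shp_path),                                  # the folder
    layer = tools::file_path_sans_ext(basename(shp_path))       # name without .shp
  )
}

parts <- shp_to_dsn_layer("/cloud/project/MappingWithR/columbus/columbus.shp")
parts$dsn    # "/cloud/project/MappingWithR/columbus"
parts$layer  # "columbus"
# With rgdal loaded, the call would then be:
# readOGR(dsn = parts$dsn, layer = parts$layer)
```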
An example using the Columbus data available on the GeoDa website¹:
# Ensure that the rgdal package is installed
if (!require("rgdal")) install.packages("rgdal")
library(rgdal)
columbus <- readOGR(dsn = "/cloud/project/MappingWithR/columbus", layer = "columbus")

## OGR data source with driver: ESRI Shapefile
## Source: "/cloud/project/MappingWithR/columbus", layer: "columbus"
## with 49 features
## It has 20 fields
## Integer64 fields read as strings: COLUMBUS_ COLUMBUS_I POLYID

¹ All the information we need about the variable names can be found in the HTML file available on the GeoDa website.
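In the code above, the readOGR function is used to load a shapefile and assign it to a new spatial object called columbus. Another way is to store the location of the data folder in an object, or to make that folder the working directory.
# Ensure that the rgdal package is installed
if (!require("rgdal")) install.packages("rgdal")
library(rgdal)
# Store the folder where the data are saved in an object
data.columbus <- "/cloud/project/MappingWithR/columbus"
columbus <- readOGR(dsn = data.columbus, layer = "columbus")
# Alternatively, make that folder the working directory and use dsn = "."
# (setwd() returns the previous working directory, not the data location)
old.wd <- setwd(data.columbus)
columbus <- readOGR(dsn = ".", layer = "columbus")
setwd(old.wd)  # restore the original working directory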

The structure of spatial data in R

Spatial objects like the columbus object are made up of a number of different slots. The key slots are @data (non-geographic attribute data) and @polygons (or @lines for line data). The data slot can be thought of as an attribute table, and the geometry slot holds the polygons that make up the physical boundaries. Specific slots are accessed using the @ symbol. To display the slot names, we can use the slotNames function.
slotNames(columbus)

## [1] "data" "polygons" "plotOrder" "bbox" "proj4string"

Sometimes we need to know the class of an object. For this we use the class function.
class(columbus)

## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
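Slots are a feature of R's S4 class system, which sp uses for its spatial classes. The sketch below uses the methods package from base R to define a deliberately simplified, hypothetical class with two slots, loosely imitating how an sp object keeps its attribute table and bounding box apart. It shows slotNames(), class() and the @ accessor on a small object with made-up values.

```r
library(methods)  # S4 classes (attached by default in most R sessions)

# Hypothetical, much simplified class with two slots
setClass("ToyLayer", slots = c(data = "data.frame", bbox = "matrix"))

toy <- new("ToyLayer",
  data = data.frame(id = 1:2, value = c(3.5, 7.1)),     # made-up attributes
  bbox = matrix(c(0, 0, 10, 10), nrow = 2,              # made-up bounding box
                dimnames = list(c("x", "y"), c("min", "max")))
)

slotNames(toy)        # "data" "bbox"
class(toy)            # "ToyLayer"
toy@data              # access a slot with @
toy@bbox["x", "max"]  # 10
```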
Let’s now analyse the columbus object with some basic commands:
head(columbus@data, n = 2)

##       AREA PERIMETER COLUMBUS_ COLUMBUS_I POLYID NEIG  HOVAL    INC    CRIME
## 0 0.309441  2.440629         2          5      1    5 80.467 19.531 15.72598
## 1 0.259329  2.236939         3          1      2    1 44.567 21.232 18.80175
##       OPEN    PLUMB DISCBD     X     Y NSA NSB EW CP THOUS NEIGNO
## 0 2.850747 0.217155   5.03 38.80 44.07   1   1  1  0  1000   1005
## 1 5.296720 0.320581   4.27 35.62 42.38   1   1  0  0  1000   1001
The head function shows the first few rows of the data (see ?head for more details).
Take a look at the output (note the table format of the data and the column names). The @ symbol in the code above is used to refer to the data slot of the columbus object.
The summary() command is a useful way of exploring a spatial object.
summary(columbus)

## Object of class SpatialPolygonsDataFrame

## Coordinates:
## min max
## x 5.874907 11.28742
## y 10.788630 14.74245
## Is projected: FALSE
## proj4string : [+proj=longlat +datum=WGS84 +no_defs]
## Data attributes:
## AREA PERIMETER COLUMBUS_ COLUMBUS_I
## Min. :0.03438 Min. :0.9021 Length:49 Length:49
## 1st Qu.:0.09315 1st Qu.:1.4023 Class :character Class :character
## Median :0.17477 Median :1.8410 Mode :character Mode :character
## Mean :0.18649 Mean :1.8887
## 3rd Qu.:0.24669 3rd Qu.:2.1992
## Max. :0.69926 Max. :5.0775
## POLYID NEIG HOVAL INC
## Length:49 Min. : 1 Min. :17.90 Min. : 4.477
## Class :character 1st Qu.:13 1st Qu.:25.70 1st Qu.: 9.963
## Mode :character Median :25 Median :33.50 Median :13.380
## Mean :25 Mean :38.44 Mean :14.375
## 3rd Qu.:37 3rd Qu.:43.30 3rd Qu.:18.324
## Max. :49 Max. :96.40 Max. :31.070
## CRIME OPEN PLUMB DISCBD
## Min. : 0.1783 Min. : 0.0000 Min. : 0.1327 Min. :0.370
## 1st Qu.:20.0485 1st Qu.: 0.2598 1st Qu.: 0.3323 1st Qu.:1.700
## Median :34.0008 Median : 1.0061 Median : 1.0239 Median :2.670
## Mean :35.1288 Mean : 2.7709 Mean : 2.3639 Mean :2.852
## 3rd Qu.:48.5855 3rd Qu.: 3.9364 3rd Qu.: 2.5343 3rd Qu.:3.890
## Max. :68.8920 Max. :24.9981 Max. :18.8111 Max. :5.570
## X Y NSA NSB
## Min. :24.25 Min. :24.96 Min. :0.0000 Min. :0.0000
## 1st Qu.:36.15 1st Qu.:28.26 1st Qu.:0.0000 1st Qu.:0.0000
## Median :39.61 Median :31.91 Median :0.0000 Median :1.0000
## Mean :39.46 Mean :32.37 Mean :0.4898 Mean :0.5102
## 3rd Qu.:43.44 3rd Qu.:35.92 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :51.24 Max. :44.07 Max. :1.0000 Max. :1.0000
## EW CP THOUS NEIGNO
## Min. :0.0000 Min. :0.0000 Min. :1000 Min. :1001
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:1000 1st Qu.:1013
## Median :1.0000 Median :0.0000 Median :1000 Median :1025
## Mean :0.5918 Mean :0.4898 Mean :1000 Mean :1025
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1000 3rd Qu.:1037
## Max. :1.0000 Max. :1.0000 Max. :1000 Max. :1049
columbus@data[columbus$CRIME < 20.0485,]

The line of code above asks R to select only the rows of the columbus attribute table where CRIME is lower than its first quartile (20.0485, taken from the summary output). The square brackets work as follows: anything before the comma refers to the rows that will be selected, and anything after the comma refers to the columns that should be returned (by number or by name). Leaving the part after the comma empty, as here, returns all columns.
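The first quartile does not have to be typed in by hand. This sketch computes it with quantile() and then shows the row/column rules of the square brackets on a small hypothetical data frame standing in for columbus@data. All values are made up. On the real object, the same idea would be columbus@data[columbus$CRIME < quantile(columbus$CRIME, 0.25), ].

```r
# Hypothetical attribute table standing in for columbus@data (made-up values)
zones <- data.frame(
  POLYID = 1:6,
  CRIME  = c(15.7, 18.8, 30.6, 32.4, 50.7, 26.1),
  INC    = c(19.5, 21.2, 16.0, 4.5, 11.3, 8.4)
)

# First quartile of CRIME, computed rather than typed in
q1 <- quantile(zones$CRIME, probs = 0.25)

# Rows go before the comma, columns after it
low_crime_all  <- zones[zones$CRIME < q1, ]                      # all columns
low_crime_some <- zones[zones$CRIME < q1, c("POLYID", "CRIME")]  # columns by name
zones[zones$CRIME < q1, 2:3]                                     # columns by number
```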
We can compute the mean of a specific variable, for example the income.
mean(columbus$INC) # Compute the mean of the income

## [1] 14.37494
The $ symbol refers to the INC column (a variable within the table) in the data slot. The mean function works here because we are dealing with numeric data. To check the class (that is, the type) of every variable in a spatial dataset, you can use the sapply function:
sapply(columbus@data, class)

##        AREA   PERIMETER   COLUMBUS_  COLUMBUS_I      POLYID        NEIG
##   "numeric"   "numeric" "character" "character" "character"   "integer"
##       HOVAL         INC       CRIME        OPEN       PLUMB      DISCBD
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"
##           X           Y         NSA         NSB          EW          CP
##   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"   "numeric"
##       THOUS      NEIGNO
##   "numeric"   "numeric"
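The output shows that the ID fields COLUMBUS_, COLUMBUS_I and POLYID were read as character strings, as readOGR reported ("Integer64 fields read as strings"). This sketch converts such columns back to integers with lapply(). It uses a small data frame built from the COLUMBUS_, POLYID and INC values in the first two rows printed by head() earlier. On the real object the analogous call should be columbus@data[id_cols] <- lapply(columbus@data[id_cols], as.integer).

```r
# Small table built from the first two rows shown by head(); ID fields as strings
zones <- data.frame(
  COLUMBUS_ = c("2", "3"),
  POLYID    = c("1", "2"),
  INC       = c(19.531, 21.232),
  stringsAsFactors = FALSE
)

sapply(zones, class)   # COLUMBUS_ and POLYID are "character"

# Convert the ID columns back to integers
id_cols <- c("COLUMBUS_", "POLYID")
zones[id_cols] <- lapply(zones[id_cols], as.integer)

sapply(zones, class)   # now "integer" "integer" "numeric"
```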
To explore the columbus object further, try typing nrow(columbus) (which displays the number of rows) and note how many zones the dataset contains. You can also try ncol(columbus).
numberOfZones <- nrow(columbus)
print(numberOfZones)

## [1] 49
Now that we have seen something of the structure of spatial objects in R, let us look at plotting them using the plot function. The plot function is one of the most useful functions in R, as it changes its behavior depending on the input data (computer scientists call this polymorphism). Inputting another object, such as plot(columbus@data), will generate an entirely different type of plot. Note that plot(columbus) uses the geometry data, contained primarily in the @polygons slot.
plot(columbus)
