Step-by-Step Guide To Vulnerability Hotspots Mapping: Implementing The Spatial Index Approach
Developed by the
Center for International Earth Science Information Network (CIESIN),
The Earth Institute, Columbia University
Under contract with TetraTech/ARD for the US Agency for International Development (USAID)
Planning for Resilience in East Africa through Policy, Adaptation, Research, and Economic Development
(PREPARED) Project
November 2015
Authors:
Malanding S Jaiteh, PhD
Tricia Chai-Onn
Valentina Mara
Alex de Sherbinin, PhD
Acknowledgements
This work was performed under contract with TetraTech/ARD for the US Agency for International
Development (USAID) Planning for Resilience in East Africa through Policy, Adaptation, Research, and
Economic Development (PREPARED) project. The work also benefited greatly from vulnerability mapping
efforts under the USAID-funded African and Latin American Resilience to Climate Change (ARCC) project.
We wish to acknowledge the contributions of Dr. Idris Bexi Wasama, University of Djibouti, who
reviewed the draft manual while visiting CIESIN as a Hubert H. Humphrey Fellow through the University
of California, Davis.
Citing this Guide
CIESIN (Center for International Earth Science Information Network), Columbia University. (2015). A
Step-by-Step Guide to Vulnerability Hotspots Mapping: Implementing the Spatial Index Approach.
Palisades, NY: CIESIN. Available at http://ciesin.columbia.edu/documents/vmapping_guide_final.pdf
Table of Contents
Authors: .................................................................................................................................................. 2
Acknowledgements ................................................................................................................................ 2
Citing this Guide ..................................................................................................................................... 2
Description ............................................................................................................................................. 5
Lecture 1. Introduction to Vulnerability Hotspots Mapping ................................................................... 6
Exercise 1. Developing Vulnerability Mapping Framework and Selecting Variables ........................... 25
1.0 Variable Listing ........................................................................................................................... 26
1.1 Variable definition ...................................................................................................................... 27
1.2 Final variable listing................................................................................................................. 28
Lecture 2. Writing metadata for spatial data....................................................................................... 29
Exercise 2. Writing metadata for spatial data ..................................................................................... 33
Lecture 3. Introducing ArcGIS Tools...................................................................................................... 38
Exercise 3. Data Processing in ArcGIS ................................................................................................... 67
3.1 Preparing a Common Reference System ........................................................................................ 68
3.1.1 Create Fishnet.......................................................................................................................... 68
3.1.2 Create Fishnet Centroids ......................................................................................................... 71
3.1.3 Create Mask............................................................................................................................. 73
3.2 Processing Vector Data................................................................................................................... 73
3.2.1 Point/Tabular Data .............................................................................................................. 74
Plot XY Events layer in ArcMap ........................................................................................................... 74
Create an Interpolation Surface raster ............................................................................................... 75
3.2.2 Line .......................................................................................................................................... 78
3.2.3 Polygons .................................................................................................................................. 79
Categorical polygon data .................................................................................................................... 79
3.3 Processing Raster Data ................................................................................................................... 80
3.3.1 Subset Raster Data .............................................................................................................. 80
3.3.2 Matching Rasters ................................................................................................................ 80
Aggregate ............................................................................................................................................ 81
Nibble Rasters ..................................................................................................................................... 82
Resample Rasters ................................................................................................................................ 83
Project Rasters .................................................................................................................................... 84
3.4 Converting to Tabular Data ............................................................................................................. 84
3.4.1 Extract Values to Points ........................................................................................................ 84
3.4.2 Export Values to Text Files .................................................................................................. 85
Lecture 4. Introduction to the R Statistics Environment ....................................................................... 86
Exercise 4. Installing and Configuring R Statistics ................................................................................ 92
4.1 Downloading R distribution ........................................................................................................ 93
4.2 Installing R Installer ..................................................................................................................... 93
4.2.1 Installing and updating packages ........................................................................................ 94
Exercise 5. Getting Started with R ........................................................................................................ 96
Learning Objectives .............................................................................................................................. 96
Exercise data..................................................................................................................................... 96
5.1 Start the R environment and setting up workspaces ................................................................. 97
Data Types ........................................................................................................................................ 98
Vectors.............................................................................................................................................. 98
Lists ................................................................................................................................................... 99
Data Frames.................................................................................................................................... 100
5.2 Importing and Exporting Data Frames ...................................................................................... 100
Importing Files ................................................................................................................................ 100
Exporting data ................................................................................................................................ 101
5.3. Learning More ............................................................................................................................. 102
Other Useful Functions................................................................................................................... 102
Online Tutorials .............................................................................................................................. 102
Exercise 6. Getting Started with RStudio .............................................................................................. 103
6.1. R Studio Console ................................................................................................................... 104
6.1.1 Create a new Project ............................................................................................................. 104
6.1.2 Reading CSV tables into R...................................................................................................... 106
6.1.3 Joining Imported Data Frames .................................................................................................. 108
6.2 Data Exploration ....................................................................................................................... 109
6.2.1 Numerical measures .............................................................................................................. 110
6.2.2 Graphical Methods ................................................................................................................ 112
6.3 Winsorization................................................................................................................................ 114
6.4 Rescale variable values to 0-100 .................................................................................................. 115
6.5 Calculate composites using weighted averages, and rescale to 0-100 ........................................ 116
6.6 Calculate vulnerability, and rescaling to 0-100 ............................................................................ 117
Exercise 7. Mapping Results in ArcMap.............................................................................................. 118
Description
This manual was designed for a five-day training program in Entebbe, Uganda, in August 2015. CIESIN
provided a comprehensive training in the framework, data and methods utilized to develop a spatial
vulnerability assessment for Kenya using the spatial index approach. Data sets and potential indicators
to be derived from them were assembled and evaluated in advance of the workshop. The workshop’s
primary focus was on the methods needed to process and transform the spatial data in order to develop
a spatial vulnerability index (and constituent indices for exposure, sensitivity, and adaptive capacity).
The workshop was intended for GIS analysts who regularly use ArcGIS 9.x or later versions, who are
broadly familiar with a variety of geospatial data formats, and who have the ability to do advanced
geospatial processing (e.g., editing data sets, creating buffers, conducting overlay analyses, running
zonal statistics, etc.).
While vulnerability index maps can be produced entirely in the ArcGIS environment, we use the R
statistical computing and graphics software for part of this training. R is free software offering a wide
variety of statistical and graphical techniques, and it has proven very efficient in handling large data
sets such as those we encounter in spatial vulnerability index mapping.
The workshop was divided into the following modules, which are addressed in greater detail in the
lectures and exercises that follow.
Lecture 1. Introduction to Vulnerability
Hotspots Mapping
A large body of evidence going back more than two decades shows that exposure alone is not
sufficient for understanding trends in disaster losses, and that social and economic vulnerability are
critical ingredients (Mechler and Bouwer 2014, Cutter et al. 2003). Africa has been identified as one of
the regions that is most vulnerable to climate change both in terms of exposure to climate hazards
(Turco et al. 2015, Muller et al. 2014) and social vulnerability (Busby et al. 2014, Lopez-Carr et al. 2014).
Tools such as spatial vulnerability assessment are useful for understanding patterns of vulnerability and
risk to climate change at multiple scales, and have been applied in Africa perhaps more than any other
region (e.g. Busby et al. 2014, Lopez-Carr et al. 2014, and Midgley et al. 2011). The demand for
vulnerability maps among development agencies and governments is increasing as greater emphasis is
placed on scientifically sound methods for targeting adaptation assistance (de Sherbinin 2014a).
Mapping is useful because climate variability and extremes, the sensitivity of populations and
systems to climatic stressors, and adaptive capacities are all spatially differentiated. The interplay of
these factors produces different patterns of vulnerability. Typically spatial vulnerability assessment
involves data integration in which geo-referenced socio-economic and biophysical data, including those
derived from remote sensing, are combined with climate data to understand patterns of vulnerability
and, in turn, inform where adaptation may be required. Maps have proven to be useful boundary
objects in multi-stakeholder discussions, providing a common basis for discussion and for deliberations
over adaptation planning (Preston et al. 2011). Maps can help to ground discussions on a solid evidence
base, especially in developing country contexts where geographic information may not be easily
accessible for all stakeholders.
Spatial data integration and spatial analysis have become standard tools in the toolkit of climate
change vulnerability assessments. The United Nations Environment Programme (UNEP) Programme of
Research on Climate Change Vulnerability, Impacts and Adaptation (PROVIA) Research Priorities on
Vulnerability, Impacts and Adaptation (PROVIA 2013) highlights “measuring and mapping vulnerability”
as a first priority for supporting adaptation decision-making. In many cases vulnerability assessment
(VA) is synonymous with spatial vulnerability assessment, owing in part to an understanding that
vulnerability and its constituent components exhibit high degrees of spatial and temporal heterogeneity
(Preston et al., 2011). The purposes vary according to the specific study, but spatial VAs are generally
intended to identify areas at potentially high risk of climate impacts — so-called climate change
“hotspots” (de Sherbinin 2013) — and to better understand the determinants of vulnerability in order to
identify planning and capacity building needs.
Any vulnerability mapping effort needs to be guided by a theoretical framework. de Sherbinin
(2014b) provides a review of a number of different frameworks. In addition, the selection of indicators
should be guided by theoretical linkages to the concept of interest – whether vulnerability, or its
constituent elements such as exposure, sensitivity, and adaptive capacity. In a mapping effort for Mali
(de Sherbinin et al. 2015), which served as a model for the Kenya mapping described in this guide,
indicator selection was guided by the literature on factors known to contribute to each component of
vulnerability, as well as by data availability and quality. Each data layer was justified based on its
conceptual proximity to the three vulnerability components (Hinkel 2011), and choices were consistent
with the variables that have been found to be associated with harm from climate variability and change,
including education levels (Lutz et al. 2014), climate variability (Hall et al. 2014), and marginal (semi-arid
and arid) environments and geographically remote areas in poor developing regions (de Sherbinin et al.
2013, Gray and Moseley 2005). The guiding approach should be to identify a limited number of high-
quality spatial data sets that best represent the component of interest while avoiding the temptation to
add low-quality data (data of high uncertainty or coarse spatial resolution), thereby “contaminating” the
results. We had reasonably high confidence in the validity and reliability of each of the data sets
included; data limitations are explored in Annex IV of the overall report.
Vulnerability mapping and the quantification of vulnerability is not without shortcomings, and more
critical perspectives are provided in a number of other publications (de Sherbinin 2014a, de Sherbinin
2013, Preston et al. 2011, Hinkel 2011). Users of this guide who desire a more in depth look at the
challenges of vulnerability mapping, including issues around uncertainty, are advised to read these
publications. Despite these caveats, the spatial vulnerability index construction methods described here
are widely used in the literature and have been found to be useful to policy audiences seeking to better
understand the factors contributing to vulnerability (e.g., Busby et al. 2014, Midgley et al. 2011, Preston
et al. 2011, BMZ 2014). Those seeking an overview of different vulnerability mapping methods may
refer to de Sherbinin (2014b).
References
BMZ (German Federal Ministry for Economic Cooperation and Development). (2014). The Vulnerability
Sourcebook: Concept and Guidelines for Standardised Vulnerability Assessment. GIZ (Deutsche
Gesellschaft für Internationale Zusammenarbeit): Berlin, Germany.
Busby, J.W., T.G. Smith, and N. Krishnan. (2014). Climate security vulnerability in Africa mapping 3.0.
Political Geography, 43, 51–67.
Cutter, S.L., B.J. Boruff, and W.L. Shirley. (2003). Social Vulnerability to Environmental Hazards. Social
Science Quarterly, 84, 242–261.
de Sherbinin, A., T. Chai-Onn, M. Jaiteh, V. Mara, L. Pistolesi, E. Schnarr, and S. Trzaska. (2015). Data Integration
for Climate Vulnerability Mapping in West Africa. ISPRS International Journal of Geo-Information. In
press. http://www.mdpi.com/2220-9964/4/3.
de Sherbinin, A. (2014a). Mapping the Unmeasurable? Spatial analysis of vulnerability to climate change
and climate variability. Dissertation in fulfillment of the PhD degree at the Faculty of Geo-Information
Science and Earth Observation (ITC), University of Twente. Enschede, Netherlands: ITC.
de Sherbinin, A. (2014b). Spatial Climate Change Vulnerability Assessments: A Review of Data, Methods
and Issues. Technical Paper for the USAID African and Latin American Resilience to Climate Change
(ARCC) project. Washington, DC: USAID. Available at
http://ciesin.columbia.edu/documents/SpatialVulAsses_CLEARED_000.pdf.
de Sherbinin, A. (2013). Climate Change Hotspots Mapping: What Have We Learned? Climatic Change,
123(1): 23-37.
Gray, L.C., and W.G. Moseley. (2005). A geographical perspective on poverty–environment interactions.
The Geographical Journal. DOI: 10.1111/j.1475-4959.2005.00146.x.
Hall, J.W., D. Grey, D. Garrick, F. Fung, C. Brown, S.J. Dadson, and C.W. Sadoff. (2014). Coping with the
curse of freshwater variability. Science, 346(6208).
Hinkel, J. (2011). “Indicators of vulnerability and adaptive capacity”: Towards a clarification of the
science–policy interface. Global Environmental Change, 21(1), 198-208.
Lopez-Carr, D., N. Pricope, J. Aukema, M. Jankowska, C. Funk, G. Husak, and J. Michaelsen. (2014). A
spatial analysis of population dynamics and climate change in Africa: potential vulnerability hot spots
emerge where precipitation declines and demographic pressures coincide. Population and Environment,
35:323–339
Lutz, W., R. Muttarak, and E. Striessnig. (2014). Universal education is key to enhanced climate
adaptation. Science, 346(6213): 1061-1062.
Mechler, R., and L.M. Bouwer. (2014). Understanding trends and projections of disaster losses and
climate change: is vulnerability the missing link? Climatic Change, 133(1): 23-35.
Midgley, S.J.E., Davies, R.A.G., and Chesterman, S. (2011). Climate risk and vulnerability mapping in
southern Africa: status quo (2008) and future (2050). Report produced for the Regional Climate Change
Programme for Southern Africa (RCCP), UK Department for International Development (DFID). Cape
Town, South Africa: OneWorld Sustainable Investments.
Muller, C., K. Waha, A. Bondeau, and J. Heinke. (2014). Hotspots of climate change impacts in sub-
Saharan Africa and implications for adaptation and development. Global Change Biology. doi:
10.1111/gcb.12586
Preston, B.L, Yuen, E.J., and Westaway, R.M. (2011). Putting vulnerability to climate change on the map:
a review of approaches, benefits, and risks. Sustainability Science, 6(2), 177-202.
PROVIA. (2013). Research Priorities on Vulnerability, Impacts and Adaptation: Responding to the Climate
Change Challenge. Nairobi, Kenya: United Nations Environment Programme.
Turco, M., E. Palazzi, J. von Hardenberg, and A. Provenzale. (2015). Observed climate change hotspots.
Geophysical Research Letters, doi:10.1002/2015GL063891
Exercise 1. Developing Vulnerability
Mapping Framework and Selecting
Variables
1.0 Variable Listing
In this exercise we will examine the variables under each of the three components of vulnerability to
climate change: Exposure, Sensitivity, and Adaptive Capacity.
Exposure
• Seasonal temperature
• Average Annual Precipitation
• Long-term trend in temperature in July-August-Sept. (1950-2009)
• Flood frequency (1999-2007)
Adaptive Capacity
• Education level of mother (2006) point/polygon
• Market accessibility (travel time to major cities)
• Anthropogenic biomes (2000)
• Irrigated areas (area equipped for irrigation) (1990-2000)
Sensitivity
• Household wealth (2006)
• Child stunting (2006)
• Infant mortality rate (IMR) (2008)
• Soil organic carbon/soil quality (1950-2005)
1.1 Variable definition
Variable name:………………………………………………………………………………………………………………..
Briefly explain three ways in which this variable may contribute to social vulnerability to climate
change.
1………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………….
2………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………
3………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………
………………………………………………………………………………………………………………………………………………………….
List and briefly explain alternative variables you know of that could be used as substitutes for this
variable.
1.………………………………………………………………………………………………………………………………………………………………
……………………………………………………………………………………………………………………………………………………………….
2………………………………………………………………………………………………………………………………………………………………
………………………………………………………………………………………………………………………………………………………………..
3………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………
List, in order of importance, the component(s) of vulnerability (exposure, adaptive capacity, and
sensitivity) to which this variable belongs.
1………………………………………………….
2………………………………………………..
3…………………………………………………
1.2 Final variable listing
Sensitivity:
1.
2.
3.
4.
5.
Exposure
1.
2.
3.
4.
5.
Adaptive Capacity
1.
2.
3.
4.
5.
Lecture 2. Writing metadata for spatial
data
Metadata is data that describes other data. Metadata summarizes basic information about data,
such as the author, the dates of creation and modification, and the file size. The main purpose of
metadata is to facilitate the discovery of relevant information. Metadata also helps organize
electronic resources, provides digital identification, and supports the archiving and preservation
of the resource. The ability to filter through metadata makes it much easier for
someone to locate a specific document.
Metadata in GIS describes geographic objects (such as shapefiles, satellite imagery, maps,
features, or simply documents with a geospatial component). Metadata can be created
manually or by automated information processing. Manual creation tends to be more
accurate, allowing the user to input any information they feel is relevant or needed to help
describe the file. Automated metadata creation can be much more elementary, usually only
displaying information such as file size, file extension, when the file was created, and who
created the file.
Exercise 2. Writing metadata for spatial
data
Anthropogenic Biomes Metadata Sheet
Corresponds to the field 'Rationale' in the Indicator sheets in the methodology. Include a brief
description of the indicator in the context of:
Units: Biome analysis was conducted at 5 arc-minute resolution (5' grid cells cover ~86 km² at the
equator), a spatial resolution selected as the finest allowing direct use of high-quality land-use area
estimates.

Computation: The data were subset to the Mali national boundary extent using the ArcGIS Extract by
Mask tool and a 30 arc-second raster mask generated from a 30 arc-second fishnet. Raster values were
extracted using the ArcGIS Extract Values to Points tool and the 30 arc-second fishnet centroids. The
output was exported to a .csv table for re-coding and statistical analysis.

Statistics for transformed data: After recode: Min=10, Max=100, Median=80, Mean=70.26, Standard
Deviation=24.24

Limitations: The data set is a conceptual model and is not intended to replace existing biome systems
based on climate, terrain, and geology. Rather, the intent is that wide availability of an anthropogenic
biome system will encourage a richer view of human–ecosystem interactions across the terrestrial
biosphere, and that this will, in turn, guide our investigation, understanding, and management of
ecosystem processes and their changes at global and regional scales.

Spatial Resolution: Raster cell sizes are 5 arc-minutes or 0.08333 decimal degrees (about 10 kilometers
at the equator).

Year of Publication: 2009

Additional Notes:

Format: Raster data are available in GeoTIFF and Esri Grid formats. The Africa Esri Grid was downloaded
for this analysis.

File Name: \\Dataserver0\arcc\VAs\Mali\VulnerabilityMapping\data\AnthropogenicBiomes\af_anthrome_ESRIgrid\af_anthrome
Exercise Task: Using the metadata template provided, add as much information as possible for each of
the variables, following the sample metadata above.
List of indicators for Vulnerability Mapping
Lecture 3. Introducing ArcGIS Tools
Introduction
This lecture is a refresher for those who may not have been regular users of the ArcGIS software. It will
introduce the Graphical User Interface (GUI) and the basic navigation needed to access the different
tools that we may be using in the vulnerability assessment. The lecture is followed by exercises to
process the data sets in each category.
Exercise 3. Data Processing in ArcGIS
In this exercise we will process the various input GIS datasets and export them to comma-separated
values (.csv) tables. First we will create a common reference system for our area of interest,
then process vector files (points, lines, and polygons), and finally process raster data.
3.1 Preparing a Common Reference System
A common reference system consists of a fishnet covering the entire area of interest and a point file
representing the centroids of each polygon in the fishnet. Each polygon and centroid, representing a
pixel location, has a pixel identification number, pixel_id. The pixel_id is the attribute used to join the
other variables to create the tabular files.
3.1.1 Create Fishnet
2. Start the Create Fishnet tool. Name the Output Feature Class KenyaFN and enter the spatial extent
information as shown below. Make sure to check the box next to Create Label Points and select
POLYGON as the Geometry Type.
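For reference, this step can also be scripted in the ArcMap Python window. A minimal sketch, assuming
~5 km (0.041667 decimal degree) cells and the 33E–42E, 5S–6N extent described below; the names and
cell size are assumptions:
import arcpy
# origin at the lower-left corner (33, -5); the opposite corner is (42, 6);
# "LABELS" also creates the centroid label points used in the next section
arcpy.CreateFishnet_management("KenyaFN", "33 -5", "33 5", 0.041667, 0.041667,
                               "0", "0", "42 6", "LABELS", "#", "POLYGON")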
The fishnet created extends beyond Kenya's boundary, from longitude 33E to 42E and latitude 5S to 6N.
Next we will extract only those polygons in the fishnet that intersect the Kenya country boundary, using
the Select by Location tool.
3. Under the Selection menu, use the Select by Location option to select the polygons in the fishnet that
intersect with the Kenya country boundary. (Click Selection on the menu bar and choose Select By
Location.)
4. Export the features selected above to a new feature class and name it yourCountryNameID:
a. Use Select By Location with:
b. Selection Method: Select Features From
c. Target: "YourCountryNameFN"
d. Source Layer: yourCountry admin (for those from Uganda, this is the Uganda admin layer)
e. Spatial Selection: intersect the source layer feature
3.1.2 Create Fishnet Centroids
Select only the fishnet centroids that intersect the country boundary. Use the yourCountryNameID
feature class created above to select by location the fishnet centroids that intersect with
yourCountryNameID.
1. Use the following steps to run the Select by Location tool to create fishnet centroids layer :
a. Selection Method: Select Features From
b. Target: “yourCountryNameFN_label” points
c. Source Layer: “Kenya boundary” polygon fishnet
d. Spatial Selection: intersect the source layer feature
e. Export selected points>Data>”yourCountryName_centroids” point feature class
2. Add a new field to the Kenya_Centroids attribute table. Call it PixelID. Make this field Long
Integer.
3. Calculate PixelID values using the following code, which gives a unique identification number to
each of the centroid points. Warning: Python is case sensitive. Make sure you type the code
exactly as shown, with lines 3–5 of the code block indented two spaces.
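The code itself appears as a screenshot in the original manual. A common Field Calculator pattern that
matches the description (Python parser, Show Codeblock checked, lines 3–5 indented two spaces) is the
following sketch; the function and variable names are assumptions:
# Pre-Logic Script Code:
counter = 0
def autoIncrement():
  global counter
  counter += 1  # advance the counter by one for every row processed
  return counter
# Expression box (PixelID =):
autoIncrement()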
3.1.3 Create Mask
A raster mask is a dataset that defines the area of interest in the input rasters. It is often a raster of 1
and NoData values, with 1 defining the areas included. Cells that are NoData in a raster mask will be
NoData in the output. We use a raster mask to create a country raster of smaller spatial extent from a
global or regional raster.
1. To create a Kenya country raster mask, use the Feature To Raster tool to create a raster of the
YourCountryName admin0 shapefile. (ArcToolbox\Conversion Tools\To Raster\Feature To
Raster.)
2. Use the Extract by Mask tool in Spatial Analyst to extract all raster datasets to the "Ken_Mask.img"
extent.
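A minimal scripted sketch of the two steps, with file and field names assumed:
import arcpy
from arcpy.sa import *
arcpy.CheckOutExtension("Spatial")
# 1. Rasterize the admin0 boundary at the fishnet cell size
arcpy.FeatureToRaster_conversion("Kenya_admin0.shp", "ADM0_CODE",
                                 "Ken_Mask.img", 0.041667)
# 2. Clip a larger raster to the mask extent
ken = ExtractByMask("regional_raster", "Ken_Mask.img")
ken.save("ken_raster")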
3.2 Processing Vector Data
3.2.1 Point/Tabular Data
Some vector data, such as survey analysis results from Microsoft Excel and other statistical packages,
are available as tables. These tables often contain fields representing coordinate information. The
following steps are needed to process tabular vector files:
1) Plot XY events layer in ArcMap
2) Export the events layer to shapefile or geodatabase Feature Class
3) Create a raster surface using one of several Interpolation methods
4) Close all ArcMap sessions before the next section
4. Right-click on the table in the Table of Contents and select Display XY Data…. This opens the Display
XY Data dialog box. Make sure you choose the longitude field for X Field and the latitude field for Y
Field (see below).
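The plot-and-export sequence can also be scripted; a minimal sketch with table and field names
assumed:
import arcpy
# plot the XY events from the table, then save them to a shapefile
arcpy.MakeXYEventLayer_management("survey.csv", "longitude", "latitude", "xy_events")
arcpy.CopyFeatures_management("xy_events", "survey_points.shp")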
5. Right-click on the newly created feature class and export to a shapefile.
Interpolating Cluster Data
Empirical Bayesian Kriging (EBK) is a geostatistical interpolation method that automates the task of
building a valid kriging model. EBK automatically calculates model parameters through a process of
subsetting and simulations. In addition to creating a prediction surface, EBK also allows you to create a
prediction standard error surface. For further details, see the Esri website
(http://www.esri.com/news/arcuser/0704/files/interpolating.pdf).
You will need to activate the Geostatistical Analyst and Spatial Analyst extensions to use the
interpolation tools (Customize > Extensions).
1. Under the ArcToolbox Geostatistical Analyst toolbox, start the Empirical Bayesian Kriging tool.
2. Set Environments: set the Mask and the Raster Analysis Cell Size. The cell size should equal the
fishnet resolution.
3. Output Parameters: for Output surface type, choose PREDICTION.
4. Run the EBK tool again, but this time choose PREDICTION STANDARD ERROR for the Output
surface type.
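A scripted sketch of the prediction run; the layer and field names are assumptions, and the parameter
order follows our reading of the Geostatistical Analyst tool:
import arcpy
arcpy.EmpiricalBayesianKriging_ga("survey_points", "STUNTING", "",
                                  "ebk_prediction", 0.041667,
                                  output_type="PREDICTION")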
Kernel Density/Point Density
Creating a surface of the density of points (the number of points per unit area) is another way of
interpolating point data. The search radius of the kernel function will vary the results: only the points
that fall within the search radius are included in the density calculation. A large radius will create a more
generalized or smoother output raster, while a small radius will create a more detailed raster. The input
layer should be in a projected coordinate system to use the Kernel Density tool.
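A minimal sketch; the layer name, cell size, and search radius (in the projected layer's units, assumed
to be meters) are all assumptions:
from arcpy.sa import *
# "NONE" means each point counts once; 50,000 m search radius
dens = KernelDensity("events_points", "NONE", 5000, 50000)
dens.save("events_density")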
3.2.2 Line
For line data, create a surface of the distance from the lines, e.g. with the Spatial Analyst Euclidean Distance tool.
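A minimal sketch with an assumed roads layer:
from arcpy.sa import *
# distance from every cell to the nearest line feature
dist = EucDistance("roads", "", 0.041667)
dist.save("road_dist")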
3.2.3 Polygons
For polygons that have integer values, create a surface of the polygon data using the conversion tool
Polygon to Raster.
ArcToolbox\Conversion Tools\To Raster\Polygon To Raster
Example:
Input: ke_milk-surplus-deficit
Value: MILK_DEF_S
Output Raster: WRI.gdb\ke_milk_surplus_deficit
Cell assignment type: CELL CENTER
Priority field: NONE
Cellsize: Set to Kenya_Mask
Set Environments: Set Processing Extent: Kenya_Mask,
Raster Analysis: Kenya_Mask
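The same example as a Python window sketch; the parameters are taken from the list above, and the
path separators are an assumption:
import arcpy
arcpy.env.extent = "Kenya_Mask"
arcpy.env.snapRaster = "Kenya_Mask"
arcpy.PolygonToRaster_conversion("ke_milk-surplus-deficit", "MILK_DEF_S",
                                 "WRI.gdb/ke_milk_surplus_deficit",
                                 "CELL_CENTER", "NONE", "Kenya_Mask")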
3.3 Processing Raster Data
Processing raster data requires the use of a number of tools, including Extract by Mask, Aggregate,
Resample, Reclassify, Nibble, and Extract Values to Points.
Aggregate
• The Aggregate tool is used to merge fine-resolution raster cells into a coarser-resolution cell size.
• E.g., if you have a data set with a resolution of 100 m, you use the Aggregate tool to create a
raster with a 1,000 m cell size (see the sketch below).
• Make sure TIFFs are converted to geodatabase rasters before aggregating (Export > Data >
Geodatabase Raster).
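A minimal sketch of the 100 m to 1,000 m example, with raster names assumed:
from arcpy.sa import *
# cell_factor 10 merges 10 x 10 input cells; MEAN averages their values
agg = Aggregate("input_100m", 10, "MEAN")
agg.save("output_1km")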
Nibble Rasters
Sometimes we have raster files that are smaller than our mask. In such cases we use the Nibble tool to
grow the raster to match the mask raster. Nibble works only with integer rasters; if you have a float or
continuous raster, you will need to convert it to an integer first. Also, nibbling will occur only where
there are no NoData values in the mask.
1) Convert the input raster to an integer using the Raster Calculator, multiplying the raster by a factor of 10
2) Create the nibbling mask
3) Run Nibble
4) Export to raster
Using Python to Nibble
In the Python window:
#: indicates Python comments or notes that are not part of the script
>: indicates what is typed in the Python window, one line at a time
1. Start the Python window to execute commands. Type the commands one line at a time.
# load the Spatial Analyst functions (Con, IsNull, Nibble)
> from arcpy.sa import *
# create a new raster called test: where your integer raster for nibbling is NoData,
# assign 1; elsewhere keep that raster's own values
> test = Con(IsNull("<your integer raster for nibbling>"), 1, "<your integer raster for nibbling>")
# nibble test to the original raster as the mask and write values that have data;
# DATA_ONLY means NoData values do not influence the decision
> nibbled = Nibble(test, "<your integer raster for nibbling>", "DATA_ONLY")
Resample Rasters
The Resample (Data Management) tool alters the raster dataset by changing the cell size, using a
chosen resampling method.
* Before you use Resample, make sure that your rasters line up.
* If they do not, use the Raster Calculator and a snap raster to 'realign' the raster to the mask:
[Raster] * 1
Set Environments: set Extent to "kenya_Mask" AND Snap Raster to "kenya_Mask"
Use the Resample tool to get rasters to the desired resolution (5 km). (ArcToolbox\Data Management
Tools.tbx\Raster\Raster Processing\Resample)
Set Environment: Extent to "kenya_Mask"; Raster Analysis Cell Size: 0.041667 dd (~5 km)
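A scripted sketch of the same settings, with raster names assumed:
import arcpy
arcpy.env.extent = "kenya_Mask"
arcpy.env.snapRaster = "kenya_Mask"
arcpy.Resample_management("input_raster", "output_5km", 0.041667, "NEAREST")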
Project Rasters
Make sure all rasters are in the same projection before aggregation. If they are not, use the Project
Raster (Data Management) tool, which transforms a raster dataset from one projection to another.
3.4 Converting to Tabular Data
3.4.1 Extract Values to Points
Extract the raster cell values to the centroids using the ArcGIS Extract Values to Points tool
(ArcToolbox\Spatial Analyst Tools\Extraction\Extract Values to Points):
Input point features: Fishnet centroids
Input raster: [Raster], e.g. ACLED
Output point features: ACLED_Fishnet_Centroids
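The same step in the Python window, with the centroid layer name assumed:
from arcpy.sa import *
ExtractValuesToPoints("Kenya_centroids", "ACLED", "ACLED_Fishnet_Centroids")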
3.4.2 Export Values to Text Files
Add the point feature class to ArcMap and open the attribute table. Export the table as text. In Windows
you can change the file extension to .csv, or write your R script to accept .txt.
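A scripted alternative using the Table To Table tool, with an assumed output folder:
import arcpy
arcpy.TableToTable_conversion("ACLED_Fishnet_Centroids", r"C:\VIA\CSV", "acled.csv")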
Lecture 4. Introduction to the R Statistics
Environment
Exercise 4. Installing and Configuring R
Statistics
R is a language and environment for statistical computing and graphics. R provides a wide variety of
statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification,
clustering, ...) and graphical techniques, and is highly extensible. A major advantage of R over other
statistical software is that it is freely available under the terms of the Free Software Foundation’s GNU
General Public License in the source code form, thus having fewer restrictions on who gets to use it for
what. The Free Software status also allowed many users and developers to build a comprehensive
library of functions and packages which are freely available at the Comprehensive R Archive Network
(CRAN) http://cran.r-project.org/. CRAN is a network of ftp and web servers around the world that store
identical, up-to-date, versions of code and documentation for R.
In this tutorial we will:
• Download the latest stable version of R from one of the many CRAN sites
• Install and configure the downloaded file
• Make sure the Packages we need for the next two days are all installed and updated
You will only need to perform the first part of this tutorial if you are installing R on your own, with
administrative privileges. For most users in organizations, installation and the setting up of environments
and paths are done by the network administrator.
4.1 Downloading R distribution
R distributions are available for the Unix-like, Windows, and Mac families of operating systems. The
current release of the software is available via CRAN, http://cran.r-project.org/.
The rest of this tutorial focuses on the R for Windows distribution. The latest release of the
Windows distribution is R 3.0.2 for Windows. Those installing R on Windows for the first time should
download http://cran.r-project.org/bin/windows/base/R-3.0.2-win.exe.
The download file, R-3.0.2-win.exe, is 51.4 MB. Depending on your internet connection, the
download could take only a few minutes.
For this training we will need the following packages installed. Before proceeding, start R with
administrator privileges. This will enable you to install or update any package that may be needed.
The following packages are essential for the completion of this training:
• ggplot2
• plyr
• foreign
• xlsReadWrite
• gridExtra
• lattice
The Packages menu offers several options; select Update packages … to get the latest versions of all
installed packages (see below). The default selection is all packages in need of an update. You can
narrow the selection by clearing the default selection and selecting only those packages you want to
update. This feature can be useful in situations where internet connectivity is poor.
The Load package option will show all packages that are currently installed (see screenshot). In the
event that you cannot find your desired package, you can install packages using the Install packages
option. The first time you select to install packages, you will get a list of CRAN mirror sites (see
screenshot) from which you can download the desired package. While it is possible to download from
any CRAN site, it is wise to download from the mirror closest to your location.
Selecting a CRAN site will open the Packages window, which lists all
packages hosted at the CRAN site that are compatible with
your OS. You can select multiple packages by holding Ctrl while
selecting the packages.
Installing or updating packages requires a connection to the Internet.
Exercise 5. Getting Started with R
Learning Objectives
Part II of the vulnerability mapping is completed in R. The goal of this exercise is to introduce
beginners to the R environment and how it works. By the end of the exercise you will know how
to:
• Start and exit the R environment
• Set up workspaces and navigate between directories
• Work with the different data types
• Import and manipulate data frames
• Generate basic statistics and charts
Exercise data
The three data files used in this exercise are contained in data.zip. You will need to uncompress the
file into your ../VIA_R folder.
5.1 Start the R environment and setting up workspaces
Starting R opens the R Console window. The Console window, the default R Graphical User Interface
(RGui), allows interaction with the R environment. The '>' symbol indicates that R is waiting for a
command, while '#' indicates that the statement is a comment and not a command. R executes only
commands, not comments!
2. To set your working directory:
• Check your current directory with getwd(). If R is newly installed on a Windows
machine, the default working directory is C:/Windows/system32.
• Change the working directory to our project folder with setwd("D:/VIA/VIA_R")
Note: R writes paths with a forward slash '/' instead of a backslash '\'. It is desirable to create a separate
working directory for each new project. R is case sensitive: getwd() is not the same as GETWD().
3. Next we will change the working directory to our project folder. First, using Windows Explorer,
create a new folder in the location you desire and call it VIA_R. The path to the folder is
"D:\VIA\VIA_R".
Type setwd("D:/VIA/VIA_R") to set the working directory to the folder you just created.
Use the get working directory function getwd() to confirm the change of directory.
Data Types
R has a wide variety of data types, including scalars, vectors (numerical, character, logical), data frames,
and lists.
Vectors
The simplest data type in R is the vector. There are three main types of vector data: numeric,
character, and logical. A vector assignment results in an object in R.
4. To set up a vector named x, say consisting of one number, namely 5:
> x <- 5
The symbol '<-' denotes an assignment. Here x is assigned the value 5.
Type x and <enter> to confirm the assignment:
> x
[1] 5
The output confirms that x is now 5; the [1] indicates that the display starts at the first element of x.
A vector can also consist of a list of numbers, e.g. y consisting of all the prime numbers between 0 and 10:
> y <- c(2,3,5,7)
The function c() concatenates its arguments: given the argument list 2,3,5,7, the result is an object
whose components are the values 2, 3, 5, 7.
> y
[1] 2 3 5 7
This confirms that y holds the prime numbers between 0 and 10: 2, 3, 5, 7.
Your turn.
Now try the following vector assignments:
a <- 5
b <- 6
c <- a * b # multiply a by b
d <- a / b
Q1. What is the value of a * b…………………..?
We use bracket indexing of the form a[c(...)] to refer to elements in a vector. For instance, we can
choose to display only the 1st and 4th elements of the vector y <- c(2,3,5,7) by using:
• y[c(1,4)] # displays the 1st and 4th elements (2 and 7) in the list
• y[c(2,3)] # displays the 2nd and 3rd elements (3 and 5) in the list
Lists
In R, a list is an ordered collection of objects (components). A list allows us to gather a variety of
(possibly unrelated) objects under one name. In a GIS context, a list is akin to a record in an attribute
table, where the objects (components) are the attributes (items). You can create as many attributes as
you like for a record.
A list is created with the function list(). The object names must be specified when creating a list, or else
the names() function will return NULL.
L1 <- list(x,y)
L2 <- list(A=x, B=y)
Type L1 to see the content of the list (values shown for x <- 5 and y <- c(2,3,5,7) as defined above):
> L1
[[1]]        # component number 1
[1] 5        # values in the component
[[2]]        # component number 2
[1] 2 3 5 7  # values in the component
For L2 the components are named:
> L2
$A           # component named A
[1] 5
$B           # component named B
[1] 2 3 5 7
names() is a function to get or set the names of an object. For further details, please see
http://astrostatistics.psu.edu/su07/R/html/base/html/names.html.
Q. Use the names() function to list the labels for the vectors in each list:
L1 …………………………………………...
L2 ………………………………………………
Data Frames
A data frame is similar to an Excel spreadsheet or a SAS or SPSS dataset. A data frame contains
individual columns and rows. You can create a data frame by importing a table from another format
using the read.table function.
Importing Files
R has specialized functions to import from and export data frames to other formats. Depending on the
format, you may need to load additional packages, such as foreign, as explained in section 4.2.1; for
Excel, you need the xlsReadWrite package. In the following steps we will import data from the .TXT,
.CSV and .XLS formats.
7. Start a new R session and change the working directory to "../VIA/VIA_R". The directory
contains the exercise files, including POP.XLS (Excel format) and FEW.TXT (text format).
Note: header = TRUE means there is a header row in the table; sep = "," denotes that the separator
in the data is a comma (,); and quotes are denoted by " ".
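The import commands themselves appear as screenshots in the original; a minimal sketch, with the
data frame names assumed:
# text file: first row holds column names, fields separated by commas
FEWS <- read.table("FEW.TXT", header = TRUE, sep = ",", quote = "\"")
# read.csv() wraps read.table() with these defaults built in (file name hypothetical)
FEWS2 <- read.csv("FEWS.csv")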
b. Assign the POP data frame from the POP.XLS data. odbcConnectExcel() (from the RODBC
package) returns a connection, from which sqlFetch() reads a sheet (the sheet name below is
an assumption):
> channel <- odbcConnectExcel(file.choose())
> POP <- sqlFetch(channel, "POP")
The file.choose() function opens a file dialog box that allows you to navigate to the "../VIA"
folder and select POP.XLS.
You now have all three data frames imported in R. You can use the following functions to explore the
content:
• names(data_frame) – e.g. names(FEWS) – lists the column names
• View(data_frame) # note the use of the upper-case V
• hist(data_frame$column)
• quantile(data_frame$column, c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9))
Now we will join the FEWS and POP data frames using PixelID as the join field.
First use the names() function to list the fields in each of the tables. Note the sequencing of the fields.
PixelID is used as the join field, and joining starts with the first pixel in each data frame. It is important
to make sure that each of the files has the same sequencing.
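The join command appears as a screenshot in the original; based on the join() calls used in Exercise 6, it
was presumably of this form:
library(plyr)  # provides the join() function
join1 <- join(FEWS, POP, match = "first", by = "PixelID")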
Exporting data
Like import, R allows us to export data frames to a number of formats. Use the following code to
export the join1 table to a comma-separated values (.csv) file:
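The export command appears as a screenshot; a minimal sketch using base R:
# row.names = FALSE suppresses the extra row-number column
write.csv(join1, file = "join1.csv", row.names = FALSE)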
5.3. Learning More
Online Tutorials
An Introduction to R- Notes on R: A Programming Environment for Data Analysis and Graphics-
http://cran.r-project.org/doc/manuals/R-intro.pdf
http://www.cyclismo.org/tutorial/R/genindex.html
Exercise 6. Getting Started with RStudio
In Exercises 2–3 we processed the various input layers using various GIS methods,
converted them to rasters, and exported each individual raster to a comma-separated values
table. Each table contains a field holding the raster sequence number and a column
representing the variable of interest.
In Exercises 4 and 5 we used the command-line RGui to handle vectors, lists, and data frames. In
this exercise we will use the RStudio IDE to implement the rest of the data processing.
6.1. RStudio Console
When started for the first time, the RStudio GUI has three panels, namely Console,
Environment/History, and Files/Plots/Packages/Viewer/Help.
a. The Console panel is the R command panel. This is where you type commands such as
getwd() to inquire which folder R is using as the working directory. The Console
panel divides in two when data are imported: the upper half of the panel turns into a
data viewer (to help visualize the data frame active in the Environment panel), while the
lower part of the panel remains the Console.
b. The Environment/History panel – The Environment panel shows the data frames in the
workspace. This panel allows us to open saved environments or import data sets. The
History panel contains the list of commands and functions used in the session. You can
send commands from this panel to the Console or to Source (new scripts).
c. The Files/Plots/Packages/Help/Viewer panels – We manage files in the Files panel and
packages in the Packages panel. Plots are drawn in the Plots panel.
Working in the RStudio IDE allows you to create separate projects for your work. Projects make it
easier to divide your work into multiple contexts, each with its own working directory,
workspace, history, and source documents.
2. Click on the R Project drop-down menu and select New Project.
Select Create project from: Existing Directory. This will allow you to navigate to the CSV folder
containing the .CSV files.
3. In the Packages panel, check the box next to each of the following packages to load it.
If in the process you get an error that a package failed to load because its dependencies are not
found, you can search for and install the dependencies by clicking on the Install Packages button in
the panel.
We are now ready to read in the .CSV files representing the input variables.
1. Click on the Import Dataset button in the Environment panel to start the RStudio import
wizard, then select From Text File.
This opens the Select File to Import Dataset dialog window.
Make sure the dataset name in the Name field is correct, indicate whether or not the data have a
heading row, and check that the other settings (Separator, Decimal, and Quote) are correct.
The Import Dataset window also displays the Input File and the Output Data Frame. This allows
you to confirm that all is well before hitting the Import button to complete.
The imported data will be displayed in the panel on the left. You can view the first 1,000 rows of
your document as well as the total number of rows in the dataset, in this case 1,538,278.
Your Turn: Use the same approach to import the remaining CSV files into R.
For example, to join the biomes (ANTH) and soil carbon (CARB) datasets, the following
command was used:
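The command appears as a screenshot in the original; judging from the join2 command below, it was
presumably:
library(plyr)
join1 <- join(ANTH, CARB, match="first", by="PIXELID")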
Now join the rest of the data frames, beginning by joining the second data set in the list to join1 to
create join2, then the third to join2 to create join3, and so on.
join2 <- join(join1, EDUC, match="first", by="PIXELID")
We then need to clean up our workspace of unnecessary data frames. This means removing all
joined files except the last, via_data.
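A minimal sketch, assuming the final join was saved as via_data and the intermediates were join1
through join3:
via_data <- join3       # keep the fully joined table
rm(join1, join2, join3) # remove the intermediate data frames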
It is general practice in statistics to employ exploratory data analysis tools to check for missing
data and determine the central tendency of the input variables.
In this step we will use the following data exploration techniques to examine each of our
variables.
We begin by displaying the list of variable names in the via_data data frame using the
command:
names(via_data)
6.2.1 Numerical measures
1. Mean – We use the function mean(dataframe$field) to calculate the mean:
> mean(join3$ANTH) # computes the mean of the ANTH field in the join3 data frame
> mean(join3$DCVAR, na.rm=TRUE)
[1] 25.3434
2. Median – The median of an observation variable is the value at the middle when the
data are sorted in ascending order. It is an ordinal measure of the central location of the
data values: the value of the middle observation when all the observations are lined up
from lowest to highest value.
3. Quartiles – The quantile() function reports the minimum, the 25th, 50th (median), and 75th
percentiles, and the maximum, for example:
> quantile(join3$DCVAR, na.rm=TRUE)
0%        25%       50%       75%       100%
2.378686  22.357409 25.346428 28.378949 38.683297
You can specify the break points using the concatenation function, e.g.
quantile(join3$DCVAR, c(0, 0.1, 0.25, 0.5, 0.75, 1.0), na.rm=TRUE).
4. Percentile – The nth percentile of an observation variable is the value that cuts off the first n
percent of the data values when they are sorted in ascending order. For instance, when we
sort a variable from lowest to highest observation, the 66th percentile is the value
that cuts off the first 66 percent of the data observation points. The quantile() function is used
to compute the percentiles, with the break points supplied via the concatenate function c().
5. Standard deviation – The standard deviation (SD) shows how much dispersion from the
average exists. It is mathematically expressed as the square root of the variance.
If the missing data are coded as -9999, recode those values to NA before calculating the statistic.
If the missing data are already coded as NA in the dataset, the argument na.rm=TRUE removes
them from the calculation.
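A minimal sketch of both cases, with the field name as an example:
via_data$ANTH[via_data$ANTH == -9999] <- NA  # recode -9999 no-data values to NA
sd(via_data$ANTH, na.rm = TRUE)              # standard deviation ignoring NAs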
6. Summary statistics – Summary statistics are used to summarize a set of observations. The
function summary() displays the minimum, 1st quartile, median, mean, 3rd quartile, and maximum:
> summary(via_data$ANTH)
Min. 1st Qu. Median Mean 3rd Qu. Max.
11.00 41.00 43.00 47.15 63.00 63.00
Your turn: Now use the functions above and explore each of the variables in the via_data data
frame.
6.2.2 Graphical Methods
R contains a number of tools for graphical exploratory data analysis,
including:
• Histograms
• Scatter plots
• Box plots
1. Histograms – A histogram consists of parallel vertical bars that graphically show the
frequency distribution of a quantitative variable. A histogram is a graphical representation of
the data in which the area of each bar is equal to the frequency of items found in each class.
> hist(via_data$CONF, main="Conflict", xlab="Ratio", ylab="Number of grids")
We can use the ggplot() function to make more elaborate histograms with labels:
> plot_anth <- ggplot(via_data, aes(x=ANTH)) +
    geom_histogram(colour="darkgreen", fill="white") +
    scale_x_continuous("name of x axis") +
    scale_y_continuous("name of y axis") +
    ggtitle("ANTH graph")
The ggplot() function is used to declare the input data frame for a graphic and to specify the set of
plot aesthetics intended to be common throughout all subsequent layers unless specifically
overridden.
The aes() aesthetics function describes how variables in the data are mapped to visual properties
of geometries.
> plot_anth
2. Scatter plots – A scatter plot provides a graphical view of the relationship between two
variables. The data are displayed as a collection of points, each having the value of one variable
determining the position on the horizontal axis and the value of the other variable determining
the position on the vertical axis.
For instance, the scatter plot of the ANTH and DCVAR variables is:
> plot(via_data$ANTH, via_data$DCVAR, xlab="Anthropogenic biome", ylab="Variance")
3. Box plots – A box plot provides a graphical view of the median, quartiles, maximum, and
minimum of a data set. Box plots display differences between groups without making any
assumptions about the underlying statistical distribution, and are most useful for the
identification of outliers.
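For example, a minimal sketch:
> boxplot(via_data$ANTH, main="Anthropogenic biomes")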
6.3 Winsorization
Extreme values sometimes have a big effect, positive or negative, on statistical operations.
Winsorization is the transformation of statistics by limiting extreme values in the data in order to
reduce the effect of possibly spurious outliers. (A common alternative for dealing with outliers is
to trim or truncate the data.) A 90% winsorization sets all data below the 5th percentile to the 5th
percentile and all data above the 95th percentile to the 95th percentile. We usually winsorize at
one end only, where we observe the presence of outliers, i.e. where the data are highly skewed
to the left or to the right.
Example of Winsorization
Order the observations by value. For a 90% winsorization of 100 ordered observations:
The values for X1, X2, X3, X4 will be replaced by the value of X5
The values for X96, X97, X98, X99, X100 will be replaced by the value of X95
We recode the values below the 5th percentile and above the 95th percentile to those percentile
values.
1. Compute the 5th and 95th percentiles of the field:
> quantile(via_data$ANTH, c(0.05, 0.95), na.rm=TRUE)
5% 95%
32  63
2. Recode all values below the 5th percentile (32) in the ANTH field to 32. This creates a new
field called ANTHr1:
> via_data$ANTHr1 <- via_data$ANTH
> via_data$ANTHr1[via_data$ANTHr1 < 32] <- 32
3. Recode all values above the 95th percentile (63) in ANTHr1 to 63. This creates a new field
called ANTHr2:
> via_data$ANTHr2 <- via_data$ANTHr1
> via_data$ANTHr2[via_data$ANTHr2 > 63] <- 63
The resulting data frame contains the original ANTH field; the ANTHr1 field, with values less than
the 5th percentile (32) recoded to 32; and the ANTHr2 field, with values greater than the 95th
percentile (63) recoded to 63.
6.4 Rescale variable values to 0-100
The variables have different units of measurement, so the next step in the analysis is to bring
all the data to a common 0-100 scale, where 0 equates to lower vulnerability and 100 equates
to high vulnerability. For variables where larger values represent low vulnerability (e.g. years of
education), we invert the data.
Data rescaling for a variable where low values indicate low vulnerability (example: conflict):
1) min_conf <- min(via_data$CONF)
2) max_conf <- max(via_data$CONF)
3) via_data$var_pct <- (via_data$CONF - min_conf)/(max_conf - min_conf)*100
Command (1) creates a new object (min_conf) and sets its value to the minimum value in the
CONF field.
Command (2) creates a new object (max_conf) and sets its value to the maximum value in the
CONF field.
Command (3) subtracts the minimum value created in (1) from the CONF field, divides by the
maximum minus the minimum, and multiplies by 100, storing the rescaled values in the new
field var_pct.
Data rescaling for a variable where low values indicate high vulnerability (example: education):
1) min_educ <- min(via_data$EDUC)
2) max_educ <- max(via_data$EDUC)
3) via_data$educ_pct <- (via_data$EDUC - max_educ)/(min_educ - max_educ)*100
Note that the maximum and minimum switch places in the formula, which inverts the scale (the
output field name educ_pct is used here to avoid overwriting var_pct).
6.5 Calculate composites using weighted averages, and rescale to 0-100
The input variables included in our spreadsheet are measures of the three dimensions of
vulnerability: adaptive capacity, exposure, and sensitivity. In the following step, we will use the
clean, transformed data generated in the previous steps to calculate these composites.
The calculation of the composites is based on an average. If you think that some of the variables
have a larger importance within a vulnerability dimension, whether because of data quality,
resolution, or conceptual meaning, weights are assigned to the variables to account for this. For
example, with weights that sum to 1:
via_data$adaptive <- via_data$var1*0.2857 + via_data$var2*0.2857 +
  via_data$var3*0.1428 + via_data$var4*0.1428 + via_data$var5*0.1428
The rescaling to the 0-100 range is done in the same way as the variable rescaling, naming the
result adaptive_prc.
6.6 Calculate vulnerability, and rescaling to 0-100
The overall vulnerability is calculated using a simple average of the three composites calculated
in the previous step:
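The command appears as a screenshot in the original; a minimal sketch, with the rescaled composite
field names (adaptive_prc, exposure_prc, sensitivity_prc) assumed:
via_data$vulnerability <- (via_data$adaptive_prc + via_data$exposure_prc +
                           via_data$sensitivity_prc)/3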
The rescaling to the 0-100 range is done in the same way as the variable and composite rescaling,
naming the index vulnerability_prc.
Exercise 7. Mapping Results in ArcMap
1. In ArcCatalog, import the table generated from R, "composites_for_mapping_table", as a table in a
geodatabase.
2. In ArcMap, join the "composite table" by the PixelID field to a copy of the "centroids" feature class.
3. If the analysis area is smaller than, or a subset of, the "mask", then create a query using "Select By
Attributes" where PixelID IS NOT NULL.
4. Export a new point feature class, "composite_table_points", with the attributes from the composite
table.
5. Create rasters for each component and each PCA in "composite_table_points" using the
ArcGIS Point to Raster tool (ArcToolbox\Conversion Tools\To Raster\Point To Raster).
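A sketch of step 5 for a single field, with the field and output names assumed:
import arcpy
arcpy.env.snapRaster = "Kenya_Mask"
arcpy.PointToRaster_conversion("composite_table_points", "vulnerability_prc",
                               "vulnerability_ras", "MEAN", "NONE", "Kenya_Mask")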