Point Pattern Analysis

Point pattern analysis in GIS

Uploaded by

elitebookenya
Copyright
© All Rights Reserved

SPATIAL ANALYSIS

EGS 2310

Point Pattern Analysis (PPA)


Lecture No. 06

Felix Mutua, Ph. D


Tuesday, February 27, 2024

Lecture Plan

Week  Topic
1     Introduction to SDA
2     Geometric-Based SDA techniques
3     Queries, Computations and density
4     Tutorial on Spatial Queries on PostgreSQL
5     Overlay Analysis
6     Point Pattern Analysis
7     Surface Field Analysis
8     CAT I
9     Network Analysis
10    Districting and normalization
11    Spatial Autocorrelation
12    Spatial Regression
13    Spatial Modelling
14    CAT II

Outline

• Centrography
• Density-based analysis
• Distance-based analysis

Introduction
• Point pattern analysis (PPA) is the study of the spatial arrangement of points in (usually 2-dimensional) space.
• The easiest way to visualize a 2-D point pattern is a map of the locations, which is simply a scatterplot with the provision that the axes are equally scaled.
• Another straightforward way to visualize the points is a 2-D histogram (sometimes called a quadrat count) that bins the points into rectangular regions. A benefit of quadrat analysis is that it forces the analyst to take into account the scales at which statistically significant inhomogeneities may be occurring.
Centrography

• A very basic form of point pattern analysis involves summary statistics such as the mean center, standard distance and standard deviational ellipse.
• More powerful analysis methods can be used to explore point patterns. These methods can be classified into two groups: density-based approaches and distance-based approaches.
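The first two summary statistics can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names and the sample points are hypothetical, and the standard distance is taken here as the root mean squared distance of the points from the mean center.

```python
import math

def mean_center(points):
    """Return the mean center (average x, average y) of a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return mx, my

def standard_distance(points):
    """Return the root mean squared distance of the points from their mean center."""
    mx, my = mean_center(points)
    n = len(points)
    return math.sqrt(sum((x - mx) ** 2 + (y - my) ** 2 for x, y in points) / n)

# Hypothetical sample points (not from the lecture).
pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center = mean_center(pts)
sd = standard_distance(pts)
```

A small standard distance relative to the size of the study area suggests that the points are concentrated around the mean center.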
Density based analysis
• Density-based techniques characterize the pattern in terms of its distribution vis-à-vis the study area, which is a first-order property of the pattern.
• A first-order property of a pattern concerns the variation of the observations’ density across a study area. For example, the distribution of a certain tree species will vary across a landscape based on underlying soil characteristics (resulting in some areas having dense clusters of the species and other areas not).
• Density-based techniques are:
– Global density
– Local density

Global density
• A basic measure of a pattern’s density λ is its overall, or global, density. This is simply the ratio of the observed number of points, n, to the study region’s surface area, a: λ = n/a.

Figure: An example of a point pattern where n = 20 and the study area (defined by a square boundary) measures 10 × 10 units, i.e. 100 square units. The point density is thus 20/100 = 0.2 points per unit area.

Local density

• A point pattern’s density can be measured at different locations within the study area. Such an approach helps us assess whether the density (and, by extension, the underlying process’ local, modeled intensity λ) is constant across the study area.
• This can be an important property of the data since it may need to be accounted for when using the distance-based analysis tools.
• Several techniques for measuring local density are available. Here we focus on two such methods:
– quadrat density and
– kernel density.

Quadrat density
• This technique requires that the study area be divided into sub-regions (also known as quadrats).
• Then, the point density is computed for each quadrat by dividing the number of points in each quadrat by the quadrat’s area.
• Quadrats can take on many different shapes, such as hexagons and triangles. Here we use square quadrats to demonstrate the procedure.

Figure: An example of a quadrat count where the study area is divided into four equally sized quadrats whose area is 25 square units each. The density in each quadrat can be computed by dividing the number of points in each quadrat by that quadrat’s area.
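The global and quadrat densities can be computed as in the following minimal sketch. The function name, the grid layout and the sample points are hypothetical (they are not the data behind the lecture's figure), and the points are assumed to lie inside the study area.

```python
def quadrat_density(points, xmin, ymin, xmax, ymax, nx, ny):
    """Count points in an nx-by-ny grid of equal rectangles and return densities.

    Returns a list of rows (row 0 is the bottom row); each value is the
    quadrat's point count divided by the quadrat's area.
    """
    width = (xmax - xmin) / nx
    height = (ymax - ymin) / ny
    counts = [[0] * nx for _ in range(ny)]
    for x, y in points:
        # Clamp so that points on the upper/right boundary fall in the last quadrat.
        col = min(int((x - xmin) / width), nx - 1)
        row = min(int((y - ymin) / height), ny - 1)
        counts[row][col] += 1
    area = width * height
    return [[c / area for c in row] for row in counts]

# Hypothetical points in a 10 x 10 study area.
pts = [(1, 1), (2, 3), (4, 4), (6, 2), (7, 8), (9, 9), (8, 6), (3, 7)]
global_density = len(pts) / (10 * 10)              # lambda = n / a
local = quadrat_density(pts, 0, 0, 10, 10, 2, 2)   # four quadrats of 25 square units
```

Comparing the values in `local` with `global_density` shows where the pattern is denser or sparser than the study-area average.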
Quadrat density
• The number and shape of the quadrats can influence the measure of local density and must be chosen with care.
• If very small quadrats are used, you risk having many quadrats with no points, which may prove uninformative. If very large quadrats are used, you risk missing subtle changes in spatial density distributions such as the east-west gradient in density values in the above example.

Quadrat density
• Quadrat regions do not have to take on a uniform pattern across the study area; they can also be defined based on a covariate.
• For example, if it’s believed that the underlying point pattern process is driven by elevation, quadrats can be defined by sub-regions such as different ranges of elevation values (labeled 1 through 4 on the right-hand plot in the following example).
• This can result in quadrats having non-uniform shape and area.

Figure: Example of a covariate. The figure on the left shows the elevation map. The figure on the right shows elevation broken down into four sub-regions (a tessellated surface) for which local density values will be computed.

If the local intensity changes across the tessellated covariate, then there is evidence of a dependence between the process that generated the point pattern and the covariate.
Kernel density
• The kernel density approach is an extension of the quadrat method: like the quadrat density, the kernel approach computes a localized density for subsets of the study area, but unlike its quadrat density counterpart, the sub-regions overlap one another, providing a moving sub-region window.
• This moving window is defined by a kernel. The kernel density approach generates a grid of density values whose cell size is smaller than that of the kernel window. Each cell is assigned the density value computed for the kernel window centered on that cell.
• A kernel not only defines the shape and size of the window, but it can also weight the points following a well-defined kernel function. The simplest function is a basic kernel where each point in the kernel window is assigned equal weight.

Figure: An example of a basic 3x3 kernel density map (ArcGIS calls this a point density map) where each point is assigned an equal weight. For example, the second cell from the top and left (i.e. centered at location x = 1.5 and y = 8.5) has one point within a 3x3 unit (pixel) region and thus has a local density of 1/9 = 0.11.
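The equal-weight moving window can be sketched as follows. This is an illustrative implementation under stated assumptions, not the ArcGIS algorithm: the function name, the grid parameters and the single sample point are hypothetical, and no edge correction is applied, so windows near the boundary extend beyond the study area.

```python
def point_density(points, xmin, ymin, ncols, nrows, cell=1.0, window=3.0):
    """Equal-weight moving-window density on a regular grid.

    For each cell, count the points inside a square window of side `window`
    centered on the cell center, then divide by the window area.
    Row 0 is the bottom row of the grid.
    """
    half = window / 2.0
    grid = []
    for r in range(nrows):
        cy = ymin + (r + 0.5) * cell
        row = []
        for c in range(ncols):
            cx = xmin + (c + 0.5) * cell
            inside = sum(1 for x, y in points
                         if abs(x - cx) <= half and abs(y - cy) <= half)
            row.append(inside / (window * window))
        grid.append(row)
    return grid

# One hypothetical point in a 10 x 10 study area with 1 x 1 cells.
pts = [(2.2, 7.9)]
grid = point_density(pts, 0, 0, 10, 10)
# The cell centered at (1.5, 8.5) is column 1, row 8 (rows counted from the bottom).
density_at_cell = grid[8][1]
```

With one point inside the 3x3 window, `density_at_cell` equals 1/9, matching the arithmetic in the figure caption.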
Kernel density
• Some of the most popular kernel functions assign weights to points that decrease with their distance from the kernel window center.
• A few such kernel functions follow a Gaussian or quartic-like distribution function. These functions tend to produce a smoother density map.

Figure: An example of a kernel function is the 3x3 quartic kernel function where each point in the kernel window is weighted based on its proximity to the kernel’s center cell (typically, closer points are weighted more heavily). Kernel functions, like the quartic, tend to generate smoother surfaces.
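A distance-weighted kernel can be sketched with the standard two-dimensional quartic (biweight) kernel, K(u) = (3/π)(1 − u²)² for u < 1. Note the assumptions: this version uses a circular window of radius `bandwidth` rather than the 3x3 cell window in the figure, and the function name and parameters are hypothetical.

```python
import math

def quartic_kde(points, sx, sy, bandwidth):
    """Quartic-kernel density estimate at location (sx, sy).

    Each point within `bandwidth` of the location contributes
    (3 / (pi * h^2)) * (1 - d^2 / h^2)^2, so closer points weigh more.
    """
    h2 = bandwidth ** 2
    total = 0.0
    for x, y in points:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 < h2:
            total += (3.0 / (math.pi * h2)) * (1.0 - d2 / h2) ** 2
    return total

# Hypothetical check: a point at the estimation location versus one halfway out.
at_center = quartic_kde([(0.0, 0.0)], 0.0, 0.0, 1.0)
halfway = quartic_kde([(0.5, 0.0)], 0.0, 0.0, 1.0)
```

Evaluating `quartic_kde` at every cell center of a grid yields the smoother surface described above. The bandwidth controls the degree of smoothing.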
Modeling intensity as a function of a covariate
• So far, we have learned techniques that describe the distribution of points across a region of interest. But it is often more interesting to model the relationship between the distribution of points and some underlying covariate by defining that relationship mathematically.
• This can be done by exploring the changes in point density as a function of a covariate. However, unlike the techniques explored thus far, this approach makes use of a statistical model.
• One such model is a Poisson point process model.
Distance based analysis
• An alternative to the density-based methods explored thus far is the family of distance-based methods for pattern analysis, whereby the interest lies in how the points are distributed relative to one another (a second-order property of the point pattern) as opposed to how the points are distributed relative to the study extent.
• A second-order property of a pattern concerns the observations’ influence on one another. For example, the distribution of oaks will be influenced by the location of parent trees: where parent oaks are present, we would expect dense clusters of oaks to emerge.
• Three distance-based approaches are covered next: the average nearest neighbor (ANN), the K function, and the pair correlation function.

Average Nearest Neighbour
• An average nearest neighbor (ANN) analysis measures the average distance from each point in the study area to its nearest neighboring point.
• In the following example, the average nearest neighbor distance for all points is 1.52 units.

Average Nearest Neighbour

• An extension of this idea is to plot the ANN values for different order neighbors, that is for the first closest point, then the second closest point, and so forth.
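The ANN for any neighbor order can be computed by sorting each point's distances to all other points. The sketch below is a brute-force illustration (fine for small patterns), with a hypothetical function name and sample points.

```python
import math

def ann(points, order=1):
    """Average distance from each point to its `order`-th nearest neighbor."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(math.hypot(xi - xj, yi - yj)
                       for j, (xj, yj) in enumerate(points) if j != i)
        total += dists[order - 1]
    return total / len(points)

# Hypothetical pattern: the four corners of a unit square.
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
ann_curve = [ann(pts, k) for k in range(1, len(pts))]  # orders 1, 2, 3
```

Plotting `ann_curve` against the neighbor order gives the ANN curve discussed on the following slides.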

Average Nearest Neighbour

• The shape of the ANN curve as a function of neighbor order can provide insight into the spatial arrangement of points relative to one another.
• In the following example, three different point patterns of 20 points are presented.

Figure: Three different ANN vs. neighbor order plots. The black ANN line is for the first point pattern (single cluster); the blue line is for the second point pattern (double cluster) and the red line is for the third point pattern.
Average Nearest Neighbour
• The bottom line (black dotted line) indicates that the cluster (left plot) is tight and that the distances between a point and all other points are very short.
• This is in stark contrast with the top line (red dotted line), which indicates that the distances between points are much greater.
• Note that the way we describe these patterns is heavily influenced by the size and shape of the study region.
• If the region were defined as the smallest rectangle encompassing the cluster of points, the cluster of points would no longer look clustered.

K functions
• The average nearest neighbor (ANN) statistic is one of many distance-based point pattern analysis statistics. Another is the K function, which summarizes the distances between points over a whole range of distances.
• The calculation of K is fairly simple: for each distance lag, the mean number of points found within that distance of each point is divided by the overall event density.
• For example, for point S1 we draw circles, each of varying radius d, centered on that point. We then count the number of points (events) inside each circle. We repeat this for point S2 and all other points Si. Next, we compute the average number of points in each circle, then divide that number by the overall point density λ (i.e. total number of events per unit of study area).

K functions
• The K function is a powerful approach to identify the multi-scale patterns of points. The three steps of the K function are:
– (i) Construct a circle with a radius d around each point i;
– (ii) Count the total number (n) of points that fall inside any of the circles (excluding the points at the circle centers); and
– (iii) Increment d by a small fixed amount and repeat the first two steps.
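The steps above can be sketched as a naive K function estimate. This is a minimal illustration without edge correction, which dedicated packages normally apply. The function name and sample points are hypothetical.

```python
import math

def k_function(points, area, distances):
    """Naive K function: mean neighbor count within d, divided by density.

    No edge correction is applied, so values are biased low for points
    near the boundary of the study area.
    """
    n = len(points)
    lam = n / area  # overall point density (events per unit area)
    result = []
    for d in distances:
        pairs = 0
        for i, (xi, yi) in enumerate(points):
            for j, (xj, yj) in enumerate(points):
                # Exclude the point at the circle center (i == j).
                if i != j and math.hypot(xi - xj, yi - yj) <= d:
                    pairs += 1
        result.append((pairs / n) / lam)
    return result

# Hypothetical pattern: corners of a unit square inside a study area of 4 square units.
pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
k_values = k_function(pts, area=4.0, distances=[0.5, 1.0, 1.5])
```

For a completely random (homogeneous Poisson) pattern, K(d) is expected to be close to πd². Estimates above that benchmark suggest clustering at distance d, and estimates below it suggest dispersion.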

Tools and Packages for PPA
There are many software packages and tools developed to analyze and model point patterns. Here is a list of a few commonly used ones:

Package      Programming language   Description
ArcMap       None required          Includes a comprehensive toolkit for PPA, such as the calculation of descriptive statistics, distance-based measures, density-based measures, etc.
ArcGIS Pro   None required          Includes a comprehensive toolkit for PPA, such as the calculation of descriptive statistics, distance-based measures, density-based measures, etc.
ArcPy        Python                 A Python package that allows users to conduct the spatial analysis function of ArcGIS in Python.
CrimeStat    None required          A package specialized in point-based crime data analysis.
PySAL        Python                 A geospatial data science tool in Python that includes a sub-package “pointpats” for PPA.
QGIS         None required          An open-source geographic information system that includes customized plugins for PPA.
spatstat     R                      An R package specialized in spatial point pattern analysis in both 2-dimensional and 3-dimensional spaces.
Case Study: Cholera in London, 1854

Background

• To demonstrate the effectiveness of PPA, we use the deadly cholera outbreak in 1854 London as an example (Snow, 1855).
• At the time, many people believed this infectious disease was spread by inhaling mysterious “miasmas” (i.e., bad air).
• By mapping the cholera deaths near the Soho neighborhood, however, Dr. John Snow hypothesized that it could be caused by oral consumption of contaminated food or water sources.
• Dr. Snow purportedly stopped the epidemic by removing the pump handle at Broad Street.
Background
• One could also test Dr. Snow’s suspicion of the Broad Street pump by deriving the mean center of all cholera deaths in the Soho neighborhood.
• The mean center and the median center of all cholera cases in Soho both fall close to the water pump at Broad Street.

Background

• In this dataset, the median center is not far from the mean center. Moreover, the directional distribution (i.e., the 1st and 2nd standard deviations) fits the pattern of these cases very well. The cholera data used in this case study has a count attribute for each address, so it was used as a weight in the point pattern calculations.

Mean center

• Equation 1 for the mean center (μx, μy) becomes

μx = Σ(wi × xi) / Σ wi,   μy = Σ(wi × yi) / Σ wi   (sums over i = 1, …, n)

where wi is the death count at location i, used as the weight, and n is the total number of points. The weighted mean center can be derived as:
μx = [(1,114,518 × 3) + (1,114,522 × 2) + … + (1,114,551 × 1)] / 489 = 1,114,624
μy = [(5,744,204 × 3) + (5,744,198 × 2) + … + (5,744,032 × 1)] / 489 = 5,744,215
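The weighted calculation can be sketched as follows: each coordinate is multiplied by its weight (here, the death count at that address) and the sums are divided by the total weight. The function name and the sample values are hypothetical. They are not the cholera data.

```python
def weighted_mean_center(points, weights):
    """Return the weighted mean center of (x, y) points.

    Each coordinate is weighted by the matching entry in `weights`,
    and the weighted sums are divided by the sum of the weights.
    """
    total_w = sum(weights)
    mx = sum(w * x for (x, _), w in zip(points, weights)) / total_w
    my = sum(w * y for (_, y), w in zip(points, weights)) / total_w
    return mx, my

# Hypothetical example: two addresses with 3 deaths and 1 death respectively.
pts = [(0.0, 0.0), (10.0, 0.0)]
deaths = [3, 1]
center = weighted_mean_center(pts, deaths)
```

The result is pulled toward the address with the larger count, which is why the weighted mean center reflects where deaths, rather than addresses, are concentrated.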

Mean center

Average Nearest neighbour
• One may also analyze the point pattern of the London cholera outbreak based on spatial statistics. Comparing the observed mean distance between nearest neighbors among all cholera cases with the distance expected under a random pattern gives an average nearest neighbor ratio of 0.76, indicating a clustering pattern that is significant at the 0.01 level.

Figure: The results of the average nearest neighbor statistics.
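The nearest neighbor ratio compares the observed mean nearest-neighbor distance with the value expected for a random pattern of the same density, 0.5 / sqrt(n / A). The sketch below also computes a z-score using the commonly used standard error 0.26136 / sqrt(n² / A). The function name and sample points are hypothetical, and no edge correction is applied.

```python
import math

def ann_ratio(points, area):
    """Return (ratio, z) for the average nearest neighbor statistic.

    ratio < 1 suggests clustering; ratio > 1 suggests dispersion.
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)          # mean distance under randomness
    std_err = 0.26136 / math.sqrt(n * n / area)   # standard error under randomness
    return observed / expected, (observed - expected) / std_err

# Hypothetical tight cluster of four points in a study area of 100 square units.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
ratio, z = ann_ratio(pts, area=100.0)
```

Because the expected distance depends on the area, the ratio is sensitive to how the study region is defined.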

Average Nearest neighbour
• The London cholera outbreak’s clustered pattern also makes it possible to explore the nearest neighbor of any of the outbreak’s geographic features to examine their spatial relationship.
• For example, Thiessen polygons delineate the “area of influence” for each feature in set A (e.g., water pumps), so that any feature of set B (e.g., cholera cases) that falls inside a particular Thiessen polygon is closer to that polygon’s feature than to any other feature of set A.
• As shown in the figure, most cholera cases were inside the Thiessen polygon of the Broad Street pump, which makes it a reasonable suspect based on their spatial relationship.

Figure: The Thiessen polygons of water pumps in 1854 London.
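Because a Thiessen polygon contains exactly the locations that are closer to its generating feature than to any other, polygon membership can be found without drawing the polygons: assign each case to its nearest pump. The sketch below uses straight-line distance, with hypothetical pump names and coordinates (not the 1854 data).

```python
import math
from collections import Counter

def nearest_site(case, sites):
    """Return the name of the site nearest to `case`.

    This is the site whose Thiessen polygon contains the case
    (ties are resolved in favor of the first site encountered).
    """
    return min(sites, key=lambda name: math.hypot(case[0] - sites[name][0],
                                                  case[1] - sites[name][1]))

# Hypothetical pumps and cases.
pumps = {"A": (0, 0), "B": (10, 0), "C": (5, 8)}
cases = [(1, 1), (2, 0), (0, 2), (9, 1), (5, 7)]
cases_per_pump = Counter(nearest_site(c, pumps) for c in cases)
```

Counting the cases per pump in this way gives the same tally as overlaying the cases on the Thiessen polygons. It reflects straight-line proximity only, not walking distance along streets.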
Density Functions
• The density surface can provide another way to visualize the concentration of cholera cases.
• By measuring the number of cases per unit area over a region, the kernel density surface yet again pinpoints the highest concentration of cholera cases at the water pump at Broad Street.
• Here, the study area is chosen as the bounding box of the cholera cases. Note that the mean center of cholera cases aligns well with the highest density, and it also falls inside the Thiessen polygon of the Broad Street water pump.

Figure: The kernel density of cholera cases in 1854 London.
Summary
• Point pattern analysis provides an effective way to visualize and interpret the
distribution of point patterns across space.
• It is particularly useful for conducting exploratory analysis at an early stage of a
research project. This lecture reviewed commonly used methods for PPA,
including descriptive statistics, distance-based measures, and density-based
measures.
• These measurements provide effective tools for understanding the global and
local patterns of point data.
• Any of the aforementioned point pattern analyses can therefore be used to examine the geographic pattern of cholera cases and their spatial relationship with other geographic features. This allows us to test probable theories and associated hypotheses and gain a better understanding of the underlying phenomenon.

Assignment A06
Each student is required to identify a unique phenomenon that exhibits point patterns. This could be disease outbreaks, locations of facilities, settlements, etc.
• Identify, describe and document the phenomenon based on peer-reviewed research
• Collect and process the point data
• Develop a hypothesis and framework for pattern analysis
• Perform point pattern analysis (at least 5 metrics)
• Describe the patterns and explain why they occur
The outputs from this assignment are: 1) a detailed report (5 pages max), 2) a database containing the data, models, inputs and outputs and 3) scripts used in data processing. Submission link: https://forms.gle/qXV7ncJWP8qDHei58. Deadline: next week, 9am.
NB: Class representative to coordinate to avoid duplication. Similar submissions will be rejected.
