01 Spatial - Correlation
Overheads
D G Rossiter
Department of Earth Systems Analysis
International Institute for Geo-information Science & Earth Observation (ITC)
<http://www.itc.nl/personal/rossiter>
Topic: Resources
There are many resources, at various mathematical levels, some aimed at
particular applications. These lists are not comprehensive but should be good
starting points:
• Texts
• Web pages
• Computer programmes
D G ROSSITER
AN INTRODUCTION TO APPLIED GEOSTATISTICS
Texts: Applied
Texts: Mathematical
• Chilès, J.-P. and Delfiner, P., 1999. Geostatistics: modeling spatial uncertainty.
Wiley series in probability and statistics. John Wiley & Sons, New York.
• Cressie, N., 1993. Statistics for spatial data. John Wiley & Sons, New York.
• Ripley, B.D., 1981. Spatial statistics. John Wiley and Sons, New York.
• Stein, A., Meer, F.v.d. and Gorte, B.G.F. (Editors), 1999. Spatial statistics for
remote sensing. Kluwer Academic, Dordrecht.
• Davis, J.C., 2002. Statistics and data analysis in geology. John Wiley & Sons,
New York.
Web pages
• R: http://www.r-project.org/
• gstat: http://www.gstat.org/
• gslib: http://www.gslib.com/
• GEOEAS: http://www.epa.gov/ada/csmos/models/geoeas.html
• ILWIS: http://www.itc.nl/ilwis/
Computer programmes
• gstat, by Pebesma
• spatial, by Ripley
• geoR, by Ribeiro & Diggle
• splancs, by Rowlingson & Diggle
• spatstat, by Baddeley & Turner (point pattern analysis)
What is “space”?
Geographic space
• Two-dimensional: coordinates are on a grid with respect to some origin (0, 0):
(x1, x2) = (x, y) = (E, N)
Feature space
• Not included in the common use of the term “spatial” data or analysis
4. All data sets from a given area are implicitly related by their coordinates →
models of spatial structure
• Euclidean distance:
For fields:
2. Fields have an implicit distance metric, from the row & column positions (the
natural coordinate system of a field)
Key Concepts
• Spatial structure: the nature of the spatial relation: how far, and in what
directions, is the spatial dependence? How does the dependence vary with
distance and direction between points?
Point distribution
This shows how sample points are distributed in space.
• Random or clustered?
• Are the values of nearby points similar to each other, or do the values appear to
be random?
Geographic postplot
This shows the postplot against a background that may explain the distribution of
samples or values. Examples:
• structural geology
Spatial Correlation
• That is, does knowing the value of some variable at some location give us
information on the value at ‘nearby’ locations?
• Sample covariance:
$$ s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y}) $$
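A minimal sketch of the sample covariance, assuming the usual n − 1 denominator (the one R's cov() uses). The data values are hypothetical and serve only as an illustration.

```r
# Illustrative data (hypothetical values, not from the slides)
x <- c(2.1, 3.4, 1.9, 5.6, 4.2)
y <- c(1.0, 2.2, 0.8, 3.9, 3.1)
n <- length(x)

# Sample covariance from the definition, with the n - 1 denominator
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Dividing by the two standard deviations standardizes it to the correlation
r_xy <- s_xy / (sd(x) * sd(y))

print(c(covariance = s_xy, correlation = r_xy))
```

Both numbers should agree with the built-in cov(x, y) and cor(x, y).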
Autocorrelation
We want to apply the idea of correlation to one variable (auto-correlation).
Spatial autocorrelation
Two methods; the variable to be autocorrelated can be:
• Moran’s I statistic: a simple approach which works for regions of arbitrary size,
as long as “adjacency” is well-defined (but here we will use distance classes)
• $w_{ij} = 1$ iff the point pair is in the distance class, so $n / \sum_i \sum_j w_{ij}$ is the inverse
proportion of all points in this class.
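As a sketch of how the weights enter the statistic, the function below computes Moran's I for a single distance class in base R. It assumes the standard form I = (n / ΣΣ w_ij) · ΣΣ w_ij (z_i − z̄)(z_j − z̄) / Σ (z_i − z̄)²; the function name morans_i and the four-point transect are hypothetical.

```r
# Moran's I for one distance class [d_min, d_max)
# coords: two-column matrix of coordinates; z: attribute values
morans_i <- function(coords, z, d_min, d_max) {
  d <- as.matrix(dist(coords))          # all pairwise Euclidean distances
  w <- (d >= d_min & d < d_max) * 1     # w_ij = 1 iff the pair is in the class
  diag(w) <- 0                          # a point is never paired with itself
  dev <- z - mean(z)                    # deviations from the mean
  n <- length(z)
  (n / sum(w)) * sum(w * outer(dev, dev)) / sum(dev^2)
}

# Four points on a transect, 1 unit apart (hypothetical)
coords <- cbind(x = 0:3, y = 0)
i_trend <- morans_i(coords, c(1, 2, 3, 4), 0.5, 1.5)  # smooth trend: positive (1/3)
i_alt   <- morans_i(coords, c(1, 2, 1, 2), 0.5, 1.5)  # alternating values: negative (-1)
print(c(trend = i_trend, alternating = i_alt))
```

Calling the function once per distance class gives the values needed for a plot of I against distance.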
Interpreting Moran’s I
• Higher values (in ILWIS, I > 0): positive spatial autocorrelation (values separated
by this distance tend to be similar)
• Lower values (in ILWIS, I < 0): negative spatial autocorrelation (values separated
by this distance tend to be dissimilar)
Covariance
$$ \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) $$
• Units are the product of the two variables' units; not standardized (that is the
correlation)
• This is a large number! For example, with 200 points this is 19,900 point pairs.
Semi-variances
$$ \gamma(\vec{x}_i, \vec{x}_j) = \frac{1}{2} [z(\vec{x}_i) - z(\vec{x}_j)]^2 $$
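A sketch of the semivariance of every point pair (the "variogram cloud") in base R. The coordinates and values are simulated, hence hypothetical; with 200 points there are 200 · 199 / 2 = 19,900 pairs.

```r
# Semivariance of every point pair in base R
set.seed(1)                               # hypothetical data for illustration
n <- 200
coords <- cbind(x = runif(n, 0, 1000), y = runif(n, 0, 1000))
z <- rnorm(n)

pairs <- t(combn(n, 2))                   # each unordered pair (i, j) with i < j
# separation distance of each pair
h <- sqrt(rowSums((coords[pairs[, 1], ] - coords[pairs[, 2], ])^2))
# semivariance of each pair: half the squared difference in values
gamma <- 0.5 * (z[pairs[, 1]] - z[pairs[, 2]])^2

print(nrow(pairs))                        # n (n - 1) / 2 = 19900 pairs
```

Plotting gamma against h shows the cloud; it is usually too noisy to interpret directly, which motivates averaging within bins.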
$$ \gamma(\vec{h}) = \frac{1}{2 m(\vec{h})} \sum_{i=1}^{m(\vec{h})} [z(\vec{x}_i) - z(\vec{x}_j)]^2 $$
• In practice, we have to define the set of vectors in each “bin” (to have enough
points)
np is the number of point pairs in the bin; dist is the average separation of
these pairs; gamma is the average semivariance
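To show where these three columns come from, here is a base-R sketch of an omnidirectional empirical variogram. It is not the gstat implementation; the function name empirical_variogram, its arguments and the three-point example are hypothetical.

```r
# Empirical variogram by binning point pairs on separation distance.
# Bin k holds the pairs with separation in ((k - 1) * width, k * width].
empirical_variogram <- function(coords, z, width, cutoff) {
  d <- as.matrix(dist(coords))
  keep <- upper.tri(d)                    # use each pair once
  h <- d[keep]                            # separations
  sq <- outer(z, z, "-")[keep]^2          # squared differences in value
  ok <- h <= cutoff                       # ignore pairs beyond the cutoff
  h <- h[ok]; sq <- sq[ok]
  bin <- ceiling(h / width)
  np <- tapply(h, bin, length)            # m(h): point pairs per bin
  data.frame(np    = as.vector(np),
             dist  = as.vector(tapply(h, bin, mean)),
             gamma = as.vector(tapply(sq, bin, sum) / (2 * np)))
}

# Three points on a line, 1 unit apart, with values 0, 1, 3
ev <- empirical_variogram(cbind(c(0, 1, 2), 0), c(0, 1, 3), width = 1, cutoff = 2)
print(ev)
```

In the example the first bin has two pairs (squared differences 1 and 4), so gamma = 5 / (2 · 2) = 1.25; the second has one pair (squared difference 9), so gamma = 4.5.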
[Figure: empirical semivariogram, semivariance (0.0 to 2.0) against distance (0 to 1500); each point is labelled with its number of point pairs]
In the previous slide, we can estimate the sill ≈ 1.9, the range ≈ 1200 m, and the
nugget ≈ 0.5 i.e. ≈ 25% of the sill.
• Distance interval, specifying the centres. E.g. (0, 100, 200, ...) means
intervals of [0 ... 50], [50 ... 150], ...
• All point pairs whose separation is in the interval are used to estimate γ(~h) for
~h as the interval centre
• Narrow intervals: more resolution but fewer point pairs for each sample
> library(sp)     # provides the meuse data set
> library(gstat)
> data(meuse)
> # interval boundaries at 50, 150, ..., 2050 m
> v <- variogram(log(cadmium)~1, ~x+y, meuse, boundaries=seq(50,2050,by=100))
> plot(v, plot.numbers=TRUE)
> par(mfrow = c(2,3)) # show all six plots together
> for (bw in seq(20, 220, by = 40)) {
    v <- variogram(log(cadmium)~1, ~x+y, meuse, width=bw)
    plot(v$dist, v$gamma, xlab=paste("bin width", bw))
  }
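The mapping from interval centres to boundaries described above can be sketched in base R. The separation values in h are hypothetical; the boundaries are placed halfway between neighbouring centres, with the first interval starting at 0.

```r
centres <- seq(0, 500, by = 100)          # interval centres 0, 100, ..., 500

# Boundaries lie halfway between neighbouring centres: 0, 50, 150, ..., 450
boundaries <- c(0, head(centres, -1) + diff(centres) / 2)

h <- c(10, 49, 51, 149, 151, 420)         # hypothetical pair separations
bin <- cut(h, breaks = boundaries, include.lowest = TRUE)
bin_counts <- table(bin)                  # point pairs per interval
print(bin_counts)
```

An empty interval, such as the one between 250 and 350 here, is a sign that the intervals are too narrow for the data.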
[Figure: six scatterplots of v$gamma against v$dist, one for each bin width in the loop above]
• Each bin should have > 100 point pairs; > 300 is much more reliable
> v<-variogram(log(cadmium)~1, ~x+y, meuse, width=20)
> plot(v, plot.numbers=T)
> v$np
[1] 6 19 27 27 51 65 58 62 62 82 76 75 86 81 76
[16] 91 92 90 88 92 112 103 80 116 108 106 79 94 117 99
[31] 100 101 108 117 110 117 114 107 96 110 109 106 114 117 104
[46] 98 94 117 92 110 105 91 89 98 89 91 103 102 93 92
[61] 73 85 88 91 88 84 75 81 90 73 93 95 76 85 67
[76] 77 88 60
> v<-variogram(log(cadmium)~1, ~x+y, meuse, width=120)
> v$np
[1] 79 380 485 577 583 642 654 648 609 572 522 491 493 148
> plot(v, plot.numbers=T)
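The rule of thumb for point pairs per bin can be checked directly. This sketch re-enters the v$np values printed above for width=120, so that it runs without gstat; the column names enough and reliable are hypothetical.

```r
# Point pairs per bin for width = 120, copied from the output above
np <- c(79, 380, 485, 577, 583, 642, 654, 648, 609, 572, 522, 491, 493, 148)

check <- data.frame(bin = seq_along(np), np = np,
                    enough   = np > 100,  # minimum suggested above
                    reliable = np > 300)  # much more reliable
print(check)
```

Only the first bin falls below 100 pairs, and twelve of the fourteen bins exceed 300; with width=20 no bin comes close to 300.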
Topic: Anisotropy
• Greek “Iso” + “tropic” = English “same” + “trend”; Greek “an-” = English “not-”
• This is why we refer to the separation vector; up till now this has just meant
distance, but now it includes direction
• Directional process
• If we can find orthogonal axes of maximum and minimum range and if the
same semi-variogram model can be fitted, the coordinates can be transformed
from an oblique ellipse to a circle
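One way to sketch this transformation in base R: project the coordinates onto the major and minor axes, then stretch the minor-axis component by the ratio of the ranges. The function name to_isotropic and the example ranges (1000 and 500 at azimuth 45°) are hypothetical.

```r
# Transform coordinates so that an anisotropy ellipse becomes a circle.
# coords:  two-column matrix (E, N)
# azimuth: direction of the major axis, degrees clockwise from North
# ratio:   minor range / major range (0 < ratio <= 1)
to_isotropic <- function(coords, azimuth, ratio) {
  a <- azimuth * pi / 180
  along  <- coords[, 1] * sin(a) + coords[, 2] * cos(a)  # along the major axis
  across <- coords[, 1] * cos(a) - coords[, 2] * sin(a)  # along the minor axis
  cbind(along = along, across = across / ratio)          # stretch the short axis
}

a <- 45 * pi / 180
ends <- rbind(major = 1000 * c(sin(a),  cos(a)),   # end of the major axis
              minor =  500 * c(cos(a), -sin(a)))   # end of the minor axis
iso <- to_isotropic(ends, azimuth = 45, ratio = 500 / 1000)
radius <- sqrt(rowSums(iso^2))
print(radius)                                      # both are 1000 after the transform
```

After the transformation both axis ends lie at the same distance from the origin, so a single isotropic model can be fitted in the new coordinates.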
Zonal anisotropy
• Variance is inherently different in the two zones; this is called zonal anisotropy.
• This is not a map, but rather a plot of semivariances vs. distance and direction
(the separation vector)
• Each cell shows the semivariance at a given distance and direction (lag)
• Symmetric by definition
• Parameters to specify:
1. Direction of the major or minor axis in 1st quadrant; implicitly specifies
perpendicular as other axis; in ILWIS as Azimuth (degrees) clockwise from
Y (North), as with a compass; corresponding minor or major axis is then
+π/2 = +90° clockwise
2. Tolerance: Degrees on either side which are considered to have the ‘same’
angle
3. Band width: Limit the bin to a certain width; this keeps the band from taking
in too many far-away points
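In gstat the direction and tolerance are given by the alpha and tol.hor arguments of variogram(). The sketch below assumes the sp and gstat packages are installed; the azimuths (30° and 120°) and the tolerance (22.5°) are arbitrary illustrative choices, and no band width is set.

```r
library(sp)      # provides the meuse data set
library(gstat)
data(meuse)

# Directional variograms along azimuths 30 and 120 degrees (clockwise from North),
# accepting point pairs within 22.5 degrees on either side of each direction
v_dir <- variogram(log(cadmium) ~ 1, locations = ~x + y, data = meuse,
                   alpha = c(30, 120), tol.hor = 22.5)

print(plot(v_dir, plot.numbers = TRUE))   # one panel per direction
```

The dir.hor column of the result records the direction each bin belongs to; a clearly longer range in one panel than in the other would suggest anisotropy.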
[Figure: directional semivariograms, semivariance against distance (0 to 1500), with the number of point pairs shown beside each point]
This can give insight into the spatial process by which the points were
placed (repulsion, attraction, ...)
References:
• Boots, B. N. & Getis, A. (1988). Point pattern analysis. Newbury Park: Sage.
• Ripley, B. D. (1981). Spatial statistics. New York: John Wiley and Sons.
Examples
• If points “attract” and form compact groups, with large spaces in between:
clustered
1. Points-in-area
Test of CSR: χ²
• Count the number of sample points falling in each circle, and summarize how
many times 0, 1, . . . are found ⇒ observed
• Calculate (by the Poisson distribution) the probability of a test circle containing
each number under the assumption of CSR
• Note: the sample must be large enough that most expected counts are > 5; if not,
increase the size of the test circles.
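The steps of the test can be sketched in base R on a simulated pattern. Everything here is hypothetical: the point pattern, the circle radius, the number of circles and the pooling of counts of 5 or more into one class. The degrees of freedom assume the Poisson mean is taken from the overall intensity rather than estimated from the circle counts, and overlapping circles are not strictly independent.

```r
set.seed(42)                               # hypothetical CSR pattern
n_pts <- 500
pts <- cbind(runif(n_pts), runif(n_pts))   # points in the unit square

r <- 0.04                                  # test-circle radius
n_circ <- 200
# centres at least r from the edge, so each circle lies inside the area
ctr <- cbind(runif(n_circ, r, 1 - r), runif(n_circ, r, 1 - r))

# Observed: number of sample points falling in each circle
counts <- apply(ctr, 1, function(cc)
  sum((pts[, 1] - cc[1])^2 + (pts[, 2] - cc[2])^2 <= r^2))

# Expected under CSR: Poisson with mean = intensity * circle area
lambda <- n_pts * pi * r^2
k_max <- 5                                 # classes 0, 1, ..., 4 and ">= 5"
obs <- table(factor(pmin(counts, k_max), levels = 0:k_max))
p <- c(dpois(0:(k_max - 1), lambda),
       ppois(k_max - 1, lambda, lower.tail = FALSE))
expected <- n_circ * p

chi2 <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chi2, df = length(obs) - 1, lower.tail = FALSE)
print(rbind(observed = as.vector(obs), expected = round(expected, 1)))
```

With these settings every expected count exceeds 5; a smaller radius would push the higher classes below that threshold, which is the situation the note above warns about.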
Distance methods
• Problem with area techniques: arbitrary areas; choice of size can affect
interpretation; only tests first-order effects
• Measure 1: Reflexive nearest neighbours (RNN): two points are first order
RNN if they are each other’s nearest neighbours; measure the distance
between them
• Calculates the distances between point pairs of a given order and uses this as
the ordinate
• Example: 2 points (out of 459) have at least one neighbour within 1.8 m;
frequency is 2/459 = 0.00436
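A base-R sketch of the cumulative frequency of first-order nearest-neighbour distances. The function name nn_dist and the simulated 459 points are hypothetical, so the resulting frequency will not match the 2/459 of the example.

```r
# Distance from each point to its nearest neighbour
nn_dist <- function(coords) {
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                  # ignore the zero distance of a point to itself
  apply(d, 1, min)
}

set.seed(7)                       # hypothetical pattern of 459 points
coords <- cbind(runif(459, 0, 100), runif(459, 0, 100))
nn <- nn_dist(coords)

G <- ecdf(nn)                     # cumulative frequency of the distances
freq_1.8 <- G(1.8)                # proportion of points with a neighbour within 1.8
print(freq_1.8)
```

Plotting G (for example with plot(G)) gives the cumulative curve; comparing it with curves from simulated random patterns indicates clustering or regularity.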
References
[2] Paulo J. Ribeiro, Jr. and Peter J. Diggle. geoR: A package for geostatistical
analysis. R News, 1(2):14–18, June 2001.
[4] Roger Bivand. More on spatial data. R News, 1(3):13–17, September 2001.
[8] Brian D Ripley. Spatial statistics. John Wiley and Sons, New York, 1981.
[10] A Stein, Freek van der Meer, and B G F Gorte, editors. Spatial statistics for
remote sensing. Kluwer Academic, Dordrecht, 1999.
[11] John C. Davis. Statistics and data analysis in geology. John Wiley & Sons,
New York, 3rd edition, 2002.
[14] Noel Cressie. Statistics for spatial data. John Wiley & Sons, New York,
revised edition, 1993.
[19] J-P. Chilès and P. Delfiner. Geostatistics: modeling spatial uncertainty. Wiley
series in probability and statistics. John Wiley & Sons, New York, 1999.
[23] B.N. Boots and A. Getis. Point pattern analysis. Scientific Geography series
8. Sage, Newbury Park, 1988.