© Copyright CSIRO Australia 2008

Analysing spatial point patterns in R


Adrian Baddeley
[email protected]
[email protected]

All rights are reserved. Permission to reproduce individual copies of this document for
personal use is granted. Redistribution in any other form is prohibited.
The information contained in this document is based on a number of technical, circumstantial
or otherwise specified assumptions and parameters. The user must make its own analysis and
assessment of the suitability of the information or material contained in or generated from this
document. To the extent permitted by law, CSIRO excludes all liability to any party for any
expenses, losses, damages and costs arising directly or indirectly from using this document.

Workshop Notes
February 2008
Abstract
This is a detailed set of notes for a workshop on Analysing spatial point patterns that has
been held several times in Australia and New Zealand in 2006–2008.
It covers statistical methods that are currently feasible in practice and available in public
domain software. Some of these techniques are well established in the applications literature,
while some are very recent developments.
The workshop uses the statistical package R and is based on spatstat, an add-on library
for R for the analysis of spatial data.
Topics covered include: statistical formulation and methodological issues; data input
and handling; R concepts such as classes and methods; nonparametric intensity estimates;
goodness-of-fit testing for Complete Spatial Randomness; maximum likelihood inference for
Poisson processes; model validation for Poisson processes; distance methods and summary
functions such as Ripley's K function; non-Poisson point process models; simulation techniques;
fitting models using summary statistics; Gibbs point process models; fitting Gibbs
models; simulating Gibbs models; validating Gibbs models; multitype and marked point patterns;
exploratory analysis of marked point patterns; multitype Poisson process models and
maximum likelihood inference; multitype Gibbs process models and maximum pseudolikelihood;
and line segment data.

This workshop requires R version 2.6.0 or later, and spatstat version 1.12-6 or later.

Acknowledgements
The author gratefully acknowledges countless comments and suggestions from workshop participants, and the support of CSIRO Mathematical and Information Sciences, The New
Zealand Statistical Association, The University of Waikato, The Statistical Society of Australia and The University of Western Australia.

Contents

1 Introduction
2 Statistical formulation
3 The R system
4 Introduction to spatstat
5 Objects, classes and methods
6 Data input
7 Methods 1: Investigating intensity
8 Defining the window
9 Manipulating point patterns
10 Methods 2: Tests of Complete Spatial Randomness
11 Methods 3: Maximum likelihood for Poisson processes
12 Methods 4: checking a fitted Poisson model
13 Images in spatstat
14 Simple models of non-Poisson patterns
15 Methods 5: Distance methods for point patterns
16 Methods 6: inference using summary statistics
17 Methods 7: adjusting for inhomogeneity
18 Gibbs models
19 Methods 8: fitting Gibbs models
20 Methods 9: validation of fitted Gibbs models
21 Marked point patterns
22 Handling marked point pattern data
23 Methods 10: exploratory tools for marked point patterns
24 Methods 11: multitype Poisson models
25 Methods 12: Gibbs models for multitype point patterns
26 Line segment data
27 Further information on spatstat
Bibliography
Index

1 Introduction

1.1 Types of data

1.1.1 Points

A point pattern dataset gives the locations of objects/events occurring in a study region.

The points could represent trees, animal nests, earthquake epicentres, petty crimes, domiciles
of new cases of influenza, galaxies, etc.
The points might be situated in a region of the two-dimensional (2D) plane, or on the Earth's
surface, or a 3D volume, etc. They could be points in space-time (e.g. earthquake epicentre
location and time). The software presented here is only applicable to 2D point patterns (but
we're working on it).

1.1.2 Marks

The points may have extra information called marks attached to them. The mark represents an
attribute of the point. The mark variable could be categorical, e.g. species or disease status:

[Figure: point pattern with categorical marks; legend: off, on]

The mark variable could be continuous, e.g. tree diameter:

The mark could be multivariate, or even more complicated.

1.1.3 Covariates

Our dataset may also include covariates, that is, any data that we treat as explanatory, rather than
as part of the response.
Covariate data may be a spatial function Z(u) defined at all spatial locations u, e.g. altitude,
soil pH, displayed as a pixel image or a contour plot:

[Figure: pixel image of a spatial covariate; scale values 120 to 160]

Covariate data may be another spatial pattern such as another point pattern, or a line
segment pattern, e.g. a map of geological faults:

1.2 Typical scientific questions

1.2.1 Intensity

Intensity is the average density of points (expected number of points per unit area). Intensity
may be constant (uniform) or may vary from location to location (non-uniform or inhomogeneous).

[Figure: two point patterns, labelled uniform and inhomogeneous]

1.2.2 Interaction

Interpoint interaction is stochastic dependence between the points in a point pattern. Usually
we expect dependence to be strongest between points that are close to one another.

[Figure: three point patterns, labelled independent, regular and clustered]

Example 1 (Japanese pines) Locations of 65 saplings of Japanese pine in a 5.7 × 5.7 metre
square sampling region in a natural stand.
Main question: is the spacing between saplings greater than would be expected for a random
pattern? (reflecting competition for resources)

[Figure: Japanese Pines]

1.2.3 Covariate effects

For a point pattern dataset with covariate data, we typically
• investigate whether the intensity depends on the covariates
• allow for covariate effects on intensity before studying interaction between points

Example 2 (Tropical rainforest data) Locations of 3605 trees in a tropical rainforest, with
supplementary grid map of elevation (altitude).
Main questions: (1) does tree density depend on slope? (2) after accounting for variation in
tree density due to slope, is there evidence of clustering of trees?

[Figure: tropical rainforest tree locations over the elevation map; scale values 120 to 160]

Example 3 (Queensland copper data) An intensive mineralogical survey yields a map of
copper deposits (essentially pointlike at this scale) and geological faults (straight lines). The
faults can easily be observed from satellites, but the copper deposits are hard to find. The main
question is whether the faults are predictive for copper deposits (e.g. copper less/more likely to
be found near faults).
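The question in Example 3 is usually approached by turning the fault map into a spatial function, namely the distance from each location to the nearest fault. The following is a minimal base R sketch of that idea, not the method used by spatstat. The function names `dist_to_segment` and `nearest_fault_distance` and all coordinates are hypothetical, invented for illustration.

```r
# Distance from points (px, py) to one line segment (x0, y0)-(x1, y1).
dist_to_segment <- function(px, py, x0, y0, x1, y1) {
  dx <- x1 - x0
  dy <- y1 - y0
  len2 <- dx^2 + dy^2
  # Relative position of the orthogonal projection, clamped to the segment
  t <- if (len2 > 0) ((px - x0) * dx + (py - y0) * dy) / len2 else 0
  t <- pmin(1, pmax(0, t))
  sqrt((px - (x0 + t * dx))^2 + (py - (y0 + t * dy))^2)
}

# Distance from each point to the nearest of several segments.
nearest_fault_distance <- function(px, py, faults) {
  d <- sapply(seq_len(nrow(faults)), function(i) {
    dist_to_segment(px, py,
                    faults$x0[i], faults$y0[i], faults$x1[i], faults$y1[i])
  })
  d <- matrix(d, nrow = length(px))  # one row per point, one column per fault
  apply(d, 1, min)
}

# Made-up faults (segment endpoints) and deposits (points)
faults <- data.frame(x0 = c(0, 5), y0 = c(0, 0), x1 = c(4, 5), y1 = c(0, 3))
deposits <- data.frame(x = c(2, 6, 5), y = c(1, 0, 5))
deposits$dist <- nearest_fault_distance(deposits$x, deposits$y, faults)
print(deposits)
```

Comparing the distances at the deposits with the distances at other locations in the study region is what allows a statement such as "copper is more likely near faults".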

Example 4 (Chorley-Ribble data) An apparent cluster of cases of cancer of the larynx occurred
near a disused industrial incinerator. The area health authority mapped the domicile
locations of all cases (58) of cancer of the larynx and, for control purposes, a random sample of
cases (978) of lung cancer.
Main question: after allowing for spatial variation in density of the susceptible population
(for which the lung cancer cases are a surrogate), is there evidence of raised incidence of laryngeal
cancer near the incinerator?

[Figure: Chorley-Ribble Data; legend: larynx, lung, incinerator]

1.2.4 Segregation of points with different marks

In a marked point pattern, we need to investigate whether points with different mark values are
segregated (found in different parts of the study region).

Example 5 (Lansing Woods) In a 20-acre study region in Lansing Woods, Michigan, the
locations of 2251 trees and the botanical classification of each tree were recorded.
Main question: is the study region divided into domains where a single tree species dominates,
or are the different species randomly interspersed?

[Figure: Lansing Woods data, one panel per species; panels include blackoak, hickory, maple]

Example 6 (Longleaf Pines) In a forest of Longleaf Pine trees in Georgia, USA, the locations
of 584 trees were recorded along with their diameter at breast height (dbh), a convenient surrogate
measure of size and age.
Main question: explain any spatial variation in the density and age of trees.

[Figure: Longleaf Pines]

1.2.5 Dependence between points of different types

In a point pattern dataset with categorical marks (aka multitype point pattern), dependence
between the different types may be formulated either as
• interaction between the sub-pattern of points of type i and the sub-pattern of points of
type j; or
• dependence between the mark values of points at two specified locations.

Example 7 (Amacrine cells) The retina is a flat sheet containing several layers of cells.
Amacrine cells occupy two adjacent layers, the "on" and "off" layers. In a microscope field of
view, the locations of all amacrine cells were mapped, and classified into "on" and "off".
Main question: is there evidence that the "on" and "off" layers grew independently of one
another?
[Figure: amacrine; legend: off, on]

Example 8 (Ants' nests) The nests of two species of ants in a plot in Greece were mapped.
Auxiliary information records a field/scrub boundary, and the position of a walking track.
Main question: does species A intentionally place its nests close to species B?

[Figure: ants; legend: A, B; regions: scrub, field]

1.3 Overview of statistical methods

Statistical methods for spatial point patterns have a quirky history, and have not yet coalesced
into a mature statistical methodology. They include
• summary statistics: the applied literature is dominated by ad hoc methods based on
evaluating a summary statistic (e.g. average distance from a point to its nearest neighbour)
with very little statistical theory to support them.
• comparison to Poisson process: in the applied literature, hypothesis tests are invoked
chiefly to decide whether the point pattern is completely random (a uniform Poisson point
process), whether or not this is scientifically relevant. Lots of misunderstandings prevail.
• modelling: only in the last decade has it finally become possible to formulate and fit
realistic models to point pattern data. There's still a lot of work to be done, e.g. in
algorithms, model choice, goodness-of-fit.

We'll cover both classical and modern methods. Useful textbooks include [17, 21, 42, 33].
An important recent survey is [34].

2 Statistical formulation

2.1 Point processes

In this workshop, the observed point pattern x will be treated as a realisation of a random
point process X in two-dimensional space. A point process is simply a random set of points;
the number of points is random, as well as the locations of the points. Our goal is usually to
estimate parameters of the distribution of X.

2.2 Should I treat the data as a point process?

Treating the point pattern as a point process effectively assumes that the pattern is random
(the locations of the points, and the number of points, are random) and that the pattern is
the observation or response of interest. A realisation of a point process is an unordered set of
points, so the points do not have a serial order (unless there are marks attached).

Example 9 A silicon wafer is inspected for defects in the crystal surface, and the locations of
all defects are recorded.

This can be analysed as a point process in two dimensions, assuming the defects are pointlike.
We're interested in the intensity of defects, spacing between defects, etc.

Example 10 Earthquake aftershocks in Japan are detected and their latitude, longitude and
time of occurrence are recorded.

This can be analysed as a point process in space-time (where space is the two-dimensional
plane or the Earth's surface). If the occurrence times are ignored, it becomes a spatial point
process.

Example 11 The locations of petty crimes that occurred in the past week are plotted on a street
map of Chicago.

This can be analysed as a point process. We're interested in the intensity (propensity for
crimes to occur), any spatial variation in intensity, clusters of crimes, etc. One issue here is
whether the recorded crime locations can be anywhere in two dimensional space, or whether
they are actually restricted to locations on the streets (making them a point process on a
1-dimensional network).

Example 12 A tiger shark is captured, tagged with a satellite transmitter, and released. Over
the next month its location is reported daily. These points are plotted on a map.

It is probably not appropriate to analyse these data as a spatial point process. At the very
least, the time of each observation should be included. They could be treated as a space-time
point process, except that it's a strange process, as it consists of exactly one point at each instant
of time. These data should really be treated as a sparse sample of a continuous trajectory, and
analysed using other methods [which, alas, are fairly underdeveloped.] See the R package trip.

Example 13 A herd of deer is photographed from the air at noon each day for 10 days. Each
photograph is processed to produce a point pattern of individual deer locations on a map.
Each day produces a point pattern that could be analysed as a realisation of a point process.
However, the observations on successive days are dependent (e.g. constant herd size, systematic
foraging behaviour). Assuming individual deer cannot be identified from day to day, this is
effectively a repeated measures dataset where each response is a point pattern. Methods for
this problem are in their infancy.

Example 14 In a designed controlled experiment, silicon wafers are produced under various
conditions. Each wafer is inspected for defects in the crystal surface, and the locations of all
defects are recorded as a point pattern.

This is a designed experiment in which the response is a point pattern. Methods for this
problem are in their infancy. There are some methods for replicated spatial point patterns
[9, 12, 22, 23, 26] that apply when each experimental group contains several point patterns.

Example 15 The points are not the original data, but were obtained after processing the data.
For example,
• the original dataset is a pattern of small blobs, and the points are the blob centres;
• the original dataset is a collection of line segments, and the points are the endpoints,
crossing points, midpoints etc;
• the original dataset is a space-filling tessellation of biological cells, and the points are the
centres of the cells.

This is a grey area. Point process methodology can be applied, and may be more powerful
or more flexible than existing methodology for the unprocessed data. However the origin of the
point pattern may lead to artefacts (for example the centres of biological cells never lie very close
together, because cells have nonzero size) which must be taken into account in the analysis.

2.3 Assumptions about the data

The standard model assumes that the point process X extends throughout 2-D space, but
is observed only inside a region W, the sampling window. Our data consist of an unordered
set

$x = \{x_1, \ldots, x_n\}, \quad x_i \in W, \quad n \ge 0$

of points $x_i$ in W. The window W is fixed and known. Usually our goal is inference about
parameters of X.

Data are often supplied without information about the sampling window W. It is important
to know the window W, since we need to know where points were not observed. Even
something as simple as estimating the density of points depends on the window. It would be
wrong, or at least different, to analyze a point pattern dataset by guessing the appropriate
window. An analogy may be drawn with the difference between sequential experiments and
experiments in which the sample size is fixed a priori.
For the same reason, it is not sufficient to observe the values of covariates at the data points
only. In order to investigate the dependence of the point process on the covariate, we need to
have at least some observations of the covariate at other (non-data) locations.
It's implicitly assumed that all points of X within W have been mapped without omission.
Most models we use will assume that random points could have been observed at any location
in the window W, without further constraint. (Examples where this does not apply: GPS
locations of cars will usually lie along roads; certain cells lie only inside certain tissues).
When thinking about methodological issues it's often useful to think about the discretised
version of a point process. Suppose the window W is chopped into infinitely many pixels.
Each pixel is assigned the value I = 1 if it contains a point of X, and I = 0 otherwise. This
array of 0's and 1's constitutes the data that must be modelled. [e.g. obviously we can't model
the dependence of these indicators on a covariate if we only observe the covariate value at the
locations where I = 1.]

00001001000100000000
10100000010001001000
00000010100000000010
00110000000100100000
00000001010001010010
00000100000000000000
00000000100000010100
01000000001010100100
00000000100000001000
00001000000001000000
10100010000000000001
00010000100100100100
00100100000000001000
10000000010100100010
00010010001000001000
00000001000000100000
00000000001000001010
00001000000010000000
00100000000100001000
10010010101001000000
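The discretised view of a point process described in this section (pixels carrying an indicator I = 1 or I = 0) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the window is taken to be the unit square, the points are made-up uniform random numbers, and the grid has a finite 20 × 20 resolution rather than infinitely many pixels.

```r
set.seed(1)
# Hypothetical pattern: 30 uniform random points in the unit square W
x <- runif(30)
y <- runif(30)

k <- 20  # number of pixels along each side of W

# Pixel column (from x) and pixel row (from y) containing each point
pcol <- pmin(k, floor(x * k) + 1)
prow <- pmin(k, floor(y * k) + 1)

# Indicator array: 1 if the pixel contains at least one point, else 0
I <- matrix(0L, nrow = k, ncol = k)
I[cbind(prow, pcol)] <- 1L

# Print as rows of 0s and 1s (row 1 is the strip of W with the smallest y)
cat(apply(I, 1, paste, collapse = ""), sep = "\n")
```

The zeros matter as much as the ones: they record where points were not observed, which is why both the window and the covariate values at non-data locations are needed.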

2.4 Marks and covariates

The main differences between marks and covariates are that
• marks are associated with data points;
• marks are part of the response (the point pattern) while covariates are explanatory.

2.4.1 Marks

A mark variable may be interpreted as an additional coordinate for the point: for example
a point process of earthquake epicentre locations (longitude, latitude), with marks giving the
occurrence time of each earthquake, can alternatively be viewed as a point process in space-time
with coordinates (longitude, latitude, time).
A marked point process of points in space S with marks belonging to a set M is mathematically
defined as a point process in the cartesian product $S \times M$. The space M of possible marks
may be anything. In current applications, typically the mark is either a categorical variable
(so that the points are grouped into types) or a real number. Multivariate marks consisting of
several such variables are also common.
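The reading of a mark as an extra coordinate can be made concrete with an ordinary data frame, one row per point. This is a base R sketch for illustration only; the column names and all values are invented, and it is not the marked point pattern format used by spatstat.

```r
# Hypothetical marked point pattern: locations plus two marks per point
trees <- data.frame(
  x = c(1.2, 3.4, 0.7, 2.9),
  y = c(0.5, 2.2, 3.1, 1.8),
  species = factor(c("oak", "maple", "oak", "hickory")),  # categorical mark
  dbh = c(32.5, 18.0, 44.1, 25.3)                         # real-valued mark
)

# Categorical mark: the points are grouped into types (sub-patterns)
by_type <- split(trees[, c("x", "y")], trees$species)
print(sapply(by_type, nrow))

# Real-valued mark: each row is a point (x, y, dbh) in the product space
print(as.matrix(trees[, c("x", "y", "dbh")]))
```

With both marks kept together, each row is an example of a multivariate mark.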

A marked point pattern is an unordered set

$y = \{(x_1, m_1), \ldots, (x_n, m_n)\}, \quad x_i \in W, \quad m_i \in M$

where $x_i$ are the locations and $m_i$ are the corresponding marks.
Marked point patterns are discussed in detail in section 21.

2.4.2 Covariates

Any kind of data may be recruited as an explanatory variable (covariate).
A spatial function, spatial covariate or geostatistical covariate is a function Z(u) observable
(potentially) at every spatial location $u \in W$. Values of Z(u) may be available for a fine
grid of locations u:

[Figure: pixel image of a spatial covariate; scale values 120 to 160]

The values of a spatial function Z(u) may only be observable at some scattered sampling
locations u. An example is the measurement of soil pH at a few sampling locations. In this case,
the value of the covariate Z must be observed for all points $x_i$ of the point pattern x, and must
also be observed at some other "non-data" or "background" locations $u \in W$ with $u \notin x$.
Alternatively, the covariate information may consist of another spatial pattern, such as a
point pattern or a line segment pattern. The way in which this covariate information enters
the analysis or statistical model depends very much on the context and the choice of model.
Typically the covariate pattern would be used to define a surrogate spatial function Z, for
example, Z(u) may be the distance from u to the nearest line segment.

3 The R system

We will be using the statistical package R.

3.1 How to obtain R

R is free software with an open-source licence. You can download it from r-project.org and
it should be easy to install on any computer (see the instructions at the website).
Books and online tutorials are available to help you learn to use R.

3.2 How commands are printed in the notes

You can run an R session using either a point-and-click interface or a line-by-line command
interpreter. In these notes, R commands are printed as they would appear when typed at the
command line. So a typical series of R commands looks like this:
> pi/2
> sin(pi/2)
> x <- sqrt(2)
> x
Note that you are not meant to type the > symbol; this is just the prompt for command input
in R. To type the first command, just type pi/2.
In these notes we will sometimes also print the response that R gives to a set of commands.
In the example above, it would look like this:
> pi/2
[1] 1.570796
> sin(pi/2)
[1] 1
> x <- sqrt(2)
> x
[1] 1.414214
If the input is too long, R will break it into several lines, and print the character + to indicate
that the input continues from the previous line. (You don't type the +). Also if you type an
expression involving brackets and hit Return before all the open brackets have been closed, then
R will print a + indicating that it expects you to finish the expression.
> folderol <- 1.2
> sin(folderol * folderol * folderol * folderol * folderol * folderol *
+     folderol * folderol * folderol * folderol)
[1] -0.09132148

3.3 Contributed libraries for R

3.3

17

Contributed libraries for R

Introduction to spatstat

In addition to the basic R system, the R website also oers many add-on modules (libraries or
packages) contributed by users. These can be downloaded from cran.r-project.org (under
Contributed Packages).
Packages that may be useful for analysing spatial data include:

ads
DCluster
fields
geoR
geoRglm
GeoXB
grasp
maptools
rgdal
sp
spatclus
spatialCovariance
spatialkernel
spatstat
spBayes
spdep
spgwr
splancs
spsurvey
trip

18

spatial point pattern analysis


detecting clusters in spatial count data
curve and function tting
model-based geostatistical methods
model-based geostatistical methods
interactive spatial exploratory data analysis
spatial prediction
geographical information systems
interface to GDAL geographical data analysis
base library for some spatial data analysis packages
detecting clusters in spatial point pattern data
spatial covariance for data on grids
interpolation and segregation of point patterns
Spatial point pattern analysis and modelling
Gaussian spatial process MCMC (grid data)
spatial statistics for variables observed at xed sites
geographically weighted regression
spatial and space-time point pattern analysis
spatial survey methods
analysis of spatial trip data

Introduction to spatstat

4.1

The spatstat package

Spatstat is a contributed R package for analysing spatial data, written by Adrian Baddeley and
Rolf Turner. Current versions of spatstat deal mainly with spatial point patterns in two
dimensions. The package supports
creation, manipulation and plotting of point patterns
exploratory data analysis
simulation of point process models
parametric model-tting
hypothesis tests, residual plots, diagnostics
Spatstat is one of the largest contributed packages available for R, with over 300 user-level
functions and a 500-page manual. It has its own web domain, www.spatstat.org, oering
information about the package.
Spatstat can be downloaded from cran.r-project.org (under Contributed packages
spatstat). To install spatstat you will also need to download the packages mgcv and sm.

4.2

Please acknowledge spatstat

If you use spatstat for research that leads to publications, it would be much appreciated if
you could acknowledge spatstat in your publications, preferably citing [4]. Citations help us
to justify the expenditure of time and eort on maintaining and developing the package.

To make use of a package, you need to:

1. download the package code (once only) without unpacking;
2. install the package code on your system (once only);
3. load the package into your current R session using the command library (each time you
start a new R session).

The installation step is performed automatically using R, not by manually unpacking the code.
Installation is usually a very easy process.
Instructions on how to install a package are given at cran.r-project.org. If you are running
Windows, first start an R session. Then try the pull-down menu item Packages -> Install
packages. If this menu item is available, then you will be able to download and install any
desired packages by simply selecting the package name from the pull-down list. If this menu item
is not available (for internet security reasons), you can manually download packages by going
to the CRAN website under Contributed packages -- Windows binaries and downloading
the desired zip files of Windows binary files. To perform step 2, start an R session and use the
menu item Packages -> Install from local zip files to install.
If you are running Linux, step 1 is performed manually by going to the CRAN website under
Contributed Packages and downloading the tar file packagename.tar.gz. Step 2 is performed
by issuing the command R CMD INSTALL packagename.tar.gz.

4.3 Getting started

Here is a quick demonstration of spatstat in action. You can follow the demonstration by
typing the commands into R.
To begin any analysis using spatstat, first start the R system, and type

> library(spatstat)

The response will be something like this:

> library(spatstat)
This is mgcv 1.3-20
spatstat 1.12-7
Type help(spatstat) for information

The printout shows that, before loading spatstat, the system has loaded the package mgcv
that is required by spatstat. Then it loads spatstat, showing the version number of the
package.
For a list of the commands available in spatstat, type

> help(spatstat)

To get information on a particular command, type help(command).
To gain an impression of what is available in spatstat, you can run the package
demonstration by typing demo(spatstat).
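The three steps for using a package (download, install, load) can also be driven from the R prompt instead of the menus. The following is only a sketch: the helper names is_installed and install_and_load are my own and are not part of R or spatstat, and the install step assumes that you have internet access and a working CRAN mirror.

```r
# Report whether a package is installed, without attaching it.
# (is_installed is a hypothetical helper name, not a standard R function.)
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

is_installed("stats")        # TRUE: 'stats' ships with every R installation

# Steps 1 and 2 (download and install) in one call, then step 3 (load).
# Wrapped in a function so that nothing is downloaded until you call it.
install_and_load <- function(pkg) {
  if (!is_installed(pkg)) {
    install.packages(pkg)    # needs internet access and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# install_and_load("spatstat")   # uncomment to run
```

The call is left commented out so that pasting the block into R does not trigger a download by accident.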

4.4 Inspecting data

For our first demonstration, we'll use one of the standard point pattern datasets that is installed
with the package. The Swedish Pines dataset represents the positions of 71 trees in a forest plot
9.6 by 10.0 metres.

> data(swedishpines)

To avoid typing swedishpines all the time, let us copy the data to another dataset with a
shorter name:

> X <- swedishpines

You can immediately plot the point pattern by typing

> plot(X)

[Figure: the point pattern X]

Simply typing the name of the dataset gives you some basic information:

> X
planar point pattern: 71 points
window: rectangle = [0, 96] x [0, 100] units (one unit = 0.1 metres)

Let's study the intensity (density of points) in this point pattern. For a few basic summary
statistics, type

> summary(X)
Planar point pattern: 71 points
Average intensity 0.0074 points per square unit (one unit = 0.1 metres)

Window: rectangle = [0, 96] x [0, 100] units
Window area = 9600 square units
Unit of length: 0.1 metres

The coordinates are in decimetres (0.1 metre), so the average intensity is 0.0074 trees per
square decimetre or 0.74 trees per square metre.
To get an impression of local spatial variations in intensity, we can plot a kernel estimate of
intensity:

> plot(density(X, 10))

[Figure: density(X, 10)]

where 10 is my chosen value for the standard deviation of the Gaussian smoothing kernel.
If you prefer a contour plot,

> contour(density(X, 10), axes = FALSE)

[Figure: contour plot of density(X, 10)]

The contours are labelled in density units of trees per square decimetre.

4.5 Exploratory data analysis

Spatstat is designed to support all the standard types of exploratory data analysis for point
patterns.
One example is quadrat counting. The study region is divided into rectangles (quadrats) of
equal size, and the number of points in each rectangle is counted.

> Q <- quadratcount(X, nx = 4, ny = 3)
> Q

          y
x          [0,33.3] (33.3,66.7] (66.7,100]
  [0,24]          4           5          7
  (24,48]         3           9          3
  (48,72]         6           7          6
  (72,96]         9           7          5

> plot(X)
> plot(Q, add = TRUE, cex = 2)

Another example is Ripley's K function. I'll explain more about the K function later. For
now, we'll just demonstrate how easy it is to compute and plot it. To compute the K function
for a point pattern X, type Kest(X). This returns an object which can be plotted.

> K <- Kest(X)
> plot(K)

[Figure: plot of K, showing K(r) against r (one unit = 0.1 metres)]

4.6 Multitype point patterns

A marked point pattern in which the marks are a categorical variable is usually called a multitype
point pattern. The types are the different values or levels of the mark variable.
Here is the famous Lansing Woods dataset recording the positions of 2251 trees of 6 different
species (hickories, maples, red oaks, white oaks, black oaks and miscellaneous trees).

> data(lansing)
> lansing
marked planar point pattern: 2251 points
multitype, with levels = blackoak hickory maple misc redoak whiteoak
window: rectangle = [0, 1] x [0, 1] units (one unit = 924 feet)

> summary(lansing)
Marked planar point pattern: 2251 points
Average intensity 2250 points per square unit (one unit = 924 feet)

*Pattern contains duplicated points*

Multitype:
         frequency proportion intensity
blackoak       135     0.0600       135
hickory        703     0.3120       703
maple          514     0.2280       514
misc           105     0.0466       105
redoak         346     0.1540       346
whiteoak       448     0.1990       448

Window: rectangle = [0, 1] x [0, 1] units
Window area = 1 square unit
Unit of length: 924 feet

> plot(lansing)
blackoak  hickory    maple     misc   redoak whiteoak
       1        2        3        4        5        6

[Figure: lansing]

In this plot, each type of point (i.e. each species of tree) is represented by a different plot
symbol. The last line of output above explains the encoding: black oak is coded as symbol 1
(open circle) and so on.
An alternative way to plot these data is to split them into 6 point patterns, each pattern
containing the trees of one species. This is done using split:

> plot(split(lansing))

[Figure: split(lansing), with one panel for each of blackoak, hickory, maple, misc, redoak, whiteoak]

The result of split(lansing) is a list of point patterns. The names of the list entries are the
names of the types (in this case "blackoak", "hickory", etc). To extract one of these patterns,
e.g. the hickories,

> hick <- split(lansing)$hickory
> plot(hick)

[Figure: hick]

4.7 Installed datasets

For reference, here is a list of the standard point pattern datasets that are supplied with the
installation of spatstat:

name           description                        marks         covariates   window
amacrine       Hughes' rabbit amacrine cells      2 types                    □
anemones       Upton-Fingleton sea anemones       diameter                   □
ants           Harkness-Isham ant nests           2 species     2 zones      convex poly
bei            Tropical rainforest trees                        topography   □
betacells      Wässle et al. cat retinal ganglia  2 types                    □
bramblecanes   Bramble Canes                      3 ages                     □
cells          Crick-Ripley biological cells                                 □
chorley        Chorley-South Ribble cancers       case/control               irregular
copper         Queensland copper deposits                       fault lines  □
demopat        artificial data                    2 types                    irregular
finpines       Finnish Pines                      diameter                   □
hamster        Aherne's hamster tumour data       2 types                    □
humberside     Humberside child leukaemia         case/control               irregular
japanesepines  Japanese Pines                                                □
lansing        Lansing Woods                      6 species                  □
longleaf       Longleaf Pine trees                diameter                   □
nbfires        New Brunswick fires                several                    irregular
nztrees        Mark-Esler-Ripley NZ trees                                    □
ponderosa      Getis-Franklin Ponderosa pines                                □
redwood        Strauss-Ripley redwood saplings                               □
redwoodfull    Strauss redwood map (full set)                   2 zones      □
simdat         Simulated point pattern                                       □
spruces        Spruce trees in Saxony             diameter                   □
swedishpines   Strand-Ripley Swedish pines                                   □

The symbol □ indicates that the window for the pattern is a rectangle.
To flick through a nice display of all these datasets, type demo(data).
To access one of these datasets, type data(name) where name is the name listed above. To
see information about the dataset, type help(name). To plot the dataset, type plot(name).

4.8 Point-and-click on the screen

There is a graphical interface which allows you to draw a point pattern on the screen. Type
> X <- clickppp(10)
This opens a graphics window and invites you to point and click 10 times in the window.
The result is a point pattern, consisting of 10 points, stored in the object named X. To plot it,
type
> plot(X)

5 Objects, classes and methods

The tutorial examples above have used some of the object-oriented features of R. It is very
useful to know a little about how these work.

5.1 Classes in R

R is an object-oriented language. A dataset with some kind of structure on it (e.g. a contingency
table, a time series, a point pattern) is treated as a single object.
For example, R includes a dataset sunspots which is a time series containing monthly sunspot
counts from 1749 to 1983. This dataset can be manipulated as if it were a single object:

> plot(sunspots)
> summary(sunspots)
> X <- sunspots

Each object in R is identified as belonging to a particular type or class depending on its
structure. For example, the sunspots dataset is a time series:

> class(sunspots)
[1] "ts"

Standard operations, such as printing, plotting, or calculating the sample mean, are defined
separately for each class of object.
For example, typing plot(sunspots) invokes the generic command plot. Now sunspots is
an object of class "ts" representing a time series, and there is a special method for plotting
time series, called plot.ts. So the system executes plot.ts(sunspots). It is said that the plot
command is dispatched to the method plot.ts. The plot method for time series produces a
display that is sensible for time series, with axes properly annotated.
Tip: to find out how to modify the plot for an object of class "foo", consult
help(plot.foo) rather than help(plot).
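To see dispatch at work without any add-on package, here is a minimal sketch in base R. The class "foo", the constructor make_foo and its contents are invented purely for illustration; only print, summary, class and the sunspots dataset are standard R.

```r
# A toy class "foo" (hypothetical, for illustration only).
make_foo <- function(values) {
  structure(list(values = values), class = "foo")
}

# Method for the generic print(): used automatically for objects of class "foo".
print.foo <- function(x, ...) {
  cat("foo object holding", length(x$values), "values\n")
  invisible(x)
}

# Method for the generic summary().
summary.foo <- function(object, ...) {
  c(n = length(object$values), mean = mean(object$values))
}

obj <- make_foo(c(2, 4, 6))
class(obj)       # "foo"
print(obj)       # dispatched to print.foo
summary(obj)     # dispatched to summary.foo

# The built-in sunspots dataset has class "ts", so plot(sunspots) uses plot.ts.
class(sunspots)
```

Typing obj on its own at the prompt has the same effect as print(obj), because R prints the value of an expression by calling the generic print.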

5.2 Classes in spatstat

To handle point pattern datasets and related data, the spatstat package defines the following
classes of objects:

ppp: planar point pattern
owin: spatial region (observation window)
im: pixel image
psp: pattern of line segments

[Figure: examples of each class. Panels: Point pattern (class ppp); Rectangular window (class owin); Polygonal window (class owin); Binary mask window (class owin); Pixel image (class im); Line segment pattern (class psp)]

Most of the functionality in spatstat works on such objects. To use this functionality, you'll
need to read your raw data into R and then convert it into an object of the appropriate format.
In particular spatstat has methods for plot, print and summary for each of these classes.
For example, the plot method for point patterns, plot.ppp, ensures that the x and y scales

are equal, and does various other things that are sensible when plotting a spatial point pattern
rather than just a list of (x, y) pairs.

> data(humberside)
> plot(humberside)

[Figure: humberside]

Exercise 1 Find out how to modify the command plot(swedishpines) so that the title reads
"Swedish Pines data" and the points are represented by plus-signs instead of circles.

When you type print(swedishpines) or just swedishpines, this invokes the generic command
print, which dispatches to the method print.ppp, which prints some sensible information
about the point pattern swedishpines at the terminal.

> swedishpines
planar point pattern: 71 points
window: rectangle = [0, 96] x [0, 100] units (one unit = 0.1 metres)

The generic command summary is meant to provide basic summary statistics for a dataset.
When you type summary(swedishpines) this is dispatched to the method summary.ppp, which
computes a sensible set of summary statistics for a point pattern, and prints them at the terminal.

> summary(swedishpines)
Planar point pattern: 71 points
Average intensity 0.0074 points per square unit (one unit = 0.1 metres)

Window: rectangle = [0, 96] x [0, 100] units
Window area = 9600 square units
Unit of length: 0.1 metres

The command density is also generic. It is normally used to compute a kernel density
estimate of a probability distribution from a vector of numbers. (This default method is
called density.default.) But there is also a method for point patterns, so that when you type
density(swedishpines), this is dispatched to density.ppp which computes a two-dimensional
kernel estimate of the intensity function.

> plot(density(swedishpines, sigma = 10))

[Figure: density(swedishpines, sigma = 10)]

To see a list of all methods available in R for a particular generic function such as plot:

> methods(plot)

To see a list of all methods that are available for a particular class such as ppp:

> methods(class = "ppp")
 [1] [.ppp             [<-.ppp           affine.ppp        as.data.frame.ppp
 [5] as.owin.ppp       as.ppp.ppp        crossdist.ppp     cut.ppp
 [9] density.ppp       distmap.ppp       duplicated.ppp    identify.ppp
[13] is.marked.ppp     is.multitype.ppp  kstest.ppp        markformat.ppp
[17] marks.ppp         marks<-.ppp       nndist.ppp        nnwhich.ppp
[21] pairdist.ppp      pcf.ppp           plot.ppp          print.ppp
[25] quadrat.test.ppp  rescale.ppp       rotate.ppp        rshift.ppp
[29] shift.ppp         split.ppp         split<-.ppp       subset.ppp
[33] summary.ppp       unique.ppp        unitname.ppp      unitname<-.ppp

5.3 Return values

5.3.1 The return value of a function

Every function in R returns a value. The return value may be null, or a single number, a
list, or any kind of object. When you type an R expression on the command line, the result of
evaluating the expression is printed.

> 1 + 1
[1] 2
> sin(pi/3)
[1] 0.8660254

Just to confuse matters, the result of a function may be tagged as invisible so that it is not
printed.

> data(cells)
> plot(cells)

There's still a return value from the function, which can be captured by assigning the result
to a variable:

> a <- plot(cells)
> a
NULL
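The same behaviour can be reproduced in base R. This is a small sketch: quiet_square is a made-up function name, while invisible and withVisible are standard R functions.

```r
# A function whose result is returned invisibly (hypothetical helper).
quiet_square <- function(x) {
  invisible(x^2)
}

quiet_square(3)          # nothing is printed at the prompt
a <- quiet_square(3)     # the value is still returned and can be captured
a                        # 9

# withVisible() reports the value together with whether it would be auto-printed.
withVisible(quiet_square(3))$visible   # FALSE
withVisible(3^2)$visible               # TRUE
```

Wrapping a call in parentheses, as in (quiet_square(3)), is another way to force an invisible result to be printed.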

Tip: Many plotting commands return a value which is useful if you want to annotate
the plot. In spatstat the function plot.ppp plots a point pattern and returns
information about the encoding of the marks. After plotting a multitype pattern, to
make a nice legend for the plot, save the result of the plot call and pass it to the
legend command:

> a <- plot(lansing)
> legend(-0.25, 0.5, names(a), pch = a)

[Figure: lansing, with a legend for blackoak, hickory, maple, misc, redoak, whiteoak]

Tip: To find out the format of the output returned by a particular function fun,
type help(fun) and read the section headed Value.

5.3.2 Returning an object

A function which performs a complicated analysis of your data will typically return an object
belonging to a special class. This is a convenient way to handle calculations that yield large or
complicated output. It enables you to store the result for later use, and provides methods for
handling the result.
Many of the functions in spatstat return an object of a special class. For example, the
value returned by density.ppp is a pixel image (an object of class "im"). This is effectively a
large matrix, giving the values of the kernel estimate of intensity at each point in a fine regular
grid of locations.

> Z <- density(swedishpines, sigma = 10)
> Z
real-valued pixel image
100 x 100 pixel array (ny, nx)
enclosing rectangle: [0, 96] x [0, 100] units (one unit = 0.1 metres)

The class of pixel images in spatstat has methods for print, summary, plot and so on.

> summary(Z)
real-valued pixel image
100 x 100 pixel array (ny, nx)
enclosing rectangle: [0, 96] x [0, 100] units
dimensions of each pixel: 0.96 x 1 units
(one unit = 0.1 metres)
Image is defined on the full rectangular grid
Frame area = 9600 square units
Pixel values :
        range = [0.00188947243195950,0.0155470858797917]
        integral = 71.3036909843861
        mean = 0.00742746781087355

Another example is the command Kest which estimates Ripley's K-function. The value
returned by Kest is an object of class "fv" (function value table) containing the estimated
values of K(r), obtained using several different estimators, for a range of r values. This class
has methods for print, plot and so on.

> u <- Kest(X)
> u
Function value object (class fv)
for the function r -> K(r)
Entries:
id      label      description
--      -----      -----------
r       r          distance argument r
theo    Kpois(r)   theoretical Poisson K(r)
border  Kbord(r)   border-corrected estimate of K(r)
trans   Ktrans(r)  translation-corrected estimate of K(r)
iso     Kiso(r)    Ripley isotropic correction estimate of K(r)
--------------------------------------
Default plot formula:
        . ~ r
Recommended range of argument r: [0, 24]
Unit of length: 0.1 metres

> plot(u)

[Figure: plot of u, showing K(r) against r (one unit = 0.1 metres)]

6 Data input

To analyse your own point pattern data in spatstat, you'll need to read the raw data into R
and convert them into an object of class "ppp". This tutorial gives one basic recipe.

6.1 Basic recipe

In most cases, the observation window is a rectangle. The following steps will then be sufficient.

1. store the x and y coordinates for the points in two vectors x and y.
2. if there are marks attached to the points, store the corresponding marks in a vector m.
(Note: only a single mark value per point is allowed; multivariate marks are not supported.
But we're working on it.)
3. create the point pattern object by

> ppp(x, y, xrange, yrange)

or, if there are marks,

> ppp(x, y, xrange, yrange, marks = m)

where xrange, yrange are vectors of length 2 giving the x and y dimensions of the
rectangular window.

The value returned by the function ppp is an object of class "ppp" representing a point
pattern.

Entering coordinate data

Suppose we have recorded the x, y coordinates of 25 points that lie in a rectangle [0, 2] x [0, 1].
They can be entered into R in various ways, for example by typing them directly:

> x <- scan()
1: 1.94 0.32 1.74 0.64 0.12 1.44 0.29 0.74
9: 0.32 1.35 1.23 0.53 0.98 0.96 0.91 1.28
17: 1.24 0.14 1.75 0.24 0.45 0.94 1.22 1.60 0.62
26:
Read 25 items
> y <- scan()
1: 0.40 0.70 0.91 0.92 0.13 0.92 0.72 0.15
9: 0.78 0.59 0.02 0.70 0.75 0.33 0.52 0.75
17: 0.19 0.32 0.87 0.13 0.63 0.08 0.72 0.67 0.96
26:
Read 25 items

You can also use scan(file="filename") to read a stream of numbers from a file. Alternatively,
if the file is nicely formatted as a table with a separate line for each data point, use
read.table.

Unmarked point pattern

In the example above, the x coordinates are in the range [0, 2] and the y coordinates in [0, 1].
To create the point pattern object we simply type

> P <- ppp(x, y, c(0, 2), c(0, 1))
> plot(P)
> P
planar point pattern: 25 points
window: rectangle = [0, 2] x [0, 1] units

[Figure: P]

Marked point pattern

Mark values may have any atomic type: numeric, integer, character, logical, or complex. For
example, let's take a vector of real numbers:

> m <- scan()
1: 9.2 3.2 14.4 12.3 2.5 6.1 2.7 10.4
9: 10.2 0.4 20.9 10.4 25.7 7.7 13.7
16: 10.4 8.1 9.7 0.3 0.2 1.9 11.5

23: 16.8 36.2
26:
Read 25 items

and include this as the marks vector for the point pattern:

> Q <- ppp(x, y, c(0, 2), c(0, 1), marks = m)
> Q
marked planar point pattern: 25 points
marks are numeric, of type double
window: rectangle = [0, 2] x [0, 1] units

If the marks are intended to be a categorical variable, ensure that m is stored as
a factor.

> plot(Q)
         0         10         20         30         40
0.00000000 0.04323888 0.08647777 0.12971665 0.17295553

[Figure: Q]

The last line of output is the return value from plot(Q), which indicates the scale used to plot
the marks. The mark value 10 was plotted as a circle of radius 0.0432.

Categorical marks

When the mark is a categorical variable, we have a multitype point pattern. The types are the
different levels of the mark variable. The mark values should be stored as a factor in R.
For example, let's attach random marks to the pattern, taking two possible values "Yes" and
"No" with equal probability.

> m <- sample(c("Yes", "No"), 25, replace = TRUE)
> m <- factor(m)
> YN <- ppp(x, y, c(0, 2), c(0, 1), marks = m)
> YN
marked planar point pattern: 25 points
multitype, with levels = No Yes
window: rectangle = [0, 2] x [0, 1] units

> plot(YN)
 No Yes
  1   2

[Figure: YN]

The last line of output indicates how the marks were plotted: the mark "No" was plotted as
symbol 1 (circle) and mark "Yes" was plotted as symbol 2 (triangle).
Notice that the factor levels have been re-sorted alphabetically (by default). This is one of
the common slip-ups with factors in R. To stipulate a different ordering of the levels,

> m <- factor(m, levels = c("Yes", "No"))
> YN <- ppp(x, y, c(0, 2), c(0, 1), marks = m)
> YN
marked planar point pattern: 25 points
multitype, with levels = Yes No
window: rectangle = [0, 2] x [0, 1] units

Tip: whenever you create a factor, check that the factor levels are as you intended,
using levels(x).

Other ways of adding marks to a point pattern will be described in Section 23.

6.2 Checking data

It is prudent to check for quirks in the data.

Print out the coordinate values and marks to check for errors in data entry, and to determine
whether the coordinates have been rounded.

Duplicated points are surprisingly common in data files (i.e. where two records in the file
refer to the same (x, y) location). Once you have entered the coordinates into R as a
two-column matrix or a data frame D say, you can check for duplication using the command
any(duplicated(D)). If your data are already in the form of a point pattern X, you can
also type any(duplicated(X)) to detect duplication. To remove duplicated points, type
Y <- unique(X).

Plotting the point pattern is always wise. Look for unexpected patterns, and points that
lie outside the window.

On a plot of a point pattern X, you can identify an individual point by typing plot(X);
identify(X) then clicking on the point.

The function ppp automatically checks for duplicated points, and for points that lie outside
the specified window.
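Several of these checks can be done in base R on the raw data, before a point pattern object is created. The sketch below uses a small made-up data frame D and mark vector m; the coordinates, the window [0, 2] x [0, 1] and the variable names are assumptions chosen for illustration, and the code does not reproduce the internal checks performed by ppp.

```r
# Hypothetical raw data: coordinates intended to lie in [0, 2] x [0, 1], plus a Yes/No mark.
D <- data.frame(
  x = c(0.5, 1.2, 0.5, 1.9, 2.3),
  y = c(0.1, 0.8, 0.1, 0.4, 0.5)
)
m <- c("Yes", "No", "Yes", "No", "Yes")

# 1. Factor levels: the default ordering is alphabetical, so state the order explicitly.
levels(factor(m))                         # "No"  "Yes"
m <- factor(m, levels = c("Yes", "No"))
levels(m)                                 # "Yes" "No"

# 2. Duplicated records: rows that repeat an earlier (x, y) location.
any(duplicated(D))                        # TRUE: row 3 repeats row 1
D[duplicated(D), ]

# 3. Points outside the intended window.
xrange <- c(0, 2)
yrange <- c(0, 1)
outside <- D$x < xrange[1] | D$x > xrange[2] |
           D$y < yrange[1] | D$y > yrange[2]
which(outside)                            # 5: x = 2.3 lies outside the window

# Keep only unique records that fall inside the window, with their marks.
keep    <- !duplicated(D) & !outside
D_clean <- D[keep, ]
m_clean <- m[keep]
nrow(D_clean)                             # 3
```

Whether duplicated records should really be discarded depends on how the data were collected, so it is worth looking at them before removing anything.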

6.3 Units

A point pattern X may include information about the units of length in which the x and y
coordinates are recorded. This information is optional; it merely enables the package to print
better reports and to annotate the axes in plots.
If the x and y coordinates in the point pattern P were recorded in metres, type

> unitname(P) <- c("metre", "metres")

at least in Australia or New Zealand. The two strings are the singular and plural forms of the
unit. In Scandinavia and Germany you would type

> unitname(P) <- "meter"

The measurement unit can also be given as some multiple of a standard unit. If, for example,
one unit for the x and y coordinates equals 42 centimetres, type

> unitname(P) <- list("cm", "cm", 42)

Beware that the unitname applies only to the coordinates, and not to the marks, of a point
pattern.
Altering the unitname in an existing dataset is usually not sensible; it simply alters the
name of the unit, without changing the entries in the x and y vectors. If you want to convert
to different units (e.g. from metres to kilometres or from imperial to metric units), use the
command rescale as described in Section 9.2.5. If you want to actually change the coordinates
by a linear transformation, producing a dataset that is not equivalent to the original one, use
affine.

6.4 Other ways to make point patterns

To create a point pattern object we can either

create one from raw data using the function ppp
convert data from other formats (including other packages) using as.ppp
point-and-click on a graphics device using clickppp
read data from a file using scanpp
transform an existing point pattern using a variety of tools
generate a random pattern using one of the simulation routines
use one of the standard point pattern datasets supplied with the package.

The package help file help(spatstat) lists all the available options.
Note that it is a standard naming convention in R that, for a class "foo", there should
be a creator function foo that creates objects of this class from raw numerical data, and a
converter function as.foo that converts data from other formats into objects of class "foo".
We adhere to this convention in spatstat:

Class    Creator   Converter
"ppp"    ppp       as.ppp
"owin"   owin      as.owin
"im"     im        as.im

More alternatives for using ppp will be covered in Section 8.4.

7 Methods 1: Investigating intensity

Finally we can start working on statistical methods for analysing point pattern data.
When we analyse numerical data, we often begin by taking the sample mean. The analogue
of the mean or expected value of a random variable is the intensity of a point process.
Intensity is the average density of points (expected number of points per unit area). Intensity
may be constant (uniform or homogeneous) or may vary from location to location
(inhomogeneous). Investigation of the intensity should be one of the first steps in analysing a
point pattern.

7.1 Uniform intensity

7.1.1 Theory

If the point process X is homogeneous, then for any sub-region B of two-dimensional space, the
expected number of points in B is proportional to the area of B:

\[ E[N(X \cap B)] = \lambda \, \mathrm{area}(B) \]

and the constant of proportionality $\lambda$ is the intensity. Intensity units are numbers per unit area
(length$^{-2}$). If we know that a point process is homogeneous, then the empirical density of points,

\[ \overline{\lambda} = \frac{n(x)}{\mathrm{area}(W)} \]

is an unbiased estimator of the true intensity $\lambda$.

7.1.2 Implementation in spatstat

To compute the estimator $\overline{\lambda}$ in spatstat, use summary.ppp:

> data(bei)
> summary(bei)
Planar point pattern: 3604 points
Average intensity 0.00721 points per square metre

Window: rectangle = [0, 1000] x [0, 500] metres
Window area = 5e+05 square metres
Unit of length: 1 metre

The estimated intensity is $\overline{\lambda}$ = 0.00721 points per square metre. To extract this intensity
value, type

> lamb <- summary(bei)$intensity
> lamb
[1] 0.007208
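The arithmetic behind this estimate is easy to reproduce by hand. The sketch below uses only base R and the counts and window dimensions reported in the text; the simulation at the end is my own illustration of unbiasedness and assumes a homogeneous Poisson process, for which the number of points in W has a Poisson distribution with mean equal to the intensity times area(W).

```r
# Homogeneous intensity estimate: number of points divided by window area.
# The figures are those reported by summary(bei) in the text.
n_points    <- 3604
window_area <- 1000 * 500             # square metres
lambda_bar  <- n_points / window_area
lambda_bar                            # 0.007208 points per square metre

# Changing units rescales the intensity: 71 trees in a 96 x 100 decimetre plot.
per_dm2 <- 71 / (96 * 100)            # about 0.0074 trees per square decimetre
per_m2  <- per_dm2 * 10^2             # 1 square metre = 100 square decimetres; about 0.74

# Unbiasedness by simulation (assumes a homogeneous Poisson process).
set.seed(1)
lambda_true <- 0.007208
counts      <- rpois(10000, lambda_true * window_area)
estimates   <- counts / window_area
mean(estimates)                       # close to lambda_true
```

The average of the simulated estimates settles near the true value, which is what unbiasedness means in practice.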

7.2 Inhomogeneous intensity

7.2.1 Theory

In general the intensity of a point process will vary from place to place. Assume that the
expected number of points falling in a small region of area du around a location u is equal to
$\lambda(u)\,du$. Then $\lambda(u)$ is the intensity function of the process, satisfying

\[ E[N(X \cap B)] = \int_B \lambda(u)\,du \]

for all regions B.
More generally there could be singular concentrations of intensity (e.g. earthquake epicentres
may be concentrated along a fault line) so that an intensity function does not exist. Then we
speak of the intensity measure $\Lambda$ defined by

\[ \Lambda(B) = E[N(X \cap B)] \]

for each $B \subset \mathbb{R}^2$, assuming the expectation is finite.
If it is suspected that the intensity may be inhomogeneous, the intensity function or intensity
measure can be estimated nonparametrically by techniques such as quadrat counting and kernel
smoothing.
In quadrat counting, the window W is divided into subregions (quadrats) $B_1, \ldots, B_m$ of
equal area. We count the numbers of points falling in each quadrat, $n_j = n(x \cap B_j)$ for
$j = 1, \ldots, m$. These are unbiased estimators of the corresponding intensity measure values $\Lambda(B_j)$.
The usual kernel estimator of the intensity function is

\[ \tilde{\lambda}(u) = e(u) \sum_{i=1}^{n} \kappa(u - x_i), \tag{1} \]

where $\kappa(u)$ is the kernel (an arbitrary probability density) and

\[ e(u)^{-1} = \int_W \kappa(u - v)\,dv \tag{2} \]

is an edge effect bias correction. Clearly $\tilde{\lambda}(u)$ is an unbiased estimator of

\[ \lambda^{*}(u) = e(u) \int_W \kappa(u - v)\,\lambda(v)\,dv, \]

a smoothed version of the true intensity function $\lambda(u)$. The choice of smoothing kernel involves
a tradeoff between bias and variance.
Intensity can also be estimated using parametric methods, as we explain in Section 11.

7.2.2 Implementation in spatstat

Quadrat counting is performed in spatstat by the function quadratcount.

> quadratcount(bei, nx = 4, ny = 2)
             y
x             [0,250] (250,500]
  [0,250]         544       666
  (250,500]       165       677
  (500,750]       643       130
  (750,1e+03]     298       481

> Q <- quadratcount(bei, nx = 6, ny = 3)
> plot(bei, cex = 0.5, pch = "+")
> plot(Q, add = TRUE, cex = 2)

[Figure: bei plotted with "+" symbols, overlaid with the 18 quadrat counts 337, 608, 162, 73, 105, 268, 422, 49, 17, 52, 128, 146, 231, 134, 92, 406, 310, 64]

The value returned by quadratcount is an object belonging to the special class "quadratcount".
We have used the plot method for this class to get the display above.
Kernel density (or intensity) estimation using an isotropic Gaussian kernel is implemented
in spatstat by the function density.ppp, a method for the generic command density.

> den <- density(bei, sigma = 70)
> plot(den)
> plot(bei, add = TRUE, cex = 0.5)

[Figure: den, with the points of bei superimposed]

The value returned by density.ppp is a pixel image (object of class "im"). This class has
methods for print, summary, plot, contour (contour plots), persp (perspective plots) and so
on.

> persp(den)

[Figure: perspective plot of den]

7.2 Inhomogeneous intensity

39

40
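The edge-corrected kernel estimator in equations (1) and (2) can be written out by hand for a rectangular window. This is only a sketch under stated assumptions: the window is the rectangle [0, a] x [0, b], the kernel is an isotropic Gaussian with standard deviation sigma, and the data are simulated. The function name kernel_intensity is hypothetical and is not part of spatstat.

```r
# Edge-corrected Gaussian kernel estimate of intensity at location (u1, u2),
# for data points (x, y) observed in the rectangle [0, a] x [0, b]
kernel_intensity <- function(u1, u2, x, y, sigma, a, b) {
  # sum over data points of kappa(u - x_i), with kappa an isotropic Gaussian
  ksum <- sum(dnorm(u1 - x, sd = sigma) * dnorm(u2 - y, sd = sigma))
  # 1 / e(u) is the integral of kappa(u - v) over the window; for a rectangle
  # it factorises into a product of two normal probabilities
  mass <- (pnorm(a, mean = u1, sd = sigma) - pnorm(0, mean = u1, sd = sigma)) *
          (pnorm(b, mean = u2, sd = sigma) - pnorm(0, mean = u2, sd = sigma))
  ksum / mass
}

# Hypothetical data: 500 uniform points in the unit square (true intensity 500)
set.seed(42)
x <- runif(500)
y <- runif(500)

lam_centre <- kernel_intensity(0.5, 0.5, x, y, sigma = 0.1, a = 1, b = 1)
lam_corner <- kernel_intensity(0, 0, x, y, sigma = 0.1, a = 1, b = 1)
print(c(centre = lam_centre, corner = lam_corner))
```

At the corner of the window only about a quarter of the kernel mass lies inside W, so the correction factor e(u) is close to 4 there; without it the estimate near the boundary would be biased downwards.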

> contour(den)

[Figure: contour plot of den]

8 Defining the window

Many commands in spatstat require us to specify a window, study region or domain. It will
be handy to know more about windows in spatstat.
An object of class "owin" (observation window) represents a region or window in
two-dimensional space. The window may be
• a rectangle;
• a polygon or polygons, with polygonal holes; or
• an irregular shape represented by a binary pixel image mask.

[Figure: three example windows. Panels: Rectangular window; Polygonal window; Binary mask window]

Objects of this class are created by the function owin. There are methods for printing and
plotting windows, and numerous geometrical operations.

8.1 Making windows

8.1.1 Rectangular window

To create a rectangular window, type


> owin(xrange, yrange)
where xrange, yrange are vectors of length 2 giving the x and y dimensions, respectively, of
the rectangle.
> owin(c(0, 3), c(1, 2))
window: rectangle = [0, 3] x [1, 2] units
For a square window you can also use square:
> square(5)
window: rectangle = [0, 5] x [0, 5] units
8.1.2 Circular window

For a circular window use disc:


> W <- disc(radius = 3, centre = c(0, 0))
Currently a circular window is represented as a polygon with a large number of edges.
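To illustrate what a polygonal approximation to a disc looks like, here is a base R sketch that generates the vertices of a regular polygon. The function names disc_polygon and signed_area are hypothetical, and n = 128 edges is an arbitrary choice; the notes do not say how many edges disc actually uses.

```r
# Vertices of a regular n-gon approximating a disc, listed anticlockwise
# and not closed (the last vertex differs from the first)
disc_polygon <- function(radius = 3, centre = c(0, 0), n = 128) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  list(x = centre[1] + radius * cos(theta),
       y = centre[2] + radius * sin(theta))
}

# Signed area by the shoelace formula: positive for anticlockwise traversal
signed_area <- function(p) {
  xnext <- c(p$x[-1], p$x[1])
  ynext <- c(p$y[-1], p$y[1])
  sum(p$x * ynext - xnext * p$y) / 2
}

p <- disc_polygon(radius = 3, n = 128)
a <- signed_area(p)
print(c(polygon_area = a, disc_area = pi * 3^2))
```

A list with components x and y of this form is the kind of object that could be passed as the poly argument of owin.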

8.1.3 Polygonal window

Spatstat supports polygonal windows of arbitrary shape and topology. That is, the boundary
of the window may consist of one or more closed polygonal curves, which do not intersect
themselves or each other. The window may have holes. Type
> owin(poly = p)
or
> owin(poly = p, xrange, yrange)
to create a polygonal window. The argument poly=p indicates that the window is polygonal
and its boundary is given by the dataset p. Note we must use the name=value syntax to give
the argument poly. The arguments xrange and yrange are optional here; if they are absent,
the x and y dimensions of the bounding rectangle will be computed from the polygon.
If the window boundary is a single polygon, then p should be a list with components x and y
giving the coordinates of the vertices of the window boundary, traversed anticlockwise. For
example, the triangle with corners (0, 0), (1, 0) and (0, 1) is created by

> Z <- owin(poly = list(x = c(0, 1, 0), y = c(0, 0, 1)))
> plot(Z)

Note that polygons should not be closed, i.e. the last vertex should not equal the first
vertex. The same convention is used in the standard plotting function polygon().
If the window boundary consists of several separate polygons, then p should be a list, each
of whose components p[[i]] is a list with components x and y describing one of the polygons.
The vertices of each polygon should be traversed anticlockwise for external boundaries and
clockwise for internal boundaries (holes). For example, the following creates a triangle
with a square hole.

> Z <- owin(poly = list(list(x = c(0, 8, 0), y = c(0, 0, 8)), list(x = c(2,
+     2, 3, 3), y = c(2, 3, 3, 2))))
> plot(Z)

Notice that the first boundary polygon is traversed anticlockwise and the second clockwise,
because it is a hole.
It is often useful to plot a polygonal window with line shading:

> plot(Z, hatch = TRUE)

8.1.4 Binary mask

A window may be defined by a discrete pixel approximation. Type

owin(mask=m, xrange, yrange)

to create the window object. Here m should be a matrix with logical entries; it will be interpreted
as a binary pixel image whose entries are TRUE where the corresponding pixel belongs to the
window.
The rectangle with dimensions xrange, yrange is divided into equal rectangular pixels. The
correspondence between matrix indices m[i,j] and cartesian coordinates is slightly idiosyncratic:
the rows of m correspond to the y coordinate, and the columns to the x coordinate. The entry
m[i,j] is TRUE if the point (xx[j],yy[i]) (sic) belongs to the window, where xx, yy are
vectors of pixel coordinates equally spaced over xrange and yrange respectively. The length of
xx is ncol(m) while the length of yy is nrow(m).
In some GIS applications the study region will be given as a binary pixel image. A safe
strategy is to dump the data from the GIS system to a text file, and read the text file into R
using scan. Then reformat it as a matrix, and use owin to create the window object.
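The row and column convention for the mask matrix is easy to get wrong, so here is a small base R sketch of how such a matrix might be built. It assumes that xx and yy are pixel centres, and the region (the half-plane y <= x/2) is made up purely for illustration.

```r
# Pixel coordinates equally spaced over xrange = [0, 4] and yrange = [0, 2]
xrange <- c(0, 4)
yrange <- c(0, 2)
nc <- 8                               # number of columns of m (x direction)
nr <- 4                               # number of rows of m (y direction)
xstep <- diff(xrange) / nc
ystep <- diff(yrange) / nr
xx <- seq(xrange[1] + xstep / 2, xrange[2] - xstep / 2, length.out = nc)
yy <- seq(yrange[1] + ystep / 2, yrange[2] - ystep / 2, length.out = nr)

# m[i, j] is TRUE when the point (xx[j], yy[i]) lies in the region:
# rows index the y coordinate, columns index the x coordinate
m <- outer(yy, xx, function(y, x) y <= x / 2)
print(dim(m))
```

A logical matrix built this way has nrow(m) equal to length(yy) and ncol(m) equal to length(xx), which is the layout described above for owin(mask=m, xrange, yrange).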
To convert a rectangle or polygonal window to a binary mask, use as.mask.
> Z <- owin(poly = list(x = c(0, 1, 0), y = c(0, 0, 1)))
> W <- as.mask(Z)
> plot(W)

8.2 Functions that return a window

Some functions return a window object. They include

as.owin          Convert other data to a window object
disc             Create a circular window
clickpoly        The user draws a polygon on the screen
bounding.box     Bounding box of a window
bounding.box.xy  Bounding box of a point pattern
convexhull.xy    Convex hull of a point pattern
ripras           Ripley-Rasson estimator of window, given only the points
trim.rectangle   Cut off side(s) of a rectangle
levelset         Level set of a pixel image
solutionset      Solution of an equation involving pixel image(s)

For example, the dataset bei.extra$elev is a pixel image containing altitude (elevation)
values for a study region. To find the subset where altitude exceeds 145,

> elev <- bei.extra$elev
> W <- levelset(elev, 145, ">")
> plot(W)

The result W is a window.
The accompanying dataset bei.extra$grad is a pixel image of the slope (gradient) of the
terrain. To find the subset where altitude is below 140 and slope exceeds 0.1,

> grad <- bei.extra$grad
> V <- solutionset(elev <= 140 & grad > 0.1)
> plot(V)

8.3 Operations on windows

Basic methods for the class "owin" include

print.owin       print short description of a window
summary.owin     print detailed summary of a window
plot.owin        plot a window

Numerous geometrical operations are implemented for window objects. They include:

area.owin        compute window's area
diameter         compute window's diameter
intersect.owin   intersection of two windows
union.owin       union of two windows
bounding.box     find a tight bounding box for the window
complement.owin  swap inside and outside
rotate           rotate window
shift            translate window
affine           apply affine transformation
rescale          change scale and adjust units
as.mask          convert to binary image mask
dilate.owin      morphological dilation
erode.owin       morphological erosion
eroded.areas     compute areas of eroded windows
inside.owin      determine whether a point is inside a window
distmap.owin     distance transform image
centroid.owin    compute centroid (centre of mass) of window
is.subset.owin   determine whether one window contains another

8.4 Creating a point pattern in any window

As we saw in Section 6.1, the function ppp() will create a point pattern (an object of class
"ppp") from raw numerical data in R.
Suppose the x, y coordinates of the points of the pattern are contained in vectors x and y of
equal length. Then

ppp(x, y, other.arguments)

will create the point pattern. The other arguments must determine a window for the pattern,
in one of two ways:
• the other arguments can be passed to owin to determine a window:

  ppp(x, y, xrange, yrange)            point pattern in rectangle
  ppp(x, y, poly=p)                    point pattern in polygonal window
  ppp(x, y, poly=p, xrange, yrange)    point pattern in polygonal window
  ppp(x, y, mask=m, xrange, yrange)    point pattern in binary mask window

• if W is a window object (class "owin") then

  > ppp(x, y, window = W)

  will create the point pattern.

You may already have a window W (an object of class "owin") ready to hand, and now want
to create a pattern of points in this window. For example you may want to put a new point
pattern inside the window of an existing point pattern X; the window is accessed as X$window,
so type

ppp(x, y, window=X$window)

9 Manipulating point patterns

Before proceeding, we need to know more about how to manipulate and interrogate point pattern
data.

9.1 Format of ppp objects

A point pattern is represented in spatstat by an object of the class "ppp". This contains the
coordinates of the points, optional mark values attached to the points, and a description of the
study region or spatial window.

9.1.1 Format

A point pattern object P has the following components:
• P$n is the number of points (which may be zero).
• P$x is a numeric vector containing the x coordinates of the points. Its length equals P$n
  (and may be zero).
• P$y is a numeric vector containing the y coordinates of the points. Its length also equals
  P$n.
• P$marks contains the marks. It is either NULL, or a vector of length P$n containing the
  mark values. The entries of P$marks may be of any atomic type (character, numeric,
  logical, complex).
• P$window is an object of class "owin" (observation window) determining the study region
  or spatial window.

You can extract these components individually; for example, to make a histogram of the
x coordinates just type hist(P$x). However, do not assign values to these components
directly, or you may create inconsistencies in the data which cause spatstat to crash. To
manipulate point patterns, use the functions provided.
Although a point pattern should be treated as an unordered set, the coordinates are obviously
stored in a particular order, and can be addressed using that order.

> data(longleaf)
> x <- longleaf$x
> y <- longleaf$y
> diameter <- longleaf$marks
> cbind(x, y, diameter)[1:5, ]
         x    y diameter
[1,] 200.0  8.8     32.9
[2,] 199.3 10.0     53.5
[3,] 193.6 22.4     68.0
[4,] 167.7 35.6     17.7
[5,] 183.9 45.4     36.9

If the marks are a categorical variable, then P$marks is a factor.

> data(chorley)
> x <- chorley$x
> y <- chorley$y
> type <- chorley$marks
> data.frame(x, y, type)[55:60, ]
       x     y   type
55 355.6 413.9 larynx
56 355.5 413.9 larynx
57 355.7 413.9 larynx
58 355.6 414.1 larynx
59 359.0 417.3   lung
60 353.1 426.9   lung
> is.factor(type)
[1] TRUE
> levels(type)
[1] "larynx" "lung"
> table(type)
type
larynx   lung
    58    978

9.1.2 A point pattern needs a window

Note especially that, when you create a new point pattern object, you need to specify the spatial
region or window in which the pattern was observed. In spatstat, the observation window is an
integral part of the point pattern. A point pattern dataset consists of knowledge about where
points were not observed, as well as the locations where they were observed. Even something as
simple as estimating the intensity of the pattern depends on the window of observation. It would
be wrong, or at least different, to analyze a point pattern dataset by guessing the appropriate
window (e.g. by computing the convex hull of the points). An analogy may be drawn with the
difference between sequential experiments and experiments in which the sample size is fixed a
priori.
Often, the window of observation is a rectangle, so this requirement just means that we have
to specify the x and y dimensions of the rectangle when we create the point pattern. Windows
with a more complicated shape can easily be represented in spatstat, as described below.
For situations where the window is really unknown, spatstat provides the function ripras
to compute the Ripley-Rasson estimator of the window, given only the point locations.

9.2 Operations on ppp objects

Directly manipulating the entries inside an object is not safe. It is also unnecessary, because
these manipulations can be performed using functions or operators.
For point patterns (objects of class "ppp") there are the following operations.

9.2.1 Extracting subsets

Recall that in R the subset operator is [ ]. If x is a vector of numbers, then x[s] extracts an
element or subset of x. The subset index s can be
• a positive integer: x[3] means the third element of x;
• a vector of positive integers indicating which elements to extract: x[c(2,4,6)] extracts
  the 2nd, 4th and 6th elements of x;
• a vector of negative integers indicating which elements not to extract: x[-1] means all
  elements of x except the first one;
• a vector of logical values, of the same length as x, with each TRUE entry of s indicating
  that the corresponding entry of x should be extracted, and FALSE indicating that it should
  not be extracted. For example x[x > 3.1] extracts those elements of x which are greater
  than 3.1.
To extract a subset of a point pattern in spatstat, we also use the subset operator [ ]. If
X is a point pattern then X[s] is also a point pattern, consisting of those points of X selected by
the subset index s, where s can be any of the types listed above. (Recall that the points
in a point pattern object are stored in a particular order; this is the order in which they are
indexed by s.)

> data(bei)
> bei
planar point pattern: 3604 points
window: rectangle = [0, 1000] x [0, 500] metres
> bei[1:10]
planar point pattern: 10 points
window: rectangle = [0, 1000] x [0, 500] metres

It is also possible to extract the subset defined by a spatial region. If X is a point pattern
and W is a spatial window (object of class "owin") then X[W] is the point pattern consisting of
all points of X that lie inside W.

> W <- owin(c(100, 800), c(100, 400))
> W
window: rectangle = [100, 800] x [100, 400] units
> bei[W]
planar point pattern: 918 points
window: rectangle = [100, 800] x [100, 400] units

Tip: You may need to put quotes around the subset operator in some contexts.
The generic subset operator is [ but the help file is summoned by typing help("[").
The subset method for point patterns is called [.ppp but the help file is summoned
by typing help("[.ppp").

9.2.2 Fiddling with marks

To extract the marks from a point pattern, use marks:

> m <- marks(X)

To add or change marks, use marks<-

> marks(X) <- whatever

To delete marks from a point pattern, assign the marks to NULL:

> marks(X) <- NULL

For convenience, you can also perform these operations inside an expression, using the
function unmark to remove marks and the binary operator %mark% to add marks:

> data(redwood)
> radii <- rexp(redwood$n, rate = 10)
> X <- redwood %mark% radii
> X
marked planar point pattern: 62 points
marks are numeric, of type double
window: rectangle = [0, 1] x [-1, 0] units
> unmark(X)

planar point pattern: 62 points
window: rectangle = [0, 1] x [-1, 0] units

For a point pattern with real-valued marks, the method cut.ppp for the generic function
cut will divide the range of mark values into several discrete bands, yielding a point pattern
with categorical marks:

> Y <- cut(X, 3)
> Y <- cut(X, breaks = c(0, 1, 10, Inf))
> Y
marked planar point pattern: 62 points
multitype, with levels = (0,1]  (1,10]  (10,Inf]
window: rectangle = [0, 1] x [-1, 0] units

9.2.3 Combining point patterns

Any number of point patterns can be combined to make a single pattern, using superimpose.

> X <- runifpoint(20)
> Y <- runifpoint(10)
> superimpose(X, Y)
planar point pattern: 30 points
window: rectangle = [0, 1] x [0, 1] units

The argument W, if given, specifies the window for the combined point pattern.

> superimpose(X, Y, W = square(2))
planar point pattern: 30 points
window: rectangle = [0, 2] x [0, 2] units

To attach a separate mark to each component pattern, use argument names:

> superimpose(Hooray = X, Boo = Y)
marked planar point pattern: 30 points
multitype, with levels = Hooray  Boo
window: rectangle = [0, 1] x [0, 1] units

9.2.4 Geometrical transformations

The commands rotate, shift and affine apply two-dimensional rotation, vector shifts, and
affine transformations, respectively.

9.2.5 Changing scales and units

A scalar dilation can be applied using affine. For example, the Swedish Pines data were
recorded in decimetres. To convert the coordinates to metres, we could type

> data(swedishpines)
> X <- affine(swedishpines, mat = diag(c(1/10, 1/10)))
> unitname(X) <- c("metre", "metres")
> X
planar point pattern: 71 points
window: rectangle = [0, 9.6] x [0, 10] metres

The command rescale performs the same function:

> data(swedishpines)
> X <- rescale(swedishpines, 10)
> X
planar point pattern: 71 points
window: rectangle = [0, 9.6] x [0, 10] metres

Beware that this does not change the marks in the point pattern. If your marks represent
tree diameter and you want to rescale them as well, this must be done by hand.

9.3 Example

We will use one of the standard point pattern datasets that is installed with the package. The
NZ trees dataset represents the positions of 86 trees in a forest plot 153 by 95 feet.

> data(nztrees)
> nztrees
planar point pattern: 86 points
window: rectangle = [0, 153] x [0, 95] feet
> plot(nztrees)

[Figure: nztrees]

To get an impression of local spatial variations in intensity, we plot a kernel density estimate
of intensity.
> contour(density(nztrees, 10), axes = FALSE)

[Figure: contour plot of density(nztrees, 10)]

The density surface has a steep slope at the top right-hand corner of the study region.
Looking at the plot of the point pattern itself, we can see a cluster of trees at the top right.
You may also notice a line of trees at the right-hand edge of the study region. It looks
as though the study region may have included some trees that were planted as a boundary or
avenue. This sticks out like a sore thumb if we plot the x coordinates of the trees:

> hist(nztrees$x, nclass = 25)

[Figure: Histogram of nztrees$x]

We might want to exclude the right-hand boundary from the study region, to focus on the
pattern of the remaining trees. Let's say we decide to trim a 5-foot margin off the right-hand
side.
First we create the new, trimmed study region:

> chopped <- owin(c(0, 148), c(0, 95))

or more slickly,

> win <- nztrees$window
> chopped <- trim.rectangle(win, xmargin = c(0, 5), ymargin = 0)
> chopped
window: rectangle = [0, 148] x [0, 95] feet

(Notice that chopped is not a point pattern, but simply a rectangle in the plane.)
Then, using the subset operator [.ppp, we simply extract the subset of the original point
pattern that lies inside the new window:

> nzchop <- nztrees[chopped]

We can now study the chopped point pattern:

> summary(nzchop)
Planar point pattern: 78 points
Average intensity 0.00555 points per square foot
Window: rectangle = [0, 148] x [0, 95] feet
Window area = 14060 square feet
Unit of length: 1 foot
> plot(density(nzchop, 10))
> plot(nzchop, add = TRUE)

[Figure: density(nzchop, 10), with the points of nzchop superimposed]

Removing the right margin seems to have produced a much more uniform pattern.
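The trimming step and the summary figures can be checked by hand. The following base R sketch uses made-up coordinates in place of nztrees$x and nztrees$y, and the helper inside_rect is hypothetical; only the numbers 78, 148 and 95 are taken from the output of summary(nzchop).

```r
# Hypothetical helper: which points lie inside the rectangle
# [xr[1], xr[2]] x [yr[1], yr[2]]?
inside_rect <- function(x, y, xr, yr) {
  x >= xr[1] & x <= xr[2] & y >= yr[1] & y <= yr[2]
}

# Made-up coordinates (in feet), standing in for the tree locations
x <- c(10, 60, 120, 149, 152)
y <- c(5, 40, 90, 20, 70)
keep <- inside_rect(x, y, xr = c(0, 148), yr = c(0, 95))
trimmed <- data.frame(x = x[keep], y = y[keep])
print(trimmed)

# Average intensity of the trimmed pattern: number of points / window area
area_chop <- 148 * 95
intensity_chop <- 78 / area_chop
print(c(area = area_chop, intensity = intensity_chop))
```

The window area and average intensity computed this way agree with the values printed by summary(nzchop), which shows how the estimate of intensity depends on the window as well as on the points.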

10 Methods 2: Tests of Complete Spatial Randomness

The basic reference or benchmark model of a point process is the uniform Poisson point
process in the plane with intensity $\lambda$, sometimes called Complete Spatial Randomness (CSR).
Its basic properties are
• the number of points falling in any region A has a Poisson distribution with mean $\lambda \cdot \mathrm{area}(A)$
• given that there are n points inside region A, the locations of these points are i.i.d. and
  uniformly distributed inside A
• the contents of two disjoint regions A and B are independent.
The uniform Poisson process is often the null model in an analysis. For historical reasons,
many applied writers focus on establishing that their data do not conform to a uniform Poisson
process.

10.1 Definition

The homogeneous Poisson process of intensity $\lambda > 0$ has the properties
(PP1): the number $N(X \cap B)$ of points falling in any region B is a Poisson random variable;
(PP2): the expected number of points falling in B is $E[N(X \cap B)] = \lambda \cdot \mathrm{area}(B)$;
(PP3): if $B_1, B_2$ are disjoint sets then $N(X \cap B_1)$ and $N(X \cap B_2)$ are independent random variables;
(PP4): given that $N(X \cap B) = n$, the n points are independent and uniformly distributed in B.
The list is redundant; (PP2) and (PP3) are sufficient.
This process is often called Complete Spatial Randomness (CSR) especially in biological
science. Under CSR, points are independent of each other and have the same propensity to be
found at any location.
It is easy to simulate the Poisson process directly by following the properties (PP1)-(PP4).
In spatstat, use the command rpoispp (by convention, random data generators have names
beginning with r).

> plot(rpoispp(100))

[Figure: rpoispp(100)]

Conceptually, if we discretise a homogeneous Poisson process into infinitesimal pixels, the
indicators I are independent and identically distributed, with success probability
$P\{I = 1\} = \lambda \, dA$ where dA is the infinitesimal area of a pixel.
To develop some intuition about completely random patterns, it's useful to repeat the
command plot(rpoispp(100)) several times (use the up-arrow key to recall the previous command
line) so that you see several replicates of the Poisson process. In particular you will notice that
the points in a homogeneous Poisson process are not uniformly spread: there are empty gaps
and clusters of points.
The command rpoispp has arguments lambda (the intensity) and win (the window in which
to simulate). The default window is the unit square.

> data(letterR)
> plot(rpoispp(100, win = letterR))

[Figure: rpoispp(100, win = letterR)]

If you want to simulate a Poisson process conditionally on a fixed number of points, use the
command runifpoint.

> runifpoint(100)
planar point pattern: 100 points
window: rectangle = [0, 1] x [0, 1] units
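Direct simulation from the defining properties takes only a few lines in base R. This is a minimal sketch for a rectangular window, not the code used by rpoispp; the function name sim_poisson is hypothetical.

```r
# Simulate a homogeneous Poisson process of intensity lambda
# in the rectangle [0, a] x [0, b]
sim_poisson <- function(lambda, a = 1, b = 1) {
  # (PP1)-(PP2): the number of points is Poisson with mean lambda * area
  n <- rpois(1, lambda * a * b)
  # (PP4): given n, the points are independent and uniform in the window
  data.frame(x = runif(n, 0, a), y = runif(n, 0, b))
}

set.seed(2008)
pts <- sim_poisson(100)
print(nrow(pts))

# Conditioning on a fixed number of points simply skips the Poisson draw
fixed <- data.frame(x = runif(100), y = runif(100))
plot(pts$x, pts$y, asp = 1, xlab = "x", ylab = "y")
```

Repeated calls give a different number of points each time, whereas the conditional version always returns exactly 100 points.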

10.2 Quadrat counting tests for CSR

In classical literature, the homogeneous Poisson process (CSR) is usually taken as the appropriate
null model for a point pattern. Our basic task in analysing a point pattern is to find evidence
against CSR.
A classical test for the null hypothesis of CSR is the $\chi^2$ test based on quadrat counts. As
explained earlier, the window W is divided into subregions (quadrats) $B_1, \ldots, B_m$ of equal area.
We count the numbers of points falling in each quadrat, $n_j = n(\mathbf{x} \cap B_j)$ for $j = 1, \ldots, m$. Under
the null hypothesis of CSR, the $n_j$ are i.i.d. Poisson random variables with the same expected
value. The Pearson $\chi^2$ goodness-of-fit test can be used.

> quadrat.test(nzchop, nx = 3, ny = 2)

        Chi-squared test of CSR using quadrat counts

data:  nzchop
X-squared = 5.0769, df = 5, p-value = 0.4066

The value returned by quadrat.test is an object of class "htest" (the standard R class
for hypothesis tests). Printing the object (as shown above) gives comprehensible output about
the outcome of the test. Inspecting the p-value, we see that the test does not reject the null
hypothesis of CSR for the (chopped) New Zealand trees data.
The return value of quadrat.test also belongs to the special class "quadrat.test". Plotting
the object will display the quadrats, annotated by their observed and expected counts and the
Pearson residuals (observed counts $n_j$ at top left; expected count at top right; Pearson residuals
at bottom).

> M <- quadrat.test(nzchop, nx = 3, ny = 2)
> M

        Chi-squared test of CSR using quadrat counts

data:  nzchop
X-squared = 5.0769, df = 5, p-value = 0.4066

> plot(nzchop)
> plot(M, add = TRUE, cex = 2)

[Figure: nzchop with the quadrats superimposed, each annotated with its observed count, expected count and Pearson residual]

The p-value can also be extracted by

> M$p.value
[1] 0.4065648

10.3 Critique

Since this kind of technique is often used in the applied literature, a few comments are
appropriate.
The main critique of the quadrat test approach is the lack of information. This is a
goodness-of-fit test in which the alternative hypothesis $H_1$ is simply the negation of $H_0$, that is, the
alternative is that the process is not a homogeneous Poisson process. A point process may
fail to satisfy properties (PP1)-(PP4) either because it violates (PP2) by having non-uniform
intensity, or because it violates (PP3)-(PP4) by exhibiting dependence between points. There
are too many types of departure from $H_0$.
The usual justification for the classical $\chi^2$ goodness-of-fit test is to assume that the counts
are independent, and derive a test of the null hypothesis that all counts have the same expected
value. Invoking it here is slightly naive, since the independence of counts is also open to question
here.
Indeed we can also turn things around and view the $\chi^2$ test as a test of the Poisson
distributional properties (PP2)-(PP3) assuming that the intensity is uniform. The Pearson $\chi^2$ test
statistic

$$X^2 = \frac{\sum_j (n_j - n/m)^2}{n/m}$$

(where $n = \sum_j n_j$ is the total number of points) coincides, up to a constant factor, with the
sample variance-to-mean ratio of the counts $n_j$, which is often interpreted as a measure of
over/under-dispersion of the counts $n_j$ assuming they have constant mean.
The power of the quadrat test depends on the size of quadrats, and falls to zero for quadrats
which are either very large or very small. The power also depends on the alternative hypothesis,
in particular on the spatial scale of any departures from the assumptions of constant intensity
and independence of points. The choice of quadrat size carries an implicit assumption about the
spatial scale.

10.4 Kolmogorov-Smirnov test of CSR

Typically a more powerful test of CSR is the Kolmogorov-Smirnov test in which we compare
the observed and expected distributions of the values of some function T.
We specify a real-valued function T(x, y) defined at all locations (x, y) in the window. We
evaluate this function at each of the data points. Then we compare this empirical distribution
of values of T with the predicted distribution of values of T under CSR, using the classical
Kolmogorov-Smirnov test.
In spatstat the spatial Kolmogorov-Smirnov test is performed by kstest. This function is
generic. The method for point patterns, kstest.ppp, performs the Kolmogorov-Smirnov test
for CSR.
If X is the data point pattern, then

> kstest(X, fun)

performs the test, where fun is a function(x,y) in the R language.
For example, let's consider the nzchop data and choose the function T to be the x coordinate,
T(x, y) = x. This means we are simply comparing the observed and expected distributions of
the x coordinate.
> kstest(nzchop, function(x, y) {
+     x
+ })

        Spatial Kolmogorov-Smirnov test of CSR

data:  covariate function(x, y) { x} evaluated at points of nzchop
       and transformed to uniform distribution under CSR
D = 0.0741, p-value = 0.7566
alternative hypothesis: two-sided

The result of kstest is an object of class "htest" (the standard R class for hypothesis
tests) and also of class "kstest" so that it can be printed and plotted. The print method
(demonstrated above) reports information about the hypothesis test such as the p-value. The
plot method displays the observed and expected distribution functions.

> KS <- kstest(nzchop, function(x, y) {
+     x
+ })
> plot(KS)
> pval <- KS$p.value

[Figure: Spatial Kolmogorov-Smirnov test of CSR, observed and expected distribution functions of the covariate; horizontal axis: function(x, y) { x}; vertical axis: probability]

Sometimes this test generates a warning message about tied values. Typically this occurs
because the coordinates in the dataset have been rounded to the nearest integer, so that there
are tied observations.

11 Methods 3: Maximum likelihood for Poisson processes

If we are willing to assume (tentatively) that the points are independent, then we can apply
some decent statistical methods to the investigation of the intensity.

11.1 Inhomogeneous Poisson process

The inhomogeneous Poisson process with intensity function $\lambda(u)$, $u \in \mathbb{R}^2$, is a modification of
the homogeneous Poisson process, in which properties (PP2) and (PP4) above are replaced by
(PP2): the number $N(X \cap B)$ of points falling in a region B has expectation

$$E[N(X \cap B)] = \int_B \lambda(u) \, du.$$

(PP4): given that $N(X \cap B) = n$, the n points are independent and identically distributed, with
common probability density $f(u) = \lambda(u)/I$, where $I = \int_B \lambda(u) \, du$.
This process can also be simulated using rpoispp using the same properties. The intensity
argument lambda can be a constant, a function(x,y) giving the values of the intensity function
at coordinates x, y, or a pixel image containing the intensity values at a grid of locations.

> lambda <- function(x, y) {
+     100 * (x + y)
+ }
> plot(rpoispp(lambda))

[Figure: rpoispp(lambda)]

If we discretise an inhomogeneous Poisson process, the indicators I are independent, but
have unequal success probabilities, $P\{I(u) = 1\} = \lambda(u) \, dA$.
The inhomogeneous Poisson process is a plausible model for point patterns under several
scenarios. One is random thinning: suppose that a homogeneous Poisson process of intensity
$\beta$ is generated, and that each point is either deleted or retained, independently of other points.
Suppose the probability of retaining a point at the location u is p(u). Then the resulting process
of retained points is inhomogeneous Poisson, with intensity $\lambda(u) = \beta \, p(u)$.
Consider, for example, a model of plant propagation which assumes that seeds are randomly
dispersed according to a Poisson process, and seeds randomly germinate or do not germinate,
independently of each other, with a germination probability that depends on the local soil
conditions. The resulting pattern of plants is an inhomogeneous Poisson process.
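Random thinning is straightforward to simulate in base R. The sketch below works in the unit square and uses a hypothetical retention probability p(x, y) = (x + y)/2 with beta = 200, chosen so that the thinned process has intensity 100 (x + y), the same intensity function as in the rpoispp(lambda) example above.

```r
set.seed(11)

# Step 1: homogeneous Poisson process of intensity beta in the unit square
beta <- 200
n <- rpois(1, beta)
x <- runif(n)
y <- runif(n)

# Step 2: retain each point independently with probability p(u)
p <- function(x, y) (x + y) / 2       # hypothetical retention probability
retain <- runif(n) < p(x, y)
thinned <- data.frame(x = x[retain], y = y[retain])

# The retained points form an inhomogeneous Poisson process with intensity
# beta * p(u) = 100 * (x + y); its expected number of points is beta / 2
print(c(original = n, retained = nrow(thinned)))
```

In the plant propagation example, step 1 plays the role of seed dispersal and step 2 the role of germination.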

11.2 Likelihood methods

The log-likelihood for the homogeneous Poisson process with intensity $\lambda$ is

$$\log L(\lambda; \mathbf{x}) = n(\mathbf{x}) \log \lambda - \lambda \, \mathrm{area}(W) \tag{3}$$

where $n(\mathbf{x})$ is the number of points in the dataset $\mathbf{x}$. The maximum likelihood estimator of $\lambda$ is

$$\hat\lambda = \frac{n(\mathbf{x})}{\mathrm{area}(W)}$$

which is also an unbiased estimator. The variance of $\hat\lambda$ is $\mathrm{var}[\hat\lambda] = \lambda/\mathrm{area}(W)$.
Consider an inhomogeneous Poisson process with intensity function $\lambda_\theta(u)$ depending on a parameter $\theta$. The log-likelihood for $\theta$ is

$$\log L(\theta; \mathbf{x}) = \sum_{i=1}^{n} \log \lambda_\theta(x_i) - \int_W \lambda_\theta(u) \, du \tag{4}$$

This is a well-behaved likelihood; for example if $\log \lambda_\theta(u)$ is linear in $\theta$, then the log-likelihood is concave, so there is a unique MLE. However, the MLE $\hat\theta$ is not analytically tractable, so it must be computed using numerical algorithms such as Newton's method.
The usual asymptotic theory of maximum likelihood applies: under suitable large sample conditions, the MLE of $\theta$ is asymptotically normal. If we wish to test CSR, the likelihood ratio test statistic

$$R = 2 \log \frac{L(\hat\theta)}{L(\hat\lambda)}$$

is asymptotically $\chi^2$ under CSR, and this gives an asymptotically optimal test of CSR against the alternative of an inhomogeneous Poisson process with intensity $\lambda_\theta(u)$.

11.3 Fitting Poisson processes in spatstat

Mark Berman and Rolf Turner [13] (see also [30, 16, 31]) developed a clever computational device for finding the MLE of $\theta$ by exploiting a formal similarity between the Poisson log-likelihood (4) and that of a loglinear Poisson regression.
The Berman-Turner algorithm is implemented in spatstat. The intensity function $\lambda_\theta(u)$ must be loglinear in the parameter $\theta$:

$$\log \lambda_\theta(u) = \theta \cdot S(u) \tag{5}$$

where S(u) is a real-valued or vector-valued function of location u. The form of S is arbitrary so this is not much of a restriction. In practice S(u) could be a function of the spatial coordinates of u, or an observed covariate, or a mixture of both. Assuming (5), the log-likelihood (4) is a concave function of $\theta$, so maximum likelihood is well-behaved.

11.3.1 Model-fitting function

The fitting function is called ppm (point process model) and is very closely analogous to the model fitting functions in R such as lm and glm. The statistic S(u) is specified by an R language formula, like the formulas used to specify the systematic relationship in a linear model or generalised linear model. The basic syntax is:

> ppm(X, ~trend)

where X is the point pattern dataset, and ~trend is an R formula with no left-hand side. This should be viewed as a model with log link, so the formula ~trend specifies the form of the logarithm of the intensity function.
To fit the homogeneous Poisson model:

> ppm(bei, ~1)
Stationary Poisson process
Uniform intensity: 0.007208

To fit an inhomogeneous Poisson model with an intensity that is log-linear in the cartesian coordinates, i.e. $\lambda_\theta((x, y)) = \exp(\theta_0 + \theta_1 x + \theta_2 y)$,

> ppm(bei, ~x + y)
Nonstationary Poisson process
Trend formula: ~x + y
Fitted coefficients for trend formula:
  (Intercept)             x             y
-4.7245290274 -0.0008031288  0.0006496090

Here x and y are reserved names that always refer to the cartesian coordinates. In the output, the fitted coefficients are the maximum likelihood estimates of $\theta_0, \theta_1, \theta_2$, the coefficients of the linear predictor. The fitted intensity function is

$$\hat\lambda((x, y)) = \exp(-4.724529 - 0.000803 \, x + 0.00065 \, y).$$

To fit an inhomogeneous Poisson model with an intensity that is log-quadratic in the cartesian coordinates, i.e. such that $\log \lambda_\theta((x, y))$ is a quadratic in x and y:

> ppm(bei, ~polynom(x, y, 2))
Nonstationary Poisson process
Trend formula: ~polynom(x, y, 2)
Fitted coefficients for trend formula:
          (Intercept)   polynom(x, y, 2)[x]   polynom(x, y, 2)[y]
        -4.275762e+00         -1.609187e-03         -4.895166e-03
polynom(x, y, 2)[x^2] polynom(x, y, 2)[x.y] polynom(x, y, 2)[y^2]
         1.625968e-06         -2.836387e-06          1.331331e-05
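
The computational device described in Section 11.3 can be illustrated without spatstat. The following is a minimal base R sketch, not the spatstat implementation: it assumes a unit-square window, hypothetical true coefficients b0 and b1, a 25 x 25 grid of dummy points, and a simple "counting" rule for the quadrature weights. The integral in (4) is replaced by a weighted sum over data and dummy points, after which the log-likelihood has the form of a weighted loglinear Poisson regression that glm can fit.

```r
set.seed(42)

# Simulate an inhomogeneous Poisson process on the unit square by thinning.
# The "true" coefficients b0, b1 are hypothetical values chosen for this sketch.
b0 <- 5
b1 <- 2
lmax <- exp(b0 + b1)                       # maximum of exp(b0 + b1 * x) on [0, 1]
N <- rpois(1, lmax)
px <- runif(N)
py <- runif(N)
keep <- runif(N) < exp(b0 + b1 * px) / lmax
px <- px[keep]
py <- py[keep]

# Quadrature points: the data points plus one dummy point at each tile centre.
k <- 25
g <- (seq_len(k) - 0.5) / k
dummy <- expand.grid(x = g, y = g)
qx <- c(px, dummy$x)
qy <- c(py, dummy$y)
z <- rep(c(1, 0), c(length(px), nrow(dummy)))   # 1 = data point, 0 = dummy point

# Counting weights: each tile's area is shared equally among its quadrature points.
tile <- function(v) pmin(floor(v * k), k - 1)
id <- tile(qx) * k + tile(qy) + 1
w <- (1 / k^2) / tabulate(id, nbins = k^2)[id]

# Weighted loglinear regression of y = z / w on x approximates the likelihood (4).
y <- z / w
fitBT <- glm(y ~ qx, family = quasi(link = "log", variance = "mu"), weights = w)
coef(fitBT)
```

The fitted coefficients should lie close to b0 and b1, up to sampling variation and the quadrature approximation. A finer dummy grid reduces the approximation error at the cost of a larger regression problem.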

Essentially any kind of model formula can be used, involving the reserved names x and y and any covariates (as we explain later).
To fit a model with constant but unequal intensities on each side of the vertical line x = 500, the explanatory variable S(u) should be a factor with two levels, "left" and "right" say, taking the value "left" when the location u is to the left of the line x = 500.

> side <- function(z) factor(ifelse(z < 500, "left", "right"))
> ppm(bei, ~side(x))
Nonstationary Poisson process
Trend formula: ~side(x)
Fitted coefficients for trend formula:
 (Intercept) side(x)right
  -4.8026460   -0.2792705

When factors are involved, the interpretation of the coefficients depends on which contrasts are in force. By default the treatment contrasts are assumed. This means that the treatment effect is taken to be zero for the first level of the factor, and the estimated treatment effects for other levels are effectively estimates of the difference from the first level. In this case "left" comes alphabetically before "right", so by default, the first level is "left". The fitted model is

$$\hat\lambda((x, y)) = \begin{cases} \exp(-4.8026) & \text{if } x < 500 \\ \exp(-4.8026 + (-0.2793)) & \text{if } x \ge 500 \end{cases}$$

Rather than relying on such interpretations, it is prudent to use the command predict to compute predicted values of the model, as explained in Section 11.4 below.

11.3.2 Models involving spatial covariates

It is also possible to fit an inhomogeneous Poisson process model with an intensity function that depends on an observed covariate. Let Z(u) be a covariate that has been measured at every location u in the study window. Then Z(u), or any transformation of it, can serve as the statistic S(u) in the parametric form (5) for the intensity function.
The point pattern dataset bei is supplied with accompanying covariate data bei.extra. The covariates are the elevation (altitude) and the slope of the terrain at each location in the window, given as two pixel images bei.extra$elev and bei.extra$grad.

> data(bei)
> grad <- bei.extra$grad
> plot(grad)

[Figure: image plot of the slope covariate grad]

To fit the inhomogeneous Poisson model with intensity which is a loglinear function of slope, i.e.

$$\lambda(u) = \exp(\beta_0 + \beta_1 Z(u)) \tag{6}$$

where $\beta_0, \beta_1$ are parameters and Z(u) is the slope at location u, we type

> ppm(bei, ~slope, covariates = list(slope = grad))
Nonstationary Poisson process
Trend formula: ~slope
Fitted coefficients for trend formula:
(Intercept)       slope
  -5.390553    5.022021

In the call to ppm, the argument covariates should be a list of name=value pairs. The names should match the variables appearing in the model formula. The values should be pixel images.
The printout includes the fitted coefficients $\hat\beta_0, \hat\beta_1$ so the fitted model is

$$\hat\lambda(u) = \exp(-5.390553 + 5.022021 \, Z(u)). \tag{7}$$

It might be more appropriate to fit the inhomogeneous Poisson model with intensity that is proportional to slope,

$$\lambda(u) = \beta Z(u) \tag{8}$$

where again Z(u) is the slope at u. Equivalently

$$\log \lambda(u) = \log \beta + \log Z(u). \tag{9}$$

There is no coefficient in front of the term log Z(u) in (9), so this term is an offset. To fit this model,

> ppm(bei, ~offset(log(slope)), covariates = list(slope = grad))
Nonstationary Poisson process
Trend formula: ~offset(log(slope))
Fitted coefficients for trend formula:
(Intercept)
  -2.427127

The fitted coefficient is the constant $\log \beta$ appearing in (9), so converting back to the form (8), the fitted model is

$$\hat\lambda(u) = e^{-2.427127} Z(u) = 0.0883 \, Z(u).$$
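
For the proportional model (8) the maximum likelihood estimate has a closed form. Substituting $\lambda(u) = \beta Z(u)$ into (4) gives $\log L = n(\mathbf{x}) \log \beta + \sum_i \log Z(x_i) - \beta \int_W Z(u) \, du$, which is maximised at $\hat\beta = n(\mathbf{x}) / \int_W Z(u) \, du$. The base R sketch below checks this on simulated data; the unit-square window, the covariate Z(x, y) = x and the value of beta are all hypothetical choices made for illustration, not taken from the bei analysis.

```r
set.seed(7)

# Hypothetical covariate on the unit square: Z(x, y) = x, whose integral over W is 1/2.
beta <- 600                                 # hypothetical true constant in lambda(u) = beta * Z(u)
N <- rpois(1, beta)                         # dominating homogeneous process (max of beta * Z is beta)
ux <- runif(N)
px <- ux[runif(N) < ux]                     # thinning: keep a point at x with probability Z = x
# (the y-coordinates are uniform and do not enter the estimate, so they are omitted)

# Approximate the integral of Z over W by a midpoint rule on a fine grid.
g <- (seq_len(200) - 0.5) / 200
intZ <- mean(outer(g, g, function(x, y) x)) # area(W) = 1, so the mean of Z equals its integral

beta_hat <- length(px) / intZ               # closed-form MLE for the proportional model
c(beta_hat = beta_hat, log_beta_hat = log(beta_hat))
```

The quantity log_beta_hat plays the role of the single (Intercept) coefficient reported by ppm when the formula contains only an offset term.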

11.4 Fitted models

The value returned by the model-fitting function ppm is an object of class "ppm" that represents the fitted model. This is analogous to the fitting of linear models (lm), generalised linear models (glm) and so on.

11.4.1 Standard operations

The following standard operations on fitted models in R can be applied to point process models (i.e. these operations have methods for the class "ppm"):

print     print basic information
summary   print detailed summary information
plot      plot the fitted intensity
predict   compute the fitted intensity
fitted    compute the fitted intensity at data points
update    re-fit the model
coef      extract the fitted coefficient vector $\hat\theta$
vcov      variance-covariance matrix of $\hat\theta$
anova     analysis of deviance
logLik    log-likelihood value

For information on these methods, see print.ppm, summary.ppm, plot.ppm etc.

> fit <- ppm(bei, ~x + y)
> fit
Nonstationary Poisson process
Trend formula: ~x + y
Fitted coefficients for trend formula:
  (Intercept)             x             y
-4.7245290274 -0.0008031288  0.0006496090

> plot(fit, how = "image")

[Figure: image plot titled "Fitted trend"]

> predict(fit, type = "trend")
real-valued pixel image
50 x 50 pixel array (ny, nx)
enclosing rectangle: [0, 1000] x [0, 500] metres
> predict(fit, type = "cif", ngrid = 256)
real-valued pixel image
256 x 256 pixel array (ny, nx)
enclosing rectangle: [0, 1000] x [0, 500] metres
> coef(fit)
  (Intercept)             x             y
-4.7245290274 -0.0008031288  0.0006496090
> vcov(fit)
              (Intercept)             x             y
(Intercept)  1.854091e-03 -1.491267e-06 -3.528289e-06
x           -1.491267e-06  3.437842e-09  1.208410e-14
y           -3.528289e-06  1.208410e-14  1.338955e-08
> sqrt(diag(vcov(fit)))
 (Intercept)            x            y
4.305915e-02 5.863311e-05 1.157132e-04
> round(vcov(fit, what = "corr"), 2)
            (Intercept)     x     y
(Intercept)        1.00 -0.59 -0.71
x                 -0.59  1.00  0.00
y                 -0.71  0.00  1.00

This is the fitted model with intensity function

$$\lambda_\theta((x, y)) = \exp(\theta_0 + \theta_1 x + \theta_2 y) \tag{10}$$

with the following estimates:

i   estimate of theta_i   var(estimate)   standard deviation
0   -4.724529             0.001854091     0.04305915
1   -0.0008031288         3.437842e-09    5.863311e-05
2   0.000649609           1.338955e-08    0.0001157132

11.4.2 Simulating the fitted model

A fitted Poisson model can be simulated automatically using the function rmh. Here fitprop is the fitted model (8), with intensity proportional to slope:

> fitprop <- ppm(bei, ~offset(log(slope)), covariates = list(slope = grad))
> X <- rmh(fitprop)
> plot(X, main = "realisation of fitted model")

[Figure: realisation of fitted model]
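
Returning to the estimates tabulated under (10): because the MLE is asymptotically normal (Section 11.2), the standard errors can be turned into approximate confidence intervals. The following base R sketch simply re-uses the printed values of coef(fit) and the diagonal of vcov(fit); the 95% level is an illustrative choice.

```r
# Estimates and variances copied from the printed output of coef(fit) and vcov(fit).
est <- c("(Intercept)" = -4.7245290274, x = -0.0008031288, y = 0.0006496090)
v <- c(1.854091e-03, 3.437842e-09, 1.338955e-08)

se <- sqrt(v)                      # standard errors, as in sqrt(diag(vcov(fit)))
zcrit <- qnorm(0.975)              # normal critical value for a 95% interval

ci <- cbind(estimate = est,
            lower = est - zcrit * se,
            upper = est + zcrit * se)
ci
```

An interval that excludes zero (as happens here for both x and y) indicates that the corresponding trend term is significantly different from zero at the 5% level, under the assumption that the Poisson model is correct.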

11.5 Model selection

Analysis of deviance for nested Poisson point process models is implemented in spatstat as anova.ppm. The first model should be a sub-model of the second.

> fit <- ppm(bei, ~slope, covariates = list(slope = grad))
> fitnull <- update(fit, ~1)
> anova(fitnull, fit, test = "Chi")
Analysis of Deviance Table
Model 1: .mpl.Y ~ 1
Model 2: .mpl.Y ~ slope
  Resid. Df Resid. Dev Df Deviance P(>|Chi|)
1     20507    18728.4
2     20506    18346.1  1    382.3 4.018e-85

This effectively performs the likelihood ratio test of the null hypothesis of a homogeneous Poisson process (CSR) against the alternative of an inhomogeneous Poisson process with intensity that is a loglinear function of the slope covariate (6). The p-value is extremely small, indicating rejection of CSR in favour of the alternative. (Please ignore the columns Resid. Df and Resid. Dev which are artefacts of the discretisation. Only the deviance difference and the difference in degrees of freedom are valid.)
At the time of writing, automatic model selection (using step) does not work for the class "ppm".
Note that standard Analysis of Deviance requires the null hypothesis to be a sub-model of the alternative. Unfortunately the model (8), in which intensity is proportional to slope, does not include the homogeneous Poisson process as a special case, so we cannot use analysis of deviance to test the null hypothesis of homogeneous Poisson against the alternative of an inhomogeneous Poisson with intensity (8).
One possibility here is to use the Akaike Information Criterion AIC for model selection.

> fitprop <- ppm(bei, ~offset(log(slope)), covariates = list(slope = grad))
> fitnull <- ppm(bei, ~1)
> AIC(fitprop)
[1] 42496.65
> AIC(fitnull)
[1] 42763.92

The smaller AIC favours the model (8), in which intensity is proportional to slope.
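
The numbers reported above can be reproduced by hand in base R. This small sketch uses only the printed deviance difference, its degrees of freedom and the two AIC values; it does not refit any model.

```r
# Likelihood ratio test: deviance difference referred to a chi-squared distribution.
dev <- 382.3                       # deviance difference from the anova table
df <- 1                            # difference in degrees of freedom
p <- pchisq(dev, df = df, lower.tail = FALSE)
p

# AIC comparison: the model with the smaller AIC is preferred.
aic_prop <- 42496.65               # AIC(fitprop), copied from the output
aic_null <- 42763.92               # AIC(fitnull), copied from the output
delta_aic <- aic_null - aic_prop
delta_aic
```

Because the deviance in the table is rounded, the p-value computed this way agrees with the printed one only to a few significant figures.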

12 Methods 4: checking a fitted Poisson model

After fitting a point process model to a point pattern dataset, we should check that the model is a good fit (goodness-of-fit), and that each component assumption of the model was appropriate (validation). This section presents some techniques available for checking a fitted Poisson model.
Model checking can be either formal or informal. Formal techniques are based on detailed probabilistic assumptions about the data, and allow us to make probabilistic statements about the outcome. They include hypothesis tests, p-values, Bayesian model selection, $\chi^2$ tests, goodness-of-fit tests and Monte Carlo tests. These have been presented in the previous sections.
In contrast, informal tools do not impose assumptions on the data and their interpretation depends on human judgement. A typical example is the residual, defined for each observation by (residual) = (observed) - (fitted). If the model is a good fit, then the residuals should be noise, centred around zero.

12.1 Goodness-of-fit

A goodness-of-fit test is a formal test of the null hypothesis that the model is true, against the very general alternative that the model is not true.
The $\chi^2$ goodness-of-fit test based on quadrat counts can be applied to a fitted Poisson model, homogeneous or inhomogeneous. Under the null hypothesis, the quadrat counts are independent Poisson variables with different mean values, and the means are estimated by the fitted model.

> data(bei)
> fit <- ppm(bei, ~x)
> M <- quadrat.test(fit, nx = 4, ny = 2)
> M
Chi-squared test of fitted model fit using quadrat counts
data: data from fit
X-squared = 711.5036, df = 6, p-value < 2.2e-16

If (as in this case) the formal goodness-of-fit test rejects the fitted model, we would then like to gain an informal impression of the type of departure from the model (i.e. in what way the data appear to depart from the predictions of the model) so that we may formulate a better model. To do this we can inspect the residual counts.

> plot(bei, pch = ".")
> plot(M, add = TRUE, cex = 1.5, col = "red")

[Figure: the bei pattern with the quadrat test results superimposed on each of the 4 x 2 quadrats]

The plot displays, for each quadrat, the observed number of points (top left), the predicted number of points according to the model (top right), and the Pearson residual (bottom) defined by

$$\text{Pearson residual} = \frac{(\text{observed}) - (\text{expected})}{\sqrt{\text{expected}}}$$

If the original data were Poisson, this transformation approximately standardises the residuals so that they have mean zero and variance 1 when the model is true. A Pearson residual of 14 is a gross departure from the fitted model.
The Kolmogorov-Smirnov test can also be applied to a fitted Poisson model, with homogeneous or inhomogeneous intensity.

> kstest(fit, function(x, y) {
+     y
+ })
Spatial Kolmogorov-Smirnov test of inhomogeneous Poisson process
data: covariate function(x, y) { y } evaluated at points of bei
      and transformed to uniform distribution under fit
D = 0.1026, p-value < 2.2e-16
alternative hypothesis: two-sided

This uses the method kstest.ppm for the generic function kstest.

12.2 Validation using residuals

12.2.1 Residuals

Residuals from the fitted model are an important diagnostic tool in other areas of applied statistics, but in spatial statistics they have only recently been developed ([35, 41], [40, pp. 49-50], [6]).
For a fitted Poisson process model, with fitted intensity $\hat\lambda(u)$, the predicted number of points falling in any region B is $\int_B \hat\lambda(u) \, du$. Hence the residual in each region $B \subset \mathbb{R}^2$ is defined [6] to be the observed minus predicted number of points falling in B:

$$R(B) = n(\mathbf{x} \cap B) - \int_B \hat\lambda(u) \, du \tag{11}$$
where $\mathbf{x}$ is the observed point pattern, $n(\mathbf{x} \cap B)$ the number of points of $\mathbf{x}$ in the region B, and $\hat\lambda(u)$ is the intensity of the fitted model.
These residuals are closely related to the residuals for quadrat counts that were used above. Taking the set B to be one of our quadrats, the observed quadrat count is $n(\mathbf{x} \cap B)$. The expected quadrat count is $\hat\lambda \, \mathrm{area}(B)$ if the model is CSR, or more generally $\int_B \hat\lambda(u) \, du$ if the model is an inhomogeneous Poisson process. Hence the raw residual is observed - expected $= n(\mathbf{x} \cap B) - \int_B \hat\lambda(u) \, du$.

12.2.2 Residual measure

Equation (11) defines the total residual for any region B, large or small.
Intuitively the residuals can be visualised as an electric charge, with unit positive charge at each data point, and a diffuse negative charge at all other locations u, with density $\hat\lambda(u)$. If the model is true, then these charges should approximately cancel.
If we'd like to visualise this electric charge, one way is to plot the observed points and the fitted intensity function together:

> data(bei)
> fit <- ppm(bei, ~x + y)
> plot(predict(fit))
> plot(bei, add = TRUE, pch = "+")

[Figure: image of predict(fit) with the points of bei superimposed as + symbols]

Each data point should be visualised as a charge of +1, while the colour image indicates a negative charge density.

12.2.3 Smoothed residuals

A more useful way to visualise the residuals is to smooth them.

> data(bei)
> fitx <- ppm(bei, ~x)
> diagnose.ppm(fitx, which = "smooth")

[Figure: image plot titled "Smoothed raw residuals"]

This is an image plot of the smoothed residual field

$$s(u) = \tilde\lambda(u) - \lambda^\dagger(u) \tag{12}$$

where $\tilde\lambda(u)$ is the nonparametric, kernel estimate of the intensity,

$$\tilde\lambda(u) = e(u) \sum_{i=1}^{n(\mathbf{x})} \kappa(u - x_i)$$

while $\lambda^\dagger(u)$ is a correspondingly-smoothed version of the parametric estimate of the intensity according to the fitted model,

$$\lambda^\dagger(u) = e(u) \int_W \kappa(u - v) \, \hat\lambda(v) \, dv.$$

Here $\kappa$ is the smoothing kernel and e(u) is the edge correction (2) on page 37. The difference (12) should be approximately zero if the model is true.
In this example the smoothed residual image contains a visible trend, suggesting that the model is inappropriate.

12.2.4 Lurking variable plot

If there is a spatial covariate Z(u) that plays an important role in the analysis, it may be useful to display a lurking variable plot of the residuals against Z. This is a plot of C(z) = R(B(z)) against z, where

$$B(z) = \{u \in W : Z(u) \le z\}$$

is the region of space where the covariate value is less than or equal to z.

> grad <- bei.extra$grad
> lurking(fitx, grad, type = "raw")

[Figure: lurking variable plot of cumulative raw residuals against the covariate]


Note that the lurking variable plot typically starts and ends at the horizontal axis, since (for any model with an intercept term) the total residual for the entire window W must equal zero. This is analogous to the fact that the residuals in linear regression sum to zero.
The plot also shows approximate 5% significance bands for the cumulative residual C(x) or C(y), obtained from the asymptotic variance under the model.
This plot indicates that the model is grossly inadequate; the fitted intensity function fails to capture the dependence of intensity on slope.
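
The construction of the cumulative residual C(z) can be made concrete with a small base R sketch. It is not the algorithm used by lurking; it assumes a hypothetical pattern on the unit square, a fitted homogeneous Poisson model, and the x-coordinate as the covariate, so that B(z) is the strip {x <= z} with area z.

```r
set.seed(3)

# Hypothetical pattern on the unit square whose intensity increases with x.
n <- 300
px <- sqrt(runif(n))                 # x-coordinates with density 2x on [0, 1]

# Fitted homogeneous model: lambda_hat = n / area(W), with area(W) = 1.
lambda_hat <- n

# Covariate Z(u) = x-coordinate, so B(z) = {x <= z} has area z.
z <- seq(0, 1, length.out = 101)
observed <- sapply(z, function(t) sum(px <= t))
C <- observed - lambda_hat * z       # cumulative raw residual C(z) = R(B(z))

plot(z, C, type = "l", xlab = "covariate", ylab = "cumulative raw residuals")
abline(h = 0, lty = 2)
```

The curve starts and ends at zero, as noted above, but dips well below zero in between because the homogeneous model predicts too many points where x is small.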
12.2.5 Four-panel plot

If there are no spatial covariates, use the command diagnose.ppm to plot the residuals:

> data(japanesepines)
> fit <- ppm(japanesepines, ~x + y)
> diagnose.ppm(fit)

[Figure: four-panel diagnostic plot, including the cumulative sum of raw residuals against the x coordinate and against the y coordinate]

This combination of four plots has proved to be a useful quick indication of departure from the trend in the model.
The bottom right panel is an image of the smoothed residual field.
The top left panel is a direct representation of the residual charge, with circles representing the data points (positive residuals) and a colour scheme representing the fitted intensity (negative residuals). However, it is often difficult to interpret.
The two other panels are lurking variable plots against one of the cartesian coordinates. For example, the bottom left panel is a lurking variable plot for the x-coordinate. Imagine a vertical line which sweeps from left to right across the window. The progressive total residual to the left of the line is plotted against the position of the line.
In this example, the lurking variable plot for the y coordinate suggests a lack of fit at about y = 0.15, and the image of the smoothed residual field suggests an excess of positive residuals at about x = 0.8, y = 0.15, both indicating that the model underestimates the true intensity of points in this vicinity.

12.2.6 QQ plot

The residual plots described above are only useful for detecting misspecification of the trend in the fitted model. For example, the cells dataset has a uniform intensity but is clearly not a Poisson pattern:

> data(cells)
> par(mfrow = c(1, 2))
> plot(cells)
> plot(Kest(cells))
> par(mfrow = c(1, 1))

[Figure: the cells point pattern (left) and Kest(cells), showing K(r) against r (right)]

yet the residual plots appear to show nothing is wrong:

> fitPois <- ppm(cells, ~1)
> diagnose.ppm(fitPois)

[Figure: four-panel diagnostic plot for fitPois]

Interaction between points in a point process corresponds roughly to the distribution of the responses in loglinear regression. To validate the interaction terms in a point process model, we should plot the distribution of the residuals. The appropriate tool is a QQ plot.

> qqplot.ppm(fitPois, nsim = 39)

[Figure: QQ plot of data quantile against mean quantile of simulations]

This shows a QQ plot of the smoothed residuals, with pointwise 5% critical envelopes from simulations of the fitted model. This indicates that the uniform Poisson model is grossly inappropriate.

13 Images in spatstat

It's time to learn some more about pixel images in spatstat. They represent spatial functions Z(u) in many different contexts.
An object of class "im" represents a pixel image. It specifies a rectangular grid of locations (pixels) in two dimensional space, and a numerical value for each pixel. The pixel values can be real numbers, integers, complex numbers, single characters or strings, logical values or categorical values. A pixel's value can also be NA, meaning that it is not defined at that location.

13.1 Creating a pixel image

13.1.1 Creating an image from raw data

To create a pixel image from raw data, use im:

> im(mat, xcol, yrow)

where mat is a matrix containing the pixel values. The pixel values could have been generated by hand, or read from a file.
The correspondence between matrix indices mat[i,j] and cartesian coordinates is slightly idiosyncratic: the rows of mat correspond to the y coordinate, and the columns to the x coordinate. The argument xcol is a vector of equally-spaced x coordinate values corresponding to the columns of mat, and yrow is a vector of equally-spaced y coordinate values corresponding to the rows of mat. These vectors determine the spatial position of the pixel grid. The length of xcol is ncol(mat) while the length of yrow is nrow(mat). If mat is not a matrix, it will be converted into a matrix with nrow(mat) = length(yrow) and ncol(mat) = length(xcol).

> vec <- seq(-5, 5, length = 1200) + rnorm(1200)
> mat <- matrix(vec, nrow = 30, ncol = 40)
> noisy <- im(mat, xcol = seq(0, 4, length = 40), yrow = seq(0,
+     3, length = 30))
> plot(noisy)

[Figure: image plot of noisy]

For some strange reason, R does not allow matrices with categorical (factor) values. To create a pixel image with categorical values, leave the pixel values as a vector. The im command will reshape it:

> cutvec <- cut(mat, 3)
> cutnoise <- im(cutvec, xcol = seq(0, 1, length = 40), yrow = seq(0,
+     1, length = 30))
> plot(cutnoise)

[Figure: image plot of cutnoise, a factor-valued image with three levels]

Although mat was a matrix, cutvec is a vector, with factor values. Finally cutnoise is a factor-valued image.

13.1.2 Converting a function to an image

The command as.im will convert other types of data to a pixel image.
A function f(x,y) can be converted into a pixel image. This makes it easy to create a pixel image in which the pixel values are defined by an algebraic formula in the x and y coordinates.

> f <- function(x, y) {
+     x^2 + y^2
+ }
> w <- owin(c(-1, 1), c(-1, 1))
> Z <- as.im(f, w)

The second argument of as.im is a window object (class "owin") specifying the domain of the image.

13.1.3 Functions that return a pixel image

Functions that return an object of class "im" include:

as.im          converts other data to a pixel image
density.ppp    kernel smoothing of point pattern
density.psp    kernel smoothing of line segment pattern
distmap.owin   distance function of window
distmap.ppp    distance function of point pattern
distmap.psp    distance function of line segment pattern
setcov         geometric covariance function of a window
predict.ppm    fitted intensity of a point process model
[.im           subset of an image (or look up pixel values)
shift.im       vector shift of an image
eval.im        evaluate any expression involving images
cut.im         convert numeric image to factor image
interp.im      spatial interpolation of image

13.2 Inspecting an image

13.2.1 Plotting an image

Methods for plotting an image object include:

plot.im      display as colour image
contour.im   contour plot
persp.im     perspective plot of surface

Note that the default colour map for image plots in R has only 12 colours and can convey a misleading impression of the gradation of pixel values in the image. Use the argument col to control the colour map.

> opa <- par(mfrow = c(1, 2))
> plot(Z)
> plot(Z, col = grey(seq(1, 0, length = 512)))
> par(opa)

[Figure: plot of Z with the default colour map (left) and with a 512-level grey scale (right)]

13.2.2 Exploratory analysis

To inspect an image, the following are useful.

as.matrix   extract matrix of pixel values from image
cut.im      convert numeric image to factor image
hist.im     histogram of pixel values

For an image Z with any type of values, plot(cut(Z, 3)) will divide the pixel values into 3 bands, and display the image with the 3 bands rendered in 3 different colours.
To compute numerical summaries of pixel values, like the median or order statistics of the pixel values, extract the pixel values using as.matrix(Z) then apply the summary operation.

13.3 Manipulating images

13.3.1 Subsets of an image

The subset operator [ has a method for pixel images, [.im:

> X[S]
> X[S, drop = TRUE]

The subset to be extracted is determined by the index argument S.
If S is a point pattern, or a list(x,y), then the values of the pixel image X at these points are extracted, and returned as a vector.
If S is a window (an object of class "owin"), the values of the image inside this window are extracted. The result is a pixel image if possible, and a numeric vector otherwise (see help("[.im") for details).
If S is a pixel image with logical values, it is interpreted as a window (with TRUE inside the window).
The logical argument drop determines whether pixel values that are undefined are omitted (drop = TRUE) or returned as the value NA (drop = FALSE).
See help("[.im") for full details.
The subset operator can be used to look up the value of a pixel image at a single point:

> data(bei)
> elev <- bei.extra$elev
> elev[list(x = 142, y = 356)]
[1] 147.08

or to display a subregion:

> S <- owin(c(200, 300), c(100, 200))
> plot(elev[S])

[Figure: image plot of elev[S]]

This can even be performed interactively, using the R function locator to click on a point in the window:

> elev[locator(1)]

13.3.2 Computation with images

The handy function eval.im allows us to perform pixel-by-pixel calculations on an image or on several compatible images.
If Z is a pixel image, to take the logarithm of each pixel value,

> logZ <- eval.im(log(Z))

If A and B are two pixel images with compatible grids of pixels (i.e. having the same numbers of pixels and the same coordinate locations), then to find the sum of the corresponding pixel values,

> C <- eval.im(A + B)

The expressions may involve constants and functions as well, so long as the expression is parallelised.

> W <- eval.im(sin(pi * Z))
> V <- eval.im(Z > 3)
> U <- eval.im(ifelse(Z > 3, 42, Z))

Other functions which manipulate images include the following:

shift.im      vector shift of an image
cut.im        convert numeric image to factor image
interp.im     spatially interpolate an image
levelset      threshold an image (produces a window)
solutionset   find the region where a statement is true (produces a window)
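
The row/column convention of Section 13.1.1 and the pixel-by-pixel calculations of Section 13.3.2 can be mimicked on a plain matrix. The base R sketch below is only an analogy and does not use the "im" class: the 3 x 4 grid, the coordinate vectors xcol and yrow, and the test function x + 10 * y are hypothetical choices for illustration.

```r
# Hypothetical pixel grid: 3 rows (y) by 4 columns (x), following the im(mat, xcol, yrow) convention.
xcol <- seq(0.5, 3.5, by = 1)
yrow <- seq(0.5, 2.5, by = 1)
Z <- outer(yrow, xcol, function(y, x) x + 10 * y)   # Z[i, j] is the value at (xcol[j], yrow[i])

# Pixel-by-pixel calculations are ordinary vectorised expressions on the matrix.
logZ <- log(Z)
V <- Z > 20
U <- ifelse(Z > 20, 42, Z)

# Numerical summaries are applied to the extracted pixel values.
median(Z)
```

With real images the same expressions would be wrapped in eval.im, which additionally checks that the images involved have compatible pixel grids and returns an image rather than a matrix.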

79

14

80

Simple models of non-Poisson patterns

rThomas: the Thomas process, in which each cluster consists of a Poisson() number of
random points, each having an isotropic Gaussian N (0, 2 I) displacement from its parent.

Simple models of non-Poisson patterns

A point process that is not Poisson can be said to exhibit interaction or dependence between
the points. Its time to introduce some models for such processes. This section covers simple
models that are derived from the Poisson process, and still retain some of the tractable features
of the Poisson model.

rNeymanScott: the general Neyman-Scott cluster process in which the cluster mechanism
is arbitrary.

14.2
14.1

Poisson cluster processes

In a Poisson cluster process, we begin with a Poisson process Y of parent points. Each parent
point yi Y then gives rise to a nite set Zi of ospring points according to some stochastic
mechanism. The set comprising all the ospring points forms a point process X. Only X is
observed.

Cox processes

A Cox point process is eectively a Poisson process with a random intensity function. Let (u)
be a random function with non-negative values, dened at all locations u R2 . Conditional on
, let X be a Poisson process with intensity function . Then X is a Cox process.
A trivial example is the mixed Poisson process in which we generate a random variable
and, conditional on , generate a uniform Poisson process with intensity . Following are three
dierent realisations of this process:
>
>
+
+
+
+
>

par(mfrow = c(1, 3))


for (i in 1:3) {
lambda <- rexp(1, 1/100)
X <- rpoispp(lambda)
plot(X)
}
par(mfrow = c(1, 1))
X

An example is the Matern cluster process in which the parent points come from a homogeneous Poisson process with intensity , and each parent has a Poisson () number of ospring,
independently and uniformly distributed in a disc of radius r centred around the parent.
The Matern cluster process can be generated in spatstat using the command rMatClust.
[By convention, random data generators in R always have names beginning with r.]
Moments of Cox processes are tractable (in terms of the moments of ). The intensity
function of X is (u) = E[(u)].
A Cox model is the analogue of a random eects model. It is always overdispersed relative
to a Poisson process (i.e. the variance of the number of points falling in a region, is greater
than the mean). Cox processes are the most convenient models for clustered point patterns. A
particularly interesting and useful class is that of log-Gaussian Cox processes (LGCP) in which
log (u) is a Gaussian random function [33, 34].
The Matern Cluster process and the Thomas process are both Cox processes.
Currently there are no functions in spatstat for generating the general Cox process, but
if you have a way of generating realisations of a random function of interest, then you can
use rpoispp to generate the Cox process. The intensity argument lambda of rpoispp can be a
function(x,y) or a pixel image.

> plot(rMatClust(kappa = 10, r = 0.1, mu = 5))

rMatClust(kappa = 10, r = 0.1, mu = 5)

14.3

Thinned processes

Thinning means deleting some of the points from a point pattern. Under independent thinning
the fate of each point is independent of other points. When independent thinning is applied to a

Other Poisson cluster processes implemented in spatstat are


c
Copyright CSIRO
2008

c
Copyright CSIRO
2008

Poisson process, the resulting process of retained points is Poisson. To get a non-Poisson process
we need some kind of dependent thinning mechanism.
In Matérn's Model I, a homogeneous Poisson process Y is first generated. Any point in Y
that lies closer than a distance r from the nearest other point of Y is deleted. Thus, pairs of
close neighbours annihilate each other.

> plot(rMaternI(70, 0.05))

[Figure: rMaternI(70, 0.05)]

In Matérn's Model II, the points of the homogeneous Poisson process Y are marked by arrival
times $t_i$ which are independent and uniformly distributed in [0, 1]. Any point in Y that lies
closer than distance r from another point that has an earlier arrival time is deleted.

> plot(rMaternII(70, 0.05))

[Figure: rMaternII(70, 0.05)]

14.4 Sequential models

In a sequential model, we start with an empty window, and the points are placed into the window
one at a time, according to some criterion.
In Simple Sequential Inhibition, each new point is generated uniformly in the window and
independently of preceding points. If the new point lies closer than r units from an existing
point, then it is rejected and another random point is generated. The process terminates when
no further points can be added.

> plot(rSSI(0.05, 200))

[Figure: rSSI(0.05, 200)]

Sequential point processes are the easiest way to generate highly ordered patterns with high
intensity.
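The two inhibition mechanisms described in this chapter, Matérn's Model I and Simple Sequential Inhibition, can be sketched in a few lines of base R. This is only an illustration on the unit square and is not the spatstat implementation: the function names matern1_sketch and ssi_sketch are hypothetical, and the stopping rule giveup (abandon the search after that many consecutive rejections) is an assumption standing in for "no further points can be added".

```r
# Base-R sketches of two inhibition mechanisms on the unit square.
# Hypothetical helpers for illustration; not the spatstat functions.

# Matern Model I: generate a homogeneous Poisson pattern of intensity kappa,
# then delete every point that has another point closer than r.
matern1_sketch <- function(kappa, r) {
  n <- rpois(1, kappa)                      # window area is 1
  xy <- cbind(x = runif(n), y = runif(n))
  if (n < 2) return(xy)
  d <- as.matrix(dist(xy))                  # all pairwise distances
  diag(d) <- Inf                            # ignore distance from a point to itself
  keep <- apply(d, 1, min) >= r             # nearest neighbour at least r away
  xy[keep, , drop = FALSE]
}

# Simple Sequential Inhibition: propose uniform points one at a time and
# accept a proposal only if it is at least r from every accepted point.
# 'giveup' is an assumed stopping rule: stop after that many rejections in a row.
ssi_sketch <- function(r, nmax, giveup = 1000) {
  xy <- matrix(NA_real_, nrow = nmax, ncol = 2,
               dimnames = list(NULL, c("x", "y")))
  k <- 0                                    # number of accepted points
  fails <- 0                                # consecutive rejections
  while (k < nmax && fails < giveup) {
    p <- runif(2)
    acc <- seq_len(k)
    if (all((xy[acc, 1] - p[1])^2 + (xy[acc, 2] - p[2])^2 >= r^2)) {
      k <- k + 1
      xy[k, ] <- p
      fails <- 0
    } else {
      fails <- fails + 1
    }
  }
  xy[seq_len(k), , drop = FALSE]
}

set.seed(1)
A <- matern1_sketch(70, 0.05)
B <- ssi_sketch(0.05, 200)
```

Plotting A and B with plot(A, asp = 1) and plot(B, asp = 1) gives pictures comparable in spirit to the rMaternI and rSSI figures, with the sequential pattern noticeably denser for the same inhibition distance.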

15 Methods 5: Distance methods for point patterns

Suppose that a point pattern appears to have constant intensity, and we wish to assess whether
the pattern is Poisson. The alternative is that the points are dependent (they exhibit interaction).
Classical writers suggested a simple trichotomy between independence (the Poisson process),
regularity (where points tend to avoid each other), and clustering (where points tend to be
close together). [The concept of clustering does not imply that the points are organised into
identifiable clusters; merely that they are closer together than expected for a Poisson process.]

[Figure: three example patterns, labelled independent, regular and clustered]

15.1 Distances

The classical techniques for investigating interpoint interaction are distance methods, based on
measuring the distances between points. Specifically we may consider

pairwise distances $s_{ij} = \|x_i - x_j\|$ between all distinct pairs of points $x_i$ and $x_j$ ($i \ne j$)
in the pattern;
nearest neighbour distances $t_i = \min_{j \ne i} s_{ij}$, the distance from each point $x_i$ to its
nearest neighbour;
empty space distances $d(u) = \min_i \|u - x_i\|$, the distance from a fixed reference location
u in the window to the nearest data point.

If you need to compute these directly, they are available in spatstat using the functions
pairdist, nndist and distmap respectively. If X is a point pattern object,
pairdist(X) returns the matrix of pairwise distances.
nndist(X) returns the vector of nearest neighbour distances.
distmap(X) returns a pixel image whose pixel values are the empty space distances to the
pattern X measured from every pixel.

> data(cells)
> emp <- distmap(cells)
> plot(emp, main = "Empty space distances")
> plot(cells, add = TRUE)

[Figure: Empty space distances]

Tip: Quite a useful exploratory tool is the Stienen diagram obtained by drawing a
circle around each data point of diameter equal to its nearest neighbour distance:

> plot(X %mark% (nndist(X)/2), markscale = 1, main = "Stienen diagram")

[Figure: Stienen diagram]

15.2 Empty space distances

It's easiest to start by explaining the analysis of the empty space distances.
The distance
$$d(u, x) = \min\{ \|u - x_i\| : x_i \in x \}$$
from a fixed location $u \in \mathbb{R}^2$ to the nearest point in a point pattern x is called the empty
space distance or void distance. It can be computed for all locations u on a fine grid, using
the spatstat function distmap as we saw above.
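To make the three kinds of distance concrete, here is a minimal base-R sketch that computes them directly for a small artificial pattern in the unit square. It does not use spatstat; the variable names and the 64 by 64 grid are illustrative assumptions, and the results correspond in spirit to what pairdist, nndist and distmap return.

```r
# Base-R sketch of pairwise, nearest neighbour and empty space distances
# for an artificial pattern in the unit square (illustrative data only).
set.seed(42)
n <- 30
x <- runif(n)
y <- runif(n)

# pairwise distances s_ij (n x n matrix, zero on the diagonal)
s <- as.matrix(dist(cbind(x, y)))

# nearest neighbour distances t_i = min over j != i of s_ij
s_offdiag <- s
diag(s_offdiag) <- Inf
t_nn <- apply(s_offdiag, 1, min)

# empty space distances d(u) at the centres of an m x m grid of pixels
m <- 64
g <- (seq_len(m) - 0.5) / m
grid <- expand.grid(ux = g, uy = g)           # ux varies fastest
D <- sqrt(outer(grid$ux, x, "-")^2 + outer(grid$uy, y, "-")^2)
d_empty <- apply(D, 1, min)

# arrange as a matrix: rows index ux, columns index uy
d_img <- matrix(d_empty, nrow = m)
```

The matrix d_img can be displayed with image(g, g, d_img, asp = 1) and the points overlaid with points(x, y), which gives a rough analogue of the distmap plot.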

15.2.1 Edge effects

It is not easy to interpret a histogram of the empty space distances. The empirical distribution of
the empty space distances depends on the geometry of the window W as well as on characteristics
of the point process X.
Another viewpoint is that the window introduces a sampling bias. Recall that under the
standard model (Section 2.3) the point process X extends throughout 2-D space, but is observed
only inside W. This leads to bias in the distance measurements. Confining observations to a
window W implies that the observed distance $d(u, x) = d(u, X \cap W)$ to the nearest data point
inside W may be greater than the true distance $d(u, X)$ to the nearest point of the complete
point process X.

[Figure: the observed and true distances]

15.2.2 Empty space function F

Ignoring the edge problems for a moment, let us focus on the entire point process X.
Assuming X is stationary (statistically invariant under translations), we can define the
cumulative distribution function of the empty space distance
$$F(r) = P\{ d(u, X) \le r \} \qquad (13)$$
where u is an arbitrary reference location. If the process is stationary then this definition does
not depend on u.
The empirical distribution function of the observed empty space distances on a grid of
locations $u_j$, $j = 1, \ldots, m$,
$$F^*(r) = \frac{1}{m} \sum_j \mathbf{1}\{ d(u_j, x) \le r \} \qquad (14)$$
is a negatively biased estimator of F(r), for reasons explained above.
Corrections for this edge effect bias are required. Many edge corrections are available.
Typically they are weighted versions of the ecdf,
$$\hat F(r) = \sum_j e(u_j, r) \, \mathbf{1}\{ d(u_j, x) \le r \} \qquad (15)$$
where e(u, r) is an edge correction weight designed so that $\hat F(r)$ is unbiased. These corrections
are effectively forms of the Horvitz-Thompson estimator of survey sampling fame.
The edge effect problem can also be regarded as a form of censoring (analogous to
right-censoring in survival data), as first pointed out by CSIRO researcher Geoff Laslett [29]. A
counterpart of the Kaplan-Meier estimator is available. For further information see [7].
Thus, assuming that the point process is homogeneous, we are able to compute an unbiased
and reasonably accurate estimate of the empty space function F defined by (13).
To interpret this estimate, a useful benchmark is the Poisson process. Notice that d(u, X) > r
if and only if there are no points of X in the disc b(u, r) of radius r centred on u. For a
homogeneous Poisson process of intensity $\lambda$, the number of points falling in b(u, r) is Poisson
with mean $\mu = \lambda \, \mathrm{area}(b(u, r)) = \lambda \pi r^2$, so the probability that there are no points in this region
is $\exp(-\mu) = \exp(-\lambda \pi r^2)$. Hence for a Poisson process
$$F_{\mathrm{pois}}(r) = 1 - \exp(-\lambda \pi r^2). \qquad (16)$$
Typically we compare $\hat F(r)$ with the value of $F_{\mathrm{pois}}(r)$ obtained by plugging in the estimated
intensity $\hat\lambda = n(x)/\mathrm{area}(W)$. Values $\hat F(r) > F_{\mathrm{pois}}(r)$ suggest that empty space distances in the
point pattern are shorter than for a Poisson process, suggesting a regularly spaced pattern; while
values $\hat F(r) < F_{\mathrm{pois}}(r)$ suggest a clustered pattern.

15.2.3 Implementation in spatstat

The function Fest computes estimates of F(r) using several edge corrections, and the benchmark
value for the Poisson process.

> data(cells)
> plot(cells)
> Fc <- Fest(cells)
> Fc

Function value object (class fv)
for the function r -> F(r)
Entries:
id      label      description
--      -----      -----------
r       r          distance argument r
theo    Fpois(r)   theoretical Poisson F(r)
rs      Fbord(r)   border corrected estimate of F(r)
km      Fkm(r)     Kaplan-Meier estimate of F(r)
hazard  lambda(r)  Kaplan-Meier estimate of hazard function lambda(r)
raw     Fraw(r)    uncorrected estimate of F(r)
--------------------------------------
Default plot formula:
. ~ r
Recommended range of argument r: [0, 0.085]
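The following base-R sketch illustrates the difference between the uncorrected estimate (14) and one simple edge correction, alongside the Poisson benchmark (16). The correction used here is the reduced-sample (border) idea: for each r, only grid locations at least r away from the boundary of W are used, so that the disc b(u, r) lies wholly inside the window. The data are an artificial uniform pattern in the unit square, the grid size is arbitrary, and none of this is the code used inside Fest.

```r
# Base-R sketch: uncorrected and reduced-sample estimates of F for a
# pattern in the unit square, with the Poisson benchmark.
set.seed(7)
n <- 100
x <- runif(n)                    # artificial completely random pattern
y <- runif(n)

# grid of reference locations u_j
m <- 50
g <- (seq_len(m) - 0.5) / m
u <- expand.grid(ux = g, uy = g)

# empty space distance d(u_j, x), and distance b(u_j) from u_j to the boundary
d <- apply(sqrt(outer(u$ux, x, "-")^2 + outer(u$uy, y, "-")^2), 1, min)
b <- pmin(u$ux, 1 - u$ux, u$uy, 1 - u$uy)

r <- seq(0, 0.1, length.out = 51)

# uncorrected empirical distribution function, as in (14)
F_raw <- sapply(r, function(rr) mean(d <= rr))

# reduced-sample estimate: restrict to locations with b(u_j) >= rr
F_rs <- sapply(r, function(rr) {
  ok <- b >= rr
  sum(d[ok] <= rr) / sum(ok)
})

# Poisson benchmark (16) with plug-in intensity n / area(W), area(W) = 1
lambda_hat <- n / 1
F_pois <- 1 - exp(-lambda_hat * pi * r^2)
```

A call such as matplot(r, cbind(F_rs, F_raw, F_pois), type = "l") overlays the three curves; for a completely random pattern the estimates should track the benchmark, with the uncorrected curve tending to lie slightly below.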

[Figure: cells]

The value returned by Fest is an object of class "fv" (function value table). This is
effectively a data frame with some extra information. The printout for Fc indicates that the
columns in the data frame are named r, theo, rs, km, hazard and raw. The first column r
contains a sequence of values of the function argument r. The next column theo contains the
corresponding values of F(r) for a homogeneous Poisson process. The columns rs, km and raw
contain different estimates of the empty space function F, namely the reduced sample estimator,
the Kaplan-Meier estimator, and the uncorrected empirical distribution function, respectively.
The column hazard contains an estimate of the hazard rate of F, i.e. $h(r) = -(d/dr) \log(1 - F(r))$,
a by-product of the Kaplan-Meier estimate.

Tip: Don't use F as a variable name! In R it is predefined as an abbreviation for
FALSE.

> par(pty = "s")
> plot(Fest(cells))

     lty col
km     1   1
rs     2   2
theo   3   3

[Figure: Fest(cells); km, rs and theo plotted against r]

This is a call to plot.fv. The printed output is the return value from plot.fv, which
explains the encoding of the different function estimates using the R graphics parameters lty
(line type) and col (line colour).
You'll notice that, by default, the uncorrected estimate raw and the hazard rate hazard were
not plotted. The choice of estimates to be plotted, and the style in which they are plotted, are
controlled by the second argument to plot.fv, which should be an R language formula involving
the identifier names r, theo, rs, km, hazard and raw. To plot the hazard rate against r,

> plot(Fest(cells), hazard ~ r, main = "Hazard rate of F")

[Figure: Hazard rate of F; hazard plotted against r]

To plot all the estimates of F(r), including the uncorrected estimate:

> plot(Fest(cells), cbind(km, rs, raw, theo) ~ r)

[Figure: Fest(cells); km, rs, raw and theo plotted against r]

Notice the use of cbind to specify several different graphs on the same plot.
To plot the estimates of F(r) against the Poisson value, in the style of a PP plot:

> plot(Fest(cells), cbind(km, rs, theo) ~ theo)

[Figure: Fest(cells); km, rs and theo plotted against theo]

(including theo on the left side here gives us the diagonal line).
The symbol . stands for "all recommended estimates of the function". So an abbreviation
for the last command is

> plot(Fest(cells), . ~ theo)

Transformations can be applied to these function values. For example, to subtract the
theoretical Poisson value from the estimates,

> plot(Fest(cells), . - theo ~ r)

[Figure: Fest(cells); cbind(km, rs, theo) - theo plotted against r]

To apply Fisher's variance stabilising transformation $\varphi(F(t)) = \sin^{-1}(\sqrt{F(t)})$,

> plot(Fest(cells), asin(sqrt(.)) ~ r)

[Figure: Fest(cells); asin(sqrt(cbind(km, rs, theo))) plotted against r]

15.3 Nearest neighbour distances

For other types of distances we encounter similar problems. For the nearest neighbour distances
$t_i = \min_{j \ne i} \|x_i - x_j\|$, again it is not easy to interpret a histogram of the observed distances.
The empirical distribution of the nearest neighbour distances depends on the geometry of the
window W as well as on characteristics of the point process X. Confining observations to a
window W implies that the observed nearest-neighbour distances are larger, in general, than the
true nearest neighbour distances of points in the entire point process X. Corrections for this
edge effect bias are required.

15.3.1 G function

Assuming the point process X is stationary, we can define the cumulative distribution function
of the nearest-neighbour distance for a typical point in the pattern,
$$G(r) = P\{ d(u, X \setminus \{u\}) \le r \mid u \in X \} \qquad (17)$$
where u is an arbitrary location, and $d(u, X \setminus \{u\})$ is the shortest distance from u to the point
pattern X excluding u itself. If the process is stationary then this definition does not depend on
u.
The empirical distribution function of the observed nearest-neighbour distances
$$G^*(r) = \frac{1}{n(x)} \sum_i \mathbf{1}\{ t_i \le r \} \qquad (18)$$
is a negatively biased estimator of G(r), for reasons we explained above. Many edge corrections
are available. Typically they are weighted versions of the ecdf,
$$\hat G(r) = \sum_i e(x_i, r) \, \mathbf{1}\{ t_i \le r \} \qquad (19)$$
where $e(x_i, r)$ is an edge correction weight designed so that $\hat G(r)$ is approximately unbiased. A
counterpart of the Kaplan-Meier estimator is also available.
For a homogeneous Poisson point process of intensity $\lambda$, the nearest-neighbour distance
distribution function is known to be
$$G_{\mathrm{pois}}(r) = 1 - \exp(-\lambda \pi r^2). \qquad (20)$$

This is identical to the empty space function for the Poisson process. Intuitively, because points
of the Poisson process are independent of each other, the knowledge that u is a point of X does
not affect any other points of the process, hence G is equivalent to F.
Interpretation of $\hat G(r)$ is the reverse of $\hat F(r)$. Values $\hat G(r) > G_{\mathrm{pois}}(r)$ suggest that nearest
neighbour distances in the point pattern are shorter than for a Poisson process, suggesting a
clustered pattern; while values $\hat G(r) < G_{\mathrm{pois}}(r)$ suggest a regular (inhibited) pattern.
The function Gest computes estimates of G(r) using several edge corrections, and the
benchmark value for the Poisson process.

> Gc <- Gest(cells)
> Gc

Function value object (class fv)
for the function r -> G(r)
Entries:
id      label      description
--      -----      -----------
r       r          distance argument r
theo    Gpois(r)   theoretical Poisson G(r)
rs      Gbord(r)   border corrected estimate of G(r)
km      Gkm(r)     Kaplan-Meier estimate of G(r)
hazard  lambda(r)  Kaplan-Meier estimate of hazard function lambda(r)
raw     Graw(r)    uncorrected estimate of G(r)
--------------------------------------
Default plot formula:
. ~ r
Recommended range of argument r: [0, 0.15]

> par(pty = "s")
> plot(Gest(cells))

[Figure: Gest(cells); G(r) plotted against r]

The estimate of G(r) suggests strongly that the pattern is regular. Indeed, $\hat G(r)$ is zero for
$r \le 0.07$ which indicates that there are no nearest-neighbour distances shorter than 0.07.
Common ways of plotting $\hat G$ include:

$\hat G(r)$ and $G_{\mathrm{pois}}(r)$ plotted against r:            plot(Gest(X))
$\hat G(r) - G_{\mathrm{pois}}(r)$ plotted against r:              plot(Gest(X), . - theo ~ r)
$\hat G(r)$ plotted against $G_{\mathrm{pois}}(r)$ in PP style:     plot(Gest(X), . ~ theo)

and Fisher's variance-stabilising transformation $\varphi(G(t)) = \sin^{-1}(\sqrt{G(t)})$ applied to the PP
plot:

> fisher <- function(x) {
+     asin(sqrt(x))
+ }
> plot(Gest(cells), fisher(.) ~ fisher(theo))

[Figure: Gest(cells); fisher(cbind(km, rs, theo)) plotted against fisher(theo)]

15.4 Pairwise distances and the K function

The observed pairwise distances $s_{ij} = \|x_i - x_j\|$ in the data pattern x constitute a biased sample
of pairwise distances in the point process, with a bias in favour of smaller distances. For example,
we can never observe a pairwise distance greater than the diameter of the window.
Ripley [36] defined the K-function for a stationary point process so that $\lambda K(r)$ is the expected
number of other points of the process within a distance r of a typical point of the process.
Formally
$$K(r) = \frac{1}{\lambda} E\left[ n(X \cap b(u, r) \setminus \{u\}) \mid u \in X \right]. \qquad (21)$$
For a homogeneous Poisson process, intuitively, the knowledge that u is a point of X does
not affect the other points of the process, so that $X \setminus \{u\}$ is conditionally a Poisson process. The
expected number of points falling in b(u, r) is $\lambda \pi r^2$. Thus for a homogeneous Poisson process
$$K_{\mathrm{pois}}(r) = \pi r^2 \qquad (22)$$
regardless of the intensity.
Numerous estimators of K have been proposed. Most of them are weighted and renormalised
empirical distribution functions of the pairwise distances, of the general form
$$\hat K(r) = \frac{1}{\hat\lambda^2 \, \mathrm{area}(W)} \sum_i \sum_{j \ne i} \mathbf{1}\{ \|x_i - x_j\| \le r \} \, e(x_i, x_j; r) \qquad (23)$$
where e(u, v, r) is an edge correction weight. The choice of estimator does not seem to be very
important, as long as some edge correction is applied.
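As an illustration of the general form (23), here is a base-R sketch that uses the translation edge correction for a rectangular window. The weight is taken to be area(W) divided by the area of the overlap between W and a copy of W shifted by $x_i - x_j$; for an a by b rectangle this overlap has area (a - |dx|)(b - |dy|). The function name k_translate is hypothetical, the estimate n(n - 1)/area(W)^2 of $\lambda^2$ is a common convention assumed here, and the code is not what Kest does internally.

```r
# Base-R sketch of a K estimator of the form (23) with translation edge
# correction, for points in an a-by-b rectangle. Hypothetical helper.
k_translate <- function(x, y, r, a = 1, b = 1) {
  n <- length(x)
  areaW <- a * b
  dx <- outer(x, x, "-")
  dy <- outer(y, y, "-")
  dd <- sqrt(dx^2 + dy^2)
  diag(dd) <- Inf                               # exclude the pairs i == j
  # translation weight: area(W) / area(W intersected with its shifted copy)
  w <- areaW / ((a - abs(dx)) * (b - abs(dy)))
  lambda2 <- n * (n - 1) / areaW^2              # assumed estimate of lambda^2
  sapply(r, function(rr) sum(w[dd <= rr]) / (lambda2 * areaW))
}

set.seed(3)
n <- 200
x <- runif(n)                                   # artificial completely random pattern
y <- runif(n)
r <- seq(0, 0.25, by = 0.01)
K_hat <- k_translate(x, y, r)
K_pois <- pi * r^2                              # Poisson benchmark (22)
```

For this completely random test pattern, plot(r, K_hat, type = "l") followed by lines(r, K_pois, lty = 2) should show the two curves lying close together.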

Again we usually compare the estimate $\hat K(r)$ with the Poisson K function. Values $\hat K(r) > \pi r^2$
suggest clustering, while $\hat K(r) < \pi r^2$ suggests a regular pattern.
In spatstat the function Kest computes several estimates of the K-function.

> Kc <- Kest(cells)
> Kc

Function value object (class fv)
for the function r -> K(r)
Entries:
id      label      description
--      -----      -----------
r       r          distance argument r
theo    Kpois(r)   theoretical Poisson K(r)
border  Kbord(r)   border-corrected estimate of K(r)
trans   Ktrans(r)  translation-corrected estimate of K(r)
iso     Kiso(r)    Ripley isotropic correction estimate of K(r)
--------------------------------------
Default plot formula:
. ~ r
Recommended range of argument r: [0, 0.25]

> par(pty = "s")
> plot(Kest(cells))

[Figure: Kest(cells); K(r) plotted against r]

In this case, the interpretation of all three summary statistics F, G and K is the same:
emphatic evidence of a regular pattern. It is not always the case that these three summaries
give equivalent messages.
A commonly-used transformation of K is the L-function
$$L(r) = \sqrt{\frac{K(r)}{\pi}}$$
which transforms the Poisson K function to the straight line $L_{\mathrm{pois}}(r) = r$, making visual
assessment of the graph much easier. The square root transformation also approximately stabilises
the variance of the estimator, making it easier to assess deviations.
To compute the estimated L function, use Lest.

> L <- Lest(cells)
> plot(L, main = "L function")

[Figure: L function; L(r) plotted against r]

Another related summary function is the pair correlation function
$$g(r) = \frac{K'(r)}{2 \pi r}$$
where $K'(r)$ is the derivative of K. The pair correlation is in some ways easier to interpret than
either K or L, although it is more difficult to estimate. Roughly speaking, the pair correlation
g(r) is the probability of observing a pair of points separated by a distance r, divided by the
corresponding probability for a Poisson process. This is a non-centred correlation which may
take any nonnegative value. The value g(r) = 1 corresponds to complete randomness; for the
Poisson process the pair correlation is $g_{\mathrm{pois}}(r) \equiv 1$. For other processes, values g(r) > 1 suggest
clustering or attraction at distance r, while values g(r) < 1 suggest inhibition or regularity.
To compute the estimated pair correlation function, use pcf.

> plot(pcf(cells))

[Figure: pcf(cells); g(r) plotted against r]

Here we have used the method pcf.ppp. This computes a standard kernel estimate which
performs well except at very small values of r. So it is prudent not to read too much into the
behaviour of the pcf close to r = 0.
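As a quick check of the definitions of L and g, the Poisson reference values follow in one line each from $K_{\mathrm{pois}}(r) = \pi r^2$. The last line below is the standard planar identity obtained by inverting the definition of g (using K(0) = 0); it is added here as a supplement rather than taken from these notes.

```latex
\begin{align*}
L_{\mathrm{pois}}(r) &= \sqrt{\frac{K_{\mathrm{pois}}(r)}{\pi}}
                      = \sqrt{\frac{\pi r^2}{\pi}} = r, \\
g_{\mathrm{pois}}(r) &= \frac{K_{\mathrm{pois}}'(r)}{2 \pi r}
                      = \frac{2 \pi r}{2 \pi r} = 1, \\
K(r) &= 2 \pi \int_0^r s \, g(s) \, ds .
\end{align*}
```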

If you have already computed the K function and wish to derive the pair correlation, there
is another algorithm pcf.fv that calculates $g(r) = K'(r)/(2 \pi r)$ by numerical differentiation.

> K <- Kest(cells)
> g <- pcf(K)

If you want to try another algebraic transformation of a summary function, the transformation
can be computed using eval.fv. You can also plot algebraic transformations of a summary
function using the plotting formula argument to plot.fv. For example, if we have already
computed the K function, we can plot the L function by

> K <- Kest(cells)
> plot(K, sqrt(./pi) ~ r)

and compute the L function using eval.fv:

> K <- Kest(cells)
> L <- eval.fv(sqrt(K/pi))

15.5 J function

A useful combination of F and G is the J function [44]
$$J(r) = \frac{1 - G(r)}{1 - F(r)} \qquad (24)$$
defined for all $r \ge 0$ such that F(r) < 1. For a homogeneous Poisson process, $F_{\mathrm{pois}} = G_{\mathrm{pois}}$, so
that
$$J_{\mathrm{pois}}(r) \equiv 1. \qquad (25)$$
Values J(r) > 1 suggest regularity, and J(r) < 1 suggest clustering.
An appealing property of the J function is that the superposition $X = X_1 \cup X_2$ of two
independent point processes $X_1, X_2$ has J-function
$$J(t) = \frac{\lambda_1}{\lambda_1 + \lambda_2} J_1(t) + \frac{\lambda_2}{\lambda_1 + \lambda_2} J_2(t)$$
where $J_1, J_2$ are the J-functions of $X_1, X_2$ respectively and $\lambda_1, \lambda_2$ are their intensities.
The J function is computed by Jest.
The convenient function allstats efficiently computes the F, G, J and K functions for a
dataset. They can be plotted automatically.

> plot(allstats(cells))

[Figure: allstats(cells), with panels F function, G function, J function and K function]

15.6 Caveats

The use of summary functions for analysing point patterns has become established across wide
areas of applied science, following Ripley's influential paper [36] and many subsequent textbooks
[17, 19, 21, 43, 38, 39, 42] until quite recently.
There is a tendency to apply them uncritically and exclusively. It's important to remember
that

1. the functions F, G and K are defined and estimated under the assumption that the point
process is stationary (homogeneous).
2. these summary functions do not completely characterise the process.
3. if the process is not stationary, deviations between the empirical and theoretical functions
(e.g. $\hat K$ and $K_{\mathrm{pois}}$) are not necessarily evidence of interpoint interaction, since they may
also be attributable to variations in intensity.

For an example of caveat 2, here is a point process constructed by Baddeley and Silverman
[10] which has the same K function as the homogeneous Poisson process:

> par(mfrow = c(1, 2))
> X <- rcell(nx = 15)
> plot(X)
> plot(Kest(X))
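For readers curious about how such a process can arise, the sketch below simulates a cell process of the Baddeley-Silverman type in base R. It assumes the usual construction: the unit square is divided into nx by nx equal cells, and each cell independently receives 0, 1 or 10 points with probabilities 1/10, 8/9 and 1/90, placed uniformly within the cell. These counts have mean 1 and variance 1, the same as a Poisson count with mean 1, which is why second-order summaries cannot distinguish the process from a Poisson process. The function name cell_sketch is hypothetical and this is not the source of rcell.

```r
# Base-R sketch of a Baddeley-Silverman style cell process on the unit square.
# Hypothetical helper for illustration; not the spatstat function rcell.
cell_sketch <- function(nx) {
  ncell <- nx * nx
  # points per cell: 0, 1 or 10 with probabilities 1/10, 8/9, 1/90
  N <- sample(c(0, 1, 10), ncell, replace = TRUE, prob = c(1/10, 8/9, 1/90))
  ix <- rep(rep(seq_len(nx), times = nx), times = N)   # cell column of each point
  iy <- rep(rep(seq_len(nx), each = nx), times = N)    # cell row of each point
  npts <- sum(N)
  # place each point uniformly inside its own cell of side 1/nx
  cbind(x = (ix - 1 + runif(npts)) / nx,
        y = (iy - 1 + runif(npts)) / nx)
}

set.seed(10)
P <- cell_sketch(15)

# the count distribution has mean 1 and variance 1, like Poisson(1)
prob <- c(1/10, 8/9, 1/90)
vals <- c(0, 1, 10)
count_mean <- sum(prob * vals)
count_var  <- sum(prob * vals^2) - count_mean^2
```

Plotting P with plot(P, asp = 1) typically shows a pattern that looks far from random (mostly one point per cell, with occasional tight clumps of ten), even though its K function matches the Poisson one.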

[Figure: the pattern X and Kest(X)]

For an example of caveat 3, we generate an inhomogeneous Poisson pattern and apply the
ordinary K function estimator. The result appears to show clustering, but this is an artefact of
the spatial inhomogeneity.

> par(mfrow = c(1, 2))
> X <- rpoispp(function(x, y) {
+     300 * exp(-3 * x)
+ })
> plot(X)
> plot(Kest(X))

[Figure: the pattern X and Kest(X)]

16 Methods 6: inference using summary statistics

Although summary statistics such as the K-function are intended primarily for exploratory
purposes, it is also possible to use them as a basis for parameter estimation and hypothesis
testing.

16.1 Envelopes and Monte Carlo tests

The graphical comparison of $\hat K$ with $K_{\mathrm{pois}}$, etc, can be formalised in terms of hypothesis testing.
The null hypothesis is Complete Spatial Randomness (a homogeneous Poisson process) and the
alternative comprises all other processes.

16.1.1 Pointwise Monte Carlo test

Following Besag [14] and Ripley [36, 38], formal hypothesis tests are conducted using the Monte
Carlo test principle [25, 15] rather than the Neyman-Pearson lemma. Suppose the reference
curve is the theoretical K function for a completely random (uniform Poisson) point process.
Generate M independent simulations of this process inside the study region W. Compute the
estimated K functions for each of these realisations, say $\hat K^{(j)}(r)$ for $j = 1, \ldots, M$. Obtain the
pointwise upper and lower envelopes of these simulated curves,
$$L(r) = \min_j \hat K^{(j)}(r)$$
$$U(r) = \max_j \hat K^{(j)}(r).$$
For any fixed value of r, consider the probability that $\hat K(r)$ lies outside the envelope [L(r), U(r)]
for the simulated curves. If the data came from a uniform Poisson process, then $\hat K(r)$ and
$\hat K^{(1)}(r), \ldots, \hat K^{(M)}(r)$ are statistically equivalent and independent, so this probability is equal
to 2/(M + 1) by symmetry. That is, the test which rejects the null hypothesis of a uniform
Poisson process when $\hat K(r)$ lies outside [L(r), U(r)] has exact significance level $\alpha = 2/(M + 1)$.
Instead of the pointwise maximum and minimum, one could use the pointwise order statistics
(the pointwise kth largest and kth smallest values) giving a test of exact size $\alpha = 2k/(M + 1)$.

16.1.2 Envelopes in spatstat

In spatstat the function envelope computes the pointwise envelopes.

> data(cells)
> E <- envelope(cells, Kest, nsim = 39, rank = 1)
> E
Pointwise critical envelopes for K(r)
Obtained from 39 simulations of simulations of CSR
Significance level of pointwise Monte Carlo test: 2/40 = 0.05
Data: cells
Function value object (class fv)
for the function r -> K(r)
Entries:
id      label      description
--      -----      -----------
r       r          distance argument r
obs     obs(r)     function value for data pattern
theo    theo(r)    theoretical value for CSR
lo      lo(r)      lower pointwise envelope of simulations
hi      hi(r)      upper pointwise envelope of simulations
--------------------------------------
Default plot formula:
. ~ r
Recommended range of argument r: [0, 0.25]

> plot(E, main = "pointwise envelopes")

[Figure: pointwise envelopes; K(r) plotted against r]

For example if r had been fixed at r = 0.10 we would have rejected the null hypothesis of
CSR at the 5% level. The value M = 39 is the smallest to yield a two-sided test with significance
level 5%.

Tip: A common and dangerous mistake is to misinterpret the simulation envelopes
as confidence intervals around $\hat K$. They cannot be interpreted as a measure of
accuracy of the estimated K function! They are the critical values for a test of the
hypothesis that $K(r) = \pi r^2$.

16.1.3 Simultaneous Monte Carlo test

Note that the theory of the Monte Carlo test, as presented above, requires that r be fixed in
advance. If we plot the envelope and check whether the empirical K function ever wanders
outside the envelope, this is equivalent to choosing the value of r in a data-dependent way, and
the true significance level is higher (less significant).
To avoid this problem we can construct simultaneous critical bands which have the property
that, under $H_0$, the probability that $\hat K$ ever wanders outside the critical bands is exactly 5%.
One simple way to achieve this is to compute, for each estimate $\hat K(r)$, its maximum deviation
from the Poisson K function, $D = \max_r |\hat K(r) - K_{\mathrm{pois}}(r)|$. This is computed for each of the M
simulated datasets, and the maximum value $D_{\max}$ obtained. Then the upper and lower limits
are
$$L(r) = \pi r^2 - D_{\max}$$
$$U(r) = \pi r^2 + D_{\max}.$$
The estimated K function for the data transgresses these limits if and only if the D-value for
the data exceeds $D_{\max}$. Under $H_0$ this occurs with probability 1/(M + 1). Thus, a test of size
5% is obtained by taking M = 19.

> E <- envelope(cells, Kest, nsim = 19, rank = 1, global = TRUE)
> plot(E, main = "global envelopes")

[Figure: global envelopes; K(r) plotted against r]

A more powerful test is obtained if we (approximately) stabilise the variance, by using the
L function in place of K.

> E <- envelope(cells, Lest, nsim = 19, rank = 1, global = TRUE)
> plot(E, main = "global envelopes of L(r)")

[Figure: global envelopes of L(r); L(r) plotted against r]
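The mechanics of both kinds of envelope can be sketched in base R without spatstat. The example below is a rough illustration under stated assumptions: the helper k_raw is hypothetical and applies no edge correction (the same estimator is applied to the data and to every simulation, which is what the Monte Carlo argument requires), the "data" are an artificial jittered grid, and the simulations are uniform patterns with the same number of points as the data rather than Poisson patterns.

```r
# Base-R sketch of pointwise and simultaneous Monte Carlo envelopes for K,
# testing complete spatial randomness in the unit square.
# k_raw is a hypothetical helper with no edge correction.
k_raw <- function(x, y, r) {
  n <- length(x)
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                                   # exclude pairs i == j
  sapply(r, function(rr) sum(d <= rr) / (n * (n - 1)))   # area(W) = 1
}

set.seed(2008)
r <- seq(0.01, 0.15, by = 0.01)

# artificial "data": a regular pattern made by jittering a 7 x 7 grid
g <- (seq_len(7) - 0.5) / 7
gr <- expand.grid(x = g, y = g)
n <- nrow(gr)
dat_x <- gr$x + runif(n, -0.03, 0.03)
dat_y <- gr$y + runif(n, -0.03, 0.03)
K_obs <- k_raw(dat_x, dat_y, r)

# M simulations of n uniform points; result is a length(r) x M matrix
M <- 39
K_sim <- replicate(M, k_raw(runif(n), runif(n), r))

# pointwise envelopes: level 2 / (M + 1) at any single r fixed in advance
lo <- apply(K_sim, 1, min)
hi <- apply(K_sim, 1, max)
outside_pointwise <- K_obs < lo | K_obs > hi

# simultaneous envelopes: maximum absolute deviation from pi * r^2
K_pois <- pi * r^2
D_obs <- max(abs(K_obs - K_pois))
D_sim <- apply(abs(K_sim - K_pois), 2, max)        # one D value per simulation
D_max <- max(D_sim)
reject_global <- D_obs > D_max                     # level 1 / (M + 1)
```

With M = 39 the simultaneous test in this sketch has level 1/40; taking M = 19 would give the 5% level discussed above. The vector outside_pointwise shows at which values of r the data curve leaves the pointwise band.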

16.1.4 Envelopes for any fitted model

In the explanation above, we assumed that the null hypothesis was CSR (complete spatial
randomness, a uniform Poisson process). In fact the Monte Carlo testing rationale can be
applied to any point process model serving as a null hypothesis. We simply have to generate
simulated realisations from the null hypothesis, and compute the summary function for each
simulated realisation.
To simulate from a fitted point process model (object of class "ppm"), call the envelope
function, giving the fitted model as the first argument of envelope. Then the simulated patterns
will be generated according to this fitted model. The original data point pattern, to which the
model was fitted, is stored in the fitted model object; the original data are extracted and the
summary function for the data is also computed.
The following code fits an inhomogeneous Poisson process to the Beilschmiedia pattern, then
generates simulation envelopes of the L function by simulating from the fitted inhomogeneous
Poisson model.

> data(bei)
> fit <- ppm(bei, ~elev + grad, covariates = bei.extra)
> E <- envelope(fit, Lest, nsim = 19, global = TRUE, correction = "border")
> plot(E, main = "envelope for inhomogeneous Poisson")

[Figure: envelope for inhomogeneous Poisson; L(r) plotted against r (metres)]

16.1.5 Envelopes based on any simulation procedure

Envelopes can also be computed using any user-specified procedure to generate the simulated
realisations. This allows us to perform randomisation tests, for example.
The simulation procedure should be encoded as an R expression, which will be evaluated
each time a simulation is required. For example if we type

> sim <- expression(rpoispp(100))

then each time the expression sim is evaluated, it will yield a different random outcome of the
Poisson process with intensity 100 in the unit square.
This expression should be passed to the envelope function as the argument simulate.
The following code generates simulation envelopes for the L function based on simulations
of CSR which have the same number of points as the data pattern.

> data(cells)
> e <- expression(runifpoint(cells$n, cells$window))
> E <- envelope(cells, Lest, nsim = 19, global = TRUE, simulate = e)
> plot(E, main = "envelope with fixed n")

[Figure: envelope with fixed n; L(r) plotted against r]

16.1.6 Envelopes based on a set of point patterns

Envelopes can also be computed from a user-supplied list of point patterns, instead of the
simulated point patterns generated by a chosen simulation procedure.
This improves efficiency and consistency if, for example, we are going to calculate the
envelopes of several different summary statistics.

> data(cells)
> SimPatList <- list()
> for (i in 1:1000) SimPatList[[i]] <- runifpoint(cells$n)
> EK <- envelope(cells, Kest, simulate = SimPatList, nsim = 1000)
> Ep <- envelope(cells, pcf, simulate = SimPatList, nsim = 1000)

16.2 Model-fitting using summary statistics

In the method of moments we estimate a parameter $\theta$ by solving
$$E_\theta[S(X)] = S(x)$$
where S(x) is the observed value of a statistic S for our data x, and the left side is the theoretical
mean of S for the model governed by parameter $\theta$.
The analogue for point process models is to fit the model by matching a summary statistic
such as the K function to its theoretical value under the model.

16.2.1 Theoretical mean known analytically

In a precious few cases, the K function of a point process is known exactly as an analytic
expression in terms of the model parameters. These include many Neyman-Scott processes. For
example, the K-function of the Thomas process with parameters $\theta = (\kappa, \mu, \sigma)$ is
$$K_\theta(r) = \pi r^2 + \frac{1}{\kappa} \left( 1 - \exp\left( -\frac{r^2}{4 \sigma^2} \right) \right). \qquad (26)$$

16.2 Model-tting using summary statistics

We may thus fit a Thomas model by solving $K_\theta(r) = \hat K(r)$ for some values of r. More efficiently
we choose to minimise the discrepancy between the two functions over some range [a, b]:

$$D = \int_a^b \left| \hat K(r)^q - K_\theta(r)^q \right|^p \, dr \qquad (27)$$

where $0 \le a < b$, and where p, q > 0 are indices. This method was originally advocated by Peter
Diggle and collaborators, and is now known as the method of minimum contrast. See [21].
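As a rough illustration of the criterion (27), the integral can be approximated by a sum over a grid of r values and minimised numerically. The sketch below uses only base R. The function names thomas_K and contrast, the grid of distances, and the stand-in for the empirical K function (the theoretical Thomas curve with kappa = 20 and sigma2 = 0.002, free of sampling noise) are all assumptions made for this example; it is not the implementation used in spatstat.

```r
# Theoretical K function of the Thomas process, equation (26)
thomas_K <- function(r, kappa, sigma2) {
  pi * r^2 + (1 - exp(-r^2 / (4 * sigma2))) / kappa
}

# Contrast (27), approximated by a Riemann sum on an equally spaced grid.
# Parameters are passed on the log scale so that they stay positive.
contrast <- function(logpar, r, Kobs, p = 2, q = 1/4) {
  Kth <- thomas_K(r, exp(logpar[1]), exp(logpar[2]))
  sum(abs(Kobs^q - Kth^q)^p) * (r[2] - r[1])
}

r <- seq(0.005, 0.25, by = 0.005)
# Stand-in for an empirical K function (assumption: no sampling noise)
Kobs <- thomas_K(r, kappa = 20, sigma2 = 0.002)

# Starting values for the search, on the log scale
start <- log(c(kappa = 10, sigma2 = 0.01))
opt <- optim(start, contrast, r = r, Kobs = Kobs,
             control = list(reltol = 1e-14, maxit = 2000))
est <- exp(opt$par)   # estimates of kappa and sigma2
print(est)
```

The exponents p = 2 and q = 1/4 are the defaults chosen here for the sketch; the power q < 1 damps the influence of large r, where the K function and its sampling variability are largest.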
To fit the Thomas model by minimum contrast to the K function, use thomas.estK.
> data(redwood)
> fit <- thomas.estK(redwood, c(kappa = 10, sigma2 = 0.1))
The second argument to thomas.estK gives a set of starting values for the parameters, used
in the minimisation search.
The fitted model, fit, is an object of class minconfit. There are methods for printing and
plotting objects of this class.

> fit
Minimum contrast fit (object of class "minconfit")
Model: Thomas process
Fitted by matching theoretical K function to Kest(redwood)
Parameters fitted by minimum contrast ($par):
       kappa       sigma2
23.545183910  0.002214530
Derived parameters of Thomas process ($modelpar):
      kappa       sigma          mu
23.54518391  0.04705879  2.63323490
Converged successfully after 139 iterations.
Domain of integration: [ 0 , 0.25 ]
Exponents: p= 2, q= 0.25
> plot(fit)

[Figure: plot of the fitted model fit; K(r) against r.]

The plot shows the theoretical K function of the fitted Thomas process (fit), three nonparametric estimates of the K function (iso, trans, border) and the Poisson K function
(theo).
Other models can be fitted using matclust.estK (Matérn cluster process), lgcp.estK (log-Gaussian Cox process), or mincontrast (generic fitting algorithm for method of minimum contrast).

16.2.2 Monte Carlo

For the vast majority of point process models, the true K function $K_\theta(r)$ is not known analytically in terms of the parameter $\theta$. In principle we could use Monte Carlo simulation to determine
an approximation to $K_\theta(r)$, for any given $\theta$, by generating a large number of simulated realisations of the process with parameter $\theta$, computing the estimated K-function for each realisation,
and taking the pointwise sample average. It's possible to do this in spatstat using the generic
algorithm mincontrast. Details are not given here as it is rather fiddly at present, and will
change soon.

17 Methods 7: adjusting for inhomogeneity

If a point pattern is known or suspected to be spatially inhomogeneous, then our statistical
analysis of the pattern should take account of this inhomogeneity.

17.1 Inhomogeneous K function

There is a modification of the K function that applies to inhomogeneous processes [2]. If $\lambda(u)$
is the true intensity function of the point process X, then the idea is that each point $x_i$ will be
weighted by $w_i = 1/\lambda(x_i)$.
The inhomogeneous K-function is defined as

$$K_{\mathrm{inhom}}(r) = E\left[ \sum_{x_j \in X} \frac{1}{\lambda(x_j)} \, 1\{0 < \|u - x_j\| \le r\} \;\Big|\; u \in X \right] \qquad (28)$$

assuming that this does not depend on location u. Thus, $K_{\mathrm{inhom}}(r)$ is the expected total weight
of all random points within a distance r of the point u, where the weight of a point $x_i$ is $1/\lambda(x_i)$.
If the process is actually homogeneous, then $\lambda(u)$ is constant and $K_{\mathrm{inhom}}(r)$ reduces to the
usual K function (21).
It turns out that, for an inhomogeneous Poisson process with intensity function $\lambda(u)$, the
inhomogeneous K function is

$$K_{\mathrm{inhom,\,pois}}(r) = \pi r^2 \qquad (29)$$

exactly as for the homogeneous case.
The standard estimators of K can be extended to the inhomogeneous K function:

$$\hat K_{\mathrm{inhom}}(r) = \frac{1}{\mathrm{area}(W)} \sum_i \sum_{j \ne i} \frac{1\{\|x_i - x_j\| \le r\}}{\hat\lambda(x_i)\,\hat\lambda(x_j)} \, e(x_i, x_j; r) \qquad (30)$$

where e(u, v, r) is an edge correction weight as before, and $\hat\lambda(u)$ is an estimate of the intensity
function $\lambda(u)$.
There remains the question of how to estimate the intensity function $\lambda(u)$. It is usually
advisable to obtain the intensity estimate $\hat\lambda(u)$ by fitting a parametric model, to avoid overfitting.
Here is an example for the tropical rainforest data, using the covariate data to suggest a model
for the intensity.

> data(bei)
> fit <- ppm(bei, ~elev + grad, covariates = bei.extra)
> lam <- predict(fit, locations = bei)
> Ki <- Kinhom(bei, lam)
> plot(Ki, main = "Inhomogeneous K function")

[Figure: "Inhomogeneous K function"; legend entries bord.modif, border, theo.]

The plot suggests that, even after accounting for dependence on altitude and slope, the trees
still appear to be clustered.
The intensity function $\lambda(u)$ could also be estimated by kernel smoothing the point pattern
data. However, notice that the estimator (30) of the inhomogeneous K function depends on
the estimated intensity values at the data points, $\hat\lambda(x_i)$. These are positively biased estimates
of the true values $\lambda(x_i)$. In order to avoid bias, the value $\hat\lambda(x_i)$ should be estimated by kernel
smoothing of the point pattern with the point $x_i$ deleted. This leave-one-out estimator is
implemented in Kinhom and is invoked when the argument lambda is not given:

> Ki2 <- Kinhom(bei)
> plot(Ki2, main = "Kinhom using leave-one-out")

(The smoothing parameter can also be controlled.)

The inhomogeneous analogue of the L-function is defined by

$$L_{\mathrm{inhom}}(r) = \sqrt{\frac{K_{\mathrm{inhom}}(r)}{\pi}}.$$

This can be computed using Linhom. For an inhomogeneous Poisson process, $L_{\mathrm{inhom}}(r) \equiv r$.
The inhomogeneous analogue of the pair correlation function can be defined, similarly to the
homogeneous case, as

$$g_{\mathrm{inhom}}(r) = \frac{K'_{\mathrm{inhom}}(r)}{2\pi r}.$$

It has the same interpretation, namely, that $g_{\mathrm{inhom}}(r)$ is the probability of observing a pair of
points at certain locations separated by a distance r, divided by the corresponding probability
for a Poisson process of the same (inhomogeneous) intensity.
The inhomogeneous pair correlation function is currently computed by calling Kinhom followed by pcf.fv (which does numerical differentiation):

> g <- pcf(Kinhom(bei))
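To make the weighting in the estimator (30) concrete, here is a naive base R version that ignores edge effects (the edge correction weight is taken to be 1). The function name kinhom_naive and the three-point example are assumptions for illustration only; in practice Kinhom should be used, since it handles edge correction and the estimation of the intensity.

```r
# Naive version of estimator (30) with edge correction weight e = 1.
# x, y: point coordinates; lambda: intensity values at the points;
# r: vector of distances; area: area of the window W.
kinhom_naive <- function(x, y, lambda, r, area) {
  d <- as.matrix(dist(cbind(x, y)))   # pairwise distances
  w <- 1 / outer(lambda, lambda)      # weights 1 / (lambda_i * lambda_j)
  diag(w) <- 0                        # exclude pairs with i = j
  sapply(r, function(ri) sum(w[d <= ri]) / area)
}

# Three points in the unit square with constant intensity 3;
# the interpoint distances are 0.3, 0.4 and 0.5
x <- c(0, 0.3, 0)
y <- c(0, 0, 0.4)
K <- kinhom_naive(x, y, lambda = rep(3, 3), r = c(0.35, 0.45, 0.6), area = 1)
print(K)
```

With a constant intensity the weights are all equal, so the result coincides with the corresponding naive estimate of the ordinary K function.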

17.2 Inhomogeneous cluster models

The inhomogeneous Poisson process was described in Section 11.1. We can also introduce spatial
inhomogeneity into any of the non-Poisson models described in Section 14.
In the case of Poisson cluster processes (Section 14.1) we can introduce inhomogeneity in
either the parent process or the offspring processes.
To make the parents inhomogeneous, we simply generate the parent points from an inhomogeneous Poisson process with some intensity function $\kappa(u)$.
To make the clusters inhomogeneous, we use a clever construction by Waagepetersen [45].
For a parent point at location $(x_0, y_0)$, the offspring are generated from a Poisson process with
intensity $\beta(x, y) = \mu(x, y) f(x - x_0, y - y_0)$, where f(u, v) is either the Gaussian probability
density (for the Thomas process) or the uniform probability density in a disc (for the Matérn
cluster process), and $\mu(x, y)$ is the reference or modulating intensity. The number of offspring
from a given parent $(x_0, y_0)$ is a Poisson random variable with mean

$$B(x_0, y_0) = \int\!\!\int \beta(x, y) \, dx \, dy = \int\!\!\int f(x - x_0, y - y_0) \, \mu(x, y) \, dx \, dy.$$

The simulation algorithms rMatClust and rThomas allow these options. If the parent intensity parameter kappa is given as a function(x,y) or a pixel image, then the parents are
Poisson with inhomogeneous intensity kappa. If the offspring mean parameter mu is given as a
function(x,y) or a pixel image, then this determines an inhomogeneous reference density for
the clusters.
> Z <- as.im(function(x, y) {
+     6 * exp(2 * x - 1)
+ }, owin())
> plot(rMatClust(10, 0.05, Z))


Waagepetersen [45] pointed out that, if we take a Thomas process or Matérn cluster process
(or in general a Neyman-Scott process) with homogeneous parent intensity $\kappa$ and inhomogeneous cluster reference density $\mu(u)$, then the overall intensity of the process is

$$\lambda(u) = \kappa \, \mu(u)$$

and the inhomogeneous K-function is the same as it would be if $\mu$ were constant.
Thus, we can fit a Thomas process or Matérn cluster process with inhomogeneous clusters
as follows:
1. estimate the inhomogeneous intensity $\lambda(u)$ of the process.
2. derive an estimate of the inhomogeneous K-function.
3. use the method of minimum contrast to estimate the parent intensity $\kappa$ and the cluster
scale parameter (Gaussian standard deviation or disc radius), exactly as we would in the
homogeneous case.
Here is an application to the rainforest data.
> data(bei)
> fit <- ppm(bei, ~elev + grad, covariates = bei.extra)
> lam <- predict(fit, locations = bei)
> Ki <- Kinhom(bei, lam)
> thomas.estK(Ki, c(kappa = 4e-04, sigma2 = 1))
Minimum contrast fit (object of class "minconfit")
Model: Thomas process
Fitted by matching theoretical K function to Ki
Parameters fitted by minimum contrast ($par):
       kappa       sigma2
4.267423e-04 2.941906e+01
Derived parameters of Thomas process ($modelpar):
       kappa        sigma           mu
0.0004267423 5.4239342345           NA
Converged successfully after 113 iterations.
Domain of integration: [ 0 , 125 ]
Exponents: p= 2, q= 0.25

17.3 Fitting inhomogeneous models by minimum contrast

Minimum contrast methods can be applied to inhomogeneous point process models.
In principle we could fit any model (homogeneous or inhomogeneous) by the method of
minimum contrast using any summary statistic. However, the method works best when we
have an exact formula for the true value of the summary function for the model, expressed as a
function of the parameters of the model.
18 Gibbs models

One way to construct a statistical model (in any field of statistics) is to write down its probability
density. Advantages of doing this are:

• the functional form of the density reflects its probabilistic properties.
• terms or factors in the density often have an interpretation as components of the model.
• it is easy to introduce terms that represent the dependence of the model on covariates, etc.

This approach is useful provided the density can be written down, and provided the density
is tractable.
Spatial point process models that are constructed by writing down their probability densities
are called Gibbs processes. Good references on Gibbs point processes are [43, 18].

18.1 Probability densities

It is possible to define probability densities for spatial point processes that live inside a bounded
window W.
The probability density will be a function f(x) defined for each finite configuration
$x = \{x_1, \ldots, x_n\}$ of points $x_i \in W$ for any $n \ge 0$. Notice that the number of points n is not fixed,
and may be zero. Apart from this peculiarity, probability densities for point processes behave
much like probability densities in more familiar contexts.
That's all you need to know for applications. If you're interested in the mathematical
technicalities, read on; otherwise, skip to section 18.2.
A point process X inside W is defined to have probability density f if and only if, for any
nonnegative integrable function h,

$$E[h(X)] = e^{-|W|} h(\emptyset) f(\emptyset) + e^{-|W|} \sum_{n=1}^{\infty} \frac{1}{n!} \int_W \cdots \int_W h(\{x_1, \ldots, x_n\}) \, f(\{x_1, \ldots, x_n\}) \, dx_1 \cdots dx_n \qquad (31)$$

where |W| denotes the area of W.
In particular, the probability that X contains exactly n points is

$$p_n = P\{n(X) = n\} = \frac{e^{-|W|}}{n!} \int_W \cdots \int_W f(\{x_1, \ldots, x_n\}) \, dx_1 \cdots dx_n$$

for $n \ge 1$ and $p_0 = P\{n(X) = 0\} = e^{-|W|} f(\emptyset)$. Given that there are exactly n points, the
conditional joint density of the locations $x_1, \ldots, x_n$ is $f(\{x_1, \ldots, x_n\})/p_n$.

18.2 Poisson processes

The uniform Poisson process with intensity 1 has probability density $f(x) \equiv 1$.
The uniform Poisson process in W with intensity $\beta$ has probability density

$$f(x) = \alpha \, \beta^{n(x)} \qquad (32)$$

where n(x) is the number of points in the configuration x, and the constant $\alpha$ is
$\alpha = e^{(1 - \beta)|W|}$.
The inhomogeneous Poisson process in W with intensity function $\beta(u)$ has probability density

$$f(x) = \alpha \prod_{i=1}^{n} \beta(x_i) \qquad (33)$$

where the constant $\alpha$ is

$$\alpha = \exp\left( \int_W (1 - \beta(u)) \, du \right).$$

The densities (32) and (33) are products of terms associated with individual points $x_i$. This
reflects the conditional independence property (PP4) of the Poisson process.

18.3 Pairwise interaction models

In order to construct spatial point processes which exhibit interpoint interaction (stochastic
dependence between points), we need to introduce terms in the density that depend on more
than one point. The simplest are pairwise interaction models, which have probability densities
of the form

$$f(x) = \alpha \left[ \prod_{i=1}^{n(x)} b(x_i) \right] \left[ \prod_{i<j} c(x_i, x_j) \right] \qquad (34)$$

where $\alpha$ is a normalising constant, b(u), $u \in W$ is the first order term, and c(u, v), $u, v \in W$
is the second order or pairwise interaction term. The pairwise interaction term introduces
dependence between points. The interaction function must be symmetric, c(u, v) = c(v, u). In
principle we are free to choose any functions b and c, provided the resulting density is integrable
(the right side of (31) should be finite when $h \equiv 1$).

18.3.1 Hard core process

If we take $b(u) \equiv \beta$ and

$$c(u, v) = \begin{cases} 1 & \text{if } \|u - v\| > r \\ 0 & \text{if } \|u - v\| \le r \end{cases} \qquad (35)$$

where $\|u - v\|$ denotes the distance between u and v, and r > 0 is a fixed distance, then the
density becomes

$$f(x) = \begin{cases} \alpha \, \beta^{n(x)} & \text{if } \|x_i - x_j\| > r \text{ for all } i \ne j \\ 0 & \text{otherwise} \end{cases}$$

This is the density of the Poisson process of intensity $\beta$ in W conditioned on the event that no
two points of the pattern lie closer than r units apart. It is known as the (classical) hard core
process.
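The structure of the pairwise interaction density (34) is easy to express in code. The following base R sketch evaluates the density up to the normalising constant, for any first order term and interaction function supplied by the user; the names pairwise_density, b_const and hardcore are hypothetical and chosen only for this example. It is applied to the hard core interaction (35).

```r
# Unnormalised pairwise interaction density (34), i.e. f(x) / alpha.
# b: first order term, a function of the coordinates (vectorised);
# cfun: pairwise interaction, a function of interpoint distance (vectorised).
pairwise_density <- function(x, y, b, cfun) {
  n <- length(x)
  first <- if (n > 0) prod(b(x, y)) else 1
  second <- 1
  if (n > 1) {
    d <- dist(cbind(x, y))            # distances for all pairs i < j
    second <- prod(cfun(as.numeric(d)))
  }
  first * second
}

beta <- 2
b_const <- function(x, y) rep(beta, length(x))          # b(u) identically beta
hardcore <- function(r) function(d) as.numeric(d > r)   # interaction (35)

# Three points with interpoint distances 1, 1 and sqrt(2)
x <- c(0, 1, 0)
y <- c(0, 0, 1)
f1 <- pairwise_density(x, y, b_const, hardcore(0.5))   # no pair within 0.5
f2 <- pairwise_density(x, y, b_const, hardcore(1.2))   # two pairs within 1.2
print(c(f1, f2))
```

The first value is $\beta^3 = 8$ because the configuration respects the hard core, and the second is zero because it does not.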

[Figure: simulated realisation titled "Hard core process".]

18.3.2 Strauss process

Generalising the hard core process, suppose we take $b(u) \equiv \beta$ and

$$c(u, v) = \begin{cases} 1 & \text{if } \|u - v\| > r \\ \gamma & \text{if } \|u - v\| \le r \end{cases} \qquad (36)$$

where $\gamma$ is a parameter. Then the density becomes

$$f(x) = \alpha \, \beta^{n(x)} \gamma^{s(x)} \qquad (37)$$

where s(x) is the number of pairs of distinct points in x that lie closer than r units apart.
The parameter $\gamma$ controls the strength of interaction between points. If $\gamma = 1$ the model
reduces to a Poisson process with intensity $\beta$. If $\gamma = 0$ the model is a hard core process. For
values $0 < \gamma < 1$, the process exhibits inhibition (negative association) between points.

[Figure: simulated realisations titled "Strauss(γ = 0.2)" and "Strauss(γ = 0.7)".]

For $\gamma > 1$, the density (37) is not integrable. Hence the Strauss process is defined only for
$0 \le \gamma \le 1$ and is a model for inhibition between points. This is typical of most Gibbs models.

18.3.3 Other pairwise interaction models

Other pairwise interactions that are considered in spatstat include the Strauss-hard core interaction (with hard core distance h > 0 and interaction distance r > h)

$$c(u, v) = \begin{cases} 0 & \text{if } \|u - v\| \le h \\ \gamma & \text{if } h < \|u - v\| \le r \\ 1 & \text{if } \|u - v\| > r \end{cases},$$

the soft-core interaction (with scale $\sigma > 0$ and index $0 < \kappa < 1$)

$$c(u, v) = \exp\left( -\left( \frac{\sigma}{\|u - v\|} \right)^{2/\kappa} \right),$$

the Diggle-Gates-Stibbard interaction (with interaction range $\rho$)

$$c(u, v) = \begin{cases} \sin^2\left( \dfrac{\pi \|u - v\|}{2\rho} \right) & \text{if } \|u - v\| \le \rho \\ 1 & \text{if } \|u - v\| > \rho \end{cases},$$

the Diggle-Gratton interaction (with hard core distance $\delta$, interaction distance $\rho$ and index $\kappa$)

$$c(u, v) = \begin{cases} 0 & \text{if } \|u - v\| \le \delta \\ \left( \dfrac{\|u - v\| - \delta}{\rho - \delta} \right)^{\kappa} & \text{if } \delta < \|u - v\| \le \rho \\ 1 & \text{if } \|u - v\| > \rho \end{cases},$$

and the general piecewise constant interaction in which $c(\|u - v\|)$ is a step function of $\|u - v\|$.

[Figure: simulated realisation titled "Piecewise constant interaction".]

18.4 Higher-order interactions

There are some useful Gibbs point process models which exhibit interactions of higher order,
that is, in which the probability density has contributions from m-tuples of points, where m > 2.
One example is the area-interaction or Widom-Rowlinson process [11] with probability density

$$f(x) = \alpha \, \beta^{n(x)} \gamma^{-A(x)} \qquad (38)$$

where $\alpha$ is the normalising constant, $\beta > 0$ is an intensity parameter, and $\gamma > 0$ is an interaction
parameter. Here A(x) denotes the area of the region obtained by drawing a disc of radius r
centred at each point $x_i$, and taking the union of these discs. The value $\gamma = 1$ again corresponds
to a Poisson process, while $\gamma < 1$ produces a regular process and $\gamma > 1$ a clustered process.
This process has interactions of all orders. It can be used as a model for moderate regularity or
clustering.

[Figure: panels titled "Strauss" and "areainteraction".]

18.5 Conditional intensity

The main tool for analysing a Gibbs point process is its conditional intensity $\lambda(u, X)$. Intuitively
this determines the conditional probability of finding a point of the process at the location u given
complete information about the rest of the process. For formal definitions see [18]. Informally,
the conditional probability of finding a point of the process inside an infinitesimal neighbourhood
of the location u, given the complete point pattern at all other locations, is $\lambda(u, X) \, du$.

For point processes in a bounded window, the conditional intensity at a location u given the
configuration x is related to the probability density f by

$$\lambda(u, x) = \frac{f(x \cup \{u\})}{f(x)} \qquad (39)$$

(for $u \notin x$), the ratio of the probability densities for the configuration x with and without the
point u added.
The homogeneous Poisson process with intensity $\beta$ has conditional intensity

$$\lambda(u, x) = \beta$$

while the inhomogeneous Poisson process with intensity function $\beta(u)$ has conditional intensity

$$\lambda(u, x) = \beta(u).$$

The conditional intensity for a Poisson process does not depend on the configuration x, because
the points of a Poisson process are independent.
For the general pairwise interaction process (34) the conditional intensity is

$$\lambda(u, x) = b(u) \prod_{i=1}^{n(x)} c(u, x_i). \qquad (40)$$

For the hard core process,

$$\lambda(u, x) = \begin{cases} \beta & \text{if } \|u - x_i\| > r \text{ for all } i \\ 0 & \text{otherwise} \end{cases} \qquad (41)$$

which has the nice interpretation that a point u is either permitted or not permitted depending
on whether it satisfies the hard core requirement.
For the Strauss process

$$\lambda(u, x) = \beta \, \gamma^{t(u, x)} \qquad (42)$$

where $t(u, x) = s(x \cup \{u\}) - s(x)$ is the number of points of x that lie within a distance r of the
location u. For $\gamma < 1$, this has the interpretation that a random point is less likely to occur at
the location u if there are many points in the neighbourhood.
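The Strauss conditional intensity (42) takes only a few lines of base R. The function name strauss_cif and the small configuration below are assumptions made for illustration.

```r
# Conditional intensity (42) of the Strauss process at location (ux, uy),
# given a configuration with coordinates x, y.
strauss_cif <- function(ux, uy, x, y, beta, gamma, r) {
  t_ux <- sum(sqrt((x - ux)^2 + (y - uy)^2) <= r)   # t(u, x)
  beta * gamma^t_ux
}

x <- c(0.2, 0.25, 0.8)
y <- c(0.2, 0.25, 0.8)
lam1 <- strauss_cif(0.22, 0.22, x, y, beta = 100, gamma = 0.5, r = 0.1)  # two close neighbours
lam2 <- strauss_cif(0.50, 0.50, x, y, beta = 100, gamma = 0.5, r = 0.1)  # no close neighbours
lam3 <- strauss_cif(0.22, 0.22, x, y, beta = 100, gamma = 0,   r = 0.1)  # hard core case
print(c(lam1, lam2, lam3))
```

Each neighbour within distance r multiplies the conditional intensity by $\gamma$, so two neighbours with $\gamma = 0.5$ reduce it from 100 to 25, and with $\gamma = 0$ any neighbour reduces it to zero, as in (41).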

For the area-interaction process,

$$\lambda(u, x) = \beta \, \gamma^{-B(u, x)} \qquad (43)$$

where $B(u, x) = A(x \cup \{u\}) - A(x)$ is the area of that part of the disc of radius r centred on u
that is not covered by discs of radius r centred at the other points $x_i \in x$. If the points represent
trees or plants, we may imagine that each tree takes nutrients and water from the soil inside a
circle of radius r. Then we may interpret B(u, x) as the area of the unclaimed zone where a
new plant at location u would be able to draw nutrients and water without competition from
other plants. For $\gamma < 1$ we can interpret (43) as saying that a random point is less likely to
occur when the unclaimed area is small.
The conditional intensity of a point process determines the probability density, through (39).
Hence we can use the conditional intensity to define a point process. The conditional intensity
is the preferred modelling tool for Gibbs processes: it has a direct interpretation, and it is easier
to handle than the probability density.

18.6 Simulating Gibbs models

Gibbs models can be simulated by Markov chain Monte Carlo algorithms. Indeed, MCMC
algorithms were invented to simulate Gibbs processes [32, 37].
In brief, these algorithms simulate a Markov chain whose states are point patterns. The chain
is designed so that its equilibrium distribution is the distribution of the point process we want
to simulate. If the chain were run for an infinite time, the state would converge in distribution
to the desired point process. In practice the chain is run for a long finite time. Further details
are beyond the scope of this workshop; consult [33, 34] for more information.
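For readers who want to see the shape of such a chain, here is a minimal birth-death Metropolis-Hastings sketch for the Strauss process in the unit square, written in base R. It is an illustration under simplifying assumptions (unit window, empty initial state, equal probabilities of proposing a birth or a death); the function name strauss_mh is hypothetical. It is far slower and less general than a production sampler and should not be taken as a description of any spatstat function.

```r
# Birth-death Metropolis-Hastings sketch for the Strauss process in the
# unit square (area 1). For illustration only.
strauss_mh <- function(beta, gamma, r, nrep) {
  x <- numeric(0); y <- numeric(0)
  # number of points of (px, py) within distance r of (ux, uy)
  tcount <- function(ux, uy, px, py) sum((px - ux)^2 + (py - uy)^2 <= r^2)
  for (k in seq_len(nrep)) {
    n <- length(x)
    if (runif(1) < 0.5) {
      # propose a birth at a uniformly random location u
      ux <- runif(1); uy <- runif(1)
      lam <- beta * gamma^tcount(ux, uy, x, y)   # conditional intensity (42)
      if (runif(1) < lam / (n + 1)) { x <- c(x, ux); y <- c(y, uy) }
    } else if (n > 0) {
      # propose the death of a randomly chosen existing point
      i <- sample.int(n, 1)
      lam <- beta * gamma^tcount(x[i], y[i], x[-i], y[-i])
      if (runif(1) < n / lam) { x <- x[-i]; y <- y[-i] }
    }
  }
  list(x = x, y = y)
}

set.seed(1)
X <- strauss_mh(beta = 100, gamma = 0, r = 0.05, nrep = 5000)
npts <- length(X$x)                      # number of points in the final state
mind <- min(dist(cbind(X$x, X$y)))       # smallest interpoint distance
print(c(npts, mind))
```

Only the conditional intensity enters the acceptance probabilities, which is why the normalising constant of the density never has to be computed. With gamma = 0 a birth within distance r of an existing point is never accepted, so the final state respects the hard core.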
Currently spatstat offers the function rmh which simulates Gibbs processes using the
Metropolis-Hastings algorithm.
> rmh(model, start, control)
• model determines the point process model to be simulated (see help(rmhmodel)).
• start determines the initial state of the Markov chain (see help(rmhstart)).
• control specifies control parameters for running the Markov chain, such as the number
of iteration steps (see help(rmhcontrol)).
In the simplest uses of rmh, the three arguments are lists of parameter values. To generate a
simulated realisation of the Strauss process with parameters $\beta = 2$, $\gamma = 0.2$, $r = 0.7$ in a square
of side 10,

> mo <- list(cif = "strauss", par = c(beta = 2, gamma = 0.2, r = 0.7),
+     w = square(10))
> X <- rmh(model = mo, start = list(n.start = 42), control = list(nrep = 1e+06))
The other arguments specify a random initial state of 42 points, and that the algorithm shall
be run for a million iterations.

19 Methods 8: fitting Gibbs models

19.1 Maximum pseudolikelihood

Maximum likelihood estimation is intractable for most point process models. At the very least
it requires Monte Carlo simulation to evaluate the likelihood (or the score and the Fisher information).
A workable alternative, at least for investigative purposes, is to maximise the log pseudolikelihood

$$\log \mathrm{PL}(\theta; x) = \sum_i \log \lambda_\theta(x_i; x) - \int_W \lambda_\theta(u, x) \, du. \qquad (44)$$

You may recognise this as being very similar to the likelihood (4) of the Poisson process. In
general it is not a likelihood, but the analogue of the score equation

$$\frac{\partial}{\partial\theta} \log \mathrm{PL}(\theta) = 0$$

is an unbiased estimating equation. Thus the maximum pseudolikelihood estimator is asymptotically unbiased, consistent and asymptotically normal under appropriate conditions.
The main advantage of maximum pseudolikelihood is that, at least for popular Gibbs models,
the conditional intensity $\lambda_\theta(u, x)$ is easily computable, so that the pseudolikelihood is easy to
compute and to maximise. The main disadvantage is the bias and inefficiency of maximum
pseudolikelihood in small samples.
More computationally-intensive estimation procedures typically use the maximum pseudolikelihood estimate as their initial guess. We are implementing such procedures in spatstat as
well.
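To show how little is needed to evaluate (44), here is a base R sketch of the log pseudolikelihood for a stationary Strauss process in the unit square, with the integral approximated by the midpoint rule on a regular grid. The function name strauss_logpl, the grid size and the toy configuration are assumptions for this example, and the sketch requires gamma > 0. This crude quadrature is meant only to expose the two terms of (44); it is not the quadrature scheme used in spatstat.

```r
# Log pseudolikelihood (44) for a stationary Strauss process in the unit
# square, with the integral approximated on a regular grid (gamma > 0).
strauss_logpl <- function(beta, gamma, r, x, y, ngrid = 50) {
  # number of points of (px, py) within distance r of (ux, uy)
  tcount <- function(ux, uy, px, py) sum((px - ux)^2 + (py - uy)^2 <= r^2)
  # sum over data points of log lambda(x_i; x), not counting x_i itself
  t_data <- sapply(seq_along(x), function(i) tcount(x[i], y[i], x[-i], y[-i]))
  data_term <- sum(log(beta) + t_data * log(gamma))
  # integral of lambda(u, x) over W by the midpoint rule
  g <- (seq_len(ngrid) - 0.5) / ngrid
  grid <- expand.grid(ux = g, uy = g)
  t_grid <- mapply(tcount, grid$ux, grid$uy, MoreArgs = list(px = x, py = y))
  integral <- sum(beta * gamma^t_grid) / ngrid^2
  data_term - integral
}

x <- c(0.2, 0.25, 0.8)
y <- c(0.2, 0.25, 0.8)
pl_pois    <- strauss_logpl(beta = 3, gamma = 1,   r = 0.1, x, y)   # Poisson case
pl_strauss <- strauss_logpl(beta = 3, gamma = 0.5, r = 0.1, x, y)
print(c(pl_pois, pl_strauss))
```

When gamma = 1 the conditional intensity is constant and the value reduces to the Poisson log likelihood $n \log\beta - \beta |W|$, which gives a simple check of the code.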

19.2 Fitting Gibbs models in spatstat

We have already met the function ppm for fitting Poisson point process models. In fact this
function will fit a wide class of Gibbs models.
ppm contains an implementation of the algorithm of Baddeley and Turner [3] for maximum
pseudolikelihood (which extends the Berman-Turner device for Poisson processes to a general
Gibbs process). The conditional intensity of the model, $\lambda_\theta(u, x)$, must be loglinear in the
parameters $\theta$:

$$\log \lambda_\theta(u, x) = \theta \cdot S(u, x), \qquad (45)$$

generalising (5), where S(u, x) is a real-valued or vector-valued function of location u and configuration x. Parameters $\theta$ appearing in the loglinear form (45) are called regular parameters, and
all other parameters are irregular parameters. For example, the Strauss process conditional
intensity (42) can be recast as

$$\log \lambda(u, x) = \log\beta + (\log\gamma) \, t(u, x)$$

so that $\theta = (\log\beta, \log\gamma)$ are regular parameters, but the interaction distance r is an irregular
parameter (technically called a bloody nuisance parameter).
In spatstat we split the conditional intensity into first-order and higher-order terms:

$$\log \lambda_\theta(u, x) = S(u) + V(u, x). \qquad (46)$$

The first order term S(u) describes spatial inhomogeneity and/or covariate effects. The higher
order term V(u, x) describes interpoint interaction.
The model with conditional intensity (46) is fitted by calling ppm in the form
ppm(X, ~ terms, V)
The first argument X is the point pattern dataset. The second argument ~terms is a model
formula, specifying the first order term S(u) in (46), in the manner described in Section 11.
Thus the first order term S(u) in (46) may take very general forms.
The third argument V is an object of the special class "interact" which describes the
interpoint interaction term V(u, x) in (46). It may be compared to the family argument
which determines the distribution of the responses in a linear model or generalised linear model.
Only a limited number of canned interactions are available in spatstat, because they must be
constructed carefully to ensure that the point process exists.
To fit the Strauss process to the cells data using ppm,
> data(cells)
> ppm(cells, ~1, Strauss(r = 0.1))
Stationary Strauss process

First order term:
    beta
294.2333

Interaction: Strauss process
interaction distance:   0.1
Fitted interaction parameter gamma:     0.0071

Relevant coefficients:
Interaction
  -4.945833
Here Strauss is a special function that creates an interaction object (class "interact")
describing the interaction structure of the Strauss process. Notice that we had to specify the
value of the irregular parameter r (more about that later).
To fit the inhomogeneous Strauss process with conditional intensity

$$\lambda(u, x) = b(u) \, \gamma^{t(u, x)}$$

where, say, b(u) is loglinear in the Cartesian coordinates,

$$\log b((x, y)) = \theta_0 + \theta_1 x + \theta_2 y$$

we simply type
> ppm(cells, ~x + y, Strauss(r = 0.1))
Nonstationary Strauss process
Trend formula: ~x + y
Fitted coefficients for trend formula:
(Intercept)           x           y
  5.7460724   0.1465176  -0.2724205

Interaction: Strauss process
interaction distance:   0.1
Fitted interaction parameter gamma:     0.0128

Relevant coefficients:
Interaction
  -4.357253
To fit an inhomogeneous Strauss process with log-quadratic first order term,
> ppm(cells, ~polynom(x, y, 2), Strauss(r = 0.1))
Nonstationary Strauss process
Trend formula: ~polynom(x, y, 2)

Fitted coefficients for trend formula:
          (Intercept)   polynom(x, y, 2)[x]   polynom(x, y, 2)[y]
             3.019133             11.064005              6.154949
polynom(x, y, 2)[x^2] polynom(x, y, 2)[x.y] polynom(x, y, 2)[y^2]
            -9.853849             -1.761367             -5.579568

Interaction: Strauss process
interaction distance:   0.1
Fitted interaction parameter gamma:     0.0128

Relevant coefficients:
Interaction
  -4.359277

19.3 Interpoint interactions

Instead of Strauss we may use any of the following functions to create an interaction:

Poisson()         the Poisson point process (the default)
Strauss()         the Strauss process
StraussHard()     the Strauss/hard core point process
Softcore()        pairwise interaction, soft core potential
PairPiece()       pairwise interaction, piecewise constant
DiggleGratton()   Diggle-Gratton potential
LennardJones()    Lennard-Jones potential
Pairwise()        pairwise interaction, user-supplied potential
AreaInter()       area-interaction process
Geyer()           Geyer's saturation process
Saturated()       Saturated pair model, user-supplied potential
OrdThresh()       Ord process, threshold potential
Ord()             Ord model, user-supplied potential

(There are two additional ones for multitype point processes, described in section 25.3.2.)
The area-interaction model and the Geyer saturation model are quite handy, as they can be
used to model both clustering and regularity.

> data(redwood)
> ppm(redwood, ~1, Geyer(r = 0.07, sat = 2))
Stationary Geyer saturation process

First order term:
   beta
17.0143

Interaction: Geyer saturation process
interaction distance:   0.07
saturation parameter:   2
Fitted interaction parameter gamma:     2.3509

Relevant coefficients:
Interaction
  0.8547814

> ppm(redwood, ~1, AreaInter(r = 0.03))
Stationary Area-interaction process

First order term:
    beta
571.5617

Interaction: Area-interaction process
disc radius:    0.03
Fitted interaction parameter eta:       19.11

Relevant coefficients:
Interaction
   2.950212

For more detailed explanation of modelling, see [5].

19.4 Fitted point process models

The result of the ppm call is an object of class "ppm" (point process model). This is very closely
analogous to a fitted linear model (lm) or fitted generalised linear model (glm).
Standard R operations that are defined for fitted point process models (i.e. that have methods
for the class "ppm") include:

print     print basic information
summary   print detailed summary information
plot      plot the fitted (conditional) intensity
predict   fitted (conditional) intensity
fitted    fitted (conditional) intensity at data points
update    re-fit the model
coef      extract the fitted coefficient vector $\hat\theta$
vcov      variance-covariance matrix of $\hat\theta$
anova     analysis of deviance
logLik    evaluate log-pseudolikelihood

(the methods for anova and vcov are only available for Poisson models).
Plotting a fitted model generates a series of image and contour plots of

• the fitted first order term $\exp(\hat S(u))$
• the fitted conditional intensity $\hat\lambda(u, x)$ evaluated for the data pattern x

For Poisson models, the two plots are equivalent, and give the fitted intensity function.

> fit <- ppm(cells, ~polynom(x, y, 2), Strauss(r = 0.1))
> par(mfrow = c(1, 2))
> plot(fit, how = "image", ngrid = 256)

[Figure: two image plots, "Fitted trend" and "Fitted cif".]

For non-Poisson models, it is also possible to extract and plot the interpoint interaction
function, using fitin.

> model <- ppm(X, ~1, PairPiece(seq(10, 100, by = 10)))
> f <- fitin(model)
> plot(f)

[Figure: fitted interaction function; "Pairwise interaction" against "Distance".]

19.5 Simulation from fitted models

A fitted Gibbs model can also be simulated automatically using rmh.

> fit <- ppm(swedishpines, ~1, Strauss(r = 7))
> Xsim <- rmh(fit)
> plot(Xsim, main = "Simulation from fitted Strauss model")

[Figure: "Simulation from fitted Strauss model".]

The envelope command will also generate simulation envelopes for a fitted model.

> plot(envelope(fit, nsim = 39))

[Figure: "envelope(fit, nsim = 39)"; K(r) against r (one unit = 0.1 metres).]

19.6 Dealing with nuisance parameters

Irregular parameters, such as the interaction radius r in the Strauss process, cannot be estimated
directly using ppm. Indeed the statistical theory for estimating such parameters is unclear.
For some special cases, a maximum likelihood estimator of the nuisance parameter is available. For example, for the hard core process (Strauss process with interaction parameter $\gamma = 0$)
with interaction radius r, the maximum likelihood estimator is the minimum nearest-neighbour
distance. Thus the following is a reasonable approach to the cells dataset:

> rhat <- min(nndist(cells))
> rhat <- rhat * 0.99999
> ppm(cells, ~1, Strauss(r = rhat))
Stationary Strauss process

First order term:
    beta
168.2692

Interaction: Strauss process
interaction distance:   0.0836293018068393
Fitted interaction parameter gamma:     0

Relevant coefficients:
Interaction
  -19.29955

The analogue of profile likelihood, profile pseudolikelihood, provides a general solution which
may or may not perform well. If $\theta = (\phi, \eta)$ where $\phi$ denotes the nuisance parameters and $\eta$ the
regular parameters, define the profile log pseudolikelihood by

$$\mathrm{PLP}(\phi, x) = \max_{\eta} \log \mathrm{PL}((\phi, \eta); x).$$

The right hand side can be computed, for each fixed value of $\phi$, by the algorithm ppm. Then we
just have to maximise $\mathrm{PLP}(\phi)$ over $\phi$. This is done by the command profilepl:

19.6 Dealing with nuisance parameters

123

124

19.7

> data(simdat)
> df <- data.frame(r = seq(0.05, 2, by = 0.025))
> pfit <- profilepl(df, Strauss, simdat, ~1)
> pfit
Profile log pseudolikelihood values
for model:
ppm(simdat, ~1, interaction = Strauss)
fitted with rbord= 2
Interaction: Strauss
with irregular parameter r in [0.05, 2]
Optimum value of irregular parameter: r = 0.275
The result is an object of class profilepl containing the prole log pseudolikelihood function,
the optimised value of the irregular parameter r, and the nal tted model. To plot the prole
log pseudolikelihood,

Methods 8: tting Gibbs models

Improvements over maximum pseudolikelihood

Maximum pseudolikelihood is quick and dirty. There are statistically more ecient alternatives,
but they are computationally intensive.
Currently we have implemented the easiest of these alternatives, the Huang-Ogata [27] onestep approximation to maximum likelihood. Starting from the maximum pseudolikelihood estimate P L , we simulate M independent realisations of the model with parameters P L , evaluate
the canonical sucient statistics, and use them to form estimates of the score and Fisher information at = P L . Then we take one Newton-Raphson step, updating the value of . The
rationale is that the log-likelihood is approximately quadratic in a neighbourhood of the maximum pseudolikelihood estimator, so that one Newton-Raphson step is almost enough.
To use the Huang-Ogata method instead of maximum pseudolikelihood, add the argument
method="ho".
> fit <- ppm(simdat, ~1, Strauss(r = 0.275), method = "ho")
> fit

> plot(pfit)

Stationary Strauss process


ppm(simdat, ~1, interaction = Strauss)

15.5
16.5

Interaction: Strauss process


interaction distance:
0.275
Fitted interaction parameter gamma:

17.5

log PL

14.5

First order term:


beta
2.500546

0.0

0.5

1.0

1.5

Relevant coefficients:
Interaction
-0.3637451

2.0

To extract the nal tted model,

> vcov(fit)

> pfit$fit

[,1]
[,2]
[1,] 0.01070257 -0.01264063
[2,] -0.01264063 0.03635432

Stationary Strauss process


First order term:
beta
2.583110
Interaction: Strauss process
interaction distance:
0.275
Fitted interaction parameter gamma:

0.6951

For models tted by Huang-Ogata, the variance-covariance matrix returned by vcov is computed from the simulations.

0.5631

Relevant coefficients:
Interaction
-0.5743608
There is a summary method for these objects as well.
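The one-step update described in Section 19.7 can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used inside ppm: the function name ho_step and its arguments are hypothetical, the model is assumed to be in canonical exponential-family form (so that the score is the observed sufficient statistic minus its expectation, and the Fisher information is the covariance of the sufficient statistic), and the matrix t_sim stands in for statistics evaluated on M simulated realisations.

```r
# Hypothetical helper: one Newton-Raphson step from the pseudolikelihood estimate.
# theta_pl : maximum pseudolikelihood estimate (numeric vector of length p)
# t_obs    : canonical sufficient statistic of the observed pattern (length p)
# t_sim    : M x p matrix of sufficient statistics from M simulations at theta_pl
ho_step <- function(theta_pl, t_obs, t_sim) {
  score  <- t_obs - colMeans(t_sim)   # Monte Carlo estimate of the score
  fisher <- cov(t_sim)                # Monte Carlo estimate of the Fisher information
  theta_pl + solve(fisher, score)     # one Newton-Raphson update
}

# Toy numbers, purely illustrative
t_sim <- rbind(c(1, 0), c(0, 1), c(-1, 0), c(0, -1))
theta_new <- ho_step(theta_pl = c(0, 0), t_obs = c(2, 4), t_sim = t_sim)
print(theta_new)
```

Under the same assumptions, solve(cov(t_sim)) would be one natural simulation-based estimate of the variance-covariance matrix of the estimator.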
20 Methods 9: validation of fitted Gibbs models

Goodness-of-fit testing and model validation for Poisson models were described in Section 12. Checking a fitted Gibbs point process model is more difficult. There is little theory available to support goodness-of-fit tests and the like.
As an example, consider the following data:

> data(residualspaper)
> X <- residualspaper$Fig4b
> plot(X)

[Figure: plot of X]

We fit a Strauss process model with a log-quadratic intensity term:

> fit <- ppm(X, ~polynom(x, y, 2), Strauss(0.05), correction = "isotropic")

The question is how to confirm or validate this model.

20.1 Goodness-of-fit testing for Gibbs processes

For a fitted Gibbs process, no theory is available to support the $\chi^2$ goodness-of-fit test or the Kolmogorov-Smirnov test. The predicted mean number of points in a given region is not known in closed form for a Gibbs process. Thus, the appropriate test statistic for a $\chi^2$ test is not even available in closed form, let alone the null distribution of this statistic.
Instead, goodness-of-fit for fitted Gibbs models often relies on the summary functions K and G. The command envelope will accept as its first argument a fitted Gibbs model, and will simulate from this model to determine the critical envelope.

> plot(envelope(fit, Lest, nsim = 19, global = TRUE))

[Figure: envelope(fit, Lest, nsim=19, global=TRUE); L(r) plotted against r]

Let's subtract the theoretical Poisson value L(r) = r to get a more readable plot:

> plot(envelope(fit, Lest, nsim = 19, global = TRUE), . - r ~ r)

[Figure: envelope(fit, Lest, nsim=19, global=TRUE); cbind(obs, mmean, hi, lo) - r plotted against r]

This is fairly consistent with a Strauss process.
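The logic of a global envelope test can be sketched in base R. This is an illustrative sketch, not spatstat's implementation: the function name global_envelope_pvalue and its arguments are hypothetical, and the summary functions are assumed to have been evaluated already on a common grid of r values. The test statistic is the maximum absolute deviation from a reference curve. With nsim = 19 simulations, and ignoring the effect of parameter estimation, the observed statistic exceeds all the simulated ones with probability 1/20 under the null hypothesis, which corresponds to a test of size 5%.

```r
# obs  : observed summary function values on a grid of r (length k)
# sims : nsim x k matrix, one simulated summary function per row
# ref  : reference curve on the same grid (length k)
global_envelope_pvalue <- function(obs, sims, ref) {
  dev_obs <- max(abs(obs - ref))
  dev_sim <- apply(sims, 1, function(s) max(abs(s - ref)))
  # Monte Carlo p-value: rank of the observed deviation among nsim + 1 values
  (1 + sum(dev_sim >= dev_obs)) / (1 + length(dev_sim))
}

# Toy curves, purely illustrative
r    <- seq(0, 0.25, length.out = 6)
ref  <- r
sims <- t(sapply(1:19, function(i) r + 0.001 * i * c(0, 1, -1, 1, -1, 0)))
obs  <- r + 0.05 * c(0, 1, 1, 1, 1, 0)
p    <- global_envelope_pvalue(obs, sims, ref)
print(p)
```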

20.2 Residuals for Gibbs processes

Residuals for a general Gibbs model were defined only recently [6, 1]. The total residual in a region $B \subset \mathbb{R}^2$ is defined as

$$R(B) = n(x \cap B) - \int_B \hat\lambda(u, x) \, du \qquad (47)$$

where again $n(x \cap B)$ is the observed number of points in the region B, and $\hat\lambda(u, x)$ is the conditional intensity of the fitted model, evaluated for the data point pattern x. If the fitted model is correct, the residuals have mean zero.
This definition is similar to the definition of residuals for Poisson processes (Section 12.2) except that the intensity $\hat\lambda(u)$ of the fitted Poisson process has been replaced by the conditional intensity $\hat\lambda(u, x)$ of the fitted Gibbs process evaluated for the data point pattern x.
Residuals for Gibbs processes can be plotted as explained in Section 12.2.

> diagnose.ppm(fit)

[Figure: diagnose.ppm(fit), "Residuals: raw"; the lurking variable plots show the cumulative sum of raw residuals against the x coordinate and against the y coordinate]

At the time of writing, spatstat does not yet display $2\sigma$ significance bands for the lurking variable plots when the fitted model is not Poisson. The interpretation of the lurking variable plots is a little more difficult without the significance bands. One tends to place a little more emphasis on the smoothed residual field.
Interaction between points in a point process corresponds roughly to the distribution of the responses in loglinear regression. To validate the interaction terms in a point process model, we should plot the distribution of the residuals.

> qqplot.ppm(fit, nsim = 39)

[Figure: qqplot.ppm(fit, nsim=39); data quantile plotted against mean quantile of simulations]

This shows a QQ plot of the smoothed residuals, with pointwise 5% critical envelopes from simulations of the fitted model. This suggests that the Strauss model is reasonable.
These validation techniques generalise and unify many existing exploratory methods. For particular models of interpoint interaction, the QQ plot is closely related to the summary functions F, G and K. See [6].
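Equation (47) can be evaluated numerically once the fitted conditional intensity is available on a grid. The sketch below is a hedged illustration in base R rather than the method used by diagnose.ppm: the function raw_residual and its arguments are hypothetical, the region B is assumed to be divided into pixels of equal area, and the integral is approximated by a Riemann sum.

```r
# n_obs     : number of data points falling in the region B
# lambda_px : fitted conditional intensity evaluated at each pixel centre in B
# px_area   : area of one pixel
raw_residual <- function(n_obs, lambda_px, px_area) {
  n_obs - sum(lambda_px) * px_area
}

# Toy example: unit square, 10 x 10 pixels, constant fitted intensity 100
lambda_px <- rep(100, 100)
res <- raw_residual(n_obs = 103, lambda_px = lambda_px, px_area = 0.01)
print(res)
```

A positive value indicates more points in B than the fitted model predicts, and a negative value indicates fewer.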

21 Marked point patterns

21.1 Marked point patterns

Each point in a spatial point pattern may carry additional information called a mark. For example, points which are classified into two or more different types (on/off, case/control, species, colour, etc) may be regarded as marked points, with a mark which identifies which type they are. Data recording the locations and heights of trees in a forest can be regarded as a marked point pattern where the mark attached to a tree's location is the tree height.
In our current implementation, the mark attached to each point must be a single value (which may be numeric, character, complex, logical, or factor). Many of the functions in spatstat handle marked point patterns in which the mark attached to each point is either
• a continuous variate or real number. An example is the Longleaf Pines dataset (longleaf) in which each tree is marked with its diameter at breast height. The marks component must be a numeric vector such that marks[i] is the mark value associated with the ith point. We say the point pattern has continuous marks.
• a categorical variate. An example is the Amacrine Cells dataset (amacrine) in which each cell is identified as either "on" or "off". Such point patterns may be regarded as consisting of points of different types. The marks component must be a factor such that marks[i] is the label or type of the ith point. We call this a multitype point pattern and the levels of the factor are the possible types.

[Figure: the longleaf and amacrine datasets]

Note that, in some other packages, a point pattern dataset consisting of points of two different types (A and B say) is represented by two datasets, one representing the points of type A and another containing the points of type B. In spatstat we take a different approach, in which all the points are collected together in one point pattern, and the points are then labelled by the type to which they belong. An advantage of this approach is that it is easy to deal with multitype point patterns with more than 2 types. For example the classic Lansing Woods dataset represents the positions of trees of 6 different species. This is available in spatstat as a single dataset, a marked point pattern, with the marks having 6 levels.

21.2 Formulation

A mark variable may be interpreted as an additional coordinate for the point: for example a point process of earthquake epicentre locations (longitude, latitude), with marks giving the occurrence time of each earthquake, can alternatively be viewed as a point process in space-time with coordinates (longitude, latitude, time).
A marked point process of points in space S with marks belonging to a set M is mathematically defined as a point process in the cartesian product $S \times M$. The space M of possible marks may be anything. In current applications, typically the mark is either a categorical variable (so that the points are grouped into types) or a real number. Multivariate marks consisting of several such variables are also common.
A marked point pattern is an unordered set

$$y = \{(x_1, m_1), \ldots, (x_n, m_n)\}, \qquad x_i \in W, \quad m_i \in M$$

where $x_i$ are the locations and $m_i$ are the corresponding marks.

21.3 Methodological issues

21.3.1 Should the data be treated as a marked point process?

In a marked point process the points are random. Treating the data as a point process is inappropriate if the locations are fixed, or if the locations are not part of the "response".

Example 16 Today's maximum temperatures at 25 Australian cities are displayed on a map.
This is not a point process in any useful sense. The cities are fixed locations. The temperatures are observations of a spatial variable at a fixed set of locations. See the R packages sp, spdep, spgwr for suitable methods.

Example 17 A mineral exploration dataset records the map coordinates where 15 core samples were drilled, and for each core sample, the assayed concentration of iron in the sample.
This should not be treated as a point process. The core sample locations were chosen by a geologist, and are part of the experimental design. The main interest is in the iron concentration at these locations. This should probably be analysed as a geostatistical dataset. See the R packages geoR, geoRglm for suitable methods.

21.3.2 Joint vs. conditional analysis

There are more choices for analysis (and more traps) when marks are present. Schematically, if we write X for the points and M for the marks, then a statistical model for the marked point pattern could be formulated in several ways:
• [X] [M|X], "conditional on locations": points X are first generated according to a spatial point process, then marks M are assigned to the points by a random mechanism [M|X];
• [M] [X|M], "conditional on marks" or "split by marks": marks M are first generated according to some random mechanism [M], then they are placed at certain locations X by point process(es) [X|M];
• [X, M], "joint": marked points are generated according to a marked point process.
These approaches typically lead to different stochastic models and have different inferential interpretations. Correspondingly, there are different null hypotheses that can be tested:
• random labelling: given the locations X, the marks are conditionally independent and identically distributed;
• independence of components: the sub-processes $X_m$ of points of each mark m are independent point processes;
• complete spatial randomness and independence (CSRI): the locations X are a uniform Poisson point process, and the marks are independent and identically distributed. (This implies both random labelling and independence of components.)
These null hypotheses are not equivalent.
The properties of random labelling and independence of components are not equivalent. For example, take a point process X where nearest neighbour distances are always larger than a threshold r, and attach random marks to the points. The resulting marked point process cannot be generated using the independence construction, because if points with different marks are independent, they can come arbitrarily close to one another.

Example 18 (Ant nests data) Two species of ants build nests in a desert. We want to investigate ecological interaction between the species, and between different nests of the same species. The locations of all nests are mapped, and marked by the species.
These data can be analysed as a marked point process consisting of two different types of points. The mark attached to each point is its species (a categorical variable). The most natural kind of modelling and analysis is either joint [X, M] or split by species [M] [X|M]. We could also treat one of the species as a covariate and analyse the other species conditional on it.

Example 19 Trees in an orchard are examined and their disease status (infected/not infected) is recorded. We are interested in the spatial characteristics of the disease, such as contagion between neighbouring trees.
These data probably should not be treated as a point process. The response is "disease status". We can think of disease status as a label applied to the trees after their locations have been determined. Since we are interested in the spatial correlation of disease status, the tree locations are effectively fixed covariate values. It would probably be best to treat these data as a discrete random field (of disease status values) observed at a finite known set of sites (the trees).

21.3.3 Grey areas

There are some "grey areas" which permit several alternative choices of analysis. It could be appropriate either to analyse the locations and marks jointly (denoted [X, M]), or to analyse the marks conditional on the locations ([M|X]) or to analyse the locations given the marks ([X|M]).
One grey area occurs when the locations are random, but may be ancillary for the parameters of interest.

Example 20 Case-control study of cancer [20, 24]. The domicile locations of all new cases of a rare cancer are mapped. To allow for spatial variation in the density of the susceptible population, domicile locations are recorded for a random sample of (matched) controls.
This can be analysed either as a marked point pattern (where the mark is the case/control label) or, by conditioning on locations, as a random field of case/control values attached to the known domicile locations.

[Figure: Chorley-Ribble Data]
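Random labelling, one of the null hypotheses described in Section 21.3.2, lends itself to a simple permutation sketch: hold the locations fixed and shuffle the marks, which conditions on the observed number of points of each type. The base R code below is illustrative only. The function names nn_same_prop and random_labelling_pvalue, the choice of test statistic (the proportion of points whose nearest neighbour carries the same mark) and the toy coordinates are all assumptions made for the example.

```r
# Proportion of points whose nearest neighbour has the same mark
nn_same_prop <- function(x, y, m) {
  d <- as.matrix(dist(cbind(x, y)))
  diag(d) <- Inf                       # a point is not its own neighbour
  nn <- apply(d, 1, which.min)         # index of each point's nearest neighbour
  mean(m == m[nn])
}

# Monte Carlo p-value under random labelling: permute marks, keep locations
random_labelling_pvalue <- function(x, y, m, nsim = 99) {
  obs <- nn_same_prop(x, y, m)
  sim <- replicate(nsim, nn_same_prop(x, y, sample(m)))
  (1 + sum(sim >= obs)) / (1 + nsim)
}

# Toy data: two well separated clusters, each of a single type
set.seed(1)
x <- c(0.10, 0.12, 0.14, 0.80, 0.82, 0.84)
y <- c(0.10, 0.13, 0.11, 0.80, 0.83, 0.81)
m <- factor(c("case", "case", "case", "control", "control", "control"))
obs <- nn_same_prop(x, y, m)
p   <- random_labelling_pvalue(x, y, m)
print(c(statistic = obs, p.value = p))
```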

22 Handling marked point pattern data

This section explains how to create a marked point pattern dataset in spatstat, and how to manipulate it.

22.1 Creating datasets

In spatstat version 1, each point in a point pattern can be marked with a single value (i.e. one mark value per point). The marks are stored in a vector, of the same length as the number of points. The marks can be of any atomic type: numeric, integer, character, factor, logical or complex.
A marked point pattern dataset can be created using any of the following tools:

ppp          create point pattern dataset
as.ppp       convert other data to point pattern
superimpose  combine several point patterns into a marked point pattern
marks        extract marks from a point pattern
marks<-      attach marks to a point pattern
%mark%       attach marks to a point pattern
unmark       delete marks from a point pattern
scanpp       read point pattern data from text file
clickppp     create a pattern using point-and-click on the screen

The command ppp can be used to create a marked point pattern dataset from raw data. The syntax is

> ppp(x, y, ..., marks = m)

where x, y and m are vectors of equal length containing the (x, y) coordinates and the corresponding mark values, and ... are arguments that determine the window for the point pattern.

Tip: If the marks are intended to be a categorical variable (representing the types in a multitype point pattern),
• ensure that m is stored as a factor in R.
• when the point pattern X has been created, check that it is multitype using is.multitype(X).
• check that the factor levels are as you intended, using levels(m) or levels(marks(X)) where X is the marked point pattern. If the factor levels are character strings, they will be sorted into alphabetical order by default.
• be careful when performing equality/inequality comparisons involving a factor. Particular danger occurs when the factor levels are strings that represent integers.

The command as.ppp will convert data in another format (for example, a 2-column or 3-column matrix or data frame) to a point pattern object of class "ppp". The third column of a matrix or data frame will be interpreted as containing the marks.

> mydata <- data.frame(x = runif(10), y = runif(10), m = sample(letters[1:3],
+     10, replace = TRUE))
> as.ppp(mydata, square(1))
marked planar point pattern: 10 points
multitype, with levels = a b
window: rectangle = [0, 1] x [0, 1] units

If point pattern data are stored in a text file, the command scanpp will read the data and create a point pattern object of class "ppp". The argument multitype=TRUE will ensure that the mark values are interpreted as a factor.

> X <- scanpp("myfile.txt", window = square(1), multitype = TRUE)

The command superimpose combines several point patterns within the same window. It can be used to create a multitype point pattern, if you have already created separate point patterns containing the points of each type. Suppose X1 and X2 are unmarked point patterns. Then superimpose(A=X1, B=X2) will create a multitype point pattern by attaching the mark A to each point of X1, attaching the mark B to each point of X2, and combining the points.

[Figure: three panels, X1, X2 and superimpose(A = X1, B = X2)]

Marks can be attached to an existing point pattern X using the function marks<- as in

> marks(X) <- m

or using the binary operator %mark%,

> Y <- X %mark% m

These are convenient when you want to assign new marks to a dataset that are computed using another variable, or perhaps to randomise the marks in a dataset.
A multitype point pattern can also be created interactively using clickppp, using the argument types to specify the possible types.

22.2 Inspecting a marked point pattern

Basic tools for inspecting a marked point pattern include the print, plot and summary methods.

> data(amacrine)
> amacrine
marked planar point pattern: 294 points
multitype, with levels = off on
window: rectangle = [0, 1.6012] x [0, 1] units (one unit = 662 microns)
> summary(amacrine)
Marked planar point pattern: 294 points
Average intensity 184 points per square unit (one unit = 662 microns)
Multitype:
    frequency proportion intensity
off       142      0.483      88.7
on        152      0.517      94.9
Window: rectangle = [0, 1.6012] x [0, 1] units
Window area = 1.60121 square units
Unit of length: 662 microns
> plot(amacrine)
off  on
  1   2

[Figure: amacrine]

You can also convert a marked point pattern into a data frame for closer inspection of the coordinates and mark values:

> as.data.frame(amacrine)
       x      y marks
1 0.0224 0.0243    on
2 0.0243 0.1028    on
3 0.1626 0.1477    on
........

The marks can be extracted using the function marks:

> data(longleaf)
> m <- marks(longleaf)

Beware the possibility that two points with different marks may occupy the same spatial location. This is not currently detected by ppp since, for a marked point pattern, the function duplicated.ppp regards two points as identical only when their coordinates and mark values are identical. To detect duplication of the spatial locations, use duplicated(unmark(X)).
Further tools are presented in the next section.

22.3 Manipulating data

22.3.1 Manipulating marks

The following tools can manipulate the marks in a point pattern:

marks    extract marks
marks<-  attach marks to a point pattern
%mark%   attach marks to a point pattern
unmark   remove marks from point pattern

For example, the Longleaf Pines data are tree locations marked by diameter at breast height (dbh) in centimetres. To convert the marks from diameters to circular areas,

> data(longleaf)
> d <- marks(longleaf)
> a <- (pi/4) * d^2
> marks(longleaf) <- a

22.3.2 Separating points of different types

A multitype point pattern can be separated into the sub-patterns of points of each type, using the split command.

> data(amacrine)
> Y <- split(amacrine)

In fact split is a generic function and the commands above invoke the split method for the class of point patterns, split.ppp. The result Y is a list of point patterns, with names that correspond to the type labels. This list also belongs to the class "splitppp" which can be plotted automatically:

> plot(split(amacrine))

[Figure: split(amacrine); panels off, on]

22.3.3 Cutting the numerical scale into bands

For a point pattern with numeric marks, the marks can be converted to a factor, using a method for the generic function cut. The user specifies a series of cut-points on the numerical scale; all mark values between two cut-points are given the same label.
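The banding operation is the same one performed by the base R function cut on an ordinary numeric vector. A minimal sketch, using made-up values rather than a real dataset: values falling in the interval (0, 10] receive the first label and values in (10, 20] the second, because cut uses intervals that are closed on the right by default.

```r
# Made-up numeric marks (illustrative values only)
d <- c(4, 10, 15.5, 19)

# Two bands: (0, 10] and (10, 20]; intervals are closed on the right by default
band <- cut(d, breaks = c(0, 10, 20), labels = c("small", "large"))

print(band)
print(table(band))
```

Note that a value exactly equal to a cut-point falls in the lower band, so the choice of cut-points determines how boundary values are classified.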

For example, the Longleaf Pines data are the locations of trees marked with their diameter at breast height, dbh, in centimetres. By convention we define "adult" trees to be those with dbh greater than 30 centimetres. To obtain the bivariate point pattern of adult and juvenile trees,

> data(longleaf)
> longleaf
marked planar point pattern: 584 points
marks are numeric, of type double
window: rectangle = [0, 200] x [0, 200] metres
> X <- cut(longleaf, breaks = c(0, 30, 80), labels = c("juvenile",
+     "adult"))
> X
marked planar point pattern: 584 points
multitype, with levels = juvenile adult
window: rectangle = [0, 200] x [0, 200] metres
> par(mfrow = c(1, 2))
> plot(longleaf)
       0       20       40       60       80
0.000000 1.722522 3.445045 5.167567 6.890090
> plot(X, main = "cut(longleaf)")
juvenile    adult
       1        2
> par(mfrow = c(1, 1))

[Figure: two panels, longleaf and cut(longleaf)]

23 Methods 10: exploratory tools for marked point patterns

This section covers some tools for exploratory data analysis of marked point patterns. Most of the tools have been developed for the special case of multitype point patterns (i.e. where the marks are categorical).

23.1 Intensity

The Lansing Woods data give the locations of 6 species of trees in a forest in Michigan. Elementary estimates of the frequency distribution of species, and the intensity of each species, are available from summary.ppp.

> data(lansing)
> summary(lansing)
Marked planar point pattern: 2251 points
Average intensity 2250 points per square unit (one unit = 924 feet)
*Pattern contains duplicated points*
Multitype:
         frequency proportion intensity
blackoak       135     0.0600       135
hickory        703     0.3120       703
maple          514     0.2280       514
misc           105     0.0466       105
redoak         346     0.1540       346
whiteoak       448     0.1990       448
Window: rectangle = [0, 1] x [0, 1] units
Window area = 1 square unit
Unit of length: 924 feet

It's sensible to examine the sub-patterns of different types separately, using split.ppp.

> plot(split(lansing))

[Figure: split(lansing); panels blackoak, hickory, maple, misc, redoak, whiteoak]
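The multitype table printed by summary can be reproduced by hand, which makes the definitions explicit: the proportion of each type is its frequency divided by the total number of points, and its intensity is its frequency divided by the window area. The sketch below uses base R only and takes the frequencies and window area from the Lansing Woods output in Section 23.1; the object names are arbitrary.

```r
# Frequencies of each species, copied from the summary output
freq <- c(blackoak = 135, hickory = 703, maple = 514,
          misc = 105, redoak = 346, whiteoak = 448)
window_area <- 1                      # unit square (one unit = 924 feet)

tab <- data.frame(
  frequency  = freq,
  proportion = freq / sum(freq),      # share of all points
  intensity  = freq / window_area     # points per unit area
)
print(round(tab, 4))
```

Because the window has unit area, the intensity of each species coincides with its frequency here; for the amacrine data, with window area 1.60121, the two columns differ.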

It would be useful to compute and plot a separate estimate of intensity for each type of tree. This is possible using the functions density.splitppp and plot.listof. They are invoked simply by typing

> plot(density(split(lansing)), ribbon = FALSE)

[Figure: density(split(lansing)); panels blackoak, hickory, maple, misc, redoak, whiteoak]

Parametric estimates of intensity can be obtained using ppm, fitting a Poisson model with an intensity function that may depend on location and/or on the marks. See below.
The relative proportions of intensity can then be computed using eval.im:

> Y <- density(split(lansing))
> attach(Y)
> pBlackoak <- eval.im(blackoak/(blackoak + hickory + maple + misc +
+     redoak + whiteoak))
> plot(pBlackoak)
> detach(Y)

[Figure: pBlackoak]

23.2 Numeric marks: distribution and trend

For a point pattern with marks that are numeric (real numbers or integers) or logical values, the mark values can be extracted using the marks function and inspected using the histogram or kernel density estimate:

> data(longleaf)
> hist(marks(longleaf))

[Figure: Histogram of marks(longleaf)]
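The text mentions a kernel density estimate as an alternative to the histogram. In base R this is provided by density. The sketch below applies it to simulated stand-in values, since the point is only to show the call; with the real data one would pass marks(longleaf) in place of the hypothetical vector d.

```r
set.seed(42)
# Stand-in for a vector of numeric marks (hypothetical values, not real data)
d <- runif(200, min = 2, max = 75)

dens <- density(d)          # Gaussian kernel with the default bandwidth
plot(dens, main = "Kernel density estimate of marks")
rug(d)                      # tick marks showing the individual mark values
```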

To assess spatial trend in the marks, one way is to form a kernel regression smoother. The smoothed mark value at location $u \in \mathbb{R}^2$ is

$$\widetilde{m}(u) = \frac{\sum_i m_i \, \kappa(u - x_i)}{\sum_i \kappa(u - x_i)}$$

where $\kappa$ is the smoothing kernel, and $m_i$ is the mark value at data point $x_i$. This is computed by smooth.ppp:

> plot(smooth.ppp(longleaf))

[Figure: smooth.ppp(longleaf)]
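The smoother can be written out directly in base R, which may help to make the formula concrete. This is a sketch only: the function name nw_smooth, the isotropic Gaussian kernel and the bandwidth sigma are assumptions for the example, and unlike smooth.ppp it evaluates the smoother at a single location and makes no attempt at efficiency.

```r
# Kernel regression (weighted average) of marks m at location u = c(ux, uy)
# x, y  : coordinates of the data points
# m     : numeric marks
# sigma : bandwidth of the Gaussian kernel (assumed value)
nw_smooth <- function(u, x, y, m, sigma = 0.1) {
  d2 <- (x - u[1])^2 + (y - u[2])^2
  w  <- exp(-d2 / (2 * sigma^2))      # kernel weights, up to a constant factor
  sum(w * m) / sum(w)                 # the constant cancels in the ratio
}

# Toy data: four points on a line, small marks on the left, large on the right
x <- c(0.2, 0.4, 0.6, 0.8)
y <- c(0.5, 0.5, 0.5, 0.5)
m <- c(10, 10, 30, 30)

nw_smooth(c(0.5, 0.5), x, y, m)       # midway between the two groups
nw_smooth(c(0.2, 0.5), x, y, m)       # close to the small marks
```

Because the smoother is a weighted average, its value always lies between the smallest and largest mark, and constant marks are reproduced exactly.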

You can also use cut.ppp followed by split.ppp to look for spatial inhomogeneity of the marks:

> data(spruces)
> plot(split(cut(spruces, 3)))

[Figure: split(cut(spruces, 3)); panels (0.16,0.23], (0.23,0.3], (0.3,0.37]]

23.3 Simple summaries of neighbouring marks

We are often interested in the marks that are attached to the close neighbours of a typical point.
For a multitype point pattern, the function marktable compiles a contingency table of the marks of all points within a given radius of each data point:

> data(amacrine)
> M <- marktable(amacrine, R = 0.1)
> M[1:10, ]
     mark
point off on
   1    1  1
   2    2  2
   3    4  3
   4    3  1
   5    4  1
   6    2  3
   7    3  2
   8    1  1
   9    3  1
   10   3  2

More general summaries of the marks of neighbours can be obtained using the function markstat. For example, to compute the average diameter of the 5 closest neighbours of each tree in the Longleaf Pines dataset,

> md <- markstat(longleaf, mean, N = 5)
> md[1:10]
 [1] 43.40 43.40 48.58 21.70 48.38 53.32 40.28 29.82 24.92 21.70

23.4 Summary functions

The summary functions F, G, J and K (and other functions derived from K, such as L and the pair correlation function) have been extended to multitype point patterns.
Assume the multitype point process X is stationary. Let $X_j$ denote the sub-pattern of points of type j, with intensity $\lambda_j$. Then
• $F_j(r)$ is the empty space function for $X_j$;
• $G_{ij}(r)$ is the distribution function of the distance from a point of type i to the nearest point of type j;
• $K_{ij}(r)$ is $1/\lambda_j$ times the expected number of points of type j within a distance r of a typical point of type i;
• $J_{ij}$ is defined as
$$J_{ij}(r) = \frac{1 - G_{ij}(r)}{1 - F_j(r)}.$$
The functions $G_{ij}$, $K_{ij}$, $J_{ij}$ are called "cross-type" or "i-to-j" summary functions. They are computed in spatstat by Gcross, Kcross and Jcross.

> data(amacrine)
> amacrine
> plot(Gcross(amacrine, "on", "off"))

[Figure: Gcross(amacrine, "on", "off"); Gcross(r) plotted against r (one unit = 662 microns)]
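To make the definition of $G_{ij}$ concrete, here is a naive empirical version in base R: for each point of type i, find the distance to the nearest point of type j, then take the empirical distribution function of those distances. The function name Gcross_naive and the toy data are hypothetical, and no edge correction is applied, so this is not a substitute for Gcross.

```r
Gcross_naive <- function(x, y, m, i, j) {
  from <- which(m == i)
  to   <- which(m == j)
  # distance from each type-i point to its nearest type-j point
  nnd <- sapply(from, function(a) {
    b <- setdiff(to, a)                 # exclude the point itself when i == j
    min(sqrt((x[b] - x[a])^2 + (y[b] - y[a])^2))
  })
  ecdf(nnd)                             # empirical distribution function
}

# Toy pattern: two "on" points and two "off" points
x <- c(0.0, 1.0, 0.0, 1.0)
y <- c(0.0, 0.0, 0.3, 0.5)
m <- factor(c("on", "on", "off", "off"))

G <- Gcross_naive(x, y, m, "on", "off")
G(c(0.2, 0.4, 0.6))                     # proportion of nearest distances <= r
```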

The command alltypes enables the user to compute the cross-type summary functions between all pairs of types simultaneously. For example, to compute $G_{ij}(r)$ for all i and j in the amacrine cells data, we would use alltypes(amacrine, "G"). The result is automatically displayed as an array of plot panels.

> plot(alltypes(amacrine, "G"))

[Figure: Array of Gcross functions for amacrine; rows and columns off, on; each panel shows km, rs, theo against r (one unit = 662 microns)]

The result of alltypes is a function array (object of class "fasp") which can be indexed by row and column subscripts. If the point pattern has a large number of possible types, you can compute the array of all possible pairwise G functions, then use the subscript operator to inspect a subset of the array.

> data(lansing)
> a <- alltypes(lansing, "G")
> plot(a[2:3, 2:3])

[Figure: Array of Gcross functions for lansing; rows and columns hickory, maple; each panel shows km, rs, theo against r (one unit = 924 feet)]

Also defined are the "i-to-any" summaries
• $G_{i\bullet}(r)$, the distribution function of the distance from a point of type i to the nearest other point of any type;
• $K_{i\bullet}(r)$, which is $1/\lambda$ times the expected number of points of any type within a distance r of a typical point of type i. Here $\lambda = \sum_j \lambda_j$ is the intensity of the entire process X.
• $J_{i\bullet}$ defined by
$$J_{i\bullet}(r) = \frac{1 - G_{i\bullet}(r)}{1 - F(r)}.$$
These are computed by Gdot, Kdot and Jdot respectively, or using alltypes.

> plot(Gdot(amacrine, "on"))

[Figure: Gdot(amacrine, "on"); Gdot(r) plotted against r (one unit = 662 microns)]
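The cross-type and i-to-any K functions are linked by a simple identity, which follows directly from the definitions in Section 23.4. Since $\lambda_j K_{ij}(r)$ is the expected number of points of type j within distance r of a typical point of type i, summing over j gives the expected number of points of any type:

```latex
\lambda \, K_{i\bullet}(r) = \sum_j \lambda_j \, K_{ij}(r),
\qquad \text{where } \lambda = \sum_j \lambda_j .
```

In other words, $K_{i\bullet}$ is a weighted average of the functions $K_{ij}$, with weights $\lambda_j / \lambda$.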

> plot(alltypes(amacrine, "Gdot"))

[Figure: Array of Gdot functions for amacrine; panels off, on; each panel shows km, rs, theo against r (one unit = 662 microns)]

The pair correlation functions corresponding to the K-functions can also be computed, using pcf.fasp.

> K <- alltypes(amacrine, "K")
> P <- pcf(K, method = "b", spar = 1)
> plot(P, lwd = 2)

[Figure: Array of pair correlation functions for amacrine; rows and columns off, on; each panel shows pcf, theo against r (one unit = 662 microns)]

23.5 Mark correlation function

The mark correlation function $\rho_f(r)$ of a stationary marked point process Y is a measure of the dependence between the marks of two points of the process a distance r apart [42]. It is informally defined as

$$\rho_f(r) = \frac{E[f(M_1, M_2)]}{E[f(M, M')]}$$

where $M_1, M_2$ are the marks attached to two points of the process separated by a distance r, while $M, M'$ are independent realisations of the marginal distribution of marks.
Here f is any function $f(m_1, m_2)$ with two arguments which are possible marks of the pattern, and which returns a nonnegative real value. Common choices of f are:
• for continuous real-valued marks, $f(m_1, m_2) = m_1 m_2$;
• for categorical marks (multitype point patterns), $f(m_1, m_2) = 1\{m_1 = m_2\}$;
• for marks taking values in $[0, 2\pi]$, $f(m_1, m_2) = \sin(m_1 - m_2)$.
Note that $\rho_f(r)$ is not a "correlation" in the usual statistical sense. It can take any nonnegative real value. The value 1 suggests "lack of correlation": under random labelling, $\rho_f(r) \equiv 1$. The interpretation of values larger or smaller than 1 depends on the choice of function f.
The mark correlation function is computed in spatstat by markcorr. It has the syntax

> markcorr(X, f)

where X is a point pattern and f is an R language function. For example, for the amacrine data, the natural function f is $f(m_1, m_2) = 1\{m_1 = m_2\}$ which we encode as

> eqfun <- function(m1, m2) {
+     m1 == m2
+ }

23.6 Randomisation tests

147

148

Methods 10: exploratory tools for marked point patterns

Then simply
0.20

test of marked Poisson model

0.10

Kcross(r)

0.15

> M <- markcorr(amacrine, eqfun, correction = "translate", method = "density",


+
kernel = "epanechnikov")

0.00

0.05

> plot(M)

0.00

0.05

0.10

0.15

0.20

0.25

1.0

r (one unit = 662 microns)

0.4

m(r)

0.6

0.8
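As a small illustration of the definition of the mark correlation function, the denominator $E[f(M, M')]$ can be computed by hand for categorical marks. The sketch below uses base R only and a hypothetical toy vector of marks (not the amacrine data); it is not how markcorr is implemented, only a check that for the equality function the denominator is the sum of squared mark proportions.

```r
# Hypothetical toy marks (not the amacrine data).
marks <- factor(c("on", "on", "on", "off"))

# Test function f(m1, m2) = 1{m1 = m2}, written so that it works on vectors.
eqfun <- function(m1, m2) {
    m1 == m2
}

# Denominator E[f(M, M')]: average of f over all ordered pairs of marks drawn
# independently from the empirical mark distribution.
m <- as.character(marks)
denominator <- mean(outer(m, m, eqfun))

# For the equality function this equals the sum of squared mark proportions.
p <- as.numeric(table(marks)) / length(marks)
sum_sq <- sum(p^2)
```

Both quantities agree here, so a value of the estimated function near 1 means that pairs of points at distance r carry equal marks about as often as two independently drawn marks would.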

23.6 Randomisation tests

Simulation envelopes of summary functions can be used to test various null hypotheses for
marked point patterns.

23.6.1 Poisson null

The null hypothesis of a homogeneous Poisson marked point process can be tested by direct
simulation, using envelope as before. For example, using the cross-type K function as the test
statistic,

> data(amacrine)
> E <- envelope(amacrine, Kcross, nsim = 39, i = "on", j = "off")
> plot(E, main = "test of marked Poisson model")

[Figure: test of marked Poisson model; envelope of Kcross(r) against r (one unit = 662 microns).]

Notice that the arguments i and j here do not match any of the formal arguments of
envelope, so they are passed to Kcross. This has the effect of calling Kcross(X, i="on", j="off")
for each of the simulated point patterns X. Each simulated pattern is generated by the homogeneous Poisson point process with intensities estimated from the dataset amacrine.

23.6.2 Independence of components

It's also possible to test other null hypotheses by a randomisation test. We discussed two popular
null hypotheses:

- random labelling: given the locations X, the marks are conditionally independent and
  identically distributed;
- independence of components: the sub-processes $X_m$ of points of each mark $m$ are independent point processes.

In a randomisation test of the independence-of-components hypothesis, the simulated patterns X are generated from the dataset by splitting the data into sub-patterns of points of one
type, and randomly shifting these sub-patterns, independently of each other. The shifting is
performed by rshift:

> E <- envelope(amacrine, Kcross, nsim = 39, i = "on", j = "off",
+     simulate = expression(rshift(amacrine, radius = 0.25)))
> plot(E, main = "test of independent components")
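The choice nsim = 39 in these envelope calls has a simple motivation. When the envelope is formed from the pointwise minimum and maximum of the simulated curves, rejecting the null hypothesis whenever the observed curve leaves the envelope at a single value of r fixed in advance gives a two-sided Monte Carlo test with significance level 2/(nsim + 1). A minimal base R sketch of that arithmetic follows; the variable names are illustrative only.

```r
nsim <- 39    # number of simulated patterns
nrank <- 1    # envelope uses the most extreme simulated value on each side

# Two-sided pointwise significance level of the envelope test.
alpha <- 2 * nrank / (nsim + 1)
alpha
```

Note that this level applies only pointwise, at one value of r chosen before looking at the plot; it does not hold for the event that the curve leaves the envelope somewhere over the whole range of r.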


[Figure: test of independent components; envelope of Kcross(r) against r (one unit = 662 microns).]

The independence-of-components hypothesis seems to be accepted in this example.

Under the independence hypothesis,

$$K_{ij}(r) = \pi r^2, \qquad G_{ij}(r) = F_j(r), \qquad J_{ij}(r) \equiv 1,$$

while the i-to-any functions have complicated values. Thus, we would normally use $K_{ij}$ or $J_{ij}$
to construct a test statistic for independence of components.

23.6.3 Random labelling

In a randomisation test of the random labelling null hypothesis, the simulated patterns X are
generated from the dataset by holding the point locations fixed, and randomly resampling the
marks, either with replacement (independent random sampling) or without replacement (randomly permuting the marks). The resampling operation is performed by rlabel.

Under random labelling,

$$J_{i\bullet}(r) = J(r), \qquad K_{i\bullet}(r) = K(r), \qquad G_{i\bullet}(r) = G(r)$$

(where G, K, J are the summary functions for the point process without marks) while the other,
cross-type functions have complicated values. Thus, we would normally use something like
$K_{i\bullet}(r) - K(r)$ to construct a test statistic for random labelling.

To do this, cook up a little function to evaluate $J_{i\bullet}(r) - J(r)$:

> Jdif <- function(X, ..., i) {
+     Jidot <- Jdot(X, ..., i = i)
+     J <- Jest(X, ...)
+     dif <- eval.fv(Jidot - J)
+     return(dif)
+ }
> E <- envelope(amacrine, Jdif, nsim = 39, i = "on", simulate = expression(rlabel(amacrin
> plot(E, main = "test of random labelling")

[Figure: test of random labelling; envelope of Jdot(r) − J(r) against r (one unit = 662 microns).]

The random labelling hypothesis also seems to be accepted.
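The two resampling schemes used in a random labelling test (with and without replacement) can be sketched in base R on a bare vector of marks. This is only an illustration of the idea, using hypothetical marks; it is not the implementation of rlabel.

```r
set.seed(1)

# Hypothetical marks for 10 points; the locations stay fixed and are not needed here.
marks <- factor(c(rep("on", 6), rep("off", 4)))

# Without replacement: a random permutation of the observed marks.
permuted <- sample(marks)

# With replacement: independent draws from the empirical mark distribution.
resampled <- sample(marks, size = length(marks), replace = TRUE)

table(marks)
table(permuted)    # same counts as table(marks), by construction
table(resampled)   # counts may differ from table(marks)
```

Permutation keeps the number of points of each type exactly as observed, whereas sampling with replacement only preserves those numbers on average.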

24 Methods 11: multitype Poisson models

This section covers multitype Poisson process models: basic properties, simulation, and fitting
models to data.

24.1 Theory

24.1.1 Complete spatial randomness and independence

A uniform Poisson marked point process in $\mathbb{R}^2$ with marks in $\mathcal{M}$ can be defined in the following
equivalent ways.

- randomly marked Poisson process (Poisson [X], iid [M|X]): a Poisson point process of
  locations X with intensity $\beta$ is first generated. Then each point $x_i$ is labelled with a
  random mark $m_i$, independently of other points, with distribution $P\{M_i = m\} = p_m$ for
  $m \in \mathcal{M}$.
- superposition of independent Poisson processes (iid [M], Poisson [X|M]): for each possible
  mark $m \in \mathcal{M}$, a Poisson process $X_m$ is generated, with intensity $\beta_m$. The points of $X_m$
  are tagged with the mark $m$. Then the processes $X_m$ with different marks $m \in \mathcal{M}$ are
  superimposed, to yield a marked point process.
- Poisson marked point process (jointly Poisson [X, M]): a Poisson process on $\mathbb{R}^2 \times \mathcal{M}$ is
  generated, with intensity function $\lambda(u, m) = \beta_m$ at location $u$ and mark $m$.

These constructions are equivalent when $\beta_m = \beta p_m$. See the lovely book by Kingman [28].
Since the established term CSR (complete spatial randomness) is used to refer to the uniform
Poisson point process, I propose that the uniform marked Poisson point process should be called
complete spatial randomness and independence (CSRI).

24.1.2 Inhomogeneous Poisson marked point processes

An inhomogeneous Poisson marked point process $Y$ with joint intensity $\lambda(u, m)$ for locations $u$
and mark values $m$ is simply defined as an inhomogeneous Poisson point process on $\mathbb{R}^2 \times \mathcal{M}$
with intensity function $\lambda(u, m)$.
Let's restrict attention to the case of categorical marks, where $\mathcal{M}$ is finite. Then the process
$Y$ has the following properties:

- The locations X, obtained by removing the marks, constitute an inhomogeneous Poisson
  process in $\mathbb{R}^2$ with intensity function $\beta(u) = \sum_{m \in \mathcal{M}} \lambda(u, m)$.
- Conditional on the locations X, the marks attached to the points are independent. For a
  point $x_i$ the conditional distribution of the mark $m_i$ is $P\{M_i = m\} = \lambda(x_i, m)/\beta(x_i)$.
- The sub-process $X_m$ of points with mark $m$ is an inhomogeneous Poisson point process
  with intensity $\beta_m(u) = \lambda(u, m)$.
- The sub-processes $X_m$ of points with different marks $m$ are independent processes.

24.2 Simulation

Realisations of Poisson marked point processes can be generated using rmpoispp. The first
argument of this command specifies the intensity or intensity function $\lambda(u, m)$. It can be a
constant, a vector of constants, or an R function.

> par(mfrow = c(1, 2))
> Xunif <- rmpoispp(100, types = c("A", "B"), win = square(1))
> plot(Xunif, main = "CSRI, intensity A=100, B=100")
> Xunif <- rmpoispp(c(100, 20), types = c("A", "B"), win = square(1))
> plot(Xunif, main = "CSRI, intensity A=100, B=20")
> par(mfrow = c(1, 1))

[Figure: two panels, "CSRI, intensity A=100, B=100" and "CSRI, intensity A=100, B=20".]

> X1 <- rmpoispp(function(x, y, m) {
+     300 * exp(-3 * x)
+ }, types = c("A", "B"))
> lamb <- function(x, y, m) {
+     ifelse(m == "A", 300 * exp(-4 * x), 300 * exp(-4 * (1 - x)))
+ }
> X2 <- rmpoispp(lamb, types = c("A", "B"))
> par(mfrow = c(1, 2))
> plot(X1, main = "")
> plot(X2, main = "")
> par(mfrow = c(1, 1))
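The equivalence between the "randomly marked" and "superposition" constructions of CSRI in Section 24.1.1 can be illustrated without spatstat. The sketch below uses base R only, takes the window to be the unit square, and uses hypothetical values for the total intensity and the mark probabilities; it is meant as an illustration of the two recipes, not as a replacement for rmpoispp.

```r
set.seed(42)

beta <- 120                  # total intensity (hypothetical value)
p <- c(A = 5/6, B = 1/6)     # mark probabilities (hypothetical values)
beta_m <- beta * p           # per-type intensities: A = 100, B = 20

# Construction 1: Poisson locations first, then i.i.d. marks.
# The window is the unit square, so the expected number of points is beta.
n <- rpois(1, beta)
pattern1 <- data.frame(x = runif(n), y = runif(n),
                       mark = sample(names(p), n, replace = TRUE, prob = p))

# Construction 2: an independent Poisson pattern for each type, superimposed.
n_m <- rpois(length(beta_m), beta_m)
pattern2 <- data.frame(x = runif(sum(n_m)), y = runif(sum(n_m)),
                       mark = rep(names(beta_m), times = n_m))

table(pattern1$mark)
table(pattern2$mark)
```

In both data frames the number of points of each type is Poisson distributed with mean 100 for A and 20 for B, which is the sense in which the two constructions agree.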

24.3 Fitting Poisson models

Poisson marked point process models may be fitted to point pattern data using ppm. Currently
the methods are only available for multitype point processes (categorical marks).

24.3.1 Probability densities

Let $W \subset \mathbb{R}^2$ be the study region, and $\mathcal{M}$ the (finite) set of possible marks. Then a marked point
pattern is a set

$$y = \{(x_1, m_1), \ldots, (x_n, m_n)\}, \qquad x_i \in W, \quad m_i \in \mathcal{M}, \quad n \ge 0$$

of pairs $(x_i, m_i)$ of locations $x_i$ with marks $m_i$. It can be viewed as a point pattern in the
Cartesian product $W \times \mathcal{M}$.
The probability density of a marked point process is a function $f(y)$ defined for all marked
point patterns $y$ including the empty pattern $\emptyset$.
The process with probability density $f(y) \equiv 1$ is the uniform Poisson marked point process
with intensity 1 for each mark. That is, for this model, the sub-process of points with mark
$m_i = m$ is a uniform Poisson process with intensity 1. If the marks are removed, we obtain a
Poisson point process with intensity equal to $|\mathcal{M}|$, the number of possible types.
The uniform Poisson marked point process with intensity $\lambda(u, m) = \beta_m$ has probability
density

$$f(y) = \exp\Big( \sum_{m \in \mathcal{M}} (1 - \beta_m) |W| \Big) \prod_{i=1}^{n(y)} \beta_{m_i}
       = \exp\Big( \sum_{m \in \mathcal{M}} (1 - \beta_m) |W| \Big) \prod_{m \in \mathcal{M}} \beta_m^{n_m(y)}$$

where $n_m(y)$ is the number of points in $y$ having mark value $m$.

The inhomogeneous Poisson marked point process with intensity function $\lambda(u, m)$, at location
$u \in W$ and mark $m \in \mathcal{M}$, has probability density

$$f(y) = \exp\Big( \sum_{m \in \mathcal{M}} \int_W (1 - \lambda(u, m)) \, du \Big) \prod_{i=1}^{n(y)} \lambda(x_i, m_i). \tag{48}$$

24.3.2 Maximum likelihood

For the multitype Poisson process with intensity function $\lambda(u, m)$ at location $u \in W$ and mark
$m \in \mathcal{M}$, the loglikelihood is, up to a constant,

$$\log L = \sum_{i=1}^{n} \log \lambda(x_i, m_i) - \sum_{m \in \mathcal{M}} \int_W \lambda(u, m) \, du \tag{49}$$

where $m_i$ is the mark attached to data point $x_i$. This is formally equivalent to the loglikelihood
of a Poisson loglinear regression, so the Berman-Turner algorithm can again be used to maximise
the loglikelihood.

24.3.3 Model-fitting in spatstat

Poisson marked point process models are fitted to data using ppm.
The trend formula in the call to ppm may involve the reserved name marks as a variable.
This refers to the marks of the points. Since the marks are categorical, marks is treated as a
factor variable for modelling purposes.
To fit the homogeneous multitype Poisson process (CSRI), equation (50), we call

> ppm(X, ~marks)

The formula ~marks indicates that the trend depends only on the marks, and not on spatial
location; since marks is a factor, the trend has a separate constant value for each level of marks.
This is the model (50).
Note that if we had typed

> ppm(X, ~1)

this would have fitted the special case of CSRI where the intensities $\beta_m$ are equal, $\beta_m \equiv \beta$ say,
for all possible marks. That model is only appropriate if we believe that all mark values are
equally likely.
For the Lansing Woods data, the minimal model that makes sense is (50), so we call

> ppm(lansing, ~marks)
Stationary multitype Poisson process
Possible marks:
blackoak hickory maple misc redoak whiteoak
Trend formula: ~marks
Intensities:
beta_blackoak  beta_hickory    beta_maple     beta_misc   beta_redoak beta_whiteoak
          135           703           514           105           346           448
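For the homogeneous model the maximiser of the loglikelihood (49) can be written down explicitly, which explains why maximum likelihood and a simple count-per-area estimate give the same answer. The short derivation below is a sketch in the notation of this section, writing $n_m$ for the number of data points of type $m$ and assuming only that $\lambda(u, m) = \beta_m$ is constant in $u$.

```latex
\begin{align*}
\log L &= \sum_{i=1}^{n} \log \beta_{m_i} - \sum_{m \in \mathcal{M}} \int_W \beta_m \, du
        = \sum_{m \in \mathcal{M}} n_m \log \beta_m - |W| \sum_{m \in \mathcal{M}} \beta_m, \\
\frac{\partial \log L}{\partial \beta_m} &= \frac{n_m}{\beta_m} - |W| = 0
  \quad\Longrightarrow\quad \hat\beta_m = \frac{n_m}{|W|}.
\end{align*}
```

So each fitted intensity is the number of points of that type divided by the area of the window. If the window has unit area (an assumption made here for illustration), the fitted intensities are simply the counts of each type, which would be consistent with the integer values in the printout.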

Since lansing is a multitype point pattern (its marks are categorical), the variable marks in
the formula is a factor. The model has one parameter/coefficient for each level of the factor, i.e.
one coefficient for each type of point. In other words, this is the homogeneous Poisson marked
point process with intensity $\beta_m$ for points of mark $m$.
You'll notice that the parameter estimates $\hat\beta_m$ coincide with those obtained from summary.ppp
above. That is a consequence of the fact that the maximum likelihood estimates (obtained by
ppm) are also the method-of-moments estimates (obtained by summary.ppp).
A more complicated example is

> ppm(lansing, ~marks + x)
Nonstationary multitype Poisson process
Possible marks:
blackoak hickory maple misc redoak whiteoak
Trend formula: ~marks + x
Fitted coefficients for trend formula:
  (Intercept)  markshickory    marksmaple     marksmisc   marksredoak
   4.94294727    1.65008211    1.33694849   -0.25131442    0.94116400
markswhiteoak             x
   1.19951845   -0.07581624

This is the marked Poisson process whose intensity function $\lambda((x, y, m))$ at location $(x, y)$
and mark $m$ satisfies

$$\log \lambda((x, y, m)) = \alpha_m + \beta x$$

where $\alpha_1, \ldots, \alpha_6$ and $\beta$ are parameters. The intensity is loglinear in $x$, with a different intercept
for each mark, but the same slope ("parallel loglinear regression"). In the printout above, the
fitted slope parameter is $\beta$ = -0.07581624. As discussed in Section 11.3 on page 61, the fitted
coefficients $\alpha_m$ for the categorical mark are interpreted in the light of the contrasts in force.
The default is the treatment contrasts, and the first level of the mark is blackoak, so in this
case the fitted coefficient for m=blackoak is 4.942947, while the fitted coefficient for m=hickory
is 4.942947 + 1.650082 = 6.593029 and so on.

> ppm(lansing, ~marks * x)
Nonstationary multitype Poisson process
Possible marks:
blackoak hickory maple misc redoak whiteoak
Trend formula: ~marks * x
Fitted coefficients for trend formula:
    (Intercept)    markshickory      marksmaple       marksmisc     marksredoak
      5.2378062       1.4424915       0.6795604      -0.8482907       0.6916392
  markswhiteoak               x  markshickory:x    marksmaple:x     marksmisc:x
      1.0901772      -0.7063987       0.4511157       1.3243326       1.2138278
  marksredoak:x markswhiteoak:x
      0.5380413       0.2421379

The symbol * here is an interaction in the usual sense for linear models. The fitted model
is the marked Poisson process with

$$\log \lambda((x, y, m)) = \alpha_m + \beta_m x$$

where $\alpha_1, \ldots, \alpha_6$ and $\beta_1, \ldots, \beta_6$ are parameters. The intensity is loglinear in $x$ with a different
slope and intercept for each mark.
The result of ppm is again an object of class "ppm" representing a fitted point process model.
To plot the fitted intensity and conditional intensity of the fitted model, use plot.ppm. For a
multitype point process you will get a separate plot for each possible mark value.
More complicated examples are:

> ppm(lansing, ~marks * polynom(x, y, 2))
> ppm(lansing, ~marks * harmonic(x, y, 2))
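Returning to the additive model ~marks + x, the arithmetic behind the treatment contrasts can be carried out for all six species at once. The sketch below uses base R only; the coefficients are copied from the printout of ppm(lansing, ~marks + x), and the object names are illustrative.

```r
# Fitted coefficients copied from the printout of ppm(lansing, ~marks + x).
coefs <- c("(Intercept)" = 4.94294727,
           markshickory  = 1.65008211,
           marksmaple    = 1.33694849,
           marksmisc     = -0.25131442,
           marksredoak   = 0.94116400,
           markswhiteoak = 1.19951845,
           x             = -0.07581624)

# Under treatment contrasts the baseline level (blackoak) has intercept
# "(Intercept)"; every other level adds its own coefficient to the baseline.
offsets <- c(blackoak = 0, coefs[grep("^marks", names(coefs))])
names(offsets) <- sub("^marks", "", names(offsets))

alpha_m <- coefs["(Intercept)"] + offsets
round(alpha_m, 6)
```

The entry for hickory reproduces the value 4.942947 + 1.650082 = 6.593029 quoted in the text, and exp(alpha_m) gives the fitted intensity of each species at x = 0.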

25 Methods 12: Gibbs models for multitype point patterns

Gibbs point process models (section 18) are also available for marked point processes, and can
be fitted to data using ppm. Currently the methods are only implemented for multitype point
processes (categorical marks), so we restrict attention to this case.

25.1 Gibbs models

Much of the theory of Gibbs models described in Section 18 carries over immediately to multitype
point processes.

25.1.1 Conditional intensity

The conditional intensity $\lambda(u, X)$ of an (unmarked) point process $X$ at a location $u$ was defined
in section 18.5. Roughly speaking $\lambda(u, x) \, du$ is the conditional probability of finding a point
near $u$, given that the rest of the point process $X$ coincides with $x$.
For a marked point process $Y$ the conditional intensity is a function $\lambda((u, m), Y)$ giving a
value at a location $u$ for each possible mark $m$. For a finite set of marks $\mathcal{M}$, we can interpret
$\lambda((u, m), y) \, du$ as the conditional probability of finding a point with mark $m$ near $u$, given the rest
of the marked point process.
The conditional intensity is related to the probability density $f(y)$ by

$$\lambda((u, m), y) = \frac{f(y \cup \{(u, m)\})}{f(y)}$$

for $(u, m) \notin y$.
For Poisson processes, the conditional intensity $\lambda((u, m), y)$ coincides with the intensity
function $\lambda(u, m)$ and does not depend on the configuration $y$. For example, the homogeneous
Poisson multitype point process or CSRI (Section 24.1.1) has conditional intensity

$$\lambda((u, m), y) = \beta_m \tag{50}$$

where $\beta_m \ge 0$ are constants which can be interpreted in several equivalent ways (section 18.5).
The sub-process consisting of points of type $m$ only is Poisson with intensity $\beta_m$. The process
obtained by ignoring the types, and combining all the points, is Poisson with intensity
$\beta = \sum_m \beta_m$. The marks attached to the points are i.i.d. with distribution $p_m = \beta_m / \beta$.

25.1.2 Pairwise interactions

A multitype pairwise interaction process is a Gibbs process with probability density of the form

$$f(y) = \alpha \Big[ \prod_{i=1}^{n(y)} b_{m_i}(x_i) \Big] \Big[ \prod_{i<j} c_{m_i, m_j}(x_i, x_j) \Big] \tag{51}$$

where $\alpha$ is a normalising constant, $b_m(u)$, $m \in \mathcal{M}$ are functions determining the first order trend for points of each type,
and $c_{m,m'}(u, v)$, $m, m' \in \mathcal{M}$ are functions determining the interaction between a pair of points of
given types $m$ and $m'$. The interaction functions must be symmetric, $c_{m,m'}(u, v) = c_{m,m'}(v, u)$
and $c_{m,m'} \equiv c_{m',m}$. The conditional intensity is

$$\lambda((u, m); y) = b_m(u) \prod_{i=1}^{n(y)} c_{m, m_i}(u, x_i). \tag{52}$$

25.1.3 Pairwise interactions not depending on marks

The simplest examples of multitype pairwise interaction processes are those in which the interaction term $c_{m,m'}(u, v)$ does not depend on the marks $m, m'$. For example, we can take any of
the interaction functions $c(u, v)$ described in section 18.3 and use it to construct a marked point
process.
Such processes can be constructed equivalently as follows [8]:

- an unmarked Gibbs process is generated with first order term $b(u) = \sum_{m \in \mathcal{M}} b_m(u)$ and
  pairwise interaction $c(u, v)$.
- each point $x_i$ of this unmarked process is labelled with a mark $m_i$ with probability distribution $P\{m_i = m\} = b_m(x_i)/b(x_i)$ independent of other points.

If additionally the intensity functions are constant, $b_m(u) \equiv \beta_m$, then such a point process
has the random labelling property.

25.1.4 Mark-dependent pairwise interactions

Various complex kinds of behaviour can be created by postulating a pairwise interaction that
does depend on the marks.
A simple example is the multitype hard core process in which $\beta_m(u) \equiv \beta$ and

$$c_{m,m'}(u, v) = \begin{cases} 1 & \text{if } \|u - v\| > r_{m,m'} \\ 0 & \text{if } \|u - v\| \le r_{m,m'} \end{cases} \tag{53}$$

where $r_{m,m'} = r_{m',m} > 0$ is the hard core distance for type $m$ with type $m'$. In this process, two
points of type $m$ and $m'$ respectively can never come closer than the distance $r_{m,m'}$.
By setting $r_{m,m'} = 0$ for a particular pair of marks $m, m'$ we effectively remove the interaction term between points of these types. If there are only two types, say $\mathcal{M} = \{1, 2\}$,
then setting $r_{1,2} = 0$ implies that the sub-processes $X_1$ and $X_2$, consisting of points of types
1 and 2 respectively, are independent point processes. In other words the process satisfies the
independence-of-components property.
The multitype Strauss process has pairwise interaction term

$$c_{m,m'}(u, v) = \begin{cases} 1 & \text{if } \|u - v\| > r_{m,m'} \\ \gamma_{m,m'} & \text{if } \|u - v\| \le r_{m,m'} \end{cases} \tag{54}$$

where $r_{m,m'} > 0$ are interaction radii as above, and $\gamma_{m,m'} \ge 0$ are interaction parameters.
In contrast to the unmarked Strauss process, which is well-defined only when its interaction
parameter $\gamma$ is between 0 and 1, the multitype Strauss process allows some of the interaction
parameters $\gamma_{m,m'}$ to exceed 1 for $m \ne m'$, provided one of the relevant types has a hard core
($\gamma_{m,m} = 0$ or $\gamma_{m',m'} = 0$).
If there are only two types, say $\mathcal{M} = \{1, 2\}$, then setting $\gamma_{1,2} = 1$ implies that the subprocesses $X_1$ and $X_2$, consisting of points of types 1 and 2 respectively, are independent Strauss
processes.
The multitype Strauss-hard core process has pairwise interaction term

$$c_{m,m'}(u, v) = \begin{cases} 0 & \text{if } \|u - v\| < h_{m,m'} \\ \gamma_{m,m'} & \text{if } h_{m,m'} \le \|u - v\| \le r_{m,m'} \\ 1 & \text{if } \|u - v\| > r_{m,m'} \end{cases} \tag{55}$$

where $r_{m,m'} > 0$ are interaction distances and $\gamma_{m,m'} \ge 0$ are interaction parameters as above,
and $h_{m,m'}$ are hard core distances satisfying $h_{m,m'} = h_{m',m}$ and $0 < h_{m,m'} < r_{m,m'}$.
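The three mark-dependent pairwise interaction terms of Section 25.1.4 (hard core, Strauss, and Strauss-hard core) are simple step functions of the distance between two points. The sketch below writes them as base R functions for one fixed pair of types; the function names and the numerical values are hypothetical and chosen only for illustration.

```r
# d: distance ||u - v|| between the two points;
# r, h: interaction distance and hard core distance for this pair of types;
# gamma: interaction parameter for this pair of types.
hardcore_pair <- function(d, r) {
    ifelse(d > r, 1, 0)
}
strauss_pair <- function(d, r, gamma) {
    ifelse(d > r, 1, gamma)
}
strauss_hardcore_pair <- function(d, h, r, gamma) {
    ifelse(d < h, 0, ifelse(d <= r, gamma, 1))
}

d <- c(0.01, 0.05, 0.2)
hc <- hardcore_pair(d, r = 0.1)
st <- strauss_pair(d, r = 0.1, gamma = 0.5)
sh <- strauss_hardcore_pair(d, h = 0.02, r = 0.1, gamma = 0.5)
```

A value of 0 forbids the pair, a value between 0 and 1 penalises it, and a value of 1 means the pair contributes nothing to the density, which is why setting the cross-type parameter to 1 decouples the two types.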

25.2 Pseudolikelihood for multitype Gibbs processes

Models can be fitted by maximum pseudolikelihood. For a multitype Gibbs point process with
conditional intensity $\lambda((u, m); y)$, the log pseudolikelihood is

$$\log \mathrm{PL} = \sum_{i=1}^{n(y)} \log \lambda((x_i, m_i); y) - \sum_{m \in \mathcal{M}} \int_W \lambda((u, m); y) \, du. \tag{56}$$

The pseudolikelihood can be maximised using an extension of the Berman-Turner device [3].

25.3 Fitting Gibbs models to multitype data

Marked point process models may be fitted to point pattern data using ppm. Currently the
methods are only available for multitype point processes (categorical marks).

25.3.1 Interactions not depending on marks

The model-fitting function ppm expects an argument interaction that specifies the interpoint
interaction structure of the point process. The default is "no interaction", corresponding to a
Poisson process.
On page 118 there is a list of interpoint interactions for modelling unmarked point patterns.
These interactions can also be used, without modification, to fit models to multitype point
patterns.
For example

> ppm(lansing, ~marks, Strauss(0.07))

fits a multitype version of the Strauss process (section 18.3.2) in which the conditional intensity
is

$$\lambda((u, m), y) = \beta_m \gamma^{t(u, y)}.$$

Here $\beta_m$ are constants which account for the unequal abundance of the different species of tree.
The other quantities are the same as in (42). The interaction between two trees is assumed to be
the same for all species, and is controlled by the interaction parameter $\gamma$ and interaction radius
$r = 0.07$. For example, this includes the case $\gamma = 0$ where no two trees (whatever species they
belong to) come closer than 0.07 units apart, a multitype hard core process.
25.3.2 Interactions depending on marks

There are two additional interpoint interactions defined in spatstat for multitype point patterns:

MultiStrauss       multitype Strauss process
MultiStraussHard   multitype hybrid hard core / Strauss process

In these models, the interaction between two points depends on the types of the points as
well as their separation. For example, in the multitype Strauss process, for each pair of types $i$
and $j$ there is an interaction radius $r_{ij}$ and interaction parameter $\gamma_{ij}$.
To fit the stationary multitype Strauss process to the dataset betacells:
> data(betacells)
> r <- 30 * matrix(c(1, 2, 2, 1), nrow = 2, ncol = 2)
> ppm(betacells, ~1, MultiStrauss(c("off", "on"), r), rbord = 60)

Stationary Multitype Strauss process
Possible marks:
off on
First order terms:
    beta_off      beta_on
0.0001373652 0.0001373652
Interaction: Pairwise interaction family
Interaction: Multitype Strauss process
2 types of points
Possible types:
[1] "off" "on"
Interaction radii:
    off on
off  30 60
on   60 30
Fitted interaction parameters gamma_ij:
       off     on
off 0.0000 0.8303
on  0.8303 0.0000
Relevant coefficients:
markoffxoff  markoffxon   markonxon
-17.2378706  -0.1860184 -17.2138383
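The two blocks at the end of this printout carry the same information on different scales. Assuming, as the numbers suggest, that each fitted interaction parameter is the exponential of the corresponding "relevant coefficient", the matrix of gamma values can be recovered in base R as follows (the coefficients are copied from the printout; the object names are illustrative).

```r
# "Relevant coefficients" copied from the printout above.
coefs <- c(markoffxoff = -17.2378706,
           markoffxon  = -0.1860184,
           markonxon   = -17.2138383)

# Assumption: gamma_ij = exp(coefficient), i.e. the coefficients are log(gamma_ij).
gamma_hat <- exp(coefs)

# Arrange as a symmetric matrix indexed by the two types.
types <- c("off", "on")
gamma_mat <- matrix(c(gamma_hat["markoffxoff"], gamma_hat["markoffxon"],
                      gamma_hat["markoffxon"],  gamma_hat["markonxon"]),
                    nrow = 2, dimnames = list(types, types))
round(gamma_mat, 4)
```

On this reading the large negative within-type coefficients correspond to interaction parameters that are zero to four decimal places, in other words an effective hard core between cells of the same type, while the cross-type parameter 0.8303 indicates only weak inhibition between cells of different types.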

To fit a nonstationary multitype Strauss process with log-cubic polynomial trend:

> ppm(betacells, ~polynom(x, y, 3), MultiStrauss(c("off", "on"),
+     r), rbord = 60)

For more detailed explanation and examples of modelling and the interpretation of model
formulae for point processes, see [5].

25.3.3 Plotting the fitted interaction

The fitted pairwise interaction in a point process model can be plotted using fitin. The value
returned by fitin is a function array (class "fasp").

> model <- ppm(betacells, ~polynom(x, y, 3), MultiStrauss(c("off",
+     "on"), r), rbord = 60)
> plot(fitin(model))

[Figure: Fitted pairwise interactions. Array of panels indexed by the types off and on; each panel plots the pairwise interaction against distance.]

26 Line segment data

spatstat also has some facilities for handling spatial patterns of line segments.
For example, the copper dataset in spatstat contains a dataset copper$Lines that records
the locations of geological faults in a survey region.

> data(copper)
> L <- copper$Lines
> L <- rotate(L, pi/2)
> plot(L)

[Figure: L, the rotated line segment pattern.]
A spatial pattern of line segments is represented by an object of class "psp". It consists of
a list of line segments (given by the coordinates of their two endpoints), and a window in which
the line segments were observed. The line segments may also carry marks.
Objects of class "psp" can be created by the function psp or obtained by converting other
data using the function as.psp.
Capabilities available for this class include:

[.psp              subset operator (also performs clipping)
marks.psp          extract marks
endpoints.psp      extract endpoints of line segments
midpoints.psp      compute midpoints of line segments
lengths.psp        compute lengths of line segments
angles.psp         compute angles of orientation for line segments
rotate.psp         rotate a line segment pattern
shift.psp          shift a line segment pattern
affine.psp         apply affine transformation
pairdist.psp       distances between line segments in one pattern
crossdist.psp      distances between line segments in two patterns
nndist.psp         closest distances between line segments
density.psp        kernel-smoothed intensity image
crossing.psp       find intersection points between line segments in two patterns
selfcrossing.psp   find intersection points between line segments in one pattern
unitname.psp       determine units of length
rescale.psp        change units of length
rshift.psp         apply random shift to each line segment
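Several of these capabilities are elementary geometry on the endpoint coordinates. The base R sketch below computes lengths, midpoints and orientation angles for two hypothetical segments; it illustrates what the quantities mean and is not the spatstat implementation of lengths.psp, midpoints.psp or angles.psp.

```r
# Hypothetical segments, one per row: endpoints (x0, y0) and (x1, y1).
segs <- data.frame(x0 = c(0, 1), y0 = c(0, 1),
                   x1 = c(3, 1), y1 = c(4, 3))

dx <- segs$x1 - segs$x0
dy <- segs$y1 - segs$y0

# Euclidean length of each segment.
seg_lengths <- sqrt(dx^2 + dy^2)

# Midpoint of each segment.
seg_midpoints <- data.frame(x = (segs$x0 + segs$x1) / 2,
                            y = (segs$y0 + segs$y1) / 2)

# Orientation angle in [0, pi): direction is ignored, so opposite
# directions along the same line give the same angle.
seg_angles <- atan2(dy, dx) %% pi
```

Orientation angles taken modulo pi treat a segment and its reversal as the same object, which is the natural convention for undirected line segments.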
There are also the usual methods

plot.psp       plot a line segment pattern
print.psp      print information on a line segment pattern
summary.psp    compute summary of a line segment pattern

> summary(L)
146 line segments
Lengths:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.09242  6.61400 12.18000 15.02000 19.95000 65.48000
Total length: 2192.57251480451 km
Length per unit area: 0.196937548404655
Angles (radians):
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.008107 0.549500 1.747000 1.378000 2.113000 2.912000
Window: polygonal boundary
single connected closed polygon with 4 vertices
enclosing rectangle: [-158.23, -0.19] x [-0.335, 70.11] km
Window area = 11133.3 square km
Unit of length: 1 km

> plot(distmap(L))
> plot(L, add = TRUE)

[Figure: distmap(L), the distance map of the line segment pattern, with the segments overlaid.]

27 Further information on spatstat

Help files

For information on a particular command in spatstat, consult the online help file by typing
help(command). The help files are detailed and extensive. The complete manual is over 500
pages.
For examples of the use of a particular command, read the examples section in the help file,
or type example(command) to see the examples executed.

Quick reference

Type help(spatstat) for a quick-reference overview of all the functions available in the package.
For a demonstration of many of the capabilities of spatstat, type demo(spatstat).
For a visual display of all the datasets supplied in spatstat, type demo(data).

Website

The website www.spatstat.org contains information on recent updates to the package, frequently-asked questions, bug fixes, literature and other developments.

Modelling

For examples on fitting point process models, see [5].

Citation

If you use spatstat in a research publication, it would be much appreciated if you could cite
the paper [4], or mention spatstat in the acknowledgements.
In doing so, you will help us to justify the expenditure of time and effort on maintaining and
developing the package.
Citation details are also available in the package by typing citation(package="spatstat").

Queries and requests

If you have difficulty in getting the package to do what you want, or if you have a suggestion for
additional features that could be added, please contact the package authors, [email protected]
and [email protected], or email the R special interest group in spatial and geographical
statistics, [email protected].

References

[1] A. Baddeley, J. Møller, and A.G. Pakes. Properties of residuals for spatial point processes.
Annals of the Institute of Statistical Mathematics, 2007. To appear. Accepted for publication
6 July 2007.
[2] A. Baddeley, J. Møller, and R. Waagepetersen. Non- and semiparametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3):329–350, November
2000.
[3] A. Baddeley and R. Turner. Practical maximum pseudolikelihood for spatial point patterns
(with discussion). Australian and New Zealand Journal of Statistics, 42(3):283–322, 2000.
[4] A. Baddeley and R. Turner. Spatstat: an R package for analyzing spatial point patterns.
Journal of Statistical Software, 12(6):1–42, 2005. URL: www.jstatsoft.org, ISSN: 1548-7660.
[5] A. Baddeley and R. Turner. Modelling spatial point patterns in R. In A. Baddeley, P. Gregori, J. Mateu, R. Stoica, and D. Stoyan, editors, Case Studies in Spatial Point Pattern
Modelling, number 185 in Lecture Notes in Statistics, pages 23–74. Springer-Verlag, New
York, 2006. ISBN: 0-387-28311-0.
[6] A. Baddeley, R. Turner, J. Møller, and M. Hazelton. Residual analysis for spatial point
processes (with discussion). Journal of the Royal Statistical Society, series B, 67(5):617–666,
2005.
[7] A.J. Baddeley. Spatial sampling and censoring. In O.E. Barndorff-Nielsen, W.S. Kendall,
and M.N.M. van Lieshout, editors, Stochastic Geometry: Likelihood and Computation, chapter 2, pages 37–78. Chapman and Hall, London, 1998.
[8] A.J. Baddeley and J. Møller. Nearest-neighbour Markov point processes and random sets.
International Statistical Review, 57:89–121, 1989.
[9] A.J. Baddeley, R.A. Moyeed, C.V. Howard, and A. Boyde. Analysis of a three-dimensional
point pattern with replication. Applied Statistics, 42(4):641–668, 1993.
[10] A.J. Baddeley and B.W. Silverman. A cautionary example on the use of second-order
methods for analyzing point patterns. Biometrics, 40:1089–1094, 1984.
[11] A.J. Baddeley and M.N.M. van Lieshout. Area-interaction point processes. Annals of the
Institute of Statistical Mathematics, 47:601–619, 1995.
[12] M. Bell and G. Grunwald. Mixed models for the analysis of replicated spatial point patterns.
Biostatistics, 5:633–648, 2004.
[13] M. Berman and T.R. Turner. Approximating point process likelihoods with GLIM. Applied
Statistics, 41:31–38, 1992.
[14] J. Besag and P.J. Diggle. Simple Monte Carlo tests for spatial pattern. Applied Statistics,
26:327–333, 1977.
[15] J.E. Besag and P. Clifford. Generalized Monte Carlo significance tests. Biometrika, 76:633–642, 1989.
[16] D.R. Brillinger. Comparative aspects of the study of ordinary time series and of point
processes. In P.R. Krishnaiah, editor, Developments in Statistics, pages 33–133. Academic
Press, 1978.
[17] N.A.C. Cressie. Statistics for Spatial Data. John Wiley and Sons, New York, 1991.
[18] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer
Verlag, New York, 1988.
[19] P.J. Diggle. Statistical analysis of spatial point patterns. Academic Press, London, 1983.
[20] P.J. Diggle. A point process modelling approach to raised incidence of a rare phenomenon
in the vicinity of a prespecified point. Journal of the Royal Statistical Society, series A,
153:349–362, 1990.
[21] P.J. Diggle. Statistical Analysis of Spatial Point Patterns. Arnold, second edition, 2003.
[22] P.J. Diggle, N. Lange, and F. M. Benes. Analysis of variance for replicated spatial point
patterns in clinical neuroanatomy. Journal of the American Statistical Association, 86:618–625, 1991.
[23] P.J. Diggle, J. Mateu, and H.E. Clough. A comparison between parametric and nonparametric approaches to the analysis of replicated spatial point patterns. Advances in
Applied Probability (SGSA), 32:331–343, 2000.
[24] P.J. Diggle and B. Rowlingson. A conditional approach to point process modelling of
elevated risk. Journal of the Royal Statistical Society, series A (Statistics in Society),
157(3):433–440, 1994.
[25] A.C.A. Hope. A simplified Monte Carlo significance test procedure. Journal of the Royal
Statistical Society, series B, 30:582–598, 1968.
[26] C.V. Howard, S. Reid, A.J. Baddeley, and A. Boyde. Unbiased estimation of particle density
in the tandem-scanning reflected light microscope. Journal of Microscopy, 138:203–212,
1985.
[27] F. Huang and Y. Ogata. Improvements of the maximum pseudo-likelihood estimators
in various spatial statistical models. Journal of Computational and Graphical Statistics,
8(3):510–530, 1999.
[28] J.F.C. Kingman. Poisson Processes. Oxford University Press, 1993.
[29] G.M. Laslett. Censoring and edge effects in areal and line transect sampling of rock joint
traces. Mathematical Geology, 14:125–140, 1982.
[30] P.A.W. Lewis. Recent results in the statistical analysis of univariate point processes. In
P.A.W. Lewis, editor, Stochastic point processes, pages 1–54. Wiley, New York, 1972.
[31] J.K. Lindsey. The analysis of stochastic processes using GLIM. Springer, Berlin, 1992.
[32] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of
state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1092,
1953.
[33] J. Møller and R.P. Waagepetersen. Statistical Inference and Simulation for Spatial Point
Processes. Chapman and Hall/CRC, Boca Raton, 2003.

Index

[34] J. Mller and R.P. Waagepetersen. Modern statistics for spatial point processes. Research
Report R-2006-12, Department of Mathematical Sciences, Aalborg University, April 2006.
Submitted for publication.

analysis of deviance, 65

[35] Y. Ogata. Statistical models for earthquake occurrences and residual analysis for point
processes. Journal of the American Statistical Association, 83:927, 1988.

circular windows, 40
classes, 25
in R, 25
in spatstat, 25
clickppp, 24
complete spatial randomness, 53
and independence, 130, 151
denition, 53
Kolmogorov-Smirnov test, 56
quadrat counting test, 55
conditional intensity, 113
for marked point processes, 157
contrasts, 61, 155
covariate eects, 8
covariates, 6, 15, 61
in ppm, 61
Cox process, 80
CSRI, 130, 151
conditional intensity, 157
tting to data, 154
simulating, 152

[36] B.D. Ripley. Modelling spatial patterns (with discussion). Journal of the Royal Statistical
Society, series B, 39:172212, 1977.
[37] B.D. Ripley. Simulating spatial patterns: dependent samples from a multivariate density.
Applied Statistics, 28:109112, 1979.
[38] B.D. Ripley. Spatial Statistics. John Wiley and Sons, New York, 1981.
[39] B.D. Ripley. Statistical Inference for Spatial Processes. Cambridge University Press, 1988.
[40] A. Sarkka. Pseudo-likelihood approach for pair potential estimation of Gibbs processes.
Number 22 in Jyv
askyl
a Studies in Computer Science, Economics and Statistics. University
of Jyvaskyla, 1993.
[41] D. Stoyan and P. Grabarnik. Second-order characteristics for stochastic structures connected with Gibbs point processes. Mathematische Nachrichten, 151:95100, 1991.
[42] D. Stoyan and H. Stoyan. Fractals, Random Shapes and Point Fields. John Wiley and
Sons, Chichester, 1995.
[43] M.N.M. van Lieshout. Markov Point Processes and their Applications. Imperial College
Press, 2000.
[44] M.N.M. van Lieshout and A.J. Baddeley. A nonparametric measure of spatial interaction
in point patterns. Statistica Neerlandica, 50:344361, 1996.
[45] R. Waagepetersen. An estimating function approach to inference for inhomogeneous
Neyman-Scott processes. Submitted for publication, 2006.

empty space function, 85


envelopes, 98
and Monte Carlo tests, 98
for any tted model, 101
for any simulation procedure, 101
in spatstat, 98
of summary functions, 98
exploratory data analysis, 20
for marked point patterns, 138

binary mask, 26, 42

tted model, 119


goodness-of-t, 67, 125
interpretation of coecients, 61
methods for, 63
residuals, 68, 127
simulation of, 66
tting models
by Huang-Ogata method, 124
maximum pseudolikelihood, 116
to marked point patterns, 154, 159
via summary statistics, 98, 102
fv, 30
geometrical transformations, 49
Gibbs models, 109
area-interaction, 112
Diggle-Gates-Stibbard, 112
Diggle-Gratton, 112
tting, 116
by Huang-Ogata method, 124
maximum pseudolikelihood, 116
ppm, 116
tting to marked point patterns, 159
goodness-of-t, 125
hard core process, 110
in spatstat, 118
innite order interaction, 112
multitype, 157
maximum pseudolikelihood, 159
multitype pairwise interaction, 157
pairwise interaction, 112
residuals, 127
simulation, 114
simulation of tted model, 121
soft core, 112
Strauss process, 111
Strauss-hard core, 112

data entry, 31
at the terminal, 31
basic, 31, 32
checking, 34
from le, 32
marked point patterns, 133
marks, 32
point-and-click, 24
datasets
inspecting, 19
provided in spatstat, 24
dispatching, 25
distance methods, 83
distances
empty space, 83, 84
nearest neighbour, 83, 90
pairwise, 83, 92
distmap, 83
edge eects, 85
empty space distances, 83, 84

c
Copyright CSIRO
2008

168

INDEX

goodness-of-fit, 67
for fitted Gibbs model, 125
for Poisson models, 67
hard core process, 110
multitype, 158
Huang-Ogata method, 124
im, 25, 74
images, 74
computing with, 78
creating, 74
from raw data, 74
exploratory inspection of, 76
extracting subset, 77
plotting, 76
returned by a function, 75
independence of components, 130, 148
intensity
function, 37
kernel estimator, 37
homogeneous, 36
inhomogeneous, 37
investigation of, 36
measure, 37
of marked point process, 138
interaction, 7, 10
distance methods, 83
in spatstat, 118
multitype, 157, 159
in spatstat, 159
plotting a fitted interaction, 160
QQ plot, 73
simple models, 79
summary functions, 83
K function, 21, 92
for multitype point pattern, 142
inhomogeneous, 105
kernel estimator of intensity, 37, 38
kernel smoothing of marks, 140
Kolmogorov-Smirnov test
of CSR, 56
of inhomogeneous Poisson, 68
line segments, 162
lurking variable plot, 70
mark correlation function, 146
marked point patterns

cutting marks into bands, 136


data entry, 133
exploratory data analysis, 138
exploring marks, 140
inspecting, 134
joint and conditional analysis, 130
manipulating, 136
methodological issues, 130
model-fitting, 154, 159
probabilistic formulation, 129
randomisation tests, 130
separating into types, 136
summary functions, 142
marked point process
intensity, 138
marks, 5, 14, 129
categorical, 33
data entry, 31, 32
exploratory data analysis, 140
manipulating, 136
operations on, 48
smoothing, 140
spatial trend in, 140
versus covariates, 14
markstat, 142
marktable, 141
Matérn cluster process, 79
maximum likelihood, 58
maximum pseudolikelihood, 116, 159
for multitype Gibbs models, 159
improvements over, 124
methods, 25
default method, 27
dispatch, 25
minimum contrast, 98, 102
model validation, 67, 125
Monte Carlo test, 98
pointwise, 98
simultaneous, 99
multitype hard core process, 158
multitype point pattern, 9, 10, 21, 33
multitype point patterns
separating into types, 136
summary functions, 142
multitype Strauss process, 158
nearest neighbour distances, 83, 90
nndist, 83
nuisance parameters, 122

owin, 25, 40
pairdist, 83
pairwise distances, 83, 92
pairwise interaction process, 110
point pattern, 5
marked, 129
marks, 5, 14
multitype, 9, 10
needs window, 47
point process model for, 12
standard model, 13
point process, 12
point process models
area-interaction, 112
Diggle-Gates-Stibbard, 112
Diggle-Gratton, 112
Gibbs, 109
hard core, 110
infinite order interaction, 112
pairwise interaction, 110, 112
soft core, 112
Strauss, 111
Strauss-hard core, 112
Poisson cluster processes, 79
Poisson models
fitting, 59
goodness-of-t, 67
homogeneous, 53
inhomogeneous, 58
log-likelihood, 59
marked, 151
maximum likelihood, 58
residuals, 68
Poisson point process
homogeneous
definition, 53
simulation, 53
inhomogeneous
definition, 58
fitting, 59
likelihood, 59
motivation, 58
simulation, 58
Poisson-derived models, 79
polygonal windows, 26, 41
ppm, 63, 119
marked Gibbs point process models, 159
marked Poisson point process models, 154
methods for, 63
ppp, 25
combining several, 49
extracting subset, 47
format, 45
geometrical transformations, 49
in arbitrary window, 44
manipulating, 45
needs window, 47
operations on, 47
ways to make, 35
probability density, 109
profile pseudolikelihood, 122
pseudolikelihood, 116
profile pseudolikelihood, 122
quadrat counting, 20, 37
quadrat counting test
of CSR, 55
quadrat test
of inhomogeneous Poisson, 67
R, 16
contributed packages, 17
where to get, 16
random labelling, 130, 149
random thinning, 58
randomisation tests, 130, 147
for marked point patterns, 147
rectangular windows, 26, 40
residuals, 68, 127
for fitted Gibbs model, 127
for Poisson models, 68
lurking variable plot, 70
QQ plot, 72
smoothed residual field, 70
return value, 28
rpoispp, 53, 58
runifpoint, 54
sequential models, 81
simulation
of fitted Gibbs model, 121
of fitted Poisson model, 66
smoothed residual field, 70
spatstat, 18, 164
citing, 18
getting started, 18
installing, 18
split, 23
standard model, 13
Strauss process, 111
fitting to data, 117
multitype, 158
summary functions, 83
and Monte Carlo tests, 98
critique, 96
edge effects, 85
envelopes, 98
F, 85
for multitype point patterns, 142
G, 90
inference using, 98
inhomogeneous K, 105
J, 95
K, 92
L, 93
mark correlation, 146
model-fitting with, 102
pair correlation, 93
tests
χ² quadrat counting, 55
Kolmogorov-Smirnov, 56, 68
Monte Carlo, 98
thinning, 80
Thomas process, 79
tips, 25, 29, 34, 48, 84, 87, 99, 133
treatment contrasts, 61
unitname, 35
units of length, 35
validation, 67, 125
windows, 40
binary mask, 26, 42
circular, 40
needed in any point pattern, 47
operations on, 44
polygonal, 26, 41
rectangular, 26, 40
returned by functions, 43
χ² quadrat counting test, 55
