Point Pattern Analysis in An ArcGIS Environment
Point Pattern Analysis in An ArcGIS Environment
For a number of years in the Departments of Geography at San Diego State University and the University of
Illinois, we have been working on the development of software for a variety of ESDA routines, especially in
the field of spatial association. These software programs make it possible to easily apply such analytical
devices as nearest neighbor, general pattern, and local statistics. We have called our routine Point Pattern
Analysis (PPA), but the actual product is constantly in flux as we attempt to make the approach as user
friendly and comprehensive as we can. Those who have worked on the programs over the years include
Marc Armstrong (Iowa), Laura Hungerford (Illinois), DongMei Chen (Queens), Lauren Scott (ESRI), and
the authors.
Currently we are adapting some of the materials of PPA for use in an ArcGIS environment. We
use Visual Basic for Applications (VBA) to create a visual representation of several PPA routines. We are
now in a position to demonstrate the use of certain local statistics in a fully functioning, uncoupled system.
In this paper we demonstrate our approach on the local G statistic (Ord and Getis 1995). We should point
out that one of our recent endeavors is to create a user-friendly approach to the new O statistic (Ord and
Getis 2001), a spatial statistic designed to describe local clustering in a non-stationary setting.
To better understand what it is that we are attempting to do, we call attention to our on-line
software materials, which represent an earlier approach to 14 different ESDA tools with documentation. It
Building on that work, Lauren Scott recently included in an ArcView 3.2 environment several of these
routines. We used these successfully in exercises assigned for social science students and young faculty in
CSISS workshops on pattern analysis the last two summers. Now, we are moving further along by
Thus far PPA has been used in about a dozen published research projects. In addition, SpaceStat,
the well-known spatial econometrics package, and SAGE, a British ESDA package, have a few of the PPA
statistics included. Just recently, ClusterSeer, from BioMedware, the health-related software company, has
ArcGIS
ArcGIS is the latest version of Environmental Research Systems Institute's (ESRI) suite of GIS products.
ArcGIS is designed as a scalable system for geographic data creation, management, and analysis. ESRI
products have a large user base. The ESRI website states that there are over 1 million ESRI software users
worldwide, and that 50,000 university students receive instruction utilizing ESRI products every year (ESRI
2002).
In previous versions, ESRI's desktop GIS, ArcView, and its enterprise level GIS, ArcInfo, were
very different in terms of the primary geographic data model and user interface. In ArcGIS, however, all of
the products use the same data model, GUI interface, and development environment (Limp 2001). In
addition to ArcView and ArcInfo, there is a medium sized version of ArcGIS called ArcEditor. Each
variety of ArcGIS consists of several individual applications that provide a set of functionality. ArcMap is
a primary component of all three versions and is the interface for data display and analysis. An analysis
The development platform for the ArcGIS applications is called ArcObjects. ArcObjects is built using
Microsoft's Component Object Model (COM) and can be embedded and extended using any COM
compliant language. Visual Basic and Visual C++ are COM compliant languages that can be used to
extend existing ArcGIS applications or write stand-alone applications that utilize ArcObjects components.
This method of development generally requires that the developer first becomes a competent Visual Basic
VBA is a scripting language that allows the developer to use the existing framework of ArcMap and add
functionality through the creation of custom tools, menus, and modules (Zeller 2001).
The primary advantage of using VBA to develop spatial analysis tools for ArcMap is ease of
development. The integrated development environment (IDE) allows the programmer to design, code, and
debug within a common environment. Building GUI interfaces is simplified with a drag and drop control
toolbox. Easy integration with common Windows features such as context help, Windows help files, and
file access dialogs allow custom tools to have the same look and feel as larger applications. As with all
scripting languages, the primary drawback of VBA is slow execution compared to compiled code. This
The package is written in the C programming language and includes a variety of pattern analysis routines.
There are both DOS and UNIX versions of the program. A screenshot of the DOS version of the program
There are several limitations that the PPA users face when analyzing their data. PPA accepts input
as an ASCII file in three columns: x coordinate, y coordinate, and value at the point. The output is saved to
an ASCII file and displayed on the screen as text. Often, PPA users must convert their data files to ASCII
before running PPA and again convert their files to a GIS compatible format for display after analysis. The
interface of PPA is text menu based. The lack of a GUI interface and features such as file dialogs create
frustration among users used to windowing environments. Integrating PPA routines into a GIS environment
would eliminate these drawbacks and allow for truly integrated exploratory spatial data analysis (ESDA).
• Nearest Neighbor Analysis – A test of significance on the observed mean nearest neighbor distance
in a point pattern
• Refined Nearest Neighbor Analysis – A Monte Carlo test comparing the complete distribution
function of the observed nearest neighbor distances.
• K-function Analysis – Also called second order analysis, a point pattern test based on inter-event
distances and a Monte Carlo significance test.
• Local K-function – A local test based on K-function analysis that examines the clustering of
neighbors around each point individually.
• Weighted K-function analysis – A statistical test based on the permutation of the weights at each
point.
• Knox space time clustering – A method to analyze the clustering of events in both space and time
based on the Poisson distribution.
• Join count statistic – A simple measure of spatial autocorrelation for data measured at the nominal
scale.
• Geary’s c - A measure of global spatial autocorrelation that uses the sum of the squared differences
between pairs of data values as its measure of covariation.
• Getis and Ord’s General G - A multiplicative measure of overall spatial association of values that
fall within a given distance of each other.
• Local Getis and Ord G-statistics – These statistics evaluate the extent to which a location is
surrounded by a cluster of high or low values.
PPA in ArcGIS
We have selected the local Getis and Ord G-statistics as a test case for implementing PPA routines in
ArcGIS. This custom tool is written as a VBA macro. The macro uses one form to receive user input, and
an additional dialog box to query the user for an output field name. After computing the statistic for each
observation, the result is saved in the specified field of the original dataset. The complete code listing of
the local G macro is included in Appendix A, and an example of its use follows.
The example will use the local G-statistic to analyze the cancer rates of New Mexico counties.
The data are taken from the National Cancer Institute Biometry Research Group Datasets, and it is available
on their website, http://dcp.nci.gov/bb/datasets.html. The total number of cases diagnosed between 1980
and 1989 were divided by the 1985 population estimate to determine the cancer rate (cases per 10,000
population) of each county. County seats are used as a proxy for population centroids. The data are in the
shapefile format and are loaded into ArcMap. For reference, a polygon layer of the New Mexico counties
is also displayed.
menu will be displayed with two more options. Selecting the Visual Basic Editor option will bring up the
VBA editor and allow the user to create, view, and modify macros. Selecting the Macros… option will
display the macros available to the user in the current ArcMap document. The Macros dialog is shown in
Figure 3.
In the Macros dialog box, the user selects the desired macro and clicks an action button along the
right hand side. In this case the only macro available is the LocalG.LocalG macro. The name before the
period indicates the code module that the macro is in, and the name after the period is the name of the
subroutine to run. When the Run button is clicked the macro begins execution.
The Local G Statistic dialog is displayed to prompt the user for input. The first combo box
prompts the user to select a layer to analyze. This combo box will be populated with all the point layers in
the active map. After selecting a point layer, the second combo box is populated with all the numeric fields
available for the specified layer. In our example, the user selects the NM County Seats point layer and the
CancerRate field.
The text box labeled Distance is used to enter a distance for analysis. The distance units are the
same as the units of the map, and in this case the units are meters. The checkbox labeled Star specifies
whether to include the value at the observation itself in the analysis at that point. For the example we have
chosen a distance of 100,000 meters and we have checked the Star checkbox. After specifying the
parameters of analysis the user presses the OK button to continue. At anytime the user may select the
Next, the user is prompted for an output field name (Figure 5). A default name is supplied, but the
user may change the name. For display purposes we have changed the field name to Gi*(100km) from the
After the user selects an output field name, the specified G-statistic is calculated for each
observation and stored in the attribute table of the layer being analyzed. The attribute table for our example
dataset is shown in Figure 6. The existing data and our newly created field, the column on the right hand
side of the table labeled Gi*(100km), can be viewed together immediately after execution of the macro.
The existing functionality of ArcMap can then be used to display the results graphically using
charts, graphs, and maps. These graphical outputs can be dynamically linked to the data table to further aid
ESDA. The map in Figure 7 shows the local G-statistic values calculated above. Figure 8 is a more
complex map that displays both the cancer rates and the G-statistic value at five different distances for each
county.
Figure 7: The local Gi* statistic value for 100km calculated from cancer rates of New Mexico counties
Looking Ahead
The use of VBA macros to create spatial data analysis tools in ArcMap is relatively easy to learn and quick
to implement. Leveraging the existing file manipulation, data interfaces, and visualization functionality of
ArcMap allows the programmer to concentrate on developing spatial analysis methods. There are many
existing features of ArcObjects that we have not yet implemented. The integration of help files and the
utilization of the graphing objects are planned additions. Still unexplored is the ability of ArcObjects to
create new layers and graphics to aid in the visualization of data and results. We hope to show distance
qualities in analysis by stages. This will provide an outstanding tool for those wanting to learn about the
suffice. To perform analysis that use Monte Carlo simulations, such as K-function analysis, or involve a
large numbers of computations, like the O statistic, requires compiled software components. These
components will demand a more intensive development effort. Although ESRI has a large user base, the
expense of purchasing a distribution of ArcGIS makes these tools inaccessible to many institutions,
companies, and individuals. It is important, therefore, to maintain existing free versions of PPA and work
with other vendors and developers of open source spatial analysis tools.
Figure 8: Map of New Mexico Cancer Rates by County and Gi* values for five distances
References
ESRI (2001). ArcObjects Developer Help, Environmental Research Systems Institute, Inc., Redlands,
California
Ord, J.K. and Getis, A., (1995). Local Spatial Autocorrelation Statistics: Distribution Issues and an
Application, Geographical Analysis, 27(4): 286-306
Ord, J.K. and Getis, A., (2001). Testing for Local Spatial Autocorrelation in the Presence of Global
Autocorrelation, Journal of Regional Science, 41(3): 411-432
Getis, Arthur, Chen, Dong Mei, and Aldstadt, Jared (2000). Point Pattern Analysis 1.0,
http://xerxes.sph.umich.edu:2000/cgi-bin/cgi-tcl-examples/generic/ppa/ppa.cgi
Zeller, Michael, (2001). Exploring ArcObjects, Environmental Research Systems Institute, Inc., Redlands,
California
Appendix A: VBA Code for Local G Macro
This appendix is a listing of the code for the macro module that calculates the local G statistic. There are
global variable declarations, two module subroutines (LocalG and getFields), and three form event
subroutines.
LocalG Subroutine
Sub LocalG()
'This macro computes the local G statistics for point feature classes
'in ArcMap
'If there are no point layers in the map, send a message and exit
If pointExists = False Then
MsgBox "This routine requires a point layer"
Exit Sub
End If
With pOutFieldEdit
.Editable = True
.IsNullable = False
.Name = Left(outFieldName, 10)
.AliasName = outFieldName
.Type = esriFieldTypeDouble
.Length = 16
.Precision = 15
.Scale = 4
End With
pFeatureClass.AddField pOutField
Set pOutField = Nothing
outFieldIndex = pFeatureClass.FindField(Left(outFieldName, 10))
End If
Distance = Sqr(((dataset(counter, 0) - _
dataset(jcounter, 0)) ^ 2) + _
(((dataset(counter, 1) - _
dataset(jcounter, 1)) ^ 2)))
Else
For counter = 0 To (icount - 1)
Set pFeature = pFeatureClass.GetFeature(counter)
Distance = Sqr(((dataset(counter, 0) - _
dataset(jcounter, 0)) ^ 2) + _
(((dataset(counter, 1) - _
dataset(jcounter, 1)) ^ 2)))
If sumwxn = 0 Then
dataset(counter, 3) = 0
Else
dataset(counter, 3) = (sumwxn - (xbarn * sumwn)) _
/ (sn * Sqr(((icount * sumwn) _
- (sumwn ^ 2)) / (icount - 1)))
End If
pEnumLayer.Reset
Set pILayer = pEnumLayer.Next
For i = 0 To (pFields.FieldCount - 1)
Set pfield = pFields.Field(i)
If pfield.Type = esriFieldTypeDouble Or esriFieldTypeInteger Or _
esriFieldTypeSingle Or _
esriFieldTypeSmallInteger Then
frmLocalG.cmbField.AddItem pfield.Name
End If
Next i
End Sub
Form Event Subroutines
layerName = cmbLayer.Value
valueField = cmbField.Value
gDist = CDbl(txtDistance.Value)
star = chkStar.Value
Unload frmLocalG
End Sub