SAM: A Comprehensive Application For Spatial Analysis in Macroecology
doi: 10.1111/j.1600-0587.2009.06299.x
© 2010 The Authors. Journal compilation © 2010 Ecography
Subject Editor: Carsten Rahbek. Accepted 15 December 2009
SAM (Spatial Analysis in Macroecology) is a freeware application that offers a comprehensive array of spatial statistical
methods, focused primarily on surface pattern spatial analysis. SAM is a compact, but powerful stand-alone software,
with a user-friendly, menu-driven graphical interface. The methods available in SAM are the most commonly used in
macroecology and geographical ecology, and range from simple tools for exploratory graphical analysis (e.g. mapping
and graphing) and descriptive statistics of spatial patterns (e.g. autocorrelation metrics), to advanced spatial regression
models (e.g. autoregression and eigenvector filtering). The software and its user manual can be
downloaded from the SAM website: <www.ecoevol.ufg.br> (permanent URL at <http://purl.oclc.org/sam/>).
Today there are many software applications and packages available for spatial statistical analysis. Some of them are stand-alone applications that offer several methods (e.g. Passage <www.passagesoftware.net>, GeoDa <geodacenter.asu.edu>), while others are specific to particular methods (e.g. GWR3 <http://ncg.nuim.ie/ncg/GWR/software.htm>, SpaceMaker2 <www.bio.umontreal.ca/casgrain/en/labo/spacemaker.html>, ModTTest <www.bio.umontreal.ca/legendre/indexEn.html>) or collections of routines within a general purpose statistical platform (e.g. SpDep for R, EconoTools for MatLab). SAM (Spatial Analysis in Macroecology, Rangel et al. 2006) is a compact but powerful stand-alone freeware application, compiled for the MS Windows environment, with a user-friendly, menu-driven graphical interface. SAM offers a comprehensive array of spatial statistical methods. The methods available in SAM are the most commonly used in macroecology and geographical ecology, ranging from simple tools for exploratory graphical analysis (e.g. mapping and graphing) and descriptive statistics of spatial patterns (e.g. autocorrelation metrics), to advanced spatial regression models (e.g. autoregression and eigenvector filtering).

Since SAM’s first release, in August 2005, it has been downloaded about 9300 times (Fig. 1a), by researchers working in 60 countries around the world. By tracking scientific publications that cite the original SAM paper (Rangel et al. 2006, Fig. 1b), we identified 165 studies that cited SAM, of which 83% reported that SAM was directly used for spatial statistical analysis. Among those studies, 77% used SAM to investigate general ecological questions, whereas 25% used it for biodiversity conservation, 22% for physical geography and 9% for questions related to evolutionary biology. These papers were collectively published in 45 different journals, by authors from 33 different countries. The most commonly used methods implemented in SAM were Moran’s I correlogram (43%), Dutilleul’s (1993) estimator of effective sample size used in correlation analysis (12%), spatial auto-regression models (SAR, CAR or GLS, 11%) and spatial eigenvector mapping (7%).

[Figure 1: two panels, (a) and (b); y-axes "Number of Citations" and "Number of Downloads"; x-axes by year (2006 to 2009) and by month (Aug-05 to Dec-09).]
Figure 1. (a) Time series of number of scientific publications that cite the original SAM paper (Rangel et al. 2006). (b) Time series of cumulative number of SAM downloads since first release (August 2005). Discontinuities in April 2007 and April 2008 were caused by intensified download activity following the releases of SAM v2 and v3.

SAM has been under continuous development and expansion (Table 1). SAM now uses highly optimized linear algebra libraries for the most computer-intensive methods, so that time-consuming procedures (e.g. involving eigenanalysis) are now much faster. Here we show how the most important features currently available in SAM evolved, while highlighting the new and improved features available in SAM v4, released in March 2010.

The data table in SAM is a rectangular matrix of numeric values, in which columns are variables and rows are individual observations (e.g. grid cells), formatted in tab-delimited text (ASCII) (*.txt or *.sam), dBase (*.dbf), MS Excel (*.xls) or ESRI shapefile (*.shp and companion files). Geographic coordinates must be included as two of the columns (variables) in the data file. In addition to the main data table, recent versions of SAM also allow the input of species presence/absence matrices, in which each species is represented in its own column, while rows are locations in which the species is present (1) or absent (0). Presence/absence matrices can be used, for instance, to compute richness patterns considering different criteria (e.g. body size and taxonomic structures; Bini et al. 2004, Terribile et al. 2009). In addition, if a matrix of species’ traits (with species in rows and species’ traits in columns) is available, then individual species in the presence/absence matrix can be selected according to a given trait (e.g. species with body size larger than the average body size), or species traits can be mapped in geographical space, given the species assemblage in each location. This is a very useful tool for those interested in some of the most frequently investigated macroecological patterns, such as Bergmann’s and Rapoport’s rules.

Previous SAM versions were mostly dedicated to data analysis, and most of the data processing relevant to macroecological studies had to be done with the aid of GIS software. The GIS environment implemented in the current SAM version, however, allows users to easily prepare data for macroecological analysis without any additional software. Grids can be generated in any resolution and extent, using equal area square or hexagonal cells, and they can be saved into shapefiles. Also, because shapefiles have become a standard format to share information on species distributions (range polygons or points), SAM can process the distribution of each species to record its presence or absence in each grid cell, and thus generate presence/absence matrices directly from shapefiles. Finally, from ESRI rasters or text files, environmental layers can be downscaled to the resolution of the grid by calculating the mean and standard deviation of all observations within each grid cell, which then become additional variables in the main data matrix.

Graphical exploratory data analysis (GEDA) is one of the most important steps in statistical analysis (Tukey 1980). For this reason, one of SAM’s greatest strengths is its rich collection of graphical analytical tools and the simplicity of using and editing them. All charts, which may be drawn with just a few clicks, allow zooming, scrolling and changing colors, maximizing investigators’ capacity to find patterns and identify particular details in the data. Colors are abundantly used to highlight patterns
Table 1. The evolution of the most used modules available in SAM. Greek letters denote the versions of the modules (α: first; β: second; γ: third; δ: fourth). GEDA stands for Graphical Exploratory Data Analysis, PAM stands for Presence/Absence Matrix and SEVM stands for Spatial Eigenvector Mapping.

Module                                       Module versions, in order of release
GEDA tools                                   α β γ δ
Moran’s I and Auto-Correlogram               α α α β
Spatial Correlation                          α α α β
Regression and Partial Regression            α β β γ
PAM and Spp. Attributes Mapping              α α β
Principal Component Analysis                 α α α
Auto-Regression: Lagged                      α α β
Auto-Regression: SAR/CAR                     α α β
Auto-Regression: GLS                         α α β
SEVM                                         α β
Model Selection and Multi-Model Inference    α α
Logistic Regression                          α β
Geographically Weighted Regression           α α
GIS Processing and Mapping                   α β
Pattern Finder                               α β
Ripley’s K                                   α
Join-Count Analysis                          α
Mantel Test                                  α
ANOVA                                        α
in the data or to superimpose multiple plots in the same panel. For example, in two-dimensional scatter-plots, polynomial regression lines can be easily drawn to highlight the relationship between two variables. In the three-dimensional scatter-plot, tilting and rotating allow visual inspection of the data from any perspective. In the new version, the residuals plot applies a well designed set of color gradients in a scatter plot and in a map simultaneously. This tool graphically displays the correlation between two variables and is ideal for evaluating the geographic structure in a model’s residuals, as it uses different color gradients to differentiate under- and over-estimated values.

Maps are the most important exploratory tools in spatial data analysis, which is why they are fully embedded in each of SAM’s analytical modules. SAM allows users to easily draw one or multiple maps simultaneously, which facilitates the visual comparison of the spatial patterns in different variables. The map module in SAM allows re-sizing, zooming and scrolling simple maps, as well as inspecting values by moving the cursor over the map. An even more advanced mapping module is enabled when the main data is extracted from an ESRI shapefile. For example, one may overlay multiple map layers, which may be regular or irregular polygons, and points, to produce publication-quality maps. The graduated color gradients, with customizable classes, are applicable to each shapefile, with automatically generated legends.

The strength of the relationships among variables can change across space (see GWR below). For example, water availability is thought to affect species richness in the tropics, whereas temperature is the most important driver of species richness at higher latitudes (Hawkins et al. 2003). Pattern Finder is a new tool available in SAM that graphically links scatter plots, maps and tables to aid the identification of geographically structured relationships. Using this tool, one can select cells in a map, and the points in the scatter plot and the rows in a spreadsheet that refer to the selected cells are then highlighted. The selection of the data may also be made directly from the scatter plot or the spreadsheet. This is an especially useful tool to detect outliers or mistyping.

One of the most important steps in exploratory analysis of spatial data is to measure the magnitude and direction of spatial autocorrelation, which has been defined as “the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations” (Legendre 1993). Moran’s I coefficient is one of the most commonly used descriptors of spatial autocorrelation. Moran’s I can be calculated for individual distance classes (e.g. from 0 to 300 km, 300 to 600 km), producing a plot known as a spatial correlogram. Besides a standard spatial correlogram, SAM’s current version also computes asymmetric correlograms, directional correlograms (Rosenberg 2000), Anselin’s Moran’s I scatter plot (Anselin 1996), and local Moran’s I (LISA, Sokal et al. 1998). Also, a new module in SAM implements join-count analysis, which measures the magnitude of spatial autocorrelation in binary data, and is thus very useful to describe the spatial pattern in the distribution of species.

Still in the context of spatial autocorrelation, an important new feature in SAM 4.0 is that autocorrelation can be evaluated in multidimensional data using the Mantel test (Manly 1998). This technique is widely used in ecology and evolutionary biology to evaluate if the (dis)similarity among samples (many metrics are available) is structured in geographic space. One of the advanced features of SAM’s implementation of the Mantel test is the ability to perform a Mantel correlogram, which separates the geographic space into sequential distance classes to aid the identification of changes in the strength of correlation between matrices (e.g. compositional similarity against a distance matrix) at different scales.

The problems of inflated type I error rates and model instability that may arise from violation of the assumption of residuals independence in ecological models are now well known (Legendre 1993, Schabenberger and Gotway 2005, Diniz-Filho et al. 2008, Cliff and Ord 2009). In SAM, this assumption can be easily checked by evaluating the spatial correlogram of regression residuals that is automatically calculated when the regressed data is spatially explicit. However, researchers have been gradually abandoning classical null-hypothesis testing when the actual goal of the analysis is to confront multiple competing hypotheses (Hilborne and Mangel 1997, Burnham and Anderson 2002). Instead, they have been adopting the information-theoretic approach to select the best model among a large set of competing models, or combining the most parsimonious models as a function of their rank. SAM performs model selection and multi-model inference employing the Akaike information criterion (AIC), which provides a parsimonious balance between model predictive power and complexity. Thus, when a set of competing explanatory variables is defined by the researcher, SAM evaluates models that emerge from all possible combinations of individual variables, and ranks them according to their AIC value and derived statistics (e.g. Akaike’s weights and delta AIC). In addition, when the goal is to estimate model parameters or to generate a single predictive model, a “multi-model” consensus is calculated by averaging and weighting the estimated model parameters as a function of their Akaike weights. Although this module is based on a standard OLS approach, spatial structure may be easily incorporated by adding spatial covariates as “fixed” predictors in the model selection procedure (Diniz-Filho et al. 2008).

When a matrix of explanatory variables represents two or more sets of competing hypotheses, it is possible to quantify the explanatory power due to individual sets of variables as well as the magnitude of redundancy between the sets. Partial regression analysis has been widely applied in spatial ecology to quantify how the total variation in a response variable can be attributed to the independent effects of 1) environmental variation not structured in space, 2) spatially structured environmental variation, 3) intrinsic spatially contagious processes, and 4) unexplained variation. The partial regression module in the current version of SAM allows users to define up to three sets of variables, which could be, for example, contemporary environmental factors (e.g. temperature), historical factors (e.g. mean root distance of a phylogenetic tree) and spatial covariates (e.g. polynomial expansions of geographic coordinates).

A strategy commonly employed to account for spatial autocorrelation in regression analysis is to explicitly incorporate in the model the spatial relationship between pairs of sites. The family of statistical techniques that employ this strategy is collectively known as autoregression, or spatial regression models (Dormann et al. 2007), because they require the estimation of the autoregressive parameter to measure the magnitude of autocorrelation in the data. There are several autoregressive (AR) models available in SAM, including: pure (PAR), lagged-response (LRAR), lagged-predictor (LPAR), simultaneous (SAR), conditional (CAR) and moving-average (MAAR) autoregression. In addition, researchers may also use a semi-variogram to define a variance-covariance matrix and incorporate the spatial structure in a Generalized Least Squares (GLS) model (a technique known as kriging regression).

Among the techniques available today for spatial regression, one of the most flexible and statistically powerful is spatial eigenvector mapping (SEVM, Borcard et al. 2004, Diniz-Filho and Bini 2005, Griffith and Peres-Neto 2006, Bini et al. 2009). SEVM comes in various flavors, depending on how the matrix of spatial relationships among pairs of observations is defined. The module implemented in SAM has been continuously improved, and the current version has three important new features: 1) it allows both binary connectivity and continuous distance matrices, 2) it provides additional ways to select eigenvectors, including the minimization of Moran’s I in model residuals, and 3) it can analyze both explanatory variables (e.g. environmental factors) and spatial eigenvectors simultaneously within the SEVM module, which enables the automated computation of a partial regression analysis between explanatory variables and spatial predictors.

Both spatial and non-spatial regression models actually require careful evaluation of the stationarity assumption (a lack of importance in the absolute geographical position for estimating model parameters), as violations of this assumption may lead to biases in estimated parameters (Fotheringham et al. 2002). For example, if the direction and magnitude of an ecological process shift from one region to another, the parameter estimated by a global stationary model weights the strength and direction of the process in both regions, which may lead to the conclusion that the process is globally irrelevant to the observed pattern. Thus, Geographically Weighted Regression (GWR), now implemented in SAM, is an important method because it allows users to evaluate possible violations of the stationarity assumption and to estimate geographically varying model parameters that may then be biologically interpretable (Cassemiro et al. 2007).

In terms of modeling, another new possibility in the recent version of SAM is to use presence-absence data to model species’ distributions, in the context of niche modeling or species distribution modeling (SDM) (Elith et al. 2006). Although SAM is not particularly designed to run the many different algorithms available for SDM, it now provides a routine for logistic regression that can be used for SDM when presence and absence data are available. This tool can be coupled with other richness analyses and allows a first evaluation of species’ environmental drivers and their relationship with other macroecological patterns (Terribile et al. 2009). Moreover, a spatial autologistic model is also available in SAM. This model uses the information on the relative position of the species occurrences to generate a spatial weighting covariate, and aims to improve the model’s predictive power by accounting for stochastic processes driving species distribution, such as species’ dispersal capacity (Segurado et al. 2006). Dormann et al. (2007) recently used artificial simulation to show that autologistic models sometimes underestimate the effect of environmental factors, although their analyses have been questioned by Betts et al. (2009).

The software and its user manual are available online at the SAM website: <www.ecoevol.ufg.br/sam> (<http://purl.oclc.org/sam/>).

To cite SAM or acknowledge its use, cite this Software note as follows, substituting the version of the application that you used for “Version 4”:

Rangel, T. F., Diniz-Filho, J. A. F. and Bini, L. M. 2010. SAM: a comprehensive application for Spatial Analysis in Macroecology. Ecography 33: 46–50, (Version 4).

Acknowledgements
We thank SAM users for the continuous stimulus to the SAM project. We are also grateful to all users that provided feedback on the software, especially bug reporting. TFR is funded by a CAPES/Fulbright fellowship, the Univ. of Connecticut and the National Science Foundation (DEB-0639979 and DBI-0851245). JAFD-F and LMB have been continuously funded by CNPq research fellowships.

References
Anselin, L. 1996. The Moran scatterplot as an ESDA tool to assess local instability in spatial association. In: Fischer, M. et al. (eds), Spatial analytical perspectives in GIS. Taylor and Francis, pp. 111–125.
Betts, G. M. et al. 2009. Comments on “Methods to account for spatial autocorrelation in the analysis of species distributional data: a review”. Ecography 32: 374–378.
Bini, L. M. et al. 2004. Macroecological explanations for differences in species richness gradients: a canonical analysis of South American birds. J. Biogeogr. 31: 1819–1827.
Bini, L. M. et al. 2009. Coefficient shifts in geographical ecology: an empirical evaluation of spatial and non-spatial regression. Ecography 32: 193–204.
Borcard, D. et al. 2004. Dissecting the spatial structure of ecological data at multiple spatial scales. Ecology 85: 1826–1832.
Burnham, K. P. and Anderson, D. R. 2002. Model selection and multimodel inference: a practical information-theoretic approach, 2nd ed. Springer.
Cassemiro, F. A. S. et al. 2007. Non-stationarity, diversity gradients and the metabolic theory of ecology. Global Ecol. Biogeogr. 16: 820–822.
Cliff, A. D. and Ord, J. K. 2009. What were we thinking? Geogr. Anal. 41: 351–363.
Diniz-Filho, J. A. F. and Bini, L. M. 2005. Modelling geographical patterns in species richness using eigenvector-based spatial filters. Global Ecol. Biogeogr. 14: 177–185.
Diniz-Filho, J. A. F. et al. 2008. Model selection and information theory in geographical ecology. Global Ecol. Biogeogr. 17: 479–488.
Dormann, C. F. et al. 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30: 609–628.
Dutilleul, P. 1993. Modifying the t-test for assessing the correlation between two spatial processes. Biometrics 49: 305–314.
Elith, J. et al. 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129–151.
Fotheringham, A. S. et al. 2002. Geographically weighted regression: the analysis of spatially varying relationships. Wiley.
Griffith, D. A. and Peres-Neto, P. R. 2006. Spatial modeling in ecology: the flexibility of eigenfunction spatial analysis. Ecology 87: 2603–2613.
Hawkins, B. A. et al. 2003. Energy, water, and broad-scale geographic patterns of species richness. Ecology 84: 3105–3117.
Hilborne, R. and Mangel, M. 1997. The ecological detective: confronting models with data. Princeton Univ. Press.
Legendre, P. 1993. Spatial autocorrelation: trouble or new paradigm? Ecology 74: 1659–1673.
Manly, B. F. J. 1998. Randomization, bootstrap and Monte Carlo methods in biology. Chapman and Hall.
Rangel, T. F. L. V. B. et al. 2006. Towards an integrated computational tool for spatial analysis in macroecology and biogeography. Global Ecol. Biogeogr. 15: 321–327.
Rosenberg, M. S. 2000. The Bearing correlogram: a new method of analyzing directional spatial autocorrelation. Geogr. Anal. 32: 267–278.
Schabenberger, O. and Gotway, C. A. 2005. Statistical methods for spatial data analysis. Chapman and Hall/CRC.
Segurado, P. et al. 2006. Consequences of spatial autocorrelation for niche-based models. J. Anim. Ecol. 43: 433–444.
Sokal, R. R. et al. 1998. Local spatial autocorrelation in biological variables. Biol. J. Linn. Soc. 65: 41–62.
Terribile, L. C. et al. 2009. Richness patterns, species distribution and the principle of extreme deconstruction. Global Ecol. Biogeogr. 18: 123–136.
Tukey, J. W. 1980. We need both exploratory and confirmatory. Am. Stat. 34: 23–25.