R Pedology 3
D.E. Beaudette
Dept. Land, Air and Water Resources
University of California, Davis
Abstract
Soils are routinely sampled and characterized according to genetic horizons (layers), resulting in
data that are associated with three principal dimensions: location (x,y), depth (z), and property space (p).
The high dimensionality and grouped nature of this type of data can complicate standard analysis,
summarization, and visualization. The aqp package was developed to address some of these issues, as
well as provide a useful framework for the advancement of quantitative studies in soil genesis, geography,
and classification.
1 Background
The soils of the world support a wide range of natural ecosystems, agricultural production, industrial
processes, and the largest surficial carbon pool (Schlesinger, 1997). The rise and fall of past civilizations
can be directly linked to the use and misuse of the soil resource (Hillel, 1998). A staggering quantity of soils
information has been collected over the last 100 years, yet these data are often underutilized due to the
sheer volume and complex structure. We have developed an R package that supports the interpretation
of massive soils databases through numerical extensions to traditional methods of visualizing, aggregating,
and classifying soils information. Further development of these numerical analogues will provide a new
set of quantitative tools that soil scientists and surveyors can use in conjunction with well-established,
qualitative methods.
Soil science is an integrative approach to understanding surficial processes that includes concepts from
several disciplines (Buol et al., 2003). Pedology, one of several branches of soil science, is the study of the
genesis, morphology, classification, and geography of soils. Soil profiles are usually described, sampled, and
characterized by genetic horizons (“layers” defined by morphology and usually associated with an inferred
process), extending from the surface to a lower boundary determined by bedrock contact or to a depth
of 150-200 cm (Soil Survey Division Staff, 1993). The stratigraphy and morphology of soil horizons are
usually the first data that the soil scientist uses to qualitatively classify a soil: i.e. degree of alteration
relative to the parent material (Figure 1(a)), expression of oxidized or reduced forms of iron (Figure 1(b)),
accumulation of organic matter, or evidence of cyclical deposition of new material (Figure 1(c)).
Figure 1: Examples of soil profiles (panels a, b, and c) illustrating how horizons change with depth. Color, texture, structure,
and root abundance are common visual indicators of near-surface processes in soil.
Hans Jenny was one of the first researchers to advocate a semi-quantitative theory of soil genesis, in
which he described the “factors of soil formation” concept (Jenny, 1941). This novel approach is based on
the expression:
S = f(cl, o, r, p, t) (1)
where S represents a branch within a soil classification system, a collection of soil properties associated
with a soil profile or a single layer (horizon). The parameters within the “clorpt” framework are: cl
representing a climate factor, o representing an organic factor, r representing a relief factor, p representing
a parent material factor, and t representing time. The S term in equation 1 can be modeled as a matrix of
soil properties (columns) associated with either genetic horizons or regular depth-slices (rows), occurring
at some location in space. While the “clorpt” model is a useful construct for understanding how soil
genesis might proceed, quantitative evaluation is usually not possible because of complex interaction and
possible feedback mechanisms between terms on the right-hand side of the expression (Huggett, 1975).
The left-hand side of the expression, S, is especially difficult to quantitatively define when it describes
a collection of soil horizons and properties. The magnitude of measured properties, correlation between
properties, and trends with depth are all critical elements of how a soil profile is interpreted as a whole
(Arkley, 1976).
Several mature systems exist for the classification of soil profiles: Soil Taxonomy, World Reference
Base, Australian Soil Classification, etc. (Buol et al., 2003). Each system is based on current knowledge
of soil genesis, manifestation of specific processes in the form of field or lab measured properties, and
region-specific land use limitations. Most soil classification systems seek to accommodate the (potential)
global variability of soils (including Soil Taxonomy and World Reference Base), while others are tailored
to region-specific soil variability. Soil Taxonomy (Soil Survey Staff, 1999) provides a rich vocabulary for
grouping soils into several levels of a hierarchy based on established land-use limitations and our current
knowledge of soil genesis. However, Soil Taxonomy does not currently define an approach for numerically
describing the difference between soils. There has been limited work on purely numerical systems of soil
classification (Rayner, 1966; Moore and Russell, 1967; Moore et al., 1972; Little and Ross, 1985; Dale et al.,
1989), and several authors have suggested the potential merit of such an approach (Webster, 1968; Arkley,
1976; Minasny and McBratney, 2007; Carré and Jacobson, 2009). In particular, Young and Hammer (2000)
suggested that fine-scale soil variability is more adequately captured by numerical classification as opposed
to Soil Taxonomy. To date, numerical soil classification methods are rarely employed outside of case
studies presented within scientific journals. A numerical classification system could potentially be used to
bridge national taxonomic systems (e.g. Soil Taxonomy and the Australian Soil Classification system) based
solely on soil physical and chemical processes. Alternative classification schemes could be generated from
the same underlying data, but directed towards specific goals, by selecting which variables and dissimilarity
metric are used. For example, soil texture, organic matter content, and aggregate stability information
(weighted such that near-surface horizons contribute more to the final classification) could be used to
generate a classification scheme supporting erosion prevention.
[Figure 2 image: sketches of nine soil profiles (P001 to P009), labeled with horizon designations, on a depth axis from 0 to 250 cm.]
Figure 2: Visualization of nine soil profiles from Pinnacles National Monument, CA; colored by RGB
representations of field-described dry colors.
D65¹ illuminant (Equation 3), conversion from XYZ (D65 illuminant) to rgb (Equation 4), and scaling of rgb
values to conform to a specific gamma value (Equation 5).
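\begin{equation}
\begin{aligned}
X &= \frac{xY}{y} \\
Y &= Y \\
Z &= \frac{Y(1 - x - y)}{y}
\end{aligned}
\tag{2}
\end{equation}

\begin{equation}
\begin{bmatrix} X_{D65} & Y_{D65} & Z_{D65} \end{bmatrix}
=
\begin{bmatrix} X & Y & Z \end{bmatrix}
\begin{bmatrix}
 0.990448 & -0.012371 & -0.003564 \\
-0.007168 &  1.015594 &  0.006770 \\
-0.011615 & -0.002928 &  0.918157
\end{bmatrix}
\tag{3}
\end{equation}

\begin{equation}
\begin{bmatrix} r & g & b \end{bmatrix}
=
\begin{bmatrix} X_{D65} & Y_{D65} & Z_{D65} \end{bmatrix}
\begin{bmatrix}
 3.24071  & -0.969258  &  0.0556352 \\
-1.53726  &  1.87599   & -0.203996 \\
-0.498571 &  0.0415557 &  1.05707
\end{bmatrix}
\tag{4}
\end{equation}

\begin{equation}
R, G, B =
\begin{cases}
12.92 \times \{r, g, b\} & : r, g, b \le 0.0031308 \\
1.055 \times \{r, g, b\}^{(1.0/2.4)} - 0.055 & : r, g, b > 0.0031308
\end{cases}
\tag{5}
\end{equation}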
¹ Most R plotting functions, and computer monitors in general, use the sRGB color profile, which assumes a D65 illuminant.
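The conversion chain in Equations 2 to 5 can be sketched in a few lines of base R. This is an illustrative sketch only: the function name xyY_to_srgb() is hypothetical (it is not part of the aqp package), and the clamping of linear rgb values to the range 0 to 1 is an added assumption that is not stated in the equations.

```r
# Sketch of Equations 2-5: xyY -> XYZ -> XYZ (D65) -> linear rgb -> gamma-scaled RGB.
# xyY_to_srgb() is a hypothetical helper, not an aqp function.
xyY_to_srgb <- function(x, y, Y) {
  # Equation 2: xyY to XYZ
  XYZ <- c(x * Y / y, Y, Y * (1 - x - y) / y)

  # Equation 3: adaptation to the D65 illuminant (row vector times matrix)
  M.adapt <- matrix(c(
     0.990448, -0.012371, -0.003564,
    -0.007168,  1.015594,  0.006770,
    -0.011615, -0.002928,  0.918157
  ), nrow = 3, byrow = TRUE)
  XYZ.D65 <- XYZ %*% M.adapt

  # Equation 4: XYZ (D65) to linear rgb
  M.rgb <- matrix(c(
     3.24071,  -0.969258,   0.0556352,
    -1.53726,   1.87599,   -0.203996,
    -0.498571,  0.0415557,  1.05707
  ), nrow = 3, byrow = TRUE)
  rgb.lin <- as.vector(XYZ.D65 %*% M.rgb)

  # clamp to [0, 1]: an assumption added here, not part of Equations 2-5
  rgb.lin <- pmin(pmax(rgb.lin, 0), 1)

  # Equation 5: piecewise gamma scaling
  ifelse(rgb.lin <= 0.0031308,
         12.92 * rgb.lin,
         1.055 * rgb.lin^(1 / 2.4) - 0.055)
}

# example: a neutral color near the illuminant C white point
res <- xyY_to_srgb(x = 0.3101, y = 0.3162, Y = 0.5)
```

The three returned values lie between 0 and 1 and can be passed to the base R rgb() function to obtain a color specification for plotting.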
[Figure 3 image: y-coordinate (xyY colorspace) plotted against Munsell chroma (1 to 8), in panels by Munsell value (1, 3, 5, 7), for Munsell hues 5YR, 7.5YR, 10YR, and 2.5Y.]
Figure 3: Relationship between Munsell chroma and y-coordinate (xyY colorspace), for selected Munsell
hue (defined by line color and point symbol) and Munsell value (panels). Points at even-numbered Munsell
chroma values were derived from the MCSL spectral database. Points at odd-numbered Munsell chroma
values were estimated by spline interpolation.
[Figure 4 image: schematic of the aggregation procedure; summary statistics are computed by segment.]
Figure 4: Demonstration of the soil profile aggregation algorithm, for a single soil property, aggregated
along a regular sequence of depth slices.
in the previous section can be extended to re-align soils data (collected by genetic horizon) onto a common
depth-basis. The format_slices() function is provided to re-format the resulting “sliced” data into a list
of sp class elements, suitable for mapping or modeling tasks.
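The idea of re-aligning horizon data onto a common depth basis, and then summarizing by slice, can be illustrated with base R. The sketch below uses made-up data and a hypothetical helper, slice_profile(); it is not the implementation used within the aqp package, and it assumes 1-cm slices and integer horizon depths.

```r
# toy horizon data for two profiles: depths in cm, 'prop' is any numeric property
# (all values are made up for illustration)
hz <- data.frame(
  id     = c('A', 'A', 'A', 'B', 'B'),
  top    = c(0, 10, 30, 0, 20),
  bottom = c(10, 30, 50, 20, 40),
  prop   = c(12, 18, 25, 8, 10)
)

# slice_profile() is a hypothetical helper, not an aqp function:
# repeat each horizon's value once for every 1-cm slice that the horizon spans
slice_profile <- function(h) {
  n <- h$bottom - h$top
  data.frame(
    id    = rep(h$id, times = n),
    slice = unlist(mapply(function(t, b) seq(t, b - 1),
                          h$top, h$bottom, SIMPLIFY = FALSE)),
    prop  = rep(h$prop, times = n)
  )
}

# slice each profile, then stack the results
sliced <- do.call(rbind, lapply(split(hz, hz$id), slice_profile))

# summary statistics by slice, computed across profiles
slice.mean <- tapply(sliced$prop, sliced$slice, mean)
slice.n    <- tapply(sliced$prop, sliced$slice, length)
```

Slices below the bottom of the shallower profile are summarized from fewer observations, which is why the number of contributing profiles (slice.n) is tracked alongside the mean.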
where n is the number of slices, w_i is an optional weighting coefficient, and G( ) is Gower’s generalized
dissimilarity metric (Gower, 1971).
The algorithm (implemented in the profile_compare() function) represents a compromise between the
way soils are commonly described and sampled (by genetic horizon type) and a normalized basis for the
comparison of measured properties (depth slice). Gower’s generalized dissimilarity metric is available via
the daisy() function (from the cluster package). This metric can be evaluated from binary, categorical,
and continuous variables, and can accommodate limited occurrences of missing observations (Kaufman
and Rousseeuw, 2005). Between-profile dissimilarities are computed to a user-specified maximum depth.
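A minimal sketch of the slice-wise calculation follows, assuming that the between-profile dissimilarity is the weighted sum of per-slice Gower dissimilarities, as the definitions of n, w_i, and G( ) suggest. The data are made up, equal weights are used, and the code is not the profile_compare() implementation.

```r
library(cluster)

# toy data: 3 profiles x 4 depth slices, two continuous properties (made-up values)
sliced <- data.frame(
  id    = rep(c('P1', 'P2', 'P3'), each = 4),
  slice = rep(1:4, times = 3),
  clay  = c(10, 12, 20, 25,  11, 13, 22, 24,  30, 35, 40, 45),
  ph    = c(6.0, 6.1, 6.3, 6.5,  5.9, 6.0, 6.4, 6.6,  5.0, 5.1, 5.2, 5.3)
)

# optional slice weights w_i; equal weights in this sketch
w <- rep(1, times = 4)

# Gower dissimilarity between all profiles, one matrix per depth slice
d.slices <- lapply(split(sliced, sliced$slice), function(s) {
  rownames(s) <- s$id
  as.matrix(daisy(s[, c('clay', 'ph')], metric = 'gower'))
})

# weighted sum over slices gives the total between-profile dissimilarity
D <- as.dist(Reduce(`+`, Map(`*`, d.slices, w)))
```

In this toy example profiles P1 and P2 have similar clay and pH values in every slice, so their total dissimilarity is much smaller than the dissimilarity between either of them and P3.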
[Figure 5 image: weighting coefficient as a function of depth (cm), with one curve for each value of k: 0, 0.005, 0.01, 0.02, 0.05, and 0.1.]
[Figure 6 image: dendrograms of the ten sp3 profiles for increasing values of k.]
Figure 6: Effect of adjusting the depth weighting (k) parameter from 0 to 0.1 on soil profile grouping.
Cophenetic correlation coefficients are printed below k values. “Average linkage” agglomerative clustering
was used to build dendrograms.
k and d are > 0. The decay rate parameter (k) determines how rapidly a slice’s dissimilarity value is
down-weighted with depth: a value of 0.1 would effectively remove any influence of dissimilarities computed
below about 30 cm, and a value of 0 would weight all slices equally (Figure 5(b)). The actual value for
k should be determined as objectively as possible, i.e. with a combination of knowledge about expected
vertical anisotropy and a metric such as the cophenetic correlation coefficient (Sneath and Sokal, 1973).
Within the sample dataset sp3, the effect of incrementing k from 0 to 0.1 on the resulting agglomerative
(“average” method) clustering is demonstrated in Figure 6. For this dataset, lower levels of k result in
better agreement (larger cophenetic correlation coefficients) between the dissimilarity matrix and grouping
defined by agglomerative clustering. The highest cophenetic correlation coefficient is encountered when k
= 0.01, close to the depth weighting values suggested by Russell and Moore (1968).
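The behavior of the depth-weighting parameter, and the calculation of a cophenetic correlation coefficient, can be sketched with base R. The exponential form exp(-k * d) used below is an assumption: it is consistent with the description of k as a decay rate (negligible weight below about 30 cm when k = 0.1, equal weights when k = 0), but the weighting function itself is not shown in this section. The cophenetic correlation is demonstrated on the USArrests dataset bundled with R, not on soils data.

```r
# depth weights, ASSUMING an exponential decay of the form exp(-k * d)
depth    <- 1:100
k.values <- c(0, 0.005, 0.01, 0.02, 0.05, 0.1)
w <- sapply(k.values, function(k) exp(-k * depth))
colnames(w) <- k.values

# weight at 30 cm when k = 0.1, relative to a weight of 1 at the surface
w.30 <- w[30, '0.1']

# cophenetic correlation: agreement between a dissimilarity matrix and the
# "average linkage" dendrogram built from it (USArrests is a stand-in dataset)
d    <- dist(USArrests)
h    <- hclust(d, method = 'average')
coph <- cor(d, cophenetic(h))
```

Repeating the clustering for each candidate k and comparing the resulting cophenetic correlation coefficients is one way to make the choice of k less subjective.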
Variable soil depth can interfere significantly with the calculation of between-profile dissimilarity. For
example, how should the dissimilarity between two profiles be computed at a given depth when one of the profiles
is shallower than that depth? When soil depths are not well defined (e.g. alluvial soils excavated with
different tools), a lower limit to the depth-wise dissimilarity calculation should be sufficient. When the soils
in question have been described down to a natural lower boundary (e.g. bedrock, root-restricting layer,
etc.), the dissimilarity between soil and non-soil material should be incorporated into the final dissimilarity
between profiles. As currently implemented in the function daisy() (Kaufman and Rousseeuw, 2005),
Gower’s dissimilarity metric is undefined when one of the two inputs is missing. Therefore, when a 25
cm deep profile is compared with a 50 cm deep profile, pair-wise dissimilarities are only accumulated for
the first 25 cm of soil (dissimilarities from 26 - 50 cm are NULL). When summed, the total dissimilarity
between these profiles will generally be much lower than if the profiles had been of equal depth.
Our algorithm has an option (setting replace_na=TRUE) to replace NULL dissimilarity with the maximum
dissimilarity between any pair of profiles for the current depth slice (Figure 7). In this way, the
dissimilarity between a slice of soil and a corresponding slice of non-soil reflects the fact that these two
materials should be treated very differently (i.e. maximum dissimilarity). This alternative calculation of
dissimilarity between soil and non-soil slices solves the problem of comparing shallow profiles with deeper
profiles. However, it can result in a new problem: dissimilarity calculated between two shallow profiles
will be erroneously inflated beyond the extent of either profile’s depth when deeper profiles exist in the
collection. Our algorithm has an additional option (setting add_soil_flag=TRUE) that will preserve NULL
distances between slices when both slices represent non-soil material (Figure 7). With this option enabled,
two shallow profiles will only accumulate mutual dissimilarity to the depth of the deeper of the two.
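The logic of the two options can be illustrated for a single depth slice. The sketch below is a simplified stand-in for what replace_na and add_soil_flag are described as doing; it is not the package code, the dissimilarity values are made up, and missing values are represented with NA.

```r
# one depth slice, five profiles; D and E are shallower than this slice (non-soil)
is.soil <- c(A = TRUE, B = TRUE, C = TRUE, D = FALSE, E = FALSE)
ids <- names(is.soil)

# pair-wise dissimilarities for this slice; undefined (NA) when either profile is missing
m <- matrix(NA_real_, 5, 5, dimnames = list(ids, ids))
diag(m) <- 0
m['A', 'B'] <- m['B', 'A'] <- 0.2
m['A', 'C'] <- m['C', 'A'] <- 0.7
m['B', 'C'] <- m['C', 'B'] <- 0.5

# replace_na-like step: missing dissimilarities become the slice-wise maximum
m.replaced <- m
m.replaced[is.na(m.replaced)] <- max(m, na.rm = TRUE)

# add_soil_flag-like step: restore NA where BOTH profiles are non-soil
both.non.soil <- outer(!is.soil, !is.soil, `&`)
diag(both.non.soil) <- FALSE
m.flagged <- m.replaced
m.flagged[both.non.soil] <- NA
```

After the first step the soil versus non-soil pairs (for example A and D) carry the maximum dissimilarity of the slice, but so does the non-soil pair D and E; the second step removes that artificial contribution.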
[Figure 7 image: three dendrograms of the ten sp3 profiles.]
Figure 7: Effect of enabling the replace_na and add_soil_flag options on between-sample dissimilarity
when k = 0.01, and max_d = 100 cm. Cophenetic correlation coefficients are printed below options.
“Average linkage” agglomerative clustering was used to build dendrograms.
[Figure 8 image: four dendrograms of the ten sp3 profiles. Panels: Every Depth Slice (C = 0.96), Every 2nd Depth Slice (C = 0.96), Every 10th Depth Slice (C = 0.94), Every 20th Depth Slice (C = 0.85).]
Figure 8: Effect of adjusting the sample_interval parameter from 1 to 20 on soil profile grouping, when
k = 0.01, and max_d = 100 cm. Cophenetic correlation coefficients are printed below options. “Average
linkage” agglomerative clustering was used to build dendrograms.
For massive collections of soil profiles, the sample_interval argument to profile_compare() can be
used to reduce memory consumption by computing pair-wise dissimilarities every n slices. For example,
the comparison of 1,000 soil profiles, each with 5 variables, to a maximum depth of 50 cm requires 192.3 MB
of RAM for the storage of the entire dissimilarity matrix (all depth slices) and takes about 70 seconds to
perform (1.3 GHz Intel CPU). Computing dissimilarity values every 5 slices reduces memory consumption to
one fifth of the original size (38.5 MB) and processing time by a factor of about 3 (22 seconds). Within the sample
dataset sp3, larger sample_interval values result in lower total dissimilarity values, minor differences in
grouping structure, and minor reduction in cophenetic correlation coefficients, up to a sampling interval
of every 10th slice (Figure 8). However, the specific threshold defining a reasonable trade-off between
computational efficiency and preservation of detail will depend on the input dataset, available computing
resources, and the purpose of the analysis. An optimized version of profile_compare() that uses file-based
storage for the collection of dissimilarity matrices is currently in development.
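The memory figures above can be approximated with simple arithmetic. The sketch below assumes that only the lower triangle of each slice-wise dissimilarity matrix is stored, as 8-byte double-precision values; both are assumptions, and the small difference from the reported 192.3 MB presumably reflects overhead that is not modeled here.

```r
# rough memory estimate for one dissimilarity vector per depth slice,
# ASSUMING 8-byte doubles and lower-triangle storage
n.profiles <- 1000
n.slices   <- 50
n.pairs    <- n.profiles * (n.profiles - 1) / 2

# all slices vs. every 5th slice, in MB (1 MB = 1024^2 bytes)
mb.all    <- n.pairs * 8 * n.slices / 1024^2
mb.every5 <- n.pairs * 8 * (n.slices / 5) / 1024^2
```

Storage grows with the square of the number of profiles but only linearly with the number of slices, so sampling every 5th slice reduces the estimate by a factor of five regardless of collection size.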
Selection of variables included in the dissimilarity calculation, dissimilarity metric, depth-weighting
coefficient, replacement of NULL distances, and grouping criteria all affect the output of this algorithm
and require further evaluation. Ideally, variables should be selected to accommodate the type of grouping
that is most appropriate for the task at hand. For example, a classification reflecting basic parameters of
soil development could be built from physical and chemical properties (particle size, pH, CEC, %BS, soil
color, etc.) whereas a classification geared towards soil management decisions could be built from other
properties (horizon morphology, bulk density, soil depth, etc.).
with substances containing large quantities of amorphous, poorly crystalline, or micro-crystalline (i.e. clay
mineral) phases (Bish, 1994).
The Observed Patterns method is based on empirical mixture modeling, rather than fundamental
principles, and thus requires less detailed characterization of the sample (Bish, 1994). A simulated pattern
is fit to the unknown pattern (all previously normalized to an internal standard) by iteratively
estimating the fractions of phases that are present within the unknown sample. Proportions are estimated
by minimizing the sum of absolute differences between the unknown pattern and the composite pattern at
each 2θ value:
\begin{equation}
D(2\theta) = \sum \left| I_u(2\theta) - W_p \, I_p(2\theta) \right|
\tag{9}
\end{equation}
where I_u(2θ) is the intensity of the unknown sample at each 2θ value, W_p is the proportion of phase p, and
I_p(2θ) is the intensity of phase p at each 2θ value. This approach requires knowledge of the main mineral
phases within the unknown, pure standards of those phases, and that all analyses be conducted under the
same operating conditions. If pure standards cannot be obtained, it is possible to simulate patterns for
those phases from the Powder Diffraction File. The accuracy of this method can be greatly improved when
pure standards are available that closely match the crystal size and composition of the phases present in
the unknown sample.
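The fitting step can be sketched for the simplest case of two phases. Everything below is synthetic: the diffraction patterns are made-up Gaussian peaks, the proportions are assumed to sum to 1 (so that a single parameter is estimated), and a one-dimensional search with optimize() stands in for the iterative estimation described above.

```r
# synthetic diffraction patterns for two hypothetical phases (Gaussian peaks)
two.theta <- seq(5, 40, by = 0.05)
peak <- function(center, width = 0.3) {
  exp(-(two.theta - center)^2 / (2 * width^2))
}
I.p1 <- peak(12) + 0.5 * peak(25)
I.p2 <- peak(20) + 0.8 * peak(33)

# "unknown" pattern: 30% phase 1 and 70% phase 2
I.u <- 0.3 * I.p1 + 0.7 * I.p2

# sum of absolute differences between the unknown and composite patterns,
# ASSUMING the two proportions sum to 1
D <- function(W1) {
  sum(abs(I.u - (W1 * I.p1 + (1 - W1) * I.p2)))
}

# search for the proportion of phase 1 that minimizes D
fit    <- optimize(D, interval = c(0, 1))
W1.hat <- fit$minimum
```

With noise-free synthetic data the estimate recovers the known proportion of phase 1 (0.3) to within the tolerance of the search; with measured patterns and more than two phases, a multi-parameter optimizer with constraints on the proportions would be required.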
Listing 1: Setup the environment and load an example dataset.
# setup environment
library(aqp) ; data(sp3)
# generate surrogate horizon names from clay / CEC / pH
sp3$name <- paste(round(sp3$clay), '/', round(sp3$cec), '/', round(sp3$ph,1))
# color conversion
sp3$soil_color <- with(sp3, munsell2rgb(hue, value, chroma))
Listing 2: Compute between-profile dissimilarity and build a dendrogram from the result using divisive
hierarchical clustering.
# load required libraries
require(ape) ; require(cluster)
# perform comparison of profiles
d <- profile_compare(sp3, vars=c('clay','cec','ph'), max_d=100, k=0.01)
h <- diana(d)
p <- ladderize(as.phylo(as.hclust(h)))
# convert soil data into ProfileList object for plotting
sp3.list <- initProfileList(sp3)
A new plot of the dendrogram is generated with the standard plot method for ape class objects; adjustments
are made in order to accommodate sketches of the soil profiles below (Figure 9). Information on the ordering
of soil profiles is extracted from the special last_plot.phylo object, and used to position profile sketches
below corresponding terminal nodes of the dendrogram. Finally, soil profile sketches are generated by the
profile_plot() function, applied to a SoilProfileList class object (Figure 9). If so desired, alternative
depth-function plots could be inserted below their corresponding “leaves” of the dendrogram, e.g. particle
size information, principal component scores, etc.
The results of this numerical classification (Figure 9) match field observation of soil properties, and
expected differences between major lithologic types. Profiles 1-4 were collected from soils formed on
metavolcanic rocks of varying iron content, with higher clay and pH values found on rocks with the highest
iron content (profiles 2-4). Profile 5 was collected from a soil formed on metasedimentary rock, with high
clay content and much lower pH values. Profiles 6-10 were collected from soils with low clay content
and slightly higher pH values formed on granodiorite. Slightly higher clay contents and an increasing pH
depth-function differentiate profiles 7-9 (swale position) from profiles 6 & 10 (backslope position). General
patterns in soil color mirror the 3 groups identified within the clustering: deep red colors found in group
1 (high-iron soils from metavolcanic rocks) and group 2 (metasedimentary rocks), gray to brown colors
found in the swale position of group 3, and the lighter, more yellow colors found on the backslope position
(Figure 9).
According to branching within the dendrogram (Figure 9), the metasedimentary-soil appears to be
most similar to the metavolcanic-soil group. Inspection of the dissimilarity matrix reveals that this soil is
approximately 31% similar to the soils of the metavolcanic group and only 9% similar to the soils of the
granodiorite group.
Next, depth-slice aggregation of cec and clay values is performed by calling the soil.slot() function
for each of the three major groups identified via cluster analysis. Depth-slice aggregation of pH values is
applied to groups defined by cutting the dendrogram at a lower level, such that the granodiorite group is
split according to hillslope position (Figure 9). The ddply() function (plyr package) is simplest to use;
however, the by() and do.call() functions could have been used as well. Visualization of the depth-wise
trends and uncertainty (+/- 1 standard deviation) is performed with the custom lattice panel function
panel.depth_function() (Figure 10).
Listing 3: Plot the dendrogram with soil profile sketches below.
par(mar=c(1,1,1,1))
p.plot <- plot(p, cex=0.8, label.offset=-3, direction='up', y.lim=c(60,-2),
               x.lim=c(1,sp3.list$num_profiles+1), show.tip.label=FALSE)
tiplabels(col=c(1,2,4)[cutree(as.hclust(p), 3)],
          pch=c(15,15,15,16)[cutree(as.hclust(p), 4)], cex=2)
# get the last plot geometry
lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)
# vector of indices for plotting soil profiles below leaves of dendrogram
new_order <- sapply(1:lastPP$Ntip,
                    function(i) which(as.integer(lastPP$xx[1:lastPP$Ntip]) == i))
# plot the profiles, in the ordering defined by the dendrogram
# with a couple fudge factors to make them fit
profile_plot(sp3.list, color="soil_color", plot.order=new_order,
             scaling.factor=0.3, width=0.1, cex.names=0.65,
             y.offset=max(lastPP$yy)+8, add=TRUE)
# add a legend
legend(0.4, -2, legend=c('metavolcanic rocks', 'metasedimentary rocks',
                         'granodiorite: backslope', 'granodiorite: swale'),
       col=c(1,2,4,4), pch=c(15,15,15,16), bty='n', cex=1)
Listing 4: Compute a normalized similarity between a single profile and all others within the collection.
# get groups from above and leave out soil number 5
groups <- factor(cutree(as.hclust(p), 3)[-5],
                 labels=c('metavolcanic','granodiorite'))
# using dissimilarity matrix from above,
# subset soil number 5 vs. all others
d.5 <- as.matrix(d)[5, -5]
# normalized similarity = 1 - ( dissimilarity / max(dissimilarity) )
1 - round(tapply(d.5, groups, mean) / max(d), 2)
# metavolcanic granodiorite
#         0.31         0.09
Aggregation of soil profile information gives an indication of group-wise central tendency and an empirical
estimate of variability (Figure 10). Clay content (Figure 10(a)) and CEC values (Figure 10(b))
are highest within the metavolcanic-soils, with a marked but highly variable increase at 60-80 cm in depth.
CEC values are lowest in the granitic-soils and show very low variability with depth. The metasedimentary-soil
group lies closer to the metavolcanic-soils, and additional observations (required to compute depth-wise
variability) would assist with further interpretation. Visualization of aggregate soils information can also
aid interpretation of the results from the previous classification. Of the three characteristics supplied to
the profile_compare() function (clay content, CEC, and pH), the distribution of CEC values and clay
content with depth appears to be the most important factor contributing to differences between groups
(Figures 10(a) and 10(b)). Diverging pH depth trends (Figure 10(c)) differentiate the two sub-groups
identified within the granitic-soils (backslope vs. swale hillslope position).
[Figure 9 image: dendrogram with soil profile sketches below the terminal nodes, ordered 2, 4, 3, 1, 5, 7, 9, 8, 6, 10; depth axis 0 to 120 cm. Legend: metavolcanic rocks, metasedimentary rocks, granodiorite: backslope, granodiorite: swale.]
Figure 9: Divisive hierarchical clustering of soil profiles from the sp3 sample dataset. Tip colors represent
group membership defined by cutting the dendrogram into three classes, and are labeled according to
underlying rock type. Horizon names have been substituted with: “clay / CEC / pH”.
5 Concluding Remarks
The examples presented in the previous sections represent only a handful of the functions within the aqp
package. Several additional functions are included that can be used to format and display depth slices of
soils information according to spatial coordinates. A random_profile() function is included to simulate
soil profile data, for the development and testing of aggregation and classification algorithms. The bundled
documentation includes extensive, annotated examples based on three sample soils datasets. Examples
presented in this chapter were based on a small number of soil profiles for clarity. However, functions in
the aqp package have been successfully applied to studies involving several thousand soil profiles. Stable
versions of the aqp package are hosted on CRAN², and the active development version of the aqp package will
continue to be hosted on R-Forge³.
6 Acknowledgments
Portions of this research were funded by the Kearney Foundation of Soil Science. I would like to
thank Dr. Brent Myers for providing thoughtful commentary on several of the ideas presented in this
paper.
² http://cran.r-project.org/web/packages/aqp/
³ http://aqp.r-forge.r-project.org/
Listing 5: Aggregate soil property information according to groups identified through divisive hierarchical
clustering.
require(plyr) ; require(lattice)
# note that this example only illustrates a single iteration of the steps outlined above
# split data into 3 major classes (following rock type)
g <- factor(cutree(as.hclust(p), 3), labels=c('metavolcanic rocks',
            'metasedimentary rocks', 'granodiorite'))
g <- data.frame(group=g, id=factor(names(g)))
# combine groups with original dataframe
sp3.new <- merge(sp3, g, by='id')
sp3.new$prop <- sp3.new$cec
# perform aggregation, by group
a <- ddply(sp3.new, .(group), .fun=soil.slot)
# manually add mean +/- SD to the result
a$upper <- with(a, p.mean+p.sd)
a$lower <- with(a, p.mean-p.sd)
Listing 6: Plot aggregate soil property data. Note that the following code listing corresponds to Figure 10(b).
# use custom plotting function for uncertainty viz.
xyplot(
  top ~ p.mean, data=a, groups=group, subscripts=TRUE,
  lower=a$lower, upper=a$upper, ylim=c(100,-5), alpha=0.3,
  ylab='Depth (cm)', xlab='CEC (cmol(+) / kg soil)',
  panel=panel.depth_function,
  auto.key=list(lines=TRUE, points=FALSE, columns=2,
                title='Soil Profile Group', cex=0.75, size=4, between=1),
  par.settings=list(superpose.line=list(col=c(1,2,4), lty=1))
)
References
Arkley, R. J., 1976. Statistical methods in soil classification research. p. 37–69. In N. C. Brady (ed.)
Advances in Agronomy. Academic Press, New York, NY.
Bish, D., 1994. Quantitative X-ray diffraction analysis of soil. SSSA Miscellaneous Publication 9, p. 267–295.
In J. Amonette and L. Zelazny (ed.) Quantitative Methods in Soil Mineralogy. Soil Science Society of
America, Inc.
Buol, S. W., R. C. Graham, P. A. McDaniel, and R. J. Southard, 2003. Soil Genesis and Classification.
5th ed. Iowa State Press, Ames, IA.
Carré, F. and M. Jacobson, 2009. Numerical classification of soil profile data using distance metrics.
Geoderma 148:336–345.
Dale, M., A. McBratney, and J. Russell, 1989. On the role of expert systems and numerical taxonomy in
soil classification. European Journal of Soil Science 40:223–234.
Gower, J. C., 1971. A general coefficient of similarity and some of its properties. Biometrics 27:857–871.
ISSN 0006341X.
Greene-Kelly, R., 1953. The identification of montmorillonite in clays. Journal of Soil Science 4:233–237.
Huggett, R., 1975. Soil landscape systems: A model of soil genesis. Geoderma 13:1–22. ISSN 0016-7061.
Hughes, R., D. More, and H. Glass, 1994. Qualitative and quantitative analysis of clay minerals in soils.
p. 330–359. In J. Amonette and L. Zelazny (ed.) Quantitative X-Ray Diffraction Analysis of Soil. Soil
Science Society of America.
Kaufman, L. and P. J. Rousseeuw, 2005. Finding Groups in Data: An Introduction to Cluster Analysis.
Wiley-Interscience.
Little, I. and D. Ross, 1985. The Levenshtein metric, a new means for soil classification tested by data from
a sand-podzol chronosequence and evaluated by discriminant function analysis. Australian Journal of
Soil Research 23:115–130.
Maechler, M., P. Rousseeuw, A. Struyf, and M. Hubert, 2005. Cluster analysis basics and extensions. See
the ’Changelog’ file (in the package source).
Minasny, B. and A. B. McBratney, 2007. Incorporating taxonomic distance into spatial prediction and
digital mapping of soil classes. Geoderma 142:285–293.
Moore, A. and J. Russell, 1967. Comparison of coefficients and grouping procedures in numerical analysis
of soil trace element data. Geoderma 1:139–158.
Moore, A., J. Russell, and W. Ward, 1972. Numerical analysis of soils: A comparison of three soil profile
models with field classification. Journal of Soil Science 23:194–209.
Paradis, E., J. Claude, and K. Strimmer, 2004. Ape: Analyses of phylogenetics and evolution in R language.
Bioinformatics 20:289–290.
R Development Core Team, 2006. R: A Language and Environment for Statistical Computing. R Founda-
tion for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Rayner, J., 1966. Classification of soils by numerical methods. Journal of Soil Science 17:79–92.
Russell, J. and A. Moore, 1968. Comparison of different depth weightings in the numerical analysis of
anisotropic soil profile data. In Proceedings of the 9th International Congress of Soil Science, volume 4,
p. 205–213.
Schlesinger, W. H., 1997. Biogeochemistry: An Analysis of Global Change. 2nd ed. Academic Press, New
York, NY.
Sneath, P. H. A. and R. R. Sokal, 1973. Numerical Taxonomy. W.H. Freeman and Company. Examples
related to soils on page 439.
Soil Survey Division Staff, 1993. Soil Survey Manual. U.S. Department of Agriculture Handbook 18. United
States Department of Agriculture.
Soil Survey Staff, 1999. Soil Taxonomy: A Basic System of Soil Classification for Making and Interpreting
Soil Surveys. Number 436 In Agricultural Handbook. USDA.
Soil Survey Staff, 2004. Soil Survey Laboratory Methods Manual. 4th ed. Number 42 In Soil Survey
Investigations Report. USDA-NRCS, Washington, D.C.
Webster, R., 1968. Fundamental objections to the 7th approximation. Journal of Soil Science 19:354–366.
Webster, R. and M. Oliver, 1990. Statistical Methods in Soil and Land Resource Survey. Oxford University
Press.
Whittig, L. D. and W. R. Allardice, 1986. Principles of x-ray diffraction. In Methods of Soil Analysis, Part
1. Physical and Mineralogical Methods. American Society of Agronomy-Soil Science Society of America,
2nd ed.
Young, F. and R. Hammer, 2000. Defining Geographic Soil Bodies by Landscape Position, Soil Taxonomy,
and Cluster Analysis. Soil Sci Soc Am J 64:989–998.
[Figure 10 image: three depth-function panels (depth axis 0 to 80 cm) for (a) clay content, (b) cation exchange capacity, and (c) pH (1:1 water:soil). Legend, “Soil Profile Group”: metavolcanic rocks, metasedimentary rocks, granodiorite.]
Figure 10: Depth-slice aggregation of clay content (a), cation exchange capacity (b) and pH (c) based on
groups identified via cluster analysis. Lines are mean values; the shaded area represents the mean ± 1 standard
deviation.