Pedosphere 21(3): 339–350, 2011

ISSN 1002-0160/CN 32-1315/P


© 2011 Soil Science Society of China
Published by Elsevier B.V. and Science Press

Application of a Digital Soil Mapping Method in Producing Soil Orders on Mountain Areas of Hong Kong Based on Legacy Soil Data∗1

SUN Xiao-Lin¹,²,³, ZHAO Yu-Guo²,³, ZHANG Gan-Lin²,³, WU Sheng-Chun¹,³, MAN Yu-Bon¹ and WONG Ming-Hung¹,³,∗2
¹ Croucher Institute for Environmental Sciences, and Department of Biology, Hong Kong Baptist University, Hong Kong Special Administrative Region (China)
² State Key Laboratory of Soil and Sustainable Agriculture, Institute of Soil Science, Chinese Academy of Sciences, Nanjing 210008 (China)
³ Joint Open Laboratory of Soil and Environment, Institute of Soil Science, Chinese Academy of Sciences and Hong Kong Baptist University (China)
(Received September 30, 2010; revised March 22, 2011)

ABSTRACT
Based on legacy soil data from a soil survey conducted recently in the traditional manner in Hong Kong of China, a
digital soil mapping method was applied to produce soil order information for mountain areas of Hong Kong. Two modeling
methods (decision tree analysis and linear discriminant analysis) were used, and their applications were compared. Particular
effort was put into selecting soil covariates for modeling. First, analysis of variance (ANOVA) was used to test the
variance of terrain attributes between soil orders. Then, a stepwise procedure was used to select soil covariates for linear
discriminant analysis, and a backward removing procedure was developed to select soil covariates for tree modeling. At the
same time, ANOVA results, as well as our knowledge and experience on soil mapping, were also taken into account for
selecting soil covariates for tree modeling. Two linear discriminant models and four tree models were established finally, and
their prediction performances were validated using a multiple jackknifing approach. Results showed that the discriminant
model built on ANOVA results performed best, followed by the discriminant model built by stepwise, the tree model built
by the backward removing procedure, the tree model built according to knowledge and experience on soil mapping, and
the tree model built automatically. The results highlighted the importance of selecting soil covariates in modeling for soil
mapping, and suggested the usefulness of methods used in this study for selecting soil covariates. The best discriminant
model was finally selected to map soil orders for this area, and validation results showed that the soil order map thus produced
had a high accuracy.
Key Words: decision tree analysis, linear discriminant analysis, soil covariate selection

Citation: Sun, X. L., Zhao, Y. G., Zhang, G. L., Wu, S. C., Man, Y. B. and Wong, M. H. 2011. Application of a digital
soil mapping method in producing soil orders on mountain areas of Hong Kong based on legacy soil data. Pedosphere.
21(3): 339–350.

INTRODUCTION

Soil information is urgently needed globally for applications in agriculture (Bishop and McBratney, 2001), ecological (Dobos et al., 2001; Scull et al., 2005) and hydrological process modeling (Zhu et al., 1997). This is particularly true for Hong Kong, with rapid industrialization and intensive urbanization happening during the past decades. The local agriculture has shrunk to a negligible size. The fast growing population of 7 million people residing on the 1 104 km² territory is always posing huge pressure on the local environment and ecology (Li et al., 2001; Warren-Rhodes and Koenig, 2001). Due to the inherent heterogeneity

∗1 Supported by the Public Policy Research of the Research Grants Council of Hong Kong, China (No. 2002-PPR-3), the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KZCX2-YW-409), the National Natural Science Foundation of China (Nos. 40625001 and 40771092), and the Mini-AOE (Area of Excellence) Fund from the Hong Kong Baptist University, China (No. RC/AOE/08-09/01).
∗2 Corresponding author. E-mail: [email protected].
340 X. L. SUN et al.

of local soil, it is difficult to predict slope stability, which is threatened by the highly erratic nature of local hydrological influences given the local rugged terrain and intense seasonal rainfall (Au, 1998).

The available soil information of Hong Kong is far from sufficient to meet various needs. The five small pieces of 1:25 000 soil maps which were drawn a half century ago (Grant, 1962) only focused on the local agricultural areas, about 27% of the whole territory. Several other soil maps covering the whole territory were made from a soil survey conducted recently in the traditional manner using the soil surveyors' mental model (Luo et al., 2007). However, digital soil mapping has been considered more promising for future soil mapping (McBratney et al., 2003; Grinand et al., 2008; Lagacherie, 2008), and it is now being widely applied to update traditional soil maps based on legacy soil data (Kempen et al., 2009; Stoorvogel et al., 2009).

Digital soil mapping is based on soil genesis, and therefore soil information is also inferred from soil covariates on the five soil forming factors, i.e., climate, parent material, relief, organisms and time (McBratney et al., 2003). According to Luo et al. (2007), the local soils are Ferrosols, Cambosols, Ferralosols, Argosols, Gleyosols, Primosols and Anthrosols based on the Chinese Soil Taxonomy (CRGCST, 2001). These soil orders are, respectively, similar to Acrisols, Cambisols, Ferralsols, Luvisols, Gleysols, Leptosols and Anthrosols of the World Reference Base for Soil Resources (IUSS Working Group WRB, 2006). Primosols and Anthrosols usually occur on local flat areas where soil formation is mainly due to human activities, and Gleyosols exclusively occur on swamps and wetlands of the northern border. Therefore, these three soil orders can be easily identified according to land use. The remaining soil orders are mainly distributed on local mountain areas and their distributions are quite complex, being influenced by parent material, relief and vegetation. Thus, this study is aimed at mapping these soil orders. According to Grant (1962), spatial distribution of soils in Hong Kong is mainly controlled by parent material and relief whereas vegetation is not a good indicator for soil change. Luo et al. (2007) indicated that distributions of the targeted soil orders are quite different by parent material and relief, but are very similar by vegetation. Hence, local parent material and relief information can be used as soil covariates to map the soil orders on local mountain areas.

In digital soil mapping, prediction is implemented by applying a model describing the relationship between soil and soil covariates. Modeling the relationship is the key step which deserves special attention in digital soil mapping (Kempen et al., 2009). Various modeling methods have been used in digital soil mapping (McBratney et al., 2003). Among them, decision tree analysis is frequently used for predicting soil classes (Henderson et al., 2005; Scull et al., 2005; Grinand et al., 2008). Compared with other methods, this method has the advantages of being non-parametric, compatible with both quantitative and categorical variables, highly efficient, etc. (Grinand et al., 2008). Linear discriminant analysis is also often used for predicting soil classes, and its performance has been considered as easy, efficient and successful (Thomas et al., 1999; Dobos et al., 2000). Although the two methods can choose variables automatically, it is crucial to provide suitable variables because a model for soil mapping should be not only statistically sound but also pedologically plausible (Kempen et al., 2009). Therefore, in this study of digital soil mapping, selecting soil covariates for modeling was emphasized.

The major objective of the present study was to refine a soil order map for mountain areas of Hong Kong based on legacy soil data from the recent soil survey of Luo et al. (2007). Two modeling methods, decision tree and linear discriminant analysis, were employed, and different combinations of soil covariates from local dominant soil forming factors, i.e., parent material and relief, were tried. Application of the modeling methods and combinations of soil covariates on mapping were compared.

MATERIALS AND METHODS

Study area

Hong Kong (22°8′–22°36′ N, 113°50′–114°23′ E) is located in the southeastern tip of China with a total land area of 1 104 km² (Fig. 1). The climate is subtropical and temperate. The annual air temperature is about 23.0 °C, and the mean annual rainfall is 2 214 mm.

About 75% of the land is mountainous. The mountains mainly trend from northeast to southwest. The highest mountain is Tai Mo Shan with an elevation of 957 m, followed by Lan Tau Island with an elevation of 934 m (Fig. 1). Slopes on the mountains are usually very rugged. On foot slopes of the mountains, vegetation is mostly comprised of old Banyan trees (Ficus
DIGITAL SOIL ORDER MAPPING OF HONG KONG 341

Fig. 1 Study area and sample locations in Hong Kong, China.

retusa). On mountainsides not subjected to erosion, vegetation is generally grasses. On higher slopes, vegetation changes to grasses with patches of melastomaceous and ericaceous undershrubs. On the top of high mountains, such as Tai Mo Shan and Lan Tau Island (Fig. 1), the vegetation is mainly turf, largely consisting of Ischaemum spp., containing ground orchids, balsams and mountain Compositae (Grant, 1962). The remaining 25% of the land comprises many flat patches. These areas are separately distributed along the coastlines, in alluvial lowlands, in valleys and in basins. Most of the flat areas are very intensively urbanized, and the few remaining farmlands, less than 5% of the whole land area, are adjacent to and separated by the urban area in the New Territories (Fig. 1). On the mid-northern boundary of the New Territories, there are a lot of wetlands and swamps (mainly mangroves) (Fig. 1).

Fig. 2 shows the local parent materials based on the 1:200 000 Hong Kong geologic map (The Hong Kong Geological Survey, 1999). According to this map, parent materials are tuff (48%), granite (26%), alluvium/colluvium (11%), sedimentaries (6%), lava (3%), tuff and lava (1%), and fill (5%). Tuff dominates the mountain areas, especially high mountains, followed by granite on relatively lower mountains. Sedimentaries are a mixture of sandstone, siltstone, mudstone, graphitic and conglomerate, occurring on lower places. Alluvium/colluvium occurs on flat areas (Fig. 2), some of which are covered by patches of urban areas. Lava occurs on the eastern part of the Lan Tau Island, as well as a mixture of tuff and lava. Fill means the material added by humans when reclaiming land from the sea, and it mainly occurs in urban areas.

Soil sampling, classification and mapping

In the soil survey of Luo et al. (2007), soil sampling was conducted in a traditional manner. The dominant soil forming factors on mountain areas, parent material and relief (Fig. 2), were mainly used to guide soil sampling. After studying the information of parent material and relief, soil surveyors empirically divided Hong Kong soil into 11 kinds of soil association, and several representative soil profiles for each kind of soil association were examined. In total, 51 soil profiles were examined, and 41 of them were on mountain areas (Fig. 1). The morphological characteristics, such as soil horizon depth, Munsell color, structure, etc., of all examined profiles were recorded. A number of physical and chemical properties of the profiles related to soil classification were analyzed in order to classify the profiles. For details of these analyses, see Luo et al. (2007). Based on measured soil properties, the profiles were classified as Ferrosols, Ferralosols, Argosols and Cambosols according to the Chinese Soil Taxonomy (CRGCST, 2001), as indicated earlier. Ferrosols are characterized as pH < 5.0, base saturation less than 50%, having a lower activity-ferric horizon and an allitic property. Ferralosols are characterized as acid, alumino-silicate minerals accumulated, having a lower activity-ferric horizon and high clay content. Argosols are characterized as weak acid, illuvial accumulations of clay in B horizon, a low ratio of SiO₂/Al₂O₃ and free iron less than 40% of total iron in soil body. Cambosols are characterized as lower accumulations of clay in B horizon and a lower ratio of SiO₂/Al₂O₃ in sub-

Fig. 2 Information related to dominant soil forming factors: parent material and relief.

soil. The 4 soil orders are similar to Ultisols, Oxisols, Alfisols and Inceptisols of the U.S. Soil Taxonomy.

Soil mapping in this survey was based on the hypothesis that similar soil formation conditions will generate similar soils. Information of the two dominant soil forming factors (Fig. 2) was analyzed, and Hong Kong area was divided into a number of landform-parent material groups. By assigning representative soil taxa to each kind of landform-parent material group, a soil group map was established. In the present study, this soil group map was used to produce a soil order map (Fig. 3), by assigning a soil order of a soil group to the corresponding areas of that soil group.

Soil covariates

Soil covariates used in this study were also from the two dominant soil forming factors: parent material and relief (Fig. 2). Based on the relief information, which was a 15 m digital elevation model (DEM) constructed based on the 1:50 000 Hong Kong topographic map using ArcGIS software, many terrain attributes were derived: elevation (Z, m), slope (S, °), aspect (A, °), profile curvature [Kp, m (100 m)⁻¹], plan curvature [Kc, m (100 m)⁻¹], natural logarithm of specific catchment area [ln(As), m² m⁻¹], stream power index (SPI) and topographic wetness index (TWI). These terrain attributes are commonly used in digital soil mapping (McBratney et al., 2003); see Florinsky et al. (2002) for details. Considering the non-linear variation of aspect as a circular variable, the sine of aspect (sinA) was used instead of aspect itself.

Decision tree analysis and linear discriminant analysis

Decision tree analysis is commonly used in soil ma-

Fig. 3 Soil order map based on the soil group map from the soil survey of Luo et al. (2007).
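The derived covariates listed in the Soil covariates section can be illustrated with a minimal Python sketch. The source does not state formulas (it refers to Florinsky et al., 2002), so the commonly used definitions TWI = ln(As / tan S) and SPI = As × tan S are assumptions here, as are the function name `terrain_covariates` and the small floor applied to tan S on flat cells.

```python
import math


def terrain_covariates(aspect_deg, slope_deg, specific_catchment_area):
    """Return sinA, ln(As), TWI and SPI for one DEM cell (illustrative only)."""
    slope_rad = math.radians(slope_deg)
    # Floor tan(S) so flat cells do not cause division by zero (assumed handling).
    tan_slope = max(math.tan(slope_rad), 1e-6)
    return {
        # Sine of aspect: 0 deg and 360 deg (both north) map to the same value.
        "sinA": math.sin(math.radians(aspect_deg)),
        "lnAs": math.log(specific_catchment_area),
        # Commonly used definitions, assumed here rather than taken from the source.
        "TWI": math.log(specific_catchment_area / tan_slope),
        "SPI": specific_catchment_area * tan_slope,
    }


north_a = terrain_covariates(0.0, 20.0, 150.0)
north_b = terrain_covariates(360.0, 20.0, 150.0)
# Raw aspect differs by 360, but the sine-transformed covariate agrees numerically.
print(abs(north_a["sinA"] - north_b["sinA"]) < 1e-12)
```

The input values (20° slope, As = 150 m² m⁻¹) are made up for illustration. The point of the transform is that a circular variable such as aspect has no meaningful linear ordering, whereas its sine can be used directly in ANOVA or discriminant analysis.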

pping (Dobos et al., 2001; Scull et al., 2005; Grinand et al., 2008). For example, Scull et al. (2005) used this method to predict soil taxonomic unit from soil covariates. In this study, C5.0 (Quinlan, 2002) was used to implement decision tree analysis. A tree model must be pruned before using it to classify new samples. In C5.0, a tree model was firstly pruned by combining a leaf node or a branch that was predicted to have a relatively high misclassification rate (%, percentage of misclassified objects over all classified objects) with another one. This pruning process was applied to every branch and every leaf node. Following this pruning process, global pruning continued to prune the tree model by examining performance of the tree model as a whole. In this study, decision tree analysis was implemented based on all of the soil covariates mentioned in the section of soil covariates.

Linear discriminant analysis is also a commonly used method for digital soil mapping (McBratney et al., 2003). For example, Thomas et al. (1999) used linear discriminant analysis to predict soil classes based on terrain attributes derived from a DEM. A linear discriminant function is determined by a measure of generalized squared distance based on the pooled covariance matrix (Sinowski and Auerswald, 1999). Scores of these discriminant functions on new cases were used to classify the new cases. In this study, this analysis was implemented by the DISCRIM procedure in SAS system. Since linear discriminant analysis cannot handle categorical variables, only the terrain attributes stated in the section of soil covariates were used in this analysis.

Selection of soil covariates for decision tree analysis and linear discriminant analysis

Selection of soil covariates used for digital soil mapping is an “acute” problem (Lark et al., 2007). Though a lot of data mining methods can automatically choose the best variables to build a model, they do this just based on the set of variables given by modelers. For different given sets of soil covariates, different modeling results will be generated. Therefore, in this study, different sets of soil covariates were tried.

Before decision tree analysis and linear discriminant analysis, variances of those calculated terrain attributes between soil orders were tested by analysis of variance (ANOVA) in SAS system. Results of this analysis could indicate the importance of a terrain attribute in differentiating soil orders, which would be suggestive for selecting variables in linear discriminant analysis and decision tree analysis.

Linear discriminant analysis was conducted through the stepwise DISCRIM procedure. Selecting variables in this procedure was based on statistics of Wilks' lambda and average squared canonical correlation. Meanwhile, results from ANOVA were also considered in building discriminants.

For the 9 soil covariates mentioned above, there would be 512 combinations of them, which obviously would generate different mapping results. Assuming that the more closely a soil covariate is related to local soil formation, the more it will influence the results of decision tree analysis and vice versa, a backward removing procedure was developed to screen the most

and least influential soil covariates after trying all of the 9 soil covariates in decision tree analysis to find: (1) the most influential soil covariates by removing that attribute with the least usage in decision tree analysis, and (2) the least influential soil covariates by removing that attribute with the greatest usage in decision tree analysis. Subsequently, decision tree analysis was conducted again based on the remaining attributes. These processes were repeated until the performance of decision tree analysis became very poor (indicated by misclassification rate). Based on the results of backward removing variables in decision tree analysis, as well as the results of ANOVA, the most influential soil forming factors, i.e., the most closely related factors to soil formation, were identified and were selected as the best combinations of soil covariates to build a tree model. For comparison, some other sets of soil covariates were selected to build tree models, as well as one selected according to the knowledge on local soil genesis and our experience on soil mapping.

Validation and the final soil order map

Considering the very limited sample size in the legacy soil data, this study used the approach proposed by Bishop and McBratney (2001), i.e., multiple jackknifing, to validate prediction results. This approach involved randomly separating the whole sampling dataset in a given ratio into two subsets, one for prediction and one for validation. The random separation was conducted a number of times. Each time, a model was constructed based on the prediction subset and then validated based on the validation subset. Mean accuracy from this cross-validation over a number of times was used to identify the best model. In this study, the sampling dataset was separated in the ratio of about 90% to 10%, i.e., 35 samples for prediction and 6 samples for validation. Accuracy of an established model was assessed in terms of misclassification rate. This validation procedure was randomly repeated 25 times. The best model was identified in terms of the mean misclassification rate over the 25 times.

The best model of those established during the above validation was selected to make the final soil order map. The criterion for selecting the best model was prediction accuracy in terms of misclassification rate and kappa statistic (Sim and Wright, 2005). Kappa statistic is an index which compares the agreement against that which might be expected by chance. It can be thought of as the chance-corrected proportional agreement (K), and possible values range from 1 (perfect agreement) to 0 (no agreement above that expected by chance) to −1 (complete disagreement). It was computed as:

$$K = \frac{P_o - P_c}{1 - P_c} \tag{1}$$

$$P_o = \frac{n_p}{n_o} \times 100\% \tag{2}$$

$$P_c = \frac{\sum_{i=1}^{C} \dfrac{n_{ip} \times n_{io}}{n_o}}{n_o} \tag{3}$$

where $P_o$ is the observed agreement, $P_c$ is the chance agreement, $C$ is the number of categories of the tested samples, $n_o$ is the total number of tested samples, $n_p$ is the total number of correctly predicted tested samples, $n_{ip}$ is the total number of tested samples which are predicted into the $i$th category, and $n_{io}$ is the total number of tested samples which actually are the $i$th category. The accuracy of the final map was also assessed in terms of producer accuracy and user accuracy (Scull et al., 2005). Producer accuracy is a measure of how well the test data are classified. User accuracy is a measure of how likely a test sample classified into a given category actually belongs to that category. The flowchart of making the final soil order map is shown in Fig. 4.

Fig. 4 The flowchart for mapping soil order on mountain areas of Hong Kong, China.
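The validation scheme and Eqs. (1)–(3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS or C5.0 workflow: the function names, the `fit` callback interface, the toy samples and the majority-class baseline are all hypothetical, and Po is treated as a proportion rather than a percentage so that it can be combined with Pc in Eq. (1), which is an assumption about the intended scaling.

```python
import random
from collections import Counter


def kappa(observed, predicted):
    """Chance-corrected agreement K = (Po - Pc) / (1 - Pc), Po and Pc as proportions."""
    n_o = len(observed)
    n_p = sum(o == p for o, p in zip(observed, predicted))
    p_o = n_p / n_o
    n_io = Counter(observed)   # samples that actually are category i
    n_ip = Counter(predicted)  # samples predicted into category i
    p_c = sum(n_ip[c] * n_io[c] / n_o for c in n_io) / n_o
    return (p_o - p_c) / (1 - p_c)  # undefined when p_c == 1


def multiple_jackknife(samples, fit, n_validation=6, repeats=25, seed=0):
    """Repeated random split; returns (mean, min, max) misclassification rate in %.

    samples: list of (covariates, soil_order) pairs.
    fit: hypothetical callback, fit(training) -> predict(covariates) function.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        validation, training = shuffled[:n_validation], shuffled[n_validation:]
        predict = fit(training)
        wrong = sum(predict(x) != y for x, y in validation)
        rates.append(100.0 * wrong / n_validation)
    return sum(rates) / repeats, min(rates), max(rates)


def fit_majority(training):
    """Baseline stand-in for a real model: always predict the most common soil order."""
    majority = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda covariates: majority


# 41 made-up profiles (elevation only) with made-up soil orders.
toy = [((10 * i,), "Ferrosols" if i % 3 else "Cambosols") for i in range(41)]
print(multiple_jackknife(toy, fit_majority))
```

With six validation samples, each repetition's misclassification rate can only be a multiple of 100/6 ≈ 16.7%, which is consistent with minimum and maximum values such as 33%, 83% and 100% in Table IV.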

RESULTS AND DISCUSSION

Variances of terrain attributes between soil orders

Table I showed that only elevation had significant differences (P < 0.05) between soil orders, followed by ln(As) with difference between soil orders approaching the significant level of 0.05 (P = 0.06). The variance of slope between soil orders was also wider than the others. The significant difference in elevation between soil orders indicated that soil orders over Hong Kong had an obvious vertical zonation. Average elevations of the four soil orders were 294, 216, 137 and 61 m, respectively, for Ferrosols, Argosols, Ferralosols and Cambosols. The relatively significant difference of ln(As) (P = 0.06) indicated the importance of water flow in soil processes, as eluviation induced by water flow plays a major role in the formation of Ferrosols, Ferralosols and Argosols (CRGCST, 2001). The influence of slope on soil processes through redistributing materials and energy over space was moderately strong, as evidenced by its relatively more remarkable differences between soil orders than other terrain attributes, though it was not statistically significant (P = 0.1) (Table I). Other terrain attributes seemed to be less related to the distribution of the four soil orders, according to the results shown in Table I.

The results of ANOVA showed that the relationship between these soil orders and a single terrain attribute was not very significant. This may be due to the fact that formation of the soil orders was not governed by a single terrain attribute, but by a number of factors, including not only terrain attributes, but also other soil forming factors and their interactions (CRGCST, 2001). However, the results of ANOVA would be useful for identifying key soil covariates in the following linear discriminant analysis and decision tree analysis.

TABLE I

Analysis of variance of terrain attributes a) between soil orders on mountain areas of Hong Kong, China

Item        Z       S      sinA   Kp     Kc     ln(As)  SPI    TWI
F value     3.33    2.17   0.12   1.02   0.67   2.62    1.00   1.71
P > F b)    0.03*   0.10   0.94   0.39   0.57   0.06    0.40   0.18

*Significant at 0.05 level.
a) Z = elevation, S = slope, A = aspect, Kp = profile curvature, Kc = plan curvature, As = specific catchment area, SPI = stream power index, TWI = topographic wetness index.
b) The confidence of F value.

Soil covariates in linear discriminant analysis

Table II showed that five terrain attributes were statistically significant in constructing linear discriminants, according to the statistics of Wilks' lambda and average squared canonical correlation. Elevation contributed the most to the discriminations, followed by ln(As), slope, TWI and profile curvature. This sequence was the same as that presented in Table I because stepwise DISCRIM and ANOVA shared the same way to assess contributions of the terrain attributes, i.e., the way of analyzing variances of the terrain attributes among soil orders. However, ANOVA only considered one single terrain attribute at a time, while stepwise DISCRIM considered several at a time. Thus, more terrain attributes appeared as significant variables in linear discriminant analysis compared with ANOVA.

Two sets of soil covariates were selected to model discriminant functions. The first set (DI) was selected according to the results of ANOVA that showed the most significant influences of elevation, ln(As) and slope on soil formation. Therefore, the first set (DI) comprised these three terrain attributes. The second

TABLE II

Results of linear discriminant analysis for the five terrain attributes a) through the stepwise DISCRIM procedure

Step   Terrain attribute   Wilks' lambda   P < Wilks' lambda   Average squared canonical correlation (ASCC)   P < ASCC
1      Z                   0.78            0.03*               0.07                                           0.03*
2      ln(As)              0.65            0.01*               0.13                                           0.01*
3      S                   0.55            0.01*               0.17                                           0.009**
4      TWI                 0.49            0.01*               0.21                                           0.01*
5      Kp                  0.47            0.03*               0.22                                           0.02*

*, **Significant at P = 0.05 and P = 0.01 levels, respectively.
a) Z = elevation, As = specific catchment area, S = slope, TWI = topographic wetness index, Kp = profile curvature.
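The link between the one-way ANOVA of Table I and the first step of Table II can be made concrete with a short Python sketch. It computes the F statistic for one terrain attribute grouped by soil order and converts it to the single-variable Wilks' lambda, using the identity Λ = SS_within / SS_total = 1 / (1 + F (k − 1) / (n − k)). The function names and the elevation values are hypothetical; this is not the SAS procedure used in the study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numeric values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


def univariate_wilks_lambda(f_value, df_between, df_within):
    """Wilks' lambda for a single variable, rewritten in terms of the ANOVA F."""
    return 1.0 / (1.0 + f_value * df_between / df_within)


# Made-up elevations (m) for three hypothetical soil orders.
elevations = [[100, 200, 300], [200, 300, 400], [500, 600, 700]]
f_value, df_b, df_w = one_way_anova_f(elevations)
print(f_value, univariate_wilks_lambda(f_value, df_b, df_w))
```

Applying the same identity to the rounded value reported for elevation (F = 3.33), and assuming n = 41 mountain profiles and k = 4 soil orders, gives Λ ≈ 0.79, close to the 0.78 listed for step 1 of Table II. The small gap may come from rounding in the published values.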

set (DII) was selected according to the results of linear discriminant analysis in Table II, comprising elevation, ln(As), slope, TWI and profile curvature. Two discriminant functions were modeled based on these two sets of soil covariates. On the whole training dataset, their misclassification rates were 46% and 44%, respectively.

Soil covariates in decision tree analysis

Table III shows the results from implementing the procedure of backward removing variables in decision tree analysis. L1 to L8 listed in the upper part of Table III show the results of removing the least important soil covariates. In the first five trees, SPI, slope, TWI and plan curvature were removed one by one. All of the misclassification rates of these trees were 0%, and changes in the number of leaf nodes were few. The usages of most soil covariates also changed slightly. In the last three trees, sine of aspect and ln(As) were removed one by one. Both misclassification rates and usages of the remaining soil covariates increased slightly, with the number of leaf nodes changed a little more. These results suggested that parent material, elevation, profile curvature and ln(As) were the most influential soil covariates on decision tree modeling of the local soil-landscape relationships.

M1 to M8 listed in the middle part of Table III show the results of removing the most important soil covariates. Number of leaf nodes and misclassification rates varied dramatically when parent material, elevation, slope, profile curvature, ln(As), plan curvature and TWI were removed one by one. Comparatively, these indexes varied more dramatically in the middle part of Table III than in the upper part, indicating a greater change in the performances and structures of modeled trees. It reflected that the more important a variable was, the more it influenced tree modeling. As a consequence, plan curvature and SPI were determined to be the least influential variables in decision tree modeling of the local soil-landscape relationships.

The above results showed that slope was removed very early during the process of removing both the most and least important variables. Thus, it was not easy to determine whether slope was important or not. However, results of ANOVA in Table I showed that

TABLE III

Decision tree modeling results of removing the least and most important soil covariates a)

Set b)   Parent material   Z     S     sinA   Kp    Kc    TWI   SPI   ln(As)   Leaf nodes   Misclassification rate (%)
Least important
L1       100               63    10    17     63    12    12    0     39       17           0.0
L2       100               63    10    17     63    12    12    -     39       17           0.0
L3       100               63    -     17     63    12    12    -     39       18           0.0
L4       100               63    -     27     71    -     24    -     39       18           0.0
L5       100               63    -     27     71    12    -     -     39       18           0.0
L6       100               63    -     27     71    -     -     -     39       18           2.4
L7       100               83    -     -      73    -     -     -     49       19           2.4
L8       100               83    -     -      73    -     -     -     -        16           9.8
Most important
M1       100               63    10    17     63    12    12    0     39       17           0.0
M2       -                 100   27    46     68    20    63    7     68       14           7.3
M3       -                 -     100   32     44    39    0     15    71       14           12.2
M4       -                 -     -     78     100   95    73    0     61       16           7.3
M5       -                 -     -     61     -     83    63    0     100      16           2.4
M6       -                 -     -     5      -     100   100   76    -        10           24.4
M7       -                 -     -     22     -     -     100   100   -        9            31.7
M8       -                 -     -     7      -     100   -     76    -        7            36.6
LM       100               76    20    -      63    -     -     -     49       19           0.0
KE       100               63    10    -      68    39    59    -     -        19           0.0

a) Z = elevation, S = slope, A = aspect, Kp = profile curvature, Kc = plan curvature, TWI = topographic wetness index, SPI = stream power index, As = specific catchment area.
b) LM: removing the least and most important soil covariates together; KE: knowledge and experience in soil mapping.
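The backward removing procedure behind Table III can be sketched as a simple loop. The Python below is an illustration under stated assumptions, not the authors' C5.0 workflow: `backward_removal`, the `fit` callback (covariate names in, per-covariate usage and misclassification rate out) and the stopping threshold `max_error` are hypothetical, and `toy_fit` is a stand-in that reuses the L1 usage values from Table III instead of refitting a tree.

```python
def backward_removal(covariates, fit, drop="least", max_error=20.0):
    """Refit repeatedly, each time dropping the least- or most-used covariate.

    fit(names) must return (usage_by_name, misclassification_rate_percent).
    Stops once the rate exceeds max_error or one covariate is left.
    """
    remaining = list(covariates)
    history = []
    while remaining:
        usage, error = fit(remaining)
        history.append((list(remaining), error))
        if error > max_error or len(remaining) == 1:
            break
        choose = min if drop == "least" else max
        remaining.remove(choose(remaining, key=lambda name: usage[name]))
    return history


# Usage (%) of each covariate in tree L1 of Table III.
L1_USAGE = {"parent_material": 100, "Z": 63, "S": 10, "sinA": 17, "Kp": 63,
            "Kc": 12, "TWI": 12, "SPI": 0, "lnAs": 39}


def toy_fit(names):
    """Stand-in for a real tree fit: usage is fixed, error grows with what was dropped."""
    usage = {name: L1_USAGE[name] for name in names}
    lost = sum(value for name, value in L1_USAGE.items() if name not in names)
    return usage, 100.0 * lost / sum(L1_USAGE.values())


for names, error in backward_removal(list(L1_USAGE), toy_fit, drop="least"):
    print(len(names), round(error, 1))
```

In a real run the usage values would change after every refit, as they do between rows of Table III, so the removal order could differ from what this fixed-usage stand-in produces.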

slope was relatively closely related to distributions of soil orders (P = 0.1), and in fact slope was very frequently used for soil mapping (McBratney et al., 2003). Therefore, slope was finally used in the subsequent prediction of soil orders. Contrary to slope, sinA was removed very late during both processes, but results of ANOVA showed that this soil covariate was scarcely related to distributions of soil orders (P = 0.9). Therefore, sinA was not used in the subsequent prediction of soil orders. Based on the above analysis, the first set of soil covariates for decision tree modeling was designed to include all of the most influential soil covariates: parent material, elevation, slope, profile curvature and ln(As), i.e., LM listed in Table III.

With the objective of generating a good quality soil order map for this area, three other sets of soil covariates, i.e., L1, L5 and KE in Table III, were also selected to construct decision tree models. L1 and L5 were chosen because they were two extremes in the process of backward removing the least important variables. L1 contained all soil covariates, while L5 contained the fewest while keeping the lowest misclassification rate. KE was established according to the common knowledge about local soil genesis and our experience in soil mapping. As mentioned earlier, parent material and relief played dominant roles in the local soil development. McSweeney et al. (1994) contended that on the watershed scale, soil formation and development were mainly influenced by five environmental variables: elevation, slope, profile curvature, plan curvature and TWI. However, ln(As), depicting the contribution area of water flowing across a point, would be more appropriate in reflecting the influence of water on soil processes than TWI, which depicts stationary water content in soil. This is because water flow passing through soil bodies or on soil surfaces affects soil more directly and more strongly than stationary water in soil bodies (Florinsky et al., 2002; Thompson et al., 2006). In particular, formations of Ferrosols, Ferralosols and Argosols are more related to water movement than water content (CRGCST, 2001). Furthermore, results of ANOVA shown in Table I also indicated that ln(As) was more related to distributions of soil orders than TWI. As a consequence, KE was designed to comprise parent material, elevation, slope, profile curvature, plan curvature and ln(As).

their numbers of leaf nodes were close, i.e., 17, 18, 19 and 19.

Mapping soil orders based on selected soil covariates

Results from validating the above selected two discriminant functions and four decision tree models using the multiple jackknifing validation approach are summarized in Table IV. Generally, the two discriminant functions performed better than the four tree models, although they did not explore the usefulness of parent material for predicting soil orders. Between the two discriminant functions, model DI performed much better than model DII, reflecting that a more straightforward but good enough discriminant function was much more useful. Among the four tree models, model LM which was selected through backward removing procedure performed best, followed by model KE which was selected according to knowledge and experience on soil mapping, model L5 and model L1. Results from validating the four tree models reflected that: 1) it was important to select soil covariates for tree modeling, even though decision tree analysis can automatically choose best variables for modeling; 2) backward removing procedure improved tree modeling, as well as knowledge and experience on soil mapping; 3) a more straightforward but good enough tree model was much more useful, like that for discriminant functions.

TABLE IV

Summaries on misclassification rate of selected two discriminant functions (DI and DII) and four decision tree models (L1, L5, LM and KE) using the multiple jackknifing validation approach

Model   Mean (%)   Minimum (%)   Maximum (%)   Standard deviation (%)
DI      49         33            83            14.55
DII     59         33            100           16.75
L1      71         33            100           15.18
L5      69         33            100           14.29
LM      67         33            100           15.64
KE      68         33            100           14.77

According to mean misclassification rates listed in Table IV, model DI with the lowest mean misclassification rate was identified as the best model for produ-
Based on the four designed sets of soil covariates, cing the final soil order map. Among all the 25 models
four decision trees were modeled respectively. Perfor- which were established using this model during jack-
mances of these four tree models are presented in Ta- knifing validation, the one having the lowest misclassi-
ble III. Misclassification rates of them were all 0%, and fication rate (33%) and the highest kappa statistic
(0.63) was selected to generate the final soil order map (Fig. 5). User accuracy and producer accuracy of this map from jackknifing validation are presented in Table V. The table shows that the final map accurately reflected the distribution of Ferrosols and Ferralosols. The distribution of Argosols was very poorly predicted and that of Cambosols was the worst predicted. Weighted by the area percentages of soil orders on the map, average producer accuracy and average user accuracy of this map were 98.7% and 82.6%, respectively, showing that the final soil order map had a high accuracy.

The final digital soil order map (Fig. 5) was very different from the traditional one (Fig. 3). Apparently, the digital soil order map was much more detailed in reflecting the distribution of soil orders. Comparison between the two maps showed that Cambosols in the digital map were much less extensive than in the traditional map and were scattered among other soil orders. Ferrolosols were also reduced substantially on the digital map compared with the traditional map. The two soil orders seemed to be replaced by Argosols in Fig. 5, while some Argosols were replaced by Ferralosols. In total, the digital map reproduced only 18.4% of the traditional one. This big gap between them is due first to the different models used to produce the two maps and second, but more crucially, most likely to the limited sample size of the legacy soil data on which this study was based.

CONCLUSIONS

Linear discriminant models performed better than decision tree models. ANOVA was very useful for selecting soil covariates for building a discriminant function. The removing procedure developed in this study for selecting soil covariates for tree modeling was useful for improving the performance of tree modeling. This was also true

Fig. 5 The final soil order map of Hong Kong predicted by model DI, which included three soil covariates, i.e., elevation,
natural logarithm of specific catchment area and slope, for modeling discriminant functions.
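
The covariate ln(As) named in the caption above and the TWI it was preferred over are closely related. The following is a minimal sketch, assuming the commonly used definition TWI = ln(As / tan β), where β is the local slope angle; this section does not restate the formula, so that definition is an assumption here. The function names and the sample cell values are hypothetical and are not taken from the study.

```python
import math

def ln_specific_catchment_area(a_s):
    """Natural logarithm of the specific catchment area As (m^2 per m of contour)."""
    return math.log(a_s)

def topographic_wetness_index(a_s, slope_deg):
    """TWI = ln(As / tan(beta)), under the assumed common definition."""
    beta = math.radians(slope_deg)
    return math.log(a_s / math.tan(beta))

# Hypothetical grid cell: As = 150 m^2/m on a 20-degree slope.
a_s, slope_deg = 150.0, 20.0
ln_as = ln_specific_catchment_area(a_s)
twi = topographic_wetness_index(a_s, slope_deg)

# Under this definition TWI differs from ln(As) only by the slope term -ln(tan(beta)).
print(round(ln_as, 3), round(twi, 3))
```

Under this definition the two covariates differ only by a slope term, so a model that already includes slope and ln(As) may carry much of the information in TWI. Flat cells (slope of 0 degrees) would need special handling, which this sketch omits.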

TABLE V

Error matrix of final soil order map

Predicted soil order     Observed soil order                               User accuracy (%)
                         Argosols   Cambosols   Ferralosols   Ferrosols
Argosols                 1          2           0             0            33
Cambosols                0          0           0             0            0
Ferralosols              0          0           1             0            100
Ferrosols                0          0           0             2            100
Producer accuracy (%)    100        0           100           100
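
As a minimal sketch of how the accuracies in Table V can be derived, the following Python snippet recomputes user accuracy, producer accuracy and the overall misclassification rate from the error matrix counts. The function and variable names are illustrative assumptions, not part of the original study, and reporting 0% when a row or column total is zero is an assumed convention chosen to match the table.

```python
# Error matrix from Table V: rows are predicted soil orders,
# columns are observed soil orders, in the same order as ORDERS.
ORDERS = ["Argosols", "Cambosols", "Ferralosols", "Ferrosols"]
ERROR_MATRIX = [
    [1, 2, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 2],
]

def percent(numerator, denominator):
    """Return a percentage, or 0.0 when the denominator is zero (assumed convention)."""
    return 100.0 * numerator / denominator if denominator else 0.0

def accuracy_summary(matrix):
    """Compute user accuracy, producer accuracy and misclassification rate (all in %)."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]                               # predicted counts
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]    # observed counts
    total = sum(row_totals)
    correct = sum(matrix[i][i] for i in range(n))
    user = [percent(matrix[i][i], row_totals[i]) for i in range(n)]
    producer = [percent(matrix[j][j], col_totals[j]) for j in range(n)]
    misclassification = percent(total - correct, total)
    return user, producer, misclassification

user_acc, producer_acc, misclass = accuracy_summary(ERROR_MATRIX)
for name, u, p in zip(ORDERS, user_acc, producer_acc):
    print(f"{name:12s} user {u:5.1f}%  producer {p:5.1f}%")
print(f"Overall misclassification rate: {misclass:.1f}%")
```

With these counts, 2 of the 6 validation samples are misclassified, which is consistent with the 33% misclassification rate reported for the selected model.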
for knowledge and experience in soil mapping. Clearly, the results of this study highlighted the importance of selecting soil covariates in modeling for soil mapping, and suggested the usefulness of the methods used in this study for selecting soil covariates.

A soil order map was finally made using the best model established in this study, with a high accuracy in terms of misclassification rate, kappa statistic, producer accuracy and user accuracy. Comparison between the digital soil order map and the traditional one showed that the former reproduced just 18.4% of the latter. The big gap between them can be attributed to the very limited sample size of the legacy soil data on which this study was based. Thus, more soil samples from future soil surveys of this area will be necessary in order to improve the final soil order map, as well as to produce more detailed soil maps.

ACKNOWLEDGEMENT

The authors are grateful to anonymous reviewers for their valuable comments and suggestions, and Dr. A. O. W. Leung at the Croucher Institute for Environmental Sciences, Hong Kong Baptist University, Hong Kong Special Administrative Region, China for editing the language of this manuscript.

REFERENCES

Au, S. W. C. 1998. Rain-induced slope instability in Hong Kong. Eng. Geol. 51: 1–36.
Bishop, T. F. A. and McBratney, A. B. 2001. A comparison of prediction methods for the creation of field-extent soil property maps. Geoderma. 103: 149–160.
Cooperative Research Group on Chinese Soil Taxonomy (CRGCST). 2001. Chinese Soil Taxonomy. Science Press, Beijing and New York.
Dobos, E., Micheli, E., Baumgardner, M. F., Biehl, L. and Helt, T. 2000. Use of combined digital elevation model and satellite radiometric data for regional soil mapping. Geoderma. 97: 367–391.
Dobos, E., Montanarella, L., Nègre, T. and Micheli, E. 2001. A regional scale of soil mapping approach using integrated AVHRR and DEM data. Int. J. Appl. Earth Observ. Geoinf. 3: 30–42.
Florinsky, I. V., Eilers, R. G., Manning, G. R. and Fuller, L. G. 2002. Prediction of soil properties by digital terrain modelling. Environ. Modell. Softw. 17: 295–311.
Grant, C. J. 1962. The Soils and Agriculture of Hong Kong. Government Press, Hong Kong.
Grinand, C., Arrouays, D., Laroche, B. and Martin, M. P. 2008. Extrapolating regional soil landscapes from an existing soil map: Sampling intensity, validation procedures, and integration of spatial context. Geoderma. 143: 180–190.
Henderson, B. L., Bui, E. N., Moran, C. J. and Simon, D. A. P. 2005. Australia-wide predictions of soil properties using decision trees. Geoderma. 124: 383–398.
IUSS Working Group WRB. 2006. World reference base for soil resources 2006. World Soil Resources Reports No. 103. FAO, Rome.
Kempen, B., Brus, D. J., Heuvelink, G. B. M. and Stoorvogel, J. J. 2009. Updating the 1:50 000 Dutch soil map using legacy soil data: A multinomial logistic regression approach. Geoderma. 151: 311–326.
Lagacherie, P. 2008. Digital soil mapping: a state of the art. In Hartemink, A. E., McBratney, A. B. and Mendonca-Santos, M. L. (eds.) Digital Soil Mapping with Limited Data. Springer, Dordrecht. pp. 3–14.
Lark, R. M., Bishop, T. F. A. and Webster, R. 2007. Using expert knowledge with control of false discovery rate to select regressors for prediction of soil properties. Geoderma. 138: 65–78.
Li, X., Poon, C. and Liu, P. S. 2001. Heavy metal contamination of urban soils and street dusts in Hong Kong. Appl. Geochem. 16: 1361–1368.
Luo, Y. M., Li, Z. G., Wu, L. H., Wu, S. C., Zhang, G. L., Zhou, S. L., Zhao, Y. G., Zhao, Q. G., Wong, M. H. and Zhang, H. B. 2007. Hong Kong Soils and Environment (in Chinese). Science Press, Beijing.
McBratney, A. B., Mendonca-Santos, M. L. and Minasny, B. 2003. On digital soil mapping. Geoderma. 117: 3–52.
McSweeney, K., Gessler, P. E., Slater, B. K., Hammer, R. D., Bell, J. and Petersen, G. W. 1994. Towards a new framework for modeling the soil-landscape continuum. In Amundson, R. G., Harden, J. W. and Singer, M. (eds.) Factors of Soil Formation: A Fiftieth Anniversary Retrospective. Soil Science Society of America, Madison, WI. pp. 127–145.
Quinlan, J. R. 2002. Data Mining Tools See5 and C5.0. Available online at http://www.rulequest.com/see5-info.html (verified on January, 2011).
Scull, P., Franklin, J. and Chadwick, O. A. 2005. The application of classification tree analysis to soil type prediction in a desert landscape. Ecol. Model. 181: 1–15.
Sim, J. and Wright, C. C. 2005. The kappa statistic in reliability studies: use, interpretation, and sample size requirements. Phys. Ther. 85: 257–268.
Sinowski, W. and Auerswald, K. 1999. Using relief parameters in a discriminant analysis to stratify geological areas with different spatial variability of soil properties. Geoderma. 89: 113–128.
Stoorvogel, J. J., Kempen, B., Heuvelink, G. B. M. and de Bruin, S. 2009. Implementation and evaluation of existing knowledge for digital soil mapping in Senegal. Geoderma. 149: 161–170.
The Hong Kong Geological Survey. 1999. Geological Map of Hong Kong in 1:200 000. 2nd Edition. Survey and Mapping Office, Lands Department, the Government of Hong Kong Special Administrative Region, Hong Kong.
Thomas, A. L., King, D., Dambrine, E., Couturie, A. and Roque, J. 1999. Predicting soil classes with parameters derived from relief and geologic materials in a sandstone region of the Vosges mountains (Northeastern France). Geoderma. 90: 291–305.
Thompson, J. A., Pena-Yewtukhiw, E. M. and Grove, J. H. 2006. Soil-landscape modeling across a physiographic region: topographic patterns and model transportability. Geoderma. 133: 57–70.
Warren-Rhodes, K. and Koenig, A. 2001. Ecosystem appropriation by Hong Kong and its implications for sustainable development. Ecol. Econ. 39: 347–359.
Zhu, A. X., Band, L., Vertessy, R. and Dutton, B. 1997. Derivation of soil properties using a soil land inference model (SoLIM). Soil Sci. Soc. Am. J. 61: 523–533.
