
Ecological Informatics 74 (2023) 101955

Contents lists available at ScienceDirect

Ecological Informatics
journal homepage: www.elsevier.com/locate/ecolinf

Application of machine learning approaches for land cover monitoring in northern Cameroon
Yisa Ginath Yuh a, *, Wiktor Tracz b, H. Damon Matthews a, Sarah E. Turner a
a University of Concordia, 1455 Boulevard de Maisonneuve Ouest, Montréal, QC H3G 1M8, Canada
b Faculty of Forestry, Warsaw University of Life Sciences (WULS-SGGW), Nowoursynowska 159, 02-776 Warsaw, Poland

ARTICLE INFO

Keywords: Land use and land cover change; Machine learning; Remote sensing; Land cover classification; African forest and savanna

ABSTRACT

Machine learning (ML) models are a leading analytical technique used to monitor, map and quantify land use and land cover (LULC) and its change over time. Models such as k-nearest neighbour (kNN), support vector machines (SVM), artificial neural networks (ANN), and random forests (RF) have been used effectively to classify LULC types at a range of geographical scales. However, ML models have not been widely applied in African tropical regions due to methodological challenges that arise from relying on the coarse-resolution satellite images available for these areas. In this study, we compared the performance of four ML algorithms (kNN, SVM, ANN and RF) applied to LULC monitoring within the Mayo Rey department, North Province, Cameroon. We used satellite data from the Landsat 7 Enhanced Thematic Mapper Plus (ETM+) combined with Landsat 8 Operational Land Imager (OLI) images of northern Cameroon for November 2000 and November 2020. Our results showed that all four classification algorithms produced relatively high accuracy (overall classification accuracy >80%), with the RF model (>90% classification accuracy) outperforming the kNN, SVM, and ANN models. We found that approximately 7% of all forested areas (dense forest and woody savanna) were converted to other land cover types between 2000 and 2020; this forest loss is particularly associated with an expansion of both croplands and built-up areas. Our study represents a novel application and comparison of statistical and ML approaches to LULC monitoring using coarse-resolution satellite images in an African tropical forest and savanna setting. The resulting land cover maps serve as an important baseline that will be useful to the Cameroon government for policy development, conservation planning, urban planning, and deforestation and agricultural monitoring.

1. Introduction

Human activities are continually modifying the Earth’s land surface. Changes in anthropogenic land use and land cover (LULC) are particularly acute in tropical regions, where rapid rates of deforestation, agricultural expansion, industrial development, migration, growth in population density and urbanization often manifest as an outcome of neocolonial extractivism and associated geopolitical conflict (Erika et al., 2015; Escobar, 2011; Pereira and Tsikata, 2021; Watson et al., 2001; Yeshaneh et al., 2012). Monitoring and mitigating the negative consequences of changing LULC has become a priority for researchers and policymakers worldwide. To monitor these changes successfully, there is a need to produce reliable and accurate LULC maps. Such maps provide vital information required for policy development, conservation planning, urban planning, and deforestation and agricultural monitoring (Gebhardt et al., 2014; Wessels et al., 2003).

Satellite image processing is one of the most important tools used by researchers for generating LULC maps (Chavez, 1996; Cracknell and Reading, 2014; Mohajane et al., 2018; Xia et al., 2015). Using satellite images is cost efficient and provides Earth surface data that cover large geographical areas. Datasets derived from satellite images enable accurate classification of land cover types, and can be used to detect changes in land cover at different spatial scales (Gomez et al., 2016; Kavzoglu and Colkesen, 2009). However, the processing time needed to generate accurate LULC maps using satellite image processing still represents a major challenge to remote sensing researchers, particularly when using coarse-resolution satellite images (i.e., Landsat, from the National Aeronautics and Space Administration (NASA) and U.S. Geological Survey (USGS) program that provides publicly available satellite image data) (Gomez et al., 2016; https://landsat.gsfc.nasa.gov). To improve accuracy and decrease processing time, several Machine Learning (ML) algorithms have been tested for LULC mapping using

* Corresponding author.
E-mail address: [email protected] (Y.G. Yuh).

https://doi.org/10.1016/j.ecoinf.2022.101955
Received 28 July 2022; Received in revised form 11 December 2022; Accepted 13 December 2022
Available online 21 December 2022
1574-9541/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Y.G. Yuh et al. Ecological Informatics 74 (2023) 101955

remote sensing data. Prominent examples include: k-Nearest Neighbors (kNN) (Samaniego and Schulz, 2009; Thakur and Panse, 2022; Zerrouki et al., 2019); Support Vector Machines (SVM) (Adam et al., 2014; Cardoso-Fernandes et al., 2020; Gong et al., 2013; Paneque-Gálvez et al., 2013; Thakur and Panse, 2022; Zerrouki et al., 2019); Artificial Neural Networks (ANN) (Megahed et al., 2015; Pacheco and Hewitt, 2014; Silva et al., 2020; Zerrouki et al., 2019); Random Forest (RF) (Adam et al., 2014; Gong et al., 2013; Thakur and Panse, 2022; Zerrouki et al., 2019); the Maximum Likelihood Classification (MLC) (Guermazi et al., 2016); and Decision Trees (DT) (Teodoro, 2015; Thakur and Panse, 2022; Törmä, 2013). These algorithms combine computer science and data mining to solve classification, clustering, regression and other pattern recognition problems (Cracknell and Reading, 2014; Hastie et al., 2009). They employ supervised classification systems using training datasets to minimize classification errors that could otherwise be caused by the internal structure of the algorithms (Bousquet et al., 2004; Hastie et al., 2009). As a result, ML algorithms can be used to improve classification performance without needing to articulate the underlying mechanisms and assumptions of traditional statistical models (Clarke, 2013; Hastie et al., 2009). They can therefore be trained using both balanced datasets (with the same number of pixels sampled for each LULC class) and imbalanced datasets (with different numbers of pixels sampled for each LULC class) without major classification uncertainties. Here, we focus on four ML algorithms, kNN, SVM, ANN, and RF, which have been shown to be well suited to LULC classification and to outperform other algorithms such as MLC and DT (Khatami and Mountrakis, 2016; Noi and Kappas, 2017). The kNN model is a non-parametric model that performs LULC classification based on the distance between k closest samples drawn from training datasets. The approach depends on thorough image (predictor) pre-processing so as to reduce sampling bias and ensure equal treatment of predictors when computing distance (Kuhn and Johnson, 2016). The SVM model uses support vectors (i.e. the subset of training data points closest to the decision boundaries) to locate optimal decision boundaries that separate two LULC classes (Cortes and Vapnik, 1995; Kuhn and Johnson, 2016). The ANN model is a mathematical model developed as an analogy of the human brain. Using an interconnected group of responsive and conducting nodes, the ANN model mimics, in a very simplified fashion, the functionality of the human brain for knowledge acquisition, recall, synthesis and problem solving (Kubat, 1999; Yang, 2009). In LULC classification, the Multi-Layer Perceptron (MLP) type of ANN has been used most often (Silva et al., 2020). MLP carries out backpropagation of training samples to accurately classify LULC. The RF was developed as an ensemble of ML models that use bootstrap techniques to build many single decision tree models (Breiman, 2001; Mellor et al., 2013; Rodriguez-Galiano et al., 2012). The RF model uses subsets of predictor variables (e.g. Landsat bands) to split observation datasets into subsets of homogenous samples to build each decision tree (Mellor et al., 2013).

The kNN, SVM, ANN, and RF learning approaches have proven successful in improving LULC classification performance (Khatami and Mountrakis, 2016; Noi and Kappas, 2017), but the application of these methods requires considerable image preprocessing (particularly with coarse resolution images) in order to reduce uncertainties in LULC classifications. Furthermore, there has been limited application of these approaches to effective monitoring of changes in LULC within tropical forest areas across Africa, for which coarse resolution satellite images are often the only available option. Those studies that do exist have generally relied on applying only a single method (Brink and Eva, 2008; Matlhodi et al., 2019; Midekisa et al., 2017; Zoungrana et al., 2015), which can increase classification uncertainties relative to the use of multiple ML methods.

In this study, our goal is to apply statistical and ML approaches using four classification algorithms (kNN, SVM, ANN, and RF) to map and quantify changes in LULC within a tropical forest and savanna region in Central Africa (the Mayo Rey department of North Province, Cameroon), and to provide a novel comparison of these algorithms in an African forest setting. Changes in LULC patterns in this part of Cameroon are strongly affected by socio-economic factors such as changing farming practices, legal and illegal logging, and increases in the practice of pastoral nomadism. Describing and understanding the impacts of socio-economic and demographic changes on land use is vital for developing integrated, socially and economically sustainable environmental management and biodiversity conservation. Therefore, in addition to comparing classification algorithms, a goal of this study is to produce accurate LULC maps and estimates of land cover changes over the past 20 years. The result of this analysis can serve as baseline information that is required by the Cameroon government for policy development, biodiversity and forest conservation planning, urban planning, and deforestation and agricultural monitoring within this region.

2. Materials and methods

2.1. The study area

Our study area consists of the northern portion of the Mayo Rey department, located in the North Province of Cameroon (Fig. 1). We chose this region because: (1) it is a sub-Saharan tropical region with mostly cloud-free Landsat images that are freely available, and easy to acquire and preprocess; and (2) the region is experiencing large scale deforestation and agricultural expansion, and a rise in migration associated with political tensions in neighboring countries (Chad and the Central African Republic) (Ndjidda, 2001; Tchobsala and Mbolo, 2010; Tchotsoua, 2006).

The Mayo Rey department covers a total surface area of approximately 36,000 km². The population of the Mayo Rey is ~242,000 people, and this region borders on two countries: Chad to the northeast and the Central African Republic to the southeast. Our study area covers approximately 800,000 ha in northern Mayo-Rey, located between longitudes 13.7° E and 15° E and latitudes 8.4° N and 9.4° N (Fig. 1). Annual mean rainfall ranges between 800 and 1000 mm and typical temperatures range from 25 to 30 °C, though maximum temperature can reach values as high as 45 °C. Elevations range between 348 and 794 m above sea level. This region is part of the Afrotropic biome and supports savanna and forest ecoregions, with the largest intact tracts of savanna forest found within the Bouba Njida National Park (Olson et al., 2001). This national park contains a wide variety of ecosystems, some of which include: open and mixed wooded savanna grasslands, semi-evergreen riparian forests and thick dry savanna forests. Our study region also supports a diversity of large mammalian fauna including elephants, lions, spotted hyena, buffalo, and many species of monkey and antelope.

2.2. Image acquisition and pre-processing

We downloaded Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and Landsat 8 Operational Land Imager (OLI) images of northern Cameroon from https://earthexplorer.usgs.gov/, for the dates 17 November 2000 and 29 November 2020, and with a 10% maximum cloud threshold. The images were loaded and preprocessed in R (R Core Team, 2016) using the “raster”, “rgdal” and “RStoolbox” packages (a full description of packages is shown in Table M2).

Image preprocessing was conducted using radiometric corrections (Jensen, 2005). Radiometric corrections convert digital satellite numbers to radiance measures; this process corrects internal sensor errors and reduces atmospheric noise. To perform radiometric corrections, we applied four techniques using the radCor (Radiometric Calibration and Correction) function from the “RStoolbox” package (Leutner and Horning, 2016): 1) the apparent reflectance model (AR) (Caselles and Garcia, 1989); 2) simple dark object subtraction (SDOS) (Chavez, 1988); 3) dark object subtraction (DOS) (Chavez, 1988); and 4) Cosine estimation of atmospheric transmittance (COST) (Chavez, 1996). In combination, these four radiometric processing techniques provided the necessary image preprocessing for our analyses.
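The first links of this correction chain can be illustrated with a minimal Python sketch of the textbook formulas: digital number to at-satellite radiance, radiance to apparent (top-of-atmosphere) reflectance, and a simple dark object subtraction. This is not the RStoolbox radCor implementation; the gain, bias, solar irradiance (ESUN), sun elevation and pixel values below are hypothetical placeholders, whereas real values come from the scene metadata.

```python
import math

def dn_to_radiance(dn, gain, bias):
    """Convert a digital number (DN) to at-satellite spectral radiance."""
    return gain * dn + bias

def radiance_to_toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au=1.0):
    """Apparent reflectance: pi * L * d^2 / (ESUN * cos(solar zenith))."""
    zenith = math.radians(90.0 - sun_elevation_deg)
    return math.pi * radiance * earth_sun_dist_au ** 2 / (esun * math.cos(zenith))

def simple_dark_object_subtraction(band_dns):
    """Subtract the darkest value in the band as a crude haze estimate."""
    haze = min(band_dns)
    return [dn - haze for dn in band_dns]

# Hypothetical values for a single band (assumptions, not scene metadata).
band = [52, 60, 75, 130]
GAIN, BIAS = 0.8, 1.0
ESUN = 1500.0
SUN_ELEVATION_DEG = 55.0

dehazed = simple_dark_object_subtraction(band)
reflectance = [
    radiance_to_toa_reflectance(dn_to_radiance(dn, GAIN, BIAS), ESUN, SUN_ELEVATION_DEG)
    for dn in band
]
print(dehazed)
print([round(r, 4) for r in reflectance])
```

The DOS and COST variants refine the same idea by modelling how the haze term and the atmospheric transmittance vary by band, which this sketch does not attempt.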


Fig. 1. Location of the study area within the Mayo Rey department of northern Cameroon.

We first converted the Landsat 7 and 8 digital numbers to at-satellite radiance. We then applied the AR model, which helps correct the spectral band irradiance and solar zenith angle of the acquired images, to convert the at-satellite radiance to top of atmosphere (TOA) reflectance. However, the AR model does not correct for atmospheric scattering and absorption, and we therefore applied the SDOS to carry out haze reduction, followed by the DOS approach, which assumes that scattering is highest in the blue bands and gradually decreases towards the near infra-red (NIR) bands. We used DOS to remove atmospheric scattering, and corrected for atmospheric additive scattering, spectral band irradiance and solar zenith. The main limitation to these approaches is that they do not produce correct band reflectance values after removing atmospheric scattering (Chavez, 1996). To address this limitation, we completed the radiometric corrections with the COST model to correct for the multiplicative effects of atmospheric scattering and absorption, and produce images with correct band reflectance values. We then extracted the study area from the corrected images, and selected six bands as independent variables for image processing. We used Landsat 7 band numbers 1, 2, 3, 4, 5 and 7, which correspond to wavelengths of 0.45–0.52 μm, 0.52–0.60 μm, 0.63–0.69 μm, 0.77–0.90 μm, 1.55–1.75 μm and 2.08–2.35 μm, respectively, and Landsat 8 band numbers 2, 3, 4, 5, 6 and 7, which correspond to wavelengths of 0.45–0.51 μm, 0.53–0.59 μm, 0.64–0.67 μm, 0.85–0.88 μm, 1.57–1.65 μm and 2.11–2.29 μm, respectively. More details about these wavelength values are found in https://www.usgs.gov/landsat-missions/landsat-7 and https://www.usgs.gov/landsat-missions/landsat-8.

2.3. Image processing (classification and change detection)

To process (classify) the atmospherically corrected surface reflectance images of the study area for both years, we used a series of R packages, including: “randomForest”, “caret”, “kknn”, “rpart”, “rgdal”, “raster”, “sp”, “e1071”, “RStoolbox”, “nnet”, “kernlab”, “ggplot2” and the “NeuralNetTools” packages. Full descriptions of packages are shown in Table S1.

2.3.1. Generating training (test and validation) datasets

We used three steps for image classification: 1) the establishment of training datasets, 2) classification, and 3) accuracy assessment. In generating training datasets, we first identified eight LULC classes in our study area, including croplands, dense forest, grassland savanna, open savanna/barelands, built-up areas, water bodies, wetlands, and woody savanna (see Table S2 for a full description of the LULC classes). We identified and selected these eight LULC classes to be consistent with the land cover types used by Moderate Resolution Imaging Spectroradiometer (MODIS) Global Land Cover products for the years 2010 and 2020. These MODIS data are generated by NASA, and mapped at a 500
m pixel resolution (Friedl and Sulla-Menashe, 2019). We also incorporated results from published datasets that examined and mapped single land cover types, i.e., global forest cover data (Hansen et al., 2013; Potapov et al., 2020), and global croplands and built-up areas data (Potapov et al., 2020, 2021). These published datasets have been validated through statistically significant correlations with ancillary datasets from the United Nations Food and Agricultural Organization (FAO), as well as with other global land cover products generated by the NASA Global Ecosystems Dynamics Investigation (GEDI) service (Potapov et al., 2022).

Next, using these previously published datasets as reference, we carried out a balanced land cover data sampling approach by randomly sampling approximately 500 pixel points (representing approximately 5% of image pixels) for each land cover class, in each year of study, based on the sampling approaches for each class applied in Potapov et al. (2021, 2022) and Friedl and Sulla-Menashe (2019). For example, we sampled forest pixels from areas of land with trees ≥5 m in height (Potapov et al., 2021, 2022), and a canopy cover ≥20% (FAO, 2000b; Friedl and Sulla-Menashe, 2019). We then separated dense forest pixel samples from woody savanna pixel samples by following the criteria of Friedl and Sulla-Menashe (2019), i.e. dense forest had >60% canopy cover, and woody savannas had between 30% and 60% canopy cover (Table 1; see Table S2 for more details). We loaded the training datasets in R, using the “sp” vector package, then allocated 80% of the data as test files and the remaining 20% as validation files using the createDataPartition() function from the “caret” package. The test datasets enabled us to check optimal model parameters and initial model performance based on repeated cross-validation, while the validation dataset enabled us to check final model accuracy (Qian et al., 2015).

2.3.2. Performing LULC classifications and change detection

Using the 80% test data alongside the image subset for the study area, we applied four image classification models (kNN, SVM, ANN and RF) to classify the test data by applying the train() function from the “caret” package. For all four classification approaches, we used the names of the eight LULC classes identified (croplands, dense forest, grassland savanna, open savanna/barelands, built-up areas, water bodies, wetlands, and woody savanna) as target variable values, and the six Landsat image bands described in Section 2.2 as predictor variables. Before training each model, we defined a set of model tuning parameters using the trainControl() function of the “caret” package. Each modeling algorithm had at least one tuning parameter that controlled model performance, and the trainControl() function helped to evaluate these tuning parameters for model performance. Table S3 shows the parameterization settings (i.e. model type, number of tuning parameters/iterations, tuning methods and description) for each of the four ML algorithms.

2.3.2.1. kNN: k-nearest neighbour classification. In classifying pixels into land use categories with the kNN model, we used the “kknn” package. This model considers a group of k samples that are closest to the unknown sample, with the class of each unknown sample deduced by calculating the average of the k nearest neighbors (Akbulut et al., 2017; Wei et al., 2017). In training the kNN classifier, we defined the LULC classes of the test datasets as target (response) variables and the subset of image band reflectance values as predictors. Prior to performing model runs, we centered and scaled the predictor variables in order to reduce sampling bias during distance computation (Kuhn and Johnson, 2016). Centering and scaling of predictors were done using the center_scale() function of the “caret” package.
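The partitioning, scaling and nearest-neighbour steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the caret/kknn workflow used in the study: the class centres and synthetic six-band pixels are hypothetical, the helper names are invented for the example, and class assignment is done by a simple majority vote among the k nearest training pixels.

```python
import math
import random
from collections import Counter, defaultdict

def stratified_split(samples, train_fraction=0.8, seed=42):
    """Split (features, label) pairs so every class keeps the same share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    train, holdout = [], []
    for label in sorted(by_class):
        rows = by_class[label]
        rng.shuffle(rows)
        cut = int(round(train_fraction * len(rows)))
        train.extend(rows[:cut])
        holdout.extend(rows[cut:])
    return train, holdout

def fit_scaler(rows):
    """Per-band mean and standard deviation, estimated on training rows only."""
    n_bands = len(rows[0][0])
    means, stds = [], []
    for b in range(n_bands):
        values = [features[b] for features, _ in rows]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        means.append(mean)
        stds.append(math.sqrt(var) or 1.0)
    return means, stds

def scale_rows(rows, means, stds):
    """Center and scale every band so no band dominates the distance."""
    return [([(x - m) / s for x, m, s in zip(f, means, stds)], label) for f, label in rows]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_rows, query, k=5):
    """Majority vote among the k training pixels nearest to the query pixel."""
    nearest = sorted(train_rows, key=lambda row: euclidean(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical reflectance centres for three of the eight classes (illustrative only).
CENTRES = {"water bodies": 0.05, "croplands": 0.30, "woody savanna": 0.55}
rng = random.Random(0)
samples = [([rng.gauss(c, 0.02) for _ in range(6)], label)
           for label, c in CENTRES.items() for _ in range(50)]

train, holdout = stratified_split(samples)
means, stds = fit_scaler(train)
train_scaled = scale_rows(train, means, stds)
holdout_scaled = scale_rows(holdout, means, stds)
correct = sum(knn_predict(train_scaled, f, k=5) == label for f, label in holdout_scaled)
accuracy = correct / len(holdout_scaled)
print(len(train), len(holdout), accuracy)
```

Fitting the scaler on the training partition only, and reusing its means and standard deviations for the held-out pixels, keeps information from the validation data out of model fitting.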

Table 1
Land use classifications and definitions used for sampling pixels.

Land use class | Land use class definition criteria for sampling pixels | Source
Croplands | Cultivated crops >60% of area | Pixel points extracted from Potapov et al. (2021); selected only points that matched the criteria defined in Friedl and Sulla-Menashe (2019)
Dense forest | Canopy height ≥5 m; canopy cover >60% | Potapov et al. (2021, 2022); Friedl and Sulla-Menashe (2019)
Grassland savanna | Canopy height <5 m; herbaceous non-agricultural vegetation or grassland cover >10% | Friedl and Sulla-Menashe (2019)
Open savanna/barelands | Vegetation cover <10% | Friedl and Sulla-Menashe (2019)
Built-up areas | Human-made land surfaces associated with built structures, such as commercial and residential infrastructures, and roads | Potapov et al. (2020)
Water bodies | Inland areas covered with at least 60% permanent water, and not obscured by objects above the surface such as buildings, tree canopies, and bridges | Potapov et al. (2020) and Friedl and Sulla-Menashe (2019)
Wetlands | Vegetated and non-vegetated lands inundated with between 30 and 60% water, and usually forming swampy or peatlands | Friedl and Sulla-Menashe (2019)
Woody savanna | Canopy height ≥5 m; canopy cover between 30% and 60% | Potapov et al. (2021, 2022); Friedl and Sulla-Menashe (2019)

2.3.2.2. SVM: support vector machines classification. In classifying with the SVM model, we used the packages: “e1071”, “kernlab” and “svmRadial”. These packages use the radial basis function (RBF) kernel of the SVM classifier to accurately perform LULC classification (Knorn et al., 2009; Shi and Yang, 2015). We carried out automatic tuning and again centered and scaled our predictor variables in order to reduce sampling bias.

2.3.2.3. ANN: artificial neural networks classification. In classifying with the MLP ANN model, we used the package “nnet”, which provides possibilities for adjusting weight decay and size, thereby countering the effects of model overfitting. We used an MLP ANN architecture with 1 hidden layer established as a default setting within the “nnet” package, and with 6 neurons defined for our model inputs. The number of neurons in the input layer was equal to the number of used bands (6), and the output layer had 8 neurons (representing 8 LULC classes). A back propagation learning algorithm was used during the training phase of the model. Size and decay were used to define the primary model tuning parameters, and the control () function was used to control for model runs. As with the kNN approach, we defined the LULC classes of the test datasets as target variables and the band reflectance values as predictors, and equally centered and scaled the predictor variables in order to reduce sampling bias.

2.3.2.4. RF: random forest classification. With the RF approach, we used the “randomForest” package. We allowed the model to set the number of trees (ntree) and number of features in each split (mtry) by default so as to ensure satisfactory model performance (Duro et al., 2012; Matlhodi et al., 2019; Zhang and Roy, 2017); i.e., about 500 decision trees were created by the model under default settings, with over 3000 training samples randomly selected for training purposes under default settings.
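To make the RF default settings concrete, the following Python sketch works out what they imply for six predictor bands. It assumes the documented randomForest defaults for classification (ntree = 500, mtry = floor(sqrt(p)), bootstrap samples of the full training-set size drawn with replacement); the figure of 3200 training pixels is a hypothetical value (eight classes × 500 pixels × 80%), not a number reported by the authors.

```python
import math
import random

def default_mtry_classification(n_predictors):
    """Default number of candidate predictors per split for classification."""
    return max(1, math.floor(math.sqrt(n_predictors)))

def bootstrap_sample(n_samples, rng):
    """Row indices drawn with replacement: the training set of one tree."""
    return rng.choices(range(n_samples), k=n_samples)

N_BANDS = 6                # six Landsat bands used as predictors
N_TREES = 500              # default ntree
N_TRAINING_PIXELS = 3200   # hypothetical training-set size

rng = random.Random(1)
mtry = default_mtry_classification(N_BANDS)
in_bag = [
    len(set(bootstrap_sample(N_TRAINING_PIXELS, rng))) / N_TRAINING_PIXELS
    for _ in range(N_TREES)
]
mean_in_bag = sum(in_bag) / N_TREES
print(mtry, round(mean_in_bag, 3))
```

Each tree therefore sees roughly 63% of the distinct training pixels, and the remaining out-of-bag pixels are what the random forest uses for its internal error estimate.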


2.3.3. Estimating classification accuracy

To produce and validate our LULC maps from all ML models, we applied two different approaches. In the first approach, classification accuracies for all four models (kNN, SVM, ANN, and RF) were computed and compared with the test and validation datasets using the confusionMatrix() function from the “caret” package. We computed four commonly used accuracy metrics: overall accuracy (OA), producer’s accuracy, user’s accuracy and Kappa coefficients, with OA values validated through statistical significance tests, and at 95% confidence intervals (CI). The OA defines the overall percentage of correctly classified LULC classes, calculated as the number of correctly classified land cover pixels divided by the total number of pixels in the dataset (Congalton, 1991). The producer’s accuracy defines the percentage accuracy of each LULC class in a LULC map, calculated by dividing the number of correct pixels in a given land cover class by the total number of pixels of that land cover class from the reference data. In producer’s accuracy, misclassified pixels are referred to as errors of omission. The user’s accuracy defines the reliability of a given land cover map with respect to how close the derived map is to ground observations, calculated by dividing the number of correctly classified pixels in a given land cover class by the total number of pixels classified in that class. In user’s accuracy, misclassified pixels are referred to as errors of commission. The kappa coefficient describes the percentage agreement between the test and validation data in a generated land cover map. It is based on the probability that the test data will be close to the validation data in the land cover mapping process. The kappa coefficient is highly correlated to the overall accuracy. In general, these accuracy scores determine the degree to which a classified land cover map agrees with reality or conforms to the truth (Campbell, 1996; Smits et al., 1999). They have been successfully used to validate land cover maps generated at different geographical scales (Liu et al., 2021; Sari et al., 2021; Wang et al., 2013; Yang et al., 2021; Yuh et al., 2019), and therefore produce a robust approach for validating land cover. From the kNN, SVM, ANN, and RF accuracy assessments, we generated LULC maps by predicting model results with the subset image of the study area, using the predict () function from the R “prediction” package.

In a second approach, we performed a Pearson’s correlation test between our generated LULC products (i.e. the land cover product from the model with the best classification accuracy) and datasets from already published studies such as the Hansen et al. (2013) global forest cover map, the 2020 MODIS global land cover products (Friedl and Sulla-Menashe, 2019), and the global built-up and cropland data published in Potapov et al. (2020, 2021). We used these published datasets because they have been generated with high degrees of accuracy, and have been properly validated through statistically significant correlations with ancillary datasets from the United Nations Food and Agricultural Organization (FAO), as well as with other global land cover products generated by the NASA Global Ecosystems Dynamics Investigation (GEDI) service (Potapov et al., 2022). The validated datasets were then converted to vector layers using the raster to polygon conversion tool in GIS software (ArcMap 10.8), and the attribute tables for both years of study were intersected for change detection analysis (Yuh et al., 2019). Detected changes between LULC types (i.e. change from one LULC type in the year 2000 to another in the year 2020) were quantified in hectares using spatial statistics with the ArcGIS geometry tool.

Fig. 2. Comparison of LULC classification of the study area between the years 2000 (left) and 2020 (right) based on the four models. Figures a-b illustrate ANN classification maps; c-d illustrate kNN maps; e-f illustrate RF maps and g-h illustrate SVM maps.

3. Results

3.1. Model performance and LULC mapping

The LULC maps produced by the four ML classification algorithms are shown in Fig. 2. All four ML models performed well at producing LULC classifications for both years of study (2000 and 2020) with OA scores of >80%, and statistically significant (p < 0.05) correlations with existing LULC maps. The model accuracies for all four models (producer’s, user’s and overall accuracies, as well as kappa values) are shown in Tables 2-5. The RF model had the best overall performance (OA of 90% for the year 2020 and 99% for the year 2000), and it outperformed the kNN, SVM and ANN models which had OAs of between 80% and 90% for both years of study. Because the RF model produced the best OA, we used the RF LULC maps from both years of study for further processing. To further validate the RF LULC maps, we correlated them with existing LULC maps (result shown in Tables 6 and 7), and then quantified the areas affected by LULC change between 2000 and 2020 using these validated maps. Pearson’s correlation tests show that our land cover classes were strongly and significantly correlated with map products published in Hansen et al. (2013), Friedl and Sulla-Menashe (2019), and Potapov et al. (2020, 2021). For example, our woody savanna areas were strongly and significantly correlated with woody savannas extracted from Potapov et al. (2020) and Hansen et al. (2013) (R² = 0.98, p < 0.05 for the year 2020, and R² = 0.99, p < 0.05 for the year 2000). Furthermore, we found a 90% correlation between our croplands and those from the 2020 MODIS global land cover dataset (R² = 0.9, p < 0.05), and 98% correlation with croplands published in Potapov et al. (2021). For built-up areas, we also found relatively strong correlations between our datasets and datasets from Potapov et al. (2020) (R² = 0.8, p < 0.05 for the year 2020, and R² = 0.7, p < 0.05 for the year

Table 2
Accuracy assessment for the kNN classification.

2020
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        80         0       0          0              0         0       0         9
Dense forest                     0          5       0          0              0         2       1         0
Grassland savanna                1          0       12         0              0         0       0         0
Open savanna/barelands           0          0       0          5              0         0       0         0
Built-up areas                   0          0       0          3              3         0       0         1
Water bodies                     4          0       0          0              0         221     9         0
Wetlands                         0          0       0          0              0         0       29        0
Woody savanna                    0          0       0          0              0         0       0         412
Total                            85         5       12         8              3         223     39        422
Overall producer’s accuracy (%)  94.3       100     100        63             100       99.1    74.4      97.6
Overall accuracy = 91.1%; 95% CI (89%, 92%); Kappa statistic = 89%; p < 0.05

2000
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        20         0       0          0              0         0       0         33
Dense forest                     0          296     0          6              1         11      0         0
Grassland savanna                0          0       40         0              1         0       1         0
Open savanna/barelands           5          0       1          35             68        0       4         1
Built-up areas                   0          0       0          0              4         0       0         0
Water bodies                     0          0       0          0              0         75      0         0
Wetlands                         0          1       0          0              0         1       77        0
Woody savanna                    0          0       0          0              0         0       0         149
Total                            25         297     41         41             74        87      82        183
Overall producer’s accuracy (%)  80         99.7    99         85.4           91.9      86.2    93.9      81.4
Overall accuracy = 89.7%; 95% CI (85%, 91%); Kappa statistic = 88%; p < 0.05
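
The summary statistics reported under each confusion matrix (overall accuracy, producer's accuracy and the kappa statistic) follow standard definitions. As a minimal sketch, assuming rows hold the classified labels and columns hold the reference labels, they could be computed as shown here. The 3-class matrix and the function name `accuracy_metrics` are hypothetical and are not taken from the study's data or software.

```python
def accuracy_metrics(matrix):
    """Return (overall accuracy, producer's accuracies, Cohen's kappa).

    Assumes matrix[i][j] counts samples classified as class i whose
    reference label is class j (rows = classified, columns = reference).
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n_classes))
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]

    overall = diagonal / total
    # Producer's accuracy: correctly classified samples / reference samples per class.
    producers = [matrix[j][j] / col_totals[j] for j in range(n_classes)]
    # Chance agreement expected from the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, kappa


# Hypothetical 3-class confusion matrix, for illustration only.
example = [
    [50, 2, 3],
    [5, 40, 5],
    [0, 8, 37],
]
overall, producers, kappa = accuracy_metrics(example)
print(f"Overall accuracy: {overall:.3f}")
print("Producer's accuracy:", [round(p, 3) for p in producers])
print(f"Kappa: {kappa:.3f}")
```

Dividing each diagonal count by its row total instead of its column total would give the user's accuracy, which the tables here do not report.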

Y.G. Yuh et al. Ecological Informatics 74 (2023) 101955

Table 3
Accuracy assessment for the ANN classification.

2020
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        80         0       0          0              0         1       0         9
Dense forest                     0          5       0          0              0         0       1         0
Grassland savanna                1          0       12         0              0         0       0         0
Open savanna/barelands           0          0       0          5              0         0       0         0
Built-up areas                   0          0       0          3              3         0       0         1
Water bodies                     4          0       0          0              0         223     9         0
Wetlands                         0          0       0          0              0         0       29        0
Woody savanna                    0          0       0          0              0         0       0         412
Total                            85         5       12         8              3         223     39        422
Overall producer’s accuracy (%)  94.3       100     100        100            100       100     74.4      97.6
Overall accuracy = 95.8%; 95% CI (93%, 97%); Kappa statistic = 94%; p < 0.05

2000
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        79         0       0          0              0         0       0         0
Dense forest                     24         38      0          2              0         3       1         282
Grassland savanna                4          0       41         0              1         0       0         0
Open savanna/barelands           0          0       0          15             15        0       7         36
Built-up areas                   0          1       0          0              31        0       0         0
Water bodies                     0          0       0          0              0         77      0         0
Wetlands                         0          0       0          0              0         1       84        0
Woody savanna                    0          0       0          0              0         0       0         415
Total                            107        38      41         17             47        81      92        733
Overall producer’s accuracy (%)  73.8       100     100        88.2           70        95      91.3      56.6
Overall accuracy = 84.4%; 95% CI (80%, 87%); Kappa statistic = 83%; p < 0.05

2000). We found a 99% correlation with water bodies and wetlands from the 2020 MODIS data (R² = 0.99, p < 0.05). For grassland savanna and open savanna/barelands, average correlation strengths were R² = 0.5, p < 0.05 and R² = 0.48, p < 0.05, respectively. Figs. 3 and 4 show comparisons between our land cover maps and existing land cover maps published in Hansen et al. (2013), Friedl and Sulla-Menashe (2019), and Potapov et al. (2020, 2021).

3.2. Quantification of LULC classification

The results from the different LULC classification approaches conducted for both years of study (2000 and 2020) show that the total LULC area for this section of the Mayo Rey department is approximately 793,000 ha. The areas of individual land cover types and the changes that we detected between the years 2000 and 2020 are summarized in Table 8. We found a significant loss in woody savanna within the study area, and an almost complete loss of what little dense forest cover existed in the study area. In the year 2000, woody savanna covered a total land area of about 304,976 ha in our study area, which constituted approximately 39% of the land area analyzed. Woody savanna declined to approximately 253,903 ha (32%) in the year 2020, a loss of approximately 51,073 ha of woody savanna within the study region. While dense forests covered only 291 ha of our study area in the year 2000, they had declined to about 9 ha by the year 2020, suggesting an almost complete loss of dense forest cover within the study area.

The Mayo Rey department has experienced a large-scale expansion in cropland areas and grassland savanna in the 20 years of the study period. In the year 2000, croplands covered a total land area of approximately 50,000 ha, constituting about 6.3% of our study area. Croplands increased to approximately 376,184 ha (47% of our study area) in the year 2020, for a total increase in cropland area of approximately 326,084 ha. Grassland savanna increased by approximately 126,268 ha within the study period (from ~756 ha (0.1% of the study area) in the year 2000 to ~127,000 ha (16% of the study area) in the year 2020). With the loss in forest cover (both dense forest and woody savanna) and expansion of agriculture, there has also been a significant expansion in built-up areas in this portion of Mayo Rey. Built-up areas expanded by approximately 3538 ha within the 20-year study period (from ~1748 ha (0.2% of the study area) in the year 2000 to ~5286 ha (0.7% of the study area) in the year 2020). We also found that the Mayo Rey department of northern Cameroon has experienced dramatic declines in inland water bodies over the study period. Water bodies have declined from covering approximately 42,829 ha (5.4% of the study area) in the year 2000 to approximately 24,095 ha (3% of the study area) in the year 2020, a loss of inland water bodies of approximately 18,733 ha.

3.3. Quantification of changes in LULC

The changes in LULC between the years 2000 and 2020 are shown in Fig. S1 and Table S4. To highlight these results, we generated thematic hotspot maps for gains and losses in three of the LULC classes identified: woody savanna, croplands and built-up areas (Fig. 5). Overall, changes in LULC over this period were dominated by an expansion of cropland areas. We calculated a gain of 326,084 ha of cropland area over the 20-year period, with over 71,266 ha of open savanna/barelands, and 41,900 ha of woody savanna converted to croplands. A smaller amount of cropland area (14,989 ha) has been lost to other LULC types, with most cropland loss occurring where abandoned croplands have been converted to grassland savannas (approximately 1046 ha), and some cropland loss associated with the expansion of built-up areas (653 ha). An expansion in the built environment has also occurred over the 20


Table 4
Accuracy assessment for the RF classification.

2020
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        80         0       0          0              0         0       0         8
Dense forest                     0          9       0          0              0         6       0         0
Grassland savanna                1          0       12         0              0         0       0         0
Open savanna/barelands           0          0       0          5              0         0       0         0
Built-up areas                   0          0       0          0              3         0       0         1
Water bodies                     4          0       0          4              0         221     9         0
Wetlands                         0          0       0          0              0         0       30        0
Woody savanna                    0          0       0          0              0         0       0         413
Total                            85         9       12         9              3         227     39        422
Overall producer’s accuracy (%)  94.3       100     100        56             100       97.4    76.9      97.9
Overall accuracy = 90.3%; 95% CI (94%, 97%); Kappa statistic = 94%; p < 0.05

2000
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        20         0       0          0              0         0       0         0
Dense forest                     0          38      0          0              0         0       0         0
Grassland savanna                0          0       41         0              0         0       0         0
Open savanna/barelands           0          0       0          17             0         0       0         2
Built-up areas                   0          0       0          0              47        0       0         0
Water bodies                     0          0       0          0              0         76      0         0
Wetlands                         0          0       0          0              0         0       92        0
Woody savanna                    0          0       0          0              0         0       0         33
Total                            20         38      41         17             47        76      92        35
Overall producer’s accuracy (%)  100        100     100        100            100       100     100       94.3
Overall accuracy = 99.3%; 95% CI (94%, 99%); Kappa statistic = 97%; p < 0.05

years of study, with a total of 3994 ha of built-up expansion. Of the land converted to built-up areas, over 653 ha came from abandoned croplands, and another 2982 ha from open savanna/barelands. Of the 282 ha of dense forest cover lost between 2000 and 2020, over 218 ha was converted to croplands, while of the 51,000 ha net loss in woody savanna areas, over 41,900 ha was converted to croplands. Despite this conversion of woody savanna, over 20,000 ha of woody savanna have also been gained (with open savanna/bareland changing to woody savanna in over 18,894 ha of those 20,000 ha).

4. Discussion

4.1. Comparison and validation of ML classification models

In this study, we have provided an analysis and comparison of land cover classifications produced by four ML algorithms for a portion of the Mayo Rey Department of Northern Cameroon. A recent synthesis and meta-analysis demonstrated that tropical regions are underrepresented in current studies, especially across equatorial Africa (Khatami and Mountrakis, 2016). Furthermore, previous studies conducted within tropical regions across Africa have most often used the MLC supervised approach (Pacheco and Hewitt, 2014; Yuh et al., 2019); however, this approach poses several methodological challenges or classification uncertainties when coupled with the coarse-resolution satellite images (i.e., Landsat) used by most remote sensing researchers in Africa. Unlike in Europe or North America, many African countries do not have advanced space agencies with national satellite data collection programs, or large budgets for large-scale land surveys; most African-based remote sensing research relies on freely available Landsat data. Developing methods for reducing classification uncertainties associated with the use of coarse-resolution Landsat images is therefore particularly pertinent for research on land use change in Africa.

Our results show that the four ML classification models used here (kNN, SVM, ANN and RF) are robust approaches that could potentially reduce classification uncertainties within tropical forest regions globally. It is worth noting, however, that we did not consider potential non-linearities that could arise from ecosystem dynamics in this region, which could be addressed in future analyses using models such as Convergent Cross Mapping or Optimal Information Flow (Li and Convertino, 2021). Nevertheless, our results are consistent with other studies that have shown that the four algorithms we used here are able to produce a high degree of classification accuracy (Adam et al., 2014; Ghosh and Joshi, 2014), and as such, they can potentially outperform other supervised classifiers (e.g. MLC) in other tropical contexts as well (Khatami and Mountrakis, 2016).

We found further that the RF model performed the best for both years of study compared to the other three models: RF showed greater than 90% accuracy compared to between 80 and 90% accuracy generated for the SVM, ANN and kNN models. These results show that RF could be the most suitable approach for LULC mapping within tropical regions across Africa, even though some studies have found that the kNN model outperformed RF, as well as the other two algorithms, in some other contexts (Heydari and Mountrakis, 2018; Pouteau et al., 2011). However, consistent with our findings, RF is generally accepted as the best ML approach for mapping LULC (Belgiu and Drăgut, 2016; Pelletier et al., 2016), based on its superior modeling performance when compared to other ML algorithms (Gislason et al., 2006; Rodriguez-Galiano et al., 2012).

The RF algorithm specifically solves problems associated with using freely available coarse-resolution Landsat images, by using Landsat bands to split observation datasets into a subset of homogenous samples, which are then used in building single decision trees (Mellor et al.,


Table 5
Accuracy assessment for the SVM classification.

2020
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        237        0       1          0              0         0       0         9
Dense forest                     0          42      3          0              0         0       0         0
Grassland savanna                0          8       20         0              3         0       0         0
Open savanna/barelands           0          0       1          75             0         0       0         0
Built-up areas                   0          1       5          1              24        0       0         2
Water bodies                     0          0       0          0              0         47      10        0
Wetlands                         0          0       0          0              0         0       29        0
Woody savanna                    0          0       0          0              0         0       0         401
Total                            237        51      30         76             27        47      39        412
Overall producer’s accuracy (%)  100        82.4    66.7       98.7           88.9      100     74.4      97.3
Overall accuracy = 88.6%; 95% CI (87%, 90%); Kappa statistic = 87%; p < 0.05

2000
LULC class                       Croplands  Dense   Grassland  Open savanna/  Built-up  Water   Wetlands  Woody
                                            forest  savanna    barelands      areas     bodies            savanna
Croplands                        20         0       133        0              0         0       0         0
Dense forest                     0          41      1          0              5         0       0         0
Grassland savanna                0          0       158        0              0         0       0         1
Open savanna/barelands           0          0       0          3              0         0       0         2
Built-up areas                   0          0       0          0              11        0       0         0
Water bodies                     0          2       0          0              0         76      0         0
Wetlands                         0          0       2          0              0         0       92        33
Woody savanna                    0          0       0          0              0         0       0         613
Total                            20         43      294        3              16        76      92        649
Overall producer’s accuracy (%)  100        95.3    53.7       100            68.8      100     100       94.5
Overall accuracy = 89%; 95% CI (84%, 91%); Kappa statistic = 87%; p < 0.05

Table 6
Accuracy validation for the year 2020. This table shows correlation strengths between our land cover data and datasets from other published results for the year 2020. Correlation strengths are only determined for land cover classes that are available from the cited studies.

Land cover class        MODIS global land cover products       Global forest cover data               Global croplands data                  Global built-up data
                        Data available?  Correlation strength  Data available?  Correlation strength  Data available?  Correlation strength  Data available?  Correlation strength
Croplands               Yes              0.9                   No               NA                    Yes              0.4                   No               NA
Dense forest            No               NA                    No               NA                    No               NA                    No               NA
Grassland savanna       Yes              0.5                   No               NA                    No               NA                    No               NA
Open savanna/barelands  Yes              0.48                  No               NA                    No               NA                    No               NA
Built-up areas          Yes              NA                    No               NA                    No               NA                    Yes              0.8
Water bodies            Yes              0.99                  No               NA                    No               NA                    No               NA
Wetlands                Yes              0.99                  No               NA                    No               NA                    No               NA
Woody savanna           No               NA                    Yes              0.98                  No               NA                    No               NA
Overall correlation                      0.77                                   0.98                                   0.4                                    0.8

MODIS global land cover products (Friedl and Sulla-Menashe, 2019); Global forest cover data (Hansen et al., 2013; Potapov et al., 2020); Global croplands data (Potapov et al., 2021); Global built-up data (Potapov et al., 2020); NA = Not Applicable; NS = Non-significant.
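
The correlation strengths in this table are reported in the text as R² values. This section does not specify how the paired observations were assembled, so the following is only a sketch: it assumes two equal-length lists of class areas measured over the same spatial units, and takes R² as the squared Pearson coefficient of a simple linear relationship. The two lists and the function name `r_squared` are made up for illustration.

```python
from math import sqrt


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / sqrt(var_x * var_y)
    return r * r


# Hypothetical cropland areas (ha) per grid cell: our map vs. a global product.
ours = [120.0, 340.0, 95.0, 410.0, 230.0]
reference = [110.0, 360.0, 100.0, 395.0, 250.0]
print(f"R^2 = {r_squared(ours, reference):.3f}")
```

A significance test (the reported p < 0.05) would need an additional step, such as a t-test on the correlation coefficient, which is left out of this sketch.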

2013). The best decision trees are automatically selected by the model in an ensemble approach to predict land cover maps following a pixel to pixel sampling approach. Given its high computational power, the RF model is a powerful LULC prediction tool that should be prioritized in LULC mapping within Afrotropical regions. Despite this evidence, however, the ANN and SVM models still remain the most frequently used classification algorithms for monitoring LULC and its change over time using Landsat images (Adam et al., 2014; Gong et al., 2013; Khatami and Mountrakis, 2016).

We validated our RF results by correlating the resulting land cover classifications with existing global land cover products. Although this approach is robust, an alternate validation approach could be to compare results with ground-truthed data or datasets classified at a local level. Local level classification of land surface features and ground-truth mapping can provide more realistic land use and land cover class category identification as compared with using global land cover products for validation. However, we are not aware of any ground-truthed or local level datasets available for our study area. As such, we followed a standardized protocol that was developed for mapping the MODIS global land cover products, as well as for the Hansen et al. (2013) and Potapov et al. (2020, 2021) global forest cover, global croplands and built-up datasets, respectively. We cross validated our results with these products, as these products (especially the Hansen and Potapov datasets) have been mapped with high levels of accuracy, and were validated


Table 7
Accuracy validation for the year 2000. This table shows correlation strengths between our land cover data and datasets from other published results for the year 2000. Correlation strengths are only determined for land cover classes that are available from the cited studies. Global MODIS land cover datasets do not exist for the year 2000, and are therefore excluded from the table.

Land cover class        Global forest cover data               Global croplands data                  Global built-up data
                        Data available?  Correlation strength  Data available?  Correlation strength  Data available?  Correlation strength
Croplands               No               NA                    Yes              0.98                  No               NA
Dense forest            No               NA                    No               NA                    No               NA
Grassland savanna       No               NA                    No               NA                    No               NA
Open savanna/barelands  No               NA                    No               NA                    No               NA
Built-up areas          No               NA                    No               NA                    Yes              0.98
Water bodies            No               NA                    No               NA                    No               NA
Wetlands                No               NA                    No               NA                    No               NA
Woody savanna           Yes              0.99                  No               NA                    No               NA
Overall correlation                      0.99                                   0.98                                   0.98

MODIS global land cover products (Friedl and Sulla-Menashe, 2019); Global forest cover data (Hansen et al., 2013; Potapov et al., 2020); Global croplands data (Potapov et al., 2021); Global built-up data (Potapov et al., 2020); NA = Not Applicable; NS = Non-significant.

Fig. 3. Maps showing comparison in LULC between our study and products extracted from the MODIS global LULC products (Friedl and Sulla-Menashe, 2019), as
well as products published by Potapov et al. (2021). Map comparisons are for the year 2020, and represent a comparison between water bodies (e-f), open savanna (c-
d) and woody savanna (a-b).


Fig. 4. Maps showing comparison in LULC between our study and products extracted from the MODIS global LULC products (Friedl and Sulla-Menashe, 2019), as
well as products published by Potapov et al. (2020, 2021). Map comparisons are for the year 2020, and represent a comparison between croplands (a, c and e) and
built-up areas (b, d and f).

Table 8
Quantified LULC class areas and change areas between the years 2000 and 2020. Percentages represent the fraction of the study area represented by the land cover class in each year, as well as the fraction of study area represented by the change in area between 2000 and 2020.

                        2000                   2020                   2020–2000
Land cover class        Area (ha)    Area (%)  Area (ha)    Area (%)  Changed area (ha)  Changed area (%)
Croplands               50,099.9     6.3       376,184.6    47.3      326,084.7          41.0
Dense forest            291.0        0.0       9.1          0.0       −281.9             0.0
Grassland savanna       756.2        0.1       127,024.3    16.0      126,268.2          15.9
Open savanna/barelands  390,180.9    49.2      2526.5       0.3       −387,654.4         −48.9
Built-up areas          1747.7       0.2       5285.9       0.7       3538.3             0.4
Water bodies            42,828.5     5.4       24,095.1     3.0       −18,733.4          −2.4
Wetlands                2134.1       0.3       5460.3       0.7       3326.2             0.4
Woody savanna           304,975.9    38.5      253,902.8    32.0      −51,073.1          −6.5
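
The percentage and change columns can be re-derived from the per-class areas. The sketch below is one way to do that, assuming each class's share is its area divided by the summed area of all classes in the same year, and that the change in share is the difference between the two years. The variable and function names are illustrative; only the hectare values are copied from Table 8.

```python
# Class areas in hectares, copied from Table 8.
area_2000 = {
    "Croplands": 50099.9,
    "Dense forest": 291.0,
    "Grassland savanna": 756.2,
    "Open savanna/barelands": 390180.9,
    "Built-up areas": 1747.7,
    "Water bodies": 42828.5,
    "Wetlands": 2134.1,
    "Woody savanna": 304975.9,
}
area_2020 = {
    "Croplands": 376184.6,
    "Dense forest": 9.1,
    "Grassland savanna": 127024.3,
    "Open savanna/barelands": 2526.5,
    "Built-up areas": 5285.9,
    "Water bodies": 24095.1,
    "Wetlands": 5460.3,
    "Woody savanna": 253902.8,
}


def shares(areas):
    """Percentage of the summed area taken up by each class."""
    total = sum(areas.values())
    return {name: 100.0 * value / total for name, value in areas.items()}


share_2000 = shares(area_2000)
share_2020 = shares(area_2020)
# Net change per class in hectares and in percentage points of the study area.
change_ha = {name: area_2020[name] - area_2000[name] for name in area_2000}
change_pct = {name: share_2020[name] - share_2000[name] for name in area_2000}

for name in area_2000:
    print(f"{name:24s} {change_ha[name]:12.1f} ha {change_pct[name]:7.1f} %")
```

These are net changes per class; gross gains and losses between specific class pairs need a from-to comparison of the two maps.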


Fig. 5. Hotspot maps for gains and losses in three LULC classes identified to show the highest environmental values in the Mayo Rey department of northern Cameroon.
They include: woody savanna, croplands and built-up areas.
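
Gains and losses of the kind mapped in Fig. 5 come from comparing the class of each location in 2000 with its class in 2020. The study did this by intersecting vector layers in ArcMap; the sketch below swaps in a simpler per-pixel cross-tabulation of two tiny, made-up class grids, assuming 30 m Landsat pixels (0.09 ha each). The grids, the class codes and the helper names `transition_areas` and `gains_and_losses` are hypothetical.

```python
from collections import Counter

PIXEL_AREA_HA = 0.09  # one 30 m x 30 m pixel = 900 m^2 = 0.09 ha


def transition_areas(grid_from, grid_to, pixel_area_ha=PIXEL_AREA_HA):
    """Area (ha) of every (class in first map, class in second map) pair."""
    counts = Counter()
    for row_from, row_to in zip(grid_from, grid_to):
        for class_from, class_to in zip(row_from, row_to):
            counts[(class_from, class_to)] += 1
    return {pair: n * pixel_area_ha for pair, n in counts.items()}


def gains_and_losses(transitions, target):
    """Gross gain and gross loss (ha) of one class from a transition table."""
    gain = sum(a for (src, dst), a in transitions.items() if dst == target and src != target)
    loss = sum(a for (src, dst), a in transitions.items() if src == target and dst != target)
    return gain, loss


# Made-up 3 x 4 class grids: W = woody savanna, C = croplands, B = built-up.
map_2000 = [
    ["W", "W", "C", "C"],
    ["W", "W", "W", "C"],
    ["W", "C", "C", "B"],
]
map_2020 = [
    ["W", "C", "C", "C"],
    ["C", "C", "W", "B"],
    ["W", "C", "B", "B"],
]

transitions = transition_areas(map_2000, map_2020)
gain, loss = gains_and_losses(transitions, "C")
print(f"Cropland gain: {gain:.2f} ha, loss: {loss:.2f} ha, net: {gain - loss:.2f} ha")
```

Keeping the full from-to table, rather than only net areas, is what allows statements such as how much woody savanna was converted to croplands.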

with more conventional validation datasets from the United Nations Food and Agriculture Organization (FAO). We do interpret our results with some caution, as the MODIS global land cover data has several limitations due to misclassification of some land cover features (Friedl and Sulla-Menashe, 2019), and so direct comparisons may produce some erroneous results. Nevertheless, our overall results provide a novel analysis and maps that can be usefully applied for policy development and sustainable land use planning in Cameroon.

4.2. Changes in land cover between 2000 and 2020

The result of our comparison of the RF LULC classifications between 2000 and 2020 showed substantial changes in land cover in the Mayo Rey department over this 20-year period, characterized by increased croplands and built-up areas, and corresponding decreases in forested areas. It is notable here that the area of dense forest in this region decreased from an area in the year 2000 that was comparable to that of Central Park in New York City (340 ha; Britannica, 2022) to a forested area in 2020 that was only 9 ha, about half the size of the Buckingham Palace grounds in London (Royal Collection Trust, 2022).

It is likely that political tensions in neighbouring Chad and the Central African Republic have contributed to the increased population density that is associated with some of the land use changes we documented in our study. The expansion in croplands and built-up areas, and the loss of dense forests and woody savannas in the Mayo Rey have occurred in parallel with a rapid rise in migration into this region, as refugees flee political tensions from neighboring countries (Chad and the Central African Republic). A lack of economic opportunities for migrants and displaced people has contributed to ecological pressures in a part of the world already economically disadvantaged by neocolonial extractivism. This region has seen a rise in illegal logging, and conversion of forest and woody savanna land for agriculture and nomadic pastoral use (Ndjidda, 2001; Tchobsala and Mbolo, 2010; Tchotsoua, 2006). These tensions and land use changes are an outcome of


extractivism embedded in contemporary global capitalism and associated geopolitical conflict, where resources flow unequally and unsustainably to the global north (Escobar, 2011; Pereira and Tsikata, 2021). Furthermore, many people in this region depend on fuel wood for heating and cooking (Megevand, 2013), which can also contribute to deforestation when coupled with rapid social and demographic shifts. In addition, an increase in the number of people practicing nomadic pastoralist livelihoods may be contributing to a loss in forest cover and woody savanna and associated expansion of grassland savanna, croplands and open savanna/barelands in a context where local traditional ecological knowledge is being lost due to human displacement. Traditionally, mobility has been a strategy Indigenous pastoralists have used in order to reduce negative impacts on the land, and to respond sustainably to change (Kongnso, 2022). However, geopolitical pressures disrupt sustainable traditional practices and lifeways, leading to over-grazing of lands by cattle, which can then lead to desertification and deforestation (Asner et al., 2004; Kongnso, 2022). When combined with geopolitical tensions and human displacement, cattle grazing and other forms of agriculture, especially industrial agriculture, may have led to a large-scale reduction of surrounding water bodies through irrigation changes and desertification (Fonteh, 2014).

The loss in water bodies that we documented could also be related to climate change (Fonteh, 2014), considering that the study area faces high seasonal temperatures with relatively low precipitation rates. Increasing temperatures resulting from climate change are associated with increased evapotranspiration (Cheo et al., 2013), which can contribute to reduced lake area and altered surface runoff patterns (Frederick, 2002). In addition to climate change effects, inland water loss could also be attributed to flood mitigation and post-flooding reconstruction projects in the Far North region of Cameroon (https://www.worldbank.org/en/results/2020/11/10/flood-management-in-the-far-north-of-cameroon). The Northern region of Cameroon experienced high levels of flooding in the year 2012 as a result of high rainfall, causing extensive damage to property and crops. As a consequence, the Cameroon government and the World Bank implemented an emergency rehabilitation plan (2014–2020). This plan oversaw the building of more than 7000 ha of dykes, 2700 ha of dams, and 7500 ha of irrigation schemes. Our study found significant decreases in the area of inland water bodies near these rehabilitated areas, suggesting that they may have had unintended negative consequences for the Mayo Rey and surrounding ecosystems.

5. Conclusion

Our study provides a first attempt, to our knowledge, to apply and compare four statistical and ML models (kNN, ANN, RF and SVM) as potentially robust means of monitoring changes in LULC using coarse-resolution satellite images within a tropical African biome. By testing these approaches with cloud-free Landsat images from the northern section of the Mayo Rey department of northern Cameroon, we showed that all four classification algorithms provided significant and relatively high degrees of accuracy in LULC classification (i.e. all models had >80% OA), supporting similar findings from other regions of the world. As a result, highly accurate LULC maps and quantified change detection derived through the application of these ML approaches are possible. Our findings show that the RF model outperformed the kNN, SVM and ANN models, and produced highly accurate LULC maps, which produced statistically significant correlations when validated against other existing global LULC products.

We showed further that significant areas of forest (dense forest and woody savanna) have been lost through conversion to other LULC types within the 20-year study period. In particular, large proportions of these forest areas were converted to croplands and built-up areas between 2000 and 2020. We suggest that many of the LULC changes we observed are related to growth in population density within the study area, without an associated increase in economic and social support systems. In particular, there have been high rates of immigration in this area as a result of armed conflicts in the neighboring countries Chad and the Central African Republic – a situation that requires attention and resource allocation so that all the people involved can support themselves sustainably in the region while maintaining important ecosystem services and relationships with land, water and forests based on local ecological knowledge systems.

Our results provide baseline information required by the Cameroon government for policy development, conservation planning, urban planning, and deforestation and agricultural monitoring in northern Cameroon. Our methodological approaches will foster the advancement of knowledge in the application of ML algorithms for LULC monitoring within tropical rain forest regions across Africa, especially with the use of coarse-resolution Landsat images. We recommend that these mapping approaches be tested further in forested areas across other African regions that remain underrepresented in the remote sensing literature.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Yisa Ginath Yuh: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Wiktor Tracz: Supervision, Validation, Writing – review & editing. H. Damon Matthews: Validation, Writing – review & editing. Sarah E. Turner: Validation, Writing – review & editing.

Declaration of Competing Interest

The authors declare no competing financial interest.

Data availability

Data will be made available on request.

Acknowledgements

This work was conducted as independent research coursework for a Master of Science diploma in Forest Information Technology at the Faculty of Forestry, Warsaw University of Life Sciences, Poland. Sincere thanks therefore go to Prof. Michal Zasada for providing YGY with hands-on practice in statistics and machine learning for biodiversity. We also acknowledge the contributions of Prof. Dr. Jan-Peter Mund for his organized seminars and training modules in remote sensing of the environment. Further thanks go to Herve Noundo Brice for providing assistance with resources for data curation, compilation and analysis, and to Dr. Julián Idrobo and Dr. Nalini Mohabir for editorial suggestions.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.ecoinf.2022.101955.

References

Adam, E., Mutanga, O., Odindi, J., Abdel-Rahman, E.M., 2014. Land-use/cover classification in a heterogeneous coastal landscape using RapidEye imagery: evaluating the performance of random forest and support vector machines classifiers. Int. J. Remote Sens. 35, 3440–3458.
Akbulut, Y., Sengur, A., Guo, Y., Smarandache, F., 2017. NS-k-NN: neutrosophic set-based k-nearest neighbors classifier. Symmetry 9, 179.
Asner, G.P., Elmore, A.J., Olander, L.P., Martin, R.E., Harris, A.T., 2004. Grazing systems, ecosystem responses, and global change. Annu. Rev. Environ. Resour. 29, 261–299.


Belgiu, M., Drăgut, L., 2016. Random forest in remote sensing: a review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 114, 24–31.
Bousquet, O., Boucheron, S., Lugosi, G., 2004. Introduction to statistical learning theory. In: Bousquet, O., von Luxburg, U., Ratsch, G. (Eds.), Advanced Lectures on Machine Learning. Lecture Notes in Artificial Intelligence, vol. 3176. Springer, Heidelberg, Germany.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Brink, A.B., Eva, H.D., 2008. Monitoring land cover change dynamics in Africa: a sample based remote sensing approach. Appl. Geogr. 29, 1–12.
Britannica, T. (Ed.), n.d. of Encyclopaedia (2022, September 2). Central Park. Encyclopedia Britannica. https://www.britannica.com/place/Central-Park-New-York-City.
Campbell, J.B., 1996. Introduction to Remote Sensing, 2nd ed. Taylor and Francis, London.
Cardoso-Fernandes, J., Teodoro, A.C., Lima, A., Roda-Robles, E., 2020. Semi-automatization of support vector machines to map lithium (Li) bearing pegmatites. Remote Sens. 12 (14), 2319. https://doi.org/10.3390/rs12142319.
Caselles, V., Garcia, M.L., 1989. An alternative approach to estimate atmospheric correction in multitemporal studies. Int. J. Remote Sens. 10, 1127–1134.
Chavez, P.S., 1988. An improved dark-object subtraction technique for atmospheric scattering correction of multispectral data. Rem. Sens. Environ. 24, 459–479.
Chavez, P.S., 1996. Image-based atmospheric correction-revisited and improved. Photogramm. Eng. Remote. Sens. 62, 1025–1036.
Cheo, A.E., Voigt, H.-J., Mbua, R.L., 2013. Vulnerability of water resources in northern Cameroon in the context of climate change. Environ. Earth Sci. 70, 1211–1217.
Clarke, B., 2013. Guest editorial for special issue of statistical analysis and data mining, 6 (4), 271–272.
Congalton, R.G., 1991. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 37 (1), 35–46.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Mach. Learn. 20, 273–297.
Cracknell, M.J., Reading, A.M., 2014. Geological mapping using remote sensing data: a comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput. Geosci. 63, 22–33.
Duro, D.C., Franklin, S.E., Dubé, M.G., 2012. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Rem. Sens. Environ. 118, 259–272.
Erika, R., Celso, B.L., Martin, H., Erik, L., Robert, O., Arief, W., Daniel, M., Louis, V., 2015. Assessing change in national forest monitoring capacities of 99 tropical Countries. For. Ecol. Manag. 352 (109–123), 76.
Escobar, A., 2011. Encountering development: The making and unmaking of the Third World, vol. 1. Princeton University Press.
FAO, 2000. On Definitions of Forest and Forest Change, Forest Resources Assessment Programme Working Paper 33, November, 2000. FAO, Rome, Italy, p. 15.
Fonteh, M.F., 2014. An assessment of impacts of climate change on available water resources and security in Cameroon. J. Cameroon Acad. Sci. 11 (2).
Kongnso, M.E., 2022. Indigenous Adaptation of Pastoralists to Climate Variability and Range Land Management in the Ndop Plain, North West Region, Cameroon. In: Indigenous Knowledge and Climate Change: A Sub-Saharan African Perspective. Springer Nature Switzerland. Sustainable Development Goals Series. SDG: 13 Climate Action.
Kubat, M., 1999. Neural networks: a comprehensive foundation by Simon Haykin, Macmillan, 1994, ISBN 0-02-352781-7. Knowl. Eng. Rev. 13 (4), 409–412. https://doi.org/10.1017/S0269888998214044.
Kuhn, M., Johnson, K., 2016. Applied predictive modeling. [in] Lo, C.P.; Choi, J. 2004. A hybrid approach to urban land use/cover mapping using Landsat 7 enhanced thematic mapper plus (ETM+) images. Int. J. Remote Sens. 25, 2687–2700.
Leutner, B., Horning, N., 2016. Package ‘RStoolbox’. http://bleutner.github.io/RStoolbox.
Li, J., Convertino, M., 2021. Inferring ecosystem networks as information flows. Sci. Rep. 11, 7094. https://doi.org/10.1038/s41598-021-86476.
Liu, L., Zhang, X., Gao, Y., Chen, X., Shuai, X., Mi, J., 2021. Finer-resolution mapping of global land cover: recent developments, consistency analysis, and prospects. J. Rem. Sens. 2021, 38. https://doi.org/10.34133/2021/5289697.
Matlhodi, B., Kenabatho, P.K., Parida, B.P., Maphanyane, J.G., 2019. Evaluating land use and land cover change in the Gaborone Dam Catchment, Botswana, from 1984-2015 using GIS and remote sensing. Sustainability 11, 5174.
Megahed, Y., Cabral, P., Silva, J., Caetano, M., 2015. Land cover mapping analysis and urban growth modelling using remote sensing techniques in Greater Cairo Region of Egypt. ISPRS Int. J. Geo Inf. 4 (3), 1750–1769.
Megevand, C., 2013. Deforestation Trends in the Congo Basin: Reconciling Economic Growth and Forest Protection. World Bank, Washington DC.
Mellor, A., Haywood, A., Stone, C., Jones, S., 2013. The performance of random forests in an operational setting for large area sclerophyll forest classification. Remote Sens. 5 (6), 2838–2856.
Midekisa, A., Holl, F., Savory, D.J., Andradepacheco, R., Gething, P.W., Bennett, A., Sturrock, H., 2017. Mapping land cover change over continental Africa using Landsat and Google Earth Engine cloud computing. PLoS One 12, e0184926.
Mohajane, M., Essahlaoui, A., Oudija, F., Hafyani, M.E., Hmaidi, A.E., Ouali, A.E., Randazzo, G., Teodoro, A.C., 2018. Land use/land cover (LULC) using landsat data series (MSS, TM, ETM+ and OLI) in azrou forest, in the Central Middle Atlas of Morocco. Environments 5 (12), 131. https://doi.org/10.3390/environments5120131.
Ndjidda, 2001. Structures et dynamiques des espèces ligneuses dans les zones Sud Est du Parc National de Waza. Mémoire du Diplôme d’Ingénieur des Eaux et Forêts. Université de Dschang, p. 62.
Noi, P.T., Kappas, M., 2017. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using sentinel-2 imagery. Sensors 18, 18.
Olson, D.M., Dinerstein, E., Wikramanayake, E.D., Burgess, N.D., et al., 2001. Terrestrial ecoregions of the world: a new map of life on earth. BioScience 51 (11), 933–938.
Pacheco, J.D., Hewitt, R.J., 2014. Modelado del cambio de usos del suelo urbano a través de Redes Neuronales Artificiales. Comparación con dos aplicaciones de software.
Frederick, K.D. (Ed.), 2002. Water Resources and Climate Change: Themanagement of GeoFocus 14 (1), 1e22.
Water Resources, vol. 2. Edward Elgar PublishingLtd, Cornwall, p. 528. Paneque-Ga’lvez, J., Mas, J.-F., More, G., et al., 2013. Enhanced land use/cover
Friedl, M., Sulla-Menashe, D., 2019. MCD12Q1 MODIS/Terra+Aqua land cover type classification of heterogeneous tropical landscapes using support vector machines
yearly L3 Global 500m SIN Grid V006 [Data set]. In: NASA EOSDIS Land Processes and textural homogeneity. Int. J. Appl. Earth Obs. Geoinf. 23, 372–383.
DAAC. https://doi.org/10.5067/MODIS/MCD12Q1.006. Accessed 2022-07-20 from. Pelletier, C., Valero, S., Inglada, J., Champion, N., Dedieu, G., 2016. Assessing the
Gebhardt, S., Wehrmann, T., Ruiz, M.A.M., Maeda, P., Bishop, J., Schramm, M., robustness of random forests to map land cover with high resolution satellite image
Kopeinig, R., Cartus, O., Kellndorfer, J., Ressl, R., et al., 2014. MAD-MEX: automatic time series over large areas. Rem. Sens. Environ. 187, 156–168.
wall-to-wall land cover monitoring for the Mexican REDD-MRV program using all Pereira, C., Tsikata, D., 2021. Contextualising extractivism in Africa. Feminist Africa 2
Landsat data. Remote Sens. 6, 3923–3943. (1), 14–47.
Ghosh, A., Joshi, P.K., 2014. A comparison of selected classification algorithms for Potapov, P., Li, X., Hernandez-Serna, A., Tyukavina, A., Hansen, M.C., et al., 2020.
mapping bamboo patches in lower Gangetic plains using very high resolution Mapping and monitoring global forest canopy height through integration of GEDI
WorldView 2 imagery. Int. J. Appl. Earth Obs. Geoinf. 26, 298–311. and Landsat data. Rem. Sens. Environ. 112165 https://doi.org/10.1016/j.
Gislason, P.O., Benediktsson, J.A., Sveinsson, J.R., 2006. Random forests for land cover rse.2020.112165.
classification. Pattern Recogn. Lett. 27, 294–300. Potapov, P., Turubanova, S., Hansen, M.C., Tyukavina, A., Zalles, V., et al., 2021. Global
Gomez, C., White, J.C., Wulder, M.A., 2016. Optical remotely sensed time series data for maps of cropland extent and change show accelerated cropland expansion in the
land cover classification: a review. Int. Soc. Photogram. Rem. Sens. 116, 55–72. twenty-first century. Nat. Food. https://doi.org/10.1038/s43016-021-00429-z.
Gong, P., Wang, J., Yu, L., et al., 2013. Finer resolution observation and monitoring of Potapov, P., Hansen, M.C., Pickens, A., Hernandez-Serna, A., Tyukavina, A., et al., 2022.
global land cover: first mapping results with Landsat TM and ETM? data. Int. J. The global 2000-2020 land cover and land use change dataset derived from the
Remote Sens. 34, 2607–2654. Landsat archive: first results. Front. Rem. Sens. https://doi.org/10.3389/
Guermazi, E., Bouaziz, M., Zairi, M., 2016. Water irrigation management using remote frsen.2022.856903.
sensing techniques: a case study in Central Tunisia. Environ. Earth Sci. 75, 202. Pouteaua, R., Collinb, A., Stolla, B.A., 2011. Comparison of Machine Learning Algorithms
Hansen, M.C., Potapov, P., Moore, R., Hancher, M., Turubanova, S.A., Tyukavina, A., for Classification of Tropical Ecosystems Observed by Multiple Sensors at Multiple
et al., 2013. High-resolution global maps of 21st-century forest cover change. Scales. International Geoscience and Remote Sensing Symposium 2011, Vancouver,
Science 342, 850–853. BC, Canada.
Hastie, T., Tibshirani, R., Friedman, J.H., 2009. The Elements of Statistical Learning: Qian, Y., Zhou, W., Yan, J., Li, W., Han, L., 2015. Comparing machine learning classifiers
Data Mining, Inference, and Prediction. Springer-Verlag, New York. for object-based land cover classification using very high resolution imagery. Remote
Heydari, S.S., Mountrakis, G., 2018. Effect of classifier selection, reference sample size, Sens. 7, 153–168.
reference class distribution and scene heterogeneity in per-pixel classification R Core Team, 2016. R: A Language and Environment for Statistical Computing. R
accuracy using 26 Landsat sites. Rem. Sens. Environ. 204, 648–658. Foundation for Statistical Computing, Vienna, Austria.
Jensen, J.R., 2005. Introductory Digital Image Processing: A Remote Sensing Perspective. Rodriguez-Galiano, V., Ghimire, B., Rogan, J., Chica-Olmo, M., Rigol-Sanchez, J., 2012.
3rd. Prentice Hall, Upper Saddle River, New Jersey. An assessment of the effectiveness of a random forest classifier for land-cover
Kavzoglu, T., Colkesen, I.A., 2009. kernel functions analysis for support vector machines classification. ISPRS J. Photogramm. Remote Sens. 67, 93–104.
for land cover classification. Int. J. Appl. Earth Obs. Geoinf. 11, 352–359. Royal Collection Trust, 2022. Press Office, Royal Collection Trust, York House, St
Khatami, R., Mountrakis, G., Stehman, S.V., 2016. A meta-analysis of remote sensing James’s Palace, London, SW1A 1BQ T. +44 (0)20 7839 1377. press@
research on supervised pixel-based land cover image classification processes: general royalcollection.org.uk. www.royalcollection.org.uk.
guidelines for practitioners and future research. Rem. Sens. Environ. 177, 89–100. Samaniego, L., Schulz, K., 2009. Supervised classification of agricultural land cover using
Knorn, J., Rabe, A., Radeloff, V.C., Kuemmerle, T., Kozak, J., Hostert, P., 2009. Land a modified k-NN technique (MNN) and Landsat remote sensing imagery. Remote
cover mapping of large areas using chain classification of neighboring Landsat Sens. 1 (4), 875–895.
satellite images. Remote Sens. Environ. 113, 957–964.

14
Y.G. Yuh et al. Ecological Informatics 74 (2023) 101955

Sari, I.L., Weston, C.J., Newnham, G.J., Volkova, L., 2021. Assessing accuracy of land Watson, R.T., Noble, I.R., Bolin, B., Ravindranath, N.H., Verardo, D.J., Dokken, D.J.,
cover change maps derived from automated digital processing and visual 2001. Land Use, Land Use Change, and Forestry. Cambridge University Press,
interpretation in tropical forests in Indonesia. Remote Sens. 13, 1446. https://doi. Cambridge (United Kingdom).
org/10.3390/rs13081446. Wei, C., Huang, J., Mansaray, L.R., Li, Z., Liu, W., Han, J., 2017. Estimation and mapping
Shi, D., Yang, X., 2015. Support vector machines for land cover mapping from remote of winter oilseed rape LAI from high spatial resolution satellite data based on a
sensor imagery. In: Monitoring and Modeling of Global Changes: A Geomatics hybrid method. Remote Sens. 9, 488.
Perspective. Springer, Dordrecht, the Netherlands, pp. 265–279. Wessels, K.J., Reyers, B., Jaarsveld, A.S., Rutherford, M.C., 2003. Identification of
Silva, L.P., Xavier, A.P.C., Silva, R.M., Santos, C.A.G., 2020. Modeling land cover change potential conflict areas between land transformation and biodiversity conservation
based on an artificial neural network for a semiarid river basin in northeastern in north-eastern South Africa. Agric. Ecosyst. Environ. 95, 157–178.
Brazil. Glob. Ecol. Conserv. 21, e00811. Xia, J.S., Mura, M.D., Chanussot, J., Du, P., He, X., 2015. Random subspace ensembles
Smits, P.C., Dellepiane, S.G., Schowengerdt, R.A., 1999. Quality assessment of image for hyperspectral image classification with extended morphological attribute
classification algorithms for land-cover mapping: a review and proposal for a cost- profiles. IEEE Trans. Geosci. Remote Sens. 53, 4768–4786.
based approach. Int. J. Remote Sens. 20, 1461–1486. Yang, X., 2009. Artificial neural networks for urban modeling. In: Madden, M. (Ed.),
Tchobsala, Amougou A., Mbolo, M., 2010. Impact of wood cuts on the structure and Manual of geographic information systems. ASPRS, USA, pp. 647–657.
floristic diversity of vegetation in the peri-urban zone of Ngaoundere, Cameroon. Yang, Y., Yang, D., Wang, X., Zhang, Z., Nawaz, Z., 2021. Testing accuracy of land cover
J. Ecol. Nat. Environ. 2 (11), 235–258. classification algorithms in the qilian mountains based on GEE cloud platform.
Tchotsoua, M., 2006. Evolution récente des territoires de l’Adamawa central: de la Remote Sens. 13, 5064. https://doi.org/10.3390/rs13245064.
spatialisation à l’aide pour un développement maîtrisé. Université d’Orléans. Ecole Yeshaneh, E., Wagner, W., Exner-Kittridge, M., Legesse, D., Blöschl, G., 2012. Identifying
doctorale sciences de l’homme et de la société. HDR. Discipline (Géographie- land use/cover dynamics in the Koga catchment, Ethiopia, from multi-scale data,
Aménagement Environnement), p. 267. and implications for environmental change. ISPRS Int. J. Geo Inf. 2, 302–323.
Teodoro, A.C., 2015. Applicability of data mining algorithms in the identification of Yuh, Y.G., Dongmo, Z.N., N’Goran, P.K., Ekodeck, H., Mengamenya, A., et al., 2019.
beach features/patterns on high-resolution satellite data. J. Appl. Remote. Sens. 9 Effects of land cover change on great apes distribution at the lobeke national park
(1), 095095 https://doi.org/10.1117/1.JRS.9.095095. and its surrounding forest management units, South-East Cameroon. A 13-year time
Thakur, R., Panse, P., 2022. Classification performance of land use from multispectral series analysis. Sci. Rep. 9, 1445.
remote sensing images using decision tree, K-nearest neighbor, random forest and Zerrouki, N., Harrou, F., Sun, Y., Hocini, L., 2019. A machine learning-based approach
support vector machine using EuroSAT data. Int. J. Intellig. Syst. Appl. Eng. 10 (1s), for land cover change detection using remote sensing and radiometric
67–77. measurements. IEEE Sensors J. 19 (14), 5843–5850. https://doi.org/10.1109/
Törmä, M., 2013. Land cover classification of finnish lapland using decision tree jsen.2019.2904137.
classification algorithm. Photogram. J. Finland 23 (2). Zhang, H.K., Roy, D.P., 2017. Using the 500 m MODIS land cover product to derive a
Wang, Y., Jiang, D., Zhuang, D., Huang, Y., Wang, W., Yu, X., 2013. Effective key consistent continental scale 30 m Landsat land cover classification. Rem. Sens.
parameter determination for an automatic approach to land cover classification Environ. 197, 15–34.
based on multispectral remote sensing imagery. PLoS One 8 (10), e75852. https:// Zoungrana, Benewinde J.-B., et al., 2015. Multi-temporal landsat images and ancillary
doi.org/10.1371/journal.pone.0075852. data for land use/cover change (LULCC) detection in the Southwest of Burkina Faso,
West Africa. Remote Sens. 7, 12076–12102.
