NDVI Versus CNN Features in Deep Learning for Land Cover Classification of Aerial Images
Abstract—Agriculture plays a strategic role in the economic development of a country. Appropriate classification of land cover images is vital for planning the right agricultural practices and maintaining a sustainable environment. This paper provides methods and analysis for land cover classification of remote sensing images. Satellite images form the input, and a mapping of every image to a distinct class is obtained as output. The objective is to compare hand-crafted features based on the Normalized Difference Vegetation Index (NDVI) with features learned by Convolutional Neural Networks (CNN). The rationale of this work is to take advantage of techniques that are illumination invariant. NDVI and CNN features have previously been compared on a linear Support Vector Machine (SVM). However, no comparative study has been carried out on NDVI-based features and CNN-based features with a deep learning classifier. This paper compares the performance of different classifiers and evaluates them based on test accuracy.
I. INTRODUCTION
Land cover classification of remote sensing images has a wide variety of application domains. It is mainly used in precision agriculture for soil, crop and pest management, land use planning, and water quality modeling. Changes in land cover and land use affect global systems (for example, atmosphere, climate, and sea level) or occur in a localized fashion in enough places to have a significant effect (Meyer and Turner, 1992). Hence, integrating data science with image analysis is crucial to understanding changes on a broad scale for the sake of creating better global environments in the future.
Geospatial analysis is essential for a wide range of industry applications. Geospatial images and their classification form the key basis for these analyses. Several machine learning and deep learning methods have been applied in the past to classify geospatial images, including Decision Trees, Random Forests, and Support Vector Machines. In this project, we intend to compare the performance of handcrafted features versus features extracted from a Convolutional Neural Network (CNN). The features obtained from the two techniques are fed into a Multi-Layer Perceptron (MLP) to learn the classification rule. Existing works compare handcrafted features such as the Gray-Level Co-Occurrence Matrix (GLCM) and CNN features on a deep learning classifier. Others have demonstrated the effectiveness of techniques such as the Normalized Difference Vegetation Index (NDVI) over CNN features on simple machine learning models such as SVMs, decision trees, and random forests. Our study extends this comparison by looking at NDVI versus CNN features on an MLP for land scene classification. We intend to use the SAT-6 dataset, which is based on sampling image tiles from the much larger National Agriculture Imagery Program (NAIP) dataset. A sample input of satellite images is shown in Fig. 1. Validation of results is based on experimentation, and the comparison of different models is based on their test accuracy.
Fig. 1. Sample input - Satellite images
The aim of our study is to compare the performance of different classifiers on the SAT-6 dataset, which comprises six classes of land cover. An image consists of four bands: Red, Blue, Green and Near Infrared (NIR). The classifiers have the same underlying deep architecture, which is a Multi-Layer Perceptron. The classifiers differ in the features of the input. The first method uses hand-crafted features from the NDVI technique. The second method uses features extracted by a CNN. The classifiers are evaluated based on test accuracy.
The paper is organized as follows. Section II presents related work. Section III describes the proposed approach. The validation methodology and experimentation details are presented in Section IV, which covers the dataset, experimental design, and CNN architecture. Results are presented in Section V. Finally, concluding remarks are given in Section VI.
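As a concrete illustration of the hand-crafted feature named above, NDVI is conventionally defined per pixel as (NIR - Red) / (NIR + Red). The following is a minimal sketch of that definition, not the feature pipeline used in this paper. The helper `mean_ndvi`, the (R, G, B, NIR) channel order, and the sample pixel values are assumptions made for illustration only.

```python
def ndvi(nir, red, eps=1e-12):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); eps guards against 0/0."""
    return (nir - red) / (nir + red + eps)


def mean_ndvi(patch):
    """Mean NDVI over a patch given as rows of (red, green, blue, nir) pixels.

    The (R, G, B, NIR) channel order is an assumption for illustration.
    """
    values = [ndvi(nir, red) for row in patch for (red, _g, _b, nir) in row]
    return sum(values) / len(values)


# Hypothetical 2x2 patch: the top row mimics vegetation (high NIR, low red),
# the bottom row mimics a surface with similar NIR and red reflectance.
patch = [
    [(0.10, 0.20, 0.10, 0.60), (0.12, 0.22, 0.11, 0.58)],
    [(0.30, 0.30, 0.30, 0.32), (0.28, 0.29, 0.27, 0.30)],
]
feature = mean_ndvi(patch)

# Scaling every band by the same brightness factor cancels in the ratio,
# so the NDVI feature is unchanged up to the tiny eps term.
dimmed = [[tuple(0.5 * v for v in px) for px in row] for row in patch]
feature_dimmed = mean_ndvi(dimmed)
```

Because the same factor multiplies numerator and denominator, a uniform change in brightness leaves the ratio unchanged, which is one way to read the illumination invariance that motivates this work. The invariance holds only when all bands are scaled together.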
Authorized licensed use limited to: Carnegie Mellon University Libraries. Downloaded on May 08,2025 at 01:44:38 UTC from IEEE Xplore. Restrictions apply.
entire NAIP dataset is ∼65 terabytes spanning the whole of the Continental United States (CONUS). Saikat et al. used the uncompressed Digital Ortho Quarter Quad tiles (DOQQs), which are GeoTIFF images whose area corresponds to the United States Geological Survey (USGS) topographic quadrangles. To maintain the high variance inherent in the entire NAIP dataset, they sampled image patches from a multitude of scenes (a total of 1500 image tiles) covering different landscapes from California. SAT-6 consists of a total of 405,000 image patches, each of size 28×28, covering 6 land cover classes: barren land, trees, grassland, roads, buildings, and water bodies. The images consist of 4 bands: red, green, blue and Near Infrared (NIR).
B. Experimental Design
For the vanilla CNN model, we used the same experimental setup as Saikat et al., with one small change: we introduced a batch normalization layer in every CNN layer, between the convolution and the activation. Saikat et al. used the following model: a first convolutional layer comprising 6 feature maps, followed by a subsampling layer of kernel size 3×3 with average pooling, followed by a convolutional layer with 12 feature maps, followed by a subsampling layer of kernel size 5×5 with max pooling, finally connected to the output layer. The pooling windows overlap, with a stride of 2 pixels. The last subsampling layer is connected to a fully-connected layer with 64 neurons. The output of the fully-connected layer is fed into a 6-way softmax function that generates a probability distribution over the six class labels of SAT-6.
C. CNN Architecture
The CNN architecture used for experimentation is shown in Fig. 4.
For the Multi-Layer Perceptron, we experimented with several architectures. The activation function used within the hidden layers is the hyperbolic tangent sigmoid function, and for the output layer it is the softmax function. The training function is the scaled conjugate gradient method, which was chosen for its speed.
V. RESULTS
We use the work of Saikat et al. as the baseline for comparing our CNN model. The baseline model achieved an overall test accuracy of 79% on the SAT-6 dataset.
We observed the highest performance for the MLP architecture that is 1 layer deep and 10 neurons wide. The results are as follows.
For the hand-crafted feature selection method, we achieved an overall test accuracy of 83.25%. We observe much better results using our CNN network (Saikat + batch normalization), which achieved an overall test accuracy of 98.26%. The confusion matrices (Fig. 5) provide a comprehensive view of the test data results. Interestingly, the two feature extraction methods yield similar performance for all output classes except 4 and 5 (grassland and road, respectively), where CNN performs significantly better. Table 1 presents the comparison of performance, and Fig. 5 presents the confusion matrices for NDVI features with MLP and CNN features with MLP. In this case, the MLP architecture with 10 neurons and 1 layer is considered.
Baselines from Saikat et al.:
• CNN: 79%
• Vegetation Index based feature + Deep Belief Network: 93.916%
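To make the layer sequence from the Experimental Design subsection concrete, the sketch below traces the spatial size of the feature maps for a 28×28 SAT-6 patch. It is only an arithmetic illustration under stated assumptions: the text does not give the convolution kernel size, padding, or convolution stride, so a 5×5 unpadded convolution with stride 1 is assumed here, and the function names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded ('valid') convolution or pooling window."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size=28, conv_kernel=5):
    """Return (layer name, feature maps, spatial size) for each layer.

    conv_kernel=5 is an assumed value; the text does not state the
    convolution kernel size, padding, or stride.
    """
    shapes = []
    size = conv_out(input_size, conv_kernel)      # first convolution, 6 maps
    shapes.append(("conv1 + batch norm", 6, size))
    size = conv_out(size, 3, stride=2)            # 3x3 average pooling, stride 2
    shapes.append(("avg pool 3x3", 6, size))
    size = conv_out(size, conv_kernel)            # second convolution, 12 maps
    shapes.append(("conv2 + batch norm", 12, size))
    size = conv_out(size, 5, stride=2)            # 5x5 max pooling, stride 2
    shapes.append(("max pool 5x5", 12, size))
    return shapes


shapes = trace_shapes()
_, maps, size = shapes[-1]
flattened = maps * size * size  # inputs to the 64-neuron fully-connected layer
```

Batch normalization rescales activations but does not change their shape, so it does not appear as a separate step in the size trace. With a different assumed convolution kernel the intermediate sizes change accordingly.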
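The overall and per-class figures discussed in this section are the kind of quantities that can be read off a confusion matrix. The following is a minimal sketch of that computation, assuming rows are true classes and columns are predicted classes. The 3-class matrix is hypothetical and is not the data behind Fig. 5.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Per-class recall, assuming rows are true classes and columns predictions."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Hypothetical 3-class counts for illustration only; not the paper's results.
cm = [
    [50, 0, 0],
    [5, 40, 5],
    [0, 10, 40],
]
acc = overall_accuracy(cm)       # 130 correct out of 150 samples
recalls = per_class_recall(cm)   # one value per true class
```

Comparing per-class values between two feature extraction methods is what reveals class-specific differences, such as the grassland and road classes noted above, that a single overall accuracy would hide.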
Fig. 5. Confusion Matrices – MLP Architecture (10 Neurons, 1 Layer)
We also looked at the generalization performance of both these techniques on classification of images taken in the dark. We used MATLAB's brighten function to darken the test images in the RGB channels to simulate pictures taken in the dark. We observed similar performance by both these feature extraction methods on the same test data, with a much lower performance than what was observed for daylight images. A possible explanation is that the method we used to simulate darkness was not appropriate: MATLAB's brighten function only takes an RGB colormap as input, and so it ignores the NIR band.
VI. CONCLUSION AND FUTURE WORK
In this paper, we have compared two different feature extraction methods, CNN and NDVI, to see how automatic feature learning fares with respect to a manual feature extraction technique for image classification using a deep neural network, namely a Multi-Layer Perceptron. CNN features with batch normalization resulted in the best performance. We also see that NDVI is better than CNN without batch normalization. To this end, we have shown evidence that there is benefit in applying normalization when classifying remote sensing images, since they tend to vary in brightness as they span wide regions of space. Illumination invariance turns out to be the inevitable consequence of applying normalization. In future work, we think it would be beneficial to study the effectiveness of this technique on datasets composed entirely of dark images.
REFERENCES
[1] David J. Mulla, Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps, Biosystems Engineering, Volume 114, Issue 4, 2013, Pages 358-371, ISSN 1537-5110, https://doi.org/10.1016/j.biosystemseng.2012.08.009. (http://www.sciencedirect.com/science/article/pii/S1537511012001419)
[2] Keiller Nogueira, Otávio A. B. Penatti, and Jefersson A. dos Santos. 2017. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recogn. 61, C (January 2017), 539-556. DOI: https://doi.org/10.1016/j.patcog.2016.07.001
[3] Saikat Basu et al. (2015). DeepSat - A Learning framework for Satellite Imagery. CoRR, abs/1509.03602.
[4] J. R. Bergado, C. Persello and C. Gevaert, "A deep learning approach to the classification of sub-decimetre resolution aerial images," 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, 2016, pp. 1516-1519. doi: 10.1109/IGARSS.2016.7729387
[5] Xiao Xiang Zhu et al. (2017). Deep learning in remote sensing: a review. CoRR, abs/1710.03959.
[6] Dennis C. Duro, Steven E. Franklin, Monique G. Dubé, A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery, Remote Sensing of Environment, Volume 118, 2012, Pages 259-272, ISSN 0034-4257, https://doi.org/10.1016/j.rse.2011.11.020.
[7] O. A. B. Penatti, K. Nogueira and J. A. dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?," 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Boston, MA, 2015, pp. 44-51. doi: 10.1109/CVPRW.2015.7301382
[8] A. Romero, C. Gatta and G. Camps-Valls, "Unsupervised Deep Feature Extraction for Remote Sensing Image Classification," in IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 3, pp. 1349-1362, March 2016. doi: 10.1109/TGRS.2015.2478379
[9] J. Arenas-Garcia, K. B. Petersen, G. Camps-Valls and L. K. Hansen, "Kernel Multivariate Analysis Framework for Supervised Subspace Learning: A Tutorial on Linear and Kernel Multivariate Methods," in IEEE Signal Processing Magazine, vol. 30, no. 4, pp. 16-29, July 2013. doi: 10.1109/MSP.2013.2250591
[10] National Agriculture Imagery Program (NAIP) Information sheet: https://www.fsa.usda.gov/Internet/FSA_File/naip_2009_info_final.pdf