Optimizing_Activation_Function_in_Deep_Artificial_

The research investigates the optimization of activation functions (AF) in deep artificial neural networks (DANN) for landcover fuzzy pixel-based classification using Landsat 8 satellite images. The study identifies the sigmoid function as the most effective AF for improving classification accuracy in remote sensing applications. The methodology includes preparing a reference map, applying various AFs, and assessing the classification accuracy through a series of tests.

Optimizing Activation Function in Deep Artificial Neural Networks Approach for Landcover Fuzzy Pixel-Based Classification

Article in International Journal of Remote Sensing Application · January 2017


DOI: 10.14355/ijrsa.2017.07.001



International Journal of Remote Sensing Applications (IJRSA) Volume 7, 2017 www.seipub.org/ijrsa
doi: 10.14355/ijrsa.2017.07.001

A. Serwa
Faculty of Engineering in El-Mataria, Helwan University, Cairo, Egypt.

Abstract

Artificial neural networks (ANN) are widely used in remote sensing classification, yet optimizing an ANN is still an enigmatic field of research, especially in remote sensing. This research attempts to identify the ANN activation function best suited to classification (landcover mapping). The first step is to prepare the reference map; a selected activation function is then assumed and the fuzzified ANN output is obtained. The last step compares the output with the reference to assess the accuracy. The outcome is the activation function that is best suited to remote sensing classification. A real multispectral Landsat 8 satellite image was classified using an ANN, and the classification accuracy was assessed with different activation functions. The sigmoid function was found to be the best activation function.

Keywords

Remote Sensing; Classification; ANN; Activation Function; Landcover Mapping; Fuzzy Accuracy Assessment

1. Introduction

Automation in remote sensing systems is a challenging field of research, driven by the need to reduce cost. The artificial neural network (ANN) approach has returned to the focus of research since it was extended to deep learning (DL). Although it is no longer novel, some of its behavior is still enigmatic, especially in remote sensing classification (Serwa, 2016). DL techniques have proven useful in landcover classification, so the state of the art needs to advance by learning not only the weights between neurons or the network structure itself but also the activation functions. Current deep learning literature largely focuses on improving architectures and adding regularization to the training process. Remote sensing, particularly satellite imagery, is perhaps the only cost-effective technology able to provide data at a global scale. Within ten years, commercial services are expected to provide sub-meter resolution images everywhere at a fraction of current costs (Murthy, et al., 2014). Few studies discuss the activation function (AF), which transfers the signal from the input units to the next hidden units. Since the advent of DL, researchers have started to investigate the effect of the AF on accuracy, e.g. Xie et al. (2016), who studied transfer learning from deep features for remote sensing and poverty mapping. Their study did not deal with multispectral (MS) images but with a single band only, using light as an indicator of wealthy or poor areas. The present research deals with low-level deep feature classification using MS images. The standard sigmoid reaches an approximation power comparable to or better than classes of more established functions investigated in approximation theory (DasGupta & Schnitger, 1993) (Karlik & Olgac, 2010). Jordan presented the logistic function, which is a natural representation of the posterior probability in a binary classification problem (Jordan, 1995) (Karlik & Olgac, 2010). Özkan & Erbek (2003) made a similar study but used shallow learning and examined only three AFs (linear, sigmoid and tanh). Their study focused on hard classification, not fuzzy classification as in this research. The novelty of this research is that it reaches a clear decision on selecting the AF in remote sensing classification. It sheds light inside the black box of the deep artificial neural network (DANN), because the AF strongly affects the performance and accuracy obtained in classification.

2. Deep Artificial Neural Network (DANN)

ANNs are a computational approach that simulates the microstructure of a biological nervous system, based on signal transfer. Basically, all ANNs have a similar topological structure. Some neurons interface with the real world to receive its input, and other neurons provide the world with the network's output. All the remaining neurons are hidden from view. So, there are three types of neurons: input, hidden and output neurons. Note that input neurons have a single input and multiple outputs, while hidden neurons have multiple inputs and multiple outputs. In contrast to input neurons, output neurons have multiple inputs and a single output. Figure 1 shows a typical multi-layer perceptron (MLP) architecture with shallow learning. In a DANN the number of hidden layers is larger, so the weight updates are very slow and move gradually toward the optimized solution, as shown in Figure 2. The MLP form of DANN is widely used, especially in supervised classification in the field of remote sensing. Both ANN and DANN must be optimized in architecture and performance; the first is concerned with the number of hidden layers and the second with selecting the activation function (AF) in addition to obtaining the correct weights.

FIG. 1: TYPICAL MLP ARCHITECTURE WITH SHALLOW LEARNING.

FIG. 2: TYPICAL MLP ARCHITECTURE WITH DEEP LEARNING.

The back propagation neural network (BPNN) algorithm is a generalized least squares algorithm that adjusts the connection weights between units to minimize the mean square error between the network output and the target output; its architecture has the MLP form. The target output is known from the training data: it is the class values (see the experimental results). First, the data entered to the input units are multiplied by the connection weights and summed to give the net input to each unit in the hidden layer, as shown in Figure 1 and given by:

$$\mathrm{net}_s = \sum_i X_i W_{is} \tag{1}$$


Where: Xi is the pixel vector of the input image at the ith input layer unit; Wis is the matrix of connection weights between the ith input layer unit and the sth hidden layer unit. Each unit in the hidden layer calculates a weighted sum of its inputs and passes the sum via an activation function to the units in the output layer through the weight vector Wsj. There is a range of activation functions to transfer the data from a hidden layer unit to an output layer unit, including linear, hyperbolic tangent and sigmoid functions. Although the choice among these functions may lead to differences in classification accuracy, the output of a hidden unit can be defined in general as:

$$O_s = F(\lambda, \mathrm{net}_s) \tag{2}$$

Where: F is the activation function (explained later), Os is the output from the sth hidden layer unit and λ is a gain parameter, which controls the connection weights between the hidden layer unit and the output layer unit. Outputs from the hidden units are multiplied by the connection weights and summed to produce the output of the jth unit in the output layer as:

$$O_j = \sum_s O_s W_{sj} \tag{3}$$

Where: Oj is the network output for jth output unit (i.e. the land cover class) and Wsj is the weight of the
connection between sth hidden layer unit and jth output layer unit.
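
The forward pass of Eqs. 1-3 can be sketched in Python as follows. This is a minimal illustration for a single hidden layer, not the ADIPRS implementation. The function names, the toy weights and the choice F(λ, net) = sigmoid(λ · net) are assumptions, since the text does not state how the gain λ enters F.

```python
import math

def sigmoid(x):
    """Logistic function, used here as an example activation F."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_in, w_out, gain=1.0, activation=sigmoid):
    """Forward pass through one hidden layer.

    x      : input values X_i (one pixel, one value per band)
    w_in   : w_in[i][s]  = W_is, input unit i -> hidden unit s
    w_out  : w_out[s][j] = W_sj, hidden unit s -> output unit j
    gain   : lambda; assumed here to scale the net input (hypothetical choice)
    """
    n_hidden = len(w_in[0])
    n_out = len(w_out[0])
    # Eq. 1: net_s = sum_i X_i * W_is
    net = [sum(x[i] * w_in[i][s] for i in range(len(x))) for s in range(n_hidden)]
    # Eq. 2: O_s = F(lambda, net_s)
    o_hidden = [activation(gain * n) for n in net]
    # Eq. 3: O_j = sum_s O_s * W_sj
    o_out = [sum(o_hidden[s] * w_out[s][j] for s in range(n_hidden))
             for j in range(n_out)]
    return net, o_hidden, o_out

# Toy example: 2 bands, 2 hidden units, 2 classes (all values are made up).
pixel = [0.2, 0.6]
W_is = [[0.5, -0.5], [1.0, 0.5]]
W_sj = [[1.0, 0.0], [0.0, 1.0]]
net, o_s, o_j = forward(pixel, W_is, W_sj)
```

Passing a different callable as `activation` is enough to swap the AF while keeping the weights and architecture fixed, which mirrors the experimental design of this research.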

An error function (E), determined from a sample of target outputs (reference training data) and network outputs, is minimized iteratively. The process continues until E converges to the minimum allowed value and the adjusted weights are obtained. E is given by:

$$E = 0.5 \sum_{j=1}^{c} (T_j - O_j)^2 \tag{4}$$
Where Tj is the target output vector, Oj is the network output vector and c is the number of classes. The target vector is determined from the known class allocations of the training pixels, which are coded in binary form. The collection of known class allocations of all pixels forms the target vector. After the network error is computed, it is compared with the limiting error EL. If E < EL, training stops; otherwise E is back-propagated to the units in the hidden and input layers. The number of iterations may vary from one dataset to another and is generally determined by trial and error. Back propagation and weight adjustment proceed as follows. First, the error vector at each unit of the output layer is computed as:

$$E_j = O_j (1 - O_j)\, E \tag{5}$$
Then the error vector for each unit at the hidden layer is computed as:

$$E_s = O_s (1 - O_s) \sum_{j=1}^{c} O_j E_{sj} \tag{6}$$

Thereafter, the net error in connection weights between output layer and hidden layer is computed as,

$$E_{sj} = O_s E_j \tag{7}$$

And error in connection weights between hidden layer and input layer as,

$$E_{is} = F_m X_i E_s \tag{8}$$

Where: Fm is the momentum factor which controls the momentum of the connections between the hidden layer
unit and the input layer unit.

The weights between output layer and hidden layer are updated as:

$$(W_{sj})_{\text{new}} = (W_{sj})_{\text{old}} + E_{sj} \tag{9}$$
And the weights between input layer and the hidden layer are updated as:

$$(W_{is})_{\text{new}} = (W_{is})_{\text{old}} + E_{is} \tag{10}$$


The gain parameter λ is also updated as:

$$(\lambda)_{\text{new}} = (\lambda)_{\text{old}} + LR \cdot E_s \tag{11}$$
Where: LR is the learning rate which controls the time of the learning process.

3. Methodology

The research focuses on optimizing the AF for remote sensing image classification, so a Landsat 8 MS image is used. The reference data, the software and the study data are described in the following subsections.

3.1 System Overview

In this research a Landsat 8 MS image is used, with a spatial resolution of 30 m for its spectral bands. The reference of the study area is prepared pixel by pixel to achieve the best training and testing performance. ENVI 5.3 software is used for the necessary tasks such as spatial subsetting, preparing regions of interest (ROIs), geofencing classification, etc. A self-developed software package, the advanced digital image processor for remote sensing (ADIPRS), is used to carry out the DANN with different AFs. This software was developed by Serwa (2009) and has been modified to cover the DL task. The system overview is shown in Figure 3 as a block diagram. ADIPRS is used for classification because no other remote sensing software allows the AF to be selected.

FIGURE 3: BLOCK DIAGRAM OF THE RESEARCH WORK.

3.2 Research Data

The data used in this research cover a part of Greater Cairo (Giza governorate) in Egypt and comprise: 1- eight bands of a Landsat 8 satellite image; 2- AutoCAD maps at a scale of 1:5000. The reference map was produced from the AutoCAD maps (mainly) and a SPOT 5 satellite image (secondary) of the study area, in addition to site visits with a Garmin GPSmap 62s navigator (3 meters accuracy). Figure 4 shows the study area in the ENVI environment before spatial and spectral subsetting. The study area is represented by 500 × 500 pixels (225 km²) in the southern part of Greater Cairo (Giza governorate). Figure 5 shows the reference, which is built pixel by pixel to obtain the best training and testing performance.
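
As a quick consistency check on the stated extent, assuming the 30 m resolution quoted in the system overview applies to every pixel of the subset:

$$500 \times 30\ \text{m} = 15\ \text{km}, \qquad 15\ \text{km} \times 15\ \text{km} = 225\ \text{km}^2$$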


FIGURE 4: TRUE COLOR LANDSAT 8 FOR THE STUDY AREA.

FIGURE 5: REFERENCE FOR THE STUDY AREA.

3.3 Tools

Two software packages are used: ENVI 5.3 for spatial subsetting and all other necessary tasks (reference preparation, all corrections), and ADIPRS for applying the DANN and for accuracy assessment. Figure 6 shows a flowchart of the DANN algorithm that ADIPRS applies according to the equations explained in Section 2. Figure 7 shows the interface of the DANN module of ADIPRS. It was developed specifically to achieve the research objective by selecting different AFs and applying the DANN algorithm. The optimal architecture is indicated there: the number of input neurons equals the number of image bands, while the number of output neurons equals the number of landcover classes. The hidden structure consists of 4 hidden layers with 5, 8, 7 and 7 neurons.
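
The layer layout described above can be written down as a list of layer sizes. The sketch below is only illustrative. It assumes that all eight bands listed in the research data are used as inputs, it uses a hypothetical number of classes (5) because the text does not state one, it initializes the weights with arbitrary random values, it omits bias terms because Eqs. 1-3 do not include them, and it applies the activation at the output layer as well so that the outputs fall in (0, 1).

```python
import math
import random

N_BANDS = 8            # input neurons = number of image bands (assumed: all eight)
HIDDEN = [5, 8, 7, 7]  # hidden layers as reported in the text
N_CLASSES = 5          # hypothetical; the number of landcover classes is not given

def build_weights(layer_sizes, rng):
    """One weight matrix per pair of consecutive layers, small random values."""
    return [
        [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, weights, activation=sigmoid):
    """Propagate one pixel vector through every layer in turn."""
    signal = x
    for w in weights:
        n_out = len(w[0])
        net = [sum(signal[i] * w[i][j] for i in range(len(signal)))
               for j in range(n_out)]
        signal = [activation(n) for n in net]
    return signal

layer_sizes = [N_BANDS] + HIDDEN + [N_CLASSES]
weights = build_weights(layer_sizes, random.Random(0))
n_weights = sum(len(w) * len(w[0]) for w in weights)  # number of connections
memberships = forward([0.1] * N_BANDS, weights)       # one value per class
```

Under these assumptions the network has 8·5 + 5·8 + 8·7 + 7·7 + 7·5 = 220 connection weights.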

[Flowchart: Start -> input MS bands and testing & training data -> select activation function -> compute nets (Eq. 1), Os (Eq. 2), Oj (Eq. 3) and E (Eq. 4) -> if E < Limit: output the final classified image and end; otherwise compute Ej (Eq. 5), Es (Eq. 6), Esj (Eq. 7) and Eis (Eq. 8), update Wsj (Eq. 9), Wis (Eq. 10) and λ (Eq. 11), and repeat. Legend: Start/End, Input/Output, Flow, Compute.]

FIGURE 6: DANN FOR PIXEL-BASED CLASSIFICATION.
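
The loop in Figure 6 can be sketched as follows. This is a generic textbook backpropagation loop for a single hidden layer with sigmoid units, written only to illustrate the control flow (forward pass, error of Eq. 4, test against the limit, weight update). It is not the ADIPRS code, its update rule is the standard delta rule rather than a literal transcription of Eqs. 5-11, it omits the gain and momentum terms, and the toy data, learning rate and error limit are made-up values.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_is, w_sj):
    """Hidden and output activations for one pixel vector."""
    o_s = [sigmoid(sum(x[i] * w_is[i][s] for i in range(len(x))))
           for s in range(len(w_is[0]))]
    o_j = [sigmoid(sum(o_s[s] * w_sj[s][j] for s in range(len(o_s))))
           for j in range(len(w_sj[0]))]
    return o_s, o_j

def total_error(samples, w_is, w_sj):
    """Eq. 4 summed over all training pixels."""
    err = 0.0
    for x, t in samples:
        _, o_j = forward(x, w_is, w_sj)
        err += 0.5 * sum((tj - oj) ** 2 for tj, oj in zip(t, o_j))
    return err

def train(samples, n_hidden=3, lr=0.5, limit=0.01, max_iter=5000, seed=0):
    rng = random.Random(seed)
    n_in, n_out = len(samples[0][0]), len(samples[0][1])
    w_is = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    w_sj = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
    history = [total_error(samples, w_is, w_sj)]
    for _ in range(max_iter):
        if history[-1] < limit:  # "E < Limit" branch of the flowchart
            break
        for x, t in samples:
            o_s, o_j = forward(x, w_is, w_sj)
            # Output-layer error term: sigmoid derivative times residual.
            d_j = [oj * (1 - oj) * (tj - oj) for oj, tj in zip(o_j, t)]
            # Hidden-layer error term, propagated back through W_sj.
            d_s = [o_s[s] * (1 - o_s[s]) *
                   sum(d_j[j] * w_sj[s][j] for j in range(n_out))
                   for s in range(n_hidden)]
            for s in range(n_hidden):
                for j in range(n_out):
                    w_sj[s][j] += lr * o_s[s] * d_j[j]
            for i in range(n_in):
                for s in range(n_hidden):
                    w_is[i][s] += lr * x[i] * d_s[s]
        history.append(total_error(samples, w_is, w_sj))
    return w_is, w_sj, history

# Two made-up classes with binary-coded (one-hot) target vectors.
samples = [([0.1, 0.9], [1, 0]), ([0.2, 0.8], [1, 0]),
           ([0.9, 0.1], [0, 1]), ([0.8, 0.3], [0, 1])]
w_is, w_sj, history = train(samples)
```

The length of `history` minus one is the number of iterations actually used, which corresponds to the iteration count reported alongside the accuracy in the results.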


FIGURE 7: DANN MODULE.

The AF must be selected before running ADIPRS, and the network architecture is fixed in order to study the effect of varying the AF on accuracy. After the convergence condition is reached, the classification report can be printed as shown in Figure 8. The report includes the overall accuracy and the Kappa value (K-hat statistic), and the user's and producer's accuracies are computed for each class. A series of twelve AFs is studied: Linear, Step, Piecewise Linear, Sigmoid, Complementary log-log, Bipolar, Bipolar Sigmoid, Tanh, Hard Tanh, Absolute, Rectifier and Smooth Rectifier. Figure 9 gives the definition of each AF in the form of an equation and a graph.

FIGURE 8: CLASSIFICATION REPORT.
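
The report quantities named above can be computed from a confusion matrix as sketched below. The sketch uses a crisp (hard) confusion matrix with made-up counts purely for illustration; the fuzzy accuracy assessment used in this research and the internals of ADIPRS are not reproduced here. Rows are taken as classified labels and columns as reference labels, which is an assumed convention.

```python
def accuracy_report(matrix):
    """Overall, user's and producer's accuracy and Kappa from a square matrix.

    matrix[i][j] = number of pixels classified as class i whose
    reference class is j (assumed convention).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(n)]
    col_tot = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diag = [matrix[i][i] for i in range(n)]

    overall = sum(diag) / total
    user = [diag[i] / row_tot[i] for i in range(n)]      # per classified class
    producer = [diag[j] / col_tot[j] for j in range(n)]  # per reference class
    # Agreement expected by chance, from the row and column marginals.
    chance = sum(row_tot[i] * col_tot[i] for i in range(n)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    return {"overall": overall, "user": user, "producer": producer, "kappa": kappa}

# Hypothetical two-class example (counts are not from the paper).
example = [[50, 2],
           [3, 45]]
report = accuracy_report(example)
```

For this hypothetical matrix the overall accuracy is 0.95 and Kappa is about 0.90.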


FIGURE 9: STUDIED AFS: 1) LINEAR. 2) STEP. 3) PIECEWISE LINEAR. 4) SIGMOID. 5) COMPLEMENTARY LOG LOG. 6) BIPOLAR. 7) BIPOLAR SIGMOID. 8) TANH. 9) HARD TANH. 10) ABSOLUTE. 11) RECTIFIER. 12) SMOOTH RECTIFIER.
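
For orientation, the twelve AFs can be written as plain Python functions. The definitions below are common textbook forms and are assumptions; the exact definitions used in Figure 9 may differ, for example in thresholds, slopes or output ranges.

```python
import math

def linear(x):
    return x

def step(x):
    return 1.0 if x >= 0 else 0.0

def piecewise_linear(x):
    # Linear between -0.5 and 0.5, clipped to [0, 1] outside (one common form).
    return min(1.0, max(0.0, x + 0.5))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def complementary_log_log(x):
    return 1.0 - math.exp(-math.exp(x))

def bipolar(x):
    return 1.0 if x >= 0 else -1.0

def bipolar_sigmoid(x):
    # Sigmoid rescaled to (-1, 1); equal to tanh(x / 2).
    return (1.0 - math.exp(-x)) / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def hard_tanh(x):
    return max(-1.0, min(1.0, x))

def absolute(x):
    return abs(x)

def rectifier(x):
    return max(0.0, x)

def smooth_rectifier(x):
    # Also known as softplus.
    return math.log1p(math.exp(x))

ACTIVATIONS = {
    "Linear": linear, "Step": step, "Piecewise linear": piecewise_linear,
    "Sigmoid": sigmoid, "Complementary log log": complementary_log_log,
    "Bipolar": bipolar, "Bipolar sigmoid": bipolar_sigmoid, "Tanh": tanh,
    "Hard Tanh": hard_tanh, "Absolute": absolute, "Rectifier": rectifier,
    "Smooth rectifier": smooth_rectifier,
}
```

Keeping the functions in a dictionary makes it straightforward to loop over all AFs with an otherwise fixed network. These simple forms are not guarded against overflow for inputs of very large magnitude.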

These twelve AFs were selected because they are the most frequently mentioned in the literature. Considerable effort was made to develop and test each AF, especially in testing its effect on classification accuracy. The AF affects classification results in remote sensing, but studies rarely examine its effect on accuracy using real data. Performance accuracy (signal-to-noise ratio) was used to assess the accuracy.

4. Results

Each AF is selected in turn for classification, and the accuracy assessment is then carried out. Overall accuracy is chosen as the accuracy measure. The network architecture is fixed so that only the effect of the AF on classification accuracy is examined. Table 1 lists the final results numerically as overall accuracy and number of iterations. Figure 10 shows the final summarized results graphically.
TABLE 1: TABULATION OF END RESULTS.

AF                       Accuracy %    No. Iterations
Linear                   91.23         65
Step                     93.56         97
Piecewise linear         94.36         58
Sigmoid                  94.98         35
Complementary log log    94.68         36
Bipolar                  89.4          91
Bipolar sigmoid          95            36
Tanh                     89.65         45
Hard Tanh                86.98         65
Absolute                 68.45         106
Rectifier                91.21         81
Smooth rectifier         91.95         68


The number of iterations is reported because it indicates the computational cost, even though cost is outside the scope of this research. Solution conditions such as the number of iterations cannot be neglected, but they can serve as a secondary criterion when accuracies are equal. The maximum accuracy is achieved by the bipolar sigmoid (95%) and sigmoid (94.98%) AFs. A hypothesis test found no significant difference between the two results, so they can be assumed equal. A statistical correlation test between overall accuracy and number of iterations gives a correlation of -0.6148, and the two are considered not correlated. The sigmoid AF gives the best accuracy (statistically equal to that of the bipolar sigmoid) in the lowest number of iterations (35), while the absolute AF gives the worst accuracy in the highest number of iterations. Some AFs, such as step, piecewise linear and complementary log-log, give unexpected results because they are not familiar in remote sensing; their accuracies lie between 93.56 % and 94.68 %. The rest of the tested AFs can be considered moderate in accuracy and cost, but they cannot be recommended. The results show that the accuracies of most of the tested classes behave in the same way as the overall accuracy, which means that the AF affects the solution as a whole rather than a specific class. The behavior of the results can therefore be generalized.

FIGURE 10: END RESULTS.

In fact, the correlation study was not strictly necessary, because the first dimension (the AF) is a categorical variable with no fixed order; it is included only because some researchers are used to reporting it. Finally, the accuracy varied from 68.45 % to 95 % depending on the selected AF, which is a meaningful change.
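
The correlation figure and the accuracy range quoted above can be re-derived from Table 1. The sketch below computes the Pearson correlation coefficient between overall accuracy and number of iterations. That the reported value is a Pearson coefficient is an assumption, since the text only mentions a statistical correlation test; the function name is illustrative.

```python
import math

# Values copied from Table 1, in table order.
accuracy = [91.23, 93.56, 94.36, 94.98, 94.68, 89.4,
            95, 89.65, 86.98, 68.45, 91.21, 91.95]
iterations = [65, 97, 58, 35, 36, 91, 36, 45, 65, 106, 81, 68]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson(accuracy, iterations)
spread = max(accuracy) - min(accuracy)  # best minus worst overall accuracy
```

With the Table 1 values this gives r of about -0.6148, in agreement with the reported figure, and a spread of 26.55 percentage points between the best and the worst AF.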

5. Conclusion

Few research works have focused on optimizing the AF for DANN. Remote sensing data behave in a different way as a random signal. The AF must be selected carefully because it affects the classification results. One can conclude that the AF affects classification accuracy in remote sensing, such that an accuracy enhancement of about 27 % can be achieved by selecting a good AF. The classification time is also affected by the selection of the AF. Both the sigmoid and bipolar sigmoid AFs are recommended for classifying remote sensing landcover features.

REFERENCES

[1] A. Serwa, "Development of Soft Computational Simulator for Aerial Imagery Project Planning," Surveying and Land Information Science (SaLIS), vol. 75, no. 2, 2016.
[2] K. Murthy, M. Shearn, B. D. Smiley, A. H. Chau, J. Levine and D. Robinson, "Skysat-1: very high-resolution imagery from a small satellite," International Society for Optics and Photonics, 2014.
[3] B. DasGupta and G. Schnitger, The Power of Approximating: A Comparison of Activation, Advances in Neural Information Processing Systems ed., San Mateo, CA: Morgan Kaufmann Publishers, 1993.
[4] B. Karlik and A. V. Olgac, "Performance Analysis of Various Activation Functions in Generalized MLP Architectures of Neural Networks," International Journal of Artificial Intelligence And Expert Systems (IJAE), vol. 1, no. 4, pp. 111-122, 2010.
[5] M. Jordan, "Why the logistic function? A tutorial discussion on probabilities and neural networks," Massachusetts Institute of Technology, Massachusetts, 1995.
[6] C. Özkan and F. S. Erbek, "The Comparison of Activation Functions for," Photogrammetric Engineering & Remote Sensing, vol. 69, no. 11, pp. 1225-1234, 2003.
[7] A. Serwa, Automatic Extraction of Topographic Features from Digital Images. PhD thesis, Cairo: Faculty of Engineering in Cairo, Azhar University, 2009.
[8] A. Serwa and H. Semarry, "Integration of Soft Computational Simulator and Strapdown Inertial Navigation System for Aerial Surveying Project Planning," Spatial Information Research (SPIR), vol. 24, no. 3, pp. 279-290, June 2016.
[9] M. Xie, N. Jean, M. Burke, D. Lobell and S. Ermon, "Transfer Learning from Deep Features for Remote Sensing and Poverty Mapping," in AAAI Conference on Artificial Intelligence (AAAI-16), 2016.

Dr. Ahmed Serwa works as Assoc. Prof. of Geomatics and GIS, Civil Engineering Department, Faculty of Engineering in Mataria, Helwan University, Cairo, Egypt. He is a researcher in the fields of remote sensing, photogrammetry, geodesy, GIS, geoinformatics and computer software development in Earth sciences.
