
2022 1st International Conference on Information System & Information Technology (ICISIT)
978-1-6654-0200-2/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICISIT54091.2022.9872945

Classification of Glaucoma in Fundus Images Using Convolutional Neural Network with MobileNet Architecture

Ibnu Da’wan Salim Ubaidah, Yunendah Fu’adah, Sofia Sa’idah, Rita Magdalena, Abel Bima Wiratama, Richard Bina Jadi Simanjuntak
School of Electrical Engineering, Telkom University, Bandung, Indonesia
[email protected]

Abstract—Glaucoma is damage to the optic nerve caused by increased pressure on the eyeball. The cause is a mismatch between the amount of eye fluid (aqueous humor) produced and the amount of eye fluid secreted. Ophthalmologists usually detect glaucoma using the Cup to Disc Ratio (CDR) parameter. However, the CDR is still calculated manually, usually by trained doctors with relatively expensive and limited equipment. This study proposes a system that classifies glaucoma using the Convolutional Neural Network method with the MobileNet architecture. MobileNet has two convolution parts: depthwise convolution and pointwise convolution. The depthwise convolution applies a single convolution filter per input channel, while the pointwise convolution builds new features by calculating a linear combination of the input channels with a 1x1 convolution. The data come from the rim-one-r1 database. The accuracy of the proposed method reaches 99%. Automated glaucoma classification can assist medical staff in identifying the best treatment for their patients.

Index Terms—Glaucoma, Convolutional Neural Network (CNN), MobileNet.

I. INTRODUCTION

Glaucoma is a disease of the optic nerve that worsens over time [1]. The cause is a mismatch between the amount of eye fluid (aqueous humor) produced and the amount of eye fluid secreted. As a result, the ocular pressure of glaucoma patients is higher than that of healthy eyes. Without follow-up, this condition can lead to permanent blindness. A total of 2.78% of visual impairment globally is caused by glaucoma [2]. Among causes of blindness, glaucoma is the second largest globally after cataracts. In 2010, the number of people with glaucoma reached 60.5 million. The global incidence of glaucoma is predicted to reach 76 million in 2020 and 118.8 million in 2024. According to Riskesdas, the prevalence of glaucoma in Indonesia in 2007 was 0.46%; roughly four to five out of every 1,000 Indonesians suffer from glaucoma.

Early examination of glaucoma is the first step to reducing its severity for the sufferer. Optometrists usually detect glaucoma using the Cup to Disc Ratio (CDR) parameter [3]. However, the CDR is still calculated manually by dividing the diameter of the optic cup by the diameter of the optic disc. In practice, this is usually performed by a trained physician with a relatively expensive and scarce Heidelberg Retinal Tomograph (HRT) device. Therefore, an automated system is needed to assist doctors in diagnosing glaucoma.

Several previous studies have developed glaucoma detection with fundus image data. In 2012, Muthu Rama Krishan et al. conducted a study using a dataset from Kasturba Medical College, which consisted of two classes: normal and glaucoma [4]. Applying a Support Vector Machine (SVM) with HOS, TT, DWT, and Energy features produced an accuracy of 91.67%.

In 2016, Shishir Maheshwari et al. conducted a study using the Medical Image Analysis Group and Kasturba Medical College datasets, which consisted of two classes: normal and glaucoma [5]. Applying an SVM with 2D EWT and correntropy features resulted in an accuracy of 98.33%. Two years later, Kishore Balasubramanian et al. conducted a study using datasets from the HRF and RIM-ONE databases, which consisted of two classes: normal and glaucoma [6]. Applying a K-Nearest Neighbor (KNN) classifier with PCA and hybrid features (correlation and homogeneity) produced 99% accuracy.

In 2019, Kamel H. Rahouma et al. conducted a study using a dataset from the rim-one r2 database consisting of two classes: normal and glaucoma [7]. Applying an Artificial Neural Network (ANN) with GLCM and GLRLM features produced an accuracy of 99%.


In the same year, Ali Serener et al. conducted a study comparing the accuracy of the GoogLeNet and ResNet50 architectures [8]. That study classified glaucoma into three classes: advanced-glaucoma, early-glaucoma, and no-glaucoma, and its results indicate that GoogLeNet performs better than ResNet50.
In 2020, Yunendah Nur Fu’adah et al. conducted a study using the rim-one r2 database, which consists of two classes: normal and glaucoma [9]. Applying a Convolutional Neural Network (CNN) with three hidden layers using a 3x3 filter size of 16, 32, and 64 produced an accuracy of 91%. In the same year, Shishir Maheshwari et al. conducted a study using the rim-one database, which consisted of two classes: normal and glaucoma [10]. Applying a CNN with the Alexnet architecture and local binary patterns resulted in an accuracy of 98.90%. Ajitha S et al. also conducted a study in the same year using the ORIGA and DRASHTI-GS datasets [11]. That study compared the accuracy of VGG16 and ResNet50 for classifying glaucoma into two classes, glaucoma and no glaucoma, and showed that ResNet50 performs better than VGG16 on the DRASHTI-GS dataset.
Using traditional techniques to classify glaucoma in several previous studies has shown satisfactory results. In the traditional technique, the classification system usually involves feature extraction and classification as separate steps [10]. However, CNNs can automatically extract features and classify data according to their class; the advantage of the CNN-based method is that feature extraction and classification are not carried out separately. The methods that use CNN as the primary approach for classifying glaucoma in several previous studies can still be improved. Therefore, this study applies the CNN method with another architecture.

One such CNN architecture is MobileNet. The MobileNet architecture differs from others because its convolution layer uses a filter thickness that matches the input image [12]. This allows a faster training process and a smaller weight size, so the model can easily be deployed on mobile devices. In addition, this study also proposes a classification of glaucoma with more classes, namely five classes consisting of deep, early, moderate, normal, and ocular hypertension (OHT), derived from the RIM-ONE R1 database. The application of the proposed approach is expected to improve the accuracy of glaucoma classification.

II. METHOD

This study proposes a system for classifying glaucoma into deep, early, moderate, normal, and oht classes. The system uses the CNN method with the MobileNet architecture. The fundus image input passes through depthwise separable convolution, which contains two layers, namely depthwise convolution and pointwise convolution. The input then passes through the fully connected layer, and the softmax activation classifies glaucoma according to the predefined classes. The CNN model with MobileNet architecture proposed in this study is shown in general form in Figure 1.

Fig. 1. CNN model with MobileNet architecture proposed for the classification of glaucoma

A. Dataset

The dataset used in this study is the rim-one-r1 database, obtained from the website of the Medical Image Analysis Group [13]. The dataset consists of 169 fundus images in five classes: 14 deep, 12 early, 14 moderate, 118 normal, and 11 oht. An augmentation procedure is used because the fundus image data are imbalanced and small in number. After augmentation, the dataset contains 2000 images: 400 deep, 400 early, 400 moderate, 400 normal, and 400 oht.

Fig. 2. Dataset contents of each class
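The paper does not describe its exact augmentation settings, so the following Python/Keras sketch is only an illustration of how the rim-one-r1 images might be loaded and expanded toward a balanced set; the directory layout (rim_one_r1/<class>/*.png), the 224x224 target size, and the chosen augmentation operations are assumptions, not details from the study.

import tensorflow as tf

IMG_SIZE = (224, 224)          # assumed; MobileNet's default input resolution
CLASSES = ["deep", "early", "moderate", "normal", "oht"]

# Simple geometric/photometric augmentations used here only as an example of
# how 169 original images could be expanded toward 400 images per class.
augmenter = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255.0,
    rotation_range=15,
    zoom_range=0.1,
    horizontal_flip=True,
)

# Assumes the images are stored as rim_one_r1/<class_name>/*.png
train_flow = augmenter.flow_from_directory(
    "rim_one_r1",
    target_size=IMG_SIZE,
    classes=CLASSES,
    class_mode="categorical",
    batch_size=32,
)

Any comparable pipeline that balances the five classes would serve the same purpose; only the resulting class counts (400 images each) are reported in the paper.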


B. Convolutional Neural Network (CNN)

CNN belongs to the supervised learning family. In general, a CNN is a trainable architecture that consists of several stages [14], [15]. A CNN is divided into two parts: a feature extraction stage and a classification stage.

Fig. 3. Convolutional Neural Network Architecture

Feature extraction is the process in which the unique characteristics of the data are extracted for further processing [16]. The purpose of feature extraction is to obtain meaningful information from the data, reduce the amount of data, and increase processing precision. Feature extraction has three parts: the convolutional layer, the activation function layer, and the pooling layer.

1) Convolutional Layer: the stage that performs a convolution operation on the output of the previous layer [15]. Convolution is a mathematical term for the repeated application of one function to the output of another function. The convolutional layer consists of a collection of neurons arranged to form a filter.

Fig. 4. Convolutional Layer

2) Rectified Linear Units (ReLU) Activation: an operation that increases the model's representational power by introducing nonlinearity. ReLU sets all negative pixel values to zero, which helps the feature map relate more closely to the associated image [17]. Equation 1 shows the ReLU activation.

f(x) = max(0, x)   (1)

3) Pooling Layer: a filtering process with a specific size and step that moves across the feature map. The purpose of the pooling layer is to reduce the size of the matrix. Two types of pooling are commonly used: max-pooling and average pooling [18]. Max-pooling takes the maximum value, while average pooling takes the average value. The general form of the pooling layer uses a 2x2 filter applied with a step of two, operating on each input slice.

Fig. 5. Max-Pooling Layer

The classification stage identifies the data that have been processed in feature extraction. The classification stage has several parts: the flatten layer, the fully connected layer, and the softmax activation [9], [16].

4) Flatten Layer: converts the multidimensional feature maps produced by feature extraction into a vector, which is then used as input for the fully connected layer [19].

5) Fully Connected Layer: the layer in which all activation neurons are connected. This layer is used in a Multi-Layer Perceptron (MLP) to change the dimensions of the data so that they can be classified linearly [18].

6) Softmax Activation: a logistic regression algorithm used for classification with more than two classes [20]. Equation 2 shows the softmax activation.

f_j(z) = e^(z_j) / Σ_k e^(z_k)   (2)

The notation f_j(z) denotes the result of the function for the j-th element of the class output vector. The argument z is the hypothesis given by the training model, which the softmax function turns into class probabilities. Softmax gives better probabilistic results than other classification algorithms because it computes probabilities for all labels. The label scores are converted into a vector with values between zero and one, and the values sum to one.
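As a concrete illustration of Equations 1 and 2 (not code from the paper), the short NumPy sketch below applies the two activations to arbitrary example scores and checks that the softmax outputs sum to one.

import numpy as np

def relu(x):
    # Equation 1: negative values are clipped to zero.
    return np.maximum(0.0, x)

def softmax(z):
    # Equation 2: exponentiate and normalize so the outputs sum to one.
    e = np.exp(z - np.max(z))   # subtracting the max improves numerical stability
    return e / e.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))            # -> [0. 0. 3.]
probs = softmax(np.array([2.0, -1.0, 0.5, 0.1, -0.3]))  # five arbitrary class scores
print(probs, probs.sum())                          # five probabilities summing to 1.0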
C. MobileNet

MobileNet is a CNN architecture designed to cope with limited computing resources. It is built for constrained devices while still producing high accuracy, so it has low latency, a small footprint, and low energy consumption, which makes it well suited to mobile applications [12]. MobileNet is built from depthwise separable convolutions, which replace standard convolution by splitting it into two separate layers: a depthwise convolution and a pointwise convolution.


Fig. 6. Standard convolution and depthwise separable convolution

The first layer, the depthwise convolution, applies a single convolution filter per input channel [21]. The second layer, the pointwise convolution, builds new features by calculating a linear combination of the input channels with a 1x1 convolution. Figure 7 shows the block consisting of a depthwise convolution and a pointwise convolution, each followed by batch normalization and ReLU activation.

Fig. 7. Depthwise Separable Convolution Architecture
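The block in Figure 7 can be written compactly in Keras. The sketch below is only an assumed illustration of one such depthwise separable block; the filter count, stride, stem convolution, and input size are illustrative choices, not values taken from the paper.

from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters, stride=1):
    # Depthwise convolution: one 3x3 filter per input channel.
    x = layers.DepthwiseConv2D(kernel_size=3, strides=stride,
                               padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    # Pointwise convolution: 1x1 filters combine the input channels linearly
    # to build new features.
    x = layers.Conv2D(pointwise_filters, kernel_size=1,
                      padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

inputs = layers.Input(shape=(224, 224, 3))
x = layers.Conv2D(32, 3, strides=2, padding="same", use_bias=False)(inputs)  # standard stem convolution
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = depthwise_separable_block(x, pointwise_filters=64)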
III. SYSTEM PERFORMANCE

Four parameters are used in this study to measure system performance: accuracy, precision, recall, and f1-score. These measurements are shown in Equations 3, 4, 5, and 6 [22].

Accuracy = (TTP + TTN) / (TTP + TTN + TFP + TFN)   (3)

Precision = TTP / (TTP + TFP)   (4)

Recall = TTP / (TTP + TFN)   (5)

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)   (6)

Total True Positive (TTP) is the number of samples whose predicted and actual labels are both positive. Total True Negative (TTN) is the number of samples whose predicted and actual labels are both negative. Total False Positive (TFP) is the number of samples predicted positive whose actual label is negative. Total False Negative (TFN) is the number of samples predicted negative whose actual label is positive. TTP and TTN therefore indicate system success in classifying glaucoma, while TFP and TFN indicate system failure.
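As a worked example of Equations 3-6, the counts below are made up for illustration and are not results from this study.

# Hypothetical single-class counts, not taken from the paper.
TTP, TTN, TFP, TFN = 95, 290, 5, 10

accuracy  = (TTP + TTN) / (TTP + TTN + TFP + TFN)
precision = TTP / (TTP + TFP)
recall    = TTP / (TTP + TFN)
f1_score  = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1_score:.3f}")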

IV. RESULT AND DISCUSSION

The glaucoma dataset used contains 2000 fundus images in the classes deep, early, moderate, normal, and oht. The training set contains 1600 images and the test set contains 400 images. Validation uses 20 percent of the training data, giving 320 validation images and 1280 training images. This research uses the MobileNet architecture with various optimizers and learning rates. The four optimizers are Adam, Nadam, SGD, and RMSprop, and the learning rates are 0.01, 0.001, 0.0001, and 0.00001. This study evaluates four parameters: accuracy, precision, recall, and f1-score. The accuracy results of the proposed model are shown in Table I.

TABLE I
OPTIMIZER AND LEARNING RATE COMPARISON OF THE PROPOSED MODEL

Optimizer   Learning Rate   Train   Val    Test
Adam        0.00001         0.98    0.86   0.84
Adam        0.0001          0.99    0.91   0.94
Adam        0.001           0.99    0.97   0.97
Adam        0.01            0.99    0.98   0.98
Nadam       0.00001         0.98    0.87   0.86
Nadam       0.0001          0.99    0.91   0.97
Nadam       0.001           1.00    0.99   0.99
Nadam       0.01            0.99    0.99   0.98
SGD         0.00001         0.34    0.34   0.34
SGD         0.0001          0.50    0.36   0.38
SGD         0.001           0.99    0.96   0.98
SGD         0.01            1.00    0.97   0.98
RMSprop     0.00001         0.97    0.79   0.89
RMSprop     0.0001          0.99    0.89   0.93
RMSprop     0.001           0.99    0.98   0.88
RMSprop     0.01            0.99    0.96   0.97

Table I shows that, within the tested range, higher learning rates generally give higher accuracy. The best result is obtained with the Nadam optimizer and a learning rate of 0.001, indicated by a training accuracy of 1.00; there is no difference between the validation and test accuracy, both 0.99.
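A minimal sketch of how the reported configuration could be reproduced in Keras is shown below; the classification head, input size, random initialization, and training schedule are assumptions, since the paper only specifies the MobileNet backbone, the optimizers, and the learning rates.

import tensorflow as tf

def build_model(num_classes=5, input_shape=(224, 224, 3)):
    # MobileNet backbone followed by an assumed pooling + softmax head.
    backbone = tf.keras.applications.MobileNet(
        include_top=False, weights=None, input_shape=input_shape)
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(backbone.input, outputs)

model = build_model()
# Best reported setting: Nadam with learning rate 0.001.
model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# train_flow could be the augmented generator from the dataset sketch; a
# separate val_flow holding 20% of the training data would mirror the split
# reported above. The number of epochs is not given in the paper.
# model.fit(train_flow, validation_data=val_flow, epochs=50)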


Fig. 8. Confusion matrix of test data.

Figure 8 shows that most of the test data are classified correctly according to their class. Figure 9 shows the ROC curves for the test data; almost all curves pass close to the point (0, 1), indicating good classification.

Fig. 9. ROC curve for test data.

TABLE II
SYSTEM MODEL PERFORMANCE IN THIS STUDY

Class      Precision   Recall   F1-Score
Deep       1.00        1.00     1.00
Early      0.99        1.00     0.99
Moderate   1.00        1.00     1.00
Normal     1.00        0.99     0.99
OHT        1.00        1.00     1.00

Table II shows the precision, recall, and f1-score results, with an average value of 0.99. These results indicate that the model is promising, so this research adopts the proposed model with the Nadam optimizer and a learning rate of 0.001.
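For completeness, the sketch below shows one way the confusion matrix (Fig. 8), the per-class metrics (Table II), and one-vs-rest ROC curves (Fig. 9) could be produced with scikit-learn; the labels and predicted probabilities here are random placeholders, not the study's outputs.

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize

classes = ["deep", "early", "moderate", "normal", "oht"]

# Placeholder test labels and softmax outputs; replace with real model outputs.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=400)
y_prob = rng.dirichlet(np.ones(5), size=400)
y_pred = np.argmax(y_prob, axis=1)

print(confusion_matrix(y_true, y_pred))                                   # Fig. 8 style matrix
print(classification_report(y_true, y_pred, target_names=classes, digits=2))  # Table II style report

# One-vs-rest ROC curve and AUC per class (Fig. 9 style).
y_true_bin = label_binarize(y_true, classes=list(range(len(classes))))
for i, name in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_true_bin[:, i], y_prob[:, i])
    print(name, "AUC =", auc(fpr, tpr))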
Table III provides a brief description of the methods that have been used to classify glaucoma. In traditional techniques, the classification system still involves feature extraction and classification as separate steps. CNNs, however, can automatically extract features and classify them into different classes, so the CNN-based method does not need separate feature extraction and classification. The proposed method is MobileNet, which addresses the overly large weight size and the time required for training. To prevent dataset imbalance, we perform an augmentation process that gives each class the same number of images.

TABLE III
DESCRIPTION OF THE METHODS THAT HAVE BEEN USED TO CLASSIFY GLAUCOMA

Authors                                 Classifier   Architectures/Features                                      Class                                                Accuracy
Muthu Rama Krishan et al. (2012)        SVM          HOS, TT, DWT, and Energy features                           Normal and glaucoma                                  91.67%
Shishir Maheshwari et al. (2016)        SVM          2D EWT and correntropy                                      Normal and glaucoma                                  98.33%
Kishore Balasubramanian et al. (2018)   KNN          PCA and hybrid features (correlation and homogeneity)       Normal and glaucoma                                  99%
Kamel H. Rahouma et al. (2019)          ANN          GLCM and GLRLM                                              Normal and glaucoma                                  89.3%
Ali Serener et al. (2019)               CNN          GoogleNet                                                   No-glaucoma, early-glaucoma and advanced-glaucoma    85%
Ali Serener et al. (2019)               CNN          ResNet-50                                                   No-glaucoma, early-glaucoma and advanced-glaucoma    86%
Yunendah Fu’adah et al. (2020)          CNN          Three hidden layers using a 3x3 filter size of 16, 32, 64   Normal and glaucoma                                  91%
Shishir Maheshwari et al. (2020)        CNN          Alexnet and local binary pattern                            Normal and glaucoma                                  98.90%
Ajitha S et al. (2020)                  R-CNN        ResNet-50                                                   Glaucoma and no-glaucoma                             92.5%
Ajitha S et al. (2020)                  R-CNN        VGG16                                                       Glaucoma and no-glaucoma                             92.%
This Study                              CNN          MobileNet                                                   Deep, early, moderate, normal and oht                99%
The advantages of the proposed method are:
1) It can be used to process fundus images to help medical staff classify glaucoma;
2) With the diversity of classes in the rim-one r1 database, the system can classify glaucoma into five classes: deep, early, moderate, normal, and oht;
3) The proposed approach gives the model a small weight size, so it can be implemented on mobile devices.

V. CONCLUSION

This paper has developed an automatic glaucoma classification based on fundus image processing using the CNN method with the MobileNet architecture. The MobileNet architecture consists of depthwise separable convolutions, followed by a fully connected layer and a softmax layer. This study obtained the best model when using the Nadam optimizer and a learning rate of 0.001 in classifying the glaucoma dataset into deep, early, moderate, normal, and oht classes. The model achieved an accuracy of 99% with an average precision, recall, and f1-score of 0.99.


This method obtained promising classification performance, which can assist medical staff in identifying the best treatment for their patients. The authors intend to expand the work in the future by increasing the number of datasets and covering other retinal diseases.

REFERENCES

[1] P. V. Rao, R. Gayathri and R. Sunitha, ”A Novel Approach for Design and Analysis of Diabetic Retinopathy Glaucoma Detection using Cup to Disk Ration and ANN,” Procedia Materials Science, p. 446 – 454, 2015.
[2] InfoDATIN, ”Situasi Glaukoma di Indonesia,” Pusat Data dan Informasi
Kementrian Kesehatan RI, Jakarta Selatan, 2014.
[3] R. Munarto, E. Permata and I. Ginanjar, ”Klasifikasi Glaucoma Meng-
gunakan Cup-To-Disc Ratio dan Neural Network,” Simposium Nasional
RAPI XV , 2016.
[4] M. Rama Krishnan M and O. Faust, ”Automated Glaucoma Detection
Using Hybrid Feature Extraction in Retinal Fundus Images,” Journal of
Mechanics in Medicine and Biology, 2012.
[5] S. Maheshwar, R. B. Pachori and U. R. Acharya, ”Automated Diagnosis
of Glaucoma Using Empirical Wavelet Transform and Correntropy
Features Extracted from Fundus Images,” IEEE Journal of Biomedical
and Health Informatics, vol. 21, no. 3, pp. 803 - 813, 2016.
[6] K. Balasubramanian, N. P. Ananthamoorth and K. Gayathridevi , ”Auto-
matic Diagnosis and Classification of Glaucoma Using Hybrid Features
and K-Nearest Neighbor,” American Scientific Publishers, vol. 8, no. 3,
pp. 1598-1606(9), 2018.
[7] K. H. Rahouma, M. M. Mohamed and N. S. A. Hameed, ”Glaucoma
Detection and Classification Based on Image Processing and Artificial
Neural Networks,” Egyptian Computer Science Journal, vol. 43, no. 3,
2019.
[8] A. Serener and S. Serte, ”Transfer Learning for Early and Advanced
Glaucoma Detection with Convolutional Neural Networks,” Tiptekno’19,
2019.
[9] Y. N. Fu’adah, S. Sa’idah, I. Wijayanto, N. Ibrahim, S. Rizal and
R. Magdalena, ”Computer Aided Diagnosis for Early Detection of
Glaucoma using Convolutional Neural Network (CNN),” Springer, 2021.
[10] S. Maheshwar, V. Kanhangad and R. B. Pachori, ”CNN-based approach
to glaucoma diagnosis using LBP-based data transfer and augmentation
learning,” Drafted at Elsevier Biomedical Signal Processing and Control,
2020.
[11] S. Ajitha and M. V. Judy, ”Faster R-CNN classification for the recog-
nition of glaucoma,” First International Conference on Advances in
Physical Sciences and Materials, 2020.
[12] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand
, M. Andreetto and H. Adam , ”MobileNets: Efficient Convolutional
Neural Networks for Mobile Vision Applications,” arVix, 2017.
[13] Medical Images Analysis Group, ” RIM-ONE Original
Versions Release 1: RIM-ONE r1,” [Online]. Available:
https://medimrg.webs.ull.es/research/retinal-imaging/rim-one/ [Accessed
30 December 2021].
[14] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, ”Gradient-Based Learn-
ing Applied to Document Recognition,” IEEE, 1998.
[15] E. N. Arrofiqoh and Harintaka, ”Implementasi Metode Convolutional
Neural Network Untuk Klasifikasi Tanaman Pada Citra Resolusi Tinggi,”
Geomatika, vol. 24, pp. 61-68, 2018.
[16] K. K. Patro, A. Jaya Prakash, M. Jayamanmadha Rao, and P. Rajesh
Kumar, “An Efficient Optimized Feature Selection with Machine Learn-
ing Approach for ECG Biometric Recognition,” IETE J. Res., vol. 0, no.
0, pp. 1–12, 2020.
[17] B. I. Taweh, Introduction to Deep Learning using R, Apress, San
Fransisco, California, USA, 2017.
[18] I. A. Sabilla, ”Arsitektur Convolutional Neural Network (CNN) Untuk
Klasifikasi Jenis dan Kesegaran Buah Pada Neraca Buah,” Tesis Institut
Teknologi Sepuluh Nopember, 2020.
[19] S. Kiranyaz, O. Avci, O. Abdeljaber, T. Ince, M. Gabbouj, and D. J.
Inman, “1D convolutional neural networks and applications: A survey,”
Mech. Syst. Signal Process., vol. 151, p. 107398, 2021.
[20] S. Ilahiyah and A. Nilogiri, ”Implementasi Deep Learning Pada Identifikasi Jenis Tumbuhan Berdasarkan Citra Daun Menggunakan Convolutional Neural Network,” Justindo, vol. 3, pp. 49-56, 2018.
[21] G. Wang, G. Yuan, T. Li and M. Lv, ”An multi-scale learning network with depthwise separable convolutions,” IPSJ Trans. Comput. Vis. Appl., 2018.
[22] M. N. Bajwa, M. I. Malik, S. A. Siddiqui, A. Dengel, F. Shafait, W. Neumeier and S. Ahmed, ”Correction to: Two-stage framework for optic disc localization and glaucoma classification in retinal fundus images using deep learning,” BMC Med. Inform. Decis. Mak., vol. 19, no. 136, pp. 1-16, 2019, DOI: 10.1186/s12911-019-0842-8.

