Pneumonia Detection Using Convolution Neural Network
Submitted by:
Date:
25th Nov, 2019
NEPAL COLLEGE OF
INFORMATION TECHNOLOGY
Balkumari, Lalitpur, Nepal
Abstract
Pneumonia is an infection of the lungs that can be caused by bacteria, viruses, or fungi. It can be
difficult to diagnose because its symptoms are similar to those of a cold or influenza; it is generally
diagnosed using physical symptoms, blood tests, and chest X-rays. Detecting pneumonia is not
easy, and well-experienced doctors are required for an accurate diagnosis. We have used a VGG16
classifier, based on a convolutional neural network (CNN), that detects and classifies the presence
of pneumonia in a chest X-ray image. The chest X-ray is currently the best method for diagnosing
pneumonia, and with computer-aided diagnosis, physicians can read chest X-rays more quickly
and accurately. The user uploads a chest X-ray image to the web application, and the CNN detects
whether the patient has pneumonia based on that image. Validation is done using a confusion matrix.
Keywords: pneumonia, chest X-ray, convolutional neural network, binary classification, detection,
web application
Contents
1. Introduction .................................................................................................................................... 1
1.1. Problem Statement .................................................................................................................. 2
1.2. Project Objective...................................................................................................................... 2
1.3. Significance of study................................................................................................................. 3
2. Literature review ............................................................................................................................. 4
2.1. Supervised Machine Learning:.................................................................................................. 5
2.2. Convolution Neural Network .................................................................................................... 5
2.2.1. Layers in CNN ........................................................................................................................ 7
2.2.1.1. Input Layer ....................................................................................................................... 7
2.2.1.2. Convo Layer...................................................................................................................... 7
2.2.1.3. Pooling Layer .................................................................................................................... 8
2.2.1.4. Fully Connected Layer(FC) ................................................................................................ 8
2.2.1.5. Softmax / Logistic Layer .................................................................................................... 8
2.2.1.6. Output Layer .................................................................................................................... 8
2.3. Backpropagation ...................................................................................................................... 9
2.4. Activation Function ................................................................................................................ 12
2.4.1. Types of Activation Function .............................................................................................. 13
2.4.1.1. Sigmoid / Logistic Function ......................................................................................... 13
2.4.1.2. TanH / Hyperbolic Tangent ......................................................................................... 13
2.4.1.3. ReLU (Rectified Linear Unit) ........................................................................................ 14
2.4.1.4. Leaky ReLU ................................................................................................................. 15
2.4.1.5. Parametric ReLU ......................................................................................... 16
2.4.1.6. Softmax ...................................................................................................................... 17
2.4.1.7. Swish .......................................................................................................................... 17
2.5. VGG16 – Convolutional Network for Classification and Detection ........................................... 18
2.6. Transfer Learning ................................................................................................................... 19
3. Methodology ................................................................................................................................. 22
3.1. Introduction ........................................................................................................................... 22
3.2. Data Sets ................................................................................................................................ 22
3.3. Training the Model ................................................................................................................. 23
3.4. Software development life-cycle ............................................................................................ 24
3.5. Tools and technique ............................................................................................................... 25
4. System Design ............................................................................................................................... 27
4.1. System Architecture ............................................................................................................... 27
4.2. API Architecture ..................................................................................................................... 28
4.3. Custom VGG trained model architecture ............................................................. 29
4.4. Use Case Diagram .................................................................................................................. 30
4.5. Class Diagram......................................................................................................................... 31
5. Result and Discussion .................................................................................................................... 34
5.1. Data preparation .................................................................................................................... 34
5.2. Data preprocessing ................................................................................................................ 34
5.3. Pneumonia detection with CNN ............................................................................................. 34
5.4. Feature Extraction .................................................................................................................. 35
5.5. Building the Network ............................................................................................................. 35
5.6. Validation............................................................................................................................... 35
5.7. Verification ............................................................................................................................ 37
6. Bibliography.................................................................................................................................. 38
Appendix ............................................................................................................................................... 40
1. Introduction
This project, Pneumonia Detection Using Convolution Neural Network, focuses on detecting
pneumonia with the help of chest X-ray images. We aim to develop a Convolutional Neural
Network (CNN) that can detect whether a patient has pneumonia or not, based on an X-ray image
of their chest. This project is also proposed to study the different kinds of pre-existing models and
tools for detecting a disease from radiology. Chest X-rays are currently the best method for
diagnosing pneumonia, yet almost two-thirds of the world's population still lacks access to
radiology diagnostics. It is also much more difficult to make clinical diagnoses with chest X-rays
than with other imaging modalities such as CT or MRI [1], which can lead to inaccurate results.
Our project aims to reduce the cost of detecting pneumonia by using just a chest X-ray image,
since a CT scan or MRI can be expensive and can take several days. By using better technologies,
we can make these tests more accurate.
Pneumonia is an infection in the lungs that can be caused by bacteria, viruses, or fungi. It is a
serious and life-threatening disease. The lungs become inflamed, and the tiny air sacs, or alveoli,
inside the lungs fill up with fluid. It can occur in young and healthy people, but it is most dangerous
for older adults and infants, people with other diseases, and those with impaired immune systems.
Pneumonia is a leading cause of death among children younger than 5 years worldwide [2].
The risk of pneumonia is immense for many, especially in developing nations where billions face
energy poverty and rely on polluting forms of energy. The WHO estimates that over four million
premature deaths occur annually from household-pollution-related diseases, including pneumonia
[3]. Based on data published by Patan Hospital, an average of 500 patients are diagnosed with
pneumonia yearly, most of them under the age of 5. The first symptoms of pneumonia are similar
to those of a cold or flu, but they can slowly turn into a high fever, chills, and a cough with
sputum. Other symptoms of pneumonia include:
Fast heartbeat
Fast breathing and shortness of breath
Chest pain that usually worsens when taking a deep breath
Sweating, nausea and vomiting, diarrhea, muscle pain
Page 1 of 43
Dusky or purplish skin color, or cyanosis, from poorly oxygenated blood.
Pneumonia can be difficult to diagnose because the symptoms are variable and similar to those
seen in a cold or influenza. To diagnose it, the medical history is reviewed, a physical examination
is performed, and tests such as a blood test, chest X-ray, pulse oximetry, and sputum test are done [4].
1.1. Problem Statement
Pneumonia is not easy for doctors to detect; even doctors with good experience may not be able
to detect pneumonia by examining a patient's chest X-ray. Almost two-thirds of the world's
population lacks access to radiology diagnostics, and it is much more difficult to make clinical
diagnoses with chest X-rays than with other imaging modalities such as CT or MRI, which are in
turn expensive techniques. This can lead to inaccurate results. In the context of Nepal, many rural
government health posts lack basic equipment, and some have not been staffed for years. Rural
areas of Nepal have one doctor for every 150,000 people, so it is difficult for patients to be
diagnosed properly.
To address such shortcomings in pneumonia detection, better technologies can be developed that
improve detection and testing. Automating this detection task would greatly improve the
efficiency of radiologists.
1.2. Project Objective
This project involves both studying the existing systems and combining their methods in such a
way that they give an optimal solution. The objectives of the project are to:
Build a system for the confirmation of pneumonia in patients, usable by doctors or laypeople.
Develop a tool for pneumonia detection in remote areas where experienced doctors are not
available.
Develop a user-friendly web application for pneumonia detection.
1.3. Significance of study
The study helps in the development of a system that detects pneumonia and provides accurate
information to users regarding pneumonia. It also helps in making the system dynamic as per the
requirements of the user, as data about patients can be added, updated, or deleted as required.
Finally, the study also looks into other similar systems in use to explore their benefits as well as
their drawbacks.
2. Literature review
The progress that artificial intelligence (AI) has made in radiology has exceeded expectations in
providing accurate, automated diagnoses and has, in most cases, even outperformed human
healthcare professionals.
One prominent example is 'CheXNet,' an artificial neural network designed to detect pneumonia
from chest X-rays at a performance level greater than that of the average radiologist. The system
was developed at Stanford University, tested against four practicing radiologists, and evaluated on
14 diseases, achieving state-of-the-art results on all of them [5].
Anjana Tiha detected pneumonia from chest X-ray images using a custom deep convolutional
neural network and by retraining the pre-trained model InceptionV3 with 5,856 X-ray images,
reaching a testing accuracy of 89.53% and a loss of 0.41 [6].
Yuan Tian evaluated a CNN classifier using separate training and test sets, concluding that the
CNN classifier is 91% accurate [7]. Recent improvements in deep learning models and the
availability of huge datasets have helped algorithms outperform medical personnel in numerous
medical imaging tasks such as skin cancer classification [8], hemorrhage identification [9],
arrhythmia detection [10], and diabetic retinopathy detection [11]. Automated diagnosis from
chest radiographs has received growing interest, and these algorithms are increasingly being used
for lung nodule detection [12] and pulmonary tuberculosis classification [13].
Benjamin Antin, Joshua Kravitz, and Emil Martayan detected pneumonia in chest X-rays with
supervised learning, concluding that logistic regression does not adequately capture the
complexity of the dataset [14]. Rahib H. Abiyev and Mohammad Khaleel Sallam Ma'aitah
designed a CNN for the diagnosis of chest diseases, concluding that their CNN performed better
than other models such as GIST, VGG16, and VGG19 [15]. Alishba Imran trained a CNN to
detect pneumonia with an accuracy of 88.89% [16].
2.1. Supervised Machine Learning:
Supervised learning, in the context of artificial intelligence (AI) and machine learning, is a type of
system in which both input and desired output data are provided. Input and output data are labelled
for classification to provide a learning basis for future data processing. The term supervised
learning comes from the idea that an algorithm is learning from a training dataset, which can be
thought of as the teacher. Supervised machine learning systems provide the
learning algorithms with known quantities to support future judgments. Supervised learning
systems are mostly associated with retrieval-based AI but they may also be capable of using a
generative learning model [17].
The majority of practical machine learning uses supervised learning. In supervised learning, you
have input variables X and an output variable Y, and you use an algorithm to learn the mapping
function from input to output, Y = f(X). The goal is to approximate the mapping function so well
that, given new input data X, you can predict the output Y for that data [18].
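The idea of learning the mapping Y = f(X) from labeled pairs can be sketched with a deliberately tiny example. The 1-nearest-neighbour "learner" and the toy data below are illustrative assumptions, not part of this project; a real system such as the CNN described later learns a far richer mapping.

```python
# Minimal sketch of supervised learning: approximate the mapping Y = f(X)
# from labeled (x, y) examples using a 1-nearest-neighbour rule.

def train(examples):
    """'Training' here just stores the labeled (x, y) pairs."""
    return list(examples)

def predict(model, x):
    """Return the label of the stored example closest to x."""
    nearest_x, nearest_y = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_y

# Hypothetical labeled data: inputs x with desired outputs y
# (0 = normal, 1 = pneumonia, if x were some image-derived score).
data = [(0.1, 0), (0.2, 0), (0.8, 1), (0.9, 1)]
model = train(data)
print(predict(model, 0.15))  # close to the 'normal' examples -> 0
print(predict(model, 0.85))  # close to the 'pneumonia' examples -> 1
```

The essential point is the same for the nearest-neighbour rule and for a deep network: both are trained on known (input, output) pairs and then asked to predict outputs for new inputs.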
2.2. Convolution Neural Network
A convolutional neural network (CNN) is a class of deep neural networks most commonly applied
to analyzing visual imagery. CNNs have applications in image and video recognition, recommender
systems, image classification, medical image analysis, and natural language processing.
CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean
fully connected networks, that is, each neuron in one layer is connected to all neurons in the next
layer. The "fully-connectedness" of these networks makes them prone to overfitting data. Typical
ways of regularization include adding some form of magnitude measurement of weights to the loss
function. However, CNNs take a different approach towards regularization: they take advantage
of the hierarchical pattern in data and assemble more complex patterns using smaller and simpler
patterns. Therefore, on the scale of connectedness and complexity, CNNs are on the lower extreme.
Convolutional networks were inspired by biological processes in that the connectivity pattern
between neurons resembles the organization of the animal visual cortex. Individual cortical
neurons respond to stimuli only in a restricted region of the visual field known as the receptive
field. The receptive fields of different neurons partially overlap such that they cover the entire
visual field.
CNNs use relatively little pre-processing compared to other image classification algorithms. This
means that the network learns the filters that in traditional algorithms were hand-engineered.
This independence from prior knowledge and human effort in feature design is a major
advantage.
A convolutional neural network is a neural network that has one or more convolutional layers and
is used mainly for image processing, classification, segmentation, and other autocorrelated data.
A convolution is essentially sliding a filter over the input. One helpful way to think about
convolutions is this quote from Dr. Prasad Samarakoon: "A convolution can be thought of as
looking at a function's surroundings to make better/accurate predictions of its outcome."
Rather than looking at an entire image at once to find certain features it can be more effective to
look at smaller portions of the image [19].
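The idea of sliding a filter over smaller portions of an image can be sketched in a few lines. The 4x4 image and the simple edge-detecting kernel below are illustrative assumptions chosen so the result is easy to verify by hand.

```python
# Sketch of "sliding a filter over the input": a valid 2-D convolution
# (really cross-correlation, as implemented in most CNN libraries).

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Look only at the kh x kw neighbourhood starting at (i, j).
            acc = sum(image[i + m][j + n] * kernel[m][n]
                      for m in range(kh) for n in range(kw))
            row.append(acc)
        out.append(row)
    return out

# A toy image with a vertical edge between columns 2 and 3, and a small
# filter that responds strongly at that edge.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 1, 0, 0]]
kernel = [[1, -1],
          [1, -1]]
print(conv2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The filter output is large only where the filter's window straddles the edge, which is exactly the "looking at a function's surroundings" intuition quoted above.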
Figure 1. 1: A CNN sequence to classify handwritten digits
2.2.1.3. Pooling Layer
The pooling layer is used to reduce the spatial volume of the input after convolution and is placed
between two convolutional layers. If an FC layer were applied after a convolutional layer without
pooling or max pooling, it would be computationally expensive, so max pooling is a common way
to reduce the spatial volume of the input. When max pooling is applied to a single depth slice with
a stride of 2, a 4 x 4 input is reduced to a 2 x 2 output.
There are no learnable parameters in the pooling layer, but it has two hyperparameters: filter size (F) and stride (S).
In general, if the input dimension is W1 x H1 x D1, then
W2 = (W1 − F)/S + 1
H2 = (H1 − F)/S + 1
D2 = D1
where W2, H2, and D2 are the width, height, and depth of the output.
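The output-size formulas and the 4x4-to-2x2 max-pool example above can be checked directly. The specific 4x4 input values below are an illustrative assumption.

```python
# Output dimensions of a pooling layer: W2 = (W1 - F)/S + 1, etc.
def pooled_dims(w1, h1, d1, f, s):
    return ((w1 - f) // s + 1, (h1 - f) // s + 1, d1)

# Max pooling over one depth slice with an f x f window and stride s.
def max_pool(x, f=2, s=2):
    h = (len(x) - f) // s + 1
    w = (len(x[0]) - f) // s + 1
    return [[max(x[i * s + m][j * s + n] for m in range(f) for n in range(f))
             for j in range(w)] for i in range(h)]

print(pooled_dims(4, 4, 1, f=2, s=2))  # (2, 2, 1): a 4x4 slice becomes 2x2
x = [[1, 3, 2, 1],
     [4, 6, 5, 0],
     [7, 2, 9, 8],
     [3, 1, 4, 6]]
print(max_pool(x))  # [[6, 5], [7, 9]]: the maximum of each 2x2 block
```

Note how each 2x2 block of the input contributes a single value, its maximum, to the output, reducing the spatial volume while keeping the strongest activations.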
2.2.1.4. Fully Connected Layer (FC)
The fully connected layer involves weights, biases, and neurons. It connects the neurons in one
layer to the neurons in another layer, and is used to classify images between different categories
by training.
2.2.1.5. Softmax / Logistic Layer
The softmax or logistic layer is the last layer of the CNN, residing at the end of the FC layer.
Logistic is used for binary classification and softmax for multi-class classification.
2.2.1.6. Output Layer
The output layer contains the labels in one-hot encoded form (medium, 2019:
https://towardsdatascience.com/covolutional-neural-network-cb0883dd6529).
2.3. Backpropagation
Back-propagation is the essence of neural net training. It is the method of fine-tuning the weights
of a neural net based on the error rate obtained in the previous epoch (i.e., iteration). Proper tuning
of the weights allows you to reduce error rates and to make the model reliable by increasing its
generalization.
Backpropagation is short for "backward propagation of errors." It is a standard method of
training artificial neural networks, and helps to calculate the gradient of a loss function with
respect to all the weights in the network.
Backpropagation repeatedly adjusts the weights of the connections in the network so as to minimize
a measure of the difference between the actual output vector of the net and the desired output vector.
In other words, backpropagation aims to minimize the cost function by adjusting the network's
weights and biases. The level of adjustment is determined by the gradients of the cost function
with respect to those parameters.
Computing Gradients:
The gradient of a function C(x_1, x_2, …, x_m) at a point x is the vector of the partial derivatives
of C at x:

∇C = [∂C/∂x_1, ∂C/∂x_2, …, ∂C/∂x_m]
The derivative of a function C measures the sensitivity to change of the function value (output
value) with respect to a change in its argument x (input value). In other words, the derivative
tells us the direction C is going.
The gradient shows how much the parameter x needs to change (in positive or negative
direction) to minimize C.
These gradients are computed using a technique called the chain rule.

For a weight w_jk^l:

∂C/∂w_jk^l = (∂C/∂z_j^l) · (∂z_j^l/∂w_jk^l)          (chain rule)

z_j^l = Σ_k (w_jk^l · a_k^(l−1)) + b_j^l             (by definition)

∂z_j^l/∂w_jk^l = a_k^(l−1)                           (by differentiation)

∂C/∂w_jk^l = (∂C/∂z_j^l) · a_k^(l−1)                 (final value)

For a bias b_j^l:

∂C/∂b_j^l = (∂C/∂z_j^l) · (∂z_j^l/∂b_j^l)            (chain rule)

∂z_j^l/∂b_j^l = 1                                    (by differentiation)

∂C/∂b_j^l = ∂C/∂z_j^l                                (final value)
The common part of both equations above is often called the "local gradient":

δ_j^l = ∂C/∂z_j^l                                    (local gradient)

The local gradient can easily be determined using the chain rule.
The weights and biases are then updated by gradient descent, where ε is the learning rate:

repeat until the cost is minimized:
    w := w − ε · ∂C/∂w
    b := b − ε · ∂C/∂b
end

Algorithm for optimizing weights and biases (also called "gradient descent").
w and b are matrix representations of the weights and biases. Derivative of C in w or b can be
calculated using partial derivatives of C in the individual weights or biases.
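The gradient-descent update rule can be run on a toy cost function to see it converge. The cost C(w) = (w − 3)^2 and the learning rate below are illustrative assumptions; the minimum is known to be at w = 3, so the result is easy to check.

```python
# Gradient descent w := w - eps * dC/dw on the toy cost C(w) = (w - 3)^2,
# whose minimum is at w = 3. eps is the learning rate.

def dC_dw(w):
    return 2 * (w - 3)  # derivative of (w - 3)^2

w, eps = 0.0, 0.1
for _ in range(100):
    w = w - eps * dC_dw(w)  # step against the gradient
print(round(w, 4))  # converges to 3.0
```

Each step moves w in the direction that decreases C, exactly as the update rule above prescribes; in a neural network the same update is applied simultaneously to every weight and bias.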
The final part of this section is dedicated to a simple example in which we calculate the gradient
of C with respect to a single weight w_22^(2).
Weight w_22^(2) connects a_2^(2) and z_2^(3), so computing the gradient requires applying the
chain rule through z_2^(3) and a_2^(3):

∂C/∂w_22^(2) = (∂C/∂z_2^(3)) · (∂z_2^(3)/∂w_22^(2))
             = (∂C/∂a_2^(3)) · (∂a_2^(3)/∂z_2^(3)) · a_2^(2)
             = (∂C/∂a_2^(3)) · f′(z_2^(3)) · a_2^(2)

Calculating the final value of the derivative of C with respect to a_2^(3) requires knowledge of the
function C. Since C depends on a_2^(3), calculating that derivative should be fairly straightforward [20].
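The chain-rule expression for a single weight can be verified numerically on a toy one-neuron computation. The sigmoid choice for f, the squared-error cost, and all the values below are illustrative assumptions.

```python
# Numeric check of the chain rule for one weight: z = w * a_prev + b,
# a = f(z), C = (a - target)^2, with f = sigmoid.
import math

def f(z):
    return 1.0 / (1.0 + math.exp(-z))

def f_prime(z):
    return f(z) * (1.0 - f(z))  # derivative of the sigmoid

a_prev, b, target = 0.5, 0.1, 1.0
w = 0.8

def cost(w):
    a = f(w * a_prev + b)
    return (a - target) ** 2

# Analytic gradient via the chain rule: dC/dw = dC/da * f'(z) * a_prev
z = w * a_prev + b
analytic = 2 * (f(z) - target) * f_prime(z) * a_prev

# Numeric gradient (central difference) for comparison
h = 1e-6
numeric = (cost(w + h) - cost(w - h)) / (2 * h)

print(abs(analytic - numeric) < 1e-6)  # True: the chain rule matches
```

The agreement between the analytic and numeric gradients is exactly what backpropagation relies on: the chain rule gives the same answer as perturbing the weight directly, but far more cheaply.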
2.4. Activation Function
Activation functions are an extremely important feature of artificial neural networks. They
basically decide whether a neuron should be activated or not: whether the information the neuron
is receiving is relevant for the given task or should be ignored.
The activation function is the non-linear transformation that we apply to the input signal. This
transformed output is then sent to the next layer of neurons as input.
Without an activation function, the weights and biases would simply perform a linear
transformation. A linear equation is simple to solve but is limited in its capacity to solve complex
problems; a neural network without an activation function is essentially just a linear regression
model. The activation function applies a non-linear transformation to the input, making the
network capable of learning and performing more complex tasks, such as language translation and
image classification, which linear transformations alone could never accomplish.
Activation functions make the back-propagation possible since the gradients are supplied along
with the error to update the weights and biases. Without the differentiable non-linear function, this
would not be possible [21].
2.4.1. Types of Activation Function
2.4.1.1. Sigmoid / Logistic Function
Advantages
Smooth gradient, preventing “jumps” in output values.
Output values bound between 0 and 1, normalizing the output of each neuron.
Clear predictions—For X above 2 or below -2, tends to bring the Y value (the prediction)
to the edge of the curve, very close to 1 or 0. This enables clear predictions.
Disadvantages
Vanishing gradient—for very high or very low values of X, there is almost no change to
the prediction, causing a vanishing gradient problem. This can result in the network
refusing to learn further, or being too slow to reach an accurate prediction.
Outputs not zero centered.
Computationally expensive.
2.4.1.2. TanH / Hyperbolic Tangent
Advantages
Zero centered—making it easier to model inputs that have strongly negative, neutral, and
strongly positive values.
Otherwise like the Sigmoid function.
Disadvantages
Like the Sigmoid function.
2.4.1.3. ReLU (Rectified Linear Unit)
Advantages
Computationally efficient—allows the network to converge very quickly
Non-linear—although it looks like a linear function, ReLU has a derivative function and
allows for backpropagation
Disadvantages
The dying ReLU problem—when inputs approach zero or are negative, the gradient of the
function becomes zero, so the network cannot perform backpropagation and cannot learn.
2.4.1.4. Leaky ReLU
Mathematically, Leaky ReLU is defined as f(x) = x for x > 0 and f(x) = αx otherwise, where α is
a small constant (for example 0.01).
Advantages
Prevents dying ReLU problem—this variation of ReLU has a small positive slope in the
negative area, so it does enable backpropagation, even for negative input values
Otherwise like ReLU.
Disadvantages
Results not consistent—leaky ReLU does not provide consistent predictions for negative
input values.
Figure 1. 7: Leaky ReLU function
2.4.1.5. Parametric ReLU
Advantages
Allows the negative slope to be learned—unlike leaky ReLU, this function provides the
slope of the negative part of the function as an argument. It is, therefore, possible to perform
backpropagation and learn the most appropriate value of α.
Otherwise like ReLU
Disadvantages
May perform differently for different problems.
2.4.1.6. Softmax
Advantages
Able to handle multiple classes, while other activation functions handle only one—normalizes
the output for each class between 0 and 1 and divides by their sum, giving the probability of the
input value belonging to a specific class.
Useful for output neurons—typically Softmax is used only for the output layer, for neural
networks that need to classify inputs into multiple categories.
2.4.1.7. Swish
Swish is a self-gated activation function, defined as f(x) = x · sigmoid(x), that often matches or
outperforms ReLU in deep networks.
Figure 1. 8: Swish function
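The activation functions discussed in this section can be written out in a few lines each, which makes their differences concrete; this is a plain-Python sketch for illustration only.

```python
import math

# The activation functions discussed above, in plain Python.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x  # small slope for negative inputs

def swish(x):
    return x * sigmoid(x)  # self-gated: the input gates itself

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]  # outputs sum to 1

print(relu(-2.0), leaky_relu(-2.0))  # 0.0 -0.02: only leaky ReLU lets
                                     # gradient flow for negative inputs
print(sigmoid(0.0))                  # 0.5
print(softmax([1.0, 1.0]))           # [0.5, 0.5]
```

Comparing `relu(-2.0)` with `leaky_relu(-2.0)` shows exactly why the leaky variant avoids the dying ReLU problem: its output (and hence its gradient) is nonzero for negative inputs.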
2.5. VGG16 – Convolutional Network for Classification and Detection
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman of
the University of Oxford in the paper "Very Deep Convolutional Networks for Large-Scale Image
Recognition". The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14
million images belonging to 1000 classes, and was one of the famous models submitted to
ILSVRC-2014. It improves on AlexNet by replacing large kernel-sized filters (11 and 5 in the first
and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after
another. VGG16 was trained for weeks using NVIDIA Titan Black GPUs [23].
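The published VGG16 layer configuration (thirteen 3×3 convolutional layers in five blocks, each block followed by 2×2 max pooling, then three fully connected layers) can be enumerated to count its parameters; doing so reproduces the well-known total of about 138 million. This is an illustrative sketch, not the project's training code.

```python
# VGG16 layer configuration: numbers are conv output channels (3x3
# kernels), 'M' marks a 2x2 max-pooling layer with no parameters.
conv_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
            512, 512, 512, 'M', 512, 512, 512, 'M']

params, in_ch = 0, 3  # RGB input has 3 channels
for v in conv_cfg:
    if v == 'M':
        continue  # pooling layers have no learnable parameters
    params += 3 * 3 * in_ch * v + v  # 3x3 kernel weights + biases
    in_ch = v

# After five 2x2 pools, a 224x224 input is reduced to 7x7x512; that
# volume feeds three fully connected layers (4096, 4096, 1000 units).
fc_sizes = [7 * 7 * 512, 4096, 4096, 1000]
for n_in, n_out in zip(fc_sizes, fc_sizes[1:]):
    params += n_in * n_out + n_out

print(params)  # 138357544, i.e. about 138 million parameters
```

Almost all of the parameters sit in the first fully connected layer (25088 × 4096), which is one reason transfer-learning setups like the one in this project usually replace the top of the network rather than retrain it from scratch.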
2.6. Transfer Learning
Transfer learning reuses a model developed for one task as the starting point for a model on a
related task. Its benefits include:
Less training data—training a model from scratch is a lot of work and requires a lot of data. For
example, if we want to create a new algorithm that can detect a frown, we need a lot of training
data: our model must first learn how to detect faces, and only then can it learn how to detect
expressions such as frowns. Instead, if we use a model that has already learned how to detect
faces and retrain it to detect frowns, we can accomplish the same result using far less data.
Makes deep learning more accessible—working with transfer learning makes it easier to
use deep learning. It’s possible to obtain the desired results without being an expert in
deep learning, by using a model that was created by a deep learning specialist and
applying it to a new problem.
Domain adaptation
In this approach, a dataset on which the machine was trained is different from (but still related
to) the target data set. A good example of this would be a spam email filtering model. Let’s say
this model was trained to identify spam email for user A. When the model is then used for user
B, domain adaptation is needed, because even though the task is the same (filtering emails),
user B receives different types of emails than user A.
Multitask learning
This method involves two or more tasks being resolved simultaneously so that similarities and
differences can be leveraged. It is based on the idea that a model that has been trained on a related
task can gain skills that improve its ability in the new task.
Going back to our spam email filtering model, let’s say this model is learning what features it
should look for when identifying spam mail for user A and user B. Because the users are very
different, the model needs to look for different features in order to identify each users’ spam
mail. For example, user A is an Italian speaker so an Italian language email should not be a red
flag. However, user B is a Chinese speaker, so an email in Italian might be considered a spam
feature. While simultaneously learning to identify spam features for user A and B, the model
learns that regardless of the language, emails requesting credit card details are more likely to be
spam.
Zero-shot learning
This technique involves a model trying to solve a task to which it was not exposed during
training. For example, let’s say we are training a model to identify animals in pictures. To
identify the animals, the machine is taught to identify two parameters: the color yellow and spots.
The model is then trained on multiple pictures of chicks, which it learns to identify because they
are yellow but have no spots, and dalmatians, which it knows have spots but are not yellow.
To expand on this example, you may not have pictures of giraffes on which to train the model,
but the model knows that giraffes are yellow and have spots. When the model encounters an
image of a giraffe, it will be able to identify it, even though it never saw a giraffe in training.
One-shot learning
This approach requires that a model learns how to categorize an object, after being exposed to it
either once or just a few times. To do this, the model leverages information it has about known
categories. For example, our animal classifying model knows how to identify a horse. The model
is then exposed to a single photo of a zebra, which looks much like a horse but has white and
black stripes. The model will then be able to classify zebras without being exposed to additional
pictures, because it transfers knowledge it already has about horses (missinglink, n.d.).
3. Methodology
3.1. Introduction
There are three main parts to the system. The first part is the client, built with Django, whose
interface is used to upload the image and see the predicted result. The second part is an SQLite
database, the default database for Django. The last part is the API, built with Flask, which runs
independently. Initially, the user uploads the chest X-ray image through the client, which saves
the data to the database and passes it to the Flask API; the API then analyzes the image and
predicts the output. In the API, the trained pneumonia classifier model is loaded with its target
size; the default target size for the VGG model is 224x224. The string-encoded image is decoded
into JPEG format, passed to preprocessing where it is resized and formatted, and finally passed
to the prediction algorithm. The prediction returns a value ranging from 0 to 1, where values near
0 mean pneumonia is not detected and values near 1 mean it is detected.
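The prediction flow just described can be sketched as follows. The `preprocess` placeholder, the `fake_model` stub, and the 0.5 decision threshold are illustrative assumptions, not the project's actual code; the real service decodes a string-encoded image into JPEG and runs the trained VGG16 classifier.

```python
# Sketch of the API's prediction flow: decode -> preprocess -> predict.

TARGET_SIZE = (224, 224)  # default input size for VGG models

def preprocess(image, target_size):
    """Placeholder for resizing/formatting the decoded X-ray image."""
    return image  # a real implementation would resize to target_size

def predict_pneumonia(image, model):
    x = preprocess(image, TARGET_SIZE)
    score = model(x)  # model returns a value in [0, 1]
    return "PNEUMONIA" if score >= 0.5 else "NORMAL"

# Stub standing in for the trained classifier, for illustration only.
def fake_model(x):
    return 0.93

print(predict_pneumonia("decoded-jpeg-bytes", fake_model))  # PNEUMONIA
```

Structuring the API this way keeps the model loading, preprocessing, and thresholding in one place, so the Django client only has to send an image and display the returned label.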
Regarding the model architecture, we take chest X-ray images, some with pneumonia and some
without, and divide them into three sets: training, test, and validation data, with 5216, 624, and
16 images respectively. We train our model using VGG16 with the weights provided by
ImageNet. Before training, we remove the top layer from VGG16 and add two activation
functions, ReLU and sigmoid. We train the model for 16 epochs, after which we obtained loss:
0.0703, accuracy: 0.9743, val_loss: 0.0493, val_acc: 1.0000.
To account for any grading errors, the evaluation set was also checked by a third expert (Kosovan,
2018): https://www.kaggle.com/kosovanolexandr/keras-nn-x-ray-predict-pneumonia-86-54
There are various ways to train a model. We have used a VGG16 model whose output classes we have
customized for prediction on our dataset. The dataset is divided into roughly 85% of the data for
training and 15% for testing. The dataset is divided into three groups (training, testing and
validation), where the model is trained only on the training dataset, keeping the test dataset
untouched, and is then evaluated on the test dataset.
3.4. Software development life-cycle
3.5. Tools and technique
Python
MySQL
Django
Django is a Python-based free and open-source web framework which follows the model-template-view
(MTV) architectural pattern. It is maintained by the Django Software Foundation (DSF), an
independent organization. Django's primary goal is to ease the creation of complex,
database-driven websites. The framework emphasizes reusability and "pluggability" of
components, less code, low coupling, rapid development, and the principle of don't repeat yourself.
Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows you to create and share
documents that contain live code, equations, visualizations and narrative text. Uses include data
cleaning and transformation, numerical simulation, statistical modeling, data visualization,
machine learning, and much more.
OpenCV
Keras
TensorFlow
TensorFlow is a free and open-source software library for dataflow and differentiable
programming across a range of tasks. It is a symbolic math library, and is also used for machine
learning applications such as neural networks. It is used for both research and production at
Google.
Flask
CNN
A CNN is basically an Artificial Neural Network (ANN) with additional convolutional layers at
the input. The architecture varies from model to model, and it is this difference that causes the
difference in model performance.
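The core operation those extra layers perform can be illustrated with a minimal NumPy sketch: slide a small kernel over a 2-D input and sum the elementwise products at each position (as in most deep learning libraries, this is technically cross-correlation). The function name is ours, for illustration only.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Apply a 2-D kernel to a 2-D input with 'valid' padding and
    stride 1 -- the operation a convolutional layer adds on top of a
    plain ANN."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # sum of elementwise products over the current window
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out
```

With a simple vertical-edge kernel such as [[1, -1], [1, -1]], the output responds strongly only where pixel intensity changes between adjacent columns, which is how early convolutional layers pick up edges in an X-ray image.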
Confusion matrix
4. System Design
4.2. API Architecture
4.3. Customized VGG trained model architecture
4.4. Use Case Diagram
The use case model for any system consists of 'use cases'. Use cases represent the different ways
in which the system can be used by the user. A simple way to find all the use cases of a system is
to ask the question "What can the user do using the system?". The use cases partition the system
behavior into transactions such that each transaction performs some useful action from the user's
point of view.
The purpose of a use case is to define a piece of coherent behavior without revealing the internal
structure of the system. A use case typically represents a sequence of interactions between the
user and the system.
[Use case diagram: actors Admin, User and Guest; use cases: signup, login, upload_X-ray_image,
view_result, view_history, configure_system, display_result, manage_user]
The use case diagram above has three actors: admin, user and guest. Admin saves newly registered
users, verifies users that log in, saves the images uploaded by users or guests and displays the
result to the user. A user uses this system to detect pneumonia in an X-ray image. If a user is new
to the system, he/she first registers and then logs in. After logging in, the user uploads the
X-ray image and views the result. A guest uses the system only to upload an X-ray image and view
the result.
4.5. Class Diagram
[Class diagram: each user (username:varchar, email_id:varchar, password:varchar; register(),
login()) has exactly one (1..1) account (id:varchar); user uploads X-ray images]
The class diagram above shows the relationships between the classes. Here each user has an account
with a unique id. A new user registers to the system and then logs in. Users and guests upload
X-ray images. The uploaded X-ray image is passed to prediction, which processes the image and
displays the result.
4.6. Sequence Diagram
A sequence diagram simply depicts interaction between objects in a sequential order i.e. the order
in which these interactions take place. We can also use the terms event diagrams or event scenarios
to refer to a sequence diagram. Sequence diagrams describe how and in what order the objects in
a system function [26].
[Sequence diagram messages: register(), login(), upload image(), feature extraction(),
extracted feature(), predicted result(), result()]
The sequence diagram above shows the interaction between the different objects in the project.
When a user visits the website, he/she registers to the system and then logs in. The system
application saves newly registered users and verifies logged-in users. The user uploads an image
to the system, the image is stored in the database, and features are extracted from it. The
extracted features are sent to the trained neural network, which predicts pneumonia and passes
the result to the system application.
5. Result and Discussion
ModelCheckpoint: when training requires a lot of time to achieve a good result, many iterations
are often required. In this case, it is better to save a copy of the best-performing model only
when an epoch that improves the metrics ends.
EarlyStopping: sometimes during training we notice that the generalization gap (i.e. the
difference between training and validation error) starts to increase instead of decreasing. This
is a symptom of overfitting that can be addressed in many ways (reducing model capacity,
increasing training data, data augmentation, regularization, dropout, etc.). Often a practical
and efficient solution is to stop training when the generalization gap starts getting worse
(Medium, 2019).
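The behavior of these two callbacks can be sketched in plain Python, independently of Keras: track the best validation loss per epoch (the point where a checkpoint with save_best_only would write the model) and stop once the loss has failed to improve for a set number of epochs. The function and parameter names here are ours, for illustration only.

```python
def train_with_callbacks(val_losses, patience=2):
    """Illustrative re-implementation of ModelCheckpoint + EarlyStopping:
    walk through per-epoch validation losses, remember the best epoch
    (where the model would be checkpointed), and stop once the loss has
    failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss  # checkpoint: save model weights here
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # early stop: generalization gap keeps widening
    return best_epoch, best_loss
```

For example, with per-epoch validation losses [0.9, 0.5, 0.6, 0.7, 0.4] and patience 2, training stops after the third non-improving epoch and the epoch-1 weights (loss 0.5) are the ones kept.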
For feature extraction, a pre-trained VGG is used. The output layer is removed from the
pre-trained model because the default output layer has 1000 classes for image classification,
which we do not require. The entire network is used as a fixed feature extractor for the new
dataset.
5.6. Validation
Since class scores are calculated, validation is done with a confusion matrix over all class
scores. Predicted labels and ground-truth values are used to calculate recall, precision, F1
score and F beta score. These are calculated for every epoch whose validation loss is below the
previous best: we save the lowest validation loss in a variable, and if any subsequent validation
loss is lower than the saved value, then recall, precision, F1 score and F beta score are
calculated. The following are the formulas, in terms of True Positive, True Negative, False
Positive and False Negative, for calculating Recall, Precision, F1 score and F beta score.
Recall (r) = TruePositive / (TruePositive + FalseNegative)

Precision (p) = TruePositive / (TruePositive + FalsePositive)

F1 score = 2 * p * r / (p + r)

F beta score = (1 + β²) * p * r / (β² * p + r)
The beta parameter determines the weight of recall in the combined score. Beta < 1 lends more
weight to precision, while beta > 1 favors recall (beta = 0 considers only precision, beta = ∞
only recall).
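These formulas translate directly into code. The sketch below uses the standard F-beta definition with beta squared, under which beta = 1 recovers the F1 score; the function names are ours, for illustration only.

```python
def recall(tp, fn):
    # fraction of actual positives the model found
    return tp / (tp + fn)

def precision(tp, fp):
    # fraction of predicted positives that were correct
    return tp / (tp + fp)

def f_beta(p, r, beta=1.0):
    # beta = 1 gives the usual F1 score: 2*p*r / (p + r)
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

As a sanity check, plugging the precision (88.36) and recall (95.38) reported in section 5.7 into f_beta with beta = 1 gives approximately 91.7, consistent with the reported F1 score of 91.73.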
Figure: Confusion Matrix for test data.
5.7. Verification
Every 10 epochs, the best model (i.e. the one with the lowest validation loss) is saved along with
its recall, precision and F1 score. The final test results of the model are: accuracy 89.26%,
precision 88.36%, recall 95.38% and F1 score 91.73%, as shown in the figure below.
6. Bibliography
13. P. Lakhani and B. Sundaram. (2017, April 4). Retrieved from
https://pubs.rsna.org/doi/full/10.1148/radiol.2017162326
14. Benjamin Antin, J. K. (2017). Retrieved from http://cs229.stanford.edu/proj2017/final-
reports/5231221.pdf
15. Ma’aitah, R. H. (2018, Aug 1). Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6093039/
16. An Efficient Deep Learning Approach to Pneumonia Classification in Healthcare.
(2019). Retrieved from
https://www.researchgate.net/publication/332049903_An_Efficient_Deep_Learning_App
roach_to_Pneumonia_Classification_in_Healthcare
17. (n.d.). Retrieved from techtarget:
https://searchenterpriseai.techtarget.com/definition/supervised-learning
Appendix
Code:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
from os import listdir, makedirs
from os.path import join, exists, expanduser
from keras import applications
from keras.preprocessing.image import ImageDataGenerator
from keras import optimizers
from keras.models import Sequential, Model, load_model
from keras.layers import Dense, GlobalAveragePooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
import tensorflow as tf
from keras.utils.data_utils import Sequence
import sys
from PIL import *
sys.modules['Image'] = Image
print(os.listdir("../input/chest-xray-pneumonia/chest_xray/chest_xray/test"))
img_width, img_height = 224, 224
train_data = '../input/chest-xray-pneumonia/chest_xray/chest_xray/train'
test_data = '../input/chest-xray-pneumonia/chest_xray/chest_xray/test'
val_data = '../input/chest-xray-pneumonia/chest_xray/chest_xray/val'
train_datagen = ImageDataGenerator(
    rescale=1. / 255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
# test/validation images are only rescaled, not augmented
test_datagen = ImageDataGenerator(rescale=1. / 255)
val_datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = train_datagen.flow_from_directory(
    train_data,
    target_size=(224, 224),
    batch_size=16,
    class_mode='categorical')
test_generator = test_datagen.flow_from_directory(
    test_data,
    target_size=(224, 224),
    batch_size=16,
    class_mode='categorical')
validation_generator = val_datagen.flow_from_directory(
    val_data,
    target_size=(224, 224),
    batch_size=16,
    class_mode='categorical')
# import ResNet50 with pre-trained weights; do not include fully connected layers
from keras.applications.resnet50 import ResNet50
model = ResNet50(weights='imagenet', include_top=False)
result = model.output
result = GlobalAveragePooling2D()(result)
# and a fully connected output/classification layer
predictions = Dense(2, activation='sigmoid')(result)
inception_transfer = Model(inputs=model.input, outputs=predictions)
inception_transfer.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
metrics=['accuracy'])
import tensorflow as tf
with tf.device("/device:GPU:0"):
    history_pretrained = inception_transfer.fit_generator(
        train_generator,  # train on the training set, not the test set
        steps_per_epoch=len(train_generator),
        epochs=16,
        shuffle=True,
        verbose=1,
        validation_data=test_generator,
        validation_steps=1,
        use_multiprocessing=True,
    )
import matplotlib.pyplot as plt
# summarize history for accuracy
plt.plot(history_pretrained.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['Pretrained'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history_pretrained.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Pretrained'], loc='upper left')
plt.show()
from keras.applications.vgg16 import VGG16
model = VGG16(weights='imagenet', include_top=False)
result = model.output
result = GlobalAveragePooling2D()(result)
# add a fully-connected layer
result = Dense(512, activation='relu')(result)
# and a fully connected output/classification layer
predictions = Dense(2, activation='sigmoid')(result)
inception_transfer = Model(inputs=model.input, outputs=predictions)
inception_transfer.compile(loss='categorical_crossentropy',
optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
metrics=['accuracy'])
with tf.device("/device:GPU:0"):
    history_pretrained = inception_transfer.fit_generator(
        train_generator,
        steps_per_epoch=len(train_generator),
        epochs=16,
        shuffle=True,
        verbose=1,
        validation_data=test_generator,
        validation_steps=1,
        use_multiprocessing=True,
    )
import matplotlib.pyplot as plt
# summarize history for accuracy
plt.plot(history_pretrained.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['Pretrained'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history_pretrained.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Pretrained'], loc='upper left')
plt.show()
scores = inception_transfer.evaluate_generator(test_generator,steps=1)
print('acc =',scores[1]*100)
inception_transfer.save('new_pneuomonia_model.h5')
from flask import Flask, request, jsonify, render_template
from keras.models import load_model
from keras.preprocessing.image import img_to_array
from PIL import Image
import numpy as np
import base64
import io

app = Flask(__name__)
app.config["DEBUG"] = True

def get_model():
    global model
    model = load_model('m.h5')
    print("* MODEL LOADED !")

def target_size():
    return 224, 224

def preprocess_image(image):
    image = image.resize(target_size())
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    return image

@app.route('/predict', methods=['POST'])
def predict():
    message = request.get_json(force=True)
    encoded = message['image']
    decoded = base64.b64decode(encoded)
    image = Image.open(io.BytesIO(decoded))
    processed_image = preprocess_image(image)
    prediction = model.predict(processed_image).tolist()
    print(prediction)
    # class indices follow flow_from_directory's alphabetical directory
    # order (NORMAL = 0, PNEUMONIA = 1)
    response = {
        'prediction': {
            'Normal': prediction[0][0],
            'Pneumonia': prediction[0][1]
        }
    }
    return jsonify(response)

@app.route('/', methods=['GET'])
def home():
    return render_template("main.html")

get_model()
app.run()