
Project Report

On

“Brain Cancer Classification using MRI Data”

Submitted In Fulfilment of the Course


Study Oriented Project – CSF266

Submitted by
Shanmukh Chandra Yama – 2019A7PS0028P

Under the esteemed guidance and supervision of

Mrs. L. Rajya Lakshmi


Faculty, Department of Computer Science & Information
Systems Birla Institute of Technology & Science, PILANI,
333031 Rajasthan, INDIA

(15th December 2021)

Birla Institute of Technology and Science, Pilani


2021

Acknowledgement

I would like to gratefully acknowledge the excellent guidance provided
by my guide and supervisor, Mrs. L. Rajya Lakshmi, without whom this
project would not have been possible. Her help has been key to the
success of this project, and I would like to express my immense
gratitude towards her. I would also like to thank her for giving me
this opportunity, through which I learnt many new things over the
course of this exploration and journey.

I would also like to extend my gratitude towards well-known data
scientists like Mohamed Elgendy and Andrew Ng, and YouTubers like
Sreenivas Bhattiprolu, Aman, and Kanishk Naik, who helped me
understand the concepts and aroused my interest in exploring data
science further, which helped in completing this project.

Sincerely,
Shanmukh Chandra Yama,
2019A7PS0028P.

Table of Contents:

1. Introduction and Background
   • Cancer and Brain Tumor
   • Computer-Aided Diagnosis and the Need for Deep Learning
   • Convolutional Neural Networks
   • ResNet
   • Separable Convolutions
   • MobileNet
   • Transfer Learning
2. Approach at a Glance
3. Problem Statement
4. Data
   • Data at a Glance
   • Data Organization, Pre-processing and Splitting
   • Data Augmentation and Normalization
5. Model-1 (Customised Model) Building
6. Need for Transfer Learning and Choice of Pre-trained Model
7. Model-2 (Using Transfer Learning on MobileNet)
   • Introduction and Training
   • Compilation
8. Future Works
9. Conclusion and Contributions
10. Data Set
11. References

Background and Introduction:

Cancer is one of the deadliest diseases in the world, and one for which a complete cure has not
yet been discovered. Cancer destroys cells slowly over the course of time. Depending on the place of
origin of the cancerous cells, there are many kinds of cancer: lung cancer, breast cancer, brain
tumour, skin cancer, blood cancer, and so on. Cancer is an alarming issue that must be addressed
for the sake of protecting mankind from this deadly disease. Statistics given by the IARC
(International Agency for Research on Cancer) of the WHO reveal that cancer cases increased by 28%
between 2006 and 2016, and that 2.7 million new cancer cases are expected to emerge in 2030. The
increasing numbers may be attributed to changes in our food or to the lifestyles we are
cultivating. The reasons are many, and it is always difficult to identify the cause of a cancer.
There is a high chance of lung cancer or liver damage if someone consumes a lot of alcohol or
tobacco, but for a few diseases it is difficult to find the root cause, and cancer belongs to that
very nest: it is generally difficult to predict or realize that someone has cancer in its early
stages. For most people, considerable damage has already been done by the time they sense pain or
get diagnosed. It is very difficult for surgeons and doctors to rescue someone from cancer once it
has intensified; therefore, early detection is necessary to mitigate the danger it can cause and to
save lives.

A brain tumor is a kind of cancer prevalent in both children and adults. 80 to 95 percent of
central nervous system tumors are brain tumors, and on average 11,700 cases are reported annually.
The chance of survival is low: some statistics say that over the past 5 years only 35% of men and
37% of women diagnosed have survived. Brain tumors have therefore become a really important and
relevant problem in the contemporary world, and the necessity of early identification has opened
many doors for research in this area.

Cancer is essentially an uncontrolled growth of cells; there is a point of origin from which the
cancer spreads, and much of the job is done once that point of origin is identified. The most
common way of diagnosing a brain tumor is through Magnetic Resonance Imaging (MRI). MRI analysis is
tedious and tough, and has to be done meticulously: the task demands both time and a good amount of
knowledge of radiology, and the outcome also depends on the experience of the radiologist.
Therefore, computer-aided analysis of these MRI images is a good direction for research, one that
could serve as a second opinion for doctors.

Computer-aided analysis and the need for deep learning

Computer-aided analysis sounds appealing, but how does a computer actually diagnose anything? There
are many challenges involved in dealing with MRIs. First, these brain tumor images are
fine-grained, high-resolution images that depict rich geometric structures and complex textures;
high variability within a class and similarity between classes can make classification a tough
problem. The answer to the question of how a computer can learn to diagnose cancer is machine
learning, and machine learning needs data in order to give answers. The second challenge is the
large variation in the sizes and locations of brain tumors. Another challenge is the limitation of
hand-crafted feature extraction methods for brain tumor MRIs: traditional machine learning
techniques rely on supervised information, and we need prior knowledge of the data in order to
select useful features. This makes feature extraction inefficient and the computational load high,
and the final extracted features are only low-level, unrepresentative features of the MRI images,
which can lead to a final model that generates poor classification results.

The solution to this problem can be fetched from deep learning techniques, which automatically
extract features, retrieve information from data, and learn advanced abstract representations of
the data. They can overcome the problems of conventional feature extraction and have been
successfully applied in computer vision, biomedical science, and other fields.

What is deep learning?


Deep learning can be considered a subset of machine learning. It is a field based on learning
and improving on its own by examining computer algorithms. While machine learning uses simpler
concepts, deep learning works with artificial neural networks, which are designed to imitate how
humans think and learn. Until recently, neural networks were limited by computing power and thus
were limited in complexity. However, advancements in big data analytics have permitted larger, more
sophisticated neural networks, allowing computers to observe, learn, and react to complex
situations faster than humans. Deep learning has aided image classification, language translation,
and speech recognition; it can be used to solve almost any pattern recognition problem, without
human intervention.

Artificial neural networks, comprising many layers, drive deep learning. Deep Neural Networks (DNNs)
are such types of networks where each layer can perform complex operations such as representation
and abstraction that make sense of images, sound, and text. Considered the fastest-growing field in
machine learning, deep learning represents a truly disruptive digital technology, and it is being used by
increasingly more companies to create new business models.

Now that we have understood what deep learning is, let's look at how it works.

Neural networks are layers of nodes, much like the human brain is made up of neurons. Nodes within
individual layers are connected to adjacent layers. The network is said to be deeper based on the
number of layers it has. A single neuron in the human brain receives thousands of signals from other
neurons. In an artificial neural network, signals travel between nodes and assign corresponding weights.
A heavier weighted node will exert more effect on the next layer of nodes. The final layer compiles the
weighted inputs to produce an output. Deep learning systems require powerful hardware because they
process large amounts of data and involve several complex mathematical calculations. Even with such
advanced hardware, however, deep learning training computations can take weeks.

What are Convolutional Neural Networks?

Regular neural networks contain multiple layers that allow each layer to find successively complex
features, and this is the way CNNs work. The first layer of convolutions learns some basic features (edges
and lines), the next layer learns features that are a little more complex (circles, squares, and so on), the
following layer finds even more complex features (like parts of the face, a car wheel, dog whiskers, and
the like), and so on. You will see this demonstrated shortly. For now, know that the CNN
architecture follows the same pattern as other neural networks: we stack neurons in hidden layers
on top of each other; weights are randomly initialized and learned during network training; and we
apply activation functions, calculate the error (y − ŷ), and backpropagate the error to update the
weights. This process is the same. The difference is that we use convolutional layers instead of
regular fully connected layers for the feature learning.

The objective of the convolution operation is to extract high-level features such as edges from the
input image. ConvNets need not be limited to only one convolutional layer. Conventionally, the
first ConvLayer is responsible for capturing low-level features such as edges, color, and gradient
orientation. With added layers, the architecture adapts to the high-level features as well, giving
us a network with a wholesome understanding of the images in the dataset, similar to our own. There
are two types of results of the operation: one in which the convolved feature is reduced in
dimensionality compared to the input, and one in which the dimensionality is either increased or
stays the same. This is done by applying valid padding in the former case, or same padding in the
latter.

Similar to the convolutional layer, the pooling layer is responsible for reducing the spatial size
of the convolved feature. This decreases the computational power required to process the data
through dimensionality reduction. Furthermore, it is useful for extracting dominant features that
are rotationally and positionally invariant, which keeps training of the model effective.
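
For instance, here is a minimal sketch (TensorFlow/Keras assumed) showing how valid padding, same
padding, and max pooling change a feature map's spatial size:

import tensorflow as tf

x = tf.random.normal((1, 224, 224, 3))           # one RGB image

valid = tf.keras.layers.Conv2D(16, (3, 3), padding="valid")(x)
same = tf.keras.layers.Conv2D(16, (3, 3), padding="same")(x)
pooled = tf.keras.layers.MaxPooling2D((2, 2))(same)

print(valid.shape)   # (1, 222, 222, 16) -- valid padding shrinks the map
print(same.shape)    # (1, 224, 224, 16) -- same padding preserves the size
print(pooled.shape)  # (1, 112, 112, 16) -- pooling halves each spatial dimension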

CNN classification - fully connected layer

The convolutional layer and the pooling layer together form the i-th layer of a convolutional
neural network. Depending on the complexity of the images, the number of such layers may be
increased to capture low-level details even further, but at the cost of more computational power.

The final output is flattened and fed to a regular neural network for classification purposes.
Adding a fully connected layer (above figure) is the usual way of learning possibly non-linear
combinations of the high-level features represented by the output of the convolutional layers. Now
that the input image has been converted into a suitable form, we flatten it into a column vector.
The flattened output is fed to a feed-forward neural network, and backpropagation is applied in
every iteration of training. Over a series of epochs, the model learns to distinguish between
dominant and certain low-level features in images and to classify them, using the softmax
classification technique.

ResNet:
The Residual Neural Network (ResNet) was developed in 2015 by a group from the Microsoft Research
team. They introduced a novel residual module architecture with skip connections. The network also
features heavy batch normalization in the hidden layers. This technique allowed the team to train
very deep neural networks with 50, 101, and 152 weight layers while still having lower complexity
than smaller networks like VGGNet (19 layers). ResNet achieved a top-5 error rate of 3.57% in the
ILSVRC 2015 competition, which beat the performance of all prior ConvNets.
Looking at how neural network architectures evolved from LeNet, AlexNet, VGGNet, and Inception, you
might have noticed that the deeper the network, the larger its learning capacity, and the better it
extracts features from images. This mainly happens because very deep networks are able to represent
very complex functions, which allows the network to learn features at many different levels of
abstraction, from edges (at the lower layers) to very complex features (at the deeper layers). We
have seen that deep neural networks like VGGNet-19 (19 layers) and GoogLeNet (22 layers) both
performed very well in the ImageNet challenge. But can we build even deeper networks? One downside
of adding too many layers is that doing so makes the network more prone to overfitting the training
data. This is not a major problem, because we can use regularization techniques like dropout, L2
regularization, and batch normalization to avoid overfitting. So, if we can take care of the
overfitting problem, wouldn't we want to build networks that are 50, 100, or even 150 layers deep?
The answer is yes: we should definitely try to build very deep neural networks. We need to fix just
one other problem to unlock the capability of building super-deep networks: a phenomenon called
vanishing gradients.
To solve the vanishing gradient problem, He et al. created a shortcut that allows the gradient to
be directly backpropagated to earlier layers. These shortcuts are called skip connections: they are
used to pass information from earlier layers in the network to later layers, creating an alternate
shortcut path for the gradient to flow through. Another important benefit of skip connections is
that they allow the model to learn an identity function, which ensures that a layer will perform at
least as well as the previous layer.
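
As an illustration, here is a minimal sketch (TensorFlow/Keras assumed; a plain identity-shortcut
block, not ResNet's exact bottleneck design) of a residual block with a skip connection:

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                        # the skip connection
    y = layers.Conv2D(filters, (3, 3), padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, (3, 3), padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                     # gradient can flow straight through
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(inputs, 64)
model = tf.keras.Model(inputs, outputs)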

So far we have discussed standard convolutions. There is another type of convolution, known as
separable convolutions.
Separable Convolutions
There are two sorts of separable convolutions:
1) Spatial separable convolutions
2) Depthwise separable convolutions

Spatial separable convolutions:
Spatial separable convolutions deal with the spatial dimensions of the image (width and height). A
kernel is divided into two smaller kernels; for example, a 3x3 kernel is divided into a 3x1 and a
1x3 kernel.

Here, instead of doing one convolution with 9 multiplications, we can do two convolutions with 3
multiplications each (6 in total) to achieve the same effect. With fewer multiplications,
computational complexity goes down and the network is able to run faster.

One famous kernel used to detect edges, the Sobel kernel, can also be separated spatially.

Although spatial separable convolutions need less computational power, not every kernel can be
separated into two smaller kernels, which is one of the cons of spatial separable convolutions.
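
A minimal numpy sketch of this factorization, using the Sobel kernel mentioned above:

import numpy as np

col = np.array([[1], [2], [1]])        # 3x1 smoothing kernel
row = np.array([[1, 0, -1]])           # 1x3 differencing kernel

sobel = col @ row                      # the outer product rebuilds the full 3x3 Sobel kernel
print(sobel)
# [[ 1  0 -1]
#  [ 2  0 -2]
#  [ 1  0 -1]]

Convolving with col and then row in sequence is equivalent to convolving once with the full 3x3
kernel, but uses 6 multiplications per position instead of 9.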

Depthwise Separable Convolutions:

Depthwise separable convolutions are what the MobileNet architecture is based on. Unlike spatial
separable convolutions, they also work with kernels that cannot be factored into two smaller
kernels, and they deal with the depth dimension in addition to the spatial dimensions.

A depthwise separable convolution is a factorized convolution that splits a standard convolution
into a depthwise convolution and a convolution called a pointwise convolution; that is, it splits
the kernel into two separate kernels, one for filtering and one for combining. The depthwise
convolution is used for filtering, whereas the pointwise convolution is used for combining.

Using depthwise separable convolutions, the total computation required for the operation is the sum
of the depthwise convolution and pointwise convolution costs:

    D_K · D_K · M · D_F · D_F + M · N · D_F · D_F

For a standard convolution, the total computation is:

    D_K · D_K · M · N · D_F · D_F

where the computational cost depends on the number of input channels M, the number of output
channels N, the kernel size D_K, and the feature map size D_F.

By expressing convolution as a two-step process of filtering and combining, the total reduction in
computation is:

    (D_K · D_K · M · D_F · D_F + M · N · D_F · D_F) / (D_K · D_K · M · N · D_F · D_F) = 1/N + 1/(D_K)²

This means that with a 3x3 kernel (D_K = 3), the computational cost can be reduced by a factor of 8
to 9. This is the benefit of using depthwise separable convolutions. All of this was introduced for
the sake of understanding the MobileNet architecture, which is the one we used for transfer
learning.
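
As a concrete check of this saving, here is a minimal sketch (TensorFlow/Keras assumed; the channel
counts are illustrative) comparing the weight counts of a standard convolution and a depthwise
separable convolution for M = 32 input channels, N = 64 output channels, and a 3x3 kernel:

import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(56, 56, 32))

standard = layers.Conv2D(64, (3, 3), use_bias=False)(inp)
separable = layers.SeparableConv2D(64, (3, 3), use_bias=False)(inp)

# Standard:  3*3*32*64          = 18,432 weights
# Separable: 3*3*32 + 1*1*32*64 =  2,336 weights
tf.keras.Model(inp, [standard, separable]).summary()

The separable layer uses 2,336 weights against 18,432, a ratio of about 0.127, matching
1/N + 1/(D_K)² = 1/64 + 1/9.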

MobileNet Architecture:

As mentioned above, MobileNet is built on depthwise separable convolutions, except for the first
layer, which is a full convolutional layer. All layers are followed by batch normalization and a
ReLU non-linearity, except the final layer, which is a fully connected layer without any
non-linearity that feeds into the softmax for classification. For downsampling, strided convolution
is used both in the depthwise convolutions and in the first full convolutional layer. Counting
depthwise and pointwise convolutions as separate layers, MobileNet has 28 layers in total.

The MobileNet model is helpful for us because it is computationally cheap and does not require a
GPU. A width multiplier can be applied to any model structure to define a new, smaller model with a
reasonable trade-off between accuracy, latency, and size. It is used to define a reduced structure
that needs to be trained from scratch.
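
As an illustrative sketch (assuming the Keras applications API, where the alpha argument is this
width multiplier), a thinned MobileNet can be instantiated like so:

from tensorflow.keras.applications import MobileNet

full = MobileNet(input_shape=(224, 224, 3), weights=None)             # alpha=1.0, full width
thin = MobileNet(input_shape=(224, 224, 3), weights=None, alpha=0.5)  # half the channels per layer

print(full.count_params(), thin.count_params())  # the thin model has far fewer weights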

Transfer Learning:
What is transfer learning?
Transfer learning is the transfer of the knowledge (feature maps) that a network has acquired on
one task, where we have a large amount of data, to a new task where data is not abundantly
available. It is generally used where a neural network model is first trained on a problem similar
to the problem being solved; one or more layers from the trained model are then used in a new model
trained on the problem of interest.

The intuition behind transfer learning is that if a model is trained on a large and general enough dataset,
this model will effectively serve as a generic representation of the visual world. We can then use the
feature maps it has learned, without having to train on a large dataset, by transferring what it learned to
our model and using that as a base starting model for our own task.

In transfer learning, we first train a base network on a base dataset and task, and then we repurpose the
learned features, or transfer them to a second target network to be trained on a target dataset and task.
This process will tend to work if the features are general, meaning suitable to both base and target tasks,
instead of specific to the base task.

A pretrained model is a network that has been previously trained on a large dataset, typically on a large-
scale image classification task. We can either use the pretrained model directly as is to run our
predictions, or use the pretrained feature extraction part of the network and add our own classifier. The
classifier here could be one or more dense layers or even traditional
ML algorithms like support vector machines (SVMs).

By now we have developed the background in deep learning and the skills that I used to implement
the models.
Approach:
The approach I followed in implementing the models is simple: first I used a plain CNN and tuned
its hyperparameters; later I took the help of transfer learning, applied the MobileNet
architecture, and combined it with the initially prepared CNN model. That is the idea of the
approach; now let's step into the battleground.

[Figure: An example of transfer learning with the VGG16 network. We freeze the feature-extraction
part of the network and remove the classifier part, then add a new softmax classifier layer with
two hidden units.]
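
A minimal sketch (Keras applications assumed) of the VGG16 example in the figure:

import tensorflow as tf
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                                   # freeze the feature extractor

x = tf.keras.layers.Flatten()(base.output)
out = tf.keras.layers.Dense(2, activation="softmax")(x)  # new 2-unit softmax classifier
model = tf.keras.Model(base.input, out)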

PROBLEM STATEMENT: Brain Tumor Detection using MRI Images.
DATA: The data has been collected from Kaggle; it is the Br35H: Brain Tumor Detection 2020 dataset.
This dataset consists of 3060 brain MRI images, of which 1500 are tumorous and 1560 are
non-tumorous.
[Figure: Sample MRI images from the dataset. The first two images contain tumors; the other two are
images of healthy patients.]

Data Organization:
There is a parent folder called brain tumor, inside which there is a folder called data set. In
this folder there are two folders named "yes" and "no", which contain the MRI images of the
respective classes: the yes folder contains the MRIs that have a tumor in them, and the no folder
contains images of healthy brains. This way of organizing the data has a reason, which will be
discussed a little later in the report.
Data splitting: The data is split into three parts: a train dataset, a test dataset, and a
validation dataset, in the common 70:15:15 ratio. When computing the split counts, the percentages
were rounded down; the images left over as a remainder stayed in the original folder.
For convenience, we created new folders named train, val, and test. With the help of the os and
shutil libraries of Python, we transferred 70% of the images to the train folder and 15% each to
the val and test folders. Each of these folders is organized like the data folder in the parent
folder: all three contain yes and no folders holding images in the ratios specified above. After
all these transfers, 13 images were left in the original parent folder; these were kept completely
aside as unseen data.
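
A minimal sketch of this splitting step (the folder names are hypothetical, matching the
description above; os and shutil as mentioned):

import os
import shutil

SRC = "brain tumor/data set"                 # contains the "yes" and "no" folders
DST = "brain tumor"

for cls in ("yes", "no"):
    files = sorted(os.listdir(os.path.join(SRC, cls)))
    n = len(files)
    counts = {"train": int(0.70 * n), "val": int(0.15 * n), "test": int(0.15 * n)}
    start = 0
    for split, cnt in counts.items():
        out_dir = os.path.join(DST, split, cls)
        os.makedirs(out_dir, exist_ok=True)
        for f in files[start:start + cnt]:
            shutil.move(os.path.join(SRC, cls, f), os.path.join(out_dir, f))
        start += cnt
    # leftover files (the remainder after rounding down) stay in SRC/cls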
Data Pre-processing:
Since this is deep learning, there is not much work in pre-processing the data, but a few things
are required for good results. I used the ImageDataGenerator class for this job.
We have only about 3,000 images at hand; although it is difficult to obtain large amounts of MRI
data, this is comparatively not a large number. Therefore, for the train data, the data
augmentation technique was used. One issue with the data was that every image in the dataset has a
different size, so a universal size had to be decided on so that we could build a good model that
accepts images of that size only. Another issue was the pixel values: every pixel has a different
value, all in the range 0 to 255. In order to make our images compatible with the models, we have
to normalize them.
All of the above problems can be solved with a single solution: the ImageDataGenerator class. It
helps us perform image augmentation and normalization in a very easy way, and its
flow_from_directory method eases our job by directly converting the images in the directories into
the desired form. The data organization mentioned before is necessary in order to apply this
method.
There are subtle differences in the pre-processing of the train dataset versus the test and
validation datasets: I applied the data augmentation technique only to the train data, not to the
test and validation datasets. The reason was to preserve the originality of the images, which helps
us evaluate the final model.

Data augmentation parameters used and normalization techniques:
For the train data, the images were zoomed, rotated, flipped, and translated horizontally and
vertically. Let us briefly understand the need for data augmentation. Suppose there is a cat in a
picture: if we train our model only on upright cat pictures, it may not recognize the cat if the
picture is accidentally fed in upside down. Our brain is clever enough to identify both pictures as
the same, but for a neural network it is all a game of pixels, so it is good to train on augmented
data as well.
We normalized the pixel values by dividing all of them by 255, and resized all the pictures to
(224, 224, 3), where the 3 corresponds to the RGB channels.
For the test and validation datasets, I applied only the normalization and resizing discussed
above. With this, the pre-processing is complete.
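
A minimal sketch of this pipeline (Keras assumed; the augmentation values are illustrative, as the
report does not list exact parameters):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,          # normalize pixel values to [0, 1]
    zoom_range=0.2,             # random zoom
    rotation_range=15,          # random rotation
    width_shift_range=0.1,      # horizontal translation
    height_shift_range=0.1,     # vertical translation
    horizontal_flip=True,
    vertical_flip=True,
)
plain_gen = ImageDataGenerator(rescale=1.0 / 255)   # no augmentation for val/test

train_data = train_gen.flow_from_directory(
    "brain tumor/train", target_size=(224, 224), class_mode="binary", batch_size=32)
val_data = plain_gen.flow_from_directory(
    "brain tumor/val", target_size=(224, 224), class_mode="binary", batch_size=32)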
Model-1 Building:
First and foremost, we are dealing with images, so it is common sense to pick a CNN architecture,
because the CNN architecture preserves and captures the spatial features hidden in the distribution
of pixel values of the images.
Model building is the heart of the problem. I used Keras from TensorFlow to build the CNN model,
starting with a basic sequential CNN.
The first layer, which takes the input, is a convolutional layer. It has 16 filters and a
kernel_size of (3,3), which is the size of our convolution filter; (3,3) is the generally used
value in current industry practice. Then, in order to introduce some non-linearity, an activation
function is needed; the activation function I used for the convolutional layers is ReLU. One thing
to note is that for this first convolutional layer we have to specify the input size.
Next in the sequence I added another convolutional layer, differing only in the number of filters:
this layer has 36. After it, pooling is done with a pool size of (2,2); this pooling layer helps
prevent overfitting the model. In earlier days average pooling was used, but the community now
mostly uses the max pooling layer.
Similarly, a couple more convolutional layers with pooling were added, with 64 and 128 filters in
sequence. This was all done by trial and error: at first I used only 2 convolutional layers
connected directly to the final fully connected layers, but the model mostly underfit and no
hyperparameter tuning was working, so 2 more layers were added. Batch normalization was used as a
technique for passing normalized data to each layer, since, as noted earlier, normalized data tends
to give good results: just as we normalized the input data at the input layer, batch normalization
normalizes the values before they are sent into the activation function.

The reason for the increase in filter counts along the line, 16, 36, 64, 128, can be explained with
a simple analogy. When we look at an image through a camera, we first have a very wide angle of
vision and then focus by zooming in to observe minute details. For example, when observing a bird
from a distance, the more we zoom in the telescope, the more we learn about the features of the
bird, and with more features we can more easily identify it. Similarly, the first filters take in
the entire big picture, where the features are not specific; then, as features are extracted from
features (commonly known as feature maps), more information is gained, and the filter counts are
therefore increased roughly by a factor of two: 16, 36, 64, 128.
I felt that 4 convolutional layers would be sufficient for our model, since the dataset is not too
huge. I also took care to avoid overfitting with the help of dropout: the dropout rate was set to
0.25, meaning that 25 percent of the units in that layer are randomly deactivated during training.
Finally, after obtaining the features, we can flatten the data and send it into the dense, fully
connected layers, which act as the classifier. Flattening at this stage does not degrade the
classifier: unlike feeding raw images to a neural network as 1-D data, which would leak away
spatial information, here the convolutional layers have already extracted the spatial features.
Classification is not an issue for the fully connected dense layers.
Dense layers were added at the end: one with 64 units, with a dropout layer after it, followed by
the output layer. For this last layer I used the sigmoid activation function in place of the ReLU
used for all the other layers. The reason for choosing the sigmoid at the last layer is its
resemblance to the logistic regression model: we need an output between 0 and 1 specifying whether
or not the image contains a tumor.
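
Here is a minimal sketch (Keras assumed) reconstructing the architecture just described; the first
convolution uses 'same' padding so that the 224x224 output shape matches the summary that follows,
and batch normalization layers are omitted because they do not appear in that summary:

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(16, (3, 3), padding="same", activation="relu",
                  input_shape=(224, 224, 3)),
    layers.Conv2D(36, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.25),
    layers.Dense(1, activation="sigmoid"),   # tumor / no tumor
])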
The summary of the model that we have been discussing above goes this way:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 224, 224, 16) 448

conv2d_1 (Conv2D) (None, 222, 222, 36) 5220

max_pooling2d (MaxPooling2D (None, 111, 111, 36) 0


)

conv2d_2 (Conv2D) (None, 109, 109, 64) 20800

max_pooling2d_1 (MaxPooling (None, 54, 54, 64) 0


2D)

conv2d_3 (Conv2D) (None, 52, 52, 128) 73856

max_pooling2d_2 (MaxPooling (None, 26, 26, 128) 0


2D)

18
dropout (Dropout) (None, 26, 26, 128) 0

flatten (Flatten) (None, 86528) 0

dense (Dense) (None, 64) 5537856

dropout_1 (Dropout) (None, 64) 0

dense_1 (Dense) (None, 1) 65

=================================================================
Total params: 5,638,245
Trainable params: 5,638,245
Non-trainable params: 0

If we look at the total parameter count, we have about 5.6 million (56 lakh) parameters, a huge
number. Luckily it is not our job to turn these knobs ourselves: as we all know, the mechanism of
backpropagation does this job perfectly.
Compilation and Checkpoints
Now our model is all set to be compiled. I used the Adam optimizer; RMSprop is also fine and widely
used, but I personally chose Adam. The loss was chosen as binary cross-entropy, because ours is a
binary classification problem: whether the image is cancerous or not. The metric I used was
accuracy.
Before compiling and fitting the training data, there are two things left to set up: early stopping
and checkpoints.
Early stopping does the job of stopping the model from training after a certain number of epochs,
which we would otherwise do by hand to prevent overfitting. We have to specify the condition under
which we want training to stop. Checkpoints are like checkposts on a highway: according to a
specified condition, the model can be saved, which ensures that the best model among all the
parameter sets is stored. The parameters get updated in every epoch, and it is not guaranteed that
each update results in a better model; checkpoints therefore let us hold on to the best set of
parameters for later use, while early stopping reduces the waste of resources on training that
yields no benefit.
Both early stopping and the model checkpoint are conditioned here on the val_accuracy metric only.
EarlyStopping has an attribute called min_delta, the minimum difference in validation accuracy
between two epochs required to count as an improvement; it is like the delta in calculus. There is
another important attribute, patience, which is similar to human patience: it tolerates a lack of
improvement for a few consecutive epochs, beyond which it does not spare the model and stops
further training.
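
Continuing the sketch above, the compilation and callbacks described here might look like this (the
min_delta and patience values are illustrative):

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_accuracy", min_delta=0.01, patience=3,
                           verbose=1)
checkpoint = ModelCheckpoint("./bestmodel.h5", monitor="val_accuracy",
                             save_best_only=True, verbose=1)

history = model.fit(train_data, epochs=30, validation_data=val_data,
                    callbacks=[early_stop, checkpoint])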

Results of the above model after fitting the training data.
Epoch 1/30
8/8 [==============================] - 19s 891ms/step - loss: 0.7738 - accuracy: 0.5625 - val_loss: 0.6808 - val_accuracy: 0.7364
Epoch 00001: val_accuracy improved from -inf to 0.73636, saving model to ./bestmodel.h5
WARNING:tensorflow: Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 16 batches). You may need to use the repeat() function when building your dataset.

Because the validation generator was exhausted, every epoch from 2 to 30 also logged the warnings
"Early stopping conditioned on metric `val_accuracy` which is not available. Available metrics are:
loss,accuracy" and "Can save best model only with val_accuracy available, skipping." The per-epoch
training metrics were:

Epoch  2/30: loss 0.6652 - accuracy 0.6875
Epoch  3/30: loss 0.6053 - accuracy 0.6680
Epoch  4/30: loss 0.5958 - accuracy 0.7266
Epoch  5/30: loss 0.5199 - accuracy 0.5953
Epoch  6/30: loss 0.6314 - accuracy 0.6406
Epoch  7/30: loss 0.6048 - accuracy 0.6797
Epoch  8/30: loss 0.5892 - accuracy 0.6914
Epoch  9/30: loss 0.5892 - accuracy 0.6719
Epoch 10/30: loss 0.5726 - accuracy 0.7422
Epoch 11/30: loss 0.5225 - accuracy 0.7148
Epoch 12/30: loss 0.5765 - accuracy 0.7109
Epoch 13/30: loss 0.5482 - accuracy 0.7344
Epoch 14/30: loss 0.5271 - accuracy 0.7500
Epoch 15/30: loss 0.4655 - accuracy 0.8125
Epoch 16/30: loss 0.4905 - accuracy 0.7821
Epoch 17/30: loss 0.4337 - accuracy 0.8086
Epoch 18/30: loss 0.4318 - accuracy 0.8125
Epoch 19/30: loss 0.5349 - accuracy 0.7436
Epoch 20/30: loss 0.4684 - accuracy 0.7852
Epoch 21/30: loss 0.4429 - accuracy 0.7991
Epoch 22/30: loss 0.4390 - accuracy 0.7969
Epoch 23/30: loss 0.4628 - accuracy 0.8203
Epoch 24/30: loss 0.4533 - accuracy 0.8248
Epoch 25/30: loss 0.4593 - accuracy 0.7812
Epoch 26/30: loss 0.4291 - accuracy 0.7852
Epoch 27/30: loss 0.4074 - accuracy 0.8125
Epoch 28/30: loss 0.4163 - accuracy 0.8162
Epoch 29/30: loss 0.3381 - accuracy 0.8320
Epoch 30/30: loss 0.4074 - accuracy 0.8281
Initially, the validation_steps attribute was set to 16, which was too high for the given amount of
data, so it was adjusted to 8 and the training was run again.

With validation_steps=8:

Epoch 1/30: loss 0.3150 - accuracy 0.7633 - val_loss 0.2392 - val_accuracy 0.7023
Epoch 00001: val_accuracy improved from 0.67727 to 0.70234, saving model to ./bestmodel.h5
Epoch 2/30: loss 0.2800 - accuracy 0.6750 - val_loss 0.2851 - val_accuracy 0.6555 (did not improve)
Epoch 3/30: loss 0.2868 - accuracy 0.7594 - val_loss 0.2335 - val_accuracy 0.6828 (did not improve)
Epoch 4/30: loss 0.3920 - accuracy 0.7203 - val_loss 0.4038 - val_accuracy 0.6852 (did not improve)
Epoch 00004: early stopping
The maximum validation accuracy obtained was 0.70; let's do some hyperparameter tuning. If we
carefully observe the verbose output, there is scope for improvement if we change the patience
value: if we wait for more iterations, we might get a better result.
So I tried a patience value of 0.6; with this value the model does not take much time to execute,
and most of the time spent is on the first epoch only.
The verbose results with this patience value were:
Epoch 1/30: loss 0.2614 - accuracy 0.8023 - val_loss 0.2446 - val_accuracy 0.7906
Epoch 00001: val_accuracy improved from -inf to 0.79062, saving model to ./bestmodel.h5
Epoch 2/30: loss 0.3378 - accuracy 0.7555 - val_loss 0.4216 - val_accuracy 0.7203 (did not improve)
Epoch 3/30: loss 0.3262 - accuracy 0.7750 - val_loss 0.3437 - val_accuracy 0.7555 (did not improve)
Epoch 4/30: loss 0.3671 - accuracy 0.7242 - val_loss 0.2838 - val_accuracy 0.7633 (did not improve)
Epoch 5/30: loss 0.3403 - accuracy 0.8984 - val_loss 0.3200 - val_accuracy 0.7516 (did not improve)
Epoch 6/30: loss 0.3819 - accuracy 0.8555 - val_loss 0.2706 - val_accuracy 0.7945
Epoch 00006: val_accuracy improved from 0.79062 to 0.79453, saving model to ./bestmodel.h5
Epoch 7/30: loss 0.3434 - accuracy 0.7594 - val_loss 0.2703 - val_accuracy 0.7867 (did not improve from 0.79453)
Epoch 00007: early stopping
Now the maximum validation accuracy achieved was about 0.79; the value improved by roughly 0.08,
which is a substantial amount. After this I tried to play around with a few other hyperparameters,
like the metrics and the batch size, but none of them improved the performance of the model, so I
saved this model in a file called "bestmodel.h5".
Some of the things that convinced me to save this model were the loss values: the losses this model
was reaching were in the range of 0.25 to 0.35, which are quite low. Another supporting factor was
the difference between the validation accuracy and the training accuracy: as long as the difference
does not exceed 0.10, we can assume the model is not overfitting the data.
The best model previously stored in the file was then loaded to check the results on the test
dataset, which is completely unseen by the whole modelling process. Before that, let's look at the
graphical representation of the validation accuracy and training accuracy across the epochs, which
is what we have been discussing in the previous paragraph. This graph can be plotted from the
history of the model that is stored as we train; it also adds evidence that our model is not
overfitting.

[Figure: Loss graph across the epochs]

The loss graph also shows that the loss has been decreasing as the epochs increase, dropping from
0.38 to 0.30, which is another good sign for the model.
Now the real test is to check the model's performance on the test data. I obtained an accuracy of
82.5% on the test data, better than the roughly 79% on the validation dataset.
To check the genuineness of the model, I tested it by feeding in individual images, and the model
predicted correctly at about the rate the accuracy specifies. I am attaching screenshots as proof
that the model mostly tested correctly.

Why there is a need for transfer learning and improvement of the current model:
It is clear that our model is predicting correctly; the problem with this model is its accuracy. I
am relying heavily on accuracy because the dataset we have taken is not biased: both classes are
roughly equal in number. We got an accuracy of 82.5%, which is a decent number in the field of data
science, but remember that we are dealing with the deadliest of diseases. It would lead to huge
problems if our model predicted that someone with a tumor does not have a tumor in his brain; we
could become a reason for his death. It is therefore really important to work meticulously and with
determination to improve the results.
It is always good to aim for the best, so I decided to go further and improve the model. Having
exhausted the hyperparameter tuning, I decided to change the approach, and came across the
technique called transfer learning, which I described at the start of the report. The material
written at the beginning about transfer learning was taken directly from Mohamed Elgendy's book
"Deep Learning for Vision Systems", where it is explained in a very detailed fashion. The very idea
of transferring knowledge from one model to another fascinated me a lot.
"Transfer learning is both a more efficient and an easier route than hyperparameter tuning."
One of the issues we faced was a shortage of data, because of which the model was not good enough
at feature extraction. Some years back, a competition called the ImageNet competition began, whose
agenda was to develop models that give the best accuracy at recognizing objects. Over the years,
models became more and more complicated: every year the number of layers and the complexity of the
models increased. After much experimentation, standard state-of-the-art models were invented, like
GoogLeNet, ResNet, MobileNet, and others. The creation of these models was the result of a huge
expenditure of both time and resources. Transfer learning is the tool that lets us use the results
of others' experimentation; in this way, we can save a lot of time and resources and still get good
results.
Choice of model:
Despite there being models like VGG16 or ResNet, which yield better results but are more complex, I
chose the MobileNet model for transfer learning. The reason was the constraint on resources and the
complexity of the former models. MobileNet performs fewer computations, and how it achieves this
was explained in the background part of the report. MobileNet is so named because the model is
mobile friendly: these models generally require heavy GPUs and CPUs to train, which a mobile device
cannot afford, so the architecture is kept light and simple. The MobileNet model also adds little
latency when running the program, thanks to its lower complexity. Let's look, in the further part
of the report, at how I implemented the MobileNet architecture and enhanced the accuracy.

Model with transfer learning using MobileNet:
There are quite a few changes to the notebook we used for the previous model. Unlike before, we are
not going to import convolution or max-pooling layers, because all of that is taken care of by the
pre-trained model. One important change is that we don't use the Sequential mode with transfer
learning; we use the functional API.
We also have to change the way we pre-process the images a little. Previously we normalized the
images by dividing by 255, but pre-trained networks expect their inputs scaled the same way they
were trained; therefore, in the ImageDataGenerator, we replace the rescaling with MobileNet's own
preprocess_input passed as the preprocessing_function. All of this is done to make our data format
compatible with the MobileNet architecture, as sketched below.
Now our data is pre-processed and ready to be fed into the model.
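
A minimal sketch of this change (Keras assumed):

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.mobilenet import preprocess_input

# MobileNet's own scaling replaces the rescale=1/255 normalization used before
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input,
                               zoom_range=0.2, horizontal_flip=True)
train_data = train_gen.flow_from_directory(
    "brain tumor/train", target_size=(224, 224), class_mode="binary")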

Model -2 Training:
Implementing the model is really simple: all we have to do is call the MobileNet function imported from keras.applications.mobilenet. The model created this way is a base model onto which we attach a customised head. In the new model we do not want the layers of the pre-trained model to be trained, so I set their trainable attribute to False, which prevents them from being updated. I did this because the base model has numerous layers that have already been trained to extract features efficiently.
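A sketch of this setup looks roughly as follows; include_top=False and the ImageNet weights are assumptions consistent with the summary shown below (which ends at conv_pw_13_relu with no classifier head).

from tensorflow.keras.applications.mobilenet import MobileNet

# Load MobileNet pre-trained on ImageNet; include_top=False drops its
# original classifier head so we can attach our own later.
base_model = MobileNet(input_shape=(224, 224, 3),
                       include_top=False,
                       weights='imagenet')
for layer in base_model.layers:
    layer.trainable = False  # freeze the pre-trained feature extractor
base_model.summary()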
Have a look at how complex the base model is through its summary:
Model: "mobilenet_1.00_224"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0

conv1 (Conv2D) (None, 112, 112, 32) 864

conv1_bn (BatchNormalizatio (None, 112, 112, 32) 128


n)

conv1_relu (ReLU) (None, 112, 112, 32) 0

conv_dw_1 (DepthwiseConv2D) (None, 112, 112, 32) 288

conv_dw_1_bn (BatchNormaliz (None, 112, 112, 32) 128


ation)

31
conv_dw_1_relu (ReLU) (None, 112, 112, 32) 0

conv_pw_1 (Conv2D) (None, 112, 112, 64) 2048

conv_pw_1_bn (BatchNormaliz (None, 112, 112, 64) 256


ation)

conv_pw_1_relu (ReLU) (None, 112, 112, 64) 0

conv_pad_2 (ZeroPadding2D) (None, 113, 113, 64) 0

conv_dw_2 (DepthwiseConv2D) (None, 56, 56, 64) 576

conv_dw_2_bn (BatchNormaliz (None, 56, 56, 64) 256


ation)

conv_dw_2_relu (ReLU) (None, 56, 56, 64) 0

conv_pw_2 (Conv2D) (None, 56, 56, 128) 8192

conv_pw_2_bn (BatchNormaliz (None, 56, 56, 128) 512


ation)

conv_pw_2_relu (ReLU) (None, 56, 56, 128) 0

conv_dw_3 (DepthwiseConv2D) (None, 56, 56, 128) 1152

conv_dw_3_bn (BatchNormaliz (None, 56, 56, 128) 512


ation)

conv_dw_3_relu (ReLU) (None, 56, 56, 128) 0

conv_pw_3 (Conv2D) (None, 56, 56, 128) 16384

conv_pw_3_bn (BatchNormaliz (None, 56, 56, 128) 512


ation)

conv_pw_3_relu (ReLU) (None, 56, 56, 128) 0

conv_pad_4 (ZeroPadding2D) (None, 57, 57, 128) 0

conv_dw_4 (DepthwiseConv2D) (None, 28, 28, 128) 1152

conv_dw_4_bn (BatchNormaliz (None, 28, 28, 128) 512


ation)

conv_dw_4_relu (ReLU) (None, 28, 28, 128) 0

conv_pw_4 (Conv2D) (None, 28, 28, 256) 32768

conv_pw_4_bn (BatchNormaliz (None, 28, 28, 256) 1024


ation)

conv_pw_4_relu (ReLU) (None, 28, 28, 256) 0

32
conv_dw_5 (DepthwiseConv2D) (None, 28, 28, 256) 2304

conv_dw_5_bn (BatchNormaliz (None, 28, 28, 256) 1024


ation)

conv_dw_5_relu (ReLU) (None, 28, 28, 256) 0

conv_pw_5 (Conv2D) (None, 28, 28, 256) 65536

conv_pw_5_bn (BatchNormaliz (None, 28, 28, 256) 1024


ation)

conv_pw_5_relu (ReLU) (None, 28, 28, 256) 0

conv_pad_6 (ZeroPadding2D) (None, 29, 29, 256) 0

conv_dw_6 (DepthwiseConv2D) (None, 14, 14, 256) 2304

conv_dw_6_bn (BatchNormaliz (None, 14, 14, 256) 1024


ation)

conv_dw_6_relu (ReLU) (None, 14, 14, 256) 0

conv_pw_6 (Conv2D) (None, 14, 14, 512) 131072

conv_pw_6_bn (BatchNormaliz (None, 14, 14, 512) 2048


ation)

conv_pw_6_relu (ReLU) (None, 14, 14, 512) 0

conv_dw_7 (DepthwiseConv2D) (None, 14, 14, 512) 4608

conv_dw_7_bn (BatchNormaliz (None, 14, 14, 512) 2048


ation)

conv_dw_7_relu (ReLU) (None, 14, 14, 512) 0

conv_pw_7 (Conv2D) (None, 14, 14, 512) 262144

conv_pw_7_bn (BatchNormaliz (None, 14, 14, 512) 2048


ation)

conv_pw_7_relu (ReLU) (None, 14, 14, 512) 0

conv_dw_8 (DepthwiseConv2D) (None, 14, 14, 512) 4608

conv_dw_8_bn (BatchNormaliz (None, 14, 14, 512) 2048


ation)

conv_dw_8_relu (ReLU) (None, 14, 14, 512) 0

conv_pw_8 (Conv2D) (None, 14, 14, 512) 262144

33
conv_pw_8_bn (BatchNormaliz (None, 14, 14, 512) 2048
ation)

conv_pw_8_relu (ReLU) (None, 14, 14, 512) 0

conv_dw_9 (DepthwiseConv2D) (None, 14, 14, 512) 4608

conv_dw_9_bn (BatchNormaliz (None, 14, 14, 512) 2048


ation)

conv_dw_9_relu (ReLU) (None, 14, 14, 512) 0

conv_pw_9 (Conv2D) (None, 14, 14, 512) 262144

conv_pw_9_bn (BatchNormaliz (None, 14, 14, 512) 2048


ation)

conv_pw_9_relu (ReLU) (None, 14, 14, 512) 0

conv_dw_10 (DepthwiseConv2D (None, 14, 14, 512) 4608


)

conv_dw_10_bn (BatchNormali (None, 14, 14, 512) 2048


zation)

conv_dw_10_relu (ReLU) (None, 14, 14, 512) 0

conv_pw_10 (Conv2D) (None, 14, 14, 512) 262144

conv_pw_10_bn (BatchNormali (None, 14, 14, 512) 2048


zation)

conv_pw_10_relu (ReLU) (None, 14, 14, 512) 0

conv_dw_11 (DepthwiseConv2D (None, 14, 14, 512) 4608


)

conv_dw_11_bn (BatchNormali (None, 14, 14, 512) 2048


zation)

conv_dw_11_relu (ReLU) (None, 14, 14, 512) 0

conv_pw_11 (Conv2D) (None, 14, 14, 512) 262144

conv_pw_11_bn (BatchNormali (None, 14, 14, 512) 2048


zation)

conv_pw_11_relu (ReLU) (None, 14, 14, 512) 0

conv_pad_12 (ZeroPadding2D) (None, 15, 15, 512) 0

conv_dw_12 (DepthwiseConv2D (None, 7, 7, 512) 4608


)

34
conv_dw_12_bn (BatchNormali (None, 7, 7, 512) 2048
zation)

conv_dw_12_relu (ReLU) (None, 7, 7, 512) 0

conv_pw_12 (Conv2D) (None, 7, 7, 1024) 524288

conv_pw_12_bn (BatchNormali (None, 7, 7, 1024) 4096


zation)

conv_pw_12_relu (ReLU) (None, 7, 7, 1024) 0

conv_dw_13 (DepthwiseConv2D (None, 7, 7, 1024) 9216


)

conv_dw_13_bn (BatchNormali (None, 7, 7, 1024) 4096


zation)

conv_dw_13_relu (ReLU) (None, 7, 7, 1024) 0

conv_pw_13 (Conv2D) (None, 7, 7, 1024) 1048576

conv_pw_13_bn (BatchNormali (None, 7, 7, 1024) 4096


zation)

conv_pw_13_relu (ReLU) (None, 7, 7, 1024) 0

=================================================================
Total params: 3,228,864
Trainable params: 0
Non-trainable params: 3,228,864

As we can see, the base model is really complex, and training it from scratch would have taken ages on my computer. I took the output of the last layer in the table above and clubbed it with a custom head: the pre-trained model's output is flattened and provided as input to a dense layer. All of this is done with functions because this model is built with the functional API, not the Sequential model. I am not displaying the complete summary of the final model because it would take many pages; I am only attaching a screenshot of the lower part of the summary, which confirms that a dense layer has been added to the base model.

[Screenshot: lower part of the final model summary — a Dense layer has been added]
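A minimal sketch of this custom head is given below; the single sigmoid unit is my assumption, matching the binary tumour / no-tumour labelling of the data set, and it continues from the base_model snippet above.

from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

x = Flatten()(base_model.output)            # flatten the 7x7x1024 feature maps
output = Dense(1, activation='sigmoid')(x)  # the custom classification layer
model = Model(inputs=base_model.input, outputs=output)  # functional API model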

Model-2 Compilation:
Now both the model and the data are ready for compilation and training. This time the optimizer used was RMSprop, with no change in the loss function and metrics; that was an arbitrary change, and there was no issue with the Adam optimizer either.
For training, the fit_generator function was used, and everything else in the code stayed the same as before; the early-stopping and model-checkpoint code was also not altered. The best model was saved to the file bestmodel.h5, as the checkpoint messages in the log below show.
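A sketch of the compilation and training step is shown below; the patience value and the generator names (train_data, val_data) are assumptions chosen to be consistent with the verbose log that follows, not values copied from the notebook.

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# Stop when val_accuracy stalls and keep only the best weights on disk.
es = EarlyStopping(monitor='val_accuracy', patience=3, verbose=1)
mc = ModelCheckpoint('./bestmodel.h5', monitor='val_accuracy',
                     save_best_only=True, verbose=1)
history = model.fit_generator(train_data,
                              steps_per_epoch=10,  # matches the 10/10 in the log
                              epochs=30,
                              validation_data=val_data,
                              callbacks=[es, mc])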
The model has been trained and the results came out as follows. The verbose output looks like:
Epoch 1/30
10/10 [==============================] - ETA: 0s - loss: 7.9176 - accuracy: 0.5781
Epoch 00001: val_accuracy improved from -inf to 0.79297, saving model to ./bestmodel.h5
10/10 [==============================] - 18s 731ms/step - loss: 7.9176 - accuracy: 0.5781 - val_loss: 2.2902 - val_accuracy: 0.7930
Epoch 2/30
10/10 [==============================] - ETA: 0s - loss: 1.2520 - accuracy: 0.8375
Epoch 00002: val_accuracy did not improve from 0.79297
10/10 [==============================] - 6s 643ms/step - loss: 1.2520 - accuracy: 0.8375 - val_loss: 4.4797 - val_accuracy: 0.6172
Epoch 3/30
10/10 [==============================] - ETA: 0s - loss: 1.7870 - accuracy: 0.8322
Epoch 00003: val_accuracy improved from 0.79297 to 0.87891, saving model to ./bestmodel.h5
10/10 [==============================] - 6s 649ms/step - loss: 1.7870 - accuracy: 0.8322 - val_loss: 0.7362 - val_accuracy: 0.8789
Epoch 4/30
10/10 [==============================] - ETA: 0s - loss: 0.9961 - accuracy: 0.8562
Epoch 00004: val_accuracy improved from 0.87891 to 0.93750, saving model to ./bestmodel.h5
10/10 [==============================] - 7s 678ms/step - loss: 0.9961 - accuracy: 0.8562 - val_loss: 0.3711 - val_accuracy: 0.9375
Epoch 5/30
10/10 [==============================] - ETA: 0s - loss: 0.2850 - accuracy: 0.9406
Epoch 00005: val_accuracy improved from 0.93750 to 0.96484, saving model to ./bestmodel.h5
10/10 [==============================] - 6s 661ms/step - loss: 0.2850 - accuracy: 0.9406 - val_loss: 0.1904 - val_accuracy: 0.9648
Epoch 6/30
10/10 [==============================] - ETA: 0s - loss: 2.0950 - accuracy: 0.7819
Epoch 00006: val_accuracy did not improve from 0.96484
10/10 [==============================] - 6s 595ms/step - loss: 2.0950 - accuracy: 0.7819 - val_loss: 0.3144 - val_accuracy: 0.9492
Epoch 7/30
10/10 [==============================] - ETA: 0s - loss: 0.1852 - accuracy: 0.9625
Epoch 00007: val_accuracy did not improve from 0.96484
10/10 [==============================] - 6s 639ms/step - loss: 0.1852 - accuracy: 0.9625 - val_loss: 0.3578 - val_accuracy: 0.9531
Epoch 8/30
10/10 [==============================] - ETA: 0s - loss: 0.4666 - accuracy: 0.9329
Epoch 00008: val_accuracy did not improve from 0.96484
10/10 [==============================] - 6s 659ms/step - loss: 0.4666 - accuracy: 0.9329 - val_loss: 2.9320 - val_accuracy: 0.6680
Epoch 00008: early stopping

The results are quite impressive: we obtained a validation accuracy of 96.48%, up from about 80% before, which is a huge improvement. This model was actually produced after some hyperparameter tuning; truthfully, once the accuracy had reached this level, further hyperparameter tuning did not help much. It either spoiled the accuracy or barely changed the value.
Now let us look at a few graphical representations of the above data.

[Figure: validation accuracy vs. epochs]

The peak at the end is a strong indication that our model is performing well; we should look at that peak point in particular because the loss value is close to zero there. The loss-vs-epoch figures can be seen below.

[Figure: training and validation loss vs. epochs]

Look at this graph and compare it with the one we obtained for the first model we built: the loss value smoothly stabilises and approaches a value close to 0 as the epochs increase. This loss graph shows that the model's results are excellent.
The last challenge for our model is to perform well on the test data set. The result we got there was also astounding: we attained an accuracy of 96.59%, which means the probability that our model goes wrong is only about 3.4%, a very impressive result. We can clearly see the impact of using a pre-trained model: performance has rocketed with transfer learning, which is one of the reasons people generally call it a more efficient method than hyperparameter tuning.
Finally, the model was checked on some random images from the data set, and its predictions matched the metrics we had obtained.
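A sketch of how this final check could look is given below; the file name sample_mri.jpg and the test_data generator (built the same way as train_data) are hypothetical, not taken from the project files.

import numpy as np
from tensorflow.keras.applications.mobilenet import preprocess_input
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

best = load_model('./bestmodel.h5')       # reload the checkpointed best model
loss, acc = best.evaluate(test_data)      # test_data: a generator like train_data
print('test accuracy: %.2f%%' % (acc * 100))

# Spot-check a single (hypothetical) image from the data set.
img = image.load_img('sample_mri.jpg', target_size=(224, 224))
arr = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
prob = float(best.predict(arr)[0][0])
print('tumor' if prob >= 0.5 else 'no tumor')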
Future Works:
This model was developed with a relatively small data set containing only meningioma tumours, and I used a computationally simple model, MobileNet. In the future, a bigger data set and more complex models such as ResNet and VGGNet can be tried, which would really help in the medical domain. The model can also be extended to detect the location of the tumour. Skull-stripping and GrabCut techniques could be incorporated to obtain even more accurate results in the presence of noise in the MRIs. These are the issues to be addressed in future work for better results.

Contributions & conclusion:
The main contributions of this work are: achieving about 96% accuracy on brain-MRI tumour classification, and providing evidence that transfer learning can be used for domains like medical imaging even when the model was pre-trained on data from an entirely different domain.

Data Set: https://www.kaggle.com/ahmedhamada0/brain-tumor-detection

References :

Elgendy, Mohamed. Deep Learning for Vision Systems. Simon and Schuster, 2020.

Kalvakolanu, Anjaneya Teja Sarma. "Brain Tumor Detection and Classification from MRI Images." (2021).

Nepal, Prabin. "MobileNet Architecture Explained." https://prabinnepal.com/mobilenet-architecture-explained/

Kokila, B., et al. "Brain Tumor Detection and Classification Using Deep Learning Techniques based on MRI Images." Journal of Physics: Conference Series, Vol. 1916, No. 1. IOP Publishing, 2021.

Jia, Z., and D. Chen. "Brain Tumor Identification and Classification of MRI images using deep learning techniques." IEEE Access, doi: 10.1109/ACCESS.2020.3016319.

Thank you
