
Paul

This thesis presents a study on using deep learning techniques, specifically convolutional neural networks (CNNs), for the classification of brain tumors from MRI images. The research aims to automate the diagnosis process to reduce the burden on doctors and improve patient care by accurately identifying tumor types such as meningioma, glioma, and pituitary tumors. The study utilizes a dataset of 3064 brain images, achieving an average classification accuracy of 91.43% through advanced neural network methodologies.


Deep Learning for Brain Tumor Classification

By
Justin Paul

Thesis
Submitted to the Faculty of the
Graduate School of Vanderbilt University
in partial fulfillment of the requirements
for the degree of

MASTER OF SCIENCE
in
Computer Science

May 2016
Nashville, Tennessee

Approved:
Daniel Fabbri, Ph.D.
Bennett Landman, Ph.D.
To my friends and family,
whose support and advice has made all of this possible
ACKNOWLEDGEMENTS

This work would not have been possible without the mentorship and support of friends, family, and professors. I am especially thankful to Professor Daniel Fabbri, whose guidance and advising have not only bolstered my career goals but also provided the academic setting that allowed me to learn. He is an example of both what it means to be a good researcher and a good person. I additionally owe my gratitude to my professors in general, who have taught me the fundamentals needed to achieve higher learning. Lastly, I am indebted to my family and friends who have supported me throughout my time in academia. Their inspiration and encouragement have been at the center of my drive to learn and create.


TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
1 Introduction
2 Related Work
3 Model
    3.1 Convolutional Neural Network
4 Algorithms and Implementation
    4.1 Forward Pass
    4.2 Backwards Pass
5 Approach
    5.1 Data
    5.2 Preprocessing
        5.2.1 Vanilla Data Preprocessing
        5.2.2 Image and Location
        5.2.3 Tumor Zoom
    5.3 Image Transformation
        5.3.1 Large Image Size
        5.3.2 Small Image Size
    5.4 Counteracting Overfitting
        5.4.1 Crop Averaging
    5.5 Network Construction
        5.5.1 Convolutional Neural Network
        5.5.2 Fully Connected Neural Network
        5.5.3 Concatenation of Convolutional and Fully Connected Input Layers
    5.6 Random Forests
    5.7 Training
        5.7.1 Decaying Learning Rate
        5.7.2 Accuracy Metrics
6 Results
7 Conclusion and Future Work
8 References

LIST OF TABLES

1. Average Five-Fold Cross Validation Test Accuracies with Brain Tumor Images Only
2. Average Five-Fold Cross Validation Test Accuracies with Brain Tumor and Tumorless Brain Images
3. Number Correctly Predicted Per Image in Order of Confidence Until Incorrect Prediction for 256 x 256 Vanilla CNN
4. Number Correctly Predicted Per Patient in Order of Confidence Until Incorrect Prediction for 256 x 256 Vanilla CNN
5. Precision and Recall for Vanilla CNN 256 x 256 Best Per Image Model
6. Precision and Recall for Vanilla CNN 256 x 256 Last Per Patient Model

LIST OF FIGURES

1. Example Brain Tumors
2. Example Neural Networks
3. Example Average Axial Image
4. First Convolutional Layer's 64 filters
5. Average precision at k for Last models for per image accuracy
6. Average precision at k for Best models for per image accuracy
7. Average precision at k for Best-PD models for per image accuracy
8. Average precision at k for Last models with tumorless brain images for per image accuracy
9. Average precision at k for Best models with tumorless brain images for per image accuracy
10. Average precision at k for Best-PD models with tumorless brain images for per image accuracy
11. Loss and accuracy history for Vanilla FCNN with 256 x 256
12. Loss and accuracy history for Vanilla CNN with 69 x 69
13. Loss and accuracy history for CO CNN with 45 x 45 images
14. Loss and accuracy history for Vanilla CNN with 64 x 64 images
15. Loss and accuracy history for Vanilla CNN with 256 x 256 images
16. Loss and accuracy history for Vanilla FCNN with 64 x 64 images
17. Loss and accuracy history of Vanilla ConcatCNN with 256 x 256 images
18. Loss and accuracy history for Tumor Locations ConcatCNN with 256 x 256 images
19. Loss and accuracy history for Tumor Zoomed CNN with 207 x 312 images
20. Loss and accuracy history for Crop Averaging CNN with 196 x 196 images

Deep Learning for Brain Tumor Classification

Justin Paul JUSTIN.S.PAUL@VANDERBILT.EDU


Vanderbilt University

Abstract

Deep learning has been used successfully in supervised classification tasks in order to learn complex patterns. The purpose of this study is to apply this machine learning technique to classifying images of brains with different types of tumors: meningioma, glioma, and pituitary. The image dataset contains 233 patients with a total of 3064 brain images with either meningioma, glioma, or pituitary tumors. The images are T1-weighted contrast-enhanced MRI (CE-MRI) images of either axial (transverse plane), coronal (frontal plane), or sagittal (lateral plane) planes. This research focuses on the axial images and expands upon this dataset with the addition of axial images of brains without tumors in order to increase the number of images provided to the neural network. Training neural networks on this data proved accurate, with an average five-fold cross-validation accuracy of 91.43%.

1. Introduction

Patient diagnosis relies on a doctor's manual evaluation of a patient and his or her test results. With no automated tools to help with a doctor's diagnosis and a limited number of available doctors, not only is there a higher risk of misdiagnosis but also an increase in wait time for patients to be seen. Doctors must take the time to manually review test results and images rather than spending time with the patient. In order to improve patient care, enhanced medical technology in the form of automated tools is necessary to increase doctor efficiency and decrease patient time in hospitals and time toward recovery.

The purpose of this research is to develop automated methods to aid doctors in diagnosis in order to prevent misdiagnosis and decrease patient wait time. In particular, this research achieves this automation through the classification of brain tumor types from patient brain images. Images require a doctor to examine multiple image slices to determine health issues, which takes time away from more complex diagnoses. Our goal is to confidently identify brain cancer types to reduce doctor burden, leaving the most complex diagnoses to them.

Previous research has developed specialized methods for automated brain tumor classification. Cheng et al. [1] created a brain tumor dataset containing T1-weighted contrast-enhanced images from 233 patients with three brain tumor types: meningioma, glioma, and pituitary. Additionally, the dataset contains three types of images: axial, coronal, and sagittal. Examples of these images can be seen in Figure 1. In their work, Cheng et al. used image dilation and ring-forming subregions on tumor regions to increase the accuracy of classifying brain tumors to up to 91.28% using a Bag of Words (BoW) model. In addition to BoW, they also applied intensity histogram and gray level co-occurrence matrix (GLCM) methods, with less accurate results [1]. This research improves on previously presented results by using a more general method, neural networks, and by adding images of brains without tumors.

Three main types of neural networks (NNs) have been researched: fully connected NNs (FCNNs), convolutional NNs (CNNs), and recurrent NNs (RNNs). For this study, CNNs are primarily used given that the inputs are images, though FCNNs are also examined. Though there have been prior attempts to apply machine learning to medical data, such as the example above, there is a lack of tools utilizing modern advances in neural networks. While extensive research has successfully applied these techniques to recognizing patterns in non-medical images [2], the proposed research applies them to medical images, for which there is a lack of available datasets. Furthermore, applying neural networks to medical images has implications of faster and more precise diagnoses.

By including images of brains without tumors, neural networks can better learn the structure of a brain and take steps towards differentiating brains with and without tumors. More generally, this differentiates physiological structures through deep learning. Applying neural networks to medical images has implications of faster and more precise diagnoses automatically, and this research introduces neural networks into the medical field, where they have little current use. The main contributions of this paper are as follows:

• Create a more generalized method for brain tumor classification using deep learning
• Analyze the application of tumorless brain images on brain tumor classification
• Empirically evaluate neural networks on the given datasets with per image accuracy and per patient accuracy.

2. Related Work

A public brain tumor dataset was created from Nanfang Hospital, Guangzhou, China, and General Hospital, Tianjin Medical University, China from 2005 to 2012 and was used in Cheng et al. 2015 [1] to classify brain tumors in these images. Three previous approaches were used to analyze this dataset: intensity histogram, gray level co-occurrence matrix (GLCM), and bag-of-words (BoW). Rather than only using the tumor region, Cheng et al. augmented the tumor region by image dilation in order to enhance the surrounding tissue, which could offer insights into the tumor type. Augmentation continued by using increasing ring formations helped by the image dilation and created by common normalized Euclidean distances in order to use spatial pyramid matching (SPM) to discover local features through computing histograms. In BoW, the local features are then extracted through dictionary construction and histogram representation, which are then fed into a feature vector to be trained on a classifier. Out of all three methods, BoW gave the highest classification accuracy with 91.28%. Yet, this classification method is highly specialized, requiring zooming into the tumor or region of interest and knowledge of tumor existence. On the contrary, neural networks are generalizable and can discover local features from image input alone.

Neural networks and their generalizability have only appeared in recent years. After its rejection in the 1990s, deep learning came back into favor when Hinton et al. [8] in 2006 introduced the method of pre-training hidden layers one at a time through unsupervised learning of restricted Boltzmann machines (RBMs). This demonstrated an effective method of training neural networks through greedily stacking RBMs. Since then the field of deep learning has expanded and produced more efficient methods of training neural networks, and it quickly became the state of the art. Examples of modern neural networks are shown in Figure 2.

While originally introduced to the public in 1998 by LeCun et al. [3], convolutional neural networks gained popularity when in 2012 Krizhevsky et al. [2] designed a winning convolutional neural network for the ImageNet competition that performed considerably better than the previous state-of-the-art model. The computer vision community adopted neural networks as the state of the art after this competition, realizing the potential convolutional neural networks have for classification of images. Since 2012, convolutional neural networks have dominated other classification competitions, including the Galaxy Zoo Challenge that occurred from 2013 to 2014. Dieleman et al. [7] introduced how data augmentation can greatly increase dataset size through transformations, rotations, and translations of images. This in turn prevented overfitting and promoted more generalized learning.

Preventing overfitting in neural networks has been a main
focus for much research, and in 2014 Srivastava et al. [5] introduced dropout as a simple way to prevent co-adaptation of neurons. Dropout randomly drops neuron connections with a given probability, causing neuron units to become more independent rather than relying on other neurons to detect features. Similar in nature, maxout layers were designed to work in conjunction with dropout. Created by Goodfellow et al. [9], maxout layers are equivalent to the standard feed-forward multilayer perceptron neural network, yet they use a new activation function named the maxout unit. This unit takes the maximum linear activation for that unit.

Figure 2. (A) A standard fully connected neural network where each layer's node is connected to each node from the previous layer [5]. (B) A convolutional neural network connecting a convolutional layer to a pooling layer [7].

In addition to overfitting, deep learning research has created faster ways to train neural networks. Glorot et al. [12] revealed that rectified linear units (ReLUs) performed much faster in supervised training of deep neural networks as compared to logistic sigmoid neurons and performed equal to, if not better than, the hyperbolic tangent. This is due to the ReLU's nonlinear nature, which creates sparse representations that work well for naturally sparse data. While ReLUs represent a change in nonlinearity application to improve learning, Nesterov's momentum [13] is a form of momentum update that has been adapted to neural networks. Nesterov's momentum takes the gradient at a future location following the momentum from previous updates that has directed updates in a particular direction. This differs from standard momentum, where the gradient is taken at the current location.

Neural network research has recently started to combine with medical research, though it is still in its infancy. While prior research has shown promising results [10, 14], only recently has access to large quantities of medical data begun to surface. Many of the concepts and ideas from past research in neural networks are directly applied to this study.

3. Model

Convolutional neural networks are a type of neural network designed specifically for images and have been shown to perform well in various supervised learning tasks [3]. There have been several variations of convolutional neural networks created with commonalities in structure, including the use and ordering of convolutional, pooling, and dense layers.

3.1. Convolutional Neural Network

Convolutional neural networks were created with the assumption that nearby inputs are highly related to one another. In the case of images, the values are pixels, and pixels next to each other are more strongly correlated with each other than with pixels farther away. With
this assumption in mind, convolutional neural networks focus on local regions of the images in order to extrapolate local features from these subregions. This extrapolation of local features is performed in the convolutional layers.

As the number of convolutional layers increases, these local features build upon one another to form higher-order features, combining to understand what the image is in its entirety. Extrapolating local features to higher-order features relies on local receptive fields, in which a neuron in the convolutional layer takes in a particular k x j subregion of the image. In order for each neuron in the convolutional layer to take in various blocks of k x j pixels, convolutional layers can add a stride, which shifts the k x j window over by some given number of pixels. For the best convolutional neural networks used in this research, a stride of 1 is used and the values of k and j are 5 and 5 respectively. This implies that k x j pixel subregions can overlap with each other, which, depending on the size of the stride, can typically help since nearby pixels are related in value.

Max-pooling layers are often paired with convolutional layers in order to reduce dimensionality in the neural network and augment pixel values slightly so as to make the layer insensitive to small changes in pixel values. Max-pooling is a type of subsampling or pooling layer that produces smaller images from its inputs by taking the max value over a k x j subregion to represent the entire subregion in the newly produced image. Common values of k and j are 2 and 3, and the best convolutional neural network from this research currently uses k = 2 and j = 2 in order to cut the dimensionality to one-fourth of the size at each use. While averaging is another function used in pooling layers, each pooling layer used in this research applies the max function.

Convolutional and max-pooling layers in neural networks are often fed into fully connected or dense layers (i.e. all neurons in a layer are connected to each neuron in the following layer). Since fully connected layers have full connections to all activations in the previous layer, fully connected layers perform high-level reasoning in the neural network. For each neuron in every layer besides pooling layers, a nonlinearity function is applied in the convolutional neural network; otherwise, layers could be collapsed into one, since a composition of linear functions can be replaced by a single linear function. The nonlinear function used in this case is the rectified linear unit, which has proven to increase performance in neural networks [4].

The last layer in the neural network is a softmax layer of 3 or 4 neurons, which is used to determine classification probabilities. These neurons represent the probabilities of an image belonging to a particular category; three neurons are for the three types of brain tumors, and the fourth, optional neuron is for brains without tumors.

4. Algorithms and Implementation

Several algorithms are utilized in the construction of a neural network, ranging from updating weights to calculating loss or error. This section reviews the various algorithms incorporated into convolutional neural networks and the specifics of implementing the layers mentioned in the previous section.

4.1. Forward Pass

Convolutional neural networks are a type of feedforward neural network, in which a forward pass of training is computed with no loops in neuron connections; the next layers must only be connected to previous layers. When moving to a convolutional or fully connected layer, a set of weights and a bias are applied to all of the connected neurons from the previous layer in order to sum them together. This can be seen as applying a certain weight to a certain pixel and adding a bias. The formula is shown below for a certain neuron i in a certain convolutional or fully connected layer l receiving input.

$$a_i^l = \sum_{j=1}^{n} W_{ij}^l x_j + b_i$$

In this formula, j indexes the inputs into neuron i. The nonlinearity ReLU is then applied to this sum $a_i^l$ to give neuron i in layer l its new value $z_i^l$.

$$z_i^l = \max(0, a_i^l)$$

These two formulas are applied to every neuron in a convolutional or fully connected layer in order to obtain each neuron's value. For max-pooling layers, the max function is applied over certain k x j subregions in order to get the max value as an output, and this is applied over the entire input, keeping note of the given stride. The last layer contains the softmax function instead of the ReLU function in order to assign probabilities of the image being a certain type of tumor.

$$z_i^l = \frac{e^{a_i^l}}{\sum_k e^{a_k^l}}$$

The denominator sums over all output neurons. This gives the prediction for an image by choosing the highest probability. In order to learn from these probabilities, first the loss or error of the predictions is calculated. To calculate loss, these convolutional neural networks use the categorical cross-entropy loss function.

$$L = -\sum_j t_j \log(p_j)$$

In the above formula, $t$ represents the target label, and $p$ represents the prediction probability for the target label from
our calculated predictions from the neural network. Given this summed error, an average categorical cross-entropy loss is calculated by dividing by the total number of examples in training, m.

$$\frac{1}{m} L$$

In addition to categorical cross-entropy, it is common to add regularization to the loss in order to prevent weights from increasing too far in magnitude, which makes the network prone to overfitting. In this neural network, weight decay uses L1 regularization.

$$R = \frac{\lambda}{m} \sum_w |w|$$

In the above formula, $w$ represents the weights in the neural network, $m$ is the number of training examples, and $\lambda$ is the regularization constant (strength). The regularization constant is a hyperparameter which can vary based on the design of the convolutional neural network. In this convolutional neural network, a regularization strength of $10^{-4}$ is currently used. This regularization is combined with the categorical cross-entropy to give the overall cost function.

$$C = \frac{1}{m} L + R = -\frac{1}{m} \sum_j t_j \log(p_j) + \frac{\lambda}{m} \sum_w |w|$$

4.2. Backwards Pass

Neural networks can now use backpropagation to update weights and biases by propagating this error backwards through the neural network. The error is propagated back until the inputs are reached and the backpropagation has reached all parameter weights W and biases b, during which they are updated in order to minimize the overall cost. In order to change the parameters in a direction that minimizes the cost function, partial derivatives are used with respect to each parameter, starting with the partial derivative of the cost function with respect to weights and bias.

$$\frac{\partial C}{\partial W^l}, \quad \frac{\partial C}{\partial b^l}$$

In the above formula, l is the current layer (the last layer). The partial derivatives are used to update the weights connected to the last layer containing the softmax function. In order to continue to update the previous layers' weights and biases, the chain rule is applied from the current layer to the previous layer. This is done by finding the partial derivative of the cost with respect to the current layer $z^l$ and multiplying it by the partial derivative of $z^l$ with respect to the previous layer's weights, biases, and outputs, as shown below.

$$\frac{\partial C}{\partial W^{l-1}} = \frac{\partial C}{\partial z^l} \frac{\partial z^l}{\partial W^{l-1}}$$

$$\frac{\partial C}{\partial b^{l-1}} = \frac{\partial C}{\partial z^l} \frac{\partial z^l}{\partial b^{l-1}}$$

$$\frac{\partial C}{\partial z^{l-1}} = \frac{\partial C}{\partial z^l} \frac{\partial z^l}{\partial z^{l-1}}$$

This can be computed for any layer l by continuation of backpropagation. Now the gradient for each parameter can be used to update the parameters by using Nesterov's momentum [6].

$$\hat{p}^l = p^l + \mu v^l$$

$$v^l = \mu v^l - \lambda \frac{\partial C}{\partial p^l}$$

$$p^l \leftarrow p^l + v^l$$

In the above equations, $p$ represents a parameter, $l$ is the layer, $\hat{p}$ is the look-ahead in Nesterov's momentum, and $\mu$ is a hyperparameter momentum constant whose common values include [0.5, 0.9, 0.95, 0.99] (in this research $\mu$ is 0.9). With this new set of weights and biases that were just updated, the neural network has completed one epoch, which consists of one forward and one backward iteration. Neural networks train through multiple epochs, and, for this research, 100 and 500 epochs are used to train the neural networks.

5. Approach

In this section, we describe the various brain image datasets and our approach to practically training our convolutional neural networks. We first describe the images in the brain tumor dataset and the brains without tumors dataset. We then describe processing and augmentation of images in order to gain more training data. Lastly, we discuss how the transformation of images plays into the software development.

5.1. Data

As stated previously, the brain tumor dataset contains 3064 T1-weighted contrast-enhanced images, categorized into three sets: axial, coronal, and sagittal images. These represent the various planes in which images of the brain are scanned; they correspond to the transverse, frontal, and lateral planes respectively. There are 994 axial images, 1045 coronal images, and 1025 sagittal images, where each image had an original size of 512 x 512 pixels. Furthermore, out of all of the images, there were 708 slices of meningioma, 1425 slices of glioma, and 930 slices of pituitary tumors. These images originated from 233 patients, so many of the images are from the same patient.
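
Because many images come from the same patient, one consequence for cross-validation is that a patient's slices can end up in both the training and the test fold unless folds are assigned by patient. The text here does not say how the five folds were drawn, so the following is only a minimal sketch of one possible patient-level split. The function name `patient_level_folds`, the `patient_ids` list, and the toy IDs are all hypothetical and are not taken from the dataset or from the thesis code.

```python
import random
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Split image indices into folds so each patient's images share one fold.

    patient_ids[i] is the (hypothetical) patient ID of image i.
    Returns a list of n_folds lists of image indices.
    """
    # Group image indices by patient.
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    # Shuffle the patients reproducibly, then deal them round-robin into folds.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    folds = [[] for _ in range(n_folds)]
    for k, pid in enumerate(patients):
        folds[k % n_folds].extend(by_patient[pid])
    return folds


# Toy example: 12 images from 6 made-up patients, split into 3 folds.
toy_ids = ["p1", "p1", "p2", "p3", "p3", "p3",
           "p4", "p5", "p5", "p6", "p6", "p6"]
folds = patient_level_folds(toy_ids, n_folds=3)
```

Dealing patients round-robin balances the number of patients per fold, not the number of images or the class proportions; a stratified variant would be needed if class balance across folds matters.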

In order to avoid confusing the convolutional neural network with three different planes of the brain that could have the same label, the images were separated into the three planes, and this paper focuses on the axial images due to the availability of tumorless brain images in the axial plane. This left us with a final brain tumor dataset of 191 patients and 989 images. Out of these images, we have 208 meningioma, 492 glioma, and 289 pituitary tumor images. The tumorless brains dataset contains 3120 axial 256 x 256 images from 625 patients, where each patient is represented by 5 images that were randomly selected across the slices from their brain scans.

5.2. Preprocessing

Image data was preprocessed in several forms. Preprocessing included imitating previous research using a neural network or typical image preprocessing for neural networks, which focuses only on the images themselves. Each form of preprocessing is described below.

5.2.1. Vanilla Data Preprocessing

Brain tumor images were originally formatted as int16, and tumorless brain images were formatted as float32. In order to compare the two, each was scaled to a range of 0 to 255.

5.2.2. Image and Location

This preprocessing applies only to the brain tumor dataset. The brain tumor dataset provided the tumor location for each image as a set of points that described the tumor boundaries. In order to provide this to a neural network, the maximum and minimum boundary points in the x and y directions were determined.

5.2.3. Tumor Zoom

Rather than provide the neural network the maximum and minimum boundary points in the x and y directions, these values were used in order to zoom into the tumor region of each brain scan. In order for each image to have a consistent size, the minimum box needed to contain every tumor was determined. To find this box, we found the minimum width and height needed to contain each tumor. The width was determined via the difference between the minimum x and maximum x, and the height was determined via the difference between the minimum y and maximum y. This preprocessing was based on the note from Cheng et al. [1] stating that the tissue surrounding the tumor can give insight into tumor type.

5.3. Image Transformation

Large image sizes have implications not only for neural network training time but also for memory. Original brain tumor images are 512 x 512, which creates memory problems when loading all of the images into the neural network. To prevent this issue, images were downsized to various sizes. While several image sizes were tested, we mainly discuss the two extremes of the downsizing since they performed the best in terms of accuracy and training time.

5.3.1. Large Image Size

Large image sizes consisted of images scaled to 256 x 256. This required brain tumor images to be downsized from 512 x 512, while tumorless brain images remained the same size.

5.3.2. Small Image Size

Similarly to Dieleman et al. [7], brain tumor and tumorless brain images were given as a large size of 512 x 512 and 256 x 256 respectively. While 512 x 512 images were too large of a memory requirement for our neural networks to handle, 256 x 256 images were small enough to run tests, though training was very slow. In order to speed up training, images were cropped and shrunk. All images were resized to 512 x 512 and subsequently cropped to 412 x 412 by removing 50 pixels from each side. These pixels were mostly unimportant, holding no information regarding the brain itself and having constant pixel values of 0. Since all brain images were equally centered in their images, only minor portions of the edges of the brain images were affected. Images were then reduced to 69 x 69 in size by downscaling. This increased training speed by a factor of 10. A small image size of 64 x 64 was created as well by downsizing from the original size of 512 x 512, with similar increases in training speed.

5.4. Counteracting Overfitting

Convolutional neural networks have a high number of learnable parameters; cutting-edge neural networks have millions of learnable parameters and rely on a large number of images in order to train. With the limited dataset from the brain tumor images, our neural networks were at a high risk of overfitting. Overfitting can occur when a neural network's weights memorize training data rather than generalize from the input to learn patterns in the data. This can often happen due to small datasets. We applied several methods that prevent overfitting, including data augmentation, regularization through dropout, and parameter sharing implied through rotations and transformations of images mentioned below.

Like many images, brain tumor image classifications are invariant under translations, transformations, scaling, and rotations. This allows for several forms of data augmentation to be exploited. Data augmentation has proven useful in expanding small datasets [7] to prevent overfitting. In a set
of the tests run on the images, several forms of data augmentation were applied.

1. Rotation: Images were rotated by an angle between 0° and 360° that was randomly taken from a normal distribution.

2. Shift: Images were randomly shifted -4 to 4 pixels left or right and up or down [7]. These minor shifts were taken from a normal distribution and kept brains in the center of the image but changed the location of the brains enough to avoid memorization of location in an image rather than relative to the brain itself.

3. Scaling: Images were randomly rescaled with a scale factor between 1.3^-1 and 1.3, following Dieleman et al. [7]

4. Mirror: Each image was mirrored across its y-axis (horizontally) with a probability of 0.5.

After these initial transformations, further augmentation was performed in order to increase the size of the training set each round. Each image was rotated 0° and 45° and flipped horizontally to create four images. These four images were then cropped to a size of 45 x 45 taking the four corners of the images as edges to produce 16 different images. The above data augmentation was run on the training data every epoch of training in order to constantly introduce new images to the neural network. This augmentation affected training time very little. We will call this processing step CO for counteracting overfitting.

5.4.1. Crop Averaging

Following Krizhevsky et al. [2], another form of data augmentation was implemented during training for 256 x 256 images, in which images were downscaled to 224 x 224 and five random patches of 196 x 196 were extracted from each of these training images in order to increase training data. When testing occurred, five 196 x 196 patches of each test image downscaled to 224 x 224 were extracted, one for each corner of the image and one for the center. The softmax probabilities for each of these patches were then averaged together to give averaged softmax probabilities.

5.5. Network construction

A variety of neural networks were constructed based on the preprocessing of image data. Each is described in detail in this section.

5.5.1. Convolutional Neural Network

This neural network represents taking only images as input. While many combinations of layers were tested, the best combination for this neural network was the following.

• Convolutional Layer with 64 filters of size 5 x 5 and stride of 1
• Max-pooling Layer with pool and stride size 2 x 2
• Convolutional Layer with 64 filters of size 5 x 5 and stride of 1
• Max-pooling Layer with pool and stride size 2 x 2
• Fully Connected Layer with 800 neurons
• Fully Connected Layer with 800 neurons
• Softmax Layer with 3 or 4 neurons depending on brain tumor only in training or tumorless brain inclusion in training respectively

Each layer besides max-pooling applied the ReLU nonlinearity, and each of the last three layers applied dropout to help with regularization and to counteract overfitting. We will refer to this neural network as CNN from now on in this paper.

5.5.2. Fully Connected Neural Network

This neural network represents taking only images as input as well, but it does not utilize any convolutional or max-pooling layers. This network consisted of the following layers.

• Fully Connected Layer with 800 neurons
• Fully Connected Layer with 800 neurons
• Softmax Layer with 3 or 4 neurons depending on brain tumor only in training or tumorless brain inclusion in training respectively

Dropout and ReLUs were applied to each of these layers as well. We will refer to this neural network as FCNN from now on in this paper.

5.5.3. Concatenation of Convolutional and Fully Connected Input Layers

This neural network represents providing more information than one image input. There are two versions of the neural network. Each version has a neural network equivalent to the CNN from above. However, a second input layer exists representing the same image input or the maximum and minimum x and y to represent the location of the tumor. These have their own neural network path that eventually concatenates with the CNN from before. This second neural network path consists of the following layers:

• Fully Connected Layer with 800 neurons
• Fully Connected Layer with 800 neurons
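
To make the layer list of the CNN in Section 5.5.1 concrete, the following minimal sketch traces feature-map sizes through the two convolution and max-pooling stages and counts the weights of the first fully connected layer. It assumes unpadded ("valid") convolutions and a single input channel, neither of which is stated in the text, and the function names are illustrative only.

```python
def conv_out(size, kernel=5, stride=1):
    """Output width of an unpadded ('valid') convolution. Padding is an assumption."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output width of max-pooling with pool size equal to stride."""
    return size // pool


def cnn_feature_shape(input_size, filters=64):
    """Trace a square input through conv -> pool -> conv -> pool."""
    s = input_size
    for _ in range(2):
        s = pool_out(conv_out(s))
    return (filters, s, s)


def first_fc_params(input_size, neurons=800, filters=64):
    """Weights plus biases of the first 800-neuron fully connected layer."""
    channels, height, width = cnn_feature_shape(input_size, filters)
    return channels * height * width * neurons + neurons


print(cnn_feature_shape(256))  # (64, 61, 61)
print(cnn_feature_shape(64))   # (64, 13, 13)
print(first_fc_params(64))     # 8653600
```

Under these assumptions, the first fully connected layer holds far more weights for 256 x 256 inputs than for 64 x 64 inputs, which illustrates why image size affects both memory use and training time.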

Table 1. Average Five-Fold Cross Validation Test Accuracies (%) with Brain Tumor Images Only

                                               Per Image Accuracy      Per Patient Accuracy
Image Size  Preprocessing    Network   Epochs  Last   Best   Best-PD   Last   Best   Best-PD
256 x 256   Vanilla          CNN       100     89.95  90.26  89.69     91.43  89.52  91.43
256 x 256   Vanilla          FCNN      100     87.30  87.32  87.46     86.67  85.71  86.67
256 x 256   Vanilla          ConcatNN  100     84.62  86.09  84.30     86.67  86.67  87.62
256 x 256   Tumor Locations  ConcatNN  100     85.66  85.96  85.80     87.62  87.62  89.52
207 x 312   Tumor Zoomed     CNN       100     88.99  88.16  88.99     88.57  88.57  88.57
69 x 69     Vanilla          CNN       100     81.70  82.46  81.44     79.05  82.86  79.05
45 x 45     CO               CNN       500     83.67  81.72  82.75     81.90  84.76  85.71
64 x 64     Vanilla          CNN       100     83.83  84.52  82.10     82.86  82.86  82.86
64 x 64     Vanilla          FCNN      100     80.86  80.43  81.30     77.14  76.19  77.14
196 x 196   Crop Averaging   CNN       100     86.77  87.65  88.16     82.86  83.81  84.76

Table 2. Average Five-Fold Cross Validation Test Accuracies (%) with Brain Tumor and Tumorless Brain Images

                                               Per Image Accuracy      Per Patient Accuracy
Image Size  Preprocessing    Network   Epochs  Last   Best   Best-PD   Last   Best   Best-PD
256 x 256   Vanilla          CNN       100     88.59  89.13  88.78     85.71  89.52  88.57
64 x 64     Vanilla          CNN       100     85.06  82.69  83.21     84.76  82.86  84.76
64 x 64     Vanilla          FCNN      100     84.51  86.30  84.05     82.86  84.76  81.90

The last layer of this path and the last fully connected layer from CNN were then concatenated together and connected to one last fully connected layer with 800 neurons before reaching the softmax layer from CNN. We will refer to this neural network as ConcatNN from now on in this paper.

5.6. Random Forests

Random Forests were created in 2001 by Breiman [11], and they are a combination of tree predictors where trees are dependent on randomly sampled independent vectors. Each tree is given features with minor amounts of perturbation in order to inject noise into the data, and noise is further injected at the model level through randomization of the attributes to split decisions on. While random forests are not neural networks, they have become a common technique in machine learning research in the medical field. Two separate prediction tests were conducted, one using only the brain tumor dataset and another using both the brain tumor dataset and the tumorless brain dataset.

5.7. Training

For each of the preprocessed datasets, patients were randomly placed into three sets for training, validation, and test with 149, 21, and 21 patients respectively. A patient represents all of a patient's images; this avoids mixing a patient's data across training and test sets, which would allow for easier predictions since a patient's images are similar in structure. The mean picture from training was subtracted from train, validation, and test in order to center the data. An example mean picture can be seen in Figure 3. This was found to produce higher accuracies than cases without subtraction of the mean picture. During training of the neural networks, training data was used for updating weights, while validation data gave a glimpse into how the neural network was improving over time. After the training phase was completed, the test data was then used to see how well the neural networks predicted types of tumors from new images.

A variety of hyperparameters are available to alter. We list the hyperparameters that produced the highest accuracies.

• Regularization constant: 0.014
• Learning rate: 0.0001
• Momentum constant: 0.9
• Batch size: 4 for non-augmented datasets, 128 for augmented datasets
• Epochs: 100 (and one 500), which was a compromise between accuracy and training time

5.7.1. Decaying Learning Rate

Rather than maintain a constant learning rate, a decaying learning rate was attempted in order to increase accuracies by decreasing the learning rate over time. However, each case with a decaying learning rate had significantly worse accuracies than the corresponding case without it.

5.7.2. Accuracy Metrics

Three different models were computed during validation in order to evaluate model performance on test data.

• Last: The trained model after the last epoch.
• Best: The model at the point of the best validation accuracy calculating per image accuracies.

• Best-PD: The model at the point of the best patient diagnosis validation accuracy calculating per patient accuracies.

For each of the above models, per image and per patient accuracies were used to evaluate test performance. The results from the test evaluations are shown in Tables 1 and 2.

6. Results

The accuracies for the conducted tests can be seen in Table 1 and Table 2. From these accuracies we can see that the Vanilla CNN with image size 256 x 256 using only the brain tumor dataset has the highest accuracy at 91.43%. Furthermore, per patient accuracies proved consistent with per image accuracies, implying consistent predictions across patient images. Even with the extra compute time from the increase in epochs for the CO CNN with image size 45 x 45, the larger images trained neural networks more accurately, producing 8% higher results. The weights from the top neural network's first convolutional layer can be seen in Figure 4. Minor structures representing low level features can be seen in each of these 5 x 5 weight regions. In order to further compare these models, the loss and accuracy history for training and validation sets were plotted for each model with five-fold cross validation (Figures 11 - 20). While the loss histories show that the models trained on 256 x 256 images overfit over time due to lack of examples, their accuracies were consistently in a higher range than those for smaller images.

When looking specifically at the precision at k for these models (Figures 5 - 10), nearly all models remained above 90%. Precision at k uses the top k predictions with the highest probabilities over all images. These are the images the neural networks were most confident in classifying. Precision consistently above 90% implies that the predictions with the highest probabilities were often correct for any neural network. Of particular note, any model using 256 x 256 images had a precision of 1.0 from k = 1 to 20. Feeding larger images into the neural network ensured higher success of classification for a model's top predictions.

When comparing neural networks that used tumorless brain datasets and neural networks that did not, there were mixed results. For images of smaller size, adding tumorless brains tied or increased accuracies by up to 2%. However, for images of larger size, tumorless brains appeared to produce slightly lower accuracies.

Analyzing the Vanilla CNN 256 x 256 for Brain Tumors Only neural network, which performed the best overall in per image accuracy and in per patient accuracy, we look into the confidence, precision, and sensitivity in Tables 3 - 6. As seen from Figure 5, Vanilla CNN 256 x 256 earned a perfect score for average precision at k for k equals 1 to 20. In Tables 3 and 4 we continue to increase k for each cross validation until the neural network makes an incorrect prediction, for per image and per patient accuracies respectively. For per image accuracy, the neural network on average reaches well over half of the test images before predicting an incorrect tumor type, with the best cross validation reaching 90% of images. For per patient accuracy, the neural network on average reaches over half of the test patients as well, with the best cross validation predicting 100% of the patients correctly.

To see how the Vanilla CNN 256 x 256 for Brain Tumors Only neural network performs on each tumor type, we break down the tumor types into meningioma, glioma, and pituitary to evaluate the precision and recall for the Best model for per image accuracy and the Last model for per patient accuracy, since these performed the best in their respective accuracy measures. These results can be seen in Tables 5 and 6 respectively. In Table 5, we can see that meningioma tumors were the most difficult to predict, with an average of 0.84 precision and 0.74 recall, while glioma and pituitary had precision and recall in the mid-90%s. In Table 6, tumor type precision and recall are approximately equal on average, with 93%, 93%, and 91% for meningioma, glioma, and pituitary tumors respectively.

Lastly, random forest was run on both the brain tumor dataset only and with tumorless brain images. The former and latter consistently gained averages close to 90%, with considerable speed up as compared to training neural networks. Using tumorless brain images did not affect the accuracies.

7. Conclusion and Future Work

Convolutional neural networks are the state of the art in computer vision, and introducing them into the medical field could greatly improve current practices of diagnosing patients. Training convolutional neural networks to detect types of tumors in brain images improves classification accuracy and provides initial steps toward introducing deep learning into medicine. Not only does this method produce equal or better results when compared to Cheng et al.'s initial work, but neural networks also utilize a more general methodology requiring only an image to understand brain tumor types. Furthermore, the accuracy per patient metric consistently remained at the levels of the per image accuracy results, implying the neural network is providing consistent predictions for patient images.

Future work can build upon this research by exploring neural networks that train on coronal and sagittal images. Furthermore, combining patient images across planes can not only increase dataset size but also provide further insights into

Figure 3. The average axial image across a training set.

Figure 4. The 64 filters learned in the first convolutional layer of the best-performing neural network.

Figure 5. Average precision at k, where k ranges from 1 to 20, for Last models for per image accuracy. Vanilla CNN and
Vanilla FCNN for 256 x 256 images and Crop Averaging for 196 x 196 images received perfect scores for k = 1 to 20.

Figure 6. Average precision at k, where k ranges from 1 to 20, for Best models for per image accuracy. Vanilla CNN and
Vanilla FCNN for 256 x 256 images and Crop Averaging for 196 x 196 images received perfect scores for k = 1 to 20.

Figure 7. Average precision at k, where k ranges from 1 to 20, for Best-PD models for per image accuracy. Vanilla CNN
and Vanilla FCNN for 256 x 256 images and Crop Averaging for 196 x 196 images received perfect scores for k = 1 to 20.

Figure 8. Average precision at k, where k ranges from 1 to 20, for Last models with tumorless brain images for per image
accuracy. Vanilla CNN for 256 x 256 images received a perfect score for k = 1 to 20.

Figure 9. Average precision at k, where k ranges from 1 to 20, for Best models with tumorless brain images for per image
accuracy. Vanilla CNN for 256 x 256 images received a perfect score for k = 1 to 20.

Figure 10. Average precision at k, where k ranges from 1 to 20, for Best-PD models with tumorless brain images for per
image accuracy. Vanilla CNN for 256 x 256 images received a perfect score for k = 1 to 20.
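
The precision at k plotted in Figures 5 - 10 can be sketched as follows: predictions are ranked by their highest softmax probability, the top k are kept, and the fraction of correct ones is reported. This is a minimal illustration only; the function name and the toy probabilities are assumptions, and averaging over the five folds is assumed to be a plain mean of the per-fold values.

```python
def precision_at_k(probs, labels, k):
    """Fraction of correct predictions among the k most confident ones (k >= 1).

    probs  : list of per-image softmax probability lists
    labels : list of true class indices
    """
    ranked = sorted(
        ((max(p), p.index(max(p)), y) for p, y in zip(probs, labels)),
        key=lambda item: item[0],
        reverse=True,
    )
    top = ranked[:k]
    return sum(pred == y for _, pred, y in top) / len(top)


# Toy example with three classes (values are made up for illustration).
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.40, 0.35, 0.25],
    [0.20, 0.20, 0.60],
]
labels = [0, 1, 1, 2]

print(precision_at_k(probs, labels, 3))  # 1.0
print(precision_at_k(probs, labels, 4))  # 0.75
```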

Figure 11. Loss and accuracy history for Vanilla FCNN with 256 x 256 images. The top graph represents loss over time,
the middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy over time.

Figure 12. Loss and accuracy history for Vanilla CNN with 69 x 69 images. The top graph represents loss over time, the
middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy over time.

Figure 13. Loss and accuracy history for CO CNN with 45 x 45 images. The top graph represents loss over time, the
middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy over time.

Figure 14. Loss and accuracy history for Vanilla CNN with 64 x 64 images. The top graph represents loss over time, the
middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy over time.

Figure 15. Loss and accuracy history for Vanilla CNN with 256 x 256 images. The top graph represents loss over time,
the middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy over time.

Figure 16. Loss and accuracy history for Vanilla FCNN with 64 x 64 images. The top graph represents loss over time, the
middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy over time.

Figure 17. Loss and accuracy history for Vanilla ConcatNN with 256 x 256 images. The top graph represents loss over
time, the middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy
over time.

Figure 18. Loss and accuracy history for Tumor Locations ConcatNN with 256 x 256 images. The top graph represents
loss over time, the middle graph represents per image accuracy over time, and the bottom graph represents per patient
accuracy over time.

Figure 19. Loss and accuracy history for Tumor Zoomed CNN with 207 x 312 images. The top graph represents loss over
time, the middle graph represents per image accuracy over time, and the bottom graph represents per patient accuracy
over time. This model was trained on validation data.

Figure 20. Loss and accuracy history for Crop Averaging CNN with 196 x 196 images. The top graph represents loss
over time, the middle graph represents per image accuracy over time, and the bottom graph represents per patient
accuracy over time.

Table 3. Number Correctly Predicted Per Image in Order of Confidence Until Incorrect Prediction for 256 x 256 Vanilla CNN

Cross Validation  Number of Images  Last   Best   Best-PD
1                 82                53     56     35
2                 111               23     22     23
3                 126               81     75     84
4                 119               107    110    106
5                 110               86     66     63
Average (a)       109.6             70     65.8   61.8

(a) Averaged over Five-Fold Cross Validation.

Table 4. Number Correctly Predicted Per Patient in Order of Confidence Until Incorrect Prediction for 256 x 256 Vanilla CNN

Cross Validation  Samples  Last   Best   Best-PD
1                 21       11     12     8
2                 21       1      9      1
3                 21       18     18     18
4                 21       21     21     21
5                 21       10     7      9
Average (a)       21       13.4   12.2   11.4

(a) Averaged over Five-Fold Cross Validation.
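
The counts in Tables 3 and 4 can be read as the length of the initial run of correct predictions once predictions are sorted from most to least confident. The sketch below illustrates that reading; the function name and the toy data are assumptions. For the per patient variant, each row would hold one patient's aggregated probabilities, and how those are aggregated is not specified here.

```python
def correct_streak(probs, labels):
    """Count correct predictions, in order of confidence, until the first error."""
    ranked = sorted(zip(probs, labels), key=lambda item: max(item[0]), reverse=True)
    count = 0
    for p, y in ranked:
        if p.index(max(p)) != y:
            break
        count += 1
    return count


# Toy example with three classes (values are made up for illustration).
probs = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.40, 0.35, 0.25],
    [0.20, 0.20, 0.60],
]
labels = [0, 1, 1, 2]

print(correct_streak(probs, labels))  # 3
```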

Table 5. Precision and Recall for Vanilla CNN 256 x 256 Best Per Image Model (b)

Tumor Type   Cross Validation  Precision  Recall  F1-Score  Support
Meningioma   1                 1.00       0.75    0.86      20
Meningioma   2                 0.84       0.62    0.71      34
Meningioma   3                 0.57       0.62    0.59      13
Meningioma   4                 1.00       0.88    0.94      26
Meningioma   5                 0.57       0.86    0.69      14
Meningioma   Avg / Total (a)   0.84       0.74    0.78      107
Glioma       1                 1.00       1.00    1.00      23
Glioma       2                 0.80       0.93    0.86      55
Glioma       3                 1.00       0.92    0.96      78
Glioma       4                 0.93       1.00    0.96      41
Glioma       5                 0.92       0.88    0.90      76
Glioma       Avg / Total (a)   0.93       0.93    0.93      273
Pituitary    1                 0.89       1.00    0.94      39
Pituitary    2                 1.00       1.00    1.00      22
Pituitary    3                 0.82       0.91    0.86      35
Pituitary    4                 1.00       1.00    1.00      52
Pituitary    5                 1.00       0.80    0.89      20
Pituitary    Avg / Total (a)   0.94       0.96    0.94      168

(a) Averaged over Five-Fold Cross Validation.
(b) Best Per Image Model represents the Best model using the per image accuracy.

Table 6. Precision and Recall for Vanilla CNN 256 x 256 Last Per Patient Model (b)

Tumor Type   Cross Validation  Precision  Recall  F1-Score  Support
Meningioma   1                 1.00       0.57    0.73      7
Meningioma   2                 0.88       0.88    0.88      8
Meningioma   3                 0.75       0.75    0.75      4
Meningioma   4                 1.00       1.00    1.00      5
Meningioma   5                 1.00       0.83    0.91      6
Meningioma   Average (a)       0.93       0.80    0.85      30
Glioma       1                 1.00       1.00    1.00      4
Glioma       2                 0.88       0.88    0.88      8
Glioma       3                 1.00       1.00    1.00      10
Glioma       4                 1.00       1.00    1.00      6
Glioma       5                 0.85       1.00    0.92      11
Glioma       Average (a)       0.93       0.98    0.95      39
Pituitary    1                 0.77       1.00    0.87      10
Pituitary    2                 1.00       1.00    1.00      5
Pituitary    3                 0.86       0.86    0.86      7
Pituitary    4                 1.00       1.00    1.00      10
Pituitary    5                 1.00       0.75    0.86      4
Pituitary    Average (a)       0.91       0.94    0.92      35

(a) Averaged over Five-Fold Cross Validation.
(b) Last Per Patient Model represents the Last model using the per patient accuracy.
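
The precision, recall, and F1-score columns of Tables 5 and 6 follow the standard per-class definitions, sketched below. The counts in the example (4 true positives, 0 false positives, 3 false negatives) are not reported in the tables; they are inferred as one set of counts consistent with the first meningioma row of Table 6 (support 7, precision 1.00, recall 0.57).

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Inferred counts for illustration: support = tp + fn = 7.
p, r, f1 = precision_recall_f1(tp=4, fp=0, fn=3)
print(f"{p:.2f} {r:.2f} {f1:.2f}")  # 1.00 0.57 0.73
```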

tumor type that is difficult to view from only one plane. This can particularly improve classification of meningioma tumors, which caused the neural networks the most difficulty. Lastly, decreasing image size greatly improved the efficiency of training neural networks. Improving performance on smaller images can have great benefits in training and in assisting doctors in the treatment of patients. Dealing with noisy, smaller images can help neural networks generalize to understand more complex brain images, which in turn can help doctors in their diagnosis.

8. References

1. Cheng J, Huang W, Cao S, Yang R, Yang W, et al. (2015) Correction: Enhanced Performance of Brain Tumor Classification via Tumor Region Augmentation and Partition. PLOS ONE 10(12): e0144479. doi: 10.1371/journal.pone.0144479

2. A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, Proc. Neural Information and Processing Systems, 2012.

3. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278-2324 (1998).

4. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315-323 (2011).

5. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Machine Learning Res. 15, 1929-1958 (2014).

6. Nesterov, Y. A method of solving a convex programming problem with convergence rate O(1/sqr(k)). Soviet Mathematics Doklady, 27:372-376, 1983.

7. Dieleman, S., Willett K., & Dambre J. Rotation-invariant convolutional neural networks for galaxy morphology prediction. MNRAS 450, 1441-1459 (2015).

8. Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comp. 18, 1527-1554 (2006).

9. I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning, pages 1319-1327. ACM, 2013.

10. JES Sklan, AJ Plassard, D Fabbri, BA Landman. Toward content based image retrieval with deep convolutional neural networks. SPIE Medical Imaging, 94172C-94172C-6.

11. Breiman, L. (2001). Random forests, Machine Learning 45: 5-32.

12. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315-323 (2011).

13. Nesterov, Y. A method of solving a convex programming problem with convergence rate O(1/sqr(k)). Soviet Mathematics Doklady, 27:372-376, 1983.

14. Lasko, T. et al. (2013) PLOS ONE;8:6
