MAG-Net: Multi-Task Attention Guided Network For Brain Tumor Segmentation and Classification
Sachin Gupta, Narinder Singh Punn, Sanjay Kumar Sonbhadra, and Sonali Agarwal
{mit2019075,pse2017002,rsi2017502,sonali}@iiita.ac.in
Abstract. Brain tumor is among the most common and deadliest diseases, and it can occur in all age groups. Radiologists generally adopt the MRI modality for identifying and diagnosing tumors. Correct identification of the tumor regions and the tumor type aids diagnosis and follow-up treatment planning. However, analysing such scans is a complex and time-consuming task for any radiologist. Motivated by deep learning based computer-aided diagnosis systems, this paper proposes a multi-task attention guided encoder-decoder network (MAG-Net) to classify and segment brain tumor regions in MRI images. MAG-Net is trained and evaluated on the Figshare dataset, which includes coronal, axial, and sagittal views with three tumor types: meningioma, glioma, and pituitary tumor. In exhaustive experimental trials, the model achieved promising results compared to existing state-of-the-art models while having the fewest training parameters among them.
1 Introduction
Brain tumor is considered the deadliest and most common form of cancer in both children and adults. Determining the correct type of brain tumor at an early stage is key to further diagnosis and treatment. However, for any radiologist, identifying and segmenting brain tumors in multi-sequence MRI scans for diagnosis, monitoring, and treatment are complex and time-consuming tasks.
Brain tumor segmentation is a challenging task because tumors vary widely in both structure and function. Furthermore, tumor intensity differs significantly from one patient to another. MRI is preferred over other imaging modalities [4] for the diagnosis of brain tumors because it is non-invasive, involves no exposure to ionizing radiation, and offers superior image contrast in soft tissues.
All authors have contributed equally.
2 Sachin et al.
Deep learning has advanced various fields with promising performance, especially biomedical image analysis [24]. Convolutional neural networks (CNNs) [2] are the most widely used models in image processing. CNNs combine convolution, pooling, and activation layers with normalization and regularization operations to extract and learn target-specific features for the desired task (classification, localization, segmentation, etc.). In recent years, various techniques have been proposed for identifying (classifying and segmenting) brain tumors in MRI images, with promising results [13, 30]. However, most of these approaches use millions of trainable parameters, which slows down training and analysis, and their results show high variance when data samples are limited.
To overcome these drawbacks, Ronneberger et al. [26] proposed the U-shaped network (U-Net) for biomedical image segmentation. The model follows an encoder-decoder design with a feature extraction phase (contraction path) and a reconstruction phase (expansion path). In addition, skip connections propagate the extracted feature maps to the corresponding reconstruction stage to aid upsampling. Finally, the model produces a segmentation mask with the same dimensions as the input, highlighting the target structure (the tumor in our case). Following the state-of-the-art potential of U-Net, many U-Net variants have been proposed to further improve segmentation performance. The attention based U-Net [19] is one such variant; it draws the focus of the model towards the target features to achieve better segmentation results. Attention filters are introduced in the skip connections, where each feature is assigned a weight coefficient that highlights its importance towards the target features. Despite their promising results, these models have millions of trainable parameters, which can be reduced by optimizing the convolution operation. This can be achieved with depthwise separable convolution [8], which is performed in two stages: depthwise and pointwise convolutions. Compared to a standard convolution, a depthwise separable convolution needs $1/r + 1/f^2$ times the number of parameters and multiplications, where $r$ is the depth of the output feature map and $f$ is the kernel height or width [21]. The achieved reduction in the number of parameters and multiplications is ∼80%. In this context, an attention guided network is proposed that uses depthwise separable convolution for real-time segmentation and classification of brain tumors in MRI images. The major contributions of the present research work are as follows:
The rest of the paper is organized as follows: Section 2 describes the crux of related work on brain tumor segmentation and classification. Section 3 presents the proposed architecture, whereas Section 4 discusses the training and testing environment with experimental and comparative analysis. Finally, concluding remarks are presented in Section 5.
2 Literature review
Identifying brain tumors is a challenging task for radiologists. Recently, several deep learning based approaches have been proposed to aid faster diagnosis. Segmentation of the infected region is the most common and critical practice in diagnosis. In addition, the segmented region can be assigned a label (classification) indicating the type of anomaly or infection present in the image.
In contrast to the traditional approaches, Cheng et al. [7] proposed a brain tumor classification approach that uses the augmented tumor region instead of the original tumor region as the RoI (region of interest). The authors utilized the bag-of-words (BoW) technique to segment and extract local features from the RoI. A dictionary is used to encode the extracted local features, which are then passed to an SVM (support vector machine) classifier. The approach outperformed traditional classification techniques with an accuracy of 91.28%, but its performance is limited by data availability. In a similar work, Ismael et al. [14] proposed combining statistical features with neural networks using a filter combination: the discrete wavelet transform (DWT), represented by wavelet coefficients, and the Gabor filter, for texture representation. To classify the tumor, a three-layer neural network classifier is developed using a multilayer perceptron trained on the statistical features. In contrast to Cheng et al. [7], the authors also achieved promising results on limited data samples, with an overall accuracy of 91.9%.
Recently, the capsule network [12] has shown great performance in many fields, especially in biomedical image processing. Afshar et al. [1] proposed a basic CapsNet with three capsules in the last layer, representing the three tumor classes. However, due to the varied characteristics (background, intensity, structure, etc.) of MRI images, the proposed model failed to extract optimal features representing the tumor structure. The authors achieved tumor classification accuracies of 78% and 86.5% using raw MRI images and tumor-segmented MRI images, respectively. In another approach, Pashaei et al. [20] utilized a CNN and a kernel extreme learning machine comprising one hidden layer with 100 neurons to increase the robustness of the model. With several experimental trials, the authors achieved an accuracy of 93.68%, but the model detected only 1% of the positive pituitary tumor cases. Deepak et al. [9] proposed a transfer learning approach that uses a pre-trained GoogLeNet model to extract features (referred to as deep CNN features), with a softmax classifier in the output layer to classify the three tumor classes. Furthermore, the authors combined the deep CNN features with an SVM model to analyse the classification performance. The authors achieved 97.1% accuracy but
3 Proposed work
The proposed multi-task attention guided network (MAG-Net) model, shown in Fig. 1, focuses on reducing the overall computation, improving feature extraction, and reducing the number of training parameters. The architecture consists of an encoder, a decoder, and a classification module, with 5.4M trainable parameters in total. The overall design is inspired by the U-Net encoder-decoder style [23]. Owing to its state-of-the-art potential, U-Net is the most prominent choice among researchers for biomedical image segmentation [17].
In MAG-Net, to reduce the number of training parameters without sacrificing performance, standard convolution operations are replaced with depthwise separable convolutions. In addition, the skip connections are equipped with attention filters [19] to better extract the feature maps concerning the tumor regions. The attention approach filters out irrelevant feature maps in the skip connections by assigning weights that highlight their importance towards the tumor regions. Besides, the encoder block is equipped with parallel separable convolution filters of different sizes, whose extracted feature maps are concatenated for better feature learning. These features are then passed to the corresponding decoder blocks via attention enabled skip connections to aid feature reconstruction with the help of upsampling. The bottleneck layer connects the feature extraction path to the feature reconstruction path; in this layer, filters of different sizes are used along with layer normalization. Furthermore, classification is performed using the feature maps extracted by the final encoder block.
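The parameter saving that motivates replacing standard convolutions can be checked with a short calculation. The sketch below is a minimal illustration, not MAG-Net's implementation: it counts the weights of one standard $f \times f$ convolution and of its depthwise separable counterpart, ignoring biases. The input depth `d_in` and the example values are hypothetical. Because both layers are applied at every output position, the same ratio also holds for multiplications.

```python
def standard_conv_params(f: int, d_in: int, r: int) -> int:
    """Weights of a standard f x f convolution mapping d_in -> r channels (no bias)."""
    return f * f * d_in * r


def separable_conv_params(f: int, d_in: int, r: int) -> int:
    """Weights of a depthwise f x f convolution plus a pointwise 1 x 1 convolution (no bias)."""
    depthwise = f * f * d_in   # one f x f filter per input channel
    pointwise = d_in * r       # 1 x 1 convolution mixing d_in channels into r outputs
    return depthwise + pointwise


def cost_ratio(f: int, d_in: int, r: int) -> float:
    """Separable cost relative to standard cost; algebraically equal to 1/r + 1/f**2."""
    return separable_conv_params(f, d_in, r) / standard_conv_params(f, d_in, r)


if __name__ == "__main__":
    f, d_in, r = 3, 64, 128  # hypothetical layer sizes
    print("standard: ", standard_conv_params(f, d_in, r))
    print("separable:", separable_conv_params(f, d_in, r))
    print(f"ratio {cost_ratio(f, d_in, r):.4f} vs 1/r + 1/f^2 = {1 / r + 1 / f**2:.4f}")
```

With these example sizes the separable layer needs 8,768 weights instead of 73,728; the exact saving depends on $f$ and $r$.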
3.1 Encoder
To capture tumors of varying shape and size, separable convolutions with different kernel sizes are required. Inspired by the inception neural network [22], the encoder consists of separable convolutions with 1 × 1, 3 × 3, and 5 × 5 kernels, each followed by layer normalization. The extracted feature maps are fused by an element-wise add operation and then downsampled by max pooling. Fig. 2 shows the proposed encoder architecture of the MAG-Net model for an input feature map $F_i \in \mathbb{R}^{w \times h \times d}$, where $w$, $h$, and $d$ are the width, height, and depth of the feature map.
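The shape and parameter bookkeeping of one encoder block can be sketched as follows, as a rough illustration under stated assumptions rather than the authors' code. It assumes that every branch emits the same number of channels `n` (needed for element-wise addition), uses 'same' padding, includes a bias in each pointwise convolution, applies layer normalization over the channel axis with a learnable scale and shift, and ends with 2 × 2 max pooling. The overview above mentions concatenating the branch outputs, whereas this subsection describes addition; the sketch follows the addition.

```python
from typing import Tuple

KERNEL_SIZES = (1, 3, 5)  # parallel separable-convolution branches of the encoder


def separable_branch_params(k: int, d: int, n: int) -> int:
    """Depthwise k x k filter per input channel + pointwise 1 x 1 (d -> n) with bias."""
    return k * k * d + d * n + n


def layer_norm_params(n: int) -> int:
    """Assumption: normalisation over the channel axis with learnable scale and shift."""
    return 2 * n


def encoder_block_summary(w: int, h: int, d: int, n: int,
                          pool: int = 2) -> Tuple[Tuple[int, int, int], int]:
    """Return (output shape, trainable parameters) of one hypothetical encoder block.

    Each branch keeps the w x h size ('same' padding) and emits n channels, so the
    three normalised maps can be fused by element-wise addition (no extra weights).
    Max pooling with a pool x pool window then divides the spatial size by pool.
    """
    params = sum(separable_branch_params(k, d, n) + layer_norm_params(n)
                 for k in KERNEL_SIZES)
    return (w // pool, h // pool, n), params


if __name__ == "__main__":
    shape, params = encoder_block_summary(256, 256, 1, 16)  # hypothetical input size
    print(shape, params)
```

For a single-channel 256 × 256 input (an arbitrary example size) and `n = 16`, the block yields a 128 × 128 × 16 map with 227 trainable parameters; the addition itself adds no weights, which is one reason it is cheaper than concatenation followed by another convolution.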
3.2 Decoder
The decoder follows the encoder and reconstructs the spatial dimensions to generate an output mask with the same dimensions as the input. It upsamples the feature maps, concatenates them with the attention maps, and applies a separable convolution. Long skip connections [10] propagate the attention feature maps from the encoder to the decoder to recover the spatial information lost during downsampling in the encoder. Using attention in the skip connections helps the model suppress irrelevant features.
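The attention filters in the skip connections are taken here to be the additive attention gates of the attention U-Net [19]. That reading, the simplified form (the gating signal `g` already resampled to the resolution of the skip features `x`, a single shared bias `b`), and all weight names are assumptions. The sketch uses plain Python lists so that the per-pixel computation stays visible: both inputs are projected by 1 × 1 convolutions, added, passed through ReLU, reduced to one score by `psi`, and squashed by a sigmoid into a coefficient alpha that rescales the skip features.

```python
import math
from typing import List, Tuple

Vector = List[float]
FeatureMap = List[List[Vector]]  # fmap[row][col] is the channel vector at that pixel


def pointwise(v: Vector, weights: List[Vector]) -> Vector:
    """1 x 1 convolution at one pixel: weights[k] are the input weights of output channel k."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def attention_gate(x: FeatureMap, g: FeatureMap,
                   w_x: List[Vector], w_g: List[Vector], b: Vector,
                   psi: Vector, b_psi: float) -> Tuple[FeatureMap, List[List[float]]]:
    """Additive attention gate; x (skip features) and g (gating signal) share one grid.

    Returns the gated skip features alpha * x and the coefficients alpha in (0, 1).
    """
    gated: FeatureMap = []
    alphas: List[List[float]] = []
    for x_row, g_row in zip(x, g):
        gated_row, alpha_row = [], []
        for xv, gv in zip(x_row, g_row):
            # Project both inputs to the intermediate space, add them, apply ReLU.
            hidden = [max(0.0, a + c + bias)
                      for a, c, bias in zip(pointwise(xv, w_x), pointwise(gv, w_g), b)]
            # Collapse to one score per pixel and squash it into an attention weight.
            alpha = sigmoid(sum(p * h for p, h in zip(psi, hidden)) + b_psi)
            gated_row.append([alpha * value for value in xv])  # same weight on every channel
            alpha_row.append(alpha)
        gated.append(gated_row)
        alphas.append(alpha_row)
    return gated, alphas


if __name__ == "__main__":
    x = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [2.0, 0.0]]]  # 2 x 2 map, 2 channels
    g = [[[1.0], [0.0]], [[0.0], [1.0]]]                      # 2 x 2 map, 1 channel
    _, alphas = attention_gate(x, g, w_x=[[1.0, 0.0], [0.0, 1.0]], w_g=[[1.0], [-1.0]],
                               b=[0.0, 0.0], psi=[1.0, 1.0], b_psi=0.0)
    print(alphas)
```

In a real network these steps would be vectorised convolutions over whole tensors; the loops only spell out what happens at each pixel. Pixels with a low alpha contribute little to the decoder, which is how irrelevant features are suppressed.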
3.3 Classification
This module classifies brain tumor MRI images into the respective classes, i.e., meningioma, glioma, and pituitary tumor, using the features extracted by the encoder. The encoder acts as the backbone for both classification and segmentation, thereby reducing the overall complexity of the model. In the classification block, the feature maps of the last encoder block act as input and are transformed into a 1D tensor by global average pooling. The pooled features are then processed by multiple fully connected layers. The classification output is generated by a softmax-activated layer that produces the probability distribution over the tumor classes for an image.
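A minimal sketch of this classification head in plain Python is shown below. It assumes a single hidden fully connected layer with ReLU (the text mentions multiple layers; their number, sizes, and activations are not given here), and all weights are hypothetical. The head reduces an h × w × c feature map to a c-dimensional vector by global average pooling, applies the dense layers, and ends with a softmax over the three tumor classes.

```python
import math
from typing import Dict, List

Vector = List[float]
CLASSES = ("meningioma", "glioma", "pituitary")


def global_average_pool(fmap: List[List[Vector]]) -> Vector:
    """Average each channel over all pixels: an h x w x c map becomes a length-c vector."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [sum(fmap[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]


def dense(v: Vector, weights: List[Vector], bias: Vector, relu: bool = False) -> Vector:
    """Fully connected layer: weights[k] are the input weights of output unit k."""
    out = [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]
    return [max(0.0, o) for o in out] if relu else out


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(zi - m) for zi in z]
    total = sum(e)
    return [ei / total for ei in e]


def classify(fmap: List[List[Vector]], hidden_w: List[Vector], hidden_b: Vector,
             out_w: List[Vector], out_b: Vector) -> Dict[str, float]:
    """Hypothetical head: GAP -> dense + ReLU -> dense -> softmax over CLASSES."""
    pooled = global_average_pool(fmap)
    hidden = dense(pooled, hidden_w, hidden_b, relu=True)
    return dict(zip(CLASSES, softmax(dense(hidden, out_w, out_b))))


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # toy 2 x 2 x 2 map
    probs = classify(fmap, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0],
                     [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
    print(probs)
```

Global average pooling makes the head independent of the spatial size of the last encoder output and adds no weights of its own, which fits the goal of keeping the parameter count low.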
Fig. 3. A slice of MRI scan with T1 modality showing different tumor classes: meningioma, glioma, and pituitary
Binary cross entropy (BCE, shown in Eq. 1) is a sigmoid activation [16] followed by a cross-entropy loss [29] that compares each predicted probability with the actual output. Categorical cross entropy (CE, shown in Eq. 2) is a softmax activation followed by a cross-entropy loss that compares the output probability of each tumor class for each MRI image.
$$L_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left(y_i \log p(y_i) + (1 - y_i)\log\left(1 - p(y_i)\right)\right) \qquad (1)$$

where $y$ represents the actual tumor mask, $p(y)$ represents the predicted tumor mask, and $N$ is the total number of images.
$$L_{CE} = -\sum_{i=1}^{C} t_i \log\left(f(s_i)\right) \qquad (2)$$

where $C$ is the number of classes, $f(s_i)$ is the predicted (softmax) probability of class $i$, and $t_i$ is 1 for the true label and 0 otherwise.
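Both losses can be written in a few lines of plain Python, as a minimal sketch rather than the training code. Probabilities are clipped by a small constant `EPS` (an assumption) so that `log()` stays finite, and the categorical loss carries the leading minus sign that makes it non-negative. For the binary loss, each entry of the input lists is treated as one $(y_i, p(y_i))$ pair; for segmentation masks these would typically be flattened pixels, which is also an assumption.

```python
import math
from typing import Sequence

EPS = 1e-7  # assumed clipping constant that keeps log() finite


def binary_cross_entropy(y_true: Sequence[float], p_pred: Sequence[float]) -> float:
    """Mean BCE over N pairs (y_i, p(y_i)) with y_i in {0, 1} and p(y_i) in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, EPS), 1.0 - EPS)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def categorical_cross_entropy(t: Sequence[float], f_s: Sequence[float]) -> float:
    """CE for one image: t is the one-hot label, f_s the softmax probabilities of C classes."""
    return -sum(ti * math.log(max(fi, EPS)) for ti, fi in zip(t, f_s))


if __name__ == "__main__":
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))            # confident and correct
    print(categorical_cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))
```

Only the probability assigned to the true class enters the categorical loss, so it equals $-\log 0.7$ in the example above.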
For segmentation, the most popular evaluation metrics are the Dice coefficient (shown in Eq. 3) and intersection-over-union (IoU, or Jaccard index, shown in Eq. 4); hence they are used to evaluate the trained MAG-Net model. TP denotes correctly predicted (true positive) voxels, FP wrongly predicted (false positive) voxels, and FN missed (false negative) voxels.

$$\text{Dice coefficient} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \qquad (3)$$

$$\text{IoU} = \frac{TP}{TP + FP + FN} \qquad (4)$$
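The Dice coefficient and IoU can be computed from the same three counts. The sketch below works on flattened binary masks (1 = tumor, 0 = background); returning 1.0 when both masks are empty is an assumed convention, not something stated in the text.

```python
from typing import Sequence, Tuple


def confusion_counts(mask_true: Sequence[int], mask_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Count TP, FP and FN over flattened binary masks (1 = tumor, 0 = background)."""
    tp = sum(1 for t, p in zip(mask_true, mask_pred) if t and p)
    fp = sum(1 for t, p in zip(mask_true, mask_pred) if p and not t)
    fn = sum(1 for t, p in zip(mask_true, mask_pred) if t and not p)
    return tp, fp, fn


def dice_coefficient(mask_true: Sequence[int], mask_pred: Sequence[int]) -> float:
    tp, fp, fn = confusion_counts(mask_true, mask_pred)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom  # assumed: two empty masks match perfectly


def iou(mask_true: Sequence[int], mask_pred: Sequence[int]) -> float:
    tp, fp, fn = confusion_counts(mask_true, mask_pred)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0]
    print(dice_coefficient(truth, pred), iou(truth, pred))
```

For a single mask the two scores are linked by Dice = 2·IoU / (1 + IoU), so the Dice coefficient is never smaller than IoU; in the toy example above they are 2/3 and 1/2.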
To evaluate the classification module of MAG-Net, accuracy, precision, recall, F1-score, and micro-average metrics are considered for better quantification and visualization of the model's performance. Precision of a class, shown in Eq. 5, quantifies the accuracy of the model's positive predictions. Recall, shown in Eq. 6, is the fraction of actual positives that are correctly predicted. F1-score, shown in Eq. 7, is the harmonic mean of precision and recall. Support is the number of true occurrences of the respective class in the dataset. The micro average ($\mu_{avg}$, shown in Eq. 8, Eq. 9, and Eq. 10) is calculated for precision, recall, and F1-score. To compute the micro average, the test dataset is divided into two subsets, on each of which the true positive, false positive, and false negative predictions are identified.
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (5)$$

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (6)$$

$$\text{F1-score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}} \qquad (7)$$
$$\mu_{avg}(\text{Precision}) = \frac{TP_1 + TP_2}{TP_1 + TP_2 + FP_1 + FP_2} \qquad (8)$$

$$\mu_{avg}(\text{Recall}) = \frac{TP_1 + TP_2}{TP_1 + TP_2 + FN_1 + FN_2} \qquad (9)$$
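The classification metrics and the micro average can be sketched in plain Python. Eqs. 8 and 9 pool the counts of two subsets; the sketch pools any number of groups, here one group per tumor class, which is an assumption. Recall uses the standard definition TP / (TP + FN), the micro-averaged F1-score is taken as the harmonic mean of the micro-averaged precision and recall (also an assumption), and returning 0.0 for an empty denominator is an assumed convention.

```python
from typing import Dict, Iterable, Sequence, Tuple

Counts = Tuple[int, int, int]  # (TP, FP, FN)


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0  # assumed 0.0 when nothing was predicted


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0  # assumed 0.0 when the class never occurs


def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def per_class_counts(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> Dict[str, Counts]:
    """One-vs-rest TP, FP and FN for every class from single-label predictions."""
    pairs = list(zip(y_true, y_pred))
    counts = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        counts[c] = (tp, fp, fn)
    return counts


def micro_average(groups: Iterable[Counts]) -> Tuple[float, float, float]:
    """Pool TP, FP and FN over all groups first, then compute precision, recall and F1."""
    groups = list(groups)
    tp = sum(g[0] for g in groups)
    fp = sum(g[1] for g in groups)
    fn = sum(g[2] for g in groups)
    p, r = precision(tp, fp), recall(tp, fn)
    return p, r, f1_score(p, r)


if __name__ == "__main__":
    classes = ("meningioma", "glioma", "pituitary")
    y_true = ["meningioma", "glioma", "pituitary", "glioma", "meningioma", "pituitary"]
    y_pred = ["meningioma", "glioma", "glioma", "glioma", "pituitary", "pituitary"]
    counts = per_class_counts(y_true, y_pred, classes)
    print(counts)
    print(micro_average(counts.values()))
```

For single-label multi-class predictions, every false positive of one class is a false negative of another, so the pooled precision, recall, and F1-score all equal the overall accuracy (4/6 in the toy example).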
Fig. 4. Qualitative results of brain tumor segmentation and classification on MRI images (a, b, and c of different tumor classes) using MAG-Net model.
4.3 Results
For a given MRI image, MAG-Net outputs the segmentation mask of the tumor region together with the predicted tumor class: meningioma, glioma, or pituitary. Fig. 4 presents the segmentation and classification results of the model for randomly chosen MRI slices. Visually, the results are close to the ground truth of the respective tumor classes.
Table 1. Comparative analysis of MAG-Net with existing segmentation models on the test dataset.
Table 1 presents the segmentation results of the proposed work in terms of accuracy, loss, Dice coefficient, Jaccard index, and trainable parameters, along with a comparative analysis against other popular approaches. The proposed framework outperforms the other approaches in segmenting tumors, with Dice and IoU scores of 0.74 and 0.60, respectively. In contrast to the other models, MAG-Net achieved the best results with the fewest trainable parameters. The other popular approaches used in the comparative analysis of segmentation are U-Net [15], U-Net++ [18, 31], and wU-Net [5].
Table 2. Comparative analysis of MAG-Net with existing classification models on the test dataset using the confusion matrix.
Table 2 and Table 3 present the classification results of the proposed work in terms of accuracy, loss, confusion matrix, and classification report for meningioma, glioma, and pituitary tumor, along with a comparative analysis against other state-of-the-art approaches: VGG-16 [28], VGG-19 [28], and ResNet50 [25]. In exhaustive experimental trials, MAG-Net outperformed the existing approaches by a significant margin on all metrics.
Table 3. Comparative analysis of MAG-Net with existing classification models on the test dataset, using the classification report as the evaluation parameter.
5 Conclusion
In this paper, the complex task of brain tumor segmentation and classification is addressed using the multi-task attention guided network (MAG-Net). It is a U-Net based model that reduces the overall computation, improves feature extraction, and optimizes the number of training parameters. The proposed architecture achieved strong performance on the Figshare brain tumor dataset by exploiting the state-of-the-art advantages of U-Net, depthwise separable convolution, and the attention mechanism. MAG-Net recorded the best classification and segmentation results compared to the existing classification and segmentation approaches. It is believed that this work can also be extended to other domains involving classification and segmentation tasks.
Acknowledgment
References
1. Afshar, P., Mohammadi, A., Plataniotis, K.N.: Brain tumor type classification via capsule networks. In: 2018 25th IEEE International Conference on Image Processing (ICIP). pp. 3129–3133. IEEE (2018)
2. Albawi, S., Mohammed, T.A., Al-Zawi, S.: Understanding of a convolutional neural network. In: 2017 International Conference on Engineering and Technology (ICET). pp. 1–6. IEEE (2017)
3. Brownlee, J.: Use early stopping to halt the training of neural networks at the right time. https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/ (2018), [Online; accessed April 17, 2021]
4. Cancer.Net: Brain tumor: Diagnosis. https://www.cancer.net/cancer-types/brain-tumor/diagnosis (2020), [Online; accessed March 20, 2021]
5. CarryHJR: Nested unet. https://github.com/CarryHJR/Nested-UNet/blob/master/model.py (2020), [Online; accessed March 11, 2021]
6. Cheng, J.: brain tumor dataset (4 2017), https://figshare.com/articles/dataset/brain_tumor_dataset/1512427
7. Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z., Wang, Z., Feng, Q.:
Enhanced performance of brain tumor classification via tumor region augmentation
and partition. PloS one 10(10), e0140381 (2015)
8. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In:
Proceedings of the IEEE conference on computer vision and pattern recognition.
pp. 1251–1258 (2017)
9. Deepak, S., Ameer, P.: Brain tumor classification using deep cnn features via transfer learning. Computers in biology and medicine 111, 103345 (2019)
10. Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., Pal, C.: The importance
of skip connections in biomedical image segmentation. In: Deep learning and data
labeling for medical applications, pp. 179–187. Springer (2016)
30. Zhou, T., Ruan, S., Canu, S.: A review: Deep learning for medical image segmentation using multi-modality fusion. Array 3, 100004 (2019)
31. Zhou, Z., Siddiquee, M., Tajbakhsh, N., Liang, J.U.: A nested u-net architecture
for medical image segmentation. arXiv preprint arXiv:1807.10165 (2018)