Received 29 December 2022, accepted 10 January 2023, date of publication 16 January 2023, date of current version 23 January 2023.

Digital Object Identifier 10.1109/ACCESS.2023.3237542

SSCLNet: A Self-Supervised Contrastive Loss-Based Pre-Trained Network for Brain MRI Classification
ANIMESH MISHRA, RITESH JHA, AND VANDANA BHATTACHARJEE
Birla Institute of Technology, Mesra, Ranchi 835215, India
Corresponding author: Vandana Bhattacharjee ([email protected])

ABSTRACT Brain magnetic resonance images (MRI) convey vital information for making diagnostic
decisions and are widely used to detect brain tumors. This research proposes a self-supervised pre-training
method based on feature representation learning through contrastive loss applied to unlabeled data. Self-
supervised learning aims to understand vital features using the raw input, which is helpful since labeled data is
scarce and expensive. For the contrastive loss-based pre-training, data augmentation is applied to the dataset,
and positive and negative instance pairs are fed into a deep learning model for feature learning. Subsequently,
the features are passed through a neural network model to maximize similarity and contrastive learning of
the instances. This pre-trained model serves as an encoder for supervised training and then the classification
of MRI images. Our results show that self-supervised pre-training with contrastive loss performs better than
random or ImageNet initialization. We also show that contrastive learning performs better when the
pre-training dataset contains more diverse images. We use three differently sized ResNet models as the
base models. Further, experiments were also conducted to study the effect of changing the augmentation
types for generating positive and negative samples for self-supervised training.

INDEX TERMS Contrastive learning, convolutional neural networks, pre-training, ResNet, self-supervised.

The associate editor coordinating the review of this manuscript and approving it for publication was Gustavo Callico.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 11, 2023 6673
A. Mishra et al.: SSCLNet: A Self-Supervised Contrastive Loss-Based Pre-Trained Network for Brain MRI Classification

I. INTRODUCTION
The brain is a complex part of the human body, and any abnormality can affect an individual’s health [1]. A brain tumor is an abnormal and uncontrolled growth of human brain cells. Brain tumors are classified as benign or malignant, or as pituitary, meningioma, or glioma [2], [3]. Invasive approaches such as biopsy or noninvasive methods such as magnetic resonance imaging (MRI), positron emission tomography, and computed tomography are used for detecting brain tumors. Among these, MRI is the most preferred technique because it captures detailed information about the tumor’s location, progression, shape, and size. To assist a doctor’s diagnostic decisions, several researchers have proposed computer-aided systems using machine learning and deep learning methods [4], [5], [6]. Further, several researchers have applied deep learning methods to solve complex problems; for example, the authors in [7] use an interpretable deep neural network architecture on sequence-based encoding features for discriminating the Adaptor Protein complexes. Kha et al. [8] proposed a novel model constructed using a convolutional neural network (CNN) and position-specific scoring matrix (PSSM) profiles for identification of SNARE proteins.
Deep learning methods such as convolutional neural networks (CNN) do not need manually handcrafted features. They have shown exemplary performance in computer vision on large, labeled datasets such as ImageNet [9]. Such deep models may not be suitable for the medical imaging field, where the sample size of the dataset is usually small.
Several researchers have used pre-trained CNN models to overcome this issue and adopted transfer learning and fine-tuning approaches [10], [11], [12], [13]. However, all these approaches apply supervised classification and require a labeled dataset, in the absence of which several researchers have used unsupervised or self-supervised learning. Representation learning through contrastive learning is one such approach; the main idea behind the approach is to learn the representation function by creating augmentations for each

data point. Then contrastive loss is applied to maximize the similarity between the data point and its augmentation. At the same time, the similarity between the data point and other samples is minimized. For example, it has been shown [14] that distributions of augmentations of different dog images tend to be similar. Still, their union has little overlap with distributions of augmentations of cat images.
In this work, we apply self-supervised learning based on contrastive loss for brain MRI classification using unlabeled data for pre-training the model. The base architecture used in our experiments is ResNet [15]. Deep networks naturally capture low- and high-level features [16] and classifiers in an end-to-end multilayer fashion, which is further enhanced by increasing the number of layers. However, deep networks come with the problem of vanishing/exploding gradients, and several researchers have addressed these problems by normalized initialization or intermediate normalization layers [17], [18], [19]. A degradation problem also arises when deeper networks start converging: accuracy gets saturated and then degrades rapidly [20]. He et al. [15] present a residual learning framework to ease the training of deep networks and address the degradation problem. These ResNet architectures form the base models for our work.
We propose SSCLNet: A Self Supervised Contrastive Loss based pre-trained Network for Brain MRI classification. The augmented data points are input as positive and negative instances to a deep neural network for label feature learning. The learned features are further passed through a neural network for contrastive learning of instances. Supervised training is applied with a small percentage of labeled data. Finally, the classification is performed using the learned features. We performed numerous experiments with different ResNet architectures and varied the ratio of labeled data used for supervised training. The proposed technique was applied to brain MRI datasets. The major contributions of this paper are as follows: (i) Pre-training a model by self-supervised contrastive learning for Brain MRI classification, and (ii) Performance analysis by varying the percentage of labeled data used for supervised training and changing the augmentation types. The advantage of our work over existing approaches is that the network can learn better features for downstream classification tasks by pre-training. Thus, the proposed SSCLNet shows performance comparable to other methods.
The rest of the paper is organized as follows: Section II presents the related work, and Section III presents the methods. Section IV explains the datasets, the implementation details, and the evaluation metrics. Section V presents the results, and Section VI concludes the paper. Our results show that self-supervised pre-training with unlabeled Brain MRI scans improves task performance.

II. RELATED WORK
Computer-aided diagnostic systems have long sought unsupervised learning since labeled data is scarce and expensive, especially in medical image analysis. Over the last decade, deep unsupervised feature learning has been explored to learn the informative representations of images. Most deep unsupervised learning methods aim to learn the feature representations that can reconstruct the inputs themselves, such as the auto-encoder (AE), the sparse auto-encoder (SAE) [21], the denoising auto-encoder (DAE) [22], and the deconvolutional network (DeCNN) [23]. Mishra et al. [24] have applied a semi-supervised approach for generating pseudo labels for classification. Further, deep generative models, including the auto-encoding variational Bayes (AEVB) [25] and the generative adversarial network (GAN) [26], have been proposed to encode visual information. Generative adversarial networks have also been utilized for tissue and cell-level categorization, while sparse and variational autoencoders have been employed for unsupervised nuclei detection and transfer learning [27], [28], [29], [30].
Nevertheless, generative models primarily work in the pixel space, which is not scalable. On the other hand, contrastive discriminative methods operate on augmentations of the data point and hence are computationally less expensive. Recent successes in computer vision challenges have been attained by contrastive methods based on learning latent-space features by differentiating between unlabeled training data. Such contrastive learning techniques presuppose that two views of the same image should have comparable feature representations when subjected to minor modifications [31], [32]. The consistency assumption has been exploited by Dosovitskiy et al. to obtain a parametric feature representation for each training instance [33]. Later, Wu et al. [34] extended this work into a non-parametric feature representation using a dynamic memory bank to store latent features of data samples. Any image that is not an augmentation of the original training instance is deemed negative, and the memory bank is utilized to choose negative instances for each training instance. Then, without having to recompute feature vectors, negative samples are obtained using the memory bank. By maximizing the mutual information between latent representations of positives, simple image augmentations (such as resizing images, horizontal flips, color jittering, etc.) and memory banks have successfully learned representations [35], [36]. Ciga et al. [37] have applied self-supervised contrastive loss for digital histopathology datasets. Bootstrap Your Own Latent (BYOL), a novel method for self-supervised image representation learning, is proposed by Grill et al. [38]. It is based on two neural networks, the online and target networks, that communicate with and learn from one another. Contrastive learning’s fundamental premise is to transform the original data into a feature space where positive pair similarities are maximized and those of negative pairs are minimized [39]. The positive and negative pairs are treated as priors in early works. Large numbers of data pairs are essential to the effectiveness of contrastive models, as demonstrated by numerous studies [40]. For contrastive learning, several loss functions have been put forward. The distance between an anchor and a positive is minimized while the

distance between an anchor and a negative is increased, for instance, in the case of triplet loss [41]. Nonlinear logistic regression is used in Noise Contrastive Estimation [42] to distinguish between the observed data and some generated noise. SimCLR, a contrastive learning strategy proposed by Chen et al. [43], [44], depends on a large number of mini-batch instances to obtain negative samples for each training instance rather than a custom network or memory bank. Consequently, by supplying more negative samples per training instance over training epochs, the quality of the learnt representations was improved.

III. METHODS
The proposed work is based upon the SimCLR approach of Chen et al. [43] and applies contrastive learning of instances for pre-training of the network. Data augmentation operators are applied to data points, then a base encoder learns representations, which are fed into a neural network that maps these representations to a feature space by maximizing agreement between positive examples, as illustrated in Figure 1. The SSCLNet architecture is split into three blocks – the Label Feature Generation (LFG) Block, the Instance Level Contrastive Learning (ILCL) Block, and finally, the Supervised Classification (SC) Block.

A. LABEL FEATURE GENERATION (LFG) BLOCK
The proposed framework uses data augmentation to construct data pairs. Given a data instance $d_i$, two transformations $\Gamma^a$ and $\Gamma^b$ are applied, resulting in $d_i^a = \Gamma^a(d_i)$ and $d_i^b = \Gamma^b(d_i)$. In our work, the data augmentations used are as follows: random cropping, random brightness, random contrast, and random noise. One shared deep neural network $\sigma(\cdot)$ is used to extract label features from the augmented samples as follows: $l_i^a = \sigma(d_i^a)$ and $l_i^b = \sigma(d_i^b)$. In our work, three ResNet architectures have been used; however, the method does not depend on any specific network.

B. INSTANCE LEVEL CONTRASTIVE LEARNING (ILCL) BLOCK
Contrastive learning aims to maximize the similarities of positive pairs while minimizing them for negative pairs. Positive pairs in our work are defined as those generated from the same instance, and negative pairs otherwise. Thus, for a mini-batch of size $M$, two types of augmentations are performed on each instance $d_i$, and $2M$ data samples are generated as $\{d_1^a, d_2^a, \ldots, d_M^a, d_1^b, d_2^b, \ldots, d_M^b\}$. For a specific sample $d_i^a$, there is one positive pair $(d_i^a, d_i^b)$, and the remaining $2M - 2$ are negative pairs.
In this block, for contrastive instance level learning, we take a four-layer nonlinear multilayer perceptron $\alpha(\cdot)$ to map the features $l_i$ learnt from the LFG Block to a subspace, $z_i^a = \alpha(l_i^a)$ and $z_i^b = \alpha(l_i^b)$, where the instance level contrastive loss is applied. The pairwise similarity is measured as $s(z_i^{k_1}, z_j^{k_2}) = z_i^{k_1} \cdot z_j^{k_2}$, where $r \cdot s$ denotes the dot product of $r$ and $s$; $k_1, k_2 \in \{a, b\}$; and $i, j \in [1, M]$.
The loss $e_i^a$ for a given sample $d_i^a$ is given as
$$e_i^a = -\log \frac{\exp\left(s(z_i^a, z_i^b)/\tau_1\right)}{\sum_{j=1}^{M}\left[\exp\left(s(z_i^a, z_j^a)/\tau_1\right) + \exp\left(s(z_i^a, z_j^b)/\tau_1\right)\right]} \tag{1}$$
where $\tau_1$ is the temperature parameter.
Additionally, we put a constraint on the derived features such that the $L_2$-norm of the vector is 1. That is, $\forall i$, $\|l_i\|_2 = 1$ and $l_{iz} \geq 0$, $z = 1, \ldots, y$, where $\|\cdot\|_2$ represents the $L_2$-norm of a vector and $l_{iz}$ is the $z$th element of label feature $l_i$.
The instance level contrastive loss $L_i$ is calculated for every augmented sample as
$$L_i = \frac{1}{2M} \sum_{i=1}^{M} \left(e_i^a + e_i^b\right). \tag{2}$$

C. CLASSIFICATION (CL) BLOCK
The features learned from the LFG and the ILCL blocks are applied for classification in the Classification Block, which comprises a neural network ∅. The loss function used is the categorical cross-entropy loss.

D. DESCRIPTION OF SSCLNet
The essential feature of the proposed approach is the learning of representations by means of positive and negative samples. Given an image $x$, augmentations are applied to it to generate $x'$ and $x''$. Now, the image pairs $(x, x')$, $(x', x'')$ and $(x, x'')$ are treated as positive samples. For all other images $y \neq x$, the pairs $(x, y)$ are treated as negative samples. This has also been presented in Figure 2. The contrastive loss function maximizes the agreement between positive samples while minimizing the agreement between negative samples. This concept has been implemented by the SSCLNet architecture proposed in this work. In the Label Feature Generation (LFG) Block, embeddings are generated for the augmented data pairs by the shared network $\sigma(\cdot)$, which comprises the ResNet architecture. In the Instance Level Contrastive Learning (ILCL) Block, a four-layer multilayer perceptron $\alpha(\cdot)$ is used and contrastive loss is applied. The pairwise similarity of positive samples is increased while that of negative pairs is reduced. These learnt features are then input to the final classification layer ∅(.) with categorical cross-entropy loss, for obtaining the output.

IV. EXPERIMENTS
A. DATASETS
Two datasets from the Kaggle repository [45], [46] have been used in this study. However, we created our own datasets by applying augmentation to the Brain MRI 2-Class and 4-Class datasets from the Kaggle repository. The dataset of Brain MRI Tumor 2-Class used in this study has 2580 normal samples and 2561 tumor samples in the training dataset and 651 normal samples and 634 tumor samples in the test set. In this study, the images were made grayscale, and the border of the skull was located by erasing the background color from


FIGURE 1. The block diagram of the proposed framework. The images are subject to two data augmentations, and features are learned by shared
networks in the Label Feature Generation Block. A neural network in the Instance Level Contrastive Learning Block projects the features for maximizing
agreement by contrastive loss. The features from this embedding network are fed into the Classification Block for classification.
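The contrastive loss applied in the Instance Level Contrastive Learning Block can be illustrated with a small sketch. For each sample it takes the negative log of the exponentiated positive-pair similarity divided by the sum of exponentiated similarities to all samples of both views, and then averages over the mini-batch, following Eq. (1) and Eq. (2). This is a minimal pure-Python illustration, not the authors' implementation: the function names, the temperature value 0.5, and the toy embeddings are assumptions, and the loss for the second view is assumed to be symmetric to the one written for the first view.

```python
import math

def dot(u, v):
    """Pairwise similarity s(u, v): the dot product of two embeddings."""
    return sum(p * q for p, q in zip(u, v))

def sample_loss(i, z_anchor, z_other, tau):
    """Loss of sample i in the anchor view, following Eq. (1).

    The positive is the same instance in the other view. As written in
    Eq. (1), the denominator sums over all M samples of both views, so it
    also contains the self-similarity term of the anchor.
    """
    positive = math.exp(dot(z_anchor[i], z_other[i]) / tau)
    denominator = sum(
        math.exp(dot(z_anchor[i], z_anchor[j]) / tau)
        + math.exp(dot(z_anchor[i], z_other[j]) / tau)
        for j in range(len(z_anchor))
    )
    return -math.log(positive / denominator)

def instance_contrastive_loss(z_a, z_b, tau=0.5):
    """Eq. (2): sum of both per-view losses over the batch, divided by 2M.

    tau=0.5 is an assumed value; the text does not state the temperature.
    """
    m = len(z_a)
    total = sum(
        sample_loss(i, z_a, z_b, tau) + sample_loss(i, z_b, z_a, tau)
        for i in range(m)
    )
    return total / (2 * m)

# Toy unit-norm, non-negative embeddings for a mini-batch of M = 2.
view_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = instance_contrastive_loss(view_a, [[1.0, 0.0], [0.0, 1.0]])
swapped = instance_contrastive_loss(view_a, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

When the two views of each instance agree, the loss is lower than when the views are swapped between instances, which is the behavior the block is trained to encourage.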

FIGURE 3. Brain MRI 2-class dataset visualization.

FIGURE 2. Positive and negative samples.
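The pairing rule behind Figure 2 can be sketched as follows: every image in a mini-batch receives two random augmentations, the two views of the same image form the positive pair, and views of all other images serve as negatives. The `augment` function below is a hypothetical stand-in (a brightness shift plus small noise on a toy nested-list image) and does not reproduce the cropping, brightness, contrast, and noise operators used in the paper; all names are illustrative.

```python
import random

def augment(image, rng):
    """Hypothetical augmentation: brightness shift plus noise, clipped to [0, 1]."""
    shift = rng.uniform(-0.1, 0.1)
    return [
        [min(1.0, max(0.0, px + shift + rng.gauss(0.0, 0.01))) for px in row]
        for row in image
    ]

def make_views(batch, rng):
    """Apply two independent random augmentations to every image in the batch."""
    views = {}
    for i, image in enumerate(batch):
        views[(i, "a")] = augment(image, rng)
        views[(i, "b")] = augment(image, rng)
    return views

def pairs_for_anchor(anchor, views):
    """Return the positive key and the negative keys for one anchor view."""
    index, view = anchor
    positive = (index, "b" if view == "a" else "a")
    negatives = [key for key in views if key[0] != index]
    return positive, negatives

rng = random.Random(0)
batch = [[[0.2, 0.4], [0.6, 0.8]] for _ in range(4)]  # four toy 2x2 images
views = make_views(batch, rng)
positive, negatives = pairs_for_anchor((0, "a"), views)
print(positive, len(negatives))
```

For a mini-batch of M = 4 the anchor has one positive and 2M − 2 = 6 negatives, matching the count given in the ILCL block description.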

the image. As a result, it offered the original image’s contour. Histogram equalization and median filters were used. The original dataset contains 512 × 512 images in various dimensions. All of these were downsized to 224 × 224 for processing and normalized between 0 and 1. The dataset of Brain MRI Tumor 4-Class used in this study contains 826 samples of glioma tumor, 822 samples of meningioma tumor, 395 samples of no tumor, and 827 samples of pituitary tumor class in the training set, and 100 samples of glioma tumors, 115 samples of meningioma tumors, 105 samples of no tumors, and 74 samples of pituitary tumors for testing purposes. Median filters and histogram equalization were applied. The original size of the images was 512 × 512, which we resized to 224 × 224.
Visualizations of the 2-class and 4-class datasets are given in Figures 3 and 4, while Figures 5 and 6 present the augmented images. The augmentations have been chosen randomly from the following: random cropping, random brightness, random contrast, and random noise.

B. IMPLEMENTATION DETAILS
We adopted several ResNet architectures (18, 34, and 50) as our backbone architecture. For each architecture, three sets of experiments were conducted. The first used random initialization of the ResNet, the second used ImageNet initialization, and the third was the SSCLNet. For the initial adaptation, we fine-tuned our ResNet architectures for 100 epochs by adding a few layers after the convolutional layers. The hyperparameter tuning was done by running several experiments. Our next step was the contrastive pre-training step. For the contrastive learning framework, we used four dense layers with 512, 512, 256, and 256 neurons, with a dropout of 0.4 in the last dense layer. The network comprising ResNet architecture and the dense layers is named the


Embedding network. The output dimension of the embedding layers was fixed at 32. The entire embedding network is then trained end to end in a self-supervised fashion. Adam optimizer with an initial learning rate of 0.0003 was adopted. Owing to memory limitations, we fixed the batch size at 64. The next phase is the supervised training phase. Here we initialize our supervised architecture consisting of dense layers with SoftMax as the output layer. Seven dense layers with dimensions 256, 256, 128, 128, 64, 32, and 16 were applied before the final classification SoftMax layer. The representations generated from the previous embedding network act as input to the supervised architecture. The SoftMax prediction of the vector representation acts as the output. With the labels of the embeddings, the supervised architecture is then trained end to end as supervised training. The amount of labeled data available was varied to check the model’s performance for different percentages of labeled data. We compare the pre-trained proposed network SSCLNet with randomly initialized and ImageNet pre-trained ResNet 18, 34, and 50. We have adopted fine-tuning for supervised training. The widely used metrics accuracy, F1-score, precision, and recall are used to evaluate our method. Higher values of these metrics indicate better performance.
For the pre-training dataset, we randomly sample images from the 2-Class and 4-Class datasets. The implementation code can be found at [49].

FIGURE 4. Brain MRI 4-class dataset visualization.

FIGURE 5. Augmented images of Brain MRI 2-class dataset.

FIGURE 6. Augmented images of Brain MRI 4-class dataset.

V. RESULTS AND ANALYSIS
We compare self-supervised pre-trained networks with random and ImageNet initialization for ResNet 18, ResNet 34, and ResNet 50.

A. OVERALL PERFORMANCE ANALYSIS
It is seen from the graph plots of Figure 7 that the self-supervised pre-trained network, SSCLNet, is superior to ImageNet initialization for the 4-Class dataset. SSCLNet gives the highest accuracy of 63.45%, 53.3%, and 69.04% for the ResNet 18, ResNet 34, and ResNet 50 architectures, respectively. The F1-Scores for SSCLNet are 68%, 56%, and 75% for the three architectures, achieving the highest value in all three cases. The results of SSCLNet applied to the 2-Class dataset (presented in Figure 8) show the highest values of accuracy and F1-Scores for the ResNet 50 architecture and less promising values for the others. The ROC curves presented in Figure 9 also show that the SSCLNet architecture gives the best AUC value for the Brain MRI 2-class data for the ResNet 50 model.

B. PERCENTAGE LABELED DATA FOR SUPERVISED TRAINING
We conducted experiments with ResNet 50 architecture to study the variation in accuracy and F1-score performance when the ratio of labeled data used for supervised training is changed. For the 4-Class dataset, we find that at 30% labeled data, the accuracy values are 43% for Random, 45% for ImageNet initialization, and 48% for SSCLNet. At 50% labeled data, the values for Random initialization and SSCLNet show an increase, 52% and 51%, respectively, but that for ImageNet initialization shows a fall from 45% to 44%. Though the accuracy curve for the Random and ImageNet initialization shows a zig-zag pattern indicating a fall in accuracy for an increased percentage of labeled data, the curve for SSCLNet shows a constant upward movement, as shown in Figure 10. Similar behavior is observed in the F1-Score curve. Thus, we can say that there is an increase in the performance of SSCLNet with the increase in the percentage of labeled data used for supervised training, and the increase is from 48% accuracy at 30% labeled data to 69% accuracy for 100% labeled data. However, for the 2-Class dataset, as shown in Figure 11, the SSCLNet does not have a very smooth upward curve, even though there is an overall increase in accuracy from 63% at 30% data to 71% at 100% labeled data. A similar increase from 63% at 30% data to 71% at 100% labeled data is seen for the F1-Score as well.

C. EFFECT OF AUGMENTATION TECHNIQUES
We experimented by applying different augmentation techniques with the ResNet 50 backbone for SSCLNet, and the 4-Class dataset.


FIGURE 7. Overall performance analysis for Brain MRI 4-class dataset.

FIGURE 8. Overall performance analysis for Brain MRI 2-class dataset.

FIGURE 9. ROC curves for Random, Imagenet and SSCLNet initialization for Brain MRI 2-class dataset, ResNet 50 architecture.
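The accuracy and F1-score values reported in these figures follow the standard definitions of the evaluation metrics named in the implementation details (accuracy, precision, recall, F1-score). A minimal sketch of how they can be computed from predicted and true class labels is given below. Macro averaging over classes is an assumption, since the text does not state which averaging is used for the multi-class case, and the function names and toy labels are illustrative.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def classification_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    cm = confusion_matrix(y_true, y_pred, n_classes)
    accuracy = sum(cm[c][c] for c in range(n_classes)) / len(y_true)
    precisions, recalls, f1_scores = [], [], []
    for c in range(n_classes):
        true_positive = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n_classes))  # column sum
        actual = sum(cm[c])                                  # row sum
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1_scores) / n_classes,
    }

# Toy two-class example (labels are illustrative, not taken from the paper).
metrics = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
print(metrics)
```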

TABLE 1. Effect of augmentation techniques.

The results of experiments with different augmentation techniques are presented in Table 1, and it is seen that by selecting randomly from different augmentation techniques, the accuracy and F1-Score values vary from 64.97% to 69.04% and from 69% to 75%, respectively. This shows that the choice of augmentation techniques has an influence on the performance of the model and may be made based on a validation subset. In the present study, random cropping, random brightness, random contrast, and random noise have been applied, and the results are shown in the last row of Table 1. The F1-Score is the highest among all the experiments; however, accuracy is marginally lower (by 0.5%) than with the random brightness and random noise techniques.

FIGURE 10. Accuracy and F1 score plots when {30, 50, 60, 80, 90, 100} % of the labeled Brain MRI – 4 class data is used for supervised training.

FIGURE 11. Accuracy and F1 score plots when {30, 50, 60, 80, 90, 100} % of the labeled Brain MRI – 2 class data is used for supervised training.

D. STATISTICAL EVALUATION
An interval statistic called a confidence interval (CI) is used to express how uncertain an estimate is. It provides a likelihood together with a lower and an upper bound. According to Cumming & Calin-Jageman [47], a narrow CI typically denotes a small margin of error. This range can be used to calculate a model’s capability estimate. In addition to statistical significance tests, CIs can be used to report and evaluate experimental results [48]. Commonly used confidence levels are 95%, 98%, and 99%. Under a 95% CI, intervals constructed in this way from repeated studies are expected to contain the true value 95% of the time, whereas 5% will not. We now give the calculated results of the numerous experiments presented earlier, with the Brain MRI 4-class dataset, with a 95% confidence interval in Table 2 – Table 4.

TABLE 2. 95% CI for ResNet 18 architecture, 4-class.

TABLE 3. 95% CI for ResNet 34 architecture, 4-class.

TABLE 4. 95% CI for ResNet 50 architecture, 4-class.

It is seen from Table 2 that for the ResNet 18 architecture and the 4-class dataset, the accuracy values for the SSCLNet architecture vary from 63.314% to 63.586%, that is, (63.45 ± 0.136)%. This implies that, with 95% confidence, the accuracy of the proposed model is expected to lie between 63.314% and 63.586%.
From Table 3, it is seen that for the ResNet 34 architecture and the 4-class dataset, the accuracy values for the SSCLNet architecture vary from 53.252% to 53.348%, that is, (53.3 ± 0.0482)%. This implies that, with 95% confidence, the accuracy of the proposed model is expected to lie between 53.252% and 53.348%.
From Table 4, it is seen that for the ResNet 50 architecture and the 4-class dataset, the accuracy values for the SSCLNet architecture vary from 68.867% to 69.213%, that is, (69.04 ± 0.173)%. This implies that, with 95% confidence, the accuracy of the proposed model is expected to lie between 68.867% and 69.213%.
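The intervals quoted above have the form (center ± margin). The sketch below shows one common way such an interval can be obtained, namely the normal-approximation CI of the mean accuracy over repeated runs, and checks the bounds reported for ResNet 18. It is an illustration under stated assumptions: the text does not say how its intervals were computed, the function names are hypothetical, and the list of run accuracies is made up for demonstration and is not data from the paper.

```python
import math
import statistics

def bounds(center, margin):
    """Lower and upper bound of an interval given as center +/- margin."""
    return center - margin, center + margin

def mean_confidence_interval(values, z=1.96):
    """Normal-approximation CI of the mean; z=1.96 corresponds to 95%."""
    center = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return center, z * standard_error

# Bounds for the ResNet 18 interval quoted in the text: (63.45 +/- 0.136)%.
lower, upper = bounds(63.45, 0.136)
print(round(lower, 3), round(upper, 3))

# Hypothetical accuracies (%) from repeated runs, for illustration only.
runs = [68.9, 69.1, 69.0, 69.2, 69.0]
center, margin = mean_confidence_interval(runs)
print(center, margin)
```

A larger number of runs or a smaller spread between runs narrows the interval, which is why a narrow CI is read as a small margin of error.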


E. DISCUSSION
From the accuracy and F1-Score values of 2-Class and 4-Class data presented in Figures 7 and 8, we see that contrastive learning shows remarkable improvement when the pre-training dataset contains more diverse images, which is the case with the 4-Class dataset (Figure 4). The increase in the percentage of labeled data used for supervised training also enhances the performance of SSCLNet. Similarly, changing the augmentations applied to data samples impacts the accuracy and F1-Scores, as seen in Table 1. The ROC curves presented in Figure 9 also show the better performance of the SSCLNet architecture. It is further noted from the results in Figures 10 and 11 that the improvement in accuracy is approximately 10% for the SSCLNet, as compared to the other two initialization methods. In addition, the results of the experiments with the Brain MRI 4-class dataset, with a 95% confidence interval as presented in Table 2 – Table 4, show that our results are very stable, varying between 63.314% and 63.586% for the ResNet 18, between 53.252% and 53.348% for ResNet 34, and between 68.867% and 69.213% for the ResNet 50 architecture. These findings raise the following questions:

• Can one expect to find improvements if the pre-training dataset is made by sampling from both 2-Class and 4-Class data samples?
• Can we use the learned representations for clustering tasks, and will there be an improvement in performance?
• What would be the effect of increasing the size of the pre-training dataset?

We would like to investigate these questions in future work.

VI. CONCLUSION
It is important that good features are learned to achieve good performance in complex tasks like computer vision or pattern recognition. In our work, contrastive learning has been applied for learning the instances by which the model is pretrained with unlabeled data, and this is used for the classification of Brain MRI images. To our knowledge, developing a classification model using unlabeled data and self-supervised learning for MRI classification has not been done prior to this work. Our proposed SSCLNet applies the SimCLR approach, which learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space. A stochastic data augmentation module transforms any given data example randomly, resulting in two correlated views of the same example, which are treated as the positive pair.
It is also found that contrastive learning may not show much improvement when representations fail to encode domain-specific information due to a smaller number of negative samples or when there is less variation in the pre-training dataset.

ACKNOWLEDGMENT
The authors are indebted to, and thank, the anonymous reviewers for providing valuable suggestions, which helped them prepare the article in its present form.

REFERENCES
[1] R. Hoshide and R. Jandial, “2016 world health organization classification of central nervous system tumors: An era of molecular biology,” World Neurosurg., vol. 94, pp. 561–562, Oct. 2016, doi: 10.1016/j.wneu.2016.07.082.
[2] American Cancer Society. Accessed: Jul. 21, 2022. [Online]. Available: https://www.cancer.org/cancer.html
[3] Brain Tumor: Diagnosis. Accessed: Jul. 21, 2022. [Online]. Available: https://www.cancer.net/cancer-types/brain-tumor/diagnosis
[4] G. S. Tandel, M. Biswas, O. G. Kakde, A. Tiwari, H. S. Suri, M. Turk, J. R. Laird, C. K. Asare, A. A. Ankrah, N. N. Khanna, B. K. Madhusudhan, L. Saba, and J. S. Suri, “A review on a deep learning perspective in brain cancer classification,” Cancers, vol. 11, no. 1, p. 111, 2019, doi: 10.3390/cancers11010111.
[5] M. M. Badža and M. Č. Barjaktarović, “Classification of brain tumors from MRI images using a convolutional neural network,” Appl. Sci., vol. 10, no. 6, p. 1999, Mar. 2020, doi: 10.3390/app10061999.
[6] W. Anjali, B. Anuj, and V. S. Verma, “A review on brain tumor segmentation of MRI images,” Magn. Reson. Imag., vol. 61, pp. 247–259, Sep. 2019, doi: 10.1016/j.mri.2019.05.043.
[7] Q.-H. Kha, T.-O. Tran, T.-T.-D. Nguyen, V.-N. Nguyen, K. Than, and N. Q. K. Le, “An interpretable deep learning model for classifying adaptor protein complexes from sequence information,” Methods, vol. 207, pp. 90–96, Nov. 2022, doi: 10.1016/j.ymeth.2022.09.007.
[8] Q.-H. Kha, Q.-T. Ho, and N. Q. K. Le, “Identifying SNARE proteins using an alignment-free method based on multiscan convolutional neural network and PSSM profiles,” J. Chem. Inf. Model., vol. 62, no. 19, pp. 4820–4826, Sep. 2022, doi: 10.1021/acs.jcim.2c01034.
[9] O. Russakovsky, J. Deng, H. Su, and J. Krause, “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Apr. 2015, doi: 10.1007/s11263-015-0816-y.
[10] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, and Z. Xu, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016, doi: 10.1109/TMI.2016.2528162.
[11] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, “CNN features off-the-shelf: An astounding baseline for recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2014, pp. 512–519, doi: 10.1109/CVPRW.2014.131.
[12] H. Azizpour, A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson, “From generic to specific deep representations for visual recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 36–45, doi: 10.1109/CVPRW.2015.7301270.
[13] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 44–51, doi: 10.1109/CVPRW.2015.7301382.
[14] S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi, “A theoretical analysis of contrastive unsupervised representation learn-
Then neural network encoders are applied to extract repre- ing,’’ in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 1–19.
[15] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image
sentations from augmented examples. Supervised training is recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR),
done using labeled data, and then the model is used for the Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
classification of Brain MRI images. In this work, our aim was [16] M. D. Zeiler and R. Fergus, ‘‘Visualizing and understanding convolutional
to show that by pre-training, better features can be learned networks,’’ in Proc. ECCV, 2014, pp. 818–833.
[17] Y. Bengio, P. Simard, and P. Frasconi, ‘‘Learning long-term dependencies
for downstream classification tasks. The SSCLNet proposed with gradient descent is difficult,’’ IEEE Trans. Neural Netw., vol. 5, no. 2,
by us shows comparable performance to ImageNet training. pp. 157–166, Mar. 1994.


[18] X. Glorot and Y. Bengio, ‘‘Understanding the difficulty of training deep feedforward neural networks,’’ in Proc. AISTATS, 2010, pp. 249–256.
[19] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[20] K. He and J. Sun, ‘‘Convolutional neural networks at constrained time cost,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 5353–5360.
[21] A. Ng, ‘‘Sparse autoencoder,’’ CS294A Lect. Notes, vol. 72, pp. 1–19, Jan. 2011.
[22] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, ‘‘Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,’’ J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[23] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, ‘‘Deconvolutional networks,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 2528–2535.
[24] A. Mishra and V. Bhattacharjee, ‘‘Applying semi-supervised learning on human activity recognition data,’’ in Proc. Int. Conf. IoT Blockchain Technol. (ICIBT), May 2022, pp. 1–6, doi: 10.1109/ICIBT52874.2022.9807808.
[25] D. P. Kingma and M. Welling, ‘‘Auto-encoding variational Bayes,’’ 2013, arXiv:1312.6114.
[26] A. Radford, L. Metz, and S. Chintala, ‘‘Unsupervised representation learning with deep convolutional generative adversarial networks,’’ 2015, arXiv:1511.06434.
[27] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, and A. Madabhushi, ‘‘Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images,’’ IEEE Trans. Med. Imag., vol. 35, no. 1, pp. 119–130, Jan. 2016, doi: 10.1109/TMI.2015.2458702.
[28] H. Chang, J. Han, C. Zhong, A. M. Snijders, and J.-H. Mao, ‘‘Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 5, pp. 1182–1194, May 2018, doi: 10.1109/TPAMI.2017.2656884.
[29] L. Hou, V. Nguyen, A. B. Kanevsky, D. Samaras, T. M. Kurc, T. Zhao, R. R. Gupta, Y. Gao, W. Chen, and D. Foran, ‘‘Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images,’’ Pattern Recognit., vol. 86, pp. 188–200, Feb. 2019, doi: 10.1016/j.patcog.2018.09.007.
[30] B. Hu, Y. Tang, E. I.-C. Chang, Y. Fan, M. Lai, and Y. Xu, ‘‘Unsupervised learning for cell-level visual representation in histopathology images with generative adversarial networks,’’ IEEE J. Biomed. Health Informat., vol. 23, no. 3, pp. 1316–1328, May 2019, doi: 10.1109/JBHI.2018.2852639.
[31] X. Wang and A. Gupta, ‘‘Unsupervised learning of visual representations using videos,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 2794–2802, doi: 10.1109/ICCV.2015.320.
[32] S. Becker and G. E. Hinton, ‘‘Self-organizing neural network that discovers surfaces in random-dot stereograms,’’ Nature, vol. 355, no. 6356, pp. 161–163, Jan. 1992, doi: 10.1038/355161a0.
[33] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox, ‘‘Discriminative unsupervised feature learning with exemplar convolutional neural networks,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1734–1747, Sep. 2016, doi: 10.1109/TPAMI.2015.2496141.
[34] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, ‘‘Unsupervised feature learning via non-parametric instance discrimination,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3733–3742, doi: 10.1109/CVPR.2018.00393.
[35] P. Bachman, R. D. Hjelm, and W. Buchwalter, ‘‘Learning representations by maximizing mutual information across views,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 15509–15519.
[36] O. J. Hénaff, A. Srinivas, J. De Fauw, A. Razavi, C. Doersch, S. M. Ali Eslami, and A. van den Oord, ‘‘Data-efficient image recognition with contrastive predictive coding,’’ 2019, arXiv:1905.09272.
[37] O. Ciga, T. Xu, and A. L. Martel, ‘‘Self supervised contrastive learning for digital histopathology,’’ 2020, arXiv:2011.13971.
[38] J. B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, B. Piot, K. Kavukcuoglu, R. Munos, and M. Valko, ‘‘Bootstrap your own latent: A new approach to self-supervised learning,’’ 2020, arXiv:2006.07733.
[39] R. Hadsell, S. Chopra, and Y. LeCun, ‘‘Dimensionality reduction by learning an invariant mapping,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, Jun. 2006, pp. 1735–1742.
[40] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, ‘‘Momentum contrast for unsupervised visual representation learning,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9729–9738.
[41] F. Schroff, D. Kalenichenko, and J. Philbin, ‘‘FaceNet: A unified embedding for face recognition and clustering,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 815–823.
[42] M. Gutmann and A. Hyvarinen, ‘‘Noise-contrastive estimation: A new estimation principle for unnormalized statistical models,’’ in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 297–304.
[43] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, ‘‘A simple framework for contrastive learning of visual representations,’’ 2020, arXiv:2002.05709.
[44] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton, ‘‘Big self-supervised models are strong semi-supervised learners,’’ 2020, arXiv:2006.10029.
[45] MRI 2 Class Dataset. Accessed: Apr. 10, 2022. [Online]. Available: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection
[46] MRI 4 Class Dataset. Accessed: Apr. 10, 2022. [Online]. Available: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri
[47] G. Cumming and R. Jageman, Introduction to the New Statistics: Estimation, Open Science, and Beyond. Evanston, IL, USA: Routledge, 2016.
[48] A. Claridge-Chang and P. N. Assam, ‘‘Estimation statistics should replace significance testing,’’ Nature Methods, vol. 13, no. 2, pp. 108–109, Jan. 2016, doi: 10.1038/nmeth.3729.
[49] A. Mishra. SSCLNet. Accessed: Dec. 29, 2022. [Online]. Available: https://github.com/cheersanimesh/SSCLNet

ANIMESH MISHRA is currently pursuing the undergraduate degree with the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi. He has a passion for coding. He is highly interested in various research areas of computer science and would like to pursue higher studies. His current interests include exploring emerging areas in machine learning.

RITESH JHA received the M.Sc. degree in computer science from G. B. Pant University and the Ph.D. degree in computer science from BIT, Mesra, Ranchi, India. Currently, he is an Assistant Professor with the Department of Computer Science and Engineering, BIT. His current research interests include machine learning applied to healthcare data.

VANDANA BHATTACHARJEE received the B.E. degree in CSE from the Birla Institute of Technology (BIT), Mesra, Ranchi, in 1989, and the M.Tech. and Ph.D. degrees in computer science from Jawaharlal Nehru University, New Delhi, in 1991 and 1995, respectively. She is a Professor with the Department of Computer Science and Engineering, BIT. She has several national and international publications in journals and conference proceedings. She has coauthored a book on data analysis. Currently, she is working on deep learning techniques applied to the domains of software fault prediction, classification of images, disease prediction, and learning without labels. Her research interests include machine learning and its applications. She is a Life Member of the Computer Society of India.
