SSCLNet: A Self-Supervised Contrastive Loss-Based Pre-Trained Network for Brain MRI Classification
ABSTRACT Brain magnetic resonance images (MRI) convey vital information for making diagnostic decisions and are widely used to detect brain tumors. This research proposes a self-supervised pre-training method based on feature representation learning through contrastive loss applied to unlabeled data. Self-supervised learning aims to learn informative features from the raw input alone, which is valuable because labeled data are scarce and expensive. For the contrastive loss-based pre-training, data augmentation is applied to the dataset, and positive and negative instance pairs are fed into a deep learning model for feature learning. Subsequently, the features are passed through a neural network model that maximizes the similarity of positive instances through contrastive learning. This pre-trained model serves as an encoder for supervised training and then for the classification of MRI images. Our results show that self-supervised pre-training with contrastive loss performs better than random or ImageNet initialization. We also show that contrastive learning performs better when the pre-training dataset contains more diverse images. We take three differently sized ResNet models as the base models. Further, experiments were conducted to study the effect of changing the augmentation types used to generate positive and negative samples for self-supervised training.
INDEX TERMS Contrastive learning, convolutional neural networks, pre-training, ResNet, self-supervised.
data point. Then contrastive loss is applied to maximize the similarity between the data point and its augmentation, while at the same time the similarity between the data point and other samples is minimized. For example, it has been shown [14] that the distributions of augmentations of different dog images tend to be similar, while their union has little overlap with the distributions of augmentations of cat images.

In this work, we apply self-supervised learning based on contrastive loss for brain MRI classification, using unlabeled data for pre-training the model. The base architecture used in our experiments is ResNet [15]. Deep networks naturally capture low- and high-level features [16] and classifiers in an end-to-end multilayer fashion, a capability that is further enhanced by increasing the number of layers. However, deep networks come with the problem of vanishing/exploding gradients, and several researchers have addressed this problem with normalized initialization or intermediate normalization layers [17], [18], [19]. A degradation problem nevertheless appears once deep networks start converging: accuracy saturates and then degrades rapidly [20]. He et al. [15] present a residual learning framework to ease the training of deep networks and address the degradation problem. These ResNet architectures form the base models for our work.

We propose SSCLNet, a Self-Supervised Contrastive Loss-based pre-trained Network for brain MRI classification. The augmented data points are input as positive and negative instances to a deep neural network for label feature learning. The learned features are further passed through a neural network for contrastive learning of instances. Supervised training is then applied with a small percentage of labeled data, and finally the classification is performed using the learned features. We performed numerous experiments with different ResNet architectures and varied the ratio of labeled data used for supervised training. The proposed technique was applied to brain MRI datasets. The major contributions of this paper are as follows: (i) pre-training a model by self-supervised contrastive learning for brain MRI classification, and (ii) performance analysis obtained by varying the percentage of labeled data used for supervised training and by changing the augmentation types. The advantage of our work over existing approaches is that, through pre-training, the network can learn better features for downstream classification tasks; the SSCLNet proposed by us thus shows comparable performance to other methods.

The rest of the paper is organized as follows: Section II presents the related work, and Section III presents the methods. Section IV explains the datasets, the implementation details, and the evaluation metrics. Section V presents the results, and Section VI concludes the paper. Our results show that self-supervised pre-training with unlabeled brain MRI scans improves task performance.

II. RELATED WORK
Computer-aided diagnostic systems have long sought unsupervised learning, since labeled data is scarce and expensive, especially in medical image analysis. Over the last decade, deep unsupervised feature learning has been explored to learn informative representations of images. Most deep unsupervised learning methods aim to learn feature representations that can reconstruct the inputs themselves, such as the auto-encoder (AE), the sparse auto-encoder (SAE) [21], the denoising auto-encoder (DAE) [22], and the deconvolutional network (DeCNN) [23]. Mishra et al. [24] have applied a semi-supervised approach for generating pseudo labels for classification. Further, deep generative models, including auto-encoding variational Bayes (AEVB) [25] and the generative adversarial network (GAN) [26], have been proposed to encode visual information. Generative adversarial networks have also been utilized for tissue- and cell-level categorization, while sparse and variational autoencoders have been employed for unsupervised nuclei detection and transfer learning [27], [28], [29], [30].

Nevertheless, generative models primarily work in the pixel space, which is not scalable. Contrastive discriminative methods, on the other hand, operate on augmentations of the data point and hence are computationally less expensive. Recent successes in computer vision challenges have been attained by contrastive methods that learn latent-space features by discriminating between unlabeled training data. Such contrastive learning techniques presuppose that two views of the same picture should have comparable feature representations when subjected to minor modifications [31], [32]. This consistency assumption was exploited by Dosovitskiy et al. to obtain a parametric feature representation for each training instance [33]. Later, Wu et al. [34] extended this work into a non-parametric feature representation, using a dynamic memory bank to store latent features of data samples. Any image that is not an augmentation of the original training instance is deemed negative, and the memory bank is utilized to choose negative instances for each training instance; negative samples are then obtained from the memory bank without having to recompute feature vectors. By maximizing the mutual information between latent representations of positives, simple image augmentations (such as resizing, horizontal flips, and color jittering) combined with memory banks have successfully learned representations [35], [36]. Ciga et al. [37] have applied self-supervised contrastive loss to digital histopathology datasets. Bootstrap Your Own Latent (BYOL), a novel method for self-supervised image representation learning, is proposed by Grill et al. [38]; it is based on two neural networks, the online and target networks, that interact with and learn from one another. The fundamental premise of contrastive learning is to transform the original data into a feature space in which the similarities of positive pairs are maximized and those of negative pairs are minimized [39], where the positive and negative pairings are as described above. Large numbers of data pairs are essential to the effectiveness of contrastive models, as demonstrated by numerous studies [40]. For contrastive learning, several loss functions have been put forward.
The distance between an anchor and a positive is minimized while the distance between an anchor and a negative is increased, for instance, in the case of triplet loss [41]. Nonlinear logistic regression is used in Noise Contrastive Estimation [42] to distinguish between the observed data and generated noise. SimCLR, a contrastive learning strategy proposed by Chen et al. [43], [44], depends on a large number of mini-batch instances to obtain negative samples for each training instance, rather than on a custom network or memory bank. Consequently, by supplying more negative samples per training instance over the training epochs, the quality of the learnt representations was improved.

III. METHODS
The proposed work is based upon the SimCLR approach of Chen et al. [43] and applies contrastive learning of instances for pre-training the network. Data augmentation operators are applied to data points, then a base encoder learns representations, which are fed into a neural network that maps these representations to a feature space by maximizing agreement between positive examples, as illustrated in Figure 1. The SSCLNet architecture is split into three blocks: the Label Feature Generation (LFG) Block, the Instance Level Contrastive Learning (ILCL) Block, and finally, the Classification (CL) Block.

A. LABEL FEATURE GENERATION (LFG) BLOCK
The proposed framework uses data augmentation to construct data pairs. Given a data instance $d_i$, two transformations $T^a$ and $T^b$ are applied, resulting in $d_i^a = T^a(d_i)$ and $d_i^b = T^b(d_i)$. In our work, the data augmentations used are as follows: random cropping, random brightness, random contrast, and random noise. One shared deep neural network $\sigma(\cdot)$ is used to extract label features from the augmented samples as follows: $l_i^a = \sigma(d_i^a)$ and $l_i^b = \sigma(d_i^b)$. In our work, three ResNet architectures have been used; however, the method does not depend on any specific network.

B. INSTANCE LEVEL CONTRASTIVE LEARNING (ILCL) BLOCK
Contrastive learning aims to maximize the similarities of positive pairs while minimizing them for negative pairs. Positive pairs in our work are defined as those generated from the same instance, and negative pairs otherwise. Thus, for a mini-batch of size $M$, two types of augmentations are performed on each instance $d_i$, and $2M$ data samples are generated as $\{d_1^a, d_2^a, \ldots, d_M^a, d_1^b, d_2^b, \ldots, d_M^b\}$. For a specific sample $d_i^a$, there is one positive pair $(d_i^a, d_i^b)$, and the remaining $2M - 2$ pairs are negative.

In this block, for contrastive instance-level learning, we take a four-layer nonlinear multilayer perceptron $\alpha(\cdot)$ to map the features $l_i$ learnt from the LFG Block to a subspace, $z_i^a = \alpha(l_i^a)$ and $z_i^b = \alpha(l_i^b)$, where the instance-level contrastive loss is applied. The pairwise similarity is measured as $s(z_i^{k_1}, z_j^{k_2}) = z_i^{k_1} \cdot z_j^{k_2}$, where $r \cdot s$ denotes the dot product of $r$ and $s$; $k_1, k_2 \in \{a, b\}$; and $i, j \in [1, M]$.

The loss $e_i^a$ for a given sample $d_i^a$ is given as

$$e_i^a = -\log \frac{\exp\bigl(s(z_i^a, z_i^b)/\tau_1\bigr)}{\sum_{j=1}^{M}\Bigl[\exp\bigl(s(z_i^a, z_j^a)/\tau_1\bigr) + \exp\bigl(s(z_i^a, z_j^b)/\tau_1\bigr)\Bigr]} \quad (1)$$

where $\tau_1$ is the temperature parameter.

Additionally, we put a constraint on the derived features such that the $L_2$-norm of each vector is 1. That is, $\forall i, \|l_i\|_2 = 1$ and $l_i^z \geq 0$, $z = 1, \ldots, y$, where $\|\cdot\|_2$ represents the $L_2$-norm of a vector and $l_i^z$ is the $z$th element of the label feature $l_i$.

The instance-level contrastive loss $L_I$ is calculated over all augmented samples as

$$L_I = \frac{1}{2M}\sum_{i=1}^{M}\bigl(e_i^a + e_i^b\bigr). \quad (2)$$

C. CLASSIFICATION (CL) BLOCK
The features learned from the LFG and the ILCL blocks are applied for classification in the Classification Block, which comprises a neural network $\phi(\cdot)$. The loss function used is the categorical cross-entropy loss.

D. DESCRIPTION OF SSCLNet
The essential feature of the proposed approach is the learning of representations by means of positive and negative samples. Given an image $x$, augmentations are applied to it to generate samples $x'$ and $x''$. Now, the image pairs $(x, x')$, $(x', x'')$, and $(x, x'')$ are treated as positive samples. For all other images $y \neq x$, the pairs $(x, y)$ are treated as negative samples. This is also presented in Figure 2. The contrastive loss function maximizes the agreement between positive samples while minimizing the agreement between negative samples. This concept is implemented by the SSCLNet architecture proposed in this work. In the Label Feature Generation (LFG) Block, embeddings are generated for the augmented data pairs by the shared network $\sigma(\cdot)$, which comprises the ResNet architecture. In the Instance Level Contrastive Learning (ILCL) Block, a four-layer multilayer perceptron $\alpha(\cdot)$ is used and the contrastive loss is applied; the pairwise similarity of positive samples is increased while that of negative pairs is reduced. These learnt features are then input to the final classification layer $\phi(\cdot)$ with the categorical cross-entropy loss for obtaining the output.

IV. EXPERIMENTS
A. DATASETS
Two datasets from the Kaggle repository [45], [46] have been used in this study; however, we created our own datasets by applying augmentation to the Brain MRI 2-Class and 4-Class datasets from the Kaggle repository. The Brain MRI Tumor 2-Class dataset used in this study has 2580 normal and 2561 tumor samples in the training set, and 651 normal and 634 tumor samples in the test set. In this study, the images were converted to grayscale, and the border of the skull was located by erasing the background color from the images.
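As an illustration of the two stochastic transformations $T^a$ and $T^b$ used in the LFG Block, the following minimal PyTorch/torchvision sketch generates a positive pair from a single image. This is our own sketch rather than the released SSCLNet code; the crop size, jitter strengths, and noise level are illustrative assumptions, not the paper's settings.

import torch
from torchvision import transforms

class AddGaussianNoise:
    # Additive Gaussian noise on a tensor image; std = 0.05 is an assumption.
    def __init__(self, std=0.05):
        self.std = std
    def __call__(self, x):
        return x + self.std * torch.randn_like(x)

# One stochastic pipeline realizes both T^a and T^b: two independent draws.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),                     # random cropping
    transforms.ColorJitter(brightness=0.4, contrast=0.4),  # random brightness/contrast
    transforms.ToTensor(),
    AddGaussianNoise(std=0.05),                            # random noise
])

def two_views(image):
    # d_i^a = T^a(d_i) and d_i^b = T^b(d_i): the positive pair for instance d_i.
    return augment(image), augment(image)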
FIGURE 1. The block diagram of the proposed framework. The images are subjected to two data augmentations, and features are learned by shared networks in the Label Feature Generation Block. A neural network in the Instance Level Contrastive Learning Block projects the features for maximizing agreement via the contrastive loss. The features from this embedding network are fed into the Classification Block for classification.
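As a concrete companion to Figure 1 and Eqs. (1) and (2), the sketch below builds a four-layer projection MLP $\alpha(\cdot)$ and computes the instance-level contrastive loss for a mini-batch of label features. It is our own minimal illustration, not the authors' released code; the layer widths and the temperature value are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_alpha(in_dim, hidden=512, out_dim=128):
    # Four-layer nonlinear MLP alpha(.) of the ILCL Block (widths assumed).
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

def ilcl_loss(z_a, z_b, tau=0.5):
    # z_a, z_b: (M, d) projections of the two views; row i holds z_i^a / z_i^b.
    # Unit-normalize so that s(., .) is a dot product of unit vectors.
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    sim_aa = z_a @ z_a.t() / tau   # s(z_i^a, z_j^a) / tau_1
    sim_ab = z_a @ z_b.t() / tau   # s(z_i^a, z_j^b) / tau_1
    sim_bb = z_b @ z_b.t() / tau
    # Denominator of Eq. (1): sum over both views for all j. (Implementations
    # such as SimCLR usually also mask the trivial j = i term of sim_aa/sim_bb.)
    denom_a = torch.logsumexp(torch.cat([sim_aa, sim_ab], dim=1), dim=1)
    denom_b = torch.logsumexp(torch.cat([sim_bb, sim_ab.t()], dim=1), dim=1)
    e_a = denom_a - sim_ab.diagonal()       # e_i^a of Eq. (1)
    e_b = denom_b - sim_ab.t().diagonal()   # e_i^b, symmetric in the two views
    return 0.5 * (e_a + e_b).mean()         # L_I of Eq. (2)

During pre-training, z_a = alpha(sigma(x_a)) and z_b = alpha(sigma(x_b)) are computed for the two augmented batches, and ilcl_loss(z_a, z_b) is minimized.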
FIGURE 9. ROC curves for Random, ImageNet, and SSCLNet initialization for the Brain MRI 2-class dataset, ResNet 50 architecture.
TABLE 1. Effect of augmentation techniques.

Across the augmentation settings in Table 1, the accuracy and F1-Score values vary from 64.97% to 69.04% and from 69% to 75%, respectively. This shows that the choice of augmentation techniques has an influence on the performance of the model, and the choice may be made on the basis of a validation subset. In the present study, random cropping, random brightness, random contrast, and random noise have been applied, and the results are shown in the last row of Table 1. The F1-Score is the highest among all the experiments; however, the accuracy is marginally lower (by 0.5%) than that obtained with the random brightness and random noise techniques.
FIGURE 10. Accuracy and F1-Score plots when {30, 50, 60, 80, 90, 100}% of the labeled Brain MRI 4-class data is used for supervised training.

FIGURE 11. Accuracy and F1-Score plots when {30, 50, 60, 80, 90, 100}% of the labeled Brain MRI 2-class data is used for supervised training.
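The supervised stage that these plots summarize can be sketched as follows: the pre-trained encoder $\sigma(\cdot)$ is reused, a classification head is trained with the categorical cross-entropy loss, and only a fraction of the labeled data is kept, mirroring the {30, ..., 100}% settings. The sketch is ours; the single linear head, feature width, optimizer, and hyperparameters are assumptions, not the paper's configuration.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset

def finetune(encoder, dataset, num_classes, frac=0.3, feat_dim=2048, epochs=10):
    # Keep only `frac` of the labeled samples (e.g., 0.3 for the 30% setting).
    n = int(frac * len(dataset))
    idx = torch.randperm(len(dataset))[:n].tolist()
    loader = DataLoader(Subset(dataset, idx), batch_size=64, shuffle=True)
    # encoder is assumed to map an image batch to (B, feat_dim) features;
    # the classification network phi is reduced to one linear layer for brevity.
    model = nn.Sequential(encoder, nn.Linear(feat_dim, num_classes))
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()   # categorical cross-entropy
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()
    return model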
TABLE 2. 95% CI for ResNet 18 architecture, 4-class.

TABLE 4. 95% CI for ResNet 50 architecture, 4-class.
E. DISCUSSION
From the accuracy and F1-Score values of the 2-Class and 4-Class data presented in Figures 7 and 8, we see that contrastive learning shows a remarkable improvement when the pre-training dataset contains more diverse images, which is the case with the 4-Class dataset (Figure 4). Increasing the percentage of labeled data used for supervised training also enhances the performance of SSCLNet. Similarly, changing the augmentations applied to the data samples impacts the accuracy and F1-Scores, as seen in Table 1. The ROC curves presented in Figure 9 also show the better performance of the SSCLNet architecture. It is further noted from the results in Figures 10 and 11 that the improvement in accuracy is approximately 10% for SSCLNet, as compared to the other two initialization methods. In addition, the results of the experiments with the Brain MRI 4-class dataset, with 95% confidence intervals as presented in Tables 2–4, show that our results are very stable, varying between 63.314% and 63.586% for ResNet 18, between 53.252% and 53.348% for ResNet 34, and between 68.867% and 69.213% for the ResNet 50 architecture. These findings prompt the following questions:
• Can one expect to find improvements if the pre-training dataset is made by sampling from both the 2-Class and 4-Class data samples?
• Can we use the learned representations for clustering tasks, and will there be an improvement in performance?
• What would be the effect of increasing the size of the pre-training dataset?
These we would like to investigate in our future work.
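For reference, a 95% confidence interval of the kind reported in Tables 2–4 can be obtained with a normal approximation over repeated run accuracies, in the spirit of the estimation statistics advocated in [47] and [48]. The snippet is our own illustration; the paper does not specify its exact CI procedure in this excerpt, and the listed accuracies are hypothetical.

import statistics

def ci95(accuracies):
    # Mean +/- 1.96 standard errors: a normal-approximation 95% CI.
    m = statistics.mean(accuracies)
    se = statistics.stdev(accuracies) / len(accuracies) ** 0.5
    return m - 1.96 * se, m + 1.96 * se

print(ci95([63.31, 63.42, 63.48, 63.55, 63.59]))  # hypothetical run accuracies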
VI. CONCLUSION
It is important that good features are learned in order to achieve good performance in complex tasks like computer vision or pattern recognition. In our work, contrastive learning has been applied for learning the instances, by which the model is pre-trained with unlabeled data, and this model is used for the classification of brain MRI images. To our knowledge, developing a classification model using unlabeled data and self-supervised learning for MRI classification had not been done prior to this work. Our proposed SSCLNet applies the SimCLR approach, which learns representations by maximizing agreement between differently augmented views of the same data example via a contrastive loss in the latent space. A stochastic data augmentation module transforms any given data example randomly, resulting in two correlated views of the same example, which are treated as the positive pair. Neural network encoders are then applied to extract representations from the augmented examples. Supervised training is done using labeled data, and the model is then used for the classification of brain MRI images. In this work, our aim was to show that better features can be learned by pre-training for downstream classification tasks. The SSCLNet proposed by us shows comparable performance to ImageNet training. It is also found that contrastive learning may not show much improvement when the representations fail to encode domain-specific information owing to a smaller number of negative samples, or when there is less variation in the pre-training dataset.

ACKNOWLEDGMENT
The authors are indebted to, and thank, the anonymous reviewers for providing valuable suggestions, which helped them prepare the article in its present form.

REFERENCES
[1] R. Hoshide and R. Jandial, "2016 world health organization classification of central nervous system tumors: An era of molecular biology," World Neurosurg., vol. 94, pp. 561–562, Oct. 2016, doi: 10.1016/j.wneu.2016.07.082.
[2] American Cancer Society. Accessed: Jul. 21, 2022. [Online]. Available: https://www.cancer.org/cancer.html
[3] Brain Tumor: Diagnosis. Accessed: Jul. 21, 2022. [Online]. Available: https://www.cancer.net/cancer-types/brain-tumor/diagnosis
[4] G. S. Tandel, M. Biswas, O. G. Kakde, A. Tiwari, H. S. Suri, M. Turk, J. R. Laird, C. K. Asare, A. A. Ankrah, N. N. Khanna, B. K. Madhusudhan, L. Saba, and J. S. Suri, "A review on a deep learning perspective in brain cancer classification," Cancers, vol. 11, no. 1, p. 111, 2019, doi: 10.3390/cancers11010111.
[5] M. M. Badža and M. Č. Barjaktarović, "Classification of brain tumors from MRI images using a convolutional neural network," Appl. Sci., vol. 10, no. 6, p. 1999, Mar. 2020, doi: 10.3390/app10061999.
[6] W. Anjali, B. Anuj, and V. S. Verma, "A review on brain tumor segmentation of MRI images," Magn. Reson. Imag., vol. 61, pp. 247–259, Sep. 2019, doi: 10.1016/j.mri.2019.05.043.
[7] Q.-H. Kha, T.-O. Tran, T.-T.-D. Nguyen, V.-N. Nguyen, K. Than, and N. Q. K. Le, "An interpretable deep learning model for classifying adaptor protein complexes from sequence information," Methods, vol. 207, pp. 90–96, Nov. 2022, doi: 10.1016/j.ymeth.2022.09.007.
[8] Q.-H. Kha, Q.-T. Ho, and N. Q. K. Le, "Identifying SNARE proteins using an alignment-free method based on multiscan convolutional neural network and PSSM profiles," J. Chem. Inf. Model., vol. 62, no. 19, pp. 4820–4826, Sep. 2022, doi: 10.1021/acs.jcim.2c01034.
[9] O. Russakovsky, J. Deng, H. Su, and J. Krause, "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Apr. 2015, doi: 10.1007/s11263-015-0816-y.
[10] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, and Z. Xu, "Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning," IEEE Trans. Med. Imag., vol. 35, no. 5, pp. 1285–1298, May 2016, doi: 10.1109/TMI.2016.2528162.
[11] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, Jun. 2014, pp. 512–519, doi: 10.1109/CVPRW.2014.131.
[12] H. Azizpour, A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson, "From generic to specific deep representations for visual recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 36–45, doi: 10.1109/CVPRW.2015.7301270.
[13] O. A. B. Penatti, K. Nogueira, and J. A. dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2015, pp. 44–51, doi: 10.1109/CVPRW.2015.7301382.
[14] S. Arora, H. Khandeparkar, M. Khodak, O. Plevrakis, and N. Saunshi, "A theoretical analysis of contrastive unsupervised representation learning," in Proc. 36th Int. Conf. Mach. Learn., 2019, pp. 1–19.
[15] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[16] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in Proc. ECCV, 2014, pp. 818–833.
[17] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157–166, Mar. 1994.
[18] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc. AISTATS, 2010, pp. 249–256.
[19] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1026–1034.
[20] K. He and J. Sun, "Convolutional neural networks at constrained time cost," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 5353–5360.
[21] A. Ng, "Sparse autoencoder," CS294A Lect. Notes, vol. 72, pp. 1–19, Jan. 2011.
[22] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.
[23] M. D. Zeiler, D. Krishnan, G. W. Taylor, and R. Fergus, "Deconvolutional networks," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 2528–2535.
[24] A. Mishra and V. Bhattacharjee, "Applying semi-supervised learning on human activity recognition data," in Proc. Int. Conf. IoT Blockchain Technol. (ICIBT), May 2022, pp. 1–6, doi: 10.1109/ICIBT52874.2022.9807808.
[25] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," 2013, arXiv:1312.6114.
[26] A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," 2015, arXiv:1511.06434.
[27] J. Xu, L. Xiang, Q. Liu, H. Gilmore, J. Wu, J. Tang, and A. Madabhushi, "Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images," IEEE Trans. Med. Imag., vol. 35, no. 1, pp. 119–130, Jan. 2016, doi: 10.1109/TMI.2015.2458702.
[28] H. Chang, J. Han, C. Zhong, A. M. Snijders, and J.-H. Mao, "Unsupervised transfer learning via multi-scale convolutional sparse coding for biomedical applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 5, pp. 1182–1194, May 2018, doi: 10.1109/TPAMI.2017.2656884.
[29] L. Hou, V. Nguyen, A. B. Kanevsky, D. Samaras, T. M. Kurc, T. Zhao, R. R. Gupta, Y. Gao, W. Chen, and D. Foran, "Sparse autoencoder for unsupervised nucleus detection and representation in histopathology images," Pattern Recognit., vol. 86, pp. 188–200, Feb. 2019, doi: 10.1016/j.patcog.2018.09.007.
[30] B. Hu, Y. Tang, E. I.-C. Chang, Y. Fan, M. Lai, and Y. Xu, "Unsupervised learning for cell-level visual representation in histopathology images with generative adversarial networks," IEEE J. Biomed. Health Informat., vol. 23, no. 3, pp. 1316–1328, May 2019, doi: 10.1109/JBHI.2018.2852639.
[31] X. Wang and A. Gupta, "Unsupervised learning of visual representations using videos," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 2794–2802, doi: 10.1109/ICCV.2015.320.
[32] S. Becker and G. E. Hinton, "Self-organizing neural network that discovers surfaces in random-dot stereograms," Nature, vol. 355, no. 6356, pp. 161–163, Jan. 1992, doi: 10.1038/355161a0.
[33] A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, and T. Brox, "Discriminative unsupervised feature learning with exemplar convolutional neural networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 9, pp. 1734–1747, Sep. 2016, doi: 10.1109/TPAMI.2015.2496141.
[34] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin, "Unsupervised feature learning via non-parametric instance discrimination," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3733–3742, doi: 10.1109/CVPR.2018.00393.
[35] P. Bachman, R. D. Hjelm, and W. Buchwalter, "Learning representations by maximizing mutual information across views," in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019, pp. 15509–15519.
[36] O. J. Hénaff, A. Srinivas, J. De Fauw, A. Razavi, C. Doersch, S. M. Ali Eslami, and A. van den Oord, "Data-efficient image recognition with contrastive predictive coding," 2019, arXiv:1905.09272.
[37] O. Ciga, T. Xu, and A. L. Martel, "Self supervised contrastive learning for digital histopathology," 2020, arXiv:2011.13971.
[38] J. B. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. D. Guo, M. G. Azar, B. Piot, K. Kavukcuoglu, R. Munos, and M. Valko, "Bootstrap your own latent: A new approach to self-supervised learning," 2020, arXiv:2006.07733.
[39] R. Hadsell, S. Chopra, and Y. LeCun, "Dimensionality reduction by learning an invariant mapping," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2, Jun. 2006, pp. 1735–1742.
[40] K. He, H. Fan, Y. Wu, S. Xie, and R. Girshick, "Momentum contrast for unsupervised visual representation learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 9729–9738.
[41] F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face recognition and clustering," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 815–823.
[42] M. Gutmann and A. Hyvarinen, "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models," in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 297–304.
[43] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton, "A simple framework for contrastive learning of visual representations," 2020, arXiv:2002.05709.
[44] T. Chen, S. Kornblith, K. Swersky, M. Norouzi, and G. Hinton, "Big self-supervised models are strong semi-supervised learners," 2020, arXiv:2006.10029.
[45] MRI 2 Class Dataset. Accessed: Apr. 10, 2022. [Online]. Available: https://www.kaggle.com/datasets/navoneel/brain-mri-images-for-brain-tumor-detection
[46] MRI 4 Class Dataset. Accessed: Apr. 10, 2022. [Online]. Available: https://www.kaggle.com/datasets/sartajbhuvaji/brain-tumor-classification-mri
[47] G. Cumming and R. Jageman, Introduction to the New Statistics: Estimation, Open Science, and Beyond. Evanston, IL, USA: Routledge, 2016.
[48] A. Claridge-Chang and P. N. Assam, "Estimation statistics should replace significance testing," Nature Methods, vol. 13, no. 2, pp. 108–109, Jan. 2016, doi: 10.1038/nmeth.3729.
[49] A. Mishra. SSCLNet. Accessed: Dec. 29, 2022. [Online]. Available: https://github.com/cheersanimesh/SSCLNet

ANIMESH MISHRA is currently pursuing the undergraduate degree with the Department of Computer Science and Engineering, Birla Institute of Technology, Mesra, Ranchi. He has a passion for coding and is highly interested in various research areas of computer science, in which he would like to pursue higher studies. His current interests include exploring emerging areas in machine learning.

RITESH JHA received the M.Sc. degree in computer science from G. B. Pant University and the Ph.D. degree in computer science from BIT, Mesra, Ranchi, India. Currently, he is an Assistant Professor with the Department of Computer Science and Engineering, BIT. His current research interests include machine learning applied to healthcare data.

VANDANA BHATTACHARJEE received the B.E. degree in CSE from the Birla Institute of Technology (BIT), Mesra, Ranchi, in 1989, and the M.Tech. and Ph.D. degrees in computer science from Jawaharlal Nehru University, New Delhi, in 1991 and 1995, respectively. She is a Professor with the Department of Computer Science and Engineering, BIT. She has several national and international publications in journals and conference proceedings, and has coauthored a book on data analysis. Currently, she is working on deep learning techniques applied to the domains of software fault prediction, classification of images, disease prediction, and learning without labels. Her research interests include machine learning and its applications. She is a Life Member of the Computer Society of India.