0% found this document useful (0 votes)
8 views

Multi-Learner Based Deep Meta-Learning For Few-Shot Medical Image Classification

Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Multi-Learner Based Deep Meta-Learning For Few-Shot Medical Image Classification

Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO.

1, JANUARY 2023 17

Multi-Learner Based Deep Meta-Learning for


Few-Shot Medical Image Classification
Hongyang Jiang , Mengdi Gao, Heng Li , Member, IEEE, Richu Jin, Hanpei Miao,
and Jiang Liu , Senior Member, IEEE

Abstract—Few-shot learning (FSL) is promising in I. INTRODUCTION


the field of medical image analysis due to high cost of
EEP learning has revolutionized image classification tasks
establishing high-quality medical datasets. Many FSL
approaches have been proposed in natural image scenes.
However, present FSL methods are rarely evaluated
D in recent years [1], [2]. With the development of com-
puting power and algorithm model, researches can elaborately
on medical images and the FSL technology applicable design good domain-adapted deep learning models if sufficient
to medical scenarios need to be further developed.
Meta-learning has supplied an optional framework to learnable data were available [1], [3]. However, in actual medical
address the challenging FSL setting. In this paper, we scenarios, several non-negligible issues about medical image
propose a novel multi-learner based FSL method for analysis need to be considered. First, it is time-consuming and la-
multiple medical image classification tasks, combining borious to accumulate and annotate vast amounts of medical data
meta-learning with transfer-learning and metric-learning. due to the data privacy and professional annotation requirements.
Our designed model is composed of three learners,
including auto-encoder, metric-learner and task-learner. In Second, the medical data generally follow a long-tailed distribu-
transfer-learning, all the learners are trained on the base tion and the data collection of some diseases with low incidence
classes. In the ensuing meta-learning, we leverage multiple is more difficult causing a serious problem of imbalance in data
novel tasks to fine-tune the metric-learner and task-learner volumes, which is adverse for deep learning. Third, learning
in order to fast adapt to unseen tasks. Moreover, to further quickly and effectively is a hallmark of human intelligence and
boost the learning efficiency of our model, we devised
real-time data augmentation and dynamic Gaussian how to learn to learn with limited learning materials is the pursuit
disturbance soft label (GDSL) scheme as effective of current deep learning models. Consequently, there has been
generalization strategies of few-shot classification tasks. growing research interest in reducing the required amount of
We have conducted experiments for three-class few-shot data when designing deep learning frameworks. Meanwhile,
classification tasks on three newly-built challenging few-shot learning (FSL) methods make computer-aided detec-
medical benchmarks, BLOOD, PATH and CHEST. Extensive
comparisons to related works validated that our method tion (CADe) and computer-aided diagnosis (CADx) available
achieved top performance both on homogeneous medical for rare or low prevalence diseases.
datasets and cross-domain datasets. In FSL tasks, several vulnerabilities of the designed model are
possible, e.g. over-fitting and knowledge forgetting. To alleviate
Index Terms—Medical image, few-shot learning, meta-
learning, metric-learner, transfer-learning. the over-fitting phenomenon, various data augmentation skills
have been employed to increase the amount of training data [4].
Besides, to avoid forgetting knowledge from a small proportion
Manuscript received 21 June 2022; revised 14 September 2022; of training data, researchers also attempted to assign larger
accepted 8 October 2022. Date of publication 17 October 2022; date weights to categories in minority to tackle the class imbalance.
of current version 5 January 2023. This work was supported in part However, present FSL models cannot benefit a lot from above
by General Program of National Natural Science Foundation of China
under Grant 82272086, in part by Guangdong Provincial Department of strategies. Some effective approaches have been proposed to
Education under Grant 2020ZDZX3043, in part by Guangdong Provin- settle down the FSL dilemma, including transfer-learning [5],
cial Key Laboratory under Grant 2020B121201001, in part by Shen- [6], [7], [8], [9], metric learning [10], [11], [12], [13] and
zhen Natural Science Fund under Grant JCYJ20200109140820699,
and in part by the Stable Support Plan Program under Grant meta-learning [14], [15], [16], [17]. The motivation of transfer-
20200925174052004. (Corresponding author: Jiang Liu.) learning is to create a high performance learner for a target
Hongyang Jiang, Heng Li, Richu Jin, and Hanpei Miao are with domain with a small amount of training data from a related
the Department of Computer Science and Engineering, Southern Uni-
versity of Science and Technology, Shenzhen 518055, China (e-mail: source domain with large amounts of training data. The basic
[email protected]; [email protected]; [email protected]. idea of metric learning is to establish similarity or dissimilarity
cn; [email protected]). between samples based on a distance metric or an optimized
Mengdi Gao is with the Department of Biomedical Engineering, Col-
lege of Future Technology, Peking University, Beijing 100871, China metric network. Generally, when conducting classification tasks,
(e-mail: [email protected]). several support samples are required for classifying new sam-
Jiang Liu is with the Research Institute of Trustworthy Autonomous ples.
Systems, Southern University of Science and Technology, Shenzhen
518055, China (e-mail: [email protected]). Meta-learning, a task-level optimization-based method,
Digital Object Identifier 10.1109/JBHI.2022.3215147 consists of meta-training and meta-testing stages. The aim of

2168-2194 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
18 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 1, JANUARY 2023

Fig. 1. Pipeline of our proposed FSL framework, consisting of transfer-learning phase and meta-learning phase. Three learners, i.e., autoencoder,
metric-learner and task-learner constitute our model and the learner in bold are learnable while others are frozen at each phase. The SS are the
scaling and shifting parameters of learners (encoder and metric-learner).

meta-learning is to transfer experience from base learning tasks 2) We design a multi-learner based FSL model that inte-
(meta-training) to unseen tasks (meta-testing). Humans are able grates auto-encoder, metric-learner and task-learner to
to quickly learn an object just through a few samples and can improve the accuracy and robustness of our model.
apply their skills to the process of new tasks learning. Thus, 3) We propose a Gaussian disturbance soft label (GDSL)
humans have the inborn ability to learn how to learn, which is the for each medical image during training to reduce the
essence of meta-learning. Although meta-learning demonstrates risk of over-fitting, which is illustrated in Section IV.
its great promising potential in machine learning community, Experiments show that the GDSL strategy can improve
it has not been fully evaluated and widely applied in medical the performance of the FSL model.
scenarios. One of the most challenges is that the domain shifting 4) We simulate three FSL scenarios of medical image
between the meta-training and meta-testing dataset, which is (i.e., BLOOD, PATHOLOGY and CHEST) classification
difficult to be stabilized and handled. In general, contents of based on three publicly available medical image datasets,
medical images are more fine-grained. The similarity differ- and further validate the effectiveness of our method.
ences between inter-class and intra-class for medical images 5) We evaluate the generalization of our method on a non-
are mostly not very significant, which increases the difficulty of medical public dataset, i.e. miniImageNet. Then, we
feature extraction. Thus, a good and robust feature representation prove the transferability of our method in the case of
learner is quite important, especially for the FSL task. cross-domain from the miniImageNet to medical datasets.
To conquer the challenges of few-shot classification tasks of Experimental results demonstrate the superiority of our
medical images in multiple modals, we propose a novel FSL method.
model with three learners, i.e. auto-encoder, metric-learner and
task learner. The pipeline of our method is elaborated in Fig. 1,
consisting of transfer-learning phase and meta-learning phase. II. RELATED WORK
Concretely, the auto-encoder and metric-learner help to extract Research literature on FSL of image classification tasks ex-
feature representations with good semantic consistency and hibits great diversity, spanning from data augmentation [4] to
similarity difference, respectively. Furthermore, the task-learner supervised learning [5], [6], [7], [8], [9], [10], [12], [14], [16],
performs specific classification tasks based on well-extracted [17]. In this work, some FSL methods, including fine-tuning
feature representations. Inspired by Sun et al. [17], we have based, metric-learning based and meta-learning based, are most
inherited and improved the FSL framework. On the one hand, relevant to ours, which is introduced in detail as follows.
the transfer-learning initially conduct the preliminary-training
based on large-scale data using all training datapoints in the
meta-training, and then the preliminary-testing based on N - A. Fine-Tuning Based Methods
way K-shot tasks in the meta-validation. On the other hand, The fine-tuning based method follows a standard transfer
after transfer-learning we have introduced the parameters of learning procedure that is a leading strategy in medical image
scaling and shifting (SS) for both the pre-trained encoder and analysis [5]. Researches aim at solving a specific task in the
metric-learner [17]. More specifically, the SS of encoder is target domain through transferring the knowledge learned from
fine-tuned and the task-learner and metric-learner are re-trained a relevant source domain. For a new training task, the model
through multiple episodes in meta-training. Then, the SS of pre-trained on large-scale images from one similar domain have
metric-learner is fine-tuned and the task-learner are re-trained proven to be a better parameter initialization way [6]. The
again for fast adaptation to novel unseen tasks in meta-testing. fine-tuning based methods consist of two-stages, pre-training
The contributions of this work can be summarized as follows: with base classes and fine-tuning with novel classes. In the
1) We propose an effective learning framework for few-shot pre-training stage, the whole auxiliary set with base classes is
medical image classification tasks, including two phases, utilized to train a feature extractor and classifier via the standard
transfer-learning and meta-learning, which can rescue the cross-entropy loss. Then, the fine-tuning strategy is conducted
learning dilemma caused by the scarcity of target learning based on the support set with novel classes to relearn parameters
samples. of the feature extractor and classifier. Basically, partial or whole

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
JIANG et al.: MULTI-LEARNER BASED DEEP META-LEARNING FOR FEW-SHOT MEDICAL IMAGE CLASSIFICATION 19

layers of the pre-trained feature extractor are fixed to avoid over- optimization between the base-learner and the meta-learner. The
fitting due to the limitation of support set. Once the fine-tuned base-learner is updated through training datapoints in each task.
feature extractor and classifier are achieved, the query set can Next, the meta-learner is optimized by meta fine-tuning with test
be predicted to evaluate the performance of the united models. datapoints in each task. Consequently, the meta-learner is able
Qi et al. [7] proposed a novel fine-tuning based classifier with to learn cross-task meta-knowledge, benefitting fast adaptation
imprinted weights that can be generated by the mean of fea- on novel tasks.
ture embedding vectors of low-shot samples. The experiments Finn et al. [14] proposed a model-agnostic meta-learning
proved that their method can provide better generalization than (MAML) method that is one of the popular representatives of
the comparative embedding method. Chen et al. [8] shown that meta-learning. Moreover, some derivates of the MAML were
fine-tuning based methods compared favorably against other also developed [15]. The main idea of the MAML was to learn
FSL approaches in a realistic cross-domain evaluation setting. an initialization of the neural network that followed the fast gra-
Although different modalities of base classes can provide differ- dient direction to classify novel classes effectively. Besides, the
ent types of transfer knowledge, Cheplygina et al. [9] reported latent embedding optimization (LEO) [16] method had a similar
that base classes do not have to be related to novel classes. learning algorithm to that of the MAML, including the inner
Thus, transfer learning from natural datasets (source domain) loop for getting task-specific parameter initialization and the
to medical datasets (target domain) was feasible [9]. However, outer loop for parameter updating. However, instead of directly
if the amount of data in novel classes is extremely small, the fine- learning the explicit high-dimensional model parameters, LEO
tuning based method is still prone to over-fitting phenomenon decoupled the gradient-based adaptation process within a low
and lack of generalization for novel classes. dimensional latent space and learned a generative distribution
of model parameters. The early meta-learning based method
B. Metric-Learning Based Methods merely followed a pure meta-training paradigm through training
a model from scratch. However, in recent image recognition
Metric-learning based methods are with simple methodology,
tasks, researchers also have attempted to combine the fine-
directly comparing the similarities or distances between the
tuning and meta-learning together to form hybrid approach. Sun
query image and each labeled image (or support image) in the
et al. [17] proposed a meta-transfer learning (MTL) approach
support set. In specific, the entire support set is firstly jointly
to leverage the advantages of both transfer-learning and meta-
encoded into the latent representation space. Then, each query
learning in the FSL setting.
image can also be projected into the above space so as to compute
There were a relatively small amount of FSL researches
the similarity between each query image and each support image.
about the medical image. Mahajan et al. [18] implemented
Based on the similarity measurement, the category of each query
the few-shot skin diseases identification and tried fast model
image can be predicted.
adaptation in long-tailed classes distribution settings, based on
Researchers proposed a prototypical network (ProtoNet) [10]
the meta-learning framework. Hu et al. [19] devised a novel data
and its derivates [11], which were classical metric-learning
augmentation method, not in the input space but logit space, ef-
based methods. The mean vector of feature embeddings of
fectively alleviating the over-fitting for classification tasks with
each support class was calculated as its corresponding prototype
limited medical images. Mai et al. [20] formulated the retinal
representation. Then, the similarity between each query image
disease FSL problem as a Student-Teacher learning task with
and each prototype was used for classification. Concretely, the
both the discriminative feature space and knowledge distillation
nearest-neighbor classifier was employed for prediction in the
(KD) technique. In this paper, we propose an innovative FSL
test stage. Sung et al. [12] proposed a relation network (Relation-
classification method for multiple modality medical images on
Net) that was another representative metric-learning method.
the basis of meta-learning, merging the merits of fine-tuning and
The RelationNet can learn a non-linear metric through a neural
metric-learning simultaneously.
network rather than select a specific metric function. Moreover,
Li et al. [13] put forward a covariance metric network (CovaM-
Net) that adopted a new covariance metric with a second-order III. PROBLEM DEFINITION AND DENOTATION
local covariance representation for each class, instead of con-
Meta-learning generally consists of two stages, meta-training
ventional first-order class representations (e.g., mean vector). In
and meta-testing [17]. Both meta-training and meta-testing also
metric-learning based methods, there are no data-independent
contain training and testing stages. Additionally, samples in
parameters in the classifier (e.g., the nearest-neighbor classifier).
meta-training and meta-testing are not datapoints but episodes,
Therefore, it is no need to employ fine-tuning procedure in the
and each episode is a few-shot classification task. Furthermore,
test stage.
the objective of meta-learning is not to classify unseen datapoints
but to fast adapt the previous learned experience or knowledge
C. Meta-Learning Based Methods to a new few-shot classification task.
Meta-learning based methods normally employ a meta- The denotations of meta-training and meta-testing are as fol-
training procedure on a series of few-shot tasks derived from lows. Given an auxiliary image dataset Dbase that has sufficient
the base classes in the training stage. This procedure help the images of base classes for meta-training, we first sample several
well-designed model fast adapt to unseen tasks in the test stage. tasks from a distribution p(T ) such that each T has a few images
In briefly, the meta-training paradigm is composed of a two-loop from some classes. T is also called episode, containing a support

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
20 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 1, JANUARY 2023

Fig. 2. Pictorial representation of our proposed method, including: (a) preliminary-training stage on large-scale data of meta-training, and (b)
training stage of meta-training, and (c) training stage of meta-testing. Thereafter, the performance evaluation of the model is at the test stage of
meta-testing.

set S to train the task-learner, and a query set Q to compute a IV. METHODOLOGY
specified validation loss that is used to optimize the auto-encoder
As shown in Fig. 1, our method consists of two main
and metric-learner. S consists of multiple N -way K-shot tasks,
training phases, transfer-learning and meta-learning. Fig. 2(a)
in which N is the number of selected classes and K is the amount
displays the procedure of transfer-learning that is introduced
of selected images for each selected class. Q contains randomly
in Section A. Fig. 2(b) and (c) demonstrates the pictorial
selected M images from the rest images in the N selected classes
representations of meta-training and meta-testing stages,
as test samples. In particular, meta-training aims to learn from
respectively. Meta-learning is described in the following
multiple episodes sampled from p(T ). For meta-testing, given an
Section B. In addition, to boost the overall learning efficiency,
unseen novel image dataset Dnovel , a new task Tnovel is sampled
both the data augmentation (Section C) and the Gaussian
similarly. “Unseen” means that there is no overlap of classes
disturbance soft label (GDSL) (Section D) strategies are
between meta-testing and meta-training tasks. In our method, the
applied to the meta-learning procedure.
Tnovel in meta-testing starts from the experience of the encoder
and metric-learner, and eventually adapts the task-learner. The
final evaluation is done by testing a set of unseen images in query A. Transfer Learning From Large-Scale Data
set of Tnovel . Hence, our method tries to optimize multi-learners It has been confirmed that basic graphic features without
under the meta-learning framework to learn better performance special semantic meanings are learned in the first few layers
on multiple medical image FSL tasks. of the convolutional neural network (CNN) [21]. Thus, the

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
JIANG et al.: MULTI-LEARNER BASED DEEP META-LEARNING FOR FEW-SHOT MEDICAL IMAGE CLASSIFICATION 21

well-trained initialization parameters through transfer learning SS parameters to implement fine-tuning in the training stage of
can help to reduce the difficulty of learning basic graphic meta-testing.
features in FSL tasks. In this paper, images from Dbase with In FM , three samples from a mini-batch are required for the
all base classes in meta-training are initially utilized to train inputs, which form a triad. There is an anchor ra , a positive
a classification model. Inspired by the MTL method [17], we sample rp and a negative sample rn in the triad. The ra with the
improve the architecture of FSL model that consists of an other two samples rp and rn constitute two pairs, a positive pair
auto-encoder (FAE ), a metric-learner (FM ) and a task-learner ra , rp  and a negative pair ra , rn . In addition, the distances
(or classifier, FT ). FAE and FM are responsible for extracting between the ra and the others (rp and rn ) are also computed.
robust and universal hidden representations (Fig. 2(a)), which Then, we employ a triplet margin loss function to penalize a short
is highly concerned with the final classification performance. distance D(ra , rn ) of a negative pair and a long distance D(ra ,
FAE can enhance the semantic consistency between the hidden rp ) of a positive pair, which is defined in (3). Inside, a margin
presentation and the original image, and FM can increase the M is the upper bounded distance between positive and negative
clustering performance of hidden presentations. FT is related to pairs and is set to 1.0. We apply the cosine measurement to
the target task, which can be different in transfer-learning and compute the distance D that is given in (4) (A and B are feature
meta-learning. The well-trained FAE and FM are utilized again vectors).
in following meta-learning. To sum up, the convergence path
of parameters of the proposed FSL model is guided through an Lm = max{0, (D(ra , rp ) − D(ra , rn ) + M )}. (3)
integrated loss, derived from FAE , FM and FT (Fig. 2). Those n
AB Ai Bi
three components are introduced in detail as below. D(A, B) = 1 − = 1 − n i=12 n .
A B i=1 i A 2
i=1 Bi
1) Auto-Encoder: The auto-encoder (FAE ) is an self-
supervised learning technique that is mainly composed of an (4)
encoder (HEn ) and a decoder (HDe ) [22]. HEn maps a high di-
A fixed-dimensional feature embedding vector for each sam-
mensional input X into a low dimensional hidden representation
ple will be output after FM , e.g, 1 × 256. In the case of large-
Xh . HDe is the opposite operation of HEn , that is reconstructing
scale training data, we can easily generate sufficient triplets for
X  from Xh . The process can be represented as (1). The loss
metric learning. For N -way K-shot tasks, we get N × K sam-
function La of FAE expects to minimize the reconstruction error
ples in each iteration of meta-training. The number of positive
which can be shown in (2). Inside, L, N and M express the
and negative pairs (Ppos and Pneg ) can be calculated as (5) and
channel, width and height of X and X  , respectively.
(6). Then, the number of optional triplets (Ptri ) is shown in (7).
HEn (X) = Xh , HDe (Xh ) = X  . (1) N  K  N K(K − 1)
1 · 2 =
Ppos = . (5)
L 
 N 
M
  2
1  2
Xkij − Xkij
La = . (2)     K  N (N − 1)K 2
LN M Pneg = N2 · K
k=1 i=1 j=1 1 · 1 = 2
. (6)

In our model, FAE is a fully CNN and HEn is also regarded       K−1 N (N − 1)K(K − 1)2
Ptri = N1 · N1−1 · K2 · = .
as shared network layers of the feature extractor. The size of Xh 1 2
will be reduced to 21n of X after employing n times downsam- (7)
pling in HEn . We want Xh to own more semantic information
about X. Then, we utilize HDe to reconstruct X  , which has Normally, both N and K should have a value of at least 2 when
the same size of X. To ensure semantic consistency between carrying out metric learning. However, to make N -way 1-shot
X  and X, the pixel-to-pixel mean square error loss function tasks suitable for metric learning, we adopt augmented samples
is introduced to train the FAE . Notably, HDe works just in the instead to form each triplet. Finally, samples can be mapped
transfer learning phase while keeps deactive in the meta-learning into an embedding space, on which the similarity between each
phase. In meta-training, HEn is frozen and instead a group of sample is learned to help with the final determination.
light-weight SS parameters are adopted to fine-tune HEn . 3) Task-Learner: The task-learner (FT ) is a task-specific
2) Metric-Learner: Metric learning is the overall expression classifier that is trained from scratch for each task or episode
for machine learning approaches based directly on similarities in both transfer-learning and meta-learning phases. As experi-
between samples. The concept of distance is used to describe mental results in [17] showed that the base-learner (FT in our
a relation between two samples. Naturally, two samples being study) with one-layer fully-connected network obtained better
closer to each other means that they are more similar, and vice classification accuracy than others. Hence, we keep this architec-
versa. In our study, taken Xh as the input, we construct a triplet ture of FT . FT is trained through the cross-entropy loss function
metric-learner (FM ) of two neural network layers for metric (Lt ) that is expressed in (8), where C is the number of classes
learning which can be shown in Fig. 2. Besides, to avoid the and yi , pi represent the ground truth and predict probability of
redundancy of FM , we implement parameter sharing for each a sample, respectively.
branch of the triplet. In the preliminary training stage, FM is C

trained for getting appropriate initialization parameters for the Lt = − [yi log(pi ) + (1 − 1yi )log(1 − pi )] . (8)
meta-learning phase. In particular, FM is frozen but endowed i=1

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
22 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 1, JANUARY 2023

Fig. 4. A procedure of data augmentation for the 5-way 1-shot task in


the meta-learning stage.

Fig. 3. Newly introduced scaling and shifting (SS) parameters φSS in


our model during meta-learning. φSSE and φSSM are SS parameters (i.e., Lmeta(te) = Lt ) that is shown as follows.
of HEn and FM , respectively.
θT = θT − α∇θT Lmeta(te) (θ, φSS{E,M } ). (14)
φSSM = φSSM − β∇φSSM Lmeta(te) (θ, φSS{E,M } ). (15)
In transfer learning, the above learners, FAE , FM and FT , are
trained together by an integrated loss function (Ltransf er ) that is Finally, in the test stage of meta-testing, all parameters of
expressed in (9). We assume θ{E,D,M,T } are original parameters learners are frozen and the performance evaluation is imple-
of HEn , HDe , FM and FT , respectively, and their optimiaztion mented on the query set of corresponding novel tasks.
process is display as (10). Besides, ω{1,2} are also learnable
parameters and both initialized to 1.0. As La is treated as a C. Data Augmentation in Real Time
regular term, γ is pre-set to 0.01. This section states the pattern of data augmentation in real
1 1 time during meta-learning. In meta-training, several episodes
Ltransf er = 2 Lm + Lt + γLa + log(ω1 ω2 ). (9) containing few-shot tasks are utilized for training, validation and
2ω1 2ω22
testing. When tackling N -way 1-shot tasks, data augmentation
θπ = θπ − α∇θπ Ltransf er (θπ , ω), π = {E, D, M, T }. (10) in real time is compulsory to carry out metric learning with
triplet margin loss function. Meanwhile, we augment the training
B. Meta-Learning data in real time to enrich triplets, and the procedure is depicted
Meta-learning is a task-level training process, which decom- in Fig. 4. The data augmentation contains translation, rotation,
poses into meta-training and meta-testing. In meta-training, flipping, cropping, gray scale and color jitter. Each pattern of
tasks are randomly selected from the base classes of meta- data augmentation is implemented with a specified probability
training dataset, on which the evaluation results are utilized for (p = 0.5). In the training stage of meta-training, each sample is
optimizing our model. Particularly, HDe is no longer participate randomly augmented three times, resulting in four samples in
in the learning process. Moreover, new SS parameters φSS , total for one class. In this paper, for a 3-way 1-shot task in an
including φSSE and φSSM , are introduced into the original iteration, we can generate 216 triplets according to (7).
parameters (i.e., convolution kernel, fully-connected weight,
bias) of our model, which can be illustrated in Fig. 3. Concretely, D. Gaussian Disturbance Soft Label
φSSE and φSSM are learnable parameters and represent SS A label is digitized knowledge used to interpret relative data,
parameters of HEn and FM , respectively. In the meta-training which can be divided to hard label and soft label. Regarding some
stage, the θE is frozen, and the θ{M,T } and φSSE are optimized simple tasks, the hard label (e.g., one-hot coding), is mostly
through the loss function Lmeta(tr) that is displayed in (11). With applied to guide model training. However, for some few-shot
the help of φSSE , we can obtain a HDe approximately suitable tasks with limited training data, the hard label is adverse to the
for the target task. The parameter optimization process can be generalization of the model, but the soft label is an effective
shown as (12) and (13). measure. Mathematically, the soft label changes the mapping
1 1 relationship between the data and categories, which is illustrated
Lmeta(tr) = Lm + Lt + log(ω1 ω2 ). (11) in Fig. 5. Thus, the soft label guided learning strategy not only
2ω12 2ω22
alleviate the possibility of over-fitting, but also increase the
θπ = θπ − α∇θπ Lmeta(tr) (θ, φSSE , ω), π = {M, T }. (12) domain adaptability of new tasks.
φSSE = φSSE − β∇φSSE Lmeta(tr) (θ, φSSE , ω). (13) In this paper, we initially propose a Gaussian disturbance soft
label (GDSL), which is constructed through adding a random
In the training stage of meta-testing, the θ{E,M } and φSSE are variable ε ∼ |N (μ, σ)| to the original one-hot label. N (μ, σ) is
frozen, and the θT and φSSM are trained to acquire task-specific a Gaussian distribution with a mean μ (set to 0.0 in our study) and
FT and FM , based on support set of novel tasks from meta- a variance σ. | · | represents absolute value function to ensure the
testing dataset. Lt is used to guide this optimization process non-negativity of ε. The generated process of GDSL is illustrated

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
JIANG et al.: MULTI-LEARNER BASED DEEP META-LEARNING FOR FEW-SHOT MEDICAL IMAGE CLASSIFICATION 23

consists of nine types of tissues and we randomly sample 600


images of each category to construct a FSL scenario.
CHEST is derived from the dataset NIH-ChestXray14, com-
prising frontal-view X-Ray images with text-mined 14 disease
labels [25]. Four diseases with the least number are chosen as
meta-testing classes and 600 images for each other category were
picked up randomly for meta-training and meta-validation.
The details of category allocation of above datasets are ex-
hibited in Table I and the benchmark image size is 84 × 84 in
our experiments. Since the pathological features of CHEST are
Fig. 5. Pictorial representation of Gaussian disturbance soft label from not as obvious as those of BLOOD and PATHOLOGY, we also
hard label space to soft label space. validate our method on the CHEST with an image size of 224
× 224. Besides, a non-medical dataset, miniImageNet [26],
TABLE I that is most widely used in related FSL works, is also utilized
COMBINATIONS OF THE META-TRAINING, META-VALIDATION AND for evaluating our method. Its experimental settings refer to
META-TESTING CLASSES. THE NUMBER OF DATAPOINTS FOR EACH CLASS previous researches [17]. Concretely, there are 64, 16 and 20
IS MARKED IN THE BRACKET
categories in the meta-training, meta-validation and meta-testing
stage, respectively. Each category contains 600 images with an
image size of 80 × 80.

B. Implementation Details
1) Episode Sampling: Regarding the three medical datasets,
we consider the 3-class classification task and randomly select
1 (5 or 10) images from each class as training samples and 15
images from the rest as test samples. More concretely, 3-way K-
shot (K = {1, 5, 10}) tasks are constructed for meta-learning. We
in Fig. 5. A random value of ε is introduced to the label of each randomly sample at most 5 k episodes in the meta-training stage,
sample in every iteration. For instance, the original one-hot label and 600 episodes in a test experiment for both meta-validation
{0, 1, 0} can be transformed to the GDSL {0.5ε, 1 − ε, 0.5ε}. and meta-testing. Notably, the model with the highest meta-
Notably, the GDSL of each sample in each epoch is not exactly validation accuracy is elected for meta-testing.
the same, and the larger the value of σ, the larger the mapping 2) Network Architectures: Following the literature [17],
range in the label space. [27], some embedding backbones are taken as feature extractors,
such as the Conv32F (4 CONV) and ResNet-style networks.
V. EXPERIMENTS Specifically, the Conv32F consisted of four convolution blocks,
each of which in turn is composed of a convolution layer, a
A. Datasets batch-normalization layer, a ReLU layer and a max-pooling
To test our FSL method, we random sample and rebuild layer. The numbers of filters for these blocks are {32, 32, 32,
three light-weight subsets from three publicly available medical 32}. The commonly utilized ResNet-style network for FSL
image datasets (i.e., BLOOD [23], PATHOLOGY [24] and includes ResNet12, ResNet18 and ResNet25 [27]. The depth
CHEST [25]). Categories in each dataset are separated for three of ResNet-style model is adjusted through adding or subtracting
parts, meta-training, meta-validation and meta-testing, in which the number of blocks [2]. However, previous researches have
the name and the quantity of each category are enumerated in Ta- verified that the performance of FSL for different ResNet-style
ble I. In this paper, the problem of identifying diseases/categories models is not the deeper the better [27]. In our method, the
in the meta-testing classes is modeled as a FSL problem, and we ResNet25 backbone with 12 blocks (2-CNN-layers) and another
verify our method on the three datasets respectively. CNN-layer is regarded as HEn , and the hidden presentation Xh
BLOOD is built based on a prior database of individual with 640 dimensions can be finally obtained.
normal cells which are organized into eight classes [23]. Inside, 3) Training Details: During the transfer learning phase, the
immature granulocytes (IG) contains four sub-types (type name model is trained by the SGD optimizer [28] and a total of 100
begin with ‘IG-’), which are taken as meta-validation classes. epochs are performed. The learning rate is initialized as 0.1, and
The remaining seven types in the BLOOD with 600 images decayed to its 0.2 times every 20 epoch. In the meta-learning
selected randomly for each type, are employed for meta-training phase, for N -way K-shot tasks, a regular learning considers a
and meta-testing. training step for optimizing FM and φSS{E,M } through the Adam
PATHOLOGY is constructed to predict survival from col- optimizer [29], followed by a validation step for optimizing FT
orectal cancer histology slides, which is a dataset (NCT-CRC- by the SGD optimizer. In each epoch, the learning rate of Adam
HE-100 K) with non-overlapping image patches from hema- is initialized as 0.001, and decayed to its half every 10 epochs.
toxylin & eosin stained histological images [24]. The dataset Concurrently, the original learning rate of SGD is set to 0.01

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
24 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 1, JANUARY 2023

TABLE II TABLE III


THE 3-WAY, 1-SHOT, 5-SHOT AND 10-SHOT CLASSIFICATION ACCURACY THE 3-WAY, 1-SHOT, 5-SHOT AND 10-SHOT CLASSIFICATION ACCURACY
(%) OF OUR PROPOSED METHOD BASED ON MULTIPLE EMBEDDING (%) ON THE BLOOD DATASET. THE BEST AND SECOND-BEST RESULTS ARE
BACKBONES (4 CONV, RESNET18 AND RESNET25) HIGHLIGHTED

TABLE IV
without periodic decay. Notably, in both transfer-learning and THE 3-WAY, 1-SHOT, 5-SHOT AND 10-SHOT CLASSIFICATION ACCURACY
meta-learning, the batch-size could be pre-set to 32 and the (%) ON THE PATHOLOGY DATASET. THE BEST AND SECOND-BEST
training will stop after 50 epochs. RESULTS ARE HIGHLIGHTED
4) Evaluation: Classification accuracy is used to evaluate the
performance of FSL modes. According to above episode sam-
pling rule, each test experiment contains 600 random episodes.
In the meta-testing stage, we repeat the test experiments 5 times
thus to get 3000 episodes. Finally, the average accuracy with
the 95% confidence interval is reported. Taking the CHEST as
an example, in each test experiment, we randomly select three
categories from the meta-testing classes (i.e., Edema, Fibrosis,
Hernia and Pneumonia) as unseen novel classes in each episode,
for the 3-way K-shot (K = {1, 5, 10}) tasks. Finally, the average
accuracy is calculated based on 3000 episode tests, which greatly TABLE V
reduce the performance perturbation. THE 3-WAY, 1-SHOT, 5-SHOT AND 10-SHOT CLASSIFICATION ACCURACY
(%) ON THE CHEST DATASET. THE BEST AND SECOND-BEST RESULTS ARE
HIGHLIGHTED
VI. RESULT ANALYSIS
A. Evaluations on Multiple Embedding Backbones
Table II demonstrates evaluation results of our proposed
method using HEn with different backbones, including 4 CONV,
ResNet18 and ResNet25 on three datasets. From Table I, we can
conclude that our method with deeper backbones yields better
performance on each dataset. Taking the BLOOD for example,
the classification accuracy of ResNet18 based model outper-
forms that of 4 CONV based by over 4.5%. The ResNet25 based
model has the best result with 63.47% which is 5.48% higher
than 57.99% (ResNet18 based) for the 1-shot task. In brief, our with the increase of shot number for all FSL methods on each
method can acquire better few-shot classification performance dataset.
with relatively deeper network architectures. BLOOD. In Table III, our method achieves top performance
for 1-shot tasks and sub-optimal performance for 5-shot tasks,
respectively. Regarding 3-way 10-shot classification tasks, al-
B. Comparison to State-of-The-Arts
though our method is not optimal, it shows impressive perfor-
Tables III, IV, and V present the overall comparisons to mance which outperforms comparative methods by quite large
related works using the benchmark image size, on the BLOOD, margins. For example, our method has a 10-shot accuracy of
PATHOLOGY and CHEST datasets. Classification accuracies 76.21% that is 18.52% higher than 57.69% (from MAML).
of our method embedding on the ResNet25 are reported. For Furthermore, Fig. 6(a) illustrates the T-SNE distribution of
all compared methods, we have implemented their methods hidden representations of three novel meta-testing categories
with ResNet-style backbones based on open-sourced codes and with our method, which manifests a better clustering effect than
optimize their performance via empirical parameter tuning. Note that with the MTL method (Fig. 6(b)) [17]. In sum, our method
that these evaluation results are the meta-testing results based contributed to an optimized embedding space in which we can
on the well-trained models with the highest meta-validation obtain high-cohesion and low-coupling features, boosting the
accuracies. In these tables, the performance gradually get better FSL performance.

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
JIANG et al.: MULTI-LEARNER BASED DEEP META-LEARNING FOR FEW-SHOT MEDICAL IMAGE CLASSIFICATION 25

TABLE VII
ABLATION STUDY OF TRANSFER-LEARNING (TL), DATE AUGMENTATION IN
META-LEARNING (DA-META), GSDLσ=0.01 , AUTO-ENCODER FAE AND
METRIC-LEARNER FM FOR 5-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION
ACCURACY (%) ON THE MINIIMAGENET DATASET. THE BEST RESULTS ARE
HIGHLIGHTED

Fig. 6. The T-SNE distributions of hidden representations of (a) our


method and (b) MTL on the meta-testing classes of the BLOOD dataset.
Our method shows better clustering effect.

TABLE VI
THE 3-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) ON THE
CHEST DATASET WITH 224 × 224 IMAGE SIZE. THE BEST AND when enlarging image size. It directly leads to performance of
SECOND-BEST RESULTS ARE HIGHLIGHTED
some methods deteriorating under the large images, especially
for the 3-way 1-shot tasks. To sum up, our method has better
adaptability to large size images and achieves top performance
on few-shot classification tasks.

C. Ablation Study
Table VII displays ablation studies of the transfer-
learning (TL), data augmentation in meta-learning (DA-meta),
GSDLσ=0.01 strategy, FAE and FM , for 5-way 1-shot and
5-shot tasks on the miniImageNet. The ablation experiment
is founded on the baseline model of ResNet25 backbone.
PATHOLOGY. In Table IV, we give the results on the As Table VII shows, compared with the baseline model, the
PATHOLOGY. From this table, we again confirm that our transfer-learning (TL) phase can obtain about 13% and 15%
method outperforms others. Our method achieves around a improvement in 5-way 1-shot and 5-shot classification accu-
margin of 3.55% (0.05%) over the second-best Versa method racy, respectively. The main purpose of data augmentation in
on 1-shot (5-shot) tasks. An interesting observation is that the meta-learning (DA-meta) is to generate sufficient trplets for the
increases from 1-shot tasks to 5-shot tasks are much greater than metric-learner, as illustrated in Section VI.C. However, it can
the gain from 5-shot tasks to 10-shot tasks on the basis of each also improve the classification accuracy individully. Besides,
FSL method. This shows that FSL models are more data-hungry the GSDLσ=0.01 strategy that is used to alleviate model over-
when the shot number is relatively small. fitting can enhance the classification accuracy, especially for the
CHEST. Table V shows the results on the CHEST. As the 5-way 1-shot tasks. Moreover, we can analyze that both FAE
differences between each type of chest Xray images are not ob- and FM can independently boost the classification accuracies.
vious, increasing the difficulty of the FSL tasks. However, from Combining the two learners together can further advance the
Table V, we still observe that our approach consistently achieves classification performance, acquiring an accuracy of 64.26% for
finer performance. Our approach outperforms the ANIL which 5-way 1-shot and 79.35% for 5-way 5-shot. The results of abla-
performs poorly, around 10% for the 1-shot, over 12% both for tion experiments prove the necessity of TL phase and DA-meta,
the 5-shot and 10-shot tasks, respectively. and also demonstrate the effectiveness of GDSL strategy and
As the pathological features of CHEST maybe severely at- multiple learners (FAE and FM ), respectively. In the end, the
tenuate under the benchmark image size (i.e., 84 × 84), we also combination of all learners with the GDSL strategy is validated
validate the classification accuracy of 3-way, 1-shot and 5-shot optimal for few-shot classification tasks.
tasks from the CHEST with larger size of 224 × 224. Due to
computational resource constraints, 3-way 10-shot experiments
are omitted. From Table VI, we can observe that FSL methods, D. Parameter Sensitivity Analysis
including ProtoNet, R2D2, MTL, and Ours, acquire significant Regarding the GDSL strategy, the variance σ in N (μ, σ)
performance promotion with larger image size. We speculate reflects the mapping range in the label space for each sample,
that images with larger size enable the pathological details more which has different influences on the performance of FSL clas-
clear. Inside, our method still achieves the best classification sification tasks. On the one hand, the larger σ makes the label
performance on both 3-way, 1-shot and 5-shot tasks. Concretely, fluctuating greatly, adversely degenerating the performance of
compared with small image size experiments, our method also models. On the other hand, the smaller σ has little effect on the
obtains 2.86% and 3.56% gains on 3-way, 1-shot and 5-shot promotion of model generalization. To conduct the sensitivity
tasks respectively. However, noises are inevitably introduced analysis on the hyper-parameter σ, we set up experiments above

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
26 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 1, JANUARY 2023

Fig. 7. Test accuracy of 3-way, 1-shot, 5-shot, and 10-shot tasks on the (a) BLOOD, (b) PATHOLOGY and (c) CHEST datasets with different
variances (σ).

three medical datasets with 3-way, 1-shot, 5-shot and 10-shot TABLE VIII
THE 5-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) OF
tasks. Specifically, σ is compared in the range of [0.01, 0.02, DIFFERENT FSL METHODS USING THE ORIGINAL PAPER SETTINGS ON THE
0.03, 0.04, 0.05, 0.1, 0.2]. Fig. 7 demonstrates the best meta- MINIIMAGENET DATASET. THE BEST RESULTS ARE HIGHLIGHTED
testing accuracy using different σ on the BLOOD, PATHOL-
OGY and CHEST respectively.
In Fig. 7(a), as for σ on the BLOOD, although there is no
clear winner among different shots tasks, σ = 0.04 achieves the
highest test accuracy in 1-shot and 10-shot tasks and decent
test accuracy in 5-shot tasks. Similarly, the optimal value of σ
is 0.04 for the CHEST. In Fig. 7(b), the test accuracy curve of
1-shot generally show initially an increased and then a downward
trend. Thus, σ = 0.03 can be regarded as the best choice for the
PATHOLOGY.

E. Generalization on the Non-Medical Dataset


To illustrate the generalization of our method across mul-
tiple imaging modalities, we also test our method on the
miniImageNet dataset. The training details can be checked in
Section VI.B.(3). To ensure the fairness of the comparison, the
number of epochs in the meta-training stage is channged to 100
(same as compared methods). Notably, the σ of GDSL is set to transferring from the miniImageNet to the BLOOD (PATHOL-
0.01. Meanwhile, we report some performance of other popular OGY and CHEST). In our cross-domain experiments, all mod-
FSL methods based on the miniImageNet given in original els are trained on the source domain, namely meta-training
papers [10], [12], [14], [16], [26], [33], [34], [35], [36], [37], dataset of miniImageNet, and validated on the meta-validation
[38], [39], [40]. In addition, the evaluation results of Versa [30], dataset of miniImageNet [26]. Then, in the meta-testing stage,
ANIL [32], R2D2 [31] and MTL [17] are obtained from the the performance is directly evaluated on the target domain,
work [27] which implemented open-sourced codes using the namely all the classes of the BLOOD (PATHOLOGY and
original paper settings. We conduct our method on three ar- CHEST). Test episodes are constructed in each target dataset,
chitectures, including ResNet12, ResNet18 and ResNet25. The with 5-way 1-shot (or 5-shot) tasks and 15 query samples
results are displayed as Table VIII, which demonstrate that our for each way. From Table IX, our proposed model trained on
method based on ResNet25 is superior to comparative methods the miniImageNet can easily generalize to the BLOOD and
whether tackling 5-way 1-shot or 5-shot tasks. Additionally, PATHOLOGY, and enjoy small domain shifting. Concretely,
our method based on ResNet12 and ResNet18 can also achieve our method obtains excellent performance over both of the two
competitive performance. datasets compared with others. When performing on the CHEST,
our method consistently surpasses other comparative methods.
F. Cross-Domain Transferability Obviously, the performance of all methods on the CHEST has
The cross-domain transferability of the FSL model is of great a large domain shifting which is mostly caused by the task
importance, when it is difficult to collect a large amount of data difficulty and the knowledge difference between the source
in the same domain. Thus, we further evaluate the transferabil- domain and the target domain. Even so, experiments reveal
ity of our method in cross-domain scenarios. To this end, we the superiority of our method when handling the cross-domain
conduct experiments on three cross-domain scenarios, where scenarios.

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
JIANG et al.: MULTI-LEARNER BASED DEEP META-LEARNING FOR FEW-SHOT MEDICAL IMAGE CLASSIFICATION 27

TABLE IX
THE 5-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) ON CROSS-DOMAIN TRANSFERABILITY. ALL METHODS ARE LEARNED FROM THE SOURCE
DOMAIN, AND DIRECTLY EVALUATED ON THE TESTSET OF THE TARGET DOMAIN

VII. CONCLUSION [8] W.-Y. Chen, Y.-C. Liu, Z. Kira, Y.-C. F. Wang, and J.-B. Huang, “A
closer look at few-shot classification,” in Int. Conf. Learn. Representations,
In this paper, we propose an effective FSL framework for 2019, pp. 1–16. [Online]. Available: https://openreview.net/forum?id=
medical image classification, fusing both transfer-learning and HkxLXnAcFQ
[9] V. Cheplygina, “Cats or cat scans: Transfer learning from natural or medi-
meta-learning. We innovatively put forward to a multi-learner cal image source data sets?,” Curr. Opin. Biomed. Eng., vol. 9, pp. 21–27,
based model, including autoencoder, metric-learner and task- 2019.
learner, which is trained sequentially in the training stages of [10] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot
learning,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4080–4090.
transfer-learning and meta-learning. Extensive experiments of [11] S. Laenen and L. Bertinetto, “On episodes, prototypical networks, and few-
3-way K-shot (K = {1, 5, 10}) FSL tasks on three medical shot learning,” in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 24581–
image datasets (BLOOD, PATHOLOGY and CHEST), witness 24592.
[12] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales,
the superiority of our method compared with state-of-the-arts. “Learning to compare: Relation network for few-shot learning,” in Proc.
The consistent improvements by the GDSL strategy proved that IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1199–1208.
the soft label space could expand the mapping range dynam- [13] W. Li, J. Xu, J. Huo, L. Wang, Y. Gao, and J. Luo, “Distribution consistency
based covariance metric networks for few-shot learning,” in Proc. AAAI
icly, benefitting the efficient FSL. Concurrently, we verify the Conf. Artif. Intell., 2019, pp. 8642–8649.
cross-domain transferability from the miniImageNet to each [14] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast
medical dataset and further confirm the stability and robust- adaptation of deep networks,” in Proc. Int. Conf. Mach. Learn., 2017,
pp. 1126–1135.
ness of our method. The proposed multiple learners can learn [15] L. Wang, Q. Cai, Z. Yang, and Z. Wang, “On the global optimality of
a few-shot classification task from several aspects, including model-agnostic meta-learning,” in Proc. Int. Conf. Mach. Learn., 2020,
semantic consistency, similarity and category discrimination. pp. 9837–9846.
[16] A. A. Rusu et al., “Meta-learning with latent embedding optimization,”
The multi-learner based model may provide a new research in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–17. [Online].
idea for FSL on medical images, and each learner can also be Available: https://openreview.net/forum?id=BJgklhAcK7
further improved to achieve better performance. In the following [17] Q. Sun, Y. Liu, Z. Chen, T.-S. Chua, and B. Schiele, “Meta-transfer learning
through hard tasks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 3,
research, we will focus on optimizing the network structure pp. 1443–1456, Mar. 2020.
of each learner to get advanced classification accuracies on [18] K. Mahajan, M. Sharma, and L. Vig, “Meta-dermdiagnosis: Few-shot
cross-domain FSL tasks. skin disease identification using meta-learning,” in Proc. IEEE/CVF Conf.
Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 730–731.
[19] Y. Hu, Z. Zhong, R. Wang, H. Liu, Z. Tan, and W.-S. Zheng, “Data
augmentation in logit space for medical image classification with limited
REFERENCES training data,” in Proc. Int. Conf. Med. Image Comput. Comput.-Assist.
Interv., 2021, pp. 469–479.
[1] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, [20] S. Mai, Q. Li, Q. Zhao, and M. Gao, “Few-shot transfer learning for
no. 7553, pp. 436–444, 2015. hereditary retinal diseases recognition,” in Proc. Int. Conf. Med. Image
[2] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image Comput. Comput.-Assist. Interv., 2021, pp. 97–107.
recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, [21] W. Rawat and Z. Wang, “Deep convolutional neural networks for image
pp. 770–778. classification: A comprehensive review,” Neural Comput., vol. 29, no. 9,
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification pp. 2352–2449, 2017.
with deep convolutional neural networks,” Commun. ACM, vol. 60, no. 6, [22] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of
pp. 84–90, 2017. data with neural networks,” Sci., vol. 313, no. 5786, pp. 504–507, 2006.
[4] E. Schwartz et al., “Delta-encoder: An effective sample synthesis method [23] A. Acevedo, A. Merino, S. Alférez, Á. Molina, L. Boldú, and J. Rodel-
for few-shot object recognition,” in Proc. Adv. Neural Inf. Process. Syst., lar, “A dataset of microscopic peripheral blood cell images for devel-
2018, pp. 2850–2860. opment of automatic recognition systems,” Data Brief, vol. 30, 2020,
[5] G. Litjens et al., “A survey on deep learning in medical image analysis,” Art. no. 105474.
Med. Image Anal., vol. 42, pp. 60–88, 2017. [24] J. N. Kather et al., “Predicting survival from colorectal cancer histology
[6] D. Erhan, A. Courville, Y. Bengio, and P. Vincent, “Why does unsupervised slides using deep learning: A retrospective multicenter study,” PLoS Med.,
pre-training help deep learning?,” in Proc. 13th Int. Conf. Artif. Intell. vol. 16, no. 1, 2019, Art. no. e1002730.
Statist., 2010, pp. 201–208. [25] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “Chestx-
[7] H. Qi, M. Brown, and D. G. Lowe, “Low-shot learning with imprinted ray8: Hospital-scale chest x-ray database and benchmarks on weakly-
weights,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, supervised classification and localization of common thorax diseases,” in
pp. 5822–5830. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2097–2106.

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.
28 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 27, NO. 1, JANUARY 2023

[26] O. Vinyals et al., “Matching networks for one shot learning,” in Proc. Adv. [34] G. Cheng, R. Li, C. Lang, and J. Han, “Task-wise attention guided part
Neural Inf. Process. Syst., 2016, pp. 3637–3645. complementary learning for few-shot image classification,” Sci. China Inf.
[27] W. Li et al., “LibFewShot: A comprehensive library for few-shot learning,” Sci., vol. 64, no. 2, pp. 1–14, 2021.
2022, arXiv:2109.04898. [35] F. Zhou, L. Zhang, and W. Wei, “Meta-generating deep attentive metric for
[28] L. Bottou, “Stochastic gradient descent tricks,” in Neural Networks: Tricks few-shot classification,” IEEE Trans. Circuits Syst. Video Technol., vol. 32,
of the Trade, Berlin, Germany: Springer, 2012, pp. 421–436. no. 10, pp. 6863–6873, Oct. 2022.
[29] I. K. M. Jais, A. R. Ismail, and S. Q. Nisa, “Adam optimization algorithm [36] K. Lee, S. Maji, A. Ravichandran, and S. Soatto, “Meta-learning with
for wide and deep neural network,” Knowl. Eng. Data Sci., vol. 2, no. 1, differentiable convex optimization,” in Proc. IEEE/CVF Conf. Comput.
pp. 41–46, 2019. Vis. Pattern Recognit., 2019, pp. 10657–10665.
[30] J. Gordon, J. Bronskill, M. Bauer, S. Nowozin, and R. E. Turner, “Versa: [37] C. Simon, P. Koniusz, R. Nock, and M. Harandi, “Adaptive subspaces
Versatile and efficient few-shot learning,” in Proc. 3rd Workshop Bayesian for few-shot learning,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern
Deep Learn., 2018, pp. 1–9. Recognit., 2020, pp. 4136–4145.
[31] L. Bertinetto, J. F. Henriques, P. Torr, and A. Vedaldi, “Meta-learning [38] Y. Chen, Z. Liu, H. Xu, T. Darrell, and X. Wang, “Meta-baseline: Exploring
with differentiable closed-form solvers,” in Proc. Int. Conf. Learn. Rep- simple meta-learning for few-shot learning,” in Proc. IEEE/CVF Int. Conf.
resentations, 2019, pp. 1–15. [Online]. Available: https://openreview.net/ Comput. Vis., 2021, pp. 9062–9071.
forum?id=HyxnZh0ct7 [39] Q. Luo, L. Wang, J. Lv, S. Xiang, and C. Pan, “Few-shot learning
[32] A. Raghu, M. Raghu, S. Bengio, and O. Vinyals, “Rapid learning or feature via feature hallucination with variational inference,” in Proc. IEEE/CVF
reuse? towards understanding the effectiveness of MAML,” in Proc. Int. Winter Conf. Appl. Comput. Vis., 2021, pp. 3963–3972.
Conf. Learn. Representations, 2020, pp. 1–21. [Online]. Available: https: [40] Z. Yu, L. Chen, Z. Cheng, and J. Luo, “Transmatch: A transfer-learning
//openreview.net/forum?id=rkgMkCEtPB scheme for semi-supervised few-shot learning,” in Proc. IEEE/CVF Conf.
[33] A. Antoniou, H. Edwards, and A. Storkey, “How to train your MAML,” Comput. Vis. Pattern Recognit., 2020, pp. 12856–12864.
in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–11. [Online].
Available: https://openreview.net/forum?id=HJGven05Y7

Authorized licensed use limited to: Mar Athanasius College Of Engineering. Downloaded on September 02,2024 at 08:23:05 UTC from IEEE Xplore. Restrictions apply.

You might also like