Multi-Learner Based Deep Meta-Learning For Few-Shot Medical Image Classification
Fig. 1. Pipeline of our proposed FSL framework, consisting of a transfer-learning phase and a meta-learning phase. Three learners, i.e., the auto-encoder, the metric-learner and the task-learner, constitute our model; the learners shown in bold are learnable while the others are frozen at each phase. SS denotes the scaling and shifting parameters of the learners (encoder and metric-learner).
meta-learning is to transfer experience from base learning tasks (meta-training) to unseen tasks (meta-testing). Humans are able to quickly learn an object from just a few samples and can apply the acquired skills to learning new tasks. Thus, humans have the inborn ability to learn how to learn, which is the essence of meta-learning. Although meta-learning demonstrates great promise in the machine learning community, it has not been fully evaluated and widely applied in medical scenarios. One of the biggest challenges is the domain shift between the meta-training and meta-testing datasets, which is difficult to stabilize and handle. In general, the contents of medical images are more fine-grained. The similarity differences between inter-class and intra-class samples of medical images are mostly not very significant, which increases the difficulty of feature extraction. Thus, a good and robust feature representation learner is quite important, especially for the FSL task.

To conquer the challenges of few-shot classification of medical images in multiple modalities, we propose a novel FSL model with three learners, i.e., an auto-encoder, a metric-learner and a task-learner. The pipeline of our method is elaborated in Fig. 1, consisting of a transfer-learning phase and a meta-learning phase. Concretely, the auto-encoder and the metric-learner help to extract feature representations with good semantic consistency and similarity difference, respectively. The task-learner then performs specific classification tasks based on the well-extracted feature representations. Inspired by Sun et al. [17], we have inherited and improved their FSL framework. On the one hand, the transfer-learning phase initially conducts preliminary training on large-scale data using all training datapoints of the meta-training split, followed by preliminary testing on N-way K-shot tasks in the meta-validation split. On the other hand, after transfer-learning we introduce scaling and shifting (SS) parameters for both the pre-trained encoder and the metric-learner [17]. More specifically, the SS of the encoder is fine-tuned while the task-learner and metric-learner are re-trained through multiple episodes in meta-training. Then, the SS of the metric-learner is fine-tuned and the task-learner is re-trained again for fast adaptation to novel unseen tasks in meta-testing.

The contributions of this work can be summarized as follows:
1) We propose an effective learning framework for few-shot medical image classification tasks, including two phases, transfer-learning and meta-learning, which can rescue the learning dilemma caused by the scarcity of target learning samples.
2) We design a multi-learner based FSL model that integrates an auto-encoder, a metric-learner and a task-learner to improve the accuracy and robustness of our model.
3) We propose a Gaussian disturbance soft label (GDSL) for each medical image during training to reduce the risk of over-fitting, which is illustrated in Section IV. Experiments show that the GDSL strategy can improve the performance of the FSL model.
4) We simulate three FSL scenarios of medical image classification (i.e., BLOOD, PATHOLOGY and CHEST) based on three publicly available medical image datasets, and further validate the effectiveness of our method.
5) We evaluate the generalization of our method on a non-medical public dataset, i.e., miniImageNet. Then, we prove the transferability of our method in the cross-domain case from miniImageNet to the medical datasets. Experimental results demonstrate the superiority of our method.

II. RELATED WORK

The research literature on FSL for image classification exhibits great diversity, spanning from data augmentation [4] to supervised learning [5], [6], [7], [8], [9], [10], [12], [14], [16], [17]. In this work, the FSL methods most relevant to ours, including fine-tuning based, metric-learning based and meta-learning based methods, are introduced in detail as follows.

A. Fine-Tuning Based Methods

The fine-tuning based method follows a standard transfer learning procedure, which is a leading strategy in medical image analysis [5]. Such research aims at solving a specific task in the target domain by transferring the knowledge learned from a relevant source domain. For a new training task, a model pre-trained on large-scale images from a similar domain has proven to be a better parameter initialization [6]. The fine-tuning based methods consist of two stages, pre-training with base classes and fine-tuning with novel classes. In the pre-training stage, the whole auxiliary set with base classes is utilized to train a feature extractor and classifier via the standard cross-entropy loss. Then, the fine-tuning strategy is conducted based on the support set with novel classes to relearn parameters of the feature extractor and classifier. Basically, partial or whole
layers of the pre-trained feature extractor are fixed to avoid over-fitting due to the limited size of the support set. Once the fine-tuned feature extractor and classifier are obtained, the query set can be predicted to evaluate the performance of the united models. Qi et al. [7] proposed a novel fine-tuning based classifier with imprinted weights that are generated as the mean of the feature embedding vectors of low-shot samples. Their experiments proved that the method can provide better generalization than the comparative embedding method. Chen et al. [8] showed that fine-tuning based methods compare favorably against other FSL approaches in a realistic cross-domain evaluation setting. Although different modalities of base classes can provide different types of transfer knowledge, Cheplygina [9] reported that base classes do not have to be related to novel classes. Thus, transfer learning from natural datasets (source domain) to medical datasets (target domain) is feasible [9]. However, if the amount of data in the novel classes is extremely small, the fine-tuning based method is still prone to over-fitting and lacks generalization for the novel classes.
B. Metric-Learning Based Methods

Metric-learning based methods follow a simple methodology, directly comparing the similarities or distances between the query image and each labeled image (or support image) in the support set. Specifically, the entire support set is first jointly encoded into a latent representation space. Then, each query image is also projected into this space so as to compute the similarity between each query image and each support image. Based on the similarity measurement, the category of each query image can be predicted.

The prototypical network (ProtoNet) [10] and its derivates [11] are classical metric-learning based methods. The mean vector of the feature embeddings of each support class is calculated as its corresponding prototype representation. Then, the similarity between each query image and each prototype is used for classification. Concretely, the nearest-neighbor classifier is employed for prediction in the test stage. Sung et al. [12] proposed a relation network (RelationNet), another representative metric-learning method. The RelationNet can learn a non-linear metric through a neural network rather than selecting a specific metric function. Moreover, Li et al. [13] put forward a covariance metric network (CovaMNet) that adopted a new covariance metric with a second-order local covariance representation for each class, instead of conventional first-order class representations (e.g., the mean vector). In metric-learning based methods, the classifier (e.g., the nearest-neighbor classifier) carries no trainable parameters of its own. Therefore, there is no need to employ a fine-tuning procedure in the test stage.
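For concreteness, the prototype-and-nearest-neighbor procedure described above can be written down directly; the following minimal PyTorch sketch illustrates ProtoNet-style classification [10] (the tensor shapes are illustrative assumptions), not the model proposed in this paper.

```python
import torch

def proto_classify(support, support_labels, query, n_way):
    """Nearest-prototype classification in the style of ProtoNet [10].

    support:        (N*K, D) embeddings of the support images
    support_labels: (N*K,) integer class ids in [0, n_way)
    query:          (M, D) embeddings of the query images
    returns:        (M,) predicted class ids
    """
    # Prototype = mean embedding of each support class.
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )  # (N, D)
    # Distance to each prototype; the nearest prototype gives the label.
    dists = torch.cdist(query, prototypes)  # (M, N)
    return dists.argmin(dim=1)
```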
C. Meta-Learning Based Methods

Meta-learning based methods normally employ a meta-training procedure on a series of few-shot tasks derived from the base classes in the training stage. This procedure helps a well-designed model fast adapt to unseen tasks in the test stage. In brief, the meta-training paradigm is composed of a two-loop optimization between the base-learner and the meta-learner. The base-learner is updated through the training datapoints of each task. Next, the meta-learner is optimized by meta fine-tuning with the test datapoints of each task. Consequently, the meta-learner is able to learn cross-task meta-knowledge, benefitting fast adaptation on novel tasks.
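This two-loop optimization can be sketched schematically as below, assuming a simple linear base-learner and a first-order gradient approximation; it is an illustrative reading of the paradigm, not the exact algorithm of [14] nor of this paper.

```python
import torch
import torch.nn.functional as F

def adapt_and_evaluate(w, b, x_s, y_s, x_q, y_q, inner_lr=0.01, inner_steps=1):
    """Inner loop: the base-learner (w, b) adapts on the support set;
    the returned query loss drives the outer (meta) update."""
    fast_w, fast_b = w, b
    for _ in range(inner_steps):
        inner_loss = F.cross_entropy(x_s @ fast_w + fast_b, y_s)
        # No create_graph: the cheaper first-order approximation.
        g_w, g_b = torch.autograd.grad(inner_loss, (fast_w, fast_b))
        fast_w, fast_b = fast_w - inner_lr * g_w, fast_b - inner_lr * g_b
    return F.cross_entropy(x_q @ fast_w + fast_b, y_q)

# Outer loop: update the shared initialization from the query losses.
w = torch.zeros(64, 5, requires_grad=True)
b = torch.zeros(5, requires_grad=True)
meta_opt = torch.optim.SGD([w, b], lr=0.001)
for _ in range(100):                               # episodes sampled from p(T)
    x_s, y_s = torch.randn(25, 64), torch.randint(0, 5, (25,))
    x_q, y_q = torch.randn(75, 64), torch.randint(0, 5, (75,))
    meta_opt.zero_grad()
    adapt_and_evaluate(w, b, x_s, y_s, x_q, y_q).backward()
    meta_opt.step()
```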
Finn et al. [14] proposed the model-agnostic meta-learning (MAML) method, one of the popular representatives of meta-learning. Moreover, some derivates of MAML were also developed [15]. The main idea of MAML is to learn an initialization of the neural network that follows the fast gradient direction to classify novel classes effectively. Besides, the latent embedding optimization (LEO) [16] method has a learning algorithm similar to that of MAML, including an inner loop for getting a task-specific parameter initialization and an outer loop for parameter updating. However, instead of directly learning the explicit high-dimensional model parameters, LEO decouples the gradient-based adaptation process within a low-dimensional latent space and learns a generative distribution of model parameters. Early meta-learning based methods merely followed a pure meta-training paradigm, training a model from scratch. However, in recent image recognition tasks, researchers have also attempted to combine fine-tuning and meta-learning to form hybrid approaches. Sun et al. [17] proposed a meta-transfer learning (MTL) approach to leverage the advantages of both transfer-learning and meta-learning in the FSL setting.

There has been a relatively small amount of FSL research on medical images. Mahajan et al. [18] implemented few-shot skin disease identification and tried fast model adaptation in long-tailed class distribution settings, based on the meta-learning framework. Hu et al. [19] devised a novel data augmentation method, operating not in the input space but in the logit space, which effectively alleviates over-fitting for classification tasks with limited medical images. Mai et al. [20] formulated the retinal disease FSL problem as a Student-Teacher learning task with both a discriminative feature space and the knowledge distillation (KD) technique. In this paper, we propose an innovative FSL classification method for multiple-modality medical images on the basis of meta-learning, merging the merits of fine-tuning and metric-learning simultaneously.
Fig. 2. Pictorial representation of our proposed method, including: (a) the preliminary-training stage on large-scale data in meta-training, (b) the training stage of meta-training, and (c) the training stage of meta-testing. Thereafter, the performance of the model is evaluated at the test stage of meta-testing.

III. PROBLEM DEFINITION AND DENOTATION
Meta-learning generally consists of two stages, meta-training and meta-testing [17]. Both meta-training and meta-testing also contain training and testing stages. Additionally, the samples in meta-training and meta-testing are not datapoints but episodes, and each episode is a few-shot classification task. Furthermore, the objective of meta-learning is not to classify unseen datapoints but to fast adapt the previously learned experience or knowledge to a new few-shot classification task.

The denotations of meta-training and meta-testing are as follows. Given an auxiliary image dataset Dbase that has sufficient images of base classes for meta-training, we first sample several tasks from a distribution p(T) such that each T has a few images from some classes. T is also called an episode, containing a support set S to train the task-learner, and a query set Q to compute a specified validation loss that is used to optimize the auto-encoder and metric-learner. S consists of multiple N-way K-shot tasks, in which N is the number of selected classes and K is the number of selected images for each selected class. Q contains M images randomly selected from the remaining images of the N selected classes as test samples. In particular, meta-training aims to learn from multiple episodes sampled from p(T). For meta-testing, given an unseen novel image dataset Dnovel, a new task Tnovel is sampled similarly. "Unseen" means that there is no overlap of classes between the meta-testing and meta-training tasks. In our method, the Tnovel in meta-testing starts from the experience of the encoder and metric-learner, and eventually adapts the task-learner. The final evaluation is done by testing a set of unseen images in the query set of Tnovel. Hence, our method tries to optimize the multiple learners under the meta-learning framework to achieve better performance on multiple medical image FSL tasks.
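As a minimal sketch of the episode structure defined above, the following function samples one N-way K-shot episode from a labeled image pool; the pool layout and a fixed per-class query count are illustrative assumptions, not the paper's released code.

```python
import random

def sample_episode(pool, n_way, k_shot, m_query):
    """Sample one N-way K-shot episode (support set S, query set Q).

    pool: dict mapping class name -> list of images (paths or arrays).
    Returns (support, query) as lists of (image, episode_label) pairs.
    """
    classes = random.sample(sorted(pool), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        images = random.sample(pool[cls], k_shot + m_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query

# e.g., a 3-way 5-shot episode with 15 query images per class:
# S, Q = sample_episode(pool, n_way=3, k_shot=5, m_query=15)
```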
IV. METHODOLOGY

As shown in Fig. 1, our method consists of two main training phases, transfer-learning and meta-learning. Fig. 2(a) displays the procedure of transfer-learning, which is introduced in Section IV-A. Fig. 2(b) and (c) demonstrate the pictorial representations of the meta-training and meta-testing stages, respectively. Meta-learning is described in Section IV-B. In addition, to boost the overall learning efficiency, both the data augmentation (Section IV-C) and the Gaussian disturbance soft label (GDSL) (Section IV-D) strategies are applied to the meta-learning procedure.

A. Transfer Learning From Large-Scale Data

It has been confirmed that basic graphic features without special semantic meanings are learned in the first few layers of a convolutional neural network (CNN) [21]. Thus, the
well-trained initialization parameters obtained through transfer learning can help to reduce the difficulty of learning basic graphic features in FSL tasks. In this paper, images from Dbase with all base classes in meta-training are initially utilized to train a classification model. Inspired by the MTL method [17], we improve the architecture of the FSL model, which consists of an auto-encoder (FAE), a metric-learner (FM) and a task-learner (or classifier, FT). FAE and FM are responsible for extracting robust and universal hidden representations (Fig. 2(a)), which is highly relevant to the final classification performance. FAE can enhance the semantic consistency between the hidden representation and the original image, and FM can increase the clustering performance of the hidden representations. FT is related to the target task, which can differ between transfer-learning and meta-learning. The well-trained FAE and FM are utilized again in the following meta-learning. To sum up, the convergence path of the parameters of the proposed FSL model is guided by an integrated loss derived from FAE, FM and FT (Fig. 2). These three components are introduced in detail below.

1) Auto-Encoder: The auto-encoder (FAE) is a self-supervised learning technique that is mainly composed of an encoder (HEn) and a decoder (HDe) [22]. HEn maps a high-dimensional input X into a low-dimensional hidden representation Xh. HDe performs the opposite operation of HEn, that is, reconstructing X' from Xh. The process can be represented as (1). The loss function La of FAE is expected to minimize the reconstruction error, as shown in (2). Inside, L, N and M express the channel, width and height of X and X', respectively.

$$ H_{En}(X) = X_h, \qquad H_{De}(X_h) = X'. \tag{1} $$

$$ L_a = \frac{1}{LNM}\sum_{k=1}^{L}\sum_{i=1}^{N}\sum_{j=1}^{M}\left(X'_{kij} - X_{kij}\right)^2. \tag{2} $$

In our model, FAE is a fully convolutional network and HEn is also regarded as the shared network layers of the feature extractor. The size of Xh is reduced to $1/2^n$ of X after employing n downsampling steps in HEn. We want Xh to carry more semantic information about X. Then, we utilize HDe to reconstruct X', which has the same size as X. To ensure semantic consistency between X' and X, the pixel-to-pixel mean square error loss function is introduced to train FAE. Notably, HDe works only in the transfer-learning phase and stays deactivated in the meta-learning phase. In meta-training, HEn is frozen and, instead, a group of light-weight SS parameters is adopted to fine-tune HEn.
light-weight SS parameters are adopted to fine-tune HEn . 3) Task-Learner: The task-learner (FT ) is a task-specific
2) Metric-Learner: Metric learning is the overall expression classifier that is trained from scratch for each task or episode
for machine learning approaches based directly on similarities in both transfer-learning and meta-learning phases. As experi-
between samples. The concept of distance is used to describe mental results in [17] showed that the base-learner (FT in our
a relation between two samples. Naturally, two samples being study) with one-layer fully-connected network obtained better
closer to each other means that they are more similar, and vice classification accuracy than others. Hence, we keep this architec-
versa. In our study, taken Xh as the input, we construct a triplet ture of FT . FT is trained through the cross-entropy loss function
metric-learner (FM ) of two neural network layers for metric (Lt ) that is expressed in (8), where C is the number of classes
learning which can be shown in Fig. 2. Besides, to avoid the and yi , pi represent the ground truth and predict probability of
redundancy of FM , we implement parameter sharing for each a sample, respectively.
branch of the triplet. In the preliminary training stage, FM is C
trained for getting appropriate initialization parameters for the Lt = − [yi log(pi ) + (1 − 1yi )log(1 − pi )] . (8)
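The following sketch implements the cosine distance (4), the triplet margin loss (3) with the stated margin of 1.0, and the pair/triplet counts (5)–(7); it is a minimal reading of the equations above, with the tensor shapes as illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

def cosine_distance(a, b):
    """Eq. (4): D(A, B) = 1 - cosine similarity, computed row-wise."""
    return 1.0 - F.cosine_similarity(a, b, dim=-1)

def triplet_margin_loss(r_a, r_p, r_n, margin=1.0):
    """Eq. (3): penalize D(a,p) not being below D(a,n) by the margin."""
    return torch.clamp(
        cosine_distance(r_a, r_p) - cosine_distance(r_a, r_n) + margin,
        min=0.0,
    ).mean()

def pair_and_triplet_counts(n_way, k_shot):
    """Eqs. (5)-(7): numbers of positive pairs, negative pairs, triplets."""
    p_pos = math.comb(n_way, 1) * math.comb(k_shot, 2)
    p_neg = math.comb(n_way, 2) * k_shot * k_shot
    p_tri = n_way * (n_way - 1) * math.comb(k_shot, 2) * (k_shot - 1)
    return p_pos, p_neg, p_tri

# e.g., 3-way 5-shot: 30 positive pairs, 75 negative pairs, 240 triplets.
print(pair_and_triplet_counts(3, 5))
```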
3) Task-Learner: The task-learner (FT) is a task-specific classifier that is trained from scratch for each task or episode in both the transfer-learning and meta-learning phases. The experimental results in [17] showed that a base-learner (FT in our study) with a one-layer fully-connected network obtained better classification accuracy than the alternatives. Hence, we keep this architecture for FT. FT is trained through the cross-entropy loss function (Lt) expressed in (8), where C is the number of classes and yi, pi represent the ground truth and predicted probability of a sample, respectively.

$$ L_t = -\sum_{i=1}^{C}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]. \tag{8} $$
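A minimal sketch of the one-layer task-learner and the loss (8) follows; the 640-dimensional input, the 3 classes, and the use of a sigmoid to obtain pi are illustrative assumptions rather than details confirmed by the paper.

```python
import torch
import torch.nn as nn

task_learner = nn.Linear(640, 3)              # one fully-connected layer (F_T)

def loss_t(logits, y):
    """Eq. (8): class-wise binary cross-entropy against (soft) labels y."""
    p = torch.sigmoid(logits)                 # assumed mapping to probabilities
    per_sample = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).sum(dim=1)
    return per_sample.mean()

xh = torch.randn(15, 640)                     # hidden representations
y = torch.eye(3)[torch.randint(0, 3, (15,))]  # one-hot labels (GDSL may soften)
loss = loss_t(task_learner(xh), y)
loss.backward()
```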
in Fig. 5. A random value of ε is introduced to the label of each sample in every iteration. For instance, the original one-hot label {0, 1, 0} can be transformed into the GDSL {0.5ε, 1 − ε, 0.5ε}. Notably, the GDSL of each sample in each epoch is not exactly the same, and the larger the value of σ, the larger the mapping range in the label space.
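A minimal sketch of the GDSL transformation described above is given below; it is our reading of the text, and the clipping of ε to keep labels valid is an added assumption.

```python
import numpy as np

def gdsl(one_hot, mu=0.0, sigma=0.01, rng=np.random.default_rng()):
    """Gaussian disturbance soft label: put mass 1-eps on the true class
    and spread eps evenly over the other classes, with eps ~ N(mu, sigma)."""
    eps = float(np.clip(rng.normal(mu, sigma), 0.0, 0.5))  # keep labels valid
    c = one_hot.shape[0]
    soft = np.full(c, eps / (c - 1))          # e.g., {0,1,0} -> {0.5e, 1-e, 0.5e}
    soft[one_hot.argmax()] = 1.0 - eps
    return soft

print(gdsl(np.array([0.0, 1.0, 0.0]), sigma=0.04))
```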
V. EXPERIMENTS

A. Datasets

To test our FSL method, we randomly sample and rebuild three light-weight subsets from three publicly available medical image datasets (i.e., BLOOD [23], PATHOLOGY [24] and CHEST [25]). The categories in each dataset are separated into three parts, meta-training, meta-validation and meta-testing, in which the name and the quantity of each category are enumerated in Table I. In this paper, the problem of identifying diseases/categories in the meta-testing classes is modeled as an FSL problem, and we verify our method on the three datasets respectively.

BLOOD is built based on a prior database of individual normal cells which are organized into eight classes [23]. Inside, the immature granulocytes (IG) class contains four sub-types (type names beginning with 'IG-'), which are taken as meta-validation classes. The remaining seven types in BLOOD, with 600 images selected randomly per type, are employed for meta-training and meta-testing.

PATHOLOGY is constructed to predict survival from colorectal cancer histology slides, drawn from a dataset (NCT-CRC-HE-100K) of non-overlapping image patches from hematoxylin & eosin stained histological images [24].
B. Implementation Details

1) Episode Sampling: Regarding the three medical datasets, we consider the 3-class classification task and randomly select 1 (5 or 10) images from each class as training samples and 15 images from the rest as test samples. More concretely, 3-way K-shot (K = {1, 5, 10}) tasks are constructed for meta-learning. We randomly sample at most 5 k episodes in the meta-training stage, and 600 episodes per test experiment for both meta-validation and meta-testing. Notably, the model with the highest meta-validation accuracy is selected for meta-testing.

2) Network Architectures: Following the literature [17], [27], several embedding backbones are taken as feature extractors, such as the Conv32F (4 CONV) and ResNet-style networks. Specifically, the Conv32F consists of four convolution blocks, each of which in turn is composed of a convolution layer, a batch-normalization layer, a ReLU layer and a max-pooling layer. The numbers of filters for these blocks are {32, 32, 32, 32}. The commonly utilized ResNet-style networks for FSL include ResNet12, ResNet18 and ResNet25 [27]. The depth of a ResNet-style model is adjusted by adding or subtracting blocks [2]. However, previous research has verified that the FSL performance of different ResNet-style models is not simply "the deeper the better" [27]. In our method, the ResNet25 backbone with 12 blocks (of 2 CNN layers each) plus one additional CNN layer is regarded as HEn, and a hidden representation Xh with 640 dimensions is finally obtained.
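The Conv32F backbone described above can be written down directly; this sketch follows the stated block structure (convolution, batch norm, ReLU, max-pooling; 32 filters per block), with the 3x3 kernel size as an assumption.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One Conv32F block: convolution, batch norm, ReLU, 2x2 max-pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

conv32f = nn.Sequential(                       # four blocks, {32, 32, 32, 32}
    conv_block(3, 32), conv_block(32, 32),
    conv_block(32, 32), conv_block(32, 32),
)

features = conv32f(torch.randn(1, 3, 84, 84))  # -> (1, 32, 5, 5) for 84x84 input
```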
3) Training Details: During the transfer-learning phase, the model is trained by the SGD optimizer [28] for a total of 100 epochs. The learning rate is initialized as 0.1 and decayed to 0.2 times its value every 20 epochs. In the meta-learning phase, for N-way K-shot tasks, a regular learning iteration comprises a training step for optimizing FM and φSS{E,M} through the Adam optimizer [29], followed by a validation step for optimizing FT by the SGD optimizer. The learning rate of Adam is initialized as 0.001 and decayed by half every 10 epochs. Concurrently, the original learning rate of SGD is set to 0.01
without periodic decay. Notably, in both transfer-learning and meta-learning, the batch size is pre-set to 32 and training stops after 50 epochs.
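In the training step above, only FM, FT and the light-weight SS parameters φSS{E,M} receive gradients. A schematic reading of the scaling-and-shifting idea (following MTL [17]; the per-channel placement on the frozen layer's output is our assumption, not the exact formulation of [17]) is:

```python
import torch
import torch.nn as nn

class ScaleShift2d(nn.Module):
    """Light-weight SS parameters for a frozen conv layer: only the
    per-channel scale gamma and shift beta are trainable."""
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        for p in self.conv.parameters():       # freeze the pre-trained weights
            p.requires_grad = False
        c = conv.out_channels
        self.gamma = nn.Parameter(torch.ones(c, 1, 1))   # scaling, init 1
        self.beta = nn.Parameter(torch.zeros(c, 1, 1))   # shifting, init 0

    def forward(self, x):
        return self.conv(x) * self.gamma + self.beta

layer = ScaleShift2d(nn.Conv2d(32, 32, 3, padding=1))
out = layer(torch.randn(1, 32, 21, 21))        # only gamma/beta get gradients
```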
4) Evaluation: Classification accuracy is used to evaluate the performance of the FSL models. According to the above episode sampling rule, each test experiment contains 600 random episodes. In the meta-testing stage, we repeat the test experiments 5 times so as to get 3,000 episodes. Finally, the average accuracy with the 95% confidence interval is reported. Taking CHEST as an example, in each test experiment we randomly select three categories from the meta-testing classes (i.e., Edema, Fibrosis, Hernia and Pneumonia) as the unseen novel classes of each episode, for the 3-way K-shot (K = {1, 5, 10}) tasks. Finally, the average accuracy is calculated over the 3,000 episode tests, which greatly reduces the performance perturbation.
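The reported statistic can be computed as below; the normal approximation (1.96 standard errors) for the 95% confidence interval is a common convention in the FSL literature and an assumption here.

```python
import numpy as np

def mean_and_ci95(accuracies):
    """Average episode accuracy with a 95% confidence half-width."""
    acc = np.asarray(accuracies)
    mean = acc.mean()
    half_width = 1.96 * acc.std(ddof=1) / np.sqrt(len(acc))
    return mean, half_width

# e.g., over 3000 meta-testing episodes:
accs = np.random.uniform(0.5, 0.8, size=3000)   # placeholder accuracies
m, ci = mean_and_ci95(accs)
print(f"{100 * m:.2f} ± {100 * ci:.2f} %")
```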
VI. RESULT ANALYSIS

A. Evaluations on Multiple Embedding Backbones

Table II demonstrates the evaluation results of our proposed method using HEn with different backbones, including 4 CONV, ResNet18 and ResNet25, on the three datasets. From Table II, we can conclude that our method with deeper backbones yields better performance on each dataset. Taking BLOOD as an example, the classification accuracy of the ResNet18-based model outperforms that of the 4 CONV-based one by over 4.5%. The ResNet25-based model achieves the best result with 63.47%, which is 5.48% higher than the 57.99% of the ResNet18-based model for the 1-shot task. In brief, our method acquires better few-shot classification performance with relatively deeper network architectures.
B. Comparison to State-of-The-Arts

Tables III, IV, and V present the overall comparisons to related works using the benchmark image size on the BLOOD, PATHOLOGY and CHEST datasets. The classification accuracies of our method with the ResNet25 embedding are reported. For all compared methods, we have implemented their approaches with ResNet-style backbones based on open-sourced code and optimized their performance via empirical parameter tuning. Note that these evaluation results are the meta-testing results based on the well-trained models with the highest meta-validation accuracies. In these tables, the performance gradually gets better with the increase of shot number for all FSL methods on each dataset.

BLOOD. In Table III, our method achieves top performance for 1-shot tasks and sub-optimal performance for 5-shot tasks, respectively. Regarding the 3-way 10-shot classification tasks, although our method is not optimal, it still shows impressive performance and outperforms several comparative methods by quite large margins. For example, our method has a 10-shot accuracy of 76.21%, which is 18.52% higher than the 57.69% of MAML. Furthermore, Fig. 6(a) illustrates the t-SNE distribution of the hidden representations of three novel meta-testing categories with our method, which manifests a better clustering effect than that of the MTL method (Fig. 6(b)) [17]. In sum, our method contributes to an optimized embedding space from which we can obtain high-cohesion and low-coupling features, boosting the FSL performance.
TABLE IV
THE 3-WAY, 1-SHOT, 5-SHOT AND 10-SHOT CLASSIFICATION ACCURACY (%) ON THE PATHOLOGY DATASET. THE BEST AND SECOND-BEST RESULTS ARE HIGHLIGHTED

PATHOLOGY. In Table IV, we give the results on PATHOLOGY. From this table, we again confirm that our method outperforms the others. Our method achieves around a margin of 3.55% (0.05%) over the second-best Versa method on the 1-shot (5-shot) tasks. An interesting observation is that the gains from 1-shot tasks to 5-shot tasks are much greater than the gains from 5-shot tasks to 10-shot tasks for each FSL method. This shows that FSL models are more data-hungry when the shot number is relatively small.

TABLE V
THE 3-WAY, 1-SHOT, 5-SHOT AND 10-SHOT CLASSIFICATION ACCURACY (%) ON THE CHEST DATASET. THE BEST AND SECOND-BEST RESULTS ARE HIGHLIGHTED

CHEST. Table V shows the results on CHEST. The differences between the types of chest X-ray images are not obvious, which increases the difficulty of the FSL tasks. However, from Table V, we still observe that our approach consistently achieves finer performance. Our approach outperforms ANIL, which performs poorly, by around 10% for the 1-shot tasks and by over 12% for both the 5-shot and 10-shot tasks, respectively.
As the pathological features of CHEST may be severely attenuated under the benchmark image size (i.e., 84 × 84), we also validate the classification accuracy of 3-way, 1-shot and 5-shot tasks on CHEST with the larger size of 224 × 224. Due to computational resource constraints, the 3-way 10-shot experiments are omitted. From Table VI, we can observe that several FSL methods, including ProtoNet, R2D2, MTL, and ours, acquire significant performance promotion with the larger image size. We speculate that images of larger size make the pathological details clearer. Inside, our method still achieves the best classification performance on both the 3-way 1-shot and 5-shot tasks. Concretely, compared with the small image size experiments, our method also obtains 2.86% and 3.56% gains on the 3-way 1-shot and 5-shot tasks respectively. However, noise is inevitably introduced when enlarging the image size. This directly leads to the performance of some methods deteriorating on the large images, especially for the 3-way 1-shot tasks. To sum up, our method has better adaptability to large-size images and achieves top performance on few-shot classification tasks.

TABLE VI
THE 3-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) ON THE CHEST DATASET WITH 224 × 224 IMAGE SIZE. THE BEST AND SECOND-BEST RESULTS ARE HIGHLIGHTED

C. Ablation Study

TABLE VII
ABLATION STUDY OF TRANSFER-LEARNING (TL), DATA AUGMENTATION IN META-LEARNING (DA-META), GDSLσ=0.01, AUTO-ENCODER FAE AND METRIC-LEARNER FM FOR 5-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) ON THE MINIIMAGENET DATASET. THE BEST RESULTS ARE HIGHLIGHTED

Table VII displays ablation studies of the transfer-learning (TL), data augmentation in meta-learning (DA-meta), the GDSLσ=0.01 strategy, the auto-encoder FAE and the metric-learner FM, for 5-way 1-shot and 5-shot tasks on miniImageNet. The ablation experiment is founded on the baseline model with the ResNet25 backbone. As Table VII shows, compared with the baseline model, the transfer-learning (TL) phase obtains about 13% and 15% improvement in 5-way 1-shot and 5-shot classification accuracy, respectively. The main purpose of data augmentation in meta-learning (DA-meta) is to generate sufficient triplets for the metric-learner, as illustrated in Section IV-C. However, it can also improve the classification accuracy individually. Besides, the GDSLσ=0.01 strategy, which is used to alleviate model over-fitting, can enhance the classification accuracy, especially for the 5-way 1-shot tasks. Moreover, we can see that both FAE and FM independently boost the classification accuracies. Combining the two learners together further advances the classification performance, acquiring an accuracy of 64.26% for 5-way 1-shot and 79.35% for 5-way 5-shot. The results of the ablation experiments prove the necessity of the TL phase and DA-meta, and also demonstrate the effectiveness of the GDSL strategy and the multiple learners (FAE and FM), respectively. In the end, the combination of all learners with the GDSL strategy is validated as optimal for few-shot classification tasks.

D. Parameter Sensitivity Analysis
Regarding the GDSL strategy, the variance σ in N(μ, σ) reflects the mapping range in the label space for each sample, which has different influences on the performance of the FSL classification tasks. On the one hand, a larger σ makes the label fluctuate greatly, adversely degenerating the performance of the models. On the other hand, a smaller σ has little effect on the promotion of model generalization. To conduct a sensitivity analysis on the hyper-parameter σ, we set up experiments on the three medical datasets with 3-way, 1-shot, 5-shot and 10-shot tasks. Specifically, σ is compared in the range of [0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2]. Fig. 7 demonstrates the best meta-testing accuracy using different σ on BLOOD, PATHOLOGY and CHEST, respectively.

Fig. 7. Test accuracy of 3-way, 1-shot, 5-shot, and 10-shot tasks on the (a) BLOOD, (b) PATHOLOGY and (c) CHEST datasets with different variances (σ).

In Fig. 7(a), as for σ on BLOOD, although there is no clear winner among the different shot settings, σ = 0.04 achieves the highest test accuracy in the 1-shot and 10-shot tasks and decent test accuracy in the 5-shot tasks. Similarly, the optimal value of σ is 0.04 for CHEST. In Fig. 7(b), the test accuracy curve of 1-shot generally shows an initially upward and then downward trend. Thus, σ = 0.03 can be regarded as the best choice for PATHOLOGY.

TABLE VIII
THE 5-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) OF DIFFERENT FSL METHODS USING THE ORIGINAL PAPER SETTINGS ON THE MINIIMAGENET DATASET. THE BEST RESULTS ARE HIGHLIGHTED
TABLE IX
THE 5-WAY, 1-SHOT AND 5-SHOT CLASSIFICATION ACCURACY (%) ON CROSS-DOMAIN TRANSFERABILITY. ALL METHODS ARE LEARNED FROM THE SOURCE DOMAIN, AND DIRECTLY EVALUATED ON THE TEST SET OF THE TARGET DOMAIN

VII. CONCLUSION

In this paper, we propose an effective FSL framework for medical image classification, fusing both transfer-learning and meta-learning. We innovatively put forward a multi-learner based model, including an autoencoder, a metric-learner and a task-learner, which is trained sequentially in the training stages of transfer-learning and meta-learning. Extensive experiments on 3-way K-shot (K = {1, 5, 10}) FSL tasks over three medical image datasets (BLOOD, PATHOLOGY and CHEST) witness the superiority of our method compared with the state-of-the-arts. The consistent improvements brought by the GDSL strategy prove that the soft label space can expand the mapping range dynamically, benefitting efficient FSL. Concurrently, we verify the cross-domain transferability from miniImageNet to each medical dataset and further confirm the stability and robustness of our method. The proposed multiple learners can learn a few-shot classification task from several aspects, including semantic consistency, similarity and category discrimination. The multi-learner based model may provide a new research idea for FSL on medical images, and each learner can also be further improved to achieve better performance. In subsequent research, we will focus on optimizing the network structure of each learner to achieve more advanced classification accuracies on cross-domain FSL tasks.
REFERENCES

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.
[4] E. Schwartz et al., "Delta-encoder: An effective sample synthesis method for few-shot object recognition," in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 2850–2860.
[5] G. Litjens et al., "A survey on deep learning in medical image analysis," Med. Image Anal., vol. 42, pp. 60–88, 2017.
[6] D. Erhan, A. Courville, Y. Bengio, and P. Vincent, "Why does unsupervised pre-training help deep learning?," in Proc. 13th Int. Conf. Artif. Intell. Statist., 2010, pp. 201–208.
[7] H. Qi, M. Brown, and D. G. Lowe, "Low-shot learning with imprinted weights," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 5822–5830.
[8] W.-Y. Chen, Y.-C. Liu, Z. Kira, Y.-C. F. Wang, and J.-B. Huang, "A closer look at few-shot classification," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–16. [Online]. Available: https://openreview.net/forum?id=HkxLXnAcFQ
[9] V. Cheplygina, "Cats or cat scans: Transfer learning from natural or medical image source data sets?," Curr. Opin. Biomed. Eng., vol. 9, pp. 21–27, 2019.
[10] J. Snell, K. Swersky, and R. Zemel, "Prototypical networks for few-shot learning," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 4080–4090.
[11] S. Laenen and L. Bertinetto, "On episodes, prototypical networks, and few-shot learning," in Proc. Adv. Neural Inf. Process. Syst., 2021, pp. 24581–24592.
[12] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales, "Learning to compare: Relation network for few-shot learning," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1199–1208.
[13] W. Li, J. Xu, J. Huo, L. Wang, Y. Gao, and J. Luo, "Distribution consistency based covariance metric networks for few-shot learning," in Proc. AAAI Conf. Artif. Intell., 2019, pp. 8642–8649.
[14] C. Finn, P. Abbeel, and S. Levine, "Model-agnostic meta-learning for fast adaptation of deep networks," in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
[15] L. Wang, Q. Cai, Z. Yang, and Z. Wang, "On the global optimality of model-agnostic meta-learning," in Proc. Int. Conf. Mach. Learn., 2020, pp. 9837–9846.
[16] A. A. Rusu et al., "Meta-learning with latent embedding optimization," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–17. [Online]. Available: https://openreview.net/forum?id=BJgklhAcK7
[17] Q. Sun, Y. Liu, Z. Chen, T.-S. Chua, and B. Schiele, "Meta-transfer learning through hard tasks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 3, pp. 1443–1456, Mar. 2020.
[18] K. Mahajan, M. Sharma, and L. Vig, "Meta-dermdiagnosis: Few-shot skin disease identification using meta-learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 730–731.
[19] Y. Hu, Z. Zhong, R. Wang, H. Liu, Z. Tan, and W.-S. Zheng, "Data augmentation in logit space for medical image classification with limited training data," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Interv., 2021, pp. 469–479.
[20] S. Mai, Q. Li, Q. Zhao, and M. Gao, "Few-shot transfer learning for hereditary retinal diseases recognition," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Interv., 2021, pp. 97–107.
[21] W. Rawat and Z. Wang, "Deep convolutional neural networks for image classification: A comprehensive review," Neural Comput., vol. 29, no. 9, pp. 2352–2449, 2017.
[22] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[23] A. Acevedo, A. Merino, S. Alférez, Á. Molina, L. Boldú, and J. Rodellar, "A dataset of microscopic peripheral blood cell images for development of automatic recognition systems," Data Brief, vol. 30, 2020, Art. no. 105474.
[24] J. N. Kather et al., "Predicting survival from colorectal cancer histology slides using deep learning: A retrospective multicenter study," PLoS Med., vol. 16, no. 1, 2019, Art. no. e1002730.
[25] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2097–2106.
[26] O. Vinyals et al., "Matching networks for one shot learning," in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 3637–3645.
[27] W. Li et al., "LibFewShot: A comprehensive library for few-shot learning," 2022, arXiv:2109.04898.
[28] L. Bottou, "Stochastic gradient descent tricks," in Neural Networks: Tricks of the Trade. Berlin, Germany: Springer, 2012, pp. 421–436.
[29] I. K. M. Jais, A. R. Ismail, and S. Q. Nisa, "Adam optimization algorithm for wide and deep neural network," Knowl. Eng. Data Sci., vol. 2, no. 1, pp. 41–46, 2019.
[30] J. Gordon, J. Bronskill, M. Bauer, S. Nowozin, and R. E. Turner, "Versa: Versatile and efficient few-shot learning," in Proc. 3rd Workshop Bayesian Deep Learn., 2018, pp. 1–9.
[31] L. Bertinetto, J. F. Henriques, P. Torr, and A. Vedaldi, "Meta-learning with differentiable closed-form solvers," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–15. [Online]. Available: https://openreview.net/forum?id=HyxnZh0ct7
[32] A. Raghu, M. Raghu, S. Bengio, and O. Vinyals, "Rapid learning or feature reuse? Towards understanding the effectiveness of MAML," in Proc. Int. Conf. Learn. Representations, 2020, pp. 1–21. [Online]. Available: https://openreview.net/forum?id=rkgMkCEtPB
[33] A. Antoniou, H. Edwards, and A. Storkey, "How to train your MAML," in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–11. [Online]. Available: https://openreview.net/forum?id=HJGven05Y7
[34] G. Cheng, R. Li, C. Lang, and J. Han, "Task-wise attention guided part complementary learning for few-shot image classification," Sci. China Inf. Sci., vol. 64, no. 2, pp. 1–14, 2021.
[35] F. Zhou, L. Zhang, and W. Wei, "Meta-generating deep attentive metric for few-shot classification," IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 10, pp. 6863–6873, Oct. 2022.
[36] K. Lee, S. Maji, A. Ravichandran, and S. Soatto, "Meta-learning with differentiable convex optimization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 10657–10665.
[37] C. Simon, P. Koniusz, R. Nock, and M. Harandi, "Adaptive subspaces for few-shot learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 4136–4145.
[38] Y. Chen, Z. Liu, H. Xu, T. Darrell, and X. Wang, "Meta-baseline: Exploring simple meta-learning for few-shot learning," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9062–9071.
[39] Q. Luo, L. Wang, J. Lv, S. Xiang, and C. Pan, "Few-shot learning via feature hallucination with variational inference," in Proc. IEEE/CVF Winter Conf. Appl. Comput. Vis., 2021, pp. 3963–3972.
[40] Z. Yu, L. Chen, Z. Cheng, and J. Luo, "Transmatch: A transfer-learning scheme for semi-supervised few-shot learning," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 12856–12864.