Article
Deep Transfer Learning Using Real-World Image Features for
Medical Image Classification, with a Case Study on Pneumonia
X-ray Images
Chanhoe Gu 1 and Minhyeok Lee 1,2,*
1 Department of Intelligent Semiconductor Engineering, Chung-Ang University, Seoul 06974, Republic of Korea;
[email protected]
2 School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
* Correspondence: [email protected]
Abstract: Deep learning has profoundly influenced various domains, particularly medical image analysis. Traditional transfer learning approaches in this field rely on models pretrained on domain-specific medical datasets, which limits their generalizability and accessibility. In this study, we propose a novel framework called real-world feature transfer learning, which utilizes backbone models initially trained on large-scale general-purpose datasets such as ImageNet. We evaluate the effectiveness and robustness of this approach compared to models trained from scratch, focusing on the task of classifying pneumonia in X-ray images. Our experiments, which included converting grayscale images to RGB format, demonstrate that real-world feature transfer learning consistently outperforms conventional training approaches across various performance metrics. This advancement has the potential to accelerate deep learning applications in medical imaging by leveraging the rich feature representations learned from general-purpose pretrained models. The proposed methodology overcomes the limitations of domain-specific pretrained models, thereby enabling accelerated innovation in medical diagnostics and healthcare. From a mathematical perspective, we formalize the concept of real-world feature transfer learning and provide a rigorous mathematical formulation of the problem. Our experimental results provide empirical evidence supporting the effectiveness of this approach, laying the foundation for further theoretical analysis and exploration. This work contributes to the broader understanding of feature transferability across domains and has significant implications for the development of accurate and efficient models for medical image analysis, even in resource-constrained settings.

Keywords: deep learning; transfer learning; medical image analysis; X-ray images; pneumonia classification; convolutional neural networks; feature extraction

MSC: 68T27
However, the application of transfer learning in the context of medical image analysis
presents unique challenges. Medical images such as X-rays, Computed Tomography (CT)
scans, and Magnetic Resonance Imaging (MRI) exhibit distinct characteristics compared to
natural images. These images are typically grayscale, have high resolution, and contain
intricate anatomical structures and pathological patterns [12–14]. Consequently, the direct
application of pretrained models from natural image domains to medical image tasks may
not always yield optimal results.
To address this challenge, current approaches in medical image transfer learning often
rely on domain-specific pretrained models. These models are trained on large-scale medical
image datasets, which capture the unique characteristics of medical images. While these
domain-specific pretrained models have shown promising results, they have limitations in
terms of accessibility and generalizability. The availability of large-scale annotated medical
image datasets is limited, and the development of domain-specific pretrained models
requires substantial computational resources and expertise.
In this paper, we propose a novel approach to transfer learning for medical image
analysis, which we term “real-world feature transfer learning”. Our approach leverages
pretrained models from general image domains such as ImageNet as feature extractors for
medical image tasks. We hypothesize that the features learned by these models, despite
being derived from natural images, can provide meaningful representations for medical
images. By utilizing readily available pretrained models, our approach aims to overcome
the limitations of domain-specific pretrained models and enable more accessible and
efficient transfer learning for medical image analysis.
To validate our approach, we focus on the task of pneumonia detection using chest
X-ray images. Pneumonia is a common respiratory infection that affects millions of people
worldwide and is a leading cause of mortality, particularly in children and the elderly [15].
Accurate and timely diagnosis of pneumonia is crucial for effective treatment and patient
management. Chest X-ray imaging is the primary diagnostic tool for pneumonia; however,
the interpretation of these images can be challenging, even for experienced radiologists [16].
We conducted extensive experiments to evaluate the effectiveness of real-world feature
transfer learning for pneumonia detection. We employed various conventional CNN
architectures, including ResNet [17] and DenseNet [18], pretrained on the ImageNet dataset.
To adapt these models to the grayscale nature of chest X-ray images, we propose a simple
yet effective technique of replicating the grayscale channel to form a three-channel input
compatible with the pretrained models.
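As an illustration of this channel replication step, the following is a minimal PyTorch sketch; the file name and input resolution are illustrative placeholders rather than details taken from our experimental pipeline:

```python
import torch
from PIL import Image
from torchvision import transforms

# Replicate the single grayscale channel three times so that the X-ray
# matches the 3-channel RGB input expected by ImageNet-pretrained models.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # 1 channel -> 3 identical channels
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # Standard ImageNet normalization statistics, applied per channel
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

image = Image.open("chest_xray_example.jpeg")  # hypothetical file name
x = preprocess(image).unsqueeze(0)             # shape: (1, 3, 224, 224)
```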
Our experimental results demonstrate that real-world feature transfer learning achieves
superior performance compared to training models from scratch on the pneumonia detec-
tion task. The pretrained models exhibited faster convergence along with higher accuracy,
precision, recall, and F1 scores. These findings suggest that the features learned from
natural images can indeed be effectively transferred to medical image domains, even when
the source and target domains have significant differences.
The significance of our work lies in its potential to democratize deep learning for
medical image analysis. By leveraging readily available pretrained models from general
image domains, our approach reduces the reliance on domain-specific pretrained models
and large-scale annotated medical image datasets. This can facilitate the development of
accurate and efficient models for various medical image analysis tasks even in resource-
constrained settings. Furthermore, our approach opens up new avenues for exploring the
transferability of features across different domains and modalities, potentially leading to
more generalized and robust deep learning models.
From a mathematical perspective, our work contributes to the understanding of feature
representations and their transferability across domains. We formalize the concept of real-
world feature transfer learning and provide a rigorous mathematical formulation of the
problem. Our experimental results provide empirical evidence supporting the effectiveness
of this approach, laying the foundation for further theoretical analysis and exploration.
The main contributions of our work are as follows:
• We propose real-world feature transfer learning, a framework that leverages models pretrained on large-scale general-purpose datasets such as ImageNet for medical image classification.
• We introduce a simple yet effective channel replication technique for adapting pretrained RGB models to grayscale chest X-ray images.
• We provide a mathematical formulation of real-world feature transfer learning and present extensive experiments on pneumonia detection in which it consistently outperforms training from scratch across accuracy, precision, recall, and F1 score.
2. Related Work
Transfer learning has gained significant attention in the deep learning community,
particularly in the field of medical image analysis. The concept of transfer learning involves
leveraging knowledge acquired from a source domain to improve performance on a target
domain [19,20]. In the context of deep learning, transfer learning often involves using
pretrained models trained on large-scale datasets as feature extractors or performing
initialization for target tasks [21–23].
Recent studies have investigated the effectiveness of transfer learning for various
medical image analysis tasks. Alzubaidi et al. [24] proposed a novel transfer learning
approach for skin and breast cancer classification with medical images. They trained a
deep convolutional neural network (DCNN) model on large unlabeled medical image
datasets and then transferred the knowledge to train the model on a small amount of
labeled medical images. Their approach showed significantly improved performance on
both skin and breast cancer classification tasks compared to training from scratch. In the
domain of chest X-ray analysis, transfer learning has been extensively explored in recent years.
Rahman et al. [25] investigated the effectiveness of transfer learning for pneumonia detection
in chest X-rays. They evaluated various pretrained CNN architectures, such as AlexNet,
ResNet18, DenseNet201, and SqueezeNet, and found that fine-tuning these models led to
improved performance compared to training from scratch. Chouhan et al. [26] proposed a
novel transfer learning approach for pneumonia detection using an ensemble of pretrained
CNN models. Their approach achieved state-of-the-art performance on a public dataset,
outperforming individual CNN models.
While these studies have shown promising results, they primarily focus on transfer
learning within the medical domain, using pretrained models that have been trained on
medical image datasets. In contrast, our work explores the feasibility of transfer learning
from general image domains to medical image domains by leveraging pretrained models
from datasets such as ImageNet.
The concept of transferring knowledge from general image domains to medical image
domains has gained attention in recent studies. Raghu et al. [27] investigated the trans-
ferability of features from natural image datasets to medical image tasks. They found
that pretrained models from ImageNet can be effectively adapted to medical image clas-
sification tasks, achieving comparable performance to models trained from scratch on
medical datasets.
Building upon these findings, our work aims to provide a comprehensive analysis of
the effectiveness of real-world feature transfer learning for medical image analysis, focusing
on the task of pneumonia detection in chest X-rays. We extend the existing literature by
investigating the transferability of features from general image domains to the specific
domain of chest X-ray analysis, and propose a simple yet effective technique for adapting
pretrained models to grayscale medical images.
In addition to transfer learning, our work is related to the broader field of deep learn-
ing for medical image analysis. Deep learning has revolutionized the field of medical
image analysis, enabling the development of automated systems for various tasks, includ-
ing classification, detection, and segmentation [2,28]. CNNs have become the dominant
architecture in medical image analysis thanks to their ability to learn hierarchical features
directly from raw image data [29,30].
Several deep learning-based approaches have been proposed for pneumonia detection
in chest X-rays in recent years. Ayan and Ünver [31] developed a deep learning model
using a custom CNN architecture for pneumonia detection. They achieved high accuracy
and sensitivity on a public dataset, demonstrating the effectiveness of deep learning for
this task. Our work extends these studies by specifically focusing on the transfer learning
aspect and exploring the feasibility of leveraging pretrained models from general image
domains for pneumonia detection. We provide a comprehensive analysis of different CNN
architectures and their performance when utilized within a real-world feature transfer
learning framework.
3. Background
Transfer learning is a machine learning technique that leverages knowledge gained
from solving one problem and applies it to a different but related problem [19]. The goal
of transfer learning is to improve learning performance in the target domain by utilizing
the knowledge learned from the source domain. This section provides a mathematical
formulation of transfer learning and its key concepts.
Definition 3. Given a source domain D_S and corresponding source task T_S, along with a target domain D_T and corresponding target task T_T, transfer learning aims to improve the learning of the target conditional probability distribution P(Y_T | X_T) in D_T using the knowledge learned from D_S and T_S, where D_S ≠ D_T or T_S ≠ T_T.
Definition 4. In inductive transfer learning, the target task is different from the source task,
regardless of whether or not the source and target domains are the same. In this setting, the target
domain labeled data are available, and the goal is to improve the target task performance using the
knowledge learned from the source domain.
Given a source domain D_S and corresponding source task T_S, along with a target domain D_T and corresponding target task T_T, inductive transfer learning aims to improve the learning of the target conditional probability distribution P(Y_T | X_T) using the knowledge learned from D_S and T_S, where T_S ≠ T_T.
Definition 5. In transductive transfer learning, the source and target tasks are the same, while
the source and target domains are different. In this setting, the target domain labeled data are not
available, and the goal is to improve the target task performance using the knowledge learned from
the source domain.
Given a source domain D_S and corresponding source task T_S, along with a target domain D_T and corresponding target task T_T, transductive transfer learning aims to improve the learning of the target conditional probability distribution P(Y_T | X_T) using the knowledge learned from D_S and T_S, where D_S ≠ D_T and T_S = T_T.
Definition 6. In unsupervised transfer learning, the target task is different from but related to
the source task, and no labeled data are available in either the source or target domains. The goal is
to improve the target task performance using the knowledge learned from the source domain in an
unsupervised manner.
Given a source domain D_S and corresponding source task T_S, along with a target domain D_T and corresponding target task T_T, unsupervised transfer learning aims to improve the learning of the target conditional probability distribution P(Y_T | X_T) using the knowledge learned from D_S and T_S, where T_S ≠ T_T and no labeled data are available in either D_S or D_T.
Definition 7. Let f_S : X_S → Y_S be a pretrained deep neural network model for source task T_S, parameterized by θ_S. The goal of transfer learning in deep neural networks is to learn a target model f_T : X_T → Y_T, parameterized by θ_T, by leveraging the knowledge learned from f_S.
The transfer learning process in deep neural networks can be formalized as

$$\theta_T^{(0)} = \theta_S^*, \tag{1}$$

$$\theta_T^* = \arg\min_{\theta_T} \mathcal{L}_T\big(f_T(X_T; \theta_T), Y_T\big), \tag{2}$$

where θ_S^* represents the optimal parameters of the pretrained source model f_S, θ_T^{(0)} represents the initial parameters of the target model f_T, L_T is the loss function for the target task, and θ_T^* represents the optimal parameters of the target model after fine-tuning.
Remark 1. The pretrained source model f_S can be used in various ways for transfer learning, such as:
• Using f_S as a fixed feature extractor and training a new classifier on top of the extracted features for the target task.
• Fine-tuning the entire network f_S using the target task data, allowing the pretrained parameters to adapt to the target domain.
• Freezing some layers of f_S and fine-tuning the remaining layers for the target task, balancing knowledge transfer and adaptation to the target domain.
The choice of transfer learning strategy depends on factors such as the size of the
target dataset, the similarity between the source and target domains, and the available
computational resources.
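To make these strategies concrete, the sketch below shows one way each could be configured in PyTorch using a torchvision ResNet-50; the two-class head and the layer-selection rule for partial freezing are illustrative assumptions rather than prescriptions:

```python
import torch.nn as nn
from torchvision import models

def build_model(strategy: str) -> nn.Module:
    """Illustrative setups for the three transfer learning strategies."""
    # f_S with optimal source parameters theta_S* from ImageNet pretraining
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, 2)  # new task-specific head

    if strategy == "feature_extractor":
        # Strategy 1: f_S is a fixed feature extractor; only the new head trains.
        for name, p in model.named_parameters():
            p.requires_grad = name.startswith("fc")
    elif strategy == "full_finetune":
        # Strategy 2: fine-tune the entire network; all parameters stay trainable.
        pass
    elif strategy == "partial_freeze":
        # Strategy 3: freeze early layers, fine-tune the last block and the head.
        for name, p in model.named_parameters():
            p.requires_grad = name.startswith(("layer4", "fc"))
    return model
```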
3.4. Transfer Learning for Segmentation and Object Detection
Real-world feature transfer learning is not limited to classification; pretrained backbones are also widely used for image segmentation and object detection [34–39]. This section provides a rigorous mathematical formulation of how this methodology is adapted for these specific tasks.
3.4.1. Segmentation
Definition 8. Let X be the input space of images and let Y be the output space of pixel-wise class labels. The goal of image segmentation is to learn a mapping function f_seg : X → Y that assigns a class label to each pixel in an image.
Proposition 1. The mapping function f_seg can be decomposed into two components: the pretrained backbone model, parameterized by θ^*, and additional segmentation-specific layers, parameterized by ψ:

$$f_{\mathrm{seg}}(x; \theta^*, \psi) = (\psi \circ \theta^*)(x) \quad \forall x \in \mathcal{X} \tag{3}$$
Definition 9. Let x_i denote a pixel in an image x ∈ X and let y_i be its corresponding segmentation label. The objective of the segmentation task is to learn the optimal parameters ψ_seg^* that minimize the expected loss over the target dataset D_T:

$$\psi_{\mathrm{seg}}^* = \arg\min_{\psi} \mathbb{E}_{x,y \sim \mathcal{D}_T}\left[\frac{1}{|\mathcal{X}|}\sum_{i=1}^{|\mathcal{X}|} L\big(y_i, f_{\mathrm{seg}}(x_i; \theta^*, \psi)\big)\right] \tag{4}$$

where L is a suitable loss function, such as the cross-entropy loss, and |X| denotes the total number of pixels in the image.

3.4.2. Object Detection
Definition 10. Let B be the space of bounding boxes and let C be the space of object classes. The goal of object detection is to learn a mapping function f_det : X → B × C that predicts a set of bounding boxes and their associated class labels for objects in an image.
Proposition 2. Similar to segmentation, the mapping function f_det can be decomposed into the pretrained backbone model, parameterized by θ^*, and additional detection-specific layers, parameterized by ψ:

$$f_{\mathrm{det}}(x; \theta^*, \psi) = (\psi \circ \theta^*)(x) \quad \forall x \in \mathcal{X} \tag{5}$$
Definition 11. Let y_{b,c} denote the ground-truth bounding boxes and their associated class labels for an image x ∈ X. The objective of the object detection task is to learn the optimal parameters ψ_det^* that minimize the expected loss over the target dataset D_T:

$$\psi_{\mathrm{det}}^* = \arg\min_{\psi} \mathbb{E}_{x,y \sim \mathcal{D}_T}\big[L\big(y_{b,c}, f_{\mathrm{det}}(x; \theta^*, \psi)\big)\big] \tag{6}$$

where L is a loss function that typically includes terms for both the bounding box coordinates and the class labels.
Remark 2. In practice, it is common to fine-tune the entire network, both θ and ψ, on the target
task dataset in order to achieve better performance. This approach differs from conventional transfer
learning, where only the target task parameters ψ are fine-tuned while keeping the source task
parameters θ fixed.
Proposition 3. The fine-tuning process for segmentation and object detection can be formulated
as follows:
$$(\theta_{\mathrm{seg}}^*, \psi_{\mathrm{seg}}^*) = \arg\min_{\theta, \psi} \mathbb{E}_{x,y \sim \mathcal{D}_T}\left[\frac{1}{|\mathcal{X}|}\sum_{i=1}^{|\mathcal{X}|} L\big(y_i, f_{\mathrm{seg}}(x_i; \theta, \psi)\big)\right] \tag{7}$$

$$(\theta_{\mathrm{det}}^*, \psi_{\mathrm{det}}^*) = \arg\min_{\theta, \psi} \mathbb{E}_{x,y \sim \mathcal{D}_T}\big[L\big(y_{b,c}, f_{\mathrm{det}}(x; \theta, \psi)\big)\big] \tag{8}$$
Remark 3. The fine-tuning process allows the pretrained backbone to adapt its learned representa-
tions to the specific characteristics of the target task, leading to improved performance compared to
using fixed backbone parameters.
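As a sketch of the joint fine-tuning in Equations (7) and (8), the snippet below updates both the pretrained backbone parameters θ and the task-specific head parameters ψ of a torchvision segmentation model; the choice of DeepLabV3, the two-class head, and the learning rate are illustrative assumptions:

```python
import torch
from torchvision.models.segmentation import (
    deeplabv3_resnet50,
    DeepLabV3_ResNet50_Weights,
)

# Backbone (theta) is pretrained; the segmentation head (psi) is task-specific.
model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.COCO_WITH_VOC_LABELS_V1)
model.classifier[-1] = torch.nn.Conv2d(256, 2, kernel_size=1)  # e.g., 2 classes

# Joint fine-tuning as in Equation (7): theta and psi are optimized together.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()  # pixel-wise loss L

x = torch.randn(4, 3, 224, 224)          # dummy image batch
y = torch.randint(0, 2, (4, 224, 224))   # dummy pixel-wise labels
loss = criterion(model(x)["out"], y)     # logits: (4, 2, 224, 224)
loss.backward()
optimizer.step()
```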
A key benefit of transfer learning is that the target model does not need to learn the features from scratch. This is particularly important when the
target task has limited labeled data, as the model can leverage the knowledge learned from
the source task to improve its performance. Finally, transfer learning can help to reduce
overfitting on the target task, as the pretrained model has already learned a robust set of
features that generalize well to related tasks.
In the context of medical image analysis, transfer learning has been widely adopted
to address the challenges of limited labeled data and the need for efficient and accurate
models. Medical imaging datasets are often small and expensive to annotate, making it
difficult to train deep learning models from scratch. By leveraging pretrained models from
large-scale datasets, transfer learning allows for the development of accurate models for
medical image analysis tasks, even with limited labeled data. This has led to significant
improvements in the performance of deep learning models for tasks such as medical image
classification, segmentation, and detection.
Despite the success of transfer learning in medical image analysis, there are still
challenges and opportunities for further research. One challenge is the domain shift
between the source and target tasks, which can limit the effectiveness of transfer learning.
This is particularly relevant when transferring knowledge from natural image datasets to
medical image datasets, as there can be significant differences in the characteristics of the
images and the underlying data distributions. To address this challenge, techniques such
as domain adaptation, including unsupervised domain adaptation, have been proposed to align
the feature spaces of the source and target domains and improve the transferability of the
learned features.
Definition 12. Let X_C and X_M denote the input spaces of conventional images and medical images, respectively. We define a feature extraction function ϕ : X → F, where F is the feature space and X = X_C ∪ X_M.
Assumption 2. We assume that the feature extraction function ϕ is a deep learning model parameterized by θ which has been trained on a large dataset of conventional images D_C = {(x_i, y_i)}_{i=1}^{N_C}, where x_i ∈ X_C and y_i is the corresponding label.
Definition 13. Let F_C and F_M be the feature spaces corresponding to conventional and medical images, respectively, such that F_C = ϕ(X_C) and F_M = ϕ(X_M).

Proposition 4. The features learned by the deep learning model ϕ from conventional images are similar to the features of medical images, i.e., F_C ≈ F_M.
To prove this proposition, we introduce a measure of similarity between the feature spaces.
Definition 14. Let d : F × F → R be a distance function that quantifies the dissimilarity between two feature vectors. We define the average distance between the feature spaces F_C and F_M as

$$D(\mathcal{F}_C, \mathcal{F}_M) = \frac{1}{|\mathcal{X}_C||\mathcal{X}_M|} \sum_{x_C \in \mathcal{X}_C} \sum_{x_M \in \mathcal{X}_M} d\big(\phi(x_C), \phi(x_M)\big). \tag{9}$$
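Equation (9) can be estimated empirically from two batches of extracted features. The following is a minimal sketch using the Euclidean distance; the feature dimension and batch sizes are placeholders:

```python
import torch

def average_feature_distance(feats_c: torch.Tensor, feats_m: torch.Tensor) -> torch.Tensor:
    """Estimate D(F_C, F_M) from Equation (9).

    feats_c: (N_c, d) features phi(x_C) of conventional images
    feats_m: (N_m, d) features phi(x_M) of medical images
    """
    pairwise = torch.cdist(feats_c, feats_m)  # all pairwise Euclidean distances
    return pairwise.mean()                    # mean over all |X_C| x |X_M| pairs

# Example with random placeholder features of dimension 2048
d_cm = average_feature_distance(torch.randn(64, 2048), torch.randn(32, 2048))
```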
Assumption 3. We assume that the distance function d satisfies the properties of a metric, i.e.,
non-negativity, identity of indiscernibles, symmetry, and triangle inequality.
Lemma 1. If the average distance between the feature spaces F_C and F_M is small, i.e., if D(F_C, F_M) < ϵ for some small ϵ > 0, then the features learned from conventional images are similar to the features of medical images.
Since d is non-negative, for any x_C ∈ X_C and x_M ∈ X_M,

$$d\big(\phi(x_C), \phi(x_M)\big) \le \sum_{x'_C \in \mathcal{X}_C} \sum_{x'_M \in \mathcal{X}_M} d\big(\phi(x'_C), \phi(x'_M)\big) = |\mathcal{X}_C||\mathcal{X}_M|\, D(\mathcal{F}_C, \mathcal{F}_M) < |\mathcal{X}_C||\mathcal{X}_M|\, \epsilon. \tag{10}$$
This implies that, for any pair of conventional and medical images, the distance between
their feature representations is bounded by a small value proportional to ϵ; therefore, the
features learned from conventional images are similar to the features of medical images.
Remark 4. The choice of the distance function d is crucial for quantifying the similarity between
feature spaces. Common choices include the Euclidean distance, cosine distance, and more advanced
metrics such as the Fréchet Inception Distance (FID) [40].
Conjecture 1. The similarity between the features learned from conventional images and the features of medical images can be further improved by fine-tuning the pretrained model ϕ on a small dataset of medical images D_M = {(x_i, y_i)}_{i=1}^{N_M}, where x_i ∈ X_M and y_i is the corresponding label.
Proposition 5. Fine-tuning the pretrained model ϕ on the medical image dataset D_M results in an updated feature extraction function ϕ′ that minimizes the average distance between the feature spaces F_C and F_M.
5. Method
Figure 1 illustrates the overall framework of our proposed real-world feature transfer
learning approach for medical image classification. Our methodology aims to reconcile
the commonly held belief that transfer learning between general image datasets such as
ImageNet and specialized medical image datasets such as X-ray images is challenging due
to the disparity in their features. Typically, deep learning models trained on datasets such as
ImageNet learn to recognize a plethora of real-world objects: the shapes of animals, the colors of fruits, the structures of machines, and so on. In contrast, medical images are predominantly grayscale and contain intricate patterns and details that represent health conditions and
diseases. This substantial difference in features makes transfer learning a challenging task.
Figure 1. Overall framework of the proposed real-world feature transfer learning approach for
medical image classification.
Further, a conventional deep learning model expects RGB images as input, necessitat-
ing the alteration of the model structure to accommodate grayscale images, which often
complicates the transfer learning process. In our methodology, we propose an approach
that circumvents these limitations and demonstrate the feasibility of transfer learning
between these dissimilar domains.
The weights of the pretrained backbone are kept frozen during training in order to preserve the valuable feature representations learned from the source domain and prevent overfitting to the limited labeled data in the target domain.
To adapt the pretrained model to the specific task of medical image classification, we
append one or more layers, represented by G( x; θ, ψ), on top of the pretrained model. These
appended layers are designed to learn the task-specific mapping between the extracted
features and the corresponding class labels. The parameters of these appended layers,
denoted as ψ, are trainable, and are fine-tuned using the labeled medical image data.
During the fine-tuning process, the weights of the appended layers are updated using
the labeled medical image data. This allows the model to adapt to the specific characteristics
and patterns present in the medical images. The fine-tuning process is typically performed
using a smaller learning rate compared to the initial training of the pretrained model, as we
want to preserve the knowledge gained from the source domain while gradually adapting
to the target domain.
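A minimal sketch of this training setup, assuming a torchvision ResNet-50 backbone with a single appended linear layer as G(x; θ, ψ); the binary logit head and the reduced learning rate are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone: the parameters theta are frozen.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False

# Appended layer G(x; theta, psi): the new parameters psi remain trainable.
model.fc = nn.Linear(model.fc.in_features, 1)  # single pneumonia logit

# Only psi is passed to the optimizer, with a small learning rate so that
# adaptation to the target domain is gradual.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```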
By leveraging the power of transfer learning through the use of a pretrained model
as the backbone and fine-tuning the appended layers, our model architecture effectively
captures the relevant features and patterns in medical images. This approach allows us to
benefit from the knowledge gained from large-scale datasets while adapting to the specific
requirements of medical image classification tasks.
The training objective for the appended layers can be formalized as

$$\psi^* = \arg\min_{\psi} \mathbb{E}_{x,y \sim \mathcal{D}_T}\big[L\big(y, G(x; \theta^*, \psi)\big)\big]$$

In the above equation, D_T represents the target domain data and L(y, G(x; θ*, ψ)) is the loss function that we aim to minimize. The parameters ψ are then fine-tuned based on the medical image data.
• ResNeXt50 (32x4d) [43]: ResNeXt50 is a member of the ResNeXt family, which offers
a simple and scalable way to increase model capacity without a significant increase in
computational cost. The variant we used, denoted as 32x4d, indicates that the network has
a cardinality (number of parallel paths) of 32 and a bottleneck width of 4 channels per path.
• Wide ResNet50-2 [44]: Wide ResNet is a variant of the ResNet models with wider con-
volutional layers instead of deeper ones. The version we used, Wide ResNet50-2, indi-
cates that the network depth is 50 and the width is twice that of the
original ResNet50.
Each model was fine-tuned using the Pneumonia X-ray dataset [45] with Binary Cross-
Entropy (BCE) as the loss function. We adopted a training approach that preserves the
learned features in the pretrained models by freezing the weights in the convolutional
layers and only updating the weights in the newly appended fully connected layer.
5.6. Hyperparameters
We trained each model for 20 epochs with a batch size of 64. These values were chosen based on previous studies and our preliminary experiments.
5.8. Dataset
The Pneumonia X-ray dataset, which contains a total of 5863 X-ray images catego-
rized into Pneumonia and Normal, was used for this study. The dataset is divided into
training and testing sets, with 1341 normal and 3875 pneumonia images in the training
set and 234 normal and 390 pneumonia images in the testing set. Large disparities in
the distribution of classes represent a common challenge in medical datasets, which we
addressed during the training phase by using class weights in the loss function. The sample
distribution and example images of each class are displayed in Table 1 and Figure 2.
Figure 2. Example X-ray images of each class within the Pneumonia X-ray dataset. The images on
the left showcase normal X-ray images and the images on the right showcase pneumonia images.
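The class weighting mentioned above can be realized in several ways; one plausible choice for a binary cross-entropy objective, using the training split counts from the dataset, is sketched below:

```python
import torch

# Training split counts for the Pneumonia X-ray dataset
n_normal, n_pneumonia = 1341, 3875

# With pneumonia as the positive class, pos_weight = N_neg / N_pos rebalances
# the BCE loss; here it down-weights the over-represented pneumonia class.
pos_weight = torch.tensor([n_normal / n_pneumonia])  # ~0.35
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                    # dummy model outputs
labels = torch.randint(0, 2, (8, 1)).float()  # dummy binary labels
loss = criterion(logits, labels)
```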
Accuracy measures the proportion of correctly classified observations among all observations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{16}$$
where TP denotes true positives, TN denotes true negatives, FP denotes false positives,
and FN denotes false negatives.
Precision (also known as positive predictive value) measures the proportion of cor-
rectly predicted positive observations out of the total predicted positives.
$$\text{Precision} = \frac{TP}{TP + FP} \tag{17}$$
Recall (or sensitivity) measures the proportion of correctly predicted positive observa-
tions out of the total actual positives.
$$\text{Recall} = \frac{TP}{TP + FN} \tag{18}$$
Finally, the F1 score is the weighted average (harmonic mean) of precision and recall,
which helps to gauge the balance between precision and recall. It is particularly useful
when the data class distribution is uneven.
$$\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{19}$$
The above metrics were computed at the end of each epoch to provide a detailed
evaluation of the model’s performance over time.
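Given the confusion matrix counts, the four metrics of Equations (16)–(19) reduce to a few lines; the counts in the usage example are hypothetical and merely sized to the 624-image test split:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the evaluation metrics of Equations (16)-(19)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical epoch-end counts (234 normal, 390 pneumonia test images)
print(classification_metrics(tp=370, tn=209, fp=25, fn=20))
```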
6. Results
6.1. Efficacy of Transfer Learning: Training Results
To validate the efficacy of our approach, we first analyze the results obtained during
the training phase. Figure 3 presents a graphical representation of both the training loss
and the accuracy achieved by the models over time. The rapid convergence of the models
substantiates the premise that transfer learning from real-world images to medical images
is indeed feasible and highly effective.
Figure 3. Graphical representation of the training loss and accuracy achieved by the models over
time. The solid lines represent the real-world feature transfer learning models, while the dashed lines
represent from-scratch training models. The rapid convergence of the proposed models demonstrates
the feasibility and effectiveness of transferring learning from real-world images to medical images.
Figure 4. Comprehensive comparison between transfer learning models and models trained from
scratch. The line graphs illustrate the performance metrics of test accuracy and F1 score for each
model. The solid lines represent the real-world feature transfer learning models, while the dashed
lines represent from-scratch training models. The comparison emphasizes the superiority of the
transfer learning approach for handling medical image classification.
Table 2. Performance comparison of from-scratch training models vs. real-world feature transfer
learning models. Bold text indicates the best results for each experimental setting.
Table 3. Performance comparison of from-scratch training models vs. real-world feature transfer learning
models with reduced learning rate. Bold text indicates the best results for each experimental setting.
Overall, these results demonstrate the clear advantages of applying transfer learning
for medical image analysis, even when the source and target data domains differ greatly.
Not only did the transfer learning models achieve rapid convergence during training, but they also
delivered superior performance in comparison to their from-scratch counterparts across
multiple performance metrics, thereby solidifying the merit of our approach.
InceptionNet, introduced by Szegedy et al. [46], is known for its deep and wide architecture, which allows for efficient computation and improved performance compared to traditional CNN architectures.
The InceptionNet architecture is characterized by its use of inception modules, which
consist of multiple convolutional layers with different kernel sizes operating in parallel.
This design enables the network to capture features at various scales and resolutions,
enhancing its ability to represent complex patterns in the input data. The inception modules
are stacked together to form a deep network, with additional pooling layers and fully
connected layers added for classification purposes.
InceptionNet has been widely adopted in various computer vision tasks, includ-
ing image classification, object detection, and semantic segmentation. Its success can be
attributed to its ability to learn rich and discriminative feature representations from large-
scale datasets such as ImageNet. As a result, InceptionNet has become a popular choice for
transfer learning, where pretrained models are fine-tuned on target tasks to leverage the
knowledge gained from the source domain.
To evaluate the performance of InceptionNet in the context of real-world feature trans-
fer learning for medical image analysis, we conducted experiments using the InceptionNet-
v3 variant, which has 48 layers and has been pretrained on the ImageNet dataset. We
followed the same experimental setup as described in Section 5.4, where we fine-tuned
the pretrained InceptionNet model on the Pneumonia X-ray dataset and compared its
performance to a model trained from scratch.
Table 4 presents the results of our comparative analysis between the InceptionNet
model trained from scratch and the one utilizing transfer learning. The transfer learning
model achieves a lower test loss of 0.187 compared to 0.315 for the scratch model, indicating
better generalization and reduced overfitting. In terms of test accuracy, the transfer learning
model reaches 93.3%, outperforming the scratch model by 2.1 percentage points.
Table 4. Performance comparison of InceptionNet trained from scratch vs. real-world feature
transfer learning.
7. Discussion
The results presented in this study provide compelling evidence for the effectiveness of real-world feature transfer learning in the domain of medical image analysis: pretrained models from general image datasets such as ImageNet transferred effectively to chest X-ray classification despite the domain gap. One limitation concerns the dataset itself. The pneumonia class is described as a collection of images exhibiting lung opacity, which may or may not be
caused by pneumonia. To establish a definitive diagnosis of pneumonia, additional clinical
information such as patient history, physical examination findings, and laboratory test
results would be required. Future research could focus on curating datasets with confirmed
pneumonia cases and exploring the use of transfer learning for differentiating between
various causes of lung opacity. Despite this limitation, the dataset serves as a valuable
resource for evaluating the effectiveness of transfer learning in identifying radiological
abnormalities, and showcases the potential of this approach for assisting in the initial
screening and triage of patients with suspected pneumonia.
8. Conclusions
This study investigated the application of transfer learning from general image datasets
to the specialized domain of medical imaging, specifically focusing on chest X-rays. The
research was motivated by the common belief that the significant differences in features
between general and medical images could hinder the effective use of transfer learning.
Through comprehensive experiments and analysis, we have provided substantial evidence
to challenge this assumption and demonstrate the viability of transfer learning for medical
image data.
Our experimental results consistently showed that using pretrained models, such as
those trained on ImageNet, significantly accelerates the training process for medical image
analysis tasks. Furthermore, the simple preprocessing step of converting grayscale medical
images to RGB format proved to be effective in facilitating the transfer of learned features.
Among the evaluated architectures, most models achieved rapid convergence and high
accuracy within the initial training epochs. In particular, ResNeXt50 and Wide ResNet50
demonstrated exceptional performance, reaching over 95% training accuracy in the early
stages. These findings highlight the importance of network depth and design principles in
enabling effective feature propagation during transfer learning.
The comparative analysis between transfer learning models and models trained from
scratch further emphasized the superiority of transfer learning. Across multiple evaluation
metrics, including test loss, accuracy, precision, recall, and F1 score, the transfer learning mod-
els consistently outperformed their counterparts. This underscores the consistent effectiveness
of transfer learning even when the source and target domains differ significantly.
One notable observation was the performance of ShuffleNet, a computationally ef-
ficient model. Despite initially lagging behind the other models in accuracy, ShuffleNet
showed a substantial improvement in test accuracy when transfer learning was applied.
This demonstrates the versatility of our approach and suggests that even models designed
for specific constraints can benefit from transfer learning.
The implications of our work extend beyond the immediate experimental results. By
demonstrating the feasibility of leveraging pretrained models from general image datasets
for medical imaging, we have provided a means to accelerate the development of deep
learning applications in the medical field. The accessibility of pretrained models and the
effectiveness of transfer learning could facilitate rapid advancements in medical image
analysis, potentially leading to more accurate and timely diagnoses.
However, it is important to acknowledge the limitations of this study and the opportu-
nities for future research. The experiments were conducted using only the Pneumonia X-ray
dataset; further validation on other medical image datasets would enhance the generaliz-
ability of our findings. Additionally, investigating the effects of different transfer learning
strategies, such as fine-tuning specific layers or employing alternative initializations, could
provide deeper insights into the mechanisms that enable successful knowledge transfer.
In summary, this paper contributes to the field of deep learning by empirically val-
idating the effectiveness of transfer learning from general image data to medical image
data. Our findings challenge prevailing assumptions and open up new possibilities for
applying transfer learning in specialized domains. This research highlights the adaptability
and power of deep learning and encourages further exploration and adoption of transfer
learning in various fields, particularly in domains with limited or highly specialized data.
Author Contributions: Conceptualization, C.G. and M.L.; formal analysis, C.G.; investigation, C.G.;
writing—original draft preparation, C.G. and M.L.; writing—review and editing, C.G. and M.L.;
visualization, M.L.; supervision, M.L. All authors have read and agreed to the published version of
the manuscript.
Funding: This research was supported by the Chung-Ang University Research Scholarship Grants
in 2024.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: No new data were created or analyzed in this study.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Shen, D.; Wu, G.; Suk, H.I. Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 2017, 19, 221–248. [CrossRef]
[PubMed]
2. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; Van Der Laak, J.A.; Van Ginneken, B.; Sánchez, C.I.
A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. [CrossRef] [PubMed]
3. Lee, M. Recent Advancements in Deep Learning Using Whole Slide Imaging for Cancer Prognosis. Bioengineering 2023, 10, 897.
[CrossRef] [PubMed]
4. Kaur, A.; Singh, Y.; Neeru, N.; Kaur, L.; Singh, A. A survey on deep learning approaches to medical images and a systematic look
up into real-time object detection. Arch. Comput. Methods Eng. 2022, 29, 2071–2111. [CrossRef]
5. Ge, Y.; Zhang, Q.; Sun, Y.; Shen, Y.; Wang, X. Grayscale medical image segmentation method based on 2D&3D object detection
with deep learning. BMC Med. Imaging 2022, 22, 33.
6. Erdaş, Ç.B.; Sümer, E. A fully automated approach involving neuroimaging and deep learning for Parkinson’s disease detection
and severity prediction. PeerJ Comput. Sci. 2023, 9, e1485. [CrossRef] [PubMed]
7. Kim, M.; Yun, J.; Cho, Y.; Shin, K.; Jang, R.; Bae, H.j.; Kim, N. Deep learning in medical imaging. Neurospine 2019, 16, 657.
[CrossRef] [PubMed]
8. Tang, W.; Zhang, M.; Xu, C.; Shao, Y.; Tang, J.; Gong, S.; Dong, H.; Sheng, M. Diagnostic efficiency of multi-modal MRI based
deep learning with Sobel operator in differentiating benign and malignant breast mass lesions—a retrospective study. PeerJ
Comput. Sci. 2023, 9, e1460. [CrossRef] [PubMed]
9. Kim, H.E.; Cosa-Linan, A.; Santhanam, N.; Jannesari, M.; Maros, M.E.; Ganslandt, T. Transfer learning for medical image
classification: A literature review. BMC Med. Imaging 2022, 22, 69. [CrossRef]
10. Taşyürek, M.; Öztürk, C. A fine-tuned YOLOv5 deep learning approach for real-time house number detection. PeerJ Comput. Sci.
2023, 9, e1453. [CrossRef] [PubMed]
11. Yu, X.; Wang, J.; Hong, Q.Q.; Teku, R.; Wang, S.H.; Zhang, Y.D. Transfer learning for medical images analyses: A survey.
Neurocomputing 2022, 489, 230–254. [CrossRef]
12. Shamshad, F.; Khan, S.; Zamir, S.W.; Khan, M.H.; Hayat, M.; Khan, F.S.; Fu, H. Transformers in medical imaging: A survey. Med.
Image Anal. 2023, 88, 102802. [CrossRef] [PubMed]
13. Malhotra, P.; Gupta, S.; Koundal, D.; Zaguia, A.; Enbeyle, W. Deep neural networks for medical image segmentation.
J. Healthc. Eng. 2022, 2022. [CrossRef] [PubMed]
14. Chen, Y.; Yang, X.H.; Wei, Z.; Heidari, A.A.; Zheng, N.; Li, Z.; Chen, H.; Hu, H.; Zhou, Q.; Guan, Q. Generative adversarial
networks in medical image augmentation: A review. Comput. Biol. Med. 2022, 144, 105382. [CrossRef] [PubMed]
15. Rudan, I.; Boschi-Pinto, C.; Biloglav, Z.; Mulholland, K.; Campbell, H. Epidemiology and etiology of childhood pneumonia. Bull.
World Health Organ. 2008, 86, 408–416B. [CrossRef]
16. Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K.; et al. Chexnet:
Radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv 2017, arXiv:1711.05225.
17. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision—ECCV 2016:
14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14; 2016; pp. 630–645.
18. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
19. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [CrossRef]
20. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the Artificial Neural
Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece,
4–7 October 2018; Proceedings, Part III 27; 2018; pp. 270–279.
21. Yosinski, J.; Clune, J.; Bengio, Y.; Lipson, H. How transferable are features in deep neural networks? Adv. Neural Inf. Process. Syst.
2014, 27.
22. Sharif Razavian, A.; Azizpour, H.; Sullivan, J.; Carlsson, S. CNN features off-the-shelf: An astounding baseline for recognition. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June
2014; pp. 806–813.
23. Kornblith, S.; Shlens, J.; Le, Q.V. Do better imagenet models transfer better? In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2661–2671.
24. Alzubaidi, L.; Al-Amidie, M.; Al-Asadi, A.; Humaidi, A.J.; Al-Shamma, O.; Fadhel, M.A.; Zhang, J.; Santamaría, J.; Duan, Y.
Novel transfer learning approach for medical imaging with limited labeled data. Cancers 2021, 13, 1590. [CrossRef] [PubMed]
25. Rahman, T.; Chowdhury, M.E.; Khandakar, A.; Islam, K.R.; Islam, K.F.; Mahbub, Z.B.; Kadir, M.A.; Kashem, S. Transfer learning
with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl. Sci. 2020, 10, 3233. [CrossRef]
26. Chouhan, V.; Singh, S.K.; Khamparia, A.; Gupta, D.; Tiwari, P.; Moreira, C.; Damaševičius, R.; De Albuquerque, V.H.C. A novel
transfer learning based approach for pneumonia detection in chest X-ray images. Appl. Sci. 2020, 10, 559. [CrossRef]
27. Raghu, M.; Zhang, C.; Kleinberg, J.; Bengio, S. Transfusion: Understanding transfer learning for medical imaging. Adv. Neural Inf.
Process. Syst. 2019, 32.
28. Anwar, S.M.; Majid, M.; Qayyum, A.; Awais, M.; Alnowami, M.; Khan, M.K. Medical image analysis using convolutional neural
networks: A review. J. Med. Syst. 2018, 42, 1–13. [CrossRef] [PubMed]
29. Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology.
Insights Imaging 2018, 9, 611–629. [CrossRef] [PubMed]
30. Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A survey of the recent architectures of deep convolutional neural networks. Artif.
Intell. Rev. 2020, 53, 5455–5516. [CrossRef]
31. Ayan, E.; Ünver, H.M. Diagnosis of pneumonia from chest X-ray images using deep learning. In Proceedings of the 2019 Scientific
Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), Istanbul, Turkey, 24–26 April 2019; pp. 1–5.
32. Yeom, T.; Gu, C.; Lee, M. DuDGAN: Improving class-conditional GANs via dual-diffusion. IEEE Access 2024. [CrossRef]
33. Ko, K.; Lee, M. ZIGNeRF: Zero-shot 3D Scene Representation with Invertible Generative Neural Radiance Fields. In Proceedings
of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024; pp. 4986–4995.
34. Ywet, N.L.; Maw, A.A.; Nguyen, T.A.; Lee, J.W. YOLOTransfer-DT: An Operational Digital Twin Framework with Deep and
Transfer Learning for Collision Detection and Situation Awareness in Urban Aerial Mobility. Aerospace 2024, 11, 179. [CrossRef]
35. Kim, S.; Nam, B.H.; Jung, Y.H. Comparison of Deep Transfer Learning Models for the Quantification of Photoelastic Images.
Appl. Sci. 2024, 14, 758. [CrossRef]
36. Huber, F.; Inderka, A.; Steinhage, V. Leveraging Remote Sensing Data for Yield Prediction with Deep Transfer Learning. Sensors
2024, 24, 770. [CrossRef] [PubMed]
37. Mohammadi, S.; Belgiu, M.; Stein, A. Few-Shot Learning for Crop Mapping from Satellite Image Time Series. Remote Sens. 2024,
16, 1026. [CrossRef]
38. Nikezić, D.P.; Radivojević, D.S.; Lazović, I.M.; Mirkov, N.S.; Marković, Z.J. Transfer Learning with ResNet3D-101 for Global
Prediction of High Aerosol Concentrations. Mathematics 2024, 12, 826. [CrossRef]
39. Lee, S.; Lee, M. MetaSwin: A unified meta vision transformer model for medical image segmentation. PeerJ Comput. Sci. 2024,
10, e1762. [CrossRef] [PubMed]
40. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a
local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30.
41. Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of
the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
42. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
43. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
44. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference 2016, British
Machine Vision Association, York, UK, 19–22 September 2016.
45. Mooney, P. Chest X-ray Images (Pneumonia). 2021. Available online: https://www.kaggle.com/datasets/paultimothymooney/
chest-xray-pneumonia (accessed on 20 June 2023).
46. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with
convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June
2015; pp. 1–9.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.