Transfer learning for medical image classification: a systematic literature review
Abstract
Background: Transfer learning (TL) with convolutional neural networks aims to improve performance on a new task by leveraging the knowledge of similar tasks learned in advance. It has made a major contribution to medical image analysis, as it overcomes the data scarcity problem and saves time and hardware resources. However, transfer
learning has been arbitrarily configured in the majority of studies. This review paper attempts to provide guidance for
selecting a model and TL approaches for the medical image classification task.
Methods: A total of 425 peer-reviewed articles were retrieved from two databases, PubMed and Web of Science, published
in English, up until December 31, 2020. Articles were assessed by two independent reviewers, with the aid of a third
reviewer in the case of discrepancies. We followed the PRISMA guidelines for the paper selection and 121 studies were
regarded as eligible for the scope of this review. We investigated articles focused on selecting backbone models and
TL approaches including feature extractor, feature extractor hybrid, fine-tuning and fine-tuning from scratch.
Results: The majority of studies (n = 57) empirically evaluated multiple models, followed by studies applying deep models (n = 33) and shallow models (n = 24). Inception, one of the deep models, was the most frequently employed in the literature (n = 26). With
respect to the TL, the majority of studies (n = 46) empirically benchmarked multiple approaches to identify the
optimal configuration. The rest of the studies applied only a single approach for which feature extractor (n = 38) and
fine-tuning from scratch (n = 27) were the two most favored approaches. Only a few studies applied feature extractor
hybrid (n = 7) and fine-tuning (n = 3) with pretrained models.
Conclusion: The investigated studies demonstrated the efficacy of transfer learning despite the data scarcity. We
encourage data scientists and practitioners to use deep models (e.g. ResNet or Inception) as feature extractors, which
can save computational costs and time without degrading the predictive power.
Keywords: Deep learning, Transfer learning, Fine-tuning, Convolutional neural network, Medical image analysis
Introduction
Medical image analysis is a robust subject of research,
with millions of studies having been published in the
last decades. Some recent examples include computer-aided tissue detection in whole slide images (WSI) and the diagnosis of COVID-19 pneumonia from chest images. Traditionally, sophisticated image feature extraction or discriminant handcrafted features (e.g. histograms of oriented gradients (HOG) features [1] or local binary pattern (LBP) features [2]) have dominated
the field of image analysis, but the recent emergence of deep learning (DL) algorithms has inaugurated a major shift towards non-handcrafted feature engineering, permitting automated image analysis. In particular, convolutional neural networks (CNN) have become the workhorse DL algorithm for image analysis. In recent data challenges for medical image analysis, all of the top-ranked teams utilized CNN. For instance, in the CAMELYON17 challenge for the automated detection and classification of breast cancer metastases in whole slide images, all but one of the top-ten ranked solutions utilized CNN [3]. Shi et al. [4] also demonstrated that features extracted by DL surpassed those of handcrafted methods.

However, DL algorithms, including CNN, require, under preferable circumstances, a large amount of training data; hence the data scarcity problem. In particular, the limited size of medical cohorts and the cost of expert-annotated data sets are well-known challenges. Many research endeavors have tried to overcome this problem with transfer learning (TL) or domain adaptation [5] techniques. These aim to achieve high performance on target tasks by leveraging knowledge learned from source tasks. A pioneering review of TL was contributed by Pan and Yang [6] in 2010, who classified TL techniques from a labeling perspective, while Weiss et al. [7] summarized TL studies based on homogeneous and heterogeneous approaches. Most recently, in 2020, Zhuang et al. [8] reviewed more than forty representative TL approaches from the perspectives of data and models. Unsupervised TL is an emerging subject and has recently received increasing attention from researchers. Wilson and Cook [9] surveyed a large number of articles on unsupervised deep domain adaptation. Most recently, generative adversarial network (GAN)-based frameworks [10–12] gained momentum; a particularly promising approach is DANN [13]. Furthermore, multiple kernel active learning [14] and collaborative unsupervised methods [15] have also been utilized for unsupervised TL.

Some studies conducted comprehensive reviews focused primarily on DL in the medical domain. Litjens et al. [16] reviewed DL for medical image analysis by summarizing over 300 articles, while Chowdhury et al. [17] reviewed the state-of-the-art research on self-supervised learning in medicine. Others surveyed articles focusing on TL for a specific case study, such as microorganism counting [18], cervical cytopathology [19], neuroimaging biomarkers of Alzheimer's disease [20] and magnetic resonance brain imaging in general [21].

In this paper, we aimed to conduct a survey on TL with pretrained CNN models for medical image analysis across use cases, data subjects and data modalities. Our major contributions are as follows:

(i) An overview of contributions to the various case studies is presented;
(ii) Actionable recommendations on how to leverage TL for medical image classification are provided;
(iii) Publicly available medical datasets are compiled with URLs as supplementary material.

The rest of this paper is organized as follows. Section 2 covers the background knowledge and the most common notation used in the following sections. In Sect. 3, we describe the protocol for the literature selection. In Sect. 4, the results obtained are analyzed and compared. Critical discussions are presented in Sect. 5. Finally, we end with a conclusion and the lessons learned in Sect. 6. Figure 1 is the main diagram, which presents the whole manuscript.

Background

Transfer learning

Transfer learning (TL) stems from cognitive research, which uses the idea that knowledge is transferred across related tasks to improve performance on a new task. It is well known that humans are able to solve similar tasks by leveraging previous knowledge. TL was formally defined by Pan and Yang using the notions of domains and tasks: "A domain consists of a feature space 𝒳 and a marginal probability distribution P(X), where X = {x1, ..., xn} ∈ 𝒳. Given a specific domain denoted by D = {𝒳, P(X)}, a task is denoted by T = {𝒴, f(·)}, where 𝒴 is a label space and f(·) is an objective predictive function. A task is learned from the pairs {xi, yi}, where xi ∈ X and yi ∈ 𝒴. Given a source domain D_S and learning task T_S, and a target domain D_T and learning task T_T, transfer learning aims to improve the learning of the target predictive function f_T(·) in D_T by using the knowledge in D_S and T_S" [6].

Analogously, one can learn how to drive a motorbike (target task T_T) based on one's cycling skill (source task T_S), where driving two-wheeled vehicles is regarded as the same domain, D_S = D_T. This does not mean that one cannot learn to drive a motorbike without having ridden a bike, but it takes less effort to learn to drive the motorbike by adapting one's cycling skills. Similarly, learning the parameters of a network from scratch requires larger annotated datasets and a longer training time to achieve an acceptable performance.
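To connect this definition to the setting surveyed in the remainder of the paper, the ImageNet-to-medical-imaging case can be spelled out as follows; this is our illustrative instantiation and not part of the quoted definition in [6]:

```latex
\mathcal{D}_S = \{\mathcal{X}_{\text{natural images}},\; P(X_{\text{ImageNet}})\}, \qquad
\mathcal{T}_S = \{\mathcal{Y}_{\text{1000 ImageNet classes}},\; f_S(\cdot)\}, \\
\mathcal{D}_T = \{\mathcal{X}_{\text{medical images}},\; P(X_{\text{medical}})\}, \qquad
\mathcal{T}_T = \{\mathcal{Y}_{\text{diagnostic labels}},\; f_T(\cdot)\}
```

Here TL aims to learn f_T(·) for the medical task by reusing the parameters that encode the knowledge acquired while learning f_S(·) on the source domain.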
Convolutional neural networks using ImageNet

Convolutional neural networks (CNN) are a special type of deep learning model that processes grid-like topology data such as image data. Unlike the standard neural network
consisting of fully connected layers only, a CNN consists of at least one convolutional layer. Several pretrained CNN models are publicly accessible online with downloadable parameters. They were pretrained with millions of natural images from the ImageNet dataset (ImageNet Large Scale Visual Recognition Challenge; ILSVRC) [22].

In this paper, CNN models are denoted as backbone models. Table 1 summarizes the five most popular models in chronological order from top to bottom. LeNet [23] and AlexNet [24] are the first generations of CNN models, developed in 1998 and 2012, respectively. Both are relatively shallow compared to more recently developed models. After AlexNet won the ILSVRC in 2012, designing novel networks became an emerging topic among researchers. VGG [25], also referred to as OxfordNet, is recognized as the first deep model, while GoogLeNet [26], also known as Inception1, set the new state of the art in the ILSVRC 2014. Inception introduced the novel block concept that employs a set of filters with different sizes, and its deep networks were constructed by concatenating the multiple outputs. However, in very deep architectures, the parameters of the earlier layers are poorly updated during training because they are too far from the output layer. This problem is known as the vanishing gradient problem, which was successfully addressed by ResNet [27] through residual blocks with skip connections between layers.

The number of parameters of one filter is calculated as (a * b * c) + 1, where a * b is the filter dimension, c is the number of filters in the previous layer, and the added 1 is the bias. The total number of parameters of a layer is the summation of the parameters of its filters. In the classifier head, all models use the softmax function except LeNet-5, which utilizes the hyperbolic tangent function. The softmax function fits the classification problem well because it converts a feature vector into a probability distribution over the class candidates.
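As a minimal illustration of this parameter count and of the softmax head, the following Python sketch (our example; the layer sizes are hypothetical and not taken from any surveyed study) computes the number of weights in a single convolutional layer and in a fully connected softmax head:

```python
import numpy as np

def conv_layer_params(kernel_h, kernel_w, in_channels, n_filters):
    """Each filter has (kernel_h * kernel_w * in_channels) weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * n_filters

def dense_head_params(in_features, n_classes):
    """Fully connected softmax head: one weight per input feature plus one bias per class."""
    return (in_features + 1) * n_classes

def softmax(z):
    """Convert a feature/logit vector into a probability distribution over classes."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical example: a 3x3 convolution over 64 input channels producing 128 filters.
print(conv_layer_params(3, 3, 64, 128))   # (3*3*64 + 1) * 128 = 73,856 parameters
# Hypothetical example: a softmax head mapping 2048 pooled features to 2 classes.
print(dense_head_params(2048, 2))         # (2048 + 1) * 2 = 4,098 parameters
print(softmax(np.array([2.0, 0.5])))      # approx. [0.82, 0.18]
```

The second example also hints at why the feature extractor approach is cheap: when only the softmax head is retrained, the trainable parameter count collapses to a few thousand weights instead of millions.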
Transfer learning with convolutional neural networks

TL with CNN is the idea that knowledge can be transferred at the parametric level. Well-trained CNN models provide the parameters of their convolutional layers for a new task in the medical domain. Specifically, in TL with CNN for medical image classification, a medical image classification (target task) can be learned by leveraging the generic features learned from natural image classification (source task), where labels are available in both domains. For simplicity, the terminology of TL in the remainder of the paper refers to homogeneous TL (i.e. both domains are image analysis) with pretrained CNN models using ImageNet data for medical image classification in a supervised manner.

Roughly, there are two TL approaches to leveraging CNN models: either feature extractor or fine-tuning. The feature extractor approach freezes the convolutional layers, whereas the fine-tuning approach updates their parameters during model fitting. Each can be further divided into two subcategories; hence, four TL approaches are defined and surveyed in this paper. They are intuitively visualized in Fig. 2. Feature extractor hybrid (Fig. 2a) discards the FC layers and attaches a machine learning algorithm, such as an SVM or Random Forest classifier, to the feature extractor, whereas the skeleton of the given network remains the same in the other types (Fig. 2b-d). Fine-tuning from scratch is the most time-intensive approach because it updates the entire ensemble of parameters during the training process.
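The following tf.keras sketch illustrates how the four configurations differ in practice; it is our illustration only, and the choice of ResNet50, the 30-layer cut-off and the hyperparameters are assumptions rather than settings reported by the surveyed studies.

```python
import tensorflow as tf

NUM_CLASSES = 2                       # hypothetical binary medical classification task
IMG_SHAPE = (224, 224, 3)

def backbone():
    # ImageNet-pretrained ResNet50 without its original classifier head,
    # with global average pooling so the output is a 2048-dim feature vector.
    return tf.keras.applications.ResNet50(include_top=False, weights="imagenet",
                                          input_shape=IMG_SHAPE, pooling="avg")

def add_head(base):
    out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    return tf.keras.Model(base.input, out)

# (a) Feature extractor hybrid: frozen CNN features fed to a classical ML classifier.
extractor = backbone()
extractor.trainable = False
# features = extractor.predict(train_images)           # hypothetical data, shape (n, 2048)
# clf = sklearn.svm.SVC().fit(features, train_labels)  # SVM (or Random Forest) on top

# (b) Feature extractor: all convolutional layers frozen, only the new FC head is trained.
base_b = backbone()
base_b.trainable = False
model_b = add_head(base_b)

# (c) Fine-tuning: unfreeze only the top convolutional layers (here the last 30),
#     keeping the earlier, more generic layers frozen.
base_c = backbone()
for layer in base_c.layers[:-30]:
    layer.trainable = False
model_c = add_head(base_c)

# (d) Fine-tuning from scratch: every parameter is updated, starting from ImageNet weights.
base_d = backbone()
base_d.trainable = True
model_d = add_head(base_d)

for model in (model_b, model_c, model_d):
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```

In configuration (a) the network skeleton is discarded in favor of an external classifier, whereas in (b)-(d) only the amount of unfrozen convolutional parameters changes, mirroring the four types in Fig. 2.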
Methods

Publications were retrieved from two peer-reviewed databases (the PubMed database on January 2, 2021, and the Web of Science database on January 22, 2021). Papers were selected based on the following four conditions: (1) "convolutional" or "CNN" should appear in the title or abstract; (2) image data analysis should be considered; (3) "transfer learning" or "pretrained" should appear in the title or abstract; and finally, (4) only experimental studies were considered. The time constraint is specified only for the latest date, which is December 31, 2020. The exact search strings used for these two databases are denoted in Appendix A. Duplicates were merged before the screening assessment. The first author screened the title, abstract and methods in order to exclude studies proposing a novel CNN model. Typically, this type of study stacked up multiple CNN models or concatenated CNN models and handcrafted features, and then compared their efficacy with other CNN models. Non-classification tasks, and those publications which fell outside the aforementioned date range, were also excluded. For the eligibility assessment, full texts were examined by two researchers. A third, independent researcher was involved in decision-making in the case of a discrepancy between the two researchers.

Methodology analysis

Eight properties of the 121 research articles were surveyed, investigated, compared and summarized in this paper. Five are quantitative properties and three are qualitative properties. They are specified as follows: (1) Off-the-shelf CNN model type (AlexNet, CaffeNet, Inception1, Inception2, Inception3, Inception4, Inception-ResNet, LeNet, MobileNet, ResNet, VGG16, VGG19, DenseNet, Xception, many or else); (2) Model performance (accuracy, AUC, sensitivity and specificity); (3) Transfer learning type (feature extractor, feature extractor hybrid, fine-tuning, fine-tuning from scratch or many); (4) Fine-tuning ratio; (5) Data modality (endoscopy, CT/CAT scan, mammography, microscopy, MRI, OCT, PET, photography, sonography, SPECT, X-ray/radiography or many); (6) Data subject (abdominopelvic cavity, alimentary system, bones, cardiovascular system, endocrine glands, genital systems, joints, lymphoid system, muscles, nervous system, tissue specimen, respiratory system, sense organs, the integument, thoracic cavity, urinary system, many or else); (7) Data quantity; and (8) The number of classes.
Fig. 2 Four types of transfer learning approach. The last classifier block needs to be replaced by a thinner layer or trained from scratch (ML: Machine
learning; FC: Fully connected layers)
These properties fall into one of three categories, namely model, transfer learning or data.

Results

Figure 3 shows the PRISMA flow diagram of the paper selection. We initially retrieved 467 papers from PubMed and Web of Science. After 42 duplicates from the two databases were merged, 425 studies were assessed for screening. 189 studies were excluded during the screening phase, and the full texts of the remaining 236 studies were assessed in the next stage. 114 studies were disqualified from inclusion, resulting in 121 studies. These selected studies were further investigated and organized with respect to their backbone model and TL type. The data characteristics and model performance were also analyzed to gain insights regarding how to employ TL.

Figure 4a shows that studies of TL for medical image classification have emerged since 2016, with a 4-year delay after AlexNet [24] won the ImageNet challenge in 2012. Since then, the number of publications has grown rapidly in consecutive years. The number of studies published in 2020 appears smaller than in 2019 because the process of indexing a publication may take anywhere from three to six months.

Backbone model

The majority of the studies (n = 57) evaluated several backbone models empirically, as depicted in Fig. 4b. For example, Rahaman and colleagues [28] contributed an intensive benchmark study evaluating fifteen models, namely VGG16, VGG19, ResNet50, ResNet101, ResNet152, ResNet50V2, ResNet101V2, ResNet152V2, Inception3, InceptionResNet2, MobileNet1, DenseNet121, DenseNet169, DenseNet201 and XceptionNet. They concluded that VGG19 presented the highest accuracy of 89.3%. This result is exceptional because other studies reported that deeper models (e.g. Inception and ResNet) performed better than the shallow models (e.g. VGG and AlexNet). Five studies [29–33] compared Inception and VGG and reported that Inception performed better, and Ovalle-Magallanes et al. [34] also concluded that Inception3 outperformed ResNet50 and VGG16. Finally, Talo et al. [35] reported that ResNet50 achieved the best classification accuracy compared to AlexNet, VGG16, ResNet18 and ResNet34.
Fig. 4 Studies of transfer learning in medical image classification over time (y-axis) with respect to a the number of publications, b applied
backbone model and c transfer learning type
Besides the benchmark studies, the most prevalent model was Inception (n = 26), which consists of the fewest parameters, as shown in Table 1. AlexNet (n = 14) and VGG (n = 10) were the next most commonly used models, although they are shallower than ResNet (n = 5) and Inception-ResNet (n = 2). Finally, only a few studies (n = 7) used a specific model such as LeNet5, DenseNet, CheXNet, DarkNet, OverFeat or CaffeNet.

Transfer learning

Similar to the backbone models, the majority of studies (n = 46) evaluated numerous TL approaches, which are illustrated in Fig. 4c. Many researchers aimed to search for the optimal choice of TL approach; typically, a grid search was applied. Shin and colleagues [36] extensively evaluated three CNN models (CifarNet, AlexNet and GoogLeNet) combined with three TL approaches (feature extractor, and fine-tuning from scratch with and without random initialization), and GoogLeNet fine-tuned from scratch without random initialization was identified as the best performing model.
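A grid search of this kind can be sketched in a few lines of Python. The build_model helper, the candidate lists and the train_ds/val_ds datasets below are hypothetical placeholders (in the spirit of the configurations sketched in the Background section), not artifacts of any surveyed study.

```python
import itertools

backbones = ["VGG16", "ResNet50", "InceptionV3"]                      # illustrative candidates
tl_types = ["feature_extractor", "fine_tune_top", "fine_tune_from_scratch"]

results = {}
for name, tl in itertools.product(backbones, tl_types):
    model = build_model(name, tl)             # hypothetical helper, see the earlier sketch
    history = model.fit(train_ds, validation_data=val_ds, epochs=10, verbose=0)
    results[(name, tl)] = max(history.history["val_accuracy"])

best_backbone, best_tl = max(results, key=results.get)
```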
The most popular TL approach was feature extractor (n = 38), followed by fine-tuning from scratch (n = 27), feature extractor hybrid (n = 7) and fine-tuning (n = 3). The feature extractor approach has the advantage of saving computational costs to a large degree compared to the others. Likewise, the feature extractor hybrid profits from the same advantage by removing the FC layers and adding less expensive machine learning algorithms. This is particularly beneficial for CNN models with heavy FC layers like AlexNet and VGG. Fine-tuning from scratch was the second most popular approach despite being the most resource-expensive type, because it updates the entire model. Fine-tuning is less expensive than fine-tuning from scratch, as it only partially updates the parameters of the convolutional layers. Additional file 2: Table 2 in Appendix B presents an overview of the four TL approaches, organized based on three dimensions: data modality, data subject and TL type.

Data characteristics

As the summary of data characteristics depicted in Fig. 5 shows, a variety of human anatomical regions has been studied. The most studied regions were breast cancer exams and skin cancer lesions. Likewise, a wide variety of imaging modalities, each with unique attributes for medical image analysis, was represented. For instance, computed tomography (CT) scans and magnetic resonance imaging (MRI) are capable of generating 3D image data, while digital microscopy can generate terabytes of whole slide images (WSI) of tissue specimens.

Figure 5b shows that the majority of studies consist of binary classes, while Fig. 5c shows that the majority of studies fall into the first bin, which ranges from 0 to 600. A minority of publications are not depicted in Fig. 5 for the following reasons: the experiment was conducted with multiple subjects (human body parts), multiple tasks or multiple databases, or the subject is non-human body images (e.g. surgical tools).

Performance visualization

Figure 6 shows scatter plots of model performance, TL type and two data characteristics: data size and image modality. The Y coordinates adhere to two metrics, namely the area under the receiver operating characteristic curve (AUC) and accuracy. Eleven studies used both metrics, so they are displayed on both scatter plots. The X coordinate is the normalized data quantity; otherwise, it would not be fair to compare the classification performance with two classes versus ten classes.
Fig. 5 The overview of data characteristics of the selected publications. a The correlation of anatomical body parts and imaging modalities. b The number of classes. c The histogram of the quantity of medical image datasets
Fig. 6 Scatter plots of model performance with data size, image modality, backbone model and transfer learning type. Color keys in a and b
indicate the medical image modality, whereas color keys in c and d represent backbone models. Transfer learning types are in any of four marker
shapes for all subfigures
The data quantities of the three modalities (CT, MRI and microscopy) reflect the number of patients. For a fair comparison, only studies that employed a single model, TL type and image modality are depicted (n = 41). Benchmark studies were excluded; otherwise, one study would generate several overlapping data points and potentially lead to bias. The excluded studies either used multiple models (n = 57), used multiple TL types (n = 14) or used minor models like LeNet (n = 9).

According to Spearman's rank correlation analyses, there were no relevant associations observed between the size of the data set and the performance metrics. Data size and AUC (Fig. 6a, c) showed no relevant correlation (rsp = 0.05, p = 0.03). Similarly, only a weak positive trend (rsp = 0.13, p = 0.17) could be detected between the size of the dataset and accuracy (Fig. 6b, d). There was also no association between other variables such as modality, TL type and backbone model. For instance, the data points of models used as feature extractors that were fitted to optical coherence tomography (OCT) images (purple crosses, Fig. 6a, b) showed that larger data quantities did not necessarily guarantee better performance. Notably, the data points in cross shapes (models used as feature extractors) showed decent results even though only a few fully connected layers were being retrained.
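Such a rank correlation between data quantity and reported performance can be computed with a few lines of Python; the two arrays below are hypothetical placeholders, not the values extracted in this review.

```python
from scipy.stats import spearmanr

data_sizes = [120, 540, 980, 2300, 5400, 12000]    # hypothetical normalized data quantities
accuracies = [0.81, 0.85, 0.79, 0.90, 0.84, 0.88]  # hypothetical reported accuracies

rho, p_value = spearmanr(data_sizes, accuracies)   # Spearman's rank correlation and p-value
print(f"Spearman rho = {rho:.2f}, p = {p_value:.2f}")
```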
Discussion

In this survey of selected literature, we have summarized 121 research articles applying TL to medical image analysis and found that the most frequently used model was Inception. Inception is a deep model; nevertheless, it consists of the fewest parameters (Table 1) owing to the 1 × 1 filter [37]. This 1 × 1 filter acts as a fully connected layer in Inception and ResNet, and it lowers the computational burden to a great degree [38]. To our surprise, AlexNet and VGG were the next most popular models. At first glance, this result seemed counterintuitive because ResNet is a more powerful model with fewer parameters compared to AlexNet or VGG. For instance, ResNet50 achieved a top-5 error of 6.7% on ILSVRC, which was 2.6% lower than VGG16 with 5.2 times fewer parameters and 9.7% lower than AlexNet with 2.4 times fewer parameters [27]. However, this comparison is valid only if the model is fine-tuned from scratch: the number of trainable parameters drops significantly when the model is utilized as a feature extractor, as shown in Table 1. He et al. [39] performed an in-depth evaluation of the impact of various settings for refining the training of multiple backbone models, focusing primarily on the ResNet architecture. Another possible explanation is that AlexNet and VGG are easy to understand because the network morphology is linear and made up of stacked layers. This stands in contrast to more complex concepts such as skip connections, bottlenecks and convolutional blocks introduced in Inception or ResNet.

With respect to TL approaches, the majority of studies empirically tested as many combinations of CNN models and TL approaches as possible. Compared to previously suggested best practices [40], some studies determined the fine-tuning configuration arbitrarily and ambiguously. For instance, [41] froze all layers except the last 12 without justification, while [42, 43] did not clearly describe the fine-tuning configuration. Lee et al. [44] partitioned VGG16/19 into 5 blocks, unfroze the blocks sequentially and identified the model fine-tuned with two blocks as achieving the highest performance. Similarly, [45] fine-tuned CaffeNet by unfreezing each layer sequentially; the best results were obtained by the model with one retrained layer for the detection task and with two retrained layers for the classification task.

Fine-tuning from scratch (n = 27) was a prevalent TL approach in the literature; however, we recommend using this approach carefully for two reasons: firstly, it does not improve the model performance, as shown in Fig. 6, and secondly, it is the computationally most expensive choice because it updates large gradients for the entire set of layers. Therefore, we encourage one to begin with the feature extractor approach and then incrementally fine-tune the convolutional layers. We recommend updating all layers (fine-tuning from scratch) only if the feature extractor does not reflect the characteristics of the new medical images.
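The recommended workflow of starting as a feature extractor and then incrementally unfreezing top convolutional layers with a low learning rate can be sketched as follows; the backbone choice, the 40-layer cut-off, the learning rates and the epoch counts are illustrative assumptions, and train_ds/val_ds stand for hypothetical medical image datasets.

```python
import tensorflow as tf

base = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet",
                                         input_shape=(299, 299, 3), pooling="avg")
out = tf.keras.layers.Dense(2, activation="softmax")(base.output)
model = tf.keras.Model(base.input, out)

# Phase 1 - feature extractor: freeze the convolutional base, train only the new head.
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)

# Phase 2 - incremental fine-tuning: unfreeze only the top convolutional layers and
# continue training with a much lower learning rate to preserve the pretrained features.
base.trainable = True
for layer in base.layers[:-40]:            # illustrative cut-off; earlier layers stay frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```

If validation performance still lags, the same loop can be repeated with progressively more layers unfrozen, up to full fine-tuning from scratch.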
There was no consensus among studies concerning the globally optimal configuration for fine-tuning. [46] concluded that fine-tuning the last fully connected layers of Inception3, ResNet50 and DenseNet121 outperformed fine-tuning from scratch in all cases. On the other hand, Yu et al. [47] found that retraining DenseNet201 from scratch achieved the highest diagnostic accuracy. We speculate that one of the causes is the variety of data subjects and imaging modalities addressed in Sect. 4.3. Hence, the interplay between medical data characteristics (e.g. anatomical sites, imaging modalities, data size, label size and more) and TL with CNN models would be interesting to investigate, yet it is understudied in the current literature. Morid et al. [48] stated that deep CNN models may be more effective for X-ray, endoscopic and ultrasound images, while shallow CNN models may be optimal for processing OCT images and photography of skin lesions and the fundus. Nonetheless, more research is needed to confirm these hypotheses.

TL with random initialization often appeared in the literature [49–52]. These studies used the architecture of the CNN models only and initialized the training with random weights. One could argue that there is no transfer of
knowledge if the entire set of weights and biases is reinitialized, but this is still considered TL in the literature.

It is also worth noting that only a few studies [53, 54] employed native 3D-CNN. Both studies reported that 3D-CNN outperformed 2D-CNN and 2.5D-CNN models; however, Zhang et al. [53] set the number of frames to 16, and Xiong et al. [54] reduced the resolution down to 21 × 21 × 21 voxels due to the limitation of computing resources. The majority of the studies constructed 2D-CNN or 2.5D-CNN from 3D inputs: to reduce the processing burden, only a sample of image slices from the 3D inputs was taken. We expect that the number of studies employing 3D models will increase in the future, as high-performance DL is an emerging research topic.

We confirmed (Fig. 5c) that only a limited amount of data was available in most studies for medical image analysis. Many studies took advantage of publicly accessible medical datasets from grand challenges (https://grand-challenge.org/challenges). This is a particularly beneficial scientific practice because novel solutions are shared online, allowing for better reproducibility. We summarized 78 publicly available medical datasets in Additional file 3: Suppl. Table 3 (Appendix C), which were organized based on the following attributes: data modality, anatomical part/region, task type, data name, published year and the link.

Although most evaluated papers included brief information about their hardware setup, no details were provided about training or test time performance. As most medical data sets are small, usually consumer-grade GPUs in custom workstations or, more seldom, server-grade cards (P100 or V100) were sufficient for TL. Previous survey studies have investigated how DL can be optimized and sped up on GPUs [55] or by using specifically designed hardware accelerators like field-programmable gate arrays (FPGA) for neural network inference [56]. We could not investigate these aspects of efficient TL because execution time was rarely reported in the surveyed literature.

This study is limited to surveying only TL for medical image classification. However, many interesting task-oriented TL studies were published in the past few years, with a particular focus on object detection and image segmentation [57], as reflected by the number of public data sets (see also Additional file 3: Appendix C, Table 3). We only investigated off-the-shelf CNN models pretrained on ImageNet and intentionally left out custom CNN architectures, although these can potentially outperform TL-based models on certain tasks [58, 59]. Also, we did not evaluate aspects of potential model improvements leveraged by the differences between the source and the target domain of the training data used for TL [60]. Similarly, we did not evaluate vision transformers (ViT) [61], which are emerging for image data analysis. For instance, Liu et al. [62] compared 22 backbone models and four ViT models and concluded that one of the ViT models exhibited the highest accuracy when trained on cropped cytopathology cell images. Recently, Chen et al. [63] proposed a novel architecture that is a parallel design of MobileNet and ViT, in view of achieving not only more efficient computation but also better model performance.

Conclusion

We aimed to provide actionable insights to readers and ML practitioners on how to select backbone CNN models and tune them properly with consideration of the medical data characteristics. While we encourage readers to methodically search for the optimal choice of model and TL setup, it is a good starting point to employ deep CNN models (preferably ResNet or Inception) as feature extractors. We recommend updating only the last fully connected layers of the chosen model on the medical image dataset. In case the model performance needs to be refined, the model should be fine-tuned by incrementally unfreezing convolutional layers from the top to the bottom layers with a low learning rate. Following these basic steps can save computational costs and time without degrading the predictive power. Finally, publicly accessible medical image datasets were compiled in a structured table describing the modality, anatomical region, task type and publication year as well as the URL for accession.

Abbreviations
AUC: Area under the receiver operating characteristic curve; CT: Computed tomography; CNN: Convolutional neural networks; DL: Deep learning; FC: Fully connected; FPGA: Field-programmable gate arrays; GPU: Graphics processing unit; HOG: Histograms of oriented gradients; ILSVRC: ImageNet large scale visual recognition challenge; LBP: Local binary pattern; MRI: Magnetic resonance imaging; OCT: Optical coherence tomography; TL: Transfer learning; TPU: Tensor processing unit; ViT: Vision transformer; WSI: Whole slide image.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12880-022-00793-7.
Additional file 1. Search terms.
Additional file 2. Summary table of studies.
Additional file 3. Summary table of public medical datasets.

Acknowledgements
The authors would like to thank Joseph Babcock (Catholic University of Paris) and Jonathan Griffiths (Academic Writing Support Center, Heidelberg University) for proofreading, and Fabian Siegel MD and Frederik Trinkmann MD (Medical Faculty Mannheim, Heidelberg University) for comments on the manuscript. We would like to thank the reviewer for their constructive feedback.
Author contributions 11. Zhu J-Y, Park T, Isola P, Efros AA. Unpaired image-to-image translation
H.E.K. conceptualized the study. H.E.K. and A.CL. created the search query and using cycle-consistent adversarial networks. In: Proceedings of the
article collection. A.CL., N.S., M.J., M.E.M. and H.K. screened and evaluated the IEEE international conference on computer vision. 2017. pp. 2223–32.
selected papers. H.E.K. analyzed the data and created figures. H.E.K., M.E.M and 12. Zhang T, Cheng J, Fu H, Gu Z, Xiao Y, Zhou K, et al. Noise adaptation
T.G. interpreted the data. M.E.M. advised technical aspects of the study. H.E.K., generative adversarial network for medical image analysis. IEEE Trans
M.E.M, and T.G. wrote the manuscript. M.E.M. and T.G. supervised the study. All Med Imaging. 2019;39:1149–59.
authors critically reviewed the manuscript and approved the final version. 13. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F,
et al. Domain-adversarial training of neural networks. J Mach Learn Res.
Funding 2016;17:2096–2030.
Open Access funding enabled and organized by Projekt DEAL. A.CL., N.S., 14. Wang Z, Du B, Tu W, Zhang L, Tao D. Incorporating distribution match‑
M.E.M. and T.G. were supported by funding from the German Ministry for Edu‑ ing into uncertainty for multiple kernel active learning. IEEE Trans Knowl
cation and Research (BMBF) within the framework of the Medical Informatics Data Eng. 2019;33:128–42.
Initiative (MIRACUM Consortium: Medical Informatics for Research and Care in 15. Zhang Y, Wei Y, Wu Q, Zhao P, Niu S, Huang J, et al. Collaborative unsu‑
University Medicine; 01ZZ1801E). pervised domain adaptation for medical image diagnosis. IEEE Trans
Image Process. 2020;29:7834–44.
Availability of data and materials 16. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, et al.
The dataset analyzed in this study are shown in Appendix B. In-depth infor‑ A survey on deep learning in medical image analysis. Med Image Anal.
mation is available on reasonable request from the corresponding author 2017;42:60–88.
([email protected]). 17. Chowdhury A, Rosenthal J, Waring J, Umeton R. Applying self-super‑
vised learning to medicine: review of the state of the art and medical
implementations. In: Informatics. Multidisciplinary Digital Publishing
Declarations Institute; 2021. p. 59.
18. Zhang J, Li C, Rahaman MM, Yao Y, Ma P, Zhang J, et al. A comprehen‑
Ethics approval and consent to participate sive review of image analysis methods for microorganism counting:
Not applicable. This manuscript is exempt from ethics approval because it from classical image processing to deep learning approaches. Artif
does not use any animal or human subject data or tissue. Intell Rev. 2021;1–70.
19. Rahaman MM, Li C, Wu X, Yao Y, Hu Z, Jiang T, et al. A survey for cervi‑
Consent for publication cal cytopathology image analysis using deep learning. IEEE Access.
Not applicable. 2020;8:61687–710.
20. Agarwal D, Marques G, de la Torre-Díez I, Franco Martin MA, García
Competing interests Zapiraín B, Martín RF. Transfer learning for Alzheimer’s disease through
The authors declare that they have no conflict of interest. neuroimaging biomarkers: a systematic review. Sensors. 2021;21:7259.
21. Valverde JM, Imani V, Abdollahzadeh A, De Feo R, Prakash M, Ciszek R,
Author details et al. Transfer learning in magnetic resonance brain imaging: a system‑
1
Department of Biomedical Informatics at the Center for Preventive Medicine atic review. J Imaging. 2021;7:66.
and Digital Health (CPD‑BW), Medical Faculty Mannheim, Heidelberg Univer‑ 22. ImageNet. https://www.image-net.org/update-mar-11-2021.php.
sity, Theodor‑Kutzer‑Ufer 1‑3, 68167 Mannheim, Germany. 2 Chair of Medical Accessed 18 May 2021.
Informatics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Wetterkreuz 23. Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied
15, 91058 Erlangen, Germany. to document recognition. Proc IEEE. 1998;86:2278–324.
24. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep
Received: 25 August 2021 Accepted: 30 March 2022 convolutional neural networks. In: Pereira F, Burges CJC, Bottou L, Wein‑
berger KQ, editors. Advances in neural information processing systems
25. Curran Associates, Inc.; 2012. p. 1097–105.
25. Simonyan K, Zisserman A. Very deep convolutional networks for large-
scale image recognition. arXiv:14091556 [cs]. 2015.
References 26. Hegde RB, Prasad K, Hebbar H, Singh BMK. Feature extraction using
1. Dalal N, Triggs B. Histograms of oriented gradients for human detec‑ traditional image processing and convolutional neural network meth‑
tion. In: 2005 IEEE computer society conference on computer vision ods to classify white blood cells: a study. Australas Phys Eng Sci Med.
and pattern recognition (CVPR’05). IEEE; 2005. pp. 886–93. 2019;42:627–38.
2. He D-C, Wang L. Texture unit, texture spectrum, and texture analysis. 27. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recogni‑
IEEE Trans Geosci Remote Sens. 1990;28:509–12. tion. In: 2016 IEEE conference on computer vision and pattern recogni‑
3. CAMELYON17—Grand Challenge. grand-challenge.org. https://camel tion (CVPR). Las Vegas, NV, USA: IEEE; 2016. p. 770–8.
yon17.grand-challenge.org/evaluation/challenge/leaderboard/. 28. Rahaman MM, Li C, Yao Y, Kulwa F, Rahman MA, Wang Q, et al. Identifica‑
Accessed 3 Apr 2021. tion of COVID-19 samples from chest X-Ray images using deep learn‑
4. Shi B, Grimm LJ, Mazurowski MA, Baker JA, Marks JR, King LM, et al. ing: a comparison of transfer learning approaches. XST. 2020;28:821–39.
Prediction of occult invasive disease in ductal carcinoma in situ using 29. Burdick J, Marques O, Weinthal J, Furht B. Rethinking skin lesion seg‑
deep learning features. J Am Coll Radiol. 2018;15(3 Pt B):527–34. mentation in a convolutional classifier. J Digit Imaging. 2018;31:435–40.
5. Wang Z, Du B, Guo Y. Domain adaptation with neural embedding 30. Chen Q, Hu S, Long P, Lu F, Shi Y, Li Y. A transfer learning approach for
matching. IEEE Trans Neural Netw Learn Syst. 2019;31:2387–97. malignant prostate lesion detection on multiparametric MRI. Technol
6. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Cancer Res Treat. 2019;18:1533033819858363.
Eng. 2010;22:1345–59. 31. Lakhani P. Deep convolutional neural networks for endotracheal tube
7. Weiss K, Khoshgoftaar TM, Wang D. A survey of transfer learning. J Big position and X-ray image classification: challenges and opportunities. J
data. 2016;3:1–40. Digit Imaging. 2017;30:460–8.
8. Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, et al. A comprehensive 32. Yang H, Zhang J, Liu Q, Wang Y. Multimodal MRI-based classification of
survey on transfer learning. Proc IEEE. 2020;109:43–76. migraine: using deep learning convolutional neural network. Biomed
9. Wilson G, Cook DJ. A survey of unsupervised deep domain adapta‑ Eng Online. 2018;17:138.
tion. ACM Trans Intell Syst Technol (TIST). 2020;11:1–46. 33. Yu S, Liu L, Wang Z, Dai G, Xie Y. Transferring deep neural networks for
10. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, the differentiation of mammographic breast lesions. Sci China Technol
et al. Generative adversarial nets. In: Advances in neural information Sci. 2019;62:441–7.
processing systems. 2014;27.
34. Ovalle-Magallanes E, Avina-Cervantes JG, Cruz-Aceves I, Ruiz-Pinales J. 57. Sun C, Li C, Zhang J, Rahaman MM, Ai S, Chen H, et al. Gastric histopa‑
Transfer learning for stenosis detection in X-ray coronary angiography. thology image segmentation using a hierarchical conditional random
Mathematics. 2020;8:1510. field. Biocybern Biomed Eng. 2020;40:1535–55.
35. Talo M, Yildirim O, Baloglu UB, Aydin G, Acharya UR. Convolutional neu‑ 58. Rahaman MM, Li C, Yao Y, Kulwa F, Wu X, Li X, et al. DeepCervix: a deep
ral networks for multi-class brain disease detection using MRI images. learning-based framework for the classification of cervical cells using
Comput Med Imaging Graph. 2019;78:101673. hybrid deep feature fusion techniques. arXiv preprint arXiv:210212191.
36. Shin H-C, Roth HR, Gao M, Lu L, Xu Z, Nogues I, et al. Deep convolu‑ 2021.
tional neural networks for computer-aided detection: CNN architec‑ 59. Alzubaidi L, Al-Amidie M, Al-Asadi A, Humaidi AJ, Al-Shamma O, Fadhel
tures, dataset characteristics and transfer learning. IEEE Trans Med MA, et al. Novel transfer learning approach for medical imaging with
Imaging. 2016;35:1285–98. limited labeled data. Cancers. 2021;13:1590.
37. Lin M, Chen Q, Yan S. Network in network. arXiv:13124400 [cs]. 2014. 60. Alzubaidi L, Fadhel MA, Al-Shamma O, Zhang J, Santamaría J, Duan Y,
38. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going et al. Towards a better understanding of transfer learning for medical
deeper with convolutions. arXiv:14094842 [cs]. 2014. imaging: a case study. Appl Sci. 2020;10:4523.
39. He T, Zhang Z, Zhang H, Zhang Z, Xie J, Li M. Bag of tricks for image 61. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner
classification with convolutional neural networks. arXiv:181201187 [cs]. T, et al. An image is worth 16x16 words: transformers for image recogni‑
2018. tion at scale. arXiv:201011929 [cs]. 2021.
40. Chollet F. Deep learning with Python. Simon and Schuster; 2021. 62. Liu W, Li C, Rahamana MM, Jiang T, Sun H, Wu X, et al. Is the aspect
41. Hemelings R, Elen B, Barbosa-Breda J, Lemmens S, Meire M, Pourjavan ratio of cells important in deep learning? A robust comparison of deep
S, et al. Accurate prediction of glaucoma from colour fundus images learning methods for multi-scale cytopathology cell image clas‑
with a convolutional neural network that relies on active and transfer sification: from convolutional neural networks to visual transformers.
learning. Acta Ophthalmol. 2020;98:e94-100. arXiv:210507402 [cs]. 2021.
42. Valkonen M, Isola J, Ylinen O, Muhonen V, Saxlin A, Tolonen T, et al. 63. Chen Y, Dai X, Chen D, Liu M, Dong X, Yuan L, et al. Mobile-former:
Cytokeratin-supervised deep learning for automatic recognition of bridging mobilenet and transformer. arXiv:210805895 [cs]. 2021.
epithelial cells in breast cancers stained for ER, PR, and Ki-67. IEEE Trans 64. Huang J, Habib A-R, Mendis D, Chong J, Smith M, Duvnjak M, et al. An
Med Imaging. 2019;39:534–42. artificial intelligence algorithm that differentiates anterior ethmoidal
43. Han SS, Park GH, Lim W, Kim MS, Na JI, Park I, et al. Deep neural artery location on sinus computed tomography scans. J Laryngol Otol.
networks show an equivalent and often superior performance to 2020;134:52–5.
dermatologists in onychomycosis diagnosis: automatic construction of 65. Yamada A, Oyama K, Fujita S, Yoshizawa E, Ichinohe F, Komatsu D, et al.
onychomycosis datasets by region-based convolutional deep neural Dynamic contrast-enhanced computed tomography diagnosis of
network. PLoS ONE. 2018;13:e0191493. primary liver cancers using transfer learning of pretrained convolutional
44. Lee K-S, Kim JY, Jeon E, Choi WS, Kim NH, Lee KY. Evaluation of scalabil‑ neural networks: is registration of multiphasic images necessary? Int J
ity and degree of fine-tuning of deep convolutional neural networks CARS. 2019;14:1295–301.
for COVID-19 screening on chest X-ray images using explainable deep- 66. Peng J, Kang S, Ning Z, Deng H, Shen J, Xu Y, et al. Residual con‑
learning algorithm. J Person Med. 2020;10:213. volutional neural network for predicting response of transarterial
45. Zhang R, Zheng Y, Mak TWC, Yu R, Wong SH, Lau JY, et al. Automatic chemoembolization in hepatocellular carcinoma from CT imaging. Eur
detection and classification of colorectal polyps by transferring low- Radiol. 2020;30:413–24.
level CNN features from nonmedical domain. IEEE J Biomed Health 67. Hadj Saïd M, Le Roux M-K, Catherine J-H, Lan R. Development of an arti‑
Inform. 2016;21:41–7. ficial intelligence model to identify a dental implant from a radiograph.
46. Singh V, Danda V, Gorniak R, Flanders A, Lakhani P. Assessment of critical Int J Oral Maxillofac Implants. 2020;35.
feeding tube malpositions on radiographs using deep learning. J Digit 68. Lee J-H, Kim D-H, Jeong S-N. Diagnosis of cystic lesions using pano‑
Imaging. 2019;32:651–5. ramic and cone beam computed tomographic images based on deep
47. Yu X, Zeng N, Liu S, Zhang Y-D. Utilization of DenseNet201 for diagnosis learning neural network. Oral Dis. 2020;26:152–8.
of breast abnormality. Mach Vis Appl. 2019;30:1135–44. 69. Parmar P, Habib AR, Mendis D, Daniel A, Duvnjak M, Ho J, et al. An artifi‑
48. Morid MA, Borjali A, Del Fiol G. A scoping review of transfer learning cial intelligence algorithm that identifies middle turbinate pneumatisa‑
research on medical image analysis using ImageNet. arXiv:200413175 tion (concha bullosa) on sinus computed tomography scans. J Laryngol
[cs, eess]. 2020. https://doi.org/10.1016/j.compbiomed.2020.104115. Otol. 2020;134:328–31.
49. Karri SPK, Chakraborty D, Chatterjee J. Transfer learning based classifica‑ 70. Kajikawa T, Kadoya N, Ito K, Takayama Y, Chiba T, Tomori S, et al. Auto‑
tion of optical coherence tomography images with diabetic macular mated prediction of dosimetric eligibility of patients with prostate
edema and dry age-related macular degeneration. Biomed Opt cancer undergoing intensity-modulated radiation therapy using a
Express. 2017;8:579–92. convolutional neural network. Radiol Phys Technol. 2018;11:320–7.
50. Kim Y-G, Kim S, Cho CE, Song IH, Lee HJ, Ahn S, et al. Effectiveness of 71. Dawud AM, Yurtkan K, Oztoprak H. Application of deep learning in neu‑
transfer learning for enhancing tumor classification with a convolu‑ roradiology: brain haemorrhage classification using transfer learning.
tional neural network on frozen sections. Sci Rep. 2020;10:21899. Comput Intell Neurosci. 2019;2019.
51. Lee H, Tajmir S, Lee J, Zissen M, Yeshiwas BA, Alkasab TK, et al. Fully auto‑ 72. Zhao X, Qi S, Zhang B, Ma H, Qian W, Yao Y, et al. Deep CNN models for
mated deep learning system for bone age assessment. J Digit Imaging. pulmonary nodule classification: model modification, model integra‑
2017;30:427–41. tion, and transfer learning. J Xray Sci Technol. 2019;27:615–29.
52. Tang Y-X, Tang Y-B, Peng Y, Yan K, Bagheri M, Redd BA, et al. Automated 73. da Nobrega RVM, Rebouças Filho PP, Rodrigues MB, da Silva SP, Junior
abnormality classification of chest radiographs using deep convolu‑ CMD, de Albuquerque VHC. Lung nodule malignancy classification
tional neural networks. NPJ Digit Med. 2020;3:70. in chest computed tomography images using transfer learning and
53. Zhang X, Zhang Y, Han EY, Jacobs N, Han Q, Wang X, et al. Classification convolutional neural networks. Neural Comput Appl. 2018;1–18.
of whole mammogram and tomosynthesis images using deep convo‑ 74. Zhang S, Sun F, Wang N, Zhang C, Yu Q, Zhang M, et al. Computer-aided
lutional neural networks. IEEE Trans Nanobiosci. 2018;17:237–42. diagnosis (CAD) of pulmonary nodule of thoracic CT image using
54. Xiong J, Li X, Lu L, Schwartz LH, Fu X, Zhao J, et al. Implementation strat‑ transfer learning. J Digit Imaging. 2019;32:995–1007.
egy of a CNN model affects the performance of CT assessment of EGFR 75. Nibali A, He Z, Wollersheim D. Pulmonary nodule classification
mutation status in lung cancer patients. IEEE Access. 2019;7:64583–91. with deep residual networks. Int J Comput Assist Radiol Surg.
55. Mittal S, Vaishay S. A survey of techniques for optimizing deep learning 2017;12:1799–808.
on GPUs. J Syst Archit. 2019;99:101635. 76. Pham TD. A comprehensive study on classification of COVID-19 on
56. Guo K, Zeng S, Yu J, Wang Y, Yang H. A survey of FPGA-based neural computed tomography with pretrained convolutional neural networks.
network accelerator. arXiv:171208934 [cs]. 2018. Sci Rep. 2020;10:16942.
77. Gao J, Jiang Q, Zhou B, Chen D. Lung nodule detection using convolu‑ 99. Riordon J, McCallum C, Sinton D. Deep learning for the classification of
tional neural networks with transfer learning on CT images. Combinato‑ human sperm. Comput Biol Med. 2019;111:103342.
rial Chemistry & High Throughput Screening. 2020. 100. Marsh JN, Matlock MK, Kudose S, Liu T-C, Stappenbeck TS, Gaut JP, et al.
78. Chowdhury NI, Smith TL, Chandra RK, Turner JH. Automated classifica‑ Deep learning global glomerulosclerosis in transplant kidney frozen
tion of osteomeatal complex inflammation on computed tomography sections. IEEE Trans Med Imaging. 2018;37:2718–28.
using convolutional neural networks. In: International forum of allergy & 101. Kanavati F, Toyokawa G, Momosaki S, Rambeau M, Kozuma Y, Shoji F,
rhinology. Wiley Online Library; 2019. pp. 46–52. et al. Weakly-supervised learning for lung carcinoma classification using
79. Nishio M, Sugiyama O, Yakami M, Ueno S, Kubo T, Kuroda T, et al. deep learning. Sci Rep. 2020;10:9297.
Computer-aided diagnosis of lung nodule classification between 102. Kather JN, Krisam J, Charoentong P, Luedde T, Herpel E, Weis C-A,
benign nodule, primary lung cancer, and metastatic lung cancer at et al. Predicting survival from colorectal cancer histology slides
different image size using deep convolutional neural network with using deep learning: A retrospective multicenter study. PLOS Med.
transfer learning. PLoS ONE. 2018;13:e0200721. 2019;16:e1002730.
80. Zachariah R, Samarasena J, Luba D, Duh E, Dao T, Requa J, et al. Predic‑ 103. He Y, Guo J, Ding X, van Ooijen PM, Zhang Y, Chen A, et al. Con‑
tion of polyp pathology using convolutional neural networks achieves volutional neural network to predict the local recurrence of giant
“resect and discard” thresholds. Am J Gastroenterol. 2020;115:138–44. cell tumor of bone after curettage based on pre-surgery magnetic
81. Zhu Y, Wang Q-C, Xu M-D, Zhang Z, Cheng J, Zhong Y-S, et al. Applica‑ resonance images. Eur Radiol. 2019;29:5441–51.
tion of convolutional neural network in the diagnosis of the invasion 104. Yuan Y, Qin W, Buyyounouski M, Ibragimov B, Hancock S, Han B, et al.
depth of gastric cancer based on conventional endoscopy. Gastrointest Prostate cancer classification with multiparametric MRI transfer learn‑
Endosc. 2019;89:806-815.e1. ing model. Med Phys. 2019;46:756–65.
82. Cho B-J, Bang CS, Park SW, Yang YJ, Seo SI, Lim H, et al. Automated 105. Borkowski K, Rossi C, Ciritsis A, Marcon M, Hejduk P, Stieb S, et al.
classification of gastric neoplasms in endoscopic images using a con‑ Fully automatic classification of breast MRI background parenchymal
volutional neural network. Endoscopy. 2019;51:1121–9. enhancement using a transfer learning approach. Medicine (Balti‑
83. Shichijo S, Nomura S, Aoyama K, Nishikawa Y, Miura M, Shinagawa T, more). 2020;99.
et al. Application of convolutional neural networks in the diagnosis of 106. Zhu Z, Harowicz M, Zhang J, Saha A, Grimm LJ, Hwang ES, et al. Deep
helicobacter pylori infection based on endoscopic images. EBioMedi‑ learning analysis of breast MRIs for prediction of occult invasive dis‑
cine. 2017;25:106–11. ease in ductal carcinoma in situ. Comput Biol Med. 2019;115:103498.
84. Shichijo S, Endo Y, Aoyama K, Takeuchi Y, Ozawa T, Takiyama H, et al. 107. Fukuma R, Yanagisawa T, Kinoshita M, Shinozaki T, Arita H, Kawaguchi
Application of convolutional neural networks for evaluating Helicobac‑ A, et al. Prediction of IDH and TERT promoter mutations in low-grade
ter pylori infection status on the basis of endoscopic images. Scand J glioma from magnetic resonance images using a convolutional
Gastroenterol. 2019;54:158–63. neural network. Sci Rep. 2019;9:1–8.
85. Patrini I, Ruperti M, Moccia S, Mattos LS, Frontoni E, De Momi E. Transfer 108. Banzato T, Causin F, Della Puppa A, Cester G, Mazzai L, Zotti A. Accu‑
learning for informative-frame selection in laryngoscopic videos racy of deep learning to differentiate the histopathological grading
through learned features. Med Boil Eng Comput. 2020;1–14. of meningiomas on MR images: a preliminary study. J Magn Reson