Transfer Learning in Building Neural Network Model Case Study
Maja Braović
University of Split, Faculty of Electrical
Engineering, Mechanical Engineering
and Naval Architecture
Split, Croatia
[email protected]
Abstract—The transfer learning approach to the development of deep learning (DL) models accelerates the solving of the task at hand. It also allows tasks to be solved when the large data sets required for extracting non-linear mapping functions are not available, and it can even reduce the need for large computing resources. This paper contains a case study that shows how easily and quickly a DL model can be built on sparse data using a transfer learning approach. An image processing task was chosen because the development of convolutional neural networks (CNN) has led to excellent results in solving the challenges of image classification and object localization and detection. The transfer learning approach enabled simplified and accelerated model development for the selected task. The applied process is formalized into separate, distinct procedural steps. The task to be solved was a system for the automatic detection of Alzheimer's dementia in MRI (Magnetic Resonance Imaging) images. The popular CNN AlexNet architecture was used as a starting point for the development of the new model. A minimally modified model developed by transfer learning achieved an average accuracy of over 88% and an average recall of over 91%. These results are quite comparable to those of human operators.

Keywords—transfer learning, deep learning, AlexNet, convolutional neural network, OASIS data set, binary classifier, Alzheimer's dementia.

I. INTRODUCTION

Reusing existing knowledge is a common approach in everyday life and research. People who know how to drive a car can reuse their knowledge of vehicle features, such as brakes and steering, and of traffic rules when they try to drive a truck. If they know how to bake a chocolate cake, they can reuse their knowledge of cake crusts when they try to bake a vanilla cake. When the tasks being attempted have transferable knowledge, reusing existing knowledge reduces the time and effort required to solve the problem.

Knowledge is always domain specific and contextual, no matter how broad the domain of knowledge is. This domain-specific nature of knowledge can vary: from covering vast areas, such as the laws of motion that govern all objects in motion, to more focused areas, such as Wall Street analysts' expertise in predicting S&P 500 stock prices. The depth and accuracy of domain knowledge can also vary significantly; the deeper the domain knowledge, the more reliable it usually is. The previous example about knowledge domains applies equally to the quality of knowledge. Existing knowledge always helps to solve a certain task. According to Harsh [1], “Knowledge may be regarded as context dependent and therefore understanding knowledge reusability for a given system is an important quantity”.

Machine Learning (ML) is today a ubiquitous and common computing approach to extracting, capturing, and using knowledge. Different ML techniques are at our disposal [2], from Support Vector Machines, Decision Trees, and many others, to Deep Learning (DL) Artificial Neural Networks (ANN, or just NN for short). As with the reuse of any knowledge, the reuse of models generated by ML reduces the time and effort required to solve a particular problem. As a rule, this is much faster and easier than creating models from scratch. How suitable machine learning models are for transfer learning is shown by the fact that Hugging Face, a platform for the ML community to share ML models and datasets, currently offers more than 700 thousand public models reusable with a transfer learning approach [3]. Most NN models there are categorized as computer vision or NLP models. The concept of NN reuse, called transfer learning (TL), was proposed by Stevo Božinovski and Ante Fulgosi in 1976 in a paper written in Croatian [4], [5]. A formalization of model properties that can help select a compatible model for transfer learning applications is currently underway [6].

In this paper, we focus on the reuse of DL-modeled knowledge. The case study presents the process of building a new NN model by transferring knowledge. The steps taken to reuse the existing NN AlexNet [7] for a new task are identified and implemented. The new task is the classification of MR images with and without signs of Alzheimer's disease. The built model is not intended for clinical use. It is used to pinpoint the concrete steps in building a DL model by transfer learning, and to show that even a small data set can produce a model comparable to a human operator.
Transfer learning is an approach to developing ML models, not a machine learning technique in itself. It is gaining popularity and engaging more and more research effort, especially with NN models that rely on large amounts of data and computation resources [8]. Transfer learning can help build useful NN models when only scarce data and little computation resources are available. The term transfer learning is now widely used across ML techniques and beyond, together with more specialized terms like inductive transfer, knowledge transfer, meta-learning, etc. [9], [10].

The rest of the paper is organized as follows. Section 2 presents deep learning with an emphasis on Convolutional Neural Networks (CNNs) and the AlexNet architecture. Section 3 discusses Transfer Learning (TL) and provides a definition and mathematical formalization of deep transfer learning. Section 4 presents a case study of building an MRI classifier for Alzheimer's disease with the TL approach. This section contains subsections on Alzheimer's disease, the OASIS dataset, and a detailed description of the classifier development process. The NN development procedure with transfer learning is formalized with process steps identified through this case study. Section 5 provides a conclusion and summarizes the key ideas in the development of deep learning transfer learning models.

II. DEEP LEARNING

Artificial neural networks, created according to a human brain analogy, consist of “neurons”, termed nodes, with inputs and outputs activated based on an activation function and a set threshold. An NN can identify nonlinear relationships, recognizing patterns in the data without human intervention [11]. If there is a pattern in the data and enough data to identify that pattern, a useful NN can be built. An NN usually has one input layer (xi), hidden layers in between, and one output layer (yi) that is fully connected. Fully connected layers connect every neuron in one layer to every neuron in another layer. That is the classic MLP (multilayer perceptron), and what most assume when using the term neural network (Fig. 1). NN training relies on the error between the generated output and the real output from the training data, in a backward propagation process.

Deep learning methods use networks with deep architectures, with many hidden layers that can model highly complex functions and tackle challenging problems [12]. Convolutional neural networks (CNN) are deep neural networks for processing visual data in various image classification and recognition tasks. Specific processing layer types, convolutional and pooling layers, handle multi-dimensional visual data [13]. Convolution layers extract different image features (lines, shapes, ...) by applying different kernels, or convolution matrices, while pooling layers reduce data dimensionality.
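To make these layer types concrete, the following minimal sketch in Matlab's Deep Learning Toolbox notation (the tooling used later in this paper) stacks convolutional, ReLU, and pooling layers in front of a small classifier. The layer sizes are illustrative assumptions, not the architecture used in our case study:

layers = [
    imageInputLayer([227 227 3])        % raw RGB image input (illustrative size)
    convolution2dLayer(3, 16)           % 16 kernels of size 3x3 extract local features
    reluLayer                           % non-linear activation
    maxPooling2dLayer(2, 'Stride', 2)   % pooling halves the spatial dimensions
    fullyConnectedLayer(2)              % two output classes
    softmaxLayer
    classificationLayer];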
Many CNN architectures have been developed and can be reused through transfer learning for tasks similar to those for which they were initially built. Quite popular are the AlexNet [14], GoogLeNet [15], and ResNet [16] network architectures, each of which was tested in and won (in 2012, 2014, and 2015, respectively) the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This contest aims to advance the development of image processing algorithms and to provide public training data sets for computer vision models for image classification, object detection, and similar tasks.

We have chosen AlexNet because ready-made solutions exist for transfer learning with AlexNet in Matlab [17]. Matlab R2018b was used to remodel and retrain AlexNet to classify MRI images with and without signs of Alzheimer's disease.

A. AlexNet

AlexNet is a deep convolutional neural network developed by Krizhevsky, Sutskever, and Hinton [14]. It contains 60 million parameters and 650,000 neurons, arranged in five convolutional layers, three pooling layers, two fully connected layers, and a softmax output layer. The architecture of the network is shown in Fig. 2.

The ReLU (Rectified Linear Unit) activation function is used for the convolutional and fully connected layers. Pooling layers adjust dimensionality. In the first fully connected layer, dropout with a probability of 0.5 is used to reduce network overfitting. The network was trained on 1.2 million ImageNet images divided into 1000 different classes. The classes are diverse and vary from animals (scorpion, leopard, Rottweiler, …) and events (Band Aid) to inanimate objects (castle, pillow, …).
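To indicate what "remodeling" AlexNet involves before the training described later, a minimal sketch of the usual layer replacement in Matlab follows. It reflects the standard Deep Learning Toolbox transfer learning pattern rather than the paper's exact code; the learning-rate factors and variable names are assumptions taken from that pattern:

net = alexnet;                          % pre-trained network [17]
layersTransfer = net.Layers(1:end-3);   % keep everything except the last three layers
numClasses = 2;                         % Alzheimer's vs. healthy
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses, ...
        'WeightLearnRateFactor', 20, 'BiasLearnRateFactor', 20)  % fresh, faster-learning head
    softmaxLayer
    classificationLayer];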
Fig. 6. Model input images from [23].

Training hyperparameters were set to batch = 10 and epochs = 6. Since the network already “knows” a lot, there is no need for long training. The Stochastic Gradient Descent with Momentum (SGDM) algorithm, a common optimization algorithm, is used. Matlab toolboxes made it easy to generate the code for the new model. The Matlab code snippet used is provided here:
imageAugmenter = imageDataAugmenter('RandXReflection',true, ...
    'RandXTranslation',pixelRange, 'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain, ...
    'DataAugmentation',imageAugmenter);
augimdsValidation = augmentedImageDatastore(inputSize(1:2), imdsValidation);
options = trainingOptions('sgdm', 'MiniBatchSize',10, 'MaxEpochs',6, ...
    'InitialLearnRate',1e-4, 'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, 'ValidationFrequency',3, ...
    'Verbose',false, 'Plots','training-progress');
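The snippet assumes that pixelRange, inputSize, the datastores imdsTrain and imdsValidation, and the adapted layer array are defined elsewhere, and it stops short of the training call. A hedged completion along the standard Deep Learning Toolbox lines (the variable names are our assumptions, not taken from the paper) would be:

pixelRange = [-30 30];                 % assumed translation range in pixels
inputSize = net.Layers(1).InputSize;   % 227x227x3 for AlexNet
netTransfer = trainNetwork(augimdsTrain, layers, options);    % fine-tune the adapted network
predictedLabels = classify(netTransfer, augimdsValidation);   % predictions on the validation fold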
C. Results overview and analysis
The method of 4-fold cross-validation was applied to assess the effectiveness of the created model. K-fold cross-validation of an NN model is a commonly used validation method [26]. The available data set is randomly split into subsets called “folds”. Each subset is used once for validation while the rest of the data is used for training. Model metrics are the averages of the per-fold metrics. The validation accuracy for each fold is provided in Table II; a sketch of the fold loop is given after the table.

TABLE II. MODEL ACCURACY FOR EACH FOLD

Average accuracy    fold 1    fold 2    fold 3    fold 4
88,815%             89,47%    89,47%    84,21%    92,11%
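The following is a minimal sketch of such a fold loop in Matlab. cvpartition, training, test, and subset are standard toolbox functions; imds and the evaluation line are assumptions about the surrounding code rather than code from the paper:

k = 4;
cv = cvpartition(imds.Labels, 'KFold', k);   % stratified 4-fold split of the labeled images
accuracy = zeros(k, 1);
for fold = 1:k
    imdsTrain = subset(imds, find(training(cv, fold)));    % three folds for training
    imdsValidation = subset(imds, find(test(cv, fold)));   % one fold for validation
    % ... rebuild the augmented datastores, retrain, and classify as above ...
    accuracy(fold) = mean(predictedLabels == imdsValidation.Labels);
end
averageAccuracy = mean(accuracy);   % the averaged fold metric reported in Table II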
The total input dataset is perfectly balanced and contains the same number of images for each class (i.e., with and without Alzheimer's disease), which cannot be expected in the real world. The distribution of training and validation data, although random, maintains this balance. The impact of unbalanced datasets on model results has not been investigated. The confusion matrices for each fold are shown in Table III. The precision and recall metrics are presented in Table IV.

TABLE III. MODEL CONFUSION MATRIX FOR EACH FOLD
(rows: actual class; columns: predicted class)

fold 1     Alzh.  Healthy        fold 2     Alzh.  Healthy
Alzh.        18      1           Alzh.        17      2
Healthy       3     16           Healthy       2     17

fold 3     Alzh.  Healthy        fold 4     Alzh.  Healthy
Alzh.        17      4           Alzh.        19      3
Healthy       2     15           Healthy       0     16

TABLE IV. MODEL PRECISION AND RECALL FOR EACH FOLD

Fold         1        2        3        4        Average
Precision    0,8571   0,8947   0,8947   0,863    0,8773
Recall       0,9473   0,8947   0,809    1        0,91275
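As a check on how Table IV follows from Table III, this small Matlab computation (fold 1 values; the variable names are ours) reproduces the reported metrics:

C = [18 1; 3 16];                   % fold 1 from Table III (rows: actual, columns: predicted)
TP = C(1,1); FN = C(1,2); FP = C(2,1); TN = C(2,2);
precision = TP / (TP + FP)          % 18/21 = 0,8571, as in Table IV
recall    = TP / (TP + FN)          % 18/19 = 0,9473, as in Table IV
accuracy  = (TP + TN) / sum(C(:))   % 34/38 = 0,8947, as in Table II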
The achieved average precision is 0,8773, while the average recall is 0,91275. Detecting all Alzheimer’s patients is critically important, so to be applicable in real life such a model should aim at a higher recall score.

The obtained results are comparable to human operator results. An overview of several available studies on radiologist performance reports sensitivity (recall) from 0,88 to 0,95, with accuracy from 0,924 to 0,9401 [27]. Accuracy itself is not provided in the overview; we have calculated it according to (2):

Accuracy = sensitivity × prevalence + specificity × (1 − prevalence)    (2)

Prevalence is the proportion of Alzheimer's disease cases in the total data set. These studies were conducted on small data sets (from 45 to 138) and on different types of MRI images, but they present human metrics for comparison [27].
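As an illustration of (2) with hypothetical round values rather than figures from [27]: for a balanced data set the prevalence is 0,5, so (2) reduces to the mean of sensitivity and specificity, e.g.

\mathrm{Accuracy} = 0.95 \cdot 0.5 + 0.93 \cdot 0.5 = 0.94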
The aim of the research was to determine the concrete steps in the construction of an NN model by transfer learning. Changes to the existing model were minimal. The developed automated classification system was just a case study of a transfer learning approach for developing deep learning models on small datasets. The presented case study demonstrates a rapidly developed NN model, built on a small dataset, that is comparable to a human operator.

The steps of the process are:
1. Collect the data for which the model is developed.
2. Identify existing models that can serve as a basis for the learning transfer; Hugging Face [3] is currently the best way to do this step.
3. Identify minimal changes in the NN architecture to adapt the existing model.
4. Apply fine-tuning with the available data.
5. If the new model is not adequate, return to step 3 until there are no more recognizable architectural changes, then return to step 2 and select another existing model.

In our case, model accuracy could be further improved just by using the entire OASIS-1 image data set. But the aim of this study was only to identify the actual process of building an NN model with transfer learning, and to inspect how simply and quickly adequate model metrics can be achieved on a small data set, which has been achieved. The main limitation of this approach is the need for an available model that can be reused for a novel task. But this risk is constantly decreasing: Hugging Face currently contains more than 13 thousand models for image classification, more than 2 thousand models for object detection [3], and many more.
V. CONCLUSION

Transfer learning is becoming the dominant approach in the development of machine learning systems, especially in deep learning, where the number of public pre-trained models constantly grows, particularly for image processing and Natural Language Processing tasks. Reusing pre-trained models facilitates and speeds up development. Small data sets can be foundations for deep learning models built with transfer learning. The reduction in computation power achieved by reusing existing models can be significant. This is important in cases like edge computing, where computation power is a critical resource.

As the need for ML models continually grows, approaches like transfer learning that enhance model development grow more mature and feasible. The reusability of the knowledge captured in an existing model, and the extent of the necessary adjustments, differ from case to case and are domain, task, and model specific.

This paper presents a case study of transfer learning. Through the case study we have identified concrete steps of transfer learning for developing any NN. Any NN model could be built from scratch, but not this quickly, with modest computational power, and with a small training data set. The presented case of developing a binary image classifier for automatically detecting Alzheimer's dementia shows how the transfer learning approach to developing an NN works quickly and efficiently. Customization of existing models, i.e. transfer learning, is gaining more and more attention. Thus, the development of deep neural networks has become increasingly fast and simple due to the continually growing pool of pre-trained models.
ACKNOWLEDGMENT

Data were provided by OASIS-1: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382.
REFERENCES

[1] O. K. Harsh, “Data, Information and Knowledge & Reuse Management Techniques”, Proceedings of the World Congress on Engineering 2007, Vol. I, WCE 2007, London, U.K., July 2-4, 2007.
[2] M. Kubat, An Introduction to Machine Learning, Second Edition, Springer International Publishing, 2017.
[3] https://huggingface.co/models
[4] S. Bozinovski and A. Fulgosi, “The influence of pattern similarity and transfer learning upon training of a base perceptron B2” (original in Croatian), Proceedings of Symposium Informatica 3-121-5, 1976.
[5] S. Bozinovski, “Reminder of the first paper on transfer learning in neural networks”, Informatica 44, pp. 291-302, 2020.
[6] M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji and T. Gebru, “Model Cards for Model Reporting”, 2019 Conference on Fairness, Accountability, and Transparency (FAT* '19), New York, USA, pp. 220-229, 2019.
[7] A. Krizhevsky, I. Sutskever and G. E. Hinton, “ImageNet classification with deep convolutional neural networks”, Advances in Neural Information Processing Systems, vol. 2, pp. 1097-1105, 2012.
[8] K. Weiss, T. M. Khoshgoftaar and D. Wang, “A survey of transfer learning”, Journal of Big Data, 3:9, 2016.
[9] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang and C. Liu, “A Survey on Deep Transfer Learning”, Artificial Neural Networks and Machine Learning, ICANN 2018, Lecture Notes in Computer Science, vol. 11141, pp. 270-279, Springer, 2018.
[10] P. P. Ariza-Colpas, E. Vicario, A. I. Oviedo-Carrascal, S. Butt Aziz, M. A. Piñeres-Melo, A. Quintero-Linero and F. Patara, “Human Activity Recognition Data Analysis: History, Evolutions, and New Trends”, Sensors 22(9), 3401, 2022.
[11] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed., Pearson Series in Artificial Intelligence, Hoboken, NJ: Pearson, 2021, ISBN 978-0-13-461099-3.
[12] S. Liu and W. Deng, “Very deep convolutional neural network based image classification using small training sample size”, 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Kuala Lumpur, Malaysia, pp. 730-734, 2015.
[13] M. Krichen, “Convolutional Neural Networks: A Survey”, Computers 12(8), 151, 2023. https://doi.org/10.3390/computers12080151
[14] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning (Adaptive Computation and Machine Learning series), The MIT Press, 2016.
[15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, “Going deeper with convolutions”, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9, 2015.
[16] https://deeplearning.cms.waikato.ac.nz/user-guide/class-maps/IMAGENET/
[17] https://www.mathworks.com/matlabcentral/fileexchange/59133-deep-learning-toolbox-model-for-alexnet-network
[18] K. Weiss, T. M. Khoshgoftaar and D. Wang, “A survey of transfer learning”, Journal of Big Data, 3:9, 2016.
[19] S. J. Pan and Q. Yang, “A survey on transfer learning”, IEEE Transactions on Knowledge and Data Engineering 22, pp. 1345-1359, 2010.
[20] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang and C. Liu, “A Survey on Deep Transfer Learning”, ICANN 2018 Proceedings, Springer International Publishing, pp. 270-279, 2018.
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, “Attention Is All You Need”, 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 2017.
[22] A. S. Schachter and K. L. Davis, “Alzheimer’s disease”, Dialogues in Clinical Neuroscience, 2(2), pp. 91-100, 2000.
[23] https://sites.wustl.edu/oasisbrains/
[24] J. C. Morris, “The Clinical Dementia Rating (CDR): current version and scoring rules”, Neurology 43, pp. 2412b-2414b, 1993.
[25] Y. N. Fu’adah, I. Wijayanto, N. K. C. Pratiwi, F. F. Taliningsih, S. Rizal and M. A. Pramudito, “Automated Classification of Alzheimer’s Disease Based on MRI Image Processing using Convolutional Neural Network (CNN) with AlexNet Architecture”, J. Phys.: Conf. Ser. 1844, 012020, 2021.
[26] T. Burzykowski, M. Geubbelmans, A. J. Rousseau and D. Valkenborg, “Validation of machine learning algorithms”, American Journal of Orthodontics and Dentofacial Orthopedics, 164(2), pp. 295-297, 2023.
[27] D. E. Wollman and I. Prohovnik, “Sensitivity and specificity of neuroimaging for the diagnosis of Alzheimer's disease”, Dialogues in Clinical Neuroscience, 5(1), pp. 89-99, 2003.