Unit-V Transfer Learning Notes
Transfer learning is a machine learning technique in which a model developed for one task
is reused as the starting point for a model on a second task. It involves reusing
knowledge from one task or domain to accelerate learning in a new, related task.
Instead of starting from scratch, pre-trained models provide a head-start, reducing
the need for extensive data and training time.
Figure 1.2: Training from Scratch vs Transfer Learning. The top half of the figure
depicts a typical training setup using CNNs where we have a limited and skewed
dataset. The result is a model which isn’t as confident/performant as expected. The
bottom half of the figure depicts how transfer learning enables us to improve
performance on our limited and skewed dog breed dataset by simply re-using a
pre-trained, state-of-the-art CNN model (like VGG-16).
In deep learning, transfer learning is often used to solve problems with limited
data. This is because deep learning models typically require a large amount of data
to train, which can be difficult or expensive to obtain.
These three improvement categories have been depicted in figure 1.3 for reference.
The dashed line showcases the improvements in learning the target task achievable
using transfer learning techniques.
Figure 1.3 Categories of improvements in target task learning through transfer learning techniques
Here are some reasons why you might want to use transfer learning:
To save time and resources: Training a deep learning model from scratch can be
time-consuming and computationally expensive. Transfer learning can help you
save time and resources by starting with a model that has already been trained on a
large dataset.
To improve model performance: Transfer learning can help you improve the
performance of your model by transferring the knowledge that the pre-trained
model has learned about the features of the data. This can be especially helpful if
you have limited data for your target task.
To solve problems with limited data: Transfer learning can be used to solve
problems with limited data by transferring the knowledge that the pre-trained
model has learned about the features of the data. This can be done by using feature
extraction or fine-tuning.
Lack of training data: There isn’t enough labeled training data to train your
network from scratch.
Same input: When task 1 and task 2 have the same input.
4.1 Types of Transfer Learning in Deep Learning
Transfer learning can be categorized into broadly three different types of transfer
learning scenarios.
This categorization is based on the different relationships between the source and target
domains, and on the tasks to be completed.
Figure 1.4: Transfer Learning Scenarios. Inductive, unsupervised and transductive transfers are
the three main scenarios under which transfer learning is applied.
Inductive Transfer:
In this scenario, the source and target tasks are different with domains being
similar or related.
Source and target data are typically labeled.
As the name suggests, the target labels are required to induce the knowledge
from source domain for use in target.
Typically, the source domain has a larger labeled dataset while the target
domain has limited labeled samples.
Thus, inductive transfer helps improve performance in the target
domain by leveraging knowledge from the source domain to induce the
target objective function.
This is the most common form of transfer learning we typically use in real-
world settings.
Unsupervised Transfer:
This scenario is similar to inductive transfer in that the target and source
tasks are different; the key difference is the absence of labels in both the
source and target domains. In inductive transfer, by contrast, the source
and/or target data is typically labeled.
Per its name, unsupervised transfer learning is unsupervised, meaning there
is no manually labeled data.
By comparison, inductive transfer can be considered supervised learning.
One common application of unsupervised learning is fraud detection.
The focus of this category is to handle transfer learning in unsupervised
scenarios such as dimensionality reduction, clustering, density estimation,
etc.
Transductive Transfer:
This occurs when the source and target tasks are the same, but the datasets
(or domains) are different.
More specifically, the source data is typically labelled while the target data
is unlabeled.
1. Domain adaptation
Domain adaptation usually refers to scenarios where the
marginal probabilities of the source and target domains are different, i.e.
P(Xs) ≠ P(Xt). There is an inherent shift or drift in the data distribution of the
source and target domains that requires tweaks to transfer the learning. For
instance, a corpus of movie reviews labeled as positive or negative would be
different from a corpus of product-review sentiments. A classifier trained on
movie-review sentiment would see a different distribution if utilized to classify
product reviews. Thus, domain adaptation techniques are utilized in transfer
learning in these scenarios.
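As a toy illustration of this shift, the sketch below trains a sentiment classifier on a tiny, hypothetical movie-review corpus and then scores it on product reviews; the review strings, labels and scikit-learn pipeline are placeholders chosen only to make the P(Xs) ≠ P(Xt) mismatch tangible, not part of the original notes.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical placeholder corpora, not real datasets
movie_reviews = ["a gripping, beautifully shot film",
                 "dull plot and wooden acting",
                 "an instant classic with superb performances",
                 "tedious pacing ruins the story"]
movie_labels = [1, 0, 1, 0]                  # source domain, P(Xs)
product_reviews = ["battery drains far too quickly",
                   "sturdy build and great value for money"]
product_labels = [0, 1]                      # target domain, P(Xt)

# The vectorizer and classifier are fit on the source domain only
vectorizer = TfidfVectorizer().fit(movie_reviews)
clf = LogisticRegression().fit(vectorizer.transform(movie_reviews), movie_labels)

# Accuracy typically drops here because P(Xs) != P(Xt);
# domain adaptation techniques aim to close exactly this gap
print("Target-domain accuracy:",
      clf.score(vectorizer.transform(product_reviews), product_labels))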
2. Domain confusion
Rather than letting the model learn just any representation, domain confusion
techniques nudge the representations of the source and target domains to be as
similar as possible (for instance, by adding a domain-confusion objective to the
model), so that the learned features become domain-invariant and transfer more
easily.
3. Multitask learning
In the case of multitask learning, several tasks are learned simultaneously without
distinction between the source and targets. In this case, the learner receives
information about multiple tasks at once, as compared to transfer learning, where
the learner initially has no idea about the target task.
4. One-shot learning:
One-shot learning is a variant of transfer learning where we try to infer the required
output based on just one or a few training examples. This is especially helpful in
real-world scenarios where it is not possible to have labeled data for every possible
class (if it is a classification task) and in scenarios where new classes can be added
often.
Zero-shot and few-shot learning are both designed to enable ML models to perform
well with minimal or no training data.
Zero-shot learning revolves around predicting labels for unseen data classes,
while few-shot learning involves learning from only a small amount of data per class.
Using this practice, models can rapidly learn to make effective generalizations with
little data.
Zero-shot learning is another extreme variant of transfer learning, which relies on
no labeled examples to learn a task. Zero-data learning, or zero-shot learning,
methods make clever adjustments during the training stage itself to exploit
additional information to understand unseen data.
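To make the one-shot idea above concrete, here is a minimal sketch that embeds images with a frozen pre-trained backbone and assigns a query image the label of its nearest single support example. The MobileNetV2 backbone, the class names and the random placeholder arrays (standing in for real images) are all illustrative assumptions.

import numpy as np
import tensorflow as tf

# Frozen pre-trained backbone used purely as a feature extractor
base = tf.keras.applications.MobileNetV2(weights='imagenet', include_top=False,
                                         pooling='avg', input_shape=(224, 224, 3))
base.trainable = False

def embed(img_batch):
    # Map raw images to fixed-length feature vectors
    x = tf.keras.applications.mobilenet_v2.preprocess_input(img_batch)
    return base(x).numpy()

# Random arrays stand in for real images: one labeled example per new class
support_images = np.random.rand(3, 224, 224, 3).astype('float32') * 255
support_labels = ['husky', 'beagle', 'pug']        # hypothetical classes
query_image = np.random.rand(1, 224, 224, 3).astype('float32') * 255

support_emb = embed(support_images)
query_emb = embed(query_image)

# Cosine similarity between the query and each single support embedding
sims = (query_emb @ support_emb.T) / (
    np.linalg.norm(query_emb) * np.linalg.norm(support_emb, axis=1))
print('Predicted class:', support_labels[int(np.argmax(sims))])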
4.2.1 Feature Extraction
Deep learning architectures are made up of a series of layers, and each layer
captures different features: simpler features in the initial layers and more
complex ones in the deeper layers. For example, a typical CNN trained to identify
human faces would capture simpler features like straight edges and diagonals in
the initial layers, while the deeper layers capture shapes and textures.
All layers except the last one extract features; the last layer transforms
these features into the objective at hand, i.e. classification, regression, etc. Thus, it
is an interesting proposition to utilize deep learning architectures (sans the final
layer) as feature extractors. This is a typical transfer learning setup where we
utilize a deep learning architecture as a feature extractor.
The same is depicted in Figure 1.6 (Feature Extraction based Transfer Learning): we
can leverage deep learning architectures for feature extraction by simply removing
the final few layers from a pre-trained model. Layers in a deep learning model
learn increasingly complex features as we go deeper. For the target task, we freeze
these layers from the pretrained model and train only the newly attached layers
(the green layers in the architecture on the right).
Figure 1.6: Feature Extraction based Transfer Learning
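A minimal Keras sketch of this feature-extraction setup is shown below. It assumes a VGG16 backbone pre-trained on ImageNet and a hypothetical 5-class target task (for example, five dog breeds); the head layers, their sizes and the commented training call are illustrative choices, not a prescribed recipe.

import tensorflow as tf

# Load VGG16 without its final classification layers (include_top=False)
base_model = tf.keras.applications.VGG16(weights='imagenet',
                                         include_top=False,
                                         pooling='avg',
                                         input_shape=(224, 224, 3))
# Freeze the pre-trained layers so they act purely as a feature extractor
base_model.trainable = False

# Attach a new, trainable head for the (hypothetical) 5-class target task
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(5, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# model.fit(train_images, train_labels, epochs=5)   # with your own target data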
4.2.2 Fine Tuning
This method is useful in scenarios where the target task has enough
training/labeled samples to train a deep, complex network with a large number of
trainable weights. Fine-tuning a pre-trained network provides significant
performance improvements over training a network from scratch.
Pretrained models present certain advantages: not only do they provide a better
starting point, but they also assist in knowledge transfer by virtue of utilizing
proven models for target domain tasks. Building upon the success of fine-tuning
based transfer learning methods, using the whole pretrained model is the generalized
form. In this case, instead of retraining only some of the layers (while the rest
were fixed), we retrain the whole network for the target domain.
This method is useful in scenarios where we have enough training samples in the
target domain as well as the required amount of compute to handle retraining of
complete networks.
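Continuing the feature-extraction sketch above, the snippet below shows one possible way to fine-tune: unfreeze part of the backbone and re-compile with a much smaller learning rate so the pre-trained weights are only gently adjusted. The number of unfrozen layers and the learning rate are illustrative assumptions.

# Unfreeze the backbone (here, all but the earliest layers) for fine-tuning
base_model.trainable = True
for layer in base_model.layers[:-4]:
    layer.trainable = False          # keep early, generic layers frozen

# Re-compile with a small learning rate so pre-trained weights change slowly
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_images, train_labels, epochs=5)   # continue training on target data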
4.3 How Transfer Learning works
There are three main steps when fine-tuning a machine-learning model for a new
task.
First, select a pre-trained model with prior knowledge or skills for a related task. A
useful context for choosing a suitable model is to determine the source task of each
model. If you understand the original tasks the model performed, you can find one
that more effectively transitions to a new task.
Layers are the building blocks of neural networks. Each layer consists of a set of
neurons and performs specific transformations on the input data. Weights are the
parameters the network uses for decision-making. Initially set to random values,
weights are adjusted during the training process as the model learns from the data.
By freezing the weights of the pre-trained layers, you keep them fixed, preserving
the knowledge that the deep learning model obtained from the source task.
In some use cases, you can also remove the last layers of the pre-trained model. In
most ML architectures, the last layers are task-specific. Removing these final
layers helps you reconfigure the model for new task requirements.
Introducing new layers on top of your pre-trained model helps you adapt to the
specialized nature of the new task. The new layers adapt the model to the nuances
and functions of the new requirement.
You train the model on target-task data so that its outputs align with the new
task, since the pre-trained model likely produces outputs different from those
desired. After monitoring and evaluating the model’s performance during training,
you can adjust the hyperparameters or baseline neural network architecture to
improve output further. Unlike weights, hyperparameters are not learned from the
data. They are pre-set and play a crucial role in determining the efficiency and
effectiveness of the training process. For example, you could adjust regularization
parameters or the model’s learning rates to improve its ability in relation to the
target task.
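As a rough end-to-end sketch of these steps (partly recapping the earlier Keras examples), the snippet below selects a pre-trained ResNet50, freezes it, attaches new layers, and compiles with an adjusted learning rate and L2 regularization; every layer size and hyperparameter value here is an illustrative assumption, not a recommended setting.

import tensorflow as tf

# Step 1: select a pre-trained model from a related source task
base = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                      pooling='avg', input_shape=(224, 224, 3))
# Step 2: freeze the pre-trained weights (the task-specific top was already
# dropped by include_top=False)
base.trainable = False

# Step 3: add new layers for the target task; the L2 strength is one example
# of a hyperparameter that can be adjusted later
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(10, activation='softmax')
])

# The learning rate is pre-set, not learned from data; lowering it is a common
# adjustment when adapting a pre-trained model to a new task
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# model.fit(target_images, target_labels, validation_split=0.2, epochs=10)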
One way of accessing these models is directly from the module APIs in the
specific deep learning libraries you might be using. For example:
TensorFlow has a nice applications module under the hood thanks to the Keras API
which you can access from the tf.keras.applications module in your code.
PyTorch has various packages like torchvision, torchtext and torchaudio which do
offer some of the pre-trained models specific to computer vision, text and audio
data respectively.
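For instance, a pre-trained model can be pulled straight from these module APIs, roughly as follows; the specific architectures chosen here are just examples.

import tensorflow as tf

# Keras applications module: load VGG16 with its ImageNet weights in one call
vgg16 = tf.keras.applications.VGG16(weights='imagenet', include_top=True)
print(vgg16.name, 'has', len(vgg16.layers), 'layers')

# The torchvision equivalent (if PyTorch is installed) would look like:
# from torchvision import models
# resnet50 = models.resnet50(pretrained=True)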
Methodology:
The key objective here is to take a pre-trained model off-the-shelf and use it
directly to predict the class of an input image.
Implementation
Let’s now use these pre-trained models to solve our objective of predicting the Top-5 classes of input
images.
We start by loading up the specific dependencies for image processing, modeling and inference.
import tensorflow as tf
import tensorflow_hub as tf_hub   # needed below for tf_hub.KerasLayer
import numpy as np
from PIL import Image             # used below to load the input image
import matplotlib.pyplot as plt   # used below to visualize predictions
print('TF Version:', tf.__version__)   # prints: TF Version: 2.3.0
!wget https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt
!wget https://storage.googleapis.com/bit_models/imagenet21k_wordnet_lemmas.txt
# Load the downloaded label files and build index -> label mappings
# (one possible construction; the mapping dicts are used below)
with open('ImageNetLabels.txt') as f:
    data1k = f.readlines()
with open('imagenet21k_wordnet_lemmas.txt') as f:
    data21k = f.readlines()
imagenet1k_mapping = {i: label.strip() for i, label in enumerate(data1k)}
imagenet21k_mapping = {i: label.strip() for i, label in enumerate(data21k)}
print('Total 1k labels:', len(list(imagenet1k_mapping.items())))
print('Sample:', list(imagenet1k_mapping.items())[:5])
print('Total 21k labels:', len(list(imagenet21k_mapping.items())))
print('Sample:', list(imagenet21k_mapping.items())[:5])
Next, we load up the two pre-trained models we discussed earlier from TensorFlow Hub.
resnet_model_url = "https://tfhub.dev/tensorflow/resnet_50/classification/1"
resnet_50 = tf_hub.KerasLayer(resnet_model_url)
bit_model_url = "https://tfhub.dev/google/bit/m-r152x4/imagenet21k_classification/1"
bit_r152x4 = tf_hub.KerasLayer(bit_model_url)
def visualize_predictions(model, image, imagenet_mapping_dict,
                          model_type='resnet'):
    # Apply a softmax to the BiT model's logits; use the ResNet module's
    # output directly, as in the original setup
    if model_type == 'resnet':
        probs = model(image)
    else:
        logits = model(image)
        probs = tf.nn.softmax(logits)
    probs = np.squeeze(probs.numpy())
    top5_imagenet_idxs = np.argsort(probs)[:-6:-1]
    top5_probs = np.sort(probs)[:-6:-1]
    pred_labels = [imagenet_mapping_dict[i]
                   for i in top5_imagenet_idxs]
    # Plot the top-5 predicted labels against their probabilities
    plt.barh(pred_labels[::-1], top5_probs[::-1])
    plt.title(model_type)
# One possible preprocessing step (an assumption): scale pixel values to
# [0, 1] and add a batch dimension
def preprocess_image(image):
    image = np.array(image) / 255.0
    return np.expand_dims(image, axis=0).astype(np.float32)

img = Image.open('snow_leo.png').convert("RGB")
pre_img = preprocess_image(img)
# The hub ResNet-50 module expects 224x224 inputs
resnet_img = tf.image.resize(pre_img, (224, 224))

plt.figure(figsize=(12, 3))
plt.subplot(1, 3, 1)
visualize_predictions(model=bit_r152x4, image=pre_img,
                      imagenet_mapping_dict=imagenet21k_mapping,
                      model_type='bit-multiclass')
plt.subplot(1, 3, 2)
visualize_predictions(model=resnet_50, image=resnet_img,
                      imagenet_mapping_dict=imagenet1k_mapping,
                      model_type='resnet')
plt.subplot(1, 3, 3)
plt.imshow(pre_img[0])
plt.tight_layout()
It looks like both our models performed well and, as expected, the BiT model is
very specific and more accurate, given that it has been trained on over 21K classes
covering very specific animal species and breeds.
4.5 Transfer Learning Challenges
While transfer learning offers numerous advantages, it also comes with its own set
of challenges and considerations that practitioners must be aware of.
In the context of transfer learning, the improvements in target task learning can be
categorized into a better initial performance, a faster rate of improvement, and a
better final performance on the target task.
Scenarios where some or all of these improvements are observed are termed
positive transfer scenarios.
4.5.1 Negative transfer: Yet in real-world settings this is not always the case. There
are cases where transfer learning can lead to a drop in overall performance; this is
termed negative transfer.
Avoiding negative transfer is important, yet its possible causes are abstract and
difficult (if not impossible) to narrow down.
Dataset Bias & Mismatch: Transfer learning relies heavily on the assumption that
the source and target domains share some similarities. However, in real-world
scenarios, datasets can exhibit bias or mismatch in terms of distribution, quality, or
context.
Ethical & Privacy Concerns: Transfer learning can be problematic when dealing
with sensitive data, such as medical records or personal information. Care must be
taken to ensure that transferred knowledge doesn’t violate privacy or ethical
standards.
4.6 Transfer Learning Applications
Deep learning. Transfer learning is commonly used for deep learning neural
networks to help solve problems with limited data. Deep learning models typically
require large amounts of training data, which can be difficult and expensive to
acquire.
Models like VGG and ResNet, trained on ImageNet, have been fine-tuned for tasks
such as medical image analysis.
NLP. BERT and GPT models have been fine-tuned for sentiment analysis and text
summarization tasks.
Computer vision. Pretrained models are useful for training computer vision tasks
like image segmentation, facial recognition and object detection, if the source and
target tasks are related.
Speech recognition. Models previously trained on large speech data sets are useful
for creating more versatile models. For example, a pretrained model could be
adapted to recognize specific languages, accents or dialects.
Object detection: Pretrained models that were trained to identify specific objects
in images or videos can hasten the training of a new model. YOLO models, for
instance, have been adapted for applications like pedestrian detection in
autonomous vehicles. As another example, a pretrained model used to detect
mammals could be adapted with a dataset used to identify different types of animals.
A pre-trained model is a machine learning model that has been trained on a large
dataset. This dataset is typically much larger than the dataset that will be
used to train the final model. The pre-trained model learns to extract features from
the data, and these features can be used to train the final model more quickly and
efficiently.
There are many popular pre-trained architectures, but some of the most common
include:
ImageNet Models: Perfect for computer vision tasks, models like VGG, ResNet,
and Inception excel in image-related applications.
YOLO (You Only Look Once): Real-time object detection is YOLO's strength,
making it valuable for custom solutions.
These are just a few of the many popular pre-trained architectures. The best
architecture for a particular task will depend on the specific requirements of the
task.
FAQs:
In transfer learning, fine-tuning and feature extraction are two common techniques
to adapt a pre-trained model to a new task or domain: in feature extraction, the
pre-trained layers are frozen and only a newly added head is trained on the target
data, whereas in fine-tuning some or all of the pre-trained layers are unfrozen and
trained further, usually with a small learning rate.