Unit-V Transfer Learning Notes

Transfer learning is a machine learning method where a model developed for one task is reused as the starting point for a model on a second task. It involves reusing knowledge from one task or domain to accelerate learning in a new, related task. Instead of starting from scratch, pre-trained models provide a head start, reducing the need for extensive data and training time.

What is Transfer Learning

Transfer learning is an approach to machine learning where a model trained on one task is used as the starting point for a model on a new task. This is done by transferring the knowledge that the first model has learned about the features of the data to the second model.

Figure 1.1: Traditional Setup vs. Transfer Learning. The traditional approach trains or builds a separate model for each task, as shown in the left figure, where we have 3 isolated models for 3 tasks. The transfer learning approach leverages prior knowledge from source tasks to improve the performance of the target task, as shown in the right figure.

Figure 1.2: Training from Scratch vs. Transfer Learning. The top half of the figure depicts a typical training setup using CNNs where we have a limited and skewed dataset. The result is a model which isn't as confident or performant as expected. The bottom half of the figure depicts how transfer learning enables us to improve performance on our limited and skewed dog breed dataset by simply reusing a pre-trained state-of-the-art CNN model (like VGG-16).
In deep learning, transfer learning is often used to solve problems with limited
data. This is because deep learning models typically require a large amount of data
to train, which can be difficult or expensive to obtain.

Why Use Transfer Learning?

The overall aim of transfer learning is to improve learning of the target task by leveraging source-domain knowledge. This improvement in target task learning can be categorized broadly as:

• Initial performance achievable
• Learning time
• Final achievable performance

These three improvement categories are depicted in figure 1.3 for reference. The dashed line showcases the improvements in learning the target task achievable using transfer learning techniques.
Figure 1.3 Categories of improvements in target task learning through transfer learning techniques
Here are some reasons why you might want to use transfer learning:

To save time and resources: Training a deep learning model from scratch can be
time-consuming and computationally expensive. Transfer learning can help you
save time and resources by starting with a model that has already been trained on a
large dataset.

To improve model performance: Transfer learning can help you improve the
performance of your model by transferring the knowledge that the pre-trained
model has learned about the features of the data. This can be especially helpful if
you have limited data for your target task.

To solve problems with limited data: Transfer learning can be used to solve
problems with limited data by transferring the knowledge that the pre-trained
model has learned about the features of the data. This can be done by using feature
extraction or fine-tuning.

When to Use Transfer Learning

Lack of training data: There isn’t enough labeled training data to train your
network from scratch.

Existing network: There already exists a network that is pre-trained on a similar task, which is usually trained on massive amounts of data.

Same input: When task 1 and task 2 have the same input.
4.1 Types of Transfer Learning in Deep Learning
Transfer learning can be broadly categorized into three different types of transfer learning scenarios.

This categorization is based on the relationship between the source and target domains, and the tasks to be completed.

Figure 1.4 Transfer Learning Scenarios: Inductive, unsupervised and transductive transfers are the three main scenarios under which transfer learning is applied.

Inductive Transfer:

• In this scenario, the source and target tasks are different, with the domains being similar or related.
• Source and target data are typically labeled.
• As the name suggests, the target labels are required to induce the knowledge from the source domain for use in the target.
• Typically, the source domain has a larger labeled dataset while the target domain has limited labeled samples.
• Thus, inductive transfer helps in improving performance in the target domain by inducing the target objective function with knowledge from the source domain.
• This is the most common form of transfer learning we typically use in real-world settings.

Unsupervised Transfer:

• This scenario is similar to inductive transfer in that the target and source tasks are different; the key difference is the absence of labels in both the source and target domains, whereas in inductive transfer the source and/or target data is labeled.
• Per its name, unsupervised transfer learning is unsupervised, meaning there is no manually labeled data.
• By comparison, inductive transfer can be considered supervised learning.
• One common application of unsupervised learning is fraud detection.
• The focus of this category is to handle transfer learning in unsupervised scenarios such as dimensionality reduction, clustering, density estimation, etc.

Transductive Transfer:

• This occurs when the source and target tasks are the same, but the datasets (or domains) are different.
• More specifically, the source data is typically labeled while the target data is unlabeled.
• Domain adaptation is a form of transductive learning, as it applies knowledge gained from performing a task on one data distribution towards the same task on another data distribution.

1. Domain adaptation
Domain adaptation is usually referred to in scenarios where the marginal probability distributions of the source and target domains differ, i.e., P(Xs) ≠ P(Xt). There is an inherent shift or drift in the data distribution of the source and target domains that requires tweaks to transfer the learning. For instance, a corpus of movie reviews labeled as positive or negative would be different from a corpus of product-review sentiments. A classifier trained on movie-review sentiment would see a different distribution if utilized to classify product reviews. Thus, domain adaptation techniques are utilized in transfer learning in these scenarios.

2. Domain confusion

Different layers in a deep learning network capture different sets of features. We can utilize this fact to learn domain-invariant features and improve their transferability across domains. Instead of allowing the model to learn any representation, we nudge the representations of both domains to be as similar as possible.
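As a rough illustration (not part of the original notes), one simple way to nudge the two domains together is to add a confusion-style penalty on the encoder's features. The sketch below assumes batches of source and target features produced by a shared encoder and matches only their first-order statistics:

import tensorflow as tf

def domain_confusion_loss(source_features, target_features):
    # penalize the distance between mean activations of the two domains so the
    # shared encoder is pushed towards domain-invariant representations
    source_mean = tf.reduce_mean(source_features, axis=0)
    target_mean = tf.reduce_mean(target_features, axis=0)
    return tf.reduce_sum(tf.square(source_mean - target_mean))

# hypothetical combined objective:
# total_loss = task_loss + confusion_weight * domain_confusion_loss(f_src, f_tgt)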

3. Multitask learning

In the case of multitask learning, several tasks are learned simultaneously without
distinction between the source and targets. In this case, the learner receives
information about multiple tasks at once, as compared to transfer learning, where
the learner initially has no idea about the target task.

This is depicted in the following diagram:

Figure 1.5 Multitask learning: the learner receives information from all tasks simultaneously

4. One-shot learning:

One-shot learning is a variant of transfer learning where we try to infer the required output based on just one or a few training examples. This is especially helpful in real-world scenarios where it is not possible to have labeled data for every possible class (if it is a classification task) and in scenarios where new classes are added often.

5. Zero-shot and few-shot transfer learning.

Both methods are designed to enable ML models to perform well with minimal or
no training data.

Zero-shot revolves around the concept of predicting labels for unseen data classes,
and few-shot learning involves learning from only a small amount of data per class.
Using this practice, models can rapidly learn to make effective generalizations with
little data.
Zero-shot learning is another extreme variant of transfer learning, which relies on no labeled examples to learn a task. Zero-data learning, or zero-shot learning, methods make clever adjustments during the training stage itself to exploit additional information to understand unseen data.

4.2 Transfer Learning Methodologies


There are three major methodologies for performing transfer learning. It is important to understand that the choice of method depends upon the source and target domains, the availability of labels, and the task at hand, among a number of other parameters.

4.2.1 Feature Extraction

Deep learning architectures are made up of a series of layers, each of which captures different features: simpler features in the initial layers and complex ones in the deeper layers. For example, a typical CNN trained to identify human faces would capture simpler features like straight edges and diagonals in the initial layers while the deeper layers capture shapes and textures.

All layers except the last one extract certain features. The last layer transforms these features into the objective at hand, i.e., classification, regression, etc. Thus, it is an interesting proposition to utilize deep learning architectures (sans the final layer) as feature extractors. This is a typical transfer learning setup where we utilize deep learning architectures as feature extractors.

The same is depicted in Figure 1.6 (Feature Extraction based Transfer Learning): we can leverage deep learning architectures for feature extraction by simply removing the final few layers from a pre-trained model. Layers in a deep learning model learn increasingly complex features with depth. For the target task we freeze these layers from the pre-trained model and train only the newly attached layers (see the green layers in the architecture on the right).
Figure 1.6
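A minimal Keras sketch of this setup (illustrative only; the 10-class target task, 224x224 RGB inputs and head sizes are assumptions) freezes the pre-trained convolutional base and trains only the newly attached layers:

import tensorflow as tf

# pre-trained VGG-16 convolutional base without its original classifier head
base_model = tf.keras.applications.VGG16(weights='imagenet',
                                         include_top=False,
                                         input_shape=(224, 224, 3),
                                         pooling='avg')
base_model.trainable = False  # freeze all pre-trained layers

# new, trainable layers attached for the target task (the "green" layers)
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])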
4.2.2 Fine Tuning

Fine-tuning, in simple words, refers to the method of using a pre-trained model as our starting point. Similar to the feature-extraction method, we remove the final classification/regression layer and add a set of new layers depending upon our target task's objective. Unlike the previous scenario where we froze the layers of the pre-trained model, in this case we allow some of the re-used layers to be trained/updated along with the newly added ones. This method is illustrated in figure 1.7 for reference.
Figure 1.7 Fine-tuning based Transfer Learning involves freezing only a few layers (say CNN-1) of
the pre-trained model while fine-tuning some of them (CNN-2 and FC-1) along with the newly
added layers (Shallow Classifier and Output). Fine-tuning is slightly different from “feature-
extraction based transfer learning” where we froze all layers of the pre-trained model.

This method is useful in scenarios where the target task has enough training/labeled samples to train a deep, complex network with a large number of trainable weights. Fine-tuning a pre-trained network can provide significant performance improvements over training a network from scratch.
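Continuing the sketch from the previous section (again an illustration, not the notes' own code), fine-tuning unfreezes a few of the deeper pre-trained layers and re-trains them, together with the new head, using a small learning rate:

base_model.trainable = True
# keep the earlier layers frozen; fine-tune only the last few layers
for layer in base_model.layers[:-4]:
    layer.trainable = False

# a small learning rate so the pre-trained weights are only gently adjusted
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # hypothetical datasets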

4.2.3 Pre-trained Models

Pre-trained models present certain advantages: not only do they provide a better starting point, but they also assist in knowledge transfer by virtue of utilizing proven models for target-domain tasks. Building upon the success of fine-tuning based transfer learning methods, using the whole pre-trained model is the generalized form. In this case, instead of retraining only some of the layers (while the rest were fixed), we retrain the whole network for the target domain.

This method is useful in scenarios where we have enough training samples in the
target domain as well as the required amount of compute to handle retraining of
complete networks.
4.3 How Transfer Learning works

Steps to Implement Transfer Learning

There are three main steps when fine-tuning a machine-learning model for a new
task.

Select a pre-trained model

First, select a pre-trained model with prior knowledge or skills for a related task. A
useful context for choosing a suitable model is to determine the source task of each
model. If you understand the original tasks the model performed, you can find one
that more effectively transitions to a new task.

Configure your pre-trained models

After selecting your source model, configure it to pass its knowledge to a model that will complete the related task. There are two main methods of doing this.
Freeze pre-trained layers

Layers are the building blocks of neural networks. Each layer consists of a set of
neurons and performs specific transformations on the input data. Weights are the
parameters the network uses for decision-making. Initially set to random values,
weights are adjusted during the training process as the model learns from the data.

By freezing the weights of the pre-trained layers, you keep them fixed, preserving
the knowledge that the deep learning model obtained from the source task.
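In Keras, for example (a sketch assuming a pre-trained model named base_model), freezing simply marks the re-used layers as non-trainable:

for layer in base_model.layers:
    layer.trainable = False  # weights stay fixed while training on the target task

print('Trainable weight tensors after freezing:',
      len(base_model.trainable_weights))  # expected to be 0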

Remove the last layer

In some use cases, you can also remove the last layers of the pre-trained model. In
most ML architectures, the last layers are task-specific. Removing these final
layers helps you reconfigure the model for new task requirements.

Introduce new layers

Introducing new layers on top of your pre-trained model helps you adapt to the
specialized nature of the new task. The new layers adapt the model to the nuances
and functions of the new requirement.
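A rough sketch of these two steps together (removing the task-specific top and introducing new layers; the ResNet-50 choice and 5-class head are assumptions) using the Keras Functional API:

import tensorflow as tf

# reuse a pre-trained ResNet-50 up to its penultimate (global pooling) layer
full_model = tf.keras.applications.ResNet50(weights='imagenet')
headless = tf.keras.Model(inputs=full_model.input,
                          outputs=full_model.layers[-2].output)

# introduce new task-specific layers on top (hypothetical 5-class target task)
x = tf.keras.layers.Dense(128, activation='relu')(headless.output)
outputs = tf.keras.layers.Dense(5, activation='softmax')(x)
new_model = tf.keras.Model(inputs=headless.input, outputs=outputs)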

Train the model for the target domain

You train the model on target task data so that its outputs align with the new task. The pre-trained model likely produces different outputs from those desired. After monitoring and evaluating the model's performance during training, you can adjust the hyperparameters or the baseline neural network architecture to improve the output further. Unlike weights, hyperparameters are not learned from the data. They are pre-set and play a crucial role in determining the efficiency and effectiveness of the training process. For example, you could adjust regularization parameters or the model's learning rate to improve its ability on the target task.
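A hedged sketch of this final step, assuming the new_model from the previous sketch and hypothetical train_ds/val_ds datasets; the learning rate used here is one of the hyperparameters you might tune:

new_model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
history = new_model.fit(train_ds, validation_data=val_ds, epochs=10)
# inspect history.history to decide whether hyperparameters need further tuning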

4.4 Diving into Transfer Learning

4.4.1 Accessing Pre-trained Models


For the various sources of data like images, audio, video and text, there exists a diverse set of pre-trained models which have been trained on humongous amounts of data for a wide variety of tasks like classification, representation learning, detection and so on. The core idea in transfer learning, as you already know by now, is to leverage a pre-trained model and adapt it to solve your own problem at hand, instead of training a model from scratch.

One way of accessing these models is directly from the module APIs in the
specific deep learning libraries you might be using. For example:

TensorFlow has a nice applications module under the hood thanks to the Keras API
which you can access from the tf.keras.applications module in your code.
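A quick illustrative snippet (the specific model is just an example; the exact set of available models depends on your TensorFlow version):

import tensorflow as tf

# download a MobileNetV2 pre-trained on ImageNet via the Keras applications API
mobilenet = tf.keras.applications.MobileNetV2(weights='imagenet')
mobilenet.summary()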

PyTorch has various packages like torchvision, torchtext and torchaudio, which offer pre-trained models specific to computer vision, text and audio data respectively.

A better way to access models is often by referring to a central repository of pre-trained models, often referred to as a model hub or a model zoo. The most popular model hubs are TensorFlow Hub, PyTorch Hub and Hugging Face Transformers.

4.4.2 Image Classification with Transfer Learning


Example: showcasing transfer learning in the context of image classification
or categorization. The objective here will be to take a few sample images of
animals and see how some canned, pre-trained models fare in classifying these
images.

Methodology:
The key objective here is to take a pre-trained model off-the-shelf and use it
directly to predict the class of an input image.

The workflow takes an input image, loads a pre-trained model from TensorFlow Hub in Python and predicts the top-5 most probable classes for the input image. This workflow is depicted in figure 1.8.
Figure 1.8 Image Classification with Pre-trained CNNs. The figure depicts the top-5 class probabilities for the given input image using a pre-trained CNN model.

Pre-trained Model Architectures


We will be leveraging two state-of-the-art pre-trained convolutional neural network (CNN) models, namely:

ResNet-50: This is a residual deep convolutional neural network (CNN) with a total of 50 layers, built from the standard convolution and pooling layers a typical CNN has, along with batch normalization layers for regularization. The novelty of these models is their residual or skip connections. This model was trained on the standard ImageNet-1K dataset, which has a total of 1,000 distinct classes.

BiT MultiClass ResNet-152 4x: This is Google's state-of-the-art (SOTA) contribution to computer vision, called Big Transfer (BiT). The model was trained on the ImageNet-21K dataset, which has a total of 21,843 classes.

Convolutional neural networks:

Typically, a convolutional neural network, more popularly known as a CNN model, consists of a layered architecture of several layers, including convolution, pooling and dense layers besides the input and output layers. A typical architecture is depicted in figure 1.9.
Figure 1.9 Architecture of a typical Convolutional Neural Network. This usually includes a stacked hierarchy of convolution and pooling layers.

The key objective of this multi-stage hierarchical architecture is to learn spatial hierarchies of patterns which are also translation invariant. This is possible through two main layers in the CNN architecture: the convolution and pooling layers.
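A minimal Keras sketch of this stacked convolution-and-pooling pattern (illustrative only; the filter counts and 10-class output are assumptions):

import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # hypothetical 10 classes
])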

Using a layered architecture of stacked convolution layers helps in learning spatial features with a certain hierarchy, as depicted in Figure 1.10.
Figure 1.10 Hierarchical feature maps extracted from convolutional layers. Each layer extracts relevant features from the input image. Shallower layers extract more generic features and deeper layers extract specific features pertaining to the given input image.

Implementation
Let’s now use these pre-trained models to solve our objective of predicting the Top-5 classes of input
images.

We start by loading up the specific dependencies for image processing, modeling and inference.

import tensorflow as tf
import tensorflow_hub as tf_hub
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

print('TF Version:', tf.__version__)
print('TF Hub Version:', tf_hub.__version__)

TF Version: 2.3.0
TF Hub Version: 0.8.0

Loading ImageNet Class Labels

!wget https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt
!wget https://storage.googleapis.com/bit_models/imagenet21k_wordnet_lemmas.txt

data1k = []
with open('ImageNetLabels.txt', 'r') as f:
    data1k = f.readlines()

data21k = []
with open('imagenet21k_wordnet_lemmas.txt', 'r') as f:
    data21k = f.readlines()

imagenet1k_mapping = {i: value.strip('\n')
                      for i, value in enumerate(data1k)}
imagenet21k_mapping = {i: value.strip('\n')
                       for i, value in enumerate(data21k)}

print('ImageNet 1K (ResNet-50) Total Classes:',
      len(list(imagenet1k_mapping.items())))
print('Sample:', list(imagenet1k_mapping.items())[:5])

print('\nImageNet 21K (BiT ResNet-152 4x) Total Classes:',
      len(list(imagenet21k_mapping.items())))
print('Sample:', list(imagenet21k_mapping.items())[:5])

ImageNet 1K (ResNet-50) Total Classes: 1001
Sample: [(0, 'background'), (1, 'tench'), (2, 'goldfish'),
         (3, 'great white shark'), (4, 'tiger shark')]

ImageNet 21K (BiT ResNet-152 4x) Total Classes: 21843
Sample: [(0, 'organism, being'), (1, 'benthos'),
         (2, 'heterotroph'), (3, 'cell'),
         (4, 'person, individual, someone, somebody, mortal, soul')]

Load up the two pre-trained models we discussed earlier from TensorFlow Hub.

resnet_model_url = "https://tfhub.dev/tensorflow/resnet_50/classification/1"
resnet_50 = tf_hub.KerasLayer(resnet_model_url)

bit_model_url = "https://tfhub.dev/google/bit/m-r152x4/imagenet21k_classification/1"
bit_r152x4 = tf_hub.KerasLayer(bit_model_url)

Getting the class probabilities from the model predictions

def visualize_predictions(model, image, imagenet_mapping_dict,
                          model_type='resnet'):
    if model_type == 'resnet':
        probs = model(image)
        probs = tf.reshape(probs, [-1])
    else:
        logits = model(image)
        logits = tf.reshape(logits, [-1])
        probs = tf.nn.softmax(logits)
    # indices and probabilities of the top-5 predicted classes
    top5_imagenet_idxs = np.argsort(probs)[:-6:-1]
    top5_probs = np.sort(probs)[:-6:-1]
    pred_labels = [imagenet_mapping_dict[i]
                   for i in top5_imagenet_idxs]
    # minimal visualization: plot the top-5 labels against their
    # probabilities in the current matplotlib axes
    plt.barh(pred_labels[::-1], top5_probs[::-1])
    plt.title(model_type)

Visualizing Top-5 predictions of our pre-trained models on a sample image
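Note: the snippet below calls a preprocess_image helper that is not defined in these notes. A minimal stand-in (an assumption inferred from how pre_img is used later, not the notes' own implementation) could be:

def preprocess_image(pil_img):
    # assumed helper: scale pixels to [0, 1], cast to float32, add a batch axis
    img = np.array(pil_img, dtype=np.float32) / 255.0
    return np.expand_dims(img, axis=0)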

img = Image.open('snow_leo.png').convert("RGB")
pre_img = preprocess_image(img)

plt.figure(figsize=(12, 3))

plt.subplot(1, 3, 1)
visualize_predictions(model=bit_r152x4, image=pre_img,
                      imagenet_mapping_dict=imagenet21k_mapping,
                      model_type='bit-multiclass')

plt.subplot(1, 3, 2)
resnet_img = tf.image.resize(pre_img, (224, 224))
visualize_predictions(model=resnet_50, image=resnet_img,
                      imagenet_mapping_dict=imagenet1k_mapping,
                      model_type='resnet')

plt.subplot(1, 3, 3)
plt.imshow(pre_img[0])
plt.tight_layout()

Prediction results on a Snow Leopard Image

We have the Top-5 predictions from our two pre-trained models depicted in a nice visualization in figure 1.11.
Figure 1.11 Prediction results on a Snow Leopard Image

It looks like both our models performed well, and as expected the BiT-M model is very specific and more accurate given it has been trained on over 21K classes with very specific animal species and breeds.
4.5 Transfer Learning Challenges

While transfer learning offers numerous advantages, it also comes with its own set of challenges and considerations that practitioners must be aware of.
In the context of transfer learning, the improvements in target task learning can be categorized into:

• better initial performance,
• improved training times,
• enhanced final performance.

Scenarios in which some or all of these improvements are observed are termed positive transfer scenarios.

The following are major challenges associated with transfer learning.

4.5.1 Negative transfer: Yet in real-world settings, this is not always the case. There are cases where transfer learning can lead to a drop in overall performance. This is termed negative transfer.

Negative transfer can occur due to a number of reasons:

• It could be due to the source task not being sufficiently related to the target task/domain.
• It could also be due to an incorrect choice of transfer method, or the transfer method being unable to leverage the relationship between source and target.

Avoiding negative transfer is important, yet this abstract list of causes is difficult (if not impossible) to narrow down.

4.5.2 Transfer Bounds or Knowledge Gain: Transfer learning is a powerful concept which can provide impressive improvements in learning target tasks. Yet it is quite difficult to quantify the amount of knowledge transferred. It is important to quantify the transfer in transfer learning to understand the quality of the transfer and its viability. It is also important from the point of view of comparing transfers from different sources (apart from train/test evaluation metrics) to understand the generalizability/robustness of transfer-learnt models.

4.5.3 Other Challenges and Considerations

Dataset Bias & Mismatch: Transfer learning relies heavily on the assumption that
the source and target domains share some similarities. However, in real-world
scenarios, datasets can exhibit bias or mismatch in terms of distribution, quality, or
context.

Overfitting & Generalization: One of the key challenges in transfer learning is finding the right balance between overfitting and generalization. Transferring too much knowledge from the source domain may lead to overfitting on the target domain, while transferring too little may hinder generalization.

Catastrophic Forgetting: The model may forget important information from the source domain while adapting to the target domain; this is known as catastrophic forgetting.

Ethical & Privacy Concerns: Transfer learning can be problematic when dealing
with sensitive data, such as medical records or personal information. Care must be
taken to ensure that transferred knowledge doesn’t violate privacy or ethical
standards.

Computational Resources: Large pre-trained models, such as those in natural language processing, demand substantial computational resources for training and fine-tuning. Smaller organizations or researchers with limited access to high-performance computing may face challenges in implementing transfer learning effectively.

4.6 Key use cases for transfer learning (Applications)


Transfer learning is typically used for the following key use cases:

Deep learning. Transfer learning is commonly used for deep learning neural
networks to help solve problems with limited data. Deep learning models typically
require large amounts of training data, which can be difficult and expensive to
acquire.

Image recognition. Transfer learning can improve the performance of models trained on limited labeled data, which is useful in situations with limited data, such as medical imaging.

Models like VGG and ResNet, trained on ImageNet, have been fine-tuned for tasks
such as medical image analysis.

NLP. BERT and GPT models have been fine-tuned for sentiment analysis and text
summarization tasks.

Using transfer learning to train NLP models can improve performance by transferring knowledge across tasks related to machine translation, sentiment analysis and text classification.

Computer vision. Pretrained models are useful for training computer vision tasks
like image segmentation, facial recognition and object detection, if the source and
target tasks are related.

Speech recognition. Models previously trained on large speech data sets are useful
for creating more versatile models. For example, a pretrained model could be
adapted to recognize specific languages, accents or dialects.

Object detection. Pretrained models that were trained to identify specific objects in images or videos can hasten the training of a new model. YOLO models have been adapted for applications like pedestrian detection in autonomous vehicles. For example, a pretrained model used to detect mammals could be adapted, with a new dataset, to identify different types of animals.

4.7 Transfer Learning Models


In the world of transfer learning, pre-trained models play a central role. These
models come with pre-acquired knowledge from various domains and tasks,
serving as a starting point for efficient knowledge transfer.

What Is a Pre-Trained Model?

A pre-trained model is a machine learning model that has been trained on a large dataset, typically much larger than the dataset that will be used to train the final model. The pre-trained model learns to extract features from the data, and these features can be used to train the final model more quickly and efficiently.

Popular Pre-Trained Architectures

There are many popular pre-trained architectures, but some of the most common include:

ImageNet Models: Perfect for computer vision tasks, models like VGG, ResNet, and Inception excel in image-related applications.

VGG (Visual Geometry Group): VGG is a family of convolutional neural networks first introduced in 2014. They are known for their simplicity and efficiency, and they have been used for a variety of tasks, including image classification, object detection, and segmentation.

ResNet (Residual Network): ResNet is a family of convolutional neural networks introduced in 2015. They are known for their ability to learn deeper features, and they excel in image classification and object detection.

BERT (Bidirectional Encoder Representations from Transformers): BERT is a language model introduced in 2018. A game-changer in NLP, it is known for its ability to learn long-range dependencies in text, and it has been used for a variety of natural language processing tasks, including question answering, sentiment analysis, and text summarization.

GPT (Generative Pre-trained Transformer): GPT models are NLP powerhouses, known for natural language generation and understanding.

MobileNet: Designed for mobile and embedded vision, MobileNet efficiently handles object detection and image classification.

YOLO (You Only Look Once): Real-time object detection is YOLO's strength,
making it valuable for custom solutions.

Inception (GoogLeNet): Known for resource-efficient computations in computer vision.

Xception: A model with exceptional performance in image classification.

These are just a few of the many popular pre-trained architectures. The best
architecture for a particular task will depend on the specific requirements of the
task.
FAQs:

Fine-tuning vs. Feature Extraction

In transfer learning, fine-tuning and feature extraction are two common techniques
to adapt a pre-trained model to a new task or domain:

1. Fine-tuning: Fine-tuning involves taking a pre-trained model (often trained on a source task) and training it further on a target task. During fine-tuning, the model’s weights are updated using the target task’s data while retaining some knowledge from the source task. Fine-tuning is particularly useful when the source and target tasks are closely related.

2. Feature Extraction: Feature extraction refers to using a pre-trained model as a fixed feature extractor. Instead of modifying the model’s weights, the model is used to extract relevant features from the input data. These features can then be fed into a new classifier or model specific to the target task.

What are the different transfer learning strategies?

Transductive transfer learning

Inductive transfer learning

Unsupervised transfer learning
