CH 5
1. Pretraining
In the first step, a model is pretrained on a large and diverse dataset or source task that is typically different from the target task. This pretrained model is often referred to as the "base model," "pretrained model," or "feature extractor." The source task is usually a common task for which extensive data is available, such as image classification or natural language understanding.
2. Feature Extraction
During pretraining, the model learns to extract valuable features and representations from the input data. These features
capture high-level patterns, structures, and knowledge that are relevant not just to the source task but can also be useful for
other tasks.
3. Fine-Tuning
After pretraining, the base model is adapted or fine-tuned for the specific target task. The architecture and parameters of the
base model are adjusted to better suit the target task. This fine-tuning typically involves training the model on a smaller
dataset related to the target task.
4. Transfer of Knowledge
The knowledge and representations gained during pretraining are transferred
to the target task. This can include lower-level features like edges in images,
linguistic patterns in text, or more abstract knowledge about concepts in the
data. The fine-tuning process helps the model refine its representations and
adapt them to the nuances of the target task.
5. Training and Evaluation
The model is further trained on the target task dataset, and its performance is evaluated. Depending on the task, the model can be fine-tuned on the entire target dataset, or specific layers of the model can be frozen to retain some of the knowledge from the source task, as illustrated in the sketch after these steps.
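As a concrete illustration of steps 1-5, the following is a minimal sketch (assuming PyTorch and torchvision are available) that loads a ResNet pretrained on ImageNet, freezes its convolutional layers so they act as a fixed feature extractor, and replaces the classification head so it can be fine-tuned on a small target dataset. The number of classes and the dummy batch are placeholders, not part of the original text.

```python
import torch
import torch.nn as nn
from torchvision import models

# 1. Pretraining: load a ResNet-18 whose weights were learned on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Feature extraction: freeze the pretrained layers so the features they
#    compute are retained during fine-tuning.
for param in model.parameters():
    param.requires_grad = False

# 3. Fine-tuning: replace the final fully connected layer with a new head
#    sized for the target task (here, a hypothetical 5-class problem).
num_target_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# 5. Training and evaluation: only the new head's parameters are updated.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for the (small) target dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_target_classes, (8,))

logits = model(images)    # 4. Knowledge transfer: frozen features feed the new head.
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```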
Transfer learning offers several advantages:
• Reduced Data Requirements: It allows for effective learning on target tasks with limited data, as the model already possesses useful knowledge from the source task.
• Faster Training: Fine-tuning is usually faster than training a model from scratch because the model has already learned meaningful features during pretraining.
• Improved Performance: Transfer learning often leads to improved performance on the target task, as the model benefits from the generalization capabilities developed during pretraining.
Some popular models and architectures used for transfer learning include:
• Convolutional Neural Networks (CNNs)
• ResNet
• Residual Networks are deep CNN architectures known for their effectiveness
in image classification tasks. They have many layers and are often pretrained
on large image datasets.
• VGGNet
• VGG networks have a simple and uniform architecture, making them suitable
for fine-tuning and transfer learning on various computer vision tasks.
• Inception (GoogLeNet)
• Inception networks are known for their efficient use of computational
resources. They have been pretrained on large-scale image datasets.
• Recurrent Neural Networks (RNNs)
• LSTM and GRU: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are common choices for transfer learning in natural language processing tasks. They have been pretrained on large text corpora.
• Transformer Models
• BERT: Bidirectional Encoder Representations from Transformers (BERT) is a pretrained transformer-based model for natural language understanding tasks. It has been fine-tuned for various NLP tasks like sentiment analysis, question answering, and more.
• GPT (Generative Pretrained Transformer): GPT models, such as GPT-2 and GPT-3, are pretrained transformers used for tasks like text generation, language modeling, and text completion.
• Siamese Networks
• Siamese networks are used for tasks like face recognition and similarity learning. They learn embeddings
that can be used for transfer learning in various one-shot or few-shot learning scenarios.
• Feature Extractors
• Pretrained models like VGGFace, MobileNet, or InceptionV3 are often used as feature
extractors for various computer vision tasks. These models can extract features that
are useful for tasks like face recognition, object detection, or image segmentation.
• Domain-Specific Models
• In some cases, domain-specific models pretrained on relevant datasets are used for
transfer learning. For example, models pretrained on medical images for medical
image analysis tasks or pretrained models for specific languages in the context of NLP.
• Custom Architectures
• Depending on the task and data, custom architectures can also be used for transfer
learning. Researchers and practitioners often adapt existing architectures or design
new ones tailored to the specific problem.
• The choice of which model to use for transfer learning depends on
factors like the availability of pretrained models, the nature of the
target task, and the size of the target dataset. Transfer learning allows
practitioners to leverage the power of large-scale pretraining to boost
the performance of models on various downstream tasks with limited
data.
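For example, a pretrained model such as BERT can serve as a frozen feature extractor or be fine-tuned end to end. The sketch below is one illustrative way to do this, assuming the Hugging Face transformers library and the "bert-base-uncased" checkpoint (neither of which is named in the original text); it extracts fixed-size sentence embeddings that a small downstream classifier could consume.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT checkpoint and its matching tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()  # use BERT as a frozen feature extractor

sentences = ["Transfer learning reduces data requirements.",
             "RNNs process sequences one step at a time."]

# Tokenize and run a forward pass without tracking gradients.
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into one fixed-size vector per sentence;
# these vectors can then feed a small task-specific classifier.
embeddings = outputs.last_hidden_state.mean(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```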
RNN
• RNN stands for Recurrent Neural Network, which is a type of artificial
neural network designed for processing sequences of data. Unlike
traditional feedforward neural networks, which process data in a
single pass from input to output, RNNs have a feedback loop that
allows them to maintain a hidden state or memory of previous inputs.
This makes them particularly well-suited for tasks that involve
sequences, such as time series data, natural language processing, and
speech recognition.
• The key idea behind RNNs is that they can capture temporal
dependencies in data. In a typical RNN architecture, at each time step,
the network takes an input and combines it with the hidden state
from the previous time step to produce an output and update the
hidden state. This recurrent connection allows RNNs to capture
information from previous time steps and use it to influence their
current predictions.
Here's a step-by-step explanation of how RNNs work:
1. Initialization: At the start of processing a sequence, the hidden state is typically initialized to a vector of zeros. This hidden state serves as the network's "memory."
2. Processing the Sequence: The RNN processes the input sequence one element at a time, sequentially, from the first element to the last. For each element in the sequence (also known as a time step), the following steps are performed:
a. Input Encoding: The input at the current time step is encoded into a fixed-size vector representation. This encoding can be as simple as a one-hot encoding for categorical data or an embedding for text data.
b. Combining Input and Previous Hidden State: The encoded input is combined with the previous hidden state. This combination is often done using a set of weights and a non-linear activation function, similar to a feedforward neural network layer. The result is the current hidden state, which captures information from both the current input and the past hidden state.
c. Output Calculation: The current hidden state can be used to make predictions or produce an output. Depending on the task, this output can take different forms. For example, in a language modeling task, the output might be a probability distribution over the next word in a sentence.
d. Updating the Hidden State: The current hidden state becomes the hidden state for the next time step, carrying information forward in the sequence.
3. Repeating the Process: Steps 2a to 2d are repeated for each element in the input sequence, allowing the RNN to capture dependencies and patterns within the sequence.
4. Final Output: After processing the entire sequence, the RNN may produce a final output, depending on the specific task. For example, in a sentiment analysis task, the final output might be the predicted sentiment of a sentence.
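This loop is exactly what recurrent layers in deep learning libraries implement internally. As an illustrative sketch (assuming PyTorch; the layer sizes and the random batch are placeholders), the snippet below runs a small RNN over a sequence and reads out both the per-step outputs and the final hidden state.

```python
import torch
import torch.nn as nn

# A small recurrent layer: 10-dimensional inputs, 16-dimensional hidden state.
rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)

# One batch of 2 sequences, each with 5 time steps of 10 features.
x = torch.randn(2, 5, 10)

# h0 is initialized to zeros (step 1); the layer then repeats steps 2a-2d
# for every time step (steps 2-3).
h0 = torch.zeros(1, 2, 16)
outputs, h_final = rnn(x, h0)

print(outputs.shape)  # (2, 5, 16): one output per time step
print(h_final.shape)  # (1, 2, 16): hidden state after the last step (step 4)
```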
Unlike deep feedforward networks, where each dense layer has its own weight matrices, an RNN shares the same weights across all time steps. It computes a hidden state h_i for every input x_i using the following formulas:

h_t = σ(U x_t + W h_{t-1} + b)
y_t = O(V h_t + c)

Hence y_t = f(x_t, h_{t-1}; W, U, V, b, c).

Here S is the state matrix whose element s_i is the state of the network at time step i. The parameters of the network are W, U, V, b, and c, which are shared across time steps.
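To make the weight sharing explicit, here is a minimal from-scratch sketch of these formulas (assuming NumPy; the dimensions, tanh as the activation σ, and the identity as the output activation O are illustrative choices, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3

# The same U, W, V, b, c are reused at every time step (weight sharing).
U = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
V = rng.standard_normal((output_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)
c = np.zeros(output_dim)

def rnn_forward(xs):
    """Apply h_t = tanh(U x_t + W h_{t-1} + b) and y_t = V h_t + c per step."""
    h = np.zeros(hidden_dim)          # h_0: zero-initialized hidden state
    states, outputs = [], []
    for x_t in xs:                    # process the sequence one time step at a time
        h = np.tanh(U @ x_t + W @ h + b)
        y_t = V @ h + c               # output activation O taken as identity here
        states.append(h)
        outputs.append(y_t)
    return np.array(states), np.array(outputs)

sequence = rng.standard_normal((5, input_dim))   # 5 time steps of 4 features
S, Y = rnn_forward(sequence)
print(S.shape, Y.shape)   # (5, 8) states, (5, 3) outputs
```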
Applications of NLP
• Text Clustering and Topic Modeling
• Language Modeling
• Text Classification
• Text-to-Speech (TTS) Synthesis
• Language Generation
• Spam Detection
• Language Understanding and Parsing
• Document Classification
• Healthcare Applications
• Financial Analysis
• Social Media Analysis