CH 5
1. Pretraining
In the first step, a model is pretrained on a large and diverse dataset or source task that is typically different from the target task. This pretrained model is often referred to as the "base model," "pretrained model," or "feature extractor." The source task is usually a common task for which extensive data is available, such as image classification or natural language understanding.
2. Feature Extraction
During pretraining, the model learns to extract valuable features and representations from the input data. These features
capture high-level patterns, structures, and knowledge that are relevant not just to the source task but can also be useful for
other tasks.
3. Fine-Tuning
After pretraining, the base model is adapted or fine-tuned for the specific target task. The architecture and parameters of the
base model are adjusted to better suit the target task. This fine-tuning typically involves training the model on a smaller
dataset related to the target task.
4. Transfer of Knowledge
The knowledge and representations gained during pretraining are transferred
to the target task. This can include lower-level features like edges in images,
linguistic patterns in text, or more abstract knowledge about concepts in the
data. The fine-tuning process helps the model refine its representations and
adapt them to the nuances of the target task.
5. Training and Evaluation
The model is further trained on the target task dataset, and its performance is evaluated. Depending on the task, the model can be fine-tuned on the entire target dataset, or specific layers of the model can be frozen to retain some of the knowledge from the source task, as illustrated in the sketch after these steps.
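As a concrete illustration of steps 1-5, the following is a minimal sketch (assuming PyTorch and torchvision are available) that loads a ResNet pretrained on ImageNet, freezes its convolutional layers so they act as a fixed feature extractor, and replaces the classification head so it can be fine-tuned on a small target dataset. The number of classes and the dummy batch are placeholders, not part of the original text.

```python
import torch
import torch.nn as nn
from torchvision import models

# 1. Pretraining: load a ResNet-18 whose weights were learned on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# 2. Feature extraction: freeze the pretrained layers so the features they
#    compute are retained during fine-tuning.
for param in model.parameters():
    param.requires_grad = False

# 3. Fine-tuning: replace the final fully connected layer with a new head
#    sized for the target task (here, a hypothetical 5-class problem).
num_target_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

# 5. Training and evaluation: only the new head's parameters are updated.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for the (small) target dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_target_classes, (8,))

logits = model(images)    # 4. Knowledge transfer: frozen features feed the new head.
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```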
Transfer learning offers several advantages:
• Reduced Data Requirements: It allows for effective learning on target tasks with limited data, as the model already possesses useful knowledge from the source task.
• Faster Training: Fine-tuning is usually faster than training a model from scratch because the model has already learned meaningful features during pretraining.
• Improved Performance: Transfer learning often leads to improved performance on the target task, as the model benefits from the generalization capabilities developed during pretraining.
Some popular models and architectures used for transfer learning include:
• Convolutional Neural Networks (CNNs)
• ResNet
• Residual Networks are deep CNN architectures known for their effectiveness
in image classification tasks. They have many layers and are often pretrained
on large image datasets.
• VGGNet
• VGG networks have a simple and uniform architecture, making them suitable
for fine-tuning and transfer learning on various computer vision tasks.
• Inception (GoogLeNet)
• Inception networks are known for their efficient use of computational
resources. They have been pretrained on large-scale image datasets.
• Recurrent Neural Networks (RNNs)
• LSTM and GRU: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are common choices for transfer learning in natural language processing tasks. They have been pretrained on large text corpora.
• Transformer Models
• BERT: Bidirectional Encoder Representations from Transformers (BERT) is a pretrained transformer-based model for natural language understanding tasks. It has been fine-tuned for various NLP tasks like sentiment analysis, question answering, and more.
• GPT (Generative Pretrained Transformer): GPT models, such as GPT-2 and GPT-3, are pretrained transformers used for tasks like text generation, language modeling, and text completion.
• Siamese Networks
• Siamese networks are used for tasks like face recognition and similarity learning. They learn embeddings
that can be used for transfer learning in various one-shot or few-shot learning scenarios.
• Feature Extractors
• Pretrained models like VGGFace, MobileNet, or InceptionV3 are often used as feature
extractors for various computer vision tasks. These models can extract features that
are useful for tasks like face recognition, object detection, or image segmentation.
• Domain-Specific Models
• In some cases, domain-specific models pretrained on relevant datasets are used for
transfer learning. For example, models pretrained on medical images for medical
image analysis tasks or pretrained models for specific languages in the context of NLP.
• Custom Architectures
• Depending on the task and data, custom architectures can also be used for transfer
learning. Researchers and practitioners often adapt existing architectures or design
new ones tailored to the specific problem.
• The choice of which model to use for transfer learning depends on
factors like the availability of pretrained models, the nature of the
target task, and the size of the target dataset. Transfer learning allows
practitioners to leverage the power of large-scale pretraining to boost
the performance of models on various downstream tasks with limited
data.
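For example, a pretrained model such as BERT can serve as a frozen feature extractor or be fine-tuned end to end. The sketch below is one illustrative way to do this, assuming the Hugging Face transformers library and the "bert-base-uncased" checkpoint (neither of which is named in the original text); it extracts fixed-size sentence embeddings that a small downstream classifier could consume.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT checkpoint and its matching tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()  # use BERT as a frozen feature extractor

sentences = ["Transfer learning reduces data requirements.",
             "RNNs process sequences one step at a time."]

# Tokenize and run a forward pass without tracking gradients.
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool the token embeddings into one fixed-size vector per sentence;
# these vectors can then feed a small task-specific classifier.
embeddings = outputs.last_hidden_state.mean(dim=1)
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```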
RNN
• RNN stands for Recurrent Neural Network, which is a type of artificial
neural network designed for processing sequences of data. Unlike
traditional feedforward neural networks, which process data in a
single pass from input to output, RNNs have a feedback loop that
allows them to maintain a hidden state or memory of previous inputs.
This makes them particularly well-suited for tasks that involve
sequences, such as time series data, natural language processing, and
speech recognition.
• The key idea behind RNNs is that they can capture temporal
dependencies in data. In a typical RNN architecture, at each time step,
the network takes an input and combines it with the hidden state
from the previous time step to produce an output and update the
hidden state. This recurrent connection allows RNNs to capture
information from previous time steps and use it to influence their
current predictions.
Here's a step-by-step explanation of how RNNs work:
1. Initialization: At the start of processing a sequence, the hidden state is typically initialized to a vector of zeros. This hidden state serves as the network's "memory."
2. Processing the Sequence: The RNN processes the input sequence one element at a time, sequentially, from the first element to the last. For each element in the sequence (also known as a time step), the following steps are performed:
a. Input Encoding: The input at the current time step is encoded into a fixed-size vector representation. This encoding can be as simple as a one-hot encoding for categorical data or an embedding for text data.
b. Combining Input and Previous Hidden State: The encoded input is combined with the previous hidden state. This combination is often done using a set of weights and a non-linear activation function, similar to a feedforward neural network layer. The result is the current hidden state, which captures information from both the current input and the past hidden state.
c. Output Calculation: The current hidden state can be used to make predictions or produce an output. Depending on the task, this output can take different forms. For example, in a language modeling task, the output might be a probability distribution over the next word in a sentence.
d. Updating the Hidden State: The current hidden state becomes the hidden state for the next time step, carrying information forward in the sequence.
3. Repeating the Process: Steps 2a to 2d are repeated for each element in the input sequence, allowing the RNN to capture dependencies and patterns within the sequence.
4. Final Output: After processing the entire sequence, the RNN may produce a final output, depending on the specific task. For example, in a sentiment analysis task, the final output might be the predicted sentiment of a sentence.
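This loop is exactly what recurrent layers in deep learning libraries implement internally. As an illustrative sketch (assuming PyTorch; the layer sizes and the random batch are placeholders), the snippet below runs a small RNN over a sequence and reads out both the per-step outputs and the final hidden state.

```python
import torch
import torch.nn as nn

# A small recurrent layer: 10-dimensional inputs, 16-dimensional hidden state.
rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)

# One batch of 2 sequences, each with 5 time steps of 10 features.
x = torch.randn(2, 5, 10)

# h0 is initialized to zeros (step 1); the layer then repeats steps 2a-2d
# for every time step (steps 2-3).
h0 = torch.zeros(1, 2, 16)
outputs, h_final = rnn(x, h0)

print(outputs.shape)  # (2, 5, 16): one output per time step
print(h_final.shape)  # (1, 2, 16): hidden state after the last step (step 4)
```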
Unlike deep feedforward networks, where each dense layer has its own weight matrices, an RNN shares the same weights across all time steps. It computes a hidden state h_i for every input x_i using the following formulas:

h_t = σ(U x_t + W h_{t-1} + b)
y_t = O(V h_t + c)

Hence y_t = f(x_t, h_{t-1}; W, U, V, b, c).

Here S is the state matrix whose element s_i is the state of the network at time step i. The parameters of the network are W, U, V, b, and c, which are shared across time steps.
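To make the weight sharing explicit, here is a minimal from-scratch sketch of these formulas (assuming NumPy; the dimensions, tanh as the activation σ, and the identity as the output activation O are illustrative choices, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 3

# The same U, W, V, b, c are reused at every time step (weight sharing).
U = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
V = rng.standard_normal((output_dim, hidden_dim)) * 0.1
b = np.zeros(hidden_dim)
c = np.zeros(output_dim)

def rnn_forward(xs):
    """Apply h_t = tanh(U x_t + W h_{t-1} + b) and y_t = V h_t + c per step."""
    h = np.zeros(hidden_dim)          # h_0: zero-initialized hidden state
    states, outputs = [], []
    for x_t in xs:                    # process the sequence one time step at a time
        h = np.tanh(U @ x_t + W @ h + b)
        y_t = V @ h + c               # output activation O taken as identity here
        states.append(h)
        outputs.append(y_t)
    return np.array(states), np.array(outputs)

sequence = rng.standard_normal((5, input_dim))   # 5 time steps of 4 features
S, Y = rnn_forward(sequence)
print(S.shape, Y.shape)   # (5, 8) states, (5, 3) outputs
```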
Applications of NLP
• Text Clustering and Topic Modeling
• Language Modeling
• Text Classification
• Text-to-Speech (TTS) Synthesis
• Language Generation
• Spam Detection
• Language Understanding and Parsing
• Document Classification
• Healthcare Applications
• Financial Analysis
• Social Media Analysis