Module 4 | S8 CSE NOTES - KTU DEEP LEARNING NOTES | CST414

Deep Learning Module 4

Recurrent Neural Networks (RNNs)


Recurrent Neural Networks (RNNs) are a class of artificial neural networks
designed to recognize patterns in sequences of data, such as time series,
natural language, and other sequential data. They are particularly well-
suited for tasks where context and sequential order are important. Here's an
in-depth explanation of the architecture and workings of RNNs:

1. Introduction and Basic Structure


RNNs are designed to handle sequential data by maintaining a hidden state
that captures information about previous elements in the sequence. Unlike
feedforward neural networks, which process input data independently,
RNNs use their internal state (memory) to process sequences of inputs.

2. Basic Architecture



The basic structure of an RNN involves repeating units that take the current
input and the previous hidden state as inputs to produce the current hidden
state and output. Mathematically, the operations of an RNN can be
described as follows:

Hidden State Update:

h_t = f(W_h h_{t-1} + W_x x_t + b_h)

where:

h_t is the hidden state at time step t.

h_{t-1} is the hidden state at the previous time step.

x_t is the input at time step t.

W_h and W_x are weight matrices for the hidden state and input, respectively.

b_h is the bias term.

f is a non-linear activation function, typically tanh or ReLU.

Output:

y_t = f(W_y h_t + b_y)

where:

y_t is the output at time step t.

W_y is the weight matrix for the output.

b_y is the bias term for the output.
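The two update equations above map directly onto a few lines of code. Below is a minimal NumPy sketch of a single RNN step; the function name rnn_step, the dimensions, and the choice of tanh for f are illustrative assumptions, not part of the notes.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_h, W_x, W_y, b_h, b_y):
    """One RNN time step: update the hidden state, then compute the output."""
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b_h)  # hidden state update
    y_t = W_y @ h_t + b_y                          # output (a task-specific activation, e.g. softmax, may follow)
    return h_t, y_t

# Example dimensions (illustrative only)
input_dim, hidden_dim, output_dim = 4, 8, 3
rng = np.random.default_rng(0)
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_x = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_y = rng.standard_normal((output_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

h = np.zeros(hidden_dim)            # initial hidden state h_0
x = rng.standard_normal(input_dim)  # one input x_t
h, y = rnn_step(x, h, W_h, W_x, W_y, b_h, b_y)
```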

3. Unfolding in Time
RNNs can be visualized as being "unfolded" in time, where each time step
of the input sequence corresponds to a layer in the network. This unfolding
shows the recursive nature of RNNs, where each unit shares the same
parameters but processes different elements of the sequence.

4. Training RNNs
Training RNNs involves adjusting the weights and biases to minimize a loss
function, typically using a variant of backpropagation called
Backpropagation Through Time (BPTT). BPTT involves the following steps:

Forward Pass: Compute the hidden states and outputs for each time
step.

Loss Calculation: Compute the loss based on the predicted outputs and
actual targets.

Backward Pass: Calculate the gradients of the loss with respect to the
parameters by propagating the error backwards through time.
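As a rough illustration of these three steps, the PyTorch sketch below relies on autograd to perform BPTT automatically when loss.backward() is called. The dimensions, the MSE loss, and the SGD optimizer are arbitrary choices for the example, not prescribed by the notes.

```python
import torch
import torch.nn as nn

# Illustrative sizes only
seq_len, batch, input_dim, hidden_dim, output_dim = 10, 2, 4, 8, 3

rnn = nn.RNN(input_dim, hidden_dim)        # unrolled internally over the sequence
readout = nn.Linear(hidden_dim, output_dim)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)

x = torch.randn(seq_len, batch, input_dim)         # input sequence
targets = torch.randn(seq_len, batch, output_dim)  # per-step targets

# Forward pass: hidden states and outputs for every time step
hidden_states, _ = rnn(x)          # (seq_len, batch, hidden_dim)
outputs = readout(hidden_states)   # (seq_len, batch, output_dim)

# Loss calculation over the whole sequence
loss = criterion(outputs, targets)

# Backward pass: autograd propagates the error backwards through time (BPTT)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```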

5. Challenges with Standard RNNs


Standard RNNs face several challenges, including:

Vanishing and Exploding Gradients: During backpropagation, gradients can become very small (vanishing) or very large (exploding), making training difficult. This is particularly problematic for long sequences where dependencies span many time steps.

Limited Long-Term Memory: Standard RNNs struggle to capture long-term dependencies due to the vanishing gradient problem.

6. Variants of RNNs
To address the limitations of standard RNNs, several variants have been
developed, including:

Long Short-Term Memory (LSTM): LSTMs introduce memory cells and gating mechanisms (input, output, and forget gates) to better capture long-term dependencies and mitigate the vanishing gradient problem.

Gated Recurrent Unit (GRU): GRUs simplify the LSTM architecture by combining the forget and input gates into a single update gate, while still effectively capturing long-term dependencies.

7. Summary of RNN Operations


To summarize the operations of a standard RNN:

1. Initialization: Initialize the hidden state h_0.

2. Iteration Over Sequence:

For each time step t, compute the hidden state h_t using the current input x_t and the previous hidden state h_{t-1}.

Compute the output y_t using the current hidden state h_t.

3. Loss Calculation and Backpropagation: Compute the loss over the entire sequence and use BPTT to update the weights.

8. Mathematical Representation
The mathematical representation of an RNN highlights its recursive nature:

Hidden State: h_t = f(W_h h_{t-1} + W_x x_t + b_h)

Output: y_t = f(W_y h_t + b_y)

The above equations encapsulate the essence of RNNs, where the hidden
state at each time step is a function of the current input and the previous
hidden state, allowing the network to maintain a memory of previous inputs.

Recurrent Neural Networks (RNNs) can be configured in various architectures depending on the nature of the input and output sequences. Here are the main types:

1. One-to-One (Vanilla Neural Network)


Architecture:

Single input and single output.

No recurrent connections.

Use Case: Standard feedforward neural network tasks like image classification.



2. One-to-Many
Architecture:

Single input and a sequence of outputs.

The hidden state evolves over time to generate multiple outputs from
a single input.

Use Case: Image captioning, where a single image is the input and a
sequence of words (caption) is the output.

3. Many-to-One
Architecture:

A sequence of inputs and a single output.



The hidden state accumulates information from the input sequence
to produce one output.

Use Case: Sentiment analysis, where a sequence of words (sentence) is the input and the sentiment (positive/negative) is the output.

4. Many-to-Many (Sequence-to-Sequence)
Architecture:

A sequence of inputs and a sequence of outputs.

The hidden state is passed through each time step, generating an output at each step.

Use Case: Machine translation, where a sequence of words in one language is the input and a sequence of words in another language is the output.
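In code, the difference between these configurations often comes down to which hidden states are passed to the readout layer. The PyTorch sketch below is illustrative only: the layer sizes and the classifier head are assumptions, and a one-to-many model (such as an image captioner) would additionally feed each generated output back in as the next input, which is not shown.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration
seq_len, batch, input_dim, hidden_dim, num_classes = 12, 4, 16, 32, 2

rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
classifier = nn.Linear(hidden_dim, num_classes)

x = torch.randn(batch, seq_len, input_dim)  # a batch of input sequences
outputs, h_n = rnn(x)                       # outputs: (batch, seq_len, hidden_dim); h_n: final hidden state

# Many-to-one (e.g. sentiment analysis): use only the final hidden state
sentiment_logits = classifier(h_n[-1])      # shape: (batch, num_classes)

# Many-to-many (e.g. per-step tagging or translation-style outputs):
# apply the readout at every time step
per_step_logits = classifier(outputs)       # shape: (batch, seq_len, num_classes)
```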



Recurrent Neural Networks (RNNs) are highly effective for processing
sequential data, making them applicable in various domains. Here are some key
applications:

Natural Language Processing (NLP)


Language Modeling and Text Generation: RNNs predict the next word in a
sentence, enabling text generation and language modeling.

Machine Translation: Used in Seq2Seq models with attention mechanisms to translate text between languages.

Sentiment Analysis: RNNs analyze text to determine sentiment by capturing the contextual meaning of words.

Speech Recognition: Transcribe spoken language into text by processing audio signals sequentially.

Time Series Analysis


Financial Forecasting: Predict stock prices and financial metrics by
analyzing historical data.

Weather Prediction: Forecast weather conditions by processing historical weather data.

Healthcare
Patient Monitoring: Analyze medical data sequences to monitor patient
health and detect anomalies.



Disease Progression Modeling: Model the progression of diseases by
analyzing medical records.

Anomaly Detection
Network Security: Detect unusual patterns indicating cyber-attacks by
analyzing network activity logs.

Fault Detection: Monitor machinery to detect faults and prevent breakdowns.

Robotics and Control Systems


Autonomous Driving: Process sensor data to make real-time driving
decisions.

Motion Prediction: Predict future positions of moving objects or robots' next actions.

Music and Art


Music Generation: Compose music by learning from sequences of notes.

Image Captioning: Generate descriptive captions for images by combining RNNs with CNNs.

Chatbots and Conversational AI


Chatbots: Understand and generate human-like responses by maintaining
context in conversations.

Unrolling a Recurrent Neural Network (RNN) through time is a method of visualizing and understanding the processing of sequences by RNNs. This concept can be explained as follows:

Unrolling Through Time


1. Concept of Unrolling:

In its basic form, an RNN has a hidden state that is updated at each time
step of the input sequence. The same set of weights is used across all
time steps.

When we unroll an RNN through time, we expand the RNN into a sequence of layers, each corresponding to a time step in the input sequence. This process turns the RNN into a deep feedforward network, where each layer represents the RNN's state at a particular time step.

2. Visualization:

Imagine an RNN processing a sequence of input data x = (x_1, x_2, ..., x_T). At each time step t, the RNN takes the input x_t and the hidden state from the previous time step h_{t-1} to compute the new hidden state h_t.

Unrolling the RNN means explicitly drawing out each of these time
steps as individual layers. So, instead of having one recurrent unit that
updates its state, we have a sequence of units, each with its own copy
of the weights, but all sharing the same parameters.

3. Mathematical Representation:

The hidden state at time t is given by:

h_t = f(W_hh h_{t-1} + W_xh x_t + b_h)

where W_hh is the weight matrix for the hidden state, W_xh is the weight matrix for the input, and b_h is the bias.

When unrolled, this becomes:

h_1 = f(W_hh h_0 + W_xh x_1 + b_h)
h_2 = f(W_hh h_1 + W_xh x_2 + b_h)
⋮
h_T = f(W_hh h_{T-1} + W_xh x_T + b_h)

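The unrolled equations can be written as an explicit loop in which the same weight matrices are reused at every time step. The NumPy sketch below is illustrative only; the function name and sizes are assumptions.

```python
import numpy as np

def unroll_rnn(x_seq, h_0, W_hh, W_xh, b_h):
    """Explicitly unroll an RNN over a sequence, reusing the same weights at every step."""
    h = h_0
    hidden_states = []
    for x_t in x_seq:                             # one "layer" per time step
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # h_t = f(W_hh h_{t-1} + W_xh x_t + b_h)
        hidden_states.append(h)
    return hidden_states                          # [h_1, h_2, ..., h_T]

# Illustrative sizes
T, input_dim, hidden_dim = 5, 3, 6
rng = np.random.default_rng(1)
x_seq = [rng.standard_normal(input_dim) for _ in range(T)]
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
b_h = np.zeros(hidden_dim)
states = unroll_rnn(x_seq, np.zeros(hidden_dim), W_hh, W_xh, b_h)
```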
4. Benefits of Unrolling:

Understanding and Visualization: Unrolling makes it easier to visualize and understand the flow of information through the network over time.

Backpropagation Through Time (BPTT): Unrolling is essential for training RNNs using BPTT, which is an extension of the backpropagation algorithm. By unrolling the network, we can compute the gradients of the loss function with respect to the weights over all time steps and adjust the weights accordingly.

5. Example:



Consider an RNN processing a sequence of words in a sentence to
predict the next word. Unrolling the RNN means creating separate
layers for each word in the sentence, where each layer represents the
RNN's state after processing that word. This way, the dependencies
between words and their influence on each other through time can be
visualized and learned.

RECURSIVE NEURAL NETWORKS


Recursive Neural Networks (RNNs) are a type of neural network where the
same set of weights is applied recursively over a structured input. Unlike the
more common Recurrent Neural Networks, which work well with sequential
data, Recursive Neural Networks are particularly effective with hierarchical
data structures.

Introduction
Recursive Neural Networks represent data in a hierarchical or tree-like
structure, making them suitable for tasks where the input naturally forms a
tree. This structure allows for deep learning of the input data, leading to a
more compact and potentially more powerful representation of the input.



Structure and Function
Recursive Neural Networks are built upon a tree structure, where each node
in the tree is represented by a neural network. The typical structure of a
recursive neural network involves recursively applying the same neural
network unit at each node of the tree. This approach helps in capturing the
hierarchical relationships within the data.

1. Tree Representation:

Nodes and Edges: Each node in the tree represents a subpart of the
input, and the edges represent the relationship between these
subparts.

Recursive Application: The same neural network weights are applied at each node, allowing the network to learn features at different levels of the hierarchy.

2. Forward Propagation:

Recursive Computation: The computation starts from the leaves of the tree and moves towards the root. Each node computes its state based on the states of its child nodes.

State Computation: The state of a node is typically computed using a non-linear function applied to the states of its child nodes, combined with a weight matrix and a bias term.

3. Training:

Backpropagation Through Structure: The network is trained using backpropagation, adapted to handle the tree structure. Gradients are computed recursively from the root of the tree down to the leaves.

Loss Function: The loss is computed at the root of the tree, and the
error is propagated downwards, updating the weights at each node.
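To make the bottom-up (leaves-to-root) computation concrete, here is a minimal NumPy sketch of a recursive network over a binary tree. It assumes leaves are already vectors (e.g., word embeddings) and that a single shared weight matrix composes two children into a parent; a real model would add a task-specific output layer at the root and train with backpropagation through structure.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8                                         # dimensionality of every node representation
W = rng.standard_normal((dim, 2 * dim)) * 0.1   # shared composition weights (same at every node)
b = np.zeros(dim)

def compose(left_vec, right_vec):
    """Compute a parent node's state from its two children (shared weights, tanh non-linearity)."""
    return np.tanh(W @ np.concatenate([left_vec, right_vec]) + b)

def encode(node):
    """Forward pass from the leaves towards the root of a binary tree.
    A leaf is a vector; an internal node is a (left, right) tuple."""
    if isinstance(node, tuple):
        return compose(encode(node[0]), encode(node[1]))
    return node                                 # leaf: already a vector (e.g. a word embedding)

# Example tree: ((w1, w2), w3), where the leaves are random stand-ins for word embeddings
w1, w2, w3 = (rng.standard_normal(dim) for _ in range(3))
root_state = encode(((w1, w2), w3))             # representation of the whole structure
```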

Applications
Recursive Neural Networks have been successfully applied in various
domains, including:

1. Natural Language Processing (NLP):

Sentence Parsing: Recursive neural networks can be used to parse sentences into their grammatical structures, capturing the relationships between words.

Sentiment Analysis: By analyzing the structure of a sentence, recursive networks can determine the sentiment expressed in the text.

2. Computer Vision:

Image Segmentation: In tasks where an image can be decomposed into hierarchical segments, recursive networks can effectively capture the relationships between these segments.

Scene Understanding: Recursive networks help in understanding the hierarchical structure of objects within a scene, providing a more detailed understanding of the image.

3. Reasoning and Knowledge Representation:

Logic Reasoning: Recursive neural networks can represent logical relationships and perform reasoning tasks by capturing the hierarchical structure of logical statements.

Knowledge Graphs: They are used to embed nodes and edges of a knowledge graph into a continuous vector space, preserving the graph's structural properties.

Advantages
1. Hierarchical Learning:

Compact Representation: Recursive neural networks provide a compact representation of the input by capturing its hierarchical structure.

Deep Feature Learning: They enable deep learning of features at multiple levels of the hierarchy, leading to better performance on tasks that require understanding of hierarchical relationships.

2. Flexibility:

Adaptable Structure: Recursive networks can adapt to different input structures, making them versatile for various applications.

3. Efficiency:

Reduced Depth: For a sequence of the same length, the depth of a recursive neural network can be drastically reduced compared to a recurrent neural network, helping to manage long-term dependencies more efficiently.
Challenges
1. Complex Training:

Gradient Computation: The recursive nature of the network makes the gradient computation more complex, requiring careful implementation of backpropagation through structure.

2. Structural Dependency:

Fixed Structure: The performance of recursive neural networks is highly dependent on the structure of the input data. In many applications, determining the optimal structure can be challenging.

Recurrent Neural Networks (RNNs) are a type of artificial neural network that is particularly adept at processing data in a sequential manner, making them well-suited for tasks involving time-series data, language modeling, and other sequential information. Here's an explanation of how RNNs work, using an analogy:

Analogy: Storyteller Memory


Imagine a storyteller who narrates a long tale. As the story progresses, the
storyteller doesn't start fresh each time but instead builds upon the events
and characters introduced earlier. The storyteller's memory of the plot,
characters, and previous events helps them continue the story coherently.

1. State as Memory:

The storyteller's memory represents the "state" of the story at any given point.

Similarly, an RNN maintains a "hidden state" which captures information from previous inputs.

2. Processing New Information:

As the storyteller introduces new events or characters, they update their memory to include this new information.

In an RNN, each new piece of data (such as a word in a sentence) updates the hidden state.



3. Continuity:

The continuity of the story depends on the storyteller's ability to remember and integrate past events with new ones.

The RNN ensures continuity by using the hidden state to integrate past inputs with the current input.

Working of RNNs
1. Input Sequence:

An RNN processes a sequence of inputs, x(1), x(2), ..., x(T), where T is the length of the sequence.

2. Hidden State Update:

At each time step t, the RNN updates its hidden state h(t) based on the previous hidden state h(t-1) and the current input x(t).

This update is typically done using a function like h(t) = σ(W_h h(t-1) + W_x x(t) + b), where σ is an activation function (e.g., tanh or ReLU), and W_h, W_x, and b are parameters learned during training.

3. Output Generation:

The RNN generates an output ( y(t) ) at each time step, which can
be used for tasks like prediction or classification.

The output is often derived from the hidden state: y(t) = φ(W_y h(t) + c), where φ is another activation function, and W_y and c are parameters.

Applications of RNNs
RNNs are used in various applications where the order of data matters:

Language Modeling and Text Generation:

Predicting the next word in a sentence or generating coherent text sequences.

Speech Recognition:

Converting spoken language into text by processing audio signals over time.

Time-Series Prediction:

Forecasting future values in time-series data such as stock prices or weather patterns.

Machine Translation:

Translating sentences from one language to another by understanding and generating sequences of words.

Video Analysis:

Understanding and generating sequences of frames in a video for tasks like action recognition.

Recurrent Neural Networks in Practice


RNNs have evolved into more sophisticated architectures like Long Short-
Term Memory (LSTM) and Gated Recurrent Units (GRU) to address issues
like vanishing and exploding gradients, allowing them to remember
information over longer sequences effectively.

LSTM ARCHITECTURE

Components:



An LSTM cell comprises several crucial elements:

1. Cell State (C_t): Acts as the memory unit, capable of retaining information for extended periods, addressing the vanishing gradient problem.

2. Forget Gate (f_t): Decides which information to discard from the previous cell state (C_{t-1}) based on the current input (x_t) and previous hidden state (h_{t-1}). It uses a sigmoid activation function.

3. Input Gate (i_t): Controls what new information to add to the cell state by generating a candidate memory cell value (C̃_t) based on the current input (x_t) and previous hidden state (h_{t-1}). It also uses a sigmoid activation function.

4. Output Gate (o_t): Determines what information from the cell state to output as part of the hidden state, producing h_t = o_t * tanh(C_t). The output gate uses both a sigmoid activation function and a hyperbolic tangent (tanh) activation function.

Information Flow:

1. Forget Gate (f_t): Generates a forget vector f_t * C_{t-1} based on the previous hidden state (h_{t-1}) and current input (x_t), using a sigmoid activation function.

2. Input Gate (i_t): Produces an input vector i_t * C̃_t representing new information to be added to the cell state, along with a candidate memory cell value (C̃_t). The input gate utilizes a sigmoid activation function for gate control.

3. Cell State Update: Combines the forget vector and input vector to update the current cell state: C_t = f_t * C_{t-1} + i_t * C̃_t.

4. Output Gate (o_t): Determines the flow of information from the current cell state (C_t) to the hidden state (h_t) by generating h_t = o_t * tanh(C_t), using a sigmoid activation function for gate control and a hyperbolic tangent (tanh) activation function for output scaling.

Overall, these components and the information flow mechanism, along with the specific activation functions used, enable an LSTM cell to retain long-term dependencies and effectively process sequential data.
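The gate equations above can be traced step by step in a small NumPy sketch. This version concatenates h_{t-1} and x_t and uses one weight matrix per gate; the parameter names and sizes are illustrative assumptions rather than a fixed standard.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM step following the gate equations above (operating on the concatenation [h_{t-1}, x_t])."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    C_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                    # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                              # new hidden state
    return h_t, C_t

# Illustrative sizes and randomly initialized parameters
input_dim, hidden_dim = 4, 6
rng = np.random.default_rng(3)
params = {name: rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
          for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: np.zeros(hidden_dim) for name in ("b_f", "b_i", "b_c", "b_o")})

h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, C = lstm_step(rng.standard_normal(input_dim), h, C, params)
```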

LSTM APPLICATIONS
LSTMs (Long Short-Term Memory) networks are a cornerstone in Natural
Language Processing (NLP), facilitating a variety of tasks due to their
proficiency in handling sequential data and learning long-term
dependencies within text. Here’s how LSTMs are applied in NLP:
1. Language Modeling:

Task: Predicting the next word in a sequence based on preceding words, pivotal for autocompletion, text generation, and machine translation.

How LSTMs help: They discern contextual cues from prior words and leverage their memory capabilities to grasp relationships, enabling accurate word prediction.

2. Machine Translation:

Task: Translating text from one language to another while preserving semantic and syntactic coherence.

How LSTMs help: By capturing long-term dependencies in sentences, like word order and grammatical structures, LSTMs ensure faithful translation by considering the holistic context.

3. Sentiment Analysis:

Task: Determining the emotional polarity of text (positive, negative, or neutral).

How LSTMs help: Analyzing word sequences allows LSTMs to gauge sentiment, considering not only individual words but also their contextual nuances, thereby capturing subtle emotional cues effectively.

4. Text Summarization:

Task: Condensing lengthy text while retaining essential information.

How LSTMs help: By processing the entire text, LSTMs identify salient points using their memory, ensuring that the summary encapsulates key information.

5. Text Generation:

Task: Crafting coherent and grammatically sound text, such as in chatbots or creative writing assistants.

How LSTMs help: Drawing from patterns in vast text corpora, LSTMs generate new text that adheres to established styles and themes, maintaining continuity and consistency.

6. Question Answering:

Task: Providing answers to questions posed in natural language, leveraging extensive text corpora or knowledge bases.

How LSTMs help: By understanding question context and analyzing relevant text passages, LSTMs use their memory to discern appropriate answers from processed information.
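As an illustration of the language-modeling application above, the PyTorch sketch below stacks an embedding layer, an LSTM, and a linear head that predicts the next token at every position. The vocabulary size, dimensions, and class name are made up for the example and are not from the notes.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary and sizes for illustration
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class NextWordLSTM(nn.Module):
    """Tiny LSTM language model: embed tokens, run an LSTM, predict the next token at each step."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, hidden_dim)
        return self.head(states)                      # logits over the vocabulary at every step

model = NextWordLSTM()
tokens = torch.randint(0, vocab_size, (2, 10))        # a batch of 2 sequences, 10 tokens each
logits = model(tokens)                                # (2, 10, vocab_size)
next_token = logits[:, -1].argmax(dim=-1)             # greedy prediction of the next word
```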

GRU ARCHITECTURE
The Gated Recurrent Unit (GRU) is a type of recurrent neural network
(RNN) architecture designed to address the vanishing gradient problem
and efficiently capture long-range dependencies in sequential data.
Here’s a brief overview of the GRU architecture:



1. Components:

Reset Gate (r_t): Controls how much of the previous hidden state to forget, considering the current input and previous hidden state.

Update Gate (z_t): Regulates the balance between the previous hidden state and the new candidate hidden state based on the current input and previous hidden state.

Hidden State (h_t): Represents the memory of the network at each time step, encoding information from past inputs.

2. Information Flow:

◦ The reset gate determines the relevance of past information, influencing the reset vector.

◦ The update gate adjusts the combination of the previous hidden state and the candidate hidden state, controlling the flow of new information into the current hidden state.

◦ The new hidden state is computed as a weighted combination of the previous hidden state and the candidate hidden state, with the update gate determining the weighting.

3. Advantages:

Simplicity: GRUs have a simpler architecture compared to LSTMs, making them easier to train and potentially more computationally efficient.

Performance: Despite their simplicity, GRUs can achieve comparable or even superior performance to LSTMs on various sequential learning tasks.

Understanding the GRU Cell


The GRU cell is the fundamental unit of a GRU network and consists of three main components:

Update Gate

Reset Gate

Candidate Hidden State

Update Gate
The update gate controls the amount of the past information that needs to be passed along to the future. The output of the update gate z_t is calculated as follows:

z_t = σ(W_z x_t + U_z h_{t-1})

where:

x_t is the current input.

h_{t-1} is the hidden state from the previous timestep.

W_z and U_z are weight matrices.

σ is the sigmoid activation function.

Reset Gate
The reset gate determines the amount of past information to forget. The output of the reset gate r_t is calculated as:

r_t = σ(W_r x_t + U_r h_{t-1})

where:

W_r and U_r are weight matrices.

Candidate Hidden State
The candidate hidden state h̃_t is calculated using the reset gate:

h̃_t = tanh(W x_t + r_t ⊙ U h_{t-1})

where:

⊙ represents element-wise multiplication.

W and U are weight matrices.
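Putting the three gates together, a single GRU step can be sketched as below (biases omitted for brevity). Note that the final combination of h_{t-1} and the candidate h̃_t via the update gate, described in the Information Flow section above, is written here in one common convention; some formulations swap the roles of z_t and 1 - z_t. All names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W, U):
    """One GRU step following the equations above (bias terms omitted for brevity)."""
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate
    h_tilde = np.tanh(W @ x_t + r_t * (U @ h_prev))  # candidate hidden state
    # Weighted combination of previous state and candidate
    # (one common convention: z_t weights the candidate; some texts use the opposite)
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde
    return h_t

# Illustrative sizes and random parameters
input_dim, hidden_dim = 4, 6
rng = np.random.default_rng(4)
W_z, W_r, W = (rng.standard_normal((hidden_dim, input_dim)) * 0.1 for _ in range(3))
U_z, U_r, U = (rng.standard_normal((hidden_dim, hidden_dim)) * 0.1 for _ in range(3))
h = np.zeros(hidden_dim)
h = gru_step(rng.standard_normal(input_dim), h, W_z, U_z, W_r, U_r, W, U)
```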

GRU vs LSTM
Feature          | GRU                                                | LSTM
Structure        | Simpler with two gates (update and reset)          | More complex with three gates (input, forget, output)
Parameters       | Fewer (3 weight matrices)                          | More (4 weight matrices)
Training Speed   | Faster to train                                    | Slower to train
Space Complexity | Typically uses fewer memory resources              | Requires more memory resources
Performance      | Performs similarly to LSTM; sometimes better       | Performs well on many tasks but more computationally expensive
Task Suitability | Generally suitable for large datasets or sequences | Effective for tasks like natural language understanding and machine translation
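The parameter-count row of the table can be checked directly: a GRU layer carries three sets of gate weights against the LSTM's four, so for the same layer sizes it has roughly three quarters as many parameters. The sizes in the sketch below are arbitrary.

```python
import torch.nn as nn

input_dim, hidden_dim = 32, 64  # illustrative sizes

def count_params(module):
    return sum(p.numel() for p in module.parameters())

gru = nn.GRU(input_dim, hidden_dim)    # 3 sets of gate weights
lstm = nn.LSTM(input_dim, hidden_dim)  # 4 sets of gate weights

print("GRU parameters: ", count_params(gru))   # roughly 3/4 of the LSTM's count
print("LSTM parameters:", count_params(lstm))
```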
