Module 4 | S8 CSE NOTES - KTU DEEP LEARNING NOTES | CST414

Deep Learning Module 4

Recurrent Neural Networks (RNNs)


Recurrent Neural Networks (RNNs) are a class of artificial neural networks
designed to recognize patterns in sequences of data, such as time series,
natural language, and other sequential data. They are particularly well-
suited for tasks where context and sequential order are important. Here's an
in-depth explanation of the architecture and workings of RNNs:

1. Introduction and Basic Structure


RNNs are designed to handle sequential data by maintaining a hidden state
that captures information about previous elements in the sequence. Unlike
feedforward neural networks, which process input data independently,
RNNs use their internal state (memory) to process sequences of inputs.

2. Basic Architecture



The basic structure of an RNN involves repeating units that take the current
input and the previous hidden state as inputs to produce the current hidden
state and output. Mathematically, the operations of an RNN can be
described as follows:

Hidden State Update:

h_t = f(W_h h_{t-1} + W_x x_t + b_h)

where:

h_t is the hidden state at time step t.

h_{t-1} is the hidden state at the previous time step.

x_t is the input at time step t.

W_h and W_x are weight matrices for the hidden state and input, respectively.

b_h is the bias term.

f is a non-linear activation function, typically tanh or ReLU.

Output:

y_t = f(W_y h_t + b_y)

where:

y_t is the output at time step t.

W_y is the weight matrix for the output.

b_y is the bias term for the output.
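The two update equations above map directly onto a few lines of code. Below is a minimal NumPy sketch of a single RNN step; the function name rnn_step, the dimensions, and the choice of tanh for f are illustrative assumptions, not part of the notes.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_h, W_x, W_y, b_h, b_y):
    """One RNN time step: update the hidden state, then compute the output."""
    h_t = np.tanh(W_h @ h_prev + W_x @ x_t + b_h)  # hidden state update
    y_t = W_y @ h_t + b_y                          # output (a task-specific activation, e.g. softmax, may follow)
    return h_t, y_t

# Example dimensions (illustrative only)
input_dim, hidden_dim, output_dim = 4, 8, 3
rng = np.random.default_rng(0)
W_h = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_x = rng.standard_normal((hidden_dim, input_dim)) * 0.1
W_y = rng.standard_normal((output_dim, hidden_dim)) * 0.1
b_h = np.zeros(hidden_dim)
b_y = np.zeros(output_dim)

h = np.zeros(hidden_dim)            # initial hidden state h_0
x = rng.standard_normal(input_dim)  # one input x_t
h, y = rnn_step(x, h, W_h, W_x, W_y, b_h, b_y)
```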

3. Unfolding in Time
RNNs can be visualized as being "unfolded" in time, where each time step
of the input sequence corresponds to a layer in the network. This unfolding
shows the recursive nature of RNNs, where each unit shares the same
parameters but processes different elements of the sequence.

4. Training RNNs
Training RNNs involves adjusting the weights and biases to minimize a loss
function, typically using a variant of backpropagation called
Backpropagation Through Time (BPTT). BPTT involves the following steps:

Forward Pass: Compute the hidden states and outputs for each time
step.

Loss Calculation: Compute the loss based on the predicted outputs and
actual targets.

Backward Pass: Calculate the gradients of the loss with respect to the
parameters by propagating the error backwards through time.
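As a rough illustration of these three steps, the PyTorch sketch below relies on autograd to perform BPTT automatically when loss.backward() is called. The dimensions, the MSE loss, and the SGD optimizer are arbitrary choices for the example, not prescribed by the notes.

```python
import torch
import torch.nn as nn

# Illustrative sizes only
seq_len, batch, input_dim, hidden_dim, output_dim = 10, 2, 4, 8, 3

rnn = nn.RNN(input_dim, hidden_dim)        # unrolled internally over the sequence
readout = nn.Linear(hidden_dim, output_dim)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)

x = torch.randn(seq_len, batch, input_dim)         # input sequence
targets = torch.randn(seq_len, batch, output_dim)  # per-step targets

# Forward pass: hidden states and outputs for every time step
hidden_states, _ = rnn(x)          # (seq_len, batch, hidden_dim)
outputs = readout(hidden_states)   # (seq_len, batch, output_dim)

# Loss calculation over the whole sequence
loss = criterion(outputs, targets)

# Backward pass: autograd propagates the error backwards through time (BPTT)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```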

5. Challenges with Standard RNNs


Standard RNNs face several challenges, including:

Vanishing and Exploding Gradients: During backpropagation, gradients can become very small (vanishing) or very large (exploding), making training difficult. This is particularly problematic for long sequences where dependencies span many time steps.

Limited Long-Term Memory: Standard RNNs struggle to capture long-term dependencies due to the vanishing gradient problem.

6. Variants of RNNs
To address the limitations of standard RNNs, several variants have been
developed, including:

Long Short-Term Memory (LSTM): LSTMs introduce memory cells and gating mechanisms (input, output, and forget gates) to better capture long-term dependencies and mitigate the vanishing gradient problem.

Gated Recurrent Unit (GRU): GRUs simplify the LSTM architecture by combining the forget and input gates into a single update gate, while still effectively capturing long-term dependencies.

7. Summary of RNN Operations


To summarize the operations of a standard RNN:

1. Initialization: Initialize the hidden state h_0.

2. Iteration Over Sequence:

For each time step t, compute the hidden state h_t using the current input x_t and the previous hidden state h_{t-1}.

Compute the output y_t using the current hidden state h_t.

3. Loss Calculation and Backpropagation: Compute the loss over the entire sequence and use BPTT to update the weights.

8. Mathematical Representation
The mathematical representation of an RNN highlights its recursive nature:

Hidden State: h_t = f(W_h h_{t-1} + W_x x_t + b_h)

Output: y_t = f(W_y h_t + b_y)

The above equations encapsulate the essence of RNNs, where the hidden
state at each time step is a function of the current input and the previous
hidden state, allowing the network to maintain a memory of previous inputs.

Recurrent Neural Networks (RNNs) can be configured in various architectures depending on the nature of the input and output sequences. Here are the main types:

1. One-to-One (Vanilla Neural Network)


Architecture:

Single input and single output.

No recurrent connections.

Use Case: Standard feedforward neural network tasks like image classification.



2. One-to-Many
Architecture:

Single input and a sequence of outputs.

The hidden state evolves over time to generate multiple outputs from
a single input.

Use Case: Image captioning, where a single image is the input and a
sequence of words (caption) is the output.

3. Many-to-One
Architecture:

A sequence of inputs and a single output.



The hidden state accumulates information from the input sequence
to produce one output.

Use Case: Sentiment analysis, where a sequence of words (sentence) is the input and the sentiment (positive/negative) is the output.

4. Many-to-Many (Sequence-to-Sequence)
Architecture:

A sequence of inputs and a sequence of outputs.

The hidden state is passed through each time step, generating an output at each step.

Use Case: Machine translation, where a sequence of words in one language is the input and a sequence of words in another language is the output.
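In code, the difference between these configurations often comes down to which hidden states are passed to the readout layer. The PyTorch sketch below is illustrative only: the layer sizes and the classifier head are assumptions, and a one-to-many model (such as an image captioner) would additionally feed each generated output back in as the next input, which is not shown.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration
seq_len, batch, input_dim, hidden_dim, num_classes = 12, 4, 16, 32, 2

rnn = nn.RNN(input_dim, hidden_dim, batch_first=True)
classifier = nn.Linear(hidden_dim, num_classes)

x = torch.randn(batch, seq_len, input_dim)  # a batch of input sequences
outputs, h_n = rnn(x)                       # outputs: (batch, seq_len, hidden_dim); h_n: final hidden state

# Many-to-one (e.g. sentiment analysis): use only the final hidden state
sentiment_logits = classifier(h_n[-1])      # shape: (batch, num_classes)

# Many-to-many (e.g. per-step tagging or translation-style outputs):
# apply the readout at every time step
per_step_logits = classifier(outputs)       # shape: (batch, seq_len, num_classes)
```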



Recurrent Neural Networks (RNNs) are highly effective for processing
sequential data, making them applicable in various domains. Here are some key
applications:

Natural Language Processing (NLP)


Language Modeling and Text Generation: RNNs predict the next word in a
sentence, enabling text generation and language modeling.

Machine Translation: Used in Seq2Seq models with attention mechanisms to translate text between languages.

Sentiment Analysis: RNNs analyze text to determine sentiment by capturing the contextual meaning of words.

Speech Recognition: Transcribe spoken language into text by processing audio signals sequentially.

Time Series Analysis


Financial Forecasting: Predict stock prices and financial metrics by
analyzing historical data.

Weather Prediction: Forecast weather conditions by processing historical weather data.

Healthcare
Patient Monitoring: Analyze medical data sequences to monitor patient
health and detect anomalies.



Disease Progression Modeling: Model the progression of diseases by
analyzing medical records.

Anomaly Detection
Network Security: Detect unusual patterns indicating cyber-attacks by
analyzing network activity logs.

Fault Detection: Monitor machinery to detect faults and prevent breakdowns.

Robotics and Control Systems


Autonomous Driving: Process sensor data to make real-time driving
decisions.

Motion Prediction: Predict future positions of moving objects or robots' next actions.

Music and Art


Music Generation: Compose music by learning from sequences of notes.

Image Captioning: Generate descriptive captions for images by combining RNNs with CNNs.

Chatbots and Conversational AI


Chatbots: Understand and generate human-like responses by maintaining
context in conversations.

Unrolling a Recurrent Neural Network (RNN) through time is a method of visualizing and understanding the processing of sequences by RNNs. This concept can be explained as follows:

Unrolling Through Time


1. Concept of Unrolling:

In its basic form, an RNN has a hidden state that is updated at each time
step of the input sequence. The same set of weights is used across all
time steps.

When we unroll an RNN through time, we expand the RNN into a sequence of layers, each corresponding to a time step in the input sequence. This process turns the RNN into a deep feedforward network, where each layer represents the RNN's state at a particular time step.

2. Visualization:

Imagine an RNN processing a sequence of input data x = (x_1, x_2, ..., x_T). At each time step t, the RNN takes the input x_t and the hidden state from the previous time step h_{t-1} to compute the new hidden state h_t.

Unrolling the RNN means explicitly drawing out each of these time
steps as individual layers. So, instead of having one recurrent unit that
updates its state, we have a sequence of units, each with its own copy
of the weights, but all sharing the same parameters.

3. Mathematical Representation:

The hidden state at time t is given by:

h_t = f(W_hh h_{t-1} + W_xh x_t + b_h)

where W_hh is the weight matrix for the hidden state, W_xh is the weight matrix for the input, and b_h is the bias.

When unrolled, this becomes:

h_1 = f(W_hh h_0 + W_xh x_1 + b_h)
h_2 = f(W_hh h_1 + W_xh x_2 + b_h)
⋮
h_T = f(W_hh h_{T-1} + W_xh x_T + b_h)

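The unrolled equations can be written as an explicit loop in which the same weight matrices are reused at every time step. The NumPy sketch below is illustrative only; the function name and sizes are assumptions.

```python
import numpy as np

def unroll_rnn(x_seq, h_0, W_hh, W_xh, b_h):
    """Explicitly unroll an RNN over a sequence, reusing the same weights at every step."""
    h = h_0
    hidden_states = []
    for x_t in x_seq:                             # one "layer" per time step
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)  # h_t = f(W_hh h_{t-1} + W_xh x_t + b_h)
        hidden_states.append(h)
    return hidden_states                          # [h_1, h_2, ..., h_T]

# Illustrative sizes
T, input_dim, hidden_dim = 5, 3, 6
rng = np.random.default_rng(1)
x_seq = [rng.standard_normal(input_dim) for _ in range(T)]
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1
b_h = np.zeros(hidden_dim)
states = unroll_rnn(x_seq, np.zeros(hidden_dim), W_hh, W_xh, b_h)
```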
4. Benefits of Unrolling:

Understanding and Visualization: Unrolling makes it easier to visualize and understand the flow of information through the network over time.

Backpropagation Through Time (BPTT): Unrolling is essential for training RNNs using BPTT, which is an extension of the backpropagation algorithm. By unrolling the network, we can compute the gradients of the loss function with respect to the weights over all time steps and adjust the weights accordingly.

5. Example:



Consider an RNN processing a sequence of words in a sentence to
predict the next word. Unrolling the RNN means creating separate
layers for each word in the sentence, where each layer represents the
RNN's state after processing that word. This way, the dependencies
between words and their influence on each other through time can be
visualized and learned.

RECURSIVE NEURAL NETWORKS


Recursive Neural Networks (RNNs) are a type of neural network where the
same set of weights is applied recursively over a structured input. Unlike the
more common Recurrent Neural Networks, which work well with sequential
data, Recursive Neural Networks are particularly effective with hierarchical
data structures.

Introduction
Recursive Neural Networks represent data in a hierarchical or tree-like
structure, making them suitable for tasks where the input naturally forms a
tree. This structure allows for deep learning of the input data, leading to a
more compact and potentially more powerful representation of the input.



Structure and Function
Recursive Neural Networks are built upon a tree structure, where each node
in the tree is represented by a neural network. The typical structure of a
recursive neural network involves recursively applying the same neural
network unit at each node of the tree. This approach helps in capturing the
hierarchical relationships within the data.

1. Tree Representation:

Nodes and Edges: Each node in the tree represents a subpart of the
input, and the edges represent the relationship between these
subparts.

Recursive Application: The same neural network weights are applied at each node, allowing the network to learn features at different levels of the hierarchy.

2. Forward Propagation:

Recursive Computation: The computation starts from the leaves of the tree and moves towards the root. Each node computes its state based on the states of its child nodes.

State Computation: The state of a node is typically computed using a non-linear function applied to the states of its child nodes, combined with a weight matrix and a bias term.

3. Training:

Backpropagation Through Structure: The network is trained using backpropagation, adapted to handle the tree structure. Gradients are computed recursively from the root of the tree down to the leaves.

Loss Function: The loss is computed at the root of the tree, and the
error is propagated downwards, updating the weights at each node.
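To make the bottom-up (leaves-to-root) computation concrete, here is a minimal NumPy sketch of a recursive network over a binary tree. It assumes leaves are already vectors (e.g., word embeddings) and that a single shared weight matrix composes two children into a parent; a real model would add a task-specific output layer at the root and train with backpropagation through structure.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 8                                         # dimensionality of every node representation
W = rng.standard_normal((dim, 2 * dim)) * 0.1   # shared composition weights (same at every node)
b = np.zeros(dim)

def compose(left_vec, right_vec):
    """Compute a parent node's state from its two children (shared weights, tanh non-linearity)."""
    return np.tanh(W @ np.concatenate([left_vec, right_vec]) + b)

def encode(node):
    """Forward pass from the leaves towards the root of a binary tree.
    A leaf is a vector; an internal node is a (left, right) tuple."""
    if isinstance(node, tuple):
        return compose(encode(node[0]), encode(node[1]))
    return node                                 # leaf: already a vector (e.g. a word embedding)

# Example tree: ((w1, w2), w3), where the leaves are random stand-ins for word embeddings
w1, w2, w3 = (rng.standard_normal(dim) for _ in range(3))
root_state = encode(((w1, w2), w3))             # representation of the whole structure
```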

Applications
Recursive Neural Networks have been successfully applied in various
domains, including:

1. Natural Language Processing (NLP):

Sentence Parsing: Recursive neural networks can be used to parse sentences into their grammatical structures, capturing the relationships between words.

Sentiment Analysis: By analyzing the structure of a sentence, recursive networks can determine the sentiment expressed in the text.

2. Computer Vision:

Image Segmentation: In tasks where an image can be decomposed into hierarchical segments, recursive networks can effectively capture the relationships between these segments.

Scene Understanding: Recursive networks help in understanding the hierarchical structure of objects within a scene, providing a more detailed understanding of the image.

3. Reasoning and Knowledge Representation:

Logic Reasoning: Recursive neural networks can represent logical relationships and perform reasoning tasks by capturing the hierarchical structure of logical statements.

Knowledge Graphs: They are used to embed nodes and edges of a knowledge graph into a continuous vector space, preserving the graph's structural properties.

Advantages
1. Hierarchical Learning:

Compact Representation: Recursive neural networks provide a compact representation of the input by capturing its hierarchical structure.

Deep Feature Learning: They enable deep learning of features at multiple levels of the hierarchy, leading to better performance on tasks that require understanding of hierarchical relationships.

2. Flexibility:

Adaptable Structure: Recursive networks can adapt to different input structures, making them versatile for various applications.

3. Efficiency:

Reduced Depth: For a sequence of the same length, the depth of a recursive neural network can be drastically reduced compared to a recurrent neural network, helping to manage long-term dependencies more efficiently.
Challenges
1. Complex Training:

Gradient Computation: The recursive nature of the network makes the gradient computation more complex, requiring careful implementation of backpropagation through structure.

2. Structural Dependency:

Fixed Structure: The performance of recursive neural networks is highly dependent on the structure of the input data. In many applications, determining the optimal structure can be challenging.

Recurrent Neural Networks (RNNs) are a type of artificial neural network that is particularly adept at processing data in a sequential manner, making them well-suited for tasks involving time-series data, language modeling, and other sequential information. Here's an explanation of how RNNs work, using an analogy:

Analogy: Storyteller Memory


Imagine a storyteller who narrates a long tale. As the story progresses, the
storyteller doesn't start fresh each time but instead builds upon the events
and characters introduced earlier. The storyteller's memory of the plot,
characters, and previous events helps them continue the story coherently.

1. State as Memory:

The storyteller's memory represents the "state" of the story at any given point.

Similarly, an RNN maintains a "hidden state" which captures information from previous inputs.

2. Processing New Information:

As the storyteller introduces new events or characters, they update their memory to include this new information.

In an RNN, each new piece of data (such as a word in a sentence) updates the hidden state.



3. Continuity:

The continuity of the story depends on the storyteller's ability to remember and integrate past events with new ones.

The RNN ensures continuity by using the hidden state to integrate past inputs with the current input.

Working of RNNs
1. Input Sequence:

An RNN processes a sequence of inputs, x(1), x(2), ..., x(T), where T is the length of the sequence.

2. Hidden State Update:

At each time step t, the RNN updates its hidden state h(t) based on the previous hidden state h(t-1) and the current input x(t).

This update is typically done using a function like h(t) = σ(W_h h(t-1) + W_x x(t) + b), where σ is an activation function (e.g., tanh or ReLU), and W_h, W_x, and b are parameters learned during training.

3. Output Generation:

The RNN generates an output ( y(t) ) at each time step, which can
be used for tasks like prediction or classification.

The output is often derived from the hidden state: y(t) = φ(W_y h(t) + c), where φ is another activation function, and W_y and c are parameters.

Applications of RNNs
RNNs are used in various applications where the order of data matters:

Language Modeling and Text Generation:

Predicting the next word in a sentence or generating coherent text sequences.

Speech Recognition:

Converting spoken language into text by processing audio signals over time.

Time-Series Prediction:

Forecasting future values in time-series data such as stock prices or weather patterns.

Machine Translation:

Translating sentences from one language to another by understanding and generating sequences of words.

Video Analysis:

Understanding and generating sequences of frames in a video for tasks like action recognition.

Recurrent Neural Networks in Practice


RNNs have evolved into more sophisticated architectures like Long Short-
Term Memory (LSTM) and Gated Recurrent Units (GRU) to address issues
like vanishing and exploding gradients, allowing them to remember
information over longer sequences effectively.

LSTM ARCHITECTURE

Components:



An LSTM cell comprises several crucial elements:

1. Cell State (C_t): Acts as the memory unit, capable of retaining information for extended periods, addressing the vanishing gradient problem.

2. Forget Gate (f_t): Decides which information to discard from the previous cell state (C_{t-1}) based on the current input (x_t) and previous hidden state (h_{t-1}). It uses a sigmoid activation function.

3. Input Gate (i_t): Controls what new information to add to the cell state by generating a candidate memory cell value (C̃_t) based on the current input (x_t) and previous hidden state (h_{t-1}). It also uses a sigmoid activation function.

4. Output Gate (o_t): Determines what information from the cell state to output as part of the hidden state, producing h_t = o_t * tanh(C_t). The output gate uses both a sigmoid activation function and a hyperbolic tangent (tanh) activation function.

Information Flow:

1. Forget Gate (f_t): Generates a forget vector f_t * C_{t-1} based on the previous hidden state (h_{t-1}) and current input (x_t), using a sigmoid activation function.

2. Input Gate (i_t): Produces an input vector i_t * C̃_t representing new information to be added to the cell state, along with a candidate memory cell value (C̃_t). The input gate utilizes a sigmoid activation function for gate control.

3. Cell State Update: Combines the forget vector and input vector to update the current cell state: C_t = f_t * C_{t-1} + i_t * C̃_t.

4. Output Gate (o_t): Determines the flow of information from the current cell state (C_t) to the hidden state (h_t) by generating h_t = o_t * tanh(C_t), using a sigmoid activation function for gate control and a hyperbolic tangent (tanh) activation function for output scaling.

Overall, these components and the information flow mechanism, along with the specific activation functions used, enable an LSTM cell to retain long-term dependencies and effectively process sequential data.
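The gate equations above can be traced step by step in a small NumPy sketch. This version concatenates h_{t-1} and x_t and uses one weight matrix per gate; the parameter names and sizes are illustrative assumptions rather than a fixed standard.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM step following the gate equations above (operating on the concatenation [h_{t-1}, x_t])."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])      # input gate
    C_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde                    # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                              # new hidden state
    return h_t, C_t

# Illustrative sizes and randomly initialized parameters
input_dim, hidden_dim = 4, 6
rng = np.random.default_rng(3)
params = {name: rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
          for name in ("W_f", "W_i", "W_c", "W_o")}
params.update({name: np.zeros(hidden_dim) for name in ("b_f", "b_i", "b_c", "b_o")})

h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, C = lstm_step(rng.standard_normal(input_dim), h, C, params)
```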

LSTM APPLICATIONS
LSTMs (Long Short-Term Memory) networks are a cornerstone in Natural
Language Processing (NLP), facilitating a variety of tasks due to their
proficiency in handling sequential data and learning long-term
dependencies within text. Here’s how LSTMs are applied in NLP:
1. Language Modeling:

Task: Predicting the next word in a sequence based on preceding words, pivotal for autocompletion, text generation, and machine translation.

How LSTMs help: They discern contextual cues from prior words and leverage their memory capabilities to grasp relationships, enabling accurate word prediction.

2. Machine Translation:

Task: Translating text from one language to another while preserving semantic and syntactic coherence.

How LSTMs help: By capturing long-term dependencies in sentences, like word order and grammatical structures, LSTMs ensure faithful translation by considering the holistic context.

3. Sentiment Analysis:

Task: Determining the emotional polarity of text (positive, negative, or neutral).

How LSTMs help: Analyzing word sequences allows LSTMs to gauge sentiment, considering not only individual words but also their contextual nuances, thereby capturing subtle emotional cues effectively.

4. Text Summarization:

Task: Condensing lengthy text while retaining essential information.

How LSTMs help: By processing the entire text, LSTMs identify salient points using their memory, ensuring that the summary encapsulates key information.

5. Text Generation:

Task: Crafting coherent and grammatically sound text, such as in chatbots or creative writing assistants.

How LSTMs help: Drawing from patterns in vast text corpora, LSTMs generate new text that adheres to established styles and themes, maintaining continuity and consistency.

6. Question Answering:

Task: Providing answers to questions posed in natural language, leveraging extensive text corpora or knowledge bases.

How LSTMs help: By understanding question context and analyzing relevant text passages, LSTMs use their memory to discern appropriate answers from processed information.
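As an illustration of the language-modeling application above, the PyTorch sketch below stacks an embedding layer, an LSTM, and a linear head that predicts the next token at every position. The vocabulary size, dimensions, and class name are made up for the example and are not from the notes.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary and sizes for illustration
vocab_size, embed_dim, hidden_dim = 1000, 64, 128

class NextWordLSTM(nn.Module):
    """Tiny LSTM language model: embed tokens, run an LSTM, predict the next token at each step."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        states, _ = self.lstm(self.embed(token_ids))  # (batch, seq_len, hidden_dim)
        return self.head(states)                      # logits over the vocabulary at every step

model = NextWordLSTM()
tokens = torch.randint(0, vocab_size, (2, 10))        # a batch of 2 sequences, 10 tokens each
logits = model(tokens)                                # (2, 10, vocab_size)
next_token = logits[:, -1].argmax(dim=-1)             # greedy prediction of the next word
```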

GRU ARCHITECTURE
The Gated Recurrent Unit (GRU) is a type of recurrent neural network
(RNN) architecture designed to address the vanishing gradient problem
and efficiently capture long-range dependencies in sequential data.
Here’s a brief overview of the GRU architecture:



1. Components:

Reset Gate (r_t): Controls how much of the previous hidden state to forget, considering the current input and previous hidden state.

Update Gate (z_t): Regulates the balance between the previous hidden state and the new candidate hidden state based on the current input and previous hidden state.

Hidden State (h_t): Represents the memory of the network at each time step, encoding information from past inputs.

2. Information Flow:

◦ The reset gate determines the relevance of past information, influencing the reset vector.

◦ The update gate adjusts the combination of the previous hidden state and the candidate hidden state, controlling the flow of new information into the current hidden state.

◦ The new hidden state is computed as a weighted combination of the previous hidden state and the candidate hidden state, with the update gate determining the weighting.

3. Advantages:

Simplicity: GRUs have a simpler architecture compared to LSTMs, making them easier to train and potentially more computationally efficient.

Performance: Despite their simplicity, GRUs can achieve comparable or even superior performance to LSTMs on various sequential learning tasks.

Understanding the GRU Cell


The GRU cell is the fundamental unit of a GRU network and consists of three main components:

Update Gate

Reset Gate

Candidate Hidden State

Update Gate
The update gate controls the amount of the past information that needs to be passed along to the future. The output of the update gate z_t is calculated as follows:

z_t = σ(W_z x_t + U_z h_{t-1})

where:

x_t is the current input.

h_{t-1} is the hidden state from the previous timestep.

W_z and U_z are weight matrices.

σ is the sigmoid activation function.

Reset Gate
The reset gate determines the amount of past information to forget. The output of the reset gate r_t is calculated as:

r_t = σ(W_r x_t + U_r h_{t-1})

where:

W_r and U_r are weight matrices.

Candidate Hidden State
The candidate hidden state h̃_t is calculated using the reset gate:

h̃_t = tanh(W x_t + r_t ⊙ U h_{t-1})

where:

⊙ represents element-wise multiplication.

W and U are weight matrices.
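Putting the three gates together, a single GRU step can be sketched as below (biases omitted for brevity). Note that the final combination of h_{t-1} and the candidate h̃_t via the update gate, described in the Information Flow section above, is written here in one common convention; some formulations swap the roles of z_t and 1 - z_t. All names and sizes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, U_z, W_r, U_r, W, U):
    """One GRU step following the equations above (bias terms omitted for brevity)."""
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev)          # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev)          # reset gate
    h_tilde = np.tanh(W @ x_t + r_t * (U @ h_prev))  # candidate hidden state
    # Weighted combination of previous state and candidate
    # (one common convention: z_t weights the candidate; some texts use the opposite)
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde
    return h_t

# Illustrative sizes and random parameters
input_dim, hidden_dim = 4, 6
rng = np.random.default_rng(4)
W_z, W_r, W = (rng.standard_normal((hidden_dim, input_dim)) * 0.1 for _ in range(3))
U_z, U_r, U = (rng.standard_normal((hidden_dim, hidden_dim)) * 0.1 for _ in range(3))
h = np.zeros(hidden_dim)
h = gru_step(rng.standard_normal(input_dim), h, W_z, U_z, W_r, U_r, W, U)
```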

GRU vs LSTM
Feature          | GRU                                                | LSTM
Structure        | Simpler with two gates (update and reset)          | More complex with three gates (input, forget, output)
Parameters       | Fewer (3 weight matrices)                          | More (4 weight matrices)
Training Speed   | Faster to train                                    | Slower to train
Space Complexity | Typically uses fewer memory resources              | Requires more memory resources
Performance      | Performs similarly to LSTM; sometimes better       | Performs well on many tasks but more computationally expensive
Task Suitability | Generally suitable for large datasets or sequences | Effective for tasks like natural language understanding and machine translation
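The parameter-count row of the table can be checked directly: a GRU layer carries three sets of gate weights against the LSTM's four, so for the same layer sizes it has roughly three quarters as many parameters. The sizes in the sketch below are arbitrary.

```python
import torch.nn as nn

input_dim, hidden_dim = 32, 64  # illustrative sizes

def count_params(module):
    return sum(p.numel() for p in module.parameters())

gru = nn.GRU(input_dim, hidden_dim)    # 3 sets of gate weights
lstm = nn.LSTM(input_dim, hidden_dim)  # 4 sets of gate weights

print("GRU parameters: ", count_params(gru))   # roughly 3/4 of the LSTM's count
print("LSTM parameters:", count_params(lstm))
```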
