Module2 L7 RNN LSTM

Recurrent Neural Network (RNN)

• Recurrent neural networks (RNNs) are state-of-the-art algorithms for sequential data and are used by Apple's Siri and Google's voice search.
• Sequential data?
• When the points in a dataset depend on other points in the dataset, the data is said to be sequential.
• Ex: time-series data, stock market price data, words in a sentence, gene sequence data, etc.
• Why can't an ANN be used for sequential data?
• It doesn't consider the dependencies within sequential data.
• Ex: given time-series data, develop a DNN to predict the outlook of a day as sunny/rainy/windy.
• A traditional NN makes the prediction for each observation independently of the other observations.
• This violates the fact that the weather on a particular day is strongly correlated with the weather of the previous day and the following day.
• A traditional neural network assumes the data is non-sequential and that each data point is independent of the other data points.
• Hence, the inputs are analyzed in isolation, which causes problems when there are dependencies in the data.
• In traditional neural networks, all the inputs and outputs are independent of each other; but when the task is to predict the next word of a sentence, the previous words are required, and hence there is a need to remember the previous words.
• RNNs are a type of neural network where the output from the previous step is fed as input to the current step.

• The most important feature of an RNN is the hidden state, which remembers some information about the sequence.
• An RNN has a "memory" which remembers information about what has been calculated in the previous steps.
• It uses the same parameters for each input, since it performs the same task on all the inputs or hidden layers to produce the output.
• This reduces the number of parameters, unlike other neural networks.
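A minimal sketch (not from the slides) of the idea above: the same weights W_xh, W_hh and bias b_h are reused at every time step, and the hidden state h carries information from earlier inputs forward. The dimensions and the tanh activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3

# the SAME parameters are reused at every time step
W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1   # input-to-hidden weights
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1  # hidden-to-hidden weights
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One recurrent step: combine the current input with the previous hidden state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(hidden_size)                        # initial hidden state ("memory")
sequence = rng.standard_normal((5, input_size))  # a toy sequence of 5 inputs
for x_t in sequence:
    h = rnn_step(x_t, h)                         # h now summarizes everything seen so far
print(h)
```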
Some Applications of RNN
Why not ANN?
• 1. An issue with using an ANN for language translation is that we cannot fix the number of neurons in a layer; it depends on the number of words in the input sentence.
Why not ANN?

• 2. Too many computations.

• Input words have to be converted to vectors (one-hot encoding or word2vec embeddings).
• Hence, that many neurons and parameters have to be learnt by the model.
Why not ANN?
• 3. Doesn't preserve the sequence relationships in the input data.
• A traditional neural network assumes the data is non-sequential and that each data point is independent of the other data points.
• Hence, the inputs are analyzed in isolation, which causes problems when there are dependencies in the data.
• Since each hidden layer has its own weights, biases and activations, the layers behave independently.
• When the input is sequence data, the model should also be able to identify the relationship between successive inputs.
• If the task is to predict the next word in a sentence using an MLP, this will not help: all hidden layers, with different weights and biases, work independently.
• To make the hidden layers preserve the sequence relationship in the input, all the hidden layers have to be combined.
• To combine them, use the same weights and activation functions.
• All these hidden layers can then be rolled together into a single recurrent layer.
How does an RNN work?
• Neurons in the recurrent layer are called recurrent neurons.
• At all time steps, the weights of the recurrent neurons are the same.
• So a recurrent neuron stores the state of a previous input and combines it with the current input, thereby preserving some relationship of the current input with the previous inputs.
• An RNN converts independent activations into dependent activations by providing the same weights and biases to all the layers, thus reducing the complexity of increasing parameters, and it memorizes each previous output by giving it as input to the next hidden layer.
• The entire RNN computation involves computations to update the cell state at each time step and computations to predict the output at that time step.
• During the forward pass, we calculate the output at each time step in order to calculate the individual loss at each time step.
• The individual losses are combined to form the total loss.
• This total loss is used to train the neural network; a sketch of this forward pass follows.
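A hedged sketch of the forward pass just described: the network produces an output at every time step, a per-step loss is computed, and the per-step losses are summed into the total loss used for training. The squared-error loss and the dimensions are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size, output_size, T = 4, 3, 2, 6

W_xh = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
W_hy = rng.standard_normal((output_size, hidden_size)) * 0.1
b_h, b_y = np.zeros(hidden_size), np.zeros(output_size)

xs = rng.standard_normal((T, input_size))   # input sequence
ys = rng.standard_normal((T, output_size))  # a target at every time step

h, total_loss = np.zeros(hidden_size), 0.0
for t in range(T):
    h = np.tanh(W_xh @ xs[t] + W_hh @ h + b_h)        # update the hidden/cell state
    y_hat = W_hy @ h + b_y                            # prediction at this time step
    total_loss += 0.5 * np.sum((y_hat - ys[t]) ** 2)  # individual loss, accumulated
print(total_loss)  # this total loss is what training differentiates
```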
Desirable Characteristics of RNN for Sequence Modeling

• Ability to handle sequences of variable lengths.
• Information about the next word to be predicted in the sequence might be present much earlier, at the beginning of the sequence.

• Ability to capture and model long-term dependencies.
• This is possible since RNNs keep updating information collected from the past by updating their recurrent/hidden cell state at each time step.
Desirable Characteristics of RNN for Sequence Modeling…

• Ability to capture differences in sequence order.
• Two sentences may contain the same words but have different meanings.
• RNNs capture this difference, since they use the same weight matrices at each time step to update the hidden state and remember past information.
Backpropagation Through Time (BPTT)
• In a FFNN, the gradient of the loss function is backpropagated through one feed-forward network in one time step/input.
• But in an RNN, the gradient of the total error is propagated to the individual time steps and also across the time steps, from the most recent time step back to the very beginning of the sequence.
• Hence the name "Backpropagation Through Time". A sketch using automatic differentiation follows.
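A small sketch of backpropagation through time using automatic differentiation (PyTorch is assumed here, not prescribed by the slides): the total loss over all time steps is backpropagated through the unrolled recurrence, so the shared weights receive gradient contributions from every time step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=4, hidden_size=3, batch_first=True)
readout = nn.Linear(3, 2)

x = torch.randn(1, 6, 4)             # (batch, time steps, features)
targets = torch.randn(1, 6, 2)       # a target at every time step

outputs, _ = rnn(x)                  # hidden states for all 6 time steps
loss = nn.functional.mse_loss(readout(outputs), targets)  # total loss over all steps
loss.backward()                      # gradients flow back through every time step
print(rnn.weight_hh_l0.grad.shape)   # the shared W_hh receives gradient from all steps
```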
Types of RNN
• One to One RNN
• One to Many RNN
• Many to One RNN
• Many to Many RNN
One to One RNN
• One-to-One RNN is the most basic and traditional type of neural network, giving a single output for a single input.
• It is also known as a Vanilla Neural Network. It is used to solve regular machine learning problems. Ex: image classification

One to Many
• One-to-Many is a kind of RNN architecture applied in situations that require multiple outputs for a single input.

• Image Captioning – Here, let's say we have an image for which we need a textual description. So we have a single input – the image – and a series or sequence of words as output. The image might be of a fixed size, but the output is a description of varying length.
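A hedged sketch of a one-to-many setup in the image-captioning style described above (PyTorch assumed; all names, sizes and the dummy decoder inputs are illustrative assumptions): a single image feature vector initialises the hidden state, and the RNN then emits one word-score vector per output time step.

```python
import torch
import torch.nn as nn

feat_dim, hidden_dim, vocab_size, caption_len = 16, 32, 500, 7

img_to_h0 = nn.Linear(feat_dim, hidden_dim)      # map image feature -> initial hidden state
decoder = nn.RNN(hidden_dim, hidden_dim, batch_first=True)
to_vocab = nn.Linear(hidden_dim, vocab_size)

image_feature = torch.randn(1, feat_dim)                   # the single input
h0 = torch.tanh(img_to_h0(image_feature)).unsqueeze(0)     # (num_layers, batch, hidden)
steps = torch.zeros(1, caption_len, hidden_dim)            # dummy decoder inputs for the sketch
outputs, _ = decoder(steps, h0)
word_scores = to_vocab(outputs)                  # (1, caption_len, vocab_size): many outputs
print(word_scores.shape)
```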
Many to One
• It takes a sequence of information as input and produces a fixed-size output.
• The many-to-one RNN architecture is commonly used for sentiment analysis models. As the name suggests, this kind of model is used when multiple inputs are required to give a single output.
• Take, for example, a Twitter sentiment analysis model. In that model, a text input (words as multiple inputs) gives a fixed sentiment (single output).
• Another example could be a movie rating model that takes review texts as input and provides a rating for the movie that may range from 1 to 5. A sketch of such a many-to-one model follows.
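A hedged sketch of a many-to-one model along the lines described above (PyTorch assumed; the class name SentimentRNN, the vocabulary size, the dimensions and the final sigmoid are illustrative assumptions): a sequence of word indices goes in, and a single sentiment score comes out.

```python
import torch
import torch.nn as nn

class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, token_ids):                    # (batch, seq_len) of word indices
        _, h_last = self.rnn(self.embed(token_ids))  # keep only the final hidden state
        return torch.sigmoid(self.out(h_last[-1]))   # one output per whole sequence

model = SentimentRNN()
tokens = torch.randint(0, 1000, (2, 12))             # 2 sentences, 12 words each
print(model(tokens).shape)                           # torch.Size([2, 1])
```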
Many-to-Many
• The Many-to-Many RNN architecture takes multiple inputs and gives multiple outputs.
• Ex: language translation
• The input is a sentence with many words, and the output is the translated sentence, which is also a sequence of many words.
Problem of Long-Term Dependencies
• The problem of vanishing gradients:
• Multiplying two small numbers (gradients) results in an even smaller number (gradient).
• It becomes harder and harder for the neurons to propagate the error back to the earlier stages.
• Hence, the parameters will be biased to capture only short-term dependencies.
• RNNs predict the next word in a sequence based on relevant information from the distant past.
• If the distance between the distant past and the current time step is small, RNNs predict the next word correctly.
• As the sequence length increases, RNNs won't be able to remember the relevant information in the distant past and predict the next word.

• This is common in real-life use cases with long sequences.

• This is due to the vanishing gradient problem.
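A tiny numeric illustration (the values are made up) of why gradients vanish: if the per-step gradient factor is smaller than 1, its product over many time steps shrinks toward zero, so distant time steps contribute almost nothing to the parameter update.

```python
# repeatedly multiplying gradients smaller than 1 drives the contribution
# from distant time steps toward zero
local_gradient = 0.5
for steps_back in (1, 5, 10, 20, 50):
    print(steps_back, local_gradient ** steps_back)
# 1 -> 0.5, 5 -> 0.03125, 10 -> ~0.00098, 20 -> ~9.5e-07, 50 -> ~8.9e-16
```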
A solution to the vanishing gradient problem of RNNs:

• Keep track of long-term dependencies by using "gates".

• These "gates" control which information from the "distant past" should be passed through the network to update the current cell state.

• The most commonly used variants of RNNs that are capable of remembering long-term dependencies using "gated cells" are the LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit).

• The "gates" perform different tensor operations to decide which information can be removed from or added to the current hidden state.
Problems of RNN
• Recurrent neural networks suffer from short-term memory.
• If a sequence is long enough, they'll have a hard time carrying information from earlier time steps to later ones. So when processing a paragraph of text to make predictions, RNNs may leave out important information from the beginning.

• During backpropagation, recurrent neural networks suffer from the vanishing gradient problem.
• Layers that get a small gradient update stop learning.
• Those are usually the earlier layers.
• Because these layers don't learn, RNNs can forget what they have seen in longer sequences, thus having a short-term memory.
LSTMs and GRUs as a solution
• LSTMs and GRUs were created as a solution to short-term memory. They have internal mechanisms called gates that can regulate the flow of information.
• These gates can learn which data in a sequence is important to keep or throw away.
• By doing that, they can pass relevant information down the long chain of sequences to make predictions.
• LSTMs and GRUs can be found in speech recognition, speech synthesis, and text generation. You can even use them to generate captions for videos.
Recap of RNN
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21

• How does a cell in an RNN calculate the hidden state (short-term memory)?

• It combines the current input and the previous hidden state into a vector.
• This vector has information on the current input and the previous inputs.
• The vector goes through the tanh activation, and the output is the new hidden state, or the memory of the network.
Recap of RNN
• The tanh activation is used to help regulate the values flowing through the network.
• It squishes values to always be between -1 and 1.
• RNNs require relatively few computational resources.
• But they work well only for shorter sequences.
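A minimal sketch of that recap (the single combined weight matrix is an assumption of this sketch): the previous hidden state and the current input are joined into one vector and passed through tanh, giving the new hidden state, whose values always lie in (-1, 1).

```python
import numpy as np

rng = np.random.default_rng(2)
hidden_size, input_size = 3, 4
W = rng.standard_normal((hidden_size, hidden_size + input_size)) * 0.1
b = np.zeros(hidden_size)

h_prev = np.zeros(hidden_size)
x_t = rng.standard_normal(input_size)

combined = np.concatenate([h_prev, x_t])  # previous hidden state + current input
h_t = np.tanh(W @ combined + b)           # new hidden state, every value in (-1, 1)
print(h_t)
```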
LSTM
• An LSTM has a control flow similar to that of a recurrent neural network.
• It processes data, passing on information as it propagates forward.
• The differences are the operations within the LSTM's cells.
• LSTMs keep short-term memory in "hidden states" and long-term memory in "cell states".
• The core concepts of LSTMs are the cell state and its various gates.
• The cell state transfers relevant information all the way down the sequence chain.
• It helps to preserve the "long-term memory" of the network.
• The cell state helps information from earlier time steps make its way to later time steps, reducing the effects of short-term memory.
• As the cell state goes on its journey, information gets added to or removed from the cell state via gates.
• The gates are different neural networks that decide which information is allowed on the cell state.
• The gates can learn which information is relevant to keep or forget during training.
LSTM…

• Gates use the "sigmoid" activation function to update or forget data.

• Sigmoid squishes its input values to between 0 and 1.
• If sigmoid squishes its input X closer to 0, then X is forgotten.
• If sigmoid squishes its input X closer to 1, then X is kept.
• Using sigmoid, the network can learn which data is not important and can therefore be forgotten, and which data is important to keep.
• Three different gates regulate information flow in an LSTM cell: a forget gate, an input gate, and an output gate.
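A small illustration (the numbers are made up) of sigmoid gating: multiplying a candidate value by a gate output near 0 effectively forgets it, while a gate output near 1 keeps it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

candidate = np.array([2.0, -3.0, 0.5])
gate = sigmoid(np.array([-6.0, 6.0, 0.0]))  # ~0.0025, ~0.9975, 0.5
print(gate * candidate)                     # first value nearly erased, second kept
```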
LSTM

Forget Gate:
• This gate decides what information should be thrown away or kept.
• Information from the previous hidden state and from the current input is passed through the sigmoid function.
• Values come out between 0 and 1.
• Closer to 0 means forget; closer to 1 means keep.
Input Gate:
• The input gate has 2 layers:
• A "tanh layer" generates a vector of new candidate information that could be written to the cell state.
• A "sigmoid layer" decides which parts of that candidate information should actually be added to the cell state.
Cell State:
• Now we have enough information to calculate the new cell state.
• First, the previous cell state gets pointwise multiplied by the forget vector.
• This can drop values in the cell state if they are multiplied by values near 0.
• Then we take the output from the input gate and do a pointwise addition, which updates the cell state with the new values that the neural network finds relevant.
• That gives us the new cell state, as the sketch below shows.
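A hedged numpy sketch of one full LSTM cell step following the gate descriptions above (toy dimensions; the concatenation of h_prev and x_t and the weight initialisation are assumptions of the sketch).

```python
import numpy as np

rng = np.random.default_rng(3)
input_size, hidden_size = 4, 3
concat_size = hidden_size + input_size

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# one weight matrix per gate / candidate (forget, input, candidate, output)
W_f, W_i, W_c, W_o = (rng.standard_normal((hidden_size, concat_size)) * 0.1 for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])     # previous hidden state + current input
    f = sigmoid(W_f @ z + b_f)            # forget gate: what to drop from the old cell state
    i = sigmoid(W_i @ z + b_i)            # input gate: which candidate values to admit
    c_hat = np.tanh(W_c @ z + b_c)        # candidate new information
    c = f * c_prev + i * c_hat            # pointwise multiply (forget) + pointwise add (update)
    o = sigmoid(W_o @ z + b_o)            # output gate
    h = o * np.tanh(c)                    # part of the cell state exposed as the hidden state
    return h, c

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.standard_normal((5, input_size)):   # a toy sequence of 5 inputs
    h, c = lstm_step(x_t, h, c)
print(h, c)
```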
To summarize the LSTM:
• The "forget gate", with a sigmoid function, decides which information from the previous steps is to be forgotten.
• The "input gate", with a tanh and a sigmoid function, decides what new information is added to the cell state.

• The cell state is updated using the outputs of the previous two gates (forget and input).
• The "output gate", with a sigmoid layer applied to the current input and previous hidden state and a tanh applied to the cell state, decides which part of the cell state is output to the hidden state.
• To review,
• the forget gate decides what is relevant to keep from prior steps.
• The input gate decides what information is relevant to add from the current step.
• The output gate determines what the next hidden state should be.
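As a quick usage note tying the summary together (PyTorch assumed), nn.LSTM exposes both memories discussed above: the hidden state (short-term memory) and the cell state (long-term memory).

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True)
x = torch.randn(2, 10, 4)          # 2 sequences, 10 time steps, 4 features
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)               # hidden state at every step: (2, 10, 3)
print(h_n.shape, c_n.shape)        # final hidden and cell states: (1, 2, 3) each
```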
