
Lec 4

Recurrent neural network


Long short-term memory
Recurrent Neural Networks
Sequence Models
Sequence models are machine learning models whose inputs or outputs are sequences of data.
Sequential data includes text streams, DNA sequences, audio clips, video clips, and time-series data.
Recurrent Neural Networks (RNNs) are a popular architecture for sequence models.
Difference between RNN and traditional DNN
Applications of Sequence Models
1. Speech recognition: An audio clip is given as input and the model has to generate its text transcript. Here both the input and the output are sequences of data.
2. Sentiment Classification: Opinions expressed in a piece of text are categorized. Here the input is a sequence of words.
3. Video Activity Recognition: The model needs to identify the activity in a video clip. A video clip is a sequence of video frames, so here the input is again a sequence of data.
All of the previous examples require sequence models.
Definition of RNN
A Recurrent Neural Network (RNN) is a deep learning model: a type of artificial neural network architecture specialized for processing sequential data.
RNNs are widely used in Natural Language Processing (NLP). Because an RNN maintains an internal memory, it is well suited to machine learning problems that involve sequential data.
RNNs are also used for time-series prediction.
Advantages of RNN
The main advantage of RNNs over standard neural networks is parameter sharing: an RNN applies the same weights at every time step, whereas a standard network does not share features across input positions.

An RNN can remember its previous inputs, while a standard neural network cannot; the RNN uses this historical information in its computation.
Types of RNN
There are several RNN architectures, distinguished by the number of inputs and outputs:
1. One-to-Many Architecture: Image captioning is a good example. The model takes a single image as input and outputs a sequence of words: one input, many outputs.
2. Many-to-One Architecture: Sentiment classification is a good example. A given sentence is classified as positive or negative: the input is a sequence of words and the output is a binary classification.
3. Many-to-Many Architecture: There are two cases: the input and output sequences have the same length (for example, labelling every frame of a video), or they have different lengths (for example, machine translation). A minimal sketch of the many-to-one and many-to-many layouts follows below.
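A minimal sketch (not from the lecture) of how one recurrent cell supports the many-to-one and many-to-many layouts. The NumPy setup and the names rnn_step, U (input-to-hidden weights), W (hidden-to-hidden weights), V (hidden-to-output weights), b_h and b_y are illustrative assumptions.

import numpy as np

def rnn_step(x, h_prev, U, W, b_h):
    # One recurrence step: new hidden state from the current input and the previous state.
    return np.tanh(U @ x + W @ h_prev + b_h)

def many_to_one(xs, h0, U, W, V, b_h, b_y):
    # e.g. sentiment classification: read the whole sequence, emit a single output.
    h = h0
    for x in xs:                      # consume every input step
        h = rnn_step(x, h, U, W, b_h)
    return V @ h + b_y                # one output at the very end

def many_to_many(xs, h0, U, W, V, b_h, b_y):
    # e.g. labelling every frame of a video: one output per input step.
    h, ys = h0, []
    for x in xs:
        h = rnn_step(x, h, U, W, b_h)
        ys.append(V @ h + b_y)        # an output at every step
    return ys

The same weight matrices U, W, and V are reused at every time step; only the wiring of inputs to outputs changes between the architectures.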
Figure: an unrolled recurrent neural network.
The Problem of Long-Term Dependencies
RNN short-term dependencies
A language model trying to predict the next word based on the previous ones, e.g. "the clouds are in the sky": the relevant context ("clouds") appears only a few steps before the word being predicted.
[Figure: an RNN unrolled over five steps, with inputs x0–x4, a repeated cell A, and hidden states h0–h4; the gap between the relevant input and the prediction is small.]
RNN long-term dependencies
A language model trying to predict the next word based on the previous ones, e.g. "I grew up in India… I speak fluent Hindi.": the relevant context ("India") appears many steps before the word being predicted ("Hindi").
[Figure: an RNN unrolled over a long sequence, with inputs x0, x1, x2, …, xt−1, xt, a repeated cell A, and hidden states h0, h1, h2, …, ht−1, ht; the gap between the relevant input and the prediction is large.]
Standard RNN
How RNN works
The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to the middle layer.

The middle layer ‘h’ can comprise multiple hidden layers, each with its own activation functions, weights, and biases; in an ordinary feed-forward network the parameters of these hidden layers are independent of one another and are not affected by the previous layer.

The Recurrent Neural Network instead standardizes the activation functions, weights, and biases so that each hidden layer has identical parameters. Then, rather than creating multiple hidden layers, it creates one and loops over it as many times as needed (the recurrence is summarized below).
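In standard notation (a summary, not taken from the slides; U, W, and V denote the input-to-hidden, hidden-to-hidden, and hidden-to-output weight matrices, and b_h, b_y the biases, matching the labels in the forward-pass figure below):

h_t = \tanh(U x_t + W h_{t-1} + b_h)
\hat{y}_t = \mathrm{softmax}(V h_t + b_y)

The same U, W, V, b_h, and b_y are applied at every time step t.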
Backpropagation Through Time (BPTT)
RNN forward pass

[Figure: an RNN unrolled over time t, with the same weight matrices reused at every step: U from the input to the hidden state, W from the previous hidden state to the current one, and V from the hidden state to the output.]
In the forward pass, at a particular time step, the input vector and the hidden state vector from the previous time step are multiplied by their respective weight matrices and summed at the addition node.

The result then passes through a non-linear function, and the new hidden state is copied: one copy goes as an input to the next time step, and the other goes into the classification head, where it is multiplied by a weight matrix to obtain the logits vector before the cross-entropy loss is computed.

This is a typical generative RNN setup, in which the network is modelled so that, given an input character, it predicts the probability distribution over the next character. A sketch of this forward pass is given below.
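A minimal sketch of such a forward pass for a character-level generative RNN, assuming one-hot encoded characters and the U, W, V naming used above; the function and variable names are illustrative rather than the lecture's.

import numpy as np

def rnn_forward(inputs, targets, h_prev, U, W, V, b_h, b_y):
    # inputs, targets: lists of integer character indices (targets[t] is the character
    # that should follow inputs[t]).  Returns the total cross-entropy loss and the
    # intermediate values (caches) needed by the backward pass.
    xs, hs, ps = {}, {}, {}
    hs[-1] = np.copy(h_prev)
    loss = 0.0
    vocab_size = U.shape[1]
    for t in range(len(inputs)):
        xs[t] = np.zeros((vocab_size, 1))
        xs[t][inputs[t]] = 1.0                     # one-hot input vector
        # Addition node followed by the tanh non-linearity: the new hidden state.
        hs[t] = np.tanh(U @ xs[t] + W @ hs[t - 1] + b_h)
        logits = V @ hs[t] + b_y                   # classification head
        exp = np.exp(logits - np.max(logits))      # numerically stable softmax
        ps[t] = exp / np.sum(exp)
        loss += -np.log(ps[t][targets[t], 0])      # cross-entropy for the true next character
    return loss, xs, hs, ps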
RNN backward pass
In the backward pass, we start from the end and compute the gradient of the classification loss with respect to the logits vector (the details were discussed in the previous section).

This gradient flows backward to the matrix-multiplication node, where we compute the gradients with respect to both the weight matrix and the hidden state. The gradient with respect to the hidden state flows backward to the copy node, where it meets the gradient from the previous step of the backward pass, i.e. the gradient flowing back from the next time step in the sequence.

An RNN processes a sequence one step at a time, so during backpropagation the gradients flow backward across time steps. This is called backpropagation through time.

So, at the copy node, the gradient with respect to the hidden state and the gradient from that later time step are summed. A matching sketch of the backward pass is given below.
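A sketch of this backward pass (illustrative; it assumes the caches produced by the rnn_forward sketch above):

import numpy as np

def rnn_backward(inputs, targets, xs, hs, ps, U, W, V):
    # Gradients of the total loss with respect to each parameter, accumulated over time.
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    db_h = np.zeros((W.shape[0], 1))
    db_y = np.zeros((V.shape[0], 1))
    dh_next = np.zeros((W.shape[0], 1))      # gradient flowing back from the later time step
    for t in reversed(range(len(inputs))):
        dlogits = np.copy(ps[t])
        dlogits[targets[t]] -= 1.0           # gradient of cross-entropy wrt the logits
        dV += dlogits @ hs[t].T
        db_y += dlogits
        # Copy node: the gradient from the classification head and the gradient
        # arriving from the later time step are summed.
        dh = V.T @ dlogits + dh_next
        dh_raw = (1.0 - hs[t] ** 2) * dh     # backprop through the tanh non-linearity
        db_h += dh_raw
        dU += dh_raw @ xs[t].T
        dW += dh_raw @ hs[t - 1].T
        dh_next = W.T @ dh_raw               # passed backward to the previous time step
    return dU, dW, dV, db_h, db_y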
Backpropagation through time
The Vanishing Gradient Problem
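The slides do not derive this, but a small illustrative experiment (arbitrary sizes and randomly chosen weights, assumed purely for demonstration) shows the effect: during BPTT the gradient reaching early time steps is a product of many Jacobians of the tanh recurrence, and its norm typically decays toward zero.

import numpy as np

np.random.seed(0)
H, T = 50, 60                               # hidden size and sequence length (arbitrary)
W = np.random.randn(H, H) * 0.1             # hidden-to-hidden weights, small scale
h = np.zeros((H, 1))
hs = []
for t in range(T):                          # forward: the tanh recurrence with random inputs
    h = np.tanh(W @ h + np.random.randn(H, 1))
    hs.append(h)

dh = np.random.randn(H, 1)                  # a gradient arriving at the final hidden state
for t in reversed(range(T)):                # backward: repeated Jacobian products
    dh = W.T @ ((1.0 - hs[t] ** 2) * dh)
    if t % 10 == 0:
        print(f"step {t:2d}: ||dL/dh_t|| = {np.linalg.norm(dh):.3e}")
# The printed norms shrink rapidly for earlier steps: the vanishing gradient problem.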
Activation function
References
● http://colah.github.io/posts/2015-08-Understanding-LSTMs/
● http://www.wildml.com/
● http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/
● http://deeplearning.net/tutorial/lstm.html
● https://theclevermachine.files.wordpress.com/2014/09/act-funs.png
● http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
● Zachary C. Lipton and John Berkowitz, "A Critical Review of Recurrent Neural Networks for Sequence Learning".
● Hochreiter, Sepp and Schmidhuber, Jürgen (1997), "Long Short-Term Memory".
● Gers, F. A., Schmidhuber, J. and Cummins, F. A. (2000), "Learning to Forget: Continual Prediction with LSTM", Neural Computation 12(10), 2451–2471.