RNN
RNNs can remember their previous inputs, whereas standard feed-forward neural networks cannot. An RNN carries historical information forward in a hidden state and uses it in the computation at every timestep.
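As a minimal sketch of this "memory" (numpy; the tanh cell, the sizes, and the weight names U and W are assumptions made for this sketch, not taken from the slides), the hidden state is carried from one timestep to the next and combined with each new input:

import numpy as np

# Hypothetical sizes, for illustration only
input_size, hidden_size = 3, 4
rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_size, input_size))   # input-to-hidden weights
W = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                        # hidden state: the network's "memory"
for x_t in rng.normal(size=(5, input_size)):     # a toy sequence of 5 inputs
    h = np.tanh(U @ x_t + W @ h + b)             # h depends on the current input AND all previous ones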
Types of RNN
There are several RNN architectures, distinguished by the number of inputs and outputs:
1. One-to-Many Architecture: Image captioning is a good example of this architecture. The network takes one image and outputs a sequence of words, so there is one input but many outputs.
2. Many-to-One Architecture: Sentiment classification is a good example of this architecture. A given sentence is classified as positive or negative, so the input is a sequence of words and the output is a single binary label.
3. Many-to-Many Architecture: There are two cases here, one where the input and output sequences have the same length (e.g. labelling every frame of a video) and one where they differ (e.g. machine translation).
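As a rough sketch of the input/output shapes these three architectures imply (the names T_x, T_y, d_in, d_out and the numbers are illustrative assumptions, not from the slides):

# Illustrative shapes only
T_x, T_y, d_in, d_out = 10, 7, 300, 2

one_to_many  = {"input": (d_in,),      "output": (T_y, d_out)}   # image -> caption words
many_to_one  = {"input": (T_x, d_in),  "output": (d_out,)}       # word sequence -> sentiment label
many_to_many = {"input": (T_x, d_in),  "output": (T_y, d_out)}   # e.g. translation, where T_x != T_y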
[Figure: an unrolled recurrent neural network]
The Problem of Long-Term Dependencies
RNN short-term dependencies
A language model trying to predict the next word based on the previous ones. When the relevant context is recent (e.g. predicting "sky" in "the clouds are in the sky"), the gap between where the information appears and where it is needed is small, and a standard RNN handles it well.
[Figure: unrolled RNN over inputs x_0 ... x_4, where the needed context is only a few steps back]
RNN long-term dependencies
A language model trying to predict the next word based on the previous ones. When the gap is large (e.g. predicting "French" at the end of "I grew up in France ... I speak fluent French"), the relevant information lies many steps back, and in practice standard RNNs struggle to learn such long-term dependencies.
[Figure: unrolled RNN over inputs x_0 ... x_t with hidden states h_0 ... h_t, where the needed context is many steps back]
Standard RNN
How RNN works
The input layer ‘x’ takes in the input to the network, processes it, and passes it on to the middle layer.
The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation function, weights, and biases. In a network where the parameters of the various hidden layers are not affected by the previous layer, the network has no memory of earlier inputs; a recurrent neural network instead shares the same parameters across timesteps and feeds the hidden state back into itself.
[Figure: RNN unrolled over timesteps t, with the same weights shared at every step: U (input to hidden), W (hidden to hidden), V (hidden to output)]
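With U, W and V as in the figure above, the per-timestep computation of a standard RNN is commonly written as follows (the tanh and softmax choices are common defaults, not something these slides specify):

h_t = \tanh(U x_t + W h_{t-1} + b_h)
y_t = \mathrm{softmax}(V h_t + b_y)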
RNN forward pass
In the forward pass, at a particular timestep, the input vector and the hidden state vector from the previous timestep are each multiplied by their respective weight matrices and then summed at the addition node.
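A minimal numpy sketch of that single-timestep computation, reusing the U, W, V naming from the figure (the tanh nonlinearity and the exact function signature are assumptions made for this sketch):

import numpy as np

def rnn_step_forward(x_t, h_prev, U, W, V, b_h, b_y):
    # addition node: input term + recurrent term (+ bias)
    a_t = U @ x_t + W @ h_prev + b_h
    h_t = np.tanh(a_t)                 # new hidden state
    y_t = V @ h_t + b_y                # output scores at this timestep
    return h_t, y_t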
In the backward pass, the gradient with respect to the hidden state coming from the output at this timestep and the gradient flowing back from the next timestep meet at the copy node, where they are summed.
Backpropagation through time
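A hedged sketch of backpropagation through time that pairs with the rnn_step_forward sketch above: it walks the sequence in reverse, sums the two gradients at the copy node, and accumulates the weight gradients across timesteps (bias gradients omitted; the dys inputs are assumed to be the per-timestep loss gradients):

import numpy as np

def rnn_backward_through_time(xs, hs, dys, U, W, V):
    # xs : inputs x_0 .. x_{T-1}
    # hs : hidden states from the forward pass; hs[0] is the initial state, hs[t+1] == h_t
    # dys: gradients of the loss with respect to each output y_t
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros(W.shape[0])        # gradient arriving from the future timestep
    for t in reversed(range(len(xs))):
        h_t, h_prev = hs[t + 1], hs[t]
        dV += np.outer(dys[t], h_t)       # y_t = V h_t  ->  gradient for V
        dh = V.T @ dys[t] + dh_next       # copy node: the two gradients are summed
        da = (1.0 - h_t ** 2) * dh        # backprop through tanh
        dU += np.outer(da, xs[t])
        dW += np.outer(da, h_prev)
        dh_next = W.T @ da                # passed on to the previous timestep
    return dU, dW, dV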
The Vanishing Gradient Problem
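One standard way to see where the problem comes from (a textbook derivation using the tanh recurrence above, not something specific to these slides): the gradient that reaches timestep k from timestep t is a product of per-step Jacobians,

\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{i=k+1}^{t} \mathrm{diag}(1 - h_i^2)\, W

When the norms of these factors are below 1, the product shrinks exponentially in t - k, so gradients from distant timesteps effectively vanish; when they are above 1, gradients can explode instead.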
Activation function
References
● http://colah.github.io/posts/2015-08-Understanding-LSTMs/
● http://www.wildml.com/
● http://nikhilbuduma.com/2015/01/11/a-deep-dive-into-recurrent-neural-networks/
● http://deeplearning.net/tutorial/lstm.html
● https://theclevermachine.files.wordpress.com/2014/09/act-funs.png
● http://blog.terminal.com/demistifying-long-short-term-memory-lstm-recurrent-neural-networks/
● Zachary C. Lipton, John Berkowitz, A Critical Review of Recurrent Neural Networks for Sequence Learning
● Hochreiter, Sepp and Schmidhuber, Jürgen (1997), 'Long Short-Term Memory'
● Gers, F. A.; Schmidhuber, J. & Cummins, F. A. (2000), 'Learning to Forget: Continual Prediction with LSTM', Neural Computation 12 (10), 2451-2471