Module2 L7 RNN LSTM
• Recurrent neural networks (RNNs) are state-of-the-art algorithms
for sequential data and are used by Apple's Siri and Google's voice
search.
• Sequential data?
• If the points in a dataset depend on other points in the same dataset,
the data is said to be sequential.
• Ex: time-series data, stock market price data, words in a sentence, gene
sequence data, etc.
• Why can't an ANN be used for sequential data?
• It does not consider the dependencies within a sequence.
• Ex: Given time-series data, develop a DNN to predict the outlook of a
day as sunny/rainy/windy.
• A traditional NN makes the prediction for each observation independently of the other observations.
• This violates the fact that the weather on a particular day is strongly correlated with the weather of the previous
day and the following day.
• A traditional neural network assumes the data is non-sequential, and that each data point is independent of
the other data points.
• Hence, the inputs are analyzed in isolation, which causes problems when there are dependencies in the
data.
• In traditional neural networks, all inputs and outputs are independent of each other; but when we need to
predict the next word of a sentence, the previous words are required, and hence there is a need to
remember them.
• An RNN is a type of neural network in which the output from the previous step is fed
as input to the current step.
• The most important feature of an RNN is the hidden state, which remembers some information about a
sequence.
• An RNN has a "memory" which retains information about what has been calculated in the
previous steps.
• It uses the same parameters for each input, since it performs the same task on all inputs or
hidden states to produce the output.
• This reduces the number of parameters, unlike other neural networks.
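The recurrence and parameter sharing described above can be sketched in a few lines of NumPy (the names `rnn_step`, `W_xh`, `W_hh`, `b_h` are illustrative, not from any particular library):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The SAME parameters (W_xh, W_hh, b_h) are reused at every time
    # step -- this is the parameter sharing described above.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))        # input -> hidden weights
W_hh = rng.normal(size=(3, 3))        # hidden -> hidden weights
b_h = np.zeros(3)

h = np.zeros(3)                       # initial hidden state ("memory")
for x_t in rng.normal(size=(5, 4)):   # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # h carries info forward
```

Each step's hidden state depends on the whole history through `h_prev`, which is exactly the dependency a feed-forward ANN cannot capture.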
Some Applications of RNN
Why not ANN?
• 1. An issue with using an ANN for language translation is that we cannot
fix the number of neurons in a layer; it depends on the number of words in the
input sentence.
One to Many
• One-to-Many is a kind of RNN architecture applied
in situations that produce multiple outputs for a single
input.
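A minimal sketch of this one-to-many unrolling (illustrative names; a single input seeds the hidden state and each output is fed back as the next input, as in e.g. image captioning):

```python
import numpy as np

def one_to_many(x0, W_xh, W_hh, W_hy, steps):
    # One input vector x0 seeds the hidden state; the network then
    # produces `steps` outputs, feeding each output back in as the
    # next input.
    h = np.tanh(x0 @ W_xh)
    outputs = []
    for _ in range(steps):
        y = h @ W_hy                        # output for this step
        outputs.append(y)
        h = np.tanh(y @ W_xh + h @ W_hh)    # recycle output as input
    return outputs

rng = np.random.default_rng(0)
outs = one_to_many(x0=rng.normal(size=4),
                   W_xh=rng.normal(size=(4, 3)),
                   W_hh=rng.normal(size=(3, 3)),
                   W_hy=rng.normal(size=(3, 4)),
                   steps=3)
```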
• The most commonly used variants of RNN, which are capable of remembering
long-term dependencies using "gated cells", are the
LSTM (Long Short-Term Memory) and the GRU (Gated Recurrent Unit).
• These "gates" control which information from the "distant past" should be
passed through the network to update the current cell state.
Forget Gate:
• This gate decides what information should be thrown away or kept.
• Information from the previous hidden state and information from the
current input is passed through the sigmoid function.
• Values come out between 0 and 1.
• The closer to 0 means to forget, and the closer to 1 means to keep.
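A sketch of the forget-gate computation just described (weight names are illustrative; with all-zero weights the pre-activation is 0, so the sigmoid outputs exactly 0.5 everywhere):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # Previous hidden state and current input are combined and
    # squashed to (0, 1): near 0 -> forget, near 1 -> keep.
    return sigmoid(np.concatenate([h_prev, x_t]) @ W_f + b_f)

f = forget_gate(h_prev=np.zeros(3), x_t=np.ones(2),
                W_f=np.zeros((5, 3)), b_f=np.zeros(3))
# all-zero weights give a pre-activation of 0, so every entry is 0.5
```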
Input gate:
• The input gate has 2 layers:
• A "tanh layer" generates a vector of new candidate values that could be
written to the cell state.
• A "sigmoid layer" decides which of these candidate values should
actually be kept.
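The tanh/sigmoid pair described above can be sketched like this (illustrative names; tanh proposes candidate values, the sigmoid decides how much of each is kept):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def new_cell_candidates(h_prev, x_t, W_i, b_i, W_c, b_c):
    z = np.concatenate([h_prev, x_t])
    keep = sigmoid(z @ W_i + b_i)     # how much of each value to keep
    cand = np.tanh(z @ W_c + b_c)     # candidate values in (-1, 1)
    return keep * cand                # gated values for the cell state

g = new_cell_candidates(h_prev=np.zeros(3), x_t=np.ones(2),
                        W_i=np.zeros((5, 3)), b_i=np.zeros(3),
                        W_c=np.zeros((5, 3)), b_c=np.zeros(3))
# with zero weights the candidates are 0, so nothing would be written
```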
Cell State :
• Now we should have enough information to calculate the cell state.
• First, the cell state gets pointwise multiplied by the forget vector.
• This has a possibility of dropping values in the cell state if it gets
multiplied by values near 0.
• Then we take the output from the input gate and do a pointwise
addition which updates the cell state to new values that the neural
network finds relevant.
• That gives us our new cell state.
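The two pointwise steps above can be written directly (illustrative names; `f_t` is the forget vector, `i_t * c_tilde` the gated candidates from the input gate):

```python
import numpy as np

def update_cell(c_prev, f_t, i_t, c_tilde):
    # pointwise multiply by the forget vector,
    # then pointwise add the gated new input
    return f_t * c_prev + i_t * c_tilde

c_new = update_cell(c_prev=np.array([1.0, -2.0, 0.5]),
                    f_t=np.array([0.0, 1.0, 0.5]),   # a 0 drops a value
                    i_t=np.array([1.0, 0.0, 1.0]),
                    c_tilde=np.array([0.3, 0.9, -0.4]))
# -> [0.3, -2.0, -0.15]: the first old value is fully forgotten,
#    the second fully kept, the third halved and nudged by new input
```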
To summarize the LSTM:
• The "forget gate", with a sigmoid function, decides which information
from previous steps should be forgotten.
• The “input gate” along with a tanh and a sigmoid function decides
what new inputs are added to the network
• The cell state is updated using the outputs from the previous two
gates.
• In the "output gate", tanh and sigmoid layers decide which parts of the
cell state are output to the hidden state.
• To review,
• the Forget gate decides what is relevant to keep from prior steps.
• The input gate decides what information is relevant to add from the
current step.
• The output gate determines what the next hidden state should be.
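The whole review can be condensed into one LSTM step. This is a hedged sketch, not a library implementation; for brevity a single weight matrix `W` of shape `(hidden+input, 4*hidden)` holds the parameters of all four sigmoid/tanh layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    H = h_prev.size
    z = np.concatenate([h_prev, x_t]) @ W + b
    f_t = sigmoid(z[:H])            # forget gate: keep from prior steps
    i_t = sigmoid(z[H:2 * H])       # input gate: add from current step
    o_t = sigmoid(z[2 * H:3 * H])   # output gate: next hidden state
    c_tilde = np.tanh(z[3 * H:])    # candidate cell values
    c_t = f_t * c_prev + i_t * c_tilde   # update the cell state
    h_t = o_t * np.tanh(c_t)             # expose part of it as hidden
    return h_t, c_t

rng = np.random.default_rng(1)
H, D = 3, 2
W = rng.normal(size=(H + D, 4 * H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(4, D)):      # run over a short sequence
    h, c = lstm_step(x_t, h, c, W, b)
```

Frameworks such as PyTorch's `torch.nn.LSTM` implement this same per-step computation, batched and optimized.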