Fundamentals of Deep Learning – Introduction to Recurrent Neural Networks
Introduction
Let me open this article with a question – “working love learning we on deep”, did this make any sense to you? Not really – now read this one – “We love working on deep learning”. Made perfect sense! A little jumble in the words made the sentence incoherent. Well, can we expect a neural network to make sense of it? Not really! If the human brain was confused about what it meant, I am sure a neural network is going to have a tough time deciphering such text.
There are multiple such tasks in everyday life which get completely disrupted when their sequence is disturbed. For instance, language as we saw earlier – the sequence of words defines their meaning; time series data, where time defines the occurrence of events; or genome sequence data, where every sequence has a different meaning. There are multiple such cases wherein the sequence of information
determines the event itself. If we are trying to use such data for any reasonable output, we need a network which has access to some prior knowledge about the data in order to completely understand it. This is where recurrent neural networks come into play.
In this article I assume that you have a basic understanding of neural networks. In case you need a refresher, please go through this article (https://www.analyticsvidhya.com/blog/2017/05/neural-network-from-scratch-in-python-and-r/) before you proceed.
Before we deep dive into the details of what a recurrent neural network is, let’s ponder a bit on whether we really need a network specially for dealing with sequences of information. Also, what kind of tasks can we achieve using such networks?
The beauty of recurrent neural networks lies in their diversity of application. RNNs can deal with a great variety of input and output types.
Sentiment Classification – This can be a task of simply classifying tweets into positive and negative
sentiment. So here the input would be a tweet of varying lengths, while output is of a fixed type and
size.
Image Captioning – Here, let’s say we have an image for which we need a textual description. So we have a single input – the image, and a series or sequence of words as output. Here the image might be of a fixed size, while the output is a description of varying length.
Language Translation – This basically means that we have some text in a particular language, let’s say English, and we wish to translate it into French. Each language has its own semantics and would have varying lengths for the same sentence. So here both the inputs and the outputs are of varying lengths.
So RNNs can be used for mapping inputs to outputs of varying types and lengths, and are fairly generalized in their application. With these applications in mind, let’s see what the architecture of an RNN looks like.
Let’s say the task is to predict the next word in a sentence. Let’s try accomplishing it using an MLP. So what happens in an MLP? In the simplest form, we have an input layer, a hidden layer and an output layer. The input layer receives the input, the hidden layer activations are applied, and then we finally receive the output.

Now let’s have a deeper network, where multiple hidden layers are present. Here, the input layer receives the input, the first hidden layer activations are applied, these activations are sent to the next hidden layer, and so on through the successive layers until the output is produced. Each hidden layer is characterized by its own weights and biases.
Since each hidden layer has its own weights and activations, they behave independently. Now the objective is
to identify the relationship between successive inputs. Can we supply the inputs to hidden layers? Yes we
can!
Here, the weights and biases of these hidden layers are different. Hence each of these layers behaves independently and they cannot be combined together. To combine these hidden layers, we shall give them the same weights and bias.

We can now combine these layers together, since the weights and bias of all the hidden layers are the same. All these hidden layers can be rolled up into a single recurrent layer.
So it’s like supplying the input to the hidden layer at each time step. At all the time steps, the weights of the recurrent neuron would be the same, since it is a single neuron now. So a recurrent neuron stores the state of a previous input and combines it with the current input, thereby preserving some relationship of the current input with the previous one.
Let’s take a simple task first. We’ll use a character-level RNN where we have the word “Hello”. So we provide the first 4 letters, i.e. h, e, l, l, and ask the network to predict the last letter, i.e. ‘o’. So here the vocabulary of the task is just 4 letters {h,e,l,o}. In real-world scenarios involving natural language processing, the vocabularies include all the words in the entire Wikipedia database, or all the words in a language. Here, for simplicity, we have taken a very small vocabulary.
Let’s see how the above structure can be used to predict the fifth letter in the word “hello”. In the above structure, the blue RNN block applies a recurrence formula to the input vector and to its previous state. In this case, the letter “h” has nothing preceding it, so let’s look at the letter “e”. At the time the letter “e” is supplied to the network, the recurrence formula is applied to the letter “e” and to the previous state, which corresponds to the letter “h”. These are known as the time steps of the input: if at time t the input is “e”, then at time t-1 the input was “h”. The recurrence formula is applied to both e and h, and we get a new state.
This recurrence formula can be written generally as ht = f(ht-1, xt). Here, ht is the new state, ht-1 is the previous state and xt is the current input. We now have a state of the previous input instead of the previous input itself, because the input neuron would already have applied its transformations to that earlier input. Each successive input is called a time step.
In this case we have four inputs to be given to the network; during the recurrence, the same function and the same weights are applied to the network at each time step.
Taking the simplest form of a recurrent neural network, let’s say that the activation function is tanh, the weight at the recurrent neuron is Whh and the weight at the input neuron is Wxh. We can then write the equation for the state at time t as –

ht = tanh(Whh*ht-1 + Wxh*xt)
The recurrent neuron in this case is just taking the immediate previous state into consideration. For longer sequences the equation can involve multiple such states. Once the final state is calculated, we can go on to produce the output.

Now, once the current state is calculated, we can calculate the output state as –

yt = Why*ht

where Why is the weight at the output neuron. The steps involved can be summarized as follows –
1. A single time step of the input is supplied to the network i.e. xt is supplied to the network
2. We then calculate its current state using a combination of the current input and the previous state i.e.
we calculate ht
3. The current ht becomes ht-1 for the next time step
4. We can go through as many time steps as the problem demands and combine the information from all the previous states
5. Once all the time steps are completed, the final current state is used to calculate the output yt
6. The output is then compared to the actual output and the error is generated
7. The error is then backpropagated through the network to update the weights (we shall go into the details of backpropagation in further sections), and the network is thereby trained
Let’s take a look at how we can calculate these states in Excel and get the output.
The inputs are one-hot encoded. Our entire vocabulary is {h,e,l,o} and hence we can easily one-hot encode the inputs.
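As a quick sketch (the article shows these vectors as images), the one-hot encoding of this 4-letter vocabulary can be built in a couple of lines of numpy; the dictionary name one_hot is just an illustrative choice:

import numpy as np

letters = ["h", "e", "l", "o"]
# each letter becomes a 4x1 column vector with a single 1 at its own index
one_hot = {l: np.eye(4)[:, [i]] for i, l in enumerate(letters)}
print(one_hot["h"].ravel())   # [1. 0. 0. 0.]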
Now the input neuron would transform the input to the hidden state using the weight Wxh. We have randomly initialized the weights as a 3*4 matrix –
Step 1:
Now for the letter “h”, for the hidden state we would need Wxh*xt. By matrix multiplication, we get it as –
Step 2:
Now moving to the recurrent neuron, we have Whh as the weight, which is a 1*1 matrix, along with a bias, which is also a 1*1 matrix.
For the letter “h”, the previous state is [0,0,0] since there is no letter prior to it.
Step 3:
We now add Whh*ht-1 + bias to Wxh*xt; since for “h” there is no previous hidden state, the recurrent term contributes nothing, and we apply the tanh function to this output to get the current state –
Step 4:
Now we go on to the next time step. “e” is now supplied to the network. The processed output ht now becomes ht-1, while the one-hot encoded “e” is xt. Let’s now calculate the current state ht.
Wxh*xt will be –
Step 5:
Applying the same recurrence, the new state ht for the letter “e” is obtained as tanh(Wxh*xt + Whh*ht-1 + bias).
Now this would become ht-1 for the next state and the recurrent neuron would use this along with the new
character to predict the next one.
Step 6:
At each state, the recurrent neural network would produce the output as well. Let’s calculate yt for the letter e.
Step 7:
The probability for a particular letter from the vocabulary can be calculated by applying the softmax function, so we shall have softmax(yt).
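Steps 1–7 above can be mirrored in a short numpy sketch. This is not the article’s spreadsheet: the weights below are freshly random (so the numbers will not match the Excel screenshots), the 3*4 Wxh and 1*1 Whh shapes follow the walkthrough, and the bias bh and the 4*3 output weight Why are assumed shapes.

import numpy as np

np.random.seed(0)
letters = ["h", "e", "l", "o"]
one_hot = {l: np.eye(4)[:, [i]] for i, l in enumerate(letters)}   # 4x1 one-hot column vectors

Wxh = np.random.randn(3, 4)    # input weights, 3*4 as in the walkthrough
Whh = np.random.randn(1, 1)    # recurrent weight, 1*1 as in the walkthrough
bh  = np.random.randn(3, 1)    # bias added to the hidden state (assumed)
Why = np.random.randn(4, 3)    # output weights mapping 3 hidden units to 4 letters (assumed shape)

ht = np.zeros((3, 1))          # previous state for "h" is all zeros
for letter in "hell":          # feed h, e, l, l one time step at a time
    xt = one_hot[letter]
    ht = np.tanh(Wxh @ xt + Whh * ht + bh)   # ht = tanh(Wxh*xt + Whh*ht-1 + bias)
yt = Why @ ht                                 # output scores for the final state
probs = np.exp(yt) / np.sum(np.exp(yt))       # softmax(yt)
print(dict(zip(letters, probs.ravel())))      # untrained weights, so the winning letter is arbitrary

With trained weights, the letter with the highest softmax probability would be read off as the predicted fifth character.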
If we convert these probabilities to understand the prediction, we see that the model says that the letter after “e” should be h, since the highest probability is for the letter “h”. Does this mean we have done something wrong? No – we have hardly trained the network here. We have just shown it two letters, so it pretty much hasn’t learnt anything yet.
Now the next BIG question that faces us is: how does backpropagation work in the case of a recurrent neural network? How are the weights updated while there is a feedback loop?

Imagining how the weights would be updated in the case of a recurrent neural network can be a bit of a challenge. So to understand and visualize the backpropagation, let’s unroll the network across all the time steps. In an RNN we may or may not have outputs at each time step.

In the case of forward propagation, the inputs enter and move forward at each time step. In the case of backward propagation, we are figuratively going back in time to change the weights, hence we call it backpropagation through time (BPTT).
In the case of an RNN, if yt is the predicted value and ȳt is the actual value, the error is calculated as a cross entropy loss –

Et(ȳt, yt) = – ȳt log(yt)
E(ȳ, y) = – ∑t ȳt log(yt)
We typically treat the full sequence (word) as one training example, so the total error is just the sum of the
errors at each time step (character). The weights as we can see are the same at each time step. Let’s
summarize the steps for backpropagation
1. The cross entropy error is first computed using the current output and the actual output
2. Remember that the network is unrolled for all the time steps
3. For the unrolled network, the gradient is calculated for each time step with respect to the weight
parameter
4. Since the weight is the same for all the time steps, the gradients can be combined together across time steps
5. The weights are then updated for both recurrent neuron and the dense layers
The unrolled network looks much like a regular neural network, and the backpropagation algorithm is similar to that of a regular neural network; we just combine the gradients of the error for all time steps. Now, what do you think might happen if there are hundreds of time steps? The network would take a really long time to converge, since after unrolling it becomes really huge.
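For readers who want the math in code form, here is a compact sketch of BPTT for the character-level example (not taken from the article). It uses a full 3*3 hidden-to-hidden matrix Whh, which is the general form of the recurrence, and omits biases for brevity; all names and shapes here are illustrative.

import numpy as np

vocab, hidden = 4, 3                          # 4-letter vocabulary, 3 hidden units as before
Wxh = np.random.randn(hidden, vocab) * 0.1    # input -> hidden
Whh = np.random.randn(hidden, hidden) * 0.1   # hidden -> hidden
Why = np.random.randn(vocab, hidden) * 0.1    # hidden -> output

def bptt(inputs, targets):
    """inputs/targets: lists of one-hot column vectors of shape (vocab, 1), one per time step."""
    hs = {-1: np.zeros((hidden, 1))}
    ys, loss = {}, 0.0
    # forward pass through every time step
    for t, x in enumerate(inputs):
        hs[t] = np.tanh(Wxh @ x + Whh @ hs[t - 1])
        scores = Why @ hs[t]
        ys[t] = np.exp(scores) / np.sum(np.exp(scores))     # softmax probabilities
        loss += -np.sum(targets[t] * np.log(ys[t]))         # cross entropy at this step
    # backward pass: the same three weight matrices collect gradients from every time step
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dh_next = np.zeros((hidden, 1))
    for t in reversed(range(len(inputs))):
        dy = ys[t] - targets[t]                 # gradient of softmax + cross entropy
        dWhy += dy @ hs[t].T
        dh = Why.T @ dy + dh_next               # gradient flowing into ht, from the output and from the future
        dhraw = (1 - hs[t] ** 2) * dh           # backprop through tanh
        dWxh += dhraw @ inputs[t].T
        dWhh += dhraw @ hs[t - 1].T
        dh_next = Whh.T @ dhraw                 # pass the gradient one step back in time
    return loss, dWxh, dWhh, dWhy

Calling bptt with the one-hot column vectors for “h, e, l, l” as inputs and “e, l, l, o” as targets returns the summed loss and one gradient per weight matrix. Because the same Wxh, Whh and Why appear at every time step, each pass through the backward loop simply adds that step’s contribution to the same three gradient matrices – exactly the “combine the gradients for all time steps” idea above – and a gradient-descent update would then subtract a small multiple of each.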
In case you do not wish to deep dive into the math of backpropagation, all you need to understand is that backpropagation through time works the same way as it does in a regular neural network once you unroll the recurrent neuron in your network. However, I shall be coming up with a detailed article on recurrent neural networks from scratch, which would cover the detailed mathematics of the backpropagation algorithm in a recurrent neural network.
Let’s use recurrent neural networks to predict the sentiment of various tweets. We would like to classify the tweets as positive or negative. You can download the dataset here. (https://github.com/crwong/cs224u-project/tree/master/data/sentiment)
We have around 1600000 tweets to train our network. If you’re not familiar with the basics of NLP, I would
strongly urge you to go through this article. (https://www.analyticsvidhya.com/blog/2017/01/ultimate-guide-to-
understand-implement-natural-language-processing-codes-in-python/) We also have another detailed article
(https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/) on word embedding
which would also be helpful for you to understand word embeddings in detail.
import pandas as pd
import numpy as np
import spacy
from sklearn.model_selection import train_test_split   # restored import, needed for the split below
nlp=spacy.load("en")

# load the downloaded tweets (restored lines; the file path is a placeholder for wherever you saved
# the CSV – the sentiment label is assumed to be in column 0 and the tweet text in column 5)
train = pd.read_csv("tweets.csv", encoding="latin-1", header=None)
Y_train = train[train.columns[0]]
X_train = train[train.columns[5]]

# keep a small 2% slice of the data for this demo (the left-hand side of this call was restored)
trainset1x, trainset2x, trainset1y, trainset2y = train_test_split(X_train.values, Y_train.values,
    test_size=0.02,random_state=42 )
trainset2y=pd.get_dummies(trainset2y)

# remove stopwords and punctuation from a sentence
def stopwords(sentence):
    new=[]
    sentence=nlp(sentence)
    for w in sentence:
        if (w.is_stop == False) & (w.pos_ !="PUNCT"):
            new.append(w.string.strip())
    c=" ".join(str(x) for x in new)
    return c
# lemmatize a sentence (the "def lemmatize(sentence):" header was lost in the page capture and is restored)
def lemmatize(sentence):
    sentence=nlp(sentence)
    str=""
    for w in sentence:
        str+=" "+w.lemma_
    return nlp(str)
# load the pre-trained GloVe vectors into a dict mapping word -> 200-d embedding
# (the "def loadGloveModel(gloveFile):" header was lost in the page capture and is restored)
def loadGloveModel(gloveFile):
    f = open(gloveFile,'r')
    model = {}
    for line in f:
        splitLine = line.split()
        word = splitLine[0]
        embedding = [float(val) for val in splitLine[1:]]
        model[word] = embedding
    print("Done.", len(model), "words loaded!")
    return model
model=loadGloveModel("/mnt/hdd/datasets/glove/glove.twitter.27B.200d.txt")
# turn a sentence into a single 200-d vector by summing the GloVe vectors of its words
# (the function header and the body of the try block were lost in the page capture;
#  summing the available word vectors is the assumed behaviour)
def sent_vectorizer(sent, model):
    sent_vec = np.zeros(200)
    numw = 0
    for w in sent.split():
        try:
            sent_vec = np.add(sent_vec, model[str(w)])
            numw += 1
        except:
            pass
    return sent_vec
# clean every tweet in the sample and turn it into a 200-d sentence vector
cleanvector=[]
for i in range(trainset2x.shape[0]):
    document=trainset2x[i]
    document=document.lower()
    document=lemmatize(document)
    document=str(document)
    cleanvector.append(sent_vectorizer(document,model))

# the list has to become a numpy array before it can be reshaped to (samples, 200, 1)
cleanvector = np.array(cleanvector)
cleanvector = cleanvector.reshape(len(cleanvector),200,1)
# tokenize the tweets and pad every sequence to a length of 15
from keras.preprocessing.text import Tokenizer          # restored import
from keras.preprocessing.sequence import pad_sequences  # restored import
tokenizer = Tokenizer(num_words=16000)
tokenizer.fit_on_texts(trainset2x)
sequences = tokenizer.texts_to_sequences(trainset2x)
word_index = tokenizer.word_index
data = pad_sequences(sequences, maxlen=15, padding="post")   # restored: `data` is used below
print(data.shape)

# reshape for the RNN and split into train/validation sets (restored lines; a 70/30 split is assumed)
data = data.reshape(len(data), 15, 1)
trainx, validx, trainy, validy = train_test_split(data, trainset2y, test_size=0.3, random_state=42)
# build the GloVe embedding matrix for the tokenizer vocabulary
# (the matrix initialisation and the loop header were lost in the page capture and are restored)
nb_words=len(tokenizer.word_index)+1
embedding_matrix = np.zeros((nb_words, 200))
for word, i in word_index.items():
    embedding_vector = model.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
print('Null word embeddings: %d' % np.sum(np.sum(embedding_matrix, axis=1) == 0))
trainy=np.array(trainy)
validy=np.array(validy)
import keras                         # restored import
from keras.models import Sequential  # restored import

# build a simple RNN model; the SimpleRNN layer, the 500-unit dense layer, the compile call and
# the return statement were missing from the page capture and are reconstructed here as assumptions
def modelbuild():
    model = Sequential()
    model.add(keras.layers.InputLayer(input_shape=(15,1)))
    # note: this Embedding layer is never passed to model.add() in the original listing, so it has no effect
    keras.layers.embeddings.Embedding(nb_words, 15, weights=[embedding_matrix], input_length=15,
    trainable=False)
    model.add(keras.layers.SimpleRNN(units=100, activation='relu', use_bias=True))
    model.add(keras.layers.Dense(units=500, activation='relu'))
    model.add(keras.layers.Dense(units=2, input_dim=500,activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

finalmodel = modelbuild()
finalmodel.fit(trainx, trainy, epochs=10, batch_size=120,validation_data=(validx,validy))
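Once the model is fitted, a quick sanity check (a sketch, not part of the original listing) is to score the held-out validation tweets:

predictions = finalmodel.predict(validx)   # one row of class probabilities per validation tweet
print(predictions[:5])                     # compare against validy to eyeball how well it does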
If you run this model, it may not provide you with the best results, since this is an extremely simple architecture and quite a shallow network. I would strongly urge you to play with the architecture of the network to obtain better results. Also, there are multiple approaches to preprocessing your data; preprocessing will depend entirely on the task at hand.
RNNs work upon the fact that the result at a time step is dependent on the previous state, or the previous n time steps. Regular RNNs can have difficulty learning long-range dependencies. For instance, if we have a sentence like “The man who ate my pizza has purple hair”, the description “purple hair” refers to the man and not to the pizza. So this is a long-range dependency.
If we backpropagate the error in this case, we would need to apply the chain rule. To calculate the error after the third time step with respect to the first one –

∂E3/∂W = ∂E3/∂y3 * ∂y3/∂h3 * ∂h3/∂h2 * ∂h2/∂h1 * ∂h1/∂W

Here we apply the chain rule, and if any one of the gradients approaches 0, all the gradients rush to zero exponentially fast due to the multiplication. Such states would no longer help the network learn anything. This is known as the vanishing gradient problem.
The vanishing gradient problem is far more threatening than the exploding gradient problem, where the gradients become very large because a single or multiple gradient values become very high.

The reason the vanishing gradient problem is more concerning is that an exploding gradient can be easily dealt with by clipping the gradients at a predefined threshold value. Fortunately, there are ways to handle the vanishing gradient problem as well. Architectures like the LSTM (Long Short Term Memory) and the GRU (Gated Recurrent Unit) can be used to deal with it.
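For reference, gradient clipping is available out of the box in Keras (a sketch; the threshold of 5.0 is an arbitrary illustrative value, and the compile call assumes a model like the sentiment classifier built earlier):

from keras import optimizers

# every gradient component is clipped to the range [-5, 5] before the weight update
clipped_adam = optimizers.Adam(clipvalue=5.0)
finalmodel.compile(loss='categorical_crossentropy', optimizer=clipped_adam, metrics=['accuracy'])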
As we saw, RNNs suffer from vanishing gradient problems when we ask them to handle long-term dependencies. They also become severely difficult to train as the number of parameters becomes extremely large. If we unroll the network, it becomes so huge that its convergence is a challenge.
Long Short Term Memory networks – usually called “LSTMs” – are a special kind of RNN, capable of learning
long-term dependencies. They were introduced by Hochreiter & Schmidhuber. They work tremendously well
on a large variety of problems, and are now widely used. LSTMs also have this chain-like structure, but the repeating module has a slightly different structure. Instead of having a single neural network layer, there are multiple layers, interacting in a very special way. They have an input gate, a forget gate and an output gate. We shall be coming up with a detailed article on LSTMs soon.
Another efficient RNN architecture is the Gated Recurrent Unit, i.e. the GRU. GRUs are a variant of LSTMs but are simpler in structure and easier to train. Their success is primarily due to the gating signals that control how the present input and the previous memory are used to update the current activation and produce the current state. These gates have their own sets of weights that are adaptively updated in the learning phase. We have just two gates here, the reset and the update gate. Stay tuned for more detailed articles on GRUs.
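In Keras, both architectures are available as drop-in replacements for a SimpleRNN layer in a model like the one built above (a sketch; the unit count of 100 is an illustrative choice, not from the article):

from keras.layers import LSTM, GRU

lstm_layer = LSTM(units=100)   # input, forget and output gates handled internally
gru_layer  = GRU(units=100)    # reset and update gates only, so fewer parameters
# either layer can take the place of the SimpleRNN layer inside modelbuild() above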
End Notes
I hope that this article would have given you a head start with the Recurrent Neural Networks. In the
upcoming articles we shall deep dive into the complex mathematics of Recurrent Neural Networks along with
the detailed descriptions of LSTMs and GRUs. Try playing with the architecture of these RNNs and be
amazed by their performance and applications. Do share your findings and approach in the comments
section.