A Complete Guide To LSTM Architecture and Its Use in Text Classification
Introduction to LSTM
An LSTM cell is built around four components:

1. Forget gate
2. Input gate
3. Cell state
4. Output gate
There can be various LSTM network types, but we can divide them roughly into three: forward, backward, and bidirectional. As the names suggest, forward-pass and backward-pass LSTMs are unidirectional; they process information in a single direction, either forwards or backwards through the sequence. A bidirectional LSTM processes the data in both directions to persist the information. All of these LSTM types work on the same basic structure, and modifying that basic structure is what distinguishes one variant from another. Next in the article, we will look at the different gates of this structure:
Forget gate
Input gate
Output gate
As hidden layers and various gates are added to the simple LSTM, its type changes. A bidirectional LSTM, for example, consists of two LSTMs passing information over the same sequence in opposite directions.
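In Keras, for instance, a bidirectional LSTM can be built by wrapping an LSTM layer in the Bidirectional wrapper. A minimal sketch (the layer sizes here are illustrative, not prescribed by the article):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

# two LSTMs read the sequence in opposite directions;
# their outputs are concatenated into one representation
model = Sequential()
model.add(Embedding(6000, 32, input_length=500))
model.add(Bidirectional(LSTM(100)))
model.add(Dense(1, activation='sigmoid'))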
Forget Gate
The forget gate circuit takes h (the previous hidden state) and x (the current input) as its information. This information goes through a sigmoid function, and the values that tend towards zero are eliminated from the network.
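A minimal NumPy sketch of this computation (the weight matrix W_f and bias b_f are illustrative names for the gate's learned parameters):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # concatenate previous hidden state h and current input x,
    # then pass through a sigmoid: values near zero are forgotten
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + b_f)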
Input Gate
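The input gate decides which new information should enter the cell state: a sigmoid layer selects which values to update, and a tanh layer proposes candidate values. Continuing the NumPy sketch above (W_i, b_i, W_c, b_c are illustrative parameter names):

def input_gate(h_prev, x_t, W_i, b_i, W_c, b_c):
    # sigmoid picks which values to update; tanh proposes candidates
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)
    c_tilde = np.tanh(W_c @ concat + b_c)
    return i_t, c_tilde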
Cell State
The gated information then goes through the cell state, where this layer calculates the new cell state. First, the output of the forget gate is multiplied element-wise with the previous cell state, so the information that should be dropped gets multiplied by near-zero values. Then the output of the input gate is added to this result, which updates the cell state with the information that is relevant to the network.
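Continuing the sketch, the cell-state update combines both gate outputs:

def update_cell_state(c_prev, f_t, i_t, c_tilde):
    # forget-gate output scales the old state (near-zero drops info),
    # then the gated candidate values are added in
    return f_t * c_prev + i_t * c_tilde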
Output Gate
It is the last gate of the circuit and it decides the next hidden state of the network. The previous hidden state and the current input go through a sigmoid function, while the updated cell state from the cell-state step goes through a tanh function; the two results are then multiplied together. This final stage of the circuit determines which information the hidden state should carry forward.
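Continuing the sketch (W_o and b_o are illustrative parameter names; sigmoid is the helper defined in the forget-gate sketch):

def output_gate(h_prev, x_t, c_t, W_o, b_o):
    concat = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ concat + b_o)   # which parts of the state to expose
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t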
Applying LSTM networks to text has its own special advantages. Earlier in the article, we discussed that an LSTM has a feature through which it can memorize the sequence of the data. It has one more feature: it eliminates unused information. Since text data always contains a lot of redundant information, this makes LSTM a natural fit for text classification.
You can use the full code below to build the model on a similar dataset.
import numpy as np
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense
from keras.preprocessing.sequence import pad_sequences

# fix random seed for reproducibility
np.random.seed(7)

# load the dataset but only keep the top 6000 words
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=6000)

# pad input sequences to a fixed length of 500
X_train = pad_sequences(X_train, maxlen=500)
X_test = pad_sequences(X_test, maxlen=500)

# model: embedding layer -> LSTM -> sigmoid output
model = Sequential()
model.add(Embedding(6000, 32, input_length=500))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

# train the model before evaluating it
model.fit(X_train, y_train, epochs=3, batch_size=64)

# final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
Here in the dataset we have good or bad reviews, which are classified as 1 and 0 values. The loss function is binary cross-entropy, and the Adam optimizer is suggested when working with text classification.
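Because the final layer is a sigmoid, model.predict returns probabilities between 0 and 1; thresholding at 0.5 recovers the 0/1 class labels. A minimal sketch using the model trained above:

# probabilities in [0, 1] -> hard 0/1 labels
probs = model.predict(X_test)
labels = (probs > 0.5).astype("int32")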
The output below shows the results and the summary of the model which we have created.
Final Words