A Complete Guide To LSTM Architecture and Its Use in Text Classification
Introduction to LSTM
An LSTM cell is built around four components:

1. Forget gate
2. Input gate
3. Cell state
4. Output gate
There can be various LSTM network types, but we can divide them roughly into three: forward, backward, and bidirectional. As the names suggest, forward-pass and backward-pass LSTMs are unidirectional; they process information in a single direction, either forwards or backwards through the sequence. A bidirectional LSTM processes the data in both directions to persist the information. All of these LSTM types work on the same basic structure, and modifying that basic structure is what distinguishes one variant from another. Next in the article, we will look at the different gates of this structure:
Forget gate
Input gate
Output gate
As hidden layers and various gates are added to the simple LSTM, its type changes. A bidirectional LSTM, for example, consists of two LSTMs passing information over the same sequence in opposite directions.
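In Keras, for instance, a bidirectional LSTM can be built by wrapping an LSTM layer in the Bidirectional wrapper. A minimal sketch (the layer sizes here are illustrative, not prescribed by the article):

from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

# two LSTMs read the sequence in opposite directions;
# their outputs are concatenated into one representation
model = Sequential()
model.add(Embedding(6000, 32, input_length=500))
model.add(Bidirectional(LSTM(100)))
model.add(Dense(1, activation='sigmoid'))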
Forget Gate
The forget gate circuit takes h (the previous hidden state) and x (the current input) as its information. This information goes through a sigmoid function, and the values that tend towards zero are eliminated from the network.
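A minimal NumPy sketch of this computation (the weight matrix W_f and bias b_f are illustrative names for the gate's learned parameters):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(h_prev, x_t, W_f, b_f):
    # concatenate previous hidden state h and current input x,
    # then pass through a sigmoid: values near zero are forgotten
    concat = np.concatenate([h_prev, x_t])
    return sigmoid(W_f @ concat + b_f)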
Input Gate
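The input gate decides which new information should enter the cell state: a sigmoid layer selects which values to update, and a tanh layer proposes candidate values. Continuing the NumPy sketch above (W_i, b_i, W_c, b_c are illustrative parameter names):

def input_gate(h_prev, x_t, W_i, b_i, W_c, b_c):
    # sigmoid picks which values to update; tanh proposes candidates
    concat = np.concatenate([h_prev, x_t])
    i_t = sigmoid(W_i @ concat + b_i)
    c_tilde = np.tanh(W_c @ concat + b_c)
    return i_t, c_tilde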
Cell State
The gated information then goes through the cell state, where this layer calculates the new cell state. First, the output of the forget gate is multiplied element-wise with the previous cell state, so the information that should be dropped gets multiplied by near-zero values. Then the output of the input gate is added to this result, which updates the cell state with the information that is relevant to the network.
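Continuing the sketch, the cell-state update combines both gate outputs:

def update_cell_state(c_prev, f_t, i_t, c_tilde):
    # forget-gate output scales the old state (near-zero drops info),
    # then the gated candidate values are added in
    return f_t * c_prev + i_t * c_tilde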
Output Gate
It is the last gate of the circuit and it decides the next hidden state of the network. The previous hidden state and the current input go through a sigmoid function, while the updated cell state from the cell-state step goes through a tanh function; the two results are then multiplied together. This final stage of the circuit determines which information the hidden state should carry forward.
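Continuing the sketch (W_o and b_o are illustrative parameter names; sigmoid is the helper defined in the forget-gate sketch):

def output_gate(h_prev, x_t, c_t, W_o, b_o):
    concat = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ concat + b_o)   # which parts of the state to expose
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t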
Applying LSTM networks to text has its own special advantages. Earlier in the article, we discussed that an LSTM has a feature through which it can memorize the sequence of the data. It has one more feature: it eliminates unused information. Since text data always contains a lot of redundant information, this makes LSTM a natural fit for text classification.
You can use the full code below to build the model on a similar dataset.
import numpy as np
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import LSTM, Embedding, Dense
from keras.preprocessing.sequence import pad_sequences

# fix random seed for reproducibility
np.random.seed(7)

# load the dataset but only keep the top 6000 words
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=6000)

# pad input sequences to a fixed length of 500
X_train = pad_sequences(X_train, maxlen=500)
X_test = pad_sequences(X_test, maxlen=500)

# model: embedding layer -> LSTM -> sigmoid output
model = Sequential()
model.add(Embedding(6000, 32, input_length=500))
model.add(LSTM(100))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())

# train the model before evaluating it
model.fit(X_train, y_train, epochs=3, batch_size=64)

# final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
Here in the dataset we have good or bad reviews, which are classified as 1 and 0 values. The loss function is binary cross-entropy, and the Adam optimizer is suggested when working with text classification.
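Because the final layer is a sigmoid, model.predict returns probabilities between 0 and 1; thresholding at 0.5 recovers the 0/1 class labels. A minimal sketch using the model trained above:

# probabilities in [0, 1] -> hard 0/1 labels
probs = model.predict(X_test)
labels = (probs > 0.5).astype("int32")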
The output below shows the results and the summary of the model which we have created.
Final Words