LONG SHORT-TERM MEMORY (LSTM)

- Noureen Tabassum
23011DB025
M. Tech (Data Science)
WHAT IS LSTM?
 LSTM (Long Short-Term Memory) is a recurrent neural network
(RNN) architecture widely used in Deep Learning. It excels at capturing
long-term dependencies, making it ideal for sequence prediction tasks.
 Unlike traditional neural networks, an LSTM incorporates feedback
connections, allowing it to process entire sequences of data rather than
individual data points (see the sketch after this list).
 This makes it highly effective in understanding and predicting patterns
in sequential data like time series, text, and speech.
 LSTM has become a powerful tool in artificial intelligence and deep
learning, enabling breakthroughs in various fields by uncovering
valuable insights from sequential data.
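
As a minimal illustration of this sequential processing, the sketch below (assuming PyTorch, with arbitrary toy dimensions) feeds a sequence through a single LSTM cell one timestep at a time, carrying the hidden and cell states forward as feedback:

import torch
import torch.nn as nn

# A minimal sketch (assumes PyTorch; dimensions are arbitrary toy values).
# The LSTM cell consumes a sequence one timestep at a time, feeding its own
# hidden state (h) and cell state (c) back into the next step.
cell = nn.LSTMCell(input_size=8, hidden_size=16)

sequence = torch.randn(20, 8)   # 20 timesteps, 8 features each
h = torch.zeros(1, 16)          # initial hidden state (short-term memory)
c = torch.zeros(1, 16)          # initial cell state (long-term memory)

for x_t in sequence:            # the whole sequence, not a single data point
    h, c = cell(x_t.unsqueeze(0), (h, c))

print(h.shape)  # torch.Size([1, 16]): a summary of the entire sequence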
LSTM ARCHITECTURE
An LSTM resolves the vanishing gradient problem faced by RNNs. In this
section, we will see how it does so by examining the architecture of the
LSTM. At a high level, an LSTM works very much like an RNN cell. The LSTM
network architecture consists of three parts, and each part performs an
individual function.
THE LOGIC BEHIND LSTM
 These three parts of an LSTM unit are known as gates. They control the flow
of information into and out of the memory cell, or LSTM cell.
 The first gate is the Forget gate, the second is the Input gate, and the
last one is the Output gate.
 An LSTM unit consisting of these three gates and a memory cell is analogous
to a layer of neurons in a traditional feedforward neural network, with each
unit maintaining a hidden state and a cell state.
 Like an RNN, an LSTM has a hidden state, where Ht-1 represents the hidden
state of the previous timestamp and Ht is the hidden state of the current
timestamp.
 In addition, an LSTM also has a cell state, represented by Ct-1 and Ct for
the previous and current timestamps, respectively.
 The hidden state is known as short-term memory, and the cell state is known
as long-term memory.
ROLES OF GATES IN ARCHITECTURE
1. Forget Gate:
 In a cell of the LSTM network, the first step is to decide whether to keep
the information from the previous timestep or forget it. The forget gate
equation is:

ft = σ( Xt · Uf + Ht-1 · Wf )

where:
•Xt: input at the current timestamp
•Uf: weight matrix associated with the input
•Ht-1: hidden state of the previous timestamp
•Wf: weight matrix associated with the hidden state

Because of the sigmoid, ft lies between 0 and 1; it is multiplied
element-wise with the previous cell state Ct-1 to decide how much of the old
information to keep.
2. Input Gate:

 The input gate quantifies the importance of the new information carried by
the input. The input gate equation is:

it = σ( Xt · Ui + Ht-1 · Wi )

where:
•Xt: input at the current timestamp t
•Ui: weight matrix associated with the input
•Ht-1: hidden state of the previous timestamp
•Wi: weight matrix associated with the hidden state

The candidate new information is Nt = tanh( Xt · Uc + Ht-1 · Wc ), and the
cell state is updated as Ct = ft ⊙ Ct-1 + it ⊙ Nt.
3. Output Gate:

 The output gate equation is similar in form to the two previous gates:

Ot = σ( Xt · Uo + Ht-1 · Wo )

 Its value also lies between 0 and 1 because of the sigmoid function. To
calculate the current hidden state, we combine Ot with the tanh of the
updated cell state:

Ht = Ot ⊙ tanh( Ct )
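
To tie the gate equations together, here is a minimal NumPy sketch of a single LSTM cell step using the notation above; the weight matrices are random stand-ins rather than trained values, and the small dimensions are arbitrary:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Minimal sketch of one LSTM cell step using the gate equations above.
# Weights are random stand-ins (assumed toy sizes: input dim 4, hidden dim 3);
# bias terms are omitted, matching the equations as written.
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
U_f, U_i, U_c, U_o = (rng.standard_normal((n_in, n_h)) for _ in range(4))
W_f, W_i, W_c, W_o = (rng.standard_normal((n_h, n_h)) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    f_t = sigmoid(x_t @ U_f + h_prev @ W_f)  # forget gate: how much old memory to keep
    i_t = sigmoid(x_t @ U_i + h_prev @ W_i)  # input gate: how much new info to add
    n_t = np.tanh(x_t @ U_c + h_prev @ W_c)  # candidate new information
    c_t = f_t * c_prev + i_t * n_t           # updated cell state (long-term memory)
    o_t = sigmoid(x_t @ U_o + h_prev @ W_o)  # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state (short-term memory)
    return h_t, c_t

h, c = np.zeros(n_h), np.zeros(n_h)
x_t = rng.standard_normal(n_in)
h, c = lstm_step(x_t, h, c)
print(h)  # hidden state after one timestep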
LSTM NETWORK
WHAT ARE BIDIRECTIONAL LSTMS?
 Bidirectional LSTMs (Long Short-Term Memory) are a type of recurrent
neural network (RNN) architecture that processes input data in both forward
and backward directions.
 In a traditional LSTM, the information flows only from past to future,
making predictions based on the preceding context.
 However, in bidirectional LSTMs, the network also considers future
context, enabling it to capture dependencies in both directions (see the
sketch after this list).
 As a result, bidirectional LSTMs are particularly useful for tasks that
require a comprehensive understanding of the input sequence, such as
natural language processing tasks like sentiment analysis, machine
translation, and named entity recognition.
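
As a minimal sketch (assuming PyTorch, with arbitrary toy dimensions), a bidirectional LSTM can be created by setting a single flag; the output at every timestep concatenates the forward and backward hidden states:

import torch
import torch.nn as nn

# Minimal sketch (assumes PyTorch; toy dimensions). A bidirectional LSTM runs
# one pass forward and one pass backward over the sequence and concatenates
# both hidden states at every timestep.
bilstm = nn.LSTM(input_size=10, hidden_size=32,
                 batch_first=True, bidirectional=True)

x = torch.randn(4, 15, 10)       # (batch, timesteps, features)
out, (h_n, c_n) = bilstm(x)

print(out.shape)  # torch.Size([4, 15, 64]): 32 forward + 32 backward per step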
ADVANTAGES AND DISADVANTAGES OF LSTMS
 Advantages
 It can learn long-term dependencies in data.
 It helps us better understand and predict sequential behaviour, such as
stock market movements.
 It typically achieves better accuracy than simple RNNs on sequence tasks.
 It is capable of learning and remembering information over long time spans.

 Disadvantages
 One drawback is that implementing LSTM networks on FPGAs requires
specialized hardware and software knowledge.
 If they are not trained properly, they are prone to overfitting.
 Their long-term memory is still limited in practice for very long sequences.
 Their complex cell structure makes them computationally expensive to train.
