RNN_LSTM_Transformers_Notes
---------------------------------
1. Recurrent Neural Networks (RNN)
-----------------------------------
RNNs are neural networks designed for sequential data. Unlike feedforward networks, RNNs have
a recurrent hidden state that carries information from previous time steps, so earlier inputs
can influence later outputs.
Architecture:
- Input Layer: Receives one element of the input sequence at each time step.
- Hidden Layer: Processes the current input together with the previous hidden state to produce the new hidden state.
- Output Layer: Produces the output for the current time step from the hidden state.
Equation:
h_t = tanh(W_hh * h_(t-1) + W_xh * x_t + b_h)
y_t = W_hy * h_t + b_y
Where:
- x_t: input at time step t
- h_t: hidden state at time step t (h_(t-1) is the previous hidden state)
- y_t: output at time step t
- W_hh, W_xh, W_hy: weight matrices; b_h, b_y: bias vectors
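A minimal NumPy sketch of one RNN step following the equation above (the sizes, random weights, and the name rnn_step are illustrative assumptions, not a reference implementation):

    import numpy as np

    def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
        # One RNN time step: update the hidden state, then compute the output.
        h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
        y_t = W_hy @ h_t + b_y
        return h_t, y_t

    # Illustrative sizes (assumed): input 3, hidden 4, output 2.
    rng = np.random.default_rng(0)
    W_xh, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
    W_hy, b_h, b_y = rng.normal(size=(2, 4)), np.zeros(4), np.zeros(2)

    h = np.zeros(4)                      # initial hidden state
    for x_t in rng.normal(size=(5, 3)):  # toy sequence of 5 time steps
        h, y = rnn_step(x_t, h, W_xh, W_hh, W_hy, b_h, b_y)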
Limitations:
- Vanishing/exploding gradients make long-term dependencies hard to learn.
- Information from early time steps tends to be forgotten over long sequences.
- Computation is strictly sequential, which limits parallelization during training.
2. Long Short-Term Memory (LSTM)
---------------------------------
LSTMs are an advanced version of RNNs designed to handle long-term dependencies. They
introduce a cell state and gating mechanisms that control what information is stored,
updated, and forgotten at each time step.
Architecture:
- Forget Gate: Decides which information to discard from the cell state.
- Input Gate: Decides which new information to write to the cell state.
- Cell State: Carries long-term information across time steps.
- Output Gate: Controls the output based on the cell state and hidden state.
Equations:
f_t = sigmoid(W_f * [h_(t-1), x_t] + b_f)      (forget gate)
i_t = sigmoid(W_i * [h_(t-1), x_t] + b_i)      (input gate)
C~_t = tanh(W_C * [h_(t-1), x_t] + b_C)        (candidate cell state)
C_t = f_t * C_(t-1) + i_t * C~_t               (cell state update)
o_t = sigmoid(W_o * [h_(t-1), x_t] + b_o)      (output gate)
h_t = o_t * tanh(C_t)                          (hidden state)
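A NumPy sketch of one LSTM step implementing the equations above over the concatenated [h_(t-1), x_t] vector (the sizes and names such as lstm_step are assumptions made for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
        z = np.concatenate([h_prev, x_t])    # [h_(t-1), x_t]
        f_t = sigmoid(W_f @ z + b_f)         # forget gate
        i_t = sigmoid(W_i @ z + b_i)         # input gate
        C_tilde = np.tanh(W_C @ z + b_C)     # candidate cell state
        C_t = f_t * C_prev + i_t * C_tilde   # cell state update
        o_t = sigmoid(W_o @ z + b_o)         # output gate
        h_t = o_t * np.tanh(C_t)             # hidden state
        return h_t, C_t

    # Illustrative sizes (assumed): input 3, hidden 4.
    rng = np.random.default_rng(0)
    hid, inp = 4, 3
    W_f, W_i, W_C, W_o = (rng.normal(size=(hid, hid + inp)) for _ in range(4))
    b_f = b_i = b_C = b_o = np.zeros(hid)

    h, C = np.zeros(hid), np.zeros(hid)
    for x_t in rng.normal(size=(5, inp)):    # toy sequence of 5 time steps
        h, C = lstm_step(x_t, h, C, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o)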
Advantages:
- Capture long-term dependencies far better than vanilla RNNs.
- Gating mitigates the vanishing gradient problem.
- Work well for tasks such as language modeling, speech recognition, and time-series forecasting.
3. Bidirectional LSTM
----------------------
A Bidirectional LSTM processes the sequence in both forward and backward directions, capturing
context from both past and future time steps.
Architecture:
- Two LSTM layers: One processes the input forward, and the other processes it backward.
- The hidden states from both directions are combined at each time step, typically by concatenation.
Equation:
h_t = [h_t(forward) ; h_t(backward)]   (concatenation of the forward and backward hidden states)
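A short PyTorch sketch of the same idea, assuming PyTorch is available; the layer sizes are illustrative. The output at each time step is the concatenation of the forward and backward hidden states:

    import torch
    import torch.nn as nn

    # Illustrative sizes (assumed): 8 input features, hidden size 16.
    bilstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)

    x = torch.randn(2, 10, 8)        # toy input: (batch, time steps, features)
    out, (h_n, c_n) = bilstm(x)

    # out holds the concatenated forward/backward hidden states per time step.
    print(out.shape)                 # torch.Size([2, 10, 32]) = 2 * hidden_size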
Applications:
- Named entity recognition and other sequence labeling tasks.
- Speech recognition.
- Sentiment analysis and text classification.
4. Encoder-Decoder Architecture
--------------------------------
The encoder-decoder (sequence-to-sequence) architecture maps an input sequence to an output sequence that may have a different length.
- Encoder: Processes the input sequence and encodes it into a context vector.
- Decoder: Generates the output sequence from the context vector, one step at a time.
Workflow:
1. The encoder processes the input sequence and generates a fixed-size context vector.
2. The decoder takes this context vector and generates the output sequence step-by-step.
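A compact PyTorch sketch of this workflow (the vocabulary size, layer sizes, start-of-sequence id, and greedy decoding are all assumptions chosen to keep the example small):

    import torch
    import torch.nn as nn

    # Illustrative sizes (assumed): vocab 100, embedding 32, hidden 64.
    emb = nn.Embedding(100, 32)
    encoder = nn.LSTM(32, 64, batch_first=True)
    decoder = nn.LSTM(32, 64, batch_first=True)
    to_vocab = nn.Linear(64, 100)

    src = torch.randint(0, 100, (1, 7))      # toy source token ids
    _, context = encoder(emb(src))           # (h_n, c_n) acts as the context vector

    token = torch.tensor([[1]])              # assumed start-of-sequence id
    state = context
    outputs = []
    for _ in range(5):                       # generate 5 output steps
        dec_out, state = decoder(emb(token), state)
        logits = to_vocab(dec_out[:, -1])    # scores over the output vocabulary
        token = logits.argmax(dim=-1, keepdim=True)   # greedy choice of next token
        outputs.append(token.item())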
5. Transformers
----------------
Transformers are powerful models that replace RNNs with self-attention mechanisms. They are the
foundation of modern language models such as BERT and GPT.
Components:
1. Encoder-Decoder Structure: Stacks of encoder and decoder blocks instead of recurrent layers.
2. Multi-Head Self-Attention: Each token attends to every other token in the sequence.
3. Positional Encoding: Injects information about token order, since there is no recurrence.
4. Feed-Forward Layers: Position-wise fully connected layers applied after attention.
5. Residual Connections and Layer Normalization: Stabilize training of deep stacks.
Attention Equation:
Attention(Q, K, V) = softmax((QK^T)/sqrt(d_k))V
Where:
- Q, K, V: query, key, and value matrices derived from the input.
- d_k: dimensionality of the keys, used to scale the dot products.
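A NumPy sketch of scaled dot-product attention as written above (the matrix sizes and the self-attention setup where Q, K, V come from the same toy sequence are illustrative assumptions):

    import numpy as np

    def attention(Q, K, V):
        # softmax((Q K^T) / sqrt(d_k)) V
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                          # query-key similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                                       # weighted sum of values

    # Toy self-attention (assumed sizes): 4 tokens, d_k = d_v = 8.
    rng = np.random.default_rng(0)
    Q = K = V = rng.normal(size=(4, 8))
    out = attention(Q, K, V)        # shape (4, 8)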
Advantages:
- Highly parallelizable: all time steps are processed at once, with no recurrence.
- Self-attention captures long-range dependencies directly.
- Scales well to very large datasets and models.
Summary Table:
---------------------------------------------------------------------------------------------------
| Model               | Strengths                            | Limitations                        |
|---------------------|--------------------------------------|------------------------------------|
| RNN                 | Simple model for sequential data     | Vanishing gradients; short memory  |
| LSTM                | Learns long-term dependencies        | Sequential; slower to train        |
| Bidirectional LSTM  | Uses past and future context         | Needs the full sequence up front   |
| Transformer         | Parallelizable; long-range attention | Quadratic attention cost in length |
---------------------------------------------------------------------------------------------------