Module 4 - S8 CSE NOTES - KTU DEEP LEARNING NOTES - CST414
2. Basic Architecture
Hidden State Update:
ht = f(Wh ht−1 + Wx xt + bh )
where:
( Wh ) and ( Wx ) are weight matrices for the hidden state and input,
respectively.
( bh ) is the bias term and ( f ) is a nonlinear activation function (e.g., tanh).
Output:
yt = f(Wy ht + by )
where:
( Wy ) is the weight matrix for the output and ( by ) is the output bias.
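As a quick illustration of the two equations above, here is a minimal sketch of a single RNN step; the dimensions, random weights, and the tanh/linear choices are assumptions made for illustration, not values from the notes.

```python
# A minimal sketch of one RNN step: ht = tanh(Wh h_{t-1} + Wx x_t + bh),
# yt = Wy ht + by. Sizes and weights are illustrative only.
import numpy as np

hidden_size, input_size, output_size = 4, 3, 2
rng = np.random.default_rng(0)

Wh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden
Wx = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden
Wy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden-to-output
bh = np.zeros(hidden_size)
by = np.zeros(output_size)

def rnn_step(x_t, h_prev):
    """One time step: update the hidden state, then read out an output."""
    h_t = np.tanh(Wh @ h_prev + Wx @ x_t + bh)
    y_t = Wy @ h_t + by          # linear read-out; an activation f could be applied here too
    return h_t, y_t

h0 = np.zeros(hidden_size)
x1 = rng.normal(size=input_size)
h1, y1 = rnn_step(x1, h0)
print(h1.shape, y1.shape)        # (4,) (2,)
```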
3. Unfolding in Time
RNNs can be visualized as being "unfolded" in time, where each time step
of the input sequence corresponds to a layer in the network. This unfolding
shows the recursive nature of RNNs, where each unit shares the same
parameters but processes different elements of the sequence.
4. Training RNNs
Training RNNs involves adjusting the weights and biases to minimize a loss
function, typically using a variant of backpropagation called
Backpropagation Through Time (BPTT). BPTT involves the following steps:
Forward Pass: Compute the hidden states and outputs for each time
step.
Loss Calculation: Compute the loss based on the predicted outputs and
actual targets.
Backward Pass: Calculate the gradients of the loss with respect to the
parameters by propagating the error backwards through time.
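As a small illustration of these three steps, the sketch below uses PyTorch's autograd to perform the backward pass through time automatically rather than deriving the BPTT gradients by hand; the sizes, random data, and the linear read-out are assumptions for illustration.

```python
# A minimal BPTT sketch: forward pass over all time steps, loss, then
# backward pass whose gradients flow back through every time step.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, input_size, hidden_size = 5, 2, 3, 4

rnn = nn.RNN(input_size, hidden_size)          # tanh RNN, unrolled internally
readout = nn.Linear(hidden_size, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.01)

x = torch.randn(seq_len, batch, input_size)    # toy input sequence
target = torch.randn(seq_len, batch, 1)        # toy target at every time step

# Forward pass: hidden states and outputs for each time step
hidden_states, _ = rnn(x)                      # (seq_len, batch, hidden_size)
outputs = readout(hidden_states)

# Loss calculation over all time steps
loss = criterion(outputs, target)

# Backward pass: propagate the error backwards through time, then update
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```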
6. Variants of RNNs
To address the limitations of standard RNNs (such as the vanishing gradient
problem), several variants have been developed, including the Long Short-Term
Memory (LSTM) network and the Gated Recurrent Unit (GRU), both covered later
in this module.
8. Mathematical Representation
The mathematical representation of an RNN highlights its recursive nature:
Hidden State: ht = f(Wh ht−1 + Wx xt + bh )
Output: yt = f(Wy ht + by )
The above equations encapsulate the essence of RNNs, where the hidden
state at each time step is a function of the current input and the previous
hidden state, allowing the network to maintain a memory of previous inputs.
1. One-to-One
Architecture:
A single input maps to a single output. No recurrent connections
(equivalent to a standard feed-forward network).
Use Case: Simple classification tasks such as image classification.
2. One-to-Many
Architecture:
A single input produces a sequence of outputs. The hidden state evolves
over time to generate multiple outputs from a single input.
Use Case: Image captioning, where a single image is the input and a
sequence of words (caption) is the output.
3. Many-to-One
Architecture:
A sequence of inputs is processed step by step and a single output is
produced from the final hidden state.
Use Case: Sentiment analysis, where a sequence of words is the input and a
single sentiment label is the output.
4. Many-to-Many (Sequence-to-Sequence)
Architecture:
A sequence of inputs is mapped to a sequence of outputs, either step by
step or through an encoder-decoder pair (see the sketch after this list).
Use Case: Machine translation, where a sentence in one language is the
input and its translation is the output.
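The sketch below contrasts the many-to-one and many-to-many read-out patterns using the same recurrent layer; the batch size, dimensions, and the linear read-out layer are assumptions for illustration, not part of the notes.

```python
# Many-to-one vs many-to-many: same recurrent layer, different read-outs.
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size, num_classes = 2, 6, 8, 16, 3
x = torch.randn(batch, seq_len, input_size)

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
classifier = nn.Linear(hidden_size, num_classes)

outputs, h_n = rnn(x)      # outputs: (batch, seq_len, hidden), h_n: (1, batch, hidden)

# Many-to-one: use only the final hidden state (e.g., sentiment of a sentence)
sentence_logits = classifier(h_n[-1])      # (batch, num_classes)

# Many-to-many: apply the read-out at every time step (e.g., per-step tagging)
per_step_logits = classifier(outputs)      # (batch, seq_len, num_classes)

print(sentence_logits.shape, per_step_logits.shape)
```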
Healthcare
Patient Monitoring: Analyze medical data sequences to monitor patient
health and detect anomalies.
Anomaly Detection
Network Security: Detect unusual patterns indicating cyber-attacks by
analyzing network activity logs.
In its basic form, an RNN has a hidden state that is updated at each time
step of the input sequence. The same set of weights is used across all
time steps: at each step the network takes the current input ( xt ) and
the hidden state from the previous time step ( ht−1 ) to compute the new
hidden state ( ht ).
2. Visualization:
Unrolling the RNN means explicitly drawing out each of these time
steps as individual layers. Instead of a single recurrent unit that
repeatedly updates its state, we draw a chain of units, one per time
step, all sharing the same parameters (weights).
3. Mathematical Representation:
h1 = f(Whh h0 + Wxh x1 + bh )
h2 = f(Whh h1 + Wxh x2 + bh )
⋮
hT = f(Whh hT−1 + Wxh xT + bh )
where ( Whh ) is the weight matrix for the hidden state, ( Wxh ) is the weight
matrix for the input, and ( bh ) is the bias term.
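The loop below is a minimal sketch of this unrolled computation, reusing the same Whh, Wxh, and bh at every time step; the sequence length, dimensions, and random values are assumptions for illustration.

```python
# Unrolled forward pass h1 ... hT: one loop iteration per time step,
# with the same parameters reused at every unrolled "layer".
import numpy as np

T, input_size, hidden_size = 4, 3, 5
rng = np.random.default_rng(1)

Whh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
Wxh = rng.normal(scale=0.1, size=(hidden_size, input_size))
bh = np.zeros(hidden_size)

xs = rng.normal(size=(T, input_size))   # x1 ... xT
h = np.zeros(hidden_size)               # h0
hidden_states = []

for t in range(T):
    h = np.tanh(Whh @ h + Wxh @ xs[t] + bh)   # same Whh, Wxh, bh at every step
    hidden_states.append(h)

print(len(hidden_states), hidden_states[-1].shape)   # 4 (5,)
```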
4. Benefits of Unrolling:
5. Example:
RECURSIVE NEURAL NETWORKS
Introduction
Recursive Neural Networks represent data in a hierarchical or tree-like
structure, making them suitable for tasks where the input naturally forms a
tree. This structure allows for deep learning of the input data, leading to a
more compact and potentially more powerful representation of the input.
1. Tree Representation:
Nodes and Edges: Each node in the tree represents a subpart of the
input, and the edges represent the relationship between these
subparts.
2. Forward Propagation:
Child representations are combined at each node using the same shared
weights, moving from the leaves up to the root, which yields a single
vector representing the whole input.
3. Training:
Loss Function: The loss is computed at the root of the tree, and the
error is propagated downwards, updating the weights at each node.
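To make the forward propagation and the shared composition weights concrete, here is a minimal sketch of a recursive network over a tiny binary tree; the tree, the dimensions, and the compose function are illustrative assumptions, not part of the notes.

```python
# Recursive (tree-structured) forward propagation: each parent node combines
# its children's vectors with one shared weight matrix W and bias b.
import numpy as np

dim = 4
rng = np.random.default_rng(2)
W = rng.normal(scale=0.1, size=(dim, 2 * dim))   # shared composition weights
b = np.zeros(dim)

def compose(left, right):
    """Parent representation from two child representations."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def forward(node):
    """node is either a leaf vector or a (left_subtree, right_subtree) pair."""
    if isinstance(node, np.ndarray):
        return node
    left, right = node
    return compose(forward(left), forward(right))

# Example tree: ((w1, w2), w3) -- a tiny parse of a 3-word phrase
w1, w2, w3 = (rng.normal(size=dim) for _ in range(3))
root = forward(((w1, w2), w3))
print(root.shape)   # (4,) -- the root vector is where the loss would be computed
```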
Applications
Recursive Neural Networks have been successfully applied in various
domains, including:
1. Natural Language Processing: modeling sentences over their parse trees,
for example sentiment analysis of phrases.
2. Computer Vision: parsing an image or scene into a hierarchy of parts.
Advantages
1. Hierarchical Learning: the tree structure lets the network learn
representations that mirror the hierarchical structure of the input.
2. Flexibility: inputs of varying size and shape can be handled, since the
same composition function is applied at every node.
3. Efficiency: sharing weights across all nodes keeps the number of
parameters small, giving a compact representation.
2. Structural Dependency:
1. State as Memory: the hidden state acts as a memory that summarizes
information from all previous inputs in the sequence.
Working of RNNs
1. Input Sequence:
The RNN receives a sequence of inputs ( x(1), x(2), …, x(T) ), one
element per time step.
2. Hidden State Update:
At each time step ( t ), the RNN updates its hidden state ( h(t) )
based on the previous hidden state ( h(t − 1) ) and the current input
( x(t) ):
h(t) = f(Wh h(t − 1) + Wx x(t) + b)
where ( f ) is a nonlinear activation function (e.g., tanh or ReLU),
and ( Wh ), ( Wx ), and ( b ) are parameters shared across all time steps.
3. Output Generation:
The RNN generates an output ( y(t) ) at each time step, which can
be used for tasks like prediction or classification.
Applications of RNNs
RNNs are used in various applications where the order of data matters:
Speech Recognition: converting spoken audio, processed frame by frame, into text.
Time-Series Prediction: forecasting future values (e.g., stock prices or weather) from past observations.
Machine Translation: mapping a sentence in one language to a sentence in another.
Video Analysis: processing a sequence of frames to recognize actions or events.
LSTM ARCHITECTURE
Components:
Cell State: the internal memory that carries information across time steps.
Forget Gate: decides which information to discard from the cell state.
Input Gate: decides which new information to add to the cell state.
Output Gate: decides which part of the cell state is exposed as the hidden state ( ht ).
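To tie these components together, here is a minimal sketch of one LSTM cell step using the standard gate equations; the dimensions, random weights, and the sigmoid helper are assumptions made only for illustration.

```python
# One LSTM cell step: forget, input, and output gates plus the cell-state update.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 3, 4
rng = np.random.default_rng(3)

def gate_params():
    """Input weights, recurrent weights, and bias for one gate."""
    return (rng.normal(scale=0.1, size=(hidden_size, input_size)),
            rng.normal(scale=0.1, size=(hidden_size, hidden_size)),
            np.zeros(hidden_size))

(Wf, Uf, bf), (Wi, Ui, bi), (Wo, Uo, bo), (Wc, Uc, bc) = (gate_params() for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)        # forget gate
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)        # input gate
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev + bo)        # output gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                # updated cell state (memory)
    h_t = o_t * np.tanh(c_t)                          # new hidden state
    return h_t, c_t

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.normal(size=input_size), h, c)
print(h.shape, c.shape)   # (4,) (4,)
```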
LSTM APPLICATIONS
LSTMs (Long Short-Term Memory) networks are a cornerstone in Natural
Language Processing (NLP), facilitating a variety of tasks due to their
proficiency in handling sequential data and learning long-term
dependencies within text. Here’s how LSTMs are applied in NLP:
1. Language Modeling:
◦ Task: Predicting the next word in a sequence based on preceding words,
pivotal for autocompletion, text generation, and machine translation.
◦ How LSTMs help: They discern contextual cues from prior words and
leverage their memory capabilities to grasp relationships, enabling
accurate word prediction (a minimal code skeleton for this task appears
after this list).
2. Machine Translation:
◦ Task: Translating text from one language to another while preserving
semantic and syntactic coherence.
◦ How LSTMs help: By capturing long-term dependencies in sentences,
like word order and grammatical structures, LSTMs ensure faithful
translation by considering the holistic context.
3. Sentiment Analysis:
◦ Task: Determining the emotional polarity of text (positive, negative, or
neutral).
◦ How LSTMs help: Analyzing word sequences allows LSTMs to gauge
sentiment, considering not only individual words but also their contextual
nuances, thereby capturing subtle emotional cues effectively.
4. Text Summarization:
◦ Task: Condensing lengthy text while retaining essential information.
◦ How LSTMs help: Their memory of long-range context helps identify and
retain the most salient information when producing the summary.
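As a rough illustration of how the LSTM pieces above are typically wired for language modeling (item 1), here is a hedged skeleton built from PyTorch's nn.Embedding, nn.LSTM, and nn.Linear; the vocabulary size, dimensions, and toy batch are assumptions, not values from the notes.

```python
# A language-modeling skeleton: embed tokens, run an LSTM over the sequence,
# and predict the next word at each position.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_size = 1000, 32, 64

class LSTMLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_size, batch_first=True)
        self.next_word = nn.Linear(hidden_size, vocab_size)

    def forward(self, token_ids):                  # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))    # (batch, seq_len, hidden_size)
        return self.next_word(h)                   # logits over the vocabulary

model = LSTMLanguageModel()
tokens = torch.randint(0, vocab_size, (2, 7))      # toy batch of token ids
logits = model(tokens)                             # next-word prediction at each step
print(logits.shape)                                # torch.Size([2, 7, 1000])
```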
GRU ARCHITECTURE
The Gated Recurrent Unit (GRU) is a type of recurrent neural network
(RNN) architecture designed to address the vanishing gradient problem
and efficiently capture long-range dependencies in sequential data.
Here’s a brief overview of the GRU architecture:
Update Gate
Reset Gate
Update Gate
The update gate controls how much of the past information is passed along
to the future. The output of the update gate zt is calculated as follows:
zt = σ(Wz xt + Uz ht−1 )
where:
( Wz ) and ( Uz ) are weight matrices applied to the current input ( xt )
and the previous hidden state ( ht−1 ).
( σ ) is the sigmoid activation function.
Reset Gate
The reset gate determines how much of the past information to forget. The
output of the reset gate rt is calculated as:
rt = σ(Wr xt + Ur ht−1 )
where:
( Wr ) and ( Ur ) are the weight matrices of the reset gate, applied to
( xt ) and ( ht−1 ) respectively.
Candidate Hidden State
The candidate hidden state h̃t is computed using the reset gate:
h̃t = tanh(W xt + rt ⊙ (U ht−1 ))
where:
( W ) and ( U ) are weight matrices, ( ⊙ ) denotes element-wise
multiplication, and the reset gate ( rt ) controls how much of the previous
hidden state contributes to the candidate.
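Putting the update gate, reset gate, and candidate hidden state together, here is a minimal sketch of one GRU step; the final blend uses the common convention ht = (1 − zt) ⊙ ht−1 + zt ⊙ h̃t (some texts swap zt and 1 − zt), and the sizes and random weights are purely illustrative.

```python
# One GRU step built from the update gate, reset gate, and candidate state.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 3, 4
rng = np.random.default_rng(4)

def pair():
    """Input-weight and recurrent-weight matrices for one gate."""
    return (rng.normal(scale=0.1, size=(hidden_size, input_size)),
            rng.normal(scale=0.1, size=(hidden_size, hidden_size)))

(Wz, Uz), (Wr, Ur), (W, U) = pair(), pair(), pair()

def gru_step(x_t, h_prev):
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate
    h_tilde = np.tanh(W @ x_t + r_t * (U @ h_prev))  # candidate hidden state
    # Blend old state and candidate (convention varies across texts)
    return (1.0 - z_t) * h_prev + z_t * h_tilde

h = np.zeros(hidden_size)
h = gru_step(rng.normal(size=input_size), h)
print(h.shape)   # (4,)
```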
GRU vs LSTM
Feature | GRU | LSTM
Number of gates | 2 (update, reset) | 3 (input, forget, output)
Separate cell state | No (hidden state only) | Yes (cell state in addition to hidden state)
Number of parameters | Fewer | More
Typical training cost | Lower (faster to train) | Higher