AN2DL_04_2324_RecurrentNeuralNetworks
Sequence Modeling
[Figure: input vectors along the time axis; at each time step t the input is a vector $x^t = (x_1, \dots, x_i, \dots, x_I)$.]
Memoryless Models for Sequences (1/2)
Autoregressive models
• Predict the next input from a fixed window of previous inputs, each weighted by its own parameter (e.g., $W_{t-2}$, $W_{t-1}$); see the sketch below.
[Figure: inputs $X_0 \dots X_t$ and predictions $Y_0 \dots Y_t$ along the time axis, with weighted connections from past inputs to the current prediction.]
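A minimal sketch of a linear autoregressive predictor fit by least squares; the window size `p`, the toy series, and the helper `fit_ar` are assumptions for illustration, not from the slides:

```python
import numpy as np

def fit_ar(series, p):
    """Fit x^t ~ w_1 x^{t-1} + ... + w_p x^{t-p} by least squares (hypothetical helper)."""
    # Build the design matrix: column k holds the lag-(k+1) values
    X = np.stack([series[p - k - 1 : len(series) - k - 1] for k in range(p)], axis=1)
    y = series[p:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Toy data: noisy sinusoid (assumed, for demonstration only)
t = np.arange(200)
series = np.sin(0.2 * t) + 0.05 * np.random.default_rng(0).normal(size=200)

w = fit_ar(series, p=3)        # weights for x^{t-1}, x^{t-2}, x^{t-3}
pred = series[-3:][::-1] @ w   # predict the next value from the last window
```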
Memoryless Models for Sequences (2/2)
Feed-forward neural networks
• Generalize autoregressive models by inserting a hidden layer between the delayed inputs (weighted by $W_{t-2}$, $W_{t-1}$, $W_t$) and the prediction.
[Figure: delayed inputs feeding a hidden layer, which feeds the output, along the time axis.]
Dynamical Systems (Models with Memory)
• Inputs are treated as driving inputs to a dynamical system: a hidden state evolves over time and carries a memory of the past.
• The context state is updated recurrently, $c_b^t\!\left(x^t, W_B^{(1)}, c^{t-1}, V_B\right)$, so the current output can depend on the whole input history.
(Computation Beyond the Turing Limit, Hava T. Siegelmann, 1995)
[Figure: a chain of hidden states, each receiving the current input and the previous hidden state.]
Recurrent Neural Networks
At each time step, the hidden units combine the current input with the previous context state, and the context units update themselves recurrently:

$$h_j^t\!\left(x^t, W^{(1)}, c^{t-1}, V^{(1)}\right) = h_j\!\left(\sum_{i=0}^{I} w_{ji}^{(1)} \cdot x_i^t + \sum_{b=0}^{B} v_{jb} \cdot c_b^{t-1}\right)$$

$$c_b^t\!\left(x^t, W_B^{(1)}, c^{t-1}, V_B\right) = c_b\!\left(\sum_{i=0}^{I} w_{bi}^{(1)} \cdot x_i^t + \sum_{b'=0}^{B} v_{bb'} \cdot c_{b'}^{t-1}\right)$$

[Figure: the recurrent network with inputs $x_1 \dots x_I$ and context units $c_1^{t-1} \dots c_B^{t-1}$ fed back as inputs.]
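A minimal NumPy sketch of these two update equations; the names (`W1`, `V`, `WB`, `VB`), the `tanh` activations, and the toy sizes are assumptions for illustration, and bias terms are folded away:

```python
import numpy as np

def rnn_step(x_t, c_prev, W1, V, WB, VB):
    """One step of the recurrent network above (sketch, assumed tanh activations).

    x_t    : input vector at time t, shape (I,)
    c_prev : context state c^{t-1}, shape (B,)
    """
    # Hidden units: combine current input and previous context state
    h_t = np.tanh(W1 @ x_t + V @ c_prev)
    # Context units: recurrent state update c^t(x^t, W_B, c^{t-1}, V_B)
    c_t = np.tanh(WB @ x_t + VB @ c_prev)
    return h_t, c_t

# Toy usage: I = 4 inputs, J = 3 hidden units, B = 2 context units (assumed)
rng = np.random.default_rng(0)
I, J, B = 4, 3, 2
W1, V = rng.normal(size=(J, I)), rng.normal(size=(J, B))
WB, VB = rng.normal(size=(B, I)), rng.normal(size=(B, B))
c = np.zeros(B)
for x in rng.normal(size=(5, I)):   # a sequence of 5 time steps
    h, c = rnn_step(x, c, W1, V, WB, VB)
```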
Backpropagation Through Time
[Figure: the recurrent network with inputs $x_1 \dots x_I$, weights $w_{ji} \dots w_{JI}$, output $g^t(x|w)$, and context units $c_1^{t-1} \dots c_B^{t-1}$ computed by $c_b^t\!\left(x^t, W_B^{(1)}, c^{t-1}, V_B\right)$.]
To train the recurrent weights, the network is unrolled in time: the same weights are replicated at every time step, and gradients are computed on the resulting feed-forward network.
[Figure: the network unrolled over several time steps, with the weights shared across all copies.]
The weight updates then average the gradient contributions over the last $U$ unrolled time steps:

$$W_B = W_B - \eta \cdot \frac{1}{U}\sum_{u=0}^{U-1} \frac{\partial E^t}{\partial W_B^{t-u}}, \qquad V_B = V_B - \eta \cdot \frac{1}{U}\sum_{u=0}^{U-1} \frac{\partial E^t}{\partial V_B^{t-u}}$$

[Figure: gradients flowing back through the replicated recurrent weights $V_B^{t-3}, V_B^{t-2}, V_B^{t-1}, V_B^t$.]
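A minimal sketch of this update using TensorFlow's autodiff, unrolling a simple recurrent cell over $U$ steps; the cell, the squared-error loss, and all sizes are assumptions for illustration:

```python
import tensorflow as tf

U, I, B = 4, 3, 2                      # truncation window, input size, state size (assumed)
WB = tf.Variable(tf.random.normal((I, B)))
VB = tf.Variable(tf.random.normal((B, B)))

xs = tf.random.normal((U, 1, I))       # a short input sequence (toy data)
target = tf.random.normal((1, B))

with tf.GradientTape() as tape:
    c = tf.zeros((1, B))
    for u in range(U):                 # unroll: the SAME WB, VB at every step
        c = tf.tanh(xs[u] @ WB + c @ VB)
    loss = tf.reduce_mean((c - target) ** 2)

# Autodiff sums the per-step contributions dE/dWB^{t-u} across the unrolled copies
grads = tape.gradient(loss, [WB, VB])
eta = 0.1
WB.assign_sub(eta * grads[0] / U)      # average over the U unrolled steps, as in the formula
VB.assign_sub(eta * grads[1] / U)
```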
How much should we go back in time?
Sometimes the output might be related to inputs observed far back in time:

"Jane walked into the room. John walked in too. It was late in the day. Jane said hi to <???>"

Here the hidden state $h_j^t\!\left(x^t, W^{(1)}, c^{t-1}, V^{(1)}\right)$ must carry information across many time steps.
Dealing with Vanishing Gradient
One option is an activation whose gradient does not shrink, such as ReLU:

$$g(a) = \mathrm{ReLU}(a) = \max(0, a), \qquad g'(a) = \mathbb{1}_{a>0}$$

Another is to build recurrent neural networks using small modules that are designed to remember values for a long time. The simplest is a linear memory unit:

$$h^t = v^{(1)} h^{t-1} + w^{(1)} x^t, \qquad y^t = g\!\left(w^{(2)} \cdot h^t\right)$$

With $v^{(1)} = 1$ it only accumulates the input: the state never fades, so the gradient through time neither vanishes nor explodes.
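A tiny numerical illustration of why $v^{(1)} = 1$ matters: in the linear memory unit the gradient of $h^T$ with respect to $h^0$ is $(v^{(1)})^T$, which vanishes for $|v^{(1)}| < 1$ (the values of $v$ and $T$ below are assumed for the demo):

```python
# Gradient of h^T w.r.t. h^0 in the linear memory unit is simply v**T
for v in (0.5, 0.9, 1.0):
    for T in (10, 50, 100):
        print(f"v={v:3}, T={T:3}: dh^T/dh^0 = {v ** T:.3e}")
# v=0.5 and v=0.9 shrink toward 0 as T grows (vanishing gradient);
# v=1.0 stays exactly 1 at every horizon (the memory never fades).
```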
Long Short-Term Memories (LSTM)
The LSTM cell extends the plain RNN unit with a gated memory cell:
• Input gate: controls how much of the new input enters the cell state.
• Forget gate: controls how much of the previous cell state is retained.
• Memory gate: proposes the candidate update to the cell state.
• Output gate: controls how much of the cell state is exposed as the hidden state.

A popular variant, the Gated Recurrent Unit (GRU), combines the forget and input gates into a single "update gate." It also merges the cell state and hidden state, and makes some other changes.
[Figure: recurrent cells unrolled over the input sequence $X_0, X_1, \dots, X_t$, each producing a hidden state.]
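A minimal sketch of an LSTM sequence classifier in Keras; the input feature size, layer widths, and the binary-classification head are assumptions for illustration:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 8)),         # variable-length sequences, 8 features (assumed)
    layers.LSTM(32),                       # swap in layers.GRU(32) for a GRU cell
    layers.Dense(1, activation="sigmoid"), # e.g., binary sequence classification
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```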
Multiple Layers and Bidirectional LSTM Networks
LSTM layers can be stacked: each intermediate layer passes its full output sequence to the next, and ReLU/dense layers on top produce the final output (see the Keras sketch below).
[Figure: two stacked LSTM layers processing the sequence $X_0, X_1, \dots, X_t$, topped by ReLU layers.]
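A sketch of stacking in Keras (sizes assumed): intermediate LSTM layers must return the full sequence so the next layer receives one vector per time step:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 8)),
    layers.LSTM(32, return_sequences=True),  # emit h^t for every time step
    layers.LSTM(32),                         # last recurrent layer: final state only
    layers.Dense(16, activation="relu"),     # the ReLU layers on top
    layers.Dense(1),
])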
Tips & Tricks
Multiple Layers and Bidirectional LSTM Networks
In bidirectional processing, two LSTM stacks read the sequence in opposite directions: one processes $X_0, X_1, \dots, X_t$ forward while the other processes $X_t, X_{t-1}, \dots, X_0$ backward, and their outputs are combined at each step (see the sketch below).
[Figure: two stacked LSTM networks, one per direction, topped by ReLU layers.]
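A sketch of a bidirectional stacked LSTM in Keras (sizes assumed); `layers.Bidirectional` runs one copy of the wrapped layer in each direction and concatenates their outputs:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(None, 8)),
    layers.Bidirectional(layers.LSTM(32, return_sequences=True)),  # forward + backward pass
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),
])
```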
Tips & Tricks
Sequential Data Problems
• Fixed-sized input to fixed-sized output (e.g., image classification).
• Sequence output (e.g., image captioning: takes an image and outputs a sentence of words).
• Sequence input (e.g., sentiment analysis, where a given sentence is classified as expressing positive or negative sentiment).
• Sequence input and sequence output (e.g., machine translation: an RNN reads a sentence in English and then outputs a sentence in French).
• Synced sequence input and output (e.g., video classification, where we wish to label each frame of the video).
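These cases mainly differ in whether the recurrent layer returns only its last state or the whole sequence; a sketch of two of them in Keras, with all sizes assumed:

```python
from tensorflow.keras import layers, models

# Sequence input -> single output (e.g., sentiment analysis)
many_to_one = models.Sequential([
    layers.Input(shape=(None, 8)),
    layers.LSTM(32),                         # last hidden state only
    layers.Dense(1, activation="sigmoid"),
])

# Synced sequence input and output (e.g., per-frame video labels)
many_to_many = models.Sequential([
    layers.Input(shape=(None, 8)),
    layers.LSTM(32, return_sequences=True),  # one output per time step
    layers.TimeDistributed(layers.Dense(5, activation="softmax")),
])
```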
Sequence to Sequence Learning Examples (2/3)
Sequence to Sequence Learning Examples (3/3)