Recurrent neural networks
Deep Learning for Engineering
Master’s Degree in Electrical Engineering
Ivan Aldaya and Leandra Abreu
Table of Contents
▶ Introduction to RNNs
▶ Basic RNNs
▶ Long short-term memory and gated recurrent units
▶ Bibliography
1/25
Introduction to RNNs
Justification of RNNs
MLPs do not exploit signal correlation
CNNs exploit signal correlation
However, neither of them has memory
In some cases, it is interesting to have memory
• Mainly in series and sequence processing
• Forecasting of complicated series is of great interest
• MLPs and CNNs can also be employed
2/25
Introduction to RNNs
Examples of RNNs
Time series forecast:
Sahoo, B. B., Jha, R., Singh, A., & Kumar, D. (2019). Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophysica, 67(5), 1471-1481.

Other applications (nonlinear compensation):
Liu, X., Wang, Y., Wang, X., Xu, H., Li, C., & Xin, X. (2021). Bi-directional gated recurrent unit neural network based nonlinear equalizer for coherent optical communication system. Optics Express, 29(4), 5923-5933.
3/25
Introduction to RNNs
Types of time series and sequences
Dimensionality
Time series can be unidimensional:
• Some temperature traces
• Some voltages or currents
Or they can be multidimensional:
• Vectorial measurements of movement: forces, acceleration...
• Multidimensional communication signals: amplitude/phase, polarization

Directionality
In many cases, only past values can be considered
• This is the case of forecasting
In some cases, we have information in both directions
• Even for causal systems
• Sometimes, we perform a time framework conversion
4/25
Introduction to RNNs
General structure of RNNs
[Figure: Single-layer RNN vs. multilayer RNN. Inputs x[k], x[k+1], x[k+2], ..., x[k+i], ..., x[k+N] feed a recurrent stage (one recurrent layer, or several stacked recurrent layers), followed by a densely connected stage that produces the output y[k].]
5/25
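The two-stage structure above — a recurrent stage that consumes the window one sample at a time, followed by a densely connected stage — can be sketched in NumPy (a minimal illustration; all function and weight names are ours, not a library API):

```python
import numpy as np

def rnn_forward(x_window, w_in, w_rec, w_out):
    """Sketch of the general structure: a recurrent stage over the window
    x[k]...x[k+N], then a densely connected stage mapping the final
    hidden state to y[k]."""
    h = np.zeros(w_rec.shape[0])
    for x_k in x_window:                 # recurrent stage, one step per sample
        h = np.tanh(w_in * x_k + w_rec @ h)
    return float(w_out @ h)              # densely connected stage

rng = np.random.default_rng(0)
x_window = rng.normal(size=10)           # a window of 10 scalar samples
w_in = rng.normal(size=4)                # 4 recurrent units, scalar input
w_rec = rng.normal(size=(4, 4)) * 0.5    # recurrent weights
w_out = rng.normal(size=4)               # dense output weights
y = rnn_forward(x_window, w_in, w_rec, w_out)
```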
Introduction to RNNs
Different implementations
[Figure: each implementation drawn as a chain of memory cells traversed along the time direction.]

Basic RNNs
They are intuitive but they have some problems:
• Training is challenging
• Subject to gradient vanishing/explosion

Advanced RNNs
They improve basic RNNs to avoid:
• Gradient vanishing/explosion
There are mainly two types:
• Long short-term memory (LSTM)
• Gated recurrent unit (GRU)

Transformers
More advanced memory cells, capable of extracting complex temporal patterns
Transformers are used in:
• Natural language processing
• DNA sequencing
6/25
Table of Contents
▶ Introduction to RNNs
▶ Basic RNNs
▶ Long short-term memory and gated recurrent units
▶ Bibliography
7/25
Basic RNNs
The recurrent unit
[Figure: The recurrent unit. Inputs x1, x2, ..., xN are scaled by weights ω1, ω2, ..., ωN and summed (weighted sum u); the activation function produces the output y, which is fed back through a delay T with recurrent weight ωr (the recurrent connection).]
8/25
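The unit in the figure computes a weighted sum of the inputs plus the delayed, ωr-weighted output, then applies the activation function. A minimal NumPy sketch (names are ours, for illustration):

```python
import numpy as np

def recurrent_unit(x_seq, w, w_r, activation=np.tanh):
    """Single recurrent unit: u[k] = sum_i w_i * x_i[k] + w_r * y[k-1],
    then y[k] = f(u[k])."""
    y = 0.0
    outputs = []
    for x_k in x_seq:                    # x_k is the input vector at step k
        u = np.dot(w, x_k) + w_r * y     # weighted sum plus recurrent term
        y = activation(u)                # activation function
        outputs.append(float(y))
    return outputs

x_seq = [np.array([0.5, -0.2]), np.array([0.1, 0.3]), np.array([0.0, 1.0])]
w = np.array([0.8, -0.4])                # input weights ω1, ω2
y_seq = recurrent_unit(x_seq, w, w_r=0.6)  # ωr is the recurrent weight
```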
Basic RNNs
Unwrapping an RNN

[Figure: Unwrapping. The recurrent unit (input weights ωi, recurrent weight ωr, delay T) is unrolled in time: the samples xi[k], xi[k+1], xi[k+2], ..., xi[k+LW] each feed a copy of the unit, whose hidden outputs h1[k], h2[k], h3[k], ... are chained through ωr up to the final output y[k].]
9/25
Basic RNNs
Implementation of RNNs
In a more schematic way:

[Figure: The unwrapped network redrawn compactly: each sample xi[k+j] enters a recurrent block that produces hj[k], with consecutive blocks chained through ωr; the last block, fed by xi[k+LW], produces y[k].]
10/25
Basic RNNs
Implementation of RNNs
There are two alternatives:

Return the full sequence: the layer emits one value per input sample (h1[k], h2[k], ..., hj[k], ..., y[k]). The full sequence is usually adopted in intermediate layers.

Return the last value: the layer emits only the final value y[k]. A single output is usually adopted in the final recurrent layer.
11/25
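These two alternatives map directly onto the `return_sequences` flag of Keras recurrent layers. A NumPy sketch of the same idea (toy layer, our own names):

```python
import numpy as np

def recurrent_layer(x_window, w_in, w_rec, return_sequences=False):
    """Recurrent layer over one window. With return_sequences=True it
    emits one hidden vector per input sample (intermediate layers);
    with False it emits only the last value (final layer)."""
    h = np.zeros(w_rec.shape[0])
    hs = []
    for x_k in x_window:
        h = np.tanh(w_in * x_k + w_rec @ h)
        hs.append(h)
    return np.stack(hs) if return_sequences else h

rng = np.random.default_rng(1)
x_window = rng.normal(size=5)            # 5 scalar samples
w_in = rng.normal(size=3)                # 3 recurrent units
w_rec = rng.normal(size=(3, 3)) * 0.5
full = recurrent_layer(x_window, w_in, w_rec, return_sequences=True)
last = recurrent_layer(x_window, w_in, w_rec, return_sequences=False)
print(full.shape, last.shape)   # (5, 3) (3,)
```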
Basic RNNs
Implementation of RNNs
Stacking different recurrent layers

[Figure: The samples xi[k], xi[k-1], ..., xi[k-j], ..., xi[k-N] feed a first recurrent layer (weights ωr(1)) producing h11[k], h21[k], ..., hN1[k]; its sequence feeds a second recurrent layer (ωr(2)) producing h12[k], h22[k], ..., hN2[k]; a third recurrent layer (ωr(3)) produces the single output y[k].]
12/25
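Stacking can be sketched by chaining such layers, with intermediate layers returning full sequences and the last one a single value (toy NumPy code; all shapes are illustrative choices):

```python
import numpy as np

def recurrent_layer(x_seq, w_in, w_rec, return_sequences=False):
    """Toy recurrent layer: each input is combined with the previous
    hidden state through the layer's recurrent weights ωr."""
    h = np.zeros(w_rec.shape[0])
    hs = []
    for x_k in x_seq:
        h = np.tanh(w_in @ np.atleast_1d(x_k) + w_rec @ h)
        hs.append(h)
    return np.stack(hs) if return_sequences else h

rng = np.random.default_rng(2)
x_window = rng.normal(size=(6, 1))       # 6 scalar samples
# layer 1 (ωr(1)): 4 units, returns its full sequence
h1 = recurrent_layer(x_window, rng.normal(size=(4, 1)),
                     rng.normal(size=(4, 4)) * 0.5, return_sequences=True)
# layer 2 (ωr(2)): 3 units, also returns its full sequence
h2 = recurrent_layer(h1, rng.normal(size=(3, 4)),
                     rng.normal(size=(3, 3)) * 0.5, return_sequences=True)
# layer 3 (ωr(3)): 2 units, returns only the last value
y = recurrent_layer(h2, rng.normal(size=(2, 3)),
                    rng.normal(size=(2, 2)) * 0.5)
print(h1.shape, h2.shape, y.shape)   # (6, 4) (6, 3) (2,)
```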
Basic RNNs
Formatting the data for RNNs
Overlapping windowing is usually employed to build the input matrix and the output vector.

The length of the window should coincide with the depth of the unwrapped cell: LW = window length.

The number of windows is: NW = window number.

[Figure: window 1, window 2, window 3, ..., window i slide along the series, each paired with its target out1, out2, out3, ..., outi.]
13/25
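A windowing sketch in NumPy, assuming a 1-D series (the helper name is our own):

```python
import numpy as np

def make_windows(series, lw):
    """Overlapping windows: row i is series[i:i+lw], its target is
    series[i+lw]. Returns the NW x LW input matrix and the targets."""
    nw = len(series) - lw                       # number of windows NW
    X = np.stack([series[i:i + lw] for i in range(nw)])
    y = series[lw:lw + nw]
    return X, y

series = np.arange(14.0)                        # a toy series
X, y = make_windows(series, lw=3)
print(X.shape, y.shape)   # (11, 3) (11,)
```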
Basic RNNs
Formatting the data for RNNs
The different windows are combined in a single matrix.

[Figure: window 1, window 2, window 3, ..., window i stacked as the rows of the input matrix; out1, out2, out3, ..., outi stacked as the output vector.]
14/25
Basic RNNs
Formatting the data for RNNs
[Figure: Example with LW = 11. Window 0 contains x[0], ..., x[10] with output x[11]; window 1 contains x[1], ..., x[11] with output x[12]; window 2 contains x[2], ..., x[12] with output x[13]; in general, window i contains x[i+0], ..., x[i+10] with output x[i+11].]

The size of the input matrix is: NW × LW
The size of the output vector is: 1 × NW
15/25
Basic RNNs
Processing the data with RNNs
[Figure: Each window of the input matrix (size NW × LW) is processed sample by sample by the recurrent stage: the 1st recurrent layer has N1 units (RU1,1, RU1,2, ..., RU1,N1), the 2nd recurrent layer has N2 units (RU2,1, RU2,2, ..., RU2,N2), and so on up to the mth recurrent layer with Nm units. A densely connected layer then produces the output vector y[0], y[1], y[2], ..., y[NW].]
16/25
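Putting the pieces together — window matrix, stacked recurrent layers, dense output — as a toy NumPy sketch (all names and sizes are illustrative, not a library API):

```python
import numpy as np

def recurrent_layer(x_seq, w_in, w_rec, return_sequences=False):
    """Toy recurrent layer (tanh units), as in the previous slides."""
    h = np.zeros(w_rec.shape[0])
    hs = []
    for x_k in x_seq:
        h = np.tanh(w_in @ np.atleast_1d(x_k) + w_rec @ h)
        hs.append(h)
    return np.stack(hs) if return_sequences else h

rng = np.random.default_rng(3)
series = np.sin(np.linspace(0, 6, 40))            # a toy series
lw = 8                                            # window length LW
X = np.stack([series[i:i + lw] for i in range(len(series) - lw)])  # NW x LW

n1, n2 = 5, 4                                     # units per recurrent layer
w = {"in1": rng.normal(size=(n1, 1)), "rec1": rng.normal(size=(n1, n1)) * 0.4,
     "in2": rng.normal(size=(n2, n1)), "rec2": rng.normal(size=(n2, n2)) * 0.4,
     "dense": rng.normal(size=n2)}

y_hat = []
for window in X:                                  # one window at a time
    h1 = recurrent_layer(window, w["in1"], w["rec1"], return_sequences=True)
    h2 = recurrent_layer(h1, w["in2"], w["rec2"])  # last value only
    y_hat.append(float(w["dense"] @ h2))           # densely connected layer
y_hat = np.asarray(y_hat)
print(X.shape, y_hat.shape)   # (32, 8) (32,)
```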
Basic RNNs
Drawbacks of RNNs: the exploding/vanishing effect

Let's consider a single recurrent unit. To simplify our discussion, we assume a linear activation function.

The contribution of the input x[i] to the output is proportional to: ωR^(LW−i)

Therefore, we have two possibilities:
• |ωR| > 1: gradient explosion
• |ωR| < 1: gradient vanishing

[Figure: The unwrapped unit with inputs x[0], x[1], ..., x[LW] and output y[0], drawn for |ωR| > 1 (gradient explosion) and |ωR| < 1 (gradient vanishing).]
17/25
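The ωR^(LW−i) scaling can be checked numerically (a small sketch; LW = 50 and the two values of ωR are arbitrary choices):

```python
import numpy as np

lw = 50                                        # unwrapping depth LW
idx = np.arange(lw)                            # input sample index i
# contribution of x[i] to the output, for one ωR above and one below 1
contribs = {w_r: abs(w_r) ** (lw - idx) for w_r in (1.1, 0.9)}
for w_r, c in contribs.items():
    print(f"wR = {w_r}: x[0] weighted by {c[0]:.3g}, x[LW-1] by {c[-1]:.3g}")
```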
▶ Introduction to RNNs
▶ Basic RNNs
▶ Long short-term memory and gated recurrent units
▶ Bibliography
18/25
Long short-term memory and gated recurrent units
Short- and long-term paths

A possible solution to this problem is to implement bypass connections.

[Figure: Left: the basic recurrent unit unwrapped over x[0], x[1], ..., x[LW], with a single short-memory path to y[0]. Right: the same unit with an additional long-memory path that bypasses the step-by-step recurrence.]
19/25
Long short-term memory and gated recurrent units
Long short-term memory (LSTM) units

[Figure: The LSTM unit. At each step, the input x[i] and the previous hidden state feed a forget gate (logistic), an input gate (logistic) paired with a tanh candidate, and an output gate (logistic). The cell state carries the long-memory path, while the hidden state carries the short-memory path, from x[0], x[1], ..., x[LW] to y[0].]
20/25
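One step of an LSTM cell can be sketched as follows (a common formulation; the stacking order of the gates and all names here are our own choices, for illustration):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W, U, b stack the four gates in the order
    forget, input, candidate, output (an assumed convention)."""
    n = h.size
    z = W @ np.atleast_1d(x) + U @ h + b
    f = sigmoid(z[0:n])            # forget gate: what to erase from c
    i = sigmoid(z[n:2 * n])        # input gate: what to write into c
    g = np.tanh(z[2 * n:3 * n])    # candidate cell values
    o = sigmoid(z[3 * n:4 * n])    # output gate: what to expose as h
    c = f * c + i * g              # cell state: the long-memory path
    h = o * np.tanh(c)             # hidden state: the short-memory path
    return h, c

rng = np.random.default_rng(4)
n = 3                                    # number of LSTM units
W = rng.normal(size=(4 * n, 1))          # input weights (scalar input)
U = rng.normal(size=(4 * n, n))          # recurrent weights
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x_k in [0.5, -0.1, 0.8]:             # a short input window
    h, c = lstm_step(x_k, h, c, W, U, b)
```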
Long short-term memory and gated recurrent units
Gated recurrent units
[Figure: In an LSTM, the input at each step interacts with separate short-memory terms and long-memory terms. In a GRU, the long-memory and short-memory terms are merged into a single state.]
21/25
Long short-term memory and gated recurrent units
Gated recurrent units
[Figure: The GRU unit. The input x[i] and the previous cell state feed a reset gate (logistic) and an update gate (logistic); the reset-gated state and the input produce the candidate hidden state (tanh), which the update gate blends with the previous state to form the new state, from x[0], x[1], ..., x[LW] to y[0].]
22/25
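One GRU step, in the same toy style (all weight names are our own; note the single state vector in place of the LSTM's separate hidden and cell states):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step (illustrative names)."""
    x = np.atleast_1d(x)
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate hidden state
    return (1.0 - z) * h + z * h_tilde        # blend old state and candidate

rng = np.random.default_rng(5)
n = 3                                         # number of GRU units
Wz, Wr, Wh = (rng.normal(size=(n, 1)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(n, n)) for _ in range(3))
h_new = np.zeros(n)
for x_k in [0.5, -0.1, 0.8]:                  # a short input window
    h_new = gru_step(x_k, h_new, Wz, Uz, Wr, Ur, Wh, Uh)
```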
Table of Contents
▶ Introduction to RNNs
▶ Basic RNNs
▶ Long short-term memory and gated recurrent units
▶ Bibliography
23/25
Bibliography
1. Goodfellow, I. (2016). Deep learning (Vol. 196). MIT Press.
2. Alpaydin, E. (2021). Machine learning. MIT Press.
3. [Link]
4. [Link]
5. [Link]
6. [Link]
24/25
Exercise
Reply to and justify the following questions for RNNs, LSTMs, and GRUs:
1. Which are the model parameters of each of the three architectures?
2. Which are the model hyperparameters of each of the three architectures?
3. Briefly compare them in terms of complexity.
Tip: for the sake of clarity, I recommend building a table to summarize the information for each question.
25/25