Understanding Recurrent Neural Networks

The document provides an overview of Recurrent Neural Networks (RNNs), including their structure, types, and implementations, as well as advanced models like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). It discusses the importance of memory in processing sequences and time series data, highlighting the advantages and challenges of using RNNs. Additionally, it includes a bibliography and an exercise section for further exploration of the topic.

Recurrent neural networks

Deep Learning for Engineering


Master’s Degree in Electrical Engineering
Ivan Aldaya and Leandra Abreu
Table of Contents

▶ Introduction to RNNs

▶ Basic RNNs

▶ Long short-term memory and gated recurrent units

▶ Bibliography

1/25
Introduction to RNNs
Justification of RNNs

MLPs do not exploit correlation, whereas CNNs exploit signal correlation. However, neither of them has memory.

In some cases, it is interesting to have memory:

• Mainly in series and sequence processing
• Forecasting of complicated series is of great interest
• MLPs and CNNs can also be employed

2/25
Introduction to RNNs
Examples of RNNs

Time series forecasting:
Sahoo, B. B., Jha, R., Singh, A., & Kumar, D. (2019). Long short-term memory (LSTM) recurrent neural network for low-flow hydrological time series forecasting. Acta Geophysica, 67(5), 1471-1481.

Other applications (nonlinear compensation):
Liu, X., Wang, Y., Wang, X., Xu, H., Li, C., & Xin, X. (2021). Bi-directional gated recurrent unit neural network based nonlinear equalizer for coherent optical communication system. Optics Express, 29(4), 5923-5933.

3/25
Introduction to RNNs
Types of time series and sequences

Dimensionality

Time series can be unidimensional:
• Some temperature traces
• Some voltages or currents

Or they can be multidimensional:
• Vectorial measurements of movement: forces, acceleration...
• Multidimensional communication signals: amplitude/phase, polarization

Directionality

In many cases, only past values can be considered:
• This is the case of forecasting

In some cases, we can have information in both directions:
• Even for causal systems
• Sometimes, we perform a time framework conversion

4/25
Introduction to RNNs
General structure of RNNs
Single-layer RNN vs. multilayer RNN

[Diagram: in the single-layer RNN, the input samples x[k], x[k+1], x[k+2], ..., x[k+N] feed a single recurrent layer followed by a densely connected stage that produces y[k]; in the multilayer RNN, several recurrent layers are stacked (the recurrent stage) before the densely connected stage.]
5/25
Introduction to RNNs
Different implementations

Basic RNNs

They are intuitive, but they have some problems:
• Training is challenging
• Subject to gradient vanishing/explosion

Advanced RNNs

They improve basic RNNs to avoid gradient vanishing/explosion. There are mainly two types:
• Long short-term memory
• Gated recurrent unit

Transformers

More advanced memory cells, capable of extracting complex temporal patterns. Transformers are used in:
• Natural language processing
• DNA sequencing
6/25
Table of Contents

▶ Introduction to RNNs

▶ Basic RNNs

▶ Long-short memory term and gated recurrent units

▶ Bibliography

7/25
Basic RNNs
The recurrent unit

[Diagram: the basic recurrent unit. The inputs x1, x2, ..., xN are multiplied by the weights ω1, ω2, ..., ωN and summed (weighted sum u); the output y is obtained by applying the activation function, and is also fed back through the recurrent weight ωr and a delay block T (the recurrent connection).]

8/25
Basic RNNs
Unwrapping an RNN

[Diagram: unwrapping the recurrent unit. The unit with feedback through ωr and the delay T (left) is unfolded in time (right): each step receives its input xi[k+j] through ωi and the previous hidden state through ωr, producing the chain h1[k], h2[k], h3[k], ..., until step xi[k+LW] yields the output y[k].]

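The unwrapped unit above can be sketched in NumPy. This is a minimal illustrative sketch, not the course's reference code: it assumes a tanh activation and a scalar recurrent weight ωr, and the function name is hypothetical.

```python
import numpy as np

def recurrent_unit(x, w_in, w_r, activation=np.tanh):
    """Unrolled basic recurrent unit.

    x    : array of shape (LW, N) -- window of LW time steps, N inputs each
    w_in : array of shape (N,)    -- input weights (omega_i)
    w_r  : float                  -- recurrent weight (omega_r)
    Returns the output after processing the whole window.
    """
    h = 0.0  # initial state (the "y[k-1]" before the first step)
    for k in range(x.shape[0]):
        u = x[k] @ w_in + w_r * h   # weighted sum with recurrent feedback
        h = activation(u)           # activation function (assumed tanh)
    return h

# Toy usage: a window of 4 time steps with 2 inputs per step
x = np.array([[1.0, 0.5], [0.2, 0.1], [0.0, 1.0], [0.3, 0.3]])
y = recurrent_unit(x, w_in=np.array([0.4, -0.2]), w_r=0.5)
```

With `w_r = 0` and a linear activation, the loop reduces to a plain weighted sum of the last sample, which is a quick sanity check of the feedback term.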
9/25
Basic RNNs
Implementation of RNNs

[Diagram: the unwrapped chain (left) drawn in a more schematic way (right): each input xi[k+j] maps to a hidden state hj[k], successive states are linked through the recurrent weight ωr, and the last step xi[k+N] produces the output y[k].]

10/25
Basic RNNs
Implementation of RNNs

There are two alternatives:

Return the full sequence: the layer outputs the hidden state at every time step (h1[k], h2[k], ..., hj[k]). The full sequence is usually adopted in intermediate layers.

Return the last value: the layer outputs only y[k], after the last time step. A single output is usually adopted in the final recurrent layer.


11/25
Basic RNNs
Implementation of RNNs

Stacking different recurrent layers

[Diagram: two recurrent layers stacked. The inputs xi[k], xi[k-1], ..., xi[k-N] feed the first recurrent layer (recurrent weight ωr(1)), whose hidden-state sequence hj1[k] feeds the second layer (ωr(2)), producing hj2[k]; a final recurrent stage (ωr(3)) yields y[k].]
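Stacking and the two output alternatives can be sketched together in NumPy. A minimal sketch assuming scalar, single-unit layers with tanh activation (the function name and weight values are illustrative): the intermediate layer returns the full sequence, which feeds the final layer that returns only the last value.

```python
import numpy as np

def recurrent_layer(x, w_in, w_r, return_sequences=False):
    """Single scalar recurrent unit run over a sequence.

    x : shape (LW,) -- one scalar input per time step
    return_sequences=True  -> full hidden-state sequence (intermediate layers)
    return_sequences=False -> only the last value (final recurrent layer)
    """
    h, states = 0.0, []
    for xk in x:
        h = np.tanh(w_in * xk + w_r * h)  # recurrent update
        states.append(h)
    return np.array(states) if return_sequences else h

x = np.array([0.5, -0.1, 0.3, 0.8])
h_seq = recurrent_layer(x, w_in=0.7, w_r=0.4, return_sequences=True)  # feeds layer 2
y = recurrent_layer(h_seq, w_in=0.9, w_r=0.3)                          # final output
```

The full-sequence output has one entry per input time step, so the second layer sees a sequence of the same length as the original window.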
12/25
Basic RNNs
Formatting the data for RNNs

Overlapped windowing is usually employed to build the input matrix and the output vector.

• The length of the window should coincide with the depth of the unwrapped cell: LW = window length
• The number of windows is NW = window number

[Diagram: consecutive overlapping windows (window 1, window 2, window 3, ..., window i) slide along the series, each paired with its target output (out 1, out 2, out 3, ..., out i).]
13/25
Basic RNNs
Formatting the data for RNNs

The different windows are combined in a single matrix

[Diagram: the windows are stacked as the rows of the input matrix (window 1, window 2, window 3, ..., window i), and the corresponding targets form the output vector (out 1, out 2, out 3, ..., out i).]

14/25
Basic RNNs
Formatting the data for RNNs

[Diagram: worked example of the windowing. Window 0 contains x[0], x[1], x[2], ...; window 1 contains x[1], x[2], x[3], ...; window 2 contains x[2], x[3], x[4], ...; and window i contains x[i], x[i+1], x[i+2], ... Each window is paired with its target (output 0, output 1, output 2, ..., output i).]

The size of the input matrix is NW × LW. The size of the output vector is 1 × NW.
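The windowing above can be sketched as follows. A minimal NumPy version; `make_windows` is an illustrative name, and it assumes the target of each window is the sample immediately after it, as in the forecasting setting of the slides.

```python
import numpy as np

def make_windows(series, lw):
    """Build the input matrix (NW x LW) and output vector (NW,) by
    overlapped windowing: window j is series[j : j+lw] and its target
    is the next sample, series[j+lw]."""
    nw = len(series) - lw          # number of windows with a valid target
    X = np.array([series[j:j + lw] for j in range(nw)])
    y = np.array([series[j + lw] for j in range(nw)])
    return X, y

series = np.arange(10.0)           # toy series x[0..9]
X, y = make_windows(series, lw=3)
# X[0] = [0, 1, 2] with target y[0] = 3; X[1] = [1, 2, 3] with target 4; ...
```

With 10 samples and LW = 3, this yields NW = 7 windows, matching the NW × LW input-matrix size stated above.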
15/25
Basic RNNs
Processing the data with RNNs

[Diagram: the input matrix of size NW × LW feeds the 1st recurrent layer with N1 units (RU1,1, RU1,2, ..., RU1,N1), followed by the 2nd recurrent layer with N2 units, up to the mth recurrent layer with Nm units; a final densely connected layer produces the output vector y[0], y[1], ..., y[NW].]
16/25
Basic RNNs
Drawbacks of RNNs: the exploding/vanishing gradient effect

Let's consider a single recurrent unit. To simplify our discussion, we assume a linear activation function. The contribution of the input x[i] to the output is proportional to:

ωR^(LW − i)

Therefore, we have two possibilities:

• |ωR| > 1: gradient explosion
• |ωR| < 1: gradient vanishing

[Diagram: the unwrapped unit under both regimes; with |ωR| > 1 the contribution of the early inputs grows along the chain, whereas with |ωR| < 1 it fades away before reaching y[0].]

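A quick numerical check of the ωR^(LW − i) contribution (the two ωR values are illustrative, one for each regime):

```python
import numpy as np

LW = 20                                   # window length (depth of unrolling)
i = np.arange(LW + 1)                     # position of the input in the window
for w_r, regime in [(1.2, "explosion"), (0.8, "vanishing")]:
    contrib = w_r ** (LW - i)             # contribution of x[i] to the output
    print(f"{regime}: x[0] weighted by {contrib[0]:.3g}, "
          f"x[LW] weighted by {contrib[-1]:.3g}")
```

Even with |ωR| close to 1, twenty steps are enough for the first sample's weight to grow to about 38 (ωR = 1.2) or shrink to about 0.01 (ωR = 0.8), which is why training deep unrolled chains is challenging.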
17/25
▶ Introduction to RNNs

▶ Basic RNNs

▶ Long short-term memory and gated recurrent units

▶ Bibliography

18/25
Long short-term memory and gated recurrent units
Short and long-term paths

A possible solution to this problem is to implement bypass connections

[Diagram: in the plain recurrent unit (left), every input x[0], x[1], ..., x[LW] reaches y[0] only through the short-memory path along the chain; adding bypass connections (right) creates a long-memory path that lets early inputs reach the output without being attenuated at every step.]

19/25
Long short-term memory and gated recurrent units
Long short-term memory (LSTM) units

[Diagram: the LSTM cell. The long-memory path carries the cell state: the forget gate (logistic) decides what to discard from it, a logistic gate combined with a tanh branch decides what new information from x[i] to write into it, and the output gate (logistic), applied to a tanh of the cell state, produces the output carried by the short-memory path.]

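One step of the cell above can be sketched in NumPy. This is a minimal scalar sketch following the standard LSTM equations; the parameter layout (W, U, b as four rows, one per gate) is an illustrative assumption, not the slides' notation.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One scalar LSTM step. W, U, b each hold four parameters, in the
    order: forget gate, input gate, candidate, output gate."""
    f = sigmoid(W[0] * x + U[0] * h_prev + b[0])          # forget gate (logistic)
    i = sigmoid(W[1] * x + U[1] * h_prev + b[1])          # input gate (logistic)
    g = np.tanh(W[2] * x + U[2] * h_prev + b[2])          # candidate (tanh)
    o = sigmoid(W[3] * x + U[3] * h_prev + b[3])          # output gate (logistic)
    c = f * c_prev + i * g                                 # cell state: long-memory path
    h = o * np.tanh(c)                                     # output: short-memory path
    return h, c

W = U = np.full(4, 0.5)   # illustrative weights
b = np.zeros(4)
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0, W=W, U=U, b=b)
```

Note that the cell-state update `c = f * c_prev + i * g` is additive: when the forget gate saturates near 1 and the input gate near 0, the cell state is carried through unchanged, which is precisely the long-memory bypass of the previous slide.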
20/25
Long short-term memory and gated recurrent units
Gated recurrent units

[Diagram: comparison of the two units. In the LSTM, the short-memory and long-memory terms travel along separate paths from input to output; in the GRU, the long-memory and short-memory terms are combined into a single path.]
21/25
Long short-term memory and gated recurrent units
Gated recurrent units

[Diagram: the GRU cell. The reset gate (logistic) controls how much of the previous state enters the candidate hidden state (tanh); the update gate (logistic) interpolates between the previous state and the candidate (through the 1 − z and z branches) to form the new state, which is also the output.]

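As with the LSTM, one GRU step can be sketched in scalar NumPy. An illustrative sketch only: GRU conventions vary between references (some swap the roles of z and 1 − z in the interpolation), and the W, U, b layout is an assumption.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gru_step(x, h_prev, W, U, b):
    """One scalar GRU step. W, U, b each hold three parameters, in the
    order: reset gate, update gate, candidate hidden state."""
    r = sigmoid(W[0] * x + U[0] * h_prev + b[0])              # reset gate (logistic)
    z = sigmoid(W[1] * x + U[1] * h_prev + b[1])              # update gate (logistic)
    h_cand = np.tanh(W[2] * x + U[2] * (r * h_prev) + b[2])   # candidate (tanh)
    return (1.0 - z) * h_prev + z * h_cand                    # the 1-z / z interpolation

W = U = np.full(3, 0.5)   # illustrative weights
b = np.zeros(3)
h = gru_step(x=1.0, h_prev=0.2, W=W, U=U, b=b)
```

When the update gate saturates near 0, the previous state passes through unchanged, so the GRU realizes the long-memory and short-memory behavior with a single state instead of the LSTM's two.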
22/25
Table of Contents

▶ Introduction to RNNs

▶ Basic RNNs

▶ Long short-term memory and gated recurrent units

▶ Bibliography

23/25
Bibliography

1. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

2. Alpaydin, E. (2021). Machine Learning. MIT Press.

3. [Link]

4. [Link]

5. [Link]

6. [Link]

24/25
Exercise

Reply to and justify the following questions for RNNs, LSTMs, and GRUs:

1. Which are the model parameters of each of the three architectures?

2. Which are the model hyperparameters of each of the three architectures?

3. Briefly compare them in terms of complexity.

Tip: for the sake of clarity, I recommend building a table to summarize the information for each question.
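As a starting point for questions 1 and 3, the per-layer parameter counts can be computed from the layer width. This sketch assumes n units, m input features, and one bias per gate/block; real implementations may add extra bias terms, so treat the formulas as the textbook counts rather than exact library numbers.

```python
def rnn_params(m, n):
    """Trainable parameters of one basic recurrent layer:
    n units, m input features, one bias per unit."""
    return n * (m + n + 1)

def lstm_params(m, n):
    """LSTM has 4 blocks of the same size (forget, input, candidate, output)."""
    return 4 * n * (m + n + 1)

def gru_params(m, n):
    """GRU has 3 blocks (reset, update, candidate)."""
    return 3 * n * (m + n + 1)

# Example: 1 input feature, 32 units per layer
counts = {name: f(1, 32) for name, f in
          [("RNN", rnn_params), ("LSTM", lstm_params), ("GRU", gru_params)]}
```

The 4:3:1 ratio between LSTM, GRU, and basic RNN parameter counts is a useful anchor for the complexity comparison asked for in question 3.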

25/25
