Deep Learning
CC03
Hoàng Huy Minh
Hoàng Thảo Lan Chi
Phạm Huy Thiên Phúc
Trương Huỳnh Đăng Khoa
What is deep learning?
Deep learning is a subset of machine learning based on neural networks with three or more layers.
What is deep learning?
➔ These neural networks attempt to mimic how the human brain works so they can “learn” from large amounts of unstructured or unlabelled data.
➔ While a neural network with a single layer can still make predictions, additional hidden layers improve accuracy.
Applications of deep learning
History and Development
Prominent DL architectures
Training with example sets
1. Convolutional neural networks (CNN)
Structure
• a multilayer neural network
• early layers recognize features and later layers recombine them into higher-level attributes of the input
• Training: using back-propagation
Applications
• image processing
• video recognition
• natural language processing
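A minimal sketch of the idea in PyTorch (the layer sizes, 28×28 input, and class count are hypothetical, not from the slides):

```python
# Minimal CNN sketch: early conv layers detect low-level features,
# later layers recombine them into higher-level attributes.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # early layer: edges, blobs
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # later layer: compound shapes
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # assumes 28x28 input

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SmallCNN()
x = torch.randn(8, 1, 28, 28)                     # batch of 8 grayscale images
loss = nn.CrossEntropyLoss()(model(x), torch.randint(0, 10, (8,)))
loss.backward()                                   # training via back-propagation
```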
2. Recurrent neural network (RNN)
Structure
• Connections to the same layer or to prior layers help maintain a memory of past inputs and model problems over time
• Training: standard back-propagation or BPTT (back-propagation through time)
Applications
• speech recognition
• handwriting recognition
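A rough PyTorch sketch of the recurrence (sizes are hypothetical): the hidden state is fed back at every step, so the layer retains a memory of past inputs.

```python
# Minimal RNN sketch: the hidden state loops back into the same layer.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=40, hidden_size=128, batch_first=True)
head = nn.Linear(128, 30)              # e.g. 30 output symbols (hypothetical)

x = torch.randn(4, 100, 40)            # (batch, time steps, features)
out, h_n = rnn(x)                      # out holds the hidden state at every step
logits = head(out[:, -1, :])           # predict from the final step
logits.sum().backward()                # BPTT: gradients flow back through time
```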
3. LSTM networks
Structure
• memory cells
• 3 gates: input, forget, and output, with weights to control each gate
• Training: BPTT (back-propagation through time)
Application
• image and video captioning systems
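To make the gates concrete, here is one LSTM step written out by hand (the standard formulation with hypothetical sizes; real code would use nn.LSTM):

```python
# One hand-written LSTM step: three gates, each with its own weights,
# controlling what enters, stays in, and leaves the memory cell.
import torch

d_in, d_h = 40, 128                                  # hypothetical sizes
W = {g: torch.randn(d_h, d_in + d_h) * 0.01 for g in "ifoc"}
b = {g: torch.zeros(d_h) for g in "ifoc"}

def lstm_step(x, h, c):
    z = torch.cat([x, h], dim=-1)
    i = torch.sigmoid(z @ W["i"].T + b["i"])          # input gate
    f = torch.sigmoid(z @ W["f"].T + b["f"])          # forget gate
    o = torch.sigmoid(z @ W["o"].T + b["o"])          # output gate
    c = f * c + i * torch.tanh(z @ W["c"].T + b["c"]) # update the memory cell
    h = o * torch.tanh(c)                             # gated output
    return h, c

h = c = torch.zeros(1, d_h)
for x in torch.randn(10, 1, d_in):     # unrolled loop; BPTT differentiates through it
    h, c = lstm_step(x, h, c)
```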
4. Gated recurrent unit (GRU) networks
Structure
• A simplified version of the LSTM: more efficient but less expressive
• 2 gates: update gate and reset gate
• Training: similar to LSTM
Applications
• text compression
• handwriting recognition
• speech recognition
• gesture recognition
• image captioning
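A minimal PyTorch sketch (hypothetical sizes); nn.GRU applies the update and reset gates internally and keeps no separate cell state:

```python
# Minimal GRU sketch: two gates (update, reset), no separate memory cell.
import torch
import torch.nn as nn

gru = nn.GRU(input_size=40, hidden_size=128, batch_first=True)
x = torch.randn(4, 100, 40)            # (batch, time steps, features)
out, h_n = gru(x)                      # h_n: final hidden state per sequence
out.sum().backward()                   # trained like the LSTM, via BPTT
```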
1. Self-organizing maps (SOM)
Structure
• Clusters of the input data set
• Random weights are initialized for each feature of the input record and represent the input node
• Nodes with the least distance between the input and output are the most accurate and become center points
Applications
• Dimensionality reduction
• Radiant grade result
• Cluster visualization
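A bare-bones NumPy sketch of the training rule (grid size, learning rate, and neighbourhood width are hypothetical): the node closest to the input wins, and it and its grid neighbours are pulled toward that input.

```python
# Minimal self-organizing map: find the best matching unit, then pull
# it and its grid neighbours toward the input vector.
import numpy as np

rng = np.random.default_rng(0)
grid = rng.random((10, 10, 3))                  # 10x10 map of random 3-d weights
coords = np.stack(np.mgrid[0:10, 0:10], axis=-1)

def train_step(x, lr=0.1, sigma=2.0):
    dists = np.linalg.norm(grid - x, axis=-1)
    bmu = np.unravel_index(dists.argmin(), dists.shape)  # winning node
    d2 = ((coords - np.array(bmu)) ** 2).sum(-1)
    h = np.exp(-d2 / (2 * sigma**2))[..., None]          # neighbourhood weight
    grid[...] += lr * h * (x - grid)                     # pull toward the input

for x in rng.random((1000, 3)):                          # e.g. RGB colours
    train_step(x)
```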
2. Autoencoders
Structure
• 3 layers: input, hidden, and output
• Hidden layers have fewer nodes than the input layer
• The output layer decodes the hidden representation to reconstruct the input
• The weights are adjusted to minimize the reconstruction error
Applications
• Learn continuously
• Dimensionality reduction
• Data interpolation
• Data compression/decompression
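A minimal PyTorch sketch (the 784→32 bottleneck is hypothetical): the narrow hidden layer forces a compressed code, and the weights are tuned to minimize reconstruction error.

```python
# Minimal autoencoder: encode to a narrow bottleneck, decode back,
# and minimize the reconstruction error.
import torch
import torch.nn as nn

ae = nn.Sequential(
    nn.Linear(784, 32), nn.ReLU(),     # encoder: fewer nodes than the input
    nn.Linear(32, 784), nn.Sigmoid(),  # decoder: reconstruct the input
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

x = torch.rand(64, 784)                # e.g. flattened 28x28 images
loss = nn.functional.mse_loss(ae(x), x)
loss.backward()                        # adjust weights to cut the error
opt.step()
```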
3. Restricted Boltzmann Machines (RBM)
Structure
• 2 layers: visible (input) and hidden
• Nodes are connected between the 2 layers but not within a layer
• Neurons are activated at random (stochastically)
• Estimates the probability distribution of the training set by a stochastic approach
• Hidden and visible biases
Applications
• Dimensionality reduction
• Collaborative filtering
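A sketch of one contrastive-divergence (CD-1) update, a standard way to train RBMs (sizes and learning rate are hypothetical):

```python
# Minimal RBM with CD-1: units are sampled stochastically, and the
# weights move toward the data statistics and away from reconstructions.
import torch

n_v, n_h = 784, 128                               # hypothetical sizes
W = torch.randn(n_v, n_h) * 0.01                  # visible-hidden connections only
b_v, b_h = torch.zeros(n_v), torch.zeros(n_h)     # visible and hidden biases

def cd1_step(v0, lr=0.01):
    p_h0 = torch.sigmoid(v0 @ W + b_h)
    h0 = torch.bernoulli(p_h0)                    # random (stochastic) activation
    v1 = torch.bernoulli(torch.sigmoid(h0 @ W.T + b_v))  # reconstruct visibles
    p_h1 = torch.sigmoid(v1 @ W + b_h)
    W.add_(lr * (v0.T @ p_h0 - v1.T @ p_h1) / v0.shape[0])
    b_v.add_(lr * (v0 - v1).mean(0))
    b_h.add_(lr * (p_h0 - p_h1).mean(0))

cd1_step(torch.bernoulli(torch.rand(64, n_v)))    # one batch of binary data
```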
Outstanding achievements
Video to video synthesis
Ting-Chun Wang et al., 2018
Aim
Produce a high-resolution, photorealistic video output from a segmented input video in a diverse set of formats
Method
• 1 generator to create images
• 2 discriminators to ensure the quality of output videos
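A toy sketch of the training signal (all modules here are stand-ins; the real vid2vid networks are far larger): one generator maps label maps to frames, an image discriminator scores single frames, and a video discriminator scores short sequences for temporal consistency.

```python
# Toy vid2vid-style setup: 1 generator, 2 discriminators (per-frame
# realism and temporal consistency). All modules are placeholders.
import torch
import torch.nn as nn

G = nn.Conv2d(1, 3, 3, padding=1)        # stand-in generator: labels -> RGB
D_img = nn.Conv2d(3, 1, 3, padding=1)    # stand-in image discriminator
D_vid = nn.Conv3d(3, 1, 3, padding=1)    # stand-in video discriminator

labels = torch.rand(4, 8, 1, 64, 64)     # (batch, time, channels, H, W)
frames = torch.stack([G(labels[:, t]) for t in range(8)], dim=1)

score_img = D_img(frames[:, 0])                   # realism of a single frame
score_vid = D_vid(frames.transpose(1, 2))         # realism across time
g_loss = -(score_img.mean() + score_vid.mean())   # generator tries to fool both
g_loss.backward()
```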
Video to video synthesis
1. Semantic Labels → Cityscapes Street Views
Video to video synthesis
2. Edge → Face
• Create videos of a human face from an edge-map video
• Generate different faces from the same input
• Can change the facial appearance
3. Pose → Body
• Create realistic dance videos from videos of poses
• Can change the clothing and produce consistent shadows
Video to video synthesis
4. Frame prediction
2 tasks:
• Synthesize future masks from observed frames
• Convert the masks into videos by vid2vid synthesis
Language models: BERT
Google AI Language team, 2019
Training of Google’s BERT
Thanks for listening!
Any questions?