EE2211 Introduction to
Machine Learning
Lecture 12
Wang Xinchao
[email protected]
!"#$%&'()*+",,-"./01"233"4()*+5"4656'7681"
Course Contents
• Introduction and Preliminaries (Xinchao)
– Introduction
– Data Engineering
– Introduction to Linear Algebra, Probability and Statistics
• Fundamental Machine Learning Algorithms I (Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Xinchao)
– Performance Issues
– K-means Clustering
– Neural Networks
About this week’s lecture…
• Neural Network (NN) is a very big topic
– In NUS we have multiple full-semester modules to discuss NN
• EE4305 Fuzzy/Neural Systems for Intelligent Robotics
• EE5934/EE6934 Deep Learning
– In EE2211, we only give a very gentle introduction
• Understanding at the conceptual level is sufficient
– In the final exam, there is only 1 True/False + 1 MCQ question about NN
– No computation is required
• You will do some computation in tutorial, but final exam
will be much simpler than the questions in tutorial
Outline
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks
Perceptron
• In ML, the perceptron is an algorithm for supervised learning of binary classifiers.
• It is the building block of a neural network: an artificial neuron that performs certain computations on the input data.
(Figure: inputs x1, …, xd together with a bias input +1 feed into the neuron; the neuron forms a weighted sum and passes it through an activation function σ.)
• With the input vector X = [x1, …, xd]^T and the weight vector W = [w1, …, wd]^T (plus a bias weight w0 for the +1 input), the summation is the vector product X^T W.
• Output of the neuron: σ(X^T W), i.e. σ(Σ_i x_i w_i)
• Activation function: a non-linear function used to introduce non-linearity into the neural network!
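As an illustration (not part of the slides), here is a minimal NumPy sketch of this single-neuron computation; the input values, the weights, and the choice of a sigmoid activation are made up for the example:

```python
import numpy as np

def sigmoid(z):
    """One possible activation function (see the next slides)."""
    return 1.0 / (1.0 + np.exp(-z))

# Input vector with the bias term +1 appended, and one weight per entry.
x = np.array([1.0, 0.6, 0.5, 0.7])   # [bias, x1, x2, x3]  (made-up values)
w = np.array([0.1, 0.4, -0.2, 0.3])  # [w0, w1, w2, w3]    (made-up values)

z = x @ w            # summation: X^T W = sum_i x_i * w_i
output = sigmoid(z)  # activation: sigma(X^T W)
print(output)
```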
Activation Functions
Sigmoid Activation Function
  σ(z) = 1 / (1 + e^(−z)),  applied here to z = X^T W
(Figure: the sigmoid curve squashes X^T W into the range (0, 1).)
• The output can be interpreted as the probability of the input belonging to a class.
Activation Functions
ReLU Activation Function
  σ(z) = max(0, z)
(Figure: the ReLU curve is zero for negative inputs and linear for positive inputs.)
Rectified Linear Unit (ReLU)
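A minimal sketch of the two activation functions above, evaluated at a few arbitrary sample points:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """ReLU: zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))  # values strictly between 0 and 1
print(relu(z))     # [0.  0.  0.  0.5 2. ]
```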
Multilayer Perceptron (Neural Network)
• Stacking layers of such neurons gives a multilayer perceptron, which can be used as a classifier; the output of each layer becomes the input of the next layer. It is a nested function!
(Figure: the input X = [x1, x2, x3]^T plus a bias input +1 enters layer 0, the input layer; hidden layers 1, 2, …, n−1 each apply a set of weights followed by an activation σ; layer n, the output layer, produces one response per class, for Class 1 through Class C, which can be read as class probabilities.)
Note: hn denotes the number of hidden neurons in layer n.
Multilayer Perceptron (Neural Network)
• Consider layer 1 with h1 neurons. Its weights can be collected into a matrix

  W^(1) = [ w^(1)_{1,1} … w^(1)_{1,h1}
            w^(1)_{2,1} … w^(1)_{2,h1}
            w^(1)_{3,1} … w^(1)_{3,h1} ]

  so that X^T W^(1) = [ Σ_i x_i w^(1)_{i,1}, …, Σ_i x_i w^(1)_{i,h1} ], one weighted sum per neuron.
(Figure: the same network as before, with layer 0 the input layer and layer n the output layer.)
• A neural network is essentially a nested function:
  f(X) = σ_n( W_n^T σ_{n−1}( W_{n−1}^T … σ_1( W_1^T X ) … ) )
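To make the nested-function view concrete, here is a minimal NumPy sketch of a forward pass through a small network with two hidden layers; the layer sizes, the random weights, and the use of sigmoid in every layer are illustrative assumptions, not the slides' settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes: 3 inputs -> h1 = 4 hidden -> h2 = 4 hidden -> C = 3 outputs.
# Each weight matrix has one extra row for the bias input (+1).
W1 = rng.normal(size=(3 + 1, 4))
W2 = rng.normal(size=(4 + 1, 4))
W3 = rng.normal(size=(4 + 1, 3))

def forward(x, weights):
    """Nested function: a = sigma(W^T [a_prev; 1]) applied layer by layer."""
    a = x
    for W in weights:
        a = np.append(a, 1.0)      # append the bias term +1
        a = sigmoid(a @ W)         # sigma(W^T a)
    return a

x = np.array([0.6, 0.5, 0.7])
print(forward(x, [W1, W2, W3]))    # one response per class
```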
Outline
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks
Goal of Neural Network Training: Learn W
(Figure: the example input X = [0.6, 0.5, 0.7]^T is fed through the network, producing the output ŷ = f_W(X) = [0.7, 0.1, 0.2]^T.)
Specifically, W is learned through
1. Random initialization
2. Backpropagation
Neural Network Training: Backpropagation
Assume we train a NN for 3-class classification.
(Figure: the input X = [0.6, 0.5, 0.7]^T produces the prediction ŷ = f_W(X) = [0.7, 0.1, 0.2]^T, which is compared against the one-hot class label y = [0, 0, 1]^T using a loss function.)
1. Forward (weights are fixed, starting from random initialization):
   – To compute network responses
   – To compute the errors at each output; for a single sample:
     min_W Σ_{j=1}^{C} (ŷ_j − y_j)^2, or equivalently min_W ||ŷ − y||^2
2. Backward (weights are updated):
   – To pass back the error from the output to the hidden layers
   – To update all weights to optimize the network. Update W!
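A tiny sketch of the comparison step, using the prediction and the one-hot label from the figure:

```python
import numpy as np

y_hat = np.array([0.7, 0.1, 0.2])   # network prediction f_W(X)
y     = np.array([0.0, 0.0, 1.0])   # one-hot label: the true class is Class 3

# Squared-error loss for a single sample: sum_j (y_hat_j - y_j)^2
loss = np.sum((y_hat - y) ** 2)
print(loss)   # 0.49 + 0.01 + 0.64 = 1.14
```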
Neural Network Training: Backpropagation
• Recall that the parameters W are randomly initialized.
• We use Backpropagation to update W.
• In essence, Backpropagation is gradient descent!
• Assume we have N samples, each sample denoted by X_i and the corresponding NN output by ŷ_i. The loss function is then
  L = Σ_{i=1}^{N} ||ŷ_i − y_i||^2,  and we solve min_W L to learn W.
  Recall gradient descent in Lec 8: w ← w − η∇_w L
• We would therefore like to compute ∇_w L!
  – L is a function of ŷ, and ŷ is a function of w.
  – Use gradient descent and the chain rule!
Being aware of the concept is sufficient for exam. No calculation needed.
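For the curious (not required for the exam), here is a minimal sketch of gradient descent with the chain rule for a single sigmoid neuron and squared-error loss; the data, the target, and the learning rate are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.6, 0.5, 0.7])   # one input sample with bias term
y = 1.0                               # its target
w = np.zeros(4)                       # randomly initialised in practice
eta = 0.5                             # learning rate

for step in range(100):
    # Forward: compute the response and the error
    y_hat = sigmoid(x @ w)
    loss = (y_hat - y) ** 2
    # Backward: chain rule  dL/dw = dL/dy_hat * dy_hat/dz * dz/dw
    grad = 2 * (y_hat - y) * y_hat * (1 - y_hat) * x
    # Gradient descent update  w <- w - eta * grad
    w = w - eta * grad

print(y_hat, w)   # the prediction moves towards the target
```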
Neural Network Testing
Once the network is trained and its parameters are updated:
(Figure: a test input X = [0.6, 0.5, 0.7]^T is passed through the trained network, producing the output ŷ = [0.7, 0.1, 0.2]^T.)
1. Forward (weights are fixed):
   – To compute network responses
   – To predict the output labels given novel inputs
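A tiny sketch of the prediction step: the predicted label is simply the class with the largest network response (the output values are the made-up ones from the figure):

```python
import numpy as np

y_hat = np.array([0.7, 0.1, 0.2])        # network outputs for a test input
predicted_class = int(np.argmax(y_hat))  # index of the largest response
print(predicted_class)                   # 0, i.e. Class 1
```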
Supplementary materials
(Not required for exam)
1) https://www.youtube.com/watch?v=tIeHLnjs5U8
This video series includes animations that explain backpropagation
calculus.
2)
https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6
OF2tius3V3
This video series includes hands-on coding examples in Python.
Outline
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks
Convolutional Neural Network (CNN)
• A convolutional neural network (CNN) is a special type of
feed-forward network that significantly reduces the
number of parameters in a deep neural network.
• Very popular in image-related applications
• Each image is stored as a matrix in a computer
(Figure: each image is modelled as a matrix; each pixel is an entry of the matrix, and the higher the pixel value, the brighter the pixel.)
https://medium.com/lifeandtech/convert-csv-file-to-images-309b6fdb8c49
Convolutional Neural Network (CNN)
• If we model all matrix entries as inputs all at once
– Assume we have an image/matrix size of 200x200
– Assume we have 10K neurons in the first layer
– We already have 200x200x10K=400 Million parameters to learn!
(Figure: a fully connected first layer, where every one of the 10K neurons is linked to every one of the 200×200 pixels. That is a lot of parameters, so a fully connected neural network is not a good fit here.)
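A quick back-of-the-envelope check of the numbers above; the 3×3 kernel used for comparison is an assumption, matching the kernel shown on the later slides:

```python
image_pixels = 200 * 200             # inputs if every pixel feeds every neuron
first_layer = 10_000                 # neurons in the first layer
fully_connected = image_pixels * first_layer
print(fully_connected)               # 400,000,000 weights

conv_kernel = 3 * 3                  # a single sliding 3x3 kernel
print(conv_kernel)                   # 9 weights, reused at every position
```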
Convolutional Neural Network (CNN)
• Hence, we introduce CNN to reduce the number of
parameters.
(Figure: instead of linking every neuron to every pixel, design a single neuron (kernel) that only looks at a very small local region, and slide it over the whole image, scanning it position by position.)
Convolutional Neural Network (CNN)
(Figure: a 3×3 kernel slides over the image pixel values and produces one output value g[x, y] per position.)
The kernel weights, e.g.
   0  -1   0
  -1   5  -1
   0  -1   0
are the parameters (kernels) to be learned.
Image source: https://brilliant.org/wiki/convolutional-neural-network/
Convolutional Neural Network (CNN)
(Figures: the same 3×3 kernel
   0  -1   0
  -1   5  -1
   0  -1   0
is stepped across successive positions of the image, producing one output value g[x, y] per position.)
Neural Networks are Effective
(Figure: results on ImageNet.)
Summary
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks