Lecture 12 - Neural Networks (DONE!!) PDF

This document provides an overview of Lecture 12 of the EE2211 Introduction to Machine Learning course. The lecture will cover neural networks, including multi-layer perceptrons and activation functions. It will discuss training and testing neural networks using forward and backward propagation, and also cover convolutional neural networks. The lecture notes indicate that while neural networks are an important topic, the course will only provide a conceptual introduction due to time constraints, and exam questions will be relatively simple.


EE2211 Introduction to Machine Learning
Lecture 12

Wang Xinchao
[email protected]

!"#$%&'()*+",,-"./01"233"4()*+5"4656'7681"
Course Contents
• Introduction and Preliminaries (Xinchao)
– Introduction
– Data Engineering
– Introduction to Linear Algebra, Probability and Statistics
• Fundamental Machine Learning Algorithms I (Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Xinchao)
– Performance Issues
– K-means Clustering
– Neural Networks
About this week’s lecture…
• Neural networks (NN) are a very big topic
– At NUS there are multiple full-semester modules that discuss NN
• EE4305 Fuzzy/Neural Systems for Intelligent Robotics
• EE5934/EE6934 Deep Learning
– In EE2211, we give only a very gentle introduction

• Understanding at the conceptual level is sufficient
– In the final exam, there is only 1 True/False + 1 MCQ about NN
– No computation is required

• You will do some computation in the tutorial, but the final exam will be much simpler than the tutorial questions
Outline
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks

Perceptron
• In ML, the perceptron is an algorithm for supervised learning of binary classifiers; it is the building block of neural networks.
• A perceptron is an artificial neuron that performs certain computations on the input data: a weighted sum of the inputs x_i with weights w_i (plus a bias, implemented as a constant input of +1), followed by an activation function.

[Figure: a neuron with inputs x_1, …, x_d and a +1 bias input, weights w_1, …, w_d, a summation node Σ computing the vector product XᵀW, and an activation function σ producing σ(XᵀW).]

• Output of the neuron: σ(XᵀW), or equivalently σ(Σ_i w_i x_i)
• Activation function: a non-linear function, used to introduce non-linearity into the neural network!
Activation Functions
• Sigmoid activation function, applied to z = XᵀW:

σ(z) = 1 / (1 + e^(−z))

• Its output lies in (0, 1), so it can be described as the probability of the input belonging to a class.

[Figure: the S-shaped sigmoid curve plotted against XᵀW.]
Activation Functions
• ReLU activation function: the Rectified Linear Unit (ReLU)

σ(z) = max(0, z)

[Figure: the ReLU curve, which is zero for negative inputs and the identity for positive inputs.]
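A small sketch (an illustration, not from the slides) evaluating both activation functions on a few sample values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU keeps positive values and clips negative values to 0
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))  # values in (0, 1); sigmoid(0) = 0.5
print(relu(z))     # [0.  0.  0.  0.5 2. ]
```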

Multilayer Perceptron (Neural Network)
• Stacking neurons into layers gives a multilayer perceptron: an input layer (layer 0), hidden layers 1 to n−1, and an output layer (layer n); each layer also receives a constant +1 bias input.
• Each neuron computes a weighted sum of the previous layer's outputs and applies an activation function σ; the output of one layer is the input to the next layer, so the network is a nested function!
• Used as a classifier: for C classes, the output layer has C neurons, and output j can be read as the probability f_j(X) that the input X = [x_1, x_2, x_3]ᵀ belongs to class j.

[Figure: a fully connected network with inputs x_1, x_2, x_3 plus a +1 bias, hidden layers 1 to n−1 with activations σ_1, …, σ_{n−1}, and an output layer σ_n producing Class 1, Class 2, …, Class C. Weight w_{i,j} connects input i to neuron j; the superscript indexes the layer.]

Note: h_n denotes the number of hidden neurons in layer n.
Multilayer Perceptron (Neural Network)
• The weights of layer 1 form a matrix W_1 with one column per neuron (entry w_{i,j} connects input x_i to neuron j). Applying layer 1 to the input computes

XᵀW_1 = [ Σ_i w_{i,1} x_i , … , Σ_i w_{i,h_1} x_i ],

where

W_1 = [ w_{1,1} … w_{1,h_1}
        w_{2,1} … w_{2,h_1}
        w_{3,1} … w_{3,h_1} ].

Each of the h_1 entries is then passed through the activation σ_1 to give the outputs of layer 1, and the same pattern repeats at every layer.

[Figure: the same fully connected network as before, annotated with the entries of W_1.]

A neural network is essentially a nested function:

f(X) = σ_n( W_nᵀ σ_{n−1}( … σ_1( W_1ᵀ X ) … ) )
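To make the nested-function view concrete, here is a minimal forward-pass sketch in NumPy (illustrative only: the layer sizes and random weights are made up, and sigmoid is used for every σ):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Made-up layer sizes: 3 inputs -> 4 hidden -> 4 hidden -> 3 outputs (C = 3 classes)
sizes = [3, 4, 4, 3]
# One weight matrix per layer; the extra row holds the bias weights
weights = [rng.normal(size=(m + 1, n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, weights):
    # f(X) = sigma_n(W_n^T sigma_{n-1}(... sigma_1(W_1^T X) ...))
    h = x
    for W in weights:
        h = np.append(h, 1.0)   # constant +1 input for the bias
        h = sigmoid(h @ W)      # weighted sum, then activation
    return h

x = np.array([0.6, 0.5, 0.7])
print(forward(x, weights))      # C = 3 output values, one per class
```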
Outline
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks

Goal of Neural Network
Training: learn W.

For the input X = [0.6, 0.5, 0.7]ᵀ, the network produces the output ŷ = f_W(X) = [0.7, 0.1, 0.2]ᵀ.

[Figure: the MLP with input values 0.6, 0.5, 0.7 entering layer 0 and output values 0.7, 0.1, 0.2 leaving layer n.]

Specifically, W is learned through:
1. Random initialization
2. Backpropagation
Neural Network Training: Backpropagation
Assume we train a NN for 3-class classification. For the input X = [0.6, 0.5, 0.7]ᵀ, the network predicts ŷ = f_W(X) = [0.7, 0.1, 0.2]ᵀ, which is compared against the one-hot class label y = [0, 0, 1]ᵀ.

[Figure: the MLP mapping the input to the prediction ŷ, shown next to the one-hot label y; a loss function compares the two.]

1. Forward (weights are fixed, starting from random initialization):
– Compute the network responses
– Compute the errors at each output; for a single sample, the objective is
min_W Σ_j (ŷ_j − y_j)², or equivalently min_W ‖ŷ − y‖²
2. Backward (weights are updated):
– Pass the error back from the output to the hidden layers
– Update all weights W to optimize the network
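As a worked illustration using the numbers from this slide, the single-sample loss can be computed directly:

```python
import numpy as np

y_hat = np.array([0.7, 0.1, 0.2])  # network prediction f_W(X)
y     = np.array([0.0, 0.0, 1.0])  # one-hot label: the sample belongs to class 3

# Squared-error loss for a single sample: sum_j (y_hat_j - y_j)^2
loss = np.sum((y_hat - y) ** 2)
print(loss)  # 0.49 + 0.01 + 0.64 = 1.14
```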

Neural Network Training: Backpropagation

• Recall that the parameters W are randomly initialized.
• We use backpropagation to update W.
• In essence, backpropagation is gradient descent!
• Assume we have N samples, each sample denoted by X_i and the corresponding output of the NN by ŷ_i; the loss function is then

L = Σ_{i=1}^{N} ‖ŷ_i − y_i‖², and we solve min_W L to learn W.

Recall gradient descent in Lec 8: w ← w − η ∇_w L
• We would therefore like to compute ∇_w L!
– L is a function of ŷ, and ŷ is a function of w.
– Use gradient descent and the chain rule!
Being aware of the concept is sufficient for the exam. No calculation needed.
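The slides deliberately skip the calculus, but the update rule can be sketched on a toy problem. This example (a single linear neuron on made-up data, with the gradient derived by hand via the chain rule rather than full backpropagation through hidden layers) shows w ← w − η∇_w L in action:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: N = 100 samples with 3 features, scalar targets
X = rng.normal(size=(100, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w

w = rng.normal(size=3)   # 1. random initialization
eta = 0.05               # learning rate

for step in range(200):
    y_hat = X @ w                    # forward pass
    grad = 2 * X.T @ (y_hat - y)     # chain rule: dL/dw for L = sum((y_hat - y)^2)
    w = w - eta * grad / len(X)      # 2. gradient descent update
print(w)  # close to [0.5, -1.0, 2.0]
```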

Neural Network Testing
Once the network is trained and the parameters are updated, testing requires only a forward pass.

For a novel input X = [0.6, 0.5, 0.7]ᵀ, the trained network outputs ŷ = [0.7, 0.1, 0.2]ᵀ.

[Figure: the trained MLP mapping the input to the output ŷ = [0.7, 0.1, 0.2]ᵀ.]

1. Forward (weights are fixed):
– Compute the network responses
– Predict the output labels given novel inputs
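A sketch (illustrative) of turning the forward-pass output into a predicted class label:

```python
import numpy as np

y_hat = np.array([0.7, 0.1, 0.2])  # forward-pass output for a novel input

# Predict the class with the highest output value
predicted_class = int(np.argmax(y_hat)) + 1  # classes numbered 1..C
print(predicted_class)  # 1, since 0.7 is the largest output
```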

Supplementary materials
(Not required for exam)
1) https://www.youtube.com/watch?v=tIeHLnjs5U8
This video series includes animations that explain backpropagation calculus.

2) https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3
This video series includes hands-on coding examples in Python.

Outline
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks

Convolutional Neural Network (CNN)
• A convolutional neural network (CNN) is a special type of feed-forward network that significantly reduces the number of parameters in a deep neural network.
• Very popular in image-related applications.
• Each image is stored as a matrix in a computer: each pixel is modelled as an entry of the matrix, and the higher the pixel value, the brighter the pixel.

[Figure: a grayscale image alongside its matrix of pixel values.]
https://medium.com/lifeandtech/convert-csv-file-to-images-309b6fdb8c49

Convolutional Neural Network (CNN)
• If we model all matrix entries as inputs all at once:
– Assume we have an image/matrix of size 200×200
– Assume we have 10K neurons in the first layer
– We already have 200×200×10K = 400 million parameters to learn! That is a lot (see the quick check below).
• A fully connected neural network of this kind, where every first-layer neuron is linked to every pixel, is therefore no good.

[Figure: a 200×200 image fully connected to 10K first-layer neurons.]
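The parameter count is simple arithmetic; a quick check:

```python
# Fully connected first layer: every neuron is linked to every pixel
pixels = 200 * 200    # 40,000 inputs
neurons = 10_000      # 10K first-layer neurons
print(pixels * neurons)  # 400,000,000, i.e. 400 million weights (before biases)
```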

Convolutional Neural Network (CNN)
• Hence, we introduce the CNN to reduce the number of parameters.
• Key idea: design only one neuron and make it slide over (scan) the image, so that the neuron only looks at a very small local region at a time.

[Figure: a single small neuron sliding across the image.]

Convolutional Neural Network (CNN)
• The sliding neuron is described by a kernel: a small matrix of weights (3×3 in this example) that is to be learned. At each position, the kernel is multiplied entry-wise with the image patch beneath it and the products are summed, giving one entry g[x, y] of the output map g[×, ×]. A sketch of this computation follows below.

 0  -1   0
-1   5  -1
 0  -1   0

Kernels to be learned

Image source: https://brilliant.org/wiki/convolutional-neural-network/
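Here is a minimal sketch of the sliding-kernel computation (illustrative: plain NumPy loops rather than any deep-learning library, and no padding or stride options):

```python
import numpy as np

# The 3x3 kernel from the slide (in a CNN, these weights are learned)
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)

def convolve2d(image, kernel):
    # Slide the kernel over every valid position of the image;
    # each output entry g[x, y] is the sum of an entry-wise product.
    kh, kw = kernel.shape
    h, w = image.shape
    g = np.zeros((h - kh + 1, w - kw + 1))
    for x in range(g.shape[0]):
        for y in range(g.shape[1]):
            g[x, y] = np.sum(image[x:x + kh, y:y + kw] * kernel)
    return g

image = np.arange(25, dtype=float).reshape(5, 5)  # made-up 5x5 "image"
print(convolve2d(image, kernel))  # 3x3 output map g[x, y]
```

Note the saving: this kernel has only 9 weights no matter how large the image is, in contrast to the 400 million weights of the fully connected design.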


[Slides 21–24: animation frames showing the same 3×3 kernel sliding across the image, filling in the output map g[×, ×] one entry at a time.]
Neural Networks are Effective
[Figure: results on the ImageNet benchmark illustrating the effectiveness of neural networks.]
Summary
• Introduction to Neural Networks
– Multi-layer perceptron
– Activation Functions
• Training and Testing of Neural Networks
– Training: Forward and Backward
– Testing: Forward
• Convolutional Neural Networks
