
Machine Learning with

Convolutional Neural Networks

UZ

November 29, 2017


Overview

Supervised Learning

Single-layer Neural Network

Multi-layer Neural Network

Neural Network and Convolutional Neural Network


Machine Learning - Overview

[Diagram: Training Data {input, correct output} feeds the input to a Predictive Model, which produces an output; the error is the difference between the correct output and the model output]

- Machine Learning: adjust the parameters of the Predictive Model with Training Data and iteratively reduce the error using a Learning Rule
- Tasks: prepare Training Data and develop a Predictive Model with a Learning Rule
- Topics
  1. Basics, Single-layer Neural Network, LMS Rule, MNIST
  2. Multi-layer Neural Network, Learning Rules, MNIST
  3. Convolutional Neural Network, Learning Rules, MNIST
Machine Learning - Applications 1

[Diagram: a CNN maps an input image to a task-specific output: Denoise, Super-resolve, Segment, Detect, and Classify (e.g. Patrol Boat 99%, Boat 1%)]

Machine Learning - Applications 2

[Images of application examples]
Supervised Learning with Training Data

Training and Testing

[Diagram: during training, the input is fed to the model and the output is compared against the correct output to form the error; in the application phase, only input → model → output remains]
Supervised Learning with Neural Network

[Diagram: Training Data {input, correct output} feeds the input to the Neural Network, output = f(input); the error is the difference between the correct output and the network output]

Definitions

x, d    input and correct output (desired, target, label) vector
y, e    model output and error vector

Output (elementwise and in matrix form) and error:

$$y = f(x), \qquad Y = f(X), \qquad e = d - y$$
Supervised Learning with Neural Network

Learning procedure
1. Initialize the neural network with adequate weight values
2. Take an input/correct-output pair from the training data, feed the input to the neural network, obtain the output, and calculate the error
3. Adjust the weights to reduce the error with the Gradient Descent Method

$$w_{ji}(n+1) = w_{ji}(n) - \eta \frac{\partial E}{\partial w_{ji}} = w_{ji}(n) + \Delta w_{ji}(n) \tag{1}$$

4. Repeat Steps 2-3 for all training data
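As a minimal illustration of eq. (1), the sketch below (Python, with an assumed toy error function E(w) = (w − 3)² and an assumed learning rate, neither taken from the slides) iterates the update until w approaches the minimizer:

```python
# Gradient Descent on an assumed toy error function E(w) = (w - 3)^2.
# Its derivative dE/dw = 2(w - 3) plays the role of dE/dw_ji in eq. (1).

def dE_dw(w):
    return 2.0 * (w - 3.0)

w = 0.0          # initial weight
eta = 0.1        # learning rate (assumed value)
for n in range(50):
    w = w - eta * dE_dw(w)   # eq. (1): w(n+1) = w(n) - eta * dE/dw
print(w)         # approaches 3, the minimum of E
```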
Single Layer Neural Network

[Diagram: input nodes x_i connect through weights w_ji and bias b to a neuron node y_j; left: scalar output y = f(w^T x + b), right: vector output y = f(Wx + b)]

x, y, d    input, output and desired (target) vector
W, f(·)    weight matrix and activation function

$$y = f(\mathbf{w}^T \mathbf{x} + b) = f\Big(\sum_i w_i \cdot x_i + b\Big) \tag{2}$$

$$\mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b}) \tag{3}$$

$$e_j = d_j - y_j \tag{4}$$

$$w_{ji}(n+1) = w_{ji}(n) + \eta\, e_j\, x_i \qquad \text{(LMS algorithm)} \tag{5}$$
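A minimal single-layer training sketch in Python/NumPy; the sigmoid activation, toy data, and learning rate are assumptions for illustration, and the update line is the LMS rule of eq. (5):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Assumed toy data: four 3-dimensional inputs, one output node
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])
D = np.array([[0.], [0.], [1.], [1.]])

rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=(1, 3))   # weight matrix of eq. (3)
b = np.zeros(1)
eta = 0.9                                 # learning rate (assumed)

for epoch in range(1000):
    for x, d in zip(X, D):
        y = sigmoid(W @ x + b)            # eq. (3): y = f(Wx + b)
        e = d - y                         # eq. (4): e_j = d_j - y_j
        W += eta * np.outer(e, x)         # eq. (5): w_ji += eta * e_j * x_i
        b += eta * e
```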
Activation functions

- Linear function: $f(x) = x$
- Sigmoid function: $f(x) = \dfrac{1}{1 + e^{-x}}$
- ReLU: $f(x) = \max(0, x)$

[Plot: f(x) versus x for the Linear, Sigmoid and ReLU activation functions]
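For later use in the update rules, each activation can be paired with its derivative; a short NumPy sketch, written to apply elementwise to arrays:

```python
import numpy as np

def linear(x):
    return x

def linear_prime(x):
    return np.ones_like(x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)        # f'(x) = f(x) * (1 - f(x)), used in eq. (20)

def relu(x):
    return np.maximum(0.0, x)

def relu_prime(x):
    return (x > 0).astype(float)
```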
Multi Layer Neural Network

[Diagram: input layer (I nodes, x_i) → hidden layer (J nodes, y_j) via weights w_ji; hidden layer → output layer (K nodes, y_k) via weights w_kj]

Forward propagation

$$a_j = \sum_{i=1}^{I} w_{ji} x_i + b_j \tag{6}$$

$$y_j = f_j(a_j), \quad j = 1, \ldots, J \tag{7}$$

$$a_k = \sum_{j=1}^{J} w_{kj} y_j + b_k \tag{8}$$

$$y_k = f_k(a_k), \quad k = 1, \ldots, K \tag{9}$$

$$e_k = d_k - y_k, \quad k = 1, \ldots, K \tag{10}$$
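Eqs. (6)-(10) translate directly into two matrix-vector products; in the sketch below the layer sizes, the data, and the choice of sigmoid for both layers are assumptions:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

I_n, J_n, K_n = 4, 5, 3                      # I, J, K node counts (assumed)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(J_n, I_n)); b1 = np.zeros(J_n)   # w_ji, b_j
W2 = rng.normal(size=(K_n, J_n)); b2 = np.zeros(K_n)   # w_kj, b_k

x = rng.normal(size=I_n)                     # input vector (assumed)
d = np.array([1.0, 0.0, 0.0])                # desired output (assumed)

a_j = W1 @ x + b1                            # eq. (6)
y_j = sigmoid(a_j)                           # eq. (7)
a_k = W2 @ y_j + b2                          # eq. (8)
y_k = sigmoid(a_k)                           # eq. (9)
e_k = d - y_k                                # eq. (10)
```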
Generalized Delta Rule - 1

Error

$$E = \frac{1}{2}\sum_{k=1}^{K} e_k^2 = \frac{1}{2}\sum_{k=1}^{K} (d_k - y_k)^2 \tag{11}$$

Partial derivative of the error with respect to the output layer weights

$$\frac{\partial E}{\partial w_{kj}} = -(d_k - y_k) \cdot \frac{\partial f_k(a_k)}{\partial a_k} \cdot \frac{\partial a_k}{\partial w_{kj}} \tag{12}$$

$$\frac{\partial a_k}{\partial w_{kj}} = \frac{\partial}{\partial w_{kj}}\Big(\sum_{j=1}^{J} w_{kj} y_j + b_k\Big) = y_j \tag{13}$$

$$\frac{\partial E}{\partial w_{kj}} = -(d_k - y_k) \cdot f_k'(a_k) \cdot y_j \tag{14}$$
Generalized Delta Rule - 2

General update rule

$$w_{kj}(n+1) = w_{kj}(n) + \eta \cdot (d_k - y_k) \cdot f_k'(a_k) \cdot y_j \tag{15}$$

Linear output function

$$f_k(a_k) = a_k \tag{16}$$

$$f_k'(a_k) = 1 \tag{17}$$

$$w_{kj}(n+1) = w_{kj}(n) + \eta \cdot (d_k - y_k) \cdot y_j \tag{18}$$

Sigmoid output function

$$f_k(a_k) = \frac{1}{1 + \exp(-a_k)} \tag{19}$$

$$f_k'(a_k) = f_k(a_k) \cdot (1 - f_k(a_k)) = y_k \cdot (1 - y_k) \tag{20}$$

$$w_{kj}(n+1) = w_{kj}(n) + \eta \cdot (d_k - y_k) \cdot y_k \cdot (1 - y_k) \cdot y_j \tag{21}$$
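For the sigmoid case, eq. (21) in NumPy; the activation values below are stand-ins for one forward pass, not data from the slides:

```python
import numpy as np

eta = 0.5                              # learning rate (assumed)
y_j = np.array([0.2, 0.7, 0.5])        # hidden activations (assumed)
y_k = np.array([0.9, 0.3])             # sigmoid outputs (assumed)
d_k = np.array([1.0, 0.0])             # desired outputs (assumed)

# delta_k = (d_k - y_k) * f'(a_k) with f'(a_k) = y_k * (1 - y_k), eq. (20)
delta_k = (d_k - y_k) * y_k * (1.0 - y_k)

# w_kj(n+1) = w_kj(n) + eta * delta_k * y_j, eq. (21)
W2 = np.zeros((2, 3))                  # placeholder weight matrix
W2 += eta * np.outer(delta_k, y_j)
```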
Generalized Delta Rule - 3

General update rule

$$w_{kj}(n+1) = w_{kj}(n) + \eta \cdot (d_k - y_k) \cdot f_k'(a_k) \cdot y_j \tag{22}$$

Delta function

$$\delta_k = (d_k - y_k) \cdot f_k'(a_k) \tag{23}$$

$$\phantom{\delta_k} = e_k \cdot f_k'(a_k) \tag{24}$$

[Diagram: input layer (I nodes, x_i) → hidden layer (J nodes: w_ji, δ_j, y_j, e_j) → output layer (K nodes: w_kj, δ_k, y_k, e_k)]

Weight-update equation

$$w_{kj}(n+1) = w_{kj}(n) + \eta \cdot \delta_k \cdot y_j \tag{25}$$

Hidden Layer Weight Update - 1

Error

$$E = \frac{1}{2}\sum_{k=1}^{K} e_k^2 = \frac{1}{2}\sum_{k=1}^{K} (d_k - y_k)^2 = \frac{1}{2}\sum_{k=1}^{K} \Big(d_k - f_k\Big(\sum_{j=1}^{J} w_{kj} y_j + b_k\Big)\Big)^2 \tag{26}$$

Partial derivative of the error with respect to the hidden layer weights

$$\frac{\partial E}{\partial w_{ji}} = -\sum_k (d_k - y_k)\, \frac{\partial f_k(a_k)}{\partial a_k}\, \frac{\partial a_k}{\partial y_j}\, \frac{\partial y_j}{\partial a_j}\, \frac{\partial a_j}{\partial w_{ji}}$$

$$= -\sum_k (d_k - y_k) \cdot f_k'(a_k) \cdot w_{kj} \cdot f_j'(a_j) \cdot x_i \tag{27}$$

$$= -f_j'(a_j) \cdot x_i \sum_k (d_k - y_k) \cdot f_k'(a_k) \cdot w_{kj} \tag{28}$$

$$= -f_j'(a_j) \cdot x_i \sum_k \delta_k \cdot w_{kj} \tag{29}$$
Hidden Layer Weight Update - 2

$$\Delta w_{ji} = \eta\, f_j'(a_j) \cdot x_i \sum_k \delta_k \cdot w_{kj} \tag{30}$$

$$\delta_j = f_j'(a_j) \sum_k \delta_k \cdot w_{kj} = f_j'(a_j) \cdot e_j \tag{31}$$

$$\Delta w_{ji} = \eta \cdot \delta_j \cdot x_i \tag{32}$$

[Diagram: input layer (I nodes, x_i) → hidden layer (J nodes: w_ji, δ_j, y_j, e_j) → output layer (K nodes: w_kj, δ_k, y_k, e_k)]

Weight-update equation

$$w_{ji}(n+1) = w_{ji}(n) + \eta \cdot \delta_j \cdot x_i \tag{33}$$
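Putting eqs. (25) and (33) together gives one complete backprop step for the two-layer network; a sketch assuming sigmoid activations in both layers and illustrative shapes and data:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 4)); b1 = np.zeros(5)   # hidden weights w_ji
W2 = rng.normal(size=(3, 5)); b2 = np.zeros(3)   # output weights w_kj
x = rng.normal(size=4)                           # input (assumed)
d = np.array([0.0, 1.0, 0.0])                    # target (assumed)
eta = 0.5                                        # learning rate (assumed)

# Forward pass, eqs. (6)-(10)
y_j = sigmoid(W1 @ x + b1)
y_k = sigmoid(W2 @ y_j + b2)
e_k = d - y_k

# Deltas: output layer eq. (24), hidden layer eq. (31)
delta_k = e_k * y_k * (1.0 - y_k)
e_j = W2.T @ delta_k                 # error propagated back through w_kj
delta_j = e_j * y_j * (1.0 - y_j)

# Weight updates, eqs. (25) and (33)
W2 += eta * np.outer(delta_k, y_j); b2 += eta * delta_k
W1 += eta * np.outer(delta_j, x);   b1 += eta * delta_j
```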


Neural Network - Forward and Backprop

[Diagram: 2 input nodes fully connected to 3 output nodes]

Forward Propagation

$$\begin{pmatrix} o_1 \\ o_2 \\ o_3 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \\ w_{31} & w_{32} \end{pmatrix} \begin{pmatrix} i_1 \\ i_2 \end{pmatrix}, \qquad \mathbf{o} = \mathbf{W} \cdot \mathbf{i}$$

Back Propagation

$$\begin{pmatrix} i_1 \\ i_2 \end{pmatrix} = \begin{pmatrix} w_{11} & w_{21} & w_{31} \\ w_{12} & w_{22} & w_{32} \end{pmatrix} \begin{pmatrix} o_1 \\ o_2 \\ o_3 \end{pmatrix}, \qquad \mathbf{i} = \mathbf{W}^T \cdot \mathbf{o}$$
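The same picture in NumPy: the forward pass multiplies by W, and the backward pass reuses the transposed weights to carry an output-side quantity (in training, the error) back to the input side; the numeric values are assumptions:

```python
import numpy as np

W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])             # the 3x2 weight matrix above

i = np.array([1.0, -1.0])              # input (i1, i2), values assumed
o = W @ i                              # forward: o = W . i

e_out = np.array([0.5, -0.2, 0.1])     # output-side error (assumed)
e_in = W.T @ e_out                     # backward: W^T carries it back
```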
Neural Network - Generalization

[Diagram: input x → layer L−1 (W^{L−1}, δ^{L−1}, y^{L−1}, e^{L−1}) → layer L (W^L, δ^L, y^L, e^L)]

Forward

$$\mathbf{y}^{L-1} = f(\mathbf{W}^{L-1} \cdot \mathbf{x} + \mathbf{b}^{L-1}) \tag{34}$$

$$\mathbf{y}^{L} = f(\mathbf{W}^{L} \cdot \mathbf{y}^{L-1} + \mathbf{b}^{L}) \tag{35}$$

Backward

$$\boldsymbol{\delta}^{L} = f'(\mathbf{a}^{L}) \circ \mathbf{e}^{L} \tag{36}$$

$$\mathbf{e}^{L-1} = (\mathbf{W}^{L})^{T} \cdot \boldsymbol{\delta}^{L} \tag{37}$$

$$\boldsymbol{\delta}^{L-1} = f'(\mathbf{a}^{L-1}) \circ \mathbf{e}^{L-1} \tag{38}$$
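The layered recursion of eqs. (34)-(38) fits in one forward sweep and one backward sweep; a sketch where the layer sizes, data, and the sigmoid choice for f are assumptions, and the Hadamard product '∘' becomes NumPy's elementwise '*':

```python
import numpy as np

def f(v):
    return 1.0 / (1.0 + np.exp(-v))

def f_prime(v):
    s = f(v)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
sizes = [4, 6, 5, 3]                               # input, hidden..., output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=sizes[0])                      # input (assumed)
d = np.eye(sizes[-1])[0]                           # one-hot target (assumed)

# Forward sweep, eqs. (34)-(35)
ys, As = [x], []
for W, b in zip(Ws, bs):
    As.append(W @ ys[-1] + b)
    ys.append(f(As[-1]))

# Backward sweep, eqs. (36)-(38)
e = d - ys[-1]
deltas = []
for W, a in zip(reversed(Ws), reversed(As)):
    delta = f_prime(a) * e                         # eqs. (36)/(38)
    deltas.append(delta)
    e = W.T @ delta                                # eq. (37)
deltas.reverse()                                   # deltas[l] for layer l
```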
Training and Validation/Testing

[Diagram: the Training Data (Batch) feeds the Predictive Model; one pass over the complete set is an Epoch, which may be split into Minibatches; separate Validation Data evaluates the trained model]
Weight Update: Batch, SGD, Minibatch

Batch Gradient Descent (complete dataset or batch)

$$\Delta w_{kj} = \eta \cdot \frac{1}{M} \sum_{m \in \text{dataset}} y_k^{(m)} \cdot (1 - y_k^{(m)}) \cdot (d_k^{(m)} - y_k^{(m)}) \cdot y_j^{(m)} \tag{39}$$

Stochastic Gradient Descent (one single m)

$$\Delta w_{kj} = \eta \cdot y_k^{(m)} \cdot (1 - y_k^{(m)}) \cdot (d_k^{(m)} - y_k^{(m)}) \cdot y_j^{(m)} \tag{40}$$

Minibatch Gradient Descent (complete dataset split into smaller minibatches)

$$\Delta w_{kj} = \eta \cdot \frac{1}{M} \sum_{m \in \text{minibatch}} y_k^{(m)} \cdot (1 - y_k^{(m)}) \cdot (d_k^{(m)} - y_k^{(m)}) \cdot y_j^{(m)} \tag{41}$$
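The three variants differ only in how many samples contribute to one update, so a single loop parameterized by batch size covers them all; the one-layer sigmoid model and the data below are assumptions for illustration:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                    # dataset inputs (assumed)
D = (rng.random((32, 2)) > 0.5).astype(float)   # desired outputs (assumed)
W = rng.normal(size=(2, 4))
eta = 0.5                                       # learning rate (assumed)

def train_epoch(batch_size):
    global W
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        dW = np.zeros_like(W)
        for m in batch:                          # accumulate the sum in (39)/(41)
            y = sigmoid(W @ X[m])
            dW += np.outer((D[m] - y) * y * (1 - y), X[m])
        W += eta * dW / len(batch)               # average over the M samples

train_epoch(len(X))   # batch gradient descent, eq. (39)
train_epoch(1)        # stochastic gradient descent, eq. (40)
train_epoch(8)        # minibatch gradient descent, eq. (41)
```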
Neural Network and Convolutional Neural Network

[Diagram: the fully connected case repeats the matrix picture above (forward o = W · i, backward i = W^T · o), shown next to the convolutional case]

[Diagram: Forward Correlation slides the 2×2 kernel w over the 3×3 input; Backward Correlation uses rot180(w) or rot180(out) to distribute each output element back over the input positions]
Convolutional Neural Network

[Diagram: Forward Correlation of the 3×3 input (entries 1..9) with the 2×2 kernel (w11, w12, w21, w22); Backward Correlation with rot180(w) or rot180(out)]

$$\mathbf{Y} = \mathbf{X} * \text{rot180}(\mathbf{W})$$

$$\mathbf{X} = \mathbf{Y} * \mathbf{W} = \mathbf{W} * \mathbf{Y}$$
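A NumPy sketch of the two operations on the 3×3-input / 2×2-kernel example above; the error values are stand-ins, and the full convolution of the backward pass is realized as zero-padded correlation with the 180°-rotated kernel:

```python
import numpy as np

def correlate2d_valid(X, W):
    """Cross-correlation: slide W over X with no padding."""
    h, w = W.shape
    H, C = X.shape[0] - h + 1, X.shape[1] - w + 1
    Y = np.zeros((H, C))
    for r in range(H):
        for c in range(C):
            Y[r, c] = np.sum(X[r:r + h, c:c + w] * W)
    return Y

rot180 = lambda A: np.rot90(A, 2)

X = np.arange(1.0, 10.0).reshape(3, 3)     # the 3x3 input 1..9
W = np.array([[11., 12.],
              [21., 22.]])                 # the 2x2 kernel w11..w22

# Forward: correlation with W (equivalently, convolution with rot180(W))
Y = correlate2d_valid(X, W)                # 2x2 output

# Backward: full convolution of the output-side error with W, implemented
# as zero-padded correlation with rot180(W); this spreads each output
# error element back over the input positions that produced it.
dY = np.ones_like(Y)                       # stand-in error (assumed)
dX = correlate2d_valid(np.pad(dY, 1), rot180(W))   # back to 3x3
```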
