Machine Learning With Convolutional Neural Networks
Supervised Learning
[Figure: supervised-learning setup. An input is fed to the model, the model output is compared with the correct output, and the difference is the error. Example applications: denoising, super-resolution, segmentation, detection.]
Supervised Learning with Neural Network
Definitions
x, d: input and correct-output (desired, target, label) vectors
y, e: model output and error vectors

Output and error:
y = f(x),   e = d − y
Matrix (batch) form: Y = f(X)
Learning procedure
1. Initialize the neural network with adequate weight values.
2. Take an input / correct-output pair from the training data, feed the input to the neural network, obtain the output, and calculate the error.
3. Adjust the weights to reduce the error with the gradient descent method (a minimal sketch follows this list):
   w_ji(n + 1) = w_ji(n) − η ∂E/∂w_ji = w_ji(n) + Δw_ji(n)   (1)
4. Repeat steps 2-3 for all training data.
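A minimal sketch of steps 1-4 for a single-layer linear model, assuming the squared-error cost of Eq. (11); the toy data, learning rate, and variable names are illustrative, not from the slides.

import numpy as np

# Minimal sketch of the update w_ji(n+1) = w_ji(n) - eta * dE/dw_ji
# for one training pair (x, d) and a linear single-layer model y = W x.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))          # step 1: initialize weights
x = np.array([0.5, -1.0, 2.0])       # input vector
d = np.array([1.0, 0.0])             # correct (desired) output

eta = 0.1                            # learning rate
for n in range(100):                 # repeat steps 2-3
    y = W @ x                        # step 2: forward pass, obtain output
    e = d - y                        # error e = d - y
    # step 3: for E = 1/2 * sum(e^2), dE/dW = -(d - y) x^T, so Delta_W = eta * e x^T
    W += eta * np.outer(e, x)

print(np.round(W @ x, 3))            # approaches d after training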
Single Layer Neural Network
Single output: y = f(w^T x + b);  multiple outputs: y = f(W x + b).

[Figure: typical activation functions f: Sigmoid, ReLU, Linear.]
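A small sketch of the single-layer forward pass y = f(Wx + b) with the three activations named in the figure; the helper names and values are illustrative.

import numpy as np

# y = f(W x + b) for a single layer, with the activations from the figure.
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    return np.maximum(0.0, a)

def linear(a):
    return a

def single_layer(x, W, b, f):
    return f(W @ x + b)

x = np.array([1.0, -2.0])
W = np.array([[0.5, 0.3],
              [-0.2, 0.8]])
b = np.array([0.1, -0.1])

for f in (sigmoid, relu, linear):
    print(f.__name__, single_layer(x, W, b, f))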
Multi Layer Neural Network
[Figure: two-layer network with inputs x_i, hidden-layer outputs y_j (weights w_ji), and outputs y_k (weights w_kj).]

Forward propagation

a_j = Σ_{i=1}^{I} w_ji x_i + b_j                 (6)
y_j = f_j(a_j),   j = 1, ..., J                  (7)
a_k = Σ_{j=1}^{J} w_kj y_j + b_k                 (8)
y_k = f_k(a_k),   k = 1, ..., K                  (9)
e_k = d_k − y_k,  k = 1, ..., K                  (10)
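A sketch of Eqs. (6)-(10) in vector form, with the sigmoid chosen as the activation for both layers; the layer sizes and random weights are illustrative.

import numpy as np

# Forward propagation through the two-layer network of Eqs. (6)-(10).
# W1 holds the weights w_ji (hidden layer), W2 the weights w_kj (output layer).
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

I, J, K = 3, 4, 2
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(J, I)), np.zeros(J)
W2, b2 = rng.normal(size=(K, J)), np.zeros(K)

x = np.array([0.2, -0.5, 1.0])       # input x_i
d = np.array([1.0, 0.0])             # desired output d_k

a_j = W1 @ x + b1                    # Eq. (6)
y_j = sigmoid(a_j)                   # Eq. (7)
a_k = W2 @ y_j + b2                  # Eq. (8)
y_k = sigmoid(a_k)                   # Eq. (9)
e_k = d - y_k                        # Eq. (10)
print(y_k, e_k)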
Generalized Delta Rule - 1
Error

E = (1/2) Σ_{k=1}^{K} e_k^2 = (1/2) Σ_{k=1}^{K} (d_k − y_k)^2    (11)

For a linear output activation, the derivative is constant:
f_k(a_k) = a_k                                    (16)
f'_k(a_k) = 1                                     (17)
w_kj(n + 1) = w_kj(n) + η (d_k − y_k) y_j         (18)

Delta function
δ_k = (d_k − y_k) f'_k(a_k)                       (23)
    = e_k f'_k(a_k)                               (24)
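A sketch of the output-layer update of Eqs. (16)-(18): with a linear output activation, f'_k = 1, so δ_k = e_k and Δw_kj = η e_k y_j. Shapes and values below are illustrative.

import numpy as np

# Delta rule for the output layer (Eqs. (16)-(18)): with f_k(a_k) = a_k we have
# f'_k = 1, so delta_k = e_k = d_k - y_k and w_kj(n+1) = w_kj(n) + eta * delta_k * y_j.
eta = 0.1
W2 = np.array([[0.2, -0.4, 0.1],
               [0.7,  0.3, -0.5]])   # weights w_kj (K=2 outputs, J=3 hidden units)
y_j = np.array([0.6, 0.1, 0.9])      # hidden-layer outputs y_j
d = np.array([1.0, -1.0])            # desired outputs d_k

y_k = W2 @ y_j                       # linear output: y_k = a_k
delta_k = d - y_k                    # Eqs. (23)-(24) with f'_k = 1
W2 += eta * np.outer(delta_k, y_j)   # Eq. (18)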
[Figure: backpropagation through the two-layer network: inputs x_i; hidden layer with weights w_ji, outputs y_j, errors e_j, deltas δ_j; output layer with weights w_kj, outputs y_k, errors e_k, deltas δ_k.]

Weight-update equation

Δw_ji = η f'_j(a_j) x_i Σ_k δ_k w_kj              (30)
δ_j = f'_j(a_j) Σ_k δ_k w_kj = f'_j(a_j) e_j      (31)
Δw_ji = η δ_j x_i                                 (32)
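A sketch of Eqs. (30)-(32): the output deltas are back-propagated through w_kj to give e_j, multiplied by f'_j(a_j) (for the sigmoid, y_j(1 − y_j)) to give δ_j, and the hidden-layer weights are updated with Δw_ji = η δ_j x_i. All values are illustrative.

import numpy as np

# Hidden-layer update of Eqs. (30)-(32): back-propagate the output deltas
# through w_kj, multiply by f'_j(a_j), then update w_ji.
eta = 0.1
x = np.array([0.5, -1.0])                  # inputs x_i
W1 = np.array([[0.1, 0.4],
               [-0.3, 0.2],
               [0.5, -0.1]])               # w_ji (J=3 hidden units, I=2 inputs)
W2 = np.array([[0.2, -0.4, 0.1],
               [0.7,  0.3, -0.5]])         # w_kj (K=2 outputs)

y_j = 1.0 / (1.0 + np.exp(-(W1 @ x)))      # hidden outputs (sigmoid)
delta_k = np.array([0.3, -0.2])            # output deltas from Eq. (24)

e_j = W2.T @ delta_k                       # e_j = sum_k delta_k * w_kj
delta_j = y_j * (1.0 - y_j) * e_j          # Eq. (31): delta_j = f'_j(a_j) * e_j
W1 += eta * np.outer(delta_j, x)           # Eq. (32): Delta_w_ji = eta * delta_j * x_i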
Forward Propagation
o = W · i, i.e.
o_1 = w_11 i_1 + w_12 i_2
o_2 = w_21 i_1 + w_22 i_2
o_3 = w_31 i_1 + w_32 i_2

Back Propagation

i = W^T · o, i.e.
i_1 = w_11 o_1 + w_21 o_2 + w_31 o_3
i_2 = w_12 o_1 + w_22 o_2 + w_32 o_3
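A sketch of the same 3×2 example: the forward pass multiplies by W, the backward pass propagates the output-side quantities back with W^T. Values are illustrative.

import numpy as np

# Forward pass o = W @ i; backward pass uses the transpose, W.T.
W = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])       # 3 outputs, 2 inputs (w_11 ... w_32)
i = np.array([1.0, -1.0])

o = W @ i                        # forward:  o_1 = w_11 i_1 + w_12 i_2, ...
back = W.T @ o                   # backward: i_1 = w_11 o_1 + w_21 o_2 + w_31 o_3, ...
print(o, back)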
Neural Network - Generalization
[Figure: layers L−1 and L of a deep network: the forward pass through W^{L−1}, W^L produces y^{L−1}, y^L; the backward pass produces e^L, δ^L, e^{L−1}, δ^{L−1}.]

Backward pass:

δ^L = f'(a^L) ◦ e^L                               (36)
e^{L−1} = (W^L)^T · δ^L                           (37)
δ^{L−1} = f'(a^{L−1}) ◦ e^{L−1}                   (38)
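A sketch of the layered backward pass of Eqs. (36)-(38), looping from the output layer toward the input; sigmoid layers are assumed so that f'(a) = y(1 − y), and all sizes and data are illustrative.

import numpy as np

# Backward pass of Eqs. (36)-(38) for a stack of layers:
#   delta^L     = f'(a^L) o e^L
#   e^(l-1)     = (W^l)^T delta^l
#   delta^(l-1) = f'(a^(l-1)) o e^(l-1)
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(2)
sizes = [3, 4, 4, 2]                          # input, two hidden layers, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=3)
d = np.array([1.0, 0.0])

# forward pass, keeping each layer's output
ys = [x]
for W in Ws:
    ys.append(sigmoid(W @ ys[-1]))

# backward pass
e = d - ys[-1]                                # e^L
deltas = [None] * len(Ws)
for l in reversed(range(len(Ws))):
    y = ys[l + 1]
    deltas[l] = y * (1.0 - y) * e             # Eqs. (36), (38): sigmoid derivative
    e = Ws[l].T @ deltas[l]                   # Eq. (37)

# weight updates, Eq. (32) applied layer by layer
eta = 0.1
for l, W in enumerate(Ws):
    W += eta * np.outer(deltas[l], ys[l])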
Training and Validation/Testing
[Figure: the training data (the complete batch) is split into minibatches; one pass over all of the training data is an epoch; a separate validation data set is held out for validation/testing.]
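A small illustrative sketch of these terms in code: the training data is shuffled and split into minibatches, one full pass is an epoch, and the validation data is used only for evaluation; the data and sizes are made up.

import numpy as np

# Split a dataset into training/validation and iterate over epochs and minibatches.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))            # 100 samples, 8 features (made-up data)
D = rng.normal(size=(100, 1))            # targets

X_train, D_train = X[:80], D[:80]        # training data (the batch)
X_val, D_val = X[80:], D[80:]            # validation data, never used for updates

batch_size = 16
for epoch in range(5):                   # one epoch = one pass over the training data
    order = rng.permutation(len(X_train))
    for start in range(0, len(X_train), batch_size):
        idx = order[start:start + batch_size]
        x_mb, d_mb = X_train[idx], D_train[idx]   # one minibatch
        # ... forward pass, error, weight update on (x_mb, d_mb) ...
    # ... evaluate the error on (X_val, D_val) after each epoch ...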
Weight Update: Batch, SGD, Minibatch
Batch Gradient Descent: one weight update per pass, computed over the complete dataset (the batch).
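A sketch contrasting the three update schemes named in the title for a linear model y = Wx with the squared error of Eq. (11): batch gradient descent averages the gradient over the complete dataset, SGD updates per sample, and minibatch updates per small subset. The data, sizes, and helper names are illustrative.

import numpy as np

# Batch / stochastic (SGD) / minibatch gradient descent for y = W x and
# E = 1/2 ||d - y||^2. grad(W) averages -e x^T over the chosen samples.
def grad(W, X, D):
    E = D - X @ W.T                      # errors for all rows of X
    return -(E.T @ X) / len(X)           # dE/dW averaged over the samples

rng = np.random.default_rng(4)
X = rng.normal(size=(64, 3))             # made-up dataset
D = X @ np.array([[1.0, -2.0, 0.5]]).T   # targets from a known linear map
eta, W = 0.1, np.zeros((1, 3))

# Batch: one update per pass, using the complete dataset
W -= eta * grad(W, X, D)

# SGD: one update per sample
for n in range(len(X)):
    W -= eta * grad(W, X[n:n + 1], D[n:n + 1])

# Minibatch: one update per small subset (here 8 samples)
for start in range(0, len(X), 8):
    W -= eta * grad(W, X[start:start + 8], D[start:start + 8])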
Convolutional Neural Network
[Figure: 2-D example with a 3×3 input (values 1-9) and a 2×2 kernel (w_11, w_12, w_21, w_22): the forward pass is correlation of the input with the kernel; the backward pass is correlation with rot180(w) (equivalently, with rot180 of the output-side quantity).]
Forward:  Y = X ∗ rot180(W)
Backward: X = Y ∗ W = W ∗ Y
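A sketch of these relations using SciPy's 2-D correlation/convolution: the forward correlation equals convolution with the 180°-rotated kernel, and the backward pass is the full convolution of the output-side gradient with W (convolution being commutative, Y ∗ W = W ∗ Y). The array values are illustrative.

import numpy as np
from scipy.signal import correlate2d, convolve2d

# Forward: cross-correlation of X with W, equal to convolution with rot180(W).
# Backward: gradient w.r.t. X is the full convolution of the output gradient with W.
X = np.arange(1, 10, dtype=float).reshape(3, 3)     # 3x3 input: values 1..9
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])                          # 2x2 kernel (w_11 .. w_22)

Y = correlate2d(X, W, mode="valid")                 # forward correlation
same = convolve2d(X, np.rot90(W, 2), mode="valid")  # identical: X conv rot180(W)
assert np.allclose(Y, same)

dY = np.ones_like(Y)                                # pretend upstream gradient
dX = convolve2d(dY, W, mode="full")                 # backward: full convolution with W
print(Y)
print(dX)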