Lecture_10_slides_-_after
Batch gradient descent
▪ J = (1/N) ∑_{i=1}^{N} L^(i)
▪ Weights updated after calculating the gradient over the entire dataset
• slow
• requires large memory
Stochastic gradient descent (SGD)
▪ J = L^(i)
▪ Weights updated after calculating the gradient for a single example
Mini-batch gradient descent
▪ J = (1/B) ∑_{i ∈ batch} L^(i)
▪ Weights updated after calculating the gradient over a mini-batch of B examples
• Faster than SGD
• Reduces variance of the gradient estimate
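The difference between the variants is only which examples enter the gradient before each weight update. A minimal NumPy sketch on a toy least-squares problem (data and learning rate are illustrative, not from the slides):

```python
import numpy as np

# Toy least-squares problem: per-example loss L^(i) = (w·x_i - y_i)^2
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # N = 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])   # targets from a known weight vector

def gradient(w, Xb, yb):
    # Gradient of J = (1/B) * sum_i (w·x_i - y_i)^2 over the given batch
    return 2.0 / len(Xb) * Xb.T @ (Xb @ w - yb)

w = np.zeros(3)
lr = 0.1

# Batch gradient descent: one update uses the entire dataset
w_batch = w - lr * gradient(w, X, y)

# Mini-batch gradient descent: one update uses only B = 16 examples
idx = rng.choice(len(X), size=16, replace=False)
w_mini = w - lr * gradient(w, X[idx], y[idx])
```

The batch update is exact but touches all N examples per step; the mini-batch update is cheaper per step while averaging away much of single-example SGD's noise.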
Optimization
Learning rate
▪ Momentum update
▪ Nesterov Accelerated Gradient (NAG)
▪ Adam
▪ and more…
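The momentum update from the list above can be sketched in a few lines; the learning rate, decay factor, and the 1-D quadratic objective here are illustrative choices, not from the slides:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.05, beta=0.9):
    """One momentum update: v accumulates an exponentially
    decaying average of past gradients, then moves the weights."""
    v = beta * v - lr * grad
    return w + v, v

# Minimize f(w) = w^2 (gradient 2w) starting from w = 5.0
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, grad=2.0 * w)
```

NAG and Adam refine the same idea: NAG evaluates the gradient at the "looked-ahead" point w + beta*v, and Adam additionally rescales each coordinate by a running estimate of the squared gradient.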
Convolutional Neural Networks
Real-World Problem
Detecting and Classifying Pavement Distress
Pavement
distress
Convolutional Neural Networks (CNN)
Intro - Handling images with fully-connected NN
A 3x32x32 image (depth 3 for the R, G, B color channels, height 32, width 32) is flattened into a 3072x1 input vector (3 × 32 × 32 = 3072).
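The flattening step can be sketched directly in NumPy:

```python
import numpy as np

# A 3x32x32 image: 3 color channels (R, G, B), 32x32 pixels
image = np.zeros((3, 32, 32))

# Flattening for a fully-connected network turns it into a 3072x1 column vector
flat = image.reshape(-1, 1)
print(flat.shape)  # (3072, 1), since 3 * 32 * 32 = 3072
```

Every fully-connected weight matrix acting on this vector needs 3072 weights per neuron, which is what makes fully-connected layers expensive on images.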
Convolutional Neural Networks
Intro - Handling images with fully-connected NN
Example: convolution with a 3x3 edge-detection kernel:
-1 -1 -1
-1  8 -1
-1 -1 -1
https://muthu.co/basics-of-image-convolution/
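This kernel sums to zero, so constant regions of an image map to 0 and only intensity changes (edges) produce a response. A small sketch using SciPy's `convolve2d` (the patch values are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d

# The 3x3 edge-detection kernel from the slide; its entries sum to zero
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# A flat (constant) patch produces all zeros ...
flat_patch = np.full((5, 5), 7.0)
print(convolve2d(flat_patch, kernel, mode='valid'))  # 3x3 of zeros

# ... while a patch containing a vertical edge does not
edge_patch = np.hstack([np.zeros((5, 3)), np.ones((5, 2))])
response = convolve2d(edge_patch, kernel, mode='valid')
```

(`convolve2d` flips the kernel, but this kernel is symmetric under 180° rotation, so convolution and cross-correlation coincide here.)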
Convolutional Neural Networks
2D Convolution computation example
To show how the convolution operation is computed, let’s use a simpler example:
5x5 input, 3x3 filter
Input (5x5):
1  0  3  0  2
0  3  4  0  2
1  0  2  0  1
8 12  0  1  0
0  6  3  2  0

Filter (3x3):
 1 -1  0
 0  2 -2
-1  0  2

Bias: b = 0
Convolutional Neural Networks
Convolution computation example
At each position, the filter is placed over a 3x3 patch of the input; the element-wise product of patch and filter is summed, and the bias is added, to give one output value. For the top-left patch this gives 5. Sliding the filter one step at a time (stride 1) across and down the input fills in the full 3x3 output:
  5  -1  -1
-15  -7   2
 31  -8   1
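The sliding-window computation above can be sketched as a plain loop. To keep the result easy to check by hand, this sketch uses the slide's 5x5 input but a hypothetical all-ones filter rather than the slide's filter:

```python
import numpy as np

def conv2d(x, k, stride=1, bias=0.0):
    """2D convolution as used in CNNs (cross-correlation):
    slide the filter over the input, multiply element-wise, sum, add bias."""
    fh, fw = k.shape
    oh = (x.shape[0] - fh) // stride + 1
    ow = (x.shape[1] - fw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i * stride:i * stride + fh, j * stride:j * stride + fw]
            out[i, j] = np.sum(patch * k) + bias
    return out

# The 5x5 input from the slide
x = np.array([[1,  0, 3, 0, 2],
              [0,  3, 4, 0, 2],
              [1,  0, 2, 0, 1],
              [8, 12, 0, 1, 0],
              [0,  6, 3, 2, 0]])

# Hypothetical all-ones filter: each output is just the sum of a 3x3 patch
k = np.ones((3, 3))
out = conv2d(x, k)  # out[0, 0] = 1+0+3+0+3+4+1+0+2 = 14
```

With stride 1 the output is 3x3; passing `stride=2` to the same function gives the 2x2 output discussed later.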
Convolutional Neural Networks
Convolution Layer
A 3x5x5 filter is applied to the 3x32x32 image.
▪ Filters always extend the full depth of the input volume
▪ Note: filters are sometimes referred to as kernels
Convolutional Neural Networks
Convolution Layer
Convolving the filter at one location of the 3x32x32 image yields 1 number:
▪ The result of taking a dot product between the filter and a small 3x5x5 chunk of the image
• (i.e. 5x5x3 = 75-dimensional dot product + bias)
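The one-number-per-position computation can be sketched as follows; the chunk, filter, and bias values here are illustrative random numbers, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
chunk = rng.normal(size=(3, 5, 5))  # a 3x5x5 chunk of the image
filt = rng.normal(size=(3, 5, 5))   # a 3x5x5 filter
b = 0.1                             # bias (hypothetical value)

# One filter position = one number: a 75-dimensional dot product + bias
value = np.sum(chunk * filt) + b

# Equivalent to flattening both tensors to 75-vectors first
same = chunk.ravel() @ filt.ravel() + b
```

Both forms compute the same scalar, which is why the slide describes the operation as a 75-dimensional dot product.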
Convolutional Neural Networks
Convolution Layer
[Figure: sliding the 3x5x5 filter over all spatial positions of the 3x32x32 image produces a 1x28x28 activation map (28 = 32 − 5 + 1)]
Convolutional Neural Networks
Convolution Layer
Consider a second 3x5x5 filter: performing the same convolution operation with this filter produces a second 1x28x28 activation map.
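Stacking one activation map per filter is what gives a convolution layer its output depth. A minimal sketch, assuming K = 2 random filters on a random 3x32x32 image (values are illustrative):

```python
import numpy as np

image = np.random.default_rng(0).normal(size=(3, 32, 32))
filters = np.random.default_rng(1).normal(size=(2, 3, 5, 5))  # K = 2 filters

# Each filter spans the full depth and produces its own 28x28 activation map
K, _, fh, fw = filters.shape
out = np.zeros((K, 28, 28))  # 28 = 32 - 5 + 1
for k in range(K):
    for i in range(28):
        for j in range(28):
            out[k, i, j] = np.sum(image[:, i:i + fh, j:j + fw] * filters[k])
```

With K filters the layer's output volume is K × 28 × 28; here K = 2 gives shape (2, 28, 28).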
Pooling
Pooling example
Max-pooling with a 2x2 filter and a stride of 2: each 2x2 block of the input is replaced by its maximum.

Input (6x6):
3 0 1  0 2 4
0 1 8 12 0 0
4 0 0  3 2 2
2 0 1  0 1 1
3 2 0  6 0 5
1 0 6  0 0 9

Output (3x3):
3 12 4
4  3 2
3  6 9
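The pooling example above can be reproduced with a reshape trick: split the 6x6 input into 2x2 blocks and take the maximum of each block.

```python
import numpy as np

# The 6x6 input from the slide
x = np.array([[3, 0, 1,  0, 2, 4],
              [0, 1, 8, 12, 0, 0],
              [4, 0, 0,  3, 2, 2],
              [2, 0, 1,  0, 1, 1],
              [3, 2, 0,  6, 0, 5],
              [1, 0, 6,  0, 0, 9]])

# 2x2 max-pooling with stride 2: group into 3x3 blocks of 2x2, max over each
out = x.reshape(3, 2, 3, 2).max(axis=(1, 3))
```

The result matches the slide's 3x3 output: [[3, 12, 4], [4, 3, 2], [3, 6, 9]].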
Pooling example
[Figure: max-pooling applied to an image]
Convolutional Neural Networks
Pooling layer
CNNs may include pooling layers to reduce the spatial size of the representation
Pooling layers require two hyper-parameters: their spatial extent F and their stride S
▪ Most common layer uses 2x2 filters of stride 2 (F = 2, S = 2)
Convolutional Neural Networks
Convolution applied with a stride
Same 5x5 input, 3x3 filter, and bias b = 0 as in the earlier example.
Convolutional Neural Networks
Changing the stride
Back to our simple example, but with a stride of 2: the filter now moves 2 positions at a time, so it fits at only 2 positions per dimension and the output is 2x2:
 5 -1
31  1
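The output sizes seen so far all follow one formula; a small helper makes it explicit (the function name is mine, not from the slides):

```python
def conv_output_size(w, f, s, p=0):
    """Spatial output size for input width w, filter size f,
    stride s, and zero-padding p: (w - f + 2p) / s + 1."""
    return (w - f + 2 * p) // s + 1

conv_output_size(5, 3, 1)   # 3  -> the 3x3 output with stride 1
conv_output_size(5, 3, 2)   # 2  -> the 2x2 output with stride 2
conv_output_size(32, 5, 1)  # 28 -> the 28x28 activation map
```

Note that (w − f + 2p) must be divisible by s for the filter to tile the input evenly.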
Input (1x5x5):
1  0  3  0  2
0  3  4  0  2
1  0  2  0  1
8 12  0  1  0
0  6  3  2  0

Zero-padding = 1 surrounds the input with a border of zeros, giving a 7x7 padded input:
0  0  0  0  0  0  0
0  1  0  3  0  2  0
0  0  3  4  0  2  0
0  1  0  2  0  1  0
0  8 12  0  1  0  0
0  0  6  3  2  0  0
0  0  0  0  0  0  0

If we use a 3x3 filter with a stride of 1 on the padded input, we get a 5x5 output
→ same size as input
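Zero-padding is one call in NumPy; the shapes match the slide's example:

```python
import numpy as np

x = np.array([[1,  0, 3, 0, 2],
              [0,  3, 4, 0, 2],
              [1,  0, 2, 0, 1],
              [8, 12, 0, 1, 0],
              [0,  6, 3, 2, 0]])

# Zero-padding = 1: a border of zeros around the 5x5 input -> 7x7
padded = np.pad(x, pad_width=1, mode='constant')

# A 3x3 filter with stride 1 on the 7x7 padded input gives a 5x5 output,
# the same spatial size as the original: (5 - 3 + 2*1) / 1 + 1 = 5
```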
Convolutional Neural Networks
Convolution layer summary
The convolution layer:
▪ Accepts a volume of size Cin × H1 × W1
▪ Requires four hyper-parameters: the number of filters K, their spatial extent F, the stride S, and the amount of zero-padding P
▪ Produces a volume of size K × H2 × W2, where H2 = (H1 − F + 2P)/S + 1 and W2 = (W1 − F + 2P)/S + 1
LeNet-5
LeCun et al., 1998
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 6, 28, 28] 156
ReLU-2 [-1, 6, 28, 28] 0
MaxPool2d-3 [-1, 6, 14, 14] 0
Conv2d-4 [-1, 16, 10, 10] 2,416
ReLU-5 [-1, 16, 10, 10] 0
MaxPool2d-6 [-1, 16, 5, 5] 0
Linear-7 [-1, 120] 48,120
ReLU-8 [-1, 120] 0
Linear-9 [-1, 84] 10,164
ReLU-10 [-1, 84] 0
Linear-11 [-1, 10] 850
Softmax-12 [-1, 10] 0
================================================================
Total params: 61,706
(-1 in the output shape represents the mini-batch dimension)
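The parameter counts in the summary can be re-derived by hand: a convolution filter has Cin·F·F weights plus one bias per output channel, and a linear layer has one weight per input-output pair plus one bias per output. A sketch (helper names are mine):

```python
def conv_params(c_in, c_out, k):
    # Each of the c_out filters has c_in * k * k weights + 1 bias
    return c_out * (c_in * k * k + 1)

def linear_params(n_in, n_out):
    # n_out neurons, each with n_in weights + 1 bias
    return n_out * (n_in + 1)

total = (conv_params(1, 6, 5)        # Conv2d-1:  156
         + conv_params(6, 16, 5)     # Conv2d-4:  2,416
         + linear_params(400, 120)   # Linear-7:  48,120 (400 = 16*5*5 flattened)
         + linear_params(120, 84)    # Linear-9:  10,164
         + linear_params(84, 10))    # Linear-11: 850
# total == 61706, matching the torchsummary output
```

ReLU, pooling, and softmax layers have no learnable parameters, so they contribute 0.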
Convolutional Neural Networks
Popular architectures (optional)
▪ AlexNet (Krizhevsky et al., 2012)
▪ VGG16 — a single fully-connected layer (Linear-32, output shape [-1, 4096]) has 102,764,544 parameters; the following ReLU-33 ([-1, 4096]) has 0
Case study 6.5 from Machine Learning for Engineers book: “Finding volcanos on
Venus with pre-fit models”
Summary - exercises