Lecture_10_slides_-_after

The document provides an overview of convolutional neural networks (CNNs) and their applications, particularly in image processing and autoencoders. It discusses the optimization techniques used in training neural networks, including various gradient descent methods and their impact on convergence. Additionally, it highlights the importance of CNNs in efficiently modeling spatial relationships in images without losing structural information.

Convolutional neural net


Outline

▪ Brief recap of neural networks


• Application: autoencoders
▪ Convolutional neural networks
Course topics: Introduction, Linear regression, Logistic regression, Feature engineering, Data statistics, Naive Bayes, KNN, Clustering, Dimensionality reduction, Neural networks, Convolutional neural networks, Decision trees
Last time

“3-layer Neural Net”, or “2-hidden-layer Neural Net”

Additional resource: https://www.deeplearningbook.org/


Example application of Neural Networks
Autoencoder
Recall - dimensionality reduction
PCA
Autoencoder
Autoencoder vs. PCA

Top: some examples of the original MNIST test samples

Middle: reconstructed output from an autoencoder with a latent space of 8 dimensions. This autoencoder uses convolutional layers and was trained on the MNIST training set.

Bottom: reconstructed output from PCA with 8 latent dimensions

Image credit: F. Fleuret, Deep Learning (EPFL)


Autoencoder
Autoencoder vs. PCA
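To make the encoder/decoder idea concrete, here is a minimal fully-connected autoencoder sketch in PyTorch. The 8-dimensional latent space matches the slide; the hidden sizes, and the use of plain Linear layers instead of the convolutional layers mentioned above, are illustrative assumptions.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),                       # 1x28x28 -> 784
            nn.Linear(784, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),         # compress to the latent code
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 784), nn.Sigmoid(),  # pixel intensities in [0, 1]
            nn.Unflatten(1, (1, 28, 28)),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Training minimizes the reconstruction error between input and output
model = Autoencoder()
x = torch.rand(16, 1, 28, 28)                   # a dummy batch of images (assumption)
loss = nn.functional.mse_loss(model(x), x)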
Training for NN
Goal of optimization in ML:
Minimize the cost over the batch: J = ∑_{i=1}^{N} L^(i), where L^(i) = L(y^(i), ŷ^(i)) is the loss of the i-th training example of the batch

Want the optimization to:


• Converge quickly
• Find a good local minimum (or even the global minimum)

Gradient descent (and its variants) is the preferred way to optimize neural networks

The choice of optimizer and hyper-parameters affects the speed of convergence and the kind of local minimum found

A. Amini et al., Spatial Uncertainty Sampling for End-to-End Control, 2019
Gradient descent variants
(Vanilla / Batch) Gradient descent (GD):

▪ J = (1/N) ∑_{i=1}^{N} L^(i)
▪ Weights updated after calculating the gradient over the entire dataset
• slow
• requires large memory

Stochastic gradient descent (SGD):

▪ J = L^(i)
▪ Weights updated after calculating the gradient of a single example
• requires much less memory than GD
• high variance in parameter updates

Mini-batch gradient descent:

▪ J = (1/N_b) ∑_{i=1}^{N_b} L^(i)
▪ Weights updated after calculating the gradient over a mini-batch of N_b examples
• Faster than SGD
• Reduces variance of gradient estimation
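The three variants differ only in how many examples contribute to each gradient estimate. A minimal NumPy sketch of the three update rules, assuming a hypothetical loss_grad(w, x, y) helper and learning rate (both are illustrative, not from the slides):

import numpy as np

def loss_grad(w, x, y):
    # Assumed per-example gradient, e.g. for a linear model with squared error
    return 2 * x * (x @ w - y)

def gd_step(w, X, Y, lr=0.01):
    # Batch GD: average the gradient over the ENTIRE dataset before one update
    g = np.mean([loss_grad(w, x, y) for x, y in zip(X, Y)], axis=0)
    return w - lr * g

def sgd_step(w, X, Y, lr=0.01):
    # SGD: update after the gradient of a SINGLE randomly chosen example
    i = np.random.randint(len(X))
    return w - lr * loss_grad(w, X[i], Y[i])

def minibatch_step(w, X, Y, lr=0.01, batch_size=32):
    # Mini-batch GD: average the gradient over a small random subset of examples
    idx = np.random.choice(len(X), size=batch_size, replace=False)
    g = np.mean([loss_grad(w, X[i], Y[i]) for i in idx], axis=0)
    return w - lr * g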
Optimization
Learning rate

Image credit: Jeremy Jordan (https://www.jeremyjordan.me/nn-learning-rate/)


Optimization
Optimizers

Variants of gradient descent are commonly used in practice to speed up and improve convergence:

▪ Momentum update
▪ Nesterov Accelerated Gradient (NAG)
▪ Adam
▪ and more…
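These optimizers are available in torch.optim. A minimal sketch of the usual training-loop pattern (the model, data, and hyper-parameter values are placeholders):

import torch

model = torch.nn.Linear(10, 1)                    # placeholder model

# Pick one of the optimizers listed above:
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)            # momentum update
# opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, nesterov=True)  # NAG
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)                       # Adam

x, y = torch.randn(32, 10), torch.randn(32, 1)    # a dummy mini-batch
for _ in range(100):
    opt.zero_grad()                               # reset accumulated gradients
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                               # backpropagate
    opt.step()                                    # update the weights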
Convolutional Neural Networks
Real-World Problem
Detecting and Classifying Pavement Distress


Why? On-time preventive maintenance

Lack of on-time maintenance leads to:

• 3× the maintenance cost
• traffic delays
• more fuel consumption
• accidents
• …
Automatic pavement distress monitoring

Pavement distress
Convolutional Neural Networks (CNN)
Intro - Handling images with fully-connected NN

A 3x32x32 image (depth 3 for the R, G, B color channels; height 32; width 32) is flattened into a 3072x1 input vector.

By flattening, spatial structure gets lost!
Convolutional Neural Networks
Intro - Handling images with fully-connected NN

A fully-connected neural net:


▪ Requires flattening the image
→ spatial structure gets lost Flatten
▪ Doesn’t scale well to large images
• e.g. 1024x1024x3 image results in
3’145’728 weights for each neuron of first
hidden layer

How to efficiently model correlation between neighboring pixels?


=> Convolutional Neural Networks
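The parameter counts can be checked directly in PyTorch; a small sketch (the single-output layers below are just for counting and are not a real architecture):

import torch.nn as nn

# Fully connected: one neuron looking at a flattened 1024x1024x3 image
fc = nn.Linear(1024 * 1024 * 3, 1)
print(sum(p.numel() for p in fc.parameters()))    # 3145729 (3'145'728 weights + 1 bias)

# Convolutional: one 3x5x5 filter shares its weights across all spatial locations
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=5)
print(sum(p.numel() for p in conv.parameters()))  # 76 (75 weights + 1 bias)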
Convolution definition
Convolution of a signal with a filter
Convolution - what to do at boundaries
Convolution to get features
Filters can approximate what happens in a neighborhood with a few numbers:

• give the average value of the signal in a neighborhood
• sharpen the signal
• blur the signal
• approximate the derivative of the signal in a neighborhood
• approximate the second derivative of the signal in a neighborhood


Convolution with a filter is a linear function
Convolution extensions
Example of image convolution

-1 -1 -1

-1 8 -1

-1 -1 -1

https://muthu.co/basics-of-image-convolution/
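To see the effect of this edge-detection kernel, a small sketch using SciPy's 2D cross-correlation on a made-up toy image (the image values are assumptions for illustration):

import numpy as np
from scipy.signal import correlate2d

# The edge-detection kernel from the example above
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]])

# Toy grayscale image: a bright 3x3 square on a dark background
img = np.zeros((7, 7))
img[2:5, 2:5] = 1.0

# Responses are ~0 in flat regions and large along the square's edges
edges = correlate2d(img, kernel, mode='same')
print(edges)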
Convolutional Neural Networks
2D Convolution computation example
To show how the convolution operation is computed, let’s use a simpler example:
5x5 input, 3x3 filter

Input (5x5):          Filter (3x3):
1  0  3  0  2          1 -1  0
0  3  4  0  2          0  2 -2
1  0  2  0  1         -1  0  2
8 12  0  1  0
0  6  3  2  0         Bias: b = 0
Convolutional Neural Networks
Convolution computation example

Slide the 3x3 filter over the 5x5 input (stride 1) and, at each position, multiply the filter element-wise with the 3x3 window of the input, sum, and add the bias (b = 0):

▪ Row 1, col 1: 1x1 + 0x(-1) + 3x0 + 0x0 + 3x2 + 4x(-2) + 1x(-1) + 0x0 + 2x2 + 0 = 2
▪ Row 1, col 2: 0x1 + 3x(-1) + 0x0 + 3x0 + 4x2 + 0x(-2) + 0x(-1) + 2x0 + 0x2 + 0 = 5
▪ Row 1, col 3: 3x1 + 0x(-1) + 2x0 + 4x0 + 0x2 + 2x(-2) + 2x(-1) + 0x0 + 1x2 + 0 = -1
▪ Row 2, col 1: 0x1 + 3x(-1) + 4x0 + 1x0 + 0x2 + 2x(-2) + 8x(-1) + 12x0 + 0x2 + 0 = -15
▪ Row 2, col 2: 3x1 + 4x(-1) + 0x0 + 0x0 + 2x2 + 0x(-2) + 12x(-1) + 0x0 + 1x2 + 0 = -7
▪ Row 2, col 3: 4x1 + 0x(-1) + 2x0 + 2x0 + 0x2 + 1x(-2) + 0x(-1) + 1x0 + 0x2 + 0 = 2
▪ Row 3, col 1: 1x1 + 0x(-1) + 2x0 + 8x0 + 12x2 + 0x(-2) + 0x(-1) + 6x0 + 3x2 + 0 = 31
▪ Row 3, col 2: 0x1 + 2x(-1) + 0x0 + 12x0 + 0x2 + 1x(-2) + 6x(-1) + 3x0 + 2x2 + 0 = -8
▪ Row 3, col 3: 2x1 + 0x(-1) + 1x0 + 0x0 + 1x2 + 0x(-2) + 3x(-1) + 2x0 + 0x2 + 0 = 1

Output (3x3):
  2   5  -1
-15  -7   2
 31  -8   1
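The whole computation can be reproduced with a few lines of NumPy; a sketch that re-derives the 3x3 output above (as in the slides, the filter is slid without flipping, i.e. cross-correlation):

import numpy as np

x = np.array([[1, 0, 3, 0, 2],
              [0, 3, 4, 0, 2],
              [1, 0, 2, 0, 1],
              [8, 12, 0, 1, 0],
              [0, 6, 3, 2, 0]])

w = np.array([[ 1, -1,  0],
              [ 0,  2, -2],
              [-1,  0,  2]])
b = 0

out = np.zeros((3, 3))
for i in range(3):            # slide the filter with stride 1, no padding
    for j in range(3):
        out[i, j] = np.sum(x[i:i+3, j:j+3] * w) + b

print(out)   # [[  2.   5.  -1.]
             #  [-15.  -7.   2.]
             #  [ 31.  -8.   1.]]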
Convolution example

Example from ML4Engineers book.


Convolutional Neural Networks (CNN)
Convolution Layer

3x32x32 image (depth 3, height 32, width 32)

In PyTorch, images are represented as (C x H x W):
• C: number of channels (depth)
• H: height
• W: width

A pixel can be represented by a vector of 3 color (R, G, B) intensities; an individual intensity is indexed as I(c, h, w).
Convolutional Neural Networks
Convolution Layer
3x32x32 image, 3x5x5 filter

Filters always extend the full depth of the input volume.

Convolve the filter with the image, i.e. “slide over the image spatially, computing dot products”.

Note:
Filters are sometimes referred to as kernels
Convolutional Neural Networks
Convolution Layer
3x32x32 image, 3x5x5 filter (w)

Each filter position gives 1 number:
▪ The result of taking a dot product between the filter and a small 3x5x5 chunk of the image
• (i.e. a 5x5x3 = 75-dimensional dot product + bias)
Convolutional Neural Networks
Convolution Layer
3x32x32 image, 3x5x5 filter (w)

Convolve (slide) over all spatial locations to produce a 1x28x28 activation map.
Convolutional Neural Networks
Convolution Layer
3x32x32 image, a second 3x5x5 filter

Consider a second filter: perform the same convolution operation with this filter to get a second 1x28x28 activation map.
Pooling
Pooling example

Max-pooling, stride of 2

Input (1x6x6):         Output (1x3x3):
3 0 1  0 2 4            3 12 4
0 1 8 12 0 0            4  3 2
4 0 0  3 2 2            3  6 9
2 0 1  0 1 1
3 2 0  6 0 5
1 0 6  0 0 9
Pooling example

Example from ML4Engineers book.

Max pooling applied to the image (after a Laplacian kernel was applied to the image).
Convolutional Neural Networks
Pooling layer
CNNs may include pooling layers to reduce the spatial size of the representation

Pooling layers require two hyper-parameters: their spatial extent F and their stride S
▪ Most common layer uses 2x2 filters of stride 2 (F = 2, S = 2)
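A minimal sketch of the 2x2 / stride-2 max-pooling from the example above, both by hand in NumPy and via PyTorch's MaxPool2d:

import numpy as np
import torch

x = np.array([[3, 0, 1, 0, 2, 4],
              [0, 1, 8, 12, 0, 0],
              [4, 0, 0, 3, 2, 2],
              [2, 0, 1, 0, 1, 1],
              [3, 2, 0, 6, 0, 5],
              [1, 0, 6, 0, 0, 9]], dtype=float)

# By hand: take the max of each non-overlapping 2x2 block (F = 2, S = 2)
out = np.array([[x[2*i:2*i+2, 2*j:2*j+2].max() for j in range(3)] for i in range(3)])
print(out)   # [[ 3. 12.  4.]
             #  [ 4.  3.  2.]
             #  [ 3.  6.  9.]]

# Same operation in PyTorch (expects a C x H x W tensor)
pool = torch.nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(torch.tensor(x).unsqueeze(0)))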
Convolution applied with a stride

We can also apply convolution with a stride of 2

Input (1x5x5):         Filter (1x3x3):
1  0  3  0  2           1 -1  0
0  3  4  0  2           0  2 -2
1  0  2  0  1          -1  0  2
8 12  0  1  0
0  6  3  2  0          Bias: b = 0
Convolutional Neural Networks
Changing the stride
Back to our simple example, but change to a stride of 2: the filter now moves 2 positions at a time, so only four windows are evaluated.

▪ Row 1, col 1: 1x1 + 0x(-1) + 3x0 + 0x0 + 3x2 + 4x(-2) + 1x(-1) + 0x0 + 2x2 + 0 = 2
▪ Row 1, col 3: 3x1 + 0x(-1) + 2x0 + 4x0 + 0x2 + 2x(-2) + 2x(-1) + 0x0 + 1x2 + 0 = -1
▪ Row 3, col 1: 1x1 + 0x(-1) + 2x0 + 8x0 + 12x2 + 0x(-2) + 0x(-1) + 6x0 + 3x2 + 0 = 31
▪ Row 3, col 3: 2x1 + 0x(-1) + 1x0 + 0x0 + 1x2 + 0x(-2) + 3x(-1) + 2x0 + 0x2 + 0 = 1

Output (1x2x2):
 2 -1
31  1
Convolutional Neural Networks
Zero-padding
Height and width shrink quite quickly due to the repeated convolutions
To avoid this, we can add zero-padding:
Zero-padded input (1x7x7)

Input (1x5x5) 0 0 0 0 0 0 0

1 0 3 0 2 0 1 0 3 0 2 0

0 3 4 0 2 0 0 3 4 0 2 0

1 0 2 0 1 Zero-padding = 1 0 1 0 2 0 1 0

8 12 0 1 0 0 8 12 0 1 0 0

0 6 3 2 0 0 0 6 3 2 0 0

0 0 0 0 0 0 0

If we use a 3x3 filter with a stride of 1 on the padded input, we get a 5x5 output
→ same size as input
Convolutional Neural Networks
Convolution layer summary
The convolution layer:
▪ Accepts a volume of size Cin × H1 × W1

▪ Requires four hyper-parameters:

• Number of filters K
• Spatial extent of filters F
• Stride S
• Amount of zero padding P

▪ Produces a volume of size Cout × H2 × W2 where:

• Cout = K
• H2 = (H1 − F + 2P)/S + 1
• W2 = (W1 − F + 2P)/S + 1

Note:
There are F ⋅ F ⋅ Cin weights per filter, for a total of (F ⋅ F ⋅ Cin) ⋅ K weights and K biases per layer
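A quick sketch checking the output-size formula and the parameter count with PyTorch (the specific values Cin = 3, K = 16, F = 5, S = 1, P = 2 and the 32x32 input are assumptions for illustration):

import torch
import torch.nn as nn

C_in, K, F, S, P = 3, 16, 5, 1, 2           # assumed hyper-parameters
H1 = W1 = 32

conv = nn.Conv2d(C_in, K, kernel_size=F, stride=S, padding=P)

# Output spatial size from the formula (H1 - F + 2P)/S + 1
H2 = (H1 - F + 2 * P) // S + 1
print(H2)                                   # 32

y = conv(torch.randn(1, C_in, H1, W1))
print(y.shape)                              # torch.Size([1, 16, 32, 32])

# (F*F*Cin)*K weights + K biases
print(sum(p.numel() for p in conv.parameters()))   # 5*5*3*16 + 16 = 1216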
Convolutional neural net

From: Machine Learning for Engineers book


Convolutional Neural Networks
Perception tasks
Convolutional Neural Networks
Optional
Popular architectures

LeNet-5
LeCun et al., 1998

----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 6, 28, 28] 156
ReLU-2 [-1, 6, 28, 28] 0
MaxPool2d-3 [-1, 6, 14, 14] 0
Conv2d-4 [-1, 16, 10, 10] 2,416
ReLU-5 [-1, 16, 10, 10] 0
MaxPool2d-6 [-1, 16, 5, 5] 0
Linear-7 [-1, 120] 48,120
ReLU-8 [-1, 120] 0
Linear-9 [-1, 84] 10,164
ReLU-10 [-1, 84] 0
Linear-11 [-1, 10] 850
Softmax-12 [-1, 10] 0
================================================================
Total params: 61,706

(-1 in the output shape represents the mini-batch dimension)
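A sketch of a LeNet-5-style network in PyTorch that reproduces the layer summary above (assuming 1x32x32 inputs; the original LeNet-5 differs in some details, e.g. its activations and sub-sampling layers):

import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # -> 6x28x28,  6*(5*5*1) + 6  = 156 params
    nn.ReLU(),
    nn.MaxPool2d(2),                   # -> 6x14x14
    nn.Conv2d(6, 16, kernel_size=5),   # -> 16x10x10, 16*(5*5*6) + 16 = 2,416 params
    nn.ReLU(),
    nn.MaxPool2d(2),                   # -> 16x5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),        # 48,120 params
    nn.ReLU(),
    nn.Linear(120, 84),                # 10,164 params
    nn.ReLU(),
    nn.Linear(84, 10),                 # 850 params
    nn.Softmax(dim=1),
)
print(sum(p.numel() for p in lenet5.parameters()))   # 61,706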
Convolutional Neural Networks
Optional
Popular architectures

AlexNet
Krizhevsky et al., 2012

Winner of ImageNet Competition 2012


Convolutional Neural Networks
Optional
Popular architectures

VGG16
Simonyan & Zisserman, 2014

----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
MaxPool2d-31 [-1, 512, 7, 7] 0

Linear-32 [-1, 4096] 102,764,544
ReLU-33 [-1, 4096] 0
Dropout-34 [-1, 4096] 0
Linear-35 [-1, 4096] 16,781,312
ReLU-36 [-1, 4096] 0
Dropout-37 [-1, 4096] 0
Linear-38 [-1, 1000] 4,097,000
Softmax-39 [-1, 1000] 0
================================================================
Total params: 138,357,544
Convolutional Neural Networks
Optional
Popular architectures

GoogLeNet (Inception v1)


Szegedy et al., 2014

ConvNets have been getting deeper and deeper


e.g. ResNet-152 (He et al., 2015) → 152 layers
Example application: transfer learning
Train a network for a task
Example: image classification
Requires a large amount of training data and training resources

Modify the trained network for a different task (transfer learning)

Why? Can address limited data and limited time/resources for training

Case study 6.5 from the Machine Learning for Engineers book: “Finding volcanos on Venus with pre-fit models”
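A minimal sketch of the transfer-learning recipe with a torchvision model (the choice of ResNet-18 and the 2-class output are assumptions for illustration; the case study itself uses a different pre-fit model):

import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on a large dataset (ImageNet)
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pre-trained feature extractor to save data and compute
for p in model.parameters():
    p.requires_grad = False

# Replace the final classification layer for the new task (here: 2 classes, an assumption)
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's parameters are trained
trainable = [p for p in model.parameters() if p.requires_grad]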
Summary - exercises
