Week 6: Intro to Convolutional Neural Networks
So far, the structure of our neural networks has treated all inputs interchangeably:
No relationships between the individual inputs
Just an ordered set of variables
We want to incorporate domain knowledge into the architecture of a Neural Network.
Motivation
Image data has important structures, such as:
“Topology” of pixels
Translation invariance
Issues of lighting and contrast
Knowledge of human visual system
Nearby pixels tend to have similar values
Edges and shapes
Scale Invariance—objects may appear at different sizes in the image.
Motivation—Image Data
A fully connected network would require a vast number of parameters
MNIST images are small (28 x 28 pixels) and grayscale
Color images are more typically at least (200 x 200) pixels x 3 color channels (RGB) = 120,000 values
A single fully connected layer mapping this input to an equally sized output would require (200 x 200 x 3)^2 = 14,400,000,000 weights!
Variance (in terms of bias-variance) would be too high
So we introduce “bias” by structuring the network to look for certain kinds of patterns
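The parameter-count arithmetic on this slide can be checked directly; a minimal sketch (the variable names are mine, not from the slides):

```python
# Parameter count for a single fully connected layer mapping a flattened
# 200x200 RGB image to an equally sized output: one weight per
# (input, output) pair.
n_inputs = 200 * 200 * 3    # 120,000 input values
n_weights = n_inputs ** 2   # every input connects to every output

print(n_inputs)   # 120000
print(n_weights)  # 14400000000, i.e. 14.4 billion weights
```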
Motivation
Features need to be “built up”
Edges -> shapes -> relations between shapes
Textures
Cat = two eyes in certain relation to one another + cat fur texture.
Eyes = dark circle (pupil) inside another circle.
Circle = particular combination of edge detectors.
Fur = edges in certain pattern.
Kernels
A kernel is a grid of weights “overlaid” on the image, centered on one pixel
Each weight is multiplied with the pixel underneath it, and the results are summed
Kernel: 3x3 Example
Input:
3 2 1
1 2 3
1 1 1
Kernel:
-1 0 1
-2 0 2
-1 0 1
Kernel: 3x3 Example
Input:
3 2 1
1 2 3
1 1 1
Kernel:
-1 0 1
-2 0 2
-1 0 1
Output: 2
= 3 ⋅ (−1) + 2 ⋅ 0 + 1 ⋅ 1
+ 1 ⋅ (−2) + 2 ⋅ 0 + 3 ⋅ 2
+ 1 ⋅ (−1) + 1 ⋅ 0 + 1 ⋅ 1
= −3 + 1 − 2 + 6 − 1 + 1 = 2
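The worked example above can be verified numerically; a minimal NumPy sketch (not part of the slides):

```python
import numpy as np

# The 3x3 input patch and kernel from the worked example.
patch = np.array([[3, 2, 1],
                  [1, 2, 3],
                  [1, 1, 1]])
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

# The output at one location is the element-wise product of the kernel
# with the pixels underneath it, summed up.
output = int(np.sum(patch * kernel))
print(output)  # 2
```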
Kernels as Feature Detectors
Can think of kernels as “local feature detectors”, e.g.:
Vertical line detector:
-1 1 -1
-1 1 -1
-1 1 -1
Horizontal line detector:
-1 -1 -1
1 1 1
-1 -1 -1
Corner detector:
-1 -1 -1
-1 1 1
-1 1 1
Convolutional Neural Nets
Primary Ideas behind Convolutional Neural Networks:
Let the Neural Network learn which kernels are most useful
Use same set of kernels across entire image (translation invariance)
Reduces number of parameters and “variance” (from bias-variance point of view)
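The parameter reduction from weight sharing can be made concrete; a minimal sketch (the 28 x 28 comparison figure is my own illustration, not from the slides):

```python
# A convolutional layer with 10 different 3x3 kernels reuses the same
# weights at every pixel location, so its weight count is independent
# of the image size.
n_kernels, k = 10, 3
conv_weights = n_kernels * k * k   # 90 weights, for any image size

# By contrast, a fully connected layer on even a small 28x28 grayscale
# image (input and output the same size) needs (28*28)^2 weights.
fc_weights = (28 * 28) ** 2

print(conv_weights)  # 90
print(fc_weights)    # 614656
```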
Convolutions
Convolution Settings—Grid Size
Grid Size (Height and Width):
The number of pixels a kernel “sees” at once
Typically use odd numbers so that there is a “center” pixel
Kernel does not need to be square
Convolution Settings—Padding
Padding
Using kernels directly, there will be an “edge effect”
Pixels near the edge will not be used as “center pixels” since there are not enough surrounding pixels
Padding adds extra pixels around the frame, so every pixel of the original image will be a center pixel as the kernel moves across the image
Added pixels are typically of value zero (zero-padding)
Without Padding
Input:
1 2 0 3 1
1 0 0 2 2
2 1 2 1 1
0 0 1 0 0
1 2 1 1 1
Kernel:
-1 1 2
1 1 0
-1 -2 0
Output (top-left entry shown): -2
With Padding
Padded input:
0 0 0 0 0 0 0
0 1 2 0 3 1 0
0 1 0 0 2 2 0
0 2 1 2 1 1 0
0 0 0 1 0 0 0
0 1 2 1 1 1 0
0 0 0 0 0 0 0
Kernel:
-1 1 2
1 1 0
-1 -2 0
Output (top-left entry shown): -1
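The sliding-window examples above can be reproduced in code; a minimal sketch (the function `conv2d` is my own name; it computes cross-correlation, which is what CNN layers actually do):

```python
import numpy as np

def conv2d(image, kernel, padding=0):
    """Slide `kernel` over `image` with stride 1, optionally zero-padded."""
    if padding:
        image = np.pad(image, padding)  # zero-pad all four sides
    k = kernel.shape[0]
    out_h = image.shape[0] - k + 1
    out_w = image.shape[1] - k + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            # weight the pixels under the kernel and sum
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)
    return out

image = np.array([[1, 2, 0, 3, 1],
                  [1, 0, 0, 2, 2],
                  [2, 1, 2, 1, 1],
                  [0, 0, 1, 0, 0],
                  [1, 2, 1, 1, 1]])
kernel = np.array([[-1, 1, 2],
                   [1, 1, 0],
                   [-1, -2, 0]])

print(conv2d(image, kernel))             # 3x3 output; top-left entry is -2
print(conv2d(image, kernel, padding=1))  # 5x5 output; top-left entry is -1
```

Note that padding=1 keeps the output the same size as the 5x5 input, since every original pixel gets to be a center pixel.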
Convolution Settings
Stride
The ”step size” as the kernel moves across the image
Can be different for vertical and horizontal steps (but usually is the same value)
When stride is greater than 1, it scales down the output dimension
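The effect of grid size, padding, and stride on the output dimensions follows a standard formula (not stated explicitly on the slides); a small sketch:

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one dimension: n-pixel input, k-pixel kernel,
    zero-padding on both sides, given stride (floor division)."""
    return (n + 2 * padding - k) // stride + 1

print(conv_output_size(5, 3))                       # 3: 5x5 input, 3x3 kernel
print(conv_output_size(5, 3, padding=1))            # 5: padding preserves size
print(conv_output_size(5, 3, stride=2))             # 2
print(conv_output_size(5, 3, padding=1, stride=2))  # 3
```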
Stride 2 Example—No Padding
Input:
1 2 0 3 1
1 0 0 2 2
2 1 2 1 1
0 0 1 0 0
1 2 1 1 1
Kernel:
-1 1 2
1 1 0
-1 -2 0
Output:
-2 3
-2 -1
Stride 2 Example—with Padding
Padded input:
0 0 0 0 0 0 0
0 1 2 0 3 1 0
0 1 0 0 2 2 0
0 2 1 2 1 1 0
0 0 0 1 0 0 0
0 1 2 1 1 1 0
0 0 0 0 0 0 0
Kernel:
-1 1 2
1 1 0
-1 -2 0
Output:
-1 2 -2
3 5 2
1 4 2
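The two stride-2 examples can be checked the same way; a minimal sketch extending the earlier cross-correlation idea with a stride parameter (the function name is my own):

```python
import numpy as np

def conv2d(image, kernel, padding=0, stride=1):
    """Cross-correlation with optional zero-padding and stride."""
    if padding:
        image = np.pad(image, padding)  # zero-pad all four sides
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w), dtype=int)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # step by `stride` pixels
            out[i, j] = np.sum(image[r:r+k, c:c+k] * kernel)
    return out

image = np.array([[1, 2, 0, 3, 1],
                  [1, 0, 0, 2, 2],
                  [2, 1, 2, 1, 1],
                  [0, 0, 1, 0, 0],
                  [1, 2, 1, 1, 1]])
kernel = np.array([[-1, 1, 2],
                   [1, 1, 0],
                   [-1, -2, 0]])

print(conv2d(image, kernel, stride=2))             # 2x2 output
print(conv2d(image, kernel, padding=1, stride=2))  # 3x3 output
```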
Convolution Settings—Depth
In images, we often have multiple numbers associated with each pixel location.
These numbers are referred to as “channels”
– RGB image—3 channels
– CMYK—4 channels
The number of channels is referred to as the “depth”
So the kernel itself will have a “depth” the same size as the number of input channels
Example: a 5x5 kernel on an RGB image
– There will be 5x5x3 = 75 weights
Convolution Settings—Depth
The output from the layer will also have a depth
The networks typically train many different kernels
Each kernel outputs a single number at each pixel location
So if there are 10 kernels in a layer, the output of that layer will have depth 10.
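The depth bookkeeping from these two slides can be written out; a small sketch of the slide's example (10 kernels is taken from the slide; bias terms are ignored here):

```python
# A conv layer with 10 kernels of spatial size 5x5 on an RGB image.
n_kernels, k, in_channels = 10, 5, 3

# Each kernel spans all input channels, so its weight grid is k x k x depth.
weights_per_kernel = k * k * in_channels
total_weights = n_kernels * weights_per_kernel

# Each kernel produces one number per pixel location, i.e. one output
# channel, so the output depth equals the number of kernels.
out_channels = n_kernels

print(weights_per_kernel)  # 75
print(total_weights)       # 750
print(out_channels)        # 10
```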
Pooling
Idea: Reduce the image size by mapping a patch of pixels to a single value.
Shrinks the dimensions of the image.
Does not have parameters, though there are different types of pooling operations.
Pooling: Max-pool
For each distinct patch, represent it by the maximum
2x2 maxpool shown below
Input:
2 1 0 -1
-3 8 2 5
1 -1 3 4
0 1 1 -2
Output (2x2 maxpool):
8 5
1 4
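The max-pool example above can be reproduced with a short NumPy sketch (the function `pool2d` is my own name, not from the slides):

```python
import numpy as np

def pool2d(image, size=2, op=np.max):
    """Apply `op` over non-overlapping size x size patches."""
    h, w = image.shape
    out = np.zeros((h // size, w // size), dtype=float)
    for i in range(0, h, size):
        for j in range(0, w, size):
            # each distinct patch is represented by a single value
            out[i // size, j // size] = op(image[i:i+size, j:j+size])
    return out

image = np.array([[2, 1, 0, -1],
                  [-3, 8, 2, 5],
                  [1, -1, 3, 4],
                  [0, 1, 1, -2]])
print(pool2d(image, op=np.max))  # [[8. 5.] [1. 4.]]
```

Passing a different `op` (e.g. `np.mean`) gives other pooling variants, which is why pooling itself has no learned parameters.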
Pooling: Average-pool
For each distinct patch, represent it by the average
2x2 avgpool shown below
Input:
2 1 0 -1
-3 8 2 5
1 -1 3 4
0 1 1 -2
Output (2x2 avgpool):
2 1.5
0.25 1.5
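The average-pool result can likewise be checked; a minimal sketch using a reshape trick on the same 4x4 input as the max-pool slide:

```python
import numpy as np

image = np.array([[2, 1, 0, -1],
                  [-3, 8, 2, 5],
                  [1, -1, 3, 4],
                  [0, 1, 1, -2]])

# Split the 4x4 image into 2x2 blocks: axes are
# (row_block, row_in_block, col_block, col_in_block),
# then average within each block.
avg = image.reshape(2, 2, 2, 2).mean(axis=(1, 3))
print(avg)  # [[2.   1.5 ] [0.25 1.5 ]]
```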