
Lecture 9 - Convolutional Neural Networks
I2DL: Prof. Dai
Fully Connected Neural Network
A fully connected network is characterized by its width (neurons per layer) and its depth (number of layers).
Problems using FC Layers on Images
• How to process a tiny image (5 × 5, 3 channels) with a 3-neuron FC layer?
  – 5 weights: one neuron connected to 5 input pixels
  – 25 weights: one neuron connected to the whole 5 × 5 image on 1 channel
  – 75 weights: one neuron connected to the whole 5 × 5 image on the 3 channels
  – 75 weights per neuron for the three channels, i.e. 3 ⋅ 75 weights for the 3-neuron layer


Problems using FC Layers on Images
• How to process a normal image (1000 × 1000, 3 channels) with FC layers?
  – A 1000-neuron layer already needs 1000 ⋅ 1000 ⋅ 3 ⋅ 1000 = 3 billion weights
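The weight count above is a single multiplication; a minimal sketch to check the arithmetic (the layer sizes are taken from the slide):

```python
# FC layer on a "normal" image: every pixel of a 1000x1000 RGB image
# connects to every one of the 1000 neurons in the layer.
height, width, channels, neurons = 1000, 1000, 3, 1000
num_weights = height * width * channels * neurons
print(num_weights)  # 3 billion weights for a single layer
```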


Why not simply more FC Layers?
We cannot make networks arbitrarily complex. Why not just go deeper and get better?
– No structure!
– It is just brute force!
– Optimization becomes hard
– Performance plateaus / drops!


Better Way than FC?
• We want to restrict the degrees of freedom
  – We want a layer with structure
  – Weight sharing → using the same weights for different parts of the image


Using CNNs in Computer Vision

[Li et al., CS231n Course Slides] Lecture 12: Detection and Segmentation
Convolutions



What are Convolutions?

(f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ

f = red, g = blue, f ∗ g = green
(Figures: convolution of two box functions; convolution of two Gaussians)

Application of a filter to a function: the ‘smaller’ one is typically called the filter kernel.
What are Convolutions?
Discrete case: box filter
f: 4 3 2 -5 3 5 2 5 5 6
g: 1/3 1/3 1/3

‘Slide’ the filter kernel from left to right; at each position, compute a single value in the output data:
4 ⋅ 1/3 + 3 ⋅ 1/3 + 2 ⋅ 1/3 = 3
3 ⋅ 1/3 + 2 ⋅ 1/3 + (−5) ⋅ 1/3 = 0
2 ⋅ 1/3 + (−5) ⋅ 1/3 + 3 ⋅ 1/3 = 0
(−5) ⋅ 1/3 + 3 ⋅ 1/3 + 5 ⋅ 1/3 = 1
3 ⋅ 1/3 + 5 ⋅ 1/3 + 2 ⋅ 1/3 = 10/3
5 ⋅ 1/3 + 2 ⋅ 1/3 + 5 ⋅ 1/3 = 4
2 ⋅ 1/3 + 5 ⋅ 1/3 + 5 ⋅ 1/3 = 4
5 ⋅ 1/3 + 5 ⋅ 1/3 + 6 ⋅ 1/3 = 16/3

f ∗ g: 3 0 0 1 10/3 4 4 16/3


What are Convolutions?
Discrete case: box filter
f: 4 3 2 -5 3 5 2 5 5 6
g: 1/3 1/3 1/3
f ∗ g: ?? 3 0 0 1 10/3 4 4 16/3 ??

What to do at boundaries?
Option 1: Shrink the output
3 0 0 1 10/3 4 4 16/3
What are Convolutions?
Discrete case: box filter with zero padding
f (padded): 0 4 3 2 -5 3 5 2 5 5 6 0
g: 1/3 1/3 1/3

What to do at boundaries?
Option 2: Pad (often with 0’s), e.g. at the left boundary:
0 ⋅ 1/3 + 4 ⋅ 1/3 + 3 ⋅ 1/3 = 7/3

f ∗ g: 7/3 3 0 0 1 10/3 4 4 16/3 11/3
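The box-filter walkthrough above can be reproduced in a few lines of NumPy (a sketch, assuming NumPy is available; note that `np.convolve` flips the kernel, but the box filter is symmetric, so the result matches the sliding computation on the slides):

```python
import numpy as np

# The slide's 1D box-filter example.
f = np.array([4, 3, 2, -5, 3, 5, 2, 5, 5, 6], dtype=float)
g = np.array([1/3, 1/3, 1/3])

valid = np.convolve(f, g, mode='valid')  # Option 1: shrink (8 output values)
same = np.convolve(f, g, mode='same')    # Option 2: zero padding (10 output values)
print(valid)
print(same)
```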
Convolutions on Images
Image 5 × 5:
-5  3  2 -5  3
 4  3  2  1 -3
 1  0  3  3  5
-2  0  1  4  4
 5  6  7  9 -1

Kernel 3 × 3:
 0 -1  0
-1  5 -1
 0 -1  0

Sliding the kernel over all 3 × 3 neighborhoods (only the five non-zero kernel entries contribute):
5 ⋅ 3 + (−1) ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 0 + (−1) ⋅ 4 = 15 − 9 = 6
5 ⋅ 2 + (−1) ⋅ 2 + (−1) ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 3 = 10 − 9 = 1
5 ⋅ 1 + (−1) ⋅ (−5) + (−1) ⋅ (−3) + (−1) ⋅ 3 + (−1) ⋅ 2 = 5 + 3 = 8
5 ⋅ 0 + (−1) ⋅ 3 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 3 = 0 − 7 = −7
5 ⋅ 3 + (−1) ⋅ 2 + (−1) ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 0 = 15 − 6 = 9
5 ⋅ 3 + (−1) ⋅ 1 + (−1) ⋅ 5 + (−1) ⋅ 4 + (−1) ⋅ 3 = 15 − 13 = 2
5 ⋅ 0 + (−1) ⋅ 0 + (−1) ⋅ 1 + (−1) ⋅ 6 + (−1) ⋅ (−2) = 0 − 5 = −5
5 ⋅ 1 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 7 + (−1) ⋅ 0 = 5 − 14 = −9
5 ⋅ 4 + (−1) ⋅ 3 + (−1) ⋅ 4 + (−1) ⋅ 9 + (−1) ⋅ 1 = 20 − 17 = 3

Output 3 × 3:
 6  1  8
-7  9  2
-5 -9  3
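The whole 2D sliding-window computation above fits in a short double loop (a sketch, assuming NumPy; like the slides, it computes a cross-correlation, i.e. the kernel is not flipped, which for this symmetric kernel makes no difference):

```python
import numpy as np

# The 5x5 image and 3x3 kernel from the slides.
image = np.array([[-5, 3, 2, -5, 3],
                  [ 4, 3, 2,  1, -3],
                  [ 1, 0, 3,  3, 5],
                  [-2, 0, 1,  4, 4],
                  [ 5, 6, 7,  9, -1]], dtype=float)
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]], dtype=float)

N, F = image.shape[0], kernel.shape[0]
out = np.empty((N - F + 1, N - F + 1))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        # Elementwise product of the kernel with the 3x3 image patch, then sum.
        out[i, j] = np.sum(image[i:i+F, j:j+F] * kernel)
print(out)
```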


Image Filters
• Each kernel gives us a different image filter:
  – Edge detection: [−1 −1 −1; −1 8 −1; −1 −1 −1]
  – Box mean: 1/9 ⋅ [1 1 1; 1 1 1; 1 1 1]
  – Sharpen: [0 −1 0; −1 5 −1; 0 −1 0]
  – Gaussian blur: 1/16 ⋅ [1 2 1; 2 4 2; 1 2 1]
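A quick sanity check on these four kernels (a sketch, assuming NumPy): the two blur kernels are normalized so their weights sum to 1, the sharpen kernel also sums to 1 (it leaves a constant image unchanged), and the edge-detection kernel sums to 0 (it maps a constant image to zero).

```python
import numpy as np

# The four kernels from the slide.
edge = np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], dtype=float)
box = np.full((3, 3), 1/9)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16

# Kernel weight sums tell us how a constant (flat) image is transformed.
print(edge.sum(), box.sum(), sharpen.sum(), gauss.sum())
```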
Convolutions on RGB Images
• Input: 32 × 32 × 3 image (width × height × depth); images have depth, e.g. RGB → 3 channels
• Filter: 5 × 5 × 3; the depth dimension *must* match, i.e. the filter extends the full depth of the image
• Convolve the filter with the image, i.e. ‘slide’ it over the image and apply the filter (a dot product) at each location
Convolutions on RGB Images
• 32 × 32 × 3 image (pixels x)
• 5 × 5 × 3 filter (weight vector w)
• One number at a time: the dot product between the filter weights w and the i-th chunk x_i of the image; here the chunk is 5 ⋅ 5 ⋅ 3 = 75-dimensional
• Dot product plus bias: z_i = wᵀ x_i + b, with shapes (1 × 75) ⋅ (75 × 1) + 1 → 1
Convolutions on RGB Images
• 32 × 32 × 3 image, 5 × 5 × 3 filter
• Convolve: slide over all spatial locations x_i and compute all outputs z_i
• Without padding there are 28 × 28 locations → a 28 × 28 × 1 activation map (also called feature map)
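Putting the last two slides together, computing z_i = wᵀ x_i + b at every spatial location can be sketched as follows (assuming NumPy; the random image and filter values are placeholders, only the shapes matter):

```python
import numpy as np

# A 32x32x3 image and a 5x5x3 filter; without padding, the filter fits at
# 28x28 spatial locations, yielding a 28x28 activation map.
rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))
w = rng.standard_normal((5, 5, 3))  # filter spans the full depth
b = 0.1

out = np.empty((28, 28))
for i in range(28):
    for j in range(28):
        # 75-dim dot product between filter and image chunk, plus bias.
        out[i, j] = np.sum(w * image[i:i+5, j:j+5, :]) + b
print(out.shape)
```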
Convolution Layer



Convolution Layer
• 32 × 32 × 3 image, 5 × 5 × 3 filter → 28 × 28 activation map
• Let’s apply a different filter with different weights → a second 28 × 28 activation map
Convolution Layer
Convolution “Layer”
• 32 × 32 × 3 image, convolved with **five** filters, each with different weights → 28 × 28 × 5 activation maps
Convolution Layer
• A basic layer is defined by
– Filter width and height (depth is implicitly given)
– Number of different filter banks (#weight sets)

• Each filter captures a different image characteristic



Different Filters
• Each filter captures different
image characteristics:
– Horizontal edges
– Vertical edges
– Circles
– Squares
– …

[Zeiler & Fergus, ECCV’14] Visualizing and Understanding Convolutional Networks


Dimensions of a Convolution Layer


Convolution Layers: Dimensions
Sliding a 3 × 3 filter over a 7 × 7 image, one position at a time:
Input: 7 × 7
Filter: 3 × 3
Output: 5 × 5


Convolution Layers: Stride
With a stride of 1:
Input: 7 × 7, Filter: 3 × 3, Stride: 1 → Output: 5 × 5

Stride of S: apply the filter at every S-th spatial location, i.e. subsample the image.


Convolution Layers: Stride
With a stride of 2:
Input: 7 × 7, Filter: 3 × 3, Stride: 2 → Output: 3 × 3


Convolution Layers: Stride
With a stride of 3:
Input: 7 × 7, Filter: 3 × 3, Stride: 3 → Output: ? × ?

The filter does not really fit (a remainder is left)
→ illegal stride for this input & filter size!


Convolution Layers: Dimensions
Input width and height N, filter width and height F, stride S:

Output: ((N − F)/S + 1) × ((N − F)/S + 1)

N = 7, F = 3, S = 1: (7 − 3)/1 + 1 = 5
N = 7, F = 3, S = 2: (7 − 3)/2 + 1 = 3
N = 7, F = 3, S = 3: (7 − 3)/3 + 1 = 2.33…

Fractions are illegal!
Convolution Layers: Dimensions
Input image 32 × 32 × 3
→ Conv + ReLU, 5 filters 5 × 5 × 3 → 28 × 28 × 5
→ Conv + ReLU, 8 filters 5 × 5 × 5 → 24 × 24 × 8
→ Conv + ReLU, 12 filters 5 × 5 × 8 → 20 × 20 × 12

Shrinking down so quickly (32 → 28 → 24 → 20) is typically not a good idea…


Convolution Layers: Padding
Why padding?
• Sizes get small too quickly
• A corner pixel is only used once

Idea: surround the 7 × 7 image with a border of zeros (zero padding).


Convolution Layers: Padding
Input (N × N): 7 × 7, Filter (F × F): 3 × 3, Padding (P): 1, Stride (S): 1 → Output: 7 × 7
Most common is ‘zero’ padding.

Output size:
(⌊(N + 2 ⋅ P − F)/S⌋ + 1) × (⌊(N + 2 ⋅ P − F)/S⌋ + 1)

⌊⋅⌋ denotes the floor operator (as in practice an integer division is performed).
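The padded variant of the formula can be sketched the same way (helper name is ours; the floor matches Python's integer division, as the slide notes):

```python
def conv_output_size_padded(n, f, s, p):
    """Output size floor((N + 2P - F) / S) + 1, with zero padding P."""
    return (n + 2 * p - f) // s + 1

# 7x7 input, 3x3 filter, padding 1, stride 1 -> output stays 7x7.
print(conv_output_size_padded(7, 3, 1, 1))  # 7
```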
Convolution Layers: Padding
Types of convolutions:
• Valid convolution: using no padding
• Same convolution: output size = input size; set the padding to P = (F − 1)/2


Convolution Layers: Dimensions
Example: input image 32 × 32 × 3, 10 filters 5 × 5 (the depth of 3 is implicitly given), stride 1, pad 2.

Output size (remember: ⌊(N + 2 ⋅ P − F)/S⌋ + 1):
(32 + 2 ⋅ 2 − 5)/1 + 1 = 32

i.e. the output is 32 × 32 × 10.


Convolution Layers: Dimensions
Example: input image 32 × 32 × 3, 10 filters 5 × 5, stride 1, pad 2.

Number of parameters (weights):
Each filter has 5 ⋅ 5 ⋅ 3 + 1 = 76 params (+1 for the bias)
→ 76 ⋅ 10 = 760 parameters in the layer
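This parameter count generalizes to any conv layer; a minimal sketch (the helper name is ours):

```python
def conv_layer_params(f, in_channels, num_filters):
    """Weights per filter (F*F*C + 1 for the bias), times the number of filters."""
    return (f * f * in_channels + 1) * num_filters

# The slide's example: 10 filters of size 5x5 on a 3-channel input.
print(conv_layer_params(5, 3, 10))  # 760
```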


Example
• You are given a convolutional layer with 4 filters, kernel size 5, stride 1, and no padding that operates on an RGB image.

• Q1: What are the dimensions and the shape of its weight tensor?
  A1: (3, 4, 5, 5)
  A2: (4, 5, 5)
  A3: depends on the width and height of the image

• A1 is correct: 3 input channels (RGB), 4 filters, and a 5 × 5 filter size; the shape of the weight tensor does not depend on the width and height of the image.
Convolutional Neural Network (CNN)
CNN Prototype
A ConvNet is a concatenation of conv layers and activations:
Input image 32 × 32 × 3
→ Conv + ReLU, 5 filters 5 × 5 × 3 → 28 × 28 × 5
→ Conv + ReLU, 8 filters 5 × 5 × 5 → 24 × 24 × 8
→ Conv + ReLU, 12 filters 5 × 5 × 8 → 20 × 20 × 12
CNN Learned Filters

[Zeiler & Fergus, ECCV’14] Visualizing and Understanding Convolutional Networks


Pooling



Pooling Layer

[Li et al., CS231n Course Slides] Lecture 5: Convolutional Neural Networks


Pooling Layer: Max Pooling
Single depth slice of the input:
3 1 3 5
6 0 7 9
3 2 1 4
0 2 4 3

Max pool with 2 × 2 filters and stride 2 → ‘pooled’ output:
6 9
3 4


Pooling Layer

• Conv Layer = ‘Feature Extraction’


– Computes a feature in a given region

• Pooling Layer = ‘Feature Selection’


– Picks the strongest activation in a region



Pooling Layer
• Input is a volume of size W_in × H_in × D_in
• Two hyperparameters (a filter count K and padding P make no sense here):
  – Spatial filter extent F
  – Stride S
  Common settings: F = 2, S = 2 or F = 3, S = 2
• Output volume is of size W_out × H_out × D_out
  – W_out = (W_in − F)/S + 1
  – H_out = (H_in − F)/S + 1
  – D_out = D_in
• Does not contain parameters, i.e. it’s a fixed function
Pooling Layer: Average Pooling
Single depth slice of the input:
3 1 3 5
6 0 7 9
3 2 1 4
0 2 4 3

Average pool with 2 × 2 filters and stride 2 → ‘pooled’ output:
2.5  6
1.75 3

• Typically used deeper in the network
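Both pooling examples above can be reproduced with a NumPy reshape trick (a sketch, assuming NumPy): group the 4 × 4 slice into 2 × 2 blocks, then reduce over the within-block axes.

```python
import numpy as np

# The 4x4 depth slice from the slides.
x = np.array([[3, 1, 3, 5],
              [6, 0, 7, 9],
              [3, 2, 1, 4],
              [0, 2, 4, 3]], dtype=float)

# Axes: (row-block, row-in-block, col-block, col-in-block).
blocks = x.reshape(2, 2, 2, 2)
max_pool = blocks.max(axis=(1, 3))   # 2x2 max pooling, stride 2
avg_pool = blocks.mean(axis=(1, 3))  # 2x2 average pooling, stride 2
print(max_pool)
print(avg_pool)
```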


CNN Prototype

[Li et al., CS231n Course Slides]


Lecture 5: Convolutional Neural Networks
Final Fully-Connected Layer
• Same as what we had in ‘ordinary’ neural networks
  – Makes the final decision using the features extracted by the convolutions
  – Typically one or two FC layers


Convolutions vs Fully-Connected
• In contrast to fully-connected layers, we want to restrict the degrees of freedom
  – FC is somewhat brute force
  – Convolutions are structured

• A sliding window with the same filter parameters extracts image features
  – Concept of weight sharing
  – Extracts the same features independent of location


Receptive Field


Receptive Field
• Spatial extent of the connectivity of a convolutional filter

5 × 5 input, 3 × 3 filter → 3 × 3 output
3 × 3 receptive field: 1 output pixel is connected to 9 input pixels
Receptive Field
• Stacking two 3 × 3 convolutions: a 7 × 7 input gives a 5 × 5 intermediate output, which gives a 3 × 3 final output
• Each 3 × 3 filter has a 3 × 3 receptive field in its own input: 1 output pixel is connected to 9 pixels of the previous layer
• But the receptive field on the original input grows: one final output value has a 5 × 5 receptive field, i.e. it is connected to 25 input pixels
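The growth of the receptive field through stacked conv layers follows a standard recurrence; a sketch (helper name is ours, and we assume stride-1 layers as in the slides, though the formula also handles strides):

```python
def receptive_field(filter_sizes, strides):
    """Receptive field of stacked conv layers on the original input:
    each layer adds (F - 1) times the product of all earlier strides."""
    rf, jump = 1, 1
    for f, s in zip(filter_sizes, strides):
        rf += (f - 1) * jump
        jump *= s
    return rf

# One 3x3 layer: 3x3 receptive field; two stacked 3x3 layers: 5x5.
print(receptive_field([3], [1]))     # 3
print(receptive_field([3, 3], [1, 1]))  # 5
```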
See you next time!



References
• Goodfellow et al., “Deep Learning” (2016), Chapter 9: Convolutional Networks
• http://cs231n.github.io/convolutional-networks/
