Week 6: Intro to Convolutional Neural Networks

Motivation—Image Data

 So far, the structure of our neural network treats all inputs interchangeably.
 No relationships between the individual inputs
 Just an ordered set of variables
 We want to incorporate domain knowledge into the architecture of a Neural Network.

2
Motivation
Image data has important structures, such as:
 "Topology" of pixels
 Translation invariance
 Issues of lighting and contrast
 Knowledge of human visual system
 Nearby pixels tend to have similar values
 Edges and shapes
 Scale Invariance—objects may appear at different sizes in the image.

3
Motivation—Image Data
 A fully connected network would require a vast number of parameters
 MNIST images are small (28 x 28 pixels) and grayscale
 Color images are more typically at least 200 x 200 pixels x 3 color channels (RGB) =
120,000 values
 A single fully connected layer mapping such an image to a same-sized layer would
require (200 x 200 x 3)² = 120,000² = 14,400,000,000 weights!
 Variance (in terms of bias-variance) would be too high
 So we introduce "bias" by structuring the network to look for certain kinds of patterns
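As a quick sanity check on these numbers, a minimal sketch in plain Python:

import math  # not strictly needed; arithmetic only

# Parameter counts for one fully connected layer on raw pixels.
mnist_inputs = 28 * 28              # one grayscale MNIST image
color_inputs = 200 * 200 * 3        # modest RGB image: 120,000 values

# A dense layer with as many outputs as inputs needs one weight
# per (input, output) pair; biases are ignored here.
dense_weights = color_inputs ** 2
print(f"{color_inputs:,} inputs -> {dense_weights:,} weights")
# 120,000 inputs -> 14,400,000,000 weights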

4
Motivation
 Features need to be “built up”
 Edges -> shapes -> relations between shapes
 Textures
 Cat = two eyes in certain relation to one another + cat fur texture.
 Eyes = dark circle (pupil) inside another circle.
 Circle = particular combination of edge detectors.
 Fur = edges in certain pattern.

5
Kernels
 A kernel is a grid of weights "overlaid" on the image, centered on one pixel
 Each weight is multiplied by the pixel value underneath it
 The output at the centered pixel is ∑_{p=1}^{P} W_p ⋅ pixel_p , where P is the
number of pixels the kernel covers

 Used for traditional image processing techniques:


– Blur
– Sharpen
– Edge detection
– Emboss
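
A minimal NumPy sketch of this operation (a "valid" cross-correlation with no
padding; the helper name apply_kernel is illustrative):

import numpy as np

def apply_kernel(image, kernel):
    # Slide the kernel over every position where it fits entirely
    # inside the image; each output pixel is the weighted sum
    # sum_p W_p * pixel_p described above.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(kernel * image[i:i+kh, j:j+kw])
    return out

For example, apply_kernel(img, np.ones((3, 3)) / 9) implements a simple box blur.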

6
Kernel: 3x3 Example
Input        Kernel        Output
3  2  1      -1  0  1      (computed on the
1  2  3      -2  0  2       next slides)
1  1  1      -1  0  1

7
Kernel: 3x3 Example
The kernel is overlaid on the input patch, each weight over one pixel:

Kernel        Input
-1  0  1      3  2  1
-2  0  2      1  2  3
-1  0  1      1  1  1

8
Kernel: 3x3 Example
Input        Kernel        Output
3  2  1      -1  0  1      2
1  2  3      -2  0  2
1  1  1      -1  0  1

= 3 ⋅ (−1) + 2 ⋅ 0 + 1 ⋅ 1
+ 1 ⋅ (−2) + 2 ⋅ 0 + 3 ⋅ 2
+ 1 ⋅ (−1) + 1 ⋅ 0 + 1 ⋅ 1

= −3 + 1 − 2 + 6 − 1 + 1 = 2
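
A quick check of this arithmetic with NumPy:

import numpy as np

patch = np.array([[3, 2, 1],
                  [1, 2, 3],
                  [1, 1, 1]])
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])
# Weighted sum of the kernel over the patch it covers.
print(np.sum(kernel * patch))   # 2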

9
Kernels as Feature Detectors
Kernels can be thought of as local feature detectors; a worked example follows the
grids below.

Vertical Line    Horizontal Line
Detector         Detector          Corner Detector

-1  1 -1         -1 -1 -1          -1 -1 -1
-1  1 -1          1  1  1          -1  1  1
-1  1 -1         -1 -1 -1          -1  1  1
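
For example, sliding the vertical-line detector over a small image that contains a
vertical stripe gives a strong response only where the stripe is (a minimal sketch):

import numpy as np

image = np.zeros((5, 5))
image[:, 2] = 1.0                     # bright vertical stripe, dark background

v_detector = np.array([[-1, 1, -1],
                       [-1, 1, -1],
                       [-1, 1, -1]])

# Valid cross-correlation of the detector with the image.
out = np.array([[np.sum(v_detector * image[i:i+3, j:j+3])
                 for j in range(3)] for i in range(3)])
print(out)   # center column responds with +3, its neighbors with -3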

10
Convolutional Neural Nets
Primary Ideas behind Convolutional Neural Networks:

 Let the Neural Network learn which kernels are most useful
 Use same set of kernels across entire image (translation invariance)
 Reduces number of parameters and “variance” (from bias-variance point of view)
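
To see the savings from sharing one set of kernels across the image, a
back-of-the-envelope comparison (kernel count and size here are the example
values used later in this deck):

# Fully connected: every input pixel connects to every output unit.
inputs = 200 * 200 * 3                    # 120,000 pixel values
fc_weights = inputs ** 2                  # 14,400,000,000

# Convolutional: 10 kernels of size 5x5x3, shared across all positions.
conv_weights = 10 * (5 * 5 * 3)           # 750
print(fc_weights // conv_weights)         # 19,200,000x fewer weights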

11
Convolutions

12
Convolution Settings—Grid Size
Grid Size (Height and Width):
 The number of pixels a kernel “sees” at once
 Typically use odd numbers so that there is a “center” pixel
 Kernel does not need to be square

Height: 3, Width: 3  |  Height: 1, Width: 3  |  Height: 3, Width: 1

13
Convolution Settings—Padding
Padding
 Applying kernels directly produces an "edge effect"
 Pixels near the edge are never used as "center pixels," since there are not enough
surrounding pixels
 Padding adds extra pixels around the frame
 So every pixel of the original image becomes a center pixel as the kernel moves
across the image
 Added pixels typically have the value zero ("zero-padding")
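
A small sketch of zero-padding and its effect on output size, using the standard
formula out = (in + 2*pad - k) // stride + 1:

import numpy as np

img = np.arange(25).reshape(5, 5)
padded = np.pad(img, 1)            # one ring of zeros on every side
print(img.shape, padded.shape)     # (5, 5) (7, 7)

k = 3                              # 3x3 kernel, stride 1
print(5 - k + 1)                   # 3: without padding the output shrinks
print(5 + 2*1 - k + 1)             # 5: padding by 1 preserves the size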

14
Without Padding

Input            Kernel         Output (3x3; first value shown)
1 2 0 3 1        -1  1  2       -2  ·  ·
1 0 0 2 2         1  1  0        ·  ·  ·
2 1 2 1 1        -1 -2  0        ·  ·  ·
0 0 1 0 0
1 2 1 1 1

15
With Padding
Input (5x5, zero-padded to 7x7):        Kernel
0 0 0 0 0 0 0                           -1  1  2
0 1 2 0 3 1 0                            1  1  0
0 1 0 0 2 2 0                           -1 -2  0
0 2 1 2 1 1 0
0 0 0 1 0 0 0                           Output (5x5; first value shown): -1
0 1 2 1 1 1 0
0 0 0 0 0 0 0
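
A short NumPy check of the first output value in each case:

import numpy as np

img = np.array([[1, 2, 0, 3, 1],
                [1, 0, 0, 2, 2],
                [2, 1, 2, 1, 1],
                [0, 0, 1, 0, 0],
                [1, 2, 1, 1, 1]])
kernel = np.array([[-1,  1,  2],
                   [ 1,  1,  0],
                   [-1, -2,  0]])

# Without padding: first valid position is the top-left 3x3 patch.
print(np.sum(kernel * img[0:3, 0:3]))        # -2

# With zero-padding: kernel centered on the original top-left pixel.
padded = np.pad(img, 1)
print(np.sum(kernel * padded[0:3, 0:3]))     # -1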

16
Convolution Settings
Stride
 The "step size" as the kernel moves across the image
 Can be different for vertical and horizontal steps (but is usually the same value)
 When the stride is greater than 1, the output dimensions scale down accordingly
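
The same size formula applies, now with the stride in the denominator; a sketch
matching the two examples that follow:

def out_size(n, k, pad, stride):
    # out = (in + 2*pad - k) // stride + 1
    return (n + 2 * pad - k) // stride + 1

print(out_size(5, 3, pad=0, stride=2))   # 2 -> 2x2 output (next slide)
print(out_size(5, 3, pad=1, stride=2))   # 3 -> 3x3 output (the one after)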

17
Stride 2 Example—No Padding

Input            Kernel         Output (2x2; first row shown)
1 2 0 3 1        -1  1  2       -2  3
1 0 0 2 2         1  1  0        ·  ·
2 1 2 1 1        -1 -2  0
0 0 1 0 0
1 2 1 1 1

18
Stride 2 Example—with Padding
Input (5x5, zero-padded to 7x7):        Kernel
0 0 0 0 0 0 0                           -1  1  2
0 1 2 0 3 1 0                            1  1  0
0 1 0 0 2 2 0                           -1 -2  0
0 2 1 2 1 1 0
0 0 0 1 0 0 0                           Output (3x3; first values shown)
0 1 2 1 1 1 0                           -1  2  ·
0 0 0 0 0 0 0                            3  ·  ·
                                         ·  ·  ·
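
Putting stride and padding together, a minimal strided cross-correlation that
reproduces both examples (the helper name conv2d is illustrative; a square image
and kernel are assumed for brevity):

import numpy as np

def conv2d(img, kernel, pad=0, stride=1):
    # Zero-pad, then take the weighted sum at every strided position.
    img = np.pad(img, pad)
    k = kernel.shape[0]
    n = (img.shape[0] - k) // stride + 1
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(kernel * img[r:r+k, c:c+k])
    return out

img = np.array([[1, 2, 0, 3, 1],
                [1, 0, 0, 2, 2],
                [2, 1, 2, 1, 1],
                [0, 0, 1, 0, 0],
                [1, 2, 1, 1, 1]])
kernel = np.array([[-1, 1, 2], [1, 1, 0], [-1, -2, 0]])

print(conv2d(img, kernel, pad=0, stride=2))   # 2x2 output, top row: -2, 3
print(conv2d(img, kernel, pad=1, stride=2))   # 3x3 output, starting -1, 2 / 3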

19
Convolution Settings—Depth
 In images, we often have multiple numbers associated with each pixel location.
 These numbers are referred to as “channels”
– RGB image—3 channels
– CMYK—4 channels
 The number of channels is referred to as the “depth”
 So the kernel itself will have a “depth” the same size as the number of input channels
 Example: a 5x5 kernel on an RGB image
– There will be 5x5x3 = 75 weights

20
Convolution Settings—Depth
 The output from the layer will also have a depth
 Networks typically train many different kernels in each layer
 Each kernel outputs a single number at each pixel location
 So if there are 10 kernels in a layer, the output of that layer will have depth 10.
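
A shape check with NumPy (random values, purely illustrative):

import numpy as np

kernels = np.random.randn(10, 3, 5, 5)   # 10 kernels, depth 3, each 5x5
patch = np.random.randn(3, 5, 5)         # one RGB patch under the kernels

# Each kernel yields one number per location -> 10 output channels.
out = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
print(out.shape)          # (10,)
print(kernels[0].size)    # 75 weights per kernel (5x5x3)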

21
Pooling
 Idea: Reduce the image size by mapping a patch of pixels to a single value.
 Shrinks the dimensions of the image.
 Does not have parameters, though there are different types of pooling operations.

22
Pooling: Max-pool
 For each distinct patch, represent it by the maximum
 2x2 maxpool shown below

 2  1  0 -1
-3  8  2  5      2x2 maxpool      8  5
 1 -1  3  4      -------->        1  4
 0  1  1 -2

23
Pooling: Average-pool
 For each distinct patch, represent it by the average
 2x2 avgpool shown below

 2  1  0 -1
-3  8  2  5      2x2 avgpool      2     1.5
 1 -1  3  4      -------->        0.25  1.5
 0  1  1 -2
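
Both pooling results can be checked in a few lines of NumPy (this sketch assumes
the input size is divisible by the pool size):

import numpy as np

x = np.array([[ 2,  1,  0, -1],
              [-3,  8,  2,  5],
              [ 1, -1,  3,  4],
              [ 0,  1,  1, -2]])

# Regroup the 4x4 image into a 2x2 grid of non-overlapping 2x2 patches.
patches = x.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(patches.max(axis=(2, 3)))    # [[8 5] [1 4]]
print(patches.mean(axis=(2, 3)))   # [[2.   1.5 ] [0.25 1.5 ]]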

24
