CNN - Convolutional Neural Network

The document discusses convolutional neural networks (CNNs) which use convolutional and pooling layers to extract spatial features from images and reduce their dimensionality before classification. CNNs apply filters to local patches of the input to detect visual features like lines and edges, using multiple filters to extract different features. The convolutional and pooling layers are followed by fully connected layers to perform the final classification task based on the extracted spatial features.

Uploaded by sayma meem

CSE 465

Lecture 6
CNN – Convolutional Neural Network
Images and videos are just matrices of numbers
What do we want to do?
How to categorize images?

• Use features:
• Rectangular shape
• Long thick lines
• Has a nose
• Has bevels
• Hanging ropes
• Has long ears
• Has an Apple logo
• Structure hanging over poles
• Has eyes
• White color
Manual feature detection

• Requires domain knowledge
• Relies on previous experience with best practices
• Define features
• Reuse previously known features
• Engineer new features
• Use the features to classify
• The classifier operates on the extracted features
Manual feature detection: Problems

• Occlusion: the object is partially blocked
• Different illumination: the amount of light/brightness changes
• Scale variation/deformation
• Viewpoint variation
How can we use machines to learn features?
Implementation so far

• Use a fully connected feed-forward neural network with many layers
• Hopefully each layer learns some important features
• And, finally, we can represent the correct function
• Input: the image flattened into a 1-D vector of numbers
• However: there is no spatial information, and many, many parameters
• The important spatial information is gone!!!
What can we do?

• We need to use the spatial features
• How?
• Use filters to detect visual features like lines and edges
• Do not use fully connected layers over the entire input
• Image sizes can exceed 256×256 nowadays
• With 1000 neurons in the first hidden layer, we would need to learn around 200 million parameters for that layer alone
• Anything bigger than that quickly becomes impossible to fit in a single computer's memory
• Use an architecture that reduces the image into features
• Learn the features first
• Then use the features to classify/recognize
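The "around 200 million parameters" figure can be checked with simple arithmetic. This sketch assumes a 256×256 RGB input and a 1000-neuron first hidden layer, as on the slide:

```python
# Fully connected first layer: every input value connects to every neuron.
# Assumes a 256x256 RGB image (3 channels) and 1000 hidden neurons.
input_size = 256 * 256 * 3                 # 196,608 input values
hidden_neurons = 1000
weights = input_size * hidden_neurons      # one weight per connection
biases = hidden_neurons                    # one bias per neuron
params = weights + biases                  # roughly 200 million parameters
```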
First: use spatial features
And that gives us a reduced input layer

• Instead of feeding all input values to all nodes
• Feed only a portion of the image to each particular neuron in the hidden layers
And a match made in heaven - convolution
In practice: the same filter, applied with a sliding-window algorithm
What does it do?
Can we use it to pick features?
Feature extraction with convolution

• A filter of size 4×4 has 16 different weights
• Apply this same filter to 4×4 patches in the input
• Shift by a fixed number of pixels (the stride, e.g. 1 or 2) for the next patch

• Apply a set of weights (a filter) to extract local features
• Use multiple filters to extract different features
• Spatially share the parameters of each filter
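The steps above can be sketched as a minimal sliding-window convolution in NumPy (a hypothetical illustration, not the lecture's code): the same filter weights are reused at every patch, and the stride controls how far the window shifts.

```python
import numpy as np

def conv2d(image, filt, stride=1):
    """Slide the filter over the image; each output value is the dot
    product of the filter with one local patch (cross-correlation)."""
    fh, fw = filt.shape
    out_h = (image.shape[0] - fh) // stride + 1
    out_w = (image.shape[1] - fw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + fh,
                          j * stride:j * stride + fw]
            out[i, j] = np.sum(patch * filt)  # shared weights for every patch
    return out

# A 4x4 filter (16 weights) applied to an 8x8 input with stride 2
image = np.arange(64, dtype=float).reshape(8, 8)
filt = np.ones((4, 4))
feature_map = conv2d(image, filt, stride=2)   # 3x3 feature map
```

Note how the output is smaller than the input: a stride of 2 halves the spatial resolution while a second filter would add another feature map.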
One convolution layer
The whole CNN

Input image → Convolution → Max Pooling → Convolution → Max Pooling → … (convolution and pooling can repeat many times) → Flattened → Fully Connected Feedforward network → output ("cat", "dog", …)
Max Pooling

Filter 1:        Filter 2:
 1 -1 -1         -1  1 -1
-1  1 -1         -1  1 -1
-1 -1  1         -1  1 -1

Feature map 1:   Feature map 2:
 3 -1 -3 -1      -1 -1 -1 -1
-3  1  0 -3      -1 -1 -2  1
-3 -3  0  1      -1 -1 -2  1
 3 -2 -2 -1      -1  0 -4  3
Why Pooling

• Subsampling pixels will not change the object (a subsampled bird is still a bird)

Subsampling

• We can subsample the pixels to make the image smaller
• Fewer parameters are then needed to characterize the image
A CNN compresses a fully connected network

• Reduces the number of connections
• Shares weights on the edges
• Max pooling further reduces the complexity
Max Pooling

6×6 image:
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0

Conv (Filter 1, stride 1) → 4×4 feature map → Max Pooling (2×2) → new, smaller 2×2 image:

3 0
3 1

Each filter is a channel.
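This worked example can be reproduced in a few lines of NumPy (a sketch, not the lecture's code): the 6×6 image is convolved with Filter 1, and the 4×4 feature map is then max-pooled down to 2×2.

```python
import numpy as np

# 6x6 input image and Filter 1 from the slides
image = np.array([[1, 0, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [1, 0, 0, 0, 1, 0],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 0, 1, 0]])
filt = np.array([[ 1, -1, -1],
                 [-1,  1, -1],
                 [-1, -1,  1]])

# Stride-1 convolution (cross-correlation): 6x6 -> 4x4 feature map
fmap = np.array([[np.sum(image[i:i + 3, j:j + 3] * filt)
                  for j in range(4)] for i in range(4)])

def max_pool(m, size=2):
    """Keep the maximum of each non-overlapping size x size block."""
    h, w = m.shape[0] // size, m.shape[1] // size
    return m.reshape(h, size, w, size).max(axis=(1, 3))

pooled = max_pool(fmap)   # 2x2 result: [[3, 0], [3, 1]]
```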
The whole CNN

Convolution → Max Pooling produces a new image, smaller than the original. For the example above, the two 2×2 channels are:

3 0      -1 1
3 1       0 3

These steps can repeat many times; the number of channels of the new image is the number of filters.
The whole CNN

Convolution → Max Pooling → a new image → Convolution → Max Pooling → a new image → Flattened → Fully Connected Feedforward network → output ("cat", "dog", …)
Flattening

The pooled 2×2 maps, one per channel:

3 0      -1 1
3 1       0 3

are flattened into a single 1-D vector and fed into the fully connected feedforward network.
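A tiny sketch of flattening in NumPy (the channel ordering shown is one illustrative choice; real frameworks fix it by memory layout):

```python
import numpy as np

# Two 2x2 pooled maps, one per filter/channel (from the example above)
pooled = np.array([[[ 3, 0],
                    [ 3, 1]],
                   [[-1, 1],
                    [ 0, 3]]])

# Flatten the stacked maps into a single 1-D vector for the dense layers
vector = pooled.flatten()   # shape (8,)
```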
Convolution representation using volumes
In summary, a CNN is composed of two networks: a feature extractor (convolution and pooling layers) and a classifier (fully connected layers)
Training CNN

• Learn the weights of the convolutional filters and the fully connected layers using backpropagation with the log loss (cross-entropy) function
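The log loss named above can be sketched in a few lines of NumPy (illustrative only, with made-up logits for a two-class cat/dog output):

```python
import numpy as np

def softmax(z):
    """Convert raw network outputs (logits) to probabilities per row."""
    e = np.exp(z - z.max(axis=1, keepdims=True))   # stabilized exponentials
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Log loss: average negative log-probability of the true class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 0.5],    # made-up outputs for two examples
                   [0.2, 1.5]])   # classes: 0 = cat, 1 = dog
labels = np.array([0, 1])
loss = cross_entropy(softmax(logits), labels)   # lower is better
```

Backpropagation then adjusts every filter weight and dense-layer weight to reduce this loss.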
CNN Application: Self driving cars/drones

• Convolution, then de-convolution
• Produces a pixel-level map
Self driving cars/Drones
Generate image caption/description

• Use a CNN to detect the image features
• Then, instead of using the fully connected network,
• Use an RNN to generate the description
