Convolutional Neural Networks
A Comprehensive Guide to
Convolutional Neural Networks — the
ELI5 way
Sumit Saha
Dec 15, 2018 · 7 min read
The agenda for this field is to enable machines to view the world as
humans do, perceive it in a similar manner, and even use that knowledge
for a multitude of tasks such as Image & Video Recognition, Image
Analysis & Classification, Media Recreation, Recommendation Systems,
Natural Language Processing, etc. The advancements in Computer
Vision with Deep Learning have been built up and refined over
time, primarily around one particular algorithm: a Convolutional
Neural Network.
Introduction
https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53 1/13
4/13/2019 A Comprehensive Guide to Convolutional Neural Networks — the ELI5 way
An image is nothing but a matrix of pixel values, right? So why not just
flatten the image (e.g. 3x3 image matrix into a 9x1 vector) and feed it
to a Multi-Layer Perceptron for classification purposes? Uh.. not really.
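To make the flattening step concrete, here is a minimal NumPy sketch. The pixel values are made up for illustration. It shows that the 9x1 vector keeps every value but no longer records which pixels were vertical neighbours, which is one reason plain flattening tends to work poorly on real images.

```python
import numpy as np

# Hypothetical 3x3 single-channel image (values chosen only for illustration).
image = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 1]])

# Flatten row by row into a 9x1 column vector.
column_vector = image.reshape(-1, 1)

print(column_vector.shape)   # (9, 1)
print(column_vector.ravel()) # the nine pixel values in row-major order
```

After flattening, the pixel that sat directly below the top-left pixel is three positions away in the vector, so the spatial layout is only implicit.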
Input Image
In the figure, we have an RGB image which has been separated by its
three color planes: Red, Green, and Blue. There are a number of such
color spaces in which images exist: Grayscale, RGB, HSV, CMYK, etc.
You can imagine how computationally intensive things would get once
the images reach dimensions of, say, 8K (7680×4320). The role of the
ConvNet is to reduce the images into a form which is easier to process,
without losing features which are critical for getting a good prediction.
This is important when designing an architecture which is not
only good at learning features but is also scalable to massive datasets.
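A rough back-of-the-envelope calculation shows the scale involved. The sketch below assumes an 8K RGB image, a hypothetical fully connected layer with 1,000 units, and a hypothetical 3x3 kernel for comparison. None of these layer sizes come from the article.

```python
# Number of raw values in one 8K RGB image.
width, height, channels = 7680, 4320, 3
values_per_image = width * height * channels

# Assumption: a fully connected layer with 1,000 units on the flattened image.
hidden_units = 1000
dense_weights = values_per_image * hidden_units

# Assumption: a single 3x3 kernel spanning all three channels, for comparison.
kernel_weights = 3 * 3 * channels

print(values_per_image)  # 99532800
print(dense_weights)     # 99532800000
print(kernel_weights)    # 27
```

The kernel's weights are reused at every position of the image, which is why a convolutional layer needs so few parameters compared with a dense layer on the same input.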
Convolving a 5x5x1 image with a 3x3x1 kernel to get a 3x3x1 convolved feature
Kernel/Filter, K =
1 0 1
0 1 0
1 0 1
The filter moves to the right with a certain Stride Value till it parses the
complete width. Moving on, it hops down to the beginning (left) of the
image with the same Stride Value and repeats the process until the
entire image is traversed.
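The sliding motion described above can be sketched in a few lines of NumPy. The kernel is the K given earlier. The 5x5 image values are an assumption chosen for illustration, since the original figure is not reproduced here, and the function name `conv2d_single` is hypothetical.

```python
import numpy as np

def conv2d_single(image, kernel, stride=1):
    """Slide `kernel` over a 2-D `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):          # hop down one stride at a time
        for j in range(out_w):      # move right one stride at a time
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            # Element-wise multiply the patch with the kernel, then sum.
            out[i, j] = np.sum(patch * kernel)
    return out

# Assumed 5x5x1 input image (illustrative values only).
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

# Kernel K from the text.
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])

feature = conv2d_single(image, K, stride=1)
print(feature)
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

With `stride=2` the same call visits only every second position and returns a 2x2 feature map.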
In the case of images with multiple channels (e.g. RGB), the Kernel has
the same depth as that of the input image. Element-wise multiplication is
performed between each Kn and In pair ([K1, I1]; [K2, I2]; [K3, I3]), and all
the results are summed with the bias to give us a squashed, one-depth-channel
Convoluted Feature Output.
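Here is a minimal sketch of the multi-channel case. It treats each per-channel product as an element-wise multiply-and-sum, which is how convolution layers are normally computed. The function name `conv2d_multichannel`, the all-ones inputs, and the bias value are assumptions made so the result is easy to check by hand.

```python
import numpy as np

def conv2d_multichannel(image, kernel, bias=0.0, stride=1):
    """Convolve an (H, W, C) image with a (kh, kw, C) kernel into one 2-D map."""
    kh, kw, kc = kernel.shape
    assert kc == image.shape[2], "kernel depth must match image depth"
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw, :]
            # Multiply every channel of the patch with the matching kernel
            # slice, sum across all channels, then add the bias once.
            out[i, j] = np.sum(patch * kernel) + bias
    return out

# Illustrative inputs: a 5x5 RGB image and a 3x3x3 kernel, both all ones.
image = np.ones((5, 5, 3))
kernel = np.ones((3, 3, 3))
feature = conv2d_multichannel(image, kernel, bias=1.0)

print(feature.shape)  # (3, 3)
print(feature[0, 0])  # 28.0  (27 products of 1*1, plus a bias of 1)
```

Whatever the input depth, one kernel produces a single-channel output. A layer with several kernels would stack several such maps.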
There are two types of results to the operation: one in which the
convolved feature is reduced in dimensionality as compared to the
input, and the other in which the dimensionality is either increased or
remains the same. This is done by applying Valid Padding in the case of
the former, or Same Padding in the case of the latter.
When we pad the 5x5x1 image with one pixel of zeros on each side, making
it a 7x7x1 image, and then apply the 3x3x1 kernel over it, we find that
the convolved matrix turns out to be of dimensions 5x5x1, the same as
the input. Hence the name: Same Padding.
The following repository houses many such GIFs which would help you
get a better understanding of how Padding and Stride Length work
together to achieve results relevant to our needs.
vdumoulin/conv_arithmetic
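The padding arithmetic can be checked with the standard output-size relation, (n + 2p - k) // s + 1, for input size n, kernel size k, padding p on each side, and stride s. The helper name `output_size` below is hypothetical. `np.pad` is used only to show what one pixel of zero padding does to a 5x5 input.

```python
import numpy as np

def output_size(n, k, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (n + 2 * padding - k) // stride + 1

# One pixel of zeros on every side turns a 5x5 image into a 7x7 image.
image = np.ones((5, 5))
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

valid_size = output_size(5, 3)             # Valid Padding: no padding added
same_size = output_size(5, 3, padding=1)   # Same Padding for a 3x3 kernel

print(padded.shape)  # (7, 7)
print(valid_size)    # 3
print(same_size)     # 5
```

For an odd kernel size k and a stride of 1, a padding of (k - 1) // 2 keeps the output the same size as the input.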
Pooling Layer
There are two types of Pooling: Max Pooling and Average Pooling. Max
Pooling returns the maximum value from the portion of the image
covered by the Kernel. On the other hand, Average Pooling returns the
average of all the values from the portion of the image covered by the
Kernel.
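Both pooling types can be sketched with one small function. The function name `pool2d`, the 2x2 window, the stride of 2, and the 4x4 input values are all assumptions made for illustration.

```python
import numpy as np

def pool2d(feature, size=2, stride=2, mode="max"):
    """Apply max or average pooling over a 2-D feature map."""
    reduce_fn = np.max if mode == "max" else np.mean
    out_h = (feature.shape[0] - size) // stride + 1
    out_w = (feature.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = feature[top:top + size, left:left + size]
            out[i, j] = reduce_fn(window)
    return out

# Illustrative 4x4 feature map.
feature = np.array([[1, 3, 2, 4],
                    [5, 6, 1, 2],
                    [7, 2, 9, 0],
                    [1, 8, 3, 4]])

max_pooled = pool2d(feature, mode="max")
avg_pooled = pool2d(feature, mode="average")

print(max_pooled)
# [[6. 4.]
#  [8. 9.]]
print(avg_pooled)
# [[3.75 2.25]
#  [4.5  4.  ]]
```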
Types of Pooling
The Convolutional Layer and the Pooling Layer together form the i-th
layer of a Convolutional Neural Network. Depending on the
complexities in the images, the number of such layers may be increased
for capturing low-level details even further, but at the cost of more
computational power.
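To see what stacking such layers does to the spatial size, the sketch below traces one dimension through two convolution-plus-pooling blocks. The 28-pixel input, the 3x3 unpadded convolutions, and the 2x2 pooling with stride 2 are assumptions picked for illustration, not settings from the article.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output size of a pooling step along one spatial dimension."""
    return (size - window) // stride + 1

size = 28          # assumed input width/height
trace = [size]
for _ in range(2): # two conv + pool blocks
    size = conv_out(size)
    trace.append(size)
    size = pool_out(size)
    trace.append(size)

print(trace)  # [28, 26, 13, 11, 5]
```

Each extra block shrinks the feature map further, so the depth of the network is bounded by the input size as well as by the compute budget.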
Now that we have converted our input image into a suitable form for
our Multi-Layer Perceptron, we shall flatten the image into a column
vector. The flattened output is fed to a feed-forward neural network,
and backpropagation is applied at every iteration of training. Over a
series of epochs, the model is able to distinguish between dominating
and certain low-level features in images and classify them using the
Softmax Classification technique.
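The forward half of this step (flatten, one dense layer, softmax) can be sketched as follows. The 5x5x8 pooled output, the 10 classes, and the random untrained weights are all assumptions. Training with backpropagation is left out to keep the example short.

```python
import numpy as np

def softmax(logits):
    """Turn a vector of scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

rng = np.random.default_rng(0)

# Assumed output of the last pooling layer: 5x5 spatial size, 8 feature maps.
pooled = rng.random((5, 5, 8))

# Flatten into a single vector of 5 * 5 * 8 = 200 values.
flat = pooled.reshape(-1)

# Hypothetical dense layer mapping 200 features to 10 classes (untrained).
num_classes = 10
weights = rng.normal(scale=0.01, size=(num_classes, flat.size))
bias = np.zeros(num_classes)

probs = softmax(weights @ flat + bias)
predicted_class = int(np.argmax(probs))

print(flat.shape)   # (200,)
print(probs.sum())  # approximately 1.0
```

In a trained network the weights would be learned, and `predicted_class` would be the label with the highest softmax probability.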
There are various CNN architectures which have been key in building
the algorithms that power, and will continue to power, AI for
the foreseeable future. Some of them are listed below:
1. LeNet
2. AlexNet
3. VGGNet
4. GoogLeNet
5. ResNet
6. ZFNet
. . .
ss-is-master-chief/MNIST-Digit.Recognizer-CNNs