Convolutional Neural Network (CNN)
https://www.datacamp.com/tutorial/introduction-to-convolutional-neural-networks-cnns
Intro
➔ CNNs have translation-invariant characteristics: they can recognize a pattern regardless of where it appears in the image.
➔ Beyond image classification tasks, CNNs are versatile and can be applied to a range of other domains, such as natural language processing, time series analysis, and speech recognition.
Inspiration Behind CNN and Parallels With The Human Visual System
Hierarchical architecture: like the visual cortex, CNNs extract simple features (such as edges) in early layers and combine them into more complex features (shapes, objects) in deeper layers.
Local connectivity: each neuron responds only to a small region of the input, mirroring the receptive fields of neurons in the visual system.
Translation invariance: shared filter weights let the network detect the same feature anywhere in the image.
Non-linearity: activation functions, like the non-linear responses of biological neurons, let the network model complex patterns.
Key Components of a CNN
● Convolutional layers
● Rectified Linear Unit (ReLU for short)
● Pooling layers
● Fully connected layers
We dive into the definition of each of these components through the example of classifying a handwritten digit.
Convolution layers
● This is the first building block of a CNN. The main mathematical operation performed is called convolution: the application of a sliding window function to a matrix of pixels representing an image. The sliding function applied to the matrix is called a kernel or filter; the two terms are used interchangeably.
● In the convolution layer, several filters of equal size are applied, and each
filter is used to recognize a specific pattern from the image, such as the
curving of the digits, the edges, the whole shape of the digits, and more.
● In other words, the convolution layer uses small grids (filters or kernels) that move over the image. Each small grid is like a mini magnifying glass that looks for specific patterns in the photo, like lines, curves, or shapes. As it moves across the photo, it creates a new grid that highlights where it found these patterns.
● For example, one filter might be good at finding straight lines, another might
find curves, and so on. By using several different filters, the CNN can get a
good idea of all the different patterns that make up the image.
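To make this concrete, here is a small sketch with two hand-crafted 3x3 filters (illustrative examples, not learned weights): a vertical-edge filter fires strongly where dark pixels meet bright ones along a column, while a horizontal-edge filter does not respond to that pattern at all.

```python
# Assumed example filters for illustration; real CNN filters are learned.
import numpy as np

vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]], dtype=float)
horizontal_edge = vertical_edge.T

# An image that is dark (0) on the left and bright (1) on the right:
img = np.zeros((4, 4))
img[:, 2:] = 1.0

def respond(patch, kernel):
    """Filter response: element-wise product summed over one patch."""
    return float(np.sum(patch * kernel))

# The vertical-edge filter responds strongly on the transition column...
print(respond(img[0:3, 1:4], vertical_edge))    # -3.0
# ...while the horizontal-edge filter does not.
print(respond(img[0:3, 1:4], horizontal_edge))  # 0.0
```

Each filter thus produces its own map of "where its pattern occurs", and stacking several filters gives the network a picture of all the patterns present.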
Let’s consider a grayscale image of a handwritten digit, and a kernel for the convolution: a matrix with a dimension of 3x3.
● The weight of each element of the kernel is shown in the grid.
● Zeros represent black cells and ones white cells.
The convolution operation is performed by applying the dot product, and works as follows:
1. Place the kernel matrix at the top-left corner of the image.
2. Perform element-wise multiplication.
3. Sum the values of the products.
4. The resulting value corresponds to the first value (top-left corner) in the convoluted matrix.
5. Slide the kernel to the right by the step size of the sliding window; at the end of each row, move it down and back to the left edge.
6. Repeat steps 2 to 5 until the image matrix is fully covered.
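The steps above can be sketched in plain NumPy (no padding, stride 1); the function and image below are illustrative, not from the original tutorial:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image, multiply element-wise, and sum
    (steps 1-6 above)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1      # size of the feature map
    out = np.zeros((oh, ow))
    for i in range(oh):                    # move down row by row
        for j in range(ow):                # slide left to right
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A 6x6 image with a 3x3 kernel yields a 4x4 feature map.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0             # simple averaging filter
print(convolve2d(image, kernel).shape)     # (4, 4)
```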
https://towardsdatascience.com/gentle-dive-into-math-behind-convolutional-neural-networks-79a07dd44cf9
● As we have seen, when we perform convolution over the 6x6 image with a 3x3 kernel, we get a 4x4 feature map.
● This is because there are only 16 unique positions where we can place our filter inside this picture.
● Since our image shrinks every time we perform convolution, we can do it only a limited number of times before our image disappears completely.
● Moreover, the pixels located at the edges of the image influence far fewer output values than those in the center, so we lose some of the information contained in the picture.
● To solve both of these problems we can pad our image with an additional border. For example, if we use 1px padding, we increase the size of our photo to 8x8, so that the output of the convolution with the 3x3 filter will be 6x6.
● To preserve the input size, the padding width should meet the following equation, where p is the padding and f is the filter dimension (usually odd): p = (f - 1) / 2
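A minimal sketch of the padding arithmetic, using NumPy's `np.pad` to add the zero border:

```python
import numpy as np

f = 3                      # filter dimension (odd)
p = (f - 1) // 2           # padding that preserves the input size: p = 1
image = np.ones((6, 6))
padded = np.pad(image, p)  # zero border on all sides: 6x6 -> 8x8
print(padded.shape)        # (8, 8)
# "Valid" convolution on the padded image: 8 - 3 + 1 = 6,
# so the output matches the original 6x6 input.
```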
Strided Convolution
When designing our CNN architecture, we can increase the step (stride) if we want the receptive fields to overlap less or if we want smaller spatial dimensions in our feature map.
The dimensions of the output matrix, taking into account padding and stride, can be calculated using the following formula, where n is the input dimension, f the filter dimension, p the padding, and s the stride:
output = ⌊(n + 2p - f) / s⌋ + 1
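The standard output-size formula, floor((n + 2p - f) / s) + 1, can be written as a small helper (an illustrative function, with n the input size, f the filter size, p the padding, and s the stride):

```python
from math import floor

def conv_output_size(n, f, p=0, s=1):
    """Spatial size of a convolution output, given padding p and stride s."""
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))            # 4: the 6x6 image / 3x3 kernel case
print(conv_output_size(6, 3, p=1))       # 6: "same" padding preserves the size
print(conv_output_size(7, 3, p=1, s=2))  # 4: strided convolution shrinks it
```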
Activation function
● After each convolution, an activation function (typically ReLU, f(x) = max(0, x)) is applied element-wise to the feature map to introduce non-linearity.
Pooling layers
● The goal of the pooling layer is to extract the most significant features from the convoluted matrix.
● This is done by applying an aggregation operation (such as max or average), which reduces the dimensions of the feature map (convoluted matrix), hence reducing the memory used while training the network.
● Pooling is also relevant for mitigating overfitting.
The last pooling layer flattens its feature map so that it can be processed by the fully connected layer.
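A sketch of 2x2 max pooling (stride 2) and the flattening step, in NumPy; the feature map values are made up for illustration:

```python
import numpy as np

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: group into 2x2 blocks, keep each max."""
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([[1., 3., 2., 0.],
                        [4., 6., 1., 1.],
                        [0., 2., 9., 8.],
                        [3., 1., 5., 7.]])
pooled = max_pool2x2(feature_map)  # 4x4 -> 2x2, keeping the largest values
print(pooled)                      # [[6. 2.] [3. 9.]]
flat = pooled.flatten()            # vector fed to the fully connected layer
print(flat.shape)                  # (4,)
```

Pooling keeps only the strongest response in each region, which is why it both shrinks the feature map and discards less significant detail.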