UNIT2-CNN

Deep learning

CONVOLUTIONAL NEURAL NETWORK (CNN)

A convolutional neural network (CNN) is a category of machine learning model,
namely a type of deep learning algorithm well suited to analyzing visual data.
CNNs, sometimes referred to as convnets, use principles from linear algebra,
particularly convolution operations, to extract features and identify patterns within
images. Although CNNs are predominantly used to process images, they can also
be adapted to work with audio and other signal data.
A CNN is designed for processing structured arrays of data such as images.
Convolutional neural networks are widely used in computer vision, where they have
become the state of the art for many visual applications such as image
classification, and they have also found success in natural language processing
for text classification.
 Convolutional neural networks are very good at picking up on patterns
in the input image, such as lines, gradients, circles, or even eyes and
faces. It is this property that makes convolutional neural networks so
powerful for computer vision. Unlike earlier computer vision
algorithms, convolutional neural networks can operate directly on a raw
image and do not need hand-engineered feature extraction; preprocessing
is usually limited to simple steps such as resizing and normalization.
 A convolutional neural network is a feed-forward neural network, often
with up to 20 or 30 layers. The power of a convolutional neural network
comes from a special kind of layer called the convolutional layer.
How convolutional neural networks work

 CNNs use a series of layers, each of which detects different features of an input
image. Depending on the complexity of its intended purpose, a CNN can contain
dozens, hundreds or even thousands of layers, each building on the outputs of
previous layers to recognize detailed patterns.
 The process starts by sliding a filter designed to detect certain features over the
input image, a process known as the convolution operation (hence the name
"convolutional neural network"). The result of this process is a feature map that
highlights the presence of the detected features in the image. This feature map then
serves as input for the next layer, enabling a CNN to gradually build a hierarchical
representation of the image.
 Initial filters usually detect basic features, such as lines or simple textures.
Subsequent layers' filters are more complex, combining the basic features identified
earlier on to recognize more complex patterns. For example, after an initial layer
detects the presence of edges, a deeper layer could use that information to start
identifying shapes.
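The sliding-filter step can be sketched in a few lines of NumPy. This is only an illustration: the toy 5 x 5 image and the hand-picked vertical-edge filter are assumptions, whereas a trained CNN learns its own filter values. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]          # region under the filter
            feature_map[i, j] = np.sum(patch * kernel)  # multiply and sum
    return feature_map

# Toy image: dark columns (0) on the left, bright columns (1) on the right
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# Hand-picked filter that responds to dark-to-bright vertical edges
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The feature map is zero where the filter sees a flat region and 3 wherever its window straddles the edge, which is what "highlighting the presence of a feature" means in practice.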
Convolutional Neural Network Design

 The architecture of a convolutional neural network is a multi-layered
feed-forward neural network, made by stacking many hidden layers on top of each
other in sequence. It is this sequential design that allows convolutional neural
networks to learn hierarchical features.
 The hidden layers are typically convolutional layers followed by activation
layers, some of them followed by pooling layers.
Applications of Convolutional Neural Networks

Convolutional neural networks are most widely known for image analysis, but they
have also been adapted for several applications in other areas of machine
learning, such as natural language processing.
Convolutional Neural Networks for Self-Driving Cars

 Several companies, such as Tesla and Uber, are using convolutional
neural networks as the computer vision component of a self-driving
car.
 A self-driving car’s computer vision system must be capable of
localization, obstacle avoidance, and path planning.

 Let us consider the case of pedestrian detection. A pedestrian is a
kind of obstacle which moves. A convolutional neural network must
be able to identify the location of the pedestrian and extrapolate
their current motion in order to calculate if a collision is imminent.

 A convolutional neural network for object detection is slightly more
complex than a classification model, in that it must not only classify
an object, but also return the four coordinates of its bounding box.
CONVOLUTION OPERATION
 Convolution is an operation that combines two functions to produce a third
function. In a CNN the two functions are the input image and a filter: applying the
filter across the entire input image lets it detect one particular feature wherever
that feature occurs. The result is called a feature map.
Kernel
Stride
Padding
Pooling
Understanding the convolution operation
KERNEL
The kernel is a small rectangular matrix that slides over the image from left to right
and top to bottom.

STRIDE
The stride is the number of pixels by which the kernel moves across the input
image at each step.
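To make the effect of the stride concrete, the small sketch below lists the positions a kernel visits along one axis of an image. The helper name `kernel_positions` and the sizes (a 7-pixel axis, a 3-pixel kernel, no padding) are assumptions chosen for illustration.

```python
def kernel_positions(image_size, kernel_size, stride):
    """Top-left offsets visited by the kernel along one axis (no padding)."""
    return list(range(0, image_size - kernel_size + 1, stride))

for stride in (1, 2, 3):
    positions = kernel_positions(7, 3, stride)
    print(f"stride={stride}: positions={positions}, output size={len(positions)}")
# stride=1: positions=[0, 1, 2, 3, 4], output size=5
# stride=2: positions=[0, 2, 4], output size=3
# stride=3: positions=[0, 3], output size=2
```

A larger stride visits fewer positions, so the feature map gets smaller.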

Let’s understand through the simple analogy of a jigsaw puzzle, which is a
good illustration of the convolution operation.
At each pixel of the original image I, we take the neighbourhood of pixels that is centred
on that pixel and covered by the kernel. We then convolve this neighbourhood with the
kernel k (multiplying the two element by element and summing the products) to obtain a
single output value Sij. The kernel slides left-to-right and top-to-bottom across the
larger image.
We have different filters like blurring (average smoothing, Gaussian smoothing, median
smoothing, etc.), edge detection (Laplacian, Sobel, Scharr, Prewitt, etc.), and sharpening
— all of these operations are designed to perform a particular function.
Padding
 Padding is adding extra pixels around the borders of an image to increase the image
size and make the kernel visit more pixels for better feature extraction.
Need for padding
 The main aim of the network is to find the important features in the image with the
help of convolutional layers. Features at the corners and edges of the image, however,
are visited by the kernel (the feature extractor) far fewer times than features in the
interior, so some important information there may be missed.
So, padding is the solution: we add extra pixels around all four sides of the image.
With one pixel of padding per side, the height and width each increase by 2. The added
pixels should be as neutral as possible (typically zeros), so that they do not alter the
original image features or make them harder for the network to learn. Padding also
keeps the feature map from shrinking at every convolution, so more layers can be stacked.
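The following sketch illustrates both points with NumPy. The 4 x 4 image, the 3 x 3 kernel size and the helper `visits` are assumptions made for the example; `np.pad` with constant zeros is one common way to add the border.

```python
import numpy as np

image = np.arange(1, 17, dtype=float).reshape(4, 4)

# One ring of zeros on every side: 4 x 4 becomes 6 x 6
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (6, 6)

def visits(size, kernel=3):
    """Count how many kernel positions (stride 1) cover each pixel of a size x size image."""
    counts = np.zeros((size, size), dtype=int)
    for i in range(size - kernel + 1):
        for j in range(size - kernel + 1):
            counts[i:i + kernel, j:j + kernel] += 1
    return counts

print(visits(4)[0, 0])              # 1: an unpadded corner pixel is covered once
print(visits(6)[1:-1, 1:-1][0, 0])  # 4: the same pixel after one ring of padding
```

With padding, the corner pixel of the original image contributes to four output values instead of one.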
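Worked example of one convolution step (each pixel is multiplied by the kernel value
above it and the products are summed):
45*0 + 12*(-1) + 5*0 + 22*(-1) + 10*5 + 35*(-1) + 88*0 + 26*(-1) + 51*0 = -45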
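The arithmetic above can be checked with NumPy. The sketch assumes the nine products are listed row by row, which would make the image patch and the kernel the two 3 x 3 matrices below; that layout is an assumption, since the original figure is not available.

```python
import numpy as np

# 3 x 3 image patch (first factor of each product, row by row)
patch = np.array([[45, 12,  5],
                  [22, 10, 35],
                  [88, 26, 51]])

# 3 x 3 kernel (second factor of each product, row by row)
kernel = np.array([[ 0, -1,  0],
                   [-1,  5, -1],
                   [ 0, -1,  0]])

value = int(np.sum(patch * kernel))  # element-wise multiply, then sum
print(value)  # -45
```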
Pooling
 The pooling operation involves sliding a two-dimensional filter over each channel of
the feature map and summarizing the features lying within the region covered by the
filter.
For a feature map having dimensions nh x nw x nc, the dimensions of the output obtained
after a pooling layer (with no padding) are

 (floor((nh - f) / s) + 1) x (floor((nw - f) / s) + 1) x nc

-> nh - height of feature map
-> nw - width of feature map
-> nc - number of channels in the feature map
-> f - size of filter
-> s - stride length

For example, a 4 x 4 x 1 feature map pooled with f = 2 and s = 2 gives a 2 x 2 x 1 output.
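As a quick check of the output size, the helper below (a hypothetical function written for these notes) uses the usual no-padding convention floor((n - f) / s) + 1 for the height and width and leaves the channel count unchanged.

```python
def pooled_shape(nh, nw, nc, f, s):
    """Output shape of a pooling layer with filter size f and stride s (no padding)."""
    out_h = (nh - f) // s + 1
    out_w = (nw - f) // s + 1
    return (out_h, out_w, nc)

print(pooled_shape(4, 4, 1, f=2, s=2))     # (2, 2, 1)
print(pooled_shape(28, 28, 16, f=2, s=2))  # (14, 14, 16)
print(pooled_shape(7, 7, 3, f=3, s=2))     # (3, 3, 3)
```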
Why use pooling layers
Pooling layers are used to reduce the dimensions of the feature
maps. Thus, it reduces the number of parameters to learn and
the amount of computation performed in the network.
The pooling layer summarises the features present in a region
of the feature map generated by a convolution layer.
So, further operations are performed on summarised features
instead of precisely positioned features generated by the
convolution layer. This makes the model more robust to
variations in the position of the features in the input image.
Types of Pooling Layers

Max pooling
Average pooling
Global Pooling
 Max Pooling
 Max pooling is a pooling operation that selects the maximum element from the
region of the feature map covered by the filter. Thus, the output of a max-pooling
layer is a feature map containing the most prominent features of the previous
feature map.
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D

# define input image
image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])
# reshape to (batch, height, width, channels); the sizes must be integers
image = image.reshape(1, 4, 4, 1)
# define model containing just a single max pooling layer
model = Sequential([MaxPooling2D(pool_size=2, strides=2)])
# generate pooled output
output = model.predict(image)
# print output image
output = np.squeeze(output)
print(output)
 Average Pooling
 Average pooling computes the average of the elements present in
the region of the feature map covered by the filter. Thus, while max
pooling gives the most prominent feature in a particular patch of
the feature map, average pooling gives the average of the features
present in a patch.
import numpy as np
from keras.models import Sequential
from keras.layers import AveragePooling2D

# define input image
image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])
# reshape to (batch, height, width, channels); the sizes must be integers
image = image.reshape(1, 4, 4, 1)
# define model containing just a single average pooling layer
model = Sequential([AveragePooling2D(pool_size=2, strides=2)])
# generate pooled output
output = model.predict(image)
# print output image
output = np.squeeze(output)
print(output)
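For readers without Keras installed, the same two operations can be sketched in plain NumPy. The helper `pool2d` is a hypothetical function written for these notes, not a library call; it is applied to the same 4 x 4 image with a 2 x 2 window and stride 2.

```python
import numpy as np

def pool2d(feature_map, size, stride, reduce_fn):
    """Summarise each size x size window of a 2-D feature map with reduce_fn."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride   # top-left corner of the window
            pooled[i, j] = reduce_fn(feature_map[r:r + size, c:c + size])
    return pooled

image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])

max_pooled = pool2d(image, size=2, stride=2, reduce_fn=np.max)
avg_pooled = pool2d(image, size=2, stride=2, reduce_fn=np.mean)
print(max_pooled.tolist())  # [[9.0, 7.0], [8.0, 6.0]]
print(avg_pooled.tolist())  # [[4.25, 4.25], [4.25, 3.5]]
```

Max pooling keeps the strongest response in each window, while average pooling smooths the window into its mean.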
Global Pooling
Global pooling reduces each channel in the
feature map to a single value. Thus, an nh x
nw x nc feature map is reduced to 1 x 1 x
nc feature map. This is equivalent to using a
filter of dimensions nh x nw i.e. the dimensions
of the feature map.
Further, it can be either global max pooling or
global average pooling.
Code: Performing global pooling using Keras
import numpy as np
from keras.models import Sequential
from keras.layers import GlobalMaxPooling2D
from keras.layers import GlobalAveragePooling2D

# define input image
image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])
# reshape to (batch, height, width, channels); the sizes must be integers
image = image.reshape(1, 4, 4, 1)
# define gm_model containing just a single global-max pooling layer
gm_model = Sequential([GlobalMaxPooling2D()])
# define ga_model containing just a single global-average pooling layer
ga_model = Sequential([GlobalAveragePooling2D()])
# generate pooled output
gm_output = gm_model.predict(image)
ga_output = ga_model.predict(image)
# print output image
gm_output = np.squeeze(gm_output)
ga_output = np.squeeze(ga_output)
print("gm_output: ", gm_output)
print("ga_output: ", ga_output)

Output:
gm_output: 9.0
ga_output: 4.0625
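The single-channel example hides the "one value per channel" behaviour, so here is a NumPy sketch with two channels. The second channel (the same image scaled by 10) is an assumption made up for the illustration.

```python
import numpy as np

channel0 = np.array([[2.0, 2.0, 7.0, 3.0],
                     [9.0, 4.0, 6.0, 1.0],
                     [8.0, 5.0, 2.0, 4.0],
                     [3.0, 1.0, 2.0, 6.0]])

# Hypothetical nh x nw x nc = 4 x 4 x 2 feature map
feature_map = np.stack([channel0, 10 * channel0], axis=-1)

# Reduce over height and width, leaving one value per channel
global_max = feature_map.max(axis=(0, 1))
global_avg = feature_map.mean(axis=(0, 1))
print(global_max.tolist())  # [9.0, 90.0]
print(global_avg.tolist())  # [4.0625, 40.625]
```

Each channel collapses to a single number, so the 4 x 4 x 2 feature map becomes a vector of length 2 (that is, 1 x 1 x 2).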
Nonlinearity functions in CNNs
 In convolutional neural networks (CNNs), nonlinearity functions play a crucial role
in enabling the network to learn complex patterns. These functions are typically
applied after each convolution operation to introduce nonlinearity into the model,
allowing it to capture more intricate features.
Here are some common nonlinearity functions used in CNNs:
ReLU (Rectified Linear Unit):
This is the most widely used activation function in CNNs. It replaces
all negative values in the feature map with zero, which helps in mitigating
the vanishing gradient problem and speeds up training.
Sigmoid:
This function maps input values to a range between 0 and 1. It’s
often used in the output layer for binary classification tasks.
Tanh (Hyperbolic Tangent):
Similar to the sigmoid function but maps input values to a range
between -1 and 1. It is often used in hidden layers.
Leaky ReLU:
A variation of ReLU that allows a small, non-zero gradient when the
unit is not active, which helps in learning during the training process.
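ELU (Exponential Linear Unit):
This function behaves like ReLU for positive inputs and decays smoothly to a negative
constant for negative inputs. It tends to make the cost converge toward zero faster
and can produce more accurate results.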
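The functions listed above can be written in a few lines of NumPy. The slope of 0.01 for Leaky ReLU and alpha = 1.0 for ELU are commonly used defaults and are assumptions here, not values given in these notes.

```python
import numpy as np

def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Map inputs to the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Map inputs to the range (-1, 1)."""
    return np.tanh(x)

def leaky_relu(x, slope=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of becoming zero."""
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    """Like ReLU for positive inputs; smooth exponential curve for negative inputs."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 2.0])
for fn in (relu, sigmoid, tanh, leaky_relu, elu):
    print(fn.__name__, np.round(fn(x), 4))
```

Comparing the outputs at x = -2 shows the main difference between the ReLU variants: ReLU gives 0, Leaky ReLU gives a small negative value, and ELU gives a value close to -alpha.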
Loss function
A loss function compares the target and predicted output values and
measures how well the neural network models the training data. When
training, we aim to minimize this loss between the predicted and
target outputs.
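As an illustration, two widely used loss functions are sketched below: mean squared error and binary cross-entropy. These notes do not say which loss is intended, so both choices, along with the example targets and predictions, are assumptions.

```python
import numpy as np

def mean_squared_error(target, predicted):
    """Average squared difference between target and predicted values."""
    return float(np.mean((target - predicted) ** 2))

def binary_cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy for 0/1 targets; predictions are clipped to avoid log(0)."""
    predicted = np.clip(predicted, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(predicted)
                          + (1.0 - target) * np.log(1.0 - predicted)))

target = np.array([1.0, 0.0, 1.0, 0.0])
good = np.array([0.9, 0.1, 0.8, 0.2])   # predictions close to the targets
bad = np.array([0.4, 0.6, 0.3, 0.7])    # predictions far from the targets

print(mean_squared_error(target, good), mean_squared_error(target, bad))
print(binary_cross_entropy(target, good), binary_cross_entropy(target, bad))
```

In both cases the predictions that are closer to the targets give the smaller loss, which is the quantity training tries to minimize.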
