UNIT2-CNN
CNNs use a series of layers, each of which detects different features of an input
image. Depending on the complexity of its intended purpose, a CNN can contain
dozens, hundreds or even thousands of layers, each building on the outputs of
previous layers to recognize detailed patterns.
The process starts by sliding a filter designed to detect certain features over the
input image, a process known as the convolution operation (hence the name
"convolutional neural network"). The result of this process is a feature map that
highlights the presence of the detected features in the image. This feature map then
serves as input for the next layer, enabling a CNN to gradually build a hierarchical
representation of the image.
Initial filters usually detect basic features, such as lines or simple textures.
Subsequent layers' filters are more complex, combining the basic features identified
earlier on to recognize more complex patterns. For example, after an initial layer
detects the presence of edges, a deeper layer could use that information to start
identifying shapes.
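As a rough illustration of the convolution operation described above, the following NumPy sketch slides a small filter over a tiny grayscale image to produce a feature map. The image values and the 3 x 3 vertical-edge filter are made-up examples chosen only for demonstration, not taken from the text.

import numpy as np

# made-up 6 x 6 grayscale image: bright left half, dark right half
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

# hypothetical 3 x 3 vertical-edge filter
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)

# slide the filter over the image (stride 1, no padding)
h = image.shape[0] - kernel.shape[0] + 1
w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((h, w))
for i in range(h):
    for j in range(w):
        region = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(region * kernel)

print(feature_map)   # large values in the columns where the vertical edge sits

The feature map responds strongly only where the filter's pattern (a vertical edge) is present, which is exactly the "highlighting" behaviour described above.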
Convolutional Neural Network Design
STRIDE
The number of pixels by which the filter (kernel) is shifted as it slides over the
input image is called the stride.
For an nh x nw x nc input and an f x f filter applied with stride s (and no padding),
the output dimensions are:
((nh - f)/s + 1) x ((nw - f)/s + 1) x nc
where each spatial dimension is rounded down to the nearest integer.
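A quick way to check this formula is a small helper function (a hypothetical utility written for these notes, not part of Keras) that computes the output size for given input, filter and stride values.

# hypothetical helper: output size of a convolution/pooling layer (no padding)
def output_size(n_h, n_w, n_c, f, s):
    out_h = (n_h - f) // s + 1
    out_w = (n_w - f) // s + 1
    return out_h, out_w, n_c

# e.g. a 4 x 4 x 1 input with a 2 x 2 filter and stride 2 -> (2, 2, 1)
print(output_size(4, 4, 1, 2, 2))

This matches the pooling examples below, where a 4 x 4 input pooled with a 2 x 2 window and stride 2 yields a 2 x 2 output.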
Pooling operations:
Max Pooling
Average Pooling
Global Pooling
Max Pooling
Max pooling is a pooling operation that selects the maximum element from the
region of the feature map covered by the filter. Thus, the output of a max-pooling
layer is a feature map containing the most prominent features of the previous
feature map.
import numpy as np
from keras.models import Sequential
from keras.layers import MaxPooling2D

# define input image
image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])
image = image.reshape(1, 4, 4, 1)

# define model containing just a single max pooling layer
model = Sequential([MaxPooling2D(pool_size=2, strides=2)])

# generate pooled output
output = model.predict(image)

# print output image
output = np.squeeze(output)
print(output)
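With this input, each 2 x 2 window keeps only its largest value, so the printed output should be [[9. 7.] [8. 6.]].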
Average Pooling
Average pooling computes the average of the elements present in
the region of the feature map covered by the filter. Thus, while max
pooling gives the most prominent feature in a particular patch of
the feature map, average pooling gives the average of the features
present in a patch.
import numpy as np
from keras.models import Sequential
from keras.layers import AveragePooling2D

# define input image
image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])
image = image.reshape(1, 4, 4, 1)

# define model containing just a single average pooling layer
model = Sequential([AveragePooling2D(pool_size=2, strides=2)])

# generate pooled output
output = model.predict(image)

# print output image
output = np.squeeze(output)
print(output)
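Here each 2 x 2 window is averaged, so the printed output should be [[4.25 4.25] [4.25 3.5]].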
Global Pooling
Global pooling reduces each channel in the feature map to a single value. Thus, an
nh x nw x nc feature map is reduced to a 1 x 1 x nc feature map. This is equivalent
to using a filter of dimensions nh x nw, i.e., the dimensions of the feature map.
It can be either global max pooling or global average pooling.
Code #3: Performing Global Pooling using Keras
import numpy as np
from keras.models import Sequential
from keras.layers import GlobalMaxPooling2D
from keras.layers import GlobalAveragePooling2D

# define input image
image = np.array([[2.0, 2.0, 7.0, 3.0],
                  [9.0, 4.0, 6.0, 1.0],
                  [8.0, 5.0, 2.0, 4.0],
                  [3.0, 1.0, 2.0, 6.0]])
image = image.reshape(1, 4, 4, 1)

# define gm_model containing just a single global-max pooling layer
gm_model = Sequential([GlobalMaxPooling2D()])

# define ga_model containing just a single global-average pooling layer
ga_model = Sequential([GlobalAveragePooling2D()])

# generate pooled outputs
gm_output = gm_model.predict(image)
ga_output = ga_model.predict(image)

# print output values
gm_output = np.squeeze(gm_output)
ga_output = np.squeeze(ga_output)
print("gm_output: ", gm_output)
print("ga_output: ", ga_output)
gm_output: 9.0
ga_output: 4.0625
Nonlinearity Functions in CNNs
In Convolutional Neural Networks (CNNs), nonlinearity functions play a crucial role
in enabling the network to learn complex patterns. These functions are typically
applied after each convolution operation to introduce nonlinearity into the model,
allowing it to capture more intricate features.
Here are some common nonlinearity functions used in CNNs (a brief sketch of each follows the list):
ReLU (Rectified Linear Unit):
This is the most widely used activation function in CNNs. It replaces
all negative values in the feature map with zero, which helps in mitigating
the vanishing gradient problem and speeds up training.
Sigmoid:
This function maps input values to a range between 0 and 1. It’s
often used in the output layer for binary classification tasks.
Tanh (Hyperbolic Tangent):
Similar to the sigmoid function but maps input values to a range
between -1 and 1. It is often used in hidden layers.
Leaky ReLU:
A variation of ReLU that allows a small, non-zero gradient when the
unit is not active, which helps in learning during the training process.
Exponential Linear Unit (ELU):
This function tends to drive the cost towards zero faster and to produce more
accurate results.
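As a rough sketch (plain NumPy, for illustration only), the activation functions listed above can be written as follows; the 0.01 slope for Leaky ReLU and alpha = 1.0 for ELU are common default choices assumed here, not values taken from the text.

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])   # example pre-activation values

relu       = np.maximum(0, x)                        # ReLU: negatives clipped to 0
sigmoid    = 1 / (1 + np.exp(-x))                    # Sigmoid: squashes to (0, 1)
tanh       = np.tanh(x)                              # Tanh: squashes to (-1, 1)
leaky_relu = np.where(x > 0, x, 0.01 * x)            # Leaky ReLU, assumed slope 0.01
elu        = np.where(x > 0, x, 1.0 * (np.exp(x) - 1))  # ELU, assumed alpha = 1.0

print(relu, sigmoid, tanh, leaky_relu, elu, sep="\n")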
Loss Function
A loss function compares the target and predicted output values; it measures how
well the neural network models the training data. During training, we aim to
minimize this loss between the predicted and target outputs.
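As a minimal sketch, mean squared error is one such loss; the target and predicted values below are made-up numbers used only to show the computation.

import numpy as np

# made-up target and predicted outputs
y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8, 0.4])

# mean squared error: average of the squared differences
mse = np.mean((y_true - y_pred) ** 2)
print(mse)   # 0.1125 -- training aims to drive this value down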