
Minor in AI

Convolutional Neural Networks

January 23, 2025



Figure 1: Deep Learning vs Machine Learning

1 Introduction
Convolutional Neural Networks (CNNs) are a class of deep neural networks designed
specifically for tasks involving structured data such as images, video sequences, and time-
series data. Their architecture mimics the connectivity pattern of neurons in the human
brain, where small clusters of neurons respond to stimuli in their receptive field.
CNNs are widely used in:

• Image and video recognition

• Object detection

• Medical image analysis

• Natural language processing (e.g., sentence classification)

2 Traditional Machine Learning vs Deep Learning


• Feature Extraction:

– Traditional ML relies on manually designed features.


– Deep Learning automatically extracts hierarchical features.

• Scalability:

– Traditional ML struggles with large-scale data.


– Deep Learning thrives on massive datasets using GPUs/TPUs.

• Performance:

– Traditional ML models like SVM and Random Forests perform well on small
datasets.
– Deep Learning surpasses traditional methods on tasks involving unstructured data such as images and videos.


Figure 2: Convolution in 2D

3 Steps in a CNN
1. Convolution: Applies filters (kernels) to extract features such as edges, textures,
or complex patterns.

2. ReLU: Introduces non-linearity by applying f(x) = max(0, x), enabling the model to learn non-linear decision boundaries.

3. Pooling: Reduces dimensionality while retaining essential features, improving computational efficiency.

4. Fully Connected Layer: Combines extracted features for final classification or regression output (see the sketch after this list).
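
To make these four steps concrete, here is a minimal sketch in PyTorch. The input shape (1-channel 28×28 images) and the number of output classes (10) are illustrative assumptions, not specified in these notes:

    import torch
    import torch.nn as nn

    # A minimal sketch of the four CNN steps, assuming 1-channel 28x28
    # inputs and 10 output classes (illustrative choices only).
    model = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3),  # 1. Convolution: 8 filters, 3x3 each
        nn.ReLU(),                       # 2. ReLU: f(x) = max(0, x)
        nn.MaxPool2d(2),                 # 3. Pooling: halves height and width
        nn.Flatten(),
        nn.Linear(8 * 13 * 13, 10),      # 4. Fully connected: class scores
    )

    x = torch.randn(1, 1, 28, 28)        # one dummy grayscale image
    print(model(x).shape)                # torch.Size([1, 10])

The layers correspond one-to-one with the steps above: the 3×3 convolution shrinks 28×28 to 26×26, pooling halves it to 13×13, hence the 8 · 13 · 13 inputs to the final layer.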

4 Convolution in 2D
Convolution is a mathematical operation that processes an input matrix using a smaller
matrix called a kernel (or filter). It involves sliding the kernel over the input matrix
and computing the weighted sum of the overlapping elements. The result is stored in an
output matrix. To understand the formula:

1. For each position (i, j) in the output matrix Y , place the kernel K over the input
matrix X, aligning the top-left corner of the kernel with the position (i, j) in X.

2. Multiply each element of the kernel K[m, n] by the corresponding element in the
input matrix X[i + m, j + n].

3. Sum all these multiplied values to compute a single number, which becomes the
value of Y [i, j].


4. Slide the kernel to the next position and repeat the process until all positions in Y
are filled.

In simpler terms, convolution applies a filter (kernel) to the input to detect patterns,
reduce dimensionality, or create new representations of the data. Mathematically, it can
be written as:
    Y[i, j] = Σ_{m=0}^{k−1} Σ_{n=0}^{k−1} X[i + m, j + n] · K[m, n]

Where:

• X: The input matrix, which represents the data to be processed (e.g., an image or
feature map).

• K: The kernel or filter, a small matrix containing weights used to extract specific
patterns from the input.

• Y: The output matrix, which stores the results of the convolution operation.

• i, j: The row and column indices in the output matrix Y .

• k: The size (height and width) of the kernel K, assuming it is square.

• m, n: Indices that traverse the rows and columns of the kernel.
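
The formula translates almost line-for-line into code. Below is a minimal NumPy sketch of "valid" convolution as defined above (like the formula, it does not flip the kernel, so strictly it computes cross-correlation, which is the standard convention in CNNs):

    import numpy as np

    def conv2d_valid(X, K):
        # Slide the kernel over X; each output cell is the sum of
        # elementwise products, exactly as in the formula above.
        k = K.shape[0]                        # kernel size (assumed square)
        out_h = X.shape[0] - k + 1
        out_w = X.shape[1] - k + 1
        Y = np.zeros((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                Y[i, j] = np.sum(X[i:i+k, j:j+k] * K)
        return Y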

4.1 Example: Matrix Convolution (Single Step Calculation)


Let us perform a single step of the convolution operation using the following matrices:

    Input Matrix X (5×5):        Kernel K (3×3):
    1 1 1 0 0                    1 0 1
    0 1 1 1 0                    0 1 0
    0 0 1 1 1                    1 0 1
    0 0 1 1 0
    0 1 1 0 0

For the first output position Y[0, 0], the kernel overlaps the top-left 3×3 slice of X:

    Input Matrix (Slice):        Kernel:
    1 1 1                        1 0 1
    0 1 1                        0 1 0
    0 0 1                        1 0 1
The convolution operation involves element-wise multiplication followed by summation.

Step-by-step Calculation:

    (1 · 1) + (1 · 0) + (1 · 1)
    + (0 · 0) + (1 · 1) + (1 · 0)
    + (0 · 1) + (0 · 0) + (1 · 1)

Result = 1 + 0 + 1 + 0 + 1 + 0 + 0 + 0 + 1 = 4
Thus, the value of the convolution operation at this position is: 4
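
The same value falls out of the conv2d_valid sketch from above when applied to the example matrices:

    X = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
    K = np.array([[1, 0, 1],
                  [0, 1, 0],
                  [1, 0, 1]])
    print(conv2d_valid(X, K)[0, 0])   # 4.0, matching the hand calculation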


Figure 3: Convolution Operation

4.2 Padding
Padding ensures the output size is preserved or adjusted based on the application:
• Valid Padding: No padding is applied, reducing output size.
• Same Padding: Adds zeros around the input to maintain the output size.

4.3 Stride
Stride refers to the step size with which the kernel moves. It impacts:
• Output Size: Larger strides result in smaller output dimensions.
• Computational Cost: Higher strides reduce computation at the cost of detail
loss.
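
Padding and stride jointly determine the output size along each dimension via the standard formula out = floor((n + 2p − k) / s) + 1, where n is the input size, k the kernel size, p the padding, and s the stride. A small helper makes this concrete:

    def conv_output_size(n, k, p=0, s=1):
        # Output size along one dimension: floor((n + 2p - k) / s) + 1
        return (n + 2 * p - k) // s + 1

    print(conv_output_size(5, 3))            # valid padding: 3
    print(conv_output_size(5, 3, p=1))       # same padding:  5
    print(conv_output_size(5, 3, p=1, s=2))  # stride of 2:   3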

5 Activation Functions
Activation functions introduce non-linearity into the model. Common types:
• Sigmoid: Squashes input into (0, 1). Useful for probabilities.

      f(x) = 1 / (1 + e^(−x))

• Tanh: Squashes input into (−1, 1). Centered at zero.

      f(x) = tanh(x)

• ReLU: Replaces negative values with zero.

      f(x) = max(0, x)

• Leaky ReLU: Allows a small gradient for negative values.

      f(x) = x for x ≥ 0,  f(x) = αx for x < 0  (α is a small constant, e.g., 0.01)

• ELU (Exponential Linear Unit): Smooths out ReLU's zero-gradient issue for negative inputs.

      f(x) = x for x ≥ 0,  f(x) = α(e^x − 1) for x < 0
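
For reference, here is a NumPy sketch of these activation functions (the α defaults are common choices, not values specified in these notes):

    import numpy as np

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def relu(x):
        return np.maximum(0, x)

    def leaky_relu(x, alpha=0.01):
        return np.where(x >= 0, x, alpha * x)

    def elu(x, alpha=1.0):
        return np.where(x >= 0, x, alpha * (np.exp(x) - 1))

    x = np.array([-2.0, -0.5, 0.0, 1.5])
    print(relu(x))        # [0.   0.   0.   1.5]
    print(leaky_relu(x))  # [-0.02  -0.005  0.    1.5]

(Tanh is available directly as np.tanh.)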


Figure 4: Activation Functions

6 Region of Interest (ROI)


A Region of Interest (ROI) refers to a specific part of an image or data that is
particularly important for the problem at hand. Instead of processing the entire image,
focusing on the ROI allows us to work on areas that hold meaningful information, reducing
computational complexity and improving efficiency.

6.1 Why ROI is Important


• Efficiency: By processing only the ROI, we save resources and time, especially for
high-resolution images.

• Focus on Relevant Data: Not all parts of an image are equally important. For
example:

– In medical imaging, the ROI could be a tumor in an X-ray or MRI scan.


– In facial recognition, the ROI is the face, excluding the background.

• Improved Accuracy: Concentrating on the ROI reduces noise and irrelevant data,
which can confuse the model.
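
In code, extracting an ROI from an image stored as an array is a simple crop. A minimal sketch with hypothetical coordinates:

    import numpy as np

    image = np.random.rand(480, 640)     # a dummy grayscale image
    y0, y1, x0, x1 = 100, 200, 300, 420  # hypothetical ROI bounds
    roi = image[y0:y1, x0:x1]            # only this region is processed further
    print(roi.shape)                     # (100, 120)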

6.2 Applications of ROI


• Medical Imaging: Detecting abnormalities such as tumors or fractures.

• Object Detection: Highlighting and identifying objects like cars, pedestrians, or animals in an image.

• Image Segmentation: Dividing an image into meaningful sections, where each section has a specific purpose.


7 Limitations of 2D Convolution
While 2D convolution is highly effective, it comes with certain limitations:

7.1 Fixed Receptive Field


• The receptive field is the area of the input image covered by the kernel at a given
time.

• A fixed kernel size may miss patterns that span a larger area, such as global textures
or large-scale structures.

7.2 Loss of Contextual Information


• Convolution operations focus on local patterns, such as edges or small textures.

• As a result, they may not capture relationships between distant parts of an image,
which is essential for understanding the overall structure.

7.3 High Computational Cost


• For high-resolution images, the number of computations increases significantly due
to the large input size.

• This can lead to slower processing and the need for more powerful hardware.

7.4 Limited Temporal Understanding


• 2D convolution cannot model temporal changes in data, such as variations across
video frames or sequences of events.

• For such cases, specialized architectures like 3D Convolutions or Recurrent Neural Networks (RNNs) are required.

7.5 Boundary Effects


• Convolution may produce distorted results at the boundaries of an image due to
incomplete coverage of the kernel.

• Padding partially mitigates this issue, but it introduces artificial data (zeros or other
values) into the computation.

7.6 Sensitivity to Rotation and Scaling


• Convolutional filters are sensitive to the orientation and size of patterns in the image.

• Features such as edges may not be detected properly if the object is rotated or
scaled.


8 Conclusion
Convolutional Neural Networks (CNNs) are a cornerstone of modern deep learning, ex-
celling in tasks involving structured data such as images and videos. Their architecture,
inspired by the human brain’s visual processing, allows them to learn hierarchical features
from raw data without manual feature engineering.

8.1 Key Takeaways


• Loss Functions: The loss function is a critical component for supervised learning, quantifying the difference between predicted outputs and ground truth. Examples include Mean Squared Error (MSE) for regression and Cross-Entropy Loss for classification.

• CNN Workflow: The steps of a CNN (Convolution, ReLU, Pooling, and Fully Connected layers) together enable efficient feature extraction and decision-making. Each step plays a distinct role:

  – Convolution captures local patterns using kernels.
  – ReLU introduces non-linearity, enabling the model to learn complex patterns.
  – Pooling reduces dimensionality, improving computational efficiency.
  – Fully connected layers integrate features for final predictions.

• Region of Interest (ROI): Focusing on specific regions of interest improves model efficiency and accuracy by reducing noise and processing only the most relevant parts of an image. Applications include medical imaging, object detection, and segmentation.

• Activation Functions: Non-linear activation functions such as Sigmoid, Tanh, ReLU, and Leaky ReLU allow neural networks to model complex relationships in data. Each activation function has specific use cases based on the nature of the task and data.

• Limitations of 2D Convolution: While 2D convolutions are powerful, they face challenges such as:

  – Fixed receptive fields, limiting their ability to capture global patterns.
  – Loss of contextual information from distant parts of an image.
  – High computational cost for large-scale data.
  – Inability to model temporal relationships, as seen in sequential data.
  – Sensitivity to changes in rotation, scale, and orientation.

  Techniques like 3D convolutions, padding, and architectural innovations aim to address these limitations.

• Comparison with Traditional Machine Learning: CNNs outperform traditional approaches in tasks involving unstructured data by learning features directly from raw data. They also scale better with larger datasets and complex tasks.


9 Appendix
9.1 Loss Functions
In supervised learning, the loss function serves as the guiding metric for model optimization. It calculates the error between the predicted output (y_pred) and the ground truth (y_true).
Examples:

• Mean Squared Error (MSE): Common for regression tasks.

      MSE = (1/n) Σ_{i=1}^{n} (y_true^(i) − y_pred^(i))²

• Cross-Entropy Loss: Common for classification tasks.

      Cross-Entropy = −(1/n) Σ_{i=1}^{n} Σ_{c=1}^{C} y_true,c^(i) · log(y_pred,c^(i))
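
Both losses are straightforward to sketch in NumPy; the small eps below is an implementation guard against log(0), not part of the formula:

    import numpy as np

    def mse(y_true, y_pred):
        return np.mean((y_true - y_pred) ** 2)

    def cross_entropy(y_true, y_pred, eps=1e-12):
        # y_true: one-hot labels (n, C); y_pred: probabilities (n, C)
        return -np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1))

    y_true = np.array([[1, 0], [0, 1]])
    y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))  # 0.625
    print(cross_entropy(y_true, y_pred))                    # ≈ 0.164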
