
UNIT 7

Convolution Neural Network For Image Classification

7.1 Introduction to CNNs

7.2 Image Classification

7.3 Object Detection

7.1 Introduction to CNNs: Convolutional layers, pooling layers, fully connected layers

What are Convolutional Neural Networks (ConvNets for short)? Let's start by looking at how a ConvNet is structured.

Convolutional Neural Network

CNN Architecture and Importance of Each Layer

A Convolutional Neural Network (CNN) is a type of deep learning model designed to process structured data such as images and videos. Its architecture consists of several key layers, each with a specific role:

1. Input Layer

● Purpose: Takes in the raw image data.


● Example Input: A 2D grayscale image (matrix of pixel values) or a 3D RGB
image.
● Importance: Converts the image into a numerical format that the CNN can
process.
2. Convolutional Layer

● Purpose: Extracts features (edges, textures, shapes) by applying small filters (kernels) over the input.
● How it works:
○ Kernels slide over the image, performing element-wise multiplications and summing the results (the convolution operation).
○ The result is a feature map that highlights specific patterns.
● Importance: Learns spatial features, preserving the relationship between pixels.
● Example: Detects edges or corners in an image of a face.
With a stride of 2, the filter shifts two pixels at a time in both the horizontal and vertical directions as it slides over the image, as the sketch below illustrates.
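The convolution operation can be sketched in a few lines of NumPy. This is an illustrative example only, not code from a specific library: the 6x6 input and the vertical-edge kernel are made-up values chosen to show how a stride of 2 shrinks the output.

import numpy as np

def convolve2d(image, kernel, stride=2):
    # Slide the kernel over the image, multiplying element-wise and summing
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 grayscale image
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                 # simple vertical-edge filter
print(convolve2d(image, kernel, stride=2))         # a 2x2 feature map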

3. Activation Layer (ReLU)

● Purpose: Introduces non-linearity by converting negative values in the feature map to zero.
● Importance:
○ Helps the CNN learn complex patterns.
○ Without this layer, the CNN would behave like a linear model, limiting its
capability.

4. Pooling Layer

● Purpose: Reduces the spatial size of the feature maps while retaining important
information.
● Types:
○ Max Pooling: Keeps the maximum value in a region.
○ Average Pooling: Computes the average value in a region.
● Importance:
○ Reduces computational cost.
○ Prevents overfitting by generalizing feature detection.
○ Makes the network invariant to small translations (e.g., shifting an object slightly in an image).
○ Pooling downsamples the feature map in height and width, but the number of channels (depth) stays the same. A minimal sketch follows this list.
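As a minimal sketch (illustrative values, assuming a 4x4 single-channel feature map), 2x2 max pooling can be written in NumPy like this:

import numpy as np

def max_pool2d(feature_map, pool=2):
    h, w = feature_map.shape
    fm = feature_map[:h - h % pool, :w - w % pool]     # trim to a multiple of pool
    fm = fm.reshape(h // pool, pool, w // pool, pool)  # split into pool x pool regions
    return fm.max(axis=(1, 3))                         # keep the maximum of each region

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 2],
               [7, 2, 9, 1],
               [3, 4, 1, 8]])
print(max_pool2d(fm))   # [[6 4] [7 9]] -- height and width halve, maxima survive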

5. Flattening Layer
● Purpose: Converts the 2D or 3D feature maps into a 1D vector.

● Importance: Prepares the data for the fully connected layer.


6. Fully Connected Layer (Dense Layer)

● Purpose: Acts as the decision-making part of the network.


○ Every neuron is connected to every feature from the previous layer.
● Importance:
○ Combines extracted features to make predictions.
○ Example: Uses detected edges, shapes, and textures to classify an image as
a cat or dog.

7. Dropout Layer

● Purpose: Randomly ignores (drops) a percentage of neurons during training.


● Importance:
○ Prevents overfitting.
○ Improves generalization to unseen data.

8. Output Layer (Softmax/Logits)

● Purpose: Provides the final prediction.


○ Softmax: Converts raw scores (logits) into probabilities (used for multi-class classification).
○ Sigmoid: Outputs a single probability for binary classification tasks.

● Importance: Ensures outputs are interpretable as probabilities. Both activations are sketched below.
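A quick NumPy illustration of both output activations (the logits here are arbitrary example values):

import numpy as np

def softmax(logits):
    z = logits - np.max(logits)        # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))    # single probability for binary tasks

logits = np.array([2.0, 1.0, 0.1])     # raw scores for 3 classes
print(softmax(logits))                 # probabilities that sum to 1.0
print(sigmoid(0.8))                    # e.g., P(class = 1) for a binary output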

How the Layers Work Together

1. Early Layers: Detect simple patterns (e.g., edges, lines).


2. Middle Layers: Combine simple patterns into complex features (e.g., eyes, ears).
3. Final Layers: Combine these features to identify the object (e.g., "This is a cat").

For each layer of the artificial neural network (i.e., the dense layers), the following calculation takes place:

a = g(Wᵀx + b)

where,
x — is the input vector with dimensions [p_l, 1]
W — is the weight matrix with dimensions [p_l, n_l], where p_l is the number of neurons in the previous layer and n_l is the number of neurons in the current layer
b — is the bias vector with dimensions [n_l, 1]
g — is the activation function, which is usually ReLU
This calculation is repeated for each layer.
After passing through the fully connected layers, the final layer uses the softmax activation function (instead of ReLU) to obtain the probability of the input belonging to a particular class (classification). A minimal NumPy version is sketched below.
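A minimal NumPy sketch of this calculation, using the dimensions defined above (random values stand in for learned weights):

import numpy as np

p_l, n_l = 4, 3                      # neurons in the previous / current layer
x = np.random.randn(p_l, 1)          # input vector, [p_l, 1]
W = np.random.randn(p_l, n_l)        # weight matrix, [p_l, n_l]
b = np.random.randn(n_l, 1)          # bias vector, [n_l, 1]

def g(z):
    return np.maximum(0, z)          # g = ReLU

a = g(W.T @ x + b)                   # activation of the current layer
print(a.shape)                       # (3, 1), i.e., [n_l, 1]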


7.2 Image Classification


Reference: https://www.analyticsvidhya.com/blog/2020/02/learn-image-classification-cnn-convolutional-neural-networks-3-datasets/
What is Image Classification?

● Assigns a label or tag to an entire image based on training data of labeled images.
● Involves pixel-level analysis to determine the most suitable label.
● Enables valuable insights and informed decisions.

Data Labeling in Training

● Accurate labeling during the training phase is crucial to avoid discrepancies.


● Often employs publicly available datasets for reliable model training.

Types of Image Classification

1. Binary Classification
○ Labels images into two categories (e.g., benign/malignant tumors, defect/no
defect).
○ Answers yes/no questions.
2. Multiclass Classification
○ Categorizes items into three or more classes (e.g., sentiment analysis, disease
classification).
3. Multilabel Classification
○ Allows an image to have multiple labels (e.g., identifying all colors in a fruit salad
image).
4. Hierarchical Classification
○ Organizes classes into a hierarchical structure with broad and specific categories.
○ Example: Identifying fruit type (apple vs. grape) followed by subtypes
(Honeycrisp, Red Delicious).

Image Classification vs. Object Detection

● Image Classification: Assigns one label to the entire image.


● Object Localization: Identifies and locates specific objects in an image (uses bounding
boxes).
● Object Detection: Combines classification and localization to identify multiple objects
and their positions.

How Image Classification Works

1. Image Pre-processing
○ Improves image quality for analysis.
○ Techniques: resizing, cropping, normalization, noise reduction, data
augmentation.
2. Feature Extraction
○ Identifies visual patterns (e.g., edge detection, texture analysis, CNN-based
feature learning).
○ Essential for distinguishing between classes.
3. Object Classification
○ Uses machine learning algorithms to assign the image to a class based on
extracted features.

Deep Neural Networks in Image Classification

● Convolutional Neural Networks (CNNs)


○ Key layers: input layer, convolution layer, pooling layer, ReLU layer.
○ Extracts features directly from raw data.
○ Achieves state-of-the-art performance in image classification tasks.

CLASSIFICATION FOR READY-MADE DATA:

Here’s a step-by-step guide to build an image classification model using TensorFlow and
Keras:

Step 1: Install Required Libraries


pip install tensorflow matplotlib

Step 2: Import Necessary Libraries


import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

Step 3: Load and Preprocess Data

● Use a dataset like CIFAR-10, MNIST, or load your custom dataset.


● Normalize pixel values to range [0,1].

Example with CIFAR-10:


# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values


x_train, x_test = x_train / 255.0, x_test / 255.0

# Convert labels to one-hot encoding


y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

Step 4: Define the CNN Model


model = Sequential([
    # Convolutional layers
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),

    # Flatten and Dense layers
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax')  # 10 classes for CIFAR-10
])

Step 5: Compile the Model

● Choose an optimizer, loss function, and evaluation metrics.

model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
Step 6: Augment Data (Optional)

● Use ImageDataGenerator for real-time data augmentation.

datagen = ImageDataGenerator(
rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
horizontal_flip=True)

datagen.fit(x_train)

Step 7: Train the Model

● Train the model using the training data.

history = model.fit(datagen.flow(x_train, y_train, batch_size=64),
                    epochs=10,
                    validation_data=(x_test, y_test))

Step 8: Evaluate the Model

● Check the model’s performance on the test dataset.

test_loss, test_accuracy = model.evaluate(x_test, y_test)


print(f"Test Accuracy: {test_accuracy:.2f}")

Step 9: Visualize Training Results

● Plot training and validation accuracy/loss.

plt.plot(history.history['accuracy'], label='Training Accuracy')


plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

Step 10: Save and Load the Model

● Save the trained model for future use.

model.save('image_classification_model.h5')

● Load the model later.


loaded_model = tf.keras.models.load_model('image_classification_model.h5')

Step 11: Make Predictions

● Use the model to classify new images.

import numpy as np

# Load an image and preprocess


new_image = x_test[0] # Example: using a test image
new_image = np.expand_dims(new_image, axis=0) # Add batch dimension

# Predict
predictions = model.predict(new_image)
print(f"Predicted Class: {np.argmax(predictions)}")

Function reference (purpose, parameters, example):

● tf.keras.datasets.<dataset>.load_data()
Purpose: Loads pre-defined datasets like CIFAR-10, MNIST, etc.
Parameters: none.
Example: (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

● Sequential()
Purpose: Creates a sequential model by stacking layers linearly.
Parameters: none.
Example: model = Sequential([...])

● Conv2D(filters, kernel_size, activation, input_shape)
Purpose: Adds a convolutional layer to extract features from input data.
Parameters: filters (number of filters/kernels), kernel_size (size of the kernel, e.g., (3, 3)), activation (activation function, e.g., 'relu'), input_shape (shape of input, only for the first layer).
Example: Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3))

● MaxPooling2D(pool_size)
Purpose: Reduces spatial dimensions using pooling.
Parameters: pool_size (size of the pooling window, e.g., (2, 2)).
Example: MaxPooling2D((2, 2))

● Flatten()
Purpose: Flattens 2D/3D feature maps into a 1D vector.
Parameters: none.
Example: Flatten()

● Dense(units, activation)
Purpose: Fully connected layer for predictions or feature combination.
Parameters: units (number of neurons), activation (activation function, e.g., 'relu', 'softmax').
Example: Dense(128, activation='relu')

● Dropout(rate)
Purpose: Drops a fraction of neurons randomly during training.
Parameters: rate (fraction of neurons to drop, e.g., 0.5).
Example: Dropout(0.5)

● model.compile(optimizer, loss, metrics)
Purpose: Configures the model for training.
Parameters: optimizer (optimization algorithm, e.g., 'adam'), loss (loss function, e.g., 'categorical_crossentropy'), metrics (metrics for evaluation, e.g., 'accuracy').
Example: model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

● ImageDataGenerator()
Purpose: Applies real-time data augmentation for training.
Parameters: rotation_range (degrees of rotation), width_shift_range (fraction for horizontal shift), height_shift_range (fraction for vertical shift), horizontal_flip (boolean for flipping horizontally).
Example: ImageDataGenerator(rotation_range=15, width_shift_range=0.1, horizontal_flip=True)

● model.fit(x, y, batch_size, epochs, validation_data)
Purpose: Trains the model on the training data.
Parameters: x, y (training data and labels), batch_size (number of samples per batch), epochs (number of training cycles), validation_data (data for validation during training).
Example: model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=10, validation_data=(x_test, y_test))

● model.evaluate(x, y)
Purpose: Evaluates the model's performance on test data.
Parameters: x, y (test data and labels).
Example: test_loss, test_accuracy = model.evaluate(x_test, y_test)

● model.save(filepath)
Purpose: Saves the trained model to a file.
Parameters: filepath (path to save the model).
Example: model.save('model.h5')

● model.predict(x)
Purpose: Makes predictions on new/unseen data.
Parameters: x (input data for predictions).
Example: predictions = model.predict(new_image)

CLASSIFICATION FOR YOUR OWN DATA:

A step-by-step guide to building an image classification model from scratch using your own dataset:

Step 1: Prepare Your Dataset


Organize your images in a folder structure:
dataset/
train/
class1/
img1.jpg
img2.jpg
class2/
img3.jpg
img4.jpg
validation/
class1/
img5.jpg
class2/
img6.jpg

1. Data layout:
○ train/ is for training images.
○ validation/ is for validation images.
2. Ensure that the dataset has balanced classes for better performance.

Upload Your Dataset to Google Colab


1. Compress your dataset (organized in train/validation folders) into a .zip file.
2. Upload it to your Colab environment:
Run the following code to upload files interactively:

from google.colab import files


uploaded = files.upload()
After uploading, unzip the dataset:

import zipfile
import os

zip_path = '/content/your_dataset.zip'  # Replace with your uploaded file name
with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    zip_ref.extractall('/content/dataset')
dataset_path = '/content/dataset'

Step 2: Import Required Libraries


import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt

Step 3: Set Dataset Paths


train_dir = '/content/dataset/train'
validation_dir = '/content/dataset/validation'

Step 4: Data Preprocessing

Prepare the data generators for training and validation.

# Data augmentation for training


train_datagen = ImageDataGenerator(
rescale=1.0/255,
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest'
)

# Rescale validation data


validation_datagen = ImageDataGenerator(rescale=1.0/255)
# Create generators
train_generator = train_datagen.flow_from_directory(
train_dir,
target_size=(150, 150),
batch_size=32,
class_mode='categorical' # Use 'binary' for binary classification
)

validation_generator = validation_datagen.flow_from_directory(
validation_dir,
target_size=(150, 150),
batch_size=32,
class_mode='categorical'
)

Step 5: Define the CNN Model

Create a sequential CNN architecture.

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),

    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),

    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(train_generator.num_classes, activation='softmax')
])

Step 6: Compile the Model


model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy']
)
Step 7: Train the Model

Fit the model using the data generators.

history = model.fit(
train_generator,
steps_per_epoch=train_generator.samples // train_generator.batch_size,
epochs=10,
validation_data=validation_generator,
validation_steps=validation_generator.samples // validation_generator.batch_size
)

Step 8: Visualize Training Results

Plot the training and validation accuracy/loss.

# Plot accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Plot loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

Step 9: Save the Model

Save the trained model in Google Drive.

from google.colab import drive


drive.mount('/content/drive')

model.save('/content/drive/MyDrive/custom_image_classifier.h5')

Step 10: Test the Model

Test the model on new images uploaded to Colab.


Upload a new image:

from google.colab import files


uploaded = files.upload()
Load and preprocess the image:
from tensorflow.keras.preprocessing import image
import numpy as np

img_path = 'uploaded_image.jpg' # Replace with the uploaded file name


img = image.load_img(img_path, target_size=(150, 150))
img_array = image.img_to_array(img) / 255.0 # Normalize
img_array = np.expand_dims(img_array, axis=0) # Add batch dimension
Predict the class:
predictions = model.predict(img_array)
predicted_class = np.argmax(predictions)
print(f"Predicted Class: {predicted_class}")
7.3 Object Detection
Object detection is the computer vision process of identifying instances of various objects in images or videos. Object detection algorithms usually leverage machine learning or deep learning to produce meaningful results. Humans can locate and recognize objects in images or videos in a matter of seconds; object detection replicates this intelligence with a computer.

7.3.1 YOLO
Reference: https://medium.com/@shroffmegha6695/object-detection-with-deep-learning-beginners-friendly-key-terms-explanation-d4fb594fea83

1. YOLO's Purpose: Real-time object detection, combining classification and bounding box prediction in a single neural network.
2. Advantages:
○ Fast: detects objects at 45 frames per second.
○ Captures the entire image context in one look.
○ Generalizes well across different environments.
YOLO Architecture:
1. Components:
○ Backbone: Convolutional layers for feature extraction.
○ Neck: Fully connected layers for probabilities and bounding
box coordinates.
○ Head: Outputs predictions as a tensor.
2. Training Details:
○ Pre-trained backbone on ImageNet; uses additional layers for
detection.
○ Higher resolution (448x448) for finer detection.
○ Loss Function: Adjusts the weights of localization and classification errors using the coefficients λ_coord and λ_noobj.

Here is how YOLO (You Only Look Once) performs object detection, explained step by step in a simplified way using the example of detecting players and soccer balls:
1. Residual Blocks

● The image is divided into a grid of NxN cells (e.g., 4x4 in this
case).
● Each cell predicts:
○ If it contains an object.
○ The class of the object (e.g., "Player" or "Ball").
○ A confidence score for the prediction.
● Key Idea: Each grid cell "looks" only at the portion of the image it
covers to localize objects.

2. Bounding Box Regression

● YOLO predicts a bounding box for each object.
● A bounding box is represented by a vector Y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2], where:
○ p_c: probability that the grid cell contains an object (higher for cells covering actual objects).
○ b_x, b_y: coordinates of the box's center relative to the grid cell.
○ b_h, b_w: height and width of the bounding box.
○ c_1, c_2: class scores (e.g., "Player" or "Ball").
● This identifies the position and size of each object, as seen with the bounding box around the player.
3. Intersection Over Union (IOU)

● Multiple grid cells may predict bounding boxes for the same object.
● To avoid duplicates, YOLO calculates IOU, the overlap ratio between the predicted and actual boxes:

IOU = area of intersection / area of union

● Predictions with IOU below a threshold (e.g., 0.5) are discarded, retaining only relevant boxes. A minimal implementation is sketched below.
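A minimal IOU implementation (a sketch assuming boxes are given as corner coordinates):

def iou(box_a, box_b):
    # Boxes as (x1, y1, x2, y2) corners
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)           # intersection area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)                # intersection / union

print(iou((0, 0, 4, 4), (2, 2, 6, 6)))   # 4 / 28 ≈ 0.14 -> below a 0.5 threshold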

4. Non-Maximum Suppression (NMS)


● Even after IOU filtering, multiple bounding boxes may remain for
a single object.
● NMS helps by:
1. Selecting the box with the highest confidence score.
2. Discarding overlapping boxes with lower confidence.
● The final result is a clean detection of objects with minimal noise (see the sketch below).
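A greedy NMS sketch, reusing the iou() helper defined above (the boxes and scores are made-up example values):

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                  # highest-confidence box remaining
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]  # drop big overlaps
    return keep

boxes = [(0, 0, 4, 4), (0, 1, 4, 5), (10, 10, 14, 14)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))    # [0, 2] -- the overlapping box 1 is suppressed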

Summary of Process

1. The image is split into a grid.


2. Each grid predicts objects and bounding boxes.
3. IOU filters out irrelevant predictions.
4. NMS ensures only the best bounding boxes remain.

This efficient pipeline enables YOLO to detect multiple objects in real time, finding both players and soccer balls accurately while ignoring background noise.
Limitations:

1. Struggles with:
○ Detecting multiple objects in a single grid cell.
○ Groups of small objects or different classes in close
proximity.
2. Restricted to a limited number of bounding boxes and class
predictions per cell.

Step 1: Upload Kaggle API Key

1. Download your Kaggle API key (kaggle.json) from your Kaggle account:
○ Go to Account Settings.
○ Click "Create New API Token" to download the kaggle.json file.
2. Upload the kaggle.json file in Colab:
○ Run the cell below in Colab and click on the "Choose Files" button to upload
the file.

from google.colab import files


files.upload()

Step 2: Set Up the Environment


!pip install ultralytics

Step 3: YOLOv5 Object Detection Code

Here’s the main code for performing object detection on an image from the Kaggle dataset:

from ultralytics import YOLO


import cv2
import matplotlib.pyplot as plt
import os

# Load the YOLOv5 model
model = YOLO('yolov5s.pt')  # Use the small YOLOv5 model

# Path to the image; replace with the actual path to an image in the dataset
image_path = '/content/dataset/training_set/cats/cat.1.jpg'

# Check if the image exists


if not os.path.exists(image_path):
    print(f"Image not found at {image_path}")
else:
    # Load and preprocess the image
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

    # Perform object detection
    results = model.predict(source=image_rgb, conf=0.5)  # Confidence threshold set to 0.5

    # Visualize the results
    plt.figure(figsize=(10, 10))
    plt.imshow(cv2.cvtColor(results[0].plot(), cv2.COLOR_BGR2RGB))
    plt.axis('off')
    plt.show()
THE SINGLE SHOT DETECTOR (SSD)
Reference: https://jonathan-hui.medium.com/ssd-object-detection-single-shot-multibox-detector-for-real-time-processing-9bd8deac0e06

Key Features of SSD


● Single Shot: It directly predicts the presence of objects and their bounding box coordinates in a single forward pass, making it faster and more efficient.
● MultiBox: SSD uses a set of default bounding boxes (anchor
boxes) of different scales and aspect ratios at multiple locations in
the input image. SSD predicts adjustments to these default boxes to
locate objects accurately.
● Multi-Scale Detection: SSD operates on multiple feature maps
with different resolutions. Predictions are made at different scales
to capture objects at varying levels of granularity.

The architecture consists of two main components:


● Backbone Network: The backbone network comprises
convolutional layers without the fully connected classification
layers. It extracts rich features from the input image.
● Custom Convolutional Layers:
a. Multi-Scale Feature Maps for Detection: After the backbone
network, a series of convolutional layers are stacked on top.
These custom layers progressively decrease in size and
enable predictions at multiple scales.
b. Convolutional Predictors for Detection: For each added layer, a 3x3 kernel is applied in a convolutional manner to produce class probabilities and adjustments to the default boxes; these predictors are placed in each feature-map cell (see the sketch below).
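As a rough Keras sketch of such a predictor (num_anchors and num_classes are illustrative assumptions, not values from the text), each feature map gets two 3x3 convolutional heads, one for class scores and one for box adjustments:

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

num_anchors = 4        # default boxes per feature-map cell (assumed)
num_classes = 21       # e.g., 20 object classes + background (assumed)

# Per cell, predict num_classes scores and 4 box offsets for each anchor
cls_head = Conv2D(num_anchors * num_classes, (3, 3), padding='same')
loc_head = Conv2D(num_anchors * 4, (3, 3), padding='same')

feature_map = tf.random.normal((1, 10, 10, 256))   # one multi-scale feature map
print(cls_head(feature_map).shape)                 # (1, 10, 10, 84)
print(loc_head(feature_map).shape)                 # (1, 10, 10, 16)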
Step-by-Step Explanation of SSD (Single Shot MultiBox Detector)
SSD is an object detection algorithm that detects objects in images and provides their
classes and bounding boxes. Below is a step-by-step explanation of how SSD works,
along with an example for implementation.

1. Input Image Preprocessing


Explanation:
● Resize the input image to a fixed size (e.g., 300x300 or 320x320, depending on the
SSD model).
● Normalize pixel values (if required) and convert the image into a tensor with a
batch dimension.

2. Feature Extraction
Explanation:
● SSD uses a backbone network (e.g., VGG16, MobileNet, or ResNet) to extract
feature maps from the input image.
● These feature maps capture spatial and semantic information for object detection.
Behind the Scenes:
The backbone extracts features at multiple scales to detect objects of various sizes.

3. Anchor Boxes (Important)
Detailed Explanation of Anchor Boxes in SSD (Single Shot MultiBox Detector)
What Are Anchor Boxes?
● Anchor Boxes: Predefined boxes that serve as references for the object detection
process.
○ These boxes are used to propose potential locations for objects in the
image.
○ They come in different shapes and sizes to cover a variety of object shapes
and scales.
Role of Anchor Boxes in SSD
● SSD uses anchor boxes as initial estimates for the location and size of the object.
● The network adjusts these anchor boxes to better fit the detected objects in the
image.

How Anchor Boxes Work in SSD


1. Anchor Box Placement:
○ Anchor boxes are defined at each location (cell) in the feature map grid.
○ These predefined boxes are placed at different aspect ratios and scales to
match objects of various shapes.

2. Adjustment of Anchor Boxes:


○ The SSD network predicts how much each anchor box should be modified:
■ Translation: Shifting the box position.
■ Scaling: Changing the size of the box.
○ The network adjusts the anchor boxes by computing offsets that modify
their position and size to best fit the object in the image.
3. Multiple Anchor Boxes per Cell:
○ Each grid cell in the feature map can have multiple anchor boxes
associated with it.
○ These anchor boxes can have different aspect ratios and sizes.
○ For example, a cell could have 4 or more anchor boxes (depending on the
configuration).
Anchor Boxes in SSD Layers
● Different Layers, Different Anchor Box Sizes:
○ Anchor boxes in SSD are tied to different feature map layers with varying
resolutions:
■ High-resolution layers: Detect smaller objects (fine-grained boxes).
■ Low-resolution layers: Detect larger objects (coarse-grained
boxes).
○ As the feature map resolution decreases through successive convolutional
layers, anchor boxes are able to detect larger objects.
Adjusting the Anchor Boxes
● Anchor Box Predictions:
○ The network predicts the offsets for the anchor boxes:
■ Center coordinates (x, y).
■ Width and height of the bounding box.
○ The predicted box is compared with the ground-truth box to calculate the loss (based on the Jaccard index, i.e., IOU).
○ Confidence Scores: Each anchor box is also assigned a confidence score indicating the likelihood of an object being present within that box.
● Matching Anchor Boxes:
○ During training, anchor boxes are matched with ground-truth boxes based on an Intersection over Union (IOU) threshold:
■ IOU ≥ 0.5: The anchor box is considered a match.
■ IOU < 0.5: The anchor box is ignored.
○ This helps the model learn which anchor boxes should be adjusted for each object. A decoding sketch follows this list.
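The adjustment step can be sketched as follows; this follows a common SSD-style encoding (center offsets scaled by anchor size, log-scaled width and height) and omits the variance scaling some implementations add:

import numpy as np

def decode_box(anchor, offsets):
    # anchor and result as (cx, cy, w, h); offsets = (tx, ty, tw, th)
    acx, acy, aw, ah = anchor
    tx, ty, tw, th = offsets
    cx = acx + tx * aw             # translation: shift the center
    cy = acy + ty * ah
    w = aw * np.exp(tw)            # scaling: resize width and height
    h = ah * np.exp(th)
    return cx, cy, w, h

anchor = (0.5, 0.5, 0.2, 0.2)          # a default box (normalized coordinates)
offsets = (0.1, -0.05, 0.2, 0.0)       # network-predicted adjustments
print(decode_box(anchor, offsets))     # the adjusted box fitting the object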
Example of Anchor Boxes in SSD
1. Default Anchors:
○ SSD uses a set of predefined aspect ratios and scales for anchor boxes.
○ Typically, anchor boxes include:
■ Square boxes (aspect ratio 1:1).
■ Long boxes (aspect ratios like 2:1, 1:2).
○ These anchor boxes help the network detect objects of varying shapes and
sizes.
2. Anchor Box Variations:
○ The network may generate predictions based on multiple anchor boxes per
cell.
○ For instance, each feature map location may have 4 or more anchor boxes
with different aspect ratios:
■ 4 Anchor Boxes per Grid Cell: For example, aspect ratios such as
1:1, 2:1, 1:2, and 1:3.
○ After adjusting the anchor boxes, SSD predicts both the bounding box
(location, width, height) and the object class for each anchor box.

4. Predictions for Each Anchor Box:


For each anchor box, SSD predicts:
1. Class Scores: The probability of each class for the object in the box.
2. Bounding Box Offsets: Adjustments to the default anchor box to better fit the
detected object.

5. Non-Maximum Suppression (NMS):


● Post-processing is applied to filter out overlapping and low-confidence detections.
● NMS ensures that only the best bounding box for each detected object remains.

6. Output
● SSD outputs a set of bounding boxes, their associated class labels, and confidence
scores.

import cv2
import numpy as np
import tensorflow as tf

# Load a TensorFlow SSD model from the TF2 Detection Model Zoo
model_path = tf.keras.utils.get_file(
    'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8',  # matches the extracted folder name
    'http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz',
    untar=True
)
saved_model_dir = f"{model_path}/saved_model"
detect_fn = tf.saved_model.load(saved_model_dir)

# Load and preprocess the image


image_path = "input_image.jpg"
image = cv2.imread(image_path)
height, width, _ = image.shape

# Resize image for SSD input


input_tensor = cv2.resize(image, (320, 320))
input_tensor = np.expand_dims(input_tensor, axis=0)
input_tensor = tf.convert_to_tensor(input_tensor, dtype=tf.uint8)

# Perform inference
detections = detect_fn(input_tensor)

# Extract detection results


boxes = detections['detection_boxes'][0].numpy()
classes = detections['detection_classes'][0].numpy().astype(np.int32)
scores = detections['detection_scores'][0].numpy()

# Draw detections on the image


for i in range(len(scores)):
if scores[i] > 0.5: # Confidence threshold
box = boxes[i]
y1, x1, y2, x2 = box
x1, x2 = int(x1 * width), int(x2 * width)
y1, y2 = int(y1 * height), int(y2 * height)
# Draw bounding box and label
cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
label = f"Class {classes[i]}: {scores[i]:.2f}"
cv2.putText(image, label, (x1, y1 - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
(255, 0, 0), 2)

# Display the image with detections


cv2.imshow("Detections", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Self Practice:
Here is the complete code to build a male-female classification CNN model on Google Colab.
Follow these steps:

Step 1: Set Up Google Colab

1. Open Google Colab.


2. Create a new notebook and follow these steps.

Step 2: Import Libraries


import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import zipfile
import os

Step 3: Upload Dataset

1. Organize your dataset:


○ train/male/ for male images.
○ train/female/ for female images.
○ test/male/ and test/female/ for testing images.
2. Compress your dataset into a .zip file and upload it to Colab.
3. Run this code to extract the dataset:

# Upload the dataset file (e.g., dataset.zip)


from google.colab import files
uploaded = files.upload()

# Extract the dataset


with zipfile.ZipFile("dataset.zip", 'r') as zip_ref:
    zip_ref.extractall("/content/dataset")

# Verify the extracted folders


os.listdir("/content/dataset")
Step 4: Data Preparation

Prepare your training and testing data using ImageDataGenerator:

# Data augmentation for training


train_datagen = ImageDataGenerator(
rescale=1.0/255,
rotation_range=20,
width_shift_range=0.2,
height_shift_range=0.2,
shear_range=0.2,
zoom_range=0.2,
horizontal_flip=True,
fill_mode='nearest'
)

# Training data generator


train_generator = train_datagen.flow_from_directory(
'/content/dataset/train', # Path to training data
target_size=(128, 128), # Resize images to 128x128
batch_size=32,
class_mode='binary' # Binary classification
)

# Test data preparation


test_datagen = ImageDataGenerator(rescale=1.0/255)

test_generator = test_datagen.flow_from_directory(
'/content/dataset/test', # Path to testing data
target_size=(128, 128),
batch_size=32,
class_mode='binary'
)

Step 5: Build the CNN Model


model = Sequential([
    # Convolutional Layer 1
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D(pool_size=(2, 2)),

    # Convolutional Layer 2
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Convolutional Layer 3
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),

    # Flattening
    Flatten(),

    # Fully Connected Layer
    Dense(128, activation='relu'),
    Dropout(0.5),  # Dropout to prevent overfitting

    # Output Layer
    Dense(1, activation='sigmoid')  # Binary classification
])

Step 6: Compile and Train the Model


# Compile the model
model.compile(
optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy']
)

# Train the model


history = model.fit(
train_generator,
epochs=10, # Adjust based on your needs
validation_data=test_generator
)

Step 7: Evaluate the Model


# Evaluate the model on the test set
loss, accuracy = model.evaluate(test_generator)
print(f"Test Accuracy: {accuracy * 100:.2f}%")

Step 8: Save and Download the Model


# Save the model
model.save('male_female_classifier.h5')

# Download the model


from google.colab import files
files.download('male_female_classifier.h5')

Step 9: Predict with New Images

To make predictions on new images, upload the image to Colab and run this code:

from tensorflow.keras.preprocessing import image


import numpy as np

# Path to the uploaded image


img_path = '/content/new_image.jpg' # Replace with your image path

# Load and preprocess the image


img = image.load_img(img_path, target_size=(128, 128))
img_array = image.img_to_array(img)
img_array = np.expand_dims(img_array, axis=0) / 255.0

# Predict
prediction = model.predict(img_array)

# Output prediction
# Note: flow_from_directory assigns class indices alphabetically, so check
# train_generator.class_indices -- typically {'female': 0, 'male': 1}, which
# makes a sigmoid output > 0.5 correspond to "male"
if prediction[0][0] > 0.5:
    print("Prediction: Male")
else:
    print("Prediction: Female")
