CV_T3_ Unit-7
1. Input Layer
● Holds the raw pixel values of the input image (e.g., height × width × channels).
2. Convolutional Layer
● How it works:
○ Kernels slide over the image, performing element-wise multiplications and
summing the results (convolution operation).
○ The result is a feature map that highlights specific patterns.
● Importance: Learns spatial features, preserving the relationship between pixels.
● Example: Detects edges or corners in an image of a face.
In the figure above, the stride is 2: the filter convolves over the image, moving 2 pixels at a time both horizontally and vertically.
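For example, a minimal Keras sketch showing how a stride of 2 halves the spatial dimensions (the shapes are illustrative):

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

x = tf.random.normal((1, 28, 28, 1))                 # one 28x28 grayscale image
y = Conv2D(8, (3, 3), strides=2, padding='same')(x)  # 3x3 filters, stride 2
print(y.shape)  # (1, 14, 14, 8): 8 feature maps at half the resolution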
4. Pooling Layer
● Purpose: Reduces the spatial size of the feature maps while retaining important
information.
● Types:
○ Max Pooling: Keeps the maximum value in a region.
○ Average Pooling: Computes the average value in a region.
● Importance:
○ Reduces computational cost.
○ Prevents overfitting by generalizing feature detection.
○ Makes the network invariant to small translations (e.g., shifting an object
slightly in an image).
○ Pooling downsamples the image in its height and width but the number of
channels(depth) stays the same.
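A quick sketch showing that pooling halves the height and width while leaving the channel depth unchanged (the shapes are illustrative):

import tensorflow as tf
from tensorflow.keras.layers import MaxPooling2D

x = tf.random.normal((1, 28, 28, 16))  # 16-channel feature maps
y = MaxPooling2D(pool_size=(2, 2))(x)  # keeps the max of each 2x2 region
print(y.shape)  # (1, 14, 14, 16): downsampled, depth unchanged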
5. Flattening Layer
● Purpose: Converts the 2D or 3D feature maps into a 1D vector.
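For example (the shapes are illustrative):

import tensorflow as tf
from tensorflow.keras.layers import Flatten

x = tf.random.normal((1, 7, 7, 64))  # a 7x7x64 feature volume
y = Flatten()(x)                     # 7 * 7 * 64 = 3136 values per sample
print(y.shape)  # (1, 3136)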
7. Dropout Layer
● Purpose: Randomly deactivates a fraction of neurons during training, which reduces overfitting by preventing the network from relying too heavily on any single neuron.
For each layer of the fully connected (DENSE) part of the Artificial Neural Network, the following calculation takes place:

a = g(Wᵀx + b)

where,
x — is the input vector with dimension [p_l, 1]
W — is the weight matrix with dimensions [p_l, n_l], where p_l is the number of neurons in the previous layer and n_l is the number of neurons in the current layer.
b — is the bias vector with dimension [n_l, 1]
g — is the activation function, which is usually ReLU.
This calculation is repeated for each layer.
After passing through the fully connected layers, the final layer uses the softmax activation function (instead of ReLU) to produce the probability of the input belonging to each class (classification).
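A minimal NumPy sketch of this per-layer computation, with a softmax output layer (the layer sizes and class count are illustrative):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

p_l, n_l = 4, 3                  # neurons in previous / current layer
x = np.random.randn(p_l, 1)      # input vector, dimension [p_l, 1]
W = np.random.randn(p_l, n_l)    # weight matrix, dimensions [p_l, n_l]
b = np.random.randn(n_l, 1)      # bias vector, dimension [n_l, 1]

a = relu(W.T @ x + b)            # a = g(W^T x + b), shape [n_l, 1]

# The final layer swaps ReLU for softmax to obtain class probabilities
W_out = np.random.randn(n_l, 2)  # 2 output classes (illustrative)
b_out = np.random.randn(2, 1)
probs = softmax(W_out.T @ a + b_out)
print(probs.sum())               # 1.0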
Summary:
● Assigns a label or tag to an entire image based on training data of labeled images.
● Involves pixel-level analysis to determine the most suitable label.
● Enables valuable insights and informed decisions.
1. Binary Classification
○ Labels images into two categories (e.g., benign/malignant tumors, defect/no
defect).
○ Answers yes/no questions.
2. Multiclass Classification
○ Categorizes items into three or more classes (e.g., sentiment analysis, disease
classification).
3. Multilabel Classification
○ Allows an image to have multiple labels (e.g., identifying all colors in a fruit salad
image).
4. Hierarchical Classification
○ Organizes classes into a hierarchical structure with broad and specific categories.
○ Example: Identifying fruit type (apple vs. grape) followed by subtypes
(Honeycrisp, Red Delicious).
1. Image Pre-processing
○ Improves image quality for analysis.
○ Techniques: resizing, cropping, normalization, noise reduction, data augmentation (see the sketch after this list).
2. Feature Extraction
○ Identifies visual patterns (e.g., edge detection, texture analysis, CNN-based
feature learning).
○ Essential for distinguishing between classes.
3. Object Classification
○ Uses machine learning algorithms to assign the image to a class based on
extracted features.
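A minimal sketch of the pre-processing techniques from step 1 (the file name and target size are illustrative):

import tensorflow as tf

# Load, resize, and normalize an image
img = tf.io.decode_jpeg(tf.io.read_file('example.jpg'), channels=3)
img = tf.image.resize(img, (224, 224))      # resizing
img = img / 255.0                           # normalization to [0, 1]
img = tf.image.random_flip_left_right(img)  # simple data augmentation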
Here’s a step-by-step guide to build an image classification model using TensorFlow and
Keras:
Step 5: Compile the Model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
Step 6: Augment Data (Optional)
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True)
datagen.fit(x_train)
model.save('image_classification_model.h5')
import numpy as np

# Predict
predictions = model.predict(new_image)
print(f"Predicted Class: {np.argmax(predictions)}")
Key Keras functions used above, with their parameters and example calls:

● model.compile(optimizer, loss, metrics): Configures the model for training.
○ optimizer: Optimization algorithm (e.g., 'adam').
○ loss: Loss function (e.g., 'categorical_crossentropy').
○ metrics: Metrics for evaluation (e.g., 'accuracy').
○ Example: model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
● ImageDataGenerator(): Applies real-time data augmentation for training.
○ rotation_range: Degree of rotation.
○ width_shift_range: Fraction for horizontal shift.
○ height_shift_range: Fraction for vertical shift.
○ horizontal_flip: Boolean for flipping horizontally.
○ Example: ImageDataGenerator(rotation_range=15, width_shift_range=0.1, horizontal_flip=True)
● model.fit(x, y, batch_size, epochs, validation_data): Trains the model on the training data.
○ x, y: Training data and labels.
○ batch_size: Number of samples per batch.
○ epochs: Number of training cycles.
○ validation_data: Data for validation during training.
○ Example: model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=10, validation_data=(x_test, y_test))
● model.evaluate(x, y): Evaluates the model's performance on test data.
○ x, y: Test data and labels.
○ Example: test_loss, test_accuracy = model.evaluate(x_test, y_test)
● model.save(filepath): Saves the trained model to a file.
○ filepath: Path to save the model.
○ Example: model.save('model.h5')
● model.predict(x): Makes predictions on new/unseen data.
○ x: Input data for predictions.
○ Example: predictions = model.predict(new_image)
Here's a step-by-step guide to building an image classification model from scratch using your own dataset:
1. Data: Organize your dataset into two directories, with one subfolder per class:
○ train/ is for training images.
○ validation/ is for validation images.
2. Ensure that the dataset has balanced classes for better performance.
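The snippets below reference train_generator and validation_datagen; a minimal sketch defining them (the imports are standard, and the directory paths are assumptions):

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

train_dir = 'dataset/train'            # assumed path to training images
validation_dir = 'dataset/validation'  # assumed path to validation images

train_datagen = ImageDataGenerator(rescale=1./255)
validation_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical'
)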
validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical'
)
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation='relu'),
    Dropout(0.5),
    Dense(train_generator.num_classes, activation='softmax')
])
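The model must be compiled before calling fit; a minimal sketch consistent with the softmax output above:

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])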
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=10,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size
)
import matplotlib.pyplot as plt

# Plot accuracy
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
# Plot loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()
model.save('/content/drive/MyDrive/custom_image_classifier.h5')
7.3.1 YOLO
Reference: https://medium.com/@shroffmegha6695/object-detection-with-deep-learning-beginners-friendly-key-terms-explanation-d4fb594fea83
Here is a simplified, step-by-step explanation of how YOLO (You Only Look Once) performs object detection, using the example of detecting players and soccer balls:
1. Residual Blocks
● The image is divided into a grid of NxN cells (e.g., 4x4 in this
case).
● Each cell predicts:
○ If it contains an object.
○ The class of the object (e.g., "Player" or "Ball").
○ A confidence score for the prediction.
● Key Idea: Each grid cell "looks" only at the portion of the image it
covers to localize objects.
● Multiple grid cells may predict bounding boxes for the same object.
● To avoid duplicates, YOLO calculates IOU (Intersection Over Union), the overlap ratio between the predicted and actual boxes:

IOU = Area of Overlap / Area of Union

● Predictions with IOU below a threshold (e.g., 0.5) are discarded, retaining only the relevant boxes.
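A minimal sketch of the IOU computation for axis-aligned boxes (the (x1, y1, x2, y2) format and function name are assumptions):

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2), with (x1, y1) the top-left corner
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    overlap = max(0, x2 - x1) * max(0, y2 - y1)  # zero if no intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return overlap / (area_a + area_b - overlap)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143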
Summary of Process
This efficient pipeline enables YOLO to detect multiple objects in real
time, as shown by the transformation from Image (A) to Image (B). It
explains how YOLO can detect both players and soccer balls accurately
while ignoring background noise.
Limitations:
1. Struggles with:
○ Detecting multiple objects in a single grid cell.
○ Groups of small objects or different classes in close
proximity.
2. Restricted to a limited number of bounding boxes and class
predictions per cell.
1. Download your Kaggle API key (kaggle.json) from your Kaggle account:
○ Go to Account Settings.
○ Click "Create New API Token" to download the kaggle.json file.
2. Upload the kaggle.json file in Colab:
○ Run the cell below in Colab and click on the "Choose Files" button to upload
the file.
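A minimal sketch of such a cell (standard Colab commands; the file is the kaggle.json downloaded above):

from google.colab import files

# Upload kaggle.json via the "Choose Files" dialog
files.upload()

# Move the key to where the Kaggle CLI expects it and restrict permissions
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json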
Here’s the main code for performing object detection on an image from the Kaggle dataset:
7.3.2 SSD (Single Shot MultiBox Detector)
2. Feature Extraction
Explanation:
● SSD uses a backbone network (e.g., VGG16, MobileNet, or ResNet) to extract
feature maps from the input image.
● These feature maps capture spatial and semantic information for object detection.
Behind the Scenes:
The backbone extracts features at multiple scales to detect objects of various sizes.
3. Anchor Boxes (Important)
Detailed Explanation of Anchor Boxes in SSD (Single Shot MultiBox Detector)
What Are Anchor Boxes?
● Anchor Boxes: Predefined boxes that serve as references for the object detection
process.
○ These boxes are used to propose potential locations for objects in the
image.
○ They come in different shapes and sizes to cover a variety of object shapes
and scales.
Role of Anchor Boxes in SSD
● SSD uses anchor boxes as initial estimates for the location and size of the object.
● The network adjusts these anchor boxes to better fit the detected objects in the
image.
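A minimal NumPy sketch of generating anchor boxes for one feature-map cell (the scales and aspect ratios are illustrative, not SSD's exact defaults):

import numpy as np

def anchors_for_cell(cx, cy, scales=(0.2, 0.4), aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) anchors centred on a feature-map cell."""
    boxes = []
    for s in scales:
        for ar in aspect_ratios:
            w = s * np.sqrt(ar)  # wider boxes for ar > 1
            h = s / np.sqrt(ar)  # taller boxes for ar < 1
            boxes.append((cx, cy, w, h))
    return np.array(boxes)

# Six anchors (2 scales x 3 aspect ratios) for the cell at (0.5, 0.5)
print(anchors_for_cell(0.5, 0.5).shape)  # (6, 4)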
6. Output
● SSD outputs a set of bounding boxes, their associated class labels, and confidence
scores.
import cv2
import numpy as np
import tensorflow as tf

# Load the exported detection model (the path is illustrative)
detect_fn = tf.saved_model.load('saved_model')

# Read the image and add a batch dimension
input_tensor = tf.convert_to_tensor(cv2.imread('test.jpg'))[tf.newaxis, ...]

# Perform inference
detections = detect_fn(input_tensor)
test_generator = test_datagen.flow_from_directory(
    '/content/dataset/test',  # Path to testing data
    target_size=(128, 128),
    batch_size=32,
    class_mode='binary'
)
model = Sequential([
    # Convolutional Layer 1 (assumed; the original snippet begins at Layer 2)
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D(pool_size=(2, 2)),
    # Convolutional Layer 2
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    # Convolutional Layer 3
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    # Flattening
    Flatten(),
    # Output Layer
    Dense(1, activation='sigmoid')  # Binary classification
])
To make predictions on new images, upload the image to Colab and run this code:
from tensorflow.keras.preprocessing import image
import numpy as np

# Load and preprocess the uploaded image (the file name is illustrative;
# assumes images were rescaled by 1/255 during training)
img = image.load_img('test_image.jpg', target_size=(128, 128))
img_array = image.img_to_array(img) / 255.0
img_array = np.expand_dims(img_array, axis=0)  # add the batch dimension

# Predict
prediction = model.predict(img_array)

# Output prediction (the 0/1 class mapping follows test_generator.class_indices)
if prediction[0][0] > 0.5:
    print("Prediction: Female")
else:
    print("Prediction: Male")