
Reg. No: 210097
Name: Sehrish Rafique
Submitted To: Dr. Ehatisham
Date: December 20th, 2021

Implementing Bag of Visual words for Object Recognition


Task Scenario
Given an input image containing a cup, saucer, bottle, etc., the task is to recognize which
objects are contained in the image.
The approach follows four simple steps:
- Determine the image features for each labelled image
- Construct a visual vocabulary by clustering the features, followed by frequency analysis
- Classify images based on the generated vocabulary
- Obtain the most likely class for a query image

Dataset
The dataset contains images of four types of objects:

- Soccer Ball
- Accordion
- Dollar Bill
- Motorbike

Implementation
Let’s begin with a few introductory concepts required for the Bag of Visual Words model. We shall cover four parts:
- Clustering
- Bag of Visual Words Model
- Generating Vocabulary
- Training and testing

We begin with k-means clustering. Suppose there are n objects that are to be divided into K
clusters. The input is a set of feature vectors X = {x1, x2, ..., xn}. The goal is to assign each
point in the scatter cloud to a cluster so that the total distance between each point and its
assigned centroid is minimized.
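Formally, with centroids μ1, ..., μK and cluster assignments S = {S1, ..., SK}, the k-means objective is the within-cluster sum of squared distances:
```
\underset{S}{\arg\min} \; \sum_{k=1}^{K} \sum_{x_i \in S_k} \lVert x_i - \mu_k \rVert^2
```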
Code Snippet
```
import numpy as np
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# create a sample dataset to test KMeans, using make_blobs
n_samples = 1000
n_features = 5
n_clusters = 3

# X: generated samples, shape (n_samples, n_features)
# y: integer labels for the true cluster membership of each sample
X, y = make_blobs(n_samples=n_samples, n_features=n_features, centers=n_clusters)

# perform KMeans clustering; ret holds the predicted cluster of each sample
ret = KMeans(n_clusters=n_clusters).fit_predict(X)
print(ret)

# plot the first two feature dimensions, before and after clustering
_, ax = plt.subplots(2)
ax[0].scatter(X[:, 0], X[:, 1])
ax[0].set_title("Initial Scatter Distribution")
ax[1].scatter(X[:, 0], X[:, 1], c=ret)
ax[1].set_title("Colored Partition denoting Clusters")
plt.show()
```
Every image has certain discernible features, patterns from which humans decide what the
perceived object is.

Output: two scatter plots, "Initial Scatter Distribution" vs. "Colored Partition denoting Clusters" (figure omitted).


Bag of Visual Words
This is a supervised learning model: there is a training set and a testing set.

It has the following modules:

- Bag.py → Contains the main functions.
- Helpers.py → Contains the helper classes: ImageHelpers (colour-scheme conversion and feature detection), FileHelper, and BOVHelpers.

FileHelper returns a dictionary pairing each object name with a list of all of its images, i.e.
key = object_name and value = list of images, along with the total image count.

```
import cv2
from glob import glob

class FileHelper:
    def getFiles(self, path):
        """
        - returns a dictionary of all files,
          having key => value as objectname => list of images
        - returns total number of files
        """
        imlist = {}
        count = 0
        for each in glob(path + "*"):
            word = each.split("/")[-1]
            print(" #### Reading image category ", word, " ##### ")
            imlist[word] = []
            for imagefile in glob(path + word + "/*"):
                print("Reading file ", imagefile)
                # read each image in grayscale
                im = cv2.imread(imagefile, 0)
                imlist[word].append(im)
                count += 1
        return [imlist, count]
```
The primary function of ImageHelpers is to provide the SIFT features present in an image.
We require these image features to develop our vocabulary.

```
import cv2

class ImageHelpers:
    def __init__(self):
        # on OpenCV >= 4.4, SIFT is also available in the main module as
        # cv2.SIFT_create(); older builds use the contrib module as below
        self.sift_object = cv2.xfeatures2d.SIFT_create()

    def gray(self, image):
        # convert a BGR image to grayscale
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        return gray

    def features(self, image):
        # detect SIFT keypoints and compute their 128-dimensional descriptors
        keypoints, descriptors = self.sift_object.detectAndCompute(image, None)
        return [keypoints, descriptors]
```
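A minimal usage sketch (the file name sample.jpg is a hypothetical placeholder, not part of the assignment code):

```
import cv2

im_helper = ImageHelpers()
# read a test image (hypothetical path) and extract its SIFT features
image = cv2.imread("sample.jpg")
gray = im_helper.gray(image)
kp, des = im_helper.features(gray)
print(len(kp), "keypoints; descriptor array shape:", des.shape)
```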

How to develop the visual vocabulary?

A visual word is simply anything that can be used to describe an image. Thus, an image becomes a
combination of visual words (which are essentially features).

We define this structure as a histogram. Essentially, a histogram is just a measure of the frequency
of occurrence of a particular item. Here, in our case, we describe each image as a histogram of
features: how often each word of the total vocabulary occurs in the image the computer is looking at.
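As a sketch of this idea, given hypothetical cluster assignments for the features of one image, the histogram is just a count of how often each visual word index occurs:

```
import numpy as np

n_clusters = 5  # vocabulary size, assumed for illustration
# hypothetical visual-word index assigned to each feature of one image
assignments = np.array([0, 2, 2, 4, 1, 2, 0])
# histogram: frequency of each visual word in the image
histogram = np.bincount(assignments, minlength=n_clusters)
print(histogram)  # -> [2 1 3 0 1]
```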

Linking vocabulary and clustering:

Using SIFT, we detect and compute features inside each image. SIFT returns an m×128-dimensional
array, where m is the number of features extracted. Similarly, for multiple images, say 1000 images,
we obtain a list

feature_0, feature_1, ..., feature_N

where each feature_i is an array of dimension m×128 (m varies from image to image).
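Before clustering, these per-image arrays must be stacked into a single (total features)×128 array. The trainModel method below calls a helper named formatND for this; its body is not shown in this report, but a minimal sketch of what it presumably does is:

```
import numpy as np

def formatND(descriptor_list):
    # vertically stack the per-image (m_i x 128) descriptor arrays
    # into one (sum of m_i) x 128 array, ready for k-means
    return np.vstack(descriptor_list)
```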

Developing Vocabulary

Each cluster denotes a particular visual word. Every image can be represented as a combination of
multiple visual words. The best method is to generate a histogram that contains the frequency of
occurrence of each visual word. Thus the vocabulary comprises a set of histograms encompassing
the descriptions of all images.
```

def developVocabulary(self, n_images, descriptor_list, kmeans_ret=None):
    # one histogram row per image, one column per visual word (cluster)
    self.mega_histogram = np.array(
        [np.zeros(self.n_clusters) for i in range(n_images)])
    old_count = 0
    for i in range(n_images):
        l = len(descriptor_list[i])
        for j in range(l):
            # look up the cluster assigned to the j-th feature of image i
            if kmeans_ret is None:
                idx = self.kmeans_ret[old_count + j]
            else:
                idx = kmeans_ret[old_count + j]
            self.mega_histogram[i][idx] += 1
        old_count += l
    print("Vocabulary Histogram Generated")

```

As seen, the inputs are n_images, i.e. the total number of images, and descriptor_list, which
contains the feature descriptor arrays (the full stacked-up list of features discussed above). Our
histogram is therefore of size n_images×n_clusters, thereby defining each image in terms of the
generated vocabulary.

Training the machine to understand the images using SVM

The following method drives the training of the entire bag of visual words model.

```

def trainModel(self):
    # read files and prepare the file lists
    self.images, self.trainImageCount = self.file_helper.getFiles(self.train_path)

    # extract SIFT features from each image
    label_count = 0
    for word, imlist in self.images.items():
        self.name_dict[str(label_count)] = word
        print("Computing Features for ", word)
        for im in imlist:
            # cv2.imshow("im", im)
            # cv2.waitKey()
            self.train_labels = np.append(self.train_labels, label_count)
            kp, des = self.im_helper.features(im)
            self.descriptor_list.append(des)
        label_count += 1

    # stack all descriptors and perform clustering
    bov_descriptor_stack = self.bov_helper.formatND(self.descriptor_list)
    self.bov_helper.cluster()
    self.bov_helper.developVocabulary(n_images=self.trainImageCount,
                                      descriptor_list=self.descriptor_list)

    # show vocabulary trained
    # self.bov_helper.plotHist()

    # standardize the histograms and train the SVM classifier
    self.bov_helper.standardize()
    self.bov_helper.train(self.train_labels)

```
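The cluster, standardize, and train helpers of BOVHelpers are not reproduced in this report. Since recognize() below relies on self.bov_helper.kmeans_obj.predict, self.bov_helper.scale.transform, and self.bov_helper.clf.predict, and the model uses k-means plus an SVM, a plausible minimal sketch of these helpers is as follows (attribute names such as descriptor_vstack are assumptions, not taken from the original code):

```
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cluster(self):
    # run k-means on the stacked descriptor array to learn the visual words
    self.kmeans_obj = KMeans(n_clusters=self.n_clusters)
    self.kmeans_ret = self.kmeans_obj.fit_predict(self.descriptor_vstack)

def standardize(self):
    # scale each visual-word column to zero mean and unit variance
    self.scale = StandardScaler().fit(self.mega_histogram)
    self.mega_histogram = self.scale.transform(self.mega_histogram)

def train(self, train_labels):
    # fit an SVM on the standardized vocabulary histograms
    self.clf = SVC()
    self.clf.fit(self.mega_histogram, train_labels)
    print("Training completed")
```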

The following method recognizes a single image; it can also be used on its own.

```

def recognize(self, test_img, test_image_path=None):
    # extract SIFT features from the test image
    kp, des = self.im_helper.features(test_img)
    print(des.shape)

    # generate the vocabulary histogram for the test image
    vocab = np.array([[0 for i in range(self.no_clusters)]])

    # locate the nearest cluster for each visual word (feature) present
    # in the image; test_ret holds the nearest cluster index per feature
    test_ret = self.bov_helper.kmeans_obj.predict(des)

    for each in test_ret:
        vocab[0][each] += 1
    print(vocab)

    # scale the features
    vocab = self.bov_helper.scale.transform(np.atleast_2d(vocab))

    # predict the class of the image
    lb = self.bov_helper.clf.predict(vocab)
    # print("Image belongs to class :", self.name_dict[str(int(lb[0]))])
    return lb

```

The following snippet tests the trained classifier: it reads all images from the testing path and
uses the BOVHelpers predict() function (via recognize()) to obtain the class of each image.

```

self.testImages, self.testImageCount = self.file_helper.getFiles(self.test_path)

predictions = []
for word, imlist in self.testImages.items():
    print("processing ", word)
    for im in imlist:
        print(im.shape)
        # recognize() returns the predicted label array for this image
        cl = self.recognize(im)
        print(cl)
        predictions.append({
            'image': im,
            'class': cl,
            'object_name': self.name_dict[str(int(cl[0]))]
        })

```

For evaluation and plotting:


```

def predict(self, iplist):
    predictions = self.clf.predict(iplist)
    return predictions

def plotHist(self, vocabulary=None):
    print("Plotting histogram")
    if vocabulary is None:
        vocabulary = self.mega_histogram

    # total frequency of each visual word across all images
    x_scalar = np.arange(self.n_clusters)
    y_scalar = np.array([abs(np.sum(vocabulary[:, h], dtype=np.int32))
                         for h in range(self.n_clusters)])
    print(y_scalar)

    plt.bar(x_scalar, y_scalar)
    plt.xlabel("Visual Word Index")
    plt.ylabel("Frequency")
    plt.title("Complete Vocabulary Generated")
    plt.xticks(x_scalar + 0.4, x_scalar)
    plt.show()

```
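The confusion matrix shown in the output is not computed by the code above. A minimal sketch of how it could be produced from the test predictions, assuming sklearn is available and that the ground-truth label of each test image is collected during the test loop (both assumptions, not in the original code):

```
from sklearn.metrics import confusion_matrix

classes = ['Accordion', 'Dollar Bill', 'Motorbike', 'Soccer Ball']
# hypothetical example lists; in practice, collect true_labels while
# iterating the test set and predicted_labels from each recognize() call
true_labels = ['Accordion', 'Motorbike', 'Soccer Ball', 'Dollar Bill']
predicted_labels = ['Accordion', 'Motorbike', 'Soccer Ball', 'Soccer Ball']
cm = confusion_matrix(true_labels, predicted_labels, labels=classes)
print(cm)
```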

Output: sample detections ("Accordion detected", "Bike detected"), the vocabulary histogram, and the confusion matrix (figures omitted).
