Car Make and Model Recognition using Image Processing and Machine Learning
Hashir Yaqoob, Shaharyar Bhatti and Rana Raees Ahmed Khan.

Abstract—Advances in automation, artificial intelligence and robotics have enabled the development of intelligent transportation systems for traffic monitoring and surveillance. Car make and model recognition is an important part of such systems. Our project recognizes and classifies cars according to their make and model, automatically and in real time. It uses machine learning, deep neural networks and transfer learning to classify the cars, and image processing techniques to detect the cars in video feeds. The project combines Convolutional Neural Networks, a Bag of Features model and Support Vector Machine (SVM) classifiers. Moreover, the project lends itself to many child projects: any image, text or document database can be used to train our system to perform identification and classification according to characteristics defined by the user. This paper demonstrates our work and provides pathways for others to extend it.

I. INTRODUCTION

Car Make and Model Recognition, otherwise known as Make and Model Recognition (MMR), is the subject of our project. This project incorporates image processing methods and machine learning techniques to recognize and classify vehicles according to their make and model.

The aim of this project is to develop a software system that takes an image or video feed as input, processes it using image processing techniques, and then uses machine learning to recognize and classify the make and model of the vehicle. For instance, when an image of a Honda Civic is loaded as an input to the system, it should classify the make as “Honda” and the model as “Civic”, with a satisfactory accuracy.

This research paper illustrates the methods and implementation of a Car Make and Model Recognition (CMMR) system. The development platform was MATLAB; the project uses built-in MATLAB functions, and the graphical user interface (GUI) is implemented using the MATLAB GUI developer “GUIDE”.

A number of image processing techniques are used for feature extraction, such as SURF (speeded up robust features), SIFT (scale-invariant feature transform), HOG (histogram of oriented gradients), cross correlation, peak correlation and edge detection. Various filters are also used to extract information such as edges, intensities and color information for training a neural network.

The machine learning algorithms used in this project include Convolutional Neural Networks, Deep Neural Networks, Transfer Learning, Support Vector Machine (SVM) classifiers, and a Bag-of-Features model.

A. Overview
The principal objective of this project is to address the problem of recognizing and classifying objects according to certain desired characteristics. In our case, it is the classification of vehicles according to make and model. Our system incorporates recent machine learning techniques for classification. A common approach for classification and recognition is as follows:

• Extracting Features from Images: Feature extraction can be done using image-processing techniques such as SURF or SIFT. The image is then represented using these features, which give a better description of the image that is useful for classification.
• Classification: After feature extraction, the car images are classified using machine learning techniques. We used deep neural networks with transfer learning and also implemented a Bag of Features model.

B. Bag of Features Method for MMR
The Bag of Features method uses SURF features to encode, or represent, the image. It then uses a machine learning classification algorithm called the support vector machine (SVM).

C. Convolutional Neural Network for MMR
A Convolutional Neural Network (CNN) is a type of neural network widely used for image classification. In this method, features are extracted by the convolutional layers and then classified by the later layers of the network.

D. Data-sets used
The image data we used comprised three image databases. The first was the COMPCARS[] (Comprehensive Cars) image database, the second was collected from online resources, and the third was collected by ourselves, adding up to approximately 10,000 images combined. The images were first split into
categories according to the make and model of the vehicle; next, the data set was split into training and testing images.

E. Applications of VMMR systems
The project has applications in the following fields:
• Security and surveillance
• Toll plaza systems
• Car parking systems
• Traffic monitoring systems
• It can also enhance license plate recognition systems.

This project can also be used for data mining and data analysis purposes, to observe related trends such as color preference and model preference.

II. RELATED WORK
We studied the available papers on VMMR systems, concentrating on those that used image processing and machine learning techniques for the implementation. In the end we chose the BOF model and CNN approaches for our implementation.

A. Car Make Recognition Using Logo Extraction
This approach uses the logo seen at the front or back of the vehicle to recognize its make. The logo is first localized using SIFT key points and is then extracted and matched against the database. Mausam Jain and D. Tharun Kumar [1] used SIFT key points to detect the logo and extract it, then used template matching to match the logo with its respective class. The accuracy they reported is around 70%.

B. Joint Car Make and Model Recognition
In this approach, a particular car make and model is treated as a single category and recognized as such. Petrovic and Cootes [2] used front images of the car for make and model recognition. A region of interest is defined relative to the license plate and features are extracted from it. A nearest neighbour algorithm is then used for classification.

Similarly, Sparta Cheung and Alice Chu [3], in the paper “Make and Model Recognition of Cars”, used two methods for feature extraction and matching. The first used SIFT key points as features and descriptors for matching the query car image with the database images. The second used Harris corners for interest point detection and fast normalized correlation for feature matching.

In the Bag-of-Features approach, Abdul Jabbar Siddiqui, Abdelhamid Mammeri and Azzedine Boukerche, in their paper “Real-Time Vehicle Make and Model Recognition Based on a Bag of SURF Features” [4], used the BOF model with the SURF feature extractor. A vocabulary of words, or features, was formed by clustering the SURF features of the images; classification was then done using an SVM classifier.

C. VMMR using CNN (Deep Learning)
Yiren Zhou et al. [5], in their paper “Image based Vehicle Analysis using Deep Neural Nets…”, used a pre-trained neural network, “AlexNet”. Although AlexNet was originally trained to recognize 1000 categories of different objects, the authors used it to distinguish between various categories of cars.
Two approaches were used. The first was to extract features from the higher layers of the CNN and then classify them using SVM classifiers. The second was to fine-tune the neural network for use in a VMMR system.

In another paper, “View Independent Vehicle Make, Model and Color Recognition using Convolutional Neural Network” by Afshin Dehghan and Syed Zain Masood [6], the authors trained a deep neural network for VMMR using a very large image data set consisting of millions of images. Their system is known as “Sighthound”.

III. PROPOSED METHODS
We implemented our MMR system using two approaches. The first approach is the Bag-of-Features (BOF) model. In BOF, the SURF method was used for feature extraction and a multi-class Support Vector Machine (SVM) classifier was used for classification. The proposed workflow is as follows:

Figure 1: Bag of Feature model work flow

The second approach was a convolutional neural network (CNN), which is trained for both feature extraction and classification. We used a pre-trained neural network known as AlexNet, and used Transfer Learning to train AlexNet on our data set to classify vehicles in the VMMR system.
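
Both approaches assume that the images have already been grouped by make and model and divided into training and testing sets, as described in Section I.D. The paper does not state the split ratio, so the following is only a minimal sketch in Python (not the authors' MATLAB code); the 70/30 fraction, the file names and the helper name `split_per_class` are assumptions made for illustration.

```python
import random
from collections import defaultdict

def split_per_class(labelled_paths, train_fraction=0.7, seed=0):
    """Split (path, label) pairs into train/test lists, class by class."""
    by_class = defaultdict(list)
    for path, label in labelled_paths:
        by_class[label].append(path)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)  # shuffle inside the class so every class is split
        cut = int(round(train_fraction * len(paths)))
        train += [(p, label) for p in paths[:cut]]
        test += [(p, label) for p in paths[cut:]]
    return train, test

# Hypothetical file names and class labels, for illustration only.
images = [(f"honda_civic_{i}.jpg", "Honda Civic") for i in range(10)]
images += [(f"toyota_corolla_{i}.jpg", "Toyota Corolla") for i in range(10)]
train_set, test_set = split_per_class(images)
print(len(train_set), len(test_set))
```

Splitting inside each class, rather than over the whole data set at once, keeps every make and model represented in both the training and the testing images.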
IV. PROCEDURE AND IMPLEMENTATION
A. Data collection
The first step towards a VMMR system is the collection of a data set large enough for the systems trained on it to reach a satisfactory accuracy. Our vehicle image data consisted of three data sets, as mentioned earlier: the COMPCARS data set, Internet images, and a personally collected data set, with over 10,000 raw images in total.

B. BOF Implementation
The BOF model is a fairly old model and has been widely used for document classification and text recognition, thus it has some constraints regarding size and capacity. The BOF model is nevertheless considered robust, due to its flexibility regarding the nature and orientation of the images. There is no size constraint on the images, and the feature extraction process is less time consuming than neural network training.
Due to the size limits and the constraints of our development machines, we were forced to use a relatively small data set to train the BOF model. We used approximately 4000 vehicle images, divided into 38 categories (car models), and trained the BOF model on this data set. The method of training the BOF model is illustrated below:

Figure 2: training a BOF model

• Feature Extraction: we extract features from all the training images. Feature extraction is achieved by first locating SURF points and then computing SURF descriptors at these points. An image with detected SURF points and their descriptors is shown below:

Figure 3: SURF features and SURF descriptors

• Clustering: using the SURF features, we create a vocabulary by clustering, which collects similar data points into groups. We use K-means clustering with the Euclidean distance. “K” is the number of cluster centers (or words), which form the vocabulary of the BOF model. A cluster model with K=5 is shown below:

Figure 4: k-Means clustering for k=5

• Encoding: next, we encode all the training images using the clusters made in step 2, representing each image according to the vocabulary of the BOF model. The encoding procedure is shown below:

Figure 5: Encoding images with BOF Vocabulary

• Classification: we then use a multi-class SVM classifier to classify new test images according to their make and model.
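
The clustering and encoding steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' MATLAB implementation: random 8-value vectors stand in for real SURF descriptors (which are longer), and the helper names `nearest`, `kmeans` and `encode` are hypothetical.

```python
import math
import random

def nearest(vec, centers):
    """Index of the center closest to vec in Euclidean distance."""
    return min(range(len(centers)), key=lambda j: math.dist(vec, centers[j]))

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means; returns k cluster centers (the visual words)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, members in enumerate(groups):
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def encode(descriptors, vocabulary):
    """Normalised histogram of visual-word occurrences for one image."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        hist[nearest(d, vocabulary)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]

# Synthetic stand-ins for SURF descriptors: 3 images x 50 descriptors x 8 values.
rng = random.Random(1)
images = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(50)] for _ in range(3)]

all_descriptors = [d for img in images for d in img]
vocabulary = kmeans(all_descriptors, k=5)  # K=5 as in Figure 4
histograms = [encode(img, vocabulary) for img in images]
print(len(histograms), len(histograms[0]))
```

Each image ends up as a fixed-length histogram over the K visual words, regardless of how many descriptors it produced, which is what allows a standard classifier such as an SVM to be trained on the encoded images.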

C. CNN using Transfer Learning
We used Transfer Learning to re-train a pre-trained neural network known as AlexNet. This network was trained on millions of images in 1000 categories. To use AlexNet we first had to assemble a vehicle image data set large enough to take proper advantage of AlexNet in our system.
AlexNet is trained on images resized to 227 x 227 pixels. A smaller image size means less information to extract from each image, so the network needed to be trained on a large data set to ensure a satisfactory accuracy.
AlexNet performs well at image classification because it has additional layers besides the basic layers of a CNN. It also contains nearly 100 different filters in its first convolutional layer, each of which extracts different information from the image, for example noise, edges, colours and hidden details. A collage of the outputs of these filters is shown below:

Figure 6: Output of 96 filters of 1st convolutional layer

Neural networks take a long time to train on a large data set. We used a GPU-capable computer to train the CNN. The method we used to train the CNN using transfer learning is stated below:
• Feature Extraction: we first extract features using the 96 image filters present in the first convolutional layer of AlexNet. Because convolutional layers are applied repeatedly, each with a different configuration, the earlier layers extract low-level features like edges and curves, while later layers extract high-level information.
• Classification: classification in the CNN is achieved in the “fully connected layer”, which is the last layer in the network.

The architecture of AlexNet is given as follows:

Figure 7: AlexNet layer architecture

The input image passes through 5 convolutional layers and 3 fully-connected layers before the classification results are shown.

D. Intensive Computational Requirements for CNN training and testing
As mentioned before, training the CNN on a very large data set requires a lot of computational resources. A computer with an NVIDIA graphics processor with CUDA capabilities was used to train and test the neural network on images and videos, as well as on a live feed in real time.

The training stage of the AlexNet CNN, which illustrates the time consumption of the training process, is shown below:

Figure 8: AlexNet training process

The steps involved in training the CNN are as follows:

• Data set collection: collection of over 10,000 images.
• Resize, organize into classes and split for training and testing: the data set is resized to the image input layer size of AlexNet, which is 227 x 227 pixels.
• Modify AlexNet layers to use with our dataset: modify the fully connected layers to match the number of classes in our data set.
• Retrain network: retrain the network on our data set.
• Classification: after training is done, we pass a test image to the network and obtain the classification result.

The modifications applied to the network architecture of AlexNet are as follows (we modified layer 23 and layer 25 to perform transfer learning on AlexNet):

Figure 9: Modified layers of AlexNet
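
To make the effect of these modifications concrete, the short Python sketch below traces the spatial size of a 227 x 227 input through the five convolutional layers and counts the parameters of the final fully connected layer before and after it is resized. The kernel sizes, strides and paddings are taken from the commonly published AlexNet configuration, not from this paper, and the assumption that the resized layer is the last fully connected layer (4096 inputs) is ours; the 50-class figure comes from Section VI.B.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# (name, kernel, stride, pad, output channels); None means pooling keeps the channels.
layers = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

size, channels = 227, 3
shapes = {}
for name, kernel, stride, pad, out_channels in layers:
    size = conv_out(size, kernel, stride, pad)
    channels = out_channels or channels
    shapes[name] = (size, size, channels)
    print(name, shapes[name])

fc_inputs = size * size * channels  # flattened input to the first fully connected layer

def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

original_last = fc_params(4096, 1000)  # 1000 ImageNet categories
retrained_last = fc_params(4096, 50)   # 50 car classes (Section VI.B)
print(fc_inputs, original_last, retrained_last)
```

Under these assumptions only the small final layer has to be learned from scratch, while the convolutional layers start from their pre-trained weights, which is the practical appeal of transfer learning on a data set of about 10,000 images.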

E. Classification using SVM Classifier
The classification algorithm used in both the BOF model and the CNN is the Support Vector Machine classifier. We used multi-class support vector machine classifiers to cover all the classes and categories of the image database. The SVM classifier is a robust classification method which produces accurate results.
A simple SVM classifier is a binary classifier: it works on only two classes, separated by a hyperplane which defines the boundary between them, as shown below:

Figure 10: separating hyperplane

The optimal hyperplane is calculated using the following equations to best separate the data points of the classes:

Equation 1: Equations to find the optimum separating hyperplane between classes

A multi-class classifier uses different data quantization methods and multiple classifiers and hyperplanes to cover multiple classes. The decision boundaries in our multi-class classifiers are not linear in the input space. Kernels map data that is not linearly separable into a space where it becomes linearly separable, so that each underlying binary classifier can separate its classes with a hyperplane. Such a mapping is shown below:

Figure 11: linearization of data and hyperplane

As mentioned earlier, we use kernels to linearize the data. The following are some of the kernels which can be applied in the multi-class SVM classifier:

Equation 2: SVM Kernel functions
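
The kernel equations themselves are not reproduced in the text, so the following Python sketch shows three widely used kernels (linear, polynomial and RBF) together with the standard kernel SVM decision function. It is a textbook illustration, not necessarily the set of kernels the authors used; the support vectors, the alphas and the bias below are hand-picked for the example rather than obtained by training.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, c=1.0):
    return (dot(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision(x, support_vectors, labels, alphas, bias, kernel):
    """Binary SVM decision value; its sign is the predicted class (+1 or -1)."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias

# XOR-like toy data: no straight line separates the two classes.
# Alphas and bias are hand-picked (not trained), for illustration only.
svs = [(0, 0), (1, 1), (0, 1), (1, 0)]
ys = [+1, +1, -1, -1]
alphas = [1.0, 1.0, 1.0, 1.0]

def sharp_rbf(a, b):
    return rbf_kernel(a, b, gamma=2.0)

scores = [decision(p, svs, ys, alphas, 0.0, sharp_rbf) for p in svs]
for point, score in zip(svs, scores):
    print(point, round(score, 3))
```

With the RBF kernel the sign of the decision value matches the label of every point, even though the same four points cannot be separated by a linear boundary in the original two-dimensional space.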


V. GRAPHICAL USER INTERFACE
Our system has an easy-to-use GUI, which is specifically designed to display image classification output, video classification output, and classification in real time.
It lets the user load an image or video or start a live feed, select which classification methods to use, and then start the classification process by clicking the Classify button.
Furthermore, we have incorporated a training option in the GUI, where the user can define the data set location and start the training of the selected classification models.
Moreover, our GUI also has functions to extract the logo and the license plate of the car, which can further be read using optical character recognition (OCR) (part of future development). The GUI is displayed below:

Figure 12: Project GUI

VI. EXPERIMENTAL RESULTS

A. Bag-Of-Features
The BOF model was trained on around 4000 images, which made up a total of 49 classes, or categories of cars. The confusion matrix created after training the BOF model is shown below:

Figure 13: BOF confusion matrix

The visual vocabulary created after clustering in the BOF model is shown below. We set the vocabulary size to 600 words; the histogram shows the occurrence count of each visual word:

Figure 14: Visual vocabulary histogram

To test the BOF model, we passed a test image of a “HONDA CIVIC” car to it using our GUI, which yields the following results:

Figure 15: BOF classification output

The BOF model presented a satisfactory accuracy, ranging from 70% to 78% on average, evaluated by our accuracy formula.

B. CNN and Transfer Learning results
Neural networks are expected to produce more accurate and robust classification results.
We trained AlexNet on 50 classes with a total of 10,000 images, which consisted of the COMPCARS data set and our own images collected in the university parking area.
The following is a screenshot of the training stage of the CNN on an NVIDIA CUDA-capable computer:
Figure 16: CNN training process, average accuracy

As shown above, the accuracy on both the evaluation set and the training set is about 93%.
Next, we passed a series of images to the CNN and observed above 90% accuracy in classification. An image of a “HONDA CIVIC” was passed as an input test image; the output, processed in the GUI, is shown below:

Figure 17: CNN classification output

C. LOGO Detection and classification
We also implemented template matching using the peak correlation method to localize and extract the logo from the front and back of the car. For this, we created a template data set of logo images, as shown below:

Figure 18: car logo image templates

These templates are used to classify the logos extracted from the vehicle images. This function was implemented as an extra; the accuracy of logo localization was not up to the mark and needs more work and fine-tuning. However, a correct localization and classification match, carried out in the GUI, is shown below:

Figure 19: LOGO detection and classification output

D. Evaluation Metric
This section describes how we measured the accuracy of both models. The accuracy evaluation formula we use is as follows:

$$\text{Accuracy} = \frac{\text{true matches}}{\text{total test images}} \times 100\%$$

where true matches is the number of times the classifier correctly classified a vehicle image.

VII. CONCLUSION
VMMR systems are useful in traffic monitoring and surveillance, toll plaza systems and parking systems. Researchers are currently working to develop accurate and robust techniques for car make and model recognition. Our system includes machine learning algorithms such as CNNs and deep learning, based on the AlexNet neural network.
However, the BOF model proved a less effective and less accurate classification method than the neural network. We obtained above 90% accuracy using neural networks, compared to the 75% average accuracy of the BOF model.
The accuracy can be further improved by using larger data sets, with images numbering in the millions and taken in many different situations, so that the classes are learned properly and the learning is more universal and effective.
Furthermore, this system lends itself to many child projects in the domain of object and pattern recognition and classification.
Moreover, this system could be used for data mining and trend analysis. For example, we can run the classification in real time and observe trends such as the most preferred colours and models of vehicles in any region. This analysis can be very helpful for vehicle manufacturing companies to make their business more profitable.

APPENDIX
• SURF: Speeded up robust features.
• SIFT: Scale-invariant feature transform.
• HOG: Histogram of oriented gradients.
• Template Matching: Match an object in an image using the templates in the template database.
• Peak Correlation: Match two images using peaks created by taking the discrete Fourier transform of both images.
• KNN: k nearest neighbour algorithm.
• BOF: Bag of Features.
• CNN: Convolutional Neural Network.
• SVM: Support Vector Machine.
• Transfer Learning: Method of re-training a pre-trained neural network using its layers' weights and your own data set.
• VMMR: Vehicle Make and Model Recognition.
• MMR: Make and Model Recognition.

ACKNOWLEDGMENT
This project would not have been possible without the support, supervision and sincere guidance of our supervisor, Ms. Sumayya Haroon, Assistant Professor, faculty of the Electrical Engineering department of COMSATS Institute of Information Technology, Islamabad. She guided us effectively to make this project reach its completion.
We are thankful to our seniors and faculty members, who helped us through every difficulty and obstacle that occurred during the development of this project.
We humbly express our gratitude to our family members for their support and prayers.
Finally, we would like to thank our group members for putting in their sincere efforts and knowledge.
This work is dedicated to our respected supervisor, to the university and to our friends and family.

REFERENCES
[1] Mausam Jain, D. Tharun Kumar, “Car Make and Model Recognition”, IIT Hyderabad, ODF, Yeddumailaram – 502205, 2015.
[2] V. S. Petrovic and T. F. Cootes, “Analysis of features for rigid structure vehicle type recognition”, in Proc. British Machine Vision Conference (BMVC’04), pp. 587-596, Kingston, UK, September 2004.
[3] Sparta Cheung, Alice Chu, “Make and Model Recognition of Cars”, CSE 190A Projects in Vision and Learning, Final Report, 2008.
[4] Abdul Jabbar Siddiqui et al., “Real-Time Vehicle Make and Model Recognition Based on a Bag of SURF Features”, IEEE Transactions on Intelligent Transportation Systems, Vol. 17, No. 11, November 2016.
[5] Yiren Zhou et al., “Image-based Vehicle Analysis using Deep Neural Network: A Systematic Study”, arXiv:1601.01145v2 [cs.CV], August 2016.
[6] Afshin Dehghan et al., “View Independent Vehicle Make, Model and Color Recognition using Convolutional Neural Network”, Computer Vision Lab, Sighthound Inc., Winter Park, FL. arXiv:1702.01721v1 [cs.CV], 6 Feb 2017.
[7] A. Krizhevsky et al., “Imagenet Classification with Deep Convolutional Neural Networks”, Advances in Neural Information Processing Systems, 2012.
[8] Derrick Liu, Yushi Wang, “Image Classification of Vehicle Make and Model using Convolutional Neural Networks and Transfer Learning”, Stanford University.
[9] Dr. Kazi A. Kalpoma et al., “Logo Recognition using SURF Features and kNN Search Tree”, International Journal of Scientific and Engineering Research, Volume 6, Issue 9, September 2015. ISSN 2229-5518.
[10] pp. 876-880. Available: http://www.halcyon.com/pub/journals/21ps03-vidmar
