
2018 IEEE 7th Data Driven Control and Learning Systems Conference

May 25-27, 2018, Enshi, Hubei Province, China

Vehicle Detection and Classification Using Convolutional Neural Networks

Minglan Sheng1, Chunfang Liu1, Qi Zhang1, Lu Lou1, Yu Zheng2
1. College of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China.
E-mail: [email protected]
2. Department of Rail Transit Engineering, Chongqing Vocational College of Transportation, Chongqing 402247, China.
E-mail: [email protected]

Abstract: Vehicle detection and classification are important tasks in intelligent transportation systems. Traditional methods of vehicle detection and classification often produce coarse-grained results because they suffer from limited viewpoints. Inspired by the recent achievements of Deep Learning in image classification, this paper presents a method based on convolutional neural networks that consists of two steps: vehicle area detection and vehicle brand classification. Several typical network models are applied in training and classification experiments for a detailed comparative analysis, including RCNN (Regions with Convolutional Neural Network features), Faster RCNN, AlexNet, Vggnet, GoogLenet and Resnet. The proposed method can identify vehicle models, brands and other information accurately; with the original dataset and an enriched dataset, the algorithm obtains an average accuracy of about 93.32% in the classification of six kinds of vehicle models.

Key Words: Convolutional neural network, Vehicle detection, Vehicle type classification

1. Introduction

Vehicle detection and classification have a promising future; for example, vehicle class classification based on vehicle detection has been applied in intelligent transportation systems to analyze traffic statistics. It also plays a key role in solving problems such as vehicle management, vehicle escape and vehicle charging. However, vehicle detection and classification still face a few great challenges, because the number of vehicle classes is very large and some attributes of vehicles are too close to distinguish.

Traditional vehicle detection and classification in road surveillance are mainly based on the following methods: 1) SIFT (Scale-Invariant Feature Transform) feature extraction and matching; 2) moving vehicle detection with a Gaussian mixture model; 3) license plate classification; 4) monitoring video classification with HOG (Histogram of Oriented Gradients) and SVM (Support Vector Machine). In [7] vehicle class classification is achieved by using features of the outline, windshield, rear-view mirror and license plate at the front of the car; with such traditional methods, only a low recognition rate is obtained. References [8][16][17][18] studied the vehicle structure and pattern from the front view. Reference [8] used the K-means algorithm and SIFT for feature matching, so that vehicle types can be identified. Reference [16] combined SIFT with K-NN (K-Nearest Neighbor) to classify vehicle types; both methods require tedious preprocessing of images. Reference [17] used a convolutional neural network to identify the car's logo, which must first be extracted from the original image. Reference [18] used PCA to identify the car's logo. Reference [9] used convolutional neural network (CNN) theory to design a corresponding feature extraction algorithm and combined it with an SVM classifier to construct a classification system that classifies vehicles on the highway as car, bus or truck, with no specific vehicle brand identified. Based on HOG and SVM, Reference [19] used a spatial pyramid transformation network for modeling and aligned feature maps before classification.

The above algorithms have already obtained good results; however, there are still some shortcomings. For example, the methods usually require that images are captured from certain viewpoints, or the recognition granularity stays at the stage of coarse vehicle model classification. Therefore, this paper aims at detecting vehicles from different viewpoints and classifying the brands of vehicles.

2. The Establishment of the Vehicle Dataset

To demonstrate better generalization ability in vehicle classification, a vehicle classification system usually needs to study the same type of vehicle in different colors, views and scenes. Stanford's vehicle classification dataset contains 16185 images [15] collected from natural scenes or the web; among them, there are 8144 well-calibrated training images and 8041 test images, categorized into 196 vehicle types. However, we have found that there are not enough training images of the same vehicle type, so the per-type classification accuracy is not high. Therefore, we choose 6 categories from the dataset, namely Volkswagen, Audi, Chevrolet, BMW, Ford and Mercedes-Benz, so that the number of samples for each vehicle type is enlarged. To test the generalization ability to the vehicle types that we usually see in local surveillance, and to enable applied research on detecting and identifying vehicle types in China, we have enlarged the test dataset with 3909 vehicle images taken on roads in China.
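As a minimal sketch of this regrouping step, the 196 fine-grained Stanford Cars labels can be collapsed into the six brand categories by matching the brand name at the start of each class string. The file names and class strings below are illustrative, not taken from the dataset:

```python
# Sketch: grouping Stanford Cars' 196 fine-grained labels into the six
# brand categories used in this paper. The class-name strings here are
# illustrative; the real dataset ships its class list in a devkit.
BRANDS = ["Volkswagen", "Audi", "Chevrolet", "BMW", "Ford", "Mercedes-Benz"]

def brand_of(class_name):
    """Map a fine-grained class name (e.g. 'Audi TT Coupe 2012') to a brand."""
    for brand in BRANDS:
        if class_name.startswith(brand):
            return brand
    return None  # class belongs to one of the other 190 types; discard it

# Keep only images whose fine-grained class falls under one of the six brands.
dataset = [("00001.jpg", "Audi TT Coupe 2012"), ("00002.jpg", "Ford F-150 2012"),
           ("00003.jpg", "Acura TL Sedan 2012")]
subset = [(img, brand_of(cls)) for img, cls in dataset if brand_of(cls)]
print(subset)  # [('00001.jpg', 'Audi'), ('00002.jpg', 'Ford')]
```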

*This work was partly supported by the Scientific and Technological Research Program of Chongqing Municipal Education Commission (KJ1755492).

3. Vehicle Detection and Classification Based on CNNs
In vehicle detection and classification based on CNNs, we extract the vehicle region from its background by region segmentation and remove background noise, which improves the recognition accuracy. A convolutional neural network then performs feature extraction and classification training on tagged image datasets of many vehicle types, so that vehicle types can be identified from different angles. Besides, we also focus on selection accuracy, recognition accuracy and real-time performance in the vehicle detection process.
In vehicle detection training, the accuracy of vehicle detection is highest when we set the parameter "NegativeOverlapRange" (the interval of overlap ratios between a candidate and the tagged image area within which candidates are used as negative training samples) to [0, 0.3], and the parameter "PositiveOverlapRange" (the corresponding interval for positive training samples) to [0.7, 1]. In the cascade of vehicle detection and vehicle model recognition, the detected area is sent to the recognition network for classification. The method presented in this paper helps avoid interference from the background area in the process of vehicle recognition, and thus yields a more precise vehicle type classification. We have compared the loss and accuracy curves of different classification models and collected the classification information of all kinds of vehicle types.

3.1 RCNN

RCNN uses a convolutional neural network to detect objects. The algorithm can be divided into four stages: 1) input the image; 2) use a selective search method to select about 2000 candidate regions for object recognition, as shown in Fig. 1(a), where the yellow areas are the searched candidate boxes; 3) use a large convolutional neural network to extract features for each image block in the candidate boxes; 4) use a classifier to judge whether the features extracted from a candidate box belong to a specific class. After the objects are classified, the greedy non-maximum suppression algorithm is used to filter the regions, as shown in Fig. 1(b); a code sketch of the overlap computation and of this filtering step is given below.

Fig. 1: (a) candidate frames generated by the selective search method; (b) the final region obtained by the greedy non-maximum suppression algorithm.

Fig. 2: Overview of the object detection method using RCNN [4].

3.2 Faster RCNN

RCNN has several problems in target detection: i) the detection algorithm is carried out in several stages, and the extraction of candidate frames is time-consuming; ii) the data storage space needed for training is large; iii) the target detection speed is slow. Therefore, Faster RCNN adopts the region proposal network (RPN), the RoI pooling layer and other optimization measures [3][4], including: i) using the region proposal network to generate the regional candidate frames and sharing the convolution features of the whole image with the detection network, which reduces the time taken to generate the candidate frames; ii) the RoI pooling layer feeds the whole image through the deep network only once and processes each candidate only at the end of several layers, effectively avoiding the redundant convolution operations of extracting features repeatedly from the same candidate frame region; iii) category classification and the precise adjustment of position are realized by the network itself, so no additional storage is needed; iv) this paper defines two sibling outputs, one generating the object category plus a "background" category, and the other outputting four exact bounding box positions for each object.
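Both the overlap-range sampling described at the beginning of this section and the greedy non-maximum suppression of Section 3.1 rest on the intersection-over-union (IoU) overlap of two boxes. The following is a minimal sketch, assuming boxes in (x1, y1, x2, y2) form; it illustrates the idea rather than reproducing the paper's implementation:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_candidate(candidate, tagged_box):
    """Assign a training label following the overlap intervals in the text."""
    ov = iou(candidate, tagged_box)
    if ov >= 0.7:   # PositiveOverlapRange [0.7, 1]
        return "positive"
    if ov <= 0.3:   # NegativeOverlapRange [0, 0.3]
        return "negative"
    return "ignored"  # candidates in between are not used for training

def greedy_nms(boxes, scores, threshold=0.3):
    """Keep the highest-scoring box, drop overlapping ones, repeat (Fig. 1(b))."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        order = np.array([i for i in order[1:]
                          if iou(boxes[best], boxes[i]) < threshold])
    return keep
```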

The Faster RCNN algorithm, as shown in figure 2, can be roughly divided into four steps:

1) Shared Convolution Layers

As a method of target detection based on convolutional neural networks, a set of basic convolution layers is used to extract the features of the input images, and these features are shared by the subsequent RPN layer and fully connected layers.

2) Region Proposal Network

The RPN is a fully convolutional network that outputs a set of candidate regions together with an objectness score for each candidate region. A small network slides over the convolution feature map output by the shared convolution layers and is fully connected to an n×n window of the input convolutional feature map. Each sliding window is mapped to a low-dimensional vector, which is then fed into two sibling layers, a box-regression layer (reg) and a classification layer (cls). K candidate regions with different scales and aspect ratios are defined at the location of each sliding window, so there are W·H·K candidate frames altogether for a convolution feature map of size W×H (a small sketch of this enumeration is given below).
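The sketch below generates K anchor boxes at every position of a W×H feature map; the stride, scales and aspect ratios are illustrative assumptions, not values from the paper:

```python
import numpy as np

def make_anchors(w, h, stride=16, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Enumerate K = len(scales) * len(ratios) anchors per feature-map cell."""
    anchors = []
    for y in range(h):
        for x in range(w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # center in image coords
            for s in scales:
                for r in ratios:
                    bw, bh = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx - bw / 2, cy - bh / 2,
                                    cx + bw / 2, cy + bh / 2))
    return np.array(anchors)  # shape: (W * H * K, 4)

boxes = make_anchors(w=40, h=30)
print(boxes.shape)  # (40 * 30 * 9, 4) = (10800, 4)
```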
3) Object Detection Network

This part mainly refers to the spatial pyramid pooling network [2] and introduces the RoI pooling layer: 1) the pooling layer can be regarded as a single-level spatial pyramid; 2) the original image is convolved only once to obtain the feature map of the whole picture, and the convolution features inside each candidate frame are fed to the next layer. Using a fixed-size grid of H×W, subwindows of an RoI of size h×w are extracted, and each subwindow is mapped to the corresponding grid cell by max pooling (see the sketch below).
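A minimal sketch of this RoI max-pooling step, assuming a single-channel feature map as a NumPy array and an RoI given in feature-map coordinates:

```python
import numpy as np

def roi_max_pool(feature, roi, out_h=7, out_w=7):
    """Max-pool an h×w RoI of a 2D feature map into a fixed H×W grid."""
    x1, y1, x2, y2 = roi
    window = feature[y1:y2, x1:x2]          # the h×w region of interest
    h, w = window.shape
    out = np.empty((out_h, out_w), dtype=feature.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # integer subwindow boundaries; each grid cell max-pools its subwindow
            y_lo, y_hi = i * h // out_h, max((i + 1) * h // out_h, i * h // out_h + 1)
            x_lo, x_hi = j * w // out_w, max((j + 1) * w // out_w, j * w // out_w + 1)
            out[i, j] = window[y_lo:y_hi, x_lo:x_hi].max()
    return out

fmap = np.random.rand(38, 50)                     # e.g. a conv feature map
pooled = roi_max_pool(fmap, roi=(10, 5, 31, 26))  # 21×21 RoI -> 7×7 output
print(pooled.shape)  # (7, 7)
```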
4) Classification

The feature map of each candidate box is used to compute its category, and the final exact position of the detection box is obtained by bounding-box regression. The multi-task loss function is shown in equation (1):

L(\{p_i\}, \{v_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*}) + \lambda \frac{1}{N_{reg}} \sum_i p_i^{*} L_{reg}(v_i, v_i^{*})    (1)

Here i is the index of a candidate in a mini-batch, and p_i is the predicted probability of the target. If the candidate is foreground, p_i^{*} is 1, and if the candidate is background, p_i^{*} is 0; v_i is a vector representing the four parameterized coordinates of the predicted box, and v_i^{*} is the coordinate vector of the ground-truth rectangle corresponding to a foreground candidate frame.
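A small numeric sketch of equation (1), assuming log loss for L_cls and the usual smooth-L1 for L_reg (the paper does not spell these out, so both choices are assumptions borrowed from Faster RCNN [4]):

```python
import numpy as np

def smooth_l1(x):
    """Smooth-L1 applied elementwise, summed over the 4 box coordinates."""
    x = np.abs(x)
    return np.sum(np.where(x < 1.0, 0.5 * x ** 2, x - 0.5))

def multitask_loss(p, p_star, v, v_star, lam=1.0):
    """Equation (1): classification term plus p_i*-gated regression term."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)).mean()
    # the regression term counts only foreground candidates (p_i* = 1)
    l_reg = sum(ps * smooth_l1(vi - vs) for ps, vi, vs in zip(p_star, v, v_star))
    n_reg = max(p_star.sum(), 1)
    return l_cls + lam * l_reg / n_reg

p      = np.array([0.9, 0.2, 0.7])             # predicted foreground probabilities
p_star = np.array([1.0, 0.0, 1.0])             # foreground labels
v      = np.array([[0.1, 0.2, 0.0, 0.1]] * 3)  # predicted box offsets
v_star = np.array([[0.0, 0.2, 0.1, 0.1]] * 3)  # ground-truth offsets
print(multitask_loss(p, p_star, v, v_star))
```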
Fig. 3: Vehicle detection, selecting the best area and outputting area scores. (a) the first area scores 0.99516; (b) the second area scores 0.99997.

3.3 Vehicle Detection

In the experiment, the images of the training dataset are used as input, and the foreground area of the target vehicle is detected by RCNN and Faster RCNN, both of which use the AlexNet structure. The AlexNet structure is shown in Fig. 4.

The first five layers of AlexNet are convolution layers and the last three layers are fully connected layers. The convolution layers are interleaved with max-pooling layers with a window size of 3×3 and a stride of 2. In AlexNet, pooling is used to reduce the number of neurons and the amount of computation, and to control over-fitting. Using ReLU (Rectified Linear Unit) as the activation function instead of the traditional logistic function makes forward propagation easier and the backward gradient simpler to compute via its partial derivative; it also avoids expensive operations such as exponentiation and division. Hidden-layer neurons with output less than 0 are discarded, which increases sparsity and alleviates over-fitting. The AlexNet network structure is adjusted for the experiments, and the extracted vehicle area is output as shown in Fig. 3.

Fig. 4: AlexNet structure [5]
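For reference, an AlexNet-like structure (five convolution layers, three fully connected layers, 3×3 stride-2 max pooling, ReLU activations) can be written down as follows. This is a sketch in PyTorch following the common torchvision-style variant of AlexNet; the exact layer sizes are assumptions, not the authors' adjusted network:

```python
import torch.nn as nn

class AlexNetLike(nn.Module):
    """Five conv layers + three FC layers, ReLU, 3x3 stride-2 max pooling."""
    def __init__(self, num_classes=6):   # six vehicle brands in this paper
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):                 # x: (N, 3, 224, 224)
        return self.classifier(self.features(x))
```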

3.4 Vehicle Classification

The foreground vehicle area obtained by the vehicle detection algorithm is fed into another convolutional neural network for vehicle recognition. In the convolution layers, the network slides a number of convolution kernels over the input image with a certain step size to extract features. Each convolution kernel focuses on different features, so the convolution produces different feature maps. The feature maps are passed into the next convolution layer, which in turn uses its own convolution kernels to extract features from the upper feature maps, and this operation is repeated. After the features have been processed several times, the resulting weighted features are connected to the fully connected layers, which play the role of the classifier in the entire convolutional neural network, namely a model that classifies the extracted features. By using this Deep Learning method, features ignored by the traditional methods can be extracted and the recognition accuracy can be improved obviously; the feature maps obtained by different convolution kernels are shown in Fig. 5.

Fig. 5: Convolution feature maps; AlexNet 1st layer visualization.

4 Experimental Results and Discussion

The experimental results show that the Faster RCNN detection method integrates feature extraction, candidate frame generation, bounding-box regression and classification into one network, so its comprehensive performance is greatly improved. In the cascade vehicle classification network structure, the accuracy rate is high. For the vehicle classification and recognition network structures, we slightly adjust several networks and network parameters in the training experiments. After collecting the relevant training information, we analyze the accuracy rates of the different network structures under the two detection structures (RCNN and Faster RCNN). The accuracy is calculated as shown in equation (2):

CR(\%) = \frac{CN}{TN} = \frac{CN}{CN + EN}    (2)

In the equation, CN is the number of correctly classified images, EN is the number of incorrectly classified images, TN is the total number of test images, and CR indicates the correct rate; a sketch of the per-class computation follows Table 1. To compare the experimental results of the various networks, the classification stage is implemented with different networks, namely AlexNet, Vggnet16, Vggnet19, GoogLenet, Resnet50 and Resnet101.

The dataset of 3009 pictures collected by Stanford University is used to evaluate the trained models. We draw the loss curve and accuracy curve of every classification and recognition network structure, which helps in understanding the training process of every model. From the comparison of the curves, it is clear that Resnet-101 converges fastest, as shown in Fig. 6.

Fig. 6: (a) loss curves; (b) accuracy curves.

The experimental results obtained on this dataset are shown in Table 1. It can be seen that cascading the classification networks after the Faster RCNN detection network gives higher accuracy than cascading after RCNN, and the best performing combination is the one with the Vggnet19 network structure, achieving 93.32%.

Table 1: Average accuracy of several models

             RCNN     Faster RCNN
AlexNet      62.87%   65.20%
Vggnet16     82.77%   92.82%
Vggnet19     79.32%   93.32%
GoogLenet    81.00%   91.09%
Resnet-50    78.87%   90.96%
Resnet-101   82.11%   91.99%
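The per-model and per-class rates in Tables 1 and 2 follow directly from equation (2); a minimal sketch with made-up predictions:

```python
from collections import Counter

def per_class_accuracy(labels, predictions):
    """CR = CN / TN per class, with TN = CN + EN (equation (2))."""
    total, correct = Counter(), Counter()
    for truth, pred in zip(labels, predictions):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}

labels      = ["BMW", "BMW", "Ford", "Audi", "Audi", "Audi"]
predictions = ["BMW", "Ford", "Ford", "Audi", "Audi", "BMW"]
print(per_class_accuracy(labels, predictions))
# {'BMW': 0.5, 'Ford': 1.0, 'Audi': 0.6666666666666666}
```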

In order to understand the network structures for classification and recognition in the case of cascading after Faster RCNN for each class, and to get a more accurate analysis of the experimental results, we have counted the recognition accuracy for each category. The results are shown in Table 2. Chevrolet has a higher accuracy because it has more training images, from which more useful features can be extracted to facilitate generalization.

Table 2: Accuracy of model classification

             BMW              Ford             Chevrolet
             CN/TN    CR      CN/TN    CR      CN/TN    CR
AlexNet      371/524  70.80%  318/514  61.87%  622/894  69.57%
Vggnet16     493/524  94.08%  468/514  91.05%  854/894  95.53%
Vggnet19     494/524  94.27%  471/514  91.63%  859/894  96.09%
GoogLenet    495/524  94.47%  436/514  84.82%  851/894  95.19%
Resnet-50    477/524  91.03%  451/514  87.74%  848/894  94.85%
Resnet-101   478/524  91.22%  468/514  91.05%  853/894  95.41%

             Mercedes-Benz    Audi             Volkswagen
             CN/TN    CR      CN/TN    CR      CN/TN    CR
AlexNet      148/257  57.59%  389/580  60.07%  125/240  52.08%
Vggnet16     234/257  91.05%  536/580  92.41%  208/240  86.67%
Vggnet19     234/257  91.05%  540/580  93.10%  210/240  87.50%
GoogLenet    209/257  81.32%  536/580  92.41%  214/240  89.17%
Resnet-50    214/257  83.27%  534/580  92.07%  213/240  88.75%
Resnet-101   229/257  89.11%  529/580  91.21%  211/240  87.92%

In order to reduce the impact of the data sample on classification accuracy, we mirror (left-right flip) the 3049 training images, giving 6098 training pictures in the enlarged dataset; a sketch of this augmentation follows. Several network models were used in the experiments. The experiments show that the average accuracy of each classification model cascaded after Faster RCNN improved after the sample training dataset was expanded. At the same time, the recognition accuracy of each model on real road vehicles also increased. The results are shown in Table 3. For example, AlexNet's average accuracy rate increased from 65.2% to 73.55%, and Vggnet16's increased from 92.82% to 92.94%.

Table 3: Accuracy rates of classification models with the extended training dataset

            BMW     Ford    Chevrolet  Mercedes-Benz  Audi    Volkswagen  Average
AlexNet     82.63%  69.26%  69.80%     68.87%         76.80%  74.17%      73.55%
Vggnet16    86.03%  84.27%  89.71%     86.34%         90.67%  83.20%      87.54%
Vggnet19    93.32%  92.80%  96.20%     92.22%         95.69%  89.17%      94.12%
GoogLenet   91.73%  84.44%  93.67%     90.11%         91.47%  88.40%      90.63%
Resnet-50   90.46%  83.85%  95.19%     89.88%         92.07%  85.42%      90.59%
Resnet-101  92.52%  91.76%  96.03%     92.60%         96.03%  90.30%      93.92%

We apply the models trained on the mirrored image dataset to real traffic road video images collected by ourselves; the collected pictures of real traffic vehicles are also tested. The results are shown in Table 4. We can see that for the recognition of real traffic road images, the mirrored training dataset obtains better classification accuracy. At the same time, Resnet101 and Vggnet19 reach the best accuracy while AlexNet's accuracy is the lowest; the reason is that Resnet101 and Vggnet19 have more network layers and smaller convolution kernels, so they perform better feature extraction on images and complete the key steps of image recognition better.
Table 4: Average accuracy on real traffic vehicle images

            Original dataset  Mirrored dataset
AlexNet     54.33%            59.31%
Vggnet16    63.54%            80.13%
Vggnet19    69.67%            84.66%
GoogLenet   67.44%            83.24%
Resnet-50   78.70%            83.21%
Resnet-101  79.43%            86.78%

To understand the experimental results, we use a confusion matrix to analyze the specific error information. Taking the model Vggnet16 as an example, the specific information is shown in Table 5. It can be seen that on the real road images BMW has the highest classification accuracy rate, 84.90%, because BMW is more easily recognized by the appearance of the vehicle than other brands; at the same time, its probability of being mistaken for Audi is the lowest, only 1.92%, because the appearances of the two brands are quite different. As shown in the table, Volkswagen has the lowest classification accuracy, only 65.14%.

Table 5: Confusion matrix of the real road vehicle test (Vggnet16)

               BMW     Ford    Chevrolet  Mercedes-Benz  Audi    Volkswagen
BMW            84.90%  2.53%   4.32%      4.33%          1.92%   2.00%
Ford           5.29%   82.01%  1.47%      5.30%          2.64%   3.29%
Chevrolet      6.90%   3.45%   77.19%     5.44%          4.92%   2.10%
Mercedes-Benz  4.30%   8.25%   7.18%      70.04%         6.54%   3.69%
Audi           1.84%   4.27%   5.20%      6.22%          79.14%  3.33%
Volkswagen     7.25%   9.90%   7.33%      6.52%          3.86%   65.14%

The proposed method can detect and recognize vehicles from different angles and in multiple scenes, as shown in Fig. 7.

Fig. 7: Real traffic vehicle detection and identification.

However, the accuracy of real traffic vehicle identification is still not high enough. The main reasons are as follows:
1) Image definition is low, so vehicles tend to be identified as other, similar vehicle types.
2) The aspect ratio of some test images is so large that deformation is produced in the processing procedure.
3) The training dataset is still not large enough, leading to inaccurate localization and wrong classification on the test image dataset; the training samples are imbalanced, with few pictures of certain vehicle types.
4) The method of this paper is still unable to identify accurately when it comes to similar appearances within the same vehicle type; extreme conditions such as fog and snow have not been further studied.

The following research will further address the problem that the accuracy of this method is not high enough in real traffic vehicle detection and recognition.

With an increasing number of images in the training dataset, the vehicle recognition accuracy improves obviously. It is clear that we can enlarge the training dataset with real traffic surveillance images, thus further improving the accuracy of this method.

5 Conclusions

Based on different CNNs, this paper studies vehicle model recognition and detection in urban traffic video surveillance. By means of enlarging the training dataset, modifying parameters and replacing models, the proposed method is shown to be effective and is validated by the relevant experiments. The experimental results show that the vehicle recognition rate is obviously improved. In the next step of this study, we will apply the proposed method to real traffic flow detection and further improve its accuracy and robustness.

References

[1] R. Girshick, J. Donahue, T. Darrell, et al. Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.
[2] K. He, X. Zhang, S. Ren, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition, ECCV, 2014.
[3] R. Girshick. Fast RCNN, IEEE International Conference on Computer Vision (ICCV), 2015.
[4] S. Ren, K. He, R. Girshick, et al. Faster RCNN: Towards real-time object detection with region proposal networks, NIPS, 2015.
[5] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks, NIPS, 2012.
[6] Z. W. Wu. Application of convolution neural network in image classification. University of Electronic Science and Technology of China, 2015.
[7] Q. W. Yao. Car face classification based on convolution neural network. Zhejiang University, 2016.
[8] Y. Xiong. Vehicle type classification based on depth learning. Huazhong University of Science and Technology, 2014.

[9] L. Deng, Z. J. Wang. Research on vehicle type classification based on deep convolution neural network, Application Research of Computers, 33(3): 930-932, 2016.
[10] J. Krause, M. Stark, D. Jia, et al. 3D object representations for fine-grained categorization, IEEE International Conference on Computer Vision Workshops. IEEE Computer Society, 2013: 554-561.
[11] L. Yang, P. Luo, C. L. Chen, et al. A large-scale car dataset for fine-grained categorization and verification, 3973-3981, 2015.
[12] T. Huang, H. Yu, Y. H. Tian, et al. Salient region detection and segmentation for general object classification and image understanding. Science China, 54(12): 2461-2470, 2011.
[13] F. Y. Zhang. Vehicle positioning and vehicle classification based on depth learning. Jiangsu University, 2016.
[14] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition. Computer Science, 2014.
[15] J. Krause, M. Stark, D. Jia, et al. 3D object representations for fine-grained categorization, IEEE International Conference on Computer Vision Workshops. IEEE, 2014: 554-561.
[16] M. Conos. Classification of vehicle make from a frontal view, Master's thesis, Czech Technical University in Prague, Czech Republic, 2006.
[17] A. Psyllos, et al. Vehicle model classification from frontal view image measurements, Computer Standards & Interfaces, 33: 142-151, 2011.
[18] Y. Peng, J. S. Jin, S. Luo, et al. Vehicle type classification using PCA with self-clustering, IEEE International Conference on Multimedia and Expo Workshops. IEEE Computer Society, 2012: 384-389.
[19] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In Advances in Neural Information Processing Systems, 2008-2016, 2015.

