Abstract: Vehicle detection and classification are important tasks in intelligent transportation systems. Traditional methods of vehicle detection and classification often produce coarse-grained results because they suffer from limited viewpoints. Inspired by the recent achievements of Deep Learning in image classification, this paper presents a method based on convolutional neural networks that consists of two steps: vehicle area detection and vehicle brand classification. Several typical network models are applied in the training and classification experiments for a detailed comparative analysis, including RCNN (Regions with Convolutional Neural Network features), Faster RCNN, AlexNet, Vggnet, GoogLenet and Resnet. The proposed method can identify vehicle models, brands and other information accurately; with the original dataset and the enriched dataset, the algorithm achieves an average accuracy of about 93.32% in the classification of six kinds of vehicle models.
Key Words: Convolutional neural network, Vehicle detection, Vehicle type classification
Fig. 2: Overview of the object detection method using Faster RCNN [4].
The Faster RCNN algorithm, as shown in figure 2, can be roughly divided into four steps:
1) Shared Convolution Layers
As a method of target detection based on convolutional neural networks, a set of basic convolution layers is used to extract the features of the input images, and these features are shared by the subsequent RPN layer and the fully connected layers.
2) Region Proposal Network
The RPN is a fully convolutional network that outputs a set of candidate regions together with an objectness score for each candidate region. A small network slides over the convolutional feature map output by the shared convolution layers and is fully connected to an n×n window of the input feature map. Each sliding window is mapped to a low-dimensional vector, which is then fed into two sibling layers: a regression layer (reg) and a classification layer (cls). At the location of each sliding window, k candidate regions (anchors) of different scales and aspect ratios are defined, so a convolutional feature map of size W×H yields WHk candidate boxes altogether (see the sketches after equation (1)).
3) Object Detection Network
This part mainly refers to the spatial pyramid pooling network [2], and introduces pooling over candidate regions: 1) the pooling layer is regarded as a single-level spatial pyramid; 2) the original image is convolved only once to obtain the features of the whole picture, onto which the candidate boxes are projected. The convolutional features of each candidate box are fed to the next layer. Using a fixed spatial grid of H×W, each RoI of size h×w is divided into sub-windows of size about h/H × w/W, and each sub-window is mapped to the corresponding grid cell by max pooling.
4) Classification
The feature map of each candidate box is used to compute its category, and the final exact position of the detection box is obtained by bounding-box regression. The multitask parallel loss function is shown in equation (1):
L(\{p_i\}, \{v_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(v_i, v_i^*)    (1)
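For concreteness, the following is a minimal PyTorch sketch of the RPN head described in step 2. It is an illustration, not the authors' implementation; the 256-channel width, the n = 3 window and the k = 9 anchors per location are the defaults of the Faster RCNN paper [4], assumed here.

```python
# Minimal sketch of the RPN head (step 2); widths and anchor count are
# assumed from the Faster RCNN defaults, not taken from this paper.
import torch
import torch.nn as nn

k = 9                                                  # anchors per location
inter = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # n x n sliding window, n = 3
cls_head = nn.Conv2d(256, 2 * k, kernel_size=1)        # objectness scores (cls)
reg_head = nn.Conv2d(256, 4 * k, kernel_size=1)        # box offsets (reg)

fmap = torch.randn(1, 256, 40, 60)                     # shared feature map, 40 x 60
h = torch.relu(inter(fmap))                            # low-dimensional vector per window
scores, offsets = cls_head(h), reg_head(h)
print(scores.shape, offsets.shape)                     # (1, 18, 40, 60), (1, 36, 40, 60)
# 40 * 60 * 9 = 21600 candidate boxes in total -- the WHk of the text.
```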
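Step 3 can likewise be illustrated with torchvision's ready-made roi_pool, a stand-in for the region pooling described above; the feature-map size, box coordinates and 7×7 output grid below are assumptions for the example.

```python
# Sketch of RoI pooling (step 3): each candidate region on the shared
# feature map is divided into a fixed H x W grid and max-pooled.
import torch
from torchvision.ops import roi_pool

fmap = torch.randn(1, 256, 50, 50)                   # shared convolutional feature map
boxes = torch.tensor([[0.0, 4.0, 4.0, 36.0, 28.0]])  # (batch_idx, x1, y1, x2, y2)
pooled = roi_pool(fmap, boxes, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)                                  # (1, 256, 7, 7), fixed-size output
```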
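Finally, a hedged sketch of how the multitask loss of equation (1) could be computed. The balancing weight λ = 10 and the normalization follow [4]; the tensor shapes are illustrative rather than the authors' settings.

```python
# Sketch of the multitask loss in equation (1). p: predicted objectness
# probabilities; p_star: 0/1 anchor labels; v, v_star: predicted and
# ground-truth box offsets. lam = 10 is the default suggested in [4].
import torch
import torch.nn.functional as F

def multitask_loss(p, p_star, v, v_star, lam=10.0):
    # L_cls: log loss over object / not-object, normalized by N_cls
    # through the mean reduction.
    l_cls = F.binary_cross_entropy(p, p_star)
    # L_reg: smooth-L1 over box offsets, counted only for positive
    # anchors (p_star = 1) and normalized by N_reg.
    n_reg = v.shape[0]
    l_reg = (p_star.unsqueeze(1) *
             F.smooth_l1_loss(v, v_star, reduction="none")).sum() / n_reg
    return l_cls + lam * l_reg

N = 8
p, p_star = torch.rand(N), (torch.rand(N) > 0.5).float()
v, v_star = torch.randn(N, 4), torch.randn(N, 4)
print(multitask_loss(p, p_star, v, v_star))
```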
3.3 Vehicle Detection
In the experiment, the images of the training dataset are taken as input, and the foreground area of the target vehicle is detected by RCNN and Faster RCNN, both of which use the AlexNet structure. The AlexNet structure is shown in Fig. 4.
The first five layers of AlexNet are convolution layers and the latter three layers are fully connected layers. The convolution layers alternate with max-pooling layers with a window size of 3×3 and a stride of 2. In AlexNet, pooling is used to reduce the number of neurons and the amount of computation, and to control over-fitting. Using ReLU (Rectified Linear Unit) as the activation function instead of the traditional logistic function makes forward propagation and the computation of gradients by partial derivatives easier, and it avoids expensive operations such as exponentiation and division; hidden-layer neurons whose output is less than 0 are discarded, which increases sparsity and alleviates over-fitting. The AlexNet network structure is adjusted for the experiments, and the extracted vehicle area is output as shown in Fig. 3.
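As an illustration of this kind of adjustment (a sketch under assumptions, not the authors' exact configuration), torchvision's stock AlexNet can be re-headed for the six vehicle classes used in the experiments:

```python
# Sketch: adapting a stock AlexNet (5 conv + 3 fully connected layers,
# ReLU activations, 3x3 max pooling with stride 2) to 6 vehicle classes.
import torch
import torch.nn as nn
from torchvision import models

net = models.alexnet(weights=None)         # layout as in Krizhevsky et al. [5]
net.classifier[6] = nn.Linear(4096, 6)     # replace the last FC layer

logits = net(torch.randn(1, 3, 224, 224))  # dummy 224x224 RGB input
print(logits.shape)                        # (1, 6)
```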
features in the convolution layer. Each convolution kernel
focuses on different features, so the convolution obtains
different feature maps. The feature maps are passed into the
latter convolution layer, and then the lower convolution layer
uses a certain number of convolution kernels to extract the
features of upper feature map, this operation will be
repeated. The weighting parameter will be got after the
features are processed for several times. Then the weighting
Loss
parameter will be connected to the full connection layer, the
full connection layer plays a role of classifier in the entire
convolution neural network, namely a model to classify the
extracted features. By using the method of Deep Learning,
the features ignored by the traditional methods can be
extracted, the recognition accuracy can be improved
obviously, and different convolution kernels are used to
obtain convolution feature diagrams, as shown in Fig .5.
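To make the feature-map idea concrete, the short sketch below (illustrative, not the authors' code) runs a single convolution layer and obtains one feature map per kernel, in the spirit of Fig. 5:

```python
# Sketch: each convolution kernel yields its own feature map.
import torch
from torchvision import models

net = models.alexnet(weights=None)
first_conv = net.features[0]   # Conv2d(3, 64, kernel_size=11, stride=4, padding=2)
fmaps = first_conv(torch.randn(1, 3, 224, 224))
print(fmaps.shape)             # (1, 64, 55, 55): 64 kernels, 64 feature maps
```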
The dataset of 3009 pictures collected by Stanford University is used to evaluate the trained models. We draw the loss curve and the accuracy curve of every classification and recognition network structure, which are helpful for understanding the training process of each model. From the comparison of the curves, it is clear that Resnet-101 achieves the fastest convergence, as shown in Fig. 6.
Fig. 6: Loss and accuracy curves of the trained models.
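A minimal sketch of how such curves can be drawn; the history values below are placeholders, not the paper's measurements:

```python
# Sketch: plotting the training loss and accuracy curves of Fig. 6.
import matplotlib.pyplot as plt

history = {"loss": [2.1, 1.4, 0.9, 0.6], "accuracy": [0.31, 0.55, 0.74, 0.86]}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(history["loss"]); ax1.set_xlabel("iteration"); ax1.set_ylabel("Loss")
ax2.plot(history["accuracy"]); ax2.set_xlabel("iteration"); ax2.set_ylabel("Accuracy")
plt.tight_layout(); plt.show()
```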
For an accurate analysis of the experimental results, we counted the recognition accuracy for each category. The result is shown in table 2. Chevrolet has a higher accuracy because it has more training images, from which more useful features can be extracted to facilitate generalization.
In order to reduce the impact of the data sample on classification accuracy, we mirrored (left-right flip) the 3049 training images, expanding the original dataset to 6098 training pictures. Several network models were used in the experiments. The experiments showed that the average accuracy of each classification model improved when Faster RCNN was concatenated after the sample training dataset was expanded. At the same time, the recognition accuracy of each model on real road vehicles also increased. The results are shown in table 3; for example, AlexNet's average accuracy increased from 65.2% to 73.55%, and Vggnet16's increased from 92.82% to 92.94%.
We apply the models trained on the mirrored image dataset to the real traffic road video images collected by ourselves. The collection of real traffic vehicle pictures is also tested. The results are shown in table 4. We can see that for the recognition of real traffic road images, the mirrored training image dataset obtains better classification accuracy. At the same time, Resnet101 and Vggnet19 reach the best accuracy, while AlexNet's accuracy is the lowest. The reason is that Resnet101 and Vggnet19 have more network layers and smaller convolution kernels, so they perform better feature extraction on images and complete the key steps of image recognition better.
Table 4: Average accuracy on real traffic vehicle images

Model        Original dataset    Mirrored dataset
AlexNet      54.33%              59.31%
Vggnet16     63.54%              80.13%
Vggnet19     69.67%              84.66%
GoogLenet    67.44%              83.24%
Resnet50     78.70%              83.21%
Resnet101    79.43%              86.78%
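The mirroring augmentation described above amounts to a left-right flip of every training image; a minimal sketch follows (the folder names are hypothetical, not the paper's layout):

```python
# Sketch of the mirroring augmentation: flip every training image
# left-right, turning 3049 pictures into 6098.
from PIL import Image, ImageOps
from pathlib import Path

src = Path("train_images")         # hypothetical source folder
dst = Path("train_images_mirror")  # hypothetical output folder
dst.mkdir(exist_ok=True)

for img_path in src.glob("*.jpg"):
    mirrored = ImageOps.mirror(Image.open(img_path))  # left-right flip
    mirrored.save(dst / f"{img_path.stem}_mirror.jpg")
```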
To understand the experimental results, we use a confusion matrix to analyze the specific error information. Taking the model Vggnet16 as an example, the specific information is shown in table 5. It can be seen that for the real road images, BMW has the highest classification accuracy, 84.9%, because BMW is more easily recognized by the appearance of the vehicle than other brands; at the same time, the probability of it being mistaken for Audi is the lowest, only 1.92%, because the appearance of the two
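A minimal sketch of this confusion-matrix analysis using scikit-learn; the labels and predictions here are toy values, not the paper's data:

```python
# Sketch: per-class confusion matrix as used in the error analysis (table 5).
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["Audi", "BMW", "Chevrolet"]    # illustrative subset of classes
y_true = np.array([0, 1, 2, 1, 0, 2, 1])  # ground-truth class indices
y_pred = np.array([0, 1, 2, 1, 1, 2, 1])  # model predictions
cm = confusion_matrix(y_true, y_pred)
# row i, column j: images of class i predicted as class j
per_class_acc = cm.diagonal() / cm.sum(axis=1)
print(cm, per_class_acc)
```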
However, the accuracy of real traffic vehicle identification is still not high enough. The main reasons are as follows:
1) The image definition is low, so vehicles are identified as other similar vehicle types.
2) The aspect ratio of some test images is so large that deformation is produced in the processing procedure.
3) The training dataset is still not large enough, leading to inaccurate localization and wrong classification of the test image dataset; the training sample is imbalanced, with few pictures of certain vehicle types.
4) The method of this paper is still unable to identify accurately when it comes to similar appearances of the same vehicle type, and extreme conditions such as fog and snow have not been further studied.
The following research will further address the problem that the accuracy of this method is not high in real traffic vehicle detection and recognition. With an increasing number of images in the training dataset, the vehicle recognition accuracy improves obviously. It is clear that we can enlarge the training dataset from real traffic surveillance, thus further improving the accuracy of this method.

5 Conclusions

Based on different CNNs, this paper studies vehicle model recognition and detection in urban traffic video surveillance. By means of an enlarged training dataset, the modification of parameters and the replacement of models, the proposed method is shown to be effective and is validated by the relevant experiments. The experimental results show that the vehicle recognition rate is obviously improved. In the next step of the study, we will apply the proposed method to real traffic flow detection, and further improve its accuracy and robustness.

References

[1] R. Girshick, J. Donahue, T. Darrell, et al. Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.
[2] K. He, X. Zhang, S. Ren, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition, ECCV, 2014.
[3] R. Girshick. Fast RCNN, IEEE International Conference on Computer Vision (ICCV), 2015.
[4] S. Ren, K. He, R. Girshick, et al. Faster RCNN: Towards real-time object detection with region proposal networks, NIPS, 2015.
[5] A. Krizhevsky, I. Sutskever, and G. Hinton. ImageNet classification with deep convolutional neural networks, NIPS, 2012.
[6] Z. W. Wu. Application of convolutional neural network in image classification, University of Electronic Science and Technology of China, 2015.
[7] Q. W. Yao. Car face classification based on convolutional neural network, Zhejiang University, 2016.
[8] Y. Xiong. Vehicle type classification based on deep learning, Huazhong University of Science and Technology, 2014.
[9] L. Deng, Z. J. Wang. Research on vehicle type classification based on deep convolutional neural network, Application Research of Computers, 33(3): 930-932, 2016.
[10] J. Krause, M. Stark, D. Jia, et al. 3D object representations for fine-grained categorization, IEEE International Conference on Computer Vision Workshops, IEEE Computer Society, 2013: 554-561.
[11] L. Yang, P. Luo, C. L. Chen, et al. A large-scale car dataset for fine-grained categorization and verification, 3973-3981, 2015.
[12] T. Huang, H. Yu, Y. H. Tian, et al. Salient region detection and segmentation for general object classification and image understanding, Science China, 54(12): 2461-2470, 2011.
[13] F. Y. Zhang. Vehicle positioning and vehicle classification based on deep learning, Jiangsu University, 2016.
[14] K. Simonyan, A. Zisserman. Very deep convolutional networks for large-scale image recognition, Computer Science, 2014.
[15] J. Krause, M. Stark, D. Jia, et al. 3D object representations for fine-grained categorization, IEEE International Conference on Computer Vision Workshops, IEEE, 2014: 554-561.
[16] M. Conos. Classification of vehicle make from a frontal view, Master's thesis, Czech Technical University in Prague, Czech Republic, 2006.
[17] A. Psyllos, et al. Vehicle model classification from frontal view image measurements, Computer Standards & Interfaces, 33: 142-151, 2011.
[18] Y. Peng, J. S. Jin, S. Luo, et al. Vehicle type classification using PCA with self-clustering, IEEE International Conference on Multimedia and Expo Workshops, IEEE Computer Society, 2012: 384-389.
[19] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks, Advances in Neural Information Processing Systems, 2008-2016, 2015.