
A Vehicle Detection Approach using Deep Learning Methodologies

Abdullah Asım YILMAZ¹, Mehmet Serdar GÜZEL², İman ASKERBEYLİ³, Erkan BOSTANCI⁴
¹,²,³,⁴ Computer Engineering Department, Ankara University
Golbasi 50th Year Campus, I Block, 06830, Ankara, TR
Corresponding author email: [email protected]

Abstract: The purpose of this study is to successfully train a vehicle detector using the R-CNN and Faster R-CNN deep learning methods on sample vehicle data sets, and to optimize the success rate of the trained detector, obtaining efficient vehicle detection results by testing it on held-out test data. The working method consists of six main stages: loading the data set, designing the convolutional neural network, configuring the training options, training the Faster R-CNN object detector, training the R-CNN object detector, and evaluating the trained detectors. In addition, within the scope of the study, the Faster R-CNN and R-CNN deep learning methods are described and experimental comparisons are made with the results obtained for vehicle detection.

Keywords: Vehicle detection, Deep Learning, Convolutional Neural Network

1 Introduction

Today, many new technological developments have occurred. As a result of these developments, people may also face several crucial problems. Some of the negative consequences of such problems can be minimized through various detection approaches.

Despite technological and scientific progress, traffic accidents remain at the forefront of people's daily lives, both in our country and all over the world. Compared with air, sea and railway traffic, traffic on highways continues to be a significant problem in modern life [1]. Since daily life can hardly be imagined without driving, traffic accidents are regarded as an ordinary event that may affect anyone. For this reason, it is essential that any person involved in a traffic accident can present the possible defect situations objectively, in a technical, legal and scientific way. Vehicle detection and tracking systems therefore attract considerable attention from researchers: many studies have been carried out on vehicle detection and tracking, and new solutions and algorithms are developed day after day [1].

Deep learning and the related technologies are another intensively studied topic of recent years. Deep learning methods can be defined as methods based on artificial neural networks that comprise at least one hidden layer. They are capable of performing feature extraction automatically from large amounts of labelled training data. Deep learning methods utilize algorithms known as neural networks, which are inspired by the information processing mechanisms of biological nervous systems such as the brain, and these methods allow computers to learn what each data item represents and what each corresponding model actually means.

In this study, vehicle detection and deep learning approaches are combined. Our vehicle detector is trained individually and successfully on the sample vehicle data sets using the Faster R-CNN and R-CNN deep learning methods, respectively. The trained vehicle detector is tested on the test data, and efficient results are obtained for the vehicle detection problem. In addition, the detection success rate of the trained detector has been maximized as far as possible, and experimental comparisons are made between the results obtained from the two methods. Section 2 details the methods used, Section 3 presents the implementation details and the experimental results, and Section 4 concludes the study.
2 Methods used for Vehicle Detection

The working method consists of six main stages: loading the data set, designing the convolutional neural network, configuring the training options, training the Faster R-CNN object detector, training the R-CNN object detector, and evaluating the trained detectors. These stages, together with the conventional R-CNN method and the Faster R-CNN method, are discussed in this section.

2.1 R-CNN (Regions with Convolutional Neural Network Features)

The R-CNN approach combines two basic ideas. The first is to apply high-capacity convolutional neural networks to bottom-up region proposals in order to locate and segment objects. The second is that, when labelled training data is scarce, supervised pre-training followed by domain-specific fine-tuning provides a significant performance improvement. The method is named R-CNN (Regions with CNN features) because region proposals are combined with CNNs.

The object detection system is composed of three modules. The first module generates category-independent region proposals, which define the set of candidate detections available to the detector. The second module is a convolutional neural network that produces a fixed-length feature vector from every region. The third module is a set of class-specific linear SVMs used to classify the regions [8].

2.1.1 Region Proposal

Various recent studies have provided methods for generating category-independent region proposals. Examples include the objectness of image windows [1], selective search for object recognition [3], category-independent object proposals [4], object segmentation using constrained parametric min-cuts (CPMC) [5], and multiscale combinatorial grouping [6]. Related approaches detect cells by applying a convolutional neural network to regularly spaced square crops, which are a special case of region proposals [7]. Although R-CNN is not tied to a particular region proposal method, selective search is used in this work to allow a controlled comparison with prior detection studies.

2.1.2 CNN (Convolutional Neural Network) for Feature Extraction

In this study, a feature vector of size 4096 is extracted from each region proposal with the Caffe deep learning framework. Features are computed by forward-propagating a mean-subtracted 227x227 red-green-blue image through five convolutional layers and two fully connected layers.

In order to compute features for a region proposal, the image data in that region must first be converted into a form compatible with the CNN input (in this study, a fixed input size of 227x227 pixels is used). The simplest of the possible transformations of the arbitrarily shaped regions was selected: all pixels in a tight bounding box around the candidate region are warped to the required size, regardless of the size or aspect ratio of the region. Before warping, the tight bounding box is dilated so that there are w pixels of warped image context around the original box (w = 16 was used). In addition, a simple bounding-box regression stage was used to improve localization performance [13]. This is shown in equation (1); the details of the equation can be found in [8].
w_\star = \operatorname{argmin}_{\hat{w}_\star} \sum_{i=1}^{N} \left( t_\star^{i} - \hat{w}_\star^{\top} \phi_{5}(P^{i}) \right)^{2} + \lambda \, \lVert \hat{w}_\star \rVert^{2}                (1)
2.1.3 Classify Regions

In this study, selective search was run on the test images to obtain approximately 2000 region proposals per image at test time. Each proposal is warped and forward-propagated through the CNN to compute its features. Then, for each class, every extracted feature vector is scored using the SVM trained for that class. Given all the scored regions in an image, greedy non-maximum suppression is applied independently for each class: a region is rejected if it has an intersection-over-union (IoU) overlap, above a learned threshold, with a selected region that has a higher score.
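To make the greedy suppression step concrete, the following MATLAB sketch applies it to the scored proposals of one class. The function name, the [x y width height] box convention and the 0.3 threshold in the usage comment are illustrative assumptions rather than the authors' exact implementation; MATLAB's Computer Vision Toolbox also offers selectStrongestBbox for the same purpose.

function keptBoxes = greedyNMS(boxes, scores, iouThreshold)
% greedyNMS  Greedy non-maximum suppression (illustrative sketch).
%   boxes        - N x 4 proposals in [x y width height] format (assumed)
%   scores       - N x 1 class scores produced by the SVM
%   iouThreshold - reject a proposal whose IoU with an already kept,
%                  higher-scoring proposal exceeds this value
% Example (threshold value assumed): kept = greedyNMS(proposals, svmScores, 0.3);

[~, order] = sort(scores, 'descend');       % visit proposals from best to worst
kept = false(numel(scores), 1);

for k = 1:numel(order)
    i = order(k);
    suppressed = false;
    for j = find(kept)'                     % compare with already kept proposals
        if boxIoU(boxes(i, :), boxes(j, :)) > iouThreshold
            suppressed = true;
            break;
        end
    end
    kept(i) = ~suppressed;
end
keptBoxes = boxes(kept, :);
end

function iou = boxIoU(a, b)
% Intersection-over-union of two [x y width height] boxes.
xA = max(a(1), b(1));                yA = max(a(2), b(2));
xB = min(a(1) + a(3), b(1) + b(3));  yB = min(a(2) + a(4), b(2) + b(4));
interArea = max(0, xB - xA) * max(0, yB - yA);
iou = interArea / (a(3) * a(4) + b(3) * b(4) - interArea);
end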
2.1.4 R-CNN Training

The details of R-CNN training are discussed in the following sub-sections.

2.1.4.1 Supervised Pre-training

The CNN was first pre-trained on a large auxiliary data set, the ImageNet ILSVRC2012 classification data set [9,13], using only image-level annotations. This pre-training was carried out with the Caffe deep learning framework.

2.1.4.2 Domain-Specific Fine-Tuning

In order to adapt the CNN to the new task and the new domain, SGD training of the network parameters was continued using only warped region proposals. The CNN's ImageNet-specific 1000-way classification layer was replaced with an (N+1)-way classification layer, where the extra class accounts for background (N = 20 for VOC and N = 200 for ILSVRC2013); the rest of the CNN architecture was not changed.

All region proposals with an IoU overlap of 0.5 or greater with a ground-truth box are treated as positives for that box's class, and the rest as negatives. In each SGD iteration, 32 positive windows and 96 background windows are sampled to form a mini-batch of size 128.
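As a concrete illustration of this sampling scheme, the MATLAB sketch below assembles one such fine-tuning mini-batch from pre-computed overlap values; the variable names and the maxIoU input are hypothetical, and in the actual pipeline of [8] this happens inside the SGD loop.

% Illustrative sketch: build one 128-window fine-tuning mini-batch.
% maxIoU is an N x 1 vector holding, for each warped region proposal, its
% maximum IoU overlap with any ground-truth vehicle box (hypothetical input).
posIdx = find(maxIoU >= 0.5);        % proposals treated as positives
negIdx = find(maxIoU <  0.5);        % everything else is background

numPos = min(32, numel(posIdx));     % up to 32 positive windows
numNeg = 128 - numPos;               % fill the rest with background windows
                                     % (assumes enough background proposals exist)
batchPos = posIdx(randperm(numel(posIdx), numPos));
batchNeg = negIdx(randperm(numel(negIdx), numNeg));
miniBatch = [batchPos; batchNeg];    % indices of the 128 sampled windows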
2.1.4.3 Object Category Classifiers

Here, a binary classifier is trained to detect cars. An image region that tightly encloses a car is a positive example; similarly, a background region that has nothing to do with cars is a negative example. How to label a region that only partially overlaps a car is less clear, and this ambiguity is resolved by specifying an IoU overlap threshold: regions below the threshold are labelled as negatives and regions above it as positives. The overlap threshold of 0.3 was chosen by a grid search on the validation set. Once the features are extracted and the training labels are applied, one linear SVM is optimized per class.

2.2 Faster R-CNN

Faster R-CNN is composed of two components. The first component is a deep, fully convolutional network used to propose regions, called the RPN (Region Proposal Network), and the second component is the Fast R-CNN detector that uses the proposed regions. The whole system is a single, unified network for object detection [10].

2.2.1 Region Proposal Networks (RPN)

In this study, the RPN takes an image as input and produces a set of rectangular object proposals, each with an objectness score. The RPN is designed as a fully convolutional network. Because its computations are to be shared with the Fast R-CNN object detection network, it is assumed that both networks share a common set of convolutional layers.

To generate region proposals, a small network slides over the convolutional feature map output by the last shared convolutional layer. As input, it takes an n x n spatial window of the convolutional feature map (n = 3 is used in this work). Each sliding window is mapped to a low-dimensional feature, which is fed into two sibling fully connected layers: a box-regression layer and a box-classification layer. Because the fully connected layers are shared across all spatial locations, this mini-network is implemented as an n x n convolutional layer followed by two sibling 1 x 1 convolutional layers.

2.2.1.1 Training RPNs

In this study, RPNs are trained end-to-end with backpropagation and SGD. To train the network, an "image-centric" sampling strategy is applied: each mini-batch arises from a single image that contains both positive and negative example anchors.

It would be possible to optimize the loss functions of all anchors, but this would bias the training towards negative examples, since they dominate. Instead, 256 anchors are randomly sampled in an image, using up to 128 positives; if an image contains fewer than 128 positive anchors, the mini-batch is padded with negative ones. In addition, following the multi-task loss of Fast R-CNN, the objective function to be minimized is the loss shown in equation (2); the details of the equation can be found in [10].
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_{i} L_{cls}(p_i, p_i^{*}) + \lambda \, \frac{1}{N_{reg}} \sum_{i} p_i^{*} \, L_{reg}(t_i, t_i^{*})                (2)
2.2.2 Sharing Features for RPN and Fast R-CNN

So far, it has been explained how to train a network for region proposal generation, without considering the region-based object detection network that will use these proposals. Here, Fast R-CNN is adopted as the detection network, and a unified network consisting of the RPN and Fast R-CNN with shared convolutional layers is then learned.

If the RPN and Fast R-CNN are trained independently, their convolutional layers are modified in different ways. For this reason, instead of learning two separate networks, techniques have been developed that allow the convolutional layers to be shared between the two networks. There are three such techniques for training networks with shared features: alternating training, approximate joint training, and non-approximate joint training.
2.3 Comparison of Faster R-CNN and R-CNN Methods

Today, the most advanced object detection networks rely on region proposal algorithms to hypothesize and identify object locations. By reducing the running time of the detection networks relative to R-CNN, Faster R-CNN exposed the region proposal computation as the remaining bottleneck. In Faster R-CNN, a Region Proposal Network (RPN) is therefore introduced that shares full-image convolutional features with the detection network, so that region proposals are obtained at almost no extra cost. Thanks to the improved RPN, Faster R-CNN does not require external region proposals, unlike R-CNN. In addition, the RPN improves the quality of the region proposals and thereby improves the overall accuracy and speed of object detection.

3 Details of Implementation

This study aims to train the vehicle detector on the sample vehicle data sets using the Faster R-CNN and R-CNN deep learning methods described in Section 2, and to achieve maximal vehicle detection results by testing the trained detector on the test data. The results obtained from the two methods are then compared through experimental analysis. To do this, the Caffe deep learning framework is used within the MATLAB environment.

The application consists of six main steps: loading the data set, designing the convolutional neural network, configuring the training options, training the Faster R-CNN object detector, training the R-CNN object detector, and evaluating the trained detectors.

First, the data set is loaded. In this study, two different vehicle data sets were employed: the first data set includes approximately 350 images [11], and 1000 images are obtained from the second, public vehicle data set [12]. Each image in these data sets contains one or two labelled vehicle samples. The training data is stored in a table whose columns contain the paths of the image files and the ROI labels of the vehicles. The data set is then divided into training and test sets in order to train the detector and evaluate it: 60% of the data is selected as the training set, and the remaining data is used as the test set for evaluating the detector's performance.
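A minimal MATLAB sketch of this loading and splitting step is given below. The MAT-file name, the variable vehicleDataset and its column layout are illustrative assumptions (the public MATLAB data set [11] is distributed as a table with an image-path column and a vehicle bounding-box column), not the authors' exact code.

% Illustrative sketch: load the labelled vehicle data and split it 60/40.
data = load('vehicleGroundTruth.mat');   % assumed MAT-file holding the table
vehicleDataset = data.vehicleDataset;    % table: image path | vehicle ROI boxes
% Each row pairs an image file path with an M x 4 matrix of
% [x y width height] vehicle bounding boxes.

rng(0);                                  % fixed seed for a repeatable split
idx = randperm(height(vehicleDataset));  % shuffle the rows
numTrain = round(0.6 * height(vehicleDataset));
trainingData = vehicleDataset(idx(1:numTrain), :);        % 60% for training
testData     = vehicleDataset(idx(numTrain+1:end), :);    % 40% for evaluation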
Next, the design of the CNN is performed. In this phase, the type and size of the input layer are defined. For classification tasks, the input size is usually chosen as the size of the training images; for detection tasks, however, the CNN has to analyze smaller portions of the image, so a 32x32 input size is chosen here, similar to the smallest objects in the data set. The middle part of the network is then defined; it is made of repeated blocks of convolutional, ReLU (rectified linear unit) and pooling layers. Finally, a final part consisting of fully connected layers and a softmax loss layer is created, and the design of the CNN is completed by combining the input, middle and final layers.
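In MATLAB, the layer composition described above might look as follows; the filter counts and kernel sizes are illustrative assumptions, since the paper only specifies the 32x32 input and the repeating convolution/ReLU/pooling pattern.

% Illustrative sketch of the CNN described above (filter sizes assumed).
inputLayer = imageInputLayer([32 32 3]);           % 32x32 RGB input

middleLayers = [
    convolution2dLayer(3, 32, 'Padding', 1)        % convolution block 1
    reluLayer()
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 64, 'Padding', 1)        % convolution block 2
    reluLayer()
    maxPooling2dLayer(2, 'Stride', 2)
    ];

finalLayers = [
    fullyConnectedLayer(64)                        % fully connected part
    reluLayer()
    fullyConnectedLayer(2)                         % two outputs: vehicle, background
    softmaxLayer()
    classificationLayer()
    ];

layers = [inputLayer; middleLayers; finalLayers];  % complete network definition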

The third step is the configuration of the training options. At this stage, the training options for the Faster R-CNN method are configured in four stages: in the first two stages, the region proposal network and the detection network used in Faster R-CNN are trained, and in the last two stages the networks from the first two stages are merged to form a single network. The training options for the R-CNN method are then configured in a single stage. Here, the network training algorithm is configured as SGDM (stochastic gradient descent with momentum) with an initial learning rate of 0.001.
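Assuming MATLAB's trainingOptions interface, the solver configuration described above could be sketched as follows; only the SGDM solver and the 0.001 initial learning rate come from the paper, and the remaining values are placeholders.

% Illustrative sketch: SGDM training options with the stated learning rate.
% The same kind of options object is used for each of the four Faster R-CNN
% training stages and for the single R-CNN training stage.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...   % initial learning rate from the paper
    'MaxEpochs', 10, ...            % assumed placeholder value
    'CheckpointPath', tempdir);     % where intermediate detectors are saved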

In the fourth and fifth steps, the Faster R-CNN and R-CNN object detectors are trained. At this stage, image patches are extracted from the training data during the training process. Two name-value pairs are used to control which image patches are used: positive training samples are those that overlap the ground-truth boxes by 0.6 to 1.0, as measured by the bounding-box intersection-over-union metric, while negative training samples are those that overlap by 0 to 0.3. The best values for these parameters were selected by testing the trained detector on a validation set.
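Assuming the MATLAB Computer Vision Toolbox training functions, this step might be sketched as follows; trainingData, layers and options are the objects from the previous steps, and the overlap ranges are the ones stated above.

% Illustrative sketch: train both detectors with the stated overlap ranges.
% Positive samples overlap the ground truth by 0.6-1.0 IoU, negatives by 0-0.3.
fasterDetector = trainFasterRCNNObjectDetector(trainingData, layers, options, ...
    'PositiveOverlapRange', [0.6 1], ...
    'NegativeOverlapRange', [0 0.3]);

rcnnDetector = trainRCNNObjectDetector(trainingData, layers, options, ...
    'PositiveOverlapRange', [0.6 1], ...
    'NegativeOverlapRange', [0 0.3]);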
Finally, the trained Faster R-CNN and R-CNN detectors are evaluated. At this stage, the detection results are collected and evaluated by running the trained detectors on the test set.
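Using the toolbox evaluation utilities, this final step might look like the sketch below; the loop over the test table and the variable names are assumptions, while evaluateDetectionPrecision is the standard routine for computing the average precision and the precision/recall data.

% Illustrative sketch: run a trained detector over the test set and compute
% the average precision and the precision/recall curve.
numTest = height(testData);
results = table('Size', [numTest 2], ...
    'VariableTypes', {'cell', 'cell'}, ...
    'VariableNames', {'Boxes', 'Scores'});

for i = 1:numTest
    I = imread(testData.imageFilename{i});          % read one test image
    [bboxes, scores] = detect(fasterDetector, I);   % run the trained detector
    results.Boxes{i}  = bboxes;
    results.Scores{i} = scores;
end

expectedResults = testData(:, 2:end);               % ground-truth vehicle boxes
[ap, recall, precision] = evaluateDetectionPrecision(results, expectedResults);

plot(recall, precision)                             % precision/recall curve
xlabel('Recall'); ylabel('Precision');
title(sprintf('Average precision = %.2f', ap));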
4 Conclusion

The proposed vehicle detector has been successfully trained with the Faster R-CNN and R-CNN deep learning methods on the sample vehicle data sets, and vehicle detection has been performed successfully by testing the trained detector on the test data sets. As the output of the study, example detection frames are shown in Figure 1 and Figure 5 for Faster R-CNN and in Figure 2 and Figure 6 for R-CNN.

Figure 1: Result frame for Faster R-CNN on test data set 1 [10]

Figure 2: Result frame for R-CNN on test data set 1 [10]

Besides, a Precision/Recall (PR) curve is created to show how sensitive the detector is at various levels of recall; the mean average precision (mAP) values obtained with Faster R-CNN on the two data sets are approximately 0.73 and 0.76, and those obtained with R-CNN are approximately 0.64 and 0.65. The Precision/Recall (PR) curves are shown in Figure 3 and Figure 7 for Faster R-CNN and in Figure 4 and Figure 8 for R-CNN. Furthermore, according to the results shown in Table 1, the results obtained with the Faster R-CNN method have a higher detection quality and a higher mean average precision (mAP) than those obtained with the R-CNN method. In addition, it has been observed that object detection is faster and more reliable with the Faster R-CNN method.
Figure 3: Precision/Recall (PR) curve for Faster R-CNN on data set 1 [10]

Figure 4: Precision/Recall (PR) curve for R-CNN on data set 1 [10]

Figure 5: Result frame for Faster R-CNN on test data set 2 [12]

Figure 6: Result frame for R-CNN on test data set 2 [12]

Figure 7: Precision/Recall (PR) curve for Faster R-CNN on data set 2 [12]

Figure 8: Precision/Recall (PR) curve for R-CNN on data set 2 [12]
Table 1: Comparison of the Faster R-CNN and R-CNN methods

Method          Data set           mAP (%)   Test time (h)
Faster R-CNN    Data set 1 [10]     72.8         0.59
Faster R-CNN    Data set 2 [12]     75.7         0.74
R-CNN           Data set 1 [10]     64.7         5.25
R-CNN           Data set 2 [12]     65.3         6.67

References:

[1] B. Alexe, T. Deselaers, V. Ferrari, "Measuring the objectness of image windows", TPAMI, 2012.

[2] M. S. Güzel, "Versatile Vehicle Tracking and Counting Application", Karaelmas Science and Engineering Journal, 7(2), 622-626, 2017.

[3] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders, "Selective Search for Object Recognition", International Journal of Computer Vision, Vol. 104, pp. 154-171, 2013.

[4] I. Endres, D. Hoiem, "Category independent object proposals", ECCV, 2010.

[5] J. Carreira, C. Sminchisescu, "CPMC: Automatic object segmentation using constrained parametric min-cuts", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, pp. 1312-1328, 2012.

[6] P. Arbelaez, J. Pont-Tuset, J. Barron, F. Marques, and J. Malik, "Multiscale combinatorial grouping", CVPR, 2014.

[7] D. Cireşan, A. Giusti, L. Gambardella, and J. Schmidhuber, "Mitosis detection in breast cancer histology images with deep neural networks", MICCAI, 2013.

[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2014.

[9] ImageNet Classes Data Set. Available at: "http://image-net.org/"

[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks", NIPS, 2015.

[11] Vehicle Detection Data Set, MATLAB Official Web Site. Available at: "https://www.mathworks.com/", 2017.

[12] Stanford Vehicle Data Set. Available at: http://ai.stanford.edu/~jkrause/cars/car_dataset.html, 2018.

[13] J. Donahue, "Transferrable Representations for Visual Recognition", PhD Thesis, University of California, Berkeley, 2017.
