Asım et al. - Unknown - A Vehicle Detection Approach using Deep Learning Methodologies-annotated
Abstract: The purpose of this study is to train a vehicle detector with the R-CNN and Faster R-CNN deep learning methods on sample vehicle data sets, and to optimize the success rate of the trained detector so that it provides efficient vehicle detection results when evaluated on the test data. The working method consists of six main stages. These are, respectively: loading the data set, designing the convolutional neural network, configuring the training options, training the Faster R-CNN object detector, and evaluating the trained detector. In addition, within the scope of the study, the Faster R-CNN and R-CNN deep learning methods are described, and the vehicle detection results obtained with them are compared experimentally.
The working method consists of six main stages. These are, respectively: loading the data set, designing the convolutional neural network, configuring the training options, training the Faster R-CNN object detector, and evaluating the trained detector. These stages, along with the conventional R-CNN and the Faster R-CNN methods, are discussed in this section.

2.1 R-CNN (Regions with Convolutional Neural Network Features)

The R-CNN approach combines two basic ideas. The first is to apply efficient convolutional neural networks to bottom-up region proposals in order to localize and segment objects. The second is that, when labeled training data is insufficient, supervised pre-training followed by domain-specific fine-tuning provides a significant performance improvement. The method is named R-CNN (Regions with CNN features) because region proposals are combined with CNNs.

The object detection system is composed of three modules. The first module produces category-independent region proposals; these define the set of candidate detections available to the detector. The second module is a convolutional neural network that produces a fixed-length feature vector from every region. The third module is a set of class-specific linear SVMs used to classify the regions [8].

2.1.1 Region Proposal

Various recent studies have provided methods to produce category-independent region proposals. Examples of these methods include the objectness of image windows [1], selective search for object recognition [3], and category-independent object proposals [4].

2.1.2 Feature Extraction

In this study, a feature vector of size 4096 was extracted from each region proposal with the Caffe deep learning framework. Features were computed by forward-propagating a mean-subtracted 227x227 red-green-blue image through five convolutional layers and two fully connected layers.

In order to compute features for a region proposal, the image data in the region is first converted into a form compatible with the CNN (in this study, a fixed input size of 227x227 pixels is used). Among the possible transformations of the arbitrarily shaped regions, the simplest was selected: all pixels in a tight bounding box around the candidate region are warped to the required size, regardless of the region's size or aspect ratio. Before warping, the tight bounding box is dilated so that there are w pixels of warped image context around the original box (w = 16 was used). In addition, a simple bounding-box regression was used to improve localization performance [13]. The regression weights are learned as shown in the following equation (1); the details of this equation can be seen in [8].

w_\star = \operatorname{argmin}_{\hat{w}_\star} \sum_i^N \left( t_\star^i - \hat{w}_\star^T \phi_5(P^i) \right)^2 + \lambda \lVert \hat{w}_\star \rVert^2    (1)

2.1.3 Classify Regions

In this study, selective search was performed on the test images to obtain approximately 2000 region proposals at test time. Each proposal was warped and propagated forward through the CNN to compute its features. Then, for each class, every extracted feature vector was scored using the support vector machine (SVM) trained for that class. Given all the scored regions in an image, greedy non-maximum suppression is applied independently for each class: a region is rejected if it has an intersection-over-union (IoU) overlap, above a learned threshold, with a higher-scoring selected region.
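The greedy non-maximum suppression step described above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; the corner-based box format and the example threshold value are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def greedy_nms(boxes, scores, iou_threshold=0.3):
    """Visit boxes from highest to lowest score; reject any box whose
    IoU with an already-kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, of two strongly overlapping detections scored 0.9 and 0.8, only the first is kept, while a distant third detection survives.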
2.1.4 R-CNN Training

The details of R-CNN training are discussed in the following sub-sections.

2.1.4.1 Supervised Pre-Training

The CNN was pre-trained on a large auxiliary data set (the ImageNet ILSVRC2012 classification data set [9,13]) using only image-level annotations. This pre-training was carried out using the Caffe deep learning framework.

2.1.4.2 Domain-Specific Fine-Tuning

In order to adapt the convolutional neural network to the new task and domain, SGD training of the network parameters was continued using only warped region proposals. The network's ImageNet-specific 1000-way classification layer was replaced with an (N+1)-way classification layer (N = 20 for VOC and N = 200 for ILSVRC2013); the rest of the network architecture was not changed.

All region proposals with an IoU overlap of 0.5 or more with a ground-truth box were treated as positives for that box's class, and all others as negatives. In each SGD iteration, 32 positive windows and 96 background windows are sampled to construct a mini-batch of size 128.

2.1.4.3 Object Category Classifiers

Here, a binary classifier was trained to detect cars. An image region that tightly encloses a car is a positive example; similarly, a background region that has nothing to do with cars is a negative example. How a region that partially overlaps a car should be labeled is less clear. This ambiguity is resolved by specifying an IoU overlap threshold: regions below the threshold are labeled as negatives, and those above it as positives. The overlap threshold of 0.3 was chosen by conducting a grid search on the validation set. Once the features are extracted and the training labels are applied, one linear SVM is optimized per class.

2.2 Faster R-CNN

Faster R-CNN is composed of two components. The first component is a fully convolutional network used to propose regions, called the RPN (Region Proposal Network); the second component is the Fast R-CNN detector, which uses the region proposals. The whole system is a single, unified network for object detection [10].

2.2.1 Region Proposal Networks (RPN)

In this study, the RPN takes an image as input and produces a set of rectangular object proposals, each with an objectness score. The RPN is designed as a fully convolutional network. Because the goal is to share computation with the Fast R-CNN object detection network, it is assumed that both networks share a common set of convolutional layers.

To generate region proposals, a small mini-network slides over the convolutional feature map output by the last shared convolutional layer. As input, it takes an n x n spatial window of the convolutional feature map (n = 3 is used). Each sliding window is mapped to a low-dimensional feature, which is fed into two sibling fully connected layers: a box-regression layer and a box-classification layer. Because the mini-network operates in a sliding-window fashion, the fully connected layers are shared across all spatial locations. This architecture is implemented as an n x n convolutional layer followed by two sibling 1 x 1 convolutional layers.

2.2.1.1 Training RPNs (Region Proposal Networks)

In this study, RPNs are trained end-to-end with backpropagation and SGD. To train the network, an "image-centric" sampling strategy is applied: each mini-batch arises from a single image that contains negative and positive example anchors.

It would be possible to optimize the loss functions of all the anchors, but this would bias the training towards negative examples, since they dominate. For this reason, 256 anchors are randomly sampled in an image instead, with at most 128 of them positive; if an image contains fewer than 128 positive samples, the mini-batch is padded with negative examples. In addition, following the multi-task loss of Fast R-CNN, an objective function is minimized. This loss function is shown in the following equation (2); the details of the equation can also be seen in [10].
L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*)    (2)

2.2.2 Sharing Features for RPN and Fast R-CNN

So far, it has been explained how to train a network to generate region proposals, without taking into account the region-based object detection network that will use them. Here, Fast R-CNN is used as the detection network. Next, a learning algorithm is defined for a unified network composed of the RPN and Fast R-CNN with shared convolutional layers.

If the RPN and Fast R-CNN are trained independently, their convolutional layers will be modified in different ways. For this reason, instead of learning two separate networks, techniques have been developed that allow the convolutional layers to be shared between the two networks. There are three techniques for training networks with shared features. These are, respectively, alternating training, approximate joint training, and non-approximate joint training.

2.3 Comparison of Faster R-CNN and R-CNN Methods

Today, the most sophisticated object detection networks are based on region proposal algorithms for hypothesizing and identifying object locations. As the running time of the detection networks themselves has been reduced, region proposal computation has been exposed as a bottleneck, and Faster R-CNN addresses exactly this bottleneck. In Faster R-CNN, a Region Proposal Network (RPN) shares full-image convolutional features with the detection network, so that region proposals are obtained at almost no extra cost. Unlike R-CNN, Faster R-CNN with its trained RPN does not require external region proposals. In addition, the RPN improves region proposal quality and thus the overall accuracy and speed of object detection.

3 Details of Implementation

This study aims to successfully train the vehicle detector on the sample vehicle data sets using the Faster R-CNN and R-CNN deep learning methods described above, and to achieve maximal vehicle detection results by testing the trained detector on the test data. In addition, the results obtained from these methods are compared through experimental analysis. To do this, the Caffe deep learning framework is used within the Matlab program.

Our application consists of six main steps: loading the data set, designing the convolutional neural network, configuring the training options, training the Faster R-CNN object detector, and evaluating the trained detector.

First, the data set is loaded. In this study, two different vehicle data sets were employed: the first contains approximately 350 images [11], and 1000 images were obtained from the second, public vehicle data set [12]. Each image in these data sets contains one or two tagged vehicle samples. The training data is stored in a table whose columns contain the paths of the image files and the ROI tags for the vehicles. In this section, the data set is also divided into training and test sets in order to train and evaluate the detector: 60% of the data is selected as the training set, and the remaining data is used as the test set for evaluating the detector's performance. Afterwards, the CNN is designed. In this phase, the type and size of the input layer are defined. For classification tasks, the input size is chosen as the size of the training images; for detection tasks, the CNN should analyze smaller portions of the image, so a 32x32 input size was chosen, similar to the smallest objects in the data set. Next, the middle layers of the network are defined; they are made of repeating blocks of convolutional, ReLU (rectified linear unit), and pooling layers. Finally, a final section consisting of fully connected layers and a softmax loss layer was created. The design of the CNN is completed by combining the input, middle, and final layers.
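The data-loading and splitting step described above (a ground-truth table of image paths and vehicle ROIs, divided 60%/40% into training and test sets) can be sketched as follows. This is a minimal Python sketch, not the authors' Matlab code; the record field names, the toy file paths, and the fixed random seed are illustrative assumptions.

```python
import random

def split_dataset(records, train_fraction=0.6, seed=0):
    """Shuffle ground-truth records, then split them into training and
    test sets. Each record mirrors one row of the ground-truth table:
    an image file path plus the vehicle ROI boxes in that image."""
    rng = random.Random(seed)          # fixed seed -> reproducible split
    shuffled = records[:]              # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# A toy ground-truth table of 10 images with one ROI each:
# 60% of the records go to training and the rest to testing.
table = [{"imageFilename": f"vehicles/img{i:04d}.png",
          "vehicle": [(10, 10, 50, 30)]} for i in range(10)]
train_set, test_set = split_dataset(table)
```

Because the seed is fixed, repeated runs produce the same split, which keeps the detector evaluation repeatable.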
References: