
Digital Object Identifier 10.1109/ACCESS.2019.2907984

Robust inter-vehicle distance estimation method based on monocular vision
LIQIN HUANG¹, TING ZHE¹, JUNYI WU¹, QIANG WU² (Senior Member, IEEE), CHENHAO PEI¹, AND DAN CHEN¹
¹College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China
²School of Electrical and Data Engineering, University of Technology Sydney, Sydney, Australia
Corresponding author: DAN CHEN ([email protected]).
This work was supported in part by the Major Science and Technology Projects in Fujian, China under Grant 2018H0018.

ABSTRACT Advanced driver assistance systems (ADAS) based on monocular vision have rapidly become a popular research subject. In ADAS, inter-vehicle distance estimation from an in-car camera based on monocular vision is critical. At present, monocular methods for measuring the absolute distance of vehicles ahead suffer from low ranging accuracy and a large deviation in the ranging results between different types of vehicles, and they are easily affected by changes in the attitude angle. To improve the robustness of the distance estimation system, this study proposes an improved monocular-vision distance estimation method, based on the detection and segmentation of the target vehicle, that addresses the vehicle attitude angle problem. An angle regression network (ARN) is used to obtain the attitude angle information of the target vehicle, and a dimension estimation network determines its actual dimensions. Then, a 2D base vector geometric model is designed in accordance with the image analytic geometry principle to accurately recover the back area of the target vehicle. Lastly, “area–distance” modeling based on the principle of camera projection is performed to estimate distance. Experimental results on the real-world computer vision benchmark KITTI indicate that our approach achieves superior performance compared with other existing published methods for different types of vehicles (including front and sideway vehicles).

INDEX TERMS Attitude angle information, Distance estimation, Instance segmentation, Monocular vision.

I. INTRODUCTION
Research on advanced driver assistance systems (ADAS) is developing rapidly. ADAS play an important role in reducing traffic accidents, preventing rear-end collisions between vehicles [35], and improving traffic safety performance. Inter-vehicle distance estimation is a crucial part of ADAS. Distance estimation methods can be divided into two major classes: sensor-based [9] [12] and vision-based [10] [21] systems. Sensor-based systems use sensors, such as RADAR and LIDAR [19], to accurately provide the distance information of a target vehicle. However, high cost and target vehicle data collection remain critical issues. Meanwhile, vision-based systems are typically divided into two classes: stereo vision [13] [27] and monocular vision [11] [14] [15]. Stereo vision can calculate the distance of vehicles ahead more intuitively and accurately over long ranges. However, due to the calibration and matching required between two cameras, stereo vision systems need a long execution time and exhibit low efficiency and considerable computational complexity. Monocular vision can tolerate a complicated algorithm and can obtain an optimal result within a shorter time than stereo vision [37]. However, current monocular distance estimation methods still suffer from problems such as low precision and a narrow application range. A monocular-vision-assisted driving system can efficiently maintain real-time performance because it conforms to the human visual system; moreover, it adapts to the operating scenes of modern vehicles and demonstrates considerable development prospects compared with other systems. Consequently, inter-vehicle distance estimation based on monocular vision has become a popular research topic.

To satisfy the positioning requirements of monocular-vision-assisted driving systems, many distance estimation


methods based on monocular vision have been proposed recently. On the basis of their experimental results, monocular visual distance estimation methods can be approximately divided into relative depth estimation and absolute distance estimation. Relative depth estimation [22] [23] mostly predicts the depth of each pixel in an image, expresses the foreground and background through different gray values, and finally outputs a depth map of the depth variation of the entire scene. However, several articles, such as [24], have reported that this type of method primarily restores the relative relationship between the target vehicle and the subject vehicle but does not obtain the absolute distance (in meters) of the target vehicle in the scene. A depth map also contains redundant information, such as the sky, distant buildings, and street trees, which exerts minimal effect on distance measurement in traffic scenarios. Nevertheless, such information reduces the efficiency of estimating the distance of the target vehicle.

An absolute distance estimation method can obtain the absolute distance of the target vehicle ahead. These methods fall into two categories in accordance with the system model, namely, methods based on a network model and those based on a geometric model. The use of machine learning methods, such as neural networks [14] and cascade classifiers [15] [38], was originally proposed to obtain a network model for distance estimation. These methods require many positive and negative samples during the training process to ensure good accuracy; however, a satisfactory ranging result cannot be produced. Thereafter, distance estimation based on the geometric model was proposed. In accordance with different projection principles, geometric methods can be further divided into those based on the inverse perspective mapping (IPM) principle and those based on the camera projection principle. In target vehicle distance estimation based on the IPM principle [1] [3] [4], the original image is converted into a bird's eye view via IPM transformation to restore the information of the road plane. Then, the distance of the target vehicle is calculated using the IPM image obtained after the conversion. However, this approach has two disadvantages: 1) the image brightness requirement is relatively high; when the acquired image brightness is low, detection performance drops and distance estimation accuracy is reduced; and 2) the size of the converted image changes, which causes some of the target vehicles in the original image to be lost in the IPM image, thereby limiting the estimation range of the system.

To address the aforementioned problems, a distance estimation method for the vehicle ahead based on the camera projection principle was proposed in [2] using vehicle width estimation and by comprehensively considering two road environments, i.e., with and without lane markings. By detecting the positions of the vehicle ahead [36] and the vanishing point, a method based on the camera projection geometry model was developed in [7] for measuring the longitudinal distance of the vehicle ahead. However, this method cannot obtain detailed information of the target vehicle and acquires redundant information, thereby resulting in low ranging accuracy. Accordingly, a concept based on instance segmentation was proposed in [8], using a projection geometry model established from the projected area to estimate the distance of the vehicle ahead. Compared with the methods presented in [2] [7], redundant information can be reduced and ranging accuracy can be improved. However, no modeling analysis is performed for the mechanism of attitude angle change, so the ranging result of a non-front target vehicle exhibits a considerably large error, and the application range of the distance estimation method is insufficiently wide. In [8], to acquire the actual dimensions of a vehicle, the vehicle type is first obtained using a vehicle classification network; the actual dimensions of the target vehicle are then obtained by matching the previously calculated dimension information of different types of vehicles.

FIGURE 1: (a), (c) partial scenes of the vehicle taken along the direction of the camera's optical ray; (b), (d) the driving situation of the vehicle in the whole scene.

Figure 1 shows the change in the attitude angle of the target vehicle when it is in the front and sideway positions relative to the subject vehicle. As shown in Figs. 1(a) and 1(c), the projection relationship of each part of the vehicle is different, and the corresponding projection information in the image also varies. When the method presented in [8] is used, this results in a considerable difference in the accuracy of the distance estimation results among various vehicles, and the accuracy of the overall result decreases. Therefore, the current study proposes a monocular vision vehicle distance estimation method that integrates vehicle attitude angle information. To improve the efficiency of the distance estimation system, we enhance the method for obtaining vehicle dimension information in [8] by applying the advantages of a deep network framework and using the KITTI dataset to train a dimension estimation network that outputs the actual dimensions of a vehicle, thereby improving the detection efficiency of the system.

The rest of this paper is organized as follows. Section

II briefly discusses related studies and our contributions, primarily reviewing the progress of previous work in the field of distance estimation research. Section III explains the entire distance estimation system and the method for each block. Section IV introduces the research environment and experimental results to verify the accuracy and robustness of the system. The conclusions of the study and future work are described in Section V.

II. RELATED WORK
In recent years, research on methods for estimating the distance of vehicles with monocular vision based on geometric models, which are largely divided into the inverse perspective mapping transformation method [3], the projection geometric relationship method [5], and the fitting modeling method [16], has achieved considerable results. In [4] [17] [18] [20], distance estimation methods based on IPM were proposed. The difference among these studies is the object detection method, such as the road removal algorithm [4] [17], threshold adjustment [18], and hue, saturation, and value (HSV) color mapping [20]. The overall idea is to convert the original image into a bird's eye view, which approximates the image obtained from a top-down observation of the scene; the IPM image obtained through conversion, using the camera's internal and external parameters, is then used to calculate the distance of the target vehicle. The approach is simple and feasible but does not consider the vehicle's attitude angle information when moving, thereby resulting in a considerable distance error when the vehicle moves.

Subsequently, to avoid the problem of conversion between images, [5] [6] [7] proposed to establish a geometric model based on the principle of camera perspective projection to estimate the distance of the target vehicle. Nakamura et al. [5] presented a monocular vision vehicle distance estimation method based on the triangular geometric relationship between the horizontal and vertical directions to estimate vehicle width. However, this method reduces only the error of vehicle width estimation during the tracking process and does not consider the change in the attitude angle produced by the vehicle during driving; thus, a considerable error is produced in estimating the distance of a non-front target vehicle. Bao et al. [6] developed a monocular vision ranging method based on the linear relationship between the average vehicle width and the actual distance of the vehicle; however, this method does not consider the attitude change of the vehicle during driving. Moreover, the average width of the vehicle in the image can only guarantee the average ranging accuracy, not the accuracy of a single vehicle's ranging result. Huang et al. [7] established a method for measuring the longitudinal distance of the target vehicle ahead based on the vanishing point of the lane line by detecting the positions of the vehicle and the vanishing point. This method must accurately detect the vanishing point position to ensure the accuracy of the ranging result for the vehicle ahead. However, the deflection of the vehicle during driving is not considered, and the method is applicable only to the front vehicle.

Subsequently, [1] presented a geometric model based on obtaining vehicle position and lane line information while using vehicle height to measure the distance of the vehicle ahead by modeling both the original and IPM images, thereby solving the information loss problem in image conversion. A distance estimation method based on vehicle detection information using vehicle width was developed in [2]; the method comprehensively considers two road environments, namely, with and without lane markings. However, the methods proposed in [1] and [2] represent the position of the target vehicle in the image as a rectangle; thus, many details of the target vehicle cannot be obtained, and a considerable amount of redundant information is included. Huang et al. [8] suggested measuring the target vehicle's distance based on vehicle segmentation information using the projected area. Compared with methods that use vehicle height or width, redundant information can be reduced to improve ranging accuracy. However, the method developed in [8] disregards the attitude angle problem of a vehicle, and the applicable range of the system has limitations: it is mostly applicable to front vehicles, whereas the estimation result for sideway vehicles acquires a large error. Moreover, estimating distance by recovering the original projection information of an occluded vehicle from the overlapping area of the rectangles reduces ranging accuracy due to inaccurate labeling of the rectangles.

In summary, the major contributions of our work are as follows.

• To improve the accuracy of the estimated results and the robustness of the system for vehicles under different driving conditions, this study proposes to integrate vehicle attitude angle information with vehicle segmentation information to realize distance estimation of the vehicle ahead.
• A 2D base vector geometric model is designed in accordance with the principle of image analytic geometry, and the relationship between the back of the vehicle and the overall projection information of the vehicle is obtained. This relationship is used to determine the projected area of the back of the vehicle.
• The method for obtaining vehicle size information is improved to enhance system efficiency; the actual dimensions of a vehicle are acquired through our trained dimension estimation network.
• The test results on the KITTI benchmark dataset show that the error rate for sideway vehicle ranging is less than 5%, and the accuracy deviation among vehicles under different driving conditions is less than 2%. These results considerably narrow the deviation of ranging accuracy among different types of vehicles, overcome the limitations of, and exceed the accuracy of, existing distance estimation methods.

FIGURE 2: Distance estimation system framework.

III. SYSTEM MODEL
A. SYSTEM OVERVIEW
The primary reason for the problems in existing distance estimation methods is that vehicle attitude angle information is not considered. In complex traffic scenarios, the driving conditions of the target vehicle vary relative to the subject vehicle, and the attitude angle information of different types of vehicles and their projection relationships in the image are different. As shown in Fig. 1, the target vehicle moves in the front and sideway positions of the subject vehicle, respectively. In the sideway position, the projected parts of the vehicle in the image are not formed simply by the projection of the vehicle's actual back, as they are for the front vehicle. If the projection relationship is assumed to be the same, i.e., the projected areas of both types of vehicles are treated as being formed by the back of the vehicle, then the ranging results for the two vehicles deviate considerably in accuracy, decreasing the accuracy of the entire distance estimation system.

In summary, this study considers the vehicle attitude angle information on top of a vehicle detection and segmentation algorithm and establishes an “area–distance” geometric model based on the camera projection principle to estimate the distance of the vehicle ahead. The system framework is shown in Fig. 2. First, the entire RGB image is sent to the target detection part to extract the candidate areas of the target vehicles. Then, the candidate regions are sent to the segmentation network, the ARN, and the dimension estimation network to obtain the segmentation information, attitude angle information, and actual dimensions of the target vehicle, respectively. Subsequently, a 2D base vector geometric model based on the principle of image analytic geometry [39] is designed to obtain the projection relationship between the back of the vehicle and the entire vehicle, and the projected area of the back of the vehicle is calculated. Lastly, a geometric model based on the camera projection principle is established to estimate the distance of the vehicle ahead. In our distance estimation system, a region proposal network (RPN) that combines object classification and object candidate region generation produces the candidate regions of target vehicles, thereby achieving a complete end-to-end target detection module, which not only accelerates detection but also improves detection performance. The segmentation network performs pixel-level segmentation of the target vehicle candidate region using the Mask region-based convolutional neural network (Mask R-CNN) [25] instance segmentation network to obtain the vehicle mask. Adopting the design ideas and deep network framework of the pose estimation in [26], the KITTI dataset was used to train the ARN and the dimension estimation network to obtain vehicle attitude angle information and physical dimensions. The remainder of this section introduces the module designs of the distance estimation system, including the attitude angle design, vehicle back projection information extraction, and distance estimation module design.
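To make the data flow of Fig. 2 concrete, the following Python sketch chains the stages described above. It is a minimal illustration, not the authors' implementation: the four callables stand in for the RPN-based detector, the Mask R-CNN segmenter, the ARN, and the dimension estimation network, and approximating the actual back-face area as width × height is our own assumption.

```python
import numpy as np

def estimate_distances(image, detector, segmenter, arn, dim_net, fx, fy):
    """Sketch of the Fig. 2 pipeline. The four callables are hypothetical
    placeholders for the networks described in the text; fx, fy are the
    camera focal lengths in pixels."""
    distances = []
    for box in detector(image):                     # candidate vehicle regions
        mask = segmenter(image, box)                # binary instance mask
        theta_l = arn(image, box)                   # local attitude angle (rad)
        width_m, height_m, _ = dim_net(image, box)  # actual dimensions (m)
        # Eq. (3): back-face projected area from the whole-mask pixel count
        s_back_px = np.count_nonzero(mask) * np.cos(theta_l)
        # Eq. (11): "area-distance" model; back-face area ~ width * height
        # (illustrative assumption)
        distances.append(float(np.sqrt(fx * fy * width_m * height_m / s_back_px)))
    return distances
```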

FIGURE 3: Plan of vehicle driving conditions.

FIGURE 4: Angle geometry relationship.

FIGURE 5: Angle Regression Network structure.
B. ATTITUDE ANGLE DESIGN
Given that the driving lane of the subject vehicle changes, the driving position of the target vehicle changes relative to that of the subject vehicle. Thus, the direction of the light ray between the camera's optical center and the centers of different target vehicles varies, resulting in different attitude angle information for the target vehicles. As shown in Fig. 3, the attitude angle information of a vehicle is transformed into 2D space to establish an analysis plane. The orange rectangular box represents the front vehicle, whereas the blue rectangular box represents the sideway vehicle. The camera's optical center serves as the origin of the camera coordinate system. The black dotted line indicates the horizontal axis of the camera coordinate system, the red arrows indicate the vehicle driving directions, and the blue arrows indicate the light ray directions. θray1 and θray2 are the light ray angles of the front and sideway vehicles, respectively; θ1 and θ2 are called the global angles of the front and sideway vehicles, respectively; and θl is called the local angle. The local angle of the front vehicle is 0°, whereas the local angle of the sideway vehicle is not 0°. The relationship among the angles is shown in Fig. 4. The blue rectangle refers to the target vehicle, the triangle is the camera's optical center of the subject vehicle, and the horizontal dotted line is the horizontal axis of the camera coordinate system. θray is the angle between the horizontal axis and the ray connecting the vehicle center to the optical center, θ is the angle between the vehicle driving direction and the horizontal axis, and θl is the local angle of the vehicle, with θl = θ − θray. In the following, θray is called the ray angle, θ is called the global angle, and θl is called the local angle. Given the change in attitude angle information, the projection relationship and mask information of a vehicle change accordingly.

To obtain the required attitude angle information, we adopt the concept of rectangular box regression in the Faster R-CNN [34] network and the design idea of the angle estimation architecture in [26]. On the basis of the last convolution feature map, the regression parameters after the fully connected (FC) layers are modified, and the required angle regression network is trained on the KITTI detection dataset to finally obtain the attitude angle information. The network structure is shown in Fig. 5.
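As a small illustration of this angle bookkeeping, the sketch below computes the local angle from a regressed global angle. The ray-angle approximation from the horizontal pixel offset of the vehicle center is our own illustrative assumption (the paper defines θray geometrically in Fig. 4 and does not prescribe how it is computed); u0 and fx denote the principal point and focal length in pixels.

```python
import math

def ray_angle(u_center, u0, fx):
    """Illustrative approximation of the ray angle: direction of the ray
    from the optical center to the vehicle center, measured from the
    optical axis, using the horizontal pixel offset of the vehicle center."""
    return math.atan2(u_center - u0, fx)

def local_angle(theta_global, theta_ray):
    """theta_l = theta - theta_ray (Fig. 4); 0 for a front vehicle whose
    driving direction coincides with the ray direction."""
    return theta_global - theta_ray
```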

FIGURE 6: Outline map of target vehicle candidate areas. (a) Mask outline of the front vehicle; (b) mask outline of the sideway vehicle.

C. VEHICLE BACK PROJECTION INFORMATION EXTRACTION
This section primarily describes how the projection relationship between the back of the vehicle and the entire vehicle is obtained through attitude angle and segmentation information, and how the projected area of a vehicle's back is obtained using this relationship, wherein the pixel count of the mask, obtained from the segmentation information, represents the projected area.

1) Relationship between the masks of the front vehicle and the back of the sideway vehicle
From Section III-B, vehicle changes under different driving conditions are related to the vehicle attitude angle. Compared with the mask information of the front vehicle, the sideway vehicle also contains mask information of other parts. However, the back of the vehicle remains unchanged, and its corresponding mask information does not change. Therefore, the projected area of the back of the sideway vehicle is the same as the projected area of the front vehicle, and Equation (1) is obtained:

$$S_{\text{front vehicle mask}} = S_{\text{sideway vehicle back mask}} \quad (1)$$

where $S_{\text{front vehicle mask}}$ is the projected area of the front vehicle and $S_{\text{sideway vehicle back mask}}$ denotes the projected area of the back of the sideway vehicle.

2) Relationship between the masks of the front and sideway vehicles
Assume that the camera's elevation and roll angles are zero. Then, the image acquired by the camera is parallel to the actual observation scene. In the traffic scene considered here, the vehicle travels on a straight road; driving on a curve is disregarded.

To obtain the mask of the back of the vehicle, the relationship between the front vehicle mask and the entire mask of the sideway vehicle must first be analyzed. We extract the candidate area of the target vehicle along the direction of the light ray. As shown in Fig. 6, the area surrounded by the green line indicates the mask projected from the back of the front and sideway vehicles, and the area surrounded by the yellow line indicates the mask of the entire projection of the sideway vehicle.

FIGURE 7: Target vehicle contour regularization. The green rectangular frame approximates the mask of the front vehicle and of the back of the sideway vehicle, and the yellow rectangle approximates the whole mask of the sideway vehicle.

FIGURE 8: 2D base vector geometric model.

In accordance with the analytic geometric transformation properties of the 2D image, each planar figure can be represented by a set of linearly independent basis vectors; the geometric transformation of the figure in 2D space can then be expressed by the geometric transformation of its basis vectors. Given that the mask projected by the vehicle in the image is an irregular pattern, which is inconvenient for further analysis, the rigidity of the vehicle is used to approximate the mask of the vehicle projection with a rectangle.

• The mask of the front vehicle is represented by the e1–e2 basis vectors, as shown in Fig. 7(a). The target vehicle is extracted along the direction of the camera light ray. Thus, the physical meaning of the e1 base vector is the light ray direction of the front vehicle. Given that the direction of the front car is the same as that of the light ray, the e1 base vector can also represent the driving direction of the vehicle. e2 is the vector perpendicular to e1.
• The mask of the sideway vehicle is represented by the e3–e4 basis vectors, as shown in Fig. 7(b). Similarly, the physical meaning of the e3 base vector is the light ray direction of the sideway vehicle, and e4 is the vector perpendicular to e3.

Figure 7 is transformed into the same coordinate system, and a 2D base vector geometric model is constructed as shown in Fig. 8. The base vector transformations of the corresponding masks of the two types of vehicles are analyzed: the e1–e2 base vectors are used as the baseline to observe the change of the e3–e4 base vectors. The blue pair represents the base vectors of the front vehicle, whereas the red pair represents the base vectors of the sideway vehicle.

In accordance with its physical meaning, the offset angle (γ) between the base vectors is the local angle (θl). Given that the change in the mask is consistent with the change in the base vectors, Equation (2) can be obtained as follows:

$$\cos\gamma = \frac{|e_1|}{|e_3|} = \frac{|e_1 e_2|}{|e_3 e_4|} = \frac{S_{\text{front vehicle mask}}}{S_{\text{sideway vehicle mask}}}, \quad (2)$$

where $|e_1 e_2|$ and $|e_3 e_4|$ respectively represent the mask figures of the front and sideway vehicles, $S_{\text{front vehicle mask}}$ is the projected area of the front vehicle, and $S_{\text{sideway vehicle mask}}$ is the projected area of the sideway vehicle.

3) Relationship between the entire mask and back mask of the sideway vehicle
Given $\gamma = \theta_l$ and $S_{\text{front vehicle mask}} = S_{\text{sideway vehicle back mask}}$, we can obtain Equation (3):

$$\cos\theta_l = \frac{S_{\text{sideway vehicle back mask}}}{S_{\text{sideway vehicle mask}}} \quad (3)$$

From Equation (3), the relation between the entire mask of the sideway vehicle and the mask of the back of the sideway vehicle is obtained, wherein the pixel count of the mask area, obtained from the segmentation information, represents the projected area.

D. DISTANCE ESTIMATION MODULE DESIGN
The projected area of the back of the vehicle obtained in Section III-C3 is combined with the actual vehicle dimensions and the camera focal length (in pixels) for modeling based on the principle of camera projection to estimate the distance of the vehicle ahead.
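The back-face projected area that module D consumes follows directly from the instance mask and the local angle via Eq. (3). A minimal sketch, assuming a binary mask from the segmentation network and a regressed θl from the ARN:

```python
import numpy as np

def back_mask_area(mask, theta_l):
    """Eq. (3): S_back_mask = S_mask * cos(theta_l).
    `mask` is a binary H x W array from the segmentation network;
    `theta_l` is the local angle (radians) from the ARN."""
    s_mask = float(np.count_nonzero(mask))  # whole-vehicle projected area (pixels)
    return s_mask * np.cos(theta_l)         # equals s_mask for a front vehicle (theta_l = 0)
```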

FIGURE 9: Projection geometry model of distance estimation: (a) principle of camera projection; (b) image plane to pixel plane. f is the camera focal length and L is the physical distance between the object vehicle and the camera.

Compared with the methods proposed in [2] [7], we use the perspective projection relationship between the actual and projected areas to establish the “area–distance” geometric model, which makes more comprehensive use of vehicle information, to improve the accuracy of the distance estimation system. Moreover, we focus on the projection transformation relationship between surfaces, which can enhance the reliability of the geometric model compared with that in [8].

1) Principle of camera projection
The principle of camera projection transforms a point $(X_w, Y_w, Z_w)$ in the world coordinate system into the camera coordinate system $(X_c, Y_c, Z_c)$; the point then becomes a point $(x, y)$ on the 2D image plane through perspective projection. Lastly, the point $(x, y)$ is stored in the form of pixels $(u, v)$, as shown in Fig. 9.

If the world coordinate system is in the position shown in Fig. 9(a), then $R = I$ (unit matrix), $T = \begin{bmatrix} 0 & 0 & L \end{bmatrix}^T$, $Z_w = 0$, and Equation (4) can be obtained:

$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & L \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ 0 \\ 1 \end{bmatrix} \quad (4)$$

As shown in Fig. 9(b), the image coordinate system is converted to the pixel coordinate system, as shown in Equation (5):

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \quad (5)$$

Applying the camera projection principle, as shown in Equation (6), the conversion relationship between the actual points and the pixel points in the camera coordinate system can be obtained, as shown in Equation (7):

$$Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \quad (6)$$

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = L \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 & 0 \\ 0 & f_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & L \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 & u_0 L \\ 0 & f_y & v_0 & v_0 L \\ 0 & 0 & 1 & L \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ 0 \\ 1 \end{bmatrix}, \quad (7)$$

where $\frac{f}{d_x} = f_x$, $\frac{f}{d_y} = f_y$, $(u_0, v_0) = (0, 0)$, and $Z_c = L$. Equation (7) is transformed into Equation (8):

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \frac{1}{L} \begin{bmatrix} f_x X_c + u_0 L \\ f_y Y_c + v_0 L \\ L \end{bmatrix} = \frac{1}{L} \begin{bmatrix} f_x X_c \\ f_y Y_c \\ L \end{bmatrix} \quad (8)$$

2) Relationship of area conversion derived from the relationship of point conversion
The actual area of the target vehicle is divided into $N$ parts along the $Y_c$ direction, and each part is approximately a rectangle, as shown in Fig. 10. The four vertices of the $i$-th rectangle are marked as $P_1^i$, $P_1^{i+1}$, $P_2^i$, and $P_2^{i+1}$, where $P_r^i = (P_{rx}^i, P_{ry}^i) = (x_r^i, y_r^i)$, $(r = 1, 2;\ i = 1, 2, 3, \cdots, N)$. $P_{rx}^i$ and $P_{ry}^i$ represent the $X_c$ and $Y_c$ coordinates of the vertices, respectively.

FIGURE 10: Actual area of the visible part of the target vehicle.

Then, the actual area of the visible part of the target vehicle is

$$S = \sum_{i=1}^{N} \left(P_{1y}^i - P_{2y}^i\right)\left(P_{2x}^{i+1} - P_{2x}^i\right) = \sum_{i=1}^{N} \left(y_1^i - y_2^i\right)\left(x_2^{i+1} - x_2^i\right). \quad (9)$$

Using the relationship between the actual points and the pixel points in Equation (8), we can obtain

$$S = \left[\sum_{i=1}^{N} \left(v_1^i - v_2^i\right)\left(u_2^{i+1} - u_2^i\right)\right] \frac{L^2}{f_x f_y} = \frac{L^2}{f_x f_y} S_{\text{pixel}}, \quad (10)$$

where $S_{\text{pixel}}$ represents the projected area of the target vehicle in the image, i.e., the pixel count of the mask formed by the projection of the vehicle in the image, and $S$ represents the actual area of the vehicle.

3) Estimating the physical distance of the vehicle ahead
In accordance with Equations (3) and (10), the distance formula (11) is obtained as

$$L = \left(\frac{f_x f_y S}{S_{\text{pixel}}}\right)^{\frac{1}{2}} = \left(\frac{f_x f_y S_{\text{sideway vehicle back}}}{S_{\text{sideway vehicle back mask}}}\right)^{\frac{1}{2}} = \left(\frac{f_x f_y S_{\text{sideway vehicle back}}}{S_{\text{sideway vehicle mask}} \cos\theta_l}\right)^{\frac{1}{2}}, \quad (11)$$

where $L$ is the physical distance of the vehicle ahead, $f_x = f_y = 7.2153 \times 10^2$, $S_{\text{sideway vehicle back}}$ is the actual area of the back of the sideway vehicle, $S_{\text{sideway vehicle mask}}$ is the pixel area of the entire projected mask of the sideway vehicle, and $\theta_l$ is the local angle of the vehicle.
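The following sketch mirrors Eqs. (9)-(11): the strip summation for the actual area and the closed-form distance. The array layout for the strip vertices is an illustrative assumption; the default focal length is the calibration value quoted above (7.2153 × 10² pixels).

```python
import numpy as np

def actual_area_from_strips(p1, p2):
    """Eq. (9): sum of N rectangular strips along Yc.
    p1, p2: (N+1, 2) arrays of (x, y) coordinates of the upper and lower
    boundary points of the strips (illustrative data layout)."""
    heights = p1[:-1, 1] - p2[:-1, 1]   # y1^i - y2^i
    widths = p2[1:, 0] - p2[:-1, 0]     # x2^(i+1) - x2^i
    return float(np.sum(heights * widths))

def distance_from_areas(s_actual, s_pixel, fx=721.53, fy=721.53):
    """Eqs. (10)-(11): S = (L^2 / (fx * fy)) * S_pixel, hence
    L = sqrt(fx * fy * S_actual / S_pixel)."""
    return float(np.sqrt(fx * fy * s_actual / s_pixel))
```

For a sideway vehicle, `s_pixel` would be the back-mask area from Eq. (3), i.e., the whole-mask pixel count scaled by cos θl.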
IV. EXPERIMENT
In this study, the proposed distance estimation system is mostly applied to the vehicle camera system in an automatic driving scene. The research scene is an actual traffic scene of a modern vehicle. The research device is a camera mounted behind the windshield of a vehicle for acquiring images.

We determine the vehicle's position on the basis of the definitions in the international vehicle collision warning system standard [29]. Vehicles ahead are divided into two types: front and sideway vehicles. A front vehicle implies no deviation between the longitudinal center lines of the subject and target vehicles; if deviation exists, the vehicle ahead is a sideway vehicle.

Segmentation network: In this work, the state-of-the-art instance segmentation network, namely, Mask R-CNN, was used as the segmentation network, and the candidate regions from the target detection network were segmented at the pixel level to obtain the mask used to calculate vehicle distance. Mask R-CNN has three advantages over Faster R-CNN. First, Mask R-CNN enhances the backbone of the network by using ResNeXt-101 with a feature pyramid network [28] as the feature extraction network. Second, Mask R-CNN replaces RoIPool with RoIAlign to solve the misalignment issues caused by the direct sampling of pooling. Third, Mask R-CNN independently predicts a binary mask for each class; the classification of each binary mask depends on the prediction given by the network's region of interest (ROI) classification branch and thus does not cause competition among classes. Mask R-CNN has demonstrated excellent performance in instance segmentation. Compared with a representation based on a 2D bounding box, a mask captures the details of the target vehicle, and the redundancy within the rectangle is reduced, improving the accuracy of the distance estimation system. Therefore, Mask R-CNN was selected for our distance estimation system; its segmentation of the target vehicle in the image ensures the accuracy of the system.

Angle regression and dimension estimation networks: Given that the angle regression and dimension estimation networks are based on the CNN framework, we use the same regression network structure to obtain the required vehicle parameters, i.e., we train a deep CNN to regress the angle of the vehicle and its dimensions. In the KITTI dataset, cars, vans, trucks, and buses are under different categories, and the distribution of object dimensions for the instances of a category is low-variance and unimodal; for example, the dimension variance for cars and cyclists is on the order of several centimeters. Therefore, we use the L2 loss directly. To regress the vehicle parameters, we use a pretrained VGG network [33] without its FC layers and add our angle and dimension estimation modules. During training, each ground truth crop is resized to 224×224. To make the network more robust to viewpoint changes and occlusions, the ground truth boxes are jittered, and the ground truth angle is changed to account for the movement of the center ray of the crop.

Datasets: The networks involved in the proposed distance estimation system are Mask R-CNN, the ARN, and the dimension estimation network. We used the COCO and KITTI datasets [30] for training, finally verifying our method on the KITTI detection benchmark.

TABLE 1: Average error of distance estimation by different methods (m)

Distance range   0-10 m   10-20 m   >20 m
Method [1]       0.74     1.77      6.52
Method [31]      0.81     1.81      7.21
Ours             0.38     0.85      2.17

KITTI contains a training set of 7481 images and a test set of 7518 images; because our distance estimation system targets the automatic driving scene, it focuses only on the “car” category. The KITTI test set has no ground truth labels, so, following the rules below, we separate a part of the data from the KITTI training set as a test set. First, the data of the training and test sets must come from different video sequences. Second, the selected data should contain two scenarios: front and non-front. Third, ranging must be performed for vehicles at different distances; thus, the selected verification images should include vehicle samples in both near and far situations. In accordance with these rules, we use 3481 images from the training set as the test set to verify and analyze our distance estimation method.

Evaluation metrics:

a. Absolute Error: $\Delta\ (\mathrm{m}) = |L_{\mathrm{ground\ truth}} - L_{\mathrm{experimental}}|$.

b. Relative Error: $\delta\ (\%) = \frac{\Delta}{L_{\mathrm{ground\ truth}}} \times 100\%$.

c. Average Error: $\overline{\Delta}\ (\mathrm{m}) = \frac{1}{n}\sum_{i=1}^{n}|\Delta_i|$.

d. Average Error Rate: $\overline{\delta}\ (\%) = \frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_i}{L_{\mathrm{ground\ truth},i}} \times 100\%$.
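A direct NumPy transcription of these four metrics (a minimal sketch; inputs are paired arrays of ground-truth and estimated distances in meters):

```python
import numpy as np

def evaluation_metrics(l_gt, l_est):
    """Metrics a-d defined above for paired ground-truth/estimated distances."""
    l_gt = np.asarray(l_gt, dtype=float)
    l_est = np.asarray(l_est, dtype=float)
    abs_err = np.abs(l_gt - l_est)                  # a. absolute error (m), per sample
    rel_err = abs_err / l_gt * 100.0                # b. relative error (%), per sample
    avg_err = abs_err.mean()                        # c. average error (m)
    avg_err_rate = (abs_err / l_gt).mean() * 100.0  # d. average error rate (%)
    return abs_err, rel_err, avg_err, avg_err_rate
```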
To verify the accuracy and robustness of the proposed distance estimation system, we compared it with different distance estimation methods from three aspects, namely, accuracy verification of the entire system model, attitude angle module verification, and system robustness verification.

A. ACCURACY VERIFICATION OF THE ENTIRE SYSTEM MODEL
We compare the average error of the distance estimation results with the different methods proposed in [1] and [31]; the results are presented in Table 1. In the system frameworks of [1] [31], a “non-area–distance” geometric model was established using the perspective principle to estimate distance, whereas we use the area projection principle to establish the “area–distance” geometric model to realize distance estimation.

The experiment shows that the results of our method are optimal for distance estimation within all the distance ranges considered. The average error of the estimation results using our distance estimation method is reduced by up to 0.43 m compared with the other methods within the 0-10 m range. Even for distances greater than 20 m, the average error of our distance estimation results stays at approximately 2 m. Compared with the methods proposed in [1] [31], the average error is considerably reduced, the optimal result is achieved, and the distance estimation accuracy of the system is improved. Moreover, the maximum deviation among the average errors over the different distance ranges is approximately 1.8 m. Compared with the method presented in [31], the deviation between the system estimation results at different distances is reduced, and the overall distance estimation system is more stable and robust.

FIGURE 11: Absolute error curve graph.

B. ATTITUDE ANGLE MODULE VERIFICATION
For the absolute error over different distance ranges, our method is compared with the method that disregards vehicle attitude angle information [8]. The method proposed in [8] is evaluated on our test set, and the result is represented as a curve graph in Fig. 11, where the variation trend of errors within different distance ranges can be seen clearly. Our results at different distances show only a slight deviation from the ground truth, and overall system accuracy and stability are improved.

To show the advantages of our system more intuitively, the results are illustrated in Fig. 12, comparing the predicted values with the ground truth. Figure 12 presents the visualization of our method when estimating the distance of sideway and occluded vehicles. The results show that, after adding the attitude angle information of the target vehicle to our distance estimation system framework, the absolute error of the target vehicle ranging result within 25 m is guaranteed to be less than 1 m. Even for target vehicles farther than 50 m, the absolute error can be kept at approximately 2 m.

FIGURE 12: Distance experimental results on the KITTI dataset.

C. SYSTEM ROBUSTNESS VERIFICATION
To evaluate the performance of the proposed method more comprehensively and verify the robustness of the system, front and sideway vehicles with different attitude angle information are used in testing. Meanwhile, the average error rate of the distance estimation results for different types of vehicles is compared with the methods proposed in [8] and [32]. The results are presented in Table 2.

On the basis of the experimental results, the average error rate of our method for estimating the distance of a sideway vehicle is reduced to approximately 2.8%, a considerable decrease compared with the other methods. Moreover, the deviation in the average error rate of the distance estimation results between front and sideway vehicles is approximately 2%. The average error rate of the estimated distance results among different types of vehicles is substantially reduced compared with the other methods, thereby overcoming the limitations and inapplicability of existing distance estimation methods.

V. CONCLUSION
This study combines the attitude angle information of a vehicle with its segmentation information and proposes a robust inter-vehicle distance estimation method from an in-car camera based on monocular vision. By considering the attitude angle changes of different types of vehicles in complex traffic scenarios, distance estimation based on angle information mitigates the considerable variation in ranging accuracy among different types of vehicles, thereby solving the problem of limited detection range, improving the robustness and accuracy of the system, helping drivers focus on the situation ahead, and reducing the occurrence of traffic accidents. The experimental results show that this method can adapt to most traffic scenarios and exhibits good robustness against different driving states of vehicles ahead.

In the future, we will analyze vehicles driving in different scenarios (such as highway, corner, and rural street scenes) to further expand our distance estimation system. In addition, for the front occluded vehicle, no simple method with high accuracy and efficiency exists among current distance estimation methods; to improve the applicability of our method, we will focus on this aspect and continuously improve our distance estimation system.

TABLE 2: Comparison between experimental distance and ground truth for two groups of experiments

Distance verification of the front cars:
Number                      1       2       3        4       5       6        7
Ground truth (m)            23.97   23.99   28.016   19.03   18.05   22.4     7.03
Experimental distance (m)   24      23.91   28.17    19.19   18.22   23.22    6.182
Absolute Error (m)          0.03    0.08    0.154    0.16    0.17    0.82     0.848

Distance verification of the sideway cars:
Number                      1       2       3        4       5       6        7
Ground truth (m)            19.205  10.84   11.06    7.68    10.4    13.395   53.395
Absolute Error (m)          0.025   0.19    0.21     0.27    0.36    0.485    2.085

Average error rate of distance estimates for front and sideway vehicles:
Method        Front vehicle's average error rate (%)   Sideway vehicle's average error rate (%)
Method [32]   4.702                                    9.237
Method [8]    1.333                                    4.698
Ours          1.077                                    2.820

REFERENCES
[1] L. Liu, C. Fang, and S. Chen, “A Novel Distance Estimation Method Leading a Forward Collision Avoidance Assist System for Vehicles on Highways,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 4, pp. 937-949, April 2017.
[2] J. Han, O. Heo, M. Park, S. Kee, and M. Sunwoo, “Vehicle distance estimation using a mono-camera for FCW/AEB systems,” International Journal of Automotive Technology, vol. 17, no. 3, pp. 483-491, June 2016.
[3] M. Rezaei, M. Terauchi, and R. Klette, “Robust Vehicle Detection and Distance Estimation Under Challenging Lighting,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2723-2743, Oct. 2015.
[4] P. Wongsaree, S. Sinchai, P. Wardkein, and J. Koseeyaporn, “Distance Detection Technique Using Enhancing Inverse Perspective Mapping,” 2018 3rd International Conference on Computer and Communication Systems (ICCCS), pp. 217-221, April 2018.
[5] K. Nakamura, K. Ishigaki, T. Ogata, and S. Muramatsu, “Real-time monocular ranging by Bayesian triangulation,” 2013 IEEE Intelligent Vehicles Symposium (IV), pp. 1368-1373, 2013.
[6] D. Bao and P. Wang, “Vehicle distance detection based on monocular vision,” 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 187-191, 2016.
[7] D.-Y. Huang, C.-H. Chen, T.-Y. Chen, W.-C. Hu, and K.-W. Feng, “Vehicle detection and inter-vehicle distance estimation using single-lens video camera on urban/suburb roads,” Journal of Visual Communication and Image Representation, vol. 46, pp. 250-259, July 2017.
[8] L. Huang, Y. Chen, Z. Fan, and Z. Chen, “Measuring the Absolute Distance of a Front Vehicle from an In-car Camera Based on Monocular Vision and Instance Segmentation,” Journal of Electronic Imaging, vol. 27, no. 4, p. 043019, July 2018.
[9] M. Hammer, M. Hebel, B. Borgmann, M. Laurenzis, and M. Arens, “Potential of lidar sensors for the detection of UAVs,” Proc. SPIE, vol. 10636, May 2018.
[10] R. Klette, “Concise Computer Vision,” Springer, 2014. [Online]. Available: www.springer.com.
[11] E. Raphael, R. Kiefer, P. Reisman, and G. Hayon, “Development of a camera-based forward collision alert system,” SAE International Journal of Passenger Cars - Mechanical Systems, vol. 4, no. 2011-01-0579, pp. 467-478, 2011.
[12] D. O'Cualain, M. Glavin, E. Jones, and P. Denny, “Distance detection systems for the automotive environment: A review,” Irish Signals and Systems Conf., Derry, N. Ireland, pp. 13-14, 2007.
[13] V. D. Nguyen, T. T. Nguyen, D. D. Nguyen, and J. W. Jeon, “Toward real time vehicle detection using stereo vision and an evolutionary algorithm,” 2012 IEEE 75th Vehicular Technology Conference (VTC Spring), pp. 1-5, 2012.
[14] D. Bao and P. Wang, “Vehicle Distance Detection Based on Monocular Vision,” 2016 International Conference on Progress in Informatics and Computing (PIC), pp. 187-191, 2016.
[15] A. A. Ali and H. A. Hussein, “Distance Estimation and Vehicle Position Detection Based on Monocular Camera,” 2016 Al-Sadeq International Conference on Multidisciplinary in IT and Communication Science and Applications (AIC-MITCSA), pp. 1-4, 2016.
[16] B. Li, X. Zhang, and M. Sato, “Pitch angle estimation using a vehicle-mounted monocular camera for range measurement,” 2014 12th International Conference on Signal Processing (ICSP), pp. 1161-1168, 2014.
[17] S. Tuohy, D. O'Cualain, E. Jones, and M. Glavin, “Distance Determination for an Automobile Environment using Inverse Perspective Mapping in OpenCV,” Signals and Systems Conference (ISSC 2010), pp. 100-105, July 2010.
[18] A. Bharade, S. Gaopande, and A. G. Keskar, “Statistical approach for distance estimation using Inverse Perspective Mapping on embedded platform,” 2014 Annual IEEE India Conference (INDICON), pp. 1-5, Dec. 2014.
[19] W. Yao and U. Stilla, “Comparison of Two Methods for Vehicle Extraction From Airborne LiDAR Data Toward Motion Analysis,” IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 4, pp. 607-611, July 2011.
[20] R. Adamshuk, D. Carvalho, J. H. Z. Neme, E. Margraf, S. Okida, A. Tusset et al., “On the Applicability of Inverse Perspective Mapping for the Forward Distance Estimation based on the HSV Colormap,” 2017 IEEE International Conference on Industrial Technology (ICIT), pp. 1036-1041, March 2017.
[21] A. Joglekar, D. Joshi, R. Khemani, S. Nair, and S. Sahare, “Depth estimation using monocular camera,” International Journal of Computer Science and Information Technologies, vol. 2, no. 4, pp. 1758-1763, 2011.
[22] D. Eigen, C. Puhrsch, and R. Fergus, “Depth map prediction from a single image using a multi-scale deep network,” Advances in Neural Information Processing Systems (NIPS), pp. 2366-2374, 2014.


[23] D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic
labels with a common multi-scale convolutional architecture,” Proceed-
ings of the IEEE international conference on computer vision(ICCV),
pp.2650-2658, 2015.
[24] F. Mahmood and N. J. Durr, “Deep learning-based depth estimation from a synthetic endoscopy image training set,” Proc. SPIE 10574, Medical Imaging 2018: Image Processing, vol. 10574, p. 1057421, March 2018.
[25] K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” Proceed-
ings of the IEEE international conference on computer vision(ICCV), pp.
2961-2969, 2017.
[26] A. Mousavian, D. Anguelov, J. Flynn, and J. Kosecka, “3D Bounding
Box Estimation Using Deep Learning and Geometry,” Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition(CVPR),
pp. 7074-7082, 2017.
[27] V. T. B. Tram and M. Yoo, “Vehicle-to-Vehicle Distance Estimation Using
a Low-Resolution Camera Based on Visible Light Communications,”
IEEE Access, vol. 6, pp. 4521-4527, July, 2018.
[28] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117-2125, 2017.
[29] ISO 15623, “Intelligent transport systems - Forward vehicle collision warning systems - Performance requirements and test procedures,” ISO/TC 204, Intelligent transport systems, 2013.
[30] A. Geiger, P. Lenz, and R. Urtasun, “Are we ready for autonomous driving? The KITTI vision benchmark suite,” 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3354-3361, 2012.
[31] S. Sivaraman and M. M. Trivedi, “Integrated lane and vehicle detection, localization, and tracking: A synergistic approach,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 2, pp. 906-917, March 2013.
[32] R. Garg, VK. BG, G. Carneiro, and I. Reid, “Unsupervised CNN for Single
View Depth Estimation: Geometry to the Rescue,” Computer Vision –
ECCV 2016, Springer International Publishing, pp. 740-756, 2016.
[33] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, vol. abs/1409.1556, 2014. [Online]. Available: http://arxiv.org/abs/1409.1556.
[34] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-
time object detection with region proposal networks,” Advances in neural
information processing systems(NIPS), pp. 91-99, 2015.
[35] C. Wang, L. Yang, Y. Wu, Y. Wu, X. Cheng, Z. Li, and Z. Liu, “Data
Provenance With Retention of Reference Relations,” IEEE Access, vol. 6,
pp. 77033-77042, October, 2018.
[36] C. Wang, Z. Zhao, L. Gong, L. Zhu, Z. Liu, and X. Cheng , “A Distributed
Anomaly Detection System for In-Vehicle Network Using HTM,” IEEE
Access, vol. 6, pp. 9091–9098, 2018.
[37] S. Liu, Z. Li, Y. Zhang, and X. Cheng, “Introduction of Key Problems in Long-Distance Learning and Training,” Mobile Networks and Applications, vol. 24, no. 1, pp. 1-4, 2019.
[38] W. Xie and X. Cheng, “Imbalanced big data classification based on virtual reality in cloud computing,” Multimedia Tools and Applications, 2019. [Online]. Available: https://doi.org/10.1007/s11042-019-7317-x.
[39] M. P. Deisenroth, A. A. Faisal, and C. S. Ong, “Mathematics for Machine Learning,” to be published by Cambridge University Press, 2018. [Online]. Available: https://mml-book.com.
