Unstructured Road Vanishing Point Detection Using the Convolutional Neural Network and Heatmap Regression
Yin-Bo Liu, Ming Zeng, Qing-Hao Meng
Institute of Robotics and Autonomous Systems
Tianjin Key Laboratory of Process Measurement and Control
School of Electrical and Information Engineering
Tianjin University, Tianjin 300072, China

arXiv:2006.04691v1 [cs.CV] 8 Jun 2020

Unstructured road vanishing point (VP) detection is a challenging problem, especially in the field of autonomous driving. In this paper, we propose a novel solution combining the convolutional neural network (CNN) and heatmap regression to detect the unstructured road VP. The proposed algorithm first adopts a lightweight backbone, i.e., depthwise convolution modified HRNet, to extract hierarchical features of the unstructured road image. Then, three advanced strategies, i.e., multi-scale supervised learning, heatmap super-resolution, and coordinate regression, are utilized to achieve fast and high-precision unstructured road VP detection. The empirical results on Kong’s dataset show that our proposed approach achieves the highest detection accuracy compared with state-of-the-art methods under various conditions while running in real time, with a top speed of 33 fps.

Index Terms—vanishing point detection, unstructured road, HRNet, YOLO, heatmap regression.

Corresponding author: Ming Zeng, Qing-Hao Meng (email: zeng-[email protected], qh [email protected]).

I. INTRODUCTION

Recently, research on vanishing point (VP) detection has gradually become one of the most popular topics in the field of computer vision. A vanishing point is defined as the point of intersection of the perspective projections of a set of parallel lines in a 3D scene onto the image plane [1]. Since the VP of an image contains valuable cues, it has been widely used in many areas, such as camera calibration [2], camera distortion correction [3], visual place recognition [4], lane departure warning (LDW) [5], and simultaneous localization and mapping (SLAM) [6]. Specifically, in terms of autonomous driving applications, the detection techniques for structured roads with clear markings have been extensively explored in the literature. However, detecting unstructured roads without clear markings is still a challenging problem that has not been fully solved [7]. The VP provides important clues for unstructured road detection. To be concrete, autonomous vehicles can identify the drivable areas according to the location of the VP, which provides early warning of departure from the lane.

Existing road VP detection methods can be divided into two categories: traditional methods and deep-learning-based methods. Traditional approaches for unstructured road VP detection mainly rely on the texture information extracted from the road images. These texture-based traditional methods usually have two shortcomings: 1) The performance of these methods is easily affected by uneven illumination and image resolution, and thus the detection results are often unstable, resulting in low VP detection accuracy in many cases; 2) Traditional texture detection and the follow-up voting processing are very time-consuming, and therefore cannot meet the high real-time requirements of autonomous driving. In recent years, deep learning technology has made a series of major breakthroughs in many fields, especially in image recognition. There have been several attempts to utilize deep-learning-based strategies to solve road VP detection problems [8], [9]. However, most of the deep-learning-based methods only focus on structured roads. To the best of our knowledge, there is no solution based on deep learning available in the literature on the subject of unstructured road VP detection.

In this paper, we propose a novel heatmap regression method based on multi-scale supervised learning for unstructured road VP detection. The heatmap regression technique can be used to estimate the locations of pixel-level keypoints in the image and works well in 2D human pose estimation applications [10]. However, conventional heatmap regression techniques only estimate the keypoints on a single, coarse scale (1/4 or 1/2) of the input image. Therefore, they cannot meet the requirement of high-precision road VP detection in autonomous driving applications. The proposed method adopts three effective techniques, namely multi-scale supervised learning, heatmap super-resolution and coordinate regression, to achieve fast and accurate unstructured road VP detection. The experimental results on Kong’s public dataset verify the effectiveness of the proposed algorithm.

The main contributions are as follows:
• To the best of our knowledge, the proposed approach is the first attempt to solve the problem of unstructured road VP detection with a deep CNN architecture. Specifically, it integrates three advanced strategies, including improved lightweight backbone design, multi-scale supervised learning, and heatmap super-resolution, which give the proposed algorithm the advantages of high accuracy and speed.
• Our approach can run at 33 fps on an RTX 2080 Ti GPU, which is several times faster than state-of-the-art methods.
• In order to evaluate the performance of different algorithms more accurately, we have constructed a manually labeled training dataset including 5,355 images of unstructured roads.

The remainder of this paper is organized as follows. We first review some relevant works in Section II. In Section III, the proposed algorithm is introduced in detail. Then, we compare the performance of different algorithms in Section IV, followed by the conclusions in Section V.

II. RELATED WORK

We first introduce some traditional algorithms for road VP detection. Then, a brief review of the state-of-the-art algorithms in the field of heatmap regression is given.

A. Road VP detection

The traditional methods found in the literature can be divided into three categories, i.e., edge-based detection methods, region-based detection methods, and texture-based detection methods. Among them, edge-based and region-based detection approaches are commonly used for detecting the VPs of structured roads. The edge-based detection algorithms estimate the VPs based on the information of road edges and contours, e.g., the spline model proposed by Wang et al. [11] and the cascade Hough transform model presented by Tuytelaars et al. [12]. The B-snake based lane model proposed by Wang et al. [13] can describe a wider range of road structures due to the powerful ability of the B-Spline, which can form any arbitrary shape. Instead of using hundreds of pixels, Ebrahimpour et al. [14] import the minimum information (only two pixels) to the Hough space to form a line, which greatly reduces the complexity of the algorithm. The region-based methods locate the VPs by analyzing the similar structures and repeating patterns in the road images. Specifically, Alon et al. [15] utilize the technique of region classification and geometric projection constraints to detect the road VP. The region features of self-similarity [16] and road boundaries [17] are useful features for road VP detection. In addition, Alvarez et al. [18] utilize 3D scene cues to predict the VP. Although the above-mentioned edge-based and region-based approaches work well on simple structured road scenes, for the complicated scenes of unstructured roads, the detection performance is usually poor or even completely invalid [19]. The main reasons for the unsatisfying results are that unstructured roads often do not have clear lane and road boundaries, and that there are many disturbances such as tire or snow tracks.

In traditional texture-based methods, a large number of textures are first extracted from the road image, and then the strategy of accumulating pixel voting is adopted to detect the road VP. For example, Rasmussen et al. [20] use Gabor wavelet filters for texture analysis and a Hough-style voting strategy for VP detection. Kong et al. [21] employ 5-scale, 36-orientation Gabor filters to extract the textures, and then utilize an adaptive soft voting scheme for VP detection. To accelerate VP detection, Moghadam et al. [22] only use four Gabor filters in the process of feature extraction. In order to obtain better textures, Kong et al. [19] replace conventional Gabor filters with generalized Laplacian of Gaussian (gLoG) filters. Yang et al. [23] adopt a Weber local descriptor to obtain salient representative texture and orientation information of the road. Shi et al. [1] use a particle filter for reducing the misidentification chances and computational complexity, and a soft voting scheme for VP detection. Generally speaking, the texture-based methods can detect the VP for both structured and unstructured roads. However, these methods also have common shortcomings: 1) Texture detection performance strongly depends on image quality. For images with uneven illumination or poor definition, the extracted textures are usually not good, which has a great negative impact on the accuracy of VP detection; 2) The process of cumulative pixel voting to predict the VP is time-consuming, so it cannot be directly used in scenarios where real-time requirements are high, e.g., autonomous driving.

B. Heatmap regression for keypoint detection

In recent years, research on deep-learning-based heatmap regression has attracted considerable attention in the field of image keypoint detection. The CNN based heatmap regression scheme was first applied in human pose estimation [24]. The empirical results show that it can accurately predict the probability of human joints with pixel resolution, which outperforms most traditional keypoint detection methods. Subsequently, CNN based heatmap regression methods have been successfully introduced to other application areas, such as general target recognition [25], layout detection [26] and target tracking [27]. In the early heatmap regression networks, the conventional ResNet module [28] was widely used as the backbone, but the detection accuracy was not satisfactory. Although recently modified versions, such as the stacked hourglass network (Hourglass) [10] and the high-resolution representation network (HRNet) [29], greatly improve the accuracy of keypoint detection, they still have the common shortcoming of long prediction time, which is not suitable for high-speed scenarios.

III. METHODOLOGY

Previous empirical results show that the CNN based heatmap expression is an advanced technology for image keypoint detection and that it can locate the keypoint with pixel-level resolution. At present, this methodology has achieved good results in the application of 2D human pose estimation. The road VP can be regarded as a special kind of keypoint. Therefore, we can also use this advanced keypoint detection technology to deal with the challenging problem of unstructured road VP detection. Concretely, the modified HRNet is first utilized as the backbone to extract the image features. Second, multi-scale heatmap supervised learning is employed to obtain a more accurate keypoint (vanishing point) estimation. Finally, high-precision VP coordinates are obtained using the strategy of coordinate regression. The specific network architecture is shown in Fig. 1.

Fig. 1: Illustration of our proposed network architecture. Firstly, we adopt the depthwise convolution [30] modified HRNet as the backbone to extract hierarchical features of the input image. Then, we combine 1/4 and 1/2 scale heatmaps for multi-scale supervised learning. Finally, the coordinate regression is employed to directly output high-precision VP coordinates.

A. Multi-Scale Supervision

The latest research suggests that multi-scale supervised learning is an effective way to obtain accurate heatmaps [29]. Multi-scale heatmap supervised learning refers to fusing heatmaps of two or more scales for keypoint detection. Generally, in traditional heatmap regression methods, the scale of the heatmap is 1/4 of the input image, and the corresponding resolution can meet the requirements of most ordinary keypoint detection tasks. However, for the task of road VP detection, a fourfold error amplification is hard to accept. To this end, we obtain a more accurate and higher-resolution 1/2 scale heatmap from the coarse-grained 1/4 scale heatmap using the back-projection-based super-resolution technique [7].

The super-resolution module used in our proposed algorithm is an up-projection unit (UPU), as shown in Fig. 2. Each sub-module of Deconv consists of a 5 × 5 deconvolution operation followed by BatchNorm and ReLU operations, and each sub-module of Conv is composed of a 3 × 3 convolution operation followed by BatchNorm and ReLU operations. As suggested in HigherHRNet [31], the input of our upsample module is the concatenation of the feature maps and the predicted 1/4 scale heatmap. In the subsequent ablation study, we systematically compare the performance of the super-resolution module with that of other upsampling modules.

Fig. 2: Illustration of three ways of upsampling the 1/4 scale heatmap to the 1/2 scale. (A) represents a layer of deconvolution + BatchNorm + ReLU. (B) is the up-projection unit used in this paper. (C) is a 2-level up-projection model.

B. Coordinate Regression

In order to obtain the coordinates of a keypoint from the heatmap, there are two widely used traditional methods, i.e., extracting the coordinates of the maximum point in the heatmap or estimating the keypoint position through a Gaussian distribution. The traditional methods have two limitations: 1) Since the resolution of the heatmap is generally 1/2 or 1/4 of the input image, the estimation error is amplified 2 times or 4 times accordingly; 2) Estimating the coordinates based on a Gaussian distribution requires additional calculations, which affect the real-time performance of the algorithm. To overcome the shortcomings of traditional methods, the proposed algorithm introduces a coordinate regression module to directly output accurate VP coordinates.

The coordinate regression module used in our approach is inherited from YOLO. As proven in YOLO v2 [32], predicting the offset relative to the grid cell is a stable and accurate approach to 2D coordinate regression. Thus, the predicted point $(V_x, V_y)$ is defined as

$$V_x = f(x) + c_x, \tag{1}$$

$$V_y = f(y) + c_y \tag{2}$$
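The cell-offset decoding of Eqs. (1) and (2) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names, the 10 × 10 grid, and the final scaling from grid units to pixels are assumptions introduced here (the paper states only the 320 × 320 training input size), and the sigmoid is taken to be applied to the raw network outputs, as in YOLO v2.

```python
import math


def sigmoid(t: float) -> float:
    """Numerically stable logistic function, playing the role of f(.)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def decode_vp(raw_x, raw_y, cell_col, cell_row, grid_size=10, image_size=320):
    """Decode raw offsets predicted for one grid cell into a VP position.

    (cell_col, cell_row) is the top-left corner (c_x, c_y) of the cell,
    expressed in grid units. grid_size and the pixel rescaling are
    hypothetical choices made for illustration only.
    """
    v_x = sigmoid(raw_x) + cell_col  # Eq. (1): V_x = f(x) + c_x
    v_y = sigmoid(raw_y) + cell_row  # Eq. (2): V_y = f(y) + c_y
    stride = image_size / grid_size  # pixels covered by one grid cell
    return (v_x, v_y), (v_x * stride, v_y * stride)


# Zero raw offsets place the VP at the centre of cell (4, 3).
grid_xy, pixel_xy = decode_vp(0.0, 0.0, cell_col=4, cell_row=3)
print(grid_xy, pixel_xy)
```

Because the sigmoid bounds each offset to the unit interval, the decoded point can never leave the cell responsible for it, which is the stabilizing property attributed to YOLO v2 in the text.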

In Eqs. (1) and (2), $f(\cdot)$ is a sigmoid function applied to the raw outputs $x$ and $y$ predicted for the road VP, and $(c_x, c_y)$ is the coordinate of the top-left corner of the associated grid cell.

C. Loss Function

To train our complete network, we minimize the following loss function:

$$L_{vp} = \lambda_{coord} l_{coord} + \lambda_{conf} l_{conf} + \lambda_h (l_{h1} + l_{h2}), \tag{3}$$

where $l_{coord}$, $l_{h1}$, $l_{h2}$ and $l_{conf}$ denote the coordinate loss of the VP, the low resolution heatmap loss, the high resolution heatmap loss and the confidence loss, respectively. We use the mean-squared error for the VP heatmap loss, while, as suggested in YOLO, the confidence and VP coordinates are predicted through logistic regression. For the cells that do not contain the VP, we set $\lambda_{conf}$ to 0.5, and for the cell that contains the VP, we set $\lambda_{conf}$ to 1. $\lambda_{coord}$ and $\lambda_h$ are used for balancing the two training factors, i.e., the heatmap accuracy and the coordinate accuracy. Here, we set $\lambda_h$ to 1 and $\lambda_{coord}$ to 2.

D. Implementation Details

We implemented our method in Python using PyTorch 1.3 and CUDA 10 and ran it on an [email protected] with dual NVIDIA RTX 2080 Ti. We used our unstructured road vanishing point (URVP) dataset as the training dataset, which contains 5,355 images, and Kong’s public dataset as the test dataset. All input images were resized to 320 × 320 for training. We applied a Gaussian kernel with the same standard deviation (std = 3 by default) to all the ground truth heatmaps. We used stochastic gradient descent (SGD) with a momentum of 0.9 for optimization and started with a learning rate of 0.001 for the backbone and 0.01 for the rest of the network. We divided the learning rate by 10 every 20 epochs. We also adopted data augmentation with random flips and image rotations.

IV. EXPERIMENTS

In this section, we first briefly introduce the construction of the training dataset, a.k.a. the URVP dataset, and then present a comprehensive performance study of our proposed method and other state-of-the-art algorithms using Kong’s public dataset. Subsequently, we quantitatively analyze the influence of each part of the model on performance through an ablation study.

A. Dataset construction

Currently, there are only two public databases available to evaluate the performance of different algorithms for unstructured road VP detection, i.e., Kong’s dataset (containing 1003 images) and Moghadam’s dataset (500 images). Since the total number of images is very small, these two public datasets are mainly used for algorithm testing. In other words, there are not enough labeled images for deep network training. To this end, we utilized Flickr and Google Image to build a new training dataset, namely URVP. Specifically, we first collected more than 10,000 unstructured road images using related keywords such as vanishing point, road, and unstructured road. Subsequently, after data cleaning and manual labeling, 5,355 labeled images were finally obtained as the training dataset.

B. Metrics

We adopt the normalized Euclidean distance suggested in [22] to measure the estimation error between the detected VP and the ground truth manually determined through the perspective of human perception. The normalized Euclidean distance is defined as:

$$\mathrm{NormDist} = \frac{\lVert P_g - P_v \rVert}{\mathrm{Diag}(I)}, \tag{4}$$

where $P_g$ and $P_v$ denote the ground truth of the VP and the estimated VP, respectively, and $\mathrm{Diag}(I)$ is the length of the diagonal of the input image. The closer the NormDist is to 0, the closer the estimated VP is to the ground truth. Any NormDist greater than 0.1 is set to 0.1, which is considered to be a failure of the corresponding method.

C. Comparisons

Fig. 3 shows the test results of the proposed algorithm on unstructured road images. From left to right are the input image, the 1/4 scale output heatmap, the 1/2 scale output heatmap and the VP coordinates detected by coordinate regression. The white dot represents the ground truth of the VP, and the red dot stands for the predicted VP coordinate. From left to right, the possible range of the detected VP is gradually reduced, and the proposed method achieves good results on the unstructured roads.

Fig. 3: Six of the detection results of our method. From left to right: input image, 1/4 scale VP estimation heatmap, 1/2 scale VP estimation heatmap, VP coordinates determined by coordinate regression (the red dot represents the VP position detected by the proposed algorithm, and the white dot is the ground truth).

Since Moghadam’s dataset has a small number of testing images (only 500 images), 248 of which are the same as those of Kong’s dataset, we only compared the proposed algorithm with four state-of-the-art methods, i.e., Kong (Gabor) [21], Kong (gLoG) [19], Moghadam [22] and Yang [23], on Kong’s dataset (1003 pictures). Fig. 4 shows some VP detection examples, in which white dots denote the ground truth results, red ones are results of our method, black ones are results of Kong (Gabor), light blue ones are Moghadam’s results, green ones are Yang’s results, and blue ones are results of Kong (gLoG). In these examples, our proposed algorithm is more robust and accurate than the existing state-of-the-art methods.

Furthermore, we quantitatively evaluate the performance of our method with the normalized Euclidean distance. The test results show that the proposed detection algorithm outperforms other existing road VP detection algorithms. More specifically, the proposed method has the highest percentage of NormDist error in [0, 0.1) compared with the four existing VP detection algorithms, as shown in Fig. 5. Our method detects 207 images with a NormDist error less than 0.01, while the number of images detected by Kong (Gabor), Yang, Kong (gLoG) and Moghadam is 175, 160, 138 and 78, respectively. For cases with large detection errors (NormDist ≥ 0.1), the proposed algorithm yields only 103 images, while the comparison methods yield 160 (Kong (Gabor)), 209 (Kong (gLoG)), 210 (Yang) and 400 (Moghadam). In addition, Table I shows the statistical results of the mean CPU runtime for different methods. It can be seen that our method is much faster than the Kong (Gabor), Kong (gLoG), Yang and Moghadam algorithms.

D. Ablation Study

The multi-scale supervision, upsampling module, and coordinate regression are three important branches in our model. Therefore, we quantitatively analyze the impact of these branches on the performance of the detection network.
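The evaluation metric of Section IV-B (the normalized Euclidean distance with its 0.1 failure cap) and the two counts reported in the comparisons (errors below 0.01 and errors of at least 0.1) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the sample coordinates are hypothetical, and the mean is assumed to be taken over the capped values.

```python
import math


def norm_dist(gt, pred, width, height, cap=0.1):
    """Normalized Euclidean distance: ||P_g - P_v|| / Diag(I).

    Values above `cap` are clipped to `cap`, following the rule that a
    NormDist greater than 0.1 is set to 0.1 and counted as a failure.
    """
    distance = math.hypot(gt[0] - pred[0], gt[1] - pred[1])
    diagonal = math.hypot(width, height)
    return min(distance / diagonal, cap)


def summarize(errors, precise=0.01, failure=0.1):
    """Mean error plus the counts of precise and failed detections."""
    return {
        "mean": sum(errors) / len(errors),
        "precise": sum(e < precise for e in errors),
        "failed": sum(e >= failure for e in errors),
    }


# Hypothetical (ground truth, prediction) pairs on a 320 x 320 image.
pairs = [
    ((160, 120), (162, 121)),  # close hit
    ((100, 200), (112, 216)),  # moderate error
    ((50, 50), (300, 300)),    # far off, clipped to the cap
]
errors = [norm_dist(gt, pred, 320, 320) for gt, pred in pairs]
print(summarize(errors))
```

Clipping before averaging keeps a handful of gross failures from dominating the mean error, which is one plausible reason for the cap.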

Fig. 4: Experimental results on some test images from Kong’s dataset. Red dots denote the results of our algorithm, black ones Kong (Gabor), light blue ones Moghadam, green ones Yang, blue ones Kong (gLoG), and white ones the ground truth.

TABLE I: Comparison results of mean error and mean running time for different approaches. Here, we only compared the CPU running times since the Kong (Gabor), Kong (gLoG), Yang and Moghadam methods are non-CNN methods which cannot run on the GPU.

Methods         Mean error   CPU running time (s)
Kong (Gabor)    0.040639     20.1021
Kong (gLoG)     0.051556     21.213
Moghadam        0.063407     0.2423
Yang            0.045931     0.752
Proposed        0.034875     0.2024

[Fig. 5 plot. x-axis: NormDist Error; y-axis: Percentage of the whole dataset (%); curves: Our method, Kong (Gabor), Kong (gLoG), Moghadam, Yang]

Fig. 5: Comparison results of accumulated error distribution of different VP detection algorithms on Kong’s dataset. On the x-axis, 0 stands for NormDist in [0, 0.01), 0.01 stands for NormDist in [0.01, 0.02), ..., and 0.1 represents NormDist in [0.1, 1].

1) Backbone

We systematically tested the effects of different backbones, i.e., Hourglass (Stack = 4) (Hg4) and HRNet (W = 48) (HRNet-48), on the accuracy and detection speed of the model. Table II shows the accuracy and the GPU and CPU speed with different backbones. To be concrete, using the depthwise convolution modified HRNet-48 (HRNet-48-M), the number of images with a detection error NormDist < 0.01 is 207, which is higher than that of the models using Hg4 (192) or HRNet-48 (205). For cases with large detection errors (NormDist equal to or greater than 0.1), the number of unsatisfying results for the model using HRNet-48-M is 103, which is smaller than those of the models using HRNet-48 (115) and Hg4 (113). Moreover, the GPU speed of the model using HRNet-48-M is 33 fps, which is faster than those of the models using Hg4 (23 fps) and HRNet-48 (29 fps). Thus, the model using HRNet-48-M has two advantages over the models with Hg4 and HRNet-48, i.e., high speed and high accuracy.

TABLE II: Ablation study of different backbones. GPU-speed and CPU-speed represent the number of frames per second that different backbones process on the GPU and CPU, respectively.

Backbones     Images with NormDist error < 0.01   Images with NormDist error ≥ 0.1   GPU-speed   CPU-speed
Hg4           192                                 113                                23.04 fps   2.02 fps
HRNet-48      205                                 115                                29.15 fps   2.90 fps
HRNet-48-M    207                                 106                                33.05 fps   4.94 fps

2) Multi-scale Supervision

The influence of multi-scale supervised learning on detection performance is shown in Fig. 6. When only the 1/2 scale supervised learning model is adopted, the number of images with detection error NormDist less than 0.01 is 181. When only the 1/4 scale supervised learning model is applied, the number of images satisfying this accuracy is 169. When the 1/2 and 1/4 scales are fused to implement multi-scale supervised learning and the 1/4 scale of the input image is used for coordinate regression, the number of images satisfying NormDist < 0.01 is 203. When we adopt multi-scale supervised learning and the 1/2 scale of the input image for coordinate regression, we can detect 207 images with NormDist < 0.01. The last configuration is therefore the best choice.

[Fig. 6 plot. x-axis: NormDist Error; y-axis: Percentage of the whole dataset (%); curves: 1/4 + 1/2 and 1/2 out, Only 1/4 out, Only 1/2 out, 1/4 + 1/2 and 1/4 out]

Fig. 6: Ablation study of multi-scale supervision. The curves compare single 1/4 scale and single 1/2 scale supervised learning with coordinate regression, 1/2 and 1/4 scale fusion learning with 1/4 scale coordinate regression, and 1/2 and 1/4 scale fusion learning with 1/2 scale coordinate regression.

3) Upsampling Module

We select three different upsampling modules (Fig. 2) and measure the impact of the module selection on the performance of VP detection. The results are shown in Table III. When the deconvolution module (A) is adopted, the mean error of NormDist is 0.035541. If we select the 2-stage Up-Projection Unit (C), the mean error of NormDist is 0.034954. When using the Up-Projection Unit (B), the network achieves the smallest mean error (0.034875). Therefore, the VP detection network selects the Up-Projection Unit for the upsampling operation.

TABLE III: Ablation study of different upsampling modules.

Upsampling block   w/ Deconv   w/ UPU   w/ 2-stage UPU   Mean error
DB                 ✓           -        -                0.035541
UPU                -           ✓        -                0.034875
2-stage UPU        -           -        ✓                0.034954

4) Coordinate Regression

Finally, we tested the influence of different coordinate regression selections on detection performance. The results are shown in Table IV. When we only use heatmap regression, the mean error of NormDist is 0.035416. When using a combination of heatmap + multi-scale regression, the mean error of NormDist is 0.035152. When using the strategy of heatmap + multi-scale regression + coordinate regression, the mean error of NormDist is 0.034875. Therefore, we select the last strategy to estimate the coordinates of the road VP.

TABLE IV: Ablation study of different regression components such as single heatmap regression, multi-scale heatmap regression, and coordinate regression.

Type   w/ heatmap regression   w/ multi-scale regression   w/ coordinate regression   Mean error
a      ✓                       -                           -                          0.035416
b      ✓                       ✓                           -                          0.035152
c      ✓                       ✓                           ✓                          0.034875

V. CONCLUSION

Quickly and accurately detecting the vanishing point (VP) in the unstructured road image is of great importance for autonomous driving. In this paper, a CNN based heatmap regression solution for detecting the unstructured road VP is proposed. The modified lightweight backbone, i.e., depthwise convolution modified HRNet, improves the detection speed to 33 fps on an RTX 2080 Ti GPU, which is several times faster than that of the state-of-the-art algorithms. In addition, three useful techniques, namely multi-scale supervised learning, heatmap super-resolution, and coordinate regression, are utilized, which allow the proposed approach to achieve the highest detection accuracy compared with recent existing methods. In the future, we plan to utilize the proposed VP detection technique to carry out research on multi-task learning for accurate lane detection. Our code and the constructed URVP dataset will be made publicly available for the sake of reproducibility.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (No. 61573253), and the National Key R&D Program of China under Grant No. 2017YFC0306200.

REFERENCES

[1] J. Shi, J. Wang, and F. Fu, “Fast and robust vanishing point detection for unstructured road following,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 4, pp. 970–979, 2015.

[2] G. Zhang, H. Zhao, Y. Hong, Y. Ma, J. Li, and H. Guo, “On-orbit space camera self-calibration based on the orthogonal vanishing points obtained from solar panels,” Measurement Science and Technology, vol. 29, no. 6, p. 065013, 2018.
[3] Z. Zhu, Q. Liu, X. Wang, S. Pei, and F. Zhou, “Distortion correction method of a zoom lens based on the vanishing point geometric constraint,” Measurement Science and Technology, vol. 30, no. 10, p. 105402, 2019.
[4] L. Pei, K. Liu, D. Zou, T. Li, Q. Wu, Y. Zhu, Y. Li, Z. He, Y. Chen, and D. Sartori, “Ivpr: An instant visual place recognition approach based on structural lines in manhattan world,” IEEE Transactions on Instrumentation and Measurement, 2019.
[5] J. H. Yoo, S.-W. Lee, S.-K. Park, and D. H. Kim, “A robust lane detection method based on vanishing point estimation using the relevance of line segments,” IEEE Transactions on Intelligent Transportation Systems, vol. 18, no. 12, pp. 3254–3266, 2017.
[6] Y. Ji, A. Yamashita, and H. Asama, “Rgb-d slam using vanishing point and door plate information in corridor environment,” Intelligent Service Robotics, vol. 8, no. 2, pp. 105–114, 2015.
[7] M. Haris, G. Shakhnarovich, and N. Ukita, “Deep back-projection networks for super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 1664–1673.
[8] S. Lee, J. Kim, J. Shin Yoon, S. Shin, O. Bailo, N. Kim, T.-H. Lee, H. Seok Hong, S.-H. Han, and I. So Kweon, “Vpgnet: Vanishing point guided network for lane and road marking detection and recognition,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1947–1955.
[9] H.-S. Choi, K. An, and M. Kang, “Regression with residual neural network for vanishing point detection,” Image and Vision Computing, vol. 91, p. 103797, 2019.
[10] A. Newell, K. Yang, and J. Deng, “Stacked hourglass networks for human pose estimation,” in European conference on computer vision. Springer, 2016, pp. 483–499.
[11] Y. Wang, D. Shen, and E. K. Teoh, “Lane detection using spline model,” Pattern Recognition Letters, vol. 21, no. 8, pp. 677–689, 2000.
[12] T. Tuytelaars, L. Van Gool, M. Proesmans, and T. Moons, “The cascaded hough transform as an aid in aerial image interpretation,” in Sixth International Conference on Computer Vision (IEEE Cat. No. 98CH36271). IEEE, 1998, pp. 67–72.
[13] Y. Wang, E. K. Teoh, and D. Shen, “Lane detection and tracking using b-snake,” Image and Vision computing, vol. 22, no. 4, pp. 269–280, 2004.
[14] R. Ebrahimpour, R. Rasoolinezhad, Z. Hajiabolhasani, and M. Ebrahimi, “Vanishing point detection in corridors: using hough transform and k-means clustering,” IET computer vision, vol. 6, no. 1, pp. 40–51, 2012.
[15] Y. Alon, A. Ferencz, and A. Shashua, “Off-road path following using region classification and geometric projection constraints,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), vol. 1. IEEE, 2006, pp. 689–696.
[16] H. Kogan, R. Maurer, and R. Keshet, “Vanishing points estimation by self-similarity,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2009, pp. 755–761.
[17] E. Wang, A. Sun, Y. Li, X. Hou, and Y. Zhu, “Fast vanishing point detection method based on road border region estimation,” IET Image Processing, vol. 12, no. 3, pp. 361–373, 2017.
[18] J. M. Alvarez, T. Gevers, and A. M. Lopez, “3d scene priors for road detection,” in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 2010, pp. 57–64.
[19] H. Kong, S. E. Sarma, and F. Tang, “Generalizing laplacian of gaussian filters for vanishing-point detection,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 408–418, 2012.
[20] C. Rasmussen, “Roadcompass: following rural roads with vision+ ladar using vanishing point tracking,” Autonomous Robots, vol. 25, no. 3, pp. 205–229, 2008.
[21] H. Kong, J.-Y. Audibert, and J. Ponce, “General road detection from a single image,” IEEE Transactions on Image Processing, vol. 19, no. 8, pp. 2211–2220, 2010.
[22] P. Moghadam, J. A. Starzyk, and W. S. Wijesoma, “Fast vanishing-point detection in unstructured environments,” IEEE Transactions on Image Processing, vol. 21, no. 1, pp. 425–430, 2011.
[23] W. Yang, B. Fang, and Y. Y. Tang, “Fast and accurate vanishing point detection and its application in inverse perspective mapping of structured road,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no. 5, pp. 755–766, 2016.
[24] S.-E. Wei, V. Ramakrishna, T. Kanade, and Y. Sheikh, “Convolutional pose machines,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2016, pp. 4724–4732.
[25] H. Law and J. Deng, “Cornernet: Detecting objects as paired keypoints,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 734–750.
[26] C.-Y. Lee, V. Badrinarayanan, T. Malisiewicz, and A. Rabinovich, “Roomnet: End-to-end room layout estimation,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4865–4874.
[27] G. Ning, Z. Zhang, C. Huang, X. Ren, H. Wang, C. Cai, and Z. He, “Spatially supervised recurrent convolutional neural networks for visual object tracking,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2017, pp. 1–4.
[28] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[29] K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693–5703.
[30] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
[31] B. Cheng, B. Xiao, J. Wang, H. Shi, T. S. Huang, and L. Zhang, “Bottom-up higher-resolution networks for multi-person pose estimation,” arXiv preprint arXiv:1908.10357, 2019.
[32] J. Redmon and A. Farhadi, “Yolo9000: better, faster, stronger,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 7263–7271.