Lightweight Detection Method of Pavement Potholes Based On Binocular Stereo Vision
ARTICLE INFO

Keywords: Pavement potholes detection; Binocular stereo vision; Deep learning; 3D reconstruction; Pavement damage ratio; Lightweighting

ABSTRACT

The identification of pavement defects is crucial and must be done efficiently, accurately, and cost-effectively. A commercial-grade sports camera and mobile vehicle work together to detect pavement potholes in a lightweight manner. A dataset of 6,186 images with a resolution of 1500 by 1500 pixels has been constructed based on vehicle-mounted tilted images. Object detection and image segmentation are performed using a single-model Mask R-CNN, and the principle of binocular stereo vision is employed to reconstruct the three-dimensional (3D) structures of potholes. A novel methodology is introduced to calculate the 3D feature parameters of potholes, enabling the precise determination of the pavement pothole damage ratio. An average accuracy of 98% for detection and 94% for segmentation is achieved with the Mask R-CNN model. The proposed algorithms achieve an average DR calculation accuracy of 82%, facilitating precise identification of surface irregularities. The paper presents an intelligent approach for detecting pavement potholes.
1. Introduction

Potholes present a prevalent and hazardous issue for asphalt pavements, adversely impacting the driving experience and posing a considerable threat to vehicle integrity and traffic safety [1]. To preserve road quality and reduce the impact of potholes, transportation agencies require cost-effective strategies for evaluating and monitoring pavement conditions [2]. With the expansion of the transportation infrastructure network, conventional inspection techniques fall short of addressing the frequency and scope necessary for consistent road detection and maintenance [3]. Consequently, the development of an economical and efficient detection method for potholes is imperative to facilitate routine evaluations.
The traditional pavement pothole detection methods commonly used at present cannot support routine (normalized) detection of pavement potholes. The customary manual visual inspection [4] has considerable drawbacks: it is time-consuming, imprecise, expensive, and unsafe. Furthermore, the reliability of detection outcomes is entirely reliant on the experience of the individual inspector [5]. In contrast, semi-automatic techniques use commercial road inspection vehicles to detect the actual condition of the pavement [6,7]. Vehicles equipped with line array cameras, laser sensors, and longitudinal acceleration sensors are used to collect comprehensive pavement data and identify surface distress across the entire pavement expanse. Yet, the principal limitation of this method lies in the prohibitive investment and operational expenses [5]. Therefore, scholars have focused on developing methodologies that allow for the effective, accurate, and economical identification and localization of pavement potholes.
The development of deep learning methods for computer vision offers more intelligent solutions for detecting pavement potholes. Automated pothole detection approaches can be categorized into three groups: vibration-based, vision-based, and 3D reconstruction-based methods [8]. Vibration-based detection methods, while cost-effective, are limited to identifying potholes that impact the data-collecting vehicle, with their accuracy contingent upon the sensors and the velocity of the vehicle [9,10]. Vision-based detection methods use images or videos as input and detect the number and shape of pavement potholes through image processing and deep learning techniques [11]. Despite being an economical and simple solution, their object detection precision is compromised in adverse environmental conditions such as shadows, lighting variances, and pavement coloration [12]. To minimize the impact of external factors, 3D reconstruction techniques are being increasingly employed for the detection of pavement potholes [13]. This method relies on stereo vision technology to identify the shape of potholes and extract 3D features, including depth, resulting in relatively
* Corresponding author.
E-mail address: [email protected] (L. Zhang).
https://doi.org/10.1016/j.conbuildmat.2024.136733
Received 23 January 2024; Received in revised form 20 April 2024; Accepted 20 May 2024
Available online 7 June 2024
0950-0618/© 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
C. Xing et al. Construction and Building Materials 436 (2024) 136733
2. Methodology
(FPN) with ROI Align layer. The network structure of Mask R-CNN is illustrated in Fig. 2.

2.3. Methods of calculating 3D parameter information for potholes

The Random Sample Consensus (RANSAC) algorithm was employed to extract 3D characteristic parameters of pavement potholes, with a particular emphasis on depth and surface area measurements. The algorithm for pothole surface area extraction was optimized, and a distance-based method was introduced to estimate the average density of the point cloud, facilitating the calculation of pothole surface area. The ground plane equation was determined by RANSAC fitting with 10,000 iterations and a distance threshold of 0.002 m, ensuring the effectiveness and robustness of the RANSAC algorithm.
The extraction method of pothole depth is shown in Fig. 3.
The methodology for calculating the pothole surface area was based on the results of segmenting the ground points, consistent with the extraction method of pothole depth. The surface area was calculated
through the application of a lattice grid technique. It is critical to acknowledge that the size of an individual lattice grid exerts a substantial influence on the result. Due to the discontinuous nature of the reconstructed point cloud, a grid area that is too small can result in an overly dense array, so that the area characterized by the point cloud is smaller than the actual corresponding area, ultimately reducing the calculated area. The calculation error will also increase if the grid area is too large.
A method to estimate the average density of a point cloud based on distance was proposed and the algorithm for calculating the pothole surface area was refined in this paper. The size of each grid was adjusted according to the density of the point cloud, with smaller grids used for higher densities and larger grids used for lower densities. The local density of each point was estimated by calculating its distance to the nearest neighbor. Subsequently, the average density of the point cloud was calculated by averaging the local densities of all points, as shown in Eq. (5) and Eq. (6).

$$d_p = \min\left(\operatorname{dis}(p, q)\right), \quad q = 1, 2, \cdots, N, \; p \neq q \tag{5}$$

$$d = \frac{1}{N} \sum_{p=1}^{N} d_p \tag{6}$$

Where $d_p$ is the distance from a point $p$ to the nearest point $q$ in the point cloud; $N$ is the number of points in the point cloud; $d$ is the average distance density of the point cloud.
A smaller d-value indicates a denser distribution and a larger density of the point cloud, while a larger d-value indicates a sparser distribution and a smaller density of the point cloud. The grid size is determined by the d-value of the point cloud.
The extraction method of pothole surface area is shown in Fig. 4.
The filter processing algorithm is embedded in the parameter extraction algorithm. For the whole parameter extraction algorithm, the algorithm complexity is O(n log n).

2.4. Road condition indicators

The Pavement Condition Index (PCI) [26] is a crucial factor in determining pavement maintenance decisions. PCI can be calculated using Eq. (7) and Eq. (8).

$$PCI = 100 - a_0 DR^{a_1} \tag{7}$$

$$DR = 100 \times \sum_{i=1}^{i_0} \omega_i A_i / A \tag{8}$$

Where DR is the pavement damage ratio; $a_0$ is 15 for asphalt pavement and 10.66 for cement pavement; $a_1$ is 0.412 for asphalt pavement and 0.461 for cement pavement; $\omega_i$ is the weight of the i-th type of pavement defect, which is taken as 1.0 for light potholes and 0.8 for heavy potholes; $A_i$ is the area of the i-th type of pavement defect, m².

3. Data acquisition and processing

3.1. Vehicle-mounted photography system

Data collection was conducted using GoPro9 sports cameras mounted on the center of the front hood of a Volvo vehicle, recording at 30 frames per second. The cameras were 300 mm apart with a focal length of 27 mm and a resolution of 4000 × 3000 for both the left and right images. The cameras had a horizontal field of view of 67°, a vertical field of view of 53°, and a diagonal field of view of 80°.
Guan [21] developed a vehicle-mounted camera system based on multi-view stereo imaging technology with an acquisition range of 2 m in width and 1 m in length. Li [22] fixed the binocular camera on the vehicle, with the camera lens oriented vertically downward and the acquisition speed set between 3 and 15 km/h. This configuration demonstrates a promising, cost-effective approach to automatically detecting 3D pavement distresses. However, it has limitations in the field of high-speed, full-range image acquisition.
To address these limitations, this study improves the mounting mode of the commercial camera, sets the acquisition angle, and increases the acquisition speed while maintaining the quality of the image.
As for the acquisition speed, this paper conducts image acquisition on general urban roads in Harbin, and the speed must meet the speed limit requirements of urban roads, which can be met when the speed is 40 km/h.
Regarding the camera's mounting height, field tests revealed that a high mounting position gives a better field of view but reduces the number of pixels on near-end targets, which affects the efficiency of subsequent target detection and three-dimensional reconstruction. If the camera height is too low, the front hood of the vehicle occludes more of the view and the number of pixels on remote targets is reduced. Therefore, a mounting height of 1.2 m was adopted when collecting the actual pavement images.
Regarding the mounting angle, if vertical shooting is adopted and the mounting height is 1.2 m, the transverse width captured by the GoPro9 is 1.5 m. However, a single lane in the area surveyed in this paper is 3.5 m wide, so the captured width is less than the width of a single lane. Consequently, when data acquisition is conducted with a vertically mounted camera, multiple acquisitions are required for a single lane, which is time-consuming and impractical for lightweight detection of road potholes. Therefore, oblique shooting was adopted. For the determination of the tilt angle, the study found that when the tilt angle is 60°, the camera's shooting range is larger: even if the vehicle deviates from the center of the lane, its long-distance field of view is enough to cover a lane. In addition, the spatial resolution is calculated, and it is found that the spatial resolution is less than 0.01 pixel/cm when the farthest distance is 30 m.
Therefore, in order to ensure the best acquisition effect, the running speed of the acquisition vehicle is set at 40 km/h, that is, 11 m/s. The maximum acquisition distance is set to 20 m, corresponding to the camera mounting angle of 60°. In addition, the camera is set to capture 30 frames per second.
The vehicle-mounted camera system is shown in Fig. 5. The system was specifically engineered to collect data from asphalt pavements, including city streets, elevated roadways, and community roads.
Ultimately, one image was selected every six frames from a total of five hours of pavement image data within the Harbin city limits. After excluding images without potholes, a set of 291 images was manually labeled with pixel-level labels. The dataset covered common interference conditions, including poor lighting and shadow occlusion.
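The acquisition settings of Section 3.1 imply a fixed distance between retained frames. The following Python sketch works through that arithmetic using only figures stated in this section (40 km/h, 30 frames per second, one image kept every six frames, five hours of data). The function name `frame_spacing_m`, the assumption of constant vehicle speed, and the assumption of continuous recording are illustrative and not part of the original study.

```python
def frame_spacing_m(speed_kmh: float, fps: float, keep_every: int) -> float:
    """Distance travelled between two retained frames, assuming constant speed."""
    speed_ms = speed_kmh / 3.6          # km/h -> m/s
    kept_fps = fps / keep_every         # retained frames per second
    return speed_ms / kept_fps


# Values quoted in Section 3.1.
SPEED_KMH = 40
FPS = 30
KEEP_EVERY = 6
HOURS = 5

spacing = frame_spacing_m(SPEED_KMH, FPS, KEEP_EVERY)

# Upper bound on retained frames if recording were continuous for the whole period
# (an assumption; the paper does not state the raw frame count).
candidate_images = HOURS * 3600 * FPS // KEEP_EVERY

print(f"speed: {SPEED_KMH / 3.6:.1f} m/s")
print(f"spacing between retained frames: {spacing:.2f} m")
print(f"candidate images before filtering: {candidate_images}")
```

Under these assumptions the retained frames are a little over 2 m apart, so with a maximum acquisition distance of 20 m the same pothole would normally appear in several consecutive retained images.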
Table 1
Evaluation indicator values for various image enhancement methods. Columns: Methods, SSIM, PSNR, NIQE.
Table 2
AP-values for detection and segmentation of the model. Columns: Categories, Detection, Segmentation.
Contrary to other web-accessible open-source pothole datasets, this dataset captures pothole scenarios from a fixed forward-facing viewpoint at a distance, more closely approximating real-world vehicular perception. The dataset collection conditions have not been simplified, ensuring coverage of a variety of factors that may be encountered during collection on the pavement, including different weather and lighting conditions as well as shadow interference from other objects. Fig. 6 shows a selection of the low-quality images, and it is clear that the dataset is more challenging than most publicly accessible datasets.

3.2. Pavement pothole image data pre-processing

(1) Image enhancement algorithms
The image acquisition process may not always occur under ideal conditions, thereby potentially compromising the accuracy of pavement pothole detection. The effectiveness of image enhancement methods based on Retinex theory and deep learning in enhancing low-quality images was compared in this study. The enhancement in image quality was evaluated from both subjective and objective perspectives. The effects of various methods on the image enhancement in different scenarios are shown in Fig. 7.
As shown in Fig. 7, the same method of image enhancement has varying effects on images with different illumination situations. The MSRCP method achieves a significant enhancement effect in all three situations. It also achieves a good balance of color, illumination, and pothole edge information, resulting in an image with realistic color, smooth detail transitions, and enhanced visual appeal.
The quality of the image was evaluated using objective reference quality evaluation metrics such as peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), as well as non-reference quality evaluation metrics such as natural image quality evaluator (NIQE).
The PSNR can be calculated using Eq. (9) and Eq. (10).

$$PSNR = 10 \log_{10}\left(\frac{(2^n - 1)^2}{MSE}\right) \tag{9}$$

$$MSE = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left(X(i, j) - Y(i, j)\right)^2 \tag{10}$$

Where $n$ is the number of bits per pixel, which is taken as 8 for grayscale images; $H$ and $W$ are the image height and width in pixels; $X(i, j)$ and $Y(i, j)$ are the pixel values of the images X and Y.
The quality of an image is better when the NIQE value is smaller, indicating a smaller gap with the natural image. The SSIM can be calculated using Eq. (11).

$$SSIM(X, Y) = \frac{2\mu_X \mu_Y + C_1}{\mu_X^2 + \mu_Y^2 + C_1} \cdot \frac{2\sigma_X \sigma_Y + C_2}{\sigma_X^2 + \sigma_Y^2 + C_2} \cdot \frac{\sigma_{XY} + C_3}{\sigma_X \sigma_Y + C_3} \tag{11}$$

Where $\mu_X$ and $\mu_Y$ are the averages of the images X and Y, respectively; $\sigma_X$ and $\sigma_Y$ are the standard deviations of the images X and Y, respectively; $\sigma_{XY}$ is the covariance of the images X and Y; $C_1$, $C_2$, $C_3$ are constants, $C_1 = (K_1 L)^2$, $C_2 = (K_2 L)^2$, $C_3 = C_2 / 2$, and in general, $K_1 = 0.01$, $K_2 = 0.03$, $L = 255$.
The results are shown in Table 1.
As can be seen from Table 1, the MSRCP method outperforms the other methods in terms of SSIM values, and is second only to the MSRCR method in terms of PSNR values. The NIQE value obtained from the MSRCP method is only slightly higher than that obtained from the KinD method.
Based on both subjective and objective assessments of image quality, the MSRCP method provides the best image enhancement. It reproduces color more faithfully, retains the edge information of potholes, and increases brightness without sacrificing detail. In addition, the image enhanced by the MSRCP method has a smoother overall transition that matches the natural characteristics of the image. Thus, the MSRCP method was chosen as the optimal strategy for image enhancement under conditions of low illumination.
(2) Training objectives labelling
The resolution of the images captured by the GoPro9 is 4000 × 3000. At this scale the images are too large, so they were cropped to form an optimized image dataset. A cropping approach incorporating the Intersection over Union (IoU) was introduced in this paper to preserve as much pertinent target information as possible within the resultant sub-images, so as to improve the accuracy of the cropped dataset and thus the effectiveness of model training. The cropping process was executed using a sliding window of 1500 × 1500 pixels with an overlap rate of 0.7. The threshold for the intersection ratio between the sub-image labeled box and the original labeled box was set to 0.3: the sub-image and the information in the bounding box were preserved during cropping if the overlap between the part of the target bounding box lying within the orange sliding window (depicted in blue) and the original bounding box (marked in red) exceeded 0.3, ensuring that the integrity of both the target information and the bounding box was maintained in the cropped sub-image. The cropping process of a target is shown in Fig. 8.
The final result of the cropping is that the training set: validation set: testing set is 1742: 272: 409, with an approximate ratio of 7: 1: 2.
The LABELME software was used to label the dataset, and the labeling targets were manhole covers, longitudinal potholes and general potholes. Compared to the simple use of rectangular boxes to label the dataset, polygonal boxes were used to label the dataset at the pixel level, which provides a more detailed labeling of the shape and size of
the potholes in the image and is more conducive to retaining the pothole shape and size information.
(3) Image Augmentation algorithms
Data augmentation is a widely used technique in deep learning that expands the dataset and improves the generalization of the model by applying a series of stochastic transformations to the original data. Seven methods were utilized for image augmentation, as shown in Fig. 9.
Ultimately, the dataset was augmented through the application of various image augmentation methods to the set of 1742 cropped images. Each data augmentation method was applied with a probability of 50%. After the augmentation process, images without labels, images in which the target's pixel count fell below 100, and images containing bounding boxes less than 5 pixels in width and height were removed from the dataset. The final augmented dataset of 5055 images was obtained. The validation and testing sets were not augmented, since the model is evaluated by its prediction performance on real images.

3.3. Model training environment and hyperparameter optimisation

The Mask R-CNN model was trained and validated on a workstation with a CPU of Intel(R) Xeon(R) Silver 4214R x2 and a GPU of NVIDIA GeForce RTX 3090. The software configurations used for this study consisted of Python 3.9.12, PyTorch 1.10.0, CUDA 11.3.1, and Torchvision 0.11.1.
The Mask R-CNN model was trained using the Adam optimizer. The initial learning rate for the model was 0.00001, which was then scaled down to 0.000001 after 50 epochs. The batch size was set to 16, and the training set consisted of 5,055 images with a size of 600 × 600 pixels. The size of the predicted image is also 600 by 600 pixels. One epoch was defined as 314 iterations, and the model was trained for a total of 100 epochs, resulting in 31,400 iterations in the entire training process. The model demonstrated optimal performance on the validation set, indicative of its robustness and precision in image recognition tasks.

3.4. 3D reconstruction of pavement potholes

(1) Camera calibration
The camera was calibrated using the Zhang Zhengyou method [27] to eliminate lens distortion, including radial and tangential distortion. Eighteen checkerboard images were collected from different angles, with an average calibration error of 1.17 pixels. The calibration process
is shown in Fig. 10.
The intrinsic parameter matrix M of the camera was obtained, as shown in Eq. (12).

$$M = \begin{bmatrix} f_x & \gamma & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3514.5 & -5.7511 & 2607.7 \\ 0 & 3478.5 & 1939.6 \\ 0 & 0 & 1 \end{bmatrix} \tag{12}$$

The corresponding lens radial distortion coefficients $k_1$, $k_2$ were 0.0122 and -0.0048, and the lens tangential distortion coefficients $p_1$, $p_2$ were -0.0004404 and 0.0027.
(2) 3D point cloud reconstruction
The binocular stereo vision reconstruction method was employed for reconstructing the three-dimensional point cloud of potholes. The process involved matching feature points, calculating parallax, and transforming coordinates to convert images into a point cloud of the potholes.
The potholes were reconstructed in 3D using METASHAPE software. This process involved importing the captured images and inputting the internal and external parameter matrices of the camera calibration. The captured images were then aligned to obtain the corresponding feature points, and a parallax image was generated based on the matching results. Finally, the 3D point cloud of the potholes was reconstructed based on the parallax image and the camera parameters. Fig. 11 shows the comprehensive flowchart of 3D point cloud reconstruction.
(3) 3D point cloud interception
The point cloud reconstructed within the field of view of the binocular stereo camera encompasses data from all objects in the visual field, including a significant amount of non-pothole point cloud information, which has a negative impact on subsequent point cloud processing and related parameter calculation. To address this issue, it is necessary to precisely extract the pavement pothole portion of the reconstructed point cloud and obtain the relevant pothole point cloud information. Image segmentation can provide pixel-level position information for potholes in captured images. Using this spatial data, along with the depth map and camera calibration parameters obtained during the 3D reconstruction phase, a well-defined point cloud of potholes can be extracted. The methodology for this interception process is depicted in Fig. 12.
This methodology enables the acquisition of the three-dimensional morphology of the pothole. However, at this stage, the pothole point cloud is highly disorganized, necessitating the application of the methodologies described in Section 2.3 for the calculation of three-dimensional parameter information for potholes. This will facilitate the effective extraction of pertinent data, such as the surface area and depth of the pothole.
(4) Point cloud filtering
The point cloud derived from 3D reconstruction potentially contains
superfluous and noisy points, which could detrimentally affect subsequent data processing and analysis. Thus, it is necessary to remove these points using filtering. Statistical filtering was chosen due to the uniform color of the asphalt pavement and its classification as a region of weak texture, as well as the significant variance in the density of the reconstructed point cloud. Fig. 13 illustrates the effect of statistical filtering.
As shown in Fig. 13, the statistical filtering process eliminates some ground points and causes the edges of the point cloud to appear rounded, but the overall filtering result is continuous and reasonable.
(5) Point cloud attitude standardization
Point cloud segmentation was used to determine the spatial coordinates of the ground plane, as the ground point cloud makes up the majority of the reconstructed point cloud. The normal vector and Z-axis
inclination angle were then calculated, and the ground point cloud was rotated by the corresponding angle using the normal vector rotation method to achieve parallelism with the XOY plane. This process is shown in Fig. 14. It can be seen that the standardization of the attitude emphasizes the height difference of the reconstructed point cloud.

3.5. On-site acquisition of 3D parameter information for potholes

A manual measurement method was used to collect depth and surface area from 15 sets of potholes to verify the validity and accuracy of the proposed 3D parametric algorithm for pavement potholes. The maximum depth of each pothole was determined by taking an average of 10 readings at the deepest point, measured using vernier calipers. To quantify the surface area, a calibrated ruler was placed in the images to determine the pixel-to-real-distance scale, which was combined with the number of pixels in each pothole to determine the surface area proportionally. Fig. 15 illustrates the process of acquiring 3D information of pavement potholes on-site.
In Fig. 15, the area of the pothole pixels was 237,030 pixels and the pixel ratio was 3.65 mm2/pixel. The final calculated surface area of the pothole was 86.71 cm2, which was within 1% of the actual area of 86.55 cm2. The study collected 5 severely damaged potholes and 10 mildly damaged potholes based on the current standard for categorizing the degree of pothole damage.

4. Results and discussion

4.1. Training and validation of Mask R-CNN models

The changes of the AP values of the validation set and the model loss during the training process are shown in Fig. 16.
Fig. 16 shows that as the number of training epochs increases, the loss decreases steadily with fluctuations, and the AP-values for detection and segmentation also increase steadily with some variation. The final AP-values for detection and segmentation of the model are shown in Table 2.
When the IoU threshold is set to 0.5, the model achieves a mAP value of 98.10% for detection and 94.00% for segmentation, demonstrating the excellent ability of the model to detect and segment pavement defects. The effectiveness of object detection and segmentation is demonstrated in Fig. 17.
It can be seen from Fig. 17 that the model effectively detects and segments general potholes, longitudinal potholes, and manhole covers with high precision. Additionally, the model maintains reliable
detection and segmentation performance even when multiple adjacent objects are present or when different objects coexist. It precisely locates general potholes, longitudinal potholes, and manhole covers within images, executing pixel-level segmentation with accuracy.

4.2. 3D reconstruction test results of potholes

The algorithm proposed for extracting the 3D parameters of pavement potholes using binocular vision was evaluated by acquiring on-site 3D parameter information for the potholes. The results are presented in Table 3.
Table 3 shows that the depth of the potholes ranges from a maximum of 30.95 mm to a minimum of 10.06 mm. The relative error of the depth spans from -19.43% to 23.425%, with an absolute error between 0.03 mm and 7.25 mm. Additionally, the surface area of the potholes extends from a maximum of 2,472.53 mm2 to a minimum of 117.05 mm2. The relative error for the surface area measurements fluctuates between -19.83% and 38.437%, with an absolute error ranging from 2.33 mm2 to 123.63 mm2. Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Mean Forecast Error (MFE) were used to evaluate the results. RMSE measures the deviation between the true and detected values, while MAPE gauges detection accuracy, and MFE reflects data concentration. For pothole depth, the calculated RMSE was 13.08, with the MAPE at 12.528%, and MFE recorded as -0.248. For pothole surface area, the values of RMSE, MAPE, and MFE were 524.7, 18.189%, and -9.723, respectively. The data indicate that the values of RMSE and MAPE associated with pothole depth and surface area measurements are inflated by the presence of a few detection results with extremely large relative errors. However, the MFE values for both are relatively small, suggesting sufficient data concentration to ensure accurate calculation results. Large errors in pothole reconstruction can be attributed to two main factors. Firstly, the asphalt pavement is a weak texture object, making it difficult to extract and match feature points, which in turn affects the accuracy of point cloud reconstruction. Secondly, camera calibration errors can also contribute to inaccuracies. The Zhang Zhengyou calibration method [27] was utilized in this paper, which resulted in a calibration error of 1.17 pixels, introducing a measure of uncertainty into the calibration outcomes.
However, the analysis showed that the predicted damage level for potholes No. 6 and No. 7 was inaccurately assessed. This can be attributed to their depths nearing 25 mm, which is a threshold that is prone to misclassification. For the remaining 13 potholes, the predicted damage level agreed with the true results, thereby affirming the efficacy of the proposed algorithm in calculating pothole depth and surface area. As for the DR, its relative error was consistent with the surface area error, attributable to DR being a ratio of the damaged area to the total detected area. The average detection accuracy for DR prediction is 82%.
This paper analyzed the correlation between true values and model detection values of pothole depth and surface area, using true values as independent variables and model detection values as dependent variables, as shown in Fig. 18.
It can be seen from Fig. 18 that the goodness of fit (R2) between the true values and the detection values of the pothole depth is 0.948.
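The paper does not spell out the formulas behind RMSE, MAPE, MFE, and R2, so the following Python sketch uses their conventional definitions as an assumption. In particular, the sign convention for MFE (detected minus true) and the use of a least-squares line with true values as the independent variable for R2 are assumed here, and the sample depths are invented for illustration rather than taken from Table 3.

```python
import math


def rmse(true_vals, detected_vals):
    """Root mean square error between detected and true values."""
    n = len(true_vals)
    return math.sqrt(sum((d - t) ** 2 for t, d in zip(true_vals, detected_vals)) / n)


def mape(true_vals, detected_vals):
    """Mean absolute percentage error in percent (true values must be non-zero)."""
    n = len(true_vals)
    return 100.0 * sum(abs(d - t) / abs(t) for t, d in zip(true_vals, detected_vals)) / n


def mfe(true_vals, detected_vals):
    """Mean forecast error; the sign convention (detected minus true) is an assumption."""
    n = len(true_vals)
    return sum(d - t for t, d in zip(true_vals, detected_vals)) / n


def r_squared(true_vals, detected_vals):
    """R2 of a least-squares line with true values as x and detected values as y."""
    n = len(true_vals)
    mean_x = sum(true_vals) / n
    mean_y = sum(detected_vals) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(true_vals, detected_vals))
    sxx = sum((x - mean_x) ** 2 for x in true_vals)
    syy = sum((y - mean_y) ** 2 for y in detected_vals)
    # For a simple linear fit with intercept, R2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical depths in mm, invented for illustration only.
true_depth = [10.0, 20.0, 30.0]
detected_depth = [11.0, 19.0, 33.0]

print(f"RMSE = {rmse(true_depth, detected_depth):.3f} mm")
print(f"MAPE = {mape(true_depth, detected_depth):.3f} %")
print(f"MFE  = {mfe(true_depth, detected_depth):.3f} mm")
print(f"R2   = {r_squared(true_depth, detected_depth):.4f}")
```

Because positive and negative errors cancel in MFE but not in RMSE or MAPE, a small MFE can coexist with a large RMSE, which matches the pattern discussed in Section 4.2.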
Table 3
Results of potholes processing (for each numbered pothole: left view, right view, and 3D reconstruction).
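Since the DR evaluated in Section 4.2 feeds directly into the PCI, a short worked example of Eq. (7) and Eq. (8) may help. The Python sketch below uses the coefficients and weights quoted in Section 2.4; the function names, the surveyed area of 100 m², and the two pothole areas are hypothetical values chosen only to illustrate the arithmetic.

```python
def damage_ratio(defects, surveyed_area_m2):
    """Eq. (8): DR = 100 * sum(w_i * A_i) / A.

    `defects` is a list of (weight, area_m2) pairs; A is the surveyed area in m2.
    """
    if surveyed_area_m2 <= 0:
        raise ValueError("surveyed area must be positive")
    return 100.0 * sum(w * a for w, a in defects) / surveyed_area_m2


def pci(dr, a0=15.0, a1=0.412):
    """Eq. (7): PCI = 100 - a0 * DR**a1 (defaults are the asphalt values of Section 2.4)."""
    return 100.0 - a0 * dr ** a1


# Hypothetical survey: one light and one heavy pothole on 100 m2 of asphalt pavement.
# Weights follow the values stated in Section 2.4 (1.0 light, 0.8 heavy).
defects = [(1.0, 0.5), (0.8, 0.25)]
dr = damage_ratio(defects, surveyed_area_m2=100.0)
index = pci(dr)

print(f"DR  = {dr:.3f}")
print(f"PCI = {index:.2f}")

# Cement pavement uses a0 = 10.66 and a1 = 0.461 according to Section 2.4.
print(f"PCI (cement) = {pci(dr, a0=10.66, a1=0.461):.2f}")
```

Because the exponent a1 is less than 1, PCI drops steeply for the first small increments of DR, so errors in the measured pothole area matter most on pavements that are otherwise in good condition.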
Similarly, the R2 between the true values of pothole surface area and the detection values is 0.9752. The results indicate that the proposed algorithm for measuring the 3D parameters of potholes based on binocular stereo vision can reliably detect the 3D features of potholes with values close to the true values.

5. Conclusions

In response to the current difficulties in normalizing the detection of pavement potholes, an innovative and cost-effective approach is proposed in this paper. The conclusions are as follows:
(1) Commercial motion cameras can be employed for the detection of pavement potholes. Lightweight detection and segmentation of pavement potholes is achieved on the basis of vehicle-mounted tilt images.
(2) The Mask R-CNN model demonstrates high average precision in the detection and segmentation of pavement pothole images, achieving rates of 98.10% and 94.00% respectively at an IoU of 0.5. Furthermore, this model effectively detects and segments various types of defects that coexist, exhibiting robustness in its performance.
(3) Automated extraction of the pavement potholes point cloud is realized based on the image segmentation results. The relative error rate of the proposed algorithm based on binocular stereo vision to extract the depth and surface area of pavement potholes is 12.528% and 18.189%, respectively, and the maximum accuracy rate is 99.76% and 99.19%.
(4) The pavement damage ratio (DR) is calculated from the damage level and 3D parameters of potholes with an average accuracy of 82%.
(5) The goodness of fit between the actual measurements and the model detection values of pavement pothole depth and surface area exceeds 93%, indicating a strong correlation between the two variables and confirming the reliability of the proposed new algorithm for 3D characterization of potholes.
In conclusion, an efficient and accurate lightweight method is proposed in this paper to address the current difficulties encountered in pavement pothole normalization detection. In future research, it is recommended to expand the research scope on pavement surface defects and explore optimal hyperparameter combinations for the model using an intelligent search strategy. The deep learning model should be constructed with high precision to enhance its adaptability and robustness
in real-world complex and variable environments. This will enable more effective handling of various types of pavement defects and provide stable and consistent assessment of pavement surface conditions in a wide range of application scenarios.

Disclosure statement

No potential conflict of interest was reported by the authors.

CRediT authorship contribution statement

Mu Li: Writing – original draft. Lei Zhang: Writing – review & editing, Funding acquisition. Yiqiu Tan: Writing – review & editing, Funding acquisition. Yongkang Zhang: Writing – original draft. Hao Deng: Writing – original draft. Chao Xing: Writing – review & editing, Methodology, Funding acquisition. Guiping Zheng: Writing – original draft, Methodology, Data curation.

Declaration of Competing Interest

We wish to draw the attention of the Editor to the following facts which may be considered as potential conflicts of interest and to significant financial contributions to this work.

Data Availability

Data will be made available on request.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (Grant No. 2022YFB2602600) and National Natural Science Foundation of China (Grant No. 52378446 and No. U20A20315) and Natural Science Foundation of Heilongjiang Province (Grant No. YQ2022E037).

References

[1] C. Zhang, G. Li, Z. Zhang, et al., AAL-net: a lightweight detection method for road surface defects based on attention and data augmentation, Appl. Sci.-Basel 13 (3) (2023).
[2] E. Ranyal, A. Sadhu, K. Jain, Automated pothole condition assessment in pavement using photogrammetry-assisted convolutional neural network, Int. J. Pavement Eng. 24 (1) (2023).
[3] L. Zhang, M. Shan, C. Xing, et al., Mechanism of physical hardening on the fracture characteristics of polymer-modified asphalt binder, Constr. Build. Mater. 409 (2023) 134091.
[4] Y. Zhang, Z. Ma, X. Song, et al., Road surface defects detection based on IMU sensor, IEEE Sens. J. 22 (3) (2022) 2711–2721.
[5] N. Ma, J. Fan, W. Wang, et al., Computer vision for road imaging and pothole detection: a state-of-the-art review of systems and algorithms, Transp. Saf. Environ. 4 (4) (2022) 3.
[6] C. Abdollahi, M. Mollajafari, A. Golroo, et al., A review on pavement data acquisition and analytics tools using autonomous vehicles, Road. Mater. Pavement Des. (2023) 1–27.
[7] Y. Deng, X. Shi, Y. Kou, et al., Optimized design of asphalt concrete pavement containing phase change materials based on rutting performance, J. Clean. Prod. 380 (2022).
[8] Q. Mei, M. Gül, A cost effective solution for pavement crack inspection using cameras and deep neural networks, Constr. Build. Mater. 256 (2020) 119397.
[9] H. Xin, Y. Ye, X. Na, et al., Sustainable road pothole detection: a crowdsourcing based multi-sensors fusion approach, Sustainability 15 (8) (2023).
[10] J. Liu, Y. Wang, H. Luo, et al., Pavement surface defect recognition method based on vehicle system vibration data and feedforward neural network, Int. J. Pavement Eng. 24 (1) (2023).
[11] M. Azimi, A. Eslamlou, G. Pekcan, Data-driven structural health monitoring and damage detection through deep learning: state-of-the-art review, Sensors 20 (10) (2020).
[12] Y. Kim, Y. Kim, S. Son, et al., Review of recent automated pothole-detection methods, Appl. Sci.-Basel 12 (11) (2022).
[13] A. Dhiman, R. Klette, Pothole detection using computer vision and learning, IEEE Trans. Intell. Transp. Syst. 21 (8) (2020) 3536–3550.
[14] H. Laga, L. Jospin, F. Boussaid, et al., A survey on deep learning techniques for stereo-based depth estimation, IEEE Trans. Pattern Anal. Mach. Intell. 44 (4) (2022) 1738–1764.
[15] X. Xiong, Y. Tan, Pixel-level patch detection from full-scale asphalt pavement images based on deep learning, Int. J. Pavement Eng. 24 (1) (2023).
[16] W. Lin, X. Li, H. Han, et al., A novel approach for pavement distress detection and quantification using RGB-D camera and deep learning algorithm, Constr. Build. Mater. 407 (2023).
[17] Y. Liu, F. Liu, W. Liu, et al., Pavement distress detection using street view images captured via action camera, IEEE Trans. Intell. Transp. Syst. (2023).
[18] Y. Liu, Y. Wang, X. Cai, et al., The detection effect of pavement 3D texture morphology using improved binocular reconstruction algorithm with laser line constraint, Measurement 157 (2020) 107638.
[19] Y. Wang, R. Wang, X. Ren, et al., Improvement of binocular reconstruction algorithm for measuring 3D pavement texture using a single laser line scanning constraint, Cmes-Comput. Model. Eng. Sci. 136 (2) (2023) 1951–1972.
[20] Y. Du, Z. Zhou, Q. Wu, et al., A pothole detection method based on 3D point cloud segmentation, Twelfth International Conference on Digital Image Processing (ICDIP 2020), 11519 (2020).
[21] J. Guan, X. Yang, L. Ding, et al., Automated pixel-level pavement distress detection based on stereo vision and deep learning, Autom. Constr. 129 (2021) 103788.
[22] J. Li, T. Liu, X. Wang, Advanced pavement distress recognition and 3D reconstruction by using GA-DenseNet and binocular stereo vision, Measurement 201 (2022) 111760.
[23] W. Luo, Y. Qin, Q. Li, et al., Automatic mileage positioning for road inspection using binocular stereo vision system and global navigation satellite system, Autom. Constr. 146 (2023) 104705.
[24] Q. Xie, X. Hu, L. Ren, et al., A binocular vision application in IoT: realtime trustworthy road condition detection system in passable area, IEEE Trans. Ind. Inform. 19 (1) (2023) 973–983.
[25] K. He, G. Gkioxari, P. Dollár, et al., Mask R-CNN, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2) (2020) 386–397.
[26] J. Li, T. Liu, X. Wang, et al., Automated asphalt pavement damage rate detection based on optimized GA-CNN, Autom. Constr. 136 (2022) 104180.
[27] S. Han, X. Dong, X. Hao, et al., Extracting objects’ spatial-temporal information based on surveillance videos and the digital surface model, ISPRS Int. J. Geo-Inf. 11 (2) (2022).