Data Deep
Review
Deep Learning in Data-Driven Pavement Image
Analysis and Automated Distress Detection:
A Review
Kasthurirangan Gopalakrishnan
Abstract: Deep learning, more specifically deep convolutional neural networks, is fast becoming
a popular choice for computer vision-based automated pavement distress detection. While
pavement image analysis has been extensively researched over the past three decades or so, recent
ground-breaking achievements of deep learning algorithms in the areas of machine translation,
speech recognition, and computer vision have sparked interest in the application of deep learning
to automated detection of distresses in pavement images. This paper provides a narrative review
of recently published studies in this field, highlighting the current achievements and challenges.
A comparison of the deep learning software frameworks, network architecture, hyper-parameters
employed by each study, and crack detection performance is provided, which is expected to provide
a good foundation for driving further research on this important topic in the context of smart
pavement or asset management systems. The review concludes with potential avenues for future
research, especially in the application of deep learning to not only detect, but also characterize the
type, extent, and severity of distresses from 2D and 3D pavement images.
Keywords: pavement cracking; pavement management; pavement imaging; 3D image; deep learning;
TensorFlow; deep convolutional neural networks
1. Introduction
Transportation infrastructure systems are essential to the minimum operations of the government
and commerce, and are considered the backbone of a nation’s economy. Yet, they are literally
crumbling across the globe and are considered “to be on life support” even by authorities in charge of
maintaining them, especially in the United States. According to the 2017 American Society of Civil
Engineers (ASCE) Infrastructure Report Card, the road infrastructure in the United States received a ‘D’
grade [1]. This state is due, in large part, to delayed maintenance and underinvestment in upgrades
of transportation infrastructure systems. This, combined with increasing budgetary constraints,
necessitates the development of efficient structural and functional health monitoring techniques for
early detection of distresses developing in pavements. This can lead to significant cost savings resulting
from timely maintenance and repair activities.
Highway agencies typically employ dedicated pavement data collection vehicles equipped with
high-speed digital cameras or 3D laser scanners to inspect the pavement surface and acquire 2D or
3D pavement images [2,3]. In recent years, 3D automated survey systems have also been introduced
to acquire high-resolution 3D images of the pavement surface, which also offer opportunities for
detecting other distresses apart from cracking [4,5].
Automated detection of distresses from pavement images (see Figure 1) is a challenging problem
that has been quite thoroughly studied by the computer vision research community for more than three
decades. However, the challenges associated with 2D pavement images, such as variations in image
source (digital camera, smartphone, unmanned aerial vehicle (UAV), etc.), non-uniformity of cracks,
surface texture (e.g., tining), lack of sufficient background illumination, and presence of other features
such as joints, among others, continue to keep this area of research active with researchers constantly
seeking newer methods and algorithms to address these challenges. Not surprisingly, the recent
achievements by so-called deep learning (DL) algorithms have caught the attention of the pavement
image analysis community and have inspired them to move from systems that use handcrafted features
to data-driven distress detection systems (i.e., systems that automatically learn features from the images).
Because of the availability of inexpensive, parallel hardware, and massive amounts of unlabeled data,
deep learning has already produced breakthrough results in computer vision, speech recognition,
and text processing. A popular example of DL success is its deployment in self-driving cars.
Figure 1. A sample Portland cement concrete (PCC)-surfaced pavement distress image captured using
pavement data collection vehicle moving at highway speed and equipped with a downward-looking
high-speed digital camera (Source: FHWA LTPP database).
To date, applications of DL to pavement image analysis have mainly employed convolutional
neural networks (CNNs or ConvNets), a specific DL model with many convolution layers. Deep
CNNs (DCNNs) are characterized by deeper architectures with numerous hidden layers enabling
them to learn many levels of abstraction, as opposed to shallow architectures with typically fewer
hidden layers [6–8]. Although the first published works on the application of CNNs or DL in general
to pavement crack detection appeared in 2016, some 12 papers have already been published between
2016 and beginning of 2018. Although not as steep a growth in research productivity as in other areas
such as medical image analysis, it is evident that the interest in the application of DL to address various
challenges in vision-based automated pavement distress detection is fast growing.
This paper provides the first narrative review on the application of deep learning to pavement
image analysis and automated distress detection. Although there is not yet a sufficient number of
papers on this topic to conduct a comprehensive survey in the traditional sense, this quick review
nonetheless covers enough ground to assess the state-of-the-art and will hopefully spur future research
in the application of DL to pavement image analysis. Considering the rather narrow range of studies
published so far on this topic, peer-reviewed journal articles, as well as articles appearing in conference
proceedings, were included in this narrative review. Additionally, one preprint appearing in arXiv
online repository was also reviewed.
The rest of this review paper is structured as follows. In Section 2, we summarize the existing and
emerging deep learning software frameworks for computer vision applications, especially relevant to
Data 2018, 3, 28 3 of 19
automated pavement distress detection. Section 3 forms the main crux of this paper, where each study
reviewed in this paper is briefly summarized. Section 4 provides useful overview and comparison
tables of all publications in terms of objectives and datasets, network architecture and hyper-parameters,
software framework used, hardware specs, and summary of test results. We conclude with a summary
and some emerging areas of future research driven by ongoing advances in deep learning technology.
2. Some Existing and Emerging Deep Learning Frameworks for Computer Vision Applications
A significant amount of parallelism in computations is involved in DCNN implementations.
To facilitate this, a number of popular open-source DL software frameworks exist, such as Caffe,
the Microsoft Cognitive Toolkit (CNTK), Google’s TensorFlow, Theano, Torch, dmlc MXnet, Chainer,
and Keras, among others. These frameworks make use of the system’s hardware (CPU and/or GPU)
settings to implement parallel programming and accelerate the computational process. A brief overview
of the most commonly used DL software frameworks for computer vision applications (especially
relevant to crack detection) is presented in this section, highlighting their pros and cons. This is mostly
based on some recently published comparative studies on DL software frameworks by evaluating
these frameworks in terms of extensibility, hardware utilization, and speed [9–11]. It is worth noting
that all these DL frameworks are undergoing constant development with active contributions from
researchers and the open-source community, and therefore the study results and conclusions from the
reported comparative studies may not be up to date.
2.1. Caffe
Caffe is developed at the Berkeley AI Research (BAIR) center and the Berkeley Vision and Learning
Center (BVLC) at the University of California, Berkeley with “expression, speed, and modularity in
mind” [12]. It is considered to be an easy-to-deploy production platform developed exclusively
for DL-based computer vision systems and is believed to be one of the fastest ConvNet or CNN
implementations available with an ability to process over 60 million images per day [13].
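To put the quoted throughput in per-image terms, the short calculation below converts it to images per second and milliseconds per image. It is only a sketch: it assumes the 60 million images are processed at a constant rate over a 24 h day, which the source does not state, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_image_latency_ms(images_per_day):
    """Average time per image in milliseconds, assuming a constant rate over 24 h."""
    images_per_second = images_per_day / SECONDS_PER_DAY
    return 1000.0 / images_per_second


images_per_day = 60_000_000  # figure quoted for Caffe in the text
print(round(images_per_day / SECONDS_PER_DAY, 1), "images/s")
print(round(per_image_latency_ms(images_per_day), 2), "ms/image")
```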
2.2. TensorFlow
TensorFlow, originally developed by researchers and engineers working on the Google Brain
Team, uses data flow graphs for numerical computation and is mainly designed for developing and
implementing deep neural network models [14]. One major advantage of TensorFlow that vastly
increased its popularity among DL researchers and companies is its ability to deploy computation to one
or more CPUs/GPUs on a variety of systems and devices through a single application programming
interface (API) [10]. Based on a comparative study of Theano, Torch, Neon, and TensorFlow DL
frameworks, Bahrampour et al. [9] concluded that TensorFlow, although a very flexible framework,
is not as competitive as other studied frameworks in terms of its performance on a single GPU. However,
in a similar benchmarking study comparing Caffe, CNTK, TensorFlow, and Torch, Shi et al. [11] reported
that no single framework consistently outperforms others. Fonnegra et al. [10] reported that TensorFlow
was faster than Theano when the DL architecture contained long short-term memory (LSTM) units
as its core.
2.3. Theano
Theano, named after the Greek mathematician who may have been Pythagoras’ wife, was
originally developed at the Montreal Institute for Learning Algorithms (MILA) as a “CPU and GPU
compiler in Python” to “support rapid development of efficient machine learning algorithms” [15].
Although not intended to be a specific DL framework, it is a Python library for speedy numerical
computations and is considered to be a forerunner of today’s open-source DL frameworks, such as
CNTK and TensorFlow. The efficient symbolic differentiation (ability to compute derivatives for
functions with one or more inputs) offered by Theano is considered to be a big advantage for
implementing non-standard DL architectures [9]. For GPU-based training and deployment of both
CNNs and LSTM recurrent neural networks (RNNs), Bahrampour et al. [9] reported that Theano
resulted in the best performance, in comparison with Torch and TensorFlow, for smaller networks.
As of October 2017, after almost ten years of development, it was reported that there will be no further
development of Theano after Version 1.0.
2.4. Torch
Torch, based on the Lua programming language, is a scientific computing platform for building
machine learning algorithms with fast and efficient GPU support [16]. Since its initial release in 2002,
Torch has grown to be a popular open-source DL framework for commercial applications (Facebook,
IBM, etc.), academic research studies, and so on, owing to its simplicity and extensibility [10]. Based on
the results of a comparative study by Bahrampour et al. [9], Torch performed the best for CPU-based
training and deployment of any DL architecture that was tested.
2.5. Keras
Keras is a high-level Python DL library and API capable of running on top of TensorFlow, CNTK,
or Theano as the backend [17]. It is well known, among both budding DL researchers and experienced
ones, for its ease-of-use (minimal programming) and ability to allow fast prototyping. Like other
open-source DL software frameworks, Keras is built on the guiding principles of user-friendliness,
modularity, and extensibility.
takes similar computing time for each image (around half a minute). On the other hand, much human
time goes in annotating the images for classification using DIGITS DL. However, once the DL model
was developed, it took only 2.5 ms to analyze one image. It was shown that DL is more time-efficient,
especially when the number of images exceeds 8000 [31].
Some [31] concluded that DL has significant potential in automated pavement crack detection
and classification, especially when it comes to “big data” image analysis. Some [31] also suggested
carrying out image acquisition in cloudy weather, or adding an illumination device to the
mobile mapping system, to avoid shadows and increase the image quality. Because the accuracy of
DL-based classification is dependent on the quantity of images used in training, Some [31] suggested
using at least 10,000 images in each class for better classification accuracy. Extending the DL-based
crack identification analysis to one of object detection analysis, especially with street view images and
crack severity classification (low, medium, and high), was also recommended as a potential area for
future investigation.
2 focused on examining the variation in location between the training and testing set (i.e., train on
images from one location and test on images from another location) on the crack detection performance.
While a computer node at the high performance computing (HPC) facility at the University of Leeds
was used for network training, the testing was carried out on an Intel® Xeon® desktop workstation
with 128 GB RAM and NVIDIA Quadro M4000 GPU. The DCNN framework was implemented in Keras
with TensorFlow as the backend. Based on their findings, Pauly et al. [34] concluded that increasing
the depth of DCNNs does lead to better crack detection performance in terms of accuracy and recall,
although they could not define a threshold. Further, the system failed to perform well when the locations
of the pavement images used in training and testing were different [34].
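The cross-location experiment described above (train on images from one location, test on images from another) amounts to splitting the dataset by site rather than at random. The following is a minimal sketch of such a split, not the procedure used in [34]; the record layout, site names, and function name are hypothetical.

```python
def split_by_location(samples, held_out_locations):
    """Split (image_id, location, label) records so that no location is shared."""
    held_out = set(held_out_locations)
    train = [s for s in samples if s[1] not in held_out]
    test = [s for s in samples if s[1] in held_out]
    return train, test


# Hypothetical records; the site names and labels are placeholders.
records = [
    ("img_001", "site_A", "crack"),
    ("img_002", "site_A", "no crack"),
    ("img_003", "site_B", "crack"),
    ("img_004", "site_B", "no crack"),
]
train_set, test_set = split_by_location(records, ["site_B"])
print(len(train_set), "training images,", len(test_set), "test images")
```

Holding out whole locations in this way exposes the generalization gap that a random split across all sites would tend to hide.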
3.5. RoadDamageDetector: A DL Mobile App for Road Damage Detection Based on Open-Source Smartphone
Road Images
Maeda et al. [36] employed an end-to-end DL-based state-of-the-art object detection method for
detecting and distinguishing objects in road images, acquired using a smartphone installed in a moving
car under realistic weather and lighting conditions, into eight different output categories (five kinds of
cracks, rutting-bump-pothole-separation, white line blur, and cross walk blur). Furthermore, Maeda et
al. [36] have also made publicly available the entire large-scale dataset consisting of 9053 road distress
images (with 15,435 instances of road distresses), as well as the DL-based smartphone application,
RoadDamageDetector, at this website: https://github.com/sekilab/RoadDamageDetector/.
A key motivation for the development of the RoadDamageDetector [36] is the strong need
felt by municipalities in Japan (also true for local highway agencies across the globe) to not only
detect the existence of distress from a road image (which its predecessor, the Lightweight Road
Manager [37], did), but also detect and distinguish different types of damage (cracks, rutting, etc.)
using low-cost resources (such as a smartphone app) so that appropriate follow-up maintenance
activities can be pursued to address the distressed pavement [36]. For this study, Maeda et al. [36]
developed a smartphone app that captures road images of 600 × 600 pixels once per second when the
smartphone is mounted on the dashboard of a car driving at an average speed of 40 km/h (25 mph).
As the ultimate goal was a DL-based smartphone app for road damage detection, the single shot
multibox detector (SSD) [38] with Inception V2 and SSD using MobileNet [39] DL frameworks were
chosen for this study as they are considered efficient network architectures (very low computational
burden) and implementations for mobile vision applications [36]. For training SSD Inception V2,
they had to downsize the images from 600 × 600 pixels to 300 × 300 pixels and used an initial learning
rate of 0.002, reduced at a 0.95 rate of decay every 10,000 epochs. For training the SSD MobileNet,
the input image size again was 300 × 300 pixels and they used an initial learning rate of 0.003 with
a similar rate of decay as SSD Inception V2. Experiments were performed on Ubuntu 16.04 OS with
15 GB RAM and an NVIDIA GRID K520 GPU. For the development of the smartphone app, a Nexus 5X
smartphone with an MSM8992 CPU and 2 GB RAM was used. The best-performance results revealed
that the RoadDamageDetector smartphone app was able to achieve recalls and precisions greater than
75% with an inference time of 1.5 s [36].
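The learning-rate schedule reported for SSD Inception V2 (initial rate 0.002, decay rate 0.95 every 10,000 epochs, as stated above) can be illustrated with a short calculation. This sketch assumes a staircase schedule, in which the rate drops once per interval; the text does not say whether the decay was stepped or continuous, and the function name is illustrative.

```python
def stepped_learning_rate(initial_lr, decay_rate, decay_every, step):
    """Staircase exponential decay: multiply by decay_rate once per decay_every steps."""
    return initial_lr * decay_rate ** (step // decay_every)


# Values reported for SSD Inception V2; a staircase schedule is an assumption.
for step in (0, 10_000, 20_000, 50_000):
    print(step, stepped_learning_rate(0.002, 0.95, 10_000, step))
```

The same function with an initial rate of 0.003 gives the corresponding schedule for SSD MobileNet.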
3.6. Measurement and 3-D Reconstruction of Concealed Cracks in Asphalt Pavements Using GPR Images
When a new asphalt concrete (AC) layer is overlaid over existing (distressed) Portland cement
concrete (PCC) pavement or a base layer (semi-rigid base), it often leads to a phenomenon referred to as
reflection cracking, where a hidden crack initiating in the bottom PCC base/pavement layer propagates
to the top AC layer as a result of repeated traffic and environmental loading [40]. Thus, proactive
preventive measures for reflection cracking involve detection and location of concealed cracks in the
bottom PCC base/pavement when they have not yet propagated to the top. Ground penetrating radar
(GPR), a promising technology for non-destructive evaluation of pavement structures with several
useful subsurface applications, has been used for characterizing subsurface pavement defects like
concealed cracks. However, the state-of-the-art defect detection methods for GPR images involve
complex manual processes, and thus there is a need for a GPR-based, automated, low-cost damage
characterization method [41]. To fulfill this need, Tong et al. [41] employed separate CNN models for
the automatic recognition, location, measurement, and 3D reconstruction of concealed cracks from
GPR images of asphalt pavement.
A total of 6832 GPR images with various damage types as class labels (‘concealed cracks’,
‘subgrade settlement’, ‘roadbed cavities’, and ‘no damage’) were used for training and testing the CNNs
implemented using the Caffe DL environment in a PC with Intel® Core™ i7-6700 CPU, 8 GB RAM,
and NVIDIA GeForce GTX GPU [41]. The damage recognition CNN had a typical CNN architecture
used for image classification problems: input layer that accepts input images of size 256 × 256 pixels,
two convolutional layers, two subsampling layers (max-pooling), two fully-connected layers, and one
output layer. The location CNN had a similar structure as the recognition CNN, except that the input
images were scaled to a size of 64 × 64 pixels. The feature extraction CNN (whose outputs were
used for 3D reconstruction) had a similar architecture as the location CNN, except that seven output
neurons were used to represent seven shape coordinates for the damage features. Based on the study
results, Tong et al. [41] concluded that all three CNNs developed for recognition, location, and feature
extraction of concealed cracks achieved their purposes and are suitable for field pavement applications.
3.7. Segmented Grid Based Pavement Crack Classification with DL and PCA
Wang and Hu [42] proposed a CNN-based pavement crack classification method by first
segmenting the pavement image into grids of different sizes, detecting the presence of cracks,
and then classifying the crack type (longitudinal, transverse, and alligator crack) after analyzing
the distribution of grids using principal component analysis (PCA). For their study, RGB pavement
images (3264 × 2448 pixels) were captured with an iPhone 6 smartphone by maintaining the distance
between the smartphone camera and the pavement to be approximately 1.3 m (4.3 ft). The RGB images
were then downsized to grayscale images of 960 × 704 pixels, suitable for analysis. Two grid settings
were applied on each pre-processed image: 15 × 11 non-overlapping grids, with each grid holding
64 × 64 pixels; and 30 × 22 grids, with a grid size of 32 × 32 pixels.
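The two grid settings follow directly from the pre-processed image size: 960 / 64 = 15 and 704 / 64 = 11 cells, and 960 / 32 = 30 and 704 / 32 = 22 cells. A minimal sketch of such a tiling is given below; the function name and the (left, top, right, bottom) box convention are illustrative choices, not taken from [42].

```python
def grid_cells(width, height, cell):
    """Return (left, top, right, bottom) boxes for non-overlapping square cells."""
    if width % cell or height % cell:
        raise ValueError("image size must be a multiple of the cell size")
    return [
        (x, y, x + cell, y + cell)
        for y in range(0, height, cell)
        for x in range(0, width, cell)
    ]


coarse = grid_cells(960, 704, 64)  # 15 columns by 11 rows
fine = grid_cells(960, 704, 32)    # 30 columns by 22 rows
print(len(coarse), len(fine))
```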
The CNN architecture implemented by Wang and Hu [42] in the TensorFlow DL framework
consisted of two convolutional layers, two sub-sampling layers (max-pooling), and one fully-connected
output layer. They employed the tanh activation function as opposed to the widely used ReLU activation
function used by most vision-based image classification studies. The learning rate was set to 0.1 and
the batch size to 32. By applying the trained CNN for crack detection on segmented pavement images,
they derived the skeleton of cracks by retaining only the grids containing crack pixels and calculating
the coordinates of crack regions. PCA is then applied on the distribution of grids containing cracks
to estimate the specific crack type: longitudinal, transverse, and alligator. Based on their study
results, Wang and Hu [42] reported that segmentation of pavement images with a grid size of 64 × 64
achieved better classification performance with the following correct rate of crack classification: 97.2%
(longitudinal crack), 97.6% (transverse crack), and 90.1% (alligator crack).
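To illustrate how PCA on the positions of crack-containing grids can separate crack types, the sketch below computes the 2 × 2 covariance of the grid coordinates and inspects its eigenvalues and principal direction. The decision rule is an assumption made for illustration and is not the rule used by Wang and Hu [42]: a strongly elongated distribution is labeled by its dominant axis, a spread-out one is labeled alligator, the `spread_ratio` threshold is hypothetical, and grid rows are assumed to run along the direction of travel.

```python
import math


def classify_crack(cells, spread_ratio=0.25):
    """cells: list of (col, row) indices of grid cells flagged as containing cracks."""
    n = len(cells)
    if n < 2:
        return "undetermined"
    mean_c = sum(c for c, _ in cells) / n
    mean_r = sum(r for _, r in cells) / n
    # Entries of the 2 x 2 covariance matrix of the cell coordinates.
    s_cc = sum((c - mean_c) ** 2 for c, _ in cells) / n
    s_rr = sum((r - mean_r) ** 2 for _, r in cells) / n
    s_cr = sum((c - mean_c) * (r - mean_r) for c, r in cells) / n
    # Closed-form eigenvalues of a symmetric 2 x 2 matrix.
    half_trace = (s_cc + s_rr) / 2
    disc = math.sqrt(max(half_trace ** 2 - (s_cc * s_rr - s_cr ** 2), 0.0))
    major, minor = half_trace + disc, half_trace - disc
    if major == 0:
        return "undetermined"
    if minor / major > spread_ratio:
        return "alligator"  # variance is similar in both principal directions
    # Orientation of the first principal axis, measured from the column axis.
    angle = 0.5 * math.atan2(2 * s_cr, s_cc - s_rr)
    if abs(math.cos(angle)) >= abs(math.sin(angle)):
        return "transverse"
    return "longitudinal"


print(classify_crack([(5, r) for r in range(11)]))
print(classify_crack([(c, 4) for c in range(15)]))
print(classify_crack([(c, r) for c in range(5) for r in range(5)]))
```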
3.8. Learning the Structure of Pavement Cracks from Raw Image Patches
Fan et al. [43] proposed an automated pavement crack detection methodology based on structured
prediction using CNN modeled as a multi-label classification problem. Their proposed methodology
has the ability to predict the whole crack structure at the pixel level in a pavement image based on
learning the crack structure of a very small patch within the image. First, a training database is built by
extracting individual pixel-level patches from raw images to train a CNN. As the ratio of crack-pixels
to non-crack-pixels is quite low for typical crack images, Fan et al. [43] also introduced a novel strategy
to address the challenge of severely imbalanced samples. By summing the centered-patch-structure
predictions of the trained CNN applied on all pixels, a probability map is obtained that reflects the
overall pavement condition.
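The aggregation step, in which overlapping patch-level structure predictions are combined into a single map, can be sketched as follows. This is an illustration under stated assumptions rather than the implementation in [43]: the CNN outputs are represented by a hypothetical dictionary of small score patches keyed by their center pixel, and the summed scores are divided by the number of overlapping patches so that the result stays in [0, 1].

```python
def crack_probability_map(height, width, predictions, size):
    """Combine size x size score patches, keyed by (row, col) center, into one map."""
    half = size // 2
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for (center_r, center_c), patch in predictions.items():
        for dr in range(size):
            for dc in range(size):
                r, c = center_r + dr - half, center_c + dc - half
                if 0 <= r < height and 0 <= c < width:
                    total[r][c] += patch[dr][dc]
                    count[r][c] += 1
    # Normalizing by the overlap count is an assumption of this sketch.
    return [
        [total[r][c] / count[r][c] if count[r][c] else 0.0 for c in range(width)]
        for r in range(height)
    ]


# Two hypothetical 3 x 3 predictions on a 3 x 3 image.
example = {
    (1, 1): [[1.0] * 3 for _ in range(3)],
    (0, 0): [[0.0] * 3 for _ in range(3)],
}
for row in crack_probability_map(3, 3, example, 3):
    print(row)
```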
The dataset for their experiments came from two established pavement images databases: CFD [44]
and AigleRN [18]. Fan et al. [43] employed a typical CNN architecture with four convolutional layers, two
sub-sampling (max-pooling) layers, and three fully-connected layers. All hidden layers were equipped
with ReLU units and the output layer with a sigmoid activation function. The CNN was implemented in
a TensorFlow DL environment and the experiments were carried out on an Intel Xeon E5-2690 workstation
with 2.9 GHz CPU, 64 GB RAM, and NVIDIA Quadro K5000 GPU. A dropout ratio of 0.5 was used and
the weight decay rate was set to 0.0005 during the experiments. The Adam optimizer was employed with
a default learning rate of 0.001 and the batch size was set to 256. Based on their study results, Fan et al. [43]
reported that their proposed methodology achieved better crack detection performance compared with
other state-of-the-art methods, especially in dealing with different pavement textures, and has a better
generalization ability (training on one database and testing on another).
3.9. Continuous Pavement Inspection with CNN Trained on Google Street View Images
Ma et al. [45] proposed an innovative method for large-scale image-based pavement degradation
assessment using CNN, Fisher vector encoding, and UnderBagging random forests. While most studies
reviewed so far on this topic focused on pavement images acquired with data collection vehicles or
smartphones, the study by Ma et al. [45] combined the publicly available maintenance records of road
infrastructure (along with GPS coordinates) with GPS-localized Google Street View images to create
a readily-labeled, large-scale road images dataset. To overcome the challenges associated with this
atypical dataset for pavement degradation assessment (such as texture classification when there is
large variation, class imbalance, etc.), Fisher vectors with CNN (FV–CNN) and UnderBagging random
forests were employed to develop an automated pavement condition rating method.
For their study, Ma et al. [45] collected more than 700,000 images of road surfaces from about
70,000 street segments recorded in the New York City Department of Transportation. Using their
proposed methodology, they were able to achieve an accuracy of 59.2% on the proposed complex
dataset. Their dataset and the code are publicly available for download at the following site:
http://www3.cs.stonybrook.edu/~cvl/pavement.html.
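The sampling idea behind UnderBagging, which addresses class imbalance by training each ensemble member on a class-balanced random subset, can be sketched as follows. This is a generic illustration and not the code released by Ma et al. [45]; the function name, the labels, and the choice to draw every class down to the size of the smallest class are assumptions. Training one tree or forest per bag is left out.

```python
import random
from collections import defaultdict


def underbagging_indices(labels, n_bags, seed=0):
    """Return n_bags index lists, each with the same number of samples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    smallest = min(len(indices) for indices in by_class.values())
    bags = []
    for _ in range(n_bags):
        bag = []
        for indices in by_class.values():
            # Under-sample every class down to the size of the rarest class.
            bag.extend(rng.sample(indices, smallest))
        bags.append(bag)
    return bags


# Hypothetical, heavily imbalanced condition labels.
labels = ["good"] * 90 + ["poor"] * 10
bags = underbagging_indices(labels, n_bags=5)
print([len(bag) for bag in bags])
```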
3.11. Automated Crack Detection with Pre-Trained DL Model Using Transfer Learning
Gopalakrishnan et al. [47] proposed the use of a pre-trained DCNN model with transfer learning
for automated pavement crack detection. Large annotated image datasets are typically required by
DCNNs to achieve accurate and generalized predictions. It is not always possible to have access to large
pavement images datasets, although some very recent studies reviewed in this section have made them
available publicly to spur innovative research in this area. In many domains, acquisition of data and
annotating them, especially at large scale, is cost-prohibitive. The use of ‘off-the-shelf’ DCNN features
of well-established DCNNs such as VGG-16 (16-layer DCNN developed by the Visual Geometry Group
(VGG) at the University of Oxford), AlexNet, and GoogLeNet pre-trained on large-scale annotated
natural image datasets (such as ImageNet) has worked well for most cross-domain image classification
problems through the concept of transfer learning and fine-tuning [48]. Transfer learning enables the
deployment of deep learning models trained on “big data” natural-image datasets (like ImageNet)
and “transfers” their learning ability to a new domain, rather than freshly train a DCNN classifier
from the beginning [49]. This highly repurposable nature of deep learning models has been amply
demonstrated through several cross-domain image classification studies, especially in the area of
medical image analysis [50].
An interesting aspect of the research study by Gopalakrishnan et al. [47] was that the truncated
VGG-16 DCNN, pre-trained on massive ImageNet database that contains millions of natural images,
was used to vectorize the labeled pavement images, and a machine learning classifier was then used
to predict the labels (‘crack’ or ‘no crack’). In this strategy [17], (1) only the convolutional part of the
VGG-16 DCNN (everything up to the fully-connected layers) is instantiated; (2) the model is run once
with the training and validation data, and the last activation maps before the fully-connected layers
or the “bottleneck features” from the VGG-16 DCNN are recorded; and (3) a small fully-connected
classifier is finally trained on the stored features. The dataset (about 1056 pavement images) used
in their study came from the Federal Highway Administration’s (FHWA’s) Long-Term Pavement
Performance (LTPP) database and it had a mixture of AC-surfaced and PCC-surfaced pavement
images. They used the Keras DL framework and the experiments were carried out in a PC equipped
with Intel® Core™ i7-5600 CPU @ 2.60 GHz (20 GB RAM) with no GPUs on 64-bit Windows® 10 OS.
The original input images with a size of 3072 × 2048 pixels were downscaled to 1000 × 500 pixels to
reduce the computational burden. For the single-layer NN classifier implemented in Keras, 256 neurons
were used in the hidden layer. The dropout value was set to 0.5. The image batch size was set to 32 and
all models were trained for up to 50 epochs. A single-layer NN classifier (with ‘Adam’ optimizer)
trained on ImageNet pre-trained VGG-16 DCNN features yielded the best performance [47].
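To give a sense of the size of the stored bottleneck features, the calculation below follows a 1000 × 500 pixel image through the convolutional part of VGG-16, which contains five 2 × 2 max-pooling stages (each halving the spatial size, rounding down) and ends with 512 channels. It is a sketch under two assumptions that the source does not confirm: that 1000 × 500 is width × height, and that the feature maps are flattened before the 256-neuron layer.

```python
def vgg16_bottleneck_shape(height, width, pools=5, channels=512):
    """Spatial size after the five 2 x 2, stride-2 max-pooling stages of VGG-16."""
    for _ in range(pools):
        height //= 2
        width //= 2
    return height, width, channels


h, w, ch = vgg16_bottleneck_shape(500, 1000)
features = h * w * ch
# Weights plus biases of a 256-neuron layer fed by the flattened features (assumed).
dense_params = features * 256 + 256
print((h, w, ch), features, dense_params)
```

Under these assumptions, almost all trainable parameters sit in the first fully-connected layer, which is why training only the small classifier on pre-computed features is feasible on a CPU-only machine.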
that their proposed method can rapidly and easily achieve up to 90% accuracy in finding cracks in
realistic situations without any data augmentation and preprocessing.
4. Discussion
Deep learning is fast becoming a successful alternative approach for vision-based pavement crack
detection. A total of 12 recently published papers were reviewed in this study, among which two were
published in 2016, seven in 2017, and two in 2018 (as of March). Based on the current interest and
continued success of DL technology, it can reasonably be expected that by the end of 2018, these numbers
will be doubled, if not tripled.
Table 1. Overview of objectives and datasets used by papers. DL—deep learning; GPR—ground
penetrating radar; FHWA/LTPP—Federal Highway Administration’s Long-Term Pavement
Performance database.
Table 2. Overview of deep learning architecture and hyper-parameters used by individual studies.
Table 3. Overview of deep learning software framework, CPU and GPU specs, and summary test results from individual studies.
Reference | DL Software Framework | CPU Specs | GPU Specs | Test Results Summary | Method(s) for Comparison
Some [31] | DIGITS/Caffe | N/A | N/A | A = 0.9225; P = 0.9841; R = 0.8493 | CrackIT
Zhang et al. [33] | Caffe | Intel® Xeon® E3-1241 V3 @ 3.5 GHz (8 GB RAM) | NVIDIA Quadro K220 | P = 0.8696; R = 0.9251; F1 = 0.8965 | SVM and Boosting Methods
Pauly et al. [34] | N/A | Intel® Xeon® E5-1630 v4 @ 3.70 GHz (128 GB RAM) | NVIDIA Quadro M4000 | A = 0.913; P = 0.907; R = 0.920 | N/A
Eisenbach et al. [35] | Keras (Theano backend) | N/A | NVIDIA Titan X | A = 0.9772; F1 = 0.7246 | CrackIT
Maeda et al. [36] | TensorFlow | N/A | NVIDIA GRID K520 | AC: A = 0.85; P = 0.73; R = 0.68 | N/A
Tong et al. [41] | Caffe | Intel® Core™ i7-6700 (8 GB RAM) | NVIDIA GeForce GTX | Recognition: A = 0.998 | N/A
Wang and Hu [42] | TensorFlow | N/A | N/A | AC: A = 0.901 | Neural Networks
Fan et al. [43] | TensorFlow | Intel® Xeon® E5-2690 2.9 GHz (64 GB RAM) | NVIDIA Quadro K5000 | P = 0.9018; R = 0.9494; F1 = 0.9210 | Canny; Local Thresholding; CrackForest
Ma et al. [45] | Keras (TensorFlow backend) | N/A | N/A | Average A = 0.582 | SVM
Zhang et al. [21] | C++ (CUDA C Platform without using NVIDIA cuDNN library) | N/A | NVIDIA GeForce GTX Titan (2 devices) | P = 0.9013; R = 0.8763; F1 = 0.8886 | Pixel-SVM; Shadow modeling
Gopalakrishnan et al. [47] | Keras (Theano backend) | Intel® Core™ i7-5600 @ 2.60 GHz (20 GB RAM) | Not used | A = 0.90; P = 0.90; R = 0.90; F1 = 0.90 | N/A
Zhang et al. [51] | Caffe with MATLAB | HP Z220 workstation | NVIDIA Quadro K4000 | P = 0.847; R = 0.951; F1 = 0.895 | Canny; CrackIT; CrackForest
Note: N/A = not available or not applicable; SVM = support vector machine; A = accuracy; P = precision; R = recall; F1 = F1 score; AC = alligator crack.
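For readers comparing the entries in Table 3, the sketch below shows how accuracy, precision, recall, and F1 score are conventionally computed from confusion-matrix counts. It also checks that the F1 score listed for Zhang et al. [33] is consistent with the listed precision and recall. The confusion counts in the first call are hypothetical values chosen for illustration only. Individual studies may aggregate these metrics differently (for example, per image or per class), so not every row of the table need satisfy the identity exactly.

```python
def f1_from_pr(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "A": accuracy,
        "P": precision,
        "R": recall,
        "F1": f1_from_pr(precision, recall),
    }


# Hypothetical counts for a crack / no-crack patch classifier (illustration only).
metrics = classification_metrics(tp=90, fp=10, fn=20, tn=880)
print({name: round(value, 4) for name, value in metrics.items()})

# Consistency check using P and R as listed in Table 3 for Zhang et al. [33].
print(f"{f1_from_pr(0.8696, 0.9251):.4f}")  # 0.8965, matching the listed F1
```

Note that accuracy can stay high even when recall is modest if non-crack samples dominate, which is one reason precision, recall, and F1 are usually reported alongside it.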
It is acknowledged that the summary of results presented in Table 3 may not present a fair
comparison of the reported studies, primarily because different datasets were used by the studies.
Unless the proposed methods are tested on a standard benchmarking dataset, it is difficult to make a
fair comparison of their strengths and weaknesses, let alone recommending a winning strategy.
Another complicating factor is the lack of a standard definition/consensus or metric for a pavement
crack and the different types of cracks among the pavement and computer vision community.
While end-to-end deep learning is an attractive notion where one can go directly from raw data
to the desired result (e.g., a speech recognition task where one can go directly from a raw audio recording to
transcription) without the need for hand-crafted features or multi-step feature processing, it requires
lots of input–output data for training efficient and accurate models. Strictly speaking, currently, there
are no end-to-end deep learning models for vision-based automated pavement distress detection.
This trend may change in the near future as more annotated or labeled pavement image datasets are
being made publicly available [35,36]. Meanwhile, deep transfer learning appears to be a reasonable
approach for researchers wanting to get familiarized with the application of CNNs to pavement
distress detection, as pre-trained CNNs trained on “big data” natural image (annotated) datasets could
be used on 2D pavement images as feature extractors [47,51].
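The feature-extractor pattern described above can be sketched in a few lines: a frozen function maps each image to a feature vector, and only a small classifier on top of those features is trained. The example below is a toy, dependency-free illustration of that pattern and not the method of any reviewed study. In particular, `frozen_features` is a hand-written stand-in (an assumption made here) for the activations of a pre-trained CNN, and the 4 × 4 patches are synthetic.

```python
import math


def frozen_features(patch):
    """Stand-in for a frozen pre-trained CNN: patch -> fixed feature vector.

    Returns [mean intensity, mean absolute horizontal difference].
    Nothing in this function is updated during training.
    """
    rows, cols = len(patch), len(patch[0])
    mean = sum(sum(row) for row in patch) / (rows * cols)
    grad = sum(abs(row[c + 1] - row[c]) for row in patch for c in range(cols - 1))
    return [mean, grad / (rows * (cols - 1))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, lr=1.0, epochs=500):
    """Train a logistic-regression head on fixed features with gradient descent."""
    w, b, n = [0.0] * len(features[0]), 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(patch, w, b):
    """Probability that the patch contains a crack, according to the trained head."""
    x = frozen_features(patch)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic patches: uniform surface vs. a dark vertical line (toy "crack").
plain = [[0.8, 0.8, 0.8, 0.8] for _ in range(4)]
cracked = [[0.8, 0.1, 0.8, 0.8] for _ in range(4)]
shifted_crack = [[0.8, 0.8, 0.1, 0.8] for _ in range(4)]  # not seen in training

w, b = train_head([frozen_features(plain), frozen_features(cracked)], [0, 1])
print(predict(plain, w, b), predict(shifted_crack, w, b))
```

In a real transfer learning setup, the stand-in would be replaced by the convolutional layers of a network pre-trained on natural images, while the training loop for the classifier head would stay conceptually the same.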
Another major challenge is class imbalance and its effect on deep learning classification
performance. Typically, pavement images containing distresses are fewer compared with those
with no distress, especially considering a multi-class (longitudinal crack, transverse crack, alligator
crack, etc.) problem. This disparity can be more pronounced in the context of street view images.
This creates class imbalance, a problem that has been comprehensively studied in machine learning,
but is only beginning to be studied and addressed in the context of deep learning [55]. This could be an
important area of future research in the context of deep learning application to automated pavement
distress detection.
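One common way to counter such imbalance is to weight each class inversely to its frequency when computing the training loss. The sketch below illustrates that heuristic in plain Python. The label counts are hypothetical and are not taken from any of the reviewed datasets, and this weighting is only one of several options (oversampling the minority classes is another).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get large weights and frequent classes get small ones,
    so every class contributes equally to a weighted loss in total.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution for a multi-class pavement dataset.
labels = (
    ["no_distress"] * 900
    + ["longitudinal"] * 60
    + ["transverse"] * 30
    + ["alligator"] * 10
)
weights = balanced_class_weights(labels)
for cls, weight in weights.items():
    print(f"{cls:>12}: {weight:.3f}")
```

With these illustrative counts, an alligator-crack image is weighted 90 times more heavily than a distress-free image, which offsets its 90-fold lower frequency.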
In terms of hyper-parameter optimization, there appears to be some overall consensus among the
reviewed papers with respect to the use of ReLU activation function and the use of dropout to prevent
over-fitting. Other parameters like the learning rate, weight decay, and so on, took on default values in
most studies. This seems to suggest that fine tuning of hyper-parameters, although an important step,
is only secondary to the selection of architecture and training data quality, which have far more impact
on the network performance in vision-based pavement distress detection.
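For readers less familiar with these two components, the following minimal sketch shows what a ReLU activation and a dropout layer compute. It is written in plain Python for clarity rather than speed, and it uses the "inverted" dropout formulation that deep learning frameworks commonly adopt. The dropout rate of 0.5 mirrors the value reported in [47]; the input values are arbitrary.

```python
import random


def relu(values):
    """Rectified linear unit: negative activations are clipped to zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng, training=True):
    """Inverted dropout.

    During training, each value is zeroed with probability `rate` and the
    survivors are scaled by 1 / (1 - rate) so the expected value is unchanged.
    At inference time the input is passed through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
activations = relu([-1.5, 0.0, 2.0, 3.0])
print(activations)                                    # [0.0, 0.0, 2.0, 3.0]
print(dropout(activations, 0.5, rng))                 # training: some units zeroed
print(dropout(activations, 0.5, rng, training=False)) # inference: unchanged
```

Because units are dropped at random on every training step, the network cannot rely on any single activation, which is the mechanism by which dropout discourages over-fitting.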
Apart from classification of distresses, characterizing the extent and severity of distresses is
equally important in the context of a pavement management system and reporting requirements.
This is an area that has been traditionally overlooked in the classical machine learning approach to
automated pavement distress detection. Not surprisingly, the current deep learning studies reviewed
in this paper did not address this topic, but it appears that most research groups are working on it
as part of their ongoing research. This could once again be attributed to the lack of availability of
large-scale annotated pavement image datasets that also include the extent and severity annotation.
Self-taught learning and unsupervised feature learning are promising areas of deep learning
research, where the algorithm is enabled to learn from unlabeled data. A single unlabeled example is
definitely less informative than a labeled one. However, if massive amounts of unlabeled examples
are made available and if deep learning algorithms are designed to exploit this unlabeled big data
effectively (by learning a good feature representation of the input), then they are expected to outperform
traditional approaches requiring massive manual labeling [56]. As mentioned previously, public
repositories with massive amounts of pavement images are being made available by researchers
and organizations. Combined with this, crowd-sourced smartphone pavement images and the recent
explosion of interest in UAVs or drones for visual inspection, among others, are expected to result in
massive amounts of unlabeled pavement images suitable for unsupervised feature learning. Variational
auto-encoders (VAEs) and generative adversarial networks (GANs) are two innovative unsupervised
strategies that could be explored further for end-to-end unsupervised deep learning of representative
features from large amounts of unlabeled pavement images [50]. This will hopefully result in more
robust deep learning solutions that are less sensitive to location variance (changes in locations and
conditions of pavement images between the training and testing set).
The study of spatio-temporal variations in pavement surfaces is another area of research that
can benefit from advances in deep learning application to pavement image analysis. Especially in
the case of public image repositories that are routinely updated (e.g., Google StreetView images with
timestamps), one could study the variations in texture and other surface characteristics with time [45].
This could also lead to research in the broader area of vision-based pavement anomaly detection,
where one could study the evolution of distresses with time. Finally, recent advances in automated
captioning of images using hybrid CNN-recurrent neural network (e.g., LSTM) architectures, which
combine the power of computer vision with natural language processing, may become a hot area of
research with the overarching goal of achieving textual representation of pavement images describing
the type, extent, and severity of distresses.
References
1. ASCE. American Society of Civil Engineers (ASCE) 2017 Infrastructure Report Card: Roads; American Society of
Civil Engineers (ASCE): Reston, VA, USA, 2017.
2. Flintsch, G.; McGhee, K. NCHRP Synthesis 401: Quality Management of Pavement Condition Data Collection;
Transportation Research Board: Washington, DC, USA, 2009.
3. Gopalakrishnan, K. Advanced Pavement Health Monitoring and Management. IGI Global Videos, 2016.
Available online: http://www.igi-global.com/video.aspx?ref=advanced-pavement-health-monitoring-management&titleid=137625
(accessed on 9 May 2017).
4. Tsai, Y.; Wang, Z. Development of an Asphalt Pavement Raveling Detection Algorithm Using Emerging 3D Laser
Technology and Macrotexture Analysis; Final Report NCHRP IDEA Project 163; Transportation Research Board:
Washington, DC, USA, 2015.
5. Wang, K.C.P.; Li, Q.J.; Yang, G.; Zhan, Y.; Qiu, Y. Network level pavement evaluation with 1 mm 3D survey
system. J. Traffic Transp. Eng. Engl. Ed. 2015, 2, 391–398. [CrossRef]
6. Xie, D.; Zhang, L.; Bai, L. Deep Learning in Visual Computing and Signal Processing. Appl. Comput. Intell.
Soft Comput. 2017, 2017, e1320780. [CrossRef]
7. Agrawal, A.; Choudhary, A. Perspective: Materials informatics and big data: Realization of the ‘fourth
paradigm’ of science in materials science. APL Mater. 2016, 4, 053208. [CrossRef]
8. Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and
their applications. Neurocomputing 2017, 234, 11–26. [CrossRef]
9. Bahrampour, S.; Ramakrishnan, N.; Schott, L.; Shah, M. Comparative Study of Deep Learning Software
Frameworks. arXiv, 2016.
10. Fonnegra, R.D.; Blair, B.; Díaz, G.M. Performance comparison of deep learning frameworks in
image classification problems using convolutional and recurrent networks. In Proceedings of the 2017
IEEE Colombian Conference on Communications and Computing (COLCOM), Cartagena, Colombia,
16–18 August 2017; pp. 1–6.
11. Shi, S.; Wang, Q.; Xu, P.; Chu, X. Benchmarking State-of-the-Art Deep Learning Software Tools.
In Proceedings of the 2016 7th International Conference on Cloud Computing and Big Data (CCBD),
Macau, China, 16–18 November 2016; pp. 99–104.
12. Caffe. Caffe—Deep Learning Framework. 2018. Available online: http://caffe.berkeleyvision.org/
(accessed on 15 February 2018).
13. Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe:
Convolutional Architecture for Fast Feature Embedding. In Proceedings of the 22nd ACM International
Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678.
14. TensorFlow. TensorFlow. 2018. Available online: https://www.tensorflow.org/ (accessed on 15 February 2018).
15. Theano. Theano at a Glance—Theano 1.0.0 Documentation. 2018. Available online:
http://deeplearning.net/software/theano/introduction.html (accessed on 16 February 2018).
16. Torch. Torch—Scientific Computing for LuaJIT. 2018. Available online: http://torch.ch/ (accessed on
16 February 2018).
17. Chollet, F. Keras; GitHub: San Francisco, CA, USA, 2015.
18. Chambon, S.; Moliard, J.-M. Automatic Road Pavement Assessment with Image Processing: Review and
Comparison. Int. J. Geophys. 2011, 20. [CrossRef]
19. Oliveira, H.; Correia, P.L. Automatic road crack segmentation using entropy and image dynamic thresholding.
In Proceedings of the 2009 17th European Signal Processing Conference, Glasgow, UK, 24–28 August 2009;
pp. 622–626.
20. Tsai, Y.-C.; Kaul, V.; Mersereau, R.M. Critical assessment of pavement distress segmentation methods.
J. Transp. Eng. 2009, 136, 11–19. [CrossRef]
21. Zhang, D.; Li, Q.; Chen, Y.; Cao, M.; He, L.; Zhang, B. An efficient and reliable coarse-to-fine approach for
asphalt pavement crack detection. Image Vis. Comput. 2017, 57, 130–146. [CrossRef]
22. Ayenu-Prah, A.; Attoh-Okine, N. Evaluating Pavement Cracks with Bidimensional Empirical Mode
Decomposition. EURASIP J. Adv. Signal Process. 2008, 861701, 2008. [CrossRef]
23. Subirats, P.; Dumoulin, J.; Legeay, V.; Barba, D. Automation of Pavement Surface Crack Detection using the
Continuous Wavelet Transform. In Proceedings of the 2006 International Conference on Image Processing,
Atlanta, GA, USA, 8–11 October 2006; pp. 3037–3040.
24. Wang, K.C.P.; Li, Q.; Gong, W. Wavelet-Based Pavement Distress Image Edge Detection with à Trous
Algorithm. Transp. Res. Rec. J. Transp. Res. Board 2007, 2024, 73–81. [CrossRef]
25. Ying, L.; Salari, E. Beamlet Transform-Based Technique for Pavement Crack Detection and Classification.
Comput. Aided Civ. Infrastruct. Eng. 2010, 25, 572–580. [CrossRef]
26. Koch, C.; Georgieva, K.; Kasireddy, V.; Akinci, B.; Fieguth, P. A review on computer vision based defect
detection and condition assessment of concrete and asphalt civil infrastructure. Adv. Eng. Inform. 2015, 29,
196–210. [CrossRef]
27. Oliveira, H.; Correia, P.L. Automatic Road Crack Detection and Characterization. IEEE Trans. Intell.
Transp. Syst. 2013, 14, 155–168. [CrossRef]
28. Fujita, Y.; Shimada, K.; Ichihara, M.; Hamamoto, Y. A method based on machine learning using hand-crafted
features for crack detection from asphalt pavement surface images. In Proceedings of the Thirteenth
International Conference on Quality Control by Artificial Vision 2017, Tokyo, Japan, 14 May 2017;
Volume 10338, 103380M.
29. Hizukuri, A.; Nagata, T. Development of a classification method for a crack on a pavement surface images
using machine learning. In Proceedings of the Thirteenth International Conference on Quality Control by
Artificial Vision 2017, Tokyo, Japan, 14 May 2017; Volume 10338, 103380M.
30. Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images.
Pattern Recognit. Lett. 2012, 33, 227–238. [CrossRef]
31. Some, L. Automatic Image-Based Road Crack Detection Methods; KTH Royal Institute of Technology:
Stockholm, Sweden, 2016.
32. Oliveira, H.; Correia, P.L. CrackIT—An image processing toolbox for crack detection and characterization.
In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France,
27–30 October 2014; pp. 798–802.
33. Zhang, L.; Yang, F.; Zhang, Y.D.; Zhu, Y.J. Road crack detection using deep convolutional neural network.
In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA,
25–28 September 2016; pp. 3708–3712.
34. Pauly, L.; Peel, H.; Hogg, D.; Fuentes, R. Deeper Networks for Pavement Crack Detection. In Proceedings of
the 34th International Symposium on Automation and Robotics in Construction (ISARC 2017), Taipei, Taiwan,
28 June–1 July 2017; pp. 1–7.
35. Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.;
Gross, H.M. How to get pavement distress detection ready for deep learning? A systematic approach.
In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA,
14–19 May 2017; pp. 2039–2047.
36. Maeda, H.; Sekimoto, Y.; Seto, T.; Kashiyama, T.; Omata, H. Road Damage Detection Using Deep Neural
Networks with Images Captured Through a Smartphone. arXiv, 2018.
37. Maeda, H.; Sekimoto, Y.; Seto, T. Lightweight Road Manager: Smartphone-based Automatic Determination
of Road Damage Status by Deep Neural Network. In Proceedings of the 5th ACM SIGSPATIAL International
Workshop on Mobile Geographic Information Systems, New York, NY, USA, 31 October 2016; pp. 37–45.
38. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, Ch.; Berg, A.C. SSD: Single Shot MultiBox Detector.
In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; Volume 9905, pp. 21–37.
39. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv, 2017.
40. Dhakal, N.; Elseifi, M.A.; Zhang, Z. Mitigation strategies for reflection cracking in rehabilitated
pavements—A synthesis. Int. J. Pavement Res. Technol. 2016, 9, 228–239. [CrossRef]
41. Tong, Z.; Gao, J.; Zhang, H. Recognition, location, measurement, and 3D reconstruction of concealed cracks
using convolutional neural networks. Constr. Build. Mater. 2017, 146, 775–787. [CrossRef]
42. Wang, X.; Hu, Z. Grid-based pavement crack analysis using deep learning. In Proceedings of the 4th
International Conference on Transportation Information and Safety (ICTIS), Banff, AB, Canada, 8–10 August
2017; pp. 917–924.
43. Fan, Z.; Wu, Y.; Li, W. Automatic Pavement Crack Detection Based on Structured Prediction with the
Convolutional Neural Network. arXiv, 2018.
44. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic Road Crack Detection Using Random Structured Forests.
IEEE Trans. Intell. Transp. Syst. 2016, 17, 3434–3445. [CrossRef]
45. Ma, K.; Hoai, M.; Samaras, D. Large-scale Continual Road Inspection: Visual Infrastructure Assessment in
the Wild. In Proceedings of the British Machine Vision Conference, London, UK, 4–7 September 2017.
46. Zhang, A.; Wang, K.C.P.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C.
Automated Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces Using a Deep-Learning Network.
Comput. Civ. Infrastruct. Eng. 2017, 32, 805–819. [CrossRef]
47. Gopalakrishnan, K.; Khaitan, S.K.; Choudhary, A.; Agrawal, A. Deep Convolutional Neural Networks with
transfer learning for computer vision-based data-driven pavement distress detection. Constr. Build. Mater.
2017, 157, 322–330. [CrossRef]
48. Shin, H.-C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep
convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and
transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [CrossRef] [PubMed]
49. Bar, Y.; Diamant, I.; Wolf, L.; Lieberman, S.; Konen, E.; Greenspan, H. Chest pathology detection using deep
learning with non-medical training. In Proceedings of the 2015 IEEE 12th International Symposium on
Biomedical Imaging (ISBI), New York, NY, USA, 16–19 April 2015; pp. 294–297.
50. Litjens, G.; Kooi, T.; Bejnordi, B.E.; Setio, A.A.A.; Ciompi, F.; Ghafoorian, M.; van der Laak, J.A.W.M.; van
Ginneken, B.; Sánchez, C.I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42,
60–88. [CrossRef] [PubMed]
51. Zhang, K.; Cheng, H.D.; Zhang, B. Unified Approach to Pavement Crack and Sealed Crack Detection Using
Preclassification Based on Transfer Learning. J. Comput. Civ. Eng. 2018, 32, 04018001. [CrossRef]
52. Cha, Y.-J.; Choi, W.; Büyüköztürk, O. Deep Learning-Based Crack Damage Detection Using Convolutional
Neural Networks. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 361–378. [CrossRef]
53. Feng, C.; Liu, M.-Y.; Lee, T.-Y. Deep Active Learning for Civil Infrastructure Defect Detection and
Classification. In Proceedings of the 2017 International Workshop on Computing in Civil Engineering,
Seattle, WA, USA, 25–27 June 2017.
54. Gopalakrishnan, K.; Gholami, H.; Vidyadharan, A.; Choudhary, A.; Agrawal, A. Crack Damage Detection
in Unmanned Aerial Vehicle Images of Civil Infrastructure Using Pre-Trained Deep Learning Model.
Int. J. Traffic Transp. Eng. 2018, 8, 1–14.
55. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional
neural networks. arXiv, 2017.
56. Stanford University. Unsupervised Feature Learning and Deep Learning Tutorial. Deep Learning Tutorial,
2018. Available online: http://ufldl.stanford.edu/tutorial/selftaughtlearning/SelfTaughtLearning/
(accessed on 9 March 2018).
© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).