A Comprehensive Review of Convolutional Neural Networks for Defect Detection in Industrial Applications
ABSTRACT Quality inspection and defect detection remain critical challenges across diverse industrial
applications. Driven by advancements in Deep Learning, Convolutional Neural Networks (CNNs) have
revolutionized Computer Vision, enabling breakthroughs in image analysis tasks like classification and
object detection. CNNs’ feature learning and classification capabilities have made industrial defect detection
through Machine Vision one of their most impactful applications. This article aims to showcase practical
applications of CNN models for surface defect detection across various industrial scenarios, from pallet racks
to display screens. The review explores object detection methodologies and suitable hardware platforms
for deploying CNN-based architectures. The growing Industry 4.0 adoption necessitates enhancing quality
inspection processes. The main results demonstrate CNNs’ efficacy in automating defect detection, achieving
high accuracy and real-time performance across different surfaces. However, challenges like limited datasets,
computational complexity, and domain-specific nuances require further research. Overall, this review
acknowledges CNNs’ potential as a transformative technology for industrial vision applications, with
practical implications ranging from quality control enhancement to cost reductions and process optimization.
INDEX TERMS Computer vision, convolutional neural network, deep learning, industrial defect detection,
object detection, quality inspection, manufacturing.
applications, especially due to the abundance of complex data from various sources like medical, industrial, sensor, social, visual, text, audio, and graph data. A widely known DL technology is the Convolutional Neural Network (CNN) [11], [12], [13], which is widely used in various domains like Natural Language Processing (NLP) [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], Image [26], [27], [28], [29] and Speech Recognition [30], [31], [32], [33], [34]. Several models have demonstrated that CNNs are particularly effective in image recognition tasks, achieving state-of-the-art (SOTA) results [3]. A crucial part of their success is that the hierarchical architecture enables them to retrieve features at varying degrees of abstraction and extract spatial features and patterns in images [35], [36]. Figure 1 shows the general relationship between AI, ML, DL, and CNN.

The field of Image Recognition has undergone a revolutionary transformation over the years, driven by the evolution of DL. Within this domain, CNNs [38] have emerged as a powerful model, propelling accuracy to levels approaching or even surpassing human capabilities. This remarkable ascent can be attributed to several key advantages of CNNs over traditional artificial feature extraction and shallow learning methods. Unlike conventional methods that rely on hand-crafted features, CNNs leverage a hierarchical architecture composed of multiple convolutional and pooling layers. This layered structure empowers the network to progressively extract increasingly complex and abstract features from the raw image data. Furthermore, CNNs follow a data-driven approach. Unlike traditional methods that depend heavily on expert-defined features, CNNs learn directly from large image datasets. This automates feature extraction, significantly lowering the barrier to entry for CV research and product development.
Intelligent manufacturing is recognised as a corner-
stone of Industry 4.0, with quality inspection serving
as a critical and indispensable link in both production
and maintenance processes. Machine vision-based surface
defect detection plays a pivotal role in ensuring prod-
uct quality across diverse industries. However, industrial
image processing poses unique challenges compared to
natural image analysis, with a crucial emphasis on non-
standard customization. Consequently, directly applying
generic CNN models to industrial tasks often proves
ineffective. This necessitates the development of cus-
tomized CNN transformations tailored to specific industrial
scenarios.
This paper aims to comprehensively analyze the applications of CNNs, particularly within the domain of Industrial Surface Defect Detection. To broaden the scope of the review, domains closely related to Structural Health Monitoring (SHM), such as inspection methodologies for identifying defects in various surfaces, including pallet racks, exudates [39], steel, rail, magnetic tiles, photovoltaic cells [40], [41], fabrics, displays, etc., have been systematically investigated to furnish a thorough and efficient review. By meticulously surveying relevant literature, this review aims to serve as a valuable resource and guide for researchers and practitioners in the industrial sector. It sheds light on the effective utilization of advanced DL technologies, specifically customized CNNs, for robust and automated surface defect detection in diverse industrial settings.

FIGURE 1. A diagram showing the relationship between AI, ML, DL, and CNN.

To facilitate a comprehensive understanding of CNNs in industrial inspection systems, this paper delves into critical interconnected modules: their historical evolution, fundamental building blocks, practical Object Detection (OD) implementation technologies, and the essential hardware components necessary for efficient deployment. This exploration will equip readers with the knowledge and tools required to effectively leverage the power of CNNs for robust and automated defect detection in diverse industrial settings.

A. SURVEY OBJECTIVE
Advances in Computer Vision (CV) are capable of enhancing operations in a variety of industries. Despite this, researchers are increasingly discovering that CV architectures are incompatible with existing application requirements when transferred from academic laboratories to manufacturing and industrial sectors. Several factors contribute to this problem, including a high cost of computation, increased energy consumption, and greater processing power demands on Central Processing Units (CPUs) and Graphics Processing Units (GPUs) [37]. Due to these incompatibilities, researchers have shifted their focus to multidimensional parameters, like computational complexities, architectural footprints, and energy consumption, rather than the one-dimensional criterion of high accuracy.
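The multidimensional criteria above (parameter count, memory footprint, energy) can be made concrete with a rough back-of-the-envelope sketch. The layer shapes and byte sizes below are illustrative assumptions, not figures from any model discussed in this review:

```python
# Illustrative estimate of a CNN's "architectural footprint":
# parameter count and weight memory at different precisions.
# The layer shapes below are made-up examples, not a real model.

def conv_params(in_ch, out_ch, k):
    """Weights (k*k*in_ch*out_ch) plus one bias per output channel."""
    return k * k * in_ch * out_ch + out_ch

# (in_channels, out_channels, kernel_size) for a small hypothetical stack
layers = [(3, 32, 3), (32, 64, 3), (64, 128, 3)]

total_params = sum(conv_params(i, o, k) for i, o, k in layers)
fp32_bytes = total_params * 4   # 32-bit floating-point weights
int8_bytes = total_params * 1   # 8-bit quantized weights

print(f"parameters:   {total_params}")
print(f"fp32 weights: {fp32_bytes / 1024:.1f} KiB")
print(f"int8 weights: {int8_bytes / 1024:.1f} KiB")
```

Even this crude estimate shows why quantization is attractive on edge hardware: storing weights in 8 bits instead of 32 cuts the weight memory by a factor of four before any pruning is applied.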
B. RESEARCH SCOPE AND METHODOLOGY
1) RESEARCH QUESTIONS
To establish a focused inquiry, we initiated this review with a preliminary investigation. This investigation aimed to identify a set of research questions (RQs) considered to be the most pertinent to the current understanding of CNNs and their impact in Industrial Surface Defect Detection. These RQs will subsequently guide the analysis and discussion presented throughout this article.

1) What are the essential components of CNNs?
2) How have CNNs evolved over time, and what are the recent advancements in their architecture?
3) What are the key components and techniques involved in implementing CNNs for Object Detection (OD)?
4) What are the essential hardware components and considerations for efficient deployment of CNN-based models?
5) What are the different applications of CNNs in the domain of Industrial Surface Defect Detection?
6) How have CNNs been employed for defect detection in various industrial surfaces, such as pallet racks, steel, rail, magnetic tiles, photovoltaic cells, fabrics, displays, etc.?
7) What techniques or strategies have been employed to enhance the performance of CNN models for Industrial Surface Defect Detection?
8) What types of datasets (real-world, synthetic, public, proprietary) have been commonly used for training and evaluating CNN models in this domain?

2) RESEARCH METHODS
This research employed a multifaceted approach to conduct a thorough review of the literature on CNNs for Industrial Surface Defect Detection. The methodology encompassed several key strategies:

1) Systematic Literature Review: A rigorous search and analysis were undertaken, encompassing relevant research articles, conference proceedings, and technical reports. Clear inclusion and exclusion criteria were established for selecting the reviewed literature.
2) Keyword-based Search: A comprehensive set of keywords pertaining to the field, including "Convolutional Neural Networks," "CNN Components," "Hardware Accelerators," "Surface Defect Detection," "Industrial Quality Inspection," "Structural Health Monitoring," and "Object Detection," was utilized to search prominent academic databases like IEEE Xplore, ACM Digital Library, ScienceDirect, Scopus, and Google Scholar.
3) Backward and Forward Reference Searching: Reference lists of identified relevant papers were meticulously examined to unearth additional pertinent literature sources (backward searching). Furthermore, papers citing the initially selected sources were explored to identify further relevant research (forward searching).
4) Manual Search and Expert Consultation: To ensure comprehensiveness, a targeted manual search for relevant literature was conducted. Additionally, consultations were held with the supervisory team and researchers in the field to glean recommendations for pertinent literature sources.
5) Data Extraction and Analysis: A systematic methodology was employed to extract and synthesize critical information from the selected literature. This process focused on gathering data on several key areas, including the categorization of surfaces within industrial settings, the various categories of defects, the employed CNN architectures and their modifications, techniques utilized for performance improvement, the characteristics of the datasets leveraged, and the principal findings or outcomes generated by the implemented models.
6) Qualitative and Quantitative Analysis: Qualitative analysis was conducted to discern common themes, challenges, and promising future research directions within the existing literature. Quantitative analysis was employed to compare the performance of various CNN architectures, techniques, and implementations applied to the domain of Industrial Surface Defect Detection.
7) Dataset Analysis: A critical examination was conducted of datasets employed in previous and current research on industrial surface defect detection. This analysis encompassed both publicly accessible datasets and domain-specific datasets curated by researchers or industrial collaborators.

C. RELATIONSHIP BETWEEN CNNS AND HARDWARE INFRASTRUCTURE
The field of CV has witnessed a transformative era of architectural advancements in CNNs, resulting in a paradigm shift towards automated visual applications. Despite their significant breakthroughs, the successful integration of CNN models into the real world extends beyond the intricacies of their design. Hardware configurations emerge as a crucial factor, dictating the feasibility, efficiency, and, ultimately, the tangible applicability of CNN-based solutions. Therefore, establishing a firm understanding of the symbiotic relationship between CNN development and underlying hardware constraints is paramount before embarking on a detailed exploration of individual architectures.

1) COMPUTATIONAL DEMANDS
CNNs exhibit significant computational requirements, necessitating considerable processing power to execute their intricate operations. As CNN model complexity increases, with enhanced depth and breadth for capturing intricate features, their computational demands escalate exponentially. Therefore, a meticulous assessment of hardware considerations, particularly the selection of appropriate processors and accelerators, is imperative to ensure effective execution and real-time performance. To address the demands of CNNs, specialized hardware platforms such as GPUs, Field-Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs) offer substantial computational power capable of accelerating both training and inference processes.

2) MEMORY DEMANDS
CNNs are known for their substantial memory demands, primarily due to the need to store intermediate feature maps and inference weights. The rapid evolution of deeper architectures has led to an exponential increase in model size, measured in terms of the number of parameters. This escalating memory burden necessitates the use of hardware accelerators with sufficient memory capacity and bandwidth to effectively address the memory demands of CNNs.

3) ENERGY EFFICIENCY
For resource-constrained edge devices and embedded systems deploying CNNs, energy efficiency becomes a paramount concern. To address this challenge, hardware accelerators have emerged to execute CNN computations, striving to achieve real-time inference speeds while minimizing power consumption. To further minimize the energy footprint of CNN architectures, compression techniques such as quantization, pruning, and memory access optimization can be employed. Hardware platforms characterized by low power consumption, like dedicated NN accelerators or edge devices designed for power efficiency, facilitate the deployment of CNN architectures with enhanced energy efficiency.

4) SCALABILITY AND PARALLELISM
These are intricately linked concepts in the context of CNN architecture. Scalability pertains to the ability of a CNN to efficiently utilize multiple hardware resources for processing large datasets or conducting parallel computations. Hardware platforms equipped with parallel processing capabilities, like GPUs, contribute to accelerated training and inference speeds by leveraging inherent parallelism within CNN architectures. Furthermore, advancements in hardware design, including systolic arrays and tensor processing units (TPUs), play a pivotal role in enabling efficient parallel execution of CNN computations, thereby enhancing overall scalability.

5) FLEXIBLE DEPLOYMENT
The deployment flexibility of CNNs is significantly influenced by hardware considerations. Diverse application domains often necessitate specific hardware platforms due to varying constraints such as weight, power consumption, physical size, and cost limitations. For example, while GPU-based systems excel in high-performance computing scenarios, their inherent resource requirements might render them impractical for resource-constrained environments. Conversely, FPGAs provide superior reconfigurability, enabling the development of custom hardware acceleration tailored to specific CNN architectures. However, ASICs, despite boasting the highest performance and energy efficiency, incur substantial development costs and limited flexibility.

The design of CNN architectures and hardware considerations exhibit a symbiotic relationship, where each domain critically influences and advances the other. Advancements in hardware capabilities, particularly in terms of computational power and memory resources, pave the way for the development of increasingly complex and sophisticated CNN architectures. Conversely, innovative CNN architectures serve as inspiration for developing specialized hardware tailored for DL tasks. This iterative cycle fosters continuous progress and innovation in both the algorithmic and hardware realms, ultimately propelling advancements in diverse CV applications.

D. EXISTING SURVEYS
Several valuable surveys have explored the application of DL using CNNs in industrial surface defect detection. Qi et al. [42] offer a comprehensive overview, while Cumbajin et al. [43] provide a detailed classification of commonly encountered defects. Additionally, several recent scholarly reviews have comprehensively surveyed the applications of deep learning for surfaces and materials across diverse fields. These include steel surfaces [44], rail tracks [45], photovoltaic systems [46], [47], [48], fabrics [49], [50], [51], and displays [52]. While these works shed light on the potential of CNNs, they fail to address the crucial relationship between these architectures and the hardware accelerators necessary for successful real-world deployment. Addressing this gap, the authors in [53] investigate efficient CNN implementation across various hardware platforms, but their focus remains generic, excluding specific industrial scenarios. A broader survey by [54] explores the integration of CV algorithms with hardware accelerators, including object detection, but lacks a specific connection to surface defect detection in industrial settings. This paper distinguishes itself by offering a comprehensive and up-to-date survey that specifically links object detection, hardware accelerators, and industrial defect detection within the overall CNN ecosystem, to provide a valuable resource for researchers, practitioners, and industry professionals seeking to leverage their combined power for robust and efficient industrial defect detection systems.

E. ORGANIZATION OF PAPER
This article is subsequently organized into the following sections: Section II aims to guide readers through the foundational components of CNNs along with the crucial mathematical computations that facilitate their functionality. Section III delves into the architectural evolution from ANNs to CNNs. Section IV dives into the intricate domain of OD, offering a comprehensive analysis of its historical trajectory
and current SOTA approaches. This analysis encompasses not only the architectural advancements but also the essential ecosystem surrounding OD, including established frameworks, readily available Application Programming Interfaces (APIs), diverse datasets for model training and evaluation, and robust metrics for quantifying performance. The subsequent section, Section V, dives deep into the fascinating realm of hardware acceleration for Industrial IoT-powered visual inspection. This segment dissects the intricate functionalities and application potentials of three key technologies: GPUs, FPGAs, and ASICs. Section VI embarks on the primary aim of this review, exploring the considerable potential of CNNs for revolutionizing Industrial Defect Detection Systems. It provides a detailed exploration of how CNNs can be leveraged for robust inspection methodologies across a diverse spectrum of surfaces, encompassing pallet racks, steel, rail, magnetic tiles, photovoltaic cells, fabric, screens, and beyond. Section VII focuses on the critical challenges and promising future directions that require further investigation. Finally, Section VIII summarizes the key takeaways and concludes this review. Figure 2 presents the overall structure of this survey.

II. BUILDING BLOCKS OF CNN
CNNs represent a formidable class of NNs prominently employed in the domain of image recognition. Characterized by a hierarchical architecture, CNNs incorporate pooling and convolutional layers, which are configured to capture pertinent characteristics from input images. After feature extraction, multiple FC layers leverage these discerned features to formulate predictive outcomes. The deployment of a CNN for image recognition mandates an initial training phase using extensive datasets comprising labeled images highlighting the focal areas. Throughout this process, the CNN undergoes iterative refinement, in which it acquires the ability to correlate the discerned features with the accurate corresponding labels through backpropagation and optimization techniques [55], [56], [57], [58]. Once the CNN attains proficiency through training, its applicability extends to new and previously unseen images. This prediction process involves passing the image through the network, where the CNN discerns and evaluates the extracted features, ultimately selecting the label corresponding to the highest predicted probability as the output. This systematic approach underscores the CNN's adaptability and effectiveness in image recognition tasks.

Figure 3 provides a basic representation of a CNN architecture designed for image classification. The CNN receives an input image of a forklift as a matrix of pixel values. This image is subsequently introduced into the input layer, which serves as the initial stage of feature extraction. The initial layer of the NN receives the input image and passes it on to the subsequent hidden layer. Within this layer, a collection of filters, also termed kernels, are employed to process the image data. These filters, typically small matrices of weights, slide across the image, extracting low-level features like edges and patterns. The output of this layer is a feature map, where each pixel represents the activation of a particular filter at a specific location in the image. Subsequently, the feature map is transmitted to the succeeding hidden layer, where a new array of filters of varying complexity is applied. These filters extract
higher-level features from the image, such as shapes, patterns, and objects. This process of feature extraction and abstraction is iterated across multiple hidden layers, each refining and enriching the representation of the image. The final hidden layer generates features that are transferred to the output layer, consisting of a FC NN that classifies the image into one of the predefined categories. The final layer of the NN generates a probability distribution over the potential classes: forklift, car, and truck, indicating the confidence level associated with each class. The prediction of the network corresponds to the class exhibiting the strongest probability. The filters' weights in every CNN layer undergo a learning mechanism known as backpropagation. This iterative process allows the CNN to learn the optimal weights for extracting relevant features and making accurate classifications.

FIGURE 3. A basic representation of a CNN architecture.

CNNs have become the dominant approach for image classification owing to their ability to effectively extract and acquire features from images, leading to optimised performance compared to traditional ML algorithms. Their success has revolutionized various applications, including object detection, image recognition, and medical image analysis. A comprehensive overview of the CNN components is essential for crafting innovative architectures and consequently attaining improved performance. Therefore, this section provides a concise exploration of the foundational components of CNNs, delving into the basic architecture to foster a nuanced understanding of the various architectural variants within the realm of CNNs.

A. INPUT IMAGE
Digital images are composed of pixels, the fundamental units of visual information. Each pixel is represented by a value ranging from 0 to 255, corresponding to its brightness and hue. Upon viewing an image, the human brain processes a vast amount of information within the first second. This remarkable ability stems from the intricate network of neurons in the visual cortex, where each neuron possesses a receptive field, a specific region of the visual field to which it responds. Similar to the biological vision system, CNNs also employ receptive fields, allowing individual neurons to analyze data within their assigned areas. CNNs are designed to extract patterns hierarchically, first recognizing simpler features like lines and curves, and then gradually progressing to more complex patterns such as faces and objects. This hierarchical processing mirrors the human visual system, enabling CNNs to learn and recognize complex visual patterns, making them a powerful tool for CV tasks.

B. CNN LAYERS
A CNN generally comprises multiple layers, each assigned a distinct role within the network. Each of them is explained below:

1) CONVOLUTIONAL LAYER
The Convolutional layer stands as the fundamental cornerstone in the architecture of a CNN [59], [60], comprising a collection of convolutional kernels, also known as filters, which play a pivotal role in feature extraction from input images. The convolutional kernel slides across the image, dividing it into smaller matrices known as receptive fields. This image segmentation facilitates the extraction of distinctive feature patterns. The convolution operation slides the kernel across the image and performs an element-wise multiplication between the kernel weights and the corresponding pixels in the receptive field [61], generating a feature map, where each pixel represents the activation of a particular filter at a specific location in the image. The convolution operation can be mathematically represented in Equation (1):

$f_k^l(p, q) = \sum_{c}\sum_{x,y} i_c(x, y) \cdot e_k^l(u, v)$    (1)

where $f_k^l(p, q)$ represents the activation of neuron $(p, q)$ in the $l$th feature map produced by the $k$th filter, $i_c(x, y)$ represents the pixel value at position $(x, y)$ in the $c$th channel of the input image, and $e_k^l(u, v)$ represents the value at position $(u, v)$ in the $k$th filter for the $l$th feature map.

2) POOLING LAYER
Pooling layers serve a critical function in minimizing the spatial dimensions (width and height) of feature maps, facilitating efficient processing in subsequent convolutional layers [62], [63], [64]. The process executed by this layer, also known as downsampling or subsampling [65], results in a concurrent loss of information due to the size reduction. This layer operates independently on each feature map and achieves downsampling through renowned strategies such as average and max pooling. However, this loss is advantageous for the network since it mitigates computational complexity and enhances resilience to minor translations in the input image. Equation (2) mathematically represents the pooling operation, where $Z_k^l$ denotes the pooled feature map at the $l$th layer for the $k$th input feature map, and the function $g_p$ determines the specific type of pooling operation employed, such as max pooling or average pooling:

$Z_k^l = g_p(F_k^l)$    (2)
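The operations in Equations (1) and (2) can be sketched directly in NumPy. The array shapes, toy values, and the use of cross-correlation (no kernel flip, as is conventional in CNN implementations) are illustrative assumptions:

```python
import numpy as np

def conv2d(image, kernels):
    """Valid cross-correlation per Equation (1): for each filter k,
    sum over channels c and kernel positions of i_c(x, y) * e_k(u, v)
    within the receptive field anchored at output location (p, q)."""
    C, H, W = image.shape          # channels, height, width
    K, _, kh, kw = kernels.shape   # filters, channels, kernel h/w
    out = np.zeros((K, H - kh + 1, W - kw + 1))
    for k in range(K):
        for p in range(out.shape[1]):
            for q in range(out.shape[2]):
                receptive_field = image[:, p:p + kh, q:q + kw]
                out[k, p, q] = np.sum(receptive_field * kernels[k])
    return out

def max_pool2d(fmap, size=2):
    """Max pooling per Equation (2): g_p takes the maximum of each
    non-overlapping size x size window, shrinking the feature map."""
    K, H, W = fmap.shape
    out = np.zeros((K, H // size, W // size))
    for k in range(K):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                window = fmap[k, i * size:(i + 1) * size,
                                 j * size:(j + 1) * size]
                out[k, i, j] = window.max()
    return out

# A 1-channel 5x5 image and a single 2x2 vertical-edge-like filter.
img = np.arange(25, dtype=float).reshape(1, 5, 5)
ker = np.array([[[[1.0, -1.0], [1.0, -1.0]]]])   # shape (1, 1, 2, 2)

fmap = conv2d(img, ker)       # feature map, shape (1, 4, 4)
pooled = max_pool2d(fmap)     # pooled map, shape (1, 2, 2)
print(fmap.shape, pooled.shape)
```

Note how pooling halves each spatial dimension, which is exactly the information-discarding size reduction the text describes: the exact position within each window is lost, buying translation resilience and lower downstream cost.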
its significance was not fully appreciated until 1986. This algorithm enables NNs to learn from errors by modifying the weights of neurons, adjusting them according to the error in the output layer. This made it possible to train NNs more efficiently and effectively, leading to a renewed interest in the field and the development of a new NN architecture known as the CNN.

This traces back to the early 1960s, when Hubel and Wiesel [118] disclosed that certain neurons in the visual cortex of cats and primates are sensitive to specific patterns of light and dark. This led to the emergence of the Neocognitron, a model of visual perception introduced by Fukushima in 1980 [119]. It was the Neocognitron that introduced convolutional layers, which are a defining characteristic of CNNs. Fukushima arranged "S-cells" and "C-cells" in alternating layers, creating a hierarchy known as "sandwich layers" (SCSCS. . . ). S and C cells exhibit characteristics close to the simple and complex cells found in the visual cortex. "S-cells" possess adjustable parameters, while "C-cells" perform pooling operations. Backpropagation was not used on the network at the time.

In 1998, LeNet-5 was developed at Bell Labs by Yann LeCun and his research team, which further enhanced the performance of CNNs [120]. LeCun used backpropagation to train Fukushima's ANN [121], which achieved an error rate of 1% and a reject rate of about 9% on zip code digits. LeCun further improved CNNs using an error gradient-based learning algorithm.

Due to dwindling public interest and funding, only a handful of researchers persevered in fields like pattern recognition. However, this period saw the emergence of several paradigms that continue to be refined in modern research.

C. RESURGENCE (2000S)
In the early 2000s, the limitations of computing power led to a decline in the popularity of CNNs. However, the introduction of new training algorithms, such as greedy layer-wise training [122], rekindled interest in CNNs. These algorithms allowed for more efficient training of CNNs on larger datasets, resulting in improved performance.

Later, in 2012, a CNN named AlexNet [123], [124] achieved victory in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a competition involving the classification of images into thousands of categories. AlexNet's victory marked a significant turning point in the development of CNNs, sparking a renewed surge of interest in the field. AlexNet's complex architecture, being one of the first Deep Convolutional Neural Networks (DCNNs), played a crucial role in its success. Utilizing numerous layers of convolutional and pooling operations, it effectively extracted and examined image features across different scales. This deep architecture, along with the effective utilization of GPUs, ReLU activation functions [125], regularization techniques like dropout [83], and data augmentation, contributed significantly to AlexNet's outstanding performance.

D. DOMINANCE (2010S)
Following AlexNet's triumph, CNNs have become one of the most prevalent architectures in DL. Despite being based on LeNet, current CNN models differ significantly from their predecessors. As CNNs evolved, advanced processing units and new network blocks have been designed that have contributed significantly to their improvement.

ZFNet, designed by Zeiler and Fergus [126] in 2013, represents an evolutionary iteration of AlexNet, yielding enhanced performance metrics within the framework of the ILSVRC. Noteworthy modifications include a reduction in the first layer's filter dimensions, downsized from 11 × 11 to 7 × 7. Impressively, ZFNet achieved a top-5 validation error rate of 16.5% during its training, which spanned 12 days and leveraged the computational capabilities of a GTX 580 GPU.

GoogLeNet [127], [128], [129], created by Szegedy and his research team in 2014, stands as a CNN architecture distinguished by the introduction of the Inception module [127]. This innovative convolutional layer demonstrated superior efficiency in learning complex features compared to its predecessors. Notably, GoogLeNet boasted a remarkable reduction in the number of parameters, utilizing 12 times fewer parameters than the prominent AlexNet. The training regimen for this model, executed on a select few high-end GPUs, was accomplished within a week. It achieved a 6.67% top-5 error rate, underscoring its efficacy in image classification tasks. The Visual Geometry Group (VGG) network [130], a CNN model devised by Simonyan and Zisserman of the University of Oxford, extends the depth of NN architectures. This augmentation not only attains SOTA precision across the ILSVRC datasets [131] but also demonstrates applicability to various other image recognition databases.

The introduction of skip connections in 2015, pioneered by the ResNet architecture [132], [133] for training deep CNNs [134], [135], marked a significant turning point in the field of CV. Huang et al. [134] initiated the development of DenseNet in 2016, a CNN that introduced an innovative dense connectivity pattern. This novel architecture enables DenseNets to discern intricate features even with limited data. Noteworthy is the incorporation of depthwise separable convolutions, reducing the number of connections and yielding a lightweight model. Subsequently, this concept was incorporated into numerous subsequent networks, including Inception-ResNet, Wide ResNet, and ResNeXt [136], [137], [138]. Various architectural configurations, including Wide ResNet, Pyramidal Net, PolyNet, ResNeXt, and Xception, investigate the influence of multilevel transformations on the learning capability of CNNs. This is achieved by incorporating aspects such as cardinality or enhancing network width [137], [138], [139], [140].

Accordingly, research focused on improved network architecture rather than parameter optimization and connection readjustment. This shift has led to a surge of novel architectural ideas, including channel boosting, spatial and
VOLUME 12, 2024 94259
R. Khanam et al.: Comprehensive Review of Convolutional Neural Networks for Defect Detection
feature-map-wise exploitation, and attention-based informa- industries. They have revolutionized CV, enabling machines
tion processing [141], [142], [143]. to perform tasks like object detection, image classification,
image segmentation, and natural language processing with
E. MODERN ERA remarkable accuracy. CNNs are also playing an increasingly
In recent years, numerous CNN architectures have been important role in other fields, such as robotics, autonomous
designed to enhance operational efficiency and produce vehicles, and healthcare.
models of reduced computational burden. MobileNets [144],
[145] represent an exemplification of such endeavors, empha- IV. OBJECT DETECTION
sizing lightweight design principles to achieve heightened Since the commencement of the 2010s, there has been a
efficacy by employing techniques like depthwise separable notable acceleration in research focused on the application
convolutions, inverted residuals, and squeeze-and-excitation of DL methodologies to address CV challenges. These
layers. MobileNets stand as a testament to the pursuit of challenges encompass a diverse range of tasks, including
computational efficiency within CNN frameworks. Con- image classification, object detection, edge detection, object
versely, EfficientNet [146] embodies a paradigm wherein tracking, image segmentation, and feature extraction. Among
precision and efficiency converge harmoniously by leverag- these, OD has emerged as a central area of interest for defect
ing a compound scaling method that systematically adjusts detection, primarily due to its inherent ability to localize and
both network width and depth, EfficientNet ensures a classify defects as objects. This capability aligns seamlessly
judicious balance in enhancing accuracy while maintaining with the fundamental objective of defect detection [150],
computational efficiency. which is to identify and pinpoint the location of defects within
The Vision Transformer (ViT) [147] marks a departure a given image or video sequence.
from traditional CNN paradigms by adopting the transformer Figure 4 illustrates the layered architecture of CNN for OD.
architecture for image processing, demonstrating notable The process begins with the convolution of input image by
performance benchmarks across diverse image classification an activation function, resulting in feature maps that capture
tasks. A refinement of ViT, the Swin Transformer [148], spatial dependencies and local features. To mitigate spatial
introduces heightened efficiency and scalability through a complexity and retain key features, the network employs
hierarchical transformer architecture and a shifted window pooling layers. These layers downsample the feature maps
attention mechanism. ConvNeXt [149] distinguishes itself by while preserving relevant information. This process is iterated
incorporating a pioneering attention mechanism known as through multiple convolutional and pooling layers, each with
cross-covariance attention (CCA), contributing to its efficacy an increasing number of filters, resulting in a hierarchy
as a CNN. Table 1 presents a chronological overview of of increasingly abstract feature representations. Finally, the
key CNN architectures, showcasing notable developments extracted features are fed into FC layers, which act as a
from 1959 to 2022. Each entry includes the year of classifier and generate the output probabilities for different
inception, the architecture name, the primary authors, and its object classes. The primary goal is to identify instances
description. of real-world objects, such as humans, animals, bicycles,
Today, CNNs stand as one of the most powerful tools cars, etc, within real-time videos or still images [151]. This
in AI, with a wide range of applications across various process facilitates the recognition, localization, and detection
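The convolution-pooling-classifier pipeline just described can be sketched in a few lines of NumPy. All shapes, filter counts, and the random weights below are illustrative only, not taken from any reviewed architecture:

```python
import numpy as np

def conv2d(x, kernels, stride=1):
    """Valid cross-correlation of a (H, W, C_in) input with
    (k, k, C_in, C_out) kernels, followed by a ReLU activation."""
    k, _, _, c_out = kernels.shape
    H, W, _ = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
            # Each filter produces one scalar response per spatial position
            out[i, j, :] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)  # ReLU

def max_pool(x, size=2):
    """Non-overlapping max pooling that downsamples the feature maps."""
    H, W, C = x.shape
    x = x[:H - H % size, :W - W % size, :]  # trim odd borders
    return x.reshape(H // size, size, W // size, size, C).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32, 3))          # toy input image
conv1 = rng.standard_normal((3, 3, 3, 8)) * 0.1   # first layer: 8 filters
conv2 = rng.standard_normal((3, 3, 8, 16)) * 0.1  # deeper layer: more filters
fc = rng.standard_normal((16 * 6 * 6, 10)) * 0.1  # FC classifier, 10 classes

f = max_pool(conv2d(image, conv1))    # (15, 15, 8)
f = max_pool(conv2d(f, conv2))        # (6, 6, 16): smaller, more abstract
probs = softmax(f.reshape(-1) @ fc)   # output class probabilities
```

A trained network would of course learn the kernel and FC weights by backpropagation; the sketch only shows the forward data flow, including the growing filter count and shrinking spatial resolution noted above.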
its versatility has extended to a wide range of object classes. To effectively identify objects of varying resolutions, the HOG detector employs a technique called multi-scale detection. The input image is rescaled numerous times, keeping the dimensions of the detection window constant. This approach allows the detector to effectively capture objects at different scales without significantly increasing computational complexity. The HOG detector has become an integral component of numerous OD algorithms [172], [173], [174], and has found widespread applications in various CV tasks. Its robustness to image transformations, efficient representation of object shapes, and adaptability to various object classes have made HOG an indispensable tool in OD research and practice.

3) DEFORMABLE PART-BASED MODEL (DPM)
The Deformable Part-Based Model (DPM), introduced by Felzenszwalb et al. in 2008 [173], marked a significant advancement in traditional OD. DPM gained remarkable success, winning the prestigious PASCAL Visual Object Classes (VOC) detection challenges in 2008 and 2009. DPM built upon the foundation of the HOG descriptor, introducing a more flexible and adaptable representation for OD. The core concept of DPM lies in the principle of ''divide and conquer.'' This approach involves decomposing the object into multiple parts and treating the OD task as an ensemble of these parts. This decomposition enables DPM to handle object variations effectively, accounting for non-rigid deformations and pose changes. A standard DPM detector consists of two main filters: a root filter and multiple part filters. The root filter represents the overall object shape, while the part filters represent individual object components. The part filters are learned automatically using latent variables, eliminating the need for manual configuration. This process, known as ''multi-instance learning'' [175], was extensively modified by Girshick et al. [176], [177], [178], [179], enhancing the learning efficiency and accuracy. To further improve detection accuracy, DPM incorporated methods like hard negative mining, bounding box regression, and context priming. These techniques effectively filtered out irrelevant negative examples and refined bounding box predictions. Additionally, Girshick introduced a novel acceleration technique, achieving a 10x speedup compared to traditional methods without compromising accuracy [173], [178].

B. DEEP LEARNING DETECTORS
The field of OD experienced a period of stagnation following the saturation of handcrafted features in the early 2010s. The resurgence of CNNs in 2012 [123] presented a promising avenue for revitalizing OD. In particular, the introduction of Region-based CNN (RCNN) by Girshick et al. in 2014 [177], [180] marked a significant breakthrough, propelling OD into an era of unprecedented advancement. The advent of OD models using DL introduced a fundamental distinction between two main approaches: ''two-stage detectors'' and ''one-stage detectors''.

1) TWO-STAGE DETECTORS
Two-stage detectors adopt a coarse-to-fine strategy, employing a series of stages to refine detection results. They follow a sequential approach wherein they initially propose regions of interest or candidate object locations, subsequently evaluating these proposed regions for object presence. This section provides a detailed summary of different two-stage detectors.

a: RCNN
The emergence of OD methods based on handcrafted features marked a significant advancement in the field. However, these methods faced several limitations, including high computational cost, sensitivity to noise, and difficulty in handling object occlusion. To address these drawbacks, RCNNs utilized a selective search method [181] to significantly reduce the number of region proposals. As depicted in Figure 6, the RCNN architecture employs a selective search mechanism to extract a subset of approximately 2000 region proposals from the input image. The subsequent step involves resizing these region proposals to a consistent image size and subjecting them to a pre-trained CNN model, such as AlexNet, trained on the ImageNet dataset, to capture a diverse array of features. Finally, an SVM classifier is employed to determine the existence of an object within each region proposal
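The bounding box regression mentioned above, introduced with DPM and retained throughout the RCNN family, refines a coarse box by predicting offsets relative to it. A minimal sketch of the standard RCNN parameterization follows; the box coordinates are illustrative only:

```python
import numpy as np

def encode(proposal, gt):
    """Offsets (tx, ty, tw, th) mapping a proposal box onto a
    ground-truth box; both boxes are (x1, y1, x2, y2) arrays."""
    px, py = (proposal[:2] + proposal[2:]) / 2   # proposal center
    pw, ph = proposal[2:] - proposal[:2]         # proposal size
    gx, gy = (gt[:2] + gt[2:]) / 2
    gw, gh = gt[2:] - gt[:2]
    # Translation is normalized by proposal size; scale change is in log space
    return np.array([(gx - px) / pw, (gy - py) / ph,
                     np.log(gw / pw), np.log(gh / ph)])

def decode(proposal, t):
    """Apply predicted offsets to a proposal to recover a refined box."""
    px, py = (proposal[:2] + proposal[2:]) / 2
    pw, ph = proposal[2:] - proposal[:2]
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * np.exp(t[2]), ph * np.exp(t[3])
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

proposal = np.array([10.0, 10.0, 50.0, 60.0])  # rough region proposal
gt = np.array([12.0, 8.0, 58.0, 66.0])         # ground-truth box
t = encode(proposal, gt)                       # regression targets for training
recovered = decode(proposal, t)                # equals gt up to float error
```

At training time the regressor learns to predict `t` from the proposal's features; at test time `decode` turns the predicted offsets back into a refined box.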
instance segmentation into the Faster RCNN framework. As depicted in Figure 11, Mask RCNN uses the two-stage pipeline of Faster RCNN, while incorporating an additional branch for predicting object masks. This enables Mask RCNN to effectively identify and segment diverse objects within an image or video, addressing a critical instance segmentation challenge in CV applications. The mask branch operates parallel to the class label and bounding box (BB) regression branches, enabling Mask RCNN to simultaneously detect objects, localize them precisely, and generate high-quality segmentation masks for every instance. This parallel processing enables the network to efficiently generate all three outputs: object class label, bounding box coordinates, and object mask.

The effectiveness of Mask RCNN stems from its ability to effectively utilize features extracted from the CNN using ResNet-FPN [132], [187], a combination of ResNet and FPN. This combination provides Mask RCNN with the ability to extract both fine-grained semantic information and accurate localization cues, enabling high-performance OD and instance segmentation. However, the introduction of the mask branch adds a small computational overhead to the network, resulting in a processing speed of approximately 5 FPS. Despite this minor drawback, Mask RCNN has established itself as a powerful and versatile tool for instance segmentation, offering a significant step forward in the field of OD.

2) ONE-STAGE DETECTORS
Most of the two-stage object detectors operate within a coarse-to-fine processing paradigm. This approach facilitates high precision without requiring intricate architectural embellishments. However, its inadequate speed and inherent complexity limit its practical applicability in engineering applications. In contrast, one-stage detectors offer the advantage of one-step object inference, making them attractive for mobile devices due to their real-time capabilities and ease of deployment. These detectors directly predict object bounding boxes and class labels from the input image, eliminating the need for separate region proposals.

a: YOLO
In 2016, the field of OD witnessed a significant paradigm shift with the introduction of You Only Look Once (YOLO) by Redmon et al. [189], which is depicted in Figure 12. From the single-shot approach of YOLOv1 to the anchor-free elegance of YOLOv8, each iteration has brought groundbreaking features and performance leaps. YOLOv1 distinguished itself by its remarkable speed, with a ''fast'' version capable of processing frames at a staggering 155 fps while maintaining moderate accuracy (VOC07 mAP = 52.7%), and an ''enhanced'' version achieving higher accuracy (VOC07 mAP = 63.4%) at 45 fps. This represented a radical departure from the dominant two-stage paradigm by employing a single NN to analyze the entire image in one pass. While offering dramatic speed improvements, YOLO exhibited a trade-off with localization accuracy, particularly for smaller objects, when compared to its two-stage counterparts. Subsequent iterations of YOLO [191], [192], [194], along with concurrently developed detectors like SSD [205], focused on mitigating this accuracy-speed trade-off. YOLOv5 [197] took a different turn with its PyTorch implementation and modular architecture, prioritizing speed and adaptability. YOLOv6 [200] explored reparametrization and attention modules. Then, the YOLOv4 [194] team unveiled YOLOv7 [202], a further refinement that leveraged optimized architectural elements like dynamic label allocation and model design reparameterization. This iteration delivered impressive performance, surpassing most existing object detectors in terms of speed and accuracy across a spectrum spanning 5-160 fps. Finally, YOLOv8 [203] marked a paradigm shift with its anchor-free detection and redesigned backbone, promising even greater precision and efficiency. This ongoing evolution reflects the remarkable dedication of YOLO's developers to constantly elevate the state of the art in OD, offering a diverse toolbox for tackling real-world challenges across various applications. Table 2 charts the evolution of YOLO models from YOLOv1 to YOLOv8, highlighting key features, performance metrics, and architectural choices for each version.

b: SSD
In 2016, Liu et al. [205] introduced the Single-Shot Multibox Detector (SSD), depicted in Figure 13. Their key innovation lies in the multireference and multiresolution detection techniques. These techniques enable SSD to surpass the precision of previous one-stage detectors, particularly for small objects. SSD demonstrates strong performance in relation to speed and accuracy, achieving a 46.5% mAP@0.5 on the COCO benchmark while a dedicated fast version operates at 59 fps. A distinctive feature of SSD compared to prior detectors is its ability to detect objects at different scales across various network layers, unlike previous methods that restricted detection to the uppermost layers.

c: RETINANET
One-stage object detectors have long suffered from lower accuracy compared to two-stage counterparts, despite their advantages in speed and simplicity. This performance gap was investigated by Lin et al. in 2017 [206] by introducing RetinaNet; they identified the exceptional foreground-background class imbalance in the training phase, which hinders the learning process for dense one-stage detectors. RetinaNet introduces a novel loss function called focal loss, which modifies the conventional cross-entropy loss by assigning greater emphasis to hard, misclassified examples. By focusing on these challenging instances, focal loss enables RetinaNet to achieve accuracy comparable to two-stage detectors while maintaining significantly faster inference speeds (COCO mAP@0.5 = 59.1%). This is illustrated in Figure 14.

d: SQUEEZEDET
SqueezeDet was introduced by Wu et al. [207], which is a lightweight, single-stage, and highly efficient FCNN for detecting objects in systems related to autonomous driving. The successful deployment of deep CNNs for OD in real-time necessitates the resolution of critical issues such as speed, model resolution, accuracy, and power consumption. Notably, the SqueezeDet model adeptly addresses these challenges, as illustrated in Figure 15. It achieves real-time OD through a three-step process. First, it extracts high-dimensional, low-resolution features via a single forward pass using stacked convolution filters. Next, the innovative ConvDet layer leverages these features to simultaneously generate numerous bounding box proposals and predict their object categories. Finally, post-processing refines these detections, yielding accurate and efficient object identification. SqueezeNet [208] serves as the backbone architecture of SqueezeDet, where the model size remains remarkably small at less than 8 MB, far more compact than AlexNet [123] while maintaining comparable accuracy. With approximately two million trainable parameters, SqueezeDet outperforms VGG19 [130] and ResNet-50 [132] in terms of accuracy despite having far fewer parameters than either (143 million and 25 million parameters, respectively). On the Kitti dataset [209], SqueezeDet achieves an impressive 57.2 fps for input images of size 1242 × 375, while consuming only 1.4 J of energy per image. These results demonstrate the model's remarkable efficiency, rendering it highly suitable for real-time OD in applications related to autonomous driving.

e: CORNERNET
Prior OD frameworks predominantly relied on anchor boxes as a means for both classification and regression reference points. This approach inherently assumes a degree of uniformity in object characteristics, such as number, location, scale, and aspect ratio. To achieve high performance, these methods rely on establishing an extensive quantity of pre-defined anchor boxes to better encompass the ground truth object configurations. However, this approach suffered from several drawbacks, including exacerbated category imbalance, reliance on numerous hand-tuned hyperparameters, and prolonged training times. Recognizing these limitations, Law and Deng [210] proposed a paradigm shift, reframing OD as a keypoint (bounding box corner) prediction problem. CornerNet leverages a CNN to directly predict the locations of objects as paired keypoints, specifically the top-left and bottom-right corners. To facilitate accurate corner localization, CornerNet introduces a ''corner pooling'' layer, specifically designed to extract relevant features for corner detection. The CNN generates a heatmap corresponding to all top-left and bottom-right corners, accompanied by an embedding vector for each identified corner. This novel approach surpassed the performance of most contemporary one-stage detectors, achieving a COCO mAP@0.5 of 57.8%. The architectural configuration of CornerNet is illustrated in Figure 16. However, a notable limitation pertains to its propensity for generating inaccurate paired key points associated with the detected object.

f: CENTERNET
Zhou et al. [211] introduced CenterNet, a keypoint-based OD framework operating in a fully end-to-end manner, as depicted in Figure 17. Unlike prior models like CornerNet [210] and ExtremeNet [212], which rely on costly computational post-processing procedures like group-based
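RetinaNet's focal loss, discussed above, simply rescales cross-entropy so that well-classified examples are down-weighted and hard examples dominate the gradient. A minimal binary-classification sketch with γ = 2 and without the optional α-balancing term (the prediction values are illustrative):

```python
import numpy as np

def cross_entropy(p, y):
    """Standard binary cross-entropy for predicted probability p of class 1."""
    pt = np.where(y == 1, p, 1 - p)  # probability assigned to the true class
    return -np.log(pt)

def focal_loss(p, y, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - pt)^gamma, so confident,
    easy examples contribute little while hard examples dominate."""
    pt = np.where(y == 1, p, 1 - p)
    return -((1 - pt) ** gamma) * np.log(pt)

p = np.array([0.95, 0.6, 0.1])  # predictions for three positive examples
y = np.array([1, 1, 1])
ce = cross_entropy(p, y)
fl = focal_loss(p, y)
# Per-example down-weighting factor relative to cross-entropy: (1 - pt)^gamma.
# The easy example (p = 0.95) is suppressed far more than the hard one (p = 0.1).
ratio = fl / ce
```

With thousands of easy background anchors per image, this rescaling is what keeps the dense one-stage training signal from being swamped by the foreground-background imbalance described above.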
platform for evaluating and advancing OD algorithms. While featuring fewer object categories compared to ILSVRC, MS-COCO compensates with a significantly larger number of object instances per image. For instance, MS-COCO-17 boasts 164,000 images and 897,000 annotated objects across 80 categories. A key differentiator of MS-COCO compared to datasets like Pascal VOC and ILSVRC lies in its inclusion of per-instance segmentation annotations alongside bounding boxes. This detailed labeling facilitates precise object localization and understanding of complex inter-object relationships. Furthermore, MS-COCO challenges algorithms with its abundance of small objects (area less than 1% of the image) and densely packed scenes, pushing the boundaries of detection accuracy and scalability. Similar to the impact of ImageNet on image classification, MS-COCO has emerged as the de facto standard for the OD community. Its comprehensive annotations, diverse object instances, and challenging scenarios provide a robust platform for algorithm development and evaluation, driving significant advancements in the field.

d: OPEN IMAGES
Building upon the success of the MS-COCO dataset, 2018 witnessed the launch of the Open Images Detection (OID) Challenge [237], a landmark initiative that significantly expanded the scale and complexity of OD tasks. Unlike MS-COCO, OID tackles two distinct tasks:

1. Standard Object Detection: This task aligns with traditional OD frameworks, requiring the identification and localization of individual objects within images. The OID dataset for this challenge boasts an impressive 1.91 million images annotated with 15.44 million bounding boxes encompassing 600 distinct object categories. This substantial scale surpasses prior datasets by a significant margin, offering researchers a rich and diverse ground for benchmarking and advancing OD algorithms.

2. Visual Relationship Detection: Stepping beyond single-object identification, OID introduces the challenging task of detecting relationships between pairs of objects within an image. This novel task delves into the intricate semantic connections present in complex scenes, pushing the boundaries of CV beyond mere object localization.

Overall, OID represents a significant advancement in the field of OD, offering a comprehensive dataset and diverse tasks that encourage the development of more robust and nuanced algorithms capable of understanding the rich visual interactions within images.

e: ROBOFLOW 100
Roboflow 100 (RF100) [238] was introduced in 2022 to shift beyond the limitations of single-domain datasets like MS-COCO. RF100 emerged as a diverse and challenging benchmark for OD research by providing the following features:

1. Domain Diversity: Spanning 7 distinct domains (Aerial, Videogames, Microscopic, Underwater, Documents, Electromagnetic, Real World) with over 224,714 images, RF100 challenges models to adapt to varied visual characteristics and object types beyond the typical focus on common objects in everyday settings. This diversity fosters a more realistic assessment of model generalizability across different tasks and environments.

2. Rich Annotations: Over 829 class labels with 11,170+ hours of manual labeling ensure accurate and detailed annotations for diverse objects, enabling precise object localization and understanding of complex inter-object relationships. This level of annotation granularity is crucial for training and evaluating robust models capable of handling intricate real-world scenarios.

3. Accessibility: Open-sourced and publicly available, RF100 promotes widespread research and development activities by providing researchers with a readily accessible platform for benchmarking and improving their OD models.

4. Benchmarking Tools and Visualization: Code is available for replicating benchmark results and performing fine-tuning and evaluation on YOLOv5 and YOLOv7 models, allowing researchers to easily compare and analyze their models' performance on RF100. Additionally, the RF100 website provides a platform for visualizing images and trained model results, facilitating data exploration and model performance analysis.

2) METRICS
The question of how to accurately evaluate object detectors is not static but rather adapts to the evolving landscape of detection research. In the early days, a lack of consensus existed on suitable metrics. For instance, pedestrian detection research primarily employed the ''miss rate versus false positives per window (FPPW)'' metric [168]. This per-window approach, however, proved inherently flawed, failing to accurately predict detector performance for the entire image [239]. This prompted a paradigm shift in 2009 with the introduction of the Caltech pedestrian detection benchmark [239], [240], which replaced FPPW with the more holistic ''false positives per-image (FPPI)'' metric.

In recent years, the field of OD has primarily relied on average precision (AP) as the primary evaluation metric. Originally introduced within the Pascal VOC2007 challenge, AP quantifies the precision achieved across different recall levels, typically evaluated for individual object categories. This comprehensive metric balances both the true positive rate (OD accuracy) and the false positive rate (incorrect detections) across the entire range of recall values. Additionally, the mAP, calculated by averaging AP across all object categories, is often employed as a single, overarching performance indicator. To assess the accuracy of object localization, the Intersection over Union (IoU) between predicted bounding boxes and ground-truth annotations is employed. This metric measures the overlap between the predicted and actual object locations, with a threshold (typically 0.5) used to determine whether a detection is considered successful. If the IoU exceeds the threshold, the object is deemed ''detected,'' otherwise it is classified as ''missed.'' This binary classification based on IoU has become the standard practice for evaluating OD performance, with the 0.5-IoU mAP serving as the primary benchmark for comparing and ranking detectors.

Following the introduction of the influential MS-COCO dataset in 2014, the focus within OD research demonstrably shifted towards accurate object localization. Prior to this, evaluation metrics often relied on a fixed IoU threshold for classifying detections as true positives. While this offered a simple binary assessment, it failed to capture the nuances of precise bounding box placement. The MS-COCO dataset challenged this paradigm by employing an AP metric calculated across a range of IoU thresholds (typically from 0.5 to 0.95). This innovative approach encouraged models to not only identify objects but also to delineate their spatial extent with greater accuracy. This is arguably of paramount importance for real-world applications, where imprecise localization could have significant consequences. Consider, for example, a robotic arm tasked with grasping a specific tool. Even a marginally misplaced bounding box could lead to a missed grasp and potentially hinder the robot's functionality. By fostering research into improved localization capabilities, the MS-COCO dataset has significantly impacted the trajectory of OD research and paved the way for more robust and nuanced applications in diverse real-world settings.

V. HARDWARE CATALOGUE
While the architectural advancements in OD as discussed in the previous sections have demonstrably fueled the success of CNNs, it is crucial to acknowledge that architectural breakthroughs are not the sole engine driving CV's progress. The evolution of hardware over the preceding decades stands as an equally consequential contributing factor, particularly in the context of deploying CNNs [241]. Significantly impactful advancements in hardware acceleration have yielded robust parallel computing architectures, propelling the efficient training and inference of increasingly complex, multi-layered DCNN architectures.

Hardware acceleration employs targeted interventions in computer hardware to achieve a demonstrably reduced latency and enhanced throughput for computational tasks, in contrast to traditional software execution on general-purpose CPUs. Notably, Princeton architectures have historically prioritized serial computation models, coupled with intricate task scheduling algorithms [242]. CNNs pose significant computational challenges due to their inherent reliance on dense parallel computation. This reliance necessitates high memory bandwidth and often leads to excessive power consumption, particularly when dealing with complex network architectures [243].

Recognizing these hurdles, researchers and hardware vendors have embarked on a concerted effort to develop innovative strategies for boosting processing capabilities. This endeavor strives to achieve enhanced parallelism, optimized inferencing, and efficient power utilization. This section delves into a critical evaluation of prominent hardware acceleration implementations, meticulously examining their contributions, potential drawbacks, and broader implications for applications within the realm of CV.

A. GPU
The emergence of the Graphical Processing Unit (GPU) as a versatile computational force has transformed the landscape of modern computing. Initially conceived as a dedicated accelerator for real-time 3D graphics applications, rendering, and gaming [244], [245], the GPU's inherent potential for broader scientific and engineering applications was quickly recognized as the 21st century unfolded. This realization stemmed from the GPU's unique architecture, offering significant performance gains for intensive computational tasks and remarkable acceleration for highly parallel workloads, such as CNNs [246]. Consequently, in the modern era, GPUs have transformed from dedicated graphics engines into versatile, highly parallelized processing units, boasting impressive throughput and memory bandwidth, ideally suited for parallel computing paradigms.

Modern computing landscapes encompass two distinct processing paradigms: multiple CPUs and GPUs. CPUs typically exhibit multi-instructional, out-of-order execution, leveraging large caches to mitigate single-thread latency while operating at high frequencies. In contrast, GPUs possess thousands of in-order cores, relying on smaller caches and lower frequencies for parallel processing efficiency [247]. Recognizing the challenges associated with GPU-based application development and integration, various platforms have emerged to bridge this gap. Notable examples include Open Computing Language (OpenCL) [248] and NVIDIA's widely adopted Compute Unified Device Architecture (CUDA) [249].

The symbiosis between DL and GPUs has profoundly impacted various scientific domains. Examining the intricate architecture of CNNs reveals a remarkable alignment with the inherent parallelism of GPUs. This synergy manifests in the efficient execution of convolutional operations, diverse sub-sampling strategies, and neuron activations within FC layers facilitated by binary-tree multipliers [250]. Recognizing the immense potential of GPUs in accelerating CNNs, a plethora of libraries have emerged to facilitate seamless integration. Prominent examples include cuDNN [251], cuda-convnet [252], and libraries embedded within popular DL frameworks like Caffe [253], TensorFlow [218], and Torch [254].

The evaluation of GPU efficiency for DL applications hinges on three primary performance metrics: memory efficiency, computational throughput, and power consumption. These metrics collectively paint a picture of a GPU's ability to handle the intensive computational demands of DL tasks while optimizing resource utilization. Among GPU vendors, NVIDIA has cemented its position as the dominant force in the DL realm. Recognizing the diverse landscape of DL applications, coupled with the constraints of demanding deployment environments and budgetary limitations, NVIDIA has consistently expanded its GPU portfolio over the past two decades.

Acknowledging the limitations of diverse GPU variants within resource-constrained environments demanding edge deployment, compact form factors, and cost efficiency, NVIDIA developed the Jetson platform. Characterized by a heterogeneous architecture, Jetson leverages the CPU for core operating system (OS) management while offloading DL workloads onto the CUDA-powered GPU. This strategy facilitates the delivery of server-grade compute performance at an attractive price point, evidenced by the proliferation of various Jetson variants specifically tailored for low-power embedded applications. Consequently, NVIDIA accelerator kits have become a ubiquitous tool for diverse ML and DL research endeavors and practical applications. Research by
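The IoU criterion that underlies the detection metrics discussed in the metrics section above reduces to a few lines; the boxes here are illustrative:

```python
import numpy as np

def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = np.maximum(a[:2], b[:2])   # top-left of the intersection
    x2, y2 = np.minimum(a[2:], b[2:])   # bottom-right of the intersection
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = np.array([10.0, 10.0, 50.0, 50.0])  # predicted box
gt = np.array([15.0, 15.0, 55.0, 55.0])    # ground-truth annotation
score = iou(pred, gt)                      # overlap in [0, 1]
detected = score >= 0.5                    # Pascal-VOC-style 0.5 threshold
```

Sweeping the threshold from 0.5 to 0.95 and averaging the resulting APs, as MS-COCO does, rewards detectors whose boxes are tight rather than merely overlapping.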
One optimization strategy, termed algorithmic operation, entails the incorporation of computational transforms, including the Fast Fourier Transform (FFT) [268], GEMM, and Winograd [269], applied to convolutional kernels and feature maps. The primary objective is to reduce arithmetic operations in the post-deployment (inference) phase. The employment of FFT contributes to a reduction in arithmetic complexity by transforming a 2-D convolution into an element-wise matrix multiplication [270]. This transformation offers substantial computational gains, particularly for large kernel sizes, where the number of operations between kernels and feature maps escalates rapidly.

GEMM stands as an extensively used method for the processing of DNNs in both CPUs and GPUs, demonstrating notable efficacy through the vectorization of computations in both convolutional and FC layers [271]. In scenarios involving small kernels, the Winograd transform emerges as a more efficient strategy for arithmetic reduction than FFT, as it leverages the reuse of intermediate results [272]. The efficacy of Winograd is exemplified by a considerable 7.28x enhancement in runtime speed when applied to a VGG-Net, in contrast to GEMM, particularly observable on a Titan-X GPU [270]. Furthermore, Winograd exhibits a commendable throughput of 46 Giga Operations Per Second (GOPs) for AlexNet on an FPGA [273].

Data-path optimization represents another strategic initiative directed towards achieving enhanced computational efficiency in architectures. Historically, FPGAs have conventionally structured and implemented processing elements in the form of 2-D systolic arrays [274], [275], [276]. However, these implementations, while conceptually appealing, suffer from an inherent limitation: the inability to implement data caching mechanisms due to the imposed constraints on kernel size within the CNN architecture. This, unfortunately, restricts the overall efficacy of such designs.

The loop optimization technique endeavors to address the aforementioned challenge through the incorporation of various sub-components. First, loop reordering minimizes redundant memory access by exploiting spatial locality, thereby enhancing cache utilization [277]. Second, loop unrolling and pipelining contribute to improved FPGA resource utilization, as demonstrated in [278] and [279], respectively. Finally, loop tiling involves the partitioning of weights and feature maps for each layer emanating from the memory into ‘tiles,’ thereby facilitating efficient hosting within on-chip buffers [280].

CNNs exhibit remarkable versatility across diverse application domains. This inherent adaptability is particularly advantageous in scenarios where a degree of error tolerance is acceptable, such as quality inspection tasks within the manufacturing sector. Recognizing this potential, researchers have actively pursued model compression strategies aimed at mitigating the architectural and hardware complexities of CNNs. Notably, three distinct approaches have emerged in model compression: pruning [281], low-rank approximation [261], and quantization [282].

Model pruning [283], [284], [285] tackles the challenge of network complexity by selectively removing redundant parameters or weights. CNNs, in particular, possess a substantial weight count, yet not all of these weights offer a significant, or in some cases any measurable, contribution to performance. By eliminating these redundant elements, the network sheds unnecessary complexity, resulting in a more lightweight and energy-efficient architecture. This facilitates deployment on resource-constrained devices, such as FPGAs, where computational limitations would otherwise hinder performance.

Low-rank approximation [286] presents another avenue for CNN compression. This technique decomposes the convolutional weight matrix or FC layers into a set of low-rank filters. Evaluating these filters requires significantly less computational effort compared to the original weight matrix, making it particularly advantageous for deployment on hardware with limited computational capacity.

Quantization of CNNs has emerged as a promising technique for optimizing computational efficiency, particularly in resource-constrained deployment environments. Leveraging the inherently lower resource demands of fixed-point arithmetic compared to floating-point operations, quantization involves representing CNN feature maps and weight matrices using fixed-point formats. This can lead to significant reductions in computational cost while maintaining acceptable accuracy [287], [288]. For extremely constrained scenarios, further compression can be achieved by quantizing weights to binary values, effectively creating Binary Neural Networks (BNNs). However, this aggressive quantization approach can introduce significant accuracy degradation, necessitating careful trade-offs between efficiency and performance [289].

Rui and Qiang [290] investigated the efficacy of pruning through its application in a CNN architecture designed for textile defect detection in a production environment. Their study employed TensorRT on an NVIDIA Jetson TX2 platform to implement pruning prior to deployment. The authors evaluated the impact of pruning on processing time, reporting a reduction from 80 milliseconds to 36 milliseconds for defect processing after pruning, signifying a significant performance improvement.
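The two compression ideas discussed in this subsection, magnitude pruning and fixed-point quantization, can be illustrated with a minimal NumPy sketch. The 64×64 weight matrix, 50% sparsity target, and symmetric 8-bit scheme are illustrative assumptions, not settings from the cited works:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=(64, 64)).astype(np.float32)  # toy weight matrix

def prune_by_magnitude(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(weights), sparsity)
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

def quantize_int8(weights):
    """Symmetric 8-bit quantization: float32 -> int8 values plus a scale factor."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

w_pruned = prune_by_magnitude(w, sparsity=0.5)
q, scale = quantize_int8(w)
w_dequant = q.astype(np.float32) * scale  # what inference effectively computes with

sparsity = float(np.mean(w_pruned == 0.0))
max_err = float(np.max(np.abs(w - w_dequant)))
print(f"sparsity after pruning: {sparsity:.2f}")
print(f"max abs quantization error: {max_err:.4f}")
```

Pruning trades a fraction of the weights for zeros that compact storage and skip multiply-accumulates, while the int8 representation bounds the per-weight error by half a quantization step.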
This heterogeneous architecture features a dedicated neural processing unit (NPU) alongside a Cortex-A73 (quad-core) CPU cluster. This configuration demonstrably enhances throughput by 25x and energy efficiency by 50x compared to traditional CPU-only approaches. Similarly, Google has spearheaded the development of its custom-designed Tensor Processing Unit (TPU) [293]. Optimized for DNNs and seamlessly integrated with the TensorFlow platform [254], the TPU offers a compelling alternative for high-performance AI inference. Apple has also entered the fray with its neural engine [294], a specialized set of processor cores targeting specific DL network operations, particularly in applications like facial recognition. These advancements highlight the growing trend of customized ASIC accelerators fostering significant performance and efficiency gains within the AI landscape.

Table 5 provides a comprehensive comparison across GPU, FPGA, and ASIC platforms, considering a broader range of metrics. Furthermore, it is imperative to note that once developed, the design footprint of ASICs remains immutable. This lack of reconfigurability represents a notable constraint for ASICs, as the dynamic nature of diverse deployment environments necessitates the capacity for relevant adjustments to accommodate evolving requirements.

VI. INDUSTRIAL DEFECT DETECTION APPLICATION AREAS
Quality control sits at the heart of a robust and efficient manufacturing ecosystem. Any deviations from desired specifications directly impact product functionality, marketability, and ultimately, brand reputation. Therefore, enhancing quality inspection mechanisms is paramount. In the realm of industrial automation, MV has emerged as a powerful tool for revolutionizing quality control and process optimization. However, its widespread adoption within the manufacturing domain has been a gradual process, closely intertwined with advancements in both hardware capabilities and underlying computational architectures. Prior to the past decade, limitations in computational power and sensor technology often rendered MV impractical for integration into existing industrial workflows. Consequently, quality inspection primarily relied on human-based visual inferencing, a method prone to inconsistencies and subjective bias. The intricacies of identifying and classifying diverse defect types often surpass human capabilities.

However, the landscape of the manufacturing industry is undergoing a significant transformation, driven by the integration of automated processes through MV-based inspection systems, particularly within the domain of Surface Defect Detection. MV leverages the capabilities of CV algorithms to analyze digital images of products, automatically identifying and classifying imperfections with high accuracy and efficiency. This yields multifaceted benefits such as reduced labor costs, mitigation or elimination of human bias, decreased inference time, and alleviation of human fatigue, among others.

Motivated by the ever-growing need for efficient and reliable warehouse operations and the limitations of manual inspection, this section aims to investigate the promising potential of CNNs for industrial defect detection systems. To broaden the scope of this review, we systematically delve into closely related domains within the purview of Structural Health Monitoring (SHM). This includes inspection methodologies for identifying defects in various surfaces, such as pallet racks, steel, rail, magnetic tiles, photovoltaic cells, fabric, screens, etc. By delving into the latest advancements and available options in deployable CV development frameworks, this survey equips researchers with the necessary knowledge to stay updated and make informed decisions in their CV-related research within the context of SHM and beyond. This comprehensive approach not only fills a critical knowledge gap in industrial defect inspection but also offers valuable insights for other closely related domains, fostering cross-pollination of ideas and accelerating advancements in the field.

A. PALLET RACKS
Pallet-rack inspection is a potentially novel application for automated defect detection using MV within warehousing and manufacturing environments. Pallet racks constitute the backbone of industrial logistics, enabling streamlined storage and transport of goods. However, their susceptibility to structural defects, such as cracks, dents, and corrosion, poses significant safety hazards. Unobserved damage to these vital structures can trigger a cascade of negative consequences if the racking were to collapse. These potential repercussions include substantial financial losses due to ruined stock, operational downtime, employee injuries, and, in extreme cases, loss of life. While various mechanical solutions, such as rackguards [300], exist to mitigate the impact of collisions, they lack the intelligent capabilities necessary for proactive damage detection and subsequent intervention.

Hussain et al. [302] spearheaded research in this field by investigating the application of DL architectures for automated defect detection in pallet racking systems. Given the limited computational resources available in such industrial settings, the authors proposed a MobileNetV2-based model trained on the first publicly available pallet-racking dataset, acquired through collaboration with multiple industry partners. To address potential data imbalances and enhance the model’s generalizability, the authors implemented a novel representative sample scaling technique, which ultimately led to an mAP of 92.7% on the aforementioned dataset. Furthermore, their work distinguished itself from previous mechanical and sensor-based solutions by proposing a unique hardware placement strategy. Instead of attaching any equipment directly to the racking structure itself, the authors suggested mounting the inference hardware device on the forklift’s adjustable brackets, a solution that balances performance with operational flexibility. This strategic placement significantly reduced hardware requirements, with
some installations achieving a 95% decrease in hardware while maintaining a 50% IoU metric. This reduction was accompanied by an expanded coverage area relative to the operating forklift.

Hussain et al. [303] further improved performance and real-time operational feasibility by proposing a domain variance modeling (DVM) approach for training the YOLOv7 architecture. Additionally, they broadened the scope of defect detection to encompass not only vertical flaws but also horizontal cracks and rack support damage. The results were noteworthy, with the system achieving an impressive 91.1% accuracy at a 50% IoU threshold while operating at 19 fps.

Farahnakian et al. [301] further contributed to the domain of automated racking inspection by focusing on semantic segmentation, employing Mask-RCNN as the inference architecture. While their reported performance slightly surpassed that of [302], a closer examination of their dataset revealed limitations in its representativeness for real-world deployment. The captured images depicted isolated racking structures devoid of contextual information, such as the surrounding warehouse environment or loaded stock, potentially hindering the generalizability of their proposed architecture.

Expanding upon prior research, Hussain and Hill [304] introduce a development pipeline called CNN-Block Development Mechanism (CNN-BDM) that enables researchers in the warehousing domain to develop custom lightweight CNN architectures. The proposed CNN architecture has only 6.5 million learnable parameters, making it the first custom-designed CNN architecture for the pallet racking domain. The system achieved a baseline accuracy of greater than 90% and an overall F1 score of 96% on the test data, demonstrating its effectiveness in detecting damaged pallet racking in warehousing and distribution centers. Various regularization strategies, including dropout, were applied to further enhance the performance and generalizability of the network. Dropout at a drop rate of 50% provided the highest performance during training, achieving 99% precision, recall, and F1 score. The performance of the proposed architecture was evaluated on the test dataset, and although there was a slight drop in the overall F1 score, the performance was still impressive at 96%.

Hussain [305] further advanced this domain, proposing YOLO-v5n as the optimal architecture for automated pallet rack inspection and achieving an impressive mAP@0.5 of 96.8%, surpassing previous efforts in this domain. This achievement hinges on a novel methodology that delivers a robust architecture characterized by high accuracy, strong generalization capabilities, and a lightweight footprint. The key to this success lies in the proposed augmentation strategy. By incorporating domain-specific augmentations, the model learns robust features that generalize well to real-world scenarios. This, in turn, leads to high accuracy without sacrificing generalizability. Furthermore, a variant
selection algorithm plays a crucial role in balancing accuracy and computational efficiency. Recognizing the need for lightweight models suitable for edge deployment, the algorithm prioritizes YOLO-v5n over its higher-accuracy counterpart, YOLO-v5x, due to the latter’s significant computational burden.

Alif [306] introduces Pallet-Net, a novel DL technique for automated pallet rack inspection. Leveraging an attention-based CNN, Pallet-Net achieves a remarkable total accuracy of 97.63%, surpassing existing methods (ViT and CCT) in terms of precision (98%), recall (98%), and F1 score (98%). This exceptional performance highlights the model’s ability to effectively identify faulty pallet racks, contributing significantly to industrial safety and maintenance practices.

Hu [307] presents a novel approach for automated pallet racking assessment utilizing the MobileNetV2-YOLOv5 framework. This framework enables the detection of various damage types in pallet racking systems directly on edge platforms during pallet movement. Following a comprehensive analysis, the study identifies MobileNetV2-YOLOv5 (n) as the optimal architecture due to its superior balance between high accuracy and computational efficiency for deployment on resource-constrained edge devices. The proposed methodology demonstrates impressive performance, achieving a precision of 90.6%, a recall of 95.7%, and an mAP@0.5 of 97.8%. These metrics highlight the framework’s effectiveness in accurately identifying damage while maintaining computational feasibility for real-time applications.

Reviewing the current literature, it becomes evident that a significant gap exists in research concerning automated racking inspection utilizing DL techniques. Table 6 summarizes the key findings of existing research studies in this field.

B. STEEL SURFACES
Steel remains a critical element across various planar industries, including architecture, aerospace, machinery, and automobile manufacturing. However, producing certain steel strips can be technically complex, with stringent quality requirements. These complexities expose the material to various defect formation risks arising from human, mechanical, or environmental factors. Figure 20 illustrates common steel defect types.

With the growing demands for intelligent manufacturing and enhanced surface quality assurance, steel surface defect detection has garnered significant attention in recent years. In response to this critical need, Yi et al. [309] proposed an end-to-end system for surface defect recognition in steel strips. Their system leverages a novel symmetric surround saliency map for initial defect detection, followed by a DCNN that directly maps defect images to their respective categories. This CNN, trained on raw defect images, forms the core of an efficient end-to-end defect recognition pipeline.

Cha et al. [310] introduced a novel DCNN architecture capable of directly identifying cracks on concrete and steel surfaces, eliminating the need for manual feature engineering. This innovative framework demonstrates exceptional resilience against the inherent variability of real-world environments, which often poses significant challenges for traditional CV algorithms. Furthermore, the same research team developed a structural visual inspection method leveraging Faster R-CNN [311]. This method enables the quasi-real-time simultaneous detection of multiple types of defects, further enhancing the efficiency and accuracy of infrastructure inspection.

Building upon existing work, He et al. [312] proposed a novel end-to-end DL system for steel plate defect inspection. Their approach used a CNN as the primary feature extraction mechanism, generating feature maps at each processing stage. These feature maps were then integrated through a multilevel-feature fusion network (MFN), culminating in a single feature representation containing enhanced spatial details of potential defects. Leveraging these enriched features, an RPN identified areas of interest, followed by a detector comprising a classifier and a bounding box regressor to produce the final defect detection results. The proposed architecture demonstrated impressive performance, achieving an accuracy of 99.67% for the defect classification task and an mAP of 82.3% for the defect detection task on a publicly available dataset. Furthermore, the system achieves a detection speed of 20 fps while maintaining an mAP of 70%.

Addressing the challenge of imbalanced class distributions due to the inherent sparsity of abnormal samples, Lian et al. [313] proposed a novel Generative Adversarial Network (GAN) architecture for identifying tiny flaws in steel plates. Their approach leverages defect exaggeration, generating both the clean image and an exaggerated version of the defect. This augmented dataset is then fed into a GAN-CNN hybrid model, demonstrably improving the accuracy of tiny surface defect detection by effectively augmenting the minority class.

Luo et al. [314] address the challenges of roll mark detection on steel strips, characterized by large intra-class variations, low contrast, and harsh industrial environments. They propose a novel network, SCFPN, featuring a pyramid structure for enhanced defect feature extraction and a CIoU loss function for improved training stability. To address data limitations, they introduce CSU_STEEL, the first publicly available database of hot-rolled steel strip surface defects. SCFPN demonstrates impressive performance, achieving 75.9% AP for fine-grained roll mark characterization and reaching 99.2% and 82.8% mAP on the DeepPCB and NEU datasets, respectively.

Liu et al. [315] propose a surface defect detection framework for steel strips that incorporates a self-attention mechanism to capture spatial-wise semantic relationships and model global contextual inter-dependencies. The framework, known as Feature Refinement Faster R-CNN (FR-FRCNN), automatically identifies the specific class and location of six types of typical surface defects on steel strips. Compared to the baseline framework (Faster R-CNN with FPN), the proposed framework achieves higher detection accuracy and better localization ability.
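Several of the studies above report detection quality as mAP at an IoU threshold, commonly 0.5. A minimal sketch of the underlying Intersection-over-Union computation for axis-aligned boxes, with hypothetical coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

pred, truth = (10, 10, 50, 50), (20, 20, 60, 60)  # hypothetical boxes
score = iou(pred, truth)  # 900 / 2300, below a 0.5 threshold, so this
print(round(score, 3))    # prediction would not count as a true positive
```

At evaluation time, each predicted box is matched against ground truth this way, and the chosen IoU threshold directly determines which detections are counted when averaging precision into mAP.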
Feng et al. [316] address the critical challenge of automating hot-rolled steel strip defect detection, highlighting its crucial role in various manufacturing industries. They propose a novel approach leveraging a RepVGG architecture augmented with a spatial attention mechanism. While the overall test accuracy reaches a promising 95.10%, the authors acknowledge significant variations in performance across different defect categories, with some as low as 78.95%. Notably, they commendably present a detailed analysis of their architecture’s computational complexity, acknowledging its relatively large size in terms of learnable parameters (83.825 million) and computational demand (17.892 GMACs). This transparency allows for informed comparison and future refinement of the model.

Yang et al. [317] explore the potential of YOLOv5 for production-based weld steel defect detection using X-ray images of weld pipes. Their work challenges the traditional dominance of two-stage detectors like Faster-RCNN in this domain, advocating for the effectiveness of single-stage detectors when equipped with appropriate training strategies. While conventional wisdom often assigns superior accuracy to two-stage detectors, Yang et al. effectively dispel this notion by demonstrating the potential of YOLOv5. Through meticulously crafted data augmentation strategies, they achieve superior performance in both speed and accuracy compared to Faster R-CNN. Their trained YOLOv5 model attains an mAP of 98.7% (IoU 0.5) while satisfying real-time detection requirements for steel pipe production, with a single-image processing time of 0.12 seconds. However, the current implementation has limitations. The X-ray data is processed on a PC equipped with a GPU, indicating a centralized processing architecture that may not be readily scalable in large production environments. Future work could explore edge computing solutions or more distributed processing architectures to address this limitation.

In the study by [318], the authors proposed a custom CNN architecture for automatic metal casting defect detection and compared its performance with SOTA architectures like ResNet, MobileNet, and Inception. Their approach leverages the depth-wise separable convolution introduced by MobileNet within their custom architectures, alongside several optimization strategies such as BlurPool, stochastic weight averaging, MixUp, label smoothing, and squeeze-excitation. While the authors claim superior performance
based on the reduced number of parameters, their reported accuracy is 81.87%, whereas Inception attains 91.48%.

C. RAIL TRACKS
Rail surface defect detection presents a unique but critical challenge within the broader realm of steel surface defect analysis. Exceptionally high contact pressures between wheel and rail, coupled with high traction and braking forces, result in a multitude of potential rail surface damage defects. Such defects can propagate from the rail surface into the rail head and, if left undetected, can result in catastrophic rail failures. As the maintenance demands of an ever-growing global rail network continue, it becomes essential to develop high-speed, reliable, and cost-effective detection systems to ensure the safety of railway operations. Figure 21 illustrates common defects encountered in rails. Image-based MV has the potential to provide cost-effective solutions for rail defect detection; however, it is complicated by the high rail reflectivity and complex ambient lighting of railway systems (night/day, dew point, tunnels, cuttings, and open sections). Wide adoption therefore necessitates the development of robust, higher-speed, and more sophisticated image processing algorithms to effectively handle the complexities inherent in the continuous monitoring of rail surface defect images.

FIGURE 21. Common rail surface defects [42].

Soukup and Huber-Mörk [319] explored the application of CNNs for rail surface defect detection using photometric stereo images. This approach leverages the inherent depth information encoded within such images, potentially leading to more robust and accurate defect identification compared to conventional methods. Their work focused on investigating the impact of various regularization techniques on improving the performance of the CNNs. Firstly, they employed unsupervised layer-wise pre-training, where each layer of the network is trained independently on unlabeled data before fine-tuning with labeled defect images. This approach aimed to learn general image features and improve the overall network’s generalization capabilities. Secondly, they explored the effectiveness of training data augmentation, a technique that artificially expands the training dataset by generating variations of existing images. This serves to prevent overfitting and enhance the network’s ability to recognize defects under diverse lighting conditions and image noise. By exploring these regularization methods, the authors demonstrated the potential of DL for rail surface defect detection using photometric stereo images. Their work contributes to the development of reliable and automated inspection systems for ensuring the safety and integrity of railway infrastructure.

To further automate manual inspection, Liang et al. [320] introduced a novel approach utilizing a DCNN to automate rail surface defect detection. Their work leverages the SegNet architecture [321], a well-established network known for its efficient encoder-decoder structure. This deep, 59-layer network extracts relevant features from rail surface images while simultaneously performing spatial localization, enabling precise defect identification.

Shang et al. [323] proposed a novel two-stage pipeline for automated rail defect detection, emphasizing a novel localization and classification strategy. In the first stage, they employed traditional image processing techniques to accurately localize defects on cropped rail images rather than the original image. Subsequently, the cropped images were fed into a fine-tuned CNN for defect classification. This approach leveraged the powerful feature extraction capabilities of CNNs while reducing computational complexity by focusing only on the relevant regions. This targeted feature extraction yielded superior discriminative power for subsequent classification, ultimately achieving impressive precision and recall rates of 92.08% and 92.54%, respectively.

Leveraging the YOLOv3 DL model [192], Yanan et al. [324] proposed an approach for rail surface defect inspection. This framework divides the input image into a grid of S × S cells. Within each cell, logistic regression was employed to predict the bounding box object score, indicating the confidence of a defect being present. Additionally, a binary cross-entropy loss function was utilized to predict the specific categories of defects that the bounding box might encompass. The research outcomes achieved an impressive recognition rate exceeding 97%, with an identification time of approximately 0.15 seconds.

Yuan et al. [325] addressed the challenge of real-time defect detection by proposing an end-to-end framework utilizing a lightweight CNN architecture. Their approach leverages MobileNetV2 [145] as the core network for efficient feature extraction, enabling multi-scale defect detection for optimal performance. This methodology yielded an impressive 87.4% mAP while maintaining a real-time processing speed of 60 fps.
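The grid-based prediction scheme described for the YOLOv3-style detector of Yanan et al. [324], where each cell outputs an objectness score via logistic regression trained with binary cross-entropy, can be illustrated schematically. The grid size, input size, raw scores, and defect-box centre below are hypothetical, not values from the cited work:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y_true, p):
    """Binary cross-entropy, as used for objectness and per-class prediction."""
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

S, img = 7, 416            # hypothetical S x S grid over an img x img input
cx, cy = 300.0, 100.0      # hypothetical defect-box centre in pixels
cell = (int(cx // (img / S)), int(cy // (img / S)))  # cell responsible for the box

logits = np.zeros((S, S))          # raw objectness scores for every cell
logits[cell[1], cell[0]] = 4.0     # strong raw score at the responsible cell
probs = sigmoid(logits)            # logistic regression -> object confidence

targets = np.zeros((S, S))
targets[cell[1], cell[0]] = 1.0    # only that cell should predict an object
loss = bce(targets, probs)
print(cell, round(loss, 4))
```

Training pushes the responsible cell's confidence towards 1 and every other cell's towards 0, which is exactly what the binary cross-entropy term penalises.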
technology presents a promising avenue for enhancing both the efficiency and accuracy of quality control processes.

Luo et al. [339] presented an automated scratch detection method employing a two-module cascading architecture. The first module leveraged a series of low-level processing stages to identify large scratches and localize potential small scratch candidates. Subsequently, the second module employed a lightweight ScratchNet model for classifying each identified small scratch candidate as a genuine scratch or a non-scratch anomaly. This approach demonstrated remarkable accuracy, achieving 96.35% for small scratch classification.

In the domain of quality control for flat-panel displays, Mura defects present a significant challenge due to their subtle visual nature and variability. To address this issue, Yang et al. [340] proposed a novel approach combining online sequential classification and transfer learning for real-time detection and classification of Mura defects on production lines. The key innovation lies in the integration of a DCNN for feature extraction with a sequential extreme learning machine (SELM) classifier. This innovative combination enables real-time training and classification of Mura defects directly within the production line, eliminating the need for pre-training on large datasets or offline processing.

Building upon the challenges of diverse defect sizes and shapes in screen defect detection, Lei et al. [341] developed a novel end-to-end framework. Their approach integrates merging and splitting strategies to effectively process image patches of varying scales. Subsequently, a trained Recurrent Neural Network (RNN) analyzes the processed patches and identifies the ones most likely to contain defects. This system achieved a remarkable precision of 90.36%, significantly exceeding the 76.02% benchmark established by AlexNet [123].

Lv et al. [342] proposed an automated defect detection system for mobile phone cover glass. Their approach leverages backlight imaging and a modified segmentation technique powered by DNNs to effectively extract and identify defects. Furthermore, the authors proposed a GAN-based approach coupled with a Faster R-CNN model [156] to address the
challenge of limited defect data in specific applications, such as identifying imperfections on mobile phone cover glass. Their experiments convincingly demonstrated the efficacy of this method in enhancing defect detection performance.

H. OTHER SURFACES
Beyond the aforementioned applications, a review of the relevant literature reveals that DL has also been successfully employed in a diverse range of industrial quality detection tasks, including Metal surface defect detection [343], [344], [345], Electronic component defect detection [346], [347], [348], Optical fiber defect detection [349], [350], Wheel hub surface defect detection [351], [352], [353], Diode chip defect detection [354], [355], [356], Bottle mouth defect detection [357], [358], [359], Precision parts defect detection [360], Varistor defect detection [361], [362], Ceramic defect detection [363], [364], [365], and Wood defect detection [366], [367], [368]. This broad spectrum of applications showcases the substantial potential of DL-based MV technology for industrial quality inspection. However, it is crucial to acknowledge that DL applications in the industrial domain are predominantly customized, necessitating a high degree of coupling between the technical models and specific inspection scenarios. Consequently, refined development tailored to each unique task is paramount for successful implementation.

I. COMPARATIVE ANALYSIS
As evident from Table 7, CNN-based approaches have been widely explored for defect detection across various industrial surfaces, including pallet racks, steel, rail tracks, magnetic tiles, photovoltaic cells, textiles, and displays. The reported performance metrics, such as accuracy, mAP, precision, and recall, demonstrate the efficacy of these techniques in automating defect detection processes. However, challenges like limited dataset availability, computational complexity, and domain-specific nuances still exist, necessitating further research and adaptation. Nonetheless, the practical implica-

over image classification, notably providing precise spatial localization of objects. This capability unlocks diverse applications, particularly in manufacturing, where OD enables integration with actuation mechanisms for enhanced efficiency. While CNNs offer robust feature learning, high accuracy, end-to-end training, and transfer learning capabilities for OD, they face challenges [369], [370], [371]. These include large labeled dataset requirements, computational complexity, real-time performance constraints, domain shift and generalization issues, and lack of interpretability. Future research should focus on developing efficient and lightweight architectures, exploring domain adaptation and few-shot learning techniques, improving interpretability, and seamlessly integrating CNN-based OD into industrial workflows.

B. PROGRESSION OF YOLO MODELS
Since the introduction of YOLO in 2015, this sector of OD has established itself as a dominant force within the CV landscape, continuing its contribution through the release of YOLOv8 in January 2023. This phenomenal success can be attributed to the unwavering focus of its creators on meticulously optimizing two crucial metrics for real-world applications: accuracy and computational efficiency, leading to superior inference speed. While the original author’s decision to halt further development due to privacy concerns initially seemed like a setback [189], it has ironically fostered a vibrant research landscape. Driven by the vast potential applications of lightweight, real-time OD, numerous researchers and renowned research groups have actively engaged in rigorous architectural optimizations [372], [373], [374]. This competitive environment fuels continuous innovation, as research organizations vie for superiority in the OD arena, driven by the vast potential applications that hinge upon fast, lightweight detection capabilities.
tions of these CNN-based solutions are significant, ranging C. NECESSITIES OF ADOPTING HARDWARE-BASED
from enhancing quality control and improving product BENCHMARKS
functionality to reducing maintenance costs and optimizing For over a decade, the design of OD architectures has
manufacturing processes. been primarily driven by the pursuit of highly accurate
benchmark datasets. These datasets often encompass vast
VII. CHALLENGES AND FUTURE SCOPE amounts of data, featuring countless classes and millions
This study was dedicated to providing a comprehensive of images, leading to the creation of computationally
examination of the historic and current landscape of CNNs, expensive and complex architectures. However, the recent
considering both algorithmic intricacies and deployment in years have witnessed a paradigm shift within the field, with
various industrial defect detection systems. Through rigorous the focus transitioning from purely theoretical performance
review, this work synthesizes research trends, identifies optimizations to practical, real-world implementations. This
potential focal points, and outlines prospective directions for led to the emergence for the development of hardware-based
future investigations. benchmarks which necessitates expanding the competition
metrics beyond mere accuracy to encompass a broader spec-
A. WIDESPREAD ADOPTION OF OBJECT DETECTION trum of performance indicators like computational efficiency,
ARCHITECTURES resource allocation, and inference speed, particularly on
The surge in development of CNN-based object detection resource-constrained devices such as FPGAs [53], [375],
(OD) architectures stems from their inherent advantages [376].
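The detection metrics reported across the surveyed studies (accuracy, mAP, precision, and recall) all rest on IoU-based matching of predicted boxes to ground-truth boxes. As a minimal, self-contained sketch (illustrative only, not code from any surveyed work), the following Python snippet scores one image of hypothetical defect detections at an IoU threshold of 0.5:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def precision_recall(predictions, ground_truth, iou_thr=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    predictions: list of (box, confidence) pairs; matched in order of
    descending confidence. Returns (precision, recall) at iou_thr.
    """
    matched = set()
    tp = 0
    for box, _conf in sorted(predictions, key=lambda p: -p[1]):
        best, best_iou = None, iou_thr
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(box, gt) >= best_iou:
                best, best_iou = i, iou(box, gt)
        if best is not None:       # true positive: an unclaimed GT box overlaps enough
            matched.add(best)
            tp += 1
    fp = len(predictions) - tp     # detections with no matching defect
    fn = len(ground_truth) - tp    # defects the detector missed
    precision = tp / (tp + fp) if predictions else 0.0
    recall = tp / (tp + fn) if ground_truth else 0.0
    return precision, recall

# Two ground-truth defects; the detector finds one plus a false alarm.
gts = [(10, 10, 50, 50), (60, 60, 90, 90)]
preds = [((12, 11, 49, 52), 0.9), ((100, 100, 120, 120), 0.6)]
p, r = precision_recall(preds, gts)
print(p, r)  # prints: 0.5 0.5
```

mAP extends this idea: the confidence threshold is swept to trace a precision-recall curve per class, the area under each curve gives that class's AP, and the mean over classes (and, in stricter benchmarks, over several IoU thresholds) yields the single mAP figure quoted in comparison tables.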
methodologies, based on specific use case requirements and constraints.

In conclusion, this comprehensive review has achieved its primary objective by thoroughly exploring and answering all the outlined research questions pertaining to the applications of CNNs for industrial defect detection systems. While CNNs hold immense promise, they have not yet attained maturity. Continued research efforts to enhance their efficiency, robustness, interpretability, and real-time capabilities are imperative. Ultimately, a balanced and objective evaluation of CNNs’ strengths, limitations, and viable alternatives will pave the way for optimal solutions in industrial quality inspection.

REFERENCES
[1] M. Haenlein and A. Kaplan, ‘‘A brief history of artificial intelligence: On the past, present, and future of artificial intelligence,’’ California Manage. Rev., vol. 61, no. 4, pp. 5–14, Aug. 2019.
[2] R. R. Nadikattu, ‘‘The emerging role of artificial intelligence in modern society,’’ Int. J. Creative Res. Thoughts, vol. 1, no. 1, pp. 1–20, 2016.
[3] M. Krichen, ‘‘Convolutional neural networks: A survey,’’ Computers, vol. 12, no. 8, p. 151, Jul. 2023.
[4] V. C. Müller and N. Bostrom, ‘‘Future progress in artificial intelligence: A survey of expert opinion,’’ Fundam. Issues Artif. Intell., vol. 1, pp. 555–572, Aug. 2016.
[5] A. Khan, A. Sohail, U. Zahoora, and A. S. Qureshi, ‘‘A survey of the recent architectures of deep convolutional neural networks,’’ Artif. Intell. Rev., vol. 53, no. 8, pp. 5455–5516, Dec. 2020.
[6] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, ‘‘Deep learning for computer vision: A brief review,’’ Comput. Intell. Neurosci., vol. 2018, pp. 1–13, Aug. 2018.
[7] W. McCulloch and W. Pitts, ‘‘A logical calculus of the ideas immanent in nervous activity,’’ Bull. Math. Biol., vol. 52, nos. 1–2, pp. 99–115, 1990.
[8] E. Akleman, ‘‘Deep learning,’’ Computer, vol. 53, no. 9, pp. 1–17, Sep. 2020.
[9] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[10] Y. Bengio, I. Goodfellow, and A. Courville, Deep Learning, vol. 1. Cambridge, MA, USA: MIT Press, 2017.
[11] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, ‘‘A survey of convolutional neural networks: Analysis, applications, and prospects,’’ IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 12, pp. 6999–7019, Dec. 2022.
[12] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, ‘‘Recent advances in convolutional neural networks,’’ Pattern Recognit., vol. 77, pp. 354–377, May 2018.
[13] S. Albawi, T. A. Mohammed, and S. Al-Zawi, ‘‘Understanding of a convolutional neural network,’’ in Proc. Int. Conf. Eng. Technol. (ICET), Aug. 2017, pp. 1–6.
[14] A. Fesseha, S. Xiong, E. D. Emiru, M. Diallo, and A. Dahou, ‘‘Text classification based on convolutional neural networks and word embedding for low-resource languages: Tigrinya,’’ Information, vol. 12, no. 2, p. 52, Jan. 2021.
[15] A. Alani, ‘‘Arabic handwritten digit recognition based on restricted Boltzmann machine and convolutional neural networks,’’ Information, vol. 8, no. 4, p. 142, Nov. 2017.
[16] W. Wang and J. Gang, ‘‘Application of convolutional neural network in natural language processing,’’ in Proc. Int. Conf. Inf. Syst. Comput. Aided Educ. (ICISCAE), Jul. 2018, pp. 64–70.
[17] P. Li, J. Li, and G. Wang, ‘‘Application of convolutional neural network in natural language processing,’’ in Proc. 15th Int. Comput. Conf. Wavelet Act. Media Technol. Inf. Process., Dec. 2018, pp. 120–122.
[18] M. Giménez, J. Palanca, and V. Botti, ‘‘Semantic-based padding in convolutional neural networks for improving the performance in natural language processing. A case of study in sentiment analysis,’’ Neurocomputing, vol. 378, pp. 315–323, Feb. 2020.
[19] E. Grefenstette, P. Blunsom, N. de Freitas, and K. Moritz Hermann, ‘‘A deep architecture for semantic parsing,’’ 2014, arXiv:1404.7296.
[20] Y. Shen, X. He, J. Gao, L. Deng, and G. Mesnil, ‘‘Learning semantic representations using convolutional neural networks for web search,’’ in Proc. 23rd Int. Conf. World Wide Web, Apr. 2014, pp. 373–374.
[21] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, ‘‘A convolutional neural network for modelling sentences,’’ 2014, arXiv:1404.2188.
[22] Y. Kim, ‘‘Convolutional neural networks for sentence classification,’’ 2014, arXiv:1408.5882.
[23] R. Collobert and J. Weston, ‘‘A unified architecture for natural language processing: Deep neural networks with multitask learning,’’ in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 160–167.
[24] T. Mikolov, K. Chen, G. Corrado, and J. Dean, ‘‘Efficient estimation of word representations in vector space,’’ 2013, arXiv:1301.3781.
[25] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, ‘‘Natural language processing (almost) from scratch,’’ J. Mach. Learn. Res., vol. 12, pp. 2493–2537, Nov. 2011.
[26] T. Guo, J. Dong, H. Li, and Y. Gao, ‘‘Simple convolutional neural network on image classification,’’ in Proc. IEEE 2nd Int. Conf. Big Data Anal. (ICBDA), Mar. 2017, pp. 721–724.
[27] N. Sharma, V. Jain, and A. Mishra, ‘‘An analysis of convolutional neural networks for image classification,’’ Proc. Comput. Sci., vol. 132, pp. 377–384, Jul. 2018.
[28] W. Rawat and Z. Wang, ‘‘Deep convolutional neural networks for image classification: A comprehensive review,’’ Neural Comput., vol. 29, no. 9, pp. 2352–2449, Sep. 2017.
[29] J. Naranjo-Torres, M. Mora, R. Hernández-García, R. J. Barrientos, C. Fredes, and A. Valenzuela, ‘‘A review of convolutional neural network applied to fruit image processing,’’ Appl. Sci., vol. 10, no. 10, p. 3443, May 2020.
[30] J.-T. Huang, J. Li, and Y. Gong, ‘‘An analysis of convolutional neural networks for speech recognition,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2015, pp. 4989–4993.
[31] S. Dua, S. S. Kumar, Y. Albagory, R. Ramalingam, A. Dumka, R. Singh, M. Rashid, A. Gehlot, S. S. Alshamrani, and A. S. AlGhamdi, ‘‘Developing a speech recognition system for recognizing tonal speech signals using a convolutional neural network,’’ Appl. Sci., vol. 12, no. 12, p. 6223, Jun. 2022.
[32] L. Trinh Van, T. Dao Thi Le, T. Le Xuan, and E. Castelli, ‘‘Emotional speech recognition using deep neural networks,’’ Sensors, vol. 22, no. 4, p. 1414, Feb. 2022.
[33] M. Kubanek, J. Bobulski, and J. Kulawik, ‘‘A method of speech coding for speech recognition using a convolutional neural network,’’ Symmetry, vol. 11, no. 9, p. 1185, Sep. 2019.
[34] S. Yin, C. Liu, Z. Zhang, Y. Lin, D. Wang, J. Tejedor, T. F. Zheng, and Y. Li, ‘‘Noisy training for deep neural networks in speech recognition,’’ EURASIP J. Audio, Speech, Music Process., vol. 2015, no. 1, pp. 1–14, Dec. 2015.
[35] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, ‘‘Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,’’ IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.
[36] F. Tu, S. Yin, P. Ouyang, S. Tang, L. Liu, and S. Wei, ‘‘Deep convolutional neural network architecture with reconfigurable computation patterns,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 8, pp. 2220–2233, Aug. 2017.
[37] D. Strigl, K. Kofler, and S. Podlipnig, ‘‘Performance and scalability of GPU-based convolutional neural networks,’’ in Proc. 18th Euromicro Conf. Parallel, Distrib. Netw.-Based Process., Feb. 2010, pp. 317–324.
[38] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew, ‘‘Deep learning for visual understanding: A review,’’ Neurocomputing, vol. 187, pp. 27–48, Apr. 2016.
[39] B. A. Aydin, M. Hussain, R. Hill, and H. Al-Aqrabi, ‘‘Domain modelling for a lightweight convolutional network focused on automated exudate detection in retinal fundus images,’’ in Proc. 9th Int. Conf. Inf. Technol. Trends (ITT), May 2023, pp. 145–150.
[40] A. Zahid, M. Hussain, R. Hill, and H. Al-Aqrabi, ‘‘Lightweight convolutional network for automated photovoltaic defect detection,’’ in Proc. 9th Int. Conf. Inf. Technol. Trends (ITT), May 2023, pp. 133–138.
[41] D. Animashaun and M. Hussain, ‘‘Automated micro-crack detection within photovoltaic manufacturing facility via ground modelling for a regularized convolutional network,’’ Sensors, vol. 23, no. 13, p. 6235, Jul. 2023.
[42] S. Qi, J. Yang, and Z. Zhong, ‘‘A review on industrial surface defect detection based on deep learning technology,’’ in Proc. 3rd Int. Conf. Mach. Learn. Mach. Intell., Sep. 2020, pp. 24–30.
[43] E. Cumbajin, N. Rodrigues, P. Costa, R. Miragaia, L. Frazão, N. Costa, A. Fernández-Caballero, J. Carneiro, L. H. Buruberri, and A. Pereira, ‘‘A systematic review on deep learning with CNNs applied to surface defect detection,’’ J. Imag., vol. 9, no. 10, p. 193, Sep. 2023.
[44] X. Wen, J. Shan, Y. He, and K. Song, ‘‘Steel surface defect recognition: A survey,’’ Coatings, vol. 13, no. 1, p. 17, Dec. 2022.
[45] L. Kou, ‘‘A review of research on detection and evaluation of the rail surface defects,’’ Acta Polytechnica Hungarica, vol. 19, no. 3, pp. 167–186, 2022.
[46] B. Li, C. Delpha, D. Diallo, and A. Migan-Dubois, ‘‘Application of artificial neural networks to photovoltaic fault detection and diagnosis: A review,’’ Renew. Sustain. Energy Rev., vol. 138, Mar. 2021, Art. no. 110512.
[47] G. M. El-Banby, N. M. Moawad, B. A. Abouzalm, W. F. Abouzaid, and E. A. Ramadan, ‘‘Photovoltaic system fault detection techniques: A review,’’ Neural Comput. Appl., vol. 35, no. 35, pp. 24829–24842, Dec. 2023.
[48] U. Hijjawi, S. Lakshminarayana, T. Xu, G. P. M. Fierro, and M. Rahman, ‘‘A review of automated solar photovoltaic defect detection systems: Approaches, challenges, and future orientations,’’ Sol. Energy, vol. 266, Dec. 2023, Art. no. 112186.
[49] A. Rasheed, B. Zafar, A. Rasheed, N. Ali, M. Sajid, S. H. Dar, U. Habib, T. Shehryar, and M. T. Mahmood, ‘‘Fabric defect detection using computer vision techniques: A comprehensive review,’’ Math. Problems Eng., vol. 2020, pp. 1–24, Nov. 2020.
[50] C. Li, J. Li, Y. Li, L. He, X. Fu, and J. Chen, ‘‘Fabric defect detection in textile manufacturing: A survey of the state of the art,’’ Secur. Commun. Netw., vol. 2021, pp. 1–13, May 2021.
[51] Y. Kahraman and A. Durmuşoğlu, ‘‘Deep learning-based fabric defect detection: A review,’’ Textile Res. J., vol. 93, nos. 5–6, pp. 1485–1503, Mar. 2023.
[52] W. Ming, C. Cao, G. Zhang, H. Zhang, F. Zhang, Z. Jiang, and J. Yuan, ‘‘Review: Application of convolutional neural network in defect detection of 3C products,’’ IEEE Access, vol. 9, pp. 135657–135674, 2021.
[53] D. Ghimire, D. Kil, and S.-H. Kim, ‘‘A survey on efficient convolutional neural networks and hardware acceleration,’’ Electronics, vol. 11, no. 6, p. 945, Mar. 2022.
[54] M. Capra, B. Bussolino, A. Marchisio, M. Shafique, G. Masera, and M. Martina, ‘‘An updated survey of efficient hardware architectures for accelerating deep convolutional neural networks,’’ Future Internet, vol. 12, no. 7, p. 113, Jul. 2020.
[55] L. Shen, Z. Lin, and Q. Huang, ‘‘Relay backpropagation for effective learning of deep convolutional neural networks,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2016, pp. 467–482.
[56] E. M. Dogo, O. J. Afolabi, N. I. Nwulu, B. Twala, and C. O. Aigbavboa, ‘‘A comparative analysis of gradient descent-based optimization algorithms on convolutional neural networks,’’ in Proc. Int. Conf. Comput. Techn., Electron. Mech. Syst., Dec. 2018, pp. 92–99.
[57] Y. Ren and X. Cheng, ‘‘Review of convolutional neural network optimization and training in image processing,’’ in Proc. 10th Int. Symp. Precis. Eng. Meas. Instrum., Mar. 2019, pp. 788–797.
[58] G. Habib and S. Qureshi, ‘‘Optimization and acceleration of convolutional neural networks: A survey,’’ J. King Saud Univ. Comput. Inf. Sci., vol. 34, no. 7, pp. 4244–4268, Jul. 2022.
[59] J. A. Pandian, K. Kanchanadevi, V. D. Kumar, E. Jasinska, R. Gono, Z. Leonowicz, and M. Jasinski, ‘‘A five convolutional layer deep convolutional neural network for plant leaf disease detection,’’ Electronics, vol. 11, no. 8, p. 1266, Apr. 2022.
[60] T.-C. Lu, ‘‘CNN convolutional layer optimisation based on quantum evolutionary algorithm,’’ Connection Sci., vol. 33, no. 3, pp. 482–494, Jul. 2021.
[61] J. Gunther, P. M. Pilarski, G. Helfrich, H. Shen, and K. Diepold, ‘‘First steps towards an intelligent laser welding architecture using deep neural networks and reinforcement learning,’’ Proc. Technol., vol. 15, pp. 474–483, Jul. 2014.
[62] N.-I. Galanis, P. Vafiadis, K.-G. Mirzaev, and G. A. Papakostas, ‘‘Convolutional neural networks: A roundup and benchmark of their pooling layer variants,’’ Algorithms, vol. 15, no. 11, p. 391, Oct. 2022.
[63] H. J. Jie and P. Wanda, ‘‘RunPool: A dynamic pooling layer for convolution neural network,’’ Int. J. Comput. Intell. Syst., vol. 13, no. 1, p. 66, 2020.
[64] F. Yan, L. Liu, X. Ding, Q. Zhang, and Y. Liu, ‘‘Monocular catadioptric panoramic depth estimation via improved end-to-end neural network model,’’ Frontiers Neurorobotics, vol. 17, Sep. 2023, Art. no. 1278986.
[65] A. T. Kabakus, ‘‘DroidMalwareDetector: A novel Android malware detection framework based on convolutional neural network,’’ Expert Syst. Appl., vol. 206, Nov. 2022, Art. no. 117833.
[66] W. Ouyang, B. Xu, J. Hou, and X. Yuan, ‘‘Fabric defect detection using activation layer embedded convolutional neural network,’’ IEEE Access, vol. 7, pp. 70130–70140, 2019.
[67] I. D. Khan, O. Farooq, and Y. U. Khan, ‘‘Automatic seizure detection using modified CNN architecture and activation layer,’’ J. Phys., Conf. Ser., vol. 2318, no. 1, Aug. 2022, Art. no. 012013.
[68] J.-Y. Gan, Y.-K. Zhai, Y. Huang, J.-Y. Zeng, and K.-Y. Jiang, ‘‘Research of facial beauty prediction based on deep convolutional features using double activation layer,’’ Acta Electonica Sinica, vol. 47, no. 3, p. 636, 2019.
[69] H. Nakahara, T. Fujii, and S. Sato, ‘‘A fully connected layer elimination for a binarized convolutional neural network on an FPGA,’’ in Proc. 27th Int. Conf. Field Program. Log. Appl. (FPL), Sep. 2017, pp. 1–4.
[70] C. Yang, Z. Yang, J. Hou, and Y. Su, ‘‘A lightweight full homomorphic encryption scheme on fully-connected layer for CNN hardware accelerator achieving security inference,’’ in Proc. 28th IEEE Int. Conf. Electron., Circuits, Syst. (ICECS), Nov. 2021, pp. 1–4.
[71] D. Ramachandran, R. S. Kumar, A. Alkhayyat, R. Q. Malik, P. Srinivasan, G. G. Priya, and A. Gosu Adigo, ‘‘Classification of electrocardiography hybrid convolutional neural network-long short term memory with fully connected layer,’’ Comput. Intell. Neurosci., vol. 2022, pp. 1–10, Jul. 2022.
[72] T. Zheng, Q. Wang, Y. Shen, and X. Lin, ‘‘Gradient rectified parameter unit of the fully connected layer in convolutional neural networks,’’ Knowl.-Based Syst., vol. 248, Jul. 2022, Art. no. 108797.
[73] T. Yoshioka, N. Ito, M. Delcroix, A. Ogawa, K. Kinoshita, M. Fujimoto, C. Yu, W. J. Fabian, M. Espi, T. Higuchi, S. Araki, and T. Nakatani, ‘‘The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices,’’ in Proc. IEEE Workshop Autom. Speech Recognit. Understand., Jun. 2015, pp. 436–443.
[74] Z. Liao and G. Carneiro, ‘‘On the importance of normalisation layers in deep learning with piecewise linear activation units,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2016, pp. 1–8.
[75] S.-H. Wang, J. Hong, and M. Yang, ‘‘Sensorineural hearing loss identification via nine-layer convolutional neural network with batch normalization and dropout,’’ Multimedia Tools Appl., vol. 79, nos. 21–22, pp. 15135–15150, Jun. 2020.
[76] T. Sledevic, ‘‘Adaptation of convolution and batch normalization layer for CNN implementation on FPGA,’’ in Proc. Open Conf. Electr., Electron. Inf. Sci., Apr. 2019, pp. 1–4.
[77] S. Ioffe and C. Szegedy, ‘‘Batch normalization: Accelerating deep network training by reducing internal covariate shift,’’ in Proc. Int. Conf. Mach. Learn., 2015, pp. 448–456.
[78] C. Garbin, X. Zhu, and O. Marques, ‘‘Dropout vs. batch normalization: An empirical study of their impact to deep learning,’’ Multimedia Tools Appl., vol. 79, nos. 19–20, pp. 12777–12815, May 2020.
[79] G. Li, X. Jian, Z. Wen, and J. AlSultan, ‘‘Algorithm of overfitting avoidance in CNN based on maximum pooled and weight decay,’’ Appl. Math. Nonlinear Sci., vol. 7, no. 2, pp. 965–974, Jul. 2022.
[80] P. Dileep, D. Das, and P. K. Bora, ‘‘Dense layer dropout based CNN architecture for automatic modulation classification,’’ in Proc. Nat. Conf. Commun. (NCC), Feb. 2020, pp. 1–5.
[81] W. Setiawan, ‘‘Character recognition using adjustment convolutional network with dropout layer,’’ IOP Conf. Ser., Mater. Sci. Eng., vol. 1125, May 2021, Art. no. 012049.
[82] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, ‘‘Improving neural networks by preventing co-adaptation of feature detectors,’’ 2012, arXiv:1207.0580.
[83] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, ‘‘Dropout: A simple way to prevent neural networks from overfitting,’’ J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[84] P. Jiang, Y. Xue, and F. Neri, ‘‘Convolutional neural network pruning based on multi-objective feature map selection for image classification,’’ Appl. Soft Comput., vol. 139, May 2023, Art. no. 110229.
[85] J. Kim and J. Cho, ‘‘Low-cost embedded system using convolutional neural networks-based spatiotemporal feature map for real-time human action recognition,’’ Appl. Sci., vol. 11, no. 11, p. 4940, May 2021.
[86] D. U. Jeong and K. M. Lim, ‘‘Convolutional neural network for classification of eight types of arrhythmia using 2D time–frequency feature map from standard 12-lead electrocardiogram,’’ Sci. Rep., vol. 11, no. 1, Aug. 2021, Art. no. 20396.
[87] W. Lu, H. Sun, J. Chu, X. Huang, and J. Yu, ‘‘A novel approach for video text detection and recognition based on a corner response feature map and transferred deep convolutional neural network,’’ IEEE Access, vol. 6, pp. 40198–40211, 2018.
[88] F. Wang, C. Yang, S. Huang, and H. Wang, ‘‘Automatic modulation classification based on joint feature map and convolutional neural network,’’ IET Radar, Sonar Navigat., vol. 13, no. 6, pp. 998–1003, Jun. 2019.
[89] J. Zou, T. Rui, Y. Zhou, C. Yang, and S. Zhang, ‘‘Convolutional neural network simplification via feature map pruning,’’ Comput. Electr. Eng., vol. 70, pp. 950–958, Aug. 2018.
[90] C. K. Dewa, ‘‘Suitable CNN weight initialization and activation function for Javanese vowels classification,’’ Proc. Comput. Sci., vol. 144, pp. 124–132, Jun. 2018.
[91] W. Hao, W. Yizhou, L. Yaqin, and S. Zhili, ‘‘The role of activation function in CNN,’’ in Proc. 2nd Int. Conf. Inf. Technol. Comput. Appl. (ITCA), Dec. 2020, pp. 429–432.
[92] A. Mondal and V. K. Shrivastava, ‘‘A novel parametric Flatten-p mish activation function based deep CNN model for brain tumor classification,’’ Comput. Biol. Med., vol. 150, Nov. 2022, Art. no. 106183.
[93] B. Khagi and G.-R. Kwon, ‘‘A novel scaled-gamma-tanh (SGT) activation function in 3D CNN applied for MRI classification,’’ Sci. Rep., vol. 12, no. 1, p. 14978, Sep. 2022.
[94] T. Jannat, Md. A. Hossain, and A. Sayeed, ‘‘An effective approach for hyperspectral image classification based on 3D CNN with mish activation function,’’ in Proc. 25th Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2022, pp. 1074–1079.
[95] Y. Jiang, J. Xie, and D. Zhang, ‘‘An adaptive offset activation function for CNN image classification tasks,’’ Electronics, vol. 11, no. 22, p. 3799, Nov. 2022.
[96] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, ‘‘Efficient BackProp,’’ in Neural Networks: Tricks of the Trade. Cham, Switzerland: Springer, 1998, pp. 9–50.
[97] T. Wang, D. J. Wu, A. Coates, and A. Y. Ng, ‘‘End-to-end text recognition with convolutional neural networks,’’ in Proc. 21st Int. Conf. Pattern Recognit. (ICPR), Nov. 2012, pp. 3304–3308.
[98] B. Xu, N. Wang, T. Chen, and M. Li, ‘‘Empirical evaluation of rectified activations in convolutional network,’’ 2015, arXiv:1505.00853.
[99] P. Ramachandran, B. Zoph, and Q. V. Le, ‘‘Searching for activation functions,’’ 2017, arXiv:1710.05941.
[100] S. Hochreiter, ‘‘The vanishing gradient problem during learning recurrent neural nets and problem solutions,’’ Int. J. Uncertainty, Fuzziness Knowl.-Based Syst., vol. 6, no. 2, pp. 107–116, Apr. 1998.
[101] C. Nwankpa, W. Ijomah, A. Gachagan, and S. Marshall, ‘‘Activation functions: Comparison of trends in practice and research for deep learning,’’ 2018, arXiv:1811.03378.
[102] D. Dang, S. V. R. Chittamuru, S. Pasricha, R. Mahapatra, and D. Sahoo, ‘‘BPLight-CNN: A photonics-based backpropagation accelerator for deep learning,’’ ACM J. Emerg. Technol. Comput. Syst., vol. 17, no. 4, pp. 1–26, Oct. 2021.
[103] A. Mazouz and C. P. Bridges, ‘‘Automated CNN back-propagation pipeline generation for FPGA online training,’’ J. Real-Time Image Process., vol. 18, no. 6, pp. 2583–2599, Dec. 2021.
[104] V. Raj, R. Kumar, and N. S. Kumar, ‘‘An scrupulous framework to forecast the weather using CNN with back propagation method,’’ in Proc. 4th Int. Conf. Adv. Comput., Commun. Control Netw., Dec. 2022, pp. 177–181.
[105] Y. Jaramillo-Munera, L. M. Sepulveda-Cano, A. E. Castro-Ospina, L. Duque-Muñoz, and J. D. Martinez-Vargas, ‘‘Classification of epileptic seizures based on CNN and guided back-propagation for interpretation analysis,’’ in Proc. Int. Conf. Smart Technol., Syst. Appl., 2022, pp. 212–226.
[106] A. Wiranata, S. A. Wibowo, R. Patmasari, R. Rahmania, and R. Mayasari, ‘‘Investigation of padding schemes for faster R-CNN on vehicle detection,’’ in Proc. Int. Conf. Control, Electron., Renew. Energy Commun., Dec. 2018, pp. 208–212.
[107] C. Yang, Y. Wang, X. Wang, and L. Geng, ‘‘A stride-based convolution decomposition method to stretch CNN acceleration algorithms for efficient and flexible hardware implementation,’’ IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 67, no. 9, pp. 3007–3020, Sep. 2020.
[108] H. Naseri and V. Mehrdad, ‘‘Novel CNN with investigation on accuracy by modifying stride, padding, kernel size and filter numbers,’’ Multimedia Tools Appl., vol. 82, no. 15, pp. 23673–23691, Jun. 2023.
[109] S. Tummalapalli, L. Kumar, and N. L. B. Murthy, ‘‘Web service anti-patterns detection using CNN with varying sequence padding size,’’ in Proc. 12th Ind. Symp. Conjunct, 2022, pp. 153–165.
[110] C. Guo, Y.-L. Liu, and X. Jiao, ‘‘Study on the influence of variable stride scale change on image recognition in CNN,’’ Multimedia Tools Appl., vol. 78, no. 21, pp. 30027–30037, Nov. 2019.
[111] J.-H. Luo, H. Zhang, H.-Y. Zhou, C.-W. Xie, J. Wu, and W. Lin, ‘‘ThiNet: Pruning CNN filters for a thinner net,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 10, pp. 2525–2538, Oct. 2019.
[112] A. Barroso-Laguna and K. Mikolajczyk, ‘‘Key.Net: Keypoint detection by handcrafted and learned CNN filters revisited,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, pp. 698–711, Jan. 2023.
[113] W. S. Ahmed and A. A. A. Karim, ‘‘The impact of filter size and number of filters on classification accuracy in CNN,’’ in Proc. Int. Conf. Comput. Sci. Softw. Eng. (CSASE), Apr. 2020, pp. 88–93.
[114] F. Rosenblatt, ‘‘The perceptron: A probabilistic model for information storage and organization in the brain,’’ Psychol. Rev., vol. 65, no. 6, pp. 386–408, 1958.
[115] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, DC, USA: Spartan Books, 1962.
[116] B. Widrow and M. E. Hoff, ‘‘Adaptive switching circuits,’’ in IRE WESCON Convention Record, vol. 4. New York, NY, USA, 1960, pp. 96–104.
[117] P. J. Werbos, The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. Hoboken, NJ, USA: Wiley, 1994.
[118] D. H. Hubel and T. N. Wiesel, ‘‘Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex,’’ J. Physiol., vol. 160, no. 1, pp. 106–154, Jan. 1962.
[119] K. Fukushima, ‘‘Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,’’ Biol. Cybern., vol. 36, no. 4, pp. 193–202, Apr. 1980.
[120] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ‘‘Gradient-based learning applied to document recognition,’’ Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[121] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel, ‘‘Handwritten digit recognition with a back-propagation network,’’ in Proc. Adv. Neural Inf. Process. Syst., 1989, pp. 1–18.
[122] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, ‘‘Greedy layer-wise training of deep networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2006, pp. 1–16.
[123] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1–14.
[124] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ‘‘ImageNet classification with deep convolutional neural networks,’’ Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017.
[125] V. Nair and G. E. Hinton, ‘‘Rectified linear units improve restricted Boltzmann machines,’’ in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 807–814.
[126] M. D. Zeiler and R. Fergus, ‘‘Visualizing and understanding convolutional networks,’’ in Proc. Eur. Conf. Comput. Vis., 2014, pp. 818–833.
[127] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, ‘‘Going deeper with convolutions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[128] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, ‘‘Mastering the game of Go with deep neural networks and tree search,’’ Nature, vol. 529, no. 7587, pp. 484–489, Jan. 2016.
[129] Q. Zhang, L. T. Yang, Z. Chen, and P. Li, ‘‘A survey on deep learning for big data,’’ Inf. Fusion, vol. 42, pp. 146–157, Jul. 2018.
[130] K. Simonyan and A. Zisserman, ‘‘Very deep convolutional networks for large-scale image recognition,’’ 2014, arXiv:1409.1556.
[131] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ‘‘ImageNet: A large-scale hierarchical image database,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[132] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Deep residual learning for image recognition,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[133] D. Bhatt, C. Patel, H. Talsania, J. Patel, R. Vaghela, S. Pandya, K. Modi, and H. Ghayvat, ‘‘CNN variants for computer vision: History, architecture, application, challenges and future scope,’’ Electronics, vol. 10, no. 20, p. 2470, Oct. 2021.
[134] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, ‘‘Densely connected convolutional networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 2261–2269.
[135] C. Zhang, P. Benz, D. M. Argaw, S. Lee, J. Kim, F. Rameau, J.-C. Bazin, and I. S. Kweon, ‘‘ResNet or DenseNet? Introducing dense shortcuts to ResNet,’’ in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Jan. 2021, pp. 3549–3558.
[136] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, ‘‘Inception-v4, Inception-ResNet and the impact of residual connections on learning,’’ in Proc. AAAI Conf. Artif. Intell., vol. 31, 2017, pp. 1–50.
[137] S. Zagoruyko and N. Komodakis, ‘‘Wide residual networks,’’ 2016, arXiv:1605.07146.
[138] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, ‘‘Aggregated residual transformations for deep neural networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 5987–5995.
[139] X. Zhang, Z. Li, C. C. Loy, and D. Lin, ‘‘PolyNet: A pursuit of structural diversity in very deep networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3900–3908.
[140] D. Han, J. Kim, and J. Kim, ‘‘Deep pyramidal residual networks,’’ in
[152] T. Liu, L. Zhang, Y. Wang, J. Guan, Y. Fu, J. Zhao, and S. Zhou, ‘‘Recent few-shot object detection algorithms: A survey with performance comparison,’’ ACM Trans. Intell. Syst. Technol., vol. 14, no. 4, pp. 1–36, Aug. 2023.
[153] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, ‘‘The Pascal visual object classes (VOC) challenge,’’ Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[154] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, ‘‘ImageNet large scale visual recognition challenge,’’ Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[155] L. Nanni, S. Ghidoni, and S. Brahnam, ‘‘Handcrafted vs. non-handcrafted features for computer vision classification,’’ Pattern Recognit., vol. 71, pp. 158–172, Nov. 2017.
[156] S. Ren, K. He, R. Girshick, and J. Sun, ‘‘Faster R-CNN: Towards real-time object detection with region proposal networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 1–26.
[157] A. Karpathy and L. Fei-Fei, ‘‘Deep visual-semantic alignments for generating image descriptions,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 3128–3137.
[158] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, ‘‘Show, attend and tell: Neural image caption generation with visual attention,’’ in Proc. 32nd Int. Conf. Mach. Learn. (ICML), vol. 37, 2015, pp. 2048–2057.
[159] Q. Wu, C. Shen, P. Wang, A. Dick, and A. v. d. Hengel, ‘‘Image captioning and visual question answering based on attributes and external knowledge,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 6, pp. 1367–1381, Jun. 2018.
[160] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, ‘‘Simultaneous detection and segmentation,’’ in Proc. Eur. Conf. Comput. Vis., 2014, pp. 297–312.
[161] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik, ‘‘Hypercolumns for object segmentation and fine-grained localization,’’ in Proc. IEEE Conf.
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 447–456.
pp. 6307–6315. [162] J. Dai, K. He, and J. Sun, ‘‘Instance-aware semantic segmentation via
[141] F. Wang, M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, multi-task network cascades,’’ in Proc. IEEE Conf. Comput. Vis. Pattern
and X. Tang, ‘‘Residual attention network for image classification,’’ in Recognit. (CVPR), Jun. 2016, pp. 3150–3158.
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, [163] J. Fourie, S. Mills, and R. Green, ‘‘Harmony filter: A robust visual
pp. 6450–6458. tracking system using the improved harmony search algorithm,’’ Image
[142] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, ‘‘CBAM: Convolutional block Vis. Comput., vol. 28, no. 12, pp. 1702–1716, Dec. 2010.
attention module,’’ in Proc. Eur. Conf. Comput. Vis., Sep. 2018, pp. 3–19. [164] E. Cuevas, N. Ortega-Sánchez, D. Zaldivar, and M. Pérez-Cisneros,
[143] A. Khan, A. Sohail, and A. Ali, ‘‘A new channel boosted convolutional ‘‘Circle detection by harmony search optimization,’’ J. Intell. Robotic
neural network using transfer learning,’’ 2018, arXiv:1804.08528. Syst., vol. 66, no. 3, pp. 359–376, May 2012.
[144] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, [165] K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang,
M. Andreetto, and H. Adam, ‘‘MobileNets: Efficient convolutional neural Z. Wang, R. Wang, X. Wang, and W. Ouyang, ‘‘T-CNN: Tubelets with
networks for mobile vision applications,’’ 2017, arXiv:1704.04861. convolutional neural networks for object detection from videos,’’ IEEE
[145] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, Trans. Circuits Syst. Video Technol., vol. 28, no. 10, pp. 2896–2907,
‘‘MobileNetV2: Inverted residuals and linear bottlenecks,’’ in Oct. 2018.
Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, [166] P. Viola and M. Jones, ‘‘Rapid object detection using a boosted cascade of
pp. 4510–4520. simple features,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
[146] M. Tan and Q. Le, ‘‘EfficientNet: Rethinking model scaling for Recognit. CVPR, Jul. 2001, pp. 1–68.
convolutional neural networks,’’ in Proc. Int. Conf. Mach. Learn., 2019, [167] P. Viola and M. J. Jones, ‘‘Robust real-time face detection,’’ Int.
pp. 6105–6114. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, May 2004.
[147] K. Han, Y. Wang, H. Chen, X. Chen, J. Guo, Z. Liu, Y. Tang, A. Xiao, [168] N. Dalal and B. Triggs, ‘‘Histograms of oriented gradients for human
C. Xu, Y. Xu, Z. Yang, Y. Zhang, and D. Tao, ‘‘A survey on vision detection,’’ in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern
transformer,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, Recognit., Jul. 2005, pp. 886–893.
pp. 87–110, Jan. 2023. [169] D. G. Lowe, ‘‘Object recognition from local scale-invariant features,’’ in
[148] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and Proc. 7th IEEE Int. Conf. Comput. Vis., Jan. 1999, pp. 1150–1157.
B. Guo, ‘‘Swin transformer: Hierarchical vision transformer using shifted [170] D. G. Lowe, ‘‘Distinctive image features from scale-invariant keypoints,’’
windows,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, Nov. 2004.
pp. 9992–10002. [171] S. Belongie, J. Malik, and J. Puzicha, ‘‘Shape matching and object
[149] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and recognition using shape contexts,’’ IEEE Trans. Pattern Anal. Mach.
S. Xie, ‘‘ConvNeXt v2: Co-designing and scaling ConvNets with masked Intell., vol. 24, no. 4, pp. 509–522, Apr. 2002.
autoencoders,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. [172] P. Felzenszwalb, D. McAllester, and D. Ramanan, ‘‘A discriminatively
(CVPR), Jun. 2023, pp. 16133–16142. trained, multiscale, deformable part model,’’ in Proc. IEEE Conf. Comput.
[150] M. Hussain, H. Al-Aqrabi, M. Munawar, and R. Hill, ‘‘Feature Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
mapping for Rice leaf defect detection based on a custom convolutional [173] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester, ‘‘Cascade object
architecture,’’ Foods, vol. 11, no. 23, p. 3914, Dec. 2022. detection with deformable part models,’’ in Proc. IEEE Comput. Soc.
[151] M. Hussain and H. Al-Aqrabi, ‘‘Child emotion recognition via custom Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 2241–2248.
lightweight CNN architecture,’’ in Kids Cybersecurity Using Compu- [174] T. Malisiewicz, A. Gupta, and A. A. Efros, ‘‘Ensemble of exemplar-
tational Intelligence Techniques. Cham, Switzerland: Springer, 2023, SVMs for object detection and beyond,’’ in Proc. Int. Conf. Comput. Vis.,
pp. 165–174. Nov. 2011, pp. 89–96.
[175] Y.-F. Li, J. T. Kwok, I. W. Tsang, and Z.-H. Zhou, ‘‘A convex method for locating regions of interest with multi-instance learning,’’ in Proc. Mach. Learn. Knowl. Discovery Databases, Eur. Conf., 2009, pp. 15–30.
[176] D. Forsyth, ‘‘Object detection with discriminatively trained part-based models,’’ Computer, vol. 47, no. 2, pp. 6–7, Feb. 2014.
[177] R. Girshick, J. Donahue, T. Darrell, and J. Malik, ‘‘Rich feature hierarchies for accurate object detection and semantic segmentation,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[178] R. Girshick, P. Felzenszwalb, and D. McAllester, ‘‘Object detection with grammar models,’’ in Proc. Adv. Neural Inf. Process. Syst., 2011, pp. 1–47.
[179] R. B. Girshick, From Rigid Templates to Grammars: Object Detection With Structured Models. Chicago, IL, USA: University of Chicago, 2012.
[180] R. Girshick, J. Donahue, T. Darrell, and J. Malik, ‘‘Region-based convolutional networks for accurate object detection and segmentation,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 142–158, Jan. 2016.
[181] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, ‘‘Selective search for object recognition,’’ Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, Sep. 2013.
[182] R. B. Girshick, P. F. Felzenszwalb, and D. McAllester. (2012). Discriminatively Trained Deformable Part Models. Accessed: Jan. 25, 2024.
[183] K. He, X. Zhang, S. Ren, and J. Sun, ‘‘Spatial pyramid pooling in deep convolutional networks for visual recognition,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2015.
[184] R. Girshick, ‘‘Fast R-CNN,’’ 2015, arXiv:1504.08083.
[185] J. Dai, Y. Li, K. He, and J. Sun, ‘‘R-FCN: Object detection via region-based fully convolutional networks,’’ in Proc. Adv. Neural Inf. Process. Syst., 2016, pp. 1–48.
[186] Z. Li, C. Peng, G. Yu, X. Zhang, Y. Deng, and J. Sun, ‘‘Light-head R-CNN: In defense of two-stage object detector,’’ 2017, arXiv:1711.07264.
[187] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, ‘‘Feature pyramid networks for object detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 936–944.
[188] K. He, G. Gkioxari, P. Dollár, and R. Girshick, ‘‘Mask R-CNN,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2980–2988.
[189] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, ‘‘You only look once: Unified, real-time object detection,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[190] J. Redmon. (2013). Darknet: Open Source Neural Networks in C. Accessed: Jan. 16, 2024. [Online]. Available: https://pjreddie.com/darknet
[191] J. Redmon and A. Farhadi, ‘‘YOLO9000: Better, faster, stronger,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 6517–6525.
[192] J. Redmon and A. Farhadi, ‘‘YOLOv3: An incremental improvement,’’ 2018, arXiv:1804.02767.
[193] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, ‘‘Microsoft COCO: Common objects in context,’’ in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[194] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, ‘‘YOLOv4: Optimal speed and accuracy of object detection,’’ 2020, arXiv:2004.10934.
[195] Z. Yao, Y. Cao, S. Zheng, G. Huang, and S. Lin, ‘‘Cross-iteration batch normalization,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 12326–12335.
[196] D. Misra, ‘‘Mish: A self regularized non-monotonic activation function,’’ 2019, arXiv:1908.08681.
[197] G. Jocher et al., ‘‘Ultralytics/YOLOv5: V3.0,’’ Aug. 2020.
[198] A. S. G. Jocher. (2021). Ultralytics/YOLOv5: V4.0—Activations, Weights & Biases Logging, PyTorch Hub Integration. [Online]. Available: https://zenodo.org/records/4418161
[199] M. Hussain, ‘‘YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection,’’ Machines, vol. 11, no. 7, p. 677, Jun. 2023.
[200] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei, and X. Wei, ‘‘YOLOv6: A single-stage object detection framework for industrial applications,’’ 2022, arXiv:2209.02976.
[201] S. Rath. (2022). YOLOv6 Object Detection Tutorial. Accessed: Jan. 16, 2024. [Online]. Available: https://learnopencv.com/yolov6-object-detection/
[202] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, ‘‘YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 7464–7475.
[203] (2023). Ultralytics GitHub Repository. Accessed: Jan. 16, 2024. [Online]. Available: https://github.com/ultralytics/ultralytics
[204] F. J. Solawetz. (2023). What is YOLOv8? The Ultimate Guide. Accessed: Jan. 16, 2024. [Online]. Available: https://blog.roboflow.com/whats-new-in-yolov8/
[205] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, ‘‘SSD: Single shot multibox detector,’’ in Proc. 14th Eur. Conf. Comput. Vis. (ECCV), Oct. 2016, pp. 21–37.
[206] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, ‘‘Focal loss for dense object detection,’’ in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[207] B. Wu, A. Wan, F. Iandola, P. H. Jin, and K. Keutzer, ‘‘SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 446–454.
[208] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, ‘‘SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size,’’ 2016, arXiv:1602.07360.
[209] G.-S. Xia, X. Bai, J. Ding, Z. Zhu, S. Belongie, J. Luo, M. Datcu, M. Pelillo, and L. Zhang, ‘‘DOTA: A large-scale dataset for object detection in aerial images,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 3974–3983.
[210] H. Law and J. Deng, ‘‘CornerNet: Detecting objects as paired keypoints,’’ in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 734–750.
[211] X. Zhou, D. Wang, and P. Krähenbühl, ‘‘Objects as points,’’ 2019, arXiv:1904.07850.
[212] X. Zeng, W. Ouyang, J. Yan, H. Li, T. Xiao, K. Wang, Y. Liu, Y. Zhou, B. Yang, Z. Wang, H. Zhou, and X. Wang, ‘‘Crafting GBD-Net for object detection,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 9, pp. 2109–2123, Sep. 2018.
[213] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, ‘‘End-to-end object detection with transformers,’’ in Proc. Eur. Conf. Comput. Vis. Cham, Switzerland: Springer, 2020, pp. 213–229.
[214] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, ‘‘Deformable DETR: Deformable transformers for end-to-end object detection,’’ 2020, arXiv:2010.04159.
[215] (2024). Theano. [Online]. Available: http://deeplearning.net/software/theano/
[216] (2024). Berkeley Vision and Learning Center. [Online]. Available: http://caffe.berkeleyvision.org/
[217] (2024). Deeplearning4j. [Online]. Available: https://deeplearning4j.org
[218] (2024). TensorFlow. [Online]. Available: https://www.tensorflow.org/
[219] (2024). Keras. [Online]. Available: https://keras.io/
[220] (2024). Chainer. [Online]. Available: https://chainer.org
[221] (2024). Apache SINGA. [Online]. Available: http://singa.apache.org
[222] (2024). MXNet. [Online]. Available: http://mxnet.io/
[223] (2024). Microsoft Cognitive Toolkit (CNTK). [Online]. Available: https://www.microsoft.com/en-us/research/product/cognitive-toolkit/
[224] (2024). PyTorch. [Online]. Available: https://pytorch.org
[225] (2024). Neon. [Online]. Available: http://neon.nervanasys.com/docs/latest
[226] (2024). BigDL. [Online]. Available: https://github.com/intel-analytics/BigDL
[227] (2024). Clarifai. [Online]. Available: https://clarifai.com/
[228] (2024). CloudSight. [Online]. Available: https://cloudsight.readme.io/v1.0/docs
[229] (2024). Microsoft Cognitive Services. [Online]. Available: https://www.microsoft.com/cognitive-services/en-us/computer-vision-api
[230] (2024). Google Cloud Vision API. [Online]. Available: https://cloud.google.com/vision/
[231] (2024). IBM Watson Visual Recognition Service. [Online]. Available: http://www.ibm.com/watson/developercloud/visual-recognition.html
[232] (2024). Amazon Rekognition. [Online]. Available: https://aws.amazon.com/rekognition/
[233] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, ‘‘The Pascal visual object classes challenge: A retrospective,’’ Int. J. Comput. Vis., vol. 111, no. 1, pp. 98–136, Jan. 2015.
[234] A. Kuznetsova, H. Rom, N. Alldrin, J. Uijlings, I. Krasin, J. Pont-Tuset, S. Kamali, S. Popov, M. Malloci, A. Kolesnikov, T. Duerig, and V. Ferrari, ‘‘The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale,’’ Int. J. Comput. Vis., vol. 128, no. 7, pp. 1956–1981, Jul. 2020.
[235] R. Benenson, S. Popov, and V. Ferrari, ‘‘Large-scale interactive object segmentation with human annotators,’’ in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11692–11701.
[236] S. Shao, Z. Li, T. Zhang, C. Peng, G. Yu, X. Zhang, J. Li, and J. Sun, ‘‘Objects365: A large-scale, high-quality dataset for object detection,’’ in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2019, pp. 8429–8438.
[237] I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, and A. Veit, ‘‘OpenImages: A public dataset for large-scale multi-label and multi-class image classification,’’ Dataset, vol. 2, no. 3, p. 18, 2017.
[238] F. Ciaglia, F. Saverio Zuppichini, P. Guerrie, M. McQuade, and J. Solawetz, ‘‘Roboflow 100: A rich, multi-domain object detection benchmark,’’ 2022, arXiv:2211.13523.
[239] P. Dollár, C. Wojek, B. Schiele, and P. Perona, ‘‘Pedestrian detection: A benchmark,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 304–311.
[240] P. Dollár, C. Wojek, B. Schiele, and P. Perona, ‘‘Pedestrian detection: An evaluation of the state of the art,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 4, pp. 743–761, Apr. 2012.
[241] K. Neshatpour, M. Malik, M. A. Ghodrat, A. Sasan, and H. Homayoun, ‘‘Energy-efficient acceleration of big data analytics applications using FPGAs,’’ in Proc. IEEE Int. Conf. Big Data, Oct. 2015, pp. 115–123.
[242] V. Kontorinis, L. E. Zhang, B. Aksanli, J. Sampson, H. Homayoun, E. Pettis, D. M. Tullsen, and T. S. Rosing, ‘‘Managing distributed UPS energy for effective power capping in data centers,’’ ACM SIGARCH Comput. Archit. News, vol. 40, no. 3, pp. 488–499, Sep. 2012.
[243] N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki, ‘‘Toward dark silicon in servers,’’ IEEE Micro, vol. 31, no. 4, pp. 6–15, Jul. 2011.
[244] C. Yan and T. Yue, ‘‘A novel method for dynamic modelling and real-time rendering based on GPU,’’ Geo-Inf. Sci., vol. 14, no. 2, pp. 149–157, Aug. 2012.
[245] A. R. Brodtkorb, T. R. Hagen, and M. L. Sætra, ‘‘Graphics processing unit (GPU) programming strategies and trends in GPU computing,’’ J. Parallel Distrib. Comput., vol. 73, no. 1, pp. 4–13, Jan. 2013.
[246] R. Barrett, M. Chakraborty, D. Amirkulova, H. Gandhi, G. Wellawatte, and A. White, ‘‘HOOMD-TF: GPU-accelerated, online machine learning in the HOOMD-blue molecular dynamics engine,’’ J. Open Source Softw., vol. 5, no. 51, p. 2367, Jul. 2020.
[247] H. Ma, ‘‘Development of a CPU-GPU heterogeneous platform based on a nonlinear parallel algorithm,’’ Nonlinear Eng., vol. 11, no. 1, pp. 215–222, Jun. 2022.
[248] J. E. Stone, D. Gohara, and G. Shi, ‘‘OpenCL: A parallel programming standard for heterogeneous computing systems,’’ Comput. Sci. Eng., vol. 12, no. 3, pp. 66–73, May 2010.
[249] M. Garland, S. Le Grand, J. Nickolls, J. Anderson, J. Hardwick, S. Morton, E. Phillips, Y. Zhang, and V. Volkov, ‘‘Parallel computing experiences with CUDA,’’ IEEE Micro, vol. 28, no. 4, pp. 13–27, Jul. 2008.
[250] M. Halvorsen, ‘‘Hardware acceleration of convolutional neural networks,’’ M.S. thesis, Dept. Comput. Inf. Sci., Norwegian Univ. Sci. Technol. (NTNU), Trondheim, Norway, 2015.
[251] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer, ‘‘cuDNN: Efficient primitives for deep learning,’’ 2014, arXiv:1410.0759.
[252] CUDA Convnet2. Accessed: Jan. 10, 2024. [Online]. Available: https://code.google.com/archive/p/cuda-convnet2/
[253] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, ‘‘Caffe: Convolutional architecture for fast feature embedding,’’ in Proc. 22nd ACM Int. Conf. Multimedia, Nov. 2014, pp. 675–678.
[254] R. Collobert, K. Kavukcuoglu, and C. Farabet, ‘‘Torch7: A MATLAB-like environment for machine learning,’’ in Proc. BigLearn, NIPS Workshop, 2011, pp. 1–11.
[255] S. Mittal, ‘‘A survey on optimized implementation of deep learning models on the NVIDIA Jetson platform,’’ J. Syst. Archit., vol. 97, pp. 428–442, Aug. 2019.
[256] R. Jin and Q. Niu, ‘‘Automatic fabric defect detection based on an improved YOLOv5,’’ Math. Problems Eng., vol. 2021, pp. 1–13, Sep. 2021.
[257] Raspberry Pi 4 Model B. Accessed: Jan. 10, 2024. [Online]. Available: https://thepihut.com/collections/raspberry-pi/products/raspberry-pi-4-model-b
[258] M. B. Mohamad Noor and W. H. Hassan, ‘‘Current research on Internet of Things (IoT) security: A survey,’’ Comput. Netw., vol. 148, pp. 283–294, Jan. 2019.
[259] A. G. Frank, L. S. Dalenogare, and N. F. Ayala, ‘‘Industry 4.0 technologies: Implementation patterns in manufacturing companies,’’ Int. J. Prod. Econ., vol. 210, pp. 15–26, Apr. 2019.
[260] U. Farooq, Z. Marrakchi, and H. Mehrez, ‘‘FPGA architectures: An overview,’’ in Tree-Based Heterogeneous FPGA Architectures: Application Specific Exploration and Optimization. New York, NY, USA: Springer, 2012, pp. 7–48.
[261] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, ‘‘Going deeper with embedded FPGA platform for convolutional neural network,’’ in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2016, pp. 26–35.
[262] E. Nurvitadhi, G. Venkatesh, J. Sim, D. Marr, R. Huang, J. O. G. Hock, Y. T. Liew, K. Srivatsan, D. Moss, S. Subhaschandra, and G. Boudoukh, ‘‘Can FPGAs beat GPUs in accelerating next-generation deep neural networks?’’ in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2017, pp. 5–14.
[263] Y. Liu, P. Liu, Y. Jiang, M. Yang, K. Wu, W. Wang, and Q. Yao, ‘‘Building a multi-FPGA-based emulation framework to support networks-on-chip design and verification,’’ Int. J. Electron., vol. 97, no. 10, pp. 1241–1262, Oct. 2010.
[264] P. Dondon, J. Carvalho, R. Gardere, P. Lahalle, G. Tsenov, and V. Mladenov, ‘‘Implementation of a feed-forward artificial neural network in VHDL on FPGA,’’ in Proc. 12th Symp. Neural Netw. Appl. Electr. Eng. (NEUREL), Nov. 2014, pp. 37–40.
[265] C. Ünsalan and B. Tar, Digital System Design With FPGA: Implementation Using Verilog and VHDL. New York, NY, USA: McGraw-Hill, 2017.
[266] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta, and Z. Zhang, ‘‘Accelerating binarized convolutional neural networks with software-programmable FPGAs,’’ in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2017, pp. 15–24.
[267] X. Wei, Y. Liang, and J. Cong, ‘‘Overcoming data transfer bottlenecks in FPGA-based DNN accelerators via layer conscious memory management,’’ in Proc. 56th ACM/IEEE Design Autom. Conf. (DAC), Jun. 2019, pp. 1–6.
[268] T. Abtahi, C. Shea, A. Kulkarni, and T. Mohsenin, ‘‘Accelerating convolutional neural network with FFT on embedded hardware,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 9, pp. 1737–1749, Sep. 2018.
[269] S. Kala, B. R. Jose, J. Mathew, and S. Nalesh, ‘‘High-performance CNN accelerator on FPGA using unified Winograd-GEMM architecture,’’ IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 12, pp. 2816–2828, Dec. 2019.
[270] A. Lavin and S. Gray, ‘‘Fast algorithms for convolutional neural networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 4013–4021.
[271] J. Bottleson, S. Kim, J. Andrews, P. Bindu, D. N. Murthy, and J. Jin, ‘‘clCaffe: OpenCL accelerated Caffe for convolutional neural networks,’’ in Proc. IEEE Int. Parallel Distrib. Process. Symp. Workshops (IPDPSW), May 2016, pp. 50–57.
[272] S. Winograd, Arithmetic Complexity of Computations. Philadelphia, PA, USA: SIAM, 1980.
[273] R. DiCecco, G. Lacey, J. Vasiljevic, P. Chow, G. Taylor, and S. Areibi, ‘‘Caffeinated FPGAs: FPGA framework for convolutional neural networks,’’ in Proc. Int. Conf. Field-Programmable Technol. (FPT), Dec. 2016, pp. 265–268.
[274] M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf, ‘‘A massively parallel coprocessor for convolutional neural networks,’’ in Proc. 20th IEEE Int. Conf. Appl.-Specific Syst., Archit. Processors, Jul. 2009, pp. 53–60.
[275] S. Chakradhar, M. Sankaradas, V. Jakkula, and S. Cadambi, ‘‘A dynamically configurable coprocessor for convolutional neural networks,’’ in Proc. 37th Annu. Int. Symp. Comput. Archit., Jun. 2010, pp. 247–257.
[276] C. Farabet, B. Martini, B. Corda, P. Akselrod, E. Culurciello, and Y. LeCun, ‘‘NeuFlow: A runtime reconfigurable dataflow processor for vision,’’ in Proc. CVPR Workshops, Jun. 2011, pp. 109–116.
[277] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong, ‘‘Optimizing FPGA-based accelerator design for deep convolutional neural networks,’’ in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2015, pp. 161–170.
[278] A. Rahman, S. Oh, J. Lee, and K. Choi, ‘‘Design space exploration of FPGA accelerators for convolutional neural networks,’’ in Proc. Design, Autom. Test Eur. Conf. Exhibition, Mar. 2017, pp. 1147–1152.
[279] Y. Li, Z. Liu, K. Xu, H. Yu, and F. Ren, ‘‘A GPU-outperforming FPGA accelerator architecture for binary convolutional neural networks,’’ ACM J. Emerg. Technol. Comput. Syst., vol. 14, no. 2, pp. 1–16, Apr. 2018.
[280] S. Derrien and S. Rajopadhye, ‘‘Loop tiling for reconfigurable accelerators,’’ in Proc. Int. Conf. Field Program. Log. Appl., 2001, pp. 398–408.
[281] B. Liu, M. Wang, H. Foroosh, M. Tappen, and M. Penksy, ‘‘Sparse convolutional neural networks,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 806–814.
[282] M. Courbariaux, Y. Bengio, and J.-P. David, ‘‘Training deep neural networks with low precision multiplications,’’ 2014, arXiv:1412.7024.
[283] X. Zhang, X. Liu, A. Ramachandran, C. Zhuge, S. Tang, P. Ouyang, Z. Cheng, K. Rupnow, and D. Chen, ‘‘High-performance video content recognition with long-term recurrent convolutional network for FPGA,’’ in Proc. 27th Int. Conf. Field Program. Log. Appl. (FPL), Sep. 2017, pp. 1–4.
[284] T.-J. Yang, Y.-H. Chen, and V. Sze, ‘‘Designing energy-efficient convolutional neural networks using energy-aware pruning,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6071–6079.
[285] A. Page, A. Jafari, C. Shea, and T. Mohsenin, ‘‘SPARCNet: A hardware accelerator for efficient deployment of sparse convolutional networks,’’ ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 3, pp. 1–32, Jul. 2017.
[286] R. Rigamonti, A. Sironi, V. Lepetit, and P. Fua, ‘‘Learning separable filters,’’ in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 2754–2761.
[287] N. Suda, V. Chandra, G. Dasika, A. Mohanty, Y. Ma, S. Vrudhula, J.-S. Seo, and Y. Cao, ‘‘Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks,’’ in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2016, pp. 16–25.
[288] Y. Ma, Y. Cao, S. Vrudhula, and J.-S. Seo, ‘‘Optimizing loop operation and dataflow in FPGA acceleration of deep convolutional neural networks,’’ in Proc. ACM/SIGDA Int. Symp. Field-Programmable Gate Arrays, Feb. 2017, pp. 45–54.
[289] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, ‘‘Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1,’’ 2016, arXiv:1602.02830.
[290] J. Rui and N. Qiang, ‘‘Research on textile defects detection based on improved generative adversarial network,’’ J. Engineered Fibers Fabrics, vol. 17, Jan. 2022, Art. no. 155892502211013.
[291] Y. Qin, R. Purdy, A. Probst, C.-Y. Lin, and J.-G. Zhu, ‘‘ASIC implementation of nonlinear CNN-based data detector for TDMR system in 28 nm CMOS at 200 Mbits/s throughput,’’ IEEE Trans. Magn., vol. 59, no. 3, pp. 1–8, Mar. 2023.
[292] Huawei Technol. Co. (2017). Huawei Reveals the Future of Mobile AI at IFA. Accessed: Feb. 7, 2024. [Online]. Available: https://www.huawei.com/en/news/2017/9/mobile-ai-ifa-2017
[293] N. P. Jouppi et al., ‘‘In-datacenter performance analysis of a tensor processing unit,’’ in Proc. ACM/IEEE 44th Annu. Int. Symp. Comput. Archit. (ISCA), Jun. 2017, pp. 1–12.
[294] J. Vincent. (2017). The iPhone X’s New Neural Engine Exemplifies Apple’s Approach to AI. [Online]. Available: https://www.theverge.com/2017/9/13/16300464/apple-iphone-x-ai-neuralengine
[295] W. J. Dally, Y. Turakhia, and S. Han, ‘‘Domain-specific hardware accelerators,’’ Commun. ACM, vol. 63, no. 7, pp. 48–57, Jun. 2020.
[296] I. Kuon and J. Rose, ‘‘Measuring the gap between FPGAs and ASICs,’’ in Proc. ACM/SIGDA 14th Int. Symp. Field Program. Gate Arrays, Feb. 2006, pp. 21–30.
[297] K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. S. Chung, ‘‘Accelerating deep convolutional neural networks using specialized hardware,’’ Microsoft Res. Whitepaper, vol. 2, no. 11, pp. 1–4, 2015.
[298] L. Gwennap. (2020). Groq Rocks Neural Networks. Accessed: Jan. 25, 2024. [Online]. Available: https://wow.groq.com/wp-
[300] J. J. Coyle, R. A. Novack, B. J. Gibson, and C. J. Langley, Supply Chain Management: A Logistics Perspective. Chennai, India: Cengage Learning, 2021.
[301] F. Farahnakian, L. Koivunen, T. Mäkilä, and J. Heikkonen, ‘‘Towards autonomous industrial warehouse inspection,’’ in Proc. 26th Int. Conf. Autom. Comput., Sep. 2021, pp. 1–6.
[302] M. Hussain, T. Chen, and R. Hill, ‘‘Moving toward smart manufacturing with an autonomous pallet racking inspection system based on MobileNetV2,’’ J. Manuf. Mater. Process., vol. 6, no. 4, p. 75, Jul. 2022.
[303] M. Hussain, H. Al-Aqrabi, M. Munawar, R. Hill, and T. Alsboui, ‘‘Domain feature mapping with YOLOv7 for automated edge-based pallet racking inspections,’’ Sensors, vol. 22, no. 18, p. 6927, Sep. 2022.
[304] M. Hussain and R. Hill, ‘‘Custom lightweight convolutional neural network architecture for automated detection of damaged pallet racking in warehousing & distribution centers,’’ IEEE Access, vol. 11, pp. 58879–58889, 2023.
[305] M. Hussain, ‘‘YOLO-v5 variant selection algorithm coupled with representative augmentations for modelling production-based variance in automated lightweight pallet racking inspection,’’ Big Data Cognit. Comput., vol. 7, no. 2, p. 120, Jun. 2023.
[306] M. A. R. Alif, ‘‘Attention-based automated pallet racking damage detection,’’ Int. J. Innov. Sci. Res. Technol., vol. 1, no. 1, p. 169, Jun. 2024.
[307] D. Hu, ‘‘Automated pallet racking examination in edge platform based on MobileNetV2: Towards smart manufacturing,’’ J. Grid Comput., vol. 22, no. 1, pp. 1–12, Mar. 2024.
[308] Q. Luo, X. Fang, L. Liu, C. Yang, and Y. Sun, ‘‘Automated visual defect detection for flat steel surface: A survey,’’ IEEE Trans. Instrum. Meas., vol. 69, no. 3, pp. 626–644, Mar. 2020.
[309] L. Yi, G. Li, and M. Jiang, ‘‘An end-to-end steel strip surface defects recognition system based on convolutional neural networks,’’ Steel Res. Int., vol. 88, no. 2, Feb. 2017, Art. no. 1600068.
[310] Y.-J. Cha, W. Choi, and O. Büyüköztürk, ‘‘Deep learning-based crack damage detection using convolutional neural networks,’’ Comput.-Aided Civil Infrastruct. Eng., vol. 32, no. 5, pp. 361–378, May 2017.
[311] Y. Cha, W. Choi, G. Suh, S. Mahmoudkhani, and O. Büyüköztürk, ‘‘Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types,’’ Comput.-Aided Civil Infrastruct. Eng., vol. 33, no. 9, pp. 731–747, Sep. 2018.
[312] Y. He, K. Song, Q. Meng, and Y. Yan, ‘‘An end-to-end steel surface defect detection approach via fusing multiple hierarchical features,’’ IEEE Trans. Instrum. Meas., vol. 69, no. 4, pp. 1493–1504, Apr. 2020.
[313] J. Lian, W. Jia, M. Zareapoor, Y. Zheng, R. Luo, D. K. Jain, and N. Kumar, ‘‘Deep-learning-based small surface defect detection via an exaggerated local variation-based generative adversarial network,’’ IEEE Trans. Ind. Informat., vol. 16, no. 2, pp. 1343–1351, Feb. 2020.
[314] Q. Luo, W. Jiang, J. Su, J. Ai, and C. Yang, ‘‘Smoothing complete feature pyramid networks for roll mark detection of steel strips,’’ Sensors, vol. 21, no. 21, p. 7264, Oct. 2021.
[315] R. Liu, M. Huang, and P. Cao, ‘‘An end-to-end steel strip surface defects detection framework: Considering complex background interference,’’ in Proc. 33rd Chin. Control Decis. Conf. (CCDC), May 2021, pp. 317–322.
[316] X. Feng, X. Gao, and L. Luo, ‘‘X-SDD: A new benchmark for hot rolled steel strip surface defects detection,’’ Symmetry, vol. 13, no. 4, p. 706, Apr. 2021.
[317] D. Yang, Y. Cui, Z. Yu, and H. Yuan, ‘‘Deep learning based steel pipe weld defect detection,’’ Appl. Artif. Intell., vol. 35, no. 15, pp. 1237–1249, Dec. 2021.
[318] R. Lal, B. K. Bolla, and E. Sabeesh, ‘‘Efficient neural net approaches in metal casting defect detection,’’ Proc. Comput. Sci., vol. 218, pp. 1958–1967, Aug. 2023.
[319] D. Soukup and R. Huber-Mörk, ‘‘Convolutional neural networks for steel surface defect detection from photometric stereo images,’’ in Proc. Int. Symp. Vis. Comput., 2014, pp. 668–677.
[320] Z. Liang, H. Zhang, L. Liu, Z. He, and K. Zheng, ‘‘Defect detection of rail surface with deep convolutional neural networks,’’ in Proc. 13th World Congr. Intell. Control Autom. (WCICA), Jul. 2018, pp. 1317–1322.
content/uploads/2023/05/GROQ-ROCKS-NEURAL-NETWORKS.pdf [321] V. Badrinarayanan, A. Kendall, and R. Cipolla, ‘‘SegNet: A deep
[299] M. Khazraee, L. Zhang, L. Vega, and M. B. Taylor, ‘‘Moonwalk: NRE convolutional encoder–decoder architecture for image segmentation,’’
optimization in ASIC clouds,’’ ACM SIGPLAN Notices, vol. 52, no. 4, IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495,
pp. 511–526, May 2017. Dec. 2017.
[322] H. Dong, K. Song, Y. He, J. Xu, Y. Yan, and Q. Meng, ‘‘PGA- [346] H. Xin, Z. Chen, and B. Wang, ‘‘PCB electronic component defect
Net: Pyramid feature fusion and global context attention network for detection method based on improved YOLOv4 algorithm,’’ J. Phys. Conf.
automated surface defect detection,’’ IEEE Trans. Ind. Informat., vol. 16, Ser., vol. 1827, no. 1, Mar. 2021, Art. no. 012167.
no. 12, pp. 7448–7458, Dec. 2020. [347] M. Jeon, S. Yoo, and S.-W. Kim, ‘‘A contactless PCBA defect detection
[323] L. Shang, Q. Yang, J. Wang, S. Li, and W. Lei, ‘‘Detection of rail surface method: Convolutional neural networks with thermographic images,’’
defects based on CNN image recognition and classification,’’ in Proc. IEEE Trans. Compon., Packag., Manuf. Technol., vol. 12, no. 3,
20th Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, pp. 1–2. pp. 489–501, Mar. 2022.
[324] S. Yanan, Z. Hui, L. Li, and Z. Hang, ‘‘Rail surface defect detection [348] A. D. Santoso, F. B. Cahyono, B. Prahasta, I. Sutrisno, and A. Khumaidi,
method based on YOLOv3 deep learning networks,’’ in Proc. Chin. ‘‘Development of PCB defect detection system using image processing
Autom. Congr. (CAC), Nov. 2018, pp. 1563–1568. with YOLO CNN method,’’ Int. J. Artif. Intell. Res., vol. 6, no. 1, pp. 1–78,
[325] H. Yuan, H. Chen, S. Liu, J. Lin, and X. Luo, ‘‘A deep convolutional 2022.
neural network for detection of rail surface defect,’’ in Proc. IEEE Vehicle [349] S. Wang, L. Wu, W. Wu, J. Li, X. He, and F. Song, ‘‘Optical fiber defect
Power Propuls. Conf. (VPPC), Oct. 2019, pp. 1–4. detection method based on DSSD network,’’ in Proc. IEEE Int. Conf.
[326] Y. Huang, C. Qiu, and K. Yuan, ‘‘Surface defect saliency of magnetic Smart Internet Things, Aug. 2019, pp. 422–426.
tile,’’ Vis. Comput., vol. 36, no. 1, pp. 85–96, Jan. 2020. [350] S. Mei, Q. Cai, Z. Gao, H. Hu, and G. Wen, ‘‘Deep learning
[327] M. Hussain, H. Al-Aqrabi, and R. Hill, ‘‘PV-CrackNet architecture based automated inspection of weak microscratches in optical fiber
for filter induced augmentation and micro-cracks detection within a connector end-face,’’ IEEE Trans. Instrum. Meas., vol. 70, pp. 1–10,
photovoltaic manufacturing facility,’’ Energies, vol. 15, no. 22, p. 8667, 2021.
Nov. 2022. [351] K. Han, M. Sun, X. Zhou, G. Zhang, H. Dang, and Z. Liu, ‘‘A new
[328] M. Dhimish and P. Mather, ‘‘Development of novel solar cell micro crack method in wheel hub surface defect detection: Object detection algorithm
detection technique,’’ IEEE Trans. Semicond. Manuf., vol. 32, no. 3, based on deep learning,’’ in Proc. Int. Conf. Adv. Mech. Syst., Dec. 2017,
pp. 277–285, Aug. 2019. pp. 335–338.
[329] Clean Energy Associates (CEA), USA. World’s Largest Nighttime Solar [352] X. Sun, J. Gu, R. Huang, R. Zou, and B. Giron Palomares, ‘‘Surface
Module Electroluminescence (EL) Testing Identifies Causes of Six-Figure defects recognition of wheel hub based on improved faster R-CNN,’’
Losses for Major O&M. Accessed: Jun. 25, 2024. [Online]. Available: Electronics, vol. 8, no. 5, p. 481, Apr. 2019.
https://www.cea3.com/cea-blog/worlds-largest-nighttime-solar-module- [353] S. Cheng, J. Lu, M. Yang, S. Zhang, Y. Xu, D. Zhang, and H.
electroluminescence-el-testing-identifies-causes-2021 Wang, ‘‘Wheel hub defect detection based on the DS-cascade RCNN,’’
[330] M. Hussain, T. Chen, S. Titrenko, P. Su, and M. Mahmud, ‘‘A gradient Measurement, vol. 206, Jan. 2023, Art. no. 112208.
guided architecture coupled with filter fused representations for micro- [354] H. Lin, B. Li, X. Wang, Y. Shu, and S. Niu, ‘‘Automated defect inspection
crack detection in photovoltaic cell surfaces,’’ IEEE Access, vol. 10, of LED chip using deep convolutional neural network,’’ J. Intell. Manuf.,
pp. 58950–58964, 2022. vol. 30, no. 6, pp. 2525–2534, Aug. 2019.
[331] Z. Luo, S. Y. Cheng, and Q. Y. Zheng, ‘‘Corrigendum: GAN-based [355] M. L. Stern and M. Schellenberger, ‘‘Fully convolutional networks
augmentation for improving CNN performance of classification of for chip-wise defect detection employing photoluminescence images:
defective photovoltaic module cells in electroluminescence images,’’ IOP Efficient quality control in LED manufacturing,’’ J. Intell. Manuf., vol. 32,
Conf. Series. Earth Environ. Sci., vol. 354, Aug. 2019, Art. no. 012106. no. 1, pp. 113–126, Jan. 2021.
[332] B. Su, H. Chen, P. Chen, G. Bian, K. Liu, and W. Liu, ‘‘Deep learning- [356] P. Zheng, J. Lou, X. Wan, Q. Luo, Y. Li, L. Xie, and Z. Zhu, ‘‘LED chip
based solar-cell manufacturing defect detection with complementary defect detection method based on a hybrid algorithm,’’ Int. J. Intell. Syst.,
attention network,’’ IEEE Trans. Ind. Informat., vol. 17, no. 6, vol. 2023, pp. 1–13, Feb. 2023.
pp. 4084–4095, Jun. 2021. [357] W. Koodtalang, T. Sangsuwan, and S. Sukanna, ‘‘Glass bottle bottom
[333] A. Ahmad, Y. Jin, C. Zhu, I. Javed, A. Maqsood, and M. W. Akram, inspection based on image processing and deep learning,’’ in Proc. Res.,
‘‘Photovoltaic cell defect classification using convolutional neural Invention, Innov. Congr. (RI2C), Dec. 2019, pp. 1–5.
network and support vector machine,’’ IET Renew. Power Gener., vol. 14, [358] X. Zhang, L. Yan, and H. Yan, ‘‘Defect detection of bottled liquor based
no. 14, pp. 2693–2702, Oct. 2020. on deep learning,’’ in Proc. CSAA/IET Int. Conf. Aircr. Utility Syst.,
[334] Textile Handbook, Cotton Spinners Assoc., Hong Kong, 2000. vol. 2020, Sep. 2020, pp. 1259–1264.
[335] H. Y. T. Ngan, G. K. H. Pang, and N. H. C. Yung, ‘‘Automated [359] A. Gizaw and T. Kebebaw, ‘‘Water bottle defect detection system using
fabric defect detection—A review,’’ Image Vis. Comput., vol. 29, no. 7, convolutional neural network,’’ in Proc. Int. Conf. Inf. Commun. Technol.
pp. 442–458, 2011. Develop. Afr., Nov. 2022, pp. 19–24.
[336] J. Zhang, J. Jing, P. Lu, and S. Song, ‘‘Improved MobileNetV2- [360] Z. Qu, J. Shen, R. Li, J. Liu, and Q. Guan, ‘‘PartsNet: A unified deep
SSDLite for automatic fabric defect detection system based on cloud-edge network for automotive engine precision parts defect detection,’’ in Proc.
computing,’’ Measurement, vol. 201, Sep. 2022, Art. no. 111665. 2nd Int. Conf. Comput. Sci. Artif. Intell., Dec. 2018, pp. 594–599.
[337] F. Li and F. Li, ‘‘Bag of tricks for fabric defect detection based on cascade [361] T. Yang, L. Xiao, B. Gong, and L. Huang, ‘‘Surface defect recognition of
R-CNN,’’ Textile Res. J., vol. 91, nos. 5–6, pp. 599–612, Mar. 2021. varistor based on deep convolutional neural networks,’’ in Optoelectronic
[338] S. Song, J. Jing, Y. Huang, and M. Shi, ‘‘EfficientDet for fabric defect Imaging and Multimedia Technology, vol. 11187. Bellingham, WA, USA:
detection based on edge computing,’’ J. Engineered Fibers Fabrics, SPIE, 2019, pp. 267–274.
vol. 16, Jan. 2021, Art. no. 155892502110083. [362] T. Yang, S. Peng, and L. Huang, ‘‘Surface defect detection of
[339] Z. Luo, X. Xiao, S. Ge, Q. Ye, S. Zhao, and X. Jin, ‘‘Scratchnet: Detecting voltage-dependent resistors using convolutional neural networks,’’
the scratches on cellphone screen,’’ in Proc. 2nd CCF Chin. Conf., Multimedia Tools Appl., vol. 79, nos. 9–10, pp. 6531–6546,
Apr. 2017, pp. 178–186. Mar. 2020.
[340] H. Yang, S. Mei, K. Song, B. Tao, and Z. Yin, ‘‘Transfer-learning-based [363] O. Stephen, U. J. Maduh, and M. Sain, ‘‘A machine learning method for
online MURA defect classification,’’ IEEE Trans. Semicond. Manuf., detection of surface defects on ceramic tiles using convolutional neural
vol. 31, no. 1, pp. 116–123, Feb. 2018. networks,’’ Electronics, vol. 11, no. 1, p. 55, Dec. 2021.
[341] J. Lei, X. Gao, Z. Feng, H. Qiu, and M. Song, ‘‘Scale insensitive and focus [364] F. Lu, Z. Zhang, L. Guo, J. Chen, Y. Zhu, K. Yan, and X. Zhou, ‘‘HFENet:
driven mobile screen defect detection in industry,’’ Neurocomputing, A lightweight hand-crafted feature enhanced CNN for ceramic tile surface
vol. 294, pp. 72–81, Jun. 2018. defect detection,’’ Int. J. Intell. Syst., vol. 37, no. 12, pp. 10670–10693,
[342] Y. Lv, L. Ma, and H. Jiang, ‘‘A mobile phone screen cover glass defect Dec. 2022.
detection MODEL based on small samples learning,’’ in Proc. IEEE 4th [365] G. Wan, H. Fang, D. Wang, J. Yan, and B. Xie, ‘‘Ceramic tile surface
Int. Conf. Signal Image Process., Jul. 2019, pp. 1055–1059. defect detection based on deep learning,’’ Ceram. Int., vol. 48, no. 8,
[343] X. Tao, D. Zhang, W. Ma, X. Liu, and D. Xu, ‘‘Automatic metallic surface pp. 11085–11093, Apr. 2022.
defect detection and recognition with convolutional neural networks,’’ [366] J. Shi, Z. Li, T. Zhu, D. Wang, and C. Ni, ‘‘Defect detection of industry
Appl. Sci., vol. 8, no. 9, p. 1575, Sep. 2018. wood veneer based on NAS and multi-channel mask R-CNN,’’ Sensors,
[344] Y. Xu, K. Zhang, and L. Wang, ‘‘Metal surface defect detection using vol. 20, no. 16, p. 4398, Aug. 2020.
modified YOLO,’’ Algorithms, vol. 14, no. 9, p. 257, Aug. 2021. [367] L.-C. Chen, M. S. Pardeshi, W.-T. Lo, R.-K. Sheu, K.-C. Pai, C.-Y. Chen,
[345] Hsien-I. Lin and F. S. Wibowo, ‘‘Image data assessment approach for deep P.-Y. Tsai, and Y.-T. Tsai, ‘‘Edge-glued wooden panel defect detection
learning-based metal surface defect-detection systems,’’ IEEE Access, using deep learning,’’ Wood Sci. Technol., vol. 56, no. 2, pp. 477–507,
vol. 9, pp. 47621–47638, 2021. Mar. 2022.
RAHIMA KHANAM received the B.Sc. degree in computer science from Binary University, Malaysia, in 2018, and the Master of Computer Applications (MCA) degree from Jamia Millia Islamia University, India, in 2022. She is currently pursuing the Ph.D. degree in computer science and informatics with the University of Huddersfield, U.K., with a focus on developing an industrial pallet racking detection system utilizing deep learning. Her research interests include the intersection of AI, ML, the IoT, and deep learning technologies, with a specific focus on applications in industrial defect detection.

MUHAMMAD HUSSAIN received the B.Eng. degree in electrical and electronic engineering and the M.S. degree in the Internet of Things from the University of Huddersfield, West Yorkshire, in 2019, and the Ph.D. degree in artificial intelligence for defect identification, in 2022. He is a Researcher based in Dewsbury, West Yorkshire, U.K. His research interests include fault detection, particularly microcracks on photovoltaic (PV) cells caused by mechanical and thermal stress, work that contributes to optimizing the efficiency and reliability of PV systems. He is equally interested in machine vision, focusing on lightweight architectures for edge-device deployment in real-world production settings. Beyond fault detection, he explores AI interpretability, concentrating on developing explainable AI for medical and healthcare applications. His interdisciplinary approach underscores his commitment to ethical and impactful AI solutions. With expertise spanning AI, fault detection, machine vision, and interpretability, he aims to shape the future of technology and its positive influence on society.

RICHARD HILL is currently the Head of the Department of Computer Science and the Director of the Centre for Sustainable Computing, University of Huddersfield, U.K. He has published over 200 peer-reviewed articles. He has specific interests in digital manufacturing, including digital threads and digital twinning. He was a recipient of several best paper awards and has been recognized by the IEEE for outstanding research leadership in the areas of big data, predictive analytics, the Internet of Things, cyber-physical systems security, and Industry 4.0.

PAUL ALLEN received the Ph.D. degree in vehicle dynamics from Manchester Metropolitan University, U.K., in 1998. He is currently a Professor of railway engineering and technology and the Director of the Institute of Railway Engineering, University of Huddersfield, Huddersfield, U.K. His research interests include vehicle dynamics, energy, traction and braking technologies, and remote monitoring solutions.