YOLOv1 to v8: Unveiling Each Variant - A Comprehensive Review of YOLO
ABSTRACT This paper implements a systematic methodological approach to review the evolution of YOLO variants. Each variant is dissected by examining its internal architectural composition, providing a thorough understanding of its structural components. Subsequently, the review highlights key architectural innovations introduced in each variant, shedding light on the incremental refinements. The review includes benchmarked performance metrics, offering a quantitative measure of each variant's capabilities. The paper further presents the performance of YOLO variants across a diverse range of domains, manifesting their real-world impact. This structured approach ensures a comprehensive examination of YOLO's journey, methodically communicating its internal advancements and benchmarked performance before delving into domain applications. It is envisioned that the incorporation of concepts such as federated learning can introduce a collaborative training paradigm, where YOLO models benefit from training across multiple edge devices, enhancing privacy, adaptability, and generalisation.
INDEX TERMS Computer vision, YOLO, edge computing, manufacturing, object detection, real-time.
a dialogue on the future trajectory of research, outlining potential pathways to strengthen YOLO's robustness in the realm of object detection.

In summary, this article dives into the evolving YOLO architectures, comprehensively evaluating their effectiveness and pondering future prospects. The article unravels how YOLO has transformed over time, how effective it is now, and what the future of YOLO variants may hold in the domain of computer vision.

A. SURVEY OBJECTIVE
This article seeks to examine the factors fuelling the profound adoption of the YOLO variants, with a focus on their evolution from YOLOv1 to YOLOv8. Figure 1 presents the article structure, with the key components of the article highlighted in green. These components form the key objectives of this paper:
1) Architectural Evolution Analysis: Examine the architectural innovations across YOLO variants, elucidating motivations and impact on real-time industrial applications.
2) Training Strategies Scrutiny: Analyse YOLO's training methodologies, including data augmentation and transfer learning, to understand its adaptability across diverse domains.
3) Real-world Impact Assessment: Explore specific domains where YOLO has manifested impressive efficacy, showcasing its practical versatility.
4) Challenges and Future Directions Exploration: Identify real-time challenges, such as occlusions and scale variations, and propose future research directions to fortify YOLO's standing in object detection.
B. IMPORTANCE OF SURVEY
Although several papers have reviewed YOLO architectures, they often exhibit limitations such as focusing on specific YOLO variants [10], [11], or concentrating on particular application domains [12]. However, this review distinguishes itself as the first to provide an in-depth analysis of mainstream YOLO variants from YOLOv1 to YOLOv8. The analysis delves into the innovations fuelling the performance of each variant, offering a comparative study that spans more than 20 domains.

Furthermore, beyond examining the strengths of YOLO architectures, this comprehensive review sheds light on the persistent challenges faced by the YOLO series. By outlining current limitations and areas for improvement, the review aims to present a nuanced understanding of the ongoing hurdles. Additionally, it anticipates future developments and enhancements, providing insights into potential directions for overcoming existing challenges. This forward-looking approach positions the review as a valuable resource not only for understanding the historical evolution of YOLO but also for anticipating its trajectory in addressing emerging issues and meeting the demands of diverse domains.

C. ORGANIZATION OF PAPER
This article is structured to succinctly examine the evolution and inspiration fuelling the popularity of YOLO variants in industrial applications. Beginning with an introduction that lays the foundations, subsequent sections are intricately structured. Section II presents an overview of object detection. Section III delves into the motivations and implications of architectural reforms across the variants, YOLOv1 to YOLOv8.

Section IV scrutinizes the versatility of YOLO variants through an examination of training methodologies, including data augmentation, transfer learning, and training datasets. In Section V, a rigorous empirical assessment of YOLOv1-v8 is conducted, benchmarking against contemporaneous models to quantify performance with respect to Mean Average Precision (mAP), Frames Per Second (FPS), and internal intricacies such as the nature of the loss functions deployed.

Section VI explores wide-ranging industrial applications where YOLO has demonstrated efficacy, showcasing its practical versatility. Section VII identifies barriers such as handling occlusions, addressing biases, and real-time edge deployment, and proposes future research directions. Finally, Section VIII summarises key findings, highlighting the factors contributing towards YOLO's popularity and its significant implications for the field of object detection. This organized structure ensures a coherent and insightful journey through the multifaceted analysis of YOLO's evolution and impact in object detection and the wider field of computer vision.
II. OBJECT DETECTION
Addressing the intricacies of object detection presents numerous challenges. A key issue involves effectively managing fluctuations in image resolutions and aspect ratios [13], a task aggravated when the target objects manifest substantial differences in spatial dimensions [14]. The presence of class imbalance, particularly in scenarios where ascertaining a sufficient number of images for specific classes is challenging [15], can detrimentally impact architectural performance, leading to biased predictions [16].

Furthermore, a noteworthy hurdle is the computational complexity associated with object detection architectures, demanding considerable computational resources in terms of power, memory, and time [17], [18]. Figure 2 illustrates object detection for both single and multiple objects in an image; detectors with deep internal networks require significant computational capabilities to process intricate datasets and extract essential features.
Object detection can be bifurcated into two categories: single- and two-stage detectors. The latter involves proposing candidate regions within an image, followed by classification and localization of the proposed regions. Examples of two-stage detectors include RCNN (Region-based Convolutional Neural Network) [19], Fast R-CNN [20], Faster R-CNN [21], and FPN (Feature Pyramid Network) [22].

RCNN [19], proposed in 2014, deployed selective search for candidate region proposals, utilising a convolutional network for feature extraction. Fast R-CNN [20] alleviates these concerns by proposing ROI pooling, which significantly reduces computation by extracting fixed-size feature maps for each region from the original feature maps.
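To make the mechanism concrete, the following minimal sketch applies ROI pooling via torchvision's built-in operator; the feature-map dimensions and box coordinates are illustrative assumptions, not values from the original papers.

```python
# Minimal ROI pooling sketch (Fast R-CNN style) using torchvision.
# Shapes and box coordinates are illustrative placeholders.
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 256, 50, 50)        # backbone output for one image
# Each ROI is (batch_index, x1, y1, x2, y2) in feature-map coordinates.
rois = torch.tensor([[0., 4., 4., 24., 24.],
                     [0., 10., 12., 40., 30.]])
# Regions of arbitrary size are pooled to a fixed 7 x 7 grid, giving the
# downstream fully connected layers a constant input size.
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)                               # torch.Size([2, 256, 7, 7])
```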
Faster R-CNN [21] improved upon Fast R-CNN by implementing the Region Proposal Network (RPN). This innovation eradicated the need for a separate proposal stage by directly generating region proposals from feature maps, optimising both speed and accuracy.
FPN (Feature Pyramid Network) [22] tackled the challenge of detecting targets at multiple scales by generating a feature pyramid. This pyramid fused feature maps of varying resolutions from different network stages, empowering effective detection of targets across different scales. Notwithstanding their impressive accuracy, two-stage detectors are limited by their high computational demands.
In contrast, single-stage detectors aim to detect objects in a single pass, side-stepping the need for a separate region proposal step. Notable single-stage detectors include SSD (Single Shot Multibox Detector), YOLO variants (You Only Look Once), RefineDet++, DSSD (Deconvolution Single Shot Detector), and RetinaNet.

SSD [23] deploys manifold convolutional feature maps at various scales to predict bounding boxes and class probability scores, effectively detecting objects of various sizes and shapes in a single forward pass.
RefineDet++ [24] optimises the original RefineDet architecture through iterative refinement of target proposals across multiple stages, improving accuracy via enhanced feature fusion mechanisms and refined target boundaries.

DSSD (Deconvolution Single Shot Detector) integrates deconvolution layers to preserve spatial information lost during feature pooling, enabling the model to capture fine-grained details by maintaining spatial resolution.
RetinaNet [25] addresses class imbalance via Focal Loss, attributing higher weights to misclassified samples, enhancing the architecture's ability to handle class imbalance and improve detection performance.
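To illustrate the principle, a minimal sketch of a binary focal loss follows; the alpha and gamma values are the defaults commonly quoted for RetinaNet, and the formulation here is generic rather than a reproduction of the reference implementation.

```python
# Minimal binary focal loss sketch: cross-entropy down-weighted for
# well-classified examples so training focuses on hard samples.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)           # prob. of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())
```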
Vision Transformer (ViT) was introduced in 2020 [27]. Based on the encoder-decoder mechanism, ViT extends the concept of tokens to visual data streams. As an alternative to CNNs, ViT can be utilized for backbone feature extraction. Selecting ResNet as a baseline, Wu et al. [26] integrated ViTs by replacing the ultimate convolutional layer. This enabled the preceding convolutional layers to extract low-level features, which then segued into the ViT, demonstrating the adaptability of the transformer architecture in the territory of computer vision.

III. EVOLUTION OF YOLO ARCHITECTURE

A. YOLOv1
Announced in 2016, YOLOv1 marked a profound leap in single-shot object detection. Enthused by the GoogLeNet architecture [28], YOLOv1 deployed a unique approach by substituting GoogLeNet's inception modules with (1 × 1) convolutions followed by (3 × 3) convolutional filters.

The architecture, benchmarked on the Pascal VOC 2007 and 2012 datasets [29], exploited the Darknet framework for training. Featuring 24 convolutional layers, of which only four were followed by max-pooling layers, YOLOv1 embraced (1 × 1) convolutions and global average pooling as standout features.

Initially trained on the ImageNet dataset [30], the model was fine-tuned by adding four additional convolutional layers and two fully connected layers with randomly initialized weights. Leaky Rectified Linear Units (LReLU) were employed as the activation function, except for the final layer, which used a linear activation. Despite its pioneering status, YOLOv1 exhibited drawbacks, including large localization errors and lower recall compared to two-stage object detectors.
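The grid-based prediction described above can be made concrete with a short sketch of YOLOv1's output layout; S = 7 grid cells, B = 2 boxes, and C = 20 classes follow the original formulation, while the tensor values below are random placeholders.

```python
# Sketch of YOLOv1's output tensor: an S x S grid where each cell predicts
# B boxes (x, y, w, h, confidence) plus C conditional class probabilities.
import torch

S, B, C = 7, 2, 20
pred = torch.randn(S, S, B * 5 + C)      # network output: 7 x 7 x 30

cell = pred[3, 4]                        # predictions of a single grid cell
boxes = cell[:B * 5].view(B, 5)          # two (x, y, w, h, conf) tuples
class_probs = cell[B * 5:]               # 20 conditional class scores
# Class-specific confidence = box confidence x class probability.
scores = boxes[:, 4:5] * class_probs     # (B, C) detection scores
```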
B. YOLOv2
YOLOv2 [31] was inspired by the once-popular VGG architecture, featuring the Darknet-19 framework with 19 convolutional layers.

C. YOLOv3
YOLOv3 [33] addressed the shortcomings observed in its predecessors by concentrating on rectifying localisation errors and optimising detection efficiency, particularly for smaller objects. Benchmarked on the COCO dataset [34], YOLOv3 presented improved performance in detecting smaller objects, while encountering difficulties in achieving precise results for medium and large-sized objects.

Constructed on the Darknet-53 framework, YOLOv3 employs a robust network comprising 53 convolutional layers, incorporating 3 × 3 and 1 × 1 convolutional filters along with skip connections, as presented in Table 2. Conspicuously, the Darknet-53 framework, with its 53 convolutional layers, achieved double the speed of ResNet-152 [35].

TABLE 2. YOLOv3 internal architecture.

c: THREE-SCALE DETECTION MECHANISM
YOLOv3 generated feature maps at three distinct scales, down-sampling the input by factors of 32, 16, and 8. Detection was carried out on a 13 × 13 feature map after a series of convolutions, followed by a 26 × 26 feature map obtained via up-sampling and concatenation. Additionally, a 52 × 52 feature map was involved in the detection process. This three-scale mechanism enabled YOLOv3 to detect large, medium, and small-sized objects using distinct feature maps.
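The arithmetic of the three-scale mechanism can be sketched as follows; the 416 × 416 input resolution is the configuration commonly used with YOLOv3 and is assumed here purely for illustration, as are the channel counts.

```python
# Strides of 32, 16 and 8 map a 416 x 416 input to 13 x 13, 26 x 26 and
# 52 x 52 detection grids; coarse maps are up-sampled and concatenated
# with finer ones, as in YOLOv3's head.
import torch
import torch.nn.functional as F

img = 416
for stride in (32, 16, 8):
    print(f"stride {stride}: {img // stride} x {img // stride} grid")

coarse = torch.randn(1, 256, 13, 13)     # deepest feature map
mid = torch.randn(1, 128, 26, 26)
fused = torch.cat([F.interpolate(coarse, scale_factor=2.0), mid], dim=1)
print(fused.shape)                       # torch.Size([1, 384, 26, 26])
```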
D. YOLOv4
The authors of YOLOv4 [36] introduced a plethora of advanced techniques and sophisticated methodologies, distinguishing YOLOv4 as a faster and more accurate object detector tailored for production systems compared to its predecessors.

The YOLOv4 architecture was defined through a sequence of pivotal components: initial image processing; feature extraction utilising potent networks such as VGG16 [37], Darknet53, and ResNet50; feature scaling with neck structures such as the Feature Pyramid Network (FPN) and Path Aggregation Network (PAN) [38]; and the integration of single-stage and two-stage detectors for prediction.

In their experimentation with architectures, the authors compared CSPResNeXt50, CSPDarknet53, and EfficientNetB3, ultimately selecting CSPDarknet53 as the backbone. CSPDarknet53 featured 29 convolutional layers with 3 × 3 filters and around 27.6 million parameters, incorporating cross-stage partial (CSP) connections to enhance gradient combination efficiency with minimal computational cost. Key architectural components included:
YOLOv6 incorporated new classification and regression losses, employing the VariFocal loss for classification and an SIoU/GIoU regression loss.
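As a concrete illustration of one such objective, the following is a hedged sketch of a generic GIoU regression loss; it is not YOLOv6's exact implementation, and boxes are assumed valid (x1 < x2, y1 < y2).

```python
# Generic GIoU loss sketch: IoU penalised by the normalised area of the
# smallest enclosing box, so disjoint boxes still receive a gradient.
import torch

def giou_loss(a, b, eps=1e-7):           # a, b: (N, 4) boxes as (x1, y1, x2, y2)
    lt = torch.max(a[:, :2], b[:, :2])   # intersection top-left
    rb = torch.min(a[:, 2:], b[:, 2:])   # intersection bottom-right
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a + area_b - inter
    iou = inter / (union + eps)
    # Smallest axis-aligned box enclosing both inputs.
    enc_wh = torch.max(a[:, 2:], b[:, 2:]) - torch.min(a[:, :2], b[:, :2])
    enclose = enc_wh[:, 0] * enc_wh[:, 1]
    giou = iou - (enclose - union) / (enclose + eps)
    return (1.0 - giou).mean()
```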
c: SELF-DISTILLATION STRATEGY
YOLOv6 implemented a self-distillation strategy for both regression and classification tasks. This strategy assisted the model in distilling knowledge from its own predictions, contributing to improved performance and generalisation.

d: QUANTIZATION SCHEME WITH RepOptimiser AND CHANNEL-WISE DISTILLATION
The authors introduced a quantisation scheme for detection using RepOptimiser and channel-wise distillation. This scheme not only assisted in achieving a faster detector but also ensured that quantisation did not compromise accuracy.

e: BIDIRECTIONAL CONCATENATION (BiC) MODULE
YOLOv6 introduced a BiC module in the neck of the detector, enhancing localisation signals and delivering performance gains with negligible speed degradation.
f: ANCHOR-AIDED TRAINING (AAT) STRATEGY
AAT caters for both anchor-based and anchor-free paradigms without compromising inference efficiency.

g: ENHANCED BACKBONE AND NECK DESIGN
By deepening YOLOv6 to include another stage in the backbone and neck, the architecture achieved state-of-the-art performance on the COCO dataset at high-resolution input.

h: SELF-DISTILLATION STRATEGY
A new self-distillation strategy was implemented to boost the performance of the smaller YOLOv6 models, employing an auxiliary regression branch during training and removing it at inference to avoid a marked speed decline.

i: MODEL VARIANTS AND PERFORMANCE
The authors provide eight scaled variants, ranging from YOLOv6-N to YOLOv6-L6, catering to different application requirements. Benchmarked on the MS COCO dataset test-dev 2017, the largest variant achieved an impressive AP of 57.2% while maintaining a speed of around 29 FPS on an NVIDIA Tesla T4.
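The self-distillation idea recurring in these subsections can be sketched generically as below; the temperature and weighting are illustrative assumptions, and YOLOv6's actual formulation differs in detail (e.g., it also distils the box regression branch).

```python
# Generic self-distillation sketch: the student minimises a weighted sum of
# the task loss and a KL term against (its own) teacher predictions.
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, targets, T=2.0, w=0.5):
    task = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                            # rescale to keep gradient magnitude
    return (1 - w) * task + w * kl
```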
G. YOLOv7
YOLOv7, released in 2022, represents an innovative advancement in the realm of object detection [43]. At the time of its release, it outperformed many existing object detectors across the range from 5 FPS to an impressive 160 FPS. Notably, YOLOv7 was trained on the MS COCO dataset without leveraging pre-trained backbones, showcasing its ability to achieve remarkable results through its unique training approach. Architectural advents include:

a: EXTENDED EFFICIENT LAYER AGGREGATION NETWORK (E-ELAN)
YOLOv7 proposed an extended version of the efficient layer aggregation network (ELAN) [44], termed E-ELAN. ELAN is a strategic mechanism facilitating efficient learning and convergence in deep models by controlling the shortest longest gradient path. E-ELAN optimises this concept for models with unlimited stacked computational blocks. It achieves this by shuffling and merging cardinality features, thus augmenting the network's learning capabilities without compromising the original gradient path.

b: MODEL SCALING FOR CONCATENATION-BASED MODELS
YOLOv7 adopted a concatenation-based architecture, and to generate models of varying sizes, it introduced a novel mechanism for model scaling. Unlike standard scaling techniques, such as depth scaling, YOLOv7 ensured that the depth and width of the block are scaled proportionally. This maintained the optimal structure of the model, preventing unwanted distortions in the hardware usage of the model.

c: PLANNED RE-PARAMETERIZED CONVOLUTION (RepConvN)
Inspired by re-parameterized convolutions (RepConv) from YOLOv6, YOLOv7 introduced RepConvN. In contrast to RepConv, RepConvN eradicates the identity connection.
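The re-parameterisation principle behind RepConv-style blocks can be illustrated with a minimal sketch: after training, a parallel 1 × 1 branch is folded into the 3 × 3 kernel, leaving a single convolution at inference. The shapes below are illustrative and batch-norm folding is omitted.

```python
# Folding a parallel 1x1 conv branch into a 3x3 kernel: zero-pad the 1x1
# kernel to 3x3 and add it, so one convolution reproduces both branches.
import torch
import torch.nn.functional as F

k3 = torch.randn(64, 64, 3, 3)                 # 3x3 branch weights
k1 = torch.randn(64, 64, 1, 1)                 # 1x1 branch weights
fused = k3 + F.pad(k1, [1, 1, 1, 1])           # pad 1x1 to 3x3 and merge

x = torch.randn(1, 64, 32, 32)
two_branch = F.conv2d(x, k3, padding=1) + F.conv2d(x, k1)
one_branch = F.conv2d(x, fused, padding=1)
print(torch.allclose(two_branch, one_branch, atol=1e-4))   # True
```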
TABLE 4. Key features and architectural evolution.

TABLE 5. Training and optimization.
but not limited to, random scaling, rotation, translation, illumination, and the popular Mosaic (YOLOv5), serves as a cornerstone for enhancing variant robustness. By exposing variants to a myriad of augmented instances during training, YOLO becomes adept at handling the inherent variations and complexities present in real-world scenarios. This augmentation strategy, embedded within the algorithmic pipeline, not only mitigates the risk of overfitting but also fosters a model that generalizes effectively across diverse object appearances, orientations, and environmental conditions.
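A simplified sketch of the Mosaic idea follows: four equally sized images are tiled around a random centre to form one composite, exposing the detector to varied scales and contexts. Label shifting is omitted, so this is indicative rather than a faithful reproduction of the YOLOv5 pipeline.

```python
# Simplified Mosaic augmentation: tile four images around a random centre.
# Each source image is assumed pre-resized to (size, size); box labels
# would need the same crop/offset applied (omitted here).
import numpy as np

def mosaic(imgs, size=640):
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    cx = np.random.randint(size // 4, 3 * size // 4)     # random mosaic centre
    cy = np.random.randint(size // 4, 3 * size // 4)
    regions = [(0, cy, 0, cx), (0, cy, cx, size),        # top-left, top-right
               (cy, size, 0, cx), (cy, size, cx, size)]  # bottom-left/right
    for img, (y1, y2, x1, x2) in zip(imgs, regions):
        canvas[y1:y2, x1:x2] = img[: y2 - y1, : x2 - x1]
    return canvas
```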
B. DYNAMIC TRAINING MECHANISMS
The training methodologies deployed across different variants of YOLO, as presented in Table 5, underscore a continual evolution in optimizing object detection models. YOLOv1 initiated the journey with a grid-based approach leveraging the Darknet framework for training on the Pascal dataset. Subsequent variants, like YOLOv2 and YOLOv3, expanded their horizons by incorporating hierarchical classification and adopting the Darknet-53 backbone, along with introducing innovative techniques such as the FPN. YOLOv4 further enhanced the training process through techniques like enhanced quantization, PAN, and RepVGG. YOLOv5 marked a transition to PyTorch, embracing AutoAnchor, Mosaic, and MixUp for improved performance. YOLOv6 introduced advancements like RepVGG, PAN, and EfficientRep, while YOLOv7 continued to innovate with ELAN and model scaling. YOLOv8, developed in PyTorch, stands out with its C2f module, EfficientRep, CIoU, and DFL for robust and efficient training.

This iterative refinement in training techniques across YOLO versions showcases a commitment to optimizing object detection models through a diverse range of methodologies, each tailored to address the specific challenges and opportunities presented by evolving datasets.
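For completeness, MixUp, mentioned above among the YOLOv5-era augmentations, can be sketched as follows; the Beta-distribution parameter is an illustrative assumption rather than a documented default.

```python
# Minimal MixUp sketch: blend two images with a Beta-distributed weight;
# the same weight mixes the two images' label contributions.
import numpy as np

def mixup(img_a, img_b, alpha=32.0):
    lam = np.random.beta(alpha, alpha)             # blending weight in (0, 1)
    mixed = (lam * img_a.astype(np.float32) +
             (1.0 - lam) * img_b.astype(np.float32)).astype(np.uint8)
    return mixed, lam
```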
V. YOLO VERSIONS: A COMPARATIVE ANALYSIS
This section provides a comparative analysis of the reviewed YOLO variants from YOLOv1 to YOLOv8, across a wide range of metrics, as presented in Table 6.

YOLOv1: Pioneer in Object Detection (2015). The inaugural version of YOLO, YOLOv1, introduced the groundbreaking concept of real-time object detection using a single-stage, grid-based architecture. Deploying the Darknet24 framework, it achieved a remarkable Mean Average Precision (mAP) of 63.4% while maintaining a processing speed of 45 frames per second (FPS).

YOLOv2: Refinements and Anchor Boxes (2016). Building upon the success of YOLOv1, YOLOv2 introduced anchor boxes for improved localization accuracy. Implemented within the Darknet-19 framework, it achieved a notable increase in mAP, reaching 69.0%, and maintained real-time processing capabilities with 52 FPS.

YOLOv3: Multi-scale Features and Loss Functions (2018). YOLOv3 marked a balanced approach by adopting a multi-scale feature extraction architecture and introducing novel loss functions such as CIoU, GIoU, and BCE. Utilizing the Darknet53 framework, it achieved a mAP of 57.9% and demonstrated the ability to handle object detection across various scales at 34 FPS.

YOLOv4: Advanced Loss Functions (2020). With the adoption of the CSPDarknet53 framework, YOLOv4 emphasized advanced loss functions, including CIoU, DFL, and BCE, aiming to enhance bounding box accuracy while sustaining real-time processing. Despite a decrease in mAP to 44.3%, it exhibited a high FPS of 65.

YOLOv5: Leap in Accuracy and Efficiency (2020). A significant leap in accuracy and efficiency, YOLOv5 implemented the Modified CSP v7 architecture in PyTorch. With a single-stage detection mechanism and novel loss functions (CIoU, DFL, BCE), it achieved a mAP of 50.7% and a substantial increase in FPS to 200, showcasing its efficiency in real-time applications.

YOLOv6 to YOLOv8: Iterative Improvements (2022-2023). The subsequent iterations, YOLOv6, YOLOv7, and YOLOv8, demonstrate a commitment to iterative improvements. YOLOv6, utilizing the EfficientRep architecture, improved accuracy to 52.5%, while YOLOv7, based on RepConvN, achieved a mAP of 56.8%. YOLOv8, introducing an anchor-free model, maintained a high accuracy of 53.9% with an impressive processing speed of 280 FPS.
VI. REAL-WORLD APPLICATIONS AND IMPACT

A. SURVEILLANCE SYSTEMS AND PUBLIC SAFETY
YOLO's real-time processing capabilities make it invaluable in surveillance systems, enhancing public safety through the efficient monitoring of public spaces [46].

B. AUTONOMOUS VEHICLES AND TRAFFIC MANAGEMENT
In the realm of autonomous vehicles, YOLO plays a crucial role in object detection for obstacle avoidance and navigation [47]. Its rapid identification and classification of objects contribute to the safe and efficient operation of autonomous vehicles [48]. YOLO also supports traffic management systems by providing real-time information on road conditions [49].

C. INDUSTRIAL AUTOMATION AND QUALITY CONTROL
YOLO finds applications in industrial settings for automation and quality control [50]. In manufacturing, it can detect and inspect defects in products, ensuring adherence to quality standards [51]. The real-time nature of YOLO facilitates swift decision-making [52] in automated processes, contributing to increased efficiency [53] and reduced errors [54] in areas such as defect detection.

D. HEALTHCARE IMAGING AND DIAGNOSIS
In medical imaging, YOLO demonstrates efficacy in detecting and localizing abnormalities, aiding medical professionals in timely and accurate diagnoses in areas such as cancer and exudate detection for the early diagnosis of diabetic retinopathy [55]. YOLO's real-time processing is particularly valuable in scenarios where quick decisions are critical for patient care.
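Across such domains, deployment often reduces to a few lines of inference code. The sketch below assumes the ultralytics Python package and a pretrained YOLOv8 checkpoint; the input filename and confidence threshold are placeholders.

```python
# Hedged sketch of off-the-shelf YOLOv8 inference with the ultralytics API.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # small pretrained variant
results = model("scan.jpg", conf=0.25)     # detect at a 0.25 confidence cut-off
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)     # coordinates, confidence, class id
```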
E. ENVIRONMENTAL MONITORING AND WILDLIFE CONSERVATION
YOLO's adaptability extends to environmental monitoring, supporting wildlife conservation, biodiversity studies, and renewable energy. It can detect and track animals in their natural habitats, aiding researchers in population monitoring and protection efforts. YOLO's real-time capabilities enhance the efficiency of conservation initiatives.

F. RETAIL AND CUSTOMER EXPERIENCE
In the retail domain, YOLO variants have been implemented to enhance customer experiences and optimise several aspects of the supply chain. By leveraging their efficient object detection and tracking, YOLO variants can significantly contribute to automated inventory management, offering retailers real-time analysis of their stock levels and product availability [56], [57].
To further illustrate YOLO's impact, Table 7 provides an overview of diverse applications and research studies leveraging YOLO. Each entry in the table highlights the reference, detection type, YOLO model used, key characteristics of the application, and performance metrics achieved. Notably, multiple applications have optimised the selected YOLO architecture for diverse purposes. While the majority of the works presented prioritised attaining high accuracy across metrics such as mAP, precision, and recall, certain works, driven by limitations in hardware resources or domain restrictions, directed efforts toward optimising Frames Per Second (FPS) for expedited inferencing, highlighting YOLO's versatility in adapting to the specific needs of different applications.

Another notable observation showcased in Table 7 is that most variants implemented are v3 onwards. This preference can be attributed to the crucial role played by YOLO-v3 as the initial variant addressing the challenge of small object detection. YOLO-v3 introduced multi-scale detection mechanisms, with subsequent variants, i.e., PANet (YOLOv4), building on this concept, thereby unlocking applicability in scenarios where the detection of small targets was essential.
VII. CHALLENGES
Despite its remarkable success, YOLO faces certain challenges and areas for improvement. This section critically examines the limitations of the YOLO framework, proposes potential avenues for future research to address these challenges, and explores the integration of YOLO with edge deployment and federated learning for enhanced privacy and adaptability.
A. HANDLING OCCLUSIONS AND CLUTTER
One persistent challenge for YOLO is effectively handling occluded objects and scenes with high clutter. In scenarios where objects overlap or are partially obscured [87], YOLO may struggle to accurately detect and delineate individual instances. Future research could explore novel approaches, such as improved feature representations or context-aware models, to enhance YOLO's ability to cope with occlusions and cluttered scenes [88].
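The difficulty can be made concrete with a minimal sketch: detectors that rely on non-maximum suppression may discard a genuinely distinct but heavily occluded object as a duplicate. The coordinates below are illustrative.

```python
# Two heavily overlapping ground-truth objects: greedy NMS keeps only the
# higher-scoring box, so the occluded instance is lost.
import torch
from torchvision.ops import nms

boxes = torch.tensor([[10., 10., 60., 60.],    # object A
                      [18., 12., 68., 62.]])   # object B, occluding A (IoU ~ 0.68)
scores = torch.tensor([0.9, 0.8])
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)                                    # tensor([0]) - B is suppressed
```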
B. SCALE VARIATIONS AND FINE-GRAINED OBJECT DETECTION
The robust detection of objects at varying scales and the identification of fine-grained details remain areas where YOLO can be refined [89]. Adapting the architecture to better handle small or distant objects, potentially through multi-scale feature fusion strategies, could elevate YOLO's performance in scenarios demanding fine-grained object detection. The integration of federated learning can contribute to the enhancement of YOLO's adaptability across diverse scales by leveraging collaborative learning from edge devices.
C. DOMAIN ADAPTATION AND GENERALIZATION
While YOLO has showcased versatility across domains, there is room for improvement in domain adaptation [90]. Ensuring robust performance when transitioning from one environment to another [4], especially in scenarios with significant domain shifts, is a challenge. The integration of federated learning introduces a collaborative approach to domain adaptation, allowing YOLO models to adapt to diverse edge environments through decentralized learning.
D. EXPLAINABILITY AND INTERPRETABILITY
As with any machine learning system, addressing biases in training data and ensuring ethical considerations are paramount [91]. YOLO, like other object detection models, may exhibit biases that mirror the biases present in the data it was trained on. The integration of federated learning [92] can contribute to addressing biases by ensuring a more diverse and representative dataset across edge devices, enhancing the fairness and interpretability [93] of YOLO models.
E. ADDRESSING BIASES AND ETHICAL CONSIDERATIONS
As YOLO evolves, considerations for privacy preservation become increasingly important. The integration of YOLO with federated learning aligns with privacy-preserving objectives by allowing models to be trained collaboratively across edge devices without centralizing sensitive data [94]. This integration addresses ethical considerations related to data privacy in various applications, from surveillance to healthcare.
F. REAL-TIME PROCESSING OPTIMIZATION
While YOLO is renowned for its real-time processing capabilities, continuous optimization in this aspect is essential [95]. Future research may explore innovative techniques for further improving inference speed without compromising accuracy. The integration of edge deployment and federated learning introduces a decentralized approach to real-time processing, where models are trained collaboratively on edge devices, contributing to enhanced efficiency.
G. EDGE DEPLOYMENT AND FEDERATED LEARNING
The deployment of YOLO at the edge and the integration with federated learning present exciting opportunities [96]. Edge devices benefit from YOLO's efficiency, enabling on-device object detection without relying heavily on centralized servers [97]. Federated learning introduces a collaborative training paradigm where YOLO models are trained across multiple edge devices [98], enhancing privacy, adaptability, and generalization [99]. This integration aligns with the evolving landscape of decentralized and privacy-preserving machine learning.
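The federated-averaging principle underpinning this paradigm can be sketched minimally as follows; client sampling, dataset-size weighting, and secure aggregation are all omitted, so this is an illustration rather than a deployable implementation.

```python
# Minimal federated averaging sketch: each edge device trains locally and
# only model weights are averaged centrally - raw images never leave devices.
import torch

def federated_average(client_state_dicts):
    avg = {}
    for key in client_state_dicts[0]:
        avg[key] = torch.stack(
            [sd[key].float() for sd in client_state_dicts]
        ).mean(dim=0)
    return avg

# Each round: broadcast averaged weights -> local training -> re-average.
```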
VIII. CONCLUSION
As we conclude this comprehensive exploration of YOLO's evolution, challenges, and integrations, it becomes evident that YOLO has not only shaped the landscape of object detection but continues to evolve dynamically, staying at the forefront of advancements in computer vision. From the pioneering YOLOv1 to the sophisticated YOLOv8, the architectural innovations and training strategies have propelled YOLO into the limelight, making it a go-to choice for real-time object detection.

Reviewing the first objective of this review, it is evident that YOLO variants have endured significant architectural innovations during their evolution. This progression includes highlights such as the introduction of Feature Pyramid Networks (FPN) in YOLOv3 and the incorporation of ELAN mechanisms in YOLOv7. Notably, the later variants have acknowledged the requirement for versatility to meet the diverse demands of industrial deployments. To address this, researchers have proposed several sub-variants of each architecture, such as v5s/m/l/x, each with varying internal architectural configurations. This approach enables developers to select a base architecture based on their specific accuracy and detection-rate requirements. The resulting versatility has permitted YOLO variants to successfully penetrate various applications in the industry, as evident from Table 7.

The second objective of the paper, which scrutinizes the training strategies for performance optimisation, reveals a comprehensive analysis of training methodologies across YOLO variants. As presented in Table 5, each variant not only endured testing on key benchmark datasets but also engaged in in-depth tuning of internal architectures. YOLOv4, for instance, transitioned from Darknet53 to CSPDarknet53, demonstrating a shift in architectural choices for enhanced performance. In the case of YOLOv6, the focus moved towards training optimisation through EfficientRep, followed by RepConvN (YOLOv7), indicating a deliberate effort to incorporate incremental training boosters.

These refined training strategies have bestowed developers with a rich selection pool, enabling them to select methodologies based on their specific domain requirements. This diversity is evident in the extensive range of domains presented in the third objective, highlighting YOLO variant deployments across various industries. The incremental advances in training strategies contribute significantly to the adaptability and performance optimization of YOLO variants in real-world applications.

In considering future challenges, it is envisioned that YOLO variants will continue to address and improve performance on small object targets, especially as they penetrate into more specialized areas such as precision manufacturing. This trajectory suggests a necessity for advancements in lightweight architectures that balance high accuracy with stringent FPS requirements. As YOLO progresses, meeting the demands of niche applications will likely drive further innovation in architectural design and optimisation, ensuring its continued relevance in domains with stringent requirements for precision and efficiency.

REFERENCES
[1] M. Hussain and R. Hill, "Custom lightweight convolutional neural network architecture for automated detection of damaged pallet racking in warehousing & distribution centers," IEEE Access, vol. 11, pp. 58879–58889, 2023.
[2] M. Hussain, "YOLO-v5 variant selection algorithm coupled with representative augmentations for modelling production-based variance in automated lightweight pallet racking inspection," Big Data Cognit. Comput., vol. 7, no. 2, p. 120, Jun. 2023.
[3] M. F. Talu, K. Hanbay, and M. H. Varjovi, "CNN-based fabric defect detection system on loom fabric inspection," Tekstil Konfeksiyon, vol. 32, no. 3, pp. 208–219, Sep. 2022.
[4] B. A. Aydin, M. Hussain, R. Hill, and H. Al-Aqrabi, "Domain modelling for a lightweight convolutional network focused on automated exudate detection in retinal fundus images," in Proc. 9th Int. Conf. Inf. Technol. Trends (ITT), May 2023, pp. 145–150.
[5] M. A. Ansari, A. Crampton, and S. Parkinson, "A layer-wise surface deformation defect detection by convolutional neural networks in laser powder-bed fusion images," Materials, vol. 15, no. 20, p. 7166, Oct. 2022.
[6] P. Lala Mehta and A. Kumar, "Livai: A novel resource-efficient real-time facial emotion recognition system based on a custom deep CNN model," SSRN Electron. J., Feb. 2022.
[7] M. Hussain, "YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection," Machines, vol. 11, no. 7, p. 677, Jun. 2023.
[8] A. Koubaa, A. Ammar, A. Kanhouch, and Y. AlHabashi, "Cloud versus edge deployment strategies of real-time face recognition inference," IEEE Trans. Netw. Sci. Eng., vol. 9, no. 1, pp. 143–160, Jan. 2022.
[9] Z. Zou, K. Chen, Z. Shi, Y. Guo, and J. Ye, "Object detection in 20 years: A survey," Proc. IEEE, vol. 111, no. 3, pp. 257–276, Mar. 2023.
[10] P. Jiang, D. Ergu, F. Liu, Y. Cai, and B. Ma, "A review of YOLO algorithm developments," Proc. Comput. Sci., vol. 199, pp. 1066–1073, Jan. 2022.
[11] P. P. Khaire, R. D. Shelke, D. Hiran, and M. Patil, "Comparative study of a computer vision technique for locating instances of objects in images using YOLO versions: A review," in Proc. Int. Conf. Inf. Commun. Technol. Intell. Syst., Springer, 2023, pp. 349–359.
[12] C. Chen, Z. Zheng, T. Xu, S. Guo, S. Feng, W. Yao, and Y. Lan, "YOLO-based UAV technology: A review of the research and its applications," Drones, vol. 7, no. 3, p. 190, Mar. 2023.
[13] X. Qian, B. Wu, G. Cheng, X. Yao, W. Wang, and J. Han, "Building a bridge of bounding box regression between oriented and horizontal object detection in remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 61, 2023.
[14] X. Qian, Y. Huo, G. Cheng, C. Gao, X. Yao, and W. Wang, "Mining high-quality pseudoinstance soft labels for weakly supervised object detection in remote sensing images," IEEE Trans. Geosci. Remote Sens., vol. 61, 2023.
[15] L. Li, X. Yao, X. Wang, D. Hong, G. Cheng, and J. Han, "Robust few-shot aerial image object detection via unbiased proposals filtration," IEEE Trans. Geosci. Remote Sens., vol. 61, 2023.
[16] S. Agarwal, J. O. D. Terrail, and F. Jurie, "Recent advances in object detection in the age of deep convolutional neural networks," 2019, arXiv:1809.03193.
[17] L. Liu, W. Ouyang, X. Wang, P. Fieguth, J. Chen, X. Liu, and M. Pietikäinen, "Deep learning for generic object detection: A survey," 2018, arXiv:1809.02165.
[18] C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," 2020, arXiv:1911.11929.
[19] X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, "Oriented R-CNN for object detection," in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), Oct. 2021, pp. 3500–3509.
[20] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[21] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[22] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 936–944.
[23] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot MultiBox detector," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 21–37.
[24] C. Sun, Y. Ai, S. Wang, and W. Zhang, "Dense-RefineDet for traffic sign detection and classification," Sensors, vol. 20, no. 22, p. 6570, Nov. 2020.
[25] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," 2017, arXiv:1708.02002.
[26] D. Wu, S. Lv, M. Jiang, and H. Song, "Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments," Comput. Electron. Agricult., vol. 178, Nov. 2020, Art. no. 105742.
[27] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, "An image is worth 16 × 16 words: Transformers for image recognition at scale," 2020, arXiv:2010.11929.
[28] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[29] M. Everingham, L. van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal visual object classes (VOC) challenge," Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[30] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[31] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2017, pp. 7263–7271.
[32] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, and P. Dollár, "Microsoft COCO: Common objects in context," 2014, arXiv:1405.0312.
[33] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[34] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
[35] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[36] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[37] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[38] Z. Ma, M. Li, and Y. Wang, "PAN: Path integral based convolution for deep graph neural networks," 2019, arXiv:1904.10996.
[39] G. Jocher et al., "ultralytics/yolov5: v3.0," Zenodo, 2020, doi: 10.5281/zenodo.3983579.
[40] Z. Yao, Y. Cao, S. Zheng, G. Huang, and S. Lin, "Cross-iteration batch normalization," 2021, arXiv:2002.05712.
[41] C.-Y. Wang, A. Bochkovskiy, and H.-Y. Liao. (2022). YOLOv6. GitHub. [Online]. Available: https://github.com/meituan/YOLOv6
[42] C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, Z. Ke, Q. Li, M. Cheng, W. Nie, Y. Li, B. Zhang, Y. Liang, L. Zhou, X. Xu, X. Chu, X. Wei, and X. Wei, "YOLOv6: A single-stage object detection framework for industrial applications," 2022, arXiv:2209.02976.
[43] C.-Y. Wang, A. Bochkovskiy, and H.-Y. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," 2022, arXiv:2207.02696.
[44] X. Ding, X. Zhang, N. Ma, J. Han, G. Ding, and J. Sun, "RepVGG: Making VGG-style ConvNets great again," 2021, arXiv:2101.03697.
[45] J. Solawetz, "What is YOLOv8? The ultimate guide," Tech. Rep., Jan. 2023.
[46] D. Beymer, "Person counting using stereo," in Proc. Workshop Human Motion, Dec. 2000, pp. 127–133.
[47] M. Nagy and G. Lăzăroiu, "Computer vision algorithms, remote sensing data fusion techniques, and mapping and navigation tools in the Industry 4.0-based Slovak automotive sector," Mathematics, vol. 10, no. 19, p. 3543, Sep. 2022.
[48] S. Battiato, S. Conoci, R. Leotta, A. Ortis, F. Rundo, and F. Trenta, "Benchmarking of computer vision algorithms for driver monitoring on automotive-grade devices," in Proc. AEIT Int. Conf. Electr. Electron. Technol. Automot. (AEIT AUTOMOTIVE), Nov. 2020, pp. 1–6.
[49] J. Barthélemy, N. Verstaevel, H. Forehead, and P. Perez, "Edge-computing video analytics for real-time traffic monitoring in a smart city," Sensors, vol. 19, no. 9, p. 2048, May 2019.
[50] L. Scime and J. Beuth, "Anomaly detection and classification in a laser powder bed additive manufacturing process using a trained computer vision algorithm," Additive Manuf., vol. 19, pp. 114–126, Jan. 2018.
[51] N. Lyons, "Deep learning-based computer vision algorithms, immersive analytics and simulation software, and virtual reality modeling tools in digital twin-driven smart manufacturing," Econ., Manage., Financial Markets, vol. 17, no. 2, pp. 67–81, 2022.
[52] K. Li, E. D. Miller, M. Chen, T. Kanade, L. E. Weiss, and P. G. Campbell, "Computer vision tracking of stemness," in Proc. 5th IEEE Int. Symp. Biomed. Imag., Nano Macro, May 2008, pp. 847–850.
[53] Q.-J. Zhao, P. Cao, and D.-W. Tu, "Toward intelligent manufacturing: Label characters marking and recognition method for steel products with machine vision," Adv. Manuf., vol. 2, no. 1, pp. 3–12, Mar. 2014.
[54] S. Paneru and I. Jeelani, "Computer vision applications in construction: Current state, opportunities & challenges," Autom. Construct., vol. 132, Dec. 2021, Art. no. 103940.
[55] M. Hussain, H. Al-Aqrabi, M. Munawar, R. Hill, and S. Parkinson, "Exudate regeneration for automated exudate detection in retinal fundus images," IEEE Access, vol. 11, pp. 83934–83945, 2022.
[56] P. Cortez, L. M. Matos, P. J. Pereira, N. Santos, and D. Duque, "Forecasting store foot traffic using facial recognition, time series and support vector machines," in Proc. Int. Joint Conf. Cham, Switzerland: Springer, 2017, pp. 267–276.
[57] N. James, "Automated checkout for stores: A computer vision approach," Revista Gestão Inovação Tecnologias, vol. 11, no. 3, pp. 1830–1841, Jun. 2021.
[58] W. Lan, J. Dang, Y. Wang, and S. Wang, "Pedestrian detection based on YOLO network model," in Proc. IEEE Int. Conf. Mechatronics Autom. (ICMA), Aug. 2018, pp. 1547–1551.
[59] W.-Y. Hsu and W.-Y. Lin, "Adaptive fusion of multi-scale YOLO for pedestrian detection," IEEE Access, vol. 9, pp. 110063–110073, 2021.
[60] S. Shinde, A. Kothari, and V. Gupta, "YOLO based human action recognition and localization," Proc. Comput. Sci., vol. 133, pp. 831–838, Jan. 2018.
[61] P. Maski and A. Thondiyath, "Plant disease detection using advanced deep learning algorithms: A case study of papaya ring spot disease," in Proc. 6th Int. Conf. Image, Vis. Comput. (ICIVC), Jul. 2021, pp. 49–54.
[62] M. Lippi, N. Bonucci, R. F. Carpio, M. Contarini, S. Speranza, and A. Gasparri, "A YOLO-based pest detection system for precision agriculture," in Proc. 29th Medit. Conf. Control Autom. (MED), Jun. 2021, pp. 342–347.
[63] W. Yang and Z. Jiachun, "Real-time face detection based on YOLO," in Proc. 1st IEEE Int. Conf. Knowl. Innov. Invention (ICKII), Jul. 2018, pp. 221–224.
[64] W. Chen, H. Huang, S. Peng, C. Zhou, and C. Zhang, "YOLO-Face: A real-time face detector," Vis. Comput., vol. 37, no. 4, pp. 805–813, Mar. 2020.
[65] M. A. Al-masni, M. A. Al-antari, J.-M. Park, G. Gi, T.-Y. Kim, P. Rivera, E. Valarezo, M.-T. Choi, S.-M. Han, and T.-S. Kim, "Simultaneous detection and classification of breast masses in digital mammograms via a deep learning YOLO-based CAD system," Comput. Methods Programs Biomed., vol. 157, pp. 85–94, Apr. 2018.
[66] Y. Nie, P. Sommella, M. O'Nils, C. Liguori, and J. Lundgren, "Automatic detection of melanoma with YOLO deep convolutional neural networks," in Proc. E-Health Bioeng. Conf. (EHB), Nov. 2019, pp. 1–4.
[67] H. M. Ünver and E. Ayan, "Skin lesion segmentation in dermoscopic images with combination of YOLO and GrabCut algorithm," Diagnostics, vol. 9, no. 3, p. 72, Jul. 2019.
[68] L. Tan, T. Huangfu, L. Wu, and W. Chen, "Comparison of RetinaNet, SSD, and YOLO v3 for real-time pill identification," BMC Med. Informat. Decis. Making, vol. 21, no. 1, Nov. 2021.
[69] N. Bordoloi, A. K. Talukdar, and K. K. Sarma, "Suspicious activity detection from videos using YOLOv3," in Proc. IEEE 17th India Council Int. Conf. (INDICON), Dec. 2020, pp. 1–5.
[70] K. Bhambani, T. Jain, and K. A. Sultanpure, "Real-time face mask and social distancing violation detection system using YOLO," in Proc. IEEE Bengaluru Humanitarian Technol. Conf. (B-HTC), Oct. 2020, pp. 1–6.
[71] Hendry and R.-C. Chen, "Automatic license plate recognition via sliding-window darknet-YOLO deep learning," Image Vis. Comput., vol. 87, pp. 47–56, Jul. 2019.
[72] C. Dewi, R.-C. Chen, X. Jiang, and H. Yu, "Deep convolutional neural network for enhancing traffic sign recognition developed on YOLO v4," Multimedia Tools Appl., vol. 81, no. 26, pp. 37821–37845, Apr. 2022.
[73] A. M. Roy, J. Bhaduri, T. Kumar, and K. Raj, "WilDect-YOLO: An efficient and robust computer vision-based accurate object localization model for automated endangered wildlife detection," Ecolog. Informat., vol. 75, Jul. 2023, Art. no. 101919.
[74] D. H. Dos Reis, D. Welfer, M. A. D. S. L. Cuadros, and D. F. T. Gamarra, "Mobile robot navigation using an object recognition software with RGBD images and the YOLO algorithm," Appl. Artif. Intell., vol. 33, no. 14, pp. 1290–1305, Nov. 2019.
[75] A. Ye, B. Pang, Y. Jin, and J. Cui, "A YOLO-based neural network with VAE for intelligent garbage detection and classification," in Proc. 3rd Int. Conf. Algorithms, Comput. Artif. Intell., Dec. 2020.
[76] J. Li, J. Gu, Z. Huang, and J. Wen, "Application research of improved YOLO v3 algorithm in PCB electronic component detection," Appl. Sci., vol. 9, no. 18, p. 3750, Sep. 2019.
[77] J. Jiang, X. Fu, R. Qin, X. Wang, and Z. Ma, "High-speed lightweight ship detection algorithm based on YOLO-V4 for three-channels RGB SAR image," Remote Sens., vol. 13, no. 10, p. 1909, May 2021.
[78] B. Chen and X. Miao, "Distribution line pole detection and counting based on YOLO using UAV inspection line video," J. Electr. Eng. Technol., vol. 15, no. 1, pp. 441–448, Jul. 2019.
[79] S. R. Vrajesh, A. N. Amudhan, A. Lijiya, and A. P. Sudheer, "Shuttlecock detection and fall point prediction using neural networks," in Proc. Int. Conf. Emerg. Technol. (INCET), Jun. 2020, pp. 1–6.
[80] H. Wu, Y. Hu, W. Wang, X. Mei, and J. Xian, "Ship fire detection based on an improved YOLO algorithm with a lightweight convolutional neural network model," Sensors, vol. 22, no. 19, p. 7420, Sep. 2022.
[81] K. Chen, H. Li, C. Li, X. Zhao, S. Wu, Y. Duan, and J. Wang, "An automatic defect detection system for petrochemical pipeline based on cycle-GAN and YOLO v5," Sensors, vol. 22, no. 20, p. 7907, Oct. 2022.
[82] R. Zhang and C. Wen, "SOD-YOLO: A small target defect detection algorithm for wind turbine blades based on improved YOLOv5," Adv. Theory Simulations, vol. 5, no. 7, Jul. 2022, Art. no. 2100631.
[83] I. Khokhlov, E. Davydenko, I. Osokin, I. Ryakin, A. Babaev, V. Litvinenko, and R. Gorbachev, "Tiny-YOLO object detection supplemented with geometrical data," in Proc. IEEE 91st Veh. Technol. Conf. (VTC-Spring), May 2020, pp. 1–5.
[84] Y. A. Khan, S. Imaduddin, A. Ahmad, and Y. Rafat, "Image-based foreign object detection using YOLO v7 algorithm for electric vehicle wireless charging applications," in Proc. 5th Int. Conf. Power, Control Embedded Syst. (ICPCES), Jan. 2023, pp. 1–6.
[85] E. S. T. K. Reddy and V. Rajaram, "Pothole detection using CNN and YOLO v7 algorithm," in Proc. 6th Int. Conf. Electron., Commun. Aerosp. Technol., Dec. 2022, pp. 1255–1260.
[86] A. Munin, A. Folarin, A. Munin-Doce, L. Alonso-Garcia, V. Diaz-Casas, S. Ferreno-Gonzalez, and J. M. Ciriano-Palacios, "Real time vessel detection model using deep learning algorithms for controlling a barrier system," J. SSRN, Apr. 2023.
[87] M. Ghafoor and A. Mahmood, "Quantification of occlusion handling capability of 3D human pose estimation framework," IEEE Trans. Multimedia, 2022.
[88] M. F. Aslan, A. Durdu, K. Sabanci, and M. A. Mutluer, "CNN and HOG based comparison study for complete occlusion handling in human tracking," Measurement, vol. 158, Jul. 2020, Art. no. 107704.
[89] H. T. Mustafa, J. Yang, and M. Zareapoor, "Multi-scale convolutional neural network for multi-focus image fusion," Image Vis. Comput., vol. 85, pp. 26–35, May 2019.
[90] A. Zahid, M. Hussain, R. Hill, and H. Al-Aqrabi, "Lightweight convolutional network for automated photovoltaic defect detection," in Proc. 9th Int. Conf. Inf. Technol. Trends (ITT), May 2023, pp. 133–138.
[91] D. S. Char, N. H. Shah, and D. Magnus, "Implementing machine learning in health care—Addressing ethical challenges," New England J. Med., vol. 378, no. 11, pp. 981–983, Mar. 2018.
[92] A. Lakhan, M. A. Mohammed, K. H. Abdulkareem, H. Hamouda, and S. Alyahya, "Autism spectrum disorder detection framework for children based on federated learning integrated CNN-LSTM," Comput. Biol. Med., vol. 166, Nov. 2023, Art. no. 107539.
[93] H. Younes, H. L. Blevec, M. Léonardon, and V. Gripon, "Interoperability of compression techniques for efficient deployment of CNNs on microcontrollers," in Proc. Int. Conf. Syst.-Integr. Intell., Springer, 2022, pp. 543–552.
[94] N. Rane, S. Choudhary, and J. Rane, "YOLO and faster R-CNN object detection in architecture, engineering and construction (AEC): Applications, challenges, and future prospects," Eng. Construction, Appl., Challenges, Future Prospects, Oct. 2023.
[95] B.-G. Han, J.-G. Lee, K.-T. Lim, and D.-H. Choi, "Design of a scalable and fast YOLO for edge-computing devices," Sensors, vol. 20, no. 23, p. 6779, Nov. 2020.
[96] G. Plastiras, M. Terzi, C. Kyrkou, and T. Theocharides, "Edge intelligence: Challenges and opportunities of near-sensor machine learning applications," in Proc. IEEE 29th Int. Conf. Application-specific Syst., Architectures Processors (ASAP), Jul. 2018, pp. 1–7.
[97] M. P. Véstias, "A survey of convolutional neural networks on edge with reconfigurable computing," Algorithms, vol. 12, no. 8, p. 154, Jul. 2019.
[98] Q. Wang, Q. Li, K. Wang, H. Wang, and P. Zeng, "Efficient federated learning for fault diagnosis in industrial cloud-edge computing," Computing, vol. 103, no. 10, pp. 2319–2337, Oct. 2021.
[99] C. He, M. Annavaram, and S. Avestimehr, "Group knowledge transfer: Federated learning of large CNNs at the edge," in Proc. Adv. Neural Inf. Process. Syst., vol. 33, 2020, pp. 14068–14080.

MUHAMMAD HUSSAIN received the B.Eng. degree in electrical and electronic engineering and the M.S. degree in Internet of Things from the University of Huddersfield, in 2019, and the Ph.D. degree in artificial intelligence for defect identification. He is an accomplished Researcher hailing from Dewsbury, U.K. His work contributes to optimizing PV systems' efficiency and reliability. He is equally passionate about machine vision, focusing on lightweight architectures for edge device deployment in real-world production settings. Beyond fault detection, he explores AI interpretability, concentrating on developing explainable AI for medical and healthcare applications. His interdisciplinary approach underscores his commitment to ethical and impactful AI solutions. With his diverse expertise spanning AI, fault detection, machine vision, and interpretability, he aims to leave his mark on shaping the future of technology and its positive influence on society. His research interests include fault detection, particularly microcracks on photovoltaic (PV) cells due to mechanical and thermal stress.