Traffic sign detection utilizes computer vision and artificial intelligence technologies to recognize and interpret
traffic signs on the road automatically. This technology is critical for enhancing driving safety, guiding traffic
behavior and aiding in the decision-making processes of autonomous vehicles. Effective traffic sign detection
not only helps drivers adhere to traffic regulations, but also reduces traffic accidents and improves traffic flow
management.
As transportation networks expand and urbanization progresses rapidly, traffic signs become increasingly
vital for managing traffic flow and ensuring road safety. Yet, the growing variety and complexity of urban traffic
signs present significant challenges for traditional traffic sign recognition systems, which struggle to adapt to
dynamic and complex traffic conditions. Consequently, enhancing the accuracy and real-time performance of
automatic traffic sign recognition has emerged as a critical issue in contemporary traffic management.
This paper proposes a traffic sign detection algorithm based on an improved YOLOv8 [1]. By incorporating a
small object detection layer and integrating a BiFPN structure into the neck network, the algorithm enhances
the accuracy and efficiency of traffic sign detection, providing a more reliable solution for traffic management
and autonomous driving systems.
The main contributions are as follows:
1. We explored the principles of small object detection layers for detecting traffic signs. By integrating a small
object detection layer into the YOLOv8 framework, we verified its effectiveness in detecting traffic signs.
2. We discussed the advantages of the BiFPN network and integrated it into the YOLOv8 model. Experiments confirmed that the improved model significantly enhanced the accuracy of traffic sign detection.
Related works
Traffic signs are essential elements on road surfaces that indicate traffic rules, warn of road hazards, and provide
important information for drivers. They play a critical role in ensuring road safety, maintaining order, and
improving traffic efficiency. Vehicle-mounted traffic sign recognition systems effectively guide and regulate driver
behavior, ensuring safe driving and reducing traffic accidents. Additionally, with the ongoing development of
intelligent transportation systems, detecting and recognizing traffic signs further enhances autonomous driving
technology, improving road commuting efficiency.
The process of traffic sign detection involves two main steps: localization and recognition of the signs. During
the localization phase, the system identifies the presence and exact location of traffic signs within complex road
environments. In the recognition and classification phase, the system interprets the detected signs to determine
their types, such as stop signs or speed limits.
Traditional methods for traffic sign detection have primarily relied on image processing and machine learning
techniques, using detection algorithms such as Histogram of Orientation Gradients (HOG) and Scale-Invariant
Feature Transformations (SIFT), together with classification algorithms such as Support Vector Machines (SVM)
and Random Forests (RF).
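As an illustration of this classical pipeline, the sketch below pairs a HOG descriptor with an SVM classifier. It is a minimal example under stated assumptions, not drawn from any of the cited systems; the candidate-region crops and their labels are assumed inputs.

```python
import numpy as np
from skimage.feature import hog              # scikit-image
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def extract_hog(gray_crop):
    """HOG descriptor with the common 9-bin, 8x8-cell configuration."""
    return hog(gray_crop, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_sign_classifier(crops, labels):
    """crops: equally sized grayscale candidate regions; labels: sign classes."""
    feats = np.array([extract_hog(c) for c in crops])
    X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.2)
    clf = SVC(kernel="rbf")                  # SVM classifier, as in the text
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)        # classifier and test accuracy
```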
The traditional approach primarily relies on distinctive features of traffic signs, such as their specific colors
(red, blue, yellow, etc.) and prominent shapes (triangular, circular, rectangular, etc.). These unique attributes
are used to extract features for detection, followed by classification with a trained classifier. De La Escalera et
al. [2] selected the color and shape of the sign's corners as features to extract traffic signs from the environment
and used neural networks to classify the identified signs. Gómez-Moreno et al. [3] proposed a color segmentation
method using SVM and increased its speed with a Look-Up Table (LUT) while maintaining quality. Yuan
et al. [4] introduced a robust recognition method for traffic signs based on Color Global and Locally Oriented
Edge Magnitude Patterns (Color Global LOEMP). This technique effectively integrates color, global spatial
structure, global orientation structure, and local shape information, significantly enhancing the efficiency of
traffic sign recognition. Berkaya et al. [5] utilized the EDCircles circle detection algorithm combined with an
RGB-based color thresholding technique to detect traffic signs. Their approach used a feature extraction method
that integrates Gabor, LBP, and HOG techniques, with classification subsequently performed by an SVM. This
methodological combination effectively enhanced the traffic sign detection process.
Whether based on color or shape, such features are prone to interference from external objects of similar
color and shape in complex road conditions, resulting in poor generalization. The emergence of machine
learning offered a new approach to traffic sign detection, with advantages such as strong generalization and
high robustness. Sun et al. [6] introduced a traffic sign recognition method that combines HOG for feature
extraction with an Extreme Learning Machine (ELM) classifier for rapid classification. The model not only
achieves high recognition accuracy on the GTSRB dataset but also exhibits significant advantages in
computational efficiency.
Aiming to strike a balance between computational efficiency and recognition accuracy, Huang et al. [7]
introduced a traffic sign recognition method based on ELM. This method integrates an enhanced version
of HOG for feature extraction with the ELM classifier. The approach achieved high recognition accuracy on
both the GTSRB and the Belgium Traffic Sign Classification (BTSC) datasets, while maintaining very high
computational efficiency. Ellahyani et al. [8] developed a method for traffic sign recognition that first employs
color threshold segmentation in the HSI color space to pinpoint potential traffic sign regions, then uses HOG
features with SVM and Random Forest classifiers to recognize the traffic signs effectively.
As cities expand and road networks become more complex, traditional traffic sign recognition systems are
increasingly challenged by issues such as lighting variations, occlusions, and the similarities among different
types of signs. Traditional computer vision techniques often struggle in these complex environments, lacking
the robustness and accuracy needed. Consequently, deep learning-based traffic sign recognition algorithms have
gained prominence.
Deep learning algorithms utilize neural networks to model intricate relationships between inputs and
outputs. These algorithms have gained popularity in traffic sign recognition due to their ability to autonomously
learn high-level features directly from raw data. This capability significantly diminishes the necessity for manual
feature extraction, streamlining the process and enhancing the effectiveness of recognition systems. Li et al. [9]
used CNNs to detect and recognize traffic signs in the United States, concentrating particularly on speed limit
signs. Their proposed method demonstrated impressive detection performance on the LISA-TS dataset,
highlighting its effectiveness in identifying these critical signs.
Li & Wang [10] combined Faster R-CNN with MobileNets to precisely locate and classify small traffic signs.
This approach leveraged the strengths of both technologies: Faster R-CNN for its efficient and accurate
detection capabilities and MobileNets for its lightweight, mobile-friendly architecture, resulting in enhanced
performance in recognizing smaller traffic signs. Tabernik & Skočaj [11] enhanced the Mask R-CNN framework
to better recognize small traffic signs and introduced a novel data augmentation technique to improve the model's
generalization capabilities. Evaluations on both the DFG and the Swedish traffic sign datasets demonstrated
significant performance gains, with the refined Mask R-CNN model achieving an mAP50 of up to 95.5%.
Zhang et al. [12] proposed a cascaded R-CNN model with multiscale attention, which improves detection
accuracy by focusing on multiscale feature extraction and balancing imbalanced datasets, thus enhancing the
model's performance in detecting small traffic signs. Wang et al. [13] developed an enhanced lightweight
traffic sign recognition algorithm based on YOLOv4-Tiny. The algorithm refines the K-means clustering method
to generate anchor boxes tailored to the traffic sign dataset, which significantly improves detection recall and
target localization precision. When evaluated on the TT100K dataset, the improved algorithm achieved a mean
Average Precision (mAP) at IoU 0.5 of 52.07% and demonstrated enhanced real-time performance. Dewi et
al. [14] combined YOLOv3 and DenseNet models, incorporating Spatial Pyramid Pooling (SPP) to optimize
feature extraction, significantly boosting recognition accuracy for small traffic signs.
The comparison of traffic sign detection algorithms is presented in Table 1. Traditional traffic sign detection
algorithms depend on manually crafted features and tend to be sensitive to lighting conditions and complex
backgrounds. While machine learning techniques can automate feature extraction from images, their detection
performance is generally inferior. In contrast, deep learning approaches achieve high accuracy, but they can
struggle to detect small targets effectively. This paper therefore focuses on leveraging deep learning to enhance
the accuracy of traffic sign detection while improving the robustness and real-time performance of the
detection algorithm.
Methodology
Method overview
The YOLO-BS detection algorithm presented in this paper is a one-stage traffic sign detection algorithm, and its
framework is shown in Fig. 1. The algorithm consists of three main parts: the backbone, the neck and the head.
Initially, images are preprocessed at the input stage through data augmentation and other operations before
being fed into the backbone network. The backbone extracts features from the images, producing feature
maps at four different scales. These feature maps are then processed by the neck network for feature fusion,
yielding four fused scale features. Finally, these features are passed to the detection head for prediction,
which outputs the position, confidence, and classification information of the detection boxes at each scale.
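As a schematic of this pipeline (a sketch with placeholder module names, not the authors' code):

```python
def yolo_bs_forward(image, backbone, neck, heads):
    """Schematic one-stage pipeline: backbone -> neck -> detection heads."""
    p2, p3, p4, p5 = backbone(image)   # feature maps at four scales
    fused = neck([p2, p3, p4, p5])     # multi-scale feature fusion
    # one detection head per scale; each predicts box position,
    # confidence, and class scores
    return [head(f) for head, f in zip(heads, fused)]
```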
Detection layer | Feature map dimensions | Receptive field | Detected object size
P2 | 160 × 160 | Small | Very small
P3 | 80 × 80 | Medium | Small
P4 | 40 × 40 | Large | Medium
P5 | 20 × 20 | Larger | Large
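The scale relationships in the table follow directly from the layer strides. A quick check, assuming the standard 640 × 640 YOLOv8 input resolution (our assumption, not stated in the table):

```python
# Each detection layer Pk downsamples the input by a stride of 2**k.
input_size = 640  # assumed standard YOLOv8 input resolution
for k in (2, 3, 4, 5):
    stride = 2 ** k
    size = input_size // stride
    print(f"P{k}: stride {stride:2d} -> {size}x{size} feature map")
# P2: stride  4 -> 160x160  (finest detail, very small signs)
# P5: stride 32 -> 20x20    (coarsest, large signs)
```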
Fig. 2. Comparison of the neck before and after integrating the small object layer and BiFPN.

Small object detection layer
The small object detection layer is introduced mainly by adding higher-resolution feature maps. These feature
maps retain more spatial detail, which increases the network's sensitivity to small targets.
Traditional YOLO conducts target detection on a single-scale feature map, which limits its efficacy in
detecting smaller objects. By incorporating a small object detection layer, the architecture can engage feature
maps at multiple scales simultaneously. Specifically, the P2 feature layer in the backbone is convolved to obtain
scale features rich in small-target information, fused with the output of the Upsample layer, and then fed to
the following CSP module for multi-scale feature fusion. Finally, the detection head operates on the fused
multi-scale features.
The modified network thus detects on mesoscale and small-scale feature maps in addition to the original
scales, enabling multi-level detection across these varying scales. This multi-scale detection strategy enhances
the network's ability to comprehensively capture a wider range of targets within the image, including those
that are notably small.
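A minimal PyTorch sketch of the P2 fusion path described above; it illustrates the idea rather than the authors' implementation, and the channel counts and the CSP stand-in are assumptions:

```python
import torch
import torch.nn as nn

class SmallObjectBranch(nn.Module):
    """Fuse the stride-4 P2 map with an upsampled deeper feature."""
    def __init__(self, c_p2=64, c_p3=128, c_out=64):
        super().__init__()
        self.reduce = nn.Conv2d(c_p3, c_p2, kernel_size=1)   # align channels
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        # stand-in for the CSP module that fuses the concatenated features
        self.csp = nn.Sequential(
            nn.Conv2d(c_p2 * 2, c_out, kernel_size=3, padding=1),
            nn.SiLU(),
        )

    def forward(self, p2, p3):
        p3_up = self.up(self.reduce(p3))       # 80x80 -> 160x160
        fused = torch.cat([p2, p3_up], dim=1)  # keep fine spatial detail
        return self.csp(fused)                 # goes to the extra detect head

branch = SmallObjectBranch()
out = branch(torch.randn(1, 64, 160, 160),  # P2, stride 4
             torch.randn(1, 128, 80, 80))   # P3, stride 8
print(out.shape)  # torch.Size([1, 64, 160, 160])
```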
BiFPN
The Bidirectional Feature Pyramid Network (BiFPN) [15] is an advanced feature pyramid structure aimed at
bolstering the multi-scale feature fusion capabilities of CNNs for target detection tasks. BiFPN has demonstrated
remarkable value in target detection, becoming a key component in various cutting-edge detection frameworks
such as EfficientDet and YOLO.
Traditional Feature Pyramid Networks (FPN) enhance the detection of various-sized targets by merging
features of different scales via a top-down path, but this transfer of information is homogeneous and
unidirectional. BiFPN improves on this by introducing a bidirectional information flow and a weighted feature
fusion mechanism, significantly optimizing feature utilization and representation, and thereby improving the
overall efficacy of target detection.
Figure 2a is the original neck of YOLOv8, while the neck network structure after integrating BiFPN into
YOLO is shown in Fig. 2b. BiFPN realizes top-down and bottom-up bidirectional information flow through
a bidirectional feature pyramid structure. This design not only enhances the information transfer between
different layers of features, but also enables a fuller fusion of features from different scales, thus improving the
network’s ability to detect multi-scale targets.
BiFPN employs a fast normalized fusion method that improves upon the traditional feature fusion used in
FPN [16]. In a conventional FPN, feature maps of different scales are typically merged by simple addition,
which does not adequately account for the varying importance of features across scales. In contrast, BiFPN
introduces learnable weight coefficients for each scale's feature maps during fusion, as shown in Eq. (1):
$$O = \frac{\sum_i \omega_i I_i}{\varepsilon + \sum_j \omega_j} \tag{1}$$
where $I_i$ is the input feature, $O$ is the output feature, $\omega_i$ and $\omega_j$ are the learnable weights,
and $\varepsilon = 0.0001$ is a small constant that mitigates potential numerical instability.
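Eq. (1) translates almost directly into code. The following is a minimal PyTorch sketch of the fast normalized fusion (illustrative, not the authors' implementation), with $\varepsilon = 0.0001$ as in the text:

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Weighted fusion of same-shape feature maps, per Eq. (1)."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # learnable weights
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)            # keep weights non-negative
        norm = w / (self.eps + w.sum())   # normalize: Eq. (1) denominator
        return sum(wi * x for wi, x in zip(norm, inputs))

fuse = FastNormalizedFusion(2)
out = fuse([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
print(out.shape)  # torch.Size([1, 64, 40, 40])
```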
Given a list of multi-scale features $\vec{P}^{\,in} = (P_1^{in}, P_2^{in}, \ldots)$, where $P_i^{in}$ denotes the
feature at layer $i$, BiFPN aggregates the different features to obtain a transformed feature list
$\vec{P}^{\,out} = f(\vec{P}^{\,in})$ as the output. The BiFPN feature fusion process at layer 4 can be
described by Eqs. (2) and (3):
$$P_4^{td} = \mathrm{Conv}\!\left(\frac{\omega_1 \cdot P_4^{in} + \omega_2 \cdot \mathrm{Resize}(P_5^{in})}{\omega_1 + \omega_2 + \varepsilon}\right) \tag{2}$$

$$P_4^{out} = \mathrm{Conv}\!\left(\frac{\omega_1' \cdot P_4^{in} + \omega_2' \cdot P_4^{td} + \omega_3' \cdot \mathrm{Resize}(P_3^{out})}{\omega_1' + \omega_2' + \omega_3' + \varepsilon}\right) \tag{3}$$
where $P_4^{td}$ denotes the intermediate feature of layer 4 on the top-down path, and $P_4^{out}$ denotes the
output feature of layer 4 on the bottom-up path. Resize is an upsampling or downsampling operation for
resolution matching, and Conv denotes a convolution operation for feature processing.
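Reusing the `FastNormalizedFusion` sketch above, the layer-4 node of Eqs. (2) and (3) can be wired as follows; the specific resize operators and the channel count are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiFPNNode4(nn.Module):
    """Layer-4 fusion node per Eqs. (2) and (3)."""
    def __init__(self, c=64):
        super().__init__()
        self.fuse_td = FastNormalizedFusion(2)   # P4_in + resized P5_in
        self.fuse_out = FastNormalizedFusion(3)  # P4_in + P4_td + resized P3_out
        self.conv_td = nn.Conv2d(c, c, 3, padding=1)
        self.conv_out = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, p3_out, p4_in, p5_in):
        p5_up = F.interpolate(p5_in, scale_factor=2)        # Resize: 20 -> 40
        p4_td = self.conv_td(self.fuse_td([p4_in, p5_up]))  # Eq. (2)
        p3_dn = F.max_pool2d(p3_out, 2)                     # Resize: 80 -> 40
        return self.conv_out(self.fuse_out([p4_in, p4_td, p3_dn]))  # Eq. (3)

node = BiFPNNode4()
p4 = node(torch.randn(1, 64, 80, 80),   # P3_out
          torch.randn(1, 64, 40, 40),   # P4_in
          torch.randn(1, 64, 20, 20))   # P5_in
print(p4.shape)  # torch.Size([1, 64, 40, 40])
```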
Experiments
TT100K dataset
The TT100K traffic sign dataset [17], a collaborative creation of the joint lab of Tsinghua University and
Tencent, stands as the first large-scale traffic sign and signal dataset in China. It comprises over 100,000
traffic-related images, encompassing a diverse range of traffic signs and traffic lights. The categorization of
traffic signs within the dataset is detailed in Fig. 3.
Some traffic signs in the TT100K dataset carry a large number of labels, while others have fewer than 100,
leading to an imbalanced sample distribution that makes detection challenging. To address this issue, a refined
traffic sign dataset containing 45 categories was created by retaining, through a screening procedure, only the
categories with more than 100 samples. The number of labels for each of the screened traffic signs is illustrated
in Fig. 4. The refined dataset is divided into a training set and a test set in an 8:2 ratio, with the validation set
defaulting to the test set.
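The screening and split can be sketched as follows, assuming a hypothetical `annotations` mapping from image path to the list of sign labels it contains (the actual TT100K annotation format differs):

```python
from collections import Counter
import random

# annotations: dict[str, list[str]] -- assumed, not the real TT100K format
counts = Counter(lbl for labels in annotations.values() for lbl in labels)
kept = {cls for cls, n in counts.items() if n > 100}  # 45 classes remain

images = [path for path, labels in annotations.items()
          if any(lbl in kept for lbl in labels)]
random.shuffle(images)
split = int(0.8 * len(images))                 # 8:2 train/test ratio
train_set, test_set = images[:split], images[split:]
val_set = test_set   # validation defaults to the test split, as in the paper
```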
Table 3. Confusion matrix.

Ground truth \ Prediction | Positive | Negative
True | TP | FN
False | FP | TN
Evaluation indicators
The metrics for evaluating the performance of the YOLO algorithm are P (precision), R (recall), and mAP
(mean Average Precision). These evaluation metrics are explained based on the confusion matrix shown in Table 3.
Precision is the ratio of correctly predicted positive samples (TP) to all predicted positive samples (TP + FP);
it mainly reflects how accurate the predictions are. Its formula is shown in Eq. (4):
$$P = \frac{TP}{TP + FP} \tag{4}$$
Recall is the ratio of correctly predicted positive samples (TP) to all ground-truth positive samples (TP + FN);
it mainly reflects how comprehensive the predictions are. Its formula is shown in Eq. (5):
$$R = \frac{TP}{TP + FN} \tag{5}$$
AP (Average Precision) is the area under the PR curve, which plots precision against recall with recall on the
horizontal axis; it measures the performance of the algorithm in recognizing a single category. Its formula is
shown in Eq. (6):
$$AP = \int_0^1 P \, dR \tag{6}$$
The mAP is the average of AP over all categories and measures the performance of the algorithm in
recognizing all categories.
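The three metrics can be written compactly in code. The sketch below integrates the PR curve numerically with the trapezoidal rule; practical YOLO evaluation uses interpolated AP variants, so this is illustrative only:

```python
import numpy as np

def precision(tp, fp):
    return tp / (tp + fp)   # Eq. (4)

def recall(tp, fn):
    return tp / (tp + fn)   # Eq. (5)

def average_precision(recalls, precisions):
    # Eq. (6): area under the PR curve, recall on the x-axis
    order = np.argsort(recalls)
    return np.trapz(np.asarray(precisions)[order],
                    np.asarray(recalls)[order])

# mAP = mean of average_precision over all 45 categories
```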
Ablation experiments
The experiments were run on Windows 10 Professional; the computer hardware configuration is shown in
Table 4. The deep learning framework was PyTorch 2.3, and the usual experimental dependencies, including
CUDA 11.8, cuDNN, and OpenCV 4.6.0, were installed.
The training-related parameters are set according to Table 5 to train the YOLOv8 model on the traffic sign
dataset.
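Assuming the Ultralytics YOLOv8 implementation (consistent with the PyTorch 2.3 environment above), a baseline training run takes roughly the following form; the dataset config name and hyperparameter values are placeholders, not the values of Table 5:

```python
from ultralytics import YOLO

model = YOLO("yolov8s.yaml")   # baseline; YOLO-BS modifies this architecture
results = model.train(
    data="tt100k.yaml",        # assumed dataset config for TT100K
    epochs=100,                # placeholder hyperparameters
    imgsz=640,
    batch=16,
    device=0,
)
```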
Results discussion
Results of ablation experiments
The ablation experiments were conducted to evaluate the performance improvements of the proposed YOLO-BS
model over the baseline YOLOv8 and YOLOv8 with a small object detection layer. The results are presented in
Table 6, showcasing the impact of each enhancement on key metrics such as GFLOPs, P, R, mAP50, mAP50-95,
and FPS. Meanwhile, the variation curves of precision, recall, and mAP50 for the three models throughout the
training process are presented in Fig. 5.
As shown in Fig. 5, YOLOv8 exhibits the lowest precision, recall, and mAP50 values, while YOLO-BS achieves
the highest precision, recall, and mAP50 values. Specifically, the baseline YOLOv8 achieved a precision of 81.7%,
recall of 73.8%, and mAP50 of 81.8%. By adding the small object detection layer, the model’s precision increased
to 86.3%, recall to 79.2%, and mAP50 to 87.3%. The full YOLO-BS model, incorporating both the small object
detection layer and BiFPN, further improved these metrics to 87.9% precision, 80.5% recall, and 90.1% mAP50.
The results indicate that YOLO-BS significantly outperforms the baseline YOLOv8 in all metrics. Although
its FPS is slightly lower than the baseline's, the improved model maintains a good balance between speed and
accuracy. The integration of the BiFPN and the small object detection layer into the YOLOv8 architecture
improved the performance of the YOLO-BS model; in particular, BiFPN significantly enhances the model's
capability to detect small and multi-scale targets. In real-time detection tasks, the efficient feature fusion
mechanism of BiFPN allows the detection system to sustain high operational speed without compromising
accuracy. This balance of speed and precision underscores BiFPN's pivotal role in improving the effectiveness
of detection systems: its bidirectional information flow and weighted feature fusion mechanism allow the model
to better manage multi-scale features, enhancing detection accuracy for small targets.
Additionally, the small object detection layer increases the network’s sensitivity to smaller traffic signs, which
are often encountered in traffic sign detection scenarios.
Conclusion
This paper proposed YOLO-BS, a traffic sign detection algorithm based on an improved YOLOv8 framework.
The small object detection layer enhances the network’s sensitivity to smaller objects, which are commonly
encountered in traffic sign detection scenarios. Additionally, the bidirectional information flow and weighted
feature fusion mechanism in BiFPN enable the model to better handle multi-scale features, improving the
detection accuracy for small targets.
Fig. 6. The detection results of Faster R-CNN, YOLOv5, YOLOv8, and YOLO-BS.
By incorporating a small object detection layer and integrating the BiFPN, the algorithm significantly
enhances the accuracy and robustness of traffic sign detection. Experimental results on the TT100K dataset
demonstrate that YOLO-BS outperforms current mainstream models, achieving high mAP and FPS metrics,
making it a promising solution for real-time traffic sign detection in intelligent transportation systems. Future
research will focus on further optimizing the YOLO-BS model, potentially through hardware acceleration
techniques and more efficient network architectures, to enhance its real-time performance.
Data availability
The datasets used and analyzed during the current study are publicly available and can be accessed from
https://cg.cs.tsinghua.edu.cn/traffic-sign/.
References
1. Varghese, R., Sambath, M. YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness. In: 2024
International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS), IEEE, pp. 1–6. (2024).
2. De La Escalera, A., Moreno, L. E., Salichs, M. A. & Armingol, J. M. Road traffic sign detection and classification. IEEE Trans.
Industr. Electron. 44(6), 848–859 (1997).
3. Gómez-Moreno, H., Maldonado-Bascón, S., Gil-Jiménez, P. & Lafuente-Arroyo, S. Goal evaluation of segmentation algorithms for
traffic sign recognition. IEEE Trans. Intell. Transp. Syst. 11(4), 917–930 (2010).
4. Yuan, X., Hao, X., Chen, H. & Wei, X. Robust traffic sign recognition based on color global and local oriented edge magnitude
patterns. IEEE Trans. Intell. Transp. Syst. 15(4), 1466–1477 (2014).
5. Berkaya, S. K., Gunduz, H., Ozsen, O., Akinlar, C. & Gunal, S. On circular traffic sign detection and recognition. Expert Syst. Appl.
48, 67–75 (2016).
6. Sun, Z.-L., Wang, H., Lau, W.-S., Seet, G. & Wang, D. Application of BW-ELM model on traffic sign recognition. Neurocomputing
128, 153–159 (2014).
7. Huang, Z., Yu, Y., Gu, J. & Liu, H. An efficient method for traffic sign recognition based on extreme learning machine. IEEE Trans.
Cybern. 47(4), 920–933 (2016).
8. Ellahyani, A., El Ansari, M. & El Jaafari, I. Traffic sign detection and recognition based on random forests. Appl. Soft Comput. 46,
805–815 (2016).
9. Li, Y., Møgelmose, A. & Trivedi, M. M. Pushing the “Speed Limit”: high-accuracy US traffic sign recognition with convolutional
neural networks. IEEE Trans. Intell. Vehicles 1(2), 167–176 (2016).
10. Li, J. & Wang, Z. Real-time traffic sign recognition based on efficient CNNs in the wild. IEEE Trans. Intell. Transp. Syst. 20(3),
975–984 (2018).
11. Tabernik, D. & Skočaj, D. Deep learning for large-scale traffic-sign detection and recognition. IEEE Trans. Intell. Transp. Syst. 21(4),
1427–1440 (2019).
12. Zhang, J. M., Xie, Z. P., Sun, J., Zou, X. & Wang, J. A cascaded R-CNN with multiscale attention and imbalanced samples for traffic
sign detection. IEEE Access 8, 29742–29754 (2020).
13. Wang, L., Zhou, K., Chu, A., Wang, G. & Wang, L. An improved light-weight traffic sign recognition algorithm based on YOLOv4-
tiny. IEEE Access 9, 124963–124971 (2021).
14. Dewi, C., Chen, R.-C., Yu, H. & Jiang, X. Robust detection method for improving small traffic sign recognition based on spatial
pyramid pooling. J. Ambient Intell. Human. Comput. 14(7), 8135–8152 (2023).
15. Tan, M., Pang, R. & Le, Q. V. EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, pp. 10781–10790 (2020).
16. Chen, J. Q. et al. Efficient and lightweight grape and picking point synchronous detection model based on key point detection.
Comput. Electron. Agricult. 217, 108612 (2024).
17. Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B., Hu, S. Traffic-sign detection and classification in the wild. In: Proceedings of the
IEEE conference on computer vision and pattern recognition, pp. 2110–2118. (2016).
18. Wang, J. F., Chen, Y., Dong, Z. K. & Gao, M. Y. Improved YOLOv5 network for real-time multi-scale traffic sign detection. Neural
Comput. Appl. 35(10), 7853–7865 (2023).
19. Li, Z. S. et al. Toward effective traffic sign detection via two-stage fusion neural networks. IEEE Trans. Intell. Transp. Syst.
https://doi.org/10.1109/TITS.2024.3373793 (2024).
20. Gong, C. P., Li, A. J., Song, Y. M., Xu, N. & He, W. K. Traffic sign recognition based on the YOLOv3 algorithm. Sensors 22(23), 9345
(2022).
Author contributions
Hong Zhang and Mingyin Liang wrote the main manuscript text and validated the method proposed in the
article. Mingyin Liang prepared Figs. 1, 2, 3 and 4. Yufeng Wang collated the experimental data. All authors
reviewed the manuscript.
Funding
This research is supported by the National Natural Science Foundation of China (NSFC) (62362053), the
Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region
of China (NJYT23060), the 2024 Basic Research and Applied Basic Research of Hohhot (2024-G-J-29), and the
"Inner Mongolia Science and Technology Achievement Transfer and Transformation Demonstration Zone,
University Collaborative Innovation Base, and University Entrepreneurship Training Base" Construction
Project (Supercomputing Power Project) (21300-231510).
Declarations
Competing interests
The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to H.Z.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.