Security in smart cities using YOLOv8 to detect lethal weapons
Corresponding Author:
Ernesto Paiva-Peredo
Department of Electronic Engineering, Faculty of Engineering, Universidad Tecnológica del Perú
125 Natalio Sanchez Street, Santa Beatriz Urbanization, Cercado de Lima, Lima, Peru
Email: [email protected]
1. INTRODUCTION
There are more than 1,013 million firearms in the world, and more than 85% of them are in the
hands of civilians [1]. Firearms cause up to 1,000 deaths per day and more than 250,000 armed
incidents per year [2]. This reality highlights the importance of closed-circuit television (CCTV)
systems for the early identification of lethal weapons (firearms and sharp weapons) in images [3]–[9].
Such systems make it possible to combat crimes such as assaults and robberies, which are carried out
with weapons in hand [5], [10]. In this context, image-based crime detection using artificial intelligence
has emerged, owing to its high precision, anticipation, and adaptability in object detection [6], [11].
The use of artificial intelligence to identify violent scenarios, such as crimes involving lethal
weapons, is constantly expanding. Algorithms based on an artificial neural network and on the moving
picture experts group 7 (MPEG-7) descriptor have been proposed to classify frames of a CCTV
transmission, obtaining results of 94% and 95%, respectively [4], [12]. Both models exhibit a low rate
of false positives but a considerable number of false negatives. Among the most prominent approaches is
the application of convolutional neural networks (CNN) [9], [10], [13]. One network based on visual
geometry group-16 (VGG-16) and another based on VGG were configured, with efficiencies of 93% and 86%,
respectively [10], [13]. Both models use the VGG extractor, which does not allow optimal performance
when locating lethal weapons in images. CNNs are also commonly used to identify crime scenes by
detecting lethal weapons [3], [14]. For example, the flow gated network combines the advantages of
three-dimensional convolutional neural networks (3D-CNN) and optical flow, reaching an accuracy of
87.25% [3]. However, 3D-CNNs require a higher computational load than CNNs [3].
In the field of CNNs, YOLO is one of the most widely used architectures for identifying objects in
the frames of a real-time video sequence [8]. For this reason, YOLOv3 models have been trained to
identify firearms in images [2], [15], [16]. One such model is fused with region-based convolutional
neural networks (R-CNN) and achieves a performance of 94.23% [15]. However, the model relies on the
open-source Kaggle database, which limits control over the quality and variety of the training data.
In addition, the R-CNN does not work in real time [15]. Weapon detection models based on YOLOv5 have
also been proposed [1], [8], [17], [18]. One of them identifies lethal weapons with a mAP of 52.92% and
an inference rate of 61 frames per second (FPS) [8], but the database used to train the network lacks
variety. An image accuracy of 93% was also obtained by combining YOLOv5 and Faster R-CNN, using a
database of 3000 guns [1].
On the other hand, the recognition of firearms in images by means of deep learning is also being
addressed by fusing posture estimation with object detection [5], [6]. An algorithm estimates the pose
of each person in a frame to obtain the position of the hands and creates a bounding box around them,
where the object detector is applied [5], [6]. The model uses OpenPose to estimate the posture and a
vision transformer (ViT) to detect the weapon [5]. The efficiency therefore relies on the pose
estimator: if it fails, no detection takes place [5], [6]. For early detection of firearms, it is
essential to consider that weapons are not always carried in the hands, since some can be carried on
the chest or hanging from the shoulders.
Given the need for more efficient systems for detecting dangerous objects, whether firearms or
knives, a YOLOv8 CNN is trained using a cloud computing infrastructure to reduce the computational
burden [19]. In addition, a diverse dataset is generated that includes a wide range of lethal weapons,
such as shotguns, pistols, knives, and machetes, among others. Synthesis techniques are employed to
augment the training data, which enlarges the database and gives the CNN greater learning versatility
[3], [5].
2. METHODOLOGY
Today, the number of crimes involving lethal weapons (firearms and knives) has risen exponentially
[1], [4]. Measures such as the installation of CCTV systems have been taken to address this problem.
However, these systems only accumulate data and do not process it through video inspection or object
detection algorithms [7], [9], [12]. Therefore, a neural network capable of detecting lethal weapons in
real time is trained to provide greater accuracy and versatility and to reduce false positives. A
comparative analysis of deep learning object detection techniques, such as YOLO, single-shot multibox
detector (SSD), and Faster R-CNN, shows that YOLO achieves the best trade-off between mean average
precision (mAP) and inference speed for real-time predictions [8], [11], [18].
Finally, YOLOv8 is selected as the CNN to be trained because it is the most recent iteration in the
YOLO sequence of algorithms for object detection and tracking. To our knowledge, this is the first work
investigating the use of YOLOv8 to identify deadly weapons in images. The research is divided into
three stages: the first covers obtaining the database, the second covers data augmentation together
with labeling, and the third covers training the YOLOv8 CNN, obtained from Ultralytics.
were extracted, 931 images were collected from YouTube, and 1133 images were collected from Google. The
acquisition process for each source is explained below.
2.1.2. YouTube
The YouTube platform was also used to collect images showing a sharp weapon, since CCTV videos can
be found on this platform [3]. Figure 2 presents a flowchart of the procedure for obtaining images from
a video. Initially, we searched YouTube using a series of keywords related to violent acts, such as
real knife fights, knife-wielding assailant, and other similar terms. Subsequently, we used an online
video converter that automatically downloads the videos from the obtained links. After completing this
process, we extracted one image every 5 seconds from the videos, which run at 30 FPS.
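The sampling step can be illustrated with a minimal sketch. It assumes that "every 5 seconds at 30 FPS" means keeping one frame out of every 150; the function name `sample_frame_indices` and the 60-second clip are hypothetical and are not taken from the original pipeline.

```python
def sample_frame_indices(total_frames, fps=30.0, interval_s=5.0):
    """Return the indices of the frames to keep when saving one image every interval_s seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # 30 FPS * 5 s = 150 frames between two saved images
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


# Hypothetical 60-second clip at 30 FPS (1800 frames)
indices = sample_frame_indices(total_frames=1800)
print(indices[:4], len(indices))  # [0, 150, 300, 450] 12
```

The returned indices could then be passed to any video reader that supports seeking to a frame number; the choice of reader is left open here.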
Security in smart cities using YOLOv8 to detect lethal weapons (Ederson Rodriguez-Rosas)
948 ❒ ISSN: 2252-8938
The images are labeled on the Makesense online platform. For this purpose, two labels, “Firearm” and
“White weapon” (that is, edged weapon), were created. These labels give the neural network the exact
location of the object to be identified, so that the network learns to recognize patterns and relevant
features in the area bounded by the label. Finally, the images are divided into two groups: 4104 for
training and 892 for validation. Figure 4 shows a group of labeled images.
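The split sizes correspond to roughly 82% of the images for training and 18% for validation. The sketch below shows one way such a split could be produced reproducibly. The text does not say how the images were assigned to each group, so the random shuffle, the seed, and the file names are assumptions made only for illustration.

```python
import random


def split_dataset(paths, n_val, seed=0):
    """Shuffle the paths reproducibly and hold out n_val of them for validation."""
    if not 0 <= n_val <= len(paths):
        raise ValueError("n_val must be between 0 and the number of paths")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the 4996 labeled images (4104 + 892)
paths = [f"img_{i:04d}.jpg" for i in range(4104 + 892)]
train, val = split_dataset(paths, n_val=892)
print(len(train), len(val), f"{len(val) / len(paths):.1%}")  # 4104 892 17.9%
```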
On the other hand, the results of the YOLOv8n model with the same training parameters were very
similar to the previous one, with a mAP50 of 89.59% and a mAP50-95 of 63.26% in the detection of lethal
weapons. This allows us to conclude that the YOLOv8n model performs better in the detection of lethal
weapons, as observed when analyzing each evaluation metric in Figure 6.
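For context, mAP50 counts a predicted box as correct when its intersection over union (IoU) with a labeled box is at least 0.5, whereas mAP50-95 averages the metric over IoU thresholds from 0.50 to 0.95 in steps of 0.05, which is why it is the stricter of the two figures. The sketch below illustrates only the IoU test, not the full mAP computation; the box coordinates are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95
thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

ground_truth = (100, 100, 200, 200)  # hypothetical labeled box
prediction = (110, 110, 210, 210)    # hypothetical predicted box
score = iou(ground_truth, prediction)
passed = [t for t in thresholds if score >= t]
print(f"IoU = {score:.3f}, counted as correct at thresholds {passed}")
```

In this example the prediction counts as a hit for mAP50 but fails the thresholds from 0.70 upward, so it contributes less to mAP50-95.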
Our YOLOv8n network demonstrates superior performance compared to existing methods in the
literature. The flow gated network, which combines 3D-CNN and optical flow for violence scene detection,
achieved an accuracy of 87.25% [3]. In comparison, our YOLOv8n model reached a mAP50 of 89.59%
with lower computational requirements. Additionally, while a YOLOv3 and R-CNN hybrid model for firearm
detection reported a mAP of 85% [15], it is not suitable for real-time use due to its slower speed. Our
YOLOv8n model, with a mAP50 of 89.59%, supports real-time operation, making it suitable for early weapon
detection. Furthermore, it outperforms a YOLOv5 model designed to detect lethal weapons, which achieved
a mAP of 52.92% [8], thanks to our more diverse training dataset.
Figure 7. Confusion matrix for (a) the YOLOv8n model and (b) the YOLOv8x model
3.2. Prediction
The prediction results of YOLOv8n and YOLOv8x models are depicted in Figures 8 and 9. Figure 8
includes the prediction results for a batch of 16 images using both models. In Figure 8(a), the predictions of
the YOLOv8n model are shown, detailing how the model performs on the given images. Figure 8(b) presents
the prediction results of the YOLOv8x model for the same batch of images, allowing for a direct comparison
with the YOLOv8n model.
Figure 9 further illustrates the prediction performance of both models. Figure 9(a) highlights an
instance where the YOLOv8n model misidentifies an edged weapon as a person’s cap, providing insight into
the model’s limitations. In contrast, Figure 9(b) shows the YOLOv8x model’s ability to correctly identify the
edged weapon with high accuracy, even when only a partial view of the weapon is visible. This comparison
underscores the YOLOv8x model’s superior detection capabilities in specific scenarios.
Figure 8. Prediction results of the (a) YOLOv8n model and (b) YOLOv8x model
Figure 9. Detection performance comparison: (a) YOLOv8n mislabels a weapon and (b) YOLOv8x labels it
correctly
4. CONCLUSION
In this study, we developed and evaluated a YOLOv8 network specialized in the detection of lethal
weapons, covering both firearms and edged weapons. The model reached a mAP50 of 89.59%, which
validates the effectiveness of our approach and highlights the model’s ability to accurately identify
potential threats. In addition, our YOLOv8n model outperforms YOLOv8x in certain aspects, especially in
terms of accuracy. However, the prediction results do not allow us to state with certainty that it is
superior in general, which indicates the need for more detailed analyses in future research. Our work
provides a trained network that can be deployed in real time in any CCTV system with the necessary
computational resources, as would be the case in a smart city. We also believe that the proposed model
could improve its performance by training with a more diverse database. Finally, the size of the images
used in training is important, since YOLOv8 operates with a recommended size of 640 pixels per image.
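To make the image-size remark concrete, the sketch below computes how a frame could be scaled to a 640-pixel target while keeping its aspect ratio, with the remainder filled by padding (often called letterbox resizing). This study does not describe its resizing procedure, so the approach, the function name `letterbox_shape`, and the example resolutions are assumptions used only for illustration.

```python
def letterbox_shape(width, height, target=640):
    """Scale (width, height) so the longer side equals target, keeping the aspect ratio.

    Returns the resized (width, height) and the (horizontal, vertical) padding
    needed to fill a target x target square.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return (new_w, new_h), (target - new_w, target - new_h)


# Hypothetical CCTV frame sizes
print(letterbox_shape(1920, 1080))  # ((640, 360), (0, 280))
print(letterbox_shape(640, 480))    # ((640, 480), (0, 160))
```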
REFERENCES
[1] A. H. Ashraf et al., “Weapons detection for security and video surveillance using CNN and YOLO-v5s,” Computers, Materials and
Continua, vol. 70, no. 2, pp. 2761–2775, 2022, doi: 10.32604/cmc.2022.018785.
[2] S. Narejo, B. Pandey, D. E. Vargas, C. Rodriguez, and M. R. Anjum, “Weapon detection using YOLOv3 for smart surveillance
system,” Mathematical Problems in Engineering, vol. 2021, 2021, doi: 10.1155/2021/9975700.
[3] M. Cheng, K. Cai, and M. Li, “RWF-2000: an open large scale video database for violence detection,” in 2020 25th International
Conference on Pattern Recognition (ICPR), IEEE, 2021, pp. 4183–4190, doi: 10.1109/ICPR48806.2021.9412502.
[4] M. Grega, S. Lach, and R. Sieradzki, “Automated recognition of firearms in surveillance video,” in 2013 IEEE International Multi-
Disciplinary Conference on Cognitive Methods in Situation Awareness and Decision Support (CogSIMA), IEEE, 2013, pp. 45–50,
doi: 10.1109/CogSIMA.2013.6523822.
[5] J. Ruiz-Santaquiteria, A. Velasco-Mata, N. Vallez, O. Deniz, and G. Bueno, “Improving handgun detection through a combination
of visual features and body pose-based data,” Pattern Recognition, vol. 136, 2023, doi: 10.1016/j.patcog.2022.109252.
[6] J. Ruiz-Santaquiteria, A. Velasco-Mata, N. Vallez, G. Bueno, J. A. Alvarez-Garcia, and O. Deniz, “Handgun detection using
combined human pose and weapon appearance,” IEEE Access, vol. 9, pp. 123815–123826, 2021, doi: 10.1109/ACCESS.2021.3110335.
[7] E. Paiva-Peredo, A. Vaghi, G. Montù, and R. Bucher, “Human detection on antistatic floors,” SSRN Electronic Journal, 2022, doi:
10.2139/ssrn.4264059.
[8] M. Boukabous and M. Azizi, “Image and video-based crime prediction using object detection and deep learning,” Bulletin of
Electrical Engineering and Informatics, vol. 12, no. 3, pp. 1630–1638, 2023, doi: 10.11591/eei.v12i3.5157.
[9] S. A. A. Shah, A. H. Emara, A. A. Wahab, N. A. Algeelani, and N. A. Al-Sammarraie, “Street-crimes modelled arms
recognition technique employing deep learning and quantum deep learning,” Indonesian Journal of Electrical Engineering and
Computer Science, vol. 30, no. 1, pp. 528–544, 2023, doi: 10.11591/ijeecs.v30.i1.pp528-544.
[10] G. K. Verma and A. Dhillon, “A handheld gun detection using faster R-CNN deep learning,” in ACM International Conference
Proceeding Series, 2017, pp. 84–88, doi: 10.1145/3154979.3154988.
[11] J. Garcia-Pajuelo and E. Paiva-Peredo, “Comparison and evaluation of YOLO models for vehicle detection on bicycle paths,” IAES
International Journal of Artificial Intelligence, vol. 13, no. 3, pp. 3634–3643, 2024, doi: 10.11591/ijai.v13.i3.pp3634-3643.
[12] M. Grega, A. Matiolański, P. Guzik, and M. Leszczuk, “Automated detection of firearms and knives in a CCTV image,” Sensors,
vol. 16, no. 1, 2016, doi: 10.3390/s16010047.
[13] D. Romero and C. Salamea, “Convolutional models for the detection of firearms in surveillance videos,” Applied Sciences, vol. 9,
no. 15, 2019, doi: 10.3390/app9152965.
[14] R. S. Mehsen, “Deep learning algorithm for detecting and analyzing criminal activity,” International Journal of Computing, vol. 22,
no. 2, pp. 248–253, 2023, doi: 10.47839/ijc.22.2.3095.
[15] A. R. Raju, T. Maddileti, S. J, R. Srinivas, and K. Saikumar, “Pseudo trained YOLO R CNN model for weapon detection with a
real-time Kaggle dataset,” International Journal of Integrated Engineering, vol. 14, no. 7, 2022, doi: 10.30880/ijie.2022.14.07.011.
[16] S. K. Nanda, D. Ghai, P. Ingole, and S. Pande, “Analysis of video forensics system for detection of gun, mask, and anomaly using
soft computing techniques,” AIP Conference Proceedings, vol. 2800, no. 1, 2023, doi: 10.1063/5.0162900.
[17] S. A. A. Akash, R. S. S. Moorthy, K. Esha, and N. Nathiya, “Human violence detection using deep learning techniques,” Journal of
Physics: Conference Series, vol. 2318, no. 1, 2022, doi: 10.1088/1742-6596/2318/1/012003.
[18] H. Gao, “A YOLO-based violence detection method in IoT surveillance systems,” International Journal of Advanced Computer Science
and Applications, vol. 14, no. 8, pp. 143–149, 2023, doi: 10.14569/IJACSA.2023.0140817.
[19] M. Zahrawi and K. Shaalan, “Improving video surveillance systems in banks using deep learning techniques,” Scientific Reports,
vol. 13, no. 1, 2023, doi: 10.1038/s41598-023-35190-9.
[20] Y. Al-Smadi et al., “Early wildfire smoke detection using different YOLO models,” Machines, vol. 11, no. 2, 2023,
doi: 10.3390/machines11020246.
[21] D. A. Cadillo-Laurentt and E. A. Paiva-Peredo, “Histopathological image classification using convolutional neural networks for
detection of metastatic breast cancer in lymph nodes,” International Journal of Online and Biomedical Engineering, vol. 20, no. 2,
pp. 31–45, 2024, doi: 10.3991/ijoe.v20i02.46789.
[22] V. E. Sathishkumar, J. Cho, M. Subramanian, and O. S. Naren, “Forest fire and smoke detection using deep learning-based learning
without forgetting,” Fire Ecology, vol. 19, no. 1, 2023, doi: 10.1186/s42408-022-00165-0.
[23] F. M. Talaat and H. ZainEldin, “An improved fire detection approach based on YOLO-v8 for smart cities,” Neural Computing and
Applications, vol. 35, no. 28, pp. 20939–20954, 2023, doi: 10.1007/s00521-023-08809-1.
[24] P. Mehta, A. Kumar, and S. Bhattacharjee, “Fire and gun violence based anomaly detection system using deep neural networks,”
in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), IEEE, 2020, pp. 199–204, doi:
10.1109/ICESC48915.2020.9155625.
[25] L. Zhao, L. Zhi, C. Zhao, and W. Zheng, “Fire-YOLO: a small target object detection method for fire inspection,” Sustainability,
vol. 14, no. 9, 2022, doi: 10.3390/su14094930.
[26] S. N. Saydirasulovich, A. Abdusalomov, M. K. Jamil, R. Nasimov, D. Kozhamzharova, and Y.-I. Cho, “A YOLOv6-based improved
fire detection approach for smart city environments,” Sensors, vol. 23, no. 6, 2023, doi: 10.3390/s23063161.
[27] H. Zheng, J. Duan, Y. Dong, and Y. Liu, “Real-time fire detection algorithms running on small embedded devices based on
MobileNetv3 and YOLOv4,” Fire Ecology, vol. 19, no. 1, 2023, doi: 10.1186/s42408-023-00189-0.
[28] J. Lin, H. Lin, and F. Wang, “A semi-supervised method for real-time forest fire detection algorithm based on adaptively spatial
feature fusion,” Forests, vol. 14, no. 2, 2023, doi: 10.3390/f14020361.
BIOGRAPHIES OF AUTHORS
Kevin Acuña-Condori received his B.Eng. degree in electrical engineering from Universidad
Nacional Mayor de San Marcos, Perú, in 2015. He was awarded the CONCYTEC scholarship in 2015 for
his master’s studies in mechatronic engineering, which he completed at the Pontifical Catholic
University of Peru, Lima, in 2017. He has been a lecturer in electronic design, digital systems,
control and automation, digital image processing, artificial intelligence, and robotics at the
Universidad Tecnológica del Perú since 2017 and at the Pontifical Catholic University of Peru
since 2018. As a researcher at Universidad Tecnológica del Perú, he has authored publications
in bioengineering, artificial intelligence, and control. His research interests include bioengineering,
artificial intelligence, brain-computer interface, and neuroprosthetics. He can be contacted at email:
[email protected].