
An Effective Approach for Violence Detection using Deep Learning and Natural Language Processing

Versha Kumari
Department of Electronic Engineering (M.E Student)
Mehran University of Engineering and Technology
Jamshoro, Pakistan
[email protected]

Khuhed Memon
Department of Electronic Engineering (Assistant Professor)
Mehran University of Engineering and Technology
Jamshoro, Pakistan
[email protected]

Burhan Aslam
Department of Electronic Engineering (Lab Supervisor)
Mehran University of Engineering and Technology
Jamshoro, Pakistan
[email protected]

Prof Dr Bhawani Shankar Chowdhry
NCRA-CMS LAB
Mehran University of Engineering and Technology
Jamshoro, Pakistan
[email protected]
2023 7th International Multi-Topic ICT Conference (IMTIC) | 979-8-3503-3846-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/IMTIC58887.2023.10178618

Abstract—An effective tool for violence detection is in high demand given the rise in crime rate in today's era. Artificial Intelligence can play a significant role in violence detection and monitoring to tackle various security and safety concerns. This research proposes strategies to incorporate Deep Learning and Natural Language Processing (NLP) to simultaneously detect anomalous objects and scenarios from videos using TensorFlow, and aggressive, offensive, and hate speech from the audio channel of surveillance cameras. It aims to automatically detect violence in real-time from surveillance footage by using TensorFlow custom object detection to identify firearms, robbery, fistfights, sexual harassment, and fire in successive images from the video feed. In addition, the audio channel of such surveillance cameras can be useful for detecting hate speech, verbal sexual abuse, and profanity. The proposed system includes an alert mechanism that automatically notifies the security administrator when any type of violence is detected, enabling timely intervention to prevent potential damage to society. The developed models can be deployed on any existing surveillance system with next to negligible additional hardware and software resource requirements, thereby making it an efficient, fast, accurate, and economical solution. To train the model, custom datasets were designed for 6 categories in images and 2 categories in speech. The accuracy of the developed system was found to be 84%, with adequate performance under various luminance conditions, including night vision images.

Index Terms—Violence Detection, Object Detection, Deep Learning, Natural Language Processing, Artificial Intelligence, TensorFlow, Smart Surveillance.

I. INTRODUCTION
Security, women's safety, and law enforcement are among the most important factors for the development and prosperity of a country. This highlights the significance of surveillance systems and violence detection. Moreover, violence involving pistols, fire, snatching, sexual harassment, and fist fights can occur at any time, and human monitoring is an inefficient way to detect such unusual and unpredictable events. The technological world has been modernized by using the algorithms of Artificial Intelligence, Machine Learning, and Natural Language Processing. These smart systems have replaced continuous monitoring by humans and minimized the occurrence of errors in violence detection. Machine learning and NLP produce results with greater accuracy when models are trained with adequate datasets. In this research, the primary focus is on violence detection through Artificial Intelligence, which is divided into two portions. The first portion is object-based detection, which involves identifying acts of violence such as fire, snatching, fist fights, and the use of weapons like pistols, as well as instances of sexual harassment. The second is speech-based detection, which focuses specifically on detecting instances of abuse by analyzing spoken language. The developed models can be deployed on any existing surveillance system with next to negligible additional hardware and software resource requirements, thereby making it an efficient, fast, accurate, and economical solution.

The paper is structured as follows. Work related to violence detection, either by object or by speech, using various algorithms is discussed in Section II. Section III provides information on the equipment, algorithm, experimental setup, and implementation methodology of the developed violence detector. Section IV evaluates the performance of the developed system and presents the results, and the paper is concluded in Section V.

II. LITERATURE REVIEW
Many researchers have proposed techniques to detect violence through computer vision or deep learning. Violence can occur at any time and in several ways, such as the use of a pistol in a crowd, fire, fist fights, handbag snatching, and sexual harassment. These categories of violence are common in today's world, and detecting them automatically, in place of continuous human monitoring, has become a significant research topic.

Authorized licensed use limited to: Chaitanya Bharathi Institute of Tech - HYDERABAD. Downloaded on November 06,2023 at 04:03:22 UTC from IEEE Xplore. Restrictions apply.
Several studies address violence detection using object and text detection. [1], [2] provide a comprehensive analysis of the current state and emerging trends in violence detection research, including the categorization of methods, addressing challenges, and presenting datasets for testing. [3] introduces BrutNet, a hybrid model combining DCNN and GRU architectures, for automatic violence detection and classification in videos. Convolutional Neural Networks (CNN) have been used to detect weapons in video surveillance [4]. [5] proposes using YOLO-V3 for real-time automated visual surveillance to detect handguns. In the detection system proposed in [6], a pre-trained deep learning model, MobileNet V3-SSD Lite, is used. [7] proposes a deep learning-based method for predicting abnormal events in daycare environments using networked surveillance systems and IoT devices, with superior performance compared to previous methods, by utilizing multi-classifiers, deep neural networks, and kernel density functions for dynamic activity prediction. Different algorithms have been used to detect fire and smoke in different areas in order to improve the system's efficiency and speed [8], [9], [10]. [11] proposes an intelligent surveillance system that automatically detects multiple anomalous activities in videos, utilizing moving object detection, object tracking, and behavior understanding, with a detection accuracy of up to 90% based on experimental results, addressing the need for efficient monitoring of surveillance videos in public places.

Sexual, abusive, and hate speech detection is useful to prevent bullying and harassment, as these crimes are rising rapidly in the world. Different approaches have been used to build models for this purpose. One approach [12], [13] used ML algorithms, e.g., Random Forest, Multinomial Naïve Bayes, and SVM with linear and Radial Basis Function kernels, and compared them using Count Vectorizer and Tfidf Vectorizer features, while [14] evaluates the system performance by using ML algorithms with an accuracy of 0.97.

In the past, many researchers worked on the individual detection of unusual objects and events, but none of them worked on a combined model able to detect both objects and text. Text classification has been done in previous work, but none of it used audio and video input from surveillance cameras to simultaneously detect objects and text using two different approaches. In this work, custom object detection is trained on five categories using TensorFlow and text classification on two categories using NLP, and the two run in parallel. The proposed model can be deployed in a real-time surveillance system with no additional tool and results in greater accuracy using the ML technique.

III. IMPLEMENTATION METHODOLOGY
This section elaborates in detail on the approaches incorporated to build the system. The proposed system has been developed using two algorithms. The first is TensorFlow custom object detection, used to detect violence, in particular pistol, fire, fist fight, snatching, and sexual harassment, within the scope of the surveillance camera. The other is sexual abuse and hate speech text detection using the Natural Language Toolkit (NLTK) library for NLP. This section also provides the details pertaining to the hardware and software used in the development of the proposed model. A detailed explanation of both approaches is provided separately. The block diagram of the proposed system is given in Fig. 1.

Fig. 1: Development Process Block Diagram

Initially, datasets pertaining to the 5 image categories were acquired from existing datasets available online, and custom additions were made to them. This is further discussed in Section A. Similarly, the text dataset was also acquired online and merged with the created dataset (as discussed in Section B) for binary classification of speech from microphones after converting it to text. After the dataset acquisition phase, two separate models were trained for images and text and deployed using parallel processing in Python. Upon detection of any anomaly from either one or both models, a warning signal is generated, along with the location of the violent activity, so that the security administration can respond quickly and avoid any mishap.

The recommended hardware and software requirements are given in TABLE I and TABLE II respectively. However, it is feasible to reproduce the system without these requirements, although the accuracy, training time, and real-time performance may differ.

A. Object Detection
Object detection can serve as an efficient tool for detecting a set of objects in an image or a video feed, along with the location of the given object(s) in the scene. The object detection model is deployed to locate the 5 categories of objects/scenarios for violence detection. Training and testing were carried out with images of different resolutions, captured at different distances and under various lighting conditions. The details of the dataset and TensorFlow model used are given in this section.

1) Dataset Acquisition and Training: To build a dataset to achieve the goals of this research, images were collected from various sources including Mehran University IICT building security cameras, an Android phone IP camera, datasets available online, and downloads from Google Images. This enabled
us to build a custom dataset with 5 categories to efficiently cater to the local surrounding conditions, dressing styles, and skin tones. The dataset was then labeled, with 80% of the total images used for training and 20% used for testing. Details pertaining to the dataset are presented in TABLE III, which shows the dataset classes, the total number of custom and downloaded images, and the resolution range. A sample subset of the dataset used is shown in Fig. 2.

TABLE I: Recommended hardware requirement

Workstation
  Model        Lenovo Legion Y545
  CPU          Intel i7-9750H 2.6 GHz (9th Generation)
  GPU          Nvidia GeForce GTX 1660Ti 6 GB
  RAM          16 GB
  Storage      1 TB HDD + 512 GB SSD
  OS           Windows 10 Home
Smart Phone (For IP Webcam)
  Device name  OPPO A53
  Model        CPH2127
  CPU          Qualcomm SM4250 Octa Core
  RAM          4.00 GB
  Storage      64.0 GB
  OS           Android 10

TABLE II: Recommended software requirement

  Software        Version
  Python          3.7.6
  TensorFlow      1.14
  TensorFlow-GPU  1.14
  Nvidia CUDA     10.0
  CuDNN           7.4.1
  Bazel           0.24.1
  Keras           2.3.0
  NLTK            3.5

TABLE III: Dataset Acquisition

Total number of custom images
  Object Class (Label)  Test  Train  Min Resolution  Max Resolution
  Pistol                45    49     440 x 537       1176 x 1095
  Fire                  15    45     734 x 734       3120 x 4160
  Handbag Snatching     25    85     734 x 734       840 x 1077
  Fist Fight            10    28     848 x 680       984 x 680
  Sexual Harassment     14    50     734 x 734       806 x 1002

Total number of downloaded images
  Object Class (Label)  Test  Train  Min Resolution  Max Resolution
  Pistol                950   3655   255 x 198       987 x 638
  Fire                  150   482    400 x 324       600 x 400
  Handbag Snatching     196   800    274 x 184       851 x 635
  Fist Fight            100   800    276 x 183       300 x 168
  Sexual Harassment     96    440    275 x 183       300 x 168

The custom image dataset is smaller than the downloaded image dataset. However, it provides real-time images during the training process, which can improve the accuracy and performance of the system when deployed in a real-time environment. The custom dataset also offers flexibility and diversity by capturing images in different scenarios, such as night vision images and varying light intensities, as well as capturing images from different angles. This allows the model to learn and generalize well, enabling accurate performance in various real-world situations and making it more robust and adaptable. The resolution of the images in the custom dataset is limited, as indicated in TABLE III, ranging from the minimum resolution to the maximum resolution. Although higher-resolution images in a real-time environment could potentially yield better accuracy, the accuracy described in this paper is achieved within the range from minimum to maximum resolution.

Fig. 2: Dataset

2) TensorFlow Custom Object Detection: After dataset acquisition, every individual image in the dataset is annotated using LabelImg. All the objects in each image are enclosed in a bounding box and assigned the label of the respective category. LabelImg creates an XML file for every image in the dataset that describes the object and its location in the corresponding image, as shown in Fig. 3 and Fig. 4. The dataset is then split into training (80%) and testing (20%) data, and the train and test data are converted into CSV files. TF records are then generated from these train and test files and used for custom object detection with TensorFlow on a GPU. Training on a GPU reduces the training time compared to a CPU. Before training the model, a label map is created, which tells the trainer what each object is by writing a mapping of class names to class ID numbers. A configuration (.config) file is required for the training to start. Faster-RCNN-Inception-V2-COCO is used to train the model for custom object detection. The inference graph of the trained model is then exported and used for deployment in Python, where images from the IP Webcam are analyzed for violence detection in real-time.
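The 80/20 split and the label map described in the TensorFlow custom object detection step can be sketched as follows. This is only an illustration under stated assumptions: the class name strings, the file names, the random seed, and the helper names `split_dataset` and `build_label_map` are hypothetical and are not taken from the paper. The label map text follows the `item { id ... name ... }` layout used by the TensorFlow Object Detection API, with IDs starting at 1.

```python
import random

# Hypothetical class names for the five categories; the exact strings
# used by the authors are not given in the paper.
CLASSES = ["pistol", "fire", "handbag_snatching", "fist_fight", "sexual_harassment"]


def split_dataset(filenames, train_fraction=0.8, seed=0):
    """Shuffle the file names and split them into train and test lists."""
    shuffled = list(filenames)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_label_map(class_names):
    """Return label map text that maps each class name to an integer ID.

    IDs start at 1 because ID 0 is reserved for the background class.
    """
    entries = []
    for class_id, name in enumerate(class_names, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (class_id, name))
    return "\n".join(entries)


if __name__ == "__main__":
    images = ["img_%03d.jpg" % i for i in range(100)]  # placeholder file names
    train_files, test_files = split_dataset(images)
    print(len(train_files), "training images,", len(test_files), "test images")
    print(build_label_map(CLASSES))
```

The text returned by `build_label_map` could be saved to a label map file and referenced from the training configuration; the paper does not state the file name or location it used.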

Fig. 3: Object Detection Block Diagram

Fig. 4: Images with Labels

B. Text Detection
After object detection, this paper proposes another approach to detect and recognize text by filtering inappropriate text content with NLP techniques. The following steps were followed for text recognition:
1) Build labeled text datasets for the following:
• Sexual abuse
• Hate speech
• Abusive/offensive language
2) The proposed system converts speech to text, which is followed by text recognition using NLTK.
3) Train an ML text model for real-time speech analysis using Keras.
4) Feed the live webcam video to this incorporated system using a Python interface.
5) Tune the system parameters for speed and accuracy using TensorFlow with GPU.
6) On detection of any abnormal scenario or abusive speech, an alarm/notification system is activated to inform the security administration.

1) Dataset Acquisition: In order to build a system that can detect abusive speech, a model was trained on a text dataset with two classes (abusive and normal speech). The corpus comprises 50% abusive text and 50% normal text, used to identify the type of speech. The dataset was generated using a combination of custom sentences and sentences downloaded from the internet, both falling within the minimum and maximum word count ranges specified in TABLE IV. TABLE V presents a sample of the text dataset that includes both abusive and non-abusive sentences. The analysis of text data in Fig. 5 was conducted using a word cloud, which shows more frequent words in larger sizes. It provides a quick and easy way to identify the most prominent words in a dataset.

TABLE IV: Text Dataset

Total number of custom sentences
  Text Class (Label)  Train  Min Words  Max Words
  Abusive             95     10         25
  Normal              88     6          15

Total number of downloaded sentences
  Text Class (Label)  Train  Min Words  Max Words
  Abusive             590    5          22
  Normal              290    7          25

TABLE V: Sample Text Dataset

Abusive Sentences
  He punched me in the back of the fuckin' head, right, he punched me in the back of the fuckin' head.
  Please stop hassling me. Just leave me and go. Don't beat me.
  Someone, please help. I am very scared, please do not kill and hit me.
Normal Sentences
  This outfit looks really nice on me, I should wear this color often.
  Good morning, have a good day.
  Hey, I hope you win today's match.

Fig. 5: Text Data Analysis

2) Text Model Using NLP: The text classification process can be divided into five stages, as shown in Fig. 6.

Fig. 6: Text Classification Stages

C. Proposed Model
Upon execution, both the image-based and text-based ML models are deployed and run in parallel on video and audio inputs from CCTV cameras in real-time. The surveillance camera
continuously monitors the environment and looks for objects or scenarios such as fire, pistol, snatching, fist fight, and sexual harassment, as well as abusive speech. The Python interface uses parallel processing to perform the two tasks (object and text detection) at the same time. On detection of any abnormal scenario or abusive speech, an alarm/notification system is activated to inform the security administration. The flowchart of the proposed system is given in Fig. 7.

Fig. 7: Flowchart

Fig. 8: Pistol Detection

Fig. 9: Fire Detection

Fig. 10: Fist Fight and Handbag Snatching Detection: (a) Fist Fight Detection, (b) Handbag Snatching Detection
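The parallel deployment described in Section III-C can be illustrated with a minimal sketch. The paper states only that parallel processing in Python is used; the choice of threads with a shared queue, the function names, and the callables `detect_objects` and `classify_text` (stand-ins for the trained image and text models) are assumptions made here for illustration. The 0.5 threshold for abusive speech is the one reported with TABLE IX.

```python
import queue
import threading


def object_worker(frames, detect_objects, alerts):
    """Run the image model on every frame and queue an alert per detection."""
    for frame in frames:
        for label in detect_objects(frame):
            alerts.put(("video", label))


def speech_worker(utterances, classify_text, alerts, threshold=0.5):
    """Run the text model on every transcribed utterance.

    A score at or above the threshold is treated as abusive speech.
    """
    for text in utterances:
        if classify_text(text) >= threshold:
            alerts.put(("audio", text))


def run_parallel(frames, utterances, detect_objects, classify_text):
    """Process the video and audio streams in two threads and collect alerts."""
    alerts = queue.Queue()
    workers = [
        threading.Thread(target=object_worker, args=(frames, detect_objects, alerts)),
        threading.Thread(target=speech_worker, args=(utterances, classify_text, alerts)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    collected = []
    while not alerts.empty():  # safe here: both producers have finished
        collected.append(alerts.get())
    return collected


if __name__ == "__main__":
    # Stub models used only to exercise the sketch.
    fake_detector = lambda frame: ["pistol"] if frame == "frame_with_pistol" else []
    fake_classifier = lambda text: 0.9 if "kill" in text else 0.1
    for source, detail in run_parallel(
        ["empty_frame", "frame_with_pistol"],
        ["good morning", "I want to kill you"],
        fake_detector,
        fake_classifier,
    ):
        print("ALERT from", source, "->", detail)
```

In a deployment, the alerts would be forwarded to the notification mechanism for the security administration instead of being printed. Separate processes could be used in place of threads if the two models compete for CPU time.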

IV. RESULTS AND DISCUSSION
This section discusses the experimental results obtained from object and text detection using TensorFlow and NLP, respectively, for the aforementioned scenarios. The results are divided into two sections.

A. Object Detection
In this subsection, the object detection results obtained using the Faster-RCNN algorithm are presented. The IP Webcam has been used as the video channel of a surveillance camera at the IICT building, MUET Jamshoro. The findings of the object detection model reveal that it is highly efficient in detecting pistols, as evidenced by real-time results obtained in a deployed environment. The model demonstrates high accuracy in detecting five categories: pistol, fire, fighting, handbag snatching, and harassment, with minimal false positives, as illustrated in Fig. 8-11 (a). However, there are instances where invalid detection occurs due to unclear images provided to the system, as illustrated in Fig. 11 (b). The model also performs well in night vision scenarios, achieving high accuracy.

Fig. 11: Sexual Harassment and Invalid Detection: (a) Sexual Harassment Detection, (b) Invalid Detection

Fig. 12: Model Performance in Night Vision

The model has been deployed in various environments with varying light intensities to assess its correctness, performance, and accuracy under different conditions, as shown in TABLE VI. All five object detection categories (Pistol, Fire, Handbag Snatching, Sexual Harassment, and Fighting) were evaluated at different LUX levels, and false positives were calculated. A detection count of one indicates accurate detection with good accuracy, while a count of two represents incorrect detection of two objects in real-time. Evaluating the performance of the object detector in terms of correctness, accuracy, and efficiency is crucial.
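The detection-count rule stated above can be expressed as a short sketch. The rows are copied from TABLE VI; treating any count above one as a false positive is an interpretation of the text, and the function names are hypothetical.

```python
from collections import Counter

# Rows copied from TABLE VI: (object type, LUX, number of detections).
TABLE_VI = [
    ("Fire", 117, 1), ("Fire", 31, 1),
    ("Fighting", 120, 2), ("Fighting", 20, 1), ("Fighting", 215, 1),
    ("Sexual Harassment", 120, 2), ("Sexual Harassment", 117, 1),
    ("Sexual Harassment", 136, 2),
    ("Handbag Snatching", 118, 1), ("Handbag Snatching", 23, 1),
    ("Pistol", 117, 1), ("Pistol", 25, 1), ("Pistol", 20, 1),
]


def is_false_positive(num_detections):
    """One detection is the expected outcome; more than one means an extra,
    incorrect object was reported in the same trial (assumed rule)."""
    return num_detections > 1


def false_positives_per_class(rows):
    """Count the trials flagged as false positives for each object type."""
    counts = Counter()
    for object_type, _lux, num_detections in rows:
        if is_false_positive(num_detections):
            counts[object_type] += 1
    return dict(counts)


if __name__ == "__main__":
    print(false_positives_per_class(TABLE_VI))
```

Applied to these rows, the rule flags trials 3, 6, and 8, which are the rows marked "Yes" in the False Positives column of TABLE VI.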

TABLE VI: Performance of object detector with different light intensities

  S. No.  Object Type        LUX (Light Intensity)  False Positives  Number of Detections
  1       Fire               117                    No               1
  2       Fire               31                     No               1
  3       Fighting           120                    Yes              2
  4       Fighting           20                     No               1
  5       Fighting           215                    No               1
  6       Sexual Harassment  120                    Yes              2
  7       Sexual Harassment  117                    No               1
  8       Sexual Harassment  136                    Yes              2
  9       Handbag Snatching  118                    No               1
  10      Handbag Snatching  23                     No               1
  11      Pistol             117                    No               1
  12      Pistol             25                     No               1
  13      Pistol             20                     No               1

Fig. 13: Confusion Matrix

The confusion matrix in Fig. 13 summarizes the results of testing the model's performance on a total of 10,493 images.

TABLE VII: Precision and Recall Values

  Category           Precision  Recall
  Fist Fight         0.87       0.81
  Fire               0.99       0.99
  Handbag Snatching  0.79       0.83
  Pistol             0.98       0.97
  Sexual Harassment  0.79       0.81

The precision and recall measures shown in TABLE VII were calculated from the values in the confusion matrix by using Eq. (1) and (2).

$$\text{Precision} = \frac{\text{True positive}}{\text{True positive} + \text{False positive}} \tag{1}$$

$$\text{Recall} = \frac{\text{True positive}}{\text{True positive} + \text{False negative}} \tag{2}$$

$$\text{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}} \tag{3}$$

The accuracy and loss metrics are employed to assess the performance of the TensorFlow Custom Object Detection model. As given in TABLE VIII, the training dataset demonstrates a high accuracy of 91%, while the validation dataset exhibits a slightly lower accuracy of 86%, and the testing dataset shows a further decrease to 84%. The accuracy and loss metrics have been evaluated on the training, validation, and testing datasets, providing a comprehensive overview of the model's performance.

TABLE VIII: Accuracy and Loss

  Dataset     Accuracy  Loss
  Training    0.91      0.021
  Validation  0.86      0.0103
  Testing     0.84      0.0320

Fig. 14 illustrates the concept of Intersection over Union (IoU) as applied to a violence detection model. IoU serves as a performance and efficiency metric by quantifying the overlap or similarity between the predicted bounding box (detection) and the ground truth bounding box of an object. IoU has been calculated using Eq. (3). As shown in Fig. 14, the IoU is observed to be greater than 0.5, indicating that the detections are classified as true positives.

Fig. 14: Intersection over Union for Object Detection

The image acquisition time is the average time per frame needed to acquire an image from the IP camera, and is due to network latency. The model inference time is the average time per frame taken by the model after acquisition; it is very low compared to the network latency. The total time required for object detection is the sum of the image acquisition time and the model inference time. Fig. 15 presents the graphs for image acquisition, model inference, and overall processing time required for object detection.

Considering the average overall processing time, as shown in Fig. 15 (c), the developed system can serve as a fast, real-time tool for violence detection.

B. Text Detection
This research proposes a framework to detect abusive, sexual, and hate speech using the audio channel of the surveillance camera. For experimentation, a Python-based testing environment was built using text classification in NLP. This system uses the NLTK library to classify the text. The model summary in Fig. 16 provides crucial details of the Text Detection model trained using NLP, including the utilization
of a sequential model, the number of layers and units in each layer, the activation functions employed, the input and output dimensions, and the total count of trainable parameters in the model. The hyperparameters are defined in the model with a vocabulary size of 5000, an embedding dimension of 16, and a maximum sentence length of 50.

Fig. 15: Overall Processing Time for Object Detection: (a) Image Acquisition Time, (b) Model Inference Time, (c) Overall Processing Time

Fig. 16: Model Summary

TABLE IX summarizes the outcomes obtained from the text classification of abusive and normal speech using NLP. The results are categorized based on a predefined threshold, where a test sample with a score less than 0.5 is classified as normal speech, while a score greater than or equal to 0.5 indicates abusive speech. The table further details the occurrences of true positives and false positives, providing insights into the performance and efficiency of the model.

TABLE IX: Text Classification Results

  S. No.  Test Sample                                                                Score  Abusive Speech  Normal Speech  True Positive  False Positive
  1       You bitch stay away from me                                                0.97   Yes             No             Yes            No
  2       Good Morning                                                               0.02   No              Yes            Yes            No
  3       You bloody, Stay away from me, I gonna kill you                            0.94   Yes             No             Yes            No
  4       I really curse those people who speak bad and abusive language in society  0.53   Yes             No             Yes            No
  5       She was threatened by her neighbor                                         0.99   Yes             No             No             Yes

Fig. 17: Confusion Matrix

TABLE X: Accuracy and Loss

  Dataset     Accuracy  Loss
  Training    1         0.0043
  Validation  0.94      0.015
  Testing     0.875     0.1

TABLE XI: Processing Time with and Without Google API

  S. No.  Sentence                                    Classification  Processing Duration with Google API (seconds)  Processing Duration without Google API (seconds)
  1       I want to kill you                          Abusive (0.94)  1.48                                           0.26
  2       Go to hell, you fucking bastard             Abusive (0.9)   2.34                                           0.26
  3       I want to go for a picnic tomorrow morning  Normal (0.2)    1.99                                           0.27

The confusion matrix for text classification is shown in Fig. 17. TABLE X demonstrates the accuracy and loss of
training, validation, and testing datasets. TABLE XI presents the processing time taken by the system with and without Google API. Fig. 18 illustrates a graph that depicts the system's performance when using the Online Google API and the Offline Google API. The results indicate that the Online Google API takes a longer time to process compared to the Offline Google API. However, utilizing the Offline Google API may result in higher memory consumption.

Fig. 18: Overall Processing with and without Google API

V. CONCLUSION
This model proved to be an efficient, fast, accurate, and economical solution for violence detection with no additional hardware and software requirements. The system is able to detect violence through surveillance cameras in environments with low light intensity. During the model implementation in the AV room of the IICT building, it was observed that pistol and fire detection show greater accuracy than the other three classes. The Sexual Harassment and Fist Fight classes produce false positives and therefore show lower accuracy. This system can be implemented in universities, hospitals, banks, etc. for public safety, to detect violence efficiently, as it can detect speech and objects simultaneously. The three categories of Sexual Harassment, Handbag Snatching, and Fist Fight have more false positives among them since they look similar. In the future, this can be improved by either increasing the dataset or merging these three categories into one (physical abnormal activity), in which case the accuracy is expected to increase. The system can achieve much better accuracy if the dataset is increased for both object detection and text classification. This can be done in the future to remove false positives and increase the accuracy level.

ACKNOWLEDGMENT
This research work has been carried out in Research Lab-1 and the Audio Video Conference Room, IICT Building, Mehran University of Engineering and Technology, Jamshoro. The authors are thankful to the Dean FEECE, Professor Dr. Mukhtiar Ali Unar, for granting permission to use the IICT building CCTV surveillance cameras to build the dataset for violence detection. We are also thankful to Engr. Ghulam Mustafa Baloch, the officer in charge of the Video Conferencing System, IICT, Mehran UET Jamshoro, for facilitating this research by providing the CCTV recordings for building up the custom dataset.

REFERENCES
[1] F. U. M. Ullah, M. S. Obaidat, A. Ullah, K. Muhammad, M. Hijji, and S. W. Baik, "A comprehensive review on vision-based violence detection in surveillance videos," ACM Computing Surveys, vol. 55, no. 10, pp. 1–44, Feb. 2023.
[2] B. Omarov, S. Narynov, Z. Zhumanov, A. Gumar, and M. Khassanova, "State-of-the-art violence detection techniques in video surveillance security systems: a systematic review," PeerJ Computer Science, vol. 8, p. e920, Apr. 2022.
[3] M. Haque, S. Afsha, and H. Nyeem, "An efficient deep learning model for violence detection," 2023.
[4] H. Jain, A. Vikram, Mohana, A. Kashyap, and A. Jain, "Weapon detection using artificial intelligence and deep learning for security applications," in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC). IEEE, Jul. 2020.
[5] A. Warsi, M. Abdullah, M. N. Husen, and M. Yahya, "Automatic handgun and knife detection algorithms: A review," in 2020 14th International Conference on Ubiquitous Information Management and Communication (IMCOM). IEEE, Jan. 2020.
[6] M. Ghazal, N. Waisi, and N. Abdullah, "The detection of handguns from live-video in real-time based on deep learning," TELKOMNIKA (Telecommunication Computing Electronics and Control), vol. 18, no. 6, p. 3026, Dec. 2020.
[7] G. Vallathan, A. John, C. Thirumalai, S. Mohan, G. Srivastava, and J. C.-W. Lin, "Suspicious activity detection using deep learning in secure assisted living IoT environments," The Journal of Supercomputing, vol. 77, no. 4, pp. 3242–3260, Jul. 2020.
[8] M. Grega, A. Matiolański, P. Guzik, and M. Leszczuk, "Automated detection of firearms and knives in a CCTV image," Sensors, vol. 16, no. 1, p. 47, Jan. 2016.
[9] M. S. Allauddin, G. S. Kiran, G. R. Kiran, G. Srinivas, G. U. R. Mouli, and P. V. Prasad, "Development of a surveillance system for forest fire detection and monitoring using drones," in IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium. IEEE, Jul. 2019.
[10] A. Namozov and Y. I. Cho, "An efficient deep learning algorithm for fire and smoke detection with limited data," Advances in Electrical and Computer Engineering, vol. 18, no. 4, pp. 121–128, 2018.
[11] S. Chaudhary, M. A. Khan, and C. Bhatnagar, "Multiple anomalous activity detection in videos," Procedia Computer Science, vol. 125, pp. 336–345, 2018.
[12] T. Davidson, D. Warmsley, M. Macy, and I. Weber, "Automated hate speech detection and the problem of offensive language," Proceedings of the International AAAI Conference on Web and Social Media, vol. 11, no. 1, pp. 512–515, May 2017.
[13] G. M. Barrientos, R. Alaiz-Rodríguez, V. González-Castro, and A. C. Parnell, "Machine learning techniques for the detection of inappropriate erotic content in text," International Journal of Computational Intelligence Systems, vol. 13, no. 1, p. 591, 2020.
[14] F. Husain, "Arabic offensive language detection using machine learning and ensemble machine learning approaches," 2020.
