Human Activity Detection Using Deep Learning
Abstract: For many academics, human activity recognition (HAR) is a hot topic. A number of cutting-edge technologies make it tractable, including deep learning, which is useful in many contexts. While most of the existing work has focused on wearable sensor data, such data is not always practical to obtain. The proposed study mines publicly accessible video datasets for human activity detection using deep learning techniques, including a convolutional neural network (CNN) and long short-term memory (LSTM). The CNN extracts relevant characteristics from the input data, whereas the LSTM discards superfluous data to improve performance. Precision and recall derived from the confusion matrix are used to evaluate the suggested technique. Accuracy is high across the board: the diagonals of the confusion matrices for all actions are near to 1.
1. Introduction
The ability to identify human actions has become fundamental. Classifying a person's actions in real time from a series of sensor data [1] or visual data collected from a variety of input sources is the challenge known as human activity recognition. To accomplish its goal of identifying human behaviour, activity, or condition, human identification systems use data from a wide variety of sources [2]. As wireless data transmission, Bluetooth, and cellular data continue to advance, this data may be quickly moved to new media and put to use in modelling. A human's motion may be tracked in real time [3] without any lag. In spite of its rising popularity over the last decade, there are still many obstacles to overcome before raw input data can be reliably and quickly translated into well-defined, actionable motion. The identification of human actions from still or moving images is a difficult problem: issues with scale, clutter, occlusion, perspective, lighting, and overall presentation must be addressed.

Video surveillance, human-computer interaction, and human behaviour recognition are just a few examples of the many uses for multi-activity recognition. Developing an automated system that can properly identify the activity being performed by a person by evaluating data from a variety of sources is the primary goal of HAR. The procedure may vary based on the nature of the data source, the nature of the input data, the model training architecture, the kind of activities being performed, and the intended use of the system.

HAR has proven useful in a variety of settings, from automated surveillance to healthcare, elder care, sports, robotics, security, and media broadcasting, and it has shown great promise in solving real-world problems by integrating technology. In addition [4], the use of vision datasets to prevent potentially risky actions and identify criminals is a major application.

Human activity recognition has demonstrated promising results and substantial demand in the field of automated video surveillance [5]. "These systems are ideal for intelligent crowd surveillance in shopping malls, games, live concerts, streets and highways, crossroads, traffic lights, and parking lots, since they can easily identify unwanted and suspicious activities and track people in the crowd." To recognise moving objects, a convolutional neural network combined with an LSTM model is fed frames from a video feed as input [6] and trained to extract temporal and spatial features. As an added benefit, this may help mitigate risks by decreasing response times, balancing the workload of security staff, and immediately alerting the appropriate parties. Redundant frame detection [7], applied as pre-processing using a convolutional neural network, can yield successful results in classifying various human activities. The pre-processing of video frames greatly aids the classification of human actions.
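The frames-to-CNN-to-LSTM pipeline described above can be sketched in miniature. The block below is an illustrative sketch only: the frame size, the mean-pooling stand-in for the CNN's feature extraction, and the plain tanh recurrence standing in for the LSTM are all assumptions for demonstration, not the architecture actually trained in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": 8 frames of 16x16 grayscale pixels.
video = rng.random((8, 16, 16))

def frame_features(frame):
    """Stand-in for the CNN's spatial feature extraction:
    mean-pool the frame into a 2x2 grid -> 4 features per frame."""
    h, w = frame.shape
    return frame.reshape(2, h // 2, 2, w // 2).mean(axis=(1, 3)).flatten()

# Stand-in for the LSTM's temporal modelling: a simple recurrent
# update that folds per-frame features into a running state.
W_x = rng.standard_normal((4, 4)) * 0.1
W_h = rng.standard_normal((4, 4)) * 0.1
h = np.zeros(4)
for frame in video:
    h = np.tanh(W_x @ frame_features(frame) + W_h @ h)

print(h.shape)  # (4,): a temporal summary a classifier head would consume
```

In the real system the pooling step is replaced by learned convolutional filters and the recurrence by LSTM gates, but the data flow (per-frame spatial features, then temporal aggregation) is the same.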
1 Department of Computer Science and Engineering, Nalanda College of Engineering, Chandi, India, [email protected]
2 Department of Computer Science and Engineering, National Institute of Technology Patna, Patna-800005, India
3 Department of Computer Science and Engineering, National Institute of Technology Patna, Patna-800005, India

International Journal of Intelligent Systems and Applications in Engineering IJISAE, 2023, 11(10s), 535–542 | 535

Image/video from cameras, video recording devices, surveillance cameras, 3D cameras, Microsoft Kinect cameras, infrared sensors, etc. is used for activity detection in vision-based activity recognition [8]. A Kinect camera can capture depth images, whereas a standard camera can only capture 2D images. Security cameras and YouTube supply a wealth of fresh visual data, but they also bring problems. These include, but are not limited to: background clutter, partial occlusion, viewpoint bias, inconsistent lighting, awkward camera angles, and shadows.

Despite the importance of video surveillance, there are several obstacles [9] to overcome when trying to identify human behaviours in recordings, such as rapid view shifts and a lack of adequate view angles. These difficulties can cause information loss over longer periods, and it is difficult to link the target over spatial proximity when the camera position changes. Associating targets in an interval with little available information calls for a fresh strategy [10]. Untrimmed depth recordings make it difficult to identify and classify posture changes. For the classification of posture-change segments or body movements, [11] uses a CNN to extract characteristics from video frames.

The majority of studies on human activity recognition have used vision or wearable sensors. Problems arise when trying to use wearable sensors [12] for HAR, since they need to be attached to the subject's body, which is not always possible. There is an alternative to wearable sensors for HAR: using video frames captured by cameras.

2. Deep Learning

Artificial neural networks (ANNs) replicate human brain function. Neural networks learn from training input to predict or classify output. Input and output data are analysed by the ANN architecture, which then finds patterns and correlations. Input layers, hidden layers, and an output layer make up an ANN. Neural networks excel at image identification tasks, and for this purpose CNNs stand out as the optimal architecture. Deep learning is a subfield of artificial neural networks that has found use in a variety of problem-solving contexts, including but not limited to the ones listed above. In deep learning [13], a unified algorithm with a single kind of activation function is used to handle input at each successive layer. Data characteristics useful for instruction, discovery, and comprehension are built layer by layer.

3. Discriminative Deep Learning Models

Discriminative feature learning models are built with consequent distribution classes to boost their classification and recognition powers. CNNs, RNNs, and other discriminative deep learning models are used to identify human behaviour.

3.1 Convolutional Neural Network (CNN)

Deep learning CNNs can detect and extract attributes from an input image using learnable biases and weights. The main strengths of a CNN [14] are its ability to capture both temporal and spatial relationships in an image and to shrink the image without sacrificing the aspects that help build a more scalable prediction model. A conventional CNN has two primary parts. As illustrated in Figure 1, these are feature extraction and classification.
Fig 3: KTH [16] Activity Dataset
5.2 Data preparation

TensorFlow with the Keras backend was used to build both the CNN and the RNN. The RNN's input format is distinct from the CNN's: each RNN data sample has to be a data sequence. In the study at hand, a single data sequence corresponds to a single trial of an activity performed by a single volunteer. The RNN requires uniformly sized input data samples, but the sizes of the individual data files vary, so cropping is applied before input so that all RNN data files are of the same length.

5.3 Results and Analysis

We conduct experiments on the KTH activity dataset and analyse the results using a model pre-trained on the KTH dataset. 70% of the data is used for training and the remaining 30% for testing. Python's OpenCV package is used to extract dense optical flow, and the deep learning component of the model is implemented using Keras. A confusion matrix evaluates model performance for every action in the KTH activity dataset.

To attain the required performance accuracy in machine learning, it is crucial to choose effective model assessment metrics. Each kind of problem requires its own criteria for examination: different metrics are used for classification, regression, ranking, clustering, association, and so on. "The evaluation metrics not only provide the parameter by which the model's performance can be gauged, but also aid in explaining the results obtained with alternative implementations. There are a variety of criteria used to evaluate the efficacy of machine learning models." Many accuracy measurements may be computed, including AUC, the F1 score, mean absolute error, and mean squared error.
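The cropping step used to equalise sequence lengths can be sketched as follows. The trial lengths, the 3-feature timesteps, and the zero-padding fallback for short trials are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def to_fixed_length(seq, target_len):
    """Crop a per-trial feature sequence to a fixed number of
    timesteps (as done before RNN input), or zero-pad sequences
    that are too short, so every sample has the same shape."""
    seq = np.asarray(seq, dtype=float)
    if len(seq) >= target_len:
        return seq[:target_len]            # crop long trials
    pad = np.zeros((target_len - len(seq),) + seq.shape[1:])
    return np.concatenate([seq, pad])      # pad short trials

# Hypothetical trials of different lengths, 3 features per timestep.
trials = [np.ones((50, 3)), np.ones((20, 3))]
batch = np.stack([to_fixed_length(t, 30) for t in trials])
print(batch.shape)  # (2, 30, 3): uniform input an RNN can consume
```

After this step every sample has the same (timesteps, features) shape, which is what lets the trials be stacked into a single batch tensor.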
The confusion matrix has the following form:

Table 1: Confusion matrix

                          Predicted Class
                       Negative    Positive
Actual     Negative       TN          FP
Class      Positive       FN          TP

TP (true positives) are cases that have been accurately identified as positive, whereas FP (false positives) are negative cases that have been misidentified as positive. TN (true negatives) are negative cases that have been appropriately labelled as such, and FN (false negatives) are positive cases that have been misidentified as negative.
Each action in the dataset is given a precision and recall
based on the aforementioned values.
Table 2: Precision and Recall for each activity
Activity Precision Recall
Walking 0.93 0.95
Jogging 0.80 0.82
Running 0.90 0.85
Boxing 0.94 0.90
Hand waving 0.97 0.94
Hand Clapping 0.84 0.91
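Per-activity precision and recall follow directly from the counts in Table 1: precision = TP / (TP + FP) and recall = TP / (TP + FN). The counts below are hypothetical, chosen only to illustrate the computation, not taken from the paper's experiments.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion-matrix counts:
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts for one activity class.
p, r = precision_recall(tp=90, fp=10, fn=5)
print(round(p, 2), round(r, 3))  # 0.9 0.947
```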
The suggested model's confusion matrix for all six actions is shown in Table 3. All of the diagonal entries in this confusion matrix have values extremely near to 1, suggesting that the accuracy of the predictions for each action is high.

6. Limitations

Recognising human actions has been a hot topic in the scientific community recently. The widespread application of HAR to industries as varied as sports, video surveillance, filmmaking, and medicine is a contributing factor. Several techniques have been tried for recognising actions. The effectiveness of sensor and video data for HAR depends on devices, data quality, experimental settings, light fluctuations, moving backgrounds, viewpoint shifts, occlusion, noise, and so on. Some of the difficulties and restrictions of HAR are discussed here.

• Video datasets are notorious for their high memory requirements, so it would be impracticable to load a complete dataset into RAM. Some solutions have been proposed, such as passing a URL to download a video from a website like YouTube. Another issue is that there is no universal benchmark dataset that covers all the bases in terms of representing real-world circumstances and behaviours. Because of this, HAR assessment and training have become less reliable. More work is required to compile a reliable collection of accurate data, and a standard approach should be used to conduct quantitative comparisons across different benchmarks.

• The intra- and inter-class differences in HAR present a significant difficulty. Subjects' executions of the same task might vary depending on their size, attire, and character traits. The way a person walks, for instance, may be completely unique to that individual. In addition, certain physical activities, such as walking and jogging, may appear remarkably similar to each other. It can be challenging to teach a system to recognise complex behaviours that comprise multiple activities, such as sipping tea while conversing on the phone. More precise data on these actions and activities enables a deep learning model to discern between them, and hybrid devices can train a multilevel prediction model to identify composite activities.
• Dynamic backgrounds, occlusions, illumination variation, noise, varying lighting, varying perspectives, and low-quality photos and videos are all commonplace in real-world footage. Noise, undesired signal detection, and subpar sensors and transmission systems may all compromise the quality of data acquired by sensor-based methods. These factors increase the difficulty of HAR. Sensor-based HAR may make use of multi-sensor data, as well as data from other sources, such as RGB, depth, a video skeleton, and more.

• A HAR system may consume considerable energy and resources. Real-time precise sensing is essential for many uses, including video monitoring and care for the elderly. More processing power, electricity, and memory are required to process the massive amounts of data generated by sensors and videos. Some programmes, including those concerned with security, need to be able to foresee a user's next move based on an analysis of their past actions. Because of these constraints, automated activity identification in real time remains difficult.

7. Conclusion

Computer vision, robotics, and various other applications use Human Action Recognition (HAR) to analyse and interpret human activities. Accelerometers, gyroscopes, and magnetometers in smartphones and smartwatches, or video from security cameras, make data collection easy. Our study makes use of a publicly accessible activity dataset covering a variety of body parts and positions.

We examine the data and classify activities using machine learning and deep learning methods. Researchers have shown that raw data alone can provide better results if a balanced dataset is used for training the model. Activity detection and categorisation using video is useful in certain domains, such as sports. We employed a state-of-the-art method that relied on a pre-trained sports-themed model to do this. Our dataset may be expanded to include the classification of scoring actions linked to individual sports, in addition to the classification of various sports based on the activity performed by the person in the video. The findings establish a baseline performance, and this hard dataset will aid researchers in classifying computer vision activities with very similar intraclass features.

References:

[1] M. M. Hassan, S. Huda, M. Z. Uddin, A. Almogren, and M. Alrubaian, "Human Activity Recognition from Body Sensor Data using Deep Learning," J. Med. Syst., vol. 42, no. 6, 2018, doi: 10.1007/s10916-018-0948-z.
[2] D. Nikolova, I. Vladimirov, and Z. Terneva, "Human Action Recognition for Pose-based Attention: Methods on the Framework of Image Processing and Deep Learning," 2021 56th Int. Sci. Conf. Information, Commun. Energy Syst. Technol. (ICEST 2021) - Proc., pp. 23–26, 2021, doi: 10.1109/ICEST52640.2021.9483503.
[3] R. Poppe, "Vision-based human motion analysis: An overview," Comput. Vis. Image Underst., vol. 108, pp. 4–18, 2007, doi: 10.1016/j.cviu.2006.10.016.
[4] T. Özyer, D. S. Ak, and R. Alhajj, "Human action recognition approaches with video datasets—A survey," Knowledge-Based Syst., vol. 222, p. 106995, 2021, doi: 10.1016/j.knosys.2021.106995.
[5] R. Bodor, "Vision-Based Human Tracking and Activity Recognition."
[6] S. Patil and K. S. Prabhushetty, "Bi-attention LSTM with CNN based multi-task human activity detection in video surveillance," Int. J. Eng. Trends Technol., vol. 69, no. 11, pp. 192–204, 2021, doi: 10.14445/22315381/IJETT-V69I11P225.
[7] S. S. Begampure and P. M. Jadhav, "Intelligent Video Analytics for Human Action Detection: A Deep Learning Approach with Transfer Learning," Int. J. Comput. Digit. Syst., vol. 11, no. 1, pp. 63–71, 2022, doi: 10.12785/ijcds/110105.
[8] D. Cavaliere, V. Loia, A. Saggese, S. Senatore, and M. Vento, "A human-like description of scene events for a proper UAV-based video content analysis," Knowledge-Based Syst., vol. 178, pp. 163–175, 2019, doi: 10.1016/j.knosys.2019.04.026.
[9] H. Yu et al., "Multiple human tracking in wearable camera videos with informationless intervals," Pattern Recognit. Lett., vol. 112, pp. 104–110, 2018, doi: 10.1016/j.patrec.2018.06.003.
[10] H. Madokoro, S. Nix, H. Woo, and K. Sato, "A mini-survey and feasibility study of deep-learning-based human activity recognition from slight feature signals obtained using privacy-aware environmental sensors," Appl. Sci., vol. 11, no. 24, pp. 1–31, 2021, doi: 10.3390/app112411807.
[11] X. Yang et al., "A CNN-based posture change detection for lactating sow in untrimmed depth videos," Comput. Electron. Agric., vol. 185, p. 106139, 2021, doi: 10.1016/j.compag.2021.106139.
[12] I. U. Khan, S. Afzal, and J. W. Lee, "Human activity recognition via hybrid deep learning based model," Sensors, vol. 22, no. 1, 2022, doi: 10.3390/s22010323.
[13] L. Mo, F. Li, Y. Zhu, and A. Huang, "Human physical activity recognition based on computer vision with deep learning model," Conf. Rec. - IEEE Instrum. Meas. Technol. Conf. (I2MTC), 2016, doi: 10.1109/I2MTC.2016.7520541.
[14] P. Y. Chen and V. W. Soo, "Humor recognition using deep learning," NAACL HLT 2018 - Proc. Conf., vol. 2, pp. 113–117, 2018, doi: 10.18653/v1/n18-2018.
[15] A. A. Abed and S. A. Rahman, "Python-based Raspberry Pi for Hand Gesture Recognition," 2017, doi: 10.5120/ijca2017915285.
[16] M. Latah, "Human action recognition using support vector machines and 3D convolutional neural networks," Int. J. Adv. Intell. Informatics, vol. 3, no. 1, pp. 47–55, 2017, doi: 10.26555/ijain.v3i1.89.
[17] Z. Shi, J. A. Zhang, R. Xu, and G. Fang, "Human Activity Recognition Using Deep Learning Networks with Enhanced Channel State Information," 2018 IEEE Globecom Workshops (GC Wkshps) - Proc., 2019, doi: 10.1109/GLOCOMW.2018.8644435.
[18] S. Chung, J. Lim, K. J. Noh, G. Kim, and H. Jeong, "Sensor data acquisition and multimodal sensor fusion for human activity recognition using deep learning," Sensors (Switzerland), vol. 19, no. 7, 2019, doi: 10.3390/s19071716.
[19] O. S. Amosov, S. G. Amosova, Y. S. Ivanov, and S. V. Zhiganov, "Using the Ensemble of Deep Neural Networks for Normal and Abnormal Situations Detection and Recognition in the Continuous Video Stream," Procedia Comput. Sci., vol. 150, pp. 532–539, 2019, doi: 10.1016/j.procs.2019.02.089.
[20] V. Mavani, S. Raman, and K. P. Miyapuram, "Facial Expression Recognition using Visual Saliency and Deep Learning," pp. 2783–2788, 2012.
[21] A. B. Sargano, X. Wang, P. Angelov, and Z. Habib, "Human action recognition using transfer learning with deep representations," Proc. Int. Jt. Conf. Neural Networks (IJCNN), vol. 2017-May, pp. 463–469, 2017, doi: 10.1109/IJCNN.2017.7965890.
[22] T. B. Moeslund, A. Hilton, and V. Krüger, "A survey of advances in vision-based human motion capture and analysis," Comput. Vis. Image Underst., vol. 104, pp. 90–126, 2006, doi: 10.1016/j.cviu.2006.08.002.
[23] S. Singh, S. Wable, and P. Kharose, "A Review of E-Voting System Based on Blockchain Technology," Int. J. New Pract. Manag. Eng., vol. 10, no. 4, pp. 9–13, 2021, doi: 10.17762/ijnpme.v10i04.125.
[24] D. Veeraiah, R. Mohanty, S. Kundu, D. Dhabliya, M. Tiwari, S. S. Jamal, and A. Halifa, "Detection of malicious cloud bandwidth consumption in cloud computing using machine learning techniques," Comput. Intell. Neurosci., 2022, doi: 10.1155/2022/4003403.