
International Journal of Intelligent Systems and Applications in Engineering (IJISAE)
ISSN: 2147-6799    www.ijisae.org    Original Research Paper

Human Activity Recognition using LSTM with depth data

Kumari Priyanka Sinha 1, Prabhat Kumar 2, Rajib Ghosh 3

1 Department of Computer Science and Engineering, Nalanda College of Engineering, Chandi, India, [email protected]
2 Department of Computer Science and Engineering, National Institute of Technology Patna, Patna-800005, India
3 Department of Computer Science and Engineering, National Institute of Technology Patna, Patna-800005, India

Submitted: 26/05/2023    Revised: 06/07/2023    Accepted: 23/07/2023

Abstract: For many academics, HAR is a hot topic. It has become tractable thanks to a number of cutting-edge technologies, including deep learning, which is useful in a wide range of contexts. While most of the current body of work has focused on wearable sensor data, such data is not always practical to obtain. In the proposed study, publicly accessible video datasets are mined for human activity detection using deep learning techniques, including CNN and long short-term memory (LSTM). The CNN extracts relevant characteristics from the input data, whereas the LSTM eliminates and rejects superfluous data to increase performance. Precision and recall derived from the confusion matrix are used to evaluate the suggested technique. Accuracy is high across the board, as shown by the fact that the diagonals of the confusion matrices for all actions are near 1.

Keywords: HAR, deep learning, CNN, neural network

1. Introduction

The ability to identify human actions is now fundamental to survival. Classifying a person's actions in real time from a series of sensor data [1] or visual data collected from a variety of input sources is the challenge known as human activity recognition. To accomplish its goal of identifying human behaviour, activity, or condition, human identification systems use data from a wide variety of sources [2]. As wireless data transmission, Bluetooth, and cellular data continue to advance, this data may be quickly moved to a new medium and put to use in modelling. The human's motion may be tracked in real time [3] without any lag. In spite of its rising popularity over the last decade, there are still many obstacles to overcome before raw input data can be reliably and quickly translated into well-defined, actionable motion. The identification of human actions from still or moving images is a difficult problem: issues of scale, clutter, occlusion, perspective, lighting, and overall presentation must be addressed.

Video surveillance, human-computer interaction, and human behaviour recognition are just a few examples of the many uses for multi-activity recognition.

Developing an automated system that can properly identify the activity being performed by a person by evaluating data from a variety of sources is the primary goal of HAR. The procedure may vary based on the nature of the data source, the nature of the input data, the model training architecture, the kind of activities being performed, and the intended use of the system.

HAR has proven useful in a variety of settings, from automated surveillance to healthcare, elder care, sports, robotics, security, and media broadcasting, and it has shown great promise in solving real-world issues by integrating technology. In addition [4], the use of vision datasets for preventing potentially risky actions and identifying criminals is a major application.

Human activity recognition has demonstrated promising results and substantial demand in the field of automated video surveillance [5]. "These systems are ideal for intelligent crowd surveillance in shopping malls, games, live concerts, streets and highways, crossroads, traffic lights, and parking lots, since they can easily identify unwanted and suspicious activities and track people in the crowd." To recognise moving objects, a convolutional neural network with an LSTM model is fed frames from a video feed as input [6] before being trained to extract temporal and spatial features. As an added benefit, this may help mitigate risks by decreasing response times, balancing the burden on security staff, and immediately alerting the appropriate parties. Redundant frame detection [7], applied as pre-processing using a convolutional neural network, may be used to achieve successful results in classifying various human activities. Classifying human actions is aided greatly by the pre-processing of video frames.

Images and video from cameras, video recording devices, surveillance cameras, 3D cameras, Microsoft Kinect cameras, infrared sensors, etc., are used for activity detection in vision-based activity recognition [8]. A Kinect camera can take depth shots, whereas a standard camera can only take 2D photos. There is a lot of fresh visual data thanks to security cameras and YouTube, but there are also issues. These include, but are not limited to, background clutter, partial occlusion, viewpoint bias, inconsistent lighting, awkward camera angles, and shadows.

Despite the importance of video surveillance, there are several obstacles [9] to overcome when trying to identify human behaviours inside recordings, such as rapid view shifts, a lack of adequate view angles, etc. These difficulties can result in information loss over longer periods, and it is difficult to link the target over spatial proximity when the camera position changes. Associating targets over an interval with little available information calls for a fresh strategy [10]. Untrimmed depth recordings make it difficult to identify and classify posture changes. For the classification of posture-change segments or body movements, [11] uses a CNN to extract characteristics from video frames.

The majority of studies on human activity recognition have used vision or wearable sensors. Problems arise when trying to use wearable sensors [12] for HAR, since they need to be attached to the subject's body, which isn't always possible. There is an alternative to employing wearable sensors for HAR: using video frames captured by cameras.

2. Deep Learning

Artificial neural networks (ANNs) replicate human brain function. Neural networks learn from training input to predict or classify output. Input and output data are analysed by the ANN architecture, which then finds patterns and correlations. Input layers, hidden layers, and an output layer make up an ANN. Neural networks excel at image identification tasks, and for this purpose CNNs stand out as the optimal neural network architecture. Deep learning is a subfield of artificial neural networks that has found use in a variety of problem-solving contexts, including but not limited to the ones listed above. In deep learning [13], a unified algorithm with a single kind of activation function is used to handle input at each successive layer. Data characteristics useful for instruction, discovery, and comprehension are built layer by layer.

3. Discriminative Deep Learning Models

Discriminative feature learning models are trained with the consequent distribution of classes to boost their classification and recognition powers. CNNs, RNNs, and other discriminative deep learning models are used to identify human behaviour.

3.1 Convolutional Neural Network (CNN)

Deep learning CNNs can detect and extract attributes from an input image using learnable biases and weights. The main strengths of a CNN [14] are its ability to capture both temporal and spatial connections in an image and to downsize the picture without sacrificing aspects that help build a more scalable prediction model. A conventional CNN has two primary parts. As illustrated in Figure 1, these are feature extraction and classification.

Fig 1: CNN Architecture
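To make the two components in Figure 1 concrete, here is a minimal Keras sketch (the input shape, layer widths, and six-class output are illustrative assumptions, not the paper's exact configuration):

```python
from tensorflow.keras import layers, models

# Part 1: feature extraction -- convolution + pooling layers build
# increasingly abstract feature maps from the input image.
# Part 2: classification -- flattened features feed fully connected
# layers ending in a softmax over the activity classes.
model = models.Sequential([
    layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),                    # helps against overfitting
    layers.Dense(6, activation="softmax"),  # one output per class
])
model.summary()
```

The convolution/pooling stack corresponds to the feature-extraction part described below; the dense layers and softmax form the classification part.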


Standard feature extraction architecture [2] contains input, convolution, and pooling layers. The convolution layer extracts data from image or video frames [15]: a filter is gradually dragged across the input image to convolve it, and the feature map (or activation map) is the result of the convolution layer.

• A one-dimensional or three-dimensional convolution stage is selected based on the type of input image [16]. The first layer extracts the most basic aspects of the picture, such as the colours and orientations of edges and corners and any gradients present; these are fed into further layers to extract more information from the input picture. Combined with the information from the lower-level convolution layers, the high-level features extracted by the deeper convolution layers reveal the whole context of the input picture. The convolution layers are formulated as

$$x_i^{l,j} = f\left(\sum_{a=1}^{m} w_a^{j}\, x_{i+a-1}^{l-1,j} + b_j\right)$$

where $f$ is the activation function, $w^j$ the filter weights, and $b_j$ the bias of the $j$-th feature map.

• Fully connected layers are used to predict the output labels or classes of an input image by flattening the features recovered during feature extraction. The neurons in this layer learn non-linear functions by adjusting the weights and biases between them, which is useful for maximising the class scores. The next layer, the output layer, uses SoftMax classification to assign labels to the images. The dropout layer is another popular layer in CNN architecture: by eliminating part of the network's neurons during training, it may help alleviate the overfitting issue.

Remarkable progress in artificial intelligence has been made thanks to the development of many CNN architectures over the last few decades.

3.2 Recurrent Neural Network

The usage of RNNs for sequential input in natural language processing has increased in recent years. A recurrent neural network utilises its previous output as input and contains hidden states. "Voice recognition, text recognition, speech recognition, and forecasting require time-series sequential data [17]. An RNN can calculate the current state using the current input and the previous state's output and uncover the relationship between current and prior inputs." Thus, it contains at least one feedback link, allowing activation to circulate in a closed loop. Thanks to the hidden state, RNNs are very well suited to processing time-domain data, making them ideal candidates for the HAR dataset.

At each time step during back-propagation, the RNN model's gradients can vanish towards zero or explode towards infinity [17]: the weight matrix multiplies the gradient signal many times. If the weight matrix's eigenvalue is less than one, repeated multiplication drives the gradient value to zero, causing the vanishing gradient problem; when the eigenvalue is larger than one, the value is pushed towards infinity, leading to exploding gradients. The LSTM model solves this issue by introducing a novel component called a memory cell [18].
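A tiny numeric sketch (ours, not the paper's) of why this happens: back-propagating through T time steps multiplies the gradient by the recurrent weight T times, so a weight slightly below 1 makes it vanish and a weight slightly above 1 makes it explode.

```python
# Toy illustration of vanishing/exploding gradients over T time steps.
T = 50
for w in (0.9, 1.1):              # recurrent weight (eigenvalue) < 1 and > 1
    grad = 1.0
    for _ in range(T):
        grad *= w                 # one multiplication per back-prop step
    print(f"w = {w}: gradient after {T} steps = {grad:.3e}")
# w = 0.9 -> 5.154e-03 (vanishing); w = 1.1 -> 1.174e+02 (exploding)
```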
3.3 Hybrid Network

When two or more classifiers are combined, the resulting network is called a hybrid network [19]. It combines the predictions of two or more models, trains them together, and integrates the results for improved accuracy. In machine learning, this strategy is referred to as "ensemble learning": when attempting to predict the same problem using numerous models, it takes an average of their results. Combining CNN and LSTM [18] models lets you learn spatial and temporal data simultaneously. "A CNN creates high-level spatial information on the activity of the photos for image classification, while an RNN model extracts the temporal correlation between the clips' frames by remembering the preceding frames. In addition to ensembles of numerous models, hybrid approaches combine form-based and motion-based properties to depict an action." Optical flow or a histogram of motion intensity is used to capture motion data, while shape-based features are extracted from the still picture for use in action detection.
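A minimal Keras sketch of such a CNN+LSTM hybrid (frame count, frame size, and layer widths are assumptions; only the six-class output mirrors the KTH setup used later in the paper):

```python
from tensorflow.keras import layers, models

# TimeDistributed applies the same small CNN to every frame of a clip;
# the LSTM then models the temporal correlation across frame features.
model = models.Sequential([
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu"),
                           input_shape=(20, 64, 64, 1)),  # 20 frames, 64x64 grayscale
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(64),                          # temporal modelling
    layers.Dense(6, activation="softmax"),    # one neuron per activity
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

TimeDistributed reuses the same spatial filters on every frame; only the LSTM mixes information across time.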

4. Working Model and Proposed Methodology

Fig 2: Working Model for Human Action Recognition


Convolutional and recurrent neural networks are commonly used in human action recognition models. To minimise spatial dimensions and boost the depth of the input array, the working model combines convolutional and max-pooling layers [20]. On top of this, a series of fully connected layers is added, with the last (output) layer consisting of six neurons, one for each of the enumerated activities. The model provides the likelihood that the input video belongs to a certain category, and the most likely classification label is the one used to describe that video. There are two distinct sets of data, one for training and one for testing: the model is trained with familiar data from the training set and then put to the test with data it has never seen before. After data collection, the video model extracts just the frames it needs in a predetermined manner, and these are fed into the machine learning algorithm so that it can make a more accurate prediction.

Recent years have seen the rise of deep learning as a popular and very efficient method for human activity identification in video. Convolutional neural networks (CNNs) are employed in the proposed study as a deep learning technique for extracting features from video clips. When training a model for classification, the convolutional layers may be used to extract and learn the features that are then utilised to classify data. The suggested CNN model can learn spatial characteristics from a single video frame. Spatial information is captured well by the CNN, but temporal data is not; for human activity identification, temporal data from video sequences is also crucial for capturing motion.

Convolutional layers let the CNN extract characteristics from input video frames while maintaining key information. The CNN is utilised for feature extraction, and the weights are determined by pre-training its specialised feature extraction network. The input picture is processed by the feature extraction network, and classification is then performed on the retrieved characteristics.

Using the input data and the CNN filter, the convolutional layers generate the output features. The input $x(i,j)$ is convolved with the filter $w(i,j)$ in the convolutional layer; with filter size $c \times d$, the result is recorded in $z(i,j)$:

$$z(i,j) = x(i,j) * w(i,j) = \sum_{a=-c}^{c} \sum_{b=-d}^{d} x(a,b)\, w(i-a,\, j-b)$$

In the process of action recognition using a CNN, features are retrieved that include spatiotemporal information. Recurrent neural networks, and LSTMs in particular, applied to the pre-processed video sequences to learn and analyse the temporal aspects of human activity, are used to manage the spatiotemporal data. The LSTM network is used to eliminate extraneous details from the input data. However, the suggested deep learning method works well for temporal information over relatively short periods of time and is not suited to longer sequences. In practice, the LSTM network is used to categorise sequential input, which may then be utilised for action recognition.

Bidirectional LSTM networks classify using the extracted characteristics. Certain instances of video classification for action recognition employ transfer learning [21], which applies the insights learned from training a model on a large dataset similar to the new dataset; as a result, the characteristics learnt on the training set are put to the test on the validation set. Creating a network capable of handling massive amounts of video data requires a tremendous amount of storage space and processing power.

5. Implementation

5.1 Dataset

When it comes to HAR studies, datasets are crucial. The primary goal of HAR is to determine what a person does based on information gathered from many sources. Numerous studies have been conducted to better organise and connect this mountain of data with relevant facts and insights. A dataset that details behaviour in a variety of contexts is necessary for the study's success; therefore, the dataset's availability and quality are crucial to training a model that accurately recognises activities.

Numerous researchers have made use of the various publicly available datasets that have been created and released. Using the same datasets allows for a more accurate comparison of various methods and an assessment of their effectiveness. Unlike real-world situations, many datasets were captured in standardised experimental settings with fixed cameras and backdrops. Datasets can be broken down further into subcategories based on a variety of criteria, including the type of data collection method used [22] and the type of activity being studied. The KTH Activity Dataset was utilised for this study. This dataset, which has been around since 2004, has 25 people engaging in 6 distinct activities across 4 distinct controlled environments. A stationary camera films one person walking, jogging, running, boxing, hand waving, and hand clapping; the viewpoints, settings, and actions vary.
Fig 3: KTH [16] Activity Dataset
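As a hedged sketch of the kind of preparation described next (file paths, frame counts, and sizes are assumptions, not the paper's exact settings), fixed-length frame sequences can be extracted from the KTH videos with OpenCV:

```python
import cv2
import numpy as np

def extract_sequence(video_path, num_frames=20, size=(64, 64)):
    """Return a uniformly sampled, fixed-length grayscale frame sequence."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frames.append(cv2.resize(gray, size))
    cap.release()
    # Sample num_frames indices so every sequence has the same length,
    # mirroring the cropping to uniform size described in Section 5.2.
    idx = np.linspace(0, len(frames) - 1, num_frames).astype(int)
    return np.array(frames)[idx]

seq = extract_sequence("kth/person01_walking_d1.avi")  # hypothetical path
# Dense optical flow between consecutive frames (cf. Section 5.3):
flow = cv2.calcOpticalFlowFarneback(seq[0], seq[1], None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
```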
5.2 Data preparation
TensorFlow with the Keras backend was used to create both the CNN and the RNN. The RNN's input format is distinct from the CNN's: each RNN data sample has to be a data sequence. In the context of the study at hand, a single data sequence corresponds to a single trial of an activity performed by a single volunteer. The RNN requires uniformly sized input data samples, but the sizes of the individual data files vary, so cropping is applied beforehand so that the RNN input data files are all of the same length.

5.3 Results and Analysis

We conduct experiments on the KTH activity dataset and analyse the results using a model that has been pre-trained on the KTH dataset. After 70% of the data has been used for training, the remaining 30% is utilised for testing. Python's OpenCV package is used to extract dense optical flow, and the model's deep learning component is implemented using Keras. For every action in the KTH activity dataset, a confusion matrix evaluates model performance.

To attain the required performance accuracy in machine learning, it is crucial to develop effective model assessment metrics. Each kind of problem requires a unique set of criteria for examination: different metrics are used for classification, regression, ranking, clustering, association, etc. "The evaluation metrics not only provide the parameter by which the model's performance can be gauged, but also aid in explaining the results obtained with alternative implementations. There are a variety of criteria used to evaluate the efficacy of machine learning models." Many accuracy measurements may be determined, including AUC, the F1 score, mean absolute error, and mean squared error. The confusion matrix takes the following form:
Table 1: Confusion Matrix

Actual Class      Predicted Negative   Predicted Positive
Negative          TN                   FP
Positive          FN                   TP

TP (true positives) are cases that have been accurately identified as positive, whereas FP (false positives) are negative cases that have been misidentified as positive. TN (true negatives) are negative cases that have been appropriately labelled as such, and FN (false negatives) are positive cases that have been misidentified as negative.

Each action in the dataset is given a precision and recall
based on the aforementioned values.
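Using the standard definitions,

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

so, for example, the walking row of Table 2 says that 93% of clips predicted as walking truly are walking, while 95% of the true walking clips are retrieved.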
Table 2: Precision and Recall for each activity

Activity        Precision   Recall
Walking         0.93        0.95
Jogging         0.80        0.82
Running         0.90        0.85
Boxing          0.94        0.90
Hand waving     0.97        0.94
Hand clapping   0.84        0.91

Table 3: Confusion Matrix (rows: true class; columns: predicted class in the same order)

True Class      Walking  Jogging  Running  Boxing  Hand waving  Hand clapping
Walking          0.95     0.60     0.03     0.00     0.00         0.00
Jogging          0.50     0.82     0.00     0.00     0.00         0.00
Running          0.15     0.40     0.85     0.00     0.00         0.00
Boxing           0.00     0.10     0.31     0.90     0.00         0.00
Hand waving      0.10     0.00     0.00     0.00     0.94         0.30
Hand clapping    0.00     0.00     0.00     0.00     0.00         0.91
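As a hedged sketch (with placeholder label arrays, not the paper's data), per-class precision, recall, and a row-normalised confusion matrix of this kind can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

ACTIONS = ["Walking", "Jogging", "Running", "Boxing",
           "Hand waving", "Hand clapping"]
rng = np.random.default_rng(0)
y_true = rng.integers(0, 6, size=200)   # placeholder ground-truth labels
y_pred = rng.integers(0, 6, size=200)   # placeholder model predictions

cm = confusion_matrix(y_true, y_pred, normalize="true")  # rows = true class
precision = precision_score(y_true, y_pred, average=None)
recall = recall_score(y_true, y_pred, average=None)
for name, p, r in zip(ACTIONS, precision, recall):
    print(f"{name:13s} precision={p:.2f} recall={r:.2f}")
```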

The suggested model's confusion matrix for all six actions is shown in Table 3. All of the diagonal entries in the confusion matrix have values extremely near to 1, suggesting that the accuracy of the predictions for each action is high.

6. Limitations

Recognising human actions has been a hot topic in the scientific community recently, partly due to the widespread application of HAR in industries as varied as sports, video surveillance, filmmaking, and medicine. Several techniques have been tried for recognising actions. The effectiveness of sensor and video data for HAR depends on devices, data quality, experimental settings, light fluctuations, moving backgrounds, viewpoint shifts, occlusion, noise, and so on. Some of the difficulties and restrictions of HAR are discussed here.

• Video datasets are notorious for their high memory requirements; as a result, it would be impracticable to load the complete dataset into RAM. Some solutions have been proposed, such as passing a URL to download a video from a website like YouTube. Another issue is that there isn't a universal benchmark dataset that covers all the bases in terms of representing real-world circumstances and behaviours. Because of this, HAR assessment and training have become less reliable. More work is required to compile a reliable collection of accurate data, and a standard approach should be used to conduct quantitative comparisons across different benchmarks.

• The intra- and inter-class differences in HAR present a significant difficulty. Subjects' executions of the same task might vary depending on their size, attire, and character traits; the way a person walks, for instance, may be completely unique to that individual. In addition, certain physical activities, such as walking and jogging, may appear remarkably similar to each other. It may be challenging to teach a system to recognise complex behaviours that comprise numerous activities, such as sipping tea while conversing on the phone. More precise data on these actions and activities enables a deep learning model to discern between them, and hybrid approaches can train a multi-level prediction model to identify composite activities.

• Dynamic backgrounds, occlusions, illumination variation, noise, varying lighting, varying perspectives, and low-quality photos and videos are all commonplace in real-world videos. Noise, undesired signal detection, and subpar sensors and transmission systems may all compromise the quality of data acquired by sensor-based methods. These factors increase the difficulty of HAR. Sensor-based HAR may make use of multi-sensor data, as well as data from other sources, such as RGB, depth, a video skeleton, and more.

• A HAR system may use considerable energy and resources. Real-time, precise sensing is essential for many uses, including video monitoring and care for the elderly. More processing power, electricity, and memory are required to process the massive amounts of data generated by sensors and videos. Some applications, including those concerned with security, need to be able to foresee a user's next move based on an analysis of their past actions. Because of these constraints, automated activity identification in real time is now more difficult than ever.

7. Conclusion

Computer vision, robotics, and various other applications use Human Action Recognition (HAR) to analyse and interpret human activities. Accelerometers, gyroscopes, and magnetometers in smartphones, smartwatches, or video security cameras make data collection easy. Our study makes use of a publicly accessible activity dataset including a variety of body parts and positions.

We examine the data and sort activities into categories using machine learning and deep learning architectures. Researchers have shown that raw data alone can provide better results if a balanced dataset is utilised for training the model. Activity detection and categorisation using video is useful in certain domains, such as sports. We employed a state-of-the-art method that relied on a pre-trained, sports-themed model to do this. Our dataset may be expanded to include the classification of scoring actions linked to individual sports, in addition to the classification of various sports based on the activity performed by the person in the video. The findings have established a baseline performance, and this challenging dataset will aid researchers in classifying computer vision activities with very similar intra-class features.

References:

[1] M. M. Hassan, S. Huda, M. Z. Uddin, A. Almogren, and M. Alrubaian, "Human Activity Recognition from Body Sensor Data using Deep Learning," J. Med. Syst., vol. 42, no. 6, 2018, doi: 10.1007/s10916-018-0948-z.
[2] D. Nikolova, I. Vladimirov, and Z. Terneva, "Human Action Recognition for Pose-based Attention: Methods on the Framework of Image Processing and Deep Learning," 2021 56th Int. Sci. Conf. Information, Commun. Energy Syst. Technol. (ICEST) - Proc., pp. 23–26, 2021, doi: 10.1109/ICEST52640.2021.9483503.
[3] R. Poppe, "Vision-based human motion analysis: An overview," Comput. Vis. Image Underst., vol. 108, pp. 4–18, 2007, doi: 10.1016/j.cviu.2006.10.016.
[4] T. Özyer, D. S. Ak, and R. Alhajj, "Human action recognition approaches with video datasets—A survey," Knowledge-Based Syst., vol. 222, p. 106995, 2021, doi: 10.1016/j.knosys.2021.106995.
[5] R. Bodor, "Vision-Based Human Tracking and Activity Recognition."
[6] S. Patil and K. S. Prabhushetty, "Bi-attention LSTM with CNN based multi-task human activity detection in video surveillance," Int. J. Eng. Trends Technol., vol. 69, no. 11, pp. 192–204, 2021, doi: 10.14445/22315381/IJETT-V69I11P225.
[7] S. S. Begampure and P. M. Jadhav, "Intelligent Video Analytics for Human Action Detection: A Deep Learning Approach with Transfer Learning," Int. J. Comput. Digit. Syst., vol. 11, no. 1, pp. 63–71, 2022, doi: 10.12785/ijcds/110105.
[8] D. Cavaliere, V. Loia, A. Saggese, S. Senatore, and M. Vento, "A human-like description of scene events for a proper UAV-based video content analysis," Knowledge-Based Syst., vol. 178, pp. 163–175, 2019, doi: 10.1016/j.knosys.2019.04.026.
[9] H. Yu et al., "Multiple human tracking in wearable camera videos with informationless intervals," Pattern Recognit. Lett., vol. 112, pp. 104–110, 2018, doi: 10.1016/j.patrec.2018.06.003.
[10] H. Madokoro, S. Nix, H. Woo, and K. Sato, "A mini-survey and feasibility study of deep-learning-based human activity recognition from slight feature signals obtained using privacy-aware environmental sensors," Appl. Sci., vol. 11, no. 24, pp. 1–31, 2021, doi: 10.3390/app112411807.
[11] X. Yang et al., "A CNN-based posture change detection for lactating sow in untrimmed depth videos," Comput. Electron. Agric., vol. 185, p. 106139, 2021, doi: 10.1016/j.compag.2021.106139.
[12] I. U. Khan, S. Afzal, and J. W. Lee, "Human activity recognition via hybrid deep learning based model," Sensors, vol. 22, no. 1, 2022, doi: 10.3390/s22010323.
[13] L. Mo, F. Li, Y. Zhu, and A. Huang, "Human physical activity recognition based on computer vision with deep learning model," Proc. IEEE Instrum. Meas. Technol. Conf. (I2MTC), 2016, doi: 10.1109/I2MTC.2016.7520541.
[14] P. Y. Chen and V. W. Soo, "Humor recognition using deep learning," Proc. NAACL HLT 2018, vol. 2, pp. 113–117, 2018, doi: 10.18653/v1/n18-2018.
[15] A. A. Abed and S. A. Rahman, "Python-based Raspberry Pi for Hand Gesture Recognition," Int. J. Comput. Appl., 2017, doi: 10.5120/ijca2017915285.
[16] M. Latah, "Human action recognition using support vector machines and 3D convolutional neural networks," Int. J. Adv. Intell. Informatics, vol. 3, no. 1, pp. 47–55, 2017, doi: 10.26555/ijain.v3i1.89.
[17] Z. Shi, J. A. Zhang, R. Xu, and G. Fang, "Human Activity Recognition Using Deep Learning Networks with Enhanced Channel State Information," 2018 IEEE Globecom Workshops (GC Wkshps) - Proc., 2019, doi: 10.1109/GLOCOMW.2018.8644435.
[18] S. Chung, J. Lim, K. J. Noh, G. Kim, and H. Jeong, "Sensor data acquisition and multimodal sensor fusion for human activity recognition using deep learning," Sensors (Switzerland), vol. 19, no. 7, 2019, doi: 10.3390/s19071716.
[19] O. S. Amosov, S. G. Amosova, Y. S. Ivanov, and S. V. Zhiganov, "Using the Ensemble of Deep Neural Networks for Normal and Abnormal Situations Detection and Recognition," Procedia Comput. Sci., vol. 150, pp. 532–539, 2019, doi: 10.1016/j.procs.2019.02.089.
[20] V. Mavani, S. Raman, and K. P. Miyapuram, "Facial Expression Recognition using Visual Saliency and Deep Learning," pp. 2783–2788, 2012.
[21] A. B. Sargano, X. Wang, P. Angelov, and Z. Habib, "Human action recognition using transfer learning with deep representations," Proc. Int. Jt. Conf. Neural Networks (IJCNN), pp. 463–469, 2017, doi: 10.1109/IJCNN.2017.7965890.
[22] T. B. Moeslund, A. Hilton, and V. Krüger, "A survey of advances in vision-based human motion capture and analysis," Comput. Vis. Image Underst., vol. 104, pp. 90–126, 2006, doi: 10.1016/j.cviu.2006.08.002.
[23] S. Singh, S. Wable, and P. Kharose, "A Review of E-Voting System Based on Blockchain Technology," Int. J. New Pract. Manag. Eng., vol. 10, no. 4, pp. 9–13, 2021, doi: 10.17762/ijnpme.v10i04.125.
[24] D. Veeraiah, R. Mohanty, S. Kundu, D. Dhabliya, M. Tiwari, S. S. Jamal, and A. Halifa, "Detection of malicious cloud bandwidth consumption in cloud computing using machine learning techniques," Comput. Intell. Neurosci., 2022, doi: 10.1155/2022/4003403.
