Continuous Hand Gesture Segmentation and Acknowledgement of Hand Gesture Path For Innovative Effort Interfaces
Corresponding Author:
Bhupesh Kumar Dewangan
Department of Computer Science and Engineering, OP Jindal University
Punjipathra, Raigarh, Chhattisgarh 496109, India
Email: [email protected]
1. INTRODUCTION
Effective communication involves not only spoken words but also gestures, which are essential for
expressing and boosting a message's expressiveness, for both the speaker and the audience.
In the realm of human-computer interaction (HCI), gestures are instrumental in facilitating seamless
interaction. Gestures serve as a bridge between the speaker's intent and the audience's understanding,
forming the foundation of interaction [1]. When it comes to recognizing hand gestures, there are two primary
approaches: non-vision-based and vision-based [2]. Among these, vision-based methods are particularly
appealing due to their natural feel. Vision-based approaches can be further categorized as either active or
passive. Active sensing techniques have emerged as a successful avenue for gesture recognition, notably
through devices such as the Microsoft Kinect V2 [3], [4] and Leap Motion cameras. These
technologies offer a dynamic and responsive means of capturing gestures, making the recognition process
more effective and accurate. In short, vision-based approaches, particularly those employing active
sensing devices such as the Kinect V2 and Leap Motion cameras, have proven highly successful in this endeavor.
Tools have been developed to aid linguists in analyzing gestures during interactions [4]. A gesture
involves distinct phases, such as rest, preparation, stroke, hold, and retraction.
Segmenting these phases is the first step in applications that rely on hand movements, and it poses a
significant challenge for movement analysis [4], [5]. In the context of recognizing and classifying
continuous hand movements, two common approaches are considered: i) segmentation before recognition and
ii) simultaneous segmentation and recognition.
The latter approach, simultaneous segmentation and recognition, is often favored as it feels more
natural and does not require additional motion [5], [6]. The primary goal of this study is to create a framework
that performs segmentation and recognition simultaneously. The system must perform
classification according to spatial as well as temporal data when using passive sensing, which is often
used for vision-based human-computer interaction with devices such as the Microsoft Kinect.
During movement, the hand's location in each frame can be determined using spatial segmentation, and
the gesture's beginning and end points can be determined using temporal segmentation. Both spatial
and temporal segmentation are important in a continuous video stream. It is crucial to keep in mind
that, in such streams, the relevant movements are frequently embedded
in a cluttered or dynamic background. Therefore, conveying knowledge of position coordinates
and path velocity is crucial for effective interaction. Variations in gesture velocity may also pose difficulties.
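As an illustration of temporal segmentation from path velocity, the following minimal Python sketch (an assumption for exposition, not this system's actual code) flags candidate gesture spans by thresholding the velocity of per-frame hand centroids produced by a tracker; the threshold value is purely illustrative.

```python
import numpy as np

def segment_by_velocity(positions, fps=30.0, vel_thresh=40.0):
    """Return candidate (start, end) gesture spans from hand positions.

    positions: (N, 2) array of per-frame hand centroids (assumed to come
    from a hand tracker); vel_thresh is in pixels/second, an illustrative
    value, not a setting from this work.
    """
    # Path velocity: displacement between consecutive frames times frame rate.
    velocity = np.linalg.norm(np.diff(positions, axis=0), axis=1) * fps
    moving = velocity > vel_thresh

    spans, start = [], None
    for i, is_moving in enumerate(moving):
        if is_moving and start is None:
            start = i                      # temporal segment begins
        elif not is_moving and start is not None:
            spans.append((start, i))       # temporal segment ends
            start = None
    if start is not None:
        spans.append((start, len(moving)))
    return spans
```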
2. PROPOSED METHOD
The process of segmenting gestures into distinct phases introduces complexities in the analysis of
these gestures. A number of obstacles must be overcome to achieve the objective of building an
architecture for continuous hand gesture segmentation and recognition that uses spatial-temporal and path
variables. We have identified three key problems in the framework of this research and have developed
remedies for each. The multilayer perceptron (MLP) [7]-[18], with a deep layer structure and a suitable
sampling method, is the deep learning system we ultimately use.
The proposed deep learning network accepts the retrieved features after they have been re-sampled
with a nearest-neighbor-based algorithm. The next section describes the planned network's features.
E = W^T x_i + b (1)

whereas for multiple inputs and layers the formula becomes:

E = Σ_{i=1}^{m} W_{hi} · G_i + b (2)

G = {g_1, g_2, ..., g_m} (3)

W_1 = {w_1, w_2, ..., w_m} (4)
The weights and biases are values that are initialized randomly. The weights and
biases are used to calculate the output E. These networks are tuned using optimal values of the biases
and weights to fit our data.

Finally, the system categorizes the input into an output class ŷ based on an activation function a,
which is either the rectified linear unit (ReLU) or the sigmoid (σ) applied to z, such that:

α = σ(z) (5)

ŷ = {D, P, S, H, R} (7)
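For concreteness, a small Python sketch of the forward computation in (1)-(5) and the class set of (7); the input dimension and random initialization scheme are illustrative assumptions, not values from this work.

```python
import numpy as np

# Randomly initialize weights and biases, as described above; the input
# dimension (10 features) is an assumption for illustration.
rng = np.random.default_rng(0)
x = rng.random(10)                  # input feature vector x_i
W = rng.random((10, 5))             # weights, one column per output class
b = rng.random(5)                   # biases

z = W.T @ x + b                     # E = W^T x_i + b, as in (1)
a = 1.0 / (1.0 + np.exp(-z))        # sigmoid activation, as in (5)

classes = ["D", "P", "S", "H", "R"] # output class set of (7)
y_hat = classes[int(np.argmax(a))]  # predicted gesture phase
print(y_hat)
```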
MLP: a deep artificial neural network composed of more than one perceptron. The
input layer receives the signal, and the output layer makes a decision or prediction. In between, there can be
an arbitrary number of hidden layers that act as the MLP's computational engine [28]. MLPs are
capable of carrying out supervised learning tasks. Gradient descent is the method of changing the weights as
well as the biases in line with the cost function by means of back-propagation. There are many available loss
functions; we simply use (8).
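As a hedged illustration of this training procedure, the following sketch uses scikit-learn's MLPClassifier with stochastic gradient descent and an adaptive learning rate on synthetic data; the feature sizes and layer widths are assumptions, not the settings used in this work.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data: 200 samples of 32 features with 5 phase labels.
rng = np.random.default_rng(0)
X = rng.random((200, 32))
y = rng.integers(0, 5, size=200)

clf = MLPClassifier(
    hidden_layer_sizes=(64, 32),   # two hidden layers (assumed sizes)
    activation="relu",
    solver="sgd",                  # gradient descent
    learning_rate="adaptive",      # adaptive learning rate
    max_iter=500,
    random_state=0,
)
clf.fit(X, y)                      # back-propagation adjusts weights/biases
print(clf.predict(X[:3]))
```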
The convolutional neural network (CNN) has been recognized as one of the most important
architectures for machine vision applications [29]. By analyzing low-level information, such as the movement of the arms, and
then building progressively more abstract and specialized representations through a series of
convolutional layers, the CNN model is able to perform classification.
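The following PyTorch sketch illustrates this idea on assumed 64x64 single-channel hand frames; the layer configuration is illustrative, not the architecture used in this work.

```python
import torch
import torch.nn as nn

# Stacked convolutional layers turn low-level features into progressively
# more abstract representations before the final classification layer.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),   # low-level features
    nn.MaxPool2d(2),                                         # 64x64 -> 32x32
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # more abstract features
    nn.MaxPool2d(2),                                         # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 5),                              # 5 output classes
)

frames = torch.randn(8, 1, 64, 64)  # a batch of 8 single-channel hand frames
logits = model(frames)
print(logits.shape)                 # torch.Size([8, 5])
```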
4. RESEARCH MODELS
This section covers the theoretical background: image and video processing, feature extraction,
gesture representation, and gesture recognition algorithms. In the following, we introduce the principles used in
our research. We first process the images through feature extraction techniques, then apply the
gesture recognition algorithm to classify the images, and finally perform real-time processing.
Video processing techniques, such as image filtering, segmentation, and feature extraction, are often applied
to isolate and enhance the hand region in the captured frames.
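A minimal OpenCV sketch of such a pipeline follows, with commonly used YCrCb skin-color thresholds that are illustrative assumptions rather than values from this work.

```python
import cv2
import numpy as np

def isolate_hand(frame_bgr):
    """Filter the frame, threshold skin color, and crop the largest blob."""
    blurred = cv2.GaussianBlur(frame_bgr, (5, 5), 0)           # image filtering
    ycrcb = cv2.cvtColor(blurred, cv2.COLOR_BGR2YCrCb)
    # A commonly used YCrCb skin range; tune for camera and lighting.
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)                    # segmentation
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)                  # largest region
    x, y, w, h = cv2.boundingRect(hand)                        # feature extraction
    return frame_bgr[y:y + h, x:x + w]
```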
Figure 4. A gesture that represents the concept of “distortion” through selected images
4.2. Review of ML based techniques for continuous hand gesture segmentation and recognition
Krueger [20] applied the inductive MLP [21], [22], a supervised learning technique, to
the gesture unit segmentation problem. The back-propagation strategy is implemented using gradient descent and an
adaptive learning rate. Quan [23] modeled gesture phase segmentation as a classification problem, and
used an SVM to design a model that learns the motion patterns of each phase. The work mainly
addressed the limitations of the segmentation approach due to human behavior and conducted its analysis
from the standpoint of linguistics and psycholinguistics specialists. Cao et al. [24] modeled the
issue as a classification task and applied an SVM. The work exploited the temporal aspects of the problem and
used several kinds of data pre-processing to account for time- and frequency-domain features. Sturman
and Zeltzer [25] present a survey of the temporal aspects of hand gesture analysis, concentrating on
applications related to natural conversation and psycholinguistic investigation. Mitra and Acharya [26]
constructed three separate identification models using different training techniques: first, a linguistic model
using an empirical language model; second, a signal model using a Bayesian or CART decision
tree; and third, a language model with a Bernoulli decision tree. Then, to combine the outputs
from these modules and produce the final results, a hidden Markov model (HMM) is used.

Such research studies are not directly comparable. However, it is helpful
to analyze the performance already achieved on this kind of problem. The outcomes
are recorded in Table 1.
5. METHOD
The entire experiment's steps are shown in Figure 5. Several analyses have been carried out to assess and
improve the performance of the models built with deep learning systems, using the data
representations described and varying different parameters. Two sets of experiments were
arranged specifically for this investigation. In the first set, trials are conducted using a straightforward MLP
decoder. The proposed supervised deep learning network with the k-nearest neighbor (KNN) re-sampling
approach is used in the final set of experiments. The recommended method uses several
parameters.
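A minimal sketch of nearest-neighbor re-sampling of a variable-length gesture sequence to a fixed length is given below; the target length and feature dimensions are assumptions for illustration, not the settings of this work.

```python
import numpy as np

def resample_nearest(seq, target_len=40):
    """Re-sample a variable-length (T, D) gesture sequence to a fixed length
    by picking, for each target position, the nearest source frame."""
    seq = np.asarray(seq)
    idx = np.rint(np.linspace(0, len(seq) - 1, target_len)).astype(int)
    return seq[idx]

gesture = np.random.rand(57, 6)      # 57 frames x 6 path features (toy data)
fixed = resample_nearest(gesture)    # shape (40, 6)
mlp_input = fixed.reshape(-1)        # flattened vector fed to the network
```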
Key parameters (a short computational sketch follows this list):
− 𝐼𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦: measures the accuracy of a network's predictions and can be presented as (TP + TN) / (TP + TN + FP + FN).
− 𝐼𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛: the fraction of true positive results over all positive findings, displayed as TP / (TP + FP).
− 𝐼𝐹_𝑠𝑐𝑜𝑟𝑒: calculated as the weighted harmonic mean of precision and recall, 2 · (precision · recall) / (precision + recall). Its value ranges from 0
to 1, with 1 being the best result.
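For illustration, these metrics can be computed with scikit-learn; the label vectors below are toy values, not results from this work.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

# Toy label vectors for five gesture-phase classes.
y_true = [0, 1, 2, 2, 1, 0, 3, 4, 2, 1]
y_pred = [0, 1, 2, 1, 1, 0, 3, 4, 2, 2]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"))
print("F-score  :", f1_score(y_true, y_pred, average="weighted"))
```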
Tables 2 and 3 show the outcomes of tests done using the deep learning network (DLN)
framework and neural networks (NN), respectively. Employing the approaches described in the
present article, we compare the levels of accuracy, average precision, and recall in all situations. The
results for each context from the two experiment sets are shown in graphs. Using the recommended framework,
the highest achievable accuracy in U3, V3, and T3 is, respectively, 93%, 86%, and 84%. The average accuracy gain
across all situations was 18%, a substantial improvement over previous works.
7. CONCLUSION
Gesture segmentation and recognition has several inherent difficulties, as the input does not indicate a clear
starting point for each phase. As a result, different researchers may segment the same input video
differently. There is also difficulty in establishing a resting position and maintaining posture. To
better understand the classifier and its performance, the gesture behavior should be recorded in different
sessions. We develop a framework that addresses three related questions. Experimentation and evaluation are
performed by detecting, segmenting, and recognizing hand movements in videos. After re-sampling the frames
using a KNN-based method, a deep learning network was used to perform gesture recognition, achieving
better accuracy than other baseline learning algorithms. It turns out that relevant motion embedded in a video
stream can be readily learned and recognized through frame re-sampling. The performance of the framework is
evaluated based on several metrics, including F-score and classification accuracy. We also compare the
performance of this framework with recently proposed works. We address the challenge of using deep
learning algorithms based on spatiotemporal and path information in continuous hand gesture segmentation
and recognition. Finally, this work raises open questions for researchers about simultaneous segmentation
and recognition at different stages and the definition of meaningful gestures.
REFERENCES
[1] C. Yang, D. K. Han, and H. Ko, “Continuous hand gesture recognition based on trajectory shape information,” Pattern
Recognition Letters, vol. 99, pp. 39–47, Nov. 2017, doi: 10.1016/j.patrec.2017.05.016.
[2] V. Bhame, R. Sreemathy, and H. Dhumal, “Vision based hand gesture recognition using eccentric approach for human computer
interaction,” in 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Sep.
2014, pp. 949–953, doi: 10.1109/ICACCI.2014.6968545.
[3] C. Keskin, F. Kıraç, Y. E. Kara, and L. Akarun, “Real time hand pose estimation using depth sensors,” in Consumer Depth
Cameras for Computer Vision, 2011, pp. 119–137, doi: 10.1007/978-1-4471-4640-7_7.
[4] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, “Robust hand gesture recognition with kinect sensor,” in MM’11-Proceedings of the
2011 ACM Multimedia Conference and Co-Located Workshops, 2011, pp. 759–760, doi: 10.1145/2072298.2072443.
[5] R. C. B. Madeo, S. M. Peres, and C. A. de M. Lima, “Gesture phase segmentation using support vector machines,” Expert
Systems with Applications, vol. 56, pp. 100–115, Sep. 2016, doi: 10.1016/j.eswa.2016.02.021.
[6] M.-C. Popescu, E. V. Balas, L. Perescu-Popescu, and N. Mastorakis, “Multilayer perceptron and neural networks,” WSEAS
Transactions on Circuits and Systems, 2009, doi: 10.5555/1639537.1639542.
[7] J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff, “A unified framework for gesture recognition and spatiotemporal gesture
segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 9, pp. 1685–1699, 2009, doi:
10.1109/TPAMI.2008.203.
[8] B. K. Dewangan, A. Jain, R. N. Shukla, and T. Choudhury, “An ensemble of bacterial foraging, genetic, ant colony and particle
swarm approach EB-GAP: a load balancing approach in cloud computing,” Recent Advances in Computer Science and
Communications, vol. 15, no. 5, 2020, doi: 10.2174/2666255813666201218161955.
[9] B. K. Dewangan, A. Agarwal, M. Venkatadri, and A. Pasricha, “A self-optimization based virtual machine scheduling to workloads in
cloud computing environment,” International Journal of Engineering and Advanced Technology, vol. 8, no. 4, pp. 91–96, 2019.
[10] B. K. Dewangan, A. Agarwal, M. Venkatadri, and A. Pasricha, “SLA-based autonomic cloud resource management framework by
Antlion optimization algorithm,” International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 4,
pp. 119–123, 2019.
[11] B. K. Dewangan and P. Shende, “The sliding window method: an environment to evaluate user behavior trust in cloud
technology,” International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 2, pp. 1158-
1162, 2013.
[12] S. K. Baghel and B. K. Dewangan, “Defense in depth for data storage in cloud computing,” International Journal of Technology,
vol. 2, no. 2, pp. 58–61, 2012.
[13] B. K. Dewangan, M. Venkatadri, A. Agarwal, A. Pasricha, and T. Choudhury, “An automated self-healing cloud computing
framework for resource scheduling,” International Journal of Grid and High Performance Computing, vol. 13, no. 1, pp. 47–64,
2021, doi: 10.4018/IJGHPC.2021010103.
[14] B. K. Dewangan, A. Agarwal, T. Choudhury, and A. Pasricha, “Workload aware autonomic resource management scheme using grey
wolf optimization in cloud environment,” IET Communications, vol. 15, no. 14, pp. 1869–1882, 2021, doi: 10.1049/cmu2.12198.
[15] T. Choudhury, B. K. Dewangan, R. Tomar, B. K. Singh, T. T. Toe, and N. G. Nhu, “Autonomic computing in cloud resource
management in industry 4.0,” EAI/Springer Innovations in Communication and Computing, 2021.
[16] L. Chen, M. P. Harper, Y. Liu, and E. Shriberg, “Multimodal model integration for sentence unit detection,” ICMI’04-Sixth
International Conference on Multimodal Interfaces, 2004, pp. 121–128, doi: 10.1145/1027933.1027955.
[17] S. Haykin, Neural networks and learning machines, New Jersey: Pearson Prentice Hall, 2008.
[18] M. A. Nielsen, Neural networks and deep learning, Determination press San Francisco, CA, USA, 2015.
[19] P. K. Wagner, R. C. Madeo, S. M. Peres, and C. A. Lima, “Segmentation of gestural units with multilayer perceptrons (In
Portuguese),” Conference: X Encontro Nacional de Inteligência Artificial e Computacional (ENIAC), 2013.
[20] M. W. Krueger, “Artificial reality II,” Reading (Mass): Addison-Wesley, p. 304, 1991.
[21] W. Fan et al., “A method of hand gesture recognition based on multiple sensors,” 2010 4th International Conference on
Bioinformatics and Biomedical Engineering, iCBBE 2010, 2010, doi: 10.1109/ICBBE.2010.5516722.
[22] X. Zhang, X. Chen, Y. Li, V. Lantz, K. Wang, and J. Yang, “A framework for hand gesture recognition based on accelerometer
and EMG sensors,” IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, vol. 41, no. 6, pp. 1064–
1076, 2011, doi: 10.1109/TSMCA.2011.2116004.
[23] Y. Quan, “Chinese sign language recognition based on video sequence appearance modeling,” Proceedings of the 2010 5th IEEE
Conference on Industrial Electronics and Applications, ICIEA 2010, 2010, pp. 1537–1542, doi: 10.1109/ICIEA.2010.5514688.
[24] X. Y. Cao, H. F. Liu, and Y. Y. Zou, “Gesture segmentation based on monocular vision using skin color and motion cues,” IASP 10-
2010 International Conference on Image Analysis and Signal Processing, 2010, pp. 358–362, doi: 10.1109/IASP.2010.5476096.
[25] D. J. Sturman and D. Zeltzer, “A survey of glove-based input,” IEEE Computer graphics and Applications, vol. 14, no. 1, pp. 30–
39, 1994, doi: 10.1109/38.250916.
[26] S. Mitra and T. Acharya, “Gesture recognition: a survey,” IEEE Transactions on Systems, Man and Cybernetics Part C:
Applications and Reviews, vol. 37, no. 3, pp. 311–324, 2007, doi: 10.1109/TSMCC.2007.893280.
[27] Y. Wu and T. S. Huang, “Vision-based gesture recognition: a review,” Lecture Notes in Computer Science (including subseries
Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1739, pp. 103–115, 1999, doi: 10.1007/3-540-
46616-9_10.
[28] Y. Hamada, N. Shimada, and Y. Shirai, “Hand shape estimation using sequence of multi-ocular images based on transition
network,” Proceedings of the International Conference on Vision Interface, 2002, pp. 161–166.
[29] N. Tanibata, N. Shimada, and Y. Shirai, “Extraction of hand features for recognition of sign language words,” The 15th
International Conference on Vision Interface, 2002, pp. 391–398.
[30] Y. Wu and T. S. Huang, “Nonstationary color tracking for vision-based human-computer interaction,” IEEE Transactions on
Neural Networks, vol. 13, no. 4, pp. 948–960, 2002, doi: 10.1109/TNN.2002.1021895.
[31] C. Tomasi, S. Petrov, and A. Sastry, “3D tracking=classification+interpolation,” in Proceedings of the IEEE International
Conference on Computer Vision, 2003, vol. 2, pp. 1441–1448, doi: 10.1109/iccv.2003.1238659.
[32] G. Ye, J. J. Corso and G. D. Hager, “Gesture recognition using 3D appearance and motion features,” 2004 Conference on
Computer Vision and Pattern Recognition Workshop, Washington, DC, USA, 2004, pp. 160-160, doi: 10.1109/CVPR.2004.356.
[33] J. Y. Lin, Y. Wu, and T. S. Huang, “3D Model-based hand tracking using stochastic direct search method,” in Proceedings-Sixth
IEEE International Conference on Automatic Face and Gesture Recognition, 2004, pp. 693–698, doi: 10.1109/AFGR.2004.1301615.
[34] R. Aggarwal, S. Swetha, A. M. Namboodiri, J. Sivaswamy, and C. V. Jawahar, “Online handwriting recognition using depth
sensors,” Proceedings of the International Conference on Document Analysis and Recognition, ICDAR, 2015, pp. 1061–1065,
doi: 10.1109/ICDAR.2015.7333924.
BIOGRAPHIES OF AUTHORS
Piyush Chauhan graduated from IEET, now Baddi University. He received the
M.Tech. in computer science and engineering and the Ph.D. degree from Jaypee University of
Information Technology. He worked as an assistant professor in computer science and
engineering at the University of Petroleum and Energy Studies, Dehradun. He has supervised over
30 master's theses and 6 Ph.D. dissertations. He received several prizes in acknowledgment
of his outstanding research and teaching performance. He can be contacted at email:
[email protected].