
HCI with Hand Gesture Recognition Using ResNet and MobileNet 2023-2024

1. INTRODUCTION
Deep learning is a subset of machine learning where artificial neural networks,
algorithms inspired by the human brain, learn from large amounts of data. These
networks can identify patterns and features in data, making them particularly effective for
complex tasks like speech recognition, language translation, and image recognition. In
image processing, deep learning is used to perform tasks such as image classification,
object detection, and image generation. By training on a vast dataset of images, deep
learning models can learn to recognize various objects and features in new images. This
capability is crucial for applications like autonomous vehicles, facial recognition systems,
medical image analysis, and even artistic image generation. The versatility and accuracy
of deep learning in interpreting and manipulating images have made it a key technology
in modern image processing. Human-computer interaction (HCI) has become a routine part of daily life as computer technology and hardware have advanced. The use of hand gestures in HCI has attracted considerable interest because it offers a natural way of interacting with the computer. Hand-guided robot control, virtual gaming, and natural user interfaces are just a few of its applications. A well-known use of hand gesture recognition is the interpretation of sign language. Sign language is a
visual language in which ideas are communicated by a series of expressive hand motions
in a certain order. For deaf people, it is their only means of communication. According to
the World Health Organization (WHO), 5% of the world’s population (about 360 million
people) has moderate to severe hearing loss and can only communicate via their local
sign language (WHO, 2015). Because sign language is difficult for the hearing population to understand, a communication gap exists between the hearing and the speech- and hearing-impaired communities. Computer-assisted gesture recognition can therefore be used to translate sign language; this would be beneficial and would act as a bridge between the two communities. Recognition of gestures in Indian Sign Language (ISL) is a challenging task. ISL uses both hands to portray a gesture, as opposed to ASL alphabets, which use only one hand. This increases complexity when applying feature extractors like Hough
Transform and Scale Invariant Feature Transform (SIFT). Also, while trying to predict
gestures in real time, the problem of background complexity arises, which might inhibit


accurate prediction of the gestures. So, it becomes essential to segment the hand gesture
region from the background. Although there are techniques such as color-space segmentation and Otsu's thresholding, they all have limitations with respect to background conditions.

2. LITERATURE REVIEW
Many approaches to sign language recognition for human-computer interaction exist in the literature. The basic goal of these technologies is to facilitate communication by inferring the correct interpretation of the user's gestures/signs. The methodology typically includes the following steps: capture and preprocessing, gesture representation, feature extraction, and classification. This section reviews various


gesture recognition techniques in the context of several sign languages. A summary of these strategies is provided below.

Akhter (2018) described a technique for recognizing ASL alphabets. The ASL alphabets are represented using PCA-based attributes, a Gabor filter, and orientation-based hash values. The extracted features are then classified using an artificial neural network (ANN). The approach was evaluated on a self-created database of 24 static gestures. Similarly, the authors of another study have
created a CNN-based model for human gesture recognition. For this, the model is trained
and evaluated on 31 distinct ASL alphabet and number classes. In another work, the authors took a different approach to ASL alphabet identification, in which both color and depth images of gestures were fed into a CNN model. In this model, two convolutional layers extract features from each input, and the outputs of these layers are combined and passed to a fully connected layer for classification. In addition to RGB images, several researchers have worked with depth sensors such as the Microsoft Kinect. One study demonstrated a technique for sign language recognition using a CNN with multiview augmentation and inference fusion; the depth images of the gestures were captured with the Microsoft Kinect camera, and the authors proposed using augmented data to train the CNN. This approach achieved high classification accuracy at the cost of high computational complexity. Another Kinect-based hand gesture recognition method for ISL identification has also appeared in the literature; in that study, experiments were conducted using distinct combinations of feature extraction and machine learning techniques for accurate recognition of hand gestures, and a total of 140 static words across a variety of classes were used to measure performance. In another work, the
authors described a new approach for recognizing ASL fingerspelling that makes use of a
depth sensor. They used a principal component analysis network (PCANet) to extract and learn features from the depth images, and a linear support-vector machine (SVM) to classify 24 static ASL gestures. Other authors have proposed a method for recognizing ISL using a 5-layer CNN model; the data in that study were collected from 12 distinct signers using the Microsoft Kinect sensor, and the method's performance has only been evaluated on ISL numerals and alphabets.


For gesture recognition, several authors have employed contact-based approaches. One study presented a wearable device-based approach for detecting ASL: six inertial measurement units (IMUs) were used to capture 28 ASL sentences, which were then classified using an LSTM. For a Chinese sign language translation system, Xiao et al. proposed a gesture detection strategy based on recurrent neural networks (RNNs); the signer's skeleton pattern is employed for bidirectional communication in that work, and the performance of the approach is assessed using standard RGB-D images of various static gestures. For ISL translation, another study demonstrated a sensor-based real-time hand gesture detection system.

For segmentation, Zernike moments and an SVM were utilized. Reference [24] showed a solution for static gesture recognition using Inception V3; for performance evaluation, a static image dataset of 24 English ASL letters was used. With Inception V3, a two-stage learning approach was applied to fine-tune the classification model, which achieved an accuracy of 91.35 percent. A neural network-based SLR approach was described by other researchers: a hand-detection network based on Faster R-CNN, a 3D CNN for feature extraction, and an LSTM for encoding and decoding make up the whole recognition system. Although the dataset is limited, this method yields good recognition accuracy. Another work proposed a deep learning-based architecture for SLR systems, using a CNN-based framework to classify the signs; a web camera-based database of ISL alphabets, numerals, and phrases was used to assess performance, and the effectiveness of the technique was examined using different optimizers in the CNN architecture. Contact-based, Kinect-based, and RGB image-based hand gesture recognition approaches are all compared in Table 1. The following conclusions may be drawn from this section's literature review.


3. METHODOLOGY
EfficientNet Overview:
EfficientNet, particularly the EfficientNetV2 version used in this project, represents a
significant advancement in CNN (Convolutional Neural Network) architectures. Its
design is rooted in the principle of compound scaling, which uniformly scales the depth,
width, and resolution of the network. This allows EfficientNet to achieve higher accuracy
without a proportional increase in computational complexity.
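For reference, the compound-scaling rule from the original EfficientNet paper fixes a single coefficient \phi that scales all three dimensions together:

d = \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi}, \qquad \text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \;\; \alpha, \beta, \gamma \geq 1

where d, w, and r multiply the baseline network's depth, width, and input resolution, and \alpha, \beta, \gamma are constants found by a small grid search. Raising \phi by one roughly doubles the FLOPs while keeping the three dimensions in balance.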

Preprocessing:
The data preprocessing steps are critical in preparing the images for the model. The
pipeline involves:

- Resizing images to 256x256 pixels.
- Applying a Random Resized Crop to 224x224 pixels, accommodating the input size requirement of EfficientNet.

- Random Rotation to introduce rotational invariance.


- Color Jittering to enhance the robustness of the model against variations in brightness
and contrast.
- Gaussian Blur to simulate variations in image quality and focus.
- Normalization using predefined mean and standard deviation values, aligning the
dataset with the conditions of the pretrained model.
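A minimal sketch of this pipeline using torchvision transforms is shown below; the specific rotation, jitter, and blur parameters are illustrative assumptions, since the report does not state the exact values:

import torchvision.transforms as T

# Augmentation pipeline mirroring the steps listed above; parameter values
# (rotation degrees, jitter strength, blur kernel) are assumed, not the
# project's tuned settings.
train_transform = T.Compose([
    T.Resize((256, 256)),                         # resize to 256x256 pixels
    T.RandomResizedCrop(224),                     # EfficientNet input size
    T.RandomRotation(degrees=15),                 # rotational invariance
    T.ColorJitter(brightness=0.2, contrast=0.2),  # brightness/contrast robustness
    T.GaussianBlur(kernel_size=3),                # simulate focus variation
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],       # ImageNet statistics matching
                std=[0.229, 0.224, 0.225]),       # the pretrained model
])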
Training Process:
Data Loading: The images are loaded using the ImageFolder dataset class, which
assumes that images are organized in a folder structure where each folder corresponds to
a class. This dataset is then split into training and test sets.
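A sketch of this step, assuming an 80/20 split and a hypothetical folder path (neither is stated in the report):

from torch.utils.data import random_split
from torchvision.datasets import ImageFolder

# ImageFolder expects one sub-folder per gesture class, e.g. data/signs/A,
# data/signs/B, ...; the path here is hypothetical.
dataset = ImageFolder("data/signs", transform=train_transform)

# 80/20 train/test split (the exact ratio is an assumption)
n_train = int(0.8 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])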
Model Configuration: The EfficientNetV2 model is initialized with pretrained weights to
leverage transfer learning. The final classifier layer is replaced to match the number of
classes in the sign language dataset. The model is then transferred to the CUDA device if
available, enabling GPU acceleration.
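A sketch of this configuration, assuming the small EfficientNetV2 variant from torchvision (the report does not say which variant was used):

import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Start from ImageNet-pretrained weights (transfer learning)
model = efficientnet_v2_s(weights=EfficientNet_V2_S_Weights.DEFAULT)

# Replace the final classifier layer to match the number of sign classes
num_classes = len(dataset.classes)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)
model = model.to(device)  # GPU acceleration when CUDA is available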
Training Loop: For each epoch, the model is set to training mode. Inputs and labels from the training data loader are forwarded through the model. Loss is calculated using cross-entropy, which is appropriate for multi-class classification tasks. Backpropagation and optimization steps are performed using the Adam optimizer.
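The loop itself might look like the sketch below; batch size, learning rate, and epoch count are assumptions:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
criterion = nn.CrossEntropyLoss()                          # multi-class loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam optimizer

for epoch in range(10):                       # epoch count assumed
    model.train()                             # set training mode
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)               # forward pass
        loss = criterion(outputs, labels)     # cross-entropy loss
        loss.backward()                       # backpropagation
        optimizer.step()                      # optimization step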
Evaluation Metrics: After training, the model's performance is evaluated using metrics like accuracy, precision, and F1 score. These metrics provide a comprehensive understanding of the model's effectiveness in classifying sign language gestures correctly.
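As a sketch, these metrics can be computed with scikit-learn on the held-out test set (macro averaging is an assumption, as the report does not specify it):

import torch
from torch.utils.data import DataLoader
from sklearn.metrics import accuracy_score, precision_score, f1_score

model.eval()                                  # disable dropout etc.
all_preds, all_labels = [], []
with torch.no_grad():
    for inputs, labels in DataLoader(test_set, batch_size=32):
        preds = model(inputs.to(device)).argmax(dim=1).cpu()
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

print("accuracy :", accuracy_score(all_labels, all_preds))
print("precision:", precision_score(all_labels, all_preds, average="macro"))
print("f1 score :", f1_score(all_labels, all_preds, average="macro"))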
Front-End Development Using React:
Overview: The front-end application is being developed using React, a popular
JavaScript library for building user interfaces. React's component-based architecture
allows for efficient and flexible development of the application's user interface.
User Interface: The user interface will feature a clean and intuitive design. It will include
a live video feed section where users can perform sign language gestures. Real-time
recognition results will be displayed, either as text or through an avatar mimicking the
sign.
Integration with Backend: The front-end will communicate with the backend (the
EfficientNet model) through a RESTful API or a WebSocket for real-time interaction.


The model's predictions will be sent back to the front-end, where they will be rendered
for the user.
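As one possible shape for the REST path, the sketch below shows a hypothetical FastAPI endpoint that the React client could POST a captured frame to; the route name, payload format, and the choice of FastAPI itself are assumptions, not the project's confirmed stack:

import io
import torch
import torchvision.transforms as T
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()

# Deterministic transform for inference (no random augmentation)
infer_transform = T.Compose([
    T.Resize((256, 256)), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical route; reuses the model, device, and dataset objects from
# the training sketches above.
@app.post("/predict")
async def predict(frame: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await frame.read())).convert("RGB")
    x = infer_transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        idx = model(x).argmax(dim=1).item()
    return {"gesture": dataset.classes[idx]}  # label rendered by the React UI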
Accessibility and Responsiveness: Special attention is being paid to ensure the
application is accessible, with considerations for users with various disabilities.
Additionally, the design will be responsive, ensuring usability across different devices
and screen sizes.

This project combines state-of-the-art deep learning techniques with modern web
development practices to create an accessible, efficient, and user-friendly tool for real-
time sign language recognition. The use of EfficientNet for backend processing ensures
high accuracy and efficiency, while React provides a robust and responsive front-end.
Together, these technologies create a system that can significantly enhance
communication for the deaf and hard-of-hearing community.

The authors of one study proposed a system for Arabic sign language recognition and Arabic speech generation. They worked on a manually collected dataset of 31 alphabet classes with 125 pictures per class, and the system was tested with different combinations of hyperparameters to obtain the best results with the least training time. They applied preprocessing techniques such as resizing the extracted images to 128 x 128 pixels and converting them to RGB, as well as augmentation techniques such as rotation, horizontal flipping, and shearing within a 0.2 random degree range. A CNN model was used for feature extraction and classification; the model was trained on 80% of the dataset, tested on the remaining 20%, and achieved 90% accuracy on the test set.

Sagayam and Hemanth report an Arabic sign language recognition system developed by a group of researchers from Umm Al-Qura University, Saudi Arabia. They worked on the ArSL (2018) dataset and applied preprocessing techniques such as removing noise, converting images to grayscale, resizing them to 64 x 64 pixels, and normalizing them by dividing each pixel by 255 to map the pixel range from 0-255 to 0-1. They proposed a custom CNN model, trained on 80% of the data and tested on the remaining 20% (11,089 images); the accuracy achieved was 98.06% on the training set and 88.87% on the test set.


As another example, a team of researchers from Al-Azhar University in Cairo, Egypt, and King Khalid University in Abha, Saudi Arabia, presented an Arabic sign language alphabet recognition system using machine learning methods. They experimented on a collection of 2,800 photographs covering 28 classes, with 10 people performing the 28 distinct alphabets; each letter received 100 photographs, for a total of 2,800 pictures. The color images are first enhanced to improve image quality: each is converted to a grayscale image with 256 intensity levels and then resized to 640 x 480 pixels, and filtering techniques are applied to reduce image noise. They additionally applied optimization, image enhancement, and morphological filtering to the images in a variety of ways to produce the best input for subsequent feature extraction and achieve the highest accuracy. Feature extraction is performed using a hand shape-based description, where each hand image is described by a vector of 15 values representing key-point locations, and classification is performed by algorithms such as a KNN classifier, Naive Bayes, and an MLP. In testing, they achieved an accuracy rate of 97.548%.


6. DISCUSSION
In this study, sign language recognition is considered for detecting the Arabic alphabet. The purpose is to assist hearing-impaired people through modern technology. The dataset used is ArASL2018, which consists of 54,000 images of 32 Arabic sign language alphabets. Various preprocessing techniques were applied to these images, such as reshaping, resizing, smoothing, and noise removal, along with data augmentation; together these help improve the model and its accuracy. Specifically, the images are resized to 64 x 64 pixels and converted from grayscale to three-channel images. We applied a median filter, which acts as lowpass filtering, to smooth the images and reduce noise, to make the model more robust when generalizing to real-life images, and to avoid overfitting as far as possible. Different augmentation techniques are then applied. The preprocessed images are fed into two different models, ResNet50 and MobileNetV2, both pretrained on the ImageNet dataset. ResNet50 achieved a test-set accuracy of about 97.5%, and MobileNetV2 achieved a test-set accuracy of about 97%. Finally, when we ensembled the predictions of the two models, the accuracy increased to about 98.2%.
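A minimal sketch of this preprocessing and ensembling is given below, assuming a median-filter kernel of size 3 and that the ensemble averages the two models' softmax outputs (the report states neither the kernel size nor the fusion rule):

import cv2
import torch

def preprocess(gray_image):
    # Median-filter, resize to 64x64, and expand grayscale to 3 channels
    smoothed = cv2.medianBlur(gray_image, 3)         # low-pass smoothing
    resized = cv2.resize(smoothed, (64, 64))         # 64x64 as in the paper
    rgb = cv2.cvtColor(resized, cv2.COLOR_GRAY2RGB)  # grayscale -> 3 channels
    x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    return x.unsqueeze(0)                            # add batch dimension

def ensemble_predict(resnet50, mobilenetv2, x):
    # Average the two models' class probabilities (assumed fusion rule)
    with torch.no_grad():
        p1 = torch.softmax(resnet50(x), dim=1)
        p2 = torch.softmax(mobilenetv2(x), dim=1)
    return ((p1 + p2) / 2).argmax(dim=1)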

6.1. Samples of Real-Time Detection


Figure 18 shows the first 9 Arabic alphabets and their classification in real time: the live video is processed to predict the hand gesture in each frame and classify it into one of the alphabet classes.

[Figure 18: real-time classification of the first nine Arabic alphabet signs]
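A sketch of such a frame-by-frame loop with OpenCV, reusing the preprocess and ensemble_predict helpers sketched in the discussion above; the two loaded models and the class_names list are assumed to exist:

import cv2

# resnet50, mobilenetv2, and class_names are assumed to be loaded beforehand
cap = cv2.VideoCapture(0)                     # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    x = preprocess(gray)                      # helper from the sketch above
    label = ensemble_predict(resnet50, mobilenetv2, x).item()
    cv2.putText(frame, class_names[label], (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("ArSL recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):     # press q to quit
        break
cap.release()
cv2.destroyAllWindows()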


7. CONCLUSIONS
Hand movements have been an important part of communication since the beginning of time. Sign language, a visual form of communication, is based on hand movements. Within this research, an in-depth study of a convolutional neural network (CNN) design tailored to hand gesture sign language recognition is presented. The technique is well described and provides better classification results with smaller training sets than other current CNN models. In this investigation, VGG-11 and VGG-16 were also implemented and evaluated to test the effectiveness of the system. A publicly available American Sign Language (ASL) database is used to assess performance. The performance of the proposed system, VGG-11, and VGG-16 is experimentally evaluated in comparison to state-of-the-art techniques. Several effectiveness measures, in addition to accuracy, were used to assess the efficacy of the proposed approach. The findings reveal that the proposed model outperforms previous techniques, since it can classify a large number of signs with a low error rate. The technique was also tested on new data and shown to be robust to rotation and resizing. In this work, we trained a model able to classify Arabic sign language, which consists of 32 Arabic alphabet sign classes. In images, sign language is detected through the pose of the hand. We proposed a framework consisting of two CNN models, each individually trained on the training set, with the final predictions of the two models ensembled to achieve higher results. Dataset: we worked on the ArSL2018 dataset, released in 2019 by Prince Mohammad Bin Fahd University, Al Khobar, Saudi Arabia. Summary of results: on the test set taken from our whole dataset, we achieved an accuracy of about 97% after applying many preprocessing techniques, different hyperparameters for each model, and different augmentation techniques.


