
Chapter 1: Introduction

1.1 Introduction
American Sign Language (ASL) is a predominant sign language. Since the only disability deaf and mute (D&M) people have is communication related, and they cannot use spoken languages, the only way for them to communicate is through sign language. Communication is the process of exchanging thoughts and messages in various ways such as speech, signals, behaviour and visuals. D&M people use hand gestures to express their ideas to other people. Gestures are messages exchanged non-verbally and are understood through vision. This non-verbal communication of D&M people is called sign language.

In our project we focus on producing a model that can recognise fingerspelling-based hand gestures, so that a complete word can be formed by combining the gesture for each letter. The gestures we aim to train are shown in the image below.

Figure 1: ASL sign language


1.2 Scope
This system will benefit both deaf/mute people and people who do not understand sign language. The user only needs to perform sign language gestures, and the system will identify what he or she is trying to say; after identification it gives the output in both text and speech format.

1.3 Project Modules
1.3.1 Data Acquisition
1.3.2 Data Pre-processing and Feature Extraction
1.3.3 Gesture Classification
1.3.4 Text and Speech Translation

1.4 Project Requirements


1.4.1 Hardware Requirement
• Webcam

1.4.2 Software Requirement


• Operating System: Windows 8 and Above
• IDE: PyCharm
• Programming Language: Python 3.9.5
• Python libraries: OpenCV, NumPy, Keras, MediaPipe, TensorFlow

Chapter 2: Literature Review
2.1 Motivation
In a world driven by communication, the ability to express oneself is not just a
convenience—it's a basic human right. Yet, millions of individuals who are hearing or
speech impaired often struggle to be heard, to be understood, and to participate fully in
daily life.
This project was born out of a deep desire to bridge the gap between the deaf-mute
community and the hearing world. Sign language is their voice—but not everyone
understands it. My goal is to translate their signs into text and speech, giving them a
powerful tool to communicate with ease and confidence.
The idea isn't just to build a software application—it's to create a bridge. A bridge where
gestures become words, where silence turns into voice, and where inclusion replaces
isolation.
With the help of real-time gesture recognition using MediaPipe and deep learning, this
project takes a step towards true accessibility. Every sign translated is a message made
louder. Every sentence spoken from a gesture is one more step toward equality.
Through this innovation, I hope not only to use technology for good, but also to send a
message:
Technology should empower everyone—not just those who can speak or hear.

2.2 Problem Definition


Communication is a fundamental aspect of human interaction. However, individuals who
are deaf or mute face significant barriers in expressing themselves and being understood in
day-to-day situations. While sign language serves as an essential medium for their
communication, it is not widely understood by the general population, creating a
communication gap between the hearing-impaired and the rest of society.
This lack of understanding often leads to social exclusion, limited access to services, and
dependency on interpreters, which can be both costly and unavailable in real-time
scenarios.
Despite advancements in technology, there is still a lack of affordable, accessible, and real-time systems that can translate sign language into spoken language or text, especially in the context of Indian Sign Language (ISL). Most existing systems are either static, slow, or only work for single-word translation without capturing the flow of natural sentences.

Therefore, there is a critical need for a real-time, efficient, and accurate system that can
recognize hand gestures and convert them into both text and speech, enabling seamless
communication for the hearing and speech impaired.
This project aims to solve this problem by leveraging MediaPipe for hand tracking,
combined with Deep Learning (LSTM) for recognizing sentence-level gestures, and finally
converting the recognized text into audible speech using Text-to-Speech (TTS) technology.

2.3 Aim
• Hearing-impaired individuals face communication barriers with people who do not sign.
• Sign language is not widely understood by the general population.
• This creates challenges in workplaces, education, healthcare, and public spaces.
2.4 Objectives
More than 70 million deaf people around the world use sign languages to communicate. Sign language allows them to learn, work, access services, and be included in their communities.

It is hard to make everybody learn sign language, even with the goal of ensuring that people with disabilities can enjoy their rights on an equal basis with others.

So, the aim is to develop a user-friendly human-computer interface (HCI) where the computer understands American Sign Language. This project will help deaf and mute people by making their life easier.

The objective is to create computer software and train a CNN model that takes an image of an American Sign Language hand gesture, shows the output of the particular sign in text format, and converts it into audio format.

2.5 Literature Review

Mahesh Kumar N B, Assistant Professor (Senior Grade), Bannari Amman Institute of Technology, Sathyamangalam, Erode, India (2018):
This paper presents the recognition of 26 hand gestures in Indian Sign Language using MATLAB. The proposed system contains four modules: pre-processing and hand segmentation, feature extraction, sign recognition, and sign to text. Segmentation is carried out using image processing, with the Otsu algorithm used for segmentation. Features such as eigenvalues and eigenvectors are extracted and used in recognition. The Linear Discriminant Analysis (LDA) algorithm was used for gesture recognition, and recognized gestures are converted into text and voice format. The proposed system also helps to reduce dimensionality.

Figure 2.1: Pre-processing and hand segmentation

Translation of Sign Language Finger-Spelling to Text using Image Processing
by Krishna Modi, Mukesh Patel School of Technology and Management Engineering, JVPD Scheme, Bhaktivedanta Marg, Vile Parle (W), Mumbai-400 056 (2013):
In this proposed system, they intend to recognize some very basic elements of sign language and to translate them to text. First, the video is captured frame by frame; each captured frame is processed and the appropriate image is extracted. This retrieved image is further processed using BLOB analysis and sent to the statistical database, where the captured image is compared with the ones saved in the database, and the matched image is used to determine the performed alphabet sign. They implement only American Sign Language fingerspellings and construct words and sentences with them. With the proposed method, they found the probability of obtaining the desired output to be around 93%, which is sufficient to make it suitable for use on a larger scale for the intended purpose.

Figure 2.2: Pre-processing of sign language fingerspellings

Sign Language to Text and Speech Conversion
by Bikash K. Yadav, Sinhgad College of Engineering, Pune, Maharashtra (2020):
Sign language is one of the oldest and most natural forms of language for communication. Since most people do not know sign language and interpreters are very difficult to come by, they have come up with a real-time method using a Convolutional Neural Network (CNN) for fingerspelling-based American Sign Language (ASL). In their method, the hand is first passed through a filter, and after the filter has been applied, the hand is passed through a classifier that predicts the class of the hand gesture. Using this approach, they are able to reach a model accuracy of 95.8%.

Figure 2.3: Translating the sign language

Sign Language to Text and Speech Translation in Real Time Using Convolutional Neural Network
by Ankit Ojha, Dept. of ISE, JSSATE, Bangalore, India:
They create a desktop application that uses a computer's webcam to capture a person signing gestures of American Sign Language (ASL) and translates them into corresponding text and speech in real time. The translated sign language gesture is acquired as text, which is further converted into audio. In this manner they implement a fingerspelling sign language translator. To enable the detection of gestures, they make use of a Convolutional Neural Network (CNN). A CNN is highly efficient in tackling computer vision problems and is capable of detecting the desired features with a high degree of accuracy upon sufficient training. The modules are image acquisition, hand region segmentation, hand detection and tracking, hand posture recognition, and display as text/speech. A fingerspelling sign language translator is obtained which has an accuracy of 95%.

Conversion of Sign Language to Text and Speech Using Machine Learning Techniques
by Victorial Adebimpe Akano (2018):

Communication with hearing-impaired (deaf/mute) people is a great challenge in our society today; this can be attributed to the fact that their means of communication (sign language or hand gestures at a local level) requires an interpreter at every instance. The aim is to convert ASL hand gestures into text as well as speech using unsupervised feature learning, to eliminate the communication barrier with the hearing impaired and to provide a teaching aid for sign language.

Sample images of different ASL signs were collected with a Kinect sensor using the Image Acquisition Toolbox in MATLAB. About five hundred (500) data samples (with each sign counted five to ten (5-10) times) were collected as training data. The reason for this is to make the algorithm robust to images from the same database and to reduce the rate of misclassification. The combination of FAST and SURF with a KNN of 10 also showed that unsupervised learning classification could determine the best-matched feature from the existing database. In turn, the best match was converted to text as well as speech. The introduced system achieved 92% accuracy with supervised feature learning and 78% with unsupervised feature learning.

Figure 2.4: Image collection of different ASL signs

2.6 An Improved Hand Gesture Recognition Algorithm Based on Image Contours to Identify the American Sign Language
by Rakesh Kumar, Department of Computer Engineering & Applications, GLA University, Mathura (2021):
This paper proposed recognition and classification of hand gestures to identify the correct denotation with maximum accuracy for standard American Sign Language. The proposal intelligently uses information based on image contours to identify the character represented by a hand gesture. It optimizes performance overhead through identification of 17 characters and 6 symbols of standard American Sign Language based on image contours and convexity measurement, without using complex algorithms or specialized hardware devices. Accuracy measurement was done through simulation, which shows that the proposal provides more accuracy with minimum complexity in comparison to other state-of-the-art works. The average accuracy is 86% overall.

Figure 2.5: Accuracy of sign language recognition

Chapter 3: Proposed System

3.1 Comparison Table

| Author name | Mahesh Kumar | Krishna Modi | Bikash K. Yadav | Ayush Pandey | Victorial Adebimpe Akano | Rakesh Kumar |
| Algorithm   | LDA          | Blob Analysis | CNN            | CNN          | KNN                      | Contour measurement |
| Accuracy    | 80%          | 93%           | 95.8%          | 95%          | 92%                      | 86%  |
| Year        | 2018         | 2013          | 2020           | 2020         | 2018                     | 2021 |

Table 3.1: Comparison Table

The table above presents a comparative study of various research efforts undertaken by different authors in the field of sign language recognition. Each author has proposed a unique algorithm to tackle the challenge of gesture recognition and translation, evaluated by the accuracy of their respective systems:

1. Mahesh Kumar (2018) implemented the Linear Discriminant Analysis (LDA) technique for
gesture classification. While simple and computationally efficient, the model achieved an accuracy
of 80%, which is moderate compared to more advanced deep learning methods.

2. Krishna Modi (2013) applied Blob Analysis, a classical image processing method. Despite its
simplicity, it delivered an impressive 93% accuracy, showcasing that traditional methods can still
be effective when well-optimized.

3. Bikash K. Yadav (2020) and Ayush Pandey (2020) both employed Convolutional Neural Networks (CNN), a deep learning-based approach known for its excellent performance in image-related tasks. Their models reached 95.8% and 95% accuracy respectively, indicating CNN's strong ability to extract complex features and learn from gesture images.

4. Victorial Adebimpe Akano (2018) used the K-Nearest Neighbors (KNN) algorithm, a classic
machine learning method. With an accuracy of 92%, the method proved to be reliable for
classification tasks when applied with proper feature extraction.

5. Rakesh Kumar (2021) opted for contour measurement, focusing on shape-based gesture analysis.
While effective, this method achieved 86% accuracy, slightly lower than deep learning methods but
still significant considering its interpretability and lower resource demand.

3.2 Research Gap

- In the first research paper [1], they used the LDA algorithm and converted the RGB image to a binary image, but the image processing is not good enough to extract accurate features of a particular sign.
- In the second research paper [2], they recognize signs by directly comparing image pixels against images stored in their database; they also converted the RGB image to binary, and in that image processing some necessary features were removed.
- In the third [3] and fourth [4] research papers, they applied a CNN algorithm for sign recognition, which is very effective, but they did not do much image processing before feeding data to train the CNN.
- In the fifth research paper [5], they used the simple KNN algorithm for sign recognition and also did not do much image processing, which may be the reason for their moderate accuracy.
- In the sixth research paper [6], they used contour and convexity measurements for image recognition, but the algorithm did not result in good accuracy.

3.3 Project Feasibility Study


3.3.1 Operational Feasibility
- The whole purpose of this system is to handle the work much more accurately and efficiently with less time consumption.

- The application is very user-friendly; users only require knowledge of American Sign Language.

- The system is operationally feasible as it is very easy for end users to operate. It only needs basic knowledge of Windows applications.

3.3.2 Technical Feasibility

The technical needs of the system include the selection of a suitable front-end and back-end, which is an important issue in the development of any project. When we decided to develop the project, we went through an extensive study to determine the most suitable platform that suits the needs of the organization as well as helps in the development of the project. Our study included the following factors.

Front-end selection:

It must have a graphical user interface that assists users who are not from an IT background, so we have built the front-end using the Python Tkinter GUI.

Features:

1. Scalability and extensibility.

2. Flexibility.

3. Easy to debug and maintain.

Back-end selection: We have used Python as our back-end language, which has one of the widest library collections. Technical feasibility is frequently the most difficult area to assess at this stage; our application fits well within what is technically feasible.

3.3.3 Economic Feasibility

The developed system must be justified by cost and benefit, to ensure that effort is concentrated on a project that will give the best return at the earliest. One of the factors that affects the development of a new system is the cost it would require. Since the system is developed as part of project work, there is no additional manual cost for the proposed system. Also, since all the resources are already available, the system is economically feasible to develop.
3.4 Timeline Chart

Figure 3.0: Timeline chart

3.5 Detailed Module description
3.5.1 Data Acquisition
The different approaches to acquiring data about the hand gesture are as follows:
Glove-based approaches use electromechanical devices to provide the exact hand configuration and position, and different glove-based methods can be used to extract information, but they are expensive and not user friendly.
In vision-based methods, the computer webcam is the input device for observing the information of hands and/or fingers. Vision-based methods require only a camera, thus realizing a natural interaction between humans and computers without any extra devices, thereby reducing costs. The main challenge of vision-based hand detection is coping with the large variability of the human hand's appearance due to the huge number of possible hand movements, different skin-colour possibilities, and variations in viewpoint, scale, and speed of the camera capturing the scene.
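A minimal sketch of vision-based data acquisition, assuming OpenCV is installed and the default webcam is device index 0 (the window name and exit key are illustrative choices):

```python
import cv2

# Open the default webcam (device index 0 is an assumption; adjust if needed).
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()          # grab one BGR frame from the webcam
    if not ok:
        break
    cv2.imshow("Sign capture", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):   # press 'q' to stop capturing
        break

cap.release()
cv2.destroyAllWindows()
```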

3.5.2 Data Pre-processing and Feature Extraction

• In this approach to hand detection, we first detect the hand in the image acquired by the webcam, using the MediaPipe library for image processing. After finding the hand in the image we get the region of interest (ROI), crop that region, and convert it to a grayscale image using the OpenCV library. We then apply a Gaussian blur, a filter that can easily be applied using OpenCV. Finally, we convert the grayscale image to a binary image using threshold and adaptive threshold methods.
• We have collected images of different signs at different angles for the sign letters A to Z.
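A minimal sketch of this pre-processing step, assuming `roi` holds the cropped hand region as a BGR image; the blur kernel size and threshold parameters below are illustrative, not the exact values used in the project:

```python
import cv2

def preprocess_roi(roi):
    """Convert a cropped hand ROI to a binary image, as described above."""
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)       # grayscale conversion
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)        # Gaussian blur to reduce noise
    # Adaptive thresholding handles uneven illumination better than a global threshold.
    binary = cv2.adaptiveThreshold(
        blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 11, 2)
    return binary
```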

Figure 3.2

In this method there are many loopholes: the hand must be in front of a clean, plain background and in proper lighting conditions for the method to give accurate results, but in the real world we do not get a good background everywhere, nor do we always get good lighting conditions.
To overcome this situation we tried different approaches and arrived at an interesting solution: first we detect the hand in the frame using MediaPipe and obtain the hand landmarks of the hand present in that image, then we draw and connect those landmarks on a plain white image.
Figure 3.3: In this image we collected the sign language image for "B"

Figure 3.4: In this image we collected the sign language image for "A"

Figure 3.5: In this image we collected the sign language image for "D"

Mediapipe Landmark System:

figure 3.6 Hand Landmark Detection Using MediaPipe (21 Landmarks)


The above diagram represents the 21 hand landmarks used in hand gesture recognition, especially
with tools like MediaPipe Hands by Google. These landmarks are crucial in capturing and
analyzing the position and orientation of fingers for applications like sign language recognition,
gesture control, and hand tracking.
Each red dot in the image indicates a specific joint or fingertip on the hand, and the green lines
represent the connections between them, forming a skeletal structure of the hand. Below is a
breakdown of the landmark indices and their corresponding positions:
1. 0 – WRIST: The base of the hand.
2. 1 to 4 – THUMB (CMC to TIP): Traces the thumb from its base to the tip.
3. 5 to 8 – INDEX FINGER (MCP to TIP): Represents the joints of the index finger.
4. 9 to 12 – MIDDLE FINGER (MCP to TIP): Represents the joints of the middle finger.
5. 13 to 16 – RING FINGER (MCP to TIP): Joints of the ring finger.
6. 17 to 20 – PINKY FINGER (MCP to TIP): Joints of the smallest finger.
We then take these landmark points and draw them on a plain white background using the OpenCV library, as sketched below.
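A minimal sketch of this landmark extraction and skeleton drawing step, assuming the mediapipe and opencv-python packages are installed; the 400x400 canvas size is an illustrative choice:

```python
import cv2
import numpy as np
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

hands = mp_hands.Hands(static_image_mode=True, max_num_hands=1)

def landmarks_on_white(bgr_image, size=400):
    """Detect hand landmarks and redraw the skeleton on a plain white canvas."""
    rgb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)      # MediaPipe expects RGB input
    result = hands.process(rgb)
    white = np.full((size, size, 3), 255, dtype=np.uint8)  # blank white image
    if result.multi_hand_landmarks:
        # Draw the 21 landmarks and their connections onto the white canvas.
        mp_draw.draw_landmarks(
            white,
            result.multi_hand_landmarks[0],
            mp_hands.HAND_CONNECTIONS)
    return white
```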

Figure 3.7: In this image we get the landmark points of "B"

Figure 3.8: This image shows how the hand is displayed together with its landmarks

Figure 3.9: In this image we get the landmark points of "A"

Figure 3.10: This image shows how the hand is displayed together with its landmarks

- By doing this we tackle the problem of background and lighting conditions, because the MediaPipe library gives us landmark points in almost any background and most lighting conditions.
- We have collected 180 skeleton images per alphabet for the letters A to Z.

3.5.3 Gesture Classification

Convolutional Neural Network (CNN)


CNN is a class of neural networks that is highly useful in solving computer vision problems. They are inspired by the actual perception of vision that takes place in the visual cortex of our brain. They make use of a filter/kernel that scans through the pixel values of the image and performs computations by setting appropriate weights to enable detection of a specific feature. A CNN is equipped with layers such as the convolution layer, max-pooling layer, flatten layer, dense layer, dropout layer and a fully connected layer. Together these layers form a very powerful tool that can identify features in an image. The starting layers detect low-level features and gradually move on to detecting more complex, higher-level features.

Unlike regular neural networks, in the layers of a CNN the neurons are arranged in 3 dimensions: width, height and depth.

The neurons in a layer are connected only to a small region (window size) of the layer before it, instead of to all of the neurons in a fully connected manner.

Moreover, the final output layer has dimensions equal to the number of classes, because by the end of the CNN architecture we reduce the full image into a single vector of class scores.
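A minimal sketch of such a CNN in Keras; the 8 output classes follow the grouping described later in this chapter, while the input resolution and filter counts are illustrative assumptions:

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 8          # the report groups the 26 letters into 8 classes
IMG_SIZE = 128           # illustrative input resolution, not the exact value used

model = models.Sequential([
    layers.Input(shape=(IMG_SIZE, IMG_SIZE, 1)),           # grayscale skeleton image
    layers.Conv2D(32, (5, 5), activation='relu'),          # convolution layer
    layers.MaxPooling2D((2, 2)),                           # max pooling layer
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                      # flatten layer
    layers.Dense(128, activation='relu'),                  # dense (fully connected) layer
    layers.Dropout(0.3),                                   # dropout layer
    layers.Dense(NUM_CLASSES, activation='softmax'),       # class scores
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```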

figure 3.11

1. Convolutional Layer:
In the convolution layer we take a small window (typically 5x5) that extends to the depth of the input matrix.

The layer consists of learnable filters of this window size. During every iteration we slide the window by the stride size (typically 1) and compute the dot product of the filter entries and the input values at the given position.

As we continue this process we create a 2-dimensional activation map that gives the response of that filter at every spatial position.

That is, the network learns filters that activate when they see some type of visual feature, such as an edge of some orientation or a blotch of some colour.

2. Pooling Layer:
We use a pooling layer to decrease the size of the activation map and ultimately reduce the number of learnable parameters.

There are two types of pooling:

a. Max Pooling:
In max pooling we take a window (for example of size 2x2) and keep only the maximum of its 4 values.

We slide this window across the activation map, so we finally get an activation map half of its original size.

b. Average Pooling:
In average pooling we take the average of all values in the window.

Figure 3.12: In average pooling we take the average of all values in a window

3. Fully Connected Layer:
In the convolution layer neurons are connected only to a local region, while in a fully connected layer we connect all the inputs to every neuron.

Figure 3.13: The 180 pre-processed images per alphabet are fed to the Keras CNN model.

Because we got poor accuracy with 26 separate classes, we divided the 26 alphabets into 8 classes, where each class contains visually similar alphabets:

Figure 3.13a: [y, j]

Figure 3.13b: [c, o]

Figure 3.13c: [g, h]

Figure 3.13d: [b, d, f, i, u, v, k, r, w]

Figure 3.13e: [p, q, z]

Figure 3.13f: [a, e, m, n, s, t]

All the gesture labels are assigned a probability, and the label with the highest probability is treated as the predicted label.

When the model classifies a gesture into the combined [a, e, m, n, s, t] class, we use mathematical operations on the hand landmarks to further classify it into the single alphabet a, e, m, n, s or t (an illustrative sketch of such a check is given below).

Figure 3.14: Hand landmarks used for the further classification
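As an illustration only (the report does not give the exact rules), such a landmark-based check might compare fingertip and knuckle coordinates; the conditions and thresholds below are hypothetical and cover only part of the group:

```python
# Hypothetical post-classification rule for the combined [a, e, m, n, s, t] class.
# `lm` is assumed to be a list of 21 (x, y) landmark tuples in image coordinates,
# indexed as in Figure 3.6 (0 = wrist, 4 = thumb tip, 5 = index MCP, 17 = pinky MCP).

def split_aemnst(lm):
    thumb_tip_x = lm[4][0]
    index_mcp_x = lm[5][0]
    pinky_mcp_x = lm[17][0]
    # Example rule: if the thumb tip lies outside the palm beyond the index MCP,
    # guess 'a'; if it is tucked across toward the pinky side, guess 's'; otherwise 'e'.
    # The remaining letters (m, n, t) would need similar, additional checks.
    if thumb_tip_x > index_mcp_x:
        return 'a'
    if thumb_tip_x < pinky_mcp_x:
        return 's'
    return 'e'
```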

3.5.4 Text and Speech Translation


The model translates recognized gestures into words. We have used the pyttsx3 library to convert the recognized words into speech. The text-to-speech output is a simple workaround, but it is a useful feature because it simulates a real-life dialogue.
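A minimal sketch of this text-to-speech step with pyttsx3; the speech-rate value is an illustrative choice:

```python
import pyttsx3

engine = pyttsx3.init()            # initialize the offline TTS engine
engine.setProperty('rate', 150)    # speaking speed in words per minute (illustrative)

def speak(text):
    """Speak the recognized word or sentence aloud."""
    engine.say(text)
    engine.runAndWait()            # block until speech has finished

speak("HELLO")
```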
3.6 Project SRS
3.6.1 System Flowchart

figure 3.6.1

The given diagram represents the conceptual architecture or flow of processes in a sign language
recognition system that converts hand gestures into text and speech. This system is designed to bridge the
communication gap between the speech/hearing-impaired and the general population.

3.6.3 DFD diagram


DFD-Level 0

figure 3.6.2

This diagram provides a simple overview of how a Sign Language to Text Converter System works. It
visually represents the communication flow between the user and the system.

DFD-Level 1

figure 3.6.3

1. User Input (Hand Gestures):

• The user performs hand gestures representing specific alphabets, words, or phrases using Indian
Sign Language (ISL) or any other sign language standard.

2. Sign Language to Text Converter:

• The system captures the hand gestures via a camera or sensor module.

• Using techniques such as MediaPipe, CNN models, or keypoint detection, the system identifies the
gesture.

• Each gesture is mapped to its corresponding character based on a pre-trained recognition model.

3. Output (Corresponding Character):

• The recognized character is displayed as text to the user.

• This step may also include text-to-speech conversion for vocal output.

3.6.4 Sequence diagram

figure 3.6.4

This sequence diagram illustrates the step-by-step process of sign language recognition, starting from video
capture to final output generation. It explains how different components in the system interact to convert
hand gestures into corresponding text using a machine learning model.


Chapter 4: Implementation Plan for Next Semester

• Real-Time Sentence Translation:

Use a sequence model like LSTM (Long Short-Term Memory) or a Transformer to:
- Recognize a series of gestures over time (like a video).
- Output a full sentence directly, instead of word by word (see the table and sketch below).

| Current Output | Upgraded Output |
| "HELLO" → "MY" → "NAME" → "IS" → "RAHUL" | "HELLO, MY NAME IS RAHUL" (in one go) |

• Two-Way Communication (Bi-Directional):

Add Text/Speech → Sign Language (using animation or avatars).

| Mode | Function |
| Deaf/Mute → Hearing | Sign → Text/Speech |
| Hearing → Deaf/Mute | Text/Speech → Sign (animation), to be added |

• Multilingual Support:
Allow users to choose other languages (Marathi, Tamil, Telugu, Bengali, etc.).

How it works:
- Convert signs to English (default).
- Use the Google Translate API or any language model to translate English into the target language.
- Use text-to-speech (TTS) in the selected language.


| Sign | English | Hindi | Tamil |
| 👋 🤚 🧍‍♂️ | "Hello, how are you?" | "नमस्ते, आप कैसे हैं?" | "வணக்கம், எப்படி இருக்கிறீர்கள்?" |

Tech Stack:

• Use Python libraries:
  o googletrans for translation
  o gTTS or pyttsx3 for multilingual speech
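A minimal sketch of this translation-plus-speech pipeline, assuming the googletrans and gTTS packages are installed (googletrans is an unofficial client, so its API may differ between versions); the language code and file name are illustrative:

```python
from googletrans import Translator
from gtts import gTTS

def translate_and_speak(english_text, lang='hi', out_file='output.mp3'):
    """Translate recognized English text and synthesize speech in the target language."""
    translator = Translator()
    # Synchronous call as in googletrans 3.x / 4.0.0rc1.
    translated = translator.translate(english_text, src='en', dest=lang).text
    tts = gTTS(text=translated, lang=lang)   # gTTS needs an internet connection
    tts.save(out_file)                       # save the spoken audio to an MP3 file
    return translated

print(translate_and_speak("Hello, how are you?", lang='hi'))
```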

• Integration with Smart Devices (IoT):

Control smart devices using sign language.
Example:
- Make the sign for "LIGHT ON" → the bulb turns on.
- "FAN OFF" → the fan turns off.
How to implement (see the sketch below):
• Use the sign recognition model to detect the command.
• Send the command to the device using:
  o MQTT or Wi-Fi communication
  o Raspberry Pi / ESP32
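A minimal sketch of the MQTT side, assuming the paho-mqtt package and a broker at a hypothetical address; the topic name and payloads are illustrative:

```python
import paho.mqtt.client as mqtt

BROKER = "192.168.1.10"          # hypothetical address of the MQTT broker (e.g. a Raspberry Pi)
TOPIC = "home/devices/commands"  # illustrative topic the smart devices subscribe to

# paho-mqtt 1.x style constructor; 2.x additionally expects a CallbackAPIVersion argument.
client = mqtt.Client()
client.connect(BROKER, 1883)     # 1883 is the default MQTT port

def send_sign_command(recognized_text):
    """Publish a recognized sign command such as 'LIGHT ON' or 'FAN OFF'."""
    client.publish(TOPIC, recognized_text)

send_sign_command("LIGHT ON")
```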


Chapter 5: Implementation and Testing
Here are some snapshots where the user shows hand gestures in front of different backgrounds and in different lighting conditions, and the system gives the corresponding prediction.

Figure 5.1: In this image we check the result: "A" is predicted

Figure 5.2: In this image we check the result: "W" is predicted

Here the hand gesture of the sign 'W' is shown with a different background, and our model still predicts the correct letter.


Figure 5.3: In this image we check the result: "B" is predicted

Figure 5.4: In this image we check the result: "D" is predicted

After implementing the CNN algorithm we built a GUI using Python Tkinter and added word suggestions to make the process smoother for the user.


The sign shown below is used for inserting a space between words.

Figure 5.5: In this image the sentence "DEER" is predicted and spoken aloud

The sign shown below is used after each predicted alphabet to move on to the next character.

Figure 5.6: Showing the palm to move on to the next character


Chapter 6: Conclusion and Future Work
Finally, we are able to predict any alphabet [a-z] with 97% accuracy (with or without a clean background and proper lighting conditions) using our method. If the background is clean and the lighting conditions are good, we get results that are even 99% accurate.

In future work we will build an Android application in which we implement this algorithm for gesture prediction.


Chapter 7: References

1. Zhou, H.; Zhou, W.; Zhou, Y.; Li, H. Spatial-temporal multi-cue network for continuous sign language recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13009–13016.
2. Rodriguez, J.; Martínez, F. How important is motion in sign language translation? IET Comput. Vis. 2021, 15, 224–234.
3. Zheng, J.; Chen, Y.; Wu, C.; Shi, X.; Kamal, S.M. Enhancing neural sign language translation by highlighting the facial expression information. Neurocomputing 2021, 464, 462–472.
4. Li, D.; Xu, C.; Yu, X.; Zhang, K.; Swift, B.; Suominen, H.; Li, H. TSPNet: Hierarchical feature learning via temporal semantic pyramid for sign language translation. Adv. Neural Inf. Process. Syst. 2020, 33, 12034–12045.
5. Núñez-Marcos, A.; Perez-de Viñaspre, O.; Labaka, G. A survey on Sign Language machine translation. Expert Syst. Appl. 2022, 213, 118993.
6. Cui, R.; Liu, H.; Zhang, C. Recurrent convolutional neural networks for continuous sign language recognition by staged optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7361–7369.

