The document outlines a project aimed at developing a Sign-to-Text conversion system to bridge communication gaps between sign language users and non-users. It details the methodology, including data set generation, gesture classification using CNN, and the implementation of features like autocorrect. The project focuses on achieving high accuracy in recognizing various sign languages, particularly American Sign Language (ASL), and aims to create a user-friendly interface for effective communication.

Synopsis

On

CONVERSION OF SIGN TO TEXT


Submitted in partial fulfillment of the requirement
For the award of the degree of
B.TECH

In

Computer Science & Engineering


(Artificial Intelligence & Machine Learning)

Submitted By

ISHU SINGH
(2000681530027)
RAMAN BALIYAN
(2000681530039)
HARSH TYAGI
(2000681530024)
RITIK CHAUHAN
(2000681530040)

2023-2024

7th Sem

Department of Computer Science & Engineering


PROBLEM STATEMENT

The project aims to develop a Sign-to-Text conversion system, addressing the communication gap between individuals who use sign language and those who are not familiar with it. Sign language is a crucial means of communication for the Deaf and Hard of Hearing community, yet barriers arise when interacting with individuals who do not understand it.
Why was this topic chosen?

For interaction between hearing people and Deaf and Mute (D&M) people, a language barrier exists because the structure of sign language differs from that of written text. D&M people therefore depend on vision-based communication for interaction.

If there were a common interface that converts sign language to text, gestures could be easily understood by non-D&M people. Research has therefore been carried out on vision-based interface systems through which D&M people can communicate without either party needing to know the other's language.

The aim is to develop a user-friendly Human-Computer Interface (HCI) through which the computer understands human sign language. There are various sign languages around the world, namely American Sign Language (ASL), French Sign Language, British Sign Language (BSL), Indian Sign Language, and Japanese Sign Language, and work has been done on many other languages as well.
Objective and Scope

We plan to achieve higher accuracy even with complex backgrounds by trying out various background-subtraction algorithms.

We also intend to improve the pre-processing so that gestures can be predicted with higher accuracy in low-light conditions.

This project can be enhanced by building it as a web/mobile application so that users can access it conveniently. Also, the existing project only works for ASL; it can be extended to other native sign languages given a sufficient data set and training. This project implements a finger-spelling translator; however, sign languages are also used contextually, where each gesture can represent an object or a verb. Identifying this kind of contextual signing would require a higher degree of processing and natural language processing (NLP).
METHODOLOGY

The system is a vision-based approach. All signs are made with bare hands, which eliminates the need for any artificial device for interaction.

5.1 Data Set Generation:


For the project we tried to find ready-made datasets, but we could not find one in the form of raw images that matched our requirements; all we could find were datasets given as RGB values. Hence, we decided to create our own data set. The steps we followed are as follows.

We used the Open Computer Vision (OpenCV) library to produce our dataset.

First, we captured around 800 images of each symbol in ASL (American Sign Language) for training purposes and around 200 images per symbol for testing purposes.

We capture each frame shown by the webcam of our machine. In each frame we define a Region of Interest (ROI), denoted by a blue bounded square, as shown in the image below.

We then apply a Gaussian blur filter to the ROI, which helps us extract features of the image.

5.2 Gesture Classification:


Our approach uses two layers of algorithms to predict the final symbol shown by the user.
Algorithm Layer 1:

1. Apply a Gaussian blur filter and threshold to the frame captured with OpenCV to obtain the processed image after feature extraction.
2. The processed image is passed to the CNN model for prediction; if a letter is detected for more than 50 frames, it is printed and taken into consideration for forming the word.
3. The space between words is inserted using the blank symbol.

Algorithm Layer 2:

1. We detect the sets of symbols that give similar results when detected.
2. We then classify between the symbols in those sets using classifiers trained on those sets only.

Layer 1:

 CNN Model:
1. 1st Convolution Layer: The input picture has a resolution of 128x128 pixels. It is first processed in the first convolutional layer using 32 filter weights (3x3 pixels each), resulting in a 126x126 pixel image for each filter weight.
2. 1st Pooling Layer: The pictures are downsampled using 2x2 max pooling, i.e. we keep the highest value in each 2x2 square of the array. The picture is therefore downsampled to 63x63 pixels.
3. 2nd Convolution Layer: The 63x63 output of the first pooling layer serves as input to the second convolutional layer. It is processed using 32 filter weights (3x3 pixels each), resulting in a 61x61 pixel image.
4. 2nd Pooling Layer: The resulting images are downsampled again using 2x2 max pooling and are reduced to a 30x30 resolution.
5. 1st Densely Connected Layer: The output of the second pooling layer is reshaped into an array of 30x30x32 = 28,800 values, which is used as the input to a fully connected layer with 128 neurons. The output of this layer is fed to the 2nd densely connected layer. We use a dropout layer with rate 0.5 to avoid overfitting.
6. 2nd Densely Connected Layer: The output of the 1st densely connected layer is used as the input to a fully connected layer with 96 neurons.
7. Final Layer: The output of the 2nd densely connected layer serves as input to the final layer, which has as many neurons as the number of classes we are classifying (alphabets + blank symbol).
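The layer stack above can be sketched in Keras. The layer sizes follow the synopsis (26 letters plus the blank symbol assumed as the class count); details such as the exact dropout placement and compile settings are assumptions, not the project's actual code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 27  # assumed: 26 letters + blank symbol

# Sketch of the described architecture for grayscale 128x128 input.
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> 126x126x32
    layers.MaxPooling2D((2, 2)),                   # -> 63x63x32
    layers.Conv2D(32, (3, 3), activation="relu"),  # -> 61x61x32
    layers.MaxPooling2D((2, 2)),                   # -> 30x30x32
    layers.Flatten(),                              # -> 28,800 values
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                           # avoid overfitting
    layers.Dense(96, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```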

 Activation Function:
We used ReLU (Rectified Linear Unit) in each of the layers (convolutional as well as fully connected). ReLU computes max(x, 0) for each input value. This adds non-linearity and helps the network learn more complicated features. It mitigates the vanishing gradient problem and speeds up training by reducing computation time.
 Pooling Layer:
We apply max pooling to the input image with a pool size of (2, 2). This reduces the number of parameters, lessening the computation cost and reducing overfitting.

 Dropout Layers:
Dropout layers address overfitting: after training, the weights of the network can become so tuned to the training examples that the network does not perform well on new examples. A dropout layer "drops out" a random set of activations by setting them to zero. The network should still provide the right classification for an example even if some activations are dropped out [5].

 Optimizer:
We used the Adam optimizer to update the model in response to the output of the loss function. Adam combines the advantages of two extensions of stochastic gradient descent, namely the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSProp).

Layer 2:

We use a second layer of classifiers to verify and disambiguate symbols that look similar to each other, so that the detected symbol is as close as possible to the one actually shown. In our testing we found that the following symbols were often misclassified as the listed alternatives:

1. For D: R and U
2. For U: D and R
3. For I: T, D and K
4. For S: M and N
So, to handle the above cases, we made three different classifiers for classifying these sets:
1. {D, R, U}
2. {T, K, D, I}
3. {S, M, N}
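The dispatch between the two layers can be sketched as follows. The classifier objects here are toy stand-ins for the set-specific CNNs, and since D appears in two confusion sets, this sketch simply routes to the first matching set; the project's actual resolution logic may differ:

```python
# Confusion sets taken from the lists above.
CONFUSION_SETS = [{"D", "R", "U"}, {"T", "K", "D", "I"}, {"S", "M", "N"}]

def resolve_symbol(primary_prediction, set_classifiers, features):
    """If the main CNN's guess falls in a known confusion set,
    defer to the dedicated classifier trained on that set only."""
    for confusion_set, classifier in zip(CONFUSION_SETS, set_classifiers):
        if primary_prediction in confusion_set:
            return classifier(features)
    return primary_prediction  # unambiguous symbols pass straight through

# Toy classifiers that always pick a fixed letter, for illustration only.
classifiers = [lambda f: "D", lambda f: "I", lambda f: "S"]
```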

5.3 Finger Spelling Sentence Formation Implementation:


1. Whenever the count of a detected letter exceeds a specific value and no other letter is within a threshold of that count, we print the letter and add it to the current string (in our code we used a count of 50 and a difference threshold of 20).
2. Otherwise, we clear the dictionary holding the detection counts of the current symbol, to avoid the probability of a wrong letter being predicted.
3. Whenever the count of detected blanks (plain background) exceeds a specific value and the current buffer is empty, no space is printed.
4. Otherwise, the end of the word is predicted by printing a space, and the current word is appended to the sentence below.
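The counting rules above can be sketched as a small state machine. The thresholds mirror the values quoted in the text (50 and 20); the reset-on-ambiguity behaviour is simplified here, so this is an illustration of the idea rather than the project's exact code:

```python
FRAME_THRESHOLD = 50  # frames a letter must dominate before it is committed
DIFF_THRESHOLD = 20   # required margin over every competing letter

class SentenceFormer:
    def __init__(self):
        self.counts = {}    # detection count per symbol
        self.word = ""      # current word buffer
        self.sentence = ""  # completed sentence so far

    def observe(self, symbol):
        """Record one detected frame; commit a letter or space once stable."""
        self.counts[symbol] = self.counts.get(symbol, 0) + 1
        best = max(self.counts, key=self.counts.get)
        others = [c for s, c in self.counts.items() if s != best]
        if self.counts[best] > FRAME_THRESHOLD and \
           all(self.counts[best] - c > DIFF_THRESHOLD for c in others):
            if best == "blank":
                if self.word:  # end of word: emit a space
                    self.sentence += self.word + " "
                    self.word = ""
            else:
                self.word += best
            self.counts = {}   # reset counts for the next letter
```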

5.4 AutoCorrect Feature:

The Python library Hunspell_suggest is used to suggest correct alternatives for each (possibly incorrect) input word. We display a set of words matching the current word, from which the user can select one to append to the current sentence. This helps reduce spelling mistakes and assists in predicting complex words.
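The project uses Hunspell_suggest, which needs dictionary files; as a dictionary-free stand-in, difflib from the Python standard library illustrates the same idea of offering close matches for the user to pick from (the vocabulary here is a toy assumption):

```python
import difflib

WORD_LIST = ["hello", "help", "world", "sign", "language"]  # toy vocabulary

def suggest(word, vocabulary=WORD_LIST, n=3):
    """Return up to n vocabulary words closest to the recognised word."""
    return difflib.get_close_matches(word.lower(), vocabulary, n=n)

# E.g. a finger-spelled "helo" yields "hello" as the top suggestion.
suggestions = suggest("helo")
```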

5.5 Training and Testing:

We convert our input images (RGB) to grayscale and apply a Gaussian blur to remove unnecessary noise. We then apply an adaptive threshold to extract the hand from the background and resize the images to 128x128.

After applying all the operations mentioned above, we feed the pre-processed images to our model for training and testing.

The prediction layer estimates how likely the image is to fall under each of the classes. The output is normalized between 0 and 1 such that the values across all classes sum to 1. We achieve this using the softmax function.

At first the output of the prediction layer will be somewhat far from the actual value, so we train the network using labelled data. Cross-entropy is a performance measure used in classification. It is a continuous function that is positive wherever the output differs from the labelled value and is zero exactly when they are equal. We therefore optimize the cross-entropy by minimizing it toward zero, adjusting the weights of the network to do so. TensorFlow has a built-in function to calculate cross-entropy.
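A minimal NumPy sketch of the softmax normalisation and cross-entropy measure described above, using a toy 3-class example (the real model has one class per letter plus the blank symbol):

```python
import numpy as np

def softmax(logits):
    """Normalise raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

def cross_entropy(probs, label_index):
    """Cross-entropy against a one-hot label: -log p(correct class).
    Zero exactly when the correct class has probability 1."""
    return -np.log(probs[label_index])

# Toy logits for three classes; the largest logit gets the largest probability.
probs = softmax(np.array([2.0, 1.0, 0.1]))
```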

Having defined the cross-entropy loss, we optimized it using gradient descent, specifically the Adam optimizer, which performed best among the gradient descent optimizers we tried.
HARDWARE AND SOFTWARE USED

Hardware Specification (minimum requirement):
Processor: Any processor above 2.5 GHz
RAM: 8 GB
Hard Disk: 50 GB (solid-state drive)
System: Intel Core i5
Internet Connection: Active

Software Specification:
Operating System: Any operating system
Web Browser: Any web browser

Any system with the above or a higher configuration is suitable for this project.
Tools and Technology Used
Programming Languages and Libraries: Python, Matplotlib, NumPy, TensorFlow, Keras.
Techniques:
Convolutional Neural Network (CNN), Image Preprocessing, Model
Training, Model Evaluation
Other tools: Google Colab, Google Drive.
References

https://en.wikipedia.org/wiki/TensorFlow

https://en.wikipedia.org/wiki/Convolutional_neural_

http://hunspell.github.io/

Number System Recognition (https://github.com/chasinginfinity/number-sign-recognition)

https://opencv.org/
