Divya Final Year Report
A Project Report
ON
BY
Project Guide
Ms. Aishwarya M Bhat,
Assistant Professor
Department of Computer Science & Engineering
MITE, Moodabidri
February – May 2021
CERTIFICATE
Technological University, Belagavi during the year 2020 – 21. It is certified that all
corrections and suggestions indicated for Internal Assessment have been incorporated in the
report deposited in the departmental library. The project has been approved as it satisfies the
academic requirements in respect of project work prescribed for the Bachelor of Engineering
degree.
External Viva
1)
2)
ABSTRACT
ACKNOWLEDGEMENTS
The satisfaction of the successful completion of this project would be incomplete without mentioning the people who made it possible, whose constant guidance and encouragement crowned our efforts with success.
This project was made under the guidance of Ms. Aishwarya M Bhat, Senior Assistant Professor, Department of Computer Science and Engineering. We would like to express our sincere gratitude to our guide for her helping hand and guidance in this project.
We would like to thank our project coordinator Mr. Shivaprasad T K, Senior Assistant Professor, Department of Computer Science and Engineering, for his cordial support, valuable information and guidance, which helped us in completing this project through its various stages.
We would like to express appreciation to Dr. Venkatramana Bhat P., Professor and Head of
the department, Computer Science and Engineering, for his support and guidance.
We would like to thank our Principal Dr. G.L. Easwara Prasad, for encouraging us and
giving us an opportunity to accomplish the project.
We also thank our management, who helped us directly and indirectly in the completion of this project.
Our special thanks to faculty members and others for their constant help and support.
Above all, we extend our sincere gratitude to our parents and friends for their constant
encouragement with moral support.
TABLE OF CONTENTS
Contents Page No
ABSTRACT i
ACKNOWLEDGEMENT ii
TABLE OF CONTENTS iii
LIST OF TABLES vi
Chapter no TITLE
1. INTRODUCTION 1
1.1 Introduction 1
1.2 Problem Statement 1
1.3 Objectives (Purpose of the project) 2
1.4 Scope of the project 2
1.5 Organization of the report 3
2 LITERATURE SURVEY 4
2.1 Existing System 4
2.2 Limitations of existing systems 7
2.3 Proposed System 8
2.3.1 Convolutional Neural Network 9
2.3.2 Convolutional layers 10
2.3.3 Pooling Layers 11
2.3.4 Fully Connected Layers 11
2.3.5 Receptive fields 11
2.3.6 Weights 11
2.3.7 Preprocessing and Feature Extraction
3.1.3 Assumptions and Dependencies 14
4 GANTT CHART 17
5 SYSTEM DESIGN 18
6 IMPLEMENTATION 23
6.1 CodeSnippets 23
7 TESTING 28
7.1 TestingLevels 28
7.1.1 Unit Testing
7.1.2 Integration Testing
7.1.3 System Testing
7.1.4 Acceptance Testing
8 RESULTS AND SNAPSHOTS
9 CONCLUSION AND FUTURE WORK
9.1 Conclusion 34
REFERENCES 36
LIST OF TABLES
Chapter 1
INTRODUCTION
1.1 Introduction
For deaf-mute people, the importance of body language cannot be overstated. They cannot speak or hear, so hand gestures are their most commonly used, or even only, tool for communication. They are primarily dependent on sign language to meet the needs of their daily lives. The world where the deaf live and the world where the hearing live is the same, but only a few people are willing to communicate with deaf-mute people. The reason is simple: the majority of hearing people cannot understand sign language and have difficulty communicating with the deaf-mute. A sign language recognition system is one of the solutions to this communication problem. Our project aims to bridge the gap between speech- and hearing-impaired people and normal people. The basic idea of this project is to build a system using which deaf and dumb people can communicate effectively with all other people using their normal gestures.
1.2 Problem Statement
The only way speech- and hearing-impaired (deaf and dumb) people can communicate is by sign language. The main problem with this mode of communication is that normal people who cannot understand sign language cannot communicate with these people, and vice versa. The basic idea of this project is to build a system using which deaf and dumb people can communicate effectively with all other people using their normal gestures. The project uses image processing to identify, in particular, the English alphabet sign language used by deaf people to communicate, and converts it into text so that normal people can understand.
1.3 Objectives
The objectives of this system are as follows:
1. To recognize English alphabet sign language gestures from video input captured through a web camera.
2. To convert the recognized sign gestures into text.
3. To convert the resulting text into a voice signal so that normal people can understand it.
1.4 Scope
The scope of this project is to design a system that recognizes static English alphabet sign language gestures from video frames captured through a web camera, converts them into text, and then converts the text into a voice signal. The system is intended to help deaf and dumb people communicate with people who do not understand sign language.
1.5 Organization of the report
1. Chapter 1 of this document consists of the Introduction, which gives a brief description of the project and its scope.
2. Chapter 2 of this document describes the Literature Survey. It provides details about the existing systems, the limitations of the existing systems and the proposed system for the project.
3. Chapter 3 describes the Software Requirements Specification. It includes the hardware, software, functional and non-functional requirements of the system, along with the assumptions and dependencies.
4. Chapter 4 is the Gantt Chart, which is a bar chart showing the project schedule.
5. Chapter 5 is concerned with the System Design. It includes the architectural diagram, the class diagram, the use case diagram and description, sequence diagram, activity diagram and data flow diagram.
6. Chapter 6 describes the Implementation. It includes a detailed description of how the project has been implemented.
7. Chapter 7 describes the Testing, where the proposed system is tested at various levels such as unit test, integration test and system test, and how the program is executed with a set of test cases.
8. Chapter 8 presents the Results and Snapshots of the working system.
9. Chapter 9 gives the Conclusion and Future Work.
Chapter 2
LITERATURE SURVEY
2.1 Existing System
“Davi Hirafuji Neiva et al have published a paper entitled Gesture recognition: A review focusing on sign language in a mobile context” [1].
This paper reviews gesture recognition techniques for sign language in a mobile context, including classifiers such as Random Forest and Multi-layer Perceptron. Despite the progress being made with the increasing interest in gesture recognition, there are still important gaps to be addressed in the context of sign languages. Besides improving the usability and efficacy of the solutions, recognition of facial expressions and of both static and dynamic gestures in complex backgrounds must be considered.
“Ming Jin Cheok et al have published a paper entitled A review of hand gesture and sign language recognition techniques” [2].
Hand gesture recognition serves as a key for overcoming many difficulties and providing
convenience for human life. The ability of machines to understand human activities and their
meaning can be utilized in a vast array of applications. One specific field of interest is sign
language recognition. This paper provides a thorough review of state-of-the-art techniques used
in recent hand gesture and sign language recognition research. The techniques reviewed are
suitably categorized into different stages: data acquisition, pre-processing, segmentation, feature
extraction and classification, where the various algorithms at each stage are elaborated and their
merits compared. When segmenting the hand area using a skin color threshold algorithm, HSV is a color space that is generally robust to illumination conditions. From previous works, HMM has been successfully implemented in much of the research on gesture recognition, while SVM appears to be a popular approach for static gesture recognition, with better performance.
Further, we also discuss the challenges and limitations faced by gesture recognition research in
general, as well as those exclusive to sign language recognition. Overall, it is hoped that the
study may provide readers with a comprehensive introduction into the field of automated gesture
and sign language recognition, and further facilitate future research efforts in this area.
“Tejashri J. Joshi et al have published a paper on a feature extraction method using Principal Component Analysis (PCA)” [3].
This paper presents a common dimension reduction method. It is to retain some of the most
significant features of high-dimensional data, removing noise and unimportant features, so as to
achieve the purpose of improving data processing speed.
This paper developed a principal component analysis (PCA)-integrated algorithm for feature
identification in manufacturing; this algorithm is based on an adaptive PCA-based scheme for
identifying image features in vision-based inspection. PCA is a commonly used statistical
method for pattern recognition tasks, but an effective PCA-based approach for identifying
suitable image features in manufacturing has yet to be developed. Unsuitable image features tend
to yield poor results when used in conventional visual inspections. Furthermore, research has
revealed that the use of unsuitable or redundant features might influence the performance of
object detection. To address these problems, the adaptive PCA-based algorithm developed in this
study entails the identification of suitable image features using a support vector machine (SVM)
model for inspecting various object images.
“Christopher Lee and Yangsheng Xu have published a paper on a glove-based gesture recognition system” [4].
Most of the research done in this field uses a glove-based system. In the glove-based system, sensors such as potentiometers, accelerometers etc. are attached to each finger. Based on their readings, the corresponding alphabet is displayed. Christopher Lee and Yangsheng Xu developed a glove-based gesture recognition system that was able to recognize 14 of the letters from the hand alphabet, learn new gestures, and update the model of each gesture in the system in online mode. Over the years, advanced glove devices have been designed, such as the Sayre Glove, Dexterous Hand Master and Power Glove. The main problem faced by this glove-based system is that it has to be recalibrated every time a new user uses the system. Also, the connecting wires restrict the freedom of movement.
“Byung-Woo Min et al have published a paper entitled Sign Language Recognition using Hidden Markov Model” [5].
A gesture is a movement of the hands, face or other parts of the body used to communicate a specific message or to express thoughts, ideas, emotions, etc. Though other body parts can be used for gestures, the hand is the easiest. So, in the field of Human-Computer Interaction (HCI), hand gesture recognition is an active area of research. Hand gesture recognition approaches can be mainly divided into Data-Glove based and Vision Based approaches. The Data-Glove based methods use sensor devices for digitizing the hand. Due to the extra sensors, it is easy to collect hand configuration and movement. This gives good performance, but the devices are quite expensive. The other approach is the Vision Based methods, which require only a camera and thus give natural interaction between humans and computers without the use of any extra devices. Therefore it is efficient to use and also cost effective. An HMM is a Markov process with hidden states, and these hidden parameters are obtained from the observable parameters.
2.2 Limitations of Existing Systems
1. The main problem faced by the glove-based systems is that they have to be recalibrated every time a new user uses the system.
2.3 Proposed System
In this study, the proposed system aims to recognize hand gestures for sign language.
1. In this proposed system there are two modules, namely admin and user.
2. The admin manages the dataset. The dataset is required to train the system.
3. We train the system with the images of the dataset, wherein we use processes like preprocessing, feature extraction and classification.
4. Once the features are extracted from the dataset images, we create a weight file. This weight file contains the summarized information of the features extracted (a minimal training sketch is given after this list).
5. It is a persistent file, so it can be reused later and anyone can then use the system.
6. Here we make use of the CNN (Convolutional Neural Network) algorithm.
7. A convolutional neural network is an algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from another.
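The following is a minimal sketch of how such a dataset could be loaded and the trained weights saved using tf.keras; the folder layout (one sub-folder per alphabet under dataset/) and the file names are assumptions for illustration, not the exact code used in this project.
import tensorflow as tf

# Assumed layout: dataset/A, dataset/B, ... one sub-folder per alphabet label.
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    'dataset',
    image_size=(299, 299),   # matches the 299 x 299 input size discussed in Section 2.3.7
    batch_size=32)

# 'model' is assumed to be a compiled CNN such as the one sketched in Section 2.3.4.
# model.fit(train_ds, epochs=10)
# model.save_weights('sign_weights.h5')   # the "weight file" referred to above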
2.3.1 Convolutional Neural Network
A convolutional neural network consists of an input layer, hidden layers and an output layer. In
any feed-forward neural network, any middle layers are called hidden because their inputs and
outputs are masked by the activation function and final convolution. In a convolutional neural
network, the hidden layers include layers that perform convolutions. Typically this includes a
layer that performs a dot product of the convolution kernel with the layer's input matrix. This
product is usually the Frobenius inner product, and its activation function is commonly ReLU.
As the convolution kernel slides along the input matrix for the layer, the convolution operation
generates a feature map, which in turn contributes to the input of the next layer. This is followed
by other layers such as pooling layers, fully connected layers, and normalization layers.
2.3.2 Convolutional Layers
Convolutional layers convolve the input and pass its result to the next layer. This is similar to the
response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron
processes data only for its receptive field. Although fully connected feedforward neural
networks can be used to learn features and classify data, this architecture is generally impractical
for larger inputs such as high resolution images. It would require a very high number of neurons,
even in a shallow architecture, due to the large input size of images, where each pixel is a
relevant input feature. For instance, a fully connected layer for a (small) image of size 100 x 100
has 10,000 weights for each neuron in the second layer. Instead, convolution reduces the number
of free parameters, allowing the network to be deeper. For example, regardless of image size,
using a 5 x 5 tiling region, each with the same shared weights, requires only 25 learnable
parameters. Using regularized weights over fewer parameters avoids the vanishing gradients and
exploding gradients problems seen during backpropagation in traditional neural networks.
Furthermore, convolutional neural networks are ideal for data with a grid-like topology (such as
images) as spatial relations between separate features are taken into account during convolution
and/or pooling.
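As a small worked example of the parameter counts mentioned above (the numbers simply restate the 100 x 100 and 5 x 5 figures from this paragraph):
# Worked example of the parameter counts discussed above.
pixels = 100 * 100                 # a (small) 100 x 100 input image
weights_per_fc_neuron = pixels     # a fully connected neuron needs 10,000 weights
weights_per_conv_filter = 5 * 5    # a shared 5 x 5 kernel needs only 25 weights,
                                   # regardless of the image size
print(weights_per_fc_neuron, weights_per_conv_filter)   # 10000 25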
2.3.3 Pooling Layers
Convolutional networks may include local and/or global pooling layers along with traditional
convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of
neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small
clusters; tiling sizes such as 2 x 2 are commonly used. Global pooling acts on all the neurons of
the feature map. There are two common types of pooling in popular use: max and average. Max
pooling uses the maximum value of each local cluster of neurons in the feature map,
while average pooling takes the average value.
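The following is a small numeric illustration of 2 x 2 max and average pooling, assuming a 4 x 4 feature map with made-up values:
import numpy as np

# A 4 x 4 feature map with made-up values.
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 8, 1],
                 [3, 4, 2, 6]], dtype=float)

# Group the map into non-overlapping 2 x 2 clusters, then reduce each cluster.
blocks = fmap.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(blocks.max(axis=(2, 3)))    # max pooling:     [[6. 4.] [7. 8.]]
print(blocks.mean(axis=(2, 3)))   # average pooling: [[3.75 2.25] [4.   4.25]]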
2.3.4 Fully Connected Layers
Fully connected layers connect every neuron in one layer to every neuron in another layer.
It is the same as a traditional multi-layer perceptron neural network (MLP). The flattened matrix
goes through a fully connected layer to classify the images.
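A minimal sketch of a CNN combining the layer types described above (convolution, pooling, flattening and fully connected layers) is shown below using tf.keras; the input size, filter counts and the 26 alphabet classes are illustrative assumptions, not the exact architecture used in this project.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),               # small RGB input (assumed size)
    layers.Conv2D(32, (5, 5), activation='relu'),  # convolutional layer with ReLU activation
    layers.MaxPooling2D((2, 2)),                   # local max pooling over 2 x 2 clusters
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                              # flatten the feature maps
    layers.Dense(128, activation='relu'),          # fully connected layer
    layers.Dense(26, activation='softmax'),        # one output per alphabet class (assumed)
])
model.summary()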
2.3.5 Receptive Fields
In neural networks, each neuron receives input from some number of locations in the
previous layer. In a convolutional layer, each neuron receives input from only a restricted area of
the previous layer called the neuron's receptive field. Typically the area is a square (e.g. 5 by 5
neurons). Whereas in a fully connected layer, the receptive field is the entire previous layer.
Thus, in each convolutional layer, each neuron takes input from a larger area in the input than
previous layers. This is due to applying the convolution over and over, which takes into account
the value of a pixel, as well as its surrounding pixels. When using dilated layers, the number of
pixels in the receptive field remains constant, but the field is more sparsely populated as its
dimensions grow when combining the effect of several layers.
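The growth of the receptive field with stacked convolutional layers can be computed directly; the kernel sizes and strides below are assumptions chosen only to illustrate the idea.
def receptive_field(kernel_sizes, strides):
    # Each layer widens the receptive field by (kernel - 1) times the current jump,
    # and the jump grows with every stride greater than 1.
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf

print(receptive_field([3, 3], [1, 1]))        # two stacked 3 x 3 convolutions see a 5 x 5 area
print(receptive_field([5, 5, 3], [1, 2, 1]))  # the field grows faster once a stride of 2 is used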
2.3.6 Weights
Each neuron in a neural network computes an output value by applying a specific function
to the input values received from the receptive field in the previous layer. The function that is
applied to the input values is determined by a vector of weights and a bias (typically real
numbers). Learning consists of iteratively adjusting these biases and weights. The vector of
weights and the bias are called filters and represent particular features of the input (e.g., a
particular shape). A distinguishing feature of CNNs is that many neurons can share the same
filter. This reduces the memory footprint because a single bias and a single vector of weights are
used across all receptive fields that share that filter, as opposed to each receptive field having its
own bias and weight vector.
2.3.7 Preprocessing and Feature Extraction
Bottleneck features are the values computed in the pre-classification layer. The basic technique to get transfer learning working is to take a pre-trained model (with the weights loaded) and remove the final fully-connected layers from that model. We then use the remaining portion of the model as a feature extractor for our smaller dataset. These extracted features are called "bottleneck features" (i.e. the last activation maps before the fully-connected layers in the original model). We then train a small fully-connected network on those extracted bottleneck features in order to get the classes we need as outputs for our problem. Once prepared, the data is divided into three parts: training, testing and validation. Here, each folder is trained. We make use of the TensorFlow InceptionV3 model. It is a convolutional neural network that has 48 layers and can process images of 299 x 299 dimensions.
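A hedged sketch of this transfer-learning idea using tf.keras is given below; the project itself uses TensorFlow's InceptionV3 retraining workflow, so the layer sizes of the small classifier head and the 26 output classes here are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

NUM_CLASSES = 26   # assumption: one class per English alphabet sign

# Pre-trained InceptionV3 without its final fully-connected layers;
# the remaining network acts as the bottleneck feature extractor.
base = InceptionV3(weights='imagenet', include_top=False,
                   input_shape=(299, 299, 3), pooling='avg')
base.trainable = False   # keep the pre-trained weights fixed

# Small fully-connected network trained on the bottleneck features.
model = models.Sequential([
    base,
    layers.Dense(128, activation='relu'),
    layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])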
Chapter 3
SOFTWARE REQUIREMENTS SPECIFICATION
The SRS is a document, which describes completely the external behavior of the system.
This section of the SRS describes the general factors that affect the product and its requirements.
The system will be explained in its context to show how the system interacts with other systems
and to introduce its basic functionality.
Product perspective is essentially the relationship of the product to other products, defining whether it is independent or part of a larger product. The proposed system requires the Python programming language and OpenCV for image processing to get the desired output.
Worldwide efforts have been made to aid the deaf community in communicating with non-signers, but most of the existing systems either use specialized sensors or have low performance. The proposed system helps normal people to communicate with deaf people, and even deaf people can communicate effectively with the public. This system recognizes their sign notations and produces the English equivalent; the system then converts the obtained text into a voice signal.
This system is developed to help the deaf community by converting their sign language into text. The system will also convert the text to voice signals to help the public understand their language. The system consists of a web camera through which it receives the video input. Deaf people stand in front of the camera and show the sign notation. The system converts the video data into a number of frames and recognizes the alphabet equivalent of each. The system then converts the word to audio data, which can be played through speakers.
3.1.3 Assumptions and Dependencies
We assume that the system is installed with enough resources, since the system requires a higher-end configuration than average. We assume that the system has a built-in camera to get quality video frames as input. The system may show a slight delay in identifying the sign signal because of its computations.
This section includes the detailed description of the hardware requirements, software requirements, functional requirements and non-functional requirements.
Hardware requirements refer to the physical parts of a computer and related devices. Internal hardware devices include motherboards, hard drives and RAM. External hardware devices include monitors, keyboards, mice, printers and scanners.
1. Upload Module: This module is used to upload the dataset to the system memory.
2. Presentation Module: This module is used to present the result to end user.
3. Create model: This module is used to create a CNN module to train the system.
4. Train Module: This module is used to train the system with dataset images.
5. Initialize Camera: This module is used to initialize the camera.
6. Classification module: This module is used to classify the input video frames into
different alphabet labels.
7. Text to voice converter: This module is used to convert text into voice signal.
1. Usability: This system can be used by any deaf person without any effort, and it has an appropriate user interface.
2. Maintainability: This software has been designed to be user friendly; it can be maintained even by less technically trained people and requires little maintenance.
3. Response Time: This system has good response time so that end user will get result
within the estimated time.
4. Software development life cycle: Here agile method is used which combines the
advantages of waterfall approach and iterative model.
CHAPTER 4
GANTT CHART
A Gantt chart is a type of bar chart, developed by Henry Gantt that illustrates a
project schedule. Gantt charts illustrate the start and finish of the terminal elements and
summary elements of the project. Terminal elements and summary elements comprise
the work breakdown structure of the project.
The following is the Gantt chart of the project “Sign Language Recognition System”
Chapter 5
SYSTEM DESIGN
System overview provides a top-level view of the entire software product. It highlights the
major components without taking into account the inner details of the implementation. It describes the functionality of the product, its context and the design of the software product. The application will be developed in a way which allows the user to interact with the system and simplifies tasks by providing a smooth user interface and user experience with an easily readable and understandable view.
Fig. 5.4.1: Data flow diagram for the admin module (Admin, Dataset, Preprocessing, Bottleneck, Create model).
The Fig. 5.4.1 shows the data flow between each component in the system. First the admin
preprocesses the dataset to create a bottleneck. The admin also creates a model based on the number of layers and the image dimensions. Then the model is trained by giving the dataset and the number of iterations as inputs. After the training process is completed, the model is validated.
Fig. 5.4.2: Data flow diagram for the user module (Capture Video, Obtain Frames, Determine Alphabets, Text, Text to Voice).
The Fig. 5.4.2 shows the data flow between each component in the system. First the input video
is taken from the user based on the notations. Then the video is split into frames based on the
duration. Then alphabet classification takes place, which gives text as output; the text is then converted to voice.
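A hedged sketch of this user-side flow is given below: frames are captured from the webcam, each frame is classified into an alphabet, and the collected text is converted to voice. The classify_frame() helper is a placeholder for the trained CNN classifier and is not part of the report's code.
import cv2
from gtts import gTTS

def classify_frame(frame):
    # Placeholder: the real system runs the trained CNN on the frame
    # and returns the predicted alphabet.
    return 'A'

cap = cv2.VideoCapture(0)
letters = []
while len(letters) < 5:          # collect a few letters for this sketch
    ok, frame = cap.read()
    if not ok:
        break
    letters.append(classify_frame(frame))
cap.release()

text = ''.join(letters)
if text:
    gTTS(text=text, lang='en').save('output.mp3')   # text-to-voice step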
Chapter 6
IMPLEMENTATION
6.1 Code Snippets
The following code is used to initialize the camera and capture the video frames.
import cv2

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Could not open video device")
# cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
# cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
while True:
    # Capture frame-by-frame
    ret, frame = cap.read()
    if not ret:
        break
    cv2.imshow("preview", frame)
    # Press 'q' to stop capturing
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
The following code is used to classify the alphabets and give text as output.
import sys
import os

# Disable tensorflow compilation warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf   # note: uses TensorFlow 1.x APIs (tf.gfile, tf.Session)

image_path = sys.argv[1]

# Read the image data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()

# Load the label file, strip off carriage returns
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("logs/output_labels.txt")]

# Unpersist the graph from file
with tf.gfile.FastGFile("logs/output_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # Feed the image_data as input to the graph and get the first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor,
                           {'DecodeJpeg/contents:0': image_data})

    # Sort to show labels of the first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))
The following code is used to convert the text into a voice signal and play it.
from gtts import gTTS
from playsound import playsound

message = 'Good Morning'
language = 'en'
audio = gTTS(text=message, lang=language, slow=False)
audio.save('myvoice.mp3')
# call(['vlc', 'myvoice.mp3'])
playsound('myvoice.mp3')
Chapter 7
TESTING
Testing is an activity to check whether the actual results match the expected results.
Testing also helps to identify errors, gaps or missing requirements when compared against the actual requirements. Testing is an important phase in the development life cycle of the product. During
the testing, the program to be tested was executed with a set of test cases and the output of the
program for the test cases was evaluated to determine whether the program is performing as
expected. Errors were found and corrected by using the following testing steps and correction
was recorded for future references. Thus, a series of testing was performed on the system before
it was ready for implementation. An important point is that software testing should be
distinguished from the separate discipline of Software Quality Assurance (SQA), which
encompasses all business process areas, not just testing.
Testing is part of Verification and Validation. Testing plays a very critical role for quality
assurance and for ensuring the reliability of the software. The objective of testing can be stated in
the following ways.
7.1 Testing Levels
7.1.1 Unit Testing
Unit testing tests the individual components to ensure that they operate correctly. Each component is tested independently, without the other system components. This system was tested with a set of proper test data for each module, and the results were checked against the expected output. Unit testing focuses verification effort on the smallest unit of the software design, the module.
1. After every step of the algorithm is prepared, debugging is carried out to ensure its proper functioning.
2. A test is carried out to check whether the input video is captured or not (a minimal test sketch for this step is given after this list).
3. A test was carried out to check whether the video is split into image frames and the image is normalized properly.
4. Testing was carried out to check whether the symbol in the captured image matches any of the symbol images in the database.
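A minimal unit-test sketch for the frame-capture step is shown below, using pytest as the test runner; the report does not name a specific test framework, so this is only an illustrative assumption.
import cv2
import pytest

def test_camera_returns_frame():
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        pytest.skip("no camera available on this machine")
    ok, frame = cap.read()
    cap.release()
    # The capture step should yield a valid frame.
    assert ok and frame is not None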
7.1.2 Integration Testing
Integration testing is another aspect of testing that is generally done in order to uncover errors associated with the flow of data across interfaces. The unit-tested modules are grouped together and tested in small segments, which makes it easier to isolate and correct errors. This approach is continued until we have integrated all modules to form the system as a whole. After the completion of each step, it has been combined with the remaining modules to ensure that the project is working properly as expected.
7.1.3 System Testing
System testing tests a completely integrated system to verify that it meets its requirements. After the completion of all the modules, they are combined together to test whether the entire project is working properly. It deals with testing the whole project for its intended purpose. In other words, the whole system is tested here. System testing was carried out by selecting the image from the project folder and normalizing it for sharpness. After that, the image is distorted. For each of the dataset images in each class (A, B, C ... Z), a bottleneck file will be created.
7.1.4 Acceptance Testing
The project is tested at different levels to ensure that it is working properly and meets the requirements specified in the requirement analysis. Acceptance testing is done once the project is complete, and it is checked for acceptance. The results from the system were compared with the results from the traditional evaluation approach. Then the accuracy of the system was tested by increasing the epochs or iterations.
Result: Successful
1. Once the user stops showing the sign language in front of the camera, the pop-up window closes.
2. The sign language is converted and displayed as text in the terminal, and the audio of the text is also played.
Expected Result: Text is converted to audio
Result: Successful
CHAPTER 8
RESULTS AND SNAPSHOTS
8.1 Snapshot-1
Figure 8.1 shows how the virtual environment is activated and how to run the program.
8.2 Snapshot-2
Figure 8.2 shows the user showing the sign language in front of the webcam while the video is captured.
8.3 Snapshot-3
Figure 8.3 shows the sign language converted to text and displayed in the terminal.
Chapter 9
CONCLUSION AND FUTURE WORK
9.1 Conclusion
Our project aims to make communication simpler between deaf and dumb people and others by introducing the computer into the communication path, so that sign language can be automatically captured, recognized, translated to text and displayed on an LCD. The output of the sign language is displayed as text as well as a voice signal. This makes the system more efficient and hence makes communication easier for hearing- and speech-impaired people.
In future work, the proposed system can be developed and implemented using a Raspberry Pi. The image processing part should be improved so that the system is able to communicate in both directions, i.e. it should be capable of converting normal language to sign language and vice versa. We will try to recognize signs which include motion. Moreover, we will focus on converting the sequence of gestures into text, i.e. words and sentences, and then converting it into speech which can be heard.
REFERENCES
[1] Davi Hirafuji Neiva, Cleber Zanchettin, "Gesture recognition: A review focusing on sign language in a mobile context", Expert Systems with Applications, 103, 159-183 (2018).
[2] Suharjito, Ricky Anderson, Fanny Wiryana, Meita Chandra Ariesta, Gede Putra Kusuma, "Sign Language Recognition Application Systems for Deaf-Mute People: A Review Based on Input-Process-Output", 2nd International Conference on Computer Science and Computational Intelligence (ICCSCI 2017), 13-14 October 2017, Bali, Indonesia (2017).
[3] Ming Jin Cheok, Zaid Omar, Mohamed Hisham Jaward, "A review of hand gesture and sign language recognition techniques" (2017).
[4] Tejashri J. Joshi, Shiva Kumar, N. Z. Tarapore, Vivek Mohile, "Static Hand Gesture Recognition using an Android Device", International Journal of Computer Applications (0975-8887), Vol. 12, No. 21 (2015).
[5] Pan, T.-Y., Lo, L.-Y., Yeh, C.-W., Li, J.-W., Liu, H.-T., & Hu, M.-C., "Real-time sign language recognition in complex background scene based on a hierarchical clustering classification method", In Proceedings of the IEEE Second International Conference on Multimedia Big Data (BigMM), 64-67, IEEE (2016).