Final Project Report
on
Voice Based E-mailing System
A Project Report Submitted in Partial Fulfilment of the Requirements for the Degree of
Bachelor of Technology
in
COMPUTER SCIENCE & ENGINEERING
By
Ashish Gusain (180101024)
1. CERTIFICATE
2. ACKNOWLEDGEMENT
3. DECLARATION
4. ABSTRACT
5. LIST OF FIGURES
6. INTRODUCTION
7. LITERATURE REVIEW
8. METHODOLOGY
• REQUIREMENTS
• PACKAGES OVERVIEW
• IMPLEMENTATION
• CODING
9. EXPERIMENTAL RESULTS
10. CONCLUSION
11. FUTURE ASPECTS
12. REFERENCES
Certificate
This is to certify that the project report entitled “Voice Based E-mailing System for Visually
Impaired”, presented by Ashish Gusain, Abhinav Nautiyal, Aniket Thakur and Akash
Dhasmana in partial fulfilment of the requirements for the award of Bachelor of Technology in
Computer Science and Engineering, is a record of work carried out by them under my supervision
and guidance at the Department of Computer Science and Engineering, Uttaranchal Institute of
Technology, Dehradun.
It is also certified that, to the best of my knowledge, this project has not been submitted to any
other institute for the award of any other degree.
ACKNOWLEDGEMENT
It is indeed a great pleasure and a matter of immense satisfaction for us to express our deep
sense of gratitude towards all the people who have helped and inspired us in this
project. First of all, we would like to thank our guides, Mr. Mukesh Pandey and Mr. Saumil
Kumar, for their effort in this project, right from its selection to its completion. They spent
their precious time and guided us throughout the process.
DECLARATION
We hereby declare that the project report entitled “Voice Based E-mailing System for Visually
Impaired”, submitted in partial fulfilment of the requirements for the degree of Bachelor of
Technology in Computer Science and Engineering, is our original work and that all the information
in this document is an authentic record of our own research work. The matter embodied in this
project report has not been submitted anywhere else for the award of any other degree.
Abstract
The aim of this project is to build a voice based e-mailing system for people who are visually
impaired or otherwise unable to read or write an e-mail. The voice based engine helps a visually
impaired user to write an e-mail using voice commands and reads out the e-mails in the inbox
according to the voice commands the user gives. As the title suggests, the engine uses IVR
(Interactive Voice Response), enabling users to control their mail accounts using their voice and
to read, send, and perform the other useful tasks. The system prompts the user by voice to
perform certain actions, and the user responds to the prompts. The main benefit of this system is
that the use of the keyboard is completely eliminated; the user responds through voice and mouse
clicks only. This project is proposed for the betterment of society. It aims to help visually
impaired people become part of a growing digital India by using the internet, and to make their
lives easier. The success of this project may also encourage developers to build more useful
tools for visually impaired or illiterate people, who deserve an equal standing in society.
List of Figures and Tables
Figure 1. Basic diagram demonstrating the architecture of the voice based e-mailing system
Chapter 1. Introduction
This project is a voice based e-mailing system for people who are visually impaired or otherwise
unable to read or write an e-mail. The voice based engine helps a visually impaired user to write
an e-mail using voice commands and reads out the e-mails in the inbox according to the voice
commands the user gives. As the title suggests, the engine uses IVR (Interactive Voice Response),
enabling users to control their mail accounts using their voice and to read, send, and perform the
other useful tasks. The system prompts the user by voice to perform certain actions, and the user
responds to the prompts. The main benefit of this system is that the use of the keyboard is
completely eliminated; the user responds through voice and mouse clicks only. This project is
proposed for the betterment of society. It aims to help visually impaired people become part of a
growing digital India by using the internet, and to make their lives easier. The success of this
project may also encourage developers to build more useful tools for visually impaired or
illiterate people, who deserve an equal standing in society.
Figure 1. Basic diagram demonstrating the architecture of the voice based e-mailing system
The most common mail services used in day-to-day life cannot be used by visually challenged
people. To make these systems accessible to them, various technologies are available, such as
screen readers, automatic speech recognizers, speech-to-text and text-to-speech converters, and
braille keyboards. However, these technologies are of limited use to visually challenged people,
because they cannot give the same response that a sighted user gets from a normal system.[1] The
objective of Voice Based Email for Visually Impaired is to help visually challenged people access
mail easily and efficiently. This application is based on speech-to-text and text-to-speech
converters, enabling users to control their mail accounts using their voice only and to read,
send, and perform the other useful tasks. The system prompts the user by voice to perform certain
actions, and the user responds to the prompts. The technologies put to use here are Speech-to-Text
and Text-to-Speech using the .NET framework. Speech-to-Text, also known as Automatic Speech
Recognition, converts spoken speech into text, which makes composing e-mails an easy task. The
Text-to-Speech module gives audio output of
the mail received: the sender, the subject, and the body of the mail are read out by the system.
1. SPEECH_TO_TEXT Converter: The voice recognition system is made up of various
components: feature extraction, an acoustic models database constructed from training data, a
dictionary, a language model, and the voice recognition algorithm. The analogue voice signal must
first be sampled along the time and amplitude axes, that is, digitised. The sampled signal is then
analysed at regular intervals. The interval is usually 20 milliseconds, because the signal is
deemed stationary over this period. Speech feature extraction requires the generation of
uniformly spaced discrete vectors of speech features. The parameters of the acoustic models are
estimated using feature vectors from the training database.
2. TEXT_TO_SPEECH Converter: This module uses speech synthesis techniques to convert text to
voice output. Although text-to-speech was originally developed for the blind to listen to written
material, it is now widely used to convey financial data, e-mail messages, and other information
to the general public by telephone. Text-to-speech is also used on handheld devices such as
portable GPS units to announce street names when giving directions. Our Text-to-Speech Converter
accepts a string of 50 characters of text (letters and/or digits) as input. We have interfaced the
keyboard with the controller and defined all of the letter and digit keys on it. The speech
processor has an unlimited dictionary and, in most cases, can speak practically any text provided
at the input.
PURPOSE OF THE PROJECT
This project proposes a Python-based application designed specifically for visually impaired
people. The application provides a voice based mailing service with which they can read and send
mail through their e-mail accounts on their own, without any guidance. The VMAIL system can be
used by a blind person to access mail easily and adeptly, so the dependence of visually challenged
people on other individuals for mail-related activities can be reduced. The application uses IVR
(Interactive Voice Response), enabling users to control their mail accounts using their voice only
and to read, send, and perform the other useful tasks. The system prompts the user by voice to
perform certain actions, and the user responds to the prompts. The main advantage of this system
is that the use of the keyboard is completely eliminated; the user responds through voice only.
MOTIVATION
It is estimated that nearly 285 million people in the world are visually impaired, and the idea is
to provide a suitable communication system for them. This was the driving force behind developing
the given system. One of the major disadvantages of existing systems is that all operations are
based on mouse click events and the keyboard. Operations depend completely on the types of clicks
specified by the design, and remembering keyboard shortcuts is sometimes difficult. The reach of
existing systems is therefore limited for blind and visually impaired people. There is a strong
need for a simple system that overcomes all of the above drawbacks. The idea focuses on providing
basic functions such as composing, sending, and receiving e-mail, along with advanced features
such as voice based operation, mail search, and provision for voice as well as text based e-mail,
with added ease and simplicity.
Related Work
Earlier, user interaction with the system was based on screen reader technology and on mouse
click operations, in which every operation has an associated mouse click, for example two left
clicks to compose an e-mail. Interaction with such a system is tough, and the user needs to keep
the click events in mind. This work focuses on developing an e-mail system that helps blind people
to use communication services. The system is based on IVR; the main idea is to discard the
keyboard and use mouse operations. The internet is a rich source of knowledge and information, but
blind people face difficulties in accessing text based material. The idea is to develop an audio
feedback based virtual environment using tools such as a screen reader and text-to-speech. The
voice mail architecture helps blind people to access information in the form of audio, text, and a
self-reading system. The idea focuses on helping visually impaired and illiterate people to access
technology by reducing cognitive load. Decision making ordinarily depends on eyesight and on
everything that happens or appears.
Chapter 2. Literature Review
In this project, a voice based e-mail architecture is proposed which will help blind people to
access e-mail. The existing system is not user friendly for blind people, as it does not give any
audio feedback to read out contents for them. The proposed system makes use of Speech
Recognition, Interactive Voice Response, and mouse click events. For additional security, voice
recognition is used for user verification. Registration is the first module of the system. This
module collects the complete information of the user by prompting the user for the details that
need to be entered. The second module is the login module, in which the system asks the user to
provide a user name and password through voice commands. Another voice sample is requested to
perform the voice verification. Once login is done, the user is redirected to the inbox page and
can perform the normal operations of a mailing system. The system options are Compose, Inbox,
Sent Mail, and Trash; the user can switch between these using voice commands. Unlike the existing
mail system, the proposed system relies on voice commands and is primarily based on
speech-to-text. The application prompts the user to speak specific commands, and the user speaks
the command for the service he or she wants to access. This application makes use of SMTP (Simple
Mail Transfer Protocol), an application-layer protocol that is used to send, receive, and relay
outgoing e-mails between senders and receivers.
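A minimal sketch of how a dictated message might be handed to SMTP from Python, using only the standard library. The function names `build_message` and `send_message`, the host `smtp.example.com`, port 465, and the addresses are illustrative assumptions and are not taken from this project.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text e-mail from text produced by the speech-to-text step."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_message(msg, username, password, host="smtp.example.com", port=465):
    """Send `msg` over an SSL-protected SMTP session.

    The host and port are placeholders; a real mail provider's values
    and credentials would have to be supplied.
    """
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)


# Building a message needs no network access, so it can be inspected first.
demo = build_message("user@example.com", "friend@example.com",
                     "Hello", "This mail was dictated by voice.")
print(demo["Subject"])
```

Separating message construction from sending keeps the dictated content easy to read back to the user for confirmation before anything leaves the machine.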
The main activity screen is the first screen displayed when the app starts. This screen
waits for the user to press the button, after which the system starts accepting voice commands.
The button is full sized, so the user can press anywhere on the screen. Using voice commands,
users can then send and read e-mails.
This system uses mainly three technologies:
● Speech-to-Text
● Text-to-Speech
● Interactive Voice Response
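The way these three technologies combine into a prompt-and-respond loop can be sketched as follows. This is only an illustration under stated assumptions: `match_option` and `ivr_menu` are hypothetical names, the menu entries are the system options named earlier (Compose, Inbox, Sent Mail, Trash), and the `listen` and `speak` callables stand in for whichever speech-to-text and text-to-speech engines are used.

```python
MENU = ("compose", "inbox", "sent mail", "trash")


def match_option(transcript, options=MENU):
    """Return the first menu option mentioned in the recognised text, or None."""
    text = (transcript or "").lower()
    for option in options:
        if option in text:
            return option
    return None


def ivr_menu(listen, speak, max_attempts=3):
    """Prompt by voice until the user names a menu option or attempts run out.

    `listen` returns recognised text (speech-to-text) and `speak` voices a
    prompt (text-to-speech); both are passed in so any engine can be used.
    """
    for _ in range(max_attempts):
        speak("Say one of: " + ", ".join(MENU))
        choice = match_option(listen())
        if choice is not None:
            speak("Opening " + choice)
            return choice
        speak("Sorry, I did not understand.")
    return None


# Scripted replies and a list stand in for the microphone and the speaker.
replies = iter(["umm", "open my inbox please"])
spoken = []
print(ivr_menu(listen=lambda: next(replies), speak=spoken.append))  # prints "inbox"
```

Passing the engines in as functions also makes the dialogue logic testable without audio hardware.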
Results: The results reflect that Tetra Mail is a better alternative for blind users due to its
consistent and blind-friendly interface design. The results of this prototype implementation show
an improved user experience, accuracy in task completion, and better control over touch screen
interfaces in performing basic activities of managing emails. The results demonstrate that Tetra
Mail is an accessibility-inclusive email client enabling blind people to have a better user
interaction experience and minimal cognitive overload in managing emails. The solution is tested
through an empirical study. Results showed that this email client helps blind people to send and
receive emails with comfort and ease.
Advantages of the above surveyed techniques
In most of the papers, it can be seen that the whole process of speech-to-text and text-to-speech
makes the system more interactive and easier to use for visually impaired people. Such a system
lets disabled people feel like any other user. Voice based operation is also useful for
handicapped and illiterate people. Automatic speech recognition is one of the
major advantages. We can see a reduction in the cognitive load on blind users of remembering and
typing characters on a keyboard. A voice based e-mail system is a user-friendly system.
Inbox: This page stores all of the mails received by the user.
The steps below explain how to access a mail from the inbox:
• All the received mails are listed, sorted by date.
• Double left click to give voice input to filter mail; when satisfied, left click to proceed.
• At this stage the mail is read out; double left click to start/pause.
Trash: This folder stores all of the mails deleted by the user.
• All the deleted mails are listed, sorted by date.
• Double left click to start/pause.
• Left click to proceed to delete the mail, or right click to go back.
• In the delete section, left click to delete the mail.
Sent Mail: This folder stores all of the mails sent by the user.
The advancement in computer based accessible systems has opened up many avenues for the
visually impaired across much of the globe. Audio feedback based virtual environments, such as
screen readers, have helped blind people immensely in accessing internet applications.
However, a large section of visually impaired people in different countries, in particular on the
Indian sub-continent, could not benefit much from such systems. This was primarily due to the
difference in the technology required for Indian languages compared with that for other popular
languages of the world. In this paper, we describe the voicemail system architecture that can be
used by a blind person to access e-mails easily and efficiently. The contribution made by this
research has enabled blind people to send and receive voice-based e-mail messages in their
native language with the help of a mobile device. Our proposed system GUI has been evaluated
against the GUI of a traditional mail server. We found that our proposed architecture performs
much better than the existing GUIs. In this project, we use voice-to-text and text-to-voice
techniques to provide access for blind people. The navigation system uses TTS (Text-to-Speech) in
order to provide a navigation service for blind users through voice. The suggested system, as an
independent program, is fairly cheap, and it can be installed on a smartphone held by a blind
person. This allows blind people to access the program easily. An increasing number of studies
have used technology to help blind people to integrate more fully into a global world.
The software considers an instant messenger system to favor interaction of blind users with any
other user connected to the network. Nowadays the advances made in computer technology have
opened platforms for visually impaired people across the world. It has been observed that about
60% of the total blind population of the world is present in India. In this paper, we
describe the voice mail architecture used by blind people to access e-mail and multimedia
functions of the operating system easily and efficiently. This architecture also reduces the
cognitive load on blind users of remembering and typing characters using the keyboard. It also
helps handicapped and illiterate people. In previous work, blind people could not send e-mail
using the system. The multitude of e-mail types, along with the ability setting, enables their use
in nomadic daily contexts. But these e-mail systems are not useful to all types of people; blind
people, for example, cannot use them to send e-mail. Audio based e-mail is preferable for blind
people, since they can easily respond to audio instructions. Such systems are very rare, so there
is little chance of audio based e-mail being available to blind people. The voicemail
architecture described above involves the development of the following modules:
SPEECH_TO_TEXT Converter: The system acquires speech at run time through a microphone
and processes the sampled speech to recognize the uttered text. The recognized text can be stored
in a file. We are developing this on the Android platform using the Eclipse workbench. Our
speech-to-text system directly acquires and converts speech to text. It can supplement other
larger systems, giving users a different choice for data entry. A speech-to-text system can also
improve system accessibility by providing data entry options for blind, deaf, or physically
handicapped users. A speech recognition system can be divided into several blocks: feature
extraction, an acoustic models database built from the training data, a dictionary, a language
model, and the speech recognition algorithm. The analog speech signal must first be sampled along
the time and amplitude axes, that is, digitized. Samples of the speech signal are analyzed at even
intervals. This period is usually 20 ms, because the signal in this interval is considered
stationary. Speech feature extraction involves the formation of equally spaced discrete vectors
of speech characteristics. Feature vectors from the training database are used to estimate the
parameters of the acoustic models. The acoustic model describes properties of the basic elements
that can be recognized. The basic element can be a phoneme for continuous speech or a word for
isolated word recognition.
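The framing step described above can be sketched in a few lines of Python. This is an illustration only: `frame_signal` and `log_energy` are hypothetical helpers, the 16 kHz sampling rate is an assumption, and the frames are non-overlapping for simplicity (practical front ends usually overlap frames and compute richer features than a single energy value).

```python
import math


def frame_signal(samples, sample_rate, frame_ms=20):
    """Split a sampled signal into consecutive, non-overlapping frames of `frame_ms` ms."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def log_energy(frame):
    """One simple per-frame feature: log of the mean squared amplitude."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(power + 1e-12)  # small offset avoids log(0) on silence


rate = 16000  # assumed sampling rate in Hz
# One second of a 440 Hz tone stands in for microphone input.
signal = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
frames = frame_signal(signal, rate)
features = [log_energy(f) for f in frames]  # one feature value per 20 ms frame
print(len(frames), len(frames[0]))  # 50 frames of 320 samples each
```

Each 20 ms frame yields one feature vector (here a single number), which gives the equally spaced sequence of vectors that the acoustic model is trained on.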
TEXT_TO_SPEECH Converter: This module converts text to voice output using speech synthesis
techniques. Although initially used by the blind to listen to written material, text-to-speech is
now used extensively to convey financial data, e-mail messages, and other information via
telephone to everyone. Text-to-speech is also used on handheld devices such as portable GPS units
to announce street names when giving directions. Our “Text-to-Speech Converter” accepts a string
of 50 characters of text (letters and/or numbers) as input. In this, we have interfaced the
keyboard with the controller and defined all the letter and digit keys on it. The speech
processor has an unlimited dictionary and can speak out almost any text provided at the input
most of the time. Hence, it has an accuracy of above 90%. It is microcontroller based hardware
coded in Embedded C. Further research is to be done to optimize various methods of inputting the
text, e.g. reading the text using an optical sensor and converting it to speech, so that almost
all sorts of physical challenges faced by people while communicating are overcome.
WORD RECOGNITION: Voice recognition software (also known as speech-to-text software) allows an
individual to use their
voice instead of typing on a keyboard. Voice recognition may be used to dictate text into the
computer or to give commands to the computer. Voice recognition software allows for a quick
method of writing on a computer. It is also useful for people with disabilities who find it
difficult to use the keyboard. This software can also assist those who have difficulty
transferring ideas onto paper, as it helps take the focus off the mechanics of writing. Word
recognition is measured as a matter of speed, such that a word with a high level of recognition
is read faster than a novel one. This manner of testing suggests that comprehension of the
meaning of the words being read is not required, but rather the ability to recognize them in a
way that allows proper pronunciation. Therefore, context is unimportant, and word recognition is
often assessed with words presented in isolation in formats such as flash cards. Nevertheless,
ease in word recognition, as in fluency, enables proficiency that fosters comprehension of the
text being read.
Visually challenged people find it very difficult to use this technology because using it
requires visual perception. Not all people can use the internet, because in order to access it
you need to know what is written on the screen. If that is not visible, it is of no use. This
makes the internet a largely unusable technology for visually impaired and illiterate people. In
this system, three main types of technology are used:
STT (Speech-to-Text): Here, whatever the user speaks is converted to text. There will be a small
microphone icon; on clicking it, the user speaks, and his or her speech is converted to text,
which sighted people can also see and read.
TTS (Text-to-Speech): This method is the opposite of STT. It converts the text of the e-mails to
synthesized speech.
A text-to-speech (TTS) system converts language text into speech; alternative systems render
symbolic linguistic representations. Synthesized speech can be created by concatenating pieces of
recorded speech that are stored in a database.
Therefore we came up with our project, a voice based e-mail system for the blind, which will help
visually impaired and illiterate people to send their mails. The users of this system do not
need to remember keyboard shortcuts or the location of the keys. Simple mouse click operations
are all that is needed, making the system easy to use for users of any age group. Our system
tells the user by voice where he or she currently is in the system, so that the user does not
have to worry about remembering which mouse click operation to perform.
IVR (Interactive Voice Response): IVR is a technology that describes the interaction between
the user and the system, in which the user responds to a voice message using the keyboard.
IVR allows the user to interact with an e-mail host system via the system keyboard, after which
users can easily service their own enquiries by listening to the IVR dialogue. IVR systems
generally respond with pre-recorded audio to further assist users on how to proceed. The
audio is pre-recorded, and the system needs to be able to handle large volumes.
Chapter 3. Methodology
Requirements:
Figure 2. Requirements for sending and receiving e-mails: hardware (desktop/laptop PC, speaker, microphone) and software (Python 3.3+, PyCharm / VS Code)
Hardware Requirements
• 1 GB RAM
• Speaker
• Microphone
Software Requirements
• Windows 10/11 operating system
• Visual Studio 2013 / PyCharm IDE
• Python 3.3+
Packages Overview:
1. SpeechRecognition 3.8.1
A library for performing speech recognition, with support for several engines and APIs,
online and offline. In this project it uses the Google Cloud Speech-to-Text API, which
converts speech into text using an API powered by Google’s AI research and
technology.
2. PyAudio 0.2.11
PyAudio is required if and only if you want to use microphone input (Microphone).
PyAudio version 0.2.11+ is required, as earlier versions have known memory
management bugs when recording from microphones in certain situations.
3. gTTS 2.2.4
gTTS (Google Text-to-Speech), a Python library and CLI tool to interface with Google
Translate's text-to-speech API. Write spoken mp3 data to a file, a file-like object
(bytestring) for further audio manipulation, or stdout . Or simply pre-generate Google
Translate TTS request URLs to feed to an external program. Customizable speech-
specific sentence tokenizer that allows for unlimited lengths of text to be read, all while
keeping proper intonation, abbreviations, decimals and more. Customizable text pre-
processors which can, for example, provide pronunciation corrections.
4. smtplib (Python 3.10.4 standard library)
The smtplib module defines an SMTP client session object that can be used to send mail
to any internet machine with an SMTP or ESMTP listener daemon. For details of SMTP
and ESMTP operation, consult RFC 821 (Simple Mail Transfer Protocol) and RFC
1869 (SMTP Service Extensions).
5. BS4-Beautifulsoup 4.9.0
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works
with your favorite parser to provide idiomatic ways of navigating, searching, and
modifying the parse tree. It commonly saves programmers hours or days of work.
6. Pyglet 1.5.24
pyglet is a cross-platform windowing and multimedia library for Python, intended for
developing games and other visually rich applications. It supports windowing, user
interface event handling, game controllers and joysticks, OpenGL graphics, loading
images and videos, and playing sounds and music. pyglet works on Windows, OS X and
Linux.
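A hedged sketch of how the SpeechRecognition and gTTS packages listed above might be used for one listen and one spoken prompt. The function names `normalise_command`, `listen_once` and `speak_to_file` are hypothetical, `recognize_google` is only one of several recognisers the library offers (which one this project calls is an assumption), and the output path `prompt.mp3` is a placeholder. The third-party imports are deferred into the functions so that the pure helper can be used without a microphone or network.

```python
def normalise_command(transcript):
    """Lower-case and collapse whitespace so recognised text can be compared to commands."""
    return " ".join((transcript or "").lower().split())


def listen_once(timeout=5):
    """Record one utterance from the default microphone and return its text, or None."""
    import speech_recognition as sr  # SpeechRecognition; sr.Microphone needs PyAudio

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=timeout)
        except sr.WaitTimeoutError:
            return None  # nobody spoke within `timeout` seconds
    try:
        return normalise_command(recognizer.recognize_google(audio))
    except (sr.UnknownValueError, sr.RequestError):
        return None  # speech was unintelligible or the service was unreachable


def speak_to_file(text, path="prompt.mp3"):
    """Synthesise `text` to an MP3 file with gTTS and return the file path."""
    from gtts import gTTS

    gTTS(text=text, lang="en").save(path)
    return path


print(normalise_command("  Open   INBOX "))  # prints "open inbox"
```

Returning `None` on a timeout or recognition failure lets the calling dialogue repeat the prompt instead of crashing; playing the saved file is left to an audio library such as pyglet.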
The entire system depends on voice prompts and mouse clicks. When using this system, the
computer prompts the user to perform specified actions in order to access particular services,
and the user must complete those actions to access the services. One of the most
significant advantages of this system is that the user will rarely need to use a keyboard. All
actions are triggered by mouse clicks. The challenge is how blind users will determine where the
mouse pointer is located. Because a blind user cannot track a specific place, he or she must move
the mouse across the screen from top to bottom and then left to right. This limitation remains
because the system is only a basic one.
The suggested system is based on an entirely new concept and is unlike any other mail system
already in use. Accessibility is the most essential factor that has been considered in the
proposed system. A web system is said to be totally accessible only when it can be used
effectively by all types of individuals, whether able or disabled. Current systems do not
provide this accessibility. As a result, the system we are creating is vastly different from the
current one. Unlike the existing system, which prioritises user friendliness for sighted users,
our approach prioritises user friendliness for all types of people, including people who are
visually impaired or illiterate.
While working on this project, we found a number of applications that had the same goal as ours.
Visually impaired people are unable to use the most common mail services that we use on a daily
basis, because these services provide no means for the person in front of the screen to hear the
content. Since such users cannot see what is currently on screen, they cannot determine where to
click in order to complete the essential actions.
Existing System: VoiceTalk
Features: Users can easily interact with anyone without typing a single word on the keyboard.
Benefits: The application helps not only blind individuals but also individuals who are illiterate.
Limitations: Multiple users cannot use the application due to the lack of a database.
Even if a system is user friendly, using a computer for the first time is not as convenient for a
visually impaired person as it is for a typical user. Although numerous screen readers are
available, these users still experience some challenges. Screen readers read aloud whatever is on
the screen, and the user must use keyboard shortcuts to perform actions, since blind people
cannot trace the mouse location. This implies two things: first, the user cannot use the mouse
pointer, since it is inconvenient when the location of the pointer cannot be traced; and second,
the user should be familiar with the keyboard and know where each key is situated. As a result, a
user who is new to computers will be unable to use this service.
Speech recognition is the inter-disciplinary sub-field of computational linguistics that develops
methodologies and technologies that enable the recognition and translation of spoken language
into text by computers. It is also known as "automatic speech recognition" (ASR), "computer
speech recognition", or just "speech to text" (STT). It incorporates knowledge and research in the
linguistics, computer science, and electrical engineering fields. Some speech recognition systems
require "training" (also called "enrollment") where an individual speaker reads text or isolated
vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune
the recognition of that person's speech, resulting in increased accuracy. Systems that do not use
training are called "speaker independent" systems. Systems that use training are called "speaker
dependent". Speech recognition applications include voice user interfaces such as voice dialing (e.g.
"Call home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control,
search (e.g. find a podcast where particular words were spoken), simple data entry (e.g., entering a
credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text
processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input). The
term voice recognition or speaker identification refers to identifying the speaker, rather than what
they are saying. Recognizing the speaker can simplify the task of translating speech in systems that
have been trained on a specific person's voice or it can be used to authenticate or verify the identity
of a speaker as part of a security process. From the technology perspective, speech recognition has a
long history with several waves of major innovations. Most recently, the field has benefited from
advances in deep learning and big data. The advances are evidenced not only by the surge of
academic papers published in the field, but more importantly by the worldwide industry adoption of
a variety of deep learning methods in designing and deploying speech recognition systems.
Speech recognition works using algorithms through acoustic and language modeling. Acoustic
modeling represents the relationship between linguistic units of speech and audio signals; language
modeling matches sounds with word sequences to help distinguish between words that sound
similar. Often, hidden Markov models are used as well to recognize temporal patterns in speech to
improve accuracy within the system. The most frequent applications of speech recognition within
the enterprise include call routing, speech-to-text processing, voice dialing and voice search. While
convenient, speech recognition technology still has a few issues to work through, as it is
continuously developed. The advantages of speech recognition software are that it is easy to use and
readily available. Speech recognition software is now frequently installed in computers and mobile
devices, allowing for easy access. The downsides of speech recognition include its inability to capture
words due to variations in pronunciation, its lack of support for most languages outside of English and
its inability to sort through background noise. These factors can lead to inaccuracies. Speech
recognition performance is measured by accuracy and speed. Accuracy is measured with word error
rate (WER). WER works at the word level and identifies inaccuracies in transcription, although it cannot
identify how the error occurred. Speed is measured with the real-time factor. A variety of factors
can affect computer speech recognition performance, including pronunciation, accent, pitch,
volume and background noise. It is important to note that the terms speech recognition and voice
recognition are sometimes used interchangeably. However, the two terms mean different things.
Speech recognition is used to identify words in spoken language. Voice recognition is a biometric
technology used to identify a particular individual's voice or for speaker identification.
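The two performance measures mentioned above can be illustrated with a short sketch. It is an illustrative example, not part of the project code. It assumes the usual definitions: word error rate is the number of word substitutions, deletions and insertions divided by the number of words in the reference transcript, and the real-time factor is the processing time divided by the duration of the audio. The function names are our own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of an extra word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean recognition runs faster than real time."""
    return processing_seconds / audio_seconds
```

For example, if the user says "send the mail to john" and the recognizer returns "send a mail to john", one of the five reference words is wrong, so the word error rate is 0.2.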
Speech synthesis is the artificial production of human speech. A computer system used
for this purpose is called a speech synthesizer, and it can be implemented in software or
hardware. A text-to-speech (TTS) system converts normal language text into speech; other
systems render symbolic linguistic representations into speech. Synthesized speech can be created by
concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of
the stored speech units; a system that stores phones or diphones provides the largest output range,
but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for
high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other
human voice characteristics to create a completely "synthetic" voice output. The quality of a speech
synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly.
An intelligible text-to-speech program allows people with visual impairments or reading
disabilities to listen to written words on a computing device. Many computer operating
systems have included speech synthesizers since the early 1990s. A text-to-speech
system consists of two parts: a front-end and a back-end. The front-end has two major
tasks. First, it converts raw text containing symbols such as numbers and abbreviations into the
equivalent written-out words. This process is commonly called text normalization, pre-processing, or
tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the
text into prosodic units, such as phrases, clauses, and sentences. The process of assigning phonetic
transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic
transcriptions and prosody information together make up the symbolic linguistic representation that
is output by the front-end. The back-end, often referred to as the synthesizer, then converts the
symbolic linguistic representation into sound. In certain systems, this part includes the computation
of the target prosody (pitch contour, phoneme durations), which is then imposed on the output
speech.
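The first front-end task, text normalization, can be sketched as follows. This is a simplified illustration and not the normalizer of any particular TTS engine. The abbreviation table, the function names and the rule of spelling long numbers digit by digit are assumptions made for the example; a real front-end uses context to resolve ambiguous abbreviations.

```python
import re

# Hypothetical, deliberately tiny abbreviation table.
ABBREVIATIONS = {"mr.": "mister", "dr.": "doctor", "etc.": "et cetera"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n < 1000:
        raise ValueError("only 0..999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    prefix = f"{ONES[hundreds]} hundred"
    return prefix if rest == 0 else f"{prefix} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace known abbreviations and digit strings with written-out words."""
    words = []
    for token in text.split():
        number = re.fullmatch(r"([0-9]+)([.,!?;:]?)", token)
        if token.lower() in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token.lower()])
        elif number:
            digits, punctuation = number.groups()
            if int(digits) < 1000:
                spoken = number_to_words(int(digits))
            else:
                # Longer numbers are read digit by digit in this sketch.
                spoken = " ".join(ONES[int(d)] for d in digits)
            words.append(spoken + punctuation)
        else:
            words.append(token)
    return " ".join(words)
```

The output of such a step is plain text in which every token can be looked up or converted by the grapheme-to-phoneme stage.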
Text to Speech Conversion
Text-to-speech (TTS) is a type of speech synthesis application that is
used to create a spoken sound version of the text in a computer document, such as a help file or a
Web page. TTS can enable the reading of computer display information for the visually challenged
person, or may simply be used to augment the reading of a text message. Current TTS
applications include voice-enabled e-mail and spoken prompts in voice response systems. TTS is
often used with voice recognition programs. There are numerous TTS products available, including
Read Please 2000, Proverbe Speech Unit, and Next Up Technology's TextAloud. Lucent, Elan, and
AT&T each have products called “Text-to-Speech”. In addition to TTS software, a number of
vendors offer products involving hardware, including the Quick Link Pen from WizCom
Technologies, a pen-shaped device that can scan and read words; the Road Runner from Ostrich
Software, a handheld device that reads ASCII text; and DecTalk TTS from Digital Equipment, an
external hardware device that substitutes for a sound card and which includes an internal software
device that works in conjunction with the PC's own sound card.
Implementation:
When executed, this e-mailing engine prompts the user with two options through audio.
The first option is to compose an email, while the second is to read the mails in the
inbox. The user has to speak the choice as "one" or "two". If the speech recognition
engine catches any sound like "one", "1", "on", or "wan", it runs the compose module
and asks the user to speak the message to be sent as the email.
Once it detects a sufficiently long pause in the user's speech, the engine treats the
message as complete and sends the mail.
If the speech recognition engine catches a sound like "tu", "two",
"too" or "to", it runs the read-inbox module, speaks out the number of emails in the inbox,
and then reads out the unseen emails fetched from the user's email account.
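The matching of near-homophones described above can be sketched as a small lookup. This is an illustrative sketch rather than the project's exact code. The word lists come from the description in this section, while the function name and the returned labels "compose" and "inbox" are our own.

```python
import re
from typing import Optional

# Transcripts the recognizer may return when the user says "one" or "two".
COMPOSE_WORDS = {"1", "one", "on", "wan", "won"}
INBOX_WORDS = {"2", "two", "tu", "too", "to"}


def classify_command(transcript: str) -> Optional[str]:
    """Map a recognized transcript to "compose", "inbox", or None."""
    tokens = re.findall(r"[a-z0-9]+", transcript.lower())
    for token in tokens:
        if token in COMPOSE_WORDS:
            return "compose"
        if token in INBOX_WORDS:
            return "inbox"
    return None
```

Because common words such as "on" and "to" are accepted, this kind of matching is only reliable when the user answers with a single word. A longer reply such as "I want to compose" would be classified as "inbox".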
Coding:
Figure-3.
The following snippet asks the user to choose whether to compose an email or check the inbox.
Figure-4
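In text form, the prompting step could look roughly like the sketch below. It is a minimal sketch under stated assumptions, not a copy of the project code. The text-to-speech and speech-to-text steps are passed in as the hypothetical callables speak and listen, so the prompt logic can be shown and tested without audio hardware. The prompt wording, the retry limit and the choice table are also assumptions.

```python
from typing import Callable, Optional

MENU_PROMPT = "Say one to compose an email, or two to read your inbox."
RETRY_PROMPT = "Sorry, I did not understand."

# Only the exact answers are listed here; near-homophones that the
# recognizer may return would be added to this table in the same way.
CHOICES = {"1": "compose", "one": "compose", "2": "inbox", "two": "inbox"}


def choose_action(speak: Callable[[str], None],
                  listen: Callable[[], Optional[str]],
                  max_attempts: int = 3) -> Optional[str]:
    """Prompt by voice until a valid choice is heard or attempts run out.

    `listen` returns the recognized text, or None when nothing was understood.
    """
    for _ in range(max_attempts):
        speak(MENU_PROMPT)
        heard = listen()
        if heard is not None:
            action = CHOICES.get(heard.strip().lower())
            if action is not None:
                return action
        speak(RETRY_PROMPT)
    return None
```

With the SpeechRecognition package listed in the references, listen would typically wrap Recognizer.listen and Recognizer.recognize_google and return None when the recognizer raises UnknownValueError. The recognizer's pause_threshold attribute sets how long a silence must last before a phrase is treated as finished.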
The code snippet below acts on the voice command that the user gives: it composes an email if the user
says "one" and reads the inbox for unseen emails if the user says "two".
If the user gives the command to write an email, i.e. 1, one, on, won, etc.
Figure-5.
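The compose step can be sketched with Python's standard smtplib and email modules. This is an illustrative sketch, not the project's exact code. The function names, the default subject, and the Gmail SMTP host and port are assumptions made for the example; the dictated text is assumed to have been converted from speech already.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, dictated_text: str,
                  subject: str = "Voice message") -> EmailMessage:
    """Wrap the text recognized from the user's speech in an email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(dictated_text)
    return message


def send_message(message: EmailMessage, password: str,
                 host: str = "smtp.gmail.com", port: int = 465) -> None:
    """Send the message over an SSL connection (host and port are assumed)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(message["From"], password)
        server.send_message(message)
```

Keeping message construction separate from sending means the first function can be checked without a network connection or real credentials.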
If the user gives the command to check his/her inbox, i.e. 2, tu, too, to, two, etc.
Figure-6.
The following code snippet fetches and reads out the unseen emails present in the user's
inbox.
Figure-7.
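Counting the mails in the inbox and fetching the unseen ones can be sketched with the standard imaplib and email modules. This is a sketch under stated assumptions, not the project's exact code. The function names, the Gmail IMAP host and the wording of the spoken summary are assumptions, and HTML-only mails are returned with an empty body here.

```python
import email
import imaplib
from email import policy
from typing import List, Tuple


def summarize_message(raw_bytes: bytes) -> str:
    """Turn one raw email into a sentence suitable for text-to-speech."""
    message = email.message_from_bytes(raw_bytes, policy=policy.default)
    body_part = message.get_body(preferencelist=("plain",))
    body = body_part.get_content().strip() if body_part is not None else ""
    return f"From {message['From']}. Subject: {message['Subject']}. {body}"


def fetch_unseen(user: str, password: str,
                 host: str = "imap.gmail.com") -> Tuple[int, List[str]]:
    """Return the inbox size and a spoken summary of each unseen mail."""
    summaries = []
    with imaplib.IMAP4_SSL(host) as mailbox:
        mailbox.login(user, password)
        # select() reports the total number of messages in the mailbox.
        _, count_data = mailbox.select("INBOX")
        total = int(count_data[0])
        _, data = mailbox.search(None, "UNSEEN")
        for number in data[0].split():
            # Fetching RFC822 also marks the message as seen on the server.
            _, parts = mailbox.fetch(number, "(RFC822)")
            summaries.append(summarize_message(parts[0][1]))
    return total, summaries
```

Each returned summary can then be passed to the text-to-speech step so that it is read out to the user.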
With the help of a variety of assistive technologies, blind and visually impaired people can now access
Internet material. Web elements and their contents are translated into synthetic speech by the built-in
text-to-speech synthesisers in screen readers. Voice commands can be translated to text or computer
input using dictation software that uses speech recognition technology. For those who have been
taught to read braille, there are also refreshable braille displays and braille keyboards. In this case,
the ultimate goal of technology is to provide the visually impaired with an online experience that is
comparable to that of a sighted person. We've put together a comprehensive guide that
outlines the challenges and limitations of the internet for the blind. WebAIM recently ran an
automated scan of the top one million pages to see if they were accessible. Only a small percentage
of the websites met the accessibility compliance standards, according to the findings. One of the
most significant discoveries was that most websites use ambiguous labels for page elements. Many
images, buttons, and menu items are labelled as "image1," "button1," and so on, rather than being
given a meaningful semantic label. When screen readers read such obscure descriptions to blind
users, it adds to their confusion and makes it more difficult for them to navigate the website.
Designers and developers like to create appealing interfaces that catch the user's eye, so most online
material is focused on visual display.
The aim of this project is to create a web application for visually impaired people that
uses speech to send emails. E-mails are often regarded as the most secure mode of communication
over the Internet for sending and receiving sensitive information. However, people must meet a
specific criterion in order to use the Internet, and that criterion is that they must be able to see.
There are blind persons who cannot read or use a keyboard, so we've come up with the
idea of voice-based email to communicate information. They can quickly communicate and obtain
information by sending and receiving emails by voice. Audio feedback virtual environments, such
as screen readers, have greatly aided blind persons in using online apps. We describe the Voicemail
system architecture that can be used by a blind person to access e-mails easily and efficiently. As a
result, we've devised this project, in which we create a voice-based email system that allows
visually impaired persons who aren't familiar with computers to use email without difficulty.
This system's users do not require any basic knowledge of keyboard shortcuts or where the keys
are placed. All of the functions are controlled by a single mouse click. The system issues voice
prompts to the user to accomplish a certain action, and the user responds. The fundamental advantage
of this method is that it eliminates the need for a keyboard; instead, the user must answer solely
Chapter 4: Experimental Results
The basic interface of the project is shown in the figures below:
Figure-8.
Figure-9.
Chapter 5: Conclusion
5.1 Conclusion
This report proposes a voice-based email system for visually impaired people, developed
as an application that helps blind and handicapped people access mail easily and
efficiently. It provides a voice-based mailing service through which a visually impaired person can
read and send mail on their own, without the help of others. Existing email clients require basic
knowledge of keyboard shortcuts; this system eliminates that requirement and overcomes the
difficulties faced by the visually impaired. It uses a speech recognition application which provides an
efficient voice input method for mailing devices for the blind. It is also useful for handicapped and
illiterate people. In future, we will attempt to make the system keyboard-free and fully voice-based, so
that it is easy for visually impaired people to access the services. The system developed now works
only on desktops. As the use of mobile phones is an emerging trend, there is scope to
incorporate this facility as a mobile application as well. Also, the security measures
implemented during the login phase can be revised to make the system safer.
A detailed report with colourful maps, a colourful graph with the theory, balance, and interest,
and a colourful graph with the principle, balance, and interest are all included. The website is
user-friendly and accessible to all types of users. In this work, we propose a system to help
visually impaired people access email services more effectively. This technology will help blind
people. To use the services, the user only needs to follow the IVR's directions and make the
relevant mouse clicks. This e-mail system is simple to use and suitable for people of all ages.
With the use of a speech interpreter, it can translate speech to text as well as text to speech,
making it a tool that can be used by both visually impaired and blind persons.
5.2 Future Works
For people who can see, e-mailing is not a big deal, but for people who are not blessed with the gift
of vision it poses a key concern because of its intersection with many vocational
responsibilities. This voice-based email system has great application for blind people,
as it lets them understand where they are on the page. For example, whenever the cursor moves to an
icon on the website, say Register, it will announce “Register Button”. There are many screen readers
available, but with them people have to remember mouse clicks. This project reduces that problem
because the mouse pointer reads out where it is. The system focuses on user friendliness for all
types of users, including regular users, visually impaired people and people who are illiterate.
ADVANTAGES
E-mailing isn't a significant difficulty for those who can see, but it's a major worry for those who
don't have the gift of sight because it intersects with so many job obligations. This voice-based
email system is useful for blind individuals since it allows them to know where they are on the page.
For example, whenever the cursor moves over the Register icon on the page, it will announce
"Register Button." There are a plethora of screen readers to choose from, but with them people
have to remember mouse clicks. Because the mouse cursor reads out where it
is, this project alleviates that difficulty. The system places a greater emphasis on user
friendliness for all types of people, including sighted users as well as those who are visually impaired.
References
• https://pyglet.readthedocs.io/en/latest/
• https://www.crummy.com/software/BeautifulSoup/bs4/doc/
• https://docs.python.org/3/library/smtplib.html
• https://pypi.org/project/SpeechRecognition/
• https://www.arcjournals.org/pdfs/ijrscse/v3-i1/5.pdf
• https://www.geeksforgeeks.org/project-idea-voice-based-email-visually-challenged/
• https://www.researchgate.net/publication/344296191_Voice_based_E-mail_for_the_Visually_Impaired