
TEXT TO SPEECH CONVERTER

A Minor Project Report


Submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, HYDERABAD
In partial fulfillment of the requirement for the award of the degree
BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND ENGINEERING

SUBMITTED BY
SARA JHANSI 19641A0572
PINAGANI THAANMAI 19641A05B5
KOYYADA GIRIJA SRI 19641A0595
BANOTH NAVEEN 19641A05A6

Under the Guidance of


Ms. P. SHAILAJA
(ASSOC.PROF)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

VAAGDEVI COLLEGE OF ENGINEERING


(Autonomous, Affiliated to JNTUH, Accredited By NBA)
BOLLIKUNTA, WARANGAL 2022-2023
VAAGDEVI COLLEGE OF ENGINEERING
(Autonomous, Affiliated to JNTUH, Accredited By NBA)
BOLLIKUNTA, WARANGAL - 506 005

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE

This is to certify that the project work entitled “TEXT TO SPEECH CONVERTER” is a bonafide work carried
out by Ms. S. JHANSI (19641A0572), Ms. P. THAANMAI (19641A05B5), Ms. K. GIRIJA SRI
(19641A0595), and Mr. B. NAVEEN (19641A05A6) in partial fulfillment of the requirements for the award of
the degree of Bachelor of Technology in COMPUTER SCIENCE AND ENGINEERING from Vaagdevi College
of Engineering (Autonomous) during the academic year 2022-2023.

Project Guide HOD

External Examiner
DECLARATION

We declare that the work reported in the project entitled “TEXT TO SPEECH
CONVERTER” is a record of work done by us in partial fulfillment of the requirements for the award of the
degree of Bachelor of Technology in COMPUTER SCIENCE AND ENGINEERING at
VAAGDEVI COLLEGE OF ENGINEERING (Autonomous), Affiliated to JNTUH,
Accredited by NBA, under the guidance of Ms. P. SHAILAJA, CSE Department. We
hereby declare that this project work bears no resemblance to any other project submitted at
Vaagdevi College of Engineering or any other university/college for the award of the degree.

S. JHANSI (19641A0572)
P. THAANMAI (19641A05B5)
K. GIRIJA SRI (19641A0595)
B. NAVEEN (19641A05A6)
ACKNOWLEDGEMENT

The development of this project, though an arduous task, was made possible by the help of many people.
We are pleased to express our thanks to the people whose suggestions, comments, and criticisms greatly encouraged
us in the betterment of the project.
We would like to express our sincere gratitude and indebtedness to our project guide, Ms. P. SHAILAJA,
for her valuable suggestions and interest throughout the course of this project.
We are also thankful to the Head of the Department, Dr. N. SATHYAVATHI, Associate Professor, for
providing excellent infrastructure and a pleasant atmosphere for completing this project successfully.
We would like to express our sincere thanks and profound gratitude to Dr. K. Prakash, Principal of
Vaagdevi College of Engineering, for his support, guidance, and encouragement in the course of our project.
We are also thankful to the Project Coordinators for their valuable suggestions, encouragement, and
motivation for completing this project successfully.
We are thankful to all other faculty members for their encouragement.
We convey our heartfelt thanks to the lab staff for allowing us to use the required equipment whenever
needed.
We are also thankful to Ms. P. Shailaja and the supporting staff of Vaagdevi College of Engineering for their
valuable suggestions, encouragement, and motivation for completing this project successfully.
Finally, we would like to take this opportunity to thank our families for their support throughout the work. We
sincerely acknowledge and thank all those who directly or indirectly supported the completion of this
work.

S. JHANSI (19641A0572)
P. THAANMAI (19641A05B5)
K. GIRIJA SRI (19641A0595)
B. NAVEEN (19641A05A6)
TABLE OF CONTENTS

Abstract
List of Figures
List of Acronyms

CHAPTER I
Introduction
i. Existing System
ii. Proposed System
iii. Software Requirements
iv. Hardware Requirements

CHAPTER II
Design
Technologies Used
Design of the Project

CHAPTER III
Implementations
CHAPTER IV
Testing
CHAPTER V
Results
CHAPTER VI
Conclusion and Future Scope

BIBLIOGRAPHY
APPENDIX
ABSTRACT

Current state-of-the-art text-to-speech systems produce intelligible speech but lack the prosody of
natural utterances. Building better models of prosody involves development of prosodically rich speech
databases. However, development of such speech databases requires a large amount of effort and time. An
alternative is to exploit story style monologues (long speech files) in audio books. These monologues already
encapsulate rich prosody including varied intonation contours, pitch accents and phrasing patterns. Thus, audio
books act as excellent candidates for building prosodic models and natural sounding synthetic voices. The
processing of such audio books poses several challenges including segmentation of long speech files, detection
of mispronunciations, extraction and evaluation of representations of prosody. In this thesis, we address the
issues of segmentation of long speech files, capturing prosodic phrasing patterns of a speaker, and conversion of
speaker characteristics.
Text-to-Speech (TTS) is a technology that converts written text into human-understandable
voice. A TTS synthesizer is a computer-based system that can read aloud any text given through standard input
devices. Much of the information in the world of computers is accessible only to those who can read or
understand a particular language, so it is very helpful for the common user if the computer can talk to him or her
in that language. Several APIs are available to convert text to speech in Python. One such API is the Google
Text-to-Speech API, commonly known as the gTTS API. gTTS is a very easy-to-use tool that converts entered
text into audio, which can be saved as an MP3 file. The gTTS API supports several languages, including English,
Hindi, Tamil, French, German, and many more. The speech can be delivered at either of two available audio
speeds, fast or slow; however, as of the latest update, it is not possible to change the voice of the generated
audio. Text-to-speech conversion is a technique used to generate a voice output from a text. This is useful when
you do not want to read a document but want to listen to it instead, and more advanced text-to-speech tools can
be used to create realistic voices for videos, advertisements, or podcasts. There are millions of visually impaired
people in the world, and the inability to read has a large impact on their lives. The proposed system is
cost-efficient and helps a visually impaired person hear the text. The main idea of this project is optical
character recognition (OCR), which is used to convert text characters into an audio signal. The text is
preprocessed and then used for recognition by segmenting each character. Segmentation is followed by
extraction of the letters and resizing of the file containing the text.
LIST OF FIGURES

Fig 1.1 Text Flow Chart………………………………………………………


Fig 1.2 Image Flow Chart…………………………………………………….
Fig 1.3 Block Diagram for Text and Image…………………………………..
Fig 2.1 Text to Speech in English…………………………………………….
Fig 2.2 Text to Speech in Telugu…………………………………………….
Fig 2.3 Text to Speech in Japanese…………………………………………..
Fig 3.1 Image to Speech (Input)………………………………………………
Fig 3.2 Image to Speech (Output)…………………………………………….
Fig 4.1 Image to Speech (Input)………………………………………………
Fig 4.2 Image to Speech (Output)…………………………………………….

LIST OF ACRONYMS

1. Tkinter - Tk interface (Python's interface to the Tk GUI toolkit)
2. gTTS - Google Text-to-Speech
3. PIL - Python Imaging Library
4. CV - Computer Vision
5. OS - Operating System
1. INTRODUCTION

Text to Speech Converter is an application developed for Android cellphones and tablets. It mainly focuses on
converting text to speech in any given language. Using this application, most recent Android cellphones can
read text messages aloud so that the user does not have to go through the message.
Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for
various purposes, e.g., car navigation, announcements in railway stations, response services in
telecommunications, and e-mail reading. Corpus-based TTS dramatically improves the naturalness of synthetic
speech compared with early TTS systems. However, no general-purpose TTS has been developed that can
consistently synthesize sufficiently natural speech, and corpus-based TTS does not yet offer enough flexibility.

The objective of this project is to convert text into voice at the click of a
button. The project is developed using the Tkinter, gTTS, and playsound libraries. In this project, we enter a
message which we want to convert into voice and click the play button to hear that text message spoken. We can
also supply an image that contains text or a sentence, and the text is read out at a slow speed. The text-to-speech
(TTS) synthesis procedure consists of two main phases. The first is text analysis, where the input text is
transcribed into a phonetic or some other linguistic representation, and the second is the generation of speech
waveforms, where the output is produced from this phonetic and prosodic information. These two phases are
usually called high-level and low-level synthesis [1]. A simplified version of this procedure is presented in
Figure 1 below. The input text might be, for example, data from a word processor, standard ASCII from e-mail,
a mobile text message, or scanned text from a newspaper. The character string is then pre-processed and
analyzed into a phonetic representation, which is usually a string of phonemes with some additional information
for correct intonation, duration, and stress. Speech sound is finally generated with the low-level synthesizer from
the information produced by the high-level one. The artificial production of speech-like sounds has a long
history, with documented mechanical attempts dating to the eighteenth century.
i. EXISTING SYSTEM: One of the AI technologies widely used today in everyday life across many
countries is Text-to-Speech (TTS). TTS plays a role in creating sounds dynamically and
automatically as needed. TTS has developed rapidly in terms of features and is now able to
produce output in many of the world's languages with accents that are close to perfect.

Here are some examples of the application of TTS in various important sectors:

1. Traffic Control and Monitoring

2. Reminder of the due date and customer bill

3. As audiobook narrator and supports multitasking

ii. PROPOSED SYSTEM: In our model, we created a new feature that converts a given image into
speech. The model supports text-to-speech conversion as well, which helps blind people
recognize the text.

1. In this module, the text is converted into speech and played back at a slow voice speed.

2. Because we use gTTS, any supported language can be recognized and produced as speech output.

3. For a given image, the system describes in speech what is displayed in the image.
REQUIREMENTS:

Requirements identification is the first step of any software development project. Until the requirements of
a client have been clearly identified and verified, no other task (design, coding, testing) can begin. Usually,
business analysts with domain knowledge of the subject matter discuss the project with clients and decide what features are to
be implemented. Based on the target audience or subject matter, requirements can be
classified into the different types stated below:

iii. SOFTWARE REQUIREMENTS:


1. Operating system: Windows XP/ Fedora core-I
2. Software: Python 3 with the Tkinter, gTTS, playsound, pyttsx3, OpenCV, pytesseract, Pillow, and NumPy libraries

iv. HARDWARE REQUIREMENTS:

1. 32 MB RAM
2. 1 GB Hard Disk Space
3. Speaker connected to the computer
TECHNOLOGIES USED

1.PYTHON:

What is Python? Chances are you are asking yourself this. You may want to learn to program but not know
anything about programming languages, or you may have heard of programming languages like C, C++, C#, or
Java and want to know what Python is and how it compares to those “big name” languages. Hopefully this
section explains it.

Python concepts
If you are not interested in the hows and whys of Python, feel free to skip to the next chapter. In this
chapter we try to explain why Python is one of the best languages available and why it is a great one to start
programming with.

• Open-source general-purpose language.

• Object Oriented, Procedural, Functional

• Easy to interface with C/Obj C/Java/Fortran

• Easy-ish to interface with C++ (via SWIG)

• Great interactive environment

Python is a high-level, interpreted, interactive and object-oriented scripting language.

Python is designed to be highly readable. It uses English keywords frequently, whereas other languages use
punctuation, and it has fewer syntactical constructions than other languages.
Python is Interpreted − Python is processed at runtime by the interpreter. You do not need to compile your
program before executing it. This is similar to Perl and PHP.

Python is Interactive − You can actually sit at a Python prompt and interact with the interpreter directly to
write your programs.

Python is Object-Oriented − Python supports Object-Oriented style or technique of programming that


encapsulates code within objects.

Python is a Beginner's Language − Python is a great language for beginner-level programmers and
supports the development of a wide range of applications, from simple text processing to web browsers to
games.

Python Features
1. Free and Open Source: Python is freely available on the official website. Since it is open source, the
source code is also available to the public, so you can download it, use it, and share it.
2. Easy to Code: Python is a high-level programming language and is very easy to learn compared to
languages such as C, C#, JavaScript, and Java. It is very easy to code in Python, and anybody can learn the
basics in a few hours or days. It is also a developer-friendly language.

3. Easy to Read: As you will see, learning Python is quite simple. Python's syntax is really straightforward;
code blocks are defined by indentation rather than by semicolons or brackets.

4. Object-Oriented Language: One of the key features of Python is object-oriented programming. Python
supports classes, object encapsulation, and related concepts.

5. GUI Programming Support: Graphical user interfaces can be made using modules such as PyQt5, PyQt4,
wxPython, or Tk. PyQt5 is the most popular option for creating graphical apps with Python.
6. High-Level Language: Python is a high-level language. When we write programs in Python, we do not need
to remember the system architecture, nor do we need to manage memory.

7. Extensible: Python is an extensible language. We can write parts of a program in C or C++ and call that
compiled C/C++ code from Python.

8. Easy to Debug: Python provides excellent information for error tracing. Once you understand how to
interpret Python's error traces, you can quickly identify and correct most of your program's issues, and simply
by glancing at the code you can often tell what it is designed to do.

9. Portable: Python is a portable language. For example, if we have Python code for Windows and we want to
run it on other platforms such as Linux, Unix, or macOS, we do not need to change it; the same code runs on
any platform.

10. Integrated: Python is an integrated language because we can easily integrate Python with other languages
such as C and C++.

11. Interpreted: Python is an interpreted language because Python code is executed line by line. Unlike C,
C++, or Java, there is no separate compilation step, which makes it easier to debug our code. The source code
of Python is converted into an intermediate form called bytecode.

12. Large Standard Library: Python has a large standard library that provides a rich set of modules and
functions, so you do not have to write your own code for every single thing. There are many libraries in
Python for tasks such as regular expressions, unit testing, web browsers, and more.

13. Dynamically Typed: Python is a dynamically typed language. The type of a variable (for example int,
double, or long) is decided at run time, not in advance, so we do not need to specify the type of a variable.
14. Frontend and Backend Development: With the new PyScript project, you can write and run Python code
in HTML with the help of simple tags such as <py-script> and <py-env>. This will help you do frontend
development work in Python, much like JavaScript.
The backend is Python's strong forte; it is used extensively for this work because of frameworks
like Django and Flask.

15. Dynamic Memory Allocation: In Python, the variable data type does not need to be specified. Memory is
automatically allocated to a variable at runtime when it is given a value. Developers do not need to write
int y = 18 to assign the integer value 18 to y; they may simply type y = 18 (a small illustration of points 13 and
15 follows below).
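
As a small illustration of dynamic typing and automatic memory allocation (a minimal sketch, not taken from the project code):

# Dynamic typing: no type declaration; the type is decided at run time.
y = 18          # y is bound to an int object; memory is allocated automatically
print(type(y))  # <class 'int'>

y = "eighteen"  # the same name can later refer to a str object
print(type(y))  # <class 'str'>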
2.Python GUI – tkinter

Python offers multiple options for developing GUI (Graphical User Interface). Out of all the GUI
methods, tkinter is the most commonly used method. It is a standard Python interface to the Tk GUI toolkit
shipped with Python. Python with tkinter is the fastest and easiest way to create the GUI applications.
Creating a GUI using tkinter is an easy task.

To create a tkinter app:

1. Importing the module – tkinter


2. Create the main window (container)
3. Add any number of widgets to the main window
4. Apply the event Trigger on the widgets.

Importing tkinter is the same as importing any other module in Python. Note that the
name of the module in Python 2.x is 'Tkinter', while in Python 3.x it is 'tkinter'.

import tkinter

There are two main methods which the user needs to remember while creating a Python
application with a GUI.
1. Tk(screenName=None, baseName=None, className='Tk', useTk=1): To
create the main window, tkinter offers the method Tk(screenName=None,
baseName=None, className='Tk', useTk=1). To change the name of the window,
you can change className to the desired one. The basic code used to create the
main window of the application is:

m = tkinter.Tk(), where m is the name of the main window object


2. mainloop(): The mainloop() method is used when your application is ready to run.
mainloop() is an infinite loop that runs the application, waits for an event to occur,
and processes the event as long as the window is not closed.
m.mainloop()

import tkinter
m = tkinter.Tk()
'''
widgets are added here
'''
m.mainloop()

tkinter also gives access to the geometric configuration of the widgets, which can organize the
widgets in the parent window. There are three main geometry manager classes (a short sketch of all three
follows below).

1. pack() method: organizes the widgets in blocks before placing them in the parent
widget.

2. grid() method: organizes the widgets in a grid (table-like structure) before placing
them in the parent widget.

3. place() method: organizes the widgets by placing them at specific positions
chosen by the programmer.
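
A minimal sketch showing all three geometry managers (the widget labels and coordinates are illustrative only, not taken from the project code):

import tkinter as tk

root = tk.Tk()

# pack(): stacks widgets in blocks (top to bottom by default)
top = tk.Frame(root)
top.pack()
tk.Label(top, text="packed label").pack()

# grid(): places widgets in rows and columns (inside its own frame, because
# pack and grid cannot manage children of the same parent)
table = tk.Frame(root)
table.pack()
tk.Label(table, text="row 0, col 0").grid(row=0, column=0)
tk.Label(table, text="row 0, col 1").grid(row=0, column=1)

# place(): puts a widget at an exact x/y position within its parent
area = tk.Frame(root, width=200, height=80)
area.pack()
tk.Button(area, text="placed button").place(x=20, y=20)

root.mainloop()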
3.gTTS

gTTS (Google Text-to-Speech) is a Python library and CLI tool that interfaces with Google Translate's
text-to-speech API. It writes spoken MP3 data to a file, to a file-like object (bytestring) for further audio
manipulation, or to stdout. It features flexible pre-processing and tokenizing.

Installation:
pip install gTTS

The text variable is a string used to store the user's input; the text can be replaced by anything of your choice
within the quotes. An alternative is to use an input statement so that the user can type their own desired input
each time the program is run. The tts variable is used to perform the Google text-to-speech translation on the
user's input: the converted text is stored in the form of speech in the tts variable. The tts.save function allows
us to save the converted speech in a playable format; here it is saved in a file called hi in the .mp3 format
(other formats such as .wav can also be used). A sketch of these steps follows below.
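
A minimal sketch of the steps just described (the greeting text and the file name hi.mp3 are illustrative; this is not the project's GUI code):

from gtts import gTTS
from playsound import playsound

text = "Hello, welcome to the text to speech converter"  # user's input
tts = gTTS(text=text, lang='en', slow=False)              # perform the conversion
tts.save("hi.mp3")                                         # save the speech as an MP3 file
playsound("hi.mp3")                                        # play the saved audio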

The gTTS module can be used extensively on other languages such as French, German, Hindi, etc., as well.

This is extremely useful when there is a communication barrier and the user is unable to convey his messages to

people.
Text-to-speech is a great help to the visually impaired people or people with other disabilities as it can help them

by assisting in the text to speech translation.

There are also many ideas possible with the gTTS module and it can be used for other languages as well.

There is potential for a lot of awesome projects with the same. I will encourage viewers to try experimenting

around more with this module.

The speech can be delivered at either of the two available audio speeds, fast or slow. However, as of the
latest update, it is not possible to change the voice of the generated audio.

gTTS is a very easy-to-use tool that converts the entered text into audio, which can be saved as an MP3 file.
4.Playsound:

Playing sound in Python is easy. There are several modules that can play a sound file (.wav).
These solutions are cross-platform (Windows, Mac, Linux).

The main difference between them is ease of use and the supported file formats. All of them should work with Python 3.
The audio file should be in the same directory as your Python program, unless you specify a path.

The playsound module is a cross-platform module that can play audio files. It has no dependencies;
simply install it with pip in your virtualenv and run!

from playsound import playsound


playsound('audio.mp3')

It is an easy task to play sound using a Python script, because the language contains many modules for
playing or recording sound.

Using these modules, you can play audio files such as MP3, WAV, and other audio file types. You must first
install the sound module before using it in a script.

The playsound module is the simplest module to use for playing sound.

It works on both Python 2 and Python 3, and it is tested to play WAV and MP3 files only. It contains
only one function, named playsound(), whose main argument is the name of the audio file to play.

Method 1: Using playsound module


Run the following command to install the packages:
pip install playsound
• The playsound module contains only a single function named playsound().
• It requires one argument: the path to the file with the sound to play. It can be a local file,
or a URL.
• There is an optional second argument, block, which is set to True by default. We can set it
to False to make the function run asynchronously.
• It works with both WAV and MP3 files.

Example: For WAV format
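
A minimal sketch for this case (the file name audio.wav is an assumption; the original report showed a screenshot here):

from playsound import playsound

# play a local WAV file; pass block=False to return immediately
playsound('audio.wav')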


Method 2: Using pydub module
Run the following commands to install the packages:
sudo apt-get install ffmpeg libavcodec-extra
pip install pydub
Note: You can open WAV files directly with Python; for opening MP3 files, you need ffmpeg or libav.
This module uses the from_wav() method for loading a WAV file and the from_mp3() method for loading an
MP3 file. The play() method is then used to play the loaded audio:

Example 1: For WAV format
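
A minimal sketch for this case (the file name is an assumption; the original report showed a screenshot here):

from pydub import AudioSegment
from pydub.playback import play

# load a WAV file (use AudioSegment.from_mp3 for MP3 input) and play it
sound = AudioSegment.from_wav('audio.wav')
play(sound)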


5.NUMPY

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually
numbers), all of the same type, indexed by a tuple of positive integers. In NumPy dimensions are called axes.
The number of axes is rank.

• Offers Matlab-ish capabilities within Python


• Fast array operations
• 2D arrays, multi-D arrays, linear algebra etc.
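
A small sketch of these ideas (a minimal example, not taken from the project code):

import numpy as np

# a 2-D array: two axes (rank 2), the first of length 2, the second of length 3
a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.ndim)    # 2   -> number of axes
print(a.shape)   # (2, 3)
print(a * 2)     # fast, element-wise (vectorized) array operation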

POWERFUL N-DIMENSIONAL ARRAYS:


Fast and versatile, the NumPy vectorization, indexing, and broadcasting concepts are the de-facto
standards of array computing today.

NUMERICAL COMPUTING TOOLS:


NumPy offers comprehensive mathematical functions, random number generators, linear algebra
routines, Fourier transforms, and more.

INTEROPERABLE:
NumPy supports a wide range of hardware and computing platforms, and plays well with
distributed, GPU, and sparse array libraries.

PERFORMANT:
The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of
compiled code.

EASY TO USE:
NumPy’s high level syntax makes it accessible and productive for programmers from any
background or experience level.
OPEN SOURCE:

Distributed under a liberal BSD license, NumPy is developed and maintained publicly on GitHub
by a vibrant, responsive, and diverse community.
2. DESIGN OF THE MINI PROJECT

FLOWCHART:
The following diagram shows how text is converted into speech. There are
several APIs available to convert text to speech in Python; one such API is the Google Text-to-Speech API,
commonly known as the gTTS API. gTTS is a very easy-to-use tool that converts the entered text into audio,
which can be saved as an MP3 file.

Fig:1.1
IMAGE FLOWCHART:

Fig: 1.2
Block Diagram for both modules:

Fig: 1.3
3. IMPLEMENTATIONS
import cv2
import numpy as np
import pytesseract
import pyttsx3
from gtts import gTTS
from playsound import playsound
from tkinter import *
from tkinter import filedialog as fd
from tkinter.messagebox import showinfo
from PIL import Image

# offline speech engine; 'sapi5' is the Windows speech driver
engine = pyttsx3.init(driverName='sapi5')


def qrr(filename):
    """Detect and decode a QR code in the given image and speak its contents."""
    img = cv2.imread(filename, 0)          # 0 means read as grayscale
    img_origin = cv2.imread(filename)
    detector = cv2.QRCodeDetector()

    # detect and decode
    data, bbox, straight_qrcode = detector.detectAndDecode(img)

    if bbox is not None:
        # bbox is a float array of corner points; convert to int and draw the outline
        n_lines = len(bbox[0])
        bbox1 = bbox.astype(int)
        for i in range(n_lines):
            point1 = tuple(map(int, bbox1[0][i]))
            point2 = tuple(map(int, bbox1[0][(i + 1) % n_lines]))
            cv2.line(img_origin, point1, point2, color=(255, 0, 0), thickness=2)
        cv2.imshow('QR code', img_origin)
        cv2.waitKey(0)
        cv2.destroyAllWindows()
    else:
        data = "QR code not detected"

    engine.say(data)
    engine.runAndWait()
    showinfo(title='QR code Data', message=data)


def select_file():
    """Ask the user for an image file and pass it to the QR reader."""
    filetypes = (
        ('jpg', '*.jpg'),
        ('jpeg', '*.jpeg'),
        ('All files', '*.*'),
    )
    filename = fd.askopenfilename(title='Open a file', initialdir='/', filetypes=filetypes)
    print(filename)
    qrr(filename)


def objectdetect():
    """Ask the user for an image, extract its text with OCR, then show and speak it."""
    filetypes = (
        ('jpg', '*.jpg'),
        ('jpeg', '*.jpeg'),
        ('All files', '*.*'),
    )
    filename = fd.askopenfilename(title='Open a file', initialdir='/', filetypes=filetypes)
    # optional pre-processing steps before OCR:
    # img = get_grayscale(cv2.imread(filename))
    # img = thresholding(img)
    # img = remove_noise(img)
    img1 = np.array(Image.open(filename))
    text = pytesseract.image_to_string(img1)
    showinfo(title='Extracted Text', message=text)
    engine.say(text)
    engine.runAndWait()


def Text_to_speech():
    """Speak the text typed into the entry field."""
    message = entry_field.get()
    # gTTS(text=message).save('speech.mp3') could also save the speech as an MP3
    # and play it with playsound; here the offline pyttsx3 engine speaks directly.
    engine.say(message)
    engine.runAndWait()


def Exit():
    root.destroy()


def Reset():
    Msg.set("")


# image pre-processing helpers (used optionally before OCR)
def thresholding(image):
    return cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]


def remove_noise(image):
    return cv2.medianBlur(image, 5)


def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)


# main window
root = Tk()
root.geometry('720x360')
root.resizable(0, 0)
root.config(bg='ghost white')
root.title('DataFlair - TEXT_TO_SPEECH')

# heading
Label(root, text='TEXT_TO_SPEECH', font='arial 20 bold', bg='white smoke').pack()
Label(root, text='vaagdevi mini project R18', font='arial 15 bold', bg='white smoke').pack(side=BOTTOM)

# label
Label(root, text='Enter Text', font='arial 15 bold', bg='white smoke').place(x=20, y=60)

# text variable and entry field
Msg = StringVar()
entry_field = Entry(root, textvariable=Msg, width='50')
entry_field.place(x=20, y=100)

# buttons
Button(root, text="PLAY", font='arial 15 bold', command=Text_to_speech, width=4).place(x=25, y=140)
Button(root, text='EXIT', font='arial 15 bold', command=Exit, bg='OrangeRed1').place(x=100, y=140)
Button(root, text='RESET', font='arial 15 bold', command=Reset).place(x=175, y=140)
Button(root, text='QR Reader', font='arial 15 bold', command=select_file).place(x=275, y=140)
Button(root, text="Object detect", font='arial 15 bold', command=objectdetect).place(x=420, y=140)

# infinite loop to run the program
root.mainloop()
4. TESTING

TEST CASE   INPUT                                            OUTPUT                        RESULT

1           Given text in English                            Speech in English             PASS

2           Given text in Telugu                             Speech in Telugu              PASS

3           Given text in Japanese                           Speech in Japanese            PASS

4           Given image "Vaagdevi College of Engineering"    Speech in English             PASS

5           Given image of animals                           Speaks all the animal names   PASS
5. RESULTS

TEXT TO SPEECH

A. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in
software or hardware products. In this project, we convert English text into speech using gTTS; the
speech is played using playsound.

1. Text given in the English language: “hello”.

2. Choose the button “Play”.
3. The text is played in a female voice at slow speed.
B. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in
software or hardware products. In this project, we convert Telugu text into speech using gTTS; the
speech is played using playsound.

1. Text given in the Telugu language: “మేము కాలేజీకి వెళ్తు న్నా ము”.

2. Choose the button “Play”.
3. The text is played in a female voice at slow speed.
C. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in
software or hardware products. In this project, we convert Japanese text into speech using gTTS; the
speech is played using playsound.

1. Text given in the Japanese language: “あなたの週末の予定は何ですか”.

2. Choose the button “Play”.
3. The text is played in a female voice at slow speed.
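
For reference, a minimal stand-alone sketch (outside the GUI) of how gTTS could produce the English, Telugu, and Japanese speech described above; the language codes and the output file name are assumptions:

from gtts import gTTS
from playsound import playsound

samples = [
    ("hello", "en"),                      # English
    ("మేము కాలేజీకి వెళ్తు న్నా ము", "te"),   # Telugu
    ("あなたの週末の予定は何ですか", "ja"),      # Japanese
]

for text, lang in samples:
    gTTS(text=text, lang=lang, slow=True).save("out.mp3")  # slow=True gives the slow speed
    playsound("out.mp3")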
IMAGE TO SPEECH:

D. In this image-to-speech conversion, our module recognizes what text or objects are represented in the
given input image. Here it detects the content and delivers it as speech.

1. Give the college logo as the input image.

2. Choose the option “Object Detect”.
3. The output is given as text and speech.

The above is the input we gave to the system; the output is shown in the image below.
E. In this image-to-speech conversion, our module recognizes what text or objects are represented in the
given input image. Here it detects the content and delivers it as speech.

1. Give an input image.

2. Choose the “Object Detect” button.

3. The output is given as speech and text.


6. CONCLUSION & FUTURE SCOPE

We have successfully developed the text-to-speech Python project. We used the popular Tkinter library for
rendering graphics in a display window, the gTTS (Google Text-to-Speech) library to convert text to voice, and
the playsound library to play the voice generated from the text. Text-to-speech synthesis is a rapidly growing
aspect of computer technology and is playing an increasingly important role in the way we interact with systems
and interfaces across a variety of platforms. We have identified the various operations and processes involved in
text-to-speech synthesis, and we have developed a simple and attractive graphical user interface which allows
the user to type text into the text field of the application. Our system interfaces with a text-to-speech engine
developed for American English. Our website provides the user with an audio file of a book: it is a web-based
platform which allows the user to upload images of book pages, converts them into MP3 format, and offers
facilities for downloading the MP3 file.
For image to speech, this report proposes an approach for image-to-speech conversion using optical character
recognition and speech synthesis. The application developed is simple to use, very cost effective, portable, and
applicable in real time. Using it, we can read text from any natural image, any document image, as well as PDF
documents, and it can generate synthesized speech through a computer's speakers. The developed software
incorporates features like word-meaning assistance and voice modulation along with speed control. This enables
the user to multitask and save time by listening to background materials while doing other tasks. The system can
also be used in parts: for instance, only the text extraction can be performed, or only the text-to-speech
conversion can be performed separately. Expensive hardware components, support software, an updated
operating system version, or even an internet connection are not required; a webcam, to capture images, is an
optional requirement. People with visual impairments or complete blindness can use this application for reading
documents and books, and people with learning disabilities can also make use of it. It is convenient and easy to
use for those who do not have much knowledge about the workings of a computer. The dictionary feature added
is intended to help the user understand the text; with synonyms and word meanings available in the application
itself, users do not have to search for them in numerous places. Tests have been conducted to check the
conversion, and good results have been achieved.
FUTURE SCOPE: There is scope to add more functionality to the present application. One of the add-ons can
include support for languages other than English. The algorithms used to pre-process a natural image work
sufficiently well in this system; however, there is scope to improve them further, as well as the OCR techniques.
Support for more image input formats can also be provided. Algorithms can be developed to recognize text from
low-resolution and blurred images; this will help users upload old historic manuscripts and scrolls which have
been damaged and extract text from them.

ADVANTAGES:

➨It helps to listen to class notes, textbooks, and electronic text.
➨It facilitates education.

➨It avoids eyestrain from too much reading.

➨It helps in learning languages that you do not know.

➨It helps in the preparation of speeches by letting you hear your work read aloud.

➨It helps in listening to e-books or e-material during a journey.

➨It amuses children by letting your PC read stories to them when you are busy.

➨It helps seniors or those having vision problems.

➨It can be adapted easily to say whatever users want it to say.

➨It can help in reading large paragraphs and offers a range of different accents and voices.
DISADVANTAGES:

➨The system is very time consuming, as it requires huge databases and hard-coding of the combinations
that form words. As a result, speech synthesis consumes more processing power.
➨The resulting speech is less than natural and emotionless, because it is impossible to get audio
recordings of all possible words spoken in all possible combinations of emotion, prosody, stress, etc.
➨Pronunciation analysis from written text is a major concern.
➨It is difficult to build a perfect system.

➨Filtering background noise is a task that can be difficult even for humans to accomplish.
BIBLIOGRAPHY

1) Lemmetty, S., 1999. Review of Speech Synthesis Technology. Master's thesis, Helsinki University of Technology.
2) Dutoit, T., 1993. High Quality Text-to-Speech Synthesis of the French Language. Doctoral dissertation, Faculté Polytechnique de Mons.
3) Suendermann, D., Höge, H., and Black, A., 2010. Challenges in Speech Synthesis. In: Chen, F., Jokinen, K. (eds.), Speech Technology, Springer Science + Business Media LLC.
4) Allen, J., Hunnicutt, M. S., and Klatt, D., 1987. From Text to Speech: The MITalk System. Cambridge University Press.
5) Text-to-Speech (TTS) Overview. Voice RSS website. Retrieved February 21, 2014, from http://www.voicerss.org/tts/
6) Text-to-Speech Technology. Linguatec Language Technology website. Retrieved February 21, 2014, from http://www.linguatec.net/products/tts/information/technology
7) Dutoit, T., 1997. High-Quality Text-to-Speech Synthesis: An Overview. Journal of Electrical and Electronics Engineering Australia 17, 25-36.
8) Black, A. W., 2002. Perfect Synthesis for All of the People All of the Time. IEEE TTS Workshop.
9) Kominek, J., and Black, A. W., 2003. CMU ARCTIC Databases for Speech Synthesis. CMU-LTI-03-177, Language Technologies Institute, School of Computer Science, Carnegie Mellon University.
10) Zhang, J., 2004. Language Generation and Speech Synthesis in Dialogues for Language Learning. Master's dissertation, Massachusetts Institute of Technology.
