
SPEECH RECOGNITION SYSTEM

A Project
Submitted in partial fulfillment of the requirements for
the award of the Degree of
BACHELOR OF COMPUTER APPLICATION
By

NIRMALYA SHIT
ENROLLMENT NO-12021004006061 AND REGISTRATION NO-213661001210131
SHYAM SHAH
ENROLLMENT NO-12021004006048 AND REGISTRATION NO-213661001210143
ASHWANI JHA
ENROLLMENT NO-12021004006043 AND REGISTRATION NO-213661001210148
RONIK SAHA
ENROLLMENT NO-12021004006001 AND REGISTRATION NO-213661001210190

DEPARTMENT OF COMPUTER SCIENCE/ APPLICATION

INSTITUTE OF ENGINEERING & MANAGEMENT

2023
DECLARATION CERTIFICATE

This is to certify that the work presented in the thesis entitled “SPEECH RECOGNITION SYSTEM”, in partial fulfillment of the requirement for the award of the degree of Bachelor of Computer Application of the Institute of Engineering & Management, is an authentic work carried out under my supervision and guidance.

To the best of my knowledge, the content of this thesis has not formed the basis for the award of any previous degree to anyone else.

Date: 20/11/2023 Prof. Ankan Bhowmik

Dept. of Computer Application

Institute of Engineering & Management

Dr. Abhishek Bhattacharya

Head of the Department

Dept. of Computer Application and Science

Institute of Engineering & Management

CERTIFICATE OF APPROVAL

The foregoing thesis entitled “SPEECH RECOGNITION SYSTEM” is hereby approved as a creditable study of a research topic and has been presented in a satisfactory manner to warrant its acceptance as a prerequisite to the degree for which it has been submitted.

It is understood that by this approval the undersigned do not necessarily endorse any conclusion drawn or opinion expressed therein, but approve the thesis only for the purpose for which it is submitted.

(Internal Examiner) (External Examiner)

Acknowledgements
We would like to express our special thanks and gratitude to our guide, Prof. Ankan Bhowmik, who helped us a great deal with this project. Her valuable suggestions helped us solve tough challenges, and without her help this project could not have been completed on time. Special thanks also to our Head of Department, Prof. Abhishek Bhattacharya, who gave us the golden opportunity to work on this wonderful project on the topic “SPEECH RECOGNITION SYSTEM”, which helped us gain significant knowledge of the subject. Finally, we would like to thank our friends, who helped us finalize this project within the given time frame.

Name of Student: Nirmalya Shit


Enrollment Num: 12021004006061

Name of Student: Shyam Shah


Enrollment Num: 12021004006048

Name of Student: Ashwani Jha


Enrollment Num: 12021004006043

Name of Student: Ronik Saha


Enrollment Num: 12021004006001

Contents

Abstract

Chapter 1
1.1 Introduction

Chapter 2
2.1 Background Studies
2.2 Literature Survey

Chapter 3
3.1 Proposed Methodology

Chapter 4
4.1 Experimental Dataset

Chapter 5
5.1 Expected Outcome

Chapter 6
6.1 Conclusion
6.2 Future Work

References

Abstract
Our project comprises three modules. The first module, an automatic certificate generator, was part of our minor project. The major project comprises two further modules. In the first of these, we extract the audio from a Zoom meeting recording and, with the help of our summarizer, produce a text summary of the important points. This module gives meeting attendees easy reference to all the important points put forward during the session.
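We do not fix an implementation here, but a minimal sketch of the summarizer module could look as follows, assuming the Zoom recording is a local file named meeting.mp4, the ffmpeg command-line tool is installed, and the transcript text is already available (transcription itself is sketched in Chapter 3); all file names are placeholders.

import re
import subprocess
from collections import Counter

def extract_audio(recording="meeting.mp4", audio_out="meeting.wav"):
    """Pull a mono 16 kHz WAV track out of the meeting recording."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", recording, "-vn", "-ac", "1", "-ar", "16000", audio_out],
        check=True,
    )
    return audio_out

def summarize(transcript, max_sentences=5):
    """Tiny extractive summary: keep the sentences whose words occur
    most often across the whole transcript."""
    sentences = re.split(r"(?<=[.!?])\s+", transcript.strip())
    freq = Counter(re.findall(r"[a-z']+", transcript.lower()))
    scored = sorted(
        sentences,
        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
        reverse=True,
    )
    chosen = set(scored[:max_sentences])
    # Return the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in chosen)

A production summarizer would normally filter stop words or use a pretrained abstractive model; the sketch only illustrates the extract-then-summarize flow.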

The final module is a voice-to-handwritten-text converter. It uses a default text format and can be applied to several purposes, primarily, though not exclusively, aiding physically handicapped and indisposed individuals.
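The handwriting module is likewise not tied to a specific implementation; one minimal sketch, assuming the transcribed text is already available, the Pillow imaging library is installed, and handwriting.ttf is any handwriting-style TrueType font placed next to the script, is shown below.

from PIL import Image, ImageDraw, ImageFont

def render_handwritten(text, font_path="handwriting.ttf",
                       page_size=(1240, 1754), margin=80, font_size=40):
    """Write the text word by word onto a white A4-like page image."""
    page = Image.new("RGB", page_size, "white")
    draw = ImageDraw.Draw(page)
    font = ImageFont.truetype(font_path, font_size)
    x, y = margin, margin
    line_height = int(font_size * 1.6)
    for word in text.split():
        width = draw.textlength(word + " ", font=font)
        if x + width > page_size[0] - margin:  # simple word wrap
            x, y = margin, y + line_height
        draw.text((x, y), word, font=font, fill="navy")
        x += width
    return page

# Example: render_handwritten("hello world").save("note.png")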

Chapter 1

1.1 Introduction
Our project is all about generating certificates, summarizing the important points of Zoom call meetings into text, and converting speech into handwritten text. The old practices for generating students' certificates are no longer efficient: student enrollment in schools and colleges increases every year, and certificates must be generated every day for every workshop or event, so this task plays a very vital role. Similarly, during the pandemic the whole world moved to an online mode of work and adopted new forms of learning and teaching, whether at school or in the office, and all classes and important meetings now take place on the popular online meeting application Zoom.
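As an illustration of the certificate generator, a minimal sketch is given below. It assumes the Pillow imaging library, a blank certificate image named template.png, a participants.csv file with a name column, and an arial.ttf font file; all of these names are placeholders rather than part of the delivered system.

import csv
from PIL import Image, ImageDraw, ImageFont

def generate_certificates(csv_path="participants.csv",
                          template_path="template.png",
                          font_path="arial.ttf"):
    """Stamp each participant's name onto a copy of the template."""
    font = ImageFont.truetype(font_path, 64)
    with open(csv_path, newline="") as fh:
        for row in csv.DictReader(fh):
            cert = Image.open(template_path).convert("RGB")
            draw = ImageDraw.Draw(cert)
            name = row["name"]
            # Centre the name roughly in the middle of the certificate.
            w = draw.textlength(name, font=font)
            draw.text(((cert.width - w) / 2, cert.height / 2),
                      name, font=font, fill="black")
            cert.save(f"certificate_{name.replace(' ', '_')}.png")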

Speech is one of the most important tools for communication between humans and their environment; therefore, automatic speech recognition (ASR) is a constant need for human beings. Speech recognition has made it feasible for machines to understand human languages. As information technology affects more and more aspects of our lives with every passing year, the problem of communication between human beings and information-processing devices becomes increasingly significant. Up to now, communication has almost entirely been through keyboards and screens, but speech is the most widely used, most natural, and fastest means of communication for people. In a speech recognition system, many parameters affect the accuracy of recognition.



Chapter 2
2.1 Background Studies
The main objective of this project is to convert spoken language into text, enabling computers to understand and interpret human speech for various applications. In 1952, three Bell Labs researchers, Stephen Balashek, R. Biddulph and K. H. Davis, built a system called “Audrey” for single-speaker digit recognition; their system located the formants in the power spectrum of each utterance. In 1960, Gunnar Fant developed and published the source-filter model of speech production. In our survey of existing speech recognition work, however, we found that it lacks a feature that is very necessary for the present and future era of science: support for users who cannot speak or hear. This gap motivated us to develop this project as our effort to fill it.

Our aim is to design a summarizer and certificate generation system with fully fledged logic and environment, using new and trending technologies in the market, with maximum functionality, efficiency, and optimal speed.

2.2 Literature Survey


Some of the relevant studies are as follows:
A study by Dong Wang et al. (College of Computer, National University of Defense Technology, 2019) highlights the RNN model's effectiveness. Theoretically, it can map an input to any finite, discrete output sequence, and interdependencies between input and output, as well as within output elements, are modelled jointly [1].
Research by M. Halle and K. Stevens demonstrates the importance of diverse datasets that encompass various accents and languages. This approach ensures that speech recognition models generalize well across linguistic variations, laying the groundwork for our commitment to inclusivity [2].
Dreuw, in his research, emphasizes the significance of visual indicators for users with hearing impairments. Our model innovatively allows users to customize visual alerts, providing a personalized and inclusive communication experience [3].
Studies by Wilson et al. stress the importance of creating interfaces that prioritize accessibility. Our model adopts a user-centric design, ensuring an intuitive and adaptable interface that caters to the needs of diverse users [4].
Chapter 3
3.1 Proposed Methodology
Our project differs from previous speech recognition projects. A speech recognition system is an essential part of learning. In general, speech recognition is a very necessary technology: it recognizes a voice and converts speech to text, so that the correct spelling is known, and converts text to speech, so that the correct pronunciation is known. For students who cannot speak or hear, however, text and sign language are the only means of taking part in an educational setting. Previous research did not offer this support; this project will. Our main objective in developing this speech recognition project is to bring a sign language facility to people who cannot speak or hear.
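This report does not commit to a particular toolchain, but one possible speech-to-text and text-to-speech sketch is shown below. It assumes the SpeechRecognition and gTTS packages are installed, the input is a WAV file, and an internet connection is available for the Google Web Speech API used by recognize_google(); the file names are placeholders.

import speech_recognition as sr
from gtts import gTTS

def speech_to_text(wav_path="meeting.wav"):
    """Transcribe a recorded WAV file to text."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)      # read the whole file
    return recognizer.recognize_google(audio)  # may raise sr.UnknownValueError

def text_to_speech(text, out_path="spoken.mp3"):
    """Synthesize the text so its pronunciation can be heard."""
    gTTS(text=text, lang="en").save(out_path)
    return out_path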

Unique Contribution:

1. Inclusive Deaf-Friendly Features: Our project goes beyond conventional speech recognition by incorporating features specifically designed to enhance accessibility for the deaf and hard-of-hearing communities.

2. Sign Language Recognition Integration: In a groundbreaking move towards inclusivity, our speech recognition model extends its capabilities to recognize and interpret sign language. By leveraging advanced computer vision techniques, our system can process and understand sign language gestures, providing a comprehensive communication solution for both spoken and signed languages (a hand-landmark detection sketch follows this list).

3. Customizable Visual Alerts: Understanding the reliance on visual cues for individuals
with hearing impairments, our system introduces customizable visual alerts.

4. User-Centric Interface for Deaf Users: Our user interface prioritizes the needs of deaf users, featuring customizable visual themes, text-based feedback, and intuitive controls designed around the unique requirements of the deaf community.

5. Collaborative Learning from Deaf User Input: To further improve the model's
performance in understanding diverse communication styles, we seek input from the
deaf community.
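As one possible computer-vision front end for the sign language feature in point 2, the sketch below detects hand landmarks in webcam frames with MediaPipe Hands; mapping the landmarks to actual signs would still require a trained classifier, which is not shown. It assumes the mediapipe and opencv-python packages are installed and a webcam is available at index 0.

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
camera = cv2.VideoCapture(0)

while True:
    ok, frame = camera.read()
    if not ok:
        break
    # MediaPipe expects RGB images; OpenCV captures BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # 21 (x, y, z) landmarks per hand, normalized to the frame;
            # these would be the features fed to a gesture classifier.
            print([(lm.x, lm.y) for lm in hand.landmark][:5])
    cv2.imshow("hands", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

camera.release()
cv2.destroyAllWindows()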



Chapter 4
4.1 Experimental Dataset

Source data

The source data for the speech recognition system are collected from Kaggle. The corpus is split into several parts for convenience. The subsets with “valid” in their name contain audio clips that at least two people have listened to, with the majority of those listeners agreeing that the audio matches the text. The subsets with “invalid” in their name contain clips with at least two listeners where the majority say the audio does not match the text. All other clips, i.e. those with fewer than two votes or with equal valid and invalid votes, carry “other” in their name. The corpus contains many audio clips, image clips, and text data. Considering that low-resource languages are primarily spoken in low-resource regions, our approach to preparing Swahili data for speech recognition makes use of freely accessible tools, which we hope makes it accessible to a wider audience.
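The valid/invalid/other rule described above can be expressed directly in code. The sketch below assumes the Kaggle Common Voice metadata is a CSV file with filename, up_votes and down_votes columns; the file name used here is a placeholder.

import csv

def split_clips(metadata_csv="cv-other-train.csv"):
    """Bucket clips by listener votes, mirroring the corpus naming scheme."""
    buckets = {"valid": [], "invalid": [], "other": []}
    with open(metadata_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            up, down = int(row["up_votes"]), int(row["down_votes"])
            if up + down < 2 or up == down:
                buckets["other"].append(row["filename"])    # too few or tied votes
            elif up > down:
                buckets["valid"].append(row["filename"])    # majority: audio matches text
            else:
                buckets["invalid"].append(row["filename"])  # majority: audio does not match
    return buckets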

Data type
Common Voice is a corpus of speech data read by users on the Common Voice website, based on text from a number of public-domain sources such as user-submitted blog posts, old books, movies, and other public speech corpora. Its primary purpose is to enable the training and testing of automatic speech recognition (ASR) systems. The speech recognition system uses three data types: audio, images, and text, all of which are useful for a speech recognition system with a sign language recognition facility.



Chapter 5
5.1 Expected Outcome
➢ The system is able to recognize multiple words, such as Samosa, Dosa, and Tea, and convert them into text.
➢ The system is best suited to environments with low ambient noise.
➢ The system provides good performance relative to other systems.
➢ It can be concluded that GMM-based models provide higher accuracy (a per-word GMM sketch follows this list).
➢ It can be inferred that comparing voice samples is quite complicated but entirely possible.
➢ It will revolutionize the way people conduct business over the web and will, ultimately, differentiate world-class e-businesses.
➢ The system is able to recognize gestures and sign language.
➢ The system is able to reduce background noise.
➢ The system converts sign language into speech or text that others will be able to understand.
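As one way the GMM comparison above could be set up, the sketch below fits one Gaussian mixture per word on MFCC features and picks the word whose model scores a new utterance highest. It assumes the librosa and scikit-learn packages are installed and that training_files maps each word to a list of WAV recordings of that word; none of these details are fixed by this report.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(wav_path):
    """Return frames as rows and 13 MFCC coefficients as columns."""
    signal, rate = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=signal, sr=rate, n_mfcc=13).T

def train_word_models(training_files, components=4):
    """Fit one GMM per word, e.g. {"samosa": [...], "dosa": [...], "tea": [...]}."""
    models = {}
    for word, paths in training_files.items():
        feats = np.vstack([mfcc_features(p) for p in paths])
        models[word] = GaussianMixture(n_components=components).fit(feats)
    return models

def recognize(wav_path, models):
    """Pick the word whose model gives the highest average log-likelihood."""
    feats = mfcc_features(wav_path)
    return max(models, key=lambda w: models[w].score(feats))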



Chapter 6
6.1 Conclusion
Speech recognition technology has made significant strides, enhancing communication
and accessibility. Its applications range from virtual assistants to transcription services.
While it has improved, challenges like accents and background noise persist. Continued
advancements and research promise a future where speech recognition becomes even
more accurate and inclusive.

6.2 Future Work


The future of speech recognition holds advancements in accuracy, natural language
understanding, and integration with various devices. Expect improved performance in
noisy environments, broader language support, and enhanced contextual
understanding for more seamless interactions. Additionally, developments in AI may
lead to more personalized and adaptive speech recognition systems.



References
[1] Dong Wang, Xiaodong Wang, Shaohe Lv, “An Overview of End-to-End Automatic Speech Recognition”, 2019.

[2] M. Halle, K. Stevens, “Speech Recognition: A Model and a Program for Research”, 1962.

[3] Philippe Dreuw, David Rybach, Thomas Deselaers, Morteza Zahedi, Hermann Ney, “Speech Recognition Techniques for a Sign Language Recognition System”, 2007.

[4] W. Stokoe, D. Casterline, C. Croneberg, A Dictionary of American Sign Language on Linguistic Principles, Gallaudet College Press, Washington D.C., U.S.A., 1965.

