Minor Project Report
A Project
Submitted in partial fulfillment of the requirements for
the award of the Degree of
BACHELOR OF COMPUTER APPLICATION
By
NIRMALYA SHIT
ENROLLMENT NO-12021004006061 AND REGISTRATION NO-213661001210131
SHYAM SHAH
ENROLLMENT NO-12021004006048 AND REGISTRATION NO-213661001210143
ASHWANI JHA
ENROLLMENT NO-12021004006043 AND REGISTRATION NO-213661001210148
RONIK SAHA
ENROLLMENT NO-12021004006001 AND REGISTRATION NO-213661001210190
2023
DECLARATION CERTIFICATE
This is to certify that the work presented in the thesis entitled “SPEECH
To the best of my knowledge, the content of this thesis has not formed the
basis for the award of any previous degree to anyone else.
CERTIFICATE OF APPROVAL
Acknowledgements
We would like to express our special gratitude to our guide, Prof. Ankan Bhowmik, who helped
us a great deal in this project. Her valuable suggestions helped us solve tough challenges, and
without her help this project could not have been completed in time. Special thanks also go to
our Head of Department, Prof. Abhishek Bhattacharya, who gave us the golden opportunity to
do this wonderful project on the topic “SPEECH RECOGNITION SYSTEM”, which helped us
gain significant knowledge of the subject. Finally, we would like to thank our friends who
helped us a lot in finalizing this project within the given time frame.
Contents
Abstract
Chapter 1
1.1 Introduction
Chapter 2
2.1 Background Studies
2.2 Literature Survey
Chapter 3
3.1 Proposed Methodology
Chapter 4
4.1 Experimental Dataset
Chapter 5
5.1 Expected Outcome
Chapter 6
6.1 Conclusions
6.2 Future Work
References
Abstract
Our project comprises three modules. The first module, developed as part of our
minor project, is an automatic certificate generator. The major project comprises
two modules. In the first, we extract the audio file from a particular Zoom
meeting and, with the help of our Summarizer, create a summary of the important
points in text format. This module aims to give meeting attendees an easy
reference to all the important points put forward during the session.
The final module is a voice-to-handwritten-text converter. It has a default text
format and can serve several purposes, primarily, but not exclusively, aiding
physically disabled and indisposed individuals.
Chapter 1
1.1 Introduction
Our project is about generating certificates, summarizing the important points of Zoom
meetings in text format, and converting speech from Zoom meetings into handwritten text.
Traditional certificate-generation practices are no longer efficient for producing
students’ certificates. Student enrollment in schools and colleges increases every
year, and generating certificates for every workshop or event plays a vital role.
Similarly, during the pandemic the whole world moved to an online mode of work and
adapted to new forms of learning and teaching, in schools and offices alike. Classes
and important meetings are held on popular online meeting applications such as
Zoom.
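As a minimal sketch of automatic certificate generation, assuming each certificate is produced by filling a fixed text template with per-student fields, the following uses Python's standard `string.Template`. The template wording, the field names (`name`, `event`, `date`) and the helper `generate_certificates` are hypothetical illustrations, not our actual implementation, which would typically render onto an image or PDF.

```python
from string import Template

# Hypothetical certificate wording; a real generator would render this onto an image or PDF.
CERTIFICATE = Template(
    "This is to certify that $name has participated in the $event held on $date."
)


def generate_certificates(students, event, date):
    """Return (name, certificate text) pairs, one per student (hypothetical helper).

    A list is used rather than a dict so that two students with the same
    name still each receive a certificate.
    """
    return [
        (name, CERTIFICATE.substitute(name=name, event=event, date=date))
        for name in students
    ]


if __name__ == "__main__":
    # Hypothetical participants and event details.
    for name, text in generate_certificates(["A. Roy", "B. Das"], "Python Workshop", "1 May 2023"):
        print(text)
```

Keeping the wording in one template means a new workshop only needs a new list of names and event details, which is the efficiency gain the paragraph above argues for.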
Speech is one of the most important tools for communication between humans and
their environment, so automatic speech recognition (ASR) remains a lasting need.
Speech recognition makes it feasible for machines to understand human languages. As
information technology affects more and more aspects of our lives every year, the
problem of communication between human beings and information-processing devices
becomes increasingly significant. Until now, communication has been almost entirely
through keyboards and screens, but speech is the most widely used, natural and
fastest means of communication for people. In a speech recognition system, many
parameters affect recognition accuracy, such as background noise, speaker
variability (accent, speaking rate), vocabulary size and recording quality.
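Recognition accuracy is commonly reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the system's output into the reference transcript, divided by the number of words in the reference. Assuming WER as the metric (the text above does not name one), a minimal sketch using a word-level edit distance is shown below; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the meeting starts at ten", "the meeting start at ten")` has one substitution out of five reference words, giving 0.2. A lower WER means higher accuracy, so the factors listed above (noise, speaker variability and so on) typically show up as a higher WER.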
Our objective is to design a summarizer and certificate-generation system with
full-fledged logic and environment, using current technologies, that offers maximum
functionality, efficiency and speed.
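The internals of our Summarizer are not described in this section. As one possible approach, assuming the meeting audio has already been transcribed to text, a simple frequency-based extractive summarizer scores each sentence by how often its words occur across the whole transcript and keeps the top-scoring sentences in their original order. The stop-word list and the function name `summarize` are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "is", "are",
    "we", "will", "be", "for", "at", "it", "this", "that",
}


def summarize(transcript: str, max_sentences: int = 2) -> str:
    """Extractive summary: keep the highest-scoring sentences, in original order."""
    # Split after sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", transcript) if s.strip()]
    words = re.findall(r"[a-z']+", transcript.lower())
    freq = Counter(w for w in words if w not in STOP_WORDS)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]
        # Average word frequency, so long sentences are not automatically favoured.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in keep)
```

For the transcript "The project deadline is Friday. Lunch was good. The project demo needs the deadline met.", `summarize(transcript, 2)` keeps the two sentences about the project deadline and drops the one about lunch, since "project" and "deadline" recur across the meeting.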
Unique Contribution:
3. Customizable Visual Alerts: Recognizing that individuals with hearing impairments
rely on visual cues, our system introduces customizable visual alerts.
4. User-Centric Interface for Deaf Users: Our user interface prioritizes the needs of
deaf users, featuring customizable visual themes, text-based feedback and intuitive
controls designed around the unique requirements of the deaf community.
5. Collaborative Learning from Deaf User Input: To further improve the model's
performance in understanding diverse communication styles, we seek input from the
deaf community.
Source data
For the speech recognition system, the source data were collected from Kaggle. The
corpus is split into several parts for convenience. The subsets with “valid” in their
name are audio clips that at least 2 people have listened to, where the majority of those
listeners say the audio matches the text. The subsets with “invalid” in their name are
clips that have had at least 2 listeners, where the majority say the audio does not match
the text. All other clips, i.e. those with fewer than 2 votes or with equal valid and
invalid votes, have “other” in their name. The dataset contains many audio clips, image
clips and text data. Considering that low-resource languages are primarily spoken in
low-resource regions, our approach to preparing Swahili data for speech recognition made
use of freely accessible tools, in the hope of making it accessible to a wider audience.
Data type
Common Voice is a corpus of speech data read by users on the Common Voice website,
based on text from a number of public domain sources such as user-submitted blog
posts, old books, movies and other public speech corpora. Its primary purpose is to
enable the training and testing of automatic speech recognition (ASR) systems. In our
speech recognition system, the data are of three types: audio, images and text. For a
speech recognition system with a sign language recognition facility, all three are
useful data types.
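To make the three data types concrete, here is a minimal sketch of how one sample could be represented, assuming each sample pairs an audio clip and its transcript with an optional image (for example a sign-language frame). The class `SpeechSample`, its fields and the file names in the usage note are hypothetical; they do not describe the actual dataset schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpeechSample:
    """One example combining the three data types (hypothetical schema)."""

    audio_path: str                   # recorded speech clip
    sentence: str                     # text the speaker read aloud
    image_path: Optional[str] = None  # optional image, e.g. a sign-language frame

    def modalities(self) -> List[str]:
        """List which of the three data types this sample actually carries."""
        present = []
        if self.audio_path:
            present.append("audio")
        if self.image_path:
            present.append("image")
        if self.sentence:
            present.append("text")
        return present
```

For example, `SpeechSample("clip_001.mp3", "hello world").modalities()` reports only audio and text, while adding an `image_path` also reports the image, which is the case relevant to sign language recognition.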