SpeechRecognition

The document provides an overview of speech recognition technology using Python, particularly focusing on the SpeechRecognition library. It includes instructions for installation, code examples for transcribing audio files, and capturing audio from a microphone. Additionally, it discusses the setup for using Google Speech Recognition and handling exceptions during the recognition process.
SpeechRecognition

Speech recognition is a technology that allows computers to understand and
process human speech. Python, with its simplicity and robust libraries, offers
several modules to tackle speech recognition tasks effectively. One of the most
popular libraries for this purpose is the SpeechRecognition library.

With SpeechRecognition Library


This section builds a speech recognition system with the SpeechRecognition
library. The library offers many transcription engines, such as Google Speech
Recognition, which is the one we'll be using.

Before we get started, let's install the required libraries:

$ pip install SpeechRecognition pydub

Open up a new file named speechrecognition.py, and add the following:

# importing libraries
import speech_recognition as sr
import os
from pydub import AudioSegment
from pydub.silence import split_on_silence

# create a speech recognition object
r = sr.Recognizer()

The function below loads the audio file, performs speech recognition, and
returns the text:

# a function to recognize speech in the audio file
# so that we don't repeat ourselves in other functions
def transcribe_audio(path):
    # use the audio file as the audio source
    with sr.AudioFile(path) as source:
        audio_listened = r.record(source)
        # try converting it to text
        text = r.recognize_google(audio_listened)
    return text

Next, we write a function that splits the audio file into chunks on silence:

# a function that splits the audio file into chunks on silence
# and applies speech recognition
def get_large_audio_transcription_on_silence(path):
    """
    Split the large audio file into chunks
    and apply speech recognition to each of these chunks
    """
    # open the audio file using pydub
    sound = AudioSegment.from_file(path)
    # split the audio where silence lasts 500 milliseconds or more and get chunks
    chunks = split_on_silence(sound,
        # experiment with this value for your target audio file
        min_silence_len=500,
        # adjust this per requirement
        silence_thresh=sound.dBFS - 14,
        # keep 500 milliseconds of silence at the start and end of each chunk,
        # adjustable as well
        keep_silence=500,
    )
    folder_name = "audio-chunks"
    # create a directory to store the audio chunks
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)
    whole_text = ""
    # process each chunk
    for i, audio_chunk in enumerate(chunks, start=1):
        # export the audio chunk and save it in
        # the `folder_name` directory
        chunk_filename = os.path.join(folder_name, f"chunk{i}.wav")
        audio_chunk.export(chunk_filename, format="wav")
        # recognize the chunk
        with sr.AudioFile(chunk_filename) as source:
            audio_listened = r.record(source)
            # try converting it to text
            try:
                text = r.recognize_google(audio_listened)
            except sr.UnknownValueError as e:
                print("Error:", str(e))
            else:
                text = f"{text.capitalize()}. "
                print(chunk_filename, ":", text)
                whole_text += text
    # return the text for all chunks detected
    return whole_text


print(get_large_audio_transcription_on_silence("7601-291468-0006.wav"))
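
To get a feel for how min_silence_len and silence_thresh interact, here is a toy sketch in plain Python. It is not pydub's implementation: it assumes the audio has already been reduced to one loudness value (in dBFS) per millisecond, and the function name split_on_quiet is hypothetical. A run of quiet values only splits the audio when it is at least min_silence_len long.

```python
def split_on_quiet(levels_db, silence_thresh, min_silence_len):
    """Return (start, end) index pairs of loud segments, end exclusive.

    levels_db: one loudness value per millisecond (assumed input format).
    A segment ends only after at least `min_silence_len` consecutive
    values fall below `silence_thresh`.
    """
    segments = []
    seg_start = None   # index where the current loud segment began
    quiet_run = 0      # length of the current run of quiet values
    for i, level in enumerate(levels_db):
        if level >= silence_thresh:
            if seg_start is None:
                seg_start = i
            quiet_run = 0
        else:
            quiet_run += 1
            if seg_start is not None and quiet_run >= min_silence_len:
                # the segment ended where this quiet run started
                segments.append((seg_start, i - quiet_run + 1))
                seg_start = None
    if seg_start is not None:
        # close the last segment, trimming any short trailing quiet run
        segments.append((seg_start, len(levels_db) - quiet_run))
    return segments


# 3 ms loud, 4 ms quiet, 2 ms loud, 1 ms quiet, 2 ms loud
levels = [-10] * 3 + [-50] * 4 + [-12] * 2 + [-50] * 1 + [-11] * 2
segments = split_on_quiet(levels, silence_thresh=-30, min_silence_len=3)
print(segments)
```

The 4 ms gap is long enough to split the audio, while the 1 ms gap is not, so the last two loud stretches stay in one segment. Raising min_silence_len yields fewer, longer chunks; raising silence_thresh treats more of the audio as silence.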

Implementing Speech Recognition with Python

A basic implementation using the SpeechRecognition library involves several steps:

Audio Capture: Capturing audio from the microphone using PyAudio.

Audio Processing: Converting the audio signal into data that the SpeechRecognition library can work
with.

Recognition: Calling the recognize_google() method (or another available recognition method) on
the SpeechRecognition library to convert the audio data into text.

Pro_2

import speech_recognition as sr

# Initialize the recognizer (the object that recognizes the speech)
r = sr.Recognizer()

# Read the microphone as the source, listen to the speech,
# and store it in the audio_text variable
with sr.Microphone() as source:
    print("Talk")
    audio_text = r.listen(source)
    print("Time over, thanks")

# recognize_google() raises sr.RequestError if the API is unreachable
# and sr.UnknownValueError if the speech cannot be understood,
# hence the exception handling
try:
    # using Google Speech Recognition
    print("Text: " + r.recognize_google(audio_text))
except sr.UnknownValueError:
    print("Sorry, I did not get that")
except sr.RequestError as e:
    print("Could not reach the recognition service:", e)

Speech Recognition in Python using Google Speech API

Install the SpeechRecognition library:

sudo pip install SpeechRecognition

PyAudio: Linux users can install it with the following command:

sudo apt-get install python-pyaudio python3-pyaudio

If the versions in the repositories are too old, install PyAudio using the
following commands:

sudo apt-get install portaudio19-dev python-all-dev python3-all-dev && sudo pip install pyaudio

Where apt-get is not available, install PyAudio with pip:

pip install pyaudio

Next, identify the microphone to use. Its entry in the list of audio devices
looks like this:

USB Device 0x46d:0x825: Audio (hw:1, 0)

Make a note of this as it will be used in the program.
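
One way to see which microphones are available is to ask the library for their names. The sketch below assumes SpeechRecognition and PyAudio are installed. find_device_index is a hypothetical helper, not part of the library: it returns the position of the first name that contains a keyword, and that position is the value the library accepts as device_index.

```python
def list_microphones():
    """Return the microphone names known to SpeechRecognition, in index order."""
    import speech_recognition as sr  # requires SpeechRecognition and PyAudio
    return sr.Microphone.list_microphone_names()


def find_device_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`
    (case-insensitive), or None if no device matches."""
    for index, name in enumerate(names):
        if name is not None and keyword.lower() in name.lower():
            return index
    return None
```

With a microphone attached, find_device_index(list_microphones(), "USB Device") would give the index to note down. The device names differ from machine to machine, so the keyword has to be adapted.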

Set Chunk Size: This involves specifying how much audio data we want to read at once. In the
SpeechRecognition library the value is counted in samples, and it is typically a power of 2 such as
1024 or 2048.

Set Sampling Rate: The sampling rate defines how many times per second the audio signal is sampled
for processing.

Set Device ID to the selected microphone: In this step, we specify the device ID of the microphone
that we wish to use, to avoid ambiguity in case there are multiple microphones. This also helps with
debugging: while running the program, we will know whether the specified microphone is being
recognized. In the program, we specify a parameter device_id. The program reports that device_id
could not be found if the microphone is not recognized.

Allow Adjusting for Ambient Noise: Since the surrounding noise varies, we must allow the program a
second or two to adjust the energy threshold of recording so that it matches the external noise
level.

Speech to text translation: This is done with the help of Google Speech Recognition, which requires
an active internet connection to work. There are offline recognition systems such as PocketSphinx,
but they have a demanding installation process that requires several dependencies. Google Speech
Recognition is one of the easiest to use.
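
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the original program: the values of SAMPLE_RATE, CHUNK_SIZE and DEVICE_ID are placeholders, the helper names are hypothetical, and the device index must be replaced with the one noted for your microphone. The library is imported inside the function so that chunk_duration_ms can be tried without a microphone.

```python
SAMPLE_RATE = 48000  # samples per second (assumed; use a rate your device supports)
CHUNK_SIZE = 2048    # samples read per buffer, a power of 2
DEVICE_ID = 1        # hypothetical index of the microphone noted earlier


def chunk_duration_ms(chunk_size, sample_rate):
    """Return how many milliseconds of audio one chunk holds."""
    return 1000.0 * chunk_size / sample_rate


def listen_and_transcribe(device_id=DEVICE_ID):
    """Record one phrase from the chosen microphone and return its text,
    or None if recognition fails."""
    import speech_recognition as sr  # requires SpeechRecognition and PyAudio

    r = sr.Recognizer()
    with sr.Microphone(device_index=device_id,
                       sample_rate=SAMPLE_RATE,
                       chunk_size=CHUNK_SIZE) as source:
        # give the recognizer one second to calibrate its energy threshold
        r.adjust_for_ambient_noise(source, duration=1)
        print("Say something")
        audio = r.listen(source)
    try:
        # needs an active internet connection
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand the audio")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition:", e)
    return None
```

Calling listen_and_transcribe() records a single phrase and returns the recognized text. The helper chunk_duration_ms(CHUNK_SIZE, SAMPLE_RATE) shows the trade-off between the two settings: a larger chunk or a lower sampling rate means each read covers a longer stretch of audio.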

SPEECH HINDI
pip install SpeechRecognition
pip install PyAudio
pip install pipwin
pipwin install pyaudio

WAP Speech Hindi


# import required module
import speech_recognition as sr


# explicit function to take input commands
# and recognize them
def takeCommandHindi():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        # seconds of non-speaking audio before
        # a phrase is considered complete
        print('Listening')
        r.pause_threshold = 0.7
        audio = r.listen(source)
        try:
            print("Recognizing")
            # recognize the command as Hindi speech
            Query = r.recognize_google(audio, language='hi-IN')
            print("the query is printed='", Query, "'")
        # handle the exception so that the assistant can
        # ask for the command to be repeated
        except Exception as e:
            print(e)
            print("Say that again sir")
            return "None"
        return Query


# Driver code: call the function
takeCommandHindi()
