Speech Recognition in Python using CMU Sphinx
Last Updated: 26 Apr, 2025
“Hey, Siri!”, “Okay, Google!” and “Alexa, play some music” are phrases that have become an integral part of our lives, because giving voice commands to virtual assistants makes life a lot easier. But have you ever wondered how these devices take commands by voice?
Do applications understand your voice? How does a computer decode speech when it only understands 0s and 1s?
The answer is simple: it uses speech recognition software to decode the user input received as speech through the device’s microphone. The task of this software is to convert the speech to a string (text) that the computer can then process.
One such toolkit is CMU Sphinx, an open-source toolkit for speech recognition. It includes a lightweight recognizer library called Pocketsphinx, which we will use to recognize speech. This library is especially useful when you are offline. When you have internet access, you should prefer the Google API for speech recognition because of its higher accuracy, but when you are building a project that works offline or uses speech on an offline embedded device, use Pocketsphinx.
Recognition Process
Let’s discuss how this library works behind the scenes to recognize our voice. It takes a waveform and splits it into utterances at the silences. It then tries to work out what is being said in each utterance: it takes all possible combinations of words, tries to match them against the audio, and chooses the best-matching combination.
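The silence-based splitting step can be illustrated with a toy example. The sketch below is not how Pocketsphinx implements it; the function name split_utterances, the threshold values and the sample data are all assumptions chosen for illustration. It treats audio as a plain list of amplitudes and closes an utterance after a long enough run of quiet samples.

```python
def split_utterances(samples, silence_threshold=0.1, min_silence=3):
    """Split amplitude samples into utterances separated by silence.

    A run of at least `min_silence` consecutive samples whose absolute
    value is below `silence_threshold` ends the current utterance.
    Shorter quiet runs are kept as pauses inside the utterance.
    """
    utterances = []
    current = []   # samples of the utterance being built
    pending = []   # quiet samples that may turn out to be a short pause
    for sample in samples:
        if abs(sample) >= silence_threshold:
            # Loud sample: a short pause before it belongs to the utterance.
            if current:
                current.extend(pending)
            pending = []
            current.append(sample)
        else:
            pending.append(sample)
            if current and len(pending) >= min_silence:
                # The silence is long enough: close the utterance.
                utterances.append(current)
                current = []
    if current:
        utterances.append(current)
    return utterances


signal = [0, 0, 0.5, 0.6, 0.0, 0.4, 0, 0, 0, 0, 0.9, 0.8, 0]
print(split_utterances(signal))
# prints [[0.5, 0.6, 0.0, 0.4], [0.9, 0.8]]
```

A real recognizer works on audio frames and uses more robust voice activity detection, but the idea of cutting the waveform at silences is the same.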
Installation of modules
Since pocketsphinx is an external library, i.e. it is not built into Python, we install it on our machine with pip and then use import to access its functionality.
Open your terminal before running the commands below.
NOTE: Make sure that you have the latest version of pip installed. If not, type the following:
python -m pip install --upgrade pip setuptools wheel
If you already have the latest version of pip, proceed directly and type the following command into your terminal:
pip install pocketsphinx
Now that pocketsphinx is installed on your machine, let’s move on.
Prerequisites
Two prerequisite libraries are used alongside pocketsphinx:
- SpeechRecognition – used for speech recognition, with support for several engines and APIs, online and offline.
- PyAudio – used to play and record audio in Python.
It is recommended to install these two libraries with pip. PyAudio depends on the PortAudio library, which the brew command below installs through Homebrew (typically on macOS); on other systems, install PortAudio with your own package manager if it is missing.
pip install SpeechRecognition
brew install portaudio
pip install pyaudio
All the required external libraries are now installed, so let’s move on to the code.
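Before running the examples, it can help to confirm that the three packages are importable. The following is a minimal sketch using only the standard library; the helper name missing_packages and the REQUIRED table are hypothetical. Note that the import names differ from the pip names for SpeechRecognition (speech_recognition) and PyAudio (pyaudio).

```python
import importlib.util

# Maps the pip package name to the module name used in import statements.
REQUIRED = {
    "pocketsphinx": "pocketsphinx",
    "SpeechRecognition": "speech_recognition",
    "PyAudio": "pyaudio",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


# An empty list means everything needed is installed.
print("Missing packages:", missing_packages())
```

If the list is not empty, rerun the corresponding pip install command for each name it reports.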
LiveSpeech
LiveSpeech is an iterator class available in pocketsphinx that can be used for continuous recognition or keyword search from a microphone.
Here is the code for continuous recognition.
Python3
from pocketsphinx import LiveSpeech

# Iterate over the phrases recognized from the microphone.
# The loop keeps listening until the program is interrupted.
for phrase in LiveSpeech():
    print(phrase)
We used LiveSpeech in a basic for-in loop to fetch continuous speech input from the user through the device microphone. Each recognized utterance is converted to a string, stored in phrase, and displayed.
Keyword searching
We create a variable named speech of type pocketsphinx.LiveSpeech by invoking the LiveSpeech class with two arguments: keyphrase, i.e. the keyword to be searched for, and kws_threshold, the detection threshold for that keyword. We then run a for-in loop over speech, which continuously listens for voice input. Whenever the user utters the word ‘forward’, it is printed along with its segments.
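Python3
from pocketsphinx import LiveSpeech

# Listen only for the keyphrase 'forward'; kws_threshold tunes the detection.
speech = LiveSpeech(keyphrase='forward', kws_threshold=1e-20)
for phrase in speech:
    # Print the detected segments with detailed information.
    print(phrase.segments(detailed=True))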
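Once a keyword is detected, a program usually does something in response. The sketch below shows one possible way to map recognized text to actions. The function name dispatch, the commands table and the handlers are all hypothetical and not part of pocketsphinx; the sketch only assumes that a recognized phrase has been converted to a string.

```python
def dispatch(text, commands):
    """Run the handler for the first command keyword found in `text`.

    `commands` maps a lowercase keyword to a zero-argument callable.
    Returns the handler's result, or None when no keyword matches.
    """
    words = text.lower().split()
    for keyword, handler in commands.items():
        if keyword in words:
            return handler()
    return None


# Hypothetical handlers, for illustration only.
commands = {
    "forward": lambda: "moving forward",
    "stop": lambda: "stopping",
}

print(dispatch("go forward", commands))
# prints moving forward
```

Inside a LiveSpeech loop, this could be called as dispatch(str(phrase), commands) for each recognized phrase.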
Test program
First, import speech_recognition under a short alias, here aud. You can then recognize speech from your code.
Next, create an object of type speech_recognition.Recognizer, which recognizes audio and converts it to text. Then define the microphone as the input source and use a variable, say audio, to listen: it captures the user’s speech and stores it. Finally, inside a try block, we call recognize_sphinx with audio as the argument and print what the user said. This method converts what the user said (as speech) to text and displays it in the console; this step is the recognition itself.
If the speech cannot be understood, for example because the voice is unclear, UnknownValueError is raised and we catch it. We also catch RequestError, which is raised when there is a problem with the Sphinx installation.
Python3
import speech_recognition as aud

a = aud.Recognizer()

# Record a single phrase from the default microphone.
with aud.Microphone() as source:
    print("Say something!")
    audio = a.listen(source)

try:
    # Recognize the recorded audio offline with CMU Sphinx.
    print("You said " + a.recognize_sphinx(audio))
except aud.UnknownValueError:
    print("Could not understand")
except aud.RequestError as e:
    print("Error; {0}".format(e))
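The same recognizer can also be run on a recorded file instead of the microphone, which is handy for testing without audio hardware. The following is a minimal sketch, assuming SpeechRecognition and pocketsphinx are installed; the helper name transcribe_wav and the file name sample.wav are hypothetical. It uses the AudioFile class and the record method of the SpeechRecognition package.

```python
def transcribe_wav(path):
    """Return the Sphinx transcription of a WAV file, or None if unintelligible."""
    # Imported inside the function so that it can be defined even on a
    # machine where the package is not installed yet.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return None


# Example call; "sample.wav" is a placeholder file name.
# print(transcribe_wav("sample.wav"))
```

Returning None for unintelligible audio lets the caller decide how to report the failure, while a RequestError is left to propagate because it points to a setup problem rather than to the audio.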
Conclusion
This wraps up our discussion of speech recognition using CMU Sphinx. There are many more applications of this useful library.