
Voice Assistant using Artificial Intelligence in Desktop

Abhay Shukla
Student, Department of Computer Science and Engineering, B.N. College Of Engineering And Technology,
Dr. A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India
[email protected]

Mr. Vinay Kumar
Head of Department, Department of Computer Science and Engineering, B.N. College Of Engineering And Technology,
Dr. A. P. J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India

Abstract: Voice assistants are software agents that can interpret human speech and respond via synthesized voices. Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are the most popular voice assistants and are embedded in smartphones or dedicated home speakers. Users can ask their assistants questions, control home automation devices and media playback via voice, and manage other basic tasks such as email, to-do lists, and calendars with verbal commands. This paper explores the basic workings and common features of today's voice assistants.

Keywords: Voice Assistants; chat bots; Static voice assistant; Speech Recognition; Text-to-Speech; OpenAI.

I. INTRODUCTION
A voice assistant is a digital assistant that uses human voice, language processing algorithms, and speech synthesis to listen for specific voice commands and return relevant information or perform particular functions as requested by the user. Based on commands, commonly known as intents, spoken by the user, voice assistants return relevant information by listening for specific keywords and filtering out ambient noise. While voice assistants may be entirely software based and able to integrate into all devices, some assistants are designed specifically for a single device, like the Amazon Alexa clock. Nowadays, voice assistants are integrated into many of the devices we use daily, such as cell phones, computers, and smart speakers. The main motive of this project is to help physically challenged persons. In this project we have developed a static voice assistant using Python which can perform operations such as copying and pasting files from one location to another, sending a message to the user's mobile, and ordering a pizza or a mobile phone using voice commands. The mass adoption of AI in users' everyday lives is additionally fueling the shift towards voice. The most popular voice assistants are Siri from Apple, Amazon Echo (which responds to the name Alexa) from Amazon, Cortana from Microsoft, Google Assistant from Google, and the recently introduced intelligent assistant named AIVA. This paper presents a brief introduction to the architecture and construction of voice assistants.

II. LITERATURE SURVEY
The field of voice-based assistants has seen major advancements and innovations. The main reason behind such rapid growth is the demand for assistants in devices like smartwatches, fitness bands, speakers, Bluetooth earphones, mobile phones, laptops, desktops, televisions, etc. Most smart devices brought to market today have built-in voice assistants. The amount of data generated nowadays is huge, and in order to make our assistant good enough to handle these enormous amounts of data and give better results, we should incorporate machine learning into our assistants and train our devices according to their use. Along with machine learning, other equally important technologies are IoT, NLP, and big data access management. The use of voice assistants can ease a lot of tasks for us: the user simply gives a voice command to the system, and the assistant completes all the tasks, starting from converting the speech command to a text command, then extracting the keywords from the command and executing queries based on those keywords.
In the paper "Speech Recognition with Flat Direct Models" by Patrick Nguyen et al., a novel direct modelling approach for speech recognition is put forward which eases the measurement of consistency in the spoken sentences. They have termed this approach the Flat Direct Model (FDM). They did not follow the conventional Markov model, and their model is not sequential. Using their approach, a key problem of defining features has been solved. Moreover, the template-based features improved the sentence error rate by 3% absolute over the baseline [2].
Again, in the paper "On the track of Artificial Intelligence: Learning with Intelligent Personal Assistant" by Nil Goksel et al., the potential use of intelligent personal assistants (IPAs), which rely on advanced computing technologies and Natural Language Processing (NLP), for learning is examined. Essentially, they have reviewed the working of IPAs within the scope of AI [4].


The application of voice assistants has been taken to a higher level in the paper "Smart Home Using Internet of Things" by Keerthana S et al., where they have discussed how smart assistants can lead to a smart home system built using Wireless Fidelity (Wi-Fi) and the Internet of Things. They have used the CC3200 MCU, which has built-in Wi-Fi modules and temperature sensors. The temperature sensed by the temperature sensor is sent to the microcontroller unit (MCU) and then posted to a server, and using that data the status of electronic equipment such as fans and lights is monitored and controlled [5].
The application of voice assistants is also discussed in detail in the paper "An Intelligent Voice Assistant Using Android Platform" by Sutar Shekhar et al., where they have stressed the fact that mobile users can perform their daily tasks using voice commands instead of typing or pressing keys on their mobiles. They have also used a prediction technology that makes recommendations based on user activity [6].
The incorporation of natural language processing (NLP) into voice assistants is necessary and will lead to the creation of a trendsetting assistant. These factors have been the key focus of the paper "An Intelligent Chatbot using Natural Language Processing" by Rishabh Shah et al. They have discussed how NLP can help make assistants smart enough to understand commands in any native language, so that no part of society is prevented from enjoying its benefits [7].
We also studied the GTTS-EHU systems developed for the Query-by-Example Spoken Term Detection (QbE-STD) and Spoken Term Detection (STD) tasks of the Albayzin 2018 Search on Speech Evaluation. Stacked bottleneck features (sBNF) are used as the frame-level acoustic representation of audio documents and spoken queries. Spoken queries are synthesized, the average of their sBNF representations is taken, and the average query is then used for QbE-STD [8].
We have also seen the integration of technologies like gTTS and AIML (Artificial Intelligence Mark-up Language) in the paper "JARVIS: An interpretation of AIML with integration of gTTS and Python" by Tanvee Gawand et al., where they have adopted pyttsx, a text-to-speech conversion library in Python which, unlike alternative libraries, works offline [9].
The main focus of voice assistants should be to reduce the use of input devices, and this fact has been a key point of discussion in several of the papers surveyed above.

III. PROPOSED SYSTEM
This proposed concept is an effective way of implementing a personal voice assistant. The Speech Recognition library has many in-built functions that let the assistant understand the command given by the user, and the response is sent back to the user as voice using Text-to-Speech functions. When the assistant captures the voice command given by the user, the underlying algorithms convert the voice into text.

Proposed Architecture
The system design consists of
1. Taking the input as speech patterns through the microphone.
2. Audio data recognition and conversion into text.
3. Comparing the input with predefined commands.
4. Giving the desired output.
The initial phase includes the data being taken in as speech patterns from the microphone. In the second phase the collected data is processed and transformed into textual data using NLP. In the next step, the resulting text string is manipulated through a Python script to determine the required output. In the last phase, the produced output is presented either in the form of text or converted from text to speech using TTS.

Features
The system shall be developed to offer the following features:
1) It keeps listening continuously while idle and wakes up into action when called with a particular predetermined command.
2) Browsing the web based on the user's parameters and then issuing the desired output through audio, while at the same time printing the output on the screen.

IV. SYSTEM ARCHITECTURE

Fig 1 Block diagram of the voice assistant

Basic Workflow
The figure below shows the workflow of the main method of the voice assistant. Speech recognition is used to convert speech input to text. This text is then sent to the processor, which determines the nature of the command and calls the appropriate script for execution. But that is not the only complexity: no matter how many hours of speech input are available, another factor plays a big role in whether the recognizer picks the user up. Background noise can easily throw the speech recognizer off target, because it may be unable to distinguish the bark of a dog, or the sound of a helicopter flying overhead, from the user's voice.

Fig 2 Basic Workflow of the voice assistant
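The paper describes this capture, recognize, compare and respond cycle but does not list its code. The following is a minimal sketch of the four steps above, assuming the SpeechRecognition and pyttsx3 packages named later in the paper; the keyword handlers and the use of the free Google Web Speech recognizer are illustrative assumptions rather than the authors' exact implementation.

# Minimal sketch of the four-step pipeline described above (illustrative only):
# microphone input -> speech-to-text -> keyword comparison -> spoken and printed output.
import speech_recognition as sr
import pyttsx3

engine = pyttsx3.init()          # offline text-to-speech engine for the reply

def speak(text):
    """Print the reply and also speak it aloud."""
    print(text)
    engine.say(text)
    engine.runAndWait()

def listen():
    """Steps 1 and 2: capture audio from the microphone and convert it to text."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # reduce the effect of background noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()   # Google Web Speech backend
    except sr.UnknownValueError:
        return ""    # speech was not understood

if __name__ == "__main__":
    query = listen()
    # Step 3: compare the recognized text with predefined commands (keyword matching).
    if "time" in query:
        from datetime import datetime
        speak("The time is " + datetime.now().strftime("%H:%M"))
    elif "hello" in query:
        speak("Hello, how can I help you?")
    else:
        # Step 4: ask again when no predefined command matches.
        speak("Sorry, I did not catch that. Please repeat the command.")

Note that recognize_google requires an internet connection, whereas the pyttsx3 reply is generated offline, matching the offline TTS behaviour highlighted in the literature survey.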


Detailed Workflow
Voice assistants such as Siri, Google Voice, and Bixby are already available on our phones. According to a recent NPR study, around one in every six Americans already has a smart speaker in their home, such as the Amazon Echo or Google Home, and sales are growing at the same rate as smartphone sales a decade ago. At work, though, the voice revolution may still seem a long way off. The move toward open workspaces is one deterrent: nobody wants to be the colleague who cannot stop talking at their virtual assistant.

Fig 3 Detailed Workflow of the voice assistant

This assistant consists of three modules. First, the assistant accepts voice input from the user. Second, it analyzes the input given by the user and maps it to the respective intent and function. Third, the assistant gives the user the result along with voice output. Initially, the assistant starts accepting the user input. After receiving the input, the assistant converts the analog voice input into digital text.

Methodology
At the outset we make our program capable of using the system voice with the help of sapi5 and pyttsx3. pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3.
The Speech Application Programming Interface, or SAPI, is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. The main function is then defined, where all the capabilities of the program are specified. The proposed system is supposed to have the following functionality:
(a) The assistant asks the user for input and keeps listening for commands. The time for listening can be set according to the user's requirement.
(b) If the assistant fails to clearly grasp the command, it keeps asking the user to repeat the command again and again.
(c) The assistant can be customized to have either a male or a female voice according to the user's requirement.
(d) The current version of the assistant supports features like checking weather updates, sending and checking mails, searching Wikipedia, opening applications, checking the time, taking and showing notes, opening and closing YouTube and Google, and opening and closing applications.

Imported modules

Fig 4 Imported modules

Speech Recognition module
Since we are creating a voice assistant app, one of the most critical features is that the assistant recognizes your voice. The module can be installed from the terminal using pip.

DateTime module
The datetime package is used to show the date and time. This module comes built in with Python.

Wikipedia
Wikipedia is a great and huge source of knowledge, just like GeeksforGeeks and other sources. We have used the Wikipedia module in our project to get more information from Wikipedia or to perform a Wikipedia search. To install this module, use pip install wikipedia.

Webbrowser
Used to perform web searches. This module comes built in with Python.

OS
The OS module in Python provides functions for interacting with the operating system. It is part of Python's standard utility modules and provides a way of using operating-system-dependent functionality.

Pyaudio
PyAudio is a set of Python bindings for PortAudio, a cross-platform library that interfaces with audio drivers.

Open AI
OpenAI's TTS API is an endpoint that enables users to interact with a TTS model that converts text to natural-sounding spoken language. The model has two variations: TTS-1, optimized for real-time text-to-speech use cases, and TTS-1-HD, optimized for quality.

Python Backend
The Python backend gets the output from the speech recognition module and then identifies whether the command or the speech output is an API call, and performs context extraction. The output is then sent back through the Python backend to give the required result to the user.
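The install commands and initialisation code for these modules appear only as a screenshot (Fig 4). As a hedged illustration, the snippet below shows how the third-party modules listed above could typically be installed and imported, and how pyttsx3 might be initialised with the sapi5 driver so that either a male or a female voice can be chosen (functionality (c) in the Methodology). The voice indices and the small helper functions are assumptions, not the authors' code.

# Typical install commands for the third-party modules named above (assumed, not from the paper):
#   pip install SpeechRecognition pyttsx3 PyAudio wikipedia
# datetime, webbrowser and os ship with Python and need no installation.
import datetime
import wikipedia
import pyttsx3

# Initialise the text-to-speech engine with the Windows SAPI5 driver.
engine = pyttsx3.init("sapi5")
voices = engine.getProperty("voices")
# voices[0] is usually a male voice and voices[1] a female voice on a default
# Windows installation, but the exact order depends on the voices installed.
engine.setProperty("voice", voices[0].id)

def speak(text):
    """Speak the given text aloud using the selected system voice."""
    engine.say(text)
    engine.runAndWait()

def wish_user():
    """Greet the user according to the current time of day (uses the datetime module)."""
    hour = datetime.datetime.now().hour
    if hour < 12:
        speak("Good morning!")
    elif hour < 18:
        speak("Good afternoon!")
    else:
        speak("Good evening!")

def search_wikipedia(topic):
    """Fetch a two-sentence summary from Wikipedia and read it out."""
    summary = wikipedia.summary(topic, sentences=2)
    speak(summary)

Selecting voices[1] instead of voices[0] typically switches to the female voice, which is how the male/female customization in functionality (c) can be realised.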

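The paper names OpenAI's TTS-1 and TTS-1-HD models but does not show how they are called. A minimal sketch, assuming the official openai Python package and an OPENAI_API_KEY environment variable, might look like the following; the voice name and output file are arbitrary choices for illustration.

# Hedged sketch of calling OpenAI's text-to-speech endpoint (assumed, not from the paper).
# Requires `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

def speak_with_openai(text, out_path="reply.mp3"):
    """Convert text to natural-sounding speech with the TTS-1 model and save it as MP3."""
    response = client.audio.speech.create(
        model="tts-1",        # use "tts-1-hd" for the higher-quality variant
        voice="alloy",        # one of the preset voices
        input=text,
    )
    response.stream_to_file(out_path)   # save the generated audio to disk
    return out_path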

Text to speech module
Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS engine converts written text to a phonemic representation, then converts the phonemic representation to waveforms that can be output as sound. TTS engines with different languages, dialects and specialized vocabularies are available through third-party publishers.

Speech to text conversion
Speech recognition is used to convert speech input into textual output. It decodes the voice and converts it into a textual format which can easily be understood by the PC.

Context Extraction
Context extraction (CE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most cases, this activity concerns processing human language texts using natural language processing (NLP). Recent activities in multimedia document processing, like automatic annotation and content extraction out of images, audio and video, can also be seen as context extraction.

Textual output
The assistant decodes the voice command, performs the operation, and then shows the voice command as textual output in the terminal.

V. EXECUTION
The assistant, on starting, initially waits for input from the user. When the user gives an input command via voice, the assistant captures it and searches for the keywords present in the command. If the assistant is able to find a keyword related to the given command, it performs the task according to the input and returns the output to the user, as voice and also in textual form in the terminal window. If not, the assistant again starts waiting for the user to give valid input. Each of these functionalities has its own importance in the working of the whole system.

User Input – The assistant will wait for the user to click on the Start or Quit button.

Fig 5 Start-up GUI for the voice assistant

Wakeup command – The assistant will wait for the user to give the wakeup command using voice.

Fig 6 Wake-up command given by the user after clicking the Start button in the GUI

Opening Application – The assistant will wait for the user to give a valid voice command.

Fig 7 Assistant opening Notepad

Fig 8 Assistant opening Microsoft Office Word

Fig 9 Assistant opening Google Chrome

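Fig 5 shows the start-up GUI with Start and Quit buttons, whose code is also not included in the paper. A minimal sketch of such a window using Python's built-in tkinter module could look like this; the callback merely stands in for the assistant's listening loop.

# Minimal sketch of a Start/Quit window like the one in Fig 5 (assumed, using tkinter).
import tkinter as tk

def start_assistant():
    status.config(text="Listening...")   # in the full project this would start the voice loop

root = tk.Tk()
root.title("Voice Assistant")

status = tk.Label(root, text="Click Start to begin")
status.pack(pady=10)

tk.Button(root, text="Start", command=start_assistant).pack(side="left", padx=20, pady=10)
tk.Button(root, text="Quit", command=root.destroy).pack(side="right", padx=20, pady=10)

root.mainloop()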

Fig 10 Assistant gathering information from Google

Fig 11 Assistant ordering pizza on Domino's

Fig 12 Assistant ordering a mobile on Amazon

VI. CONCLUSION
This paper presents a comprehensive overview of the design and development of a static voice-enabled personal assistant for PC using the Python programming language. In today's lifestyle, this voice-enabled personal assistant is more effective at saving time and helping differently abled people than earlier approaches. The assistant properly performs the tasks given by the user. Furthermore, there are many things this assistant is capable of doing, like sending a message to the user's mobile, YouTube automation, and gathering information from Wikipedia and Google, with just one voice command. Through this voice assistant, we have automated various services using a single-line command. It eases most of the user's tasks, like searching the web. We aim to make this project a complete server assistant and make it smart enough to act as a replacement for general server administration. The project is built using open-source software modules with PyCharm Community support, which can accommodate updates in the near future. The modular nature of this project makes it more flexible and easy to extend with additional features without disturbing the current system functionalities.

VII. REFERENCES
[1] D. O'Shaughnessy, "Interacting with Computers by Voice: Automatic Speech Recognition and Synthesis", Proceedings of the IEEE, vol. 91, no. 9, September 2003.
[2] Patrick Nguyen, Georg Heigold, Geoffrey Zweig, "Speech Recognition with Flat Direct Models", IEEE Journal of Selected Topics in Signal Processing, 2010.
[3] David L. Poole and Alan K. Mackworth, Python code for voice assistant, Foundations of Computational Agents, 2019-2020.
[4] Nil Goksel Canbek, Mehmet Emin Mutlu, "On the track of Artificial Intelligence: Learning with Intelligent Personal Assistant", International Journal of Human Sciences, 2016.
[5] Keerthana S, Meghana H, Priyanka K, Sahana V. Rao, Ashwini B, "Smart Home Using Internet of Things", Perspectives in Communication, Embedded Systems and Signal Processing, 2017.
[6] Sutar Shekhar, P. Sameer, Kamad Neha, Prof. Devkate Laxman, "An Intelligent Voice Assistant Using Android Platform", IJARCSMS, ISSN: 2321-7782, 2017.
[7] Rishabh Shah, Siddhant Lahoti, Prof. Lavanya K., "An Intelligent Chatbot using Natural Language Processing", International Journal of Engineering Research, vol. 6, pp. 281-286, 2017.
[8] Luis Javier Rodríguez-Fuentes, Mikel Peñagarikano, Amparo Varona, Germán Bordel, "GTTS-EHU Systems for the Albayzin 2018 Search on Speech Evaluation", Proceedings of IberSPEECH, Barcelona, Spain, 2018.
[9] Ravivanshikumar Sangpal, Tanvee Gawand, Sahil Vaykar, "JARVIS: An interpretation of AIML with integration of gTTS and Python", Proceedings of the 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kanpur, 2019.
