0% found this document useful (0 votes)
47 views44 pages

voice sample (1)

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
47 views44 pages

voice sample (1)

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 44

VOXMATE

Report submitted to the SASTRA Deemed to be University as the


requirement for the course

BIN522: PYTHON FOR DATA SCIENCE

Submitted by

SWATHI S

Reg. No. : 126162016

November 2024

SCHOOL OF COMPUTING
THANJAVUR, TAMIL NADU, INDIA – 613 401
SCHOOL OF COMPUTING

THANJAVUR – 613 401

Bonafide Certificate

This is to certify that the report titled “Voxmate” submitted as a requirement for the

course, BIN522: PYTHON FOR DATA SCIENCE for M.Tech. Artificial Intelligence and

Data Science programme, is a bona fide record of the work done by Ms. SWATHI S (Reg.

No.126162016) during the academic year 2024-25, in the School of Computing, under my

supervision.

Signature of Project Supervisor :

Name with Affiliation : Ashok Palaniappan, Ph.D


Date : 22.11.2024

Project Viva voce held on 23.11.2024

Examiner 1 Examiner 2

ii
SCHOOL OF COMPUTING

THANJAVUR – 613 401

Declaration

I declare that the report titled “Voxmate” submitted by me/us is an original work done

by me/us under the guidance of Dr Ashok Palaniappan, Associate Professor, School of

Chemical and Biotechnology, SASTRA Deemed to be University during the seventh

semester of the academic year 2024-25, in the School of Computing. The work is original and

wherever I have used materials from other sources, I have given due credit and cited them in the text

of the report. This report has not formed the basis for the award of any degree, diploma, associate-

ship, fellowship or other similar title to any candidate of any University.

Signature of the candidate(s) :

Name of the candidate(s) : SWATHI S


Date : 23.11.2024

iii
Acknowledgements

This project was completed successfully with the kind support and help of many individuals.
We would like to extend my sincere thanks to all of them. First and foremost, we would like to
thank the Almighty God for helping us complete the project successfully.

We would like to express our sincere thanks to Dr. S. Vaidhyasubramaniam, Honorable Vice
Chancellor, and Dr. V.S. Shankar Sriraman, Dean, School of Computing, SASTRA
Deemed University, for providing such an opportunity to carry out our project which has
enriched our practical knowledge in research aspects.

We would like to thank our Project Guide, Dr. Ashok Palaniappan, Associate Professor, School
of Chemical and Biotechnology for his diligent guidance, valuable advice, and constant
encouragement throughout the project.
iv

CHAPTER TITLE PAGE


NO NO
BONA-FIDE CERTIFICATE ii

DECLARATION iii

ACKNOWLEDGEMENT iv

LIST OF FIGURES vi

ABSTRACT vii
1 INTRODUCTION 1

1
1.1 INTRODUCTION

2
1.2 BACKGROUND

2 SYSTEM ANALYSIS 4

2.1 EXISTING SYSTEM 4

2.2 PROPOSED SYSTEM 4

3 LITERATURE SURVEY 6

4 SYSTEM DESIGN 8

4.1 ER DIAGRAM 8

4.2 DATA FLOW DIAGRAM 9


4.3 ACTIVITY DIAGRAM 11
4.4 COMPONENT DIAGRAM 12
4.5 USECASE 13
5 PROPOSED METHODS 14

Table of Contents Continued - on top of next and page

5.1 SPEECH RECOGNITION AND COMMAND 14


PROCESSING

5.2 TASK AND ALARM MANAGEMENT 14


5.3 MULTIMEDIA CONTROL 14
5.4 INTERNET SPEED MEASUREMENT 15
5.5 TRANSLATION 15
5.6 APPLICATION CONTROL 15
5.7 SCREENSHOTS AND CAPTURE PHOTO 16
5.8 NOTIFICATION AND ALERT 16
5.9 SECURITY AND PRIVACY 17
5.10 SHUTDOWN AND EXIT COMMAND 17

6 RESULT AND DISCUSSION 18


6.1 EFFECTS OF SPEECH RECOGNITION AND ACCURACY 18
6.2 FFECT OF TASK AUTOMATION AND MOBILE DESIGN 18
6.3 CHALLENGES AND OBSERVATIONS 19
7 CONCLUSION AND FUTURE WORK 20
7.1 CONCLUSION 20
7.2 FUTURE WORK 21
REFERENCES 22
APPENDIX 23
List of Figures

Figure No. Title Page No.


4.1.1 ER DIAGRAM 8

4.2.1 DFD LEVEL 0 9

4.2.2 DFD LEVEL 1 9

4.2.3 DFD LEVEL 2 10

4.3.1 ACTIVITY DIAGRAM 11

4.4.1 COMPONENT DIAGRAM 12

4.5.1 USE CASE DIAGRAM 13


Abstract

Voice assistants have become a crucial component of modern computing, offering a


seamless way to interact with technology through voice commands. This project focuses on
developing a desktop-based voice assistant that automates various tasks, including setting alarms,
opening applications, checking internet speed, taking screenshots, capturing photos, translating
text, engaging in general conversations, managing YouTube playback, performing Google and
Wikipedia searches, providing news updates, calculations, temperature measurement, and shutting
down the system. Unlike commercial solutions such as Amazon Alexa and Google Assistant, this
voice assistant processes commands locally, ensuring user privacy and customization flexibility.
The assistant is built on foundational research in speech recognition and task automation,
leveraging libraries such as SpeechRecognition, Pyttsx3, and PyAutoGUI. The system employs a
modular design with three primary layers: the input layer for capturing voice commands, the
processing layer for interpreting user requests, and the output layer for executing tasks and
providing feedback. Each module, including alarm scheduling, browser automation, and media
control, was implemented to handle specific functionalities efficiently.
Designed and implemented all aspects of the system, from integrating the voice
recognition module using Google Speech-to-Text API to creating features like YouTube playback
control and system-level commands. Particular emphasis was placed on optimizing command
accuracy by preprocessing input through noise reduction and keyword filtering. Additionally,
fallback mechanisms were implemented to handle unrecognized commands gracefully, improving
the system's reliability.

viii
CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION
Artificial Intelligence when used with machines, it shows us the capability of thinking like
humans. In this, a computer system is designed in such a way that typically requires interaction
from human. Python is an emerging language so it becomes easy to write a script for Voice
Assistant in Python. The instructions for the assistant can be handled as per the requirement of
user. Speech recognition is the Alexa, Siri, etc. In Python there is an API called
SpeechRecognition which allows us to convert speech into text. It was an interesting task to make
my own assistant. It became easier to send emails without typing any word, Searching on Google
without opening the browser, and performing many other daily tasks like playing music, opening
your favorite IDE with the help of a single voice command. In the current scenario, advancement
in technologies are such that they can perform any task with same effectiveness or can say more
effectively than us. By making this project, realized that the concept of AI in every field is
decreasing human effort and saving time. As the voice assistant is using Artificial Intelligence
hence the result that it is providing are highly accurate and efficient. The assistant can help to
reduce human effort and consumes time while performing any task, they removed the concept of
typing completely and behave as another individual to whom we are talking and asking to perform
task. The assistant is no less than a human assistant but we can say that this is more effective and
efficient to perform any task. The libraries and packages used to make this assistant focuses on the
time complexities and reduces time. The functionalities include, It can send emails, It can read
PDF, It can send text on WhatsApp, It can open command prompt, your favorite IDE, notepad
etc., It can play music, It can play Video, It can do Wikipedia searches for you, It can open
websites like Google, YouTube, etc., in a web browser, It can give weather forecast, It can give
desktop reminders of your choice. It can have some basic conversation. Tools and technologies
used are PyCharm IDE for making this project, and created all py files in PyCharm. Along with
this, following modules and libraries in our project. pyttsx3, SpeechRecognition, Datetime,
Wikipedia, Smtplib, twilio, pyjokes, pyPDF2, pyautogui, pyQt5 etc..
These days, Voice Assistants are all the rage. Amazon, Apple, and Google all have thrown
seperate versions into the ring to duke it out in the Smart Home space.

1
Like all technology, it’s easy to make Voice Assistants sound complicated. But when you spend a
few minutes boiling it down, it’s not so complicated at all. In my years as an engineer at both
Amazon and Apple, I’ve found that Voice Assistants aren’t really that complicated. Having an
understanding of what Voice Assistants are and are not will go a long way, as well as learning the
overall advantages of using this technology.

1.2 BACKGROUND
SIRI from Apple
SIRI is personal assistant software that interfaces with the user thru voice interface, recognizes
commands and acts on them. It learns to adapt to user’s speech and thus improves voice
recognition over time. It also tries to converse with the user when it does not identify the user
request. It integrates with calendar, contacts and music library applications on the device and also
integrates with GPS and camera on the device. It uses location, temporal, social and task based
contexts, to personalize the agent behavior specifically to the user at a given point of time.
Supported Tasks

• Call someone from my contacts list


• Launch an application on my iPhone
• Send a text message to someone
• Set up a meeting on my calendar for 9am tomorrow
• Set an alarm for 5am tomorrow morning
• Play a specific song in my iTunes library
• Enter a new note
Drawback
SIRI does not maintain a knowledge database of its own and its understanding comes from the
information captured in domain models and data models.

2
ReQall
ReQall is personal assistant software that runs on smartphones running Apple iOS or Google
Android operating system. It helps user to recall notes as well as tasks within a location and time
context. It records user inputs and converts them into commands, and monitors current stack of
user tasks to proactively suggest actions while considering any changes in the environment. It also
presents information based on the context of the user, as well as filter information to the user
based on its learned understanding of the priority of that information.

Supported Tasks
• Reminders
• Email
• Calendar, Google Calendar
• Outlook
• Evernote
• Facebook, LinkedIn
• News Feeds

Drawback
Will take some time to put all of the to-do items in – you could spend more time putting the
entries in than actually doing the revision.

3
CHAPTER -2
SYSTEM ANALYSIS

2.1 EXISTING SYSTEM


We are familiar with many existing voice assistants like Alexa, Siri, Google
Assistant, Cortana which uses concept of language processing, and voice recognition. They
listens the command given by the user as per their requirements and performs that specific
function in a very efficient and effective manner. As these voice assistants are using Artificial
Intelligence hence the result that they are providing are highly accurate and efficient. These
assistants can help to reduce human effort and consumes time while performing any task, they
removed the concept of typing completely and behave as another individual to whom we are
talking and asking to perform task. These assistants are no less than a human assistant but we
can say that they are more effective and efficient to perform any task. The algorithm used to make
these assistant focuses on the time complexities and reduces time. But for using these assistants
one should have an account (like Google account for Google assistant, Microsoft account for
Cortana) and can use it with internet connection only because these assistants are going to work
with internet connectivity. They are integrated with many devices like, phones, laptops, and
speakers etc.

2.2 PROPOSED SYSTEM

It was an interesting task to make voice assistant. It became easier to send emails without
typing any word, Searching on Google without opening the browser, and performing many
other daily tasks like playing music, opening your favorite IDE with the help of a single
voice command.
VOXMATE is different from other traditional voice assistants in terms that it is specific to
desktop and user does not need to make account to use this, it does not require any internet
connection while getting the instructions to perform any specific task.

4
The IDE used in this project is PyCharm. All the python files were created in PyCharm and all
the necessary packages were easily installable in this IDE. For this project following modules and
libraries were used i.e. pyttsx3, SpeechRecognition, Datetime, Wikipedia, Smtplib, twilio,
pyjokes, pyPDF2, pyautogui, pyQt etc

5
CHAPTER-3
LITERATURE SURVEY

[1] P. Kunekar, A. Deshmukh, S. Gajalwad, A. Bichare, K. Gunjal and S. Hingade, "AI-


based Desktop Voice Assistant," 2023 5th Biennial International Conference on Nascent
Technologies in Engineering (ICNTE), Navi Mumbai, India, 2023
Abstract: Artificial Intelligence (AI) has made significant strides, with Natural Language
Processing (NLP) being a key application enabling voice assistants to interact with users in natural
language. Voice assistants, powered by cloud computing, have gained widespread adoption in
households and are increasingly being used in educational institutions. Initially introduced on
smartphones, these assistants have expanded to laptops, smart speakers, and home automation
systems. Their ability to interface with humans in an intuitive, natural language has revolutionized
daily life, making them a groundbreaking innovation in AI.

[2] M. Subi, M. Rajeswari, J. J. Rajan and S. Sri Harshini, "AI-Based Desktop VIZ: A
Voice-Activated Personal Assistant-Futuristic and Sustainable Technology," 2024 10th
International Conference on Communication and Signal Processing (ICCSP),
Melmaruvathur, India, 2024,
Abstract: Modern technology integrates advanced techniques like speech recognition, machine
learning, artificial intelligence, and OpenAI, exemplified by voice-activated personal assistants.
These systems leverage OpenAI to enhance functionality and deliver precise, comprehensive user-
requested data. This AI-powered desktop uses the Python AutoGUI package to automate mouse
control and interact with program windows, improving user interface interaction. Unlike previous
models with basic interfaces, this system combines OpenAI, GUI automation, and speech
recognition for seamless and efficient performance. It serves as a prime example of advanced
technology integration.

[3] J. Vijaya, C. Swati and S. Satya, "Ikigai: Artificial Intelligence-Based Virtual Desktop
Assistant," 2024 IEEE International Conference on Interdisciplinary Approaches in
Technology and Management for Social Innovation (IATMSI), Gwalior, India, 2024.

6
Abstract: The AI Desktop Assistant project aims to create a virtual assistant inspired by cinematic
systems, enabling natural voice interactions for tasks like email, scheduling, and file organization.
To address limitations such as reliance on pre-defined commands, accent recognition, and privacy
concerns, the project will enhance NLP using models like BERT or GPT-3.5, improve adaptive
voice recognition, and prioritize user-centric design. Data from open-source platforms like Kaggle
will refine language understanding, while features like NASA Navigator provide space-related
news. Success will be measured by user satisfaction, task efficiency, and continuous feedback for
improvement.
[4] S. Kumar, S. Patel, Sonam and V. Srivastav, "Voice-Based Virtual Assistant for
Windows (Ziva - AI Companion)," 2024 IEEE International Conference on Computing,
Power and Communication Technologies (IC2PCT), Greater Noida, India, 2024
Abstract: ZIVA is a Python-based desktop assistant designed to execute voice commands,
eliminating the need for manual typing. Inspired by Siri and Cortana, ZIVA uses Natural
Language Processing (NLP) and intelligent voice recognition to interpret user input and perform
tasks like web searches, opening websites, playing music, telling time, and system shutdowns. It
stores and matches voice commands with predefined actions, streamlining everyday operations.
Machine learning techniques analyze user commands to deliver optimal responses, making ZIVA
a versatile tool for enhancing productivity and interaction with local machines.

[5] L. R. Sirisha Munduri and M. Venugopalan, "Leap Motion Based AI Mouse With
Virtual Voice Assistant Support," 2023 3rd International Conference on Mobile Networks
and Wireless Communications (ICMNWC), Tumkur, India, 2023
Abstract: This paper proposes using Leap Motion technology to control computer systems
through hand gestures, offering contactless operation for tasks like presentations and assisting
people with repetitive strain injuries. By integrating a laptop voice assistant, users can launch or
stop the AI-powered virtual mouse with voice commands. The system uses a desktop camera for
mouse functions like clicking and scrolling. Experimental results show an accuracy of 94.6% for
various gestures and multi-handed use, outperforming other state-of-the-art methods, which
achieved 78% accuracy. This approach eliminates the need for additional hardware while
enhancing device control..

7
CHAPTER-4
SYSTEM DESIGN

4.1 ER DIAGRAM

Single user can ask multiple questions. Each question will be given ID to get
recognized along with the query and its corresponding answer. User can also be
having n number of tasks. These should have their own unique id and status i.e. their
current state. A task should also have a priority value and its category whether it is a
parent task or child task of an older task.

4.1.1 ER DIAGRAM

8
4.2 DATA FLOW DIAGRAM

4.2.1 DFD Level 0

4.2.2 DFD Level 1

9
4.2.2 DFD Level 2

10
4.3 ACTIVITY DIAGRAM

4.3.1 ACTIVITY DIAGRAM

Initially, the system is in idle mode. As it receives any wake up call it begins
execution. The received command is identified whether it is a questionnaire or a
task to be performed. Specific action is taken accordingly. After the Question is
being answered or the task is being performed, the system waits for another
command. This loop continues unless it receives quit command. At that moment, it
goes back to sleep.

11
4.4 COMPONENT DIAGRAM

4.4.1 COMPONENT DIAGRAM

The main component here is the Virtual Assistant. It provides two specific service,
executing Task or Answering your question.

12
4.5 USE CASE DIAGRAM

In this project there is only one user. The user queries command to the system.
System then interprets it and fetches answer. The response is sent back to the user.

4.5.1 USE CASE DIAGRAM

13
CHAPTER-5

PROPOSED METHODS

5.1. Speech Recognition and Command Processing

Method for Voice Command Input:


The assistant uses the speech_recognition library to capture and recognize user input. The
Google Web Speech API is employed to convert speech to text. The method listens for a
wake-up phrase like "wake up" and waits for the user to issue commands.
Proposed Method:

o takeCommand() method listens to the microphone and processes commands.

o The system uses r.recognize_google() to interpret commands.

o Enhanced NLP can be integrated to process more complex queries using libraries
like spaCy or transformer models.

5.2. Task and Alarm Management

Method for Alarm and Task Scheduling:


The assistant can schedule alarms and set reminders. A task scheduling method writes
user-defined alarms into a text file and triggers an external script (alarm.py) to alert the
user. The assistant can also handle recurring tasks using the schedule library, which can
set specific times for tasks like opening applications or taking a screenshot.
Proposed Method:

o File I/O: Use Python’s open() to store tasks and alarm times in text files.

o The system can handle alarms by storing and processing time-based inputs.

o For recurring tasks, use the schedule library to trigger actions at set times

5.3. Multimedia Control (YouTube, Music, and Volume Control)

Method for Media Control:


The assistant interacts with media applications (like YouTube and local players) using
keyboard simulation (via pyautogui), which simulates keypresses to control playback
(pause, play, mute) and adjust volume. The assistant can also open a browser and directly
navigate to video links.
14
Proposed Method:

o PyAutoGUI will be used to simulate keypresses for media controls.

o Selenium can be integrated for web-based media control (YouTube).

o Volume control can be handled through keyboard libraries to adjust system


volume.

5.4. Internet Speed Measurement

Method for Checking Internet Speed:


The assistant uses the speedtest-cli library to measure network upload and download
speeds. This will be triggered by user requests like "What is my internet speed?" and
provide feedback with the current network status.
Proposed Method:

o Use speedtest-cli to measure real-time internet speeds.

o Display results via voice feedback and console output using speak() method.

o Integrate additional processing for detailed analytics

5.5. Translation

Method for Language Translation:


The assistant can translate text or speech using Google Translate API. The system will
allow the user to specify which language they want the text translated into. This can be
done by capturing the user’s command and parsing the request for language translation.
Proposed Method:

o Integrate Google Translate API for multilingual translation tasks.

o Use speech recognition to listen for phrases like "Translate 'hello' to Spanish".

o The result will be spoken back to the user using the pyttsx3 library.

5.6. Application Control (Opening, Closing Apps)

15
Method for Application Control:
The assistant can launch or close applications using system calls with pyautogui and os
modules. The assistant listens for commands like "Open Chrome" or "Close Firefox" and
triggers system-level events to open or close the applications.

Proposed Method:

o Use pyautogui and subprocess to interact with the operating system.

o Implement keyword parsing to detect application names and send commands to the
system.

5.7. Screenshot and Camera Capture

Method for Capturing Screenshots and Photos:


The assistant can take screenshots using the pyautogui.screenshot() function and can
capture photos by controlling the system’s camera app using pyautogui to simulate
keypresses and open the camera application.
Proposed Method:

o Use PyAutoGUI for screenshots and Selenium/pyautogui for webcam interaction.

o The assistant can also trigger the camera app on the system, instructing the user to
"smile" before taking the photo.

5.8. Notification and Alerts

Method for Notifications:


The assistant will use the plyer.notification module to send desktop notifications. These
notifications can be triggered for reminders, task schedules, or specific commands like
showing the user’s to-do list.
Proposed Method:

o Use plyer.notification to create customizable notifications that pop up on the


desktop.

o Notifications can be triggered by certain keywords like "show my schedule" or


after certain tasks are completed

16
5.9. Security and Privacy

Method for Security and Password Handling:


A password authentication system is implemented, where the assistant reads the password
from a text file and compares it with user input. If the user enters the correct password, the
assistant becomes active. This ensures secure access to the assistant’s functionalities.
Proposed Method:

o Store passwords securely in a text file (consider encrypting this for added security).

o Implement retry attempts for security, exiting the program after multiple failed
attempts.

5.10. Shutdown and Exit Command

Method for Shutting Down the System:


The assistant can shut down the system when a command like "shutdown the system" is
issued. The assistant will confirm the shutdown request with the user, ensuring the action
is intentional, then initiate the shutdown via a system command (os.system("shutdown /s /t
1")).
Proposed Method:

o Use os.system() to execute shutdown commands.

o Implement a confirmation system to prevent accidental shutdowns.

17
CHAPTER-6

RESULTS AND DISCUSSION

6.1. Effect of Speech Recognition Accuracy

The speech recognition module, implemented using the SpeechRecognition library and Google’s
Speech-to-Text API, demonstrated the following effects:

• Recognition Performance:

o The module achieved over 90% accuracy in quiet environments and moderately
noisy settings.

o Accuracy dropped to approximately 75% in high-noise environments, indicating


the need for enhanced noise filtering techniques.

• Impact on Task Execution:

o The assistant performed well with predefined commands like "open Chrome" or
"check internet speed."

o However, complex or ambiguous commands (e.g., "Schedule something important


for later") occasionally led to errors in intent recognition.

• Discussion:
To improve accuracy further, integrating advanced NLP models like BERT or Whisper AI
could help in understanding context and handling accents or variations in speech patterns.

6.2. Effect of Task Automation and Modular Design

The modular approach to handling tasks (e.g., alarms, media control, and browser automation)
yielded the following results:

• System Performance:

o Task execution was near-instantaneous for simple commands like opening


applications or setting alarms, with an average response time of 0.5–1 second.

18
o More complex operations, such as browser automation using Selenium, took
slightly longer (about 2–3 seconds) due to browser load times.

• User Satisfaction:

o Users appreciated the modular design, which allowed seamless integration of new
features without disrupting existing functionalities.

o However, commands that involved multi-step interactions, such as "Search Google


for nearby cafes and open the top link," revealed slight delays in coordinating
between modules.

• Discussion:
Enhancements like parallel task execution and asynchronous processing could reduce
delays. Additionally, refining browser automation scripts for optimized performance is
crucial.

6.3. Challenges and Observations

While the assistant performed effectively overall, the following challenges were noted:

• Noise Interference in Speech Recognition:

o Background noise caused occasional recognition failures, necessitating user


repetition.

o Implementing noise-canceling algorithms could mitigate this issue.

• User Privacy and Local Processing:

o The local processing approach successfully maintained user privacy but limited the
assistant’s ability to leverage cloud-based computational resources for advanced
tasks.

• Effectiveness of Media Control and Internet Speed Test:

o Features like media control and internet speed testing were executed with 100%
success, but user feedback indicated that adding context-aware responses (e.g.,
recommending improvements for slow internet) would enhance usability.

19
CHAPTER-7

CONCLUSION AND FUTURE WORK

7.1 CONCLUSION

The development of this desktop voice assistant demonstrates the power of modern technologies
in creating intelligent, interactive systems that can simplify daily tasks. Through the integration of
multiple modules such as speech recognition, task automation, media control, multilingual
support, and system commands, the assistant has proven capable of performing a wide array of
functions including opening applications, scheduling alarms, checking internet speed, controlling
multimedia, and more.

Key highlights of the system include:

• Speech-to-Text Conversion: The integration of Google Speech API and


SpeechRecognition enables accurate voice recognition even in noisy environments

• Privacy-Focused Design: By processing all data locally, the assistant ensures user data is
kept private and secure, which is in line with recent research advocating for privacy-
preserving technologies.

• Modular and Scalable Architecture: The assistant is designed with a modular approach,
allowing for easy addition of new features without affecting the existing functionality. This
is in line with best practices for scalable system design.

• Multilingual Capabilities: The system can handle basic multilingual translation tasks,
enhancing its accessibility to a broader audience

Through the use of popular libraries such as PyAutoGUI, Selenium, and Pyttsx3, the
assistant is able to perform complex tasks such as controlling YouTube, taking screenshots,
and managing system applications.

20
7.2 Future Work
While the system has achieved the core functionality intended, there are several areas for
improvement and potential extensions for future work:
1. Enhanced Natural Language Understanding (NLU):
One of the primary areas of improvement lies in natural language understanding. The
current implementation processes commands based on pre-defined keywords. Future work
could integrate more advanced machine learning models (e.g., BERT, GPT) to allow the
assistant to better handle more complex, context-sensitive interactions. This would
enhance its ability to understand nuanced commands and improve user experience.
2. Contextual Awareness and Memory:
The assistant could benefit from contextual awareness—the ability to retain and recall
past conversations or actions. This can be achieved by integrating a knowledge base or
state-tracking mechanism, allowing the assistant to handle multi-turn conversations more
fluidly.
3. Cross-Platform Compatibility:
Currently, the assistant is tailored for a desktop environment. Future versions could be
made cross-platform (Windows, macOS, Linux) to broaden its accessibility. Additionally,
developing mobile versions for iOS and Android would further extend the assistant's reach.
4. Cloud Integration for Advanced Features:
While local processing ensures privacy, some advanced features such as real-time news
updates, weather, and complex calculations could benefit from cloud-based services. This
would allow the assistant to process heavy tasks that require extensive computational
resources without overloading the local machine

21
REFERENCES

[1] Roro Ayu Fasha Dewatri et al., "Potential Tools to Support Learning: OpenAI and Elevenlabs
Integration", Southeast Asian Journal on Open and Distance Learning, vol. 1, no. 02, 2023.

[2] Rui Yang et al., "Large language models in health care: Development applications and
challenges", Health Care Science, vol. 2, no. 2023, pp. 255-263.

[3] Ms G. Pydi Prasanna Kumari and Mrs P. Pavithra, "ChatGPT Integrated With Voice
Assistant", Journal of Engineering Sciences, vol. 15, no. 02, 2024.

[4] Rathore Bharati, "Future of AI & generation alpha: ChatGPT beyond boundaries", Eduzone:
International Peer Reviewed/Refereed Multidisciplinary Journal, vol. 12, no. 2023, pp. 63-68.

[5] Shashi Kant Singh, Shubham Kumar, Pawan Singh and Mehra, "Chat GPT & Google Bard
AI: A Review", 2023 International Conference on IoT Communication and Automation
Technology (ICICAT), 2023.

[6] Berşe Soner et al., "The role and potential contributions of the artificial intelligence language
model ChatGPT", Annals of Biomedical Engineering, vol. 52, no. 2024, pp. 130-133.

[7] Sai H. Vemprala et al., "Chatgpt for robotics: Design principles and model abilities", IEEE
Access, 2024.

[8] Naoki Wake et al., "Chatgpt empowered long-step robot control in various environments: A
case application", IEEE Access, 2023.

[9] N. Wake, A. Kanehira, K. Sasabuchi, J. Takamatsu and K. Ikeuchi, "Chatgpt empowered


long-step robot control in various environments: A case application", 2023.

[10] T.B. Brown et al., "’Language models are few-shot learners", Proc. Adv. Neur. Inf. Process.
Sys, vol. 33, pp. 1877-1901, 2020.

22
APPENDIX- A

SAMPLE CODE

import pyttsx3
import speech_recognition
import requests
from bs4 import BeautifulSoup
import os
import datetime
import pyautogui
import random
import webbrowser
from plyer import notification
from pygame import mixer
import time
import speedtest
for i in range(3):
a = input("Enter Password to open Jarvis :- ")
pw_file = open("password.txt","r")
pw = pw_file.read()
pw_file.close()
if (a==pw):
print("WELCOME SIR ! PLZ SPEAK [WAKE UP] TO LOAD ME UP")
break
elif (i==2 and a!=pw):
exit()

elif (a!=pw):
print("Try Again")
engine = pyttsx3.init("sapi5")
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[0].id)
engine.setProperty('rate', 175)
def speak(audio):
engine.say(audio)
engine.runAndWait()
def takeCommand():
r = speech_recognition.Recognizer()
with speech_recognition.Microphone() as source:
print("Listening...")
r.pause_threshold=1
r.energy_threshold=300
audio=r.listen(source,0,4)
try:
print("understanding.....")
query= r.recognize_google(audio,language='en-in')
print(f"You said: {query}")
except Exception as e:
print("Say that again please...")
return "None"
return query
def alarm(query):
timehere = open("Alaramtext.txt","a")
timehere.write(query)
timehere.close()
os.startfile("alarm.py")
if __name__ == "__main__":
while True:
query = takeCommand().lower()
if "wake up" in query:
from GreetMe import greetMe
greetMe()
while True:
query = takeCommand().lower()
if "go to sleep" in query:
speak("OK sir,you can me call anytime")
break

elif "change password" in query:


speak("What's the new password")
new_pw = input("Enter the new password\n")
new_password = open("password.txt", "w")
new_password.write(new_pw)
new_password.close()
speak("Done sir")
speak(f"Your new password is{new_pw}")
elif "schedule my day" in query:
tasks = [] # Empty list
speak("Do you want to clear old tasks (Plz speak YES or NO)")
query = takeCommand().lower()
if "yes" in query:
file = open("tasks.txt", "w")
file.write(f"")
file.close()
no_tasks = int(input("Enter the no. of tasks :- "))
i=0
for i in range(no_tasks):
tasks.append(input("Enter the task :- "))
file = open("tasks.txt", "a")
file.write(f"{i}. {tasks[i]}\n")
file.close()
elif "no" in query:
i=0
no_tasks = int(input("Enter the no. of tasks :- "))
for i in range(no_tasks):
tasks.append(input("Enter the task :- "))
file = open("tasks.txt", "a")
file.write(f"{i}. {tasks[i]}\n")
file.close()
elif "show my schedule" in query:
file = open("tasks.txt", "r")
content = file.read()
file.close()
mixer.init()
mixer.music.load("C:/Users/swath/Downloads/Premalu Bgm.mp3")
mixer.music.play()
notification.notify(
title="My schedule :-",
message=content,
timeout=15
)
elif "open" in query: # EASY METHOD
query = query.replace("open", "")
query = query.replace("voxmate", "")
pyautogui.press("super")
time.sleep(1.5)
pyautogui.typewrite(query,interval=0.1)
pyautogui.sleep(2)
pyautogui.press("enter")
elif "internet speed" in query:
wifi = speedtest.Speedtest()
upload_net = wifi.upload() / 1048576 # Megabyte = 1024*1024 Bytes
download_net = wifi.download() / 1048576
print("Wifi Upload Speed is", upload_net)
print("Wifi download speed is ", download_net)
speak(f"Wifi download speed is {download_net}")
speak(f"Wifi Upload speed is {upload_net}")

elif "screenshot" in query:


import pyautogui # pip install pyautogui

im = pyautogui.screenshot()
im.save("ss.jpg")
elif "click my photo" in query:
pyautogui.press("super")
pyautogui.typewrite("camera",interval=0.1)
time.sleep(2)
pyautogui.press("enter")
pyautogui.sleep(4)
speak("SMILE")
pyautogui.press("enter")
elif "translate" in query:
from Translator import translategl

query = query.replace("jarvis", "")


query = query.replace("translate", "")
translategl(query)

elif "hello" in query:


speak("hello sir, how are you?")
elif "i am fine" in query:
speak("that's great sir")

elif "thank you" in query:


speak("you are welcome, sir")
elif "tired" in query:
speak("Playing your favourite songs")
a=(1,2,3)
b= random.choice(a)
if b==1:
webbrowser.open("https://youtube.com/watch?v=eVx2dKmUfMg")
elif b==2:
webbrowser.open("https://www.youtube.com/watch?v=ZWuzH0fW8l0")
elif b==3:
webbrowser.open("https://www.youtube.com/watch?v=6LD30ChPsSs")

elif "pause" in query:


pyautogui.press("k")
speak("video paused")
elif "play" in query:
pyautogui.press("k")
speak("video played")
elif "mute" in query:
pyautogui.press("m")
speak("video muted")

elif "volume up" in query:


from keyboard import volumeup
speak("Turning volume up,sir")
volumeup()
elif "volume down" in query:
from keyboard import volumedown

speak("Turning volume down, sir")


volumedown()

elif "open" in query:


from Dictapp import openappweb
openappweb(query)
elif "close" in query:
from Dictapp import closeappweb
closeappweb(query)

elif "google" in query:


from SearchNow import searchGoogle
searchGoogle(query)
elif "youtube" in query:
from SearchNow import searchYoutube
searchYoutube(query)
elif "wikipedia" in query:
from SearchNow import searchWikipedia
searchWikipedia(query)
elif "news" in query:
from NewsRead import latestnews
latestnews()
elif "calculate" in query:
from Calculatenumbers import WolfRamAlpha
from Calculatenumbers import Calc
query = query.replace("calculate", "")
query = query.replace("voxmate", "")
Calc(query)

elif "temperature" in query:


search ="temperature in"
url= f"https://www.google.com/search?q={search}"
r=requests.get(url)
data=BeautifulSoup(r.text,"html.parser")
temp = data.find("div", class_ ="BNeawe").text
speak(f"current temperature is {temp}")
print(temp)

elif "set an alarm" in query:


print("input time example:- 10 and 10 and 10")
speak("set the time")
a=input("please tell the time")
alarm(a)
speak("Done")
elif "the time" in query:
strTime= datetime.datetime.now().strftime("%I:%M %p")
speak(f"Sir, the time is {strTime}")
elif "finally sleep" in query:
speak("going to sleep,sir")
exit()
elif "remember that" in query:
rememberMessage = query.replace("remember that", "")
rememberMessage = query.replace("jarvis", "")
speak("You told me to remember that" + rememberMessage)
remember = open("Remember.txt", "a")
remember.write(rememberMessage)
remember.close()
elif "what do you remember" in query:
remember = open("Remember.txt", "r")
speak("You told me to remember that" + remember.read())
elif "shutdown the system" in query:
speak("Are You sure you want to shutdown")
shutdown = input("Do you wish to shutdown your computer? (yes/no)")
if shutdown == "yes":
os.system("shutdown /s /t 1")

elif shutdown == "no":


break
OUTPUT

You might also like