RAIPUR INSTITUTE OF TECHNOLOGY, RAIPUR (C.G.)
Department of Computer Science and Engineering
Chatauna, Mandir Hasaud, Raipur, (C.G.)
I, the undersigned, solemnly declare that the report of the thesis work entitled “JARVIS
DESKTOP AI ASSISTANT” is based on my own work carried out during the course of my
study under the supervision of Mr. YOGESH RATHORE, Asst. Prof., Department of
CSE, RITEE, Raipur.
I assert that the statements made and conclusions drawn are an outcome of the project
work. I further declare that, to the best of my knowledge and belief, the report does not
contain any part of any work which has been submitted for the award of any other
degree/diploma/certificate in this University/deemed University of India or any other country.
-----------------------------------
(Signature of the Candidate)
RAIPUR INSTITUTE OF TECHNOLOGY, RAIPUR (C.G.)
Department of Computer Science and Engineering
Chatauna, Mandir Hasaud, Raipur, (C.G.)
This is to certify that the report of the thesis entitled “JARVIS DESKTOP AI ASSISTANT” is
a record of research work carried out by GAURAV RAJ, bearing Roll No. 301202216011 and
Enrollment No. BA3578, under my guidance and supervision for the award of the Degree of
Bachelor of Engineering in Computer Science and Engineering of Chhattisgarh Swami
Vivekanand Technical University, Bhilai (C.G.), India.
To the best of my knowledge and belief the thesis
i) Embodies the work of the candidate herself/himself,
ii) Has duly been completed,
iii) Fulfills the requirement of the Ordinance relating to the B.E. degree of the
University and
iv) Is up to the desired standard both in respect of contents and language for being
referred to the examiners.
----------------------------------------
(Signature of the Principal)
RAIPUR INSTITUTE OF TECHNOLOGY, RAIPUR (C.G.)
Department of Computer Science and Engineering
Chatauna, Mandir Hasaud, Raipur, (C.G.)
The thesis entitled “JARVIS DESKTOP AI ASSISTANT” submitted by GAURAV RAJ
(Roll No.: 301202216011, Enrollment No.: BA3578) has been examined by the undersigned as a
part of the examination and is hereby recommended for the award of the degree of Bachelor of
Engineering in Computer Science and Engineering.
------------------------------ -------------------------------
Internal Examiner External Examiner
Date: Date:
ACKNOWLEDGEMENT
The pleasure, the achievement, the glory, the satisfaction and the reward of completing this
project cannot be thought of without the few who, apart from their regular schedule, spared
their valuable time. A number of persons contributed, directly or indirectly, in shaping and
achieving the desired outcome. I owe a debt of gratitude to Mr. AVINASH DHOLE
(HOD) for providing me with an opportunity to develop this project. Through his timely advice,
constructive criticism and supervision he was a real source of inspiration for me.
I express my sincere thanks to my guide, Mr. YOGESH RATHORE, Assistant
Professor, Department of Computer Science & Engineering, RAIPUR INSTITUTE OF
TECHNOLOGY, for his valuable guidance, suggestions and help in executing the project
work from time to time. Without his direction and motivation, it would have been nearly
impossible for me to achieve the initial level of the target planned.
I also express my unending gratitude to Mr. ABHISHEK SAW for his remarkable advice
and unending support throughout, thus leading me to my objective. I also express cordial
thanks to Dr. SANJEEV SRIVASTAVA, Principal, Raipur Institute of Technology, Raipur,
for providing the necessary infrastructural and moral support.
Last but not least, I am really thankful to my parents for always encouraging me in my
studies, and to my friends who directly or indirectly helped me in this work.
----------------------------------
(Signature of the Student)
Name: GAURAV RAJ
RAIPUR INSTITUTE OF TECHNOLOGY
Abstract
Intelligent Personal Assistants (IPAs) are implemented and used in operating systems,
the Internet of Things (IoT), and a variety of other systems. Many implementations of
IPAs exist today, and companies such as Apple, Google and Microsoft all offer their
implementations as a major feature of their operating systems and devices. With the use
of Natural Language Processing (NLP), Machine Learning (ML), Artificial Intelligence
(AI) and prediction models from these fields of Computer Science (CS), as well as
theory and techniques from Human-Computer Interaction (HCI), IPAs are becoming
more intelligent and relevant. This thesis aims to analyse and compare the current major
implementations of IPAs in order to determine which implementation is the most
developed at this moment in time and is contributing to a sustainable future for AI.
List of Tables

List of Figures

List of Abbreviations

Table of Contents
Chapter - 1
Introduction
In the modern era of fast-moving technology we can do things which we never
thought we could do before, but to achieve and accomplish these thoughts there is a
need for a platform which can automate all our tasks with ease and comfort. Thus we
need to develop a personal assistant having brilliant powers of deduction and the
ability to interact with the surroundings through one of the most natural forms of human
interaction, i.e. the HUMAN VOICE. The hardware device captures the audio request
through a microphone and processes the request so that the device can respond to the
individual using an in-built speaker module. For example, if you ask the device ’what’s
the weather?’ or ’how’s the traffic?’, it looks up the weather or traffic status using its
built-in skills and then returns the response to the user through the connected speaker.
1.1. Background :
In the 21st century everything is leaning towards automation, be it your home or your
car. There has been an unbelievable advancement in technology over the last few years.
Believe it or not, in today’s world you can interact with your machine. What is
interacting with a machine? Obviously giving it some input; but what if the input is
not the conventional way of typing, but rather your own voice? What if you are
talking to the machine, giving it commands and wanting the machine to interact with
you like your assistant? What if the machine does not just answer you by showing the
best results, but also advises you with a better alternative? Easy access to a machine
through voice commands is a revolutionary way of human-system interaction. To
achieve this, we need a speech-to-text API for understanding the input. Many
companies like Google, Amazon and Apple are trying to achieve this in a generalized
form. Isn’t it amazing that you can set reminders by just saying ’remind me to....’ or
set an alarm with ’wake me up at ..’? Understanding the importance of this, we have
decided to make a system that can be placed anywhere in your vicinity and that you
can ask to do anything for you just by speaking with it. In addition, in the future you
could connect two such devices through WiFi and make them communicate with each
other. This device can be very handy for day-to-day use and it can help you function
better by constantly giving you reminders and updates. Why would we need it?
Because your own voice is becoming a better input device than the conventional
keyboard.
1.2. Limitation of Existing System :
Voice recognition software won't always put your words on the screen
completely accurately. Programs cannot understand the context of language
the way that humans can, leading to errors that are often due to
misinterpretation.
Voice recognition systems can have problems with accents. Even though
some may learn to decode your speech over time, you have to learn to talk
consistently and clearly at all times to minimize errors. If you mumble, talk too
fast or run words into each other, the software will not always be able to cope.
Programs may also have problems recognizing speech as normal if your voice
changes, say when you have a cold, cough, sinus or throat problem.
To get the best out of voice recognition software, you need a quiet
environment. Systems don't work so well if there is a lot of background noise.
They may not be able to differentiate between your speech, other people
talking and other ambient noise, leading to transcription mix-ups and errors.
This can cause problems if you work in a busy office or noisy environment.
Wearing close-talking microphones or noise-canceling headsets can help the
system focus on your speech.
If you use voice recognition technology frequently, you may experience some
physical discomfort and vocal problems. Talking for extended periods can
cause hoarseness, dry mouth, muscle fatigue, temporary loss of voice and
vocal strain. The fact that you aren't talking naturally may make this worse and
you may need to learn how to protect your voice if you'll use a program
regularly.
1.4.2 pyaudio :
pyAudioAnalysis is a Python library covering a wide range of audio analysis tasks.
Through pyAudioAnalysis you can:
Extract audio features and representations (e.g. MFCCs, spectrogram, chromagram)
Classify unknown sounds
Train, parameter-tune and evaluate classifiers of audio segments
Detect audio events and exclude silence periods from long recordings
Perform supervised segmentation (joint segmentation – classification)
Perform unsupervised segmentation (e.g. speaker diarization)
Extract audio thumbnails
Train and use audio regression models (an example application: emotion recognition)
Apply dimensionality reduction to visualize audio data and content similarities
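For illustration, a minimal sketch of short-term feature extraction with pyAudioAnalysis follows. It assumes a local WAV file named sample.wav (a placeholder) and the module layout of recent pyAudioAnalysis releases; it is not part of the project's own code.

# Sketch: extracting short-term features (MFCCs, chroma, etc.) from a WAV file.
# Assumes pyAudioAnalysis is installed and "sample.wav" exists locally.
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import ShortTermFeatures

# Read the audio file: Fs is the sampling rate, x is the signal as a NumPy array
Fs, x = audioBasicIO.read_audio_file("sample.wav")

# 50 ms windows with a 25 ms step; returns a feature matrix and feature names
features, feature_names = ShortTermFeatures.feature_extraction(
    x, Fs, int(0.050 * Fs), int(0.025 * Fs))

print(feature_names[:5])   # first few feature names, e.g. zcr, energy, ...
print(features.shape)      # (num_features, num_windows)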
1.4.3. pyttsx3 :
An application invokes the pyttsx3.init() factory function to get a reference to a
pyttsx3.Engine instance. During construction, the engine initializes a
pyttsx3.driver.DriverProxy object responsible for loading a speech engine driver
implementation from the pyttsx3.drivers module. After construction, an application
uses the engine object to register and unregister event callbacks; produce and stop
speech; get and set speech engine properties; and start and stop event loops.
pyttsx3.init([driverName : string, debug : bool]) → pyttsx3.Engine
Gets a reference to an engine instance that will use the given driver. If the requested
driver is already in use by another engine instance, that engine is returned. Otherwise,
a new engine is created.
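The typical call sequence looks like the following minimal sketch of standard pyttsx3 usage; the spoken phrase and property values are only illustrations.

import pyttsx3

# Get an engine instance (loads the platform's default driver:
# SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux)
engine = pyttsx3.init()

# Get and set speech engine properties
rate = engine.getProperty('rate')        # current words per minute
engine.setProperty('rate', rate - 25)    # speak a little slower
engine.setProperty('volume', 0.9)        # volume between 0.0 and 1.0

# Queue a phrase and run the event loop until speech completes
engine.say("Hello, I am Jarvis. How can I help you?")
engine.runAndWait()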
1.4.4. speech recognition :
Speech recognition is an important feature in several applications, such as home
automation, artificial intelligence, etc. This section aims to provide an introduction to
the SpeechRecognition library of Python. This is useful as the library can also be used
on small computers such as the Raspberry Pi with the help of an external microphone.
Speech recognition is the process of converting audio into text. It is commonly used
in voice assistants like Alexa, Siri, etc. Python provides a library called
SpeechRecognition that allows us to convert audio into text for further processing.
Here we look at converting large or long audio files into text using the
SpeechRecognition library.
One way to process the audio file is to split it into chunks of constant size. For
example, we can take an audio file which is 10 minutes long and split it into 60
chunks, each of length 10 seconds. We can then feed these chunks to the API and
convert speech to text by concatenating the results of all these chunks. This method is
inaccurate: splitting the audio file into chunks of constant size might interrupt
sentences in between and we might lose some important words in the process,
because a chunk might end before a word is completely spoken and Google will not
be able to recognize incomplete words.
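A sketch of this constant-size chunking approach using the SpeechRecognition library is shown below. The file name and chunk length are illustrative; chunks that cut a word in half may simply fail to decode, which is exactly the weakness described above.

import speech_recognition as sr

r = sr.Recognizer()
transcript = []

with sr.AudioFile("long_recording.wav") as source:
    while True:
        # Record the next 10-second chunk from the file
        chunk = r.record(source, duration=10)
        if not chunk.frame_data:          # no audio left in the file
            break
        try:
            # Send the chunk to the Google Web Speech API
            transcript.append(r.recognize_google(chunk))
        except sr.UnknownValueError:
            # Chunk was unintelligible (e.g. a word cut in half)
            pass

print(" ".join(transcript))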
1.4.5. os module :
Python is one of the most frequently used languages in recent times for various tasks
such as data processing, data analysis and website building. In this process there are
various tasks that are operating-system dependent. Python allows the developer to use
several OS-dependent functionalities through the Python module os. This package
abstracts the functionalities of the platform and provides Python functions to
navigate, create, delete and modify files and folders. This section covers how to
import this package and its basic functionalities.
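For illustration, a few of the os functions an assistant can rely on are sketched below; the folder and file names are placeholders.

import os

# Create a folder (no error if it already exists)
os.makedirs("notes", exist_ok=True)

# Build a path in a platform-independent way
path = os.path.join("notes", "todo.txt")

# Create a file, then list and inspect the folder
with open(path, "w") as f:
    f.write("buy milk\n")

print(os.listdir("notes"))    # ['todo.txt']
print(os.getcwd())            # current working directory

# Delete the file and the now-empty folder
os.remove(path)
os.rmdir("notes")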
Chapter – 2
The analytics platform must be applicable to a machine or facility of any size. The
solution must be able to add assets without any incremental investment in hardware,
software or dedicated labor hours.
2.3. Software and Hardware Requirement :
Artificial Intelligence (AI) software is a reality, but only for limited classes of
problems. In general, AI problems are significantly different from those of
conventional software engineering. The differences suggest a different program
development methodology for AI problems: one that does not readily yield
programs with the desiderata of practical software (reliability, robustness, etc.).
In addition, the problem of machine learning must be solved (to some degree)
before the full potential of AI can be realized, but the resultant self-adaptive
software is likely to further aggravate the software crisis. Realization of the full
potential of AI in practical software awaits some prerequisite breakthroughs in
both basic AI problems and an appropriate AI software development
methodology.
It is required to have a motherboard carrying the CPU, RAM, video card and the
usual drives. This is necessary for the software to run, and the system will normally
also require an output device such as a monitor and an input device such as a
microphone for speech recognition. Such devices help the computer capture
information it would not be able to obtain otherwise. This is important in applications
such as a robot that reacts when you speak: it will only be able to understand you if it
is connected to a microphone and has the capability of using it.
Hardware Requirement :
→ CPU: Core i5 / i7 or higher
→ RAM: 8 GB or higher
→ Video Card: NVIDIA 7800 Series, ATI Radeon 1800 Series or better
→ Graphics Card: 512 MB of graphics memory
→ Storage: 12 GB
→ Sound Card: DirectX 9.0c compatible
Software synthesises data received from the hardware. Once the data has been
received and processed, the AI system needs to make an intelligent response. To
create this software, non-procedural languages are often used, such as LISP and
PROLOG. Both of these languages allow the system to learn and modify its responses
to its environment. The software requirements for modeling and simulation of AI
include:
a fast CPU
large amounts of RAM
large storage capacity (i.e. a large hard drive)
a good graphics card
possibly specialized input/output devices
Chapter – 3
Software Design Phase :
3.1. Dataflow Diagram :
Figure 1.1: Dataflow diagram
Figure 1.2: Speech recognition dataflow diagrams
3.2. ER Diagram :
3.3. UML Diagram :
Chapter – 4
Implementation and Results :
4.1. Description of Modules :
4.1.1 speech_recognition module :
This module records audio from the microphone, sends it to the speech API and
returns a Python string.
The audio is recorded using the speech_recognition module, which is imported at the
top of the program. The recorded speech is then sent to the Google Speech
Recognition API, which returns the recognized text as output.
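A minimal sketch of that flow is shown below: it listens on the default microphone and returns a Python string. Microphone access requires PyAudio, and the language code is only an illustrative choice.

import speech_recognition as sr

def take_command() -> str:
    """Record from the microphone and return the recognized text."""
    r = sr.Recognizer()
    with sr.Microphone() as source:
        r.adjust_for_ambient_noise(source)   # calibrate for background noise
        print("Listening...")
        audio = r.listen(source)
    try:
        # Send the recording to the Google Speech Recognition API
        return r.recognize_google(audio, language="en-IN")
    except sr.UnknownValueError:
        return ""                            # speech was unintelligible

if __name__ == "__main__":
    print("You said:", take_command())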
4.1.2 text_to_speech module :
pyttsx3 is a cross-platform speech library (macOS, Windows and Linux). You can
inspect voice metadata such as age, gender, id, language and name, and select a voice
accordingly. The speech engine comes with a large number of voices.
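Voice metadata can be inspected and a voice selected as in the sketch below; which voices are actually installed depends on the operating system.

import pyttsx3

engine = pyttsx3.init()

# Each voice object exposes metadata: id, name, languages, gender, age
voices = engine.getProperty('voices')
for voice in voices:
    print(voice.id, voice.name, voice.languages, voice.gender, voice.age)

# Pick the second installed voice, if one exists
if len(voices) > 1:
    engine.setProperty('voice', voices[1].id)

engine.say("This is the selected voice.")
engine.runAndWait()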
4.2. Features :
Text to Speech – You can make Jarvis read any text aloud using text to speech. You
can make it read emails, e-books, stories or any website content just by selecting the
text.
Play/Search Songs and Videos – This is a super cool feature of this AI program. Just
say Play <song/artist name>, e.g. Play Hips Don’t Lie or Play Akon, and the app will
start playing the song for you. Similarly, you can also play videos on your computer
by voice command.
Window Control – Window control commands allow you to Minimize, Maximize
and Restore windows. They also allow you to Scroll Up, Scroll Down and even
Switch Applications, as in the sketch below.
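One possible way to implement such window-control commands is by sending keyboard shortcuts through the pyautogui library. The following is only a sketch under that assumption, using the standard Windows shortcuts, and is not necessarily how the project itself implements the feature.

import pyautogui

def control_window(command: str) -> None:
    """Map a spoken command to a Windows keyboard shortcut."""
    if "minimize" in command:
        pyautogui.hotkey("win", "down")
    elif "maximize" in command:
        pyautogui.hotkey("win", "up")
    elif "switch" in command:
        pyautogui.hotkey("alt", "tab")
    elif "scroll up" in command:
        pyautogui.scroll(500)      # positive = scroll up
    elif "scroll down" in command:
        pyautogui.scroll(-500)     # negative = scroll down

control_window("scroll down")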
4.3 Implementation :
Artificial Intelligence personal assistants have become plentiful over the last few
years. Applications such as Siri, Bixby, Ok Google and Cortana make mobile device
users’ daily routines that much easier. You may be asking yourself how these function.
Well, the assistants receive external data (such as movement, voice, light, GPS
readings, visually defined markers, etc.) via the hardware’s sensors for further
processing - and take it from there to function accordingly.
In order to create a simple personal AI assistant, some programming ability is
required. In particular, languages such as Python (the most popular in this regard) are
used for the creation of AI-based apps.
Python is used as a base for the most renowned AI-based software because of its
flexibility, simplicity and longstanding reputation.
To successfully develop a virtual assistant, even an experienced Python developer
would need to advance their level of qualification from time to time, so topical
literature will come in handy. We can recommend several useful tools to make the
stages of AI assistant creation easier. Familiarize yourself with such libraries and
tools as NumPy, Matplotlib, Pandas, scikit-learn, Theano, AIMA, pyDatalog,
SimpleAI, EasyAI, PyBrain, MDP, PyML and others.
Building such an assistant from scratch requires significant financial and time
resources, which is why modern IT companies tend to choose ready-made platforms
in order to create a unique product. As a result, testing the general functionality of
each developed AI-based software consists of attempts to analyze the user interface,
along with the quality and speed with which it memorizes individual user interactions
(route optimizing, topics of general education, time management, informal
conversations with the assistant, etc.). Monitoring will result in changes to further
interaction dependencies based on the influence of previous user experiences.
The final step is to tie your application together in the process(text) function. This
function is left empty here; you can customize this portion as you wish and process
the text to respond to user queries.
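A sketch of what such a process(text) dispatcher could look like is given below; the command keywords, URLs and actions are illustrative, not the project's final logic.

import webbrowser
import datetime

def process(text: str) -> str:
    """Route a recognized command to the matching action."""
    text = text.lower()
    if "time" in text:
        return datetime.datetime.now().strftime("It is %H:%M.")
    elif text.startswith("play "):
        # Hand the query to a YouTube search in the default browser
        query = text[len("play "):]
        webbrowser.open("https://www.youtube.com/results?search_query=" + query)
        return "Playing " + query
    elif "open google" in text:
        webbrowser.open("https://www.google.com")
        return "Opening Google."
    return "Sorry, I did not understand that."

print(process("play hips don't lie"))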
Chapter - 6
Conclusion and Scope of Work :
6.1. Conclusion :
Project development and implementation → As has been previously stated, the
program is mainly concerned with the techniques of Android development, Java
programming, database management, cloud computing, different APIs for Google
products, Bing Translate, etc. The program was developed by two developers and
follows the extreme programming model. During the eight weeks of development, the
developers repeated the same cycle in each phase: analyze requirements, construct the
design, implement the solutions in pair-programming mode and test the result. The
development was carried out according to its initial planning, which guided the work
process: how to work on the program, how much time each of the developers should
spend every week, the resources needed for development and how to handle problems
as they came up. The project was efficiently completed under this development
model, and the resources we found early on were really useful when implementing
the program.
Project usage, prospects and potential → The project is very useful and has large
potential for use in different industries. Although the program is primarily concerned
with how to build a personal assistant on an Android phone using the voice, the
concept of voice recognition can be applied in different industries, as in many
situations it will be more convenient, save a lot of time and be helpful, especially for
those who have difficulty working with manual operations.
6.2. Future Scope of Work :
No program has a perfect design without any flaws; it is the same with this program.
Even though the program is complete, with all the primary functions implemented
and working properly, there are still many things that can be done. Future
improvements range from adding more functions, to offering the user a more
comprehensive and convenient program, to refining the logic to make the program
more humanized and easier to use, to increasing the database capacity and adding
more possible keywords, responses and data, to optimizing the interface.
Add more functions: although there are already 8 common functions that are used
really often on a mobile phone, there can be more functions which simplify our daily
life and make the program convenient to use. Functions such as playing movies,
checking stocks and exchange rates, downloading and uploading, installing apps, etc.
are potential additions that would make the program more comprehensive and let
people enjoy more services.
The more humanized the program is, the easier it is for the user to use it. People
should accept that even if developers constantly try to add more predefined
commands and more responses, and to analyze and respond to commands more
intelligently, the program will never be completely comprehensive and cover all the
possible circumstances that users meet. Nevertheless, the program will certainly be
improved and be more user-friendly if there are more readable commands, a more
humanized structure and more intelligent responses.
Table 1.1: Future Scope