Final Document
THROUGH VOICE
ABSTRACT
Billions of devices belonging to different manufacturers and domains are connected to the
Internet of Things. The amount of consumer equipment that a single person can manage,
however, is limited. Additionally, personal information is spread across several sources, which
prevents seamless integration. Intelligent personal assistants are a solution to facilitate the integration of online services and data sources. However, the management of equipment is still challenging due to difficulties with interaction, security, and information integration, among other problems. This paper proposes the architecture and implementation of an intelligent personal
assistant based on the Swarm, a decentralized platform for heterogeneous smart devices. Our
proposal focuses on ease of interaction and semantic data integration.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF SYMBOLS
LIST OF ABBREVIATIONS
LIST OF TABLES
1. CHAPTER 1 : INTRODUCTION
1.1 GENERAL
1.2 OBJECTIVE
1.3 EXISTING SYSTEM
1.3.1 DISADVANTAGES
1.4 LITERATURE SURVEY
1.5 PROPOSED SYSTEM
1.5.1 ADVANTAGES
2. CHAPTER 2 : PROJECT DESCRIPTION
2.1 GENERAL
2.2 METHODOLOGIES
2.2.1 MODULES NAME
2.2.2 MODULES EXPLANATION
2.3 TECHNIQUE OR ALGORITHM
2.3.1 PSEUDO CODE
3. CHAPTER 3 : REQUIREMENTS
3.1 GENERAL
3.2 HARDWARE REQUIREMENTS
3.3 SOFTWARE REQUIREMENTS
3.4 FUNCTIONAL REQUIREMENTS
3.5 NON-FUNCTIONAL REQUIREMENTS
4. CHAPTER 4 : SYSTEM DESIGN
4.1 GENERAL
4.2 UML DIAGRAMS
4.2.1 USE CASE DIAGRAM
4.2.2 CLASS DIAGRAM
4.2.3 OBJECT DIAGRAM
4.2.4 COMPONENT DIAGRAM
4.2.5 DEPLOYMENT DIAGRAM
4.2.6 SEQUENCE DIAGRAM
4.2.7 COLLABORATION DIAGRAM
4.2.8 STATE DIAGRAM
4.2.9 ACTIVITY DIAGRAM
4.2.10 DATA FLOW DIAGRAM
4.2.11 E-R DIAGRAM
4.2.12 SYSTEM ARCHITECTURE
5. CHAPTER 5 : SOFTWARE SPECIFICATION
5.1 PYTHON
5.2 HISTORY OF PYTHON
5.3 IMPORTANCE OF PYTHON
5.4 FEATURES OF PYTHON
5.5 LIBRARIES USED
6. CHAPTER 6 : IMPLEMENTATION
6.1 GENERAL
6.2 IMPLEMENTATION
7. CHAPTER 7 : SNAPSHOTS
7.1 GENERAL
7.2 VARIOUS SNAPSHOTS
8. CHAPTER 8 : SOFTWARE TESTING
8.1 GENERAL
8.2 DEVELOPING METHODOLOGIES
8.3 TYPES OF TESTING
8.3.1 UNIT TESTING
8.3.2 FUNCTIONAL TESTING
8.3.3 SYSTEM TESTING
8.3.4 PERFORMANCE TESTING
8.3.5 INTEGRATION TESTING
8.3.6 ACCEPTANCE TESTING
8.4 BUILD THE TEST PLAN
9. CHAPTER 9 : APPLICATIONS AND FUTURE ENHANCEMENT
9.1 APPLICATIONS
9.2 FUTURE ENHANCEMENTS
10. CHAPTER 10 : CONCLUSION
10.1 CONCLUSION
10.2 REFERENCES
LIST OF FIGURES
FIGURE NO NAME OF THE FIGURE
2.1 Pseudo Code
[UML notation reference: Class (class name with private, protected and public attributes and operations); Communication (communication between various use cases); Component (represents physical modules which are a collection of components); Node (represents physical modules which are a collection of components); Transition (represents communication that occurs between processes)]
LIST OF ABBREVIATIONS
AI Artificial Intelligence
CHAPTER 1
INTRODUCTION
Understanding natural language voice commands: The IPA will interpret user commands expressed in everyday language, eliminating the need for rigid keyword phrases.
Responding intelligently and contextually: It will not only execute commands but also provide insightful responses based on user intent and the broader context of the interaction.
Triggering actions across online services: The IPA will connect with various online services (e.g., calendars, booking platforms, payment gateways) to execute user-requested actions, such as scheduling appointments, booking travel, and paying bills.
IPAs have been used in this context, for example through an interactive programming approach in a cloud-based architecture and through the application of natural language processing to simplify user interaction. A decentralized and opportunistic solution compatible with the Swarm's heterogeneity, however, has not been proposed yet.
1.3 METHODOLOGIES
1. Communication Agent: The Communication Agent is responsible for the user interaction, using
different methods, such as natural language, gestures and touch screens. The current version of the
Swarm Assistant uses natural language processing (NLP) for voice and text. This agent provides a
multiplatform mobile web interface, and a backend that maps intents in natural language to actual
commands in the Swarm network. Intents are associated with phrases that have the same semantic meaning (a brief sketch of this mapping follows).
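As an illustration only, a minimal sketch of how such a mapping might look in Python; the intent names, trigger phrases, and Swarm command strings below are hypothetical and not part of the actual Swarm API:

# Illustrative only: intent names, phrases and Swarm command strings are hypothetical.
INTENTS = {
    'play_music': ['play music', 'play a song', 'start the music'],
    'open_site': ['open youtube', 'open google', 'open facebook'],
}
SWARM_COMMANDS = {
    'play_music': 'swarm://media-player/play',
    'open_site': 'swarm://browser/open',
}

def resolve_intent(utterance):
    # Map a natural-language utterance to an intent by phrase matching.
    utterance = utterance.lower()
    for intent, phrases in INTENTS.items():
        if any(phrase in utterance for phrase in phrases):
            return intent
    return None

def to_swarm_command(utterance):
    # Translate an utterance into the command sent on the Swarm network.
    intent = resolve_intent(utterance)
    return SWARM_COMMANDS.get(intent) if intent is not None else None

print(to_swarm_command('please play a song'))   # -> swarm://media-player/play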
2. Personal Database: The Personal Database stores the information the Swarm Assistant needs to act proactively on behalf of a person, including location, interests, preferences, and relationships. This module is a semantic database, which stores an ontology of concepts and relationships related to the person domain and leverages interoperability through semantic inference (a brief sketch follows).
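A minimal sketch of such a semantic store, here using the rdflib library; the namespace and the properties (livesIn, hasInterest, knows) are illustrative placeholders rather than the project's actual ontology:

from rdflib import Graph, Literal, Namespace

PERSON = Namespace('http://example.org/person/')
g = Graph()

# Store a few facts about a (fictional) user as subject-predicate-object triples.
g.add((PERSON.alice, PERSON.livesIn, Literal('Sao Paulo')))
g.add((PERSON.alice, PERSON.hasInterest, Literal('jazz')))
g.add((PERSON.alice, PERSON.knows, PERSON.bob))

# SPARQL query: what are Alice's interests?
results = g.query('''
    PREFIX person: <http://example.org/person/>
    SELECT ?interest WHERE { person:alice person:hasInterest ?interest . }
''')
for row in results:
    print(row.interest)   # -> jazz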
3. Speech-to-Text: Far from being a fad, the overwhelming success of speech-enabled products like Amazon Alexa has
proven that some degree of speech support will be an essential aspect of household tech for the
foreseeable future. If you think about it, the reasons why are pretty obvious. Incorporating speech
recognition into your Python application offers a level of interactivity and accessibility that few
technologies can match. The accessibility improvements alone are worth considering. Speech
recognition allows the elderly and the physically and visually impaired to interact with state-of-the-art
products and services quickly and naturally—no GUI needed!
4. Service Manager: Analyses the commands stored in the user's dataset, matches them against the user's voice request, and arranges everything needed so the command is ready to execute if a match is available.
5. Execute Command: After finding the match for the given command, the respective command is run and resources are allocated to the particular application within the console, depending on the type of application that needs to be accessed (a combined sketch of these two modules follows).
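A minimal combined sketch of these two modules, assuming a simple phrase-matching strategy; the stored command table and its actions are placeholders, not the project's real dataset:

import webbrowser

# Placeholder table of stored commands and the actions they trigger.
STORED_COMMANDS = {
    'open google': lambda: webbrowser.open('https://www.google.com'),
    'open youtube': lambda: webbrowser.open('https://www.youtube.com'),
}

def service_manager(recognised_text):
    # Match the recognised text against the stored commands.
    text = recognised_text.lower()
    for command, action in STORED_COMMANDS.items():
        if command in text:
            return action
    return None

def execute_command(recognised_text):
    # Run the matched action, or report that no stored command matched.
    action = service_manager(recognised_text)
    if action is not None:
        action()
    else:
        print('No matching command found for:', recognised_text)

execute_command('could you open youtube please')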
Swarm Assistant
• Intelligent personal assistants (IPAs) are capable of gathering information and triggering actions from online services.
• The agent provides a multiplatform mobile web interface and a backend that maps intents in natural language to actual commands in the Swarm network.
Speech-to-Text Conversion:
Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that
develops methodologies and technologies that enable the recognition and translation of spoken language
into text by computers. It is also known as automatic speech recognition (ASR), computer speech
recognition or speech to text (STT). It incorporates knowledge and research in the computer science,
linguistics and computer engineering fields. The term voice recognition or speaker identification refers to
identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of
translating speech in systems that have been trained on a specific person's voice or it can be used to
authenticate or verify the identity of a speaker as part of a security process.
1. Convert voice into bits:
a. Sound waves are one-dimensional: at every moment in time, they have a single value based on the height of the wave.
b. Recording the height of the wave at equally spaced points in time is called sampling.
2. A quick sidebar on digital sampling:
a. Sampling creates an approximation of the original sound wave, so some data is lost.
b. Thanks to the Nyquist theorem, the original sound wave can be reconstructed mathematically from the spaced-out samples, provided the sample rate is high enough.
3. Pre-process the samples:
a. Sampling generates an array of samples, each representing the sound wave's amplitude at 1/16000th-of-a-second intervals.
b. That sample information then needs to be given to a neural network.
c. Trying to recognise speech patterns by processing these raw samples directly, however, is difficult.
d. To reduce the complexity, the voice data is pre-processed first.
e. The samples are grouped into chunks spanning a few milliseconds each.
f. Each chunk is effectively a very short recording, but it is still hard to work with because it mixes signals of different frequencies.
g. The voice contains different pitches (low, medium and high), and whenever those pitches are mixed they produce the complex sound of human speech.
h. For a neural network to process it, this complex sound wave is broken apart into its component parts.
i. The Fourier Transform breaks the complex sound wave into the simple sound waves that make it up.
j. A neural network can find patterns in this kind of data much more easily than in raw sound waves (a sketch of this pre-processing follows this list).
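A minimal sketch of this pre-processing, assuming 16 kHz mono audio: the samples are grouped into 20-millisecond frames and a Fourier transform is taken of each frame (random numbers stand in for a real recording):

import numpy as np

sample_rate = 16000                                  # 16,000 samples per second
samples = np.random.uniform(-1, 1, sample_rate)      # stand-in for 1 second of recorded audio

frame_len = int(0.020 * sample_rate)                 # 20 ms -> 320 samples per frame
n_frames = len(samples) // frame_len
frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)

# Magnitude spectrum of each frame: the simple component waves that make up each chunk.
spectra = np.abs(np.fft.rfft(frames, axis=1))
print(spectra.shape)                                 # (50, 161): 50 frames, 161 frequency bins each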
4. Feed the audio to the neural network:
a. The pre-processed audio can now be fed to the neural network; the input is 20-millisecond audio chunks.
b. For each little audio slice, the network tries to figure out the letter that corresponds to the sound currently being spoken.
c. For this a Recurrent Neural Network is used, which has a memory that influences its future predictions.
d. That is because each letter it predicts should also affect the likelihood of the next letter it will predict.
e. The memory of previous predictions helps the neural network make more accurate predictions going forward.
f. After feeding in the whole recording, we end up with a mapping of each audio chunk to the letters most likely spoken during that chunk (a decoding sketch follows this list).
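As an illustration of the final step, a toy decoding sketch that takes the most likely letter for each chunk and collapses immediate repeats; real systems use CTC decoding with a blank symbol and a language model, and the scores below are random stand-ins for the network's output:

import numpy as np

alphabet = list('abcdefghijklmnopqrstuvwxyz ')
chunk_scores = np.random.random((12, len(alphabet)))   # stand-in for per-chunk letter scores

best = [alphabet[i] for i in chunk_scores.argmax(axis=1)]
decoded = ''.join(ch for i, ch in enumerate(best) if i == 0 or ch != best[i - 1])
print(decoded)                                          # collapsed best-letter sequence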
Nyquist Sampling Theorem:
It is a theorem in the field of digital signal processing which serves as a fundamental bridge between
continuous-time signals and discrete-time signals. It establishes a sufficient condition for a sample rate
that permits a discrete sequence of samples to capture all the information from a continuous-time
signal of finite bandwidth.
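A tiny worked example of the condition the theorem establishes, assuming the speech content of interest extends up to about 8 kHz; this is why a rate of 16,000 samples per second is used in the pipeline above:

speech_bandwidth_hz = 8000                 # assumed highest frequency of interest
min_sample_rate = 2 * speech_bandwidth_hz  # Nyquist condition: at least twice the bandwidth
print(min_sample_rate)                     # 16000 samples per second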
Fourier Transform:
It is a mathematical transform which decomposes a function (often a function of time, or a signal) into
its constituent frequencies, such as the expression of a musical chord in terms of the volumes and
frequencies of its constituent notes. The term Fourier transform refers to both the frequency domain
representation and the mathematical operation that associates the frequency domain representation to
a function of time.
A recurrent neural network (RNN) is a class of artificial neural networks where connections between
nodes form a directed graph along a temporal sequence. This allows it to exhibit temporal dynamic
behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to
process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented,
connected handwriting recognition or speech
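A minimal sketch of a single recurrent cell in NumPy, showing how an internal state (memory) is carried across a variable-length sequence of input frames; the sizes and random weights are illustrative, and a real speech model would learn its weights from data:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 161, 32                       # e.g. 161 frequency bins per 20 ms frame
W_in = rng.normal(0, 0.1, (n_hidden, n_in))
W_rec = rng.normal(0, 0.1, (n_hidden, n_hidden))
b = np.zeros(n_hidden)

def run_rnn(frames):
    # Process a sequence of frames, carrying the hidden state from step to step.
    h = np.zeros(n_hidden)                     # the memory starts empty
    outputs = []
    for x in frames:                           # one frame at a time
        h = np.tanh(W_in @ x + W_rec @ h + b)  # new state depends on the input and the old state
        outputs.append(h)
    return np.array(outputs)

sequence = rng.normal(size=(50, n_in))         # roughly 1 second of audio features
print(run_rnn(sequence).shape)                 # (50, 32): one hidden state per frame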
CHAPTER 2
LITERATURE SURVEY
Title: Swarm OS control plane: An architecture proposal for heterogeneous and organic
networks
Year: 2015
Description:
Computing swarms, consisting of smart networked sensors and actuators in the connected world,
enable an extension of the info-sphere into the physical world. We call this extended cyber-physical info-sphere Swarm. This work presents a proposal for the Swarm Framework
Architecture. Envisioning the Swarm ecosystem, challenges and characteristics were identified
and a control plane framework was proposed. In order to exercise this framework, a use case was
designed and the system simulated. The work presents a computational control plane architecture
proposition for organic and heterogeneous networks leveraging the Swarm concept.
Title: The semantic Mediation for the Swarm: An adaptable and organic solution for the Internet
of Things
Year: 2017
Description:
This paper presents the architecture of the Mediation Service, which is a service being proposed
to support interoperability in the Internet of Things and Swarm ecosystem. Mediation uses
semantics and this paper presents strategies for service discovery using semantics. A software
architecture is presented and results in a solution to generate a matching degree of service
requests and services being offered.
Year: 2016
Description:
The Internet has emerged as a key network to make information accessible quickly and easily,
revolutionizing how people communicate and interact with the world. The information available
on the Internet about a given subject may be extensive, allowing the development of new
solutions to solve people's day-to-day problems. One such solution is the proposal of intelligent
personal assistants (IPAs), which are software agents that can assist people in many of their daily
activities. IPAs are capable of accessing information from databases to guide people through
different tasks, deploying a learning mechanism to acquire new information on user
performance. IPAs can improve the assistance they offer to users by collecting information
autonomously from objects that are available in the surrounding environment. To make this idea
feasible, IPAs could be integrated into ubiquitous computing environments in an Internet of
Things (IoT) context. Therefore, it is necessary to integrate wireless sensor networks with the
Internet properly, considering many different factors, such as the heterogeneity of objects and the
diversity of communication protocols and enabling technologies. This approach fulfills the IoT
vision. This paper surveys the current state of the art of IoT protocols, IPAs in general, and IPAs
based on IoTs.
Title: Addressing the need to capture scenarios, intentions and preferences: Interactive
intentional programming in the smart home
Year: 2018
Description:
The Internet of Things (IoT) and connected products have become part of the advance of
ubiquitous technology into personal and professional living spaces, such as the smart home.
What connectivity and distributed computing have made possible, is still programmed only
according to more or less simplified rule systems (or in traditional code); the mapping between
what end users intend or would value and what can be expressed in rules is not straightforward.
This article analyzes the temporal, preferential, technical, and social complexity of mapping end-
user intent to rules, and it suggests new concepts to better frame information that needs to be
captured to create smart-home systems that better match users’ intents. We need a new approach
aimed at first capturing end users’ intentions and potential usage scenarios, then providing this
information to a control system that learns to resolve intentions and scenarios for available
devices in the context. The new approach should deconstruct and rebuild IoT-related
programming at a higher level of abstraction that allows end users to express long-term
intentions and short-term preferences, instead of programming rules. Based on related work, a
first-person perspective and analysis of current smart-home programming practices, the concept
of Interactive Intentional Programming (IIP) is introduced and discussed.
Year: 2017
Description:
The Internet of Things will bring a scenario in which interaction between humans and devices
will be critical to allow people to use, monitor or configure Internet of Things devices.
Interactions in such applications are based on traditional graphical interfaces. Devices that accept
interaction based on Natural Language, e.g., through voice commands, can understand basic human orders or answer questions whenever the user's expressions fit a known language pattern. Some devices can understand natural language voice commands but require sophisticated voice assistants located in the cloud, which raises significant privacy concerns. Other devices, which handle voice processing locally, offer only a very limited local recognition system, requiring users to be familiar with the words the system can process. The
purpose of this work is to diminish the complexity of Natural Language processing in the context
of IoT. The solution posited in this article allows Internet of Things devices to offload Natural
Language processing to a system that improves the use of Natural Language and alleviates the
need to learn or remember specific words or terms intended for triggering device actions. We
have evaluated the feasibility of the design with a proof-of-concept implemented in a home
environment and it was tested by real users.
Pseudo Code (Figure 2.1):
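The pseudo-code figure (Figure 2.1) did not survive extraction; the following sketch reconstructs the overall flow described in this document and is indicative only:

listen for a voice command from the user
convert the voice command to text (speech-to-text)
look the text up in the personal database of stored commands
if a matching command is found:
    pass the command to the communication agent
    the communication agent forwards the action to the Swarm broker
    the Swarm broker executes the requested service (open application, play music, send e-mail, ...)
    return the result or a confirmation to the user
else:
    fall back to the web crawler / a web search for the request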
CHAPTER 3
SYSTEM ANALYSIS
3.2 DISADVANTAGES:
• Complexity is high
3.4 ADVANTAGES:
• Time consumption is less
3.5 GENERAL
We can see from the results that on each database, the error rates are very low due to the
discriminatory power of features and the regression capabilities of classifiers. Comparing the
highest accuracies (corresponding to the lowest error rates) to those of previous works, our
results are very competitive.
3.6 HARDWARE REQUIREMENTS
The hardware requirements may serve as the basis for a contract for the implementation of the
system and should therefore be a complete and consistent specification of the whole system.
They are used by software engineers as the starting point for the system design. The requirements should state what the system should do, not how it should be implemented.
The software requirements document is the specification of the system. It should include both a
definition and a specification of requirements. It is a set of what the system should do rather than
how it should do it. The software requirements provide a basis for creating the software
requirements specification. It is useful in estimating cost, planning team activities, performing tasks, and tracking the team's progress throughout the development activity.
• Platform : Spyder
EFFICIENCY
Our multi-modal event tracking and evolution framework is suitable for multimedia documents
from various social media platforms, which can not only effectively capture their multi-modal
topics, but also obtain the evolutionary trends of social events and generate effective event
summary details over time. Our proposed mmETM model can exploit the multi-modal property
of social event, which can effectively model social media documents including long text with
related images and learn the correlations between textual and visual modalities to separate the
visual-representative topics and non-visual-representative topics.
CHAPTER 4
SYSTEM DESIGN
4.1 GENERAL
Design Engineering deals with the various UML [Unified Modelling Language] diagrams for the implementation of the project. Design is a meaningful engineering representation of a thing that is to be built. Software design is a process through which the requirements are translated into a representation of the software.
4.2.1 USE CASE DIAGRAM
EXPLANATION:
The main purpose of a use case diagram is to show what system functions are performed for which actor, and the roles of the actors in the system can be depicted. The diagram consists of the user as an actor; each actor plays a certain role to achieve the concept.
4.2.2 CLASS DIAGRAM
EXPLANATION:
This class diagram represents how the classes, with their attributes and methods, are linked together to perform verification with security. The diagram shows the various classes involved in our project.
4.2.3 OBJECT DIAGRAM
[Object diagram objects: User, Server, Voice-to-Text, Assistant, Database]
EXPLANATION:
The above diagram shows the flow of objects between the classes. An object diagram shows a complete or partial view of the structure of a modeled system. It represents how the objects, with their attributes and methods, are linked together to perform verification with security.
4.2.4 COMPONENT DIAGRAM
[Component diagram components: Provide Voice, Get Data, Voice to Text, Store Command, Get Command, Verify Command, Execute Command, Open Application, Provide Assist]
[Collaboration between User, Server and Database: 1: Provide Voice; 2: Get Data; 3: Voice to Text; 4: Store Command; 5: Get Command; 6: Verify Command; 7: Execute Command; 8: Open Application; 9: Provide Assist]
[Interaction flow: User, Voice Command, Get Data, Voice to Text, Store Data, Verify Command, Open Application, Get Assist]
State diagrams are loosely defined diagrams that show workflows of stepwise activities and actions, with support for choice, iteration and concurrency. They require that the system described be composed of a finite number of states; sometimes this is indeed the case, while at other times it is a reasonable abstraction. Many forms of state diagram exist, which differ slightly and have different semantics.
[Diagram flow: User, Voice Command, Voice to Text, Store Data, Get Command, Open Application, Get Assist]
[Data flow diagram. Level 0: User, Voice Command, Voice to Text, Store Data, Get Assistance. Level 1: Voice Command]
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modeling its process aspects. Often they are a preliminary step used to create an
overview of the system which can later be elaborated. DFDs can also be used for the visualization of data
processing (structured design).
A DFD shows what kinds of data will be input to and output from the system, where the data will
come from and go to, and where the data will be stored. It does not show information about the timing of
processes, or information about whether processes will operate in sequence or in parallel.
4.2.11 E-R DIAGRAM
[E-R diagram entities and attributes: Reminder, Commands, Assist, Verify, Application, Stored Data, UserName, Voice, Text, Command]
EXPLANATION:
Entity-Relationship Model (ERM) is an abstract and conceptual representation of data. Entity-
relationship modeling is a database modeling method, used to produce a type of conceptual
schema or semantic data model of a system, often a relational database.
Explanation:
The user wants to control the device with his or her commands and interacts with the system, which controls all the devices and executes the user's commands. The user interacts with the system through multiple interfaces (voice and text) and can choose either interface. The user then passes the command, which is verified against the personal database, where the system keeps all the known commands. Once the user's command has been verified against that database, it is given to the communication agent, which performs whatever action the user wants the system to carry out.
The Swarm network is an overlay network which handles control and data traffic related to services. The Swarm broker provides a semantic registry and discovery of services, as well as an attribute-based access control system. A web crawler (also known as a web spider or web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner. The Swarm Assistant is a system combining multiple interfaces, the communication agent, the personal database, and the web crawler. It connects to the Swarm broker, which is a registry of the operations and services available and acts as a mediator between the system and the network.
The Swarm Assistant is an assistant that works based on the user's voice commands. Initially, the user gives a voice command to the system, and the system converts the voice command to text. Based on the text command, the communication agent verifies it against the personal database. If the command matches the personal database, the action is given to the Swarm broker to execute.
If the user wants to listen to songs, the user gives the voice command 'play music'; according to that command, the system selects a song and plays it. If the user wants to access Facebook, the request goes to the web crawler, which opens the social networking site.
CHAPTER 5
DEVELOPMENT TOOLS
5.1 PYTHON
Python was developed by Guido van Rossum in the late eighties and early nineties at the
National Research Institute for Mathematics and Computer Science in the Netherlands.
Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68,
SmallTalk, and Unix shell and other scripting languages.
Python is copyrighted. Like Perl, Python source code is available under an open-source license (the Python Software Foundation License, which is GPL-compatible).
Python is now maintained by a core development team, although Guido van Rossum still holds a vital role in directing its progress.
• Python is Interactive − You can actually sit at a Python prompt and interact with the
interpreter directly to write your programs.
• Easy-to-learn − Python has few keywords, simple structure, and a clearly defined
syntax. This allows the student to pick up the language quickly.
• Easy-to-read − Python code is more clearly defined and visible to the eyes.
• A broad standard library − The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
• Interactive Mode − Python has support for an interactive mode which allows interactive
testing and debugging of snippets of code.
• Portable − Python can run on a wide variety of hardware platforms and has the same
interface on all platforms.
• Extendable − You can add low-level modules to the Python interpreter. These modules
enable programmers to add to or customize their tools to be more efficient.
• GUI Programming − Python supports GUI applications that can be created and ported
to many system calls, libraries and windows systems, such as Windows MFC,
Macintosh, and the X Window system of Unix.
• Scalable − Python provides a better structure and support for large programs than shell
scripting.
Apart from the above-mentioned features, Python has a long list of good features, a few of which are listed below −
• It can be used as a scripting language or can be compiled to byte-code for building large
applications.
• It provides very high-level dynamic data types and supports dynamic type checking.
• It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
• scikit-learn − provides the machine learning algorithms used for data analysis and data mining tasks.
Coding:
import pyttsx3
import webbrowser
import smtplib
import random
import speech_recognition as sr
import wikipedia
import datetime
import wolframalpha
import os
import sys
import glob
engine = pyttsx3.init('sapi5')
client = wolframalpha.Client('Your_App_ID')
voices = engine.getProperty('voices')
engine.setProperty('voice', voices[len(voices)-1].id)
def speak(audio):
engine.say(audio)
engine.runAndWait()
def greetMe():
    # Greet the user according to the current hour of the day.
    currentH = int(datetime.datetime.now().hour)
    if currentH < 12:
        speak('Good Morning!')
    elif currentH < 18:
        speak('Good Afternoon!')
    else:
        speak('Good Evening!')

greetMe()
def myCommand():
    # Listen on the microphone and return the recognised text.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        query = r.recognize_google(audio, language='en-in')
        print('User:', query)
    except sr.UnknownValueError:
        speak('Sorry, I did not catch that. Please say it again.')
        query = myCommand()
    return query
if __name__ == '__main__':
    # NOTE: several if/elif conditions and a few calls were missing from the source
    # listing; they are reconstructed here from the URLs and actions that follow,
    # so the loop runs, and are indicative only.
    while True:
        query = myCommand()
        query = query.lower()

        # Web sites: verify the command and open the matching page.
        if 'open youtube' in query:
            speak('okay')
            webbrowser.open('www.youtube.com')
        elif 'open google' in query:
            speak('okay')
            webbrowser.open('www.google.co.in')
        elif 'open facebook' in query:
            speak('okay')
            webbrowser.open('www.facebook.com')
        elif 'open gmail' in query:
            speak('okay')
            webbrowser.open('www.gmail.com')
        elif 'open twitter' in query:
            speak('okay')
            webbrowser.open('www.twitter.com')

        # Small talk.
        elif 'how are you' in query:
            stMsgs = ['Just doing my thing!', 'I am fine!', 'Nice!', 'I am nice and full of energy']
            speak(random.choice(stMsgs))

        # Send an e-mail through Gmail's SMTP server.
        elif 'email' in query:
            speak('Who is the recipient?')
            recipient = myCommand()
            if 'me' in recipient:
                try:
                    speak('What should I say?')
                    content = myCommand()
                    server = smtplib.SMTP('smtp.gmail.com', 587)
                    server.ehlo()
                    server.starttls()
                    server.login("Your_Username", 'Your_Password')
                    server.sendmail("Your_Username", "Recipient_Address", content)
                    server.close()
                    speak('Email sent')
                except:
                    speak('Sorry, I am unable to send your message at the moment.')

        # Stop the assistant.
        elif 'nothing' in query or 'abort' in query or 'stop' in query:
            speak('okay')
            speak('Bye, have a good day.')
            sys.exit()

        elif 'hello' in query:
            speak('Hello Sir')

        elif 'bye' in query:
            speak('Bye, have a good day.')
            sys.exit()

        # Play a random song from the local music folder.
        elif 'play music' in query:
            music_folder = 'music'
            music = glob.glob("music/*.mp3")
            os.system('start ' + random.choice(music))
            sys.exit()

        # Anything else: ask WolframAlpha, fall back to Wikipedia, then a web search.
        else:
            speak('Searching...')
            try:
                try:
                    res = client.query(query)
                    results = next(res.results).text
                    speak('Got it.')
                    speak(results)
                except:
                    results = wikipedia.summary(query, sentences=2)
                    speak('Got it.')
                    speak(results)
            except:
                webbrowser.open('www.google.com')
This project is implemented as an application using Python; the server process is maintained using socket and server-socket communication, and the design part is handled by Cascading Style Sheets.
SNAPSHOTS
When the user gives the command to open e-mail, the assistant takes the command, verifies it against the database, and executes it, opening e-mail through the web crawler.
When the user wants to book a bus ticket, the user gives the voice command to open Abhibus or book a bus, and the assistant opens the bus booking web site.
CHAPTER 8
CONCLUSION
This paper proposed an architecture for a personal assistant that addresses the challenges of interacting with the system to perform the operations the user wants to execute. It was built on top of the Swarm platform and adds natural language processing and semantic integration of personal data gathered from different sources.
8.1 APPLICATIONS
1. Biometric system
2. Alexa
Future work will investigate the performance, security and privacy of the system, along with an
analysis of the costs of data acquisition and inference. Finally, we seek to include opportunistic
use of resources and automatic service composition.
REFERENCES
[1] L. C. P. Costa, J. Rabaey, A. Wolisz, M. Rosan, and M. K. Zuffo, "Swarm OS control plane: an architecture proposal for heterogeneous and organic networks," IEEE Transactions on Consumer Electronics, vol. 61, no. 4, pp. 454–462, November 2015.
[4] M. Funk, L.-L. Chen, S.-W. Yang, and Y.-K. Chen, “Addressing the need to capture
scenarios, intentions and preferences: Interactive intentional programming in the smart home,”
International Journal of Design, vol. 12, no. 1, pp. 53–66, 2018.
[6] J. Santos, J. Rodrigues, B. Silva, J. Casal, K. Saleem, and V. Denisov, "An IoT-based mobile gateway for intelligent personal assistants on mobile health environments," vol. 71, March 2016.