
SPEECH EMOTION RECOGNITION USING

MACHINE LEARNING

A PROJECT REPORT

Submitted by

PERINBAN D (211417205113)

BALAJI M (211417205025)

GOPINATH D (211417205054)

HARIHARAN S J(211417205055)

in partial fulfillment for the award of the degree

of

BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY

PANIMALAR ENGINEERING COLLEGE, POONAMALLEE

ANNA UNIVERSITY: CHENNAI 600 025

AUGUST 2021
ANNA UNIVERSITY: CHENNAI 600 025

BONAFIDE CERTIFICATE

Certified that this project report “SPEECH EMOTION RECOGNITION
USING MACHINE LEARNING” is the bonafide work of “PERINBAN D
(211417205113), BALAJI M (211417205025), GOPINATH D (211417205054),
HARIHARAN S J (211417205055)”, who carried out the project work under my
supervision.

SIGNATURE SIGNATURE
Dr. M. HELDA MERCY, M.E., Ph.D., Ms. S. KUMARI, M.E.,

HEAD OF THE DEPARTMENT SUPERVISOR


Assistant Professor

Department of Information Technology Department of Information Technology

Panimalar Engineering College Panimalar Engineering College

Poonamallee, Chennai - 600 123 Poonamallee, Chennai - 600 123

Submitted for the Project and Viva Voce Examination held on 7-8-2021

SIGNATURE SIGNATURE

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

A project of this magnitude and nature requires kind co-operation and
support from many for its successful completion. I wish to express my sincere
thanks to all those who were involved in its completion.
I would like to express my deep gratitude to Honorable Secretary and
Correspondent, Dr. P.CHINNADURAI, M.A., Ph.D., for his kind words and
enthusiastic motivation which inspired me a lot.
I also express my sincere thanks to Our Respected Directors,
Mrs. C. VIJAYA RAJESHWARI, Mr. C. SAKTHI KUMAR, M.E. and
Mrs. SARANYA SREE SAKTHI KUMAR B.E, M.B.A for providing us with
the necessary facilities for the completion.
I also express my appreciation and gratefulness to my Principal, Dr.
K. MANI, M.E., Ph.D., who helped us in the completion of the project.
I wish to convey my thanks and gratitude to our Head of the
Department, Dr. M. HELDA MERCY, M.E., Ph.D., Department of
Information Technology, for her support and for providing us ample time to
complete our project.
I express my indebtedness and gratitude to my staff in charge,
Ms. S. KUMARI, M.E., Assistant Professor, Department of Information
Technology for her guidance throughout the course of my project.
I thank my parents and friends for providing their extensive moral support
and encouragement during the course of the project.
Last but not the least, I thank God Almighty for showering His abundant
grace upon us so that we could complete the project successfully on time.

DECLARATION

I hereby declare that the project report entitled “SPEECH EMOTION
RECOGNITION USING MACHINE LEARNING”, which is being submitted
in partial fulfilment of the requirements of the course leading to the award of
the ‘Bachelor of Technology in Information Technology’ at Panimalar
Engineering College, affiliated to Anna University, Chennai, is the result of the
project carried out by me under the guidance and supervision of
Ms. S. KUMARI, M.E., Assistant Professor in the Department of Information
Technology. I further declare that this project report has not previously been
submitted by me or any other person to any other institution or university for
the award of any other degree or diploma.
Date:
Place: Chennai

( PERINBAN D )

( BALAJI M)

(GOPINATH D )

( HARIHARAN S J )

It is certified that this project has been prepared and submitted under my
guidance.

Date: 7-8-2021 Ms. S. KUMARI


Place: Chennai ( Assistant Professor / IT )

TABLE OF CONTENTS

CHAPTER NO   TITLE

             ABSTRACT
             LIST OF TABLES
             LIST OF FIGURES
             LIST OF ABBREVIATIONS

1            INTRODUCTION
             1.1 OVERVIEW OF THE PROJECT
             1.2 NEED OF THE PROJECT
             1.3 OBJECTIVE OF THE PROJECT
             1.4 SCOPE OF THE PROJECT

2            LITERATURE SURVEY
             2.1 EMOTION RECOGNITION
             2.2 EMOTION DETECTION
             2.3 RANKING SVM APPROACH
             2.4 LPC COEFFICIENT APPROACH
             2.5 FEASIBILITY STUDY

3            SYSTEM DESIGN
             3.1 PROPOSED SYSTEM ARCHITECTURE DESIGN
             3.2 DATA FLOW DIAGRAM
             3.3 UML DIAGRAMS
                 3.3.1 Use Case Diagram
                 3.3.2 Sequence Diagram
                 3.3.3 Class Diagram
                 3.3.4 Collaboration Diagram
                 3.3.5 Activity Diagram
             3.4 BLOCK DIAGRAM
             3.5 SPEECH EMOTION RECOGNITION

4            MODULE DESIGN
             4.1 SPEECH PROCESSING MODULE
             4.2 PRE-PROCESSING MODULE
             4.3 FEATURE EXTRACTION MODULE
             4.4 CLASSIFIER MODULE
             4.5 EMOTION DETECTION MODULE

5            REQUIREMENT SPECIFICATION
             5.1 HARDWARE REQUIREMENTS
             5.2 SOFTWARE REQUIREMENTS
                 5.2.1 Introduction to Python
                 5.2.2 Integrated Development Environment
                 5.2.3 Python Libraries

6            IMPLEMENTATION
             6.1 SAMPLE CODE
             6.2 DATA SETS
             6.3 SAMPLE SCREENSHOTS

7            TESTING AND MAINTENANCE
             7.1 TESTING
                 7.1.1 System Testing
             7.2 TEST CASES
             7.3 TEST DATA AND OUTPUT
                 7.3.1 Unit Testing
                 7.3.2 Functional Tests
                 7.3.3 Integration Testing
             7.4 TESTING TECHNIQUES
                 7.4.1 White Box Testing
                 7.4.2 Black Box Testing
             7.5 MAINTENANCE

8            CONCLUSION AND FUTURE WORKS
             8.1 CONCLUSION
             8.2 FUTURE ENHANCEMENTS

             REFERENCES
ABSTRACT

Speech is among the most natural ways for human beings to express themselves.
Because we depend on it so heavily, we recognise its importance when we resort to
other forms of communication such as emails and text messages, where we often use
emojis and expressive fonts to convey the emotions associated with the message.
Emotions play a vital role in human communication, so their detection and analysis
are of vital importance in today's digital world of remote communication. Emotion
detection is a challenging task because emotions are subjective; there is no common
consensus on how to measure or categorise them. We define a speech emotion
recognition system as a collection of methodologies that process and classify speech
signals to detect the emotions embedded in them. Such a system can find use in a
wide variety of application areas such as interactive voice-based assistants or
caller-agent conversation analysis. In this study we attempt to detect the underlying
emotions in recorded speech by analysing the acoustic features of the audio
recordings. Emotion is an integral part of human behaviour and an inherent property
of every mode of communication. Humans are well trained, through experience, in
recognising various emotions, which makes us more sensitive and understanding. A
machine, however, can easily understand content-based information such as the
information in text, audio or video, but it still falls far behind in accessing the deeper
meaning behind that content.
LIST OF TABLES

TABLE NO   TITLE OF THE TABLE

7.2        Test Case for Speech Emotion Recognition
LIST OF FIGURES

FIGURE NO   TITLE OF THE FIGURE

3.1         Proposed System Architecture Design
3.2         Data Flow Diagram
3.3         Use Case Diagram
3.4         Sequence Diagram
3.5         Class Diagram
3.6         Collaboration Diagram
3.7         Activity Diagram
3.8         Block Diagram
3.9         Flow of Process
3.10        MFCC
4.1         Speech Module
4.2         Pre-processing Module
4.3         Feature Extraction Module
4.4         Classifier Module
4.5         Detection Module
6.1         Voice Input
6.2         Voice Captured
6.3         Gender Specification
6.4         Voice Analyzed
6.5         Graphical Result
7.1         Levels of Testing
LIST OF ABBREVIATIONS

ACRONYM    MEANING

MFCC       MEL FREQUENCY CEPSTRAL COEFFICIENTS
LPCC       LINEAR PREDICTIVE CEPSTRAL COEFFICIENTS
JS         JAVASCRIPT
API        APPLICATION PROGRAMMING INTERFACE
PLP        PERCEPTUAL LINEAR PREDICTIVE COEFFICIENTS
UML        UNIFIED MODELING LANGUAGE
OOAD       OBJECT ORIENTED ANALYSIS AND DESIGN
TC         TEST CASE
LPC        LINEAR PREDICTIVE CODING
FFT        FAST FOURIER TRANSFORM
DCT        DISCRETE COSINE TRANSFORM
DFT        DISCRETE FOURIER TRANSFORM
RASTA      RELATIVE SPECTRA FILTERING
IDE        INTEGRATED DEVELOPMENT ENVIRONMENT
IEMOCAP    INTERACTIVE EMOTIONAL DYADIC MOTION CAPTURE
DBMS       DATABASE MANAGEMENT SYSTEM
GB         GIGABYTE
MB         MEGABYTE
CHAPTER 1

INTRODUCTION

1.1 Overview Of The Project

Speech emotion recognition is a challenging task, and extensive reliance has been
placed on models that use audio features in building well-performing classifiers. In this
project, we propose a novel deep dual recurrent encoder model that utilizes text data and
audio signals simultaneously to obtain a better understanding of speech data. As
emotional dialogue is composed of sound and spoken content, our model encodes the
information from audio and text sequences using dual recurrent neural networks (RNNs)
and then combines the information from these sources to predict the emotion class. This
architecture analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that focus on
audio features. Extensive experiments are conducted to investigate the efficacy and
properties of the proposed model. Our proposed model outperforms previous state-of-
the-art methods in assigning data to one of four emotion categories (i.e., angry, happy,
sad and neutral).
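
The dual recurrent encoder idea described here can be illustrated with a short Keras
sketch. This is only a sketch and not the exact architecture of the proposed model:
the GRU cells, layer sizes, sequence lengths and vocabulary size below are
illustrative assumptions.

# Illustrative dual-encoder sketch: one RNN encodes the MFCC frame sequence,
# another encodes the transcript token sequence, and the two encodings are
# concatenated before a softmax over four emotion classes.
# All dimensions are assumptions chosen for readability.
import tensorflow as tf
from tensorflow.keras import layers, Model

audio_in = layers.Input(shape=(100, 39), name="mfcc_frames")   # 100 frames x 39 MFCCs
audio_enc = layers.GRU(128, name="audio_encoder")(audio_in)

text_in = layers.Input(shape=(50,), name="token_ids")          # up to 50 tokens
text_emb = layers.Embedding(input_dim=5000, output_dim=128)(text_in)
text_enc = layers.GRU(128, name="text_encoder")(text_emb)

merged = layers.concatenate([audio_enc, text_enc])
output = layers.Dense(4, activation="softmax", name="emotion")(merged)

model = Model(inputs=[audio_in, text_in], outputs=output)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()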

Furthermore, the representation of emotions can be done in two ways:

 Discrete Classification: Classifying emotions in discrete labels like anger,


happiness, boredom, etc.

 Dimensional Representation: Representing emotions with dimensions such as


Valence (on a negative to positive scale), Activation or Energy (on a low to
high scale) and Dominance (on an active to passive scale).
Both these approaches have their pros and cons. The dimensional approach is more
elaborate and gives more context to prediction but it is harder to implement and there is a
lack of annotated audio data in a dimensional format. We have used the discrete
classification approach in the current study for lack of dimensionally annotated data in
the public domain.

This chapter presents a comparative study of speech emotion recognition


(SER) systems. Theoretical definition, categorization of affective state and the modalities
of emotion expression are presented. To achieve this study, an SER system, based on
different classifiers and different methods for features extraction, is developed. Mel-
frequency cepstrum coefficients (MFCC) and modulation spectral (MS) features are
extracted from the speech signals and used to train different classifiers. Feature selection
(FS) was applied in order to seek the most relevant feature subset.

The categorization of emotions has long been a hot subject of debate in different
fields of psychology, affective science, and emotion research. It is mainly based on two
popular approaches: categorical (termed discrete) and dimensional (termed continuous).
In the first approach, emotions are described with a discrete number of classes. Many
theorists have conducted studies to determine which emotions are basic. The most popular
example is a list of six basic emotions, which are anger, disgust, fear, happiness, sadness,
and surprise.

1.2 Need Of The Project

Communication is the key to express oneself. Humans use most part of their
body and voice to effectively communicate. Hand gestures, body language, and the
tone and temperament are all collectively used to express one’s feeling. Though the
verbal part of the communication varies by languages practiced across the globe, the
non-verbal part of communication is the expression of feeling which is most likely
common among all. Therefore, any advanced technology developed to produce a
social environment experience also covers understanding emotional context in
speech.
So to overcome these problems recognition of emotion of the speech is necessary.
In developing emotionally aware intelligence, the very first step is building robust
emotion classifiers that display good performance regardless of the application; this
outcome is considered to be one of the fundamental research goals in affective
computing. In particular, the speech emotion recognition task is one of the most
important problems in the field of paralinguistics. This field has recently broadened
its applications, as it is a crucial factor in optimal human-computer interactions,
including dialog systems. The goal of speech emotion recognition is to predict the
emotional content of speech and to classify speech according to one of several
labels (i.e., happy, sad, neutral, and angry). Two main obstacles make this difficult.
First, insufficient data are available for training complex neural network-based
models, owing to the costs associated with human involvement. Second, the
characteristics of emotions must be learned from low-level speech signals, and
feature-based models display limited skill when applied to this problem. To
overcome these limitations, we propose a model that uses high-level text
transcription, as well as low-level audio signals, to utilize the information contained
within low-resource datasets to a greater degree; given recent improvements in
automatic speech recognition (ASR) technology, such transcriptions can be obtained
with reasonable accuracy.

1.3 Objective Of The Project

 There are three classes of features in a speech namely, the lexical features (the
vocabulary used), the visual features (the expressions the speaker makes) and the
acoustic features (sound properties like pitch, tone, jitter, etc.).

 The problem of speech emotion recognition can be solved by analysing one or more
of these features. Choosing to follow the lexical features would require a transcript of
the speech which would further require an additional step of text extraction from
speech if one wants to predict emotions from real-time audio. Similarly, going
forward with analysing visual features would require access to the video of the
conversations, which might not be feasible in every case, while the analysis of the
acoustic features can be done in real time while the conversation is taking place, as
we would just need the audio data for accomplishing our task. Hence, we choose to
analyse the acoustic features in this work.

The field of study is termed as Speech Processing and consists of three


components:

 Speaker Identification
 Speech Recognition
 Speech Emotion Detection

Speech Emotion Detection is the most challenging of these components to
implement due to its complexity. Furthermore, the definition of an intelligent
computer system requires the system to mimic human behavior. A striking nature
unique to humans is the ability to alter conversations based on the emotional state of
the speaker and the listener. This project discusses in detail the various methods and
experiments carried out as part of implementing a Speech Emotion Recognition
system.

1.4 Scope Of The Project

The scope of our approach to emotion recognition in naturally occurring speech is
as follows. An emotion, one out of a designated set of emotions, is identified with each
unit of language (word, phrase or utterance) that was spoken, with the precise start of
each such unit determined in the continuous acoustic signal. Using these start points,
equal-length segments of the acoustic signal are demarcated, producing a set of emotion-
coded tokens. With a sufficient number of acoustic-signal segments coded for emotions
in this way, it is possible to use machine learning to detect what, in the acoustic signal,
differentiates the times an utterance is spoken when one emotion is being expressed as
opposed to another. The extent to which the emotions are successfully recognized
corresponds to how successfully the acoustic-signal segments are classified by a
machine learning algorithm as belonging to one or another of the emotions.

CHAPTER 2
LITERATURE SURVEY

2.1 EMOTION RECOGNITION


Classical machine learning algorithms, such as hidden Markov models
(HMMs), support vector machines (SVMs), and decision tree-based methods,
have been employed in speech emotion recognition problems. Recently,
researchers have proposed various neural network-based architectures to
improve the performance of speech emotion recognition. An initial study
utilized deep neural networks (DNNs) to extract high-level features from raw
audio data and demonstrated their effectiveness in speech emotion recognition.
With the advancement of deep learning methods, more complex neural-based
architectures have been proposed. Convolutional neural network (CNN)-based
models have been trained on information derived from raw audio signals using
spectrograms or audio features such as Mel-frequency cepstral coefficients
(MFCCs) and low-level descriptors (LLDs). These neural network-based
models have been combined to produce higher-complexity models, and such
models achieved the best-recorded performance when applied to the IEMOCAP
dataset. Another line of research has focused on adopting variant machine
learning techniques combined with neural network-based models. One researcher
utilized a multi-task learning approach and used gender and naturalness as
auxiliary tasks so that the neural network-based model learned more features
from the dataset. Another researcher investigated transfer learning methods,
leveraging external data from related domains. As emotional dialogue is
composed of sound and spoken content, researchers have also investigated the
combination of acoustic features and language information, built belief
network-based methods of identifying emotional key phrases, and assessed the
emotional salience of verbal cues from both phoneme sequences and words.
However, none of these studies have utilized information from speech signals
and text sequences simultaneously in an end-to-end learning neural network-
based model to classify emotions.

2.2 EMOTION DETECTION

The speech emotion detection system is implemented as a Machine


Learning (ML) model. The steps of implementation are comparable to any other
ML project, with additional fine-tuning procedures to make the model function
better. The flowchart represents a pictorial overview of the process. The first
step is data collection, which is of prime importance. The model being
developed will learn from the data provided to it, and all the decisions and
results that the developed model produces are guided by the data. The second
step, called feature engineering, is a collection of several machine learning tasks
that are executed over the collected data. These procedures address several
data representation and data quality issues.

The third step is often considered the core of an ML project where an


algorithm-based model is developed. This model uses an ML algorithm to
learn about the data and train itself to respond to any new data it is exposed to.
The final step is to evaluate the functioning of the built model. Very often,
developers repeat the steps of developing a model and evaluating it to compare
the performance of different algorithms. Comparison results help to choose the
appropriate ML algorithm most relevant to the problem.

2.3 RANKING SVM APPROACH

This proposed system considered that the emotions expressed by humans are
mostly the result of mixed feelings. Therefore, the authors suggested an
improvement over the SVM algorithm that would consider the mixed signals and
choose the most dominant one. For this purpose, a ranking SVM algorithm was
chosen. The ranking SVM takes the predictions from individual binary
classification SVM classifiers, also called rankers, and applies them to the final
multi-class problem. Using the ranking SVM algorithm, an accuracy of 44.40%
was achieved in their system.

2.4 LPC COEFFICIENT APPROACH

In the Nwe et al. [9] system, a subset of features, similar to the Mel Frequency
Cepstral Coefficients (MFCC), was used. They used the Log Frequency Power
Coefficients (LFPC) with a Hidden Markov Model (HMM) to classify emotions
in speech. Their work is not publicly available, as they used a dataset privately
available to them. However, they claim that using the LFPC coefficients over the
MFCC coefficients shows a significant improvement in terms of the accuracy
of the model. The average classification accuracy in their model is 69%.

2.5 FEASIBILITY STUDY

Existing natural emotional speech datasets each have their own
limitations. Some have a wide range of emotions, which creates difficulties for
machine-learning models. Others have only a small number of emotions or
several emotions dominated by negative or “other” emotions. Higher
recognition rates have, not surprisingly, been obtained on datasets with only two
emotions. The best two-class recognition result achieved was 97.6%, and it was
for unbalanced datasets from call-center data (Lee and Narayanan, 2003). This
work used a fuzzy inference classifier and the 10 best features selected from 21
utterance-level summary statistic features. The best recognition rate for three
emotions was 93%, and it was achieved for Swedish-language telephone
service data using Gaussian Mixture Models (GMMs) over all frames of an
utterance (Neiberg et al., 2006). For multiple-emotion recognition, an average
recognition rate of 68% was obtained for five emotions using the stock-exchange
dataset (Devillers et al., 2002). A balanced dataset was used for testing but not
for training, and lexical cues were included in the analysis. A recognition rate of
59.8% was achieved for four emotions in the CEMO corpus (Devillers and
Vasilescu, 2006), with lexical cues again included in the analysis. Using the
“How May I Help You” dataset and four groups of features – lexical, prosodic,
dialog-act, and contextual – the recognition rate achieved for seven emotions was
79%. However, 73.1% of the instances were labeled as non-negative in the
dataset, producing a recognition baseline of 73.1% for random guessing
(Liscombe et al., 2005).

CHAPTER 3

SYSTEM DESIGN

3.1 Proposed System Architecture Design


Users do not need to register in this application in order to use the Speech
Emotion Recognition system, so a user can start speaking whenever he or she is
ready. The surrounding environment is very important, because background
noise may push the output into error. Before speaking, the user has to mention
whether the speaker is male or female; then he or she can start speaking. When
there is a long gap after clicking the speak button, the system treats it as the end
of the input and starts the pre-processing stage, where it removes noise and
balances the frequency with the help of pre-emphasis and equalization. After
that, the noise-removed text is compared with the datasets, which are customized
by us. If an equivalent for the text is found, the result is sent back; otherwise the
system displays that it cannot predict the emotion. If the word is found, the
equivalent emotion is displayed as a result in a graphical manner. This system
is very simple and produces good results as well. The user has to speak in the
only language that is available, English, and also has to speak very clearly so
that the system can understand it better.


Fig 3.1 Architecture Diagram

3.2 Data Flow Diagram for Proposed System

A data flow diagram (DFD) is a graphical representation of the “flow” of


data through an information system, modelling its process aspects. A DFD is
often used as a preliminary step to create an overview of the system without
going into great detail, which can later be elaborated. DFDs can also be used for
the visualization of data processing (structured design).
A DFD shows what kind of information will be input to and output
from the system, how the data will advance through the system, and where the
data will be stored. It does not show information about the timing of processes or
information about whether processes will operate in sequence or in parallel,
unlike a flowchart, which also shows this information.
Data flow diagrams are also known as bubble charts. DFD is a designing
tool used in the top-down approach to Systems Design. This context-level DFD
is next "exploded", to produce a Level 1 DFD that shows some of the detail of
the system being modeled. The Level 1 DFD shows how the system is divided
into sub-systems (processes), each of which deals with one or more of the data
flows to or from an external agent, and which together provide all of the
functionality of the system as a whole. It also identifies internal data stores that
must be present in order for the system to do its job, and shows the flow of data
between the various parts of the system.
Data flow diagrams are one of the three essential perspectives of the structured-
systems analysis and design method SSADM. The sponsor of a project and the
end users will need to be briefed and consulted throughout all stages of a
system's evolution. With a data flow diagram, users are able to visualize how
the system will operate, what the system will accomplish, and how the system
will be implemented.

LEVEL 0: User → Gender

LEVEL 1: User → Gender → Voice Input → Pre-Processing

LEVEL 2: User → Gender → Voice Input → Pre-Processing → Feature Extraction →
Datasets → Graphical Result

Fig 3.2 Data flow diagram

3.3 UML Diagram


Unified Modeling Language (UML) is a standardized general-purpose
modeling language in the field of software engineering. The standard is
managed and was created by the Object Management Group. UML includes a
set of graphic notation techniques to create visual models of software intensive
systems. This language is used to specify, visualize, modify, construct and
document the artifacts of an object oriented software intensive system under
development.

3.3.1 USECASE DIAGRAM


Use case diagrams give an overview of the usage requirements of a system. They are
useful for presentations to management and/or project stakeholders, but for
actual development you will find that use cases provide significantly more
value because they describe the intent of the actual requirements. A use case
describes a sequence of actions that provides something of measurable value to
an actor and is drawn as a horizontal ellipse.

Fig 3.3 Use case Diagram


3.3.2 SEQUENCE DIAGRAM
Sequence diagrams model the flow of logic within your system in a visual
manner, enabling you both to document and validate your logic, and they are
commonly used for both analysis and design purposes. Sequence diagrams are the
most popular UML artifact for dynamic modelling, which focuses on identifying
the behaviour within your system.

We can also use the terms event diagrams or event scenarios to refer to
a sequence diagram. Sequence diagrams describe how and in what order the
objects in a system function.

Fig 3.4 Sequence Diagram

3.3.3 CLASS DIAGRAM


In software engineering, a class diagram in the Unified Modelling
Language (UML) is a type of static structure diagram that describes the
structure of a system by showing the system's classes, their attributes,
operations (or methods), and the relationships among the classes. It explains
which class contains information. A class diagram describes the attributes and
operations of a class and also the constraints imposed on the system. Class
diagrams are widely used in the modeling of object-oriented systems because
they are the only UML diagrams that can be mapped directly to object-
oriented languages.

Fig 3.5 Class Diagram

3.3.4 COLLABORATION DIAGRAM


Another type of interaction diagram is the collaboration diagram. A
collaboration diagram represents a collaboration, which is a set of objects
related in a particular context, and an interaction, which is a set of messages
exchanged among the objects within the collaboration to achieve a desired
outcome.

Fig 3.6 Collaboration Diagram

3.3.5 ACTIVITY DIAGRAM


The activity diagram is a graphical representation of workflows of stepwise
activities and actions with support for choice, iteration and concurrency.
Activity diagrams can be used to describe the business and operational step-by-
step workflows of components in a system. An activity diagram consists of an
initial node, an activity final node and activities in between. An activity diagram
is a behavioral diagram, i.e. it depicts the behavior of a system. An activity
diagram portrays the control flow from a start point to a finish point, showing
the various decision paths that exist while the activity is being executed.

Fig 3.7 Activity Diagram

3.4 BLOCK DIAGRAM FOR PROPOSED SYSTEM

Fig 3.8 Block Diagram

3.5 SPEECH EMOTION RECOGNITION:


3.5.1 Speech emotion recognition

SER is essentially a pattern recognition system; the stages that are present in a
pattern recognition system are also present in a speech emotion recognition
system. The speech emotion recognition system contains five main modules:
emotional speech input, feature extraction, feature selection, classification, and
recognized emotional output [2].

The structure of the speech emotion recognition system is shown in the flow of
process diagram (Fig 3.9). The need to find a set of significant emotions to be
classified by an automatic emotion recognizer is a main concern in a speech
emotion recognition system. A typical set of emotions contains 300 emotional
states, so classifying such a great number of emotions is very complicated.
According to the "palette theory", any emotion can be decomposed into primary
emotions, similar to the way that any color is a combination of some basic
colors. The primary emotions are anger, disgust, fear, joy, sadness and surprise
[1]. The evaluation of the speech emotion recognition system is based on the
level of naturalness of the database which is used as its input. If an inferior
database is used as input to the system, then incorrect conclusions may be
drawn. The database used as input to the speech emotion recognition system
may contain real-world emotions or acted ones. It is more practical to use a
database that is collected from real-life situations [1].

Fig 3.9 Flow of Process

The speech emotion recognition system shares the stages of a pattern
recognition system, which makes the two essentially the same [22]. Patterns of
derived speech features such as energy, MFCC and pitch are mapped to emotions
using various classifiers. The system consists of the following main modules:
 Speech input: The input to the system is speech captured with the help of a
microphone. An equivalent digital representation of the received audio is then
produced through the PC sound card.
 Feature extraction and selection: There are around 300 emotional states, and
emotion relevance is used to select the extracted speech features. The whole
procedure, from speech feature extraction to selection of the features
corresponding to emotions, revolves around the speech signal.
 Classification: Finding a set of significant emotions for classification is the
main concern in a speech emotion recognition system. A typical set of emotions
contains 300 emotional states, which makes classification a complicated task.
 Recognized emotional output: Fear, surprise, anger, joy, disgust and sadness
are the primary emotions, and the level of naturalness of the database is the
basis for evaluating the speech emotion recognition system.

LIST OF MODULES:
1. Voice Input
In this module, the user has to speak into the microphone after pressing the
speak button, and the system starts receiving the user's voice.
2. Voice To Text
In the second module, after the voice is received, MFCC, LPCC and PLP
feature extraction is performed on the voice to ensure normal audible
frequencies. The voice is then converted to text with the help of the Google
Speech-to-Text API (a sketch of this step is shown below).
3. Analyzing Extracted Text
In the third module, the result of the previous module, i.e. the converted
text, is analyzed against the customized datasets.
4. Graphical Result
In the final module, after comparing the text with the datasets, a graphical
result is displayed showing whether the emotion is anger, happiness, neutral, etc.
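
The voice-to-text step in module 2 can be prototyped with the speech_recognition
Python package, which wraps the free Google Web Speech API. This is only a
minimal sketch of that step; the package and the calls shown are not taken from the
report's own code.

# Minimal sketch of the voice-to-text step using the speech_recognition
# package (which calls the Google Web Speech API). This is an illustration,
# not the report's actual implementation.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)   # rough background-noise compensation
    print("Speak now...")
    audio = recognizer.listen(source)             # a long pause ends the recording

try:
    text = recognizer.recognize_google(audio, language="en-IN")
    print("Recognized text:", text)
except sr.UnknownValueError:
    print("Can't predict the emotion: the speech was not understood.")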

TYPES OF SPEECH:
On the basis of the kind of speech they are able to recognize, speech recognition
systems can be separated into different classes. The classification is as follows:
 Isolated words: In this type of recognizer, the sample window contains a
low-pitch (quiet) region on both sides of the utterance. Only a single word or
utterance is accepted at a time, and the speaker needs to wait between
utterances, as these systems have listen/non-listen states. "Isolated utterance"
is a better name for this class.
 Connected words: Here separate utterances can run together with minimal
pause between them; otherwise it is similar to isolated words.
 Continuous words: This allows users to speak naturally, and the content is
determined by the computer. Creating recognizers with continuous speech
capabilities is difficult because utterance boundaries must be determined using
special methods.
 Spontaneous words: This can be thought of as speech at a basic level that is
natural-sounding and not rehearsed. An ASR system with spontaneous speech
capability should be able to handle a variety of natural speech features.

3.5.2 FEATURES EXTRACTION


Extraction of relevant emotional features from speech is the second
important step in emotion recognition. There is no unique way to classify
features, but acoustic and linguistic feature taxonomies are usually considered
separately; another distinction arises from the extreme differences in their
extraction methods and the databases used. Linguistic features gain importance
in the case of spontaneous or real-life speech, while on the other hand they lose
their value in the case of acted speech. Earlier, only a small set of features was
used, but now a larger number of functional and acoustic features are in use to
extract very large feature vectors.
Speech is a varying sound signal. Humans are capable of making modifications
to the sound signal using their vocal tract, tongue, and teeth to pronounce the
phoneme. The features are a way to quantify data. A better representation of the
speech signals to get the most information from the speech is through extracting
features common among speech signals. Some characteristics of good features
include [14]:
 The features should be independent of each other. Most features in the feature
vector are correlated to each other. Therefore it is crucial to select a subset of
features that are individual and independent of each other.

 The features should be informative to the context. Only those features that are
more descriptive about the emotional content are to be selected for further
analysis.
 The features should be consistent across all data samples. Features that are
unique and specific to certain data samples should be avoided.
 The values of the features should be processed. The initial feature selection
process can result in a raw feature vector that is unmanageable. The process of
Feature Engineering will remove any outliers, missing values, and null values.
The features in a speech percept that is relevant to the emotional content can be
grouped into two main categories:
1. Prosodic features
2. Phonetic features.
The prosodic features are the energy, pitch, tempo, loudness, formant, and
intensity. The phonetic features are mostly related to the pronunciation of the
words based on the language. Therefore for the purpose of emotion detection,
the analysis is performed on the prosodic features or a combination of them.
Mostly the pitch and loudness are the features that are very relevant to the
emotional content.
To extract speech information from audio signals, we use MFCC values, which
are widely used in analyzing audio signals. The MFCC feature set contains a
total of 39 features, which include 12 MFCC parameters (1-12) from the 26
Mel-frequency bands plus a log-energy parameter, 13 delta and 13 acceleration
coefficients. The frame size is set to 25 ms at a rate of 10 ms with the Hamming
window function. According to the length of each wave file, the number of
sequential steps of the MFCC features varies. To extract additional information
from the data, we also use prosodic features, which have shown effectiveness in
affective computing. The prosodic features are composed of 35 features, which
include the F0 frequency, the voicing probability, and the loudness contours.
All of these MFCC and prosodic features are extracted from the data using the
openSMILE toolkit.
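
The openSMILE toolkit also ships a Python wrapper (the opensmile package); the
snippet below is a hedged sketch of how frame-level low-level descriptors could be
pulled from a wave file with it. The package, the chosen feature set and the file
name are assumptions, not code from the report.

# Hedged sketch: extracting frame-level low-level descriptors (including
# MFCCs and prosodic contours) with the opensmile Python package.
# The package, the feature set and "speech.wav" are assumptions; the report
# itself only names the openSMILE toolkit.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.LowLevelDescriptors,
)
llds = smile.process_file("speech.wav")   # pandas DataFrame, one row per frame
print(llds.shape, list(llds.columns)[:5])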

FEATURE EXTRACTION MECHANISMS:


Acoustic features: Acoustic features derived from speech processing are
characterized by large sets of statistical measures of energy, duration and pitch
[33]. The involuntary and voluntary acoustic variation that humans use to mark
particular items in speech is the basis for emotion recognition with acoustic
features. Duration features can be represented by measuring, in seconds, the
energy, pitch or the duration of voiced and unvoiced segments, and by applying
different types of normalisation. The duration of higher phonological units such
as phonemes, syllables, words, utterances or pauses can also be represented.
 Linguistic features: The words we choose and the grammatical alternations we
make play an important role in reflecting our emotional state. Bag-of-Words and
N-Grams are two prime methods among the existing techniques used for this
analysis. A probabilistic language model is used to predict the next item in a
given sequence, and N-grams provide a numerical representation of texts in
automatic document categorisation. Before applying these techniques, it is
useful to reduce the complexity of the speech by eliminating irrelevant words
and stop words that do not reach a general minimum frequency of occurrence.
Non-linguistic vocalisations such as cries, laughs and sighs can be integrated
into the vocabulary.
 Functionals: After extraction of the low-level descriptors (LLDs), a number of
functionals and operators are applied, and an equal-size feature vector is
obtained from each base contour [18]. Using one feature vector per word gives
a constant number of elements and provides normalization over time, so the
vectors are ready to be modelled by a static classifier. Before applying the
functionals, an LLD can be transformed or altered, as for linguistic features.
Examples of functional features are peaks (distance, number), the first four
moments (mean, standard deviation, skewness and kurtosis), segments
(duration, number) and extreme values (max, min, range).
 Feature selection: Feature selection chooses, from a larger set of redundant or
irrelevant features, a subset of features that describes the phenomenon. Feature
selection is done to improve the accuracy and performance of the classifier [20].
Wrapper-based selection methods are the generally used approaches; they
employ the accuracy of the target classifier as the optimization criterion in a
closed-loop fashion [26], and features with poor performance are neglected.
Hill climbing with sequential forward search, starting from an empty set and
sequentially adding features, is a commonly chosen procedure; the selected
features give an improvement in performance. The second general approach is
the use of filter methods, which ignore the effects of the selected feature subset.
The difference between the reduced feature sets obtained from acted and
non-acted emotions is very small.
There are a number of methods for feature extraction, such as linear predictive
cepstral coefficients (LPCC), power spectral analysis (FFT), first-order
derivative (DELTA), linear predictive analysis (LPC), Mel scale cepstral
analysis (MEL), perceptual linear predictive coefficients (PLP) and relative
spectra filtering of log domain coefficients (RASTA).
 Linear predictive coding (LPC): The LPC method is useful for encoding
quality speech at a low bit rate and is one of the most powerful techniques of
speech analysis. The basic idea behind linear predictive analysis is that a
specific speech sample at the current time can be approximated as a linear
combination of past speech samples. It is a model based on human speech
production that utilizes a conventional source-filter model. The lip radiation,
vocal tract and glottal transfer functions are integrated into one all-pole filter
that simulates the acoustics of the vocal tract. LPC minimizes the sum of
squared differences between the original and estimated speech signal over a
finite duration, which yields a unique set of predictor coefficients. The actual
predictor coefficients are not used in recognition, since they show high
variance; instead, they are transformed into a more robust set of parameters,
the cepstral coefficients. Some of the types of LPC are residual excitation,
regular pulse excited, pitch excitation, voice excitation and code-excited LPC.
Mel frequency cepstral coefficients (MFCC): This is considered one of the
standard methods for feature extraction, and in ASR the use of around 20 MFCC
coefficients is most common, although 10-12 coefficients are sufficient for
coding speech. MFCC depends on the spectral form, which makes it more
sensitive to noise; this problem can be overcome by using more of the
information in the periodicity of speech signals, although aperiodic content is
also present in speech. MFCC represents the real cepstrum of the windowed
short-time fast Fourier transform (FFT) of the signal on a non-linear frequency
scale [21]. The MFCC audio feature extraction technique extracts parameters
similar to those humans use for hearing speech, while de-emphasizing other
information. The speech signal is divided into time frames containing an
arbitrary number of samples; in most systems overlapping from frame to frame
is used to smooth the transitions, and a Hamming window is then applied to
eliminate the discontinuities at the edges of each time frame.
Mel-frequency cepstral coefficients (MFCCs, [154]) are a parametric
representation of the speech signal, that is commonly used in automatic speech
recognition, but they have proved to be successful for other purposes as well,
among them speaker identification and emotion recognition. MFCCs are
calculated by applying a Mel-scale filter bank to the Fourier transform of a
windowed signal. Subsequently, a DCT (discrete cosine transform) transforms
the logarithmised spectrum into a cepstrum. The MFCCs are then the
amplitudes of the cepstrum. Usually, only the first 12 coefficients are used.
Through the mapping onto the Mel-scale, which is an adaptation of the Hertz-
scale for frequency to the human sense of hearing, MFCCs enable a signal
representation that is closer to human perception. MFCCs filter out pitch and
other influences in speech that are not linguistically relevant, hence they are
very suitable for speech recognition. Though this might be expected to make
them less useful for emotion recognition, they have nevertheless proved
effective for that task in practice.
Mel Frequency Cepstrum Coefficients (MFCC) features: A subset of the features
used for speech emotion detection is grouped under the category called Mel
Frequency Cepstrum Coefficients (MFCC) [16]. It can be explained as follows
(a small worked example follows this list):
 The word Mel represents the scale used in frequency vs. pitch measurement.
A value measured on the frequency scale (in Hz) can be converted to the Mel
scale using the formula m = 2595 log10(1 + f/700).
 The word Cepstrum represents the Fourier transform of the log spectrum of
the speech signal.
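
As a quick check of the Mel-scale formula above, the snippet below converts a few
frequencies to Mel values using plain NumPy; the frequencies chosen are arbitrary
examples.

# Worked example of the Mel-scale formula m = 2595 * log10(1 + f / 700).
import numpy as np

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

for f in (100.0, 440.0, 1000.0, 4000.0):
    print(f"{f:7.1f} Hz -> {hz_to_mel(f):8.2f} mel")
# 1000 Hz maps to roughly 1000 mel, which is how the scale is anchored.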
 Perceptual linear prediction (PLP): Hermansky developed the PLP model,
which uses the psychophysics of hearing to model human speech. PLP improves
the speech recognition rate by discarding irrelevant information. The only thing
that makes PLP different from LPC is that its spectral characteristics are
transformed to match the human auditory system. The three main perceptual
aspects approximated by PLP are the intensity-loudness power-law relation, the
equal-loudness curve and the critical-band resolution curves.
 Mel scale cepstral analysis (MEL): PLP analysis and MEL analysis are similar
to each other in that psychophysically based spectral transformations are used to
modify the spectrum. In this method the spectrum is warped according to the
MEL scale, whereas in PLP the spectrum is warped according to the Bark scale.
The main difference between PLP and MEL scale cepstral analysis is therefore
in the output cepstral coefficients. In PLP the modified power spectrum is
smoothed using an all-pole model, and the output cepstral coefficients are then
computed on the basis of this model. In MEL scale cepstral analysis, on the
other hand, the modified power spectrum is smoothed using cepstral smoothing,
and the Discrete Fourier Transform (DFT) is used to transform the log power
spectrum directly into the cepstral domain.

 Relative spectra filtering (RASTA): RASTA filtering is provided by the
analysis library to compensate for linear channel distortions. It can be used
either in the cepstral or the log spectral domain, and in both of them linear
channel distortions appear as an additive constant. Each feature coefficient is
band-pass filtered by the RASTA filter; the high-pass portion of the equivalent
band-pass filter alleviates the effect of convolutional noise introduced in the
channel, and the low-pass filtering then smoothes frame-to-frame spectral
changes.
 Power spectral analysis (FFT): This is one of the more common techniques
for studying the speech signal; the power spectrum of the speech signal
describes the frequency content of the signal over time. The first step in
computing the power spectrum is the Discrete Fourier Transform (DFT) of the
speech signal, which computes the frequency information equivalent to the
time-domain signal. Since the speech signal consists of real-valued samples, the
Fast Fourier Transform (FFT) can be used to increase efficiency (see the short
sketch below).
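
A minimal NumPy sketch of this step is shown below; the sampling rate, the frame
position and the random stand-in signal are illustrative assumptions.

# Minimal sketch: power spectrum of one 25 ms Hamming-windowed frame using
# the real-input FFT. The sampling rate and the stand-in signal are assumed.
import numpy as np

fs = 16000                                   # assumed sampling rate (Hz)
signal = np.random.randn(fs)                 # stand-in for 1 s of speech
frame_len = int(0.025 * fs)                  # 25 ms frame
frame = signal[:frame_len] * np.hamming(frame_len)

spectrum = np.fft.rfft(frame)                # real-input FFT
power = (np.abs(spectrum) ** 2) / frame_len  # periodogram estimate
freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
print(freqs.shape, power.shape)              # one power value per frequency bin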

Fig 3.10 MFCC

CHAPTER 4

MODULE DESIGN

4.1 SPEECH PROCESSING MODULE

Fig 4.1 Speech Module

 In module 1, the voice that will be processed must be given here.
 The user can start speaking after pressing the mic-like button.
 It is very important to specify the gender of the speaker, whether male or
female, before starting to speak.

4.2 PRE-PROCESSING MODULE

Fig 4.2 Pre-Processing Module

 In module 2, the pre-processing is completed.
 The pre-processing includes silence removal, pre-emphasis, normalization
and windowing, so it is an important phase for obtaining a clean signal for
the next stage (feature extraction).
 The discrimination between speech and music files is performed based on
a comparison of more than one statistical indicator, such as the mean,
standard deviation, energy and silence intervals.
 The speech signal usually includes many parts of silence. The silence
signal is not important because it does not contain information. There are
several methods to remove these parts, such as the zero crossing rate
(ZCR) and short-time energy (STE). The zero-crossing rate is a measure
of the number of times, in a given time interval, that the amplitude of the
speech signal passes through a value of zero. A small sketch of these two
measures is shown below.
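
The ZCR and STE measures mentioned above are straightforward to compute per
frame; the sketch below is only illustrative, and the frame length, hop size and
energy threshold are assumptions rather than values taken from the report.

# Illustrative frame-wise ZCR and short-time energy for simple silence
# removal. Frame length, hop size and the energy threshold are assumptions.
import numpy as np

def frame_signal(x, frame_len, hop):
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def zcr(frames):
    # fraction of sign changes per frame
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def short_time_energy(frames):
    return np.sum(frames.astype(float) ** 2, axis=1) / frames.shape[1]

fs = 16000
x = np.random.randn(fs)                     # stand-in for one second of audio
frames = frame_signal(x, int(0.025 * fs), int(0.010 * fs))
energy = short_time_energy(frames)
voiced = energy > 0.5 * np.median(energy)   # crude silence threshold (assumed)
print("kept", int(voiced.sum()), "of", len(frames), "frames;",
      "mean ZCR =", round(float(np.mean(zcr(frames))), 3))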

4.3 FEATURE EXTRACTION MODULE

Start → Pre-processing → Spectral Analysis → Feature Extraction → Feature
Selection → Classification → Output

Fig 4.3 Extraction Module

THE FEATURES COMPRISE:
Mel frequency cepstral coefficients (MFCC): This is considered one of the
standard methods for feature extraction, and in ASR the use of around 20 MFCC
coefficients is most common, although 10-12 coefficients are sufficient for
coding speech. MFCC depends on the spectral form, which makes it more
sensitive to noise; this can be mitigated by using more of the information in the
periodicity of the speech signal, although aperiodic content is also present in
speech. MFCC represents the real cepstrum of the windowed short-time fast
Fourier transform (FFT) of the signal on a non-linear frequency scale [21]. The
MFCC technique extracts parameters similar to those humans use for hearing
speech, while de-emphasizing other information. The speech signal is divided
into time frames containing an arbitrary number of samples; overlapping from
frame to frame is used to smooth the transitions, and a Hamming window is
applied to eliminate the discontinuities at the edges of each time frame. A
hedged sketch of MFCC extraction is given below.
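
For illustration only, MFCCs of the kind described here can be computed with the
librosa package. librosa, the file name and the parameter choices are assumptions
of this sketch; the report itself names pyAudioAnalysis and openSMILE for feature
extraction.

# Hedged sketch: 13 MFCCs per 25 ms frame with a 10 ms hop, using librosa,
# plus delta and acceleration coefficients for a 39-dimensional frame vector.
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)         # mono, 16 kHz (assumed file)
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
    window="hamming",
)
delta = librosa.feature.delta(mfcc)                  # first-order (delta) coefficients
delta2 = librosa.feature.delta(mfcc, order=2)        # acceleration coefficients
features = np.vstack([mfcc, delta, delta2])          # 39 rows, one column per frame
print(features.shape)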

4.4 CLASSIFIER MODULE

Fig 4.4 Classifier Module

 A set of 26 features was selected by statistical methods, and a Multilayer
Perceptron, Probabilistic Neural Networks and a Support Vector Machine
were used for emotion classification into seven classes: anger, happiness,
anxiety/fear, sadness, boredom, disgust and neutral.
 Energy and formants were evaluated in order to create a feature set
sufficient to discriminate between the seven emotions in acted speech. A
hedged classifier sketch follows.
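
The classifier stage could be prototyped with scikit-learn as below; the feature
matrix, the 26-dimensional feature size, the label encoding and all hyper-parameters
are placeholders, not the report's trained models.

# Hedged sketch of the classifier module with scikit-learn: an SVM and an
# MLP trained on per-utterance feature vectors. The random data and all
# hyper-parameters are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(700, 26))                    # stand-in feature vectors
y = rng.integers(0, 7, size=700)                  # 7 emotion classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))

for name, model in [("SVM", svm), ("MLP", mlp)]:
    model.fit(X_tr, y_tr)
    print(name, "accuracy:", round(model.score(X_te, y_te), 3))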

4.5 EMOTION DETECTION MODULE

Fig 4.5 Detection Module

 This is the last and final module of the system. Here the feature-extracted
audio is compared with our locally customized datasets.
 We have a large quantity of customized data to make sure that no
emotion is easily missed.
 After comparing the audio with the customized datasets, the most
suitable or best-matched emotion is found.
 The detected emotion is displayed to the user in an easily
understandable graphical format.

CHAPTER 5

REQUIREMENT SPECIFICATION

5.1 HARDWARE REQUIREMENTS

Workable inbuilt microphone

Processor : Intel Pentium processor

Hard disk : Minimum 25 GB

RAM : Minimum 2 GB

Monitor : 15 inch

5.2 SOFTWARE REQUIREMENTS

OS : WINDOWS 8 & ABOVE

INTEGRATED DEVELOPMENT

ENVIRONMENT : PYCHARM

5.2.1 INTRODUCTION TO PYTHON

Python is a popular programming language. It was created by Guido van


Rossum, and released in 1991.

It is used for:

 web development (server-side),


 software development,
 mathematics,

 system scripting.

FEATURES OF PYTHON:

 Python can be used on a server to create web applications.


 Python can be used alongside software to create workflows.
 Python can connect to database systems. It can also read and modify files.
 Python can be used to handle big data and perform complex mathematics.
 Python can be used for rapid prototyping, or for production-ready
software development.

WHY PYTHON IS USED

 Python works on different platforms (Windows, Mac, Linux, Raspberry


Pi, etc).
 Python has a simple syntax similar to the English language.
 Python has syntax that allows developers to write programs with fewer
lines than some other programming languages.
 Python runs on an interpreter system, meaning that code can be executed
as soon as it is written. This means that prototyping can be very quick.
 Python can be treated in a procedural way, an object-oriented way or a
functional way.

Good to know

 The most recent major version of Python is Python 3, which we use in
this project. However, Python 2, although not being updated with
anything other than security updates, is still quite popular.
 Python can be written in a plain text editor, but it is also possible to
write Python in an Integrated Development Environment, such as
Thonny, PyCharm, NetBeans or Eclipse, which are particularly useful
when managing larger collections of Python files.

Python Syntax compared to other programming languages

 Python was designed for readability, and has some similarities to the
English language with influence from mathematics.
 Python uses new lines to complete a command, as opposed to other
programming languages which often use semicolons or parentheses.
 Python relies on indentation, using whitespace, to define scope; such as
the scope of loops, functions and classes. Other programming languages
often use curly-brackets for this purpose.

5.2.2 INTEGRATED DEVELOPMENT ENVIRONMENT

 PYCHARM

PyCharm is an integrated development environment (IDE) used in computer


programming, specifically for the Python language. It is developed by
the Czech company JetBrains. It provides code analysis, a graphical debugger,
an integrated unit tester, integration with version control systems (VCSes), and
supports web development with Django as well as data
science with Anaconda.[6]

PyCharm is cross-platform, with Windows, macOS and Linux versions. The


Community Edition is released under the Apache License, and there is also
Professional Edition with extra features – released under a proprietary license.

FEATURES OF PYCHARM

 Coding assistance and analysis, with code completion, syntax and error
highlighting, linter integration, and quick fixes
 Project and code navigation: specialized project views, file structure views
and quick jumping between files, classes, methods and usages
 Python refactoring: includes rename, extract method, introduce variable,
introduce constant, pull up, push down and others
 Support for web frameworks: Django, web2py and Flask [professional
edition only][8]
 Integrated Python debugger
 Integrated unit testing, with line-by-line code coverage
 Google App Engine Python development
 Version control integration: unified user interface
for Mercurial, Git, Subversion, Perforce and CVS with change lists and
merge
 Support for scientific tools like matplotlib, numpy and scipy.

5.2.3 PYTHON LIBRARIES:

The next step after data collection was to represent these audio files
numerically, in order to perform further analysis on them. This step is called
feature extraction, where quantitative values for different features of the audio are
obtained. The pyAudioAnalysis library was used for this purpose. This Python
library provides functions for short-term feature extraction, with tunable
windowing parameters such as frame size and frame step. At the end of this
step, each audio file was represented as a row in a CSV file with 34 columns
representing the different features. Each feature has a range of values for
one audio file, obtained over the various frames in that audio signal.
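
A hedged sketch of how these short-term features might be extracted with
pyAudioAnalysis is shown below; the module path, file name and window sizes
follow the library's documented API and are assumptions, not code from the report.

# Hedged sketch: short-term feature extraction with pyAudioAnalysis.
# Module names follow recent versions of the library (older releases used
# audioFeatureExtraction.stFeatureExtraction); the file name and window
# sizes are illustrative assumptions.
from pyAudioAnalysis import audioBasicIO, ShortTermFeatures
import numpy as np

fs, signal = audioBasicIO.read_audio_file("speech.wav")
signal = audioBasicIO.stereo_to_mono(signal)

# 50 ms frames with a 25 ms step; returns a (num_features x num_frames) matrix
features, feature_names = ShortTermFeatures.feature_extraction(
    signal, fs, int(0.050 * fs), int(0.025 * fs))

# one row per file: e.g. the per-file mean of each short-term feature
row = np.mean(features, axis=1)
print(len(feature_names), "features per frame;", row.shape, "summary values")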

pyAudioAnalysis is an open-source Python library that provides a
wide range of audio-related functionalities, focusing on feature extraction,
classification, segmentation, and visualization.

The library depends on several other libraries which are:

 NumPy
 Matplotlib
 Keras
 TensorFlow
 hmmlearn
 Simplejson
 pydub

NUMPY:

NumPy is the fundamental package for scientific computing in Python. It is a


Python library that provides a multidimensional array object, various derived
objects (such as masked arrays and matrices), and an assortment of routines for
fast operations on arrays, including mathematical, logical, shape manipulation,
sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic
statistical operations, random simulation and much more.

At the core of the NumPy package is the ndarray object. This encapsulates n-dimensional arrays of homogeneous data types, with many operations being performed in compiled code for performance. There are several important differences between NumPy arrays and the standard Python sequences:

 NumPy arrays have a fixed size at creation, unlike Python lists (which can
grow dynamically). Changing the size of an ndarray will create a new
array and delete the original.
 The elements in a NumPy array are all required to be of the same data type,
and thus will be the same size in memory. The exception: one can have arrays
of (Python, including NumPy) objects, thereby allowing for arrays of
different sized elements.
 NumPy arrays facilitate advanced mathematical and other types of
operations on large numbers of data. Typically, such operations are
executed more efficiently and with less code than is possible using Python's
built-in sequences.
 A growing plethora of scientific and mathematical Python-based packages
are using NumPy arrays; though these typically support Python-sequence
input, they convert such input to NumPy arrays prior to processing, and they
often output NumPy arrays.
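
A minimal illustration of the points above (fixed size, a single data type, and fast whole-array operations), using only standard NumPy calls:

import numpy as np

# A NumPy array has a fixed size and a single data type.
frame_energies = np.array([0.12, 0.34, 0.29, 0.05], dtype=np.float64)
print(frame_energies.dtype, frame_energies.shape)    # float64 (4,)

# Whole-array (vectorised) operations replace explicit Python loops.
normalised = (frame_energies - frame_energies.mean()) / frame_energies.std()
print(normalised.round(2))

# "Resizing" actually builds a new array; the original array is unchanged.
extended = np.append(frame_energies, 0.41)
print(frame_energies.size, extended.size)            # 4 5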

MATPLOTLIB:

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Its key capabilities include:

 Develop publication quality plots with just a few lines of code


 Use interactive figures that can zoom, pan, update...
 Take full control of line styles, font properties, axes properties...
 Export and embed to a number of file formats and interactive
environments
 Explore tailored functionality provided by third party packages

 Learn more about Matplotlib through the many external learning
resources.
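
As a small, self-contained example in the spirit of how this project uses Matplotlib, the snippet below draws a bar chart similar to the graph.png produced by the scripts in Section 6.1. The emotion counts shown here are made-up values, not project results.

from collections import Counter
import matplotlib.pyplot as plt

# Hypothetical emotion counts of the kind produced by the lexicon-matching step.
emotion_counts = Counter({'happy': 4, 'sad': 2, 'fearful': 1})

fig, ax = plt.subplots()
ax.bar(emotion_counts.keys(), emotion_counts.values())
ax.set_ylabel('Number of matched words')
fig.autofmt_xdate()       # slant the x-axis labels, as the project scripts do
plt.savefig('graph.png')  # output file name mirrors the code in Section 6.1
plt.show()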

Keras:

Keras is a deep learning API written in Python, running on top of the machine
learning platform TensorFlow. It was developed with a focus on enabling fast
experimentation. Being able to go from idea to result as fast as possible is key
to doing good research.

Keras & TensorFlow 2

TensorFlow 2 is an end-to-end, open-source machine learning platform. You can


think of it as an infrastructure layer for differentiable programming. It
combines four key abilities:

 Efficiently executing low-level tensor operations on CPU, GPU, or TPU.


 Computing the gradient of arbitrary differentiable expressions.
 Scaling computation to many devices (e.g. the Summit supercomputer at
Oak Ridge National Lab, which spans 27,000 GPUs).
 Exporting programs ("graphs") to external runtimes such as servers,
browsers, mobile and embedded devices.

Keras is the high-level API of TensorFlow 2: an approachable, highly-productive


interface for solving machine learning problems, with a focus on modern deep
learning. It provides essential abstractions and building blocks for developing
and shipping machine learning solutions with high iteration velocity.
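
For completeness, the sketch below shows what a small Keras classifier over the 34 extracted features could look like. This is purely illustrative and is not the classifier used by the scripts in Section 6.1, which follow a lexicon-based approach; the layer sizes, the placeholder training data and the seven-class output (matching the TESS emotion categories) are assumptions made only for this example.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

NUM_FEATURES = 34   # one value per pyAudioAnalysis short-term feature
NUM_EMOTIONS = 7    # happy, sad, angry, surprise, fear, disgust, neutral

model = keras.Sequential([
    layers.Input(shape=(NUM_FEATURES,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(NUM_EMOTIONS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# x_train: (num_files, 34) feature matrix; y_train: integer emotion labels.
x_train = np.random.rand(100, NUM_FEATURES).astype("float32")   # placeholder data
y_train = np.random.randint(0, NUM_EMOTIONS, size=100)          # placeholder labels
model.fit(x_train, y_train, epochs=5, batch_size=16, verbose=0)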

CHAPTER 6

IMPLEMENTATION

6.1 SAMPLE CODE

WORKSPACE.XML

<?xml version="1.0" encoding="UTF-8"?>

<project version="4"><component name="ChangeListManager"><list


name="Default Changelist" comment="" id="e26abddc-b29c-45a3-99d2-
743fbf23056f" default="true"/><option name="SHOW_DIALOG"
value="false"/><option name="HIGHLIGHT_CONFLICTS"
value="true"/><option
name="HIGHLIGHT_NON_ACTIVE_CHANGELIST"
value="false"/><option name="LAST_RESOLUTION"
value="IGNORE"/></component><component
name="GitSEFilterConfiguration"><file-type-list><filtered-out-file-type
name="LOCAL_BRANCH"/><filtered-out-file-type
name="REMOTE_BRANCH"/><filtered-out-file-type
name="TAG"/><filtered-out-file-type
name="COMMIT_BY_MESSAGE"/></file-type-
list></component><component name="ProjectId"
id="1o1pAQtgqEDnww88pm1waIEHC46"/><component
name="ProjectViewState"><option name="hideEmptyMiddlePackages"
value="true"/><option name="showLibraryContents"
value="true"/></component><component
name="PropertiesComponent"><property
name="RunOnceActivity.OpenProjectViewOnStart"
value="true"/><property name="RunOnceActivity.ShowReadmeOnStart"

value="true"/><property name="last_opened_file_path"
value="$PROJECT_DIR$/testing.py"/><property
name="settings.editor.selected.configurable"
value="com.jetbrains.python.configuration.PyActiveSdkModuleConfigura
ble"/></component><component name="RecentsManager"><key
name="CopyFile.RECENT_KEYS"><recent name="D:\Projects\Sentiment-
Analysis-master"/></key></component><component name="RunManager"
selected="Python.testing"><configuration name="Speech rec tsmil "
nameIsGenerated="true" temporary="true" factoryName="Python"
type="PythonConfigurationType"><module name="Sentiment-Analysis-
master"/><option name="INTERPRETER_OPTIONS" value=""/><option
name="PARENT_ENVS" value="true"/><envs><env
name="PYTHONUNBUFFERED" value="1"/></envs><option
name="SDK_HOME" value=""/><option
name="WORKING_DIRECTORY"
value="$USER_HOME$/pythonProject"/><option
name="IS_MODULE_SDK" value="false"/><option
name="ADD_CONTENT_ROOTS" value="true"/><option
name="ADD_SOURCE_ROOTS" value="true"/><option
name="SCRIPT_NAME" value="$USER_HOME$/pythonProject/Speech
rec tsmil .py"/><option name="PARAMETERS" value=""/><option
name="SHOW_COMMAND_LINE" value="false"/><option
name="EMULATE_TERMINAL" value="false"/><option
name="MODULE_MODE" value="false"/><option
name="REDIRECT_INPUT" value="false"/><option name="INPUT_FILE"
value=""/><method v="2"/></configuration><configuration name="main -
Copy" nameIsGenerated="true" temporary="true" factoryName="Python"
type="PythonConfigurationType"><module name="Sentiment-Analysis-
master"/><option name="INTERPRETER_OPTIONS" value=""/><option

name="PARENT_ENVS" value="true"/><envs><env
name="PYTHONUNBUFFERED" value="1"/></envs><option
name="SDK_HOME" value=""/><option
name="WORKING_DIRECTORY" value="$PROJECT_DIR$"/><option
name="IS_MODULE_SDK" value="true"/><option
name="ADD_CONTENT_ROOTS" value="true"/><option
name="ADD_SOURCE_ROOTS" value="true"/><option
name="SCRIPT_NAME" value="$PROJECT_DIR$/main -
Copy.py"/><option name="PARAMETERS" value=""/><option
name="SHOW_COMMAND_LINE" value="false"/><option
name="EMULATE_TERMINAL" value="false"/><option
name="MODULE_MODE" value="false"/><option
name="REDIRECT_INPUT" value="false"/><option name="INPUT_FILE"
value=""/><method v="2"/></configuration><configuration name="main"
nameIsGenerated="true" temporary="true" factoryName="Python"
type="PythonConfigurationType"><module name="Sentiment-Analysis-
master"/><option name="INTERPRETER_OPTIONS" value=""/><option
name="PARENT_ENVS" value="true"/><envs><env
name="PYTHONUNBUFFERED" value="1"/></envs><option
name="SDK_HOME"
value="C:\Users\thiru\AppData\Local\Microsoft\WindowsApps\python3.7.
exe"/><option name="WORKING_DIRECTORY"
value="$PROJECT_DIR$"/><option name="IS_MODULE_SDK"
value="false"/><option name="ADD_CONTENT_ROOTS"
value="true"/><option name="ADD_SOURCE_ROOTS"
value="true"/><option name="SCRIPT_NAME"
value="$PROJECT_DIR$/main.py"/><option name="PARAMETERS"
value=""/><option name="SHOW_COMMAND_LINE"
value="false"/><option name="EMULATE_TERMINAL"

value="false"/><option name="MODULE_MODE" value="false"/><option
name="REDIRECT_INPUT" value="false"/><option name="INPUT_FILE"
value=""/><method v="2"/></configuration><configuration name="testing"
nameIsGenerated="true" temporary="true" factoryName="Python"
type="PythonConfigurationType"><module name="Sentiment-Analysis-
master"/><option name="INTERPRETER_OPTIONS" value=""/><option
name="PARENT_ENVS" value="true"/><envs><env
name="PYTHONUNBUFFERED" value="1"/></envs><option
name="SDK_HOME" value=""/><option
name="WORKING_DIRECTORY" value="$PROJECT_DIR$"/><option
name="IS_MODULE_SDK" value="true"/><option
name="ADD_CONTENT_ROOTS" value="true"/><option
name="ADD_SOURCE_ROOTS" value="true"/><option
name="SCRIPT_NAME" value="$PROJECT_DIR$/testing.py"/><option
name="PARAMETERS" value=""/><option
name="SHOW_COMMAND_LINE" value="false"/><option
name="EMULATE_TERMINAL" value="false"/><option
name="MODULE_MODE" value="false"/><option
name="REDIRECT_INPUT" value="false"/><option name="INPUT_FILE"
value=""/><method v="2"/></configuration><configuration name="train"
nameIsGenerated="true" temporary="true" factoryName="Python"
type="PythonConfigurationType"><module name="Sentiment-Analysis-
master"/><option name="INTERPRETER_OPTIONS" value=""/><option
name="PARENT_ENVS" value="true"/><envs><env
name="PYTHONUNBUFFERED" value="1"/></envs><option
name="SDK_HOME" value=""/><option
name="WORKING_DIRECTORY" value="$PROJECT_DIR$/../Face-
Recognition-Based-Attendance-System-master"/><option
name="IS_MODULE_SDK" value="false"/><option

name="ADD_CONTENT_ROOTS" value="true"/><option
name="ADD_SOURCE_ROOTS" value="true"/><option
name="SCRIPT_NAME" value="$PROJECT_DIR$/../Face-Recognition-
Based-Attendance-System-master/train.py"/><option
name="PARAMETERS" value=""/><option
name="SHOW_COMMAND_LINE" value="false"/><option
name="EMULATE_TERMINAL" value="false"/><option
name="MODULE_MODE" value="false"/><option
name="REDIRECT_INPUT" value="false"/><option name="INPUT_FILE"
value=""/><method v="2"/></configuration><recent_temporary><list><item
itemvalue="Python.testing"/><item itemvalue="Python.Speech rec tsmil
"/><item itemvalue="Python.main - Copy"/><item
itemvalue="Python.main"/><item
itemvalue="Python.train"/></list></recent_temporary></component><compo
nent name="SpellCheckerSettings" transferred="true"
UseSingleDictionary="true" DefaultDictionary="application-level"
CustomDictionaries="0" Folders="0" RuntimeDictionaries="0"/><component
name="TaskManager"><task id="Default" summary="Default task"
active="true"><changelist name="Default Changelist" comment=""
id="e26abddc-b29c-45a3-99d2-
743fbf23056f"/><created>1612463823446</created><option name="number"
value="Default"/><option name="presentableId"
value="Default"/><updated>1612463823446</updated></task><servers/></co
mponent></project>.

MODULES.XML

<?xml version="1.0" encoding="UTF-8"?>

<project version="4"><component
name="ProjectModuleManager"><modules><module
filepath="$PROJECT_DIR$/.idea/Sentiment-Analysis-master.iml"
fileurl="file://$PROJECT_DIR$/.idea/Sentiment-Analysis-master.iml"/>

</modules>

</component>

</project>

MISC.XML

<?xml version="1.0" encoding="UTF-8"?>

<project version="4"><component version="2" project-jdk-type="Python


SDK" project-jdk-name="Python 3.9 (pythonProject)"
name="ProjectRootManager"/></project>

PROFILE.XML

<?xml version="1.0"?>

<component name="InspectionProjectProfileManager"><settings><option
name="USE_PROJECT_PROFILE" value="false"/><version
value="1.0"/></settings></component>

PYTHON CODE:

PROJECT.PY
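# project.py (listing below): records speech from the microphone with the
# speech_recognition library, transcribes it via the Google Web Speech API,
# removes stop words, matches the remaining words against the emotions.txt
# lexicon of Section 6.2.2, and plots the emotion counts with Matplotlib.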

from tkinter import *

from tkinter import messagebox

import string

from collections import Counter

import matplotlib.pyplot as plt

import speech_recognition as sr

tkWindow = Tk()

tkWindow.geometry('400x150')

tkWindow.title('SPEECH RECOGNITION')

var = StringVar()

label = Label( tkWindow, textvariable=var, relief=RAISED )

def showMsg():

r = sr.Recognizer()

text=''

with sr.Microphone() as source:

print("Speak Anything :")

audio = r.listen(source)

try:

text = r.recognize_google(audio)

print("You said : {}".format(text))

except:

print("Sorry could not recognize what you said")

# reading text file

# text = open("read1.txt", encoding="utf-8").read()

# converting to lowercase

lower_case = text.lower()

# Removing punctuations

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

# splitting text into words

tokenized_words = cleaned_text.split()

stop_words = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves",


"you", "your", "yours", "yourself",

"yourselves", "he", "him", "his", "himself", "she", "her", "hers",


"herself", "it", "its", "itself",

"they", "them", "their", "theirs", "themselves", "what", "which",


"who", "whom", "this", "that",

"these",

"those", "am", "is", "are", "was", "were", "be", "been", "being",


"have", "has", "had", "having",

"do",

"does", "did", "doing", "a", "an", "the", "and", "but", "if", "or",
"because", "as", "until", "while",

"of", "at", "by", "for", "with", "about", "against", "between", "into",


"through", "during", "before",

"after", "above", "below", "to", "from", "up", "down", "in", "out",
"on", "off", "over", "under",

"again",

"further", "then", "once", "here", "there", "when", "where", "why",


"how", "all", "any", "both",

"each",

"few", "more", "most", "other", "some", "such", "no", "nor", "not",


"only", "own", "same", "so",

"than",

"too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]

final_words = []

for word in tokenized_words:

if word not in stop_words:

final_words.append(word)

emotion_list = []

with open('emotions.txt', 'r') as file:

for line in file:

clear_line = line.replace("\n", '').replace(",", '').replace("'", '').strip()

word, emotion = clear_line.split(':')

if word in final_words:

emotion_list.append(emotion)

labeltext = "You Said :" + text

var.set(labeltext)

label.pack()

print(emotion_list)

w = Counter(emotion_list)

print(w)

# Plotting the emotions on the graph

fig, ax1 = plt.subplots()

ax1.bar(w.keys(), w.values())

fig.autofmt_xdate()

plt.savefig('graph.png')

plt.show()

button = Button(tkWindow,

text='Speak',

command=showMsg)

button.pack()

tkWindow.mainloop()

MAIN.PY
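# main.py (listing below): Tkinter front end for the same pipeline as project.py.
# It adds a welcome window, a gender-selection window and a microphone button,
# then transcribes the speech, matches the words against emotions.txt and plots
# the resulting emotion counts.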

from tkinter import *

from tkinter import messagebox

import tkinter as tk

import string

from collections import Counter

import matplotlib.pyplot as plt

import speech_recognition as sr

tkWindow = Tk()

tkWindow.geometry('500x450')

tkWindow.title('SPEECH RECOGNITION')

tkWindow.configure(bg='blue')

var = StringVar()

label = Label( tkWindow, textvariable=var, relief=RAISED )

def speak():

tkWindow1 = Toplevel()

tkWindow1.geometry('400x150')

var2 = StringVar()

label2 = Label(tkWindow1, textvariable=var, relief=RAISED)

photo = PhotoImage(file=r"mic.png")

photoimage = photo.subsample(6, 6)

button = Button(tkWindow1,

text='Speak',

image=photoimage,

command=showMsg).pack(side = TOP)

tkWindow1.mainloop()

def gen():

tkWindow2 = Toplevel()

tkWindow2.geometry('400x150')

var1 = StringVar()

label1 = Label(tkWindow2, textvariable=var1, relief=RAISED)

var1.set("What's Your Gender!? -")

label1.pack()

button = Button(tkWindow2,

text='MALE',

command=speak).pack(side=TOP)

button = Button(tkWindow2,

text='FEMALE',

command=speak).pack(side=TOP)

tkWindow.mainloop()

def showMsg():

r = sr.Recognizer()

text=''

with sr.Microphone() as source:

print("Speak Anything :")

audio = r.listen(source)

try:

text = r.recognize_google(audio)

print("You said : {}".format(text))

except:

print("Sorry could not recognize what you said")

# reading text file

# text = open("read1.txt", encoding="utf-8").read()

# converting to lowercase

lower_case = text.lower()

# Removing punctuations

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

# splitting text into words

tokenized_words = cleaned_text.split()

stop_words = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves",


"you", "your", "yours", "yourself",

"yourselves", "he", "him", "his", "himself", "she", "her", "hers", "herself", "it",
"its", "itself",
"they", "them", "their", "theirs", "themselves", "what", "which", "who",
"whom", "this", "that", "these",
"those", "am", "is", "are", "was", "were", "be", "been", "being",
"have", "has", "had", "having",
"do",
"does", "did", "doing", "a", "an", "the", "and", "but", "if", "or",
"because", "as", "until", "while",
"of", "at", "by", "for", "with", "about", "against", "between", "into",
"through", "during", "before",
"after", "above", "below", "to", "from", "up", "down", "in", "out",
"on", "off", "over", "under",
"again",
"further", "then", "once", "here", "there", "when", "where", "why",
"how", "all", "any", "both",
"each",
"few", "more", "most", "other", "some", "such", "no", "nor", "not",
"only", "own", "same", "so",
"than",
"too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]

final_words = []

for word in tokenized_words:

if word not in stop_words:

final_words.append(word)

emotion_list = []

with open('emotions.txt', 'r') as file:

for line in file:

clear_line = line.replace("\n", '').replace(",", '').replace("'", '').strip()

word, emotion = clear_line.split(':')

if word in final_words:

emotion_list.append(emotion)

labeltext = "You Said :" + text

var.set(labeltext)

label.pack()

#print(emotion_list)

w = Counter(emotion_list)

#print(w)

# Plotting the emotions on the graph

fig, ax1 = plt.subplots()

ax1.bar(w.keys(), w.values())

fig.autofmt_xdate()

plt.savefig('graph.png')

plt.show()

var.set("Welcome to Tone Based Sentiment detection project!!")

label.pack()

button = Button(tkWindow,

text='Want to start?- Click me!',

command=gen)

button.pack()

tkWindow.mainloop()

MAIN.NLKTR.PY
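# NLTK-based variant (listing below): reads text from read.txt, tokenises it with
# nltk.word_tokenize, removes NLTK stop words, lemmatises the remaining words,
# matches them against emotions.txt and also runs VADER sentiment analysis on
# the cleaned text.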

import string

from collections import Counter

import matplotlib.pyplot as plt

from nltk.corpus import stopwords

from nltk.sentiment.vader import SentimentIntensityAnalyzer

from nltk.stem import WordNetLemmatizer

from nltk.tokenize import word_tokenize

text = open('read.txt', encoding='utf-8').read()

lower_case = text.lower()

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

# Using word_tokenize because it's faster than split()

tokenized_words = word_tokenize(cleaned_text, "english")

# Removing Stop Words

final_words = []

for word in tokenized_words:

if word not in stopwords.words('english'):

final_words.append(word)

# Lemmatization - From plural to single + Base form of a word (example better-


> good)

lemma_words = []

for word in final_words:

word = WordNetLemmatizer().lemmatize(word)

lemma_words.append(word)

emotion_list = []

with open('emotions.txt', 'r') as file:

for line in file:

clear_line = line.replace("\n", '').replace(",", '').replace("'", '').strip()

word, emotion = clear_line.split(':')

if word in lemma_words:

emotion_list.append(emotion)

print(emotion_list)

w = Counter(emotion_list)

print(w)

def sentiment_analyse(sentiment_text):

score = SentimentIntensityAnalyzer().polarity_scores(sentiment_text)

if score['neg'] > score['pos']:

print("Negative Sentiment")

elif score['neg'] < score['pos']:

print("Positive Sentiment")

else:

print("Neutral Sentiment")

sentiment_analyse(cleaned_text)

fig, ax1 = plt.subplots()

ax1.bar(w.keys(), w.values())

fig.autofmt_xdate()

plt.savefig('graph.png')

plt.show()

SPEECH ANALYS.PY
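# Tweet-analysis variant (listing below): collects tweets with GetOldTweets3,
# combines and cleans their text, removes stop words, matches the remaining
# words against emotions.txt and plots the resulting emotion counts.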

import string

from collections import Counter

import matplotlib.pyplot as plt

def get_tweets():

import GetOldTweets3 as got

tweetCriteria = got.manager.TweetCriteria().setQuerySearch('Dhoni') \

.setSince("2020-01-01") \

.setUntil("2020-04-01") \

.setMaxTweets(1000)

# Creation of list that contains all tweets

tweets = got.manager.TweetManager.getTweets(tweetCriteria)

# Creating list of chosen tweet data

text_tweets = [[tweet.text] for tweet in tweets]

return text_tweets

# reading text file

text = ""

text_tweets = get_tweets()

length = len(text_tweets)

for i in range(0, length):

text = text_tweets[i][0] + " " + text

# converting to lowercase

lower_case = text.lower()

# Removing punctuations

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

# splitting text into words

tokenized_words = cleaned_text.split()

stop_words = ["i", "me", "my", "myself", "we", "our", "ours", "ourselves",


"you", "your", "yours", "yourself",

"yourselves", "he", "him", "his", "himself", "she", "her", "hers",


"herself", "it", "its", "itself",

"they", "them", "their", "theirs", "themselves", "what", "which", "who",


"whom", "this", "that", "these",

"those", "am", "is", "are", "was", "were", "be", "been", "being", "have",
"has", "had", "having", "do",

"does", "did", "doing", "a", "an", "the", "and", "but", "if", "or",
"because", "as", "until", "while",

"of", "at", "by", "for", "with", "about", "against", "between", "into",


"through", "during", "before",

"after", "above", "below", "to", "from", "up", "down", "in", "out", "on",
"off", "over", "under", "again",

"further", "then", "once", "here", "there", "when", "where", "why",


"how", "all", "any", "both", "each",

"few", "more", "most", "other", "some", "such", "no", "nor", "not",


"only", "own", "same", "so", "than"

"too", "very", "s", "t", "can", "will", "just", "don", "should", "now"]

# Removing stop words from the tokenized words list

final_words = [word for word in tokenized_words if word not in stop_words]

# Get emotions text

emotion_list = []

with open('emotions.txt', 'r') as file:

for line in file:

clear_line = line.replace('\n', '').replace(',', '').replace("'", '').strip()

word, emotion = clear_line.split(':')

if word in final_words:

emotion_list.append(emotion)

w = Counter(emotion_list)

print(w)

fig, ax1 = plt.subplots()

ax1.bar(w.keys(), w.values())

fig.autofmt_xdate()

plt.savefig('graph.png')

plt.show()

6.2 DATASETS

Two datasets created in the English language, namely the Toronto Emotional Speech Set (TESS) and the emotional dataset from Knowledge Extraction based on Evolutionary Learning (KEEL), contain diverse and realistic audio. The descriptions of the datasets are as follows.

TORONTO EMOTIONAL SPEECH SET (TESS)

Researchers from the Department of Psychology at the University of Toronto created this speech-emotion dataset in 2010, in the English language. The database contains 2800 sound files of speech utterances in seven basic emotional categories, namely Happy, Sad, Angry, Surprise, Fear, Disgust and Neutral. It is an acted recording, where actors from two age groups, Old (64 years) and Young (26 years), performed the utterances. A few qualities of this dataset that make it suitable for this project are:

 The size of the dataset is large enough for the model to be trained effectively; the more data a model is exposed to, the better it performs.

 All basic emotional categories of data are present. A combination of these emotions can be used for further research such as sarcasm and depression
detection.
 Data is collected from two different age groups, which will improve the classification.

 The audio files are mono signals, which ensures an error-free conversion with
most of the programming libraries.
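
Although the scripts in Section 6.1 work on live microphone input, a corpus such as TESS can also be enumerated programmatically. The following is a hedged sketch of that step: it assumes a local copy of the dataset in a placeholder folder named TESS/ and the usual TESS naming convention, in which the emotion label is the last underscore-separated token of the file name (for example OAF_back_angry.wav).

import os
from collections import Counter

def tess_label(filename):
    # 'OAF_back_angry.wav' -> 'angry' (assumes the standard TESS naming scheme).
    return os.path.splitext(filename)[0].split('_')[-1].lower()

labels = []
for root, _, files in os.walk('TESS'):            # placeholder path to the dataset
    for name in files:
        if name.lower().endswith('.wav'):
            labels.append(tess_label(name))

print(Counter(labels))    # counts per emotion across the 2800 files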

KNOWLEDGE EXTRACTION BASED ON EVOLUTIONARY LEARNING (KEEL)

KEEL is an online dataset repository contributed by machine learning researchers worldwide [13]. Its speech-emotion dataset contains 72 features extracted for each of the 593 sound files. The data are labeled across six emotions, namely Happy, Sad, Angry, Surprise, Fear and Neutral. The repository also offers the data in 10 or 5 folds for the purpose of training and testing. A few qualities of this dataset that make it suitable for this project are:

 Data is represented as features directly, which saves conversion time and


procedures.

 All basic emotional categories of data are present. A combination of these


emotions can be used for further research like Sarcasm and Depression
detection.

Interactive Emotional Dyadic Motion Capture (IEMOCAP): We also evaluate the model using the IEMOCAP dataset. This dataset was collected following theatrical theory in order to simulate natural dyadic interactions between actors. We use categorical evaluations with majority agreement, and only four emotional categories (happy, sad, angry and neutral) so that the performance of the model can be compared with other research using the same categories. The IEMOCAP dataset includes five sessions, and each session contains utterances from two speakers (one male and one female), giving 10 unique speakers in total. For consistent comparison with previous work, we merge the excitement category with the happiness category. The final dataset contains a total of 5531 utterances (1636 happy, 1084 sad, 1103 angry, 1708 neutral).

Our dataset of speech from couple-therapy sessions presents several advantages


for data collection. Therapy sessions take place in an office where video and
sound can be efficiently set up. Usually, participants are involved in enough
sessions that emotions and emotion-word pairs that occur less frequently are not
too infrequent over the course of all the sessions. More important, these therapy
sessions are rich in expressed emotions in naturally occurring speech.

Coding procedure

We developed our own software for the coding of the emotions to take
advantage of the precise timings of the word onsets that our transcription
offered. The program, written using MATLAB, allows the coder to watch the
video recording of the couple while listening to the session, at the same time
viewing the text transcript for each participant. The coder determines an
emotion category and an intensity level (low, medium, high) of that emotion.
(In the analysis reported in this paper, we did not differentiate between the
intensity levels.) A coder estimates the time, t0, at which an emotion begins,
and the time, t1, at which an emotion ends. Although data were recorded every
millisecond, we did not expect the accuracy of t0 or t1 to be at this level. The association of a word with an emotion code from the set {Anger, Sadness, Joy, Tension, Neutral} proceeds as follows. If at a time tn a coding is set for Ci and at time tn+1 a coding is set for an emotion Cj different from Ci, then any word with an onset in the interval [tn, tn+1] is automatically coded as Ci, and any word with an onset immediately after tn+1 is coded as Cj. We do not allow two emotions to overlap, and every word occurrence (or token) is coded with one and only one emotion or Neutral. In the rest of this paper we talk about emotion-coded word tokens, or just emotion-coded tokens. They refer to the segments of the acoustic signal associated with the word tokens and labeled with one of the four emotions or Neutral. Transformations of these segments are the observations that are used in the machine-learning classification model. It is well recognized by most investigators that it is very expensive and time-consuming to make the coding of the temporal extent of emotion an individual human coder's responsibility. Automated coding will be essential in the future to reduce cost.
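
The rule described above can be written compactly in code. The following is a minimal sketch, in Python rather than the MATLAB tool mentioned above, with made-up onset times used purely for illustration:

# Each coded segment is (start_time, emotion), sorted by start time; a word takes
# the code of the most recent segment that started at or before its onset, so
# emotion codes never overlap and every token receives exactly one code.
coded_segments = [(0.0, 'Neutral'), (2.5, 'Anger'), (6.0, 'Sadness')]    # made-up times
word_onsets = [(0.4, 'I'), (2.7, 'never'), (3.1, 'agreed'), (6.2, 'sorry')]

def code_for(onset, segments):
    label = segments[0][1]
    for start, emotion in segments:
        if start <= onset:
            label = emotion
        else:
            break
    return label

coded_tokens = [(word, code_for(onset, coded_segments)) for onset, word in word_onsets]
print(coded_tokens)
# [('I', 'Neutral'), ('never', 'Anger'), ('agreed', 'Anger'), ('sorry', 'Sadness')]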

6.2.1 Dataset For Speech Emotions

In the field of affect detection, a very important role is played by a suitable choice of speech database. Three types of databases are used for a good emotion recognition system, as given below [8]:

1. Elicited emotional speech database: In this case an emotional situation is created artificially and data is collected from the speaker.

 Advantage: This type of database is similar to a natural database.

 Problem: Not all emotions may be available, and if speakers know that they are being recorded they may express artificial emotions.

2. Actor based speech database: Trained and professional artists collect this type
of speech dataset.

 Advantage: A wide variety of emotions is present in this database, and collecting it is also very easy.

 Problem: It is very artificial and periodic in nature.

3. Natural speech database: Real world data is used to create this database.

 Advantage: A natural speech database is very useful for real-world emotion recognition.

 Problem: It contains background noise, and all emotions may not be present in it.

6.2.2 Our Customized Datasets

EMOTION DATASETS

'victimized': 'cheated', 'accused': 'cheated', 'acquitted': 'singled out', 'adorable':


'loved', 'adored': 'loved', 'affected': 'attracted', 'afflicted': 'sad', 'aghast': 'fearful',

'agog': 'attracted', 'agonized': 'sad', 'alarmed': 'fearful', 'amused': 'happy',

'angry': 'angry', 'anguished': 'sad', 'animated': 'happy','annoyed': 'angry',

'anxious': 'attracted', 'apathetic': 'bored', 'appalled': 'angry','appeased':

'singled out', 'appreciated': 'esteemed', 'apprehensive': 'fearful',

'approved of': 'loved', 'ardent': 'lustful', 'aroused': 'lustful', 'attached': 'attached',

'attracted': 'attracted', 'autonomous': 'independent','awed': 'fearful',

'awkward': 'embarrassed', 'beaten down': 'powerless', 'beatific': 'happy',

'belonging': 'attached', 'bereaved': 'sad', 'betrayed': 'cheated',


'bewildered': 'surprise', 'bitter': 'angry', 'blissful': 'happy', 'blithe': 'happy',

'blocked': 'powerless', 'blue': 'sad','boiling': 'angry', 'bold': 'fearless',

'bored': 'bored', 'brave': 'fearless', 'bright': 'happy', 'brisk': 'happy', 'calm': 'safe',

'capable': 'adequate', 'captivated': 'attached', 'careless': 'powerless',

'categorized': 'singled out', 'cautious': 'fearful', 'certain': 'fearless',

'chagrined': 'belittled', 'challenged': 'attracted', 'chastised': 'hated',

'cheated': 'cheated', 'cheerful': 'happy', 'cheerless': 'sad', 'cheery': 'happy',

'cherished': 'attached', 'chicken': 'fearful', 'cocky': 'independent',

'codependent': 'codependent', 'coerced': 'cheated', 'comfortable': 'happy',

'common': 'average', 'competent': 'adequate', 'complacent': 'apathetic',

'composed': 'adequate', 'consumed': 'obsessed', 'contented': 'happy',

'controlled': 'powerless', 'convivial': 'happy', 'cornered': 'entitled',

'courageous': 'fearless', 'cowardly': 'fearful', 'craving': 'attracted',

'crestfallen': 'sad', 'criticized': 'hated', 'cross': 'angry',

'cross-examined': 'singled out', 'crushed': 'sad', 'curious': 'attracted',

'cut off': 'alone', 'daring': 'fearless','dark': 'sad', 'concerned': 'attracted',

'confident': 'adequate', 'confused': 'surprise', 'connected': 'attached',

'conned': 'cheated', 'dedicated': 'attracted', 'defeated': 'powerless',

'defenseless': 'fearful', 'degraded': 'belittled', 'dejected': 'sad',

'depressed': 'sad', 'deserted': 'hated', 'desirable': 'loved',

'despondent': 'sad', 'detached': 'alone', 'determined': 'focused',

'diminished': 'belittled', 'disappointed': 'demoralized','discarded': 'hated',

'disconsolate': 'sad', 'discontented': 'sad', 'discounted': 'belittled',

'discouraged': 'powerless', 'disgraced': 'belittled', 'disgusted': 'angry',

'disheartened': 'demoralized', 'disillusioned': 'demoralized', 'disjointed':


'derailed',

'dismal': 'sad', 'dismayed': 'fearful', 'disoriented': 'derailed', 'disparaged':


'cheated',

'displeased': 'sad', 'disrespected': 'belittled', 'distressed': 'sad',

'distrustful': 'anxious', 'dolorous': 'sad', 'doubtful': 'fearful', 'down': 'sad',

'downhearted': 'sad', 'dreadful': 'sad', 'dreary': 'sad','dubious': 'anxious',

'dull': 'sad', 'duped': 'cheated', 'eager': 'attracted', 'earnest': 'attracted',

'ecstatic': 'happy', 'elated': 'happy', 'embarrassed': 'embarrassed',

'empathetic': 'attached', 'enchanted': 'attracted', 'encouraged': 'adequate',

'engrossed': 'attracted', 'enraged': 'angry', 'enterprising': 'fearless',

'enthusiastic': 'happy', 'entrusted': 'loved', 'esteemed': 'esteemed', 'excited':


'happy',

'excluded': 'alone', 'exempt': 'entitled', 'exhausted hopeless': 'powerless',

'exhilarated': 'happy', 'exploited': 'cheated', 'exposed': 'fearful',

'fabulous': 'ecstatic', 'fainthearted': 'fearful', 'fantastic': 'ecstatic',

'fascinated': 'attracted', 'favored': 'entitled','fearful': 'fearful', 'fervent': 'attracted',

'fervid': 'attracted', 'festive': 'happy', 'flat': 'sad', 'focused': 'focused',

'forced': 'powerless', 'forsaken': 'hated', 'framed': 'cheated', 'free': 'free',

'free & easy': 'happy', 'frightened': 'fearful', 'frisky': 'happy',

'frustrated': 'angry', 'full of anticipation': 'attracted', 'full of ennui': 'apathetic',

'fuming': 'angry', 'funereal': 'sad', 'furious': 'angry', 'gallant': 'fearless',

'genial': 'happy', 'glad': 'happy', 'gleeful': 'happy', 'gloomy': 'sad',

'glum': 'sad','grief-stricken': 'sad','grieved': 'sad', 'guilt': 'sad', 'guilty': 'singled


out',

'happy': 'happy', 'hardy': 'fearless', 'heartbroken': 'sad', 'heavyhearted': 'sad',

'hesitant': 'fearful', 'high-spirited': 'happy', 'hilarious': 'happy','hopeful':


'attracted',

'horny': 'lustful', 'horrified': 'fearful', 'hot and bothered': 'lustful','humiliated':


'sad',

'humorous': 'happy', 'hurt': 'sad', 'hysterical': 'fearful', 'ignored': 'hated',

'ill at ease': 'sad', 'immobilized': 'apathetic', 'immune': 'entitled',

'important': 'happy', 'impotent': 'powerless','imprisoned': 'entitled',

'in a huff': 'angry', 'in a stew': 'angry', 'in control': 'adequate', 'in fear': 'fearful',

'in pain': 'sad', 'in the dumps': 'sad', 'in the zone': 'focused', 'incensed': 'angry',

'included': 'attached', 'indecisive': 'anxious', 'independent': 'free',

'indignant': 'angry', 'infatuated': 'lustful', 'inflamed': 'angry', 'injured': 'sad',

'inquisitive': 'attracted', 'insecure': 'codependent', 'insignificant': 'belittled',

'intent': 'attracted', 'interested': 'attracted', 'interrogated': 'singled out',

'intrigued': 'attracted', 'irate': 'angry', 'irresolute': 'fearful',

'irresponsible': 'powerless', 'irritated': 'angry', 'isolated': 'alone', 'jaunty': 'happy',

'jocular': 'happy', 'jolly': 'happy', 'jovial': 'happy', 'joyful': 'happy', 'joyless': 'sad',

'joyous': 'happy', 'jubilant': 'happy', 'justified': 'singled out', 'keen': 'attracted',

'labeled': 'singled out', 'lackadaisical': 'bored', 'lazy': 'apathetic', 'left out': 'hated',

'let down': 'hated', 'lethargic': 'apathetic', 'lied to': 'cheated', 'lighthearted':


'happy',

'liked': 'attached', 'lively': 'happy', 'livid': 'angry', 'lonely': 'alone',

'lonesome': 'alone', 'lost': 'lost', 'loved': 'attached', 'low': 'sad', 'lucky': 'happy',

'lugubrious': 'sad', 'macho': 'independent', 'mad': 'angry', 'melancholy': 'sad',

'menaced': 'fearful', 'merry': 'happy', 'mirthful': 'happy', 'misgiving': 'fearful',

'misunderstood': 'alone', 'moody': 'sad', 'moping': 'sad', 'motivated': 'attracted',

'mournful': 'sad', 'needed': 'attracted', 'needy': 'codependent', 'nervous': 'fearful',

'obligated': 'powerless', 'obsessed': 'obsessed', 'offended': 'angry',

'oppressed': 'sad', 'optionless': 'entitled', 'ordinary': 'average',

'organized': 'adequate', 'out of control': 'powerless', 'out of sorts': 'sad',

'outmaneuvered': 'entitled', 'outraged': 'angry', 'overjoyed': 'happy',

'overlooked': 'hated', 'overwhelmed': 'powerless', 'panicked': 'fearful',

'passionate': 'lustful', 'passive': 'apathetic', 'pathetic': 'sad','peaceful': 'safe',

'pensive': 'anxious', 'perplexed': 'anxious', 'phobic': 'fearful',

'playful': 'happy', 'pleased': 'happy', 'powerless': 'powerless',

'pressured': 'burdened', 'privileged': 'entitled', 'proud': 'happy', 'provoked': 'angry',

'punished': 'hated', 'put upon': 'burdened', 'quaking': 'fearful',

'quiescent': 'apathetic', 'rageful': 'angry', 'rapturous': 'happy', 'rated': 'singled


out',

'reassured': 'fearless', 'reckless': 'powerless', 'redeemed': 'singled out',

'regretful': 'sad', 'rejected': 'alone', 'released': 'free', 'remorse': 'sad',

'replaced': 'hated', 'repulsed': 'demoralized', 'resentful': 'angry',

'resolute': 'fearless', 'respected': 'esteemed', 'responsible': 'adequate',

'restful': 'fearful', 'revered': 'esteemed', 'rueful': 'sad', 'sad': 'sad',

'satisfied': 'happy', 'saucy': 'happy', 'scared': 'fearful', 'secure': 'fearless',

'self-reliant': 'fearless', 'serene': 'happy', 'shaky': 'fearful', 'shamed': 'sad',

'shocked': 'surprise', 'significant': 'esteemed', 'singled out': 'singled out',

'skeptical': 'anxious', 'snoopy': 'attracted', 'somber': 'sad', 'sparkling': 'happy',

'spirited': 'happy', 'spiritless': 'sad', 'sprightly': 'happy', 'startled': 'surprise',

'stereotyped': 'singled out', 'stifled': 'powerless', 'stout hearted': 'fearless',

'strong': 'independent', 'suffering': 'sad', 'sulky': 'sad', 'sullen': 'angry',

'sunny': 'happy', 'surprised': 'surprise', 'suspicious': 'anxious',

'sympathetic': 'codependent', 'tense': 'anxious', 'terrified': 'fearful',

'terrorized': 'fearful', 'thankful': 'happy', 'threatened': 'fearful',

'thwarted': 'powerless', 'timid': 'fearful', 'timorous': 'fearful', 'torn': 'derailed',

'tortured': 'sad', 'tragic': 'sad', 'tranquil': 'happy', 'transported': 'happy',

'trapped': 'entitled', 'tremulous': 'fearful', 'turned on': 'lustful', And much more.
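
For reference, each of the entries listed above is stored in the emotions.txt file in the form 'word': 'emotion', with one pair per line, which is what the parsing loops in Section 6.1 expect. A minimal sketch of that parsing step and of a lookup on a few sample words from the list:

# Build a word -> emotion dictionary from emotions.txt, whose lines look like:
#   'victimized': 'cheated',
#   'amused': 'happy',
lexicon = {}
with open('emotions.txt', 'r') as file:
    for line in file:
        clear_line = line.replace("\n", '').replace(",", '').replace("'", '').strip()
        if ':' not in clear_line:
            continue                       # skip blank or malformed lines
        word, emotion = clear_line.split(':')
        lexicon[word.strip()] = emotion.strip()

for sample in ('amused', 'betrayed', 'lonely'):    # sample words from the list above
    print(sample, '->', lexicon.get(sample, 'not in lexicon'))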

6.3 SAMPLE SCREEN SHOTS

Fig 6.1 Voice Input

Fig 6.2 Voice Captured

Fig 6.3 Gender Specification

Fig 6.4 Voice Analyzed

Fig 6.5 Graphical Result

CHAPTER 7
TESTING AND MAINTENANCE

7.1 TESTING
Implementation forms an important phase in the system development life cycle. It is the stage of the project that transforms the design into a working model. Testing was done to see whether all the features provided in the modules perform satisfactorily and to ensure that the testing process is as realistic as possible.
Each program was tested individually at the time of development using test data, and it was verified that the programs link together in the way specified in the program specification. The computer system and its environment were then tested to the satisfaction of the user; the system that has been developed was accepted and proved to be satisfactory, and so it will be implemented shortly.
As a first step, the executable form of the application is created and loaded on a common server machine that is accessible to all users, and the server is connected to a network. The final stage is to document the entire system, covering its components and operating procedures.
The importance of software testing and its implications for software quality cannot be overemphasized. Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Any product can be tested using either black-box testing or white-box testing. Further testing can be carried out at the levels of code, integration and system testing.

Fig 7.1 Levels of Testing

7.1.1 SYSTEM TESTING

Testing is performed to identify errors and is used for quality assurance. Testing is an integral part of the entire development and maintenance process. The goal of testing during this phase is to verify that the specification has been accurately and completely incorporated into the design, as well as to ensure the correctness of the design itself. For example, any logic fault in the design must be detected before coding commences; otherwise the cost of fixing the fault later will be considerably higher. Detection of design faults can be achieved by means of inspections as well as walkthroughs.
Testing is one of the important steps in the software development phase. Testing checks for errors; for the project as a whole, testing involves the following:
Static analysis is used to investigate the structural properties of the source code.
Dynamic testing is used to investigate the behavior of the source code by executing the program on test data.

7.2 TEST CASES

Test No: TC_01 | Description: Speaking after clicking the mic button | Input: Voice | Expected Output: Voice must get recorded | Actual Output: Voice gets recorded | Result: Pass

Test No: TC_02 | Description: Choosing the gender | Input: Gender | Expected Output: Specified gender must be chosen | Actual Output: Gender is specified correctly | Result: Pass

Test No: TC_03 | Description: Feature extraction stage | Input: Voice | Expected Output: Features must be extracted from the voice | Actual Output: Features are extracted | Result: Pass

Test No: TC_04 | Description: Comparing with the datasets | Input: Text | Expected Output: The text must match some entry in the data | Actual Output: Text got matched against the data | Result: Pass

Test No: TC_05 | Description: Results are shown | Input: Text | Expected Output: Graphical result of the emotion is displayed | Actual Output: Correct emotion is displayed in graph form | Result: Pass

Fig 7.2 Test Cases

7.3 TEST DATA AND OUTPUT

7.3.1 UNIT TESTING

Unit testing is conducted to verify the functional performance of each modular component of the software. Unit testing focuses on the smallest unit of the software design, i.e., the module. White-box testing techniques were heavily employed for unit testing.
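
As an illustration of how unit testing applies to this project, the sketch below uses Python's built-in unittest module to check the stop-word filtering and lexicon-matching steps of Section 6.1 in isolation. The helper functions remove_stop_words and match_emotions are hypothetical names introduced only for this test; they stand for the corresponding blocks of the scripts, which would need to be factored into functions to be testable this way.

import unittest

# Hypothetical helpers standing in for the corresponding blocks of project.py.
def remove_stop_words(words, stop_words):
    return [w for w in words if w not in stop_words]

def match_emotions(words, lexicon):
    return [lexicon[w] for w in words if w in lexicon]

class LexiconStepsTest(unittest.TestCase):
    def test_stop_words_removed(self):
        result = remove_stop_words(['i', 'am', 'very', 'happy'], {'i', 'am', 'very'})
        self.assertEqual(result, ['happy'])

    def test_emotion_lookup(self):
        lexicon = {'amused': 'happy', 'betrayed': 'cheated'}
        self.assertEqual(match_emotions(['amused', 'betrayed', 'tree'], lexicon),
                         ['happy', 'cheated'])

if __name__ == '__main__':
    unittest.main()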

7.3.2 FUNCTIONAL TESTS

Functional test cases involved exercising the code with nominal input
values for which the expected results are known, as well as boundary values and
special values, such as logically related inputs, files of identical elements, and
empty files.

Three types of tests in Functional test:

 Performance Test
 Stress Test
 Structure Test

7.3.2.1 PERFORMANCE TEST

It determines the amount of execution time spent in various parts of the unit,
program throughput, and response time and device utilization by the program
unit.

7.3.2.2 STRESS TEST

Stress tests are tests designed to intentionally break the unit. A great deal can be learned about the strengths and limitations of a program by examining the manner in which a program unit breaks.

7.3.2.3 STRUCTURE TEST

Structure tests are concerned with exercising the internal logic of a program and traversing particular execution paths. A white-box test strategy was employed to ensure that the test cases guarantee that all independent paths within a module are exercised at least once, and to:
 Exercise all logical decisions on their true and false sides.
 Execute all loops at their boundaries and within their operational bounds.
 Exercise internal data structures to assure their validity.
 Check attributes for their correctness.
 Handle end-of-file conditions, I/O errors, buffer problems and textual errors in output information.

7.3.3 INTEGRATION TESTING


Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing; that is, integration testing is the complete testing of the set of modules which make up the product. The objective is to take unit-tested modules and build a program structure; the tester should identify critical modules, and critical modules should be tested as early as possible. One approach is to wait until all the units have passed testing, then combine them and test them together; this approach evolved from unstructured testing of small programs. Another strategy is to construct the product in increments of tested units: a small set of modules is integrated and tested, another module is added and the combination is tested, and so on. The advantage of this approach is that interface discrepancies can be easily found and corrected.
The major error faced during the project was a linking error: when all the modules were combined, the links to the supporting files were not set properly. We then checked the interconnections and the links; errors are localized to the new module and its interconnections. Product development can be staged, with modules integrated as they complete unit testing. Testing is completed when the last module is integrated and tested.

7.4 TESTING TECHNIQUES / TESTING STRATERGIES


7.4.1 TESTING
Testing is a process of executing a program with the intent of finding an error. A good test case is one that has a high probability of finding an as-yet-undiscovered error, and a successful test is one that uncovers such an error. System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently as expected before live operation commences; it verifies that the whole set of programs hangs together. System testing consists of several key activities and steps for running the program, string and system tests, and it is important in adopting a successful new system. This is the last chance to detect and correct errors before the system is installed for user acceptance testing.
The software testing process commences once the program is created and the documentation and related data structures are designed. Software testing is essential for correcting errors; otherwise, the program or project cannot be said to be complete. Software testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Any engineering product can be tested in one of two ways:

7.4.1.1 WHITE BOX TESTING

This testing is also called glass-box testing. In white-box testing, the tester uses knowledge of the internal workings of the product to derive test cases, so that the internal operations of each component can be exercised and shown to perform according to specification. It is a test-case design method that uses the control structure of the procedural design to derive test cases. Basis path testing is a white-box testing technique.
Basis path testing:
 Flow graph notation
 Cyclomatic complexity
 Deriving test cases
 Graph matrices Control

7.4.1.2 BLACK BOX TESTING


In black-box testing, test cases are derived from the functions that the product has been designed to perform: tests are conducted to demonstrate that each function is fully operational, while searching for errors in each function, without reference to the internal structure of the code. It fundamentally focuses on the functional requirements of the software.
The steps involved in black box test case design are:
 Graph based testing methods
 Equivalence partitioning
 Boundary value analysis
 Comparison testing

7.4.2 SOFTWARE TESTING STRATEGIES:


A software testing strategy provides a road map for the software developer. Testing is a set of activities that can be planned in advance and conducted systematically. For this reason a template for software testing should be defined, that is, a set of steps into which specific test-case design methods can be placed. A testing strategy should have the following characteristics:
 Testing begins at the module level and works “outward” toward the
integration of the entire computer based system.
 Different testing techniques are appropriate at different points in
time.
 The developer of the software and an independent test group
conducts testing.
 Testing and Debugging are different activities but debugging must
be accommodated in any testing strategy.

7.4.2.1 INTEGRATION TESTING:


Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. Individual modules, which are highly prone to interface errors, should not be assumed to work correctly the moment we put them together. The problem, of course, is "putting them together", that is, interfacing. Data can be lost across an interface; one module can have an inadvertent adverse effect on another; sub-functions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; and global data structures can present problems.

7.4.2.2 PROGRAM TESTING:


Logical and syntax errors are pointed out by program testing. A syntax error is an error in a program statement that violates one or more rules of the language in which it is written; an improperly defined field dimension or an omitted keyword are common syntax errors. These errors are shown through error messages generated by the computer. A logic error, on the other hand, deals with incorrect data fields, out-of-range items and invalid combinations. Since the compiler will not detect logic errors, the programmer must examine the output. Condition testing exercises the logical conditions contained in a module. The possible types of elements in a condition include a Boolean operator, a Boolean variable, a pair of parentheses, a relational operator or an arithmetic expression. The condition testing method focuses on testing each condition in the program; the purpose of condition testing is to detect not only errors in the conditions of a program but also other errors in the program.

7.4.2.3 SECURITY TESTING:


Security testing attempts to verify that the protection mechanisms built into a system will, in fact, protect it from improper penetration. The system's security must be tested for invulnerability to frontal attack and must also be tested for invulnerability to attack from the rear. During security testing, the tester plays the role of an individual who desires to penetrate the system.

7.4.2.4 VALIDATION TESTING


At the culmination of integration testing, the software is completely assembled as a package, interfacing errors have been uncovered and corrected, and a final series of software tests, validation testing, begins. Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that is reasonably expected by the customer. Software validation is achieved through a series of black-box tests that demonstrate conformity with requirements. After validation tests have been conducted, one of two conditions exists:
* The function or performance characteristics conform to specifications and are accepted.
* A deviation from specification is uncovered and a deficiency list is created.
Deviations or errors discovered at this step in the project are corrected prior to completion of the project, with the help of the user, by negotiating a method for resolving the deficiencies. The proposed system under consideration has been tested using validation testing and found to be working satisfactorily. Though there were deficiencies in the system, they were not catastrophic.

7.5 MAINTENANCE
After a software system has been verified, tested and implemented, it
must continue to be maintained. Maintenance routines will vary depending on
the type and complexity of the technology. Many software systems will come
with a maintenance schedule or program recommended by the developer.
Maintenance could be provided by the developer as part of the purchase
agreement for the technology.
Ongoing monitoring or testing systems may be installed to ensure that
maintenance needs are identified and met where necessary. Where systems are
in long-term use, a system can be designed to monitor feedback from users and
conduct any modifications or maintenance as needed. Where modifications to
software are made as a result of system maintenance or upgrades, it may be
necessary to instigate further rounds of system verification and testing to ensure
that standards are still met by the modified system.

CHAPTER 8
CONCLUSION AND FUTURE ENHANCEMENT

An overview of speech emotion recognition has been given, along with a description of the speech emotion recognition system block diagram. In the field of affect detection, a very important role is played by a suitable choice of speech database; for a good emotion recognition system, mainly three types of databases are used. On the basis of the utterances they are able to recognize, speech recognition systems can be separated into different classes: isolated, connected, spontaneous and continuous words. Extraction of relevant emotional features from the speech is the second important step in emotion recognition. There is no unique way to classify features, but acoustic and linguistic feature taxonomies are usually considered separately.
There are a number of methods for feature extraction, such as linear predictive cepstral coefficients (LPCC), power spectral analysis (FFT), first-order derivatives (DELTA), linear predictive analysis (LPC), Mel-scale cepstral analysis (MEL), perceptual linear predictive coefficients (PLP) and relative spectral filtering of log-domain coefficients (RASTA), and some of them are briefly covered in this report. Another important part of a speech emotion recognition system is the choice of classifier. This report reviews KNN, SVM, CNN, Naive Bayes and recurrent neural network classifiers for speech emotion recognition, and its last section covers the use of deep neural networks to build speech emotion recognition systems. To further improve the efficiency of the system, a combination of more effective features can be used to enhance the accuracy of speech emotion recognition. This concludes the description of the SER system.

Future Enhancements:

There is plenty of room left for more detailed study of individual


emotions. Deeper discussion and conception of whether or not there are a few
basic emotions from which others can be constructed is not a settled question. In
addition, like much experimental psychology, the theoretical framework for the
recognition of results reported here is too static in character. The flow of speech
and the flow of emotion are both among the most important examples of the
temporal nature of much of human experience. Study of dynamic temporal
processes is much more difficult both experimentally and theoretically, but in
order to reach results of deeper scientific significance, such work is badly
needed. This remark applies to both the fundamental theory and important
applications. Even more pertinent from the standpoint of the main interest of
this paper, this temporal quality of speech is matched very well by the temporal
quality of emotions. The temporal flow of emotion probably has no natural
grammar as is the case for speech or written language. This means that the study
of emotion is more dependent on a thorough understanding of the ebb and flow of the emotions as a function of time. The complexity of such temporal study has necessarily delayed its deeper development. Fortunately, the wide-ranging nature of present research on emotion makes us hopeful that the temporal qualities of emotion will be more thoroughly studied in the near future.

REFERENCES:
[1] M. E. Ayadi, M. S. Kamel, F. Karray, "Survey on Speech Emotion Recognition: Features, Classification Schemes, and Databases", Pattern Recognition, vol. 44, pp. 572-587, 2011.
[2] S. K. Bhakre, A. Bang, "Emotion Recognition on The Basis of Audio Signal Using Naive Bayes Classifier", 2016 Intl. Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2363-2367, 2016.
[3] I. Chiriacescu, "Automatic Emotion Analysis Based On Speech", M.Sc. Thesis, Delft University of Technology, 2009.
[4] X. Chen, W. Han, H. Ruan, J. Liu, H. Li, D. Jiang, "Sequence-to-sequence Modelling for Categorical Speech Emotion Recognition Using Recurrent Neural Network", 2018 First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia), pp. 1-6, 2018.
[5] P. Cunningham, J. Loughrey, "Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets", Research and Development in Intelligent Systems XXI, pp. 33-43, 2005.
[6] C. O. Dumitru, I. Gavat, "A Comparative Study of Feature Extraction Methods Applied to Continuous Speech Recognition in Romanian Language", International Symposium ELMAR, Zadar, Croatia, 2006.
[7] S. Emerich, E. Lupu, A. Apatean, "Emotions Recognition by Speech and Facial Expressions Analysis", 17th European Signal Processing Conference, 2009.
[8] R. Elbarougy, M. Akagi, "Cross-lingual speech emotion recognition system based on a three-layer model for human perception", 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, pp. 1-10, 2013.
[9] D. J. France, R. G. Shiavi, "Acoustical properties of speech as indicators of depression and suicidal risk", IEEE Transactions on Biomedical Engineering, pp. 829-837, 2000.
[10] P. Harár, R. Burget, M. K. Dutta, "Speech Emotion Recognition with Deep Learning", 2017 4th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 137-140, 2017.
[11] Q. Jin, C. Li, S. Chen, "Speech emotion recognition with acoustic and lexical features", PhD Proposal, pp. 4749-4753, 2015.
[12] Y. Kumar, N. Singh, "An Automatic Spontaneous Live Speech Recognition System for Punjabi Language Corpus", IJCTA, pp. 259-266, 2016.
[13] Y. Kumar, N. Singh, "A First Step towards an Automatic Spontaneous Speech Recognition System for Punjabi Language", International Journal of Statistics and Reliability Engineering, pp. 81-93, 2015.
[14] Y. Kumar, N. Singh, "An automatic speech recognition system for spontaneous Punjabi speech corpus", International Journal of Speech Technology, pp. 1-9, 2017.
[15] A. Khan, U. Kumar Roy, "Emotion Recognition Using Prosodic and Spectral Features of Speech and Naïve Bayes Classifier", 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pp. 1017-1021, 2017.
[16] A. Kumar, K. Mahapatra, B. Kabi, A. Routray, "A novel approach of Speech Emotion Recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages", 2015 IEEE 2nd International Conference on Recent Trends in Information Systems (ReTIS), pp. 372-377, 2015.
[17] Y. Kumar, N. Singh, "Automatic Spontaneous Speech Recognition for Punjabi Language Interview Speech Corpus", I.J. Education and Management Engineering, pp. 64-73, 2016.
[18] G. Liu, W. He, B. Jin, "Feature fusion of speech emotion recognition based on deep learning", 2018 International Conference on Network Infrastructure and Digital Content (IC-NIDC), pp. 193-197, 2018.
[19] C. M. Lee, S. S. Narayanan, "Toward detecting emotions in spoken dialogs", IEEE Transactions on Speech and Audio Processing, pp. 293-303, 2005.
[20] S. Mirsamadi, E. Barsoum, C. Zhang, "Automatic speech emotion recognition using recurrent neural networks with local attention", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2227-2231, 2017.
[21] A. Nogueiras, A. Moreno, A. Bonafonte, J. B. Marino, "Speech Emotion Recognition Using Hidden Markov Model", Eurospeech, 2001.
[22] J. Pohjalainen, P. Alku, "Multi-scale modulation filtering in automatic detection of emotions in telephone speech", International Conference on Acoustics, Speech and Signal Processing, pp. 980-984, 2014.
[23] S. Renjith, K. G. Manju, "Speech Based Emotion Recognition in Tamil and Telugu using LPCC and Hurst Parameters", 2017 International Conference on Circuits, Power and Computing Technologies (ICCPCT), pp. 1-6, 2017.

Kernel References
 https://github.com/marcogdepinto/emotion-classification-from-audio-files
 https://data-flair.training/blogs/python-mini-project-speech-emotion-recognition/

APPENDIX
(PUBLICATION DETAILS)

Paper Title: SPEECH EMOTION RECOGNITION


USING MACHINE LEARNING

Authors: Ms. S. Kumari, Balaji M, Perinban D, Gopinath D, Hariharan S. J.

Journal Name: International Research Journal of


Engineering and Technology (IRJET).

Edition: IRJET Volume 8, Issue 7, JULY 2021

Month and Year: JULY 2021

