
Facial Emotion Detection Using Neural Networks
ABSTRACT
Recognizing facial expressions allows systems to detect whether people are happy or sad, much as a human being can. This capability lets software and AI systems provide a better experience to users in many applications. From detecting people at risk of suicide and intervening, to playing mood-based music, there is a wide variety of applications in which emotion or mood detection can play a vital role.

The system uses a convolutional neural network (CNN) to extract the physiological signals and make a prediction. The results are obtained by capturing the person's image through a camera and correlating it with a training dataset to predict the person's emotional state.

The system can detect the live emotions of a particular user by comparing the captured information with a training dataset of known emotions to find a match. Different emotion types can be detected by integrating information from facial expressions, body movement and gestures, and speech. The technology is said to contribute to the emergence of the so-called emotional or emotive Internet. The approach relies on supervised machine learning algorithms, in which a large set of annotated data is fed to the algorithms so that the system learns to predict the appropriate emotion.

INTRODUCTION
Facial expressions play a key role in understanding and detecting emotion. Even the term "interface" suggests how important the face is in communication between two entities. Studies have shown that reading facial expressions can significantly alter the interpretation of what is spoken as well as control the flow of a conversation. The human ability to interpret emotions is very important to effective communication; by some estimates, up to 93% of communication in a normal conversation depends on the emotion of an entity. An ideal human-computer interface (HCI) would therefore require machines that can read human emotion. This research is about how computers can detect emotion reliably from their various sensors; in this experiment, the facial image is used as the medium for reading human emotion. Research on human emotion can be traced back to Darwin's pioneering work and has since attracted many researchers to this area. Seven basic emotions are universal to human beings, namely neutral, angry, disgust, fear, happy, sad, and surprise, and these basic emotions can be recognized from a person's facial expression. This research proposes an effective way to detect four of these emotions, neutral, happy, sad, and surprise, from frontal facial images.
During the past decades, various methods have been proposed for emotion recognition, and many algorithms have been suggested for building applications that detect emotions well. Computer applications could communicate better by changing their responses according to the emotional state of human users in various interactions. A person's emotion can be determined from speech, the face, or even gestures. The work presented in this paper explores the recognition of expressions from the face.

For facial emotion recognition, traditional approaches usually consider a face image that is extracted from an input picture, and facial components or landmarks are detected within the face regions. After that, various spatial and temporal features are extracted from these facial components. Finally, based on the extracted features, a classifier (for example, a random forest or a neural network built with the Keras library) is trained to produce the recognition results. This work applies a deep learning model. Deep learning is a well-established approach in the pattern recognition domain. It uses a Convolutional Neural Network (CNN) algorithm implemented with the Keras library. A CNN is a specific sort of artificial neural network that uses machine-learning units and is applied to object detection, face recognition, image processing, and similar tasks. A deep convolutional neural network (DCNN) is composed of many neural network layers and is able to extract significant features from the data.

REVIEW OF LITERATURE

The research field of emotion detection draws on several domains, such as machine learning, natural language processing, and neuroscience. Earlier works examined facial expressions, voice features, and textual data individually as universal indicators of emotion. Emotion can be classified into several static categories such as happiness, sadness, disgust, anger, fear, and surprise. Later works improved on this by combining image, voice, and textual data; the fusion of these data gives the most accurate result. This type of fusion can be done in three ways: early, late, or hybrid. Other work focuses on the elements of emotion and the interactions between emotional processes and other cognitive processes.

A. Emotion Detection Through Facial Feature Recognition


This work deals with emotion recognition using machine learning with a support vector machine (SVM). The following steps are used for the detection, extraction, and evaluation of facial expressions in an image:

i. Viola-Jones cascade object detectors and Harris corner key-points to extract faces and facial features from images.
ii. Histogram of oriented gradients (HOG) feature extraction.
iii. Support vector machines (SVM) to train a multi-class predictor for classifying the seven fundamental human facial expressions: anger, contempt, disgust, fear, happiness, sadness, and surprise.

Computers can easily recognize facial expressions and infer a person's intent in domains such as entertainment, social media, content analysis, criminal justice, and healthcare. Two approaches are mainly discussed here: Zhang's approach and Gabor wavelet coefficients. Zhang has shown that a lower resolution (64x64) is adequate; here the extracted faces are resized to 100x100 pixels. When using the HOG and SVM classifier only, the detection accuracy is 81%, much better than a Fisher-face-only approach. When using the dual-classifier method, the accuracy is the same as HOG only at 81%, but the testing process is 20% faster.
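The following is a minimal, illustrative sketch of the HOG + multi-class SVM pipeline described above, using scikit-image and scikit-learn. The face crops, labels, and HOG parameters are placeholders rather than the exact configuration used in the cited work.

import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def hog_features(face_img):
    # face_img: 2-D grayscale array, e.g. a 100x100 extracted face
    return hog(face_img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# Placeholder face crops and labels (0..6 for the seven expressions).
faces = [np.random.rand(100, 100) for _ in range(70)]
labels = np.repeat(np.arange(7), 10)

X = np.array([hog_features(f) for f in faces])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

clf = SVC(kernel='linear', decision_function_shape='ovr')  # multi-class SVM
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))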

B. SVM Point-based Real-time Emotion Detection


This work deals with emotion recognition using machine learning with a cascade of a multi-class support vector machine (SVM) and a binary SVM. The algorithm extracts emotions based on the movement of 19 feature points located in different regions of the face, such as the mouth, eyes, eyebrows, and nose, and it relies mainly on the non-changeable rigid points on the nose. The task is divided into facial recognition and action unit (AU) detection. Computers can easily recognize facial expressions and infer a person's intent in domains such as entertainment, social media, content analysis, criminal justice, and healthcare. A final suggestion for improvement is that, in the real-time application, the user needs to stay at the same distance from the camera at which the neutral frame was taken; otherwise, the theory behind the displacement ratios is no longer valid. Rescaling the neutral distances based on the movement of the user could be a solution to this problem.

SYSTEM ANALYSIS
FEASIBILITY STUDY:

In the feasibility study, an estimate is made of whether the identified user needs may be satisfied using current software and hardware techniques. The study decides whether the proposed system will be cost-effective from a business point of view and whether it can be developed within the existing budgetary constraints.

The result should inform the decision of whether to go ahead with a more detailed analysis. The feasibility of the system is decided based on the following three distinct aspects, which are considered in the investigation phase.

The development of a computer-based system is very likely to be plagued by


resource scarcity and stringent delivery schedules. It is necessary to evaluate the
feasibility of a project at the earliest possible time.
The waste of manpower and financial resources, and untold professional embarrassment, can be avoided if an ill-conceived system is identified early in the definition phase and aborted. Scarcity of resources and tight project completion dates challenge the software development process.

It is both necessary and prudent to evaluate the feasibility of a project at the


earliest possible time. A feasibility study is not warranted for systems in which the economic justification is obvious, the technical risk is low, few legal problems are expected, and no reasonable alternative exists.

An estimate is made of whether the identified user (client) may be satisfied


using current software and hardware technologies. The study will decide whether the proposed system will be cost-effective from a business point of view and whether it can be developed within the existing budgetary constraints.

So, a detailed study was carried out to check the workability of the proposed
system. Feasibility study is the test of system proposal regarding its workability,
impact on the organization, ability to meet user needs, and effective use of
resources.

Thus when a new application is proposed, it normally goes through a


feasibility study before it is approved for development. Feasibility and risk analysis are related in many ways: if the project risk is great, the feasibility of producing quality software is reduced. Our project is feasible when the procedure is used in the proper manner.

EXISTING METHOD
Zao et al. have achieved a maximum accuracy of up to 99.3%, but at the cost of a 22-layer neural network; training such a large network is a time-consuming job. FERC uses a keyframe extraction method, whereas others have only used the last frame. Jung et al. tried to work with fixed frames, which makes the system less efficient with video input. The number of folds of training in most of the other cases was only ten, whereas we could go up to 25-fold training because of the small network size.

PROPOSED SYSTEM

The problem statements we address are robust and automated face detection, analysis of the captured image and meaningful interpretation of its facial expressions, the creation of datasets for testing and training, and then the design and implementation of well-fitted classifiers to learn the vectors of the facial descriptors. We propose a model design capable of recognizing up to six emotions that are considered universal across all cultures: fear, happiness, sadness, surprise, disgust, and anger. Our system aims to understand a face and its characteristics and then make a weighted assumption about the identity of the person. This algorithm draws mainly on one of the most widely used algorithms for this task, the Viola-Jones algorithm.
SYSTEM CONFIGURATION

HARDWARE SPECIFICATION:

 Lenovo Yoga 530 model laptop with Intel i5 8th generation CPU

 RAM: 8 GB RAM

 HARD DISK: 512 GB SSD hard disk

SOFTWARE SPECIFICATION:

 Platform: Python (Using Thonny IDE)

 Operating System: Windows 10

 Tools Used: MATLAB 2018a, and ImageJ.


SYSTEM DESIGN
NORMALIZATION:

The first stage of normalization reduces the data to its first normal form by removing repeating items, showing them as separate records, and including in them the key fields of the original record.

The next stage, reduction to second normal form, checks that for each record already in first normal form, all items in the record are entirely dependent on the key of the record. If a data item is not dependent on the key of the record but on another data item, it is removed along with its key to form another record. This is done until each record contains only data items that are entirely dependent on the key of their record.

The final stage of the analysis, reduction to third normal form, involves examining each record in second normal form to see whether any items are mutually dependent. If there are, those items are removed to a separate record, leaving one of the items behind in the original record and using it as the key of the newly created record.
SFD/DFD:

DATA FLOW DIAGRAM


Figure: Emotion Detection Data Flow Diagram

METHODOLOGY
This work considers the leading challenge faced by machine learning, and by the entire system: the training part, in which the system has to be trained using real data of human facial reactions. For example, if the system has to detect an angry face, it must first be acquainted with angry faces; likewise, if it has to detect a happy face, it must first be acquainted with happy faces. To acquaint the system with these emotion types, a re-training process was used, and the re-training data were collected from the real world. The re-training was the hardest part of this system, although there are many other parts. Machine learning is a strong tool that enables the analysis of large databases more proficiently and quickly, which makes emotion detection more accurate and allows feedback in real time: the system does not have to wait for future results, nor does the image have to be stored. With the help of modern computers, recent data mining techniques can analyse thousands of records within a very short time, saving many hours, and installing and using such programs costs significantly less. If properly optimized, these data mining techniques can produce outcomes comparable to, or better than, a human's. This work presents a general and feasible framework for emotion data mining to identify emotion patterns using machine learning. The proposed program is based on a deep learning model and computer vision emotion recognition, and it uses a CNN algorithm; it is a more advanced method than one that recognizes only seven emotions with a CNN. The emotion recognition method using deep learning follows four steps, as listed below; a brief sketch of steps (2)-(4) is given after the list.
(1) Training the public face database with CNN.
(2) Extraction of seven probabilities for each frame of the face.
(3) Aggregation of single-frame probabilities into fixed-length image
descriptors for each image in the dataset.
(4) Classification of all images using a support vector machine (SVM)
trained on image descriptors of the competition training set.
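As a hedged illustration of steps (2)-(4), the sketch below averages per-frame probabilities into a fixed-length descriptor and trains an SVM on those descriptors. The frame data and labels are random placeholders, and frame_probabilities merely stands in for the CNN trained in step (1); it is not the actual trained model.

import numpy as np
from sklearn.svm import SVC

def frame_probabilities(frames):
    # Stand-in for the CNN of step (1): returns a (n_frames, 7) array of
    # softmax-like probabilities, one row per face frame.
    p = np.random.rand(len(frames), 7)
    return p / p.sum(axis=1, keepdims=True)

def image_descriptor(frames):
    # Step (3): aggregate the single-frame probabilities of step (2)
    # into one fixed-length 7-dimensional descriptor.
    return frame_probabilities(frames).mean(axis=0)

# Placeholder "clips": each entry is a stack of preprocessed face frames.
clips = [np.random.rand(10, 48, 48, 1) for _ in range(20)]
labels = np.random.randint(0, 7, size=20)

descriptors = np.array([image_descriptor(c) for c in clips])
svm = SVC(kernel='linear')   # step (4): SVM trained on the image descriptors
svm.fit(descriptors, labels)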

A. Emotion Database
In the data collection step, both real-world media and online media were used to collect as much data as possible. The real-world data include different kinds of facial expressions and emotional pictures of friends, family members, relatives, and various known and unknown people; the collected data were initially stored for later analysis. From online media, the dataset was collected from kaggle.com, where it was uploaded six years ago and is among the most trusted emotion datasets on the site. The data were converted into 48x48-pixel grayscale images of faces. The dataset contains two columns, emotion and pixels: the emotion column contains a numeric code that runs from 0 to 6, and the pixels column contains, for each picture, a string of pixel values enclosed in quotes. Furthermore, each picture should contain only a face, so the collected pictures were cropped and resized to clear images of a face.

B. Training phase using deep learning


A good way to use deep learning to classify images is to build a convolutional neural network (CNN). The Keras library in Python makes it fairly simple to build a CNN. Computers see images as pixels, and pixels in images are usually related: for example, a certain group of pixels may signify an edge or some other pattern. Convolutions use this to help identify images. A convolution multiplies a matrix of pixels with a filter matrix, or "kernel", and sums up the resulting values. The convolution then slides over to the next pixel and repeats the same process until all the image pixels have been covered. This process is visualized below.

Figure: Emotion detection using Convolutional Neural Network
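The short NumPy sketch below illustrates the convolution step just described: the 3x3 kernel is multiplied element-wise with each 3x3 patch of a toy image and the products are summed to give one output value. The image and kernel values are illustrative only.

import numpy as np

image = np.arange(25, dtype=float).reshape(5, 5)     # toy 5x5 "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)          # vertical-edge kernel

out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]               # current 3x3 window
        out[i, j] = np.sum(patch * kernel)            # multiply and sum

print(out)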

The model type that we will be using is Sequential. Sequential is the easiest
way to build a model in Keras. It allows you to build a model layer by layer. We
use the ‘add ()’ function to add layers to our model. Our first 2 layers are Conv2D
layers. These are convolution layers that will deal with our input images, which are
seen as 2-dimensional matrices. 64 in the first layer and 32 in the second layer are
the number of nodes in each layer. This number can be adjusted to be higher or
lower, depending on the size of the dataset. In our case, 64 and 32 work well, so
we will stick with this for now. Kernel size is the size of the filter matrix for our
convolution. So a kernel size of 3 means we will have a 3x3 filter matrix. Refer
back to the introduction and the first image for a refresher on this. Activation is the
activation function for the layer. The activation function we will be using for our
first 2 layers is the ReLU or Rectified Linear Activation. This activation function
has been proven to work well in neural networks. Our first layer also takes an input shape: the shape of each input image, 28,28,1 as seen earlier, with the 1 signifying that the images are grayscale. In between the Conv2D layers and
the dense layer, there is a ‘Flatten’ layer. Flatten serves as a connection between
the convolution and dense layers. The model will then make its prediction based on
which option has the highest probability. Next, we need to compile the model. Compiling the model takes three parameters: optimizer, loss, and metrics. The optimizer controls the learning rate; we will use 'adam' as the optimizer.
Adam is generally a good optimizer to use for many cases. The adam optimizer
adjusts the learning rate throughout the training. The learning rate determines how
fast the optimal weights for the model are calculated. A smaller learning rate may
lead to more accurate weights (up to a certain point), but the time it takes to
compute the weights will be longer. We will use 'categorical cross-entropy' for our
loss function. This is the most common choice for classification. A lower score
indicates that the model is performing better. To make things even easier to interpret, we will use the 'accuracy' metric to see the accuracy score on the validation set while training the model. To train, we will use the 'fit()' function on our model with the following parameters: training data (train_X), target data (train_y), validation data, and the number of epochs. For the validation data, we will use the test set provided with the dataset, which has been split into X_test and y_test. The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point; after that point, the model stops improving with each epoch. For our model, we will set the number of epochs to 3. After 3 epochs, the model reaches about 93% accuracy on the validation set.
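A minimal sketch of the small Sequential model described in this section is given below. The data arrays are random placeholders with the 28x28x1 input shape mentioned above, and the seven output nodes assume the seven emotion classes; this is an illustration of the described layout, not the full training script.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())                        # connection between conv and dense layers
model.add(Dense(7, activation='softmax'))   # one output node per emotion class

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# Placeholder training and validation data.
train_X = np.random.rand(100, 28, 28, 1)
train_y = np.eye(7)[np.random.randint(0, 7, 100)]
X_test = np.random.rand(20, 28, 28, 1)
y_test = np.eye(7)[np.random.randint(0, 7, 20)]

model.fit(train_X, train_y, validation_data=(X_test, y_test), epochs=3)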

C. Detection

K-means clustering was used with the number of clusters taken as two. Here,
the maximum value in all rows is found out and its average is determined.
Similarly, the minimum value in all rows is found out and its average is
determined. Considering these two points as the base, the pixel values nearer to the
maximum average value are grouped into a cluster and the pixel values nearer to
the minimum average value are grouped into another cluster. Based on the
clustering result, the total number of components in the image is calculated. Based
on the number of components, the person's eyes are segmented first by using a bounding box function. Since the eye or eyebrow forms the first element while
traversing the pixel values column-wise, the eyes are segmented first. Using the
eye matrix, other facial parts are segmented using a distance-based algorithm. The
resulting images after performing k-means clustering for different expressions are shown below.

Figure: K-means clustering segmentation outputs
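The sketch below illustrates the two-cluster grouping described above under simple assumptions: the averages of the row maxima and row minima serve as the two cluster centres, and each pixel is assigned to whichever centre it is closer to. The grayscale image used here is a random placeholder.

import numpy as np

img = np.random.rand(48, 48)                  # placeholder grayscale face image

high_centre = img.max(axis=1).mean()          # average of the row maxima
low_centre = img.min(axis=1).mean()           # average of the row minima

# Assign each pixel to the nearer of the two centres (binary segmentation).
mask = np.abs(img - high_centre) < np.abs(img - low_centre)

# Connected components in the mask approximate the "components" counted above.
try:
    from scipy.ndimage import label
    _, n_components = label(mask)
    print("components found:", n_components)
except ImportError:
    pass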

The Viola-Jones algorithm is a widely used mechanism for object detection.


The main property of this algorithm is that training is slow, but detection is fast.
This algorithm uses the Haar basis feature. Haar features are the relevant features
for face detection. There are various types of features such as:

i) Edge features

Figure: Edge features

ii) Line Features.

Figure: Line features

iii) Four Rectangle Features

Figure: Four rectangle features


Figure: (a) Vertical and horizontal edge-detector filter matrices used at layer 1 of the background-removal CNN (first-part CNN). (b) Sample EV matrix showing all 24 values, with the pixels at the top and the measured parameter at the bottom. (c) Representation of a point in the image domain (top panel) and in the Hough transform domain (bottom panel) using the Hough transform.

For example, to detect a person's face, the image is first converted to grayscale and then, as a second step, segmented.
Figure: Landmark image. (a) Block diagram of FERC: the input image is taken from a camera or extracted from video and passed to the first-part CNN for background removal; after background removal, a facial expressional vector (EV) is generated; a second-part CNN is then applied with the supervisory model obtained from the ground-truth database, and finally the emotion in the current input image is detected. (b) Facial vectors marked on the background-removed face: nose (N), lip (P), forehead (F), and eyes (Y) are marked using edge detection and nearest-cluster mapping; the positions left, right, and center are represented by L, R, and C, respectively.

Suppose we need to detect the eyebrow; then we need edge features. If we want to detect the nose, we need line features (black-white-black), and if we want to detect teeth, we need edge features again. After applying one of these Haar features, the image moves on to the next feature, and the ratios between the detected features are used in emotion detection.
Figure: Haar features image.

The feature value Δ is calculated as the difference between the average pixel values of the dark (black) and white regions.

For ideal Haar features, the black-region value is 1 and the white-region value is 0, so the difference between dark and white is 1 - 0 = 1; Δ for an ideal Haar feature is 1.

For a real image, averaging the black region gives 0.74 and averaging the white region gives 0.18, so the difference between dark and white is 0.74 - 0.18 = 0.56; Δ for the real image is 0.56.
Figure: Feature extraction
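The tiny numeric example below reproduces the Δ calculation just described: the feature value is the mean intensity of the dark region minus the mean intensity of the white region. The pixel values are chosen to match the 0.74 and 0.18 averages mentioned above.

import numpy as np

dark_region = np.array([[0.80, 0.70], [0.75, 0.71]])    # pixels under the black part
white_region = np.array([[0.20, 0.15], [0.18, 0.19]])   # pixels under the white part

delta = dark_region.mean() - white_region.mean()
print(round(delta, 2))   # 0.56, matching the real-image example above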

Figure: Convolution filter operation with the 3×3 kernel. Each pixel from the input image and its eight neighboring pixels are multiplied with the corresponding value in the kernel matrix, and finally all multiplied values are added together to produce the final output value.
Neural networks are typically organized in layers. Layers are made up of
several interconnected nodes that contain an activation function. Patterns are
presented to the network via the input layer, which communicates to one or more
hidden layers where the actual processing is done via a system of weighted
connections. This process of the facial expression recognition system is divided
into three stages- Image Pre Processing which involves Face and facial parts
detection using the viola-Jones algorithm, facial Feature extraction, and feature
classification using CNN. Keras is an open-source neural network library in Python used for pre-processing, modeling, evaluation, and optimization. It provides a high-level API whose computation is handled by a backend: a model is defined with a loss function and an optimizer and trained with the fit function, while the backend (TensorFlow) performs the convolutions and low-level tensor computation. The Python libraries imported below are used for preprocessing, modeling, optimization, testing, and displaying the emotion with the maximum percentage. The system uses a sequential model and layers such as image pre-processing, convolution layers, pooling layers, flatten layers, dense layers, and the ReLU activation. Image preprocessing is the first phase of the proposed system
and it involves the Face Detection and FPs detection and extraction. The Viola-
Jones face detection framework, which is a robust algorithm capable of processing
images extremely rapidly for real-time situations, is used. This algorithm detects
face region irrespective of variance in size, background, brightness, and spatial
transformation of the raw input image. The face FP detection is achieved by
combining classifiers in a cascade structure that is capable of increasing the
detection performance while reducing computational complexity. The final
classifier is computed by the linear combination of all weak classifiers, which
separates the positive and negative in terms of the weighted error (weight of each
learner is directly proportional to its accuracy). The face is first detected, cropped,
extracted and normalized to a size of 64 x 64 pixels, and then facial parts (both
eyes and mouth) are detected, cropped and extracted from the normalized face
image. The extracted facial parts are resized to equal size of 32 x 64 pixels. The
reduced image scale helps to reduce the information that has to be learned
by the network and also makes training faster and with less memory cost.
Convolution layers will be added for better accuracy for large datasets. The dataset
is collected from CSV file (in pixels format) and it's converted into images and
then classify emotions with respective expressions.
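As a hedged sketch of this preprocessing stage, the snippet below detects the face with OpenCV's pretrained Viola-Jones cascade, normalizes it to 64x64 pixels, and resizes the extracted parts to 32x64. The eye cascade and the assumption that the mouth occupies the lower third of the face box are illustrative choices, not the exact method used here.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml')

gray = cv2.cvtColor(cv2.imread('input.jpg'), cv2.COLOR_BGR2GRAY)  # placeholder image

for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
    face = cv2.resize(gray[y:y + h, x:x + w], (64, 64))      # normalized 64x64 face

    for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
        eye_part = cv2.resize(face[ey:ey + eh, ex:ex + ew], (64, 32))  # 32x64 part

    # Rough assumption: the mouth lies in the lower third of the face box.
    mouth_part = cv2.resize(face[42:, :], (64, 32))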

Here emotions are classified as happy, sad, angry, surprise, neutral, disgust,
and fear with 34,488 images for the training dataset and 1,250 for testing. Each
emotion is expressed with different facial features like eyebrows, opening the
mouth, raised cheeks, wrinkles around the nose, wide-open eyelids, and many
others. The large dataset is trained for better accuracy, and the result is the object class for an input image. Pooling is a concept in deep learning visual object
recognition that goes hand-in-hand with convolution. The idea is that a convolution
(or a local neural network feature detector) maps a region of an image to a feature
map. For example, a 5x5 array of pixels could be mapped to oriented edge features.
Flattening, in this context, converts the pooled two-dimensional feature maps into a single one-dimensional vector so that they can be fed to the dense layers. The dense layer is the regular deeply
connected neural network layer. It is the most common and frequently used layer.
The dense layer does the below operation on the input and returns the output.
Based on the connection strengths (weights), inhibition or excitation, and transfer
functions, the activation value is passed from node to node. Each of the nodes
sums the activation values it receives; it then modifies the value based on its
transfer function. In Keras, dropout can be implemented by adding Dropout layers into the network architecture; each Dropout layer drops a user-defined fraction of the units in the previous layer every batch. Remember that in Keras the input layer is assumed to be the first layer and is not added using add(). ReLU is one of the most popular types of nonlinearity used in neural networks; it is applied after the convolutional layer and before max pooling, and it replaces all negative pixel values in the feature map with zero.

Example:

ReLU is the function max(x, 0), where the input x is a matrix from a convolved image. ReLU sets all negative values in the matrix x to zero and keeps all other values unchanged; it is computed after the convolution and is therefore a nonlinear activation function, like tanh or sigmoid. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on training data. Many endeavors have been made to analyse emotions using automated techniques, but most of them lack an established framework describing how to use them properly. More specifically, understanding and maintaining the emotion-analysis capability can help law-enforcement authorities effectively use machine learning techniques to track and identify emotion patterns.
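A small numeric example of the ReLU operation described above is given below: every negative value in the convolved feature map is replaced by zero, while positive values are kept unchanged.

import numpy as np

feature_map = np.array([[ 1.5, -0.3],
                        [-2.0,  0.7]])     # toy output of a convolution

relu_output = np.maximum(feature_map, 0)   # ReLU: element-wise max(x, 0)
print(relu_output)                          # [[1.5 0. ] [0.  0.7]]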

At first, the image is taken from the user and the noise is removed. Then only the face of the person is identified and the Haar features are applied, after which the image is matched against the previous training dataset. The Keras library of Python is used here; it works with a convolutional neural network (CNN) built on a sequential model that uses layers such as Conv2D, MaxPooling2D, AveragePooling2D, Dense, Activation, Dropout, and Flatten. After this stage, these layers select the emotion from the classification set, which is the final output. After performing some pre-processing (if necessary), the normalized face image is passed to the feature extraction part to find the key features that will be used for classification. In other words, this module is responsible for composing a feature vector that represents the face image well enough. After this comparison, the face image is classified into one of the seven expressions (anger, contempt, disgust, fear, happiness, sadness, surprise).
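The sketch below illustrates this end-to-end flow under simple assumptions: denoise the user's image, detect the face with a Haar cascade, normalize it, and let a trained Keras CNN pick the most probable of the seven expressions. The model file name 'emotion_cnn.h5' and the 48x48 input size are assumptions for illustration.

import cv2
import numpy as np
from keras.models import load_model

EMOTIONS = ['anger', 'contempt', 'disgust', 'fear',
            'happiness', 'sadness', 'surprise']

model = load_model('emotion_cnn.h5')                # previously trained CNN (assumed)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

img = cv2.imread('user_image.jpg')                  # image taken from the user
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (3, 3), 0)            # simple noise removal

for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
    face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
    probs = model.predict(face.reshape(1, 48, 48, 1))[0]
    print("predicted expression:", EMOTIONS[int(np.argmax(probs))])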

Table 1 Results obtained with different databases

RESULT & ANALYSIS


The first major challenge was the limited amount of data available for training a broad framework, which had to be overcome for a framework of this nature. Transfer learning is the most prevalent response to this: in this methodology, training starts from a pre-trained model, which is then fine-tuned with the stored data gathered from the real world. A series of preliminary experiments confirmed the assumption that face recognition would serve better for feature extraction, and there are examples where such systems are effectively utilized.
Figure: Expression with Surprise Emotion

Machine learning algorithms work well on datasets that have a few hundred features or columns. The algorithm successfully classifies an image, determines the sentiment of the image, and chooses the matching emotion for it. The reason for choosing a deep learning classifier is that it runs the data through several layers, and deep learning algorithms become effective when they have access to an immense amount of data; for pictures, the usual benchmark for training deep learning models for general image recognition approaches more than 14 million images. For clear visualization of the emotion-detection pattern analysis,
a decision tree was used. In the decision tree, the characteristics are represented by the nodes and layers, and the outcome of the experiment is represented by the branches. The advantage of the decision tree is that it makes it easy to visualize the emotion and interpret the result, and its working process is easy to understand. The data are classified according to movement, reactions, and order, which ideally separates the different types of emotions; they are further classified into trees and sub-trees reflecting whether the person is sad, angry, happy, and so on, so that their emotions can be categorized more simply using these methods. To do this, a re-training method was used that memorizes the pattern and checks the conditions. When any of the conditions is satisfied, evaluation carries on to the end of the tree; however, if none of the intermediate conditions is satisfied, it stops checking and reports "The emotion cannot be identified. The emotion is unknown." Emotions are complicated to understand. There are
different kinds of expression for the same emotion, and different people express the same emotion in different ways. Modern machine learning technology can help law-enforcement authorities detect emotion so that machines can understand human emotions and behave and act more like humans. The emotion data came from different online and offline media, such as Google, the kaggle.com site, friends and family, random people, and so on. The Keras library was used to initially classify and analyse the emotion data; then, with the help of Haar features and NumPy, the emotion is identified, and with the help of the Anaconda platform, the output is generated from the raw data and the result is shown in real time. A hierarchical data mining procedure such as a decision tree helps generate probability decisions by calculating various characteristics that are initially used to identify the emotion pattern. Along with the offline and online data collection, an effective field study was also conducted to gather more people of various kinds with many different emotional expressions and faces. In the online data collection, the dataset was taken from kaggle.com, which provides quality datasets; the images were converted to grayscale pixels and represented by their numerical values, which gives quality data and a better result. Both of the experts believed that this analysis of sentiment could help identify emotion more accurately and help take appropriate actions on the basis of accurate emotion identification. It would provide more knowledge about the different types of expression of sentiment as well as the percentage of each of the various kinds of emotions present.
While completing this work, we found that a large quantity of test data and keywords is needed to obtain greater accuracy; a good quantity of raw data is also required to extend the research work. A computer with a high-end graphics processing unit (GPU) is also required to process a large quantity of test data in the shortest time. So, with adequate data and a high-performance computer, it would be easier to raise the accuracy to more than 97%. It would also be possible to use the system on a different platform for a different outcome and help determine the emotion expression pattern.

CREATE DATASET
As we know, Keras has many data preprocessing APIs; to feed the model we will use Keras' data generation API, and according to Keras the dataset must be stored in specific directories. Keras' data generation API expects images to be sorted into separate directories, such as training and validation, with each directory containing a sub-directory for each category.
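An illustrative directory layout matching this expectation is created by the snippet below, with one sub-directory per emotion class under data/train and data/valid; copying the images into these folders is left out.

import os

classes = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']
for split in ('data/train', 'data/valid'):
    for c in classes:
        # e.g. data/train/angry, data/valid/happy, ...
        os.makedirs(os.path.join(split, c), exist_ok=True)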

Once the images are copied to their respective folder and sub-folder, we will
define image generators to preprocess the images. Keras provides the
ImageDataGenerator API for preprocessing. Python code for the training and
validation image generator is mentioned below.

from keras.preprocessing.image import ImageDataGenerator

image_height, img_width = 224, 224
batchsize = 64

# Image Data Generator for training and validation data
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    rescale=1./255,
    shear_range=0.01,
    fill_mode='constant')

valid_datagen = ImageDataGenerator(rescale=1./255)

# The generators read pictures found in the class sub-folders and
# indefinitely generate batches of augmented image data.
train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(image_height, img_width),
    batch_size=batchsize,
    class_mode='categorical')

# Similar generator for the validation data
validation_generator = valid_datagen.flow_from_directory(
    'data/valid',
    target_size=(image_height, img_width),
    batch_size=batchsize,
    class_mode='categorical')

Model Training and Compilation


First, we need to import the model to do model training. You can import the model from the Keras applications using the Python code from keras.applications import VGG16. In this tutorial, we are using the VGG16 model under the name base_model_VGG16; it is faster compared to others like ResNet or some of the newer models. After the model is defined, it has to be compiled.

from keras.applications import VGG16

base_model_VGG16 = VGG16(weights='imagenet', include_top=False,
                         input_shape=(image_height, img_width, 3))
base_model_VGG16.summary()

# Model compilation
base_model_VGG16.compile(loss='categorical_crossentropy',
                         metrics=['acc'],
                         optimizer='adam')

Model Saving
The next step after compilation is to fit the model, and for that the fit_generator function is used.

# train_df/valid_df, callbacks_list, and class_weights are assumed to be
# defined earlier from the training and validation data.
nb_train_samples = train_df.size
nb_validation_samples = valid_df.size
epochs = 10

history = base_model_VGG16.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batchsize,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batchsize,
    callbacks=callbacks_list,
    class_weight=class_weights,
    epochs=epochs)

# To save the model with the .h5 extension
saved_model_path = "./saved_models/Emotion_Detection_Model.h5"
base_model_VGG16.save(saved_model_path)

To visualize the model prediction output, we will use the method below. The code plots the image along with a bar chart indicating the individual emotion percentages for the input image.

import numpy as np
import matplotlib.pyplot as plt
from keras.preprocessing import image

def emotion_graph(emotions):
    objects = ('angry', 'disgust', 'fear', 'happy', 'sad',
               'surprise', 'neutral')
    y_pos = np.arange(len(objects))
    plt.bar(y_pos, emotions, align='center', alpha=0.5)
    plt.xticks(y_pos, objects)
    plt.ylabel('Percentage Value')
    plt.title('Emotion Graph')
    plt.show()

# `model` is the trained emotion model built or loaded earlier.
img_test = image.load_img("./test/image1.png", grayscale=True,
                          target_size=(224, 224))
x = image.img_to_array(img_test)
x = np.expand_dims(x, axis=0)
x /= 255
custom = model.predict(x)
emotion_graph(custom[0])

x = np.array(x, 'float32')
x = x.reshape([224, 224])
plt.gray()
plt.imshow(x)
plt.show()
TESTING AND IMPLEMENTATION
TESTING
INTRODUCTION

Testing is a process to identify and correct the errors in the proposed system. Before implementing the system, it should be tested to ensure that it works effectively under various conditions. Every new system must be tested in various ways to make sure it performs correctly.

System testing is actually a series of different tests whose primary purpose is


to fully exercise the computer-based system. Software is only one element of a larger computer-based system. Ultimately, software is incorporated with other system elements, and a series of system integration and validation tests is conducted.

Testing presents an interesting anomaly for software development. The testing phase creates a series of test cases that are intended to demolish the software that has been built. A good test case is one that has a high probability of finding an as-yet-undiscovered error, and a successful test is one that uncovers such an error.

STAGES IN THE TESTING PROCESS

The following are the various stages in the testing process.

 Unit testing
 Module testing
 Sub-system testing
 System testing
 Acceptance testing
 Task testing
 Behavioral testing
 Intertask testing
 User Interface testing
 Integration testing

Unit Testing

Unit testing focuses verification and validation effort on the smallest unit of software design, i.e., the module. Unit testing is always white-box oriented, and the step can be conducted in parallel for multiple modules. The software developer does not simply turn the program over to the ITG group and walk away.

It comprises the set of tests performed by an individual programmer prior to integration of the unit into a larger system. The situation is illustrated as follows.

A program unit is usually small enough that the programmer who developed it can test it in great detail, and certainly in greater detail than will be possible when the unit is integrated into an evolving software product.

There are four categories of test that a programmer will typically perform on
a program unit.

 Functional tests

 Performance tests

 Stress tests

 Structure tests
Functional Tests
Functional tests, in which test cases exercise the code with nominal input values for which the expected results are known, were performed.

Performance Testing

This testing is designed to test the run-time performance of software within the context of an integrated system. It occurs throughout all steps of the testing process and is concerned with evaluating the speed and memory utilization of the program.

Stress Tests

Stress testing, which is concerned with exercising the internal logic of a program and traversing particular execution paths, was performed. The input is given in such a way that all possible paths, from the client's request to job completion, are tested.

Structure Tests

Structure testing is also referred to as White Box or Glass Box testing. The
project is tested for its execution in every module. The testing operation is
successfully done and every module performs properly.

Module Testing

A module is a collection of dependent components such as an object class, an abstract data type, or some collection of procedures and functions. The module encapsulates related components, so it can be tested without other system modules.
Sub-System Testing

This phase involves testing collection of modules, which have been


integrated into sub-systems. Sub-systems may be independently designed and
implemented.

The most common problem, which arises in large software systems, is sub-
system interface mismatches. The sub-system test process should therefore
concentrate on the detection of interface errors by rigorously exercising these
interfaces.

System Testing

The sub-systems are integrated to build up the entire system. The testing process is concerned with finding errors that result from unanticipated interactions between sub-systems and system components. It is also concerned with validating that the system meets its functional and non-functional requirements.

Acceptance Testing

This is the final stage in the testing process before the system is accepted for operational use. The system is tested with data supplied by the system procurer rather than with simulated test data. Acceptance testing may reveal errors and omissions in the system requirements definition, because the real data exercise the system in different ways from the test data. It may also show that the system's facilities do not really meet the user's needs or that the system performance is unacceptable.

Task Testing
The first step in the testing of real-time software is to test each task
independently. That is, the white and black box tests are designed and executed for
each task. Each task is executed independently during the tests. The task testing
uncovers errors in logic and functions, but will not uncover timing or behavioral
errors.

Behavioural Testing

Using system models created with CASE tools, it is possible to simulate the behaviour of external events. Using a technique similar to equivalence partitioning, events are categorized for testing.

Each of these events is tested individually, and the behaviour of the executing system is examined to detect errors that occur as a consequence of processing associated with these events. Once each class of events has been tested, events are presented to the system in random order and with random frequency.

Intertask Testing

Once the errors in individual tasks and in system behaviour have been isolated, testing shifts to time-related errors: the asynchronous tasks that are known to communicate with one another are tested with different data rates and processing loads to determine whether intertask synchronization errors will occur.

User Interface Testing

An interactive interface is a system that is dominated by interactions between the system and external agents, such as humans, devices, or other programs. The external agents are independent of the system, so their input cannot be controlled, although the system may solicit responses from them. An interactive interface usually includes only part of an entire application, one that can often be handled independently from the computational part of the application.

The major concerns of an interactive interface are the communication protocol between the system and the external agents, the syntax of possible interactions, the presentation of output, the flow of control within the system, the ease of understanding the user interface, performance, and error handling.

The dynamic model dominates interactive interfaces. Objects in the model represent interaction elements, such as input and output tokens and presentation formats. The functional model describes which application functions are executed in response to input event sequences, but the internal structure of the functions is usually unimportant to the behaviour of the interface.

Integration Testing

Integration testing is a logical extension of unit testing. In its simplest form,


two units that have already been tested are combined into a component and the
interface between them is tested. A component, in this sense, refers to an integrated
aggregate of more than one unit. In a realistic scenario, many units are combined
into components, which are in turn aggregated into even larger parts of the
program.

The idea is to test combinations of pieces and eventually expand the process
to test your modules with those of other groups. Eventually all the modules making
up a process are tested together. Beyond that, if the program is composed of more
than one process, they should be tested in pairs rather than all at once.
Integration testing identifies problems that occur when units are combined.
By using a test plan that requires you to test each unit and ensure the viability of
each before combining units, you know that any errors discovered when combining
units are likely related to the interface between units. This method reduces the
number of possibilities to a far simpler level of analysis.

IMPLEMENTATION

Implementation is the stage of the project where the theoretical design is


turned into a working system. At this stage the main work load, the greatest
upheaval and the major impact on the existing system shifts to the user department.

If the implementation is not carefully planned and controlled, it can cause


chaos and confusion. Implementation includes all those activities that take place to
convert from the old system to the new one.

The new system may be totally new, replacing an existing manual or


automated system, or it may be a major modification to an existing system. Proper implementation is essential to provide a reliable system that meets the organization's requirements.

Successful implementation may not guarantee improvement in the


organization using the new system, but improper installation will prevent it.

SAMPLE CODE
Implementing VGG16 Network for Classification of Emotions with GPU

First, we need to enable the GPU in Google Colab to get fast processing. We can enable it by going to 'Runtime' in Google Colab, clicking on 'Change runtime type', and selecting GPU. Once it is enabled, we import the required libraries for building the network. The code for importing the libraries is given below.

import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2

We have now imported all the libraries, and next we will import the dataset. I have already saved it in my drive, so I will read it from there. You can give the directory (in round brackets) where your data is stored, as shown in the code below. After importing, we print the data frame as shown in the image.

emotion_data = pd.read_csv('/content/drive/My Drive/Emotion_Detection /fer2013.csv')
print(emotion_data)
View of the data frame

We then create different lists for storing the training and testing image pixels. For each row, if the pixels belong to the training set we append them to the training list and training labels; similarly, pixels belonging to the PublicTest set are appended to the testing lists. The code for this is shown below.

X_train = []
y_train = []
X_test = []
y_test = []

for index, row in emotion_data.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])
Once we have added the pixels to the lists, we convert them into NumPy arrays and reshape X_train and X_test. After doing this, we convert the training and testing labels into categorical form. The code for this is given below.

from keras.utils import np_utils

# Convert the pixel strings to float arrays before reshaping.
X_train = np.array(X_train, dtype='float32')
y_train = np.array(y_train)
X_test = np.array(X_test, dtype='float32')
y_test = np.array(y_test)

X_train = X_train.reshape(X_train.shape[0], 48, 48, 1)
X_test = X_test.reshape(X_test.shape[0], 48, 48, 1)

y_train = np_utils.to_categorical(y_train, num_classes=7)
y_test = np_utils.to_categorical(y_test, num_classes=7)

VGG16 Model for Emotion Detection

Now it's time to design the CNN model for emotion detection with different layers. We start with the initialization of the model, followed by different convolutional layers with ReLU as the activation function, max-pooling layers, and dropouts so that learning is done efficiently. You can also change the architecture by choosing layers of your own with different numbers of neurons and activation functions.

model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(48,48,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))

After this, we compile the model using Adam as an optimizer, loss as categorical
cross-entropy, and metrics as accuracy as shown in the below code.

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

After compiling the model we then fit the data for training and validation. Here, we
are taking the batch size to be 32 with 30 epochs. You can tune them according to
your wish.

model.fit(X_train, y_train, batch_size=32, epochs=30, verbose=1,
          validation_data=(X_test, y_test))
Training of the Network

Once the training has been done we can evaluate the model and compute loss and
accuracy using the below code.

loss_and_metrics = model.evaluate(X_test,y_test)
print(loss_and_metrics)
We now serialize the model to JSON and save the model weights in an HDF5 (.h5) file so that we can use this file to make predictions rather than training the network again. You can do this task by using the code below.

model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")

CONCLUSION
An experienced human can often identify another human's emotions by analysing and looking at him or her. In this modern age, however, machines are becoming more intelligent and are being made to act more like humans. If a machine has been trained in how to react to human sentiment, it can behave and act like a human; moreover, if the machine can identify emotion, it can also prevent many incidents. With increased proficiency and error-free computation, emotion data mining can reveal accurate expression patterns, enabling machines to react and act more like humans. To determine the emotion expression patterns, this thesis created a framework through comprehensive research and field work, and followed that framework step by step to obtain the expected outcome. To follow the framework and identify emotion expression patterns more effectively, a deep learning CNN algorithm was used along with Keras, TensorFlow, and re-training concepts. With these techniques, it was possible to identify emotions, and the type of emotion, in real images. To present the results and procedures more visually, decision tree techniques were also introduced, which help decide which emotion percentage is high and which is low: the emotion with the highest percentage is taken as the most probable emotion, while emotions with low percentages have a low chance of being present. With this approach, it is now possible to determine emotions accurately, and machines can identify emotion more accurately and, on that basis, give a proper reaction and help prevent unwanted occurrences. Such a machine could even become a replacement for a human.

FUTURE SCOPE
Emotion recognition is the process of machines detecting, interpreting, and
classifying human emotion based on facial characteristics.

Visual emotion analysis is a high-level vision task due to the affective gap between low-level pixels and high-level emotions. Despite the challenges, visual emotion analysis opens up possibilities, because comprehending human emotions is a crucial step toward robust AI. Due to the fast evolution of convolutional neural networks, deep learning has become the dominant model for emotion detection and identification.

BIBLIOGRAPHY
[1]. Chu Wang, Jiabei Zeng, Shiguang Shan, Xilin Chen, "Multi-task Learning of Emotion Recognition and Facial Action Unit Detection with Adaptively Weights Sharing Network," in IEEE, 2019

[2]. Ninad Mehendale, "Facial emotion recognition using convolutional neural networks (FERC)," 18 February 2020

[3]. James Pao, "Emotion Detection Through Facial Feature Recognition," in International Journal of Multimedia and Ubiquitous Engineering, November 2017

[4]. Aitor Azcarate, Felix Hageloh, Koen van de Sande, Roberto Valenti, "Automatic facial emotion recognition," January 2005

[5]. Dan Duncan, Gautam Shine, Chris English, "Facial Emotion Recognition in Real-Time," November 2016

[6]. Shivam Gupta, "Facial emotion recognition in real-time and static images," in 2nd International Conference on Inventive Systems and Control (ICISC) IEEE, 28 June 2018

[7]. Aneta Kartali, Miloš Roglić, Marko Barjaktarović, Milica Đurić-Jovičić, Milica M. Janković, "Real-time Algorithms for Facial Emotion Recognition: A Comparison of Different Approaches," in 2018 14th Symposium on Neural Networks and Applications (NEURAL), November 2018

[8]. Jonathan, Andreas Pangestu Lim, Paoline, Gede Putra Kusuma, Amalia Zahra, "Facial Emotion Recognition Using Computer Vision," in Indonesian Association for Pattern Recognition International Conference (INAPR) IEEE, 31 January 2019

[9]. Renuka S. Deshmukh, Vandana Jagtap, Shilpa Paygude, "Facial emotion recognition system through machine learning approach," in International Conference on Intelligent Computing and Control Systems (ICICCS) IEEE, 11 January 2018

[10]. Hyeon-Jung Lee, Kwang-Seok Hong, "A Study on Emotion Recognition Method and Its Application using Face Image," in International Conference on Information and Communication Technology Convergence (ICTC) IEEE, 14 December 2017

[11]. T. K. Senthilkumar, S. Rajalingam, S. Manimegalai, V. Giridhar Srinivasan, "Human Facial Emotion Recognition Through Automatic Clustering Based Morphological Segmentation And Shape/Orientation Feature Analysis," in IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), 08 May 2017

[12]. Piórkowska, Magda, Wrobel, Monika, "Basic Emotions," in Springer International Publishing AG, 15 July 2017

[13]. Ninad Mehendale, "Facial emotion recognition using convolutional neural networks (FERC)," in Springer Nature Switzerland AG, 18 February 2020

[14]. Victor M. Alvarez, Ramiro Velázquez, Sebastián Gutierrez, Josué Enriquez-Zarate, "A Method for Facial Emotion Recognition Based on Interest Points," in International Conference on Research in Intelligent and Computing in Engineering (RICE), 25 October 2018

[15]. Byoung Chul Ko, "A Brief Review of Facial Emotion Recognition Based on Visual Information," 30 January 2018

[16]. Muzammil, Abdulrahman, "Facial expression recognition using Support Vector Machines," in 23rd Signal Processing and Communications Applications Conference (SIU) IEEE, 22 June 2015

[17]. Turabzadeh, Saeed, Meng, Hongying, Swash, M., Pleva, Matus, Juhár, Jozef, "Facial Expression Emotion Detection for Real-Time Embedded Systems," January 2018

[18]. Dumas, Melanie, "Emotional Expression Recognition using Support Vector Machines," July 2001
