Facial Emotion Detection Using Neural Networks
ABSTRACT
Recognizing facial expressions helps systems detect whether people are happy or sad, much as a human being can. This allows software and AI systems to provide an even better experience to humans in various applications. From detecting probable suicides and preventing them to playing mood-based music, there is a wide variety of applications where emotion detection or mood detection can play a vital role in AI systems.
The system uses a CNN (convolutional neural network) to extract the physiological signals and make a prediction. Results are obtained by capturing the person's image through a camera and then correlating it with a training dataset to predict the person's emotional state.
The system can detect the live emotions of a particular user; it compares the captured information with a training dataset of known emotions to find a match. Different emotion types are detected through the integration of information from facial expressions, body movement and gestures, and speech. The technology is said to contribute to the emergence of the so-called emotional or emotive Internet. The algorithm involves the use of supervised machine learning, in which a large set of annotated data is fed into the algorithm so that the system can learn and predict the appropriate emotion.
INTRODUCTION
Facial expressions play a key role in understanding and detecting emotion. Even the term "interface" suggests how important the face is in communication between two entities. Studies have shown that reading facial expressions can significantly alter the interpretation of what is spoken as well as control the flow of a conversation. The ability to interpret emotions is very important to effective communication; up to 93% of the communication in a normal conversation depends on the emotion of an entity. Ideal human-computer interfaces (HCI) would therefore require machines that can read human emotion. This research investigates how computers can detect emotion properly from their various sensors; in this experiment, facial images are used as the medium for reading human emotion. Research on human emotion can be traced back to Darwin's pioneering work and has since attracted many researchers to this area. Seven basic emotions are universal to human beings, namely neutral, angry, disgust, fear, happy, sad, and surprise, and these basic emotions can be recognized from a human's facial expression. This research proposes an effective way to detect four of these emotions, neutral, happy, sad, and surprise, from frontal facial images.
During the past decades, various methods have been proposed for emotion recognition, and many algorithms have been suggested to develop systems and applications that can detect emotions well. Computer applications could communicate better by changing their responses according to the emotional state of human users in various interactions. The emotion of a person can be determined from speech, the face, or even gesture. The work presented in this paper explores the recognition of expressions from the face.
REVIEW OF LITERATURE
Computers that can recognize facial expressions and infer the motive of a person have applications in entertainment, social media, content analysis, criminal justice, and healthcare. Two approaches are mainly discussed here: Zhang's approach and Gabor wavelet coefficients. Zhang has shown that a lower resolution (64x64) is adequate; here the extracted faces are resized to 100x100 pixels. When using the HOG and SVM classifier only, the detection accuracy is 81%, much better than a Fisherface-only approach. When using the dual-classifier method, the accuracy is the same as HOG only at 81%, but the testing process is 20% faster. A minimal code sketch of the HOG and SVM approach is given below.
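The following is an illustration only, not the exact pipeline of the cited work; it assumes scikit-image and scikit-learn are installed, and that faces and labels are hypothetical variables holding cropped grayscale face images and their emotion codes.

# Hedged sketch: HOG descriptors on 100x100 face crops, classified with a linear SVM.
# faces and labels are assumed inputs (cropped grayscale faces and emotion codes).
import numpy as np
from skimage.transform import resize
from skimage.feature import hog
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def hog_features(face, size=(100, 100)):
    # Resize each face crop to 100x100 and extract HOG descriptors.
    return hog(resize(face, size), orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

X = np.array([hog_features(f) for f in faces])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=42)

clf = SVC(kernel="linear")      # linear SVM over the HOG descriptors
clf.fit(X_tr, y_tr)
print("HOG + SVM accuracy:", clf.score(X_te, y_te))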
SYSTEM ANALYSIS
FEASIBILITY STUDY:
The result should inform the decision whether to go ahead with a more detailed analysis. The feasibility of the system is decided based on three distinct aspects, which are considered in the investigation phase. A detailed study was therefore carried out to check the workability of the proposed system. A feasibility study is the test of a system proposal regarding its workability, impact on the organization, ability to meet user needs, and effective use of resources.
EXISTING METHOD
Zao et al. have achieved accuracy of up to 99.3%, but at the cost of a 22-layer neural network; training such a large network is a time-consuming job. FERC uses a keyframe extraction method, whereas others have only used the last frame. Jung et al. worked with fixed frames, which makes the system less efficient with video input. The number of folds of training in most other work was only ten, whereas we could go up to 25-fold training because of the small network size.
PROPOSED SYSTEM
The problem statement covers robust and automated face detection, analysis of the captured image and meaningful interpretation of its facial expressions, creation of datasets for testing and training, and the design and implementation of well-fitted classifiers to learn the vectors of the facial descriptors. We propose a model capable of recognizing up to six emotions that are considered universal across cultures: fear, happiness, sadness, surprise, disgust, and anger. Our system aims to understand a face and its characteristics and then make a weighted assumption about the identity of the person. This approach draws mainly on the most widely used algorithm for this task, the Viola-Jones algorithm.
SYSTEM CONFIGURATION
HARDWARE SPECIFICATION:
Lenovo Yoga 530 model laptop with Intel i5 8th generation CPU
RAM: 8 GB
SOFTWARE SPECIFICATION:
The first stage of normalization is to reduce the data to its first normal form by removing repeating items, showing them as separate records, and including in them the key fields of the original record.
The next stage, reduction to second normal form, is to check that in each record which is in first normal form, all the items are entirely dependent on the key of the record. If a data item is not dependent on the key of the record but on another data item, it is removed with its key to form another record. This is repeated until each record contains only data items that are entirely dependent on the key of their record.
The final stage of the analysis, reduction to third normal form, involves examining each record that is in second normal form to see whether any items are mutually dependent. If there are, they are removed to a separate record, leaving one of the items behind in the original record and using it as the key in the newly created record.
SFD/DFD:
METHODOLOGY
The leading challenge faced by machine learning, and by this system as a whole, is the training part, where the system has to be trained using real data of human facial reactions. For example, if the system has to detect an angry face, it first has to be acquainted with angry faces; likewise, if it has to detect a happy face, it first has to be acquainted with happy faces. To acquaint the system with these emotion types, a re-training process has been used, with the re-training data collected from the real world. The hardest part of this system was the re-training part, although there are many other parts to the system. Machine learning is a strong tool that enables analysis of large databases more proficiently and quickly, which makes emotion detection more accurate and gives feedback in real time: the system does not have to wait for a future result, nor does the image have to be stored. With the help of modern-day computers, data mining techniques can analyse thousands of records within a very short amount of time, saving many hours. Besides, using and installing such programs costs significantly less, and if properly optimized these techniques can give better outcomes than a human. This work presents a general and feasible framework for emotion data mining to identify emotion patterns using machine learning. The program is based on a deep learning model and computer-vision emotion recognition, and uses a CNN algorithm. The proposed approach extends one that recognized only seven emotions with a CNN. Its emotion recognition method using deep learning follows four steps (a small code sketch is given after the list):
(1) Training the public face database with CNN.
(2) Extraction of seven probabilities for each frame of the face.
(3) Aggregation of single-frame probabilities into fixed-length image
descriptors for each image in the dataset.
(4) Classification of all images using a support vector machine (SVM)
trained on image descriptors of the competition training set.
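The sketch below illustrates steps (2)-(4) under simplifying assumptions: the per-frame probabilities from the trained CNN are averaged into one fixed-length descriptor per clip, and an SVM is trained on those descriptors. The names cnn_model, videos, and video_labels are hypothetical placeholders for the trained CNN from step (1), the frame batches, and their emotion labels.

# Hedged sketch of steps (2)-(4): aggregate per-frame CNN probabilities into a
# fixed-length descriptor, then train an SVM on the descriptors.
import numpy as np
from sklearn.svm import SVC

def clip_descriptor(cnn_model, frames):
    probs = cnn_model.predict(frames)   # (n_frames, 7) softmax outputs per frame
    return probs.mean(axis=0)           # fixed-length (7,) descriptor for the clip

descriptors = np.array([clip_descriptor(cnn_model, v) for v in videos])
svm = SVC(kernel="rbf")
svm.fit(descriptors, video_labels)      # step (4): SVM on the image descriptors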
A. Emotion Database
In the data collection step, both real-world media and online media were used to collect as much data as possible. The real-world data include different kinds of facial expressions of friends, family members, relatives, and some known and unknown people; the collected data were initially stored for future analysis. The online dataset was collected from kaggle.com, where it was uploaded six years ago and is a widely trusted emotion dataset. The data were converted into 48x48-pixel grayscale face images and contain two columns, pixels and emotion. The emotion column contains a numeric code that runs from 0 to 6, and the pixels column contains a quoted string of pixel values for each picture. Furthermore, each picture should contain only a face, so the collected pictures are resized and cropped to a clear picture of a face.
The model type that we will be using is Sequential. Sequential is the easiest way to build a model in Keras; it allows you to build a model layer by layer, and we use the add() function to add layers to our model. Our first two layers are Conv2D layers. These are convolution layers that deal with our input images, which are seen as 2-dimensional matrices. 64 in the first layer and 32 in the second layer are the numbers of nodes in each layer. This number can be adjusted to be higher or lower, depending on the size of the dataset; in our case, 64 and 32 work well, so we will stick with this for now. Kernel size is the size of the filter matrix for our convolution, so a kernel size of 3 means we will have a 3x3 filter matrix (refer back to the introduction and the first image for a refresher on this). Activation is the activation function for the layer; for our first two layers we use ReLU (Rectified Linear Unit), which has been proven to work well in neural networks. Our first layer also takes an input shape, the shape of each input image, 28,28,1 as seen earlier, with the 1 signifying that the images are grayscale. In between the Conv2D layers and the dense layer there is a Flatten layer, which serves as a connection between the convolution and dense layers. The model then makes its prediction based on whichever option has the highest probability.
Next, we need to compile the model. Compiling the model takes three parameters: optimizer, loss, and metrics. The optimizer controls the learning rate; we will use 'adam' as our optimizer. Adam is generally a good optimizer for many cases, and it adjusts the learning rate throughout training. The learning rate determines how fast the optimal weights for the model are calculated; a smaller learning rate may lead to more accurate weights (up to a certain point), but the time it takes to compute the weights will be longer. We will use 'categorical cross-entropy' for our loss function, which is the most common choice for classification; a lower score indicates that the model is performing better. To make things even easier to interpret, we will use the 'accuracy' metric to see the accuracy score on the validation set when we train the model.
To train, we will use the fit() function on our model with the following parameters: training data (train_X), target data (train_y), validation data, and the number of epochs. For the validation data, we will use the test set provided with the dataset, which has been split into X_test and y_test. The number of epochs is the number of times the model will cycle through the data. The more epochs we run, the more the model will improve, up to a certain point; after that point, the model will stop improving during each epoch. For our model, we will set the number of epochs to 3. After 3 epochs, the model reaches 93% accuracy on the validation set.
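A minimal sketch of the model described in this paragraph is shown below. It assumes train_X, train_y, X_test, and y_test are already prepared (28x28x1 images and one-hot labels) and that num_classes matches the label set; these preparations are assumptions and are not shown here.

# Hedged sketch of the layer-by-layer model described above.
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

num_classes = 7   # assumed number of output classes; adjust to your labels

model = Sequential()
model.add(Conv2D(64, kernel_size=3, activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, kernel_size=3, activation='relu'))
model.add(Flatten())                                # bridge to the dense layer
model.add(Dense(num_classes, activation='softmax'))

# Compile with the optimizer, loss, and metric discussed above.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train for 3 epochs, validating on the held-out test split.
model.fit(train_X, train_y, validation_data=(X_test, y_test), epochs=3)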
C. Detection
K-means clustering was used with the number of clusters taken as two. Here, the maximum value in each row is found and its average determined; similarly, the minimum value in each row is found and its average determined. Taking these two points as the base, the pixel values nearer to the maximum average are grouped into one cluster and the pixel values nearer to the minimum average are grouped into another cluster. Based on the clustering result, the total number of components in the image is calculated. Based on the number of components, the person's eyes are segmented first by using a bounding-box function; since the eye or eyebrow forms the first element while traversing the pixel values column-wise, the eyes are segmented first. Using the eye matrix, other facial parts are segmented using a distance-based algorithm. The resulting images after performing k-means clustering for different expressions are shown.
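The clustering step can be sketched as follows, assuming gray is a hypothetical 2-D grayscale face image; the per-row maximum and minimum averages are used as the two initial cluster centres, as described above.

# Hedged sketch of the two-cluster k-means step; gray is an assumed 2-D grayscale image.
import numpy as np
from sklearn.cluster import KMeans

max_avg = gray.max(axis=1).mean()    # average of the per-row maxima
min_avg = gray.min(axis=1).mean()    # average of the per-row minima

init_centers = np.array([[max_avg], [min_avg]])
km = KMeans(n_clusters=2, init=init_centers, n_init=1)
labels = km.fit_predict(gray.reshape(-1, 1))   # assign each pixel to the nearer centre

cluster_map = labels.reshape(gray.shape)       # used to count connected components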
i) Edge features
For example, to detect a person's face, we first convert the image to grayscale and then perform image segmentation.
(a) Block diagram of FERC. The input image is taken from a camera or extracted from a video. The input image is then passed to the first-part CNN for background removal. After background removal, the facial expressional vector (EV) is generated. Another CNN (the second-part CNN) is applied with the supervisory model obtained from the ground-truth database. Finally, the emotion in the current input image is detected. (b) Facial vectors marked on the background-removed face. Here, nose (N), lip (P), forehead (F), and eyes (Y) are marked using edge detection and nearest-cluster mapping. The positions left, right, and center are represented by L, R, and C, respectively.
Figure: Landmark image
For a real image, averaging the black region gives 0.74 and, in the same way, the white region value is 0.18, so the difference between dark and white is 0.74 - 0.18 = 0.56.
Δ for the real image: 0.56
Figure: Feature extraction
Convolution filter operation with a 3×3 kernel: each pixel from the input image and its eight neighboring pixels are multiplied by the corresponding values in the kernel matrix, and all the products are then added together to produce the final output value.
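The operation can be illustrated numerically with the small sketch below; the 5x5 input and the kernel values are arbitrary examples.

# Illustration of the 3x3 convolution described above: each pixel and its eight
# neighbours are multiplied element-wise with the kernel and the products are summed.
import numpy as np

def conv2d_3x3(image, kernel):
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = image[i:i + 3, j:j + 3]     # the pixel plus its 8 neighbours
            out[i, j] = np.sum(patch * kernel)  # multiply and add
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)       # example edge-detecting kernel
print(conv2d_3x3(image, kernel))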
Neural networks are typically organized in layers. Layers are made up of several interconnected nodes that contain an activation function. Patterns are presented to the network via the input layer, which communicates with one or more hidden layers where the actual processing is done via a system of weighted connections. The facial expression recognition process is divided into three stages: image pre-processing, which involves face and facial-part detection using the Viola-Jones algorithm; facial feature extraction; and feature classification using a CNN.
Keras is an open-source neural network library in Python, used here for pre-processing, modeling, evaluation, and optimization. It provides a high-level API, with the low-level work handled by the backend; it is designed for building a model with a loss and optimizer function and training it with the fit function. The backend (TensorFlow) performs the convolution and low-level computation on tensors. The Python libraries imported below are used for preprocessing, modeling, optimization, testing, and displaying the emotion with the maximum percentage. The system uses a sequential model and layers such as image pre-processing, convolution layers, pooling layers, flatten layers, dense layers, and the ReLU activation.
Image preprocessing is the first phase of the proposed system and involves face detection and facial-part (FP) detection and extraction. The Viola-Jones face detection framework, a robust algorithm capable of processing images extremely rapidly for real-time situations, is used. This algorithm detects the face region irrespective of variance in size, background, brightness, and spatial transformation of the raw input image. Face and FP detection is achieved by combining classifiers in a cascade structure, which increases detection performance while reducing computational complexity. The final classifier is computed as a linear combination of all the weak classifiers, which separates positives and negatives in terms of the weighted error (the weight of each learner is directly proportional to its accuracy). The face is first detected, cropped, extracted, and normalized to a size of 64 x 64 pixels, and then the facial parts (both eyes and the mouth) are detected, cropped, and extracted from the normalized face image. The extracted facial parts are resized to an equal size of 32 x 64 pixels. The reduced image scale lowers the amount of information that has to be learned by the network and also makes training faster and less memory-intensive.
Convolution layers are added for better accuracy on large datasets. The dataset is collected from a CSV file (in pixel format), converted into images, and then the emotions are classified with their respective expressions.
Here emotions are classified as happy, sad, angry, surprise, neutral, disgust, and fear, with 34,488 images for the training dataset and 1,250 for testing. Each emotion is expressed with different facial features such as the eyebrows, an open mouth, raised cheeks, wrinkles around the nose, wide-open eyelids, and many others. The large dataset is trained for better accuracy, and the result is the object class for an input image.
Pooling is a concept in deep learning visual object recognition that goes hand-in-hand with convolution. The idea is that a convolution (or a local neural-network feature detector) maps a region of an image to a feature map; for example, a 5x5 array of pixels could be mapped to oriented edge features. Flattening then reduces the stacked feature maps to a single one-dimensional vector so that they can be passed to the dense layers. The dense layer is the regular deeply connected neural network layer; it is the most common and frequently used layer and performs the operation output = activation(dot(input, kernel) + bias) on the input to return the output. Based on the connection strengths (weights), inhibition or excitation, and transfer functions, the activation value is passed from node to node; each node sums the activation values it receives and then modifies the value based on its transfer function. In Keras, dropout can be implemented by adding Dropout layers into the network architecture; each Dropout layer drops a user-defined fraction of the units in the previous layer in every batch. Remember that in Keras the input layer is assumed to be the first layer and is not added using add(). ReLU is one of the most popular types of nonlinearity used in neural networks; it is applied after the convolutional layer and before max pooling, and it replaces all negative pixel values in the feature map with zero.
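A hedged sketch of this pre-processing stage is given below using OpenCV's Haar cascade (Viola-Jones) face detector; the input path is a placeholder, and the upper and lower halves of the normalized face are used as rough stand-ins for the separately detected eye and mouth regions.

# Hedged sketch: Viola-Jones (Haar cascade) face detection, cropping, and
# normalization to 64x64, with rough 32x64 facial-part crops.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("input.jpg")                      # hypothetical input image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    face = cv2.resize(gray[y:y + h, x:x + w], (64, 64))  # normalized 64x64 face
    eyes_region = face[:32, :]    # upper half, 32x64 (eyes/eyebrows proxy)
    mouth_region = face[32:, :]   # lower half, 32x64 (mouth-region proxy)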
Example:
At first, the image is taken from the user and the noise is removed. Then only the person's face is identified by applying Haar features, and the image is matched against the previously trained dataset. The Keras library for Python is used here; it works with a convolutional neural network (CNN), and the CNN works with a sequential model. It also uses layers such as Conv2D, MaxPooling2D, AveragePooling2D, Dense, Activation, Dropout, and Flatten. After this processing, these layers select the emotion from the classification set, which is the final output. After performing some pre-processing (if necessary), the normalized face image is presented to the feature extraction part to find the key features that will be used for classification; in other words, this module is responsible for composing a feature vector that represents the face image well enough. After this comparison, the face image is classified into one of the seven expressions (anger, contempt, disgust, fear, happiness, sadness, surprise). Machine learning algorithms work well on datasets that have a few hundred features or columns. The algorithm classifies an image, determines the sentiment of the image, and chooses the matching emotion for the image.
The reason for choosing a deep learning classifier is that it runs data through several layers, and a deep learning algorithm is suited to more complex problems because it needs access to an immense amount of data to be effective; for images, the common benchmark for training deep learning models for broad image recognition contains more than 14 million images. For clear visualization of the emotion-detection pattern analysis, a decision tree is used. In the decision tree, the characteristics are represented by the nodes and layers, and the outcome of the experiment is represented by the branches. The advantage of the decision tree is that it makes it easy to visualize the emotion and interpret the result.
The working process of a decision tree is easy to understand. The data are classified according to movement, reactions, and order, which ideally correspond to different types of emotions. They are also organized into trees and sub-trees reflecting whether the person is sad, angry, happy, and so on, so that the emotions can be categorized more simply using these methods. To do this, a retraining method is used that memorizes the pattern and checks the conditions: when any condition is satisfied, the process carries on to the end of the tree, but if none of the intermediate conditions is satisfied, it stops checking and reports "The emotion cannot be identified; the emotion is unknown."
Emotions are complicated to understand: there are different kinds of expression for the same emotion, and different people give different expressions for the same kind of emotion. Modern-day machine learning technology can help law-enforcement authorities detect emotion, so that machines can understand human emotions and behave and act more like humans. The emotion data came from different online and offline media, such as Google and kaggle.com, as well as friends, family, and random people. The Keras library is used to initially classify and analyze the emotion data; then, with the help of Haar features and NumPy, the emotion is identified, and with the help of the Anaconda platform the output is generated from the raw data, with the result shown in real time. A hierarchical data mining procedure such as a decision tree helps to generate probability decisions by calculating the various characteristics that are initially used to identify the emotion pattern. Along with offline and online data collection, an effective field study was also conducted to gather more people, various kinds of people, and many different facial expressions of emotion. In the online data collection, the dataset is taken from kaggle.com, which provides quality datasets; the images were converted to grayscale pixel values, giving quality data and a better result. Both of the experts believed that this analysis of sentiment could help identify emotion more accurately and help take accurate actions based on accurate emotion identification. It would provide more knowledge about the different types of expression of sentiment as well as the percentage of each of the various kinds of emotions present.
While completing this work, we found that a large quantity of test data and keywords is needed to achieve greater accuracy, and a good quantity of raw data is also required to extend the research work. A computer equipped with a high-configuration graphics processing unit (GPU) is also required to process a large quantity of test data in the shortest time. So, given adequate data along with a high-performance computer, it would be easier to raise the accuracy to more than 97%. It would also be possible to use the system on different platforms for different outcomes and to help determine emotion expression patterns.
CREATE DATASET
As we know, Keras has many data preprocessing APIs, and to feed the model we will use Keras' data generation API; according to Keras, the dataset must be stored in specific directories. Keras' data generation API expects images to be sorted into separate directories, such as training and validation, and each directory has one sub-directory per category.
Once the images are copied to their respective folders and sub-folders, we define image generators to preprocess the images. Keras provides the ImageDataGenerator API for preprocessing. Python code for the training and validation image generators is sketched below.
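Since the generator code itself is not reproduced in the report, the following is a hedged sketch; the directory names, target size, color mode, and batch size are assumptions to be adapted to the actual folder layout and model input.

# Hedged sketch of the training and validation image generators.
from keras.preprocessing.image import ImageDataGenerator

batchsize = 32

train_datagen = ImageDataGenerator(rescale=1. / 255,      # normalize pixel values
                                   horizontal_flip=True)  # simple augmentation
valid_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    "data/train",                 # assumed path; one sub-directory per emotion
    target_size=(48, 48),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=batchsize)

validation_generator = valid_datagen.flow_from_directory(
    "data/validation",            # assumed path
    target_size=(48, 48),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=batchsize)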
Model Saving
The next step after compilation is to fit the model and, for that, the fit_generator function is used.

nb_train_samples = train_df.size
nb_validation_samples = valid_df.size
epochs = 10

history = base_model_VGG16.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batchsize,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batchsize,
    callbacks=callbacks_list,
    class_weight=class_weights,
    epochs=epochs)

# To save the model with a .h5 extension
saved_model_path = "./saved_models/Emotion_Detection_Model.h5"
base_model_VGG16.save(saved_model_path)
To visualize the model's prediction output, we can use the method below: the code plots the input image along with a bar chart indicating the percentage for each individual emotion.
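A hedged sketch of such a plot is shown below; it assumes a trained model, a preprocessed single image img, and an EMOTIONS label list matching the model's output order (all assumptions).

# Hedged sketch: show the input image next to a bar chart of per-emotion percentages.
# model, img, and the label order in EMOTIONS are assumptions.
import numpy as np
import matplotlib.pyplot as plt

EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

probs = model.predict(np.expand_dims(img, axis=0))[0]   # softmax output for one image

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(img.squeeze(), cmap="gray")
ax1.set_title("Predicted: " + EMOTIONS[int(np.argmax(probs))])
ax1.axis("off")

ax2.barh(EMOTIONS, probs * 100)     # individual emotion percentages
ax2.set_xlabel("Probability (%)")
plt.tight_layout()
plt.show()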
Testing is a process to identify and correct the errors in the proposed system. Before implementing the system, it should be tested to make sure that it works effectively under various conditions. Every new system must be tested in various ways to make sure of its correctness.
Unit testing
Module testing
Sub-system testing
System testing
Acceptance testing
Task testing
Behavioral testing
Intertask testing
User Interface testing
Integration testing
Unit Testing
Unit testing focuses verification and validation effort on the smallest unit of software design, i.e., the module. Unit testing is always white-box oriented, and the step can be conducted in parallel for multiple modules. The software developer does not simply turn the program over to the independent test group (ITG) and walk away.
A program unit is usually small enough that the programmer who developed it can test it in great detail, and certainly in greater detail than will be possible when the unit is integrated into an evolving software product.
There are four categories of test that a programmer will typically perform on a program unit.
Functional tests
Performance tests
Stress tests
Structure tests
Functional Tests
Functional tests, in which test cases exercise the code with nominal input values for which the expected results are known, were carried out.
Performance Testing
This testing is designed to test the run-time performance of software within the context of an integrated system. It occurs throughout all steps in the testing process and is concerned with evaluating the speed and memory utilization of the program.
Stress Tests
Structure Tests
Structure testing is also referred to as White Box or Glass Box testing. The
project is tested for its execution in every module. The testing operation is
successfully done and every module performs properly.
Module Testing
The most common problem, which arises in large software systems, is sub-
system interface mismatches. The sub-system test process should therefore
concentrate on the detection of interface errors by rigorously exercising these
interfaces.
System Testing
The sub-systems are integrated to build up the entire system. The testing process is concerned with finding errors that are an outcome of unanticipated interactions between sub-systems and system components. It is also concerned with validating whether the system meets its functional and non-functional requirements.
Acceptance Testing
This is the final stage in the testing process before the system is accepted for operational use. The system is tested with data supplied by the system developer rather than simulated test data. Acceptance testing may reveal errors and omissions in the system requirements definition, because the real data exercise the system in different ways from the test data, showing that the system's facilities do not really meet the users' needs or that the system performance is unacceptable.
Task Testing
The first step in the testing of real-time software is to test each task
independently. That is, the white and black box tests are designed and executed for
each task. Each task is executed independently during the tests. The task testing
uncovers errors in logic and functions, but will not uncover timing or behavioral
errors.
Behavioural Testing
Using system models created with CASE tools, it is possible to simulate the behavior of external events. Using a technique similar to equivalence partitioning, events are categorized for testing.
Each of these events is tested individually, and the behavior of the executing system is examined to detect errors that occur as a consequence of processing associated with these events. Once each class of events has been tested, events are presented to the system in random order and with random frequency.
Intertask Testing
Once the errors in individual tasks and in system behavior have been isolated, testing shifts to time-related errors. The asynchronous tasks that are known to communicate with one another are tested with different data rates and processing loads to determine whether intertask synchronization errors will occur.
Integration Testing
The idea is to test combinations of pieces and eventually expand the process
to test your modules with those of other groups. Eventually all the modules making
up a process are tested together. Beyond that, if the program is composed of more
than one process, they should be tested in pairs rather than all at once.
Integration testing identifies problems that occur when units are combined.
By using a test plan that requires you to test each unit and ensure the viability of
each before combining units, you know that any errors discovered when combining
units are likely related to the interface between units. This method reduces the number of possibilities to a far simpler level of analysis.
IMPLEMENTATION
SAMPLE CODE
Implementing VGG16 Network for Classification of Emotions with GPU
First, we need to enable GPU in the Google Colab to get fast processing. We
can enable it by going to ‘Runtime’ in Google Colab and then clicking on ‘Change
runtime type’ and select GPU. Once it is enabled we will now import the required
libraries for building the network. The code for importing the libraries is given
below.
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers.core import Flatten, Dense, Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
import cv2
We have now imported all the libraries, and next we will import the dataset. I have already saved it in my drive, so I will read it from there. You can give the directory where your data is stored inside the parentheses, as shown in the code below. After importing, we print the data frame, as shown in the image.
emotion_data = pd.read_csv('/content/drive/My Drive/Emotion_Detection /fer2013.csv')
print(emotion_data)
Figure: View of the data frame
We then create different lists for storing the testing and training image pixels. After this, we check whether a pixel row belongs to training; if so, we append it to the training list and training labels. Similarly, pixels belonging to the public test are appended to the testing lists. The code for this is shown below.
X_train = []
y_train = []
X_test = []
y_test = []

for index, row in emotion_data.iterrows():
    k = row['pixels'].split(" ")
    if row['Usage'] == 'Training':
        X_train.append(np.array(k))
        y_train.append(row['emotion'])
    elif row['Usage'] == 'PublicTest':
        X_test.append(np.array(k))
        y_test.append(row['emotion'])
Once we have added the pixels to the lists, we convert them into NumPy arrays and reshape X_train and X_test. After doing this, we convert the training and testing labels into categorical ones. The code for this is given below.
X_train = np.array(X_train)
y_train = np.array(y_train)
X_test = np.array(X_test)
y_test = np.array(y_test)
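The reshaping and one-hot encoding mentioned above are not shown in the snippet; a hedged completion is given below, assuming 48x48 grayscale images and 7 emotion classes.

# Assumed completion: reshape to 48x48x1 image tensors and one-hot encode the labels.
from keras.utils import to_categorical

X_train = X_train.astype('float32').reshape(-1, 48, 48, 1)
X_test = X_test.astype('float32').reshape(-1, 48, 48, 1)

y_train = to_categorical(y_train, num_classes=7)
y_test = to_categorical(y_test, num_classes=7)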
Now it is time to design the CNN model for emotion detection with different layers. We start with the initialization of the model, followed by a batch normalization layer and then different convolution layers with ReLU as the activation function, max-pooling layers, and dropouts to make learning efficient. You can also change the architecture by initiating layers of your choice with different numbers of neurons and activation functions.
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(48,48,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, 3, 3, activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))
model.add(Flatten())
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(4096, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(7, activation='softmax'))
After this, we compile the model using Adam as the optimizer, loss as categorical cross-entropy, and metrics as accuracy, as shown in the code below.
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
After compiling the model, we then fit the data for training and validation. Here, we are taking the batch size to be 32 with 30 epochs. You can tune them as you wish.
model.fit(X_train, y_train, batch_size=32, epochs=30, verbose=1, validation_data=(X_test, y_test))
Training of the Network
Once the training has been done we can evaluate the model and compute loss and
accuracy using the below code.
loss_and_metrics = model.evaluate(X_test,y_test)
print(loss_and_metrics)
We now serialize the model to JSON and save the model weights in an HDF5 (.h5) file so that we can use this file to make predictions rather than training the network again. You can do this with the code below.
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)
model.save_weights("model.h5")
print("Saved model to disk")
CONCLUSION
An experienced human can often identify another human's emotions by looking at and analyzing him or her. In this modern age, however, machines are becoming more intelligent and are trying to act more like humans. If a machine has been trained in how to react to human sentiment at a given moment, then it can behave and act like a human; on the other hand, if the machine can identify the emotion, it can also prevent many incidents. With increased proficiency and error-free computation, emotion data mining can reveal accurate expression patterns, enabling machines to act more like humans effectively. To determine emotion expression patterns, this thesis created a framework through comprehensive research and field work, and followed that framework step by step to get the expected outcome. To follow the framework and identify emotion expression patterns more effectively, a deep learning CNN algorithm was used along with Keras, TensorFlow, and retraining concepts. With these techniques, it was possible to identify emotions and the type of emotion in real images. To present the results and procedures more visually, decision tree techniques were also introduced, which help decide which emotion percentage is high and which is low: the emotion with the highest percentage is taken as the most probable emotion, while low-percentage emotions have a low chance of being present. With this approach, it is now possible to determine emotions accurately, and machines can identify emotion more accurately, give a proper reaction on that basis, and also help prevent unwanted occurrences. Such a machine could even become a replacement for a human in this task.
FUTURESCOPE
Emotion recognition is the process of machines detecting, interpreting, and
classifying human emotion based on facial characteristics.
Visual emotion analysis is a high-level vision task due to the affective gap between low-level pixels and high-level emotions. Despite the challenges, visual emotion analysis opens up possibilities because comprehending human emotions is a crucial step in achieving robust AI. Due to the fast evolution of convolutional neural networks, deep learning has become the dominant model for emotion detection and identification.
BIBLIOGRAPHY
[1]. Chu Wang, Jiabei Zeng, Shiguang Shan, Xilin Chen, "MULTI-TASK
[4]. Aitor Azcarate, Felix Hageloh, Koen van de Sande, Roberto Valenti, "Automatic facial emotion recognition," January 2005
[5]. Dan Duncan, Gautam Shine, Chris English, "Facial Emotion Recognition in Real-Time," November 2016
[6]. Shivam Gupta, "Facial emotion recognition in real-time and static images," in 2nd International Conference on Inventive Systems and Control (ICISC), IEEE, 28 June 2018
[15]. Byoung Chul Ko, "A Brief Review of Facial Emotion Recognition Based on Visual Information," 30 January 2018
[16]. Muzammil, Abdulrahman, "Facial expression recognition using Support Vector Machines," in 23rd Signal Processing and Communications Applications Conference (SIU), IEEE, 22 June 2015
[17]. Turabzadeh, Saeed, Meng, Hongying, Swash, M., Pleva, Matus, and Juhár, Jozef, "Facial Expression Emotion Detection for Real-Time Embedded Systems," January 2018
[18]. Dumas, Melanie, "Emotional Expression Recognition using Support Vector Machines," July 2001