
3rd International Conference and Workshops on Recent Advances and Innovations in Engineering, 22-25 November 2018
(IEEE Conference Record # 43534)

Emotion Detection and Characterization using Facial Features

Charvi Jain, Computer Science and Engineering, Indian Institute of Information Technology, H.P., India
Kshitij Sawant, Electronics and Communication Engineering, Manipal University, Jaipur, India
Mohammed Rehman, Electronics and Communication Engineering, Manipal University, Jaipur, India
Rajesh Kumar, Dept. of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, India
[email protected] [email protected] [email protected] [email protected]

Abstract—The human face has peculiar and specific characteristics; therefore, it becomes difficult to understand and identify facial expressions. It is easy to identify the facial expression of a particular person in any image sequence. If we look at automated recognition systems, however, the systems available are quite inadequate and incapable of accurately identifying emotions. The area of facial expression identification has many important applications. It is an interactive tool between humans and computers: the user, without using the hands, can proceed with facial expressions alone. Presently, research on facial expressions concentrates on the factors sad, happy, disgust, surprise, fear and angry. This paper aims to detect faces from any given image, extract facial features (eyes and lips) and classify them into 6 emotions (happy, fear, anger, disgust, neutral, sadness). The training data is passed through a series of filters and processes and is eventually characterized through a Support Vector Machine (SVM), refined using Grid Search. The testing data then tests the data and their labels and gives the accuracy of classification of the testing data in a classification report. Various approaches, including passing the training images through a Gabor filter, or transforming images using the Histogram of Oriented Gradients (HOG) and the Discrete Wavelet Transform (DWT) for better classification of data, are implemented. The best result achieved so far is by passing the training images through the Histogram of Oriented Gradients (HOG), followed by characterization by SVM, which gives an average precision of 85%.

Keywords—Characterization, Facial Expression, Emotions, Cascade, Classification, SVM, Kernel, Grid Search, Wavelet, HOG, Precision

I. INTRODUCTION

The key is to understand human behavior and how it reacts to and interacts with the environment. Computer interfaces provide technology which analyses human-computer interaction. Facial emotions convey the intention of a person. The emotion communicates the state of the person, such as joy, sadness or anger. Human communication consists of one-third verbal communication and two-thirds non-verbal communication [1]. Moreover, facial expressions are an important means of interpersonal communication. Therefore, the facial expression is a key means for the detection of emotions. The non-verbal interaction among humans is through facial expression. The reason behind this interaction is that humans can identify emotion in an efficient and prompt manner. Thus, there exists a demand to develop a machine which can recognize human emotion.

The objective of the work is to evaluate the performance using different models and their combinations. The models include Support Vector Machine, Linear Discriminant Analysis, Principal Component Analysis, Fisherface classifier, Gabor filters, Discrete Wavelet Transform and Histogram of Gradients. The methodologies were defined for the respective models, and the Cohn-Kanade dataset is used as a stimulus for evaluation of the models [2]. This is followed by a comparative study of the said models and results. The usual way of doing the evaluation is on the complete facial expression, but the main focus here was to reduce the features to eyes and lips only.

II. FACE DETECTION, EXTRACTION AND CLASSIFICATION

In this section, a step-by-step approach towards the various processes taking place to fulfil this work is described. The final prediction of emotion is preceded by multiple processes. The first step is to determine the face of the person in the given input image, which is then succeeded by identifying the features (eyes and mouth). These features are then passed through their respective filters and transformations, if that is a part of the decided method. The outputs are then sent to the classifiers to get classified according to the trained data. This gives us the output emotion predicted by the system.

These processes will be explained in much more detail in the next sub-sections, giving a holistic view of how the system as a whole works with the various comparative methods.

A. Face Detection and Feature Extraction

Face detection is regarded as one of the most complex problems in computer vision, due to the large variations caused by changes in lighting, facial appearance and expressions.

Let us solve all the stages step by step. For face detection, the Viola-Jones algorithm is used. Though it was proposed in 2001, it remains one of the simplest and easiest methods for face detection, giving high accuracy [3].

This algorithm uses Haar-based feature filters. The objective of these filters is to find the face in an image given as input. In each sub-window, Haar features are calculated, and this difference is compared with a learned threshold that separates objects from non-objects. Haar features are weak classifiers, so a large number of Haar classifiers are organized in such a way that they form a strong classifier, which is called a "Classifier Cascade".

Each classifier looks at the sub-window and determines if the sub-window looks like a face, and if it does then the next

978-1-5386-4525-3/18/$31.00 ©2018 IEEE

Authorized licensed use limited to: UNIVERSIDAD DE ALICANTE . Downloaded on November 07,2024 at 17:39:42 UTC from IEEE Xplore. Restrictions apply.
classifier is applied. If all the classifiers give a positive answer, then a face is present in the sub-window; otherwise the size of the sub-window is changed and the whole process is repeated until the face is detected [4].

Similarly, eye and mouth cascades are used to detect the eyes and mouth in the sub-window in which a face has been detected.

B. Feature Classification: Different Approaches Towards Classification of Data

Once faces have been detected and the required features have been extracted, it is now time to put them through various methods designed to simplify and classify them into 6 emotions (happy, sadness, fear, neutral, anger, disgust).

It is important to note that each and every method described below is independent of the others in terms of application. The section heading defines the combination of filter(s)/transform(s)/classifier(s) used in the respective method. This section gives a brief account of all the methods used, along with a description and methodology. The methodology highlights the application of that method to the dataset being used in this work.

• Fisherface Classifier

Linear Discriminant Analysis (LDA) is a supervised algorithm that aims at classification of the input dataset. It analyzes the subspace that gathers the given vectors of the same class into a single blob of the feature representation while separating the different classes; thus it improves the ratio of the between-class scatter to the within-class scatter. In the subspace representation of a group of face images, the resulting basis vectors defining that space are known as Fisherfaces. They are helpful when facial images have wide differences in facial expression and illumination. Principal Component Analysis (PCA) is predominantly used for dimensionality reduction in facial classification, image compression, etc.

Initially, the training data is reduced to N − c dimensions using Principal Component Analysis, where N represents the number of images in the training set and c represents the number of classes. Thereafter, Linear Discriminant Analysis is applied to further reduce the projected data. The equation for finding the optimum projection is as follows:

W_opt^T = W_fld^T · W_pca^T    (1)

Where,
W_pca = projection representing the reduction into PCA space
W_fld = further projection representing the further-reduced LDA space

The combined projection W_opt is thus a combination of PCA and LDA. The precision of classification for the Fisherface classifier turns out to be 0.74.

• Support Vector Machine

A Support Vector Machine (SVM) is a machine-learning algorithm utilizing a supervised mechanism of classification. It classifies by utilizing a separating hyperplane discriminatively. The algorithm creates an optimal hyperplane which categorizes the testing data. In two-dimensional form the classifier represents a line with the classes on either side. This classifier utilizes the "kernel trick". This trick uses specific mathematical formulae to project the data into a feature space of higher dimensions, where a hyperplane creates the possible boundaries among the possible outputs. SVM is usually used to solve classification or regression problems. A kernel takes data as input and transforms it into the required form. The inner product between two points is returned by the kernel; thus, high-dimensional projection becomes possible at very low computational cost. The kernels used in SVM are as follows:

Linear kernel: K(x, y) = x · y    (2)

Polynomial kernel: K(x, y) = (γ(x · y) + c)^d    (3)

Radial basis function kernel: K(x, y) = exp(−γ‖x − y‖²)    (4)

Sigmoid kernel: K(x, y) = tanh(γ(x · y) + c)    (5)

where,
x, y = two samples represented as feature vectors in input space
γ = 1/2σ², where σ is a free parameter
d = degree of the polynomial
c = a free parameter

The algorithm employed in this case classifies the dataset of faces into six basic emotions (anger, disgust, fear, happy, neutral, sadness) by projecting it into a 3D feature space using kernels and separating the classes using appropriate hyperplanes. Initially, only lips were taken into consideration: data consisting of images of lips converted into pixel arrays, together with their respective labels for the 6 different emotions, was split into training and testing data. This data was trained with the SVM classifier and Grid Search. Grid Search is a sub-algorithm which checks various values of 'C' and 'gamma' in the function and gives out the best possible combination of these variables. The kernel with the maximum accuracy on the testing data is taken into consideration. Later, images of eyes were included along with those of lips, and the resulting labeled data was passed through the SVM classifier and Grid Search, and results were obtained for the combined dataset.

• Gabor Filter + SVM

Gabor filters have a mask, i.e. an array of pixels. The pixels are then given respective values which are, in a way, used as their weights. This array is convolved with the entire image, pixel by pixel. Gabor filters change their values according to the texture of the image: they give higher values at edges and at points where the texture changes. Gabor filters are thus used to detect changes in texture as well as edges in images [5].
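The SVM and Grid Search procedure described in this section can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the random feature vectors stand in for the flattened pixel arrays of lips and eyes, and the parameter grid is illustrative rather than the one used in the paper.

```python
# Sketch of SVM classification refined with Grid Search over kernel,
# C and gamma. The random features below are stand-ins for the
# flattened pixel arrays of lips/eyes; the grid values are illustrative.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((120, 64))            # 120 samples, 64 "pixel" features
y = rng.integers(0, 6, size=120)     # labels for the 6 emotions

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid Search checks various values of C and gamma for each kernel
# and keeps the best-scoring combination, as the text describes.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.001],
}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))
```

On real data, the kernel reported in `best_params_` is the one "with maximum accuracy" that the text says is carried forward.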

The images obtained by feature extraction are initially passed through the Gabor filter to highlight the edges of the facial features, as well as their texture (see Fig. 1 and Fig. 2); the resulting labeled data is then decomposed into pixel arrays and passed through the SVM classifier (discussed previously). The parameters of the Gabor filter were set to appropriate values, and various combinations of kernels provided accuracies, out of which the maximum value was taken into consideration.

Fig 1(a). Image of eyes before applying Gabor filter
Fig 1(b). Image of eyes after applying Gabor filter
Fig 2(a). Image of mouth before applying Gabor filter
Fig 2(b). Image of mouth after applying Gabor filter

• Discrete Wavelet Transform (DWT) + SVM

The Discrete Wavelet Transform (DWT) decomposes signals into mutually orthogonal wavelets. A wavelet is created using the scaling function. Discrete values are formed for representing the wavelet scales. The scaling properties describe the pattern of the image. Given that the scaling functions must be, and remain, mutually orthogonal in their discrete form, the DWT is an effective way of reducing noise and transforming data [6].

The DWT decomposes the image into wavelets. The features are transformed into a wavelet coefficient spectrum, consisting of certain values called signal data points. A data vector is in turn created which has the same size as the input. Scaling of the wavelets occurs according to the images of eyes and mouth, and the respective data vectors are created. The data vector received through the DWT of the data is then fed to the SVM classifier, and the best results are recorded.

• Histogram of Gradients (HOG) + SVM

In a HOG filter, the gradients of an image are decided by distributing the image into smaller parts called cells. The description of the gradients decides the orientation and magnitude of the pixels in the image. These orientation bins and their related gradients are fixed per cell. HOG scans the entire image and extracts the required features. A gradient proportional to the magnitude and direction at a given pixel is decided by its cell. Image recognition and object detection algorithms get an efficient boost using this filter. It is not very useful for viewing the image, though. The feature vectors produced by these algorithms give good results when fed into an image classification algorithm like the Support Vector Machine (SVM).

The images are processed through the HOG filter to create output feature vectors of every image according to its magnitude and gradient (see Fig. 3 and Fig. 4) and are then compiled into a single dataset. This dataset describes the magnitude and vector of every pixel and is passed through the SVM for classification.

Fig 3. Image of eyes before and after applying HOG filter (left to right)
Fig 4. Image of mouth before and after applying HOG filter (left to right)

• DWT + HOG + SVM

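A minimal sketch of the combined DWT-then-HOG feature chain, assuming PyWavelets and scikit-image as the implementations (the paper does not name its libraries; the wavelet family and HOG cell sizes are illustrative):

```python
# Sketch of the DWT -> HOG feature chain whose output is fed to the SVM.
# PyWavelets and scikit-image are assumed implementation choices; the
# "haar" wavelet and the HOG cell/block sizes are illustrative only.
import numpy as np
import pywt
from skimage.feature import hog

def dwt_hog_features(img):
    # Level-1 2D DWT: approximation plus horizontal/vertical/diagonal
    # detail sub-bands, each half the input size per dimension.
    approx, (horiz, vert, diag) = pywt.dwt2(img, "haar")
    # HOG over the approximation sub-band yields the gradient feature
    # vector that is finally passed to the classifier.
    return hog(approx, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

img = np.random.rand(64, 64)      # stand-in grayscale face-feature crop
features = dwt_hog_features(img)
print(features.shape)             # 1-D feature vector for the SVM
```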
The images saved after going through the Discrete Wavelet Transform (DWT) are processed through the HOG filter to create output feature vectors of every image (see Fig. 5 and Fig. 6) and are then compiled into a single dataset. This dataset describes the magnitude and vector of every pixel and is passed through the SVM for classification. The Discrete Wavelet Transform aims to convert the given image into varying scaling factors of a wave, which is further transformed into its HOG form, thus aiming to reveal the features of the given face with much more clarity. The classifier receives information in the form of gradients given by the HOG transformation.

TABLE 1. RESULTS OF VARIOUS CLASSIFICATION METHODS CORRESPONDING TO SVM COMBINED WITH OTHER FILTERS

METHODS               Anger  Disgust  Fear  Happy  Neutral  Sad   Avg.
HOG + SVM             0.80   0.80     0.40  0.94   0.92     0.75  0.85
Gabor filter + SVM    0.88   0.67     0.75  0.90   0.74     1.00  0.81
SVM                   0.86   0.54     0.80  0.94   0.69     1.00  0.80
DWT + SVM             1.00   0.50     0.50  0.78   0.71     0.33  0.72
DWT + HOG + SVM       1.00   0.36     0.50  0.87   0.65     0.29  0.70

Fig 5. Image of eyes before and after applying HOG filter with DWT (left to right)
Fig 6. Image of mouth before and after applying HOG filter with DWT (left to right)

III. DATASET

The Extended Cohn-Kanade Dataset (CK+) has 593 image sequences (327 of which carry emotion labels) covering 8 facial expressions, namely neutral, sadness, fear, happiness, surprise, anger, disgust and contempt. The emotions considered in this work are 6 out of the 8, namely neutral, sadness, fear, happiness, anger and disgust. The image sequences begin with a neutral expression and end at the peak facial expression. Each frame has a resolution of 640 x 490 and is usually greyscale. To begin with, the data is organized into two folders, one containing the collection of images and the other containing the text files [7].

IV. EXPERIMENTAL SETUP

The experiment utilizes the Cohn-Kanade dataset by extracting, from each individual's series of images going from neutral to the labelled expression, the extreme image showing the required expression. These images are then fed to their respective filters, which finally convert each image to a CSV file consisting of data representing that image, which is fed to the classifier to predict results. Real-time images are taken on mobile phones and then processed through the system accordingly.

V. RESULTS AND COMPARATIVE STUDY

The classification of emotions carried out in the previous sections involves multiple approaches towards characterizing the given dataset, including combinations of multiple methods (see Table 1).

The Fisherface classifier uses Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), both of which contribute to its accuracy. PCA aims to reduce dimensionality, such that the variables in the dataset are reduced to a minimum. The new set of variables, called principal components, makes the classification much simpler in terms of space complexity. It gives a precision output of 0.74, which is fairly good, but still not enough.

The Gabor filter recognizes texture in any given image and creates frequency and orientation components. This gives the Gabor filter an edge over other filters, as differences in texture can be a very efficient way to differentiate the mouth and eyes from the rest of the skin. These filters also decompose in multiple dimensions in space. This might also be the reason why SVM works best with this filter, as projecting into higher spatial dimensions is a property common to both the filter and the classifier in this case, giving a precision of 0.81.

The most precise classification, of 0.85, occurs by using a combination of the Histogram of Gradients (HOG) filter followed by classification by SVM. This combination has been used frequently in the past due to its high precision rate, and for valid reason. The way HOG partitions the image into boxes, so as to provide features of every box available, makes the data much more precise and reduces noise from the image. The classifier thus gets an image which not only has less noise, but also has a strong sense of direction. Gabor filters are not as powerful in cases where the texture is more complex, such as in the face of disgust, but in such cases HOG filters have the upper hand, due to the simplicity of the data. Simply applying SVM also gives a precision of 0.80, which is due to the ability of a simple SVM algorithm to project the pixel densities of trained images into higher dimensions.
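The per-class precision figures of this kind are produced by a classification report, as mentioned in the abstract. A sketch with scikit-learn (assumed tooling; the labels and predictions below are synthetic stand-ins, not the paper's data):

```python
# Sketch of how per-class precision values like those in Table 1 are
# obtained: predictions from a trained classifier are compared against
# the true labels. The labels/predictions here are synthetic stand-ins.
from sklearn.metrics import classification_report

emotions = ["anger", "disgust", "fear", "happy", "neutral", "sad"]
y_true = ["happy", "happy", "sad", "anger", "fear", "neutral", "disgust", "sad"]
y_pred = ["happy", "happy", "sad", "anger", "fear", "neutral", "fear", "anger"]

# Precision per class = true positives / all samples predicted as that class.
report = classification_report(y_true, y_pred, labels=emotions,
                               zero_division=0, output_dict=True)
print(report["happy"]["precision"])   # both "happy" predictions were correct
```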

The Discrete Wavelet Transform (DWT) creates sub-signals in the horizontal, vertical and diagonal directions, which are then analyzed to gain a general form of the image taken. This transformation is very useful in detecting patterns, or abnormalities among regular trends. In face detection, however, DWT falls behind the Gabor filter and HOG. This happens due to the variation in every image that has to be trained: changes of skin tone and differing features in the greyscale image of a face can tend to confuse the wavelet specifications, thus giving a precision of 0.72.

Finally, passing the images through a DWT filter followed by an HOG filter and then classifying with SVM gives a precision of 0.70. This is bound to occur, as transforming the image to its DWT form has, as seen before, already reduced the precision to 0.72. The wavelet features obtained as the output of DWT do not provide any sense of direction or magnitude, which is exactly what the HOG filter is supposed to receive and analyze. This leads to unnecessary information being passed to the HOG filter, which the classifier then classifies with a precision of 0.70. Also, the horizontal details of mouths might appear very similar, thus resulting in a decrease of precision.

When we review all the filters used above, it is evident that the main contributors to classification are the directionality and magnitude of a facial image, followed by the texture differences in that particular image. The wavelet transformation provides satisfactory results, whereas the combination of HOG followed by the SVM classifier does not disappoint.

Fig 7. Input face image and predicted result "neutral" (Classification method = HOG + SVM) [8]
Code Output (Right half of image):
['neutral']
[Parallel(n_jobs=1)]: Done 60 out of 60 | elapsed: 3.8s finished

Fig 8. Input face image and predicted result "happy" (Classification method = HOG + SVM) [9]
Code Output (Right half of image):
['happy']
[Parallel(n_jobs=1)]: Done 60 out of 60 | elapsed: 3.9s finished

Fig 9. Input real-time photo and predicted result "angry" (Classification method = HOG + SVM)
Code Output (Right half of image):
['anger']
[Parallel(n_jobs=1)]: Done 60 out of 60 | elapsed: 3.7s finished

Fig 10. Input real-time photo and predicted result "sadness" (Classification method = HOG + SVM)
Code Output (Right half of image):
['sadness']
[Parallel(n_jobs=1)]: Done 60 out of 60 | elapsed: 4.0s finished

VI. CONCLUSION

Emotions are an integral method of expressing our judgement and decisions in daily life, and this work aims to recognize and detect exactly these emotions. The work is capable of recognizing 6 integral emotions (Happy, Sad, Anger, Fear, Neutral and Disgust) with the help of the Support Vector Machine algorithm. It primarily uses only 2 crucial features of the face, namely the eyes and mouth, to detect the emotion within a face. The Viola-Jones algorithm is utilized to detect the face and features of the individual in the input photo; these features include the eyes and mouth.

The predictions obtained by simply applying SVM are not as accurate as when an HOG filter is applied before classification. This happens due to the lack of distinguishable features without the HOG transform. The advantage of this system is that it needs only 2 features to detect the emotion of a complete face, which satisfyingly decreases the amount of storage necessary for testing and for future applications.

ACKNOWLEDGMENT

The authors would like to thank all the members of RAMAN Lab, MNIT, especially Ms. Vishu Gupta, for providing excellent guidance. They are grateful to them for this opportunity and support.

REFERENCES
[1] Byoung Chul Ko, "A Brief Review of Facial Emotion Recognition Based on Visual Information," Sensors 2018, 18, 401.
[2] Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 46-53.
[3] Paul Viola, Michael Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).
[4] Prof. Neelum Dave, Narendra Patil, Rohit Pawar, Digambar Pople, "Emotion Detection Using Face Recognition," IJESC, Volume 7, Issue No. 4.
[5] T. Ahsan, T. Jabid, and U.-P. Chong, "Facial expression recognition using local transitional pattern on Gabor filtered facial images," IETE Technical Review, 30(1):47-52, 2013.
[6] H.K. Meena, K.K. Sharma and S.D. Joshi, "Improved facial expression recognition using graph signal processing," Electronics Letters, Volume 53, Issue 11, 2017.
[7] Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.
[8] Image taken from "http://i1.wp.com/detourphotography.ca/wp-content/uploads/2015/11/DSC_8822.jpg?resize=236%2C355". Accessed on 21 Aug. 2018.
[9] Image taken from "https://www.dreamstime.com/close-up-happy-middle-eastern-man-s-face-white-background-image111174542". Accessed on 21 Aug. 2018.

