Emotion Detection and Characterization Using Facial Features

Abstract—The human face has peculiar and specific characteristics, which makes understanding and identifying facial expressions difficult. It is easy for a human to identify the facial expression of a particular person in any image sequence. The automated recognition systems available, however, are quite inadequate and incapable of accurately identifying emotions. The area of facial expression identification has many important applications. It is an interactive tool between humans and computers: the user, without using the hands, can interact through facial expressions. Presently, research on facial expressions concentrates on the factors sad, happy, disgust, surprise, fear and angry. This paper aims to detect faces in any given image, extract facial features (eyes and lips) and classify them into 6 emotions (happy, fear, anger, disgust, neutral, sadness). The training data is passed through a series of filters and processes and is eventually characterized by a Support Vector Machine (SVM), refined using Grid Search. The testing data and its labels are then evaluated, and the classification accuracy is given in a classification report. Various approaches are implemented for better classification of the data, including passing the training images through a Gabor filter, or transforming images using the Histogram of Oriented Gradients (HOG) and the Discrete Wavelet Transform (DWT). The best result achieved so far is by passing the training images through the Histogram of Oriented Gradients (HOG), followed by characterization by SVM, which gives an average precision of 85%.

Keywords—Characterization, Facial Expression, Emotions, Cascade, Classification, SVM, Kernel, Grid Search, Wavelet, HOG, Precision

I. INTRODUCTION

The key is to understand human behavior and how it reacts to or interacts with the environment. Computer interfaces provide technology that analyses human-computer interaction. Facial emotions convey the intention of a person. The emotion communicates the state of the person, such as joy, sadness or anger. Human communication is one-third verbal and two-thirds non-verbal [1]. Moreover, facial expressions are an important means of interpersonal communication; the facial expression is therefore a key means for the detection of emotions. The non-verbal interaction among humans is through facial expression, because humans can identify emotion in an efficient and prompt manner. Thus, there exists a demand to develop a machine which can recognize human emotion.

The objective of this work is to evaluate the performance of different models and their combinations. The models include the Support Vector Machine, Linear Discriminant Analysis, Principal Component Analysis, the Fisherface classifier, Gabor filters, the Discrete Wavelet Transform and the Histogram of Gradients. The methodologies were defined for the respective models, and the Cohn-Kanade dataset is used as a stimulus for evaluating the models [2]. This is followed by a comparative study of the said models and their results. The usual way of doing the evaluation is on the complete facial expression, but the main focus here was to reduce the features to the eyes and lips only.

II. FACE DETECTION, EXTRACTION AND CLASSIFICATION

In this section, a step-by-step approach towards the various processes that make up this work is described. The final prediction of emotion is preceded by multiple processes. The first step is to detect the face of the person in the given input image, followed by identifying the features (eyes and mouth). These features are then passed through their respective filters and transformations, if that is part of the chosen method. The outputs are then sent to the classifiers to be classified according to the trained data. This gives us the emotion predicted by the system. These processes are explained in more detail in the following sub-sections, which give a holistic view of how the system as a whole works with the various comparative methods.

A. Face Detection and Feature Extraction

Face detection is regarded as one of the most complex problems in computer vision, due to the large variations caused by changes in lighting, facial appearance and expression.

For face detection the Viola-Jones algorithm is used. Though it was proposed in 2001, it remains one of the simplest methods for face detection and gives high accuracy [3].

This algorithm uses Haar-based feature filters, whose objective is to find the face in an input image. In each sub-window, Haar features are calculated and their difference is compared with a learned threshold that separates objects from non-objects. Haar features are weak classifiers, so a large number of Haar classifiers are organized in such a way that together they form a strong classifier, called a "classifier cascade".

Each classifier looks at the sub-window and determines whether it looks like a face; if it does, then the next
Authorized licensed use limited to: UNIVERSIDAD DE ALICANTE . Downloaded on November 07,2024 at 17:39:42 UTC from IEEE Xplore. Restrictions apply.
classifier is applied. If all the classifiers give a positive answer, a face is present in the sub-window; otherwise the size of the sub-window is changed and the whole process is repeated until a face is detected [4].

Similarly, eye and mouth cascades are used to detect the eyes and mouth within the sub-window in which a face has been detected.

B. Feature Classification: Different Approaches Towards Classification of Data

Once faces have been detected and the required features have been extracted, they are put through various methods designed to simplify and classify them into 6 emotions (happy, sadness, fear, neutral, anger, disgust).

It is important to note that every method described below is independent of the others in terms of application. Each heading defines the combination of filter(s)/transform(s)/classifier(s) used in the respective method. This section gives a brief account of all the methods used, along with a description and methodology; the methodology highlights the application of that method to the dataset used in this work.

A Support Vector Machine (SVM) is a supervised machine-learning algorithm that classifies discriminatively by means of a separating hyperplane. The algorithm creates an optimal hyperplane which categorizes the testing data; in two dimensions the classifier is simply a line with a class on either side. The classifier relies on the "kernel trick": specific mathematical functions project the data into a feature space of higher dimension, where a hyperplane can create the boundaries among the possible outputs. SVMs are usually used to solve classification or regression problems. A kernel takes data as input and transforms it into the required form; what a kernel returns is the inner product between two points, so the high-dimensional projection becomes possible at very low computational cost. The kernels used in SVM are as follows:

Linear kernel: K(x, y) = x · y (2)

Polynomial kernel: K(x, y) = (γ(x · y) + r)^d (3)

Radial basis function kernel: K(x, y) = exp(−γ‖x − y‖²) (4)

Sigmoid kernel: K(x, y) = tanh(γ(x · y) + r) (5)
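As a concrete sketch of this classification stage, the kernels of Eqs. (2)–(5) and their parameters (C, γ) can be searched over with scikit-learn's GridSearchCV, in the spirit of the Grid Search refinement mentioned in the abstract. The feature matrix below is a synthetic stand-in for the extracted eye/lip feature vectors, not the paper's actual data:

```python
# Sketch: SVM classification with kernel selection via Grid Search.
# The feature matrix is a synthetic stand-in for the flattened eye/lip
# feature vectors; the 6 emotion labels are illustrative only.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 64))      # 120 samples, 64-dim feature vectors
y = rng.integers(0, 6, size=120)    # 6 emotion classes (0..5)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Search over the kernels of Eqs. (2)-(5) and their parameters.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.1],
}
search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)
print(classification_report(y_test, search.predict(X_test), zero_division=0))
```

On the real data, the per-class precision columns of such a classification report are what Table 1 summarizes.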
The images obtained by feature extraction are initially passed through the Gabor filter to highlight the edges of the facial features, as well as their texture (see Fig. 1 and Fig. 2). The resulting labeled data is then decomposed into pixel arrays and passed through the SVM classifier (discussed previously). The parameters of the Gabor filter were set to appropriate values, and various combinations of kernels provided accuracies, out of which the maximum value was taken into consideration.

The Discrete Wavelet Transform (DWT) is a method which decomposes the image into wavelets. The features are transformed to a wavelet coefficient spectrum, consisting of certain values called signal data points. A data vector is in turn created which has the same size as our input. The wavelets are scaled according to the images of the eyes and mouth, and the respective data vectors are created. Because the scaling functions are mutually perpendicular to their discrete form and must stay so, DWT is an effective way of reducing noise and transforming data [6]. The data vector received through the DWT of the data is then fed to the SVM classifier, and the best results are recorded.
Fig 3. Image of eyes before and after applying HOG filter (left to right)
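The single-level wavelet decomposition described above can be sketched with the Haar wavelet, the simplest member of the DWT family. The plain-NumPy version below is an illustrative approximation (in practice a library such as PyWavelets, via `pywt.dwt2`, would be used), not the exact wavelet configuration of this work:

```python
# Sketch: one level of a 2-D Haar DWT, decomposing an image into an
# approximation band (LL) and three detail bands (LH, HL, HH).
import numpy as np

def haar_dwt2(img):
    """Single-level 2-D Haar transform; image dimensions must be even."""
    a = img.astype(float)
    # Rows: average and difference of adjacent pixel pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Columns: repeat the average/difference on both row outputs.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0   # approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0   # horizontal detail
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0   # vertical detail
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0   # diagonal detail
    return ll, lh, hl, hh

# A flat (constant) patch has zero detail coefficients everywhere.
patch = np.full((8, 8), 100.0)
ll, lh, hl, hh = haar_dwt2(patch)
```

Concatenating the resulting bands (or the LL band alone) gives the fixed-size data vector that is fed on to the SVM.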
The images saved after going through the Discrete Wavelet Transform (DWT) are processed through the HOG filter to create output feature vectors for every image (see Fig. 5 and Fig. 6), which are then compiled into a single dataset. This dataset describes the gradient magnitude and direction at every pixel and is passed through SVM for classification.

The Discrete Wavelet Transform converts the given image into varying scaling factors of a wave, which is further transformed into its HOG form, aiming to reveal the features of the given face with much more clarity. The classifier receives its information in the form of the gradients given by the HOG transformation.
TABLE 1. RESULTS OF VARIOUS CLASSIFICATION METHODS CORRESPONDING TO SVM COMBINED WITH OTHER FILTERS
Fig 5. Image of eyes before and after applying HOG filter with DWT (left to right)
Fig 6. Image of mouth before and after applying HOG filter with DWT (left to right)
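The gradient information that HOG supplies to the classifier can be illustrated with a stripped-down sketch: compute per-pixel gradient magnitude and orientation, then histogram the orientations within fixed-size cells. A real implementation (e.g. `skimage.feature.hog`) adds block normalization and interpolation; this NumPy version is a deliberate simplification:

```python
# Sketch: a minimal HOG-like descriptor — gradient magnitudes binned by
# orientation within fixed-size cells. Real HOG implementations add
# block normalization; this is a simplified illustration.
import numpy as np

def hog_like(img, cell=8, bins=9):
    a = img.astype(float)
    gy, gx = np.gradient(a)                      # per-pixel gradients
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180   # unsigned orientation
    h, w = a.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            t = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(t, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)                 # feature vector for the SVM

vec = hog_like(np.random.default_rng(1).normal(size=(32, 32)))
```

For a 32 x 32 patch with 8 x 8 cells and 9 bins, this yields a 4 x 4 x 9 = 144-dimensional vector, which is the kind of low-noise, direction-aware input the classifier receives.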
III. DATASET

The Extended Cohn-Kanade Dataset (CK+) has 593 image sequences (327 of which carry emotion labels) covering 8 facial expressions, namely neutral, sadness, fear, happiness, surprise, anger, disgust and contempt. The emotions considered in this work are 6 of the 8, namely neutral, sadness, fear, happiness, anger and disgust. The image sequences begin with a neutral expression and end at a peak facial expression. Each frame has a resolution of 640 x 490 and is usually greyscale. To begin with, the data is organized into two folders, one holding the collection of images and the other the text files [7].

IV. EXPERIMENTAL SETUP

The experiment utilizes the Cohn-Kanade dataset by extracting, from the series of images of an individual running from neutral to the labelled expression, the extreme image showing the required expression. These images are then fed to their respective filters, which finally convert each image to a CSV file of data representing that image; this is fed to the classifier to predict results. Real-time images are taken on mobile phones and then processed through the system in the same way.

V. RESULTS AND COMPARATIVE STUDY

The classification of emotions carried out in the previous sections involves multiple approaches towards characterizing the given dataset, using combinations of multiple methods (see Table 1).

The Fisherface classifier uses Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), both of which contribute to its accuracy. PCA aims to reduce dimensionality, such that the variables in the dataset are reduced to a minimum. The new set of variables, called principal components, makes the classification much simpler in terms of space complexity. It gives a precision of 0.74, which is fairly good, but still not enough.

The Gabor filter recognizes texture in any given image and creates frequency and orientation components. This gives the Gabor filter an edge over other filters, as a difference in texture can be a very efficient way to differentiate the mouth and eyes from the rest of the skin. These filters also decompose in multiple spatial dimensions. This might also be why SVM works best with this filter: projecting into higher spatial dimensions is a property common to both the filter and the classifier, giving a precision of 0.81.

The most precise classification, 0.85, is obtained by the combination of the Histogram of Gradients (HOG) filter followed by classification by SVM. This combination has been used frequently in the past due to its high precision rate, and for valid reason. The way HOG partitions the image into boxes, providing features of every box available, makes the data much more precise and reduces noise from the image. The classifier thus gets an image which not only has less noise, but also a strong sense of direction. Gabor filters are not as powerful in cases where the texture is more complex, such as in the face of disgust, but in such cases HOG filters have the upper hand, due to the simplicity of the data. Simply applying SVM alone also gives a precision of 0.80, which is due to the ability of a simple SVM algorithm to project the pixel densities of the trained images into higher dimensions.
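The precision figures quoted here (e.g. 0.85 for HOG + SVM) are per-class precision, TP / (TP + FP), averaged over the six emotions. A small self-contained sketch with made-up labels (not this paper's actual predictions) shows the computation:

```python
# Sketch: per-class precision TP / (TP + FP), the metric summarized in
# Table 1. The label lists below are illustrative, not real output.
from collections import Counter

def precision_per_class(y_true, y_pred):
    pred_counts = Counter(y_pred)                       # TP + FP per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: tp[c] / pred_counts[c] for c in pred_counts}

y_true = ["happy", "sadness", "happy", "anger", "happy",   "anger"]
y_pred = ["happy", "happy",   "happy", "anger", "sadness", "anger"]

prec = precision_per_class(y_true, y_pred)
# "happy" is predicted 3 times and correct twice -> 2/3; "anger" -> 2/2
```

Averaging these per-class values gives the single precision figure reported for each filter/classifier combination.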
Code output:
['sadness']
[Parallel(n_jobs=1)]: Done 60 out of 60 | elapsed: 4.0s finished
REFERENCES

[1] Byoung Chul Ko, "A Brief Review of Facial Emotion Recognition Based on Visual Information," Sensors 2018, 18, 401.
[2] Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 46-53.
[3] Paul Viola, Michael Jones, "Rapid object detection using a boosted cascade of simple features," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001).
[4] Neelum Dave, Narendra Patil, Rohit Pawar, Digambar Pople, "Emotion Detection Using Face Recognition," IJESC, Volume 7, Issue No. 4.
[5] T. Ahsan, T. Jabid, and U.-P. Chong, "Facial expression recognition using local transitional pattern on Gabor filtered facial image," IETE Technical Review, 30(1):47-52, 2013.
[6] H. K. Meena, K. K. Sharma and S. D. Joshi, "Improved facial expression recognition using graph signal processing," Electronics Letters, Volume 53, Issue 11, 2017.
[7] Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.
[8] Image taken from "http://i1.wp.com/detourphotography.ca/wp-content/uploads/2015/11/DSC_8822.jpg?resize=236%2C355". Accessed on 21 Aug. 2018.
[9] Image taken from "https://www.dreamstime.com/close-up-happy-middle-eastern-man-s-face-white-background-image111174542". Accessed on 21 Aug. 2018.