Face Expression Detection Using CNN

The document discusses facial expression detection using convolutional neural networks. It provides background on facial expression recognition and challenges. The architecture of CNNs is described, including convolutional layers, pooling layers, and fully connected layers. Applications of facial expression detection are also mentioned.

Uploaded by

lalala lalala
Copyright
© © All Rights Reserved
Face Expression Detection Using CNN

Karan J, Kartheek Kumar R,


Department of Electronics and Communication, Easwari Engineering College.
Address
[email protected]

[email protected]

Abstract:

The ability to recognize facial expressions using computer vision is a crucial task with numerous potential
applications. Although deep neural networks have achieved high performance, their use in the recognition of facial
expressions is still challenging. This is because different facial expressions have varying degrees of similarity among
themselves, and there are numerous variations due to diversity within the same facial images. We propose a novel
divide-and-conquer-based learning strategy to improve the performance of facial expression recognition (FER). Owing to
globalization and the digital divide, facial expression detection has received primary attention as a way to identify
criminals and breaches. Facial expressions are the changes occurring on the human face that indicate a person's internal
emotional states, intents or social communications. Through its expressions, the human face is the principal means of
conveying and inferring a person's affective state. Real-time facial expression detection has become a prominent
research area because it plays an important role in human-computer interaction. Applications of facial expression
detection include computer vision, biometric security, social interaction, emotional intelligence and social intelligence.

Keywords:

Convolutional Neural Networks, Emotion Recognition, Image Processing, Facial Feature Extraction, Facial
Expression Dataset, Machine Learning, Facial Feature Representation, Neural Network Architecture, Feature Extraction.

Introduction:

A facial expression is the visible manifestation of the affective state, cognitive activity, intention,
personality and psychopathology of a person, and it plays a communicative role in interpersonal relations.
Human facial expressions are commonly classified into 7 basic emotions: happy, sad, surprise, fear, anger,
disgust, and neutral. Our facial emotions are expressed through the activation of specific sets of facial
muscles. These sometimes subtle, yet complex, signals in an expression often contain an abundant amount of
information about our state of mind. Automatic recognition of facial expressions can be an important
component of natural human-machine interfaces; it may also be used in behavioural science and in clinical
practice. It has been studied for a long time and has progressed over recent decades. Although much progress
has been made, recognizing facial expressions with high accuracy remains difficult due to the complexity
and variety of facial expressions. On a day-to-day basis, humans commonly recognize emotions by
characteristic features displayed as part of a facial expression. For instance, happiness is undeniably
associated with a smile or an upward movement of the corners of the lips. Similarly, other emotions are
characterized by other deformations typical of a particular expression. Research into automatic recognition
of facial expressions addresses the problems surrounding the representation and categorization of static
or dynamic characteristics of these deformations of face pigmentation.

In machine learning, a convolutional neural network (CNN, or ConvNet) is a type of feedforward artificial
neural network in which the connectivity pattern between its neurons is inspired by the organization of the
animal visual cortex. Individual cortical neurons respond to stimuli in a restricted region of space known
as the receptive field. The receptive fields of different neurons partially overlap such that they tile the
visual field. The response of an individual neuron to stimuli within its receptive field can be approximated
mathematically by a convolution operation. Convolutional networks were inspired by biological processes [2]
and are variations of the multilayer perceptron designed to use minimal amounts of pre-processing. They have
wide applications in image and video recognition, recommender systems and natural language processing. The
convolutional neural network is also known as a shift invariant or space invariant artificial neural network
(SIANN), a name based on its shared-weights architecture and translation invariance characteristics.
LeNet is one of the very first convolutional
neural networks which helped propel the field of Deep Learning.

Architecture of CNN:

A typical architecture of a convolutional neural network contains an input layer, some convolutional
layers, some fully-connected layers, and an output layer. The CNN is designed with some modifications to the
LeNet architecture [10]. It has 6 layers, not counting input and output. The architecture of the
convolutional neural network used in the project is shown in the following figure.

There are two main parts to a CNN architecture:

• A convolution tool that separates and identifies the various features of the image for analysis in a
process called Feature Extraction.
• The network of feature extraction consists of many pairs of convolutional or pooling layers.
• A fully connected layer that utilizes the output from the convolution process and predicts the class of
the image based on the features extracted in previous stages.
• This CNN model of feature extraction aims to reduce the number of features present in a dataset. It
creates new features which summarise the existing features contained in an original set of features. There
are many CNN layers, as shown in the CNN architecture diagram.

Convolution Layers

There are three types of layers that make up the CNN: convolutional layers, pooling layers, and
fully-connected (FC) layers. When these layers are stacked, a CNN architecture is formed. In addition to
these three layers, there are two more important parameters, the dropout layer and the activation function,
which are defined below.

1. Convolutional Layer

This layer is the first layer used to extract the various features from the input images. In this layer,
the mathematical operation of convolution is performed between the input image and a filter of a particular
size MxM. By sliding the filter over the input image, the dot product is taken between the filter and the
parts of the input image that match the size of the filter (MxM). The output is termed the feature map,
which gives us information about the image such as its corners and edges. Later, this feature map is fed to
other layers to learn several other features of the input image. The convolution layer passes its result to
the next layer after applying the convolution operation to the input. Convolutional layers are beneficial
because they keep the spatial relationship between the pixels intact.

2. Pooling Layer:

In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of this layer is to
decrease the size of the convolved feature map to reduce the computational costs. This is done by
decreasing the connections between layers, and the layer operates independently on each feature map.
Depending on the method used, there are several types of pooling operations. Pooling basically summarises
the features generated by a convolution layer. In Max Pooling, the largest element is taken from each
predefined section of the feature map. Average Pooling calculates the average of the elements in a
predefined-size image section. In Sum Pooling, the total sum of the elements in the predefined section is
computed. The Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.
This CNN model generalises the features extracted by the convolution layer and helps the network to
recognise the features independently. This also reduces the computations in the network.

3. Fully Connected Layer

The Fully Connected (FC) layer consists of the weights and biases along with the neurons and is used to
connect the neurons between two different layers. These layers are usually placed before the output layer
and form the last few layers of a CNN architecture. Here, the output of the previous layers is flattened
and fed to the FC layer. The flattened vector then passes through a few more FC layers, where the
mathematical operations usually take place. At this stage, the classification process begins. Two FC layers
are connected because two fully connected layers generally perform better than a single one. These layers
in a CNN reduce the need for human supervision.

4. Dropout

Usually, when all the features are connected to the FC layer, it can cause overfitting on the training
dataset. Overfitting occurs when a model fits the training data so closely that it has a negative impact on the
model's performance when used on new data. To overcome this problem, a dropout layer is utilised, wherein a
few neurons are dropped from the neural network during the training process, resulting in a reduced size of
the model. With a dropout of 0.3, 30% of the nodes are dropped out randomly from the neural network.

5. Activation Functions:

Finally, one of the most important parameters of the CNN model is the activation function. Activation
functions are used to learn and approximate any kind of continuous and complex relationship between
variables of the network. In simple words, an activation function decides which information of the model
should fire in the forward direction and which should not at the end of the network. It adds non-linearity
to the network. There are several commonly used activation functions such as the ReLU, Softmax, tanh and
Sigmoid functions. Each of these functions has a specific usage. For a binary classification CNN model,
sigmoid and softmax functions are preferred, and for multi-class classification, softmax is generally used.
In simple terms, activation functions in a CNN model determine whether a neuron should be activated or not.
They decide, using mathematical operations, whether the input to the network is important for the prediction.

Facial emotion recognition:

FER typically has four steps. The first is to detect a face in an image and draw a rectangle around it, and
the next step is to detect landmarks in this face region. The third step is extracting spatial and temporal
features from the facial components. The final step is to use a Feature Extraction (FE) classifier and
produce the recognition results using the extracted features. Figure 1.1 shows the FER procedure for an
input image where a face region and facial landmarks are detected. Facial landmarks are visually salient
points such as the end of a nose, and the ends of the eyebrows and the mouth, as shown in Figure 1.2. The
pairwise positions of two landmark points or the local texture of a landmark are used as features. Table
1.1 gives the definitions of 64 primary and secondary landmarks [8]. The spatial and temporal features are
extracted from the face, and the expression is assigned to one of the facial categories using a pattern
classifier.
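The idea of using the pairwise positions of landmark points as features can be illustrated with a minimal sketch. It assumes that landmarks are already available as (x, y) pixel coordinates, that Euclidean distance is the pairwise measure, and that distances are scaled by the largest one so the features do not depend on face size. The function name, the scaling choice and the coordinates are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def pairwise_landmark_features(landmarks):
    """Return the Euclidean distance between every pair of (x, y) landmarks,
    divided by the largest distance (an assumed normalisation)."""
    dists = [math.dist(p, q) for p, q in combinations(landmarks, 2)]
    # Guard against an empty list or coincident points to avoid dividing by zero.
    scale = max(dists) if dists and max(dists) > 0 else 1.0
    return [d / scale for d in dists]


# Hypothetical landmark coordinates in pixels (two eyebrow ends, nose tip,
# two mouth corners); real values would come from a landmark detector.
landmarks = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (38.0, 80.0), (62.0, 80.0)]
features = pairwise_landmark_features(landmarks)
print(len(features))  # one feature per landmark pair
```

With n landmarks this yields n(n-1)/2 features, which could then be passed to the pattern classifier in the final step.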

Literature survey:

| S.NO | JOURNAL DETAILS | TECHNIQUES USED | INFERENCE |
|---|---|---|---|
| 1. | Dong-Hwan Lee; Jang-Hee Yoo, “CNN Learning Strategy for Recognizing Facial Expressions”, IEEE, Volume No:11, 2023 | Convolutional neural network, divide-and-conquer, facial expression recognition. | A divide-and-conquer-based CNN learning strategy that utilizes the grouping of similar data by analyzing the classification results for training and testing the CNN model. |
| 2. | Longbiao Mao; Yan Yan; Jing-Hao Xue, “Deep Multi-Task Multi-Label CNN for Effective Facial Attribute Classification”, IEEE, Volume No:13, 2022 | Deep Multi-Task Multi-Label CNN, facial expression recognition | DMM-CNN jointly optimizes two closely-related tasks (i.e., facial landmark detection and FAC) to improve the performance of FAC by taking advantage of multi-task learning. |
| 3. | Xiao Liu; Xiangyi Cheng; Kiju Lee, “GA-SVM-Based Facial Emotion Recognition Using Facial Geometric Features”, IEEE, Volume No:21, 2021 | Genetic algorithm, Convolutional neural network. | This method employs less complicated models and thus shows potential for real-time machine vision applications in automated systems. |
| 4. | Jun-Tong Liu; Fang-Yu Wu, “Domain Adaption for Facial Expression Recognition”, IEEE, Volume No:10, 2020 | Similarity preserving generative adversarial network (SPGAN) | Competitive accuracy is reported in comparison with other state-of-the-art works, which shows promising results. |

Existing system:

1. Data Collection and Pre-processing: Datasets containing images or video clips of individuals displaying
various facial expressions are gathered. These datasets are pre-processed to standardize image sizes,
lighting conditions, and facial alignment.

2. Feature Extraction: Convolutional Neural Networks are used to automatically extract relevant features
from facial images. These features capture important facial landmarks, textures, and patterns.

3. Training: The CNN model is trained on the pre-processed dataset. This involves feeding the network a
large number of labelled examples to learn how to recognize different facial expressions. Techniques like
transfer learning from pre-trained models may also be used.

4. Testing and Validation: The system's performance is evaluated on a separate testing dataset to measure
its accuracy in recognizing facial expressions. Various evaluation metrics such as accuracy, precision,
recall, and F1-score are calculated.

5. Real-time or Batch Processing: Depending on the application, the system may process facial expressions in
real-time from live camera feeds or in batch mode on pre-recorded video or image data.

Proposed system:

1. Advanced CNN Architecture: The proposed system introduces a novel CNN architecture specifically designed
for facial expression recognition. This architecture incorporates multiple convolutional layers for feature
extraction, followed by fully connected layers for emotion classification.

2. Data Augmentation: To improve generalization and handle diverse conditions, the system employs extensive
data augmentation techniques. This includes variations in lighting, facial poses, and ethnic backgrounds to
create a more comprehensive training dataset.

3. Transfer Learning: Transfer learning from pre-trained CNN models on large image datasets is utilized to
initialize the network's weights. Fine-tuning on the facial expression dataset helps the model converge
faster and achieve higher accuracy.

4. Emotion Heatmaps: The proposed system also generates emotion heatmaps, highlighting the regions of the
face that contribute most to the recognized emotion. This provides interpretable insights into the model's
decision-making process.

5. Real-time Processing: The system is optimized for real-time applications, making it suitable for use in
human-computer interaction systems, emotion-aware applications, and other real-world scenarios.

6. Performance Metrics: Comprehensive evaluation includes not only traditional metrics like accuracy but
also takes into account fairness, handling of subtle expressions, and robustness across diverse demographic
groups.

The proposed system represents a significant advancement in the field of facial expression recognition,
offering improved accuracy, real-time processing capabilities, and enhanced interpretability. It addresses
the limitations of existing systems, making it a valuable contribution to the domain of computer vision and
emotion analysis.

Reference:

1. S. Li and W. Deng, “Deep facial expression recognition: A survey,” IEEE Trans. Affect. Comput., vol. 13, no. 3, pp. 1195–1215, Jul./Sep. 2022.
2. A. Cruz-Albarran, J. P. Benitez-Rangel, R. A. Osornio-Rios, and L. A. Morales-Hernandez, “Human emotions detection based on a smart-thermal system of thermographic images,” Infr. Phys. Technol., vol. 81, pp. 250–261, Mar. 2017.
3. C. Shan, S. Gong, and P. W. McOwan, “Facial expression recognition based on local binary patterns: A comprehensive study,” Image Vis. Comput., vol. 27, no. 6, pp. 803–816, May 2009.
4. Shan, C., Gong, S., & McOwan, P. W. (2005, September). Robust facial expression recognition using local binary patterns. In Image Processing, 2005. ICIP 2005. IEEE International Conference on (Vol. 2, pp. II-370). IEEE.
5. Chibelushi, C. C., & Bourel, F. (2003). Facial expression recognition: A brief tutorial overview. CVonline: On-Line Compendium of Computer Vision, 9.
6. "Convolutional Neural Networks (LeNet) – DeepLearning 0.1 documentation". DeepLearning 0.1. LISA Lab. Retrieved 31 August 2013.
7. T. Danisman, M. Bilasco, N. Ihaddadene, C. Djeraba, Automatic facial feature detection for facial expression recognition, International Conference on Computer Vision Theory and Applications, pp. 407-412, (2010).
8. L.A. Parr, B.M. Waller, Understanding chimpanzee facial expression: Insights into the evolution of communication, Social Cognitive and Affective Neuroscience, vol. 1, no. 3, pp. 221-228, (2006).
