
Soft Computing (2024) 28:6695–6708

https://doi.org/10.1007/s00500-023-09477-y

APPLICATION OF SOFT COMPUTING

Real-time facial emotion recognition model based on kernel autoencoder and convolutional neural network for autism children

Fatma M. Talaat1,2,3 • Zainab H. Ali4,5 • Reham R. Mostafa6,7 • Nora El-Rashidy1

Accepted: 14 November 2023 / Published online: 4 January 2024


© The Author(s) 2024

Abstract
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that is characterized by abnormalities in the brain,
leading to difficulties in social interaction and communication, as well as learning and attention. Early diagnosis of ASD is
challenging as it mainly relies on detecting abnormalities in brain function, which may not be evident in the early stages of
the disorder. Facial expression analysis has shown promise as an alternative and efficient solution for early diagnosis of
ASD, as children with ASD often exhibit distinctive patterns that differentiate them from typically developing children.
Assistive technology has emerged as a crucial tool in improving the quality of life for individuals with ASD. In this study,
we developed a real-time emotion identification system to detect the emotions of autistic children in case of pain or anger.
The emotion recognition system consists of three stages: face identification, facial feature extraction, and feature cate-
gorization. The proposed system can detect six facial emotions: anger, fear, joy, natural, sadness, and surprise. To achieve
high-performance accuracy in classifying the input image efficiently, we proposed a deep convolutional neural network
(DCNN) architecture for facial expression recognition. An autoencoder was used for feature extraction and feature
selection, and a pre-trained model (ResNet, MobileNet, and Xception) was applied due to the size of the dataset. The
Xception model achieved the highest performance, with an accuracy of 0.9523, sensitivity of 0.932, specificity of 0.9421,
and AUC of 0.9134. The proposed emotion detection framework leverages fog and IoT technologies to reduce latency for
real-time detection with fast response and location awareness. Using fog computing is particularly useful when dealing
with big data. Our study demonstrates the potential of using facial expression analysis and deep learning algorithms for
real-time emotion recognition in autistic children, providing medical experts and families with a valuable tool for
improving the quality of life for individuals with ASD.

Keywords Emotion recognition · Assistive technology · Autism · IoT · Deep learning

1 Introduction

This section discusses some important issues such as autism spectrum disorder, autism-related issues with emotion recognition, and assistive technology.

1.1 Autism spectrum disorder

A neurological disorder called autism spectrum disorder (ASD) affects behavior and communication. Although Kanner was the first to recognize it in 1943 (Kanner 1968), our knowledge of ASD has greatly increased in terms of diagnosis and treatment. The initial indicators of this developmental syndrome can be seen in a child's early years, even though it can manifest at any age. The DSM-5 (Diagnostic and Statistical Manual of Mental Disorders) (Australia 2015) states that a person with autism exhibits impaired behavioral, social, and communication abilities. Even though the impairment is permanent, therapy and assistance can help a person accomplish some things more effectively. Some people with autism have problems sleeping and exhibit disruptive behavior. Because certain symptoms in adults may overlap with those of other intellectual illnesses, like Attention Deficit Hyperactivity Disorder (ADHD), it is simpler to diagnose ASD in kids than in adults.

They have a consistent demeanor and exhibit no desire to interact with others. Based on the individual's

symptoms, autism is a disorder with a wide range of presentations. It might range from mild to really severe, which is why the word "spectrum" is included in the disorder's name (Maenner et al. 2021). An autistic individual with a severe disorder is mostly nonverbal or has difficulties speaking. They have a hard time interpreting their feelings and communicating them to others. As a result, a person with autism has difficulty executing everyday tasks. A real-time emotion identification system for autistic youngsters is therefore a vital issue, to detect their emotions and help them in case of pain or anger.

1.2 Autism-related issues with emotion recognition

Empathy is referred to as the capacity for comprehending and reciprocating the feelings of another. Sympathy, on the other hand, is the ability to share comparable feelings with another individual. People with ASD may not be able to empathize or sympathize with others (Cheng et al. 2002). When someone is harmed, they may express gladness, or they may show no emotion at all. As a result of their inability to respond correctly to others' emotions, autistic people may appear emotionless. Several studies, however, have looked into whether someone with autism can genuinely express their emotions to others.

Empathy requires a careful examination of another person's body language, speech, and facial expressions in order to comprehend and interpret their sentiments. People with ASD lack the necessary social skills associated with interpreting body language and reciprocating feelings, whereas typically developing youngsters learn to understand the facial expressions needed to demonstrate empathy by seeing and emulating people around them. In people with ASD, the majority of social skills needed to engage with others are significantly hampered.

ASD is linked to a specific social and emotional deficit that is characterized by cognitive, social semiotic, and social understanding deficits. Autism typically prevents a person from understanding another person's emotions and mental state through facial expressions or speech intonation. They may also have trouble anticipating other people's actions by analyzing their emotional conditions. Facial expressions are heavily used in emotion recognition studies. The ability to recognize emotions and distinguish between distinct facial expressions is normally developed from infancy (Dollion et al. 2022).

Children with ASD frequently neglect facial expressions (Howard et al. 2021). Additionally, children with autism perceive facial expressions inconsistently, suggesting that they lack the ability to recognize emotions (Banire et al. 2021). When interpreting emotions, multiple sensory processing is frequently necessary (Conner et al. 2020). The measurements of the face, body, and speech can all be used to interpret emotion. The capacity to divide attention and concentrate on pertinent facial information is necessary for the recognition of emotions; this sort of processing is largely subconscious.

1.3 Assistive technology's role in the lives of individuals with ASD

Any device or piece of gear that allows persons with ASD to perform tasks they previously couldn't is considered assistive technology. These technology devices make it easier for people with disabilities to complete daily duties. In recent years, technology that helps people with autism has advanced significantly. From basic to advanced, this technology is diverse (O'Neill and Gillespie 2014). The primary objective of assistive technology is to benefit those with special needs. Such facilities, schools, and the government might work together to develop therapeutic spaces that use technology. Most academics concur that it is crucial to select appropriate assistive technology for people with autism in a methodical manner, depending on the severity of the condition (Knight et al. 2013).

As a result, not every person should use every type of assistive technology. Each person with ASD has their own unique set of traits. It is evident that there is no set of assistive technologies that is universal. Only experts are qualified to distinguish between the differences and offer the required assistance. Assistive technology uses everything from elementary to cutting-edge computing techniques (Aresti-Bartolome and Garcia-Zapirain 2014). It can be divided into three main categories: (i) basic assistive technology, which refers to pictorial cards used to facilitate communication between the student and the teacher; (ii) medium assistive technology, which refers to graphical representation systems; and (iii) advanced technology, which includes applications for human-computer interaction like robots and gadgets (Anwar et al. 2010).

A wide range of tools known as assistive technology is available to help people with autism overcome their functional limitations (Brumfitt 1993). Another technological tool used to enhance communication for people with ASD is augmented and alternative communication (AAC), which is a plan of action that could allow a nonverbal person to interact with others (Auyeung et al. 2013). Additionally, the ways of instructing such extraordinary people to study in order to enhance their lives can be improved through the use of computer-based adaptive learning. This media could consist of software, hardware, or a combination of the two.

Dynamic assistive technology incorporates control apparatus, touch displays, and augmented and virtual
reality applications, among other advanced technological computer gadgets that have evolved. These technologies can be used for both diagnosis and treatment. Pictures and images have piqued the interest of people with ASD (Charlop-Christy et al. 2002). They have proven to be efficient visual learners, and pictorial cards have been shown to be a successful teaching tool for children to learn how to perform daily tasks.

The aim of this study is to develop a real-time emotion recognition system based on a DL CNN model. Emotion detection using face images is a challenging task. The consistency, quantity, and caliber of the photos used to train the model have a big effect on how well it works. From the images used in training, the model should be able to distinguish between different emotions. This was a challenge for several reasons: (i) face images have different characteristics, for example, some of them have shorter faces, broader faces, wider images, or small mouths; (ii) some faces may indicate emotions that are different from the actual emotions; (iii) sad emotions may sometimes overlap with anger emotions, and the same holds for joy and surprise. The results showed that the highest accuracy is for the natural class.

The remaining work is organized as follows. In Sect. 2, some of the recent related work in emotion recognition techniques is presented. Section 3 presents the problem definition. Experimental evaluation is provided in Sect. 4. And in Sect. 5, we conclude this work.

2 Literature review

A facial identification system that not only recognizes faces in a picture but also infers the type of emotion from facial features has long been investigated by academics and tech experts. Bledsoe (Rashidan et al. 2021; Banire et al. 2021) documented some of the first research on automatic facial detection for the US Department of Defense in 1960. Since then, the software has been developed specifically for the Department of Defense, although little information about the product is given to the general public. Kanade (Ahmed et al. 2022) developed the first fully effective autonomous facial recognition system. By discriminating between features retrieved by a machine and those derived by humans, this system was able to measure 16 distinct facial features.

One of the studies (Baron-Cohen et al. 2009) demonstrates how emotions can be taught to autistic children through the use of visual clues. To teach kids about various emotions and monitor their development, the authors produced several movies and games. Another study (Cheng et al. 2002) provided a web application that gives these gifted children a platform to interact with a simulated model. A human-computer interaction interface was also created as a result of the "AURORA" project, which employed a robot and permitted interaction between the kid and the robot (Goldsmith and LeBlanc 2004). Another study (Dautenhahn and Werry 2004) confirms the human-computer connection by demonstrating a series of brief films depicting various emotional states of children with exceptional needs. The writers of (Robins et al. 2009) look into the various possibilities for using robots as therapeutic instruments.

Despite several studies on teaching emotions to autistic children, several obstacles remain. An autistic individual has a hard time deciphering emotions from facial expressions. In a study (Kaliouby and Robinson 2005), the application of giving an emotional hearing aid to special-needs individuals was proven. The facial action coding system (FACS), introduced by Ekman and Friesen (Donato et al. 1999), is one of the approaches used to recognize facial expressions. Depending on the facial muscular activity, the facial action coding system depicts several sorts of facial expressions.

This method allows for the quantitative measurement and recording of facial expressions. Facial recognition systems have advanced significantly in tandem with advances in real-time machine learning techniques. The study (Pantic and Rothkrantz 2000) presents a detailed review of current automatic facial recognition technologies and applications. According to the research, the majority of existing systems recognize either the six fundamental facial expressions or different forms of facial expressions.

When it comes to recognizing emotions, emotional intelligence is crucial. Understanding emotions entails biological and physical processes as well as the ability to recognize other people's feelings (Staff et al. 2022). By watching facial expressions and somatic changes and converting these documented changes to their physiological presentation, an individual can reliably predict emotions. According to Darwin's research, the process of detecting emotions is thought to entail numerous models of behavior, resulting in a thorough classification of 40 emotional states (Magdin et al. 2019). The majority of studies on facial attribute stratification, however, refer to Ekman's classification of six primary emotions (Batty and Taylor 2003): joy, sadness, surprise, fear, disgust, and anger. A neutral expression was later added to these six fundamental emotions.

The ease with which various groupings of emotions may be identified is a big advantage of using this approach. A variety of computer-based technologies have been developed to better read human attitudes and feelings in order to improve the user experience (Leony et al. 2013). To anticipate meaningful human facial expressions, they
mostly use cameras or webcams. With moderate accuracy, one can deduce the emotions of another person who is facing the camera or webcam. Meanwhile, several machine-learning and image-processing experiments have shown that face traits and eye-gazing behaviors may be used to identify human moods (Lakshminarayanan et al. 2017). The Facial Action Coding System (FACS) is a classification system for facial impressions based on facial affectation. It was first proposed in 1978 by Ekman and Friesen, and it was modified in 2002 by Hager (Lakshminarayanan et al. 2017).

Several studies depend on face images. For example, R. Sadik et al. utilize transfer learning in emotion recognition (Wells et al. 2016). The proposed model is implemented using CNN to develop the MobileNet model. The experimental results showed the recognition model achieved 89% and 87% in terms of accuracy and F1 score, respectively. Similarly, in (Ahmed et al. 2022), three pre-trained models (MobileNet, Xception, and Inception V3) were utilized to detect autism based on facial features, giving accuracies of 95%, 94%, and 89% for MobileNet, Xception, and Inception, respectively. In another study (Akter et al. 2021), T. Akter et al. utilized an improved version of MobileNet V1. The enhanced version adds layers to increase performance, including batch normalization to normalize the output, average pooling to recenter and rescale the input, and two fully connected layers before the output layer. The improved model achieved 90.67% in terms of classification accuracy. Other studies analyze the facial images of autistic children for other purposes. For example, (Banire et al. 2021) built a DL model to recognize attention from facial analysis, achieving 88.89% and 53.1% in terms of ACC and AUC, respectively.

Regarding children with autism, various studies used DL and CNN to diagnose autism based on facial analysis. For example, in (Beary et al. 2020), M. Beary et al. introduce a DL model to classify children as either normal or potentially autistic. The authors utilized a pre-trained model (MobileNet) and achieved an accuracy of 94.6%. In (Nagy et al. 2021), E. Nagy et al. compare the accuracy of the response to six emotions (neutral, sad, disgust, anger, fear, and surprise) for typically developing and autistic children under non-timed and timed conditions. The results showed that children with autism are less accurate in identifying surprise and anger when compared to typically developing children. More extensive studies that detail emotion recognition in autism can be found in the reviews (Rashidan et al. 2021) and Harms et al. (2010).

2.1 Problem definition

This section proposes a real-time emotion identification system for autistic youngsters. Face identification, facial feature extraction, and feature categorization are the three stages of emotion recognition. A total of six facial emotions are detected by the proposed system: anger, fear, joy, natural, sadness, and surprise. This research presents a Deep Convolutional Neural Network (DCNN) architecture for facial expression recognition. The suggested architecture outperforms earlier convolutional neural network-based algorithms and does not require any hand-crafted feature extraction.

2.2 Proposed emotion detection framework

The proposed emotion detection framework is based on three layers: (i) the Cloud Layer, (ii) the Fog Layer, and (iii) the IoT Layer. The two main layers are IoT and Fog. In the IoT layer, the proposed assistant mobile application captures an image of the child while the smart device is in use and then sends this face image to the fog layer. In the Fog layer, a controller, the fog server (FS), is responsible for receiving this face image and detecting the emotion using the proposed DL technique. The Fog layer also contains a database (DB) holding the pre-trained dataset. After detecting the emotion, the fog server sends an alert message to the parent's device if the detected emotion is anger, fear, sadness, or surprise.

The main controlling and managing functionality is implemented at the controller in the fog layer to reduce the latency for real-time detection with fast response and to provide location awareness. The overall proposed framework is shown in Fig. 1.
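To make the data flow concrete, the following minimal sketch shows how the fog-side controller described above might be wired in Python. The names `predict_emotion` and `notify_parent` are hypothetical hooks introduced here for illustration, not functions from the paper.

```python
from typing import Callable, Set

# Emotions that trigger an alert to the parent's device (as stated in the text).
ALERT_EMOTIONS: Set[str] = {"anger", "fear", "sadness", "surprise"}

def handle_face_image(
    image_bytes: bytes,
    child_id: str,
    predict_emotion: Callable[[bytes], str],    # DCNN inference on the fog server (assumed hook)
    notify_parent: Callable[[str, str], None],  # messaging hook to the parent's app (assumed)
) -> str:
    """Classify one captured face image and alert the parent if needed."""
    emotion = predict_emotion(image_bytes)
    if emotion in ALERT_EMOTIONS:
        notify_parent(child_id, emotion)
    return emotion

# Example wiring with stand-in implementations:
if __name__ == "__main__":
    detected = handle_face_image(
        b"...jpeg bytes...", "child-01",
        predict_emotion=lambda img: "fear",  # stub classifier
        notify_parent=lambda cid, emo: print(f"ALERT {cid}: {emo}"),
    )
    print("detected:", detected)
```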
Fig. 1 Proposed emotion detection framework

2.3 Cache replacement strategy using fuzzy logic

Because a huge number of captured images will be sent to the fog every second, there will be a problem with the space of the cache memory of the fog server. Hence, we should use an efficient cache replacement technique to free the memory from old captured images after a specific time. The fog server contains a table holding data about each captured image, as shown in Table 1: (i) Image ID, which is a sequence number; (ii) Arrival time; (iii) Current time; (iv) TTL (Time To Live), which is calculated from the arrival time and current time; (v) Priority, which depends on the location where the image was captured. The priority is high if the captured image is taken at a location remote from the location of the parent; the priority is low if the captured image was taken at the same location as the parent. (vi) Available: to decide whether the image is available or deleted.

Table 1 Images information table

Image ID   Arrival time   Current time   TTL   Priority   Available
1          10             11             7     High       Available
2          10             11             7     High       Available
3          10             11             7     Low        Available
4          11             18             7     Low        Deleted
5          11             19             7     Low        Deleted

Fuzzy logic is used to determine whether an image should be kept around for a while longer or should be removed. In contrast to computationally exact systems, the reasoning process is frequently straightforward, saving processing power (Ranjan and Prasad 2018). Particularly for real-time systems, this is a really intriguing aspect; fuzzy approaches typically require less time to design than traditional ones.

The following consecutive steps are used to carry out the fuzzy inference process: (i) input fuzzification, (ii) applying fuzzy rules, and (iii) defuzzification. The fuzzy procedure depicted in Fig. 2 illustrates those processes. Each image is ranked based on the following three characteristics: (i) Arrival Time (AT); (ii) Time To Live (TTL), which is determined by the arrival time and the current time; (iii) Priority (P), which is determined by where the photograph was taken. Each photo receives a Ranking (R) value from the fuzzy system by taking into account its three preset attributes (AT, TTL, and P).

Fig. 2 The steps of the fuzzy process

The ranking is quickly and accurately determined by a fuzzy algorithm. Fuzzy algorithms are frequently robust insofar as they are less sensitive to shifting settings and to incorrect or forgotten rules. The fuzzy sets of the inputs are:

i. Arrival Time (AT): Early, Medium, or Late.
ii. Time To Live (TTL): Small, Medium, or Large.
iii. Priority (P): Low, Medium, or High.

A rating value of R1 means that the photo is significant and will be kept for additional time (15 min). If a photo has a rating value of R2, it has a low priority and can be deleted to make room for another.

i. Fuzzified inputs

The three considered features of each image are fuzzified. The Fuzzified Arrival Time (FAT), Fuzzified Time To Live (FTTL), and Fuzzified Priority (FP) are used to predict the value of R. They are calculated as shown in Eqs. (1), (2), and (3):

FAT = \{(AT, \mu_{FAT}(AT)) \mid AT \in T\}, \quad \mu_{FAT}(AT) \in [0, 1]    (1)

where AT = {Early, Medium, Late} and T = [0, 100].

FTTL = \{(TTL, \mu_{FTTL}(TTL)) \mid TTL \in TL\}, \quad \mu_{FTTL}(TTL) \in [0, 1]    (2)

where TTL = {Small, Medium, Large} and TL = [0, 100].

FP = \{(P, \mu_{FP}(P)) \mid P \in p\}, \quad \mu_{FP}(P) \in [0, 1]    (3)

where P = {Low, Medium, High} and p = [0, 100].

ii. Applying rules

The rules in this stage describe the connection between the specified input variables (AT, TTL, and P) and the output (R). The fuzzy linguistic rules are founded on IF-THEN statements like these:

If TTL is small and P is low and AT is early THEN R is R2
If P is high THEN R is R1

iii. Crisp values

Based on the fuzzy output value (R), it is decided whether to delete the image or keep it for longer than was originally determined, for a predetermined extra time.
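For illustration, a minimal sketch of this ranking step in Python follows. The triangular membership shapes, breakpoints on [0, 100], and the max-rule defuzzification are assumptions made for the example, not the paper's exact design; "Early", "Small", and "Low" are all mapped to the "low" end of the scale.

```python
# Minimal sketch of the fuzzy ranking step (assumed membership functions).

def tri(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership with peak at b over [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(value: float) -> dict:
    """Map a crisp value in [0, 100] to low/medium/high memberships."""
    return {
        "low":    tri(value, -1, 0, 50),
        "medium": tri(value, 0, 50, 100),
        "high":   tri(value, 50, 100, 101),
    }

def rank_image(at: float, ttl: float, priority: float) -> str:
    """Apply the two example rules above: R1 = keep longer, R2 = safe to delete."""
    f_at, f_ttl, f_p = fuzzify(at), fuzzify(ttl), fuzzify(priority)
    r1 = f_p["high"]                                     # IF P is high THEN R is R1
    r2 = min(f_ttl["low"], f_p["low"], f_at["low"])      # IF TTL small AND P low AND AT early THEN R2
    return "R1" if r1 >= r2 else "R2"

print(rank_image(at=10, ttl=20, priority=90))  # -> R1 (high priority, keep 15 min)
print(rank_image(at=5, ttl=10, priority=15))   # -> R2 (candidate for deletion)
```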
2.4 Proposed application

The proposed application can be implemented on any smart device, such as smartphones and tablets. When the application captures a photo of the kid, it can detect his feeling, as shown in Fig. 3. The Emotion Detection Assistant (EDA) application can be active in the background while the child uses another application. EDA is used to detect the emotion of the kid. If the detected emotion is natural or joy, then there is no problem. If the detected emotion is anger, fear, sadness, or surprise, it will send an alert signal to a connected application installed on the parent's device.

Normally, a child with autism does not have enough ability to express his feelings and has no ability to ask for help. Hence, EDA is very useful and essential to help his parent be notified when he is not okay. There are two main sides in this system, as shown in Fig. 4: (i) the Child Side, and (ii) the Parent Side.

Fig. 3 Emotion Detection Assistant (EDA) application
Fig. 4 Emotion Detection Assistant (EDA) system

2.5 Proposed DL framework

To improve the performance of the algorithm in classifying the input image efficiently, the proposed algorithm contains an autoencoder for feature extraction and feature selection (Fig. 5).

Fig. 5 The proposed DL framework

2.5.1 Autoencoder for feature extraction

An autoencoder is a type of unsupervised neural network that attempts to make a compressed representation of the input data. The design of the autoencoder restricts the architecture to a bottleneck from which it can reconstruct the image. It is utilized to reduce the dimension of the dataset when the relation between the independent and dependent variables can be described using a nonlinear relationship. Autoencoders are considered among the most promising tools for feature extraction and are used in various applications, including self-driving cars, speech recognition, etc.

As shown in Fig. 6, the autoencoder architecture consists of three main parts: the encoder, the bottleneck, and the decoder. First, the encoder tries to pick the most significant features from the input data, while the decoder part tries
to reconstruct the original inputs using the crucial features. Autoencoders reduce the data dimension by keeping the features required to reconstruct the data. The output of the autoencoder is considered the same as its input, but with some loss; thus, it is sometimes called lossy compression. Consider X as a data sample, where n is the number of samples and m the number of features, and Y the encoder output (the reduced dimension of X). The decoder then tries to reconstruct X from Y. The main goal is to reduce the difference between the original input X and the reconstructed input X'.

Fig. 6 Autoencoder architecture

The encoder function that maps X to Y is formulated as follows:

Y = f(X) = S_F(WX + b_x)    (4)

where S_F is the activation function. The decoder function maps the representation Y back to the reconstructed image X':

X' = g(Y) = S_g(W'Y + b_y)    (5)

where S_g is the decoder activation function (e.g., sigmoid, softmax). Training the autoencoder mainly depends on finding the parameters (W and b_x) that minimize the reconstruction loss:

\theta = \min_\theta L(X, X') = \min_\theta L(X, g(f(X)))    (6)

2.5.2 Autoencoder for feature selection

In this section, a sparse autoencoder is used for feature selection. The difference in a sparse autoencoder is that it includes a sparsity constraint on the hidden units. This constraint is utilized to achieve a bottleneck by applying a penalty to the neurons; it ensures that only the critical features are activated. Thus, it forces the model to learn small and unique statistical features of the data. The average activation of a neuron in the hidden layer is calculated by the following equation:

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} \left[ a_j^{(2)}(x^{(i)}) \right]    (7)

where a_j^{(2)}(x^{(i)}) is the activation of neuron j in layer 2, and sparsity requires

\hat{\rho}_j = \rho    (8)

where \rho is a small target activation level. The sparsity in the model could be imposed using L1 regularization.
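For concreteness, one common way to write the resulting training objective, an assumption consistent with Eqs. (6)-(8) rather than a formula stated explicitly above, adds an L1 penalty on the hidden activations, weighted by a coefficient \lambda, to the reconstruction loss:

```latex
% Sparse-autoencoder objective: reconstruction loss from Eq. (6)
% plus an L1 penalty that drives most hidden activations toward zero.
J(\theta) = L\bigl(X, g(f(X))\bigr) + \lambda \sum_{j} \bigl| a_j^{(2)}(x) \bigr|
```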

3 Implementation and evaluation

3.1 Used dataset

This paper uses a set of cleaned images of autistic children with different emotions ("Dataset link" xxxx). The duplicated images and the stock images have been removed. The dataset has then been categorized into six facial emotions: anger, fear, joy, natural, sadness, and surprise. The six primary used emotions are shown in Fig. 7.

This paper used 758 images for training (Anger: 67, Fear: 30, Joy: 350, Natural: 48, Sadness: 200, Surprise: 63) and 72 images for testing (Anger: 3, Fear: 3, Joy: 42, Natural: 7, Sadness: 14, Surprise: 6).

3.2 Autoencoder for feature extraction and selection

The autoencoder model was created as a sequential model that adds one layer at a time and deepens the network; each layer feeds its output to the next layer. It starts with the input layer, which takes the image dimensions (width and height) and the code size. Then, a flatten layer is used to flatten the image matrix into a 1D array. The dense layer tries to find the optimal features to ensure achieving the optimal output, and L1 regularization is utilized to achieve sparsity. The decoder is also a sequential model that accepts the input from the encoder and then tries to reconstruct the input image in the form of one row. It then stacks through a dense layer and finally reshapes the output to construct the image. Figure 8 clarifies the used model for the autoencoder. The plot in Fig. 9 shows the learning curve of the autoencoder model; the curve clarifies that the model achieves a good fit in the reconstruction process.
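The sketch below mirrors this description in Keras: a sequential encoder (flatten, then an L1-regularized dense code) and a sequential decoder (dense row, then reshape). The image size (96x96 grayscale) and code size (64) are assumptions; the paper does not report its exact layer dimensions.

```python
# Hedged Keras sketch of the sequential sparse autoencoder described above.
from tensorflow.keras import Sequential, layers, regularizers

IMG_H, IMG_W, CODE = 96, 96, 64  # assumed sizes, not the paper's settings

encoder = Sequential([
    layers.Flatten(input_shape=(IMG_H, IMG_W)),               # image matrix -> 1D array
    layers.Dense(CODE, activation="relu",
                 activity_regularizer=regularizers.l1(1e-5)), # L1 penalty -> sparse code
], name="encoder")

decoder = Sequential([
    layers.Dense(IMG_H * IMG_W, activation="sigmoid",
                 input_shape=(CODE,)),                        # reconstruct as one row
    layers.Reshape((IMG_H, IMG_W)),                           # back to an image
], name="decoder")

autoencoder = Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=50, validation_data=(x_test, x_test))
```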


Fig. 7 The six primary used emotions

Fig. 8 Sparse autoencoder model architecture

3.3 The deep learning model for the classification task

Several metrics are used to evaluate the model performance, including the following. (i) Accuracy: the proportion of samples that have been accurately categorized compared to all samples. (ii) Precision: the proportion of samples predicted as positive that truly belong to the positive class. (iii) Recall: the proportion of positive samples that are correctly identified. (iv) Cohen kappa: Cohen's kappa is a metric often used to assess the agreement between two raters; it can also be used to evaluate a classification model's effectiveness, where pe is a measurement of the consistency between the model predictions and the actual class values as if it happened by chance, and p0 is the model's overall accuracy. (v) F1-score: the harmonic mean of precision and recall, considered an effective evaluation metric for unbalanced data. The area under the ROC curve (AUC) is calculated based on the ROC curve. In medical imaging, the ROC curve is preferable to accuracy, because accuracy does not reflect the distribution of the predictions or which class has the highest likelihood of estimation. Table 2 shows all metrics and the mathematical formulas used to calculate them.

Table 2 Evaluation metrics

Metric        Abbreviation   Equation
Accuracy      ACC            (tp + tn) / (tp + fp + tn + fn)
Precision     P              tp / (tp + fp)
Recall        R              tp / (tp + fn)
Cohen kappa   K              K = (p0 - pe) / (1 - pe)
F1-score      F1             2(P * R) / (P + R)
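As a small illustration, Table 2's metrics can be computed with scikit-learn on held-out predictions; the toy arrays below are placeholders for the real test labels and model outputs.

```python
# Hedged sketch of computing the evaluation metrics with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, roc_auc_score)

y_true = [0, 2, 2, 1, 0, 2]   # ground-truth class indices (toy data)
y_pred = [0, 2, 1, 1, 0, 2]   # predicted class indices
y_prob = [[0.8, 0.1, 0.1], [0.1, 0.2, 0.7], [0.2, 0.5, 0.3],
          [0.1, 0.8, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]  # class scores

print("ACC:", accuracy_score(y_true, y_pred))
print("P:  ", precision_score(y_true, y_pred, average="macro"))
print("R:  ", recall_score(y_true, y_pred, average="macro"))
print("F1: ", f1_score(y_true, y_pred, average="macro"))
print("K:  ", cohen_kappa_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob, multi_class="ovr"))
```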
Fig. 9 Sparse autoencoder model results

4 Results and discussion

In order to process the dataset efficiently, we employed data flow generators to divide the collected images into manageable batches, which were then fed into our proposed models: MobileNet, Xception, and ResNet. The training dataset consisted of a total of 1200 images, carefully curated to ensure diversity and representativeness, while the testing dataset comprised 220 images for evaluating the performance of the models. To provide transparency and reproducibility, we present the model hyperparameters used in our experiments in Table 3. These hyperparameters were fine-tuned through rigorous experimentation and empirical analysis. The performance evaluation of the models was conducted using various metrics, including accuracy and loss percentages, which were monitored during both the training and validation stages of the models.

These metrics are visualized in Figs. 10 and 11, respectively, providing insights into the models' convergence and generalization capabilities. Based on our comprehensive analysis of the results presented in Figs. 8, 9, and Table 2, we observed that the Xception model outperformed the other models in terms of accuracy and performance. It achieved an impressive accuracy of 95.23%, showcasing a substantial improvement of 7.3% compared to MobileNet and 1.8% compared to ResNet. Furthermore, the Xception model demonstrated an enhanced AUC value of 94.34, exhibiting a remarkable improvement of 3.3% over MobileNet and 2.7% over ResNet. These findings highlight the superior capabilities of the Xception model within the context of our experimental setup. The Xception architecture, with its advanced feature extraction and representation learning capabilities, proved to be highly effective in achieving higher accuracy and improved performance compared to the alternative models. This suggests that the Xception model is well-suited for the specific task at hand and holds great potential for similar image classification tasks.
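As a rough illustration of the pipeline described above (batched generators feeding a pre-trained backbone), the following Keras sketch fine-tunes Xception for the six emotion classes. The directory layout, image size, and hyperparameters are assumptions, not the paper's reported settings.

```python
# Hedged sketch of the evaluation pipeline: data flow generators + frozen
# ImageNet-pretrained Xception + a six-class softmax head.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import Xception
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(299, 299), batch_size=32, class_mode="categorical")
test_gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/test", target_size=(299, 299), batch_size=32, class_mode="categorical",
    shuffle=False)

base = Xception(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False                       # keep ImageNet features frozen

model = models.Sequential([
    base,
    layers.Dense(6, activation="softmax"),   # anger, fear, joy, natural, sadness, surprise
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])
model.fit(train_gen, epochs=20, validation_data=test_gen)
print(model.evaluate(test_gen, return_dict=True))
```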
Table 3 Results of the pretrained models

Model      Acc              P                R                F1               K                AUC
MobileNet  0.8812 ± 0.0231  0.9138 ± 0.0281  0.9027 ± 0.0101  0.9002 ± 0.0256  0.813 ± 0.0264   0.892 ± 0.0312
ResNet     0.9143 ± 0.0338  0.9028 ± 0.0335  0.9311 ± 0.005   0.9162 ± 0.0227  0.8021 ± 0.0315  0.8625 ± 0.0103
Xception   0.9523 ± 0.0278  0.932 ± 0.0318   0.9421 ± 0.0134  0.9331 ± 0.0219  0.853 ± 0.0312   0.9134 ± 0.0121

Fig. 10 Deep learning model results: a training and validation accuracy, b training and validation loss

Fig. 11 Comparison of the pre-trained models

The objective of our study is to develop a CNN model that could classify the emotions of ASD children based on facial expressions. A big challenge in analyzing facial expressions is detecting the crucial features. To detect them, we utilized the sparse autoencoder model, which includes an encoder and a decoder. The encoder transforms the input image into a feature vector and passes only the important features; the decoder then reconstructs the image from the encoder output. The output from the autoencoder is then used to classify facial emotions. Due to the size of the used dataset, we chose to use a pre-trained model (ResNet, MobileNet, and Xception). The Xception model achieved the highest performance (ACC = 0.9523, Sn = 0.932, R = 0.9421, and AUC = 0.9134). To confirm the superiority of the Xception model, we performed the Nemenyi test to calculate the critical distance between the three models.

Using datasets like CIFAR and ImageNet in the pre-trained model improves the results by about 8% to 12%. Emotion detection using face images is a challenging task; the consistency, number, and quality of images used for training have a significant impact on model performance. From the images used in training, the model should be able to distinguish between different emotions. This was a challenge for several reasons: (i) face images have different characteristics, for example, some of them have a shorter face, broader face, wider image, or small mouth; (ii) some faces may indicate emotions that are different from their actual emotions; (iii) sad emotions may sometimes overlap with anger emotions and
the same for joy and surprise. The results showed that the highest accuracy is for the natural class.

5 Conclusion

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by abnormal brain development leading to difficulties in social interaction, communication, and learning. Medical experts face challenges in diagnosing ASD because diagnosis relies mainly on detecting abnormalities in brain function that may not appear in the early stages of the disorder. As an alternative to traditional diagnostic methods, facial expression analysis has shown promise in early detection of ASD, as children with ASD often exhibit distinctive patterns that differentiate them from typically developing children. Assistive technology has proven to be an effective tool in improving the quality of life for individuals with ASD. In this study, we developed a real-time emotion identification system for autistic children to detect their emotions in case of pain or anger. The emotion recognition system consists of three stages: face identification, facial feature extraction, and feature categorization. The proposed system can detect six facial emotions: anger, fear, joy, natural, sadness, and surprise. To enhance the performance of the algorithm in classifying the input image efficiently, we proposed a deep convolutional neural network (DCNN) architecture for facial expression recognition. An autoencoder was used for feature extraction and feature selection, and a pre-trained model (ResNet, MobileNet, and Xception) was applied due to the size of the dataset. The Xception model achieved the highest performance, with an accuracy of 0.9523, sensitivity of 0.932, specificity of 0.9421, and AUC of 0.9134. The proposed emotion detection framework leverages fog and IoT technologies to reduce latency for real-time detection with fast response and location awareness. Using fog computing is particularly useful when dealing with big data. Our study demonstrates the potential of using facial expression analysis and deep learning algorithms for real-time emotion recognition in autistic children, providing medical experts and families with a valuable tool for improving the quality of life for individuals with ASD. In the future, we will explore opportunities to expand our dataset in order to enhance the effectiveness and robustness of our models.

The list of abbreviations is shown in Table 4.

Table 4 List of abbreviations

Abbreviation   Term
AAC            Augmented and Alternative Communication
ASD            Autism Spectrum Disorder
CNN            Convolutional Neural Network
DCNN           Deep Convolutional Neural Network
DL             Deep Learning
FACS           Facial Action Coding System
FLED           Fisher Linear Discriminant analysis
PCA            Principal Component Analysis
SIFT           Scale-Invariant Feature Transform

Author contributions Single Author.

Funding Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB). The authors received no specific funding for this study.

Data availability FFFGG.

Declarations

Conflict of interest The authors declare that they have no conflicts of interest to report regarding the present study.

Ethical approval There are no ethical conflicts.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Ahmed ZAT et al (2022) Facial features detection system to identify children with autism spectrum disorder: deep learning models. Comput Math Methods Med 2022:3941049. https://doi.org/10.1155/2022/3941049

Akter T et al (2021) Improved transfer-learning-based facial recognition framework to detect autistic children at an early stage. Brain Sci. https://doi.org/10.3390/brainsci11060734

Anwar A, Rahman M, Ferdous SM, Ahmed SI (2010) Autism and technology: an approach to new technology-based therapeutic tools. A computer game-based approach for increasing fluency in the speech of autistic children. https://doi.org/10.1007/978-3-642-03893-8

Aresti-Bartolome N, Garcia-Zapirain B (2014) Technologies as support tools for persons with autistic spectrum disorder: a systematic review. Int J Environ Res Public Health 11(8):7767–7802. https://doi.org/10.3390/ijerph110807767

Australia D (2015) Diagnostic criteria for dementia. Alzheimer's Australia, pp 1–6. [Online]. Available: http://www.ncbi.nlm.nih.gov/books/NBK56452/

Auyeung B, Baron-Cohen S (2013) Hormonal influences in typical development: implications for autism. In: Buxbaum JD (ed). Academic Press, Elsevier, San Diego, pp 215–232

Banire B, Al Thani D, Qaraqe M, Mansoor B (2021) Face-based attention recognition model for children with autism spectrum disorder. J Healthc Inform Res 5(4):420–445. https://doi.org/10.1007/s41666-021-00101-y

Baron-Cohen S, Golan O, Ashwin E (2009) Can emotion recognition be taught to children with autism spectrum conditions? Philos Trans R Soc B Biol Sci 364(1535):3567–3574. https://doi.org/10.1098/rstb.2009.0191

Batty M, Taylor MJ (2003) Early processing of the six basic facial emotional expressions. Brain Res Cogn Brain Res 17(3):613–620. https://doi.org/10.1016/s0926-6410(03)00174-5

Beary M, Hadsell A, Messersmith R, Hosseini MP (2020) Diagnosis of autism in children using facial analysis and deep learning. arXiv

Brumfitt S (1993) Clinical forum. Aphasiology 7(6):569–575. https://doi.org/10.1080/02687039308248631

Charlop-Christy MH, Carpenter M, Le L, LeBlanc LA, Kellet K (2002) Using the picture exchange communication system (PECS) with children with autism: assessment of PECS acquisition, speech, social-communicative behavior, and problem behavior. J Appl Behav Anal 35(3):213–231. https://doi.org/10.1901/jaba.2002.35-213

Cheng L, Kimberly G, Orlich F (2002) KidTalk: online therapy for Asperger's syndrome. [Online]. Available: https://pdfs.semanticscholar.org/186e/13195cb3f94dfeb8d978ed5317827ef08263.pdf

Conner CM, White SW, Scahill L, Mazefsky CA (2020) The role of emotion regulation and core autism symptoms in the experience of anxiety in autism. Autism 24(4):931–940. https://doi.org/10.1177/1362361320904217

Dautenhahn K, Werry I (2004) Towards interactive robots in autism therapy. Pragmat Cogn 12(1):1–35. https://doi.org/10.1075/pc.12.1.03dau

Dataset link. https://www.kaggle.com/gpiosenka/autistic-children-data-set-traintestvalidate

Dollion N et al (2022) Emotion facial processing in children with autism spectrum disorder: a pilot study of the impact of service dogs. Front Psychol 13(May):1–13. https://doi.org/10.3389/fpsyg.2022.869452

Donato G, Bartlett MS, Hager JC, Ekman P, Sejnowski TJ (1999) Classifying facial actions. IEEE Trans Pattern Anal Mach Intell 21(10):974. https://doi.org/10.1109/34.799905

el Kaliouby R, Robinson P (2005) The emotional hearing aid: an assistive tool for children with Asperger syndrome. Univers Access Inf Soc 4(2):121–134. https://doi.org/10.1007/s10209-005-0119-0

Goldsmith TR, LeBlanc LA (2004) Use of technology in interventions for children with autism. J Early Intensive Behav Interv 1(2):166–178. https://doi.org/10.1037/h0100287

Harms MB, Martin A, Wallace GL (2010) Facial emotion recognition in autism spectrum disorders: a review of behavioral and neuroimaging studies. Neuropsychol Rev 20(3):290–322. https://doi.org/10.1007/s11065-010-9138-6

Howard K, Gibson J, Katsos N (2021) Parental perceptions and decisions regarding maintaining bilingualism in autism. J Autism Dev Disord 51(1):179–192. https://doi.org/10.1007/s10803-020-04528-x

Kanner L (1968) Autistic disturbances of affective contact. Acta Paedopsychiatr 35(4):100–136

Knight V, McKissick BR, Saunders A (2013) A review of technology-based interventions to teach academic skills to students with autism spectrum disorder. J Autism Dev Disord 43(11):2628–2648. https://doi.org/10.1007/s10803-013-1814-y

Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. Adv Neural Inf Process Syst 2017:6403–6414

Leony D, Merino P, Pardo A, Delgado-Kloos C (2013) Provision of awareness of learners' emotions through visualizations in a computer interaction-based environment. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2013.03.030

Maenner MJ et al (2021) Prevalence and characteristics of autism spectrum disorder among children aged 8 years—Autism and Developmental Disabilities Monitoring Network, 11 sites, United States, 2018. MMWR Surveill Summ 70(11):1–16. https://doi.org/10.15585/MMWR.SS7011A1

Magdin M, Benko L, Koprda Š (2019) A case study of facial emotion classification using Affdex. Sensors 19(9):2140. https://doi.org/10.3390/s19092140

Nagy E, Prentice L, Wakeling T (2021) Atypical facial emotion recognition in children with autism spectrum disorders: exploratory analysis on the role of task demands. Perception 50(9):819–833. https://doi.org/10.1177/03010066211038154

O'Neill B, Gillespie A (2014) Assistive technology for cognition. https://doi.org/10.4324/9781315779102-8

Pantic M, Rothkrantz LJM (2000) Automatic analysis of facial expressions: the state of the art. IEEE Trans Pattern Anal Mach Intell 22(12):1424–1445. https://doi.org/10.1109/34.895976

Ranjan NM, Prasad RS (2018) LFNN: lion fuzzy neural network-based evolutionary model for text classification using context and sense based features. Appl Soft Comput J 71:994–1008. https://doi.org/10.1016/j.asoc.2018.07.016

Rashidan MA et al (2021) Technology-assisted emotion recognition for autism spectrum disorder (ASD) children: a systematic literature review. IEEE Access 9:33638–33653. https://doi.org/10.1109/ACCESS.2021.3060753

Robins B, Dautenhahn K, Dickerson P (2009) From isolation to communication: a case study evaluation of robot assisted play for children with autism with a minimally expressive humanoid robot. Sec Int Conf Adv Comput-Human Interact 2009:205–211. https://doi.org/10.1109/ACHI.2009.32

Staff AI, Luman M, van der Oord S, Bergwerff CE, van den Hoofdakker BJ, Oosterlaan J (2022) Facial emotion recognition impairment predicts social and emotional problems in children
with (subthreshold) ADHD. Eur Child Adolesc Psychiatry 31(5):715–727. https://doi.org/10.1007/s00787-020-01709-y

Wells LJ, Gillespie SM, Rotshtein P (2016) Identification of emotional facial expressions: effects of expression, intensity, and sex on eye gaze. PLoS ONE 11(12):e0168307. https://doi.org/10.1371/journal.pone.0168307

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Authors and Affiliations

Fatma M. Talaat1,2,3 • Zainab H. Ali4,5 • Reham R. Mostafa6,7 • Nora El-Rashidy1

Corresponding author: Nora El-Rashidy, [email protected]
Fatma M. Talaat, [email protected]
Zainab H. Ali, [email protected]
Reham R. Mostafa, [email protected]

1 Machine Learning and Information Retrieval Department, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, Egypt
2 Faculty of Computer Science and Engineering, New Mansoura University, Gamasa 35712, Egypt
3 Nile Higher Institute for Engineering and Technology, Mansoura, Egypt
4 Embedded Network Systems and Technology Department, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, Egypt
5 Department of Electronics and Computer Engineering, School of Engineering and Applied Sciences at Nile University, Giza, Egypt
6 Research Institute of Sciences and Engineering (RISE), University of Sharjah, Sharjah 27272, United Arab Emirates
7 Information Systems Department, Faculty of Computers and Information Sciences, Mansoura University, Mansoura 35516, Egypt