Affective Computing: Recent Advances, Challenges, and Future Trends
research is generally small [29]. Representative datasets include the Database for Emotion Analysis using Physiological Signals (DEAP) [30], the Shanghai Jiao Tong University Emotion EEG Dataset (SEED) [31], and WESAD, a dataset for wearable stress and affect detection [32].

Sentiment analysis
Text analysis. This method focuses on extracting, analyzing, understanding, and generating emotional information in natural language. Early text affective recognition relied mainly on manually constructed affective dictionaries and rules. These methods judge sentiment polarity by matching sentiment words with grammatical rules in a text [33,34]. However, this approach is limited by the coverage of the emotional lexicon and rules, making it challenging to support multidomain sentiment analysis. With the advancement of machine learning, text emotion recognition methods based on statistical and machine learning algorithms have emerged. By training on large-scale text datasets, machine learning models can automatically learn emotional expressions and semantic features, enhancing the accuracy and generalization ability of sentiment classification [35,36]. In recent years, deep-learning technology has considerably impacted text emotion recognition. Neural network-based models, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), long short-term memory (LSTM) networks, bidirectional encoder representations from transformers (BERT), and generative pretrained transformers (GPT), have been successful in various sentiment analysis tasks [37–39]. They can capture contextual information and semantic relationships to better understand and analyze sentiments.
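To make the deep-learning route concrete, the following is a minimal sketch of transformer-based sentiment classification using the Hugging Face transformers library; the library, checkpoint, and example sentences are illustrative assumptions rather than tools used in the surveyed studies.

```python
# Minimal pretrained-transformer sentiment classification (illustrative sketch only).
from transformers import pipeline

# distilbert-base-uncased-finetuned-sst-2-english is a common SST-2 checkpoint;
# any fine-tuned sentiment model could be substituted here.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

texts = [
    "The support team resolved my issue quickly, great experience.",
    "The update broke everything and nobody responded to my report.",
]

# Each result is a dict with a predicted label (POSITIVE/NEGATIVE) and a confidence score.
for text, result in zip(texts, classifier(texts)):
    print(f"{result['label']:8s} {result['score']:.3f}  {text}")
```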
Speech analysis. Speech emotion recognition is the process by which a computer automatically recognizes the emotional state signaled by speech. Speech contains emotional information, such as speech rate and intonation, in addition to semantic information. Speech emotion analysis combines linguistic and acoustics-related technologies to analyze the syntax, semantics, and acoustic feature information related to the speaker's emotional state [40]. This analysis mainly revolves around prosody, spectrum, and voice quality features. The numerous acoustic features related to affective states include fundamental frequency, duration, speech rate, resonance peaks (formants), pitch, mel-filter bank (MFB), log-frequency power coefficients (LFPC), linear predictive cepstral coefficients (LPCC), and mel-frequency cepstral coefficients (MFCC) [41–43]. These features are represented as fixed-dimensional feature vectors, with each component representing a statistical value of an acoustic parameter, such as the mean, variance, maximum or minimum value, or range of variation. Recently, the ability of neural networks to extract suitable feature parameters has received increasing attention. Deep speech emotion features are learned from speech signals or spectrograms through tasks related to speech emotion recognition. Deep speech features learned from large-scale training data are widely used as speech emotion features in speech event detection and speech emotion recognition tasks, as in the VGGish and wav2vec projects [44,45], for example. In recent years, algorithms such as ConvNet learning [46], ConvNet-RNN [47], and adversarial learning [48] have considerably improved speech emotion recognition performance.
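As an illustration of the statistical-functional representation described above, here is a minimal sketch of building a fixed-dimensional acoustic feature vector with the librosa library; the library choice, sampling rate, file path, and parameter values are assumptions for demonstration, not the pipelines used in the cited works.

```python
# Fixed-dimensional acoustic feature vector: per-descriptor statistics (illustrative sketch).
import numpy as np
import librosa

def acoustic_feature_vector(path: str) -> np.ndarray:
    # Load a mono utterance; 16 kHz is a common rate for speech emotion corpora.
    y, sr = librosa.load(path, sr=16000)

    # Frame-level descriptors: 13 MFCCs plus fundamental frequency (F0).
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # shape (13, frames)
    f0, _, _ = librosa.pyin(y, sr=sr,
                            fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"))     # shape (frames,), NaN when unvoiced

    # Statistical functionals per descriptor row: mean, std, min, max, and range.
    def functionals(x: np.ndarray) -> np.ndarray:
        x = np.atleast_2d(x)
        return np.concatenate([
            np.nanmean(x, axis=1), np.nanstd(x, axis=1),
            np.nanmin(x, axis=1), np.nanmax(x, axis=1),
            np.nanmax(x, axis=1) - np.nanmin(x, axis=1),
        ])

    # 13 MFCC rows * 5 stats + 1 F0 row * 5 stats = 70-dimensional utterance vector.
    return np.concatenate([functionals(mfcc), functionals(f0)])

# vec = acoustic_feature_vector("utterance.wav")  # hypothetical audio file
```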
Visual analysis. Visual emotion recognition research primarily focuses on facial expression recognition (FER) and emotional body gesture recognition. The conventional method involves
Data collection
This study searched for papers on affective computing published from January 1997 to September 2023 in the Web of Science Core Collection (WoSCC), which includes the Science Citation Index Expanded, Social Sciences Citation Index, Arts & Humanities Citation Index, Emerging Sources Citation Index, Conference Proceedings Citation Index – Science (CPCI-S), and Conference Proceedings Citation Index – Social Sciences & Humanities (CPCI-SSH).

Compared with peer review and expert judgment, bibliometrics can provide quantitative indicators to ensure objectivity through statistical analysis of academic achievements [85]. Bibliometric analysis enables monitoring and summarizing the status, hotspots, and trends of a particular topic, helping researchers identify future research directions.
Fig. 2. Annual scientific production on "affective computing" from January 1, 1997, to September 25, 2023.
Table 5. Top 20 categories with the most papers in the field of affective computing
A CNCI value of 1 corresponds to the global average level; a value greater than 1 indicates higher performance, a value less than 1 indicates lower performance, and a value of 2 indicates performance twice as high as the global average. The top 5 institutions according to CNCI rankings were Nanyang Technological University (5.06), Imperial College London (3.58), Tsinghua University (3.23), the Chinese Academy of Sciences (3.15), and the University of California System (2.77).
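For reference, a sketch of how CNCI is commonly computed (the usual InCites-style formulation; the exact baseline used in this study is not stated in this excerpt and is assumed here):

\[
\mathrm{CNCI}_i = \frac{c_i}{e_{f,t,d}}, \qquad
\mathrm{CNCI}_{\mathrm{institution}} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{CNCI}_i ,
\]

where \(c_i\) is the citation count of paper \(i\), \(e_{f,t,d}\) is the expected (average) citation count of papers in the same subject field \(f\), publication year \(t\), and document type \(d\), and the institution-level value is the mean over its \(n\) papers.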
Citation network analysis
This section analyzes the direct citations of all authors in the field of affective computing. To highlight the key authors, 40 authors who had published no fewer than 30 papers were selected for analysis. The results are shown in Fig. 5. Authors in clusters of the same color have strong correlations and inheritance in research content. Representative scholars from the 5 clusters are listed in Table 9.

Word frequency analysis
Word frequency refers to the number of times a word occurs in the document being analyzed. In scientometric research, word frequency dictionaries can be established for specific subject areas to quantify the analysis of scientists' creative activities. Word frequency analysis is the method of extracting keywords or subject words that express the core content of the articles in the literature, to study the development trends and research hotspots of the field through the frequency distribution of these words. The results of conducting frequency and co-occurrence analysis on the keywords assigned to papers by authors in the field of affective computing are shown in Table 10.
The Thomson Data Analyzer was used to automatically and manually clean the keywords assigned by the authors of papers in the dataset. Subsequently, VOSviewer was used to cluster the core (high-frequency) subject words, with co-occurrence frequency and co-occurrence intensity thresholds set according to the size of the dataset. Combined with expert interpretation, each cluster was named and interpreted, and the topics of the journal articles were identified and analyzed. After keyword cleaning, 613 keywords appearing more than 20 times were selected as analysis objects for cluster calculation. Five clusters were obtained by clustering the core subject words with the highest co-occurrence intensity, as shown in Table 11 and Fig. 6.
The average number of citations of a research theme is the average number of times that a paper containing these subject words has been cited since publication, and the average correlation strength of a research theme indicates the closeness of the connection between the core subject words contained in this theme. The greater the correlation strength, the greater the co-occurrence intensity between the core subject words and the more concentrated the research. In contrast, relatively lower correlation strength is associated with more scattered research. Research on the application of affective computing in the analysis of affective disorders has the highest average citation frequency, which shows that interdisciplinary research involving affective computing and medicine, especially research on affective disorders and depression recognition, has a greater influence. The average correlation strength of multimodal sentiment analysis based on deep learning is the largest, which shows that research on this topic is the most concentrated.
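To make the co-occurrence step concrete, the following is a minimal sketch of keyword frequency and pairwise co-occurrence counting in Python; the keyword lists and threshold are toy assumptions, and the actual study used Thomson Data Analyzer and VOSviewer rather than this code.

```python
# Count keyword frequencies and pairwise co-occurrences across papers (toy sketch).
from collections import Counter
from itertools import combinations

# Hypothetical cleaned author-keyword lists, one list per paper.
papers = [
    ["sentiment analysis", "machine learning", "opinion mining"],
    ["emotion recognition", "deep learning", "affective computing"],
    ["sentiment analysis", "deep learning"],
]

# Keyword frequency: number of papers in which each keyword appears.
keyword_freq = Counter(k for kws in papers for k in set(kws))

# Symmetric co-occurrence counts between keyword pairs within the same paper.
cooccurrence = Counter()
for kws in papers:
    for a, b in combinations(sorted(set(kws)), 2):
        cooccurrence[(a, b)] += 1

# Keep only "core" keywords above a frequency threshold, as done before clustering
# (the paper used 20 occurrences; 1 is used here so the toy example has output).
MIN_FREQ = 1
core = {k for k, n in keyword_freq.items() if n >= MIN_FREQ}
core_pairs = {p: n for p, n in cooccurrence.items() if p[0] in core and p[1] in core}
print(core_pairs)
```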
No. | Institution | Number of papers | Citation impact | Category Normalized Citation Impact | H-index | Percentage in Q1 journals | Country
1 | Chinese Academy of Sciences | 699 | 20.97 | 3.15 | 60 | 59.87 | China
2 | University of London | 443 | 50.26 | 2.29 | 77 | 69.45 | UK
3 | UDICE-French Research Universities | 388 | 18.86 | 1.37 | 42 | 50.43 | France
4 | Centre National de la Recherche Scientifique (CNRS) | 377 | 19.33 | 1.36 | 42 | 51.56 | France
5 | University of California System | 371 | 40.83 | 2.77 | 64 | 58.72 | USA
6 | National Institute of Technology (NIT System) | 364 | 9.68 | 1.46 | 29 | 26.43 | India
Discussion
This paper presents a comprehensive analysis and review of systematically collected data on papers and major intellectual property rights in the field of affective computing. The results reveal that over the past 25 years, affective computing has experienced rapid growth in the number of published papers, representing a vibrant academic ecology and an interdisciplinary character spanning a wide range of disciplines. Additionally, scholars worldwide actively participate in a relatively close cooperation network. In particular, Chinese scholars have led the world in terms of the number of publications, scholars, and collaborative papers in this field. Among important research institutions, Tsinghua University and the Chinese Academy of Sciences stand out, with CNCI values indicating that the average number of citations of their papers was more than twice the global average. Citation network analysis showed that Chinese scholars are representative and have become essential nodes in the citation network, indicating that China is constructing a large-scale talent team for affective computing and progressing in both the quantity and quality of research. However, China also faces disadvantages in academic journals, international conferences, and other aspects, leading to weak dominance, which restricts the improvement of China's academic discourse in this field. Notably, in recent years, India's publication volume has exceeded that of the United States for the first time, revealing a robust development potential linked to its advantages in computing. Nonetheless, India still has room for growth in terms of research quality and paper impact, as it lacks representative scholars in the field of affective computing.

Challenges and technology development trends
Modeling of cultural contexts
This study found that affective computing researchers are distributed across various countries globally and have a wide range of cultural backgrounds. While emotional expression has a degree of consistency across humanity, it is considerably influenced by cultural background. Cultural norms and values determine the different emotional experiences of individuals and how others perceive these emotions. Therefore, affective computing systems developed using a single cultural group may fail in other cultural contexts. For example, Chinese, Germans, and Japanese express emotions relatively implicitly, whereas Americans, British, and Brazilians express emotions more overtly. This indicates that emotion agents must match emotion calculation rules with the cultural context. Many Western cultural standards may not necessarily apply in Eastern contexts. For example, Japanese researchers tend to develop robots that express emotions implicitly because overly direct expressions of emotion may cause user dissatisfaction [88]. Therefore, cultural characteristics must be considered in developing universal cross-cultural emotional agents for people from different cultural backgrounds. Hofstede defined culture in terms of 5 dimensions (power distance, individualism, masculinity, uncertainty avoidance, and long-term orientation), which can be used to summarize the typical rules of emotional expression in different cultural contexts [89]. When it is challenging to obtain culture-specific empirical affective data, it is more feasible to design affective computational models using cultural theories and rules.

… techniques for affective computing and opinion mining" and "facial expression and micro-expression recognition and analysis." Current research focuses more on emotion recognition, with relatively limited attention accorded to emotion generation. Emotion recognition and generation are both essential aspects of affective computing and constitute an important technical basis for the closed loop of human–computer interaction. To enable machines to provide more anthropomorphic and natural feedback, it is crucial to focus on the following 2 research areas. (a) Generation of facial expressions. The observation that human emotions are conveyed through visual (55%), vocal (38%), and verbal (7%) signals, known as the "3V rule," reflects the importance of facial expressions in emotion analysis [90]. Appropriate use of facial expressions by avatars and robots can enhance human–robot interaction. Thus, current research aims to build a lexicon of facial expressions that can translate communicative intent into associated expressive morphology and dynamic features to express various meanings. Meanwhile, a team of animation experts is required to achieve realistic facial rendering effects, including lighting and muscle textures. (b) Generation of emotional body movement. This requires the design of embodied agents using computer models of body expression. This area involves studying human kinematics; however, researchers have yet to determine how to characterize the organic combination of body parts, movement strength, and posture for specific emotional states.
No. | Number of occurrences | Technical keyword | Co-occurrences with other keywords | Time period | Proportion of occurrences within last 3 years (%)
1 | 7,621 | Sentiment analysis | Machine learning [958]; Opinion mining [936]; Natural language processing [829] | 2006–2023 | 21
2 | 4,566 | Emotion recognition | Feature extraction [422]; Affective computing [397]; Deep learning [372] | 1997–2023 | 24
3 | 2,457 | Affective computing | Emotion recognition [397]; Machine learning [191]; Emotion [137] | 2000–2023 | 15
4 | 2,232 | Deep learning | Sentiment analysis [691]; Emotion recognition [372]; Machine learning [268] | 2012–2023 | 40
5 | 2,054 | Machine learning | Sentiment analysis [958]; Natural language processing | 2002–2023 | 27
17 | 620 | Classification | Sentiment analysis [208]; Machine learning [100]; Emotion recognition [84] | 2003–2023 | 19
18 | 618 | Facial expression | Emotion recognition [175]; Emotion [78]; Affective computing [49] | 1998–2023 | 15
19 | 582 | Convolutional neural network | Deep learning [146]; Facial expression recognition [102]; Emotion recognition [100] | 2003–2023 | 30
20 | 535 | Schizophrenia | Social cognition [193]; Emotion recognition [88]; Theory of mind [68] | 1998–2023 | 8
21 | 478 | Support vector machine | Sentiment analysis [123]; Facial expression recognition [79] | 2002–2023 | 9
… neuroimaging. Further human research in the field of cognitive neuroscience will ultimately affect the development of affective computing and artificial intelligence as a whole. The cognitive process of human brain emotion processing, its neural mechanism, and its anatomical basis provide essential inspiration for the development of affective computing models. However, to ensure that machines have genuine emotions rather than just appearing to have emotions, further research in cognitive neuroscience is required. This research may involve exploring the neural basis for the generation of human consciousness, the neural mechanism for the construction of human values, and other key scientific issues. Based on this neural theoretical foundation, simulation and machine implementation are feasible options for providing machines with authentic emotions.

Construction of large-scale multimodal datasets
The development of affective computing is highly dependent on the construction of large-scale open datasets. Three major trends are described below. The first trend predicts that dataset sizes will continue to grow to meet the demands of deep-learning algorithm training. Deep-learning models have a substantial number of parameters, and the selection of these parameters requires samples that are typically 100 times the number of parameters. A larger dataset enables the trained model to avoid overfitting, which improves model learning. However, the challenge lies in labeling these massive datasets. Thus, it is necessary to explore active, weakly supervised, and unsupervised learning methods to label the meaningful data in large unlabeled datasets or to train machines for labeling (see the sketch at the end of this section). The second trend highlights the need for the collection of multimodal data, the accumulation of richer modal information, and fine-grained alignment between different modalities. At this stage, machines differ from human beings in 2 critical aspects: First, humans exist in a multimodal social environment, as evidenced by their joint expression of intentions and emotions through language, facial expressions, speech, and actions; second, humans can switch between modalities for emotional reasoning when dealing with emotions. They can also switch between different modalities to search for clues, eliminate ambiguities, and conduct emotional reasoning through interconnections. Therefore, creating a large-scale multimodal emotion dataset can contribute to the development of human-like emotion intelligence technology and the realization of more accurate emotion recognition. The third trend focuses on collecting natural-scene data, as emotional data collected in performance or evoked mode may not accurately represent real-life scenarios. However, collecting high-quality labeled emotional-physiological data in daily life remains a challenge due to the lack of hardware collection devices that are sufficiently comfortable and resistant to interference.
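As an illustration of the weakly supervised labeling idea mentioned above, here is a minimal self-training (pseudo-labeling) sketch built on scikit-learn; the classifier, confidence threshold, and toy data are assumptions for demonstration, not the procedure of any specific affective computing dataset.

```python
# Self-training sketch: repeatedly add high-confidence predictions to the labeled set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def pseudo_label(X_labeled, y_labeled, X_unlabeled, threshold=0.9, rounds=3):
    X_l, y_l = X_labeled.copy(), y_labeled.copy()
    X_u = X_unlabeled.copy()
    for _ in range(rounds):
        if len(X_u) == 0:
            break
        clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
        proba = clf.predict_proba(X_u)
        confidence = proba.max(axis=1)
        mask = confidence >= threshold          # keep only confident pseudo-labels
        if not mask.any():
            break
        X_l = np.vstack([X_l, X_u[mask]])
        y_l = np.concatenate([y_l, clf.classes_[proba[mask].argmax(axis=1)]])
        X_u = X_u[~mask]
    return X_l, y_l, X_u

# Toy usage with random features standing in for emotion features.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(20, 8)), rng.integers(0, 2, size=20)
X_unlab = rng.normal(size=(200, 8))
X_new, y_new, X_left = pseudo_label(X_lab, y_lab, X_unlab)
print(len(y_new), "labeled examples after self-training")
```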