Modeling_Category_Semantic_and_Sentiment_Knowledge_for_Aspect-Level_Sentiment_Analysis

The document presents a novel multi-task learning framework called Co-interactive Attention Network (CoAN) for aspect-level sentiment analysis, which integrates fine-grained and coarse-grained semantic knowledge to improve sentiment classification accuracy. CoAN leverages two co-interactive attention layers to enhance the relationship between Aspect Term Sentiment Classification (ATSC) and Aspect Category Sentiment Classification (ACSC), addressing challenges like polysemy and ambivalence. Experimental results demonstrate that CoAN outperforms existing methods, achieving better accuracy and F1-scores on benchmark datasets.

1962 IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 15, NO. 4, OCTOBER-DECEMBER 2024

Modeling Category Semantic and Sentiment Knowledge for Aspect-Level Sentiment Analysis
Yuan Wang, Peng Huo, Lingyan Tang, Ning Xiong, Mengting Hu, Qi Yu, and Jucheng Yang

Abstract—To classify the sentiment polarity of an aspect entity in a sentence, most existing research treats the semantic knowledge shared between a given aspect and its context as the significant clue for the task. However, the available accompanying information has not been fully exploited, especially the coarse-grained category-level knowledge in contexts. Such knowledge can help to alleviate polysemy and ambivalence problems. In this article, we propose a multi-task learning framework, the Co-interactive Attention Network (CoAN), to jointly learn and handle multiple granularity features at both target and category levels. To leverage the fine-grained and coarse-grained knowledge in contexts and obtain multi-granularity sentiment-related sentence representations, we introduce two co-interactive attention layers that conduct accompanying semantic interactions at the word level and the feature level. The experimental results on three restaurant review datasets show that CoAN outperforms the baselines by up to 1.41% in accuracy and 2.81% in F1-score. Furthermore, ablation studies and attention visualizations show that the multi-task framework and the novel co-interactive mechanisms can distinguish and fuse multi-granularity knowledge, which benefits the two subtasks in aspect-based sentiment analysis.

Index Terms—Aspect-level sentiment analysis, category-level knowledge, multi-task learning.

Manuscript received 2 February 2023; revised 21 January 2024; accepted 15 April 2024. Date of publication 19 April 2024; date of current version 18 November 2024. This work was supported in part by the National Natural Science Foundation of China under Grant 61702367 and Grant 61976156, in part by Tianjin Science and Technology Commissioner project under Grant 20YDTPJC00560, in part by the Fundamental Research Program of Shanxi Province under Grant 202303021221132, and in part by the special fund for Science and Technology Innovation Teams of Shanxi Province under Grant 202304051001017. Recommended for acceptance by P. Nakov. (Corresponding author: Jucheng Yang.)
Yuan Wang, Peng Huo, Lingyan Tang, Ning Xiong, and Jucheng Yang are with the College of Artificial Intelligence, Tianjin University of Science and Technology, Tianjin 300457, China (e-mail: [email protected]; yaphet [email protected]; [email protected]; [email protected]. cn; [email protected]).
Mengting Hu is with Nankai University, Tianjin 300192, China, and also with the Shanxi Medical University, Jinzhong 030605, China.
Qi Yu is with Shanxi Medical University, Jinzhong 030605, China, and also with the Key Laboratory of Big Data Clinical Decision Research in Shanxi Province, Taiyuan 030619, China.
Digital Object Identifier 10.1109/TAFFC.2024.3391337

I. INTRODUCTION

Aspect-based sentiment analysis (ABSA) aims to infer sentiment polarities towards different aspects in a sentence, where an aspect can be a specific term or a general category. ABSA includes two subtasks: Aspect Term Sentiment Classification [1] (ATSC) and Aspect Category Sentiment Classification [2] (ACSC). ATSC is a fine-grained subtask that focuses on classifying sentiments of specific terms within a sentence. For instance, in “The food is good, but the waiter is so rude.” ATSC identifies ‘food’ as positive (corresponding to the expression “good”) and ‘waiter’ as negative (corresponding to the expression “so rude”). In contrast, ACSC is a coarse-grained subtask that predicts sentiments over general categories in a sentence. In the same instance, ACSC might categorize ‘Food Quality’ as positive and ‘Service Quality’ as negative. Thus, while ATSC deals with explicit terms, ACSC handles broad categories.

In the evolving landscape of sentiment analysis, the complementary strengths of symbolic and sub-symbolic AI approaches play a pivotal role in addressing the challenges of ACSC and ATSC. Symbolic methods [3], [4] use structured knowledge, such as syntax or commonsense knowledge, for sentiment analysis. However, symbolic methods typically struggle to understand nuanced language structures. Zhu et al. [5], [6] focus on investigating the structure of syntactic dependency relations but ignore rich semantic structures between words. Attention-based sub-symbolic models [7], [8], [9] usually construct a two-channel neural network to model text sequences and aspect sequences. However, the performance of these models is contingent upon the availability of a large corpus, and their potential is constrained by high computational costs. Within a specific domain of interest, the availability of a labeled dataset is not always guaranteed, and the process of manual annotation is both laborious and costly. Dragoni et al. [10], [11] achieve automatic discrimination of fine-grained sentiment analysis by establishing a graph structure, so the accuracy of the sentiment analysis heavily depends on the quality of the graph’s construction. Moreover, the majority of current research [12], [13], [14], [15] focuses on addressing either ATSC or ACSC as isolated tasks. Current research, while grappling with data limitations and computational challenges, often overlooks the potential of combining syntactic and semantic analyses.

We observe that the granular knowledge from ATSC and ACSC can be mutually supplementary, especially in complex sentiment contexts. In Fig. 1(a), there are two sentences with the same aspect category “food” and the same sentiment expression “high”, but they express opposite sentiment polarities. Therefore, it is biased to assume that the sentiment polarity of an aspect category is related only to the sentiment expressions in a sentence. In Fig. 1(b), the aspect category “food” originally expresses positive sentiment but ends up with neutral sentiment polarity due to the implicit category semantics with negative sentiment. This means that the target sentiment prediction will be affected by the sentiment of the given target’s corresponding aspect category.

1949-3045 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Mepco Schlenk Engineering College. Downloaded on February 24,2025 at 08:26:58 UTC from IEEE Xplore. Restrictions apply.
WANG et al.: MODELING CATEGORY SEMANTIC AND SENTIMENT KNOWLEDGE FOR ASPECT-LEVEL SENTIMENT ANALYSIS 1963

Fig. 1. Examples with contained target, category labels and sentiment polarities.

Hence, ATSC needs to consider both fine-grained target-related sentiments and broader category-level semantics.

In this paper, we propose a Co-interactive Attention Network (CoAN) as a novel multi-task learning framework, enabling a synergistic interaction between ATSC and ACSC. CoAN leverages the strengths of sub-symbolic AI methods for ATSC, while addressing the disadvantages of symbolic AI methods in ACSC. CoAN integrates a self-attention mechanism (SA) for contextual representation and a hierarchical collaborative interaction attention layer. This architecture enhances the interrelation between the two subtasks of sentiment analysis through shared parameters, optimizing their mutual association and effectiveness in sentiment analysis. We also add category labels with tokens in the input layer and design two output strategies to improve the performance of ACSC. In summary, the main contributions of our work are as follows:
• We propose CoAN, a heuristic and novel multi-task learning framework to blend disparate levels of semantic knowledge for the challenge of ambivalence samples and samples with a polysemous target.
• Weakly supervised information from both coarse-grained category-level semantics and sentiment features is leveraged as guiding knowledge by co-interactive attention layers in the targeted subtasks of targeted sentiment analysis.
• Experimental results on three benchmark datasets show that CoAN surpasses the state-of-the-art baseline methods on both aspect term and aspect category sentiment classification subtasks.

II. RELATED WORK

A. Aspect Term Sentiment Classification

ATSC leverages both graph convolutional networks (GCNs) and symbolic methods for in-depth understanding of targets in natural language understanding. This ensemble application of symbolic and sub-symbolic AI enables nuanced sentiment analysis by combining the depth of machine learning with the clarity of rule-based systems. Liang et al. [12] presented a GCN-based SenticNet to integrate the sentiment dependencies in sentences according to a specific target. Li et al. [9] constructed an aspect-oriented dependency-parsed tree by analyzing and pruning the dependency-parsed tree of the sentence. Methods [10], [11] based on Knowledge Graphs (KGs), while leveraging large-scale domain-specific knowledge, may introduce redundant information when the semantic edges in the graph don’t directly relate to the sentiment target. To solve the issue of semantic neutrality or semantic ambivalence, Valdivia et al. [16] addressed semantic neutrality with a multi-level model, yet the context-awareness remains limited in GCN-based approaches. Hou et al. [17] and Bao et al. [4] further explored dependency tree-based and ensemble models, respectively, emphasizing the need for improved context integration.

B. Aspect Category Sentiment Classification

ACSC focuses on predicting sentiments of potentially implicit targets in texts. Hu et al. [15] introduced a constrained attention network (CAN) for multi-aspect sentiment analysis, while Li et al. [18] employed bidirectional long short-term memory (LSTM) with an attention mechanism for category-specific representations. Chen et al. [19] and Dai et al. [20] further explored graph attention networks and category name embedding networks, respectively. These approaches are innovative but often ignore a comprehensive combination of coarse-grained and fine-grained sentiment analyses.

C. Multi-Task Learning

Multi-task learning [21] is an important mechanism of machine learning which improves the general performance by learning the main task with other associated auxiliary tasks. It usually contains a common layer that learns the shared representations with different tasks and then stacks several task-specific upper layer modules to learn task-relevant representations. A recent method [22] adopts multi-task learning networks to integrate dialog act recognition and sentiment classification, which aims to analyze conversational sentiment by act and language [23]. However, some methods [2], [15], [18] combine the ATSC task with the aspect term extraction task or combine the ACSC task with the aspect category detection (ACD) task. Due to the sequential execution characteristics of aspect entity detection tasks and emotion classification tasks, Kumar et al. [24] proposed a convolutional stacked bidirectional LSTM based on a multiplicative attention mechanism for aspect category and sentiment polarity detection. However, some existing works [19], [25] have shown that the ATSC task can be effectively improved by being jointly trained with document-level sentiment classification tasks. Inspired by that, we design a novel multi-task joint architecture based on the target sentiment classification task and the category-level sentiment classification task.

III. METHODOLOGY

We define the ATSC task as a 3-classification process to predict sentiment polarity $y \in \{\text{positive}, \text{negative}, \text{neutral}\}$ over a given target $t_i = \{w_{i+1}, w_{i+2}, \ldots, w_{i+M}\}$ in the context $S = \{w_1, w_2, \ldots, w_L\}$. Given a category label set $A = \{a_1, a_2, \ldots, a_N\}$ where $N$ is the number of labels, we define the ACSC task as an $N$ sentiment classification process to predict
the sentiment polarities over all categories discussed in the context $S$.

As shown in Fig. 2, CoAN multi-task architecture contains two modules, the left main Task Learning Module (ATSC Task Learning Module) and the right auxiliary task learning module (ACSC Task Learning Module). They can solve sub-tasks of target-level sentiment analysis and category-level sentiment analysis in parallel. Word-level and feature-level attention mechanisms are designed to calculate the emotional semantic association between text and learn contextual emotional features from a multi-granularity perspective. Furthermore, the right auxiliary task learning module can be replaced by other semantic tasks to make CoAN more robust.

Fig. 2. Schematic view of CoAN architecture, where “SA”, “CA”, “MSA” and “PCT” denote self-attention, cross-attention, multi-head attention and point-wise convolution transformation, respectively.

A. Embedding Layer With Category Labels

The context and target are processed into pair format of BERT as “[CLS] + context + [SEP]” and “[CLS] + target + [SEP]”, respectively. In order to learn category-level knowledge in the context, the category labels, as guidance information, are encoded at the embedding layer without any tokens because they are just one-word long. With a BERT matrix $E \in \mathbb{R}^{d \times |V|}$, where $d$ is the embedding dimension and $|V|$ is the vocabulary size, we can get three sequences of word vectors $X = \{x_1, x_2, \ldots, x_L\}$, $X_t = \{x_1^t, x_2^t, \ldots, x_M^t\}$ and $X_a = \{x_1^a, x_2^a, \ldots, x_N^a\}$. And then we use them as the input of the word-level attention interaction layer.

B. Co-Interactive Attention Layers

1) Word-Level Attention Interaction Layer: The word-level attention interaction layer is composed of three components: the SA as mentioned above, point-wise convolution transformation (PCT) [8] and cross attention mechanism (CA). The input of this layer is the original word vector, and we improve the CA to calculate the coarse-grained semantics between context and aspect entity words and between context and aspect category entity words respectively. The SA is calculated by:

$$\mathrm{SA}(K, K) = \sum_{i}^{L} \sum_{j}^{L} \mathrm{softmax}\left(\tanh\left([k_i \oplus k_j] \cdot W_{sa}\right)\right) \cdot k_i \tag{1}$$

where $K$ is the word vector matrix of the text sequence, $\oplus$ is concatenation operation, $k_i$ is the single word vector representation in matrix $K$. $W_{sa} \in \mathbb{R}^{2d}$ is the learnable weight matrix, tanh is an activation function, and softmax is a normalization function.

SA is used to learn the internal correlation of context sequences, so that the two sentiment classification subtasks can share the contextual semantic features and balance the feature heterogeneity of different tasks. The word vector representation matrix $X$ is processed by SA to obtain the context initial semantic representation $H_c$ which is calculated by:

$$H_c = \mathrm{SA}(X, X) \tag{2}$$

The number of convolution kernels in PCT is fixed to 1, and its role is to transfer the full text information to each word through a convolution sliding window. The PCT is calculated by:

$$\mathrm{PCT}(H) = \sigma\left(H * W_{pc}^{1} + b_{pc}^{1}\right) * W_{pc}^{2} + b_{pc}^{2} \tag{3}$$

where $W_{pc}^{1}, W_{pc}^{2} \in \mathbb{R}^{d \times d}$ are weight matrices, $b_{pc}^{1}, b_{pc}^{2} \in \mathbb{R}^{d}$ are bias vectors, both of which are learnable parameters.

The Context semantic representation $H_c$ is calculated by:

$$H_c = \mathrm{PCT}(H_c) \tag{4}$$

To model the implicit correlations of target words or category labels with context words, we adopt the CA to calculate the semantic correlation between two text sequences at the word level. Given text sequence word vector matrix sequences $K = \{k_1, k_2, \ldots, k_L\}$ and $Q = \{q_1, q_2, \ldots, q_M\}$, the CA is calculated by:

$$\mathrm{CA}(K, Q) = \sum_{i}^{L} \sum_{j}^{M} \mathrm{softmax}\left(\tanh\left([k_i \oplus q_j] \cdot W_{ca}\right)\right) \cdot k_i \tag{5}$$

where $k_i$ and $q_j$ represent the vectors of the $i$-th and $j$-th words in the two text sequences. $W_{ca} \in \mathbb{R}^{2d}$ is a learnable matrix.

Formula (6) uses the attention mechanism to calculate the semantic correlation between the context and the aspect entity, and calculates the semantic correlations between the context and the aspect category entity words. The semantic interaction process of $X_t$ and $X_a$ with $X$ is as follows. We obtain the initial semantic representations of aspect entity and category entity, and use them as the input to the attention interaction layer at the next semantic level.

$$H_t = \mathrm{CA}(X, X_t)$$
$$H_t = \mathrm{PCT}(H_t)$$
$$H_a = \mathrm{CA}(X, X_a) \tag{6}$$

$X_t$ is the word vector matrix of the $t$-th aspect entity in the context, $X_a$ is the word vector matrix of the predefined $a$-th aspect category entity word and $X$ is the context matrix.

2) Semantic-Level Attention Interaction Layer: Since an aspect entity may have multiple words, the contribution of each word to the semantic representation varies. To thoroughly exploit the semantic and sentiment correlations between the context and
the given target at the feature-level, we adopt the multi-head self-attention (MSA) mechanism for deeply extracting and fusing the internal features. Specifically, the contextual sentiment semantic representation of aspect entities $H_{ct}$ is calculated by:

$$\mathrm{MSA}(K, Q) = [\mathrm{CA}_1, \mathrm{CA}_2, \ldots, \mathrm{CA}_{n_h}] \cdot W_{ma}$$
$$H_{ct} = \mathrm{MSA}(H_c, H_t) \tag{7}$$

where $n_h$ is the number of heads for MSA and $W_{ma} \in \mathbb{R}^{2d}$ is the learnable weight matrix.

It is not enough to calculate the correlation between category attributes and context at the word level, and it is also necessary to update the context representation of aspect-oriented categories at the semantic level. For the $j$-th category, its context representation of aspect-oriented categories $H_{ca}$ is calculated by:

$$H_{ca} = \mathrm{CA}(H_c, H_a) \tag{8}$$

C. Prediction Layer

1) Two Output Strategies for ACSC Prediction: For improving the performance of ACSC task, we adopt two output strategies in the ACSC task learning module: (1) In CoAN+Four, we regard ACSC as $N$ 4-classification problems. The prediction layer not only predicts the probability distributions of the three sentiments but also judges whether the category attribute exists in the current sentence; (2) In CoAN+Thri, we regard ACSC as $N$ 3-classification sentiment classification problems.

The $j$-th category-oriented sentence representation is applied to predict the sentiment polarity:

$$\hat{y}_j = \mathrm{softmax}\left(W_p^a \cdot H_{ca_j} + b_p^a\right) \tag{9}$$

where $W_p^a \in \mathbb{R}^{1 \times C}$ and $b_p^a \in \mathbb{R}^{C}$ are learnable parameters, and the value of $C$ is 3 or 4 depending on the strategy.

2) ATSC Prediction: In order to enhance the context semantic features and aspect entity semantic features, we apply average pooling to the context sentiment semantic representation $H_{ct}$, the context initial semantic representation $H_c$ and the aspect entity initial semantic representation $H_t$, and then concatenate the semantic vectors in series to obtain the input of aspect-level fine-grained sentiment classification.

$$o_{ct}^{avg} = H_c^{avg} \oplus H_t^{avg} \oplus H_{ct}^{avg}$$
$$\hat{y}_{c \in C} = \mathrm{softmax}\left(W_p^t \cdot o_{ct}^{avg} + b_p^t\right) \tag{10}$$

where $W_p^t \in \mathbb{R}^{1 \times C}$, $b_p^t \in \mathbb{R}^{C}$ are learnable parameters, and the value of $C$ is 3.

D. Loss Function

Fine-grained sentiment classification tasks generally adopt the cross-entropy loss function, and the loss function for the predicted value $\hat{y}_c$ and the true label value $y_c$ is calculated by:

$$L_a = -\sum_{c}^{C} y_c \log \hat{y}_c \tag{11}$$

For the coarse-grained sentiment classification auxiliary subtask, the loss function for the predicted value $\hat{y}_c^n$ and the true label value $y_c^n$ is calculated by:

$$L_b = -\sum_{c}^{C} \sum_{n}^{N} y_c^n \log \hat{y}_c^n \tag{12}$$

The two subtasks are jointly trained by minimizing the summed objective:

$$L(\theta) = L_a + L_b + \lambda \lVert \theta \rVert_2^2 \tag{13}$$

where $\theta$ contains all the learnable parameters of CoAN and $\lambda$ is set as 0.001 in training.

IV. EXPERIMENTS

A. Datasets and Settings

Experiments are conducted on three benchmark restaurant datasets from SemEval’14 Task 4 [26], SemEval’15 Task 12 [27] and SemEval’16 Task 5 [28] (denoted as Rest14, Rest15 and Rest16). In accordance with the settings in previous works [1], [13], samples with the conflict sentiment label have been removed and only the ones with positive, negative and neutral labels have been retained. The number of each sentiment label in training and test samples is shown in Table I. In order to balance the dataset, the 13 aspect categories are mapped into eight broad categories with the method mentioned in [2], while the sentiment labels are kept unchanged. In all experiments, the pre-trained BERT-base model is fine-tuned, the embedding dimension is set to 768 and the hidden dimension to 300. During the training process, the dropout rate is set to 0.1 to avoid overfitting, and the number of training epochs is 20. The best models are selected by early stopping. The model parameters are optimized and updated by the Adam optimizer with a learning rate of 5e-5. The experiments are conducted using an Nvidia RTX 1080 GPU. The experimental results are obtained by averaging the outcomes from ten runs with random initialization. Finally, the Accuracy (acc) and Macro-F1 (F1) scores are used as the final metrics to evaluate the models’ performance.

TABLE I
DATASET STATISTICS WITH NUMBERS OF THREE SENTIMENTS

B. Baselines

CoAN is compared with various representative baselines in Table II.

Attention-based models
• AC-MIMLLN [18]: It proposes an LSTM and attention networks model to treat the words indicating an aspect category as the key instances of the aspect category.
• CoGAN [19]: It builds on graph attention networks to model above two kinds of document-level sentiment preference information respectively, followed by an interactive mechanism to integrate the two-fold preference.

• TD-LSTM [1]: It builds on LSTM networks in which target information is automatically integrated to improve the classification accuracy significantly.
• GSF+AEN [21]: It is an adversarial learning framework that identifies the aspect-invariant/dependent sentiment expressions and adopts a gating mechanism to control the contribution of representations.
• SCAN [2]: It tackles the ACD task and ACSC task by graph attention networks to generate representations of the nodes in sentence constituency parse trees.
• TD-GAT [29]: It builds on graph attention networks to explicitly utilize the dependency relationship among words to propagate sentiment features of an aspect target.
• AEN [8]: It employs attention-based encoders for the modeling between context and target and introduces label smoothing regularization.
• CAN [15]: It builds on attention networks and introduces orthogonal regularization on multiple aspects and sparse regularization on each single aspect to regularize the attention for multi-aspect sentiment analysis.
• MGAN [14]: It integrates the attention networks for ATSC to capture the word-level interaction between aspect and context.
• IAN [30]: It builds an interactive attention mechanism to interactively learn attentions in the contexts and targets.
• ATAE-LSTM [13]: It aims to solve the ATSC problem by focusing on different parts of a sentence through an attention-based LSTM architecture.
• MemNet [31]: It builds an attention-based deep memory network for ATSC in which the importance of each context word is captured explicitly.

Other framework-based models
• ASGCN [32]: It proposes a GCN over the dependency tree of a sentence to exploit syntactical information and word dependencies.
• Bi-GCN [33]: It builds a GCN over the dependency tree of a sentence to exploit syntactical information and word dependencies.
• CDT [34]: It is a method based on a Bi-directional LSTM to identify the sentiment polarity of opinion words.
• GCAE [35]: It builds on convolutional neural networks and gating mechanisms to selectively output the sentiment features according to the given aspects or entities.
• CNE-net [20]: It sets encoder and decoder shared among all categories to weaken the catastrophic forgetting problem. And it applies category name for task discrimination.

Rule-based model
• DP [3]: It is a classic double propagation-based approach, recognized as a representative rule-based method for extracting aspect-opinion sentiment triples.

TABLE II
THE PERFORMANCE OF THE MODELS ON ATSC TASKS IS MEASURED BY ACC(%) AND F1(%)

TABLE III
THE PERFORMANCE OF THE MODELS ON ACSC TASKS IS MEASURED BY ACC(%)

C. Experimental Results Analysis

The results are shown in Tables II and III, from which we can conclude that the two kinds of granularity of sentiment analysis tasks can assist each other. The statistical significance test results (p-values) between CoAN and AEN are presented in the last line of Table II, obtained by independently initializing both the CoAN and AEN models with the same random seed. The p-values indicate that CoAN achieves significant improvements over AEN. We also observe that CoAN achieves the best results, and consistently gets a 0.09–1.41% increase in accuracy and a 0.51–2.81% increase in F1-scores over baselines across all datasets. CoAN’s superiority over both syntax-based and sequential models underscores the importance of leveraging multi-grained semantic and sentiment features. As shown in Table III, CoAN outperforms the baseline models in ACSC tasks significantly, especially on Rest15 and Rest16. This demonstrates the effectiveness of CoAN’s approach for it uses category labels for coarse-grained semantics and sentiments apart from using external knowledge, contrasting with CNE-net’s focus on addressing the catastrophic forgetting problem.

To reduce the impacts between the two subtasks, CoAN is based on a multi-task learning framework, and a co-interactive attention mechanism is introduced to transfer coarse-grained knowledge. Experimental results show that the two output strategies in ACSC can also improve the performance of ATSC task. However, CoAN+Four performs better than CoAN+Thri for the ATSC task on the Rest16 dataset. One of the reasons may be

that the ratio of ambivalence samples in Rest16 dataset is the highest, and statistics are shown in Fig. 4(a).

Fig. 4. Examples of error cases.

As shown in Fig. 3, CoAN consistently outperforms the other two models in terms of accuracy and F1 score across all categories. This superior performance is indicative of CoAN’s robustness and effectiveness in sentiment classification. Additionally, in the ‘anecdotes’ category, CoAN also maintains a high level of accuracy, further demonstrating its versatility and comprehensive understanding of complex linguistic structures.

Fig. 3. Comparison of the performance of three models in different categories of the Rest14.

D. Ablation Study

To assess the importance of the semantic modules in CoAN, ablation experiments are conducted. Results for two granularity sentiment classification tasks are shown in Tables IV and V. The “left module” of CoAN+Thri, an ATSC single-task model, is compared with the “right module”, an ACSC single-task model. Input layer tokens include “CLS” and “SEP” around category labels, like “[CLS] + FOOD + [SEP]”. Table IV demonstrates the multi-task model’s superiority over single-task models in fine-grained sentiment classification, suggesting that coarse-grained subtasks enhance aspect-level sentiment analysis. However, adding “CLS” and “SEP” to the ACSC task diminishes fine-grained classification performance, likely due to the overfitting caused by context and category feature fusion. Thus, for fine-grained tasks, focusing on the left module’s semantic learning is crucial, with the right module serving as an auxiliary function.

TABLE IV
ABLATION STUDY FOR ATSC

Table V indicates that introducing tokens into coarse-grained tasks enhances model performance, likely due to the interaction within multi-classification task categories. ACSC’s performance in single-task model surpasses that of multi-task models by 0.11%, 0.87% and 2.13% across three datasets. The CoAN model shows better results with Rest15 and Rest16 for they have eight aspect categories and require more fine-grained sentiment information. Rest14, with a higher proportion of neutral sentiments, presents challenges for fine-grained sentiment analysis. The standardized data expressions in Rest15 and Rest16 facilitate neural network models in learning text features more effectively than in Rest14.

TABLE V
ABLATION STUDY FOR ACSC

E. Error Case Analysis

To provide an intuitive understanding of CoAN, we analyze two error cases in Fig. 4. These cases reveal that though CoAN+Thri occasionally mispredicts sentiment in multi-task settings, it performs accurately in single-task settings. And these cases also suggest that the contextual semantic learning and overall training of the framework primarily rely on the results of the ATSC task, and further influence the outcomes of ACSC task. Furthermore, the effectiveness of the auxiliary task is essential for enhancing the main task’s performance.

Fig. 5. Visualization of attention weight. The darker the color, the higher the weight. IAN (positive) and ASGCN (neutral) get wrong predictions on the first sample. IAN (positive), ASGCN (positive) and AEN (negative) get wrong predictions on the second sample.

V. CONCLUSION

In this article, we propose a novel multi-task learning framework to jointly learn the two important subtasks of ABSA and improve their performance simultaneously. We design two co-interactive attention layers that can effectively exploit and leverage multiple granularity informative interactions at word-level and feature-level. The experimental results show that CoAN performs better than other baselines on the three public datasets, and the problem of polysemy and ambivalence is solved by adding coarse-grained knowledge. In future work, we would try to further solve the information deficiency issue by introducing graph-structured knowledge with the help of the multi-grained graph attention networks, to deal with the task of the detection of polysemy information.

REFERENCES
[1] D. Tang, B. Qin, X. Feng, and T. Liu, “Effective LSTMs for target-dependent sentiment classification,” in Proc. COLING, 26th Int. Conf. Comput. Linguistics, 2016, pp. 3298–3307.
[2] Y. Li, C. Yin, and S. Zhong, “Sentence constituent-aware aspect-category sentiment analysis with graph attention networks,” in Proc. Natural Lang. Process. Chin. Comput., 2020, pp. 815–827.
[3] G. Qiu and B. Liu, “Opinion word expansion and target extraction through double propagation,” Comput. Linguistics, vol. 37, pp. 9–27, 2011.
[4] X. Bao, Z. Wang, X. Jiang, R. Xiao, and S. Li, “Aspect-based sentiment analysis with opinion tree generation,” in Proc. 31st Int. Joint Conf. Artif. Intell., 2022, pp. 4044–4050.
[5] L. Zhu, X. Zhu, J. Guo, and S. Dietze, “Exploring rich structure information for aspect-based sentiment classification,” J. Intell. Inf. Syst., vol. 60, pp. 97–117, 2023.
[6] Y. Liang, F. Meng, J. Zhang, Y. Chen, J. Xu, and J. Zhou, “A dependency syntactic knowledge augmented interactive architecture for end-to-end aspect-based sentiment analysis,” Neurocomputing, vol. 454, pp. 291–302, 2021.
[7] W. Zhang, X. Li, Y. Deng, L. Bing, and W. Lam, “Towards generative aspect-based sentiment analysis,” in Proc. 59th Annu. Meeting Assoc. Comput. Linguistics 11th Int. Joint Conf. Natural Lang. Process., 2021, pp. 504–510.
[8] Y. Song, J. Wang, T. Jiang, Z. Liu, and Y. Rao, “Attentional encoder network for targeted sentiment classification,” 2019, arXiv:1902.09314.
[9] W. Li, S. Yin, and T. Pu, “Lexical attention and aspect-oriented graph convolutional networks for aspect-based sentiment analysis,” J. Intell. Fuzzy Syst., vol. 42, pp. 1643–1654, 2022.
[10] M. Dragoni, C. d. C. Pereira, and A. G. B. Tettamanzi, “Combining argumentation and aspect-based opinion mining: The smack system,” AI Commun., vol. 31, pp. 75–95, 2018.
[11] M. Dragoni, M. Federici, and A. Rexha, “ReUS: A real-time unsupervised system for monitoring opinion streams,” Cogn. Comput., vol. 11, pp. 469–488, 2019.
To elucidate CoAN’s approach of learning category-level [12] B. Liang, H. Su, L. Gui, E. Cambria, and R. Xu, “Aspect-based sentiment
semantics and its impact on targeted sentiment classification, analysis via affective knowledge enhanced graph convolutional networks,”
we visualized the attention weights of IAN, ASGCN, AEN, and Knowl.-Based Syst., vol. 235, 2022, Art. no. 107643.
[13] Y. Wang, M. Huang, X. Zhu, and L. Zhao, “Attention-based LSTM for
CoAN+Thri models, as shown in Fig. 5. All models correctly aspect-level sentiment classification,” in Proc. Conf. Empirical Methods
predict the training sample “The food quality is high.” How- Natural Lang. Process., 2016, pp. 606–615.
ever, in testing, when “quality” is replaced with “price,” only [14] F. Fan, Y. Feng, and D. Zhao, “Multi-grained attention network for aspect-
level sentiment classification,” in Proc. Conf. Empirical Methods Natural
CoAN appropriately focuses more on “price”, highlighting its Lang. Process., 2018, pp. 3433–3442.
importance over “food”. This demonstrates CoAN’s robustness [15] M. Hu et al., “CAN: Constrained attention networks for multi-aspect
and ability to effectively discern and utilize coarse-grained con- sentiment analysis,” in Proc. Conf. Empirical Methods Natural Lang.
Process. Int. Joint Conf. Natural Lang. Process., 2019, pp. 4601–4610.
text, especially in complex language scenarios. As shown in [16] Z. Wang, S. Ho, and E. Cambria, “Multi-level fine-scaled sentiment
Fig. 5(a), CoAN effectively balances attention between explicit sensing with ambivalence handling,” Int. J. Uncertainty Fuzziness Knowl.
and implicit category sentiments within the global context. As Based Syst., vol. 28, pp. 683–697, 2020.
[17] X. Hou, P. Qi, G. Wang, R. Ying, and B. Zhou, “Graph ensemble learning
shown in Fig. 5(b), CoAN outperforms other models in capturing over multiple dependency trees for aspect-level sentiment classification,”
sentence sentiment-affecting features and deals more effectively in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, 2021,
with neutral labeled samples compared with AEN. pp. 2884–2894.

[18] Y. Li, C. Yin, S. Zhong, and X. Pan, “Multi-instance multi-label learning
networks for aspect-category sentiment analysis,” in Proc. Conf. Empirical
Methods Natural Lang. Process., 2020, pp. 3550–3560.
[19] X. Chen et al., “Aspect sentiment classification with document-level
sentiment preference modeling,” in Proc. 58th Annu. Meeting Assoc. Comput.
Linguistics, 2020, pp. 3667–3677.
[20] Z. Dai, C. Peng, H. Chen, and Y. Ding, “A multi-task incremental learning
framework with category name embedding for aspect-category sentiment
analysis,” in Proc. Conf. Empirical Methods Natural Lang. Process., 2020,
pp. 6955–6965.
[21] B. Liang, R. Yin, L. Gui, J. Du, Y. He, and R. Xu, “Aspect-invariant
sentiment features learning: Adversarial multi-task learning for aspect-based
sentiment analysis,” in Proc. 29th ACM Int. Conf. Inf. Knowl. Manage.,
2020, pp. 825–834.
[22] L. Qin, W. Che, Y. Li, M. Ni, and T. Liu, “DCR-Net: A deep co-interactive
relation network for joint dialog act recognition and sentiment
classification,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 8665–8672.
[23] W. Li, W. Shao, S. Ji, and E. Cambria, “BiERU: Bidirectional emotional
recurrent unit for conversational sentiment analysis,” Neurocomputing,
vol. 467, pp. 73–82, 2022.
[24] J. A. Kumar, T. E. Trueman, and E. Cambria, “A convolutional stacked
bidirectional LSTM with a multiplicative attention mechanism for aspect
category and sentiment detection,” Cogn. Comput., vol. 13, pp. 1423–1432,
2021.
[25] R. He, W. S. Lee, H. T. Ng, and D. Dahlmeier, “An interactive multi-task
learning network for end-to-end aspect-based sentiment analysis,” in Proc.
57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 504–515.
[26] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou,
I. Androutsopoulos, and S. Manandhar, “SemEval-2014 task 4: Aspect based
sentiment analysis,” in Proc. 8th Int. Workshop Semantic Eval., 2014,
pp. 27–35.
[27] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, and
I. Androutsopoulos, “SemEval-2015 task 12: Aspect based sentiment analysis,”
in Proc. 9th Int. Workshop Semantic Eval., 2015, pp. 486–495.
[28] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos,
S. Manandhar, and M. AL-Smadi, “SemEval-2016 task 5: Aspect based sentiment
analysis,” in Proc. 10th Int. Workshop Semantic Eval., 2016, pp. 19–30.
[29] B. Huang and K. M. Carley, “Syntax-aware aspect level sentiment
classification with graph attention networks,” in Proc. Empirical Methods
Natural Lang. Process.-Int. Joint Conf. Natural Lang. Process., 2019,
pp. 5469–5477.
[30] D. Ma, S. Li, X. Zhang, and H. Wang, “Interactive attention networks for
aspect-level sentiment classification,” in Proc. 26th Int. Joint Conf. Artif.
Intell., 2017, pp. 4068–4074.
[31] D. Tang, B. Qin, and T. Liu, “Aspect level sentiment classification with
deep memory network,” in Proc. Conf. Empirical Methods Natural Lang.
Process., 2016, pp. 214–224.
[32] C. Zhang, Q. Li, and D. Song, “Aspect-based sentiment classification with
aspect-specific graph convolutional networks,” in Proc. Conf. Empirical
Methods Natural Lang. Process.-Int. Joint Conf. Natural Lang. Process.,
2019, pp. 4568–4578.
[33] M. Zhang and T. Qian, “Convolution over hierarchical syntactic and lexical
graphs for aspect level sentiment analysis,” in Proc. Conf. Empirical
Methods Natural Lang. Process., 2020, pp. 3540–3549.
[34] K. Sun, R. Zhang, S. Mensah, Y. Mao, and X. Liu, “Aspect-level sentiment
analysis via convolution over dependency tree,” in Proc. Conf. Empirical
Methods Natural Lang. Process.-Int. Joint Conf. Natural Lang. Process.,
2019, pp. 5679–5688.
[35] W. Xue and T. Li, “Aspect based sentiment analysis with gated
convolutional networks,” in Proc. 56th Annu. Meeting Assoc. Comput.
Linguistics, 2018, pp. 483–493.

Yuan Wang received the PhD degree in computer application technology from
Nankai University. She is currently an associate professor with the College
of Artificial Intelligence, Tianjin University of Science and Technology. She
has authored or coauthored several papers in proceedings and journals, such
as IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, AAAI,
WWW, ICDM, and APWEB. Her research interests include machine learning,
pattern recognition, information retrieval, and data mining.

Peng Huo received the BS degree from the Tianjin University of Science and
Technology in 2017, and the MS degree from the Tianjin University of Science
and Technology in 2021. His research interests include sentiment analysis and
knowledge graphs.

Lingyan Tang received the BS degree in computer science and technology from
the Jiangxi University of Traditional Chinese Medicine in 2019. She is currently
working toward the master’s degree with the Department of Artificial
Intelligence, Tianjin University of Science and Technology. Her research
interests include machine learning, text classification, and sentiment analysis.

Ning Xiong received the BS degree in computer science and technology from
the Tianjin University of Science and Technology in 2022. He is currently
working toward the master’s degree with the Department of Artificial
Intelligence, Tianjin University of Science and Technology. His research
interests include machine learning and deep learning.

Mengting Hu received the BS degree from Tongji University in 2015, and the MS
and PhD degrees from Nankai University in 2018 and 2021, respectively. Her
PhD degree is jointly conferred by IBM CRL. She is currently an assistant
professor with the College of Software, Nankai University. Her research interests
include sentiment analysis, domain adaptation, and few-shot learning.

Qi Yu received the BS degree from Shanxi University in 2004, the MS degree
from Wuhan University in 2007, and the PhD degree from Shanxi Medical
University in 2014. He went to Indiana University, Bloomington, for visiting
study in 2013. He is currently a professor with the School of Management,
Shanxi Medical University. His research focuses on healthcare Big Data.

Jucheng Yang received the BS degree from South-Central Minzu University
and the MS and PhD degrees from Chonbuk National University, South Korea.
He is currently a professor with the College of Computer Science and
Information Engineering, Tianjin University of Science and Technology. His
research interests include image processing and pattern recognition. He has
served as an editor or reviewer for international journals, such as IEEE
TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY and IEEE TRANSACTIONS ON
INDUSTRIAL INFORMATICS.
