
Received 21 July 2022, accepted 3 August 2022, date of publication 8 August 2022, date of current version 15 August 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3197218

A Local and Global Context Focus Multilingual Learning Model for Aspect-Based Sentiment Analysis

JIANGTAO HE, AISHAN WUMAIER, ZAOKERE KADEER, WEIWEI SUN, XIANGZHE XIN, AND LINNA ZHENG
College of Information Science and Engineering, Xinjiang University, Ürümqi, Xinjiang 830046, China
Key Laboratory of Multilingual Information Technology in Xinjiang Uygur Autonomous Region, Xinjiang University, Ürümqi, Xinjiang 830046, China
Corresponding author: Zaokere Kadeer ([email protected])
This work was supported by the Autonomous Region Natural Science Foundation Joint Fund Project, Research on Xinjiang Tourism
Sentiment Analysis Technology Based on Deep Learning under Grant 2021D01C081.

ABSTRACT Aspect-Based Sentiment Analysis (ABSA) aims to predict the sentiment polarity of different
aspects in a sentence or document, which is a fine-grained task of natural language processing. Most of the
existing work focuses on the correlation between aspect sentiment polarity and local context. The important
deep correlations between global context and aspect sentiment polarity have not received enough attention.
Besides, there are few studies on Chinese ABSA tasks and multilingual ABSA tasks. Based on the local
context focus mechanism, we propose a multilingual learning model based on the interactive learning of local
and global context focus, namely LGCF. Compared with the existing models, this model can effectively learn
the correlation between local context and target aspects and the correlation between global context and target
aspects simultaneously. In addition, the model can effectively analyze both Chinese and English reviews.
Experiments conducted on three Chinese benchmark datasets (Camera, Phone, and Car) and six English
benchmark datasets (Laptop14, Restaurant14, Restaurant16, Twitter, Tshirt, and Television) demonstrate that
LGCF achieves compelling performance and efficiency improvements compared with several existing
state-of-the-art models. Moreover, the ablation experiment results also verify the effectiveness of each
component in LGCF.

INDEX TERMS Aspect-based sentiment analysis, Chinese sentiment analysis, multilingual ABSA, local
and global context focus.

I. INTRODUCTION
In recent years, with the development of cross-border e-commerce, online shopping has become a living habit
of people. What followed was a plethora of reviews of various goods and services. These reviews are written
in the languages of many different countries. In this context, Aspect-Based Sentiment Analysis (ABSA) [1],
[2], [3] can help consumers make better purchasing decisions, and allows merchants and enterprises to
understand their strengths, weaknesses, and market needs more clearly. For example, given the laptop review
''The appearance of this laptop is good-looking, but the price is too expensive,'' customers are positive
about the look of the laptop but negative about its price.

Aspect-Based Sentiment Analysis (ABSA) and Target-Dependent Sentiment Analysis (DTSA) [4], [5] are similar
fine-grained sentiment analysis tasks. Given the review ''Living in London is good but very expensive,''
''London'' is the target contained in the sentence (corresponding to the aspects ''LIVE'' and ''PRICE'').
DTSA needs to detect the sentiment polarity for the target ''London,'' while the goal of ABSA is to detect
the sentiment polarities of the different aspects mentioned in the sentence, i.e., ''LIVE'' and ''PRICE''
[6]. Vo and Zhang [7] show that competitive results can be achieved without the use of syntax, by extracting
a rich set of automatic features.

The associate editor coordinating the review of this manuscript and approving it for publication was
Yiming Tang.


Previous studies have not adequately considered the correlation between aspect sentiment polarity and the
global context. Most of the existing work has focused on the correlation between the sentiment polarity of
aspects and the local context. However, the Local and Global Context Focus (LGCF) design recognizes that the
sentiment polarity of an aspect not only has a strong correlation with the local context but also has some
deep correlation with the global context. A CNN can extract local and deep features from natural language,
and an RNN can process sequential input and solve long-term dependency problems [8]. Therefore, we design
the Global Context Focus (GCF) to solve the problem of insufficient extraction of full contextual
information. The LGCF design model is different from previous studies: in the LGCF design model, the Local
Context Focus (LCF) deploys multi-headed self-attention (MHSA) and Context-features Dynamic Mask (CDM) or
Context-features Dynamic Weighting (CDW), while the GCF deploys a Bidirectional Gated Recurrent Unit (BGRU),
a Convolutional Neural Network (CNN), and Layer Normalization (LN).

In addition, multilingual information processing is an important direction of natural language processing,
and there is almost no research on Chinese ABSA and multilingual ABSA at present. The LGCF design model
proposed in this paper is an innovative multilingual learning model. State-of-the-art results have been
achieved on Chinese and English review datasets, which fully validates the model's strong scalability and
adaptability to multilingual requirements.

The main contributions of this paper are as follows:
1) In this paper, the LGCF design model is proposed, which uses CDM/CDW and multi-headed self-attention to
capture local context features, and uses BGRU and CNN to learn global context features. The LGCF design
model combines local and global context features to infer the sentiment polarity of the targeted aspect.
2) This paper studies ABSA on Chinese and English datasets, which provides a new idea for multilingual ABSA.
3) We introduce BGRU and CNN for feature learning of the global context and experiment with the effect of
combining BGRU and CNN with different numbers of layers, which fully explores their potential in feature
learning of the global context.
4) Experiments conducted on three Chinese benchmark datasets and six English benchmark datasets demonstrate
that LGCF achieves state-of-the-art results. Experiments were performed on ablative LGCF design models to
evaluate the importance and effectiveness of the LGCF design architecture.

II. RELATED WORK
With the rapid development of hardware computing power and the convenience of collecting a large amount of
data, deep neural networks have an exclusive advantage in the field of natural language processing, relying
on their more complex structures and stronger feature fitting capabilities, and are widely used in tasks
such as Machine Translation, Dialogue Systems, and Text Summarization [9].

In this section, we present previous research in six aspects: CNN, RNN, Memory Network, Attention, GCN, and
Capsule Network.

A. METHODS BASED ON CONVOLUTIONAL NEURAL NETWORK
Huang and Carley [10] introduced a novel parametric convolutional neural network for aspect-level sentiment
analysis. Incorporating aspect information into a convolutional neural network (CNN) effectively captures
aspect-specific features. Because a long short-term memory network (LSTM) is difficult to parallelize and
has low time efficiency, and classical convolutional neural networks (CNN) can only capture local semantic
features, Gan et al. [11] proposed a sparse attention-based separable dilated convolutional neural network
(SA-SDCCN), which mainly includes three parts: a multichannel embedding layer, a separable dilated
convolution module, and a sparse attention layer. In the multichannel embedding layer, semantic and
sentiment embeddings are incorporated into an embedding tensor, which builds richer representations over the
input sequence. In the separable dilated convolution module, long-range contextual semantic information is
explored and multi-scale contextual semantic dependencies are aggregated simultaneously through diverse
dilation rates. Moreover, the separable structure further reduces the model parameters. In the sparse
attention layer, sentiment-oriented components are noticed according to the features of the specific target
entity. To use important aspect location information and further design effective aspect injection
strategies when modeling ACSA and ATSA in a unified framework, Wang et al. [12] proposed a novel model named
Unified Position-aware Convolutional Neural Network (UP-CNN), which first proposed an aspect detection
network with prior knowledge to deal with the lack of aspect locations in ACSA, and second, proposed an
aspect mask to construct aspect-aware contextual representations to fit CNN in ABSA.

B. METHODS BASED ON RECURRENT NEURAL NETWORK
To address the limitations of RNNs such as lacking position invariance and lacking sensitivity to local key
patterns, Liu and Shen [13] proposed a Gated Alternate Neural Network (GANN), which designs a special module
named the Gate Truncation RNN (GTR) to learn informative aspect-dependent sentiment clue representations.
Shuang et al. [14] proposed a feature distillation network (FDN) for reducing noise and distilling
aspect-relevant sentiment features. To address the problem of insufficient aspect representation learning,
Jiang et al. [15] proposed a mutually enhanced transformation network (METNet) for the ABSA task. First, the
aspect enhancement module in METNet improves the representation learning of the aspect with contextual
semantic features, which gives the aspect more abundant information. Second, METNet designed and


implemented a hierarchical structure, which enhances the representations of aspect and context iteratively.

C. METHODS BASED ON ATTENTION
Ma et al. [16] proposed the Interactive Attention Network (IAN), which realizes the interactive learning of
target and context attention, and generates the attention representations of target and context,
respectively. To address the problem of information loss when modeling aspect terms containing multiple
words, Qiannan et al. [17] proposed a multi-attention network (MAN). The model uses both intra-layer and
inter-layer attention mechanisms for aspect-based sentiment analysis. In order to effectively extract
aspects from text and analyze sentiment polarity at the same time, Yang et al. [18] proposed a
Chinese-oriented multi-task learning model based on the local context focus mechanism. The model uses a
local context focus mechanism for aspect term extraction and aspect polarity classification.

D. METHODS BASED ON MEMORY NETWORK
Mao et al. [19] proposed a novel model of Attentive Neural Turing Machines (ANTM). Via interactive
read-write operations between external memory storage and a recurrent controller, ANTM can learn the
dependable correlation of the opinion target context and concentrate on crucial sentiment information.
Xu et al. [20] proposed a novel Multi-Interactive Memory Network (MIMN) model, which includes two
interactive memory networks to supervise the textual and visual information with the given aspect, and
learns not only the interactive influences between cross-modality data but also the self-influences in
single-modality data. Lin et al. [21] proposed an innovative model, the Deep Mask Memory Network with
Semantic Dependency and Context Moment (DMMN-SD-CM), which introduces semantic analysis information to guide
the attention mechanism and effectively learn the information provided by other (non-target) aspects. At the
same time, the context moment proposed in the paper is embedded into the sentiment classification of the
whole sentence and is designed to provide background information for the target aspect.

E. METHODS BASED ON GRAPH CONVOLUTIONAL NETWORK
Zhao et al. [22] proposed a novel aspect-level sentiment classification model based on graph convolutional
networks (GCN). To solve the problem that the emotional preference information at the document level is
ignored, Chen et al. [23] proposed a Cooperative Graph Attention Networks (Co-GAN) method for cooperative
learning of aspect-related sentence representations, which utilizes two Graph Attention Networks (GAT) to
model two kinds of document-level sentiment preference information respectively, and integrates these two
kinds of emotional preference information through an interactive mechanism. Sun et al. [24] proposed a
convolutional model over dependency trees, which utilizes GCNs to model the structure of sentences through
dependency trees, where the tree's node (word) embeddings are initialized through a bidirectional long
short-term memory (BiLSTM) network.

F. METHODS BASED ON CAPSULE NETWORK
Jiang et al. [25] designed capsule layers to learn the deep features of the context. Chen and Qian [26]
proposed a Transfer Capsule Network (TransCap) to transfer document-level knowledge to aspect-level
sentiment classification. Du et al. [27] utilized a capsule network to construct vector-based feature
representations and cluster features by way of an EM routing algorithm.

III. PROPOSED METHODOLOGY
The overall structure of the LGCF is shown in Figure 1. It mainly consists of four parts: the input
embedding layer, the Local Context Focus, the Global Context Focus, and the feature interactive learning
layer.

FIGURE 1. Overall network architecture of the LGCF.

A. TASK DEFINITION
Aspect-based sentiment classification is a fine-grained subtask of sentiment analysis, which aims to predict
the emotional polarity of the target aspects. Given an input context sequence W = {w_0, w_1, ..., w_n}
consisting of n words including the aspect terms, W^a = {w_0^a, w_1^a, ..., w_k^a} is an aspect term
sequence. At the same time, W^a is a subsequence of W, which is made up of k (k >= 1) words.

B. INPUT EMBEDDING LAYER
The input embedding layer translates words into distributed representations. Pre-trained BERT models are
designed to improve the performance of most NLP tasks. We use the pre-trained language model BERT [28] to
map each word to an embedded vector e_t in R^(d_h x 1), where d_h is the hidden dimension of the word
vector. In LGCF, the input embedding layer contains a local context embedding and a global context
embedding, where the local context is constructed as {[CLS] + Sentence + [SEP]} and the global context is
constructed as {[CLS] + Sentence + [SEP] + Aspect term + [SEP]}.


LGCF uses two independent BERT sharing layers to model the local context sequence features and the global
context sequence features respectively. X^l and X^g are used to represent the tokenized inputs of the Local
Context Feature Processor and the Global Context Feature Processor respectively, and we have

O^l_BERT = BERT^l( X^l )                                        (1)
O^g_BERT = BERT^g( X^g )                                        (2)

where O^l_BERT and O^g_BERT are the outputs of the BERT shared layers corresponding to the local and global
contexts respectively.

C. LOCAL CONTEXT FOCUS
The Local Context Focus (LCF) mainly consists of four parts: Multi-Headed Self-Attention (MHSA),
Context-features Dynamic Mask (CDM), Context-features Dynamic Weighting (CDW), and CDM-CDW.

1) MULTI-HEADED SELF-ATTENTION
Multi-headed self-attention repeats the attention computation multiple times (one per head) and then gathers
the heads back to the size of a single head through a linear layer. For the self-attention function, the
scaled dot-product attention mechanism (SDA) is used, because it is faster and more efficient to calculate.

Assume that X_SDA is an input representation produced by the embedding layer. SDA is defined as follows:

SDA( X_SDA ) = softmax( ( Q · K^T ) / sqrt( d_K ) ) · V         (3)
Q, K, V = f_x( X_SDA )                                          (4)
f_x( X_SDA ) = { Q = X_SDA · W^q ;  K = X_SDA · W^k ;  V = X_SDA · W^v }    (5)

Q, K, and V are obtained by multiplying the output representation of the upper hidden state by the
respective weight matrices W^q in R^(d_h x d_q), W^k in R^(d_h x d_k), and W^v in R^(d_h x d_v). These
weight matrices are trainable in the learning process, and the dimensions d_q, d_k, and d_v are equal to
d_h / h, where d_h is the dimension of the hidden layer. The attentional representation learned by each head
is concatenated and transformed by multiplying by a matrix W^MH. In the LCF design, the number h of
attention heads is set to 12. Assuming that H_i is the representation learned by each attention head, we
have:

MHSA( X ) = Tanh( [ H_0 ; H_1 ; ... ; H_h ] · W^MH )            (6)

where '';'' represents vector concatenation, and W^MH in R^(h·d_v x d_h) is the parameter matrix for
projection. A tanh activation function is designed for the MHSA encoder to improve the learning ability of
the model representation.
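The following is a minimal PyTorch sketch of the MHSA encoder described by Eqs. (3)-(6): per-head scaled
dot-product attention, concatenation of the heads, projection by W^MH, and a tanh output activation. It is
an illustrative re-implementation rather than the authors' released code; the dimensions follow the text
(d_h split across h = 12 heads).

```python
import torch
import torch.nn as nn

class MHSAEncoder(nn.Module):
    """Multi-headed self-attention with a tanh output activation, Eqs. (3)-(6)."""
    def __init__(self, d_h: int = 768, n_heads: int = 12):
        super().__init__()
        assert d_h % n_heads == 0
        self.n_heads, self.d_k = n_heads, d_h // n_heads
        self.w_q = nn.Linear(d_h, d_h)   # W^q
        self.w_k = nn.Linear(d_h, d_h)   # W^k
        self.w_v = nn.Linear(d_h, d_h)   # W^v
        self.w_mh = nn.Linear(d_h, d_h)  # W^MH, projection after concatenating heads

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_h)
        b, n, _ = x.shape
        def split(t):  # reshape to (batch, heads, seq_len, d_k)
            return t.view(b, n, self.n_heads, self.d_k).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x))
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5     # Q·K^T / sqrt(d_K), Eq. (3)
        attn = torch.softmax(scores, dim=-1) @ v               # attention-weighted values
        attn = attn.transpose(1, 2).reshape(b, n, -1)          # concatenate the heads
        return torch.tanh(self.w_mh(attn))                     # Eq. (6)

# Example: encode a batch of 16 sequences of length 80.
out = MHSAEncoder()(torch.randn(16, 80, 768))
```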

Q, K and V are obtained by multiplying the output represen-


tation of the upper hidden state by their respective weight
matrices W q ∈ Rdh ·dq , W k ∈ Rdh ·dk , W v ∈ Rdh ·dv . These
weight matrices are trainable in the learning process, and the
dimension dq , dk , dv is equal to dh ÷h, dh is the dimension
of the hidden layer. The attentional representation learned
by each head will be concatenated and transformed by mul-
tiplying by a vector W MH . In the LCF design, the number
h of attention headers is set to 12. Assuming that hi is the
representation learned by each attentional head, then we have:
 
MHSA (X ) = Tanh {H0 ; H1 ; . . . ; Hh } · W MH (6) FIGURE 3. Simulation of context-features Dynamic Weighting (CDW)
mechanism.
Among them ‘‘;’’ represents vector connections. W MH ∈
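Eq. (7) only needs the token positions and the aspect span. A small sketch, assuming the aspect is given by
its start index and length (these variable names are ours):

```python
def semantic_relative_distance(seq_len: int, aspect_start: int, aspect_len: int):
    """SRD_i = |i - P_a| - m/2, Eq. (7): distance of each token to the aspect center."""
    p_a = aspect_start + aspect_len / 2.0          # center position of the aspect
    return [abs(i - p_a) - aspect_len / 2.0 for i in range(seq_len)]

# Tokens 4-5 form the aspect in a 10-token sentence.
print(semantic_relative_distance(10, 4, 2))
```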
3) CONTEXT-FEATURES DYNAMIC MASK
Context features that the BERT sharing layer learns to carry relatively little semantic information will be
masked by the CDM layer, but are never discarded. When the CDM layer is deployed, the relative
representations of context words and aspects with relatively few semantics remain at the corresponding
output positions, without masking, and only a relatively small amount

of the semantic context itself is masked at the corresponding output position.

CDM sets all masked features to zero vectors. Another MHSA encoder is deployed in the LCF to learn the
context characteristics of the mask. In this way, the LCF design mitigates the impact of contexts with
relatively few semantics, but preserves the relevance between each context word and the aspect. Suppose
O^l_BERT is the output representation of the BERT module corresponding to the LCF. CDM pays attention to the
local context by constructing a mask vector V_i^m for each context word with relatively few semantics,
resulting in a mask matrix M.

V_i^m = E  if SRD_i <= a,   V_i^m = O  if SRD_i > a             (8)
M = [ V_0^m , V_1^m , ... , V_n^m ]                             (9)
O^l_CDM = O^l_BERT · M                                          (10)

where a is the SRD threshold, M is the mask matrix representing the input sequence, n is the length of the
input sequence containing the aspect, E in R^(d_h) is a vector, O in R^(d_h) is the zero vector, and
O^l_CDM is the output of the CDM layer. ''·'' denotes the dot product of vectors.

4) CONTEXT-FEATURES DYNAMIC WEIGHTING
In addition to the CDM layer, another mechanism is designed to focus on local context words, namely the
Context-features Dynamic Weighting (CDW) layer. With CDW, semantically relevant context features are
completely preserved, while semantically less relevant context features are weighted down. In this design,
the features of context words far from the target aspect are down-weighted according to their SRD. CDW
weights the features by constructing a weight vector for each context word with relatively few semantics.
Here is the formula for obtaining the weight matrix of an input sequence:

V_i^w = E  if SRD_i <= a,   V_i^w = ( ( n - ( SRD_i - a ) ) / n ) · E  if SRD_i > a    (11)
W = [ V_0^w , V_1^w , ... , V_n^w ]                             (12)
O^l_CDW = O^l_BERT · W                                          (13)

where SRD_i is the SRD between the i-th context token and a particular aspect, n is the length of the input
sequence, and a is the SRD threshold. O^l_CDW is the output of the CDW layer. ''·'' stands for the vector
dot product operation.
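The CDM and CDW rules of Eqs. (8)-(13) reduce to building, for each token, either a kept/zeroed vector or a
linearly decayed weight vector from its SRD, and applying it element-wise to O^l_BERT. In the sketch below,
E is taken to be an all-ones vector (the text only calls it ''a vector''), the threshold is named alpha, and
the implementation is illustrative rather than the authors' code.

```python
import torch

def cdm_matrix(srd, d_h: int, alpha: float):
    """Eqs. (8)-(9): ones vector E if SRD_i <= alpha, zero vector O otherwise."""
    keep = torch.tensor([s <= alpha for s in srd], dtype=torch.float32)
    return keep.unsqueeze(-1).expand(len(srd), d_h)           # mask matrix M

def cdw_matrix(srd, d_h: int, alpha: float):
    """Eqs. (11)-(12): full weight inside the threshold, decayed weight outside."""
    n = len(srd)
    w = [1.0 if s <= alpha else (n - (s - alpha)) / n for s in srd]
    return torch.tensor(w).unsqueeze(-1).expand(n, d_h)       # weight matrix W

# Eq. (10) / Eq. (13): element-wise application to the BERT output features.
o_bert = torch.randn(10, 768)                                 # O^l_BERT for one sentence
srd = [4.0, 3.0, 2.0, 1.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
o_cdm = o_bert * cdm_matrix(srd, 768, alpha=3)
o_cdw = o_bert * cdw_matrix(srd, 768, alpha=3)
```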

5) CDM-CDW
Based on the outputs of CDM and CDW, the output representation of the local context can be obtained.

For the CDM layer:

O^l = MHSA( O^l_CDM )                                           (14)

For the CDW layer:

O^l = MHSA( O^l_CDW )                                           (15)

where O^l_CDM is the output of the CDM layer and O^l_CDW is the output of the CDW layer. CDM and CDW are
independent, which means they are optional. The output features of both the CDM and CDW layers are
represented as O^l. CDM-CDW fuses CDM and CDW: it concatenates the features learned by the CDM and CDW
layers and adopts a linear transformation to obtain the local context features.

O^MW = [ O^l_CDM ; O^l_CDW ]                                    (16)
O^l_CDM-CDW = W^MW · O^MW + b^MW                                (17)

where O^MW is the concatenated feature, and W^MW and b^MW are the weight matrix and bias vector,
respectively. The model can choose one of the three mechanisms to learn the local contextual features.

D. GLOBAL CONTEXT FOCUS
A CNN can extract local and deep features from natural language, while an RNN can handle sequential input
and solve long-term dependency problems. In this section, we combine a CNN and a BGRU for the Global Context
Focus (GCF). The Global Context Focus mainly consists of three parts: a Bidirectional Gated Recurrent Unit
(BGRU), a Convolutional Neural Network (CNN), and Layer Normalization (LN).

1) BIDIRECTIONAL GATED RECURRENT UNIT
An RNN can process sequence data and learn long-term dependencies in sequence learning while avoiding
gradient disappearance and gradient explosion in the learning process [8]. We make use of a BGRU composed of
two layers of GRU [29], one of which learns features from the sequence data and the other from the reversed
input data. The GRU introduces reset and update gates to control its input and output.

FIGURE 4. Structure of GRU.

According to the structure diagram of the GRU in Figure 4, the input at time t is h_(t-1) and x_t, the
output is h_t, and the parameters are updated by the following formulas:

r_t = σ( W_r · [ h_(t-1) , x_t ] )                              (18)
z_t = σ( W_z · [ h_(t-1) , x_t ] )                              (19)
h̃_t = tanh( W_h · [ r_t × h_(t-1) , x_t ] )                    (20)
h_t = ( 1 - z_t ) × h_(t-1) + z_t × h̃_t                        (21)


where r_t and z_t correspond to the reset gate and update gate respectively; h̃_t denotes the candidate
hidden state; ''·'' denotes the element-wise multiplication; tanh is the tanh function, whose formula is
(22); and W_r, W_z, W_h, and W_o are the parameters of the GRU.

tanh( x ) = ( e^x - e^(-x) ) / ( e^x + e^(-x) )                 (22)

The above structure enables the model to fully learn context information from the sequential data input.
In addition, some neuron elements are randomly zeroed in each training process, which is an effective
regularization method.

Suppose that O^g_BERT is the preliminary output feature of the BERT module corresponding to the GCF; then we
obtain the output G^g of the BGRU module:

G^g = BGRU( O^g_BERT )                                          (23)
GC g = Gg ; C g
 
(28)
Finally, through the BGRU module, we get the output Gg
of the BGRU module. We put GC g into the Layer Normalization module, and the
output is Og .
2) CONVOLUTIONAL NEURAL NETWORK
Og = LN GC g

Sharing weights among convolutional layers allows the (29)
model to remove some connections between layers in the
network, which can help to reduce the calculation and avoid where LN is formula (27).
overfitting. In this paper, we pad the original data so that
the sample size after the convolution operation remains E. FEATURE INTERACTIVE LEARNING LAYER
unchanged. After padding, a convolution operation is per- A feature interactive learning (FIL) layer is designed to learn
formed on the sentence representation sequence. Represents the features of the global context interactively. The FIL first
that the convolution filter is H ∈ Rh×U , and each kernel-size connects the representation of Ol and Og , then projects them
is 3. ci is generated by applying convolution kernel H to the lg
into Odense , and an MHSA coding operation is applied.
kernel-size according to:
h i
ci = f H · Ei;i+h−1 + b

(24) Olg = Ol ; Og (30)
Oldense = W lg · Olg + blg (31)
where Ei;i+h−1 ∈ Rh×U is the column vector from i to (i +  
lg lg
h − 1), b ∈ R is a bais, and f is a linear activation function. OFIL = MHSA Odense (32)
The convolution kernel is applied to the entire input to
obtain the feature map. W lg ∈ Rdh ×2dh and blg ∈ Rdh are the weights and bias
vectors of dense layers respectively. The MHSA encoder
Ci = [c1 , c2 , c3 , . . . , cL ] , Ci ∈ RL (25) lg lg
encodes Odense .Then output interactive learning feature OFIL .
Then we get the output C g of the CNN module.
F. OUTPUT LAYER
g
C g = CNN OBERT

(26) In the output layer, the feature representation learned by the
feature interactive learning layer is pooled by extracting the
Finally, through the CNN module, we get the output C g of
hidden state at the corresponding position of the first token.
the CNN module.
Finally, the softmax layer is applied to predict the polarity of
sentiment.
3) LAYER NORMALIZATION
 
Layer normalization is performed for a single training sample lg
Xpool = POOL OFIL
lg
(33)
and does not depend on other data. Therefore, it can avoid  
the problem of being affected by mini-batch data distribution lg
  exp Xpool
lg
in batch normalization, and can be used in small Mini-Batch Y = softmax Xpool = P   (34)
C lg
scenarios, dynamic network scenarios, and RNN, especially k=1 exp Xpool
in the field of natural language processing. Layer normaliza-
tion does not need to save the mean and variance of mini- where C is the number of polarity categories, and Y is the
batch, saving extra storage space. sentiment polarity predicted by the LGCF architecture model.


TABLE 1. The statistical information of all datasets.
TABLE 2. The global hyperparameter settings in the experiments.

IV. EXPERIMENTS
A. DATASETS AND EXPERIMENTAL SETTINGS
To demonstrate the effectiveness of our proposed model, we tested the performance of the LGCF on three
Chinese review datasets [30] (Camera, Phone, and Car). The sentiment polarity in the three Chinese datasets
is divided into two categories: Positive and Negative. We also conduct experiments on the SemEval-2014
Laptop and Restaurant datasets [1], the SemEval-2016 Restaurant dataset [3], an ACL Twitter social
dataset [31], the Tshirt dataset, and the Television dataset [32]. The sentiment polarity in the six English
datasets is divided into three categories: Positive, Neutral, and Negative. The statistical information of
all datasets is shown in Table 1.

The cross-entropy loss is adopted for ABSA, and L2 regularization is applied in the LGCF. The loss function
for the LGCF is:

L_absc = - Σ_(i=1)^C ŷ_i log y_i + λ Σ_(θ∈Θ) θ^2                (35)

where C is the number of polarity categories, λ is the L2 regularization parameter, and Θ is the parameter
set of the LGCF.
In addition to the hyperparameter settings mentioned in previous studies, we also conduct controlled
experiments and analyze the experimental results to optimize the hyperparameter settings. The selected
hyperparameters are listed in Table 2. For all experiments, the default SRD threshold was set to 5.

B. COMPARED MODELS
To evaluate our proposed model comprehensively, we compared it with several baseline models and
state-of-the-art models from different times.

1) BASELINE MODELS
We introduce eight baseline models as follows:
IAN [16]: implements attentive interactive learning of target and context, and generates attentive
representations of target and context, respectively.
MemNet [33]: captures the importance of different context words for a specific aspect, and uses this
information as the semantic representation of the sentence to distinguish the more important context
information for the specific aspect.
ATSM-S [30]: tackles sentiment classification of very rich and complex written Chinese by explicitly
modeling the target aspects at three different levels of granularity, followed by sentiment classification.
AOA [34]: models aspects and sentences in a joint manner and explicitly captures the interaction between
aspects and context sentences. Through the AOA module, the model learns aspect and sentence representations
together and automatically focuses on the important parts of the sentence.
ASGCN-DG [35]: is based on an undirected dependency graph. The adjacency matrix is obtained from the words
of the DSPT to overcome some limitations of attention mechanisms and CNN-based models.
ASGCN-DT [35]: is based on a directed dependency tree. The difference between ASGCN-DG and ASGCN-DT lies in
their adjacency matrices.
MAN [17]: proposes a multi-attention network (MAN) to solve the problem of information loss when a
coarse-level attention mechanism is used to model aspects.
UP-CNN-BERT [12]: proposes a novel unified position-aware convolutional neural network that deploys an
aspect detection network with prior knowledge to deal with the lack of aspect locations in ACSA and utilizes
an aspect mask to construct an aspect-aware contextual representation for ABSA to fit the CNN.

2) STATE-OF-THE-ART MODELS AT DIFFERENT TIMES
The following seventeen models achieved state-of-the-art (SOTA) performance at different times, up to the
present:
ATAE-LSTM [36]: mainly uses the attention mechanism to capture the importance of different context
information to a given aspect, and combines the attention mechanism with LSTM to conduct semantic modeling
of sentences for aspect-level sentiment analysis.
GANN [13]: simultaneously encodes the relative distance between each context word and the aspect target,
sequence information, and semantic dependencies within emotional cues, and uses convolution and pooling
mechanisms to obtain key local emotional cue features and achieve location invariance of features.
RAM [37]: employs a multi-attention mechanism to capture emotional features at a distance, resulting in
greater robustness to irrelevant information.
MGAN [38]: combines coarse-grained and fine-grained attention to capture the word-level interactions between
aspects and contexts, and utilizes an aspect alignment loss to describe the aspect-level interactions
between aspects that share a common context.
CDT [39]: utilizes BiLSTM to learn the feature representation of sentences, and utilizes GCN to enhance the


embeddings learned from BiLSTM. The GCN operates directly on the dependency tree of sentences.
DMMN-SDCM [21]: integrates semantic analysis information into the memory network and designs an auxiliary
task to learn the sentiment distribution of the whole sentence to provide the desired background information
for sentiment classification of the target aspect.
TD-BERT [40]: develops the BERT model for target sentiment classification by stacking a max-pooling layer
and an FCL on top of the BERT model.
AEN-BERT [41]: uses attention-based encoding to model the relationship between context and target, raises
the label unreliability problem, and introduces label smoothing regularization. The model also applies
pre-trained BERT.
BERT-PT [42]: proposes a post-training solution for BERT, so that BERT trained with a small amount of data
can achieve better results, addressing BERT's lack of domain knowledge and task-related knowledge.
BERT-BASE [28]: is designed to pre-train deep bidirectional representations from unlabeled text by jointly
conditioning on both left and right contexts in all layers. As a result, the pre-trained BERT model can be
fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of
tasks. We apply it to ABSA multilingual learning.
RepWalk [43]: performs replicated random walks on the grammar graph to obtain the context words that have
the greatest impact on the sentiment prediction of aspect words, and calculates the weight of each word in a
sentence based on grammatical information.
SK-GCN(BERT) [44]: exploits syntactic dependency trees and commonsense knowledge through GCN, and proposes
two strategies to model syntactic dependency trees and commonsense knowledge graphs in order to enhance the
representation of sentences for a given aspect.
SAGAT-BERT [45]: uses a graph attention network to exploit the syntax-aware information in the dependency
tree structure and the external pre-training knowledge of the BERT language model, which helps to model the
interaction between the context and aspects.
ASGCN-BERT [35]: uses the syntactic dependency structure of sentences to solve the long-term multi-word
dependency problem in ABSA.
R-GAT+BERT [46]: defines a unified aspect-oriented dependency tree structure rooted in the target aspect by
reshaping and pruning the ordinary dependency parse tree, and proposes a relational graph attention network
(R-GAT) to encode the new tree structure for sentiment prediction.
LCFS-BERT-CDM [47]: combines part-of-speech embedding, syntactic relation embedding, and context embedding
(e.g., BERT, RoBERTa) to improve the performance of the aspect extractor, and proposes the syntactic
relative distance to reduce the adverse effects of unrelated words with weak syntactic connections to the
aspects, improving the accuracy of the aspect sentiment classifier.
LCFS-BERT-CDW [47]: is a variant of LCFS-BERT-CDM that down-weights words that are far from the aspect.

C. EXPERIMENTAL RESULTS AND ANALYSES
We compare our proposed method with the baseline models and state-of-the-art models above. Table 3 shows the
results on the Chinese datasets. Table 4 shows the results on the English datasets.

TABLE 3. The experimental results (%) of LGCF models on three Chinese datasets. The optimal performance is
in bold.

First, the LGCF design model achieves better results than the other comparable models on all accuracy
measures and almost all macro-F1 measures. We believe that the first reason the LGCF design model does
better than other state-of-the-art models is that it effectively captures the importance of global
contextual sentiment information. We design the Global Context Focus (GCF) to make the model pay more
attention to the global context information, whereas most state-of-the-art models do not pay enough
attention to the global context.

Second, we find that CDM, CDW, and CDM-CDW perform differently on different datasets. LGCF-CDM performs
better on the Camera dataset, on the accuracy of the Laptop14 and Restaurant16 datasets, and on the F1 of
the Television dataset. LGCF-CDM masks contextual features with relatively few semantics by setting them to
zero vectors. In this way, the influence of the non-local context is eliminated, which is a more extreme
approach. LGCF-CDW performs better on the accuracy metrics of the Car, Restaurant14, and Twitter datasets.
LGCF-CDW down-weights contextual features with relatively few semantics, which is a gentler way to mitigate
the effects of the non-local context. However, LGCF-CDM-CDW performs well on almost all accuracy and F1
metrics on two Chinese datasets and five English datasets. LGCF-CDM-CDW is designed by combining CDM and
CDW, which is a more effective fusion mechanism.

Finally, the performance of the model on the Chinese datasets is better than that on the English datasets,
mainly because the English datasets contain three sentiment polarities (positive, negative, neutral) while
the Chinese datasets have only two polarities (positive, negative), and sentences containing the neutral
sentiment polarity are more likely to be misclassified.

D. EFFECTS OF COMBINATION OF BGRU AND CNN WITH DIFFERENT NUMBER OF LAYERS
In the LGCF mechanism, different layer combinations of BGRU and CNN will have different effects on the


performance of the model. We explore the effect of different combinations of 1-3 layers of BGRU and CNN on
the Chinese Car dataset and the English Laptop14 dataset. The experimental results for the combinations of
BGRU and CNN layers are shown in Figure 5, Figure 6, Figure 7, and Figure 8.

TABLE 4. The experimental results (%) of LGCF models on six English datasets. The results of models we
reproduce following the source code published in the paper are indicated by an asterisk (*). All
hyperparameters and experimental environments strictly follow the source paper. ''†'' indicates that the
result is obtained from [32]. ''-'' indicates that the result is not reported. Other results are retrieved
from the previous papers. The optimal performance is in bold.

FIGURE 5. The influence of Acc on the Car dataset.
FIGURE 6. The influence of F1 on the Car dataset.

After experimental verification on the Car dataset and the Laptop14 dataset, the combination of one layer of
BGRU and one layer of CNN is more universal and superior. Although in the cases of 2B+3C and 3B+1C on the
Car dataset, and 1B+3C and 3B+2C on the Laptop14 dataset, the performance of the model under some mechanisms
increases, based on the generality, robustness, and complexity of the model we choose the combination of one
layer of BGRU and one layer of CNN for the model design.

E. ABLATION STUDY
To verify the performance of each component of our proposed model, we conduct ablation experiments. The role
of each component is further studied by adding components one by one. The results of the CDM, CDW, and
CDM-CDW ablation experiments on the Car dataset and the Laptop14 dataset are shown in Table 5 and Table 6
respectively, where ''our'' represents our proposed LGCF, and ''BGRU,'' ''CNN,'' and ''LN'' are the
respective modules added to LGCF. For example, ''CNN+LN'' means LGCF with the BGRU module removed. The
ablation experiments on the Chinese and English datasets fully verify the effectiveness of each module of
our proposed model.

First, for the ablation experiments on the Car dataset, under the three mechanisms of CDM, CDW, and CDM-CDW,
removing any one or two of the three modules BGRU, CNN, and LN almost always decreases the experimental
results. This illustrates the importance of combining the three modules in the LGCF design model on the
Chinese dataset. Furthermore, we note that the experimental performance increases in the cases of ''CNN'' or
''CNN+LN'' under the CDM and CDM-CDW mechanisms, which may be attributed to the noise introduced by BGRU.


FIGURE 7. The influence of Acc on the Laptop14 dataset.
FIGURE 8. The influence of F1 on the Laptop14 dataset.

TABLE 5. Ablation experiments on the Car dataset.
TABLE 6. Ablation experiments on the Laptop14 dataset.

Second, for the ablation experiments on the Laptop14 dataset, under both the CDM and CDM-CDW mechanisms,
removing any one or two of the three modules results in a drop in the experimental results. This illustrates
the effectiveness of the combination of BGRU, CNN, and LN in the LGCF design model on the English dataset.
However, we note that all ablation variants improve performance under the CDW mechanism. This may be because
the CDW mechanism is a relatively mild weight-decay mechanism, and the integration of overly rich global
context information brings over-fitting and noise, thereby affecting the prediction accuracy of the model.

In general, our proposed model LGCF is a trade-off for the comprehensive performance across all datasets,
which fully reflects the generality and superiority of the model.

F. CASE STUDY
To gain a more intuitive understanding of how LGCF works, we selected two review examples for visual case
studies. In Figure 9, we visualize the attention weights, predicted labels, and corresponding true labels
for the sentences and aspect terms.

FIGURE 9. The influence of coefficients on review examples.

In the first example, there are two aspect terms that correspond to opposite sentiment polarities. The
second example has only one aspect term.

AOA fails on the first example sentence and succeeds on the example sentence with only one aspect term.
ASGCN fails to predict the sentiment polarity of ''location'' in the first example sentence, probably
because it wrongly focuses on words related to ''but.'' BERT-BASE fails to correctly predict the sentiment
polarity of ''service,'' probably because the two emotional words ''bustling'' and ''poor'' in the text are
relatively close to the position of ''service.'' LGCF effectively addresses these issues by combining local
contextual attention and global contextual attention. LGCF successfully predicts the sentiment polarity of
the three aspect


terms in the two example sentences. This means that LGCF effectively learns the relevant features of the
local context and the global context.

V. CONCLUSION AND FUTURE WORK
Most of the existing research has focused on the correlation between the sentiment polarity of aspects and
the local context. The important deep correlations between the global context and aspects have not received
enough attention. In addition, there are few studies on Chinese ABSA and multilingual ABSA tasks, and such
studies urgently need to be proposed and developed. To solve the above problems, in this paper we design a
novel model called LGCF for aspect-based sentiment analysis.

In future work, we plan to introduce the Local Context Focus (LCF) and Global Context Focus (GCF) into other
ABSA subtasks, and try to build a multi-task ABSA model based on LCF and GCF. In addition, the introduction
of syntactic and semantic information into the model is also under consideration.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the author(s).

REFERENCES
[1] M. Pontiki, D. Galanis, J. Pavlopoulos, H. Papageorgiou, I. Androutsopoulos, and S. Manandhar,
''SemEval-2014 task 4: Aspect based sentiment analysis,'' in Proc. 8th Int. Workshop Semantic Eval.
(SemEval), Dublin, Ireland: Association for Computational Linguistics, 2014, pp. 27-35. [Online]. Available:
https://www.aclweb.org/anthology/S14-2004, doi: 10.3115/v1/S14-2004.
[2] M. Pontiki, D. Galanis, H. Papageorgiou, S. Manandhar, and I. Androutsopoulos, ''SemEval-2015 task 12:
Aspect based sentiment analysis,'' in Proc. 9th Int. Workshop Semantic Eval. (SemEval), Denver, CO, USA:
Association for Computational Linguistics, pp. 486-495. [Online]. Available:
https://www.aclweb.org/anthology/S15-2082, doi: 10.18653/v1/S15-2082.
[3] M. Pontiki, D. Galanis, H. Papageorgiou, I. Androutsopoulos, S. Manandhar, M. Al-Smadi, M. Al-Ayyoub,
Y. Zhao, B. Qin, O. De Clercq, V. Hoste, M. Apidianaki, X. Tannier, N. Loukachevitch, E. Kotelnikov, N. Bel,
S. M. Jiménez-Zafra, and G. Eryiğit, ''SemEval-2016 task 5: Aspect based sentiment analysis,'' in Proc. 10th
Int. Workshop Semantic Eval. (SemEval), San Diego, CA, USA: Association for Computational Linguistics, 2016,
pp. 19-30. [Online]. Available: https://www.aclweb.org/anthology/S16-1002, doi: 10.18653/v1/S16-1002.
[4] S. Abudalfa and M. Ahmed, ''Survey on target dependent sentiment analysis of micro-blogs in social
media,'' in Proc. 9th IEEE-GCC Conf. Exhib. (GCCCE), Manama, Bahrain, May 2017, pp. 7-10.
[5] M. Jabreel, F. Hassan, and A. Moreno, ''Target-dependent sentiment analysis of tweets using
bidirectional gated recurrent neural networks,'' in Advances in Hybridization of Intelligent Methods (Smart
Innovation, Systems and Technologies), vol. 85, I. Hatzilygeroudis and V. Palade, Eds. Cham, Switzerland:
Springer, 2018.
[6] B. Liang, R. Yin, J. Du, L. Gui, Y. He, M. Yang, and R. Xu, ''Embedding refinement framework for
targeted aspect-based sentiment analysis,'' IEEE Trans. Affect. Comput., early access, Apr. 6, 2021, doi:
10.1109/TAFFC.2021.3071388.
[7] D. T. Vo and Y. Zhang, ''Target-dependent Twitter sentiment classification with rich automatic
features,'' in Proc. 24th Int. Joint Conf. Artif. Intell., 2015, pp. 1-7.
[8] N. Zhao, H. Gao, X. Wen, and H. Li, ''Combination of convolutional neural network and gated recurrent
unit for aspect-based sentiment analysis,'' IEEE Access, vol. 9, pp. 15561-15569, 2021, doi:
10.1109/ACCESS.2021.3052937.
[9] E. Cambria and B. White, ''Jumping NLP curves: A review of natural language processing research [review
article],'' IEEE Comput. Intell. Mag., vol. 9, no. 2, pp. 48-57, May 2014.
[10] B. Huang and K. Carley, ''Parameterized convolutional neural networks for aspect level sentiment
classification,'' in Proc. Conf. Empirical Methods Natural Lang. Process., Brussels, Belgium: Association
for Computational Linguistics, 2018, pp. 1091-1096, doi: 10.18653/v1/D18-1136.
[11] C. Gan, L. Wang, Z. Zhang, and Z. Wang, ''Sparse attention based separable dilated convolutional neural
network for targeted sentiment analysis,'' Knowl.-Based Syst., vol. 188, Jan. 2020, Art. no. 104827.
[12] X. Wang, F. Li, Z. Zhang, G. Xu, J. Zhang, and X. Sun, ''A unified position-aware convolutional neural
network for aspect based sentiment analysis,'' Neurocomputing, vol. 450, pp. 91-103, Aug. 2021.
[13] N. Liu and B. Shen, ''Aspect-based sentiment analysis with gated alternate neural network,''
Knowl.-Based Syst., vol. 188, Jan. 2020, Art. no. 105010.
[14] K. Shuang, Q. Yang, J. Loo, R. Li, and M. Gu, ''Feature distillation network for aspect-based sentiment
analysis,'' Inf. Fusion, vol. 61, pp. 13-23, Sep. 2020.
[15] B. Jiang, J. Hou, W. Zhou, C. Yang, S. Wang, and L. Pang, ''METNet: A mutual enhanced transformation
network for aspect-based sentiment analysis,'' in Proc. 28th Int. Conf. Comput. Linguistics, 2020,
pp. 162-172.
[16] D. Ma, S. Li, X. Zhang, and H. Wang, ''Interactive attention networks for aspect-level sentiment
classification,'' in Proc. 26th Int. Joint Conf. Artif. Intell., Aug. 2017, pp. 4068-4074, doi:
10.24963/ijcai.2017/568.
[17] Q. Xu, L. Zhu, T. Dai, and C. Yan, ''Aspect-based sentiment classification with multi-attention
network,'' Neurocomputing, vol. 388, pp. 135-143, May 2020.
[18] H. Yang, B. Zeng, J. Yang, Y. Song, and R. Xu, ''A multi-task learning model for Chinese-oriented
aspect polarity classification and aspect term extraction,'' Neurocomputing, vol. 419, pp. 344-356,
Jan. 2021, doi: 10.1016/j.neucom.2020.08.001.
[19] Q. Mao, J. Li, S. Wang, Y. Zhang, H. Peng, M. He, and L. Wang, ''Aspect-based sentiment classification
with attentive neural Turing machines,'' in Proc. IJCAI, Aug. 2019, pp. 5139-5145.
[20] N. Xu, W. Mao, and G. Chen, ''Multi-interactive memory network for aspect based multimodal sentiment
analysis,'' in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 371-378.
[21] P. Lin, M. Yang, and J. Lai, ''Deep mask memory network with semantic dependency and context moment for
aspect level sentiment classification,'' in Proc. IJCAI, 2019, pp. 5088-5094.
[22] P. Zhao, L. Hou, and O. Wu, ''Modeling sentiment dependencies with graph convolutional networks for
aspect-level sentiment classification,'' Knowl.-Based Syst., vol. 193, Apr. 2020, Art. no. 105443.
[23] X. Chen, C. Sun, J. Wang, S. Li, L. Si, M. Zhang, and G. Zhou, ''Aspect sentiment classification with
document-level sentiment preference modeling,'' in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics,
2020, pp. 3667-3677.
[24] K. Sun, R. Zhang, S. Mensah, Y. Mao, and X. Liu, ''Aspect-level sentiment analysis via convolution over
dependency tree,'' in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural
Lang. Process. (EMNLP-IJCNLP), 2019, pp. 5683-5692.
[25] Q. Jiang, L. Chen, R. Xu, X. Ao, and M. Yang, ''A challenge dataset and effective models for
aspect-based sentiment analysis,'' in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint
Conf. Natural Lang. Process. (EMNLP-IJCNLP), 2019, pp. 6281-6286.
[26] Z. Chen and T. Qian, ''Transfer capsule network for aspect level sentiment classification,'' in Proc.
57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 547-556.
[27] C. Du, H. Sun, J. Wang, Q. Qi, J. Liao, T. Xu, and M. Liu, ''Capsule network with interactive attention
for aspect-level sentiment classification,'' in Proc. Conf. Empirical Methods Natural Lang. Process. 9th
Int. Joint Conf. Natural Lang. Process. (EMNLP-IJCNLP), 2019, pp. 5492-5501.
[28] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ''BERT: Pre-training of deep bidirectional
transformers for language understanding,'' in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics,
Hum. Lang. Technol., vol. 1, Minneapolis, MN, USA: Association for Computational Linguistics, Jun. 2019,
pp. 4171-4186. [Online]. Available: https://www.aclweb.org/anthology/N19-1423, doi: 10.18653/v1/N19-1423.
[29] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, ''Empirical evaluation of gated recurrent neural networks
on sequence modeling,'' 2014, arXiv:1412.3555.


[30] H. Peng, Y. Ma, Y. Li, and E. Cambria, ''Learning multi-grained aspect target sequence for Chinese
sentiment analysis,'' Knowl.-Based Syst., vol. 148, pp. 167-176, May 2018, doi: 10.1016/j.knosys.2018.02.034.
[31] L. Dong, F. Wei, C. Tan, D. Tang, M. Zhou, and K. Xu, ''Adaptive recursive neural network for
target-dependent Twitter sentiment classification,'' in Proc. 52nd Annu. Meeting Assoc. Comput. Linguistics,
vol. 2, 2014, pp. 49-54, doi: 10.3115/v1/p14-2009.
[32] R. Mukherjee, S. Shetty, S. Chattopadhyay, S. Maji, S. Datta, and P. Goyal, ''Reproducibility,
replicability and beyond: Assessing production readiness of aspect based sentiment analysis in the wild,''
in Proc. Eur. Conf. Inf. Retr., in Lecture Notes in Computer Science, vol. 12657, D. Hiemstra, M.-F. Moens,
J. Mothe, R. Perego, M. Potthast, and F. Sebastiani, Eds. Springer, Mar./Apr. 2021, pp. 92-106. [Online].
Available: https://link.springer.com/chapter/10.1007/978-3-030-72240-1_7, doi: 10.1007/978-3-030-72240-1_7.
[33] D. Tang, B. Qin, and T. Liu, ''Aspect level sentiment classification with deep memory network,'' in
Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 214-224.
[34] B. Huang, Y. Ou, and K. M. Carley, ''Aspect level sentiment classification with attention-over-attention
neural networks,'' in Proc. Int. Conf. Social Comput., Behav.-Cultural Modeling Predict. Behav. Represent.
Modeling Simulation, 2018, pp. 197-206.
[35] C. Zhang, Q. Li, and D. Song, ''Aspect-based sentiment classification with aspect-specific graph
convolutional networks,'' in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf.
Natural Lang. Process. (EMNLP-IJCNLP), 2019, pp. 4560-4570.
[36] Y. Wang, M. Huang, X. Zhu, and L. Zhao, ''Attention-based LSTM for aspect-level sentiment
classification,'' in Proc. Conf. Empirical Methods Natural Lang. Process., 2016, pp. 606-615.
[37] P. Chen, Z. Sun, L. Bing, and W. Yang, ''Recurrent attention network on memory for aspect sentiment
analysis,'' in Proc. Conf. Empirical Methods Natural Lang. Process., 2017, pp. 452-461.
[38] F. Fan, Y. Feng, and D. Zhao, ''Multi-grained attention network for aspect-level sentiment
classification,'' in Proc. Conf. Empirical Methods Natural Lang. Process., 2018, pp. 3433-3442.
[39] K. Sun, R. Zhang, S. Mensah, Y. Mao, and X. Liu, ''Aspect-level sentiment analysis via convolution over
dependency tree,'' in Proc. Conf. Empirical Methods Natural Lang. Process. 9th Int. Joint Conf. Natural
Lang. Process. (EMNLP-IJCNLP), Hong Kong: Association for Computational Linguistics, 2019, pp. 5679-5688.
[Online]. Available: https://www.aclweb.org/anthology/D19-1569, doi: 10.18653/v1/D19-1569.
[40] Z. Gao, A. Feng, X. Song, and X. Wu, ''Target-dependent sentiment classification with BERT,'' IEEE
Access, vol. 7, pp. 154290-154299, 2019.
[41] Y. Song, J. Wang, T. Jiang, Z. Liu, and Y. Rao, ''Targeted sentiment classification with attentional
encoder network,'' in Proc. Int. Conf. Artif. Neural Netw., Springer, 2019, pp. 93-103. [Online]. Available:
https://link.springer.com/chapter/10.1007/978-3-030-30490-4_9
[42] H. Xu, B. Liu, L. Shu, and P. S. Yu, ''BERT post-training for review reading comprehension and
aspect-based sentiment analysis,'' in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Hum.
Lang., 2019, pp. 2324-2335.
[43] Y. Zheng, R. Zhang, S. Mensah, and Y. Mao, ''Replicate, walk, and stop on syntax: An effective neural
network model for aspect-level sentiment classification,'' in Proc. AAAI Conf. Artif. Intell., Apr. 2020,
vol. 34, no. 5, pp. 9685-9692. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/6517,
doi: 10.1609/aaai.v34i05.6517.
[44] J. Zhou, J. X. Huang, Q. V. Hu, and L. He, ''SK-GCN: Modeling syntax and knowledge via graph
convolutional network for aspect-level sentiment classification,'' Knowl.-Based Syst., vol. 205, Oct. 2020,
Art. no. 106292.
[45] L. Huang, X. Sun, S. Li, L. Zhang, and H. Wang, ''Syntax-aware graph attention network for aspect-level
sentiment classification,'' in Proc. 28th Int. Conf. Comput. Linguistics, 2020, pp. 799-810.
[46] K. Wang, W. Shen, Y. Yang, X. Quan, and R. Wang, ''Relational graph attention network for aspect-based
sentiment analysis,'' in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 3229-3238.
[47] M. H. Phan and P. O. Ogunbona, ''Modelling context and syntactical features for aspect-based sentiment
analysis,'' in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, Stroudsburg, PA, USA: Association for
Computational Linguistics, 2020, pp. 3211-3220. [Online]. Available:
https://www.aclweb.org/anthology/2020.acl-main.293, doi: 10.18653/v1/2020.acl-main.293.

JIANGTAO HE received the B.S. degree from the Nanyang Institute of Technology, Nanyang, China, in 2020. He
is currently pursuing the M.S. degree with the School of Computer Science and Engineering, Xinjiang
University, Ürümqi, China. His research interest includes aspect-based sentiment analysis.

AISHAN WUMAIER received the Ph.D. degree from Xinjiang University, in 2010. He is currently a Professor with
Xinjiang University. His research interests include multimodal natural language processing, visual
understanding, speech recognition, and machine translation.

ZAOKERE KADEER received the M.S. degree from Xinjiang University, in 2009. She is currently an
Experimentalist with Xinjiang University. Her research interest includes natural language processing.

WEIWEI SUN received the B.S. degree from the Xuhai College, China University of Mining and Technology,
China, in 2019. He is currently pursuing the M.S. degree with the School of Computer Science and
Engineering, Xinjiang University, Ürümqi, China. His research interest includes natural language processing.

XIANGZHE XIN received the B.S. degree from Henan Normal University, Xinxiang, China, in 2019. He is
currently pursuing the M.S. degree with the School of Computer Science and Engineering, Xinjiang University,
Ürümqi, China. His research interest includes natural language processing.

LINNA ZHENG received the B.S. degree from Hefei Normal University, Hefei, China, in 2020. She is currently
pursuing the M.S. degree with the School of Computer Science and Engineering, Xinjiang University, Ürümqi,
China. Her research interest includes natural language processing.