Span Detection For Aspect-Based Sentiment Analysis in Vietnamese
Kim Thi-Thanh Nguyen 1,2,3, Sieu Khai Huynh 1,2,3, Phuc Huynh Pham 1,2,3, Luong Luc Phan 1,2,3,
Duc-Vu Nguyen 1,2,4, Kiet Van Nguyen 1,2,4
1 University of Information Technology, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
3 {18520963,18520348,18521260,18521073}@gm.uit.edu.vn
4 {vund, kietnv}@uit.edu.vn
2 Related Work and Dataset

The SemEval dataset series includes user reviews from e-commerce websites and has motivated much related ABSA research (Li et al., 2019; Luo et al., 2020; Chen and Qian, 2020). The SemEval-2014 Task 4 (SE-ABSA14) dataset (Pontiki et al., 2014) consists of restaurant and laptop reviews. The restaurant subset includes five aspect categories (i.e., Food, Service, Price, Ambience, and Anecdotes/Miscellaneous) and four polarity labels (i.e., Positive, Negative, Conflict, and Neutral). The laptop subset was annotated only for aspect category detection and sentiment polarity classification. The SemEval-2015 Task 12 (SE-ABSA15) dataset (Pontiki et al., 2015) is built on SE-ABSA14. SE-ABSA15 describes each aspect category as an entity type combined with an attribute type (e.g., Food#Style Options) and removes the Conflict polarity label. The SemEval-2016 Task 5 (SE-ABSA16) dataset (Pontiki et al., 2016) extended SE-ABSA15 to new domains such as Hotels, Consumer Electronics, Telecom, and Museums, and to other languages (Dutch, French, Russian, Spanish, Turkish, and Arabic).

Compared with the prosperity of rich-resource languages such as English, Chinese, or Spanish, the number of high-quality Vietnamese datasets is very small. In 2018, the first ABSA shared task in Vietnamese was organized by the Vietnamese Language and Speech Processing (VLSP) community (Nguyen et al., 2019b). VLSP provided an ABSA dataset composed of hotel and restaurant reviews. Unfortunately, the VLSP dataset, inspired by SE-ABSA15, was only annotated for entity#attribute aspect categories and their sentiments, ignoring opinion target extraction. Nguyen et al. (2019a) proposed a dataset in the same domains, restaurants and hotels, including only 7,828 document-level reviews with seven aspects combined with five sentiment polarities for two tasks. Dang et al. (2021) also built a dataset for the same domains as the two previous works, annotated at the sentence level with high inter-annotator agreement. Mai et al. (2018) collected and annotated a Vietnamese ABSA corpus consisting of only 2,098 sentences for two tasks, opinion target extraction and sentiment polarity detection, in the smartphone domain. They presented a multi-task model for the two tasks using a sequence labeling scheme combining bidirectional recurrent neural networks (BRNN) and a conditional random field (CRF). To evaluate aspect-based sentiment analysis for mobile e-commerce, Phan et al. (2021) created a benchmark dataset (UIT-ViSFD) with 11,122 comments based on a strict annotation scheme. Furthermore, they developed a Vietnamese social listening system based on aspect-based sentiment analysis.

For span detection in ABSA, Hu et al. (2019) proposed a span-based extract-then-classify framework, where multiple opinion targets are directly extracted from the sentence under the supervision of target span boundaries, and the corresponding polarities are then classified using their span representations. This work is inspired by advances in machine comprehension and question answering (Seo et al., 2018; Xu et al., 2018), where the task is to extract a continuous span of text from a document as the answer to a question (Rajpurkar et al., 2016; Nguyen et al., 2020). Xu et al. (2020) presented a neat and effective multiple-CRF structured attention model capable of extracting aspect-specific opinion spans. The sentiment polarity of the target is then classified based on the extracted opinion features and contextual information.
Table 1: The ten aspect categories and their definitions.

SCREEN: User comments express screen quality, size, colors, and display technology.
CAMERA: The comments mention the quality of the camera, vibration, delay, focus, and image colors.
FEATURES: The users refer to features, the fingerprint sensor, wifi connection, touch, and face detection of the phone.
BATTERY: The comments describe battery capacity and battery quality.
PERFORMANCE: The reviews describe RAM capacity, the processor chip, performance in use, and the smoothness of the phone.
STORAGE: The comments mention storage capacity and the ability to expand capacity through memory cards.
DESIGN: The reviews refer to the style, design, and shell.
PRICE: The comments present the specific price of the phone.
GENERAL: The reviews of customers generally comment about the phone.
SER&ACC (short for SERVICE and ACCESSORIES): The comments mention sales service, warranty, and reviews of accessories of the phone.
3 Dataset Creation and Analysis

Based on the benchmark dataset proposed by Phan et al. (2021), we develop a new dataset for span detection for ABSA in Vietnamese. The creation process of our dataset is as follows. To begin with, we edit and revise the annotation guidelines from Phan et al. (2021) so that annotators can determine spans and annotate the data correctly (see Section 3.1). Annotators are trained with the guidelines and annotate data until the F1-score in the training process exceeds 80% before performing data annotation independently (see Section 3.2). Finally, we provide an analysis of the dataset that helps experts understand it (see Section 3.3).

We utilize the ABSA dataset collected from an e-commerce website for smartphones in Vietnam, which allows customers to write fine-grained reviews of a smartphone they have purchased. In the reviews, users comment, either explicitly or implicitly, on multiple aspects such as camera, price, battery, and service. The dataset includes 11,122 pieces of feedback with four attributes: comment, n star, date time, and label. Table 1 summarizes the ten aspects in the guidelines, and each aspect has one of three sentiments (positive, negative, and neutral).

3.1 Span Definition and Annotation Guidelines

Following the annotation guidelines proposed by Phan et al. (2021), we add some definitions and rules to form the core of the data construction. We reuse the ten predefined aspect categories in Table 1; for each aspect category mentioned within a review, the sentiment polarity over the aspect category is labeled as Positive, Neutral, or Negative. The span is defined as the shortest span containing the user's opinion about the aspect category. With the ten predefined aspects, annotators are asked to annotate spans towards aspect categories with sentiment polarities for each review. Given a review, when a span is discovered within it, either explicitly or implicitly, the aspect category with the sentiment polarity of that span is labeled as aspect#polarity, as in Figure 1.

3.2 Annotation Process

Three phases of annotation are conducted as follows. To begin with, we train annotators with the guidelines and randomly take about 30-70 reviews from the dataset to annotate, then calculate the F1-score per review for the annotated data. For disagreement cases, annotators decide the final label by discussion and a voting poll. Annotators spend four training rounds to obtain a high F1-score (above 80%) before performing data annotation independently.
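For illustration, one annotated review could be represented as a record like the following sketch; the field names and the example sentence are hypothetical and are not the exact schema of the released files:

```python
# A hypothetical representation of one annotated review.
# Offsets are character positions into the comment; labels follow the
# aspect#polarity scheme described in Section 3.1.
example_review = {
    "comment": "Pin trâu, màn hình nét, nhưng loa hơi rè.",
    "spans": [
        {"start": 0,  "end": 8,  "label": "BATTERY#POSITIVE"},   # "Pin trâu"
        {"start": 10, "end": 22, "label": "SCREEN#POSITIVE"},    # "màn hình nét"
        {"start": 30, "end": 40, "label": "FEATURES#NEGATIVE"},  # "loa hơi rè"
    ],
}
```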
Figure 3 shows the F1-score during the training phases.

Figure 3: Results for four rounds of measurement of the F1-score.

An annotation is a triple (d, l, o), where d is a document id, l is a label, and o is a list of start-end character offset tuples. An annotator i contributes a (multi)set Ai of (token) annotations. We compute (1) for each 2-combination of annotators and report the arithmetic mean of F1 across all these combinations.

Finally, our dataset is divided randomly into three sets, training (Train), development (Dev), and test (Test), in the ratio 7:1:2. Figure 1 presents an example review of our dataset and its corresponding annotations.
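The pairwise agreement computation described above can be sketched as follows; this is a minimal illustration assuming exact-match spans, not the exact script used during dataset construction:

```python
def span_f1(spans_a, spans_b):
    """F1 between two annotators' span sets for one review.

    Each span is a (start, end, label) tuple; a span counts as matched
    only if both offsets and the label are identical (exact match).
    """
    set_a, set_b = set(spans_a), set(spans_b)
    if not set_a and not set_b:
        return 1.0
    matched = len(set_a & set_b)
    precision = matched / len(set_a) if set_a else 0.0
    recall = matched / len(set_b) if set_b else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Agreement between two annotators on one review; the mean of this value
# over all annotator pairs and reviews gives the reported score.
agreement = span_f1(
    [(0, 8, "BATTERY#POSITIVE"), (10, 22, "SCREEN#POSITIVE")],
    [(0, 8, "BATTERY#POSITIVE"), (10, 21, "SCREEN#POSITIVE")],
)
print(round(agreement, 2))  # 0.5: one exact match out of two spans each
```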
3.3 Dataset Analysis

Figure 4 presents the distribution of the ten aspect categories in our dataset, UIT-ViSD4SA. People tend to give a smartphone an overall rating, with 22.76% of reviews mentioning GENERAL. Users frequently pay great attention to aspects related to their needs, such as PERFORMANCE, BATTERY, FEATURES, and CAMERA.

Figure 4: The distribution of the ten fine-grained aspect categories.

The statistics of our dataset are presented in Table 2. Our dataset includes 35,396 spans over 11,122 comments. Through our analysis, the dataset has an uneven distribution of sentiment labels. The positive polarity accounts for the largest number of labels, followed by the negative and neutral polarities. On average, the reviews have three spans, with each span being about 32 characters long. We hope our dataset will open a new shared task for evaluating span detection in aspect-based sentiment analysis.

4 Our Approach

For the baseline evaluation, we consider span detection for ABSA as a sequence labeling problem at the syllable level. We employ a BiLSTM-CRF model (Huang et al., 2015) with embedding fusion to solve the task. The BiLSTM-CRF model comprises three layers: a token embedding layer gives a contextualized vector representation of the input sequence, which is passed into the BiLSTM-CRF sequence labeler, as depicted in Figure 5.

Figure 5: BiLSTM-CRF network with embedding layers (the example feedback means "This phone has a good camera" in English).

4.1 Embedding Fusion Layer

The embedding layer takes as input a sequence of N tokens (x1, x2, ..., xN) and outputs a fixed-dimensional vector representation of each token (e1, e2, ..., eN). We use an embedding fusion of syllable embeddings (Nguyen et al., 2017), character embeddings (CharLSTM), and contextual embeddings from XLM-RoBERTa (Conneau et al., 2020).
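To make the three layers concrete before the detailed formulation in Sections 4.2 and 4.3, the following is a minimal sketch of an embedding-fusion BiLSTM-CRF tagger. It assumes PyTorch and the pytorch-crf package, treats the pretrained XLM-R vectors as precomputed inputs, and is an illustration rather than the exact implementation used in our experiments.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pytorch-crf package (assumed dependency)


class FusionBiLSTMCRF(nn.Module):
    """Syllable + character + contextual embedding fusion -> BiLSTM -> CRF."""

    def __init__(self, n_syllables, n_chars, n_tags,
                 syl_dim=100, char_dim=100, xlmr_dim=1024, hidden_dim=400):
        super().__init__()
        self.syl_emb = nn.Embedding(n_syllables, syl_dim, padding_idx=0)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # Character-level BiLSTM; its final states form the CharLSTM embedding.
        self.char_lstm = nn.LSTM(char_dim, char_dim // 2,
                                 batch_first=True, bidirectional=True)
        fused_dim = syl_dim + char_dim + xlmr_dim
        self.bilstm = nn.LSTM(fused_dim, hidden_dim // 2,
                              batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(0.33)
        self.emissions = nn.Linear(hidden_dim, n_tags)
        self.crf = CRF(n_tags, batch_first=True)

    def _fuse(self, syllables, chars, xlmr_vecs):
        # syllables: (B, T); chars: (B, T, C); xlmr_vecs: (B, T, xlmr_dim)
        B, T, C = chars.shape
        syl = self.syl_emb(syllables)
        char_states, _ = self.char_lstm(self.char_emb(chars.view(B * T, C)))
        char = char_states[:, -1, :].view(B, T, -1)  # simplified: last step per token
        return torch.cat([syl, char, xlmr_vecs], dim=-1)

    def forward(self, syllables, chars, xlmr_vecs, tags, mask):
        feats, _ = self.bilstm(self._fuse(syllables, chars, xlmr_vecs))
        scores = self.emissions(self.dropout(feats))
        return -self.crf(scores, tags, mask=mask, reduction="mean")  # CRF NLL loss

    def decode(self, syllables, chars, xlmr_vecs, mask):
        feats, _ = self.bilstm(self._fuse(syllables, chars, xlmr_vecs))
        return self.crf.decode(self.emissions(feats), mask=mask)
```

The hidden size (400) and dropout rate (0.33) follow the settings listed in Section 5.1; in practice the XLM-R vectors would be produced by a pretrained encoder rather than passed in as fixed tensors.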
4.2 Bidirectional Long Short-Term Memory (BiLSTM)

A long short-term memory network (LSTM) is a special type of recurrent neural network (RNN) introduced by Hochreiter and Schmidhuber (1997), which can capture long-distance semantic relationships by maintaining a memory cell that stores context information. LSTMs do not suffer from the vanishing and exploding gradient problems. The LSTM is equipped with a memory cell with an adaptive adjustment mechanism that controls the information to be added to or removed from the cell. The memory cell is continuously updated during encoding, and the information flow is determined by three gates: input, forget, and output. Formally, the encoding process at time step t is performed as follows:

i_t = \sigma(W_{hi} h_{t-1} + W_{ei} e^w_t + b_i)    (2)
f_t = \sigma(W_{hf} h_{t-1} + W_{ef} e^w_t + b_f)    (3)
\tilde{c}_t = \tanh(W_{hc} h_{t-1} + W_{ec} e^w_t + b_c)    (4)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    (5)
o_t = \sigma(W_{ho} h_{t-1} + W_{eo} e^w_t + b_o)    (6)
h_t = o_t \odot \tanh(c_t)    (7)

where c_t, i_t, f_t, and o_t represent the memory cell, input gate, forget gate, and output gate, respectively; e^w_t and h_t denote the word embedding vector and hidden state vector at time t. Both \sigma and \tanh are activation functions, and \odot represents the element-wise product. W_* and b_* are network parameters denoting the weight matrices and bias vectors. Although the LSTM can solve the long-distance dependency problem, it still loses some semantic information due to its sequential encoding: for example, h_t only contains the semantic information before time step t. Therefore, a bidirectional LSTM (BiLSTM) is needed to model both the forward and backward context information as in equations (8) and (9), and the two hidden states are concatenated to obtain the final output as in equation (10):

\overrightarrow{h}_t = F(e^w_t, \overrightarrow{h}_{t-1})    (8)
\overleftarrow{h}_t = F(e^w_t, \overleftarrow{h}_{t-1})    (9)
h_t = [\overrightarrow{h}_t, \overleftarrow{h}_t]    (10)

4.3 Conditional Random Fields (CRF)

A conditional random field (CRF) (Lafferty et al., 2001) is a sequence modeling framework that brings
in all the advantages of maximum entropy Markov models (MEMMs) (McCallum et al., 2000; Ratnaparkhi, 1996) while also solving the label bias problem. With a CRF, the inputs and outputs are directly connected, unlike LSTM and BiLSTM networks, where memory cells/recurrent components are employed. Given a training dataset D = {(x^1, y^1), ..., (x^N, y^N)} of N data sequences x^i and their corresponding label sequences y^i, the CRF maximizes the conditional log-likelihood of the label sequences given the data sequences, as follows:

L = \sum_{i=1}^{N} \log P(y^i \mid x^i) - \sum_{k=1}^{K} \frac{\lambda_k^2}{2\sigma^2}    (11)

5 Experiments and Results

5.1 Experimental Settings

Following the IOB format (short for inside, outside, beginning), our dataset is converted into data containing only aspect labels (SCREEN, BATTERY, CAMERA, etc.), data containing only sentiment labels (POSITIVE, NEUTRAL, and NEGATIVE), and data containing both aspect and sentiment labels (SCREEN#POSITIVE, BATTERY#NEGATIVE, etc.) to evaluate our approach comprehensively. Our word embeddings have three parts: syllable (1), character (2), and contextual from XLM-R (3), with an embedding dimension of 100. We set the hidden size of the LSTM to 400, the dropout rate to 0.33, and the batch size to 5,000, with 30 epochs for training. All experiments are conducted on a single NVIDIA T4 GPU card.

5.2 Evaluation Metrics

In this paper, we use three evaluation metrics: Precision, Recall, and F1-score. A predicted span is correct only if it exactly matches the gold-standard span. To gain a comprehensive view, we calculate these evaluation metrics with both micro and macro averaging.

5.3 Experimental Results

Table 3 presents the performance of the BiLSTM-CRF model with three types of embedding fusion on aspect, polarity, and aspect#polarity span detection. According to our results, concatenating the three embedding layers (syllable, character, and BERT-based embedding) gives significantly better performance than just one or two embedding layers. In particular, syllable + char + XLM-R_Large achieves the best F1_macro of 62.76%, 49.77%, and 45.70% for aspect, polarity, and aspect#polarity, respectively, whereas the model with just the syllable embedding layer shows the lowest performance. On the other hand, our method tends to be less effective on labels that involve polarity: the polarity task reaches 49.77% F1_macro, while the aspect#polarity task gets 45.70% F1_macro.

Detailed results per class for each task are shown in Tables 4, 5, and 6 (for the aspect#polarity labels, we only show the F1-score). For the aspect task, only two aspects have a high F1-score above 70% (CAMERA and BATTERY), while the rest range from 60-70%; the F1-scores of PRICE and STORAGE are relatively low (below 50%). For the polarity task, the results descend in the order POSITIVE, NEGATIVE, NEUTRAL. The results for aspect#polarity can be seen as a combination of the two previous tasks: the previously high-performing aspect labels combined with positive polarity give the highest results. This reflects the uneven label distribution (labels with NEUTRAL polarity cover only 6.25% of our dataset; see Figure 4). In general, our approach performs better at detecting spans for aspect than for polarity and aspect#polarity span detection. However, its ability to detect spans for all types of labels is still limited (F1-score below 80%), which will be explored in future work.

5.4 Case Study

Figure 6 shows several cases predicted by the BiLSTM-CRF model. After reviewing the cases, we found that the model commits three common types of errors: failing to detect spans, misclassifying the sentiment polarity, and detecting the wrong boundaries of spans. As observed in the first sentence, all three models fail to detect the span "there's some sound from the speaker". Among the misclassification cases, we found that many contained English loanwords. For example, in comment 2, the span "Really like the dark mode" is about the interface, and we annotate it as PERFORMANCE#POSITIVE. However, the model misinterprets it and classifies it as CAMERA (aspect label model) or FEATURES#POSITIVE (aspect#polarity label model). This phenomenon needs attention and research in future studies because Vietnamese, especially in the technology domain, often includes many loanwords whose meanings can be similar to or different from those in the original language. Besides, the polarity model incorrectly predicts the target span by detecting the whole span "the screen is clear, play game phone is warm, but noisy speaker" as a NEUTRAL span. This mistake can be blamed on training the model with only polarity labels, which makes it difficult for the model to identify the aspect to which the emotional span belongs.
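As a concrete illustration of the IOB conversion described in Section 5.1, the following sketch turns character-offset span annotations into syllable-level tags; the whitespace tokenization and input format are simplified assumptions, not the exact preprocessing script.

```python
def spans_to_iob(comment, spans):
    """Convert (start, end, label) character spans to syllable-level IOB tags.

    Syllables are approximated by whitespace tokenization; a syllable is
    tagged B-/I-<label> if it lies inside a span, otherwise O.
    """
    tokens, offsets, pos = [], [], 0
    for tok in comment.split():
        start = comment.index(tok, pos)
        tokens.append(tok)
        offsets.append((start, start + len(tok)))
        pos = start + len(tok)

    tags = ["O"] * len(tokens)
    for span_start, span_end, label in spans:
        inside = False
        for i, (tok_start, tok_end) in enumerate(offsets):
            if tok_start >= span_start and tok_end <= span_end:
                tags[i] = ("I-" if inside else "B-") + label
                inside = True
    return list(zip(tokens, tags))


print(spans_to_iob("Pin trâu , loa hơi rè",
                   [(0, 8, "BATTERY#POSITIVE"), (10, 21, "FEATURES#NEGATIVE")]))
# [('Pin', 'B-BATTERY#POSITIVE'), ('trâu', 'I-BATTERY#POSITIVE'), (',', 'O'),
#  ('loa', 'B-FEATURES#NEGATIVE'), ('hơi', 'I-FEATURES#NEGATIVE'),
#  ('rè', 'I-FEATURES#NEGATIVE')]
```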
Table 3: Performance of the BiLSTM-CRF model with three types of embedding fusion on aspect, polarity, and aspect#polarity span detection (%).

System | P_micro | R_micro | F1_micro | P_macro | R_macro | F1_macro
Aspect (syllable) | 64.55 | 60.86 | 62.65 | 62.76 | 57.28 | 59.74
Aspect (syllable + char) | 63.78 | 62.11 | 62.93 | 61.64 | 58.91 | 60.21
Aspect (syllable + char + XLM-R_Base) | 65.63 | 65.15 | 65.39 | 62.88 | 61.62 | 62.17
Aspect (syllable + char + XLM-R_Large) | 64.96 | 66.85 | 65.89 | 62.00 | 63.56 | 62.76
Polarity (syllable) | 52.36 | 50.10 | 51.20 | 46.71 | 38.37 | 41.05
Polarity (syllable + char) | 52.12 | 51.00 | 51.55 | 44.44 | 38.79 | 40.68
Polarity (syllable + char + XLM-R_Base) | 54.88 | 55.91 | 55.39 | 46.87 | 46.39 | 46.57
Polarity (syllable + char + XLM-R_Large) | 56.89 | 59.78 | 58.30 | 49.00 | 50.60 | 49.77
Aspect#polarity (syllable) | 61.87 | 54.55 | 57.98 | 48.77 | 34.27 | 37.64
Aspect#polarity (syllable + char) | 59.51 | 57.56 | 58.52 | 43.66 | 37.53 | 39.30
Aspect#polarity (syllable + char + XLM-R_Base) | 60.71 | 61.62 | 61.16 | 46.18 | 43.42 | 44.37
Aspect#polarity (syllable + char + XLM-R_Large) | 61.78 | 62.99 | 62.38 | 46.84 | 45.46 | 45.70
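As a reminder of how the micro and macro columns in Table 3 differ, micro-averaging pools span counts over all classes before computing precision, recall, and F1, while macro-averaging computes per-class F1 first and then takes an unweighted mean. A small illustrative computation with made-up counts (not taken from our experiments):

```python
# Hypothetical per-class counts: (true positives, predicted spans, gold spans).
counts = {"CAMERA": (80, 100, 110), "PRICE": (5, 20, 30)}

def f1(tp, pred, gold):
    p = tp / pred if pred else 0.0
    r = tp / gold if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Micro: pool counts across classes, then compute F1 once.
tp, pred, gold = (sum(c[i] for c in counts.values()) for i in range(3))
micro_f1 = f1(tp, pred, gold)

# Macro: compute F1 per class, then take the unweighted mean.
macro_f1 = sum(f1(*c) for c in counts.values()) / len(counts)

print(round(micro_f1, 3), round(macro_f1, 3))  # 0.654 vs 0.481
```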
Table 4: Results per class for the aspect-only labels.

Table 5: Results per class for the sentiment-polarity-only labels.

Table 6: F1-score per class for the aspect#polarity labels.
Figure 6: Case study. The spans are bold with aspects, and their polarities are given as subscripts. Incorrect predictions are marked with X.
6 Conclusion and Future Work

This paper presented UIT-ViSD4SA, a new dataset for span detection in aspect-based sentiment analysis, consisting of over 35,000 human-annotated spans on 11,122 comments in the mobile e-commerce domain. Each piece of feedback is manually annotated with spans towards ten fine-grained aspect categories and their sentiment polarities. A BiLSTM-CRF model using an embedding fusion of syllable, character, and contextual embeddings achieved the highest scores: 62.76% F1_macro for span detection on aspect, 49.77% F1_macro for span detection on polarity, and 45.70% F1_macro for span detection on aspect#polarity. In general, the performance is relatively low, and the task remains challenging for further machine learning-based models. We hope the release of UIT-ViSD4SA will motivate the development of machine learning models and applications.

In future work, we see several directions: (1) inspired by Yuan et al. (2020), multilingual pre-trained language models can be used to enhance span boundary detection; (2) the performance of this task can be improved with approaches based on machine reading comprehension, among other approaches (Hu et al., 2019; Xu et al., 2020); (3) inspired by Xu et al. (2019), review reading comprehension for Vietnamese can be developed on our dataset; and (4) span detection is a challenging task that can motivate various future works on constructive analysis (Fujita et al., 2019; Nguyen et al., 2021a), emotion analysis (Sosea and Caragea, 2020; Ho et al., 2019), complaint analysis (Preoţiuc-Pietro et al., 2019; Nguyen et al., 2021b), and opinion mining (Nguyen et al., 2018; Jiang et al., 2019).

References

[Chen and Qian2020] Zhuang Chen and Tieyun Qian. 2020. Relation-aware collaborative learning for unified aspect-based sentiment analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3685–3694. Association for Computational Linguistics.

[Chen et al.2017] Peng Chen, Zhongqian Sun, Lidong Bing, and Wei Yang. 2017. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 452–461, Copenhagen, Denmark. Association for Computational Linguistics.

[Conneau et al.2020] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale.

[Fujita et al.2019] Soichiro Fujita, Hayato Kobayashi, and Manabu Okumura. 2019. Dataset creation for ranking constructive news comments. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2619–2626.

[Ho et al.2019] Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, and Ngan Luu-Thuy Nguyen. 2019. Emotion recognition for Vietnamese social media text. In International Conference of the Pacific Association for Computational Linguistics, pages 319–333. Springer.

[Hochreiter and Schmidhuber1997] S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780.
[Hripcsak and Rothschild2005] G. Hripcsak and A. Rothschild. 2005. Technical brief: Agreement, the F-measure, and reliability in information retrieval. Journal of the American Medical Informatics Association: JAMIA, 12(3):296–298.

[Hu and Liu2004] Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 168–177. Association for Computing Machinery.

[Hu et al.2019] Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, and Yiwei Lv. 2019. Open-domain targeted sentiment analysis via span-based extraction and classification. In Proceedings of ACL, pages 537–546. Association for Computational Linguistics.

[Huang et al.2015] Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging.

[Jiang et al.2019] Qingnan Jiang, Lei Chen, Ruifeng Xu, Xiang Ao, and Min Yang. 2019. A challenge dataset and effective models for aspect-based sentiment analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 6280–6285.

[Jo and Oh2011] Yohan Jo and Alice H. Oh. 2011. Aspect and sentiment unification model for online review analysis. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 815–824. Association for Computing Machinery.

[Kiritchenko et al.2014] Svetlana Kiritchenko, Xiaodan Zhu, Colin Cherry, and Saif Mohammad. 2014. NRC-Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), pages 437–442, Dublin, Ireland. Association for Computational Linguistics.

[Lafferty et al.2001] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML '01, pages 282–289. Morgan Kaufmann Publishers Inc.

[Li et al.2019] Xin Li, Lidong Bing, Piji Li, and Wai Lam. 2019. A unified model for opinion target extraction and target sentiment prediction. In AAAI.

[Luo et al.2020] Huaishao Luo, Lei Ji, Tianrui Li, Daxin Jiang, and Nan Duan. 2020. GRACE: Gradient harmonized and cascaded labeling for aspect-based sentiment analysis. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 54–64. Association for Computational Linguistics.

[Mai and Le2018] Long Mai and Bac Le. 2018. Aspect-based sentiment analysis of Vietnamese texts with deep learning. In Asian Conference on Intelligent Information and Database Systems, pages 149–158. Springer.

[McCallum et al.2000] A. McCallum, Dayne Freitag, and Fernando C. Pereira. 2000. Maximum entropy Markov models for information extraction and segmentation. In ICML.

[Nguyen et al.2017] Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras, and Mark Johnson. 2017. From word segmentation to POS tagging for Vietnamese. In Proceedings of the Australasian Language Technology Association Workshop 2017, pages 108–113.

[Nguyen et al.2018] Huyen TM Nguyen, Hung V Nguyen, Quyen T Ngo, Luong X Vu, Vu Mai Tran, Bach X Ngo, and Cuong A Le. 2018. VLSP shared task: Sentiment analysis. Journal of Computer Science and Cybernetics, 34(4):295–310.

[Nguyen et al.2019a] Hao Nguyen, Tri Nguyen, Thin Dang, and Ngan Nguyen. 2019a. A corpus for aspect-based sentiment analysis in Vietnamese. pages 1–5.

[Nguyen et al.2019b] Huyen Nguyen, Hung Nguyen, Quyen Ngo, Luong Vu, Vu Tran, Ngo Xuan Bach, and Cuong Le. 2019b. VLSP shared task: Sentiment analysis. Journal of Computer Science and Cybernetics, 34:295–310.

[Nguyen et al.2020] Kiet Nguyen, Vu Nguyen, Anh Nguyen, and Ngan Nguyen. 2020. A Vietnamese dataset for evaluating machine reading comprehension. In Proceedings of the 28th International Conference on Computational Linguistics, pages 2595–2605.

[Nguyen et al.2021a] Luan Thanh Nguyen, Kiet Van Nguyen, and Ngan Luu-Thuy Nguyen. 2021a. Constructive and toxic speech detection for open-domain social media comments in Vietnamese. In Advances and Trends in Artificial Intelligence. Artificial Intelligence Practices, pages 572–583. Springer International Publishing.

[Nguyen et al.2021b] Nhung Thi-Hong Nguyen, Phuong Ha-Dieu Phan, Luan Thanh Nguyen, Kiet Van Nguyen, and Ngan Luu-Thuy Nguyen. 2021b. Vietnamese open-domain complaint detection in e-commerce websites. arXiv preprint arXiv:2104.11969.

[Phan et al.2021] Luong Luc Phan, Phuc Huynh Pham, Kim Thi-Thanh Nguyen, Tham Nguyen, Sieu Khai Huynh, Luan Thanh Nguyen, Tin Van Huynh, and Kiet Van Nguyen. 2021. SA2SL: From aspect-based sentiment analysis to social listening system for business intelligence. In KSEM.
[Pontiki et al.2014] Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 Task 4: Aspect based sentiment analysis. In Proceedings of ACL, pages 27–35. Association for Computational Linguistics.

[Pontiki et al.2015] Maria Pontiki, Dimitrios Galanis, Harris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. 2015. SemEval-2015 Task 12: Aspect based sentiment analysis. In Proceedings of SemEval, pages 486–495.

[Pontiki et al.2016] Maria Pontiki, Dimitrios Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad Al-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, et al. 2016. SemEval-2016 Task 5: Aspect based sentiment analysis. In Proceedings of SemEval, pages 19–30.

[Preoţiuc-Pietro et al.2019] Daniel Preoţiuc-Pietro, Mihaela Gaman, and Nikolaos Aletras. 2019. Automatically identifying complaints in social media. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5008–5019. Association for Computational Linguistics.

[Rajpurkar et al.2016] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text.

[Ratnaparkhi1996] Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Conference on Empirical Methods in Natural Language Processing.

[Seo et al.2018] Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2018. Bidirectional attention flow for machine comprehension.

[Sosea and Caragea2020] Tiberiu Sosea and Cornelia Caragea. 2020. CancerEmo: A dataset for fine-grained emotion detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8892–8904.

[Van Thin et al.2021] Dang Van Thin, Ngan Luu-Thuy Nguyen, Tri Minh Truong, Lac Si Le, and Duy Tin Vo. 2021. Two new large corpora for Vietnamese aspect-based sentiment analysis at sentence level. ACM Trans. Asian Low-Resour. Lang. Inf. Process., 20(4).

[Xu et al.2018] Hu Xu, Bing Liu, Lei Shu, and Philip S. Yu. 2018. Double embeddings and CNN-based sequence labeling for aspect extraction. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 592–598. Association for Computational Linguistics.

[Xu et al.2019] Hu Xu, Bing Liu, Lei Shu, and S Yu Philip. 2019. BERT post-training for review reading comprehension and aspect-based sentiment analysis. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2324–2335.

[Xu et al.2020] Lu Xu, Lidong Bing, Wei Lu, and Fei Huang. 2020. Aspect sentiment classification with aspect-specific opinion spans. In Proceedings of EMNLP, pages 3561–3567. Association for Computational Linguistics.

[Yuan et al.2020] Fei Yuan, Linjun Shou, Xuanyu Bai, Ming Gong, Yaobo Liang, Nan Duan, Yan Fu, and Daxin Jiang. 2020. Enhancing answer boundary detection for multilingual machine reading comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 925–934.