Towards Generative Aspect-Based Sentiment Analysis
Wenxuan Zhang1 , Xin Li2 , Yang Deng1 , Lidong Bing2 and Wai Lam1
1 The Chinese University of Hong Kong
2 DAMO Academy, Alibaba Group
{wxzhang,ydeng,wlam}@se.cuhk.edu.hk
{xinting.lx,l.bing}@alibaba-inc.com
eral language understanding problems such as named entity recognition, question answering, and text classification as generation tasks (Raffel et al., 2020; Athiwaratkun et al., 2020), we propose to tackle various ABSA problems with a unified generative approach in this paper. It can fully utilize the rich label semantics by encoding the natural language labels into the target output. Moreover, this unified generative model can be seamlessly adapted to multiple tasks without introducing additional task-specific model designs.

In order to enable Generative Aspect-based Sentiment analysis (GAS), we tailor-make two paradigms, namely annotation-style and extraction-style modeling, to transform the original task into a generation problem. Given a sentence, the former adds annotations to it to include the label information when constructing the target sentence, while the latter directly adopts the desired natural language labels of the input sentence as the target. The original sentence and the target sentence produced by either paradigm can then be paired as a training instance of the generation model. Furthermore, we propose a prediction normalization strategy to handle the issue that a generated sentiment element falls outside its corresponding label vocabulary set. We investigate four ABSA tasks, including Aspect Opinion Pair Extraction (AOPE), Unified ABSA (UABSA), Aspect Sentiment Triplet Extraction (ASTE), and Target Aspect Sentiment Detection (TASD), with the proposed unified GAS framework to verify its effectiveness and generality.

Our main contributions are: 1) we tackle various ABSA tasks in a novel generative manner; 2) we propose two paradigms to formulate each task as a generation problem, and a prediction normalization strategy to refine the generated outputs; 3) we conduct experiments on multiple benchmark datasets across four ABSA tasks, and our approach surpasses the previous state of the art in almost all cases. Specifically, we obtain 7.6 and 3.7 averaged F1 gains on the challenging ASTE and TASD tasks respectively.

2 Generative ABSA (GAS)

2.1 ABSA with Generative Paradigm

In this section, we describe the investigated ABSA tasks and the proposed two paradigms, namely, annotation-style and extraction-style modeling.

Aspect Opinion Pair Extraction (AOPE) aims to extract aspect terms and their corresponding opinion terms as pairs (Zhao et al., 2020; Chen et al., 2020). Here is an illustrative example of our generative formulations for the AOPE task:

Input: Salads were fantastic here, our server was also very helpful.
Target (Annotation-style): [Salads | fantastic] were fantastic here, our [server | helpful] was also very helpful.
Target (Extraction-style): (Salads, fantastic); (server, helpful)

In the annotation-style paradigm, to indicate the pair relations between the aspect and opinion terms, we append the associated opinion modifier to each aspect term in the form of [aspect | opinion] when constructing the target sentence, as shown in the above example. The prediction of a coupled aspect and opinion term is thus achieved by including them in the same bracket. For the extraction-style paradigm, we treat the desired pairs themselves as the target, which resembles direct extraction of the expected sentiment elements but in a generative manner.
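To make the construction of training targets concrete, below is a minimal sketch (with our own function names, not the authors' released code) of how the two styles of target could be built from gold (aspect, opinion) pairs; a real implementation would locate aspect terms by character offsets rather than naive first-occurrence string replacement:

```python
from typing import List, Tuple

def extraction_style_target(pairs: List[Tuple[str, str]]) -> str:
    """Concatenate the desired (aspect, opinion) pairs into the target string."""
    return "; ".join(f"({a}, {o})" for a, o in pairs)

def annotation_style_target(sentence: str, pairs: List[Tuple[str, str]]) -> str:
    """Rewrite the input so each aspect term becomes "[aspect | opinion]",
    keeping the rest of the sentence intact (simplified: replaces the first
    occurrence of each aspect term)."""
    target = sentence
    for aspect, opinion in pairs:
        target = target.replace(aspect, f"[{aspect} | {opinion}]", 1)
    return target

sentence = "Salads were fantastic here, our server was also very helpful."
pairs = [("Salads", "fantastic"), ("server", "helpful")]
print(annotation_style_target(sentence, pairs))
# [Salads | fantastic] were fantastic here, our [server | helpful] was also very helpful.
print(extraction_style_target(pairs))
# (Salads, fantastic); (server, helpful)
```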
Unified ABSA (UABSA) is the task of extracting aspect terms and predicting their sentiment polarities at the same time (Li et al., 2019a; Chen and Qian, 2020). We also formulate it as an (aspect, sentiment polarity) pair extraction problem. For the same example given above, we aim to extract two pairs: (Salads, positive) and (server, positive). Similarly, we replace each aspect term with [aspect | sentiment polarity] under the annotation-style formulation, and treat the desired pairs as the target output in the extraction-style paradigm, to reformulate the UABSA task as a text generation problem.

Aspect Sentiment Triplet Extraction (ASTE) aims to discover more complicated (aspect, opinion, sentiment polarity) triplets (Peng et al., 2020):

Input: The Unibody construction is solid, sleek and beautiful.
Target (Annotation-style): The [Unibody construction | positive | solid, sleek, beautiful] is solid, sleek and beautiful.
Target (Extraction-style): (Unibody construction, solid, positive); (Unibody construction, sleek, positive); (Unibody construction, beautiful, positive)

As shown above, we annotate each aspect term with its corresponding sentiment triplet wrapped in the bracket, i.e., [aspect | sentiment polarity | opinion], for the annotation-style modeling.
Note that we include all the opinion modifiers of the same aspect term within the same bracket to predict the sentiment polarities more accurately. For the extraction-style paradigm, we simply concatenate all triplets as the target output.

Target Aspect Sentiment Detection (TASD) is the task of detecting all (aspect term, aspect category, sentiment polarity) triplets for a given sentence (Wan et al., 2020), where the aspect category belongs to a pre-defined category set. For example:

Input: A big disappointment, all around. The pizza was cold and the cheese wasn’t even fully melted.
Target (Annotation-style): A big disappointment, all around. The [pizza | food quality | negative] was cold and the [cheese | food quality | negative] wasn’t even fully melted [null | restaurant general | negative].
Target (Extraction-style): (pizza, food quality, negative); (cheese, food quality, negative); (null, restaurant general, negative)
Similarly, we pack each aspect term, the aspect category it belongs to, and its sentiment polarity into a bracket to build the target sentence for the annotation-style method. Note that we use a bigram expression for the aspect category instead of the original uppercase form “FOOD#QUALITY” to make the annotated target sentence more natural. As presented in the example, some triplets may not have an explicitly mentioned aspect term; we thus use “null” to represent it and put such triplets at the end of the target output. For the extraction-style paradigm, we concatenate all the desired triplets, including those with implicit aspect terms, as the target sentence for sequence-to-sequence learning.

2.2 Generation Model

Given the input sentence x, we generate a target sequence y′, based on either the annotation-style or the extraction-style paradigm described in the last section, with a text generation model f(·). Then the desired sentiment pairs or triplets s can be decoded from the generated sequence y′. Specifically, for the annotation-style modeling, we extract the contents included in the brackets “[]” from y′, and separate the different sentiment elements with the vertical bar “|”. If such decoding fails, e.g., we cannot find any bracket in the output sentence or the number of vertical bars is not as expected, we ignore such predictions. For the extraction-style paradigm, we separate the generated pairs or triplets from the sequence y′ and ignore invalid generations in a similar way.
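As a concrete illustration, the decoding of annotation-style outputs could be sketched as below; this is a minimal example of the described bracket-and-bar parsing, with our own function name and regular expression rather than the authors' released code:

```python
import re

def decode_annotation_style(generated: str, num_elements: int):
    """Extract the "[...]" spans from a generated sentence, split each on "|",
    and keep only tuples with the expected number of elements (e.g., 2 for
    AOPE pairs, 3 for ASTE/TASD triplets); malformed spans are ignored."""
    tuples = []
    for span in re.findall(r"\[(.*?)\]", generated):
        elements = [e.strip() for e in span.split("|")]
        if len(elements) == num_elements:
            tuples.append(tuple(elements))
    return tuples

y = "[Salads | fantastic] were fantastic here, our [server | helpful] was also very helpful."
print(decode_annotation_style(y, num_elements=2))
# [('Salads', 'fantastic'), ('server', 'helpful')]
```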
We adopt the pre-trained T5 model (Raffel et al., 2020) as the generation model f(·), which closely follows the encoder-decoder architecture of the original Transformer (Vaswani et al., 2017). Therefore, by formulating these ABSA tasks as a text generation problem, we can tackle them in a unified sequence-to-sequence framework without task-specific model design.
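The paper does not include training code; as a hedged sketch, fine-tuning a T5 checkpoint on such (input, target) pairs with the HuggingFace transformers library (our tooling choice, and "t5-base" is our assumption about the checkpoint) could look like:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# One (input sentence, extraction-style target) training pair.
inputs = tokenizer("Salads were fantastic here, our server was also very helpful.",
                   return_tensors="pt")
labels = tokenizer("(Salads, fantastic); (server, helpful)",
                   return_tensors="pt").input_ids

# Standard seq2seq cross-entropy loss; an optimizer step would follow.
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss
loss.backward()

# At inference time, generate and decode the target sequence.
pred_ids = model.generate(inputs.input_ids, max_length=128)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```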
2.3 Prediction Normalization

Ideally, a generated element e ∈ s after decoding is supposed to exactly belong to the vocabulary set it is meant to come from. For example, a predicted aspect term should explicitly appear in the input sentence. However, this might not always hold, since each element is generated from the vocabulary set containing all tokens instead of its specific vocabulary set. Thus, the predictions of a generation model may exhibit a morphology shift from the ground truths, e.g., from singular to plural nouns.

Model                          L14     R14     R15     R16
HAST+TOWE†                    53.41   62.39   58.12   63.84
JERE-MHS†                     52.34   66.02   59.64   67.65
SpanMlt (Zhao et al., 2020)   68.66   75.60   64.68   71.78
SDRN (Chen et al., 2020)      66.18   73.30   65.75   73.67
GAS-Annotation-R              68.74   72.66   65.03   73.75
GAS-Extraction-R              67.58   73.22   65.83   74.12
GAS-Annotation                69.55   75.15   67.93   75.42
GAS-Extraction                68.08   74.12   67.19   74.54

Table 1: Main results of the AOPE task. The best results are in bold, second best results are underlined. Results are the average F1 scores over 5 runs. † denotes results from Zhao et al. (2020).

Model                          L14     R14     R15     R16
BERT+GRU (Li et al., 2019b)   61.12   73.17   59.60   70.21
SPAN-BERT (Hu et al., 2019)   61.25   73.68   62.29     -
IMN-BERT (He et al., 2019)    61.73   70.72   60.22     -
RACL (Chen and Qian, 2020)    63.40   75.42   66.05     -
Dual-MRC (Mao et al., 2021)   65.94   75.95   65.08     -
GAS-Annotation-R              67.37   75.77   65.75   71.87
GAS-Extraction-R              66.71   76.30   64.00   72.39
GAS-Annotation                68.64   76.58   66.78   73.21
GAS-Extraction                68.06   77.13   65.96   73.64

Table 2: Main results of the UABSA task. The best results are in bold, second best results are underlined. Results are the average F1 scores over 5 runs.
Model                            L14     R14     R15     R16
CMLA+ (Wang et al., 2017)       33.16   42.79   37.01   41.72
Li-unified-R (Li et al., 2019a) 42.34   51.00   47.82   44.31
Pipeline (Peng et al., 2020)    42.87   51.46   52.32   54.21
Jet (Xu et al., 2020)           43.34   58.14   52.50   63.21
Jet+BERT (Xu et al., 2020)      51.04   62.40   57.53   63.83
GAS-Annotation-R                52.80   67.35   56.95   67.43
GAS-Extraction-R                58.19   70.52   60.23   69.05
GAS-Annotation                  54.31   69.30   61.02   68.65
GAS-Extraction                  60.78   72.16   62.10   70.10

Table 3: Main results of the ASTE task. The best results are in bold, second best results are underlined. Results are the average F1 scores over 5 runs.

Model                                 Rest15   Rest16
Baseline (Brun and Nikoulina, 2018)      -      38.10
TAS-LPM-CRF (Wan et al., 2020)        54.76    64.66
TAS-SW-CRF (Wan et al., 2020)         57.51    65.89
TAS-SW-TO (Wan et al., 2020)          58.09    65.44
GAS-Annotation-R                      59.27    66.54
GAS-Extraction-R                      60.63    68.31
GAS-Annotation                        60.06    67.70
GAS-Extraction                        61.47    69.42

Table 4: Main results of the TASD task. The best results are in bold, second best results are underlined. Results are the average F1 scores over 5 runs.
     Before          After           Label
#1   Bbq rib         BBQ rib         BBQ rib
#2   repeat          repeats         repeats
#3   chicken peas    chick peas      chick peas
#4   bodys           bodies          None
#5   cafe            coffee          coffee
#6   vegetarian      vegan           vegetarian
#7   salmon          not             spinach
#8   flight cookie   might cookie    fortune cookie

Table 5: Example cases of the predictions before and after the prediction normalization.

3.3 Discussions

Annotation-style & Extraction-style As shown in the result tables, the annotation-style method generally performs better than the extraction-style method on the AOPE and UABSA tasks. However, the former becomes inferior to the latter on the more complex ASTE and TASD tasks. One possible reason is that, on the ASTE and TASD tasks, the annotation-style method introduces too much content, such as the aspect category and sentiment polarity, into the target sentence, which increases the difficulty of sequence-to-sequence learning.
Why Prediction Normalization Works To better understand the effectiveness of the proposed prediction normalization strategy, we randomly sample some instances from the ASTE task whose raw and normalized predictions differ (i.e., that were corrected by our strategy). The predicted sentiment elements before and after the normalization, as well as the gold labels, are shown for some example cases in Table 5. We find that the normalization mainly helps on two occasions. The first is the morphology shift, where two words have minor lexical differences; for example, the method fixes “Bbq rib” to “BBQ rib” (#1) and “repeat” to “repeats” (#2). The other is orthographic alternatives, where the model might generate words with the same etyma but different word types, e.g., it outputs “vegetarian” rather than “vegan” (#6). Our proposed prediction normalization, which finds the replacement from the corresponding vocabulary set via Levenshtein distance (Levenshtein, 1966), is a simple yet effective strategy to alleviate this issue.
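A minimal sketch of this normalization step is given below, assuming the candidate vocabulary set is available (e.g., the word spans of the input sentence for aspect and opinion terms); the function names and the tie-breaking behavior of min are our own simplifications:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def normalize(prediction: str, vocabulary: list) -> str:
    """Replace an out-of-vocabulary prediction with its closest in-vocabulary
    candidate; in-vocabulary predictions are returned unchanged."""
    if prediction in vocabulary:
        return prediction
    return min(vocabulary, key=lambda cand: levenshtein(prediction, cand))

print(normalize("chicken peas", ["chick peas", "salad", "service"]))  # chick peas
```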
We also observe that our prediction normalization strategy may fail if the raw predictions are quite lexically, or even semantically, different from the gold-standard labels (see Cases #4, #7 and #8). In these cases, the difficulty does not come from the way of performing prediction normalization, but from generating labels close to the ground truths in the first place, especially for examples containing implicit aspects or opinions (Case #4).

4 Conclusions and Future Work

We tackle various ABSA tasks in a novel generative framework in this paper. By formulating the target sentences with our proposed annotation-style and extraction-style paradigms, we solve multiple sentiment pair or triplet extraction tasks with a unified generation model. Extensive experiments on multiple benchmarks across four ABSA tasks show the effectiveness of our proposed method.

Our work is an initial attempt at transforming ABSA tasks, which are typically treated as classification problems, into text generation problems. Experimental results indicate that such a transformation is an effective solution for tackling various ABSA tasks. Following this direction, designing more effective generation paradigms and extending such ideas to other tasks can be interesting research problems for future work.

References

Ben Athiwaratkun, Cícero Nogueira dos Santos, Jason Krone, and Bing Xiang. 2020. Augmented natural language for generative sequence labeling. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, pages 375–385.

Caroline Brun and Vassilina Nikoulina. 2018. Aspect based sentiment analysis into the wild. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2018, pages 116–122.

Peng Chen, Zhongqian Sun, Lidong Bing, and Wei Yang. 2017. Recurrent attention network on memory for aspect sentiment analysis. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, pages 452–461.

Shaowei Chen, Jie Liu, Yu Wang, Wenzheng Zhang, and Ziming Chi. 2020. Synchronous double-channel recurrent network for aspect-opinion pair extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, pages 6515–6524.

Zhuang Chen and Tieyun Qian. 2020. Relation-aware collaborative learning for unified aspect-based sentiment analysis. In Proceedings of the 58th Annual
Meeting of the Association for Computational Linguistics, ACL 2020, pages 3685–3694.

Zhifang Fan, Zhen Wu, Xin-Yu Dai, Shujian Huang, and Jiajun Chen. 2019. Target-oriented opinion words extraction with target-fused neural sequence labeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, pages 2509–2518.

Ruidan He, Wee Sun Lee, Hwee Tou Ng, and Daniel Dahlmeier. 2019. An interactive multi-task learning network for end-to-end aspect-based sentiment analysis. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, pages 504–515.

Minghao Hu, Yuxing Peng, Zhen Huang, Dongsheng Li, and Yiwei Lv. 2019. Open-domain targeted sentiment analysis via span-based extraction and classification. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, pages 537–546.

Binxuan Huang and Kathleen M. Carley. 2018. Parameterized convolutional neural networks for aspect level sentiment classification. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1091–1096.

Qingnan Jiang, Lei Chen, Ruifeng Xu, Xiang Ao, and Min Yang. 2019. A challenge dataset and effective models for aspect-based sentiment analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, pages 6279–6284.

Vladimir I. Levenshtein. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710.

Xin Li, Lidong Bing, Piji Li, and Wai Lam. 2019a. A unified model for opinion target extraction and target sentiment prediction. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, pages 6714–6721.

Xin Li, Lidong Bing, Piji Li, Wai Lam, and Zhimou Yang. 2018. Aspect term extraction with history attention and selective transformation. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, pages 4194–4200.

Xin Li, Lidong Bing, Wenxuan Zhang, and Wai Lam. 2019b. Exploiting BERT for end-to-end aspect-based sentiment analysis. In Proceedings of the 5th Workshop on Noisy User-generated Text, W-NUT@EMNLP 2019, pages 34–41.

Bing Liu. 2012. Sentiment Analysis and Opinion Mining. Synthesis Lectures on Human Language Technologies.

Pengfei Liu, Shafiq R. Joty, and Helen M. Meng. 2015. Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, pages 1433–1443.

Huaishao Luo, Tianrui Li, Bing Liu, and Junbo Zhang. 2019. DOER: dual cross-shared RNN for aspect term-polarity co-extraction. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, pages 591–601.

Dehong Ma, Sujian Li, Fangzhao Wu, Xing Xie, and Houfeng Wang. 2019. Exploring sequence-to-sequence learning in aspect term extraction. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, pages 3538–3547.

Yue Mao, Yi Shen, Chao Yu, and Longjun Cai. 2021. A joint training dual-MRC framework for aspect based sentiment analysis. CoRR, abs/2101.00816.

Haiyun Peng, Lu Xu, Lidong Bing, Fei Huang, Wei Lu, and Luo Si. 2020. Knowing what, how and why: A near complete solution for aspect-based sentiment analysis. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, pages 8600–8607.

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Ion Androutsopoulos, Suresh Manandhar, Mohammad Al-Smadi, Mahmoud Al-Ayyoub, Yanyan Zhao, Bing Qin, Orphée De Clercq, Véronique Hoste, Marianna Apidianaki, Xavier Tannier, Natalia V. Loukachevitch, Evgeniy V. Kotelnikov, Núria Bel, Salud María Jiménez Zafra, and Gülsen Eryigit. 2016. SemEval-2016 task 5: Aspect based sentiment analysis. In Proceedings of the 10th International Workshop on Semantic Evaluation, SemEval@NAACL-HLT 2016, pages 19–30.

Maria Pontiki, Dimitris Galanis, Haris Papageorgiou, Suresh Manandhar, and Ion Androutsopoulos. 2015. SemEval-2015 task 12: Aspect based sentiment analysis. In SemEval@NAACL-HLT 2015, pages 486–495.

Maria Pontiki, Dimitris Galanis, John Pavlopoulos, Harris Papageorgiou, Ion Androutsopoulos, and Suresh Manandhar. 2014. SemEval-2014 task 4: Aspect based sentiment analysis. In SemEval@COLING 2014, pages 27–35.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21:140:1–140:67.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pages 5998–6008.
Hai Wan, Yufei Yang, Jianfeng Du, Yanan Liu, Kunxun
Qi, and Jeff Z. Pan. 2020. Target-aspect-sentiment
joint detection for aspect-based sentiment analysis.
In The Thirty-Fourth AAAI Conference on Artificial
Intelligence, AAAI 2020, pages 9122–9129.
Wenya Wang, Sinno Jialin Pan, Daniel Dahlmeier, and
Xiaokui Xiao. 2017. Coupled multi-layer attentions
for co-extraction of aspect and opinion terms. In
Proceedings of the Thirty-First AAAI Conference on
Artificial Intelligence, pages 3316–3322.
Yequan Wang, Minlie Huang, Xiaoyan Zhu, and
Li Zhao. 2016. Attention-based LSTM for aspect-
level sentiment classification. In Proceedings of
the 2016 Conference on Empirical Methods in Natu-
ral Language Processing, EMNLP 2016, pages 606–
615.
Lu Xu, Hao Li, Wei Lu, and Lidong Bing. 2020.
Position-aware tagging for aspect sentiment triplet
extraction. In Proceedings of the 2020 Conference
on Empirical Methods in Natural Language Process-
ing, EMNLP 2020, pages 2339–2349.
Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming
Zhang, and Ming Zhou. 2016. Unsupervised word
and dependency path embeddings for aspect term ex-
traction. In Proceedings of the Twenty-Fifth Inter-
national Joint Conference on Artificial Intelligence,
IJCAI 2016, pages 2979–2985.