
Extract, Select and Rewrite: A Modular Sentence Summarization Method

Shuo Guan
UBS AG
New York, NY 10010
[email protected]

Vishakh Padmakumar
New York University
New York, NY 10012
[email protected]

Abstract

A modular approach has the advantage of being compositional and controllable, compared to most end-to-end models. In this paper we propose Extract-Select-Rewrite (ESR), a three-phase abstractive sentence summarization method. We decompose summarization into three stages: (i) knowledge extraction, where we extract relation triples from the text using off-the-shelf tools; (ii) content selection, where a subset of triples is selected; and (iii) rewriting, where the selected triples are realized into natural language. Our results demonstrate that ESR is competitive with the best end-to-end models while being more faithful. Being modular, ESR's modules can be trained on separate data, which is beneficial in low-resource settings and enhances style controllability in text generation.[1]

[1] The code is available at https://github.com/SeanG-325/ESR.

1 Introduction

While end-to-end models dominate text generation tasks today, modular or pipelined approaches have the advantage of greater controllability and interpretability (Kedzie and McKeown, 2020). Prior work on abstractive summarization adopts a two-step process of first generating a plan (e.g., a semantic representation) of the target summary and then generating the summary conditioned on both the plan and the input document (Narayan et al., 2021, 2022). In this paper, we present a three-phase extract-select-rewrite pipeline, or ESR, for abstractive sentence summarization, where the plan is restricted to a subset of knowledge triples extracted from the document. Specifically, we decompose the task into three subtasks: knowledge extraction, content selection and rewriting. To implement the three modules, we extract knowledge triples from the source document using off-the-shelf tools. Then, we train a classifier to select important triples representing the content of the summary. Finally, we train a rewriter to convert the selected triples into natural language text (Figure 1).

There is extensive prior work that uses structured content extracted from the document to help summarization, such as relation triples (Cao et al., 2018), knowledge graphs (Zhu et al., 2021; Guan et al., 2021), and topics (Li et al., 2018, 2020; Aralikatte et al., 2021). However, these methods typically augment the source document with the extracted information and still learn to generate reference summaries from it in an end-to-end manner. By fully separating the modules during training, we can take a rewriter trained on a large dataset and reuse it on a small target dataset while training only the content selector, on as few as 1k examples.

We run experiments on the Gigaword, DUC-2004 and Reddit-TIFU datasets and find that our approach produces summaries that are competitive with end-to-end models in terms of automatic metrics. We also observe that a rewriter module trained on Gigaword, in the news domain, can be paired with a content selector trained on 1000 examples from Reddit-TIFU, a social media dataset, to produce high-quality summaries, demonstrating the value of modularity in abstractive summarization. Further, since our content planning is extractive in nature, the generated summaries are also more faithful to the source, as evidenced by a human evaluation comparing summaries from our modular approach and an end-to-end BART baseline. Lastly, we also observe that the rewriter module can be trained once on standalone text, which can enhance the controllability of the summary generation style with minor changes to the training process.
[Figure 1 (content recovered from the original graphic): the source sentence "An UN soldier in Bosnia was shot and killed by a stray bullet on Tuesday in an incident authorities are calling an accident, military officials in Stockholm said Tuesday." passes through Knowledge Extraction, which produces the triples (an UN soldier, be killed by, a stray bullet), (military officials, is in, Stockholm), (UN soldier, is in, Bosnia), and (authorities, are calling, an accident); Content Selection keeps (an UN soldier, be killed by, a stray bullet) and (UN soldier, is in, Bosnia); Rewriting produces the summary "An UN soldier in Bosnia killed by stray bullet."]

Figure 1: An overview of the three-phase summarization framework ESR.

2 Related Work

Knowledge-based Summarization  Existing methods that use knowledge in summarization encode it together with the input. For example, Ribeiro et al. (2020) and Guan et al. (2021) introduce knowledge graph encoding strategies for graph-to-text generation models. Koncel-Kedziorski et al. (2019) and Wu et al. (2021) use a graph transformer encoder to consume knowledge and semantic graphs. Huang et al. (2020) propose a model integrated with a GAT (Veličković et al., 2018) encoding the knowledge graphs of the documents.

Modular Summarization  Castro Ferreira et al. (2019) and Khot et al. (2021) showed the advantages of modularity in text generation and question answering compared to end-to-end models. Pilault et al. (2020) and Chen and Bansal (2018) first extract sentences from the document and then perform abstractive summarization on them. Krishna et al. (2021) proposed a medical text generation method using modular summarization techniques based on clustering of utterances in sentences. However, the "modularity" in these methods relies on the neural networks taking in additional knowledge such as knowledge graphs, as opposed to explicitly splitting the model into different modules, which is where ESR differs.
3 Method

We divide the summarization task explicitly into three phases: Knowledge Extraction, Content Selection, and Rewriting, as shown in Figure 1 and sketched below.
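As a minimal sketch of this decomposition (our illustration, not code from the released repository), the three independently trained modules compose into a single inference-time pipeline; the function signatures here are assumptions for exposition:

    from typing import Callable, List, Tuple

    Triple = Tuple[str, str, str]  # <entity 1, relation, entity 2>

    def esr_summarize(
        document: str,
        extract: Callable[[str], List[Triple]],   # phase 1: off-the-shelf OpenIE tools
        select: Callable[[str, Triple], bool],    # phase 2: trained sentence-pair classifier
        rewrite: Callable[[List[Triple]], str],   # phase 3: trained seq2seq rewriter
    ) -> str:
        """Compose the three separately trained ESR modules at inference time."""
        triples = extract(document)                              # knowledge extraction
        selected = [t for t in triples if select(document, t)]   # content selection
        return rewrite(selected)                                 # rewriting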
Knowledge Extraction  To enable fine-grained content selection and rewriting, we turn all documents into a structured content representation. We adopt knowledge triples that can be extracted by off-the-shelf tools (Section 4.1). The knowledge triples have the form <entity 1, relation, entity 2>. The extractors usually generate a large number of redundant triples, i.e., triples with large overlap with each other.[2] To remove the overlapping triples, we compute the similarity of any pair of triples (x_i, x_j) as a weighted sum of their unigram and bigram Jaccard similarities, J_Uni and J_Bi:

    Sim(x_i, x_j) = λ1 · J_Uni(x_i, x_j) + λ2 · J_Bi(x_i, x_j)

Here λ1, λ2 are hyperparameters determined on the validation data. We filter triples such that no pair of kept triples has a similarity score higher than a threshold: if the similarity between two triples exceeds the threshold, only the longer triple is kept. The details of the threshold are in Section 4.1.

[2] For example, given the sentence "German chemical giant Hoechst Group announced plans wednesday to invest over a million dollars in China next year", our extractors might generate the two candidates <German chemical giant Hoechst Group, announced, plans> and <chemical giant Hoechst group, announced, plans>, which are clearly redundant.
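To make the filtering step concrete, here is a minimal sketch of the redundancy filter. The λ values and threshold follow Appendix B.1; the whitespace tokenization, lowercasing, and greedy longest-first ordering are our assumptions:

    from typing import List, Set, Tuple

    Triple = Tuple[str, str, str]

    def ngrams(tokens: List[str], n: int) -> Set[tuple]:
        """Set of n-grams of a token sequence."""
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def jaccard(a: set, b: set) -> float:
        """Jaccard similarity of two sets (0.0 if both are empty)."""
        return len(a & b) / len(a | b) if (a or b) else 0.0

    def similarity(x: Triple, y: Triple, lam1: float = 0.75, lam2: float = 0.25) -> float:
        """Sim(x, y) = lam1 * J_Uni(x, y) + lam2 * J_Bi(x, y)."""
        tx, ty = " ".join(x).lower().split(), " ".join(y).lower().split()
        return (lam1 * jaccard(ngrams(tx, 1), ngrams(ty, 1))
                + lam2 * jaccard(ngrams(tx, 2), ngrams(ty, 2)))

    def filter_redundant(triples: List[Triple], threshold: float = 0.7) -> List[Triple]:
        """Keep triples so that no kept pair exceeds the similarity threshold.
        Visiting longer triples first implements 'keep the longer of a redundant pair'."""
        kept: List[Triple] = []
        for t in sorted(triples, key=lambda tr: -len(" ".join(tr).split())):
            if all(similarity(t, k) < threshold for k in kept):
                kept.append(t)
        return kept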
Content Selection  The content selector selects, out of the candidates, the triples that are to be included in the summary. We train it as a sentence-pair classifier with two inputs, the document and a candidate knowledge triple extracted from it, and an output of whether to select the triple. If the triple is to be included in the summary of the document, the document-triple pair is labeled positive, otherwise negative. To train the content selector, we need supervised labels for the triples in the training set. For each triple in the training set, we use ROUGE (Lin, 2004) to measure its similarity to the corresponding summary; if it is higher than a threshold, we label that triple as a positive example. Some representative examples of these sentence pairs and the details of selecting the threshold can be found in Section 4.1.
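A sketch of this label construction, assuming the rouge-score package; the paper does not specify the ROUGE variant or the cutoff value, so ROUGE-1 recall and the 0.5 threshold below are placeholders:

    from typing import List, Tuple
    from rouge_score import rouge_scorer  # pip install rouge-score

    Triple = Tuple[str, str, str]

    _scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

    def label_triples(document: str, reference: str, triples: List[Triple],
                      threshold: float = 0.5) -> List[Tuple[str, str, int]]:
        """Build (document, triple text, label) examples for the sentence-pair
        content selector; a triple is positive if it is similar to the reference."""
        examples = []
        for triple in triples:
            text = " ".join(triple)
            rouge = _scorer.score(reference, text)["rouge1"].recall  # assumed variant
            examples.append((document, text, 1 if rouge > threshold else 0))
        return examples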
Rewriting  The rewriter converts the selected triples into fluent summaries, where the triples serve as a content plan. We train a sequence-to-sequence text generation model, similar to converting a meaning representation to natural language text (Kedzie and McKeown, 2020). The training data for this phase contains texts and the triples extracted from them: to train the generation model, we concatenate the extracted triples from the document as the source sequence and use the text as the target sequence. Note that training the rewriter only requires a piece of text and the knowledge triples extracted from it. Therefore it can potentially be trained on much larger data (such as Wikipedia text).
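Constructing rewriter training pairs might then look like the sketch below; the exact linearization (the bracketed triple rendering and the separator between triples) is not given in the paper, so both are illustrative assumptions:

    from typing import List, Tuple

    Triple = Tuple[str, str, str]

    def linearize(triples: List[Triple]) -> str:
        """Concatenate triples into one source sequence for the seq2seq rewriter."""
        return " | ".join("<{}, {}, {}>".format(*t) for t in triples)  # assumed format

    def make_rewriter_pair(text: str, triples: List[Triple]) -> Tuple[str, str]:
        """One training example: linearized triples -> the original text.
        Only raw text plus its extracted triples are needed, so any corpus
        (e.g., Wikipedia) can provide training data."""
        return linearize(triples), text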
4 Experiments

4.1 Experiment Settings

Datasets  Our main results are based on two news summarization datasets: (i) the Gigaword corpus (Rush et al., 2015), with around 3.8M summaries of single-sentence news documents; (ii) DUC-2004, another test set in the news domain (Over et al., 2007).[3] To evaluate the modularity of our method, we reuse the rewriter trained on Gigaword and pair it with a content selector trained on a dataset from a different domain, Reddit TIFU (Kim et al., 2019); Gigaword contains news text while Reddit TIFU contains text from social media.

[3] We use DUC 2004 Task 1, which requires generating a sentence summary of a short article.

Training Details  We used OLLIE (Mausam et al., 2012) and two OpenIE tools (Angeli et al., 2015; Saha and Mausam, 2018) as the triple extractors. The triples from each of these are combined and then filtered for redundancy (Section 3). To ensure the quality of the triples to the greatest extent, methods such as co-reference resolution are required. We fine-tuned RoBERTa-large (Liu et al., 2019) as the content selector and fine-tuned BART-large (Lewis et al., 2020) from fairseq (Ott et al., 2019) as the rewriter. All models are trained and fine-tuned on 2 NVIDIA RTX 2080 Ti GPUs. The detailed hyperparameters for the three modules are in Appendix B.
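For concreteness, a sketch of the selector's scoring interface, written against the HuggingFace transformers API for brevity (we fine-tune RoBERTa-large with fairseq, so this is an illustrative stand-in rather than our training code; the checkpoint name is a placeholder):

    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("roberta-large")
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-large", num_labels=2)  # placeholder: load a fine-tuned selector here

    def select_score(document: str, triple_text: str) -> float:
        """Probability that the (document, triple) sentence pair is positive."""
        inputs = tokenizer(document, triple_text, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1)[0, 1].item()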
4.2 Results

Intrinsic Evaluation of Each Module  We first evaluate each of the three modules separately. Table 1 shows the detailed statistics of knowledge extraction on Gigaword. The number of sentence-triple pairs is 400k; these are used to train the content selector. The accuracy of our fine-tuned RoBERTa content selector on this dataset is 88.9%; the detailed metrics are in Appendix Table 6. The size of the rewriting data set is 2M. We ablate the effect of the rewriting phase by comparing ROUGE scores before and after rewriting the triples in Appendix Table 7.

                  Ext.   Valid   Redun.   Pos/Neg
Train Articles    6.34   2.53    60.1%    0.91
Train Summaries   4.51   1.76    62.0%    -
Test Articles     6.19   2.42    60.9%    -

Table 1: Triple statistics in the train and test sets. "Ext." (Extracted) and "Valid" are the mean numbers of extracted and valid triples (redundancy removed). "Redun." is the redundancy rate. "Pos/Neg" is the ratio of positive to negative samples in the data set constructed for the content selection phase.

Automatic Evaluation  Next, we evaluate the whole system on the news summarization datasets. We report ROUGE scores (Lin, 2004) on the Gigaword test set and the DUC-2004 dataset, containing 1951 and 500 samples respectively. We compare our ESR to a BART baseline that is fine-tuned in a single supervised step to generate the summary from the source documents, and to some other strong models on these datasets.[4] The performance is shown in Table 2 and Table 3. On Gigaword and DUC-2004, our approach outperforms the BART baseline and is within half a point of the SOTA results.[5]

Model                R-1     R-2     R-L
BART (2020)          37.28   18.58   34.53
BART-RXF (2021)      40.45   20.69   36.56
PEGASUS+Dot (2021)   40.60   21.00   37.00
OFA (2022)           39.81   20.66   37.11
ESR                  40.63   20.62   37.14

Table 2: ROUGE F1 on the Gigaword test set. ESR achieves or is competitive with the state of the art on this dataset. Bold indicates the best score.

Model            R-1     R-2     R-L
RT+Conv (2018)   31.15   10.85   27.68
BART (2020)      31.36   11.40   28.02
ALONE (2020)     32.57   11.63   28.24
WDROP (2021)     33.06   11.45   28.51
ESR              33.08   11.52   28.74

Table 3: ROUGE F1 on the DUC-2004 dataset. ESR achieves the SOTA on this dataset. Bold indicates the best score.

[4] These are typically modified variants of end-to-end models. We report the results from the PapersWithCode leaderboard and cite the corresponding works in the results tables.

[5] State of the art as of the date of submission, per the PapersWithCode leaderboard.

Modularity  One advantage of ESR is that training the rewriter does not require document-summary pairs: we can train it on any generic text. To test the modularity of ESR, we report ROUGE on Reddit TIFU while reusing a rewriter trained on Gigaword in Table 4. The best ROUGE is obtained when using the Reddit TIFU content selector coupled with the Gigaword rewriter, highlighting the benefit of training the modules separately. One advantage of such decoupling is that we can train the rewriter on high-resource domains and reuse it in low-resource tasks. We further subsampled 1k examples from Reddit TIFU and Gigaword for training the modules to see how performance varies in the small-data regime. We see that training a content selector on only 1k examples and reusing the rewriter from Gigaword is on par with using the entire Reddit TIFU. Further, the modularity enables ESR to control the text style, as in Figure 2.

Model                R-1     R-2     R-L
BART (2020)          24.19   8.12    21.31
PEGASUS+Sum (2022)   29.83   9.50    23.47
BART-R3F (2021)      30.31   10.98   24.74
ESR
  SR + RG            30.63   10.82   24.78
  SR + RR            29.92   10.51   24.26
  SR1k + RG1k        29.67   10.09   24.00
  SR1k + RR1k        29.38   10.02   23.90
  SR1k + RG          29.09   10.07   23.86

Table 4: ROUGE F1 on R-TIFU (Reddit-TIFU). SR means the content selector was trained on R-TIFU; RG and RR mean the rewriter was trained on Gigaword and R-TIFU respectively. 1k means the module was trained on a randomly sampled 1k subset. The content selector can be trained on low-resource data without a large drop. Bold marks the best overall and italics the best within ESR (formatting not reproduced here).

Human Evaluation  We conducted a user study on Amazon MTurk where annotators rated summaries of 100 randomly sampled texts from the Gigaword test set on faithfulness. We asked the annotators to rate summaries from our approach and from BART, together with the gold summaries of the data set. Each crowdworker was shown the source document and three summaries and asked to decide whether each summary is individually supported by the text in the source. We collect three annotations for each example and decide the judgement via a majority vote; an example is labeled inconclusive if there is no agreement. The results are in Table 5. We see that ESR is rated more faithful than the baseline and almost as good as the human-written summaries. A representative case is shown in Figure 2. It shows that ESR can eliminate hallucination and control the summarization style with different rewriter modules.

Summaries       Sup.   Unsup.   Incoh.   Inconc.
Human-Written   96     3        0        1
BART            90     6        2        2
ESR             94     3        2        1

Table 5: Human evaluation of faithfulness. The summaries from the dataset (Human-Written) and those from ESR and BART are annotated by 3 annotators. Crowd workers find ESR to be more faithful than BART.

[Figure 2: A case from the Gigaword test set. ST: source text; Ref: reference summary; Selected Triples: triples selected by the content selector. With the rewriter module trained on different datasets, the text style of ESR can be controlled. In the original figure, green marks factually correct content and red marks errors.
ST: Zairean president Mobutu Sese Seko will stay at his French Riviera residence until at least the middle of the week because of an increase in diplomatic activity, a Mobutu aide said on Sunday.
Selected Triples: (Zairean president Mobutu Sese Seko, will stay at, his French Riviera residence); (Zairean president Mobutu Sese Seko, will stay until, the middle of the week)
Ref: Zairean president Mobutu to stay in France till mid-week
BART: Tanzania's Mobutu to stay at Riviera residence until middle of week
ESR (Gigaword content selector) + Gigaword rewriter: Zairean president Mobutu will stay at his French Riviera residence until the middle of week
ESR (Gigaword content selector) + Reddit-TIFU rewriter: Zairean Mobutu will stay at his French Riviera president residence… it's said that he will stay until the middle of week]

4.3 Analysis

The evaluations show that ESR can achieve or approach SOTA performance on multiple datasets and can enhance the faithfulness of summaries. ESR limits the content of the generated summary in the content selection stage and then rewrites only the selected content, so text generation introduces less hallucination. In addition, ESR has better modularity than other models, as the selector and rewriter can be trained separately on different data to enhance performance and controllability in summarization. This means that we can modify the modules to enhance performance rather than redesign the entire framework.

5 Conclusion

We propose ESR, a three-phase modular abstractive summarization method. It obtains competitive performance on automatic metrics while producing more faithful summaries; its modularity gives it good controllability over summary generation and maintains good performance on low-resource data. In future work, we plan to adapt the ESR method to multi-document summarization datasets.
Acknowledgement

We would like to thank Professor He He for her input and guidance at various stages of the project. This work is supported by the Samsung Advanced Institute of Technology (Next Generation Deep Learning: From Pattern Recognition to AI) and the National Science Foundation under Grant No. 1922658. The computation resources for this work are supported by the NYU Courant Institute of Mathematical Sciences.

Limitation

One limitation of our method is the reliance on off-the-shelf tools for the extraction phase. These tools are sometimes unable to obtain triples from the source sentences, which results in empty summaries; at other times they return multiple redundant candidates, which makes selection challenging. We attempt to address the former by aggregating results from multiple extractors and the latter by filtering candidates through overlap-based heuristics.

Ethical Consideration

One ethical consideration for the modular summarization method is that we are essentially using two different deep learning steps, content selection followed by text generation. There is a chance for model bias to have an impact at either stage. Additionally, we note that one of the features of modular summarization is that different applications can select different content as relevant to a summary. Improper content selection here could exacerbate issues such as misinformation when used in real-world applications. We note, however, that this is not isolated to our modular summarization approach, but is also the case even when the model is learned end-to-end.

References

Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, and Sonal Gupta. 2021. Better fine-tuning by reducing representational collapse. In International Conference on Learning Representations.

Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 344–354, Beijing, China. Association for Computational Linguistics.

Rahul Aralikatte, Shashi Narayan, Joshua Maynez, Sascha Rothe, and Ryan McDonald. 2021. Focus attention: Promoting faithfulness and diversity in summarization. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6078–6095, Online. Association for Computational Linguistics.

Ziqiang Cao, Furu Wei, Wenjie Li, and Sujian Li. 2018. Faithful to the original: Fact-aware neural abstractive summarization. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, AAAI'18/IAAI'18/EAAI'18. AAAI Press.

Thiago Castro Ferreira, Chris van der Lee, Emiel van Miltenburg, and Emiel Krahmer. 2019. Neural data-to-text generation: A comparison between pipeline and end-to-end architectures. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 552–562, Hong Kong, China. Association for Computational Linguistics.

Yen-Chun Chen and Mohit Bansal. 2018. Fast abstractive summarization with reinforce-selected sentence rewriting. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 675–686, Melbourne, Australia. Association for Computational Linguistics.

Shuo Guan, Ping Zhu, and Zhihua Wei. 2021. Knowledge and keywords augmented abstractive sentence summarization. In EMNLP 2021 Workshop on New Frontiers in Summarization, pages 25–32, Online and in Dominican Republic. Association for Computational Linguistics.

Luyang Huang, Lingfei Wu, and Lu Wang. 2020. Knowledge graph-augmented abstractive summarization with semantic-driven cloze reward. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5094–5107, Online. Association for Computational Linguistics.

Akhil Kedia, Sai Chetan Chinthakindi, and Wonho Ryu. 2021. Beyond reptile: Meta-learned dot-product maximization between gradients for improved single-task regularization. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 407–420, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Chris Kedzie and Kathleen McKeown. 2020. Controllable meaning representation to text generation: Linearization and data augmentation strategies. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5160–5185, Online. Association for Computational Linguistics.

Tushar Khot, Daniel Khashabi, Kyle Richardson, Peter Clark, and Ashish Sabharwal. 2021. Text modular networks: Learning to decompose tasks in the language of existing models. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1264–1279, Online. Association for Computational Linguistics.

Byeongchang Kim, Hyunwoo Kim, and Gunhee Kim. 2019. Abstractive summarization of Reddit posts with multi-level memory networks. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2519–2531, Minneapolis, Minnesota. Association for Computational Linguistics.

Rik Koncel-Kedziorski, Dhanush Bekal, Yi Luan, Mirella Lapata, and Hannaneh Hajishirzi. 2019. Text generation from knowledge graphs with graph transformers. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2284–2293, Minneapolis, Minnesota. Association for Computational Linguistics.

Kundan Krishna, Sopan Khosla, Jeffrey Bigham, and Zachary C. Lipton. 2021. Generating SOAP notes from doctor-patient conversations using modular summarization techniques. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4958–4972, Online. Association for Computational Linguistics.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.

Chenliang Li, Weiran Xu, Si Li, and Sheng Gao. 2018. Guiding generation for abstractive text summarization based on key information guide network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 55–60, New Orleans, Louisiana. Association for Computational Linguistics.

Haoran Li, Junnan Zhu, Jiajun Zhang, Chengqing Zong, and Xiaodong He. 2020. Keywords-guided abstractive sentence summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 8196–8203.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv e-prints, arXiv:1907.11692.

Mausam, Michael Schmitz, Stephen Soderland, Robert Bart, and Oren Etzioni. 2012. Open language learning for information extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 523–534, Jeju Island, Korea. Association for Computational Linguistics.

Shashi Narayan, Gonçalo Simões, Yao Zhao, Joshua Maynez, Dipanjan Das, Michael Collins, and Mirella Lapata. 2022. A well-composed text is half done! Composition sampling for diverse conditional generation. arXiv preprint arXiv:2203.15108.

Shashi Narayan, Yao Zhao, Joshua Maynez, Gonçalo Simões, Vitaly Nikolaev, and Ryan McDonald. 2021. Planning with learned entity prompts for abstractive summarization. Transactions of the Association for Computational Linguistics, 9:1475–1492.

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48–53, Minneapolis, Minnesota. Association for Computational Linguistics.

Paul Over, Hoa Dang, and Donna Harman. 2007. DUC in context. Information Processing & Management, 43(6):1506–1520.

Jonathan Pilault, Raymond Li, Sandeep Subramanian, and Chris Pal. 2020. On extractive and abstractive neural document summarization with transformer language models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9308–9319, Online. Association for Computational Linguistics.
Mathieu Ravaut, Shafiq Joty, and Nancy Chen. 2022. SummaReranker: A multi-task mixture-of-experts re-ranking framework for abstractive summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4504–4524, Dublin, Ireland. Association for Computational Linguistics.

Leonardo F. R. Ribeiro, Yue Zhang, Claire Gardent, and Iryna Gurevych. 2020. Modeling global and local node contexts for text generation from knowledge graphs. Transactions of the Association for Computational Linguistics, 8:589–604.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 379–389, Lisbon, Portugal. Association for Computational Linguistics.

Swarnadeep Saha and Mausam. 2018. Open information extraction from conjunctive sentences. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2288–2299, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Sho Takase and Shun Kiyono. 2021. Rethinking perturbations in encoder-decoders for fast training. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5767–5780, Online. Association for Computational Linguistics.

Sho Takase and Sosuke Kobayashi. 2020. All word embeddings from one embedding. In Advances in Neural Information Processing Systems, volume 33, pages 3775–3785. Curran Associates, Inc.

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph attention networks. In International Conference on Learning Representations.

Li Wang, Junlin Yao, Yunzhe Tao, Li Zhong, Wei Liu, and Qiang Du. 2018. A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI'18, pages 4453–4460. AAAI Press.

Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. 2022. OFA: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework. CoRR, abs/2202.03052.

Wenhao Wu, Wei Li, Xinyan Xiao, Jiachen Liu, Ziqiang Cao, Sujian Li, Hua Wu, and Haifeng Wang. 2021. BASS: Boosting abstractive summarization with unified semantic graph. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6052–6067, Online. Association for Computational Linguistics.

Chenguang Zhu, William Hinthorn, Ruochen Xu, Qingkai Zeng, Michael Zeng, Xuedong Huang, and Meng Jiang. 2021. Enhancing factual consistency of abstractive summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 718–733, Online. Association for Computational Linguistics.

Appendices

A Details of the Generated Summaries

The length statistics of the summaries generated by our model on the Gigaword test set are shown in Table 8.

As mentioned in the paper, the summary generation of our model is based on triples extracted from the original text. Therefore, the quality of the triples extracted during inference affects the quality of the generated summaries to a certain extent. For example, the length of the final generated summaries depends on the text length of the triples. To ensure the quality of the triples to the greatest extent, methods such as co-reference resolution are required.

Statistics   Articles   Ref.   Our Model
Avg Len      30.9       9.1    12.3

Table 8: Average length of source articles, reference summaries (Ref.), and our model's summaries on the Gigaword test set.

                       R-1     R-2     R-L
Concatenated Triples   38.98   18.12   35.76
Rewritten Summaries    40.63   20.62   36.71

Table 7: ROUGE comparing Concatenated Triples (not rewritten) and Rewritten Summaries (rewritten).

B Hyperparameters

The hyperparameters for fine-tuning RoBERTa-large in the content selection phase and BART-large in the rewriting phase are listed below.

B.1 Knowledge Extraction

The hyperparameters in the Jaccard similarity are λ1 = 0.75 and λ2 = 0.25. The threshold for similarity is 0.7.

B.2 Content Selection

TOTAL_NUM_UPDATES=3000
WARMUP_UPDATES=500
LR=1e-05
NUM_CLASSES=2
MAX_SENTENCES=8

Acc.    Rec.    Prec.   F1
88.9%   88.6%   88.1%   88.4%

Table 6: Sentence-pair classification performance of the content selector.

B.3 Rewriting

TOTAL_NUM_UPDATES = 10000
WARMUP_UPDATES = 500
MAX_TOKENS = 256
UPDATE_FREQ = 2
LR = 3e-5
