
The Impact of Rule-Based Text Generation on the Quality of Abstractive Summaries

Tatiana Vodolazova, Elena Lloret
Dept. of Software and Computing Systems
University of Alicante
Apdo. de Correos 99
E-03080, Alicante, Spain
{tvodolazova,elloret}@dlsi.ua.es

Proceedings of Recent Advances in Natural Language Processing, pages 1275–1284, Varna, Bulgaria, Sep 2–4, 2019. https://doi.org/10.26615/978-954-452-056-4_146

Abstract

In this paper we describe how an abstractive text summarization method improved the informativeness of automatic summaries by integrating syntactic text simplification, subject-verb-object concept frequency scoring and a set of rules that transform text into its semantic representation. We analyzed the impact of each component of our approach on the quality of the generated summaries and tested it on the DUC 2002 dataset. Our experiments showed that our approach outperformed other state-of-the-art abstractive methods while maintaining acceptable linguistic quality and redundancy rate.

1 Introduction

The rapid growth of digital information increases the need for automatic text summarization methods that can digest large amounts of textual data, such as scientific articles, blogs and news articles, and extract concise and relevant information from them. Text summarization methods can be classified into abstractive and extractive ones (Nenkova and McKeown, 2012). Extractive methods compose summaries from the most salient sentences of the original document. In contrast, abstractive methods generate novel or partially novel text using techniques such as sentence compression, fusion, calculation of path scores in graphs, or natural language generation tools such as SimpleNLG (Gupta and Gupta, 2018). They involve an intermediate step of deep linguistic analysis and an abstract semantic representation of the data. Extractive techniques have been intensively researched for over half a century and, according to some studies, "have more or less achieved their peak performance" (Mehta, 2016).

Over the past few years interest in the field of text summarization has shifted towards abstractive methods and quickly produced a large variety of approaches. Gupta and Gupta (2018) classify them broadly into methods based on structure, semantics, and deep learning with neural networks.

The main advantage of semantic-based approaches over deep learning ones lies in their independence from a large training corpus. Most of the available datasets for deep learning belong to the news domain, which further restricts the application of these methods to other domains. However, semantic-based approaches rely on a parser to transform text into its semantic representation and, therefore, poor parser performance will reduce the quality of the generated summaries. Another limitation of deep learning methods comes from the fact that they rely on the statistical co-occurrence of words and are prone to semantic and grammatical errors. This is something that a reliable parser could help to avoid.

Structure-based methods, such as template- and ontology-based ones, reveal other weaknesses. Template-based methods lack diversity, while ontology-based ones rely on the time-consuming task of creating an ontology by a human expert. However, they provide highly coherent summaries and can handle uncertainties, respectively. Semantic-based approaches that rely on handcrafted rules to transform text into a semantic representation may be criticized for the same reason: the human effort and time required to solve the laborious task of creating transformation rules.

It becomes clear that each abstractive approach can reliably handle only some aspects of the summarization process while revealing weaknesses in the remaining ones. Thus far, none of the approaches has been capable of offering a broad-based solution. Research in this field is making headway, each time with more elaborate algorithms and combining techniques from a number of different methods. However, Chen et al. (2016) have shown via their analysis of the reading comprehension task – another natural language processing task that requires interpretation of the text – that a straightforward approach designed around a small set of carefully selected features can obtain high, state-of-the-art accuracy.
Therefore, this study has a threefold objective. First, to design a broad-based abstractive text summarization method. Second, to evaluate whether the proposed method is capable of delivering concise and informative summaries while maintaining above-average linguistic quality and redundancy rate. Third, to compare it against other state-of-the-art abstractive methods.

The approach that we propose in this work falls into the previously mentioned semantic-based group of abstractive summarization approaches and has been inspired by the ideas of Genest and Lapalme (2011) and Lloret et al. (2015). Our contribution takes their abstractive models one step further by scoring the abstract information representation without taking into account its surface representation. The proposed method incorporates syntactic text simplification, subject-verb-object concept frequency scoring, and a set of rules that transform text into its semantic representation.

This paper is structured as follows: Section 2 discusses related semantic-based abstractive summarization approaches. Section 3 describes in detail the architecture of our method. Evaluation methods and results are presented in Section 4. Section 5 describes the effect of individual components of our approach upon the quality of generated summaries. Section 6 provides a summary of the conclusions and areas for future work.

2 Related Work

All the methods in the semantic-based abstractive summarization group include the initial step of converting texts into an abstract semantic representation. For example, Genest and Lapalme (2011) introduced the concept of the information item, defined as the smallest element of coherent information and represented as a dated and located subject-verb-object triplet. Lloret et al. (2015) also base their concept representation on subject-verb-object triplets. Alshaina et al. (2017) use predicate-argument structure as their underlying information representation and extract a number of features from it that are later used for ranking. Li (2015) defines the concept of the Basic Semantic Unit (BSU), where each BSU is an actor-action-receiver triplet with its obligatory arguments, namely the actor and receiver of the action. The BSUs are used to construct a BSU semantic link network representation for each text.

Abstract Meaning Representation (AMR) graphs are the most recent approach to the abstract semantic representation of texts (Vilca and Cabezudo, 2017). AMR nodes are represented by either words or PropBank (https://propbank.github.io/) frames, and edges define the relationships between them. Both the AMR graph representation and the subject-verb-object (SVO) representation depend on the efficiency of the parser. However, AMR graphs also rely on the PropBank framework, whose limitations pose additional constraints on AMR graphs. Furthermore, the problem of text generation from AMR graphs is still a challenge and has not yet been solved (Li, 2015).

The summarization method based on BSUs proposed by Li (2015) overcomes the limitation of text generation faced by AMR graphs, and produces informative, coherent and compact summaries. However, as the author states, the BSU network cannot yet handle data that express opinions rather than facts and actions, since these cases involve verbs that lack meaningful actions, such as 'be', and the underlying actor-action-receiver representation cannot be appropriately computed.

Alshaina et al. (2017) use K-means and agglomerative hierarchical clustering algorithms to group similar predicate-argument structures (PAS) based on semantic similarity measures, and to eventually select the most representative PAS based on a weighted set of 12 features. The PAS proposed by this approach are classified into simple and complex ones. Complex PAS are derived from sentences with multiple verbs; otherwise they are considered to be simple. Nested PAS are eliminated. One of the features that determines whether to include a PAS in the summary is the "number of verbs and nouns", which gives preference to complex PAS as crucial to summary generation.

Lloret et al. (2015) propose an abstractive semantic-based approach to ultra-concise opinion summarization. It involves syntactic sentence simplification in the preprocessing step and a semantic representation based on subject-verb-object triplets. Their scoring heuristics rely on subject-verb-object term frequencies.
The approaches closest to ours are those of Lloret et al. (2015) and Genest and Lapalme (2011, 2010). However, the difference between them and our work is twofold. First, the aforementioned systems use term or document frequencies for scoring. We integrate word sense disambiguation to identify similarities between subject-verb-object triplets on the conceptual level, which allows us to introduce concept frequencies for scoring. Second, the architecture of our approach is characterized by a higher level of abstraction. Namely, our approach scores abstract concepts represented in the form of enriched subject-verb-object triplets and not their surface representation. The surface representation is integrated only in the final step, when all the triplets have already been assigned their scores.

Unlike the approach of Alshaina et al. (2017), who give preference to sentences with more than one verb, our approach integrates syntactic sentence simplification in the preprocessing step in order to split complex sentences into simpler ones and, ideally, reduce the syntactic structure to a single main verb. This allows us to generate various subject-verb-object triplets from a single sentence and to manipulate them in a more precise manner.

3 Abstractive Summarization Framework

The architecture of our proposed abstractive text summarization approach is illustrated in Figure 1. This section describes the role and the implementation of each of its components.

[Figure 1: Our Abstractive Summarization Framework. The diagram shows the pipeline stages Simplify, Analyze, Generate InIts, Calculate SVO frequencies, Score InIts, Generate surface representation, Select InIts and Select surface representation, connecting the intermediate artifacts Source Text, Simplified Text, Information Items (InIts) and Generated Text to the final Summary; a dashed arrow marks the alternative configuration that scores the simplified text directly.]

Simplification. We begin by applying syntactic simplification to the original document as a pre-processing step. Simplification targets only complex sentences, splitting their syntactic trees into simpler ones. Each newly created sentence is a fully grammatical construction that, not always but in most cases, contains one main verb and covers one single concept (Table 9 provides an example of a simplified sentence). In the next stages our method generates an information item from each simplified sentence. Simplifying the syntactic structure of the input text allows us to have fewer, less recursive and less error-prone rules for information item extraction. And capturing as many concepts as possible benefits the process of information item selection: only the most salient bits of information are selected while the irrelevant ones are discarded. We use the Factual Statement Extractor to carry out the simplification task (Heilman and Smith, 2010).

Analysis. In this stage, we perform a linguistic analysis, decomposing each supplied simplified sentence into lemmas, stems, parts of speech, senses, named entities, syntactic roles and noun phrases. This is done mainly with the help of Stanford CoreNLP (Manning et al., 2014). Additionally, we use the Porter stemmer for stemming (Porter, 1997), Freeling for word sense disambiguation (Padró and Stanilovsky, 2012) and a Java DOM parser for noun phrase chunking.

Information Items Generation. Once the data have been analyzed we proceed to build an abstract representation of each of the sentences. We adopt the same naming convention as Genest and Lapalme (2011) and refer to them as information items (InIts). At the core of each InIt lies the main verb of the sentence, accompanied by its subject and object, if they are present. Contrary to Genest and Lapalme (2011), we do not incorporate any manual rules to reject candidate InIts. However, a small portion of them will be lost during the surface realization stage if SimpleNLG fails to generate a sentence from an InIt. This happens to at most 1-2 simplified sentences per document. Preserving all InIts may introduce a higher rate of grammatically incorrect sentences due to incorrect sentence parses (common mistakes provoked by this decision can be found in Section 4.2). However, since no clear pattern between syntactic linguistic phenomena and incorrect parses was observed, we could not discard such cases. Additionally, we extend the core subject-verb-object structure to include open clausal complements and prepositional phrases. Since the Stanford CoreNLP configuration that we used implements Universal Dependencies (https://universaldependencies.org/) for dependency parsing, our rules for transforming text into InIts are also designed around this annotation scheme. We implemented 5 transformation rules (a toy illustration follows the list):

1. The ccomp rule retains a clausal complement of a verb or adjective, rejecting the initial part. He says that [you like to swim].
2. The subject and verb rule identifies them in the remaining sentence. It also handles copula and passive voice.
3. The direct and indirect object rule sets the corresponding objects if they exist.
4. The xcomp rule handles open clausal complements of verbs and adjectives. She looks [very beautiful]. I consider him [a fool]. He tried [to run].
5. The pp rule identifies remaining prepositional phrases. They talked [about London].
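The sketch below is a toy illustration of how such dependency-based rules can be applied to a parsed simplified sentence. It is not the authors' implementation: the InIt container, the extract_init helper and the (relation, head, dependent) tuple format are our own assumptions, while the relation names follow Universal Dependencies v2.

```python
from dataclasses import dataclass, field

@dataclass
class InIt:
    """Toy information item: a verb core with optional arguments."""
    verb: str = ""
    subject: str = ""
    dobj: str = ""
    iobj: str = ""
    xcomp: str = ""
    pps: list = field(default_factory=list)

def extract_init(deps, words):
    """Apply simplified versions of the five rules to one parsed sentence.

    deps:  list of (relation, head_index, dependent_index) edges in UD style
    words: mapping from token index to word form
    """
    root = next(dep for rel, head, dep in deps if rel == "root")
    # Rule 1 (ccomp): keep the clausal complement, rejecting the initial part.
    ccomps = [dep for rel, head, dep in deps if rel == "ccomp" and head == root]
    core = ccomps[0] if ccomps else root

    item = InIt(verb=words[core])
    for rel, head, dep in deps:
        if head != core:
            continue
        if rel in ("nsubj", "nsubj:pass"):  # Rule 2: subject (also covers passives)
            item.subject = words[dep]
        elif rel == "obj":                   # Rule 3: direct object
            item.dobj = words[dep]
        elif rel == "iobj":                  # Rule 3: indirect object
            item.iobj = words[dep]
        elif rel == "xcomp":                 # Rule 4: open clausal complement
            item.xcomp = words[dep]
        elif rel == "obl":                   # Rule 5: prepositional phrase (oblique)
            item.pps.append(words[dep])
    return item

# "He says that you like to swim" -> the ccomp rule keeps "you like ... swim".
words = {1: "He", 2: "says", 4: "you", 5: "like", 7: "swim"}
deps = [("root", 0, 2), ("nsubj", 2, 1), ("ccomp", 2, 5),
        ("nsubj", 5, 4), ("xcomp", 5, 7)]
print(extract_init(deps, words))  # InIt(verb='like', subject='you', ..., xcomp='swim')
```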
All the InIts are stored internally as an ordered list.

Calculation of frequencies. In this stage we analyze the InIts and calculate the concept frequencies of all the verb, subject and object phrase heads of the input. For the purpose of evaluating the effectiveness of concept frequencies, we also incorporated term frequency scoring for comparison. Our scoring strategy is based on the idea that there is "a very strong correlation between concepts of topic and subject in English" (Foley, 1994). It has also been shown in previous research on text summarization that subjects, verbs and objects play a crucial role in content selection and cannot be dropped (Harabagiu and Lacatusu, 2010). Along with the SVO frequencies we also calculate term frequencies of named entities that represent subject or object phrase heads.

Information Items Scoring. Unlike the approaches of Genest and Lapalme (2011) and Lloret et al. (2015), in our approach InIt scoring and surface realization are independent from each other. We apply the SVO and named entity head frequencies extracted in the previous step to score InIts directly. This gives us the flexibility to choose which parts of the InIts to use for scoring. Our scoring is based on the idea that InIts that cover the main topic of the document contain the most frequent SVO concepts and named entities in any of their components. Given the flexibility to work with InIts directly and not the raw text, we experimented with scoring on the SVO components alone and also combined them with open clausal complements and prepositional phrases. While scoring, we calculate matches not only between candidate noun phrase heads, but between other phrase constituents as well.

For testing purposes we also integrate a modification of this step that, instead of scoring InIts directly, applies the SVO and named entity frequencies to the simplified text. This configuration is indicated with the dashed arrow in Figure 1. It allows us to compare how much information is lost during the transformation and generation stages.
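As an illustration of the frequency-based scoring just described, the following sketch (our own simplification, reusing the toy InIt structure from the earlier sketch) counts subject/verb/object head concepts and scores each InIt by the frequencies of the concepts it contains. The sense_of mapping stands in for the word sense disambiguation output, and named entity frequencies are omitted for brevity; this is not the authors' implementation.

```python
from collections import Counter

def concept_frequencies(inits, sense_of):
    """Count how often each subject/verb/object head concept occurs in the document.

    sense_of maps a head word to a disambiguated sense identifier; falling back
    to the surface word approximates plain term-frequency scoring.
    """
    freq = Counter()
    for it in inits:
        for head in (it.subject, it.verb, it.dobj):
            if head:
                freq[sense_of.get(head, head)] += 1
    return freq

def score_init(it, freq, sense_of, use_extras=True):
    """Score an InIt by summing concept frequencies over its components.

    use_extras toggles the extended SVO+xComp+PPs configuration evaluated in
    Section 5.1.
    """
    parts = [it.subject, it.verb, it.dobj]
    if use_extras:
        parts += [it.iobj, it.xcomp] + list(it.pps)
    return sum(freq[sense_of.get(p, p)] for p in parts if p)
```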
Text Generation. We generate sentences from InIts with the help of the SimpleNLG realization engine (Gatt and Reiter, 2009). The order of the text generation rules is defined mainly by the functionalities of SimpleNLG and follows these steps:

• generate a noun phrase (NP) to represent the subject, if present;
• generate the main verb;
• generate an NP to represent the direct object, if present;
• generate an NP for the indirect object, if present;
• generate prepositional phrases;
• generate open clausal complements, if present (xcomp transformation rule); and,
• assemble all the components and generate the verb phrase (VP).

We do not make any other modifications apart from the syntactic simplification of long sentences in the preprocessing step of our approach. This means, for example, that we do not convert passive constructions into active ones. However, since we always use the same order for the text generation rules, the original order of constituents may be changed, i.e. prepositional phrases will always be generated after the subject, verb or objects, despite the fact that in the original sentence they may be in a different position. Generated sentences play no role in InIt scoring or InIt selection. They remain on hold until the selection of InIts and surface representation stage.

Information Items Selection. At this stage, we inspect all the InIts and reject the ones with an empty text representation generated by SimpleNLG.

Selection of Surface Representation. For all the remaining ranked InIts, starting from the highest ranked one, we add each InIt's surface representation to the final summary until the maximum allowed size has been reached. Once we reach it, we reorder the sentences to preserve the original order of the simplified sentences that each InIt originated from, and deliver the summary. For the surface representation our approach allows the selection of either a representation generated with SimpleNLG or the simplified sentence. In this final stage we do not integrate additional date or location information as Genest and Lapalme (2011) do, but if an InIt contained them among its prepositional phrases, they are included in the generated sentence by SimpleNLG.
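A compact sketch of the selection step described above (again our own simplification, not the authors' code): InIts are taken in descending score order, their surface sentences are added until the roughly 100-word budget is exhausted, and the chosen sentences are then restored to the order of the simplified sentences they came from.

```python
def build_summary(scored_inits, max_words=100):
    """scored_inits: list of (score, position_in_document, surface_sentence).

    The surface sentence may be either the realiser output or the original
    simplified sentence, as discussed above.
    """
    chosen, used = [], 0
    for score, pos, sentence in sorted(scored_inits, key=lambda t: -t[0]):
        if not sentence:                 # reject InIts with empty realizations
            continue
        n_words = len(sentence.split())
        if used + n_words > max_words:   # summary budget reached
            break
        chosen.append((pos, sentence))
        used += n_words
    # Reorder by original position before delivering the summary.
    return " ".join(sentence for pos, sentence in sorted(chosen))
```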
4 Evaluation

Our approach is evaluated on the DUC 2002 dataset for the single document summarization task (http://duc.nist.gov/). After discarding duplicates, the dataset consists of 530 newswire articles. Each article is accompanied by one or more manually created abstractive model summaries of approximately 100 words.

At this development phase, our approach generates summaries operating exclusively with the words present in the original text. However, as a result of the syntactic simplification, they are likely to be reorganized into shorter sentences. Moreover, some of the words are ordered differently or not included in the generated sentences as a consequence of the implemented transformation and surface realization rules. These operations create summaries that go beyond the literal extraction of original text fragments.

We evaluate the content selection part of our approach with the ROUGE toolkit and use human evaluation to assess the linguistic quality of the generated summaries, as described in Sections 4.1 and 4.2 respectively.

4.1 Informativeness

Following the example of recent works on abstractive text summarization, we used the ROUGE toolkit (Lin, 2004) to evaluate the generated summaries (Vilca and Cabezudo, 2017; Hsu et al., 2018). ROUGE-1 and ROUGE-2 are used to assess informativeness, and together with ROUGE-SU4 they have been found to correlate well with human judgement. The longest common subsequence variant, ROUGE-L, is used to assess fluency. We compared our summaries to the human summaries provided for the DUC 2002 corpus; each text can be evaluated against at least 2 of them.

We also calculated average pairwise ROUGE values for the human summaries to identify the highest score that an abstractive summary can obtain with ROUGE (see Table 1).

The selected baseline was implemented with the help of our method, such that each original text passes through all the stages specified in Section 3, including the sentence simplification and surface realization stages, but avoiding the SVO and named entity scoring. To produce the baseline summary we applied tf-scoring to the regenerated sentences. This ensures that the baseline is an abstractive summary differing only in the scoring method.

We compared our approach to two state-of-the-art approaches for abstractive text summarization of a different nature: 1) Vilca and Cabezudo's (2017) approach based on AMR graphs and Rhetorical Structure Theory; and, 2) the approach proposed by Hsu et al. (2018), which is based on deep learning and combines abstractive and extractive components. To compare our approach with the latter one, we used their abstractive model pre-trained on the CNN/Daily Mail dataset of newswire articles.

Table 1 shows that our approach outperforms both the abstractive baseline and the approach of Hsu et al. (2018) on all the ROUGE metrics. It also outperforms Vilca and Cabezudo's (2017) approach on 3 of the 4 metrics.

            R-1    R-2    R-L    R-SU4
Human       0.507  0.218  0.460  0.239
Ours        0.410  0.154  0.378  0.180
Baseline    0.378  0.138  0.351  0.163
Hsu'18 abs  0.266  0.116  0.239  0.126
Vilca'17    0.244  0.231  -      0.033

Table 1: ROUGE scores for different summarization methods.
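For readers unfamiliar with the metric, the following is a deliberately simplified, recall-oriented sketch of ROUGE-N. The actual ROUGE toolkit (Lin, 2004) additionally supports stemming, stop-word removal, multiple references and precision/F-measure variants, so the numbers in Table 1 come from the toolkit, not from this sketch.

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / max(sum(ref.values()), 1)
```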
To illustrate how our approach and the approach of Hsu et al. (2018) modify original sentences, we contrast an extractive term-frequency based summary with the abstractive summaries generated by both approaches (see Table 2). For convenience, the chunks common to two or more summaries are numbered and surrounded by square brackets; the remaining text is unique to each summary.

Our approach: [More than 4,000 workers at a coal mine in the southern city of Jastrzebie went to demand legalization of Solidarity and higher wages on strike]1. [Workers on the overnight shift at the Manifest Lipcowy mine stayed outside the mine shaft]2. [The miners are demanding the legalization of Solidarity]4. The workers are calling for higher wages and better working conditions. The workers are requesting two lawyers and two economists. Workers at the Rudna copper mine near the city of Wroclaw staged a protest rally. [Workers at factories around the northern port of Gdansk joined striking shipyard workers.]5.

Extractive TF summary: Solidarity spokeswoman Katarzyna Ketrzynska said [[workers on the overnight shift at the Manifest Lipcowy mine stayed outside the mine shaft]2 all night and were joined by workers arriving for the morning shift]3. The strike began at noon today, according to Katrzynska. She said [the miners are demanding the legalization of Solidarity]4 and reinstatement of workers fired for union activities. Three members of Solidarity were barred Saturday from working. On Aug. 16, 1980, [workers at factories around the northern port of Gdansk joined striking shipyard workers]5 to form Solidarity, the first and only independent trade federation in the Soviet bloc.

Hsu'18: [more than 4,000 workers at a coal mine in the southern city of jastrzebie went on strike today to demand legalization of solidarity and higher wages]1. [[workers on the overnight shift at the manifest lipcowy mine stayed outside the mine shaft]2 all night and were joined by workers arriving for the morning shift]3.

Table 2: A comparison of abstractive summaries with an extractive summary.

4.2 Human Evaluation

For our preliminary human evaluation of the generated summaries, we used the statistical formula for calculating the correct size of a representative sample that was proposed by Pita-Fernández (1996) and successfully applied to different NLP tasks (Vázquez et al., 2010; Lloret et al., 2019). For the DUC 2002 dataset, a representative sample consists of 77 documents, which we randomly chose from the corpus. They were evaluated according to the following criteria, based on the DUC guidelines but adapted to the specific task and errors:

• grammaticality - grammatical correctness of the summary (i.e. number agreement);
• non-redundancy - no unnecessary repetitions; and,
• completeness - completeness of the grammatical construction (i.e. a missing direct object of a transitive verb).

The generated abstractive summaries were assessed on a five-point Likert scale by 3 external annotators without any knowledge about how the summaries were produced. A grammatically correct, non-redundant and complete summary would receive a score of 5-5-5 respectively. The results in Table 4 show that the summaries produced by our approach scored above the average on the three criteria.

Measure          Score
Grammaticality   3.60
Non-redundancy   3.71
Completeness     3.81

Table 4: Average scores for human evaluation.

Grammaticality:
1. TAS gave not details of Gorbachev's suggestion.
2. Six bodies were founded in the hull of the ferry by Police.
3. The Lone Star Statuette were built by Chicago's Creative House Promotions.
Redundancy:
1. Martin Nelson was another meteorologist at the center at center.
2. Dullah Omar was an activist and family friend of the Mandelas of Mandelas.
3. A resolution promises reforms. A resolution promises reforms.
Completeness:
1. A quake of 6 on the scale is capable.
2. Reunification mishandled.
3. Arthur Andersen wanted.

Table 3: Examples of some of the mistakes produced by our approach.
Table 3 shows examples of such mistakes. Upon closer inspection we detected that the completeness errors are often caused by incorrect parses. Some of the grammatical errors are produced by SimpleNLG, whereas others refer to cases not covered by the information item extraction rules. Contrary to our predictions, the non-redundancy rate was above the average. The overall linguistic quality looks promising and reveals areas for improvement. However, there is a need for a deeper evaluation, which is planned for future work.

5 Further Experiments and Discussion

In this section we analyze the impact of each of the components of our method on the informativeness of the generated summaries.

5.1 Syntactic Constituents

We experimented with different configurations of our scoring module to test whether the subject, verb and object are enough for the scoring, or whether it should be extended with open clausal complements and prepositional phrases to improve its performance. For this purpose we applied the scoring in three different contexts: exclusively SVO (SVO); the SVO extended with clausal complements (SVO+xComp); and the SVO+xComp extended with prepositional phrases (SVO+xComp+PPs). This means that the scoring module checked occurrences of the most frequent SVO elements only in subject-verb-object triplets or in the extended structures. The results in Table 5 show that there is some improvement in performance when additional syntactic components are included. We believe that this improvement may increase as the corpus grows, since a larger corpus will contain more cases of open clausal complements and prepositional phrases.

               R-1     R-2     R-L
SVO            0.4064  0.1522  0.3756
SVO xComp      0.4078  0.1523  0.3765
SVO xComp PPs  0.4102  0.1544  0.3776

Table 5: ROUGE scores for syntactic components.

5.2 Generation and Recall

Another experimental setup addresses the question of how much important information is lost during the generation stage. As described in Section 3, we integrated a setting (signaled with the dashed arrow in Figure 1) that, instead of scoring InIts, applied the SVO frequencies to the simplified text and delivered it in the final summary. This setting overcomes two possible limitations of our approach: it also scores the parts of the sentence that are not included in an InIt, and it provides more text for the subsequent recall evaluation with ROUGE. The results in Table 6 show a slight improvement over the InIt-based scoring, but the difference is not as high as we expected. We may conclude that our InIt extraction rules capture most of the information, and the surface realization rules generate sufficient material for the ROUGE evaluation.

             R-1     R-2     R-L
InIt         0.4102  0.1544  0.3776
Simpl. text  0.4181  0.1668  0.3797

Table 6: ROUGE evaluation of text-based scoring.

5.3 Effect of Concept Frequency Scoring

Word sense disambiguation and the resulting concept scoring should positively affect InIt selection as well. Table 7 shows that in this setting the difference between term and concept frequencies is almost non-existent. We believe that if we integrate the entire noun phrase when calculating the SVO frequencies, and not only the noun phrase head, it may lead to a more significant difference.

         R-1     R-2     R-L
SVO cf   0.4102  0.1544  0.3776
SVO tf   0.4100  0.1545  0.3777

Table 7: ROUGE scores for concept and term frequency scoring.

5.4 Simplification and Recall

Our motivation behind the integration of a syntactic simplification module was to reach a greater degree of concept granularity that would allow us to select only the most salient InIts while discarding the less relevant ones. We tested our approach both with and without simplification. The results in Table 8 indicate that working with the original text yields a slightly better recall. Close inspection showed that our simplification module generates syntactically simpler sentences, but introduces more repetitions that are picked up by the SVO and named entity scoring.

            R-1     R-2     R-L
Simplified  0.4102  0.1544  0.3776
Original    0.4169  0.1588  0.3803

Table 8: ROUGE scores for the simplification test.
Consider the example in Table 9:

Original sentence: Greek marine archaeologists focus on locating and surveying historic wrecks scattered around the Aegean and rarely carry out excavations.
Simplified:
1. Greek marine archaeologists focus on locating.
2. Greek marine archaeologists focus on surveying historic wrecks scattered around the Aegean.
3. Greek marine archaeologists carry out excavations.
Simplified summary:
1. Greek marine archaeologists focus.
2. Greek marine archaeologists carry out excavations.
Original summary:
not included

Table 9: Simplification example.

When we split a long sentence into several shorter ones with a repeated subject, the scoring module gives them more importance by considering the repeated subject to be the topic of the document. If some of these split sentences are included in the final summary, the repeated subject noun phrase takes up summary space that could otherwise be occupied by a different phrase. On the other hand, if the subject of such a split phrase is the true topic of the document, our method generates a very topic-focused summary. We hypothesize that scoring should be performed on the original subject-verb-object distribution of the document so as to avoid scoring for repeated subjects.

5.5 Summary Readability

Readability is rarely studied in detail in the context of automatic text summarization. Our summarization approach integrates syntactic simplification, which results in syntactically simpler summaries, and concept frequency scoring, which may yield summaries with a richer vocabulary compared to term frequency based ones. To assess the readability of the generated summaries we calculated their Flesch Reading Ease (FRE), Dale-Chall (DC) and depth of the parse tree (PTD) scores. These three metrics give us a quick but complete assessment of the length-, vocabulary- and syntactic complexity-based readability aspects. Higher FRE and lower PTD and DC values correspond to more comprehensible texts.
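As a reference for the first of these metrics, the standard Flesch Reading Ease formula can be computed from simple counts (Dale-Chall additionally requires a list of familiar words, and parse tree depth requires a syntactic parser). This is a generic sketch, not the exact configuration used to produce Table 10.

```python
def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease: higher scores indicate more comprehensible text."""
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```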
The results in Table 10 show that human summaries include longer sentences and words, and are also more concept dense than the original texts. Human summaries also tend to consist of syntactically less complex sentences. Unlike human summaries, our approach generates more comprehensible texts in terms of sentence and word length. As expected, syntactic sentence simplification positively affects the parse tree depth metric. However, it also generates summaries with a greater lexical density.

          FRE    DC     PTD
Ours      50.74  10.56  8.30
Human     42.76  10.45  10.51
Original  43.51  10.13  11.48

Table 10: Readability metrics for different methods.

6 Conclusions and Future Work

This paper presents a broad-based abstractive text summarization method that outperforms other state-of-the-art abstractive approaches while maintaining acceptable linguistic quality and redundancy rate. Our approach is based on a set of syntactic rules that transform text into its semantic representation, as well as on the combination of subject-verb-object concept frequency and named entity frequency for scoring.

The results show that some aspects of the proposed approach require improvement. Integration of the entire subject and object noun phrases for the calculation of frequencies may increase the informativeness of the generated summaries. Coreference resolution and sentence fusion may help to lower the degree of redundancy introduced through the syntactic sentence simplification.

In future work, we plan to integrate these improvements and to evaluate our method on other datasets such as the CNN/Daily Mail dataset. First, a larger dataset can provide more insights on the relative importance of open clausal complements, prepositional phrases and concept frequency for information item rating. Second, it will allow us to gauge the weaknesses and strengths of our approach, which is based on the concept of information items and handcrafted syntactic transformation rules, via a comparative analysis with state-of-the-art deep learning and semantic graph approaches.

Acknowledgments

This research work has been partially funded by the University of Alicante (Spain), Generalitat Valenciana and the Spanish Government through the projects SIIA (PROMETEU/2018/089), LIVING-LANG (RTI2018-094653-B-C22), INTEGER (RTI2018-094649-B-I00) and Red iGLN (TIN2017-90773-REDT).
References

S. Alshaina, A. John, and A. G. Nath. 2017. Multi-document abstractive summarization based on predicate argument structure. In 2017 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), pages 1–6. https://doi.org/10.1109/SPICES.2017.8091339.

Danqi Chen, Jason Bolton, and Christopher D. Manning. 2016. A thorough examination of the CNN/Daily Mail reading comprehension task. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Berlin, Germany, pages 2358–2367. https://doi.org/10.18653/v1/P16-1223.

William A. Foley. 1994. Information structure. In Roland E. Asher and Joy M. Y. Simpson, editors, The Encyclopedia of Language and Linguistics, Pergamon Press, Oxford, volume 3, pages 1678–1685.

Albert Gatt and Ehud Reiter. 2009. SimpleNLG: A realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation (ENLG '09), Association for Computational Linguistics, Stroudsburg, PA, USA, pages 90–93.

Pierre-Etienne Genest and Guy Lapalme. 2010. Text generation for abstractive summarization. In Proceedings of the Third Text Analysis Conference, National Institute of Standards and Technology, Gaithersburg, Maryland, USA.

Pierre-Etienne Genest and Guy Lapalme. 2011. Framework for abstractive summarization using text-to-text generation. In Proceedings of the Workshop on Monolingual Text-To-Text Generation, Association for Computational Linguistics, Portland, Oregon, pages 64–73. https://www.aclweb.org/anthology/W11-1608.

Som Gupta and S. K. Gupta. 2018. Abstractive summarization: An overview of the state of the art. Expert Systems with Applications 121:49–65. https://doi.org/10.1016/j.eswa.2018.12.011.

Sanda Harabagiu and Finley Lacatusu. 2010. Using topic themes for multi-document summarization. ACM Trans. Inf. Syst. 28(3):13:1–13:47. https://doi.org/10.1145/1777432.1777436.

Michael Heilman and Noah A. Smith. 2010. Extracting simplified statements for factual question generation. In Proceedings of QG2010: The Third Workshop on Question Generation, pages 11–20.

Wan Ting Hsu, Chieh-Kai Lin, Ming-Ying Lee, Kerui Min, Jing Tang, and Min Sun. 2018. A unified model for extractive and abstractive summarization using inconsistency loss. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL, Melbourne, Australia, volume 1, pages 132–141. https://aclanthology.info/papers/P18-1013/p18-1013.

Wei Li. 2015. Abstractive multi-document summarization with semantic information extraction. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, pages 1908–1913. https://doi.org/10.18653/v1/D15-1219.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, Association for Computational Linguistics, Barcelona, Spain, pages 74–81. https://www.aclweb.org/anthology/W04-1013.

Elena Lloret, Ester Boldrini, Tatiana Vodolazova, Patricio Martínez-Barco, Rafael Muñoz, and Manuel Palomar. 2015. A novel concept-level approach for ultra-concise opinion summarization. Expert Systems with Applications 42(20):7148–7156. https://doi.org/10.1016/j.eswa.2015.05.026.

Elena Lloret, Tatiana Vodolazova, Paloma Moreda, Rafael Muñoz, and Manuel Palomar. 2019. Are better summaries also easier to understand? Analyzing text complexity in automatic summarization. In Marina Litvak and Natalia Vanetik, editors, Multilingual Text Analysis: Challenges, Models, and Approaches, World Scientific, New Jersey, pages 337–369. https://doi.org/10.1142/9789813274884_0010.

Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Baltimore, Maryland, pages 55–60. https://doi.org/10.3115/v1/P14-5010.

Parth Mehta. 2016. From extractive to abstractive summarization: a journey. In Proceedings of the ACL 2016 Student Research Workshop, pages 100–106. https://www.aclweb.org/anthology/P16-3015.

Ani Nenkova and Kathleen McKeown. 2012. A survey of text summarization techniques. In Mining Text Data, Springer, pages 43–76. https://doi.org/10.1007/978-1-4614-3223-4_3.

Lluís Padró and Evgeny Stanilovsky. 2012. FreeLing 3.0: Towards wider multilinguality. In Proceedings of the Language Resources and Evaluation Conference (LREC 2012), ELRA, Istanbul, Turkey. https://www.aclweb.org/anthology/papers/L/L12/L12-1224/.

Salvador Pita-Fernández. 1996. Determinación del tamaño muestral. Cadernos de atención primaria 3(3):138–141.

M. F. Porter. 1997. An algorithm for suffix stripping. In Karen Sparck Jones and Peter Willett, editors, Readings in Information Retrieval, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pages 313–316.

Yoan Gutiérrez Vázquez, Antonio Fernández Orquín, Andrés Montoyo Guijarro, and Sonia Vázquez Pérez. 2010. Integración de recursos semánticos basados en WordNet. Procesamiento del Lenguaje Natural 45:161–168.

Gregory César Valderrama Vilca and Marco Antonio Sobrevilla Cabezudo. 2017. A study of abstractive summarization using semantic representations and discourse level information. In Kamil Ekštein and Václav Matoušek, editors, Text, Speech, and Dialogue, Springer International Publishing, Cham, pages 482–490. https://doi.org/10.1007/978-3-319-64206-2_54.