A Framework For Multi-Document Abstractive Summarization Based On Semantic Role Labelling
Article info
Article history:
Received 18 May 2014
Received in revised form 7 September 2014
Accepted 29 January 2015
Available online 20 February 2015
Keywords:
Abstractive summary
Semantic role labeling
Semantic similarity measure
Language generation
Genetic algorithm
Abstract
We propose a framework for abstractive summarization of multi-documents, which aims to select the contents of the summary not from the source document sentences but from a semantic representation of the source documents. In this framework, the contents of the source documents are represented by predicate argument structures obtained through semantic role labeling. Content selection for the summary is performed by ranking the predicate argument structures based on optimized features, and language generation is used to generate sentences from the predicate argument structures. Our proposed framework differs from other abstractive summarization approaches in a few aspects. First, it employs semantic role labeling for semantic representation of text. Secondly, it analyzes the source text semantically by utilizing a semantic similarity measure in order to cluster semantically similar predicate argument structures across the text; and finally, it ranks the predicate argument structures based on features weighted by a genetic algorithm (GA). The experiments in this study are carried out using DUC-2002, a standard corpus for text summarization. Results indicate that the proposed approach performs better than other summarization systems.
© 2015 Elsevier B.V. All rights reserved.
1. Introduction
The information on the Web is growing at an exponential pace. In the current era of information overload, multi-document summarization is an essential tool that creates a condensed summary while preserving the important contents of the source documents. Automatic multi-document summarization of text is a major task in the field of natural language processing (NLP) and has gained increasing attention in recent years [1]. One of the problems of information overload is that many documents share similar topics, which creates both difficulties and opportunities for natural language systems. On the one hand, the similar information conveyed by several different documents causes difficulties for end users, as they have to read the same information repeatedly. On the other hand, such redundancy can be used to identify accurate and significant information for applications such as summarization and question answering. Thus, summaries that synthesize common information across many text documents would be useful for users and reduce the time they spend finding the key information in the text documents. Such a summary would significantly help users interested in a single event described in many news documents [1]. In this paper, we propose a framework that automatically fuses similar information across multiple documents and uses language generation to produce a concise abstractive summary.
Two approaches are employed for multi-document summarization: extractive and abstractive. Most studies have focused on extractive summarization, using techniques such as sentence extraction [2], statistical analysis [3], discourse structures and various machine learning techniques [4]. On the other hand, abstractive summarization is a challenging area and a long-standing goal of researchers [5], because it requires deeper analysis of the text and the capability to synthesize a compressed version of an original sentence, or even to compose a novel sentence not present in the original source. The goal of abstractive summarization is to improve the focus of the summary, reduce its redundancy and keep a good compression rate [6].

Past literature shows that there have been a few research efforts toward abstractive summarization. Many researchers have tried to generate abstractive summaries using various methods. These abstractive methods can be grouped into two categories: the linguistic (syntactic) approach and the semantic approach. The linguistic (syntactic) approach employs a syntactic parser to analyze and represent the text syntactically. Usually, in this approach, verbs and nouns identified by the syntactic parser are used for text representation and further processed to generate the summary.
Table 1
Core arguments and adjunctive arguments.

Core arguments:
V — Verb
A0 — Subject
A1 — Object
A2 — Indirect object
A3 — Start point
A4 — End point
A5 — Direction

Adjunctive arguments:
ArgM-DIR — Direction
ArgM-MNR — Manner
ArgM-LOC — Location
ArgM-TMP — Temporal marker
ArgM-PRP — Purpose
ArgM-NEG — Negation
ArgM-REC — Reciprocal
AM-DIS — Discourse marker
ranked predicate argument structures are selected from each cluster (as described in Section 2.5). Finally, the SimpleNLG realisation
engine [22] is employed to generate sentences from the selected
predicate argument structures. The generated sentences will form
the final abstractive summary (as discussed in Section 2.6).
2.2. Semantic role labeling
The aim of semantic role labeling (SRL) is to determine the syntactic constituents/arguments of a sentence with respect to the sentence predicates, to identify the semantic roles of the arguments, such as Agent, Patient and Instrument, and to identify the adjunctive arguments of the predicate, such as Locative, Temporal and Manner [23]. The primary task of SRL is to identify what semantic relation a predicate holds with its participants/constituents. As abstractive summarization requires deeper semantic analysis of text, this study employs semantic role labeling to extract predicate argument structures from the sentences in the document collection. The framework uses the SENNA [20] tool, distributed under an open-source, non-commercial license, which yields a host of natural language processing (NLP) predictions: semantic role labeling (SRL), part-of-speech (POS) tagging, named entity recognition (NER) and chunking (CHK). We employ the SENNA tool in our framework for SRL, POS tags and NER.
At first, we decompose the document collection into sentences in such a way that each sentence is preceded by its corresponding document number and sentence position number. Next, the SENNA semantic role labeler is employed to parse each sentence and label the semantic word phrases. These phrases are referred to as semantic arguments. The semantic arguments can be grouped into two categories: core arguments (Arg) and adjunctive arguments (ArgM), as shown in Table 1. In this study, we consider A0 for subject, A1 for object and A2 for indirect object as core arguments, and ArgM-LOC for location and ArgM-TMP for time as adjunctive arguments for a predicate (verb) V. We consider all the complete predicates associated with a single sentence structure, in order to avoid loss of important terms contributing to the meaning of the sentence, and the actual predicate of the sentence. We assume that predicates are complete if they have at least two semantic arguments. The extracted predicate argument structure is used as the semantic representation of each sentence in the document collection. A sentence containing one predicate is represented by a simple predicate argument structure, while a sentence containing more than one predicate is represented by a composite predicate argument structure.

Example 1. Consider the following sentence S:

S: Tropical Storm Gilbert formed in the eastern Caribbean and strengthened into a hurricane Saturday night.

After applying semantic role labeling to sentence S, the corresponding two predicate argument structures are obtained as follows:
P1: [A0: Tropical Storm Gilbert] [V: form] [ArgM-LOC: in the eastern Caribbean]
P2: [A0: Tropical Storm Gilbert] [V: strengthen] [A2: into a hurricane Saturday night]
Both predicate argument structures P1 and P2 are associated with the single sentence S; hence sentence S is represented by a composite (more than one) predicate argument structure. Both incomplete predicate argument structures (PASs) and PASs that are nested in a larger predicate argument structure are ignored.
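The representation and completeness rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `PAS` container and function names are our own, standing in for whatever structures the framework builds from the SENNA output.

```python
from dataclasses import dataclass


@dataclass
class PAS:
    """One predicate argument structure: a verb plus labeled semantic arguments."""
    verb: str
    args: dict  # e.g. {"A0": "Tropical Storm Gilbert", "ArgM-LOC": "in the eastern Caribbean"}


def is_complete(pas: PAS) -> bool:
    # A predicate is kept only if it has at least two semantic arguments.
    return len(pas.args) >= 2


def sentence_representation(pas_list):
    """Keep complete PASs; a sentence with more than one surviving PAS
    is represented by a composite structure, otherwise a simple one."""
    kept = [p for p in pas_list if is_complete(p)]
    kind = "composite" if len(kept) > 1 else "simple"
    return kept, kind


# Sentence S from Example 1 yields two complete PASs -> composite representation.
p1 = PAS("form", {"A0": "Tropical Storm Gilbert", "ArgM-LOC": "in the eastern Caribbean"})
p2 = PAS("strengthen", {"A0": "Tropical Storm Gilbert", "A2": "into a hurricane Saturday night"})
kept, kind = sentence_representation([p1, p2])
```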
Example 2. Consider the following two sentences, each represented by a simple predicate argument structure.

S1: Finally, a massive hurricane attack my home.
S2: Eventually, a huge cyclone hit the entrance of my house.

The corresponding simple predicate argument structures P1 and P2 are obtained after applying semantic role labeling to sentences S1 and S2:

P1: [AM-DIS: Finally] [A0: a massive hurricane] [V: attack] [A1: my home]
P2: [AM-TMP: Eventually] [A0: a huge cyclone] [V: hit] [A1: the entrance of my house]
2.2.1. Processing of predicate argument structures

Once the predicate argument structures (PASs) are obtained, they are split into meaningful words or tokens, followed by removal of stop words. The words in the PASs are stemmed to their base form using the Porter stemming algorithm [24]. Next, the SENNA POS tagger [20] is employed to label each term of the semantic arguments (associated with the predicates) with part-of-speech (POS) tags or grammatical roles. The POS tag NN stands for noun, VB for verb, JJ for adjective and RB for adverb, etc. This step is required because the semantic arguments of the predicates will be compared based on the grammatical roles of the terms. In this study, we compare only those terms of the semantic arguments of the predicates which are labeled as nouns (NN); the rest are ignored, as discussed in Section 2.3. After POS tagging, the two predicate argument structures P1 and P2 in Example 2 are as follows:
P1: [A0: a massive (JJ) hurricane (NN)] [V: attack] [A1: my home (NN)]
P2: [AM-TMP: Eventually (RB)] [A0: a huge (JJ) cyclone (NN)] [V: hit (VBD)] [A1: the entrance (NN) of my house (NN)]
We also employ SENNA NER [20] to identify named entities, such as person names (e.g., Cabral) and organization names (e.g., Civil Defense), in the semantic arguments of the predicates. These named entities are stored for each predicate argument structure (PAS) and are required in a later phase for scoring the PAS based on the proper noun feature.

This study compares predicate argument structures based on noun–noun, verb–verb, location–location and time–time arguments. Therefore, the framework extracts from each predicate argument structure only the tokens labeled as noun, verb, location and time, as identified in the previous steps. All the PASs associated with a sentence are included in the comparison. Once the nouns, verbs and other arguments (time and location), if they exist, are extracted, the predicate argument structures obtained in Example 2 after further processing become
P1: [A0: hurricane (NN)] [V: attack] [A1: home (NN)]
P2: [AM-TMP: Eventually (RB)] [A0: cyclone (NN)] [V: hit (VBD)] [A1: entrance (NN), house (NN)]
In the next phase, the noun "home" in the semantic argument A1 of PAS P1 is compared with both of the nouns "entrance" and "house" in the semantic argument A1 of PAS P2. The temporal (time) semantic argument "Eventually" in P2 is excluded from the comparison, as there is no corresponding temporal argument in P1.
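The pairing just described can be sketched as follows; the function and the dictionary layout are illustrative assumptions, but the behavior mirrors the text: terms are compared only within matching argument labels, every noun is compared with every noun of the counterpart argument, and an argument with no counterpart (such as the temporal role here) is skipped.

```python
def comparable_pairs(pas1, pas2):
    """Collect (label, term1, term2) tuples for matching argument labels
    of two processed PASs; unmatched arguments are skipped."""
    pairs = []
    for label, terms1 in pas1.items():
        terms2 = pas2.get(label)
        if not terms2:
            continue  # no counterpart argument -> excluded from comparison
        for t1 in terms1:
            for t2 in terms2:
                pairs.append((label, t1, t2))
    return pairs


# P1 and P2 from Example 2 after processing (nouns/verbs only, per the POS filtering):
p1 = {"A0": ["hurricane"], "V": ["attack"], "A1": ["home"]}
p2 = {"AM-TMP": ["Eventually"], "A0": ["cyclone"], "V": ["hit"], "A1": ["entrance", "house"]}
pairs = comparable_pairs(p1, p2)
# "home" is paired with both "entrance" and "house"; "Eventually" is skipped.
```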
The similarity score between two predicate argument structures is computed using Eqs. (1)–(5):

simarg(vik, vjl) = sim(A0i, A0j) + sim(A1i, A1j) + sim(A2i, A2j)   (1)

simverb(vik, vjl) = sim(Vi, Vj)   (2)

simtmp(vik, vjl) = sim(TMPi, TMPj)   (3)

simloc(vik, vjl) = sim(LOCi, LOCj)   (4)

sim(vik, vjl) = simverb(vik, vjl) + [simarg(vik, vjl) + simtmp(vik, vjl) + simloc(vik, vjl)]   (5)
The semantic similarity computation of the two predicate argument structures discussed in Example 2 is depicted in Fig. 2. In order to compute the similarity score of two given terms/concepts in the semantic argument A0 of predicates P1 and P2, let concept C1 = hurricane and concept C2 = cyclone, as shown in Fig. 2. First, Jiang's measure [21] uses WordNet to compute the least common subsumer (lso) of the two concepts, and then determines IC(C1), IC(C2), and IC(lso(C1, C2)). The information content (IC) of a concept is obtained by estimating the probability of occurrence of the concept in a large text corpus and is quantified as follows:

IC(C) = −log P(C)   (6)

P(C) = Freq(C) / N   (7)

where Freq(C) is the frequency of concept C and N is the total number of concepts observed in the corpus. The distance between the two concepts is then computed as:

Distjiang(C1, C2) = IC(C1) + IC(C2) − 2 × IC(lso(C1, C2))   (8)
For the worked example, IC(C1) = 11.0726, IC(C2) = 10.6671 and IC(lso(C1, C2)) = 10.6671. According to Eq. (8), the similarity between the concepts C1 = hurricane and C2 = cyclone is computed as follows:

Distjiang = 11.0726 + 10.6671 − 2 × 10.6671 = 0.4055
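The computation above can be sketched directly; the function names are ours, and in the framework the IC values come from WordNet-based corpus statistics rather than being supplied by hand as they are here.

```python
import math


def information_content(freq, corpus_size):
    """IC(C) = -log P(C), with P(C) estimated as Freq(C)/N (Eqs. (6)-(7))."""
    return -math.log(freq / corpus_size)


def jiang_distance(ic_c1, ic_c2, ic_lso):
    """Jiang & Conrath distance: IC(C1) + IC(C2) - 2*IC(lso(C1, C2)) (Eq. (8))."""
    return ic_c1 + ic_c2 - 2.0 * ic_lso


# Worked example: hurricane vs. cyclone
d = jiang_distance(11.0726, 10.6671, 10.6671)  # -> 0.4055
```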
The similarity of the other concepts/terms is determined in the same manner. However, simtmp(vik, vjl) and simloc(vik, vjl) are set to 0, as there are no temporal and location arguments to compare in the two predicate argument structures. According to Eq. (5), the similarity score of the two predicate argument structures is computed as follows:

sim(vik, vjl) = 0.8571 + [0.4055 + 0.29 + 1 + 0 + 0] = 2.5526
In order to normalize the result to the range [0, 1], we use a scaling factor e^(−λ Sim(vik, vjl)) [27], where λ is a constant set to 0.05 (the optimal value):

Simnorm(vik, vjl) = e^(−0.05 × 2.5526) ≈ 0.88   (9)

The pairwise scores form a similarity matrix used in the clustering phase:

Msim(Pi, Pj) = sim(Pi, Pj) if i ≠ j; 1 otherwise   (10)
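The normalization and the pairwise matrix can be sketched as follows. The diagonal convention (self-similarity set to 1) is an assumption of this sketch, as is feeding the matrix unchanged into an agglomerative clustering step; only λ = 0.05 and the exponential form are taken from the text.

```python
import math

LAMBDA = 0.05  # scaling constant (the optimal value reported in the text)


def normalized_sim(raw_sim):
    """Map a raw PAS-to-PAS similarity into [0, 1] via e^(-lambda * sim)."""
    return math.exp(-LAMBDA * raw_sim)


def similarity_matrix(raw):
    """Pairwise matrix M_sim over all PASs, ready for the clustering phase."""
    n = len(raw)
    return [[1.0 if i == j else normalized_sim(raw[i][j]) for j in range(n)]
            for i in range(n)]


score = normalized_sim(2.5526)  # the worked example's raw score, ~0.88
```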
The aim of this phase is to select highly ranked predicate argument structures from each cluster, based on features weighted and optimized by a genetic algorithm (GA). At first, all the sentences in the document collection are represented by the corresponding predicate argument structures (PASs) extracted through SENNA SRL. The following features are extracted for each predicate argument structure (PAS) in the document collection, and hence each PAS is represented by a vector of feature weights P = {PF1, PF2, . . ., PF10}.

2.5.1.2. Length of predicate argument structure. We use the normalized length of the PAS, which is the ratio of the number of words in the PAS over the number of words in the longest PAS of the document [2]:

PF2 = Number of words in the PAS / Number of words in the longest PAS of the document   (11)
2.5.1.3. PAS to PAS similarity. For each predicate argument structure P, the semantic similarity between P and the other predicate argument structures in the document collection is computed using Eq. (5). Once the similarity score for each predicate argument structure (PAS) is obtained, the score of this feature is computed as the ratio of the sum of the similarities of PAS P with all other PASs over the maximum such sum in the document collection [34]:

PF3 = Σj sim(Pi, Pj) / Max(Σj sim(Pi, Pj))   (12)
2.5.1.5. Proper nouns. A predicate argument structure that contains more proper nouns is considered significant for inclusion in the summary. This feature identifies proper nouns as words beginning with a capital letter. The score of this feature is computed as the ratio of the number of proper nouns in the PAS over the length of the PAS [33], where the length of a PAS is the number of words/terms it contains:

PF5 = Number of proper nouns in the PAS / Length of the PAS   (14)
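Several of the feature scores are plain ratios; a minimal sketch of two of them follows. The function names are ours, and the capital-letter test simply mirrors the proper-noun heuristic described above.

```python
def pas_length_feature(pas_words, longest_pas_words):
    """PF2 (Eq. (11)): words in the PAS over words in the document's longest PAS."""
    return len(pas_words) / len(longest_pas_words)


def proper_noun_feature(pas_words):
    """PF5 (Eq. (14)): capitalized words over the PAS length."""
    proper = [w for w in pas_words if w[:1].isupper()]
    return len(proper) / len(pas_words)


words = ["Floods", "prevented", "officials", "from", "reaching", "Cancun"]
pf5 = proper_noun_feature(words)  # 2 of the 6 words are capitalized
```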
2.5.1.6. Numerical data. A predicate argument structure containing numerical data, such as the number of people killed, is regarded as important for inclusion in the summary. The score of this feature is computed as the ratio of the number of numerical items in the PAS over the length of the PAS:

PF6 = Number of numerical items in the PAS / Length of the PAS   (15)

2.5.1.7. Number of nouns and verbs. Some sentences may have more than one predicate argument structure associated with them, represented by a composite predicate argument structure, and are considered important for the summary. The score of this feature [34] is computed as the ratio of the number of nouns and verbs in the PAS over the length of the PAS:

PF7 = Number of nouns and verbs in the PAS / Length of the PAS   (16)

2.5.1.8. Temporal feature. A predicate argument structure containing time and date information for an event is considered important for summary generation. The score of this feature is computed as the ratio of the amount of temporal information (time and date) in the PAS over the length of the PAS [35]:

PF8 = Number of temporal items in the PAS / Length of the PAS   (17)

[Figure: Genetic algorithm flowchart — an initial population of chromosomes, each a vector of ten real-valued feature weights (e.g. 0.25 0.08 0.74 0.53 0.26 0.32 0.42 0.73 0.21 0.62), undergoes fitness evaluation and, on termination, yields the optimized feature weights.]
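The weight-optimization loop in the flowchart can be sketched as a minimal real-coded GA. This is an illustration, not the paper's implementation: the fitness function here is a stand-in (the framework evaluates the summaries produced with the candidate weights), and the operator choices (elitist selection, uniform crossover, Gaussian mutation) and population sizes are our assumptions.

```python
import random

random.seed(0)  # for reproducibility of this sketch
N_FEATURES, POP_SIZE, GENERATIONS = 10, 20, 30


def random_chromosome():
    # One chromosome = a vector of ten feature weights in [0, 1].
    return [random.random() for _ in range(N_FEATURES)]


def fitness(chrom):
    # Stand-in fitness: distance to a fixed target vector. The paper instead
    # scores the summary generated with these weights against references.
    target = [0.12, 0.41, 0.71, 0.35, 0.24, 0.19, 0.13, 0.61, 0.48, 0.21]
    return -sum((w - t) ** 2 for w, t in zip(chrom, target))


def evolve():
    pop = [random_chromosome() for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: POP_SIZE // 2]          # keep the fitter half
        children = []
        while len(elite) + len(children) < POP_SIZE:
            a, b = random.sample(elite, 2)
            child = [x if random.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            i = random.randrange(N_FEATURES)
            child[i] = min(1.0, max(0.0, child[i] + random.gauss(0, 0.1)))     # mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


best = evolve()  # the "optimized feature weights" of the flowchart
```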
The weight of a semantic term is calculated as follows:

Wi = Tfi × Idfi = Tfi × log(N / ni)   (19)

where Tfi is the term frequency of the semantic term i in the document, N is the total number of documents, and ni is the number of documents in which term i occurs. This feature is computed as the ratio of the sum of the weights of all semantic terms in the PAS over the maximum such sum over the PASs in the document collection [33]:

PF10 = Σi Wi(P) / Max(Σi Wi(P))   (20)
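Eqs. (19)–(20) can be sketched directly; the function names are ours, and the toy inputs are illustrative.

```python
import math


def term_weight(tf, n_docs, n_docs_with_term):
    """Wi = Tfi * Idfi = Tfi * log(N / ni)  (Eq. (19))."""
    return tf * math.log(n_docs / n_docs_with_term)


def frequent_semantic_term_feature(pas_weights, all_pas_weights):
    """PF10 (Eq. (20)): sum of this PAS's term weights over the maximum such sum."""
    return sum(pas_weights) / max(sum(ws) for ws in all_pas_weights)


# A term appearing 3 times, in 10 of 100 documents:
w = term_weight(tf=3, n_docs=100, n_docs_with_term=10)  # 3 * log(10)
```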
The fitness function F(x) is based on the ROUGE n-gram co-occurrence between the candidate summary and the reference summaries:

F(x) = [ Σ over S ∈ {Reference Summaries}, gramn ∈ S of Countmatch(gramn) ] / [ Σ over S ∈ {Reference Summaries}, gramn ∈ S of Count(gramn) ]   (21)

Each predicate argument structure Pi is then scored as the weighted sum of its ten features:

Score(Pi) = Σ over k = 1..10 of Wk × PFk(Pi)   (22)
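The scoring of Eq. (22) can be sketched as follows. `OPT_WEIGHTS` holds the ten optimized weights reported in the Results (Section 3.2); `rouge_n` is a deliberately simplified, single-reference, set-based reading of Eq. (21), not the full ROUGE package [40].

```python
OPT_WEIGHTS = [0.121493, 0.41313752, 0.7118985, 0.351493, 0.24141418,
               0.18693995, 0.134772, 0.614672, 0.475472, 0.21308642]


def pas_score(features, weights=OPT_WEIGHTS):
    """Score(Pi) = sum over k of Wk * PFk(Pi)  (Eq. (22))."""
    return sum(w * f for w, f in zip(weights, features))


def rouge_n(candidate_ngrams, reference_ngrams):
    """Eq. (21), simplified: matched reference n-grams over total reference n-grams."""
    matches = sum(1 for g in reference_ngrams if g in candidate_ngrams)
    return matches / len(reference_ngrams)


s = pas_score([1.0] * 10)  # with all features at 1, equals the sum of the weights
```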
P1: [A0: Floods] [V: prevented] [A1: officials] [A2: from reaching the hotel zone in Cancun]
P2: [A0: officials] [V: reaching] [A2: the hotel zone in Cancun]

Step 8: This step defines the constituent "from reaching the hotel zone in Cancun" along with the specified indirect object feature:

p.setIndirectObject("from reaching the hotel zone in Cancun");
The precision for a peer (candidate) summary [41] is computed as follows:

Precision = Weight of the SCUs expressed in the peer summary / Weight of an optimal summary with the same number of SCUs   (23)

where the SCUs are the summary content units, and their weights correspond to the number of model (human) summaries in which they appear.
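This precision can be sketched as follows; the reading of the "optimal summary" denominator (the heaviest pyramid SCUs, as many as the peer expresses) follows the pyramid method [41] and is our interpretation of Eq. (23).

```python
def pyramid_precision(peer_scu_weights, pyramid_scu_weights):
    """Weight of the SCUs the peer expresses, over the best weight attainable
    by a summary expressing the same number of SCUs from the pyramid."""
    n = len(peer_scu_weights)
    optimal = sum(sorted(pyramid_scu_weights, reverse=True)[:n])
    return sum(peer_scu_weights) / optimal


# A peer expressing 3 SCUs of weights 3, 2, 1 against a pyramid whose
# heaviest three SCUs weigh 4, 3, 3:
p = pyramid_precision([3, 2, 1], [4, 3, 3, 2, 2, 1, 1])  # 6 / 10 = 0.6
```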
[Table: Average precision of the Models, AS GA SRL, AS SRL, Best and Avg systems; the reported values include 0.69, 0.50, 0.44, 0.28, 0.17, 0.85, 0.70, 0.60, 0.75 and 0.67 (column layout lost in extraction).]
3.2. Results

The results of optimal feature weighting using the genetic algorithm are depicted in Fig. 5. The optimal feature weights obtained are 0.121493, 0.41313752, 0.7118985, 0.351493, 0.24141418, 0.18693995, 0.134772, 0.614672, 0.475472 and 0.21308642, corresponding to the weights for title, PAS to PAS similarity, position, proper nouns, numerical data, temporal feature, length, nouns and verbs, TF-IDF, and frequent semantic term, respectively.

The proposed framework is evaluated in the context of the multi-document abstractive summarization task, using 59 news article data sets provided by the Document Understanding Evaluations 2002 (DUC, 2002). For each data set, our framework generates a 100-word summary, the task undertaken by the other systems in DUC 2002.
[4] B. Larsen, A trainable summarizer with knowledge acquired from robust NLP techniques, in: Advances in Automatic Text Summarization, MIT Press, 1999, p. 71.
[5] H.P. Luhn, The automatic creation of literature abstracts, IBM J. Res. Dev. 2 (1958) 159–165.
[6] P.-E. Genest, G. Lapalme, Framework for abstractive summarization using text-to-text generation, in: Proceedings of the Workshop on Monolingual Text-To-Text Generation, 2011, pp. 64–73.
[7] I. Titov, A. Klementiev, A Bayesian approach to unsupervised semantic role induction, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp. 12–22.
[8] R. Barzilay, K.R. McKeown, M. Elhadad, Information fusion in the context of multi-document summarization, in: Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, 1999, pp. 550–557.
[9] H. Tanaka, A. Kinoshita, T. Kobayakawa, T. Kumano, N. Kato, Syntax-driven sentence revision for broadcast news summarization, in: Proceedings of the 2009 Workshop on Language Generation and Summarisation, 2009, pp. 39–47.
[10] S.M. Harabagiu, F. Lacatusu, Generating single and multi-document summaries with GISTEXTER, in: Document Understanding Conferences, 2002.
[11] C.-S. Lee, Z.-W. Jian, L.-K. Huang, A fuzzy ontology and its application to news summarization, IEEE Trans. Syst. Man Cybern. B: Cybern. 35 (2005) 859–880.
[12] P.-E. Genest, G. Lapalme, Fully abstractive approach to guided summarization, in: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, vol. 2, 2012, pp. 354–358.
[13] C.F. Greenbacker, Towards a framework for abstractive summarization of multimodal documents, in: ACL HLT 2011, 2011, p. 75.
[14] I.F. Moawad, M. Aref, Semantic graph reduction approach for abstractive text summarization, in: 2012 Seventh International Conference on Computer Engineering & Systems (ICCES), 2012, pp. 132–138.
[15] S. Shehata, F. Karray, M.S. Kamel, An efficient concept-based retrieval model for enhancing text retrieval quality, Knowl. Inf. Syst. 35 (2013) 411–434.
[16] L. Del Corro, R. Gemulla, ClausIE: clause-based open information extraction, in: Proceedings of the 22nd International Conference on World Wide Web, 2013, pp. 355–366.
[17] J. Persson, R. Johansson, P. Nugues, Text Categorization Using Predicate–Argument Structures, vol. 1, 2008, pp. 142–149.
[18] N. Jadhav, P. Bhattacharyya, Dive deeper: deep semantics for sentiment analysis, ACL 2014 (2014) 113.
[19] N. Salim, SRLGSM: a hybrid approach based on semantic role labeling and general statistic method for text summarization, J. Appl. Sci. 10 (2010) 166–173.
[20] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, J. Mach. Learn. Res. 12 (2011) 2493–2537.
[21] J.J. Jiang, D.W. Conrath, Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, 1997, arXiv:cmp-lg/9709008.
[22] A. Gatt, E. Reiter, SimpleNLG: a realisation engine for practical applications, in: Proceedings of the 12th European Workshop on Natural Language Generation, 2009, pp. 90–93.
[23] C. Aksoy, A. Bugdayci, T. Gur, I. Uysal, F. Can, Semantic argument frequency-based multi-document summarization, in: 24th International Symposium on Computer and Information Sciences (ISCIS 2009), 2009, pp. 460–464.
[24] M.F. Porter, Snowball: A Language for Stemming Algorithms, 2001.
[25] Y. Li, Z.A. Bandar, D. McLean, An approach for measuring semantic similarity between words using multiple information sources, IEEE Trans. Knowl. Data Eng. 15 (2003) 871–882.
[26] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38 (1995) 39–41.
[27] P. Achananuparp, X. Hu, C.C. Yang, Addressing the variability of natural language expression in sentence similarity with semantic structure of the sentences, in: Advances in Knowledge Discovery and Data Mining, Springer, 2009, pp. 548–555.
[28] F. Murtagh, P. Contreras, Methods of Hierarchical Clustering, 2011, arXiv:1105.0121.
[29] S. Takumi, S. Miyamoto, Top-down vs bottom-up methods of linkage for asymmetric agglomerative hierarchical clustering, in: 2012 IEEE International Conference on Granular Computing (GrC), 2012, pp. 459–464.
[30] M. Steinbach, G. Karypis, V. Kumar, A comparison of document clustering techniques, in: KDD Workshop on Text Mining, 2000, pp. 525–526.
[31] Y. Zhao, G. Karypis, U. Fayyad, Hierarchical clustering algorithms for document datasets, Data Min. Knowl. Discov. 10 (2005) 141–168.
[32] A. El-Hamdouchi, P. Willett, Comparison of hierarchic agglomerative clustering methods for document retrieval, Comput. J. 32 (1989) 220–227.
[33] L. Suanmali, N. Salim, M.S. Binwahlan, Fuzzy Logic Based Method for Improving Text Summarization, 2009, arXiv:0906.4690.
[34] M.A. Fattah, F. Ren, GA, MR, FFNN, PNN and GMM based models for automatic text summarization, Comput. Speech Lang. 23 (2009) 126–144.
[35] J.-M. Lim, I.-S. Kang, J. Bae, J.-H. Lee, Sentence extraction using time features in multi-document summarization, in: Information Retrieval Technology, Springer, 2005, pp. 82–93.
[36] G. Salton, Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer, Addison-Wesley, 1989.
[40] C.-Y. Lin, ROUGE: a package for automatic evaluation of summaries, in: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, 2004, pp. 74–81.
[41] A. Nenkova, R. Passonneau, Evaluating Content Selection in Summarization: The Pyramid Method, 2004.