International Journal of Computer Science & Information Technology (IJCSIT) Vol 8, No 5, October 2016

AN OVERVIEW OF EXTRACTIVE BASED AUTOMATIC TEXT SUMMARIZATION SYSTEMS

Kanitha.D.K¹ and D. Muhammad Noorul Mubarak²
¹ Computational Linguistics, Department of Linguistics, University of Kerala,
Kariavattom, Thiruvananthapuram.
² Department of Computer Science, University of Kerala, Kariavattom,
Thiruvananthapuram.

ABSTRACT:
The growing availability of online information creates a need for efficient text summarization
systems. Text summarization systems follow either extractive or abstractive methods. In extractive
summarization, the important sentences are selected from the original text on the basis of sentence
ranking methods. An abstractive summarization system understands the main concepts of a text and
expresses the overall idea of the topic. This paper surveys existing extractive text summarization
models. Numerous algorithms are studied and their evaluations are explained. The main purpose is to
observe the peculiarities of existing extractive summarization models and to find a good approach
for building a new text summarization system.

KEYWORDS:
Text summarization, Abstractive summarization, Extractive summarization, Statistical methods, Latent
semantic analysis.

1. INTRODUCTION
A large amount of text material on almost any topic is available on the internet. A user has to
search through many web pages to find the relevant information, which takes time and effort. An
efficient summarizer generates a summary of a document within a limited time. Mani and Maybury
(1999) defined automatic text summarization as the process of distilling the most important
information from a source (or sources) to produce an abridged version for a particular user (or
users) and task (or tasks) [26].

Text summarization methods can be classified into abstractive and extractive summarization
(Hahn, U. and Mani, I. 2000) [15]. Abstractive summarization uses natural language generation
techniques: the system understands the original document and retells it in a few words, as a human
summarizer would. Extractive summarization selects the important sentences, paragraphs etc. from
the original document and concatenates them into a shorter form. The sentences are extracted on
the basis of statistical, heuristic and linguistic methods. Most text summarization systems use
extractive summarization based on statistical and algebraic methods, which generate an accurate
summary for large datasets and give an overall view of the document. Abstractive summarization
approaches are more complex than extractive ones.

This paper primarily aims to examine the efficiency of summarization methods. It is organized as
follows. Section 1 gives a brief introduction to text summarization techniques. Section 2 describes
the existing models that focus on extractive techniques. Section 3 discusses the advantages and
disadvantages of each method. Section 4 describes some of the standards for evaluating summaries
automatically and Section 5 concludes the paper.

DOI:10.5121/ijcsit.2016.8503

2. EARLY WORK ON TEXT SUMMARIZATION

Work on text summarization systems started in the 1950s. Most of the early work focused on single
document summarization of technical articles. Owing to the lack of powerful computers, early
summarization systems considered only simple surface level features of sentences such as word
frequency, position and sentence length. In the 1970s Artificial Intelligence technology developed
and most summarization systems came to depend on it. The AI based summarization systems are domain
dependent.

In the 1980s some summarization systems were developed on the basis of cognitive science theory. In
the 1990s information retrieval methods were used for domain independent summarization. The IR
techniques do not consider synonymy and polysemy. From 1995 machine learning techniques were
developed and became widely used in summarization systems. The machine learning algorithms include
the Bayesian classifier, hidden Markov model, log-linear model, neural network etc. Now statistical
and mathematical techniques are widely used for extractive text summarization. These technological
developments and their advantages and disadvantages are summarized in Table 1.

Table 1: Technological Developments in Text Summarization

| Year | Methods | Advantages | Disadvantages | Models |
|------|---------|------------|---------------|--------|
| 1958 | Simple surface level features of sentences. | The sentences which include the most frequent words are selected as summary sentences. | Duplication in summary sentences. | Luhn, 1958 [25]; Edmundson, 1969 [11] etc. |
| 1970 | Artificial Intelligence. | Frames or templates are used to identify the conceptual relation of entities and extract the relation between entities by an assumption. | Only limited frames or templates may lead to incomplete analysis of conceptual entities. | Azzam, Humphreys, and Gaizauskas, 1999 [2]; DeJong, 1979 [10]; Graesser, 1981; McKeown and Radev, 1995 [27]; Schank and Abelson, 1977 [32]; Young and Hayes, 1985 [35] etc. |
| 1980 | Cognitive science theories | The system can overcome redundancy to some extent. Extracts the representative sentences from the source text. | Complex task and limited to a specified area. | Rinehart, S. D., Stahl, S. A., Erikson, L.G. 1986 [31]; Jones, R. C. 2006 [19]; Johnston, P. H. 1983 [18] etc. |
| 1990 | Information retrieval techniques | Generates significant sentences from the source text in the same way as information retrieval techniques. | Doesn't consider semantic aspects such as synonymy and polysemy. | Aone, Okurowski, Gorlinsky, & Larsen, 1997 [1]; Goldstein, Kantrowitz, Mittal, & Carbonell, 1999 [13]; Hovy & Lin, 1997 [16] etc. |
| 1995 | Machine learning techniques | Different machine learning algorithms are used and provide a more generalized summary. | Computationally complex and lacking semantic analysis of the source text. | Kupiec. J, Pedersen. J and Chen. F. (1995) [21]; Conroy, J. M. & O'Leary, D. P. 2001 [9]; Osborne, M. 2002 [28] etc. |
| 1997 | Statistical and algebraic methods | Depend on some heuristics, linguistics and mathematical techniques. Easy to implement. | No syntactic analysis of the source text. | Gong and Liu, 2001 [14]; Steinberger, J. and Jezek, K. 2004 [33] etc. |

2.1 SOME MODELS IN EXTRACTIVE TEXT SUMMARIZATION

2.1.1 LUHN METHOD (1958)[25]

Luhn created the first automatic text summarizer, designed to summarize technical articles. He
ranked each sentence in the document on the basis of word frequency and phrase frequency. Word
frequencies are calculated after stemming and stop word removal. He stated that word frequency is
a useful measure of the significance factor of a sentence. All sentences are ranked by their
significance factor, and the top ranked sentences are selected as summary sentences.
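
The frequency-based ranking described above can be sketched roughly as follows. This is a simplified illustration and not Luhn's exact significance factor: the stop-word list, the naive sentence splitter, the absence of stemming, and the use of average word frequency as the sentence score are all assumptions made for brevity, and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to", "it"}


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption for brevity)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed (no stemming here)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(text, k=1):
    """Return the k sentences whose content words are most frequent in the document."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    scored = []
    for index, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Average document frequency of the sentence's content words.
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, index))
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:k]
    # Emit the selected sentences in their original document order.
    return [sentences[i] for _, i in sorted(top, key=lambda item: item[1])]
```

Averaging rather than summing keeps long sentences from winning purely on length; that normalization is a design choice of this sketch.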

2.1.2 BAXENDALE MODEL (1958)[6]

Baxendale proposed a position method for sentence extraction, arguing that significant sentences
tend to occur in fixed positions. The author examined 200 paragraphs in newspaper articles and
found that in 85% of the paragraphs the topic sentence came first and in 7% it came last. The
conclusion was that in newspaper articles the first sentence of each paragraph has a high chance
of being included in the summary. In 1997 Lin and Hovy claimed that the Baxendale position method
is not suitable for sentence extraction across different domains, because the discourse structure
of text varies between domains.

2.1.3 EDMUNDSON METHOD (1969)[11]

Edmundson developed a new method for automatic summarization. It scores each candidate sentence
by combining several features: keywords, cue phrases, title, heading and sub-heading words, and
sentence location. These sentence scoring parameters are used to extract the top ranked sentences.
Stop words are removed from the source document. Sentences that include cue words such as
"conclusion" or "according to the study" get a high score. The method also gives a high score to
sentences that contain title, heading and sub-heading words. Through the location feature,
conclusion sentences in technical documents and the first and last sentences in newspaper
articles get a high score. The score of each sentence is computed as follows:

$S_i = w_1 C_i + w_2 K_i + w_3 T_i + w_4 L_i$    (1)

where $S_i$ is the score of sentence $i$; $C_i$, $K_i$ and $T_i$ are the scores of sentence $i$
based on the number of cue phrases, keywords and title words; $L_i$ is its location score in the
document; and $w_1$, $w_2$, $w_3$ and $w_4$ are the weights for the linear combination of the four
scores.
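
As a minimal sketch of equation (1), the linear combination can be written as below. The feature values are assumed to be computed elsewhere, the default weights of 1.0 are placeholders rather than values from the paper, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SentenceFeatures:
    cue: float       # C_i: cue-phrase score
    key: float       # K_i: keyword score
    title: float     # T_i: title/heading word score
    location: float  # L_i: location score


def edmundson_score(f, w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    """Equation (1): S_i = w1*C_i + w2*K_i + w3*T_i + w4*L_i."""
    return w1 * f.cue + w2 * f.key + w3 * f.title + w4 * f.location


def rank_sentences(features, **weights):
    """Return sentence indices sorted from highest to lowest score.

    Ties are broken in favour of the earlier sentence (an assumption).
    """
    scores = [edmundson_score(f, **weights) for f in features]
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))
```

For example, `rank_sentences(features, w4=3.0)` would emphasise the location feature relative to the other three.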


2.1.4 TRAINABLE DOCUMENT SUMMARIZER (KUPIEC. J, PEDERSEN. J AND CHEN. F. ,1995) [21].

The Trainable Document Summarizer performs sentence extraction on the basis of several sentence
weighting features. The important features used in this summarizer are:

• Sentence length cutoff - sentences containing fewer than a pre-specified number of words
are excluded.
• Cue words and phrases - sentences containing them are included.
• Paragraph position - the first sentence in each paragraph is included.
• Thematic words - sentences containing the most frequent words are included.

The sentences are ranked on the basis of the above features and the highest scoring sentences
are selected as summary sentences.

2.1.5 ANES (BRANDOW, MITZE AND RAU 1995)[8]

The ANES text extraction system is a domain-independent system for summarizing news articles.
The process of summary generation has four major steps:

a. Calculate the tf*idf weights for all terms.
b. Select as signature words the terms with a high tf*idf weight plus the headline words.
c. Score each sentence by summing the weights of all its signature words and adding the relative location score.
d. Select the highest scoring sentences as summary sentences.
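
The four steps above can be sketched as follows. This is only an illustration under stated assumptions: the background corpus used for idf, the smoothed idf formula, the number of signature words, and the 1/(position+1) location score are not taken from the ANES paper, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(document, corpus):
    """Step a: tf*idf for each term of `document` against a background `corpus`."""
    tf = Counter(tokens(document))
    doc_token_sets = [set(tokens(doc)) for doc in corpus]
    n_docs = len(corpus)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for token_set in doc_token_sets if term in token_set)
        # Smoothed idf (an assumption) so unseen terms do not divide by zero.
        weights[term] = count * math.log((1 + n_docs) / (1 + df))
    return weights


def anes_like_scores(sentences, headline, corpus, n_signature=5):
    """Steps b and c: signature words, then per-sentence scores."""
    weights = tfidf_weights(" ".join(sentences), corpus)
    top_terms = sorted(weights, key=lambda t: (-weights[t], t))[:n_signature]
    signature = set(top_terms) | set(tokens(headline))
    scores = []
    for position, sentence in enumerate(sentences):
        word_score = sum(weights.get(t, 0.0) for t in tokens(sentence) if t in signature)
        location_score = 1.0 / (position + 1)  # earlier sentences score higher (assumption)
        scores.append(word_score + location_score)
    return scores
```

Step d then amounts to sorting the sentences by these scores and keeping the top few.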

2.1.6 BARZILAY & ELHADAD SYSTEM, 1997[4]

Barzilay and Elhadad developed a summarizer based on the lexical chain method. Sentences are
extracted using collections of related words that form lexical chains. The concept of the
lexical chain was introduced by Morris and Hirst (1991). A lexical chain links semantically
related terms across different parts of the source document. Barzilay and Elhadad used WordNet
to construct the lexical chains.

2.1.7 BOGURAEV, BRANIMIR & KENNEDY (BOGURAEV, BRANIMIR AND CHRISTOPHER KENNEDY, 1997)[7]

The authors developed a single document, domain independent system. Linguistic techniques are
used to identify the main topic. Sentences are selected on the basis of noun phrases, title
words and relatedness to the topic.

2.1.8 FOCISUM (KAN, MIN-YEN AND KATHLEEN MCKEOWN (1999))[20]

The summarization system follows a question answering approach. It is a two stage system: it
first takes a question, then summarizes the source text so as to answer that question. The system
first uses a named entity extractor to find the important terms of the document. It also uses
existing information extraction features of sentences such as word frequency and type of terms.
The result is a concatenation of sentence fragments and phrases found in the original document.


2.1.9 SUMMARIST (HOVY AND LIN 1999 )[16]

Lin and Hovy (1997) studied the importance of the sentence position method proposed by Baxendale
(1958). In 1999, Lin and Hovy developed a machine learning model for summarization using
decision trees instead of a naive Bayes classifier. The Summarist system produces summaries of
web documents, and provides both abstractive and extractive summaries. Summarist first identifies
the main topics of the document using chains of lexically connected sentences. WordNet and
dictionaries are used to identify the lexically connected sentences. Statistical features such as
position, cue phrases, numerical data, proper names and word frequency are used for the
extractive summary.

2.1.10 MULTIGEN (BARZILAY, MCKEOWN AND ELHADAD, 1999)[4]

MultiGen is a multi document summarization system. It identifies similarities and differences
across documents by applying statistical techniques, and extracts high weight sentences that
represent key portions of the information in a set of related documents. This is done by applying
a machine learning algorithm to group paragraph sized chunks of text into clusters on related
topics. Sentences from these clusters are parsed and the resulting trees are merged to form
logical representations of the commonly occurring concepts. Matching concepts are selected on
the basis of linguistic knowledge such as stemming, part-of-speech, synonymy and verb classes.

2.1.11 CUT AND PASTE SYSTEM (JING, HONGYAN AND KATHLEEN MCKEOWN. 2000)[20]

The Cut and Paste system is designed to identify the key concepts of the sentences. These key
concepts are then combined to form new sentences: the system copies the surface form of the key
concepts and pastes them into the summary sentences. The key concepts are identified using
probabilities learnt from a training corpus and lexical links.

2.1.12 CONROY ET AL. (CONROY, J. M. & O'LEARY, D. P., 2001)[9]

The work presented by Conroy, J. M. & O'Leary, D. P. uses an HMM (Hidden Markov Model) in which
the probability of including a sentence in the summary depends on whether the previous sentence
is related to it. Sentences are classified into two states: summary sentences and non-summary
sentences. Lexically connected sentences are selected as summary sentences.

2.1.13 SWESUM( HERCULES DALIANIS., 2000)[34]

SweSum creates summaries of Swedish or English texts from either the newspaper or academic
domains. Sentences are extracted according to weighted word level features. It uses statistical,
linguistic and heuristic methods to generate the summary. The methods are Baseline, First
sentence, Title, Word frequency, Position score, Sentence length, Proper names, Numerical data
etc. Because the processed texts are newspaper articles, the first sentences of paragraphs get a
high score. In the Baseline method each sentence is scored 1/n, where n is its line number. The
system builds a combination function over the above parameters and extracts the required summary
sentences.
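
A rough sketch of the Baseline score and of a weighted combination of features is given below. The text only states that a combination function is built over the parameters, so the weighted sum used here, the feature names, and the weights are assumptions for illustration, not SweSum's actual function.

```python
def baseline_scores(num_sentences):
    """Baseline method: the sentence on line n (1-based) is scored 1/n."""
    return [1.0 / n for n in range(1, num_sentences + 1)]


def combine(feature_scores, weights):
    """Weighted sum of per-sentence feature scores (the weighted sum is an assumption).

    feature_scores maps a feature name to a list holding one score per sentence;
    weights maps a feature name to its weight (missing names count as 0).
    """
    lengths = {len(scores) for scores in feature_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every feature needs exactly one score per sentence")
    n = lengths.pop()
    return [
        sum(weights.get(name, 0.0) * scores[i] for name, scores in feature_scores.items())
        for i in range(n)
    ]
```

For instance, combining the Baseline scores with a hypothetical title-word score lets a later sentence outrank the first one when it shares words with the title.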


2.1.14 MEAD (RADEV, H. Y. JING, M. STYS AND D. TAM, 2001)[30]

MEAD computes the score of a sentence on the basis of a centroid score. The centroid score is
formed from tf-idf values, similarity to the first sentence of the document, the position of
the sentence in the document, sentence length etc. The highest ranked sentences are selected as
summary sentences. The summarizer produces single and multi document summaries.

2.1.15 WEBINESSENCE (RADEV, 2001)[30]

This system is an improved version of the MEAD summarizer. It is a web based summarizer for web
pages. Its architecture has two stages. In the first stage the system collects URLs from
different web pages and extracts the news articles on the same event. The second stage clusters
the data from the different documents. A centroid algorithm is used to find the representative
sentences, repetition is removed and a final summary is generated.

2.1.16 TEXT SUMMARIZATION USING TERM WEIGHTS (R.C.BALABANTARY, D.K.SAHOO, B.SAHOO, M.SWAIN. 2012)[5]

The authors developed a statistical approach to summarizing the source text. The sentences are
split into tokens and stop words are removed. A weight is then assigned to each individual term,
calculated as the frequency of the term in the sentence divided by the frequency of the term in
the document. An additional score is added to the weight of terms that appear in bold, italic,
underlined or any combination of these. Each sentence is then ranked by its weight, calculated
as the weight of its individual terms divided by the total number of terms in the sentence.
Finally, the highest ranked sentences, together with the first sentence of the first paragraph
of the input text, are extracted to generate the summary.
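
The term-weight scoring described above might look roughly like the sketch below. It assumes sentences are already tokenized with stop words removed, reads the sentence weight as the sum of its term weights divided by its token count, and uses a flat bonus for emphasized terms; the size of that bonus and the function name are hypothetical, since the text does not specify them.

```python
from collections import Counter


def term_weight_scores(sentences, emphasized=frozenset(), bonus=1.0):
    """Score sentences by term weights.

    sentences: list of token lists (stop words already removed).
    emphasized: terms that appeared in bold, italic or underlined text.
    bonus: extra weight for emphasized terms (an assumed value).
    """
    doc_freq = Counter(term for sentence in sentences for term in sentence)
    scores = []
    for sentence in sentences:
        if not sentence:
            scores.append(0.0)
            continue
        total = 0.0
        for term, count in Counter(sentence).items():
            # Frequency in the sentence divided by frequency in the document.
            weight = count / doc_freq[term]
            if term in emphasized:
                weight += bonus
            total += weight
        # Normalize by the number of tokens in the sentence.
        scores.append(total / len(sentence))
    return scores
```

Under this reading, terms that occur only in one sentence contribute a weight of 1, so sentences made up of document-wide common terms score lower.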

2.1.17 LSA FOR DOCUMENT SUMMARIZATION [22]

LSA is a technique for extracting the hidden semantic representation of terms, sentences, or
documents (Landauer & Dumais, 1997). It is an unsupervised method that extracts the semantics of
terms by examining the co-occurrence of words. The first step is to represent the input document
as a word by sentence matrix $A$. Each row represents a word from the document and each column
represents a sentence, so $A$ is an $m \times n$ matrix for $m$ words and $n$ sentences. The
Singular Value Decomposition (SVD) from linear algebra is applied to $A$. The SVD of an
$m \times n$ matrix is defined as $A = U \Sigma V^T$. $U$ is an $m \times n$ matrix of real
numbers, $\Sigma$ is a diagonal $n \times n$ matrix, and $V^T$ is an $n \times n$ matrix in which
each row corresponds to a topic and each column to a sentence. Gong and Liu (2001) [14] proposed
an LSA method for document summarization that recognizes the important topics in the document
without the use of WordNet. For each row of $V^T$ they select the sentence with the highest
value. Steinberger and Jezek (2004) [33] proposed an improved method for document summarization.
Murray, Renals and Carletta (2005) proposed an approach for summarizing meeting recordings using
LSA. Yeh, Ke, Yang and Meng (2005) [17] proposed text summarization using a trainable summarizer
and latent semantic analysis; in this approach sentence ranking depends on a graph based method
and an LSA based method.
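
The selection step attributed to Gong and Liu above can be sketched with NumPy as follows. This is an illustration under assumptions: raw term counts fill the matrix, magnitudes are compared because the sign of a singular vector is arbitrary, an already chosen sentence is skipped in favour of the next best one, and the function names are hypothetical.

```python
import numpy as np


def build_term_sentence_matrix(sentences):
    """Build the m x n word-by-sentence matrix A from a list of token lists."""
    vocab = sorted({term for sentence in sentences for term in sentence})
    row_of = {term: i for i, term in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(sentences)))
    for j, sentence in enumerate(sentences):
        for term in sentence:
            matrix[row_of[term], j] += 1.0  # raw term count (an assumption)
    return matrix, vocab


def gong_liu_select(matrix, k):
    """Pick one sentence index per row of V^T, for the k largest singular values."""
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    chosen = []
    for row in vt[:k]:
        # Singular vector signs are arbitrary, so rank columns by magnitude.
        for j in np.argsort(-np.abs(row)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return chosen
```

`numpy.linalg.svd` returns singular values in descending order, so the first rows of `vt` correspond to the most prominent topics.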

2.1.18 POURVALI AND ABADEH MOHAMMAD (2012) [29]

The authors' approach is based on the lexical chains method, and the exact meaning of each word
in the text is determined using WordNet and Wikipedia. The score of a sentence is determined by
the number and type of relations in the chains. The sentences with the highest chain scores are
selected as the final summary sentences.

2.1.19 S.T. KHUSHBOO, R.V.D.DHARASKAR AND M.B.CHANDAK (2010)[13]

They proposed a method based on graph algorithms for text summarization. The method constructs
a graph from the source text: the nodes represent sentences and the edges represent the semantic
relations between sentences. The weight of each node is calculated and the highest ranking
sentences are selected for the final summary.
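
One common way to realise such a sentence graph is sketched below. The text does not say how edges or node weights are computed, so this sketch swaps in Jaccard word overlap for the semantic relation and a PageRank-style iteration for the node weight; both are assumptions, as are the function names and the damping value.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists (stand-in for a semantic relation)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(sentences):
    """Symmetric weighted adjacency matrix with no self-loops."""
    n = len(sentences)
    return [
        [jaccard(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def rank_nodes(graph, damping=0.85, iterations=50):
    """PageRank-style node weights computed by simple power iteration."""
    n = len(graph)
    if n == 0:
        return []
    scores = [1.0 / n] * n
    out_weight = [sum(row) for row in graph]
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight j -> i.
            incoming = sum(
                scores[j] * graph[j][i] / out_weight[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores
```

Sentences that share vocabulary with many others accumulate weight, while an isolated sentence keeps only the small base score.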

2.1.20 DISCOURSE BASED SUMMARIZER (LI CHENGCHENG, 2010)[24]

The author proposed a summarizer that depends on rhetorical structure theory. The technique is
based on analysis of the discourse structure of sentences. The score of a sentence is calculated
from its relevance factor: relevant sentences get the highest weight and irrelevant sentences
get a low weight.

3. COMPARISON OF EXTRACTIVE TEXT SUMMARIZATION MODELS

| No. | Models | Criteria for sentence selection | Type of document | Level of processing | Corpus | Advantages | Disadvantages |
|-----|--------|---------------------------------|------------------|---------------------|--------|------------|---------------|
| 1 | Luhn method (1958) | Word frequency and phrase frequency. | Single | Surface | Technical articles. | The sentences with the highest word frequency are selected as summary sentences. | Duplication in summary. |
| 2 | Baxendale method (1958) | Position method. | Single | Surface | Technical documents. | It is used in systems where machine learning systems are complex. | It is related to the discourse structure of sentences, which varies between domains. |
| 3 | Edmundson method (1969) | Word frequency, cue phrases, title and heading words, sentence location. | Single | Surface | Technical documents. | Foundation for many existing extractive summarization methods. | Redundancy in the summary and computationally complex. |
| 4 | Trainable Document Summarizer (1995) | Machine learning techniques. | Single | Surface | Technical documents. | It provides a universal summary. | Machine learning techniques are computationally complex. |
| 5 | ANES (1995) | Tf*idf | Single | Surface | Domain independent | The sentences related to the main topic are included in the summary. | The summarizer doesn't handle various sub topics. |
| 6 | Barzilay & Elhadad (1997) | Lexical chain method | Single | Entity | - | Considers the semantic relationship among sentences and provides a representative summary. | It requires the deep syntactic and semantic structure of a sentence. |
| 7 | Boguraev & Kennedy (1997) | Noun phrases, title related terms | Single | Entity | - | Extracts the sentences in the same context. | Requires linguistic knowledge. |
| 8 | Focisum (1999) | Named entity recognition and information extraction techniques. | Single | Entity | News articles. | Extracts the information in the same way as a question answering system. The words that co-occur in the questions are extracted as summary. | Requires a question generator for information extraction. The summary is the result of the question. |
| 9 | Summarist (1999) | Statistical and linguistic | Single and multi document | Surface | Web documents. | Extracts the representative sentences as summary sentences. | Computationally complex method. |
| 10 | MultiGen (1999) | Syntactic analysis | Multi document | Entity | News articles from different web pages. | Generates multi document summaries. | Requires language processing tools. |
| 11 | Cut and Paste System (2000) | Statistical | Single | Surface | - | Generates a cohesive summary. | Complex method. |
| 12 | SweSum (Hercules Dalianis, 2000) | Statistical, linguistic methods | Single | Surface | News articles | Generates a representative summary. | Restricted to some specific domains. |
| 13 | Conroy, J. M. & O'Leary, D. P. 2001 | HMM | Single | Surface | News articles | Lexically related sentences. | Difficult to compute. |
| 14 | MEAD (Radev, H. Y. Jing, M. Stys and D. Tam, 2001) | Cluster based | Single and multi document | Surface | News articles | Summary from single and multiple documents. | Duplication in summary. |
| 15 | WebInEssence (Radev, 2001) | Cluster based | Single and multi document | Surface | News articles | Summarizes news articles from different web pages. | - |
| 16 | Discourse based, 2010 | Statistical and linguistic method. | - | Discourse | News articles | Linguistic analysis of the source text. | Computing all the rhetorical relations between sentences is difficult. |
| 17 | Graph based, 2010 | Statistical | Single and multi document | Surface | - | Graph based method to form the final summary. | Computationally complex. |
| 18 | Text Summarization using Term Weights (R.C.Balabantary, D.K.Sahoo, B.Sahoo, M.Swain. 2012) | Statistical based | - | Surface | - | Extracts more relevant sentences. | It generally depends on the format of the text. |
| 19 | Pourvali, 2012 | Statistical | - | Surface | - | Generates a semantic based summary. | Requires language processing tools. |
| 20 | LSA based summarization | Statistical and algebraic method | Single and multiple | Surface/Entity | News articles, technical documents, books etc. | Semantically related sentences and easy to implement. | Non availability of syntactic analysis and world knowledge. |

4. EVALUATION OF SUMMARIZATION SYSTEMS

Evaluation of summaries is an important aspect of text summarization. The existing models share
no general policy for evaluating the quality of a summarization system; authors use different
approaches to summary evaluation. In some systems the quality of a summary is determined by its
grammaticality and its relevance to the user. If the summary is satisfactory, the system summary
meets the needs of the user.

Evaluation methods can be broadly classified as intrinsic and extrinsic. Intrinsic methods
evaluate the quality of a summary against a manual summary. Extrinsic evaluation measures how
the summary affects some other task. Most summarization systems follow a combination of methods
to evaluate summary quality. Precision and recall measures are used by most extractive
summarization systems. Most systems evaluate the quality of a summary against a manual summary.
However, comparing manual summaries with system summaries is not entirely appropriate, because
the same person selects different sentences at different times and different authors choose
different sentences as summary sentences. Recently some systems follow the SEE, ROUGE and BE
methods for summary evaluation (Lin, C.Y., Hovy, E. 2003) [23].
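
For extractive systems, precision and recall against a manual summary can be computed at the sentence level, as in the minimal sketch below. It assumes both summaries are given as sets of sentence indices from the same source document; the function name is hypothetical.

```python
def precision_recall(system, reference):
    """Sentence-level precision and recall of an extractive summary.

    system: indices of sentences chosen by the summarizer.
    reference: indices of sentences chosen in the manual summary.
    """
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    # Precision: share of system sentences that the manual summary also contains.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of manual summary sentences that the system recovered.
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall
```

Because different annotators pick different sentences, the same system summary can receive different precision and recall values depending on which manual summary is used as reference.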

5. CONCLUSION
This paper examines the efficiency and accuracy of existing summarization systems. Early
summarizers concentrated mainly on simple statistical features of sentences and summarized only
technical articles. For generic summarization these systems do not produce satisfactory results.
The extractive summarization systems surveyed above follow statistical, linguistic and heuristic
methods. The statistical methods include the tf method, tf-idf method, graph based, machine
learning, lexical based, discourse based, cluster based, vector based, LSA based etc. The
statistical methods use supervised and unsupervised learning algorithms. Models based on machine
learning algorithms generate coherent and cohesive summaries, but the algorithms are
computationally complex and need large storage capacity. These algorithms overcome redundancy to
some extent and the systems are domain independent. Some systems follow statistical and
linguistic methods together. They also generate good summaries, but linguistic analysis of the
source document requires heavy machinery for language processing. The lexical based methods
require semantic dictionaries and thesauri. The discourse based methods analyze the rhetorical
structure of documents, and complete analysis of a source document is very difficult. By
contrast, the statistical and algebraic method LSA extracts semantically related sentences
without the use of WordNet or online dictionaries. LSA based systems provide a domain
independent generic summary rather than a query based summary. They summarize large datasets
within a limited time and produce satisfactory results.

REFERENCES

[1] Aone, C., Okurowski, M. E., Gorlinsky, J., & Larsen, B. (1997). A scalable summarization system
using robust NLP. In Proceedings of the ACL’97/EACL’97 workshop on intelligent scalable text
summarization (pp. 10–17), Madrid, Spain.
[2] Azzam, S., Humphreys, K. R., Gaizauskas. (1999). Using coreference chains for text summarization.
In Proceedings of the ACL’99 workshop on co-reference and its applications (pp. 77–84), College
Park, MD, USA.
[3] Baldwin., Breck., & Thomas.S.Morton. (1998). Dynamic coreference-based summarization. In
Proceedings of the Third Conference on Empirical Methods in Natural Language Processing,
Granada, Spain,June.
[4] Barzilay, R., & Elhadad, M. (1997). Using lexical chains for text summarization. In Proceedings
ISTS'97.
[5] Balabantary.R.C., Sahoo.D.K., Sahoo.B., & Swain.M (2012). “Text summarization using term
weights”, International Journal of computer applications, Volume 38-No.1
[6] Baxendale, P. B. (1958). ‘Machine-made index for technical literature: an experiment’. IBM Journal,
354–361.
[7] Boguraev, Branimir, & Christopher Kennedy. (1997). Salience-based content characterisation of text
documents. In Proceedings of the ACL'97 Workshop on Intelligent, Scalable Text Summarization, pages
2-9, Madrid, Spain.
[8] Brandow, Ronald, Karl Mitze, & Lisa F. Rau. (1995). Automatic condensation of electronic
publications by sentence selection. Information Processing and Management, 31(5):657-688.

[9] Conroy, J. M., & O'Leary, D. P. (2001). Text summarization via hidden Markov models and pivoted
QR matrix decomposition. Tech. Rep., University of Maryland, College Park.
[10] DeJong, G.F. (1979). Skimming stories in real time: an experiment in integrated understanding.
Doctoral Dissertation. Computer Science Department, Yale University.
[11] Edmundson, H.P. (1969). New Methods in Automatic Extracting, Journal of the ACM, 16(2):264-285.
[12] Eduard Hovy., (2003) The Oxford Handbook of Computational Linguistics, Oxford University Press,
Oxford, chapter 32.
[13] Goldstein, J., Kantrowitz, M., Mittal, V., & Carbonell, J. (1999). Summarizing text documents:
sentence selection and evaluation metrics. In Proceedings of the 22nd annual international ACM
SIGIR conference on research and development in information retrieval (SIGIR’99) (pp. 121–128),
Berkeley, CA, USA.
[14] Gong, Y., & Liu, X. (2001). Generic text summarization using relevance measure and latent semantic
analysis. In Proceedings of the Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, pages 19–25.
[15] Hahn, U., & Mani, I. (2000). The challenges of automatic summarization.
[16] Hovy, E., & Lin, C.Y. (1997). Automatic text summarization in SUMMARIST. In Proceedings of the
ACL’97/EACL’97 workshop on intelligent scalable text summarization (pp. 18–24), Madrid, Spain.
[17] Jen-Yuan Yeh, Hao-Ren Ke, Wei-Pang Yang, & I-Heng Meng. (2005). Text summarization using a
trainable summarizer and Latent Semantic Analysis.
[18] Johnston, P. H. (1983). Reading Comprehension Assessment: A Cognitive Basis. Newark, Delaware:
IRA.
[19] Jones, R. C. (2006). Reading quest: Summarizing strategies for reading comprehension. TESOL
Quarterly, 8, 48-69.
[20] Kan., Min-Yen., & Kathleen McKeown. (1999). Information extraction and summarization: Domain
independence through focus types. Technical report, Computer Science Department, Columbia
University, New York.
[21] Kupiec, J., Pedersen, J., & Chen, F. (1995). A trainable document summarizer. In Proceedings of the
18th annual international ACM SIGIR conference on research and development in information
retrieval (SIGIR’95) (pp. 68–73), Seattle, WA, USA.
[22] Landauer, T., & Dumais, S. T. (1997). A solution to Plato’s problem: The Latent Semantic Analysis
theory of the acquisition, induction, and representation of knowledge, Psychological Review, 104:
211-240.
[23] Lin, C.Y., & Hovy, E. (2003). “Automatic Evaluation of Summaries Using N-gram Co-Occurrence Statistics”, in
2003 Language Technology Conference, Edmonton, Canada.
[24] Li Chengcheng, .(2010).“Automatic text summarization based on rhetorical structure theory”,
Computer application system modeling, 2010 International Conference.
[25] Luhn, H.P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and
Development, 2:159–165.
[26] Mani, I., & Maybury, M.T. (Eds.) (1999). Advances in automated text summarization. Cambridge,
US: The MIT Press.
[27] McKeown, K., & Radev, D. R. (1995). Generating summaries of multiple news articles. In
Proceedings of the 18th annual international ACM SIGIR conference on research and development in
information retrieval (SIGIR’95) (pp. 74–82), Seattle, WA, USA.
[28] Osborne, M., (2002). Using maximum entropy for sentence extraction. In Proceedings of the Acl-02,
Workshop on Automatic Summarization, Volume 4 (Philadelphia Pennsylvania), Annual Meeting of
the ACL, Association for Computational Linguistics, Morristown.
[29] Pourvali, M., & Abadeh Mohammad, S. (2012). Automated text summarization base on lexical chain
and graph using of word net and wikipedia knowledge base. IJCSI International Journal of Computer
Science Issues, Vol. 9, No. 3.
[30] Radev, D. R., Jing, H., Stys, M., & Tam, D. (2004). Centroid-based summarization of multiple documents.
Information Processing and Management, 40: 919-938.
[31] Rinehart, S. D., Stahl, S. A., & Erikson, L.G. (1986). Some effects of summarization training on
reading. Reading Research Quarterly, 21(4), 422-435.
[32] Schank, R., & Abelson, R. (1977). Scripts, Plans, Goals, and Understanding. Hillsdale, NJ: Lawrence
Erlbaum Associates.
[33] Steinberger, J., & Jezek, K. (2004). Using latent semantic analysis in text summarization and
summary evaluation. Proceedings of ISIM’04, pages 93-100.
[34] Dalianis, H. SweSum - A Text Summarizer for Swedish. NADA-KTH, SE-100 44 Stockholm,
Sweden.
[35] Young, S. R., & Hayes, P. J. (1985). Automatic classification and summarization of banking telexes.
In Proceedings of the 2nd Conference on Artificial Intelligence Applications (pp. 402–408).
