
Text Summarization: Kareem El-Sayed Hashem Mohamed Mohsen Brary

The document discusses text summarization using machine learning techniques. It describes the goal of text summarization as reducing a text to create a summary that retains the most important points. It also discusses single and multiple document summarization, extractive versus abstractive summarization, and supervised and unsupervised methods for content selection. Key algorithms mentioned include TF-IDF, LLR, MMR, and ROUGE for evaluation.

Uploaded by

Sujal Chawala
Copyright
© Attribution Non-Commercial (BY-NC)

Text Summarization - Machine Learning

TEXT SUMMARIZATION

Kareem El-Sayed Hashem Mohamed Mohsen Brary

TEXT SUMMARIZATION

Goal: reducing a text with a computer program in order to create a summary that retains the most important points of the original text.

Summarization Applications
  Summaries of email threads
  Action items from a meeting
  Simplifying text by compressing sentences

WHAT TO SUMMARIZE? SINGLE VS. MULTIPLE DOCUMENTS

Single Document Summarization

Given a single document, produce:
  Abstract
  Outline
  Headline

Multiple Document Summarization

Given a group of documents, produce a gist of their content:
  A series of news stories about the same event
  A set of web pages about some topic or question

QUERY-FOCUSED SUMMARIZATION & GENERIC SUMMARIZATION

Generic Summarization

Summarize the content of a document

Query-focused Summarization

Summarize a document with respect to an information need expressed in a user query
A kind of complex question answering:
  Answer a question by summarizing a document that has the information to construct the answer

SUMMARIZATION FOR QUESTION ANSWERING:

Snippets

Create snippets summarizing a web page for a query

Multiple Documents

Create answers to complex questions by summarizing multiple documents:
  Instead of giving a snippet for each document,
  create a cohesive answer that combines information from each document

EXTRACTIVE SUMMARIZATION & ABSTRACTIVE SUMMARIZATION

Extractive Summarization:

Create the summary from phrases or sentences in the source document(s)

Abstractive Summarization

Express the ideas in the source document using different words

SUMMARIZATION: THREE STAGES


Content Selection: choose sentences to extract from the document
Information Ordering: choose an order in which to place them in the summary
Sentence Realization: clean up the sentences

UNSUPERVISED CONTENT SELECTION

Intuition Dating Back to Luhn (1958):

Choose sentences that have distinguished or informative words


Two Approaches to Defining Distinguished Words

tf-idf: weigh each word w_i in document j by its tf-idf score

Topic signature: choose a smaller set of distinguished words
  Log-likelihood ratio (LLR)
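As a rough illustration of the tf-idf option, the sketch below weighs each word of one document by its term frequency times an inverse document frequency. The variant tf × log(N / df) is an assumption (the slides do not give the exact formula), as are the function name `tfidf_weights` and the toy corpus.

```python
import math
from collections import Counter


def tfidf_weights(doc_tokens, corpus):
    """Return {word: tf * idf} for one document.

    Assumed variant: tf is the raw count in the document and
    idf = log(N / df), with N documents in the corpus and df the
    number of documents that contain the word.
    """
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        # A word absent from the corpus gets weight 0 instead of dividing by zero.
        weights[word] = count * math.log(n_docs / df) if df else 0.0
    return weights


# Hypothetical toy corpus; the first entry is the document being summarized.
corpus = [
    ["water", "spinach", "is", "a", "vegetable"],
    ["a", "cat", "is", "a", "pet"],
    ["a", "dog", "is", "loyal"],
]
weights = tfidf_weights(corpus[0], corpus)
```

Words that occur in every document of the toy corpus, such as "a" and "is", end up with weight 0, while words specific to the first document score higher.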

TOPIC SIGNATURE-BASED CONTENT SELECTION WITH QUERIES

Choose words that are informative either
  by log-likelihood ratio (LLR), or
  by appearing in the query

Weigh a sentence by the weights of its words
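One possible reading of this scheme is sketched below: a word gets weight 1 if it is in the query or if its log-likelihood ratio against a background corpus exceeds a cutoff, and a sentence is weighed by the average weight of its words. The binomial form of the LLR, the cutoff of 10, the extra check that the word is over-represented in the input, and all function names are assumptions, not details confirmed by the slides.

```python
import math
from collections import Counter


def _log_likelihood(p, k, n):
    """Binomial log-likelihood of k hits in n trials, treating 0 * log(0) as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def llr(k1, n1, k2, n2):
    """-2 log(lambda) for a word seen k1 times in n1 input tokens
    and k2 times in n2 background tokens."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)  # shared rate under the null hypothesis
    return 2.0 * (_log_likelihood(p1, k1, n1) + _log_likelihood(p2, k2, n2)
                  - _log_likelihood(p, k1, n1) - _log_likelihood(p, k2, n2))


def word_weight(word, doc_counts, bg_counts, query_words, threshold=10.0):
    """1.0 for query words and topic-signature words, else 0.0 (assumed cutoff)."""
    if word in query_words:
        return 1.0
    n1, n2 = sum(doc_counts.values()), sum(bg_counts.values())
    k1, k2 = doc_counts[word], bg_counts[word]
    # LLR is two-sided, so also require the word to be more frequent in the input.
    overrepresented = k1 / n1 > k2 / n2
    return 1.0 if overrepresented and llr(k1, n1, k2, n2) > threshold else 0.0


def sentence_weight(sentence_words, doc_counts, bg_counts, query_words):
    """Average word weight over the words of the sentence."""
    if not sentence_words:
        return 0.0
    total = sum(word_weight(w, doc_counts, bg_counts, query_words)
                for w in sentence_words)
    return total / len(sentence_words)


# Hypothetical counts: input document vs. a larger background corpus.
doc_counts = Counter({"spinach": 10, "the": 90})
bg_counts = Counter({"spinach": 1, "the": 999})
weight = sentence_weight(["the", "spinach"], doc_counts, bg_counts, query_words=set())
```

In this toy setting "spinach" passes the cutoff and "the" does not, so the two-word sentence gets weight 0.5.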

SUPERVISED CONTENT SELECTION

Given
  A labeled training set of good summaries for each document

Align
  The sentences in the document with the sentences in the summary

Extract Features
  Position
  Length of sentence
  Word informativeness
  Cohesion

Train
  A binary classifier (put the sentence in the summary? yes or no)

Problems
  Hard to get labeled training data
  Alignment is difficult
  Performance is not better than that of unsupervised algorithms
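The align and extract-features steps could be sketched roughly as follows. The alignment rule (Jaccard word overlap of at least 0.5 with some summary sentence), the exact feature definitions, and the name `build_training_examples` are all assumptions made for illustration; the cohesion feature is left out, and the resulting examples could be fed to any binary classifier.

```python
def jaccard(a, b):
    """Word-set overlap between two tokenized sentences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_training_examples(doc_sentences, summary_sentences,
                            informative_words, align_threshold=0.5):
    """Return a list of (features, label) pairs, one per document sentence.

    label is 1 if the sentence aligns with some summary sentence
    (assumed rule: Jaccard overlap >= align_threshold), else 0.
    """
    examples = []
    last_index = max(len(doc_sentences) - 1, 1)
    for i, sent in enumerate(doc_sentences):
        label = int(any(jaccard(sent, s) >= align_threshold
                        for s in summary_sentences))
        features = {
            # Relative position: 0.0 for the first sentence, 1.0 for the last.
            "position": i / last_index,
            "length": len(sent),
            # Share of words in the sentence that are considered informative.
            "informativeness": (sum(w in informative_words for w in sent) / len(sent)
                                if sent else 0.0),
        }
        examples.append((features, label))
    return examples


# Hypothetical toy document and human summary, already tokenized.
doc = [
    ["water", "spinach", "is", "a", "vegetable"],
    ["it", "grows", "in", "water"],
    ["people", "like", "it"],
]
summary = [["water", "spinach", "is", "a", "leaf", "vegetable"]]
examples = build_training_examples(doc, summary, {"spinach", "vegetable"})
```

Only the first sentence of the toy document aligns with the summary, so it is the single positive example.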

EVALUATING SUMMARIES: ROUGE

ROUGE: Recall-Oriented Understudy for Gisting Evaluation

Intrinsic metric for automatically evaluating summaries
  Based on BLEU (a metric used for machine translation)
  Not as good as human evaluation, but much more convenient

EVALUATING SUMMARIES: ROUGE

Given a document D and an automatic summary X:
  Have N humans produce a set of reference summaries of D
  Run the system, giving automatic summary X
  What percentage of the bigrams from the reference summaries appear in X?

EXAMPLE

Human 1: water spinach is a green leafy vegetable grown in the tropics.
Human 2: water spinach is a semi-aquatic tropical plant grown as a vegetable.
Human 3: water spinach is a commonly eaten leaf vegetable of Asia.

System: water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.

ROUGE-2 = 12/28 = 0.43
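A minimal sketch of this bigram recall, applied to the water spinach example: for each reference, count how many of its bigrams also occur in the system summary (clipped to the system's count), then divide by the total number of reference bigrams. The tokenizer (lowercasing, keeping hyphenated words as one token) and the function names are assumptions; official ROUGE tooling preprocesses text differently. With this tokenization the three references contain 29 bigrams, 12 of which are matched (about 0.41), so the exact ratio depends on how the bigrams are counted.

```python
import re
from collections import Counter


def bigrams(text):
    """Count the bigrams of a text after simple lowercase tokenization."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def rouge_2(references, system_summary):
    """Return (matched, total): matched reference bigrams and all reference bigrams."""
    sys_bigrams = bigrams(system_summary)
    matched = total = 0
    for ref in references:
        ref_bigrams = bigrams(ref)
        total += sum(ref_bigrams.values())
        # Clip each match to the number of times the system produced the bigram.
        matched += sum(min(count, sys_bigrams[bg])
                       for bg, count in ref_bigrams.items())
    return matched, total


references = [
    "water spinach is a green leafy vegetable grown in the tropics.",
    "water spinach is a semi-aquatic tropical plant grown as a vegetable.",
    "water spinach is a commonly eaten leaf vegetable of Asia.",
]
system = "water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."

matched, total = rouge_2(references, system)
recall = matched / total
```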

ANSWERING HARDER QUESTIONS: QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION

The (bottom-up) snippet method
  Find a set of relevant documents
  Extract informative sentences from the documents
  Order and modify the sentences into an answer

The (top-down) information extraction method
  Build specific answers for different question types:
    Definition questions
    Biography questions
    Certain medical questions

QUERY-FOCUSED MULTI-DOCUMENT SUMMARIZATION

MAXIMAL MARGINAL RELEVANCE (MMR)


An iterative method for content selection from multiple documents
Iteratively (greedily) choose the best sentence to insert into the summary/answer so far:
  Relevant: maximally relevant to the user query
    High cosine similarity to the query
  Novel: minimally redundant with the summary so far
    Low cosine similarity to the summary
Stop when the summary reaches the desired length
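The greedy loop can be sketched as below. Each candidate is scored as lam × (similarity to the query) minus (1 − lam) × (highest similarity to any sentence already chosen), which is one common way to write the MMR trade-off. The bag-of-words cosine, the `lam` parameter, and the stopping rule by sentence count rather than by summary length are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using raw word counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, max_sentences, lam=0.7):
    """Greedily pick sentence indices that are relevant to the query but novel."""
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < max_sentences:
        def score(i):
            relevance = cosine(sentences[i], query)
            # Redundancy is 0 while the summary is still empty.
            redundancy = max((cosine(sentences[i], sentences[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical tokenized sentences; the second is an exact duplicate of the first.
sentences = [
    ["water", "spinach", "grows", "in", "asia"],
    ["water", "spinach", "grows", "in", "asia"],
    ["spinach", "is", "eaten", "cooked"],
]
query = ["water", "spinach"]
chosen = mmr_select(sentences, query, max_sentences=2, lam=0.5)
```

Here the duplicate sentence is as relevant as the first one, but its redundancy penalty lets the third, less similar sentence win the second slot.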

LLR + MMR CHOOSING INFORMATIVE YET NON-REDUNDANT SENTENCES

One of many ways to combine the intuitions of LLR and MMR:
  Score each sentence based on LLR (including query words)
  Include the sentence with the highest score in the summary
  Iteratively add to the summary high-scoring sentences that are not redundant with the summary so far
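A rough sketch of these three steps, assuming the per-sentence scores have already been computed (for example from LLR and query words). The redundancy test used here, cosine similarity above a fixed `max_similarity` cutoff, and the function names are assumptions; the slides do not say how redundancy is decided.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using raw word counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def select_non_redundant(sentences, scores, max_sentences, max_similarity=0.5):
    """Walk sentences from highest to lowest score, skipping redundant ones."""
    by_score = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    summary = []
    for i in by_score:
        if len(summary) >= max_sentences:
            break
        # Keep the sentence only if it is not too similar to anything chosen so far.
        if all(cosine(sentences[i], sentences[j]) <= max_similarity for j in summary):
            summary.append(i)
    return summary


# Hypothetical tokenized sentences with precomputed (assumed) scores.
sentences = [
    ["water", "spinach", "grows", "in", "asia"],
    ["water", "spinach", "grows", "in", "asia"],
    ["spinach", "is", "eaten", "cooked"],
]
scores = [0.9, 0.8, 0.3]
summary = select_non_redundant(sentences, scores, max_sentences=2)
```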

INFORMATION ORDERING

Chronological ordering:
  Order sentences by the date of the document (for summarizing news)

Coherence:
  Choose an ordering that makes neighboring sentences similar (by cosine similarity)
  Choose an ordering in which neighboring sentences discuss the same entity

Topical ordering:
  Learn the ordering of topics in the source document
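The cosine-based coherence criterion could be sketched as a search for the ordering with the highest total similarity between neighboring sentences. The brute-force search over all permutations is an assumption chosen for clarity and is only practical for a handful of sentences; the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(a, b):
    """Cosine similarity between two token lists, using raw word counts."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def order_by_coherence(sentences):
    """Return the index order that maximizes the summed similarity of neighbors."""
    def coherence(order):
        return sum(cosine(sentences[i], sentences[j])
                   for i, j in zip(order, order[1:]))
    return list(max(permutations(range(len(sentences))), key=coherence))


# Hypothetical tokenized sentences in an arbitrary starting order.
sentences = [
    ["fast", "cars", "race"],
    ["water", "spinach", "grows"],
    ["spinach", "grows", "fast"],
]
order = order_by_coherence(sentences)
```

The third sentence shares words with both of the others, so it ends up in the middle. The reversed ordering scores the same, which is why a real system would combine this with other cues such as chronology.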

DOMAIN-SPECIFIC ANSWERING: THE INFORMATION EXTRACTION METHOD

A good biography of a person contains:
  The person's birth/death, fame factor, education, etc.

A good definition contains:
  Type or category, e.g. "The Hajj is a type of ritual"

A medical answer about a drug's use contains:
  The problem: the medical condition
  The intervention: the drug or procedure
  The outcome: the result of the study

INFORMATION THAT SHOULD BE IN THE ANSWER FOR 3 KINDS OF QUESTIONS

ARCHITECTURE FOR ANSWERING COMPLEX QUESTIONS

REFERENCES:

NLP Stanford course.


THANK YOU