
Evaluating Language Models



Evaluating Language Models
• Does our language model prefer good sentences to bad ones?
– Does it assign a higher probability to “real” or “frequently observed” sentences than to “ungrammatical” or “rarely observed” sentences?
• We train the parameters of our model on a training set.
• We test the model’s performance on data we haven’t seen.
– A test set is an unseen dataset that is different from our training set and totally unused during training.

Evaluating Language Models
Extrinsic Evaluation
• Extrinsic evaluation of an N-gram language model is to use it in an application and measure how much the application improves.
• To compare two language models A and B:
– Use each language model in a task such as a spelling corrector or an MT system.
– Get an accuracy for A and for B:
• How many misspelled words were corrected properly?
• How many words were translated correctly?
– Compare the accuracies of A and B.
• The model that produces the better accuracy is the better model.
Evaluating Language Models
Intrinsic Evaluation
• An intrinsic evaluation metric is one that measures the quality of a model independently of any application.
• Given a corpus of text, to compare two different n-gram models:
– Divide the data into training and test sets,
– Train the parameters of both models on the training set, and
– Compare how well the two trained models fit the test set.
• Whichever model assigns a higher probability to the test set is the better model.
• In practice, a probability-based metric called perplexity is used instead of raw probability as our metric for evaluating language models.
Evaluating Language Models
Perplexity
• The best language model is one that best predicts an unseen test set
– Gives the highest P(test set)
• The perplexity of a language model on a test set is the inverse probability of the test set, normalized by the number of words.
• Minimizing perplexity is the same as maximizing probability.
• The perplexity PP for a test set W = w1 w2 … wN is:

PP(W) = P(w1 w2 … wN)^(-1/N)

which, by the chain rule, equals

PP(W) = ( Π i=1..N  1 / P(wi | w1 … wi-1) )^(1/N)

• The perplexity PP for bigrams:

PP(W) = ( Π i=1..N  1 / P(wi | wi-1) )^(1/N)
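As an illustration of the definitions above, here is a minimal Python sketch of bigram perplexity; the probability table, sentence markers, and function name are hypothetical stand-ins, not values from the slides.

```python
import math

# Hypothetical bigram probabilities P(wi | wi-1); a real model would
# estimate these from a training corpus.
bigram_prob = {
    ("<s>", "i"): 0.25, ("i", "want"): 0.33,
    ("want", "chinese"): 0.0065, ("chinese", "food"): 0.52,
    ("food", "</s>"): 0.68,
}

def bigram_perplexity(words):
    """PP(W) = (product of 1/P(wi | wi-1))^(1/N) over the N test bigrams."""
    log_prob, n, prev = 0.0, 0, "<s>"
    for w in words + ["</s>"]:
        p = bigram_prob.get((prev, w), 0.0)
        if p == 0.0:
            return float("inf")        # a single unseen bigram makes PP infinite
        log_prob += math.log(p)
        n += 1
        prev = w
    return math.exp(-log_prob / n)     # the N-th root of 1 / P(W)

print(bigram_perplexity("i want chinese food".split()))  # ≈ 5.6
```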
Evaluating Language Models
Perplexity as branching factor
• Perplexity can be seen as the weighted average branching factor of a language.
– The branching factor of a language is the number of possible next words that can follow any word.
• Suppose a sentence consists of random digits.
• What is the perplexity of this sentence according to a model that assigns P = 1/10 to each digit?

PP(W) = ( (1/10)^N )^(-1/N) = 10


Evaluating Language Models
Perplexity
• Lower perplexity = better model
• Example: models trained on 38 million words and tested on 1.5 million words of WSJ text:
– Unigram perplexity: 962, Bigram: 170, Trigram: 109



Generalization and Zeros
• The n-gram model, like many statistical models, is dependent on the training corpus.
– The probabilities often encode specific facts about a given training corpus.
• N-grams only work well for word prediction if the test corpus looks like the training corpus.
– In real life, it often doesn’t.
– We need to train robust models that generalize!
– One kind of generalization: getting rid of zeros!
• Things that don’t ever occur in the training set, but occur in the test set.

• Zeros, things that never occur in the training set but do occur in the test set, cause problems for two reasons (see the sketch below):
– First, we underestimate the probability of all sorts of words that might occur;
– Second, if the probability of any word in the test set is 0, the entire probability of the test set is 0.
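A small sketch of the zero problem, assuming a toy training corpus; the sentences and counts below are illustrative only.

```python
from collections import Counter

train = "i want chinese food . i want english food .".split()
test = "i want spanish food .".split()

bigram_counts = Counter(zip(train, train[1:]))
unigram_counts = Counter(train)

# Unsmoothed MLE: P(wi | wi-1) = C(wi-1 wi) / C(wi-1)
prob = 1.0
for prev, w in zip(test, test[1:]):
    if bigram_counts[(prev, w)] == 0:
        prob = 0.0        # "want spanish" never occurs in training ...
        break             # ... so the whole test set gets probability 0
    prob *= bigram_counts[(prev, w)] / unigram_counts[prev]

print(prob)  # 0.0
```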
Unknown Words
• We have to deal with words we haven’t seen before, which we call unknown words.
• We can model these potential unknown words in the test set by adding a pseudo-word called <UNK> into our training set too.

• One way to handle unknown words is:
– Replace words in the training data by <UNK> based on their frequency.
• For example, we can replace by <UNK> all words that occur fewer than n times in the training set, where n is some small number, or
• Equivalently, select a vocabulary size V in advance (say 50,000), choose the top V words by frequency, and replace the rest by <UNK>.
– Proceed to train the language model as before, treating <UNK> like a regular word (a sketch follows below).
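A minimal sketch of the frequency-based <UNK> replacement described above; the threshold and the toy corpus are illustrative, not from the slides.

```python
from collections import Counter

def replace_rare_with_unk(tokens, min_count=2):
    """Replace every token seen fewer than min_count times with <UNK>."""
    counts = Counter(tokens)
    return [w if counts[w] >= min_count else "<UNK>" for w in tokens]

train = "the cat sat on the mat the dog sat".split()
print(replace_rare_with_unk(train))
# ['the', '<UNK>', 'sat', '<UNK>', 'the', '<UNK>', 'the', '<UNK>', 'sat']
```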
Smoothing
• To keep a language model from assigning zero probability to these unseen events, we’ll have to shave off a bit of probability mass from some more frequent events and give it to the events we’ve never seen.
• This modification is called smoothing (or discounting).
• There are many ways to do smoothing, and some of them are:
– Add-1 smoothing (Laplace smoothing)
– Add-k smoothing
– Backoff
Laplace Smoothing
• The simplest way to do smoothing is to add one to all the counts before we normalize them into probabilities.
– All the counts that used to be zero will now have a count of 1, the counts of 1 will be 2, and so on.
• This algorithm is called Laplace smoothing (Add-1 smoothing).
• We pretend that we saw each word one more time than we did, and we just add one to all the counts!



Laplace Smoothing (Add-1 Smoothing)
Laplace Smoothing for Unigrams
• The unsmoothed maximum likelihood estimate of the unigram probability of the word wi is its count ci normalized by the total number of word tokens N:

P(wi) = ci / N

• Laplace smoothing adds one to each count. Since there are V words in the vocabulary and each one was incremented, we also need to adjust the denominator to take into account the extra V observations:

PLaplace(wi) = (ci + 1) / (N + V)

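A quick sketch of the two unigram estimates above on a toy corpus; the corpus and the word "spanish" are stand-ins to show the unseen-word case (which in practice would be mapped to <UNK> first).

```python
from collections import Counter

train = "i want chinese food i want english food".split()
counts = Counter(train)
N = len(train)    # total word tokens (8)
V = len(counts)   # vocabulary size (5)

def p_mle(w):
    return counts[w] / N              # P(wi) = ci / N

def p_laplace(w):
    return (counts[w] + 1) / (N + V)  # PLaplace(wi) = (ci + 1) / (N + V)

print(p_mle("want"), p_laplace("want"))        # 0.25 vs 3/13 ≈ 0.23
print(p_mle("spanish"), p_laplace("spanish"))  # 0.0  vs 1/13 ≈ 0.08
```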


Laplace Smoothing for Bigrams
• The normal bigram probabilities are computed by normalizing each bigram count by the unigram count of the preceding word:

P(wn | wn-1) = C(wn-1 wn) / C(wn-1)

• Add-one smoothed bigram probabilities:

PLaplace(wn | wn-1) = (C(wn-1 wn) + 1) / (C(wn-1) + V)
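A sketch of the unsmoothed and add-one bigram estimates above, assuming counts from a toy corpus; none of the numbers below come from the Berkeley Restaurant Project data.

```python
from collections import Counter

train = "<s> i want chinese food </s> <s> i want english food </s>".split()
bigram_counts = Counter(zip(train, train[1:]))
unigram_counts = Counter(train)
V = len(unigram_counts)   # vocabulary size (7, including <s> and </s>)

def p_mle(prev, w):
    # P(wn | wn-1) = C(wn-1 wn) / C(wn-1)
    return bigram_counts[(prev, w)] / unigram_counts[prev]

def p_laplace(prev, w):
    # PLaplace(wn | wn-1) = (C(wn-1 wn) + 1) / (C(wn-1) + V)
    return (bigram_counts[(prev, w)] + 1) / (unigram_counts[prev] + V)

print(p_mle("want", "chinese"))      # 1/2 = 0.5
print(p_laplace("want", "chinese"))  # (1 + 1) / (2 + 7) ≈ 0.22
print(p_laplace("want", "spanish"))  # the unseen bigram now gets 1/9 ≈ 0.11
```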


Laplace-smoothed Bigrams
Corpus: Berkeley Restaurant Project Sentences



Laplace-smoothed Bigrams
Corpus: Berkeley Restaurant Project Sentences: Adjusted counts

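The adjusted counts referenced in the table above are the Laplace-smoothed probabilities converted back to count scale; in the notation of the previous slides, the standard reconstructed count is:

c*(wn-1 wn) = (C(wn-1 wn) + 1) × C(wn-1) / (C(wn-1) + V)

These reconstructed counts are what the discounts on the next slide compare against the original counts.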


Add-k Smoothing
• Add-one smoothing makes a very big change to the counts.
– C(want to) changed from 608 to 238!
– P(to|want) decreases from .66 in the unsmoothed case to .26 in the smoothed case.
– Looking at the discount d shows us how much the counts for each prefix word have been reduced:
• The discount for the bigram “want to” is .39, while the discount for “Chinese food” is .10, a factor of 10.

• The sharp change in counts and probabilities occurs because too much probability mass is moved to all the zeros.
• One alternative to add-one smoothing is to move a bit less of the probability mass from the seen to the unseen events.
• Instead of adding 1 to each count, we add a fractional count k (.5? .05? .01?).
• This algorithm is called add-k smoothing (a sketch follows below).
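A sketch of add-k smoothing on the same toy bigram counts as the earlier Laplace sketch; the value of k is a tunable constant, not one taken from the slides.

```python
from collections import Counter

train = "<s> i want chinese food </s> <s> i want english food </s>".split()
bigram_counts = Counter(zip(train, train[1:]))
unigram_counts = Counter(train)
V = len(unigram_counts)

def p_add_k(prev, w, k=0.05):
    # PAdd-k(wn | wn-1) = (C(wn-1 wn) + k) / (C(wn-1) + k * V)
    return (bigram_counts[(prev, w)] + k) / (unigram_counts[prev] + k * V)

# With a small k, far less probability mass moves to the unseen bigrams
print(p_add_k("want", "chinese"))  # (1 + 0.05) / (2 + 0.35) ≈ 0.45
print(p_add_k("want", "spanish"))  # 0.05 / 2.35 ≈ 0.02
```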
