Evaluating Language Models
The perplexity of a test set $W = w_1 w_2 \ldots w_N$ is the inverse probability of the test set, normalized by the number of words:

$PP(W) = P(w_1 w_2 \ldots w_N)^{-1/N}$

Expanding $P(w_1 w_2 \ldots w_N)$ by the chain rule:

$PP(W) = \sqrt[N]{\prod_{i=1}^{N} \dfrac{1}{P(w_i \mid w_1 \ldots w_{i-1})}}$
Perplexity as branching factor
• Perplexity can be seen as the weighted average branching factor of a language.
– The branching factor of a language is the number of possible next words that can follow any word.
• Suppose a sentence consists of random digits.
• What is the perplexity of this sentence according to a model that assigns P = 1/10 to each digit?
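Working the example through the perplexity definition above: for a digit string of length N, each digit has probability 1/10, so

$PP(W) = \left( \left(\tfrac{1}{10}\right)^{N} \right)^{-1/N} = 10$

The perplexity equals the branching factor: 10 equally likely digits can follow at every position.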
• Zeros: words that never occur in the training set but do occur in the test set cause problems for two reasons.
– First, we underestimate the probability of all sorts of words that might occur.
– Second, if the probability of any word in the test set is 0, the probability of the entire test set is 0.
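Concretely, the test-set probability is a product of per-word probabilities, so a single unseen word zeroes out the whole product:

$P(W) = \prod_{i=1}^{N} P(w_i \mid w_1 \ldots w_{i-1}) = 0$ whenever any factor is 0,

and the perplexity $P(W)^{-1/N}$ is then undefined (a division by zero).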
Unknown Words
• We have to deal with words we haven’t seen before, which we call unknown words.
• We can model these potential unknown words in the test set by adding a pseudo-word called <UNK> into our training set too (a sketch of this preprocessing follows below).
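A minimal sketch of this preprocessing, assuming a simple frequency threshold decides which training words stay in the vocabulary (the threshold and function names are illustrative, not from the slides):

```python
from collections import Counter

def build_vocab(train_tokens, min_count=2):
    """Keep words seen at least min_count times in training."""
    counts = Counter(train_tokens)
    return {w for w, c in counts.items() if c >= min_count}

def replace_unknowns(tokens, vocab):
    """Map any out-of-vocabulary token to the pseudo-word <UNK>."""
    return [w if w in vocab else "<UNK>" for w in tokens]

train = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(train)                    # {'the', 'cat'}
train_unk = replace_unknowns(train, vocab)    # rare training words become <UNK> too
test_unk = replace_unknowns("the dog sat".split(), vocab)
print(test_unk)                               # ['the', '<UNK>', '<UNK>']
```

Counts are then collected from the <UNK>-rewritten training data, so <UNK> gets a probability like any other word.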
• The unsmoothed maximum likelihood estimate of a unigram probability is the word’s count ci divided by the total number of word tokens N:

P(wi) = ci / N
• Laplace smoothing adds one to each count. Since there are V words in
the vocabulary and each one was incremented, we also need to adjust
the denominator to take into account the extra V observations.
PLaplace(wi) = (ci + 1) / (N + V)
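A minimal sketch of both estimates for a unigram model (function and variable names are illustrative, not from the slides):

```python
from collections import Counter

def unigram_probs(tokens, smoothing=None):
    """Unigram probabilities, optionally with add-one (Laplace) smoothing."""
    counts = Counter(tokens)
    N = len(tokens)          # total number of word tokens
    V = len(counts)          # vocabulary size
    if smoothing == "laplace":
        # P_Laplace(wi) = (ci + 1) / (N + V); a zero-count word would get 1 / (N + V)
        return {w: (c + 1) / (N + V) for w, c in counts.items()}
    # Unsmoothed MLE: P(wi) = ci / N
    return {w: c / N for w, c in counts.items()}

tokens = "the cat sat on the mat".split()
print(unigram_probs(tokens)["the"])                       # 2/6 ≈ 0.333
print(unigram_probs(tokens, smoothing="laplace")["the"])  # (2+1)/(6+5) ≈ 0.273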
• The sharp change in counts and probabilities occurs because too much probability mass is moved to all the zeros.
• One alternative to add-one smoothing is to move a bit less of the probability mass from the seen to the unseen events.
• Instead of adding 1 to each count, we add a fractional count k (.5? .05? .01?).
• This algorithm is called add-k smoothing.
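The estimate then generalizes the Laplace formula above, with k scaling the vocabulary term in the denominator:

$P_{\text{Add-}k}(w_i) = \dfrac{c_i + k}{N + kV}$

Setting k = 1 recovers Laplace smoothing.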