NLP Unit-V
Language Modeling
• Introduction
• n-Gram Models
• Language Model Evaluation
• Parameter Estimation
• Language Model Adaptation
• Types of Language Models
• Language Specific Modeling Problems
• Multilingual and Cross-lingual Language Modeling
Introduction
• A statistical language model specifies the a priori probability of a particular word sequence in the language of interest.
• Given an alphabet or inventory of units Σ and a sequence W = w1w2…wt ∈ Σ*, a language model can be used to compute the probability of W based on parameters previously estimated from a training set.
• The inventory Σ is the list of unique words encountered in the training data.
• Selecting the units over which a language model should be defined is a difficult problem, particularly in languages other than English.
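• As a minimal sketch (toy data; all names are illustrative), the inventory Σ can be read off the training data and a candidate sequence W checked against it:

```python
# Minimal sketch: build the inventory (Sigma) from training text and
# represent a word sequence W over it. Toy data, illustrative names.
training_text = "the cat sat on the mat"
inventory = sorted(set(training_text.split()))  # Sigma: unique training words

W = "the cat sat".split()                       # a candidate sequence W
print(inventory)                                # ['cat', 'mat', 'on', 'sat', 'the']
print(all(w in inventory for w in W))           # True: every unit of W is in Sigma
```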
Introduction
• A language model is typically combined with one or more other models that hypothesize possible word sequences.
• In speech recognition, a speech recognizer combines acoustic model scores with language model scores to decode spoken word sequences from an acoustic signal.
• Language models have also become a standard tool in information
retrieval, authorship identification, and document classification.
n-Gram Models
• The probability of a word sequence of arbitrary length cannot be estimated directly, because natural language permits an infinite number of word sequences of variable length.
• The probability P(W) can instead be decomposed into a product of component probabilities according to the chain rule of probability:
P(W) = P(w1) P(w2 | w1) P(w3 | w1w2) … P(wt | w1 … wt−1)
• For example, P(the cat sat) = P(the) · P(cat | the) · P(sat | the cat).
• Since the individual terms in the above product are difficult to compute directly, the n-gram approximation was introduced.
n-Gram Models
• The assumption is that all the preceding words except the n-1 words
directly preceding the current word are irrelevant for predicting the
current word.
• Hence P(W) is approximated as:
P(W) ≈ ∏i=1..t P(wi | wi−n+1 … wi−1)
• For n = 2 (a bigram model) this becomes P(W) ≈ P(w1) P(w2 | w1) P(w3 | w2) … P(wt | wt−1), as in the sketch below.
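• A minimal sketch of the bigram (n = 2) case, using maximum-likelihood counts over a toy corpus (all data and names here are illustrative):

```python
from collections import Counter

def train_bigram_counts(corpus):
    """Count unigrams and bigrams over tokenized sentences,
    with <s> and </s> marking sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def sentence_probability(sentence, unigrams, bigrams):
    """P(W) under the bigram approximation: the product of
    maximum-likelihood estimates c(w_prev, w) / c(w_prev)."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:          # unseen history: no estimate available
            return 0.0
        prob *= bigrams[(prev, word)] / unigrams[prev]
    return prob

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_counts(corpus)
print(sentence_probability(["the", "cat", "sat"], uni, bi))  # 0.5
```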
Maximum-Likelihood Estimation and Smoothing
• The maximum-likelihood estimate of an n-gram probability is its relative frequency in the training data; it assigns zero probability to any n-gram that does not occur in the training set.
• Smoothing techniques redistribute probability mass toward unseen n-grams. In Good-Turing smoothing, for example, an n-gram occurring r times is discounted to r* = (r + 1) · n(r+1)/n(r), where n(r) is the number of distinct n-grams occurring exactly r times, so that n1, n2, … are the counts of n-grams with one, two, … occurrences.
• Another common way of smoothing language model estimates is linear model interpolation.
• In linear interpolation, M models are combined by
P(w | h) = ∑i=1..M αi Pi(w | h)
where the interpolation weights αi are nonnegative and sum to one.
• The αi parameters are fixed for all contexts rather than being dependent on the lower-order n-gram context.
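• A minimal sketch of interpolating M = 2 component models with fixed weights (the toy counts and the weight values 0.7/0.3 are illustrative):

```python
from collections import Counter

# Toy counts; in practice these come from training data.
unigrams = Counter({"<s>": 2, "the": 2, "cat": 1, "dog": 1, "sat": 2, "</s>": 2})
bigrams = Counter({("<s>", "the"): 2, ("the", "cat"): 1, ("the", "dog"): 1,
                   ("cat", "sat"): 1, ("dog", "sat"): 1, ("sat", "</s>"): 2})
total = sum(unigrams.values())

def p_unigram(word, history):
    return unigrams[word] / total

def p_bigram(word, history):
    prev = history[-1]
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def interpolated(word, history, models, alphas):
    """P(w | h) = sum_i alpha_i * P_i(w | h); the alphas are fixed
    for all contexts and should sum to one."""
    return sum(a * m(word, history) for a, m in zip(alphas, models))

# Backing off toward the unigram model smooths zero bigram estimates.
print(interpolated("cat", ["the"], [p_bigram, p_unigram], [0.7, 0.3]))  # 0.38
```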
Large-Scale Language Models
• An alternative possibility is to use large-scale distributed language models only at a second-pass rescoring stage, after first-pass hypotheses have been generated using a smaller language model.
• The overall trend in large-scale language modeling is to abandon exact parameter estimation of the type described above in favor of approximate techniques.
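• A minimal sketch of second-pass rescoring (the n-best list, scores, and the stand-in for the large model are all illustrative; a real system would query a large distributed n-gram store):

```python
# Hypothetical first-pass n-best list: (hypothesis, first-pass log score).
nbest = [
    (["a", "cat", "sat"], -11.9),
    (["the", "cat", "sat"], -12.1),
]

def big_lm_score(tokens):
    """Stand-in for the large second-pass model (illustrative only)."""
    favored = {("the", "cat"), ("cat", "sat")}   # bigrams the big model prefers
    hits = sum(1 for pair in zip(tokens, tokens[1:]) if pair in favored)
    return hits - 0.5 * len(tokens)

def rescore(nbest, lm_weight=2.0):
    """Add a weighted large-LM score to each first-pass score and re-rank."""
    return sorted(((hyp, score + lm_weight * big_lm_score(hyp))
                   for hyp, score in nbest),
                  key=lambda pair: pair[1], reverse=True)

for hyp, score in rescore(nbest):
    print(" ".join(hyp), round(score, 2))
# The large model flips the ranking in favor of "the cat sat".
```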
Language Model Adaptation
• Language model adaptation is about designing and tuning a model such that it performs well on a new test set for which little equivalent training data is available.
• The most commonly used adaptation method is that of mixture language
models or model interpolation.
• One popular method is topic-dependent language model adaptation.
• The documents are first clustered into a large number of different topics, and an individual language model is built for each topic cluster.
• The desired final model is then fine-tuned by choosing and interpolating a
smaller number of topic-specific language models.
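• A minimal sketch of topic-dependent adaptation (the clusters, the unigram topic models, and the mixture weights are illustrative; in practice the weights are tuned on a small adaptation set):

```python
from collections import Counter

# Hypothetical topic clusters of tokenized training documents.
clusters = {
    "finance": [["stock", "market", "rose"], ["market", "fell"]],
    "sports":  [["team", "won", "the", "match"]],
}

def unigram_model(docs):
    """Build a unigram topic model from one cluster's documents."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return lambda w: counts[w] / total if total else 0.0

topic_models = {t: unigram_model(d) for t, d in clusters.items()}

def adapted_prob(word, weights):
    """Interpolate a small number of topic-specific models;
    here the weights are fixed for illustration."""
    return sum(a * topic_models[t](word) for t, a in weights.items())

# Weights chosen as if the test data were financial news.
print(adapted_prob("market", {"finance": 0.9, "sports": 0.1}))  # 0.36
```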
Language Model Adaptation
• A form of dynamic self-adaptation of a language model is provided by
trigger models.
• The idea is that, in accordance with the underlying topic of the text, certain word combinations are more likely than others to co-occur.
• Some words are said to trigger others; for example, the words stock and market in a financial news text.
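• A minimal sketch of scoring candidate trigger pairs via pointwise mutual information over documents, one classical way to measure how strongly one word predicts another (the toy documents are illustrative):

```python
import math

# Toy document collection, each document as a set of word types.
docs = [
    {"stock", "market", "shares", "rose"},
    {"stock", "market", "fell"},
    {"team", "match", "won"},
]

def pmi(a, b, docs):
    """Pointwise mutual information of co-occurrence within a document;
    a high value suggests that seeing `a` should boost P(b)."""
    n = len(docs)
    pa = sum(a in d for d in docs) / n
    pb = sum(b in d for d in docs) / n
    pab = sum(a in d and b in d for d in docs) / n
    return math.log(pab / (pa * pb)) if pab else float("-inf")

print(pmi("stock", "market", docs))  # strong trigger pair (> 0)
print(pmi("stock", "match", docs))   # unrelated pair (-inf: never co-occur)
```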