LANGUAGE MODELING

Prabhleen Juneja
Thapar Institute of Engineering & Technology
INTRODUCTION
 Natural language is a complex entity, and in order to process it
through a computer-based program, we need to build a model of
it.
 A language model is a description of a language that can be
used to check whether a string is a valid member of the
language or not.
 The process of creating language models is called language
modeling.
 There are two major approaches used for language modeling,
namely Grammar-Based Language Modeling and Statistical
Language Modeling.
GRAMMAR-BASED LANGUAGE
MODELING
 A grammar-based approach uses the grammar of a language to
create a model.
 It attempts to represent the syntactic structure of the language.
 The grammar consists of hand-coded rules defining the structure
and ordering of various constituents appearing in the linguistic
unit (phrase, sentence, etc.)
 Various computational grammars have been studied and
proposed, such as phrase structure grammar, transformational
grammar, functional grammar, government and binding,
dependency grammar, Paninian grammar, tree-adjoining
grammar, etc.
 The major limitation of grammar-based language modeling is
that these methods are language-dependent.
STATISTICAL LANGUAGE MODELING
 Statistical language modeling aims to build a statistical language model
that can estimate the distribution of natural language as accurately as
possible.
 A statistical language model (SLM) is a probability distribution P(S) over
strings S that attempts to reflect how frequently a string S occurs as a
sentence.
 In other words, a statistical language model attempts to identify a sentence
based on its probability measure.
 Statistical language models attempt to capture the regularities of natural
language using a large training corpus, for the purpose of improving the
performance of various NLP applications.
 The original (and still the most important) application of SLMs is
speech recognition, but SLMs also play a vital role in other
natural language applications as diverse as machine translation,
handwriting recognition, intelligent input methods, part-of-speech
tagging, and text-to-speech systems.
STATISTICAL LANGUAGE MODELING CONTD..

 SLMs are essential in any task in which we have to identify
words in noisy, ambiguous input.
 In speech recognition, for example, the input speech sounds
are very confusable and many words sound extremely similar.
 In speech recognition, the computer tries to match sounds
with word sequences.
 The language model provides context to distinguish between
words and phrases that sound similar.
 For example, in American English, the phrases "recognize
speech" and "wreck a nice beach" are pronounced almost the
same but mean very different things. These ambiguities are
easier to resolve when evidence from the language model is
incorporated.
STATISTICAL LANGUAGE MODELING CONTD..

 In OCR & handwriting recognition – more probable
sentences are more likely to be correct readings.
 In the movie Take the Money and Run, Woody Allen tries to
rob a bank with a sloppily written hold-up note that the teller
incorrectly reads as “I have a gub”.
 Any speech and language processing system could avoid
making this mistake by using the knowledge that the
sequence “I have a gun” is far more probable than “I have a
gub”, which contains the non-word “gub”.
 In machine translation – more likely sentences are probably
better translations.
STATISTICAL LANGUAGE MODELING CONTD..

 Suppose we are translating a Chinese source sentence and,
as part of the process, we have a set of potential rough English
translations:
 he briefed to reporters on the chief contents of the statement
 he briefed reporters on the chief contents of the statement
 he briefed to reporters on the main contents of the statement
 he briefed reporters on the main contents of the statement
 An SLM might tell us that, even after controlling for length,
briefed reporters is more likely than briefed to reporters, and
main contents is more likely than chief contents.
STATISTICAL LANGUAGE MODELING CONTD..

 In spelling correction, we need to find and correct spelling
errors like the following that accidentally result in real
English words:
 They are leaving in about fifteen minuets to go to her house.
 The design an construction of the system will take more than a
year.
 Since these errors involve real words, we can’t find them by just flagging
words that are not in the dictionary. But note that in about fifteen
minuets is a much less probable sequence than in about fifteen
minutes.
 A spellchecker can use a probability estimator both to detect these
errors and to suggest higher-probability corrections, as sketched below.
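
A minimal sketch of this idea in Python (added for illustration; score_sentence, best_correction, and the toy bigram model are hypothetical, not part of the slides):

# A minimal sketch: rank candidate corrections by bigram probability.
# The toy model below is made up for illustration.

def score_sentence(words, bigram_prob):
    """Multiply bigram probabilities over the sentence, padded with <s> and </s>."""
    prob = 1.0
    for prev, cur in zip(["<s>"] + words, words + ["</s>"]):
        prob *= bigram_prob.get((prev, cur), 1e-8)   # tiny floor for unseen bigrams
    return prob

def best_correction(candidates, bigram_prob):
    """Return the candidate sentence the model considers most probable."""
    return max(candidates, key=lambda c: score_sentence(c.split(), bigram_prob))

toy_model = {("fifteen", "minutes"): 0.01, ("fifteen", "minuets"): 1e-7}
print(best_correction(["in about fifteen minuets", "in about fifteen minutes"], toy_model))
# -> "in about fifteen minutes"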
STATISTICAL LANGUAGE MODELING CONTD..

 Word prediction is also important for augmentative
communication systems that help the disabled.
 People who are unable to use speech or sign language to
communicate, like the physicist Stephen Hawking, can
communicate by using simple body movements to select
words from a menu that are spoken by the system.
 Word prediction can be used to suggest likely words for the
menu.
STATISTICAL LANGUAGE MODELING
 The goal of statistical models is to estimate the probability (likelihood) of a
sentence.
 This is achieved by decomposing the sentence probability into a product of
conditional probabilities using the chain rule, as follows:

P(S) = P(w_1 w_2 w_3 ... w_n)
     = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) ... P(w_n | w_1 w_2 ... w_{n-1})
     = ∏_{i=1}^{n} P(w_i | h_i)

where h_i is the history of the i-th word, defined as:

h_i = w_1 w_2 w_3 ... w_{i-1}

Thus, in order to calculate the probability of the sentence, we need to
calculate the probability of each word given the sequence of words
preceding it.
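
As a brief illustration of the chain rule (a sketch assuming some conditional-probability function is available; cond_prob is a hypothetical stand-in, not from the slides):

# Sketch: P(S) as a product of conditional probabilities (chain rule).
# cond_prob(word, history) is a hypothetical model supplied by the caller.

def sentence_probability(words, cond_prob):
    prob, history = 1.0, []
    for w in words:
        prob *= cond_prob(w, tuple(history))   # P(w_i | w_1 ... w_{i-1})
        history.append(w)
    return prob

# Dummy model that ignores the history entirely, just to exercise the function:
print(sentence_probability("i want chinese food".split(), lambda w, h: 0.1))   # 0.1 ** 4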
STATISTICAL LANGUAGE MODELING
 It is quite difficult to estimate the probabilities of words
following very long strings, for the following reasons:
 Language is very creative; new sentences are created all the
time, and we won’t be able to count all the sentences.
 Any particular context might never have occurred before in the
training corpus.
 An N-gram model simplifies this probability estimation task.
N-GRAM BASED LANGUAGE MODELING
 An n-gram model approximates the probability of a word given all previous
words by the conditional probability given the previous n-1 words
only:

P(w_i | h_i) ≈ P(w_i | w_{i-n+1} ... w_{i-1})

 A model that limits the history to the previous word only is called a bi-gram
model (n=2):

P(w_i | h_i) ≈ P(w_i | w_{i-1})

 A tri-gram model limits the history to the previous two words only (n=3):

P(w_i | h_i) ≈ P(w_i | w_{i-2} w_{i-1})

 A special pseudo-word <s> is introduced to mark the beginning of the
sentence in bigram estimation. Similarly, in tri-gram estimation, two
pseudo-words <s1> & <s2> are introduced.
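
A small sketch of this history truncation and padding (the helper name is illustrative, and a single repeated <s> symbol is used for padding rather than the <s1>/<s2> convention above):

# Sketch: keep only the previous n-1 words as history, padding with <s> pseudo-words.

def ngram_history(words, i, n):
    """Return the n-1 words preceding position i (0-based), padded with <s>."""
    padded = ["<s>"] * (n - 1) + list(words)
    return tuple(padded[i : i + n - 1])

words = "i want chinese food".split()
print(ngram_history(words, 0, 2))   # ('<s>',)      bigram history of the first word
print(ngram_history(words, 1, 3))   # ('<s>', 'i')  trigram history of the second word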
N-GRAM BASED LANGUAGE MODELING
 The probabilities of a word given the previous n-1 words are estimated by
training the n-gram model on a training corpus.
 The probabilities are estimated using maximum likelihood
estimation (MLE), i.e. using relative frequency.
 The count of a particular n-gram in the training corpus is
divided by the count of all the n-grams that share the same
prefix:

P(w_i | w_{i-n+1} ... w_{i-1}) = C(w_{i-n+1} ... w_{i-1} w_i) / ∑_w C(w_{i-n+1} ... w_{i-1} w)

 Since the sum of the counts of all n-grams that share the same n-1 word
prefix is equal to the count of that common prefix, this simplifies to:

P(w_i | w_{i-n+1} ... w_{i-1}) = C(w_{i-n+1} ... w_{i-1} w_i) / C(w_{i-n+1} ... w_{i-1})
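
A minimal sketch of this MLE recipe for a bigram model, assuming whitespace-tokenized sentences (the function name is illustrative):

from collections import Counter

# Sketch: MLE bigram probabilities as relative frequencies.

def train_bigram_mle(sentences):
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])                   # prefix (context) counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # adjacent word pairs
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

probs = train_bigram_mle(["I am Sam", "Sam I am", "I do not like green ham"])
print(probs[("<s>", "I")])   # 2/3
print(probs[("Sam", "I")])   # 1/2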
N-GRAM MODEL- EXAMPLE
 Consider the following training corpus with three sentences:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green ham </s>
Compute the bi-gram probabilities and find the probability of the
test sentence “Sam I ham” using a bi-gram language model.
N-GRAM MODEL- EXAMPLE CONTD…
Bi-gram Probabilities:
P(I|<s>) = 2/3 = 0.67        P(am|I) = 2/3 = 0.67
P(Sam|am) = 1/2 = 0.5        P(</s>|Sam) = 1/2 = 0.5
P(Sam|<s>) = 1/3 = 0.33      P(I|Sam) = 1/2 = 0.5
P(</s>|am) = 1/2 = 0.5       P(do|I) = 1/3 = 0.33
P(not|do) = 1/1 = 1          P(like|not) = 1/1 = 1
P(green|like) = 1/1 = 1      P(ham|green) = 1/1 = 1
P(</s>|ham) = 1/1 = 1

P(Sam I ham) = P(Sam|<s>) P(I|Sam) P(ham|I) P(</s>|ham)
             = 0.33 × 0.5 × 0 × 1
             = 0
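
The same result can be checked in a few lines of Python (a standalone sketch listing only the bigram probabilities needed for the test sentence):

# Standalone check of the worked example: P(Sam I ham) under the unsmoothed bigram model.
probs = {("<s>", "Sam"): 1/3, ("Sam", "I"): 1/2, ("ham", "</s>"): 1.0}
# P(ham | I) is absent because "I ham" never occurs in the training corpus.

tokens = ["<s>", "Sam", "I", "ham", "</s>"]
p = 1.0
for pair in zip(tokens[:-1], tokens[1:]):
    p *= probs.get(pair, 0.0)   # unseen bigrams get probability 0 under MLE
print(p)                        # 0.0 -- a single zero bigram zeroes out the sentence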
N-GRAM MODEL- EXAMPLE 2
 The bigram counts from a piece of a bigram grammar from the
Berkeley Restaurant Project.
          I     want   to     eat   chinese  food   lunch  spend
I         5     827    0      9     0        0      0      2
want      2     0      608    1     6        6      5      1
to        2     0      4      686   2        0      6      211
eat       0     0      2      0     16       2      42     0
chinese   1     0      0      0     0        82     1      0
food      15    0      15     0     1        4      0      0
lunch     2     0      0      0     0        1      0      0
spend     1     0      1      0     0        0      0      0

 The unigram count of each word is:

i       want   to     eat    chinese  food   lunch  spend
2533    927    2417   746    158      1093   341    278
N-GRAM MODEL- EXAMPLE 2
 Some other useful probabilities are:
P(i|<s>) = 0.25
P(</s>|food) = 0.68

 Compute the probability of the sentence “I want chinese food”.
N-GRAM MODEL- EXAMPLE 2
 The bigram probabilities after normalization (dividing each row
by the corresponding unigram count) are shown below:

          i        want   to      eat     chinese  food    lunch   spend
i         0.002    0.33   0       0.0036  0        0       0       0.00079
want      0.0022   0      0.66    0.0011  0.0065   0.0065  0.0054  0.0011
to        0.00083  0      0.0017  0.28    0.00083  0       0.0025  0.087
eat       0        0      0.0027  0       0.021    0.0027  0.056   0
chinese   0.0063   0      0       0       0        0.52    0.0063  0
food      0.014    0      0.014   0       0.00092  0.0037  0       0
lunch     0.0059   0      0       0       0        0.0029  0       0
spend     0.0036   0      0.0036  0       0        0       0       0
N-GRAM MODEL- EXAMPLE 2
P(I want chinese food)
= P(I|<s>) P(want|I) P(chinese|want) P(food|chinese) P(</s>|food)
= 0.25 × 0.33 × 0.0065 × 0.52 × 0.68
= 0.000189618
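
The arithmetic can be verified directly (a trivial check; the probability values are read off the tables above):

# Multiply the bigram probabilities for <s> I want chinese food </s>.
p_i_start      = 0.25     # P(i | <s>)
p_want_i       = 0.33     # P(want | i)
p_chinese_want = 0.0065   # P(chinese | want)
p_food_chinese = 0.52     # P(food | chinese)
p_end_food     = 0.68     # P(</s> | food)
print(p_i_start * p_want_i * p_chinese_want * p_food_chinese * p_end_food)   # ~0.000190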
TRAINING & TEST SETS
 The probabilities of an N-gram model come from the corpus it
is trained on.
 In general, the parameters of a statistical model are trained on
some set of data (training data), and then we apply the models
to some new data (test data) in some task (such as speech
recognition) and see how well they work.
 This training-and-testing paradigm can also be used to
evaluate different N-gram architectures.
 If our test sentence is part of the training corpus, we will
mistakenly assign it an artificially high probability when it
occurs in the test set. We call this situation training on the test
set.
TRAINING & TEST SETS
 In addition to training and test sets, other divisions of data are
often useful.
 Sometimes we need an extra source of data to augment the
training set. Such extra data is called a held-out set, because
we hold it out from our training set when we train our N-gram
counts.
 The held-out corpus is then used to set some other parameters.
 Sometimes we need to have multiple test sets.
 This happens because we might use a particular test set so
often that we implicitly tune to its characteristics. Then we
would definitely need a fresh test set which is truly unseen. In
such cases, we call the initial test set the development test set,
or devset.
UNKNOWN WORDS: OPEN VS. CLOSED
VOCABULARY TASKS
 Sometimes we have a language task in which we know all the words that
can occur, and hence we know the vocabulary size V in advance.
 The closed vocabulary assumption is the assumption that we have such a
lexicon, and that the test set can only contain words from this lexicon. The
closed vocabulary task thus assumes there are no unknown words.
 But the number of unseen words grows constantly, so we can’t possibly
know in advance exactly how many there are, and we’d like our model to
do something reasonable with them.
 We call these unseen events unknown words, or out of vocabulary
(OOV) words. The percentage of OOV words that appear in the test set is
called the OOV rate.
 An open vocabulary system is one where we model these potential
unknown words in the test set by adding a pseudo-word called <UNK>.
UNKNOWN WORDS: OPEN VS. CLOSED
VOCABULARY TASKS
 We can train the probabilities of the unknown word model
<UNK> as follows:
 Choose a vocabulary (word list) which is fixed in advance.
 Convert any word in the training set that is not in this vocabulary (any
OOV word) to the unknown word token <UNK> in a text normalization step.
 Estimate the probabilities for <UNK> from its counts just like any other
regular word in the training set.
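
A minimal sketch of this open-vocabulary recipe (the keep-words-seen-at-least-twice rule for choosing the vocabulary is an illustrative assumption, not from the slides):

from collections import Counter

# Sketch: fix a vocabulary, then map every OOV word to <UNK>.

def build_vocab(sentences, min_count=2):
    counts = Counter(w for s in sentences for w in s.split())
    return {w for w, c in counts.items() if c >= min_count}

def replace_oov(sentence, vocab):
    return " ".join(w if w in vocab else "<UNK>" for w in sentence.split())

train = ["I am Sam", "Sam I am", "I do not like green ham"]
vocab = build_vocab(train)                  # {'I', 'am', 'Sam'}
print(replace_oov("Sam I like", vocab))     # Sam I <UNK>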


EVALUATING N-GRAMS
 The best way to evaluate the performance of a language model is to
embed it in an application and measure the total performance of the
application. Such end-to-end evaluation is called extrinsic
evaluation.
 For example, for speech recognition, we can compare the
performance of two language models by running the speech
recognizer twice, once with each language model, and seeing which
gives the more accurate transcription.
 An end-to-end evaluation is often very expensive; evaluating a large
speech recognition test set, for example, takes hours or even days.
 An intrinsic evaluation metric is one which measures the quality of
a model independent of any application.
 Perplexity is the most common intrinsic evaluation metric for N-
gram language models.
EVALUATING N-GRAMS CONTD….
 The perplexity (sometimes called PP for short) of a language
model on a test set is a function of the probability that the
language model assigns to that test set.
 For a test set W = w_1 w_2 ... w_N, the perplexity is given by:

PP(W) = P(w_1 w_2 ... w_N)^{-1/N}
      = (1 / P(w_1 w_2 ... w_N))^{1/N}

 Perplexity can be seen as the degree of randomness or confusion of a model,
since it is inversely related to the probability the model assigns to the test set.
 Thus, a lower perplexity indicates a better language model.
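
A small sketch of the computation, working in log space to avoid numerical underflow (the per-word probability values are made up for illustration):

import math

# Sketch: perplexity from the probabilities the model assigns to each test word.

def perplexity(word_probs):
    """word_probs: P(w_i | history) for each of the N words in the test set."""
    n = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)
    return math.exp(-log_prob / n)            # PP = P(w_1 ... w_N) ** (-1/N)

print(perplexity([0.2, 0.1, 0.25, 0.2]))      # ~5.62; lower perplexity = better model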
SMOOTHING
 There is a major problem with the maximum likelihood estimation
process. This is the problem of sparse data caused by the fact that
the maximum likelihood estimate is based on a particular set of
training data.
 For any N-gram that occurred a sufficient number of times, we
might have a good estimate of its probability. But because any
corpus is limited, some perfectly acceptable English word
sequences are bound to be missing from it.
 This missing data means that the N-gram matrix for any given
training corpus is bound to have a very large number of “zero
probability N-grams” that should really have some non-zero
probability.
 Furthermore, the MLE method also produces poor estimates when
the counts are non-zero but still small.
SMOOTHING
 Zero counts turn out to cause another huge problem. The
perplexity metric requires that we compute the probability of
each test sentence.
 But if a test sentence has an N-gram that never appeared in the
training set, the Maximum Likelihood estimate of the probability
for this N-gram, and hence for the whole test sentence, will be
zero.
 This means that in order to evaluate our language models, we
need to modify the MLE method to assign some non-zero
probability to any N-gram, even one that was never observed in
training.
 Smoothing refers to the modifications made to maximum likelihood
estimates to address the problem of poor estimates caused by
variability in small data sets.
ADD ONE / LAPLACE SMOOTHING
 Laplace smoothing adds one to each count. Since there are V
words in the vocabulary, and each one was incremented, we
also need to adjust the denominator to take into account the
extra V observations.
 For a unigram model:

P_Laplace(w_i) = (c_i + 1) / (N + V)

 For a bigram model:

P_Laplace(w_i | w_{i-1}) = (C(w_{i-1} w_i) + 1) / (C(w_{i-1}) + V)
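
A minimal sketch of the add-one estimate (the count dictionaries and toy numbers are illustrative; the k parameter is included so the same sketch also covers the add-d variant discussed on the next slide):

# Sketch: add-one (Laplace) smoothed bigram probability from raw counts.

def laplace_bigram(w_prev, w, bigram_counts, context_counts, V, k=1.0):
    # k = 1 gives add-one smoothing; 0 < k < 1 gives the add-d variant.
    return (bigram_counts.get((w_prev, w), 0) + k) / (context_counts.get(w_prev, 0) + k * V)

# Toy numbers: C(<s> Sam) = 1, C(<s>) = 3, V = 9
print(laplace_bigram("<s>", "Sam", {("<s>", "Sam"): 1}, {"<s>": 3}, V=9))   # (1+1)/(3+9) = 1/6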
ADD ONE / LAPLACE SMOOTHING
 Laplace smoothing results in a sharp change in counts and
probabilities because too much probability mass is moved to all
the zeros.
 We could move a bit less mass by adding a fractional count
rather than 1 (add-d smoothing):

P(w_i | w_{i-1}) = (C(w_{i-1} w_i) + d) / (C(w_{i-1}) + dV)

 But this approach requires a method for choosing d dynamically.
 It also results in an inappropriate discount for many counts, and
turns out to give counts with poor variances.
GOOD TURING DISCOUNTING
 A related way to view smoothing is as discounting (lowering)
some non-zero counts in order to get the probability mass that
will be assigned to the zero counts.
 The Good-Turing algorithm was first described by Good
(1953), who credits Turing with the original idea.
 The basic principle of Good-Turing smoothing is to re-estimate
the amount of probability mass to assign to N-grams with zero
counts by looking at the number of N-grams that occurred one
time.
 A word or N-gram (or any event) that occurs once is called a
singleton, or a hapax. Good-Turing smoothing relies on the assumption
that the frequency of singletons can be used as a re-estimate of
the frequency of zero-count bigrams.
GOOD TURING DISCOUNTING

 The Good-Turing algorithm is based on computing N_c, the
number of N-grams that occur c times.
 We refer to the number of N-grams that occur c times as the
frequency of frequency c.
 The MLE count for N_c is c. The Good-Turing estimate replaces
this with a smoothed count c*, computed as a function of N_{c+1}:

c* = (c + 1) × N_{c+1} / N_c
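
A short sketch of this re-estimation, applied to observed counts only (the function name and toy counts are illustrative; the toy counts mirror the worked example that follows):

from collections import Counter

# Sketch: Good-Turing smoothed counts c* = (c + 1) * N_{c+1} / N_c,
# where N_c is the number of distinct bigrams seen exactly c times.

def good_turing_counts(ngram_counts):
    freq_of_freq = Counter(ngram_counts.values())            # N_c
    return {c: (c + 1) * freq_of_freq.get(c + 1, 0) / n_c
            for c, n_c in freq_of_freq.items()}

# Toy counts: 8 distinct bigrams seen once, 2 seen twice (as in the example below).
counts = {("bigram", str(i)): 1 for i in range(8)}
counts[("<s>", "I")] = 2
counts[("I", "am")] = 2
print(good_turing_counts(counts))   # {1: 0.5, 2: 0.0} -> c* = 4/8 for c = 1, c* = 0 for c = 2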
EXAMPLE
 Consider the following training corpus with three sentences:
I am Sam
Sam I am
I do not like green ham
Compute the probability of the sentence “Sam I like” using
(i) unsmoothed bi-gram language model.
(ii) smoothed bi-gram language model using Laplace
smoothing.
(iii) smoothed bi-gram language model using Good-Turing
discounting.
EXAMPLE CONTD…..
(i) Unsmoothed bi-gram language model
P(Sam I like) = P(Sam|<s>) P(I|Sam) P(like|I)
              = 1/3 × 1/2 × 0/3
              = 0

(ii) Smoothed bi-gram model using Laplace smoothing:
V = 9 (<s>, I, am, Sam, do, not, like, green, ham)
P(Sam I like) = P(Sam|<s>) P(I|Sam) P(like|I)
              = (1+1)/(3+9) × (1+1)/(2+9) × (0+1)/(3+9)
              = 1/6 × 2/11 × 1/12
              = 1/396 ≈ 0.002525
EXAMPLE CONTD…..
(iii) Smoothed bi-gram model using Good-Turing discounting:
V = 9 (<s>, I, am, Sam, do, not, like, green, ham)
Total bi-grams possible = 9 × 9 = 81
Seen pairs (bigram tokens) = 12:
<s> I, I am, am Sam, <s> Sam, Sam I, I am, <s> I, I do, do
not, not like, like green, green ham
Seen distinct pairs = 10
Total unseen pairs = 81 - 10 = 71
EXAMPLE CONTD…..

c    N_c   Pairs                                     c* = (c + 1) N_{c+1} / N_c
0    71    -                                         1 × 8/71 = 8/71
1    8     am Sam, <s> Sam, Sam I, I do, do not,     2 × 2/8 = 4/8
           not like, like green, green ham
2    2     <s> I, I am                               3 × 0/2 = 0

P(Sam I like) = P(Sam|<s>) P(I|Sam) P(like|I)
              = (4/8 × 1/12) × (4/8 × 1/12) × (8/71 × 1/12)
              = 1/24 × 1/24 × 2/213
              ≈ 0.0000163
