Text Classification and Naïve Bayes

What is the subject of this article?
[Slide shows an example article to be assigned a subject category]
Classification Methods:
Supervised Machine Learning
• Any kind of classifier
• Naïve Bayes
• Logistic regression
• Support-vector machines
• k-Nearest Neighbors
•…
The bag of words representation

γ( { seen: 2, sweet: 1, whimsical: 1, recommend: 1, happy: 1, ... } ) = c
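A minimal sketch of building this representation in Python (the function and variable names are mine, not from the slides):

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to unordered word counts; word order is discarded."""
    return Counter(text.lower().split())

print(bag_of_words("I recommend it I recommend it"))
# Counter({'i': 2, 'recommend': 2, 'it': 2})
```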
Formalizing the Naïve Bayes Classifier
Bayes’ Rule Applied to Documents and Classes

For a document d and a class c, we choose the MAP class (MAP is “maximum a posteriori” = most likely class):

c_MAP = argmax_{c∈C} P(c|d)

Bayes’ Rule:

c_MAP = argmax_{c∈C} P(d|c) P(c) / P(d)

Dropping the denominator (P(d) is identical for every class):

c_MAP = argmax_{c∈C} P(d|c) P(c)
Naïve Bayes Classifier (II)

c_MAP = argmax_{c∈C} P(x1, x2, …, xn | c) P(c)

where document d is represented as features x1..xn.
Naïve Bayes Classifier (IV)

Estimating P(x1, x2, …, xn | c) directly is infeasible: it could only be done if a very, very large number of training examples was available.
Multinomial Naïve Bayes Independence Assumptions

To estimate P(x1, x2, …, xn | c) we make two simplifying assumptions: the bag-of-words assumption (word position doesn't matter) and conditional independence (the feature probabilities P(xi|c) are independent given the class c). Together they give:

P(x1, x2, …, xn | c) = P(x1|c) · P(x2|c) · … · P(xn|c)
Multiplying many small probabilities risks floating-point underflow, so instead of this:

c_NB = argmax_{c∈C} P(c) ∏_i P(xi|c)

we compute this:

c_NB = argmax_{c∈C} [ log P(c) + Σ_i log P(xi|c) ]

Notes:
1) Taking the log doesn't change the ranking of classes: the class with the highest probability also has the highest log probability.
2) It's a linear model: just a max of a sum of weights, a linear function of the inputs. So Naive Bayes is a linear classifier.
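A minimal sketch of this log-space decision rule in Python (the dict-based data structures are my assumption, not from the slides):

```python
def predict(doc_words, log_prior, log_likelihood):
    """Return argmax_c [ log P(c) + sum_i log P(x_i|c) ].

    log_prior:      {class: log P(c)}
    log_likelihood: {class: {word: log P(w|c)}}
    Words not in the training vocabulary are simply skipped.
    """
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w]
                              for w in doc_words if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)
```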
Naïve Bayes: Learning (Sec. 13.3)

Maximum likelihood estimates from training data:

P̂(c) = N_c / N_doc    (fraction of documents with class c)
P̂(w_i|c) = count(w_i, c) / Σ_{w∈V} count(w, c)

To avoid zero probabilities for words unseen in a class, use add-1 (Laplace) smoothing:

P̂(w_i|c) = (count(w_i, c) + 1) / ( Σ_{w∈V} count(w, c) + |V| )
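A sketch of these estimates in Python, assuming the training data arrives as a list of (word-list, class) pairs; the names are illustrative:

```python
from collections import Counter, defaultdict
import math

def train_nb(docs):
    """docs: list of (list_of_words, class) pairs.
    Returns log-prior and add-1-smoothed log-likelihood tables."""
    class_counts = Counter(c for _, c in docs)
    word_counts = defaultdict(Counter)   # class -> Counter of word counts
    vocab = set()
    for words, c in docs:
        word_counts[c].update(words)
        vocab.update(words)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood
```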
Naïve Bayes: Relationship to Language Modeling
Generative Model for Multinomial Naïve Bayes
[Figure: the class node c=China generates each word of the document]
Naïve Bayes and Language Modeling
• Naïve Bayes classifiers can use any sort of feature:
  URL, email address, dictionaries, network features
• But if, as in the previous slides,
  • we use only word features, and
  • we use all of the words in the text (not a subset),
• then Naïve Bayes has an important similarity to language modeling.
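Concretely, each class then acts as a class-specific unigram language model that assigns a probability to a whole sentence. A small sketch; the probability values below are illustrative, not estimated from data:

```python
# Unigram probabilities for one class (illustrative values)
p_pos = {"i": 0.1, "love": 0.1, "this": 0.05, "fun": 0.01, "film": 0.1}

def sentence_prob(words, unigram):
    """P(sentence | class) = product of per-word probabilities."""
    p = 1.0
    for w in words:
        p *= unigram[w]
    return p

print(sentence_prob("i love this fun film".split(), p_pos))  # ≈ 5e-07
```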
Multinomial Naïve Bayes: A Worked Example
          Doc   Words                                  Class
Training  1     Chinese Beijing Chinese                c
          2     Chinese Chinese Shanghai               c
          3     Chinese Macao                          c
          4     Tokyo Japan Chinese                    j
Test      5     Chinese Chinese Chinese Tokyo Japan    ?
Priors:
P(c) = 3/4
P(j) = 1/4

Conditional probabilities (with add-1 smoothing):
P(Chinese|c) = (5+1) / (8+6) = 6/14 = 3/7
P(Tokyo|c)   = (0+1) / (8+6) = 1/14
P(Japan|c)   = (0+1) / (8+6) = 1/14
P(Chinese|j) = (1+1) / (3+6) = 2/9
P(Tokyo|j)   = (1+1) / (3+6) = 2/9
P(Japan|j)   = (1+1) / (3+6) = 2/9

Choosing a class:
P(c|d5) ∝ 3/4 × (3/7)^3 × 1/14 × 1/14 ≈ 0.0003
P(j|d5) ∝ 1/4 × (2/9)^3 × 2/9 × 2/9 ≈ 0.0001
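These numbers can be checked with exact fractions; a quick sketch (variable names mine):

```python
from fractions import Fraction as F

prior_c, prior_j = F(3, 4), F(1, 4)
chinese_c, tokyo_c, japan_c = F(6, 14), F(1, 14), F(1, 14)
chinese_j, tokyo_j, japan_j = F(2, 9), F(2, 9), F(2, 9)

# d5 = "Chinese Chinese Chinese Tokyo Japan"
score_c = prior_c * chinese_c**3 * tokyo_c * japan_c
score_j = prior_j * chinese_j**3 * tokyo_j * japan_j
print(float(score_c), float(score_j))  # ~0.0003 vs ~0.0001 -> choose class c
```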
Naïve Bayes in Spam Filtering
• SpamAssassin Features:
• Online Pharmacy
• Mentions millions of dollars ($NN,NNN,NNN.NN)
• Phrase: impress ...
• From: starts with many numbers
• Subject is all capitals
• HTML has a low ratio of text to image area
• One hundred percent guaranteed
• Claims you can be removed from the list
• 'Prestigious Non-Accredited Universities'
• http://spamassassin.apache.org/tests_3_3_x.html
Summary: Naive Bayes is Not So Naive
• Very fast, low storage requirements
• Robust to irrelevant features
  Irrelevant features cancel each other out without affecting results
• Very good in domains with many equally important features
  Decision trees suffer from fragmentation in such cases, especially with little data
• Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
• A good, dependable baseline for text classification
• But we will see other classifiers that give better accuracy
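For comparison, here is the worked example run through scikit-learn's MultinomialNB (the class named at the top of these slides); a minimal sketch assuming scikit-learn is installed:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = ["Chinese Beijing Chinese", "Chinese Chinese Shanghai",
              "Chinese Macao", "Tokyo Japan Chinese"]
train_labels = ["c", "c", "c", "j"]

# alpha=1.0 is the add-1 (Laplace) smoothing used in the worked example
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(train_docs, train_labels)
print(clf.predict(["Chinese Chinese Chinese Tokyo Japan"]))  # ['c']
```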