
CS464
Chapter 4: Naïve Bayes
(slides based on the slides provided by Öznur Taştan and Mehmet Koyutürk)

Last Chapter: Density Estimation

Outline Today
•  Naïve Bayes Classifier
•  Generalization of maximum a posteriori estimation
•  Text Classification
•  Application of Naïve Bayes
•  Illustration of feature extraction/encoding and feature selection

A Bayesian Classifier
-  Compute the conditional probability of each value of Y given the attributes
-  Classify the example into the class that is most probable given the attributes

Learning a Classifier By Learning P(Y|X)

Joint probability table P(G, W, H)
   W: Wealth
   G: Gender
   H: HoursWorked

Conditional probability table P(W | G, H)

A Bayesian Classifier
Predict the class label that is most probable given the attributes (values of features)

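In symbols (notation added here for reference, not shown on the slide), the classifier picks

  ŷ = argmax_y P(Y = y | X_1, …, X_n) = argmax_y P(X_1, …, X_n | Y = y) P(Y = y)

where the second equality follows from Bayes' rule, dropping the denominator P(X_1, …, X_n) because it is the same for every class y.
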
Building a Classifier By Learning P(Y|X)
•  Two binary features, one class label

How many parameters do we need to estimate?

Can we reduce the number of parameters using Bayes' Rule?

Can we reduce the number of parameters using Bayes' Rule?

30 features → more than 30 billion parameters!

Naïve Bayes
•  Naïve Bayes assumes that the random variables (features) Xi and Xj are conditionally independent of each other given the class label Y, for all i ≠ j

Conditional Independence
•  X and Y are conditionally independent given Z iff the conditional probability of the joint variable can be written as the product of conditional probabilities:

   X ⊥ Y | Z  ⟺  P(X, Y | Z) = P(X | Z) P(Y | Z)

Naïve Bayes in a Nutshell

How many parameters?
→ 2n + 1 if Y is binary

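For reference, the factored decision rule this slide summarizes is

  ŷ = argmax_y P(Y = y) ∏_i P(X_i | Y = y)

and the 2n + 1 count assumes n binary features and a binary class: n parameters P(X_i = 1 | Y = 1), n parameters P(X_i = 1 | Y = 0), and one parameter P(Y = 1).
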
Example: Shall we play tennis?

Applying Naïve Bayes Assumption

Applying the Naïve Bayes Assumption:
O: Outlook
T: Temperature
H: Humidity
W: Wind

Applying Naïve Bayes Assumption

Applying the Naïve Bayes Assumption:

Parameters to Estimate

Relative Frequencies
•  Consider each feature independently and estimate:

Applying Naïve Bayes
•  Posterior probability for a new instance with the feature vector
   Xnew = (sunny, cool, high, true)

   (posterior ∝ likelihood × prior)

Applying Naïve Bayes
X = (sunny, cool, humid, windy)
•  Estimating the likelihood:
•  Estimating the posterior:
•  Class label predicted for X is then Play = No

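The computation above can be reproduced with a short script. The counts below assume the classic 14-day play-tennis data set (9 Yes / 5 No days); treat the exact numbers as an illustration rather than the slide's own table.

# Naive Bayes posterior for Xnew = (sunny, cool, high, true),
# assuming the classic 14-day play-tennis counts (9 Yes / 5 No).
priors = {"yes": 9/14, "no": 5/14}
likelihoods = {
    "yes": {"outlook=sunny": 2/9, "temp=cool": 3/9, "humidity=high": 3/9, "wind=true": 3/9},
    "no":  {"outlook=sunny": 3/5, "temp=cool": 1/5, "humidity=high": 4/5, "wind=true": 3/5},
}

scores = {}
for label in priors:
    score = priors[label]                  # start from the prior P(y)
    for p in likelihoods[label].values():
        score *= p                         # naive Bayes: multiply per-feature likelihoods
    scores[label] = score                  # unnormalized posterior P(y) * prod_i P(xi | y)

print(scores)                              # roughly {'yes': 0.0053, 'no': 0.0206}
print(max(scores, key=scores.get))         # -> 'no', matching the slide's prediction
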
Numerical Issues
•  Multiplying many probabilities, each between 0 and 1 by definition, can result in floating-point underflow.
•  Underflow occurs when the result of an operation is smaller in magnitude than the smallest representable non-zero floating-point number.
•  Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities.
•  The class with the highest final unnormalized log-probability score is still the most probable.

Underflow
•  Therefore, instead of using this formulation:
•  Use the following equivalent rule:
•  Avoiding underflow is an important implementation detail!

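A minimal log-space version of the same decision, reusing the assumed play-tennis numbers from the earlier sketch:

import math

priors = {"yes": 9/14, "no": 5/14}
likelihoods = {
    "yes": [2/9, 3/9, 3/9, 3/9],   # P(sunny|yes), P(cool|yes), P(high|yes), P(true|yes)
    "no":  [3/5, 1/5, 4/5, 3/5],
}

# Sum log-probabilities instead of multiplying probabilities to avoid underflow.
log_scores = {
    label: math.log(priors[label]) + sum(math.log(p) for p in likelihoods[label])
    for label in priors
}
print(max(log_scores, key=log_scores.get))  # -> 'no'; the argmax is unchanged
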
Text Classification Using Naïve Bayes

Identifying Spam

Other Text Classification Tasks
•  Classify email as ‘Spam’, ‘Ham’
•  Classify web pages as ‘Student’, ‘Faculty’, ‘Other’
•  Classify news stories into topics such as ‘Sports’, ‘Politics’, ...
•  Classify movie reviews as ‘favorable’, ‘unfavorable’, ‘neutral’

Text Classification
•  Classify email as
   –  ‘Spam’, ‘Ham’
•  Classify web pages as
   –  ‘Student’, ‘Faculty’, ‘Other’
•  Classify news stories into topics as
   –  ‘Sports’, ‘Politics’, ...
•  Classify movie reviews as
   –  ‘favorable’, ‘unfavorable’, ‘neutral’
   (these category names are the class labels, y)

•  What about the features X?
•  How to represent the document?

How do we represent a document?
•  A sequence of words?
   –  computationally very expensive, can be difficult to train
•  A set of words (Bag-of-Words)
   –  Ignore the position of the word in the document
   –  Ignore the ordering of the words in the document
   –  Consider the words in a predefined vocabulary
(Image courtesy: Joseph Gonzalez)

Document Models
•  Bernoulli document model: a document is represented by a binary feature vector, whose elements indicate the absence or presence of the corresponding word in the document
•  Multinomial document model: a document is represented by an integer feature vector, whose elements indicate the frequency of the corresponding word in the document

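A small illustration of the two encodings, using a made-up four-word vocabulary and document (not the ones from the slides):

# Encode one document under the multinomial and Bernoulli models.
vocabulary = ["lottery", "winner", "meeting", "deadline"]        # assumed toy vocabulary
document = "congratulations winner you are a lottery winner".split()

# Multinomial model: frequency of each vocabulary word in the document
counts = [document.count(word) for word in vocabulary]           # [1, 2, 0, 0]

# Bernoulli model: presence/absence of each vocabulary word
binary = [1 if c > 0 else 0 for c in counts]                     # [1, 1, 0, 0]

print(counts, binary)
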
Bag-of-words document models
•  Document:
   "Congratulations to you as we bring to your notice, the results of the First Category draws of THE HOLLAND CASINO LOTTO PROMO INT. We are happy to inform you that you have emerged a winner under the First Category, which is part of our promotional draws."

Example
•  Classify documents as Sports or Informatics
•  Assume the vocabulary contains 8 words
•  Good vocabularies usually do not include common words (a.k.a. stop words)

Training Data
•  Rows are documents
•  6 examples of sports documents
•  5 examples of informatics documents
•  Columns are words in the order of vocabulary

Estimating Parameters

Bernoulli Document Model

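The estimates themselves are in the figure; for reference, the standard form of the Bernoulli document model (notation added here) is

  P(D | class k) = ∏_t P(w_t | k)^{b_t} · (1 − P(w_t | k))^{1 − b_t}

where b_t ∈ {0, 1} indicates whether vocabulary word w_t occurs in document D, and P(w_t | k) is estimated as the fraction of training documents of class k that contain w_t.
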
Classification with Bernoulli Model

Classifying a given sample
•  A test document:

•  Priors and likelihoods:

•  Posterior probabilities:

•  Classify this document as Sports


Multinomial Document Model

Multinomial Document Model

Words are i.i.d. samples from a multinomial distribution.

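For reference, the corresponding multinomial form (notation added here, matching the Bernoulli note above) is

  P(D | class k) ∝ ∏_t P(w_t | k)^{n_t}

where n_t is the number of times vocabulary word w_t occurs in document D, and P(w_t | k) is estimated as the relative frequency of w_t among all word occurrences in the training documents of class k.
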
Classification with Multinomial Model

Add-one (Laplace) Smoothing
-  Add one imaginary occurrence of every word to every document

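A compact sketch of multinomial Naïve Bayes with add-one smoothing, trained and applied in log space; the tiny corpus and class names are invented for illustration, not taken from the slides:

import math
from collections import Counter

# Toy corpus (invented): (document words, class label)
training = [
    ("goal match team win".split(),         "Sports"),
    ("match referee goal goal".split(),     "Sports"),
    ("algorithm data compiler".split(),     "Informatics"),
    ("data network algorithm data".split(), "Informatics"),
]
vocabulary = sorted({w for doc, _ in training for w in doc})
labels = {y for _, y in training}

# Class priors and per-class word counts (multinomial model)
priors = {k: sum(1 for _, y in training if y == k) / len(training) for k in labels}
word_counts = {k: Counter() for k in labels}
for doc, y in training:
    word_counts[y].update(doc)

def word_prob(word, k):
    # Add-one (Laplace) smoothing: (count + 1) / (total words in class + |V|)
    return (word_counts[k][word] + 1) / (sum(word_counts[k].values()) + len(vocabulary))

def predict(doc):
    scores = {
        k: math.log(priors[k]) + sum(math.log(word_prob(w, k)) for w in doc if w in vocabulary)
        for k in labels
    }
    return max(scores, key=scores.get)

print(predict("goal data team".split()))   # classifies a toy test document
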
Text Classification Framework
(pipeline) Documents → Preprocessing → Features / Indexing → Feature filtering → Applying classification algorithms → Performance measure

Preprocessing
•  Token normalization
   –  Remove superficial character variances from words
      normelization → normalization
•  Stop-word removal
   –  Remove predefined common words that are not specific or discriminatory to the different classes
      is, a, the, you, as, …
•  Stemming
   –  Reduce different forms of the same word to a single word (base/root form)
      swimming, swimmer, swims → swim
•  Feature selection
   –  Choose features that are more relevant and complementary; this can be part of the design process, but in general it is done computationally by trying different combinations

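A toy sketch of these preprocessing steps using only the standard library; the stop-word list and the crude suffix-stripping "stemmer" are stand-ins for real resources such as curated stop-word lists and a proper stemming algorithm:

import re

STOP_WORDS = {"is", "a", "the", "you", "as", "are", "can"}   # assumed tiny stop-word list

def normalize(token):
    # Token normalization: lowercase and strip non-alphanumeric characters
    return re.sub(r"[^a-z0-9]", "", token.lower())

def stem(token):
    # Very crude stemming stand-in: strip a few common suffixes
    for suffix in ("ming", "mers", "ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = [normalize(t) for t in text.split()]
    return [stem(t) for t in tokens if t and t not in STOP_WORDS]

print(preprocess("The swimmers are swimming, as you can see."))   # -> ['swim', 'swim', 'see']
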
Preprocessing
•  Mr. O’Neill thinks that the boys’ stories about Chile’s capital aren’t amusing.
•  How to handle special cases involving apostrophes, hyphens, etc.?
•  C++, C#, URLs, emails, phone numbers, dates
•  San Francisco, Los Angeles

Tokenization
•  Divide the text into a sequence of words by combining, dividing words, handling special characters, etc.
•  Issues of tokenization are language specific
   –  Requires the language to be known
      German compound nouns
   –  East Asian languages (Chinese, Japanese, Korean, Thai)
      •  Text is written without any spaces between words

Normalization
•  Token normalization
   –  Canonicalizing tokens so that matches occur despite superficial differences in the character sequences of the tokens
   –  U.S.A vs USA
   –  Anti-discriminatory vs antidiscriminatory
   –  Car vs automobile?

Stop Words
•  Very common words that have no discriminatory power
•  Sort terms by collection frequency and take the most frequent words
•  For an application, an additional domain-specific stop-word list may be constructed
   –  In a collection about insurance practices, “insurance” would be a stop word

Feature Encoding
(figure: feature-encoding steps in a text classification pipeline)
Source: http://www.3n1ltk.org/book/ch06.html

Feature Encoding
•  How to represent the features
•  Feature encoding can have tremendous impact on the classifier

Feature Extraction vs Feature Selection
•  Feature extraction:
   –  Transform data into a new feature space, usually by mapping existing features into a lower-dimensional space (PCA, ICA, etc.; we will come back to these)
•  Feature selection:
   –  Select a subset of the existing features without a transformation

Feature (Subset) Selection
•  Necessary in a number of situations:
   •  Features may be expensive to obtain
      –  Evaluate a large number of features in the test bed and select a subset for the final implementation
   •  You want to extract meaningful rules from your classifier
   •  Fewer features means fewer model parameters
      –  Improved generalization capabilities
      –  Reduced complexity and run-time

Runtime of Naïve Bayes
•  It is fast
•  Computation of the parameters can be done in O(CD)
   •  C: number of classes
   •  D: number of attributes/features

Incremental Updates
•  If the model is going to be updated very often as new data come in, you may implement it such that it allows easy incremental updates
•  For example: store raw counts instead of probabilities
   •  New example of class k:
      •  For each feature, update the counts based on the example's feature vector
      •  Update the class counts; update the number of training examples
   •  When you need to classify, compute the probabilities from the counts

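A minimal sketch of that idea: store raw counts so each new labeled example is a cheap update, and turn counts into (smoothed) probabilities only at prediction time. The class structure and names are illustrative, not from the slides.

import math
from collections import defaultdict

class IncrementalNB:
    """Multinomial-style Naive Bayes that stores raw counts, not probabilities."""

    def __init__(self):
        self.class_counts = defaultdict(int)                        # documents seen per class
        self.word_counts = defaultdict(lambda: defaultdict(int))    # word counts per class
        self.n_examples = 0

    def update(self, words, label):
        # Incremental update for one new labeled example: just bump counts
        self.class_counts[label] += 1
        self.n_examples += 1
        for w in words:
            self.word_counts[label][w] += 1

    def predict(self, words, vocab_size):
        # Probabilities are derived from the stored counts only when classifying
        best, best_score = None, float("-inf")
        for k in self.class_counts:
            score = math.log(self.class_counts[k] / self.n_examples)
            total = sum(self.word_counts[k].values())
            for w in words:
                score += math.log((self.word_counts[k][w] + 1) / (total + vocab_size))  # add-one smoothed
            if score > best_score:
                best, best_score = k, score
        return best

nb = IncrementalNB()
nb.update("goal match team".split(), "Sports")
nb.update("data algorithm compiler".split(), "Informatics")
print(nb.predict("goal team".split(), vocab_size=6))   # -> 'Sports'
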
The Independence Assumption
•  Usually features are not conditionally independent
•  That is why it is called naïve
•  In practice it often works well
•  Naïve Bayes does not produce accurate probability estimates when its independence assumptions are violated, but it may still (and often does) pick the correct maximum-probability class [Domingos & Pazzani, 1996]
•  It typically handles noise well, since it does not even focus on completely fitting the training data

What You Should Know
•  Training and using classifiers based on Bayes' rule
•  Conditional independence
   •  What it is
   •  Why it is important
•  Naïve Bayes
   •  What it is
   •  How to estimate the parameters
   •  How to make predictions
•  Mutual information is a good measure for filtering features
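For reference (definition added here, not spelled out on the slide), the mutual information between a word feature X and the class Y is

  I(X; Y) = Σ_x Σ_y P(x, y) log [ P(x, y) / (P(x) P(y)) ]

Features can then be ranked by this score, keeping the words that carry the most information about the class label.
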
