
nb24aug

The document discusses text classification methods, focusing on the Naive Bayes classifier, which uses Bayes' theorem for categorizing text based on word frequencies. It covers various applications such as spam detection, sentiment analysis, and authorship identification, while addressing challenges like zero probabilities and the need for smoothing techniques. Additionally, it highlights the importance of feature selection and the potential use of lexicons in improving classification accuracy.

Uploaded by

aditya426690


Text Classification and Naive Bayes
The Task of Text Classification
Is this spam?
Who wrote which Federalist Papers?
1787-8: essays anonymously written by Alexander Hamilton, James Madison, and John Jay
to convince New York to ratify the U.S. Constitution
Authorship of 12 of the letters is unclear between Alexander Hamilton and James Madison

1963: solved by Mosteller and Wallace using Bayesian methods
Positive or negative movie review?
unbelievably disappointing
Full of zany characters and richly applied satire, and
some great plot twists
this is the greatest screwball comedy ever filmed
It was pathetic. The worst part about it was the
boxing scenes.

What is the subject of this article?

; Matter or property, the substance, properties or materials of the subject.

: Energy, including the processes, operations and activities.

. Space, which relates to the geographic location of the subject.

' Time, which refers to the dates or seasons of the subject.

Example: "research in the cure of tuberculosis of lungs by x-ray conducted in India in 1950"
would be categorized as:

Medicine,Lungs;Tuberculosis:Treatment;X-ray:Research.India'1950

https://en.wikipedia.org/wiki/Colon_classification
Text Classification
Assigning subject categories, topics, or genres
Spam detection
Authorship identification (who wrote this?)
Language Identification (is this Portuguese?)
Sentiment analysis
…
Text Classification: definition

Input:
◦ a document d
◦ a fixed set of classes C = {c1, c2,…, cJ}

Output: a predicted class c ∈ C

Basic Classification Method:
Hand-coded rules
Rules based on combinations of words or other
features
◦ spam: black-list-address OR (“dollars” AND “have been selected”)
Accuracy can be high
• In very specific domains
• If rules are carefully refined by experts
But:
• building and maintaining rules is expensive
• they are too literal and specific: "high-precision, low-recall"
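The spam rule above can be sketched in a few lines of Python. This is only an illustration of a hand-coded rule: the function name `is_spam`, the `BLACKLIST` set, and the example address are hypothetical and not part of the slides.

```python
def is_spam(sender: str, body: str, blacklist: set) -> bool:
    """Hand-coded rule: blacklisted sender OR ("dollars" AND "have been selected")."""
    text = body.lower()
    on_blacklist = sender.lower() in blacklist
    phrase_match = "dollars" in text and "have been selected" in text
    return on_blacklist or phrase_match


# Hypothetical blacklist, for illustration only.
BLACKLIST = {"promo@example.com"}

flagged = is_spam("friend@example.org",
                  "You have been selected to receive a million dollars",
                  BLACKLIST)
missed = is_spam("friend@example.org",
                 "You were chosen to receive a large cash prize",
                 BLACKLIST)
```

The second message is a paraphrase of the first and slips through, which is the "high-precision, low-recall" behavior described above.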
Classification Method:
Supervised Machine Learning
Input:
◦ a document d
◦ a fixed set of classes C = {c1, c2,…, cJ}
◦ A training set of m hand-labeled documents (d1,c1),....,(dm,cm)
Output:
◦ a learned classifier γ:d → c

Classification Methods:
Supervised Machine Learning
Many kinds of classifiers!
• Naïve Bayes (this lecture)
• Logistic regression
• Neural networks
• k-nearest neighbors
• …

We can also use pretrained large language models!

• Fine-tuned as classifiers
• Prompted to give a classification
Text Classification and Naive Bayes
The Naive Bayes Classifier
Naive Bayes Intuition

Simple ("naive") classification method based on

Bayes rule
Relies on very simple representation of document
◦ Bag of words
The Bag of Words Representation

The classifier γ sees only a table of word counts and maps it to a class: γ(word counts) = c

word        count
seen        2
sweet       1
whimsical   1
recommend   1
happy       1
...         ...
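A bag of words is simply a multiset of tokens, so a minimal sketch can use `collections.Counter`. The tokenizer (a lowercase regular expression) and the example sentence are assumptions made for illustration; the slides do not specify either.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to word counts, discarding word order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical review text, chosen so that "seen" occurs twice.
bag = bag_of_words("I have seen it. Sweet, whimsical; seen it twice and would recommend it.")
```

Here `bag["seen"]` is 2 and `bag["sweet"]` is 1; the position of each word in the review is no longer recoverable.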
Bayes’ Rule Applied to Documents and Classes

•For a document d and a class c

$$P(c \mid d) = \frac{P(d \mid c)\,P(c)}{P(d)}$$
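Naive Bayes Classifier (I)

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(c \mid d)$$
MAP is "maximum a posteriori" = most likely class

$$= \operatorname*{argmax}_{c \in C} \frac{P(d \mid c)\,P(c)}{P(d)}$$
Bayes Rule

$$= \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c)$$
Dropping the denominator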
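Naive Bayes Classifier (II)

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(d \mid c)\,P(c)$$
P(d | c) is the "likelihood"; P(c) is the "prior"

$$= \operatorname*{argmax}_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$
Document d represented as features x1..xn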
Naïve Bayes Classifier (IV)

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$

Likelihood $P(x_1, x_2, \ldots, x_n \mid c)$: $O(|X|^n \cdot |C|)$ parameters. Could only be estimated if a very, very large number of training examples was available.

Prior $P(c)$: How often does this class occur? We can just count the relative frequencies in a corpus.
Multinomial Naive Bayes Independence
Assumptions
P(x1, x2,… , xn | c)
Bag of Words assumption: Assume position doesn’t matter
Conditional Independence: Assume the feature
probabilities P(xi|cj) are independent given the class c.

P(x1,… , xn | c) = P(x1 | c)·P(x2 | c)·P(x3 | c)·...·P(xn | c)

Multinomial Naive Bayes Classifier

$$c_{MAP} = \operatorname*{argmax}_{c \in C} P(x_1, x_2, \ldots, x_n \mid c)\,P(c)$$

$$c_{NB} = \operatorname*{argmax}_{c \in C} P(c) \prod_{x \in X} P(x \mid c)$$
Applying Multinomial Naive Bayes Classifiers
to Text Classification

positions ← all word positions in test document

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$
Problems with multiplying lots of probs
There's a problem with this:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$

Multiplying lots of probabilities can result in floating-point underflow!

.0006 * .0007 * .0009 * .01 * .5 * .000008….
Idea: Use logs, because log(ab) = log(a) + log(b)
We'll sum logs of probabilities instead of multiplying probabilities!
We actually do everything in log space
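Instead of this:
$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(x_i \mid c_j)$$

This:
$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \Big[ \log P(c_j) + \sum_{i \in positions} \log P(x_i \mid c_j) \Big]$$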

Notes:
1) Taking log doesn't change the ranking of classes!
The class with highest probability also has highest log probability!
2) It's a linear model:
Just a max of a sum of weights: a linear function of the inputs
So naive bayes is a linear classifier
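The underflow problem is easy to reproduce. The sketch below repeats the six probabilities shown above 20 times (the repetition count is an arbitrary choice for illustration) and compares the raw product with the sum of logs.

```python
import math

# The six example probabilities, repeated to mimic a 120-word document.
probs = [0.0006, 0.0007, 0.0009, 0.01, 0.5, 0.000008] * 20

# Naive product: the true value is far below the smallest positive double,
# so the running product collapses to exactly 0.0.
product = 1.0
for p in probs:
    product *= p

# Log space: a sum of moderately sized negative numbers, no underflow.
log_score = sum(math.log(p) for p in probs)
```

Once every class scores `0.0` the argmax is meaningless, whereas the log scores remain finite and comparable across classes.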
Text Classification and Naive Bayes
Naive Bayes: Learning
Sec.13.3

Learning the Multinomial Naive Bayes Model

First attempt: maximum likelihood estimates

◦ simply use the frequencies in the data

$$\hat{P}(c_j) = \frac{N_{c_j}}{N_{total}}$$

$$\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$$
Parameter estimation

$$\hat{P}(w_i \mid c_j) = \frac{count(w_i, c_j)}{\sum_{w \in V} count(w, c_j)}$$
= fraction of times word wi appears among all words in documents of topic cj

Create mega-document for topic j by concatenating all docs in this topic
◦ Use frequency of w in mega-document
Sec.13.3

Problem with Maximum Likelihood

What if we have seen no training documents with the word fantastic and classified in the topic positive (thumbs-up)?

$$\hat{P}(\text{"fantastic"} \mid \text{positive}) = \frac{count(\text{"fantastic"}, \text{positive})}{\sum_{w \in V} count(w, \text{positive})} = 0$$

Zero probabilities cannot be conditioned away, no matter the other evidence!

$$c_{MAP} = \operatorname*{argmax}_{c} \hat{P}(c) \prod_{i} \hat{P}(x_i \mid c)$$
Laplace (add-1) smoothing for Naïve Bayes

$$\hat{P}(w_i \mid c) = \frac{count(w_i, c) + 1}{\sum_{w \in V} \big(count(w, c) + 1\big)} = \frac{count(w_i, c) + 1}{\Big(\sum_{w \in V} count(w, c)\Big) + |V|}$$
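A small sketch of the add-1 estimate next to the unsmoothed one. The vocabulary and the counts for the positive class are made up for illustration; only the formula comes from the slide. Setting `alpha=0` recovers the maximum likelihood estimate.

```python
from collections import Counter


def likelihood(word: str, class_counts: Counter, vocab: set, alpha: float = 1.0) -> float:
    """Add-alpha estimate of P(word | class); alpha=1 is Laplace smoothing."""
    total = sum(class_counts[w] for w in vocab)
    return (class_counts[word] + alpha) / (total + alpha * len(vocab))


# Hypothetical data: "fantastic" never occurs in positive training documents.
vocab = {"fantastic", "great", "boring", "plot"}
positive = Counter({"great": 3, "plot": 1})

mle = likelihood("fantastic", positive, vocab, alpha=0.0)   # 0 / 4
smoothed = likelihood("fantastic", positive, vocab)          # (0 + 1) / (4 + 4)
```

With smoothing, the unseen word gets a small nonzero probability, and the estimates over the vocabulary still sum to 1.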
Multinomial Naïve Bayes: Learning

• From training corpus, extract Vocabulary

But removing stop words doesn't usually help

• So in practice most NB algorithms use all words and don't
use stopword lists
Text Classification and Naive Bayes
Sentiment and Binary Naive Bayes
Let's do a worked sentiment example!
A worked sentiment example with add-1 smoothing
1. Prior from training:
$$\hat{P}(c_j) = \frac{N_{c_j}}{N_{total}}$$
P(-) = 3/5
P(+) = 2/5
2. Drop "with"
3. Likelihoods from training:
$$p(w_i \mid c) = \frac{count(w_i, c) + 1}{\Big(\sum_{w \in V} count(w, c)\Big) + |V|}$$
4. Scoring the test set:
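The four steps can be traced end to end in code. The training table and test sentence from the slide did not survive extraction, so the five training sentences and the test document below are an assumption: a toy corpus chosen to reproduce the priors 3/5 and 2/5 and the unknown word "with". The function names are likewise illustrative.

```python
import math
from collections import Counter

# Assumed toy corpus (not recovered from the slide): 3 negative, 2 positive documents.
train = [
    ("-", "just plain boring"),
    ("-", "entirely predictable and lacks energy"),
    ("-", "no surprises and very few laughs"),
    ("+", "very powerful"),
    ("+", "the most fun film of the summer"),
]
test_doc = "predictable with no fun"


def train_nb(data):
    """Return log priors, per-class word counts, and the vocabulary."""
    doc_counts = Counter(label for label, _ in data)
    word_counts = {label: Counter() for label in doc_counts}
    for label, text in data:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(data)) for c, n in doc_counts.items()}
    return log_prior, word_counts, vocab


def score(doc, log_prior, word_counts, vocab):
    """Log of P(c) * prod P(w|c) with add-1 smoothing, for every class."""
    scores = {}
    for c in log_prior:
        total = sum(word_counts[c].values())
        s = log_prior[c]
        for w in doc.split():
            if w in vocab:  # step 2: words outside the vocabulary ("with") are dropped
                s += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        scores[c] = s
    return scores


log_prior, word_counts, vocab = train_nb(train)
scores = score(test_doc, log_prior, word_counts, vocab)
prediction = max(scores, key=scores.get)
```

On this assumed corpus the vocabulary has 20 word types, the negative class has 14 tokens and the positive class 9, so the negative score is 3/5 × (2·2·1)/34³ and the positive score is 2/5 × (1·1·2)/29³; the model predicts negative.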
Optimizing for sentiment analysis
For tasks like sentiment, word occurrence seems to
be more important than word frequency.
◦ The occurrence of the word fantastic tells us a lot
◦ The fact that it occurs 5 times may not tell us much more.
Binary multinomial naive Bayes, or binary NB
◦ Clip our word counts at 1
◦ Note: this is different than Bernoulli naive bayes; see the
textbook at the end of the chapter.
Binary Multinomial Naïve Bayes: Learning
• From training corpus, extract Vocabulary
• Calculate P(cj) terms
◦ For each cj in C do
   docsj ← all docs with class = cj
   $$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$$
• Calculate P(wk | cj) terms
◦ Remove duplicates in each doc:
   • For each word type w in docj
   • Retain only a single instance of w
◦ Textj ← single doc containing all docsj
◦ For each word wk in Vocabulary
   nk ← # of occurrences of wk in Textj
   $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|Vocabulary|}$$
Binary Multinomial Naive Bayes
on a test document d
First remove all duplicate words from d
Then compute NB using the same equation:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)$$

Binary multinomial naive Bayes

Counts can still be 2! Binarization is within-doc!
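A minimal sketch of why binarized counts can still exceed 1: duplicates are removed inside each document, but the per-class count still adds up across documents. The two example documents are hypothetical.

```python
from collections import Counter


def binarized_counts(docs):
    """Count each word type at most once per document, then sum over documents."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.split()))  # set() removes within-document duplicates
    return counts


# Hypothetical positive-class documents: "great" occurs three times in total.
docs = ["great great film", "great fun"]
counts = binarized_counts(docs)
```

Here `counts["great"]` is 2, not 3 and not 1: one count from each document that contains the word.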

Text Classification and Naive Bayes
More on Sentiment Classification
Sentiment Classification: Dealing with Negation
I really like this movie
I really don't like this movie

Negation changes the meaning of "like" to negative.

Negation can also change negative to positive-ish
◦ Don't dismiss this film
◦ Doesn't let us get bored
Sentiment Classification: Dealing with Negation
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In
Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using
Machine Learning Techniques. EMNLP-2002, 79—86.

Simple baseline method:

Add NOT_ to every word between negation and following punctuation:

didn’t like this movie , but I

didn’t NOT_like NOT_this NOT_movie but I
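The baseline can be sketched as a single pass over the tokens. The negation word list and the punctuation set below are assumptions for illustration; the slide only gives the rule and one example. Unlike the slide's output line, this sketch keeps the punctuation token in the result.

```python
# Assumed trigger words; contractions ending in n't are handled separately.
NEGATIONS = {"not", "no", "never"}
PUNCTUATION = set(".,;:!?")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation and the next punctuation mark."""
    out, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False          # punctuation ends the negated span
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            low = tok.lower()
            if low in NEGATIONS or low.endswith(("n't", "n’t")):
                negating = True
    return out


marked = mark_negation("didn't like this movie , but I".split())
```

The classifier then treats `NOT_like` as a separate vocabulary item from `like`, so it can learn a different class association for it.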

Sentiment Classification: Lexicons
Sometimes we don't have enough labeled training
data
In that case, we can make use of pre-built word lists
Called lexicons
There are various publicly available lexicons
MPQA Subjectivity Cues Lexicon
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in
Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.

Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

Home page: https://mpqa.cs.pitt.edu/lexicons/subj_lexicon/

6885 words from 8221 lemmas, annotated for intensity (strong/weak)
◦ 2718 positive
◦ 4912 negative
+ : admirable, beautiful, confident, dazzling, ecstatic, favor, glee, great
− : awful, bad, bias, catastrophe, cheat, deny, envious, foul, harsh, hate

Using Lexicons in Sentiment Classification
Add a feature that gets a count whenever a word
from the lexicon occurs
◦ E.g., a feature called "this word occurs in the positive
lexicon" or "this word occurs in the negative lexicon"
Now all positive words (good, great, beautiful,
wonderful) or negative words count for that feature.
Using 1-2 features isn't as good as using all the words.
• But when training data is sparse or not representative of the
test set, dense lexicon features can help
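A sketch of the two dense lexicon features. The word sets are a tiny sample of the MPQA examples listed earlier, not the full lexicon, and the feature names are hypothetical.

```python
# Small samples of the example words shown for the MPQA lexicon.
POSITIVE = {"admirable", "beautiful", "confident", "dazzling", "great"}
NEGATIVE = {"awful", "bad", "catastrophe", "harsh", "hate"}


def lexicon_features(tokens):
    """Two counts: tokens found in the positive and in the negative lexicon."""
    return {
        "in_positive_lexicon": sum(t.lower() in POSITIVE for t in tokens),
        "in_negative_lexicon": sum(t.lower() in NEGATIVE for t in tokens),
    }


features = lexicon_features("a beautiful film with a great cast but awful pacing".split())
```

Every positive word contributes to the same feature, so the model needs far fewer labeled examples than it would to learn a weight for each word separately.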
Naive Bayes in Other tasks: Spam Filtering
SpamAssassin Features:
◦ Mentions millions of (dollar) ((dollar) NN,NNN,NNN.NN)
◦ From: starts with many numbers
◦ Subject is all capitals
◦ HTML has a low ratio of text to image area
◦ "One hundred percent guaranteed"
◦ Claims you can be removed from the list
Naive Bayes in Language ID
Determining what language a piece of text is written in.
Features based on character n-grams do very well
Important to train on lots of varieties of each language
(e.g., American English varieties like African-American English,
or English varieties around the world like Indian English)
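Character n-gram features can be extracted with a sliding window. This is a minimal sketch: the choice of n = 3, the lowercasing, and the space padding at both ends are assumptions, since the slides do not give details.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, padding with one space on each side."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


grams = char_ngrams("the")
```

For the input `"the"` this yields the three trigrams `" th"`, `"the"`, and `"he "`; the counts can then be used as features in place of word counts.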
Knowledge to numbers
• Labeling few documents
• Labeling few words
Summary: Naive Bayes is Not So Naive
Very fast, low storage requirements
Works well with very small amounts of training data
Robust to irrelevant features
◦ Irrelevant features cancel each other without affecting results
Very good in domains with many equally important features
◦ Decision trees suffer from fragmentation in such cases, especially if there is little data
Optimal if the independence assumptions hold: if the assumed independence is correct, then it is the Bayes Optimal Classifier for the problem
A good dependable baseline for text classification
◦ But we will see other classifiers that give better accuracy
Slide from Chris Manning
Text Classification and Naïve Bayes
Naïve Bayes: Relationship to Language Modeling
Dan Jurafsky

Generative Model for Multinomial Naïve Bayes

c=China

X1=Shanghai X2=and X3=Shenzhen X4=issue X5=bonds


Naïve Bayes and Language Modeling

• Naïve Bayes classifiers can use any sort of feature
   • URL, email address, dictionaries, network features
• But if, as in the previous slides,
   • we use only word features
   • we use all of the words in the text (not a subset)
• Then
   • Naïve Bayes has an important similarity to language modeling.
Sec.13.2.1

Each class = a unigram language model

• Assigning each word: P(word | c)
• Assigning each sentence: $P(s \mid c) = \prod_{word \in s} P(word \mid c)$

Class pos
P(word | pos)   word
0.1             I
0.1             love
0.01            this
0.05            fun
0.1             film

I     love   this   fun    film
0.1   0.1    0.01   0.05   0.1

P(s | pos) = 0.0000005
Sec.13.2.1

Naïve Bayes as a Language Model

• Which class assigns the higher probability to s?

word    Model pos   Model neg
I       0.1         0.2
love    0.1         0.001
this    0.01        0.01
fun     0.05        0.005
film    0.1         0.1

s:      I     love    this   fun     film
pos:    0.1   0.1     0.01   0.05    0.1
neg:    0.2   0.001   0.01   0.005   0.1

P(s|pos) > P(s|neg)
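The comparison can be checked directly with the two unigram tables from the slide. The helper name `sentence_prob` is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

# Unigram probabilities P(word | class) as given on the slide.
pos = {"I": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}
neg = {"I": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}


def sentence_prob(sentence: str, model: dict) -> float:
    """P(s | c) under a unigram model: the product of the word probabilities."""
    return math.prod(model[w] for w in sentence.split())


s = "I love this fun film"
p_pos = sentence_prob(s, pos)   # 0.1 * 0.1 * 0.01 * 0.05 * 0.1
p_neg = sentence_prob(s, neg)   # 0.2 * 0.001 * 0.01 * 0.005 * 0.1
```

The positive model assigns about 5 × 10⁻⁷ and the negative model about 1 × 10⁻⁹, so P(s|pos) > P(s|neg). A full classifier would also multiply each by the class prior.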
Text Classification and Naive Bayes
Precision, Recall, and F1
Evaluating Classifiers: How well does our
classifier work?
Let's first address binary classifiers:
• Is this email spam?
spam (+) or not spam (-)
• Is this post about Delicious Pie Company?
about Del. Pie Co (+) or not about Del. Pie Co(-)

We'll need to know

1. What did our classifier say about each email or post?
2. What should our classifier have said, i.e., the correct
answer, usually as defined by humans ("gold label")
First step in evaluation: The confusion matrix

                        gold standard labels
                        gold positive     gold negative
system positive         true positive     false positive
system negative         false negative    true negative

$$precision = \frac{tp}{tp + fp} \qquad recall = \frac{tp}{tp + fn} \qquad accuracy = \frac{tp + tn}{tp + fp + tn + fn}$$
Accuracy on the confusion matrix
$$accuracy = \frac{tp + tn}{tp + fp + tn + fn}$$
Why don't we use accuracy?
Accuracy doesn't work well when we're dealing with
uncommon or imbalanced classes
Suppose we look at 1,000,000 social media posts to find
Delicious Pie-lovers (or haters)
• 100 of them talk about our pie
• 999,900 are posts about something unrelated
Imagine the following simple classifier
Every post is "not about pie"
Accuracy re: pie posts
100 posts are about pie; 999,900 aren't
Why don't we use accuracy?
Accuracy of our "nothing is pie" classifier
999,900 true negatives and 100 false negatives
Accuracy is 999,900/1,000,000 = 99.99%!
But useless at finding pie-lovers (or haters)!!
Which was our goal!
Accuracy doesn't work well for unbalanced classes
Most tweets are not about pie!
Instead of accuracy we use precision and recall
$$precision = \frac{tp}{tp + fp} \qquad recall = \frac{tp}{tp + fn}$$

Precision: % of selected items that are correct

Recall: % of correct items that are selected
Precision/Recall aren't fooled by the "just call everything negative" classifier!
Stupid classifier: Just say no: every tweet is "not about pie"
• 100 tweets talk about pie, 999,900 tweets don't
• Accuracy = 999,900/1,000,000 = 99.99%
But the Recall and Precision for this classifier are terrible:
• Recall = 0/(0 + 100) = 0: none of the 100 pie tweets is found
• Precision = 0/(0 + 0) is undefined: nothing is ever labeled positive
A combined measure: F1
F1 is a combination of precision and recall.
F1 is a special case of the general "F-measure"

F-measure is the (weighted) harmonic mean of

precision and recall

F1 is a special case of F-measure with β=1, α=½
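The F-measure formula itself did not survive on the slide. The standard definition is $F_\beta = \frac{(\beta^2 + 1)PR}{\beta^2 P + R}$, equivalently $F = 1 / (\alpha \frac{1}{P} + (1 - \alpha)\frac{1}{R})$ with $\beta^2 = (1 - \alpha)/\alpha$, so β = 1 (α = ½) gives $F_1 = \frac{2PR}{P + R}$. The sketch below applies it to the "nothing is pie" classifier and to one made-up confusion matrix. Returning `None` for an undefined precision and an F-score of 0.0 in that case are conventions chosen here, not something the slides specify.

```python
def precision(tp, fp):
    """Fraction of selected items that are correct; None if nothing was selected."""
    return tp / (tp + fp) if (tp + fp) else None


def recall(tp, fn):
    """Fraction of correct items that are selected; None if there are no positives."""
    return tp / (tp + fn) if (tp + fn) else None


def f_beta(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (0.0 if either is 0 or undefined)."""
    if not p or not r:
        return 0.0
    b2 = beta ** 2
    return (b2 + 1) * p * r / (b2 * p + r)


# "Nothing is pie" classifier: 100 false negatives, 999,900 true negatives.
tp, fp, fn, tn = 0, 0, 100, 999_900
accuracy = (tp + tn) / (tp + fp + fn + tn)
pie_precision = precision(tp, fp)
pie_recall = recall(tp, fn)
pie_f1 = f_beta(pie_precision, pie_recall)

# Hypothetical classifier: tp=60, fp=40, fn=20 gives P=0.6, R=0.75.
f1 = f_beta(precision(60, 40), recall(60, 20))
```

Accuracy stays near 1 for the pie classifier while recall and F1 are 0, which is why F1 is the more informative number on imbalanced classes.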

Suppose we have more than 2 classes?
Lots of text classification tasks have more than two classes.
◦ Sentiment analysis (positive, negative, neutral) , named entities (person, location, organization)

We can define precision and recall for multiple classes like this 3-way email task:

                        gold labels
                        urgent   normal   spam
system    urgent           8       10        1     precision_u = 8/(8+10+1)
output    normal           5       60       50     precision_n = 60/(5+60+50)
          spam             3       30      200     precision_s = 200/(3+30+200)

recall_u = 8/(8+5+3)     recall_n = 60/(10+60+30)     recall_s = 200/(1+50+200)
How to combine P/R values for different classes:
Microaveraging vs Macroaveraging

Each class is scored as a 2×2 table (system yes/no against gold yes/no); the pooled table sums the three.

                    system yes,   system yes,   system no,   system no,
                    gold yes      gold no       gold yes     gold no      precision
Class 1: Urgent          8            11             8          340       8/(8+11) = .42
Class 2: Normal         60            55            40          212       60/(60+55) = .52
Class 3: Spam          200            33            51           83       200/(200+33) = .86
Pooled                 268            99            99          635       268/(268+99) = .73

$$\text{microaverage precision} = \frac{268}{268 + 99} = .73$$

$$\text{macroaverage precision} = \frac{.42 + .52 + .86}{3} = .60$$
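Both averages can be recomputed from the 3-way confusion matrix shown earlier (rows are system output, columns are gold labels). The variable names are illustrative.

```python
labels = ["urgent", "normal", "spam"]

# confusion[i][j] = number of items the system labeled labels[i] whose gold label is labels[j]
confusion = [
    [8, 10, 1],
    [5, 60, 50],
    [3, 30, 200],
]

tp = [confusion[i][i] for i in range(len(labels))]
fp = [sum(confusion[i]) - tp[i] for i in range(len(labels))]   # rest of the system row

per_class = [tp[i] / (tp[i] + fp[i]) for i in range(len(labels))]

# Macroaverage: mean of the per-class precisions (every class counts equally).
macro_precision = sum(per_class) / len(per_class)

# Microaverage: pool all decisions first (dominated by the frequent class, spam).
micro_precision = sum(tp) / (sum(tp) + sum(fp))
```

The microaverage (about .73) sits closer to the spam precision because spam contributes most of the decisions, while the macroaverage (about .60) weights the rare urgent class equally.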
Text Classification and Naive Bayes
Avoiding Harms in Classification
Harms of classification
Classifiers, like any NLP algorithm, can cause harms
This is true for any classifier, whether Naive Bayes or
other algorithms
Representational Harms
• Harms caused by a system that demeans a social group
• Such as by perpetuating negative stereotypes about them.
• Kiritchenko and Mohammad 2018 study
• Examined 200 sentiment analysis systems on pairs of sentences
• Identical except for names:
• common African American (Shaniqua) or European American (Stephanie).
• Like "I talked to Shaniqua yesterday" vs "I talked to Stephanie yesterday"
• Result: systems assigned lower sentiment and more negative
emotion to sentences with African American names
• Downstream harm:
• Perpetuates stereotypes about African Americans
• African Americans treated differently by NLP tools like sentiment (widely
used in marketing research, mental health studies, etc.)
Multinomial Naïve Bayes: Learning

• From training corpus, extract Vocabulary
• Calculate P(cj) terms
◦ For each cj in C do
   docsj ← all docs with class = cj
   $$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$$
• Calculate P(wk | cj) terms
◦ Textj ← single doc containing all docsj
◦ For each word wk in Vocabulary
   nk ← # of occurrences of wk in Textj
   $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|Vocabulary|}$$
Harms of Censorship
• Toxicity detection is the text classification task of detecting hate speech,
abuse, harassment, or other kinds of toxic language.
• Widely used in online content moderation
• Toxicity classifiers incorrectly flag non-toxic sentences that simply
mention minority identities (like the words "blind" or "gay")
• women (Park et al., 2018),
• disabled people (Hutchinson et al., 2020)
• gay people (Dixon et al., 2018; Oliva et al., 2021)
• Downstream harms:
• Censorship of speech by disabled people and other groups
• Speech by these groups becomes less visible online
• Writers might be nudged by these algorithms to avoid these words
making people less likely to write about themselves or these groups.
Performance Disparities
1. Text classifiers perform worse on many languages of
the world due to lack of data or labels
2. Text classifiers perform worse on varieties of even
high-resource languages like English
• Example task: language identification, a first step in NLP
pipeline ("Is this post in English or not?")
• English language detection performance worse for writers
who are African American (Blodgett and O'Connor 2017)
or from India (Jurgens et al., 2017)
Harms in text classification
• Causes:
• Issues in the data; NLP systems amplify biases in training data
• Problems in the labels
• Problems in the algorithms (like what the model is trained to
optimize)
• Prevalence: The same problems occur throughout NLP
(including large language models)
• Solutions: There are no general mitigations or solutions
• But harm mitigation is an active area of research
• And there are standard benchmarks and tools that we can use
for measuring some of the harms
Text Annotation
• Multiple annotators
• Who? Why? What?
• Rationales
• Modeling (dis)agreement
• Annotation cost
• Positionality
• Invariance assumptions
