
Class Adv Classification IV

Bayesian classifiers are statistical classifiers that predict class membership probabilities using Bayes Theorem, with Naïve Bayesian classifiers being computationally simple and effective. They calculate explicit probabilities for hypotheses, allowing for incremental learning by combining prior knowledge with observed data. The Naïve Bayes model assumes independence among features given the class label, simplifying computations and making it practical for various applications such as digit recognition and classification tasks.


Bayesian Classification

What are Bayesian Classifiers?


• Statistical classifiers
• Predict class membership probabilities
• Based on Bayes Theorem
• Naïve Bayesian Classifier
  – Computationally simple
  – Comparable performance with decision tree (DT) and neural network (NN) classifiers
Bayesian Classification

• Probabilistic learning: calculates explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
Bayes Theorem
• Let X be a data sample whose class label is unknown
• Let H be the hypothesis that X belongs to a class C
• For classification, determine P(H|X)
• P(H|X) is the probability that H holds given the observed data sample X
• P(H|X) is the posterior probability of H conditioned on X
Bayes Theorem
Example: sample space is all fruits, described by their color and shape
• X is “round” and “red”
• H = the hypothesis that X is an apple
• P(H|X) is our confidence that X is an apple given that X is “round” and “red”
• P(H) is the prior probability of H, i.e. the probability that any given data sample is an apple, regardless of how it looks
• P(H|X) is based on more information
• Note that P(H) is independent of X
Bayes Theorem
Example: sample space is all fruits
• What is P(X|H)?
  – It is the probability that X is round and red, given that we know X is an apple
• Here P(X) is the prior probability = P(a data sample from our set of fruits is red and round)
Estimating Probabilities
• P(X), P(H), and P(X|H) may be estimated from the given data
• Bayes Theorem:

  P(H|X) = P(X|H) P(H) / P(X)

• Bayes Theorem is used in the Naïve Bayesian Classifier!!
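A quick numeric illustration of the theorem (a minimal Python sketch; the fruit probabilities below are made-up values for illustration, not figures from the slides):

  # P(H)   : prior probability that a fruit is an apple (assumed value)
  # P(X|H) : probability a fruit is round and red given it is an apple (assumed value)
  # P(X)   : probability that any fruit in the sample is round and red (assumed value)
  p_h = 0.20
  p_x_given_h = 0.80
  p_x = 0.30

  p_h_given_x = p_x_given_h * p_h / p_x   # Bayes Theorem
  print(f"P(H|X) = {p_h_given_x:.3f}")    # posterior, about 0.533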
Bayesian Classifiers
• Consider each attribute and the class label as random variables
• Given a record with attributes (A1, A2, …, An)
  – Goal is to predict class C
  – Specifically, we want to find the value of C that maximizes P(C | A1, A2, …, An)
• Can we estimate P(C | A1, A2, …, An) directly from data?


Application

• Digit recognition: an image of a digit is fed to a classifier, which outputs a label such as 5
• X1, …, Xn ∈ {0, 1} (black vs. white pixels)
• Y ∈ {5, 6} (predict whether a digit is a 5 or a 6)
The Bayes Classifier
• In class, we saw that a good strategy is to predict the most probable class given the features: argmax over y of P(Y = y | X1, …, Xn)
  – (for example: what is the probability that the image represents a 5, given its pixels?)
• So … how do we compute that?

The Bayes Classifier

• Use Bayes Rule!

  P(Y | X1, …, Xn) = P(X1, …, Xn | Y) P(Y) / P(X1, …, Xn)

  where P(X1, …, Xn | Y) is the likelihood, P(Y) is the prior, and P(X1, …, Xn) is the normalization constant

• Why did this help? Well, we think that we might be able to specify how features are “generated” by the class label
The Bayes Classifier
• Let’s expand this for our digit recognition task:

  P(Y = 5 | X1, …, Xn) ∝ P(X1, …, Xn | Y = 5) P(Y = 5)
  P(Y = 6 | X1, …, Xn) ∝ P(X1, …, Xn | Y = 6) P(Y = 6)

• To classify, we’ll simply compute these two probabilities and predict based on which one is greater
Model Parameters

• For the Bayes classifier, we need to “learn” two functions, the likelihood and the prior

• How many parameters are required to specify the prior for our digit recognition example?
  – Just one: P(Y = 5), since P(Y = 6) = 1 − P(Y = 5)
Model Parameters

• How many parameters are required to specify the likelihood?
  – (Supposing that each image is 30×30 = 900 pixels)
  – 2(2^900 − 1): one joint distribution over 900 binary pixels for each of the two classes
Model Parameters

• The problem with explicitly modeling P(X1, …, Xn | Y) is that there are usually way too many parameters:
  – We’ll run out of space
  – We’ll run out of time
  – And we’ll need tons of training data (which is usually not available)
The Naïve Bayes Model

• The Naïve Bayes Assumption: assume that all features are independent given the class label Y
• Equationally speaking:

  P(X1, …, Xn | Y) = ∏i P(Xi | Y)
Why is this useful?

• # of parameters for modeling P(X1, …, Xn | Y): 2(2^n − 1)
• # of parameters for modeling P(X1 | Y), …, P(Xn | Y): 2n
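A quick way to see the size of this gap for the 30×30-pixel example (a minimal Python sketch, exact integer arithmetic only):

  # Parameter counts for binary features and two classes (n = 900 pixels).
  n = 900

  joint_params = 2 * (2**n - 1)   # full joint model P(X1,...,Xn | Y)
  naive_params = 2 * n            # Naive Bayes model P(X1|Y), ..., P(Xn|Y)

  print(f"Full joint model : ~10^{len(str(joint_params)) - 1} parameters")
  print(f"Naive Bayes model: {naive_params} parameters")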
Naïve Bayesian Classification
• Also called Simple Bayesian Classification
• Why naïve/simple?
  – Class conditional independence: the effect of an attribute value on a given class is independent of the values of the other attributes
• This assumption simplifies computations
Bayesian Classifiers
• Approach:
  – Compute the posterior probability P(C | A1, A2, …, An) for all values of C using Bayes theorem:

    P(C | A1, A2, …, An) = P(A1, A2, …, An | C) P(C) / P(A1, A2, …, An)

  – Choose the value of C that maximizes P(C | A1, A2, …, An)
  – Equivalent to choosing the value of C that maximizes P(A1, A2, …, An | C) P(C), since the denominator is the same for every class
• How to estimate P(A1, A2, …, An | C)?


Naïve Bayes Classifier

• Assume independence among the attributes Ai when the class is given:
  – P(A1, A2, …, An | Cj) = P(A1 | Cj) P(A2 | Cj) … P(An | Cj)
  – Can estimate P(Ai | Cj) for all Ai and Cj
  – A new point is classified to Cj if P(Cj) ∏i P(Ai | Cj) is maximal


Example
How to Estimate Probabilities from Data?

Training data (Tid, Refund, Marital Status, Taxable Income, Evade):

  1   Yes  Single    125K  No
  2   No   Married   100K  No
  3   No   Single    70K   No
  4   Yes  Married   120K  No
  5   No   Divorced  95K   Yes
  6   No   Married   60K   No
  7   Yes  Divorced  220K  No
  8   No   Single    85K   Yes
  9   No   Married   75K   No
  10  No   Single    90K   Yes

• Class prior: P(C) = Nc / N
  – e.g., P(No) = 7/10, P(Yes) = 3/10

• For discrete attributes: P(Ai | Ck) = |Aik| / Nc
  – where |Aik| is the number of instances having attribute value Ai and belonging to class Ck, and Nc is the number of instances of class Ck
  – Examples:
      P(Status=Married | No) = 4/7
      P(Refund=Yes | Yes) = 0
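A minimal sketch (assuming pandas is available; the table above is re-entered by hand) of how these counts become probabilities:

  import pandas as pd

  # Training data from the slide (Refund, Marital Status, Taxable Income, Evade).
  df = pd.DataFrame({
      "Refund":  ["Yes","No","No","Yes","No","No","Yes","No","No","No"],
      "Marital": ["Single","Married","Single","Married","Divorced",
                  "Married","Divorced","Single","Married","Single"],
      "Income":  [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],
      "Evade":   ["No","No","No","No","Yes","No","No","Yes","No","Yes"],
  })

  # Class priors P(C) = Nc / N
  priors = df["Evade"].value_counts(normalize=True)
  print(priors)                      # No: 0.7, Yes: 0.3

  # Conditional probability for a discrete attribute, P(Ai | Ck) = |Aik| / Nc
  no_rows = df[df["Evade"] == "No"]
  p_married_given_no = (no_rows["Marital"] == "Married").mean()
  print(p_married_given_no)          # 4/7 ≈ 0.571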
How to Estimate Probabilities from Data?
• For continuous attributes:
  – Discretize the range into bins
  – Two-way split: (A < v) or (A > v)
    • choose only one of the two splits as a new attribute
  – Probability density estimation:
    • Assume the attribute follows a normal distribution
    • Use the data to estimate the parameters of the distribution (e.g., mean and standard deviation)
    • Once the probability distribution is known, it can be used to estimate the conditional probability P(Ai | c)
How to Estimate Probabilities from Data?
• Normal distribution:

  P(Ai | cj) = (1 / (sqrt(2π) σij)) · exp( −(Ai − μij)² / (2 σij²) )

  – One pair of parameters (μij, σij) for each (Ai, cj) pair

• For (Income, Class=No), using the training data table above:
  – sample mean = 110
  – sample variance = 2975

  P(Income=120 | No) = (1 / (sqrt(2π) · 54.54)) · exp( −(120 − 110)² / (2 · 2975) ) = 0.0072
Example of Naïve Bayes Classifier
Given a test record:

  X = (Refund = No, Marital Status = Married, Income = 120K)

Naïve Bayes probabilities estimated from the training data:

  P(Refund=Yes | No) = 3/7                 P(Refund=Yes | Yes) = 0
  P(Refund=No | No) = 4/7                  P(Refund=No | Yes) = 1
  P(Marital Status=Single | No) = 2/7      P(Marital Status=Single | Yes) = 2/7
  P(Marital Status=Divorced | No) = 1/7    P(Marital Status=Divorced | Yes) = 1/7
  P(Marital Status=Married | No) = 4/7     P(Marital Status=Married | Yes) = 0

  For Taxable Income:
    If Class=No:  sample mean = 110, sample variance = 2975
    If Class=Yes: sample mean = 90,  sample variance = 25

Classification:

  P(X | Class=No)  = P(Refund=No | No) × P(Married | No) × P(Income=120K | No)
                   = 4/7 × 4/7 × 0.0072 = 0.0024

  P(X | Class=Yes) = P(Refund=No | Yes) × P(Married | Yes) × P(Income=120K | Yes)
                   = 1 × 0 × 1.2×10⁻⁹ = 0

  Since P(X | No) P(No) > P(X | Yes) P(Yes), we have P(No | X) > P(Yes | X)
  => Class = No
Naïve Bayes Classifier
• If one of the conditional probabilities is zero, then the entire expression becomes zero
• Probability estimation:

  Original:    P(Ai | C) = Nic / Nc
  Laplace:     P(Ai | C) = (Nic + 1) / (Nc + c)
  m-estimate:  P(Ai | C) = (Nic + m·p) / (Nc + m)

  where c is the number of classes, p is the prior probability, and m is a parameter
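A minimal Python sketch of the three estimators (the example counts come from the Refund attribute above; function names are illustrative):

  # n_ic : count of training instances in class C with attribute value Ai
  # n_c  : count of training instances in class C
  # c    : number of classes, p : prior probability, m : user-chosen parameter

  def original(n_ic, n_c):
      return n_ic / n_c

  def laplace(n_ic, n_c, c):
      return (n_ic + 1) / (n_c + c)

  def m_estimate(n_ic, n_c, m, p):
      return (n_ic + m * p) / (n_c + m)

  # P(Refund=Yes | Evade=Yes) was 0/3 with the original estimate,
  # but stays non-zero with smoothing.
  print(original(0, 3))            # 0.0
  print(laplace(0, 3, 2))          # 0.2
  print(m_estimate(0, 3, 3, 0.5))  # 0.25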
Example of Naïve Bayes Classifier
Training data (Name, Give Birth, Can Fly, Live in Water, Have Legs, Class):

  human          yes  no   no         yes  mammals
  python         no   no   no         no   non-mammals
  salmon         no   no   yes        no   non-mammals
  whale          yes  no   yes        no   mammals
  frog           no   no   sometimes  yes  non-mammals
  komodo         no   no   no         yes  non-mammals
  bat            yes  yes  no         yes  mammals
  pigeon         no   yes  no         yes  non-mammals
  cat            yes  no   no         yes  mammals
  leopard shark  yes  no   yes        no   non-mammals
  turtle         no   no   sometimes  yes  non-mammals
  penguin        no   no   sometimes  yes  non-mammals
  porcupine      yes  no   no         yes  mammals
  eel            no   no   yes        no   non-mammals
  salamander     no   no   sometimes  yes  non-mammals
  gila monster   no   no   no         yes  non-mammals
  platypus       no   no   no         yes  mammals
  owl            no   yes  no         yes  non-mammals
  dolphin        yes  no   yes        no   mammals
  eagle          no   yes  no         yes  non-mammals

Test record (A): Give Birth = yes, Can Fly = no, Live in Water = yes, Have Legs = no, Class = ?

Let A denote the attributes, M = mammals, N = non-mammals:

  P(A | M) = 6/7 × 6/7 × 2/7 × 2/7 = 0.06
  P(A | N) = 1/13 × 10/13 × 3/13 × 4/13 = 0.0042

  P(A | M) P(M) = 0.06 × 7/20 = 0.021
  P(A | N) P(N) = 0.0042 × 13/20 = 0.0027

  Since P(A | M) P(M) > P(A | N) P(N) => Mammals
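A quick arithmetic check of the products above (plain Python; counts copied from the table):

  # Test record A = (Give Birth=yes, Can Fly=no, Live in Water=yes, Have Legs=no)
  # Counts from the table: 7 mammals, 13 non-mammals out of 20 animals.
  p_a_given_m = (6/7) * (6/7) * (2/7) * (2/7)       # ≈ 0.06
  p_a_given_n = (1/13) * (10/13) * (3/13) * (4/13)  # ≈ 0.0042

  p_m, p_n = 7/20, 13/20                            # class priors

  print(p_a_given_m * p_m)   # ≈ 0.021  -> mammals wins
  print(p_a_given_n * p_n)   # ≈ 0.0027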
Naïve Bayesian Classification
Example
  Age     Income  Student  Credit_rating  Class: Buys_comp
  <=30    HIGH    N        FAIR           N
  <=30    HIGH    N        EXCELLENT      N
  31…40   HIGH    N        FAIR           Y
  >40     MEDIUM  N        FAIR           Y
  >40     LOW     Y        FAIR           Y
  >40     LOW     Y        EXCELLENT      N
  31…40   LOW     Y        EXCELLENT      Y
  <=30    MEDIUM  N        FAIR           N
  <=30    LOW     Y        FAIR           Y
  >40     MEDIUM  Y        FAIR           Y
  <=30    MEDIUM  Y        EXCELLENT      Y
  31…40   MEDIUM  N        EXCELLENT      Y
  31…40   HIGH    Y        FAIR           Y
  >40     MEDIUM  N        EXCELLENT      N
Naïve Bayesian Classification
Example
A = (Age <= 30, Income = MEDIUM, Student = Y, Credit_rating = FAIR, Class = ???)
We need to maximize P(A | Cj) P(Cj) for j = 1, 2.
P(Cj) is computed from the training sample:
  P(buys_comp=Y) = 9/14 = 0.643
  P(buys_comp=N) = 5/14 = 0.357
How do we calculate P(A | Cj) P(Cj) for j = 1, 2?
  P(A | Cj) = P(A1, A2, A3, A4 | Cj) = ∏k P(Ak | Cj)
Naïve Bayesian Classification
Example
  P(age<=30 | buys_comp=Y) = 2/9 = 0.222
  P(age<=30 | buys_comp=N) = 3/5 = 0.600
  P(income=MEDIUM | buys_comp=Y) = 4/9 = 0.444
  P(income=MEDIUM | buys_comp=N) = 2/5 = 0.400
  P(student=Y | buys_comp=Y) = 6/9 = 0.667
  P(student=Y | buys_comp=N) = 1/5 = 0.200
  P(credit_rating=FAIR | buys_comp=Y) = 6/9 = 0.667
  P(credit_rating=FAIR | buys_comp=N) = 2/5 = 0.400
Naïve Bayesian Classification
Example
  P(A | buys_comp=Y) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(A | buys_comp=N) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019

  P(A | buys_comp=Y) P(buys_comp=Y) = 0.044 × 0.643 = 0.028
  P(A | buys_comp=N) P(buys_comp=N) = 0.019 × 0.357 = 0.007

CONCLUSION: A buys a computer
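A minimal end-to-end sketch reproducing this example in plain Python (the data is re-entered by hand; the helper function name is illustrative):

  # The buys_computer training data from the slides:
  # (age, income, student, credit_rating, buys_comp)
  data = [
      ("<=30", "HIGH",   "N", "FAIR",      "N"),
      ("<=30", "HIGH",   "N", "EXCELLENT", "N"),
      ("31-40","HIGH",   "N", "FAIR",      "Y"),
      (">40",  "MEDIUM", "N", "FAIR",      "Y"),
      (">40",  "LOW",    "Y", "FAIR",      "Y"),
      (">40",  "LOW",    "Y", "EXCELLENT", "N"),
      ("31-40","LOW",    "Y", "EXCELLENT", "Y"),
      ("<=30", "MEDIUM", "N", "FAIR",      "N"),
      ("<=30", "LOW",    "Y", "FAIR",      "Y"),
      (">40",  "MEDIUM", "Y", "FAIR",      "Y"),
      ("<=30", "MEDIUM", "Y", "EXCELLENT", "Y"),
      ("31-40","MEDIUM", "N", "EXCELLENT", "Y"),
      ("31-40","HIGH",   "Y", "FAIR",      "Y"),
      (">40",  "MEDIUM", "N", "EXCELLENT", "N"),
  ]

  def naive_bayes_score(test, cls):
      """P(A | cls) * P(cls) for a test tuple of attribute values."""
      rows = [r for r in data if r[-1] == cls]
      score = len(rows) / len(data)                  # prior P(cls)
      for i, value in enumerate(test):
          matches = sum(1 for r in rows if r[i] == value)
          score *= matches / len(rows)               # P(Ai = value | cls)
      return score

  test = ("<=30", "MEDIUM", "Y", "FAIR")
  scores = {cls: naive_bayes_score(test, cls) for cls in ("Y", "N")}
  print(scores)                       # {'Y': ~0.028, 'N': ~0.007}
  print(max(scores, key=scores.get))  # 'Y' -> buys a computer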

Problem to Solve

(a) Estimate the conditional probabilities P(Red|Yes), P(SUV|Yes), P(Domestic|Yes), P(Red|No), P(SUV|No), and P(Domestic|No) using the m-estimate approach, with p = 1/2 and m = 3.
(b) Predict the class label for a test sample (Red, Domestic, SUV) using the Naïve Bayes approach.
Probability of Error (Bayes Error Rate)
Naïve Bayes (Summary)
• Robust to isolated noise points
• Handles missing values by ignoring the instance during probability estimate calculations
• Robust to irrelevant attributes
• The independence assumption may not hold for some attributes
  – Use other techniques such as Bayesian Belief Networks (BBN)
Ensemble Classifiers

• Introduction & Motivation
• Methods to create an Ensemble
• Bias-Variance Decomposition
• Construction of Ensemble Classifiers
  – Bagging
  – Boosting (AdaBoost)
  – Random Forests
Introduction & Motivation
• Suppose that you are a patient with a set of symptoms
• Instead of taking the opinion of just one doctor (classifier), you decide to take the opinion of a few doctors!
• Is this a good idea? Indeed it is.
• Consult many doctors and then, based on their diagnoses, you can get a fairly accurate idea of the diagnosis.
• Majority voting: ‘bagging’
• More weight to the opinion of some ‘good’ (accurate) doctors: ‘boosting’
• In bagging, you give equal weight to all classifiers, whereas in boosting you give weight according to the accuracy of the classifier.
Ensemble Methods

• Construct a set of classifiers from the training data
• Predict the class label of previously unseen records by aggregating the predictions made by multiple classifiers
General Idea
• Step 1: Create multiple data sets D1, D2, …, Dt from the original training data D
• Step 2: Build multiple classifiers C1, C2, …, Ct (one from each data set)
• Step 3: Combine the classifiers into a single ensemble classifier C*

Figure taken from Tan et al., “Introduction to Data Mining”
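As a concrete illustration of this three-step pipeline, a minimal scikit-learn sketch of bagging over decision trees on synthetic data (the dataset and parameter choices are illustrative, not from the slides):

  from sklearn.datasets import make_classification
  from sklearn.ensemble import BaggingClassifier
  from sklearn.model_selection import train_test_split

  # Synthetic data standing in for the original training set D.
  X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

  # Steps 1-3: bootstrap-sample t data sets, fit one base classifier (a decision
  # tree by default) on each, and combine them by majority vote.
  ensemble = BaggingClassifier(n_estimators=25, random_state=0)
  ensemble.fit(X_train, y_train)

  print("Ensemble accuracy:", ensemble.score(X_test, y_test))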
Rationale for Ensemble Methods

• Statistical reasons
  – A set of classifiers with similar training performances may have different generalization performances.
  – Combining the outputs of several classifiers reduces the risk of selecting a poorly performing classifier.
• Large volumes of data
  – If the amount of data to be analyzed is too large, a single classifier may not be able to handle it; train different classifiers on different partitions of the data.
• Too little data
  – Ensemble systems can also be used when there is too little data, by using resampling techniques.
Rationale for Ensemble Methods
Why does it work?
• Suppose there are 25 base classifiers
  – Each classifier has error rate ε = 0.35
  – Assume the classifiers are independent
  – The ensemble makes a wrong prediction only if a majority (13 or more) of the base classifiers are wrong:

    Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^(25−i) = 0.06

  – Check for yourself whether this is correct! (see the sketch below)

Example taken from Tan et al., “Introduction to Data Mining”
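The sketch referenced above, verifying the 0.06 figure in plain Python:

  from math import comb

  # Probability that a majority (>= 13) of 25 independent base classifiers,
  # each with error rate 0.35, are wrong at the same time.
  eps, t = 0.35, 25
  p_ensemble_error = sum(
      comb(t, i) * eps**i * (1 - eps)**(t - i) for i in range(13, t + 1)
  )
  print(f"Ensemble error rate: {p_ensemble_error:.3f}")  # ~0.06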
Ensemble Classifiers (EC)
• An ensemble classifier constructs a set of ‘base classifiers’ from the training data
• Methods for constructing an EC:
  – Manipulating the training set
  – Manipulating the input features
  – Manipulating the class labels
  – Manipulating the learning algorithm
Ensemble Classifiers (EC)
• Manipulating the training set
  – Multiple training sets are created by resampling the data according to some sampling distribution
  – The sampling distribution determines how likely it is that an example will be selected for training, and may vary from one trial to another
  – A classifier is built from each training set using a particular learning algorithm
  – Examples: Bagging and Boosting
Ensemble Classifiers (EC)

• Manipulating the input features
  – A subset of the input features is chosen to form each training set
  – The subset can be chosen randomly or based on inputs given by domain experts
  – Good for data that has redundant features
  – Random Forest is an example, which uses decision trees as its base classifiers (see the sketch below)
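The sketch referenced above: a minimal scikit-learn Random Forest example on synthetic data with redundant features (the dataset and parameters are illustrative):

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier

  # Synthetic data with some redundant features, standing in for a real dataset.
  X, y = make_classification(n_samples=1000, n_features=20,
                             n_informative=8, n_redundant=6, random_state=0)

  # Each tree in the forest considers a random subset of features at every split
  # (max_features="sqrt"), which is one way of manipulating the input features.
  forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                  random_state=0)
  forest.fit(X, y)
  print("Training accuracy:", forest.score(X, y))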
Ensemble Classifiers (EC)

• Manipulating the class labels
  – Useful when the number of classes is sufficiently large
  – The training data is transformed into a binary class problem by randomly partitioning the class labels into 2 disjoint subsets, A0 and A1
  – The re-labelled examples are used to train a base classifier
  – By repeating the class-relabelling and model-building steps several times, an ensemble of base classifiers is obtained
  – How is a new tuple classified?
  – Example: error-correcting output coding (pp. 307)
Ensemble Classifiers (EC)

• Manipulating the learning algorithm
  – Learning algorithms can be manipulated in such a way that applying the algorithm several times on the same training data may result in different models
  – Example: an ANN can produce different models by changing the network topology or the initial weights of the links between neurons
  – Example: an ensemble of decision trees can be constructed by introducing randomness into the tree-growing procedure; instead of choosing the best split attribute at each node, we randomly choose one of the top k attributes
Ensemble Classifiers (EC)

• The first 3 approaches are generic: they can be applied to any classifier
• The fourth approach depends on the type of classifier used
• Base classifiers can be generated sequentially or in parallel
Typical Ensemble Procedure
