L3 (Week 3) Bayesian Classifier
Let D be a training set of tuples and their associated class labels, where each
tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn).
Suppose there are m classes C1, C2, …, Cm.
Classification is to derive the maximum a posteriori (MAP) class, i.e., the
class with maximal P(Ci | X).
This can be derived from Bayes' theorem:

$$P(C_i \mid X) = \frac{P(X \mid C_i)\,P(C_i)}{P(X)}$$
Since P(X) is constant for all classes, only P(X | Ci) P(Ci) needs to be maximized.
A patient takes a lab test and the result comes back positive. It is known
that the test returns a correct positive result in only 99% of the cases in
which the disease is present, and a correct negative result in only 95% of the
cases in which it is absent. Furthermore, only 0.03 (3%) of the entire
population has this disease. Should we conclude the patient has the disease?
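To make the numbers concrete, here is a minimal Python sketch of the posterior computation, assuming the figures above mean sensitivity P(+ | disease) = 0.99, specificity P(− | ¬disease) = 0.95, and prior P(disease) = 0.03:

```python
# Hedged sketch: posterior for the lab-test example, under the stated
# reading of the numbers (sensitivity 0.99, specificity 0.95, prior 0.03).
p_disease = 0.03
p_pos_given_disease = 0.99           # correct positive rate
p_pos_given_no_disease = 1 - 0.95    # false positive rate

# Total probability of a positive result, P(+)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))

# Bayes' theorem: P(disease | +) = P(+ | disease) P(disease) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | +) = {p_disease_given_pos:.3f}")   # about 0.380
```

So even after a positive test, the posterior probability of disease is only about 0.38: the low prior dominates.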
The maximum a posteriori (MAP) hypothesis:

$$h_{MAP} = \operatorname*{argmax}_{h \in H} P(h \mid D) = \operatorname*{argmax}_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \operatorname*{argmax}_{h \in H} P(D \mid h)\,P(h)$$
H: the set of all hypotheses.
Note that we can drop P(D), as the probability of the data is constant
(and independent of the hypothesis).
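As a sketch (reusing the lab-test numbers above, which are an assumption about their intended reading), picking h_MAP over H = {disease, ¬disease} given a positive result D only needs the unnormalized products P(D | h) P(h):

```python
# Unnormalized MAP scores P(D | h) P(h); P(D) is dropped since it is the
# same for every hypothesis. Numbers reuse the lab-test example above.
scores = {
    "disease":    0.99 * 0.03,   # P(+ | disease)  * P(disease)
    "no_disease": 0.05 * 0.97,   # P(+ | ¬disease) * P(¬disease)
}
h_map = max(scores, key=scores.get)
print(h_map, scores)   # no_disease: 0.0485 > 0.0297
```

The MAP hypothesis is ¬disease, consistent with the posterior of about 0.38 computed above.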
Maximum Likelihood
Now assume that all hypotheses are equally probable a priori,
i.e., P(hi) = P(hj) for all hi, hj ∈ H.
This is called assuming a uniform prior. It simplifies computing
the posterior: the MAP hypothesis reduces to the maximum likelihood (ML) hypothesis

$$h_{ML} = \operatorname*{argmax}_{h \in H} P(D \mid h)$$
Applied to classification, the most probable class for X = (x1, x2, …, xn) is

$$c_{MAP} = \operatorname*{argmax}_{c_j \in C} \frac{P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)}{P(x_1, x_2, \ldots, x_n)} = \operatorname*{argmax}_{c_j \in C} P(x_1, x_2, \ldots, x_n \mid c_j)\,P(c_j)$$
Parameter estimation
P(cj): can be estimated from the frequency of classes in the training
examples.
P(x1, x2, …, xn | cj): O(|X|^n · |C|) parameters, where |X| is the number of
values per attribute; this could only be estimated if a very, very large
number of training examples were available (already 2^10 · 2 = 2048
parameters for 10 binary attributes and 2 classes).
Independence assumption: attribute values are conditionally independent
given the target value. This is the naïve Bayes assumption:
$$P(x_1, x_2, \ldots, x_n \mid c_j) = \prod_i P(x_i \mid c_j)$$
Example of such an estimate from training counts: P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
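These frequency estimates can be computed directly by counting. Below is a minimal naïve Bayes sketch in Python; the six-row dataset is made up purely for illustration (it is not the training table from the slides):

```python
from collections import Counter

# Minimal categorical naive Bayes by frequency counting. The tiny dataset
# is hypothetical; each row is (age, income, class = buys_computer).
data = [
    ("<=30",   "high",   "no"),
    ("<=30",   "medium", "no"),
    ("31..40", "high",   "yes"),
    (">40",    "medium", "yes"),
    (">40",    "low",    "no"),
    ("31..40", "low",    "yes"),
]

class_counts = Counter(cls for *_, cls in data)
cond_counts = Counter()                 # (attr_index, value, class) -> count
for *attrs, cls in data:
    for i, v in enumerate(attrs):
        cond_counts[(i, v, cls)] += 1

def predict(x):
    """argmax_c P(c) * prod_i P(x_i | c), estimated from frequencies."""
    best_cls, best_score = None, -1.0
    for cls, n_cls in class_counts.items():
        score = n_cls / len(data)                       # P(c_j)
        for i, v in enumerate(x):
            # Unseen (value, class) pairs give 0 here; Laplace (add-one)
            # smoothing would avoid zeroing out the whole product.
            score *= cond_counts[(i, v, cls)] / n_cls   # P(x_i | c_j)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

print(predict(("<=30", "high")))   # -> "no" on this toy data
```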
Discriminant Functions
Classification can be expressed via a set of discriminant functions
$g_i(x),\ i = 1, \ldots, K$: choose $C_i$ if

$$g_i(x) = \max_k g_k(x)$$

The discriminant can be taken as

$$g_i(x) = -R(\alpha_i \mid x), \qquad g_i(x) = P(C_i \mid x), \qquad \text{or} \qquad g_i(x) = p(x \mid C_i)\,P(C_i)$$

The discriminants divide the feature space into K decision regions
$R_1, \ldots, R_K$, where

$$R_i = \{\, x \mid g_i(x) = \max_k g_k(x) \,\}$$
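As an illustration, here is a sketch of the discriminant $g_i(x) = p(x \mid C_i)\,P(C_i)$ for K one-dimensional Gaussian classes; the means, standard deviations, and priors are invented for the example:

```python
import numpy as np
from scipy.stats import norm

# Discriminant functions g_i(x) = p(x | C_i) P(C_i) for K = 3 classes with
# 1-D Gaussian class-conditional densities; all parameters are made up.
means  = np.array([0.0, 2.0, 4.0])
stds   = np.array([1.0, 0.5, 1.5])
priors = np.array([0.5, 0.3, 0.2])    # P(C_i), sums to 1

def classify(x):
    g = norm.pdf(x, means, stds) * priors   # g_i(x), i = 1, ..., K
    return int(np.argmax(g))                # choose C_i with maximal g_i(x)

print(classify(1.2))   # index of the decision region R_i containing x = 1.2
```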
K = 2 Classes
With two classes, a single discriminant $g(x) = g_1(x) - g_2(x)$ suffices:
choose $C_1$ if $g(x) > 0$, and $C_2$ otherwise.
Log odds:

$$\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)}$$
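A tiny sketch of the two-class rule via the log odds, with an assumed posterior value:

```python
import math

p_c1 = 0.7                               # assumed P(C1 | x)
log_odds = math.log(p_c1 / (1 - p_c1))   # log P(C1|x) / P(C2|x)
print("choose C1" if log_odds > 0 else "choose C2")
```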
Naïve Bayes Classifier: Comments
Advantages
Easy to implement.
Good results are obtained in most cases.
Disadvantages
Assumes class-conditional independence, which costs accuracy when it fails
to hold; in practice, dependencies exist among variables.
E.g., hospital patients: profile (age, family history, etc.), symptoms
(fever, cough, etc.), and disease (lung cancer, diabetes, etc.) are all
interrelated.
Dependencies among these cannot be modeled by a naïve Bayes classifier.
How to deal with these dependencies? Bayesian belief networks (Chapter 9).