Bayesian Learning
Bayesian Classifiers
Bayes Theorem

P(h|D) = P(D|h) P(h) / P(D)

where
- P(h) is the prior probability of hypothesis h
- P(D) is the prior probability that the data D will be observed
- P(D|h) is the probability of observing D given that h holds
- P(h|D) is the posterior probability of h given the observed data D
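As a quick illustration, here is a minimal Python sketch of the theorem
for a two-hypothesis case (h and not-h); the prior and likelihood
values are hypothetical, chosen only to show the mechanics:

def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    # P(h|D) = P(D|h) P(h) / P(D), where P(D) is obtained by summing
    # over h and not-h (total probability).
    p_d = p_d_given_h * prior_h + p_d_given_not_h * (1.0 - prior_h)
    return p_d_given_h * prior_h / p_d

# Hypothetical numbers: a rare hypothesis, moderately likely evidence.
print(posterior(0.01, 0.9, 0.1))  # ~0.083: the posterior stays small

Note how the small prior keeps the posterior small even though the data
is nine times more likely under h than under its negation.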
The naïve Bayesian classifier works as follows:

- Each data sample is represented by an n-dimensional attribute vector
  X = (x1, x2, …, xn).
- Suppose there are m classes C1, C2, …, Cm. Given an unknown sample X,
  the classifier predicts the class with the highest posterior
  probability: X is assigned to Ci iff P(Ci|X) > P(Cj|X) for all j ≠ i.
- By Bayes theorem, P(Ci|X) = P(X|Ci) P(Ci) / P(X). As P(X) is constant
  for all classes, only P(X|Ci) P(Ci) needs to be maximised.
- If class prior probabilities are equal (or not known and thus
  assumed to be equal) then we need to calculate only P(X|Ci).
To reduce the cost of computing P(X|Ci), the naïve assumption of
class-conditional independence is made: the attribute values are
assumed to be independent of one another given the class, so that
P(X|Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xn|Ci).

Example:
P(colour ∧ shape | apple) = P(colour | apple) · P(shape | apple)
Naïve (Simple) Bayesian Classification

The probabilities P(xk|Ci) are estimated from the training samples.
For an attribute Ak, which can take on the values x1k, x2k, …
(e.g. colour = red, green, …):

P(xk|Ci) = sik / si

where sik is the number of training samples of class Ci having the
value xk for Ak, and si is the number of training samples belonging to
class Ci.

Example:
Play-tennis example: estimating P(xi|C)

Training data:

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

Class priors:
P(p) = 9/14    P(n) = 5/14

outlook:
P(sunny|p)    = 2/9    P(sunny|n)    = 3/5
P(overcast|p) = 4/9    P(overcast|n) = 0
P(rain|p)     = 3/9    P(rain|n)     = 2/5

temperature:
P(hot|p)  = 2/9    P(hot|n)  = 2/5
P(mild|p) = 4/9    P(mild|n) = 2/5
P(cool|p) = 3/9    P(cool|n) = 1/5

humidity:
P(high|p)   = 3/9    P(high|n)   = 4/5
P(normal|p) = 6/9    P(normal|n) = 2/5

windy:
P(true|p)  = 3/9    P(true|n)  = 3/5
P(false|p) = 6/9    P(false|n) = 2/5
Naïve Bayesian Classifier (II)

Classify an unseen sample X = <rain, hot, high, false>:

P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p)
            = 3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582

P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n)
            = 2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286

Since P(X|n)·P(n) > P(X|p)·P(p), sample X is classified in class n
(do not play).
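The calculation above can be scripted directly. A minimal Python
sketch, hard-coding the probability tables estimated from the
play-tennis data (nothing is learned here):

from fractions import Fraction as F

prior = {"p": F(9, 14), "n": F(5, 14)}
cond = {
    "p": {"rain": F(3, 9), "hot": F(2, 9),
          "high": F(3, 9), "false": F(6, 9)},
    "n": {"rain": F(2, 5), "hot": F(2, 5),
          "high": F(4, 5), "false": F(2, 5)},
}
x = ["rain", "hot", "high", "false"]  # the unseen sample X

for c in ("p", "n"):
    score = prior[c]
    for value in x:
        score *= cond[c][value]  # class-conditional independence
    print(c, float(score))       # p: 0.010582..., n: 0.018285...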
Training dataset

Class:
C1: buys_computer = 'yes'
C2: buys_computer = 'no'

Data sample:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
Example:

To compute P(X|Ci) we compute the following conditional probabilities
from the training set:

P(age <= 30 | yes)            = 2/9    P(age <= 30 | no)            = 3/5
P(income = medium | yes)      = 4/9    P(income = medium | no)      = 2/5
P(student = yes | yes)        = 6/9    P(student = yes | no)        = 1/5
P(credit_rating = fair | yes) = 6/9    P(credit_rating = fair | no) = 2/5

with class priors P(yes) = 9/14 and P(no) = 5/14.

Using the above probabilities we obtain

P(X|yes) · P(yes) = 2/9 · 4/9 · 6/9 · 6/9 · 9/14 ≈ 0.028
P(X|no)  · P(no)  = 3/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.007

Since P(X|yes) · P(yes) > P(X|no) · P(no), X is classified as
buys_computer = 'yes'.
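The same numbers can be produced by counting over the fourteen rows
above. A sketch (the row tuples simply transcribe the table):

from collections import Counter, defaultdict

rows = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31…40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31…40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31…40", "medium", "no",  "excellent", "yes"),
    ("31…40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
x = ("<=30", "medium", "yes", "fair")      # the data sample X

s_i = Counter(r[-1] for r in rows)         # samples per class
s_ik = defaultdict(int)                    # (class, attr index, value)
for r in rows:
    for k, v in enumerate(r[:-1]):
        s_ik[(r[-1], k, v)] += 1

for c in s_i:
    score = s_i[c] / len(rows)             # P(Ci)
    for k, v in enumerate(x):
        score *= s_ik[(c, k, v)] / s_i[c]  # P(xk|Ci) = sik / si
    print(c, round(score, 4))              # no: 0.0069, yes: 0.0282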
Learning to Classify Text
Design issue:
- How to represent a text document in terms of attribute values
One approach:
- The attributes are the word positions
- The value of an attribute is the word found in that position

Second approach (bag of words):
- The frequency with which a word occurs is counted, irrespective of
  the word's position (see the sketch below)
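A minimal Python sketch of the second (word-frequency) representation;
the two-class documents below are made up for illustration, and the +1
in the estimate is the Laplacian correction discussed further below:

from collections import Counter
import math

docs = [("win money now", "spam"), ("meeting schedule today", "ham"),
        ("win prizes money", "spam"), ("project meeting notes", "ham")]

vocab = {w for text, _ in docs for w in text.split()}
word_counts = {"spam": Counter(), "ham": Counter()}
doc_counts = Counter()
for text, label in docs:
    word_counts[label].update(text.split())  # position is ignored
    doc_counts[label] += 1

def log_score(text, label):
    # log P(label) + sum of log P(word|label), Laplace-smoothed
    total = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / len(docs))
    for w in text.split():
        score += math.log((word_counts[label][w] + 1)
                          / (total + len(vocab)))
    return score

print(max(("spam", "ham"), key=lambda c: log_score("win money", c)))
# -> "spam"

Working in log space avoids underflow when the product runs over many
word positions.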
Results

Minor Variant
Avoiding the Zero-Probability Problem

Naïve Bayesian prediction requires each conditional probability to be
non-zero; otherwise a single zero count makes the whole product
P(X|Ci) zero.

Ex. Suppose a dataset with 1000 tuples: income = low (0),
income = medium (990), and income = high (10).

Use the Laplacian correction (or Laplacian estimator): add 1 to each
case. The corrected estimates are

P(income = low)    = 1/1003
P(income = medium) = 991/1003
P(income = high)   = 11/1003

The corrected estimates are close to the uncorrected ones, but the
zero probability is avoided.
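In code, the correction is a one-line change to each estimate; a
sketch using the counts from the example:

counts = {"low": 0, "medium": 990, "high": 10}  # income value counts
n, q = sum(counts.values()), len(counts)        # 1000 tuples, 3 values

for value, c in counts.items():
    # add 1 to each count and q to the denominator
    print(value, (c + 1) / (n + q))  # 1/1003, 991/1003, 11/1003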
The independence hypothesis…

[Figure: example of correlated attributes, a small network over the
variables Insulin, Glucose, Mass, and Diabetes.]
Reference

Chapter 6 of T. Mitchell, Machine Learning.
INSTANCE-BASED LEARNING

k-NEAREST NEIGHBOUR
Introduction

Key Idea:
Just store all training examples <xi, f(xi)>; generalising beyond them
is deferred until a new query instance must be classified.
Classification Algorithm (1-nearest neighbour):
* Given query instance xq
* Locate the nearest training example xn
* Estimate f(xq) ← f(xn)
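A minimal Python sketch of this 1-nearest-neighbour rule with
Euclidean distance; the training pairs are made up for illustration:

import math

def nn_classify(training, xq):
    # return f(xn) for the stored example xn closest to the query xq
    xn, f_xn = min(training, key=lambda ex: math.dist(ex[0], xq))
    return f_xn

training = [((1.0, 1.0), "A"), ((2.0, 1.5), "A"), ((5.0, 5.0), "B")]
print(nn_classify(training, (1.2, 0.9)))  # -> "A"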
For a discrete-valued target function, the k-nearest-neighbour
algorithm assigns to xq the most common value of f among the k
training examples nearest to xq:

f(xq) ← argmax over v of Σ(i = 1 … k) δ(v, f(xi))

where δ(a, b) = 1 if a = b, and δ(a, b) = 0 otherwise.
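Extending the earlier sketch to this k-neighbour vote (again with
made-up data):

import math
from collections import Counter

def knn_classify(training, xq, k=3):
    # majority vote of f among the k stored examples nearest to xq
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], xq))[:k]
    votes = Counter(f_xi for _, f_xi in nearest)  # Σ δ(v, f(xi))
    return votes.most_common(1)[0][0]             # argmax over v

training = [((1, 1), "A"), ((2, 2), "A"), ((6, 6), "B"),
            ((7, 7), "B"), ((6, 5), "B")]
print(knn_classify(training, (5, 5)))  # -> "B"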
Remarks

Advantages
• It is robust to noisy training data
• Training is fast (the examples are simply stored)

Disadvantages
• Classification is slow, since all computation is deferred to query
  time
• The distance is computed over all attributes, so irrelevant
  attributes can dominate and hurt accuracy (the curse of
  dimensionality)
Reading Assignment & References

Chapter 8 of T. Mitchell, Machine Learning.