
Bayesian Classification

SUBMITTED BY:
DIVYA VERMA (20-805)
KANIKA SHARMA (20-807)
ME(IT), 1st Semester

SUBMITTED TO:
DR. MANDEEP KAUR
ASSISTANT PROFESSOR
UIET, PANJAB UNIVERSITY
Today’s discussion…

 Bayesian Classifier
 Principle of Bayesian classifier
 Bayes’ theorem of probability
 Naïve Bayesian Classifier
 Bayesian Belief Networks
2
Bayesian Classifier
 Principle
 If it walks like a duck, quacks like a duck, then it is probably a duck

3
Bayesian Classifier
 A statistical classifier
 Performs probabilistic prediction, i.e., predicts class membership probabilities
 Foundation
 Based on Bayes’ Theorem.
 Assumptions
 1. The classes are mutually exclusive and exhaustive.
 2. The attributes are independent given the class.
 Called the “Naïve” classifier because of these assumptions.
 Empirically proven to be useful.
 Scales very well.
4
Example: Bayesian Classification
 Example 8.2: Air-Traffic Data
 Let us consider a set of observations recorded in a database regarding the arrival of airplanes on the routes from any airport to New Delhi under certain conditions.

5


Air-Traffic Data
Days      Season   Fog      Rain    Class
Weekday   Spring   None     None    On Time
Weekday   Winter   None     Slight  On Time
Weekday   Winter   None     None    On Time
Holiday   Winter   High     Slight  Late
Saturday  Summer   Normal   None    On Time
Weekday   Autumn   Normal   None    Very Late
Holiday   Summer   High     Slight  On Time
Sunday    Summer   Normal   None    On Time
Weekday   Winter   High     Heavy   Very Late
Weekday   Summer   None     Slight  On Time

Contd. to next slide…

6
Air-Traffic Data
Contd. from previous slide…

Days      Season   Fog      Rain    Class
Saturday  Spring   High     Heavy   Cancelled
Weekday   Summer   High     Slight  On Time
Weekday   Winter   Normal   None    Late
Weekday   Summer   High     None    On Time
Weekday   Winter   Normal   Heavy   Very Late
Saturday  Autumn   High     Slight  On Time
Weekday   Autumn   None     Heavy   On Time
Holiday   Spring   Normal   Slight  On Time
Weekday   Spring   Normal   None    On Time
Weekday   Spring   Normal   Heavy   On Time

7
Air-Traffic Data
 In this database, there are four attributes,
A = [Day, Season, Fog, Rain],
with 20 tuples.
 The categories of classes are:
C = [On Time, Late, Very Late, Cancelled]

 Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance, for example:

Weekday  Winter  High  Heavy  ???

 The classification technique should eventually map this tuple to an appropriate class.

8
Bayesian Classifier
 In many applications, the relationship between the attribute set and the class variable is non-deterministic.

 In other words, a test instance cannot be assigned a class label with certainty.

 In such situations, classification can be achieved probabilistically.

 The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.

 More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for classification.

 Before discussing the Bayesian classifier, we should have a quick look at Bayes’ Theorem.

9
Bayes’ Theorem Of Probability
 Theorem 8.4: Bayes’ Theorem

Let $E_1, E_2, \ldots, E_n$ be n mutually exclusive and exhaustive events associated with a random experiment. If A is any event which occurs with $E_1$ or $E_2$ or $\ldots$ or $E_n$, then

$$P(E_i \mid A) = \frac{P(A \mid E_i)\, P(E_i)}{\sum_{j=1}^{n} P(A \mid E_j)\, P(E_j)}$$
10
Naïve Bayesian Classifier
 Suppose Y is a class variable and $X = \{X_1, X_2, \ldots, X_n\}$ is a set of attributes; each training tuple pairs an instance of X with an instance of Y.

[Table: training tuples listing attribute values for INPUT (X) and the corresponding label for CLASS (Y)]

 The classification problem can then be expressed as the class-conditional probability

$$P(Y = y_i \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n)$$
11
Naïve Bayesian Classifier
 The naïve Bayesian classifier calculates this posterior probability using Bayes’ theorem, as follows.

 From Bayes’ theorem on conditional probability, we have

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$

where

$$P(X) = \sum_{i} P(X \mid Y = y_i)\, P(Y = y_i)$$

Note:
 $P(X)$ is called the evidence (also the total probability), and it is a constant for a given instance.

 The posterior probability $P(Y \mid X)$ is therefore proportional to $P(X \mid Y)\, P(Y)$.

 Thus, $P(Y \mid X)$ can be taken as a measure of the plausibility of Y given that X is observed:

$$P(Y \mid X) \propto P(X \mid Y)\, P(Y)$$
12
Naïve Bayesian Classifier
 Suppose, for a given instance of X (say $x = (x_1, x_2, \ldots, x_n)$), there are two class conditional probabilities, namely $P(Y = y_1 \mid X = x)$ and $P(Y = y_2 \mid X = x)$.

 If $P(Y = y_1 \mid X = x) > P(Y = y_2 \mid X = x)$, then we say that $y_1$ is stronger than $y_2$ for the instance X = x.

 The strongest $y_i$ is the classification for the instance X = x.
13
Naïve Bayesian Classifier
 Example: With reference to the Air-Traffic dataset mentioned earlier, let us tabulate all the class-conditional and prior probabilities as shown below.

Attribute             On Time       Late         Very Late    Cancelled
Day      Weekday      9/14 = 0.64   1/2 = 0.5    3/3 = 1      0/1 = 0
         Saturday     2/14 = 0.14   1/2 = 0.5    0/3 = 0      1/1 = 1
         Sunday       1/14 = 0.07   0/2 = 0      0/3 = 0      0/1 = 0
         Holiday      2/14 = 0.14   0/2 = 0      0/3 = 0      0/1 = 0
Season   Spring       4/14 = 0.29   0/2 = 0      0/3 = 0      0/1 = 0
         Summer       6/14 = 0.43   0/2 = 0      0/3 = 0      0/1 = 0
         Autumn       2/14 = 0.14   0/2 = 0      1/3 = 0.33   0/1 = 0
         Winter       2/14 = 0.14   2/2 = 1      2/3 = 0.67   0/1 = 0
14
Naïve Bayesian Classifier

Attribute             On Time       Late         Very Late    Cancelled
Fog      None         5/14 = 0.36   0/2 = 0      0/3 = 0      0/1 = 0
         High         4/14 = 0.29   1/2 = 0.5    1/3 = 0.33   1/1 = 1
         Normal       5/14 = 0.36   1/2 = 0.5    2/3 = 0.67   0/1 = 0
Rain     None         5/14 = 0.36   1/2 = 0.5    1/3 = 0.33   0/1 = 0
         Slight       8/14 = 0.57   0/2 = 0      0/3 = 0      0/1 = 0
         Heavy        1/14 = 0.07   1/2 = 0.5    2/3 = 0.67   1/1 = 1
Prior probability     14/20 = 0.70  2/20 = 0.10  3/20 = 0.15  1/20 = 0.05

15
Naïve Bayesian Classifier
Instance:

Weekday  Winter  High  Heavy  ???

Case 1: Class = On Time   : 0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013
Case 2: Class = Late      : 0.10 × 0.50 × 1.0 × 0.50 × 0.50 = 0.0125
Case 3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222
Case 4: Class = Cancelled : 0.05 × 0.0 × 0.0 × 1.0 × 1.0 = 0.0000

Case 3 gives the highest score; hence the predicted classification is Very Late.

16
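To make the scoring above concrete, here is a minimal Python sketch of our own (not part of the original slides; the function and variable names are illustrative). The probabilities are copied from the tables on slides 14 and 15, restricted to the values in the test instance.

```python
# Prior probabilities P(class), from slide 15.
priors = {"On Time": 14/20, "Late": 2/20, "Very Late": 3/20, "Cancelled": 1/20}

# Class-conditional probabilities P(value | class) for the test instance's values.
cond = {
    "On Time":   {"Weekday": 9/14, "Winter": 2/14, "High": 4/14, "Heavy": 1/14},
    "Late":      {"Weekday": 1/2,  "Winter": 2/2,  "High": 1/2,  "Heavy": 1/2},
    "Very Late": {"Weekday": 3/3,  "Winter": 2/3,  "High": 1/3,  "Heavy": 2/3},
    "Cancelled": {"Weekday": 0/1,  "Winter": 0/1,  "High": 1/1,  "Heavy": 1/1},
}

def score(instance, cls):
    """Value proportional to the posterior: P(cls) * prod_j P(x_j | cls)."""
    s = priors[cls]
    for value in instance:
        s *= cond[cls][value]
    return s

instance = ("Weekday", "Winter", "High", "Heavy")
scores = {c: score(instance, c) for c in priors}
for c, s in scores.items():
    print(f"{c:10s} {s:.4f}")                        # matches Cases 1-4 above
print("Prediction:", max(scores, key=scores.get))    # -> Very Late
```

Using exact fractions rather than the rounded values on the slide gives the same ranking, with Very Late as the clear winner.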
Naïve Bayesian Classifier
 Algorithm: Naïve Bayesian Classification

Input: A set of k mutually exclusive and exhaustive classes $C = \{C_1, C_2, \ldots, C_k\}$, which have prior probabilities $P(C_1), P(C_2), \ldots, P(C_k)$, and an n-attribute set $A = \{A_1, A_2, \ldots, A_n\}$ which for a given instance has values $A_1 = a_1$, $A_2 = a_2$, ….., $A_n = a_n$.

Step: For each $C_i$, calculate the class-conditional score

$$p_i = P(C_i) \prod_{j=1}^{n} P(A_j = a_j \mid C_i), \quad i = 1, 2, \ldots, k$$

Output: $C_x$ is the classification, where $p_x = \max\{p_1, p_2, \ldots, p_k\}$.

Note: $\sum_i p_i \ne 1$, because the $p_i$ are not probabilities but values proportional to the posterior probabilities.
17
Naïve Bayesian Classifier
Pros and Cons
 The Naïve Bayes approach is a very popular one, which often works well.

 However, it has a number of potential problems:

 It relies on all attributes being categorical.

 If the training data is scarce, the probability estimates are poor (note the hard zero estimates in the tables above).

18
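One standard remedy for the zero-estimate problem, not covered in these slides, is Laplace (add-one) smoothing. The short Python sketch below is our own illustration; the counts come from the Air-Traffic tables above.

```python
def smoothed(n_vc, n_c, num_values, alpha=1.0):
    """P(value | class) with add-alpha (Laplace) smoothing.

    n_vc       -- count of tuples in the class having this attribute value
    n_c        -- count of tuples in the class
    num_values -- number of distinct values the attribute can take
    """
    return (n_vc + alpha) / (n_c + alpha * num_values)

# e.g. P(Spring | Late) was 0/2 = 0 in the table; smoothed over 4 seasons:
print(smoothed(0, 2, 4))   # 1/6 ~ 0.167 instead of a hard 0
```

Smoothing keeps a single unseen attribute value from forcing an entire class score to zero, as happens for the Cancelled class in the worked example.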
A Practice Example
Example 8.4

Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data instance:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
>40     medium   no       excellent      no
19
A Practice Example
 P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357

 Compute P(X|Ci) for each class


P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

 X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044


P(X|buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028


P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)


20
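As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python (a sketch of ours; all numbers are taken from this slide):

```python
# P(X | Ci): product of the four conditional probabilities listed above.
p_x_given_yes = 0.222 * 0.444 * 0.667 * 0.667   # ~0.044
p_x_given_no  = 0.600 * 0.400 * 0.200 * 0.400   # ~0.019

p_yes, p_no = 9 / 14, 5 / 14                    # prior probabilities P(Ci)

score_yes = p_x_given_yes * p_yes               # ~0.028
score_no  = p_x_given_no  * p_no                # ~0.007

# The larger unnormalized score decides the class.
print("buys_computer =", "yes" if score_yes > score_no else "no")
```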
Bayesian Belief Networks
A Bayesian belief network allows a subset of the variables to be conditionally independent.
It is a graphical model of causal relationships:
 Represents dependency among the variables.
 Gives a specification of the joint probability distribution.

 Nodes: random variables
 Links: dependency

[Example DAG: X and Y are the parents of Z, and Y is the parent of P; there is no direct dependency between Z and P.]

 The graph has no loops or cycles (it is a Directed Acyclic Graph).
21
Bayesian Belief Network: An Example
For example, lung cancer is influenced by a person's family history of lung cancer, as well as by whether or not the person is a smoker. It is worth noting that the variable PositiveXRay is independent of whether the patient has a family history of lung cancer or is a smoker, given that we know the patient has lung cancer.

[Network figure: nodes FamilyHistory, Smoker, LungCancer, Emphysema, PositiveXRay, Dyspnea; FamilyHistory and Smoker are the parents of LungCancer.]

 The conditional probability table (CPT) for the variable LungCancer:

       (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC     0.8       0.5        0.7        0.1
~LC    0.2       0.5        0.3        0.9

• The CPT shows the conditional probability for each possible combination of values of its parents.
• Derivation of the probability of a particular combination of values $x_1, \ldots, x_n$ from the CPTs:

$$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(X_i))$$

22
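To illustrate this factorization, here is a small Python sketch of ours. The CPT for LungCancer comes from the slide; the priors for FamilyHistory and Smoker are hypothetical values added only to make the example runnable.

```python
# Hypothetical priors (NOT from the slides; chosen only for illustration).
p_fh = {True: 0.1, False: 0.9}   # P(FamilyHistory)
p_s  = {True: 0.3, False: 0.7}   # P(Smoker)

# CPT for LungCancer given (FamilyHistory, Smoker), from the slide.
p_lc = {
    (True,  True):  0.8,
    (True,  False): 0.5,
    (False, True):  0.7,
    (False, False): 0.1,
}

def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S), following the DAG."""
    lc_given_parents = p_lc[(fh, s)] if lc else 1.0 - p_lc[(fh, s)]
    return p_fh[fh] * p_s[s] * lc_given_parents

# Probability of a smoker with no family history who develops lung cancer:
print(joint(fh=False, s=True, lc=True))   # 0.9 * 0.3 * 0.7 = 0.189
```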
Advantages And Disadvantages:

• It can readily handle incomplete data sets.

• It allows one to learn about causal relationships.

• It readily facilitates the use of prior knowledge.

• However, constructing the network graph is more complex.


23
Reference

The detailed material related to this lecture can be found in:

Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edn., Morgan Kaufmann, 2011.

Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining, Addison-Wesley, 2014.
https://www.tutorialspoint.com/data_mining/dm_bayesian_classification.htm

24
Thank You!!

25
