
Bayesian Classification

A Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with
strong independence assumptions. A more descriptive term for the underlying probability model
would be "independent feature model".

In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular
feature of a class is unrelated to the presence (or absence) of any other feature.

 Let X be a data sample (“evidence”): class label is unknown


 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X), the probability that the hypothesis holds given the
observed data sample X
 P(H) (prior probability): the initial probability of the hypothesis

 E.g., the probability that X will buy a computer, regardless of age, income, etc.

 P(X): probability that sample data is observed


 P(X|H) (posterior probability of X conditioned on H): the probability of observing the sample X, given that the hypothesis holds

 E.g., given that X will buy a computer, the probability that X is 31...40 with medium income

Bayes' theorem is useful in that it provides a way of calculating the posterior probability P(H|X) from P(H), P(X), and P(X|H). Bayes' theorem can be stated as

P(H|X) = P(X|H) · P(H) / P(X)
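As a quick illustration of the formula, the posterior can be computed directly once the three quantities on the right-hand side are known. The numbers below are purely hypothetical placeholders, not values from the dataset used later:

# Bayes' theorem with purely hypothetical numbers (for illustration only).
p_h = 0.643          # P(H): prior probability of the hypothesis
p_x_given_h = 0.044  # P(X|H): probability of observing X given H
p_x = 0.035          # P(X): probability of observing X at all

p_h_given_x = p_x_given_h * p_h / p_x   # P(H|X) = P(X|H) * P(H) / P(X)
print(round(p_h_given_x, 3))            # ~0.808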

Naive Bayesian Classification

The naive Bayesian classifier, or simple Bayesian classifier, works as follows:

 Let D be a training set of tuples and their associated class labels, and each tuple is
represented by an n-D attribute vector X = (x1, x2, …, xn)

 Suppose there are m classes C1, C2, …, Cm.

 Classification is to derive the maximum posterior probability, i.e., the maximal P(Ci|X)

 This can be derived from Bayes’ theorem


P(Ci|X) = P(X|Ci) · P(Ci) / P(X)

 Since P(X) is constant for all classes, only P(X|Ci) · P(Ci) needs to be maximized.
 A simplified assumption: attributes are conditionally independent (i.e., there is no dependence relation between attributes), so P(X|Ci) becomes

P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)
 If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for Ak divided by |Ci, D| (#
of tuples of Ci in D)

 If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian distribution with mean μ and standard deviation σ:

g(x, μ, σ) = (1 / (√(2π) · σ)) · e^(−(x − μ)² / (2σ²))

and P(xk|Ci) = g(xk, μCi, σCi)
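The two cases above can be made concrete with a minimal Python sketch. The function names and sample values below are illustrative assumptions, not part of the original material:

import math

def categorical_likelihood(attribute_values_in_class, value):
    # P(xk|Ci) for a categorical attribute Ak: the number of tuples in class Ci
    # having value xk for Ak, divided by the number of tuples of Ci in D.
    return attribute_values_in_class.count(value) / len(attribute_values_in_class)

def gaussian_likelihood(x, mu, sigma):
    # P(xk|Ci) for a continuous attribute: the Gaussian density g(x, mu, sigma).
    return (1.0 / (math.sqrt(2 * math.pi) * sigma)) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical usage:
ages_in_yes_class = ["<=30", "<=30", "31...40", ">40"]       # made-up values
print(categorical_likelihood(ages_in_yes_class, "<=30"))     # 2/4 = 0.5
print(gaussian_likelihood(x=35.0, mu=38.0, sigma=12.0))      # a density value, not a count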

Example: Consider the following training dataset to illustrate classification by predicting a class label for the situation “a student whose age is less than or equal to 30, with medium income and a fair credit rating, buys a computer or not”.

Understanding the Data

The table represents a dataset used for a classification problem. We're trying to predict whether
someone "buys a computer" (the "Class" column) based on several attributes:

 RID (Record ID): A unique identifier for each data point.

 age: Categorical age ranges: "<=30", "31...40", ">40".

 income: Categorical income levels: "high", "medium", "low".

 student: Binary (yes/no) indicating if the person is a student.

 credit_rating: Categorical credit rating: "fair", "excellent".

 Class: buys_computer: The target variable we want to predict. It's binary (yes/no).
1. Calculate Prior Probabilities P(Ci):

 P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643

P(buys_computer = “no”) = 5/14 = 0.357
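These two priors can be reproduced in a few lines of Python, using only the class counts stated above (9 “yes” and 5 “no” out of 14 training tuples):

# Prior probabilities P(Ci) from the class counts in the training set.
n_yes, n_no = 9, 5
n_total = n_yes + n_no              # 14 tuples in total
p_yes = n_yes / n_total             # 9/14 ≈ 0.643
p_no = n_no / n_total               # 5/14 ≈ 0.357
print(round(p_yes, 3), round(p_no, 3))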

2. Compute Conditional Probabilities P(X|Ci):

This step calculates the probability of observing specific attribute values (X) given each class
(Ci).

 Compute P(X|Ci) for each class

P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222

P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6

P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444

P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4

P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667

P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2


P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667

P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

3. Classify the sample X = (age <= 30, income = medium, student = yes, credit_rating = fair)

P(X|Ci): P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044

P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|Ci)*P(Ci): P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028

P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
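For completeness, the arithmetic of steps 1–3 can be reproduced with the short Python sketch below, which simply multiplies the probabilities already computed above:

# Naive Bayes decision for X = (age <= 30, income = medium, student = yes, credit_rating = fair).
priors = {"yes": 9 / 14, "no": 5 / 14}
conditionals = {
    "yes": [2 / 9, 4 / 9, 6 / 9, 6 / 9],  # age, income, student, credit_rating given "yes"
    "no":  [3 / 5, 2 / 5, 1 / 5, 2 / 5],  # age, income, student, credit_rating given "no"
}

scores = {}
for label in priors:
    likelihood = 1.0
    for p in conditionals[label]:
        likelihood *= p                        # P(X|Ci) as a product of per-attribute probabilities
    scores[label] = likelihood * priors[label]  # P(X|Ci) * P(Ci)

print(scores)                                   # {'yes': ~0.028, 'no': ~0.007}
print(max(scores, key=scores.get))              # 'yes' -> buys_computer = yes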
