
CSE-463

Machine Learning

Bayes Classification Methods


Naïve Bayes
Md. Rashadur Rahman
Department of CSE
CUET
Bayes Classification

• Bayesian classifiers are statistical classifiers.

• They can predict class membership probabilities, such as the
probability that a given tuple belongs to a particular class.

• A simple Bayesian classifier known as the Naïve Bayesian classifier
has been found to be comparable in performance with decision tree and
selected neural network classifiers.



What is “Naïve” in Naïve Bayesian Classifier

• Naïve Bayesian classifiers assume that the effect of an attribute
value on a given class is independent of the values of the other
attributes. This assumption is called class conditional independence.

• It is made to simplify the computations involved and, in this sense,
is considered “Naïve”.



Bayes’ Theorem

• Bayes’ theorem is named after Thomas Bayes.

• Let X be a data tuple. In Bayesian terms, X is considered
“evidence.” As usual, it is described by measurements made on a
set of n attributes.

• Let H be some hypothesis, such as that the data tuple X belongs to
a specified class C.



A Sample Training and Testing Dataset
RID age income student? credit_rating buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 senior medium no excellent no

n youth medium yes fair ?

X = (age = youth, income = medium, student = yes, credit_rating = fair)


Bayes’ Theorem

• Let H be some hypothesis, such as that the data tuple X belongs to a
specified class C.

• For classification problems, we want to determine P(H|X):
the probability that the hypothesis H holds given the “evidence” or
observed data tuple X. For our example, that means

P(buys_computer = yes | X)
P(buys_computer = no | X)

X = (age = youth, income = medium, student = yes, credit_rating = fair)



Bayes’ Theorem (cont.)
• Bayes’ Theorem is:

P(H|X) = P(X|H) P(H) / P(X)

• Suppose that H is the hypothesis that our customer will buy a computer.

• P(H|X) is the posterior probability of H conditioned on X.
It is the probability that customer X will buy a computer given that we know
X = (age = youth, income = medium, student = yes, credit_rating = fair).

• P(X|H) is the posterior probability of X conditioned on H.
It is the probability that a customer X is (age = youth, income = medium,
student = yes, credit_rating = fair), given that we know the customer will
buy a computer.



Bayes’ Theorem (cont.)

• Suppose that H is the hypothesis that our customer will buy a computer.

• P(H) is the prior probability of H.
It is the probability that any given customer will buy a computer, regardless
of age, income, student status, credit rating, or any other information, for
that matter.

• P(X) is the prior probability of X.
It is the probability that a person from our set of customers is a youth with
medium income, is a student, and has a fair credit rating.
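Numerically, these four quantities slot straight into the theorem. A minimal sketch in Python, with purely hypothetical probability values chosen for illustration (none of these numbers come from the dataset):

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
# All values below are hypothetical, for illustration only.
p_h = 0.5            # prior P(H): assumed fraction of customers who buy
p_x_given_h = 0.04   # assumed P(X|H): chance a buyer matches X's attribute values
p_x = 0.03           # assumed P(X): chance any customer matches X's attribute values

p_h_given_x = p_x_given_h * p_h / p_x
print(f"P(H|X) = {p_h_given_x:.3f}")  # prints: P(H|X) = 0.667
```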



Working Procedure of Naïve Bayesian Classifier
1. Let D be a training set of tuples and their associated class labels. As usual, each
tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn),
depicting n measurements made on the tuple from n attributes, respectively,
A1, A2, ..., An.

2. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple X, the Naïve
Bayesian classifier predicts that tuple X belongs to the class Ci if and only if

P(Ci|X) > P(Cj|X)   for 1 ≤ j ≤ m, j ≠ i

This can be derived from Bayes’ theorem:

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

3. Since P(X) is constant for all classes, only

P(X|Ci) P(Ci)

needs to be maximized.
Working Procedure of Naïve Bayesian Classifier
4. To reduce computation in evaluating P(X|Ci), the Naïve assumption of class-
conditional independence is made. This presumes that the attributes’ values are
conditionally independent of one another, given the class label of the tuple. Thus,

P(X|Ci) = ∏(k = 1 to n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci)


Working Procedure of Naïve Bayesian Classifier

5. To predict the class label of X, P(X|Ci) * P(Ci) is evaluated for each class Ci,
and X is assigned the class label for which this product is highest.
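The five steps above fit in a few lines of plain Python. The sketch below (the names train and classify are illustrative, not from the slides) trains on the 14-tuple table from the dataset slide and classifies the tuple X:

```python
from collections import Counter, defaultdict

# Training tuples: (age, income, student, credit_rating, buys_computer)
data = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    cond_counts = defaultdict(Counter)  # (attr_index, class) -> value counts
    for *attrs, label in rows:
        for k, value in enumerate(attrs):
            cond_counts[(k, label)][value] += 1
    return class_counts, cond_counts

def classify(x, class_counts, cond_counts):
    """Step 5: pick the class Ci that maximizes P(X|Ci) * P(Ci)."""
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for ci, count in class_counts.items():
        score = count / total  # prior P(Ci)
        for k, value in enumerate(x):
            # Naive assumption: P(X|Ci) is the product of the P(xk|Ci)
            score *= cond_counts[(k, ci)][value] / count
        if score > best_score:
            best_class, best_score = ci, score
    return best_class, best_score

class_counts, cond_counts = train(data)
x = ("youth", "medium", "yes", "fair")
print(classify(x, class_counts, cond_counts))  # ('yes', ~0.028)
```

Note this sketch has no guard against zero counts; the Laplacian correction mentioned under Home Study addresses that.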



Example
RID age income student? credit_rating buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 senior medium no excellent no

n youth medium yes fair ?



Example (cont.)
The class label attribute, buys_computer, has two
distinct values (namely, {yes, no}).
Let C1 correspond to the class buys_computer = yes
and C2 correspond to buys_computer = no. The
tuple we wish to classify is

X = (age = youth, income = medium, student = yes, credit_rating = fair)



Example (cont.)

We first compute the prior probability of each class:

P(C1) = P(buys_computer = yes) = 9/14 = 0.643
P(C2) = P(buys_computer = no) = 5/14 = 0.357

To compute P(X|Ci), for i = 1, 2, we compute the
following conditional probabilities from the training table:

P(age = youth | buys_computer = yes) = 2/9 = 0.222
P(income = medium | buys_computer = yes) = 4/9 = 0.444
P(student = yes | buys_computer = yes) = 6/9 = 0.667
P(credit_rating = fair | buys_computer = yes) = 6/9 = 0.667



Example (cont.)

Continuing with the conditional probabilities for C2:

P(age = youth | buys_computer = no) = 3/5 = 0.600
P(income = medium | buys_computer = no) = 2/5 = 0.400
P(student = yes | buys_computer = no) = 1/5 = 0.200
P(credit_rating = fair | buys_computer = no) = 2/5 = 0.400



Example (cont.)

Using these probabilities, we obtain

P(X | buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044

Similarly,

P(X | buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019



Example (cont.)

To find the class Ci that maximizes P(X|Ci) P(Ci), we compute

P(X | buys_computer = yes) P(buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X | buys_computer = no) P(buys_computer = no) = 0.019 × 0.357 = 0.007


Example (cont.)

Since 0.028 > 0.007, the Naïve Bayesian classifier predicts
buys_computer = yes for tuple X.
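These figures are easy to verify; the sketch below simply replays the hand computation, with each fraction read off the training table:

```python
# Replaying the hand computation for X = (youth, medium, yes, fair)
p_yes, p_no = 9 / 14, 5 / 14  # priors: 9 "yes" and 5 "no" training tuples

# Class-conditional probabilities read off the training table
px_yes = (2/9) * (4/9) * (6/9) * (6/9)  # age, income, student, credit | yes
px_no = (3/5) * (2/5) * (1/5) * (2/5)   # age, income, student, credit | no

print(f"P(X|yes)P(yes) = {px_yes * p_yes:.3f}")  # 0.028
print(f"P(X|no)P(no)   = {px_no * p_no:.3f}")    # 0.007
```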


How are decision trees used for classification?



Thank You



Home Study
1. What happens if we end up with a probability value of
zero for some P(xk|Ci)?
2. Handling the zero-probability problem with the Laplacian correction
(see the sketch below)
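As a head start on item 2, a sketch only (the counts below are illustrative, not from the slides): the Laplacian correction adds one to each count, so a value that never co-occurs with a class no longer forces the whole product P(X|Ci) to zero.

```python
def laplace_estimate(value_count, class_count, num_values):
    """Laplacian (add-one) correction for P(xk|Ci): add 1 to the count in
    the numerator and the number of distinct attribute values to the
    denominator, so no estimate is exactly zero."""
    return (value_count + 1) / (class_count + num_values)

# Illustrative: income = "low" never occurs among 1000 "yes" tuples,
# and income has 3 possible values (low, medium, high).
print(laplace_estimate(0, 1000, 3))    # ~0.001 instead of 0
print(laplace_estimate(990, 1000, 3))  # ~0.988 instead of 0.990
```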



References

[1] Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques
(3rd ed.). Morgan Kaufmann.

