
Bayes Classifier

Dr. Partha Pratim Sarangi


Bayes Classifier
Introduction
• The naive Bayes classifier is probably among the most effective algorithms
for learning tasks that classify text documents.
• The naive Bayes technique is also extremely helpful for huge datasets.
• For example, Google employs a naive Bayes classifier to correct spelling
mistakes in text typed in by users.
• It gives a meaningful perspective on the comprehension of various learning
algorithms that do not explicitly manipulate probabilities.
• Bayes theorem is the cornerstone of Bayesian learning methods.
Bayes theorem
• Bayes theorem offers a method of calculating the probability of a
hypothesis on the basis of its prior probability, the probabilities of
observing different data given the hypothesis, and the observed data itself.
• The distribution of all possible values of a discrete random variable y is
expressed as a probability distribution.

• We assume that there is some a priori probability (or simply prior) P(yq)
that the next feature vector belongs to class yq.
Bayes theorem
• The continuous attributes are binned and converted to categorical variables.
• Therefore, each attribute xj is assumed to have a countable value set.
• Bayes theorem provides a way to calculate posterior P(yk |x); k ϵ{1, …, M}
from the known priors P(yq), together with known conditional probabilities
P(x| yq); q = 1, …, M.

  P(yk | x) = P(x | yk) P(yk) / P(x),   where   P(x) = Σq P(x | yq) P(yq)

• The quantities on the right-hand side are known or easier to estimate;
calculating the posterior P(yk | x) directly is difficult.
• P(x) expresses variability of the observed data, independent of the class.
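
As a concrete illustration, here is a minimal Python sketch of the relation above,
using hypothetical priors and class-conditional probabilities for two classes (the
numbers are made up for illustration, not taken from the slides).

# A minimal sketch of Bayes theorem with hypothetical numbers: two classes with
# known priors P(yq) and known class-conditional probabilities P(x | yq).
priors = {"y1": 0.3, "y2": 0.7}            # P(yq), hypothetical values
likelihoods = {"y1": 0.09, "y2": 0.01}     # P(x | yq) for one observed x, hypothetical

# Evidence term, independent of the class: P(x) = sum over q of P(x | yq) P(yq)
p_x = sum(likelihoods[c] * priors[c] for c in priors)

# Posterior for each class: P(yk | x) = P(x | yk) P(yk) / P(x)
posteriors = {c: likelihoods[c] * priors[c] / p_x for c in priors}
print(posteriors)   # {'y1': 0.794..., 'y2': 0.205...}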


Naive Rule
• As per this rule, the record gets classified as a member of the majority class.
• Assume that there are six attributes in the data table
• x1: Day of Week, x2: Departure Time, x3: Origin, x4: Destination,
• x5: Carrier, x6: Weather
• and output y gives class labels (Delayed, On Time).
• Say 82% of the entries in y column record ‘On Time’.
• A naive rule for classifying a flight into two classes, ignoring information on x1,
x2, …, x6 is to classify all flights as being ‘On Time’.
• The naive rule is used as a baseline for evaluating the performance of more
complicated classifiers.
• Clearly, a classifier that uses attribute information should outperform the naive
rule.
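
A minimal Python sketch of the naive rule for the flight example, assuming
hypothetical training labels in which 82% of the records are 'On Time' (the
attribute values are ignored entirely):

from collections import Counter

# Hypothetical labels: 82% of the flights are 'On Time', as in the example above.
y_train = ["On Time"] * 82 + ["Delayed"] * 18

# The naive rule: always predict the majority class, ignoring x1, ..., x6.
majority_class = Counter(y_train).most_common(1)[0][0]

def naive_rule(record):
    return majority_class

# The baseline accuracy equals the majority-class share of the labels (0.82 here).
accuracy = sum(naive_rule(None) == label for label in y_train) / len(y_train)
print(majority_class, accuracy)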
Naive Bayes Classifier
• Takes the features to be equally important and conditionally independent of
each other, given the class.
• This is not the scenario in real-life data.
• Each of the P(yq) may be estimated simply by counting the frequency with
which class yq occurs in the training data:

  P(yq) ≈ Nq / N,   q = 1, …, M

• If the decision must be made with so little information, it seems logical to
use the following rule (just like the naive rule): assign x to the class yk
with the largest prior P(yk).
• This decision will be right mainly when one prior is very much greater than
the others; for balanced data it will not work.
Naive Bayes Classifier
• In most other circumstances, we need to estimate the class-conditional
probabilities P(x|yq) as well.
• According to the assumption (attribute values are conditionally independent,
given the class), the probability of observing the conjunction x1, x2, …, xn,
given the class of the pattern, is just the product of the probabilities for
the individual attributes:

  P(x|yq) = P(x1, x2, …, xn | yq) = ∏j P(xj | yq)
Naive Bayes Classifier

  yNB = argmaxq P(yq) ∏j P(xj | yq)

• where yNB denotes the class output by the naive Bayes classifier.
• The number of distinct P(xj | yq) terms that must be estimated from the
training data is just the number of distinct attributes (n) times the number
of distinct classes (M).
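
As a rough illustration of the rule above, here is a minimal Python sketch of a
naive Bayes classifier for categorical attributes; the class name and its methods
are illustrative, not from the slides, and no smoothing is applied (an unseen
attribute value simply contributes a zero term).

from collections import Counter, defaultdict

class CategoricalNaiveBayes:
    def fit(self, X, y):
        # X: list of attribute tuples, y: list of class labels.
        self.n_samples = len(y)
        self.class_counts = Counter(y)                 # Nq for every class yq
        # value_counts[(j, v, c)] = number of class-c rows whose j-th attribute is v
        self.value_counts = defaultdict(int)
        for xi, yi in zip(X, y):
            for j, v in enumerate(xi):
                self.value_counts[(j, v, yi)] += 1
        return self

    def predict(self, x):
        best_class, best_score = None, -1.0
        for c, nq in self.class_counts.items():
            score = nq / self.n_samples                        # prior P(yq) = Nq / N
            for j, v in enumerate(x):
                score *= self.value_counts[(j, v, c)] / nq     # times P(xj | yq)
            if score > best_score:
                best_class, best_score = c, score
        return best_class                                      # yNB = argmax over classes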
Summary
Example:
y for x : {M, 1.95 m} ?
• y1 corresponds to the class 'short',
• y2 corresponds to the class 'medium', and
• y3 corresponds to the class 'tall'.
N1 = no. of y1 = 4; N2 = no. of y2 = 8; N3 = no. of y3 = 3
The training data, sorted w.r.t. x2:
Gender x1    Height x2 (m)    Class y
F            1.60             Short   (y1)
F            1.60             Short   (y1)
F            1.70             Short   (y1)
M            1.70             Short   (y1)
F            1.75             Medium  (y2)
F            1.80             Medium  (y2)
F            1.80             Medium  (y2)
M            1.85             Medium  (y2)
F            1.88             Medium  (y2)
F            1.90             Medium  (y2)
F            1.90             Medium  (y2)
M            1.95             Medium  (y2)
M            2.00             Tall    (y3)
M            2.10             Tall    (y3)
M            2.20             Tall    (y3)
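The per-class scores for x = {M, 1.95 m} can be reproduced with the minimal Python
sketch below. The slides only say that continuous attributes are binned, so the
0.1 m bins used here (1.95 m falls in the bin (1.9, 2.0]) are an assumption made
for illustration.

import math

# (gender x1, height x2 in m, class y) -- the 15 training records from the table.
data = [
    ("F", 1.60, "short"), ("F", 1.60, "short"), ("F", 1.70, "short"), ("M", 1.70, "short"),
    ("F", 1.75, "medium"), ("F", 1.80, "medium"), ("F", 1.80, "medium"), ("M", 1.85, "medium"),
    ("F", 1.88, "medium"), ("F", 1.90, "medium"), ("F", 1.90, "medium"), ("M", 1.95, "medium"),
    ("M", 2.00, "tall"), ("M", 2.10, "tall"), ("M", 2.20, "tall"),
]

def height_bin(h):
    # Assumed bins of width 0.1 m, half-open on the left, e.g. (1.9, 2.0] -> 2.0.
    return math.ceil(round(h * 10, 6)) / 10

query_gender, query_height = "M", 1.95
scores = {}
for c in ("short", "medium", "tall"):
    rows = [r for r in data if r[2] == c]
    prior = len(rows) / len(data)                                    # P(yq) = Nq / N
    p_gender = sum(r[0] == query_gender for r in rows) / len(rows)   # P(x1 = M | yq)
    p_height = sum(height_bin(r[1]) == height_bin(query_height)
                   for r in rows) / len(rows)                        # P(x2 bin | yq)
    scores[c] = prior * p_gender * p_height
    print(f"{c:7s} prior={prior:.3f} P(M|y)={p_gender:.3f} "
          f"P(bin|y)={p_height:.3f} score={scores[c]:.4f}")

print("predicted class:", max(scores, key=scores.get))   # -> 'tall'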

• This gives q = 3.
• Therefore, for the pattern x = {M, 1.95 m}, the predicted class is 'tall'.

• The true class in the data table is 'medium'.

• Use of the naive Bayes algorithm on real-life datasets brings out the power of
the naive Bayes classifier when N is large.
Example 2:
Let us say we want to classify a Red Domestic SUV as stolen or not.
• We need to calculate the probabilities P(Red|Yes), P(SUV|Yes), P(Domestic|Yes),
P(Red|No), P(SUV|No), and P(Domestic|No),
• and multiply them by P(Yes) and P(No) respectively.
• Then we can estimate these values using the equation for yNB.
• Looking at P(Red|Yes), we have 5 cases where vj = Yes, and in 3 of those cases
ai = Red.
• So for P(Red|Yes), n = 5 and nc = 3.
• Note that all attributes are binary (two possible values).
• We are assuming no other information, so p = 1 / (number-of-attribute-values) = 0.5
for all of our attributes.
• Our m value is arbitrary (we will use m = 3), but it must be consistent for all attributes.
• Now we simply apply equation (3), the m-estimate (nc + m·p) / (n + m), using the
precomputed values of n, nc, p, and m.
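
As a minimal sketch of that step, the snippet below applies the m-estimate to the
one term whose counts are stated above, P(Red|Yes); the full data table from the
slide is not reproduced here, and the function name is illustrative.

def m_estimate(nc, n, p, m):
    # m-estimate of a conditional probability: (nc + m * p) / (n + m)
    return (nc + m * p) / (n + m)

# Counts stated in the text for P(Red | Yes): n = 5 'Yes' rows, nc = 3 of them Red,
# prior estimate p = 0.5, equivalent sample size m = 3.
print(m_estimate(nc=3, n=5, p=0.5, m=3))   # (3 + 1.5) / (5 + 3) = 0.5625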