U4 - Naive Bayes Algorithm

6.3 Naive Bayes algorithm


6.3.1 Assumption
The naive Bayes algorithm is based on the following assumptions:
• All the features are independent and are unrelated to each other. Presence or absence of a
feature does not influence the presence or absence of any other feature.
• The data has class-conditional independence, which means that events are independent so
long as they are conditioned on the same class value.
These assumptions are, in general, not true in many real-world problems. It is because of these
assumptions that the algorithm is called a naive algorithm.

6.3.2 Basic idea


Suppose we have a training data set consisting of N examples having n features. Let the features
be named as (F1 , . . . , Fn ). A feature vector is of the form (f1 , f2 , . . . , fn ). Associated with each
example, there is a certain class label. Let the set of class labels be {c1 , c2 , . . . , cp }.
Suppose we are given a test instance having the feature vector

X = (x1 , x2 , . . . , xn ).

We are required to determine the most appropriate class label that should be assigned to the test
instance. For this purpose we compute the following conditional probabilities

P(c1∣X), P(c2∣X), . . . , P(cp∣X)    (6.5)

and choose the maximum among them. Let the maximum probability be P(ci∣X). Then we choose
ci as the most appropriate class label for the test instance having X as the feature vector.
The direct computation of the probabilities given in Eq. (6.5) is difficult for a number of reasons.
Bayes’ theorem can be applied to obtain a simpler method. This is explained below.

6.3.3 Computation of probabilities


Using Bayes’ theorem, we have:
P(ck∣X) = P(X∣ck)P(ck) / P(X)    (6.6)

Since, by assumption, the data has class-conditional independence, we note that the events “x1∣ck”,
“x2∣ck”, . . . , “xn∣ck” are independent (because they are all conditioned on the same class label ck).
Hence we have

P(X∣ck) = P((x1, x2, . . . , xn)∣ck)
        = P(x1∣ck)P(x2∣ck)⋯P(xn∣ck)

Using this in Eq. (6.6), we get


P(ck∣X) = P(x1∣ck)P(x2∣ck)⋯P(xn∣ck)P(ck) / P(X).
Since the denominator P (X) is independent of the class labels, we have

P(ck∣X) ∝ P(x1∣ck)P(x2∣ck)⋯P(xn∣ck)P(ck).

So it is enough to find the maximum among the following values:

P(x1∣ck)P(x2∣ck)⋯P(xn∣ck)P(ck),   k = 1, . . . , p.
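Stated compactly, this amounts to the following decision rule (this restatement is not part of the text; here \hat{c} denotes the class label assigned to X):

\hat{c} = \arg\max_{k \in \{1, \dots, p\}} P(c_k) \prod_{j=1}^{n} P(x_j \mid c_k)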



Remarks
The various probabilities in the above expression are computed as follows:

P(ck) = (No. of examples with class label ck) / (Total number of examples)

P(xj∣ck) = (No. of examples with jth feature equal to xj and class label ck) / (No. of examples with class label ck)
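In code, these two estimates reduce to simple counting. The following is a minimal Python sketch (not taken from the text); it assumes the training set is represented as a list of (feature_vector, class_label) pairs, and the function names are only illustrative:

# Relative-frequency estimates of P(ck) and P(xj | ck), assuming the
# training set is a list of (feature_vector, class_label) pairs.

def prior(examples, ck):
    # P(ck) = (no. of examples with class label ck) / (total no. of examples)
    return sum(1 for _, c in examples if c == ck) / len(examples)

def likelihood(examples, j, xj, ck):
    # P(xj | ck) = (no. of examples whose jth feature equals xj and whose
    # class label is ck) / (no. of examples with class label ck)
    in_class = [f for f, c in examples if c == ck]
    return sum(1 for f in in_class if f[j] == xj) / len(in_class)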

6.3.4 The algorithm


Algorithm: Naive Bayes

Let there be a training data set having n features F1 , . . . , Fn . Let f1 denote an arbitrary value of F1 ,
f2 of F2 , and so on. Let the set of class labels be {c1 , c2 , . . . , cp }. Let there be given a test instance
having the feature vector
X = (x1 , x2 , . . . , xn ).
We are required to determine the most appropriate class label that should be assigned to the test
instance.
Step 1. Compute the probabilities P (ck ) for k = 1, . . . , p.
Step 2. Form a table showing the conditional probabilities

P(f1∣ck), P(f2∣ck), . . . , P(fn∣ck)

for all values of f1 , f2 , . . . , fn and for k = 1, . . . , p.


Step 3. Compute the products

qk = P(x1∣ck)P(x2∣ck)⋯P(xn∣ck)P(ck)

for k = 1, . . . , p.
Step 4. Find j such that qj = max{q1, q2, . . . , qp}.
Step 5. Assign the class label cj to the test instance X.

Remarks
In the above algorithm, Steps 1 and 2 constitute the learning phase of the algorithm. The remaining
steps constitute the testing phase. For testing purposes, only the table of probabilities is required;
the original data set is not required.
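One possible rendering of the five steps in Python is sketched below. It is not taken from the text: the data layout (a list of (feature_vector, class_label) pairs) and all names are assumptions made for illustration. The dictionaries priors and cond play the role of the tables built in Steps 1 and 2, and, as noted above, only these tables are needed at testing time.

# Sketch of the naive Bayes algorithm described above; the data layout
# is an assumption made for illustration.

def learn(examples):
    # Steps 1 and 2: build the table of priors P(ck) and the table of
    # conditional probabilities P(f | ck) by counting.
    priors, cond = {}, {}
    labels = {c for _, c in examples}
    n_features = len(examples[0][0])
    for ck in labels:
        rows = [f for f, c in examples if c == ck]
        priors[ck] = len(rows) / len(examples)
        for j in range(n_features):
            for f in {r[j] for r in rows}:
                cond[(j, f, ck)] = sum(1 for r in rows if r[j] == f) / len(rows)
    return priors, cond

def classify(priors, cond, x):
    # Steps 3-5: compute qk = P(x1|ck)...P(xn|ck)P(ck) for each class and
    # return the label with the largest qk; a feature value never seen with
    # class ck contributes probability 0.
    best_label, best_q = None, -1.0
    for ck, p_ck in priors.items():
        q = p_ck
        for j, xj in enumerate(x):
            q *= cond.get((j, xj, ck), 0.0)
        if q > best_q:
            best_label, best_q = ck, q
    return best_label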

6.3.5 Example
Problem
Consider a training data set consisting of the fauna of the world. Each unit has three features named
“Swim”, “Fly” and “Crawl”. Let the possible values of these features be as follows:
Swim: Fast, Slow, No
Fly: Long, Short, Rarely, No
Crawl: Yes, No
For simplicity, each unit is classified as “Animal”, “Bird” or “Fish”. Let the training data set be as in
Table 6.1. Use the naive Bayes algorithm to classify a particular species whose features are (Slow, Rarely, No).

Sl. No. Swim Fly Crawl Class


1 Fast No No Fish
2 Fast No Yes Animal
3 Slow No No Animal
4 Fast No No Animal
5 No Short No Bird
6 No Short No Bird
7 No Rarely No Animal
8 Slow No Yes Animal
9 Slow No No Fish
10 Slow No Yes Fish
11 No Long No Bird
12 Fast No No Bird

Table 6.1: Sample data set for naive Bayes algorithm

Solution
In this example, the features are
F1 = “Swim”, F2 = “Fly”, F3 = “Crawl”.
The class labels are
c1 = “Animal”, c2 = “Bird”, c3 = “Fish”.
The test instance is (Slow, Rarely, No) and so we have:
x1 = “Slow”, x2 = “Rarely”, x3 = “No”.
We construct the frequency table shown in Table 6.2 which summarises the data. (It may be noted
that the construction of the frequency table is not part of the algorithm.)

                      Features
Class            Swim (F1)          Fly (F2)                    Crawl (F3)    Total
                 Fast  Slow  No     Long  Short  Rarely  No     Yes   No
Animal (c1)       2     2    1       0     0      1      4       2     3        5
Bird (c2)         1     0    3       1     2      0      1       0     4        4
Fish (c3)         1     2    0       0     0      0      3       1     2        3
Total             4     4    4       1     2      1      8       3     9       12

Table 6.2: Frequency table for the data in Table 6.1

Step 1. We compute the following probabilities:

P(c1) = (No. of records with class label “Animal”) / (Total number of examples) = 5/12

P(c2) = (No. of records with class label “Bird”) / (Total number of examples) = 4/12

P(c3) = (No. of records with class label “Fish”) / (Total number of examples) = 3/12

Step 2. We construct the following table of conditional probabilities:

                      Features
Class            Swim (F1)             Fly (F2)                       Crawl (F3)
                 Fast  Slow  No        Long  Short  Rarely  No        Yes   No
Animal (c1)      2/5   2/5   1/5       0/5   0/5    1/5     4/5       2/5   3/5
Bird (c2)        1/4   0/4   3/4       1/4   2/4    0/4     1/4       0/4   4/4
Fish (c3)        1/3   2/3   0/3       0/3   0/3    0/3     3/3       1/3   2/3

Table 6.3: Table of the conditional probabilities P (fi ∣ck )

Note: The conditional probabilities are calculated as follows:

P((F1 = Slow)∣c1) = (No. of records with F1 = Slow and class label c1) / (No. of records with class label c1) = 2/5.

Step 3. We now calculate the following numbers:

q1 = P(x1∣c1)P(x2∣c1)P(x3∣c1)P(c1)
   = (2/5) × (1/5) × (3/5) × (5/12)
   = 0.02

q2 = P(x1∣c2)P(x2∣c2)P(x3∣c2)P(c2)
   = (0/4) × (0/4) × (4/4) × (4/12)
   = 0

q3 = P(x1∣c3)P(x2∣c3)P(x3∣c3)P(c3)
   = (2/3) × (0/3) × (2/3) × (3/12)
   = 0

Step 4. Now
max{q1, q2, q3} = 0.02.

Step 5. The maximum is q1 and it corresponds to the class label

c1 = “Animal”.

So we assign the class label “Animal” to the test instance “(Slow, Rarely, No)”.
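The arithmetic of this example can be checked with a few lines of Python. The sketch below (not part of the text; the variable names are illustrative) enters the twelve records of Table 6.1 directly and recomputes q1, q2 and q3 for the test instance (Slow, Rarely, No):

# Recomputing the worked example from the records of Table 6.1.
data = [
    ("Fast", "No", "No", "Fish"),     ("Fast", "No", "Yes", "Animal"),
    ("Slow", "No", "No", "Animal"),   ("Fast", "No", "No", "Animal"),
    ("No", "Short", "No", "Bird"),    ("No", "Short", "No", "Bird"),
    ("No", "Rarely", "No", "Animal"), ("Slow", "No", "Yes", "Animal"),
    ("Slow", "No", "No", "Fish"),     ("Slow", "No", "Yes", "Fish"),
    ("No", "Long", "No", "Bird"),     ("Fast", "No", "No", "Bird"),
]
x = ("Slow", "Rarely", "No")        # the test instance

for c in ("Animal", "Bird", "Fish"):
    rows = [r for r in data if r[3] == c]
    q = len(rows) / len(data)                                  # P(ck)
    for j, xj in enumerate(x):
        q *= sum(1 for r in rows if r[j] == xj) / len(rows)    # P(xj | ck)
    print(c, round(q, 4))
# Prints 0.02 for "Animal" and 0 for "Bird" and "Fish", so the test
# instance is assigned the class label "Animal".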

6.4 Using numeric features with naive Bayes algorithm


The naive Bayes algorithm can be applied to a data set only if the features are categorical. This is
so because the various probabilities are computed from frequencies, and frequencies can be counted
only if each feature has a limited set of values.
If a feature is numeric, it has to be discretized before applying the algorithm. The discretization
is effected by putting the numeric values into categories known as bins; because of this, discretization
is also known as binning. This approach is ideal when there are large amounts of data.
There are several different ways to discretize a numeric feature.
1. If there are natural categories or cut points in the distribution of values, use these cut points to
create the bins. For example, suppose the data consists of records of times when certain activities
were carried out. Then the categories, or bins, may be created as in Figure 6.3.

Figure 6.3: Discretization of numeric data: Example

2. If there are no obvious cut points, we may discretize the feature using quantiles. We may
divide the data into three bins with tertiles, four bins with quartiles, five bins with quintiles,
and so on; a short sketch of this approach is given after this list.
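As an illustration of the second approach, a numeric feature can be replaced by quantile bin labels before the counting described earlier. The snippet below is only a sketch using NumPy; the sample values and variable names are made up for illustration:

# Quantile binning of a numeric feature with NumPy (illustrative values).
import numpy as np

values = np.array([2.3, 4.1, 5.0, 5.2, 6.8, 7.1, 8.4, 9.0, 9.7, 11.2])

# Cut points at the quartiles give four bins; tertiles or quintiles would
# give three or five bins instead.
cut_points = np.quantile(values, [0.25, 0.5, 0.75])
bins = np.digitize(values, cut_points)   # a bin index 0..3 for each value

print(bins)   # the numeric values can now be treated as categorical bin labels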
