
Bayes’ Classifier

Module-II
EC 19-203-0811
Introduction to Machine Learning

Course Outcomes
1. To understand various machine learning techniques
2. To acquire knowledge about classification techniques.
3. To understand dimensionality reduction techniques and decision trees.
4. To understand unsupervised machine learning techniques.



Syllabus
Module I
Introduction: Machine Learning, Applications, Supervised Learning - Classification, Regression,
Unsupervised Learning, Reinforcement Learning, Supervised Learning: Learning a Class from Examples,
Vapnik-Chervonenkis (VC) Dimension, Probably Approximately Correct (PAC) Learning, Noise, Learning
Multiple Classes, Regression, Model Selection and Generalization, Dimensions of a Supervised Machine
Learning Algorithm

Module II
Multilayer Perceptrons: Introduction, The Perceptron, Training a Perceptron, Learning Boolean
Functions, Multilayer Perceptrons, Backpropagation Algorithm, Training Procedures. Classification:
Cross-validation and resampling methods - K-fold cross-validation, Bootstrapping; Measuring classifier
performance - Precision, recall, ROC curves. Bayes Theorem, Bayesian Classifier, Maximum Likelihood
Estimation, Density Functions.



Syllabus
Module III
Dimensionality Reduction: Introduction, Subset Selection, Principal Components
Analysis, Factor Analysis, Multidimensional Scaling, Linear Discriminant Analysis,
Isomap, Locally Linear Embedding, Decision Trees: Introduction, Univariate Trees,
Pruning, Rule Extraction from Trees, Learning Rules from Data, Multivariate Trees,
Introduction to Linear Discrimination, Generalizing the Linear Model.

Module IV
Clustering: Introduction, Mixture Densities, k-Means Clustering, Expectation-
Maximization Algorithm, Mixtures of Latent Variable Models, Supervised Learning after
Clustering, Hierarchical Clustering, Choosing the Number of Clusters.



References
1. https://www.geeksforgeeks.org/naive-bayes-classifiers/
2. Stephen Marsland, “Machine Learning: An Algorithmic Perspective”, 2nd Edition, CRC Press, 2015. [Ch. 2]
3. Christopher M. Bishop, “Pattern Recognition and Machine Learning”, Springer, 2006.
4. Ethem Alpaydin, “Introduction to Machine Learning”, Second Edition, 2010. [Ch. 3, 4]
5. Images from different websites



Contents

• Bayes Theorem
• Bayesian Classifier
• Maximum Likelihood Estimation
• Density Functions



Bayes Theorem
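
This slide’s equation image did not survive extraction; the standard statement of Bayes’ theorem, consistent with the formulas used on the following slides, is:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

where $P(C)$ is the prior, $P(X \mid C)$ the likelihood, $P(X)$ the evidence, and $P(C \mid X)$ the posterior.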




Bayes Classifier
• Establishing a probabilistic model for classification
– Discriminative model: directly model the posterior
  $P(C \mid X)$, with $C \in \{c_1, \ldots, c_L\}$ and $X = (X_1, \ldots, X_n)$
– Generative model: model the class-conditional likelihood
  $P(X \mid C)$, with $C \in \{c_1, \ldots, c_L\}$ and $X = (X_1, \ldots, X_n)$

• MAP classification rule
– MAP: Maximum A Posteriori
– Assign $x$ to $c^*$ if
  $P(C = c^* \mid X = x) > P(C = c \mid X = x)$ for all $c \ne c^*$, $c \in \{c_1, \ldots, c_L\}$
• Generative classification with the MAP rule
– Apply Bayes’ rule to convert:
  $P(C \mid X) = \dfrac{P(X \mid C)\,P(C)}{P(X)} \propto P(X \mid C)\,P(C)$


Bayes Classifier
• Bayes classification
  $P(C \mid X) \propto P(X \mid C)\,P(C) = P(X_1, \ldots, X_n \mid C)\,P(C)$

  Difficulty: learning the joint probability $P(X_1, \ldots, X_n \mid C)$

• Naïve Bayes classification
– Making the assumption that all input attributes are conditionally independent given the class:
  $P(X_1, X_2, \ldots, X_n \mid C) = P(X_1 \mid X_2, \ldots, X_n; C)\,P(X_2, \ldots, X_n \mid C)$
  $= P(X_1 \mid C)\,P(X_2, \ldots, X_n \mid C)$
  $= P(X_1 \mid C)\,P(X_2 \mid C) \cdots P(X_n \mid C)$
– MAP classification rule:
  $[P(x_1 \mid c^*) \cdots P(x_n \mid c^*)]\,P(c^*) > [P(x_1 \mid c) \cdots P(x_n \mid c)]\,P(c)$, for all $c \ne c^*$, $c \in \{c_1, \ldots, c_L\}$


Bayes Classifier
• Naïve Bayes Algorithm (for discrete input attributes)
– Learning Phase: Given a training set S,
  For each target value $c_i$ ($c_i \in \{c_1, \ldots, c_L\}$):
    $\hat{P}(C = c_i) \leftarrow$ estimate $P(C = c_i)$ with examples in S;
  For every attribute value $a_{jk}$ of each attribute $X_j$ ($j = 1, \ldots, n$; $k = 1, \ldots, N_j$):
    $\hat{P}(X_j = a_{jk} \mid C = c_i) \leftarrow$ estimate $P(X_j = a_{jk} \mid C = c_i)$ with examples in S;
  Output: conditional probability tables; for each $X_j$, $N_j \times L$ elements.

– Test Phase: Given an unknown instance $X' = (a_1, \ldots, a_n)$,
  look up the tables to assign the label $c^*$ to $X'$ if
  $[\hat{P}(a_1 \mid c^*) \cdots \hat{P}(a_n \mid c^*)]\,\hat{P}(c^*) > [\hat{P}(a_1 \mid c) \cdots \hat{P}(a_n \mid c)]\,\hat{P}(c)$, for all $c \ne c^*$, $c \in \{c_1, \ldots, c_L\}$.


Bayes Classifier-Example
• Example: Play Tennis
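
The training-data table on this slide was an image that did not survive extraction. The counts on the following slides match the standard PlayTennis data set (Mitchell, Machine Learning, 1997), reproduced here for reference:

Day  Outlook   Temperature  Humidity  Wind    Play
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No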


Bayes Classifier-Example
• Learning Phase
Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5

Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5


Bayes Classifier-Example
• Learning Phase

Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5

Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5


Bayes Classifier-Example
• Learning Phase

Outlook Play=Yes Play=No
Sunny 2/9 3/5
Overcast 4/9 0/5
Rain 3/9 2/5

Temperature Play=Yes Play=No
Hot 2/9 2/5
Mild 4/9 2/5
Cool 3/9 1/5

Humidity Play=Yes Play=No
High 3/9 4/5
Normal 6/9 1/5

Wind Play=Yes Play=No
Strong 3/9 3/5
Weak 6/9 2/5

P(Play=Yes) = 9/14    P(Play=No) = 5/14
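
As a usage sketch, fitting the hypothetical `NaiveBayes` class shown earlier on the 14 PlayTennis examples reproduces exactly these tables:

```python
data = [
    ("Sunny","Hot","High","Weak","No"),          ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),      ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),       ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),      ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),    ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),    ("Rain","Mild","High","Strong","No"),
]
X = [row[:4] for row in data]
y = [row[4] for row in data]
nb = NaiveBayes()
nb.fit(X, y)
print(nb.priors)                     # {'No': 5/14 ≈ 0.357, 'Yes': 9/14 ≈ 0.643}
print(nb.cond[(0, "Sunny", "Yes")])  # 2/9 ≈ 0.222, matching the Outlook table
```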


Bayes Classifier-Example
• Test Phase
– Given a new instance,
  x’ = (Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong)
– Look up tables
  P(Outlook=Sunny|Play=Yes) = 2/9      P(Outlook=Sunny|Play=No) = 3/5
  P(Temperature=Cool|Play=Yes) = 3/9   P(Temperature=Cool|Play=No) = 1/5
  P(Humidity=High|Play=Yes) = 3/9      P(Humidity=High|Play=No) = 4/5
  P(Wind=Strong|Play=Yes) = 3/9        P(Wind=Strong|Play=No) = 3/5
  P(Play=Yes) = 9/14                   P(Play=No) = 5/14

– MAP rule (the scores below are proportional to the posteriors, since the common factor P(x’) is dropped):
  P(Yes|x’) ∝ [P(Sunny|Yes)P(Cool|Yes)P(High|Yes)P(Strong|Yes)]P(Play=Yes) = (2/9)(3/9)(3/9)(3/9)(9/14) ≈ 0.0053
  P(No|x’) ∝ [P(Sunny|No)P(Cool|No)P(High|No)P(Strong|No)]P(Play=No) = (3/5)(1/5)(4/5)(3/5)(5/14) ≈ 0.0206

Since P(Yes|x’) < P(No|x’), we label x’ as “No”.
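
A quick sanity check of this arithmetic in plain Python, just multiplying the looked-up fractions:

```python
# Unnormalized posterior scores for x' = (Sunny, Cool, High, Strong)
score_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)  # ≈ 0.0053
score_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)  # ≈ 0.0206
print("Play =", "Yes" if score_yes > score_no else "No")  # -> Play = No
```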


Bayes Classifier
Advantages:
• A fast and easy ML algorithm for predicting the class of a dataset.
• Useful for binary as well as multi-class classification.

Disadvantages:
• Assumes that all features are independent given the class, so it cannot learn relationships between features.


Maximum Likelihood Estimation
• A method of estimating the parameters of an assumed probability distribution, given some observed data.
• This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable.
• The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate.
• If the likelihood function is differentiable, the derivative test for finding maxima can be applied.
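
As a standard illustration (not from the slides), consider the MLE of a Bernoulli parameter $\theta$ from i.i.d. observations $x_1, \ldots, x_n \in \{0, 1\}$:

$$L(\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}, \qquad \log L(\theta) = \Big(\textstyle\sum_i x_i\Big)\log\theta + \Big(n - \textstyle\sum_i x_i\Big)\log(1-\theta)$$

Setting the derivative to zero:

$$\frac{d}{d\theta}\log L(\theta) = \frac{\sum_i x_i}{\theta} - \frac{n - \sum_i x_i}{1-\theta} = 0 \;\Rightarrow\; \hat{\theta}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The MLE is simply the sample frequency, which is exactly how the naive Bayes tables earlier were filled: $\hat{P}(\text{Outlook}=\text{Sunny} \mid \text{Yes}) = 2/9$ is the maximum likelihood (relative frequency) estimate.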


Density Functions
• In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function whose value at any given sample (or point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample.
• Probability density is the probability per unit length.
• While the absolute likelihood of a continuous random variable taking on any particular value is 0 (since there is an infinite set of possible values to begin with), the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample than to the other.
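
Concretely (a standard fact, not on the slide), probabilities come from integrating the density: for a continuous random variable $X$ with density $f_X$,

$$P(a \le X \le b) = \int_a^b f_X(x)\,dx, \qquad \int_{-\infty}^{\infty} f_X(x)\,dx = 1$$

For example, if $X$ is uniform on $[0, 10]$, then $f_X(x) = 1/10$ on that interval, so $P(2 \le X \le 3) = 1/10$ even though $P(X = 2.5) = 0$.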

Conclusion

• Bayes' Theorem
• Bayes Classifier
• Examples



Thank You
