Lecture 9 - 10 Naive Generative Analysis
Objectives
• Foundation
• Assumptions
The classes are mutually exclusive and exhaustive.
The attributes are independent given the class.
Air-Traffic Data
[Training data table of 20 tuples not shown here.]
• In this database, there are four attributes, A = [Day, Season, Fog, Rain], with 20 tuples.
• The categories of classes are: C = [On Time, Late, Very Late, Cancelled]
• Given this knowledge of the data and classes, we are to find the most likely classification for any unseen instance, for example:
  Week Day, Winter, High, None → ???
Bayesian Classifier
• In many applications, the relationship between the attribute set and the class variable is non-deterministic.
• In other words, a test instance cannot be classified to a class label with certainty.
• In such a situation, the classification can be achieved probabilistically.
• The Bayesian classifier is an approach for modelling probabilistic relationships between the attribute set and the class variable.
• More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for classification.
• Before discussing the Bayesian classifier, we should have a quick look at the Theory of Probability and then Bayes’ Theorem.
Bayes’ Theorem of Probability
Simple Probability
• Suppose A and B are any two events and P(A), P(B) denote the probabilities that the events A and B will occur, respectively.
Can you give an example where two events are not mutually exclusive?
Hint: Tossing two identical coins; weather (sunny, foggy, warm).
• Independent events: Two events are independent if the occurrence of one does not alter the occurrence of the other.
Can you give an example where an event is dependent on one or more other event(s)?
Hint: Receiving a message (A) through a communication channel (B) over a computer (C); rain and dating.
Joint Probability
The joint probability of two events A and B, denoted P(A ∩ B), is the probability that both events occur together.
Conditional Probability
Suppose A and B are two events associated with a random experiment. The probability of A under the condition that B has already occurred is given by
P(A|B) = P(A ∩ B) / P(B), provided P(B) ≠ 0.
Equivalently,
P(A ∩ B) = P(A|B) · P(B), or P(A ∩ B) = P(B|A) · P(A).
For n events A1, A2, …, An, if all events are mutually independent of each other,
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) · P(A2) · … · P(An).
Note:
P(A ∩ B) = 0 if the events are mutually exclusive;
P(A ∩ B) = P(A) · P(B) if A and B are independent;
otherwise, P(A ∩ B) = P(A|B) · P(B) = P(B|A) · P(A).
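These relations can be checked numerically on a small sample space. The sketch below is purely illustrative; the two-dice sample space and the particular events A and B are assumptions, not part of the lecture.

```python
# Illustrative check of the conditional-probability relations on an assumed
# sample space: two fair six-sided dice (36 equally likely outcomes).
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # the sample space

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return len(event) / len(omega)

A = {w for w in omega if w[0] + w[1] == 7}     # "the sum of the dice is 7"
B = {w for w in omega if w[0] == 3}            # "the first die shows 3"

p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

# P(A|B) = P(A ∩ B) / P(B)
print(p_ab / p_b)                              # 1/6 ≈ 0.167

# Here A and B happen to be independent, so P(A ∩ B) = P(A) · P(B)
print(abs(p_ab - p_a * p_b) < 1e-12)           # True
```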
Generalization of Conditional Probability:
P(A|B) · P(B) = P(A ∩ B) = P(B ∩ A) = P(B|A) · P(A), since P(A ∩ B) = P(B ∩ A).
In general, for n events A1, A2, …, An,
P(A1 ∩ A2 ∩ … ∩ An) = P(A1) · P(A2|A1) · P(A3|A1 ∩ A2) · … · P(An|A1 ∩ A2 ∩ … ∩ An-1).
Total Probability
If E1, E2, …, En are mutually exclusive and exhaustive events and A is any event, then
P(A) = P(A|E1) · P(E1) + P(A|E2) · P(E2) + … + P(A|En) · P(En).
Total Probability: An Example
Example 8.3
A bag contains 4 red and 3 black balls. A second bag contains 2 red and 4 black balls. One bag is
selected at random. From the selected bag, one ball is drawn. What is the probability that the ball
drawn is red?
Thus,
P(red) = P(red|Bag I) · P(Bag I) + P(red|Bag II) · P(Bag II) = (4/7) · (1/2) + (2/6) · (1/2) = 19/42 ≈ 0.45,
where P(red|Bag I) = 4/7 is the probability of drawing a red ball when the first bag has been chosen,
and P(red|Bag II) = 2/6 is the probability of drawing a red ball when the second bag has been chosen.
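A minimal sketch of the same total-probability calculation, so the arithmetic can be checked; the bag contents are exactly those stated in the example.

```python
# Total probability of drawing a red ball (Example 8.3):
# Bag I holds 4 red and 3 black balls, Bag II holds 2 red and 4 black balls,
# and one bag is selected at random.
bags = {
    "Bag I":  {"red": 4, "black": 3},
    "Bag II": {"red": 2, "black": 4},
}
p_bag = 1 / len(bags)                          # each bag is equally likely

# P(red) = sum over bags of P(red | bag) * P(bag)
p_red = sum(
    (contents["red"] / sum(contents.values())) * p_bag
    for contents in bags.values()
)
print(p_red)                                   # 19/42 ≈ 0.452
```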
Reverse Probability
Example 8.3:
A bag (Bag I) contains 4 red and 3 black balls. A second bag (Bag II) contains 2 red and 4 black balls. One ball has been chosen at random, and it is found to be a red ball. What is the probability that the ball was chosen from Bag I?
Here,
E1 = Selecting Bag I
E2 = Selecting Bag II
A = Drawing the red ball
We are to determine P(E1|A). Such a problem can be solved using Bayes' theorem of probability.
Bayes’ Theorem
If E1, E2, …, En are mutually exclusive and exhaustive events and A is any event with P(A) > 0, then
P(Ei|A) = [ P(A|Ei) · P(Ei) ] / [ P(A|E1) · P(E1) + P(A|E2) · P(E2) + … + P(A|En) · P(En) ].
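The reverse-probability question above can now be answered directly; a short sketch, assuming the same bag contents as in Example 8.3.

```python
# Reverse probability via Bayes' theorem: given that a red ball was drawn,
# what is the probability that it came from Bag I?
bags = {
    "Bag I":  {"red": 4, "black": 3},
    "Bag II": {"red": 2, "black": 4},
}
p_bag = 1 / len(bags)                               # prior: a bag chosen at random

def p_red_given(contents):
    """Likelihood P(red | bag) from the bag's contents."""
    return contents["red"] / sum(contents.values())

# Evidence P(red) by the theorem of total probability.
p_red = sum(p_red_given(c) * p_bag for c in bags.values())

# Posterior P(Bag I | red) = P(red | Bag I) * P(Bag I) / P(red)
print(p_red_given(bags["Bag I"]) * p_bag / p_red)   # 12/19 ≈ 0.632
```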
Prior and Posterior Probabilities
P(A) and P(B) are called prior probabilities; P(A|B) and P(B|A) are called posterior probabilities.
Example 8.6: Prior versus Posterior Probabilities
[Sample-space table of the events X and Y not recovered.] The table shows that the event Y has two outcomes, namely A and B, which depend on another event X with its own set of outcomes.
Case 1: Suppose we do not have any information about the event X. Then, from the given sample space, we can calculate the prior probability P(Y = A) = 0.5.
Naïve Bayesian Classifier
The Naïve Bayesian classifier calculates this posterior probability using Bayes’ theorem, as follows.
From Bayes’ theorem on conditional probability, we have
P(Y|X) = P(X|Y) · P(Y) / P(X),
where P(X) = Σi P(X|Y = yi) · P(Y = yi).
Note:
P(X) is called the evidence (it is also the total probability), and it is a constant for a given instance X.
The posterior probability P(Y|X) is therefore proportional to P(X|Y) · P(Y).
Thus, P(Y|X) can be taken as a measure of the likelihood of Y given X.
Naïve Bayesian Classifier
Example: With reference to the Air-Traffic Dataset mentioned earlier, let us tabulate the class-conditional probabilities of each attribute value given each class, as shown below.

Day attribute:
  Attribute   On Time        Late        Very Late   Cancelled
  Weekday     9/14 = 0.64    1/2 = 0.5   3/3 = 1     0/1 = 0
  Saturday    2/14 = 0.14    1/2 = 0.5   0/3 = 0     1/1 = 1

Fog attribute:
  Attribute   On Time        Late        Very Late   Cancelled
  None        5/14 = 0.36    0/2 = 0     0/3 = 0     0/1 = 0
Naïve Bayesian Classifier
Instance: Week Day, Winter, High, None → Class = ???
Naïve Bayesian Classifier
There is an n-attribute set A = {A1, A2, …, An}, which for a given instance has values A1 = a1, A2 = a2, …, An = an.
Under the naïve independence assumption, the classifier assigns the class yj that maximizes
pj = P(Y = yj) · P(A1 = a1 | Y = yj) · P(A2 = a2 | Y = yj) · … · P(An = an | Y = yj).
Note: Σj pj ≠ 1, because the pj are not probabilities but values proportional to the posterior probabilities.
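A minimal sketch of this scoring rule for the unseen instance (Week Day, Winter, High, None). Only the Day = Weekday conditionals come from the table above, and the class priors follow from its denominators (14 + 2 + 3 + 1 = 20 tuples); the Season, Fog and Rain conditionals are illustrative placeholders, not values from the dataset.

```python
# Naive Bayes scoring: p_j = P(class_j) * product_i P(attribute_i = value_i | class_j).
# Entries marked "assumed" are illustrative placeholders, not values from the slides.

# Class priors derived from the table denominators: 14, 2, 3 and 1 out of 20 tuples.
priors = {"On Time": 14/20, "Late": 2/20, "Very Late": 3/20, "Cancelled": 1/20}

cond = {
    ("Day", "Weekday"):   {"On Time": 9/14, "Late": 1/2, "Very Late": 3/3, "Cancelled": 0/1},
    ("Season", "Winter"): {"On Time": 0.2,  "Late": 1.0, "Very Late": 1.0, "Cancelled": 1.0},  # assumed
    ("Fog", "High"):      {"On Time": 0.3,  "Late": 0.5, "Very Late": 0.3, "Cancelled": 1.0},  # assumed
    ("Rain", "None"):     {"On Time": 0.4,  "Late": 0.5, "Very Late": 0.3, "Cancelled": 0.0},  # assumed
}

instance = {"Day": "Weekday", "Season": "Winter", "Fog": "High", "Rain": "None"}

scores = {}
for cls, prior in priors.items():
    score = prior
    for attr, value in instance.items():
        score *= cond[(attr, value)][cls]
    scores[cls] = score          # proportional to the posterior; the scores do not sum to 1

print(scores)
print("Predicted class:", max(scores, key=scores.get))
```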
Discriminative & Generative Models
• Machine learning models can be classified into discriminative and generative models.
• A “discriminative model” models the decision boundary between the classes, while a “generative model” explicitly models the actual distribution of each class.
Discriminative & Generative Models
• The discriminative model is used particularly for supervised machine learning. Also called a
conditional model, it learns the boundaries between classes or labels in a dataset.
• The ultimate goal of discriminative models is to separate one class from another.
• Types of discriminative models in machine learning include: Logistic Regression, Support Vector
Machine, Decision Tree, Random Forest.
• Generative models are a class of statistical models that generate new data instances. These models are
used in unsupervised machine learning to perform tasks such as probability and likelihood estimation,
modelling data points, and distinguishing between classes using these probabilities.
• Generative models rely on the Bayes theorem to find the joint probability.
• Examples of generative models are Naive Bayes (and, more generally, Bayesian networks), Hidden Markov Models, and Linear Discriminant Analysis (LDA), which is also used as a dimensionality reduction technique.
Discriminative & Generative Models
A generative model learns the joint probability distribution p(x, y); it predicts the conditional probability with the help of Bayes’ theorem. A discriminative model learns the conditional probability distribution p(y|x) directly. Both of these models are generally used in supervised learning problems.
Note:
Joint Probability
Joint probability is the likelihood of more than one event occurring at the same time P(A and B).
Conditional Probability
The conditional probability of an event B is the probability that the event will occur given the knowledge
that an event A has already occurred. It is denoted by P(B|A).
Discriminative Vs Generative Models
• Discriminative models have the advantage of being more robust to outliers, unlike the
generative models.
• However, one major drawback of discriminative models is the misclassification problem, i.e., wrongly classifying a data point.
• Another key difference between these two types of models is that while a generative
model focuses on explaining how the data was generated, a discriminative model focuses
on predicting labels of the data.
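The contrast can be seen concretely with scikit-learn, assuming it is installed; the synthetic two-class data below is purely illustrative.

```python
# GaussianNB is a generative model: it fits per-class Gaussian distributions and
# applies Bayes' theorem. LogisticRegression is discriminative: it models P(y|x)
# (the decision boundary) directly.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

generative = GaussianNB().fit(X_tr, y_tr)
discriminative = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("GaussianNB accuracy:        ", generative.score(X_te, y_te))
print("LogisticRegression accuracy:", discriminative.score(X_te, y_te))

# Both expose P(y|x); the generative model obtains it via Bayes' theorem.
print(generative.predict_proba(X_te[:1]))
print(discriminative.predict_proba(X_te[:1]))
```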
Gaussian Discriminant Analysis
Confusion matrix:
                         PREDICTED CLASS
                         Class=Yes    Class=No
  ACTUAL    Class=Yes    a (TP)       b (FN)
  CLASS     Class=No     c (FP)       d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Limitation of Accuracy
Consider a 2-class problem:
  Number of Class 0 examples = 9990
  Number of Class 1 examples = 10
If the model predicts every example as Class 0, its accuracy is 9990/10000 = 99.9%, even though it does not detect a single Class 1 example.

Weighted Accuracy = (w1 · a + w4 · d) / (w1 · a + w2 · b + w3 · c + w4 · d)
Computing Cost of Classification
Cost PREDICTED CLASS
Matrix
C(i|j) + -
ACTUAL + -1 100
CLASS
- 1 0
Count matrix (N = a + b + c + d):
                         PREDICTED CLASS
                         Class=Yes    Class=No
  ACTUAL    Class=Yes    a            b
  CLASS     Class=No     c            d

Accuracy = (a + d) / N

Cost matrix with cost p for correct predictions and cost q for errors:
                         PREDICTED CLASS
                         Class=Yes    Class=No
  ACTUAL    Class=Yes    p            q
  CLASS     Class=No     q            p

Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N - a - d)
     = qN - (q - p)(a + d)
     = N [q - (q - p) · Accuracy]
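A small sketch tying these formulas together; the confusion-matrix counts and the weights w1..w4 are illustrative assumptions, while the cost matrix is the one shown above (-1, 100, 1, 0).

```python
# Accuracy, weighted accuracy, and total cost from a 2-class confusion matrix.
# The counts a, b, c, d and the weights are assumed (illustrative) values.

a, b, c, d = 150, 40, 60, 250          # TP, FN, FP, TN
N = a + b + c + d

accuracy = (a + d) / N

# Weighted accuracy with weights w1..w4 on a, b, c, d respectively.
w1, w2, w3, w4 = 1, 5, 5, 1
weighted_accuracy = (w1 * a + w4 * d) / (w1 * a + w2 * b + w3 * c + w4 * d)

# Total cost using C(+|+) = -1, C(-|+) = 100, C(+|-) = 1, C(-|-) = 0.
cost = (-1) * a + 100 * b + 1 * c + 0 * d

print(f"Accuracy          = {accuracy:.3f}")
print(f"Weighted accuracy = {weighted_accuracy:.3f}")
print(f"Total cost        = {cost}")
```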
THANK YOU
For Queries,
Write at : [email protected]