Lecture 9 - 10 Naive Generative Analysis

This document outlines a course on machine learning. It discusses Naive Bayes analysis, generative models, and Gaussian discriminant analysis. It provides the objectives and outcomes of the course. It also presents an example of air-traffic data and uses Bayes' theorem to explain how a Bayesian classifier can predict flight delays or cancellations based on attributes of the data.

APEX INSTITUTE OF TECHNOLOGY
COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering (Computer Science)

Generative and Discriminative Analysis, Gaussian
Discriminant Analysis (GDA), Naïve Bayes Analysis
20CSF-286
Prof. (Dr.) Paras Chawla (E5653)

Unit 1 : Machine Learning

DISCOVER . LEARN . EMPOWER


Outline
• Naïve Bayes Analysis
• Generative Models
• Discriminative Models
• Gaussian Discriminant Analysis (GDA)
Course Objectives

S. No.  Objectives
1       Understand the concept of learning in computer science.
2       Compare and contrast different paradigms for learning (supervised, unsupervised, etc.).
3       Design experiments to evaluate and compare different machine learning techniques on real-world problems.
Course Outcomes

S. No.  Outcomes
CO1     Understand the various key paradigms of Machine Learning.
CO2     Be familiar with the mathematical and statistical techniques used in Machine Learning.
CO3     Implement a wide variety of learning algorithms, including well-studied methods for classification, regression and clustering.
CO4     Analyze methods to evaluate learning models generated from data.
CO5     Evaluate machine learning models for solving practical problems.
Bayesian Classifier
• A statistical classifier
  Performs probabilistic prediction, i.e., predicts class membership probabilities.
• Foundation
  Based on Bayes’ Theorem.
• Assumptions
  The classes are mutually exclusive and exhaustive.
  The attributes are independent given the class.
• Called a “Naïve” classifier because of these assumptions.
  Empirically proven to be useful.
  Scales very well.
Air-Traffic Data

Days Season Fog Rain Class


Weekday Spring None None On Time
Weekday Winter None Slight On Time
Weekday Winter None None On Time
Holiday Winter High Slight Late
Saturday Summer Normal None On Time
Weekday Autumn Normal None Very Late
Holiday Summer High Slight On Time
Sunday Summer Normal None On Time
Weekday Winter High Heavy Very Late
Weekday Summer None Slight On Time

Contd. on next slide…
Air-Traffic Data

Contd. from previous slide…


Days Season Fog Rain Class
Saturday Spring High Heavy Cancelled
Weekday Summer High Slight On Time
Weekday Winter Normal None Late
Weekday Summer High None On Time
Weekday Winter Normal Heavy Very Late
Saturday Autumn High Slight On Time
Weekday Autumn None Heavy On Time
Holiday Spring Normal Slight On Time
Weekday Spring Normal None On Time
Weekday Spring Normal Heavy On Time

Air-Traffic Data
• In this database, there are four attributes
  A = [Day, Season, Fog, Rain] with 20 tuples.
• The categories of classes are:
  C = [On Time, Late, Very Late, Cancelled]

• Given this knowledge of the data and classes, we are to find the most likely classification
  for any unseen instance, for example:
  Weekday Winter High None ???

• The classification technique should eventually map such a tuple to the most appropriate class.
Bayesian Classifier

• In many applications, the relationship between the attribute set and the class variable is
  non-deterministic.
• In other words, a test instance cannot be assigned to a class label with certainty.
• In such a situation, the classification can be achieved probabilistically.
• The Bayesian classifier is an approach for modelling probabilistic relationships between
  the attribute set and the class variable.
• More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for classification.
• Before discussing the Bayesian classifier, we should have a quick look at the
  Theory of Probability and then Bayes’ Theorem.
Bayes’ Theorem of Probability

Simple Probability

Definition: Simple Probability

If there are n elementary events associated with a random experiment and m of them
are favourable to an event A, then the probability of the happening or occurrence of A is

  P(A) = m / n
Simple Probability

• Suppose A and B are any two events, and P(A), P(B) denote the probabilities that the events A
  and B will occur, respectively.

• Mutually Exclusive Events:
  Two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
  Example: Tossing a coin (two events)
           Rolling a ludo cube (six events)

  Can you give an example where two events are not mutually exclusive?
  Hint: Tossing two identical coins; weather (sunny, foggy, warm)
Simple Probability

• Independent Events: Two events are independent if the occurrence of one does not alter the
  probability of occurrence of the other.

  Example: Tossing a coin and rolling a ludo cube together.
  (How many events are there?)

  Can you give an example where an event is dependent on one or more other event(s)?
  Hint: Receiving a message (A) through a communication channel (B)
        over a computer (C); rain and dating.
Joint Probability

Definition: Joint Probability

If P(A) and P(B) are the probabilities of two events, then

  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

If A and B are mutually exclusive, then

  P(A ∪ B) = P(A) + P(B)

If A and B are independent events, then

  P(A ∩ B) = P(A) · P(B)

Thus, for mutually exclusive events, P(A ∩ B) = 0.
Conditional Probability

Definition 8.2: Conditional Probability

If events are dependent, then their probability is expressed by conditional
probability. The probability that A occurs given that B has occurred is denoted by P(A|B).

Suppose A and B are two events associated with a random experiment. The
probability of A under the condition that B has already occurred is given by

  P(A|B) = P(A ∩ B) / P(B),  provided P(B) ≠ 0
Conditional Probability

  P(A|B) = P(A ∩ B) / P(B)   or   P(B|A) = P(A ∩ B) / P(A)

For three events A, B and C:

  P(A ∩ B ∩ C) = P(A) · P(B|A) · P(C|A ∩ B)

For n events A1, A2, …, An, if all events are mutually independent of each other:

  P(A1 ∩ A2 ∩ … ∩ An) = P(A1) · P(A2) ⋯ P(An)

Note:
  P(A|B) = 0, if A and B are mutually exclusive
  P(A|B) = P(A), if A and B are independent
  otherwise, P(A|B) = P(A ∩ B) / P(B)
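To make these definitions concrete, here is a small illustrative Python check (our own sketch; the two-dice experiment and variable names are not from the slides) that enumerates a finite sample space and verifies P(A|B) = P(A ∩ B) / P(B):

```python
# Illustrative check of conditional probability on a two-dice sample space.
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))      # 36 equally likely outcomes

def prob(event):
    """P(event) = favourable outcomes / total outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def A(w): return w[0] + w[1] == 7                 # the two dice sum to 7
def B(w): return w[0] == 3                        # the first die shows 3

p_A_given_B = prob(lambda w: A(w) and B(w)) / prob(B)   # P(A|B) = P(A ∩ B) / P(B)
print(p_A_given_B)                                # 1/6
print(p_A_given_B == prob(A))                     # True: A and B happen to be independent here
```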
Conditional Probability
Generalization of Conditional Probability:

  P(A|B) = P(A ∩ B) / P(B)  and  P(B|A) = P(A ∩ B) / P(A)

  ∵ P(A ∩ B) = P(B ∩ A), we have P(A|B) · P(B) = P(B|A) · P(A)

  Hence, P(A|B) = P(B|A) · P(A) / P(B)

By the law of total probability:

  P(B) = P(B|A) · P(A) + P(B|Ā) · P(Ā)

  so that P(A|B) = P(B|A) · P(A) / [ P(B|A) · P(A) + P(B|Ā) · P(Ā) ]
Conditional Probability

In general, for mutually exclusive and exhaustive events A1, A2, …, An:

  P(Ai|B) = P(B|Ai) · P(Ai) / [ P(B|A1) · P(A1) + P(B|A2) · P(A2) + … + P(B|An) · P(An) ]
Total Probability

Definition 8.3: Total Probability

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a random
experiment. If A is any event which occurs with E1 or E2 or … or En, then

  P(A) = P(E1) · P(A|E1) + P(E2) · P(A|E2) + … + P(En) · P(A|En)
Total Probability: An Example
Example 8.3
A bag contains 4 red and 3 black balls. A second bag contains 2 red and 4 black balls. One bag is
selected at random. From the selected bag, one ball is drawn. What is the probability that the ball
drawn is red?

This problem can be answered using the concept of Total Probability.

  E1 = Selecting bag I,  P(E1) = 1/2
  E2 = Selecting bag II, P(E2) = 1/2
  A  = Drawing the red ball

Thus, P(A) = P(E1) · P(A|E1) + P(E2) · P(A|E2) = (1/2)(4/7) + (1/2)(2/6) = 19/42,
where P(A|E1) = 4/7 is the probability of drawing a red ball when the first bag has been chosen
and P(A|E2) = 2/6 is the probability of drawing a red ball when the second bag has been chosen.
Reverse Probability

Example 8.4:
A bag (Bag I) contains 4 red and 3 black balls. A second bag (Bag II) contains 2 red and 4 black
balls. You have chosen one ball at random. It is found to be a red ball. What is the probability that
the ball was chosen from Bag I?

Here,
  E1 = Selecting bag I,  P(E1) = 1/2
  E2 = Selecting bag II, P(E2) = 1/2
  A  = Drawing the red ball

We are to determine P(E1|A). Such a problem can be solved using Bayes' theorem of probability.
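As a quick illustrative check of Examples 8.3 and 8.4 (our own sketch, not part of the slides), the following Python lines compute the total probability of drawing a red ball and then the reverse probability via Bayes' theorem:

```python
from fractions import Fraction

# Prior probabilities of choosing each bag
p_bag = {"I": Fraction(1, 2), "II": Fraction(1, 2)}
# Likelihood of drawing a red ball from each bag (Bag I: 4 red of 7, Bag II: 2 red of 6)
p_red_given_bag = {"I": Fraction(4, 7), "II": Fraction(2, 6)}

# Total probability: P(red) = sum over bags of P(bag) * P(red | bag)
p_red = sum(p_bag[b] * p_red_given_bag[b] for b in p_bag)
print(p_red)                      # 19/42

# Reverse (posterior) probability via Bayes' theorem: P(Bag I | red)
p_bag1_given_red = p_bag["I"] * p_red_given_bag["I"] / p_red
print(p_bag1_given_red)           # 12/19
```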
Bayes’ Theorem

Theorem 8.4: Bayes’ Theorem

Let E1, E2, …, En be n mutually exclusive and exhaustive events associated with a random
experiment. If A is any event which occurs with E1 or E2 or … or En, then

  P(Ei|A) = P(A|Ei) · P(Ei) / [ P(A|E1) · P(E1) + P(A|E2) · P(E2) + … + P(A|En) · P(En) ],  i = 1, 2, …, n
Prior and Posterior Probabilities

P(A) and P(B) are called prior probabilities.
P(A|B) and P(B|A) are called posterior probabilities.

Example 8.6: Prior versus Posterior Probabilities

[The slide shows a sample-space table with two columns, X and Y; the table is not reproduced here.]

The table shows that the event Y has two outcomes, namely A and B, and depends on another
event X with its own outcomes.

Case 1: Suppose we don’t have any information about the event X. Then, from the given sample
space, we can calculate P(Y = A) = 0.5.

Case 2: Now, suppose we want the probability of a particular outcome of X given that Y = A;
from the same sample space this works out to 0.4.

The latter is the conditional or posterior probability, whereas the former is the prior probability.
Naïve Bayesian Classifier

Suppose Y is a class variable and X = {X1, X2, …, Xn} is a set of attributes; each training
tuple is an instance of X together with an instance of Y.

  INPUT (X)                 CLASS (Y)
  x11  x12  …  x1n          y1
  x21  x22  …  x2n          y2
  …    …    …  …            …

The classification problem can then be expressed in terms of the posterior probability
P(Y | X1 = x1, X2 = x2, …, Xn = xn).
Naïve Bayesian Classifier

The Naïve Bayesian classifier calculates this posterior probability using Bayes’ theorem, as follows.
From Bayes’ theorem on conditional probability, we have

  P(Y|X) = P(X|Y) · P(Y) / P(X)

where, by the total probability, P(X) = Σi P(X | Y = yi) · P(Y = yi).

Note:
• P(X) is called the evidence (also the total probability) and it is a constant for a given X.
• The posterior probability P(Y|X) is therefore proportional to P(X|Y) · P(Y).
• Thus, P(Y|X) can be taken as a measure of the plausibility of Y given that X is observed:

  P(Y|X) ∝ P(X|Y) · P(Y)
Naïve Bayesian Classifier

• Suppose, for a given instance of X (say x = (x1, x2, …, xn)), we compare any two class
  posterior probabilities, namely P(Y = y1 | X = x) and P(Y = y2 | X = x).
• If P(Y = y1 | X = x) > P(Y = y2 | X = x), then we say that y1 is stronger than y2 for the
  instance X = x.
• The strongest yi is the classification for the instance X = x.
Naïve Bayesian Classifier
Example: With reference to the Air-Traffic dataset mentioned earlier, let us tabulate all the
conditional and prior probabilities as shown below.

                            Class
Attribute                   On Time        Late         Very Late     Cancelled
Day       Weekday           9/14 = 0.64    1/2 = 0.5    3/3 = 1       0/1 = 0
          Saturday          2/14 = 0.14    1/2 = 0.5    0/3 = 0       1/1 = 1
          Sunday            1/14 = 0.07    0/2 = 0      0/3 = 0       0/1 = 0
          Holiday           2/14 = 0.14    0/2 = 0      0/3 = 0       0/1 = 0
Season    Spring            4/14 = 0.29    0/2 = 0      0/3 = 0       0/1 = 0
          Summer            6/14 = 0.43    0/2 = 0      0/3 = 0       0/1 = 0
          Autumn            2/14 = 0.14    0/2 = 0      1/3 = 0.33    0/1 = 0
          Winter            2/14 = 0.14    2/2 = 1      2/3 = 0.67    0/1 = 0
Naïve Bayesian Classifier

                            Class
Attribute                   On Time        Late         Very Late     Cancelled
Fog       None              5/14 = 0.36    0/2 = 0      0/3 = 0       0/1 = 0
          High              4/14 = 0.29    1/2 = 0.5    1/3 = 0.33    1/1 = 1
          Normal            5/14 = 0.36    1/2 = 0.5    2/3 = 0.67    0/1 = 0
Rain      None              5/14 = 0.36    1/2 = 0.5    1/3 = 0.33    0/1 = 0
          Slight            8/14 = 0.57    0/2 = 0      0/3 = 0       0/1 = 0
          Heavy             1/14 = 0.07    1/2 = 0.5    2/3 = 0.67    1/1 = 1
Prior Probability           14/20 = 0.70   2/20 = 0.10  3/20 = 0.15   1/20 = 0.05
Naïve Bayesian Classifier

Instance:

Weekday Winter High Heavy ???

Case 1: Class = On Time   : 0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013
Case 2: Class = Late      : 0.10 × 0.50 × 1.0 × 0.50 × 0.50 = 0.0125
Case 3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222
Case 4: Class = Cancelled : 0.05 × 0.0 × 0.0 × 1.0 × 1.0 = 0.0000

Case 3 gives the largest value; hence the classification is Very Late (see the verification sketch below).

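The following Python sketch (our own illustration, not part of the slides) reproduces the four products above from the tabulated probabilities and picks the class with the largest value:

```python
# Likelihoods P(attribute value | class) read from the tables above, for the
# instance (Day=Weekday, Season=Winter, Fog=High, Rain=Heavy).
prior = {"On Time": 0.70, "Late": 0.10, "Very Late": 0.15, "Cancelled": 0.05}
likelihood = {
    "On Time":   [0.64, 0.14, 0.29, 0.07],
    "Late":      [0.50, 1.00, 0.50, 0.50],
    "Very Late": [1.00, 0.67, 0.33, 0.67],
    "Cancelled": [0.00, 0.00, 1.00, 1.00],
}

# Naïve Bayes score: prior times the product of the conditional probabilities
scores = {}
for cls in prior:
    s = prior[cls]
    for p in likelihood[cls]:
        s *= p
    scores[cls] = s

for cls, s in scores.items():
    print(f"{cls:10s} {s:.4f}")                            # 0.0013, 0.0125, 0.0222, 0.0000
print("Predicted class:", max(scores, key=scores.get))      # Very Late
```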
Naïve Bayesian Classifier

Algorithm: Naïve Bayesian Classification

Input: Given a set of k mutually exclusive and exhaustive classes C = {C1, C2, …, Ck}, which
have prior probabilities P(C1), P(C2), …, P(Ck).

There is an n-attribute set A = {A1, A2, …, An}, which for a given instance has values
A1 = a1, A2 = a2, …, An = an.

Step: For each Ci, calculate the class-conditional score, i = 1, 2, …, k:

  pi = P(Ci) × P(A1 = a1 | Ci) × P(A2 = a2 | Ci) × … × P(An = an | Ci)

Output: Cx is the classification, where px = max{p1, p2, …, pk}.

Note: Σ pi ≠ 1, because the pi are not probabilities but values proportional to the posterior
probabilities. A sketch of this procedure in code is given below.
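A compact, illustrative implementation follows: it estimates the priors and class-conditional probabilities by simple counting over the 20 air-traffic tuples and then scores a query instance. It is a sketch only (no smoothing for unseen attribute values), and the function names are our own, not from the slides.

```python
from collections import Counter, defaultdict

# Air-Traffic data from the slides: (Day, Season, Fog, Rain, Class)
data = [
    ("Weekday", "Spring", "None",   "None",   "On Time"),
    ("Weekday", "Winter", "None",   "Slight", "On Time"),
    ("Weekday", "Winter", "None",   "None",   "On Time"),
    ("Holiday", "Winter", "High",   "Slight", "Late"),
    ("Saturday", "Summer", "Normal", "None",  "On Time"),
    ("Weekday", "Autumn", "Normal", "None",   "Very Late"),
    ("Holiday", "Summer", "High",   "Slight", "On Time"),
    ("Sunday",  "Summer", "Normal", "None",   "On Time"),
    ("Weekday", "Winter", "High",   "Heavy",  "Very Late"),
    ("Weekday", "Summer", "None",   "Slight", "On Time"),
    ("Saturday", "Spring", "High",  "Heavy",  "Cancelled"),
    ("Weekday", "Summer", "High",   "Slight", "On Time"),
    ("Weekday", "Winter", "Normal", "None",   "Late"),
    ("Weekday", "Summer", "High",   "None",   "On Time"),
    ("Weekday", "Winter", "Normal", "Heavy",  "Very Late"),
    ("Saturday", "Autumn", "High",  "Slight", "On Time"),
    ("Weekday", "Autumn", "None",   "Heavy",  "On Time"),
    ("Holiday", "Spring", "Normal", "Slight", "On Time"),
    ("Weekday", "Spring", "Normal", "None",   "On Time"),
    ("Weekday", "Spring", "Normal", "Heavy",  "On Time"),
]

def train(rows):
    """Estimate class counts and per-attribute value counts by counting."""
    class_count = Counter(r[-1] for r in rows)
    cond_count = defaultdict(Counter)        # (attribute index, class) -> value counts
    for r in rows:
        cls = r[-1]
        for j, value in enumerate(r[:-1]):
            cond_count[(j, cls)][value] += 1
    return class_count, cond_count

def classify(x, class_count, cond_count):
    """Score each class by P(C) * prod_j P(A_j = a_j | C) and return the best one."""
    n = sum(class_count.values())
    scores = {}
    for cls, c in class_count.items():
        score = c / n                                        # prior P(C)
        for j, value in enumerate(x):
            score *= cond_count[(j, cls)][value] / c         # P(A_j = a_j | C), no smoothing
        scores[cls] = score
    return max(scores, key=scores.get), scores

class_count, cond_count = train(data)
label, scores = classify(("Weekday", "Winter", "High", "Heavy"), class_count, cond_count)
print(label)        # Very Late
```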
Discriminative & Generative Models
• Machine learning models can be classified into discriminative and generative models.
• A discriminative model models the decision boundary between the classes, whereas a generative model explicitly
  models the actual distribution of each class.
Discriminative & Generative Models
• The discriminative model is used particularly for supervised machine learning. Also called a
  conditional model, it learns the boundaries between classes or labels in a dataset.
• The ultimate goal of discriminative models is to separate one class from another.
• Types of discriminative models in machine learning include: Logistic Regression, Support Vector
  Machine, Decision Tree and Random Forest.
• Generative models are a class of statistical models that can generate new data instances. These models are
  also used in unsupervised machine learning to perform tasks such as probability and likelihood estimation,
  modelling data points, and distinguishing between classes using these probabilities.
• Generative models rely on Bayes' theorem to find the joint probability.
• Examples of generative models are Naive Bayes (and, more generally, Bayesian networks), the Hidden Markov
  model, and Linear Discriminant Analysis (LDA), a dimensionality-reduction technique.
Discriminative & Generative Models
A generative model learns the joint probability distribution p(x, y) and
predicts the conditional probability p(y|x) with the help of Bayes' theorem. A
discriminative model learns the conditional probability distribution p(y|x) directly.
Both kinds of models are generally used in supervised learning problems.

Note:

Joint Probability
Joint probability is the likelihood of more than one event occurring at the same time, P(A and B).

Conditional Probability
The conditional probability of an event B is the probability that the event will occur given the knowledge
that an event A has already occurred. It is denoted by P(B|A).
Discriminative Vs Generative Models

• Discriminative models have the advantage of being more robust to outliers, unlike
  generative models.
• However, one major drawback is the misclassification problem, i.e., wrongly classifying a
  data point.
• Another key difference between these two types of models is that while a generative
  model focuses on explaining how the data was generated, a discriminative model focuses
  on predicting the labels of the data (see the sketch below).
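As a small illustration of this distinction (our own sketch using scikit-learn, which the slides do not reference), the snippet below fits a generative model, Gaussian Naive Bayes, and a discriminative model, logistic regression, on the same toy data and queries both for p(y|x):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB          # generative: models p(x|y) and p(y)
from sklearn.linear_model import LogisticRegression # discriminative: models p(y|x) directly

rng = np.random.default_rng(0)
# Two Gaussian blobs as a toy two-class dataset
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

gen = GaussianNB().fit(X, y)                  # learns class priors and per-class Gaussians
disc = LogisticRegression().fit(X, y)         # learns only the decision boundary

x_new = np.array([[1.5, 1.5]])
print("generative     p(y|x):", gen.predict_proba(x_new))
print("discriminative p(y|x):", disc.predict_proba(x_new))
```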
Gaussian Discriminant Analysis

Gaussian Discriminant Analysis (GDA) is a generative learning algorithm: in order to capture
the distribution of each class, it fits a Gaussian distribution to every class of the data separately.
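A minimal NumPy sketch of this idea follows (our own illustration, assuming a shared covariance matrix as in the classical GDA/LDA formulation); the function and variable names are assumptions, not from the slides:

```python
import numpy as np

def gda_fit(X, y):
    """Fit class priors, per-class means and a shared (pooled) covariance matrix."""
    classes = np.unique(y)
    phi = np.array([np.mean(y == c) for c in classes])          # priors P(y = c)
    mu = np.array([X[y == c].mean(axis=0) for c in classes])    # class means
    centered = X - mu[np.searchsorted(classes, y)]              # subtract each sample's class mean
    sigma = centered.T @ centered / len(y)                      # shared covariance
    return classes, phi, mu, sigma

def gda_predict(X, classes, phi, mu, sigma):
    """Classify by the largest (log) posterior, dropping terms constant across classes."""
    inv = np.linalg.inv(sigma)
    scores = []
    for k in range(len(classes)):
        diff = X - mu[k]
        log_lik = -0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)   # log p(x|y=k) up to a constant
        scores.append(log_lik + np.log(phi[k]))
    return classes[np.argmax(np.stack(scores), axis=0)]

# Toy usage on two Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
params = gda_fit(X, y)
print(gda_predict(np.array([[0.2, 0.1], [2.8, 3.1]]), *params))   # expect [0 1]
```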
Classification Performance Measures

Confusion Matrix
Focus on the predictive capability of a model.

                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL    Class=Yes     a            b
CLASS     Class=No      c            d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
Accuracy Metric
Ratio of true positives and true negatives to the sum of true positives, true negatives,
false negatives, and false positives.

                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL    Class=Yes     a (TP)       b (FN)
CLASS     Class=No      c (FP)       d (TN)

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Limitation of Accuracy
Consider a 2-class problem:
  Number of Class 0 examples = 9990
  Number of Class 1 examples = 10

If the model predicts every example to be Class 0, accuracy is 9990/10000 = 99.9%.

Hence, accuracy is misleading because the model does not detect any Class 1 example.
Cost Matrix
A cost matrix takes weights into account.

                        PREDICTED CLASS
C(i|j)                  Class=Yes     Class=No
ACTUAL    Class=Yes     C(Yes|Yes)    C(No|Yes)
CLASS     Class=No      C(Yes|No)     C(No|No)

C(i|j): cost of classifying a class j example as class i

  Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
Computing Cost of Classification

Cost Matrix:
                        PREDICTED CLASS
C(i|j)                  +       -
ACTUAL    +             -1      100
CLASS     -             1       0

Model M1:                              Model M2:
              PREDICTED CLASS                        PREDICTED CLASS
              +       -                              +       -
ACTUAL   +    150     40               ACTUAL   +    250     45
CLASS    -    60      250              CLASS    -    5       200

Accuracy = 80%                         Accuracy = 90%
Cost = 3910                            Cost = 4255
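The following short Python check (our own sketch) recomputes the two costs from the cost matrix and the confusion matrices above:

```python
def total_cost(conf, cost):
    """Sum of count(actual i, predicted j) * C(predicted j | actual i)."""
    return sum(conf[i][j] * cost[i][j] for i in range(2) for j in range(2))

# Rows = actual (+, -), columns = predicted (+, -)
cost = [[-1, 100],
        [ 1,   0]]
m1 = [[150,  40],
      [ 60, 250]]
m2 = [[250,  45],
      [  5, 200]]

print(total_cost(m1, cost))   # 3910
print(total_cost(m2, cost))   # 4255
```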
Cost vs. Accuracy

Accuracy is proportional to cost if
  1. C(Yes|No) = C(No|Yes) = q
  2. C(Yes|Yes) = C(No|No) = p

Count table:
                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL    Class=Yes     a            b
CLASS     Class=No      c            d

  N = a + b + c + d
  Accuracy = (a + d) / N

Cost table:
                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL    Class=Yes     p            q
CLASS     Class=No      q            p

  Cost = p(a + d) + q(b + c)
       = p(a + d) + q(N − a − d)
       = qN − (q − p)(a + d)
       = N [ q − (q − p) × Accuracy ]
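To confirm the algebra, the short check below (our own sketch, with arbitrary counts and symmetric costs p and q) compares the direct cost p(a + d) + q(b + c) with N[q − (q − p) × Accuracy]:

```python
from fractions import Fraction

a, b, c, d = 150, 40, 60, 250                   # arbitrary confusion-matrix counts
p, q = -1, 100                                   # C(Yes|Yes) = C(No|No) = p, off-diagonal cost = q
N = a + b + c + d
accuracy = Fraction(a + d, N)                    # exact arithmetic avoids rounding noise

cost_direct = p * (a + d) + q * (b + c)                  # p(a + d) + q(b + c)
cost_from_accuracy = N * (q - (q - p) * accuracy)        # N [ q - (q - p) * Accuracy ]
print(cost_direct, cost_from_accuracy, cost_direct == cost_from_accuracy)   # 9600 9600 True
```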
THANK YOU
For queries, write to: [email protected]
