NTU EE6483: Week12_PCA_BayesianInference_before_lecture

Artificial Intelligence & Data Mining

Week 12

WEN Bihan (Asst Prof)


Homepage: https://personal.ntu.edu.sg/bihan.wen/

1
Recap: Bias and Variance

• Model Complexity Analysis:

Underfitting: high bias and low variance.
Overfitting: low bias and high variance.

2
Recap: Overfitting and Underfitting

• Simple Model

• High Bias

• Causes the algorithm to miss relevant relations between the input features and the target outputs.

• Complex Model

• High Variance

• Causes the algorithm to model the noise in the training set.

3
Recap: Overfitting and Underfitting

1. High complexity -> large gap between training and validation error -> overfitting

2. Low complexity -> small gap -> underfitting

4
Overfitting and Underfitting - Examples

• Suppose that you are training a NN for classification. When you reduce the model complexity by removing a layer, your validation accuracy increases.

• How does the model bias and variance change?

• Does your model suffer from overfitting or underfitting?

• Now, if you make your NN even deeper (higher complexity)

• Will your training accuracy increase?

• Will your validation accuracy increase?

5
Regularization to prevent overfitting

• Solutions, in the context of learning neural networks:

1. Limit the model complexity by reducing the model expressiveness.

2. Increase the training data complexity / size, to reduce the variance.

3. Simplify data distribution and dimensionality.

• Dimensionality Reduction

[Figure: dimensionality reduction examples with k = 200, k = 50, and k = 2]
6
Dimensionality Reduction

WEN Bihan (Asst Prof)


Homepage: https://personal.ntu.edu.sg/bihan.wen/

7
Outline

• Concept of Dimensionality Reduction

• Principal Component Analysis (PCA)

• How to derive PCA (not examinable)

• Examples

8
Carry-on Questions

• Why do we need dimensionality reduction?

• What are two objectives achieved by PCA?

• How to calculate PCA (not examinable)?

9
Unsupervised Learning

• Clustering

• Exploit similar structures amongst the data themselves.


• One way to summarize various data points with a single categorical variable,
i.e., the cluster centroid.

10
Unsupervised Learning

• Clustering

• Exploit similar structures amongst the data themselves.


• One way to summarize various data points with a single categorical variable,
i.e., the cluster centroid.

• Dimensionality Reduction

• Another way to simplify the complex and high-dimensional data.


• Summarize data with a lower dimensional real valued vector.

11
Dimensionality Reduction

• Goal: find the best-fitting low-dimensional subspace to represent $\{x_i\}_{i=1}^{N}$.

• Fit 2-d data with a 1-d line.

• Fit 3-d data with a 2-d plane.

12
Dimensionality Reduction

• Goal: find the best-fitting low-dimensional subspace to represent $\{x_i\}_{i=1}^{N}$.

• Given data points in d dimensions

• Convert them to data points in r<d dimensions

• With minimal loss of information

13
Example: Reduce Data from 2D to 1D
[Figure: 2D data with axis labels (inches) and (cm)]

14
Example: Reduce Data from 2D to 1D

Reduce data from 2D to 1D. [Figure: the 2D points are projected onto a 1D line]

15
Example: Reduce Data from 2D to 1D

Reduce data from 2D to 1D. [Figure: the 2D points are projected onto a 1D line]

16
Example: Reduce Data from 3D to 2D

3D data points (𝑥1, 𝑥2, 𝑥3): approximated by a 2D plane.

2D data points (𝑧1, 𝑧2): projected onto the plane.


17
Why Dimensionality Reduction?

• What is wrong with high dimensions?

• High Dimensions = Lots of Features

• Complex system to process

• Inefficient algorithm

• Overfitting to noise or other data corruptions.

18
Dimensionality Reduction

• Why do we need dimensionality reduction?

• Remove Feature Redundancy + Noise

• Prevent Overfitting

• Reduced Model Complexity

• Simplified Data distribution

• Better Visualization

19
Principal Component Analysis

• Principal Component Analysis (PCA)

• Simple and popular method.

• Unsupervised learning.

• To learn the “best” low-dimensional subspace for data projection.

• What does the “best” mean here?

20
Principal Component Analysis

• Two ways to interpret the “Best”:


1. The mean square error (MSE) of the projected data is minimized.

2. The variance of the projected data is maximized.

• These two objectives are achieved simultaneously when PCA is applied for dimensionality reduction.

• Why and How?

21
Principal Component Analysis

1. The mean square error (MSE) of the projected data is minimized.

• 𝒙𝒊 ( black ): the original data.

• 𝒗 ( red ) : PCA subspace.

• 𝒗𝑻 𝒙𝒊 𝒗 ( blue ) : projected data.

• Minimize the MSE (green), i.e., the perpendicular offset, for all {𝒙𝒊}.
22
Principal Component Analysis

2. The variance of the projected data is maximized.

• 𝒗𝑻 𝒙𝒊 𝒗 (blue): the projected data.

• Variance of the projected data = average squared magnitude of the blue projections.

• This blue variance is maximized.

23
Principal Component Analysis

• Minimizing MSE <=> Maximizing Projected Variance

• blue² + green² = black² (Pythagorean theorem)

• black is fixed (given the data)

• Therefore, maximizing blue (variance) is equivalent to minimizing green (MSE).

24
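A short justification of the identity above (a sketch in the slides' notation, for a unit vector 𝒗, using that the residual is orthogonal to the projection):

$$\lVert x_i \rVert^2 = \big\lVert (v^T x_i)v + \big(x_i - (v^T x_i)v\big) \big\rVert^2 = \underbrace{(v^T x_i)^2}_{\text{blue}^2} + \underbrace{\lVert x_i - (v^T x_i)v \rVert^2}_{\text{green}^2},$$

since the cross term $(v^T x_i)\, v^T\big(x_i - (v^T x_i)v\big) = 0$ when $\lVert v \rVert = 1$. Averaging over the (zero-mean) data, the left-hand side is fixed, so maximizing the average blue² (the projected variance) is the same as minimizing the average green² (the MSE).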
Example: PCA

• Which of the following 𝒗 generates higher projected variance?

25
Example: PCA

• Which of the following 𝒗 generates higher projected variance?

• Find the projected mean 𝒖.

• Calculate the distance 𝒅𝒊 between each projected point and the mean.

• The variance of the projected data:

$$\mathrm{var} = \frac{1}{N}\sum_{i=1}^{N} d_i^2$$

26
Example: PCA

• The subspace 𝒗 on the left generates higher projected variance.

[Figure: left subspace, larger projected variance; right subspace, smaller projected variance]
27
Example: PCA

• Which of the following 𝒗 generates smaller projected MSE?

28
Example: PCA

• The subspace 𝒗 on the left generates smaller projected MSE, i.e., a smaller perpendicular offset.

[Figure: comparison of two subspaces; one yields larger projected MSE, the other smaller projected MSE]

29
Derivation of PCA - Optional

1. The mean square error (MSE) of the projected data is minimized.

• For all $\{x_i\}_{i=1}^{N}$, the green² is minimized on average.

• Find the best red 𝒗 (unit vector) to minimize the green².

• The principal component: $z_i = (v^T x_i)\, v$

30
Derivation of PCA - Optional

1. The mean square error (MSE) of the projected data is minimized.

• The principal component: $z_i = (v^T x_i)\, v$

• The MSE to be minimized:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i - x_i \rVert^2$$

31
Derivation of PCA - Optional

2. The variance of the projected data is maximized.

• Variance of the projected data:

$$\mathrm{Var} = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i - \bar{z} \rVert^2$$

• $\bar{z}$ denotes the mean of $\{z_i\}$.

32
Derivation of PCA - Optional

2. The variance of the projected data is maximized.

• For zero-mean data $\{x_i\}_{i=1}^{N}$, we have $\bar{z} = 0$, then

$$\mathrm{Var} = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i \rVert^2 = \frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2$$

33
Derivation of PCA - Optional

• Mathematical Problem Formulation:

• PCA is to maximize the projected data variance.

• Consider a set of zero-mean $\{x_i\}_{i=1}^{N}$:

$$\mathrm{Var} = \frac{1}{N}\sum_{i=1}^{N} \lVert z_i \rVert^2 = \frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2$$

• Note that this variance is different from the variance (and bias) used in the overfitting analysis.

34
Derivation of PCA - Optional

• Mathematical Problem Formulation:

• PCA is to maximize the projected data variance.

• Consider a set of zero-mean $\{x_i\}_{i=1}^{N}$; the first unit weight vector 𝒗 satisfies:

$$\max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2$$

35
Derivation of PCA - Optional

• Recap: represent a set of data vectors $\{x_i\}_{i=1}^{N}$ as a data matrix $X \in \mathbb{R}^{N \times d}$, whose i-th row is $x_i^T$:

$$X = \begin{bmatrix} x_1^T \\ \vdots \\ x_i^T \\ \vdots \\ x_N^T \end{bmatrix}$$

where $d$ is the feature dimension and $N$ is the number of data points.
36
Derivation of PCA - Optional

• Mathematical Problem Formulation:

• PCA is to maximize the projected data variance.

• Consider a set of zero-mean $\{x_i\}_{i=1}^{N}$; the first unit weight vector 𝒗 satisfies:

$$\max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2 = \max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\, v^T X^T X\, v$$

37
Derivation of PCA - Optional

• Mathematical Problem Formulation:

• PCA is to maximize the projected data variance.

• Consider a set of zero-mean $\{x_i\}_{i=1}^{N}$; the first unit weight vector 𝒗 satisfies:

$$\max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2 = \max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\, v^T X^T X\, v$$

• Here $\frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2 = \frac{1}{N}\, v^T X^T X\, v$ is the variance of the projected data.

• The optimal 𝒗* is the first unit weight vector for PCA.

• The first Principal Component: $z_i = (v^{*T} x_i)\, v^*$

38
Derivation of PCA - Optional

• Mathematical Problem Formulation :

• The optimal 𝒗* is the first unit weight vector for PCA:

$$\max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\sum_{i=1}^{N} (v^T x_i)^2 = \max_{\lVert v \rVert_2 = 1} \; \frac{1}{N}\, v^T X^T X\, v$$

• You can calculate the other 𝒗 and 𝒛𝒊 in an incremental fashion:

• Substitute 𝒙𝒊 with the residual (𝒙𝒊 − 𝒛𝒊).

• Calculate the next unit weight vector and principal component.

• How do we obtain the optimal 𝒗 here?

39
Derivation of PCA - Optional

• Algorithm:

• We need to use Singular Value Decomposition (SVD):

40
Derivation of PCA - Optional

• Algorithm:

1. Apply SVD to the data matrix 𝑿: $\mathrm{svd}(X) = U \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T$.

2. $V = [v_1, v_2, \dots, v_k, \dots, v_d]$, where $v_k$ is the k-th unit weight vector for PCA.

3. The k-th Principal Component of $x_i$: $(v_k^T x_i)\, v_k$.

4. Project onto the k-dimensional subspace $V_k = [v_1, v_2, \dots, v_k]$:

$$V_k^T x_i = \begin{bmatrix} v_1^T x_i \\ \vdots \\ v_k^T x_i \end{bmatrix}$$

41
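As a companion to the algorithm above, here is a minimal NumPy sketch of PCA via SVD (the function names and the explicit mean-centering step are my own additions; the slides assume zero-mean data):

```python
import numpy as np

def pca_svd(X, k):
    """PCA via SVD, following the algorithm on the slide.

    X : (N, d) data matrix, one sample per row.
    k : target dimension (k <= d).
    Returns (Z, Vk, mean): k-D codes, the k unit weight vectors, and the data mean.
    """
    mean = X.mean(axis=0)             # the derivation assumes zero-mean data
    Xc = X - mean
    # SVD: Xc = U @ diag(S) @ Vt; the rows of Vt are the unit weight vectors v_k
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                     # (d, k): columns are v_1, ..., v_k
    Z = Xc @ Vk                       # (N, k): projections V_k^T x_i, stacked as rows
    return Z, Vk, mean

def reconstruct(Z, Vk, mean):
    """Map k-D codes back to the original d-D space."""
    return Z @ Vk.T + mean

# Usage example on random data (illustrative only)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    Z, Vk, mu = pca_svd(X, k=2)
    X_hat = reconstruct(Z, Vk, mu)
    print(Z.shape, X_hat.shape)       # (100, 2) (100, 10)
```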
Derivation of PCA - Example

• Given data points 𝒙1 = (4, 6, 10), 𝒙2 = (3, 10, 13), 𝒙3 = (−2, −6, −8).

• Calculate the first unit weight vector 𝒗, and the first Principal Component.

1. Data Matrix: $X = \begin{bmatrix} 4 & 6 & 10 \\ 3 & 10 & 13 \\ -2 & -6 & -8 \end{bmatrix}$

2. SVD: $X = U \Sigma V^T$

3. $V = \begin{bmatrix} -0.22 & 0.78 & -0.58 \\ -0.57 & -0.59 & -0.58 \\ -0.79 & 0.20 & 0.58 \end{bmatrix}$, so $v_1 = \begin{bmatrix} -0.22 \\ -0.57 \\ -0.79 \end{bmatrix}$ and $z_{i,k} = (v_k^T x_i)\, v_k$.
42
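A quick NumPy check of this worked example (note that SVD determines each singular vector only up to sign, so the printed v1 may differ from the slide's by a factor of −1; like the slide, this applies SVD to X without mean-centering):

```python
import numpy as np

X = np.array([[ 4.,  6., 10.],
              [ 3., 10., 13.],
              [-2., -6., -8.]])

U, S, Vt = np.linalg.svd(X)
v1 = Vt[0]                       # first unit weight vector (up to sign)
print(np.round(v1, 2))           # approximately +/- [0.22, 0.57, 0.79]

# First principal component of each x_i: (v1^T x_i) v1
Z = np.outer(X @ v1, v1)
print(np.round(Z, 2))
```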
PCA Example 1

• Representation of handwritten digits in the MNIST dataset:

[Figure: sample variance with rotations, scales, etc.]

43
PCA Example 1

• Representation of handwritten digits in the MNIST dataset:

• PCA provides more robust and invariant features.

[Figure: reconstructions of x with k = 2, 10, 50, and 200 principal components]

44
PCA Example 2
• Image Compression

Original Image

• Divide the original 372x492 image into patches:

• Each patch is an instance containing 12x12 pixels on a grid.

• View each patch as a 144-D vector.
45
PCA Example 2
• Image Compression
PCA compression: 144D → 60D

46
PCA Example 2
• Image Compression
PCA compression: 144D → 16D

47
PCA Example 2
• Image Compression

16 most important eigenvectors

[Figure: each of the 16 eigenvectors displayed as a 12x12 patch]

48
PCA Example 2
• Image Compression
PCA compression: 144D → 6D

49
PCA Example 2
• Image Compression
PCA compression: 144D → 1D

50
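A sketch of the patch-based compression pipeline described in this example, run on a synthetic random image in place of the original photo (the 12x12 patch size and the target dimensions follow the slides; the helper functions and the random input are assumptions for illustration):

```python
import numpy as np

def image_to_patches(img, p=12):
    """Split an image into non-overlapping p x p patches, each flattened to a p*p vector."""
    H, W = img.shape
    H, W = H - H % p, W - W % p            # crop so the patch grid divides evenly
    patches = (img[:H, :W]
               .reshape(H // p, p, W // p, p)
               .swapaxes(1, 2)
               .reshape(-1, p * p))
    return patches, (H, W)

def patches_to_image(patches, shape, p=12):
    """Reassemble flattened patches back into an image of the given shape."""
    H, W = shape
    return (patches.reshape(H // p, W // p, p, p)
                   .swapaxes(1, 2)
                   .reshape(H, W))

def pca_compress(patches, k):
    """Project the 144-D patches onto the top-k principal directions and reconstruct."""
    mean = patches.mean(axis=0)
    Xc = patches - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                          # (144, k) unit weight vectors
    return (Xc @ Vk) @ Vk.T + mean         # reconstruction from the k-D codes

# Illustrative run on a random "image" of the same size as in the slides
img = np.random.default_rng(0).random((372, 492))
patches, shape = image_to_patches(img)
for k in (60, 16, 6, 1):
    rec = patches_to_image(pca_compress(patches, k), shape)
    mse = np.mean((rec - img[:shape[0], :shape[1]]) ** 2)
    print(f"144D -> {k}D, reconstruction MSE = {mse:.4f}")
```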
Carry-on Questions

• Why do we need dimensionality reduction?

• To remove redundancy, prevent overfitting, remove noise, etc.

• What are two objectives achieved by PCA?

• Minimizing MSE and Maximizing Projected Variance

• How to calculate PCA?

• Apply SVD to the data matrix. Select the unit weight vectors 𝒗𝒌 from the V matrix, and calculate $(v_k^T x_i)\, v_k$.
51
Bayesian Inference

WEN Bihan (Asst Prof)


Homepage: https://personal.ntu.edu.sg/bihan.wen/

52
Outline

• Probability and Conditional Probability

• Bayes’ Theorem

• Naïve Bayes

• Examples

53
Carry-on Questions

• What is Bayes’ Theorem?

• What is Naïve Bayes assumption?

54
From deterministic to probabilistic learning

• Training a deep neural network for classification:

• Deterministic model: always gives you the same output for the same input.

55
From deterministic to probabilistic learning

• Training a deep neural network for classification:

• Deterministic inference: gives you the same output for the same input.

• Bayesian Inference

• Tossing a coin twice, what outcomes will we get?

• There are 4 possible outcomes of this experiment: S = {HH, HT, TH, TT}

• Probabilistic model: Prob(HH) = 0.25

56
Basic Concepts in Probability Theory

• Concepts and Notations:

• h = A hypothesis or an event. (Example: tossing a coin.)

57
Basic Concepts in Probability Theory

• Concepts and Notations:

• h = A hypothesis or an event.

• D = A collection of data (e.g., training data).

We tossed the coin 8 times and got 4 heads followed by 4 tails: D = {H,H,H,H,T,T,T,T}
58
Basic Concepts in Probability Theory

• Concepts and Notations:

• h = A hypothesis or an event.

• D = A collection of data (e.g., training data).

• P(h) = Probability that Hypothesis h holds.

• P(D) = Probability of observing the training data D.

P(h) = probability of getting heads

59
Basic Concepts in Probability Theory

• Concepts and Notations:

• h = A hypothesis or an event.

• D = A collection of data (e.g., training data).

• P(h) = Probability that Hypothesis h holds.

• P(D) = Probability of observing the training data D.

• P(D|h) = Probability of observing D when h holds (read “probability of D given h”, or “probability of D conditioned on h”).

60
Basic Concepts in Probability Theory

• We are interested in P(h|D), because

• We always learn from history / knowledge.

• We normally have access to the training data D.

• We need to know P(h|D), which aims to find the most probable hypothesis given the data.

61
Basic Concepts in Probability Theory

• Concepts and Notations:

• h = A hypothesis or an event.

• D = A collection of data (e.g., training data).

• P(h) = Probability that Hypothesis h holds.

• P(D) = Probability of observing the training data D.

• P(D|h) = Probability of observing D when h holds (read “probability of D given h”).

• Can we calculate P(h|D)?

62


Joint and Conditional Probability

• Conditional Probability 𝑃 𝐴 | 𝐵 :

The probability that A happens given that B happened.

• Joint probability 𝑃 𝐴, 𝐵 :

The probability that A and B both happen.

• P(A, B) = P(A | B) · P(B)

63
Bayes’ Theorem

• The simple form of Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

• How to derive the theorem?

• Use the joint probability: P(A, B) = P(A | B) P(B) = P(B | A) P(A).

• Therefore:

$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(B \mid A)\, P(A)}{P(B)}$$

64
Bayes’ Theorem

• Given that P(B) is a constant, the proportional form: $P(A \mid B) \propto P(B \mid A)\, P(A)$

• Sometimes P(B) can also be calculated as $P(B) = \sum_j P(B \mid A_j)\, P(A_j)$, over mutually exclusive and exhaustive events $A_j$.

• This is based on the law of total probability.

65
Bayes’ Theorem

• Now, we can predict P(h|D):

$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$

• Some terminology we are going to use:

• Prior probability P(h): prior knowledge of h before observing D.

• Posterior P(h|D): the probability of h after we have observed D.

• Likelihood P(D|h): likelihood of observing D given h.

66
Bayes’ Theorem

• Maximum A Posteriori (MAP) hypothesis:

$$h_{\mathrm{MAP}} = \arg\max_{h} P(h \mid D)$$

• We just learned that $P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$. Given D, we have

$$h_{\mathrm{MAP}} = \arg\max_{h} P(D \mid h)\, P(h)$$

• If P(h) is constant, MAP is equivalent to Maximum Likelihood (ML):

$$h_{\mathrm{MAP}} = \arg\max_{h} P(D \mid h)$$

67
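A small illustrative sketch of MAP/ML estimation for the coin example, assuming a Bernoulli model and a grid of candidate hypotheses h = P(heads) (this is an added example, not from the slides; with a flat prior the MAP and ML estimates coincide):

```python
import numpy as np

D = list("HHHHTTTT")                      # the coin-toss data from the slides
n_heads = D.count("H")
n_tails = D.count("T")

candidates = np.linspace(0.0, 1.0, 101)   # candidate hypotheses h = P(heads)
likelihood = candidates**n_heads * (1 - candidates)**n_tails   # P(D | h)
prior = np.ones_like(candidates) / len(candidates)             # flat prior -> MAP == ML

posterior_unnorm = likelihood * prior     # the argmax is unchanged by the constant P(D)
h_map = candidates[np.argmax(posterior_unnorm)]
print(f"h_MAP = {h_map:.2f}")             # 0.50 for 4 heads and 4 tails
```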
Example Questions

Quiz 1: Flipping a coin

• Suppose we are flipping a fair coin twice. What is the probability that both flips are heads?

68
Example Questions

• Quiz 2: Flipping a coin

• Suppose we are flipping a fair coin twice. Given that the first flip is heads, what is the probability that both flips are heads?

69
Example Questions

• Quiz 3: Our IE4483 class is attended by students from both EEE and IEM. Only 50% of the IEM students and 30% of the EEE students pass the exam. Given that 60% of the entire class are EEE students, what is the percentage of IEM students amongst those who pass the exam?

70
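One possible way to set up Quiz 3 with Bayes' theorem and the law of total probability (a sketch of the computation, not part of the original slides):

```python
# Quiz 3 setup: P(pass | IEM) = 0.5, P(pass | EEE) = 0.3, P(EEE) = 0.6, so P(IEM) = 0.4.
p_pass_iem, p_pass_eee = 0.5, 0.3
p_eee = 0.6
p_iem = 1 - p_eee

# Law of total probability for the denominator P(pass)
p_pass = p_pass_iem * p_iem + p_pass_eee * p_eee

# Bayes' theorem: P(IEM | pass)
p_iem_given_pass = p_pass_iem * p_iem / p_pass
print(f"P(IEM | pass) = {p_iem_given_pass:.3f}")   # about 0.526
```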
Example Questions

• Quiz 4: Monty Hall Problem

• You’re a contestant on a game show. You see three closed doors, and
behind one of them is a prize. You choose one door, and the host opens
one of the other doors and reveals that there is no prize behind it. Then
he offers you a chance to switch to the remaining door. Should you take
it?

http://en.wikipedia.org/wiki/Monty_Hall_problem

71
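A simulation sketch of the Monty Hall problem (assuming the standard setup in which the host always opens a non-chosen, non-prize door; the empirical win rates should approach 1/3 when staying and 2/3 when switching):

```python
import random

def play(switch, n_trials=100_000):
    wins = 0
    for _ in range(n_trials):
        prize = random.randrange(3)
        choice = random.randrange(3)
        # The host opens a door that is neither the contestant's pick nor the prize
        opened = next(d for d in range(3) if d != choice and d != prize)
        if switch:
            # Switch to the one remaining closed door
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / n_trials

print(f"stay:   {play(switch=False):.3f}")   # about 1/3
print(f"switch: {play(switch=True):.3f}")    # about 2/3
```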
Naïve Bayes

• The basic form of Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

• It is straightforward if A and B are both single attributes.

• But what if we have multiple conditions or query events?

• How to represent their conditional probabilities?

72
Naïve Bayes

• Extend from simple example to large training datasets.

• Single attribute -> multiple attributes

• How does $P(a_1, a_2 \mid v_j)$ relate to $P(a_1 \mid v_j)$ and $P(a_2 \mid v_j)$?

• Naïve Bayes Assumption:

$$P(a_1, a_2 \mid v_j) = P(a_1 \mid v_j)\, P(a_2 \mid v_j)$$

73
Naïve Bayes

• Naïve Bayes assumption:

• The conditional independence assumption

• The feature values are conditionally independent of one another, given the class.

• Mathematical form:

$$P(a_1, \dots, a_d \mid v_j) = P(a_1 \mid v_j) \cdots P(a_d \mid v_j) = \prod_{i=1}^{d} P(a_i \mid v_j)$$

74
Naïve Bayes - Example

• Play Tennis:

• New Observation: <Sunny,Cool,High,Strong>, do you play tennis?


75
Naïve Bayes - Example

• Play Tennis:

• Compare P(yes | <S,C,H,S>) and P(no | <S,C,H,S>), where S = Sunny (Outlook), C = Cool (Temp.), H = High (Humidity), S = Strong (Wind).

• According to Bayes’ Theorem, this amounts to comparing P(yes, <S,C,H,S>) and P(no, <S,C,H,S>).

• P(yes, <S,C,H,S>) = P(yes) P(<S,C,H,S> | yes)

76
Bayes’ Theorem

• Conditional probability to joint probability:

P(A, B) = P(A | B) · P(B)

P(yes, <S,C,H,S>) = P(yes | <S,C,H,S>) · P(<S,C,H,S>)

P(no, <S,C,H,S>) = P(no | <S,C,H,S>) · P(<S,C,H,S>)

• P(<S,C,H,S>) is a constant, i.e., the same for both choices.

• Let us denote P(<S,C,H,S>) by P(Obser.).

77
Bayes’ Theorem

• Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

P(yes | Obser.) = P(Obser. | yes) P(yes) / P(Obser.)

• Again, P(Obser.) is a constant, i.e., the same for both choices.

• Just compare P(Obser. | yes) P(yes) and P(Obser. | no) P(no).

78
Naïve Bayes - Example

• Play Tennis:

• Compare P(yes | <S,C,H,S>) and P(no | <S,C,H,S>), where S = Sunny (Outlook), C = Cool (Temp.), H = High (Humidity), S = Strong (Wind).

• According to Bayes’ Theorem, this amounts to comparing P(yes, <S,C,H,S>) and P(no, <S,C,H,S>).

• P(yes, <S,C,H,S>) = P(yes) P(<S,C,H,S> | yes)

• Based on the Naïve Bayes assumption, we have

P(<S,C,H,S> | yes) = P(Sunny | yes) P(Cool | yes) P(High | yes) P(Strong | yes) = 2/9 × 3/9 × 3/9 × 3/9 ≈ 0.00823

• P(yes) = #yes / 14 = 9/14 (by simply counting)

• Thus, P(yes, <S,C,H,S>) = 9/14 × 0.00823 ≈ 0.0053
79
Naïve Bayes - Example

• Play Tennis:

• P(yes, <S,C,H,S>) ≈ 0.0053

• Similarly, P(no, <S,C,H,S>) = P(no) P(<S,C,H,S> | no) = P(no) P(Sunny | no) P(Cool | no) P(High | no) P(Strong | no) ≈ 0.36 × 0.6 × 0.2 × 0.8 × 0.6 ≈ 0.0207

• Do you play tennis?

• P(yes, <S,C,H,S>) < P(no, <S,C,H,S>)

• Given the observation <Sunny, Cool, High, Strong>, it is more likely that you will NOT play tennis.

80
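A compact sketch of the comparison above, plugging in the class prior and conditional probabilities read off the slides (only the final products are computed here; small differences from the slide's rounded values are expected):

```python
import math

# Values from the slides for the query <Sunny, Cool, High, Strong>
p_yes = 9 / 14
cond_yes = [2 / 9, 3 / 9, 3 / 9, 3 / 9]   # P(Sunny|yes), P(Cool|yes), P(High|yes), P(Strong|yes)

p_no = 5 / 14                              # = 1 - 9/14, about 0.36
cond_no = [0.6, 0.2, 0.8, 0.6]             # P(Sunny|no), P(Cool|no), P(High|no), P(Strong|no)

# Naive Bayes: P(class, observation) = P(class) * prod_i P(a_i | class)
score_yes = p_yes * math.prod(cond_yes)    # about 0.0053
score_no = p_no * math.prod(cond_no)       # about 0.0206 (0.0207 on the slides, which round P(no) to 0.36)

print(f"P(yes, <S,C,H,S>) ~ {score_yes:.4f}")
print(f"P(no,  <S,C,H,S>) ~ {score_no:.4f}")
print("Prediction:", "play" if score_yes > score_no else "do not play")
```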
Carry-on Questions

• What is Bayes’ Theorem?

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

• What is the Naïve Bayes assumption?

$$P(a_1, \dots, a_d \mid v_j) = \prod_{i=1}^{d} P(a_i \mid v_j)$$

81
What we have learned

• Probability and Conditional Probability

• Hypothesis, joint / conditional probability, etc.

• Bayes’ Theorem

• Various forms of Bayes’ Theorem, prior, likelihood, Posterior, etc.

• Naïve Bayes

• Multi-attributes, Naïve Bayes assumption, examples.

82
