CS7641 SL09. Bayesian Learning
Mohamed Ameen Amer

SL09. Bayesian Learning

Introduction:
• We're trying to learn the best (most probable) hypothesis h ∈ H given input data and domain
  knowledge.
  Best == Most probable
• It's the probability of some hypothesis h given input data D: Pr(h | D)
• We're trying to find the hypothesis h with the highest probability:
  h = argmax_{h ∈ H} Pr(h | D)

Bayes Rule:
• Bayes Rule for probability states that:
  Pr(h | D) = Pr(D | h) Pr(h) / Pr(D)
  - Pr(h | D) → The probability of a specific hypothesis given the input data (the posterior
    probability).
  - Pr(D | h) → The probability of the data given the hypothesis (the likelihood): the probability of
    seeing some particular labels associated with the input points, given a world where hypothesis h
    is true.
  - Pr(h) → The prior probability of a particular hypothesis. This value encapsulates our prior belief
    that one hypothesis is likely or unlikely compared to the other hypotheses. This is basically the
    domain knowledge.
  - Pr(D) → The probability of the data under all hypotheses (a normalizing term).
  - Bayes Rule is derived from the chain rule, writing the joint probability both ways:
    Pr(a, b) = Pr(a | b) Pr(b)
    Pr(a, b) = Pr(b | a) Pr(a)
    then → Pr(a | b) Pr(b) = Pr(b | a) Pr(a)
    Pr(a | b) = Pr(b | a) Pr(a) / Pr(b)

Bayesian Learning:
• Bayesian Learning algorithm:
  For each h ∈ H:
      Calculate Pr(h | D) = Pr(D | h) Pr(h) / Pr(D)
  Output:
      h = argmax_{h ∈ H} Pr(h | D)
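The brute-force loop above can be sketched as follows. The coin-bias hypothesis space, the uniform prior, and the Bernoulli likelihood are assumed toy choices for illustration, not from the notes.

```python
def posterior(hypotheses, prior, likelihood, data):
    """Return Pr(h | D) for every h, via Bayes rule."""
    unnormalized = {h: likelihood(data, h) * prior[h] for h in hypotheses}
    pr_d = sum(unnormalized.values())          # Pr(D) = sum_h Pr(D | h) Pr(h)
    return {h: p / pr_d for h, p in unnormalized.items()}

# Toy example: hypotheses are candidate coin biases, data is coin flips.
hypotheses = [0.2, 0.5, 0.8]                   # assumed Pr(heads) values
prior = {h: 1 / len(hypotheses) for h in hypotheses}   # uniform prior

def likelihood(data, h):
    # Pr(D | h): independent Bernoulli flips (1 = heads)
    p = 1.0
    for d in data:
        p *= h if d == 1 else (1 - h)
    return p

post = posterior(hypotheses, prior, likelihood, [1, 1, 1, 0])
h_best = max(post, key=post.get)               # argmax_h Pr(h | D)
```

Note that the loop touches every h ∈ H, which is exactly why this direct computation doesn't scale to large hypothesis spaces.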


- Using this posterior probability, we can calculate the Maximum a Posteriori (MAP) hypothesis, which
  is the most probable hypothesis given the data across all hypotheses:
  h_MAP = argmax_{h ∈ H} Pr(h | D)
  h_MAP = argmax_{h ∈ H} Pr(D | h) Pr(h) / Pr(D)
- Since we're interested in finding the hypothesis with the highest probability, not the exact
  probability value for each hypothesis, our prior on the data isn't relevant. That is, we can drop
  the Pr(D) term in the denominator, as it affects all hypotheses equally:
  h_MAP = argmax_{h ∈ H} Pr(D | h) Pr(h)
- If we also assume that our prior belief is uniform over all the hypotheses h ∈ H (we believe
  equally in every h ∈ H), then we can drop Pr(h) from the equation, ending up with the Maximum
  Likelihood (ML) hypothesis:
  h_ML = argmax_{h ∈ H} Pr(D | h)
• The problem with Bayesian learning is that it isn't practical to perform these direct computations
  for large hypothesis spaces, because you have to look at every single hypothesis.

Bayesian Learning in Action:
• Assume:
  - We're given noise-free training data {⟨x_i, d_i⟩} as examples of the target concept c.
  - c ∈ H
  - A uniform prior over the hypotheses.
• We need to calculate Pr(h | D) = Pr(D | h) Pr(h) / Pr(D):
  Pr(h) = 1/|H|, because we have a uniform prior.
  Pr(D | h) = 1 if d_i = h(x_i) ∀⟨x_i, d_i⟩ ∈ D, and 0 otherwise.
  - This basically means that Pr(D | h) = 1 if h ∈ VS(D), the version space: the set of hypotheses
    consistent with the data.
  Pr(D) = Σ_{h_i ∈ H} Pr(D | h_i) Pr(h_i) = Σ_{h_i ∈ VS_{H,D}} 1 · (1/|H|) = |VS| / |H|
  Pr(h | D) = (1 · 1/|H|) / (|VS| / |H|) = 1/|VS|
- This means that, given data D, the probability that h is a correct hypothesis is uniform over all
  the hypotheses in the version space.
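The noise-free case above can be checked numerically. This sketch uses an assumed toy hypothesis space (all boolean functions over two binary inputs, represented by their truth tables); with a uniform prior and consistent data, the posterior comes out as 1/|VS| on the version space and 0 elsewhere.

```python
from itertools import product

# Hypothesis space: all 16 boolean functions over 2 binary inputs,
# each represented by its truth table (a tuple of 4 output bits).
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
H = list(product([0, 1], repeat=4))

def h_of(h, x):
    return h[inputs.index(x)]

# Noise-free data: two labeled examples of the (unknown) target concept.
D = [((0, 0), 0), ((1, 1), 1)]

# Version space: hypotheses consistent with every example.
version_space = [h for h in H if all(h_of(h, x) == d for x, d in D)]

# Pr(h | D) = 1/|VS| for consistent h, 0 otherwise.
post = {h: (1 / len(version_space) if h in version_space else 0.0) for h in H}
```

With two of the four truth-table bits pinned down by the data, two bits remain free, so |VS| = 4 and each consistent hypothesis gets posterior 1/4.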


Bayesian Learning with Noise:
• Assume:
  - We're given training data {⟨x_i, d_i⟩}.
  - d_i = f(x_i) + ε_i
  - ε_i ~ N(0, σ²), IID (Independent and Identically Distributed).
• With a uniform prior, we need to find the maximum likelihood hypothesis:
  h_ML = argmax_{h ∈ H} Pr(D | h)
- Since the data points are IID, Pr(D | h) is the product of the probabilities of the individual
  data points given the hypothesis:
  h_ML = argmax_{h ∈ H} Π_i Pr(d_i | h)
- Given Gaussian noise:
  h_ML = argmax_{h ∈ H} Π_i (1/√(2πσ²)) · exp(−(1/2) · (d_i − h(x_i))² / σ²)
- Since we're looking for the maximum:
  1. We can drop the constant 1/√(2πσ²), since it doesn't affect the argmax.
  2. We can take the natural log (ln) to remove the exponential. Since the ln of a product equals
     the sum of the logs of the terms, we end up with the following function:
     h_ML = argmax_{h ∈ H} Σ_i −(1/2) · (d_i − h(x_i))² / σ²
- Again, since we're computing an argmax, we can drop the constants 1/2 and σ²:
  h_ML = argmax_{h ∈ H} −Σ_i (d_i − h(x_i))²
- Maximizing a negative quantity is the same as minimizing its positive counterpart:
  h_ML = argmin_{h ∈ H} Σ_i (d_i − h(x_i))²
• This means: if you're looking for the maximum likelihood hypothesis, you should minimize the sum
  of squared errors.
• This derivation does not hold if the data is corrupted by any sort of noise other than Gaussian
  noise.
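The equivalence derived above can be demonstrated on a toy problem. This sketch assumes a hypothesis class of lines through the origin and Gaussian-corrupted data, both invented for illustration; the hypothesis minimizing the sum of squared errors is exactly the one maximizing the Gaussian log-likelihood.

```python
import math
import random

random.seed(0)
true_w = 2.0
xs = [x / 10 for x in range(1, 21)]
# d_i = f(x_i) + eps_i, with Gaussian noise eps_i ~ N(0, 0.1^2)
data = [(x, true_w * x + random.gauss(0, 0.1)) for x in xs]

# Discrete hypothesis space: h_w(x) = w * x for candidate slopes w.
H = [w / 10 for w in range(0, 41)]            # w in {0.0, 0.1, ..., 4.0}

def sse(w):
    # Sum of squared errors for hypothesis h_w
    return sum((d - w * x) ** 2 for x, d in data)

def log_likelihood(w, sigma=0.1):
    # ln Pr(D | h_w) under the Gaussian noise model
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (d - w * x) ** 2 / (2 * sigma ** 2) for x, d in data)

h_ml_by_sse = min(H, key=sse)
h_ml_by_ll = max(H, key=log_likelihood)
# The two criteria select the same hypothesis.
```

Because ln Pr(D | h_w) is a constant minus SSE(w)/(2σ²), the two rankings of hypotheses are identical, which is the whole point of the derivation.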


Minimum Description Length:
  h_MAP = argmax_{h ∈ H} Pr(D | h) Pr(h)
  h_MAP = argmax_{h ∈ H} [log Pr(D | h) + log Pr(h)]
  h_MAP = argmin_{h ∈ H} [−log Pr(D | h) − log Pr(h)]
• Information theory: the optimal code for some event w with probability Pr has length −log Pr.
• This means that in order to find the Maximum a Posteriori hypothesis, we need to minimize two
  terms that can each be described as a length:
  - −log Pr(h) → The length of the hypothesis: the number of bits needed to represent this
    hypothesis.
  - −log Pr(D | h) → The length of the data given a particular hypothesis. If the hypothesis
    describes the data perfectly, we don't need to transmit any points; but if the hypothesis labels
    some points wrong, we need to transmit the correct labels for those points to recover the data.
    So basically this term captures the error.
• There is always a trade-off: a more complex hypothesis will drive down the error, while a simpler
  hypothesis will have some error.
• We need to find the best hypothesis, which is the simplest hypothesis that minimizes the error.
  This hypothesis is called the Minimum Description Length (MDL) hypothesis.
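The two length terms can be combined numerically. In this sketch, the two hypotheses and their prior/likelihood values are invented numbers, chosen only to show how a short, slightly wrong hypothesis can beat a long, well-fitting one.

```python
import math

# (prior Pr(h), likelihood Pr(D | h)) for two assumed hypotheses
hypotheses = {
    "simple":  (0.50, 0.02),   # short to describe, fits the data poorly
    "complex": (0.01, 0.90),   # long to describe, fits the data well
}

def description_length(prior, likelihood):
    # length(h) + length(D | h) = -log Pr(h) - log Pr(D | h), in bits
    return -math.log2(prior) - math.log2(likelihood)

lengths = {name: description_length(p, l) for name, (p, l) in hypotheses.items()}
h_mdl = min(lengths, key=lengths.get)   # the MDL (= MAP) hypothesis
```

Here the simple hypothesis wins despite its worse fit, because its 1-bit description more than pays for the extra bits needed to correct its errors.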

Bayesian Classification:
• The question in classification is "What is the best label?", not "What is the best hypothesis?".
• To find the best label, we do a weighted vote over every single hypothesis in the hypothesis
  space, where the weight of each hypothesis is its posterior probability Pr(h | D).
• We end up trying to find the label v_MAP that maximizes this weighted vote:
  v_MAP = argmax_{v_j ∈ V} Σ_{h_i ∈ H} Pr(v_j | h_i) Pr(h_i | D)
• This is the Bayes optimal classifier, and it is computationally very costly: the posterior
  probability Pr(h | D) must be computed for each hypothesis h ∈ H and combined with the prediction
  Pr(v | h) before v_MAP can be computed.
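The weighted vote can be sketched directly. The three hypotheses, their posteriors, and their label predictions below are assumed toy numbers, not from the notes.

```python
# Posterior Pr(h_i | D) for each hypothesis
post = {"h1": 0.4, "h2": 0.3, "h3": 0.3}

# Pr(v_j | h_i): each hypothesis's distribution over labels for a new point
prediction = {
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}

labels = ["+", "-"]
# v_MAP = argmax_v sum_h Pr(v | h) Pr(h | D)
vote = {v: sum(prediction[h][v] * post[h] for h in post) for v in labels}
v_map = max(vote, key=vote.get)
```

Note the contrast with MAP: the single most probable hypothesis (h1, posterior 0.4) predicts "+", but the weighted vote over all hypotheses gives "-" a total weight of 0.6, so the Bayes optimal label differs from the MAP hypothesis's label.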
