
CSE 312

Foundations of Computing II
Lecture 5: Conditional Probability and Bayes Theorem

Anna R. Karlin
Slide Credit: Based on Stefano Tessaro’s slides for 312 19au
incorporating ideas from Alex Tsun, Rachel Lin, Hunter Schafer & myself

1
Thank you for your feedback!!!

• Several people mentioned that I was going too fast.


– Slow me down! Ask questions!!!
– Watch Summer 2020 videos before class (at half speed)
– Do the reading before class.
• Some people said they wanted more practice
– Problems in textbook
– Do the section problems!
– MIT “Mathematics for Computer Science” 6.042J (sections on counting & probability)
– Get the book “A First Course in Probability” by Sheldon Ross
• More office hours?

2
Review Probability
Definition. A sample space Ω is the set of all possible outcomes of an experiment.

Examples:
• Single coin flip: Ω = {𝐻, 𝑇}
• Two coin flips: Ω = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇}
• Roll of a die: Ω = {1, 2, 3, 4, 5, 6}

Definition. An event 𝐸 ⊆ Ω is a subset of possible outcomes.

Examples:
• Getting at least one head in two coin flips: 𝐸 = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻}
• Rolling an even number on a die: 𝐸 = {2, 4, 6}

3
Review Axioms of Probability

Let Ω denote the sample space and 𝐸, 𝐹 ⊆ Ω be events. Note that these hold for any probability space (not just uniform ones).

Axiom 1 (Non-negativity): 𝑃(𝐸) ≥ 0
Axiom 2 (Normalization): 𝑃(Ω) = 1
Axiom 3 (Countable Additivity): If 𝐸 and 𝐹 are mutually exclusive, then 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹)

Corollary 1 (Complementation): 𝑃(𝐸ᶜ) = 1 − 𝑃(𝐸)
Corollary 2 (Monotonicity): If 𝐸 ⊆ 𝐹, then 𝑃(𝐸) ≤ 𝑃(𝐹)
Corollary 3 (Inclusion-Exclusion): 𝑃(𝐸 ∪ 𝐹) = 𝑃(𝐸) + 𝑃(𝐹) − 𝑃(𝐸 ∩ 𝐹)

4
Agenda

• Conditional Probability
• Bayes Theorem
• Law of Total Probability
• Bayes Theorem + Law of Total Probability
• More Examples

5
Conditional Probability (Idea)

[Venn diagram: 𝐴 = likes ice cream, 𝐵 = likes donuts; 36 people in 𝐴 only, 7 in both, 13 in 𝐵 only, 14 in neither.]

What's the probability that someone likes ice cream given they like donuts?

6
Conditional Probability

Definition. The conditional probability of event 𝐴 given an event 𝐵 happened (assuming 𝑃(𝐵) ≠ 0) is

𝑃(𝐴 | 𝐵) = 𝑃(𝐴 ∩ 𝐵) / 𝑃(𝐵)

An equivalent and useful formula is

𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴 | 𝐵) 𝑃(𝐵)

7
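A quick sanity check in Python (a minimal sketch, assuming the Venn counts from the earlier diagram: 36 like only ice cream, 7 like both, 13 like only donuts, 14 neither):

```python
# Conditional probability from (assumed) Venn diagram counts.
# A = likes ice cream, B = likes donuts.
both = 7           # |A ∩ B|
only_donuts = 13   # |B \ A|

count_a_and_b = both
count_b = both + only_donuts

p_a_given_b = count_a_and_b / count_b   # P(A | B) = P(A ∩ B) / P(B); the total cancels
print(p_a_given_b)                      # 7 / 20 = 0.35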
Reversing Conditional Probability

Question: Does 𝑃 𝐴 𝐵 = 𝑃(𝐵|𝐴)?

No! The following is purely for intuition, not a formal probability calculation:

• Let 𝐴 be the event you are wet
• Let 𝐵 be the event you are swimming

𝑃(𝐴 | 𝐵) = 1, but 𝑃(𝐵 | 𝐴) ≠ 1
9
Example with Conditional Probability
https://pollev.com/annakarlin185

Toss a red die and a blue die (both 6-sided and all outcomes equally likely). What is 𝑃(𝐵)? What is 𝑃(𝐵|𝐴)?

      𝑃(𝐵)    𝑃(𝐵|𝐴)
a)    1/6     1/6
b)    1/6     1/3
c)    1/6     3/36
d)    1/9     1/3

10
Gambler’s fallacy
Assume we toss 51 fair coins.
Assume we have seen 50 coins, and they are all “tails”.
What are the odds the 51st coin is “heads”?

𝒜 = first 50 coins are “tails”
ℬ = first 50 coins are “tails”, 51st coin is “heads”

ℙ(ℬ | 𝒜) = ?

11
Gambler’s fallacy
Assume we toss 51 fair coins.
Assume we have seen 50 coins, and they are all “tails”.
What are the odds the 51st coin is “heads”?

𝒜 = first 50 coins are “tails”
ℬ = first 50 coins are “tails”, 51st coin is “heads”

ℙ(ℬ | 𝒜) = ℙ(𝒜 ∩ ℬ) / ℙ(𝒜) = (1/2⁵¹) / (2/2⁵¹) = 1/2

The 51st coin is independent of the outcomes of the first 50 tosses!

Gambler’s fallacy = it feels like it’s time for “heads”!?


12
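A small simulation sketch of this point (my own example, using a streak of 5 tails rather than 50 so the conditioning event actually shows up in a simulation):

```python
# Estimate P(next flip is heads | first k flips were all tails) for a fair coin.
import random

k, trials = 5, 500_000
cond, heads = 0, 0
for _ in range(trials):
    flips = [random.random() < 0.5 for _ in range(k + 1)]   # True = heads
    if not any(flips[:k]):                                   # first k flips were all tails
        cond += 1
        heads += flips[k]                                    # count heads on flip k+1

print(heads / cond)   # ≈ 0.5 — the coin does not "remember" the streak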
Agenda

• Conditional Probability
• Bayes Theorem
• Law of Total Probability
• Bayes Theorem + Law of Total Probability
• More Examples

13
Bayes Theorem

A formula to let us “reverse” the conditional.

Theorem. (Bayes Rule) For events 𝐴 and 𝐵, where 𝑃(𝐴), 𝑃(𝐵) > 0,

𝑃(𝐴 | 𝐵) = 𝑃(𝐵 | 𝐴) 𝑃(𝐴) / 𝑃(𝐵)

𝑃(𝐴) is called the prior (our belief without knowing anything)
𝑃(𝐴 | 𝐵) is called the posterior (our belief after learning 𝐵)
14
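A quick numeric check of Bayes Rule in Python, reusing the (assumed) ice-cream/donut counts from earlier; it also shows that 𝑃(𝐴|𝐵) and 𝑃(𝐵|𝐴) really do differ:

```python
# A = likes ice cream, B = likes donuts; assumed counts: 36 only-A, 7 both, 13 only-B, 14 neither.
total = 36 + 7 + 13 + 14
p_a = (36 + 7) / total
p_b = (7 + 13) / total
p_a_and_b = 7 / total

p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# Bayes Rule recovers P(A|B) from P(B|A):
assert abs(p_a_given_b - p_b_given_a * p_a / p_b) < 1e-12
print(p_a_given_b, p_b_given_a)   # 0.35 vs ≈0.163 — reversing the conditional changes the answer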
Bayes Theorem Proof

By definition of conditional probability,
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴 | 𝐵) 𝑃(𝐵)

Swapping 𝐴 and 𝐵 gives
𝑃(𝐵 ∩ 𝐴) = 𝑃(𝐵 | 𝐴) 𝑃(𝐴)

But 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐵 ∩ 𝐴), so
𝑃(𝐴 | 𝐵) 𝑃(𝐵) = 𝑃(𝐵 | 𝐴) 𝑃(𝐴)

Dividing both sides by 𝑃(𝐵) gives
𝑃(𝐴 | 𝐵) = 𝑃(𝐵 | 𝐴) 𝑃(𝐴) / 𝑃(𝐵)

15
Our First Machine Learning Task: Spam Filtering

Subject: “FREE $$$ CLICK HERE”

What is the probability this email is spam, given the subject contains “FREE”?
Some useful stats:
– 10% of ham (i.e., not spam) emails contain the word “FREE” in the subject.
– 70% of spam emails contain the word “FREE” in the subject.
– 80% of emails you receive are spam.

16
Brain Break

17
Agenda

• Conditional Probability
• Bayes Theorem
• Law of Total Probability
• Bayes Theorem + Law of Total Probability
• More Examples

18
Partitions (Idea)

These events partition the sample space


1. They “cover” the whole space
2. They don’t overlap

19
Partition
Definition. Non-empty events 𝐸₁, 𝐸₂, … , 𝐸ₙ partition the sample space Ω if

(Exhaustive)
𝐸₁ ∪ 𝐸₂ ∪ ⋯ ∪ 𝐸ₙ = ⋃ᵢ₌₁ⁿ 𝐸ᵢ = Ω

(Pairwise Mutually Exclusive)
For all 𝑖 ≠ 𝑗: 𝐸ᵢ ∩ 𝐸ⱼ = ∅

[Diagram: an event 𝐸 and its complement 𝐸ᶜ partition Ω.]
20
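A minimal Python sketch (not from the slides) that checks both partition conditions for small finite events:

```python
# Check whether a list of events partitions a finite sample space omega.
from itertools import combinations

def is_partition(events, omega):
    exhaustive = set().union(*events) == set(omega)                       # E1 ∪ ... ∪ En = Ω
    disjoint = all(a.isdisjoint(b) for a, b in combinations(events, 2))   # Ei ∩ Ej = ∅ for i ≠ j
    nonempty = all(events)                                                # no empty events
    return exhaustive and disjoint and nonempty

omega = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 3, 5}, {2, 4, 6}], omega))    # True: E and its complement partition Ω
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], omega)) # False: the events overlap at 3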
Law of Total Probability (Idea)

If we know 𝐸₁, 𝐸₂, … , 𝐸ₙ partition Ω, what can we say about 𝑃(𝐹)?

21
Law of Total Probability (LTP)

Definition. If events 𝐸₁, 𝐸₂, … , 𝐸ₙ partition the sample space Ω, then for any event 𝐹,

𝑃(𝐹) = 𝑃(𝐹 ∩ 𝐸₁) + … + 𝑃(𝐹 ∩ 𝐸ₙ) = ∑ᵢ₌₁ⁿ 𝑃(𝐹 ∩ 𝐸ᵢ)

Using the definition of conditional probability, 𝑃(𝐹 ∩ 𝐸) = 𝑃(𝐹 | 𝐸) 𝑃(𝐸), we get the alternate form:

𝑃(𝐹) = ∑ᵢ₌₁ⁿ 𝑃(𝐹 | 𝐸ᵢ) 𝑃(𝐸ᵢ)

22
Another Contrived Example

Alice has two pockets:


• Left pocket: Two red balls, two green balls
• Right pocket: One red ball, two green balls.

Alice picks a random ball from a random pocket.


[Both pockets equally likely, each ball equally likely.]

What is ℙ(R), the probability the picked ball is red?

23
Sequential Process – Non-Uniform Case
[Tree diagram: first choose a pocket (Left or Right), then draw a ball (Red or Green), giving outcomes Right-Red, Right-Green, Left-Red, Left-Green.]

• Left pocket: Two red, two green
• Right pocket: One red, two green.
• Alice picks a random ball from a random pocket

24
Sequential Process – Non-Uniform Case
• Left pocket: Two red, two green
• Right pocket: One red, two green.

[Tree diagram: Right pocket with probability 1/2, then Red with probability 1/3 or Green with probability 2/3; Left pocket with probability 1/2, then Red with probability 1/2 or Green with probability 1/2.]

1/3 = ℙ(R | Right) and 2/3 = ℙ(G | Right)

ℙ(R) = ℙ(R ∩ Left) + ℙ(R ∩ Right)    (Law of total probability)
     = ℙ(Left) × ℙ(R | Left) + ℙ(Right) × ℙ(R | Right)
     = 1/2 × 1/2 + 1/2 × 1/3 = 1/4 + 1/6 = 5/12

25
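A short simulation sketch of Alice's pockets (setup as described above), verifying that ℙ(R) comes out near 5/12:

```python
# Simulate: pick a pocket uniformly, then a ball uniformly from that pocket.
import random

trials, red = 500_000, 0
for _ in range(trials):
    pocket = random.choice(["left", "right"])   # both pockets equally likely
    ball = random.choice(["R", "R", "G", "G"] if pocket == "left" else ["R", "G", "G"])
    red += (ball == "R")

print(red / trials)   # ≈ 5/12 ≈ 0.4167, matching the law-of-total-probability calculation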
Agenda

• Conditional Probability
• Bayes Theorem
• Law of Total Probability
• Bayes Theorem + Law of Total Probability
• More Examples

26
Our First Machine Learning Task: Spam Filtering

Subject: “FREE $$$ CLICK HERE”

What is the probability this email is spam, given the subject contains “FREE”?
Some useful stats:
– 10% of ham (i.e., not spam) emails contain the word “FREE” in the subject.
– 70% of spam emails contain the word “FREE” in the subject.
– 80% of emails you receive are spam.

27
Bayes Theorem with Law of Total Probability

Bayes Theorem with LTP: Let 𝐸₁, 𝐸₂, … , 𝐸ₙ be a partition of the sample space, and 𝐹 an event. Then,

𝑃(𝐸₁ | 𝐹) = 𝑃(𝐹 | 𝐸₁) 𝑃(𝐸₁) / 𝑃(𝐹) = 𝑃(𝐹 | 𝐸₁) 𝑃(𝐸₁) / ∑ᵢ₌₁ⁿ 𝑃(𝐹 | 𝐸ᵢ) 𝑃(𝐸ᵢ)

Simple Partition: In particular, if 𝐸 is an event with non-zero probability, then

𝑃(𝐸 | 𝐹) = 𝑃(𝐹 | 𝐸) 𝑃(𝐸) / (𝑃(𝐹 | 𝐸) 𝑃(𝐸) + 𝑃(𝐹 | 𝐸ᶜ) 𝑃(𝐸ᶜ))
28
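Applying the simple-partition form to the spam-filtering stats given earlier (70% of spam and 10% of ham contain “FREE”; 80% of email is spam), a short Python sketch:

```python
# P(spam | subject contains "FREE") via Bayes Rule + law of total probability.
p_spam = 0.8                  # P(spam): prior
p_free_given_spam = 0.7       # P(FREE | spam)
p_free_given_ham = 0.1        # P(FREE | ham)

p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)   # LTP over {spam, ham}
p_spam_given_free = p_free_given_spam * p_spam / p_free                 # Bayes Rule
print(p_spam_given_free)      # ≈ 0.966: seeing "FREE" pushes the posterior above the 0.8 prior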
Agenda

• Conditional Probability
• Bayes Theorem
• Law of Total Probability
• Bayes Theorem + Law of Total Probability
• More Examples

29
Example – Zika Testing
Usually no or mild symptoms (rash); sometimes
severe symptoms (paralysis).

During pregnancy: may cause birth defects.

Suppose you took a Zika test and it returned “positive”. What is the likelihood that you actually have the disease?

• Tests for diseases are rarely 100% accurate.

This example and following slides are from Lisa Yan (Stanford). 30
Example – Zika Testing

Suppose we know the following Zika stats


– A test is 98% effective at detecting Zika (“true positive”)
– However, the test may yield a “false positive” 1% of the time
– 0.5% of the US population has Zika.

What is the probability you have Zika (event Z) if you test positive (event T)?

https://pollev.com/annakarlin185

A) Less than 0.25


B) Between 0.25 and 0.5
C) Between 0.5 and 0.75
D) Between 0.75 and 1
31
Example – Zika Testing

Suppose we know the following Zika stats


– A test is 98% effective at detecting Zika (“true positive”)
– However, the test may yield a “false positive” 1% of the time
– 0.5% of the US population has Zika.

What is the probability you have Zika (event Z) if you test positive (event T)?

32
Example – Zika Testing

[Diagram: people who have Zika shown in blue, people who don’t in pink.]

Suppose we know the following Zika stats
– A test is 98% effective at detecting Zika (“true positive”)  [≈ 100%]
– However, the test may yield a “false positive” 1% of the time  [10/995 ≈ 1%]
– 0.5% of the US population has Zika.  [out of 1000 people, 5 have it]

What is the probability you have Zika (event Z) if you test positive (event T)?

Suppose we had 1000 people:
• 5 have Zika and test positive
• 985 do not have Zika and test negative
• 10 do not have Zika and test positive

5 / (5 + 10) = 1/3 ≈ 0.33
Demo 33
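The same answer, computed exactly with Bayes Rule + LTP in a short Python sketch (same stats as above):

```python
# Exact posterior P(Z | T) for the Zika test.
p_z = 0.005              # P(Z): 0.5% of the population has Zika
p_t_given_z = 0.98       # true positive rate
p_t_given_not_z = 0.01   # false positive rate

p_t = p_t_given_z * p_z + p_t_given_not_z * (1 - p_z)   # LTP over {Z, Zᶜ}
p_z_given_t = p_t_given_z * p_z / p_t                   # Bayes Rule
print(p_z_given_t)       # ≈ 0.33, matching the 5 / (5 + 10) estimate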
Philosophy – Updating Beliefs

While it’s not 98% that you have the disease, your beliefs changed drastically.

Z = you have Zika
T = you test positive for Zika

“I have a 0.5% chance of having Zika” → receive positive test result → “I now have a 33% chance of having Zika after the test.”

Prior: P(Z)                                Posterior: P(Z|T)


34
Example – Zika Testing
Suppose we know the following Zika stats
– A test is 98% effective at detecting Zika (“true positive”)
– However, the test may yield a “false positive” 1% of the time
– 0.5% of the US population has Zika.

What is the probability you test negative (event 𝑇ᶜ) if you have Zika (event 𝑍)?

35
Conditional Probability Define a Probability Space
The probability conditioned on 𝐴 follows the same properties as
(unconditional) probability.

Example. ℙ(ℬᶜ | 𝒜) = 1 − ℙ(ℬ | 𝒜)

36
Conditional Probability Define a Probability Space
The probability conditioned on 𝐴 follows the same properties as
(unconditional) probability.

Example. ℙ(ℬᶜ | 𝒜) = 1 − ℙ(ℬ | 𝒜)

Formally. If (Ω, ℙ) is a probability space and ℙ(𝒜) > 0, then

(𝒜, ℙ(⋅ | 𝒜)) is a probability space.
37
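A tiny numeric check of this fact (my own example with a single fair die):

```python
# Verify that conditioning on A still satisfies the complement rule: P(Bᶜ | A) = 1 − P(B | A).
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                        # one fair die, uniform probability
A = {2, 4, 6}                                     # roll is even
B = {3, 4, 5, 6}                                  # roll is at least 3

def p(E, given=omega):
    return Fraction(len(E & given), len(given))   # uniform conditional probability

assert p(omega - B, A) == 1 - p(B, A)             # complement rule holds under conditioning
print(p(B, A), 1 - p(B, A))                       # 2/3 and 1/3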
