
Slide set 3: Conditional Probability

∙ Conditional probability
∙ Chain rule
∙ Sequential calculation of probabilities
∙ Law of total probability
∙ Bayes rule
∙ Independence
∙ Summary

© Copyright Abbas El Gamal


Conditional probability

∙ We are often interested in the probability of event A given that event B occurred
∙ Examples:
▸ Alice tests positive for a certain disease; what is the probability that she has the disease?
▸ A 0 is read out of a memory chip; what is the probability that a 0 was in fact stored?

∙ Definition: The conditional probability of A given B (with P(B) ≠ 0) is:

  P(A | B) = P(A ∩ B) / P(B),  also written as  P(A, B) / P(B)
∙ Interpretation: Given that B occurred, the only outcomes in A that can occur are those in A ∩ B
  Dividing P(A ∩ B) by P(B) renormalizes this probability,
  and we call the result the conditional probability of A given B

 / 
Example

∙ Roll a -sided die twice. So, the sample space is {, , , }
Assume each sample point has probability /.
Let B be the event that the minimum of the two die rolls is , and
A be the event that the maximum of the two die rolls is 
Find P(A | B)
  P(A | B) = P(A, B) / P(B)
  P(B) = 5/16
  P(A, B) = 2/16
  P(A | B) = (2/16) / (5/16) = 2/5
  Compared to P(A) = 7/16
  (figure: the 16 outcomes on a 1st-roll × 2nd-roll grid, with the events B and A marked)

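∙ A minimal Python sketch (not part of the original slides) that checks these numbers by enumerating the 16 equally likely outcomes:

```python
# Enumerate the 16 equally likely outcomes of two rolls of a 4-sided die.
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))   # sample space, 16 points
B = [w for w in outcomes if min(w) == 2]          # minimum of the two rolls is 2
A_and_B = [w for w in B if max(w) == 4]           # ... and the maximum is 4

print(len(B) / len(outcomes))                     # P(B)     = 5/16
print(len(A_and_B) / len(B))                      # P(A | B) = 2/5
print(sum(max(w) == 4 for w in outcomes) / len(outcomes))   # P(A) = 7/16
```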
 / 
P(⋅ | B) is a probability measure

∙ For a given B, P(⋅ | B) constitutes a probability measure on Ω


▸ Positivity: P(A | B) ≥ 0
▸ Normalization: P(Ω | B) = P(Ω ∩ B) / P(B) = P(B) / P(B) = 1
▸ Additivity: if A1 and A2 are disjoint events, then

  P(A1 ∪ A2 | B) = P((A1 ∪ A2) ∩ B) / P(B)
                 = P((A1 ∩ B) ∪ (A2 ∩ B)) / P(B)
                 = (P(A1 ∩ B) + P(A2 ∩ B)) / P(B)
                 = P(A1 | B) + P(A2 | B)
∙ For the 4-sided die example:
  ▸ The unconditional probability measure assigns probability 1/16 to each outcome
  ▸ Conditioned on B, the conditional probability measure assigns probability 1/5
    to each of (2, 2), (2, 3), (2, 4), (3, 2), (4, 2), and zero to all other outcomes
  (figure: 1st-roll × 2nd-roll grid with the five outcomes of B highlighted)
 / 
Chain rule
∙ Probability measures are often specified via conditional probabilities
∙ Two methods to calculate the probability of an event from conditional probabilities:
  ▸ the sequential method, and
  ▸ a “divide and conquer” method using the law of total probability

∙ Both use the chain rule:


  By the definition of conditional probability, P(A ∩ B) = P(A | B) P(B)
  Suppose that A1, A2, . . . , An are events. Then

  P(A1 ∩ ⋯ ∩ An) = P(A1 ∩ ⋯ ∩ An−1) ⋅ P(An | A1 ∩ ⋯ ∩ An−1)
                 = P(A1 ∩ ⋯ ∩ An−2) ⋅ P(An−1 | A1 ∩ ⋯ ∩ An−2) ⋅ P(An | A1 ∩ ⋯ ∩ An−1)
                 ⋮
                 = P(A1) ⋅ P(A2 | A1) ⋅ P(A3 | A1 ∩ A2) ⋯ P(An | A1 ∩ ⋯ ∩ An−1)

  In product form: P(∩ᵢ₌₁ⁿ Ai) = ∏ᵢ₌₁ⁿ P(Ai | A1, A2, . . . , Ai−1), where A0 = Ω

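∙ In code, the chain rule is just a running product of conditional factors; a minimal sketch (the helper function is ours, not part of the slides):

```python
# Chain rule: P(A1 ∩ ... ∩ An) = prod_i P(Ai | A1, ..., A(i-1)).
from math import prod

def chain_rule(conditional_probs):
    """Multiply the factors P(Ai | A1, ..., A(i-1)) in order."""
    return prod(conditional_probs)

# Probability of drawing 3 non-hearts in a row (see the cards example later):
print(chain_rule([39/52, 38/51, 37/50]))   # ≈ 0.41
```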
 / 
Sequential calculation of probabilities

1. Construct a tree description of the sample space for the sequential experiment
2. Assign conditional probabilities to the corresponding edges
3. By the chain rule, the probability of an outcome is the product of the
   conditional probabilities along the path from the root to the outcome’s leaf node
s P(A , A ) = P(A ) P(A | A )

s
P(A |A )

✁✁❅ P(A | A )
c

P(A ) ✁ ❅s P(A , Ac ) = P(A ) P(Ac | A )


s✁



P(Ac ) s P(Ac , A ) = P(Ac ) P(A |Ac )

❆❆s P(A |Ac )

❅s P(Ac , Ac ) = P(Ac ) P(Ac |Ac )
P(Ac |Ac )

 / 
Diagnosing a disease

∙ Let A be the event that a person has a certain disease, and
  B be the event that they tested positive for the disease. Assume that
  P(A) = 0.01, P(B | A) = 0.99, and P(B | Aᶜ) = 0.01. What is the probability of:
  ▸ a false negative (misdiagnosis), i.e., P(A ∩ Bᶜ)?
  ▸ a false positive (false diagnosis), i.e., P(Aᶜ ∩ B)?
∙ We use the sequential method:
  Root
  ├─ A    [P(A) = 0.01]
  │   ├─ B    [P(B | A) = 0.99]    → P(A, B)  = 0.01 × 0.99 = 0.0099
  │   └─ Bᶜ   [P(Bᶜ | A) = 0.01]   → P(A, Bᶜ) = 0.01 × 0.01 = 10⁻⁴   (misdiagnosis)
  └─ Aᶜ   [P(Aᶜ) = 0.99]
      ├─ B    [P(B | Aᶜ) = 0.01]   → P(Aᶜ, B)  = 0.99 × 0.01 = 0.0099 (false diagnosis)
      └─ Bᶜ   [P(Bᶜ | Aᶜ) = 0.99]  → P(Aᶜ, Bᶜ) = 0.99 × 0.99 = 0.9801
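∙ The same tree computation as a Python sketch (not from the slides), multiplying the conditional probabilities along each path:

```python
# Leaf probabilities of the diagnosis tree: products of the edge probabilities.
p_A = 0.01           # P(A): the person has the disease
p_B_given_A = 0.99   # P(B | A): positive test given disease
p_B_given_Ac = 0.01  # P(B | Ac): positive test given no disease

false_negative = p_A * (1 - p_B_given_A)     # P(A, Bc)  = 1e-4
false_positive = (1 - p_A) * p_B_given_Ac    # P(Ac, B)  = 0.0099
print(false_negative, false_positive)
```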
 / 
Example

∙ Cards: Three cards are randomly drawn (without replacement) from
  a 52-card deck. What is the probability of not drawing a heart?
∙ We can use counting to find this probability:
  ▸ The sample space for picking 3 cards has (52 choose 3) elements
  ▸ The number of ways of picking 3 cards with no heart is (52 − 13 choose 3) = (39 choose 3)
  ▸ The probability of no hearts is (39 choose 3) / (52 choose 3) = (39 × 38 × 37) / (52 × 51 × 50) ≈ 0.41

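∙ The counting answer is easy to verify with math.comb (a sketch, not part of the slides):

```python
# Probability that 3 cards drawn from a 52-card deck contain no heart.
from math import comb

p_no_heart = comb(39, 3) / comb(52, 3)   # choose all 3 from the 39 non-hearts
print(round(p_no_heart, 4))              # ≈ 0.4135
```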
 / 
Example

∙ We can also use the sequential method
∙ Let Ai be the event “not heart” at draw i = 1, 2, 3; we want P(A1 ∩ A2 ∩ A3)
  We construct a (partial) tree with the conditional probabilities on its edges
  Root
  ├─ A1   [P(A1) = 39/52]
  │   ├─ A2   [P(A2 | A1) = 38/51]
  │   │   └─ A3   [P(A3 | A1, A2) = 37/50]
  │   │        → P(A1, A2, A3) = P(A1) P(A2 | A1) P(A3 | A1, A2) = 39/52 × 38/51 × 37/50 ≈ 0.41
  │   └─ A2ᶜ  [P(A2ᶜ | A1) = 13/51]
  └─ A1ᶜ  [P(A1ᶜ) = 13/52]
 / 
Recap
∙ The conditional probability of A given B (with P(B) ≠ 0) is P(A | B) = P(A ∩ B) / P(B)
  P(⋅ | B) is a probability measure
∙ Chain rule: For two events, P(A ∩ B) = P(A | B) P(B)
  In general: P(∩ᵢ₌₁ⁿ Ai) = ∏ᵢ₌₁ⁿ P(Ai | A1, A2, . . . , Ai−1), where A0 = Ω

∙ The sequential method for computing probabilities


∙ Up next:
▸ Computing probabilities via the law of total probability
▸ Bayes rule (the basis for Bayesian statistical inference)

 / 
Law of total probability

∙ Let A , . . . , An be events that partition Ω, i.e., disjoint and ∪ni= Ai = Ω


n n
Then for any event B: P(B) = 󵠈 P(Ai ∩ B) = 󵠈 P(Ai ) P(B | Ai )
i= i=
Remark: This law holds also for n = ∞
∙ Chess tournament. Alice plays one of 3 classes of opponents:
  ▸ P(Class 1) = 0.5,  P(Alice wins | Class 1) = 0.3
  ▸ P(Class 2) = 0.25, P(Alice wins | Class 2) = 0.4
  ▸ P(Class 3) = 0.25, P(Alice wins | Class 3) = 0.5

What is the probability that Alice wins the game?


∙ By the law of total probability,

  P(Alice wins) = Σᵢ₌₁³ P(Class i, Alice wins) = Σᵢ₌₁³ P(Class i) P(Alice wins | Class i)
                = 0.5 × 0.3 + 0.25 × 0.4 + 0.25 × 0.5 = 0.375

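∙ A minimal sketch of the total-probability sum (not part of the original slides):

```python
# Law of total probability: sum P(Class i) * P(win | Class i) over the partition.
classes = [(0.5, 0.3), (0.25, 0.4), (0.25, 0.5)]   # (P(Class i), P(win | Class i))
p_win = sum(p_class * p_win_given for p_class, p_win_given in classes)
print(p_win)   # 0.375
```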
 / 
Binary communication link

∙ Consider a communication link in which a 0 or a 1 is sent, and a 0 or a 1 is received

  Assume that 0 is sent with probability p (and 1 with probability 1 − p)
  The link is noisy: if a 0 is sent, a 1 is received with probability ε0, and
  if a 1 is sent, a 0 is received with probability ε1
  What is the probability that the received bit is in error?
∙ We can represent the link model by a probability transition diagram (bit sent → bit received):
  ▸ 0 → 0 with probability P(0 | 0) = 1 − ε0
  ▸ 0 → 1 with probability ε0
  ▸ 1 → 0 with probability ε1
  ▸ 1 → 1 with probability P(1 | 1) = 1 − ε1
 / 
Binary communication link

∙ Recall the probability transition diagram of the link (bit sent → bit received):
  ▸ 0 → 0 with probability P(0 | 0) = 1 − ε0
  ▸ 0 → 1 with probability ε0
  ▸ 1 → 0 with probability ε1
  ▸ 1 → 1 with probability P(1 | 1) = 1 − ε1, where P(0) = p and P(1) = 1 − p

∙ Define the events: A = {0 is sent}, B = {0 is received}, E = {received bit is in error}

∙ To find the probability of error, we use the law of total probability:

  P(E) = P(A) P(E | A) + P(Aᶜ) P(E | Aᶜ)
       = P(A) P(Bᶜ | A) + P(Aᶜ) P(B | Aᶜ)
       = p ε0 + (1 − p) ε1

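∙ A small sketch of this computation; the values of p, ε0, and ε1 below are illustrative assumptions, only the formula comes from the slide:

```python
# P(E) = P(0 sent) P(1 received | 0 sent) + P(1 sent) P(0 received | 1 sent)
p = 0.2      # assumed P(0 is sent)
eps0 = 0.1   # assumed crossover probability P(1 received | 0 sent)
eps1 = 0.05  # assumed crossover probability P(0 received | 1 sent)

p_error = p * eps0 + (1 - p) * eps1
print(p_error)   # 0.06 with these illustrative values
```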
 / 
Bayes rule
∙ Let A , A , . . . , An be nonzero probability events (the causes) that partition Ω,
and let B be a nonzero probability event (the effect/observation)
∙ Given the a priori probabilities P(Ai ) and
the conditional probabilities P(B | Ai ), i = , . . . , n,
find the a posteriori probabilities P(Aj | B), j = , . . . , n
∙ From the definition of conditional probability,
P(B, Aj ) P(B | Aj )
P(Aj | B) = = P(Aj )
P(B) P(B)
By the law of total probability
Thomas Bayes (-)
n
P(B) = 󵠈 P(Ai ) P(B | Ai )
i=

Substituting we obtain Bayes rule:


P(B | Aj )
P(Aj | B) = P(Aj ) for j = , , . . . , n
∑ni= P(Ai ) P(B | Ai )

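∙ Bayes rule transcribes directly into code; a minimal sketch (the function name posterior is ours, not part of the slides):

```python
# Posterior P(Aj | B) from priors P(Ai) and likelihoods P(B | Ai).
def posterior(priors, likelihoods, j):
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(B), by total probability
    return priors[j] * likelihoods[j] / evidence

# Disease example: causes (disease, no disease), observation = positive test.
print(posterior([0.01, 0.99], [0.99, 0.01], j=0))   # 0.5
```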
 / 
Disease diagnosis

∙ Recall that A is the event that a person has the disease and B is the event that they test
  positive, where P(A) = 0.01, P(B | A) = 0.99, P(B | Aᶜ) = 0.01
  What is the probability that the person has the disease given that they tested positive?
∙ We want to find P(A | B), so we use Bayes rule:

  P(A | B) = P(A) P(B | A) / (P(A) P(B | A) + P(Aᶜ) P(B | Aᶜ))
           = (0.01 × 0.99) / (0.01 × 0.99 + 0.99 × 0.01) = 0.5

  Versus the a priori probability of 0.01, so the positive test increased the probability a lot
  But the diagnosis is only as good as flipping a fair coin, even though the test is 99% accurate!
  What test accuracy would we need to make a correct diagnosis with high probability?

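∙ Both posteriors of the disease example in a few lines (a sketch; the negative-test case is worked out on the next slide):

```python
# Posterior probability of disease after a positive and after a negative test.
p_A, p_B_A, p_B_Ac = 0.01, 0.99, 0.01   # P(A), P(B | A), P(B | Ac)

post_positive = p_A * p_B_A / (p_A * p_B_A + (1 - p_A) * p_B_Ac)
post_negative = (p_A * (1 - p_B_A)
                 / (p_A * (1 - p_B_A) + (1 - p_A) * (1 - p_B_Ac)))
print(post_positive)   # 0.5
print(post_negative)   # ≈ 1.02e-4
```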
 / 
Disease diagnosis

∙ Again, A is the event that a person has the disease and B is the event that they test
  positive, where P(A) = 0.01, P(B | A) = 0.99, P(B | Aᶜ) = 0.01
  What is the probability that the person has the disease given that they tested negative?
∙ We want to find P(A | Bᶜ), so we again use Bayes rule:

  P(A | Bᶜ) = P(A) P(Bᶜ | A) / (P(A) P(Bᶜ | A) + P(Aᶜ) P(Bᶜ | Aᶜ))
            = (0.01 × 0.01) / (0.01 × 0.01 + 0.99 × 0.99) ≈ 1.02 × 10⁻⁴

  Versus the a priori probability of 0.01, so the negative test decreased the probability a lot

 / 
Independence

∙ Sometimes knowing that B occurred doesn’t change the probability of A, i.e.,

  P(A | B) = P(A)

  In this case we say that the two events are statistically independent
∙ Examples:
  ▸ Flip a fair coin and get an H; the probability of an H on the next flip is still 1/2
  ▸ You are told Alice’s birthday; the probability that Bob is born on the same day is still 1/365

∙ From the definition of conditional probability,

  P(A | B) = P(A, B) / P(B)

  Hence, an equivalent definition of independence is that

  P(A, B) = P(A) P(B)

  It also follows that P(B | A) = P(B)


 / 
Independence

∙ Suppose A and B are independent, i.e., P(A, B) = P(A) P(B);
  are Aᶜ and Bᶜ also independent?
∙ Intuitively, the answer should be YES
∙ Let’s verify:

  P(Aᶜ, Bᶜ) = P((A ∪ B)ᶜ)                      De Morgan’s law
            = 1 − P(A ∪ B)
            = 1 − (P(A) + P(B) − P(A, B))      inclusion-exclusion
            = 1 − (P(A) + P(B) − P(A) P(B))    independence
            = (1 − P(A)) (1 − P(B)) = P(Aᶜ) P(Bᶜ),

  hence Aᶜ and Bᶜ are indeed independent


∙ Two experiments are said to be independent if for every event A from the first
  experiment and every event B from the second experiment, P(A, B) = P(A) P(B)

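∙ A quick numeric check of the complement argument (the two probabilities below are arbitrary illustrative values):

```python
# If P(A, B) = P(A) P(B), then P(Ac, Bc) should equal P(Ac) P(Bc).
p_A, p_B = 0.3, 0.6                       # any values in [0, 1] work
p_AB = p_A * p_B                          # independence of A and B
p_AcBc = 1 - (p_A + p_B - p_AB)           # De Morgan + inclusion-exclusion
print(abs(p_AcBc - (1 - p_A) * (1 - p_B)) < 1e-12)   # True
```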
 / 
Independence for more than two events

∙ In general, A1, . . . , An are mutually independent if for every subset S ⊆ {1, . . . , n},

  P(∩i∈S Ai) = ∏i∈S P(Ai)

∙ For example, for 3 events A1, A2, A3 this requires:

  P(A1, A2) = P(A1) P(A2),
  P(A1, A3) = P(A1) P(A3),
  P(A2, A3) = P(A2) P(A3), and
  P(A1, A2, A3) = P(A1) P(A2) P(A3)

 / 
Independence for more than two events
∙ Note that P(A1, . . . , An) = ∏ᵢ₌₁ⁿ P(Ai) alone is not sufficient for independence
∙ Example: Roll two fair dice independently. Define the events

  A = {1st die = 1, 2, or 3}, B = {1st die = 3, 4, or 5}, C = {sum of the outcomes = 9}

  Are A, B, and C independent?
∙ Since the dice are fair, P(A) = P(B) = 1/2
  The event C = {(3, 6), (4, 5), (5, 4), (6, 3)}; since the rolls are independent,
  P(C) = 4/36 = 1/9
  Since A ∩ B ∩ C = {(3, 6)}, P(A, B, C) = 1/36 = P(A) P(B) P(C)
  But P(A, B) = 1/6 ≠ 1/4 = P(A) P(B), thus A, B, and C are not independent!
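∙ Enumerating the 36 outcomes confirms both claims (a sketch, not part of the slides):

```python
# Check the product conditions for A, B, C over the 36 equally likely outcomes.
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))
A = {w for w in omega if w[0] in (1, 2, 3)}
B = {w for w in omega if w[0] in (3, 4, 5)}
C = {w for w in omega if sum(w) == 9}

P = lambda E: Fraction(len(E), len(omega))   # exact probabilities
print(P(A & B & C), P(A) * P(B) * P(C))      # 1/36 and 1/36: triple product holds
print(P(A & B), P(A) * P(B))                 # 1/6 vs 1/4: pairwise condition fails
```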
 / 
Independence for more than two events

∙ Also, independence of all pairs does not necessarily imply mutual independence
∙ Example: Flip a fair coin twice independently. Define the events:
  ▸ A: first toss is H
  ▸ B: second toss is H
  ▸ C: first and second tosses have different outcomes
  Are A, B, C independent?
∙ Clearly, A, B are independent, A, C are independent, and B, C are independent

  But P(A, B, C) = 0 ≠ P(A) P(B) P(C) = 1/8,

  since if you know A and B occurred, you know that C could not have occurred

 / 
Coupon collector problem
∙ A food company includes a baseball player card in each cereal box it sells.
  Assume there are k players, and each box contains a randomly and independently chosen
  card of one of them (with replacement). A fan buys n cereal boxes;
  what is the probability that she gets Babe Ruth’s card?
  ▸ Let Ei be the event that she gets Ruth’s card in cereal box i ∈ {1, 2, . . . , n}
    We want to find P(∪ᵢ₌₁ⁿ Ei)
  ▸ Since the card selection is random and with replacement, P(Ei) = 1/k
  ▸ This alone is not enough information to find the probability of the union
  ▸ Let’s instead find the probability of the complementary event, P(∩ᵢ₌₁ⁿ Eiᶜ) (De Morgan’s)
  ▸ Since the Ei s are independent, so are the Eiᶜ s, hence

    P(∩ᵢ₌₁ⁿ Eiᶜ) = ∏ᵢ₌₁ⁿ P(Eiᶜ) = ∏ᵢ₌₁ⁿ (1 − P(Ei)) = (1 − 1/k)ⁿ

  ▸ Hence, P(∪ᵢ₌₁ⁿ Ei) = 1 − P(∩ᵢ₌₁ⁿ Eiᶜ) = 1 − (1 − 1/k)ⁿ
    For example, to reach a target probability P(∪ᵢ₌₁ⁿ Ei) = p, she needs
    n ≥ ln(1 − p) / ln(1 − 1/k) boxes, as computed in the sketch below
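∙ A sketch of the final step; k and the target probability below are illustrative values:

```python
# Smallest n with 1 - (1 - 1/k)^n >= target.
from math import ceil, log

def boxes_needed(k, target):
    return ceil(log(1 - target) / log(1 - 1 / k))

print(boxes_needed(100, 0.5))   # 69 boxes for k = 100 players and a 50% chance
```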
 / 
Summary

∙ Conditional probability captures how prior information (the occurrence of one event) changes the probability of another event


∙ The chain rule provides two important methods for computing probabilities:
  ▸ the sequential (tree) method
  ▸ the law of total probability method

∙ Bayes rule, the basis of statistical inference


∙ Independence of events

 / 
