Conditional Probability and the Monty Hall Problem, with Examples for Data Science

The Monty Hall Problem illustrates a counterintuitive probability scenario where switching doors after one is revealed increases the chance of winning a car from 1/3 to 2/3. Many people mistakenly believe that the two remaining doors have equal probabilities, leading to widespread confusion even among experts. The document also discusses conditional probability, emphasizing its importance in data science and how it can be applied to real-life scenarios.


The Monty Hall Problem:

The statement of this famous problem in Parade Magazine is as follows:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, donkeys. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a donkey. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? (Whitaker 1990)

Many people argue that "the two unopened doors are the same, so they each contain the car with probability 1/2, and hence there is no point in switching." As we will now show, this naive reasoning is incorrect. To compute the answer, we will suppose that the host always chooses to show you a donkey. (When the problem and the solution appeared in Parade, approximately 10,000 readers, including nearly 1,000 with Ph.D.s, wrote to the magazine claiming the published solution was wrong. Some of the controversy arose because the Parade version of the problem is technically ambiguous: it leaves certain aspects of the host's behavior unstated, for example whether the host must open a door and must make the offer to switch.)

Assuming that you have picked door No. 1, there are 3 equally likely cases:

 The car is behind door No. 1: the host opens door No. 2 or No. 3, and switching loses.

 The car is behind door No. 2: the host must open door No. 3, and switching wins.

 The car is behind door No. 3: the host must open door No. 2, and switching wins.

Switching therefore wins in 2 of the 3 equally likely cases, so the probability of winning by switching is 2/3.
Notice that although it took a number of steps to compute this answer, it is "obvious": when we picked one of the three doors initially we had probability 1/3 of picking the car, and since the host can always open a door with a donkey, the new information does not change our chance of winning.

The above argument may easily persuade you in the moment, but when you think about it again by yourself, it can still feel confusing. So why is it so confusing?
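Before unpacking why, a quick empirical check may help. Below is a minimal Monte Carlo sketch in Python (my addition, not part of the original text) that simulates the standard game, assuming the host always reveals a donkey behind a door you didn't pick:

```python
import random

def play(switch: bool) -> bool:
    """Play one round; return True if the player wins the car."""
    doors = [1, 2, 3]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = random.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

trials = 100_000
print(sum(play(switch=False) for _ in range(trials)) / trials)  # ≈ 0.33
print(sum(play(switch=True) for _ in range(trials)) / trials)   # ≈ 0.67
```

Running this shows staying wins about a third of the time and switching about two thirds, matching the case analysis above.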

Source of Confusion:

When first presented with the Monty Hall problem, an overwhelming majority of people assume that each door has an equal probability and conclude that switching does not matter (Mueser and Granberg, 1999). Out of 228 subjects in one study, only 13% chose to switch (Granberg and Brown, 1995:713).

In her book The Power of Logical Thinking, vos Savant (1996:15) quotes cognitive
psychologist Massimo Piattelli-Palmarini as saying "... no other statistical puzzle
comes so close to fooling all the people all the time" and "that even Nobel physicists
systematically give the wrong answer, and that they insist on it, and they are ready to
berate in print those who propose the right answer."

Most statements of the problem, notably the one in Parade Magazine, do not match
the rules of the actual game show (Krauss and Wang, 2003:9), and do not fully
specify the host's behavior or that the car's location is randomly selected (Granberg
and Brown, 1995:712). Krauss and Wang (2003:10) conjecture that people make the
standard assumptions even if they are not explicitly stated. Although these issues are
mathematically significant, even when controlling for these factors nearly all people
still think each of the two unopened doors has an equal probability and conclude
switching does not matter (Mueser and Granberg, 1999). This "equal probability"
assumption is a deeply rooted intuition (Falk 1992:202). People strongly tend to think
probability is evenly distributed across as many unknowns as are present, whether it is
or not (Fox and Levav, 2004:637).

A competing deeply rooted intuition at work in the Monty Hall problem is the belief
that exposing information that is already known does not affect probabilities (Falk
1992:207). This intuition is the basis of solutions to the problem that assert the host's
action of opening a door does not change the player's initial 1/3 chance of selecting
the car. For the fully explicit problem this intuition leads to the correct numerical
answer, 2/3 chance of winning the car by switching, but leads to the same solution for
other variants where this answer is not correct (Falk 1992:207).

Another source of confusion is that the usual wording of the problem statement asks
about the conditional probability of winning given which door is opened by the host,
as opposed to the overall or unconditional probability. These are mathematically
different questions and can have different answers depending on how the host chooses
which door to open if the player's initial choice is the car (Morgan et al., 1991;
Gillman 1992). For example, if the host opens Door 3 whenever possible then the
probability of winning by switching for players initially choosing Door 1 is 2/3
overall, but only 1/2 if the host opens Door 3. In its usual form the problem statement
does not specify this detail of the host's behavior, making the answer that switching
wins the car with probability 2/3 mathematically unjustified. Many commonly
presented solutions address the unconditional probability, ignoring which door the
host opens; Morgan et al. call these "false solutions" (1991).
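The distinction is easy to check numerically. Here is a small Python sketch (my addition) of the variant just described: the player picks Door 1, and a host who prefers Door 3 opens it whenever possible:

```python
import random

def trial():
    """Player picks Door 1; the host opens Door 3 whenever he can."""
    car = random.randint(1, 3)
    if car == 3:
        opened = 2   # forced: the host cannot reveal the car
    else:
        opened = 3   # car behind 1 or 2: Door 3 is available, and this host prefers it
    switch_wins = (car != 1)
    return opened, switch_wins

trials = [trial() for _ in range(100_000)]
overall = sum(w for _, w in trials) / len(trials)
door3 = [w for o, w in trials if o == 3]
print(overall)                  # ≈ 2/3: unconditional probability that switching wins
print(sum(door3) / len(door3))  # ≈ 1/2: conditional on the host opening Door 3
```

The unconditional answer is still 2/3, but conditioned on this particular host opening Door 3, switching wins only half the time.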

Conditional Probability with Examples for Data Science

Ashutosh Tripathi, Aug 15, 2019

As the name suggests, conditional probability is the probability of an event under some given condition. Conditioning reduces the sample space to the outcomes in which the conditioning event occurs.

For example, find the probability that a person subscribes to insurance given that they have taken a house loan. Here the sample space is restricted to the people who have taken a house loan.

To understand conditional probability, it is recommended to have an understanding of probability basics such as mutually exclusive and independent events; joint, union, and marginal probabilities; and probability vs. statistics. In case you want to revise those concepts, you can refer to them in Probability Basics for Data Science.

Wiki Definition:
In probability theory, conditional probability is a measure of the probability of an event occurring given that another event has occurred. If the event of interest is A and the event B is known or assumed to have occurred, "the conditional probability of A given B", or "the probability of A under the condition B", is usually written as P(A | B), or sometimes P_B(A) or P(A / B). — Wikipedia

Now the question arises: why use conditional probability, and what is its significance in data science?

Let's take a real-life example. The probability of selling a TV on a given normal day may be only 30%. But if the given day is Diwali, the chances of selling a TV are much higher. The conditional probability of selling a TV given that the day is Diwali might be 70%. We can represent those probabilities as P(TV sale on a random day) = 30% and P(TV sale | today is Diwali) = 70%.

So conditional probability helps data scientists get better results from a given data set, and it helps machine learning engineers build more accurate models for prediction.

Let's dive deeper into it.

The following table contains counts of people in different age groups who have and have not defaulted on loans.

Converting the above table into probabilities gives a second table. If we can express the data from a question in this tabular form, then conditioning reduces the sample space to a single row or a single column, and the rest of the sample space becomes irrelevant.

What is the probability that a person will not default on the loan given that he/she is middle-aged?

P(No | Middle-Aged) = 0.586/0.690 = 0.85 [referring to table 2, the probability form of the data]
P(No | Middle-Aged) = 27368/32219 = 0.85 [referring to table 1, the raw counts]

Notice that the numerator is the joint probability: the probability that a person does not default on the loan and is also middle-aged.

And the denominator is the marginal probability: the probability that a person is middle-aged. Hence we can also define conditional probability as the ratio of the joint probability to the marginal probability:

P(A|B) = P(A and B)/P(B)
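As a quick sketch (my addition, using only the numbers quoted above), the same conditional probability can be computed from either the probability table or the raw counts:

```python
# From table 2 (probabilities): conditional = joint / marginal.
p_no_and_middle = 0.586        # P(No default and Middle-Aged)
p_middle = 0.690               # P(Middle-Aged)
print(p_no_and_middle / p_middle)          # ≈ 0.85

# From table 1 (raw counts): the total population size cancels out.
count_no_and_middle = 27368    # middle-aged people who did not default
count_middle = 32219           # all middle-aged people
print(count_no_and_middle / count_middle)  # ≈ 0.85
```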

Again, let's ask the question a little differently by changing the order, as below.

What is the probability that a person is middle-aged given that he/she has not defaulted on the loan?

Now the sample space has changed to the row of persons who have not defaulted on the loan.

P(Middle-Aged | No) = 0.586/0.816 = 0.72 (order matters)

Did you notice? The probability changed when we changed the order of the events.

Hence, in conditional probability, order matters.
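A two-line check (my addition) makes the point: both directions share the same joint probability but divide by different marginals.

```python
p_no_and_middle = 0.586   # joint probability P(No and Middle-Aged)
p_middle = 0.690          # marginal P(Middle-Aged)
p_no = 0.816              # marginal P(No default)

print(p_no_and_middle / p_middle)  # P(No | Middle-Aged) ≈ 0.85
print(p_no_and_middle / p_no)      # P(Middle-Aged | No) ≈ 0.72
```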

Conditional Probability Visualization Using a Probability Tree

Explanation:
I have tried to explain the logic of each branch within the tree itself. The tree first splits on default status, P(No) = 0.816 and P(Yes) = 0.184, and then splits each branch by age group, e.g. P(Young | No) = 0.275 and P(Young | Yes) = 0.419. Now let's dive into the questions, which will show the value of a probability tree in calculating conditional probabilities.
P(Young and No)?

 Use the standard conditional probability formula: P(Young | No) = P(Young and No)/P(No), which implies:

 P(Young and No) = P(Young | No) * P(No)

 From the probability tree, we know P(Young | No) = 0.275 and P(No) = 0.816.

 Now all the values on the right side are known, so substituting them into the equation gives the desired probability:

 P(Young and No) = 0.275 * 0.816 = 0.2244 ≈ 0.224

P(No and Young)? (order is changed)

 P(No and Young) = P(Young and No) ≈ 0.224 [same as above]

 In joint probability, order does not matter.

P(Young)?
Look at all the branches associated with Young (i.e., ending in Young) and take the sum of the products of the probability values along each branch:
P(Young) = 0.816 * 0.275 + 0.184 * 0.419 = 0.301496 ≈ 0.302

P(No)?
P(No) = 0.816 (Directly from the tree)

P(Young | No)?
P(Young | No) = P(Young | not a loan defaulter) = 0.275 [see the tree]

P(No | Young)? [order changed]

 P(No | Young) = P(Young and No)/P(Young) [we have already calculated the right-side probabilities above]

 P(No | Young) = 0.2244/0.3015 ≈ 0.744
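Putting the tree questions together, here is a short Python sketch (my addition, using the branch values read off the tree) that reproduces all of the calculations above:

```python
# Branch probabilities read off the tree.
p_no = 0.816                # P(No default)
p_yes = 1 - p_no            # P(Yes default) = 0.184
p_young_given_no = 0.275    # P(Young | No)
p_young_given_yes = 0.419   # P(Young | Yes)

p_young_and_no = p_young_given_no * p_no               # joint: 0.2244
p_young = p_young_and_no + p_young_given_yes * p_yes   # total probability: ~0.3015
p_no_given_young = p_young_and_no / p_young            # Bayes: ~0.744

print(round(p_young_and_no, 4), round(p_young, 4), round(p_no_given_young, 4))
```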

Now let's explore the standard conditional probability formula.

From conditional probability, we know that

 P(A|B) = P(A and B)/P(B)

 P(A and B) = P(B) * P(A|B) ...[1]

Similarly,

 P(B|A) = P(B and A)/P(A) = P(A and B)/P(A) [in joint probability, order does not matter]

 P(A and B) = P(A) * P(B|A) ...[2]

From equations [1] and [2]:

 P(B) * P(A|B) = P(A) * P(B|A)

 P(A|B) = P(A) * P(B|A) / P(B) [also known as Bayes' Theorem]

Now if we want to find P(No | Young), we can use the above-derived formula directly: P(Young | No) as well as P(Young) can be read from the probability tree, and substituting them into the formula gives the result. We will learn more about Bayes' Theorem in the next post.
Explanation
I have tried to explain the given problem using a probability tree as shown above. However, in case anything is unclear, here is what is given, what is asked, and how to calculate it.

P(A becoming the CEO) = 0.2, P(B becoming the CEO) = 0.3, P(C becoming the CEO) = 0.4.

From the question, we need to deduce that the later probabilities are all conditional probabilities; this is the trick here. A candidate can only take beneficial decisions once they become the CEO. Hence we should read those probabilities as below:

 P(taking beneficial decisions | A is selected as CEO) = 0.5

 P(taking beneficial decisions | B is selected as CEO) = 0.45

 P(taking beneficial decisions | C is selected as CEO) = 0.6

So P(having beneficial decisions) is nothing but the total probability. Hence we calculate the sum of the products of the probability values along each branch ending in beneficial decisions (see the tree). Hence:

P(having beneficial decisions) = 0.2*0.5 + 0.3*0.45 + 0.4*0.6 = 0.475
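In code, the total probability is a one-line sum over the branches. A minimal sketch (my addition), assuming the values listed above:

```python
p_ceo = {"A": 0.2, "B": 0.3, "C": 0.4}          # P(candidate becomes CEO)
p_beneficial = {"A": 0.5, "B": 0.45, "C": 0.6}  # P(beneficial decisions | candidate is CEO)

# Law of total probability: sum over the branches of the tree.
p_total = sum(p_ceo[c] * p_beneficial[c] for c in p_ceo)
print(p_total)  # 0.475
```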
Explanation

 P(email comes to account 1) = 0.70

 P(email comes to account 2) = 0.20

 P(email comes to account 3) = 0.10

 P(email is spam | delivered to account 1) = P(Spam | Account 1) = 0.01

 P(email is spam | delivered to account 2) = P(Spam | Account 2) = 0.02

 P(email is spam | delivered to account 3) = P(Spam | Account 3) = 0.05

Using Bayes' Theorem,

P(Account 2 | email is spam) = P(Account 2) * P(Spam | Account 2)/P(Spam)

The numerator values are known. The denominator P(Spam) is the total probability, calculated by taking the sum of the products of the probability values along each branch ending in Spam [see the tree]:

P(Spam) = 0.70*0.01 + 0.20*0.02 + 0.10*0.05 = 0.016

P(Account 2) = 0.20 [given in the question]

P(Spam | Account 2) = 0.02 [given in the question]

Putting everything together: P(Account 2 | email is spam) = 0.20*0.02/(0.70*0.01 + 0.20*0.02 + 0.10*0.05) = 0.004/0.016

Answer: P(Account 2 | email is spam) = 0.25
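The same computation as a short Python sketch (my addition), assuming the delivery and spam rates given above:

```python
p_account = {1: 0.70, 2: 0.20, 3: 0.10}     # P(email delivered to account i)
p_spam_given = {1: 0.01, 2: 0.02, 3: 0.05}  # P(spam | account i)

# Total probability of spam: sum over the branches ending in Spam.
p_spam = sum(p_account[i] * p_spam_given[i] for i in p_account)  # 0.016

# Bayes' Theorem: P(Account 2 | spam).
print(p_account[2] * p_spam_given[2] / p_spam)  # 0.25
```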

So that is all about conditional probability for data science. In upcoming posts I will write about Bayes' Theorem in detail and about probability distributions, which will complete the Probability for Data Science series.

I hope you enjoyed the post.
