Conditional Probability Monty Hall Problem-with Examples for Data Science (1)
Suppose you're on a game show, and you're given the choice of three doors:
Behind one door is a car; behind the others, donkeys. You pick a door, say No.1, and
the host, who knows what's behind the doors, opens another door, say No.3, which
has a donkey. He then says to you, "Do you want to pick door No.2?" Is it to your
advantage to switch your choice? (Whitaker 1990)
Many people argue that "the two unopened doors are the same so they each will
contain the car with probability 1/2, and hence there is no point in switching." As we
will now show, this naive reasoning is incorrect. To compute the answer, we will
suppose that the host always chooses to show you a donkey. (When the problem and
the solution appeared in Parade, approximately 10,000 readers, including nearly 1,000
with Ph.D.s, wrote to the magazine claiming the published solution was wrong. Some
of the controversy arose because the Parade version of the problem is technically
ambiguous: it leaves certain aspects of the host's behavior unstated, for example
whether the host must open a door and must make the offer to switch.)
Assuming that you have picked door No.1, there are 3 cases:
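One way to enumerate the three cases is sketched below. It assumes the car is equally likely to be behind each door and that the host, when both remaining doors hide donkeys, opens one of them uniformly at random. The function name and these assumptions are illustrative and are not taken from the original problem statement.

```python
from fractions import Fraction

def switch_win_probability(pick=1, doors=(1, 2, 3)):
    """Enumerate the equally likely car locations for a fixed first pick."""
    win = Fraction(0)
    for car in doors:
        p_case = Fraction(1, len(doors))  # each case has probability 1/3
        # Doors the host may open: neither the player's pick nor the car.
        openable = [d for d in doors if d != pick and d != car]
        for opened in openable:
            p_open = Fraction(1, len(openable))  # uniform choice (assumption)
            # The door offered in the switch is the one left over.
            offered = next(d for d in doors if d not in (pick, opened))
            if offered == car:
                win += p_case * p_open
    return win

print(switch_win_probability())      # chance of winning by switching: 2/3
print(1 - switch_win_probability())  # chance of winning by staying: 1/3
```

Only the case where the car is behind door No.1 makes switching lose, and that case has probability 1/3.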
Notice that although it took a number of steps to compute this answer, it is "obvious". When we picked one of the
three doors initially we had probability 1/3 of picking the car, and since the host can always open a door with a
donkey the new information does not change our chance of winning.
The above argument may persuade you in the moment, but when you think about it
again by yourself, it is still somewhat confusing. So why is it so confusing?
Source of Confusion:
When first presented with the Monty Hall problem an overwhelming majority of
people assume that each door has an equal probability and conclude that switching
does not matter (Mueser and Granberg, 1999). Out of 228 subjects in one study, only
13% chose to switch (Granberg and Brown, 1995:713).
In her book The Power of Logical Thinking, vos Savant (1996:15) quotes cognitive
psychologist Massimo Piattelli-Palmarini as saying "... no other statistical puzzle
comes so close to fooling all the people all the time" and "that even Nobel physicists
systematically give the wrong answer, and that they insist on it, and they are ready to
berate in print those who propose the right answer."
Most statements of the problem, notably the one in Parade Magazine, do not match
the rules of the actual game show (Krauss and Wang, 2003:9), and do not fully
specify the host's behavior or that the car's location is randomly selected (Granberg
and Brown, 1995:712). Krauss and Wang (2003:10) conjecture that people make the
standard assumptions even if they are not explicitly stated. Although these issues are
mathematically significant, even when controlling for these factors nearly all people
still think each of the two unopened doors has an equal probability and conclude
switching does not matter (Mueser and Granberg, 1999). This "equal probability"
assumption is a deeply rooted intuition (Falk 1992:202). People strongly tend to think
probability is evenly distributed across as many unknowns as are present, whether it is
or not (Fox and Levav, 2004:637).
A competing deeply rooted intuition at work in the Monty Hall problem is the belief
that exposing information that is already known does not affect probabilities (Falk
1992:207). This intuition is the basis of solutions to the problem that assert the host's
action of opening a door does not change the player's initial 1/3 chance of selecting
the car. For the fully explicit problem this intuition leads to the correct numerical
answer, 2/3 chance of winning the car by switching, but leads to the same solution for
other variants where this answer is not correct (Falk 1992:207).
Another source of confusion is that the usual wording of the problem statement asks
about the conditional probability of winning given which door is opened by the host,
as opposed to the overall or unconditional probability. These are mathematically
different questions and can have different answers depending on how the host chooses
which door to open if the player's initial choice is the car (Morgan et al., 1991;
Gillman 1992). For example, if the host opens Door 3 whenever possible, then the
probability of winning by switching for players initially choosing Door 1 is 2/3
overall, but only 1/2 given that the host has opened Door 3. In its usual form the problem statement
does not specify this detail of the host's behavior, making the answer that switching
wins the car with probability 2/3 mathematically unjustified. Many commonly
presented solutions address the unconditional probability, ignoring which door the
host opens; Morgan et al. call these "false solutions" (1991).
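The Door 3 example above can be checked exactly. The sketch below assumes the player has picked Door 1, the car is placed uniformly at random, and the host never reveals the car. The parameter p_open3_when_car_at_1 is a hypothetical name for the probability that the host opens Door 3 when both Door 2 and Door 3 hide donkeys.

```python
from fractions import Fraction

def switch_probabilities(p_open3_when_car_at_1):
    """Player picks Door 1. Return (overall, given host opens Door 3)."""
    q = Fraction(p_open3_when_car_at_1)
    third = Fraction(1, 3)
    # Joint probabilities of (car location, host opens Door 3).
    car1_open3 = third * q  # host has a free choice between Doors 2 and 3
    car2_open3 = third      # host is forced to open Door 3
    # Car behind Door 3: host must open Door 2, so it contributes nothing.
    p_open3 = car1_open3 + car2_open3
    # After Door 3 is opened, switching means taking Door 2.
    conditional = car2_open3 / p_open3
    # Ignoring which door is opened, switching wins if the car is at 2 or 3.
    overall = third + third
    return overall, conditional

print(switch_probabilities(1))               # host opens Door 3 whenever possible
print(switch_probabilities(Fraction(1, 2)))  # host chooses at random
```

With the first call the conditional probability is 1/2 while the overall probability stays 2/3; with a host who chooses at random, both are 2/3.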
Wiki Definition:
In probability theory, conditional probability is a
measure of the probability of an event occurring given
that another event has occurred. If the event of interest
is A and the event B is known or assumed to have
occurred, "the conditional probability of A given B", or
"the probability of A under the condition B", is usually
written as P(A | B), or sometimes P_B(A) or P(A / B).
(Wikipedia)
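As a formula, the definition is commonly written P(A | B) = P(A and B) / P(B), whenever P(B) > 0. A minimal sketch for equally likely outcomes follows. The die example and the function name are illustrative and are not part of the quoted definition.

```python
from fractions import Fraction

def conditional_probability(event_a, event_b, sample_space):
    """P(A | B) = P(A and B) / P(B), assuming equally likely outcomes."""
    b = event_b & sample_space
    if not b:
        raise ValueError("P(B) must be positive")
    # With equally likely outcomes the ratio reduces to counting outcomes.
    return Fraction(len(event_a & b), len(b))

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
greater_than_3 = {4, 5, 6}

# Of the outcomes 4, 5, 6, two are even.
print(conditional_probability(even, greater_than_3, die))  # 2/3
```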
P(Young)?
Look at all the branches that end in Young and take the
sum of the products of the probability values along each
branch:
P(Young) = 0.816 * 0.275 + 0.184 * 0.419 = 0.301496
≈ 0.301
P(No)?
P(No) = 0.816 (Directly from the tree)
P(Young | No)?
P(Young | No) = P(Young | Not a loan defaulter) = 0.275
(see the tree)
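The three tree calculations can be reproduced with a short sketch. The branch values are the ones quoted in the text. The dictionary names are hypothetical, and reading 0.184 as P(Yes) and 0.419 as P(Young | Yes) is an assumption inferred from the P(Young) formula.

```python
# Branch probabilities read off the tree in the text.
# "No"/"Yes" = not a loan defaulter / loan defaulter (labels inferred).
p_default = {"No": 0.816, "Yes": 0.184}
p_young_given = {"No": 0.275, "Yes": 0.419}

# Law of total probability: sum the products along branches ending in Young.
p_young = sum(p_default[d] * p_young_given[d] for d in p_default)

# Joint probability along the single branch No -> Young.
p_young_and_no = p_default["No"] * p_young_given["No"]

print(round(p_young, 6))         # 0.301496
print(round(p_young_and_no, 4))  # 0.2244
# Dividing the joint by P(No) recovers the conditional shown on the tree.
print(round(p_young_and_no / p_default["No"], 3))  # 0.275
```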