Stat 110 Midterm Review, Fall 2011

Prof. Joe Blitzstein (Department of Statistics, Harvard University)

1 General Information
The midterm will be in class on Wednesday, October 12. There is no alternate time
for the exam, so please be there and arrive on time! Cell phones must be off, so it
is a good idea to bring a watch. No books, notes, or calculators are allowed, except
that you may bring two sheets of standard-sized paper (8.5” x 11”) with whatever
you want written on them (two-sided): notes, theorems, formulas, information about
the important distributions, etc.
There will be 4 problems, weighted equally. Many of the parts can be done
quickly if you have a good understanding of the ideas covered in class (e.g., seeing
if you can use Bayes’ Rule or understand what independence means), and for many
you can just write down an answer without needing to simplify. None will require
long or messy calculations. They are not arranged in order of increasing difficulty.
Since it is a short exam, make sure not to spend too long on any one problem.
Suggestions for studying: review all the homeworks and read the solutions, study
your lecture notes (and possibly relevant sections from either book), the strategic
practice problems, and this handout. Solving practice problems (which means trying
hard to work out the details yourself, not just trying for a minute and then looking
at the solution!) is extremely important.

2 Topics
• Combinatorics: multiplication rule, tree diagrams, binomial coefficients, permutations and combinations, inclusion-exclusion, story proofs.

• Basic Probability: sample spaces, events, axioms of probability, equally likely outcomes, inclusion-exclusion, unions, intersections, and complements.

• Conditional Probability: definition and meaning, writing P(A_1 ∩ A_2 ∩ · · · ∩ A_n) as a product, Bayes’ Rule, Law of Total Probability, thinking conditionally, independence vs. conditional independence.

• Random Variables: definition, meaning of X = x, stories, indicator r.v.s, prob-
ability mass functions (PMFs), probability density functions (PDFs), cumula-
tive distribution functions (CDFs), independence, Poisson approximation.

• Expected Value and Variance: definitions, linearity, standard deviation, Law of the Unconscious Statistician (LOTUS).

• Important Discrete Distributions: Bernoulli, Binomial, Geometric, Negative Binomial, Hypergeometric, Poisson.

• Important Continuous Distributions: Uniform, Normal.

• General Concepts: stories, symmetry, discrete vs. continuous, conditional probability is the soul of statistics, checking simple and extreme cases.

• Important Examples: birthday problem, matching problem, Newton-Pepys problem, Monty Hall problem, testing for a rare disease, elk problem (capture-recapture), gambler’s ruin, Simpson’s paradox, St. Petersburg paradox.

3 Important Distributions
The eight most important distributions we have discussed so far are listed below,
each with its PMF/PDF, mean, and variance. This table will be provided on the last
page of the midterm. As usual, we let 0 < p < 1 and q = 1 − p. Each of these
distributions is important because it has a natural, useful story, so understanding
these stories (and recognizing equivalent stories) is crucial. It is also important to
know how these distributions are related to each other. For example, Bern(p) is the
same as Bin(1, p), and Bin(n, p) is approximately Pois(λ) if n is large, p is small,
and λ = np is moderate.

Name        Param.    PMF or PDF                                                        Mean           Variance
Bernoulli   p         P(X = 1) = p, P(X = 0) = q                                        p              pq
Binomial    n, p      (n choose k) p^k q^{n−k}, for k ∈ {0, 1, . . . , n}               np             npq
Geometric   p         q^k p, for k ∈ {0, 1, 2, . . . }                                  q/p            q/p²
NegBinom    r, p      (r+n−1 choose r−1) p^r q^n, for n ∈ {0, 1, 2, . . . }             rq/p           rq/p²
Hypergeom   w, b, n   (w choose k)(b choose n−k)/(w+b choose n), for k ∈ {0, 1, . . . , n}   µ = nw/(w+b)   ((w+b−n)/(w+b−1)) n (µ/n)(1 − µ/n)
Poisson     λ         e^{−λ} λ^k/k!, for k ∈ {0, 1, 2, . . . }                          λ              λ
Uniform     a < b     1/(b − a), for x ∈ (a, b)                                         (a + b)/2      (b − a)²/12
Normal      µ, σ²     (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}                                    µ              σ²
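The Poisson approximation mentioned above is easy to check numerically. Here is a quick Python sketch (assuming SciPy is installed; the values n = 1000 and p = 0.002 are just for illustration):

    from scipy.stats import binom, poisson

    n, p = 1000, 0.002        # n large, p small, so lambda = np = 2 is moderate
    lam = n * p
    for k in range(5):
        print(k, binom.pmf(k, n, p), poisson.pmf(k, lam))
    # The Binomial and Poisson PMFs agree to several decimal places.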

4 Some Useful Formulas
4.1 De Morgan’s Laws
(A_1 ∪ A_2 ∪ · · · ∪ A_n)^c = A_1^c ∩ A_2^c ∩ · · · ∩ A_n^c
(A_1 ∩ A_2 ∩ · · · ∩ A_n)^c = A_1^c ∪ A_2^c ∪ · · · ∪ A_n^c

4.2 Complements
P(A^c) = 1 − P(A)

4.3 Unions
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

P(A_1 ∪ A_2 ∪ · · · ∪ A_n) = Σ_{i=1}^{n} P(A_i), if the A_i are disjoint

P(A_1 ∪ A_2 ∪ · · · ∪ A_n) ≤ Σ_{i=1}^{n} P(A_i)

P(A_1 ∪ A_2 ∪ · · · ∪ A_n) = Σ_{k=1}^{n} (−1)^{k+1} Σ_{i_1 < i_2 < · · · < i_k} P(A_{i_1} ∩ A_{i_2} ∩ · · · ∩ A_{i_k})   (Inclusion-Exclusion)

4.4 Intersections
P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
P(A_1 ∩ A_2 ∩ · · · ∩ A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1, A_2) · · · P(A_n|A_1, . . . , A_{n−1})

4.5 Law of Total Probability


If E_1, E_2, . . . , E_n are a partition of the sample space S (i.e., they are disjoint and
their union is all of S) and P(E_j) ≠ 0 for all j, then

P(B) = Σ_{j=1}^{n} P(B|E_j) P(E_j)

4.6 Bayes’ Rule
P(A|B) = P(B|A)P(A) / P(B)
Often the denominator P (B) is then expanded by the Law of Total Probability.

4.7 Expected Value and Variance


Expected value is linear: for any random variables X and Y and constant c,
E(X + Y ) = E(X) + E(Y )
E(cX) = cE(X)
It is not true in general that Var(X + Y) = Var(X) + Var(Y). For example, let
X be Bernoulli(1/2) and Y = 1 − X (note that Y is also Bernoulli(1/2)). Then
Var(X) + Var(Y) = 1/4 + 1/4 = 1/2, but Var(X + Y) = Var(1) = 0 since X + Y is
always equal to the constant 1. (We will see later exactly when the variance of the
sum is the sum of the variances.)
Constants come out from variance as the constant squared:

Var(cX) = c² Var(X)

The variance of X is defined as E(X − EX)², but often it is easier to compute using
the following:

Var(X) = E(X²) − (EX)²
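These variance facts are easy to see in simulation. Here is a Python sketch (assuming NumPy; the seed and sample size are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.binomial(1, 0.5, size=10**6)    # X ~ Bern(1/2)
    Y = 1 - X                               # Y ~ Bern(1/2), but dependent on X
    print(X.var() + Y.var())                # about 0.5
    print((X + Y).var())                    # exactly 0: X + Y is always 1
    print((3 * X).var(), 9 * X.var())       # Var(cX) = c^2 Var(X)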

4.8 Law of the Unconscious Statistician (LOTUS)


Let X be a discrete random variable and h be a real-valued function. Then Y = h(X)
is a random variable. To compute E(Y) using the definition of expected value, we
would need to first find the PMF of Y, and then use E(Y) = Σ_y y P(Y = y). The
Law of the Unconscious Statistician says we can use the PMF of X directly:

E(h(X)) = Σ_x h(x) P(X = x),

where the sum is over all possible values of X. Similarly, for X a continuous r.v. with
PDF f_X, we can find the expected value of Y = h(X) using the PDF of X, without
having to find the PDF of Y:

E(h(X)) = ∫_{−∞}^{∞} h(x) f_X(x) dx
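For instance, LOTUS can be checked numerically for a discrete distribution (a Python sketch, assuming NumPy and SciPy; X ∼ Bin(10, 1/2) and h(x) = x² are illustrative choices):

    import numpy as np
    from scipy.stats import binom

    ks = np.arange(11)
    lotus = np.sum(ks**2 * binom.pmf(ks, 10, 0.5))   # E(X^2) via LOTUS
    # Known value: E(X^2) = Var(X) + (EX)^2 = 2.5 + 25 = 27.5
    print(lotus)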

5 Stat 110 Midterm from 2007
1. Alice and Bob have just met, and wonder whether they have a mutual friend.
Each has 50 friends, out of 1000 other people who live in their town. They think
that it’s unlikely that they have a friend in common, saying “each of us is only friends
with 5% of the people here, so it would be very unlikely that our two 5%’s overlap.”
Assume that Alice’s 50 friends are a random sample of the 1000 people (equally likely
to be any 50 of the 1000), and similarly for Bob. Also assume that knowing who
Alice’s friends are gives no information about who Bob’s friends are.
(a) Compute the expected number of mutual friends Alice and Bob have (simplify).

(b) Let X be the number of mutual friends they have. Find the PMF of X.

(c) Is the distribution of X one of the important distributions we have looked at and
if so, which one? Note: even without solving (b), you can get credit by giving clear
reasons for or against each of the important distributions.

2. Two coins are in a hat. The coins look alike, but one coin is fair (with probability
1/2 of Heads), while the other coin is biased, with probability 1/4 of Heads. One of
the coins is randomly pulled from the hat, without knowing which of the two it is.
Call the chosen coin “Coin C”.
(a) Coin C is tossed twice, showing Heads both times. Given this information, what
is the probability that Coin C is the fair coin? (Simplify.)

(b) Are the events “first toss of Coin C is Heads” and “second toss of Coin C is
Heads” independent? Explain briefly.

(c) Find the probability that in 10 flips of Coin C, there will be exactly 3 Heads.
(The coin is equally likely to be either of the 2 coins; do not assume it already landed
Heads twice as in (a). Do not simplify.)

3. Five people have just won a $100 prize, and are deciding how to divide the $100
up between them. Assume that whole dollars are used, not cents. Also, for example,
giving $50 to the first person and $10 to the second is different from vice versa.
(a) How many ways are there to divide up the $100, such that each gets at least $10?
Hint: there are (n+k−1 choose k) ways to put k indistinguishable balls into n distinguishable
boxes; you can use this fact without deriving it.

(b) Assume that the $100 is randomly divided up, with all of the possible allocations
counted in (a) equally likely. Find the expected amount of money that the first
person receives (justify your reasoning).

(c) Let A_j be the event that the jth person receives more than the first person
(for 2 ≤ j ≤ 5), when the $100 is randomly allocated as in (b). Are A_2 and A_3
independent? (No explanation needed for this.) Express I_{A_2 ∩ A_3} and I_{A_2 ∪ A_3} in terms
of I_{A_2} and I_{A_3} (where I_A is the indicator random variable of any event A).

4. (a) Let X ∼ Pois(λ), with λ > 0. Find E(X!), the average factorial of X.
(Simplify, and specify what condition on λ is needed to make the expectation finite.)

(b) The owner of a certain website is studying the distribution of the number of
visitors to the site. Every day, a million people independently decide whether to visit
the site, with probability p = 2 × 10^{−6} of visiting. Give a good, simple approximation
of the probability of getting at least two visitors on a particular day (simplify; your
answer should not involve series).

(c) In the scenario of (b), approximately how many days will it take on average until
there is a day with at least two visitors (including the day itself)?

5. Alice flips a fair coin n times and Bob flips another fair coin n + 1 times, resulting
in independent X ∼ Bin(n, 1/2) and Y ∼ Bin(n + 1, 1/2).
(a) Let V = min(X, Y) be the smaller of X and Y, and let W = max(X, Y) be the
larger of X and Y. (If X = Y, then V = W = X = Y.) Find E(V) + E(W) in
terms of n (simplify).

(b) Is it true that P(X < Y) = P(n − X < n + 1 − Y)? Explain why or why not.

(c) Compute P(X < Y) (simplify). Hint: use (b) and that X and Y are integers.

6 Stat 110 Midterm from 2008
1. The gambler de Méré asked Pascal whether it is more likely to get at least one
six in 4 rolls of a die, or to get at least one double-six in 24 rolls of a pair of dice.
Continuing this pattern, suppose that a group of n fair dice is rolled 4 · 6^{n−1} times.
(a) Find the expected number of times that “all sixes” is achieved (i.e., how often
among the 4 · 6^{n−1} rolls it happens that all n dice land 6 simultaneously). (Simplify.)

(b) Give a simple but accurate approximation of the probability of having at least
one occurrence of “all sixes”, for n large (in terms of e but not n).

(c) de Méré finds it tedious to re-roll so many dice. So after one normal roll of the
n dice, in going from one roll to the next, with probability 6/7 he leaves the dice in
the same configuration and with probability 1/7 he re-rolls. For example, if n = 3
and the 7th roll is (3, 1, 4), then 6/7 of the time the 8th roll remains (3, 1, 4) and
1/7 of the time the 8th roll is a new random outcome. Does the expected number
of times that “all sixes” is achieved stay the same, increase, or decrease (compared
with (a))? Give a short but clear explanation.

2. To battle against spam, Bob installs two anti-spam programs. An email arrives,
which is either legitimate (event L) or spam (event L^c), and which program j marks
as legitimate (event M_j) or marks as spam (event M_j^c) for j ∈ {1, 2}. Assume that
10% of Bob’s email is legitimate and that the two programs are each “90% accurate”
in the sense that P(M_j|L) = P(M_j^c|L^c) = 9/10. Also assume that given whether an
email is spam, the two programs’ outputs are conditionally independent.
(a) Find the probability that the email is legitimate, given that the 1st program
marks it as legitimate (simplify).

(b) Find the probability that the email is legitimate, given that both programs mark
it as legitimate (simplify).

(c) Bob runs the 1st program and M_1 occurs. He updates his probabilities and
then runs the 2nd program. Let P̃(A) = P(A|M_1) be the updated probability
function after running the 1st program. Explain briefly in words whether or not
P̃(L|M_2) = P(L|M_1 ∩ M_2): is conditioning on M_1 ∩ M_2 in one step equivalent to
first conditioning on M_1, then updating probabilities, and then conditioning on M_2?

3. (a) Let X_1, X_2, . . . be independent N(0, 4) r.v.s, and let J be the smallest value
of j such that X_j > 4 (i.e., the index of the first X_j exceeding 4). In terms of the
standard Normal CDF Φ, find E(J) (simplify).

(b) Let f and g be PDFs with f(x) > 0 and g(x) > 0 for all x. Let X be a random
variable with PDF f. Find the expected value of the ratio g(X)/f(X) (simplify).

(c) Define F(x) = e^{−e^{−x}}. This is a CDF (called the Gumbel distribution) and is a
continuous, strictly increasing function. Let X have CDF F, and define W = F(X).
What are the mean and variance of W (simplify)?

4. (a) Find E(2^X) for X ∼ Pois(λ) (simplify).

(b) Let X and Y be independent Pois(λ) r.v.s, and T = X + Y. Later in the course,
we will show that T ∼ Pois(2λ); here you may use this fact. Find the conditional
distribution of X given T = n, i.e., find the conditional PMF P(X = k|T = n)
(simplify). Which “important distribution” is this conditional distribution, if any?

(c) Again let X and Y be Pois(λ) r.v.s, and T = X + Y, but now assume that
X and Y are not independent, and in fact X = Y. Prove or disprove the claim that
T ∼ Pois(2λ) in this scenario.

7 Stat 110 Midterm from 2009
1. (a) Let X ∼ Pois(λ). Find E(e^X) (simplify).

(b) The numbers 1, 2, 3, . . . , n are listed in some random order (with all n! permuta-
tions equally likely). An inversion occurs each time a pair of numbers is out of order,
i.e., the larger number is earlier in the list than the smaller number. For example,
3, 1, 4, 2 has 3 inversions (3 before 1, 3 before 2, 4 before 2). Find the expected
number of inversions in the list (simplify).

2. Consider four nonstandard dice (the Efron dice), whose sides are labeled as follows
(the 6 sides on each die are equally likely).
A: 4, 4, 4, 4, 0, 0
B: 3, 3, 3, 3, 3, 3
C: 6, 6, 2, 2, 2, 2
D: 5, 5, 5, 1, 1, 1
These four dice are each rolled once. Let A be the result for die A, B be the result
for die B, etc.
(a) Find P (A > B), P (B > C), P (C > D), and P (D > A).

(b) Is the event A > B independent of the event B > C? Is the event B > C
independent of the event C > D? Explain.

3. A discrete distribution has the memoryless property if for X a random variable
with that distribution, P(X ≥ j + k|X ≥ j) = P(X ≥ k) for all nonnegative integers
j, k.
(a) If X has a memoryless distribution with CDF F and PMF p_i = P(X = i), find
an expression for P(X ≥ j + k) in terms of F(j), F(k), p_j, p_k.

(b) Name one important discrete distribution we have studied so far which has the
memoryless property. Justify your answer with a clear interpretation in words or
with a computation.

4. The book Red State, Blue State, Rich State, Poor State (by Andrew Gelman)
discusses the following election phenomenon: within any U.S. state, a wealthy voter
is more likely to vote for a Republican than a poor voter; yet the wealthier states
tend to favor Democratic candidates! In short: rich individuals (in any state) tend
to vote for Republicans, while states with a higher percentage of rich people tend to
favor Democrats.
(a) Assume for simplicity that there are only 2 states (called Red and Blue), each
of which has 100 people, and that each person is either rich or poor, and either a
Democrat or a Republican. Make up numbers consistent with the above, showing
how this phenomenon is possible, by giving a 2 by 2 table for each state (listing how
many people in each state are rich Democrats, etc.).

(b) In the setup of (a) (not necessarily with the numbers you made up there), let
D be the event that a randomly chosen person is a Democrat (with all 200 people
equally likely), and B be the event that the person lives in the Blue State. Suppose
that 10 people move from the Blue State to the Red State. Write P_old and P_new for
probabilities before and after they move. Assume that people do not change parties,
so we have P_new(D) = P_old(D). Is it possible that both P_new(D|B) > P_old(D|B) and
P_new(D|B^c) > P_old(D|B^c) are true? If so, explain how it is possible and why it does
not contradict the law of total probability P(D) = P(D|B)P(B) + P(D|B^c)P(B^c);
if not, show that it is impossible.

8 Stat 110 Midterm from 2010
1. A family has two children. The genders of the first-born and second-born are
independent (with boy and girl equally likely), and which seasons the children were
born in are independent, with all 4 seasons equally likely.
(a) Find the probability that both children are girls, given that a randomly chosen
one of the two is a girl who was born in winter (simplify).

(b) Find the probability that both children are girls, given that at least one of the
two is a girl who was born in winter (simplify).

2. In each day that the “Mass Cash” lottery is run in Massachusetts, 5 of the integers
from 1 to 35 are chosen (randomly and without replacement).
(a) When playing this lottery, find the probability of guessing exactly 3 numbers
right, given that you guess at least 1 of the numbers right (leave your answer in
terms of binomial coefficients).

(b) Find an exact expression for the expected number of days needed so that all of
the (35 choose 5) possible lottery outcomes will have occurred (leave your answer as a sum,
which can involve binomial coefficients).

(c) Approximate the probability that after 50 days of the lottery, every number from
1 to 35 has been picked at least once (don’t simplify, but your answer shouldn’t
involve a sum).

3. Let U ∼ Unif(0, 1), and X = ln(U/(1 − U)).
(a) Write down (but do not compute) an integral giving E(X 2 ).

(b) Find the CDF of X (simplify).

(c) Find E(X) without using calculus (simplify).


Hint: 1 − U has the same distribution as U.

4. Let X_1, X_2, X_3, . . . , X_10 be the total number of inches of rain in Boston in October
of 2011, 2012, 2013, . . . , 2020, with these r.v.s independent N(µ, σ²). (Of course,
rainfall can’t be negative, but µ and σ are such that it is extremely likely that all the
X_j’s are positive.) We say that a record value is set in a certain year if the rainfall
is greater than in all the previous years (going back to 2011; so by definition, a record
is always set in the first year, 2011).
(a) On average, how many of these 10 years will set record values? (Your answer can
be a sum but the terms should be simple.)

(b) Is the indicator of whether the year 2013 sets a record independent of the indicator
of whether the year 2014 sets a record? (Justify briefly.)

(c) Later in the course, we will show that if Y_1 ∼ N(µ_1, σ_1²) and Y_2 ∼ N(µ_2, σ_2²) are
independent, then Y_1 − Y_2 ∼ N(µ_1 − µ_2, σ_1² + σ_2²). Using this fact, find the probability
that the October 2014 rainfall will be more than double the October 2013 rainfall in
Boston, in terms of Φ.

Stat 110 Midterm Review Solutions, Fall 2011
Prof. Joe Blitzstein (Department of Statistics, Harvard University)

Here are solutions to the midterm review problems. Please try your best to solve a
problem before reading the solution. Good luck!

1 Stat 110 Midterm from 2007


1. Alice and Bob have just met, and wonder whether they have a mutual friend.
Each has 50 friends, out of 1000 other people who live in their town. They think
that it’s unlikely that they have a friend in common, saying “each of us is only friends
with 5% of the people here, so it would be very unlikely that our two 5%’s overlap.”
Assume that Alice’s 50 friends are a random sample of the 1000 people (equally likely
to be any 50 of the 1000), and similarly for Bob. Also assume that knowing who
Alice’s friends are gives no information about who Bob’s friends are.
(a) Compute the expected number of mutual friends Alice and Bob have (simplify).
Let I_j be an indicator r.v. for the jth person being a mutual friend. Then

E(Σ_{j=1}^{1000} I_j) = 1000 E(I_1) = 1000 P(I_1 = 1) = 1000 · (5/100)² = 2.5.
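A simulation agrees (a Python sketch, assuming NumPy; the seed and number of trials are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    sims, mutual = 10**4, []
    for _ in range(sims):
        alice = rng.choice(1000, size=50, replace=False)
        bob = rng.choice(1000, size=50, replace=False)
        mutual.append(np.intersect1d(alice, bob).size)
    print(np.mean(mutual))    # close to 2.5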

(b) Let X be the number of mutual friends they have. Find the PMF of X.
Condition on who Alice’s friends are, and then count the number of ways that
Bob can be friends with exactly k of them. This gives

P(X = k) = (50 choose k)(950 choose 50−k) / (1000 choose 50)

for 0 ≤ k ≤ 50 (and 0 otherwise).


(c) Is the distribution of X one of the 5 important distributions we have looked at
and if so, which one? Note: even without solving (b), you can get credit by giving
clear reasons for or against each of the 5 distributions.
Yes, it is the Hypergeometric distribution, as shown by the PMF from (b) or by
thinking of “tagging” Alice’s friends (like the elk) and then seeing how many tagged
people there are among Bob’s friends.

2. Two coins are in a hat. The coins look alike, but one coin is fair (with probability
1/2 of Heads), while the other coin is biased, with probability 1/4 of Heads. One of
the coins is randomly pulled from the hat, without knowing which of the two it is.
Call the chosen coin “Coin C”.
(a) Coin C is tossed twice, showing Heads both times. Given this information, what
is the probability that Coin C is the fair coin? (Simplify.)
By Bayes’ Rule,

P(fair|HH) = P(HH|fair)P(fair) / P(HH) = (1/4)(1/2) / ((1/4)(1/2) + (1/16)(1/2)) = 4/5.
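A simulation agrees with the 4/5 answer (a Python sketch, assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(0)
    sims = 10**6
    fair = rng.random(sims) < 0.5                 # which coin was drawn
    p_heads = np.where(fair, 0.5, 0.25)
    hh = (rng.random(sims) < p_heads) & (rng.random(sims) < p_heads)
    print(fair[hh].mean())                        # close to 4/5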

(b) Are the events “first toss of Coin C is Heads” and “second toss of Coin C is
Heads” independent? Explain briefly.
They’re not independent: the first toss being Heads is evidence in favor of the
coin being the fair coin, giving information about probabilities for the second toss.

(c) Find the probability that in 10 flips of Coin C, there will be exactly 3 Heads.
(The coin is equally likely to be either of the 2 coins; do not assume it already landed
Heads twice as in (a). Do not simplify.)
Let X be the number of Heads in 10 tosses. By the Law of Total Probability
(conditioning on which of the two coins C is),

P(X = 3) = P(X = 3|fair)P(fair) + P(X = 3|biased)P(biased)
         = (10 choose 3)(1/2)^{10} (1/2) + (10 choose 3)(1/4)³(3/4)⁷ (1/2)
         = (1/2)(10 choose 3)(1/2^{10} + 3⁷/4^{10}).

3. Five people have just won a $100 prize, and are deciding how to divide the $100
up between them. Assume that whole dollars are used, not cents. Also, for example,
giving $50 to the first person and $10 to the second is different from vice versa.
(a) How many ways are there to divide up the $100, such that each gets at least $10?
Hint: there are (n+k−1 choose k) ways to put k indistinguishable balls into n distinguishable
boxes; you can use this fact without deriving it.
Give each person $10, and then distribute the remaining $50 arbitrarily. By the
hint (thinking of people as boxes and dollars as balls!), the number of ways is

(5 + 50 − 1 choose 50) = (54 choose 50) = (54 choose 4).

(b) Assume that the $100 is randomly divided up, with all of the possible allocations
counted in (a) equally likely. Find the expected amount of money that the first
person receives (justify your reasoning).
Let X_j be the amount that person j gets. By symmetry, E(X_j) is the same for all j. But
X_1 + · · · + X_5 = 100, so by linearity 100 = 5 E(X_1). Thus, E(X_1) is $20.

(c) Let A_j be the event that the jth person receives more than the first person
(for 2 ≤ j ≤ 5), when the $100 is randomly allocated as in (b). Are A_2 and A_3
independent? (No explanation needed for this.) Express I_{A_2 ∩ A_3} and I_{A_2 ∪ A_3} in terms
of I_{A_2} and I_{A_3} (where I_A is the indicator random variable of any event A).
The events A_2 and A_3 are not independent.
By definition, I_{A_2 ∩ A_3} is 1 exactly when I_{A_2}, I_{A_3} are both 1. So

I_{A_2 ∩ A_3} = I_{A_2} I_{A_3}.

As in inclusion-exclusion,

I_{A_2 ∪ A_3} = I_{A_2} + I_{A_3} − I_{A_2 ∩ A_3}.

By the above, this is I_{A_2} + I_{A_3} − I_{A_2} I_{A_3}.

4. (a) Let X ∼ Pois(λ), with λ > 0. Find E(X!), the average factorial of X.
(Simplify, and specify what condition on λ is needed to make the expectation finite.)
By LOTUS,

E(X!) = Σ_{k=0}^{∞} k! e^{−λ} λ^k/k! = e^{−λ} Σ_{k=0}^{∞} λ^k = e^{−λ}/(1 − λ)

for 0 < λ < 1.
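For a specific λ, this can be verified by simulation (a Python sketch, assuming NumPy and SciPy; λ = 1/2 is an illustrative choice):

    import numpy as np
    from scipy.special import factorial

    rng = np.random.default_rng(0)
    lam = 0.5
    X = rng.poisson(lam, size=10**6)
    print(factorial(X).mean())                 # simulated E(X!)
    print(np.exp(-lam) / (1 - lam))            # exact answer, about 1.213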


(b) The owner of a certain website is studying the distribution of the number of
visitors to the site. Every day, a million people independently decide whether to visit
the site, with probability p = 2 × 10^{−6} of visiting. Give a good, simple approximation
of the probability of getting at least two visitors on a particular day (simplify; your
answer should not involve series).
Let X be the number of visitors. Here n = 10⁶ is large, p is small, and np = 2 is
moderate. So the Pois(2) distribution gives a good approximation, and

P(X ≥ 2) = 1 − P(X < 2) ≈ 1 − e^{−2} − e^{−2} · 2 = 1 − 3/e².

(c) In the scenario of (b), approximately how many days will it take on average until
there is a day with at least two visitors (including the day itself)?
Let T be the number of days needed, so T − 1 is Geometric with parameter the
probability found in (b) (using the convention that the Geometric starts at 0). Then
E(T − 1) = (1 − p_2)/p_2, where p_2 is the probability from (b). Thus,

E(T) = 1/p_2 ≈ (1 − 3/e²)^{−1}.

5. Alice flips a fair coin n times and Bob flips another fair coin n + 1 times, resulting
in independent X ∼ Bin(n, 1/2) and Y ∼ Bin(n + 1, 1/2).
(a) Let V = min(X, Y) be the smaller of X and Y, and let W = max(X, Y) be the
larger of X and Y. (If X = Y, then V = W = X = Y.) Find E(V) + E(W) in
terms of n (simplify).
Note that V + W = X + Y (adding the smaller and larger is the same as adding
both numbers). So by linearity,

E(V) + E(W) = E(V + W) = E(X + Y) = E(X) + E(Y) = (2n + 1)/2 = n + 1/2.

(b) Is it true that P(X < Y) = P(n − X < n + 1 − Y)? Explain why or why not.
Yes: n − X ∼ Bin(n, 1/2) (if a fair coin is flipped n times, then the number of
Heads and the number of Tails have the same distribution). Similarly, n + 1 − Y has
the same distribution as Y, so the equation is true.
(c) Compute P(X < Y) (simplify). Hint: use (b) and that X and Y are integers.
Simplifying,

P(X < Y) = P(n − X < n + 1 − Y) = P(Y < X + 1) = P(Y ≤ X)

since X and Y are integers (e.g., Y < 5 is equivalent to Y ≤ 4). But Y ≤ X is the
complement of X < Y, so P(X < Y) = 1 − P(X < Y). Thus, P(X < Y) = 1/2.
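Both (a) and (c) are easy to check by simulation (a Python sketch, assuming NumPy; n = 7 is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(0)
    n, sims = 7, 10**6
    X = rng.binomial(n, 0.5, size=sims)
    Y = rng.binomial(n + 1, 0.5, size=sims)
    print((X < Y).mean())                                   # close to 1/2
    print((np.minimum(X, Y) + np.maximum(X, Y)).mean())     # close to n + 1/2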

2 Stat 110 Midterm from 2008
1. The gambler de Méré asked Pascal whether it is more likely to get at least one
six in 4 rolls of a die, or to get at least one double-six in 24 rolls of a pair of dice.
Continuing this pattern, suppose that a group of n fair dice is rolled 4 · 6^{n−1} times.
(a) Find the expected number of times that “all sixes” is achieved (i.e., how often
among the 4 · 6^{n−1} rolls it happens that all n dice land 6 simultaneously). (Simplify.)

Let I_j be an indicator r.v. for the event “all sixes” on the jth roll. Then E(I_j) =
1/6^n, so the expected value is 4 · 6^{n−1}/6^n = 2/3.

(b) Give a simple but accurate approximation of the probability of having at least
one occurrence of “all sixes”, for n large (in terms of e but not n).

By a Poisson approximation with λ = 2/3 (the expected value from (a)), the
probability is approximately 1 − e^{−2/3}.

(c) de Méré finds it tedious to re-roll so many dice. So after one normal roll of the
n dice, in going from one roll to the next, with probability 6/7 he leaves the dice in
the same configuration and with probability 1/7 he re-rolls. For example, if n = 3
and the 7th roll is (3, 1, 4), then 6/7 of the time the 8th roll remains (3, 1, 4) and
1/7 of the time the 8th roll is a new random outcome. Does the expected number
of times that “all sixes” is achieved stay the same, increase, or decrease (compared
with (a))? Give a short but clear explanation.

The answer stays the same, by the same reasoning as in (a), since linearity of
expectation holds even for dependent r.v.s.

2. To battle against spam, Bob installs two anti-spam programs. An email arrives,
which is either legitimate (event L) or spam (event L^c), and which program j marks
as legitimate (event M_j) or marks as spam (event M_j^c) for j ∈ {1, 2}. Assume that
10% of Bob’s email is legitimate and that the two programs are each “90% accurate”
in the sense that P(M_j|L) = P(M_j^c|L^c) = 9/10. Also assume that given whether an
email is spam, the two programs’ outputs are conditionally independent.
(a) Find the probability that the email is legitimate, given that the 1st program
marks it as legitimate (simplify).

By Bayes’ Rule,

P(L|M_1) = P(M_1|L)P(L)/P(M_1) = (9/10)(1/10) / ((9/10)(1/10) + (1/10)(9/10)) = 1/2.

(b) Find the probability that the email is legitimate, given that both programs mark
it as legitimate (simplify).

By Bayes’ Rule,

P(L|M_1, M_2) = P(M_1, M_2|L)P(L)/P(M_1, M_2) = ((9/10)²(1/10)) / ((9/10)²(1/10) + (1/10)²(9/10)) = 9/10.

(c) Bob runs the 1st program and M_1 occurs. He updates his probabilities and
then runs the 2nd program. Let P̃(A) = P(A|M_1) be the updated probability
function after running the 1st program. Explain briefly in words whether or not
P̃(L|M_2) = P(L|M_1 ∩ M_2): is conditioning on M_1 ∩ M_2 in one step equivalent to
first conditioning on M_1, then updating probabilities, and then conditioning on M_2?

Yes, they are the same. If this were not the case, conditional probability would be
incoherent, since both are valid methods for updating probability. The probability
of an event given various pieces of evidence does not depend on the order in which
the pieces of evidence are incorporated into the updated probabilities.
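This coherency can also be verified numerically with the numbers from this problem (a sketch in plain Python):

    pL = 0.1      # P(L)
    acc = 0.9     # P(M_j | L) = P(M_j^c | L^c)

    # One step: P(L | M1, M2), using conditional independence given L or L^c.
    one_step = acc**2 * pL / (acc**2 * pL + (1 - acc)**2 * (1 - pL))

    # Two steps: first P(L | M1), then condition on M2 with the updated prior.
    pL1 = acc * pL / (acc * pL + (1 - acc) * (1 - pL))
    two_step = acc * pL1 / (acc * pL1 + (1 - acc) * (1 - pL1))

    print(one_step, two_step)    # both 0.9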

3. (a) Let X_1, X_2, . . . be independent N(0, 4) r.v.s, and let J be the smallest value
of j such that X_j > 4 (i.e., the index of the first X_j exceeding 4). In terms of the
standard Normal CDF Φ, find E(J) (simplify).

We have J − 1 ∼ Geom(p) with p = P(X_1 > 4) = P(X_1/2 > 2) = 1 − Φ(2), so

E(J) = 1/(1 − Φ(2)).

(b) Let f and g be PDFs with f(x) > 0 and g(x) > 0 for all x. Let X be a random
variable with PDF f. Find the expected value of the ratio g(X)/f(X) (simplify).

By LOTUS,

E(g(X)/f(X)) = ∫_{−∞}^{∞} (g(x)/f(x)) f(x) dx = ∫_{−∞}^{∞} g(x) dx = 1.

(c) Define F(x) = e^{−e^{−x}}. This is a CDF (called the Gumbel distribution) and is a
continuous, strictly increasing function. Let X have CDF F, and define W = F(X).
What are the mean and variance of W (simplify)?

Note that W is obtained by plugging X into its own CDF. The CDF of W is

P(W ≤ w) = P(F(X) ≤ w) = P(X ≤ F^{−1}(w)) = F(F^{−1}(w)) = w

for 0 < w < 1, so W ∼ Unif(0, 1). Thus, E(W) = 1/2 and Var(W) = 1/12.
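This is easy to see in simulation (a Python sketch, assuming NumPy, which provides a standard Gumbel sampler):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.gumbel(size=10**6)        # standard Gumbel: CDF F(x) = exp(-exp(-x))
    W = np.exp(-np.exp(-X))           # W = F(X)
    print(W.mean(), W.var())          # close to 1/2 and 1/12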

4. (a) Find E(2^X) for X ∼ Pois(λ) (simplify).

By LOTUS, E(2^X) = Σ_{k=0}^{∞} 2^k e^{−λ} λ^k/k! = e^{−λ} Σ_{k=0}^{∞} (2λ)^k/k! = e^{−λ} e^{2λ} = e^{λ}.

(b) Let X and Y be independent Pois(λ) r.v.s, and T = X + Y. Later in the course,
we will show that T ∼ Pois(2λ); here you may use this fact. Find the conditional
distribution of X given T = n, i.e., find the conditional PMF P(X = k|T = n)
(simplify). Which “important distribution” is this conditional distribution, if any?

P(X = k|T = n) = P(X = k, X + Y = n)/P(T = n) = P(X = k)P(Y = n − k)/P(T = n) = (e^{−λ} λ^k/k!)(e^{−λ} λ^{n−k}/(n − k)!) · n!/(e^{−2λ} (2λ)^n) = (n choose k)(1/2)^n,

which is the PMF of the Bin(n, 1/2) distribution.

(c) Again let X and Y be Pois(λ) r.v.s, and T = X + Y, but now assume that
X and Y are not independent, and in fact X = Y. Prove or disprove the claim that
T ∼ Pois(2λ) in this scenario.

The r.v. T = 2X is not Poisson: it can only take even values 0, 2, 4, 6, . . . , whereas
any Poisson r.v. has positive probability of being any of 0, 1, 2, 3, . . . .
Alternatively, we can compute the PMF of 2X, or note that Var(2X) = 4λ ≠
2λ = E(2X), whereas for any Poisson r.v. the variance equals the mean.

3 Stat 110 Midterm from 2009
1. (a) Let X ∼ Pois(λ). Find E(e^X) (simplify).

By LOTUS and the Taylor series for e^x,

E(e^X) = Σ_{k=0}^{∞} e^k e^{−λ} λ^k/k! = e^{−λ} Σ_{k=0}^{∞} (λe)^k/k! = e^{−λ} e^{λe} = e^{λ(e−1)}.

(b) The numbers 1, 2, 3, . . . , n are listed in some random order (with all n! permuta-
tions equally likely). An inversion occurs each time a pair of numbers is out of order,
i.e., the larger number is earlier in the list than the smaller number. For example,
3, 1, 4, 2 has 3 inversions (3 before 1, 3 before 2, 4 before 2). Find the expected
number of inversions in the list (simplify).

Let I_{ij} be the indicator of i and j being out of order, for each pair (i, j) with i < j.
There are (n choose 2) such indicators, each of which has expected value 1/2 by symmetry (i
before j and j before i are equally likely). So by linearity, the expected number of
inversions is (n choose 2)/2 = n(n − 1)/4.
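A brute-force simulation agrees (a Python sketch, assuming NumPy; n = 10 is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(0)
    n, sims, total = 10, 10**4, 0
    for _ in range(sims):
        perm = rng.permutation(n)
        total += sum(perm[i] > perm[j]
                     for i in range(n) for j in range(i + 1, n))
    print(total / sims, n * (n - 1) / 4)    # both about 22.5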

2. Consider four nonstandard dice (the Efron dice), whose sides are labeled as follows
(the 6 sides on each die are equally likely).
A: 4, 4, 4, 4, 0, 0
B: 3, 3, 3, 3, 3, 3
C: 6, 6, 2, 2, 2, 2
D: 5, 5, 5, 1, 1, 1
These four dice are each rolled once. Let A be the result for die A, B be the result
for die B, etc.
(a) Find P (A > B), P (B > C), P (C > D), and P (D > A).

P (A > B) = P (A = 4) = 2/3
P (B > C) = P (C = 2) = 2/3
P (C > D) = P (C = 6) + P (C = 2, D = 1) = 2/3
P (D > A) = P (D = 5) + P (D = 1, A = 0) = 2/3
So the probability of each die beating the next is 2/3, going all the way around
in a cycle (these are “nontransitive dice”).
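A simulation of the four dice reproduces these probabilities (a Python sketch, assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(0)
    sims = 10**6
    A = rng.choice([4, 4, 4, 4, 0, 0], size=sims)
    B = rng.choice([3, 3, 3, 3, 3, 3], size=sims)
    C = rng.choice([6, 6, 2, 2, 2, 2], size=sims)
    D = rng.choice([5, 5, 5, 1, 1, 1], size=sims)
    for name, event in [("A>B", A > B), ("B>C", B > C),
                        ("C>D", C > D), ("D>A", D > A)]:
        print(name, event.mean())    # each close to 2/3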

(b) Is the event A > B independent of the event B > C? Is the event B > C
independent of the event C > D? Explain.

A > B is independent of B > C, since A > B is the same thing as A = 4,
knowledge of which gives no information about B > C (which is the same thing as
C = 2). On the other hand, B > C is not independent of C > D, since P(C >
D|C = 2) = 1/2 ≠ 1 = P(C > D|C ≠ 2).

3. A discrete distribution whose possible values are nonnegative integers has the
memoryless property if for X an r.v. with that distribution, P(X ≥ j + k|X ≥ j) =
P(X ≥ k) for all nonnegative integers j, k.

(a) If X has a memoryless distribution with CDF F and PMF p_i = P(X = i), find
an expression for P(X ≥ j + k) in terms of F(j), F(k), p_j, p_k.

By the memoryless property,

P(X ≥ k) = P(X ≥ j + k|X ≥ j) = P(X ≥ j + k, X ≥ j)/P(X ≥ j) = P(X ≥ j + k)/P(X ≥ j),

so

P(X ≥ j + k) = P(X ≥ j)P(X ≥ k) = (1 − F(j) + p_j)(1 − F(k) + p_k).

(b) Name one important discrete distribution we have studied so far which has the
memoryless property. Justify your answer with a clear interpretation in words or
with a computation.

The Geometric distribution is memoryless (in fact, it turns out to be essentially


the only discrete memoryless distribution!). This follows from the story of the Geo-
metric: consider Bernoulli trials, waiting for the first success (and defining waiting
time to be the number of failures before the first success). Say we have already had j
failures without a success. Then the additional waiting time from that point forward
has the same distribution as the original waiting time (the Bernoulli trials neither
are conspiring against the experimenter nor act as if he or she is “due” for a success:
the trials are independent). A calculation agrees: for X ∼ Geom(p),

P(X ≥ j + k|X ≥ j) = P(X ≥ j + k)/P(X ≥ j) = q^{j+k}/q^j = q^k = P(X ≥ k).

4. The book Red State, Blue State, Rich State, Poor State (by Andrew Gelman)
discusses the following election phenomenon: within any U.S. state, a wealthy voter
is more likely to vote for a Republican than a poor voter; yet the wealthier states
tend to favor Democratic candidates! In short: rich individuals (in any state) tend
to vote for Republicans, while states with a higher percentage of rich people tend to
favor Democrats.

(a) Assume for simplicity that there are only 2 states (called Red and Blue), each
of which has 100 people, and that each person is either rich or poor, and either a
Democrat or a Republican. Make up numbers consistent with the above, showing
how this phenomenon is possible, by giving a 2 by 2 table for each state (listing how
many people in each state are rich Democrats, etc.).

Red State:
          Dem   Rep   Total
Rich        5    25      30
Poor       20    50      70
Total      25    75     100

Blue State:
          Dem   Rep   Total
Rich       45    15      60
Poor       35     5      40
Total      80    20     100

The above tables are as desired: within each state, a rich person is more likely
to be a Republican than a poor person; but the richer state has a higher percentage
of Democrats than the poorer state. Of course, there are many possible tables that
work.
Just giving tables was all that was needed for this part, but note that the above
example is a form of Simpson’s paradox: aggregating the two tables seems to give
different conclusions than conditioning on which state a person is in. Letting D, W, B
be the events that a randomly chosen person is a Democrat, wealthy, and from the
Blue State (respectively), for the above numbers we have P(D|W, B) < P(D|W^c, B)
and P(D|W, B^c) < P(D|W^c, B^c) (controlling for whether the person is in the Red
State or the Blue State, a poor person is more likely to be a Democrat than a rich
person), but P(D|W) > P(D|W^c) (stemming from the fact that the Blue State is
richer and more Democratic).

(b) In the setup of (a) (not necessarily with the numbers you made up there), let
D be the event that a randomly chosen person is a Democrat (with all 200 people
equally likely), and B be the event that the person lives in the Blue State. Suppose
that 10 people move from the Blue State to the Red State. Write P_old and P_new for
probabilities before and after they move. Assume that people do not change parties,
so we have P_new(D) = P_old(D). Is it possible that both P_new(D|B) > P_old(D|B) and
P_new(D|B^c) > P_old(D|B^c) are true? If so, explain how it is possible and why it does
not contradict the law of total probability P(D) = P(D|B)P(B) + P(D|B^c)P(B^c);
if not, show that it is impossible.

Yes, it is possible. Suppose with the numbers from (a) that 10 people move from
the Blue State to the Red State, of whom 5 are Democrats and 5 are Republicans.
Then P_new(D|B) = 75/90 > 80/100 = P_old(D|B) and P_new(D|B^c) = 30/110 >
25/100 = P_old(D|B^c). Intuitively, this makes sense since the Blue State has a higher
percentage of Democrats initially than the Red State, and the people who move have
a percentage of Democrats which is between these two values.
This result does not contradict the law of total probability since the weights
P(B), P(B^c) also change: P_new(B) = 90/200, while P_old(B) = 1/2. The phenomenon
could not occur if an equal number of people also move from the Red State to the
Blue State (so that P(B) is kept constant).

4 Stat 110 Midterm from 2010
1. A family has two children. The genders of the first-born and second-born are
independent (with boy and girl equally likely), and which seasons the children were
born in are independent, with all 4 seasons equally likely.
(a) Find the probability that both children are girls, given that a randomly chosen
one of the two is a girl who was born in winter (simplify).

Once we specify the random child and learn she is a girl, we just need the other
child to be a girl; this has probability 1/2.
To write this more precisely, let G_j be the event that the jth born is a girl and W_j
be the event that the jth born is winter-born, for j = 1, 2. Define G_3, W_3 similarly
for the randomly chosen child; we want P(G_1 ∩ G_2|G_3 ∩ W_3). Conditioning on the
event A that the randomly chosen child is the first-born,

P(G_1 ∩ G_2|G_3 ∩ W_3) = P(G_1 ∩ G_2|G_3, W_3, A)P(A|G_3, W_3) + P(G_1 ∩ G_2|G_3, W_3, A^c)P(A^c|G_3, W_3).

But

P(G_1 ∩ G_2|G_3, W_3, A) = P(G_1 ∩ G_2|G_1, W_1, A) = P(G_2|G_1, W_1, A) = 1/2,

and similarly P(G_1 ∩ G_2|G_3, W_3, A^c) = 1/2, so the desired probability is 1/2.

(b) Find the probability that both children are girls, given that at least one of the
two is a girl who was born in winter (simplify).

Since the probability that a specific child is a winter-born girl is 1/8,

P(both girls|at least one winter girl) = P(both girls, at least one born in winter) / P(at least one winter girl) = (1/4)(1 − (3/4)²) / (1 − (7/8)²) = (7/64)/(15/64) = 7/15.

Surprisingly, the seemingly irrelevant information about the season of birth matters,
unlike in the previous part!
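The 7/15 answer can be confirmed by simulation (a Python sketch, assuming NumPy):

    import numpy as np

    rng = np.random.default_rng(0)
    sims = 10**6
    girls = rng.random((sims, 2)) < 0.5
    winter = rng.random((sims, 2)) < 0.25
    winter_girl = girls & winter
    cond = winter_girl.any(axis=1)               # at least one winter girl
    print(girls.all(axis=1)[cond].mean())        # close to 7/15 = 0.4667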

2. In each day that the “Mass Cash” lottery is run in Massachusetts, 5 of the integers
from 1 to 35 are chosen (randomly and without replacement).
(a) Suppose you guess 5 numbers for the lottery. Find the probability of guessing
exactly 3 numbers right, given that you guess at least 1 of the numbers right (leave
your answer in terms of binomial coefficients).

The distribution is Hypergeometric (think of capture-recapture, “tagging” the numbers
you choose). So

P(exactly 3 right|at least 1 right) = P(exactly 3 right) / (1 − P(none right)) = [(5 choose 3)(30 choose 2)/(35 choose 5)] / [1 − (5 choose 0)(30 choose 5)/(35 choose 5)].

(b) Find an exact expression for the expected number of days needed so that all of
the (35 choose 5) possible lottery outcomes will have occurred (leave your answer as a sum,
which can involve binomial coefficients).

Let n = (35 choose 5). By the coupon collector problem (or directly by linearity, writing the
expected number of days as a sum of T_j’s with T_j − 1 a Geometric), the expected
value is

n (1/n + 1/(n − 1) + · · · + 1/2 + 1).
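To get a feel for the size of this answer, the sum can be evaluated numerically (a Python sketch, assuming Python 3.8+ for math.comb):

    from math import comb

    n = comb(35, 5)                              # 324632 possible outcomes
    days = n * sum(1 / k for k in range(1, n + 1))
    print(days)    # roughly 4.3 million days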

(c) Approximate the probability that after 50 days of the lottery, every number from
1 to 35 has been picked at least once (don’t simplify, but your answer shouldn’t
involve a sum).

Let A_j be the event that j doesn’t get picked, so

P(A_j) = (30/35)^{50} = (6/7)^{50}.

Let X be the number of A_j that occur. A Poisson approximation for X is reasonable
since these events are rare and weakly dependent. This gives

P(X = 0) ≈ e^{−35·(6/7)^{50}}.

3. Let U ∼ Unif(0, 1), and X = ln(U/(1 − U)).
(a) Write down (but do not compute) an integral giving E(X²).

By LOTUS,

E(X²) = ∫_0^1 (ln(u/(1 − u)))² du.

(b) Find the CDF of X (simplify).

This can be done directly or by Universality of the Uniform. For the latter, solve
x = ln(u/(1 − u)) for u, to get u = e^x/(1 + e^x). So X = F^{−1}(U) where

F(x) = e^x/(1 + e^x).

This F is a CDF (by the properties of a CDF, as discussed in class). So by Universality
of the Uniform, X ∼ F.

(c) Find E(X) without using calculus (simplify).

Hint: 1 − U has the same distribution as U.

By symmetry, 1 − U has the same distribution as U, so by linearity,

E(X) = E(ln U − ln(1 − U)) = E(ln U) − E(ln(1 − U)) = 0.

4. Let X_1, X_2, X_3, . . . , X_10 be the total number of inches of rain in Boston in October
of 2011, 2012, 2013, . . . , 2020, with these r.v.s independent N(µ, σ²). (Of course,
rainfall can’t be negative, but µ and σ are such that it is extremely likely that all the
X_j’s are positive.) We say that a record value is set in a certain year if the rainfall
is greater than in all the previous years (going back to 2011; so by definition, a record
is always set in the first year, 2011).
(a) On average, how many of these 10 years will set record values? (Your answer can
be a sum but the terms should be simple.)

Let I_j be the indicator r.v. of the jth year setting a record. Then P(I_j = 1) is 1/j,
since by symmetry all orderings of X_1, . . . , X_j are equally likely (so the largest of
these values is equally likely to be anywhere among them). By linearity, the expected
number of record values is

1 + 1/2 + 1/3 + · · · + 1/10.
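A simulation agrees with this harmonic sum (a Python sketch, assuming NumPy; since the count of records does not depend on µ and σ, standard Normals are used):

    import numpy as np

    rng = np.random.default_rng(0)
    sims, years = 10**5, 10
    X = rng.normal(size=(sims, years))
    # A record is set in year j when X_j equals the running maximum.
    records = (X == np.maximum.accumulate(X, axis=1)).sum(axis=1)
    print(records.mean(), sum(1 / j for j in range(1, 11)))    # both about 2.93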

(b) Is the indicator of whether the year 2013 sets a record independent of the indicator
of whether the year 2014 sets a record? (Justify briefly.)

Yes, they are independent (somewhat surprisingly). Determining whether there is a
record in 2014 is not related to the “internal squabble” of which of X_1, X_2, X_3 is the
biggest. Let J be the index of whichever of X_1, X_2, X_3 is largest (so J takes values
1, 2, 3). By symmetry, the probability that X_4 is larger than all of X_1, X_2, X_3 is not
affected by conditioning on J; note though that saying X_3 is a record is the same as
saying that J = 3.

(c) Later in the course, we will show that if Y_1 ∼ N(µ_1, σ_1²) and Y_2 ∼ N(µ_2, σ_2²) are
independent, then Y_1 − Y_2 ∼ N(µ_1 − µ_2, σ_1² + σ_2²). Using this fact, find the probability
that the October 2014 rainfall will be more than double the October 2013 rainfall in
Boston, in terms of Φ.

We have

P(X_4 > 2X_3) = P(2X_3 − X_4 < 0) = P(Y < 0) = P((Y − µ)/(σ√5) < −µ/(σ√5)),

for Y = 2X_3 − X_4 ∼ N(µ, 5σ²). This is Φ(−µ/(σ√5)), which can also be written as 1 − Φ(µ/(σ√5)).
