Math 181A WI25 HW Problems Complete
General instructions:
Clearly and thoroughly write your solutions on blank paper, showing all your work. See the syllabus
for instructions for uploading to Gradescope. See the calendar for due dates/times.
You may list answers in exact form (e.g., π) or round to three decimal places (e.g., 3.142), unless
the problem says otherwise. If rounding to three decimal places would result in the number 0 (e.g.,
with 0.00012345), instead use scientific notation and write three decimal places (e.g., 1.235 · 10−4 ).
On any problem involving R, you must include your code and output as part of your answer. You
may take a screenshot of the code/output, or write it by hand.
Problems tend to focus on content from the two or three previous lectures and never require ideas
from a lecture that falls on the day a problem is due. For example, if a problem is due on Friday the
19th, it is likely to use ideas from lectures on Wednesday the 17th, Monday the 15th, and/or Friday
the 12th. It will not use ideas from the lecture on Friday 19th. It is possible that a problem requires
knowledge from earlier in the course, or from prerequisite courses. If some prerequisite knowledge
is required which you have forgotten, you should feel free to consult books/internet to learn this
knowledge (e.g., Taylor series, improper integrals, L’Hôpital’s Rule, etc.). Expect prerequisite
knowledge to be drawn on frequently.
At the end of the course calendar, you should see a phrase like “Problems XX-XX not collected”.
This refers to the problems at the end of this packet that are here to help you learn the material
but cannot be collected/graded because of union rules related to UCSD graders. You should work
these problems to develop your mastery of topics from the last few lectures in the course as you
prepare for the final exam.
1. The simplest random variable (RV) follows the Bernoulli distribution. This is a RV with two
possible values: success (which we think of as 1), which appears with probability p, and failure (0),
which appears with probability 1 − p.
(a) Explain why the pmf can be written in this surprising way: $f(x; p) = p^x (1-p)^{1-x}$.
(b) For many students, the above pmf feels like pure magic. Explain how you can come up with
this if you happen to know the pmf for the Binom(n, p) distribution.
(c) Explicitly calculate the mean and variance of the Bernoulli RV with parameter p using the
definitions of mean and variance.
(d) If $X_1, \ldots, X_n$ are iid (independent and identically distributed) Bernoulli(p) RVs, and $Y \sim Binom(n, p)$ is Binomial, write a formula that relates Y and the $X_i$'s. Then, explain how the formula can help you easily remember the mean and variance of a Binomial RV.
2. Let X1 , X2 , . . . , Xn be an iid sample from a random variable X with finite mean and finite variance.
Let $\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$. Show that:
(a) $E[\bar{X}] = E[X]$
(b) $Var[\bar{X}] = \frac{Var[X]}{n}$
In your solution for each part, explicitly mention where the “independence” (from iid) is actually
needed and where the “identically distributed” (from iid) is actually needed. Finally, memorize
these two facts; we will need them almost every day moving forward.
3. The “Kernel Technique”. One of the most helpful tricks in mathematical statistics is to use the fact that all probability density functions must integrate to 1 over their support. That is, $\int_{\text{support}} f(x)\,dx = 1$. For example, if we take $X \sim \text{Exp}(\lambda = 4)$, then we know $\int_0^\infty 4e^{-4x}\,dx = 1$ since the pdf is $f(x) = 4e^{-4x}$. Now, each pdf (or pmf) can be separated into two parts: the constant(s) and the terms with the variable (known as the “kernel”). For the exponential distribution above, the kernel is $e^{-4x}$. This distinction is useful because:
$$1 = \int_{\text{support}} f(x)\,dx = \int_{\text{support}} \text{constant} \cdot \text{kernel}\,dx \implies \int_{\text{support}} \text{kernel}\,dx = \frac{1}{\text{constant}}.$$
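As a quick check of this implication, apply it to the Exp(λ = 4) example above: the constant is 4 and the kernel is $e^{-4x}$, so without any calculus we immediately get
$$\int_0^\infty e^{-4x}\,dx = \frac{1}{4}.$$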
(a) Find $\int_{-\infty}^{\infty} e^{-x^2/2}\,dx$. Do not try to use integration techniques from calculus. Instead, think of a RV with a pdf whose kernel looks like $e^{-x^2/2}$ and use the above comments to immediately write the answer. (Mention the RV in your answer on all parts!)
(b) Find $\int_0^{\infty} x^4 e^{-3x}\,dx$. (Integration by parts 4 times? Nope. The kernel technique.)
(c) Find $\sum_{x=0}^{\infty} \frac{2^x}{x!}$. (You could do this problem using Taylor series, but use the kernel technique. Note that because we have a sum instead of an integral, you should be thinking about discrete random variables here, not continuous random variables.)
(d) Find $\sum_{x=4}^{\infty} \binom{x-1}{3}(0.7)^{x-4}$. (This is very scary without the kernel technique.)
4. Find the MME (method of moments estimate/estimator) for θ. (Note: We always assume a pdf is 0 outside of the zone specified. For example, here we assume $f_X(x; \theta) = 0$ if $x \le 0$ or $x \ge 1$.)
5. In the 2017 video game hit Legend of Zelda: Breath of the Wild, you must collect star fragments
to upgrade your armor to the highest levels. You decide to explore the mechanic behind how these
rare items are generated in the game. Suppose you have this partial knowledge:
• A star fragment will appear once per night, sometime after 9 PM (using the in-game clock).
• Once the clock reads θ (a particular, unknown time of day), star fragments no longer appear.
• The game uses a random number generator to decide on the spawn time for the star fragment, where each time between 9 PM and θ is equally likely.
It is important for the gaming community to learn what θ is because this helps users understand
the game and saves people time: If you know the time is past θ, you will stop waiting for the star
fragment (which you missed!) and plan on trying again the next night. To help the community,
you plan to record the appearance time for 6 star fragments on 6 random (in-game) nights. You
get: x1 =11:20 PM, x2 = 1:20 AM, x3 =12:20 AM, x4 = 10:00 PM, x5 = 1:05 AM, and x6 = 11:55
PM. Find the MME for θ based on these six data points.
6. Suppose a discrete RV is modeled by
$$p_X(x; \tau) = \begin{cases} \tau/3, & x = 1 \\ \tau/6, & x = 2 \\ \tau/4, & x = 3 \\ 1 - 3\tau/4, & x = 4 \end{cases}$$
(a) Find an MME for τ using the first moment of X (which is what people typically use).
(b) Find an MME using the second (!) moment of X. Then, see if the estimators in parts a and b give the same estimate for τ using the data $X_1 = 1, X_2 = 2, X_3 = 2$.
8. Engineers will often use this distribution to model the lifetime of electronics:
$$f_Y(y; \alpha, \beta) = \alpha\beta y^{\beta-1} e^{-\alpha y^{\beta}}, \quad \text{where } y > 0,\ \alpha > 0,\ \beta > 0$$
Assuming $Y_1, \ldots, Y_n$ are iid from this distribution, find the MME for α assuming that β is fixed (known). (Hint: After setting up an integral, try a u-substitution with $u = \alpha y^{\beta}$. Remember to switch the bounds to u bounds, and switch the dy as well. Your answer will have a $\Gamma\!\left(1 + \frac{1}{\beta}\right)$ in it. Also, this problem might be the first time in your life that you’ve seen exponents inside exponents. Often, such expressions can be tough to read, so mathematicians will use the notation exp(a) to mean $e^a$. With this notation, we can write the pdf as $\alpha\beta y^{\beta-1}\exp(-\alpha y^{\beta})$, which is a little clearer.)
9. Let Y be a CRV with density $f_Y(y; \theta) = \frac{2y}{\theta}\, e^{-y^2/\theta}$ where $y > 0$, $\theta > 0$. Given a random sample $y_1, \ldots, y_n$:
(a) Find the MLE for θ. (As always with one parameter, you must check the second derivative
condition!)
(b) Find the MME for θ using first moments. You should get a different answer from part a, hence
showing the MME and MLE may be different.
10. One common distribution that appears in branching process theory is a DRV with pmf:
$$f_X(x; \mu) = \frac{e^{-\mu x}(\mu x)^{x-1}}{x!} \quad \text{where } x \in \{1, 2, \ldots\} \text{ and } \mu \in (0, 1)$$
(a) Find the MLE for µ given iid X1 , . . . , Xn . Then, find the MLE for the particular data x1 =
2, x2 = 1, x3 = 6.
(b) Using Desmos, draw a graph of the likelihood function (not log-likelihood) for the data x1 =
2, x2 = 1, x3 = 6. It should be maximal at the µ value you found in part a. Include a sketch
of the graph from Desmos (or a screenshot if you’re tech-fancy). (Note: In Desmos, if you
click on the wrench icon in the upper-right, you can change the range of values on the x and
y axes.)
11. Economists frequently use the CRV X with pdf:
$$f_X(x; \alpha, \beta) = \beta\alpha^{\beta} x^{-\beta-1} \quad \text{where } x \ge \alpha > 0 \text{ and } \beta > 1$$
Find the MLE for α and β. (As with all multivariable maximization problems in this class, you need NOT show your MLE is maximal via higher derivatives. Also, as on all MLE problems, if no data values are explicitly given, you should begin by naming them for your use: “Let $x_1, \ldots, x_n$ be a random sample of data.”)
12. Suppose Y is a CRV whose pdf is pictured below. Find the MME and MLE for w given the small
sample: y1 = 1, y2 = 3. (Technology may be useful on the MLE. Do not try to find a formula for
the MLE in the general case of n data; no closed-form solution exists.)
13. Let’s start using R to see estimators in action. While an estimator looks like a formula, it is actually a random variable, because as different random samples come out of a distribution, they combine (via the formula) to make a random value. Different data give rise to different values of the estimator. Let’s consider estimating the parameters from $N(\mu, \sigma^2)$. In the Lecture 4 “Additional Practice” section, we see the MLEs are:
$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
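The “random variable” nature of an estimator is easy to see in R: two different samples from the same distribution give two different values of $\hat{\mu}$. (The parameter values and seed below are arbitrary, chosen only for illustration; this is not a solution to the parts that follow.)

```r
# Two samples from the same N(0, 1) distribution produce
# two different realizations of the estimator mu-hat = x-bar.
set.seed(1)                          # arbitrary seed, for reproducibility
mean(rnorm(10, mean = 0, sd = 1))    # one value of mu-hat
mean(rnorm(10, mean = 0, sd = 1))    # a different value of mu-hat
```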
(a) Imagine we are collecting data on IQ scores at UCSD, and suppose these are $N(\mu = 106, \sigma^2 = 14^2)$. I have to give you values for µ and σ² so we can run a simulation, but pretend we don’t know them! Using the function rnorm in R, generate a random sample of 23 IQ scores from this distribution, and then write code that finds $\hat{\mu}$ and $\hat{\sigma}^2$. Include your code and results. (Note: When calculating $\hat{\sigma}^2$, do not use the built-in function var, as this does a slightly different calculation than our formula above, and because I want you to see how straightforward it is to calculate the formula for $\hat{\sigma}^2$ in a vectorized language like R.)
(b) Now, let’s imagine that instead of one sample of size 23, we collected 1000 samples, each of size 23. Each sample gives a value for $\hat{\mu}$, so we have 1000 different values for $\hat{\mu}$. Using the replicate function in R, create these 1000 values for $\hat{\mu}$, and then use the hist function to make a histogram. Include your code and a rough sketch (or screenshot) of the histogram. This picture allows you to see $\hat{\mu}$ as a random variable. In R, you can type ?replicate to read the documentation for the replicate function.
14. Suppose we have a random sample $Y_1, \ldots, Y_n$ from a CRV with density
$$f_Y(y; \theta) = \frac{\theta}{(y+1)^{\theta+1}} \quad \text{where } y > 0,\ \theta > 1$$
15. Suppose that the time it takes your computer to load R Studio on a random day is normally distributed with unknown mean, µ, and variance 1.2 seconds². You’d like to build a 95% confidence interval for µ, so you time your load speeds on 6 random days: 2.1, 1.7, 3.3, 2, 2.1, 1.9 (seconds).
(a) What value must c be for this to be a 70% confidence procedure? (Also, see problem 17.)
(b) Your friend collects some data in the above setting, builds a 70% CI, and writes in a journal article: “Given our data, we find there is a 70% chance that the unknown µ is in our interval.” Critique this statement and offer an improved statement.
17. Let’s check your answer to 16a. Below is an outline of some R code that does this. Fill in the
missing parts, and then type the code into R and run it to see if your answer from 16a was correct.
Our setup will assume the n = 25 data come from the distribution N (µ = 7, σ 2 = 16) (we must set
a value for µ to run the simulation!). We make 50000 intervals and then see which capture µ and
which don’t. Finally, we calculate the confidence level.
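A minimal sketch of the described simulation is below. The critical value c_val is a deliberate placeholder (1.96 is purely illustrative, not the answer to 16a); substitute your own value:

```r
# Sketch of the coverage simulation described above.
set.seed(1)
mu <- 7; sigma <- 4; n <- 25
c_val <- 1.96   # ILLUSTRATIVE placeholder -- replace with your answer from 16a
covered <- replicate(50000, {
  x <- rnorm(n, mean = mu, sd = sigma)
  abs(mean(x) - mu) < c_val * sigma / sqrt(n)   # did the CI capture mu?
})
mean(covered)   # proportion of intervals capturing mu = estimated confidence level
```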
18. Suppose X ∼ U nif (0, θ2 ) where θ > 0 is unknown. In this problem, we find point and interval
estimators for θ.
20. Suppose you are drawing a random sample of size n > 0 from $N(\mu, \sigma^2)$ where σ > 0 is known. Decide if the following statements are true or false and explain your reasoning. Assume our 95% confidence procedure is $\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$.
(a) If (3.2, 5.1) is a 95% CI from a particular random sample, then there is a 95% chance that µ
is in this interval.
(b) If (3.2, 5.1) is a 95% CI from a particular random sample, then there is a 95% chance that the
mean from our next random sample will be in this interval.
(c) A 95% CI will contain 95% of the possible values from the population distribution we are
studying.
(d) If we generate 400 random CIs using our 95% confidence procedure, we expect about 20
intervals to not contain µ.
21. In the modern political era, campaign rallies are being infiltrated by people who do not support
the speaker (e.g., to sow dissent, to study those who do support the speaker, etc.). You’re curious
about this, so you decide to attend a political rally. Your plan is simple: you’ll choose 70 random
people in the audience and use hidden cameras to videotape them during the rally. When the crowd
breaks into a chant, you will use the video footage to see what proportion of your random subjects
actually engage in the chant. Suppose you do this and find only 58 of the 70 people took part in
the chant.
(a) Assign notation to and define the population parameter we are trying to study. Then, create
an approximate 92% CI for this parameter.
(b) You may have noticed that your approximate 92% CI from part a did not include 100% (or
1, if you are using decimals). Suppose you change the confidence level to C% and the upper
bound of the approximate CI exactly equals 100%. Find C.
22. One thing that often surprises San Diego newcomers is how present the U.S. military is here. As
a statistician at DQ Industries, your boss has tasked you with finding the proportion of San Diego
jobs that are connected to the military (this includes jobs at bases, contractors, military R & D,
etc.). You are required to draw a large-enough sample so that the sample proportion will be within
1.5% of the true value 90% of the time.
(a) Suppose you have no information about this proportion and that it costs $5 to contact each
person in your sample. What is the least amount of money you can spend to meet your
requirements?
(b) Your boss is horrified by the cost estimate in part (a). You decide to do some Google searching
to get an estimate for the proportion. At this website, you see that 1 in 5 jobs in San Diego
is linked to the military sector. Since this is a pro-military group, you figure this number can
serve as an upper bound on the true proportion. What is the new minimal cost estimate?
23. One frustrating issue with proportions is that your mathematics might give a CI like: (−2%, 7%)
or (96%, 103%). This typically happens when you are studying a trait with a proportion near 0%
or 100%. This makes it particularly difficult to study rare or hyper-prevalent phenomena.
(a) Suppose you are building an approximate 95% CI for a parameter p using a sample of size 100.
If the lower bound of your CI exactly equals 0%, what is your sample proportion? (Note: It
may not be possible to actually get this sample proportion because with 100 people the only
possible sample proportions are 0, 0.01, 0.02, . . . , 0.99, and 1.)
(b) Suppose you are trying to decide on a sample size for a study to determine what percentage
of Americans self-identify as transgender. Based on previous studies, this proportion is some-
where around 1%, so you decide you’d like to be within 0.1% of the true proportion 95% of the
time. If you take 7% as an upper bound on the transgender proportion, what is the smallest
sample size you can draw?
24. Caleb is ordering a piano for his new house. A standard piano has 88 keys, and the lifetime of a key, in years, before it detunes or breaks is independently modeled by an exponential distribution, T, with parameter $\lambda = \frac{1}{4}\,\frac{\text{breakdown}}{\text{year}}$. Once any one key detunes or breaks, Caleb will have to call the piano tuner to repair his piano.
25. Allison is at a party with you, and in conversation, develops an ingenious new method to test whether a community is suffering from food scarcity. She samples n random people from the community and observes how many meals each misses per week. Then, if the median observation is more than 2 missed meals per week, the community is deemed food scarce.
In a scrambled haste to model the number of meals per week missed by a random individual
in your community, Allison sketches the following image on the back of a napkin:
In this continuous representation, 2.4 meals missed would mean that an individual missed 2 entire
meals, and missed out on 0.4 of what they should have eaten during a third meal. Find the
probability that Allison would deem your community food scarce given that she takes a sample of
5 people. Use an integral calculator to finish this problem.
26. Over the next month, Jack will run the 800 meter dash 10 times to try to run a time that is faster than or equal to his current fastest time of 112 seconds. His times are iid from the distribution
$$f_X(x) = \frac{1}{3}\left(\frac{x}{120}\right)^{39} \exp\left(-\left(\frac{x}{120}\right)^{40}\right) \quad \text{where } x > 0$$
(a) Use a graphing calculator or Desmos to graph fX (x), and decide whether the pdf is reasonable
for a runner who generally runs around 120 seconds in the 800 meter dash. Include a screenshot
or sketch of the pdf as part of your answer.
(b) What is the probability that Jack runs a time that is faster than or equal to his current fastest
time in the 800 meter race during these 10 attempts?
28. Victoria is a competitive diver, and she is getting ready to compete at a major event. In diving, athletes are often assigned a score the following way: seven judges each give a score, the highest and lowest scores are dropped, and the remaining five scores are averaged. This is called the “trimmed mean” of the scores and is used to minimize the impact of extreme scoring and biased judging. Assume Victoria’s scores from each judge are independently distributed according to a uniform distribution on the interval [0, 10]. Victoria is interested in her expected score and asks you to help her calculate it.
(a) [Least elegant, most tech-dependent approach] Using an integral calculator (!), find Victoria’s expected score by calculating this nightmare:
$$\frac{E[X_{(2)}] + E[X_{(3)}] + E[X_{(4)}] + E[X_{(5)}] + E[X_{(6)}]}{5}$$
(b) [More elegant, non-tech approach]
i. Show why $\frac{E[X_{(2)}] + E[X_{(3)}] + E[X_{(4)}] + E[X_{(5)}] + E[X_{(6)}]}{5}$ is the same as $\frac{7E[X] - \left(E[X_{(7)}] + E[X_{(1)}]\right)}{5}$.
ii. Without a calculator/tech (!), find $E[X_{(7)}]$.
iii. Explain how you can infer $E[X_{(1)}]$ from $E[X_{(7)}]$ without actually calculating $E[X_{(1)}]$.
iv. Find Victoria’s expected score using parts (i) through (iii).
(c) [Most elegant, no-work-needed approach] Now that you’ve seen the answer from parts
(a) and (b), explain in English how to intuitively get Victoria’s expected score without doing
any real work.
29. In this problem, we explore more estimators for µ and σ² in the distribution $N(\mu, \sigma^2)$.
(a) Typically, people use $\hat{\mu}_1 = \bar{X}$ as an estimator for µ. You might also use $\hat{\mu}_2 = \frac{2X_1 + X_2}{3}$ or $\hat{\mu}_3 = \frac{2(X_1 + 2X_2 + 3X_3 + \cdots + nX_n)}{n^2 + n}$. Show that all three of these estimators are unbiased. (Note: $\hat{\mu}_2$ might be used if you didn’t trust data $X_3, \ldots, X_n$, while $\hat{\mu}_3$ might be used if you wanted to give increasing importance to data collected later in the process!)
(b) In class, we showed that $\hat{\sigma}_1^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ is a biased estimator for σ² when both µ and σ are unknown. Suppose, however, that µ is known and so we can use $\hat{\sigma}_2^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)^2$. Show that $\hat{\sigma}_2^2$ is actually unbiased.
30. Suppose you decide to randomly generate numbers from $X \sim Unif(0, \theta)$. Your friend will ask for n numbers and then use this information to guess what value you (secretly) chose for θ. Typically, one might use $\hat{\theta}_{MLE} = \max X_i = X_n'$ to estimate θ. Your friend, however, has meganumerophobia, and is afraid to say the maximum number in the random sample. Instead, he’ll say the second largest number: $\hat{\theta} = X_{n-1}'$. Determine the bias of this estimator by carefully finding the density function for $X_{n-1}'$ and continuing from there. If the estimator is biased, check if it is asymptotically unbiased, and also modify it to create a new unbiased estimator.
31. Suppose you have $X \sim Binom(n, p)$ where n is known and p is unknown. Typically, people use $\hat{p} = \frac{X_1}{n}$ to estimate p, where $X = X_1$ is simply a sample of size 1. (Note: A sample of size 1 from a Binomial RV is equivalent to n Bernoulli trials.) This might represent simultaneously flipping n coins (just once!) and counting the number of heads you see, where each coin has $p_{\text{heads}} = p$. Now, if both n and p are known, we know the variance, V, of X is just $np(1-p)$. If p is unknown, you might want to estimate V using the estimator $\hat{V} = n\,\frac{X_1}{n}\left(1 - \frac{X_1}{n}\right)$. Find the bias of $\hat{V}$, and if it is biased, determine if it is asymptotically unbiased, and also modify $\hat{V}$ to create a new unbiased estimator.
32. The number of times, X, a particular first-year college student calls home during a random week
is a Poisson RV with mean λ: X ∼ P oisson(λ). Curious to find the value for λ, you break into the
NSA (!) and access phone records for this student on n random weeks. You record the number of
calls home and get the random sample X1 , . . . , Xn .
33. Let $X \sim Exp(\lambda)$ with λ unknown, and suppose $X_1, X_2$ is a random sample of size 2. Show that $M = \sqrt{X_1 \cdot X_2}$ is a biased estimator of $\frac{1}{\lambda}$ and modify it to create an unbiased estimator. (Hint: During your journey, you’ll need the help of the gamma distribution, the gamma function, and the knowledge that $\Gamma(1/2) = \sqrt{\pi}$.)
34. Suppose that $X \sim Unif(0, 3\theta)$ and we draw a random sample $X_1, \ldots, X_n$. Find the MME and compute its relative efficiency to $\hat{\theta}_2 = 2X_1 - \frac{4}{3}X_2$.
35. In class, I showed the below picture. Here, I have changed the vertical axis from variance to SD.
In this new picture, how can we visualize the MSE? How does this way of seeing the MSE help us
decide which of two (possibly biased) estimators is more efficient?
36. Let X be a continuous random variable with $E(X) = \mu$ and $Var(X) = \sigma^2 < \infty$. Suppose we try to estimate µ using these two estimators from a random sample $X_1, \ldots, X_n$ (where $n \ge 3$):
$$\hat{\mu}_1 = \bar{X} \qquad \hat{\mu}_2 = 2X_1 + aX_2 + bX_3$$
For what a and b are both estimators unbiased and the relative efficiency of $\hat{\mu}_1$ to $\hat{\mu}_2$ is 45n?
37. Find the Fisher Information and the Cramér-Rao lower bound for the variance of an unbiased estimator of θ given a random sample $X_1, \ldots, X_n$ from the density
$$f(x; \theta) = \frac{x^3}{6\theta^4}\, e^{-x/\theta} \quad \text{where } x > 0 \text{ and } \theta > 0.$$
38. Find the Fisher Information and the Cramér-Rao lower bound for the variance of an unbiased estimator of θ given a random sample $X_1, \ldots, X_n$ from the density
$$f(x; \theta) = \frac{1}{\pi \cdot [1 + (x - \theta)^2]} \quad \text{where } -\infty < x < \infty \text{ and } -\infty < \theta < \infty.$$
You should use WolframAlpha.com to evaluate the complicated integral that will arise.
39. Let $X_1, \ldots, X_n$ be iid based on $f(x; \theta) = \frac{2x}{\theta}\, e^{-x^2/\theta}$ where $x > 0$. Show that $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i^2$ is efficient.
43. Suppose we try to model the test-taking abilities of a given student by the CRV X with pdf
$$f_X(x; \theta) = \frac{3(\theta - x)^2}{\theta^3} \quad \text{where } 0 < x < \theta.$$
Here, the constant θ is unknown and is determined by the work ethic and background training of the student. Design an approximate 93% MLE CI for θ and use it to build a CI for the data $x_1 = 0.8, x_2 = 0.92, x_3 = 0.81, x_4 = 0.96$ (which represent random test scores of the student: 80%, 92%, 81%, and 96%).
44. Consider a RV modeled by the density $f(x; \theta) = \frac{1}{\theta}\, x^{(1-\theta)/\theta}$ where $0 < x < 1$ and $\theta > 0$.
(a) Find the MLE for θ based on a sample $X_1, \ldots, X_n$.
(b) According to MLE theory, $\hat{\theta}_{MLE}$ should be asymptotically unbiased and consistent. Explicitly show that both of these are true for your result from part a.
45. One distribution useful for modeling frequencies in the field of spectroscopy has the pdf:
$$f(x; c) = \sqrt{\frac{c}{2\pi}} \cdot \frac{\exp\left(\frac{-c}{2x}\right)}{x^{3/2}} \quad \text{where } x > 0,\ c > 0$$
a. Show that for any c > 0, the area under this pdf is 1, as it must be. (Recall that $\Gamma(1/2) = \sqrt{\pi}$.)
b. Given iid data $X_1, X_2, \ldots, X_n$ from this distribution, find a formula for a 92% approximate MLE CI for c.
46. Suppose X is a continuous random variable with pdf
where x > 0, µ is an unknown real number, and σ 2 > 0 is known. Suppose we have a random
sample X1 , . . . , Xn and wish to estimate µ.
(a) Find µbMLE , the maximum likelihood estimator for µ. (If you have some clever, fast way to do
this part, don’t use it. Show all the typical steps for finding an MLE.)
(b) Use your answer in part a to help find an approximate (two-sided) 92% MLE CI for µ given
the sample X1 = 1, X2 = e, X3 = e2 , X4 = e and σ = 2. (Again, don’t take any shortcuts
when answering this part. Show all the usual steps.)
47. State the decision rule (i.e., test) that would be used to test the following hypotheses for the specific
test statistic mentioned. Then, make a decision using the data provided and write a conclusion.
Assume the data come from a normal distribution with unknown µ and known σ. Include a picture
(OK to draw by hand, doing this in R is inefficient) of the sampling distribution for the test statistic
and label the critical region.
(a) $H_0: \mu = 20$, $H_1: \mu < 20$, $n = 16$, $\sigma = 3$, and $\alpha = 0.06$. Test stat: $\bar{x}$. Data: $\bar{x} = 18.5$
(b) $H_0: \mu = 20$, $H_1: \mu < 20$, $n = 16$, $\sigma = 3$, and $\alpha = 0.06$. Test stat: $\frac{\bar{x} - 20}{\sigma/\sqrt{n}}$. Data: $\bar{x} = 18.5$
(c) $H_0: \mu = 10$, $H_1: \mu \ne 10$, $n = 100$, $\sigma = 0.4$, and $\alpha = 0.12$. Test stat: $\bar{x}$. Data: $\bar{x} = 11$
(d) $H_0: \mu = 50$, $H_1: \mu > 50$, $n = 60$, $\sigma = 4$, and $\alpha = 0.08$. Test stat: $3\bar{x}$. Data: $\bar{x} = 50.5$
Note: Life is about trade-offs. This problem helps you see this. For example, part a has a nicer-
looking test stat, but the distribution it follows is a little messier. Part b has a messy test stat, but
the distribution it follows is very nice. Part d is here to remind you that just about any expression
can act as a test stat, as long as you can determine its distribution. Since we never know what
expression might arise from MME or MLE, this reminder is comforting.
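When checking critical regions and P-values by hand, R’s qnorm and pnorm functions are handy. The numbers below are illustrative only and are not the answers to any part of this problem:

```r
# Critical value and tail area for a generic left-tailed z-test.
alpha <- 0.05        # illustrative level (not one of the alphas above)
qnorm(alpha)         # left-tail critical z-value, about -1.645
pnorm(-1.5)          # P-value for an observed z of -1.5, about 0.0668
```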
48. Calculate the P -values for problems 47b and 47c. Does using these P -values lead you to the same
conclusions as the critical regions did?
49. Suppose you wanted to alter problem 47a so that the P -value, when calculated, would equal 0.04.
If you could only change σ, what value would it need to equal to get the P -value to be 0.04?
50. In December 2017, the J-RPG Xenoblade Chronicles 2 was released for Nintendo Switch. The game
is epic in its number of main quests and side quests. Those that try to finish every aspect of the
game are known as “completionists”. What is the average time for all completionists in the world
(currently)? Assume completion times are normally distributed with unknown mean and standard
deviation 50 hours (a reasonable estimate for J-RPGs). Before you collect data, your friend claims
this average time is 250 hours (based on her personal experience). You think the value is something
different and go to HowLongToBeat.com to find some data. Based on when I looked at this page
(don’t use more recent data!), 96 completionists had submitted their times for an average of 254
hours. Define parameter(s), write hypotheses, draw a sampling distribution, and decide which
hypothesis to support using α = 0.01 (and any one of the three methods shown in class). [For those
curious, my completion time was around 225 hours, and my current play time is around 700 hours
because of expansion pass content!]
51. Students often wonder what to do if you get a P -value of exactly 0.05 when α = 0.05. In truth,
it doesn’t matter if you suggest rejecting H0 or keeping it, because the probability your P -value
exactly equals α is 0 (since the P -value is actually a continuous random variable). Let’s say you
wanted to be evil and design a problem for your next statistics exam where the P -value would
exactly equal 0.05. You plan to make a problem where we study µ from $X \sim N(\mu, 3^2)$ with data $\bar{x} = 7$ and $n = 28$. What value(s) should you have students use for the null hypothesis to get your P-value to be 0.05 assuming a two-sided $H_1$?
(a) Suppose a problem has H0 : µ = µ0 and H1 : µ ̸= µ0 . If a given data set causes us to reject
H0 when α = 0.02, would the same data force us reject H0 if α = 0.05?
(b) Suppose a problem has H0 : µ = µ0 and H1 : µ > µ0 . If a given data set causes us to reject
H0 for some α, would the same data force us to reject H0 if we change H1 to µ ̸= µ0 ? Assume
α remains the same.
53. You’ve just made the best app ever! You plan to upload it to the app store and are curious how
many reviews you might get from users. The histogram of review counts for various apps in the
Apple store is very right-skewed: most apps get a small number of reviews, but some apps, like Pandora, PayPal, and LinkedIn, get millions. It turns out that ln(review count) is roughly
normally distributed with σ = 2.6 (for those apps with more than 5 reviews). In this problem,
we’ll explore Y ∼ N (µ, 2.62 ) where Y = ln X is the natural log of the review counts. Your friend
claims that µ = 6.5, but you think it’s higher: people love rating stuff in the modern era! Using the
data set AppleStore.csv (found on Canvas/TritonEd in the Homework folder), conduct a hypothesis
test to determine whom to momentarily believe in life. This data set contains information on 7197
random apps from Apple’s app store. Load this into R using the “Import Dataset” button in the
upper right window of R studio. Make sure to remove rows with 5 or fewer reviews using R’s
subset command. Your answer will be a mix of R code and written work. Use α = 0.01. The rating_count_tot column lists how many times a given app has been rated/reviewed by users.
54. Most manufacturing processes create defective items. Often these are tolerated up to a certain
point, after which machines must be replaced, a costly and time-consuming process. Suppose you
are working at a company that will permit 6% of all items to be defective. Your boss is curious if
things have gotten worse and asks you to inspect 230 random items.
(a) If you find 23 items are defective, what advice should you give your boss based on an approx-
imate hypothesis test with α = 0.01? As always, follow the steps from class.
(b) In R, you can conduct the test quite easily with the prop.test command. Read the documen-
tation for this (type ?prop.test into the R Studio prompt) and write a single line of code that
reproduces your results from part a. (Note: Set the “continuity correction” to false. We don’t
discuss this in class, but you can read about it here if you’re curious.)
(c) Your boss is wondering if an exact test for this situation would give different results. Find the
P -value based on an exact test without using binom.test (you’ll still need the computer at one
point) and then with binom.test in R (you’ll get the same answer).
55. Does the idea known as “home-field advantage” (HFA) actually exist? HFA suggests that in a given
sport, the home team will beat the away team more often than half the time. Several theories have
been offered for why this might occur: familiarity with your arena/playing space, support of the
home crowd, refereeing that favors the home team, etc. To explore HFA, researchers looked at 1000
random NFL games in the last 40 years and found that in 574 cases, the home team won.
(a) Draw a conclusion about the idea of HFA using an approximate test with α = 0.02, and show
that you have met the conditions necessary for using this test.
(b) After publication of the findings from part a, you read on an NFL blog that “Data show the
existence of HFA, likely the result of biased refereeing.” Respond to this claim from a statistical
perspective.
56. People often look down on machine learning because in some settings, it can only improve things in
small increments. While this might be true, in many settings a slight change can have a huge impact.
As an example, Americans spend about 3 trillion dollars per year spread across 30 billion credit
card transactions. Suppose that 0.4% (0.004, as a decimal) of these transactions are fraudulent,
and hence, credit card companies lose money reimbursing their users. If machine learning could
help reduce the percentage of fraudulent claims even slightly, this would save companies billions of
dollars! Researchers at Visa have designed a new algorithm to predict fraud and are curious if it
has reduced illegal card usage. You are tasked with determining whether this claim is statistically
reasonable using an approximate test with α = 0.03.
(a) What is the smallest number of transactions you could look at and meet Larsen & Marx’s
requirements for using the approximate test?
(b) Suppose you end up looking at 2400 claims. What is the largest number of fraudulent claims
that could appear among those 2400 claims that would cause a move to the alternative hy-
pothesis? (Continue to think about the Larsen & Marx criterion as in part a.)
57. What hypothesis test has duality with the CI (−∞, X̄ + 1.3 · σ/√n)? Assume X ∼ N(µ, σ²) with σ
known and µ unknown.
58. Suppose you are studying a phenomenon that is well-modeled by N(µ, 3²). Existing research claims
that µ = 20, but you think µ might be lower based on recent changes in society. You’d like to
collect some data to verify your claim and plan to use a sample of size 60. If µ is actually 19, what
should you set α to in your hypothesis test if you want a Type II error rate of 0.08? Include a
beautiful picture in your answer.
59. A real number is said to be “normal” if, when written in any base b, the digits 0, 1, 2, . . . , b − 1
all appear with equal frequency. That is, there should be an equal proportion of 0s, and 1s, and
2s, etc. One of the strangest results in mathematics is this: It can be shown that nearly all real
numbers are normal, and yet, proving any particular number is normal is very difficult. Indeed,
mathematicians do not know if π or e are normal!
On Canvas/TritonEd, you can find the file pibinary.csv which contains the first 10242 digits of
pi in binary (base 2). Notice it starts with 11, which means 3 in binary. If π really is normal, what
percentage of 1s do you expect in its binary expansion? Assuming the first 10242 digits are repre-
sentative of all of π, conduct a hypothesis test on the proportion of 1s in π by randomly selecting
314 digits without replacement. You should set up variable(s) and hypotheses, write code to select
the digits and get the sample proportion, draw a picture of a sampling distribution, shade an area,
calculate a P -value, and reach a conclusion about your hypotheses using α = 0.05.
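A minimal sketch of the sampling step: the code below uses simulated 0/1 digits as a stand-in for the contents of pibinary.csv (the file itself isn't reproduced here, so reading and flattening it is left to you), and shows the sample-without-replacement idea with sample().

```r
set.seed(181)                                 # arbitrary seed for reproducibility
# Stand-in for the digits read from pibinary.csv: simulated 0/1 values.
digits <- rbinom(10242, size = 1, prob = 0.5)
samp <- sample(digits, size = 314, replace = FALSE)  # 314 digits, no replacement
phat <- mean(samp)                            # sample proportion of 1s
phat
```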
60. I recently read Pete Buttigieg’s book “Shortest Way Home”, a beautiful autobiography discussing
his life growing up and time as mayor of South Bend, Indiana. While most people thought the
book would focus on his sexuality (he was one of the first openly-gay men to run for the US
presidency), instead he spends most of the book discussing the hard challenges he faced as mayor.
In one section, he describes implementing a technology known as Shotspotter which listens for gun
shots in communities and automatically dispatches police when it believes a shot has been fired.
Naturally, the technology makes mistakes because slamming car doors and dropped objects might
sound like a gun shot. Suppose you work at Shotspotter and run a bunch of tests to determine the
accuracy of your technology, getting the below table.
Suppose we set the null hypothesis H0 : “No bullet was fired”, since this is our go-to belief about
sounds in life. Describe the alternative hypothesis for this setup, explain what Type I and II errors
would mean, and discuss the consequence of making each type of error if this technology were used
in an actual city. Then, find the Type I and II error rates in this data set. Finally, discuss which
type of error you personally think is worse in your hometown and explain why.
61. Congratulations! You’ve just gotten a job at the most popular museum in America: The Air and
Space Museum in Washington DC. Your first task as the resident statistician is to decide if a recent
exhibit change has increased the average number of visitors per day. Before the change, the number
of visitors per day was N(24000, 2000²). Your plan is to check attendance numbers on n random
days in the next year, but need n to be as small as possible because it is costly and intrusive to
count visitors. Your boss would be excited by a new daily average of 25000 (assume the spread is
unchanged by the exhibit) and wants you to use α = 0.04 in your test. If you demand a power of (at
least) 0.85, what sample size should you use? You MUST include a picture with your answer. Feel
free to use R/calculator to do some of the calculations. If you do, include your code/commands.
62. Every 4 years (or so), we get a leap day (2/29). This raises an interesting question: If you’re born
on leap day (a “leap-day baby”) but we’re in a non-leap year, would you rather celebrate your
birthday on 2/28 or 3/1? On 2/29/2020, I was listening to NPR and a guest claimed that leap-day
babies opt for the two options in equal proportion. Naturally, I doubted this (I wasn’t sure if the
preference would be for 2/28 or 3/1), and so let’s think about a study you could conduct. Imagine
we’ll ask 500 random leap-day babies which day they prefer and record the percentage that choose
2/28. If the true percentage that opt for 2/28 is 49%, find the Type II error rate and power of our
test assuming we use a significance level of 0.10. You must define a parameter, write hypotheses,
and include a beautiful picture in your answer.
63. This problem is inspired by one of my best students, who went on to get his PhD in materials
science at MIT and his MD from Columbia. If you look at car tires, they tend to average about
30000 miles before being replaced. In general, tire lifetimes are known to be normally distributed
with SD 4500 miles. Now, my student claims to have a new manufacturing process that raises this
average (he is right!) and keeps the spread the same. Let µ be the true average lifespan of tires
with the new manufacturing process. When you draw a sample of size n, the power of this HT is
0.2 when µ = 32000, and the power is 0.85176 when µ = 35000. Find α and n for this HT.
64. Before being de-platformed from Twitter in January 2021 (only to be re-platformed in November
2022), the number of tweets that former President Trump sent on a random day might be modeled
by X ∼ Poisson(λ). You’ve heard his average tweet rate was 8 tweets/day, but you believe it
might be lower. You plan to collect a sample of size 1 and reject H0 if X1 ≤ 3. Find the Type I
Error rate, and the Type II Error rate if λ = 6.4. On this problem, you must write hypotheses and
include a beautifully-labeled diagram in your answer with two pmfs and a rejection fence. (Include
your code if you use R to create the diagram.)
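One possible shape for the diagram code (a sketch, not the required answer; labels, offsets, and the fence at 3.5 are choices you can adjust): plot both pmfs with dpois and mark the rejection region.

```r
x <- 0:20
plot(x, dpois(x, 8), type = "h", lwd = 2, col = "black",
     xlab = "tweets in one day", ylab = "probability",
     main = "Poisson pmfs under H0 and H1")
points(x + 0.2, dpois(x, 6.4), type = "h", lwd = 2, col = "red")  # H1 pmf, offset slightly
abline(v = 3.5, lty = 2)   # rejection fence: reject H0 if X1 <= 3
legend("topright", legend = c("lambda = 8 (H0)", "lambda = 6.4 (H1)"),
       col = c("black", "red"), lwd = 2)
```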
65. Suppose X ∼ Exp(λ) where X is modeled by f(x; λ) = λe^{−λx}, where x, λ > 0. You draw a sample
of size n and plan to use the statistic Xmin to decide between two hypotheses.
(a) Show that Xmin also has an exponential distribution and determine what parameter it is based
on (instead of λ).
(b) You plan to test H0 : λ = 2 vs. H1 : λ > 2 via the rule: If Xmin < c, reject H0 ; else keep
H0 . What must c be if you want your Type II error rate to be 0.08 when the true value
of λ is 6? Assume that n = 5. On this problem, you must write hypotheses and include a
beautifully-labeled diagram in your answer with two pdfs and a rejection fence. (Include your
code if you use R to create the diagram.)
68. Students often find it hard to believe that (n − 1)S²/σ² ∼ χ²_{n−1}. So, let’s simulate this situation and
see if the data agree. To do so, generate n = 7 numbers from N(4, σ² = 6). Find the variance
of these n numbers, and replicate this process a total of 10000 times. Make a density plot of the
10000 values for (n − 1)S²/σ². Then, in red, overlay the pdf for χ²_{n−1}. Include your code and a sketch
of your plot. (Note: You may need to look up how to do several of these steps in R. This is totally
fine and how your life might look in the future. I’ve programmed in many languages: Pascal, C,
C++, Java, Python, R, SQL, html, Matlab, etc.; it simply isn’t possible to remember the syntax of
so many languages. Learning how to effectively search for syntax-related questions is an important
skill too!)
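One way the simulation above might be structured (a sketch; the seed is arbitrary, and replicate, var, density, and dchisq are the tools worth looking up):

```r
set.seed(1)                                     # arbitrary seed
n <- 7; sigma2 <- 6
stat <- replicate(10000, {
  x <- rnorm(n, mean = 4, sd = sqrt(sigma2))    # n draws from N(4, sigma^2 = 6)
  (n - 1) * var(x) / sigma2                     # the quantity (n-1)S^2 / sigma^2
})
plot(density(stat), main = "Simulated (n-1)S^2/sigma^2")
curve(dchisq(x, df = n - 1), col = "red", add = TRUE)  # overlay chi-square pdf, df = n-1
```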
69. It is also surprising that (X̄ − µ)/(S/√n) ∼ T_{n−1}. To empirically convince you of this, do these steps: Let
X ∼ N(3, σ² = 5²). Draw an iid sample of size n = 3. Using this sample, calculate the ratio (X̄ − µ)/(S/√n).
Replicate this process 10000 times. Draw the density, and then in red, overlay the density of T_{n−1}.
Include your code and a sketch of the plot.
70. Decide which is bigger and explain why. Only use R to confirm your answer.
71. Let X1 , . . . , X16 be a random sample from a normal distribution with mean 0. For what k is the
below inequality true? Explain your reasoning.
P(4X̄/S > k) = 0.08
72. Each day, Zelda starts the morning with a cup of coffee from her Keurig machine. Lately, she
has begun to think the machine is malfunctioning because the amount dispensed is different than
the advertised average amount, 12 oz. To explore this, she picks 7 random days from the next
month and actually weighs her coffee using a calibrated kitchen scale. She believes that the coffee
dispensing amounts are normally distributed, and her seven data points give x̄ = 12.9 and s = 0.7
oz.
73. In question 55, we explored home-field advantage (HFA) using win percentages. Now we revisit
the question using the “margin of victory”, which is amenable to means. Looking at data from
317 college (American) football games involving top-25-ranked teams, researchers found an average
margin of victory (home team score − away team score) of ȳ = 4.57 with s = 18.29. Do these data
support the notion of HFA?
(a) Conduct an appropriate hypothesis test using α = 0.03. Then, create a one-sided CI that has
duality with this hypothesis test.
(b) After publishing your findings, a rival academic argues that you have failed to establish the
normality of the population being considered (margins of victory for college football games
involving top-25-ranked teams). Respond to this criticism.
74. The most brutal algebra moment in 181A: Show that the pdf of T_n converges to the pdf of N(0, 1)
as n → ∞. When doing this problem you may use Stirling’s Formula: n! ≈ √(2πn) · nⁿ · e⁻ⁿ, and a
helpful fact from Calculus I: lim_{n→∞} (1 + a/n)ⁿ = eᵃ. Also, it is fine if you assume Γ(r) = (r − 1)!,
even when r is not an integer.
75. In class, I claimed that if a population distribution has moderate or severe skew, then as long as the
sample size was about 30 or more, we could count on T_{n−1} ≈ (X̄ − µ)/(S/√n). Our goal here is to empirically
show this. To begin, let n = 4 and draw the T_{n−1} density on the interval (−4, 4) using a dashed
line (use the lty parameter in the plot function to make a dashed line). Now, let X ∼ Exp(λ = 3),
draw a sample of size n, and compute (X̄ − µ)/(S/√n). Repeat this process to get a total of 50000 t-scores.
Plot the density of these atop T_{n−1} using a solid line. Both plots should be in the same color. Next,
repeat the above process for n = 10, 30, and 60 (use a new plot and new color for each n value).
Include your code and a sketch of the four plots.
Note: This example will also show that n ≈ 30 is not some absolute rule. You might be quite
bothered by the difference in your two graphs when n = 30. In general, the larger the skew in the
population, the larger n should be to overwhelm its effect. There is no perfect guideline here: data
analysis and statistics involve human, imperfect decision making. Sorry to break your quantitative
heart.
76. If you look at a bottle of ibuprofen, it will likely list the amount of medicine per pill (usually, 200
mg). Of course, this is only an average, and if you carefully measured the amount from pill to pill,
you would get a normal distribution. The spread of this distribution is very important because
giving too much or too little medicine can be dangerous. Suppose that the standard deviation in
dosage is 10 mg based on current manufacturing processes. You’ve come up with a new way to
create the pills that you believe will increase the precision of the dosage. To check this claim, you
produce a bunch of pills and randomly select some to measure the dosage. You get these values:
206.5, 198.9, 205.2, 205.8, 192.0, 199.5, 182.5, 191.9, 197.6, 190.7, 186.8, 187.3, 192.0.
77. I recently attended a Padres (baseball) game that was on pace to be the shortest game in the
modern era (all hope was ruined when people started scoring in the 8th inning!). I also happened
to be watching TV in 2010 when the longest tennis game ever was played (11 hours, 5 minutes).
All this got me thinking about the times of sporting events. For the sake of fans, commentators,
and marketing departments, it is helpful to have low variability in the time it takes to complete an
event, and an average game time that is long enough to entertain fans, but not so long that people
get exhausted. You decide to explore the effects of various rule changes that occurred in the NHL
(ice hockey!). Prior to a new rule set launched in the early 2000s, hockey game times were known to
be normally distributed with an average time of 2 hours and 36 minutes and a standard deviation
of 19.2 minutes. Using a random sample of 24 games from the 2012 season (these occurred after
rule changes; we’ll assume they are normally distributed), you find an average time of x̄ = 2.316
hours with sx = 18.3 minutes. If the goal of these changes was to decrease the average time but
keep the variation the same, do you think the new rules have done it? Argue using two hypothesis
tests, each with α = 0.02. Do the variance test first, and then the mean test.
78. Suppose that X ∼ N(µ, 1). We plan to test H0 : µ = 0 vs. H1 : µ = 4 using a random sample of
size n. Show that the BCR will take the form C = {X | Σ_{i=1}^{n} X_i > c} where c is a constant.
80. Suppose you take a random sample of size 12 from X ∼ N(0, σ²), whose variance is unknown. You
plan to test H0 : σ 2 = 1 against H1 : σ 2 = 3. Find the BCR of size α = 0.08.
81. Let X ∼ Bernoulli(p). We wish to test H0 : p = 1/3 vs. H1 : p < 1/3 using a random sample of size
11. Let C = {X | Σ_{i=1}^{11} X_i < c}. Show that C is a UMPCR and find its α when c = 2.3.
82. You and a friend are arguing about which model for X better explains your data. You claim H0 :
X ∼ Geom(3/4), while your friend is excited about some obscure distribution H1 : X ∼ Yule-Simon(ρ = 2).
Using Wikipedia, you see that the pmf for the Yule-Simon distribution is

f(x; ρ) = [ρ · ρ! · (x − 1)!] / (x + ρ)!  where x = 1, 2, 3, . . . and ρ is some integer
(a) Using R, draw a plot of the pmfs for each hypothesis on {1, . . . , 10}. Use black filled-in circles
for the H0 distribution, and red ones for H1 . Include your code and a sketch of both pmfs
on the same set of axes. Note that the Geometric distribution built into R only counts the
number of failures leading to the success, which is different than how we defined the Geometric
distribution, so you’ll need to fiddle with it to give the results we’re expecting. Type ?dgeom
to learn more. Also, you should not hunt down a package with the Yule-Simon distribution.
Instead, use R’s vectorized abilities to create any probabilities you need.
(b) What BCR do you get when drawing a sample of size 1 and using k = 1 in the NPL? In
addition to doing the math, you should explain how you can use your picture from part a to
check your answer.
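A sketch of how part a's plot might be built. The key ideas are the dgeom shift (R's Geometric counts failures before the first success, so our Geom(p) pmf at x is dgeom(x - 1, p)) and building the Yule-Simon probabilities with vectorized arithmetic straight from the pmf above; plotting details are up to you.

```r
x <- 1:10
geom_pmf <- dgeom(x - 1, prob = 3/4)   # shift: R counts failures, we count trials
rho <- 2
yule_pmf <- rho * factorial(rho) * factorial(x - 1) / factorial(x + rho)
plot(x, geom_pmf, pch = 19, col = "black", ylab = "probability")
points(x, yule_pmf, pch = 19, col = "red")
```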
83. Suppose X is a RV with pdf f(x; λ) = (1/2) λ e^{−λ|x|} where x ∈ R and λ > 0. Let X1 , . . . , Xn be a
random sample collected to test H0 : λ = λ0 vs. H1 : λ < λ0 .
(a) Find a general form for the UMP test.
(b) Explain why no UMP test exists for H0 : λ = λ0 vs. H1 : λ ̸= λ0 .
(c) Show that |X| ∼ Exp(λ).
(d) Using parts a and c, find the UMPCR with α = 0.03 when n = 16 and λ0 = 2. Then, find the
power of the UMP test based on this UMPCR if λ is actually 1.
84. According to 83b, there is no UMP test for H0 : λ = λ0 vs. H1 : λ ̸= λ0 . Find the GLRT for this
setup, simplifying your answer as much as you can.
85. You have decided to become an educational researcher and want to find a good model to describe test
scores in a given teacher’s classroom. You’re choosing between two different cdfs: H0 : F0 (x) = x3
and H1 : F1 (x) = x4 where 0 ≤ x ≤ 1 (here, x = 0.79 would mean a test score of 79%). Assume a
sample size of 1 is drawn for all parts of this problem.
(a) For what x is the likelihood ratio less than 1?
(b) Find the general form for a BCR for this test.
(c) Find the critical region of a best test with significance level α and the power of such a test.
86. Let X be a RV with pdf f(x; θ) = (m/θ) x^{m−1} e^{−x^m/θ}, where x > 0, θ > 0, and m is some known,
positive constant. Draw a random sample X1 , . . . , Xn to test H0 : θ = θ0 against H1 : θ > θ0 .
(a) Find the UMPCR.
(b) Suppose n = 10 and θ0 = 8. Find the UMPCR that has significance level α = 0.02.
Hint: It helps to find the pdf for X^m. To do so, use this result (sometimes taught in Math
180A): If Y is a RV with pdf fY(y) and h is either increasing or decreasing, then U = h(Y)
has pdf fU(u) = fY(y) · |dy/du| where y = h⁻¹(u).
87. Suppose you draw a random sample X1 , . . . , Xn from a CRV with density
f(x; θ) = (x/θ²) · exp(−x²/(2θ²)), where x ≥ 0, θ > 0
(a) Find θ̂_MLE.
(b) Find the GLRT to test H0 : θ = 1 vs. H1 : θ ̸= 1, simplifying as much as you can.
88. Using this Desmos link, experiment to find the answers to these questions related to the Beta
distribution:
(a) For what pair(s) (a, b) does Beta(a, b) become U nif (0, 1)?
(b) For what pair(s) (a, b) will the mode be at θ = 0.5?
(c) If you had a very strong belief that θ = 0.3, but would consider other values, what a and b
might be reasonable to use when setting up a distribution to model these prior beliefs using
the mode (answers will vary!)?
89. When information (coded as 0s and 1s) is sent across the internet, it can occasionally become
corrupted (i.e., a 0 sent becomes a 1 received, or a 1 sent becomes a 0 received). Suppose that over
a given connection, the ratio of 0s to 1s sent out is 2:7. Also, suppose that a 0 will be corrupted to
a 1 with probability 1/3, and a 1 will be corrupted to a 0 with probability 1/5. If a 1 is received
by a server, what is the probability that a 1 was actually sent by the sender?
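The computation here is a direct use of Bayes' Rule, with the law of total probability in the denominator. Symbolically, before plugging in any numbers:

```latex
P(\text{1 sent} \mid \text{1 received})
  = \frac{P(\text{1 received} \mid \text{1 sent})\, P(\text{1 sent})}
         {P(\text{1 received} \mid \text{1 sent})\, P(\text{1 sent})
          + P(\text{1 received} \mid \text{0 sent})\, P(\text{0 sent})}
```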
90. We often speak about unfair coins in this class, and you might wonder if it is possible to create
such a monster. It turns out that it’s not that hard if you’re willing to bend your coin so it has a
U-shape. Read this blog post to get started.
Suppose I’ve bent a coin and want to explore p_heads using Beta(a, b) to create a non-informative
prior. If we flip the coin 200 times and get 140 tails, what is the posterior distribution? What is
the posterior distribution if the prior is Beta(10, 10)? If the prior is Beta(a, b)?
91. Suppose you are trying to fit a Weibull distribution to your data where X ∼ Weibull(b) has
You’re unsure what value b should have, and your friend suggests encoding your beliefs about b into
a Gamma(r, α) prior. Given the data x1 , . . . , xn , show that b ∼ Gamma(r, α) is a conjugate prior
for this Weibull distribution, and find the updating rule that converts the prior hyper-parameters
into posterior hyper-parameters.
92. Once again, your friend is behind her computer using a random number generator modeled on
U nif (0, θ). You don’t know θ, so you decide to model your belief in what value your friend would
choose for θ. You’ve got a hunch that θ won’t be less than π, and that really large values for θ are
unlikely (would your friend really choose θ = 1234567890?). One option for this belief structure is
the Pareto distribution:
g(θ; π, α) = α π^α / θ^{α+1}, where θ ≥ π
Here, α is a hyper-parameter that controls how fast the likelihood of θ decays as θ gets bigger.
Show this distribution is a conjugate prior for the uniform distribution using the data x1 , . . . , xn
and give the updating rule. (Hint: Use indicator functions throughout.)
93. Another famous distribution is the log-normal which, when indexed by the parameter τ has the pdf
f_X(x; τ) = (√τ / (x√(2π))) · exp(−(τ/2)(ln x)²), where x > 0, τ > 0
(a) Find the Cramer-Rao Lower Bound (CRLB) on the variance of an unbiased estimator of τ for
a sample of size n.
(b) Suppose you wish to do Bayesian analysis on τ , so you assign it the prior distribution τ ∼
Gamma(r, λ). Given a random sample of data x1 , · · · , xn , show that the Gamma distribution
is a conjugate prior for the above log-normal distribution, and explain how the values r and λ
get updated in the posterior distribution.