
Point Estimation

STAT 2006 Chapter 2

Presented by Simon Cheung
Email: [email protected]
Department of Statistics, The Chinese University of Hong Kong

STAT 2006 - Jan 2022 1


Introduction

Suppose we want to learn a population mean μ or a population proportion p. For example,
• p is the unknown proportion of college students in Hong Kong who do not have a smartphone.
• μ is the unknown mean number of hours it takes to serve a customer in a restaurant.
In either case, we cannot survey the entire population: we cannot survey all college students in Hong Kong, nor can we measure the service time of every customer in the restaurant. So we take a random sample from the population and use the data to estimate the value of the population parameter.
To judge how good an estimate of the population parameter is, we introduce the concepts of an unbiased estimator and of a minimum variance unbiased estimator.


Point Estimation

We denote a random sample of size n by X_1, X_2, …, X_n and the corresponding observed values by x_1, x_2, …, x_n.
The pdf f(x; θ) of the random variable X depends on an unknown parameter θ, taking values in a set Ω. Ω is called the parameter space.
For example, if μ is the mean GPA of all college students, then the parameter space is Ω = {μ : 0 ≤ μ ≤ 4.3}, and if p denotes the proportion of students who work part-time, then the parameter space is Ω = {p : 0 ≤ p ≤ 1}.
Definition. A statistic is a function of X_1, X_2, …, X_n, written u(X_1, X_2, …, X_n). A point estimator θ̂ of θ is a statistic of the random sample, θ̂ = θ̂(X_1, X_2, …, X_n).


Point Estimation

For example,
• The statistic X̄ = (1/n) Σ_{j=1}^n X_j is a point estimator of the population mean μ.
• Let X_i = 0 or 1. The statistic p̂ = (1/n) Σ_{j=1}^n X_j is a point estimator of the population proportion p.
• The statistic S² = (1/(n − 1)) Σ_{j=1}^n (X_j − X̄)² is a point estimator of the population variance σ².


Point Estimation

Definition. A point estimate of θ is the value u(x_1, x_2, …, x_n) computed from the observed sample, where u(X_1, X_2, …, X_n) is a point estimator of θ.
For example, if the x_i are the observed GPAs of a random sample of 88 students, then
  x̄ = (1/88) Σ_{j=1}^{88} x_j
is a point estimate of the mean GPA μ of all students in the population. Define x_i = 0 if a student does not work part-time and x_i = 1 if a student works part-time; then p̂ = 0.11 is a point estimate of p, the proportion of all students in the population who work part-time.


Maximum Likelihood Estimation

Basic idea. A good estimate of the unknown parameter θ is the value of θ which maximizes the likelihood of getting the data we observed.
Suppose that X_1, X_2, …, X_n is a random sample for which the pdf of each X_i is f(x_i; θ). Since, by the definition of a random sample, X_1, X_2, …, X_n are independent, their joint pdf, viewed as a function of θ, is
  L(θ) = ∏_{j=1}^n f(x_j; θ)
(in the discrete case this equals P(X_1 = x_1, X_2 = x_2, …, X_n = x_n)). Our objective is to find θ̂ such that
  θ̂ = argmax_{θ ∈ Ω} L(θ).
We call L(θ) the likelihood function of θ.


Maximum Likelihood Estimation

Example. Suppose that X = 0 if a randomly selected student does not possess a car and X = 1 if the student possesses a car. Let p be the probability that X = 1; note that p = E(X). We draw a random sample X_1, X_2, …, X_n of n students. Find the maximum likelihood estimator of p.
Answer. The pdf of X is f(x; p) = p^x (1 − p)^{1−x}, where x = 0 or x = 1 and 0 < p < 1. Hence, the likelihood function of p is
  L(p) = ∏_{j=1}^n p^{x_j} (1 − p)^{1−x_j} = p^{Σ_{j=1}^n x_j} (1 − p)^{n − Σ_{j=1}^n x_j}.
Since the log function is increasing, the p̂ which maximizes L(p) also maximizes ln L(p). We have
  ln L(p) = (Σ_{j=1}^n x_j) ln p + (n − Σ_{j=1}^n x_j) ln(1 − p).


Maximum Likelihood Estimation

Taking the first derivative of the log-likelihood with respect to p, we have
  ∂ ln L(p)/∂p = (Σ_{j=1}^n x_j)/p − (n − Σ_{j=1}^n x_j)/(1 − p).
The p̂ which maximizes ln L(p) satisfies ∂ ln L(p)/∂p |_{p=p̂} = 0. Setting the first derivative to 0, we find that an estimate of p is p̂ = (1/n) Σ_{j=1}^n x_j. To verify that p̂ is a maximum, we take the second derivative with respect to p to obtain
  ∂² ln L(p)/∂p² = −n x̄/p² − n(1 − x̄)/(1 − p)² < 0.
It follows that p̂ indeed maximizes ln L(p). Thus, the mle of p is p̂ = X̄.
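As a quick numerical sanity check of this derivation, here is a minimal Python sketch (the 0/1 sample is made up for illustration) computing the mle p̂ as the sample mean:

```python
# MLE of p for Bernoulli data: the derivation above gives p-hat = sample mean.
# The 0/1 sample below is hypothetical, for illustration only.
x = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

n = len(x)
p_hat = sum(x) / n  # p-hat = (1/n) * sum of x_j

print(p_hat)  # 0.7
```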


Maximum Likelihood Estimation

Let X_1, X_2, …, X_n be a random sample of size n from a distribution that depends on one or more parameters θ_1, θ_2, …, θ_m, with pdf f(x; θ_1, θ_2, …, θ_m). Suppose that the parameter space is Ω. We have the following definitions.
• Regarded as a function of θ_1, θ_2, …, θ_m, the likelihood function for θ_1, θ_2, …, θ_m is defined as
  L(θ_1, θ_2, …, θ_m) = ∏_{j=1}^n f(x_j; θ_1, θ_2, …, θ_m),
where (θ_1, θ_2, …, θ_m) ∈ Ω.
• Let θ̂_i(X_1, X_2, …, X_n), i = 1, 2, …, m, be m statistics of the random sample X_1, X_2, …, X_n. If (θ̂_1(x_1, x_2, …, x_n), …, θ̂_m(x_1, x_2, …, x_n)) maximizes the likelihood function, then θ̂_i(X_1, X_2, …, X_n) is the maximum likelihood estimator (mle) of θ_i for i = 1, 2, …, m. The corresponding observed value, θ̂_i(x_1, x_2, …, x_n), is called the maximum likelihood estimate of θ_i for i = 1, 2, …, m.
Maximum Likelihood Estimation

Example. Suppose the weights of randomly selected female college students are normally distributed with unknown mean μ and variance σ². A random sample of 10 female college students gives the following weights (in pounds): 115, 122, 130, 127, 149, 160, 152, 138, 149, 180. Identify the likelihood function and the mle of μ, the mean weight of all female college students, and that of σ². Using the given sample, find the maximum likelihood estimates of μ and σ².
Answer. Let X be the weight in pounds of a randomly selected female college student. The pdf of X is
  f(x; μ, σ²) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)),  x ∈ ℝ.
The parameter space is Ω = {(μ, σ) : μ ∈ ℝ, σ > 0}. Thus, the likelihood function of μ and σ² is given by
  L(μ, σ²) = (2πσ²)^{−n/2} exp(−(1/(2σ²)) Σ_{j=1}^n (x_j − μ)²).


Maximum Likelihood Estimation

To find the mle of μ, we find the μ̂ which maximizes the log-likelihood log L(μ, σ²), treating σ² as fixed. Ignoring the constant terms,
  log L(μ, σ²) = −(n/2) log σ² − (1/(2σ²)) Σ_{j=1}^n (x_j − μ)².
Differentiating log L(μ, σ²) with respect to μ, we have
  ∂/∂μ log L(μ, σ²) = (1/σ²) Σ_{j=1}^n (x_j − μ).
Also, differentiating log L(μ, σ²) with respect to σ², we have
  ∂/∂σ² log L(μ, σ²) = −n/(2σ²) + (1/(2σ⁴)) Σ_{j=1}^n (x_j − μ)².


Maximum Likelihood Estimation

The (μ̂, σ̂²) which maximizes log L satisfies
  ∂/∂μ log L(μ, σ²) = 0  and  ∂/∂σ² log L(μ, σ²) = 0.
It follows that μ̂ = X̄ and σ̂² = (1/n) Σ_{j=1}^n (X_j − X̄)². To show that (μ̂, σ̂²) is a maximum, we differentiate log L(μ, σ²) twice with respect to μ and with respect to σ² respectively. We have
  ∂²/∂μ² log L |_{(μ̂, σ̂²)} = −n/σ̂² < 0,
  ∂²/∂(σ²)² log L |_{(μ̂, σ̂²)} = −n/(2σ̂⁴) < 0.


Maximum Likelihood Estimation

In full, the second-order partial derivatives are
  ∂² log L/∂μ² = −n/σ²,  ∂² log L/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ_{j=1}^n (x_j − μ)²,
and
  ∂² log L/∂σ²∂μ = ∂² log L/∂μ∂σ² = −(n/σ⁴)(x̄ − μ).
We have
  D(μ̂, σ̂²) = [∂² log L/∂μ² · ∂² log L/∂(σ²)² − (∂² log L/∂μ∂σ²)²] |_{(μ̂, σ̂²)} = n²/(2σ̂⁶) > 0.
Since D > 0 and ∂² log L/∂μ² < 0 at (μ̂, σ̂²), the second derivative test confirms that (μ̂, σ̂²) is a maximum.
Ref: http://sites.science.oregonstate.edu/math/home/programs/undergrad/CalculusQuestStudyGuides/vcalc/min_max/min_max.html
Maximum Likelihood Estimation

Based on the given sample, the maximum likelihood estimate of μ is
  μ̂ = (1/n) Σ_{j=1}^n x_j = (115 + 122 + 130 + 127 + 149 + 160 + 152 + 138 + 149 + 180)/10 = 142.2 pounds,
and that of σ² is
  σ̂² = (1/n) Σ_{j=1}^n (x_j − x̄)² = (1/n) Σ_{j=1}^n x_j² − x̄² = (115² + ⋯ + 180²)/10 − 142.2² = 347.96.
Note that
• the estimator is written with capital letters to reflect that it is a random variable, while an estimate is written with lowercase letters to reflect that its value is fixed, computed from the given sample;
• the mle of σ² is different from the sample variance S².
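The arithmetic on this slide can be reproduced with a short Python sketch; it also makes the last note concrete by computing both the mle σ̂² (divide by n) and the sample variance S² (divide by n − 1):

```python
# Weights (in pounds) of the 10 sampled female college students
weights = [115, 122, 130, 127, 149, 160, 152, 138, 149, 180]
n = len(weights)

mu_hat = sum(weights) / n                                  # mle of mu
sigma2_hat = sum((x - mu_hat) ** 2 for x in weights) / n   # mle of sigma^2 (divide by n)
s2 = sum((x - mu_hat) ** 2 for x in weights) / (n - 1)     # sample variance (divide by n-1)

print(round(mu_hat, 2), round(sigma2_hat, 2), round(s2, 2))  # 142.2 347.96 386.62
```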


Unbiased Estimation

We showed that if X is a Bernoulli random variable with parameter p, then p̂ = X̄ is the mle of p, and if X ~ n(μ, σ²), then μ̂ = X̄ and σ̂² = (1/n) Σ_{j=1}^n (X_j − X̄)² are the mles of μ and σ² respectively. But are mles good in any sense? One measure of goodness is unbiasedness.
Definition. An estimator θ̂(X_1, X_2, …, X_n) of θ is unbiased if E[θ̂(X_1, X_2, …, X_n)] = θ. Otherwise, it is called a biased estimator of θ, and the quantity E(θ̂) − θ ≠ 0 is called the bias of θ̂.
Example. If X is a Bernoulli random variable with parameter p, then the mle of p, that is p̂ = X̄, is an unbiased estimator of p:
  E(p̂) = (1/n) Σ_{j=1}^n E(X_j) = p.


Unbiased Estimation

Example. If X ~ n(μ, σ²), are the mles of μ and σ², that is μ̂ = X̄ and σ̂² = (1/n) Σ_{j=1}^n (X_j − X̄)², unbiased?
Answer.
  E(μ̂) = (1/n) Σ_{j=1}^n E(X_j) = μ,
  E(σ̂²) = (1/n) Σ_{j=1}^n E(X_j²) − E(X̄²) = (1/n) Σ_{j=1}^n (σ² + μ²) − (σ²/n + μ²) = (1 − 1/n) σ² ≠ σ².
Hence, μ̂ is an unbiased estimator of μ but σ̂² is a biased estimator of σ². Note that we used Var(X̄) = σ²/n and E(X²) = Var(X) + [E(X)]².


Unbiased Estimation

Example. If X ~ n(μ, σ²), find an unbiased estimator of σ².
Answer. Recall that
  (n − 1)S²/σ² ~ χ²_{n−1}.
Hence, E[(n − 1)S²/σ²] = n − 1 ⟹ E(S²) = σ². It follows that S² is an unbiased estimator of σ². However, S is not an unbiased estimator of σ.


Unbiased Estimation

Definition. θ̂ is an asymptotically unbiased estimator of θ if
  lim_{n→∞} Bias(θ̂) = lim_{n→∞} [E(θ̂) − θ] = 0,
where n is the sample size.
Example. If X ~ n(μ, σ²), then σ̂² = (1/n) Σ_{j=1}^n (X_j − X̄)², the mle of σ², is a biased estimator of σ². But since E(σ̂²) = (1 − 1/n) σ² ⟶ σ² as n tends to infinity, σ̂² is asymptotically unbiased for σ².


Method of Moments

Definition. Let X be a random variable with finite mean μ.
• E(X^k) is the k-th theoretical moment of the distribution about the origin, for k = 1, 2, …
• E[(X − μ)^k] is the k-th theoretical moment of the distribution about the mean, for k = 1, 2, …
• M_k = (1/n) Σ_{j=1}^n X_j^k is the k-th sample moment, for k = 1, 2, …
• M_k* = (1/n) Σ_{j=1}^n (X_j − X̄)^k is the k-th sample moment about the sample mean, for k = 1, 2, …


Method of Moments

Suppose that the pdf of X is characterized by m parameters θ_1, θ_2, …, θ_m. Then the theoretical moments are functions of the parameters, that is, E(X^k) = g_k(θ_1, θ_2, …, θ_m) for k = 1, 2, … Since the empirical distribution of X converges to its distribution as the sample size increases to infinity, equating the first m sample moments with the corresponding theoretical moments creates m moment equations to solve for θ_1, θ_2, …, θ_m. The values θ̃_1, θ̃_2, …, θ̃_m that satisfy the moment equations are called the moment estimators of θ_1, θ_2, …, θ_m respectively. The steps are:
• Equate the first sample moment about the origin, M_1 = (1/n) Σ_{j=1}^n X_j = X̄, to the first theoretical moment E(X).
• Equate the second sample moment about the origin, M_2 = (1/n) Σ_{j=1}^n X_j², to the second theoretical moment E(X²).
• Continue equating sample moments about the origin, M_k, with the corresponding theoretical moments E(X^k), k = 3, 4, …, until we have as many equations as we have parameters.
• Solve for the parameters.


Method of Moments

Example. Let X be a Bernoulli random variable with parameter p. Find the moment estimator of p.
Answer. Since E(X) = p, equate the first sample moment X̄ to the first theoretical moment p to get p̃ = X̄. Hence the moment estimator of p is X̄.
Example. Let X ~ n(μ, σ²). Find the moment estimators of μ and σ².
Answer. Since E(X) = μ and E(X²) = μ² + σ², equating the first and second sample moments to the corresponding theoretical moments gives μ̃ = X̄ and μ̃² + σ̃² = (1/n) Σ_{j=1}^n X_j². Solving, μ̃ = X̄ and σ̃² = (1/n) Σ_{j=1}^n X_j² − X̄².


Method of Moments

Another form of the method:
• Equate the first sample moment about the origin, M_1 = (1/n) Σ_{j=1}^n X_j = X̄, to the first theoretical moment E(X).
• Equate the second sample moment about the mean, M_2* = (1/n) Σ_{j=1}^n (X_j − X̄)², to the second theoretical moment about the mean, Var(X) = E[(X − μ)²].
• Continue equating sample moments about the mean, M_k*, with the corresponding theoretical moments about the mean E[(X − μ)^k], k = 3, 4, …, until we have as many equations as we have parameters.
• Solve for the parameters.


Method of Moments

Example. Let X be a Gamma random variable with parameters α and θ, with pdf
  f(x) = (1/(Γ(α) θ^α)) x^{α−1} e^{−x/θ},  x > 0.
Find the moment estimators of α and θ.
Answer. The first theoretical moment about the origin is E(X) = αθ, and the second theoretical moment about the mean is Var(X) = αθ². Equating the first sample moment about the origin and the second sample moment about the mean to the corresponding theoretical moments, we have α̃θ̃ = X̄ and α̃θ̃² = (1/n) Σ_{j=1}^n (X_j − X̄)². It follows that
  θ̃ = (1/(nX̄)) Σ_{j=1}^n (X_j − X̄)²  and  α̃ = X̄/θ̃ = nX̄² / Σ_{j=1}^n (X_j − X̄)².
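A small Python sketch of these two moment estimators (the data below are hypothetical, chosen only to exercise the formulas):

```python
# Method-of-moments estimators for Gamma(alpha, theta), from the slide:
#   theta-tilde = (1/(n*xbar)) * sum (x_j - xbar)^2,  alpha-tilde = xbar / theta-tilde
# The data are hypothetical, for illustration only.
x = [2.1, 3.4, 1.8, 5.0, 2.7]
n = len(x)

xbar = sum(x) / n
m2_star = sum((xj - xbar) ** 2 for xj in x) / n  # 2nd sample moment about the mean

theta_tilde = m2_star / xbar   # equals (1/(n*xbar)) * sum of squared deviations
alpha_tilde = xbar / theta_tilde

print(round(theta_tilde, 4), round(alpha_tilde, 2))  # 0.4333 6.92
```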


Method of Moments

Example. Let X ~ n(μ, σ²). Find the moment estimators of μ and σ².
Answer. The first theoretical moment about the origin is E(X) = μ, and the second theoretical moment about the mean is Var(X) = σ². Equating the first sample moment about the origin and the second sample moment about the mean to the corresponding theoretical moments, we have μ̃ = X̄ and σ̃² = (1/n) Σ_{j=1}^n (X_j − X̄)².


UMVUE

Definition. An unbiased estimator θ̂ of θ is called the uniformly minimum variance unbiased estimator (UMVUE) if
  θ̂ = argmin_{θ̃ unbiased} Var(θ̃).
Definition. The Fisher information of size n about θ is defined as
  I_n(θ) = E[(∂ℓ(θ)/∂θ)²],
where ℓ(θ) = log L(θ) is the log-likelihood function of a random sample of size n.


UMVUE

Theorem. Let X_1, …, X_n be a random sample from a population with pdf f(x; θ). Then, under certain regularity conditions, we have the following results.
• I_n(θ) = nI(θ), where
  I(θ) = E[(∂ log f(X; θ)/∂θ)²]
is the Fisher information of size 1 about θ.
• I(θ) = E[−∂² log f(X; θ)/∂θ²].
• For an unbiased estimator θ̂, the Cramer-Rao inequality is Var(θ̂) ≥ I_n(θ)^{−1}.


UMVUE

If θ̂ is an unbiased estimator of θ and Var(θ̂) = I_n(θ)^{−1}, then θ̂ is the UMVUE of θ.
Example. Show that X̄_n is the UMVUE of the mean of a normal population n(μ, σ²), with σ² known.
  f(x; μ) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²)) ⟹ log f(x; μ) = −(1/2) log(2πσ²) − (x − μ)²/(2σ²).
  ∂ log f(x; μ)/∂μ = (x − μ)/σ² ⟹ I(μ) = E[(∂ log f(X; μ)/∂μ)²] = E[(X − μ)²/σ⁴] = 1/σ².
Alternatively,
  I(μ) = −E[∂² log f(X; μ)/∂μ²] = 1/σ².
Hence, the Cramer-Rao lower bound is
  Var(μ̂) ≥ I_n(μ)^{−1} = 1/(nI(μ)) = σ²/n.
Since E(X̄_n) = μ and Var(X̄_n) = σ²/n, X̄_n is the UMVUE of μ.
UMVUE

Example. Show that X̄_n is the UMVUE of the parameter θ of a Bernoulli population.
  f(x; θ) = θ^x (1 − θ)^{1−x},  x ∈ {0, 1}.
  ∂ log f(x; θ)/∂θ = ∂/∂θ [x log θ + (1 − x) log(1 − θ)] = x/θ − (1 − x)/(1 − θ) = (x − θ)/(θ(1 − θ)).
  I(θ) = E[(∂ log f(X; θ)/∂θ)²] = Var(X)/(θ(1 − θ))² = 1/(θ(1 − θ)).
The Cramer-Rao lower bound is
  Var(θ̂) ≥ I_n(θ)^{−1} = 1/(nI(θ)) = θ(1 − θ)/n.
Since E(X̄_n) = θ and Var(X̄_n) = θ(1 − θ)/n, X̄_n is the UMVUE of θ.


Consistency

θ̂ is a consistent estimator of θ if θ̂ ⟶_p θ, that is, for any ε > 0,
  lim_{n→∞} P(|θ̂ − θ| > ε) = 0.
Property. If θ̂_n is a sequence of estimators of a parameter θ satisfying
• lim_{n→∞} Var(θ̂_n) = 0,
• lim_{n→∞} Bias(θ̂_n) = 0,
then θ̂_n is a consistent sequence of estimators of θ.
Proof. E[(θ̂_n − θ)²] = Var(θ̂_n) + [Bias(θ̂_n)]² ⟶ 0, and by Chebyshev's inequality, P(|θ̂_n − θ| > ε) ≤ E[(θ̂_n − θ)²]/ε² ⟶ 0.


Consistency

Property. If θ̂ ⟶_p θ and θ̃ ⟶_p θ′, then
• θ̂ ± θ̃ ⟶_p θ ± θ′
• θ̂θ̃ ⟶_p θθ′
• θ̂/θ̃ ⟶_p θ/θ′, assuming that θ̃ ≠ 0 and θ′ ≠ 0
• If g is any real-valued function that is continuous at θ, then g(θ̂) ⟶_p g(θ).


Consistency

Example. Suppose that X_1, …, X_n are independent random variables having the same finite mean μ = E(X_1), finite variance σ² = Var(X_1), and finite fourth moment μ_4 = E(X_1⁴). Show that X̄ is a consistent estimator of μ and S² is a consistent estimator of σ².
Answer. Note that E(X̄) = μ (unbiased) and Var(X̄) = σ²/n ⟶ 0 as n → ∞. Hence, X̄ is a consistent estimator of μ. (We can also prove this using the weak law of large numbers.)
For S², we have
  S² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄)² = (n/(n − 1)) [(1/n) Σ_{i=1}^n X_i² − X̄²].
By the weak law of large numbers, (1/n) Σ_{i=1}^n X_i² ⟶_p E(X_1²) and X̄ ⟶_p μ. Since g(x) = x² is continuous, X̄² ⟶_p μ². Hence, S² ⟶_p E(X_1²) − μ² = σ².


Consistency

Note that when X_1, …, X_n are independent n(μ, σ²) random variables, we know that
  (n − 1)S²/σ² ~ χ²_{n−1},
where χ²_{n−1} is the chi-square distribution with n − 1 degrees of freedom. Therefore,
  E[(n − 1)S²/σ²] = n − 1 ⟹ E(S²) = σ²  (unbiased).
Since Var[(n − 1)S²/σ²] = 2(n − 1),
  Var(S²) = 2σ⁴/(n − 1) ⟶ 0
as n ⟶ ∞. Thus, S² is a consistent estimator of σ².


Asymptotic normality of MLE

Theorem. Let X_1, …, X_n be an independent random sample from a population with pdf f(x; θ). Suppose that θ̂ is the MLE of the true parameter θ_0. Then, under certain regularity conditions, as n ⟶ ∞,
  √n (θ̂ − θ_0) ⟶_d n(0, I(θ_0)^{−1}).
In other words, the large-sample distribution of the mle θ̂ is n(θ_0, (1/n) I(θ_0)^{−1}).
Remark: I(θ_0) corresponds to the Fisher information of size 1.


Asymptotic normality of MLE

Example. Suppose that X_1, …, X_n is an independent random sample from a Poisson distribution with pdf
  f(x; λ) = λ^x e^{−λ}/x!,  x = 0, 1, 2, …;  λ > 0.
Then the maximum likelihood estimator of λ is λ̂ = X̄. Ignoring terms not involving λ, the log-likelihood function is given by
  log L(λ; X_1, …, X_n) = nX̄ log λ − nλ.
Thus,
  ∂ log L(λ)/∂λ = nX̄/λ − n;  ∂² log L(λ)/∂λ² = −nX̄/λ² ⟹ I_n(λ) = −E[∂² log L(λ)/∂λ²] = n/λ.
The large-sample distribution of λ̂ = X̄ is n(λ, λ/n).
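The theorem can be checked by simulation. The sketch below uses only the Python stdlib (a Knuth-style sampler stands in for a library Poisson generator), drawing many samples of size n = 100 from a Poisson(λ = 4) distribution and confirming that the MLE X̄ has mean close to λ and variance close to λ/n = 0.04:

```python
import math
import random

random.seed(2006)

def poisson_sample(lam):
    """Knuth's method: count uniform draws until their product drops below e^(-lam)."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

lam, n, reps = 4.0, 100, 2000
mles = [sum(poisson_sample(lam) for _ in range(n)) / n for _ in range(reps)]

mean_mle = sum(mles) / reps
var_mle = sum((m - mean_mle) ** 2 for m in mles) / reps

print(round(mean_mle, 2), round(var_mle, 3))  # close to lam = 4 and lam/n = 0.04
```

The empirical variance of the 2000 simulated MLEs should sit near the theoretical asymptotic variance λ/n, illustrating the n(λ, λ/n) approximation.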


Interval Estimation

Point estimates, such as the sample proportion p̂, the sample mean x̄, or the sample variance σ̂², depend on the particular sample. When we use the sample mean x̄ to estimate the population mean μ, can we be confident that x̄ is close to μ? Can we have a measure of how close the sample estimator θ̂ is to the population parameter θ?
One approach is to find an upper bound U and a lower bound L such that the value of the population parameter θ is between L and U with probability 1 − α, that is,
  P(L < θ < U) = 1 − α,
where 1 − α ∈ (0, 1) is called the confidence coefficient, or the confidence level, of the interval (L, U).
Typical confidence coefficients are 90%, 95%, and 99%. For example, we are 95% confident that the population mean is between L and U.


Interval Estimation

Example. For an independent random sample X_1, X_2, X_3, X_4 from n(μ, 1), consider the interval estimator (X̄ − 1, X̄ + 1) of μ. Then the probability that μ ∈ (X̄ − 1, X̄ + 1) is given by
  P(μ ∈ (X̄ − 1, X̄ + 1)) = P(X̄ − 1 ≤ μ ≤ X̄ + 1) = P(−1 ≤ X̄ − μ ≤ 1)
   = P(−2 ≤ (X̄ − μ)/√(1/4) ≤ 2) = P(−2 ≤ Z ≤ 2) ≈ 0.9544,
where Z ~ n(0, 1). Note that we have over a 95% chance of covering μ.


Interval Estimation

Definition. Let X_1, X_2, …, X_n be an independent sample from a distribution with pdf f_θ. The confidence coefficient 1 − α of an interval estimator (L(X_1, X_2, …, X_n), U(X_1, X_2, …, X_n)) of θ is given by
  1 − α = P(θ ∈ (L(X_1, X_2, …, X_n), U(X_1, X_2, …, X_n))).
An interval estimator, together with a measure of confidence, is known as a confidence interval.


Interval Estimation

Definition. The value z_{α/2} is the Z-value (obtained from a standard normal table) such that the area to the right of it under the standard normal curve is α/2, that is, P(Z ≥ z_{α/2}) = α/2. By symmetry of the normal distribution, P(Z ≤ −z_{α/2}) = α/2.


Confidence intervals for means – One sample

Definition. A random variable Q(X_1, X_2, …, X_n; θ) is a pivotal quantity if its distribution is free of θ.
Example. Let X ~ n(μ, σ²), where σ² is a known quantity. Then, since X̄ ~ n(μ, σ²/n), we have
  Z = (X̄ − μ)/(σ/√n) ~ n(0, 1).
Note that Z is a pivotal quantity. A 1 − α confidence interval for the mean μ is X̄ ± z_{α/2} σ/√n:
  1 − α = P(−z_{α/2} ≤ Z ≤ z_{α/2}) = P(−z_{α/2} ≤ (X̄ − μ)/(σ/√n) ≤ z_{α/2})
   = P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n).


Confidence intervals for means – One sample

Example. A random sample of 126 people subjected to constant inhalation of automobile exhaust fumes in cities of Hong Kong had an average blood lead level concentration of 29.2 μg/dl. Assume that X, the blood lead level of a randomly selected person, is normally distributed with a standard deviation of σ = 7.5 μg/dl. From past data, it is known that the average blood lead level concentration of humans with no exposure to automobile exhaust is 18.2 μg/dl. Is there convincing evidence that people in Hong Kong who are exposed to constant auto exhaust have an elevated blood lead level concentration?
Answer. A 95% confidence interval for the mean μ is x̄ ± z_{α/2} σ/√n = (27.89, 30.51), where x̄ = 29.2, σ = 7.5, n = 126, and z_{0.025} = 1.96.
Since the entire 95% confidence interval lies above 18.2 μg/dl, the mean blood lead concentration of unexposed humans, there is convincing evidence that people exposed to constant auto exhaust have an elevated blood lead level concentration.
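The interval above can be reproduced directly; a minimal sketch of the z-interval x̄ ± z_{α/2} σ/√n:

```python
import math

# Blood lead example: xbar = 29.2, sigma = 7.5, n = 126, z_{0.025} = 1.96
xbar, sigma, n = 29.2, 7.5, 126
z = 1.96

half_width = z * sigma / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(round(lower, 2), round(upper, 2))  # 27.89 30.51
```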
Confidence intervals for means – One sample

Example. A publishing company has just published a new college textbook. Before the company decides the price of the book, it wants to know the average price of all such textbooks in the market. The research department at the company took a sample of 36 such textbooks and collected information on their prices. This information produced a mean price of $48.40 for this sample. It is known that the standard deviation of the prices of all such textbooks is $4.50. Construct a 90% confidence interval for the mean price of all such college textbooks, assuming that the underlying population is normal.
Answer. Given n = 36, x̄ = 48.40, and σ = 4.50. The 90% confidence interval for the mean price of all such college textbooks is given by
  (x̄ − 1.645 × 4.5/√36, x̄ + 1.645 × 4.5/√36) = (47.1662, 49.6338).


Confidence intervals for means – One sample

The statement P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = 1 − α cannot be interpreted to say that the probability that the population mean μ falls inside a particular computed interval (x̄ − z_{α/2} σ/√n, x̄ + z_{α/2} σ/√n) is 1 − α. The correct interpretation is the following:
• Suppose we take a large number of samples.
• We calculate a 95% confidence interval for each sample.
• Then we expect about 95% of the intervals to contain the actual unknown μ.


Confidence intervals for means – One sample

If a confidence interval for a parameter θ is (L, U), then the length of the interval is U − L. We are interested in obtaining intervals that are as narrow as possible. For example, consider the following two cases:
• We can be 95% confident that the average amount of money spent monthly on housing in the U.S. is between $300 and $3300.
• We can be 95% confident that the average amount of money spent monthly on housing in the U.S. is between $1100 and $1300.
In the first statement, the average amount of money spent monthly can be anywhere between $300 and $3300, whereas, in the second statement, the average amount has been narrowed down to somewhere between $1100 and $1300. So, of course, we would prefer to make the second statement, because it gives us a more specific range for the magnitude of the population mean.


Confidence intervals for means – One sample

For a normal distribution with known variance σ², the length of the 1 − α confidence interval (X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n) is
  (X̄ + z_{α/2} σ/√n) − (X̄ − z_{α/2} σ/√n) = 2 z_{α/2} σ/√n.
It tells us that
• As the population standard deviation σ decreases, the length of the interval decreases. We have no control over the population standard deviation, so this factor doesn't help us all that much.
• As the sample size n increases, the length of the interval decreases. We should select as large a sample as we can afford.
• As the confidence level decreases, the length of the interval decreases. For example, for a 95% interval, z = 1.96, whereas for a 90% interval, z = 1.645. We want a high confidence level, but not so high as to produce such a wide interval as to be useless.
Confidence intervals for means – One sample

When σ is not known, we estimate the population variance σ² with the sample variance S².
Theorem. If X_1, X_2, …, X_n is a random sample from the n(μ, σ²) distribution, then
  T = (X̄ − μ)/(S/√n)
has a t-distribution with n − 1 degrees of freedom.
Proof. By the definition of the t-distribution, if Z ~ n(0, 1), U ~ χ²_r, and Z and U are independent, then T = Z/√(U/r) has a t-distribution with r degrees of freedom. Now, we have Z = (X̄ − μ)/(σ/√n) ~ n(0, 1), (n − 1)S²/σ² ~ χ²_{n−1}, and X̄ and S² are independent. Hence,
  T = [(X̄ − μ)/(σ/√n)] / √[((n − 1)S²/σ²)/(n − 1)] = (X̄ − μ)/(S/√n) ~ t_{n−1}.
Confidence intervals for means – One sample

Theorem. If X_1, X_2, …, X_n is a random sample from the n(μ, σ²) distribution, then a 1 − α confidence interval for the population mean μ is X̄ ± t_{α/2, n−1} S/√n.
Note that T = (X̄ − μ)/(S/√n) is a pivotal quantity.


Confidence intervals for means – One sample

Terminology. With the t-interval X̄ ± t_{α/2, n−1} S/√n, we say that
• X̄ is a point estimator of μ; x̄ is a point estimate of μ.
• X̄ ± t_{α/2, n−1} S/√n is an interval estimator of μ; x̄ ± t_{α/2, n−1} s/√n is an interval estimate of μ.
• S/√n is the standard error of the mean.
• t_{α/2, n−1} S/√n is the margin of error.


Confidence intervals for means – One sample

Example. A random sample of 16 people yielded the following data on the number of pounds of beef consumed per year:
118, 115, 125, 110, 112, 130, 117, 112, 115, 120, 113, 118, 119, 122, 123, 126
What is the average number of pounds of beef consumed each year per person?
The first step of the analysis is to check that the data follow a normal distribution. This can be done using a normal probability plot. Procedure for creating a normal probability plot:
• Use the cut-off points i/n, i = 1, 2, …, n, to find the corresponding quantiles of the normal distribution with mean x̄ and standard deviation s, that is, Φ⁻¹(i/n), i = 1, 2, …, n.
• Sort the data in ascending order, that is, {x_i, i = 1, 2, …, n} ⟶ y_1 ≤ y_2 ≤ ⋯ ≤ y_n.
• Scatter plot {Φ⁻¹(i/n), i = 1, 2, …, n} against {y_i, i = 1, 2, …, n}.
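The plotting coordinates can be computed with the Python stdlib (no plotting library is needed to get the pairs). One caveat: with the cut-offs i/n exactly as written, the last point Φ⁻¹(n/n) = Φ⁻¹(1) is infinite, so the sketch below uses the common adjustment (i − 0.5)/n:

```python
from statistics import NormalDist, mean, stdev

data = [118, 115, 125, 110, 112, 130, 117, 112, 115, 120,
        113, 118, 119, 122, 123, 126]
n = len(data)
xbar, s = mean(data), stdev(data)

# Theoretical quantiles of n(xbar, s); (i - 0.5)/n avoids inv_cdf(1.0) = +inf
dist = NormalDist(mu=xbar, sigma=s)
quantiles = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

ordered = sorted(data)                 # y_1 <= y_2 <= ... <= y_n
pairs = list(zip(quantiles, ordered))  # scatter these pairs to draw the plot

print(round(xbar, 2), round(s, 2))  # 118.44 5.66
```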


Confidence intervals for means – One sample

Example (continued).
If the points scatter around the y = x line, then the data are approximately normally distributed. Since the data points fall at least approximately on a straight line, there is no reason to conclude that the data are not normally distributed. Note that x̄ = 118.44 and s = 5.66. For a 95% confidence interval with n = 16 data points, we have t_{0.025, 15} = 2.1314. The 95% confidence interval for the mean μ is x̄ ± t_{α/2, n−1} s/√n = (115.42, 121.46). That is, we are 95% confident that the average amount of beef consumed each year per person is between 115.42 and 121.46 pounds.
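The t-interval on this slide can be reproduced with the stdlib (t_{0.025, 15} = 2.1314 is taken from the slide's table value rather than computed):

```python
from statistics import mean, stdev

data = [118, 115, 125, 110, 112, 130, 117, 112, 115, 120,
        113, 118, 119, 122, 123, 126]
n = len(data)
xbar, s = mean(data), stdev(data)

t = 2.1314  # t_{0.025, 15}, from the slide
half_width = t * s / n ** 0.5
lower, upper = xbar - half_width, xbar + half_width

print(round(lower, 2), round(upper, 2))  # 115.42 121.45
```

The upper endpoint comes out 121.45 here; the slide's 121.46 reflects rounding x̄ and s to two decimals before computing the interval.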


Confidence intervals for means – One sample

Non-normal data
By the Central Limit Theorem, the large-sample distribution of X̄ is n(μ, σ²/n), irrespective of whether the data X_1, X_2, …, X_n are normally distributed. Since t_n ⟶ n(0, 1) as n → ∞, the large-sample distribution of T = (X̄ − μ)/(S/√n) can be approximated by the standard normal distribution. Thus, the two intervals x̄ ± t_{α/2, n−1} s/√n and x̄ ± z_{α/2} s/√n give similar results for large samples.
Therefore, a rule of thumb is that we may still use the t-interval for the mean, x̄ ± t_{α/2, n−1} s/√n, even for non-normal data, provided the sample size is large enough.


Confidence intervals for means – One sample

Example. A random sample of 54 guinea pigs yielded the following survival times (in days):
36,18,91,89,87,86,52,50,149,120,
119,118,115,114,114,108,102,189,178,173,
167,167,166,165,160,216,212,209,292,279,
278,273,341,382,380,367,355,446,432,421,
641,638,637,634,621,608,607,603,688,685,
663,650,735,725
What is the mean survival time (in days) of the population of guinea pigs?



Confidence intervals for means – One sample
Example.
The normal probability plot indicates that the data do not adhere well to the x = y straight line. It suggests that the survival times are not normally distributed.
Since the sample size n = 54 is large, we can use the t-interval for the mean, x̄ ± t_{α/2,n−1}·s/√n. The 95% confidence interval for the mean survival time is (251.8444, 375.9705) (in days).
If we use the normal interval for the mean, x̄ ± z_{α/2}·s/√n, the 95% confidence interval for the mean survival time is (253.2610, 374.5538).
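Both intervals can be checked directly from the listed data; the critical value t_{0.025,53} ≈ 2.0058 is hard-coded here as an approximation:

```python
import math
from statistics import mean, stdev

# Survival times from the example (n = 54)
times = [36, 18, 91, 89, 87, 86, 52, 50, 149, 120,
         119, 118, 115, 114, 114, 108, 102, 189, 178, 173,
         167, 167, 166, 165, 160, 216, 212, 209, 292, 279,
         278, 273, 341, 382, 380, 367, 355, 446, 432, 421,
         641, 638, 637, 634, 621, 608, 607, 603, 688, 685,
         663, 650, 735, 725]
n = len(times)
xbar, s = mean(times), stdev(times)

t_crit = 2.0058                           # approx t_{0.025,53}
half_t = t_crit * s / math.sqrt(n)
lo, hi = xbar - half_t, xbar + half_t     # t-interval

half_z = 1.96 * s / math.sqrt(n)          # z_{0.025} = 1.96
lo_z, hi_z = xbar - half_z, xbar + half_z # normal interval
```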



Confidence intervals for means – Two samples

The Two-sample t Test: The case of Equal Variances


Let X₁, X₂, …, X_m and Y₁, Y₂, …, Y_n be two independent samples drawn from n(μ_X, σ²) and n(μ_Y, σ²) respectively. We have X̄ ~ n(μ_X, σ²/m), Ȳ ~ n(μ_Y, σ²/n), and X̄ and Ȳ are independent. Thus,
X̄ − Ȳ ~ n(μ_X − μ_Y, σ²(1/m + 1/n)).
When σ is known, a 1 − α confidence interval for μ_X − μ_Y is (X̄ − Ȳ) ± z_{α/2}·σ·√(1/m + 1/n).


Confidence intervals for means – Two samples
The Two-sample t Test: The case of Equal Variances
When σ² is unknown, we need to estimate it. Since (m − 1)S_X²/σ² ~ χ²_{m−1}, (n − 1)S_Y²/σ² ~ χ²_{n−1}, and S_X² and S_Y² are independent, we can conclude that
[(m − 1)S_X² + (n − 1)S_Y²]/σ² ~ χ²_{m+n−2}.
It follows that S_p² = [(m − 1)S_X² + (n − 1)S_Y²]/(m + n − 2) is an unbiased estimator of σ².
Note that we have used the following result: when U ~ χ²_k = Γ(k/2, 1/2) and V ~ χ²_h = Γ(h/2, 1/2) are independent,
U + V ~ χ²_{k+h} = Γ((k + h)/2, 1/2).
Confidence intervals for means – Two samples
The Two-sample t Test: The case of Equal Variances
By the definition of the t-distribution, the random variable
T = {[(X̄ − Ȳ) − (μ_X − μ_Y)] / [σ√(1/m + 1/n)]} / √{[(m − 1)S_X² + (n − 1)S_Y²] / [σ²(m + n − 2)]} = [(X̄ − Ȳ) − (μ_X − μ_Y)] / [S_p√(1/m + 1/n)] ~ t_{m+n−2}.
Thus, a 1 − α confidence interval for μ_X − μ_Y is (X̄ − Ȳ) ± t_{α/2,m+n−2}·S_p·√(1/m + 1/n).



Confidence intervals for means – Two samples
The Two-sample t Test: The case of Equal Variances
Example. The feeding habits of two species of net-casting spiders are studied. The species,
the deinopis and menneus, coexist in eastern Australia. The following data were obtained
on the size, in mm, of the prey of random samples of the two species.

deinopis 12.9, 10.2, 7.4, 7.0, 10.5, 11.9, 7.1, 9.9, 14.4, 11.3
menneus 11.2, 6.5, 10.9, 13.0, 10.1, 5.3, 7.5, 10.3, 9.2, 8.8

What is the difference, if any, in the mean size of the prey (of the entire populations) of
the two species?
The standard deviation of deinopis is 2.5136 and that of menneus is 2.3294. Since the two standard deviations are close, we can assume that the variances of the two populations are similar.



Confidence intervals for means – Two samples

The Two-sample t Test: The case of Equal Variances


Example. The feeding habits of two species of net-casting spiders are studied. The species,
the deinopis and menneus, coexist in eastern Australia. The following data were obtained
on the size, in mm, of the prey of random samples of the two species.

s_p = √{[(10 − 1)s_X² + (10 − 1)s_Y²]/(20 − 2)} = √[(s_X² + s_Y²)/2] = 2.4233.
A 95% confidence interval for the difference in the population means is
(X̄ − Ȳ) ± t_{α/2,m+n−2}·S_p·√(1/m + 1/n) = (10.26 − 9.28) ± 2.101 × 2.4233 × √(2/10) = (−1.2968, 3.2568).
Because the interval contains the value 0, we cannot conclude that the population means
differ.
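A sketch of the pooled two-sample interval applied to the spider data, with t_{0.025,18} = 2.101 hard-coded from the example:

```python
import math
from statistics import mean, stdev

def pooled_t_interval(x, y, t_crit):
    """CI for mu_X - mu_Y assuming equal variances (pooled S_p)."""
    m, n = len(x), len(y)
    sp = math.sqrt(((m - 1) * stdev(x) ** 2 + (n - 1) * stdev(y) ** 2)
                   / (m + n - 2))
    half = t_crit * sp * math.sqrt(1 / m + 1 / n)
    d = mean(x) - mean(y)
    return d - half, d + half

deinopis = [12.9, 10.2, 7.4, 7.0, 10.5, 11.9, 7.1, 9.9, 14.4, 11.3]
menneus  = [11.2, 6.5, 10.9, 13.0, 10.1, 5.3, 7.5, 10.3, 9.2, 8.8]
lo, hi = pooled_t_interval(deinopis, menneus, 2.101)  # t_{0.025,18}
```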
Confidence intervals for means – Two samples

Welch’s t-interval for μ_X − μ_Y


When the data are normally distributed, and n and m are large enough, but the population variances σ_X² and σ_Y² are not equal, then a 1 − α confidence interval for μ_X − μ_Y is
(X̄ − Ȳ) ± t_{α/2,r}·√(S_X²/m + S_Y²/n),
where r is the integer part of
(S_X²/m + S_Y²/n)² / {(S_X²/m)²/(m − 1) + (S_Y²/n)²/(n − 1)}.



Confidence intervals for means – Two samples

Welch’s t-interval for μ_X − μ_Y


Example. The weekend athlete often incurs an injury due to not having the most appropriate equipment. For example, tennis elbow is an injury that results from the stress encountered by the elbow when striking a tennis ball. To investigate whether the new oversized racket delivered less stress to the elbow than a more conventionally sized racket, a group of 70 tennis players participated in the study. Forty players were randomly assigned to use the oversized racket and the remaining 30 players used the conventionally sized racket.
The force on the elbow just after the impact of a forehand strike of a tennis ball was
measured five times for each of the 70 tennis players. The mean force was then taken of
the five force readings; the summary of these 70 force readings is given in the following
table.



Confidence intervals for means – Two samples
Welch’s t-interval for μ_X − μ_Y

                             Oversized   Conventional
Sample Size                  40          30
Sample Mean                  33.9        25.2
Sample Standard Deviation    17.4        8.6

Since the sample variances of the two groups differ, we use Welch’s t-interval. We can calculate
r = ⌊(S_X²/m + S_Y²/n)² / {(S_X²/m)²/(m − 1) + (S_Y²/n)²/(n − 1)}⌋ = 59.
Thus, a 95% confidence interval for μ_X − μ_Y is (X̄ − Ȳ) ± t_{α/2,r}·√(S_X²/m + S_Y²/n) = (33.9 − 25.2) ± 2.001 × √(17.4²/40 + 8.6²/30) = (2.361, 15.039).
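Welch’s degrees of freedom and interval can be sketched from the summary statistics, with t_{0.025,59} = 2.001 hard-coded from the example:

```python
import math

def welch_df(sx, m, sy, n):
    """Integer part of the Welch-Satterthwaite degrees of freedom."""
    a, b = sx ** 2 / m, sy ** 2 / n
    return int((a + b) ** 2 / (a ** 2 / (m - 1) + b ** 2 / (n - 1)))

def welch_interval(xbar, sx, m, ybar, sy, n, t_crit):
    half = t_crit * math.sqrt(sx ** 2 / m + sy ** 2 / n)
    d = xbar - ybar
    return d - half, d + half

r = welch_df(17.4, 40, 8.6, 30)
lo, hi = welch_interval(33.9, 17.4, 40, 25.2, 8.6, 30, 2.001)
```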
Confidence intervals for means – Two pair samples

Let (X₁, Y₁), (X₂, Y₂), …, (X_n, Y_n) be n pairs of dependent measurements, for example, the weight before and after an exercise programme. The objective is to construct a 1 − α confidence interval for μ_X − μ_Y.
The differences D_i = X_i − Y_i, i = 1, 2, …, n form a random sample from n(μ_X − μ_Y, σ_D²), so
T = [D̄ − (μ_X − μ_Y)] / (S_D/√n) ~ t_{n−1}.
Hence, a 1 − α confidence interval for μ_X − μ_Y is given by
(D̄ − t_{α/2,n−1}·S_D/√n, D̄ + t_{α/2,n−1}·S_D/√n).



Confidence intervals for means – Pair sample
Example. An experiment was conducted to compare people's reaction times to a red light
versus a green light. When signaled with either the red or the green light, the subject was
asked to hit a switch to turn off the light. When the switch was hit, a clock was turned off
and the reaction time in seconds was recorded. The following results give the reaction
times for eight subjects.
Subject   Red (X)   Green (Y)   D = X − Y
1 0.30 0.43 -0.13
2 0.23 0.36 -0.13
3 0.41 0.58 -0.17
4 0.53 0.46 0.07
5 0.24 0.27 -0.03
6 0.36 0.41 -0.05
7 0.38 0.38 0.00
8 0.51 0.61 -0.10



Confidence intervals for means – Pair sample

Example. An experiment was conducted to compare people's reaction times to a red light
versus a green light. When signaled with either the red or the green light, the subject was
asked to hit a switch to turn off the light. When the switch was hit, a clock was turned off
and the reaction time in seconds was recorded. The following results give the reaction
times for eight subjects.
From the data, D̄ = −0.0675 and S_D = 0.0798. Hence, a 95% confidence interval for μ_D = μ_X − μ_Y is
(−0.0675 − 2.365 × 0.0798/√8, −0.0675 + 2.365 × 0.0798/√8) = (−0.1342, −0.0008).
Since the entire 95% confidence interval is negative, we are 95% confident that people react faster to a red light.
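The paired interval follows directly from the reaction-time table, with t_{0.025,7} = 2.365 hard-coded:

```python
import math
from statistics import mean, stdev

def paired_t_interval(x, y, t_crit):
    """CI for mu_X - mu_Y from paired differences D_i = X_i - Y_i."""
    d = [a - b for a, b in zip(x, y)]
    half = t_crit * stdev(d) / math.sqrt(len(d))
    return mean(d) - half, mean(d) + half

red   = [0.30, 0.23, 0.41, 0.53, 0.24, 0.36, 0.38, 0.51]
green = [0.43, 0.36, 0.58, 0.46, 0.27, 0.41, 0.38, 0.61]
lo, hi = paired_t_interval(red, green, 2.365)  # t_{0.025,7}
```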



Confidence intervals for means – Pair sample
Example. Are there physiological indicators associated with schizophrenia? In a 1990
article, researchers reported the results of a study that controlled for genetic and
socioeconomic differences by examining 15 pairs of identical twins, where one of the twins
was schizophrenic and the other not. The researchers used magnetic resonance imaging to
measure the volumes (in cubic centimeters) of several regions and sub-regions inside the
twins' brains. The following data came from one of the sub-regions, the left hippocampus:

Pair 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Unaffect 1.94 1.44 1.56 1.58 2.06 1.66 1.75 1.77 1.78 1.92 1.25 1.93 2.04 1.62 2.08
Affect 1.27 1.63 1.47 1.39 1.93 1.26 1.71 1.67 1.28 1.85 1.02 1.34 2.02 1.59 1.97

We calculate the differences U − A (unaffected minus affected):


0.67, -0.19, 0.09, 0.19, 0.13, 0.40, 0.04, 0.10, 0.50, 0.07, 0.23, 0.59, 0.02, 0.03, 0.11



Confidence intervals for means – Pair sample
Example. Are there physiological indicators associated with schizophrenia? In a 1990
article, researchers reported the results of a study that controlled for genetic and
socioeconomic differences by examining 15 pairs of identical twins, where one of the twins
was schizophrenic and the other not. The researchers used magnetic resonance imaging to
measure the volumes (in cubic centimeters) of several regions and sub-regions inside the
twins' brains. The following data came from one of the sub-regions, the left hippocampus:
From the data, D̄ = 0.199 and S_D = 0.2383. Hence, a 95% confidence interval for μ_D = μ_X − μ_Y is
(0.199 − 2.1448 × 0.2383/√15, 0.199 + 2.1448 × 0.2383/√15) = (0.0667, 0.3306).
That is, we can be 95% confident that the mean size for unaffected individuals is between 0.067 and 0.33 cubic centimeters larger than the mean size for affected individuals.



Confidence intervals for means – Pair sample

Common uses of the paired t-interval


• A person is matched with a similar person. For example, a person is matched to another
person with a similar intelligence (IQ scores, for example) to compare the effects of two
educational programs on test scores.
• Before and after studies. For example, a person is weighed, and then put on a diet, and
weighed again.
• A person serves as his or her own control. For example, a person takes an asthma drug
called GoodLungs to assess the improvement on lung function, has a period of 8-weeks
in which no drugs are taken (known as a washout period), and then takes a second
asthma drug called EvenBetterLungs to again assess the improvement on lung function.



Confidence intervals for variances – One sample
The pivotal quantity involving σ² is
(n − 1)S²/σ² ~ χ²_{n−1}.
We have
1 − α = P(χ²_{1−α/2,n−1} ≤ (n − 1)S²/σ² ≤ χ²_{α/2,n−1}) = P(σ² ∈ ((n − 1)S²/χ²_{α/2,n−1}, (n − 1)S²/χ²_{1−α/2,n−1})).
The 1 − α confidence interval for σ² is given by
((n − 1)S²/χ²_{α/2,n−1}, (n − 1)S²/χ²_{1−α/2,n−1}).


Confidence intervals for variances – One sample

A large candy manufacturer produces, packages and sells packs of candy targeted to weigh 52 grams. A quality control manager working for the company was concerned that the variation in the actual weights of the targeted 52-gram packs was larger than acceptable. That is, he was concerned that some packs weighed significantly less than 52 grams and some weighed significantly more than 52 grams. In an attempt to estimate σ, the standard deviation of the weights of all of the 52-gram packs the manufacturer makes, he took a random sample of n = 10 packs off of the factory line. The random sample yielded a sample variance of 4.2 grams². Use the random sample to derive a 95% confidence interval for σ.
Answer. S² = 4.2, n = 10, χ²_{0.025,9} = 19.02, and χ²_{0.975,9} = 2.70. A 95% confidence interval for σ is
(√[(n − 1)S²/χ²_{α/2,n−1}], √[(n − 1)S²/χ²_{1−α/2,n−1}]) = (1.41, 3.74).
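The interval for σ can be sketched with the chi-square critical values quoted above hard-coded:

```python
import math

def sd_interval(s2, n, chi2_lower_tail, chi2_upper_tail):
    """CI for sigma: square roots of ((n-1)S^2/chi2_{a/2}, (n-1)S^2/chi2_{1-a/2}).
    chi2_upper_tail is chi^2_{0.025,n-1}; chi2_lower_tail is chi^2_{0.975,n-1}."""
    return (math.sqrt((n - 1) * s2 / chi2_upper_tail),
            math.sqrt((n - 1) * s2 / chi2_lower_tail))

# Candy example: S^2 = 4.2, n = 10, critical values 19.02 and 2.70
lo, hi = sd_interval(4.2, 10, 2.70, 19.02)  # approx (1.41, 3.74)
```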



Confidence intervals for variances – Two sample
Let X = (X₁, X₂, …, X_n) and Y = (Y₁, Y₂, …, Y_m) be random samples from independent distributions n(μ_X, σ_X²) and n(μ_Y, σ_Y²), respectively. We are interested in constructing a confidence interval for σ_X²/σ_Y².
Definition. Suppose that U ~ χ²_{r₁} and V ~ χ²_{r₂} are independent. Then,
F = (U/r₁)/(V/r₂)
has an F_{r₁,r₂} distribution with r₁ and r₂ degrees of freedom.
Note that 1/F has an F_{r₂,r₁} distribution with r₂ and r₁ degrees of freedom. Since
1 − α = P(F > F_{1−α,r₁,r₂}) = P(1/F < 1/F_{1−α,r₁,r₂}), it follows that F_{α,r₂,r₁} = 1/F_{1−α,r₁,r₂}.
Confidence intervals for variances – Two sample

Note that
(n − 1)S_X²/σ_X² ~ χ²_{n−1} and (m − 1)S_Y²/σ_Y² ~ χ²_{m−1}.
Since S_X² and S_Y² are independent,
{[(m − 1)S_Y²/σ_Y²]/(m − 1)} / {[(n − 1)S_X²/σ_X²]/(n − 1)} = (S_Y²/S_X²)·(σ_X²/σ_Y²) ~ F_{m−1,n−1}.
Hence, (S_Y²/S_X²)·(σ_X²/σ_Y²) is a pivotal quantity.



Confidence intervals for variances – Two sample

1 − α = P(F_{1−α/2,m−1,n−1} ≤ (S_Y²/S_X²)·(σ_X²/σ_Y²) ≤ F_{α/2,m−1,n−1})
= P(F_{1−α/2,m−1,n−1}·(S_X²/S_Y²) ≤ σ_X²/σ_Y² ≤ F_{α/2,m−1,n−1}·(S_X²/S_Y²))
= P([1/F_{α/2,n−1,m−1}]·(S_X²/S_Y²) ≤ σ_X²/σ_Y² ≤ F_{α/2,m−1,n−1}·(S_X²/S_Y²)).
A 1 − α confidence interval for σ_X²/σ_Y² is given by
([1/F_{α/2,n−1,m−1}]·(S_X²/S_Y²), F_{α/2,m−1,n−1}·(S_X²/S_Y²)).



Confidence intervals for variances – Two sample
Example. The feeding habits of two species of net-casting spiders are studied. The species, the deinopis and menneus, coexist in eastern Australia. The following summary statistics were obtained on the size, in millimeters, of the prey of the two species:

Adult DEINOPIS: n = 10, x̄ = 10.26 mm, s_X = 2.51
Adult MENNEUS: m = 10, ȳ = 9.28 mm, s_Y = 1.90
Estimate, with 95% confidence, the ratio σ_X/σ_Y of the two population standard deviations.
Since F_{0.025,9,9} = 4.03, a 95% confidence interval for the ratio of the two population standard deviations (taking square roots of the variance-ratio bounds) is
0.6584 = (1/√4.03)·(2.51/1.90) ≤ σ_X/σ_Y ≤ √4.03·(2.51/1.90) = 2.6507.
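A sketch of the standard-deviation-ratio interval with F_{0.025,9,9} = 4.03 hard-coded:

```python
import math

def sd_ratio_interval(sx, sy, f_crit):
    """CI for sigma_X/sigma_Y: square roots of the variance-ratio bounds."""
    ratio = sx / sy
    return ratio / math.sqrt(f_crit), ratio * math.sqrt(f_crit)

# Spider example: s_X = 2.51, s_Y = 1.90, F_{0.025,9,9} = 4.03
lo, hi = sd_ratio_interval(2.51, 1.90, 4.03)
```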
Confidence intervals for proportion
Let X₁, X₂, …, X_n be a random sample of size n of a Bernoulli random variable X with success probability p = E(X). Let x₁, x₂, …, x_n denote the realization of X₁, X₂, …, X_n. Since Y = Σ_{i=1}^n X_i ~ Binomial(n, p), the large-sample distribution of
Z = (X̄ − p)/√[p(1 − p)/n]
is n(0, 1).
Wald confidence interval
A 1 − α confidence interval for p is given by
(X̄ − z_{α/2}·√[X̄(1 − X̄)/n], X̄ + z_{α/2}·√[X̄(1 − X̄)/n]).
A drawback of the Wald CI is that the lower bound can fall below zero when the true value of p is small, and the upper bound can exceed 1 when the true value of p is near one.
Confidence intervals for proportion

Example. We surveyed n = 418 Hong Kong citizens about their opinions on insurance rates. Of the 418 surveyed, Y = 280 blamed rising private health premiums. The sample proportion is
p̂ = 280/418 = 0.67.
Estimate, with 95% confidence, the proportion of all Hong Kong citizens who blame rising health insurance premiums.
Answer. With z_{0.025} = 1.96, a 95% confidence interval for p is
(p̂ − z_{0.025}·√[p̂(1 − p̂)/n], p̂ + z_{0.025}·√[p̂(1 − p̂)/n]) = (0.625, 0.715).
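The Wald interval for the survey can be sketched as follows, with z_{0.025} = 1.96 hard-coded:

```python
import math

def wald_interval(successes, n, z):
    """Wald CI for a proportion: p-hat +/- z * sqrt(p-hat(1-p-hat)/n)."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

# Survey example: 280 of 418 respondents
lo, hi = wald_interval(280, 418, 1.96)  # approx (0.625, 0.715)
```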



Confidence intervals for difference in proportions

Suppose that Y₁ ~ Binomial(n₁, p₁), Y₂ ~ Binomial(n₂, p₂), and Y₁ and Y₂ are independent. Define δ = p₁ − p₂. Then
δ̂ = p̂₁ − p̂₂ = Y₁/n₁ − Y₂/n₂.
The normal approximation to the Binomial implies that
δ̂ ~ n(δ, p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂).
A 1 − α confidence interval for δ is given by
δ̂ ± z_{α/2}·√[p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂].



Confidence intervals for difference in proportions

Example. Two detergents were tested for their ability to remove stains of a certain type. An inspector judged the first one to be successful on 63 out of 91 independent trials and the second one to be successful on 42 out of 79 independent trials. The respective relative frequencies of success are 0.692 and 0.532. An approximate 90% confidence interval for the difference p₁ − p₂ of the two detergents is
(0.692 − 0.532) ± 1.645·√(0.692 × 0.308/91 + 0.532 × 0.468/79) = (0.038, 0.282).
Since this interval does not include zero, the first detergent is better than the second one for removing the type of stains in question.
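The difference-in-proportions interval can be sketched from the counts, with z_{0.05} = 1.645 hard-coded for the 90% level:

```python
import math

def diff_prop_interval(y1, n1, y2, n2, z):
    """CI for p1 - p2 using the normal approximation to the Binomial."""
    p1, p2 = y1 / n1, y2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Detergent example: 63/91 vs 42/79 successes, 90% confidence
lo, hi = diff_prop_interval(63, 91, 42, 79, 1.645)
```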



Confidence intervals for difference in proportions
Example. What is the prevalence of anemia in developing countries?

                        Women in Developing Countries   Women in Developed Countries
Sample Size             2100                            1900
Number with Anemia      840                             323

Find a 95% confidence interval for the difference in proportions of all women with anemia
in developing countries and all women from developed countries with anemia.
Answer. A 95% confidence interval for the difference in proportions is
(0.40 − 0.17) ± 1.96·√(0.4 × 0.6/2100 + 0.17 × 0.83/1900) = (0.203, 0.257).
2100 1900



Sample Size – Estimating a mean

To estimate μ with a maximum error ε > 0, that is, |x̄ − μ| < ε: since a 1 − α confidence interval for μ is x̄ ± t_{α/2,n−1}·s/√n, it follows that t_{α/2,n−1}·s/√n < ε. Thus,
n ≥ t²_{α/2,n−1}·s²/ε².
If n is large, t_{α/2,n−1} ≈ z_{α/2}. We have
n ≥ z²_{α/2}·s²/ε².



Sample Size – Estimating a mean

Example. A team of researchers wants to estimate the mean IQ of students enrolled at one prestigious university. Previous research studies have examined samples of students from other similar universities and usually find results around x̄ = 120 and s = 10. In order to construct a 90% confidence interval with a margin of error of ±2 IQ points, what sample size should be obtained?
n ≥ z²_{0.05}·s²/ε² = 1.645² × 10²/2² ≈ 67.64.
The research team should attempt to obtain a sample of at least 68 individuals.
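The sample-size rule rounds up to the next whole person; a sketch with z_{0.05} = 1.645:

```python
import math

def sample_size_mean(s, eps, z):
    """Smallest n with z^2 * s^2 / eps^2 observations, rounded up."""
    return math.ceil(z ** 2 * s ** 2 / eps ** 2)

# IQ example: s = 10, margin of error 2, 90% confidence
n_req = sample_size_mean(10, 2, 1.645)  # -> 68
```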



Sample Size – Estimating a proportion

To estimate p with a maximum error ε > 0, that is, |p̂ − p| < ε: since a 1 − α confidence interval for p is p̂ ± z_{α/2}·√[p̂(1 − p̂)/n], it follows that z_{α/2}·√[p̂(1 − p̂)/n] < ε. Thus,
n ≥ z²_{α/2}·p̂(1 − p̂)/ε².
Example. We want to construct a 95% confidence interval for p with a margin of error equal to 4%. Because there is no estimate of the proportion given, we use p̂ = 0.5 for a conservative estimate.
n ≥ z²_{0.025}·p̂(1 − p̂)/ε² = 1.96² × 0.5 × 0.5/0.04² = 600.25.
We should obtain a sample of at least n = 601.
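A sketch of the conservative sample-size rule with z_{0.025} = 1.96:

```python
import math

def sample_size_prop(eps, z, p_hat=0.5):
    """p_hat = 0.5 is the conservative choice (it maximises p(1-p))."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / eps ** 2)

# Margin of error 4%, 95% confidence, no prior estimate of p
n_req = sample_size_prop(0.04, 1.96)  # -> 601
```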



Sample Size – Estimating a proportion for a small, finite population

Consider a population of size N. Suppose that there are N₁ respondents (N₁ is unknown) in the population who would answer yes to a particular question. The true proportion of yes respondents is
p = N₁/N.
Let X₁, X₂, …, X_n be a random sample of size n drawn without replacement from the population. Define X_i = 1 if respondent i answers yes to the question. Then X = Σ_{j=1}^n X_j is the number of respondents in the sample who answer yes. It is known that X has a hypergeometric distribution with mean E(X) = np and Var(X) = np(1 − p)·(N − n)/(N − 1).



Sample Size – Estimating a proportion for a small, finite population

Hence, using the Central Limit Theorem, an approximate 1 − α confidence interval for p is
p̂ ± z_{α/2}·√{[p̂(1 − p̂)/n]·[(N − n)/(N − 1)]},
where p̂ = X/n.
If we want to determine the sample size n so that the error in estimating p is no larger than ε > 0, we require that
z_{α/2}·√{[p̂(1 − p̂)/n]·[(N − n)/(N − 1)]} ≤ ε ⟹ z²_{α/2}·[p̂(1 − p̂)/n]·[(N − n)/(N − 1)] ≤ ε².
Let m = z²_{α/2}·p̂(1 − p̂)/ε². Solving for n, we get
n ≥ mN/(N − 1 + m).


Sample Size – Estimating a proportion for a small, finite population

A researcher is studying the population of a small town in India of N = 2000 people. She is interested in estimating p for several yes/no questions on a survey. How many people n does she have to randomly sample (without replacement) to ensure that her estimates p̂ are within ε = 0.04 of the true proportion p?
Answer.
Since p̂(1 − p̂) ≤ 1/4, we calculate m as m = z²_{0.025}·p̂(1 − p̂)/ε² = 1.96² × 0.25/0.04² = 600.25 ≈ 601. For 95% confidence, the required sample size n satisfies
n ≥ mN/(N − 1 + m) = 601 × 2000/(2000 − 1 + 601) = 462.3.
Thus, we require 463 people to estimate p with 95% confidence.
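The finite-population sample-size rule can be sketched as follows; m is rounded up to 601 first, matching the calculation above:

```python
import math

def sample_size_finite(N, eps, z, p_hat=0.5):
    """Sample size with the finite-population correction n >= mN/(N-1+m)."""
    m = math.ceil(z ** 2 * p_hat * (1 - p_hat) / eps ** 2)  # m = 601 here
    return math.ceil(m * N / (N - 1 + m))

# Small-town example: N = 2000, eps = 0.04, 95% confidence
n_req = sample_size_finite(2000, 0.04, 1.96)  # -> 463
```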

