Chapter 7: Statistical Intervals Based On A Single Sample
Section 7.1: Basic Properties of Confidence Intervals
In Section 6.1, we examined how to find a point estimate for an unknown population
parameter. When reporting point estimates, it is important to note the amount of
uncertainty associated with the point estimate. This is especially key when we want to
do statistical inference.
- Statistical inference is the process of deducing properties of a population from
sample data.
- We will examine two methods of statistical inference: confidence intervals and
hypothesis tests.
The basic concepts and properties of confidence intervals (CIs) are most easily
introduced by first focusing on a simple, but somewhat unrealistic, problem situation.
Let’s take a look at Example 1.
Example 1
Suppose that the parameter of interest is a population mean µ and that the population
distribution is normal and the value of the population standard deviation σ is known.
- Draw a random sample from a normal distribution with mean value µ and known
standard deviation σ.
- Irrespective of the sample size n, the sample mean $\bar{X}$ is normally distributed with
expected value µ and standard deviation $\sigma/\sqrt{n}$.
- Standardizing $\bar{X}$ by first subtracting its expected value and then dividing by its
standard deviation $\sigma/\sqrt{n}$ yields the standard normal variable
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
- The value of c such that P(−c ≤ Z ≤ c) = 0.95 is qnorm(0.975,0,1) ≈ 1.96.
This implies that
$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$
Multiplying through by $\sigma/\sqrt{n}$ inside the parentheses gives
$$P\left(-1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
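Rearranging the inequalities to isolate µ gives
$$P\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
Consider a random sample of size n from a normal distribution with mean
value µ and known standard deviation σ. Once the sample mean $\bar{x}$ has been
observed, a 95% confidence interval for µ is
$$\left(\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\; \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right)$$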
Quick check: Assume that the heights of students on the campus are normally
distributed with unknown mean µ and standard deviation σ = 4.0 cm. A random
sample of n = 15 was selected. The resulting sample mean was $\bar{x}$ = 175 cm.
Construct a 95% confidence interval for the true mean µ.
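One way to carry out the quick check numerically is sketched below. It assumes Python's standard-library `statistics.NormalDist` as a stand-in for R's `qnorm`; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in the quick check
x_bar = 175.0   # observed sample mean (cm)
sigma = 4.0     # known population standard deviation (cm)
n = 15          # sample size
level = 0.95    # confidence level

# Critical value; plays the same role as qnorm(0.975, 0, 1) in R
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Margin of error and interval endpoints: x_bar -/+ z * sigma / sqrt(n)
margin = z * sigma / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"z = {z:.3f}, margin of error = {margin:.3f}")
print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the interval comes out to roughly (172.98, 177.02) cm.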
Other Levels of Confidence
- The confidence level of 95% was inherited from the probability 0.95 for the initial
inequalities (in Example 1).
- If a confidence level of 99% is desired, the initial probability of 0.95 must be
replaced by 0.99, which necessitates changing the z critical value from
qnorm(0.975,0,1) ≈ 1.96 to qnorm(0.995,0,1) ≈ 2.58.
- Notation: We use $z_\alpha$ to denote the number on the horizontal scale that captures
upper-tail area α under the standard normal curve, so a 100(1 − α)% interval uses $z_{\alpha/2}$.
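The relationship between the confidence level and the z critical value can be tabulated with a short sketch. It again assumes Python's `statistics.NormalDist` in place of `qnorm`; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(level):
    """Return z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - level
    # Upper-tail area alpha/2 corresponds to lower-tail probability 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

Higher confidence levels give larger critical values and therefore wider intervals.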
General Form of a Confidence Interval
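As a sketch of what the general form looks like in the setting used so far (normal population, known σ), replacing 1.96 by the critical value $z_{\alpha/2}$ would give a 100(1 − α)% interval. This is assembled from the earlier steps and the notation above, not quoted from the slides:

```latex
P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha
\quad\Longrightarrow\quad
\left(\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\;
      \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right)
```

With α = 0.05 this reduces to the 95% interval based on 1.96.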
Example 2
Suppose you obtain a 95% confidence interval (LB, UB) for the population mean.
- In repeated sampling, 95% of the intervals calculated this way will contain the
true mean; 5% of them will not.
- Because the true mean (population mean) is an unknown value, we don't know whether
our interval is among the 95% or the 5%.
- The CORRECT interpretation is
“We are 95% confident that (LB, UB) covers the population mean.”
This is a common shorthand for the idea that the calculations work 95% of the
time.
- FALSE interpretation:
“There is a 95% chance that the population mean is between LB and UB.”
This is a very common misconception! It seems very close to true, but it isn't,
because the population mean is a fixed value, so it is either in the interval or not.
This is subtle but important.
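The "works 95% of the time" reading can be illustrated with a small simulation. This is only a sketch: the true mean, σ, and n below are arbitrary illustrative values (not taken from the example), and `coverage` is a hypothetical helper name.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level=0.95, trials=10_000, seed=0):
    """Fraction of simulated known-sigma CIs that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and compute its sample mean
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        # mu is fixed; the interval is what changes from sample to sample
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage(mu=175.0, sigma=4.0, n=15)
print(f"Fraction of intervals covering mu: {rate:.3f}")
```

The printed fraction should land close to 0.95. Each individual interval either covers µ or does not; the 95% describes the procedure across many samples.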
Sample Size Necessary for a CI
Since the 95% CI extends $E = (1.96)(\sigma/\sqrt{n})$ to each side of $\bar{x}$, the width of the
interval is $2E = 2(1.96)(\sigma/\sqrt{n})$.
Suppose we want to know how large a sample we need for a desired width. We can
use the following formula
$$n = \left(z_{\alpha/2} \cdot \frac{\sigma}{E}\right)^2$$
where E is the desired margin of error, and n is rounded up to the next whole number.
Note that the width of the interval is twice the
margin of error.
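The sample-size formula follows from solving the margin-of-error expression for n. A short derivation, using only the symbols already defined:

```latex
E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\;
\sqrt{n} = z_{\alpha/2} \cdot \frac{\sigma}{E}
\;\Longrightarrow\;
n = \left(z_{\alpha/2} \cdot \frac{\sigma}{E}\right)^{2}
```

Because n must be a whole number and a larger n only shrinks the margin, the result is rounded up. Halving E quadruples the required sample size.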
Example 3
The IQs of students enrolled at one prestigious university are assumed to be normally
distributed with unknown mean µ and known standard deviation σ = 3 points. A team
of researchers wants to estimate the mean IQ of students enrolled at this university.
- In order to construct a 90% confidence interval for µ with a margin of error of ±2
IQ points, what sample size should be obtained?
- What sample size is necessary to ensure that the resulting 95% CI for µ has a
width of (at most) 2 points?
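A possible way to work both parts of Example 3 is sketched below, assuming Python's `statistics.NormalDist` for the critical value. The function name `sample_size_for_mean` is hypothetical. Note that a width of 2 corresponds to a margin of error E = 1.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / margin) ** 2)

sigma = 3.0  # known standard deviation from Example 3

# Part 1: 90% CI with margin of error 2 points
n_90 = sample_size_for_mean(sigma, margin=2.0, level=0.90)

# Part 2: 95% CI with width 2 points, i.e. margin of error 1 point
n_95 = sample_size_for_mean(sigma, margin=1.0, level=0.95)

print(f"90% CI, E = 2: n = {n_90}")
print(f"95% CI, E = 1: n = {n_95}")
```

Under these assumptions the two parts need n = 7 and n = 35, respectively.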
Example 4
Assume that the helium porosity (in percentage) of coal samples taken from any
particular seam is normally distributed with true standard deviation 0.75.
1. Compute a 95% CI for the true average porosity of a certain seam if the average
porosity for 20 specimens from the seam was 4.85.
2. Compute a 98% CI for the true average porosity of another seam based on 16
specimens with a sample average porosity of 4.56.
3. How large a sample size is necessary if the width of the 95% interval is to be 0.40?
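The three parts of Example 4 can be checked with a sketch like the following. It assumes Python's `statistics.NormalDist` for critical values; `z_interval` is a hypothetical helper, not something defined in the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist

SIGMA = 0.75  # true standard deviation of porosity, as stated in Example 4

def z_interval(x_bar, sigma, n, level):
    """Two-sided CI for the mean of a normal population with known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Part 1: 95% CI, n = 20, sample mean 4.85
ci_1 = z_interval(4.85, SIGMA, 20, 0.95)

# Part 2: 98% CI, n = 16, sample mean 4.56
ci_2 = z_interval(4.56, SIGMA, 16, 0.98)

# Part 3: width 0.40 means margin of error E = 0.20 at the 95% level
z_95 = NormalDist().inv_cdf(0.975)
n_3 = ceil((z_95 * SIGMA / 0.20) ** 2)

print(f"Part 1: ({ci_1[0]:.3f}, {ci_1[1]:.3f})")
print(f"Part 2: ({ci_2[0]:.3f}, {ci_2[1]:.3f})")
print(f"Part 3: n = {n_3}")
```

Part 3 lands just above 54 before rounding, so the required sample size is 55.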
Sections 7.2 and 7.3: CIs for a Population Mean and Proportion
- The CI for µ given in the previous section assumed that the population
distribution is normal with the value of σ known.
- Let's take a step back. We constructed the CIs from the fact that
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is (at least approximately) standard normal.
- If n ≥ 30, the CLT can be applied. If we have a random sample $X_1, X_2, \ldots, X_n$
from a population having mean µ and variance $\sigma^2$, then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{approx.}}{\sim} N(0, 1)$$
- This means that when n is large enough (n ≥ 30), we do not need to know the
distribution of the $X_i$. The method we constructed in the previous section still works.
- What if σ is unknown?
- We can estimate $\sigma^2$ by $\hat{\sigma}^2 = s^2$. For large n, the substitution of s for σ adds
little extra variability. The standardized variable has approximately a standard
normal distribution. That is,
$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} \overset{\text{approx.}}{\sim} N(0, 1)$$
Example 5
The shopping times of n = 64 randomly selected customers at a local supermarket
were recorded. The average and standard deviation of the 64 shopping times were 33
minutes and 16 minutes, respectively. Construct a 95% confidence interval for the true
mean of the shopping times.
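Since n = 64 is large, one way to work Example 5 is to substitute s for σ in the z interval. A sketch of the arithmetic, using the rounded critical value 1.96:

```latex
\bar{x} \pm z_{0.025} \cdot \frac{s}{\sqrt{n}}
= 33 \pm 1.96 \cdot \frac{16}{\sqrt{64}}
= 33 \pm 1.96 \cdot 2
= 33 \pm 3.92
\quad\Longrightarrow\quad (29.08,\ 36.92) \text{ minutes}
```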
- However, if n is small (n < 30), the CLT cannot be applied, so the quantity
$(\bar{X} - \mu)/(\sigma/\sqrt{n})$ can no longer be treated as standard normal.
- In the case of small n, if nothing is assumed about the distribution of the $X_i$,
then we cannot infer anything!
- In the case of small n, if the distribution of the $X_i$ is assumed to be normal, then we
can still carry out statistical inference.
- This is the only small-sample case we can handle: if the distribution
of the $X_i$ is (approximately) normal and σ is unknown, then we estimate σ by s,
and the quantity $(\bar{X} - \mu)/(s/\sqrt{n})$ is more spread out than the standard
normal distribution. The inferences are based on t distributions, i.e.,
$$T = \frac{\bar{X} - \mu}{s/\sqrt{n}} \sim t_{n-1}$$
a t distribution with n − 1 degrees of freedom.
Properties of t Distributions
Let $t_{\alpha,\nu}$ be the number on the measurement axis for which the area under the
t curve with ν df to the right of $t_{\alpha,\nu}$ is α. This is called a t critical value. In R, you
can use the function qt(p, df), which takes a lower-tail probability p, so $t_{\alpha,\nu}$ = qt(1 − α, ν).
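To make the definition of $t_{\alpha,\nu}$ concrete, the sketch below finds the critical value by integrating the t density numerically and bisecting on the upper-tail area. This is an illustration of the definition only; it is not how R's `qt` is implemented, and the helper names are hypothetical. It uses only the Python standard library, and it assumes α < 0.5, a moderate number of degrees of freedom, and a critical value below 100.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, via Simpson's rule on [0, t] plus the area 0.5 left of 0."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df):
    """Return t_{alpha, df}: the point whose upper-tail area is alpha."""
    lo, hi = 0.0, 100.0  # assumed search range
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid   # tail area still too large: move right
        else:
            hi = mid   # tail area too small: move left
    return (lo + hi) / 2

for df in (7, 9, 30):
    print(f"t_(0.025, {df}) = {t_critical(0.025, df):.3f}")
```

The printed values should agree with a t table and exceed the z value 1.96, shrinking toward it as the degrees of freedom grow.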
Example 6
A manufacturer of gunpowder has developed a new powder, which was tested in eight
shells. The resulting muzzle velocities, in feet per second, were as follows
Assume that muzzle velocities are approximately normally distributed. Find a 95%
confidence interval for the true average velocity µ for shells of this type.
Example 7
The administrators for a hospital wished to estimate the average number of days
required for inpatient treatment of patients between the ages of 25 and 34. A random
sample of 500 hospital patients between these ages produced a mean and standard
deviation equal to 5.4 and 3.1 days, respectively. Construct a 95% confidence interval
for the mean length of stay for the population of patients from which the sample was
drawn.
Example 8
Organic chemists often purify organic compounds by a method known as fractional
crystallization. An experimenter wanted to prepare and purify 4.85 g of aniline. Ten
4.85-gram specimens of aniline were prepared and purified to produce acetanilide. The
following dry yields were obtained:
3.85 3.88 3.90 3.62 3.72 3.80 3.85 3.36 4.01 3.82
The sample statistics are $\bar{x} = 3.781$ and $s^2 = 0.0327$. Construct a 95% confidence
interval for the mean number of grams of acetanilide that can be recovered from 4.85
grams of aniline.
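A sketch of Example 8 follows. It recomputes the sample statistics from the listed yields and builds the t interval with n − 1 = 9 degrees of freedom, assuming the yields are approximately normal. The critical value 2.262 is hard-coded from a standard t table (an assumption of this sketch; in R it would be qt(0.975, 9)), because Python's standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

# Dry yields (grams) listed in Example 8
yields = [3.85, 3.88, 3.90, 3.62, 3.72, 3.80, 3.85, 3.36, 4.01, 3.82]

n = len(yields)
x_bar = mean(yields)
s2 = variance(yields)   # sample variance (divisor n - 1)

# t_{0.025, 9} taken from a t table; assumed value, not computed here
t_crit = 2.262

# Interval: x_bar -/+ t * s / sqrt(n)
margin = t_crit * sqrt(s2 / n)
lower, upper = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.3f}, s^2 = {s2:.4f}")
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```

The recomputed statistics match the stated values, and the interval is roughly (3.652, 3.910) grams.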
A Confidence Interval for a Population Proportion
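For a sample proportion $\hat{p}$ based on a large sample of size n,
$$\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \overset{\text{approx.}}{\sim} N(0, 1)$$
Provided that n is sufficiently large, a 100(1 − α)% confidence interval for the
proportion p of a population is given by
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right).$$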
Example 9
In the 2016 election, a random sample of 1000 Republicans found that 337 of them
supported Ted Cruz. Calculate a 90% CI for the true proportion of Republicans that
supported Ted Cruz.
Example 10
Suppose a random sample of 100 students finds that 45 of them spend more than 10 hours
per week on homework. Construct a 95% CI for the true proportion of students that
spend more than 10 hours per week on homework.
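Examples 9 and 10 both apply the large-sample interval for a proportion, so one sketch can cover them. It assumes Python's `statistics.NormalDist` for the critical value; `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, level):
    """Large-sample CI for a proportion: p_hat -/+ z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 9: 337 of 1000, 90% confidence
ci_9 = proportion_interval(337, 1000, 0.90)

# Example 10: 45 of 100, 95% confidence
ci_10 = proportion_interval(45, 100, 0.95)

print(f"Example 9:  ({ci_9[0]:.4f}, {ci_9[1]:.4f})")
print(f"Example 10: ({ci_10[0]:.4f}, {ci_10[1]:.4f})")
```

The Example 10 interval is much wider than the Example 9 interval, mostly because its sample is ten times smaller.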
For a population proportion, suppose we want to know how large a sample we need for
a desired width. We can use the following formula
$$n = \left(z_{\alpha/2} \cdot \frac{\sqrt{\hat{p}(1 - \hat{p})}}{E}\right)^2$$
where E is the desired margin of error. Note that the width of the interval is twice the
margin of error.
Example 11
The reaction of an individual to a stimulus in a psychological experiment may take one
of two forms, A or B. If an experimenter wishes to estimate the probability p that a
person will react in manner A, how many people must be included in the experiment?
Assume that the experimenter will be satisfied if the error of estimation is less than
0.04 with probability equal to 0.90. Assume also that he expects p to lie somewhere in
the neighborhood of 0.6, i.e., $\hat{p}$ = 0.6.
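A sketch of Example 11 is given below. It reads "error less than 0.04 with probability 0.90" as a margin of error E = 0.04 at the 90% confidence level and uses the planning value 0.6 for the proportion. The helper name `sample_size_for_proportion` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_guess, margin, level):
    """Smallest n with z_{alpha/2} * sqrt(p(1 - p)/n) <= margin, using a planning value p_guess."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# Example 11: planning value 0.6, E = 0.04, 90% level
n_needed = sample_size_for_proportion(0.6, 0.04, 0.90)

# Worst case when nothing is known about p: p(1 - p) is largest at p = 0.5
n_conservative = sample_size_for_proportion(0.5, 0.04, 0.90)

print(f"n with p_guess = 0.6: {n_needed}")
print(f"n with p_guess = 0.5: {n_conservative}")
```

With the planning value 0.6 the experiment needs about 406 people. Using 0.5 instead gives the conservative figure of 423, which is a safe choice when no prior guess for p is available.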