Chapter 7 : Statistical Intervals Based on a Single Sample

MATH 3800-001 Probability and Statistics For Engineers (Spring 2020)

Department of Mathematical and Statistical Sciences, University of Colorado Denver

Instructor: Mohammad Meysami

Section 7.1 : Basic Properties of Confidence Intervals

In Section 6.1, we examined how to find a point estimate for an unknown population
parameter. When reporting a point estimate, it is important to note the amount of
uncertainty associated with it. This is especially important when we want to
do statistical inference.
- Statistical inference is the process of deducing properties of a population from
sample data.
- We will examine two methods of statistical inference: confidence intervals and
hypothesis tests.

The basic concepts and properties of confidence intervals (CIs) are most easily
introduced by first focusing on a simple, but somewhat unrealistic, problem situation.
Let’s take a look at Example 1.

Example 1
Suppose that the parameter of interest is a population mean µ, that the population
distribution is normal, and that the value of the population standard deviation σ is known.
- Draw a random sample from a normal distribution with mean value µ and known
standard deviation σ.
- Irrespective of the sample size n, the sample mean $\bar{X}$ is normally distributed with
expected value µ and standard deviation $\sigma/\sqrt{n}$.
- Standardizing $\bar{X}$ by first subtracting its expected value and then dividing by its
standard deviation $\sigma/\sqrt{n}$ yields the standard normal variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

- The value of c such that P(−c ≤ Z ≤ c) = 0.95 is qnorm(0.975,0,1) ≈ 1.96.
This implies that

$$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$

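Multiplying through by $\sigma/\sqrt{n}$ inside the parentheses gives

$$P\left(-1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$

and then simplifying yields

$$P\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$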

How do we interpret this?

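Consider a random sample of size n from a normal distribution with mean
value µ and known standard deviation σ.

A 95% confidence interval for µ can be expressed either as

$$\left(\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right)$$

or as

$$\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$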

Quick check : Assume that the heights of students on campus are normally
distributed with unknown mean µ and standard deviation σ = 4.0 cm. A random
sample of n = 15 students was selected. The resulting sample mean was $\bar{x}$ = 175 cm.
Construct a 95% confidence interval for the true mean µ.
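One way to check the arithmetic for this quick check is the short Python sketch below. It is only an illustration: the slides use R, and `NormalDist().inv_cdf(0.975)` from Python's standard library is used here as a stand-in for `qnorm(0.975,0,1)`. The variable names are chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

sigma = 4.0    # known population standard deviation (cm)
n = 15         # sample size
xbar = 175.0   # observed sample mean (cm)

# z critical value with area 0.025 in the upper tail (about 1.96)
z = NormalDist().inv_cdf(0.975)

# margin of error and interval endpoints: xbar -/+ z * sigma / sqrt(n)
margin = z * sigma / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"95% CI for mu: ({lower:.2f}, {upper:.2f})")
```

Under these inputs the interval comes out to roughly (172.98, 177.02) cm.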

Other Levels of Confidence
- The confidence level of 95% was inherited from the probability 0.95 in the initial
inequalities of Example 1.
- If a confidence level of 99% is desired, the initial probability of 0.95 must be
replaced by 0.99, which changes the z critical value from
qnorm(0.975,0,1) ≈ 1.96 to qnorm(0.995,0,1) ≈ 2.58.
- Notation: We use $z_\alpha$ to denote the value on the horizontal axis of the standard
normal curve that has area α to its right (upper-tail area α).

A 100(1−α)% confidence interval for the mean µ of a normal population when
the value of σ is known is given by

$$\left(\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right).$$
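The effect of the confidence level on the critical value, and therefore on the interval, can be sketched in Python as follows. The helper names `z_critical` and `z_interval` are made up for this illustration, and the inputs reuse the Quick check numbers (σ = 4.0, n = 15, sample mean 175). In R, the critical value would come from `qnorm`.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def z_interval(xbar, sigma, n, confidence):
    """Two-sided CI for the mean of a normal population with known sigma."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Higher confidence requires a larger critical value, hence a wider interval.
for level in (0.90, 0.95, 0.99):
    lo, hi = z_interval(175.0, 4.0, 15, level)
    print(f"{level:.0%}: z = {z_critical(level):.3f}, CI = ({lo:.2f}, {hi:.2f})")
```

The design point is the trade-off: for fixed σ and n, more confidence can only be bought with a wider interval.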

General Form of a Confidence Interval

Example 2
Suppose you obtain a 95% confidence interval (LB, UB) for the population mean.
- 95% of the time, when we calculate a confidence interval this way, the true mean will be
between the two endpoints. 5% of the time, it will not.
- Because the true mean (population mean) is an unknown value, we don't know whether
we are in the 5% or the 95%.
- The CORRECT interpretation is

“We are 95% confident that [LB, UB] covers the population mean”

This is a common shorthand for the idea that the calculations work 95% of the
time.
- FALSE interpretation:

“There is a 95% chance that the population mean is between LB and UB”

This is a very common misconception! It seems very close to true, but it isn't,
because the population mean is a fixed value. So, it is either in the interval or not.
This is subtle but important.
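The statement that "the calculations work 95% of the time" can be illustrated with a small simulation. This is a sketch under made-up assumptions: the population values µ = 50, σ = 10, the sample size n = 25, the number of trials, and the seed are all hypothetical choices for this example, not values from the course.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Hypothetical population and design, chosen only for this illustration.
mu, sigma, n = 50.0, 10.0, 25
trials = 5000
z = NormalDist().inv_cdf(0.975)
rng = random.Random(2020)  # fixed seed so the run is repeatable

covered = 0
for _ in range(trials):
    # Draw a fresh sample and build the 95% interval from it.
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    margin = z * sigma / sqrt(n)
    # mu is fixed; the interval is what changes from sample to sample.
    if xbar - margin <= mu <= xbar + margin:
        covered += 1

coverage = covered / trials
print(f"Fraction of intervals that covered mu: {coverage:.3f}")
```

The fraction should land close to 0.95. Each individual interval either covers µ or it does not; the 95% describes the procedure over many repetitions.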

Sample Size Necessary for a CI

Since the 95% CI extends $E = (1.96)(\sigma/\sqrt{n})$ to each side of $\bar{x}$, the width of the
interval is $2E = 2(1.96)(\sigma/\sqrt{n})$.

Suppose we want to know how large a sample we need for a desired width. We can
use the following formula

$$n = \left(z_{\alpha/2} \cdot \frac{\sigma}{E}\right)^2$$

where E is the desired margin of error, rounding n up to the next whole number. Note that the width of the interval is twice the
margin of error.
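The sample-size formula follows from solving the margin-of-error expression for n. A short derivation, using the same symbols as above:

```latex
\begin{align*}
E &= z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= z_{\alpha/2} \cdot \frac{\sigma}{E} \\
n &= \left( z_{\alpha/2} \cdot \frac{\sigma}{E} \right)^{2}
\end{align*}
```

Because n must be a whole number and a smaller n would give a margin larger than E, the result is rounded up.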

Example 3
The IQs of students enrolled at one prestigious university are assumed to be normally
distributed with unknown mean µ and known standard deviation σ = 3 points. A team
of researchers wants to estimate the mean IQ of students enrolled at this university.
- In order to construct a 90% confidence interval for µ with a margin of error of ±2
IQ points, what sample size should be obtained?

- What sample size is necessary to ensure that the resulting 95% CI for µ has a
width of (at most) 2 points?
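A possible way to work both parts of Example 3 in Python is sketched below. The function name `sample_size_for_mean` is invented for this illustration. Note that a width of 2 points corresponds to a margin of error of E = 1.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Part 1: 90% CI with margin of error 2 points
n_90 = sample_size_for_mean(sigma=3.0, margin=2.0, confidence=0.90)

# Part 2: 95% CI with width 2 points, i.e. margin of error 1 point
n_95 = sample_size_for_mean(sigma=3.0, margin=1.0, confidence=0.95)

print(n_90, n_95)
```

Under these inputs the sketch gives n = 7 for the first part and n = 35 for the second.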

Example 4
Assume that the helium porosity (in percentage) of coal samples taken from any
particular seam is normally distributed with true standard deviation 0.75.
1. Compute a 95% CI for the true average porosity of a certain seam if the average
porosity for 20 specimens from the seam was 4.85.
2. Compute a 98% CI for true average porosity of another seam based on 16
specimens with a sample average porosity of 4.56.
3. How large a sample size is necessary if the width of the 95% interval is to be 0.40?
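The three parts of Example 4 can be checked with a sketch like the following. The helper `z_value` and the variable names are assumptions made for this illustration; the statistical method is the known-σ interval and sample-size formula from Section 7.1.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 0.75  # known standard deviation of porosity

def z_value(confidence):
    """z_{alpha/2} for a two-sided interval."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# 1. 95% CI with n = 20 and sample mean 4.85
m1 = z_value(0.95) * sigma / sqrt(20)
ci_95 = (4.85 - m1, 4.85 + m1)

# 2. 98% CI with n = 16 and sample mean 4.56
m2 = z_value(0.98) * sigma / sqrt(16)
ci_98 = (4.56 - m2, 4.56 + m2)

# 3. Sample size so that the 95% interval has width 0.40 (margin 0.20)
n_needed = ceil((z_value(0.95) * sigma / 0.20) ** 2)

print(ci_95, ci_98, n_needed)
```

With these inputs the intervals are approximately (4.52, 5.18) and (4.12, 5.00), and the required sample size is 55.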

Sections 7.2 and 7.3 : CIs for a Population Mean and Proportion

- The CI for µ given in the previous section assumed that the population
distribution is normal with the value of σ known.
- Let's take a step back. We constructed the CIs from the fact that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is
(approximately) normal.
- If n ≥ 30, the CLT can be applied. If we have a random sample $X_1, X_2, \ldots, X_n$
from a population having mean µ and variance $\sigma^2$, then, approximately,

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = Z \sim N(0, 1)$$

- This means that for n ≥ 30, or n otherwise large enough, we do not need to know the
distribution of $X_i$. The method we constructed in the previous section still works.

If n is sufficiently large, a 100(1 − α)% confidence interval for the mean µ of
a population with known σ is given by

$$\left(\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right).$$

- What if σ is unknown?
- We can estimate $\sigma^2$ by $\hat{\sigma}^2 = s^2$. For large n, the substitution of s for σ adds only a
little extra variability, so the standardized variable still has approximately a standard
normal distribution. That is, approximately,

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} = Z \sim N(0, 1)$$

If n is sufficiently large, a 100(1 − α)% confidence interval for the mean µ of
a population with unknown σ is given by

$$\left(\bar{x} - z_{\alpha/2} \cdot \frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}\right).$$

Example 5
The shopping times of n = 64 randomly selected customers at a local supermarket
were recorded. The average and standard deviation of the 64 shopping times were 33
minutes and 16 minutes, respectively. Construct a 95% confidence interval for the true
mean of the shopping times.
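Since n = 64 is large and σ is unknown, the large-sample interval with s in place of σ applies. A minimal Python sketch of the computation (variable names are chosen for this illustration):

```python
from math import sqrt
from statistics import NormalDist

n = 64        # number of customers sampled
xbar = 33.0   # sample mean shopping time (minutes)
s = 16.0      # sample standard deviation (minutes), used in place of sigma

z = NormalDist().inv_cdf(0.975)   # about 1.96 for 95% confidence
margin = z * s / sqrt(n)          # s / sqrt(n) = 16 / 8 = 2
ci = (xbar - margin, xbar + margin)

print(f"95% CI for mean shopping time: ({ci[0]:.2f}, {ci[1]:.2f}) minutes")
```

Under these inputs the interval is approximately (29.08, 36.92) minutes.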

- However, if n is small (n < 30), the CLT cannot be applied, so the quantity
$(\bar{X} - \mu)/(\sigma/\sqrt{n})$ can no longer be assumed to be approximately standard normal.
- For small n, if nothing is assumed about the distribution of $X_i$, then we cannot infer
anything!
- For small n, if the distribution of $X_i$ is assumed to be normal, then statistical
inference is still possible.
- This is the only small-sample case we can handle: if the distribution
of $X_i$ is (approximately) normal and σ is unknown, then we estimate σ by s,
and the quantity $(\bar{X} - \mu)/(s/\sqrt{n})$ is more spread out than the standard
normal distribution. Inferences are based on t distributions, i.e.,

$$\frac{\bar{X} - \mu}{s/\sqrt{n}} = T \sim t_{n-1}$$

A 100(1−α)% confidence interval for the mean µ of an (approximately) normal population with unknown
σ is given by

$$\left(\bar{x} - t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}\right).$$

Properties of t Distributions

- Any particular t distribution results from specifying the value of a single
parameter, called the number of degrees of freedom, abbreviated df. We'll denote
this parameter by the Greek letter ν.
- Possible values of ν are the positive integers. So there is a t distribution with 1
df, another with 2 df, yet another with 3 df, and so on.

Let $t_\nu$ denote the t distribution with ν df.

- Each $t_\nu$ curve is bell-shaped and centered at 0.
- Each $t_\nu$ curve is more spread out than the standard normal (z) curve.
- As ν increases, the spread of the corresponding $t_\nu$ curve decreases.
- As ν → ∞, the sequence of $t_\nu$ curves approaches the standard normal curve (so
the z curve is often called the t curve with df = ∞).

Let $t_{\alpha,\nu}$ be the number on the measurement axis for which the area under the
t curve with ν df to the right of $t_{\alpha,\nu}$ is α. This is called a t critical value. In R, you
can use the function qt(x,df) with x = 1 − α and df = ν to find this number.
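To make the definition of a t critical value concrete, here is a rough Python sketch that finds $t_{\alpha,\nu}$ numerically from the t density. This is not how R's `qt` works internally, and the function names are invented for this illustration. It integrates the density with Simpson's rule and then uses bisection to find the point with upper-tail area α, assuming 0 < α < 0.5.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, using composite Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3   # the curve is symmetric about 0

def t_critical(alpha, df):
    """Value with upper-tail area alpha under the t curve (0 < alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > alpha:   # widen until the answer is bracketed
        hi *= 2
    for _ in range(60):                   # bisection on the tail area
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (7, 9, 30, 1000):
    print(f"t_(0.025, {df}) = {t_critical(0.025, df):.3f}")
```

The printed values shrink toward the z critical value 1.96 as df grows, which matches the last property in the list above.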

Example 6
A manufacturer of gunpowder has developed a new powder, which was tested in eight
shells. The resulting muzzle velocities, in feet per second, were as follows

3005 2925 2935 2965 2995 3005 2937 2905

Assume that muzzle velocities are approximately normally distributed. Find a 95%
confidence interval for the true average velocity µ for shells of this type.
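With n = 8 observations and σ unknown, the t interval with 7 df applies. The sketch below computes it in Python. The critical value 2.365 is the standard table value of $t_{0.025,7}$ (what `qt(0.975, 7)` returns in R, rounded) and is entered by hand here because Python's standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]  # feet per second

n = len(velocities)
xbar = mean(velocities)   # sample mean
s = stdev(velocities)     # sample standard deviation (divides by n - 1)

t_crit = 2.365            # table value of t_{0.025, 7}, entered by hand

margin = t_crit * s / sqrt(n)
lower, upper = xbar - margin, xbar + margin

print(f"xbar = {xbar}, s = {s:.2f}")
print(f"95% CI for mu: ({lower:.1f}, {upper:.1f}) feet per second")
```

Under these inputs the sample mean is 2959, s is about 39.1, and the interval is approximately (2926.3, 2991.7).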

Example 7
The administrators for a hospital wished to estimate the average number of days
required for inpatient treatment of patients between the ages of 25 and 34. A random
sample of 500 hospital patients between these ages produced a mean and standard
deviation equal to 5.4 and 3.1 days, respectively. Construct a 95% confidence interval
for the mean length of stay for the population of patients from which the sample was
drawn.

Example 8
Organic chemists often purify organic compounds by a method known as fractional
crystallization. An experimenter wanted to prepare and purify 4.85 g of aniline. Ten
4.85-gram specimens of aniline were prepared and purified to produce acetanilide. The
following dry yields were obtained:

3.85 3.88 3.90 3.62 3.72 3.80 3.85 3.36 4.01 3.82

The sample statistics are $\bar{x} = 3.781$ and $s^2 = 0.0327$. Construct a 95% confidence
interval for the mean number of grams of acetanilide that can be recovered from 4.85
grams of aniline.

A Confidence Interval for a Population Proportion

- Let p denote the proportion of successes in a population. A random sample of n
individuals is to be selected, and X is the number of successes in the sample.
Then X is a binomial random variable with parameters n and p.
- An unbiased estimator of p is $\hat{p} = X/n$. Its standard error is $\sqrt{p(1 - p)/n}$.
- Provided that n is sufficiently large, it can be shown that, approximately,

$$\frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1)$$

Provided that n is sufficiently large, a 100(1 − α)% confidence interval for the
proportion p of a population is given by

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right).$$

Example 9
In the 2016 election, a random sample of 1000 Republicans finds that 337 of them support
Ted Cruz. Calculate a 90% CI for the true proportion of Republicans who support Ted
Cruz.

Example 10
Suppose a random sample of 100 students finds that 45 of them spend more than 10 hours
per week on homework. Construct a 95% CI for the true proportion of students who
spend more than 10 hours per week on homework.
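Examples 9 and 10 both apply the large-sample interval for a proportion, so one small helper covers them. The function name `proportion_interval` is made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Large-sample CI for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)   # z * estimated standard error
    return p_hat - margin, p_hat + margin

# Example 9: 337 of 1000, 90% confidence
ci_ex9 = proportion_interval(337, 1000, 0.90)

# Example 10: 45 of 100, 95% confidence
ci_ex10 = proportion_interval(45, 100, 0.95)

print(f"Example 9:  ({ci_ex9[0]:.4f}, {ci_ex9[1]:.4f})")
print(f"Example 10: ({ci_ex10[0]:.4f}, {ci_ex10[1]:.4f})")
```

Under these inputs the intervals are approximately (0.312, 0.362) and (0.352, 0.548). The second is much wider, mostly because the sample is ten times smaller.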

For a population proportion, suppose we want to know how large a sample we need for
a desired width. We can use the following formula

$$n = \left(z_{\alpha/2} \cdot \frac{\sqrt{\hat{p}(1 - \hat{p})}}{E}\right)^2$$

where E is the desired margin of error. Note that the width of the interval is twice the
margin of error.

Can you derive this formula?
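One possible derivation starts from the margin of error of the proportion interval and solves for n, using a planning value $\hat{p}$ for the unknown proportion:

```latex
\begin{align*}
E &= z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\
\sqrt{n} &= z_{\alpha/2} \cdot \frac{\sqrt{\hat{p}(1 - \hat{p})}}{E} \\
n &= \left( z_{\alpha/2} \cdot \frac{\sqrt{\hat{p}(1 - \hat{p})}}{E} \right)^{2}
   = \frac{z_{\alpha/2}^{2}\, \hat{p}(1 - \hat{p})}{E^{2}}
\end{align*}
```

As with the formula for a mean, n is rounded up to the next whole number.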

Example 11
The reaction of an individual to a stimulus in a psychological experiment may take one
of two forms, A or B. If an experimenter wishes to estimate the probability p that a
person will react in manner A, how many people must be included in the experiment?
Assume that the experimenter will be satisfied if the error of estimation is less than
0.04 with probability equal to 0.90. Assume also that he expects p to lie somewhere in
the neighborhood of 0.6, i.e., $\hat{p} = 0.6$.
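Example 11 is a direct application of the sample-size formula for a proportion, with E = 0.04, confidence level 0.90, and planning value 0.6. A short Python sketch (the variable names are chosen for this illustration):

```python
from math import ceil, sqrt
from statistics import NormalDist

p_guess = 0.6      # planning value for p given in the example
margin = 0.04      # desired bound on the error of estimation
confidence = 0.90

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645

# n = (z * sqrt(p(1 - p)) / E)^2, rounded up to a whole number of people
n_required = ceil((z * sqrt(p_guess * (1 - p_guess)) / margin) ** 2)

print(n_required)
```

Under these inputs the sketch gives n = 406 people.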
