CIForMean LargeSample Lesson PDF
This lesson includes an overview of the subject, instructor notes, and example exercises using
Minitab.
Given a large sample size n (where we will assume n > 30), we will construct confidence intervals
for the mean μ of a population.
Prerequisites
This lesson requires knowledge from the Normal Distribution and Sampling Distribution of 𝑋̅
(e.g. Central Limit Theorem) lessons. Students need to understand that 𝑋̅ is exactly or
approximately normally distributed for a large sample size n. Otherwise, the process
that we use in this lesson for constructing a confidence interval with a large sample will not
make sense.
Learning Targets
• Interpret what a confidence interval says about the likelihood of the interval containing
the parameter of interest.
• Construct a confidence interval by hand and use a standard normal table for a
population mean given a large sample size n.
• Construct a confidence interval using Minitab for a population mean given a large
sample size n.
WWW.MINITAB.COM/ACADEMIC
Time Required
It will take the instructor at least 60 minutes in class to introduce this lesson if the confidence
interval simulation exercise is included, which is highly recommended. This exercise will help
students understand the correct interpretation of a confidence interval for a given confidence
level. It will take about 15 minutes for students to generate their confidence intervals and write
them on the board during class. The exercises on the activity sheet will also take 60 minutes, and
they can be used as homework or quiz problems.
Materials Required
Assessment
The activity sheet contains exercises for students to assess their understanding of the learning
targets for this lesson.
Possible Extensions
This lesson introduces students to constructing confidence intervals for a population mean with
a large sample size. The recommended follow-up lesson is Population Mean Hypothesis
Testing for Large Samples.
References
For the Activity: “The Blind Paper Cutter: Teaching about Variation, Bias, Stability, and Process
Control,” by Richard Stone. Available from The American Statistician, Volume 52, No. 3 (August
1998), pp. 244-247.
Miscellaneous
• You might ask, “Why don’t we just compute the population mean rather than calculate
an estimate for it?” Good question! Suppose your population of interest is Canada, and
you want to know the mean age of the population.
o Due to lack of time, energy, and money, you cannot obtain the age of every person
in Canada.
o You can select a sample (e.g. a simple random sample – see the Sampling lesson)
and calculate the mean of that sample, 𝑥̅ .
• Your next question may be, “Why don’t we just use the sample mean 𝑥̅ to estimate the
population mean μ?” Another good question!
o We can – but the sample mean 𝑥̅ may be quite different from the population mean μ,
even if we obtained the sample correctly.
o In addition, a single number estimate by itself, such as 𝑥̅ , provides no information
about the precision and reliability of the estimate with respect to the larger
population.
• Statisticians use the sample statistic 𝑥̅ and the population or sample standard deviation
to provide an interval of plausible estimates for the population parameter µ. This interval
is called a confidence interval.
Definition: The confidence level is a measure of the degree of reliability of the confidence
interval.
1. In Minitab, have each student construct ten columns, each with 100 data points sampled
from a standard normal population.
Minitab 19
Minitab 19 (Mac)
Minitab Express
2. Have students examine their columns of random data from a standard normal population.
a) Are the sample values in the columns representative of numbers that they’d expect to
see from a normal distribution with mean 0 and standard deviation 1?
3. Have students compute 95% confidence intervals for the population mean for each column.
How the intervals are constructed will be discussed later in this lesson.
Minitab Express
4. Have students write their ten 95% confidence intervals for the population mean μ on the
front board, rounding the intervals to 2 decimal places to save class time.
The appendix for this lesson includes one hundred 95% confidence intervals for the
population mean μ, where the population of interest is a standard normal distribution. Below
is one set of ten intervals, in which one of them does not contain the true population mean
μ = 0.
μ: mean of C1, C2, C3, C4, C5, C6, C7, C8, C9, C10
Known standard deviation = 1
5. With the students, highlight and count the number of 95% confidence intervals for the
population mean that don’t contain the true mean, which we set as μ = 0. You should notice
that approximately 95% of them will contain the population mean μ = 0, while
approximately 5% of them will not.
6. Make sure the students see that the confidence intervals are unique since each random
sample of size n = 100 is unique. It’s important to reinforce in students’ minds that the
random part of the confidence interval is the interval, and not the parameter μ.
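For instructors who want to preview the simulation outside Minitab, here is a minimal Python sketch of steps 1 through 5. It assumes the known standard deviation σ = 1 and the interval form 𝑥̅ ± z·σ/√n used later in this lesson. The function name `simulate_intervals` and the fixed seed are illustrative choices, not part of the Minitab activity.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulate_intervals(num_columns=10, n=100, mu=0.0, sigma=1.0,
                       level=0.95, seed=1):
    """Return (lower, upper, contains_mu) for each simulated column."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs repeat
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)  # sigma is treated as known
    results = []
    for _ in range(num_columns):
        column = [rng.gauss(mu, sigma) for _ in range(n)]  # one "column" of data
        xbar = fmean(column)
        lower, upper = xbar - margin, xbar + margin
        results.append((lower, upper, lower <= mu <= upper))
    return results


# Ten intervals, rounded to 2 decimal places as in step 4.
for lower, upper, hit in simulate_intervals():
    flag = "" if hit else "  <-- misses mu = 0"
    print(f"[{lower:.2f}, {upper:.2f}]{flag}")
```

Changing `seed` gives each student a different set of ten intervals, and raising `num_columns` shows the long-run capture rate settling near 95%.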
Example 1
A survey claims that the average earnings for college students for a one-month internship is μ =
$4500. Your students tell you this amount seems high. To test this claim, you take a random
sample of n = 100 students from your college population. The sample mean 𝑥̅ = $3975.
Using the random sample of 100 students’ monthly internship salaries at your college, you
calculate a 95% confidence interval for the mean to be [$3525, $4425]. Notice that this interval
does not contain the survey result’s stated mean of μ = $4500. What can we conclude?
From the Minitab confidence interval simulation exercise, 95% of the intervals built in this
random fashion will capture the true mean monthly earnings μ for summer college students at
your college. One of the following statements must be true:
• The interval you computed just happens to be one of the 5% of confidence intervals that
does not capture μ. You could take another sample to see if it reflects the results of the
first confidence interval.
Important: For the rest of this lesson, uppercase 𝑋̅ represents a random variable and lowercase 𝑥̅ is an
actual value from its distribution. Lowercase 𝑥̅ is the sample mean computed from data. Uppercase
𝑋̅ describes the distribution of all possible values of 𝑥̅.
Example 2
A quality control engineer works at a cereal company, and she wants to estimate the true mean
fill weight μ of cereal boxes filled on a given day. She draws a simple random sample of 100
boxes from the population of all cereal boxes that are filled that day. The population standard
deviation is known to be σ = 0.1 oz. for this process. She computes the sample mean fill
weight to be 𝒙̅ = 12.05 oz.
• Let μ represent the true (unknown) population mean fill weight. The population standard
deviation is σ = 0.1 oz.
• Let x1, x2, ..., x100 be the individual weights of the simple random sample of 100 cereal
boxes.
• The value 𝑥̅ is a sample average weight of a random sample of 100 cereal boxes. Every
random sample of 100 boxes will produce a different 𝑥̅ .
• Let 𝑋̅ represent the distribution of all 𝑥̅ ’s from different random samples from the
population of all cereal boxes produced that day.
• From the Sampling Distribution of 𝑋̅ lesson, the Central Limit Theorem says:
Let X1, X2, X3, …, Xn be independent random variables with identical distributions with mean μ and
standard deviation σ. If n is “large” enough (n > 30 suggested in most texts), then:
The sampling distribution of the sample mean 𝑋̅ is approximately normally distributed with mean μ
and standard deviation σ/√n if the original distributions are non-normal.
The larger the sample size n is, the more normally distributed the sampling distribution will be and
the more tightly it will converge about the true population mean μ.
• The distribution of the sample mean 𝑋̅ is exactly normally distributed with mean μ and
standard deviation σ/√n if the original distributions are normal.
• In this example, 𝑋̅ is therefore at least approximately normally distributed
with mean μ𝑋̅ = μ oz. and standard deviation 𝜎𝑋̅ = 0.1/√100 = 0.01 oz.
• Independent of the original distribution, the approximate distribution of 𝑋̅ for many
samples of size n = 100 has the following shape:
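The behavior of 𝑋̅ described in the bullets above can be checked with a short simulation. The sketch below draws many samples of size n = 100 from a deliberately non-normal (uniform) population with standard deviation 0.1 and looks at the spread of the sample means. The population center of 12.0 oz. and the uniform shape are assumptions made only for illustration, since the true μ and the true fill-weight distribution are unknown.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def sample_means(num_samples=2000, n=100, center=12.0, sigma=0.1, seed=7):
    """Means of repeated samples from a uniform population with sd `sigma`."""
    rng = random.Random(seed)
    # A uniform distribution on [c - h, c + h] has standard deviation h / sqrt(3).
    half_width = sigma * sqrt(3)
    return [
        fmean([rng.uniform(center - half_width, center + half_width)
               for _ in range(n)])
        for _ in range(num_samples)
    ]


means = sample_means()
print("mean of sample means:", round(fmean(means), 3))
print("sd of sample means:  ", round(pstdev(means), 4))  # theory: 0.1 / sqrt(100) = 0.01
```

A histogram of `means` should look bell-shaped even though the individual weights are uniform, which is the point of the Central Limit Theorem.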
Let’s consider two separate cases for the location of 𝑥̅ = 12.05 on the distribution of 𝑋̅ graph.
Case 1: Suppose the calculated sample average 𝑥̅ lands within one standard deviation of the
true population mean μ. We have no idea what μ is, but we hope to capture μ with a 95%
confidence interval constructed with 𝑥̅ as the center of the interval. Below is a hypothetical
graph of this case.
Case 2: Suppose the calculated sample average 𝑥̅ lands outside of two standard deviations from
the true population mean μ. Again, we have no idea what μ is, but we hope to capture μ with a
95% confidence interval constructed with 𝑥̅ as the center of the interval. Below is a hypothetical
graph of this case.
Conclusion to Case 2: The 95% confidence interval does not capture μ as shown above.
• When constructing a 95% confidence interval, there is a 95% chance that we’ll obtain a
sample mean 𝑥̅ that is within two standard deviations of the population mean.
Constructing a confidence interval around sample means that fall within two standard
deviations of the population mean will guarantee that we capture μ.
• There is a 5% chance that we’ll obtain a sample mean 𝑥̅ that is outside of two standard
deviations of the population mean. Constructing a confidence interval around these
sample means will not capture μ.
• Although we don’t know the value of μ, the 68-95-99.7 rule says that approximately
95% of sample means lie within two standard deviations of μ.
• The probability of drawing a sample with a mean 𝑥̅ that is within two standard deviations
of μ is 95%.
• If we obtain a sample mean 𝑥̅ that is within two standard deviations of μ, then putting
95% confidence interval bounds around it will capture the true population mean μ.
• The probability of drawing a sample with a mean 𝑥̅ that is more than two standard
deviations away from μ is 5%.
• If we obtain a sample mean 𝑥̅ that is more than two standard deviations away from μ,
then putting 95% confidence interval bounds around it will not capture the true
population mean μ.
To calculate an approximate 95% confidence interval in this example, we let 𝑥̅ be at the center of
the interval. To attempt to capture μ within the interval, we extend the upper interval bound to
two standard deviations to the right of 𝑥̅ . The lower interval bound is set at two standard
deviations to the left of 𝑥̅ . Symbolically, here is what we have:
Our 95% confidence interval for the true population mean μ is approximately [12.03, 12.07] oz.
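The symbolic step for this interval can be written out as follows. This is a reconstruction from the values given in the example (𝑥̅ = 12.05 oz., σ = 0.1 oz., n = 100), and it uses the multiplier 2 from the 68-95-99.7 rule rather than the more precise 1.96.

```latex
\[
\bar{x} \pm 2\,\sigma_{\bar{X}}
  = \bar{x} \pm 2\,\frac{\sigma}{\sqrt{n}}
  = 12.05 \pm 2\,\frac{0.1}{\sqrt{100}}
  = 12.05 \pm 0.02
  = [12.03,\ 12.07] \text{ oz.}
\]
```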
Question: Does the 95% confidence interval that we constructed for the true population mean μ
actually contain μ?
Answer: Probably. The above 95% confidence interval for μ is computed by a procedure that
succeeds in covering the population mean 95% of the time. About 95% of intervals built this way
capture μ and about 5% do not, and we cannot tell which kind this particular interval is.
Example 3
Choose the best interpretation of a 95% confidence interval for the population mean μ.
A. If repeated random samples were taken and the 95% confidence intervals were computed for
each sample, 95% of the intervals would contain the population mean μ.
B. The probability that the population mean μ is in the confidence interval is 0.95.
Answer: The correct answer is A; it follows the confidence interval simulation exercise that we
did at the beginning of the lesson. Answer B is incorrect because it places the probability on μ,
instead of on the confidence interval. Answer C is incorrect since the confidence interval for the
population mean is built using sample means and not values from the population distribution.
Using population distribution values would give us a confidence interval that is wider than the
one for the population mean.
Notation:
Examples:
• The value z0.025 is the positive z-score that has α/2 = 0.025 probability to its right. Using
the standard normal table or Minitab (as shown in the Normal Distribution lesson), the
desired z-score is 1.96.
• The value -z0.025 is the negative z-score that has α/2 = 0.025 probability to its left. The
desired z-score is -1.96.
• The value z0.10 is the positive z-score that has α/2 = 0.1 probability to its right. The
desired z-score is 1.282.
• The value -z0.25 is the negative z-score that has α/2 = 0.25 probability to its left. The
desired z-score is -0.6745.
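The z-scores in these examples can also be looked up without a table. The sketch below uses Python's standard library `statistics.NormalDist`. The helper name `z_upper` is an illustrative choice and not notation from this lesson.

```python
from statistics import NormalDist


def z_upper(tail_area):
    """Positive z-score with `tail_area` probability to its right."""
    return NormalDist().inv_cdf(1 - tail_area)


for tail_area in (0.025, 0.10, 0.25):
    z = z_upper(tail_area)
    # By symmetry, the matching negative z-score has the same area to its left.
    print(f"tail area {tail_area}: z = {z:.4f}, -z = {-z:.4f}")
```

Rounded as in the bullets above, the three values are 1.96, 1.282, and 0.6745.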
We are now ready for the formula for a confidence interval for the population mean μ given a
large sample size n.
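For reference, the interval used in the worked examples of this lesson has the form below. Here 𝑥̅ is the sample mean, σ is the known population standard deviation, n is the sample size, and zα/2 is the positive z-score with α/2 probability to its right for a 100(1 − α)% confidence level.

```latex
\[
\left[\ \bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\quad
        \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\ \right]
\]
```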
Example 4
Every child loves getting his or her first bicycle. However, many parents dread the painful task of
reading through confusing instructions to put that bike together. To make sure a parent leaves
enough time for bike building before its exciting reveal, wouldn’t it be nice if the instruction
manual provided a confidence interval for the true population mean time to put the bike
together?
Suppose a bike manufacturer wants to construct a confidence interval for the true mean time to
put its most popular bike together. Instead of asking buyers to send in a survey (reread the
The mean time to assemble the bike based on the random sample of n = 50 people is 𝑥̅ = 10.4
minutes. The population standard deviation of the assembly times is known to be σ = 1.2
minutes.
(a) Determine a two-sided 95% confidence interval for the true population mean assembly time
μ. What assumption are you making about the distribution of 𝑋̅?
Solution: Although we do not know the population distribution for the assembly times, we can
assume that 𝑋̅ is approximately normally distributed since the sample size is n = 50. Since we
are building a two-sided 95% confidence interval for the population mean, we need to split α =
0.05 into two halves and allow 0.025 probability in each tail of the normal curve. The z-score
corresponding to 0.025 probability in the right tail of a standard normal distribution is 1.96, and
-1.96 for 0.025 probability in the left tail. Using the confidence interval formula above, we have:
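As a sketch of the arithmetic for part (a), substituting 𝑥̅ = 10.4, σ = 1.2, n = 50, and z = 1.96 into the interval 𝑥̅ ± z·σ/√n gives approximately the following. The rounding to two decimal places is a choice made here.

```latex
\[
10.4 \pm 1.96 \cdot \frac{1.2}{\sqrt{50}}
  \approx 10.4 \pm 0.333
  \approx [10.07,\ 10.73] \text{ minutes}
\]
```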
Solution: Since we are building a two-sided 99% confidence interval for the population mean,
then we need to split α = 0.01 into two halves and allow 0.005 probability in each tail of the
normal curve. The z-scores corresponding to 0.005 probability in the right tail and left tail of a
standard normal distribution are 2.58 and -2.58, respectively. Using the confidence interval
formula above, we have:
Notice that although we have greater reliability with a 99% confidence interval, compared to
95%, we now have a wider interval and therefore less precision.
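The trade-off between reliability and precision can be seen by computing both intervals side by side. The following Python sketch assumes the interval 𝑥̅ ± z·σ/√n with the values from this example. The function name `z_interval` is illustrative, and it uses the unrounded z-score (about 2.576 for 99%), so the last digit may differ slightly from a hand calculation with 2.58.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level):
    """Two-sided z-interval for a mean when sigma is known."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z-score with alpha/2 in the right tail
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Bike assembly example: xbar = 10.4 min, sigma = 1.2 min, n = 50.
for level in (0.95, 0.99):
    lower, upper = z_interval(10.4, 1.2, 50, level)
    print(f"{level:.0%}: [{lower:.2f}, {upper:.2f}], width {upper - lower:.2f}")
```

The 99% interval comes out wider than the 95% interval because the larger z-score multiplies the same standard deviation σ/√n.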
Since we want a 99% confidence interval, we’ll need to add an extra step to the Minitab
instructions from part (a).
Determine a 90% confidence interval for the true population mean time between phone calls μ
based on the sample data.
Solution: As in Example 4, we do not know the population distribution of the time between
calls, but we can assume that 𝑋̅ is approximately normally distributed since the sample size is n
= 64. Since we are building a two-sided 90% confidence interval for the population mean, we
need to split α = 0.10 into two halves and allow 0.05 probability in each tail of the normal curve.
The z-score corresponding to 0.05 probability in the right tail of a standard normal distribution
is 1.645, and -1.645 for 0.05 probability in the left tail.
In this example, we are given a data set instead of summarized statistics. We can use Minitab to
determine the sample mean 𝑥̅ (see the Describing Data Numerically lesson). Here are the
descriptive statistics from Minitab:
Note that the mean is equal to 3.176. The 90% confidence interval is:
[𝑥̅ − zα/2 · σ/√n, 𝑥̅ + zα/2 · σ/√n] ≅ [3.176 − 1.645 · 2.7/√64, 3.176 + 1.645 · 2.7/√64] ≅ [2.6208, 3.7312] minutes
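When the starting point is raw data rather than summary statistics, the same calculation needs one extra step: computing 𝑥̅ from the data. Here is a minimal Python sketch of that workflow, assuming σ = 2.7 is known as in the calculation above. The data values are randomly generated placeholders, not the lesson's data set, and the function name `z_interval_from_data` is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def z_interval_from_data(data, sigma, level=0.90):
    """Return (xbar, (lower, upper)) for raw data with known sigma."""
    n = len(data)
    xbar = fmean(data)  # the step Minitab's descriptive statistics perform
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.645 for 90%
    margin = z * sigma / sqrt(n)
    return xbar, (xbar - margin, xbar + margin)


# Hypothetical times between calls (minutes); NOT the lesson's data.
rng = random.Random(0)
times = [rng.expovariate(1 / 3.0) for _ in range(64)]

xbar, (lower, upper) = z_interval_from_data(times, sigma=2.7)
print(f"mean = {xbar:.3f}, 90% CI = [{lower:.4f}, {upper:.4f}] minutes")
```

Replacing `times` with the actual 64 observations should reproduce the interval above, up to rounding of the z-score.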
Minitab Express
Appendix
This appendix contains one hundred confidence intervals for the population mean μ. The data
was generated from a standard normal distribution with mean μ = 0 and standard deviation
σ = 1.
In the simulation exercise, we know the population mean μ, but typically μ is unknown. Since we
built 95% confidence intervals, we expect approximately 95 out of 100 of them to contain the
true mean μ = 0. The highlighted intervals below do not contain the population mean μ = 0.