Confidence Intervals Notes
Concepts:
The Reasoning of Statistical Estimation
Margin of Error and Confidence Level
Confidence Intervals for a Population Mean
How Confidence Intervals Behave
Objectives:
Define statistical inference.
Describe the reasoning of statistical estimation.
Describe the parts of a confidence interval.
Interpret a confidence level.
Construct and interpret a confidence interval for the mean of a Normal
population.
Describe how confidence intervals behave.
References:
Moore, D. S., Notz, W. I., & Fligner, M. A. (2013). The basic practice of statistics (6th
ed.). New York, NY: W. H. Freeman and Company.
Statistical Inference
The purpose of collecting data on a sample is not simply to have data on that
sample. Researchers take the sample in order to infer from that data some
conclusion about the wider population represented by the sample.
Statistical Inference
Statistical inference provides methods for drawing conclusions about a population from
sample data.
These notes will cover how to estimate the mean of a variable for the entire
population after computing the mean for a specific sample.
For example, suppose a researcher is interested in estimating the achievement
motivation of first-year college students. The researcher selects a random
sample of students, administers a motivation scale, and then computes the
average score for the sample. Based on this average score, he or she can
then make an inference about the motivation of the entire population of
first-year college students.
Simple Conditions for Inference about a Mean
There are certain requirements that must be met before making inferences about a
population mean:
1) The sample must be randomly selected.
2) The variable of interest must have a Normal distribution 𝑁(𝜇, 𝜎) in the
population.
3) The population mean 𝜇 is unknown, but the standard deviation 𝜎 for the
variable must be known.
Example:
Motivation Scale:
n = 400
Sample Mean = 80
Confidence Level
The confidence level is the overall capture rate if the method is used many times.
The sample mean will vary from sample to sample, but each sample yields an
interval of the form estimate ± margin of error. In the long run, C% of these
intervals capture the unknown population mean 𝜇. In other words, about C% of
the intervals built this way will contain the actual mean.
In the above example, a confidence level of 95% was selected. The value of z*
for a specific confidence level can be looked up in a table of standard Normal
critical values, such as the one in the back of a statistics textbook. The
value of z* for a confidence level of 95% is 1.96.
Substituting the value of z*, the population standard deviation 𝜎, and the
sample size n into the equation margin of error = z* × 𝜎/√n gives a margin
of error of 3.92.
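The arithmetic behind this margin of error can be sketched in Python. The notes do not state the population standard deviation, so sigma = 40 below is an assumption: it is the value that reproduces the stated margin of error of 3.92 when n = 400 and z* = 1.96. The function name margin_of_error is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(confidence, sigma, n):
    """Margin of error z* * sigma / sqrt(n) for a z confidence interval."""
    # z* leaves (1 - confidence) / 2 in each tail of the standard Normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / sqrt(n)


# sigma = 40 is an assumed value (not given in the notes); n = 400 is from the example.
m = margin_of_error(0.95, sigma=40, n=400)
print(round(m, 2))
```

Computing z* from the standard Normal distribution replaces the table lookup; both approaches should agree to the precision of the table.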
Confidence Intervals
The formulas for the confidence interval and margin of error can be
combined into one formula.
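The combined formula is not reproduced in these notes. Assuming it is the standard z interval for the mean of a Normal population with known 𝜎, which is what the surrounding text describes, it can be written as follows, with m denoting the margin of error:

```latex
m = z^{*}\,\frac{\sigma}{\sqrt{n}},
\qquad
\bar{x} \pm m \;=\; \bar{x} \pm z^{*}\,\frac{\sigma}{\sqrt{n}}
```

With the sample mean of 80 and the margin of error of 3.92 from the example, the 95% confidence interval is 80 ± 3.92, or 76.08 to 83.92.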
The Reasoning of Statistical Estimation
Where does the formula for computing a confidence interval come from?
The true value of the population mean is never known; it can only be
estimated.
To see how good an estimate is, imagine selecting a large number of random
samples of the same size from the population.
The mean from each random sample will be slightly different.
The average of these sample means gets closer to the population mean as more
samples are taken.
For instance, the mean for the sample in the example was 80, but if another
sample were selected the mean might be 78 or 83.
If a large number of sample means were represented graphically, they would
have a Normal distribution.
The mean of this distribution is the same as the population mean, but the
standard deviation of this distribution is equal to the standard deviation of
the variable in the population divided by the square root of the sample size.
This is why the formula divides the standard deviation by the square root of
n instead of using the standard deviation alone: 𝜎/√n is the standard
deviation of the distribution of many sample means.
When working with real data it may not be feasible to select a very large
number of random samples, but if researchers were able to do so, the
sample means would form a Normal distribution.
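Although repeated sampling is rarely feasible with real data, it is easy to imitate on a computer. The sketch below draws many samples from a hypothetical Normal population and checks that the sample means center on the population mean with a standard deviation close to 𝜎/√n. The population values (mean 80, standard deviation 40) are assumptions made for illustration; the notes give only the sample mean of 80.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Assumed population values for illustration (not given in the notes).
MU, SIGMA, N = 80, 40, 400
NUM_SAMPLES = 1000

# Draw many random samples of size N and record the mean of each one.
sample_means = [
    mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

print("mean of sample means:", round(mean(sample_means), 2))
print("SD of sample means:  ", round(stdev(sample_means), 2))
print("sigma / sqrt(n):     ", SIGMA / sqrt(N))
```

The first printed value should land near 80 and the second near 2, which is 40 divided by the square root of 400.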
The Reasoning of Statistical Estimation
If many random samples are collected, their means will have a Normal
distribution.
This means that the 68-95-99.7 Rule can be used to estimate the values
within which the population mean would fall.
Since 95% of sample means fall within two standard deviations of the
population mean according to the 68-95-99.7 Rule, adding and subtracting two
standard deviations of the sample mean (2𝜎/√n) from the sample mean gives an
approximate 95% confidence interval.
Notice that with higher confidence levels the confidence interval gets wider,
so there is less precision.
According to the 68-95-99.7 Rule:
The 68% confidence interval for this example is between 78 and 82.
The 95% confidence interval for this example is between 76 and 84.
The 99.7% confidence interval for this example is between 74 and 86.
Therefore, the larger the confidence level, the larger the interval. There is a
trade-off between the two.
If researchers want to be very certain that their interval includes the
population mean, they must extend the interval, so there is less precision.
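The three intervals listed above can be reproduced with a few lines of Python. The sample mean of 80 comes from the example; the standard deviation of the sample mean, 2, is not stated directly in the notes and is inferred here from the listed intervals.

```python
x_bar = 80       # sample mean from the example
sd_of_mean = 2   # sigma / sqrt(n), inferred from the intervals listed in the notes

# Number of standard deviations the 68-95-99.7 Rule assigns to each level.
rule = {68: 1, 95: 2, 99.7: 3}

intervals = {
    level: (x_bar - k * sd_of_mean, x_bar + k * sd_of_mean)
    for level, k in rule.items()
}

for level, (low, high) in intervals.items():
    print(f"{level}% interval: {low} to {high} (width {high - low})")
```

The printed widths grow from 4 to 8 to 12 as the confidence level rises, which is the trade-off between confidence and precision.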
For confidence levels that are not specified in the 68-95-99.7 Rule, z* can
be used to obtain the upper bound and the lower bound of the interval.
It is important to note that z* gives more exact intervals than the
68-95-99.7 Rule, which rounds the critical values (for example, 2 instead of
1.96 for 95% confidence).
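As a rough illustration, z* for any confidence level can be computed from the standard Normal distribution instead of being read from a table. The helper name z_star and the choice of levels are illustrative only.

```python
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* with the central `confidence` area under N(0, 1)."""
    # The upper tail area is (1 - confidence) / 2, so look up the matching quantile.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.68, 0.90, 0.95, 0.99, 0.997):
    print(f"{level:.1%} confidence: z* = {z_star(level):.3f}")
```

For 95% confidence this gives approximately 1.96 instead of the 2 used by the 68-95-99.7 Rule, so the resulting interval is slightly narrower.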
How Confidence Intervals Behave
The z confidence interval for the mean of a Normal population illustrates several
important properties that are shared by all confidence intervals in common use.
The user chooses the confidence level and the margin of error follows.
Researchers would prefer high confidence with a small margin of error.
High confidence suggests the method almost always gives correct
answers.
A small margin of error suggests the parameter has been pinned down
precisely.
The confidence level is the overall capture rate if the method is used many times.
The sample mean will vary from sample to sample, but when an interval of the
form estimate ± margin of error is computed from each sample, C% of these
intervals capture the unknown population mean μ.
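The idea of a capture rate can be checked with a small simulation. The sketch below assumes a hypothetical Normal population with mean 80 and standard deviation 40 (values chosen for illustration, not taken from the notes), builds a 95% z interval from each of many samples, and counts how often the interval contains the population mean.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

random.seed(2)  # fixed seed so the sketch is repeatable

# Assumed population values for illustration (not given in the notes).
MU, SIGMA, N = 80, 40, 400
CONFIDENCE = 0.95
TRIALS = 1000

z_star = NormalDist().inv_cdf(1 - (1 - CONFIDENCE) / 2)
margin = z_star * SIGMA / sqrt(N)

captured = 0
for _ in range(TRIALS):
    # One random sample of size N and its mean.
    x_bar = mean(random.gauss(MU, SIGMA) for _ in range(N))
    # Does the interval x_bar ± margin contain the true mean?
    if x_bar - margin <= MU <= x_bar + margin:
        captured += 1

capture_rate = captured / TRIALS
print(f"capture rate: {capture_rate:.3f}")
```

The capture rate should come out close to 0.95. Any single interval either contains the population mean or does not; the 95% describes how the method performs over many samples.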
1. Check the conditions for the interval that has been chosen.