Lecture 5 - Interval Estimation
Statistical Inference
We will use the sampling distribution to draw inferences about the unknown
population parameters.
• Population –
A population is the group of all items of interest to a statistics practitioner. It is
frequently very large; sometimes infinite.
E.g. All the voters in the country
• Sample –
A sample is a set of data drawn from the population. It is potentially very large, but smaller
than the population.
E.g. All the voters from Maharashtra
[Figure: the sample is a subset of the population. The population is described by parameters; the sample is described by statistics.]
Populations have parameters; samples have statistics.
• Parameter -
A descriptive measure of a population. A parameter is a statistical constant that
describes a feature of the population.
E.g.
• Expected value μ (also called the “population mean”)
• Standard deviation σ (also called the “population standard deviation”)
• Statistic -
A descriptive measure of a sample. A statistic is any statistical value computed
to describe a sample.
E.g.
• Sample mean x̄ (“x bar”)
• Sample standard deviation s
Statistical inference is the process of making an estimate, prediction, or decision about a population
based on a sample.
[Figure: a statistic is computed from the sample, and inference runs from that statistic to the corresponding population parameter.]
Thus, we can apply what we know about a sample to the larger population from which it
was drawn!
Rationale:
Large populations make investigating each member impractical and
expensive.
Easier and cheaper to take a sample and make estimates about the
population from the sample.
However:
Such conclusions and estimates are not always going to be correct.
For this reason, we build into the statistical inference “measures of
reliability”, namely confidence level and significance level.
E.g. the sample mean x̄ is an unbiased estimator of the population mean µ, since:
E(x̄) = µ
E.g. for a symmetric population, the sample median is also an unbiased estimator of the population mean µ, since:
E(sample median) = µ
The variance of the sample mean is V(x̄) = σ²/n.
That is, as n grows larger, the variance of x̄ grows smaller.
Likewise, as n grows larger, the variance of the sample median grows smaller.
If there are two unbiased estimators of a parameter, the one whose variance is smaller is
said to be relatively efficient.
E.g. both the sample median and the sample mean are unbiased estimators of the population
mean; however, the sample median has a greater variance than the sample mean, so we
choose x̄, since it is relatively efficient when compared to the sample median.
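The relative efficiency claim above can be checked with a small simulation. The sketch below is illustrative only: the normal population (µ = 50, σ = 10), the sample size n = 25, the number of repetitions, and the function name `estimator_variances` are all assumptions chosen for the demonstration, not values from the lecture.

```python
import random
import statistics

def estimator_variances(n=25, reps=4000, mu=50.0, sigma=10.0, seed=1):
    """Simulate the sampling variance of the sample mean and the sample median.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(reps):
        # Draw one sample of size n from a normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Variance of each estimator across the repeated samples
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = estimator_variances()
print(f"variance of sample mean:   {var_mean:.3f}")   # theory: sigma^2 / n = 4
print(f"variance of sample median: {var_median:.3f}")  # larger than for the mean
```

For a normal population the simulated variance of the mean should land near σ²/n = 4, while the variance of the median comes out noticeably larger, which is why x̄ is preferred.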
• Point estimate: A single statistic value that is the “best guess” for the parameter
value
• Interval estimate: An interval of numbers around the point estimate, that has a fixed
“confidence level” of containing the parameter value. Called a confidence interval
E.g.
The sample mean x̄ (“x bar”) is the point estimator of μ.
The sample standard deviation s is the point estimator of σ.
For example, suppose we want to estimate the mean summer income of a class of
business students. For n = 25 students,
x̄ is calculated to be $400 per week.
A point estimator cannot be expected to provide the exact value of the population parameter.
An interval estimate can be computed by adding and subtracting a margin of error to the point estimate.
Level of significance (α) is the risk or the chance we take that the true
population parameter may not be contained in the confidence interval. It
measures how frequently the conclusion will be wrong in the long run.
E.g. a 5% significance level means that, in the long run, this type of conclusion
will be wrong 5% of the time.
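Confidence interval for the population mean:
$\bar{x} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ or $\bar{x} \pm t_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
That is: sample mean ± (z or t score, which depends on the confidence level) × (population SD or sample SD) / (square root of the sample size).
[Figure: sampling distribution of x̄. A proportion 1 − α of all x̄ values lies within $z_{\alpha/2}\,\sigma_{\bar{x}}$ of the center, with α/2 in each tail. For a 95% confidence interval, α/2 = 0.025 in each tail and the z-scores are −1.96 and 1.96.]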
“ α “ is the proportion in the tails (shaded in green) of the distribution that is outside the established
confidence interval
Step 5: Find the interval by adding this product to, and subtracting it from, the sample
mean
Here is a table of commonly used confidence levels, α, α/2 values, and corresponding z-
scores which we use in our examples
z_{α/2} is calculated in Excel with the formula NORMSINV(1-α/2) or by looking it up in the standard normal table
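As a cross-check on the spreadsheet formula, the same z-scores can be computed with Python's standard library. This is a minimal sketch; the helper name `z_critical` and the choice of the 90%, 95%, and 99% levels are assumptions made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence_level
    # inv_cdf plays the role of the spreadsheet's inverse standard normal function
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    alpha = 1.0 - level
    print(f"{level:.0%}  alpha={alpha:.2f}  alpha/2={alpha / 2:.3f}  z={z_critical(level):.3f}")
```

The printed z-scores should be approximately 1.645, 1.960, and 2.576 for the three levels.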
Therefore,
Lower Confidence Limit = 12.8207
Upper Confidence Limit = 20.9793
Answer: The Population mean is estimated to lie within (12.8207, 20.9793) with 99%
confidence.
Because 99% of the intervals constructed in this way will contain the population mean, we say that we are 99% confident that the
interval includes the population mean μ.
That is, 99% of the time an interval built this way will contain the population mean, while
1% of the time it will not.
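The long-run interpretation above can be illustrated by simulation. The sketch below repeatedly draws samples from a population with a known mean and counts how often the 99% interval captures it. The population values (µ = 100, σ = 15), the sample size, the repetition count, and the function name `coverage` are hypothetical choices, not part of the lecture example.

```python
import random
from statistics import NormalDist, fmean

def coverage(confidence_level=0.99, mu=100.0, sigma=15.0, n=30, reps=5000, seed=7):
    """Fraction of simulated z-intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)  # fixed seed for a reproducible run
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        # Sample mean of one simulated sample of size n
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

print(f"share of intervals containing mu: {coverage():.3f}")
```

The share should come out close to 0.99. Any single interval either contains µ or does not; the confidence level describes the procedure over many samples.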
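[Figure: sampling distribution of x̄ with central area 1 − α and α/2 in each tail. An interval $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ built around a sample mean in the central region includes μ; an interval built around a sample mean in either tail does not include μ.]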
Each week Lloyd’s Department Store selects a simple random sample of 100 customers
in order to learn about the amount spent per shopping trip. Lloyd’s has been using the
weekly survey for several years. Based on the historical data, Lloyd’s now assumes a
known value of σ = $20 for the population standard deviation. The historical data also
indicate that the population follows a normal distribution.
Find a point estimate for the mean amount spent per shopping trip for the population
of all Lloyd’s customers.
Compute the margin of error for this estimate and develop an interval estimate of the
population mean.
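A sketch of the calculation for this exercise follows. The text above gives σ = 20 and n = 100 but states neither the sample mean nor the confidence level, so the value `x_bar = 82.0` and the 95% level are assumptions made purely for illustration. The function name `z_interval` is also hypothetical.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence_level=0.95):
    """Margin of error and interval estimate of the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / n ** 0.5
    return margin, (x_bar - margin, x_bar + margin)

x_bar = 82.0  # hypothetical sample mean; not given in the text above
margin, (lower, upper) = z_interval(x_bar, sigma=20.0, n=100)  # 95% level assumed
print(f"point estimate:  {x_bar:.2f}")
print(f"margin of error: {margin:.2f}")
print(f"interval:        {lower:.2f} to {upper:.2f}")
```

The point estimate is simply the sample mean. Under the assumed 95% level the margin of error is about 1.96 × 20 / √100 = 3.92, whatever the sample mean turns out to be.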
Population Standard Deviation is Unknown
In a real-life situation, there can be various cases where a researcher does not have a fair idea about the
population standard deviation
For large sample sizes (n ≥ 30), the sample standard deviation (s) can be a good estimate of the
population standard deviation (σ).
Hence, the confidence interval for estimating the population mean µ, when σ is unknown and the sample size is
large (n ≥ 30), is
$\bar{x} - z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$ to $\bar{x} + z_{\alpha/2}\,\dfrac{s}{\sqrt{n}}$
where x̄ is the sample mean, n the sample size, and s the sample standard deviation.
In order to estimate the customer loyalty for a particular product, a researcher poses the following
question to a sample of 100 customers: How many years have you been continuously using this
product? This sample yielded a mean period of 8 years with a sample standard deviation of 2
years. Construct a 95% confidence interval for estimating the population mean
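Since σ is unknown and the sample size is large (n ≥ 30), the confidence interval is
$8 - 1.96 \times \dfrac{2}{\sqrt{100}}$ to $8 + 1.96 \times \dfrac{2}{\sqrt{100}}$, that is, 7.608 to 8.392 years.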
In case of small sample size (n < 30), the problem can be solved by using the t
statistic, developed by a British statistician, William S. Gosset
The concept of degrees of freedom is central to estimating population parameters
from sample statistics.
The term “degrees of freedom” describes the number of values in the final calculation of a statistic
that are free to vary.
E.g., imagine a set of three numbers {1, 6, 5}. Calculating the mean of those numbers is easy: (1
+ 6 + 5) / 3 = 4.
Now imagine a set of three numbers whose mean is 3. Many sets of three numbers have a
mean of 3, but for every such set the bottom line is this: you can freely pick the first two numbers,
but the third (last) number is out of your hands as soon as you have picked the first two. Say
our first two numbers are the same as in the previous set, 1 and 6, giving us two freely picked
numbers and one number that we still need to determine, x: {1, 6, x}. For this set to have a mean of 3,
there is nothing left to choose about x: x has to be 2, because (1 + 6 + 2) / 3 is the only way to get
to 3. So the first two values were free for you to choose, and the last value is fixed by the
given mean. This set is said to have two degrees of freedom, corresponding to the number of values
that you were free to choose (that is, that were allowed to vary freely).
The general rule for any set, once its mean is fixed, is that if n equals the number of
values in the set, the degrees of freedom equal n − 1.
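The {1, 6, x} example can be expressed in a few lines of code. This is only an illustration of the idea; the function name `last_value_for_mean` is made up for the sketch.

```python
def last_value_for_mean(free_values, target_mean):
    """Return the single value that forces the full set to have target_mean.

    free_values holds the n - 1 freely chosen numbers; the n-th is determined.
    """
    n = len(free_values) + 1
    # The total must equal target_mean * n, so the last value is what remains
    return target_mean * n - sum(free_values)

print(last_value_for_mean([1, 6], 3))  # 2
```

Whatever two numbers are chosen first, the function returns exactly one admissible third value, which is why a set of three numbers with a fixed mean has two degrees of freedom.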
3. There is not one t distribution, but rather a family of t distributions. All t distributions
have a mean of 0, but their standard deviations differ according to the sample size, n
4. The t distribution is more spread out and flatter at the center than the standard normal
distribution. As the sample size increases, however, the t distribution approaches the
standard normal distribution. Also the t distribution has more area in the tails as
compared to standard normal distribution
This formula is the same as the z formula, but the distribution table values are different.
Note: t → z as n increases.
t-distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
[Figure: density curves of the standard normal (t with df = ∞), t with df = 13, and t with df = 5, all centered at 0.]
10 12 24 23 11 14 15 34 16 23
Construct a 95% confidence interval to estimate the average telephone expenses of the
employees in the population
• Since σ is unknown and the sample size is small (n < 30), the confidence
interval is calculated using the t distribution, given by
$\bar{x} \pm t_{\alpha/2,\,n-1}\,\dfrac{s}{\sqrt{n}}$
• x̄ = 18.2
• s = 7.598
Solution using Excel
• n = 10
• s = 7.598
• t0.025,9 = 2.262
Hence, the confidence interval is 12.765 to 23.635.
So, the personnel department is 95% confident that the population mean will lie between Rs. 12,765 and Rs. 23,635.
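The figures in this solution can be reproduced from the ten data values listed above. The sketch below uses only the standard library, so the critical value t0.025,9 = 2.262 is taken from the slide rather than computed; the variable names are illustrative.

```python
import math
import statistics

# Sample data as listed in the problem
expenses = [10, 12, 24, 23, 11, 14, 15, 34, 16, 23]

n = len(expenses)
x_bar = statistics.mean(expenses)   # sample mean
s = statistics.stdev(expenses)      # sample standard deviation (n - 1 divisor)
t_crit = 2.262                      # t_{0.025, 9}, value quoted in the slide

margin = t_crit * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"n = {n}, x_bar = {x_bar:.1f}, s = {s:.3f}")
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
```

The sample mean works out to 18.2 and the interval agrees with the 12.765 to 23.635 reported above.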
From a population, a random sample of size 20 is taken. This sample has a sample mean of 80 and a sample
standard deviation of 10. Construct a 99% confidence interval for the population mean.
Since σ is unknown and the sample size is small (n < 30), the confidence interval is
$80 - t_{0.005,19} \times \dfrac{10}{\sqrt{20}}$ to $80 + t_{0.005,19} \times \dfrac{10}{\sqrt{20}}$
where $t_{0.005,19}$ = T.INV(1-0.005,19) = 2.861.
Hence, the confidence interval is 73.603 to 86.397; that is, we are 99% confident that the population mean
lies between 73.603 and 86.397.
Marvel Studio’s motion picture Guardians of the Galaxy opened over the first
two days of the 2014 Labor Day weekend to a record-breaking $94.3 million
in ticket sales revenue in North America (The Hollywood Reporter, August 3,
2014). The ticket sales revenue in dollars for a sample of 30 theaters is as
follows.
a. What is the 95% confidence interval estimate for the mean ticket sales
revenue per theater? Interpret this result.
b. Using the movie ticket price of $8.11 per ticket, what is the estimate of the
mean number of customers per theater?
c. The movie was shown in 4080 theaters. Estimate the total number of
customers who saw Guardians of the Galaxy and the total box office ticket
sales for the weekend.
Contrast this with: a 95% confidence interval estimate of starting salaries between
$42,000 and $45,000.
The second estimate is much narrower, providing accounting students more precise
information about starting salaries.
Increasing the sample size decreases the width of the confidence interval while
the confidence level can remain unchanged.
Note: this also increases the cost of obtaining additional data
• Margin of Error
$E = z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
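To see how the margin of error responds to sample size, the formula can be evaluated for a few values of n. This is a sketch under assumed inputs: σ = 20, the 95% confidence level, and the sample sizes 25, 100, and 400 are hypothetical, and `margin_of_error` is an illustrative name.

```python
from statistics import NormalDist

def margin_of_error(sigma, n, confidence_level=0.95):
    """E = z_{alpha/2} * sigma / sqrt(n) for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sigma / n ** 0.5

sigma = 20.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(f"n = {n:3d}  E = {margin_of_error(sigma, n):.2f}")
```

Because n enters through its square root, quadrupling the sample size only halves the margin of error, which is where the extra data collection cost comes from.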
Selecting the Sample Size…
• The Necessary Sample Size equation requires a value for the population
standard deviation σ.
• If σ is unknown, a preliminary or planning value for σ can be used in the
equation.
1. Use the estimate of the population standard deviation computed in a
previous study.
2. Use a pilot study to select a preliminary sample and use the sample
standard deviation from that sample.
3. Use judgment or a “best guess” for the value of σ.
Thus, the sample size for the new study needs to be at least 89.43 midsize
automobile rentals in order to satisfy the project director’s $2 margin-of-error
requirement.
In cases where the computed n is not an integer, we round up to the next
integer value; hence, the recommended sample size is 90 midsize automobile
rentals.
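Solving the margin-of-error formula E = z_{α/2} σ/√n for n gives n = (z_{α/2} σ / E)², and the rounding step can be sketched in code. The text above states only the $2 margin of error and the result 89.43; the planning value σ = 9.65 and the 95% confidence level used below are assumptions, chosen because they reproduce that figure. The function name `required_sample_size` is illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence_level=0.95):
    """Return (exact n, n rounded up) for a desired margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n_exact = (z * sigma / margin) ** 2
    # Always round up: rounding down would miss the margin-of-error target
    return n_exact, math.ceil(n_exact)

# sigma = 9.65 and the 95% level are assumed planning values (not stated above)
n_exact, n_recommended = required_sample_size(sigma=9.65, margin=2.0)
print(f"computed n = {n_exact:.2f}, recommended n = {n_recommended}")
```

With these inputs the computed value is about 89.43 and the recommended sample size is 90.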