1 EC108 Estimation and Confidence Interval

This document discusses the estimation of population parameters using confidence intervals, focusing on the mean, proportion, and variance. It outlines the process of statistical inference, the importance of point estimates and interval estimates, and the role of sample size and confidence levels in determining the accuracy of these estimates. Additionally, it covers the construction of confidence intervals when the population standard deviation is known and introduces the t-distribution for cases where it is unknown.

Uploaded by

williambecca2005

Confidence Interval Estimation: One Population
Estimating the Value of a Parameter Using Confidence Intervals
Introduction
• This chapter addresses situations that require an estimate of some population parameter.
• We discuss inferential statements about estimates of a single population parameter, based on the information contained in a random sample.
• Specifically, we focus on procedures for estimating the mean of a population, the proportion of population members with some characteristic, and the variance of the population.
Overview
• We apply the results about the sample
mean to the problem of estimation
• Estimation is the process of using sample
data to estimate the value of a population
parameter
• We will quantify the accuracy of our
estimation process
Statistical Inference
• Statistical inference is central to decision making in economics, business and science
• Statistical inference comprises estimation and hypothesis testing
• Statistical estimation uses sample statistics to estimate the corresponding population parameters
• That is, estimation is the process of inferring a population parameter (such as its mean or standard deviation) from the corresponding statistic of a sample drawn from that population
Statistical Estimation
Point estimator, point estimate and CI estimator
• Take note of the following:
• A point estimate is a single number
– How much uncertainty is associated with a point estimate of a population parameter? On its own, a point estimate is an unreliable measure of a population parameter, because the probability that it exactly equals the true value is extremely small
• An interval estimate provides more information about a population characteristic than a point estimate does, because it attaches a confidence level to the estimate. Such interval estimates are called confidence intervals (CIs). A CI gives a range of values within which the population parameter is expected to fall

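[Figure: a confidence interval drawn as a range from the lower confidence limit to the upper confidence limit, centered on the point estimate; the distance between the two limits is the width of the confidence interval]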
Point estimate
• A point estimator should be:
– Unbiased
– Most efficient
(Read more about these two properties.)
• An interval estimate gives a range of values:
– Takes into consideration variation in sample statistics from sample to sample
– Is based on observations from one sample
– Gives information about how close the point estimate, provided by the sample, is to the unknown population parameter
– Is stated in terms of a level of confidence (we can never be 100% confident)
• The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
1
Learning Objectives
Confidence Interval Estimation of
Population Mean, μ, when σ is known
• Our objective is to find a range of values, rather than a single number, to estimate a population mean (this setting may look unrealistic, since the population variance is usually unavailable)
• Assumptions
– Population standard deviation σ is known
– Population is normally distributed
– If population is not normal, use a large sample
• Confidence interval estimate: $\bar{X} \pm Z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$
(where $Z_{\alpha/2}$ is the normal distribution’s critical value for a probability of α/2 in each tail)
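• Consider a 95% confidence interval: 1 − α = .95, so α = .05 and α/2 = .025
[Figure: standard normal curve with a central area of .475 + .475 = .95 between Z = −1.96 and Z = 1.96, and an area of α/2 = .025 in each tail]
[Figure: confidence interval around the point estimate, running from the lower confidence limit $\mu_l$ to the upper confidence limit $\mu_u$]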
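The confidence interval for a single population mean, μ, is given as
$\bar{x} - z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$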
Confidence Interval and
Confidence Level
• Simple example
Confidence Interval Estimation for the Mean of a
Population That is Normally Distributed: Population
Variance Known
Example CI
Estimation
• The setting of our problem is that we want to estimate the value of an unknown population mean
• The process that we use is called estimation
• This is one of the most common goals of statistics
Point Estimate
• Estimation involves two steps
– Step 1 – to obtain a specific numeric estimate,
this is called the point estimate
– Step 2 – to quantify the accuracy and
precision of the point estimate
• The first step is relatively easy
• The second step is why we need statistics
Examples of Point Estimate
• Some examples of point estimates are
– The sample mean to estimate the population
mean
– The sample standard deviation to estimate
the population standard deviation
– The sample proportion to estimate the
population proportion
– The sample median to estimate the
population median
Precision of Point Estimate
• The most obvious point estimate for the
population mean is the sample mean
• Now we will use the material on the
sampling distribution of sample mean to
quantify the accuracy and precision of this
point estimate
Example
• An example of what we want to quantify
– We want to estimate the miles per gallon for a
certain car
– We test some number of cars
– We calculate the sample mean, e.g. 27, and a confidence interval, e.g. 25 to 29
– 27 miles per gallon would be our best guess
Example (continued)
• How sure are we that the gas economy is
27 and not 28.1, or 25.2?
• We would like to make a statement such
as
“We think that the mileage is 27 mpg
and we’re pretty sure that we’re
not too far off”
Interval Estimation
• A confidence interval for an unknown parameter is an interval (a range) of numbers
– Compare this to a point estimate, which is just one number
• The level of confidence represents the expected proportion of intervals that will contain the parameter if a large number of different samples is obtained
• The confidence interval quantifies the accuracy and precision of the point estimate
• The larger the sample size, the narrower the interval estimate that reflects our uncertainty about a parameter’s true value
Interpret Confidence level
What does the level of confidence represent?
• Suppose we obtain a series of 50 random samples from a population of interest
• From each sample mean, we calculate a confidence interval for the population mean with a 90% level of confidence
• Then we would expect about 90% of those 50 confidence intervals (about 45) to contain the population mean
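The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population values (mean 27, standard deviation 6), the sample size of 40 and the function name `coverage` are assumptions chosen for the demonstration, and σ is treated as known so that a z-interval applies.

```python
import random
from statistics import NormalDist, mean


def coverage(num_samples=50, n=40, mu=27.0, sigma=6.0, confidence=0.90, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu.

    All default values are hypothetical and only serve the illustration.
    """
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / num_samples


# With only 50 intervals the observed share fluctuates around 0.90;
# with many more intervals it settles close to the confidence level.
print(coverage(num_samples=50))
print(coverage(num_samples=10_000))
```

With 50 samples the count of covering intervals will typically be near 45 but not exactly 45; the 90% figure describes the long-run behaviour of the method, not any single batch.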
Confidence Level
• The level of confidence is always
expressed as a percent
• The level of confidence is described by a
parameter α (alpha)
• The level of confidence is (1 – α) • 100%
– When α = .05, then (1 – α) = .95, and we have
a 95% level of confidence
– When α = .01, then (1 – α) = .99, and we have
a 99% level of confidence
Confidence Interval
• If we expect that a method would create
intervals that contain the population mean
90% of the time, we call those intervals
90% confidence intervals
• If we have a method for intervals that
contain the population mean 95% of the
time, those are 95% confidence intervals
• And so forth
Summary
• To tie the definitions together
– We use the sample mean to estimate the population mean (point estimate)
– With each specific sample, we can construct, for instance, a 95% confidence interval to estimate the population mean (interval estimate)
– A 95% confidence level means that if we take samples repeatedly, we expect 95% of the resulting intervals to contain the population mean
Example
• Back to our 27 miles per gallon car
“We think that the mileage is 27 mpg
and we’re pretty sure that
we’re not too far off”
• Putting in numbers (quantify the accuracy)
“We estimate the gas mileage is 27 mpg
and we are 90% confident that
the real mileage of this model of car
is between 25 and 29 miles per gallon”
Example (continued)
“We estimate the gas mileage is 27 mpg”
• This is our point estimate
“and we are 90% confident that”
• Our confidence level is 90% (which is 1- α ,
i.e. α = 0.10)
“the real mileage of this model of car”
• The population mean
“is between 25 and 29 miles per gallon”
• Our confidence interval is (25, 29)
Known Population Standard
Deviation
• First, we assume that we know the
standard deviation of the population (σ)
• This is not very realistic … but we need it
for right now to introduce how to construct
a confidence interval
• We’ll solve this problem in a better way
(where we don’t know what σ is) later…
but first we’ll do this one
Assumption
To estimate the mean μ with a known σ, we need a normal distribution assumption for the sampling distribution of the mean.
The assumption is satisfied by:
1. Knowing that the sampled population is normally distributed, or
2. Using a large enough random sample
Central Limit Theorem
Go and read this part!
Sampling Distribution of Means
• The values of a general normal random variable are within 1.96 times (or about 2 times, according to the empirical rule) its standard deviation of its mean 95% of the time
• Thus the sample mean is within
$\pm 1.96 \cdot \dfrac{\sigma}{\sqrt{n}}$
of the population mean 95% of the time
Here, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Interval for Sample Mean
• Because the sample mean has an approximately normal distribution, it is in the interval
$\mu \pm 1.96 \cdot \dfrac{\sigma}{\sqrt{n}}$
around the (unknown) population mean 95% of the time. In other words, the interval will cover 95% of possible sample means when you take samples from the population repeatedly.
• Since $\bar{X} = \mu \pm 1.96 \cdot \dfrac{\sigma}{\sqrt{n}}$, we can flip the equation around between μ and $\bar{X}$ to solve for the population mean μ
Interval for Population Mean
• After we solve for the population mean μ, we
find that μ is within the interval
$\bar{x} \pm 1.96 \cdot \dfrac{\sigma}{\sqrt{n}}$
around the (known) sample mean “95% of the
time”
• This isn’t exactly true in the mathematical sense
as the population mean is not a random variable
… that’s why we call this a “confidence” instead
of a “probability”
Confidence Interval
• Thus a 95% confidence interval for the population mean is
$\bar{x} \pm 1.96 \cdot \dfrac{\sigma}{\sqrt{n}}$
• This is in the form
Point estimate ± margin of error
• The margin of error here is 1.96 • σ / √ n
Example
• For our car mileage example
– Assume that the sample mean was 27 mpg
– Assume that we tested a sample of 40 cars
– Assume that we knew that the population standard
deviation was 6 mpg
• Then our 95% confidence interval estimate for
the true/population mean mileage would be
$27 \pm 1.96 \cdot \dfrac{6}{\sqrt{40}}$
or 27 ± 1.9
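The car mileage calculation can be reproduced with a few lines of code. This is a minimal sketch; the function name `z_interval` is chosen here for illustration and is not part of the original slides.

```python
from math import sqrt


def z_interval(x_bar, sigma, n, z=1.96):
    """Confidence interval for a mean when the population sigma is known.

    z defaults to 1.96, the critical value for 95% confidence.
    """
    margin = z * sigma / sqrt(n)  # margin of error
    return x_bar - margin, x_bar + margin


# Car mileage example: sample mean 27 mpg, sigma 6 mpg, n = 40 cars.
low, high = z_interval(27, 6, 40)
print(f"95% CI: {low:.1f} to {high:.1f} mpg")
```

The margin of error here is 1.96 × 6 / √40, which rounds to the 1.9 quoted in the example.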
Critical Value
• If we wanted to compute a 90% confidence interval, or a 99% confidence interval, etc., we would just need to find the right standard normal value, called the critical value (instead of the 1.96 used for a 95% confidence interval)
• Frequently used confidence levels, and their
critical values, are
– 90% corresponds to 1.645
– 95% corresponds to 1.960
– 99% corresponds to 2.575
Critical Value
• The numbers 1.645, 1.960, and 2.575 are written in the form $z_\alpha$, where α is the area to the right of the z value.
– z0.05 = 1.645 … P(Z ≥ 1.645) = .05
[use TI calculator: invNorm(0.95,0,1) = 1.645]
– z0.025 = 1.960 … P(Z ≥ 1.960) = .025
[invNorm(0.975,0,1) = 1.960]
– z0.005 = 2.575 … P(Z ≥ 2.575) = .005
[invNorm(0.995,0,1) ≈ 2.576, rounded in these slides to 2.575]
where Z is a standard normal random variable
(Exercise: go and read how to get the critical values from the Z tables)
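As an alternative to the calculator or the Z tables, the same critical values can be obtained from Python's standard library. The sketch below assumes Python 3.8 or later (for `statistics.NormalDist`); the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # inv_cdf plays the same role as invNorm on the TI calculator.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

Note that the exact 99% value is about 2.5758; the slides use the commonly tabulated 2.575.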
Understand the role of margin of
error in constructing a
confidence interval
Margin of Error
• If we write the confidence interval as
27 ± 2
then we would call the number 2 (after the
±) the margin of error
• So we have three ways of writing
confidence intervals
– (25, 29)
– 27 ± 2
– 27 with a margin of error of 2
Margin of Error
• The margins of error would be
– 1.645 • σ / √ n for 90% confidence intervals
– 1.960 • σ / √ n for 95% confidence intervals
– 2.575 • σ / √ n for 99% confidence intervals
• Once we know the margin of error, we can state the confidence interval as
sample mean ± margin of error
Margin of Error
• The margin of error, which is half the length of a confidence interval, depends on three factors
– The level of confidence (1 – α)
– The sample size (n)
– The standard deviation of the population (σ)
Notice that
➢ The higher the confidence level, the longer the confidence interval. That is, a 99% confidence interval will be longer than a 90% confidence interval, because a wider interval has a better chance of covering the population mean
➢ The larger the sample size, the shorter the confidence interval. This is because the larger the sample size, the smaller the standard error of the sample mean, which means the margin of error of the estimate is smaller.
➢ The larger the standard deviation of the population, the longer the confidence interval. So, if the value of the variable varies a great deal, the margin of error of the estimate increases.
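The three effects listed above can be seen by changing one input at a time. This is a small sketch; the baseline numbers (σ = 6, n = 40) are borrowed from the car mileage example, and the alternative values are assumptions picked only to show the direction of each effect.

```python
from math import sqrt


def margin_of_error(z, sigma, n):
    """Margin of error z * sigma / sqrt(n) for a mean with known sigma."""
    return z * sigma / sqrt(n)


base = margin_of_error(1.960, 6, 40)               # 95% confidence, baseline
higher_confidence = margin_of_error(2.575, 6, 40)  # 99% confidence: wider
larger_sample = margin_of_error(1.960, 6, 160)     # 4x the sample: half as wide
more_spread = margin_of_error(1.960, 12, 40)       # 2x sigma: twice as wide

print(f"baseline          {base:.2f}")
print(f"higher confidence {higher_confidence:.2f}")
print(f"larger sample     {larger_sample:.2f}")
print(f"more spread       {more_spread:.2f}")
```

Because n enters through its square root, quadrupling the sample size is needed to halve the margin of error.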
Reducing Margin of Error
Summary
• We can construct a confidence interval
around a point estimator if we know the
population standard deviation σ
• The margin of error is calculated using σ,
the sample size n, and the appropriate Z-
value
• We can also calculate the sample size
needed to obtain a target margin of error
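The sample size for a target margin of error follows from solving the margin of error formula for n. As a worked sketch, take the car mileage values (σ = 6, 95% confidence) and assume, purely for illustration, a target margin of error of E = 1 mpg:

```latex
E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
n = \left( \frac{z_{\alpha/2} \cdot \sigma}{E} \right)^{2}
  = \left( \frac{1.96 \cdot 6}{1} \right)^{2}
  = 138.3 \;\Rightarrow\; 139
```

The result is rounded up, since the next lower whole number would give a margin of error slightly larger than the target.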
2. Confidence Intervals about a
Population Mean in Practice where
the Population Standard Deviation
is Unknown
Learning Objectives
• Know the properties of t-distribution
• Determine t-values
• Construct and interpret a confidence interval
about a population mean
Know the properties of
t-distribution
Unknown Population Standard
Deviation
• So far we assumed that we knew the population
standard deviation σ
• But, this assumption is not realistic, because if
we know the population standard deviation, we
probably would know the population mean as
well. Then there is no need to estimate the
population mean using a sample mean.
• So, it is more realistic to construct confidence
intervals in the case where we do not know the
population standard deviation
Replacing σ with s
• If we don’t know the population standard
deviation σ, we obviously can’t use the formula
Margin of error = 1.96 • σ / √ n
because we have no number to use for σ
• However, just as we can use the sample mean
to approximate the population mean, we can
also use the sample standard deviation to
approximate the population standard deviation
Student’s t-distribution
• Because we’ve changed our formula (by
using s instead of σ), we can’t use the
normal distribution any more
• Instead of the normal distribution, we use
the Student’s t-distribution
• This distribution was developed
specifically for the situation when σ is not
known (by Gosset)
Standard Errors from Samples

• Of course, life is usually not so simple.
• As undeniably cool as the Central Limit Theorem is, it has a problem:
– We need to know σ
– How often do researchers really know the population standard deviation (σ) needed for calculating standard errors?
• Thank Guinness for the solution…
How Guinness Saved the World
• At the beginning of the 20th century, William Gosset (a.k.a. “Student”), a statistician at the Guinness Brewery in Dublin concerned with quality control, came up with a solution
• Calculate the standard deviation of the sample mean from the sample, $s_{\bar{X}} = \dfrac{s}{\sqrt{n}}$,
• and use Student’s t-distribution, which depends on sample size, for inference.
The t-distribution
• For samples under 120 or so, the difference between the sample standard deviation s and the population standard deviation σ can be large; the smaller the sample, the larger the difference
• Solution: use the t-distribution, which is flatter than the Z distribution (lower peak, heavier tails) and gets increasingly so as the sample shrinks.
Small Sample? Hedge your bet!
• Thus, the smaller the sample
the larger the interval
necessary for a given level of
confidence.
t-table
• We can no longer assume that the population mean (μ) will be within 1.96 standard deviations (of the sample mean) of the sample mean in 95 out of 100 samples.
• The smaller the sample, the more standard deviations we can expect μ to be from x-bar at a given level of confidence.
• Degrees of freedom capture the sample size; in our case, d.f. = n − 1
Confidence Intervals w/out σ
• Example: Randomly sampling 16 students for their GPA, you get a sample mean of 3.0 and a sample standard deviation (s) of 0.4
• Construct a 95% confidence interval for the true population mean.
1. Calculate the standard deviation of the sample mean: $\dfrac{s}{\sqrt{n}} = \dfrac{0.4}{\sqrt{16}} = 0.10$
Since $t = (\bar{X} - \mu) / \dfrac{s}{\sqrt{n}}$, rewriting gives $\mu = \bar{X} \pm t \cdot \dfrac{s}{\sqrt{n}}$
2. Calculate the interval, using the critical value t = 2.131 for n − 1 = 15 degrees of freedom: 3 ± (2.131 × 0.1) = 3 ± 0.21. This is a confidence interval from 2.79 to 3.21. In repeated sampling, 95% of intervals constructed this way will contain the population mean.
• If σ were known, you would use the smaller critical value z = 1.96 and the interval would be narrower: between 2.804 and 3.196.
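The GPA example can be sketched in code as follows. Python's standard library has no inverse t-distribution, so this sketch looks up the critical value in a small dictionary holding a few $t_{0.025}$ values copied from the t-table in these slides; the names `T_025` and `t_interval` are illustrative. With n = 16 the relevant row is 15 degrees of freedom, i.e. t = 2.131.

```python
from math import sqrt

# Two-sided 95% critical values t_{0.025}, keyed by degrees of freedom.
# Only a few rows of the t-table are included in this sketch.
T_025 = {6: 2.447, 14: 2.145, 15: 2.131, 30: 2.042}


def t_interval(x_bar, s, n, t_table=T_025):
    """95% confidence interval for a mean when sigma is unknown."""
    t = t_table[n - 1]  # degrees of freedom = n - 1
    margin = t * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# GPA example: n = 16 students, sample mean 3.0, sample sd 0.4.
t_low, t_high = t_interval(3.0, 0.4, 16)
print(f"t-interval: {t_low:.2f} to {t_high:.2f}")

# For comparison, the interval if sigma = 0.4 were known (z = 1.96).
z_margin = 1.96 * 0.4 / sqrt(16)
print(f"z-interval: {3.0 - z_margin:.3f} to {3.0 + z_margin:.3f}")
```

The t-interval is slightly wider than the z-interval, which reflects the extra uncertainty from estimating σ with s.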
Another example

A sample of 15 students slept an average of 6.4 hours last night, with a standard deviation of 1 hour.
We need t with n − 1 = 15 − 1 = 14 d.f.
For 95% confidence, t14 = 2.145
$\bar{x} \pm t \left( \dfrac{s}{\sqrt{n}} \right) = 6.4 \pm 2.145 \left( \dfrac{1}{\sqrt{15}} \right) = 6.4 \pm 0.55$
What happens to CI as
sample gets larger?
For large samples, Z and t values become almost identical, so CIs are almost identical:
$\bar{x} \pm Z \left( \dfrac{s}{\sqrt{n}} \right)$ versus $\bar{x} \pm t \left( \dfrac{s}{\sqrt{n}} \right)$
Properties of t-distribution
• Several properties of the Student’s t-distribution are familiar
– Just like the normal distribution, it is centered at 0
and symmetric about 0
– Just like the normal curve, the total area under
the Student’s t curve is 1, the area to left of 0 is
½, and the area to the right of 0 is also ½
– Just like the normal curve, as t increases, the
Student’s t curve gets close to, but never
reaches, 0
Difference between Z and t
• So what’s different?
• Unlike the normal, there are many different “standard”
t-distributions
– There is a “standard” one with 1 degree of freedom
– There is a “standard” one with 2 degrees of
freedom
– There is a “standard” one with 3 degrees of
freedom
– Etc.
• The number of degrees of freedom is crucial for the t-
distributions
t-statistic
• When σ is known, the z-score
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
follows a standard normal distribution
• When σ is not known, the t-statistic
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
follows a t-distribution with n – 1 (sample size minus 1) degrees of freedom
t-distribution
• Comparing three curves
– The standard normal curve
– The t curve with 14 degrees of freedom
– The t curve with 4 degrees of freedom
t-distribution
• Take note
Determine t-values
Calculation of t-distribution
• The calculation of t-distribution values t
can be done in similar ways as the
calculation of normal values z
– Using tables
– Using technology – TI graphing Calculator
Use the t-table shown to find a critical value (one tail)
Or use TI graphing calculator to find a critical value: for instance,

Upper critical values of Student's t distribution with n degrees of freedom

       Probability of exceeding the critical value
d.f.   0.10    0.05    0.025    0.01     0.005    0.001
1      3.078   6.314   12.706   31.821   63.657   318.313
2      1.886   2.920   4.303    6.965    9.925    22.327
3      1.638   2.353   3.182    4.541    5.841    10.215
4      1.533   2.132   2.776    3.747    4.604    7.173
5      1.476   2.015   2.571    3.365    4.032    5.893
6      1.440   1.943   2.447    3.143    3.707    5.208
7      1.415   1.895   2.365    2.998    3.499    4.782
8      1.397   1.860   2.306    2.896    3.355    4.499
9      1.383   1.833   2.262    2.821    3.250    4.296
10     1.372   1.812   2.228    2.764    3.169    4.143
11     1.363   1.796   2.201    2.718    3.106    4.024
12     1.356   1.782   2.179    2.681    3.055    3.929
13     1.350   1.771   2.160    2.650    3.012    3.852
14     1.345   1.761   2.145    2.624    2.977    3.787
15     1.341   1.753   2.131    2.602    2.947    3.733
16     1.337   1.746   2.120    2.583    2.921    3.686
17     1.333   1.740   2.110    2.567    2.898    3.646
18     1.330   1.734   2.101    2.552    2.878    3.610
19     1.328   1.729   2.093    2.539    2.861    3.579
20     1.325   1.725   2.086    2.528    2.845    3.552
21     1.323   1.721   2.080    2.518    2.831    3.527
22     1.321   1.717   2.074    2.508    2.819    3.505
23     1.319   1.714   2.069    2.500    2.807    3.485
24     1.318   1.711   2.064    2.492    2.797    3.467
25     1.316   1.708   2.060    2.485    2.787    3.450
26     1.315   1.706   2.056    2.479    2.779    3.435
27     1.314   1.703   2.052    2.473    2.771    3.421
28     1.313   1.701   2.048    2.467    2.763    3.408
29     1.311   1.699   2.045    2.462    2.756    3.396
30     1.310   1.697   2.042    2.457    2.750    3.385

Critical values t
• Critical values for various degrees of freedom for the t-distribution are (compared to the normal)

n        Degrees of Freedom   t0.025
6        5                    2.571
16       15                   2.131
31       30                   2.042
101      100                  1.984
1001     1000                 1.962
Normal   “Infinite”           1.960

Note: When the sample size is large, a t distribution is close to a z distribution
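To see how quickly the t critical value approaches the normal one, the values in this table can be compared directly. The sketch below simply hard-codes the table entries; the names `T_025_BY_DF`, `Z_025` and `extra_width` are illustrative.

```python
# t_{0.025} by degrees of freedom, copied from the table above.
T_025_BY_DF = {5: 2.571, 15: 2.131, 30: 2.042, 100: 1.984, 1000: 1.962}
Z_025 = 1.960  # normal critical value ("infinite" degrees of freedom)


def extra_width(df):
    """Relative extra width of a t-interval versus a z-interval.

    Assumes both intervals use the same standard error s / sqrt(n),
    so the ratio of widths is the ratio of critical values.
    """
    return T_025_BY_DF[df] / Z_025 - 1


for df in T_025_BY_DF:
    print(f"d.f. = {df:>4}: t-interval is {extra_width(df):.1%} wider")
```

The penalty for not knowing σ is substantial for very small samples and becomes negligible as the degrees of freedom grow.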
Construct and interpret a
t-confidence interval about a
population mean
z-score and t-score
• The difference between the two formulas
$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ and $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
is that the sample standard deviation s is used to approximate the population standard deviation σ
• The z-score has a normal distribution, the t-statistic (or the t-score) has a t-distribution
95% Confidence interval for mean
with unknown σ
• A 95% confidence interval, with σ unknown, is
$\bar{x} - t_{0.025} \cdot \dfrac{s}{\sqrt{n}}$ to $\bar{x} + t_{0.025} \cdot \dfrac{s}{\sqrt{n}}$
where t0.025 is the critical value for the t-distribution with (n – 1) degrees of freedom
Note: Compare it to the 95% confidence interval with a known σ:
$\bar{x} - z_{0.025} \cdot \dfrac{\sigma}{\sqrt{n}}$ to $\bar{x} + z_{0.025} \cdot \dfrac{\sigma}{\sqrt{n}}$
Summary
Critical Value $t_{\alpha/2}$ corresponding to
Confidence Level 1 – α
• The different 95% confidence intervals with t0.025 would
be
– For n = 6, the sample mean ± 2.571 • s / √ 6
– For n = 16, the sample mean ± 2.131 • s / √ 16
– For n = 31, the sample mean ± 2.042 • s / √ 31
– For n = 101, the sample mean ± 1.984 • s / √ 101
– For n = 1001, the sample mean ± 1.962 • s / √ 1001
– When σ is known, the sample mean ± 1.960 • σ / √ n
Confidence interval for mean with
unknown σ
• In general, the (1 – α) • 100% confidence interval, when σ is unknown, is
$\bar{x} - t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$ to $\bar{x} + t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$
where tα/2 is the critical value for the t-distribution with (n – 1) degrees of freedom
Approximate t with z
• As the sample size n gets large, there is less
and less of a difference between the critical
values for the normal and the critical values
for the t-distribution
• Although the t-critical value and z-critical value may be close to each other when the sample size is large, we still recommend using a t-distribution when σ is not known, to obtain a more accurate answer
– When doing rough assessment by hand,
the normal critical values can be used,
particularly when n is large, for example if
n is 30 or more
Example 1
• Assume that we want to estimate the average
weight of a particular type of very rare fish
– We are only able to borrow 7 specimens of
this fish
– The average weight of these was 1.38 kg (the
sample mean)
– The standard deviation of these 7 specimens
of this fish was 0.29 kg (a sample standard
deviation)
• What is a 95% confidence interval for the true
mean weight?
Example 1 (continued)
• n = 7, the critical value t0.025 for 6 degrees
of freedom is 2.447
• Our confidence interval thus is
$1.38 - 2.447 \cdot \dfrac{0.29}{\sqrt{7}} = 1.11$
to
$1.38 + 2.447 \cdot \dfrac{0.29}{\sqrt{7}} = 1.65$
or (1.11, 1.65)
Example 2
Suppose you do a study of acupuncture to determine how effective it is in
relieving pain. You measure sensory rates for 15 subjects with the results
given below. Use the sample data to construct a 95% confidence interval for
the mean sensory rate for the population (assumed normal) from which you
took the data.
8.6; 9.4; 7.9; 6.8; 8.3; 7.3; 9.2; 9.6; 8.7; 11.4; 10.3; 5.4; 8.1; 5.5; 6.9
Solution (Do in groups)

To find the confidence interval, we first need the sample mean. Since the population standard deviation is not given and we have the sample data to calculate the sample standard deviation, we can construct a t-confidence interval for estimating the mean.
Enter the data into a TI calculator and obtain the one-variable statistics:
$\bar{X} = 8.2267$ and s = 1.6722, where n = 15
The critical value is $t_{0.025;\,df=14} = 2.145$
The 95% confidence interval is $8.2267 \pm 2.145 \cdot \dfrac{1.6722}{\sqrt{15}}$, i.e. between 7.30 and 9.15
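For readers without a TI calculator, the same solution can be sketched with Python's standard library. The critical value 2.145 is taken from the t-table (14 degrees of freedom) rather than computed; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Sensory rates for the 15 subjects in the acupuncture example.
rates = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7,
         11.4, 10.3, 5.4, 8.1, 5.5, 6.9]

T_CRIT = 2.145  # t_{0.025} with n - 1 = 14 d.f., read from the t-table

n = len(rates)
x_bar = mean(rates)
s = stdev(rates)  # sample standard deviation (divides by n - 1)
margin = T_CRIT * s / sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(f"mean = {x_bar:.4f}, s = {s:.4f}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would divide by n and give a slightly smaller value that is not appropriate here.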
Example 3
Check the underlying distribution
• When applying a t-interval, we need to make sure the underlying population is approximately normally distributed.
• When the sample size is small, outliers will have a major effect on the results, because they affect the calculation of the sample mean and sample standard deviation.
• So what can we do?
– For a small sample, we must always check that an outlier is a legitimate data value (and not just a typo)
– We can collect more data, for example to increase n to over 30. Applying the central limit theorem, we can then use a z-interval to approximate a t-interval.
Summary
• We used values from the normal
distribution when we knew the value of the
population standard deviation σ
• When we do not know σ, we estimate σ
using the sample standard deviation s
• We use values from the t-distribution when
we use s instead of σ, i.e. when we don’t
know the population standard deviation
3. Confidence Intervals
about a
Population Proportion
Some scenarios
• What proportion of the students at BUSE
would like classes to be offered on
Saturdays?
• What percent of students in SADC expect
to pursue doctoral degrees?
• What proportion of registered voters will
vote for a particular candidate in an
upcoming election?
• In all these examples, the proportion of
population members possessing some
specific characteristic is of interest
Learning Objectives
• Obtain a point estimate for the population
proportion
• Construct and interpret a confidence interval
for the population proportion
• Determine the sample size necessary for
estimating a population proportion within a
specified margin of error
Obtain a point estimate for the
population proportion
Mean & Proportion
• So far, we learned to calculate confidence intervals for the population mean when we knew σ
• We also learned to calculate confidence intervals for the mean when we did not know σ
• Here, we’ll learn how to construct confidence
intervals for situations when we are analyzing
a population proportion
• The issues and methods are quite similar
Sample Proportion
• When we analyze the population mean, we use the sample mean as the point estimate
– The sample mean is our best guess for the population mean
• When we analyze the population proportion, we use the sample proportion as the point estimate
– The sample proportion is our best guess for the population proportion
Proportion – Point Estimate
• Using the sample proportion is the natural
choice for the point estimate
• If we are doing a poll, and 68% of the
respondents said “yes” to our question,
then we would estimate that 68% of the
population would say “yes” to our question
also
• The sample proportion is written as p̂
Construct and interpret a confidence
interval for the population proportion
Confidence Interval for Mean
versus Proportion
• Confidence intervals for the population mean are
– Centered at the sample mean
– Plus and minus zα/2 times the standard deviation of the
sample mean (the standard error from the sampling
distribution)
• Similarly, confidence intervals for the population
proportion will be
– Centered at the sample proportion
– Plus and minus zα/2 times the standard deviation of the
sample proportion
Sampling Distribution of Proportion
• The distribution of the sample proportion is approximately normal, with
$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
under most conditions
• We use this to construct confidence intervals for the population proportion
Confidence Interval for Population
Proportion
• The (1 – α) • 100% confidence interval for the population proportion is from
$\hat{p} - z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ to $\hat{p} + z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
where zα/2 is the critical value for the normal distribution
Note: That is,
sample proportion ± zα/2 • standard error of sample proportion
Margin of Error
• Like for confidence intervals for population means, the quantity
$z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
is called the margin of error


Example

– We polled n = 500 voters (this is a sample of voters)
– When asked about a ballot question, p̂ = 47% of them were in favor
– Obtain a 99% confidence interval for the population proportion in favor of this ballot question (α = 0.01, so α/2 = 0.005)
Example (continued)
• The critical value z0.005 = 2.575, so
$0.47 - 2.575 \cdot \sqrt{\dfrac{0.47 \cdot 0.53}{500}} = 0.41$
to
$0.47 + 2.575 \cdot \sqrt{\dfrac{0.47 \cdot 0.53}{500}} = 0.53$
or (0.41, 0.53) is a 99% confidence interval for the population proportion
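The polling calculation can be sketched in code as follows. The function name `proportion_interval` is illustrative, and the critical value 2.575 is the rounded value used in these slides.

```python
from math import sqrt


def proportion_interval(p_hat, n, z):
    """Large-sample confidence interval for a population proportion."""
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


# Ballot question: 47% of n = 500 voters in favor, 99% confidence.
low, high = proportion_interval(0.47, 500, 2.575)
print(f"99% CI: {low:.2f} to {high:.2f}")
```

The margin of error here is about 5.7 percentage points, which is why the interval spans both sides of 50%.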
Example 2
• Modified Bonus Plan
Determine the sample size necessary for
estimating a population proportion within
a specified margin of error
Sample Size Determination
• We often want to know the minimum sample
size to obtain a target margin of error for
estimating the population proportion
• A common use of this calculation is in polling
… how many people need to be polled for the
result to have a certain margin of error
– News stories often say “the latest polls show that so-and-so will receive X% of the votes with an E% margin of error …”
Example 1
• For our polling example, how many people
need to be polled so that we are within 1
percentage point with 99% confidence?
• The margin of error is
$z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
which must be 0.01
• We have a problem, though … what is p̂ ?
Two choices of p̂
• If we try to figure out the sample size n in the experimental design stage, before collecting data, then we do not have sample data to calculate p̂. A way around this is to use p̂ = 0.5, which will always yield a sample size that is large enough.
• We can also use an estimate p̂ from a previous study (historic data) to calculate the sample size.
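The reason p̂ = 0.5 is always a safe choice can be shown in one line: the margin of error depends on p̂ only through the product p̂(1 − p̂), and completing the square shows that this product is largest at 0.5.

```latex
\hat{p}\,(1 - \hat{p})
  = \frac{1}{4} - \left( \hat{p} - \frac{1}{2} \right)^{2}
  \le \frac{1}{4},
\qquad \text{with equality only when } \hat{p} = \frac{1}{2}
```

Any other value of p̂ therefore gives a smaller required sample size, so planning with 0.5 can only overestimate n.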
Example 1 (continued)

• In our case, if we use $\hat{p} = 0.5$, then we have
$2.575 \cdot \sqrt{\dfrac{0.5 \cdot 0.5}{n}} = 0.01$
so
$n = 0.25 \cdot \left( \dfrac{2.575}{0.01} \right)^{2}$
and n = 16,577
Example 1 (continued)
• We understand now why political polls
often have a 3 or 4 percentage points
margin of error
• Since it takes a large sample (n = 16,577)
to get to be 99% confident to within 1
percentage point, the 3 or 4 percentage
points margin of error targets are good
compromises between accuracy and cost
effectiveness
Sample Size Determination
• We can write this as a formula
• The sample size n needed to result in a margin of error E (expressed as a proportion, e.g. 0.01 for 1 percentage point) for (1 – α) • 100% confidence for a population proportion is
$n = \dfrac{(z_{\alpha/2})^{2} \cdot \hat{p} \cdot (1 - \hat{p})}{E^{2}}$
• Usually we don’t get an integer for n, so we need to take the next higher whole number (the one lower wouldn’t be large enough)
Example 2
Determine the sample size necessary to estimate the true
proportion of laboratory mice with a certain genetic defect. We
would like the estimate to be within 0.015 with 95% confidence.
Solution:
1. Level of confidence: 1 − α = 0.95, zα/2 = 1.96
2. Desired maximum error is E = 0.015.
3. No estimate of p given, use p̂ = 0.5
4. Use the formula for n:
$n = \dfrac{(z_{\alpha/2})^{2} \cdot \hat{p} \cdot (1 - \hat{p})}{E^{2}} = \dfrac{(1.96)^{2} (0.5)(0.5)}{(0.015)^{2}} = 4268.44 \Rightarrow 4269$
Example 2 (continued)
Suppose we know the genetic defect occurs in approximately 1 of 80 animals
Use: p̂ = 1 / 80 = 0.0125
$n = \dfrac{(z_{\alpha/2})^{2} \cdot \hat{p} \cdot (1 - \hat{p})}{E^{2}} = \dfrac{(1.96)^{2} (0.0125)(0.9875)}{(0.015)^{2}} = 210.75 \Rightarrow 211$
Note: As illustrated here, it is an advantage to have some indication of the value expected for p, especially as p moves further from 0.5
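The sample-size formula used in the polling and laboratory mice examples can be sketched as a short function. The name `sample_size_for_proportion` is illustrative; the default p̂ = 0.5 reflects the conservative choice described earlier, and the result is rounded up to the next whole number.

```python
from math import ceil


def sample_size_for_proportion(z, margin, p_hat=0.5):
    """Minimum n so that z * sqrt(p_hat * (1 - p_hat) / n) <= margin.

    margin is expressed as a proportion (0.01 means 1 percentage point).
    """
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)


# Polling example: within 1 percentage point with 99% confidence.
print(sample_size_for_proportion(2.575, 0.01))

# Laboratory mice: within 0.015 with 95% confidence, no prior estimate.
print(sample_size_for_proportion(1.96, 0.015))

# Laboratory mice with the prior estimate p_hat = 1/80.
print(sample_size_for_proportion(1.96, 0.015, p_hat=1 / 80))
```

Comparing the last two calls shows how much a prior estimate far from 0.5 can reduce the required sample.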
Summary
• We can construct confidence intervals for
population proportions in much the same
way as for population means
• We need to use the formula for the
standard deviation of the sample
proportion
• We can also compute the minimum
sample size needed for a desired level of
accuracy
Which Procedure Do I Use?
Overview
• There are three different confidence
interval calculations covered in this unit
• It can be confusing which one is
appropriate for which situation
• I should use the normal … no, the t … no
the … ???
Which Parameter?
• The one main question right at the
beginning
• Which parameter are we trying to
estimate?
– A mean?
– A proportion?
• This is the single most important question
z-interval or t-interval?
• In analyzing population means
• Is the population variance known?
– If so, then we can use the normal distribution
• If the population variance is not known
– If we have “enough” data (30 or more values), we still
can use the normal distribution
– If we don’t have “enough” data (29 or fewer values),
we should use the Student's t-distribution
• We don’t have to ask this question in the
analysis of proportions
z-interval for mean
• For the analysis of a population mean
• If
The data is OK (reasonably normal)
The variance is known
then we can use the normal distribution with a confidence interval of
$\bar{x} - z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$ to $\bar{x} + z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$
t-interval for mean
• For the analysis of a population mean
• If
The data is OK (reasonably normal)
The variance is NOT known
then we can use the Student's t-distribution with a confidence interval of
$\bar{x} - t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$ to $\bar{x} + t_{\alpha/2} \cdot \dfrac{s}{\sqrt{n}}$
z-interval for Proportion
• For the analysis of a population proportion
• If the sample size is large enough,
then we can use the proportions method with a confidence interval of
$\hat{p} - z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$ to $\hat{p} + z_{\alpha/2} \cdot \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
Summary
• The main questions that determine the
confidence interval to use:
• Is it a
– Population mean?
– Population proportion?
• In the case of a population mean, we need to
determine
– Is the population variance known?
– Does the data look reasonably normal?
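The decision questions above can be written down as a small function. This is only a sketch of the rule of thumb given in these slides (30 or more values counts as "enough" data); the function name, argument names and returned labels are all illustrative, and the check that the data look reasonably normal is left to the analyst.

```python
def choose_interval(parameter, variance_known=False, n=0):
    """Suggest which confidence interval procedure to use.

    parameter: "mean" or "proportion".
    variance_known: whether the population variance is known (means only).
    n: sample size (means only, used when the variance is unknown).
    """
    if parameter == "proportion":
        # The known/unknown variance question does not arise for proportions.
        return "z-interval for a proportion"
    if parameter == "mean":
        if variance_known:
            return "z-interval for a mean"
        if n >= 30:
            return "z-interval for a mean (large-sample approximation)"
        return "t-interval for a mean"
    raise ValueError("parameter must be 'mean' or 'proportion'")


print(choose_interval("mean", variance_known=True))
print(choose_interval("mean", variance_known=False, n=15))
print(choose_interval("proportion"))
```

As noted earlier in the slides, using the t-distribution whenever σ is unknown gives a more accurate answer even for large n, so the large-sample branch is best read as a shortcut for rough work by hand.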
Estimating the Value of a
Parameter
Using Confidence Intervals
Summary
• We can use a sample {mean, proportion} to
estimate the population {mean, proportion}
• In each case, we can use the appropriate
sampling distribution of the sample statistic to
construct a confidence interval around our
estimate
• The confidence interval expresses the
confidence we have that our calculated interval
contains the true parameter
