W6 Lecture6



Biostatistics

Lecture 6
Estimation/Confidence Intervals
2022-2 Fall Semester

Instructor: Min Jin Ha


Department of Health Informatics and Biostatistics
Graduate School of Public Health
Yonsei University
Reading
• Pagano and Gauvreau, Chapter 9
Statistical Inference
• Now that we have investigated the theoretical properties of the distribution of
sample means, we are ready to take the next step and apply this
knowledge to the process of statistical inference

• Aim: estimate some characteristic of a continuous random variable
(e.g., mean) using information contained in a sample of observations
Interval Estimation
• Point estimation: using sample data to calculate a single number to
estimate the parameter of interest
• Sample mean $\bar{x}$ to estimate the population mean $\mu$
• The problem is that two different samples are very likely to result in
different sample means → there is some degree of uncertainty
• A point estimate does not provide any information about the inherent
variability of the point estimator
• From the CLT, we know that $\bar{x}$ is more likely to be near the true population
mean if it is based on a large sample
• Interval estimation provides a range of reasonable values that are intended
to contain the parameter of interest with a certain degree of confidence
What is a Confidence Interval?
• A confidence interval provides a range of reasonable values that are
intended to contain the parameter of interest with a certain degree of
confidence. It often takes the form
Point estimate ± margin of error
and is written
(point estimate – margin of error, point estimate + margin of error)
Caveat
• For illustration, we start by assuming 𝜎 is known.
• When is 𝜎 known?
• Almost never!
• However, it’s easier to understand if we assume that to start.
• By the end of the class, we’ll get rid of this assumption
Two-sided 95% Confidence Intervals (𝜎 known)
• A random variable $X$ has mean $\mu$ and standard deviation $\sigma$
• The CLT states that
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$
• From Lecture 5, we know $P(-1.96 < Z < 1.96) = 0.95$
• Equivalently, $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
• Given this, we are able to manipulate the inequality inside the parentheses
without altering the probability statement to the form
$P(L < \mu < U) = 0.95$
Show how L and U are derived
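One way to carry out the derivation is sketched below. It only uses the probability statement above: multiply through by $\sigma/\sqrt{n}$, subtract $\bar{X}$, then multiply by $-1$ (which reverses the inequalities).

```latex
\begin{align*}
-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96
&\iff -1.96\,\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}} \\
&\iff -\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}} \\
&\iff \bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}
\end{align*}
% Hence L = \bar{X} - 1.96\,\sigma/\sqrt{n} and U = \bar{X} + 1.96\,\sigma/\sqrt{n}.
```

Each step is an equivalent event, so the probability stays at 0.95 throughout.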
Two-sided 95% Confidence Intervals (𝜎 known)
$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
• The quantities $\bar{X} - 1.96\frac{\sigma}{\sqrt{n}}$ and $\bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$ are 95% confidence limits for the
population mean $\mu$
• We are 95% confident that the interval will cover $\mu$
• If we were to select 100 random samples from the population and use these samples to
calculate 100 CIs for $\mu$, approximately 95 of the CIs would cover the true population mean $\mu$
and 5 would not
• Wrong interpretation:
• There is a 95% chance that $\mu$ lies in the computed interval
• Why is it wrong? $\mu$ is fixed and does not move; it is the interval that varies from sample to sample
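The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the function name `coverage`, the population values (`mu=150.0`, `sigma=6.0`), the sample size and the seed are all assumptions, not values from the lecture. It draws many normal samples, builds the known-σ interval from each, and reports the fraction that cover the fixed true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, reps, conf=0.95, seed=1):
    """Fraction of simulated known-sigma CIs that cover the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # e.g. about 1.96 for conf=0.95
    margin = z * sigma / math.sqrt(n)              # same margin for every sample
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - margin < mu < xbar + margin:     # mu is fixed; the interval moves
            hits += 1
    return hits / reps


# Hypothetical population values, chosen only for illustration.
print(coverage(mu=150.0, sigma=6.0, n=31, reps=5000))
```

The printed proportion should be close to 0.95, though it will differ slightly from run to run if the seed is changed.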
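Two-sided $(1 - \alpha) \times 100\%$ Confidence Intervals ($\sigma$ known)

• A generic confidence interval for $\mu$ can be obtained

• Let $z_{\alpha/2}$ be the upper $\alpha/2$ quantile, i.e., $P(Z > z_{\alpha/2}) = \frac{\alpha}{2} = P(Z < -z_{\alpha/2})$

• The generic form of the $(1 - \alpha) \times 100\%$ CI for $\mu$ is
$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
• When $\alpha = 0.05$, the $(1 - \alpha) \times 100\%$ CI is the 95% CI that we found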
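The generic formula translates directly into code. The following is a minimal sketch in Python using only the standard library; the function name `z_interval` and the example inputs are hypothetical and not part of the lecture.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf=0.95):
    """Two-sided (1 - alpha) x 100% CI for the mean when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile of N(0, 1)
    margin = z * sigma / math.sqrt(n)         # margin of error
    return xbar - margin, xbar + margin


# Hypothetical inputs: sample mean 10, known sigma 2, n = 25.
lo, hi = z_interval(xbar=10.0, sigma=2.0, n=25)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

With these made-up inputs the margin is about 1.96 × 2/5 ≈ 0.784, so the interval is roughly (9.216, 10.784).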
99% Confidence Interval
• For a 99% interval, we need the z-value that cuts off the top 0.5% or
0.005 of the distribution, which is ?
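The critical value can be looked up in a normal table or computed. As a hedged sketch (Python standard library; the variable names are illustrative), the loop below finds the upper $\alpha/2$ quantile for three common confidence levels and confirms that the central area matches the stated level.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z = std_normal.inv_cdf(1 - alpha / 2)               # cuts off the top alpha/2
    central = std_normal.cdf(z) - std_normal.cdf(-z)    # area between -z and z
    print(f"{conf:.0%} interval: z = {z:.3f}, central area = {central:.3f}")
```

The z-values come out at about 1.645, 1.960 and 2.576, so the 99% interval uses $z_{0.005} \approx 2.576$.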
When can we use this CI?
• The CI given by $\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$ is safe to use in the following
circumstances when $\sigma$ is known
• X is normal (regardless of sample size)
• X is non-normal but the sample size is large
• It is typically not safe to use this CI when the sample size is small and
X is not a normal random variable.
How can we get a narrower CI?
$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
• $\sigma$ is known (fixed)
• Decrease the margin of error
1. Compromise on our level of confidence, e.g., use a 90% interval
2. Increase the sample size n!
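Both options can be seen numerically. The sketch below is illustrative: the helper `margin_of_error` and the value `sigma = 6.0` are assumptions made for the example. It tabulates the margin $z_{\alpha/2}\,\sigma/\sqrt{n}$ for several confidence levels and sample sizes.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, conf):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a known-sigma CI."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / math.sqrt(n)


sigma = 6.0  # hypothetical known standard deviation
for conf in (0.99, 0.95, 0.90):
    for n in (25, 100, 400):
        m = margin_of_error(sigma, n, conf)
        print(f"conf = {conf:.2f}, n = {n:3d}: margin = {m:.3f}")
```

Because the margin scales with $1/\sqrt{n}$, quadrupling the sample size only halves the width of the interval.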
One-sided Confidence Intervals
• Sometimes, but not often, we want only an upper limit or lower limit
• Example: Consider the distribution of hemoglobin levels for the
population of children under the age of 6 who have been exposed to
high levels of lead. This distribution has an unknown mean 𝜇 and
standard deviation $\sigma = 0.85$ g/100 ml. We know that children who
have lead poisoning tend to have much lower levels of hemoglobin
than children who do not. Thus, we might be interested in finding an
upper bound for 𝜇. See Pagano 9.2
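A one-sided 95% upper bound follows from the same argument as the two-sided interval, except that all 5% of the excluded probability is placed in one tail. The sketch below assumes the CLT applies and uses $z_{0.05} \approx 1.645$; the sample size $n$ and sample mean $\bar{X}$ are left symbolic because they are not given here.

```latex
\begin{align*}
P(Z > -1.645) = 0.95
&\iff P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > -1.645\right) = 0.95 \\
&\iff P\left(\mu < \bar{X} + 1.645\,\frac{\sigma}{\sqrt{n}}\right) = 0.95
\end{align*}
% With \sigma = 0.85 g/100 ml, the upper 95% confidence bound is
% \bar{X} + 1.645 \times 0.85 / \sqrt{n}.
```

Note that the one-sided bound uses 1.645 rather than 1.96, since only one tail is cut off.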
What if 𝜎 is unknown?
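$\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
• The CI cannot be computed if $\sigma$ is unknown
• We use the sample standard deviation $s$ as an estimate of $\sigma$
• We almost never know $\sigma$, and if we replace $\sigma$ with $s$, then
• We can't rely on the CLT result directly: $\frac{\bar{X} - \mu}{s/\sqrt{n}}$ no longer follows $N(0,1)$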
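The effect of plugging $s$ into the z-based interval can be illustrated by simulation. This is a hedged sketch, not part of the lecture: the function name `z_with_s_coverage`, the standard normal population, the number of repetitions and the seed are all assumptions. It estimates how often the interval $\bar{x} \pm 1.96\, s/\sqrt{n}$ covers the true mean.

```python
import math
import random
from statistics import NormalDist, fmean


def z_with_s_coverage(n, reps=4000, mu=0.0, sigma=1.0, seed=2):
    """Coverage of xbar +/- z * s / sqrt(n), i.e. the z interval with s in place of sigma."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)            # nominal 95% critical value
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # Sample standard deviation with the usual n - 1 denominator.
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = z * s / math.sqrt(n)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps


print("n = 5:  ", z_with_s_coverage(5))
print("n = 100:", z_with_s_coverage(100))
```

For small samples the coverage should fall noticeably below the nominal 95%, while for large samples it should be close to 95%. This is the extra variability that the next slides account for.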
What do you do when 𝜎 is unknown?
• While working for the Guinness brewery in Dublin, William
Sealy Gosset published a paper on the t distribution, which
became known as Student's t distribution.
(He published under "Student" because the brewery didn't
allow him to use his own name)

• The t distribution is appropriate for constructing a
confidence interval for the mean when we need to
account for the additional variability due to estimating $\sigma$
with $s$

Student's t-distribution

• Student's t-distribution accounts
for the additional variability due to
estimating $\sigma$ with $s$
• The t distribution looks a lot like the normal
except that it has fatter tails
• The parameter for the t distribution is called degrees of freedom (df)
• As the df (denoted by $\nu$ in the figure) gets bigger, the t distribution looks
more and more like the normal
t distribution for CI
• The df measure the amount of information available in the data to
estimate $\sigma$
• The statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t distribution with $n - 1$ df (denoted by
$t_{n-1}$). We use up 1 df by estimating $\mu$ with the sample mean $\bar{x}$
• Thus, as $n$ gets larger → $s$ becomes a better estimate of $\sigma$ → the
distribution of the t statistic looks more like the normal
• With large enough $n$, the normal approximation can be used to construct
the CI.
• The CI from the t distribution is wider, accounting for the uncertainty in estimating $\sigma$
What if 𝜎 is unknown?
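• We use the CI given by
$P\left(\bar{X} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
where $t_{n-1,\alpha/2}$ is the quantile of probability $1 - \frac{\alpha}{2}$ from $t_{n-1}$
• In R, use qt(p=0.975,df=n-1) for a 95% CI ($\alpha = 0.05$)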
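For readers without R at hand, the t quantile can also be approximated numerically. The following is a rough standard-library Python sketch, not how `qt` is implemented: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The names `t_pdf`, `t_cdf`, `t_quantile` and `t_interval` are hypothetical, and the search assumes the quantile lies within ±100.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # approximate P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Approximate quantile of t_df by bisection (rough stand-in for R's qt)."""
    if p < 0.5:
        return -t_quantile(1 - p, df)  # symmetry of the t distribution
    lo, hi = 0.0, 100.0                # assumes the quantile is below 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(xbar, s, n, conf=0.95):
    """Two-sided CI for the mean when sigma is estimated by s."""
    alpha = 1 - conf
    margin = t_quantile(1 - alpha / 2, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin


for df in (5, 30, 100):
    print(f"df = {df:3d}: t quantile (p = 0.975) = {t_quantile(0.975, df):.3f}")
```

The printed quantiles shrink toward the normal value of about 1.96 as the df grow, which is the convergence described on the earlier slides. In practice `qt` in R (or an equivalent library routine) is the better choice.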
Applications
• Consider the distribution of heights for the population of individuals
between the ages of 12 and 40 who suffer from fetal alcohol syndrome.
Fetal alcohol syndrome is the severe end of the spectrum of
disabilities caused by maternal alcohol use during pregnancy. The
distribution of heights has unknown mean $\mu$. A random sample of 31
patients is selected from the underlying population; the average
height for these individuals was $\bar{x} = 147.4$ cm.
1. When $\sigma$ is known to be 6 cm, construct 90% and 99% confidence intervals
for $\mu$. Interpret the results.
2. When $\sigma$ is not known and the sample standard deviation is calculated to be
6 cm, construct 90% and 99% confidence intervals for $\mu$. Interpret the results.
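A worked sketch of both parts follows. It assumes rounded table values for the critical points, $z_{0.05} \approx 1.645$, $z_{0.005} \approx 2.576$, $t_{30,0.05} \approx 1.697$ and $t_{30,0.005} \approx 2.750$, so the last digit may differ slightly from software output.

```latex
\begin{align*}
\frac{\sigma}{\sqrt{n}} = \frac{s}{\sqrt{n}} &= \frac{6}{\sqrt{31}} \approx 1.0776 \\[4pt]
\text{1. } \sigma \text{ known, 90\%:}\quad
  & 147.4 \pm 1.645 \times 1.0776 = 147.4 \pm 1.773 \;\Rightarrow\; (145.63,\ 149.17) \\
\sigma \text{ known, 99\%:}\quad
  & 147.4 \pm 2.576 \times 1.0776 = 147.4 \pm 2.776 \;\Rightarrow\; (144.62,\ 150.18) \\[4pt]
\text{2. } \sigma \text{ unknown, 90\%:}\quad
  & 147.4 \pm 1.697 \times 1.0776 = 147.4 \pm 1.829 \;\Rightarrow\; (145.57,\ 149.23) \\
\sigma \text{ unknown, 99\%:}\quad
  & 147.4 \pm 2.750 \times 1.0776 = 147.4 \pm 2.963 \;\Rightarrow\; (144.44,\ 150.36)
\end{align*}
```

Interpretation: for example, we are 90% confident that the interval (145.63, 149.17) cm covers the true mean height $\mu$. The 99% intervals are wider than the 90% intervals, and the t-based intervals (df = 30) are slightly wider than the z-based ones because they account for estimating $\sigma$ with $s$.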
