
5.1 Inferential Statistics-Estimation

This document provides an overview of statistical estimation and confidence intervals. It defines key terms like parameters, statistics, sampling distributions, and point and interval estimation. The document explains how confidence intervals provide a range of plausible values for a population parameter based on a sample statistic. It also outlines the general formula for confidence intervals and how they indicate the precision of estimates and uncertainty around population values.

Estimation

By: Abate Bekele (MSc)

School of Public Health


Learning objectives
•  At the end of this session students will be able to:
–  Understand sampling distribution
–  Understand the concept of estimation
–  Explain two ways of estimation
–  Understand two-sided and one-sided CIs
–  Compute CI for means
–  Compute CI for the difference between two population means
–  Compute CI for proportions
–  Compute CI for the difference between two population proportions
–  Interpret the CIs

Parameter
•  A parameter is a number that describes the population.
•  Hence,
–  μ and σ are the population mean and standard deviation parameters, respectively.
–  The population proportion parameter is usually represented by p.
•  We don't know its value because we can't examine the entire population.


Statistic
•  A statistic is a number that is computed from a sample.
•  Therefore, x̄ and s are both statistics.
•  A sample proportion is represented by p̂.
•  The value is known when we take a sample but can change from sample to sample. This is known as sampling variability.
•  We use a statistic to estimate an unknown parameter.

What Is a Sampling Distribution?
•  Statistics are random variables, and random variables have probability distributions.
•  Think about taking repeated samples of size n and computing x̄ for each one. You could then plot a histogram of these sample means.
•  This is the probability distribution of the random variable X̄.
•  The probability distributions of statistics such as X̄ are called sampling distributions since they would only be observed if we took many samples from the population.


Sampling distribution…
•  In practice, however, it is not common to select repeated samples of size n from a given population: we take one sample!
•  Understanding the properties of the theoretical distribution of the mean allows us to make inferences based on a single sample.
•  So, the sampling distribution of a statistic is the distribution of all possible values of the statistic from all possible samples of the same size from the same population.



Sampling Distribution of sample means
•  Also known as the sampling distribution of the mean
•  Each unit of observation in the sampling distribution is a sample mean
•  Mean of sampling distribution = population mean = µ
•  Spread of the sampling distribution gives a measure of the magnitude of sampling error
•  Standard error of the sample mean = S.E. = σ/√n
[Figure: sample means (X̄) stacked in a mound centered on µ]

Standard deviation vs. standard error
•  Standard deviation (s.d.) tells us variability among individuals
•  Standard error (S.E.) tells us variability of sample means
•  Standard error of the mean = S.E. = σ/√n
Where σ is the standard deviation of the population
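The distinction between standard deviation and standard error can be checked with a small simulation. The sketch below is illustrative only: the population (normal, mean 50, standard deviation 10), the sample size of 25 and the number of repetitions are assumptions chosen for the demonstration, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: normal with mean 50 and standard deviation 10.
MU, SIGMA, N, REPS = 50.0, 10.0, 25, 5000

def sample_mean(n):
    """Draw one random sample of size n and return its mean."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))

# Build an empirical sampling distribution of the mean.
means = [sample_mean(N) for _ in range(REPS)]

empirical_se = statistics.stdev(means)  # spread of the sample means
theoretical_se = SIGMA / N ** 0.5       # sigma / sqrt(n) = 2.0

print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"empirical S.E.: {empirical_se:.2f}  theoretical S.E.: {theoretical_se:.2f}")
```

The individual observations vary with standard deviation 10, while the sample means vary far less, close to σ/√n.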


Sampling error
•  The apparent difference between the population mean and the random sample mean that is purely due to chance in sampling is called the sampling error.
•  Sampling error does not mean that a mistake has been made in the process of sampling; it is the variation experienced due to the process of sampling.
•  Sampling error reflects the difference between the value derived from the sample and the true population value.
•  The only way to eliminate sampling error is to enumerate the entire population.


The Central Limit Theorem
•  Suppose you have a random variable X with any probability distribution and some mean μ and standard deviation σ.
•  The distribution of the sample mean X̄ computed for samples of size n has three important properties:
1. The mean of the sampling distribution of X̄ is μ.
2. The standard deviation of the sampling distribution of X̄ is σ/√n.
3. If X is normally distributed, then X̄ will also be normally distributed.
•  If X is non-normal, the sampling distribution of X̄ will be approximately normal provided n is large enough (n > 30).
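A quick way to see the theorem at work is to sample from a clearly non-normal population and watch the skewness of the sample means shrink as n grows. This is only a sketch: the Exponential(1) population, the sample sizes 2 and 40 and the helper names are assumptions made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def sample_means(n, reps=4000):
    """Means of `reps` samples of size n from an Exponential(1) population
    (a right-skewed population with mean 1 and standard deviation 1)."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

def skewness(xs):
    """Moment-based skewness; roughly 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

small = sample_means(2)    # small samples: means are still clearly skewed
large = sample_means(40)   # n > 30: means are much closer to symmetric

print(f"skewness, n=2:  {skewness(small):.2f}")
print(f"skewness, n=40: {skewness(large):.2f}")
print(f"S.D. of means, n=40: {statistics.stdev(large):.3f} "
      f"(sigma/sqrt(n) = {1 / 40 ** 0.5:.3f})")
```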


The CLT…
•  The beauty of the CLT is that it allows us to make probability statements about X̄ without regard for the distribution of X, provided n is large.
•  Since $\bar{X} \sim N(\mu, \sigma^2/n)$, we can standardize to obtain

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

and use our standard normal tables to find the probability that X̄ lies in any particular interval (discussed in detail under Estimation).

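Standardizing the sample mean can be sketched in a few lines, using Python's `statistics.NormalDist` in place of a printed normal table. The numbers (µ = 100, σ = 15, n = 36) and the function name are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import NormalDist

# Hypothetical population values, for illustration only.
mu, sigma, n = 100.0, 15.0, 36
se = sigma / n ** 0.5  # standard error of the mean = 2.5

def prob_mean_between(lo, hi):
    """P(lo < X-bar < hi), obtained by standardizing both limits."""
    z_lo = (lo - mu) / se
    z_hi = (hi - mu) / se
    std_normal = NormalDist()  # N(0, 1)
    return std_normal.cdf(z_hi) - std_normal.cdf(z_lo)

p = prob_mean_between(95, 105)  # limits are 2 standard errors either side
print(f"P(95 < X-bar < 105) = {p:.4f}")
```

Here both limits standardize to z = ±2, so the probability is the familiar area within two standard errors of the mean.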
Estimation
•  The process of drawing conclusions about an entire population based on the data in a sample is known as statistical inference.
•  There are two ways of statistical inference:
•  Estimation and
•  Hypothesis testing


Estimation, Estimator & Estimate
•  Estimation is about estimating population parameters based on sample statistics (by computation of a statistic from sample data).
•  The statistic itself is called an estimator and can be of two types: point or interval.
•  The value or values that the estimator assumes are called estimates.


Statistical estimation
•  There are two ways to estimate population values from sample values
–  Point estimation
•  using a sample statistic to estimate a population parameter based on a single value
•  e.g. if a random sample of Malay births gave x̄ = 3.5 kg, and we use it to estimate µ, the mean birth weight of all Malay births in the sampled population, we are making a point estimation
•  Point estimation ignores sampling error!
–  Interval estimation
•  using a sample statistic to estimate a population parameter by making allowance for sample variation (error)


1. Point Estimate
•  A single numerical value used to estimate the corresponding population parameter.
Sample statistics are estimators of population parameters:

Sample statistic (estimator) | Population parameter
Sample mean, x̄ | µ
Sample variance, s² | σ²
Sample proportion, p̂ | P or π
Sample odds ratio, OR̂ | OR
Sample relative risk, RR̂ | RR
Sample correlation coefficient, r | ρ


2. Interval estimation
•  Gives a plausible range of values that is likely to include the "true" (population) value with a given confidence level/probability.
•  An interval estimate provides more information about a population characteristic than does a point estimate.
•  Such interval estimates are called confidence intervals.

Confidence Intervals
•  CIs also give information about the precision of an estimate.
•  How much uncertainty is associated with a point estimate of a population parameter?
•  When sampling variability is high, the CI will be wide to reflect the uncertainty of the observation.
•  Wider CIs indicate less certainty.

CIs…
•  A CI in general:
–  Takes into consideration variation in sample statistics from sample to sample
–  Is based on observation from 1 sample
–  Gives information about closeness to unknown population parameters
–  Is stated in terms of level of confidence
•  Never 100% sure


General Formula:
The general formula for all CIs is:
point estimate ± (measure of how confident we want to be) × (standard error)
where:
–  point estimate: the value of the statistic in my sample (e.g., mean, proportion, difference of means/proportions, etc.)
–  measure of how confident we want to be: from a Z table or a t table, depending on the sampling distribution of the statistic
–  standard error: the standard error of the statistic
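The general formula translates directly into a small helper. This is a minimal sketch: the function name and the example numbers (an estimate of 10 with a standard error of 0.5) are hypothetical, and the critical value 1.96 is the usual two-sided 95% value from the Z table.

```python
def confidence_interval(point_estimate, critical_value, standard_error):
    """point estimate ± (critical value) × (standard error)."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical numbers, for illustration only.
lower, upper = confidence_interval(10.0, 1.96, 0.5)
print(f"({lower:.2f}, {upper:.2f})")
```

Every interval in this session is an instance of this pattern; only the point estimate, the table used for the critical value and the standard error change.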


Confidence Level
•  Confidence level: the degree of confidence that the interval will contain the unknown population parameter
•  A percentage (less than 100%)
–  Example: 95%
•  Also written (1 − α) = 0.95
•  Can be two-sided or one-sided


Definition: 95% CI (two-sided CI)
1. Probabilistic interpretation:



2. Practical interpretation:
•  When sampling is from a normally distributed population with known standard deviation, we are 100(1−α)% [e.g., 95%] confident that the single computed interval contains the unknown population parameter.
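What "95% confident" means for a single interval can be illustrated by repeating the sampling many times and counting how often the computed interval captures µ. The sketch below assumes a hypothetical normal population (µ = 50, σ = 10, σ known) and samples of size 20; none of these values come from the slides.

```python
import random
import statistics
from statistics import NormalDist

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical normal population with known sigma.
MU, SIGMA, N, REPS = 50.0, 10.0, 20, 4000

z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96
se = SIGMA / N ** 0.5

covered = 0
for _ in range(REPS):
    xbar = statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    # Does this sample's interval contain the true mean?
    if xbar - z * se < MU < xbar + z * se:
        covered += 1

coverage = covered / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
```

Across repeated samples, close to 95% of the intervals contain µ; any one interval either does or does not.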


One-sided CI

Using statistical tables
The 100(1−α)% confidence interval (C.I.) for µ:
We want to find two values L and U between which µ lies with high probability, i.e.
P( L ≤ µ ≤ U ) = 1−α
Confidence Level to Z-Value Guide

Confidence level | α | Zα/2 (2-tail) | Zα (1-tail)
80% | 20% | 1.28 | 0.84
90% | 10% | 1.645 | 1.28
95% | 5% | 1.96 | 1.645
99% | 1% | 2.58 | 2.33
c | 1.0 − c | z(c/2) | z(c − 0.5)
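The entries of the Z-value guide can be reproduced, up to rounding, from the inverse of the standard normal distribution function. A minimal sketch using the standard library follows; the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)

def z_two_sided(confidence):
    """Critical value leaving alpha/2 in each tail."""
    alpha = 1 - confidence
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def z_one_sided(confidence):
    """Critical value leaving alpha in a single tail."""
    return STD_NORMAL.inv_cdf(confidence)

for c in (0.80, 0.90, 0.95, 0.99):
    print(f"{c:.0%}: two-sided {z_two_sided(c):.3f}, one-sided {z_one_sided(c):.3f}")
```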


Z-table



T-table



CI for a Population Mean
•  Suppose researchers wish to estimate the mean of some normally distributed population.
•  They draw a random sample of size n from the population and compute x̄, which they use as a point estimate of µ.
•  Because random sampling involves chance, x̄ can't be expected to be equal to µ.
•  The value of x̄ may be greater than or less than µ.
•  It would be much more meaningful to estimate µ by an interval.


We have the following cases:
A) When the population is normal

1) When σ is known and the sample size is large or small, the C.I. has the form:

$$P\left(\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

2) When σ is unknown and the sample size is small, the C.I. has the form:

$$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$


B) When the population is not normal and n is large (n > 30)
1) When σ is known, the C.I. has the form:

$$P\left(\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

2) When σ is unknown, the C.I. has the form:

$$P\left(\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$


Deciding Z or t
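The cases listed for the CI of a mean can be summarized as a small decision function. This is a sketch of those cases only, not the decision chart from the original slide: the function name is hypothetical, and the branch for a normal population with unknown σ and a large sample is an assumption, since the listed cases do not cover it.

```python
def choose_critical_distribution(population_normal, sigma_known, n):
    """Return "z" or "t" following the cases listed for the CI of a mean."""
    large_sample = n > 30  # rule of thumb used in this session
    if population_normal:
        if sigma_known:
            return "z"  # case A1: sigma known, any sample size
        # Case A2 covers small samples. For large samples the cases are
        # silent; ASSUMPTION: keep t, which is close to z when n is large.
        return "t"
    if large_sample:
        return "z"  # cases B1 and B2 (s replaces sigma when sigma is unknown)
    raise ValueError("non-normal population with small n is not covered by these cases")

print(choose_critical_distribution(True, True, 10))    # sigma known
print(choose_critical_distribution(True, False, 19))   # small sample, sigma unknown
print(choose_critical_distribution(False, False, 35))  # large sample, non-normal
```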



Example 1
•  Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately x̄ = 22.
•  Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate the CI of µ with α = 0.05.


1 − α = 0.95 → α = 0.05 → α/2 = 0.025
variance = σ² = 45 → σ = √45, n = 10, x̄ = 22
The 95% confidence interval for µ is given by:

$$P\left(\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

z(1−α/2) = z(0.975) = 1.96 (refer to the table)
z(0.975) (σ/√n) = 1.96 (√45 / √10) = 4.1578
22 ± 1.96 (√45 / √10) → (22 − 4.1578, 22 + 4.1578) → (17.84, 26.16)
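The arithmetic of this example can be checked with a short script. It is a sketch under the example's own assumptions (normal population, known variance); the function name is illustrative, and the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from statistics import NormalDist

def z_interval_mean(xbar, sigma, n, confidence=0.95):
    """CI for a mean when sigma is known: xbar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return xbar - margin, xbar + margin

# Enzyme example: x-bar = 22, variance = 45, n = 10, 95% confidence.
lower, upper = z_interval_mean(22, 45 ** 0.5, 10)
print(f"({lower:.2f}, {upper:.2f})")
```

The limits agree with the hand calculation to two decimal places.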


Example 2
•  The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma have a mean of 0.718 and a standard deviation of 0.511. We want to construct a 90% confidence interval for the population mean.
Note that the population is not normal; however, n = 35 (n > 30) is large and σ is unknown, s = 0.511.
1 − α = 0.90 → α = 0.1 → 1 − α/2 = 0.95

$$P\left(\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

z(1−α/2) = z(0.95) = 1.645 (refer to the table)
z(0.95) (s/√n) = 1.645 (0.511 / √35) = 0.1421
0.718 ± 1.645 (0.511 / √35) → (0.576, 0.860)


Example 3
•  Suppose researchers studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. One of the variables they measured following treatment was muscle strength. In 19 subjects, the mean strength was 250.8 with a standard deviation of 130.9. We assume that the sample was taken from an approximately normally distributed population. Calculate the 95% confidence interval for the mean strength.


Solution
1 − α = 0.95 → α = 0.05 → α/2 = 0.025, x̄ = 250.8
Standard deviation = s = 130.9, n = 19
The 95% confidence interval for µ is given by:

$$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$

t(1−α/2, n−1) = t(0.975, 18) = 2.1009 (refer to the t-table)
t(0.975, 18) (s/√n) = 2.1009 (130.9 / √19) = 63.1
250.8 ± 2.1009 (130.9 / √19) → (250.8 − 63.1, 250.8 + 63.1) → (187.7, 313.9)

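The same check for the t-based interval is sketched below. Python's standard library has no t distribution, so the critical value 2.1009 is passed in as read from the t-table; the function name is illustrative.

```python
def t_interval_mean(xbar, s, n, t_crit):
    """CI for a mean when sigma is unknown: xbar ± t * s / sqrt(n).

    t_crit is the tabulated t value for n - 1 degrees of freedom.
    """
    margin = t_crit * s / n ** 0.5
    return xbar - margin, xbar + margin

# Muscle strength example: x-bar = 250.8, s = 130.9, n = 19, t(0.975, 18) = 2.1009.
lower, upper = t_interval_mean(250.8, 130.9, 19, t_crit=2.1009)
print(f"({lower:.1f}, {upper:.1f})")
```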
Confidence Interval for the difference between two Population Means:
•  If we draw two samples from two independent populations and we want to get the confidence interval for the difference between the two population means, then we have the following cases:
•  The interpretation of the CI of the difference between population means rests on the same assumptions as the CI of the means.
a) When the population is normal
1) When the variances are known and the sample sizes are large or small, the C.I. has the form:

$$(\bar{x}_1 - \bar{x}_2) - z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$


2) When the variances are unknown but equal, and the sample sizes are small, the C.I. has the form:

$$(\bar{x}_1 - \bar{x}_2) - t_{1-\alpha/2,\,n_1+n_2-2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{1-\alpha/2,\,n_1+n_2-2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$


b) When the population is non-normal
1) When the variances are unknown and the sample sizes are large, the C.I. has the form:

$$(\bar{x}_1 - \bar{x}_2) - z_{1-\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{1-\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$


Example 1

A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of people with intellectual disabilities, a sample of 12 individuals with Down's syndrome yielded a mean of x̄1 = 4.5 mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex were found to have a mean value of x̄2 = 3.4 mg/100 ml.
If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% C.I. for µ1 − µ2.
Solution:
1 − α = 0.95 → α = 0.05 → α/2 = 0.025 → z(1−α/2) = z(0.975) = 1.96

$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (4.5 - 3.4) \pm 1.96\sqrt{\frac{1}{12} + \frac{1.5}{15}}$$

•  1.1 ± 1.96(0.4282) = 1.1 ± 0.84 = (0.26, 1.94). We are 95% confident that the true difference between the means lies within this interval.
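A short check of this two-sample calculation, assuming (as the example does) normal populations with known variances. The function name is illustrative.

```python
from statistics import NormalDist

def z_interval_diff_means(x1, x2, var1, var2, n1, n2, confidence=0.95):
    """CI for mu1 - mu2 with known variances:
    (x1 - x2) ± z * sqrt(var1/n1 + var2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = (var1 / n1 + var2 / n2) ** 0.5  # standard error of the difference
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Serum uric acid example: means 4.5 and 3.4, variances 1 and 1.5, n = 12 and 15.
lower, upper = z_interval_diff_means(4.5, 3.4, 1.0, 1.5, 12, 15)
print(f"({lower:.2f}, {upper:.2f})")
```

Because the interval excludes 0, the data are consistent with a higher mean uric acid level in the first population.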


Example 2
The purpose of the study was to determine the effectiveness of an integrated outpatient dual-diagnosis treatment program for mentally ill subjects. The authors were addressing the problem of substance abuse among people with severe mental disorders. A retrospective chart review was carried out on 50 patients; the researchers were interested in the number of inpatient treatment days for mental disorders during the year following the end of the program.
Among 18 patients with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of treatment days was 8.8 with a standard deviation of 11.5. We wish to construct a 99% C.I. for the difference between the means of the populations represented by the two samples. Assume the population variances are equal.


1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995
n1 + n2 − 2 = 18 + 10 − 2 = 26
t(1−α/2, n1+n2−2) = t(0.995, 26) = 2.7787; then the 99% C.I. for μ1 − μ2 is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,n_1+n_2-2}\; S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

•  where

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(17 \times 9.3^2) + (9 \times 11.5^2)}{18 + 10 - 2} = 102.33$$

then
(4.7 − 8.8) ± 2.7787 × √102.33 × √(1/18 + 1/10)
−4.1 ± 11.086 = (−15.186, 6.986)
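The pooled-variance interval can be verified with a few lines of code. This sketch takes the tabulated t value 2.7787 (26 degrees of freedom) as an input, since the standard library has no t distribution; the function name is illustrative.

```python
def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2 assuming equal population variances.

    Returns the pooled variance and the (lower, upper) limits.
    t_crit is the tabulated t value for n1 + n2 - 2 degrees of freedom.
    """
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    diff = x1 - x2
    return sp2, (diff - t_crit * se, diff + t_crit * se)

# Treatment days: schizophrenia (4.7, 9.3, n = 18) vs bipolar disorder (8.8, 11.5, n = 10).
sp2, (lower, upper) = pooled_t_interval(4.7, 9.3, 18, 8.8, 11.5, 10, t_crit=2.7787)
print(f"pooled variance = {sp2:.2f}")
print(f"99% CI = ({lower:.3f}, {upper:.3f})")
```

The interval contains 0, so these data do not rule out equal population means.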


Remark
Independent samples
1.  Samples come from two distinct populations/groups
2.  Have different data sources
3.  The data of the samples are
–  Unrelated
–  Independent
4.  Use the difference between the 2 sample means: ( x̄1 − x̄2 )
Examples:
–  Two different diets. Does one increase longevity relative to the other?
–  Patients assigned randomly to receive a vaccine or placebo. Is the rate of the disease the same in both groups, or did the vaccine prevent disease?

Related/Dependent samples
1.  Samples come from related populations or the same population
2.  Have the same/related data source
3.  The data are either
–  Paired or matched
–  Repeated measures (before/after)
4.  Use the difference between each pair of observations: Di = X1i − X2i
Example:
–  RBS level of study subjects before and after breakfast.

Remark…
•  You can construct a 100(1−α)% confidence interval for a paired experiment using

$$\bar{d} \pm t_{\alpha/2}\,\frac{s_d}{\sqrt{n}}$$

where n is the number of pairs and t has n − 1 degrees of freedom.
•  Once you have designed the experiment by pairing, you MUST analyze it as a paired experiment. If the experiment is not designed as a paired experiment in advance, do not use this procedure.
•  The interpretation of the CI of the mean difference of paired measurements depends on these assumptions:
a.  Your pairs of subjects are randomly selected from the population of pairs or at least are representative of the population.
b.  In the overall population of pairs, the differences are distributed in a Gaussian manner.
c.  The two measurements are before/after measurements on one subject or are measurements on two subjects matched before the data were collected.
d.  All subjects come from the same population, and each subject (if before/after) or each pair of matched subjects has been selected independently of the others.


Example
The data here are on the sugar concentration of juice in half heads of red clover kept at different vapor pressures for 8 hours. Construct the 99% confidence interval for the difference in mean sugar concentration.

Vapor pressure 4.4 mmHg (X1) | Vapor pressure 9.9 mmHg (X2)
62.5 | 51.7
65.2 | 54.2
71.3 | 57.0
69.9 | 56.4
74.5 | 61.5
67.8 | 57.2
70.3 | 58.1
67.0 | 56.2
68.5 | 58.4
62.4 | 55.5


xi | yi | di | di²
62.50 | 51.70 | 10.80 | 116.64
65.20 | 54.20 | 11.00 | 121.00
71.30 | 57.00 | 14.30 | 204.49
69.90 | 56.40 | 13.50 | 182.25
74.50 | 61.50 | 13.00 | 169.00
67.80 | 57.20 | 10.60 | 112.36
70.30 | 58.10 | 12.20 | 148.84
67.00 | 56.20 | 10.80 | 116.64
68.50 | 58.40 | 10.10 | 102.01
62.40 | 55.50 | 6.90 | 47.61
Sum | | Σdi = 113.2 | Σdi² = 1320.84

The 99% confidence interval for µ1 − µ2 (that is, µd) is given by (d̄ − ε, d̄ + ε), where ε = t(α/2, n − 1) × s_d̄ and s_d̄ = s_d/√n is the standard error of the mean difference.
d̄ = Σdi / n = 113.2 / 10 = 11.32
s_d = √[(1320.84 − 113.2²/10) / 9] = 2.093, so s_d̄ = 2.093 / √10 = 0.662
α = 1% = 0.01 ⇒ α/2 = 0.005
⇒ t critical = t(α/2, n − 1) = t(0.005, 10 − 1) = 3.250
⇒ ε = (3.250)(0.662) = 2.15
d̄ − ε = 11.32 − 2.15 = 9.17
d̄ + ε = 11.32 + 2.15 = 13.47
Therefore, the 99% confidence interval for µ1 − µ2 is (9.17, 13.47).

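The paired calculation can be reproduced from the raw data. The sketch below uses the ten pairs from the example and the tabulated value t = 3.250 (9 degrees of freedom); the variable names are illustrative.

```python
import statistics

# Sugar concentration at the two vapor pressures (paired by clover head).
vp_4_4 = [62.5, 65.2, 71.3, 69.9, 74.5, 67.8, 70.3, 67.0, 68.5, 62.4]
vp_9_9 = [51.7, 54.2, 57.0, 56.4, 61.5, 57.2, 58.1, 56.2, 58.4, 55.5]

# Work with the within-pair differences, not the two columns separately.
diffs = [a - b for a, b in zip(vp_4_4, vp_9_9)]
n = len(diffs)

d_bar = statistics.fmean(diffs)   # mean difference
s_d = statistics.stdev(diffs)     # sample S.D. of the differences
se = s_d / n ** 0.5               # standard error of the mean difference

t_crit = 3.250                    # t-table value for alpha/2 = 0.005, 9 d.f.
lower, upper = d_bar - t_crit * se, d_bar + t_crit * se

print(f"d-bar = {d_bar:.2f}, s_d = {s_d:.3f}, S.E. = {se:.3f}")
print(f"99% CI = ({lower:.2f}, {upper:.2f})")
```

The standard error comes out at about 0.66, and the limits match the hand calculation to two decimal places.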
Confidence Interval for a Population proportion (P)
•  A sample is drawn from the population of interest, then the sample proportion p̂ is computed:

$$\hat{p} = \frac{\text{number of elements in the sample with some characteristic}}{\text{total number of elements in the sample}} = \frac{a}{n}$$

This sample proportion is used as the point estimator of the population proportion. A confidence interval is obtained by the following formula:

$$\hat{P} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}$$


Example
In order to better counsel the parents of premature babies, researchers investigated the survival of premature infants. They retrospectively studied all premature babies born at 22 to 25 weeks gestation at the AUH during a 3-year period. The investigators separately tabulated deaths for infants by their gestational age. Of 29 infants born at 22 weeks gestation, none survived 6 months. Of 39 infants born at 25 weeks gestation, 31 survived for at least 6 months. Construct a 95% CI for P in both cases.


1 − α = 0.95 → α = 0.05 → α/2 = 0.025 → 1 − α/2 = 0.975
z(1−α/2) = z(0.975) = 1.96, n = 39
For the infants born at 25 weeks gestation, the 95% C.I. for P is:

$$\hat{p} = \frac{31}{39} = 0.79487$$

$$\hat{P} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}} = \frac{31}{39} \pm 1.96\sqrt{\frac{(31/39)(1 - 31/39)}{39}} = (0.67, 0.92)$$

This means that if the true proportion of surviving infants were any less than 67%, there would be less than a 2.5% chance of observing such a large proportion just by chance. It also means that if the true proportion were any greater than 92%, the chance of observing such a small proportion just by chance would be less than 2.5%.
Exercise: Do the same for the infants born at 22 weeks gestation.
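The interval for the 25-week group can be checked as follows. The function implements the formula given above (often called the Wald interval, a name not used in the slides); the function name is illustrative.

```python
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """CI for a proportion: p-hat ± z * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - z * se, p_hat + z * se

# 31 of 39 infants born at 25 weeks survived at least 6 months.
lower, upper = proportion_interval(31, 39)
print(f"({lower:.2f}, {upper:.2f})")
```

Note that when p̂ is 0 or 1 the estimated standard error is 0 and this formula collapses to a single point, which is worth keeping in mind when attempting the exercise.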
CI for difference between two population Proportions
•  Two samples are drawn from two independent populations of interest, then the sample proportion is computed for each sample for the characteristic of interest. An unbiased point estimator for the difference between the two population proportions is P̂1 − P̂2.
•  A 100(1−α)% confidence interval for P1 − P2 is given by

$$(\hat{P}_1 - \hat{P}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$$


Assumptions:
•  The subjects are randomly selected from the population or at least are representative of that population.
•  Each subject was selected independently of the rest.
•  The only difference between groups is exposure to the risk factor or exposure to the treatment.
Example
A researcher investigated gender differences in proactive and reactive aggression in a sample of 323 adults (68 females and 255 males). In the sample, 31 of the females and 53 of the males were using the internet in the internet café. We wish to construct a 99% confidence interval for the difference between the proportions of adults who go to the internet café in the two sampled populations.
Solution:
1 − α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 − α/2 = 0.995
z(1−α/2) = z(0.995) = 2.58, nF = 68, nM = 255

$$\hat{p}_F = \frac{a_F}{n_F} = \frac{31}{68} = 0.4559, \qquad \hat{p}_M = \frac{a_M}{n_M} = \frac{53}{255} = 0.2078$$

The 99% C.I. is

$$(\hat{P}_F - \hat{P}_M) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{P}_F(1 - \hat{P}_F)}{n_F} + \frac{\hat{P}_M(1 - \hat{P}_M)}{n_M}}$$

$$(0.4559 - 0.2078) \pm 2.58\sqrt{\frac{0.4559(1 - 0.4559)}{68} + \frac{0.2078(1 - 0.2078)}{255}}$$

0.2481 ± 2.58(0.0655) = (0.0791, 0.4171)

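A sketch that reproduces this calculation is given below. It uses the exact normal quantile (about 2.576) instead of the rounded table value 2.58, so the limits differ from the hand calculation only in the fourth decimal place; the function name is illustrative.

```python
from statistics import NormalDist

def diff_proportions_interval(a1, n1, a2, n2, confidence=0.99):
    """CI for P1 - P2:
    (p1 - p2) ± z * sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2)."""
    p1, p2 = a1 / n1, a2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 31 of 68 females and 53 of 255 males.
lower, upper = diff_proportions_interval(31, 68, 53, 255)
print(f"({lower:.4f}, {upper:.4f})")
```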
Exercise
•  A breast cancer research team collected the following data on tumor size:

Type of tumor | n | Mean | SD
A | 41 | 3.85 cm | 1.95 cm
B | 32 | 2.80 cm | 1.70 cm

•  Construct a 95 percent confidence interval for the difference between population means and interpret the finding.


•  In a population of subjects who died from lung cancer following exposure to asbestos, it was found that the mean number of years elapsing between exposure and death was 25. The standard deviation was 7 years. Consider the sampling distribution of sample means based on samples of size 35 drawn from this population. What will be the shape of the sampling distribution and why?


•  Given the normally distributed random
variable X, find the numerical value of k such
that
