
Chapter 5
Inferences Based on Two Samples

PHOK Ponna and PHAUK Sokkhey

Department of Applied Mathematics and Statistics


Institute of Technology of Cambodia

16/11/2023

AMS (ITC) Inferences Based on Two Samples 16/11/2023 1 / 25


Contents

1 Z Tests for a Difference Between Two Population Means

2 The Two-Sample t Test and Confidence Interval

3 Analysis of Paired Data

Z Tests for a Difference Between Two Population Means


Basic Assumptions
1. X1, X2, ..., Xm is a random sample from a distribution with mean µ1 and variance σ1².
2. Y1, Y2, ..., Yn is a random sample from a distribution with mean µ2 and variance σ2².
3. The X and Y samples are independent of one another.

Theorem 1
The expected value of X̄ − Ȳ is µ1 − µ2, so X̄ − Ȳ is an unbiased estimator of µ1 − µ2. The standard deviation of X̄ − Ȳ is
$$\sigma_{\bar X - \bar Y} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
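Theorem 1 can be checked numerically. The following is a minimal Monte Carlo sketch in Python (standard library only). The population means, SDs, sample sizes, replication count, and the helper name `simulate_diff_of_means` are illustrative assumptions and are not taken from the slides. Normal populations are used only for convenience, since the theorem itself needs only the three basic assumptions.

```python
import math
import random
import statistics


def simulate_diff_of_means(mu1, sigma1, m, mu2, sigma2, n, reps=10_000, seed=1):
    """Hypothetical helper: simulate `reps` independent values of xbar - ybar.

    Normal populations are used only for convenience; Theorem 1 holds for
    any distributions with the stated means and variances.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(m))
        ybar = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n))
        diffs.append(xbar - ybar)
    return diffs


# Illustrative parameters (assumed, not from the slides)
mu1, sigma1, m = 10.0, 4.0, 20
mu2, sigma2, n = 12.0, 5.0, 25

diffs = simulate_diff_of_means(mu1, sigma1, m, mu2, sigma2, n)
formula_sd = math.sqrt(sigma1**2 / m + sigma2**2 / n)  # Theorem 1

print(f"simulated mean of xbar - ybar: {statistics.fmean(diffs):.3f} (mu1 - mu2 = {mu1 - mu2})")
print(f"simulated SD of xbar - ybar:   {statistics.stdev(diffs):.3f} (formula: {formula_sd:.3f})")
```

With these settings the formula gives √1.8 ≈ 1.342, and the simulated SD should land close to it.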


Test Procedures for Normal Populations with Known Variances


We assume here that both population distributions are normal and that the values of both σ1² and σ2² are known.
Null hypothesis: H0: µ1 − µ2 = ∆0
Test statistic value:
$$z = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{\sigma_1^2}{m} + \dfrac{\sigma_2^2}{n}}}$$

Alternative Hypothesis        Rejection Region for Level α Test
Ha: µ1 − µ2 > ∆0              R = {z : z ≥ zα}
Ha: µ1 − µ2 < ∆0              R = {z : z ≤ −zα}
Ha: µ1 − µ2 ≠ ∆0              R = {z : z ≤ −zα/2 or z ≥ zα/2}
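The decision rule in the table can be sketched as a small Python function. The names `two_sample_z_test` and `z_critical` and the `alternative` keyword are hypothetical choices made for illustration. The standard normal CDF and quantile come from `statistics.NormalDist` (Python 3.8+). Comparing the P-value with α is equivalent to checking whether z falls in the rejection region.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def two_sample_z_test(xbar, ybar, sd1, sd2, m, n, delta0=0.0, alternative="two-sided"):
    """Return (z, P-value) for H0: mu1 - mu2 = delta0.

    sd1, sd2 are the known population SDs (or the sample SDs when m and n
    are both large). `alternative` is "greater", "less" or "two-sided".
    """
    se = math.sqrt(sd1**2 / m + sd2**2 / n)
    z = (xbar - ybar - delta0) / se
    if alternative == "greater":
        p = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":
        p = STD_NORMAL.cdf(z)
    elif alternative == "two-sided":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p


def z_critical(alpha):
    """Upper-tail critical value z_alpha, so that P(Z >= z_alpha) = alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha)
```

For a level α test, reject H0 when the P-value is at most α. In the two-sided case this is the same as |z| ≥ z_critical(α/2).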


Example 1
Analysis of a random sample consisting of m = 20 specimens of
cold-rolled steel to determine yield strengths resulted in a sample
average strength of x̄ = 29.8 ksi. A second random sample of n = 25
two-sided galvanized steel specimens gave a sample average strength
of ȳ = 37.4 ksi. Assuming that the two yield-strength distributions are
normal with σ1 = 4.0 and σ2 = 5.0 (suggested by a graph in the
article “Zinc-Coated Sheet Steel: An Overview,” Automotive Engr.,
Dec. 1984: 39–43), does the data indicate that the corresponding true
average yield strengths µ1 and µ2 are different? Let’s carry out a test
at significance level α = 0.01.
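For Example 1 (a two-tailed test with α = 0.01 and ∆0 = 0), the arithmetic can be reproduced with a short Python sketch that uses only the standard library.

```python
import math
from statistics import NormalDist

# Example 1 data (from the slide)
m, xbar, sigma1 = 20, 29.8, 4.0   # cold-rolled steel
n, ybar, sigma2 = 25, 37.4, 5.0   # two-sided galvanized steel
alpha, delta0 = 0.01, 0.0         # H0: mu1 - mu2 = 0 vs Ha: mu1 - mu2 != 0

se = math.sqrt(sigma1**2 / m + sigma2**2 / n)
z = (xbar - ybar - delta0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, z_(alpha/2) = {z_crit:.3f}, P-value = {p_value:.2e}")
print("reject H0" if abs(z) >= z_crit else "do not reject H0")
```

This gives z ≈ −5.66, far beyond −z0.005 ≈ −2.576. H0: µ1 − µ2 = 0 is therefore rejected at α = 0.01, and the data indicate that the true average yield strengths differ.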


Type II error probabilities for the z test:

Alternative Hypothesis        β(∆′) = P(type II error when µ1 − µ2 = ∆′)
Ha: µ1 − µ2 > ∆0              Φ(zα − (∆′ − ∆0)/σ)
Ha: µ1 − µ2 < ∆0              1 − Φ(−zα − (∆′ − ∆0)/σ)
Ha: µ1 − µ2 ≠ ∆0              Φ(zα/2 − (∆′ − ∆0)/σ) − Φ(−zα/2 − (∆′ − ∆0)/σ)

where σ = σX̄−Ȳ = √((σ1²/m) + (σ2²/n)). Sample sizes m and n can be determined that will satisfy both P(type I error) = α and P(type II error when µ1 − µ2 = ∆′) = β. For an upper-tailed test, setting the expression for β(∆′) above equal to the specified value of β gives
$$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}.$$
The same equation holds for a lower-tailed test. For a two-tailed test, α is replaced by α/2.
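The β(∆′) table and the sample-size equation can be sketched in Python as follows. The helper names are hypothetical. `common_sample_size` additionally assumes m = n, which is an extra assumption and not part of the general formula. Under that assumption the equation solves to m = (σ1² + σ2²)(zα + zβ)²/(∆′ − ∆0)², rounded up to the next integer.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def z_upper(p):
    """Critical value z_p with P(Z >= z_p) = p."""
    return _Z.inv_cdf(1 - p)


def beta_two_sample_z(delta_prime, delta0, sigma1, sigma2, m, n, alpha, alternative):
    """Type II error probability beta(delta') of the level-alpha z test."""
    sigma = math.sqrt(sigma1**2 / m + sigma2**2 / n)
    shift = (delta_prime - delta0) / sigma
    if alternative == "greater":
        return _Z.cdf(z_upper(alpha) - shift)
    if alternative == "less":
        return 1 - _Z.cdf(-z_upper(alpha) - shift)
    if alternative == "two-sided":
        za2 = z_upper(alpha / 2)
        return _Z.cdf(za2 - shift) - _Z.cdf(-za2 - shift)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


def common_sample_size(delta_prime, delta0, sigma1, sigma2, alpha, beta, two_sided=False):
    """Smallest m = n meeting the sample-size equation (assumes m = n)."""
    a = alpha / 2 if two_sided else alpha
    m_exact = (sigma1**2 + sigma2**2) * (z_upper(a) + z_upper(beta)) ** 2 / (delta_prime - delta0) ** 2
    return math.ceil(m_exact)
```

As a quick sanity check, at ∆′ = ∆0 every branch of `beta_two_sample_z` returns 1 − α.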


Example 2
Persons having Raynaud’s syndrome are apt to suffer a sudden impairment of blood circulation in fingers and toes. In an experiment to study the extent of this impairment, each subject immersed a forefinger in water and the resulting heat output (cal/cm²/min) was measured. For m = 10 subjects with the syndrome, the average heat output was x̄ = 0.64, and for n = 10 nonsufferers, the average output was ȳ = 2.05. Let µ1 and µ2 denote the true average heat outputs for the two types of subjects. Assume that the two distributions of heat output are normal with σ1 = 0.2 and σ2 = 0.4.
(a) Carry out the test H0: µ1 − µ2 = −1 versus Ha: µ1 − µ2 < −1, and compute the P-value.
(b) What is the probability of a type II error when the actual difference between µ1 and µ2 is µ1 − µ2 = −1.2?
(c) Assuming that m = n, what sample sizes are required to ensure that β = 0.1 when µ1 − µ2 = −1.2?
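Example 2 does not state α for parts (b) and (c). The sketch below assumes α = 0.01 purely for illustration, so change `alpha` to match the intended level. It also reads the syndrome-group mean as x̄ = 0.64. Only the Python standard library is used.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

m = n = 10
xbar, ybar = 0.64, 2.05          # syndrome group, nonsufferers
sigma1, sigma2 = 0.2, 0.4
delta0 = -1.0                    # H0: mu1 - mu2 = -1 vs Ha: mu1 - mu2 < -1
alpha = 0.01                     # ASSUMPTION: alpha is not given on the slide

sigma = math.sqrt(sigma1**2 / m + sigma2**2 / n)

# (a) lower-tailed z test and its P-value
z = (xbar - ybar - delta0) / sigma
p_value = Z.cdf(z)

# (b) beta(-1.2) for the lower-tailed test: 1 - Phi(-z_alpha - (delta' - delta0)/sigma)
z_alpha = Z.inv_cdf(1 - alpha)
delta_prime = -1.2
beta = 1 - Z.cdf(-z_alpha - (delta_prime - delta0) / sigma)

# (c) common size m = n giving beta = 0.1 at delta' = -1.2
z_beta = Z.inv_cdf(1 - 0.1)
size = math.ceil((sigma1**2 + sigma2**2) * (z_alpha + z_beta) ** 2 / (delta_prime - delta0) ** 2)

print(f"(a) z = {z:.3f}, P-value = {p_value:.4f}")
print(f"(b) beta(-1.2) = {beta:.3f}")
print(f"(c) m = n = {size}")
```

Under these assumptions z ≈ −2.90 and the P-value is about 0.002, so H0 is rejected at α = 0.01. Also β(−1.2) ≈ 0.82, and 66 subjects per group would be needed for β = 0.1.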


Large-Sample Tests
The assumptions of normal population distributions and known values of σ1 and σ2 are unnecessary when both sample sizes are large (m ≥ 30 and n ≥ 30). In this case, the Central Limit Theorem guarantees that X̄ − Ȳ has approximately a normal distribution regardless of the underlying population distributions. Furthermore, using S1² and S2² in place of σ1² and σ2² gives a variable whose distribution is approximately standard normal:
$$Z = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
A large-sample test statistic results from replacing µ1 − µ2 by ∆0, the expected value of X̄ − Ȳ when H0 is true. Using this test statistic value with the upper-, lower-, and two-tailed rejection regions based on z critical values stated earlier gives large-sample tests whose significance levels are approximately α.


Example 3
Are male college students more easily bored than their female
counterparts? This question was examined in the article “Boredom in
Young Adults–Gender and Cultural Comparisons” (J. Cross-Cult.
Psych., 1991: 209–223). The authors administered a scale called the
Boredom Proneness Scale to 97 male and 148 female U.S. college
students. Does the accompanying data support the research
hypothesis that the mean Boredom Proneness Rating is higher for men
than for women? Test the appropriate hypotheses using a 0.05
significance level.

Gender    Sample Size    Sample Mean    Sample SD
Male      97             10.40          4.83
Female    148            9.26           4.68
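Because m = 97 and n = 148 are both large, the large-sample z test applies, with s1 and s2 in place of σ1 and σ2. The following is a minimal Python sketch of the upper-tailed test (Ha: µ1 − µ2 > 0, α = 0.05), taking µ1 as the male mean.

```python
import math
from statistics import NormalDist

m, xbar, s1 = 97, 10.40, 4.83     # male students
n, ybar, s2 = 148, 9.26, 4.68     # female students
alpha, delta0 = 0.05, 0.0         # H0: mu1 - mu2 = 0 vs Ha: mu1 - mu2 > 0

z = (xbar - ybar - delta0) / math.sqrt(s1**2 / m + s2**2 / n)
z_alpha = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, z_alpha = {z_alpha:.3f}, P-value = {p_value:.4f}")
print("reject H0" if z >= z_alpha else "do not reject H0")
```

Here z ≈ 1.83 > 1.645 (P-value ≈ 0.034), so at the 0.05 level the data support the research hypothesis that the mean rating is higher for men.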


Confidence Intervals for µ1 − µ2

When both population distributions are normal, standardizing X̄ − Ȳ gives a random variable Z with a standard normal distribution. A 100(1 − α)% CI for µ1 − µ2 is
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.$$
If both m and n are large, the CLT implies that this interval is valid even without the assumption of normal populations. In this case, the confidence level is approximately 100(1 − α)%. Furthermore, using the sample variances S1² and S2² in the standardized variable Z yields a valid interval in which s1² and s2² replace σ1² and σ2².
Provided that m and n are both large, a CI for µ1 − µ2 with a confidence level of approximately 100(1 − α)% is
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$
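Both interval formulas use the same arithmetic and differ only in which SDs are plugged in. The helper below (`z_interval_diff` is a hypothetical name) is a minimal Python sketch. It accepts either the known σ's or, for large samples, the sample SDs.

```python
import math
from statistics import NormalDist


def z_interval_diff(xbar, ybar, sd1, sd2, m, n, conf=0.95):
    """Two-sided z interval for mu1 - mu2 with confidence level `conf`.

    Pass sigma1, sigma2 (normal populations, known variances) or the sample
    SDs s1, s2 (both m and n large, level approximately `conf`).
    """
    alpha = 1 - conf
    z_half = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    margin = z_half * math.sqrt(sd1**2 / m + sd2**2 / n)
    diff = xbar - ybar
    return diff - margin, diff + margin
```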

Example 4
Two brands of batteries are tested, and their voltages are compared.
The summary statistics follow. Find the 95% confidence interval of the
true difference in the means. Assume that both variables are normally
distributed.
                 Brand X            Brand Y
Sample mean      X̄1 = 9.2 volts     X̄2 = 8.8 volts
Population SD    σ1 = 0.3 volt      σ2 = 0.1 volt
Sample size      n1 = 27            n2 = 30
What does your interval say about the claim that there is no difference
in mean voltages?
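In Example 4 the σ's are given and the populations are normal, so the first interval formula applies. A short Python sketch (standard library only):

```python
import math
from statistics import NormalDist

x1, sigma1, n1 = 9.2, 0.3, 27   # Brand X
x2, sigma2, n2 = 8.8, 0.1, 30   # Brand Y

z_half = NormalDist().inv_cdf(0.975)   # z_{0.025}, about 1.96
margin = z_half * math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
lo, hi = (x1 - x2) - margin, (x1 - x2) + margin

print(f"95% CI for mu1 - mu2: ({lo:.3f}, {hi:.3f}) volts")
print("0 is inside the interval" if lo <= 0 <= hi else "0 is outside the interval")
```

The interval is roughly (0.281, 0.519) volts. Because it lies entirely above 0, it does not support the claim that there is no difference in mean voltages.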

The Two-Sample t Test and Confidence Interval


The Two-Sample t-Test

Assumption
Both population distributions are normal, so that X1, X2, ..., Xm is a random sample from a normal distribution and so is Y1, ..., Yn (with the X's and Y's independent of one another). The plausibility of these assumptions can be judged by constructing a normal probability plot of the xi's and another of the yi's.

The test statistic and confidence interval formula are based on the
same standardized variable developed in Section 1, but the relevant
distribution is now t rather than z.


The Two-Sample t-Test and Confidence Interval

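Theorem
When both population distributions are normal and σ1² and σ2² are not assumed to be equal, the standardized variable
$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}} \qquad (1)$$
has approximately a t distribution with df ν estimated from the data by
$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m-1} + \dfrac{(s_2^2/n)^2}{n-1}} = \frac{\left[(se_1)^2 + (se_2)^2\right]^2}{\dfrac{(se_1)^4}{m-1} + \dfrac{(se_2)^4}{n-1}}$$
where se1 = s1/√m and se2 = s2/√n (round ν down to the nearest integer).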
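The degrees-of-freedom formula is easy to mis-key by hand. The following is a minimal Python sketch (the function name `welch_df` is hypothetical) that evaluates the first form of ν and rounds it down as the theorem prescribes.

```python
import math


def welch_df(s1, m, s2, n):
    """Estimated df nu for the two-sample t variable, rounded down."""
    se1_sq = s1**2 / m   # (se1)^2
    se2_sq = s2**2 / n   # (se2)^2
    nu = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (m - 1) + se2_sq**2 / (n - 1))
    return math.floor(nu)
```

For instance, `welch_df(8, 10, 5, 8)` evaluates ν ≈ 15.26 and returns 15. Rounding down gives a slightly larger critical value, so the procedure errs on the conservative side.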


The two-sample t confidence interval for µ1 − µ2 with confidence level 100(1 − α)% is then
$$\bar x - \bar y \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$
The two-sample t test for testing H0: µ1 − µ2 = ∆0 is as follows:
Test statistic value:
$$t = \frac{\bar x - \bar y - \Delta_0}{\sqrt{\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}}}$$

Alternative Hypothesis        Rejection Region for Level α Test
Ha: µ1 − µ2 > ∆0              R = {t : t ≥ tα,ν}
Ha: µ1 − µ2 < ∆0              R = {t : t ≤ −tα,ν}
Ha: µ1 − µ2 ≠ ∆0              R = {t : t ≤ −tα/2,ν or t ≥ tα/2,ν}


Example 5
A researcher wishes to see if the average weights of newborn male
infants are different from the average weights of newborn female
infants. She selects a random sample of 10 male infants and finds the
mean weight is 7 pounds 11 ounces and the standard deviation of the
sample is 8 ounces. She selects a random sample of 8 female infants
and finds that the mean weight is 7 pounds 4 ounces and the standard
deviation of the sample is 5 ounces. Can it be concluded at α = 0.05
that the mean weight of the males is different from the mean weight
of the females? Assume that the variables are normally distributed.

Example 6
Find the 95% confidence interval for the true difference in means for
the data in Example 5.
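Examples 5 and 6 can be worked in ounces (7 lb 11 oz = 123 oz and 7 lb 4 oz = 116 oz). The Python standard library has no t quantile function, so the sketch below hard-codes t0.025,15 = 2.131 from a t table and checks that the computed ν really is 15.

```python
import math

# Example 5 summary statistics, converted to ounces (1 lb = 16 oz)
m, xbar, s1 = 10, 7 * 16 + 11, 8.0   # male infants: 123 oz
n, ybar, s2 = 8, 7 * 16 + 4, 5.0     # female infants: 116 oz

se1_sq, se2_sq = s1**2 / m, s2**2 / n
se = math.sqrt(se1_sq + se2_sq)
nu = math.floor((se1_sq + se2_sq) ** 2 / (se1_sq**2 / (m - 1) + se2_sq**2 / (n - 1)))

t_crit = 2.131                        # t_{0.025,15}, read from a t table
assert nu == 15, "t_crit above is the table value for 15 df only"

# Example 5: two-tailed test of H0: mu1 - mu2 = 0 at alpha = 0.05
t = (xbar - ybar - 0) / se
print(f"t = {t:.3f}, df = {nu}, reject H0: {abs(t) >= t_crit}")

# Example 6: 95% confidence interval for mu1 - mu2 (ounces)
margin = t_crit * se
ci = (xbar - ybar - margin, xbar - ybar + margin)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}) oz")
```

This gives t ≈ 2.27 with ν = 15, just beyond 2.131, so H0 is rejected at α = 0.05. The 95% interval, roughly (0.42, 13.58) oz, excludes 0, which agrees with the test. Because t is close to the critical value, the conclusion is sensitive to the df used.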


Pooled t Procedures
Alternatives to the two-sample t procedures just described result from
assuming not only that the two population distributions are normal but
also that they have equal variances (σ1² = σ2²). That is, the two
population distribution curves are assumed normal with equal spreads,
the only possible difference between them being where they are
centered. The standardized variable
$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
has a t distribution with df ν = m + n − 2, where
$$S_p = \sqrt{\frac{m-1}{m+n-2}S_1^2 + \frac{n-1}{m+n-2}S_2^2}.$$
Sp is called the pooled estimator of σ.
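The following is a minimal Python sketch of Sp computed from two raw samples (the helper name `pooled_sd` is hypothetical). `statistics.variance` uses the m − 1 and n − 1 divisors, which matches S1² and S2².

```python
import math
import statistics


def pooled_sd(x, y):
    """Pooled estimator S_p of the common population SD sigma."""
    m, n = len(x), len(y)
    s1_sq = statistics.variance(x)   # sample variance with divisor m - 1
    s2_sq = statistics.variance(y)   # sample variance with divisor n - 1
    return math.sqrt(((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2))
```

When m = n, Sp² reduces to the simple average (S1² + S2²)/2.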


Example 7
Consider the pooled T variable

$$T = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
which has a t distribution with m + n − 2 df when both population distributions are normal with σ1 = σ2.
(a) Use this T variable to obtain a pooled t confidence interval
formula for µ1 − µ2 .
(b) A sample of ultrasonic humidifiers of one particular brand was
selected for which the observations on maximum output of
moisture (oz) in a controlled chamber were 14.0, 14.3, 12.2, and
15.1. A sample of the second brand gave output values 12.1,
13.6, 11.9, and 11.2 (“Multiple Comparisons of Means Using
Simultaneous Confidence Intervals,” J. Qual. Techn., 1989:
232–41).

Use the pooled t formula from part (a) to estimate the difference
between true average outputs for the two brands with a 95%
confidence interval.
(c) Estimate the difference between the two µ’s using the two-sample
t interval discussed in this section, and compare it to the interval
of part (b).
(d) Carry out the pooled t test H0 : µ1 − µ2 = 0 versus
Ha : µ1 − µ2 ̸= 0 at α = 0.05.
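Rearranging the pooled T variable gives the part (a) interval x̄ − ȳ ± tα/2,m+n−2 · sp·√(1/m + 1/n). Parts (b) to (d) can then be checked with the Python sketch below. The critical values t0.025,6 = 2.447 and t0.025,5 = 2.571 are hard-coded from a t table because the standard library has no t quantile, and the code asserts that the Welch df really is 5.

```python
import math
import statistics

brand1 = [14.0, 14.3, 12.2, 15.1]   # maximum moisture output (oz), first brand
brand2 = [12.1, 13.6, 11.9, 11.2]   # second brand
m, n = len(brand1), len(brand2)
diff = statistics.fmean(brand1) - statistics.fmean(brand2)
v1, v2 = statistics.variance(brand1), statistics.variance(brand2)

# (b) pooled 95% CI with m + n - 2 = 6 df
sp = math.sqrt(((m - 1) * v1 + (n - 1) * v2) / (m + n - 2))
se_pooled = sp * math.sqrt(1 / m + 1 / n)
t_6 = 2.447                          # t_{0.025,6} from a t table
pooled_ci = (diff - t_6 * se_pooled, diff + t_6 * se_pooled)

# (c) two-sample (Welch) 95% CI
se1_sq, se2_sq = v1 / m, v2 / n
nu = math.floor((se1_sq + se2_sq) ** 2 / (se1_sq**2 / (m - 1) + se2_sq**2 / (n - 1)))
assert nu == 5, "t_5 below is the table value for 5 df only"
t_5 = 2.571                          # t_{0.025,5} from a t table
se_welch = math.sqrt(se1_sq + se2_sq)
welch_ci = (diff - t_5 * se_welch, diff + t_5 * se_welch)

# (d) pooled t test of H0: mu1 - mu2 = 0 vs Ha: mu1 - mu2 != 0, alpha = 0.05
t_stat = diff / se_pooled
print(f"pooled CI: ({pooled_ci[0]:.3f}, {pooled_ci[1]:.3f})")
print(f"Welch CI:  ({welch_ci[0]:.3f}, {welch_ci[1]:.3f}), df = {nu}")
print(f"t = {t_stat:.3f}, reject H0: {abs(t_stat) >= t_6}")
```

With these data the pooled interval is about (−0.24, 3.64) and the Welch interval about (−0.34, 3.74). Because m = n, the two standard errors coincide, so the Welch interval is wider only because it uses fewer df. In (d), t ≈ 2.14 < 2.447, so H0 is not rejected at α = 0.05. This is consistent with the pooled interval containing 0.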

Analysis of Paired Data


Assumptions
X1, X2, . . . , Xn is a random sample from N(µ1, σ1²).
Y1, Y2, . . . , Yn is a random sample from N(µ2, σ2²).
The X and Y samples are dependent.
The data consist of n independently selected pairs (X1, Y1), . . . , (Xn, Yn) with E(Xi) = µ1 and E(Yi) = µ2. Let D1 = X1 − Y1, . . . , Dn = Xn − Yn, so the Di's are the differences within pairs. Then the Di's are assumed to be normally distributed with mean value µD and variance σD² (this is usually a consequence of the Xi's and Yi's themselves being normally distributed).


The Paired t Test


Because different pairs are independent, the Di ’s are independent of
each other. If we let D = X − Y , where X and Y are the first and
second observations, respectively, within an arbitrary pair, then the
expected difference is µD = E (X − Y ) = E (X ) − E (Y ) = µ1 − µ2 .
Thus any hypothesis about µ1 − µ2 can be phrased as a hypothesis
about the mean difference µD . But since the Di ’s constitute a normal
random sample (of differences) with mean µD , hypotheses about µD
can be tested using a one-sample t test.
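Null hypothesis: H0: µD = ∆0
Test statistic value:
$$t = \frac{\bar d - \Delta_0}{s_D/\sqrt{n}}$$
where d̄ and sD are the sample mean and sample standard deviation of the observed differences d1, ..., dn.

Alternative Hypothesis        Rejection Region for Level α Test
Ha: µD > ∆0                   t ≥ tα,n−1
Ha: µD < ∆0                   t ≤ −tα,n−1
Ha: µD ≠ ∆0                   either t ≥ tα/2,n−1 or t ≤ −tα/2,n−1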
A P-value can be calculated as was done for earlier t tests.
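The following is a minimal Python sketch of the paired t statistic (the name `paired_t` is hypothetical). It forms the within-pair differences and then applies the one-sample t formula. The critical value tα,n−1 still has to come from a t table or a statistics library.

```python
import math
import statistics


def paired_t(x, y, delta0=0.0):
    """Return (d_bar, s_d, t, df) for the paired t test of H0: mu_D = delta0,
    using the differences d_i = x_i - y_i."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = statistics.fmean(d)
    s_d = statistics.stdev(d)          # sample SD of the differences
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1
```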

Example 8
Bank Deposits. A random sample of nine local banks shows their deposits (in billions of dollars) 3 years ago and their deposits (in billions of dollars) today. At α = 0.05, can it be concluded that the average deposit for the banks is greater today than it was 3 years ago? Assume the variable is normally distributed.

Bank 1 2 3 4 5 6 7 8 9
3 years ago 11.42 8.41 3.98 7.37 2.28 1.10 1.00 0.9 1.35
Today 16.69 9.44 6.53 5.58 2.92 1.88 1.78 1.5 1.22
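For Example 8, take D = (today) − (3 years ago). The claim that deposits are greater today then becomes Ha: µD > 0, an upper-tailed test with n − 1 = 8 df. The Python sketch below hard-codes t0.05,8 = 1.860 from a t table.

```python
import math
import statistics

three_years_ago = [11.42, 8.41, 3.98, 7.37, 2.28, 1.10, 1.00, 0.9, 1.35]
today = [16.69, 9.44, 6.53, 5.58, 2.92, 1.88, 1.78, 1.5, 1.22]

# Differences D = today - three years ago; Ha: mu_D > 0
d = [new - old for new, old in zip(today, three_years_ago)]
n = len(d)
d_bar, s_d = statistics.fmean(d), statistics.stdev(d)
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

t_crit = 1.860                       # t_{0.05,8} from a t table
print(f"d_bar = {d_bar:.4f}, s_D = {s_d:.4f}, t = {t_stat:.3f}")
print("reject H0" if t_stat >= t_crit else "do not reject H0")
```

This gives t ≈ 1.67 < 1.860, so at α = 0.05 the data do not support the claim that average deposits are greater today.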


Example 9
A dietitian wishes to see if a person’s cholesterol level will change if
the diet is supplemented by a certain mineral. Six randomly selected
subjects were pretested, and then they took the mineral supplement
for a 6-week period. The results are shown in the table. (Cholesterol
level is measured in milligrams per deciliter.) Can it be concluded that
the cholesterol level has been changed at α = 0.10? Assume the
variable is approximately normally distributed.

Subject 1 2 3 4 5 6
Before 210 235 208 190 172 244
After 190 170 210 188 173 228
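For Example 9, take D = before − after. "Changed" calls for a two-tailed test of H0: µD = 0 at α = 0.10 with 5 df. The Python sketch below hard-codes t0.05,5 = 2.015 from a t table.

```python
import math
import statistics

before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]

d = [b - a for b, a in zip(before, after)]   # D = before - after
n = len(d)
d_bar, s_d = statistics.fmean(d), statistics.stdev(d)
t_stat = d_bar / (s_d / math.sqrt(n))

t_crit = 2.015                       # t_{0.05,5}: alpha/2 = 0.05 in each tail
print(f"differences = {d}")
print(f"d_bar = {d_bar:.3f}, s_D = {s_d:.3f}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) >= t_crit else "do not reject H0")
```

Here t ≈ 1.61, which lies inside ±2.015. At α = 0.10 there is therefore not enough evidence to conclude that the cholesterol level changed.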


Confidence Interval for the Mean Difference


The paired t CI for µD with confidence level 100(1 − α)% is
$$\bar d - t_{\alpha/2,n-1}\frac{s_D}{\sqrt{n}} \le \mu_D \le \bar d + t_{\alpha/2,n-1}\frac{s_D}{\sqrt{n}}$$

Example 10
Find the 90% confidence interval for µD for the data in Example 9.
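The 90% interval for Example 10 uses the same differences and t0.05,5 = 2.015, hard-coded from a t table. A self-contained Python sketch:

```python
import math
import statistics

before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]
d = [b - a for b, a in zip(before, after)]   # D = before - after
n = len(d)
d_bar, s_d = statistics.fmean(d), statistics.stdev(d)

t_crit = 2.015                       # t_{alpha/2, n-1} = t_{0.05,5} for 90% confidence
margin = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(f"90% CI for mu_D: ({lower:.2f}, {upper:.2f}) mg/dL")
```

The interval is roughly (−4.22, 37.55) mg/dL. It contains 0, which agrees with the decision not to reject H0 in Example 9 at α = 0.10.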
