Chapter 5
Inferences Based on Two Samples
16/11/2023
Basic Assumptions
1. $X_1, X_2, \ldots, X_m$ is a random sample from a distribution with mean
$\mu_1$ and variance $\sigma_1^2$.
2. $Y_1, Y_2, \ldots, Y_n$ is a random sample from a distribution with mean
$\mu_2$ and variance $\sigma_2^2$.
3. The X and Y samples are independent of one another.
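Theorem 1
The expected value of $\bar X - \bar Y$ is $\mu_1 - \mu_2$, so $\bar X - \bar Y$ is an unbiased
estimator of $\mu_1 - \mu_2$. The standard deviation of $\bar X - \bar Y$ is
$$\sigma_{\bar X - \bar Y} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}.$$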
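Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value: $z = \dfrac{\bar x - \bar y - \Delta_0}{\sqrt{\sigma_1^2/m + \sigma_2^2/n}}$

Alternative Hypothesis               Rejection Region for Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$      $R = \{z : z \ge z_\alpha\}$
$H_a: \mu_1 - \mu_2 < \Delta_0$      $R = \{z : z \le -z_\alpha\}$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    $R = \{z : z \le -z_{\alpha/2} \text{ or } z \ge z_{\alpha/2}\}$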
Example 1
Analysis of a random sample consisting of m = 20 specimens of
cold-rolled steel to determine yield strengths resulted in a sample
average strength of x̄ = 29.8 ksi. A second random sample of n = 25
two-sided galvanized steel specimens gave a sample average strength
of ȳ = 37.4 ksi. Assuming that the two yield-strength distributions are
normal with σ1 = 4.0 and σ2 = 5.0 (suggested by a graph in the
article “Zinc-Coated Sheet Steel: An Overview,” Automotive Engr.,
Dec. 1984: 39–43), does the data indicate that the corresponding true
average yield strengths µ1 and µ2 are different? Let’s carry out a test
at significance level α = 0.01.
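The two-tailed z test can be checked numerically with a short sketch. It uses only Python's standard library (`statistics.NormalDist` supplies $\Phi$ and its inverse). The variable names are illustrative choices, not notation from the notes.

```python
from math import sqrt
from statistics import NormalDist

# Example 1 summary values (sigma1 and sigma2 are treated as known)
m, x_bar, sigma1 = 20, 29.8, 4.0   # cold-rolled steel
n, y_bar, sigma2 = 25, 37.4, 5.0   # two-sided galvanized steel
delta0 = 0.0                        # H0: mu1 - mu2 = 0
alpha = 0.01

# Standard deviation of X-bar minus Y-bar (Theorem 1)
sigma_diff = sqrt(sigma1**2 / m + sigma2**2 / n)
z = (x_bar - y_bar - delta0) / sigma_diff

# Two-sided test: reject H0 when |z| >= z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, P-value = {p_value:.2e}")
print("reject H0" if abs(z) >= z_crit else "do not reject H0")
```

Here $|z| \approx 5.66$ is far beyond $z_{0.005} \approx 2.576$. The data therefore indicate that the two true average yield strengths differ.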
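Alternative Hypothesis               $\beta(\Delta') = P(\text{type II error when } \mu_1 - \mu_2 = \Delta')$
$H_a: \mu_1 - \mu_2 > \Delta_0$      $\Phi\left(z_\alpha - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 < \Delta_0$      $1 - \Phi\left(-z_\alpha - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    $\Phi\left(z_{\alpha/2} - \dfrac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \dfrac{\Delta' - \Delta_0}{\sigma}\right)$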
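where $\sigma = \sigma_{\bar X - \bar Y} = \sqrt{\sigma_1^2/m + \sigma_2^2/n}$. Sample sizes $m$ and $n$ can be
determined that will satisfy both $P(\text{type I error}) = \alpha$ and
$P(\text{type II error when } \mu_1 - \mu_2 = \Delta') = \beta$. For an upper-tailed test,
equating the previous expression for $\beta(\Delta')$ to the specified value of $\beta$
gives
$$\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}.$$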
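Setting $m = n$ in the sample-size equation solves it for a single common size. The worked step below rounds up, because a sample size must be a whole number. Starting from the lower-tailed $\beta(\Delta')$ expression gives the same equation: $1 - \Phi(-z_\alpha - (\Delta' - \Delta_0)/\sigma) = \beta$ leads to $(\Delta' - \Delta_0)/\sigma = -(z_\alpha + z_\beta)$, and squaring removes the sign.

$$\frac{\sigma_1^2 + \sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}
\quad\Longrightarrow\quad
m = n = \left\lceil \frac{(\sigma_1^2 + \sigma_2^2)(z_\alpha + z_\beta)^2}{(\Delta' - \Delta_0)^2} \right\rceil$$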
Example 2
Persons having Raynaud’s syndrome are apt to suffer a sudden
impairment of blood circulation in fingers and toes. In an experiment
to study the extent of this impairment, each subject immersed a
forefinger in water and the resulting heat output (cal/cm²/min) was
measured. For m = 10 subjects with the syndrome, the average heat
output was x̄ = 0.64, and for n = 10 nonsufferers, the average output
was ȳ = 2.05. Let µ1 and µ2 denote the true average heat outputs for the
two types of subjects. Assume that the two distributions of heat
output are normal with σ1 = 0.2 and σ2 = 0.4.
(a) Carry out the test H0 : µ1 − µ2 = −1 versus Ha : µ1 − µ2 < −1.
Compute the P-value.
(b) What is the probability of a type II error when the actual
difference between µ1 and µ2 is µ1 − µ2 = −1.2?
(c) Assuming that m = n, what sample sizes are required to ensure
that β = 0.1 when µ1 − µ2 = −1.2?
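The sketch below works through parts (a) to (c) with the standard library's `NormalDist`. It uses x̄ = 0.64 and ȳ = 2.05, the values consistent with the hypotheses being tested.

The slide gives no significance level for parts (b) and (c), which both need $z_\alpha$. The code therefore ASSUMES α = 0.01. Change `alpha` if a different level is intended; the answers to (b) and (c) depend on it.

```python
from math import ceil, sqrt
from statistics import NormalDist

N = NormalDist()  # standard normal: N.cdf is Phi, N.inv_cdf is its inverse

# Example 2 summary values (sigma1 and sigma2 known)
m, x_bar, sigma1 = 10, 0.64, 0.2   # subjects with Raynaud's syndrome
n, y_bar, sigma2 = 10, 2.05, 0.4   # nonsufferers
delta0 = -1.0                       # H0: mu1 - mu2 = -1
alpha = 0.01                        # ASSUMPTION: level not stated on the slide

sigma = sqrt(sigma1**2 / m + sigma2**2 / n)

# (a) Lower-tailed test, so the P-value is Phi(z)
z = (x_bar - y_bar - delta0) / sigma
p_value = N.cdf(z)

# (b) beta(delta') = 1 - Phi(-z_alpha - (delta' - delta0) / sigma)
delta_prime = -1.2
z_alpha = N.inv_cdf(1 - alpha)
beta = 1 - N.cdf(-z_alpha - (delta_prime - delta0) / sigma)

# (c) Common sample size m = n giving beta = 0.1 at delta', rounded up
beta_target = 0.1
z_beta = N.inv_cdf(1 - beta_target)
n_required = ceil((sigma1**2 + sigma2**2) * (z_alpha + z_beta) ** 2
                  / (delta_prime - delta0) ** 2)

print(f"(a) z = {z:.2f}, P-value = {p_value:.4f}")
print(f"(b) beta = {beta:.3f}")
print(f"(c) m = n = {n_required}")
```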
Large-Sample Tests
The assumptions of normal population distributions and known values
of $\sigma_1$ and $\sigma_2$ are unnecessary when both sample sizes are large ($m \ge 30$
and $n \ge 30$). In this case, the Central Limit Theorem guarantees that
$\bar X - \bar Y$ has approximately a normal distribution regardless of the
underlying population distributions. Furthermore, using $S_1^2$ and $S_2^2$ in
place of $\sigma_1^2$ and $\sigma_2^2$ gives a variable whose distribution is approximately
standard normal:
$$Z = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}}$$
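The large-sample statistic and its P-value can be sketched as a small helper. The function name `large_sample_z` and its `alternative` parameter are hypothetical choices for illustration. The numbers in the call are made-up placeholders, not data from any example in these notes.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_z(x_bar, s1, m, y_bar, s2, n, delta0=0.0, alternative="two-sided"):
    """Large-sample z statistic for H0: mu1 - mu2 = delta0, plus its P-value.

    s1 and s2 replace sigma1 and sigma2, which is reasonable only when
    m and n are both large (the notes use m, n >= 30).
    """
    se = sqrt(s1**2 / m + s2**2 / n)
    z = (x_bar - y_bar - delta0) / se
    N = NormalDist()
    if alternative == "greater":        # Ha: mu1 - mu2 > delta0
        p = 1 - N.cdf(z)
    elif alternative == "less":         # Ha: mu1 - mu2 < delta0
        p = N.cdf(z)
    elif alternative == "two-sided":    # Ha: mu1 - mu2 != delta0
        p = 2 * (1 - N.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# Placeholder inputs, chosen only to show the call
z, p = large_sample_z(x_bar=52.0, s1=8.0, m=40, y_bar=48.0, s2=10.0, n=50,
                      alternative="greater")
print(f"z = {z:.3f}, P-value = {p:.4f}")
```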
Example 3
Are male college students more easily bored than their female
counterparts? This question was examined in the article “Boredom in
Young Adults–Gender and Cultural Comparisons” (J. Cross-Cult.
Psych., 1991: 209–223). The authors administered a scale called the
Boredom Proneness Scale to 97 male and 148 female U.S. college
students. Does the accompanying data support the research
hypothesis that the mean Boredom Proneness Rating is higher for men
than for women? Test the appropriate hypotheses using a 0.05
significance level.
If both m and n are large, the CLT implies that the interval
$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\sigma_1^2/m + \sigma_2^2/n}$ is valid
even without the assumption of normal populations; in this case, the
confidence level is approximately $100(1 - \alpha)\%$. Furthermore, using
the sample variances $S_1^2$ and $S_2^2$ in the standardized variable $Z$ yields a
valid interval in which $s_1^2$ and $s_2^2$ replace $\sigma_1^2$ and $\sigma_2^2$.
Provided that $m$ and $n$ are both large, a CI for $\mu_1 - \mu_2$ with a
confidence level of approximately $100(1 - \alpha)\%$ is
$$\bar x - \bar y \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}.$$
AMS (ITC) Inferences Based on Two Samples 16/11/2023 10 / 25
Z Tests for a Difference Between Two Population Means
Example 4
Two brands of batteries are tested, and their voltages are compared.
The summary statistics follow. Find the 95% confidence interval of the
true difference in the means. Assume that both variables are normally
distributed.
               Brand X            Brand Y
Sample mean    x̄1 = 9.2 volts     x̄2 = 8.8 volts
Std. dev.      σ1 = 0.3 volt      σ2 = 0.1 volt
Sample size    n1 = 27            n2 = 30
What does your interval say about the claim that there is no difference
in mean voltages?
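Because σ1 and σ2 are given and both populations are normal, a z interval applies. The sketch below computes it with the standard library. Names such as `margin` are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Example 4 summary values (population standard deviations given)
x1_bar, sigma1, n1 = 9.2, 0.3, 27   # Brand X
x2_bar, sigma2, n2 = 8.8, 0.1, 30   # Brand Y
conf = 0.95

z_half = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}, about 1.96
margin = z_half * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
diff = x1_bar - x2_bar
lower, upper = diff - margin, diff + margin

print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f}) volts")
print("0 is inside the interval" if lower <= 0 <= upper else "0 is outside the interval")
```

The interval is roughly (0.281, 0.519) volts. It lies entirely above 0, so it does not support the claim that there is no difference in mean voltages.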
Assumption
Both population distributions are normal, so that $X_1, X_2, \ldots, X_m$ is a
random sample from a normal distribution and so is $Y_1, \ldots, Y_n$ (with
the X's and Y's independent of one another). The plausibility of these
assumptions can be judged by constructing a normal probability plot of
the $x_i$'s and another of the $y_i$'s.
The test statistic and confidence interval formula are based on the
same standardized variable developed in Section 1, but the relevant
distribution is now t rather than z.
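Theorem
When both population distributions are normal (the variances $\sigma_1^2$ and
$\sigma_2^2$ need not be equal), the standardized variable
$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{m} + \dfrac{S_2^2}{n}}} \tag{1}$$
has approximately a t distribution with $\nu$ degrees of freedom, where $\nu$ is
estimated from the data by
$$\nu = \frac{\left(\dfrac{s_1^2}{m} + \dfrac{s_2^2}{n}\right)^2}{\dfrac{(s_1^2/m)^2}{m - 1} + \dfrac{(s_2^2/n)^2}{n - 1}},$$
rounded down to the nearest integer.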
Example 5
A researcher wishes to see if the average weights of newborn male
infants are different from the average weights of newborn female
infants. She selects a random sample of 10 male infants and finds the
mean weight is 7 pounds 11 ounces and the standard deviation of the
sample is 8 ounces. She selects a random sample of 8 female infants
and finds that the mean weight is 7 pounds 4 ounces and the standard
deviation of the sample is 5 ounces. Can it be concluded at α = 0.05
that the mean weight of the males is different from the mean weight
of the females? Assume that the variables are normally distributed.
Example 6
Find the 95% confidence interval for the true difference in means for
the data in Example 5.
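The sketch below covers Examples 5 and 6 together. It computes the unequal-variance t statistic and its estimated degrees of freedom, with the df rounded down (the usual Welch–Satterthwaite convention). Weights are converted to ounces first.

Python's standard library has no t distribution. The critical value $t_{0.025,15} = 2.131$ is therefore hard-coded from a standard t table. It is only valid if the computed df is 15. The helper name `welch_t` is a hypothetical choice for illustration.

```python
from math import floor, sqrt

def welch_t(x_bar, s1, m, y_bar, s2, n, delta0=0.0):
    """Two-sample t statistic (variances not assumed equal) and its
    estimated degrees of freedom, rounded down to an integer."""
    v1, v2 = s1**2 / m, s2**2 / n
    t = (x_bar - y_bar - delta0) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (m - 1) + v2**2 / (n - 1))
    return t, floor(nu)

# Example 5 in ounces (1 lb = 16 oz)
male_mean, male_sd, m = 7 * 16 + 11, 8.0, 10
female_mean, female_sd, n = 7 * 16 + 4, 5.0, 8

t, nu = welch_t(male_mean, male_sd, m, female_mean, female_sd, n)
print(f"Example 5: t = {t:.3f}, df = {nu}")

# Example 6: 95% CI = (x_bar - y_bar) +/- t_{0.025, nu} * SE
t_crit = 2.131   # t_{0.025, 15} from a t table; assumes nu == 15
se = sqrt(male_sd**2 / m + female_sd**2 / n)
diff = male_mean - female_mean
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"Example 6: 95% CI = ({lower:.2f}, {upper:.2f}) ounces")
```

For the two-sided test at α = 0.05, compare $|t|$ with the same $t_{0.025,\nu}$.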
Pooled t Procedures
Alternatives to the two-sample t procedures just described result from
assuming not only that the two population distributions are normal but
also that they have equal variances (σ12 = σ22 ). That is, the two
population distribution curves are assumed normal with equal spreads,
the only possible difference between them being where they are
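centered. The standardized variable
$$T = \frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}, \qquad S_p^2 = \frac{(m - 1)S_1^2 + (n - 1)S_2^2}{m + n - 2},$$
has a t distribution with $m + n - 2$ degrees of freedom.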
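A minimal sketch of the pooled computation follows. The helper name `pooled_t` is hypothetical. Purely to show the arithmetic, it reuses Example 5's summary statistics in ounces. With sample standard deviations of 8 and 5, the equal-variance assumption is questionable for that data.

```python
from math import sqrt

def pooled_t(x_bar, s1, m, y_bar, s2, n, delta0=0.0):
    """Pooled t statistic assuming equal population variances.
    Returns (t, s_p, df) with df = m + n - 2."""
    sp2 = ((m - 1) * s1**2 + (n - 1) * s2**2) / (m + n - 2)  # pooled variance
    sp = sqrt(sp2)
    t = (x_bar - y_bar - delta0) / (sp * sqrt(1 / m + 1 / n))
    return t, sp, m + n - 2

# Illustration only: Example 5 summary statistics (ounces)
t, sp, df = pooled_t(123, 8.0, 10, 116, 5.0, 8)
print(f"t = {t:.3f}, s_p = {sp:.3f}, df = {df}")
```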
Example 7
Consider the pooled T variable
$$T = \frac{(\bar X - \bar Y) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
(b) Use the pooled t formula from part (a) to estimate the difference
between true average outputs for the two brands with a 95%
confidence interval.
(c) Estimate the difference between the two µ’s using the two-sample
t interval discussed in this section, and compare it to the interval
of part (b).
(d) Carry out the pooled t test H0 : µ1 − µ2 = 0 versus
Ha : µ1 − µ2 ̸= 0 at α = 0.05.
Assumptions
$X_1, X_2, \ldots, X_n$ is a random sample from $N(\mu_1, \sigma_1^2)$.
$Y_1, Y_2, \ldots, Y_n$ is a random sample from $N(\mu_2, \sigma_2^2)$.
The X and Y samples are dependent.
The data consist of n independently selected pairs
$(X_1, Y_1), \ldots, (X_n, Y_n)$ with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$. Let
$D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$, so the $D_i$'s are the differences within
pairs. Then the $D_i$'s are assumed to be normally distributed with
mean value $\mu_D$ and variance $\sigma_D^2$ (this is usually a consequence of the
$X_i$'s and $Y_i$'s themselves being normally distributed).
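Since $\mu_D = E(X_i) - E(Y_i) = \mu_1 - \mu_2$, inference about $\mu_1 - \mu_2$ reduces to a one-sample t procedure on the observed differences $d_1, \ldots, d_n$. These are the standard paired t formulas; the slides do not spell them out. Here $\bar d$ and $s_D$ are the sample mean and sample standard deviation of the differences:

$$t = \frac{\bar d - \Delta_0}{s_D/\sqrt{n}} \quad (\text{df} = n - 1), \qquad
\bar d \pm t_{\alpha/2,\,n-1}\,\frac{s_D}{\sqrt{n}}.$$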
Example 8
Bank deposits. A random sample of nine local banks shows their
deposits (in billions of dollars) 3 years ago and today. At α = 0.05,
can it be concluded that the average deposits for the banks are greater
today than they were 3 years ago? Assume the variable is normally
distributed.
Bank           1      2     3     4     5     6     7     8     9
3 years ago  11.42  8.41  3.98  7.37  2.28  1.10  1.00  0.90  1.35
Today        16.69  9.44  6.53  5.58  2.92  1.88  1.78  1.50  1.22
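A sketch of the paired t test for Example 8, using `statistics.mean` and `statistics.stdev` (the latter divides by n − 1). The differences are taken as today minus 3 years ago, so the alternative is $\mu_D > 0$. The critical value $t_{0.05,8} = 1.860$ comes from a standard t table, because the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Example 8 deposits (billions of dollars), banks 1 to 9
three_years_ago = [11.42, 8.41, 3.98, 7.37, 2.28, 1.10, 1.00, 0.90, 1.35]
today = [16.69, 9.44, 6.53, 5.58, 2.92, 1.88, 1.78, 1.50, 1.22]

# Differences within pairs: D = today - 3 years ago, so Ha is mu_D > 0
d = [a - b for a, b in zip(today, three_years_ago)]
n = len(d)
d_bar, s_d = mean(d), stdev(d)          # stdev uses the n - 1 divisor
t_stat = d_bar / (s_d / sqrt(n))        # H0: mu_D = 0

# Upper-tailed test at alpha = 0.05 with n - 1 = 8 df
t_crit = 1.860                          # t_{0.05, 8} from a t table
print(f"d_bar = {d_bar:.4f}, s_D = {s_d:.4f}, t = {t_stat:.3f}")
print("reject H0" if t_stat >= t_crit else "do not reject H0")
```

Here $t \approx 1.674 < 1.860$, so H0 is not rejected at α = 0.05.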
Example 9
A dietitian wishes to see if a person’s cholesterol level will change if
the diet is supplemented by a certain mineral. Six randomly selected
subjects were pretested, and then they took the mineral supplement
for a 6-week period. The results are shown in the table. (Cholesterol
level is measured in milligrams per deciliter.) Can it be concluded that
the cholesterol level has been changed at α = 0.10? Assume the
variable is approximately normally distributed.
Subject 1 2 3 4 5 6
Before 210 235 208 190 172 244
After 190 170 210 188 173 228
Example 10
Find the 90% confidence interval for µD for the data in Example 9.
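Examples 9 and 10 use the same differences, so one sketch covers both. It takes D = before − after and runs a two-sided test at α = 0.10, then builds the matching 90% interval. Both need $t_{0.05,5} = 2.015$, which is hard-coded from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev

before = [210, 235, 208, 190, 172, 244]   # mg/dL
after = [190, 170, 210, 188, 173, 228]

d = [b - a for b, a in zip(before, after)]   # D = before - after
n = len(d)
d_bar, s_d = mean(d), stdev(d)
se = s_d / sqrt(n)

# Example 9: two-sided test of H0: mu_D = 0 at alpha = 0.10, df = n - 1 = 5
t_stat = d_bar / se
t_crit = 2.015   # t_{0.05, 5} from a t table
print(f"t = {t_stat:.3f}:", "reject H0" if abs(t_stat) >= t_crit else "do not reject H0")

# Example 10: 90% CI for mu_D with the same t_{0.05, 5}
lower, upper = d_bar - t_crit * se, d_bar + t_crit * se
print(f"90% CI for mu_D: ({lower:.2f}, {upper:.2f})")
```

The interval (about −4.22 to 37.55) contains 0. This is consistent with not rejecting H0 in Example 9.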