
Mathematics Soln

This document discusses estimation and hypothesis testing for the difference between two population means. It provides: 1) Definitions for independent and dependent samples drawn from two populations. 2) Formulas for estimating the difference between two population means (μ1 - μ2) using sample means (X̄ - Ȳ) and determining its standard deviation and sampling distribution. 3) Methods for constructing confidence intervals and performing hypothesis tests to evaluate whether the population means are the same or different, both when the population standard deviations (σ1, σ2) are known and when they are assumed to be equal.

Uploaded by

Duncan
Copyright
© All Rights Reserved

Estimation and Hypothesis Testing: Two Populations (Sections 10.1 and 10.2)

Definition (Independent vs. Dependent Samples)- Two samples are called independent
if they are drawn from two different populations and the elements of one sample have no
relationship to the elements of the second sample. Otherwise, the samples are called dependent.
Example- A car magazine is comparing the total repair costs incurred during the first
three years on two sports cars, the T-999 and the XPY. Random samples of 45 T-999s and 51
XPYs are taken. Let $X_1, X_2, \dots, X_{45}$ be the repair costs for the 45 T-999s and $Y_1, Y_2, \dots, Y_{51}$
be the repair costs for the 51 XPYs. These two samples are independent.
Example- A nutritionist compares the average weights of 50 participants before and after
they went through a weight-loss program. Let $X_1, X_2, \dots, X_{50}$ be the weights before and
$Y_1, Y_2, \dots, Y_{50}$ be the weights after completing the program. These samples are dependent
because they involve the same 50 people.
Consider two populations with population means $\mu_1$ and $\mu_2$ and population SDs $\sigma_1$ and $\sigma_2$.
We would like to perform estimation and hypothesis testing for the difference in the
population means $\mu_1 - \mu_2$. For this purpose, we take a sample $X_1, X_2, \dots, X_{n_1}$ of size $n_1$ from
population 1 and a sample $Y_1, Y_2, \dots, Y_{n_2}$ of size $n_2$ from population 2. Let $\bar X$ and $\bar Y$ be the
sample means of the two samples, i.e.,

$$\bar X = \frac{X_1 + X_2 + \dots + X_{n_1}}{n_1}$$

and

$$\bar Y = \frac{Y_1 + Y_2 + \dots + Y_{n_2}}{n_2}.$$

Then it is easy to see that $\bar X - \bar Y$ serves as an unbiased estimator for $\mu_1 - \mu_2$, i.e.,

$$\mu_{\bar X - \bar Y} = \mu_1 - \mu_2.$$
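
The unbiasedness claim follows from linearity of expectation. The short derivation below is a sketch added for clarity; the notation $E[\cdot]$ for expected value is an assumption and does not appear in the original notes.

```latex
\begin{align*}
\mu_{\bar X - \bar Y} = E[\bar X - \bar Y]
  &= E[\bar X] - E[\bar Y] \\
  &= \frac{1}{n_1}\sum_{i=1}^{n_1} E[X_i] - \frac{1}{n_2}\sum_{j=1}^{n_2} E[Y_j] \\
  &= \frac{n_1 \mu_1}{n_1} - \frac{n_2 \mu_2}{n_2}
   = \mu_1 - \mu_2 .
\end{align*}
```

Note that this step does not require the two samples to be independent; independence only matters for the SD of $\bar X - \bar Y$.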


What can we say about the SD and sampling distribution of X̄ − Ȳ ? When the two samples
are independent, we have the following result:
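Fact- If the two samples are independent, then

$$\sigma_{\bar X - \bar Y} = \sqrt{\sigma_{\bar X}^2 + \sigma_{\bar Y}^2}. \tag{$\star$}$$

Moreover, assume at least one of these two conditions holds:

(1) The population distributions of both population 1 and population 2 are (approximately) normal.
(2) The sample sizes satisfy $n_1 \ge 30$ and $n_2 \ge 30$.

Then $\bar X - \bar Y$ is (approximately) normally distributed, i.e.,

$$\bar X - \bar Y \sim N\left(\mu_{\bar X - \bar Y}, \sigma_{\bar X - \bar Y}^2\right).$$

Remark- If $n_1/N_1 \le 0.05$ and $n_2/N_2 \le 0.05$, where $N_1$ and $N_2$ denote the population sizes, then
$\sigma_{\bar X} = \sigma_1/\sqrt{n_1}$ and $\sigma_{\bar Y} = \sigma_2/\sqrt{n_2}$, and we can write ($\star$) as

$$\sigma_{\bar X - \bar Y} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$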
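
The square-root formula for the SD of $\bar X - \bar Y$ comes from the additivity of variance for independent random variables. The sketch below spells out that reasoning; the symbols $\mathrm{Var}$ and $\mathrm{Cov}$ are notation introduced here and are not used in the original notes.

```latex
\begin{align*}
\sigma_{\bar X - \bar Y}^2 = \mathrm{Var}(\bar X - \bar Y)
  &= \mathrm{Var}(\bar X) + \mathrm{Var}(\bar Y) - 2\,\mathrm{Cov}(\bar X, \bar Y) \\
  &= \sigma_{\bar X}^2 + \sigma_{\bar Y}^2
     && \text{(independence gives } \mathrm{Cov}(\bar X, \bar Y) = 0\text{)} \\
  &= \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
     && \text{(using } \sigma_{\bar X} = \sigma_1/\sqrt{n_1},\ \sigma_{\bar Y} = \sigma_2/\sqrt{n_2}\text{)} .
\end{align*}
```

Taking square roots gives the two expressions for $\sigma_{\bar X - \bar Y}$. For dependent samples the covariance term does not vanish, which is why the formula is restricted to independent samples.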
Remark- Throughout the rest of this chapter we assume the two samples under study are
independent and the populations satisfy one of the conditions provided in the previous Fact.

Interval Estimation of $\mu_1 - \mu_2$: Known $\sigma_1, \sigma_2$

Here is our main result:

Fact- An interval estimator for $\mu_1 - \mu_2$ with confidence level $(1 - \alpha)100\%$ when
$\sigma_1, \sigma_2$ are known is given by $[\bar X - \bar Y - E, \bar X - \bar Y + E]$ where

$$E = z\,\sigma_{\bar X - \bar Y}$$

is the margin of error and $z > 0$ is the unique number that satisfies

$$P(Z < z) = 1 - \alpha/2.$$

Here, $Z \sim N(0, 1)$ is a standard normal random variable.


Example- A car magazine is comparing the total repair costs incurred during the first
three years on two sports cars, the T-999 and the XPY. Random samples of 45 T-999s and
51 XPYs are taken. All 96 cars are 3 years old and have similar mileages. The mean of
repair costs for the 45 T-999 cars is $3300 for the first 3 years. For the 51 XPY cars, this
mean is $3850. Assume that the standard deviations for the two populations are 800 and
1000, respectively. Construct a 97% confidence interval for the difference between the two
population means.
We have

$$\alpha = 0.03,$$

$$n_1 = 45, \quad \bar x = 3300, \quad \sigma_1 = 800$$

and

$$n_2 = 51, \quad \bar y = 3850, \quad \sigma_2 = 1000.$$

We need to find $z > 0$ such that

$$P(Z < z) = 1 - \alpha/2 = 1 - 0.03/2 = 0.985.$$

Using Table IV, we find $z = 2.17$. We also have

$$\sigma_{\bar X - \bar Y} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{800^2}{45} + \frac{1000^2}{51}} = 183.93.$$

Then the margin of error is

$$E = z\,\sigma_{\bar X - \bar Y} = 2.17 \times 183.93 = 399.13.$$

Finally, $\bar x - \bar y = 3300 - 3850 = -550$ and the desired interval estimate is given by

$$[\bar x - \bar y - E, \bar x - \bar y + E] = [-550 - 399.13, -550 + 399.13] = [-949.13, -150.87].$$
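
The interval computation above can be reproduced numerically. The following is a minimal sketch, not part of the original notes: the function name `z_interval_diff_means` is hypothetical, and the critical value is taken from Python's standard-library `statistics.NormalDist` instead of Table IV, so the last digits may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(xbar, ybar, sigma1, sigma2, n1, n2, conf_level):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known.

    Hypothetical helper for illustration; assumes independent samples.
    """
    alpha = 1.0 - conf_level
    # z satisfies P(Z < z) = 1 - alpha/2 for a standard normal Z.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # SD of X-bar minus Y-bar for independent samples.
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    margin = z * se
    diff = xbar - ybar
    return diff - margin, diff + margin, z, se


# Numbers from the car repair cost example (T-999 vs. XPY).
lower, upper, z, se = z_interval_diff_means(3300, 3850, 800, 1000, 45, 51, 0.97)
print(f"z = {z:.4f}, SD of difference = {se:.2f}")
print(f"97% CI: [{lower:.2f}, {upper:.2f}]")
```

Because the whole interval lies below zero, it already suggests that the mean repair cost is lower for the T-999 than for the XPY.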

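Test of Hypothesis on $\mu_1 - \mu_2$: Known $\sigma_1, \sigma_2$

We are usually interested in the two-tailed test

$$\begin{cases} H_0 : \mu_1 = \mu_2 \\ H_1 : \mu_1 \ne \mu_2 \end{cases}$$

Our evidence about $\mu_1 - \mu_2$ is $\bar x - \bar y$. We reject $H_0$ if $\bar x - \bar y$ is sufficiently larger or smaller
than zero, i.e.,

$$\bar x - \bar y \ge t \quad \text{OR} \quad \bar x - \bar y \le -t$$

for some $t > 0$.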


Example- Consider the setup in the previous example. Using a 1% significance level, can
you conclude that such mean repair costs are different for these two types of cars? Follow
the p-value approach.
We proceed in four steps:

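(1) The null and alternative hypotheses are

$$H_0 : \mu_1 = \mu_2, \qquad H_1 : \mu_1 \ne \mu_2.$$

(2) The rejection region is of the form

$$\bar x - \bar y \ge t \quad \text{OR} \quad \bar x - \bar y \le -t$$

for some $t > 0$. The critical points are where $\bar x - \bar y = \pm t$. The relationship between
$\alpha$ and $t$ comes through

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}) = P(\bar X - \bar Y \ge t \text{ OR } \bar X - \bar Y \le -t \mid H_0 \text{ is true}).$$

(3) The evidence lies on the critical points when $\bar x - \bar y = \pm t$, i.e., $3300 - 3850 = \pm t$,
giving $t = 550$. Also, recall that

$$\bar X - \bar Y \sim N\left(\mu_{\bar X - \bar Y}, \sigma_{\bar X - \bar Y}^2\right),$$

where $\mu_{\bar X - \bar Y} = \mu_1 - \mu_2$ and $\sigma_{\bar X - \bar Y} = 183.93$. Under $H_0$, $\mu_1 = \mu_2$ and hence,

$$\bar X - \bar Y \sim N(0, 183.93^2).$$

We have

$$\begin{aligned}
\text{p-value} &= P(\bar X - \bar Y \ge 550 \text{ OR } \bar X - \bar Y \le -550 \mid H_0 \text{ is true}) \\
&= P(\bar X - \bar Y \ge 550 \mid H_0 \text{ is true}) + P(\bar X - \bar Y \le -550 \mid H_0 \text{ is true}) \\
&= P\left(\frac{\bar X - \bar Y - 0}{183.93} \ge \frac{550 - 0}{183.93} \,\middle|\, H_0 \text{ is true}\right) + P\left(\frac{\bar X - \bar Y - 0}{183.93} \le \frac{-550 - 0}{183.93} \,\middle|\, H_0 \text{ is true}\right) \\
&= P(Z \ge 2.99) + P(Z \le -2.99) \\
&= 2P(Z \le -2.99) \\
&= 2 \times 0.0014 \\
&= 0.0028.
\end{aligned}$$

(4) Since p-value $= 0.0028 < \alpha = 0.01$, we reject $H_0$.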
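
The four steps above can be checked with a few lines of code. This is an illustrative sketch under the same assumptions as the example (independent samples, known population SDs); the function name `two_sided_z_test` is hypothetical, and the normal CDF comes from the standard-library `statistics.NormalDist` instead of Table IV.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(xbar, ybar, sigma1, sigma2, n1, n2):
    """Return (z statistic, two-sided p-value) for H0: mu1 = mu2.

    Hypothetical helper for illustration; sigma1 and sigma2 are assumed known.
    """
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Standardize the observed difference under H0 (mean difference 0).
    z_obs = (xbar - ybar) / se
    # P(Z >= |z_obs|) + P(Z <= -|z_obs|) = 2 * P(Z <= -|z_obs|).
    p_value = 2.0 * NormalDist().cdf(-abs(z_obs))
    return z_obs, p_value


# Numbers from the car repair cost example, tested at the 1% level.
z_obs, p_value = two_sided_z_test(3300, 3850, 800, 1000, 45, 51)
alpha = 0.01
print(f"z = {z_obs:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The sign of the statistic is negative here because $\bar x < \bar y$; the two-sided p-value depends only on its absolute value.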



Interval Estimation of $\mu_1 - \mu_2$: Unknown $\sigma_1, \sigma_2$, but $\sigma_1 = \sigma_2$

We denote the common value of $\sigma_1, \sigma_2$ by $\sigma$, i.e., $\sigma_1 = \sigma_2 = \sigma$. Recall that the sample variances
for the two samples are the random variables

$$S_1^2 = \frac{\sum X_i^2 - \frac{(\sum X_i)^2}{n_1}}{n_1 - 1}, \qquad S_2^2 = \frac{\sum Y_i^2 - \frac{(\sum Y_i)^2}{n_2}}{n_2 - 1}.$$

We define the pooled sample variance by

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

We have the following two facts regarding $S_p$:

Fact- $S_p^2$ is an unbiased estimator of $\sigma^2$, i.e.,

$$\mu_{S_p^2} = \sigma^2.$$

Fact- The random variable $\frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ has a Student's t-distribution with $n_1 + n_2 - 2$
degrees of freedom, i.e.,

$$\frac{\bar X - \bar Y - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T_{n_1 + n_2 - 2}.$$
This fact can be used to prove our main result in this section:
Fact- An interval estimator for $\mu_1 - \mu_2$ with confidence level $(1 - \alpha)100\%$ when
$\sigma_1, \sigma_2$ are unknown but satisfy $\sigma_1 = \sigma_2$ is given by $[\bar X - \bar Y - E, \bar X - \bar Y + E]$ where

$$E = t\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

is the margin of error and $t > 0$ is the unique number that satisfies

$$P(T_{n_1 + n_2 - 2} > t) = \alpha/2.$$

Proof- The proof is omitted.


Remark- The textbook introduces the notation $s_{\bar X - \bar Y} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$. We will not adopt
this notation.
Example- A high school counselor wanted to know if tenth-graders at her high school
tend to have the same free time as the twelfth-graders. She took random samples of 25 tenth-
graders and 23 twelfth-graders. Each student was asked to record the amount of free time
he or she had in a typical week. The mean for the tenth-graders was found to be 29 hours of
free time per week with a standard deviation of 7.0 hours. For the twelfth-graders, the mean
was 22 hours of free time per week with a standard deviation of 6.2 hours. Assume that the
two populations are approximately normally distributed with unknown but equal standard
deviations. Make a 90% confidence interval for the difference between the corresponding
population means.
We have

$$\alpha = 0.1,$$

$$n_1 = 25, \quad \bar x = 29, \quad s_1 = 7$$

and

$$n_2 = 23, \quad \bar y = 22, \quad s_2 = 6.2.$$

The pooled SD is given by

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(25 - 1)7^2 + (23 - 1)6.2^2}{25 + 23 - 2}} = 6.63.$$

We need to find $t > 0$ such that $P(T_{25 + 23 - 2} > t) = \alpha/2$, i.e., $P(T_{46} > t) = 0.05$. Using Table
V, we read $t = 1.679$. Then the margin of error is

$$E = t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.679 \times 6.63 \times \sqrt{\frac{1}{25} + \frac{1}{23}} = 3.216.$$

Note that $\bar x - \bar y = 29 - 22 = 7$. The desired interval is

$$[\bar x - \bar y - E, \bar x - \bar y + E] = [7 - 3.216, 7 + 3.216] = [3.784, 10.216].$$
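
The pooled-SD interval can be verified with a short script. This is a sketch only: the helper names `pooled_sd` and `pooled_t_interval` are hypothetical, and the critical value 1.679 is passed in as read from Table V in the example, since Python's standard library has no t-distribution quantile function.

```python
from math import sqrt


def pooled_sd(s1, s2, n1, n2):
    """Pooled sample standard deviation s_p (hypothetical helper)."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t_interval(xbar, ybar, s1, s2, n1, n2, t_crit):
    """CI for mu1 - mu2 assuming equal but unknown population SDs.

    t_crit must be supplied by the caller, e.g. from a t table with
    n1 + n2 - 2 degrees of freedom and upper-tail area alpha/2.
    """
    sp = pooled_sd(s1, s2, n1, n2)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = xbar - ybar
    return diff - margin, diff + margin, sp


# Free-time example: t_crit = 1.679 is the Table V value for 46 degrees
# of freedom and upper-tail area 0.05 (90% confidence).
lower, upper, sp = pooled_t_interval(29, 22, 7.0, 6.2, 25, 23, t_crit=1.679)
print(f"s_p = {sp:.2f}")
print(f"90% CI: [{lower:.3f}, {upper:.3f}]")
```

The script carries the unrounded value of $s_p$ through the calculation, so its endpoints agree with the hand computation only up to rounding.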


MATH 105, Lecture Notes, March 30

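Test of Hypothesis on $\mu_1 - \mu_2$: Unknown $\sigma_1, \sigma_2$, but $\sigma_1 = \sigma_2$

We are often interested in the two-tailed test

$$\begin{cases} H_0 : \mu_1 = \mu_2 \\ H_1 : \mu_1 \ne \mu_2 \end{cases}$$

Our evidence about $\mu_1 - \mu_2$ is $\frac{\bar x - \bar y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$. We reject $H_0$ if this quantity is sufficiently larger or
smaller than zero, i.e.,

$$\frac{\bar x - \bar y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \ge t \quad \text{OR} \quad \frac{\bar x - \bar y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \le -t$$

for some $t > 0$. We note that under $H_0$,

$$\frac{\bar X - \bar Y}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T_{n_1 + n_2 - 2}.$$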

Example- Consider the setup in the previous example. Test at a 5% significance level
whether the two population means are different.
We have

$$\alpha = 0.05,$$

$$n_1 = 25, \quad \bar x = 29, \quad s_1 = 7$$

and

$$n_2 = 23, \quad \bar y = 22, \quad s_2 = 6.2.$$

The pooled SD is given by

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(25 - 1)7^2 + (23 - 1)6.2^2}{25 + 23 - 2}} = 6.63.$$

We take a p-value approach.


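(1) The null and alternative hypotheses are

$$H_0 : \mu_1 = \mu_2, \qquad H_1 : \mu_1 \ne \mu_2.$$

(2) The rejection region is of the form

$$\frac{\bar x - \bar y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \ge t \quad \text{OR} \quad \frac{\bar x - \bar y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \le -t$$

for some $t > 0$. The critical points are where $\frac{\bar x - \bar y}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \pm t$. The relationship
between $\alpha$ and $t$ comes through

$$\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}) = P\left(\frac{\bar X - \bar Y}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \ge t \text{ OR } \frac{\bar X - \bar Y}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \le -t \,\middle|\, H_0 \text{ is true}\right).$$

(3) Recall

$$n_1 = 25, \quad \bar x = 29, \quad n_2 = 23, \quad \bar y = 22, \quad s_p = 6.63.$$

The evidence lies at the critical points when $\frac{29 - 22}{6.63\sqrt{\frac{1}{25} + \frac{1}{23}}} = \pm t$. This gives $\pm 3.654 = \pm t$
and hence, $t = 3.654$. Under $H_0$,

$$\frac{\bar X - \bar Y}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim T_{n_1 + n_2 - 2} = T_{25 + 23 - 2} = T_{46}.$$

Then

$$\begin{aligned}
\text{p-value} &= P(T_{46} \ge 3.654 \text{ OR } T_{46} \le -3.654) \\
&= P(T_{46} \ge 3.654) + P(T_{46} \le -3.654) \\
&= 2P(T_{46} \ge 3.654).
\end{aligned}$$

Looking at Table V, we see that $P(T_{46} \ge 3.277) = 0.001$. Therefore,
$P(T_{46} \ge 3.654) < 0.001$ and we find

$$\text{p-value} < 0.002.$$

(4) Since p-value $< \alpha = 0.05$, we reject $H_0$.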
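
Table V only bounds the p-value, so it can be useful to approximate it numerically. The sketch below is an assumption-laden illustration rather than the method of the notes: it integrates the Student's t density with Simpson's rule (a swapped-in numerical technique, since the standard library has no t-distribution), and the names `t_pdf`, `t_upper_tail`, and `pooled_t_test` are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t].

    steps must be even. By symmetry, P(T >= t) = 0.5 - integral of pdf on [0, t].
    """
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def pooled_t_test(xbar, ybar, s1, s2, n1, n2):
    """Return (t statistic, degrees of freedom, two-sided p-value) for H0: mu1 = mu2.

    Assumes independent samples and equal but unknown population SDs.
    """
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    t_obs = (xbar - ybar) / (sp * sqrt(1 / n1 + 1 / n2))
    p_value = 2.0 * t_upper_tail(abs(t_obs), df)
    return t_obs, df, p_value


# Free-time example, tested at the 5% level.
t_obs, df, p_value = pooled_t_test(29, 22, 7.0, 6.2, 25, 23)
alpha = 0.05
print(f"t = {t_obs:.3f}, df = {df}, p-value = {p_value:.5f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The computed p-value should fall below the table-based bound of 0.002, which is consistent with rejecting $H_0$ in step (4).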
