Stats Coursework 2

The document describes a study conducted by a company that produces plastic water storage tanks of varying materials and capacities. The key points are: 1) Data on the tank capacities were collected and tested to determine whether the average capacity was equal to the expected 4400 L; this hypothesis was rejected at the 5% level. 2) The capacities of the three plastic materials were tested for differences and found not to be significantly different. 3) A simulation changed the tank capacities using random variables, and when re-tested the materials were found to have significantly different capacities. 4) The expected capacities under the new production process were calculated.

Uploaded by

Ricardo Mateus

Problem 1

A company specialises in the production of plastic tanks for water storage. These have an irregular
shape, but the production process guarantees a capacity of approximately 4400 L of water for each
tank. The company’s production analyst collects information on tanks of three different materials,
plastic A, plastic B and plastic C, in the form of amount of water filling individual containers, in litres.
Data collected are included in Sheet1 of the Excel workbook “[Link]”.

(a) Test the hypothesis that, irrespective of the type of plastic, a container has an average capacity
equal to 4400 L, at the 5% significance level.

The null and alternative hypotheses are:

$$H_0: \mu = 4400 \qquad H_1: \mu \neq 4400$$

The significance level is $\alpha = 0.05$ and the sample size is $n = 105$ (35 containers of each plastic). The sample mean of all containers, calculated in Excel, is $\bar{x} = 4401.01$.

Since the population standard deviation is not known, we estimate it with the sample standard deviation, $s = 4.46$, and use Student's t distribution with $n - 1$ degrees of freedom. The test statistic is:

$$T_s = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4401.01 - 4400}{4.46/\sqrt{105}} = 2.32$$

The alternative hypothesis states that the mean differs from 4400, so it may be higher or lower, which makes this a two-tailed test. The critical values for the left and right tails are:

$$T_c = \pm t_{105-1,\,0.05/2} = \pm t_{104,\,0.025} = \pm 1.98$$

As $|T_s| > |T_c|$, we reject the null hypothesis at the 5% significance level.
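The calculation in part (a) can be checked with a few lines of Python. This is a minimal sketch that uses only the summary values quoted above; the function name `one_sample_t` is illustrative, and the critical value is taken from the text rather than computed.

```python
import math

def one_sample_t(sample_mean, mu0, s, n):
    """Return the t statistic (x_bar - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Summary values quoted in part (a).
t_stat = one_sample_t(sample_mean=4401.01, mu0=4400, s=4.46, n=105)
t_crit = 1.98  # two-tailed critical value for 104 df, as quoted in the text

# Two-tailed decision rule: reject when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```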

(b) Test the hypothesis that the three different types of container have the same capacity, at the 5%
significance level.

The null and alternative hypotheses are:

$$H_0: \mu_A = \mu_B = \mu_C$$
$$H_1: \text{not all of } \mu_A, \mu_B, \mu_C \text{ are equal}$$

There are $g = 3$ groups (plastic A, plastic B and plastic C), each with a sample size of 35, so $n_A = n_B = n_C = 35$ and $N = 105$. The significance level is $\alpha = 0.05$.

The next step is to calculate the sample mean and sample variance for each type of container. The overall sample mean is the same as in part (a), $\bar{x} = 4401.01$.

Plastic A: $\bar{x}_A = 4400.82$ and $s_A^2 = 12.62$

Plastic B: $\bar{x}_B = 4400.65$ and $s_B^2 = 20.29$

Plastic C: $\bar{x}_C = 4401.54$ and $s_C^2 = 27.55$

Using these results, we calculate the sum of squares between groups (SSB) and the sum of squares due to error (SSE):

$$SSB \equiv \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})^2 = 15.45$$

$$SSE \equiv \sum_{i=1}^{g} (n_i - 1)s_i^2 = 2055.54$$

Next, we calculate the degrees of freedom associated with each sum of squares in order to obtain the mean square between groups (MSB) and the mean square error (MSE):

$$df_{SSB} = g - 1 = 2 \quad \text{and} \quad df_{SSE} = N - g = 102$$

and thus:

$$MSB = \frac{SSB}{g - 1} = \frac{15.45}{2} = 7.72$$

$$MSE = \frac{SSE}{N - g} = \frac{2055.54}{102} = 20.15$$

The test statistic for the F distribution is:

$$F = \frac{MSB}{MSE} = \frac{7.72}{20.15} = 0.383$$

Lastly, the critical value of the F distribution with 2 and 102 degrees of freedom at the 5% level is:

$$F_c = F_{0.05;\,2,\,102} = 3.09$$

As $F < F_c$, we cannot reject the null hypothesis at the 5% significance level.
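The one-way ANOVA in part (b) only needs each group's size, mean and variance, so it can be sketched directly from the summary values. The function name `anova_from_summary` is illustrative. Because the inputs are rounded to two decimals, the results agree with the Excel figures only approximately.

```python
def anova_from_summary(sizes, means, variances):
    """One-way ANOVA from per-group n, mean and sample variance.

    Returns (SSB, SSE, F).
    """
    total_n = sum(sizes)
    g = len(sizes)
    # The grand mean is the size-weighted average of the group means.
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / total_n
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    sse = sum((n - 1) * v for n, v in zip(sizes, variances))
    msb = ssb / (g - 1)
    mse = sse / (total_n - g)
    return ssb, sse, msb / mse

ssb, sse, f_stat = anova_from_summary(
    sizes=[35, 35, 35],
    means=[4400.82, 4400.65, 4401.54],
    variances=[12.62, 20.29, 27.55],
)
f_crit = 3.09  # critical value for (2, 102) df at 5%, as quoted in the text
reject_h0 = f_stat > f_crit
print(f"SSB = {ssb:.2f}, SSE = {sse:.2f}, F = {f_stat:.3f}, reject H0: {reject_h0}")
```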

(c) Simulate a change in the production process. The original capacities $C_A$, $C_B$, $C_C$ of the different containers change into the respective capacities $C'_A$, $C'_B$, $C'_C$, according to the following formulas:

$$C'_A = C_A + 500a \qquad C'_B = C_B + 1000b \qquad C'_C = C_C + 1500c$$


where a, b and c are uniform random numbers extracted, respectively, from rows 41 to 75 of data in
column A, rows 231 to 265 of data in column B and rows 871 to 905 of data in column C of Sheet3 in
the Excel workbook “[Link]”.
The workings for this part are in columns R, S and T of the Excel sheet “Problem 1”.
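The transformation itself is a single element-wise operation. The sketch below illustrates it for plastic A only. The capacities and uniform draws are hypothetical stand-ins generated in code, since the real values are in the Excel workbook; the function name `shift_capacities` is also an assumption.

```python
import random

def shift_capacities(capacities, uniforms, scale):
    """Apply C' = C + scale * u element-wise."""
    return [c + scale * u for c, u in zip(capacities, uniforms)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-ins for the 35 plastic A capacities and the 35 uniform
# random numbers; the actual data come from the workbook, not from this code.
original_a = [rng.gauss(4400, 4.5) for _ in range(35)]
uniform_a = [rng.random() for _ in range(35)]

new_a = shift_capacities(original_a, uniform_a, scale=500)
print(f"mean before: {sum(original_a) / 35:.1f}, mean after: {sum(new_a) / 35:.1f}")
```

Plastics B and C follow the same pattern with `scale=1000` and `scale=1500`.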

(d) Carry out the same test of part (b) on the containers built with the new production process.

The hypotheses to be tested are:

$$H_0: \mu'_A = \mu'_B = \mu'_C$$

$$H_1: \text{not all of } \mu'_A, \mu'_B, \mu'_C \text{ are equal}$$

There are $g = 3$ groups (new plastic A, new plastic B and new plastic C), each with a sample size of 35, so $n'_A = n'_B = n'_C = 35$ and $N = 105$. The significance level is $\alpha = 0.05$.

The next step is to calculate the sample mean and sample variance for each type of container. The overall sample mean of the new capacities is $\bar{x}' = 4899.57$.

Plastic A: $\bar{x}'_A = 4691.52$ and $s'^2_A = 24328.72$

Plastic B: $\bar{x}'_B = 4862.96$ and $s'^2_B = 111906.5$

Plastic C: $\bar{x}'_C = 5144.25$ and $s'^2_C = 194118.7$

Using these results, we calculate the sum of squares between groups (SSB) and the sum of squares due to error (SSE):

$$SSB \equiv \sum_{i=1}^{g} n_i(\bar{x}'_i - \bar{x}')^2 = 3{,}657{,}231.12$$

$$SSE \equiv \sum_{i=1}^{g} (n_i - 1)s'^2_i = 11{,}232{,}032.79$$

Next, we calculate the degrees of freedom associated with each sum of squares in order to obtain the mean square between groups (MSB) and the mean square error (MSE):

$$df_{SSB} = g - 1 = 2 \quad \text{and} \quad df_{SSE} = N - g = 102$$

and thus:

$$MSB = \frac{SSB}{g - 1} = \frac{3{,}657{,}231.12}{2} = 1{,}828{,}615.56$$

$$MSE = \frac{SSE}{N - g} = \frac{11{,}232{,}032.79}{102} = 110{,}117.97$$

The test statistic for the F distribution is:

$$F = \frac{MSB}{MSE} = \frac{1{,}828{,}615.56}{110{,}117.97} = 16.61$$

Lastly, the critical value of the F distribution with 2 and 102 degrees of freedom at the 5% level is:

$$F_c = F_{0.05;\,2,\,102} = 3.09$$

As $F > F_c$, we reject the null hypothesis at the 5% significance level.

(e) What are the expected capacities of the new containers?

As the random numbers a, b and c are drawn from a uniform distribution between 0 and 1, each has an expected value of 0.5. Taking the nominal capacity of 4400 L as the expected original capacity, the expected capacities for each plastic are:

$$E(C'_A) = 4400 + 500(0.5) = 4650$$

$$E(C'_B) = 4400 + 1000(0.5) = 4900$$

$$E(C'_C) = 4400 + 1500(0.5) = 5150$$

(f) Using the values found in part (e), test your hypothesis that the containers have average values
equal to the expected capacities found, at the 5% significance level.

New plastic A

The null and alternative hypotheses are:

$$H_0: \mu'_A = 4650 \qquad H_1: \mu'_A \neq 4650$$

The significance level is $\alpha = 0.05$ and the sample size is $n = 35$. The sample mean for the new containers of plastic A is $\bar{x} = 4691.52$.

Since the population standard deviation is not known, we estimate it with the sample standard deviation, $s = 155.98$, and use Student's t distribution. The test statistic is:

$$T_s = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4691.52 - 4650}{155.98/\sqrt{35}} = 1.57$$

The alternative hypothesis states that the mean differs from 4650, so it may be higher or lower, which makes this a two-tailed test. The critical values for the left and right tails are:

$$T_c = \pm t_{35-1,\,0.05/2} = \pm t_{34,\,0.025} = \pm 2.03$$

As $|T_s| < |T_c|$, we cannot reject the null hypothesis at the 5% significance level.

New plastic B

The null and alternative hypotheses are:

$$H_0: \mu'_B = 4900 \qquad H_1: \mu'_B \neq 4900$$

The significance level is $\alpha = 0.05$ and the sample size is $n = 35$. The sample mean for the new containers of plastic B is $\bar{x} = 4862.96$.

Since the population standard deviation is not known, we estimate it with the sample standard deviation, $s = 334.52$, and use Student's t distribution. The test statistic is:

$$T_s = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{4862.96 - 4900}{334.52/\sqrt{35}} = -0.65$$

The alternative hypothesis states that the mean differs from 4900, so it may be higher or lower, which makes this a two-tailed test. The critical values for the left and right tails are:

$$T_c = \pm t_{35-1,\,0.05/2} = \pm t_{34,\,0.025} = \pm 2.03$$

As $|T_s| < |T_c|$, we cannot reject the null hypothesis at the 5% significance level.

New plastic C

The null and alternative hypotheses are:

$$H_0: \mu'_C = 5150 \qquad H_1: \mu'_C \neq 5150$$

The significance level is $\alpha = 0.05$ and the sample size is $n = 35$. The sample mean for the new containers of plastic C is $\bar{x} = 5144.25$.

Since the population standard deviation is not known, we estimate it with the sample standard deviation, $s = 440.59$, and use Student's t distribution. The test statistic is:

$$T_s = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{5144.25 - 5150}{440.59/\sqrt{35}} = -0.077$$

The alternative hypothesis states that the mean differs from 5150, so it may be higher or lower, which makes this a two-tailed test. The critical values for the left and right tails are:

$$T_c = \pm t_{35-1,\,0.05/2} = \pm t_{34,\,0.025} = \pm 2.03$$

As $|T_s| < |T_c|$, we cannot reject the null hypothesis at the 5% significance level.
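The three tests in part (f) share the same structure, so they can be run in one loop. This is a sketch based on the summary values quoted above; the critical value 2.03 is taken from the text rather than computed.

```python
import math

# (label, sample mean, hypothesised mean, sample standard deviation) from part (f)
cases = [
    ("A", 4691.52, 4650, 155.98),
    ("B", 4862.96, 4900, 334.52),
    ("C", 5144.25, 5150, 440.59),
]
n = 35
t_crit = 2.03  # two-tailed critical value, 34 df, alpha = 0.05 (from the text)

results = {}
for label, x_bar, mu0, s in cases:
    t_stat = (x_bar - mu0) / (s / math.sqrt(n))
    rejected = abs(t_stat) > t_crit
    results[label] = (t_stat, rejected)
    print(f"Plastic {label}: t = {t_stat:.3f}, reject H0: {rejected}")
```

None of the three statistics exceeds the critical value in absolute terms, which matches the conclusions above.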

Problem 2

Four varieties of wheat are grown at a large farm. The height of each spike in cm can depend on the
type of wheat and the type of fertiliser used.

(a) 5 observations of spikes’ height for each combination of type-of-wheat and type-of-fertiliser are
presented on Sheet2 of the Excel workbook “[Link]”. Using these data, calculate the following
quantities related to the analysis of variance with replication

𝑥̅ 𝑥̅𝑖.. 𝑥̅.𝑗. 𝑥̅𝑖𝑗.

𝑆𝑆𝑇, 𝑆𝑆𝑅, 𝑆𝑆𝐶, 𝑆𝑆𝑅𝐶

which have been introduced and explained in Lecture 09.

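The first quantity is the overall sample mean $\bar{x}$. With $g = 3$ fertilisers (rows), $h = 4$ wheat types (columns) and $r = 5$ observations per combination, it is calculated as follows:

$$\bar{x} \equiv \frac{1}{ghr}\sum_{i=1}^{g}\sum_{j=1}^{h}\sum_{k=1}^{r} x_{ijk} = 145.57$$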

Then we calculate the average of each row, $\bar{x}_{i..}$, which is the average for each fertiliser regardless of the wheat type:

$$\bar{x}_{i..} \equiv \frac{1}{hr}\sum_{j=1}^{h}\sum_{k=1}^{r} x_{ijk}$$

Fertiliser 1: $\bar{x}_{1..} = 136.30$

Fertiliser 2: $\bar{x}_{2..} = 145.32$

Fertiliser 3: $\bar{x}_{3..} = 155.06$

Then we calculate the average of each column, $\bar{x}_{.j.}$, which is the average for each wheat type regardless of the fertiliser:

$$\bar{x}_{.j.} \equiv \frac{1}{gr}\sum_{i=1}^{g}\sum_{k=1}^{r} x_{ijk}$$

Wheat 1: $\bar{x}_{.1.} = 136.98$

Wheat 2: $\bar{x}_{.2.} = 146.75$

Wheat 3: $\bar{x}_{.3.} = 147.26$

Wheat 4: $\bar{x}_{.4.} = 151.26$

The last mean is $\bar{x}_{ij.}$, the average for a specific wheat type with a specific fertiliser:

$$\bar{x}_{ij.} \equiv \frac{1}{r}\sum_{k=1}^{r} x_{ijk}$$

               Wheat 1   Wheat 2   Wheat 3   Wheat 4
Fertiliser 1    123.80    138.71    141.17    141.52
Fertiliser 2    136.70    149.63    142.61    152.36
Fertiliser 3    150.44    151.91    157.98    159.91

The workings for SST, SSR, SSC and SSRC are in the Excel sheet “Problem 2”. I provide the formulas here, but the detailed workings are long and consist of many simple additions. The results are in cells H25, H26, H27 and H28 and are the following:

$$SST \equiv \sum_{i=1}^{g}\sum_{j=1}^{h}\sum_{k=1}^{r} (x_{ijk} - \bar{x})^2 = 11295.80$$

$$SSR \equiv hr\sum_{i=1}^{g} (\bar{x}_{i..} - \bar{x})^2 = 3521.05$$

$$SSC \equiv gr\sum_{j=1}^{h} (\bar{x}_{.j.} - \bar{x})^2 = 1656.59$$

$$SSRC \equiv r\sum_{i=1}^{g}\sum_{j=1}^{h} (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x})^2 = 473.84$$
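SSR, SSC and SSRC depend only on the cell means, so they can be re-derived from the twelve averages reported in part (a). The sketch below does this; SST is not included because it needs the raw observations, which are in the workbook. As the cell means are rounded to two decimals, the output matches the Excel values only approximately.

```python
# Cell means from part (a): rows are fertilisers, columns are wheat types.
cell_means = [
    [123.80, 138.71, 141.17, 141.52],
    [136.70, 149.63, 142.61, 152.36],
    [150.44, 151.91, 157.98, 159.91],
]
r = 5                    # observations per fertiliser/wheat combination
g = len(cell_means)      # number of fertilisers (rows)
h = len(cell_means[0])   # number of wheat types (columns)

row_means = [sum(row) / h for row in cell_means]
col_means = [sum(cell_means[i][j] for i in range(g)) / g for j in range(h)]
grand_mean = sum(row_means) / g  # valid because every cell has the same size

ssr = h * r * sum((m - grand_mean) ** 2 for m in row_means)
ssc = g * r * sum((m - grand_mean) ** 2 for m in col_means)
ssrc = r * sum(
    (cell_means[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
    for i in range(g)
    for j in range(h)
)
print(f"SSR = {ssr:.2f}, SSC = {ssc:.2f}, SSRC = {ssrc:.2f}")
```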

(b) Carry out the analysis of variance corresponding to the calculations in part (a), using an α = 0.04
significance level. Do data provide sufficient evidence to indicate an interaction effect?

To test for an interaction effect, we use the quantities calculated in part (a).

We are interested in SSRC, the sum of squares for the interaction, and its test statistic. To obtain it, we first need MSRC and MSE:

$$MSRC = \frac{SSRC}{(g-1)(h-1)} = \frac{473.84}{6} = 78.97$$

SSE, the sum of squares for error, has not yet been calculated. However, it can be obtained from the other results by rearranging the decomposition of SST:

$$SST = SSE + SSR + SSC + SSRC$$

Rearranging gives:

$$SSE = SST - SSR - SSC - SSRC = 11295.80 - 3521.05 - 1656.59 - 473.84 = 5644.32$$

With SSE we can now calculate MSE:

$$MSE = \frac{SSE}{gh(r-1)} = \frac{5644.32}{48} = 117.59$$

Now we can compute the test statistic for the interaction:

$$F = \frac{MSRC}{MSE} = \frac{78.97}{117.59} = 0.672$$

The critical value uses the degrees of freedom of MSRC and MSE: $F_c = F_{0.04;\,6,\,48} = 2.420$

As $F < F_c$, we cannot reject the null hypothesis of no interaction at the 4% significance level. Therefore, the data do not provide sufficient evidence of an interaction effect between the treatments.

(c) Would you use these data to justify a difference among population means across the type-of-
wheat treatment, at an α = 0.04 significance level? Would you use the same data to justify a
difference among population means across the type-of-fertiliser treatment, still at an α = 0.04
significance level?

Type-of-wheat treatment

To test whether the data justify a difference among population means across the type-of-wheat treatment, we compute the test statistic based on SSC. We first calculate MSC:

$$MSC = \frac{SSC}{h-1} = \frac{1656.59}{3} = 552.20$$

We then use the MSE calculated in part (b) to obtain the test statistic:

$$F = \frac{MSC}{MSE} = \frac{552.20}{117.59} = 4.696$$

The critical value uses the degrees of freedom of MSC and MSE: $F_c = F_{0.04;\,3,\,48} = 2.992$

As $F > F_c$, we reject the null hypothesis at the 4% significance level. Therefore, there is evidence of a difference among population means across the type-of-wheat treatment.

Type-of-fertiliser treatment

To test whether the data justify a difference among population means across the type-of-fertiliser treatment, we compute the test statistic based on SSR. We first calculate MSR:

$$MSR = \frac{SSR}{g-1} = \frac{3521.05}{2} = 1760.53$$

We then use the MSE calculated in part (b) to obtain the test statistic:

$$F = \frac{MSR}{MSE} = \frac{1760.53}{117.59} = 14.97$$

The critical value uses the degrees of freedom of MSR and MSE: $F_c = F_{0.04;\,2,\,48} = 3.45$

As $F > F_c$, we reject the null hypothesis at the 4% significance level. Therefore, there is evidence of a difference among population means across the type-of-fertiliser treatment.
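The three F tests of parts (b) and (c) follow mechanically from the four sums of squares. The sketch below recomputes SSE by subtraction and then each F statistic; the critical values are the ones quoted in the text for α = 0.04 and are not computed here.

```python
# Sums of squares from part (a); g fertilisers, h wheat types, r replications.
sst, ssr, ssc, ssrc = 11295.80, 3521.05, 1656.59, 473.84
g, h, r = 3, 4, 5

sse = sst - ssr - ssc - ssrc      # error sum of squares by subtraction
mse = sse / (g * h * (r - 1))     # 48 error degrees of freedom

# effect name -> (sum of squares, degrees of freedom, critical value from the text)
effects = {
    "interaction": (ssrc, (g - 1) * (h - 1), 2.420),
    "wheat": (ssc, h - 1, 2.992),
    "fertiliser": (ssr, g - 1, 3.45),
}

f_stats = {}
for name, (ss, df, f_crit) in effects.items():
    f_stats[name] = (ss / df) / mse
    print(f"{name}: F = {f_stats[name]:.3f}, reject H0: {f_stats[name] > f_crit}")
```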

Problem 3

Six models of automobile make A, six of make B and six of make C were used to compare the
mileages achievable by the three makes with a full tank. The data gathered are given in the following
table (measures in miles):

A B C
333.67 289.57 215.17
310.56 272.81 194.99
288.26 247.67 175.24
272.97 236.01 167.9
256.94 214.81 144.85
231.02 183.07 124.22

(a) Carry out a one-way analysis of variance to compare miles for the three car makes, at an α = 0.05
significance level

The null and alternative hypotheses are:

$$H_0: \mu_A = \mu_B = \mu_C$$
$$H_1: \text{not all of } \mu_A, \mu_B, \mu_C \text{ are equal}$$

There are $g = 3$ makes of car, each with a sample size of 6, so $n_A = n_B = n_C = 6$ and $N = 18$.

The next step is to calculate the sample mean and sample variance for each make. The overall sample mean is $\bar{x} = 231.1$.

Make A: $\bar{x}_A = 282.23$ and $s_A^2 = 1366.56$

Make B: $\bar{x}_B = 240.66$ and $s_B^2 = 1496.28$

Make C: $\bar{x}_C = 170.4$ and $s_C^2 = 1084.82$

Using these results, we calculate the sum of squares between groups (SSB) and the sum of squares due to error (SSE):

$$SSB \equiv \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})^2 = 38348.31$$

$$SSE \equiv \sum_{i=1}^{g} (n_i - 1)s_i^2 = 19738.31$$

Next, we calculate the degrees of freedom associated with each sum of squares in order to obtain the mean square between groups (MSB) and the mean square error (MSE):

$$df_{SSB} = g - 1 = 2 \quad \text{and} \quad df_{SSE} = N - g = 15$$

and thus:

$$MSB = \frac{SSB}{g - 1} = \frac{38348.31}{2} = 19174.16$$

$$MSE = \frac{SSE}{N - g} = \frac{19738.31}{15} = 1315.88$$

The test statistic for the F distribution is:

$$F = \frac{MSB}{MSE} = \frac{19174.16}{1315.88} = 14.57$$

Lastly, the critical value of the F distribution with 2 and 15 degrees of freedom at the 5% level is:

$$F_c = F_{0.05;\,2,\,15} = 3.68$$

As $F > F_c$, we reject the null hypothesis at the 5% significance level.
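Since the raw mileages are given in the problem, the one-way ANOVA can be reproduced end to end. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
# Mileage data from the table in Problem 3, one list per make.
mileage = {
    "A": [333.67, 310.56, 288.26, 272.97, 256.94, 231.02],
    "B": [289.57, 272.81, 247.67, 236.01, 214.81, 183.07],
    "C": [215.17, 194.99, 175.24, 167.90, 144.85, 124.22],
}

all_values = [x for group in mileage.values() for x in group]
grand_mean = sum(all_values) / len(all_values)

ssb = 0.0  # variation between the make means
sse = 0.0  # variation within each make
for group in mileage.values():
    group_mean = sum(group) / len(group)
    ssb += len(group) * (group_mean - grand_mean) ** 2
    sse += sum((x - group_mean) ** 2 for x in group)

g = len(mileage)
total_n = len(all_values)
f_stat = (ssb / (g - 1)) / (sse / (total_n - g))
print(f"SSB = {ssb:.2f}, SSE = {sse:.2f}, F = {f_stat:.2f}")
```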


(b) When an analyst performs on the same set of data a two-way ANOVA without replication, she obtains the following values for the sums of squares:

SSR = 38348.3
SSC = 19595.3
SSE = 143.0

Carry out a two-way analysis of variance starting from the information provided, using an α = 0.05 significance level.

Given the information provided, we can place these values in an ANOVA table and fill in the missing entries.

In a two-way ANOVA the values of $g$ and $h$ change. SSR = 38348.3 equals the between-makes sum of squares SSB from part (a), so the row factor is the make of car, with $g = 3$ levels, and the column factor is the model, with $h = 6$ levels.

Source    SS               df                     MS    F    Fc
Row       SSR = 38348.3    g - 1 = 2
Column    SSC = 19595.3    h - 1 = 5
Error     SSE = 143.0      (g - 1)(h - 1) = 10
Total     SST = 58086.6    gh - 1 = 17

The next step is to calculate MSR, MSC and MSE:

$$MSR = \frac{SSR}{g-1} = \frac{38348.3}{2} = 19174.15$$

$$MSC = \frac{SSC}{h-1} = \frac{19595.3}{5} = 3919.06$$

$$MSE = \frac{SSE}{(g-1)(h-1)} = \frac{143.0}{10} = 14.3$$

Now we compute the test statistic and find the critical value for each factor.

For the row factor (make), the test statistic is:

$$F = \frac{MSR}{MSE} = \frac{19174.15}{14.3} = 1340.85$$

The critical value is $F_c = F_{0.05;\,2,\,10} = 4.103$

For the column factor (model), the test statistic is:

$$F = \frac{MSC}{MSE} = \frac{3919.06}{14.3} = 274.06$$

The critical value is $F_c = F_{0.05;\,5,\,10} = 3.326$

Completed table

Source    SS               df                     MS          F          Fc
Row       SSR = 38348.3    g - 1 = 2              19174.15    1340.85    4.103
Column    SSC = 19595.3    h - 1 = 5              3919.06     274.06     3.326
Error     SSE = 143.0      (g - 1)(h - 1) = 10    14.3
Total     SST = 58086.6    gh - 1 = 17

Since $F > F_c$ in both cases, we reject, at the 5% significance level, both the null hypothesis that the mean mileage is the same across makes and the null hypothesis that it is the same across models.
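The analyst's sums of squares can be reproduced from the mileage table, which also shows which factor each value belongs to. The sketch below treats the six table rows as models and the three columns as makes; the names `ss_makes` and `ss_models` are illustrative. The make sum of squares comes out equal to the quoted SSR and the model sum of squares equal to the quoted SSC.

```python
# Rows are the six models, columns are makes A, B and C (table from Problem 3).
table = [
    [333.67, 289.57, 215.17],
    [310.56, 272.81, 194.99],
    [288.26, 247.67, 175.24],
    [272.97, 236.01, 167.90],
    [256.94, 214.81, 144.85],
    [231.02, 183.07, 124.22],
]
n_models = len(table)
n_makes = len(table[0])

grand_mean = sum(sum(row) for row in table) / (n_models * n_makes)
model_means = [sum(row) / n_makes for row in table]
make_means = [sum(row[j] for row in table) / n_models for j in range(n_makes)]

sst = sum((x - grand_mean) ** 2 for row in table for x in row)
ss_makes = n_models * sum((m - grand_mean) ** 2 for m in make_means)
ss_models = n_makes * sum((m - grand_mean) ** 2 for m in model_means)
sse = sst - ss_makes - ss_models  # what remains after removing both factors

print(f"makes: {ss_makes:.1f}, models: {ss_models:.1f}, error: {sse:.1f}, total: {sst:.1f}")
```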

(c) Compare the two types of analysis of variance in part (a) and (b). Why do you think the sums of
squares due to error is larger in the one-way ANOVA than in the two-way ANOVA? Justify your
answer in detail.

In the one-way ANOVA the total sum of squares is made up of two elements, SSB and SSE:

$$SST = SSB + SSE$$

In the two-way ANOVA the second factor contributes an additional element, so the decomposition becomes:

$$SST = SSR + SSC + SSE$$

Since the data are the same, SST is 58086.6 in both analyses, and the sum of squares for the make of car is unchanged (SSB = SSR = 38348.3). The variation explained by the additional factor, SSC = 19595.3, was part of the error term in the one-way ANOVA, so SSE falls by exactly that amount, from 19738.31 to 143.0. The one-way error term is therefore larger because it also contains the variation due to the factor that the one-way analysis ignores.

Problem 4

An ice cream seller regularly sells 200 cones per day in summer. He fills the cones with just one scoop of either vanilla (flavour code 1), chocolate (flavour code 2), pistachio (flavour code 3), strawberry (flavour code 4) or lemon (flavour code 5). Past sales suggest that vanilla and chocolate sell equally well and together account for 60% of daily sold cones. They are followed by pistachio with 25% of daily sold cones, and by strawberry and lemon, which also sell equally well and account for the remaining 15% of daily sales. Sales of the 200 cones for a given day are listed in column A of Sheet4 of the Excel workbook “[Link]”.

(a) Do sales on that specific day justify the trend suggested by past sales? Verify with an α = 0.025
significance level.

𝑛 = 200 as the data provides the sales of 200 cones

The first step is to determine the probability of each flavour under the trend suggested by past sales:

$$p_{01} = 0.30 \quad p_{02} = 0.30 \quad p_{03} = 0.25 \quad p_{04} = 0.075 \quad p_{05} = 0.075$$

$$0.30 + 0.30 + 0.25 + 0.075 + 0.075 = 1$$

The second step is to calculate the expected count for each flavour, using the formula $E(X_i) = n \, p_{0i}$:

$E(X_1) = 200 \times 0.30 = 60$

$E(X_2) = 200 \times 0.30 = 60$

$E(X_3) = 200 \times 0.25 = 50$

$E(X_4) = 200 \times 0.075 = 15$

$E(X_5) = 200 \times 0.075 = 15$

Then we count the number of observations of each flavour on day 1. Counting manually would take a long time, as the sample size is 200, but in Excel we can use the COUNTIF function.

$O_1 = 68$

$O_2 = 54$

$O_3 = 50$

$O_4 = 14$

$O_5 = 14$

As we have 5 categories (flavours), $k = 5$. Using the information above, we can calculate the test statistic:

$$T = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 1.8$$

We use the chi-squared distribution to find the critical value. The degrees of freedom are $k - 1 = 4$, and with $\alpha = 2.5\%$ the critical value is $c = 11.14$.

As $c > T$, we cannot reject the null hypothesis at the 2.5% significance level.
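The goodness-of-fit statistic for day 1 can be reproduced from the counts and probabilities listed above. This is a minimal sketch; the critical value is the one quoted in the text.

```python
n = 200
probabilities = [0.30, 0.30, 0.25, 0.075, 0.075]  # trend from past sales
observed = [68, 54, 50, 14, 14]                   # day 1 counts per flavour

expected = [n * p for p in probabilities]
# Pearson chi-squared statistic: sum of (O - E)^2 / E over the categories.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi_crit = 11.14  # chi-squared critical value, 4 df, alpha = 0.025 (from the text)
reject_h0 = chi_sq > chi_crit
print(f"T = {chi_sq:.2f}, reject H0: {reject_h0}")
```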

(b) On a different day, the ice cream seller reports cones’ sales as listed in column B of Sheet 4 of the
Excel workbook “[Link]”. He has a feeling that the trend on that specific day has changed. Is
the ice cream seller’s feeling justified? Verify with an α = 0.025 significance level.

As we are comparing the sales data from day 2 with the same trend from past sales, we can use the same values of $p_{0i}$ and $E(X_i)$ for this problem. What changes is the observed count for each flavour:

$O_1 = 38$

$O_2 = 58$

$O_3 = 79$

$O_4 = 16$

$O_5 = 9$

As we have 5 categories (flavours), $k = 5$. Using the information above, we can calculate the test statistic:

$$T = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 27.42$$

We use the chi-squared distribution to find the critical value. The degrees of freedom are $k - 1 = 4$, and with $\alpha = 2.5\%$ the critical value is $c = 11.14$.

As $T > c$, we reject the null hypothesis at the 2.5% significance level.

The sales on day 2 clearly depart from the trend suggested by past sales: the test statistic $T$ increased from 1.8 on day 1 to 27.42. This is most likely due to the 79 observations of flavour 3, which is 29 observations above the expected value of 50.
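Breaking the day 2 statistic into one term per flavour shows where the departure from the past trend comes from. The sketch below uses the observed and expected counts listed above; the flavour names are those given in the problem statement.

```python
flavours = ["vanilla", "chocolate", "pistachio", "strawberry", "lemon"]
expected = [60, 60, 50, 15, 15]
observed = [38, 58, 79, 16, 9]  # day 2 counts per flavour

# Each flavour's contribution (O - E)^2 / E to the chi-squared statistic.
contributions = {
    name: (o - e) ** 2 / e
    for name, o, e in zip(flavours, observed, expected)
}
chi_sq = sum(contributions.values())
largest = max(contributions, key=contributions.get)

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"T = {chi_sq:.2f}, largest contribution: {largest}")
```

Pistachio contributes the largest single term, followed by vanilla, which sold 22 cones fewer than expected.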

Problem 5

For the current problem, use the nonparametric statistical tables provided in Lecture 10. Data from two branch offices of a national bank include the last balances from 5 and 4 accounts, as reported in the following table:

Branch 1                   Branch 2
Account   Balance ($)      Account   Balance ($)
1         890              1         843
2         889              2         878
3         854              3         881
4         853              4         892
5         839

(a) Carry out a test, at an α = 0.1 significance level, on the hypothesis that the account balances for
the two branches come from populations having identical means.

As each sample has fewer than 10 observations and the problem involves two independent populations with different sample sizes, the Mann-Whitney test is the appropriate non-parametric method to use.

$\alpha = 0.1$, $n_A = 5$ and $n_B = 4$, where A denotes Branch 1 and B denotes Branch 2.

The null and alternative hypotheses are:

$$H_0: \mu_A = \mu_B \qquad H_1: \mu_A \neq \mu_B$$

The first step is to order the pooled values from smallest to largest and assign a rank to each value. The ranking can be seen on the Excel sheet “Problem 5” in columns F, G and H. After the ranking is done, we sum the ranks of sample A to obtain the test statistic. Only one sample is needed for the test statistic:

$$U_A = 1 + 3 + 4 + 7 + 8 = 23$$

As sample A was used for the test statistic, $n_1 = n_A = 5$ and $n_2 = n_B = 4$. The test is two-tailed, as either mean may be the larger one, so we divide alpha in half, $\alpha/2 = 0.05$. This is all the information needed to find the left and right limits of the rejection region.

$C_L = 17$, which is found in the table.

$$C_R = n_A(n_A + n_B + 1) - C_L = 5(5 + 4 + 1) - 17 = 33$$

Since the test statistic lies between the two limits, $17 < 23 < 33$, it falls outside the rejection region and we cannot reject the null hypothesis at the 10% significance level.
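The ranking step can be sketched in a few lines, since the nine balances contain no ties. The lower limit 17 is the table value quoted above; treating the limits themselves as part of the rejection region is an assumption about the table's convention and does not affect the outcome here.

```python
branch_1 = [890, 889, 854, 853, 839]
branch_2 = [843, 878, 881, 892]

# Rank the pooled balances from smallest (rank 1) to largest; there are no ties.
pooled = sorted(branch_1 + branch_2)
rank_of = {value: position for position, value in enumerate(pooled, start=1)}

rank_sum_1 = sum(rank_of[x] for x in branch_1)

n1, n2 = len(branch_1), len(branch_2)
c_left = 17  # lower limit read from the nonparametric table (from the text)
c_right = n1 * (n1 + n2 + 1) - c_left

# Assumed convention: reject when the rank sum reaches or passes either limit.
reject_h0 = rank_sum_1 <= c_left or rank_sum_1 >= c_right
print(f"rank sum = {rank_sum_1}, limits = ({c_left}, {c_right}), reject H0: {reject_h0}")
```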

(b) The employee who previously provided the data had, in fact, forgotten to report a last set of account balances. The forgotten set of data is listed in the following table:

Branch 1                   Branch 2
Account   Balance ($)      Account   Balance ($)
6         821              5         949
7         823              6         995
8         736              7         934
9         779              8         937
10        783              9         918
11        749              10        972
12        721              11        984
13        724              12        943
14        749

Do these new data, when added to the original ones, contradict or confirm the hypothesis in part (a), at an α = 0.1 significance level?

As both sample sizes are now larger than 10, the sampling distribution of the test statistic is approximately normal, which changes the method used.

The null and alternative hypotheses are:

$$H_0: \mu_A = \mu_B \qquad H_1: \mu_A \neq \mu_B$$
The first step is again to order the pooled values from smallest to largest and assign a rank to each value. The ranking can be seen on the Excel sheet “Problem 5” in columns O, P, Q and R. After the ranking is done, we sum the ranks of sample A, which gives:

$$U_A = 113$$

As sample A was used again, $n_1 = n_A = 14$ and $n_2 = n_B = 12$. Because the test statistic is approximately normally distributed, we need its mean and standard deviation under the null hypothesis, denoted $\mu_U$ and $\sigma_U$:

$$\mu_U = \frac{1}{2} n_A(n_A + n_B + 1) = \frac{1}{2} \times 14(14 + 12 + 1) = 189$$

$$\sigma_U = \sqrt{\frac{1}{12} n_A n_B (n_A + n_B + 1)} = \sqrt{\frac{1}{12} \times 14 \times 12(14 + 12 + 1)} = 19.44$$

With these values we can calculate $z$ for our test statistic:

$$z_s = \frac{U_A - \mu_U}{\sigma_U} = \frac{113 - 189}{19.44} = -3.91$$

Once again, since this is a two-tailed test, we divide alpha by 2. The critical values are:

$$z_{\alpha/2} = z_{0.05} = \pm 1.645$$

As $|z_s| > |z_{\alpha/2}|$, we reject the null hypothesis at the 10% significance level. The new data therefore contradict the conclusion of part (a).
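The large-sample version can be checked with the combined data from both tables. The sketch below assigns average ranks to tied balances (749 appears twice in Branch 1) and uses the same mean and standard deviation formulas as the text, with no tie correction; the helper name `average_rank` is illustrative.

```python
import math

# Balances from part (a) followed by the forgotten balances from part (b).
branch_1 = [890, 889, 854, 853, 839, 821, 823, 736, 779, 783, 749, 721, 724, 749]
branch_2 = [843, 878, 881, 892, 949, 995, 934, 937, 918, 972, 984, 943]

pooled = sorted(branch_1 + branch_2)

def average_rank(value):
    """Average of the 1-based positions that `value` occupies in `pooled`."""
    first = pooled.index(value) + 1
    count = pooled.count(value)
    return first + (count - 1) / 2

rank_sum_1 = sum(average_rank(x) for x in branch_1)

n1, n2 = len(branch_1), len(branch_2)
mean_u = n1 * (n1 + n2 + 1) / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z_stat = (rank_sum_1 - mean_u) / sd_u

z_crit = 1.645  # two-tailed critical value for alpha = 0.1
reject_h0 = abs(z_stat) > z_crit
print(f"rank sum = {rank_sum_1}, z = {z_stat:.2f}, reject H0: {reject_h0}")
```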
