M346 Paper 2012
This examination is in TWO parts. Part 1 carries 25% of the total available marks
and Part 2 carries 75%.
You should attempt ONE question from Part 1: this question carries 25 marks. You
should attempt THREE questions from Part 2: each question in this part also carries
25 marks.
You are advised not to cross through any work until you have replaced it with another
solution to the same question (or part of question).
In Part 1 of the paper, if you answer both questions, your better score will count
towards your result. In Part 2 of the paper, if you answer more than three questions,
your best three scores will count towards your final mark.
This question paper is rather long because it includes tranches of GenStat
output. Do not let its length put you off: in your initial reading of the paper,
you can ignore all such output, or pass over it very quickly.
Please start each question on a new page, and cross out rough working.
Put all your used answer books together with your signed desk record on top. Fasten
them in the top left corner with the round paper fastener. Attach this question paper
to the back of the answer books with the flat paper clip.
Copyright © 2012 The Open University
PART 1 (Questions 1 and 2)
You should attempt ONE question from this part of the examination,
which carries 25% of the total available marks. Each question carries
25 marks. A guide to mark allocation is shown beside each question
thus: [4].
In each question in Part 1 you are asked to write a short essay on a
topic from the course. By the word ‘essay’, we do not mean to imply
that your answer should be entirely text; formulae and mathematical
symbols, if appropriate, are allowed. However, you should think of
this as an essay question in the senses of structure and readability.
Indeed, 4 of the 25 marks will be awarded for putting the essay
together in a reasonably clear manner, including a reasonable
structure with beginning, middle and conclusion, and reasonably
concise use of language. References to specific data-based examples in
the course are not expected. However, it may be useful to illustrate
points by giving special cases, perhaps in mathematical form (e.g.
Y ∼ N(0, σ²) is a special case of a distributional assumption, and
α + β₁x₁ + β₂x₂ is a special case of a formula for a regression mean).
Question 1
Write an essay in which the role of treatments and blocks in designed
experiments is discussed.
Your answer should include
• a general description of treatments and blocks in the context of
designed experiments, including a description of what they are,
why they are used and how experimental units are allocated to
them; [5]
• a description of how an ANOVA model incorporates the
treatment and block structures; [6]
• a brief description of how the best fitted model is interpreted; [2]
• a brief discussion of how four types of experiment: completely
randomised, randomised block, factorial and latin square,
compare in terms of the number of treatment factors and number
of blocking factors; [4]
• a brief description of confounding, including one advantage and
one disadvantage of having a design that makes use of
confounding. [4]
The remaining four marks are for the clarity and structure of your
essay. [4]
Question 3
Data on a number of athletes were collected at the Australian
Institute of Sport. For each athlete, two measures were recorded in a
GenStat file: the measured haemoglobin Hg (in grams per decilitre),
and the red blood cell count RCC (×10¹² cells per litre). Interest
centres on whether the response variable, Hg, can be usefully
predicted by the explanatory variable, RCC, using simple linear
regression.
(a) A scatterplot of Hg against RCC is given in Figure 1.
Figure 1: Scatterplot of Hg against RCC
On the basis of this plot, would you say it is reasonable to fit a
simple linear regression model to the data? Briefly explain why or
why not. [3]
Regression analysis
Response variate: Hg
Fitted terms: Constant, RCC
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 1 259.37 259.3713 672.58 <.001
Residual 196 75.58 0.3856
Total 197 334.96 1.7003
Percentage variance accounted for 77.3
Standard error of observations is estimated to be 0.621.
Message: The following units have high leverage.
Unit Response Leverage
161 16.100 0.046
181 18.500 0.032
199 17.700 0.030
Estimates of parameters
Parameter estimate s.e. t(196) t pr.
Constant 2.045 0.483 4.24 <.001
RCC 2.653 0.102 25.93 <.001
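The summary quantities in the output above are linked by simple arithmetic. The Python sketch below recomputes some of them from the printed table. It assumes that GenStat's "Percentage variance accounted for" is the adjusted R², that is 100(1 − residual m.s./total m.s.), and because it starts from the rounded printed values, the last digit may differ slightly from GenStat's.

```python
import math

# Values copied from the summary of analysis for Hg on RCC
reg_ss, reg_df = 259.37, 1
res_ss, res_df = 75.58, 196
tot_ss, tot_df = 334.96, 197

reg_ms = reg_ss / reg_df          # regression mean square
res_ms = res_ss / res_df          # residual mean square (estimate of sigma^2)
tot_ms = tot_ss / tot_df          # total mean square
variance_ratio = reg_ms / res_ms  # compare with F(1, 196)

# Percentage variance accounted for (assumed: adjusted R^2 as a percentage)
pct_var = 100 * (1 - res_ms / tot_ms)

# Standard error of observations = square root of the residual mean square
se_obs = math.sqrt(res_ms)

# Printed t(196) for RCC; with one explanatory variable, t^2 equals the v.r.
t_rcc = 25.93

print(f"v.r. = {variance_ratio:.2f}")
print(f"% variance accounted for = {pct_var:.1f}")
print(f"s.e. of observations = {se_obs:.3f}")
print(f"t^2 for RCC = {t_rcc ** 2:.2f}")
```

Up to rounding, the squared t statistic for RCC matches the variance ratio in the summary of analysis.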
(i) Which two parts of the output given by GenStat indicate that
there is a significant relationship between haemoglobin and
red blood cell count? On the basis of this GenStat output, is
it reasonable to say that an increase in red blood cell count
causes an increase in haemoglobin? Why or why not? [3]
(ii) In the GenStat output above, three units, units 161, 181 and
199, are flagged as having high leverage. Units 161 and 181
have large Cook’s statistics whereas unit 199 does not.
Briefly explain these findings. [3]
(iii) A composite residual plot for Model A is given in Figure 2. Is
there any feature, or features, of this plot that indicates that
the assumptions of the simple linear regression model do not
hold for this model? [3]
Regression analysis
Response variate: Hg
Fitted terms: Constant + RCC + Sex
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 2 270.33 135.1632 407.82 <.001
Residual 195 64.63 0.3314
Total 197 334.96 1.7003
Model C
Regression analysis
Response variate: Hg
Fitted terms: Constant + RCC + Sex + RCC.Sex
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 3 270.56 90.1859 271.69 <.001
Residual 194 64.40 0.3319
Total 197 334.96 1.7003
Percentage variance accounted for 80.5
Standard error of observations is estimated to be 0.576.
Message: The following units have large standardized residuals.
Unit Response Residual
181 18.500 2.92
Message: The following units have high leverage.
Unit Response Leverage
69 15.900 0.090
74 15.000 0.094
88 14.700 0.066
95 14.500 0.066
113 14.000 0.061
137 13.500 0.094
153 14.300 0.081
161 16.100 0.105
181 18.500 0.063
199 17.700 0.058
Estimates of parameters
Parameter estimate s.e. t(194) t pr.
Constant 5.403 0.958 5.64 <.001
RCC 2.015 0.191 10.54 <.001
Sex 1 −1.69 1.25 −1.35 0.177
RCC.Sex 1 0.220 0.263 0.83 0.405
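Whether the RCC.Sex interaction is needed can also be judged by comparing the residual lines of the two outputs with fitted terms Constant + RCC + Sex and Constant + RCC + Sex + RCC.Sex (Model C). The sketch below computes the extra-sum-of-squares F statistic and, assuming that the parameter labelled Sex 1 is the effect of the non-reference level of Sex, the two fitted lines implied by Model C. All values are the rounded ones printed above.

```python
# Residual lines copied from the two regression outputs above
res_ss_main, res_df_main = 64.63, 195   # Constant + RCC + Sex
res_ss_int, res_df_int = 64.40, 194     # Constant + RCC + Sex + RCC.Sex (Model C)
res_ms_int = 0.3319                     # residual m.s. of the larger model (Model C)

# Extra sum of squares and d.f. attributable to the RCC.Sex interaction
extra_ss = res_ss_main - res_ss_int
extra_df = res_df_main - res_df_int
f_interaction = (extra_ss / extra_df) / res_ms_int   # compare with F(1, 194)

t_interaction = 0.83                    # printed t(194) for RCC.Sex 1

# Fitted lines from Model C, assuming Sex 1 is the non-reference level
intercept_ref, slope_ref = 5.403, 2.015
intercept_sex1 = intercept_ref + (-1.69)
slope_sex1 = slope_ref + 0.220

print(f"F = {f_interaction:.3f}, t^2 = {t_interaction ** 2:.3f}")
print(f"reference level: Hg = {intercept_ref:.3f} + {slope_ref:.3f} RCC")
print(f"Sex 1:           Hg = {intercept_sex1:.3f} + {slope_sex1:.3f} RCC")
```

Because the interaction uses a single degree of freedom, this F statistic is, up to rounding, the square of the t value printed for RCC.Sex 1.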
Figure 3
Correlations
fatheriq
motheriq −0.0248
speak −0.0305 0.0722
count −0.0750 0.0243 0.0595
read −0.0682 −0.0430 0.1851 0.9103
edutv 0.1162 −0.3300 −0.1545 −0.2157 −0.1666
cartoons −0.2484 0.3384 0.1094 0.1549 0.1257 −0.9234
fatheriq motheriq speak count read edutv cartoons
Model A
Regression analysis
Response variate: score
Fitted terms: Constant + fatheriq + motheriq + speak + count + read
+ edutv + cartoons
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 7 562.4 80.343 11.97 <.001
Residual 28 187.9 6.711
Total 35 750.3 21.437
Percentage variance accounted for 68.7
Standard error of observations is estimated to be 2.59.
Message: The following units have large standardized residuals.
Unit Response Residual
24 169.00 2.36
Message: The following units have high leverage.
Unit Response Leverage
19 160.00 0.56
Estimates of parameters
Parameter estimate s.e. t(28) t pr.
Constant 75.5 24.0 3.14 0.004
fatheriq 0.252 0.138 1.84 0.077
motheriq 0.4001 0.0729 5.49 <.001
speak 0.188 0.148 1.27 0.214
count 0.206 0.266 0.78 0.445
read 7.54 5.59 1.35 0.188
edutv −4.20 2.25 −1.87 0.072
cartoons −3.34 2.02 −1.65 0.109
Model B
Regression analysis
Response variate: score
Fitted terms: Constant + motheriq + read + edutv + cartoons
Summary of analysis
Source d.f. s.s. m.s. v.r. F pr.
Regression 4 530.4 132.593 18.69 <.001
Residual 31 219.9 7.095
Total 35 750.3 21.437
Change 3 32.0 10.677 1.59 0.214
Percentage variance accounted for 66.9
Standard error of observations is estimated to be 2.66.
Message: The following units have high leverage.
Unit Response Leverage
4 157.00 0.36
33 151.00 0.31
Estimates of parameters
Parameter estimate s.e. t(31) t pr.
Constant 112.5 14.4 7.83 <.001
motheriq 0.4177 0.0740 5.64 <.001
read 11.60 2.24 5.19 <.001
edutv −6.05 2.12 −2.85 0.008
cartoons −5.11 1.88 −2.72 0.011
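The Change line in the Model B output compares Model B with Model A, which additionally contains fatheriq, speak and count. A sketch of that comparison from the printed residual lines follows. The variance ratio divides the change mean square by the residual mean square of the larger model, Model A; the change mean square differs slightly from GenStat's 10.677 because the inputs are rounded.

```python
# Residual lines copied from the Model A and Model B outputs above
res_ss_a, res_df_a = 187.9, 28    # Model A: all seven explanatory variables
res_ss_b, res_df_b = 219.9, 31    # Model B: motheriq, read, edutv, cartoons
res_ms_a = 6.711                  # residual m.s. of the larger model, Model A

# Change from dropping fatheriq, speak and count
change_ss = res_ss_b - res_ss_a
change_df = res_df_b - res_df_a
change_ms = change_ss / change_df
f_change = change_ms / res_ms_a   # compare with F(3, 28)

print(f"change: d.f. = {change_df}, s.s. = {change_ss:.1f}, "
      f"m.s. = {change_ms:.3f}, v.r. = {f_change:.2f}")
```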
Analysis of variance
Variate: diameter
Source of variation d.f. s.s. m.s. v.r. F pr.
tree stratum
pollen 3 10.3004 3.4335 0.08 0.969
Residual 4 177.2346 44.3087 85.53
tree.*Units* stratum
pollen 3 11.5671 3.8557 7.44 0.004
Residual 13 6.7350 0.5181
Total 23 205.8372
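In a multistratum analysis of variance, each variance ratio is formed within its own stratum. The sketch below uses the s.s. and d.f. from the ANOVA table for diameter to reproduce the printed mean squares and variance ratios: pollen in each stratum is compared with that stratum's residual, and the tree-stratum residual is compared with the residual of the tree.*Units* stratum. The helper name mean_squares is purely illustrative.

```python
# (d.f., s.s.) for each source, copied from the ANOVA table above
tree_stratum = {"pollen": (3, 10.3004), "Residual": (4, 177.2346)}
units_stratum = {"pollen": (3, 11.5671), "Residual": (13, 6.7350)}

def mean_squares(stratum):
    """Return m.s. = s.s. / d.f. for every source in one stratum."""
    return {src: ss / df for src, (df, ss) in stratum.items()}

ms_tree = mean_squares(tree_stratum)
ms_units = mean_squares(units_stratum)

# Within each stratum, v.r. = m.s. / residual m.s. of that same stratum
vr_pollen_tree = ms_tree["pollen"] / ms_tree["Residual"]
vr_pollen_units = ms_units["pollen"] / ms_units["Residual"]

# The tree-stratum residual is compared with the lower stratum's residual
vr_tree_residual = ms_tree["Residual"] / ms_units["Residual"]

# The d.f. and s.s. of all strata add up to the Total line
total_df = sum(df for df, _ in tree_stratum.values()) + sum(df for df, _ in units_stratum.values())
total_ss = sum(ss for _, ss in tree_stratum.values()) + sum(ss for _, ss in units_stratum.values())

print(f"v.r. pollen (tree stratum)  = {vr_pollen_tree:.2f}")
print(f"v.r. pollen (units stratum) = {vr_pollen_units:.2f}")
print(f"v.r. tree-stratum residual  = {vr_tree_residual:.2f}")
```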
Conduct a formal statistical test to investigate whether the
diameter of apples does depend on the type of pollen. (You
need to give details of the test statistic, the p value for the
test, and the degrees of freedom of the distribution with
which the test statistic is to be compared.) [4]
Figure 4
What does Figure 4 tell you about the appropriateness of the
model that has been fitted? Justify your answer. [4]
(v) How could the same model be fitted without using ANOVA? [2]
(b) In an experiment, the effect of the following three factors on the
hardness of dental fillings (variate hardness) was explored.
gold : the type of gold used (eight levels)
dentist : the dentist making up the filling (five levels)
condense : the condensation method used (three levels)
Every possible treatment combination was used, and there was
one replication per treatment combination. Two models were
then fitted to the data.
Model A : dentist + condense + gold + dentist.condense
+ dentist.gold + condense.gold
Model B : gold + dentist*condense.
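With eight, five and three levels and one replication per treatment combination, the experiment has 8 × 5 × 3 = 120 units. The sketch below counts the degrees of freedom used by each model term (the product of the number of levels minus one over the factors in the term) and hence the residual d.f. for Models A and B. The function name term_df is purely illustrative.

```python
from itertools import combinations
from math import prod

levels = {"gold": 8, "dentist": 5, "condense": 3}
n_units = prod(levels.values())      # one replicate per treatment combination
total_df = n_units - 1

def term_df(term):
    """d.f. of a main effect or interaction: product of (levels - 1)."""
    return prod(levels[f] - 1 for f in term)

# Model A: all three main effects and all three two-factor interactions
model_a = [(f,) for f in levels] + list(combinations(levels, 2))

# Model B: gold + dentist*condense = gold + dentist + condense + dentist.condense
model_b = [("gold",), ("dentist",), ("condense",), ("dentist", "condense")]

residual_df_a = total_df - sum(term_df(t) for t in model_a)
residual_df_b = total_df - sum(term_df(t) for t in model_b)
three_way_df = term_df(("gold", "dentist", "condense"))

print(f"units = {n_units}, residual d.f.: Model A = {residual_df_a}, Model B = {residual_df_b}")
print(f"three-way interaction d.f. = {three_way_df}")
```

For Model A, the residual d.f. coincide with the d.f. of the three-way interaction.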
(i) In Model A what assumption is made about the three-way
interaction? Why is this assumption necessary in order to
carry out inference? [2]
Analysis of variance
Variate: hardness
Figure 5
Regression analysis
Response variate: species
Distribution: Poisson
Link function: Log
Fitted terms: Constant + larea + lnitrate + lsolids + pH
Summary of analysis
Source d.f. deviance mean deviance deviance ratio approx chi pr
Regression 4 61.34 15.335 15.34 <.001
Residual 39 68.36 1.753
Total 43 129.70 3.016
Dispersion parameter is fixed at 1.00.
Message: Deviance ratios are based on dispersion parameter with
value 1.
Regression analysis
Response variate: species
Distribution: Poisson
Link function: Log
Fitted terms: Constant + larea
Summary of analysis
Source d.f. deviance mean deviance deviance ratio approx chi pr
Regression 1 51.88 51.881 51.88 <.001
Residual 42 77.82 1.853
Total 43 129.70 3.016
Change 3 9.46 3.153 3.15 0.024
Dispersion parameter is fixed at 1.00.
Message: Deviance ratios are based on dispersion parameter with
value 1.
Message: The following units have large standardized residuals.
Unit Response Residual
12 11.00 −2.39
29 2.00 −3.26
40 20.00 2.89
41 33.00 3.34
44 23.00 3.20
Message: The following units have high leverage.
Unit Response Leverage
12 11.00 0.145
Estimates of parameters
Parameter estimate s.e. t(*) t pr. antilog of estimate
Constant −0.292 0.402 −0.73 0.468 0.7470
larea 0.3214 0.0461 6.97 <.001 1.379
Message: s.e.s are based on dispersion parameter with value 1.
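For a Poisson model with the dispersion parameter fixed at 1, the approximate chi pr on the Change line comes from referring the change in deviance to a chi-squared distribution on the change in d.f. The sketch below reproduces the value 0.024 using a closed form for the chi-squared upper tail that holds only for 3 d.f., so no statistical library is needed. It also computes the residual mean deviance of the Constant + larea model and the antilog of the larea estimate, treating that second Poisson output as the Model B of part (d); that identification is an assumption.

```python
import math

def chi2_sf_3df(x):
    """P(X > x) for a chi-squared variable with exactly 3 d.f.

    Uses erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2), valid only for 3 d.f.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Change between the two Poisson models: deviance 9.46 on 3 d.f.
change_deviance = 77.82 - 68.36
p_change = chi2_sf_3df(change_deviance)

# Residual mean deviance of the Constant + larea model (deviance / d.f.)
resid_mean_dev = 77.82 / 42

# Antilog of the larea estimate (log link): multiplicative effect on the mean
rate_ratio = math.exp(0.3214)

print(f"approx chi pr = {p_change:.3f}")
print(f"residual mean deviance = {resid_mean_dev:.3f}")
print(f"antilog of larea estimate = {rate_ratio:.3f}")
```

A residual mean deviance well above 1 is a common informal warning sign of overdispersion.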
(d) (i) Does overdispersion appear to be a problem in Model B?
Justify your answer. [2]
(ii) Regardless of your answer to part (d)(i), suppose it is decided
that Model B is overdispersed. Suggest a way of changing
Model B to deal with the overdispersion. [1]
Regression analysis
Response variate: counts
Distribution: Poisson
Link function: Log
Fitted terms: Constant + housing + influenc + contact + satisfac
+ housing.influenc + housing.contact + influenc.contact
+ housing.satisfac + influenc.satisfac + contact.satisfac
Summary of analysis
Source d.f. deviance mean deviance deviance ratio approx chi pr
Regression 31 789.71 25.474 25.47 <.001
Residual 40 43.95 1.099
Total 71 833.66 11.742
Dispersion parameter is fixed at 1.00.
Message: Deviance ratios are based on dispersion parameter with value 1.
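For a Poisson log-linear model with the dispersion parameter fixed at 1, the residual deviance can be compared with a chi-squared distribution on the residual d.f. as an approximate goodness-of-fit check, provided the fitted counts are not too small. The sketch below computes the upper-tail probability for the residual deviance of 43.95 on 40 d.f., using the closed form available when the d.f. are even. The function name chi2_sf_even_df is purely illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of d.f.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
    """
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of d.f.")
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i   # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

residual_deviance, residual_df = 43.95, 40
p_fit = chi2_sf_even_df(residual_deviance, residual_df)

print(f"P(chi-squared on {residual_df} d.f. > {residual_deviance}) = {p_fit:.3f}")
```

The probability comes out at roughly 0.3.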
Regression analysis
Response variate: counts
Distribution: Poisson
Link function: Log
Fitted terms: Constant + contact + housing + influenc + satisfac
+ contact.housing + contact.influenc + contact.satisfac
+ influenc.satisfac + housing.influenc