Final Exam 2017
STUDENT NUMBER: U
         Q1        Q2         Q3          Q4          Total
Pages    2 to 6    7 to 11    12 to 15    16 to 20
Marks    15        15         15          15          60
Score
H2S
Lactic
Residuals
[Hint: rounding errors will accumulate as you derive entries in this table from other
values shown in the R output, so do NOT round the results of intermediate
calculations. DO round all your final answers in the above table to 2 decimal places.
You may also have to use the statistical tables to estimate one or more of the
p-values, or you can receive the marks for showing appropriate critical values.]
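The arithmetic behind the hint can be sketched as follows, assuming the table is an analysis of variance table in which each mean square is a sum of squares divided by its degrees of freedom. The function name `anova_row` and all numbers are hypothetical and are not taken from the R output; note that rounding happens only when the values are printed.

```python
def anova_row(sum_sq, df, resid_sum_sq, resid_df):
    """Return (mean square, F value) for one row of an ANOVA table.

    Assumes the F value compares the row's mean square with the
    residual mean square. Intermediate values are NOT rounded.
    """
    mean_sq = sum_sq / df
    resid_mean_sq = resid_sum_sq / resid_df
    return mean_sq, mean_sq / resid_mean_sq


# Hypothetical values for illustration only.
mean_sq, f_value = anova_row(sum_sq=120.0, df=1, resid_sum_sq=300.0, resid_df=30)

# Round only the final answers, as the hint instructs.
print(round(mean_sq, 2), round(f_value, 2))
```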
Working
Are there any problem(s) shown on the “Normal Q-Q” plot on page 3?
If so describe the problem(s):
Are there any problem(s) shown on the “Cook’s distance” plot on page 3?
If so describe the problem(s):
What is your overall assessment? (select just ONE of the following options)
□ Residuals are not independent (obvious pattern)
□ Residuals do not have constant variance (heteroscedasticity)
□ Residuals are not normally distributed
□ There are possible outliers and/or influential observations
□ More than one of the above problems
□ No obvious problems
(2 marks – 0.5 for each section)
DFFITS
(see the next page for more answer spaces for part (c) of Question 1)
COVRATIO
DFBETAS
Given your answers above and considering the residual plots in part (b), are there
any observations that are vertical outliers and/or highly influential observations?
Should some observations be removed and the model re-fit to the remaining data?
(7 marks – 1 for each of the first 5 sections and 2 for the last summary section)
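One common set of rule-of-thumb cutoffs for these influence measures can be sketched as below. This is a minimal sketch, assuming the conventions |DFFITS| > 2√(p/n), |DFBETAS| > 2/√n and |COVRATIO − 1| > 3p/n, where p counts the estimated coefficients including the intercept. Other textbooks and software use different thresholds, so the cutoffs given in the course material take precedence. The function name and the sample sizes are hypothetical.

```python
import math


def influence_cutoffs(n, p):
    """Rule-of-thumb cutoffs for n observations and p coefficients.

    p includes the intercept. These are one common convention only;
    other sources use different thresholds.
    """
    return {
        "dffits": 2 * math.sqrt(p / n),
        "dfbetas": 2 / math.sqrt(n),
        "covratio_low": 1 - 3 * p / n,
        "covratio_high": 1 + 3 * p / n,
    }


# Hypothetical sizes for illustration only.
cutoffs = influence_cutoffs(n=30, p=3)
for name, value in cutoffs.items():
    print(name, round(value, 3))
```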
(3 marks)
Are there any problem(s) shown on the “Normal Q-Q” plot on page 7?
If so describe the problem(s):
Are there any problem(s) shown on the “Cook’s distance” plot on page 7?
If so describe the problem(s):
What is your overall assessment? (select just ONE of the following options)
□ Residuals are not independent (obvious pattern)
□ Residuals do not have constant variance (heteroscedasticity)
□ Residuals are not normally distributed
□ There are possible outliers and/or influential observations
□ More than one of the above problems
□ No obvious problems
(2 marks – 0.5 for each section)
Is growth.lm3 an appropriate model for the kid.weights data? If not, how might we
modify this model?
(2 marks)
Multiple R-squared
Adjusted R-squared
(5 marks)
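The relationship between the two quantities above can be sketched as follows, assuming the usual definition of adjusted R-squared with n observations and k explanatory terms (intercept excluded). The function name and the numbers are hypothetical, not values from the R output.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k explanatory terms.

    k excludes the intercept, so the residual degrees of freedom
    are n - k - 1.
    """
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical values for illustration only.
adj = adjusted_r_squared(r_squared=0.75, n=21, k=4)
print(round(adj, 4))
```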
Now examine the way in which the indicator variable boy has actually been added to
the model growth.lm2 to create the model growth.lm3. What are the effects of this
approach on the form of the relationship between the variables? Does the summary
output for the model growth.lm3 on page 8 of the R output suggest that the weight
growth curves for boys and girls differ by an additive constant, by a multiplicative
constant, or that completely separate curves are required?
(2 marks)
(3 marks)
(f) In the vif(growth.lm3) output on page 8, some of the variance inflation factors are
relatively large. Is this an issue that suggests some changes need to be made to the
model? Why or why not?
(1 mark)
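As a reminder of what a variance inflation factor measures, here is a minimal sketch assuming the standard definition VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing the j-th explanatory term on all the others. The function name and the R_j² values are hypothetical and are not from the vif(growth.lm3) output.

```python
def variance_inflation_factor(r_squared_j):
    """VIF for one term, given the R-squared from regressing that
    term on all the other explanatory terms in the model."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("r_squared_j must lie in [0, 1)")
    return 1 / (1 - r_squared_j)


# Hypothetical R_j^2 values for illustration only.
for r2 in (0.0, 0.5, 0.9):
    print(r2, round(variance_inflation_factor(r2), 2))
```

A term that is unrelated to the others has a VIF of 1, and the factor grows without bound as R_j² approaches 1.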
(2 marks)
(b) Pages 10 and 11 of the R output present output for a model, cement_all.lm, which
includes all four of the explanatory variables and for another model, cement_all.lm2,
which has the same four explanatory variables, but in a different order. The anova( )
tables are shown for both models, but the output from plot( ), summary( ) and vif( ) is
shown only for the first model. How would the plot( ), summary( ) and vif( ) output for
the second model differ from the output shown for the first model?
(1 mark)
Are there any problem(s) shown on the “Normal Q-Q” plot on page 10?
If so describe the problem(s):
Are there any problem(s) shown on the “Cook’s distance” plot on page 10?
If so describe the problem(s):
What is your overall assessment? (select just ONE of the following options)
□ Residuals are not independent (obvious pattern)
□ Residuals do not have constant variance (heteroscedasticity)
□ Residuals are not normally distributed
□ There are possible outliers and/or influential observations
□ More than one of the above problems
□ No obvious problems
(2 marks – 0.5 for each section)
(2 marks)
(e) Present full details of a nested F test to test whether or not the variables x2 and x3 are a
significant addition to a model that already includes x4 and x1.
(3 marks)
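The test statistic for a nested F test can be sketched as follows, assuming the usual form F = [(RSS_reduced − RSS_full) / q] / [RSS_full / (n − p)], where q is the number of added terms and p is the number of coefficients in the full model including the intercept. The function name and all numbers are hypothetical; the required sums of squares have to be read from the anova( ) output, and the hypotheses, degrees of freedom, critical value and conclusion still need to be stated.

```python
def nested_f_statistic(rss_reduced, rss_full, q, n, p_full):
    """F statistic for testing q added terms in a nested comparison.

    rss_reduced: residual sum of squares of the smaller model
    rss_full:    residual sum of squares of the larger model
    q:           number of terms added to the smaller model
    n:           number of observations
    p_full:      coefficients in the larger model, intercept included
    Returns (F, numerator df, denominator df).
    """
    df_denominator = n - p_full
    numerator = (rss_reduced - rss_full) / q
    denominator = rss_full / df_denominator
    return numerator / denominator, q, df_denominator


# Hypothetical values for illustration only.
f_stat, df1, df2 = nested_f_statistic(
    rss_reduced=80.0, rss_full=50.0, q=2, n=25, p_full=5
)
print(round(f_stat, 2), df1, df2)
```

The statistic is compared with an F distribution on (q, n − p) degrees of freedom, using the statistical tables for the critical value.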
(2 marks)
(g) Find 95% confidence intervals for each of the partial regression coefficients in the
model (cement.lm). Interpret the values of these partial regression coefficients.
(3 marks)
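The interval calculation can be sketched as follows, assuming the usual form estimate ± t × standard error, with the t critical value taken from the statistical tables on the residual degrees of freedom. The function name, the estimate and the standard error are hypothetical and are not from the summary( ) output; 2.306 is the tabulated 97.5th percentile of the t distribution on 8 degrees of freedom, which is itself only an illustrative choice of degrees of freedom.

```python
def coefficient_interval(estimate, std_error, t_critical):
    """Two-sided confidence interval for one regression coefficient.

    t_critical is the upper critical value from the t tables for the
    chosen confidence level and the residual degrees of freedom.
    """
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


# Hypothetical estimate and standard error for illustration only.
lower, upper = coefficient_interval(estimate=1.5, std_error=0.2, t_critical=2.306)
print(round(lower, 2), round(upper, 2))
```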
(1 mark)
(b) Page 16 of the R output shows the results of applying the step( ) function to suggest a
suitable multiple linear regression model for these data. Briefly describe the process of
model refinement that has been applied here.
(1 mark)
Are there any problem(s) shown on the “Normal Q-Q” plot on page 17?
If so describe the problem(s):
Are there any problem(s) shown on the “Cook’s distance” plot on page 17?
If so describe the problem(s):
What is your overall assessment? (select just ONE of the following options)
□ Residuals are not independent (obvious pattern)
□ Residuals do not have constant variance (heteroscedasticity)
□ Residuals are not normally distributed
□ There are possible outliers and/or influential observations
□ More than one of the above problems
□ No obvious problems
(2 marks – 0.5 for each section)
(3 marks)
(1 mark)
(f) Page 18 of the R output shows some summary output for the model (msleep.lm). What
do the signs of each of the partial regression coefficients suggest about the expected
amount of time spent sleeping?
(3 marks)
(2 marks)
(h) Under the suggested model, what is the expected difference in daily hours spent
sleeping between mammals that are in danger and those that are relatively safe? Find a
95% confidence interval for this difference.
(2 marks)
END OF EXAMINATION