
Sta 226

(1) This document provides instructions and questions for an assignment on statistical inference. It includes questions on comparing insurance premiums between two companies, analyzing blood pressure data from a medical study, and analyzing customer and pig weight data from different groups. (2) Students are asked to perform statistical tests like two-sample t-tests, paired t-tests, ANOVA, and chi-squared tests of independence. They must interpret the results of these tests in R and determine if there is evidence of differences between groups. (3) The assignment evaluates students' ability to select appropriate statistical tests, carry out analyses in R, and draw conclusions about whether sample data provides evidence for various claims.

Uploaded by

Kimondo King

STA 226: INTRODUCTION TO STATISTICAL INFERENCE

ASSIGNMENT
Instructions: Attempt all questions. In case a level of significance is not stated, use 5%.
Question One
(a) It is desired to investigate the level of premium charged by two companies for
contents policies for houses in a certain area. A random sample of 10 houses insured by
company A is compared with similar houses insured by company B. The premiums
charged in each case are as follows.
Company A   117  154  166  189  190  202  233  263  289  331
Company B   142  160  166  188  221  241  276  279  284  302

(i) Illustrate the data given on a suitable diagram and hence comment briefly on the
validity of the assumptions required for a two-sample t test for the premiums of these
two companies. [1 mark]
(ii) Assuming that the premiums are normally distributed, carry out a formal test to check
that it is appropriate to apply a two-sample t test to these data. [3 marks]
(iii) Test whether the level of premiums charged by company B was higher than that
charged by company A. State your conclusion clearly. [3 marks]
(b) In a medical study conducted to test the suggestion that daily exercise has the effect
of lowering blood pressure, a sample of eight patients with high blood pressure was
selected. Their blood pressure was measured initially and then again a month later
after they had participated in an exercise program. The results are shown in the table
below:

Patient   1    2    3    4    5    6    7    8
Before    155  152  146  153  146  160  139  148
After     145  147  123  137  141  142  140  138

The following R outputs were obtained from these data.


Output 1

Shapiro-Wilk normality test

data: before - after


W = 0.9706, p-value = 0.9027

Output 2

Paired t-test

data: before and after


t = 3.8549, df = 7, p-value = 0.003126
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
5.46661 Inf
sample estimates:
mean of the differences
10.75

Use the above outputs to answer the following questions.

(i) Do the data appear to be normally distributed? Justify your answer. [1 mark]
(ii) Do the data provide sufficient evidence to support the claim that exercise
reduces blood pressure in patients? [2 marks]
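
As a cross-check on Output 2, the paired t statistic can be recomputed by hand from the before and after readings. The sketch below uses Python's standard library rather than R, and the variable names are illustrative only. It reproduces the test statistic and degrees of freedom; the p-value still has to come from t tables or from software.

```python
from math import sqrt
from statistics import mean, stdev

# Blood pressure readings from the table in part (b)
before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]

# A paired test works on the within-patient differences
differences = [b - a for b, a in zip(before, after)]
n = len(differences)

mean_diff = mean(differences)               # Output 2 reports 10.75
std_error = stdev(differences) / sqrt(n)    # stdev uses the n - 1 divisor
t_stat = mean_diff / std_error              # Output 2 reports t = 3.8549
df = n - 1                                  # Output 2 reports df = 7

print(f"mean difference = {mean_diff}, t = {t_stat:.4f}, df = {df}")
```

The alternative in Output 2 is one-sided (difference greater than 0), so only large positive values of t count as evidence that blood pressure fell.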
(c) Assume that the above data are from two independent populations.
The following are R outputs from the data.

Output 3

F test to compare two variances

data: before and after


F = 0.7866, num df = 7, denom df = 7, p-value = 0.7595
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1574794 3.9289733
sample estimates:
ratio of variances
0.7865955
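
The ratio of variances in Output 3 is the sample variance of the before readings divided by that of the after readings. A minimal sketch in Python (standard library only, illustrative variable names) that recomputes it:

```python
from statistics import variance

before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]

# Sample variances with the n - 1 divisor
var_before = variance(before)
var_after = variance(after)

# F statistic: compared with an F distribution on (n1 - 1, n2 - 1) df
f_stat = var_before / var_after             # Output 3 reports 0.7865955
num_df, denom_df = len(before) - 1, len(after) - 1

print(f"F = {f_stat:.4f}, num df = {num_df}, denom df = {denom_df}")
```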

Output 4

Two Sample t-test

data: before and after


t = 3.1085, df = 14, p-value = 0.007702
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.33269 18.16731
sample estimates:
mean of x mean of y
149.875 139.125

Output 5

Welch Two Sample t-test

data: before and after


t = 3.1085, df = 13.803, p-value = 0.007813
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.322749 18.177251
sample estimates:
mean of x mean of y
149.875 139.125
Clearly interpret the above outputs. [3 marks]
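
To see where Outputs 4 and 5 differ, the two test statistics can be recomputed side by side. The sketch below (Python standard library, illustrative names) uses the usual pooled-variance formula for Output 4 and the Welch-Satterthwaite approximation for the degrees of freedom in Output 5. With equal sample sizes the two t values coincide and only the degrees of freedom change.

```python
from math import sqrt
from statistics import mean, variance

before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]

n_x, n_y = len(before), len(after)
var_x, var_y = variance(before), variance(after)
mean_gap = mean(before) - mean(after)

# Output 4: pooled-variance (equal variances assumed) two-sample t
pooled_var = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
t_pooled = mean_gap / sqrt(pooled_var * (1 / n_x + 1 / n_y))
df_pooled = n_x + n_y - 2                   # Output 4 reports df = 14

# Output 5: Welch t with Welch-Satterthwaite degrees of freedom
se2_x, se2_y = var_x / n_x, var_y / n_y
t_welch = mean_gap / sqrt(se2_x + se2_y)
df_welch = (se2_x + se2_y) ** 2 / (
    se2_x ** 2 / (n_x - 1) + se2_y ** 2 / (n_y - 1)
)                                           # Output 5 reports df = 13.803

print(f"pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.3f}")
```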

Question Two
(a) The number of new customers generated per month by different branches of a small
building society is being monitored for employee bonus purposes. Head office has
collated the figures sent in by four branches over recent months, which are as follows:

Branch 1   11   5   4   9   3   0   -
Branch 2    9   7   6   8  12   -   -
Branch 3    5   4   5   6   0   8   6
Branch 4    7   8  12   0   1  15   6

There are different numbers of figures because incomplete data were sent to Head
office. Investigate whether there is any difference between the mean numbers of new
customers at the four branches. Use a 5% level of significance. [5 marks]

(b) Twenty pigs are assigned at random among four experimental groups. Each group
is fed a different diet. The data are pig body weights, in kilograms, after being raised
on these diets. We wish to ask whether mean pig weights are the same for all four diets.

Feed1   Feed2   Feed3   Feed4
60.8    68.7    102.6   87.9
57      67.7    102.1   84.2
65      74      100.2   83.1
58.6    66.3    96.5    85.7
61.7    69.8    100     90.3

(i) What type of hypothesis test will you use? [1 mark]


(ii) What are the test's assumptions? [2 marks]
(iii) Side-by-side boxplots are plotted as shown below to compare the four
distributions. Do the samples look like they were drawn from populations with the same
distribution? Justify your answer. [2 marks]
[Figure: side-by-side boxplots for feed1, feed2, feed3 and feed4; vertical axis from 60 to 100]

(iv) The following is R output from the data. Interpret the results in the context of the
problem. [1 mark]

Output 6

Analysis of Variance Table

Response:

            Df  Sum Sq  Mean Sq  F value    Pr(>F)
feed         3    4686     1562    194.6  8.47e-13 ***
Residuals   16     128        8

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
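
The entries of Output 6 follow from the usual one-way ANOVA decomposition into between-group and within-group sums of squares. The sketch below recomputes them from the 20 weights in the table, using Python's standard library; the dictionary and variable names are illustrative and not part of the assignment.

```python
from statistics import mean

# Pig weights (kg) from the table in part (b)
weights_by_feed = {
    "feed1": [60.8, 57, 65, 58.6, 61.7],
    "feed2": [68.7, 67.7, 74, 66.3, 69.8],
    "feed3": [102.6, 102.1, 100.2, 96.5, 100],
    "feed4": [87.9, 84.2, 83.1, 85.7, 90.3],
}

groups = list(weights_by_feed.values())
all_weights = [w for g in groups for w in g]
grand_mean = mean(all_weights)

# Between-group (feed) and within-group (residual) sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((w - mean(g)) ** 2 for g in groups for w in g)

df_between = len(groups) - 1                  # Output 6 reports 3
df_within = len(all_weights) - len(groups)    # Output 6 reports 16

# F = mean square for feed / residual mean square; Output 6 reports 194.6
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS feed = {ss_between:.1f}, SS residual = {ss_within:.1f}, F = {f_stat:.1f}")
```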

(v) The following R output tests the homogeneity of variances. Interpret the
results in the context of the problem. [2 marks]

Output 7

Bartlett test of homogeneity of variances

data: y by feed

Bartlett's K-squared = 0.2364, df = 3, p-value = 0.9715
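
Bartlett's K-squared in Output 7 compares the log of the pooled variance with the logs of the individual group variances, divided by a small-sample correction factor. A minimal sketch of that textbook formula in Python (standard library only, illustrative names):

```python
from math import log
from statistics import variance

# Pig weights (kg) from the table in part (b)
weights_by_feed = {
    "feed1": [60.8, 57, 65, 58.6, 61.7],
    "feed2": [68.7, 67.7, 74, 66.3, 69.8],
    "feed3": [102.6, 102.1, 100.2, 96.5, 100],
    "feed4": [87.9, 84.2, 83.1, 85.7, 90.3],
}

groups = list(weights_by_feed.values())
k = len(groups)
n_total = sum(len(g) for g in groups)

group_df = [len(g) - 1 for g in groups]
group_var = [variance(g) for g in groups]     # n - 1 divisor

# Pooled variance: df-weighted average of the group variances
pooled_var = sum(d * v for d, v in zip(group_df, group_var)) / (n_total - k)

numerator = (n_total - k) * log(pooled_var) - sum(
    d * log(v) for d, v in zip(group_df, group_var)
)
correction = 1 + (sum(1 / d for d in group_df) - 1 / (n_total - k)) / (3 * (k - 1))

k_squared = numerator / correction            # Output 7 reports 0.2364
df = k - 1                                    # Output 7 reports df = 3

print(f"Bartlett's K-squared = {k_squared:.4f}, df = {df}")
```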

(a) A certain species of plant produces flowers which are either red, white or pink. It
also produces leaves which may be either plain or variegated. For a sample of 500
plants, the distribution of flower color and leaf type was as follows:

             Red   White   Pink
Plain         97     42     77
Variegated   105    148     31

Test whether these results indicate any association between flower color and leaf
type. [4 marks]
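
A chi-squared test of independence compares each observed count with the count expected if flower color and leaf type were unrelated (row total times column total, divided by the grand total). The sketch below sets out that arithmetic in Python with the standard library only; the variable names are illustrative, and it stops at the test statistic and degrees of freedom.

```python
# Observed counts: rows are Plain, Variegated; columns are Red, White, Pink
observed = [
    [97, 42, 77],
    [105, 148, 31],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic: sum of (O - E)^2 / E over all cells
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-squared = {chi_squared:.2f}, df = {df}")
```

At the 5% level the statistic is compared with the upper 5% point of the chi-squared distribution on 2 degrees of freedom, which standard tables give as 5.991.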
