
Final 221220 Statmeth Solutions

This document provides exemplary solutions to statistical problems on a final exam. It includes solutions to problems involving confidence intervals, hypothesis testing, and tests of proportions. Key steps and formulas are shown for problems involving normal distributions, t-tests, chi-squared tests, and tests of two proportions. Justifications are provided for choices of tests and the interpretation of results.

Uploaded by

Alina A.

Final exam Statistical Methods – Exemplary solutions

20 December 2022

1.

a) Because of the Central Limit Theorem, $\hat{P}_{1600}$ has approximately a normal distribution with mean $p$,
which is the true population proportion of Dutch citizens who feel that Sinterklaas is an important
part of Dutch tradition, and variance $p(1-p)/1600$.

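b) For the 95%-CI we use the formula
$$\left[\hat{P}_{1600} - z_{0.025}\sqrt{\hat{P}_{1600}(1-\hat{P}_{1600})/1600},\; \hat{P}_{1600} + z_{0.025}\sqrt{\hat{P}_{1600}(1-\hat{P}_{1600})/1600}\right],$$
that is,
$$\left[\tfrac{1090}{1600} - \sqrt{\tfrac{1090}{1600}\left(1-\tfrac{1090}{1600}\right)} \cdot \tfrac{1.96}{40},\; \tfrac{1090}{1600} + \sqrt{\tfrac{1090}{1600}\left(1-\tfrac{1090}{1600}\right)} \cdot \tfrac{1.96}{40}\right]$$
$= [0.681 - 0.023,\; 0.681 + 0.023] = [0.658, 0.704]$.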
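The interval in b) can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `proportion_ci` is a hypothetical helper and not part of the exam.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for one proportion (hypothetical helper)."""
    p_hat = successes / n
    # z_{alpha/2}: upper alpha/2 quantile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


lower, upper = proportion_ci(1090, 1600)
print(f"[{lower:.3f}, {upper:.3f}]")
```

The code uses the exact quantile instead of the rounded table value 1.96, which does not change the interval at three decimals.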
c) We are going to use the formula $n \geq z_{\alpha/2}^2 / (4 \cdot E_{\max}^2)$. In our case, $E_{\max} = 0.04/2 = 0.02$. Thus,
$n \geq 1.96^2 / (4 \cdot 0.0004) = 2401$.
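As a quick check of the sample size in c), here is a small sketch. The helper name `min_sample_size` is an assumption made for illustration; the formula is the conservative one from the text, which takes $p(1-p) \leq 1/4$.

```python
from math import ceil


def min_sample_size(max_error, z=1.96):
    """Smallest n with z^2 / (4 * E_max^2) <= n (hypothetical helper)."""
    raw = z ** 2 / (4 * max_error ** 2)
    # Round first so floating-point noise cannot push an exact integer past ceil().
    return ceil(round(raw, 9))


n_required = min_sample_size(0.02)
print(n_required)
```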

2.
a) The mean weight $\bar{X}_3$ of three independently and randomly sampled patients is a random variable with a
normal distribution with mean $\mu = 112$ and variance $\sigma^2/3 = 10^2/3 = 100/3 \approx 33.3$.
Thus,
$$P(\bar{X}_3 > 130) = P\left(Z > \frac{130-112}{10/\sqrt{3}}\right) = 1 - P(Z \leq 3.12) = 1 - 0.9991 = 0.0009,$$
where $Z \sim N(0, 1)$.
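The tail probability in a) can be reproduced without a z-table. This is a sketch with the standard library; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 112, 10, 3  # population mean, population sd, sample size (from the text)

# Distribution of the sample mean: N(mu, sigma^2 / n)
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))
p_exceed = 1 - sample_mean_dist.cdf(130)
print(f"{p_exceed:.4f}")
```

The result agrees with the table-based value 0.0009 up to the rounding of the z-score to 3.12.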

b) Exemplary choice and motivation:


Two independent samples t-test assuming equal variances (that is, equal standard deviations in both groups).
Motivation: The relative difference between the standard deviations is about 16% to 19% (depending
on which of the two is taken as the reference), so they are quite close and it can be assumed that the standard deviations
are equal.
(Another choice could also be motivated, but the motivation is always key.)

The test requires either sufficiently large samples (both ≥ 30) or normally distributed individual
measurements.
Either condition is satisfied here.

c) Parameter of interest: $\mu_1 - \mu_2$,
Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 < 0$.
The choice of test statistic has been made and motivated in b), so here we only give the formula
and state its distribution under $H_0$:
$$T_2^{eq} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_p^2/35 + S_p^2/35}},$$
which under $H_0$ has a t-distribution with $35 + 35 - 2 = 68$ degrees of freedom.

t-score of the test:

We first have to calculate
$$s_p^2 = \frac{(35-1)s_1^2 + (35-1)s_2^2}{35+35-2} = (7.4^2 + 6.2^2)/2 = 46.6.$$
Then
$$t_2^{eq} = \frac{158-163}{\sqrt{46.6/35 + 46.6/35}} \approx \frac{-5}{1.632} \approx -3.06.$$

We compare this to the critical value $-t_{68,0.01} \approx -t_{60,0.01} = -2.390$.


The t-score is less than the critical value.
Because this concerns a left-tailed test, we reject the null hypothesis in favour of the alternative
hypothesis. Concluding, we have shown (at significance level α = 1%) that metformin is effective in
reducing the weight.
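The pooled t-score from c) can be recomputed as follows. This is a sketch only: `pooled_t` is a hypothetical helper, the summary statistics are the ones used in the solution, and the critical value is copied from the t-table value quoted above because the standard library has no t-distribution.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t-score with pooled variance (hypothetical helper)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / sqrt(sp2 / n1 + sp2 / n2)
    return t, sp2, n1 + n2 - 2


t_score, sp2, df = pooled_t(158, 7.4, 35, 163, 6.2, 35)
critical = -2.390  # table value -t_{60,0.01}, used as an approximation for 68 df
reject_h0 = t_score < critical  # left-tailed test
print(f"sp2={sp2:.1f}, t={t_score:.2f}, df={df}, reject={reject_h0}")
```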
d) This is an unpaired testing problem because the individuals in both groups are not related at all.
(They even have been randomized into both treatment groups.)
As for designing the experiment differently, there are many options possible; below you can find a
selection of possible answers – others might be acceptable as well.
Exemplary answer about designing the study differently:
Yes, I would design it differently because one could take measurements before and after the treatment
with metformin and thus naturally end up with a two dependent samples problem.
("No" could also be argued if done correctly.)

3.
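a) We use the formula
$$\left[\hat{P}_1 - \hat{P}_2 - z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}},\; \hat{P}_1 - \hat{P}_2 + z_{\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}\right]$$
for $\alpha = 10\%$, which yields
$$\left[\frac{30}{40} - \frac{24}{36} - 1.645\sqrt{\frac{30/40 \cdot (1 - 30/40)}{40} + \frac{24/36 \cdot (1 - 24/36)}{36}},\; \frac{30}{40} - \frac{24}{36} + 1.645\sqrt{\frac{30/40 \cdot (1 - 30/40)}{40} + \frac{24/36 \cdot (1 - 24/36)}{36}}\right]$$
$$= \left[\frac{3}{4} - \frac{2}{3} - 1.645\sqrt{\frac{3}{4} \cdot \frac{1}{4}/40 + \frac{2}{3} \cdot \frac{1}{3}/36},\; \frac{3}{4} - \frac{2}{3} + 1.645\sqrt{\frac{3}{4} \cdot \frac{1}{4}/40 + \frac{2}{3} \cdot \frac{1}{3}/36}\right]$$
$$\approx \left[\tfrac{1}{12} - 1.645 \cdot 0.104,\; \tfrac{1}{12} + 1.645 \cdot 0.104\right]$$
$$\approx [-0.088, 0.255].$$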
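The 90% interval in a) can be verified with a short sketch. The helper `two_proportion_ci` is an illustrative name, not something defined in the exam; it uses the unpooled standard error, as in the formula above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """Normal-approximation CI for p1 - p2 with unpooled standard error (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


ci_low, ci_high = two_proportion_ci(30, 40, 24, 36, confidence=0.90)
print(f"[{ci_low:.3f}, {ci_high:.3f}]")
```

Since the interval contains 0, it is compatible with no difference between the two proportions at this confidence level.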

b) Parameter of interest: $p_1 - p_2$,
significance level: $\alpha = 5\%$,
$H_0: p_1 = p_2$ vs. $H_a: p_1 > p_2$.
Test statistic:
$$Z = \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\frac{\bar{P}(1-\bar{P})}{n_1} + \frac{\bar{P}(1-\bar{P})}{n_2}}},$$
where $\bar{P} = \frac{X_1 + X_2}{n_1 + n_2}$ is the pooled sample proportion;
$Z$ has under $H_0$ approximately an $N(0, 1)$-distribution if certain requirements are met.

And indeed, the requirements are met: $x_1 = 30$, $n_1 - x_1 = 10$, $x_2 = 24$, $n_2 - x_2 = 36 - 24 = 12$ are all
bigger than 5.
Test score:
$$z = \frac{\frac{3}{4} - \frac{2}{3}}{\sqrt{0.711 \cdot (1-0.711)/40 + 0.711 \cdot (1-0.711)/36}} \approx 0.8,$$
since $\bar{p} = 54/76 \approx 0.711$.
We compare this to the critical value $z_\alpha = 1.645$, and we see that the score is smaller, so we cannot
reject the null hypothesis based on this right-tailed test.
Concluding, there is not enough evidence to support the claim that CS students like programming
better than AI students.
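The pooled z-score in b) can be recomputed like this. It is a minimal sketch; `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z-score for H0: p1 = p2 using the pooled proportion (hypothetical helper)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)
    return (x1 / n1 - x2 / n2) / se


z_score = two_proportion_z(30, 40, 24, 36)
z_critical = NormalDist().inv_cdf(0.95)  # right-tailed test at alpha = 5%
reject_null = z_score > z_critical
print(f"z={z_score:.2f}, critical={z_critical:.3f}, reject={reject_null}")
```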

c) Exemplary justification:
These two approaches always yield the same conclusion:
The critical value method gives us a significant test result if the test score exceeds the 95%-quantile
of $N(0, 1)$.
The p-value method gives us a significant outcome if the area to the right of the test score $z$ under the
$N(0, 1)$ density is smaller than 5%. This is the case exactly when the score exceeds the 95%-quantile.
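The equivalence argued in c) can be illustrated numerically for a right-tailed z-test. The z-values below other than 0.8 are made up for illustration and do not come from the exam.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha)  # 95%-quantile of N(0, 1)


def decisions(z):
    """Return (reject by critical value, reject by p-value) for a right-tailed test."""
    p_value = 1 - std_normal.cdf(z)  # area to the right of z
    return z > z_alpha, p_value < alpha


for z in (0.8, 1.5, 1.7, 2.5):  # 0.8 is the score from b); the others are illustrative
    by_critical, by_p_value = decisions(z)
    print(z, by_critical, by_p_value)
```

Both columns agree for every score because the normal distribution function is strictly increasing, so $z > z_\alpha$ holds exactly when $1 - \Phi(z) < \alpha$.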
4.

a) This concerns a test of homogeneity because the data were sampled at two different times, and also
because we are primarily interested in the equality/difference of proportions of active users.

Hypotheses:
$H_0: p_{11} = p_{21}$ and $p_{12} = p_{22}$ vs. $H_a: p_{11} \neq p_{21}$ or $p_{12} \neq p_{22}$.
Alternatively, phrased in words:
H0 : The “before take-over” population and the “after take-over” population are homogeneous with
respect to their proportion of active/inactive users.
vs. Ha : The “before take-over” population and the “after take-over” population are heterogeneous
with respect to their proportion of active/inactive users.

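b) The expected frequencies under the null hypothesis are given by the product of the corresponding
marginal totals divided by the grand total. Thus,

$$E_{11} = \frac{N_{1\cdot} N_{\cdot 1}}{n} = \frac{1063 \cdot 922}{1963} = 499.28,$$
$$E_{12} = \frac{900 \cdot 922}{1963} = 422.72,$$
$$E_{21} = \frac{1063 \cdot 1041}{1963} = 563.72,$$
$$E_{22} = \frac{900 \cdot 1041}{1963} = 477.28.$$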

These expected numbers are all bigger than 5, so the requirements of the test are met.
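The expected frequencies can be derived directly from the observed table. In this sketch the observed counts are taken from the test score in part c), and `expected_counts` is a hypothetical helper.

```python
def expected_counts(table):
    """Expected cell counts under homogeneity: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed_table = [[526, 396],   # counts as they appear in the test score of part c)
                  [537, 504]]
expected_table = expected_counts(observed_table)
for row in expected_table:
    print([round(value, 2) for value in row])
```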

c) We use the test statistic
$$X^2 = \sum_{\text{cell } (i,j)} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}},$$
which has under $H_0$ approximately a chi-squared distribution with $(r-1) \cdot (c-1) = (2-1) \cdot (2-1) = 1$
degree of freedom
(from b) we know that the requirements are met).
The test score is
$$\chi^2 = \frac{(526 - 499.28)^2}{499.28} + \frac{(396 - 422.72)^2}{422.72} + \frac{(537 - 563.72)^2}{563.72} + \frac{(504 - 477.28)^2}{477.28} = 5.88.$$
We compare this to the critical value $\chi^2_{1,0.01} = 6.635$.
The score is smaller and the test is right-tailed which is why we cannot reject the null hypothesis.
Concluding, there is not enough evidence to support the claim that the proportion of active Twitter
users has changed after the take-over.
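The chi-squared score can be recomputed as sketched below. Because a chi-squared variable with 1 degree of freedom is the square of a standard normal variable, the 1% critical value can be obtained here as $z_{0.005}^2$ with the standard library alone; for more degrees of freedom a table or a statistics package would be needed.

```python
from statistics import NormalDist

observed = [526, 396, 537, 504]
expected = [499.28, 422.72, 563.72, 477.28]  # from part b)

chi2_score = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Valid for 1 degree of freedom only: chi2_{1,0.01} = (z_{0.005})^2
chi2_critical = NormalDist().inv_cdf(1 - 0.01 / 2) ** 2
reject_homogeneity = chi2_score > chi2_critical
print(f"chi2={chi2_score:.2f}, critical={chi2_critical:.3f}, reject={reject_homogeneity}")
```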

5.

a) The proportion of explained variation is given by $r^2 = 0.393^2 \approx 0.154$.

b) Coefficient estimates:
$b_1 = r \cdot \frac{s_y}{s_x} = 0.393 \cdot \frac{1.011}{196.544} \approx 0.002$
$b_0 = \bar{y} - b_1 \cdot \bar{x} = 7.673 - 3991.751 \cdot 0.002 \approx -0.311$
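The coefficient estimates can be reproduced from the summary statistics. This sketch follows the solution in plugging the rounded slope into the intercept, and also shows the intercept based on the unrounded slope for comparison; variable names are chosen for illustration.

```python
r = 0.393                      # sample correlation
s_x, s_y = 196.544, 1.011      # sample standard deviations of x and y
x_bar, y_bar = 3991.751, 7.673

b1 = r * s_y / s_x                           # slope estimate
b0_rounded = y_bar - round(b1, 3) * x_bar    # as in the solution: uses b1 rounded to 0.002
b0_unrounded = y_bar - b1 * x_bar            # uses the unrounded slope

print(f"b1={b1:.5f}, b0 (rounded b1)={b0_rounded:.3f}, b0 (unrounded b1)={b0_unrounded:.3f}")
```

Because $\bar{x}$ is large, the intercept is sensitive to how many digits of the slope are kept, so the two intercept values differ noticeably.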

c) In general:
Residual plots can be used to check the fixed error variance assumption, i.e. whether or not the
spread of the residuals is about constant for all values of x or whether there is some trend or some
other strange pattern.
QQ-plots based on the residuals can be used to check the normality assumption for the error terms.

In our present data example:


No clear pattern or trend is visible in the residual plot, so we consider the fixed error variance
assumption to be met.
Furthermore, the QQ-plot depicts points which are arranged almost perfectly on a straight line,
which could be used as an indication that the residuals (and thus the error terms) follow a normal
distribution.

d) This subquestion is granted full points for everybody because $s_{b_1}$ was missing on the
exam. This was announced during the exam, but some students had already left by then.
Parameter of interest: $\beta_1$, the slope parameter in the linear regression model.
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$.
Test statistic: $T = B_1 / S_{B_1}$,
which under $H_0$ (requirements are met, see c)) has approximately a t-distribution with $n - 2 =
38 - 2 = 36$ degrees of freedom.
t-score: $t = \frac{0.002}{0.00079} \approx 2.532$.
We want to compare this score to the critical values $\pm t_{36,0.025} = \pm 2.028$.
So we see that the test score belongs to the critical region on the right, thus we can reject H0 .
Consequently, we conclude that the Quiz scores do have an influence on the exam grades.
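The slope test in d) amounts to the following arithmetic. This is a sketch in which the critical value is copied from the t-table value quoted above, since the standard library has no t-distribution.

```python
b1_estimate = 0.002      # slope estimate from b)
se_b1 = 0.00079          # standard error of the slope, as used in the solution
n_obs = 38

t_slope = b1_estimate / se_b1
df_slope = n_obs - 2
t_critical = 2.028       # table value t_{36,0.025}
reject_zero_slope = abs(t_slope) > t_critical  # two-sided test at alpha = 5%
print(f"t={t_slope:.3f}, df={df_slope}, reject={reject_zero_slope}")
```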
