Chapter 3 - Goodness of Fit Tests
Introduction
In the previous section, we discussed and derived the least squares estimates
for the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.
If the model is fitted and no assumptions are violated, the next step is to use
the model to investigate the relationship between the independent and the
dependent variables as well as to make inference about the parameters.
First, consider the expected value of the slope estimator $\hat\beta_1 = S_{xy}/S_{xx}$:

$$
\begin{aligned}
E[\hat\beta_1] &= E\left[\frac{S_{xy}}{S_{xx}}\right] = \frac{1}{S_{xx}}E[S_{xy}] \\
&= \frac{1}{S_{xx}}E\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right] \\
&= \frac{1}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar{x})(\beta_0+\beta_1 x_i-\beta_0-\beta_1\bar{x}) \\
&= \frac{\beta_1}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x}) \\
&= \frac{\beta_1 S_{xx}}{S_{xx}} = \beta_1
\end{aligned}
$$

Thus $\hat\beta_1$ is an unbiased estimator of $\beta_1$.
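The unbiasedness result can be checked by simulation. Below is a minimal sketch in Python; all parameter values are hypothetical choices for illustration. Repeatedly generating data from the model and averaging the slope estimates should recover $\beta_1$.

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 20   # hypothetical true parameters
    x = np.linspace(0, 10, n)                    # fixed design points
    Sxx = np.sum((x - x.mean()) ** 2)

    estimates = []
    for _ in range(10_000):
        y = beta0 + beta1 * x + rng.normal(0, sigma, n)
        Sxy = np.sum((x - x.mean()) * (y - y.mean()))
        estimates.append(Sxy / Sxx)              # beta1-hat = Sxy / Sxx

    print(np.mean(estimates))   # close to 0.5, consistent with E[beta1-hat] = beta1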
Next, consider the variance of the slope estimator. Since $\sum_{i=1}^{n}(x_i-\bar{x})\bar{y} = 0$, we may write $S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})y_i$, and the $y_i$ are independent, so

$$
\begin{aligned}
\mathrm{Var}(\hat\beta_1) &= \mathrm{Var}\left(\frac{S_{xy}}{S_{xx}}\right) \\
&= \frac{1}{S_{xx}^2}\mathrm{Var}\left[\sum_{i=1}^{n}(x_i-\bar{x})y_i\right] \\
&= \frac{1}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar{x})^2\,\mathrm{Var}(y_i) \\
&= \frac{1}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sigma^2 \\
&= \frac{\sigma^2 S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}
\end{aligned}
$$
The variance of $\hat\beta_0$ is given by

$$
\begin{aligned}
\mathrm{Var}(\hat\beta_0) &= \mathrm{Var}(\bar{y}-\hat\beta_1\bar{x}) \\
&= \mathrm{Var}(\bar{y}) + \bar{x}^2\,\mathrm{Var}(\hat\beta_1) - 2\bar{x}\,\mathrm{Cov}(\bar{y},\hat\beta_1) \\
&= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}y_i\right) + \bar{x}^2\,\mathrm{Var}(\hat\beta_1) \qquad \text{since } \mathrm{Cov}(\bar{y},\hat\beta_1) = 0 \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(y_i) + \bar{x}^2\,\mathrm{Var}(\hat\beta_1) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \bar{x}^2\frac{\sigma^2}{S_{xx}} \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2\bar{x}^2}{S_{xx}} = \sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)
\end{aligned}
$$
Exercise 1: Find the covariance between $\bar{y}$ and $\hat\beta_1$ (this justifies dropping the covariance term above).
The sampling distribution of $\hat\beta_0$ is given by $\hat\beta_0 \sim N\!\left(\beta_0,\ \sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)\right)$, where $\sigma^2$ is the variance of the error term. (Because $\hat\beta_0$ is a linear combination of normal random variables, it must also be normal.)

The sampling distribution of $\hat\beta_1$ is given by $\hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$.
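These variance formulas can likewise be checked empirically. The sketch below, again with hypothetical parameter values, compares the Monte Carlo variances of $\hat\beta_0$ and $\hat\beta_1$ with $\sigma^2(1/n + \bar{x}^2/S_{xx})$ and $\sigma^2/S_{xx}$.

    import numpy as np

    rng = np.random.default_rng(1)
    beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 20   # hypothetical true parameters
    x = np.linspace(0, 10, n)
    Sxx = np.sum((x - x.mean()) ** 2)

    b0_hats, b1_hats = [], []
    for _ in range(20_000):
        y = beta0 + beta1 * x + rng.normal(0, sigma, n)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
        b1_hats.append(b1)
        b0_hats.append(y.mean() - b1 * x.mean())

    print(np.var(b1_hats), sigma**2 / Sxx)                        # should agree
    print(np.var(b0_hats), sigma**2 * (1/n + x.mean()**2 / Sxx))  # should agree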
Now, using the properties of the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$, inference about $\beta_0$ and $\beta_1$ can be made. Before we do so, we need to estimate $\sigma^2$. To estimate $\sigma^2$, we use the residuals, $r_i = y_i - \hat{y}_i$, which are the observed errors of fit. It is reasonable to expect the sample variance of the residuals to provide an estimator of $\sigma^2$:

$$
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \frac{1}{n-2}\sum_{i=1}^{n}r_i^2
$$
NB
(i) $s^2 = MSE$.
(ii) The $n-2$ in the denominator reflects the fact that we have $n$ pieces of information less the two used to estimate $\beta_0$ and $\beta_1$.
Tests involving β1
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic:

$$
t = \frac{\hat\beta_1-\beta_1}{\sqrt{s^2/S_{xx}}} \stackrel{H_0}{=} \frac{\hat\beta_1}{\sqrt{s^2/S_{xx}}} = \frac{\hat\beta_1\sqrt{S_{xx}}}{s}
$$
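As a quick illustration, the t statistic and its two-sided p-value can be computed from summary quantities as follows. The numbers used here are hypothetical, and scipy is assumed to be available.

    import numpy as np
    from scipy import stats

    b1_hat, s2, Sxx, n = 0.7, 4.2, 150.0, 25    # hypothetical summary statistics
    t = b1_hat / np.sqrt(s2 / Sxx)              # t = beta1-hat * sqrt(Sxx) / s
    p_value = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value on n-2 d.f.
    print(t, p_value)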
Rejection Criteria
We reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}(n-2)$.
Tests involving β0
$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Test statistic:

$$
t = \frac{\hat\beta_0-\beta_0}{\sqrt{s^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)}} \stackrel{H_0}{=} \frac{\hat\beta_0}{\sqrt{s^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)}}
$$
Rejection Criteria
We reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}(n-2)$.
Exercise 2: The data is of green liquor (Na$_2$S) concentration and paper machine production, with $n = 13$ observations.

$$
\hat\sigma^2 = s^2 = \frac{\sum(y_i-\hat{y}_i)^2}{n-2} = \frac{\sum r_i^2}{n-2} = \frac{80.5740}{13-2} = 7.3249
$$

$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 11529419 - 13(939)^2 = 67046
$$
Now

$$
t = \frac{\hat\beta_1}{\sqrt{\frac{s^2}{S_{xx}}}} = \frac{0.0694}{\sqrt{\frac{7.3249}{67046}}} = 6.6396
$$
A $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$$
\left(\hat\beta_0 - t_{n-2}\!\left(\tfrac{\alpha}{2}\right)\sqrt{s^2\left(\tfrac{1}{n}+\tfrac{\bar{x}^2}{S_{xx}}\right)},\ \ \hat\beta_0 + t_{n-2}\!\left(\tfrac{\alpha}{2}\right)\sqrt{s^2\left(\tfrac{1}{n}+\tfrac{\bar{x}^2}{S_{xx}}\right)}\right)
$$
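A sketch of this interval computation, using the summary figures from the temperature/speed worked example later in this chapter ($n = 5$, $s^2 \approx 21.0$, $S_{xx} = 8800$, $\bar{x} = 30$, $\hat\beta_0 \approx 331.24$):

    import numpy as np
    from scipy import stats

    b0_hat, s2, Sxx, xbar, n, alpha = 331.2364, 21.0061, 8800.0, 30.0, 5, 0.05
    half_width = stats.t.ppf(1 - alpha / 2, df=n - 2) \
        * np.sqrt(s2 * (1 / n + xbar ** 2 / Sxx))
    print(b0_hat - half_width, b0_hat + half_width)   # the 95% CI for beta0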
AIM: To compute $\sigma^2$.
Partitioning the total sum of squares
The variability of the observations is expressed in terms of the sum of squares of the observations about their mean, denoted SST and given by

$$SST = \sum_{i=1}^{n}(y_i-\bar{y})^2$$

SST = Total Sum of Squares

$$SSE = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{n}r_i^2$$

If all the $y_i$ values fall on the regression line, SSE will be zero. Thus, the larger the SSE, the greater the variation of the $y_i$ observations around the fitted regression line.

$$SSR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 \quad \text{(variability explained by the model)}$$

SSR = SST − SSE
SSR can be viewed as a measure of the effect of the regression relation in reducing the variability of the $y_i$.

Thus, for SLR, the decomposition of SST into two components is achieved as follows:

$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

$$
SSR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 = \frac{\left[\sum Y_iX_i - n\bar{Y}\bar{X}\right]^2}{\sum X_i^2 - n\bar{X}^2} = \frac{[S_{xy}]^2}{S_{xx}} = \hat\beta_1 S_{xy}
$$
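The decomposition and the identity $SSR = \hat\beta_1 S_{xy}$ can be verified numerically on any small data set; the sketch below uses made-up observations.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])   # made-up observations

    Sxx = np.sum((x - x.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b1 = Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x                        # fitted values

    SST = np.sum((y - y.mean()) ** 2)
    SSR = np.sum((y_hat - y.mean()) ** 2)
    SSE = np.sum((y - y_hat) ** 2)
    print(SST, SSR + SSE)    # equal up to rounding
    print(SSR, b1 * Sxy)     # equal up to rounding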
Partitioning degrees of freedom
SST has $n-1$ degrees of freedom (d.f.) associated with it. This is because SST involves $n$ deviations, namely $y_i-\bar{y}$, but there is one constraint on these deviations, namely $\sum_{i=1}^{n}(y_i-\bar{y}) = 0$, so we lose one degree of freedom. SSR has 1 d.f. (the fitted deviations $\hat{y}_i-\bar{y}$ are determined entirely by $\hat\beta_1$), and SSE has $n-2$ d.f. (two d.f. are lost in estimating $\beta_0$ and $\beta_1$). Thus, the degrees of freedom are additive: $(n-1) = (1) + (n-2)$.
Mean Squares
A sum of squares divided by its degrees of freedom is called a mean square, e.g. $s^2 = MSE$. The two important mean squares are the regression mean square, denoted MSR, and the error mean square, denoted MSE. Thus,

$$MSR = \frac{SSR}{1} \quad \text{and} \quad MSE = \frac{SSE}{n-2} = s^2$$

If $\beta_1 = 0$, then $E[MSR] = \sigma^2$; in this case, both MSE and MSR have the same expected value. When $\beta_1 \neq 0$, the term $\beta_1^2\sum(x_i-\bar{x})^2$ in $E[MSR] = \sigma^2 + \beta_1^2\sum(x_i-\bar{x})^2$ is positive, and $E[MSR] > E[MSE]$.
BASIC ANOVA TABLE
It is useful to collect the sums of squares, degrees of freedom and mean squares in an ANOVA table for the regression analysis. The table below gives the structure and the appearance of the basic ANOVA table.

Source of Variation   d.f.    SS     MS                   F
Regression            1       SSR    MSR = SSR/1          MSR/MSE
Error                 n − 2   SSE    MSE = SSE/(n − 2)
Total                 n − 1   SST

From the ANOVA table, we can obtain the estimate of the variance, $s^2$, and test the hypothesis that there is a regression relationship. If the model assumptions hold, the ratio F in the ANOVA table follows Fisher's F distribution with 1 and $n-2$ degrees of freedom. If F is near 1, then MSR and MSE are approximately equal; a large F suggests that $\beta_1 \neq 0$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: $F = \frac{MSR}{MSE}$
Decision Rule: reject $H_0$ if $F > F_{\alpha}(1, n-2)$.
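A small helper that assembles these ANOVA quantities and carries out the F test might look as follows; the function name and the returned layout are this sketch's own, not a standard API.

    import numpy as np
    from scipy import stats

    def slr_anova(x, y, alpha=0.05):
        # x, y: numpy arrays of equal length
        n = len(x)
        Sxx = np.sum((x - x.mean()) ** 2)
        Sxy = np.sum((x - x.mean()) * (y - y.mean()))
        SSR = Sxy ** 2 / Sxx                  # regression sum of squares
        SST = np.sum((y - y.mean()) ** 2)     # total sum of squares
        SSE = SST - SSR                       # error sum of squares
        MSR, MSE = SSR / 1, SSE / (n - 2)
        F = MSR / MSE
        F_crit = stats.f.ppf(1 - alpha, 1, n - 2)
        return {"SSR": SSR, "SSE": SSE, "SST": SST,
                "MSR": MSR, "MSE": MSE, "F": F, "reject H0": F > F_crit}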
Exercise: Consider the following data on temperature X (°C) and speed Y (m/s).

X, Temperature (°C)   Y, Speed (m/s)
-20                   323
0                     327
20                    340
50                    364
100                   384

(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table for this data set and hence test the hypothesis that the slope is zero. Use α = 0.01.
(c) Find the standard errors of $\hat\beta_0$ and $\hat\beta_1$.
Solution
(a)

$$
S_{xy} = \sum x_iy_i - n\bar{x}\bar{y} = 56940 - 5(30)(347.6) = 4800
$$

$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 13300 - 5(30)^2 = 8800
$$

$$
\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{4800}{8800} = 0.5455
$$

$$
\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 347.6 - 0.5455(30) = 331.2364
$$
(b)

$$
SST = \sum(y_i-\bar{y})^2 = \sum y_i^2 - n\bar{y}^2 = 606810 - 5(347.6)^2 = 2681.2
$$

$$
SSR = \sum(\hat{y}_i-\bar{y})^2 = \frac{S_{xy}^2}{S_{xx}} = \frac{4800^2}{8800} = 2618.1818
$$

Therefore $SSE = SST - SSR = 63.0182$, $MSE = \frac{SSE}{n-2} = 21.0061$, $MSR = \frac{SSR}{1} = 2618.1818$ and

$$
F = \frac{MSR}{MSE} = 124.6394
$$

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Since $F = 124.6394 > F_{0.01}(1, 3) = 34.1$, we reject $H_0$ and conclude that, at $\alpha = 0.01$, we have sufficient evidence that the regression is significant.
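The calculations in parts (a) and (b) can be reproduced directly from the data:

    import numpy as np
    from scipy import stats

    x = np.array([-20.0, 0.0, 20.0, 50.0, 100.0])
    y = np.array([323.0, 327.0, 340.0, 364.0, 384.0])
    n = len(x)

    Sxy = np.sum(x * y) - n * x.mean() * y.mean()   # 4800
    Sxx = np.sum(x ** 2) - n * x.mean() ** 2        # 8800
    b1 = Sxy / Sxx                                  # 0.5455
    b0 = y.mean() - b1 * x.mean()                   # 331.2364

    SST = np.sum(y ** 2) - n * y.mean() ** 2        # 2681.2
    SSR = Sxy ** 2 / Sxx                            # 2618.1818
    SSE = SST - SSR                                 # 63.0182
    F = (SSR / 1) / (SSE / (n - 2))                 # 124.64
    print(b0, b1, F, stats.f.ppf(0.99, 1, n - 2))   # critical value 34.1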
(c) The standard error of $\hat\beta_0$ is $\sqrt{\mathrm{Var}(\hat\beta_0)}$ and the standard error of $\hat\beta_1$ is $\sqrt{\mathrm{Var}(\hat\beta_1)}$.

$$
\mathrm{Var}(\hat\beta_0) = \hat\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right) = 21.0061\left(\frac{1}{5}+\frac{30^2}{8800}\right) = 6.349571136
$$

$$\Rightarrow s.e.(\hat\beta_0) = 2.5198$$

$$
\mathrm{Var}(\hat\beta_1) = \frac{\hat\sigma^2}{S_{xx}} = \frac{21.0061}{8800} = 0.002387056
$$

$$\Rightarrow s.e.(\hat\beta_1) = 0.0489$$
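And the standard errors in part (c), computed from the quantities already obtained above:

    import numpy as np

    s2, Sxx, xbar, n = 21.0061, 8800.0, 30.0, 5      # from parts (a) and (b)
    se_b0 = np.sqrt(s2 * (1 / n + xbar ** 2 / Sxx))  # 2.5198
    se_b1 = np.sqrt(s2 / Sxx)                        # 0.0489
    print(se_b0, se_b1)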
Coefficient of Variation (CV)
The coefficient of variation (CV) measures the spread of the noise (natural dispersion) around the regression line. It is given by

$$CV = \frac{s}{\bar{y}} \times 100\%$$

The CV is scale-free, so it provides a better measure of spread than $s = \sqrt{s^2}$. A small value of CV suggests a good fit, i.e. there is not much noise around the line.
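A quick check of the CV using the worked example figures above ($s^2 = 21.0061$, $\bar{y} = 347.6$):

    import numpy as np

    s2, ybar = 21.0061, 347.6        # MSE and mean response from the example
    cv = np.sqrt(s2) / ybar * 100    # about 1.32%, i.e. little noise about the line
    print(cv)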
Testing for Lack of Fit
Specifically, the hypotheses we wish to test are:
$H_0$: The simple linear regression model is correct
$H_1$: The simple linear regression model is not correct
The test involves partitioning the error (residual) sum of squares into the following components:
$$SSE = SS_{PE} + SS_{LOF}$$

where $SS_{PE}$ is the sum of squares attributable to pure error, and $SS_{LOF}$ is the sum of squares attributable to the lack of fit of the model. The test requires that there be replicates (replication) at one or more values of the predictor/explanatory variable (X).

The decomposition rests on the identity

$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$

where $\bar{y}_i$ is the average of the $n_i$ observations at $X_i$. Squaring both sides and summing over $i$ and $j$ yields

$$
\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij}-\hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2 + \sum_{i=1}^{m}n_i(\bar{y}_i-\hat{y}_i)^2
$$

There are $n-m$ degrees of freedom associated with the pure error sum of squares, where $m$ is the number of distinct $X$ values. The sum of squares for lack of fit is simply $SS_{LOF} = SSE - SS_{PE}$ and it has $m-2$ degrees of freedom.
The test statistic for lack of fit is then

$$
F^* = \frac{SS_{LOF}/(m-2)}{SS_{PE}/(n-m)} = \frac{MS_{LOF}}{MS_{PE}}, \qquad F^* \sim F_{(m-2,\,n-m)} \text{ under } H_0.
$$
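A sketch of the whole lack-of-fit computation; the function name is this sketch's own, and x may contain repeated values (the replicates).

    import numpy as np
    from scipy import stats

    def lack_of_fit_test(x, y):
        # x, y: numpy arrays; x contains replicates at some levels
        n = len(x)
        Sxx = np.sum((x - x.mean()) ** 2)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
        b0 = y.mean() - b1 * x.mean()
        SSE = np.sum((y - (b0 + b1 * x)) ** 2)      # residual sum of squares

        levels = np.unique(x)                       # the m distinct x values
        m = len(levels)
        # pure error: variation of replicates around their level means
        SSPE = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
        SSLOF = SSE - SSPE
        F = (SSLOF / (m - 2)) / (SSPE / (n - m))
        p_value = stats.f.sf(F, m - 2, n - m)
        return F, p_value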
Exercise 6: The following data set gives the cost of maintenance of a tractor (Y) and the age of that tractor (X).
Age X Cost Y
4.5 62
4.5 105
4.5 103
4.0 50
4.0 72
5.0 68
5.0 89
5.5 99
1.0 16
1.0 18
6.0 76
2.5 98
2.5 47
2.5 55
(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table and use the F test to test the significance of the regression with α = 0.05.
(c) Test the significance of the regression constant (the intercept) using α = 0.01.
(d) Test the model for lack of fit.
Solution
(a) The model is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with

$$
\hat\beta_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{4003.9 - 14(3.7256)(68.4286)}{227.14 - 14(3.7256)^2} = \frac{431.9286}{32.5086} = 13.2866
$$
(b)

$$
SST = \sum y_i^2 - n\bar{y}^2 = 76702 - 14(68.4286)^2 = 11147.4286
$$
$$
SSR = \hat\beta_1 S_{xy} = 13.2866(431.9286) = 5738.8625
$$

$MSR = \frac{SSR}{1} = SSR = 5738.8625$, $SSE = SST - SSR = 5408.5661$, $MSE = \frac{SSE}{12} = 450.7138$ and

$$
F = \frac{MSR}{MSE} = 12.7328
$$

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Since $F = 12.7328 > F_{0.05}(1, 12) = 4.75$, we reject $H_0$ and conclude that the regression is significant.
(c) $H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$

Test statistic: $t = \frac{\hat\beta_0-\beta_0}{s.e.(\hat\beta_0)} \sim t(n-2)$

Rejection criteria: we reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}(n-2) = t_{0.005}(12) = 3.05$.

Test statistic:

$$
t = \frac{\hat\beta_0}{s.e.(\hat\beta_0)} = \frac{\hat\beta_0}{\sqrt{\hat\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{18.8885}{14.9878} = 1.2603
$$

Since $t < 3.05$, we fail to reject $H_0$ and conclude that the regression constant is not significant.
(d)

$x_i$   $y_{ij}$        $\bar{y}_{.j}$   $\sum_j(y_{ij}-\bar{y}_{.j})^2$   d.f.
4.5     62, 105, 103    90               1178                              2
4.0     50, 72          61               242                               1
5.0     68, 89          78.5             220.5                             1
5.5     99              99               0                                 0
1.0     16, 18          17               2                                 1
6.0     76              76               0                                 0
2.5     98, 47, 55      66.7             1504.6667                         2
$$
SS_{PE} = \sum_{j=1}^{7}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2 = 3147.166667
$$

$m = 7$, therefore d.f. $= n - m = 14 - 7 = 7$.

$SS_{LOF} = SSE - SS_{PE} = 2261.399433$, with d.f. $= m - 2 = 7 - 2 = 5$.

$$
MS_{PE} = \frac{SS_{PE}}{n-m} = 449.5952381, \qquad MS_{LOF} = \frac{SS_{LOF}}{m-2} = 452.2798866
$$

$$
F = \frac{MS_{LOF}}{MS_{PE}} = 1.005971257
$$

Since $F$ is close to 1 and $F < F_{0.05}(5, 7) = 3.97$, we fail to reject $H_0$: there is no evidence of lack of fit of the simple linear regression model.
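With the lack_of_fit_test sketch from earlier in scope, the tractor computation can be reproduced; the value may differ slightly from the hand computation above because of rounding in the summary figures.

    import numpy as np

    x = np.array([4.5, 4.5, 4.5, 4.0, 4.0, 5.0, 5.0, 5.5,
                  1.0, 1.0, 6.0, 2.5, 2.5, 2.5])
    y = np.array([62, 105, 103, 50, 72, 68, 89, 99,
                  16, 18, 76, 98, 47, 55], dtype=float)
    F, p = lack_of_fit_test(x, y)
    print(F, p)    # F near 1: no evidence of lack of fit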
Activity 3.1
(1) To understand better the plight of people returning to their professions after periods of inactivity, a survey was conducted at some 67 randomly selected hospitals throughout Zimbabwe. The administrators were asked about their willingness to hire medical technologists who had been away from the field for a certain number of years. The results are summarized in the table below.
Years of Inactivity X Percentage of Hospitals willing to Hire Y
0.25 100
1.5 94
4 75
8 44
13 28
18 17
(2)
Years of Inactivity X Percentage of Hospitals willing to Hire Y
4 3.9
8 8.1
12.5 12.4
16 16
20 19.8
25 25
31 31.1
36 35.8
40 40.1
(3) Suppose that the following data set on the cost of maintenance of a
tractor (Y) and the age of that tractor (X) was collected in Mashona-
land West amongst commercial farmers.
Age X Cost Y
4 55
4 85
4 100
3.5 50
3.5 72
4.5 78
4.5 92
5.0 105
1.0 25
1.0 29
3.0 52
1.5 37
1.5 47
1.5 55