
STAT112 APPLIED STATISTICS II

CHAPTER 3: GOODNESS OF FIT OF THE SIMPLE LINEAR REGRESSION MODEL

Introduction

In the previous section, we discussed and derived the least squares estimates for the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.
If the model is fitted and no assumptions are violated, the next step is to use the model to investigate the relationship between the independent and dependent variables, as well as to make inferences about the parameters.

Properties of the Estimators

The expectations of $\hat{\beta}_0$ and $\hat{\beta}_1$ can be shown to be $\beta_0$ and $\beta_1$, respectively; that is, the least squares estimates of $\beta_0$ and $\beta_1$ are unbiased.

For the slope ($\beta_1$), we have

\begin{align*}
E[\hat{\beta}_1] &= E\left[\frac{S_{xy}}{S_{xx}}\right] \\
&= \frac{1}{S_{xx}} E[S_{xy}] \\
&= \frac{1}{S_{xx}} E\left[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n}(x_i - \bar{x})(\beta_0 + \beta_1 x_i - \beta_0 - \beta_1\bar{x}) \\
&= \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x}) \\
&= \frac{\beta_1 S_{xx}}{S_{xx}} = \beta_1
\end{align*}
For the intercept ($\beta_0$), we have

\begin{align*}
E[\hat{\beta}_0] &= E[\bar{y} - \hat{\beta}_1\bar{x}] \\
&= E(\bar{y}) - \beta_1\bar{x} \\
&= \frac{1}{n}\sum E(y_i) - \beta_1\bar{x} \\
&= \frac{1}{n}\sum(\beta_0 + \beta_1 x_i) - \beta_1\bar{x} \\
&= \beta_0 + \beta_1\bar{x} - \beta_1\bar{x} \\
&= \beta_0
\end{align*}

NB: $Var(y_i) = Var(\epsilon_i) = \sigma^2$ (homogeneous variance assumption). That is, the model error variance is constant for a fixed value of the regressor variable.

The variance of $\hat{\beta}_1$ is given by

\begin{align*}
Var(\hat{\beta}_1) &= Var\left(\frac{S_{xy}}{S_{xx}}\right) \\
&= \left(\frac{1}{S_{xx}}\right)^2 Var\left[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right] \\
&= \left(\frac{1}{S_{xx}}\right)^2 Var\left[\sum_{i=1}^{n}(x_i - \bar{x})\,y_i\right] \qquad \left(\text{since } \sum_{i=1}^{n}(x_i - \bar{x})\bar{y} = 0\right) \\
&= \left(\frac{1}{S_{xx}}\right)^2 \sum_{i=1}^{n}(x_i - \bar{x})^2\,Var(y_i) \qquad (\text{the } y_i \text{ are independent}) \\
&= \left(\frac{1}{S_{xx}}\right)^2 \sum_{i=1}^{n}(x_i - \bar{x})^2\,\sigma^2 \\
&= \frac{S_{xx}}{S_{xx}^2}\,\sigma^2 = \frac{\sigma^2}{S_{xx}}
\end{align*}
The variance of $\hat{\beta}_0$ is given by

\begin{align*}
Var(\hat{\beta}_0) &= Var(\bar{y} - \hat{\beta}_1\bar{x}) \\
&= Var(\bar{y}) + \bar{x}^2\,Var(\hat{\beta}_1) - 2\bar{x}\,Cov(\bar{y}, \hat{\beta}_1) \\
&= Var\left(\frac{1}{n}\sum_{i=1}^{n} y_i\right) + \bar{x}^2\,Var(\hat{\beta}_1) \qquad \left(\text{since } Cov(\bar{y}, \hat{\beta}_1) = 0\right) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} Var(y_i) + \bar{x}^2\,Var(\hat{\beta}_1) \\
&= \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 + \bar{x}^2\,\frac{\sigma^2}{S_{xx}} \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2\bar{x}^2}{S_{xx}} \\
&= \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)
\end{align*}
Exercise 1 Find the covariance between $\bar{y}$ and $\hat{\beta}_1$ (the quantity set to zero in the derivation above).
The sampling distribution of $\hat{\beta}_0$ is given by $\hat{\beta}_0 \sim N\left(\beta_0,\ \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)$, where $\sigma^2$ is the variance of the error term. (Because $\hat{\beta}_0$ is a linear combination of normal random variables, it is itself normal.)

The sampling distribution of $\hat{\beta}_1$ is given by $\hat{\beta}_1 \sim N\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$.

Now, using the properties of the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$, inference about $\beta_0$ and $\beta_1$ can be made. Before we infer on $\beta_0$ and $\beta_1$, we need to estimate $\sigma^2$. To estimate $\sigma^2$, we will use the residuals $y_i - \hat{y}_i$, which are the observed errors of fit. It is reasonable to say that the sample variance of the residuals should provide an estimator of $\sigma^2$:

$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{n-2}\sum_{i=1}^{n} r_i^2$$
NB

(i) $s^2 = MSE$.

(ii) Under the regression model assumptions, $s^2$ is an unbiased estimator of $\sigma^2$.

(iii) The $n-2$ in the denominator comes from the fact that we have $n$ pieces of information less the two estimated parameters, $\beta_0$ and $\beta_1$.
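For concreteness, these estimates can be computed directly from data. Below is a minimal NumPy sketch (the helper name fit_slr and its interface are ours, not from the text):

```python
import numpy as np

def fit_slr(x, y):
    """Least squares fit of y = b0 + b1*x; returns (b0_hat, b1_hat, s2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)               # S_xx
    sxy = np.sum((x - x.mean()) * (y - y.mean()))   # S_xy
    b1 = sxy / sxx                                  # slope estimate
    b0 = y.mean() - b1 * x.mean()                   # intercept estimate
    resid = y - (b0 + b1 * x)                       # residuals r_i
    s2 = np.sum(resid ** 2) / (n - 2)               # unbiased estimate of sigma^2 (= MSE)
    return b0, b1, s2
```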

Making inference about the parameters $\beta_0$ and $\beta_1$

$H_0: \beta_1 = 0$ (there is no linear relationship between Y and X)
$H_1: \beta_1 \neq 0$ (there is a linear relationship between Y and X)

$\beta_1 = 0$ means that changes in x have no effect on y.

NB: The rejection of $H_0$ does not say anything about the quality of the fit (whether it is good or bad).

Test statistic:

$$t = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{s^2/S_{xx}}} = \frac{\hat{\beta}_1}{\sqrt{s^2/S_{xx}}} = \frac{\hat{\beta}_1\sqrt{S_{xx}}}{s}$$

Rejection Criteria

We reject $H_0$ if $|t| > t_{n-2}(\tfrac{\alpha}{2})$.

Tests involving $\beta_0$

$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$

Test statistic:

$$t = \frac{\hat{\beta}_0 - \beta_0}{\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{\hat{\beta}_0}{\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}}$$

Rejection Criteria

We reject $H_0$ if $|t| > t_{n-2}(\tfrac{\alpha}{2})$.

Confidence intervals for $\beta_0$ and $\beta_1$

(i) The $(1-\alpha)100\%$ confidence interval for $\beta_1$ is given by

$$\left(\hat{\beta}_1 - t_{n-2}(\tfrac{\alpha}{2})\sqrt{\frac{s^2}{S_{xx}}},\ \ \hat{\beta}_1 + t_{n-2}(\tfrac{\alpha}{2})\sqrt{\frac{s^2}{S_{xx}}}\right)$$

(ii) The $(1-\alpha)100\%$ confidence interval for $\beta_0$ is given by

$$\left(\hat{\beta}_0 - t_{n-2}(\tfrac{\alpha}{2})\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)},\ \ \hat{\beta}_0 + t_{n-2}(\tfrac{\alpha}{2})\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}\right)$$
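A sketch of how these tests and intervals might be computed from the fitted quantities, using scipy.stats.t for the critical value (the function slr_inference and its argument names are ours):

```python
import numpy as np
from scipy import stats

def slr_inference(b0, b1, s2, sxx, xbar, n, alpha=0.05):
    """t statistics (for H0: beta = 0) and CIs from SLR summary quantities."""
    se_b1 = np.sqrt(s2 / sxx)                        # s.e. of the slope
    se_b0 = np.sqrt(s2 * (1 / n + xbar**2 / sxx))    # s.e. of the intercept
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)     # t_{n-2}(alpha/2)
    return {
        "t_b1": b1 / se_b1, "ci_b1": (b1 - tcrit * se_b1, b1 + tcrit * se_b1),
        "t_b0": b0 / se_b0, "ci_b0": (b0 - tcrit * se_b0, b0 + tcrit * se_b0),
        "t_crit": tcrit,   # reject H0 when |t| exceeds this
    }
```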

Exercise 2 The data is of green liquor (Na₂S) concentration and paper machine production.

| Green Liquor (Na₂S) Conc. $y_i$ | Fitted Conc. $\hat{y}_i$ | Production $x_i$ | Residual $r_i$ |
|---|---|---|---|
| 40 | 40.7089 | 825 | -0.7089 |
| 42 | 41.0556 | 830 | 0.9444 |
| 49 | 45.2170 | 890 | 3.7830 |
| 46 | 45.5637 | 895 | 0.4363 |
| 44 | 45.2170 | 890 | -1.2170 |
| 48 | 46.6041 | 910 | 1.3959 |
| 46 | 46.9509 | 915 | -0.9509 |
| 43 | 50.0718 | 960 | -7.0718 |
| 53 | 52.1525 | 990 | 0.8475 |
| 52 | 53.5396 | 1010 | -1.5396 |
| 54 | 53.6783 | 1012 | 0.3217 |
| 57 | 54.9267 | 1030 | 2.0733 |
| 58 | 56.3138 | 1050 | 1.6862 |
The fitted linear regression model is $\hat{y}_i = -16.5093 + 0.0694x_i$.

(a) Test $H_0: \beta_1 = 0$ using the t-test at the 5% level of significance.
(b) Find a 95% confidence interval for the intercept.
Solution

(a) We need to estimate $\sigma^2$ and find $S_{xx}$ first:

$$\hat{\sigma}^2 = s^2 = \frac{\sum(y_i - \hat{y}_i)^2}{n-2} = \frac{\sum r_i^2}{n-2} = \frac{80.5740}{13-2} = 7.3249$$

$$S_{xx} = \sum x_i^2 - n\bar{x}^2 = 11529419 - 13(939)^2 = 67046$$

Now

$$t = \frac{\hat{\beta}_1}{\sqrt{s^2/S_{xx}}} = \frac{0.0694}{\sqrt{7.3249/67046}} = 6.6396$$

Testing at the 5% level of significance, $t_{n-2}(\tfrac{\alpha}{2}) = t_{11}(0.025) = 2.20$.

Since $|t| > 2.20$, we reject $H_0$ and conclude that, at the 5% level of significance, the slope is significantly different from zero; that is, there is a linear relationship between X and Y.

(b) The $(1-\alpha)100\%$ confidence interval for $\beta_0$ is given by

$$\hat{\beta}_0 \pm t_{n-2}(\tfrac{\alpha}{2})\sqrt{s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$$

so the 95% confidence interval for $\beta_0$ is

\begin{align*}
&\left(-16.5093 \pm 2.20\sqrt{7.3249\left(\frac{1}{13} + \frac{939^2}{67046}\right)}\right) \\
&= (-16.5093 \pm 2.20 \times 9.8434) \\
&= (-16.5093 \pm 21.6555) \\
&= (-38.1648,\ 5.1462)
\end{align*}
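The arithmetic above can be double-checked in a few lines; this sketch uses only the summary figures quoted in the solution:

```python
import math
from scipy import stats

n = 13
s2 = 80.5740 / (n - 2)               # 7.3249
sxx = 11529419 - n * 939**2          # 67046 (x-bar = 939)
t = 0.0694 / math.sqrt(s2 / sxx)     # approx 6.64
tcrit = stats.t.ppf(0.975, n - 2)    # approx 2.201 (table value 2.20)
print(abs(t) > tcrit)                # True: reject H0, slope is significant

half = tcrit * math.sqrt(s2 * (1 / n + 939**2 / sxx))
print(-16.5093 - half, -16.5093 + half)  # approx (-38.2, 5.1)
```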

ANALYSIS OF VARIANCE APPROACH TO SIMPLE LINEAR REGRESSION

A method called the Analysis of Variance (ANOVA) can be used to test the significance of regression. ANOVA is a highly useful and flexible mode of analysis for regression models.

AIM

(i) To compute $\sigma^2$.
(ii) To measure the degree of linear relationship between X and Y in the sample data.

Partitioning the total sum of squares

The measure of variability of the observations is expressed in terms of the sum of squares of the observations, denoted SST and given by

$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$$

SST = Total Sum of Squares.

If there is a lot of variability in the $y_i$'s, then SST is large. If $SST = 0$, all the $y_i$'s are the same.

Error Sum of Squares

The uncertainty associated with a prediction is related to the variability of the $y_i$ around the fitted regression line, as measured by the deviation $r_i = y_i - \hat{y}_i$ (variability not explained by the model). If all the $y_i$ values fall on the regression line, all the deviations $r_i$ will be zero.
The conventional measure of variability around the fitted regression line is the error sum of squares (SSE), or $SS_{Residual}$, calculated as follows:

$$SSE = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} r_i^2$$

If all the $y_i$ values fall on the regression line, SSE will be zero. Thus, the larger the SSE, the greater the variation of the $y_i$ observations around the fitted regression line.

Regression Sum of Squares

The reduction in variability associated with utilizing knowledge of the independent variable X is another sum of squares, known as the Regression Sum of Squares (SSR). It is defined as

$$SSR = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$$

(variability explained by the model), and

$$SSR = SST - SSE$$

SSR can be viewed as a measure of the effect of the regression relation in reducing the variability of $y_i$.

If $SSR = 0$, then the regression relation does not reduce variability at all.

The ratio SSR/SST gives the proportion of variation in Y explained by the regression.

Thus, for SLR, the decomposition of SST into two components is achieved as follows:

$$SST = SSR + SSE$$

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

The computational formulas for the above are as follows:

$$SST = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2$$

\begin{align*}
SSR &= \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \\
&= \frac{\left[\sum x_i y_i - n\bar{x}\bar{y}\right]^2}{\sum x_i^2 - n\bar{x}^2} \\
&= \frac{[S_{xy}]^2}{S_{xx}} \\
&= \hat{\beta}_1 S_{xy}
\end{align*}

and $SSE = SST - SSR$.
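These formulas translate directly into code. A small sketch (the name sums_of_squares is ours) that also checks the decomposition identity numerically:

```python
import numpy as np

def sums_of_squares(x, y):
    """Return (SST, SSR, SSE) for a simple linear regression fit."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    yhat = b0 + b1 * x                     # fitted values
    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ssr = np.sum((yhat - y.mean()) ** 2)   # regression sum of squares
    sse = np.sum((y - yhat) ** 2)          # error sum of squares
    assert np.isclose(sst, ssr + sse)      # SST = SSR + SSE
    return sst, ssr, sse
```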

Partitioning degrees of freedom

SST has $n-1$ degrees of freedom (d.f.) associated with it. This is because SST has $n$ deviations $y_i - \bar{y}$, but there is one constraint on these deviations, namely $\sum_{i=1}^{n}(y_i - \bar{y}) = 0$, so we lose one degree of freedom and are left with $n-1$ degrees of freedom in the $n$ deviations.

SSE has $n-2$ degrees of freedom, since two constraints are imposed on the $r_i$'s during the estimation of $\beta_0$ and $\beta_1$.

SSR has one (1) degree of freedom: there are two parameters in the regression function, but the deviations $\hat{y}_i - \bar{y}$ are subject to the constraint $\sum_{i=1}^{n}(\hat{y}_i - \bar{y}) = 0$.

Thus, the degrees of freedom are additive: $(n-1) = (1) + (n-2)$.

Mean Squares

A sum of squares divided by its degrees of freedom is called a mean square, e.g. $s^2 = MSE$. The two important mean squares are the regression mean square, denoted MSR, and the error mean square, denoted MSE. Thus,

$$MSR = \frac{SSR}{1} \quad \text{and} \quad MSE = \frac{SSE}{n-2} = s^2$$

Some properties of mean squares

It can be shown that

(i) $E[MSE] = \sigma^2$

(ii) $E[MSR] = \sigma^2 + \beta_1^2\sum(x_i - \bar{x})^2$

If $\beta_1 = 0$, then $E[MSR] = \sigma^2$; in this case, both MSE and MSR have the same expected value.

When $\beta_1 \neq 0$, the term $\beta_1^2\sum(x_i - \bar{x})^2$ is positive and $E[MSR] > E[MSE]$. Hence, if $\beta_1 \neq 0$, MSR will tend to be larger than MSE.

NB: Under $H_0: \beta_1 = 0$, $\frac{SSE}{\sigma^2}$ and $\frac{SSR}{\sigma^2}$ are independent chi-square random variables with $n-2$ and 1 degrees of freedom, respectively.

BASIC ANOVA TABLE

It is useful to collect the sums of squares, degrees of freedom and mean squares in an ANOVA table for regression analysis. The table below gives the structure and appearance of the basic ANOVA table.

Table 1: BASIC ANOVA TABLE

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | $SSR = \sum(\hat{y}_i - \bar{y})^2$ | 1 | $MSR = \frac{SSR}{1}$ | $F = \frac{MSR}{MSE}$ |
| Error | $SSE = \sum(y_i - \hat{y}_i)^2$ | $n-2$ | $MSE = \frac{SSE}{n-2}$ | |
| Total | $SST = \sum(y_i - \bar{y})^2$ | $n-1$ | | |

From the ANOVA table, we can obtain the estimate of the variance, $s^2$, and test the hypothesis that there is a regression relationship. The ratio F in the ANOVA table follows Fisher's F distribution with 1 and $n-2$ degrees of freedom, if the assumptions of the model hold.

If F is near 1, then MSR and MSE are approximately equal. A value of F much larger than 1 suggests that $\beta_1 \neq 0$.

Our hypotheses are as follows:

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: F

Decision Rule

We reject $H_0$ if $F > F_{1,n-2}(1-\alpha)$.
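A sketch of the whole ANOVA computation, with scipy.stats.f supplying the critical value $F_{1,n-2}(1-\alpha)$ (the helper name anova_slr is ours):

```python
import numpy as np
from scipy import stats

def anova_slr(x, y, alpha=0.05):
    """Basic ANOVA table for simple linear regression with an F test of beta1 = 0."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    yhat = y.mean() + b1 * (x - x.mean())        # fitted values
    ssr = np.sum((yhat - y.mean()) ** 2)         # regression SS, 1 d.f.
    sse = np.sum((y - yhat) ** 2)                # error SS, n - 2 d.f.
    msr, mse = ssr / 1, sse / (n - 2)
    F = msr / mse
    fcrit = stats.f.ppf(1 - alpha, 1, n - 2)     # F_{1, n-2}(1 - alpha)
    return {"SSR": ssr, "SSE": sse, "SST": ssr + sse,
            "MSR": msr, "MSE": mse, "F": F, "reject_H0": F > fcrit}
```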


Exercise 3 An investigator interested in the dependence of the speed of sound on temperature obtained the following measurements.

| X, Temperature (°C) | Y, Speed (m/s) |
|---|---|
| -20 | 323 |
| 0 | 327 |
| 20 | 340 |
| 50 | 364 |
| 100 | 384 |

The suggested model is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, for $i = 1, 2, ..., 5$.

(a) Find the least squares estimates of $\beta_0$ and $\beta_1$.

(b) Construct the ANOVA table for this data set and hence test the hypothesis that the slope is zero. Use $\alpha = 0.01$.

(c) Find the standard errors of $\hat{\beta}_0$ and $\hat{\beta}_1$.

Solution

(a)

\begin{align*}
S_{xy} &= \sum x_i y_i - n\bar{x}\bar{y} = 56940 - 5(30)(347.6) = 4800 \\
S_{xx} &= \sum x_i^2 - n\bar{x}^2 = 13300 - 5(30)^2 = 8800 \\
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} = \frac{4800}{8800} = 0.5455 \\
\hat{\beta}_0 &= \bar{y} - \hat{\beta}_1\bar{x} = 347.6 - 0.5455(30) = 331.2364
\end{align*}

Therefore, $\hat{y}_i = 331.2364 + 0.5455x_i$.

(b)

\begin{align*}
SST &= \sum(y_i - \bar{y})^2 = \sum y_i^2 - n\bar{y}^2 \\
&= 606810 - 5(347.6)^2 = 2681.2
\end{align*}

$$SSR = \sum(\hat{y}_i - \bar{y})^2 = \frac{S_{xy}^2}{S_{xx}} = \frac{4800^2}{8800} = 2618.1818$$

Therefore $SSE = SST - SSR = 63.0182$, $MSE = \frac{SSE}{n-2} = 21.0061$, $MSR = \frac{SSR}{1} = 2618.1818$, and

$$F = \frac{MSR}{MSE} = 124.6394$$

Table 2: ANOVA TABLE

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | 2618.1818 | 1 | 2618.1818 | 124.6394 |
| Error | 63.0182 | 3 | 21.0061 | |
| Total | 2681.2 | 4 | | |

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: $F = 124.6394$

Rejection Criteria
We reject $H_0$ if $F > F_{1,3}(0.01) = 34.1$.

Since $F > 34.1$, we reject $H_0$ and conclude that at $\alpha = 0.01$ we have sufficient evidence that the regression is significant.
(c) The standard error of $\hat{\beta}_0$ is $\sqrt{Var(\hat{\beta}_0)}$ and the standard error of $\hat{\beta}_1$ is $\sqrt{Var(\hat{\beta}_1)}$.

\begin{align*}
Var(\hat{\beta}_0) &= \hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right) \\
&= 21.0061\left(\frac{1}{5} + \frac{30^2}{8800}\right) = 6.349571136 \\
\Rightarrow\ s.e.(\hat{\beta}_0) &= 2.5198
\end{align*}

\begin{align*}
Var(\hat{\beta}_1) &= \frac{\hat{\sigma}^2}{S_{xx}} = \frac{21.0061}{8800} = 0.002387056 \\
\Rightarrow\ s.e.(\hat{\beta}_1) &= 0.0489
\end{align*}
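The hand computations of Exercise 3 can be reproduced directly from the data; a quick sketch:

```python
import numpy as np

x = np.array([-20, 0, 20, 50, 100], float)      # temperature
y = np.array([323, 327, 340, 364, 384], float)  # speed of sound
sxy = np.sum((x - x.mean()) * (y - y.mean()))   # 4800
sxx = np.sum((x - x.mean()) ** 2)               # 8800
b1 = sxy / sxx                                  # 0.5455
b0 = y.mean() - b1 * x.mean()                   # 331.2364
sse = np.sum((y - (b0 + b1 * x)) ** 2)          # 63.0182
ssr = sxy**2 / sxx                              # 2618.1818
print((ssr / 1) / (sse / 3))                    # F approx 124.64
print(np.sqrt((sse / 3) * (1/5 + x.mean()**2 / sxx)))  # s.e.(b0) approx 2.52
print(np.sqrt((sse / 3) / sxx))                 # s.e.(b1) approx 0.049
```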

Coefficient of Determination (R²)

The quantity

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

is called the coefficient of determination and is often used to judge the adequacy of a regression line/model. In the case where X and Y are jointly distributed random variables, $R^2$ is the square of the correlation coefficient between X and Y. Note that $0 \le R^2 \le 1$.

We often refer loosely to $R^2$ as the amount of variability in the data explained or accounted for by the regression model.

Exercise 4 From the previous example, $R^2 = \frac{SSR}{SST} = \frac{2618.1818}{2681.2} = 0.9765$. That is, the model accounts for 97.65% of the variability in the data.

Coefficient of Variation (CV)

The coefficient of variation (CV) measures the spread of the noise (natural dispersion) around the regression line. It is given by

$$CV = \frac{s}{\bar{y}} \times 100\%$$

The CV is scale-free, so it provides a better measure of spread than $s = \sqrt{s^2}$ alone. A small value of CV suggests a good fit, i.e. there is not much noise around the line.

Exercise 5 Referring to the previous example,

$$CV = \frac{\sqrt{21.0061}}{347.6} \times 100\% = 1.3185\%$$

This is a small value, suggesting minimal variation about the regression line, so our model is a good fit to the data.
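Both measures follow from quantities already in the ANOVA table; a minimal sketch using the Exercise 3 figures:

```python
import math

ssr, sst, mse, ybar = 2618.1818, 2681.2, 21.0061, 347.6  # from Exercise 3
r2 = ssr / sst                        # coefficient of determination, 0.9765
cv = math.sqrt(mse) / ybar * 100      # coefficient of variation, about 1.32%
print(r2, cv)
```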

The Lack of Fit Test

Regression models are often fitted to data as an approximating function when the true relationship between the variables Y and X is unknown. Naturally, we would like to know whether the order of the model tentatively assumed is correct.

Specifically, the hypotheses we wish to test are:

$H_0$: The simple linear regression model is correct
$H_1$: The simple linear regression model is not correct

The test involves partitioning the error (residual) sum of squares into the following components:

$$SSE = SS_{PE} + SS_{LOF}$$

where $SS_{PE}$ is the sum of squares attributable to pure error, and $SS_{LOF}$ is the sum of squares attributable to the lack of fit of the model.

The test requires that there be replicates (replication) at one or more values of the predictor/explanatory variable (X).

Suppose we have n total observations such that

$y_{11}, y_{12}, ..., y_{1n_1}$ are repeated observations at $X_1$,
$y_{21}, y_{22}, ..., y_{2n_2}$ are repeated observations at $X_2$,
$\vdots$
$y_{m1}, y_{m2}, ..., y_{mn_m}$ are repeated observations at $X_m$.

NB: There are m distinct levels of X.

To develop the partitioning of SSE, note that the $(ij)$th residual is

$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$

where $\bar{y}_i$ is the average of the $n_i$ observations at $X_i$. Squaring both sides and summing over i and j yields

$$\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{m} n_i(\bar{y}_i - \hat{y}_i)^2$$

$$SSE = SS_{PE} + SS_{LOF}$$

since the cross-product term equals zero.

There are $n-m$ degrees of freedom associated with the pure error sum of squares. The sum of squares for lack of fit is simply $SS_{LOF} = SSE - SS_{PE}$, and it has $m-2$ degrees of freedom.

The test statistic for lack of fit is then

$$F^* = \frac{SS_{LOF}/(m-2)}{SS_{PE}/(n-m)} = \frac{MS_{LOF}}{MS_{PE}}, \qquad F^* \sim F_{(m-2,\ n-m)}$$

We reject $H_0$ if $F^* > F_\alpha(m-2, n-m)$.

This test procedure may easily be incorporated into the analysis of variance conducted for the significance of regression.

If $H_0$ is rejected, the model must be abandoned and an attempt must be made to find a more appropriate model. If $H_0$ is not rejected, there is no apparent reason to doubt the adequacy of the model.
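A sketch of how the lack-of-fit partition might be computed, grouping the replicates by distinct x values (the function name and return format are ours):

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(x, y, alpha=0.05):
    """Partition SSE into pure error and lack of fit; F test of model adequacy."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    sse = np.sum((y - (b0 + b1 * x)) ** 2)       # residual SS, n - 2 d.f.
    levels = np.unique(x)                        # m distinct x values
    m = len(levels)
    # pure error: variation of replicates about their own level means
    sspe = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
    sslof = sse - sspe                           # lack-of-fit SS, m - 2 d.f.
    F = (sslof / (m - 2)) / (sspe / (n - m))
    fcrit = stats.f.ppf(1 - alpha, m - 2, n - m)
    return {"SSPE": sspe, "SSLOF": sslof, "F": F, "reject_H0": F > fcrit}
```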

Exercise 6 The following data set gives the cost of maintenance of a tractor (Y) and the age of that tractor (X).

| Age X | Cost Y |
|---|---|
| 4.5 | 62 |
| 4.5 | 105 |
| 4.5 | 103 |
| 4.0 | 50 |
| 4.0 | 72 |
| 5.0 | 68 |
| 5.0 | 89 |
| 5.5 | 99 |
| 1.0 | 16 |
| 1.0 | 18 |
| 6.0 | 76 |
| 2.5 | 98 |
| 2.5 | 47 |
| 2.5 | 55 |
(a) Fit a simple linear regression model to the data.

(b) Construct the ANOVA table and use the F test to test the significance
of the regression with α = 0.05.

(c) Test the significance of the regression constant (the intercept) using
α = 0.01.

(d) Test for lack of fit using α = 0.05.

Solution

(a) The model is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, fitted by $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, where $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}$ and $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$.

\begin{align*}
\hat{\beta}_1 &= \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} \\
&= \frac{4003.9 - 14(3.7256)(68.4286)}{227.14 - 14(3.7256)^2} \\
&= \frac{431.9286}{32.5086} = 13.2866
\end{align*}

$$\hat{\beta}_0 = 68.4286 - (13.2866)(3.7256) = 18.8885$$

Therefore $\hat{y}_i = 18.8885 + 13.2866x_i$.

(b)

\begin{align*}
SST &= \sum y_i^2 - n\bar{y}^2 \\
&= 76702 - 14(68.4286)^2 = 11147.4286
\end{align*}

$$SSR = \hat{\beta}_1 S_{xy} = 13.2866(431.9286) = 5738.8625$$

$$SSE = SST - SSR = 5408.5661$$

The d.f. for SSR is 1; the d.f. for SSE are $n-2 = 12$; the d.f. for SST are $n-1 = 13$.

$$MSR = \frac{SSR}{1} = SSR, \qquad MSE = \frac{SSE}{12} = 450.7138, \qquad F = \frac{MSR}{MSE} = 12.7328$$

Table 3: ANOVA TABLE

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | 5738.8625 | 1 | 5738.8625 | 12.7328 |
| Error | 5408.5661 | 12 | 450.7138 | |
| Total | 11147.4286 | 13 | | |

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$

Test statistic: $F = 12.7328$

Testing at $\alpha = 0.05$, we reject $H_0$ if $F > F_{0.05}(1, 12) = 4.75$.

Since $F > 4.75$, we reject $H_0$ and conclude that the regression is significant.

(c) $H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$

Test statistic: $t = \frac{\hat{\beta}_0 - \beta_0}{s.e.(\hat{\beta}_0)} \sim t_{n-2}$

Rejection Criteria
We reject $H_0$ if $|t| > t_{\alpha/2}(n-2) = t_{0.005}(12) = 3.05$.

\begin{align*}
t &= \frac{\hat{\beta}_0}{s.e.(\hat{\beta}_0)} = \frac{\hat{\beta}_0}{\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} \\
&= \frac{18.8885}{14.9878} = 1.2603
\end{align*}

Since $|t| < 3.05$, we fail to reject $H_0$ and conclude that the regression constant (intercept) is not significant.

(d) $H_0$: The simple linear regression model is correct ($E(Y) = \beta_0 + \beta_1 X$).
$H_1$: The simple linear regression model is not correct ($E(Y) \neq \beta_0 + \beta_1 X$).

| $x_j$ | $y_{ij}$ | $\bar{y}_{.j}$ | $\sum_i(y_{ij} - \bar{y}_{.j})^2$ | d.f. |
|---|---|---|---|---|
| 4.5 | 62, 105, 103 | 90 | 1178 | 2 |
| 4.0 | 50, 72 | 61 | 242 | 1 |
| 5.0 | 68, 89 | 78.5 | 220.5 | 1 |
| 5.5 | 99 | 99 | 0 | 0 |
| 1.0 | 16, 18 | 17 | 2 | 1 |
| 6.0 | 76 | 76 | 0 | 0 |
| 2.5 | 98, 47, 55 | 66.7 | 1504.6667 | 2 |

$$SS_{PE} = \sum_{j=1}^{7}\sum_{i=1}^{n_j}(y_{ij} - \bar{y}_{.j})^2 = 3147.1667$$

$m = 7$, therefore d.f. $= n - m = 14 - 7 = 7$.

$SS_{LOF} = SSE - SS_{PE} = 2261.3994$, with d.f. $= m - 2 = 7 - 2 = 5$.

$$MS_{PE} = \frac{SS_{PE}}{n-m} = 449.5952, \qquad MS_{LOF} = \frac{SS_{LOF}}{m-2} = 452.2799, \qquad F = \frac{MS_{LOF}}{MS_{PE}} = 1.0060$$

Table 4: ANOVA TABLE

| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F |
|---|---|---|---|---|
| Regression | 5738.8625 | 1 | 5738.8625 | 12.7328 |
| Error | 5408.5661 | 12 | 450.7138 | |
| Lack of Fit | 2261.3994 | 5 | 452.2799 | 1.0060 |
| Pure Error | 3147.1667 | 7 | 449.5952 | |
| Total | 11147.4286 | 13 | | |

We reject $H_0$ if $F > F_{0.05}(5, 7) = 3.97$.

Since $F < 3.97$, we fail to reject $H_0$ and conclude that there is insufficient evidence to say that the simple linear regression model is not correct.
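As a cross-check, a standard library fit (here statsmodels, assuming it is installed) should give estimates close to the hand computations above; small differences can arise because the worked solution uses rounded summary sums:

```python
import numpy as np
import statsmodels.api as sm

x = np.array([4.5, 4.5, 4.5, 4.0, 4.0, 5.0, 5.0, 5.5, 1.0, 1.0, 6.0, 2.5, 2.5, 2.5])
y = np.array([62, 105, 103, 50, 72, 68, 89, 99, 16, 18, 76, 98, 47, 55], float)
fit = sm.OLS(y, sm.add_constant(x)).fit()  # intercept added as first column
print(fit.params)     # [intercept, slope]: close to the hand values 18.9 and 13.3
print(fit.fvalue)     # overall F statistic: near the hand value of 12.7
print(fit.rsquared)   # coefficient of determination
```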

Activity 3.1

1. To better understand the plight of people returning to professions after periods of inactivity, a survey was conducted at 67 randomly selected hospitals throughout Zimbabwe. The administrators were asked about their willingness to hire medical technologists who had been away from the field for a certain number of years. The results are summarized in the table below.

| Years of Inactivity X | Percentage of Hospitals Willing to Hire Y |
|---|---|
| 0.25 | 100 |
| 1.5 | 94 |
| 4 | 75 |
| 8 | 44 |
| 13 | 28 |
| 18 | 17 |

(a) Draw a scatter plot for this data and comment.
(b) Fit a straight line.
(c) Construct an ANOVA table and test for the significance of the regression line.
(d) Calculate the coefficient of determination and comment on the adequacy of the model.
(e) Compute the coefficient of variation (CV) and comment on the goodness of fit of the model.
(f) Estimate the percentage of hospitals willing to hire given that a medical practitioner has been inactive for 11 years, and also estimate the number of years of inactivity for which the percentage of hospitals willing to hire is 50%.
(g) Compute 95% confidence intervals for $\beta_0$ and $\beta_1$.

2. Given the following data:

| X | Y |
|---|---|
| 4 | 3.9 |
| 8 | 8.1 |
| 12.5 | 12.4 |
| 16 | 16 |
| 20 | 19.8 |
| 25 | 25 |
| 31 | 31.1 |
| 36 | 35.8 |
| 40 | 40.1 |

Fit the data with a regression equation and test whether $\beta_1 = 0$.

3. Suppose that the following data set on the cost of maintenance of a tractor (Y) and the age of that tractor (X) was collected in Mashonaland West amongst commercial farmers.

| Age X | Cost Y |
|---|---|
| 4 | 55 |
| 4 | 85 |
| 4 | 100 |
| 3.5 | 50 |
| 3.5 | 72 |
| 4.5 | 78 |
| 4.5 | 92 |
| 5.0 | 105 |
| 1.0 | 25 |
| 1.0 | 29 |
| 3.0 | 52 |
| 1.5 | 37 |
| 1.5 | 47 |
| 1.5 | 55 |

(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table and use the F test to test the significance of the regression with $\alpha = 0.05$.
(c) Test the significance of the regression constant (the intercept) using $\alpha = 0.05$.
(d) Test for lack of fit using $\alpha = 0.05$.