Chapter 3 - Goodness of Fit Tests
Introduction
In the previous section, we discussed and derived the least squares estimates
for the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.
If the model is fitted and no assumptions are violated, the next step is to use
the model to investigate the relationship between the independent and the
dependent variables as well as to make inference about the parameters.
First, consider the expected value of the slope estimator $\hat\beta_1 = S_{xy}/S_{xx}$:

$$
\begin{aligned}
E[\hat\beta_1] &= E\left[\frac{S_{xy}}{S_{xx}}\right] = \frac{1}{S_{xx}}E[S_{xy}] \\
&= \frac{1}{S_{xx}}E\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right] \\
&= \frac{1}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar{x})(\beta_0+\beta_1 x_i-\beta_0-\beta_1\bar{x}) \\
&= \frac{\beta_1}{S_{xx}}\sum_{i=1}^{n}(x_i-\bar{x})(x_i-\bar{x}) \\
&= \frac{\beta_1 S_{xx}}{S_{xx}} = \beta_1
\end{aligned}
$$

Thus $\hat\beta_1$ is an unbiased estimator of $\beta_1$.
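The unbiasedness result can be checked by simulation. Below is a minimal sketch in Python; all parameter values are hypothetical choices for illustration. Repeatedly generating data from the model and averaging the slope estimates should recover $\beta_1$.

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 20   # hypothetical true parameters
    x = np.linspace(0, 10, n)                    # fixed design points
    Sxx = np.sum((x - x.mean()) ** 2)

    estimates = []
    for _ in range(10_000):
        y = beta0 + beta1 * x + rng.normal(0, sigma, n)
        Sxy = np.sum((x - x.mean()) * (y - y.mean()))
        estimates.append(Sxy / Sxx)              # beta1-hat = Sxy / Sxx

    print(np.mean(estimates))   # close to 0.5, consistent with E[beta1-hat] = beta1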
Next, consider the variance of the slope estimator. Since $\sum_{i=1}^{n}(x_i-\bar{x})\bar{y} = 0$, we may write $S_{xy} = \sum_{i=1}^{n}(x_i-\bar{x})y_i$, and the $y_i$ are independent, so

$$
\begin{aligned}
\mathrm{Var}(\hat\beta_1) &= \mathrm{Var}\left(\frac{S_{xy}}{S_{xx}}\right) \\
&= \frac{1}{S_{xx}^2}\mathrm{Var}\left[\sum_{i=1}^{n}(x_i-\bar{x})y_i\right] \\
&= \frac{1}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar{x})^2\,\mathrm{Var}(y_i) \\
&= \frac{1}{S_{xx}^2}\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sigma^2 \\
&= \frac{\sigma^2 S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}
\end{aligned}
$$
The variance of $\hat\beta_0$ is given by

$$
\begin{aligned}
\mathrm{Var}(\hat\beta_0) &= \mathrm{Var}(\bar{y}-\hat\beta_1\bar{x}) \\
&= \mathrm{Var}(\bar{y}) + \bar{x}^2\,\mathrm{Var}(\hat\beta_1) - 2\bar{x}\,\mathrm{Cov}(\bar{y},\hat\beta_1) \\
&= \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}y_i\right) + \bar{x}^2\,\mathrm{Var}(\hat\beta_1) \qquad \text{since } \mathrm{Cov}(\bar{y},\hat\beta_1) = 0 \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(y_i) + \bar{x}^2\,\mathrm{Var}(\hat\beta_1) \\
&= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + \bar{x}^2\frac{\sigma^2}{S_{xx}} \\
&= \frac{\sigma^2}{n} + \frac{\sigma^2\bar{x}^2}{S_{xx}} = \sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)
\end{aligned}
$$
Exercise 1: Find the covariance between $\bar{y}$ and $\hat\beta_1$ (this justifies dropping the covariance term above).
The sampling distribution of $\hat\beta_0$ is given by $\hat\beta_0 \sim N\!\left(\beta_0,\ \sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)\right)$, where $\sigma^2$ is the variance of the error term. (Because $\hat\beta_0$ is a linear combination of normal random variables, it must also be normal.)

The sampling distribution of $\hat\beta_1$ is given by $\hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma^2}{S_{xx}}\right)$.
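These variance formulas can likewise be checked empirically. The sketch below, again with hypothetical parameter values, compares the Monte Carlo variances of $\hat\beta_0$ and $\hat\beta_1$ with $\sigma^2(1/n + \bar{x}^2/S_{xx})$ and $\sigma^2/S_{xx}$.

    import numpy as np

    rng = np.random.default_rng(1)
    beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 20   # hypothetical true parameters
    x = np.linspace(0, 10, n)
    Sxx = np.sum((x - x.mean()) ** 2)

    b0_hats, b1_hats = [], []
    for _ in range(20_000):
        y = beta0 + beta1 * x + rng.normal(0, sigma, n)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
        b1_hats.append(b1)
        b0_hats.append(y.mean() - b1 * x.mean())

    print(np.var(b1_hats), sigma**2 / Sxx)                        # should agree
    print(np.var(b0_hats), sigma**2 * (1/n + x.mean()**2 / Sxx))  # should agree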
Now, using the properties of the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$, inference about $\beta_0$ and $\beta_1$ can be made. Before we do so, we need to estimate $\sigma^2$. To estimate $\sigma^2$, we use the residuals, $r_i = y_i - \hat{y}_i$, which are the observed errors of fit. It is reasonable to expect the sample variance of the residuals to provide an estimator of $\sigma^2$:

$$
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \frac{1}{n-2}\sum_{i=1}^{n}r_i^2
$$
NB
(i) $s^2 = MSE$.
(ii) The $n-2$ in the denominator reflects the fact that we have $n$ pieces of information less the two used to estimate $\beta_0$ and $\beta_1$.
Tests involving β1
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic:

$$
t = \frac{\hat\beta_1-\beta_1}{\sqrt{s^2/S_{xx}}} \stackrel{H_0}{=} \frac{\hat\beta_1}{\sqrt{s^2/S_{xx}}} = \frac{\hat\beta_1\sqrt{S_{xx}}}{s}
$$
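As a quick illustration, the t statistic and its two-sided p-value can be computed from summary quantities as follows. The numbers used here are hypothetical, and scipy is assumed to be available.

    import numpy as np
    from scipy import stats

    b1_hat, s2, Sxx, n = 0.7, 4.2, 150.0, 25    # hypothetical summary statistics
    t = b1_hat / np.sqrt(s2 / Sxx)              # t = beta1-hat * sqrt(Sxx) / s
    p_value = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value on n-2 d.f.
    print(t, p_value)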
Rejection Criteria
We reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}(n-2)$.
Tests involving β0
$H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$
Test statistic:

$$
t = \frac{\hat\beta_0-\beta_0}{\sqrt{s^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)}} \stackrel{H_0}{=} \frac{\hat\beta_0}{\sqrt{s^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)}}
$$
Rejection Criteria
We reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}(n-2)$.
Exercise 2: The data is of green liquor (Na$_2$S) concentration and paper machine production, with $n = 13$ observations.

$$
\hat\sigma^2 = s^2 = \frac{\sum(y_i-\hat{y}_i)^2}{n-2} = \frac{\sum r_i^2}{n-2} = \frac{80.5740}{13-2} = 7.3249
$$

$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 11529419 - 13(939)^2 = 67046
$$
Now

$$
t = \frac{\hat\beta_1}{\sqrt{\frac{s^2}{S_{xx}}}} = \frac{0.0694}{\sqrt{\frac{7.3249}{67046}}} = 6.6396
$$
A $100(1-\alpha)\%$ confidence interval for $\beta_0$ is

$$
\left(\hat\beta_0 - t_{n-2}\!\left(\tfrac{\alpha}{2}\right)\sqrt{s^2\left(\tfrac{1}{n}+\tfrac{\bar{x}^2}{S_{xx}}\right)},\ \ \hat\beta_0 + t_{n-2}\!\left(\tfrac{\alpha}{2}\right)\sqrt{s^2\left(\tfrac{1}{n}+\tfrac{\bar{x}^2}{S_{xx}}\right)}\right)
$$
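A sketch of this interval computation, using the summary figures from the temperature/speed worked example later in this chapter ($n = 5$, $s^2 \approx 21.0$, $S_{xx} = 8800$, $\bar{x} = 30$, $\hat\beta_0 \approx 331.24$):

    import numpy as np
    from scipy import stats

    b0_hat, s2, Sxx, xbar, n, alpha = 331.2364, 21.0061, 8800.0, 30.0, 5, 0.05
    half_width = stats.t.ppf(1 - alpha / 2, df=n - 2) \
        * np.sqrt(s2 * (1 / n + xbar ** 2 / Sxx))
    print(b0_hat - half_width, b0_hat + half_width)   # the 95% CI for beta0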
AIM: To compute $\sigma^2$.
Partitioning the total sum of squares
The variability of the observations is expressed in terms of the sum of squares of the observations about their mean, denoted SST and given by

$$SST = \sum_{i=1}^{n}(y_i-\bar{y})^2$$

SST = Total Sum of Squares

$$SSE = \sum_{i=1}^{n}(y_i-\hat{y}_i)^2 = \sum_{i=1}^{n}r_i^2$$

If all the $y_i$ values fall on the regression line, SSE will be zero. Thus, the larger the SSE, the greater the variation of the $y_i$ observations around the fitted regression line.

$$SSR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 \quad \text{(variability explained by the model)}$$

SSR = SST − SSE
SSR can be viewed as a measure of the effect of the regression relation in reducing the variability of the $y_i$.

Thus, for SLR, the decomposition of SST into two components is achieved as follows:

$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 + \sum_{i=1}^{n}(y_i-\hat{y}_i)^2$$

$$
SSR = \sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2 = \frac{\left[\sum Y_iX_i - n\bar{Y}\bar{X}\right]^2}{\sum X_i^2 - n\bar{X}^2} = \frac{[S_{xy}]^2}{S_{xx}} = \hat\beta_1 S_{xy}
$$
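The decomposition and the identity $SSR = \hat\beta_1 S_{xy}$ can be verified numerically on any small data set; the sketch below uses made-up observations.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])   # made-up observations

    Sxx = np.sum((x - x.mean()) ** 2)
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))
    b1 = Sxy / Sxx
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x                        # fitted values

    SST = np.sum((y - y.mean()) ** 2)
    SSR = np.sum((y_hat - y.mean()) ** 2)
    SSE = np.sum((y - y_hat) ** 2)
    print(SST, SSR + SSE)    # equal up to rounding
    print(SSR, b1 * Sxy)     # equal up to rounding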
Partitioning degrees of freedom
SST has $n-1$ degrees of freedom (d.f.) associated with it. This is because SST involves $n$ deviations, namely $y_i-\bar{y}$, but there is one constraint on these deviations, namely $\sum_{i=1}^{n}(y_i-\bar{y}) = 0$, so we lose one degree of freedom. SSR has 1 d.f. (the fitted deviations $\hat{y}_i-\bar{y}$ are determined entirely by $\hat\beta_1$), and SSE has $n-2$ d.f. (two d.f. are lost in estimating $\beta_0$ and $\beta_1$). Thus, the degrees of freedom are additive: $(n-1) = (1) + (n-2)$.
Mean Squares
A sum of squares divided by its degrees of freedom is called a mean square, e.g. $s^2 = MSE$. The two important mean squares are the regression mean square, denoted MSR, and the error mean square, denoted MSE. Thus,

$$MSR = \frac{SSR}{1} \quad \text{and} \quad MSE = \frac{SSE}{n-2} = s^2$$

If $\beta_1 = 0$, then $E[MSR] = \sigma^2$; in this case, both MSE and MSR have the same expected value. When $\beta_1 \neq 0$, the term $\beta_1^2\sum(x_i-\bar{x})^2$ in $E[MSR] = \sigma^2 + \beta_1^2\sum(x_i-\bar{x})^2$ is positive, and $E[MSR] > E[MSE]$.
BASIC ANOVA TABLE
It is useful to collect the sums of squares, degrees of freedom and mean squares in an ANOVA table for the regression analysis. The table below gives the structure and the appearance of the basic ANOVA table.

Source of Variation   d.f.    SS     MS                   F
Regression            1       SSR    MSR = SSR/1          MSR/MSE
Error                 n − 2   SSE    MSE = SSE/(n − 2)
Total                 n − 1   SST

From the ANOVA table, we can obtain the estimate of the variance, $s^2$, and test the hypothesis that there is a regression relationship. If the model assumptions hold, the ratio F in the ANOVA table follows Fisher's F distribution with 1 and $n-2$ degrees of freedom. If F is near 1, then MSR and MSE are approximately equal; a large F suggests that $\beta_1 \neq 0$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic: $F = \frac{MSR}{MSE}$
Decision Rule: reject $H_0$ if $F > F_{\alpha}(1, n-2)$.
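A small helper that assembles these ANOVA quantities and carries out the F test might look as follows; the function name and the returned layout are this sketch's own, not a standard API.

    import numpy as np
    from scipy import stats

    def slr_anova(x, y, alpha=0.05):
        # x, y: numpy arrays of equal length
        n = len(x)
        Sxx = np.sum((x - x.mean()) ** 2)
        Sxy = np.sum((x - x.mean()) * (y - y.mean()))
        SSR = Sxy ** 2 / Sxx                  # regression sum of squares
        SST = np.sum((y - y.mean()) ** 2)     # total sum of squares
        SSE = SST - SSR                       # error sum of squares
        MSR, MSE = SSR / 1, SSE / (n - 2)
        F = MSR / MSE
        F_crit = stats.f.ppf(1 - alpha, 1, n - 2)
        return {"SSR": SSR, "SSE": SSE, "SST": SST,
                "MSR": MSR, "MSE": MSE, "F": F, "reject H0": F > F_crit}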
Exercise: Consider the following data on temperature X (°C) and speed Y (m/s).

X, Temperature (°C)   Y, Speed (m/s)
-20                   323
0                     327
20                    340
50                    364
100                   384

(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table for this data set and hence test the hypothesis that the slope is zero. Use α = 0.01.
(c) Find the standard errors of $\hat\beta_0$ and $\hat\beta_1$.
Solution
(a)

$$
S_{xy} = \sum x_iy_i - n\bar{x}\bar{y} = 56940 - 5(30)(347.6) = 4800
$$

$$
S_{xx} = \sum x_i^2 - n\bar{x}^2 = 13300 - 5(30)^2 = 8800
$$

$$
\hat\beta_1 = \frac{S_{xy}}{S_{xx}} = \frac{4800}{8800} = 0.5455
$$

$$
\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} = 347.6 - 0.5455(30) = 331.2364
$$
(b)

$$
SST = \sum(y_i-\bar{y})^2 = \sum y_i^2 - n\bar{y}^2 = 606810 - 5(347.6)^2 = 2681.2
$$

$$
SSR = \sum(\hat{y}_i-\bar{y})^2 = \frac{S_{xy}^2}{S_{xx}} = \frac{4800^2}{8800} = 2618.1818
$$

Therefore $SSE = SST - SSR = 63.0182$, $MSE = \frac{SSE}{n-2} = 21.0061$, $MSR = \frac{SSR}{1} = 2618.1818$ and

$$
F = \frac{MSR}{MSE} = 124.6394
$$

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Since $F = 124.6394 > F_{0.01}(1, 3) = 34.1$, we reject $H_0$ and conclude that, at $\alpha = 0.01$, we have sufficient evidence that the regression is significant.
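The calculations in parts (a) and (b) can be reproduced directly from the data:

    import numpy as np
    from scipy import stats

    x = np.array([-20.0, 0.0, 20.0, 50.0, 100.0])
    y = np.array([323.0, 327.0, 340.0, 364.0, 384.0])
    n = len(x)

    Sxy = np.sum(x * y) - n * x.mean() * y.mean()   # 4800
    Sxx = np.sum(x ** 2) - n * x.mean() ** 2        # 8800
    b1 = Sxy / Sxx                                  # 0.5455
    b0 = y.mean() - b1 * x.mean()                   # 331.2364

    SST = np.sum(y ** 2) - n * y.mean() ** 2        # 2681.2
    SSR = Sxy ** 2 / Sxx                            # 2618.1818
    SSE = SST - SSR                                 # 63.0182
    F = (SSR / 1) / (SSE / (n - 2))                 # 124.64
    print(b0, b1, F, stats.f.ppf(0.99, 1, n - 2))   # critical value 34.1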
(c) The standard error of $\hat\beta_0$ is $\sqrt{\mathrm{Var}(\hat\beta_0)}$ and the standard error of $\hat\beta_1$ is $\sqrt{\mathrm{Var}(\hat\beta_1)}$.

$$
\mathrm{Var}(\hat\beta_0) = \hat\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right) = 21.0061\left(\frac{1}{5}+\frac{30^2}{8800}\right) = 6.349571136
$$

$$\Rightarrow s.e.(\hat\beta_0) = 2.5198$$

$$
\mathrm{Var}(\hat\beta_1) = \frac{\hat\sigma^2}{S_{xx}} = \frac{21.0061}{8800} = 0.002387056
$$

$$\Rightarrow s.e.(\hat\beta_1) = 0.0489$$
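And the standard errors in part (c), computed from the quantities already obtained above:

    import numpy as np

    s2, Sxx, xbar, n = 21.0061, 8800.0, 30.0, 5      # from parts (a) and (b)
    se_b0 = np.sqrt(s2 * (1 / n + xbar ** 2 / Sxx))  # 2.5198
    se_b1 = np.sqrt(s2 / Sxx)                        # 0.0489
    print(se_b0, se_b1)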
Coefficient of Variation (CV)
The coefficient of variation (CV) measures the spread of the noise (natural dispersion) around the regression line. It is given by

$$CV = \frac{s}{\bar{y}} \times 100\%$$

The CV is scale-free, so it provides a better measure of spread than $s = \sqrt{s^2}$. A small value of CV suggests a good fit, i.e. there is not much noise around the line.
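A quick check of the CV using the worked example figures above ($s^2 = 21.0061$, $\bar{y} = 347.6$):

    import numpy as np

    s2, ybar = 21.0061, 347.6        # MSE and mean response from the example
    cv = np.sqrt(s2) / ybar * 100    # about 1.32%, i.e. little noise about the line
    print(cv)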
Testing for Lack of Fit
Specifically, the hypotheses we wish to test are:
$H_0$: The simple linear regression model is correct
$H_1$: The simple linear regression model is not correct
The test involves partitioning the error (residual) sum of squares into the following components:
$$SSE = SS_{PE} + SS_{LOF}$$

where $SS_{PE}$ is the sum of squares attributable to pure error, and $SS_{LOF}$ is the sum of squares attributable to the lack of fit of the model. The test requires that there be replicates (replication) at one or more values of the predictor/explanatory variable (X).

The decomposition rests on the identity

$$y_{ij} - \hat{y}_i = (y_{ij} - \bar{y}_i) + (\bar{y}_i - \hat{y}_i)$$

where $\bar{y}_i$ is the average of the $n_i$ observations at $X_i$. Squaring both sides and summing over $i$ and $j$ yields

$$
\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij}-\hat{y}_i)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2 + \sum_{i=1}^{m}n_i(\bar{y}_i-\hat{y}_i)^2
$$

There are $n-m$ degrees of freedom associated with the pure error sum of squares, where $m$ is the number of distinct $X$ values. The sum of squares for lack of fit is simply $SS_{LOF} = SSE - SS_{PE}$ and it has $m-2$ degrees of freedom.
The test statistic for lack of fit is then

$$
F^* = \frac{SS_{LOF}/(m-2)}{SS_{PE}/(n-m)} = \frac{MS_{LOF}}{MS_{PE}}, \qquad F^* \sim F_{(m-2,\,n-m)} \text{ under } H_0.
$$
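A sketch of the whole lack-of-fit computation; the function name is this sketch's own, and x may contain repeated values (the replicates).

    import numpy as np
    from scipy import stats

    def lack_of_fit_test(x, y):
        # x, y: numpy arrays; x contains replicates at some levels
        n = len(x)
        Sxx = np.sum((x - x.mean()) ** 2)
        b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
        b0 = y.mean() - b1 * x.mean()
        SSE = np.sum((y - (b0 + b1 * x)) ** 2)      # residual sum of squares

        levels = np.unique(x)                       # the m distinct x values
        m = len(levels)
        # pure error: variation of replicates around their level means
        SSPE = sum(np.sum((y[x == lv] - y[x == lv].mean()) ** 2) for lv in levels)
        SSLOF = SSE - SSPE
        F = (SSLOF / (m - 2)) / (SSPE / (n - m))
        p_value = stats.f.sf(F, m - 2, n - m)
        return F, p_value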
Exercise 6: The following data set gives the cost of maintenance of a tractor (Y) and the age of that tractor (X).
Age X Cost Y
4.5 62
4.5 105
4.5 103
4.0 50
4.0 72
5.0 68
5.0 89
5.5 99
1.0 16
1.0 18
6.0 76
2.5 98
2.5 47
2.5 55
(a) Fit a simple linear regression model to the data.
(b) Construct the ANOVA table and use the F test to test the significance of the regression with α = 0.05.
(c) Test the significance of the regression constant (the intercept) using α = 0.01.
(d) Test the model for lack of fit.
Solution
(a) The model is $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, with

$$
\hat\beta_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = \frac{4003.9 - 14(3.7256)(68.4286)}{227.14 - 14(3.7256)^2} = \frac{431.9286}{32.5086} = 13.2866
$$
(b)

$$
SST = \sum y_i^2 - n\bar{y}^2 = 76702 - 14(68.4286)^2 = 11147.4286
$$
$$
SSR = \hat\beta_1 S_{xy} = 13.2866(431.9286) = 5738.8625
$$

$MSR = \frac{SSR}{1} = SSR = 5738.8625$, $SSE = SST - SSR = 5408.5661$, $MSE = \frac{SSE}{12} = 450.7138$ and

$$
F = \frac{MSR}{MSE} = 12.7328
$$

$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Since $F = 12.7328 > F_{0.05}(1, 12) = 4.75$, we reject $H_0$ and conclude that the regression is significant.
(c) $H_0: \beta_0 = 0$
$H_1: \beta_0 \neq 0$

Test statistic: $t = \frac{\hat\beta_0-\beta_0}{s.e.(\hat\beta_0)} \sim t(n-2)$

Rejection criteria: we reject $H_0$ if $|t| > t_{\frac{\alpha}{2}}(n-2) = t_{0.005}(12) = 3.05$.

Test statistic:

$$
t = \frac{\hat\beta_0}{s.e.(\hat\beta_0)} = \frac{\hat\beta_0}{\sqrt{\hat\sigma^2\left(\frac{1}{n}+\frac{\bar{x}^2}{S_{xx}}\right)}} = \frac{18.8885}{14.9878} = 1.2603
$$

Since $t < 3.05$, we fail to reject $H_0$ and conclude that the regression constant is not significant.
(d)

$x_i$   $y_{ij}$        $\bar{y}_{.j}$   $\sum_j(y_{ij}-\bar{y}_{.j})^2$   d.f.
4.5     62, 105, 103    90               1178                              2
4.0     50, 72          61               242                               1
5.0     68, 89          78.5             220.5                             1
5.5     99              99               0                                 0
1.0     16, 18          17               2                                 1
6.0     76              76               0                                 0
2.5     98, 47, 55      66.7             1504.6667                         2
$$
SS_{PE} = \sum_{j=1}^{7}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2 = 3147.166667
$$

$m = 7$, therefore d.f. $= n - m = 14 - 7 = 7$.

$SS_{LOF} = SSE - SS_{PE} = 2261.399433$, with d.f. $= m - 2 = 7 - 2 = 5$.

$$
MS_{PE} = \frac{SS_{PE}}{n-m} = 449.5952381, \qquad MS_{LOF} = \frac{SS_{LOF}}{m-2} = 452.2798866
$$

$$
F = \frac{MS_{LOF}}{MS_{PE}} = 1.005971257
$$

Since $F$ is close to 1 and $F < F_{0.05}(5, 7) = 3.97$, we fail to reject $H_0$: there is no evidence of lack of fit of the simple linear regression model.
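With the lack_of_fit_test sketch from earlier in scope, the tractor computation can be reproduced; the value may differ slightly from the hand computation above because of rounding in the summary figures.

    import numpy as np

    x = np.array([4.5, 4.5, 4.5, 4.0, 4.0, 5.0, 5.0, 5.5,
                  1.0, 1.0, 6.0, 2.5, 2.5, 2.5])
    y = np.array([62, 105, 103, 50, 72, 68, 89, 99,
                  16, 18, 76, 98, 47, 55], dtype=float)
    F, p = lack_of_fit_test(x, y)
    print(F, p)    # F near 1: no evidence of lack of fit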
Activity 3.1
(1) To understand better the plight of people returning to their professions after periods of inactivity, a survey was conducted at some 67 randomly selected hospitals throughout Zimbabwe. The administrators were asked about their willingness to hire medical technologists who had been away from the field for a certain number of years. The results are summarized in the table below.
Years of Inactivity X Percentage of Hospitals willing to Hire Y
0.25 100
1.5 94
4 75
8 44
13 28
18 17
(2)
Years of Inactivity X Percentage of Hospitals willing to Hire Y
4 3.9
8 8.1
12.5 12.4
16 16
20 19.8
25 25
31 31.1
36 35.8
40 40.1
(3) Suppose that the following data set on the cost of maintenance of a
tractor (Y) and the age of that tractor (X) was collected in Mashona-
land West amongst commercial farmers.
Age X Cost Y
4 55
4 85
4 100
3.5 50
3.5 72
4.5 78
4.5 92
5.0 105
1.0 25
1.0 29
3.0 52
1.5 37
1.5 47
1.5 55