
SLR-Residuals and Fitted Values I

We continue to consider the Simple Linear Model

yi = β0 + β1 xi + ei ,   i = 1, . . . , n    (5)

where the errors e1 , . . . , en are uncorrelated, E (ei ) = 0, and var(ei ) = σ².


The least squares estimators are

  β̂1 = Σⁿᵢ₌₁ (xi − x̄)(yi − ȳ) / Σⁿᵢ₌₁ (xi − x̄)² = Sxy / Sxx

  β̂0 = ȳ − β̂1 x̄
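As a quick numerical check, the estimators can be computed directly from Sxy and Sxx. The following is a Python/numpy sketch (the lecture's own code is in R), on made-up data that is not the course's Fuel dataset:

```python
import numpy as np

# Made-up illustrative data (not the course's Fuel dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)

beta1_hat = Sxy / Sxx                        # slope: Sxy / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()  # intercept: ybar - b1 * xbar

# np.polyfit solves the same least squares problem
slope, intercept = np.polyfit(x, y, 1)
```

Here Sxy = 19.6 and Sxx = 10, so β̂1 = 1.96 and β̂0 = 0.14, matching the closed-form solution.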

In this lecture we are concerned with the fitted values, the residuals, and
the hat matrix, their properties and usage.

Hua Liang (GWU) 2118-M 243 /

Fitted values I

The fitted values, or predicted values, are

ŷi = β̂0 + β̂1 xi

i = 1, 2, . . . , n. These correspond to the y-value on the fitted regression line at x = xi .
Two alternative forms are:

ŷi = ȳ + β̂1 (xi − x̄ )

  ŷi = Σⁿₖ₌₁ [ 1/n + (xi − x̄)(xk − x̄)/Sxx ] yk = Σⁿₖ₌₁ hik yk    (6)
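Equation (6) says each fitted value is a fixed linear combination of all the responses, with weights hik that depend only on the x-values. A numpy sketch (made-up data; the lecture itself uses R) verifying that both forms of ŷi agree:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up illustrative data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)
Sxx = np.sum((x - x.mean()) ** 2)

# Weight matrix from (6): h_ik = 1/n + (x_i - xbar)(x_k - xbar)/Sxx
H = 1.0 / n + np.outer(x - x.mean(), x - x.mean()) / Sxx
yhat_weights = H @ y                      # yhat_i = sum_k h_ik * y_k

# Direct form: yhat_i = b0 + b1 * x_i
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
yhat_direct = b0 + b1 * x
```

Note also that each row of the weight matrix sums to 1, since Σₖ (xk − x̄) = 0.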



Fitted values II

The vector of fitted values is


 
  ŷ = (ŷ1 , . . . , ŷn)ᵀ

Exercise: show (6).
The distribution of ŷi is described by the following:


Fitted values III


Theorem 17 (Distribution of Fitted Values)
1 For i = 1, . . . , n, E (ŷi ) = E (yi ) = β0 + β1 xi ; i.e., ŷi is unbiased for
β0 + β1 xi , the mean response at xi .
2 For i = 1, . . . , n,

    var(ŷi ) = σ² [ 1/n + (xi − x̄)²/Sxx ] = σ² hii

3 For any i , j ,

    cov(ŷi , ŷj ) = σ² [ 1/n + (xi − x̄)(xj − x̄)/Sxx ] = σ² hij

4 If the errors are normally distributed, i.e. ei iid ∼ N (0, σ²), then

    ŷi ∼ N (β0 + β1 xi , σ² hii )
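Theorem 17 can be checked by simulation: over repeated samples from the model, the empirical mean and variance of ŷi should match β0 + β1 xi and σ² hii . A numpy sketch with assumed parameter values (β0 = 1, β1 = 2, σ = 1, design x chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # fixed design (assumed)
beta0, beta1, sigma = 1.0, 2.0, 1.0      # assumed true values
n, reps = len(x), 10000

Sxx = np.sum((x - x.mean()) ** 2)
h = 1.0 / n + (x - x.mean()) ** 2 / Sxx  # leverages h_ii

yhats = np.empty((reps, n))
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    yhats[r] = b0 + b1 * x

emp_mean = yhats.mean(axis=0)  # should be near beta0 + beta1 * x
emp_var = yhats.var(axis=0)    # should be near sigma^2 * h_ii
```

The variance is largest at the ends of the design (largest hii ) and smallest near x̄, as the leverage formula predicts.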


Hat Matrix I

The coefficients of yk in (6) define the (n × n) matrix H = (hij )1≤i,j≤n ,

  hij = 1/n + (xi − x̄)(xj − x̄)/Sxx

Since ŷi = Σⁿₖ₌₁ hik yk for all i = 1, . . . , n, we have, in vector notation

ŷ = H y

H is called the hat matrix, since “it puts the hat on y”, or the projection
matrix, for geometric reasons to be discussed later.
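The entrywise formula for hij and the matrix description of H given in Theorem 19 can be checked against each other numerically, using the x from Example 18. A numpy sketch (the lecture's own computation uses R):

```python
import numpy as np

x = np.array([1.0, -1.0, 2.0, 2.0])   # the x from Example 18
n = len(x)
X = np.column_stack([np.ones(n), x])  # model matrix (1  x)

# Entrywise: h_ij = 1/n + (x_i - xbar)(x_j - xbar)/Sxx
Sxx = np.sum((x - x.mean()) ** 2)
H_entry = 1.0 / n + np.outer(x - x.mean(), x - x.mean()) / Sxx

# Matrix form: H = X (X'X)^{-1} X'
H_mat = X @ np.linalg.inv(X.T @ X) @ X.T
```

Both give h11 = 0.25, since x1 = x̄ = 1 makes the second term vanish; the matrix is also symmetric and idempotent, as Theorem 19 states.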


Hat Matrix II

Example 18
In the SLR model, take n = 4 and xᵀ = (x1 , x2 , x3 , x4 ) = (1, −1, 2, 2).
The model matrix is

                ⎡ 1   1 ⎤
  X = (1  x) =  ⎢ 1  −1 ⎥
                ⎢ 1   2 ⎥
                ⎣ 1   2 ⎦

We have x̄ = 1, Sxx = 6.

H plays a central role in regression, almost as important as the model matrix X . It has several important properties:



Hat Matrix III

Theorem 19 (The Hat Matrix H )

  H = X (X ᵀ X )⁻¹ X ᵀ
  H ᵀ = H (H is symmetric)
  H ² = H (H is idempotent)
  H ᵀ H = H , and H H ᵀ = H

Example 20
Consider the example above, with n = 4 and xᵀ = (1, −1, 2, 2).

> x <- c(1, -1, 2, 2)
> X <- model.matrix(~ x)
> X
  (Intercept)  x
1           1  1
2           1 -1
3           1  2
4           1  2
attr(,"assign")
[1] 0 1


Hat Matrix IV

> H <- X %*% solve(t(X) %*% X) %*% t(X)
> H
     1       2       3       4
1 0.25  0.2500  0.2500  0.2500
2 0.25  0.9167 -0.0833 -0.0833
3 0.25 -0.0833  0.4167  0.4167
4 0.25 -0.0833  0.4167  0.4167

> H %*% H
     1       2       3       4
1 0.25  0.2500  0.2500  0.2500
2 0.25  0.9167 -0.0833 -0.0833
3 0.25 -0.0833  0.4167  0.4167
4 0.25 -0.0833  0.4167  0.4167



Hat Matrix V

Theorem 21 (Distribution of the fitted values)


The vector of fitted values, ŷ = (ŷ1 , ŷ2 , . . . , ŷn )ᵀ , has the following properties:
1 ŷ = H y
2 E (ŷ) = X β = E (y)
3 var(ŷ) = σ² H
4 If e1 , · · · , en are iid N (0, σ²), then

    ŷ ∼ MVN(X β, σ² H )


Estimating the mean response I


For Massachusetts, the observed Fuel is yi = 543.2321. Is this less than
expected for its xi = 15.11179?
The expected value is

  E (yi ) = β0 + β1 · 15.11179

We estimate it by
  ŷi = β̂0 + β̂1 xi
So, for MA, ŷi = 215.5162 + 25.2530 · 15.11179 = 597.1342. The variance is

  var(ŷi ) = σ² [ 1/n + (xi − x̄)²/Sxx ] = σ² hii

A 95% confidence interval for the mean, at x = xi , if σ is known, is

  ŷi ± 1.96 · σ √hii = ŷi ± 1.96 · σ √( 1/n + (xi − x̄)²/Sxx )
Estimating the mean response II

Usually, σ is unknown, estimated by σ̂. The 95% CI is then


  ŷi ± q.025 · σ̂ √hii = ŷi ± q.025 · σ̂ √( 1/n + (xi − x̄)²/Sxx )

where q.025 is the quantile of the tn−2 distribution which leaves .025
probability in the upper tail.
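This interval is easy to package as a small function. The following is a Python/numpy sketch (the function name and data are illustrative assumptions, not from the lecture); the quantile q is passed in, e.g. from scipy.stats.t.ppf(0.975, n − 2), and hard-coding q = 1.96 recovers the known-σ normal interval:

```python
import numpy as np

def mean_response_ci(x, y, x_star, q):
    """CI for the mean response at x_star: muhat* +/- q * sigma_hat * sqrt(lev)."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))  # usual estimate of sigma
    mu_hat = b0 + b1 * x_star
    half = q * sigma_hat * np.sqrt(1.0 / n + (x_star - x.mean()) ** 2 / Sxx)
    return mu_hat - half, mu_hat + half

# Example with made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
lo, hi = mean_response_ci(x, y, 3.0, 2.0)
lo5, hi5 = mean_response_ci(x, y, 5.0, 2.0)
```

The interval is narrowest at x∗ = x̄ and widens as x∗ moves away from x̄, which is what produces the curved confidence band.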
Data for state AA is not available in the dataset. Suppose we learn that
the purchase rate in state AA is x ∗ = 17.2; what is the expected Fuel for
state AA, based on our analysis?

E (y|x = x ∗ ) = β0 + β1 x ∗

estimated by
µ̂∗ = β̂0 + β̂1 x ∗


Estimating the mean response III


The mean and variance of this estimator are given by E (µ̂∗ ) = β0 + β1 x ∗ ,
and

  var(µ̂∗ ) = σ² [ 1/n + (x ∗ − x̄)²/Sxx ]

with a 95% CI for the mean

  µ̂∗ ± q.025 · σ̂ √( 1/n + (x ∗ − x̄)²/Sxx )

For state AA, µ̂∗ = 215.5162 + 25.2530 · 17.2 = 649.8678, with a 95% CI

649.8678 ± 32.232.

We can draw this 95% CI for each value of x ∗ and get a confidence band
(see figure).



Estimating the mean response IV

[Figure: Fuel versus Log(Mile), showing the fitted regression line with its 95% confidence band.]

Prediction I

How can we predict the true value y ∗ ? Is this the same question as
estimating the expected value E (y|x = x ∗ )?
We know that
y ∗ = β0 + β1 x ∗ + e ∗
where e ∗ ∼ N (0, σ²) is a new error, independent of the observed data.
We estimate β0 , β1 by β̂0 , β̂1 , and therefore estimate y ∗ by
ŷ ∗ = β̂0 + β̂1 x ∗ . (Note that ŷ ∗ = µ̂∗ .)
What is the error of this prediction?

Prediction error = y ∗ − ŷ ∗ = (β0 + β1 x ∗ + e ∗ ) − (β̂0 + β̂1 x ∗ )

The mean prediction error is

E (y ∗ − ŷ ∗ ) = E (β0 + β1 x ∗ + e ∗ ) − E (β̂0 + β̂1 x ∗ ) = 0,

which is good!
Prediction II
Show that the variance of the prediction error is

  var(y ∗ − ŷ ∗ ) = · · · = σ² [ 1 + 1/n + (x ∗ − x̄)²/Sxx ]

As with the 95% CI and confidence band for the mean, we can compute a
95% prediction interval (and prediction band):
  ŷ ∗ ± 1.96 · σ √( 1 + 1/n + (x ∗ − x̄)²/Sxx )

For state AA, the 95% prediction interval is

649.8678 ± 166.9234
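The prediction interval differs from the mean CI only by the extra "1 +" under the square root, which accounts for the variance of the new error e ∗, so it is always wider. A numpy sketch with made-up data and a generic quantile q (an illustration, not the lecturer's code):

```python
import numpy as np

def half_widths(x, y, x_star, q):
    """Half-widths of the mean CI and of the prediction interval at x_star."""
    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))
    lev = 1.0 / n + (x_star - x.mean()) ** 2 / Sxx
    w_mean = q * sigma_hat * np.sqrt(lev)        # mean CI half-width
    w_pred = q * sigma_hat * np.sqrt(1.0 + lev)  # prediction half-width
    return w_mean, w_pred

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
w_mean, w_pred = half_widths(x, y, 3.0, 2.0)
```

This mirrors the state AA example above, where the prediction interval (±166.9) is far wider than the mean CI (±32.2).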


Residuals I
The residuals are

êi = yi − ŷi = yi − (β̂0 + β̂1 xi )

so
yi = ŷi + êi
In vector notation: ê = (ê1 , ê2 , . . . , ên )ᵀ ,

y = ŷ + ê



Residuals II


Residuals III

Theorem 22 (Distribution of Residuals)


1 êi = yi − Σⁿₖ₌₁ hik yk
2 E (êi ) = 0, i = 1, . . . , n
3
    var(êi ) = σ² [ 1 − 1/n − (xi − x̄)²/Sxx ] = (1 − hii )σ²

4 for i ≠ j , cov(êi , êj ) = −σ² [ 1/n + (xi − x̄)(xj − x̄)/Sxx ] = −σ² hij
5 If ei iid ∼ N (0, σ²), then êi ∼ N (0, (1 − hii )σ²)



Residuals IV

Theorem 23 (Distribution of Residuals)


1 ê = (I − H )y
2 E (ê) = 0
3 var(ê) = σ² (I − H )
4 If e ∼ MVN(0, σ² I), then ê ∼ MVN(0, σ² (I − H ))

Properties of the residuals and fitted values


  yi = ŷi + êi
  Σ êi = 0
  Σ xi êi = 0
  Σ yi² = Σ ŷi² + Σ êi²
  E (yi ) = E (ŷi ) + E (êi )
  cov(ŷi , êi ) = 0!


Residuals V

var(yi ) = var(ŷi ) + var(êi )


cov(yi , yj ) = cov(ŷi , ŷj ) + cov(êi , êj )
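The residual identities on the last two slides are easy to verify numerically. A numpy sketch on made-up data (the lecture's own code is in R):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
ehat = y - yhat                   # residuals

sum_e = np.sum(ehat)              # identity: sum of residuals is 0 (up to rounding)
sum_xe = np.sum(x * ehat)         # identity: sum of x_i * ehat_i is 0 (up to rounding)
ss_y = np.sum(y ** 2)
ss_parts = np.sum(yhat ** 2) + np.sum(ehat ** 2)  # identity: equals ss_y
```

The sum-of-squares split works because ŷ lies in the column space of X while ê is orthogonal to it, which is also why cov(ŷi , êi ) = 0.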

