Chapter 3 Multiple Linear Regression
Ray-Bing Chen
Institute of Statistics
National University of Kaohsiung
3.1 Multiple Regression Models
• An example with two regressor variables: E(y) = 50 + 10x1 + 7x2
• The response y may be related to k regressor or predictor variables; the multiple linear regression model is
  y = β0 + β1x1 + β2x2 + … + βkxk + ε
• Multiple linear regression models are often used as empirical models or approximating functions (the true functional relationship is unknown).
• The cubic model, y = β0 + β1x + β2x² + β3x³ + ε, is still a linear regression model because it is linear in the parameters: set x1 = x, x2 = x², x3 = x³.
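A minimal numpy sketch (not part of the original slides; the data is synthetic and the coefficients are arbitrary) showing that the cubic model above is fit by ordinary linear least squares once x, x², x³ are treated as separate regressors:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 5.0, size=40)           # synthetic predictor values
    y = 2 + 1.5*x - 0.8*x**2 + 0.1*x**3 + rng.normal(0.0, 0.5, size=40)

    # Linear in the parameters: treat x, x^2, x^3 as three regressors x1, x2, x3.
    X = np.column_stack([np.ones_like(x), x, x**2, x**3])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)                              # estimates of beta0 ... beta3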
3.2 Estimation of the Model Parameters
3.2.1 Least-Squares Estimation of the Regression Coefficients
• n observations (n > k)
• Assume
  – The error term ε has E(ε) = 0 and Var(ε) = σ²
  – The errors are uncorrelated.
  – The regressor variables x1, …, xk are fixed.
• The sample regression model:
  yi = β0 + β1xi1 + β2xi2 + … + βkxik + εi,  i = 1, 2, …, n
• Matrix notation: y = Xβ + ε, where y is n × 1, X is n × p (p = k + 1), β is p × 1, and ε is n × 1.
• The least-squares function:
  S(β) = Σ εi² = (y − Xβ)'(y − Xβ)
• Minimizing S(β) gives the least-squares normal equations X'Xβ̂ = X'y, so β̂ = (X'X)⁻¹X'y.
• The fitted model corresponding to the levels of the regressor variables x = (1, x1, …, xk)':
  ŷ = x'β̂ = β̂0 + β̂1x1 + … + β̂kxk
  The vector of fitted values is ŷ = Xβ̂ and the residuals are e = y − ŷ.
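The computations above can be sketched in a few lines of numpy (illustrative only; the data is synthetic, not one of the book's data sets, and the variable names are assumptions of this sketch):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 25, 2
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 30.0, size=(n, k))])   # columns: 1, x1, x2
    beta_true = np.array([2.3, 1.6, 0.014])       # illustrative "true" coefficients
    y = X @ beta_true + rng.normal(0.0, 3.0, size=n)

    # Normal equations X'X beta_hat = X'y; solve() avoids forming (X'X)^(-1) explicitly.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ beta_hat                          # fitted values y_hat = X beta_hat
    e = y - y_hat                                 # residuals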
• Example 3.1 The Delivery Time Data
– y: the delivery time,
– x1: the number of cases of product stocked,
– x2: the distance walked by the route driver
– Consider the model y = β0 + β1x1 + β2x2 + ε
3.2.2 A Geometrical Interpretation of Least Squares
• y = (y1, …, yn)' is the vector of observations.
• X contains p (p = k + 1) column vectors, each n × 1, i.e. X = (1, x1, …, xk).
• The column space of X is called the estimation space.
• Any point in the estimation space is of the form Xβ.
• Least squares minimizes the squared distance from y to the estimation space:
  S(β) = (y − Xβ)'(y − Xβ)
• Normal equation: X'(y − Xβ̂) = 0
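The normal equation says the residual vector e = y − Xβ̂ is orthogonal to every column of X, so ŷ = Xβ̂ is the orthogonal projection of y onto the estimation space. A quick numerical check (numpy assumed; synthetic data, not from the slides):

    import numpy as np

    rng = np.random.default_rng(2)
    X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
    y = rng.normal(size=20)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    print(X.T @ e)   # numerically ~ 0: the residuals are orthogonal to every column of X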
3.2.3 Properties of the Least-Squares Estimators
• Unbiased estimator:
  E(β̂) = E[(X'X)⁻¹X'y] = E[(X'X)⁻¹X'(Xβ + ε)] = β
• Covariance matrix:
  Cov(β̂) = σ²(X'X)⁻¹
• Let C = (X'X)⁻¹; then Var(β̂j) = σ²Cjj and Cov(β̂i, β̂j) = σ²Cij.
3.2.4 Estimation of σ²
• Residual sum of squares:
  SS_Res = e'e = (y − Xβ̂)'(y − Xβ̂)
         = y'y − 2β̂'X'y + β̂'(X'X)β̂
         = y'y − β̂'X'y
• The degrees of freedom: n − p
• The unbiased estimator of σ² is the residual mean square:
  MS_Res = SS_Res / (n − p)
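A short numpy sketch of these quantities (synthetic data; the choices of n, p, and the coefficients are arbitrary illustrations):

    import numpy as np

    rng = np.random.default_rng(3)
    n, p = 25, 3                                   # p = k + 1 parameters
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0.0, 1.5, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ss_res = y @ y - beta_hat @ (X.T @ y)          # SS_Res = y'y - beta_hat' X'y
    ms_res = ss_res / (n - p)                      # unbiased estimate of sigma^2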
• Example 3.2 The Delivery Time Data
3.2.6 Maximum-Likelihood Estimation
• The model is y = Xβ + ε
• ε ~ N(0, σ²I)
• The likelihood function and log-likelihood function:
  L(β, σ²) = (2πσ²)^(−n/2) exp[−(y − Xβ)'(y − Xβ) / (2σ²)]
  l(β, σ²) = −(n/2)[ln(2π) + ln(σ²)] − (y − Xβ)'(y − Xβ) / (2σ²)
• The MLE of β is the least-squares estimator β̂, and the MLE of σ² is σ̂² = (y − Xβ̂)'(y − Xβ̂) / n.
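The MLE of σ² divides by n rather than n − p and is therefore biased downward in small samples; a brief sketch contrasting the two estimates (numpy assumed; synthetic data):

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0.0, 2.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ss_res = (y - X @ beta_hat) @ (y - X @ beta_hat)
    sigma2_mle = ss_res / n          # MLE: divides by n, biased downward
    ms_res     = ss_res / (n - p)    # residual mean square: divides by n - p, unbiased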
3.3 Hypothesis Testing in Multiple Linear Regression
• Questions:
– What is the overall adequacy of the model?
– Which specific regressors seem important?
• Assume the errors are independent and follow a normal distribution with mean 0 and variance σ².
3.3.1 Test for Significance of Regression
• Determine if there is a linear relationship between
y and xj, j = 1,2,…,k.
• The hypotheses are
  H0: β1 = β2 = … = βk = 0
  H1: βj ≠ 0 for at least one j
• ANOVA identity: SS_T = SS_R + SS_Res
• If H0 is true, SS_R/σ² ~ χ²(k) and SS_Res/σ² ~ χ²(n−k−1), and SS_R and SS_Res are independent.
• The test statistic:
  F0 = (SS_R / k) / (SS_Res / (n − k − 1)) = MS_R / MS_Res ~ F(k, n−k−1) under H0
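A hedged sketch of the overall F test (numpy and scipy assumed; the data is synthetic and purely illustrative, not one of the book's examples):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, k = 25, 2
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 30.0, size=(n, k))])
    y = X @ np.array([2.3, 1.6, 0.014]) + rng.normal(0.0, 3.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ss_t   = y @ y - n * y.mean() ** 2            # total (corrected) sum of squares
    ss_res = y @ y - beta_hat @ (X.T @ y)         # residual sum of squares
    ss_r   = ss_t - ss_res                        # regression sum of squares

    f0 = (ss_r / k) / (ss_res / (n - k - 1))      # MS_R / MS_Res
    p_value = stats.f.sf(f0, k, n - k - 1)        # upper-tail probability of F(k, n-k-1)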
• E(MS_Res) = σ²
• E(MS_R) = σ² + β*'Xc'Xcβ* / k, where β* = (β1, …, βk)' and Xc is the matrix of centered regressors,
  Xc = [ x11 − x̄1  …  x1k − x̄k
         ⋮                ⋮
         xn1 − x̄1  …  xnk − x̄k ]
• So if at least one βj ≠ 0, E(MS_R) > E(MS_Res), and large values of F0 indicate that H0 should be rejected.
• Example 3.3 The Delivery Time Data
• R² and adjusted R²
  – R² always increases when a regressor is added to the model, regardless of the contribution of that variable.
  – The adjusted R²:
    R²_adj = 1 − [SS_Res / (n − p)] / [SS_T / (n − 1)]
  – R²_adj increases only if adding the new regressor reduces MS_Res.
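A small sketch computing both statistics (numpy assumed; synthetic data and arbitrary dimensions):

    import numpy as np

    rng = np.random.default_rng(6)
    n, p = 25, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0.0, 1.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    ss_t   = y @ y - n * y.mean() ** 2
    ss_res = y @ y - beta_hat @ (X.T @ y)

    r2     = 1 - ss_res / ss_t
    r2_adj = 1 - (ss_res / (n - p)) / (ss_t / (n - 1))   # penalizes adding weak regressors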
3.3.2 Tests on Individual Regression Coefficients
• For an individual regression coefficient:
  – H0: βj = 0 vs. H1: βj ≠ 0
  – Let Cjj be the j-th diagonal element of (X'X)⁻¹. The test statistic is
    t0 = β̂j / √(σ̂²Cjj) = β̂j / se(β̂j) ~ t(n−k−1) under H0
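A sketch of the marginal t tests (numpy and scipy assumed; the data is synthetic, and x2 is constructed to have no real effect so its t statistic should tend to be small):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n, k = 25, 2
    p = k + 1
    X = np.column_stack([np.ones(n), rng.uniform(0.0, 30.0, size=(n, k))])
    y = X @ np.array([2.3, 1.6, 0.0]) + rng.normal(0.0, 3.0, size=n)   # x2 contributes nothing here

    C = np.linalg.inv(X.T @ X)                     # C = (X'X)^(-1)
    beta_hat = C @ X.T @ y
    ms_res = (y @ y - beta_hat @ (X.T @ y)) / (n - p)

    se = np.sqrt(ms_res * np.diag(C))              # se(beta_hat_j) = sqrt(sigma_hat^2 * C_jj)
    t0 = beta_hat / se
    p_values = 2 * stats.t.sf(np.abs(t0), n - p)   # two-sided p-values against t(n - p)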
• Tests on a subset of regressors: partition the coefficient vector as β = (β1', β2')', where β1 contains p − r elements and β2 contains r elements, and write the model as y = Xβ + ε = X1β1 + X2β2 + ε. We wish to test H0: β2 = 0 vs. H1: β2 ≠ 0.
• For the full model, the regression sum of squares (p degrees of freedom) is
  SS_R(β) = β̂'X'y
• Under the null hypothesis, the regression sum of squares for the reduced model is
  SS_R(β1) = β̂1'X1'y
  where β̂1 = (X1'X1)⁻¹X1'y; the reduced model has p − r degrees of freedom.
• The regression sum of squares due to β2 given that β1 is already in the model:
  SS_R(β2 | β1) = SS_R(β) − SS_R(β1)
• This is called the extra sum of squares due to β2, with p − (p − r) = r degrees of freedom.
• The test statistic:
  F0 = [SS_R(β2 | β1) / r] / MS_Res ~ F(r, n−p) under H0
• If β2 ≠ 0, F0 follows a noncentral F distribution with noncentrality parameter
  λ = (1/σ²) β2'X2'[I − X1(X1'X1)⁻¹X1']X2β2
• Multicollinearity: this test actually has no power!
• This test has maximal power when X1 and X2 are
orthogonal to one another!
• Partial F test: Given the regressors in X1, measure
the contribution of the regressors in X2.
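A sketch of the extra-sum-of-squares (partial F) computation described above (numpy and scipy assumed; the partition into X1 and X2 and the synthetic data are illustrative choices, not from the slides):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    n = 30
    X1 = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept and x1 (p - r columns)
    X2 = rng.normal(size=(n, 2))                              # x2, x3 (the r columns under test)
    X  = np.hstack([X1, X2])
    y  = X1 @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, size=n)   # beta2 = 0 in this synthetic truth

    p, r = X.shape[1], X2.shape[1]

    def reg_ss(Xm, y):
        # Regression sum of squares beta_hat' Xm'y for the model with matrix Xm.
        b = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)
        return b @ (Xm.T @ y)

    ss_r_full    = reg_ss(X, y)                    # SS_R(beta), p degrees of freedom
    ss_r_reduced = reg_ss(X1, y)                   # SS_R(beta1), p - r degrees of freedom
    ss_extra     = ss_r_full - ss_r_reduced        # SS_R(beta2 | beta1), r degrees of freedom

    ms_res = (y @ y - ss_r_full) / (n - p)
    f0 = (ss_extra / r) / ms_res
    p_value = stats.f.sf(f0, r, n - p)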
• Consider y = β0 + β1x1 + β2x2 + β3x3 + ε.
• SS_R(β1 | β0, β2, β3), SS_R(β2 | β0, β1, β3), and SS_R(β3 | β0, β1, β2) are single-degree-of-freedom sums of squares.
• SS_R(βj | β0, …, βj−1, βj+1, …, βk): the contribution of xj as if it were the last variable added to the model.
• This F test is equivalent to the t test.
• SS_T = SS_R(β1, β2, β3 | β0) + SS_Res
• SS_R(β1, β2, β3 | β0) = SS_R(β1 | β0) + SS_R(β2 | β1, β0) + SS_R(β3 | β1, β2, β0)
• Example 3.5 Delivery Time Data
3.3.3 Special Case of Orthogonal Columns in X
• Model: y = Xβ + ε = X1β1 + X2β2 + ε
• Orthogonal: X1'X2 = 0
• The normal equations (X'X)β̂ = X'y then split into two independent blocks:
  [ X1'X1    0    ] [ β̂1 ]   [ X1'y ]
  [   0    X2'X2 ] [ β̂2 ] = [ X2'y ]
  so β̂1 = (X1'X1)⁻¹X1'y and β̂2 = (X2'X2)⁻¹X2'y: the least-squares estimate of β1 does not depend on whether X2 is in the model (and vice versa).
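A tiny numerical illustration (synthetic; the ±1 factorial coding is chosen only to make the columns exactly orthogonal): with X1'X2 = 0, the estimates of β0 and β1 are the same whether or not x2 is included.

    import numpy as np

    # A 2-level factorial in coded units: the intercept, x1, and x2 columns are mutually orthogonal.
    x1 = np.array([-1.0, -1.0,  1.0,  1.0, -1.0, -1.0,  1.0,  1.0])
    x2 = np.array([-1.0,  1.0, -1.0,  1.0, -1.0,  1.0, -1.0,  1.0])
    X1 = np.column_stack([np.ones(8), x1])
    X2 = x2.reshape(-1, 1)
    X  = np.hstack([X1, X2])

    rng = np.random.default_rng(9)
    y = 3 + 2*x1 - 1.5*x2 + rng.normal(0.0, 0.3, size=8)

    b_full    = np.linalg.solve(X.T @ X, X.T @ y)      # fit with x1 and x2
    b_reduced = np.linalg.solve(X1.T @ X1, X1.T @ y)   # fit with x1 only
    print(b_full[:2], b_reduced)                       # the beta0, beta1 estimates coincide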
3.3.4 Testing the General Linear Hypothesis
• Let T be an m × p matrix of constants with rank(T) = r; the hypothesis is H0: Tβ = 0.
• Full model: y = Xβ + ε, with residual sum of squares SS_Res(FM) = y'y − β̂'X'y (n − p degrees of freedom).
• Another form of the test statistic:
  F0 = {β̂'T'[T(X'X)⁻¹T']⁻¹Tβ̂ / r} / {SS_Res(FM) / (n − p)}
• For H0: Tβ = c vs. H1: Tβ ≠ c, the test statistic becomes
  F0 = {(Tβ̂ − c)'[T(X'X)⁻¹T']⁻¹(Tβ̂ − c) / r} / {SS_Res(FM) / (n − p)} ~ F(r, n−p) under H0
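A sketch of the general linear hypothesis test (numpy and scipy assumed; the matrix T, the constant c, and the data are hypothetical, chosen only to illustrate a single linear constraint β1 − β2 = 0):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(10)
    n, p = 30, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(0.0, 1.0, size=n)

    beta_hat  = np.linalg.solve(X.T @ X, X.T @ y)
    ss_res_fm = y @ y - beta_hat @ (X.T @ y)       # SS_Res(FM)
    ms_res    = ss_res_fm / (n - p)

    # Hypothetical single constraint T beta = c: test beta1 - beta2 = 0 (so r = 1).
    T = np.array([[0.0, 1.0, -1.0]])
    c = np.array([0.0])
    r = np.linalg.matrix_rank(T)

    diff   = T @ beta_hat - c
    middle = np.linalg.inv(T @ np.linalg.inv(X.T @ X) @ T.T)
    f0 = (diff @ middle @ diff) / r / ms_res
    p_value = stats.f.sf(f0, r, n - p)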