Chapter 3 Multiple Linear Regression

Ray-Bing Chen
Institute of Statistics
National University of Kaohsiung

3.1 Multiple Regression Models

• Multiple regression model: a regression model that involves more than
  one regressor variable.
• Example: The yield in pounds of conversion in a chemical process
  depends on the temperature and the catalyst concentration.

• E(y) = 50 + 10 x1 + 7 x2
• The response y may be related to k regressor or predictor variables
  (the multiple linear regression model):
  y = β0 + β1 x1 + β2 x2 + … + βk xk + ε
• The parameter βj represents the expected change in the response y per
  unit change in xj when all of the remaining regressor variables xi
  (i ≠ j) are held constant.

• Multiple linear regression models are often used as empirical models
  or approximating functions when the true functional relationship is
  unknown.
• The cubic model:
  y = β0 + β1 x + β2 x² + β3 x³ + ε
• The model with interaction effects:
  y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε

• Any regression model that is linear in the parameters is a linear
  regression model, regardless of the shape of the surface that it
  generates.
• The second-order model with interaction:
  y = β0 + β1 x1 + β2 x2 + β11 x1² + β22 x2² + β12 x1 x2 + ε
3.2 Estimation of the Model Parameters
3.2.1 Least-Squares Estimation of the Regression Coefficients
• n observations (n > k)
• Assume
– The error term ε satisfies E(ε) = 0 and Var(ε) = σ²
– The errors are uncorrelated.
– The regressor variables x1, …, xk are fixed.

• The sample regression model:
  yi = β0 + β1 xi1 + β2 xi2 + … + βk xik + εi,  i = 1, 2, …, n
• The least-squares function:
  S(β0, β1, …, βk) = Σi εi² = Σi (yi − β0 − Σj βj xij)²
• The normal equations: setting the partial derivatives ∂S/∂βj equal to
  zero for j = 0, 1, …, k gives p = k + 1 equations in the p unknowns.
• Matrix notation: y = Xβ + ε, where y is an n × 1 response vector, X is
  an n × p matrix of regressor levels (p = k + 1), β is a p × 1 vector of
  coefficients, and ε is an n × 1 vector of errors.
• The least-squares function:
  S(β) = ε'ε = (y − Xβ)'(y − Xβ)
  Minimizing S(β) gives the normal equations X'Xβ̂ = X'y, so
  β̂ = (X'X)⁻¹X'y.
• The fitted model corresponding to the levels of the regressor
  variables x:
  ŷ = Xβ̂ = X(X'X)⁻¹X'y = Hy
• The hat matrix H = X(X'X)⁻¹X' is idempotent and symmetric, i.e.
  H² = H and H' = H.
• H is an orthogonal projection matrix (it projects y onto the column
  space of X).
• Residuals:
  e = y − ŷ = (I − H)y
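• A minimal numpy sketch of these least-squares formulas. The data below
  are synthetic placeholders, not the textbook's delivery-time data, and
  in practice a numerically stabler solver (e.g. numpy.linalg.lstsq)
  would be preferred over forming (X'X)⁻¹ explicitly:

  import numpy as np

  # Synthetic illustrative data: n observations, k regressors plus intercept
  rng = np.random.default_rng(0)
  n, k = 25, 2
  X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # first column = intercept
  beta_true = np.array([2.0, 1.5, 0.01])                      # arbitrary illustrative values
  y = X @ beta_true + rng.normal(size=n)

  XtX_inv = np.linalg.inv(X.T @ X)
  beta_hat = XtX_inv @ X.T @ y       # least-squares estimates (X'X)^-1 X'y
  H = X @ XtX_inv @ X.T              # hat matrix
  y_hat = H @ y                      # fitted values
  e = y - y_hat                      # residuals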
• Example 3.1 The Delivery Time Data
– y: the delivery time,
– x1: the number of cases of product stocked,
– x2: the distance walked by the route driver
– Consider the model y = β0 + β1 x1 + β2 x2 + ε

3.2.2 A Geometrical Interpretation of Least Squares
• y = (y1, …, yn)' is the vector of observations.
• X contains p (p = k+1) column vectors (n ×1), i.e.
X = (1,x1,…,xk)
• The column space of X is called the estimation
space.
• Any point in the estimation space is of the form Xβ.
• Minimize the squared distance
  S(β) = (y − Xβ)'(y − Xβ)

• Normal equations: X'(y − Xβ̂) = 0, i.e. the residual vector is
  orthogonal to the estimation space (the column space of X).
3.2.3 Properties of the Least-Squares Estimators
• Unbiased estimator:
  E(β̂) = E((X'X)⁻¹X'y) = (X'X)⁻¹X'Xβ = β
• Covariance matrix:
  Cov(β̂) = σ²(X'X)⁻¹
• Let C = (X'X)⁻¹; then Var(β̂j) = σ²Cjj and Cov(β̂i, β̂j) = σ²Cij.
• The LSE is the best linear unbiased estimator (Gauss-Markov theorem).
• LSE = MLE under normality assumption

3.2.4 Estimation of σ²
• Residual sum of squares:
  SSRes = e'e
        = (y − Xβ̂)'(y − Xβ̂)
        = y'y − 2β̂'X'y + β̂'(X'X)β̂
        = y'y − β̂'X'y
• The degrees of freedom: n − p
• The unbiased estimator of σ² is the residual mean square:
  MSRes = SSRes / (n − p)
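• Continuing the numpy sketch above (same assumed variables), the
  residual mean square is computed directly from the residuals:

  p = X.shape[1]                     # number of parameters, p = k + 1
  SS_res = e @ e                     # equals y'y - beta_hat' X' y
  MS_res = SS_res / (n - p)          # unbiased estimate of sigma^2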
• Example 3.2 The Delivery Time Data

• Both estimates are in a sense correct, but they depend heavily on the
  choice of model.
• The model with the smaller residual variance would be the better
  choice.
3.2.5 Inadequacy of Scatter Diagrams in Multiple Regression
• For simple linear regression, the scatter diagram is an important tool
  for analyzing the relationship between y and x.
• However, scatter diagrams may not be useful in multiple regression.
  – Example: data generated from y = 8 − 5 x1 + 12 x2
  – The y vs. x1 plot does not exhibit any apparent relationship
    between y and x1.
  – The y vs. x2 plot indicates a linear relationship with slope ≈ 8.
• In this case, constructing scatter diagrams of y vs. xj (j = 1, 2, …, k)
  can be misleading.
• If there is only one (or a few) dominant regressor, or if the regressors
  operate nearly independently, the matrix of scatterplots is most useful.

3.2.6 Maximum-Likelihood Estimation
• The model is y = Xβ + ε, with ε ~ N(0, σ²I).
• The likelihood function and log-likelihood function:
  L(β, σ²) = (2πσ²)^(−n/2) exp(−(y − Xβ)'(y − Xβ) / (2σ²))
  ln L(β, σ²) = −(n/2) ln(2π) − (n/2) ln(σ²) − (y − Xβ)'(y − Xβ) / (2σ²)
• The MLE of β is the least-squares estimator β̂ = (X'X)⁻¹X'y.
• The MLE of σ² is (y − Xβ̂)'(y − Xβ̂)/n, which divides by n rather than
  by n − p.
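• In the numpy sketch, the (biased) maximum-likelihood estimate of σ²
  differs from MSRes only in its divisor:

  sigma2_mle = SS_res / n            # MLE divides by n, not by n - p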
3.3 Hypothesis Testing in Multiple Linear Regression
• Questions:
– What is the overall adequacy of the model?
– Which specific regressors seem important?
• Assume the errors are independent and follow a normal distribution
  with mean 0 and variance σ².

3.3.1 Test for Significance of Regression
• Determine if there is a linear relationship between
y and xj, j = 1,2,…,k.
• The hypotheses are
H0: β1 = β2 =…= βk = 0
H1: βj ≠ 0 for at least one j
• ANOVA
• SST = SSR + SSRes
• Under H0: SSR/σ² ~ χ²(k), SSRes/σ² ~ χ²(n − k − 1), and SSR and SSRes
  are independent.
  F0 = (SSR/k) / (SSRes/(n − k − 1)) = MSR / MSRes ~ F(k, n − k − 1)
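• A sketch of the overall F test in the same numpy setting; scipy is
  assumed to be available for the reference F distribution:

  from scipy import stats

  SS_T = np.sum((y - y.mean()) ** 2)      # total (corrected) sum of squares
  SS_R = SS_T - SS_res                    # regression sum of squares
  MS_R = SS_R / k
  F0 = MS_R / MS_res
  p_value = stats.f.sf(F0, k, n - k - 1)  # small p-value -> reject H0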
• E(MSRes) = σ²
• E(MSR) = σ² + β*'Xc'Xcβ* / k,
  where β* = (β1, …, βk)' and Xc is the centered regressor matrix whose
  ij-th element is xij − x̄j.

• Under H1, F0 follows a noncentral F distribution with k and n − k − 1
  degrees of freedom and noncentrality parameter
  λ = β*'Xc'Xcβ* / σ²
• ANOVA table for significance of regression:
  Source of Variation   Sum of Squares   df          Mean Square   F0
  Regression            SSR              k           MSR           MSR/MSRes
  Residual              SSRes            n − k − 1   MSRes
  Total                 SST              n − 1
• Example 3.3 The Delivery Time Data

• R² and Adjusted R²
  – R² always increases when a regressor is added to the model,
    regardless of how small the contribution of that variable is.
  – The adjusted R²:
    R²adj = 1 − [SSRes/(n − p)] / [SST/(n − 1)]

– The adjusted R² will only increase when a variable is added to the
  model if the addition of that variable reduces the residual mean
  square.
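• In the numpy sketch, both quantities follow in one line each:

  R2 = 1 - SS_res / SS_T
  R2_adj = 1 - (SS_res / (n - p)) / (SS_T / (n - 1))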

3.3.2 Tests on Individual Regression Coefficients
• For the individual regression coefficient:
– H0: βj = 0 vs. H1: βj ≠ 0
– Let Cjj be the j-th diagonal element of (X'X)⁻¹. The test statistic:
  t0 = β̂j / √(σ̂²Cjj) = β̂j / se(β̂j) ~ t(n − k − 1) under H0

– This is a partial or marginal test, because the estimate of βj
  depends on all of the other regressors in the model.
– This test measures the contribution of xj given the other regressors
  in the model.
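• A sketch of the marginal t test in the same numpy/scipy setting; the
  tested index j = 1 is an arbitrary illustrative choice:

  j = 1                                        # coefficient to test
  se_bj = np.sqrt(MS_res * XtX_inv[j, j])      # se(beta_hat_j) = sqrt(sigma2_hat * C_jj)
  t0 = beta_hat[j] / se_bj
  p_val = 2 * stats.t.sf(abs(t0), n - p)       # two-sided p-value, df = n - k - 1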
• Example 3.4 The Delivery Time Data

• The subset of regressors: partition the model as
  y = Xβ + ε = X1β1 + X2β2 + ε,
  where β1 is (p − r) × 1 and β2 is r × 1, and test H0: β2 = 0 vs.
  H1: β2 ≠ 0.
• For the full model, the regression sum of squares is
  SSR(β) = β̂'X'y  (p degrees of freedom)
• Under the null hypothesis, the regression sum of squares for the
  reduced model is
  SSR(β1) = β̂1'X1'y,  where β̂1 = (X1'X1)⁻¹X1'y
• The degrees of freedom are p − r for the reduced model.
• The regression sum of squares due to β2 given β1:
  SSR(β2 | β1) = SSR(β) − SSR(β1)
• This is called the extra sum of squares due to β2, and its degrees of
  freedom are p − (p − r) = r.
• The test statistic:
  F0 = [SSR(β2 | β1) / r] / MSRes ~ F(r, n − p)
• If β2 ≠ 0, F0 follows a noncentral F distribution with noncentrality
  parameter
  λ = β2'X2'[I − X1(X1'X1)⁻¹X1']X2β2 / σ²
• Multicollinearity: if X2 is nearly a linear combination of the columns
  of X1, the noncentrality parameter is close to zero and this test has
  essentially no power!
• This test has maximal power when X1 and X2 are
orthogonal to one another!
• Partial F test: Given the regressors in X1, measure
the contribution of the regressors in X2.
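• A sketch of the extra-sum-of-squares (partial F) test in the same
  numpy/scipy setting; testing the last r = 1 column of X is only an
  illustrative choice of X2:

  r = 1                                        # number of coefficients tested
  X1 = X[:, : p - r]                           # columns kept in the reduced model
  beta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)
  SSR_full = beta_hat @ X.T @ y                # SSR(beta) = beta_hat' X' y
  SSR_reduced = beta1_hat @ X1.T @ y           # SSR(beta1) = beta1_hat' X1' y
  SSR_extra = SSR_full - SSR_reduced           # SSR(beta2 | beta1)
  F_partial = (SSR_extra / r) / MS_res
  p_val = stats.f.sf(F_partial, r, n - p)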

• Consider y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
  SSR(β1 | β0, β2, β3), SSR(β2 | β0, β1, β3), and SSR(β3 | β0, β1, β2)
  are single-degree-of-freedom sums of squares.
• SSR(βj | β0, …, βj−1, βj+1, …, βk): the contribution of xj as if it
  were the last variable added to the model.
• This F test is equivalent to the t test (F0 = t0²).
• SST = SSR(β1, β2, β3 | β0) + SSRes
• SSR(β1, β2, β3 | β0) = SSR(β1 | β0) + SSR(β2 | β1, β0) + SSR(β3 | β1, β2, β0)

• Example 3.5 Delivery Time Data

3.3.3 Special Case of Orthogonal Columns in X
• Model: y = Xβ + ε = X1β1 + X2β2 + ε
• Orthogonal: X1'X2 = 0
• Since the normal equations are (X'X)β̂ = X'y, orthogonality makes them
  decouple into
  [X1'X1    0   ] [β̂1]   [X1'y]
  [  0    X2'X2 ] [β̂2] = [X2'y]
• Hence β̂1 = (X1'X1)⁻¹X1'y and β̂2 = (X2'X2)⁻¹X2'y, so each set of
  estimates is unaffected by the presence of the other set of regressors.
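• A small numerical illustration (same numpy sketch) that orthogonal
  regressor blocks can be estimated separately; the two constructed
  columns are purely hypothetical:

  x1 = rng.normal(size=n)
  x2 = rng.normal(size=n)
  x2 -= (x1 @ x2) / (x1 @ x1) * x1             # make x2 orthogonal to x1
  Xo = np.column_stack([x1, x2])               # X1'X2 = 0 by construction
  yo = 3.0 * x1 - 2.0 * x2 + rng.normal(size=n)
  joint = np.linalg.solve(Xo.T @ Xo, Xo.T @ yo)                    # fit both together
  separate = np.array([x1 @ yo / (x1 @ x1), x2 @ yo / (x2 @ x2)])  # fit one at a time
  # joint and separate agree up to floating-point error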

3.3.4 Testing the General Linear Hypothesis
• Let T be an m × p matrix of constants with rank(T) = r; the hypothesis
  of interest is H0: Tβ = 0.
• Full model: y = Xβ + ε
  SSRes(FM) = y'y − β̂'X'y  (n − p degrees of freedom)
• Reduced model: y = Zγ + ε, where Z is an n × (p − r) matrix and γ is a
  (p − r) × 1 vector. Then
  γ̂ = (Z'Z)⁻¹Z'y
  SSRes(RM) = y'y − γ̂'Z'y  (n − p + r degrees of freedom)

• The difference SSH = SSRes(RM) − SSRes(FM), with r degrees of freedom,
  is called the sum of squares due to the hypothesis H0: Tβ = 0.
• The test statistic:
  F = [SSH / r] / [SSRes(FM) / (n − p)] ~ F(r, n − p)

• Another form:
  F = [β̂'T'(T(X'X)⁻¹T')⁻¹Tβ̂ / r] / [SSRes(FM) / (n − p)]
• For H0: Tβ = c vs. H1: Tβ ≠ c,
  F = [(Tβ̂ − c)'(T(X'X)⁻¹T')⁻¹(Tβ̂ − c) / r] / [SSRes(FM) / (n − p)]
    ~ F(r, n − p)
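• A sketch of the general linear hypothesis test in the same numpy/scipy
  setting; the particular T below (testing the hypothetical constraint
  β1 = β2) is only an illustration:

  T = np.array([[0.0, 1.0, -1.0]])             # one row: beta_1 - beta_2 = 0
  c = np.zeros(1)
  r_T = T.shape[0]                             # rows of T assumed independent, so rank = r_T
  diff = T @ beta_hat - c
  F_glh = (diff @ np.linalg.inv(T @ XtX_inv @ T.T) @ diff) / r_T / MS_res
  p_val = stats.f.sf(F_glh, r_T, n - p)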
