Ch3 Slides: Multiple Linear Regression
• Least-squares estimation
• Inference
• Some issues
Multiple Linear Regression
The multiple linear regression model is

  y = β0 + β1x1 + · · · + βkxk + ε

where
• y: response, x1, . . . , xk: predictors
• β0, β1, . . . , βk: coefficients
• ε: error term
Remark. Some of the new predictors in the model could be powers of the original ones,

  y = β0 + β1x + β2x² + · · · + βkx^k + ε,

or interactions of them,

  y = β0 + β1x1 + β2x2 + β12x1x2 + ε.

These are still linear models (in terms of the regression coefficients).
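Such models are therefore still fit by ordinary least squares; in R this is a one-liner with lm() (a minimal sketch — the data frame dat and its columns are hypothetical):

    # Polynomial model: I() makes ^ act arithmetically inside a formula
    fit_poly <- lm(y ~ x + I(x^2), data = dat)

    # Interaction model: x1:x2 adds the product term x1*x2 as a predictor
    fit_int <- lm(y ~ x1 + x2 + x1:x2, data = dat)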
The errors are assumed to have mean zero, constant variance σ², and to be uncorrelated:

  Cov(εi, εj) = 0,  i ≠ j.

(Like in simple linear regression, we will add the normality and independence assumptions when we get to the inference part.)
Letting

  y = (y1, y2, . . . , yn)′,  β = (β0, β1, . . . , βk)′,  ε = (ε1, ε2, . . . , εn)′,

and letting X be the n × p matrix (p = k + 1) whose ith row is (1, xi1, xi2, . . . , xik), we can write the model compactly as

  y = Xβ + ε,  (3)

where y is n × 1, X is n × p, β is p × 1, and ε is n × 1.
Provided that X′X is nonsingular, the least-squares estimator is

  β̂ = (X′X)⁻¹X′y.

Remark. The nonsingularity condition holds if and only if all the columns of X are linearly independent (i.e., X has full column rank).
Remark. This is the same formula as for β̂ = (β̂0, β̂1)′ in simple linear regression. To demonstrate this, consider the toy data set of 3 points (0, 1), (1, 0), (2, 2) used before. The new formula gives
  β̂ = (X′X)⁻¹X′y,

where X has rows (1, 0), (1, 1), (1, 2) and y = (1, 0, 2)′. Thus

  X′X = [3 3; 3 5],  X′y = (3, 4)′,

so that

  β̂ = [3 3; 3 5]⁻¹ (3, 4)′ = (0.5, 0.5)′.
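This is easy to verify in R (a minimal sketch):

    x <- c(0, 1, 2)
    y <- c(1, 0, 2)
    X <- cbind(1, x)               # design matrix with an intercept column
    solve(t(X) %*% X, t(X) %*% y)  # solves the normal equations: (0.5, 0.5)
    coef(lm(y ~ x))                # same estimates via lm()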
The first of the normal equations

  X′Xβ̂ = X′y

is

  nβ̂0 + β̂1 Σ xi1 + β̂2 Σ xi2 + · · · + β̂k Σ xik = Σ yi,

which simplifies to

  β̂0 + β̂1 x̄1 + β̂2 x̄2 + · · · + β̂k x̄k = ȳ.

This indicates that the centroid of the data, i.e., (x̄1, . . . , x̄k, ȳ), is on the least squares regression plane.
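A quick numerical check in R, reusing the toy data above (a sketch):

    x <- c(0, 1, 2); y <- c(1, 0, 2)
    fit <- lm(y ~ x)
    predict(fit, newdata = data.frame(x = mean(x)))  # returns 1
    mean(y)                                          # also 1: the plane passes through the centroid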
With the hat matrix H = X(X′X)⁻¹X′, the fitted values are ŷ = Xβ̂ = Hy, and the residual vector is

  e = y − ŷ = (I − H)y.
(Figure: ŷ = Hy is the orthogonal projection of y onto the column space Col(X), and the residual e = (I − H)y is orthogonal to Col(X).)
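The orthogonality can be checked directly (a sketch, again with the toy data):

    X <- cbind(1, c(0, 1, 2)); y <- c(1, 0, 2)
    H <- X %*% solve(t(X) %*% X) %*% t(X)  # hat matrix
    y_hat <- H %*% y                       # fitted values
    e <- y - y_hat                         # residuals
    t(X) %*% e                             # numerically 0: e is orthogonal to Col(X)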
(R demonstration in class, using the body dimensions data set: http://jse.amstat.org/v11n2/datasets.heinz.html)
*To perform these two inference tasks, we will additionally assume that the model errors εi are normally and independently distributed with mean 0 and variance σ², i.e., ε1, . . . , εn ~ iid N(0, σ²).
The least-squares estimator is unbiased:

  E(β̂) = β.
Proof. We have

  β̂ = (X′X)⁻¹X′y
     = (X′X)⁻¹X′(Xβ + ε)
     = (X′X)⁻¹X′ · Xβ + (X′X)⁻¹X′ · ε
     = β + (X′X)⁻¹X′ε.

It follows that

  E(β̂) = β + (X′X)⁻¹X′ E(ε) = β,

since E(ε) = 0.
The covariance matrix of β̂ is

  Var(β̂) = σ²C,  where C = (X′X)⁻¹.

That is, Var(β̂j) = σ²Cjj, and Cov(β̂i, β̂j) = σ²Cij for i ≠ j.
Proof. Writing β̂ = Ay with A = (X′X)⁻¹X′ and using the fact that

  Var(Ay) = A · Var(y) · A′,

we have

  Var(β̂) = (X′X)⁻¹X′ · σ²I · X(X′X)⁻¹ = σ²(X′X)⁻¹.
The residual sum of squares satisfies

  E(SSRes) = (n − p)σ².

Hence σ̂² = MSRes = SSRes/(n − p) is an unbiased estimator of σ².
Remark. The total and regression sums of squares are defined in the same way as before:

  SSR = Σ(ŷi − ȳ)² = Σŷi² − nȳ² = ‖ŷ‖² − nȳ²
  SST = Σ(yi − ȳ)² = Σyi² − nȳ² = ‖y‖² − nȳ²

They can be used to assess the adequacy of the model through the coefficient of determination

  R² = SSR/SST = 1 − SSRes/SST.

The larger R² (i.e., the smaller SSRes), the better the model.
Adjusted R²

R² measures the goodness of fit of a single model and is not a fair criterion for comparing models with different sizes k (e.g., nested models). The adjusted version

  R²Adj = 1 − [SSRes/(n − p)] / [SST/(n − 1)]

accounts for the number of predictors.

(Figure: R²Adj plotted against k, the number of predictors in the model.)
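In R, both quantities are part of the model summary (a minimal sketch; fit is an object returned by lm()):

    s <- summary(fit)
    s$r.squared       # R²
    s$adj.r.squared   # adjusted R²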
Next

We consider the following inference tasks in multiple linear regression:
• Hypothesis testing
• Interval estimation

For both tasks, we need to additionally assume that the model errors εi are iid N(0, σ²).
  H0: β1 = · · · = βk = 0
  H1: βj ≠ 0 for at least one j

The test statistic is

  F0 = MSR/MSRes = [SSR/k] / [SSRes/(n − p)] ~ Fk,n−p under H0,

and we reject H0 if

  F0 > Fα,k,n−p.
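In R, this overall F test is reported at the bottom of the lm() summary (a sketch):

    summary(fit)  # the "F-statistic" line gives F0, the degrees of freedom (k, n - p), and the p-value
    # equivalently, the p-value is pf(F0, k, n - p, lower.tail = FALSE)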
  H0: βj = 0 vs H1: βj ≠ 0

To conduct the test, we need to use the point estimator β̂j (which is linear and unbiased) and determine its distribution when H0 is true: since β̂j ~ N(βj, σ²Cjj), under H0

  t0 = β̂j / √(σ̂² Cjj) ~ tn−p,
and we reject H0 if

  |t0| > tα/2, n−p.
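In R, the coefficient table of the summary reports t0 and its p-value for every coefficient (a sketch):

    summary(fit)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)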
  y = Xβ + ε
We wish to test

  H0: β2 = 0 (i.e., βk−r+1 = · · · = βk = 0) vs H1: β2 ≠ 0,

where β is partitioned as β = (β1′, β2′)′ such that

  y = Xβ + ε = [X1 X2](β1′, β2′)′ + ε = X1β1 + X2β2 + ε.
(Full model)     y = Xβ + ε
(Reduced model)  y = X1β1 + ε
Example 0.5 (Weight ∼ Height + Waist Girth). We use the extra sum of
squares method to compare it with the reduced model (Weight ∼ Height):
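In R, the extra-sum-of-squares (partial F) test is carried out by anova() on the two nested fits (a minimal sketch; the data frame body and its column names are assumptions based on the data set above):

    full    <- lm(Weight ~ Height + WaistGirth, data = body)
    reduced <- lm(Weight ~ Height, data = body)
    anova(reduced, full)  # partial F test for dropping Waist Girth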
We reject H0 if

  |t0| > tα/2, n−p,  where  t0 = (β̂j − 0)/se(β̂j) = β̂j / √(σ̂² Cjj).
• Prediction interval

under the additional assumption that the errors εi are independently and normally distributed with zero mean and constant variance σ².
For a new observation at x0, the response is

  y0 = x0′β + ε0.
Proof. First, note that the mean of the response y0 at x0, i.e., x0′β, is estimated by ŷ0 = x0′β̂.

Let Ψ = y0 − ŷ0 be the difference between the true response and the point estimator for its mean. Then Ψ (as a linear combination of y0, y1, . . . , yn) is normally distributed with mean

  E(Ψ) = x0′β − x0′β = 0

and variance (since y0 is independent of the fitted model)

  Var(Ψ) = Var(y0) + Var(ŷ0) = σ² + σ² x0′(X′X)⁻¹x0 = σ²(1 + x0′(X′X)⁻¹x0).
It follows that

  (y0 − ŷ0) / √(σ²(1 + x0′(X′X)⁻¹x0)) ~ N(0, 1)

and correspondingly,

  (y0 − ŷ0) / √(MSRes(1 + x0′(X′X)⁻¹x0)) ~ tn−p.
• βj (for each 0 ≤ j ≤ k):  β̂j ± tα/2, n−p √(MSRes Cjj)

• σ²:  ( (n − p)MSRes / χ²α/2, n−p ,  (n − p)MSRes / χ²1−α/2, n−p )

• E(y | x0):  ŷ0 ± tα/2, n−p √(MSRes x0′(X′X)⁻¹x0)

• y0 (at x0):  ŷ0 ± tα/2, n−p √(MSRes (1 + x0′(X′X)⁻¹x0))
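All of these are available directly in R (a minimal sketch; fit from lm(), new_x a one-row data frame playing the role of x0):

    confint(fit, level = 0.95)                               # CIs for the coefficients
    predict(fit, newdata = new_x, interval = "confidence")   # CI for E(y | x0)
    predict(fit, newdata = new_x, interval = "prediction")   # PI for a new observation y0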
Some issues

• Hidden extrapolation
• Units of measurement
• Multicollinearity
Hidden extrapolation

In multiple linear regression, extrapolation may occur even when all predictor values are within their ranges: a new point can lie outside the joint region covered by the observed data even though each of its coordinates is within the corresponding marginal range.
Units of measurement

The choices of the units of the predictors in a linear model may cause their regression coefficients to have very different magnitudes, e.g.,

  y = 3 − 20x1 + 0.01x2

Unit Normal Scaling: For each regressor xj (and the response), rescale the observations of xj (or y) to have zero mean and unit variance. Let

  x̄j = (1/n) Σi xij,
  s²j = [1/(n − 1)] Σi (xij − x̄j)²,  where Σi (xij − x̄j)² = Sjj,
  s²y = [1/(n − 1)] Σi (yi − ȳ)²,   where Σi (yi − ȳ)² = SST.
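In R, unit normal scaling is what scale() does by default (a sketch; X is a numeric matrix of regressors):

    Z <- scale(X)    # centers each column and divides by its sample sd
    colMeans(Z)      # ~ 0 for every column
    apply(Z, 2, sd)  # = 1 for every column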
Unit Length Scaling: For each regressor xj (and the response), rescale
the observations of xj (or y) to have zero mean and unit length.
Multicollinearity

A serious issue in multiple linear regression is multicollinearity, or near-linear dependence among the regression variables, e.g., x3 ≈ 2x1 + 5x2.
We will discuss in more detail how to diagnose (and fix) the issue of
multicollinearity in Chapter 9.
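As a quick numerical preview (a sketch with simulated data): near-linear dependence makes X′X nearly singular, which shows up as a huge condition number.

    set.seed(1)
    x1 <- rnorm(100); x2 <- rnorm(100)
    x3 <- 2*x1 + 5*x2 + rnorm(100, sd = 0.01)  # x3 is nearly 2*x1 + 5*x2
    X <- cbind(1, x1, x2, x3)
    kappa(t(X) %*% X)  # very large: X'X is ill-conditioned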
Further learning

• Section 3.3.3: The Case of Orthogonal Columns in X
• Projection matrices
  – Concepts