Matrix Introduction
Junhui Wang
Department of Statistics
Fall 2022
Matrix
An m × n matrix is written
\[
A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},
\]
where a_ij is the element in the i-th row and j-th column.
A column vector is an n × 1 matrix,
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},
\]
and its transpose is the row vector y′ = (y_1  y_2  ⋯  y_n).
Matrix multiplication: if C = AB, the (i, j)-th element of C is the inner product of the i-th row of A and the j-th column of B (viewed as vectors in R^n).
For the product AB to be defined, A must have the same number of columns as B has rows.
Generally, AB ≠ BA.
1′1 = n and 11′ = J, where 1 is the n × 1 vector of ones and J is the n × n matrix of all ones.
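As a quick numeric illustration (a minimal numpy sketch; the particular matrices A and B and the choice n = 4 are made-up examples, not from the notes), these rules can be checked directly:

import numpy as np

A = np.array([[1., 2.], [3., 4.]])        # small example matrix
B = np.array([[0., 1.], [1., 0.]])        # small example matrix

C = A @ B                                  # C[i, j] = inner product of row i of A and column j of B
print(C[0, 1] == A[0, :] @ B[:, 1])        # True
print(np.allclose(A @ B, B @ A))           # False in general: AB != BA

n = 4
one = np.ones((n, 1))                      # the n x 1 vector of ones
print(one.T @ one)                         # 1'1 = n   (here [[4.]])
print(one @ one.T)                         # 11' = J, the n x n matrix of all ones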
Rank of matrix
The rank of a matrix is defined to be the maximum number of linearly independent columns in the matrix; columns A_1, A_2, …, A_p are linearly dependent if there exist scalars λ_1, …, λ_p, not all zero, such that
λ_1 A_1 + λ_2 A_2 + ⋯ + λ_p A_p = 0.
Some useful facts:
A + B = B + A;  (A + B) + C = A + (B + C)
(AB)C = A(BC);  C(A + B) = CA + CB;  k(A + B) = kA + kB
(A′)′ = A;  (A + B)′ = A′ + B′;  (AB)′ = B′A′;  (ABC)′ = C′B′A′
(A⁻¹)⁻¹ = A;  (AB)⁻¹ = B⁻¹A⁻¹;  (ABC)⁻¹ = C⁻¹B⁻¹A⁻¹;  (A′)⁻¹ = (A⁻¹)′
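A short numpy check of a few of these facts (the random matrices A and B and the constructed rank-deficient X are arbitrary examples):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))            # random (almost surely invertible) matrix
B = rng.standard_normal((3, 3))

print(np.allclose((A @ B).T, B.T @ A.T))                                   # (AB)' = B'A'
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))                    # (AB)^{-1} = B^{-1}A^{-1}
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))                 # (A')^{-1} = (A^{-1})'

# rank = maximum number of linearly independent columns
X = np.column_stack([A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])                 # third column is the sum of the first two
print(np.linalg.matrix_rank(X))                                            # 2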
Random vector and matrix
For a random matrix A, the expectation is taken elementwise: E(A) = [E(A_ij)].
If A is a matrix of constants, then E(A) = A.
Let W = AY, where A is a constant matrix and Y is a random vector. Then
E(W) = E(AY) = A E(Y),
Cov(W) = Var(AY) = A Var(Y) A′.
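A small Monte Carlo sketch of the last identity, assuming an arbitrary constant matrix A and an arbitrary covariance Sigma for Y; the empirical covariance of W = AY should be close to A Var(Y) A′:

import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])                 # Var(Y), an arbitrary example
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])                    # fixed (constant) matrix

Y = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)   # each row is one draw of Y
W = Y @ A.T                                                     # each row is W = A Y for that draw

print(np.cov(W, rowvar=False))                 # empirical Cov(W)
print(A @ Sigma @ A.T)                         # theoretical A Var(Y) A'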
Derivative in matrix function
For β = (β_1, β_2, ⋯, β_p)′ and a quadratic form Q(β) = β′Q̃β with Q̃ = (1/2)(Q + Q′), we have
∂Q(β)/∂β = 2Q̃β.
In regression analysis, the symmetric set-up is more useful.
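A numerical sanity check of the gradient formula; the matrix Q below is an arbitrary non-symmetric example, and Qfun and num_grad are hypothetical helper names used only for this sketch:

import numpy as np

rng = np.random.default_rng(2)
Q = rng.standard_normal((3, 3))            # arbitrary, not necessarily symmetric
Q_tilde = 0.5 * (Q + Q.T)                  # symmetrized version
beta = rng.standard_normal(3)

def Qfun(b):
    return b @ Q @ b                       # Q(beta) = beta' Q beta = beta' Q_tilde beta

grad = 2 * Q_tilde @ beta                  # claimed gradient 2 Q_tilde beta

eps = 1e-6                                 # central finite differences for comparison
num_grad = np.array([(Qfun(beta + eps * np.eye(3)[j]) - Qfun(beta - eps * np.eye(3)[j])) / (2 * eps)
                     for j in range(3)])
print(np.allclose(grad, num_grad, atol=1e-5))   # True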
Simple linear regression in matrix form
y_i = β_0 + x_i β_1 + e_i,  i = 1, ⋯, n,
or in matrix form,
Y = E(Y) + e = Xβ + e,
where
\[
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \qquad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix},
\]
with Y the response vector, X the design matrix, β the coefficient vector, and e the error vector.
Q(β) = ∥Y − Xβ∥² = (Y − Xβ)′(Y − Xβ)
     = β′X′Xβ − β′X′Y − Y′Xβ + Y′Y
     = β′X′Xβ − 2β′X′Y + Y′Y.
Then we have
∂Q(β)/∂β = 2X′Xβ − 2X′Y = 2X′(Xβ − Y).
Setting this derivative to zero gives the normal equations X′Xβ̂ = X′Y, so the least squares estimator and the residual vector are
β̂ = (X′X)⁻¹X′Y,   ê = Y − Xβ̂.
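A compact numpy sketch of these formulas on simulated data (the sample size, true coefficients, and noise level are invented for the example):

import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])            # design matrix: intercept column plus x
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # solves the normal equations X'X beta = X'Y
e_hat = y - X @ beta_hat                        # residual vector e_hat = Y - X beta_hat

print(beta_hat)
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))   # agrees with lstsq
print(np.allclose(X.T @ e_hat, 0))                                   # residuals are orthogonal to the columns of X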
Unbiasedness: E(β̂) = (X′X)⁻¹X′E(Y) = (X′X)⁻¹X′Xβ = β.
Variance: Var(β̂) = (X′X)⁻¹X′ Var(Y) X(X′X)⁻¹ = σ²(X′X)⁻¹.
Moreover, if Y ∼ N(Xβ, σ²I), then
β̂ = (X′X)⁻¹X′Y ∼ N(β, σ²(X′X)⁻¹).
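A Monte Carlo sketch of this sampling distribution, holding a simulated design X fixed (the number of replications and the parameter values are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
n, sigma = 40, 2.0
x = rng.uniform(0, 5, size=n)
X = np.column_stack([np.ones(n), x])           # fixed design
beta = np.array([0.5, 1.5])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 20_000
est = np.empty((reps, 2))
for r in range(reps):
    y = X @ beta + rng.normal(scale=sigma, size=n)   # Y ~ N(X beta, sigma^2 I)
    est[r] = XtX_inv @ X.T @ y                       # beta_hat for this replication

print(est.mean(axis=0))                 # close to beta            (unbiasedness)
print(np.cov(est, rowvar=False))        # close to sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)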
Fitted values
Plugging β̂ into the model, we have
Ŷ = Xβ̂ = X(X′X)⁻¹X′Y,
where H = X(X′X)⁻¹X′ is the hat matrix or projection matrix.
It is obvious that H is symmetric and idempotent: HH = H.
Also, we have
Ŷ = HY ∼ N(Xβ, σ²H).
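These properties of H can be verified numerically; the design matrix below is simulated only for illustration:

import numpy as np

rng = np.random.default_rng(5)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(size=n)])   # simulated design matrix

H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat / projection matrix

print(np.allclose(H, H.T))               # symmetric
print(np.allclose(H @ H, H))             # idempotent: HH = H
print(np.allclose(H @ X, X))             # H leaves the columns of X unchanged (HX = X)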
Residuals
ê = Y − Ŷ = Y − HY = (I − H)Y.
E(ê) = (I − H)E(Y) = (I − H)Xβ = 0, since HX = X.
Var(ê) = (I − H) Cov(Y) (I − H)′ = σ²(I − H).
Therefore,
\[
SS_{tot} = \sum_{i=1}^{n} y_i^2 - \Big(\sum_{i=1}^{n} y_i\Big)^2 / n = Y'Y - \frac{1}{n} Y'JY,
\]
\[
SS_{res} = \hat{e}'\hat{e} = Y'(I - H)Y = Y'Y - Y'HY
         = \hat{e}'(Y - X\hat\beta) = \hat{e}'Y = (Y - X\hat\beta)'Y = Y'Y - \hat\beta'X'Y,
\]
\[
SS_{reg} = SS_{tot} - SS_{res} = \hat\beta'X'Y - \frac{1}{n} Y'JY = Y'HY - \frac{1}{n} Y'JY.
\]
We thus have a unified formula for these three sums of squares, Y′AY, where
\[
A = I - \tfrac{1}{n}J \ \text{for SSTO} \ (= SS_{tot}), \qquad
A = I - H \ \text{for SSE} \ (= SS_{res}), \qquad
A = H - \tfrac{1}{n}J \ \text{for SSR} \ (= SS_{reg}),
\]
so they are all quadratic functions of Y.
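The decomposition can be checked numerically; the data are simulated and the helper quad is a hypothetical shorthand for the quadratic form Y′AY:

import numpy as np

rng = np.random.default_rng(6)
n = 25
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 0.8]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
J = np.ones((n, n))
I = np.eye(n)

def quad(A):
    return y @ A @ y                     # quadratic form Y' A Y

ss_tot = quad(I - J / n)
ss_res = quad(I - H)
ss_reg = quad(H - J / n)

print(np.isclose(ss_tot, ss_res + ss_reg))                    # SSTO = SSE + SSR
print(np.isclose(ss_tot, np.sum((y - y.mean()) ** 2)))        # matches the scalar formula
print(np.isclose(ss_res, np.sum((y - H @ y) ** 2)))           # matches the sum of squared residuals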
Statistical inferences
The standard error of the fitted value at a new point x∗ is
\[
\mathrm{se}^2_{\mathrm{fit}}(\hat{y}_*) = \hat\sigma^2 \left( \frac{1}{n} + \frac{(x_* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right).
\]
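A quick numeric sketch of this standard error; the data, the point x_star, and the use of n − 2 degrees of freedom for σ̂² are assumptions of the example (the last line checks the equivalent matrix form σ̂² x∗′(X′X)⁻¹x∗):

import numpy as np

rng = np.random.default_rng(7)
n = 40
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([2.0, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)           # estimate of sigma^2 with n - 2 degrees of freedom

x_star = 4.0                                   # the new point at which the fit is evaluated
se2_fit = sigma2_hat * (1 / n + (x_star - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2))
print(se2_fit)

x_vec = np.array([1.0, x_star])                # the same quantity in matrix form
print(np.isclose(se2_fit, sigma2_hat * x_vec @ np.linalg.inv(X.T @ X) @ x_vec))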