Chapter 15: Regression Analysis With Linear Algebra Primer
Overview
This primer is intended to provide a mathematical bridge to a master’s
level course that uses linear algebra for students who have taken an
undergraduate econometrics course that does not. Why should we make
the mathematical shift? The most immediate reason is the huge double
benefit of allowing us to generalise the core results to models with
many explanatory variables while simultaneously permitting a great
simplification of the mathematics. This alone justifies the investment
in time – probably not more than ten hours – required to acquire the
necessary understanding of basic linear algebra.
In fact, one could very well put the question the other way. Why do
introductory econometrics courses not make this investment and use linear
algebra from the start? Why do they (almost) invariably use ordinary
algebra, leaving students to make the switch when they take a second
course?
The answer to this is that the overriding objective of an introductory
econometrics course must be to encourage the development of a solid
intuitive understanding of the material and it is easier to do this with
familiar, everyday algebra than with linear algebra, which for many
students initially seems alien and abstract. An introductory course should
ensure that at all times students understand the purpose and value of what
they are doing. This is far more important than proofs and for this purpose
it is usually sufficient to consider models with one, or at most two,
explanatory variables. Even in the relatively advanced material, where we
are forced to consider asymptotics because we cannot obtain finite-sample
results, the lower-level mathematics holds its own. This is especially
obvious when we come to consider finite-sample properties of estimators
when only asymptotic results are available mathematically. We invariably
use a simple model for a simulation, not one that requires a knowledge of
linear algebra.
These comments apply even when it comes to proofs. It is usually
helpful to see a proof in miniature where one can easily see exactly
what is involved. It is then usually sufficient to know that in principle it
generalises, without there being any great urgency to see a general proof.
Of course, the linear algebra version of the proof will be general and often
simpler, but it will be less intuitively accessible and so it is useful to have
seen a miniature proof first. Proofs of the unbiasedness of the regression
coefficients under appropriate assumptions are obvious examples.
At all costs, one wishes to avoid the study of econometrics becoming an
extended exercise in abstract mathematics, most of which practitioners
will never use again. They will use regression applications and as long as
they understand what is happening in principle, the actual mechanics are
of little interest.
This primer is not intended as an exposition of linear algebra as such.
It assumes that a basic knowledge of linear algebra, for which there are
many excellent introductory textbooks, has already been acquired. For the
most part, it is sufficient that you should know the rules for multiplying
two matrices together and for deriving the inverse of a square matrix, and
that you should understand the consequences of a square matrix having a
zero determinant.
Notation
Matrices and vectors will be written bold, upright, matrices upper case,
for example A, and vectors lower case, for example b. The transpose of a
matrix will be denoted by a prime, so that the transpose of A is A', and the
inverse of a matrix will be denoted by a superscript –1, so that the inverse
of A is A–1.
Test exercises
Answers to all of the exercises in this primer will be found at its end. If
you are unable to answer the following exercises, you need to spend more
time learning basic matrix algebra before reading this primer. The rules in
Exercises 3–5 will be used frequently without further explanation.
1. Demonstrate that the inverse of the inverse of a matrix is the original
matrix.
2. Demonstrate that if a (square) matrix possesses an inverse, the inverse
is unique.
3. Demonstrate that, if A = BC, A' = C' B'.
4. Demonstrate that, if A = BC, A–1 = C–1 B–1, provided that B–1 and C–1
exist.
5. Demonstrate that [A']–1 = [A–1]'.
The coefficient of this unit vector is the intercept in the regression model.
If it is included, and located as the first column, the X matrix becomes

$$X = \begin{bmatrix} 1 & X_{12} & \cdots & X_{1j} & \cdots & X_{1k} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & X_{i2} & \cdots & X_{ij} & \cdots & X_{ik} \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & X_{n2} & \cdots & X_{nj} & \cdots & X_{nk} \end{bmatrix} = \begin{bmatrix} \mathbf{1} & x_2 & \cdots & x_j & \cdots & x_k \end{bmatrix} \quad (8)$$
$$e = y - \hat{y} = y - Xb \quad (11)$$

and the residual sum of squares as

$$\begin{aligned} RSS = e'e &= (y - Xb)'(y - Xb) \\ &= y'y - y'Xb - b'X'y + b'X'Xb \quad (12) \\ &= y'y - 2y'Xb + b'X'Xb \end{aligned}$$

(y'Xb = b'X'y since it is a scalar.) The next step is to obtain the normal equations

$$\frac{\partial RSS}{\partial b_j} = 0 \quad (13)$$

for j = 1, ..., k and solve them (if we can) to obtain the least squares coefficients. Using linear algebra, the normal equations can be written

$$X'Xb - X'y = 0 \quad (14)$$

The derivation is straightforward but tedious and has been consigned to Appendix A. X'X is a square matrix with k rows and columns. If assumption A.2 is satisfied (that it is not possible to write one X variable as a linear combination of the others), X'X has an inverse and we obtain the OLS estimator of the coefficients:

$$b = [X'X]^{-1}X'y \quad (15)$$
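As a concrete illustration (added here, not part of the original text), the following sketch evaluates (15) with NumPy on simulated data; the variable names and the data-generating process are assumptions made purely for the example.

```python
# Minimal sketch of b = [X'X]^{-1} X'y, equation (15), using simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x2, x3])       # unit vector first, as in (8)
y = 2.0 + 3.0 * x2 - 1.0 * x3 + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)           # solves the normal equations (14)
print(b)                                        # close to (2, 3, -1)
```

Solving the normal equations (14) directly is numerically preferable to forming the inverse explicitly, but the two are algebraically equivalent.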
Exercises
6. If Y = β1 + β2X + u, obtain the OLS estimators of β1 and β2 using (15).
7. If Y = β2X + u, obtain the OLS estimator of β2 using (15).
8. If Y = β1 + u, obtain the OLS estimator of β1 using (15).
$$E(b \mid X) = \beta + E\left([X'X]^{-1}X'u \mid X\right) \quad (17)$$

$$\begin{aligned} E(b \mid X) &= \beta + E\left([X'X]^{-1}X'u \mid X\right) \\ &= \beta + E\left([X'X]^{-1}X'\right)E(u) \quad (19) \\ &= \beta \end{aligned}$$
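A small simulation (again an added illustration, with invented parameter values) shows the unbiasedness property at work: holding X fixed and redrawing u many times, the average of the OLS estimates settles down at β.

```python
# Sketch: Monte Carlo check that E(b) = beta when u is drawn independently of X.
import numpy as np

rng = np.random.default_rng(1)
n = 50
beta = np.array([1.0, 0.5])
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # held fixed across replications

estimates = []
for _ in range(5000):
    u = rng.normal(scale=2.0, size=n)                   # fresh disturbances each replication
    y = X @ beta + u
    estimates.append(np.linalg.solve(X.T @ X, X.T @ y))
print(np.mean(estimates, axis=0))                       # approximately (1.0, 0.5)
```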
Under the usual assumptions, the variance-covariance matrix of the disturbance term, E(uu'), is

$$\begin{bmatrix} \sigma_u^2 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 0 & \sigma_u^2 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 0 & \sigma_u^2 & \cdots & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0 & 0 & 0 & \cdots & \sigma_u^2 & 0 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma_u^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & 0 & \sigma_u^2 \end{bmatrix} \quad (20)$$

that is, a matrix whose diagonal elements are all equal to $\sigma_u^2$ and whose off-diagonal elements are all zero. It may more conveniently be written $I_n\sigma_u^2$, where $I_n$ is the identity matrix of order n.
Similarly, we define the variance-covariance matrix of the regression coefficients to be the matrix whose element in row i and column j is the population covariance of $b_i$ and $b_j$:

$$\operatorname{cov}(b_i, b_j) = E\left\{(b_i - E(b_i))(b_j - E(b_j))\right\} = E\left\{(b_i - \beta_i)(b_j - \beta_j)\right\} \quad (21)$$
$$\begin{aligned} \operatorname{var}(b \mid X) &= E\left\{\left([X'X]^{-1}X'u\right)\left([X'X]^{-1}X'u\right)' \mid X\right\} \\ &= E\left([X'X]^{-1}X'uu'X[X'X]^{-1} \mid X\right) \quad (22) \\ &= [X'X]^{-1}\sigma_u^2 \end{aligned}$$
$$\operatorname{var}(b) = E\left\{[X'X]^{-1}\sigma_u^2\right\} = \sigma_u^2\, E\left\{[X'X]^{-1}\right\} \quad (24)$$

the expectation being taken over the distribution of X.

To estimate var(b), we need to estimate $\sigma_u^2$. An unbiased estimator is provided by $e'e/(n-k)$. For a proof, see Appendix B.
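The following sketch (an addition for illustration) turns the variance formula $\sigma_u^2[X'X]^{-1}$ and the estimator $e'e/(n-k)$ into a routine returning the coefficients and their estimated standard errors; the function name is arbitrary.

```python
# Sketch: OLS coefficients with standard errors from s^2 [X'X]^{-1}, s^2 = e'e/(n - k).
import numpy as np

def ols_with_standard_errors(X, y):
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b                                # residuals, as in (11)
    s2 = (e @ e) / (n - k)                       # unbiased estimator of sigma_u^2 (Appendix B)
    var_b = s2 * np.linalg.inv(X.T @ X)          # estimated variance-covariance matrix of b
    return b, np.sqrt(np.diag(var_b))            # coefficients and standard errors
```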
Now consider any other estimator of β that is linear in y, say

$$b^* = \left([X'X]^{-1}X' + C\right)y$$

Unbiasedness requires

$$CX = 0 \quad (28)$$

where 0 is a k by k matrix consisting entirely of zeros. Then, with E(b*) = β, the variance-covariance matrix of b* is given by
$$\begin{aligned} E\{(b^* - \beta)(b^* - \beta)'\} &= E\left\{\left([X'X]^{-1}X' + C\right)uu'\left([X'X]^{-1}X' + C\right)'\right\} \\ &= \left([X'X]^{-1}X' + C\right)I_n\sigma_u^2\left([X'X]^{-1}X' + C\right)' \quad (29) \\ &= \left([X'X]^{-1}X' + C\right)\left([X'X]^{-1}X' + C\right)'\sigma_u^2 \\ &= \left([X'X]^{-1} + CC'\right)\sigma_u^2 \end{aligned}$$

(The last line uses CX = 0.)
The i-th diagonal element of CC' is

$$\sum_{s=1}^{n} c_{is}^2,$$

which is positive unless $c_{is} = 0$ for all s. Hence minimising the variances of the estimators of all of the elements of β requires C = 0. This implies that OLS provides the minimum variance linear unbiased estimator.
$$\operatorname{plim} b = \beta + \operatorname{plim}\left\{\left[\frac{1}{n}X'X\right]^{-1}\frac{1}{n}X'u\right\} \quad (31)$$

and so plim b = β. Note that this is only an outline of the proof. For a proper proof and a generalisation to less restrictive assumptions, see Greene pp.64–65.
Frisch–Waugh–Lovell theorem
We will precede the discussion of the Frisch–Waugh–Lovell (FWL) theorem
by introducing the residual-maker matrix. We have seen that, when we fit
$$y = X\beta + u \quad (33)$$

using OLS, the residuals are given by

$$e = y - \hat{y} = y - Xb \quad (34)$$

Substituting for b, we have

$$\begin{aligned} e &= y - X[X'X]^{-1}X'y \\ &= \left[I - X[X'X]^{-1}X'\right]y \quad (35) \\ &= My \end{aligned}$$

where

$$M = I - X[X'X]^{-1}X' \quad (36)$$
M is known as the ‘residual-maker’ matrix because it converts the values of
y into the residuals of y when regressed on X. Note that M is symmetric,
because M'=M, and idempotent, meaning that MM=M.
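A quick numerical check of these properties (illustrative code added here, not from the guide):

```python
# Sketch: verify M' = M, MM = M, and that My reproduces the OLS residuals.
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = rng.normal(size=n)

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T     # M = I - X[X'X]^{-1}X', equation (36)
b = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(M, M.T))                           # symmetric
print(np.allclose(M @ M, M))                         # idempotent
print(np.allclose(M @ y, y - X @ b))                 # My equals the residuals e
```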
Now suppose that we divide the k variables comprising X into two
subsets, the first s and the last k–s. (For the present purposes, it makes no
difference whether there is or is not an intercept in the model, and if there
is one, whether the vector of ones responsible for it is in the first or second
subset.) We will partition X as
$$X = \begin{bmatrix} X_1 & X_2 \end{bmatrix} \quad (37)$$

where $X_1$ comprises the first s columns and $X_2$ comprises the last k–s, and we will partition β similarly, so that the theoretical model may be written

$$y = \begin{bmatrix} X_1 & X_2 \end{bmatrix}\begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + u \quad (38)$$
The FWL theorem states that the OLS estimates of the coefficients in β1
are the same as those that would be obtained by the following procedure:
regress y on the variables in X2 and save the residuals as ey. Regress each
of the variables in X1 on X2 and save the matrix of residuals as eX1. If we
regress ey on eX1, we will obtain the same estimates of the coefficients of
β1 as we did in the straightforward multiple regression. (Why we might
want to do this is another matter. We will come to this later.) Applying the
preceding discussion relating to the residual-maker, we have
$$e_y = M_2 y \quad (39)$$

where

$$M_2 = I - X_2[X_2'X_2]^{-1}X_2' \quad (40)$$

and

$$e_{X_1} = M_2 X_1 \quad (41)$$

Let the vector of coefficients obtained when we regress $e_y$ on $e_{X_1}$ be denoted $b_1^*$. Then

$$b_1^* = [e_{X_1}'e_{X_1}]^{-1}e_{X_1}'e_y = [X_1'M_2X_1]^{-1}X_1'M_2y$$
For the full regression, the partitioned form of X'y is

$$X'y = \begin{bmatrix} X_1'y \\ X_2'y \end{bmatrix} \quad (46)$$
Hence, splitting the normal equations into their upper and lower
components, we have
$$X_1'X_1 b_1 + X_1'X_2 b_2 - X_1'y = 0 \quad (47)$$

and

$$X_2'X_1 b_1 + X_2'X_2 b_2 - X_2'y = 0 \quad (48)$$

From the second we obtain

$$X_2'X_2 b_2 = X_2'y - X_2'X_1 b_1 \quad (49)$$

and so

$$b_2 = [X_2'X_2]^{-1}\left[X_2'y - X_2'X_1 b_1\right] \quad (50)$$

Substituting for $b_2$ in the first normal equation,

$$X_1'X_1 b_1 + X_1'X_2[X_2'X_2]^{-1}\left[X_2'y - X_2'X_1 b_1\right] - X_1'y = 0 \quad (51)$$

Hence

$$X_1'X_1 b_1 - X_1'X_2[X_2'X_2]^{-1}X_2'X_1 b_1 = X_1'y - X_1'X_2[X_2'X_2]^{-1}X_2'y \quad (52)$$
and so

$$X_1'\left[I - X_2[X_2'X_2]^{-1}X_2'\right]X_1 b_1 = X_1'\left[I - X_2[X_2'X_2]^{-1}X_2'\right]y \quad (53)$$

Hence

$$X_1'M_2X_1 b_1 = X_1'M_2 y \quad (54)$$

and

$$b_1 = [X_1'M_2X_1]^{-1}X_1'M_2 y = b_1^* \quad (55)$$
Why should we be interested in this result? The original purpose remains
instructive. In the early days, econometricians working with time series data,
especially macroeconomic data, were concerned to avoid the problem
of spurious regressions. If two variables both possessed a time trend, it
was very likely that ‘significant’ results would be obtained when one was
regressed on the other, even if there were no genuine relationship between
them. To avoid this, it became the custom to detrend the variables before
using them by regressing each on a time trend and then working with
the residuals from these regressions. Frisch and Waugh (1933) pointed
out that this was an unnecessarily laborious procedure. The same results
would be obtained using the original data, if a time trend was added as an
explanatory variable.
Generalising, and this was the contribution of Lovell, we can infer that,
in a multiple regression model, the estimator of the coefficient of any one
variable is not influenced by any of the other variables, irrespective of
whether they are or are not correlated with the variable in question. The
result is so general and basic that it should be understood by all students
of econometrics. Of course, it fits neatly with the fact that the multiple
regression coefficients are unbiased, irrespective of any correlations among
the variables.
A second reason for being interested in the result is that it allows one
to depict graphically the relationship between the observations on
the dependent variable and those on any single explanatory variable,
controlling for the influence of all the other explanatory variables. This is
described in the textbook in Section 3.2.
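The theorem is easy to verify numerically. The sketch below (added for illustration, with simulated data and an arbitrary choice of s) compares the first s coefficients from the full regression with those from the residual regression.

```python
# Sketch: numerical check of the Frisch-Waugh-Lovell theorem.
import numpy as np

rng = np.random.default_rng(3)
n, s = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, -0.3, 2.0]) + rng.normal(size=n)
X1, X2 = X[:, :s], X[:, s:]                              # partition as in (37)

b1_full = np.linalg.solve(X.T @ X, X.T @ y)[:s]          # estimates of beta_1 from the full regression

M2 = np.eye(n) - X2 @ np.linalg.inv(X2.T @ X2) @ X2.T    # residual-maker for X2, equation (40)
e_y, e_X1 = M2 @ y, M2 @ X1                              # residuals from the regressions on X2
b1_star = np.linalg.solve(e_X1.T @ e_X1, e_X1.T @ e_y)   # regress e_y on e_X1, equation (55)

print(np.allclose(b1_full, b1_star))                     # True
```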
Exercise
9. Using the FWL theorem, demonstrate that, if a multiple regression
model contains an intercept, the same slope coefficients could be
obtained by subtracting the means of all of the variables from the data
for them and then regressing the model omitting an intercept.
Exact multicollinearity
We will assume, as is to be expected, that k, the number of explanatory
variables (including the unit vector, if there is one), is less than n, the
number of observations. If the explanatory variables are independent,
the X matrix will have rank k and likewise X'X will have rank k and will
possess an inverse. However, if one or more linear relationships exist
among the explanatory variables, the model will be subject to exact
multicollinearity. The rank of X, and hence of X'X, will then be less than k
and X'X will not possess an inverse.
Suppose we write X as a set of column vectors xj, each corresponding to
the observations on one of the explanatory variables:
$$X = \begin{bmatrix} x_1 & \cdots & x_j & \cdots & x_k \end{bmatrix} \quad (56)$$
where
$$x_j = \begin{bmatrix} x_{1j} \\ \cdots \\ x_{ij} \\ \cdots \\ x_{nj} \end{bmatrix} \quad (57)$$

Then

$$X' = \begin{bmatrix} x_1' \\ \cdots \\ x_j' \\ \cdots \\ x_k' \end{bmatrix} \quad (58)$$

and the normal equations

$$X'Xb - X'y = 0 \quad (59)$$

may be written

$$\begin{bmatrix} x_1'Xb \\ \cdots \\ x_j'Xb \\ \cdots \\ x_k'Xb \end{bmatrix} - \begin{bmatrix} x_1'y \\ \cdots \\ x_j'y \\ \cdots \\ x_k'y \end{bmatrix} = 0 \quad (60)$$
Now suppose that one of the explanatory variables, say the last, can be
written as a linear combination of the others:
$$x_k = \sum_{i=1}^{k-1} \lambda_i x_i \quad (61)$$
Then the last of the normal equations is that linear combination of the
other k – 1. Hence it is redundant, and we are left with a set of k – 1
equations for determining the k unknown regression coefficients. The
problem is not that there is no solution. It is the opposite: there are too
many possible solutions, in fact an infinite number. One coefficient could
be chosen arbitrarily, and then the normal equations would provide
a solution for the other k – 1. Some regression applications deal with
this situation by dropping one of the variables from the regression
specification, effectively assigning a value of zero to its coefficient.
Exact multicollinearity is unusual because it mostly occurs as a
consequence of a logical error in the specification of the regression model.
The classic example is the dummy variable trap. This occurs when a set of
dummy variables Dj, j = 1, ..., s are defined for a qualitative characteristic
that has s categories. If all s dummy variables are included in the
specification, in observation i we will have
$$\sum_{j=1}^{s} D_{ij} = 1 \quad (62)$$
since one of the dummy variables must be equal to 1 and the rest are all
zero. But this is the (unchanging) value of the unit vector. Hence the sum
of the dummy variables is equal to the unit vector. As a consequence, if the
unit vector and all of the dummy variables are simultaneously included
in the specification, there will be exact multicollinearity. The solution is
to drop one of the dummy variables, making it the reference category, or,
alternatively, to drop the intercept (and hence unit vector), effectively
making the dummy variable coefficient for each category the intercept for
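To see the dummy variable trap concretely, the following sketch (an added illustration with invented data) builds a design matrix containing the unit vector and a full set of s dummies and shows that X'X is rank deficient; dropping one dummy restores full rank.

```python
# Sketch: the dummy variable trap makes X'X singular.
import numpy as np

rng = np.random.default_rng(4)
n, s = 60, 3
category = rng.integers(0, s, size=n)
D = np.eye(s)[category]                              # all s dummy variables
X = np.column_stack([np.ones(n), D])                 # unit vector plus the full set of dummies

print(np.linalg.matrix_rank(X.T @ X))                # s, not s + 1: no inverse exists
X_ref = np.column_stack([np.ones(n), D[:, 1:]])      # drop one dummy (the reference category)
print(np.linalg.matrix_rank(X_ref.T @ X_ref))        # s: full rank restored
```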
For a linear combination λ'b of the regression coefficients,

$$\begin{aligned} \operatorname{var}(\lambda'b) &= E\left\{(\lambda'b - E(\lambda'b))^2\right\} \quad (64) \\ &= E\left\{(\lambda'b - \lambda'\beta)^2\right\} \\ &= \lambda'[X'X]^{-1}\lambda\,\sigma_u^2 \end{aligned}$$

The square root of this expression provides the standard error of λ'b after we have replaced $\sigma_u^2$ by its estimator $e'e/(n-k)$ in the usual way.
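As an added illustration, the standard error of a linear combination can be computed directly from this formula; the function below is a sketch with assumed inputs (a design matrix X, outcome y, and weight vector lam).

```python
# Sketch: point estimate and standard error of lambda'b using lambda'[X'X]^{-1}lambda * s^2.
import numpy as np

def linear_combination_se(X, y, lam):
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    s2 = (e @ e) / (n - k)                               # e'e/(n - k)
    var_lb = lam @ np.linalg.inv(X.T @ X) @ lam * s2     # lambda'[X'X]^{-1}lambda * s^2
    return lam @ b, np.sqrt(var_lb)
```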
Premultiplying the model by a matrix A and fitting the transformed model Ay = AXβ + Au by OLS gives the weighted least squares estimator $b_{WLS} = [X'A'AX]^{-1}X'A'Ay$, so that $b_{WLS} - \beta = [X'A'AX]^{-1}X'A'Au$. Then

$$\begin{aligned} \operatorname{var}(b_{WLS}) &= E\left\{\left(b_{WLS} - E(b_{WLS})\right)\left(b_{WLS} - E(b_{WLS})\right)'\right\} \\ &= E\left\{(b_{WLS} - \beta)(b_{WLS} - \beta)'\right\} \\ &= E\left\{\left([X'A'AX]^{-1}X'A'Au\right)\left([X'A'AX]^{-1}X'A'Au\right)'\right\} \\ &= E\left\{[X'A'AX]^{-1}X'A'Auu'A'AX[X'A'AX]^{-1}\right\} \\ &= [X'A'AX]^{-1}X'A'AX[X'A'AX]^{-1} \\ &= [X'A'AX]^{-1} \end{aligned}$$

since A has been defined so that

$$AE(uu')A' = I \quad (77)$$

Of course, in practice we seldom know $\sigma_{u_i}^2$, but if it is appropriate to hypothesise that the standard deviation is proportional to some measurable variable $Z_i$, then the WLS regression will be homoscedastic if we define A to have diagonal element i equal to the reciprocal of $Z_i$.
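A minimal sketch of this weighting scheme (added here; the function and variable names are assumptions) premultiplies the data by A = diag(1/Z_i) and applies OLS to the transformed observations.

```python
# Sketch: weighted least squares when sd(u_i) is taken to be proportional to Z_i.
import numpy as np

def wls_sd_proportional_to_Z(X, y, Z):
    A = np.diag(1.0 / Z)                          # diagonal element i is 1/Z_i
    Xs, ys = A @ X, A @ y                         # transformed data AX, Ay
    return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)  # b_WLS = [X'A'AX]^{-1} X'A'Ay
```

Dividing each observation by $Z_i$ is equivalent and avoids forming the n by n matrix A explicitly.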
The instrumental variables (IV) estimator is

$$b_{IV} = [W'X]^{-1}W'y = \beta + [W'X]^{-1}W'u \quad (80)$$
Under appropriate assumptions,

$$\sqrt{n}\left(b_{IV} - \beta\right) \xrightarrow{d} N\left(0,\; \sigma_u^2\,\operatorname{plim}\left[\frac{1}{n}W'X\right]^{-1}\operatorname{plim}\left[\frac{1}{n}W'W\right]\operatorname{plim}\left[\frac{1}{n}X'W\right]^{-1}\right)$$
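The sketch below (an added illustration with simulated data; the data-generating process is invented for the example) computes the IV estimator in the form $b_{IV} = [W'X]^{-1}W'y$ for the simple model of Exercise 10.

```python
# Sketch: IV estimation with W = [1 z] instrumenting X = [1 x].
import numpy as np

rng = np.random.default_rng(5)
n = 500
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)             # x is correlated with the instrument z
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), z])
b_iv = np.linalg.solve(W.T @ X, W.T @ y)     # b_IV = [W'X]^{-1} W'y
print(b_iv)                                  # roughly (1, 2)
```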
Exercises
10. Using (79) and (85), demonstrate that, for the simple regression model

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

with Z acting as an instrument for X (and the unit vector acting as an instrument for itself),

$$b_1^{IV} = \bar{Y} - b_2^{IV}\bar{X}, \qquad b_2^{IV} = \frac{\sum_{i=1}^{n}\left(Z_i - \bar{Z}\right)\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n}\left(Z_i - \bar{Z}\right)\left(X_i - \bar{X}\right)}$$

and, as an approximation,

$$\operatorname{var}\left(b_2^{IV}\right) = \frac{\sigma_u^2}{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2} \times \frac{1}{r_{XZ}^2}$$

where Z is the instrument for X and $r_{XZ}$ is the correlation between X and Z.
11. Demonstrate that any variable acting as an instrument for itself is
unaffected by the first stage of two-stage least squares.
12. Demonstrate that TSLS is equivalent to IV if the equation is exactly
identified.
Let Λ be the diagonal matrix with the eigenvalues of Ω as the diagonal elements. Then there exists a matrix of eigenvectors, C, such that
$$C'\Omega C = \Lambda \quad (90)$$

C has the properties that CC' = I and C' = C^{-1}. Since Λ is a diagonal matrix, if its eigenvalues are all positive (which means that it is what is known as a 'positive definite' matrix), it can be factored as $\Lambda = \Lambda^{1/2}\Lambda^{1/2}$, where $\Lambda^{1/2}$ is a diagonal matrix whose diagonal elements are the square roots of the eigenvalues. It follows that the inverse of Λ can be factored as $\Lambda^{-1} = \Lambda^{-1/2}\Lambda^{-1/2}$. Then, in view of (90),

$$\Lambda^{-1/2}[C'\Omega C]\Lambda^{-1/2} = \Lambda^{-1/2}\Lambda\Lambda^{-1/2} = \Lambda^{-1/2}\Lambda^{1/2}\Lambda^{1/2}\Lambda^{-1/2} = I \quad (91)$$

Thus, if we define $P = \Lambda^{-1/2}C'$, (91) becomes

$$P\Omega P' = I \quad (92)$$
As a consequence, if we premultiply (86) through by P, we have

$$Py = PX\beta + Pu \quad (93)$$

or

$$y^* = X^*\beta + u^* \quad (94)$$

where $y^* = Py$, $X^* = PX$, and $u^* = Pu$, and $E(u^*u^{*\prime}) = I\sigma_u^2$. An OLS regression of $y^*$ on $X^*$ will therefore satisfy the usual regression model assumptions and the estimator of β will have the usual properties. Of course, the approach usually requires the estimation of Ω, and it depends on Ω being positive definite and on there being no problems in extracting the eigenvalues and determining the eigenvectors.
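The construction of P can be reproduced with a few lines of code. The sketch below (added for illustration) uses an assumed Ω for two observations and checks that PΩP' = I.

```python
# Sketch: build P = Lambda^{-1/2} C' from the eigen-decomposition of Omega.
import numpy as np

Omega = np.array([[1.0, 0.5],
                  [0.5, 1.0]])                       # illustrative positive definite Omega

eigenvalues, C = np.linalg.eigh(Omega)               # columns of C are the eigenvectors
P = np.diag(1.0 / np.sqrt(eigenvalues)) @ C.T        # P = Lambda^{-1/2} C'

print(np.allclose(P @ Omega @ P.T, np.eye(2)))       # True: P Omega P' = I, as in (92)
```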
Exercise
13. Suppose that the disturbance term in a simple regression model (with
an intercept) is subject to AR(1) autocorrelation with |ρ| < 1, and
suppose that the sample consists of just two observations. Determine
the variance-covariance matrix of the disturbance term, find its
eigenvalues, and determine its eigenvectors. Hence determine P and
state the transformed model. Verify that the disturbance term in the
transformed model is iid.
Hence

$$\frac{\partial\, b'X'Xb}{\partial b_j} = \sum_{q=1}^{k} b_q x_j'x_q + \sum_{p=1}^{k} b_p x_p'x_j = 2\sum_{p=1}^{k} b_p x_p'x_j \quad (A.10)$$
$$x_j'\sum_{p=1}^{k} b_p x_p = x_j'y \quad (A.13)$$

Hence

$$x_j'Xb = x_j'y \quad (A.14)$$

since

$$Xb = \begin{bmatrix} x_1 & \cdots & x_p & \cdots & x_k \end{bmatrix}\begin{bmatrix} b_1 \\ \cdots \\ b_p \\ \cdots \\ b_k \end{bmatrix} = \sum_{p=1}^{k} x_p b_p \quad (A.15)$$

Hence, stacking the k normal equations,

$$\begin{bmatrix} x_1'Xb \\ \cdots \\ x_j'Xb \\ \cdots \\ x_k'Xb \end{bmatrix} = \begin{bmatrix} x_1'y \\ \cdots \\ x_j'y \\ \cdots \\ x_k'y \end{bmatrix} \quad (A.16)$$

Hence

$$\begin{bmatrix} x_1' \\ \cdots \\ x_j' \\ \cdots \\ x_k' \end{bmatrix} Xb = \begin{bmatrix} x_1' \\ \cdots \\ x_j' \\ \cdots \\ x_k' \end{bmatrix} y \quad (A.17)$$
Hence

$$X'Xb = X'y$$

which is the matrix form of the normal equations (14).
$$\operatorname{tr}(AB) = \sum_{i=1}^{n}\sum_{p=1}^{m} a_{ip}b_{pi} \quad (B.2)$$

Similarly, diagonal element i of BA is $\sum_{p=1}^{n} b_{ip}a_{pi}$. Hence

$$\operatorname{tr}(BA) = \sum_{i=1}^{m}\sum_{p=1}^{n} b_{ip}a_{pi} \quad (B.3)$$
What we call the symbols used to index the summations makes no
difference. Re-writing p as i and i as p, and noting that the order of the
summation makes no difference, we have tr (BA ) = tr (AB ) .
We also need to note that
$$\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B) \quad (B.4)$$
where A and B are square matrices of the same dimension. This follows
immediately from the way that we sum conformable matrices.
By definition,

$$e = y - \hat{y} = y - Xb \quad (B.5)$$

Using

$$b = [X'X]^{-1}X'y \quad (B.6)$$

we have

$$\begin{aligned} e &= y - X[X'X]^{-1}X'y \quad (B.7) \\ &= (X\beta + u) - X[X'X]^{-1}X'(X\beta + u) \\ &= I_n u - X[X'X]^{-1}X'u \\ &= Mu \end{aligned}$$

where $I_n$ is an identity matrix of dimension n and

$$M = I_n - X[X'X]^{-1}X' \quad (B.8)$$

Hence

$$e'e = u'M'Mu \quad (B.9)$$

Now M is symmetric and idempotent: M' = M and MM = M. Hence

$$e'e = u'Mu \quad (B.10)$$
e'e is a scalar, and so the expectation of e'e and the expectation of the trace of e'e are the same. So

$$\begin{aligned} E(e'e) &= E\{\operatorname{tr}(e'e)\} \\ &= E\{\operatorname{tr}(u'Mu)\} \\ &= E\{\operatorname{tr}(Muu')\} \\ &= \operatorname{tr}\{E(Muu')\} \end{aligned}$$
The penultimate line uses tr (AB ) = tr (BA ) . The last line uses the fact that
the expectation of the sum of the diagonal elements of a matrix is equal to
the sum of their individual expectations. Assuming that X, and hence M, is
nonstochastic,
$$\begin{aligned} E(e'e) &= \operatorname{tr}\{ME(uu')\} \\ &= \operatorname{tr}\left(MI_n\sigma_u^2\right) \\ &= \sigma_u^2\operatorname{tr}(M) \quad (B.12) \\ &= \sigma_u^2\operatorname{tr}\left(I_n - X[X'X]^{-1}X'\right) \\ &= \sigma_u^2\left\{\operatorname{tr}(I_n) - \operatorname{tr}\left(X[X'X]^{-1}X'\right)\right\} \end{aligned}$$
The last step uses tr(A + B) = tr(A) + tr(B). The trace of an identity matrix is equal to its dimension. Hence

$$\begin{aligned} E(e'e) &= \sigma_u^2\left(n - \operatorname{tr}\left(X[X'X]^{-1}X'\right)\right) \\ &= \sigma_u^2\left(n - \operatorname{tr}\left(X'X[X'X]^{-1}\right)\right) \quad (B.13) \\ &= \sigma_u^2\left(n - \operatorname{tr}(I_k)\right) \\ &= \sigma_u^2(n - k) \end{aligned}$$

Hence $e'e/(n-k)$ is an unbiased estimator of $\sigma_u^2$.
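A numerical check of the key step tr(M) = n – k (an added illustration with simulated X):

```python
# Sketch: the trace of M = I - X[X'X]^{-1}X' equals n - k.
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T
print(np.trace(M))                                   # approximately n - k = 36
```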
Answers to exercises

6. If $Y = \beta_1 + \beta_2 X + u$, then $y = X\beta + u$ with $X = [1 \;\; x]$, and

$$[X'X]^{-1} = \frac{1}{n\sum X_i^2 - n^2\bar{X}^2}\begin{bmatrix} \sum X_i^2 & -n\bar{X} \\ -n\bar{X} & n \end{bmatrix}.$$

We also have

$$X'y = \begin{bmatrix} 1'y \\ x'y \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum X_iY_i \end{bmatrix}$$

So

$$\begin{aligned} b = [X'X]^{-1}X'y &= \frac{1}{n\sum X_i^2 - n^2\bar{X}^2}\begin{bmatrix} \sum X_i^2 & -n\bar{X} \\ -n\bar{X} & n \end{bmatrix}\begin{bmatrix} n\bar{Y} \\ \sum X_iY_i \end{bmatrix} \\ &= \frac{1}{n\sum X_i^2 - n^2\bar{X}^2}\begin{bmatrix} n\bar{Y}\sum X_i^2 - n\bar{X}\sum X_iY_i \\ -n^2\bar{X}\bar{Y} + n\sum X_iY_i \end{bmatrix} \\ &= \frac{1}{\sum (X_i-\bar{X})^2}\begin{bmatrix} \bar{Y}\sum X_i^2 - \bar{X}\sum X_iY_i \\ \sum (X_i-\bar{X})(Y_i-\bar{Y}) \end{bmatrix} \end{aligned}$$
Thus

$$b_2 = \frac{\sum(X_i-\bar{X})(Y_i-\bar{Y})}{\sum(X_i-\bar{X})^2}$$

and

$$b_1 = \frac{\bar{Y}\sum X_i^2 - \bar{X}\sum X_iY_i}{\sum(X_i-\bar{X})^2}$$

$b_1$ may be written in its more usual form as follows:

$$\begin{aligned} b_1 &= \frac{\bar{Y}\left(\sum X_i^2 - n\bar{X}^2\right) + \bar{Y}n\bar{X}^2 - \bar{X}\sum X_iY_i}{\sum(X_i-\bar{X})^2} \\ &= \frac{\bar{Y}\left(\sum(X_i-\bar{X})^2\right) - \bar{X}\left(\sum X_iY_i - n\bar{X}\bar{Y}\right)}{\sum(X_i-\bar{X})^2} \\ &= \bar{Y} - \frac{\bar{X}\left(\sum(X_i-\bar{X})(Y_i-\bar{Y})\right)}{\sum(X_i-\bar{X})^2} = \bar{Y} - b_2\bar{X} \end{aligned}$$
7. If $Y = \beta_2 X + u$, then $y = X\beta + u$ where

$$X = x = \begin{bmatrix} X_1 \\ \cdots \\ X_i \\ \cdots \\ X_n \end{bmatrix}$$

Then $X'X = x'x = \sum X_i^2$. The inverse of X'X is $1/\sum X_i^2$. In this model, $X'y = x'y = \sum X_iY_i$. So

$$b = [X'X]^{-1}X'y = \frac{\sum X_iY_i}{\sum X_i^2}$$

8. If $Y = \beta_1 + u$, then $y = X\beta + u$ where X = 1, the unit vector. Then $X'X = 1'1 = n$ and its inverse is $1/n$. Also $X'y = 1'y = \sum Y_i = n\bar{Y}$. So

$$b = [X'X]^{-1}X'y = \frac{1}{n}\,n\bar{Y} = \bar{Y}$$
10. The general form of the IV estimator is $b_{IV} = [W'X]^{-1}W'y$. In the case of the simple regression model, with Z acting as an instrument for X and the unit vector acting as an instrument for itself, $W = [1 \;\; z]$ and $X = [1 \;\; x]$. Thus

$$[W'X]^{-1} = \frac{1}{n\sum Z_iX_i - n^2\bar{Z}\bar{X}}\begin{bmatrix} \sum Z_iX_i & -n\bar{X} \\ -n\bar{Z} & n \end{bmatrix}.$$

We also have

$$W'y = \begin{bmatrix} 1'y \\ z'y \end{bmatrix} = \begin{bmatrix} \sum Y_i \\ \sum Z_iY_i \end{bmatrix}$$

So

$$\begin{aligned} b_{IV} = [W'X]^{-1}W'y &= \frac{1}{n\sum Z_iX_i - n^2\bar{Z}\bar{X}}\begin{bmatrix} \sum Z_iX_i & -n\bar{X} \\ -n\bar{Z} & n \end{bmatrix}\begin{bmatrix} n\bar{Y} \\ \sum Z_iY_i \end{bmatrix} \\ &= \frac{1}{n\sum Z_iX_i - n^2\bar{Z}\bar{X}}\begin{bmatrix} n\bar{Y}\sum Z_iX_i - n\bar{X}\sum Z_iY_i \\ -n^2\bar{Z}\bar{Y} + n\sum Z_iY_i \end{bmatrix} \\ &= \frac{1}{\sum(Z_i-\bar{Z})(X_i-\bar{X})}\begin{bmatrix} \bar{Y}\sum Z_iX_i - \bar{X}\sum Z_iY_i \\ \sum(Z_i-\bar{Z})(Y_i-\bar{Y}) \end{bmatrix} \end{aligned}$$

Thus

$$b_2^{IV} = \frac{\sum(Z_i-\bar{Z})(Y_i-\bar{Y})}{\sum(Z_i-\bar{Z})(X_i-\bar{X})}$$

and

$$b_1^{IV} = \frac{\bar{Y}\sum Z_iX_i - \bar{X}\sum Z_iY_i}{\sum(Z_i-\bar{Z})(X_i-\bar{X})}$$

$b_1^{IV}$ may be written in its more usual form as follows:

$$\begin{aligned} b_1^{IV} &= \frac{\bar{Y}\left(\sum Z_iX_i - n\bar{Z}\bar{X}\right) + \bar{Y}n\bar{Z}\bar{X} - \bar{X}\sum Z_iY_i}{\sum(Z_i-\bar{Z})(X_i-\bar{X})} \\ &= \frac{\bar{Y}\left(\sum(Z_i-\bar{Z})(X_i-\bar{X})\right) - \bar{X}\left(\sum Z_iY_i - n\bar{Z}\bar{Y}\right)}{\sum(Z_i-\bar{Z})(X_i-\bar{X})} \\ &= \bar{Y} - \frac{\bar{X}\left(\sum(Z_i-\bar{Z})(Y_i-\bar{Y})\right)}{\sum(Z_i-\bar{Z})(X_i-\bar{X})} = \bar{Y} - b_2^{IV}\bar{X} \end{aligned}$$
11. By definition, if one of the variables in X is acting as an instrument
for itself, it is included in the W matrix. If it is regressed on W, a
perfect fit is obtained by assigning its column in W a coefficient of 1
and assigning zero values to all the other coefficients. Hence its fitted
values are the same as its original values and it is not affected by the
first stage of Two-Stage Least Squares.
12. If the variables in X are regressed on W and the matrix of fitted values of X, $\hat{X} = W[W'W]^{-1}W'X$, is saved,

$$\begin{aligned} b_{TSLS} &= [\hat{X}'X]^{-1}\hat{X}'y \\ &= \left[X'W[W'W]^{-1}W'X\right]^{-1}X'W[W'W]^{-1}W'y \\ &= [W'X]^{-1}[W'W][X'W]^{-1}X'W[W'W]^{-1}W'y \\ &= [W'X]^{-1}W'y \\ &= b_{IV} \end{aligned}$$

Note that, in going from the second line to the third, we have used $[ABC]^{-1} = C^{-1}B^{-1}A^{-1}$, and we have exploited the fact that W'X is square and possesses an inverse.
13. The variance-covariance matrix of u is $\sigma_u^2\Omega$, where

$$\Omega = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$$

and hence the characteristic equation for the eigenvalues of Ω is

$$(1-\lambda)^2 - \rho^2 = 0$$

The eigenvalues are therefore 1 – ρ and 1 + ρ. Since we are told |ρ| < 1, the matrix is positive definite.

Let $c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$. If λ = 1 – ρ, the matrix Ω – λI is given by

$$\Omega - \lambda I = \begin{bmatrix} \rho & \rho \\ \rho & \rho \end{bmatrix}$$

and hence the equation

$$[\Omega - \lambda I]c = 0$$

yields

$$\rho c_1 + \rho c_2 = 0$$

Hence, also imposing the normalisation

$$c'c = c_1^2 + c_2^2 = 1$$

we have $c_1 = \frac{1}{\sqrt{2}}$ and $c_2 = -\frac{1}{\sqrt{2}}$, or vice versa.

If λ = 1 + ρ,

$$\Omega - \lambda I = \begin{bmatrix} -\rho & \rho \\ \rho & -\rho \end{bmatrix}$$

and the same procedure gives $c_1 = c_2 = \frac{1}{\sqrt{2}}$. Thus

$$C = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
and

$$P = \Lambda^{-1/2}C' = \begin{bmatrix} \frac{1}{\sqrt{1-\rho}} & 0 \\ 0 & \frac{1}{\sqrt{1+\rho}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} \frac{1}{\sqrt{1-\rho}} & -\frac{1}{\sqrt{1-\rho}} \\ \frac{1}{\sqrt{1+\rho}} & \frac{1}{\sqrt{1+\rho}} \end{bmatrix}$$

As a check,

$$P\Omega P' = \frac{1}{2}\begin{bmatrix} \frac{1}{\sqrt{1-\rho}} & -\frac{1}{\sqrt{1-\rho}} \\ \frac{1}{\sqrt{1+\rho}} & \frac{1}{\sqrt{1+\rho}} \end{bmatrix}\begin{bmatrix} \sqrt{1-\rho} & \sqrt{1+\rho} \\ -\sqrt{1-\rho} & \sqrt{1+\rho} \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

The transformed disturbance term is

$$u^* = Pu = \frac{1}{\sqrt{2}}\begin{bmatrix} \dfrac{u_1 - u_2}{\sqrt{1-\rho}} \\ \dfrac{u_1 + u_2}{\sqrt{1+\rho}} \end{bmatrix}$$

None of its elements is the white noise ε in the AR(1) process, but nevertheless its elements are iid.
$$\operatorname{var}(u_1^*) = \frac{1}{2}\,\frac{1}{1-\rho}\left\{\operatorname{var}(u_1) + \operatorname{var}(u_2) - 2\operatorname{cov}(u_1, u_2)\right\} = \frac{1}{2}\,\frac{1}{1-\rho}\left\{\sigma_u^2 + \sigma_u^2 - 2\rho\sigma_u^2\right\} = \sigma_u^2$$

$$\operatorname{var}(u_2^*) = \frac{1}{2}\,\frac{1}{1+\rho}\left\{\operatorname{var}(u_1) + \operatorname{var}(u_2) + 2\operatorname{cov}(u_1, u_2)\right\} = \frac{1}{2}\,\frac{1}{1+\rho}\left\{\sigma_u^2 + \sigma_u^2 + 2\rho\sigma_u^2\right\} = \sigma_u^2$$
$$\begin{aligned} \operatorname{cov}(u_1^*, u_2^*) &= \frac{1}{2}\,\frac{1}{\sqrt{1-\rho^2}}\operatorname{cov}\{(u_1 - u_2), (u_1 + u_2)\} \\ &= \frac{1}{2}\,\frac{1}{\sqrt{1-\rho^2}}\left\{\operatorname{var}(u_1) + \operatorname{cov}(u_1, u_2) - \operatorname{cov}(u_2, u_1) - \operatorname{var}(u_2)\right\} \\ &= 0 \end{aligned}$$

Hence

$$E(u^*u^{*\prime}) = I\sigma_u^2$$

Of course, this was the objective of the P transformation.