Elementary Regression Theory
Richard E. Quandt
Princeton University
Theorem 1. If the $k \times 1$ vector $Y$ is distributed as $N(\mu, V)$, and if $B$ is an $m \times k$ matrix ($m \le k$) of rank $m$, then $Z = BY$ is distributed as $N(B\mu, BVB')$.
Proof. The moment generating function for the multivariate normal distribution of $Y$ is
$$E(e^{\theta'Y}) = e^{\theta'\mu + \theta'V\theta/2}.$$
Hence, the moment generating function for $Z$ is $E(e^{\theta'BY}) = e^{\theta'B\mu + \theta'BVB'\theta/2}$, which is the moment generating function for a multivariate normal distribution with mean vector $B\mu$ and covariance matrix $BVB'$.
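Theorem 1 can be illustrated numerically. The following is a minimal Monte Carlo sketch (not part of the original notes; the particular $\mu$, $V$, and $B$ are arbitrary choices): sampled draws of $Z = BY$ have mean close to $B\mu$ and covariance close to $BVB'$.

```python
import numpy as np

# Illustrative check of Theorem 1: if Y ~ N(mu, V) and Z = B Y,
# then Z ~ N(B mu, B V B'). Values of mu, V, B are arbitrary examples.
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])                   # k = 3
V = np.array([[2.0, 0.3, 0.0],
              [0.3, 1.0, 0.2],
              [0.0, 0.2, 0.5]])                   # positive definite
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                  # m = 2, rank 2

Y = rng.multivariate_normal(mu, V, size=200_000)  # draws of Y (one per row)
Z = Y @ B.T                                       # corresponding draws of Z = B Y

assert np.allclose(Z.mean(axis=0), B @ mu, atol=0.03)     # mean ~ B mu
assert np.allclose(np.cov(Z.T), B @ V @ B.T, atol=0.1)    # cov ~ B V B'
```

The tolerances reflect sampling noise at this sample size, not the exactness of the theorem.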
Theorem 2. If $B$ is a $q \times n$ matrix, $A$ an $n \times n$ symmetric matrix with $BA = 0$, and $Y$ is distributed as $N(\mu, \sigma^2 I)$, then the linear form $BY$ and the quadratic form $Y'AY$ are independently distributed.

Proof. Since $A$ is symmetric, there exists an orthogonal matrix $P$ such that
$$P'AP = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix},$$
where $D_1$ and $D_2$ are diagonal; writing the matrix in partitioned form is needed in the proof below. Let $Z = P'Y$. Then $Z$ is distributed as $N(P'\mu, \sigma^2 I)$, since $E(P'Y) = P'\mu$ and $E[(Z - P'\mu)(Z - P'\mu)'] = P'E[(Y - \mu)(Y - \mu)']P = \sigma^2 I$ by the orthogonality of $P$.
Now let
$$BP = C. \qquad (1)$$
By the hypothesis of the theorem,
$$0 = BAP = BPP'AP = CD, \qquad (2)$$
and partitioning $C$ and $D$ conformably, $CD$ can be written as
$$CD = \begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix}\begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix},$$
where either $D_1$ or $D_2$ must be zero. If neither $D_1$ nor $D_2$ is zero, Eq.(2) would imply that $C_{11} = C_{12} = C_{21} = C_{22} = 0$; hence $C = 0$, and from Eq.(1), $B = 0$, which is a trivial case. So assume that $D_2$ is zero. But if $D_2 = 0$, Eq.(2) implies only $C_{11} = C_{21} = 0$. Then
$$BY = BPP'Y = CZ = \begin{pmatrix} 0 & C_2 \end{pmatrix}\begin{pmatrix} Z_1 \\ Z_2 \end{pmatrix} = C_2Z_2,$$
where $C_2$ denotes the block column $(C_{12}' \;\; C_{22}')'$, and
$$Y'AY = Y'PP'APP'Y = Z'DZ = Z_1'D_1Z_1.$$
Since the elements of $Z$ are independent and $BY$ and $Y'AY$ share no element of $Z$ in common, they are independently distributed.

Corollary 1. If $x' = (x_1, \ldots, x_n)$ are independent drawings from a normal distribution, the sample mean $\bar x$ and ($n$ times) the sample variance, $\sum_i (x_i - \bar x)^2$, are independently distributed.
This follows because, defining $\iota$ as the $n \times 1$ vector of ones, $\bar x = \iota'x/n$ and
$$\sum_i (x_i - \bar x)^2 = x'\left(I - \frac{\iota\iota'}{n}\right)\left(I - \frac{\iota\iota'}{n}\right)x = x'\left(I - \frac{\iota\iota'}{n}\right)x,$$
and we can verify that the matrices
$$\left(I - \frac{\iota\iota'}{n}\right) \quad\text{and}\quad \left(\frac{\iota'}{n}\right)$$
have a zero product.
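The two matrix facts used in Corollary 1 can be verified directly; a small sketch (the data vector is an arbitrary example):

```python
import numpy as np

# Check of the matrices in Corollary 1: M = I - ii'/n is idempotent and has a
# zero product with i'/n, so mean and variance are built from orthogonal pieces.
n = 6
i = np.ones((n, 1))                   # column vector of ones
M = np.eye(n) - (i @ i.T) / n         # centering matrix

assert np.allclose(M @ M, M)          # idempotent: MM = M
assert np.allclose(M @ (i / n), 0)    # zero product with the mean operator

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
assert np.isclose(x @ M @ x, ((x - x.mean())**2).sum())   # x'Mx = sum of squared deviations
```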
Theorem 3. If $A$ and $B$ are both symmetric and of the same order, it is necessary and sufficient for the existence of an orthogonal transformation $P$ that simultaneously diagonalizes $A$ and $B$ that $AB = BA$.

Proof. (1) Necessity. Assume there exists an orthogonal $P$ such that $P'AP = D_1$ and $P'BP = D_2$, where $D_1$ and $D_2$ are diagonal matrices. Then $P'APP'BP = D_1D_2 = D$ and $P'BPP'AP = D_2D_1 = D$. But then it follows that $AB = BA$.
(2) Sufficiency. Assume that $AB = BA$ and let $\lambda_1, \ldots, \lambda_r$ denote the distinct eigenvalues of $A$, with multiplicities $m_1, \ldots, m_r$. There exists an orthogonal $Q_1$ such that
$$Q_1'AQ_1 = D_1 = \begin{pmatrix} \lambda_1 I_1 & 0 & \ldots & 0 \\ 0 & \lambda_2 I_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & \lambda_r I_r \end{pmatrix},$$
where $I_j$ denotes an identity matrix of order $m_j$. Then $D_1$ commutes with $Q_1'BQ_1$, since
$$D_1Q_1'BQ_1 = Q_1'AQ_1Q_1'BQ_1 = Q_1'ABQ_1 = Q_1'BAQ_1 = Q_1'BQ_1Q_1'AQ_1 = Q_1'BQ_1D_1.$$
It follows that the matrix $Q_1'BQ_1$ is a block-diagonal matrix with symmetric submatrices $(Q_1'BQ_1)_i$ of dimension $m_i$ $(i = 1, \ldots, r)$ along the diagonal. Then we can find orthogonal matrices $P_i$ such that $P_i'(Q_1'BQ_1)_iP_i$ are diagonal. Now form the matrix
$$Q_2 = \begin{pmatrix} P_1 & 0 & \ldots & 0 \\ 0 & P_2 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & P_r \end{pmatrix}.$$
$Q_2$ is obviously orthogonal. Then the matrix $Q$ defined as $Q_1Q_2$ is orthogonal and diagonalizes both $A$ and $B$, since
$$Q'AQ = Q_2'Q_1'AQ_1Q_2 = Q_2'D_1Q_2 = Q_2'Q_2D_1 = D_1$$
(because the block-diagonality of $Q_2$ causes it to commute with $D_1$), and
$$Q'BQ = Q_2'Q_1'BQ_1Q_2$$
is a diagonal matrix by construction.
Theorem 4. If $Y$ is distributed as $N(\mu, I)$, the positive semidefinite forms $Y'AY$ and $Y'BY$ are independently distributed if and only if $AB = 0$.
Proof. (1) Sufficiency. Let $AB = 0$. Then $B'A' = BA = 0$ as well, since $A$ and $B$ are symmetric; hence $AB = BA$, and by Theorem 3 there exists an orthogonal $P$ such that $P'AP = D_1$ and $P'BP = D_2$. It follows that $D_1D_2 = 0$, since
$$D_1D_2 = P'APP'BP = P'ABP$$
and $AB = 0$ by hypothesis. Then $D_1$ and $D_2$ must be of the forms
$$D_1 = \begin{pmatrix} \Lambda_1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \qquad (3)$$
and
$$D_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \Lambda_2 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad (4)$$
where $\Lambda_1$ and $\Lambda_2$ are diagonal submatrices and where $D_1$ and $D_2$ are partitioned conformably.
Now let $Z = P'Y$; $Z$ is distributed as $N(P'\mu, I)$, and
$$Y'AY = Z'P'APZ = Z_1'\Lambda_1Z_1$$
and
$$Y'BY = Z'P'BPZ = Z_2'\Lambda_2Z_2.$$
Since the elements of $Z$ are independent and the two quadratic forms share no element of $Z$ in common, they are independent.
(2) Necessity. If $Y'AY$ and $Y'BY$ are independently distributed, let $P$ be orthogonal with $P'AP = D_1$ and $P'BP = D_2$, and $Z = P'Y$; then $Z'D_1Z$ and $Z'D_2Z$ are independent. Since the $z_i$ are independent, the two positive semidefinite forms can share no element of $Z$, which forces $D_1D_2 = 0$; hence $AB = PD_1P'PD_2P' = PD_1D_2P' = 0$.

Theorem 5. A symmetric matrix $A$ is idempotent if and only if its eigenvalues are all either 0 or 1.

Proof. (1) Necessity. Let $A$ be idempotent of rank $r$, and let $\lambda$ be an eigenvalue with eigenvector $x$. Then $\lambda x = Ax = A^2x = \lambda^2x$, so $\lambda^2 = \lambda$ and every eigenvalue is 0 or 1. Diagonalizing $A$ and ordering the eigenvalues suitably,
$$P'AP = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}, \qquad (5)$$
where $I_r$ is an identity matrix of order $r$.

(2) Sufficiency. Let the eigenvalues of $A$ all be 0 or 1. Then there exists orthogonal $P$ such that $P'AP = E_r$, where $E_r$ is the matrix on the right hand side of Eq.(5). Then $A = PE_rP'$, and $A^2 = PE_rP'PE_rP' = PE_rP' = A$, and $A$ is idempotent.
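A quick numerical sketch of Theorem 5 (illustrative only; the orthogonal $P$ is generated at random): a matrix built as $PE_rP'$ is idempotent with eigenvalues 0 and 1, and its trace equals its rank, anticipating Theorem 11.

```python
import numpy as np

# Build A = P E_r P' with random orthogonal P (via QR); check idempotency,
# the 0/1 eigenvalues of Theorem 5, and trace = rank (Theorem 11).
rng = np.random.default_rng(1)
n, r = 5, 2
P, _ = np.linalg.qr(rng.standard_normal((n, n)))   # random orthogonal matrix
E = np.diag([1.0] * r + [0.0] * (n - r))           # E_r from Eq.(5)
A = P @ E @ P.T

assert np.allclose(A @ A, A)                        # idempotent
assert np.allclose(np.sort(np.linalg.eigvalsh(A)), [0, 0, 0, 1, 1])
assert np.isclose(np.trace(A), r)                   # trace equals rank
```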
Theorem 6. If the $n$-vector $Y$ is distributed as $N(0, I)$, then $Y'AY$ is distributed as $\chi^2(k)$ if and only if $A$ is idempotent of rank $k$.

Proof. (1) Sufficiency. Let $P$ be the orthogonal matrix that diagonalizes $A$, and define $Z = P'Y$. Then
$$Y'AY = Z'P'APZ = \sum_{i=1}^k z_i^2$$
by Theorem 5. But the right hand side is the sum of squares of $k$ normally and independently distributed variables with mean zero and variance 1; hence it is distributed as $\chi^2(k)$.
(2) Necessity. Assume that $Y'AY$ is distributed as $\chi^2(k)$. Since $A$ is symmetric, there exists orthogonal $P$ such that $P'AP = D$ is diagonal. Defining $Z = P'Y$,
$$Y'AY = Z'P'APZ = Z'DZ = \sum_{i=1}^n d_iz_i^2,$$
and we define the right hand side as $\Delta$. Since the $z_i$ are $N(0,1)$ and independent, the moment generating function of $\Delta$ is
$$\prod_{i=1}^n (1 - 2\theta d_i)^{-1/2}.$$
The moment generating function for $Y'AY$ (since it is distributed as $\chi^2(k)$) is $(1 - 2\theta)^{-k/2}$. These two moment generating functions must obviously equal one another, which is possible only if $k$ of the $d_i$ are equal to 1 and the rest are equal to 0; but this implies that $A$ is idempotent of rank $k$.
In the next theorem we introduce matrices $A_r$ and $A_u$; the subscripts that identify the matrices refer to the context of hypothesis testing in the regression model and indicate the model restricted by the hypothesis or the unrestricted model.

Theorem 7. Let $u$ be an $n$-vector distributed as $N(0, \sigma^2 I)$, and let $A_r$ and $A_u$ be two idempotent matrices, with $A_r \ne A_u$ and $A_rA_u = A_u$. Then, letting $u_r = A_ru$ and $u_u = A_uu$, the quantity
$$F = \frac{(u_r'u_r - u_u'u_u)/(\mathrm{tr}(A_r) - \mathrm{tr}(A_u))}{u_u'u_u/\mathrm{tr}(A_u)}$$
has the $F$ distribution with $\mathrm{tr}(A_r) - \mathrm{tr}(A_u)$ and $\mathrm{tr}(A_u)$ degrees of freedom.
Proof. Dividing both numerator and denominator by $\sigma^2$, we find from Theorem 6 that the denominator has the $\chi^2(\mathrm{tr}(A_u))$ distribution. For the numerator we have $u_r'u_r - u_u'u_u = u'A_ru - u'A_uu = u'(A_r - A_u)u$. But $(A_r - A_u)$ is idempotent, because
$$(A_r - A_u)^2 = A_r^2 - A_rA_u - A_uA_r + A_u^2 = A_r - A_u - A_u + A_u = A_r - A_u.$$
Hence, the numerator divided by $\sigma^2$ has the $\chi^2(\mathrm{tr}(A_r) - \mathrm{tr}(A_u))$ distribution. But the numerator and the denominator are independent, because the matrices of the respective quadratic forms, $A_r - A_u$ and $A_u$, have a zero product.
Theorem 8. If $Y$ is distributed as $N(\mu, \sigma^2 I)$, then $Y'AY/\sigma^2$ is distributed as noncentral $\chi^2(k, \lambda)$, where $\lambda = \mu'A\mu/2\sigma^2$, if and only if $A$ is idempotent of rank $k$.

The proof is omitted.
Theorem 9. The trace of a product of square matrices is invariant under cyclic permutations of the matrices.

Proof. Consider the product $BC$, where $B$ and $C$ are two matrices of order $n$. Then $\mathrm{tr}(BC) = \sum_i\sum_j b_{ij}c_{ji}$ and $\mathrm{tr}(CB) = \sum_i\sum_j c_{ij}b_{ji}$. But these two double sums are obviously equal to one another.
Theorem 10. All symmetric, idempotent matrices not of full rank are positive semidefinite.

Proof. For symmetric and idempotent matrices, we have $A = AA = A'A$. Defining $y = Ax$ for an arbitrary vector $x$, we have $x'Ax = x'A'Ax = y'y$, which is $\ge 0$.
Theorem 11. If $A$ is idempotent of rank $r$, then $\mathrm{tr}(A) = r$.

Proof. It follows from Eq.(5) in Theorem 5 that $P'AP = E_r$. Hence, from Theorem 9 it follows that $\mathrm{tr}(A) = \mathrm{tr}(P'AP) = \mathrm{tr}(APP') = \mathrm{tr}(E_r) = r$.
Theorem 12. If $Y$ is distributed with mean vector $\mu = 0$ and covariance matrix $\sigma^2 I$, then $E(Y'AY) = \sigma^2\,\mathrm{tr}(A)$.

Proof.
$$E(Y'AY) = E\Big\{\sum_i\sum_j a_{ij}y_iy_j\Big\} = E\Big\{\sum_i a_{ii}y_i^2\Big\} + E\Big\{\sum_i\sum_{j \ne i} a_{ij}y_iy_j\Big\}.$$
But when $i \ne j$, $E(y_iy_j) = E(y_i)E(y_j) = 0$; hence $E(Y'AY) = \sigma^2\,\mathrm{tr}(A)$.
We now specify the regression model as
$$Y = X\beta + u \qquad (6)$$
where $Y$ is an $n \times 1$ vector of observations on the dependent variable, $X$ is an $n \times k$ matrix of observations on the independent variables, and $u$ is an $n \times 1$ vector of unobservable error terms. We make the following assumptions:

Assumption 1. The elements of $X$ are nonstochastic (and, hence, may be taken to be identical in repeated samples).

Assumption 2. The elements of the vector $u$ are independently distributed as $N(0, \sigma^2)$.

It follows from Assumption 2 that the joint density of the elements of $u$ is
$$L = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-u'u/2\sigma^2}.$$
The loglikelihood is
$$\log L = -\left(\frac{n}{2}\right)\log(2\pi) - \left(\frac{n}{2}\right)\log\sigma^2 - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta),$$
and its partial derivatives are
$$\frac{\partial \log L}{\partial \beta} = \frac{1}{\sigma^2}X'(Y - X\beta)$$
$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(Y - X\beta)'(Y - X\beta).$$
Setting these equal to zero yields the maximum likelihood estimates, which are
$$\hat\beta = (X'X)^{-1}X'Y \qquad (7)$$
and
$$\hat\sigma^2 = (Y - X\hat\beta)'(Y - X\hat\beta)/n. \qquad (8)$$
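Equations (7) and (8) can be computed directly; a small sketch (with arbitrary simulated data), cross-checked against numpy's least-squares solver:

```python
import numpy as np

# Eq.(7): beta_hat = (X'X)^{-1} X'Y, and Eq.(8): sigma2_hat = u_hat'u_hat / n.
rng = np.random.default_rng(2)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.standard_normal(n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # Eq.(7)
resid = Y - X @ beta_hat
sigma2_hat = resid @ resid / n                 # Eq.(8)

assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])
assert np.allclose(X.T @ resid, 0, atol=1e-8)  # normal equations: X'(Y - X beta_hat) = 0
```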
Remark. It is obvious from the form of the likelihood function that the estimate $\hat\beta$ is also the least squares estimate of $\beta$, i.e., the estimate that minimizes $(Y - X\beta)'(Y - X\beta)$.

Theorem 13. $\hat\beta$ is unbiased, i.e., $E(\hat\beta) = \beta$.

Proof. Substituting from Eq.(6), $\hat\beta = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$, and taking expectations yields $E(\hat\beta) = \beta$, since $E(u) = 0$.

Theorem 14. $E(\hat\sigma^2) = (n-k)\sigma^2/n$.

Proof. Substituting $(X'X)^{-1}X'Y$ for $\hat\beta$ in Eq.(8),
$$E(\hat\sigma^2) = \frac{1}{n}E\left[(Y - X(X'X)^{-1}X'Y)'(Y - X(X'X)^{-1}X'Y)\right]$$
$$= \frac{1}{n}E\left[Y'(I - X(X'X)^{-1}X')(I - X(X'X)^{-1}X')Y\right] = \frac{1}{n}E\left[Y'(I - X(X'X)^{-1}X')Y\right], \qquad (9)$$
since $I - X(X'X)^{-1}X'$ is symmetric and idempotent. Substituting $X\beta + u$ for $Y$ and noting that $(I - X(X'X)^{-1}X')X = 0$, Eq.(9) equals $\frac{1}{n}E\left[u'(I - X(X'X)^{-1}X')u\right]$, and applying Theorem 12,
$$E(\hat\sigma^2) = \frac{1}{n}\sigma^2\,\mathrm{tr}\left(I - X(X'X)^{-1}X'\right) = \frac{1}{n}\sigma^2\left[\mathrm{tr}(I) - \mathrm{tr}\left(X(X'X)^{-1}X'\right)\right]$$
$$= \frac{1}{n}\sigma^2\left[n - \mathrm{tr}\left((X'X)(X'X)^{-1}\right)\right] = \frac{n-k}{n}\sigma^2.$$
It follows from Theorem 14 that an unbiased estimator of $\sigma^2$ can be defined as $\hat\sigma^2_{ub} = (n/(n-k))\hat\sigma^2$.
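The trace bookkeeping behind Theorem 14 is easy to verify numerically; a sketch with an arbitrary design matrix:

```python
import numpy as np

# The residual-maker matrix M = I - X(X'X)^{-1}X' is idempotent with trace n - k,
# which is exactly why rescaling sigma2_hat by n/(n-k) makes it unbiased.
rng = np.random.default_rng(3)
n, k = 40, 4
X = rng.standard_normal((n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

assert np.allclose(M @ M, M)              # idempotent
assert np.isclose(np.trace(M), n - k)     # tr(M) = n - k
assert np.allclose(M @ X, 0, atol=1e-9)   # M annihilates the columns of X
```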
Theorem 15. $\hat\beta$ has the $k$-variate normal distribution with mean vector $\beta$ and covariance matrix $\sigma^2(X'X)^{-1}$.

Proof. Normality of the distribution follows from Theorem 1. The fact that the mean of this distribution is $\beta$ follows from Theorem 13. The covariance matrix of $\hat\beta$ is
$$E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = E(\hat\beta\hat\beta') - \beta\beta' = E\{(X'X)^{-1}X'(X\beta + u)(X\beta + u)'X(X'X)^{-1}\} - \beta\beta'$$
$$= E\left[(X'X)^{-1}X'\sigma^2IX(X'X)^{-1}\right] = \sigma^2(X'X)^{-1}.$$
Theorem 16. $(n-k)\hat\sigma^2_{ub}/\sigma^2$ is distributed as $\chi^2(n-k)$.

Proof. $(n-k)\hat\sigma^2_{ub}$ can be written as $u'\left[I - X(X'X)^{-1}X'\right]u$. Since $I - X(X'X)^{-1}X'$ is idempotent of rank $n-k$, Theorem 6 applies.
Theorem 17. $\hat\beta$ and $\hat\sigma^2_{ub}$ are independently distributed.

Proof. Multiplying together the matrix of the linear form, $(X'X)^{-1}X'$, and the matrix of the quadratic form, $I - X(X'X)^{-1}X'$, we obtain
$$(X'X)^{-1}X'\left[I - X(X'X)^{-1}X'\right] = 0,$$
which proves the theorem.
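The zero product in Theorem 17's proof can be checked directly; an illustrative sketch:

```python
import numpy as np

# The matrix of the linear form, (X'X)^{-1}X', has a zero product with the
# matrix of the quadratic form, I - X(X'X)^{-1}X', so Theorem 2 applies.
rng = np.random.default_rng(4)
n, k = 30, 3
X = rng.standard_normal((n, k))
G = np.linalg.solve(X.T @ X, X.T)   # (X'X)^{-1} X'
M = np.eye(n) - X @ G               # I - X(X'X)^{-1}X'

assert np.allclose(G @ M, 0, atol=1e-9)
```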
Theorem 18. (Markov) Given $Y = X\beta + u$, $E(u) = 0$, $E(uu') = \sigma^2I$, and $E(u'X) = 0$, the best, linear, unbiased estimate of $\beta$ is given by $\hat\beta = (X'X)^{-1}X'Y$.
Proof. First note that Assumption 2 is not used in this theorem. Let $C$ be any constant matrix of order $k \times n$ and define the linear estimator $\tilde\beta = CY$. Without loss of generality, let $C = (X'X)^{-1}X' + B$, so that
$$\tilde\beta = \left[(X'X)^{-1}X' + B\right]Y. \qquad (10)$$
Then unbiasedness requires that
$$E(\tilde\beta) = E\left\{\left[(X'X)^{-1}X' + B\right]Y\right\} = \beta + BX\beta = \beta.$$
For the last equality to hold, we require that $BX = 0$.

The covariance matrix of $\tilde\beta$ is
$$E(\tilde\beta - \beta)(\tilde\beta - \beta)' = E\left\{\left[(X'X)^{-1}X' + B\right](X\beta + u)(X\beta + u)'\left[X(X'X)^{-1} + B'\right]\right\} - \beta\beta',$$
where we have substituted for $\tilde\beta$ from Eq.(10) and for $Y$ we have substituted $X\beta + u$. Noting that any expression in the product with $BX$ in it is zero due to the requirement of unbiasedness, and any expression with a single $u$ becomes zero when we take expectations, the covariance matrix simplifies to $\sigma^2\left[(X'X)^{-1} + BB'\right]$. But $BB'$ is positive semidefinite; hence the covariance matrix of $\tilde\beta$ exceeds that of $\hat\beta$ by a positive semidefinite matrix. Hence the least squares estimator is best.
Theorem 19. Denoting the $i$th diagonal element of $(X'X)^{-1}$ by $(X'X)^{-1}_{ii}$, the quantity
$$\left[(\hat\beta_i - \beta_i)\Big/\left((X'X)^{-1}_{ii}\sigma^2\right)^{1/2}\right] \Big/ \left[\hat\sigma^2_{ub}/\sigma^2\right]^{1/2}$$
has the $t$ distribution with $n-k$ degrees of freedom.
Proof. $(\hat\beta_i - \beta_i)/\left[(X'X)^{-1}_{ii}\sigma^2\right]^{1/2}$ is distributed as $N(0,1)$ by Theorem 15. $\hat\sigma^2_{ub}(n-k)/\sigma^2$ is distributed as $\chi^2(n-k)$ by Theorem 16. The two are independently distributed by Theorem 17. But the ratio of an $N(0,1)$ variable to the square root of an independent $\chi^2$ variate, which has been first divided by its degrees of freedom, has the $t(n-k)$ distribution.
We next consider a test on a subset of the regression parameters. For this purpose we partition the regression model as
$$Y = X_1\beta_1 + X_2\beta_2 + u, \qquad (11)$$
where $X_1$ is $n \times (k-q)$ and $X_2$ is $n \times q$, and where $\beta_1$ and $\beta_2$ are $(k-q)$- and $q$-vectors respectively. We shall test a hypothesis about the vector $\beta_2$, leaving $\beta_1$ unrestricted by the hypothesis.
Define $S(\hat\beta_1, \hat\beta_2)$ as the sum of the squares of the regression deviations when the least-squares estimates for both $\beta_1$ and $\beta_2$ are used, and denote the sum of the squares of deviations when $\beta_2$ is assigned a fixed value $\tilde\beta_2$, and a least-squares estimate is obtained for $\beta_1$ as a function of the chosen $\tilde\beta_2$, by $S(\hat\beta_1(\tilde\beta_2), \tilde\beta_2)$.
Theorem 20. The quantity
$$\left[S(\hat\beta_1(\tilde\beta_2), \tilde\beta_2) - S(\hat\beta_1, \hat\beta_2)\right]/q \Big/ \left[S(\hat\beta_1, \hat\beta_2)/(n-k)\right]$$
is distributed as $F(q, n-k)$.
Proof. The least squares estimate for $\beta_1$, as a function of $\tilde\beta_2$, is
$$\hat\beta_1(\tilde\beta_2) = (X_1'X_1)^{-1}X_1'(Y - X_2\tilde\beta_2)$$
and the restricted sum of squares is
$$S(\hat\beta_1(\tilde\beta_2), \tilde\beta_2) = \left[Y - X_2\tilde\beta_2 - X_1(X_1'X_1)^{-1}X_1'(Y - X_2\tilde\beta_2)\right]'\left[Y - X_2\tilde\beta_2 - X_1(X_1'X_1)^{-1}X_1'(Y - X_2\tilde\beta_2)\right]$$
$$= (Y' - \tilde\beta_2'X_2')A_r(Y - X_2\tilde\beta_2) = (\beta_1'X_1' + u')A_r(X_1\beta_1 + u) = u'A_ru,$$
where $A_r = I - X_1(X_1'X_1)^{-1}X_1'$, and where we have used Eq.(11), with $\tilde\beta_2$ taken to be the true value of $\beta_2$, to replace $Y$. Since $S(\hat\beta_1, \hat\beta_2)$ is $u'A_uu$, where $A_u = I - X(X'X)^{-1}X'$, the test statistic can be written as
$$\frac{u'(A_r - A_u)u/q}{u'A_uu/(n-k)}. \qquad (12)$$
$A_r - A_u$ is idempotent, for
$$(A_r - A_u)^2 = A_r - A_rA_u - A_uA_r + A_u,$$
but
$$A_uA_r = \left[I - X(X'X)^{-1}X'\right]\left[I - X_1(X_1'X_1)^{-1}X_1'\right]$$
$$= I - X(X'X)^{-1}X' - X_1(X_1'X_1)^{-1}X_1' + X(X'X)^{-1}X'X_1(X_1'X_1)^{-1}X_1',$$
where the last matrix on the right can be written as
$$X(X'X)^{-1}X'X_1(X_1'X_1)^{-1}X_1' = X\begin{pmatrix}I\\0\end{pmatrix}(X_1'X_1)^{-1}X_1' = X_1(X_1'X_1)^{-1}X_1',$$
since $X_1 = X\begin{pmatrix}I\\0\end{pmatrix}$ and $X(X'X)^{-1}X'X = X$. Hence, $A_uA_r = A_u$ and, taking transposes, $A_rA_u = A_u$, so that $(A_r - A_u)^2 = A_r - A_u - A_u + A_u = A_r - A_u$, and $A_r - A_u$ is idempotent. It further follows immediately that $A_u(A_r - A_u) = 0$; hence the numerator and denominator are independently distributed. But the ratio of two independent $\chi^2$ variates, when divided by their respective degrees of freedom, has the indicated $F$-distribution.
Corollary 2. If $\beta_1$ is the scalar parameter representing the constant term, and $\beta_2$ the vector of the remaining $k-1$ parameters, the $F$-statistic for testing the null hypothesis $H_0: \beta_2 = 0$ is $\left[R^2/(k-1)\right]\big/\left[(1-R^2)/(n-k)\right]$, where $R^2$ is the coefficient of determination.
Proof. Define $y = Y - \bar Y$ and $x = X_2 - \bar X_2$, where $\bar Y$ is the vector containing for each element the sample mean of the elements of $Y$ and $\bar X_2$ is the matrix the $j$th column of which contains for each element the sample mean of the corresponding column of $X_2$. Then write the regression model in deviations-from-sample-means form as
$$y = x\beta_2 + v,$$
where $v$ represents the deviations of the error terms from their sample mean. The numerator of the test statistic in Theorem 20 is proportional to $S(\tilde\beta_2) - S(\hat\beta_2)$, where, for a fixed $\tilde\beta_2$,
$$S(\tilde\beta_2) = y'y + \tilde\beta_2'x'x\tilde\beta_2 - 2\tilde\beta_2'x'y \qquad (14)$$
and
$$S(\hat\beta_2) = (y - x\hat\beta_2)'(y - x\hat\beta_2) = y'y - \hat\beta_2'x'y - (y'x - \hat\beta_2'x'x)\hat\beta_2. \qquad (15)$$
Note that the parenthesized expression in the last equation is zero by the definition of $\hat\beta_2$ as the least squares estimate. Then
$$S(\tilde\beta_2) - S(\hat\beta_2) = \tilde\beta_2'x'x\tilde\beta_2 - 2\tilde\beta_2'x'y + \hat\beta_2'x'y$$
$$= \tilde\beta_2'x'x\tilde\beta_2 - 2\tilde\beta_2'x'y + \hat\beta_2'x'y + \hat\beta_2'x'x\hat\beta_2 - \hat\beta_2'x'x\hat\beta_2 \quad \text{(by adding and subtracting $\hat\beta_2'x'x\hat\beta_2$)}$$
$$= \tilde\beta_2'x'x\tilde\beta_2 - 2\tilde\beta_2'x'y + \hat\beta_2'x'x\hat\beta_2 \quad \text{(by noting that the third and fifth terms cancel)}$$
$$= \tilde\beta_2'x'x\tilde\beta_2 + \hat\beta_2'x'x\hat\beta_2 - 2\tilde\beta_2'x'x\hat\beta_2 \quad \text{(by replacing $x'y$ by $x'x\hat\beta_2$)}$$
$$= (\tilde\beta_2 - \hat\beta_2)'x'x(\tilde\beta_2 - \hat\beta_2). \qquad (16)$$
We also define
$$R^2 = 1 - \frac{S(\hat\beta_2)}{y'y}. \qquad (17)$$
Under $H_0$, $\tilde\beta_2 = 0$, and $S(\tilde\beta_2) - S(\hat\beta_2) = \hat\beta_2'x'y = y'yR^2$ from the first line of Eq.(16) by the definition of $R^2$ in Eq.(17). Combining the definition of $S(\hat\beta_2)$ with that of $R^2$ yields for the denominator $y'y - R^2y'y$.
Substituting these expressions in the definition of the $F$-statistic and cancelling out $y'y$ yields the result.

More generally, whenever the restricted model's regressor matrix can be written as $X_r = XC$ for some matrix $C$, the conditions of Theorem 7 are satisfied. For with $A_r = I - X_r(X_r'X_r)^{-1}X_r'$ and $A_u = I - X(X'X)^{-1}X'$,
$$A_rA_u = I - X_r(X_r'X_r)^{-1}X_r' - X(X'X)^{-1}X' + X_r(X_r'X_r)^{-1}X_r'X(X'X)^{-1}X'$$
$$= I - X_r(X_r'X_r)^{-1}X_r' - X(X'X)^{-1}X' + X_r(X_r'X_r)^{-1}C'X'$$
$$= I - X(X'X)^{-1}X' = A_u,$$
where we have replaced $X_r'$ in the first line by $C'X'$, noted that $C'X'X(X'X)^{-1}X' = C'X'$, and finally replaced $C'X'$ by $X_r'$, so that the second and fourth terms cancel.
The precise form of the test statistics in performing tests on subsets of regression coefficients depends on whether there are enough observations (degrees of freedom) to obtain least squares regression coefficients under the alternative hypothesis. We first consider the case of sufficient degrees of freedom.

Sufficient Degrees of Freedom.

Case 1: Test on a Subset of Coefficients in a Regression. Write the model as
$$Y = X_1\beta_1 + X_2\beta_2 + u = X\beta + u, \qquad (18)$$
where $X_1$ is $n \times k_1$ and $X_2$ is $n \times k_2$, and where we wish to test $H_0: \beta_2 = 0$. Under $H_0$, the model is
$$Y = X_1\beta_1 + u, \qquad (19)$$
and we can write
$$X_1 = X\begin{pmatrix}I\\0\end{pmatrix},$$
where the matrix on the right in parentheses is of order $(k_1 + k_2) \times k_1$. Hence the conditions of Theorem 7 are satisfied. Denote the residuals from Eqs.(18) and (19) respectively by $\hat u_u$ and $\hat u_r$. Then the $F$-test of Theorem 20 can also be written as
$$\frac{(\hat u_r'\hat u_r - \hat u_u'\hat u_u)/k_2}{\hat u_u'\hat u_u/(n - k_1 - k_2)}, \qquad (20)$$
which is distributed as $F(k_2, n - k_1 - k_2)$.
Since $\hat u_u = (I - X(X'X)^{-1}X')u$ and $\hat u_r = (I - X_1(X_1'X_1)^{-1}X_1')u$, the numerator is $u'(A_r - A_u)u/\mathrm{tr}(A_r - A_u)$, where the trace in question is $n - k_1 - (n - k_1 - k_2) = k_2$. By the same token, the denominator is $u'A_uu/(n - k_1 - k_2)$. But this is the same as the statistic (12).
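The equivalence of the residual form (20) and the quadratic-form version (12) can be confirmed numerically; an illustrative sketch with simulated data satisfying $H_0$:

```python
import numpy as np

# Under H0: beta2 = 0, the residual-based F statistic of Eq.(20) coincides with
# the quadratic-form statistic of Eq.(12), u'(Ar - Au)u / ... , exactly.
rng = np.random.default_rng(5)
n, k1, k2 = 40, 2, 3
X1 = np.column_stack([np.ones(n), rng.standard_normal((n, k1 - 1))])
X2 = rng.standard_normal((n, k2))
X = np.column_stack([X1, X2])
u = rng.standard_normal(n)
Y = X1 @ np.array([1.0, -2.0]) + u              # H0 true: Y = X1 beta1 + u

H = lambda Z: Z @ np.linalg.solve(Z.T @ Z, Z.T)  # projection ("hat") matrix
Au, Ar = np.eye(n) - H(X), np.eye(n) - H(X1)

uu = Au @ Y                                      # unrestricted residuals
ur = Ar @ Y                                      # restricted residuals
F_resid = ((ur @ ur - uu @ uu) / k2) / (uu @ uu / (n - k1 - k2))
F_quad = (u @ (Ar - Au) @ u / k2) / (u @ Au @ u / (n - k1 - k2))

assert np.isclose(F_resid, F_quad)
assert np.isclose(np.trace(Ar - Au), k2)
```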
Case 2: Equality of Regression Coefficients in Two Regressions. Let the model be given by
$$Y_i = X_i\beta_i + u_i, \qquad i = 1, 2, \qquad (21)$$
where $X_1$ is $n_1 \times k$ and $X_2$ is $n_2 \times k$, and where $k < \min(n_1, n_2)$. We test $H_0: \beta_1 = \beta_2$. The unrestricted model is then written as
$$Y = \begin{pmatrix}Y_1\\Y_2\end{pmatrix} = \begin{pmatrix}X_1 & 0\\0 & X_2\end{pmatrix}\begin{pmatrix}\beta_1\\\beta_2\end{pmatrix} + \begin{pmatrix}u_1\\u_2\end{pmatrix}, \qquad (22)$$
and the model restricted by the hypothesis can be written as
$$Y = \begin{pmatrix}X_1\\X_2\end{pmatrix}\beta_1 + u. \qquad (23)$$
We obviously have
$$X_r = X\begin{pmatrix}I\\I\end{pmatrix},$$
where the matrix in brackets on the right is of order $2k \times k$, and the conditions of Theorem 7 are satisfied. The traces of the restricted and unrestricted $A$-matrices are
$$\mathrm{tr}(A_r) = \mathrm{tr}\left[I - X_r(X_r'X_r)^{-1}X_r'\right] = \mathrm{tr}\left[I_{n_1+n_2} - \begin{pmatrix}X_1\\X_2\end{pmatrix}(X_1'X_1 + X_2'X_2)^{-1}\begin{pmatrix}X_1' & X_2'\end{pmatrix}\right] = n_1 + n_2 - k$$
tr(A
u
) = tr
_
I
_
X
1
0
0 X
2
_ _
X
1
X
1
0
0 X
2
X
2
_
1
_
X
1
0
0 X
2
_
_
= n
1
+n
2
2k.
Letting $\hat u_u$ and $\hat u_r$ denote, as before, the unrestricted and restricted residuals respectively, the test statistic becomes
$$\frac{(\hat u_r'\hat u_r - \hat u_u'\hat u_u)/k}{\hat u_u'\hat u_u/(n_1 + n_2 - 2k)},$$
which is distributed as $F(k, n_1 + n_2 - 2k)$.
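The trace bookkeeping for this Chow-type test can be verified numerically; a sketch with arbitrary simulated regressors:

```python
import numpy as np

# Case 2: tr(Ar) = n1 + n2 - k and tr(Au) = n1 + n2 - 2k, and Ar Au = Au,
# so the conditions of Theorem 7 hold.
rng = np.random.default_rng(6)
n1, n2, k = 20, 15, 3
X1 = rng.standard_normal((n1, k))
X2 = rng.standard_normal((n2, k))
Xr = np.vstack([X1, X2])                         # restricted regressors
Xu = np.block([[X1, np.zeros((n1, k))],
               [np.zeros((n2, k)), X2]])         # unrestricted regressors

H = lambda Z: Z @ np.linalg.solve(Z.T @ Z, Z.T)
Ar = np.eye(n1 + n2) - H(Xr)
Au = np.eye(n1 + n2) - H(Xu)

assert np.isclose(np.trace(Ar), n1 + n2 - k)
assert np.isclose(np.trace(Au), n1 + n2 - 2 * k)
assert np.allclose(Ar @ Au, Au, atol=1e-9)
```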
Case 3: Equality of Subsets of Coefficients in Two Regressions. Now write the model as
$$Y_i = X_i\beta_i + Z_i\gamma_i + u_i, \qquad i = 1, 2,$$
where $X_i$ is of order $n_i \times k_1$ and $Z_i$ is of order $n_i \times k_2$, with $k_1 + k_2 < \min(n_1, n_2)$. The hypothesis to be tested is $H_0: \beta_1 = \beta_2$.
The unrestricted and restricted models are, respectively,
$$Y = \begin{pmatrix}Y_1\\Y_2\end{pmatrix} = \begin{pmatrix}X_1 & 0 & Z_1 & 0\\0 & X_2 & 0 & Z_2\end{pmatrix}\begin{pmatrix}\beta_1\\\beta_2\\\gamma_1\\\gamma_2\end{pmatrix} + \begin{pmatrix}u_1\\u_2\end{pmatrix} = X\beta + u \qquad (24)$$
and
$$Y = \begin{pmatrix}X_1 & Z_1 & 0\\X_2 & 0 & Z_2\end{pmatrix}\begin{pmatrix}\beta_1\\\gamma_1\\\gamma_2\end{pmatrix} + u = X_r\beta_r + u. \qquad (25)$$
Clearly,
$$X_r = X\begin{pmatrix}I & 0 & 0\\I & 0 & 0\\0 & I & 0\\0 & 0 & I\end{pmatrix},$$
and hence the conditions of Theorem 7 are satisfied. Hence we can form the $F$-ratio as in Eq.(12) or (20), where the numerator degrees of freedom are $\mathrm{tr}(A_r) - \mathrm{tr}(A_u) = n_1 + n_2 - k_1 - 2k_2 - (n_1 + n_2 - 2k_1 - 2k_2) = k_1$ and the denominator degrees of freedom are $\mathrm{tr}(A_u) = n_1 + n_2 - 2k_1 - 2k_2$.
Insufficient Degrees of Freedom.

Case 4: Equality of Regression Coefficients in Two Regressions. This case is the same as Case 2, except we now assume that $n_2 \le k$. Denote by $\hat u_r$ the residuals from the restricted regression using the full set of $n_1 + n_2$ observations and let $\hat u_u$ denote the residuals from the regression using only the first $n_1$ observations. Then $\hat u_r = A_ru$, $\hat u_u = A_1u_1$, where $A_1 = I - X_1(X_1'X_1)^{-1}X_1'$. We can then also write $\hat u_u = \begin{pmatrix}A_1 & 0\end{pmatrix}u$, and $\hat u_u'\hat u_u = u'A_uu$, where
$$A_u = \begin{pmatrix}A_1 & 0\\0 & 0\end{pmatrix} \qquad (26)$$
is an $(n_1 + n_2) \times (n_1 + n_2)$ matrix. Since $X_1'A_1 = 0$, we have
$$X_r'A_u = \begin{pmatrix}X_1' & X_2'\end{pmatrix}\begin{pmatrix}A_1 & 0\\0 & 0\end{pmatrix} = 0.$$
Hence $A_rA_u = A_u$, and the conditions of Theorem 7 are satisfied. The relevant traces are $\mathrm{tr}(A_r) = n_1 + n_2 - k$ and $\mathrm{tr}(A_u) = n_1 - k$.
Case 5: Equality of Subsets of Coefficients in Two Regressions. This is the same case as Case 3, except that we now assume that $k_2 < n_2 \le k_1 + k_2$; hence there are not enough observations in the second part of the dataset to estimate a separate regression equation. As before, let $\hat u_r$ be the residuals from the restricted model, and $\hat u_u$ be the residuals from the regression on the first $n_1$ observations. Denote by $W_1$ the matrix $(X_1 \;\; Z_1)$. Again as before, $\hat u_r = A_ru$, $\hat u_u = A_1u_1 = \begin{pmatrix}A_1 & 0\end{pmatrix}u$, where $A_1 = I - W_1(W_1'W_1)^{-1}W_1'$. Defining $A_u$ as in Eq.(26), we again obtain $A_rA_u = A_u$ and the conditions of Theorem 7 are satisfied. The relevant traces then are $\mathrm{tr}(A_r) = n_1 + n_2 - k_1 - 2k_2$ and $\mathrm{tr}(A_u) = n_1 - k_1 - k_2$. Notice that the requirement that $\mathrm{tr}(A_r) - \mathrm{tr}(A_u)$ be positive is fulfilled if and only if $n_2 > k_2$, as we assumed.
Irrelevant Variables Included.

Consider the case in which
$$Y = X_1\beta_1 + u$$
is the true model, but in which the investigator mistakenly estimates the model
$$Y = X_1\beta_1 + X_2\beta_2 + u.$$
The estimated coefficient vector $\hat\beta' = (\hat\beta_1' \;\; \hat\beta_2')$ becomes
$$\hat\beta = \begin{pmatrix}(X_1'X_1) & (X_1'X_2)\\(X_2'X_1) & (X_2'X_2)\end{pmatrix}^{-1}\begin{pmatrix}X_1'\\X_2'\end{pmatrix}\left[X_1\beta_1 + u\right] = \begin{pmatrix}I\\0\end{pmatrix}\beta_1 + \begin{pmatrix}(X_1'X_1) & (X_1'X_2)\\(X_2'X_1) & (X_2'X_2)\end{pmatrix}^{-1}\begin{pmatrix}X_1'\\X_2'\end{pmatrix}u,$$
from which it follows that
$$E\begin{pmatrix}\hat\beta_1\\\hat\beta_2\end{pmatrix} = \begin{pmatrix}\beta_1\\0\end{pmatrix},$$
and hence the presence of irrelevant variables does not affect the unbiasedness of the regression parameter estimates. We next prove two lemmas needed for Theorem 22.
Lemma 1. If $A$ is $n \times k$, $n \ge k$, with rank $k$, then $A'A$ is nonsingular.

Proof. We can write
$$A'A = \begin{pmatrix}A_1' & A_2'\end{pmatrix}\begin{pmatrix}A_1\\A_2\end{pmatrix} = A_1'A_1 + A_2'A_2,$$
where the rows of $A$ are ordered so that $A_1$ is $k \times k$ and nonsingular, so that $A_1'A_1$ is of order and rank $k$. We can then write $C'A_1 = A_2$, where $C'$ is an $(n-k) \times k$ matrix. Then $A'A = A_1'\left[I + CC'\right]A_1$. The matrix $CC'$ is positive semidefinite; hence each eigenvalue of $I + CC'$ exceeds the corresponding eigenvalue of $CC'$ by unity and is positive. But then $A'A$, the product of nonsingular matrices, is nonsingular.

Lemma 2. Let $X = (X_1 \;\; X_2)$ be of full column rank, and define $M = I - X_1(X_1'X_1)^{-1}X_1'$. Then the matrix $(X_2'X_2) - (X_2'X_1)(X_1'X_1)^{-1}(X_1'X_2)$ is nonsingular.
Proof. Write $(X_2'X_2) - (X_2'X_1)(X_1'X_1)^{-1}(X_1'X_2)$ as $X_2'MX_2$. $M$ obviously has rank $\rho(M) = n - k_1$. Since $MX_1 = 0$, the columns of $X_1$ span the null-space of $M$. It follows that $\rho(MX_2) = k_2$, for if the rank of $MX_2$ were smaller than $k_2$, there would exist a vector $c_2 \ne 0$ such that $MX_2c_2 = 0$, and the vector $X_2c_2$ would lie in the null-space of $M$, and would therefore be spanned by the columns of $X_1$. But then we could write $X_2c_2 + X_1c_1 = 0$ for a vector $c_1 \ne 0$, which contradicts the assumption that the columns of $X$ are linearly independent.

But then it follows that $X_2'MX_2 = X_2'M'MX_2$ has rank $k_2$ by Lemma 1.
Theorem 22. The covariance matrix for $\hat\beta_1$ with irrelevant variables included exceeds the covariance matrix for the correctly specified model by a positive semidefinite matrix.

Proof. The covariance matrix for $\hat\beta_1$ in the incorrectly specified model is obviously the upper left-hand block of
$$\sigma^2\begin{pmatrix}(X_1'X_1) & (X_1'X_2)\\(X_2'X_1) & (X_2'X_2)\end{pmatrix}^{-1},$$
whereas the covariance matrix in the correctly specified model is $\sigma^2(X_1'X_1)^{-1}$.
Since the inverse of a partitioned matrix can be written as
$$\begin{pmatrix}A & B\\C & D\end{pmatrix}^{-1} = \begin{pmatrix}A^{-1}\left[I + B(D - CA^{-1}B)^{-1}CA^{-1}\right] & -A^{-1}B(D - CA^{-1}B)^{-1}\\-(D - CA^{-1}B)^{-1}CA^{-1} & (D - CA^{-1}B)^{-1}\end{pmatrix},$$
the required upper left-hand block of the covariance matrix in the misspecified model is
$$\sigma^2(X_1'X_1)^{-1}\left[I + (X_1'X_2)\left[(X_2'X_2) - (X_2'X_1)(X_1'X_1)^{-1}(X_1'X_2)\right]^{-1}(X_2'X_1)(X_1'X_1)^{-1}\right]$$
$$= \sigma^2\left\{(X_1'X_1)^{-1} + (X_1'X_1)^{-1}(X_1'X_2)\left[X_2'(I - X_1(X_1'X_1)^{-1}X_1')X_2\right]^{-1}(X_2'X_1)(X_1'X_1)^{-1}\right\}.$$
Subtracting $\sigma^2(X_1'X_1)^{-1}$, we obtain the difference between the two covariance matrices as
$$\sigma^2\left\{(X_1'X_1)^{-1}(X_1'X_2)\left[X_2'(I - X_1(X_1'X_1)^{-1}X_1')X_2\right]^{-1}(X_2'X_1)(X_1'X_1)^{-1}\right\}.$$
Since the matrix in square brackets is positive definite, its inverse exists and the matrix in braces is positive semidefinite.
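Theorem 22's conclusion can be observed numerically; an illustrative sketch (with $\sigma^2 = 1$ and arbitrary regressors):

```python
import numpy as np

# With an irrelevant X2 included, the covariance of beta1_hat (the upper-left
# block of (X'X)^{-1}) exceeds (X1'X1)^{-1} by a positive semidefinite matrix.
rng = np.random.default_rng(7)
n, k1, k2 = 60, 3, 2
X1 = rng.standard_normal((n, k1))
X2 = rng.standard_normal((n, k2))
X = np.column_stack([X1, X2])

V_wrong = np.linalg.inv(X.T @ X)[:k1, :k1]    # misspecified model, sigma^2 = 1
V_right = np.linalg.inv(X1.T @ X1)            # correctly specified model
diff = V_wrong - V_right

assert np.allclose(diff, diff.T)
assert np.all(np.linalg.eigvalsh(diff) > -1e-10)   # positive semidefinite
```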
Relevant Variables Omitted.

Consider the case in which the true relation is
$$Y = X_1\beta_1 + X_2\beta_2 + u, \qquad (27)$$
but in which the relation
$$Y = X_1\beta_1 + u \qquad (28)$$
is estimated. Then we have

Theorem 23. For the least squares estimator $\hat\beta_1$ we have $E(\hat\beta_1) = \beta_1 + (X_1'X_1)^{-1}(X_1'X_2)\beta_2$.
Proof. We have $\hat\beta_1 = (X_1'X_1)^{-1}X_1'Y = (X_1'X_1)^{-1}X_1'\left[X_1\beta_1 + X_2\beta_2 + u\right]$, and taking expectations leads to the result.
Estimation with Linear Restrictions.

Now consider the model
$$Y = X\beta + u, \qquad (29)$$
$$r = R\beta, \qquad (30)$$
where $r$ is $p \times 1$, $R$ is $p \times k$, $p < k$, and where the rank of $R$ is $\rho(R) = p$. We assume that the elements of $r$ and $R$ are known numbers; if the rank of $R$ were less than $p$, then some restrictions could be expressed as linear combinations of other restrictions and may be omitted. Minimizing the sum of the squares of the residuals subject to (30) requires forming the Lagrangian
$$V = (Y - X\beta)'(Y - X\beta) - \lambda'(R\beta - r), \qquad (31)$$
where $\lambda$ is a vector of Lagrange multipliers. Now denote by $\beta^*$ and $\lambda^*$ the estimates obtained by setting the partial derivatives of Eq.(31) equal to zero, and let $\hat\beta$ denote the least squares estimates without imposing the restrictions (30). Then we have
Theorem 24. $\beta^* = \hat\beta + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta)$ and $\lambda^* = 2\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta)$.
Proof. Setting the partial derivatives of Eq.(31) equal to zero yields
$$\frac{\partial V}{\partial \beta} = -2X'Y + 2(X'X)\beta^* - R'\lambda^* = 0 \qquad (32)$$
$$\frac{\partial V}{\partial \lambda} = R\beta^* - r = 0. \qquad (33)$$
Multiplying Eq.(32) on the left by $R(X'X)^{-1}$ yields
$$-2R(X'X)^{-1}X'Y + 2R\beta^* - R(X'X)^{-1}R'\lambda^* = 0$$
or
$$-2R\hat\beta + 2R\beta^* - R(X'X)^{-1}R'\lambda^* = 0. \qquad (34)$$
Since $\beta^*$ satisfies the constraints by definition, Eq.(34) yields
$$\lambda^* = 2\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta).$$
Substituting this result in Eq.(32) and solving for $\beta^*$ yields
$$\beta^* = \hat\beta + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta). \qquad (34a)$$
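The closed form (34a) can be checked numerically against the first-order conditions; a sketch with arbitrary data:

```python
import numpy as np

# Restricted least squares, Eq.(34a): the estimate satisfies R beta* = r
# exactly and makes the stationarity condition (32) hold.
rng = np.random.default_rng(8)
n, k, p = 30, 4, 2
X = rng.standard_normal((n, k))
Y = rng.standard_normal(n)
R = rng.standard_normal((p, k))
r = rng.standard_normal(p)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
S = R @ XtX_inv @ R.T
beta_star = beta_hat + XtX_inv @ R.T @ np.linalg.solve(S, r - R @ beta_hat)
lam = 2 * np.linalg.solve(S, r - R @ beta_hat)          # lambda* of Theorem 24

assert np.allclose(R @ beta_star, r)                    # constraint holds
grad = -2 * X.T @ Y + 2 * X.T @ X @ beta_star - R.T @ lam
assert np.allclose(grad, 0, atol=1e-7)                  # Eq.(32) satisfied
```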
Corollary 3. If $r - R\beta = 0$, then $E(\beta^*) = \beta$ and $E(\lambda^*) = 0$.

Proof. Substituting $X\beta + u$ for $Y$ and $(X'X)^{-1}X'Y$ for $\hat\beta$ in the expressions for $\beta^*$ and $\lambda^*$ in Eq.(34a) and taking expectations yields the result.
Define
$$A = I - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R. \qquad (35)$$
We then have
Theorem 25. The covariance matrix of $\beta^*$ is $\sigma^2A(X'X)^{-1}$.
Proof. Substituting $(X'X)^{-1}X'Y$ for $\hat\beta$ and $X\beta + u$ for $Y$ in $\beta^*$ (Eq.(34a)), we can write
$$\beta^* - E(\beta^*) = (X'X)^{-1}\left[X'u - R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'u\right]$$
$$= \left[I - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R\right](X'X)^{-1}X'u = A(X'X)^{-1}X'u. \qquad (36)$$
Multiplying Eq.(36) by its transpose and taking expectations yields
$$\mathrm{Cov}(\beta^*) = \sigma^2A(X'X)^{-1}A'$$
$$= \sigma^2\left[I - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R\right](X'X)^{-1}\left[I - R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}\right]$$
$$= \sigma^2\Big[(X'X)^{-1} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}$$
$$\qquad + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}\Big]$$
$$= \sigma^2\left[(X'X)^{-1} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}\right] = \sigma^2A(X'X)^{-1}.$$
We now consider the test of the null hypothesis $H_0: R\beta = r$. For this purpose we construct an $F$-statistic as in Theorem 20 (see also Eq.(12)).
The minimum sum of squares subject to the restriction can be written as
$$S_r = \left[Y - X\left(\hat\beta + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta)\right)\right]'\left[Y - X\left(\hat\beta + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta)\right)\right]$$
$$= (Y - X\hat\beta)'(Y - X\hat\beta) - (Y - X\hat\beta)'X(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta)$$
$$\quad - (r - R\hat\beta)'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'(Y - X\hat\beta)$$
$$\quad + (r - R\hat\beta)'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}(X'X)(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta)$$
$$= S_u + (r - R\hat\beta)'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta), \qquad (37)$$
where $S_u$ denotes the unrestricted minimal sum of squares, and where the disappearance of the second and third terms in the expansion is due to the fact that $X'(Y - X\hat\beta) = 0$ by the definition of $\hat\beta$. Substituting the least squares estimate for $\hat\beta$ in (37), we obtain
$$S_r - S_u = \left[r - R(X'X)^{-1}X'Y\right]'\left[R(X'X)^{-1}R'\right]^{-1}\left[r - R(X'X)^{-1}X'Y\right]$$
$$= \left[r - R\beta - R(X'X)^{-1}X'u\right]'\left[R(X'X)^{-1}R'\right]^{-1}\left[r - R\beta - R(X'X)^{-1}X'u\right]$$
$$= u'X(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'u = u'B_1u, \qquad (38)$$
since under $H_0$, $r - R\beta = 0$. The matrix $B_1$ is idempotent and of rank $p$ because
$$X(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}(X'X)(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'$$
$$= X(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'$$
and
$$\mathrm{tr}(B_1) = \mathrm{tr}\left(X(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'\right)$$
$$= \mathrm{tr}\left(R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}(X'X)(X'X)^{-1}\right)$$
$$= \mathrm{tr}\left(\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}R'\right) = \mathrm{tr}(I_p) = p.$$
The matrix of the quadratic form $S_u$ is clearly $B_2 = I - X(X'X)^{-1}X'$, and
$$B_1B_2 = X(X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}X'\left(I - X(X'X)^{-1}X'\right) = 0.$$
Hence
$$\frac{(S_r - S_u)/p}{S_u/(n-k)}$$
is distributed as $F(p, n-k)$.
We now turn to the case in which the covariance matrix of $u$ is $\Sigma$ and we wish to test the hypothesis $H_0: R\beta = r$. We first assume that $\Sigma$ is known. We then have
Theorem 26. If $u$ is distributed as $N(0, \Sigma)$, and if $\Sigma$ is known, then the Lagrange Multiplier, Wald, and likelihood ratio test statistics are identical.

Proof. The loglikelihood function is
$$\log L(\beta) = -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma^{-1}| - \frac{1}{2}(Y - X\beta)'\Sigma^{-1}(Y - X\beta),$$
where $|\Sigma^{-1}|$ denotes the determinant of $\Sigma^{-1}$, and the score vector is
$$\frac{\partial \log L}{\partial \beta} = X'\Sigma^{-1}(Y - X\beta).$$
By further differentiation, the Fisher information matrix is
$$I(\beta) = X'\Sigma^{-1}X.$$
The unrestricted maximum likelihood estimator for $\beta$ is obtained by setting the score vector equal to zero and solving, which yields
$$\hat\beta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y.$$
Letting $\hat u$ denote the residuals $Y - X\hat\beta$, the loglikelihood can be written as
$$\log L(\hat\beta) = -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma^{-1}| - \frac{1}{2}\hat u'\Sigma^{-1}\hat u.$$
To obtain the estimates restricted by the linear relations $R\beta = r$, we form the Lagrangian
$$\mathcal{L}(\beta, \lambda) = \log L(\beta) + \lambda'(R\beta - r)$$
and set its partial derivatives equal to zero, which yields
$$\frac{\partial \mathcal{L}}{\partial \beta} = X'\Sigma^{-1}(Y - X\beta) + R'\lambda = 0$$
$$\frac{\partial \mathcal{L}}{\partial \lambda} = R\beta - r = 0. \qquad (39)$$
Multiply the first equation in (39) by $(X'\Sigma^{-1}X)^{-1}$, which yields
$$\beta^* = \hat\beta + (X'\Sigma^{-1}X)^{-1}R'\lambda^*.$$
Multiplying this further by $R$, and noting that $R\beta^* = r$, we obtain
$$\lambda^* = -\left[R(X'\Sigma^{-1}X)^{-1}R'\right]^{-1}(R\hat\beta - r) \qquad (40)$$
$$\beta^* = \hat\beta - (X'\Sigma^{-1}X)^{-1}R'\left[R(X'\Sigma^{-1}X)^{-1}R'\right]^{-1}(R\hat\beta - r). \qquad (41)$$
The loglikelihood, evaluated at $\beta^*$, is
$$\log L(\beta^*) = -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma^{-1}| - \frac{1}{2}u^{*\prime}\Sigma^{-1}u^*,$$
where $u^* = Y - X\beta^*$.
We now construct the test statistics. The Lagrange multiplier statistic is
$$LM = \left[\frac{\partial \log L}{\partial \beta}\bigg|_{\beta^*}\right]'I(\beta)^{-1}\left[\frac{\partial \log L}{\partial \beta}\bigg|_{\beta^*}\right] = u^{*\prime}\Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}u^* = \lambda^{*\prime}R(X'\Sigma^{-1}X)^{-1}R'\lambda^*. \qquad (42)$$
The Wald statistic is
$$W = (R\hat\beta - r)'\left[R(X'\Sigma^{-1}X)^{-1}R'\right]^{-1}(R\hat\beta - r), \qquad (43)$$
and since the covariance matrix of $(R\hat\beta - r)$ is $R(X'\Sigma^{-1}X)^{-1}R'$, $W$ can be written as
$$W = (R\hat\beta - r)'\left[R(X'\Sigma^{-1}X)^{-1}R'\right]^{-1}\left[R(X'\Sigma^{-1}X)^{-1}R'\right]\left[R(X'\Sigma^{-1}X)^{-1}R'\right]^{-1}(R\hat\beta - r)$$
$$= \lambda^{*\prime}R(X'\Sigma^{-1}X)^{-1}R'\lambda^* = LM,$$
where we have used the definition of $\lambda^*$ in (40). The likelihood ratio test statistic is
$$LR = 2\left[\log L(\hat\beta) - \log L(\beta^*)\right] = u^{*\prime}\Sigma^{-1}u^* - \hat u'\Sigma^{-1}\hat u. \qquad (44)$$
Since $\Sigma^{-1/2}u^* = \Sigma^{-1/2}(Y - X\beta^*)$, substituting in this for $\beta^*$ from its definition in (41), we obtain
$$\Sigma^{-1/2}u^* = \Sigma^{-1/2}\left[Y - X\hat\beta - X(X'\Sigma^{-1}X)^{-1}R'\lambda^*\right]. \qquad (45)$$
We multiply Eq.(45) by its transpose and note that terms with $(Y - X\hat\beta)'\Sigma^{-1}X$ vanish; hence we obtain
$$u^{*\prime}\Sigma^{-1}u^* = \hat u'\Sigma^{-1}\hat u + \lambda^{*\prime}R(X'\Sigma^{-1}X)^{-1}R'\lambda^*.$$
But the last term is the Lagrange multiplier test statistic from (42); hence comparing this with (44) yields $LR = LM$.
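Theorem 26 is easy to confirm numerically from the closed forms; an illustrative sketch with an arbitrary known $\Sigma$:

```python
import numpy as np

# With Sigma known, LM (42), W (43), and LR (44) coincide exactly.
rng = np.random.default_rng(9)
n, k, p = 25, 3, 1
X = rng.standard_normal((n, k))
A = rng.standard_normal((n, n))
Sigma = A @ A.T + n * np.eye(n)        # known positive definite covariance
Si = np.linalg.inv(Sigma)
Y = rng.standard_normal(n)
R = rng.standard_normal((p, k))
r = np.zeros(p)

XSX = X.T @ Si @ X
beta_hat = np.linalg.solve(XSX, X.T @ Si @ Y)            # unrestricted GLS/ML
S = R @ np.linalg.inv(XSX) @ R.T
lam = -np.linalg.solve(S, R @ beta_hat - r)              # Eq.(40)
beta_star = beta_hat + np.linalg.inv(XSX) @ R.T @ lam    # Eq.(41)
u_star, u_hat = Y - X @ beta_star, Y - X @ beta_hat

LM = u_star @ Si @ X @ np.linalg.solve(XSX, X.T @ Si @ u_star)   # Eq.(42)
W = (R @ beta_hat - r) @ np.linalg.solve(S, R @ beta_hat - r)    # Eq.(43)
LR = u_star @ Si @ u_star - u_hat @ Si @ u_hat                   # Eq.(44)

assert np.isclose(LM, W) and np.isclose(W, LR)
```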
We now consider the case when $\Sigma$ is unknown, but is a smooth function of a $p$-element vector $\theta$, and denoted by $\Sigma(\theta)$. We then have

Theorem 27. If $u$ is normally distributed as $N(0, \Sigma(\theta))$, then $W \ge LR \ge LM$.
Proof. The loglikelihood is
$$\log L(\beta, \theta) = -\frac{n}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma^{-1}(\theta)| - \frac{1}{2}(Y - X\beta)'\Sigma^{-1}(\theta)(Y - X\beta).$$
Denoting the unrestricted estimates by $(\hat\beta, \hat\theta)$ and the restricted estimates by $(\beta^*, \theta^*)$, as before, and in particular, denoting by $\hat\Sigma$ the matrix $\Sigma(\hat\theta)$ and by $\Sigma^*$ the matrix $\Sigma(\theta^*)$, the three test statistics can be written, in analogy with Eqs.(42) to (44), as
$$LM = u^{*\prime}\Sigma^{*-1}X(X'\Sigma^{*-1}X)^{-1}X'\Sigma^{*-1}u^*$$
$$W = (R\hat\beta - r)'\left[R(X'\hat\Sigma^{-1}X)^{-1}R'\right]^{-1}(R\hat\beta - r)$$
$$LR = 2\left[\log L(\hat\beta, \hat\theta) - \log L(\beta^*, \theta^*)\right].$$
Now define
$$LR(\theta^*) = 2\left[\log L(\hat\beta_u, \theta^*) - \log L(\beta^*, \theta^*)\right], \qquad (46)$$
where $\hat\beta_u$ is the unrestricted maximizer of $\log L(\beta, \theta^*)$, and
$$LR(\hat\theta) = 2\left[\log L(\hat\beta, \hat\theta) - \log L(\hat\beta_r, \hat\theta)\right], \qquad (47)$$
where $\hat\beta_r$ is the maximizer of $\log L(\beta, \hat\theta)$ subject to the restriction $R\beta - r = 0$. $LR(\theta^*)$ employs the same $\Sigma$ matrix as the $LM$ statistic; hence by the argument in Theorem 26,
$$LR(\theta^*) = LM.$$
It follows that
$$LR - LM = LR - LR(\theta^*) = 2\left[\log L(\hat\beta, \hat\theta) - \log L(\hat\beta_u, \theta^*)\right] \ge 0,$$
since the $\hat\beta$ and $\hat\theta$ estimates are unrestricted. We also note that $W$ and $LR(\hat\theta)$ use the same $\Sigma$; hence they are equal by Theorem 26. Then
$$W - LR = LR(\hat\theta) - LR = 2\left[\log L(\beta^*, \theta^*) - \log L(\hat\beta_r, \hat\theta)\right] \ge 0, \qquad (48)$$
since $(\hat\beta_r, \hat\theta)$ is a restricted estimate and the highest value of the likelihood that can be achieved with the restriction is $\log L(\beta^*, \theta^*)$. Hence $W \ge LR \ge LM$.
We now prove a matrix theorem that will be needed subsequently.

Theorem 28. If $\Sigma$ is symmetric and positive definite of order $p$, and if $H$ is of order $p \times q$, with $q \le p$, and if the rank of $H$ is $q$, then
$$\begin{pmatrix}\Sigma & H\\H' & 0\end{pmatrix}$$
is nonsingular.
Proof. First find a matrix, conformable with the first,
$$\begin{pmatrix}P & Q\\Q' & R\end{pmatrix}$$
such that
$$\begin{pmatrix}\Sigma & H\\H' & 0\end{pmatrix}\begin{pmatrix}P & Q\\Q' & R\end{pmatrix} = \begin{pmatrix}I & 0\\0 & I\end{pmatrix}.$$
Performing the multiplication and equating the two sides, we obtain
$$\Sigma P + HQ' = I \qquad (49)$$
$$\Sigma Q + HR = 0 \qquad (50)$$
$$H'P = 0 \qquad (51)$$
$$H'Q = I \qquad (52)$$
From (49) we have
$$P + \Sigma^{-1}HQ' = \Sigma^{-1}. \qquad (53)$$
Multiplying Eq.(53) on the left by $H'$ and using $H'P = 0$, we have
$$H'\Sigma^{-1}HQ' = H'\Sigma^{-1}. \qquad (54)$$
Since $H$ is of full rank, $H'\Sigma^{-1}H$ is nonsingular by a straightforward extension of Lemma 1. Then
$$Q' = (H'\Sigma^{-1}H)^{-1}H'\Sigma^{-1}, \qquad (55)$$
which gives us the value of $Q$. Substituting (55) in Eq.(53) gives
$$P = \Sigma^{-1} - \Sigma^{-1}H(H'\Sigma^{-1}H)^{-1}H'\Sigma^{-1}. \qquad (56)$$
From Eq.(50) we have
$$\Sigma^{-1}HR = -Q,$$
and multiplying this by $H'$ yields $H'\Sigma^{-1}HR = -H'Q = -I$, and
$$R = -(H'\Sigma^{-1}H)^{-1}, \qquad (57)$$
which determines the value of $R$. Since the matrix
$$\begin{pmatrix}P & Q\\Q' & R\end{pmatrix}$$
is obviously the inverse of the matrix in the theorem, the proof is complete.
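The block formulas (55)-(57) can be verified numerically; an illustrative sketch with a randomly generated $\Sigma$ and $H$:

```python
import numpy as np

# The bordered matrix [[Sigma, H], [H', 0]] of Theorem 28 is nonsingular, with
# inverse blocks P (56), Q (55), and R (57).
rng = np.random.default_rng(10)
p, q = 4, 2
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)        # symmetric positive definite
H = rng.standard_normal((p, q))        # full column rank (almost surely)

M = np.block([[Sigma, H], [H.T, np.zeros((q, q))]])
Si = np.linalg.inv(Sigma)
HSH_inv = np.linalg.inv(H.T @ Si @ H)
Q = Si @ H @ HSH_inv                   # from Eq.(55)
P = Si - Si @ H @ HSH_inv @ H.T @ Si   # Eq.(56)
R = -HSH_inv                           # Eq.(57)
Minv = np.block([[P, Q], [Q.T, R]])

assert np.allclose(M @ Minv, np.eye(p + q), atol=1e-9)
```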
We now consider the regression model $Y = X\beta + u$, where $u$ is distributed as $N(0, \sigma^2I)$, subject to the restrictions $R\beta = 0$; hence this is the same model as considered before with $r = 0$. Minimize the sum of squares subject to the restrictions by forming the Lagrangian
$$L = (Y - X\beta)'(Y - X\beta) + \lambda'R\beta. \qquad (58)$$
The first order conditions can be written as
$$\begin{pmatrix}X'X & R'\\R & 0\end{pmatrix}\begin{pmatrix}\beta^*\\\lambda^*\end{pmatrix} = \begin{pmatrix}X'Y\\0\end{pmatrix}, \qquad (59)$$
where the factor 2 arising from differentiation has been absorbed into $\lambda^*$.
Denote the matrix on the left hand side of (59) by A, and write its inverse as
A
1
=
_
P Q
Q
S
_
. (60)
We can then write the estimates as
$$\begin{pmatrix} \tilde\beta \\ \tilde\lambda \end{pmatrix} = \begin{pmatrix} PX'Y \\ Q'X'Y \end{pmatrix}, \tag{61}$$
and taking expectations, we have
$$E\begin{pmatrix} \tilde\beta \\ \tilde\lambda \end{pmatrix} = \begin{pmatrix} PX'X\beta \\ Q'X'X\beta \end{pmatrix}. \tag{62}$$
From multiplying out $A^{-1}A$ we obtain
$$PX'X + QR = I \tag{63}$$
$$Q'X'X + SR = 0 \tag{64}$$
$$PR' = 0 \tag{65}$$
$$Q'R' = I \tag{66}$$
Hence we can rewrite Eq.(62) as
$$E\begin{pmatrix} \tilde\beta \\ \tilde\lambda \end{pmatrix} = \begin{pmatrix} (I - QR)\beta \\ -SR\beta \end{pmatrix} = \begin{pmatrix} \beta \\ 0 \end{pmatrix}, \tag{67}$$
since $R\beta = 0$ by definition. This, so far, reproduces Corollary 3.
Theorem 29. Given the definition in Eq.(61), the covariance matrix of $(\tilde\beta', \tilde\lambda')'$ is
$$\sigma^2\begin{pmatrix} P & 0 \\ 0 & -S \end{pmatrix}.$$
Proof. It is straightforward to note that
$$\operatorname{cov}\begin{pmatrix} \tilde\beta \\ \tilde\lambda \end{pmatrix} = E\left[\begin{pmatrix} \tilde\beta - \beta \\ \tilde\lambda \end{pmatrix}\begin{pmatrix} \tilde\beta - \beta \\ \tilde\lambda \end{pmatrix}'\right] = \sigma^2\begin{pmatrix} PX'XP & PX'XQ \\ Q'X'XP & Q'X'XQ \end{pmatrix}. \tag{68}$$
From (65) and (66), multiplying the second row of $A$ into the first column of $A^{-1}$ gives
$$RP = 0,$$
and multiplying it into the second column gives
$$RQ = I.$$
Hence, multiplying Eq.(63) on the right by $P$ gives
$$PX'XP + QRP = P$$
or, since $RP = 0$,
$$PX'XP = P.$$
Multiplying Eq.(63) by $Q$ on the right gives
$$PX'XQ + QRQ = Q,$$
or, since $RQ = I$,
$$PX'XQ = 0.$$
Finally, multiplying (64) by $Q$ on the right gives
$$Q'X'XQ + SRQ = 0,$$
which implies that
$$Q'X'XQ = -S.$$
Substituting these results in Eq.(68) yields the covariance matrix asserted in the theorem.
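The bordered system (59) and the unbiasedness result (67) can be illustrated numerically. In the sketch below (NumPy; the design matrix, restrictions, and data are invented for illustration), solving (59) yields a $\tilde\beta$ that satisfies the restriction $R\tilde\beta = 0$ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 4
X = rng.standard_normal((n, k))
beta = np.array([1.0, -1.0, 0.5, -0.5])   # satisfies the restriction below
R = np.array([[1.0, 1.0, 0.0, 0.0],       # beta_1 + beta_2 = 0
              [0.0, 0.0, 1.0, 1.0]])      # beta_3 + beta_4 = 0
Y = X @ beta + 0.1 * rng.standard_normal(n)

q = R.shape[0]
# Bordered system (59): [[X'X, R'], [R, 0]] [beta_tilde; lam] = [X'Y; 0]
A = np.block([[X.T @ X, R.T], [R, np.zeros((q, q))]])
rhs = np.concatenate([X.T @ Y, np.zeros(q)])
sol = np.linalg.solve(A, rhs)
beta_tilde, lam = sol[:k], sol[k:]

assert np.allclose(R @ beta_tilde, 0)     # restriction holds exactly
```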
We now turn to large-sample estimation for the general unconstrained and constrained cases. We wish to estimate the parameters of the density function $f(x, \theta)$, where $x$ is a random variable and $\theta$ is a parameter vector with $k$ elements. In what follows, we denote the true value of $\theta$ by $\theta_0$. The loglikelihood is
$$\log L(x, \theta) = \sum_{i=1}^n \log f(x_i, \theta). \tag{69}$$
Let $\hat\theta$ be the maximum likelihood estimate, let $D$ denote differentiation with respect to $\theta$, and define the information matrix per observation, $I_1(\theta)$, by $\operatorname{cov}(D\log L(x, \theta)) = nI_1(\theta)$. Expanding in Taylor Series about $\theta_0$, we have
$$0 = D\log L(x, \hat\theta) = D\log L(x, \theta_0) + D^2\log L(x, \theta_0)(\hat\theta - \theta_0) + R(x, \theta_0, \hat\theta) \tag{70}$$
Theorem 30. If $\hat\theta$ is a consistent estimator and the third derivative of the loglikelihood function is bounded, then $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically distributed as $N(0, I_1(\theta_0)^{-1})$.

Proof. From Eq.(70) we have
$$\sqrt{n}(\hat\theta - \theta_0) = \frac{n^{-1/2}D\log L(x, \theta_0) + n^{-1/2}R(x, \theta_0, \hat\theta)}{-n^{-1}D^2\log L(x, \theta_0)} \tag{71}$$
where $R$ is a remainder term involving $(\hat\theta - \theta_0)^2$ and the third derivative $D^3\log L$. $D\log L(x, \theta_0)$ is a sum of $n$ terms, each of which has expectation 0 and variance $I_1(\theta_0)$; hence by the Central Limit Theorem, $n^{-1/2}D\log L(x, \theta_0)$ is asymptotically normally distributed with mean zero and variance equal to $(1/n)nI_1(\theta_0) = I_1(\theta_0)$. The remainder term converges in probability to zero. The denominator is $1/n$ times the sum of $n$ terms (the negatives of the second derivatives of the individual loglikelihood contributions), each of which has expectation equal to $I_1(\theta_0)$; hence the entire denominator has the same expectation and by the Weak Law of Large Numbers the denominator converges to this expectation. Hence $\sqrt{n}(\hat\theta - \theta_0)$ converges in distribution to a random variable which is $I_1(\theta_0)^{-1}$ times an $N(0, I_1(\theta_0))$ variable and hence is asymptotically distributed as $N(0, I_1(\theta_0)^{-1})$.
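Theorem 30 can be illustrated by simulation. For the exponential density $f(x, \theta) = \theta e^{-\theta x}$, the information per observation is $I_1(\theta) = 1/\theta^2$ and the MLE is $\hat\theta = 1/\bar x$, so $\sqrt{n}(\hat\theta - \theta_0)$ should be approximately $N(0, \theta_0^2)$. A small Monte Carlo sketch (NumPy; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 2.0, 400, 2000

# Exponential(theta): f(x) = theta * exp(-theta x); the MLE is 1 / xbar
samples = rng.exponential(scale=1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / samples.mean(axis=1)

z = np.sqrt(n) * (theta_hat - theta0)
# The sample variance of z should be near I_1(theta0)^{-1} = theta0^2 = 4
print(z.mean(), z.var())
```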
We now consider the case when there are $p$ restrictions given by $h(\theta)' = (h_1(\theta), \ldots, h_p(\theta)) = 0$. Estimation subject to the restrictions requires forming the Lagrangian
$$G = \log L(x, \theta) - \lambda'h(\theta)$$
and setting its first partial derivatives equal to zero:
$$\begin{aligned} D\log L(x, \tilde\theta) - H\tilde\lambda &= 0 \\ h(\tilde\theta) &= 0 \end{aligned} \tag{72}$$
where $H$ is the $k \times p$ matrix of the derivatives of $h(\theta)$ with respect to $\theta$. Expanding in Taylor Series and neglecting the remainder term yields, asymptotically,
$$\begin{aligned} D\log L(x, \theta_0) + D^2\log L(x, \theta_0)(\tilde\theta - \theta_0) - H\tilde\lambda &= 0 \\ H'(\tilde\theta - \theta_0) &= 0 \end{aligned} \tag{73}$$
The matrix $H$ should be evaluated at $\tilde\theta$; however, writing $H(\tilde\theta) = H(\theta_0) + $ terms of order $(\tilde\theta - \theta_0)$, and noting that if the restrictions hold, $\tilde\theta$ will be near $\theta_0$ and $\tilde\lambda$ will be small, we may take $H$ to be evaluated at $\theta_0$.
Theorem 31. The vector
$$\begin{pmatrix} \sqrt{n}(\tilde\theta - \theta_0) \\ \tilde\lambda/\sqrt{n} \end{pmatrix}$$
is asymptotically normally distributed with mean zero and covariance matrix
$$\begin{pmatrix} P & 0 \\ 0 & -S \end{pmatrix}$$
where
$$\begin{pmatrix} I_1(\theta_0) & H \\ H' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} P & Q \\ Q' & S \end{pmatrix}.$$
Proof. Dividing the first line of (73) by $\sqrt{n}$, we can write
$$\begin{pmatrix} -\frac{1}{n}D^2\log L(x, \theta_0) & H \\ H' & 0 \end{pmatrix}\begin{pmatrix} \sqrt{n}(\tilde\theta - \theta_0) \\ \frac{1}{\sqrt{n}}\tilde\lambda \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{n}}D\log L(x, \theta_0) \\ 0 \end{pmatrix}. \tag{74}$$
The upper left-hand element in the left-hand matrix converges in probability to $I_1(\theta_0)$ and the top element on the right hand side converges in distribution to $N(0, I_1(\theta_0))$. Thus, (74) can be written as
$$\begin{pmatrix} I_1(\theta_0) & H \\ H' & 0 \end{pmatrix}\begin{pmatrix} \sqrt{n}(\tilde\theta - \theta_0) \\ \frac{1}{\sqrt{n}}\tilde\lambda \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{n}}D\log L(x, \theta_0) \\ 0 \end{pmatrix}. \tag{75}$$
Eq.(75) is formally the same as Eq.(59); hence by Theorem 29,
$$\begin{pmatrix} \sqrt{n}(\tilde\theta - \theta_0) \\ \frac{1}{\sqrt{n}}\tilde\lambda \end{pmatrix}$$
is asymptotically normally distributed with mean zero and covariance matrix
$$\begin{pmatrix} P & 0 \\ 0 & -S \end{pmatrix}$$
where
$$\begin{pmatrix} I_1(\theta_0) & H \\ H' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} P & Q \\ Q' & S \end{pmatrix}. \tag{76}$$
We now turn to the derivation of the asymptotic distribution of the likelihood ratio test statistic. As before, $\hat\theta$ denotes the unrestricted, and $\tilde\theta$ the restricted estimator.

Theorem 32. Under the assumptions that guarantee that $\sqrt{n}(\hat\theta - \theta_0)$ and $\sqrt{n}(\tilde\theta - \theta_0)$ are asymptotically normally distributed with mean zero and covariance matrices $I_1(\theta_0)^{-1}$ and $P$ respectively, and if the null hypothesis $H_0: h(\theta) = 0$ is true, the likelihood ratio test statistic, $-2\log\lambda = 2(\log L(x, \hat\theta) - \log L(x, \tilde\theta))$, is asymptotically distributed as $\chi^2(p)$.
Proof. Expand $\log L(x, \tilde\theta)$ in Taylor Series about $\hat\theta$, which yields, to an approximation,
$$\log L(x, \tilde\theta) = \log L(x, \hat\theta) + D\log L(x, \hat\theta)'(\tilde\theta - \hat\theta) + \frac{1}{2}(\tilde\theta - \hat\theta)'D^2\log L(x, \hat\theta)(\tilde\theta - \hat\theta). \tag{77}$$
Since the second term on the right hand side is zero by definition, the likelihood ratio test statistic becomes
$$-2\log\lambda = -(\tilde\theta - \hat\theta)'D^2\log L(x, \hat\theta)(\tilde\theta - \hat\theta). \tag{78}$$
Let $v$ be a $k$-vector distributed as $N(0, I_1(\theta_0))$. Then we can write
$$\begin{aligned} \sqrt{n}(\hat\theta - \theta_0) &= I_1(\theta_0)^{-1}v \\ \sqrt{n}(\tilde\theta - \theta_0) &= Pv \end{aligned} \tag{79}$$
where $P$ is the same $P$ as in Eq.(76). Then, since $\sqrt{n}(\hat\theta - \tilde\theta) = (I_1(\theta_0)^{-1} - P)v$ and $-\frac{1}{n}D^2\log L(x, \hat\theta)$ converges in probability to $I_1(\theta_0)$, Eq.(78) gives, to an approximation,
$$-2\log\lambda = v'(I_1(\theta_0)^{-1} - P)'I_1(\theta_0)(I_1(\theta_0)^{-1} - P)v = v'\left(I_1(\theta_0)^{-1} - P - P + PI_1(\theta_0)P\right)v. \tag{80}$$
We next show that $P = PI_1(\theta_0)P$. From Eq.(56) we can write
$$P = I_1(\theta_0)^{-1} - I_1(\theta_0)^{-1}H(H'I_1(\theta_0)^{-1}H)^{-1}H'I_1(\theta_0)^{-1}. \tag{81}$$
Multiplying this on the left by $I_1(\theta_0)$ yields
$$I_1(\theta_0)P = I - H(H'I_1(\theta_0)^{-1}H)^{-1}H'I_1(\theta_0)^{-1},$$
and multiplying this on the left by $P$ (using the right-hand side of (81)) yields
$$\begin{aligned} PI_1(\theta_0)P = {}& I_1(\theta_0)^{-1} - I_1(\theta_0)^{-1}H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'I_1(\theta_0)^{-1} \\ & - I_1(\theta_0)^{-1}H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'I_1(\theta_0)^{-1} \\ & + I_1(\theta_0)^{-1}H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'I_1(\theta_0)^{-1}H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'I_1(\theta_0)^{-1} \\ = {}& P. \end{aligned} \tag{82}$$
Hence,
$$-2\log\lambda = v'\left(I_1(\theta_0)^{-1} - P\right)v. \tag{83}$$
Since $I_1(\theta_0)$ is symmetric and nonsingular, it can always be written as $I_1(\theta_0) = AA'$, where $A$ is a nonsingular matrix. Then, if $z$ is a $k$-vector distributed as $N(0, I)$, we can write
$$v = Az,$$
and $E(v) = 0$ and $\operatorname{cov}(v) = AA' = I_1(\theta_0)$ as required. Then
$$-2\log\lambda = z'A'\left(I_1(\theta_0)^{-1} - P\right)Az = z'A'I_1(\theta_0)^{-1}Az - z'A'PAz = z'A'(A')^{-1}A^{-1}Az - z'A'PAz = z'z - z'A'PAz = z'(I - A'PA)z. \tag{84}$$
Now $(A'PA)^2 = A'PAA'PA = A'PI_1(\theta_0)PA$, but from Eq.(82), $P = PI_1(\theta_0)P$; hence $A'PA$ is idempotent, and its rank is clearly the rank of $P$. But since the $k$ restricted estimates must satisfy $p$ independent restrictions, the rank of $P$ is $k - p$. Hence the rank of $I - A'PA$ is $k - (k - p) = p$, and since $I - A'PA$ is also idempotent, $-2\log\lambda$ is asymptotically distributed as $\chi^2(p)$.
We next turn to the Wald statistic. Expanding $h(\hat\theta) = h(\theta_0) + H'(\hat\theta - \theta_0)$, we have under the null hypothesis
$$h(\hat\theta) = H'(\hat\theta - \theta_0). \tag{85}$$
Since $\sqrt{n}(\hat\theta - \theta_0)$ is asymptotically distributed as $N(0, I_1(\theta_0)^{-1})$, $\sqrt{n}\,h(\hat\theta)$, which is asymptotically the same as $H'\sqrt{n}(\hat\theta - \theta_0)$, is asymptotically distributed as $N(0, H'I_1(\theta_0)^{-1}H)$. Hence the Wald statistic $n\,h(\hat\theta)'[\operatorname{cov}(h(\hat\theta))]^{-1}h(\hat\theta)$ becomes
$$W = n\,h(\hat\theta)'\left[H'I_1(\theta_0)^{-1}H\right]^{-1}h(\hat\theta). \tag{86}$$
Theorem 33. Under $H_0: h(\theta) = 0$, $W$ is asymptotically distributed as $\chi^2(p)$.

Proof. Write $I_1(\theta_0)^{-1} = AA'$ with $A$ nonsingular, and let $z$ be a $k$-vector distributed as $N(0, I)$, so that $Az$ has the asymptotic distribution of $\sqrt{n}(\hat\theta - \theta_0)$. Thus, when $h(\theta) = 0$,
$$\sqrt{n}\,h(\hat\theta) = H'\sqrt{n}(\hat\theta - \theta_0)$$
is asymptotically distributed as $H'Az$, and
$$W = z'A'H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'Az, \tag{87}$$
which we obtain by substituting in Eq.(86) the asymptotic equivalent of $\sqrt{n}\,h(\hat\theta)$. The matrix $A'H[H'I_1(\theta_0)^{-1}H]^{-1}H'A$ is idempotent, since
$$A'H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'AA'H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'A = A'H\left[H'I_1(\theta_0)^{-1}H\right]^{-1}H'A,$$
where we have substituted $I_1(\theta_0)^{-1}$ for $AA'$. Since $I_1(\theta_0)^{-1}$ is of rank $k$ and $H$ is of rank $p$, the idempotent matrix has rank $p$; hence $W$ is asymptotically distributed as $\chi^2(p)$.

We finally consider the Lagrange Multiplier statistic
$$LM = \left[D\log L(x, \tilde\theta)\right]'\left[\operatorname{cov}(D\log L(x, \tilde\theta))\right]^{-1}\left[D\log L(x, \tilde\theta)\right]. \tag{88}$$
Theorem 34. Under the null hypothesis, $LM$ is asymptotically distributed as $\chi^2(p)$.

Proof. Expanding $D\log L(x, \tilde\theta)$ in Taylor Series, we have asymptotically
$$D\log L(x, \tilde\theta) = D\log L(x, \hat\theta) + D^2\log L(x, \hat\theta)(\tilde\theta - \hat\theta). \tag{89}$$
$D^2\log L(x, \hat\theta)$ converges in probability to $-nI_1(\theta_0)$, $D\log L(x, \hat\theta)$ is zero by the definition of $\hat\theta$, and hence $D\log L(x, \tilde\theta)$ converges in probability to $-nI_1(\theta_0)(\tilde\theta - \hat\theta)$. But asymptotically $\tilde\theta = \theta_0$ under the null; hence $n^{-1/2}D\log L(x, \tilde\theta)$ is asymptotically distributed as $N(0, I_1(\theta_0))$. Hence the appropriate test is
$$LM = n^{-1}\left[D\log L(x, \tilde\theta)\right]'I_1(\theta_0)^{-1}\left[D\log L(x, \tilde\theta)\right],$$
which, substituting from (89), is asymptotically
$$LM = n^{-1}\,n^2(\tilde\theta - \hat\theta)'I_1(\theta_0)I_1(\theta_0)^{-1}I_1(\theta_0)(\tilde\theta - \hat\theta) = n(\tilde\theta - \hat\theta)'I_1(\theta_0)(\tilde\theta - \hat\theta). \tag{90}$$
But this is the same as Eq.(78), the likelihood ratio statistic, since the term $-D^2\log L(x, \hat\theta)$ in Eq.(78) is asymptotically $nI_1(\theta_0)$. Since the likelihood ratio statistic has an asymptotic $\chi^2(p)$ distribution, so does the LM statistic.
We now illustrate the relationship among $W$, $LM$, and $LR$, and provide arguments for their asymptotic distributions in a slightly different way than before, with a regression model $Y = X\beta + u$, with $u$ distributed as $N(0, \sigma^2 I)$, and the restrictions $R\beta = r$. In that case the three basic statistics are
$$\begin{aligned} W &= (R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)/\hat\sigma^2 \\ LM &= (R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)/\tilde\sigma^2 \\ LR &= n\left(\log\tilde\sigma^2 - \log\hat\sigma^2\right) \end{aligned} \tag{91}$$
where $W$ is immediate from Eq.(45) when $\Omega$ is set equal to $\hat\sigma^2 I$, $LM$ follows by substituting (40) into (42) and setting $\tilde\Omega = \tilde\sigma^2 I$, and $LR = -2\log\lambda$ follows by substituting $\hat\sigma^2$ and $\tilde\sigma^2$, respectively, in the likelihood function and computing $-2\log\lambda$. The likelihood ratio itself can be written as
$$\lambda = \left[\frac{\hat u'\hat u/n}{\tilde u'\tilde u/n}\right]^{n/2} = \left[\frac{1}{1 + \frac{1}{n\hat\sigma^2}(R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)}\right]^{n/2} \tag{92}$$
where we have utilized Eq.(37) by dividing both sides by $S_u$ and taking the reciprocal. We can also rewrite the $F$-statistic
$$\frac{(S_r - S_u)/p}{S_u/(n-k)}$$
as
$$F = \frac{(R\hat\beta - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat\beta - r)/p}{S_u/(n-k)}. \tag{93}$$
Comparing (92) and (93) yields immediately
$$\lambda = \left[\frac{1}{1 + \frac{p}{n-k}F}\right]^{n/2} \tag{94}$$
and comparing $W$ in (91) with (92) yields
$$\lambda = \left[\frac{1}{1 + W/n}\right]^{n/2}. \tag{95}$$
Equating (94) and (95) yields
$$\frac{W}{n} = \frac{p}{n-k}F$$
or
$$W = p\left(1 + \frac{k}{n-k}\right)F. \tag{96}$$
Although the left-hand side is asymptotically distributed as $\chi^2(p)$ and $F$ has the distribution of $F(p, n-k)$, the right hand side also has asymptotic distribution $\chi^2(p)$, since the quantity $pF(p, n-k)$ converges in distribution to that $\chi^2$ distribution.

Comparing the definitions of $LM$ and $W$ in (91) yields
$$LM = \left(\frac{\hat\sigma^2}{\tilde\sigma^2}\right)W \tag{97}$$
and from Eq.(37) we have
$$\tilde\sigma^2 = \hat\sigma^2(1 + W/n). \tag{98}$$
Hence, from (97) and (98) we deduce
$$LM = \frac{W}{1 + W/n}, \tag{99}$$
and using (96) we obtain
$$LM = \frac{p\left(\frac{n}{n-k}\right)F}{1 + \frac{p}{n}\cdot\frac{n}{n-k}F} = \frac{npF}{n - k + pF}, \tag{100}$$
which converges in distribution as $n \to \infty$ to $\chi^2(p)$. From (95) we obtain
$$-2\log\lambda = LR = n\log\left(1 + \frac{W}{n}\right). \tag{101}$$
Since for positive $z$, $e^z > 1 + z$, it follows that
$$\frac{LR}{n} = \log\left(1 + \frac{W}{n}\right) < \frac{W}{n}$$
and hence $W > LR$.

We next note that for $z \geq 0$, $\log(1 + z) \geq z/(1 + z)$, since (a) at the origin the left and right hand sides are equal, and (b) at all other values of $z$ the derivative of the left-hand side, $1/(1+z)$, is greater than the slope of the right-hand side, $1/(1+z)^2$. It follows that
$$\log\left(1 + \frac{W}{n}\right) \geq \frac{W/n}{1 + W/n}.$$
Using (99) and (101), this shows that $LR \geq LM$.
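The chain $W \geq LR \geq LM$ can be verified numerically: by (91) and (98) the three statistics are computable from the restricted and unrestricted sums of squares as $W = n(S_r - S_u)/S_u$, $LR = n\log(S_r/S_u)$, and $LM = n(S_r - S_u)/S_r$. A sketch (NumPy; invented data and an invented zero restriction):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 3
X = rng.standard_normal((n, k))
Y = X @ np.array([1.0, 0.3, 0.0]) + rng.standard_normal(n)

def ssr(X, Y):
    b = np.linalg.lstsq(X, Y, rcond=None)[0]
    e = Y - X @ b
    return e @ e

S_u = ssr(X, Y)              # unrestricted
S_r = ssr(X[:, :1], Y)       # restricted: beta_2 = beta_3 = 0

W  = n * (S_r - S_u) / S_u
LR = n * np.log(S_r / S_u)
LM = n * (S_r - S_u) / S_r

assert W >= LR >= LM
```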
Recursive Residuals.
Since least squares residuals are correlated, even when the true errors u are not, it is inappropriate
to use the least squares residuals for tests of the hypothesis that the true errors are uncorrelated. It
may therefore be useful to be able to construct residuals that are uncorrelated when the true errors
are. In order to develop the theory of uncorrelated residuals, we rst prove a matrix theorem.
Theorem 35 (Bartlett's). If $A$ is a nonsingular $n \times n$ matrix, if $u$ and $v$ are $n$-vectors, and if $B = A + uv'$, then
$$B^{-1} = A^{-1} - \frac{A^{-1}uv'A^{-1}}{1 + v'A^{-1}u}.$$
Proof. To show this, we verify that pre- or postmultiplying the above by $B$ yields an identity matrix. Thus, postmultiplying yields
$$\begin{aligned} B^{-1}B = I &= \left[A^{-1} - \frac{A^{-1}uv'A^{-1}}{1 + v'A^{-1}u}\right](A + uv') \\ &= I - \frac{A^{-1}uv'}{1 + v'A^{-1}u} + A^{-1}uv' - \frac{A^{-1}uv'A^{-1}uv'}{1 + v'A^{-1}u} \\ &= I + \frac{-A^{-1}uv' + A^{-1}uv' + A^{-1}uv'(v'A^{-1}u) - A^{-1}u(v'A^{-1}u)v'}{1 + v'A^{-1}u} \\ &= I. \end{aligned} \tag{102}$$
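Theorem 35 is immediate to verify numerically; the sketch below (NumPy, with random $A$, $u$, $v$; not part of the original notes) compares the identity against a direct inverse of the rank-one update:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)   # comfortably nonsingular
u = rng.standard_normal(n)
v = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
# Theorem 35 (Bartlett's identity): inverse of the rank-one update A + uv'
Binv = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)

assert np.allclose(Binv, np.linalg.inv(A + np.outer(u, v)))
```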
We consider the standard regression model $Y = X\beta + u$, where $u$ is distributed as $N(0, \sigma^2 I)$ and where $X$ is $n \times k$ of rank $k$. Define $X_j$ to represent the first $j$ rows of the $X$-matrix, $Y_j$ the first $j$ rows of the $Y$-vector, $x_j'$ the $j$th row of $X$, and $y_j$ the $j$th element of $Y$. It follows from the definitions, for example, that
$$X_j = \begin{pmatrix} X_{j-1} \\ x_j' \end{pmatrix} \quad\text{and}\quad Y_j = \begin{pmatrix} Y_{j-1} \\ y_j \end{pmatrix}.$$
Define the regression coefficient estimate based on the first $j$ observations as
$$\beta_j = (X_j'X_j)^{-1}X_j'Y_j. \tag{103}$$
We then have the following

Theorem 36.
$$\beta_j = \beta_{j-1} + \frac{(X_{j-1}'X_{j-1})^{-1}x_j(y_j - x_j'\beta_{j-1})}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j}.$$
Proof. By Theorem 35,
$$(X_j'X_j)^{-1} = (X_{j-1}'X_{j-1})^{-1} - \frac{(X_{j-1}'X_{j-1})^{-1}x_jx_j'(X_{j-1}'X_{j-1})^{-1}}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j}.$$
We also have by definition that
$$X_j'Y_j = X_{j-1}'Y_{j-1} + x_jy_j.$$
Substituting this in Eq.(103) gives
$$\begin{aligned} \beta_j &= \left[(X_{j-1}'X_{j-1})^{-1} - \frac{(X_{j-1}'X_{j-1})^{-1}x_jx_j'(X_{j-1}'X_{j-1})^{-1}}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j}\right](X_{j-1}'Y_{j-1} + x_jy_j) \\ &= \beta_{j-1} + (X_{j-1}'X_{j-1})^{-1}x_jy_j - \frac{(X_{j-1}'X_{j-1})^{-1}x_jx_j'\left[(X_{j-1}'X_{j-1})^{-1}X_{j-1}'Y_{j-1}\right] + (X_{j-1}'X_{j-1})^{-1}x_jx_j'(X_{j-1}'X_{j-1})^{-1}x_jy_j}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j} \\ &= \beta_{j-1} + \frac{(X_{j-1}'X_{j-1})^{-1}x_j(y_j - x_j'\beta_{j-1})}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j}, \end{aligned}$$
where, in the second line, we bring the second and third terms over a common denominator and also note that the bracketed expression in the numerator is $\beta_{j-1}$ by definition.
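Theorem 36 gives an updating formula that avoids refitting the regression as each observation arrives. A numerical sketch (NumPy; invented data), starting from the first $k$ observations and updating one row at a time, reproduces full-sample OLS:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 3
X = rng.standard_normal((n, k))
Y = X @ np.array([2.0, -1.0, 0.5]) + 0.2 * rng.standard_normal(n)

def ols(Xj, Yj):
    return np.linalg.solve(Xj.T @ Xj, Xj.T @ Yj)

# Start from the first k observations, then update one row at a time (Theorem 36)
b = ols(X[:k], Y[:k])
for j in range(k, n):
    Cinv_x = np.linalg.solve(X[:j].T @ X[:j], X[j])   # (X'_{j-1}X_{j-1})^{-1} x_j
    b = b + Cinv_x * (Y[j] - X[j] @ b) / (1.0 + X[j] @ Cinv_x)

assert np.allclose(b, ols(X, Y))
```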
First define
$$d_j = \left[1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j\right]^{1/2} \tag{104}$$
and also define the recursive residuals $\tilde u_j$ as
$$\tilde u_j = \frac{y_j - x_j'\beta_{j-1}}{d_j}, \qquad j = k+1, \ldots, n. \tag{105}$$
Hence, recursive residuals are defined only when the $\beta$'s can be estimated from at least $k$ observations, since for $j$ less than $k+1$, $X_{j-1}'X_{j-1}$ would be singular. Hence the vector $\tilde u$ can be written as
$$\tilde u = CY, \tag{106}$$
where
$$C = \begin{pmatrix} -\dfrac{x_{k+1}'(X_k'X_k)^{-1}X_k'}{d_{k+1}} & \dfrac{1}{d_{k+1}} & 0 & \cdots & 0 \\ -\dfrac{x_{k+2}'(X_{k+1}'X_{k+1})^{-1}X_{k+1}'}{d_{k+2}} & 0 & \dfrac{1}{d_{k+2}} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ -\dfrac{x_n'(X_{n-1}'X_{n-1})^{-1}X_{n-1}'}{d_n} & 0 & 0 & \cdots & \dfrac{1}{d_n} \end{pmatrix}. \tag{107}$$
Since the matrix $X_j'$ has $j$ columns, the fractions that appear in the first column of $C$ are rows with increasingly more columns; hence the term denoted generally by $1/d_j$ occurs in columns of the $C$ matrix further and further to the right. Thus, the element $1/d_{k+1}$ is in column $k+1$, $1/d_{k+2}$ in column $k+2$, and so on. It is also clear that $C$ is an $(n-k) \times n$ matrix. We then have
Theorem 37. (1) $\tilde u$ is linear in $Y$; (2) $E(\tilde u) = 0$; (3) the covariance matrix of $\tilde u$ is scalar, i.e., $CC' = I_{n-k}$; (4) for all linear, unbiased residuals with a scalar covariance matrix, $\sum_{i=k+1}^n \tilde u_i^2 = \sum_{i=1}^n \hat u_i^2$, where $\hat u$ is the vector of ordinary least squares residuals.
Proof. (1) The linearity of $\tilde u$ in $Y$ is obvious from Eq.(106).

(2) It is easy to show that $CX = 0$ by multiplying Eq.(107) by $X$ on the right. Multiplying, for example, the $(p-k)$th row of $C$ $(p = k+1, \ldots, n)$ into $X$, we obtain
$$-\frac{x_p'(X_{p-1}'X_{p-1})^{-1}X_{p-1}'}{d_p}X_{p-1} + \frac{1}{d_p}x_p' = 0.$$
It then follows that $E(\tilde u) = E(CY) = E(C(X\beta + u)) = E(Cu) = 0$.
(3) Multiplying the $(p-k)$th row of $C$ into the $(p-k)$th column of $C'$, we obtain
$$\frac{1}{d_p^2} + \frac{x_p'(X_{p-1}'X_{p-1})^{-1}X_{p-1}'X_{p-1}(X_{p-1}'X_{p-1})^{-1}x_p}{d_p^2} = \frac{1 + x_p'(X_{p-1}'X_{p-1})^{-1}x_p}{d_p^2} = 1$$
by definition. Multiplying the $(p-k)$th row of $C$ into the $(s-k)$th column of $C'$, with $p < s$, we obtain
$$\left(-\frac{x_p'(X_{p-1}'X_{p-1})^{-1}X_{p-1}'}{d_p}\ \ \frac{1}{d_p}\ \ 0\ \cdots\ 0\right)\begin{pmatrix} X_{p-1} \\ x_p' \\ x_{p+1}' \\ \vdots \end{pmatrix}\left(-\frac{(X_{s-1}'X_{s-1})^{-1}x_s}{d_s}\right) = \left[\frac{x_p'}{d_pd_s} - \frac{x_p'}{d_pd_s}\right](X_{s-1}'X_{s-1})^{-1}x_s = 0.$$
(4) We first prove that $C'C = I - X(X'X)^{-1}X'$. Define $M = I - X(X'X)^{-1}X'$; then, since $X'C' = (CX)' = 0$,
$$MC' = (I - X(X'X)^{-1}X')C' = C'. \tag{108}$$
Hence,
$$(M - I)C' = 0. \tag{109}$$
But for any square matrix $A$ and any eigenvalue $\mu$ of $A$, if $(A - \mu I)w = 0$, then $w$ is an eigenvector of $A$. Since $M$ is idempotent, and by Theorem 5 the eigenvalues of $M$ are all zero or 1, the columns of $C'$ are eigenvectors of $M$ corresponding to the unit roots (which are $n - k$ in number, because the trace of $M$ is $n - k$).

Now let $G'$ denote the matrix whose columns are the eigenvectors corresponding to the zero roots, normalized so that the matrix of eigenvectors $[\,C' \ G'\,]$ is orthogonal:
$$[\,C' \ G'\,]\begin{pmatrix} C \\ G \end{pmatrix} = I.$$
Let $\Lambda$ denote the diagonal matrix of eigenvalues for some matrix $A$ and let $W$ be the matrix of its eigenvectors. Then $AW = W\Lambda$; applying this to the present case yields
$$M[\,C' \ G'\,] = [\,C' \ G'\,]\begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} = [\,C' \ 0\,].$$
Hence
$$M = MI = M[\,C' \ G'\,]\begin{pmatrix} C \\ G \end{pmatrix} = [\,C' \ 0\,]\begin{pmatrix} C \\ G \end{pmatrix} = C'C.$$
But
$$\sum_{i=k+1}^n \tilde u_i^2 = Y'C'CY = Y'\left[I - X(X'X)^{-1}X'\right]Y = \sum_{i=1}^n \hat u_i^2.$$
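Part (4) of Theorem 37 can be checked directly: computing the recursive residuals from (104)–(105) and the ordinary least squares residuals on the same (invented) data, the two sums of squares agree. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X = rng.standard_normal((n, k))
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

# Recursive residuals (105): u_tilde_j = (y_j - x_j' b_{j-1}) / d_j, j = k+1..n
u_tilde = []
for j in range(k, n):
    Xp, Yp = X[:j], Y[:j]
    b_prev = np.linalg.solve(Xp.T @ Xp, Xp.T @ Yp)
    d2 = 1.0 + X[j] @ np.linalg.solve(Xp.T @ Xp, X[j])
    u_tilde.append((Y[j] - X[j] @ b_prev) / np.sqrt(d2))
u_tilde = np.array(u_tilde)

# OLS residuals on the full sample
b = np.linalg.solve(X.T @ X, X.T @ Y)
u_hat = Y - X @ b

# Theorem 37(4): the two sums of squares agree
assert np.isclose(u_tilde @ u_tilde, u_hat @ u_hat)
```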
Now define $S_j$ by $S_j = (Y_j - X_j\beta_j)'(Y_j - X_j\beta_j)$; thus $S_j$ is the sum of the squares of the least squares residuals based on the first $j$ observations. We then have

Theorem 38. $S_j = S_{j-1} + \tilde u_j^2$.
Proof. We can write
$$\begin{aligned} S_j &= (Y_j - X_j\beta_j)'(Y_j - X_j\beta_j) = Y_j'\left(I - X_j(X_j'X_j)^{-1}X_j'\right)Y_j \\ &= Y_j'Y_j - Y_j'X_j(X_j'X_j)^{-1}X_j'X_j(X_j'X_j)^{-1}X_j'Y_j \\ &\qquad\text{(where we have multiplied by } X_j'X_j(X_j'X_j)^{-1} = I\text{)} \\ &= Y_j'Y_j - \beta_j'X_j'X_j\beta_j + 2\beta_{j-1}'\left(-X_j'Y_j + X_j'X_j\beta_j\right) \\ &\qquad\text{(where we replaced } (X_j'X_j)^{-1}X_j'Y_j \text{ by } \beta_j \text{ and where the third term has value equal to zero)} \\ &= Y_j'Y_j - \beta_j'X_j'X_j\beta_j - 2\beta_{j-1}'X_j'Y_j + 2\beta_{j-1}'X_j'X_j\beta_j + \beta_{j-1}'X_j'X_j\beta_{j-1} - \beta_{j-1}'X_j'X_j\beta_{j-1} \\ &\qquad\text{(where we have added and subtracted the last term)} \\ &= (Y_j - X_j\beta_{j-1})'(Y_j - X_j\beta_{j-1}) - (\beta_j - \beta_{j-1})'X_j'X_j(\beta_j - \beta_{j-1}). \end{aligned} \tag{110}$$
Using the definition of $X_j$ and $Y_j$ and the definition of regression coefficient estimates, we can also write
$$X_j'X_j\beta_j = X_j'Y_j = X_{j-1}'Y_{j-1} + x_jy_j = X_{j-1}'X_{j-1}\beta_{j-1} + x_jy_j = (X_j'X_j - x_jx_j')\beta_{j-1} + x_jy_j = X_j'X_j\beta_{j-1} + x_j(y_j - x_j'\beta_{j-1}),$$
and multiplying through by $(X_j'X_j)^{-1}$,
$$\beta_j = \beta_{j-1} + (X_j'X_j)^{-1}x_j(y_j - x_j'\beta_{j-1}). \tag{111}$$
Substituting from Eq.(111) for $\beta_j - \beta_{j-1}$ in Eq.(110), we obtain
$$S_j = S_{j-1} + (y_j - x_j'\beta_{j-1})^2 - x_j'(X_j'X_j)^{-1}x_j\,(y_j - x_j'\beta_{j-1})^2. \tag{112}$$
Finally, we substitute for $(X_j'X_j)^{-1}$ in Eq.(112) from Bartlett's Identity (Theorem 35), yielding
$$S_j = S_{j-1} + (y_j - x_j'\beta_{j-1})^2\,\frac{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j - x_j'(X_{j-1}'X_{j-1})^{-1}x_j - \left(x_j'(X_{j-1}'X_{j-1})^{-1}x_j\right)^2 + \left(x_j'(X_{j-1}'X_{j-1})^{-1}x_j\right)^2}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j} = S_{j-1} + \frac{(y_j - x_j'\beta_{j-1})^2}{1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j},$$
from which the Theorem follows immediately, since $\tilde u_j$ is defined as
$$(y_j - x_j'\beta_{j-1})/\left[1 + x_j'(X_{j-1}'X_{j-1})^{-1}x_j\right]^{1/2}.$$
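Theorem 38 can likewise be verified step by step: each squared recursive residual is exactly the increment in the running sum of squared OLS residuals. A NumPy sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
Y = 1.0 + 0.5 * X[:, 1] + rng.standard_normal(n)

def S(j):
    # Sum of squared OLS residuals on the first j observations
    b = np.linalg.solve(X[:j].T @ X[:j], X[:j].T @ Y[:j])
    e = Y[:j] - X[:j] @ b
    return e @ e

total = 0.0
for j in range(k + 1, n + 1):           # 1-based observation index
    Xp = X[:j - 1]                      # X_{j-1}
    b_prev = np.linalg.solve(Xp.T @ Xp, Xp.T @ Y[:j - 1])
    d2 = 1.0 + X[j - 1] @ np.linalg.solve(Xp.T @ Xp, X[j - 1])
    u2 = (Y[j - 1] - X[j - 1] @ b_prev) ** 2 / d2   # squared recursive residual
    assert np.isclose(S(j), S(j - 1) + u2)          # Theorem 38
    total += u2
```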
We now briefly return to the case of testing the equality of regression coefficients in two regressions in the case of insufficient degrees of freedom (i.e., the Chow test). As in Case 4, on p. 13, the number of observations in the two data sets is $n_1$ and $n_2$ respectively. Denoting the sum of squares from the regression on the first $n_1$ observations by $\hat u_u'\hat u_u$ and the sum of squares using all $n_1 + n_2$ observations by $\hat u_r'\hat u_r$, where the $\hat u$'s are the ordinary (not recursive) least squares residuals, the test statistic can be written as
$$\frac{(\hat u_r'\hat u_r - \hat u_u'\hat u_u)/n_2}{\hat u_u'\hat u_u/(n_1 - k)}.$$
By Theorem 37, this can be written as
$$\frac{\left(\sum_{i=k+1}^{n_1+n_2}\tilde u_i^2 - \sum_{i=k+1}^{n_1}\tilde u_i^2\right)/n_2}{\sum_{i=k+1}^{n_1}\tilde u_i^2/(n_1 - k)} = \frac{\sum_{i=n_1+1}^{n_1+n_2}\tilde u_i^2/n_2}{\sum_{i=k+1}^{n_1}\tilde u_i^2/(n_1 - k)}.$$
It may be noted that the numerator and denominator share no value of $\tilde u_i$; since the $\tilde u$'s are independent, the numerator and denominator are independently distributed. Moreover, each $\tilde u_i$ has zero mean, is normally distributed, is independent of every other $\tilde u_j$, and has variance $\sigma^2$, since
$$E(\tilde u_i^2) = E\left[\frac{\left(x_i'(\beta - \beta_{i-1}) + u_i\right)^2}{1 + x_i'(X_{i-1}'X_{i-1})^{-1}x_i}\right] = \frac{x_i'E\left[(\beta - \beta_{i-1})(\beta - \beta_{i-1})'\right]x_i + E(u_i^2)}{1 + x_i'(X_{i-1}'X_{i-1})^{-1}x_i} = \sigma^2.$$
Hence, the ratio has an $F$ distribution, as argued earlier.
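The identity used above — that the numerator of the Chow statistic is the sum of the last $n_2$ squared recursive residuals — can be confirmed numerically (NumPy; invented data, with $n_2 < k$ so that a separate regression on the second sample is infeasible):

```python
import numpy as np

rng = np.random.default_rng(8)
n1, n2, k = 30, 3, 5                      # second sample too small for its own regression
n = n1 + n2
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
Y = X @ np.arange(1.0, k + 1.0) + rng.standard_normal(n)

def ssr(Xj, Yj):
    b = np.linalg.solve(Xj.T @ Xj, Xj.T @ Yj)
    e = Yj - Xj @ b
    return e @ e

S_u = ssr(X[:n1], Y[:n1])                 # first n1 observations
S_r = ssr(X, Y)                           # all n1 + n2 observations
F = ((S_r - S_u) / n2) / (S_u / (n1 - k))

# The numerator equals the sum of the last n2 squared recursive residuals
num = 0.0
for j in range(n1, n):
    Xp, Yp = X[:j], Y[:j]
    b_prev = np.linalg.solve(Xp.T @ Xp, Xp.T @ Yp)
    d2 = 1.0 + X[j] @ np.linalg.solve(Xp.T @ Xp, X[j])
    num += (Y[j] - X[j] @ b_prev) ** 2 / d2
assert np.isclose(S_r - S_u, num)
```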
Cusum of Squares Test. We consider a test of the hypothesis that a change in the true values of the regression coefficients occurred at some observation in a series of observations. For this purpose we define
$$Q_i = \frac{\sum_{j=k+1}^i \tilde u_j^2}{\sum_{j=k+1}^n \tilde u_j^2}, \tag{113}$$
where the $\tilde u_j$ represent the recursive residuals. We now have

Theorem 39. On the hypothesis that the values of the regression coefficients do not change, the random variable $1 - Q_i$ has a Beta distribution, and $E(Q_i) = (i - k)/(n - k)$.
Proof. From Eq.(113), we can write
$$Q_i^{-1} - 1 = \frac{\sum_{j=i+1}^n \tilde u_j^2}{\sum_{j=k+1}^i \tilde u_j^2}. \tag{114}$$
Since the numerator and denominator of Eq.(114) are sums of squares of iid normal variables with zero mean and constant variance, and since the numerator and denominator share no common $\tilde u_j$, the quantity
$$z = (Q_i^{-1} - 1)\frac{i - k}{n - i}$$
is distributed as $F(n - i, i - k)$. Consider the distribution of the random variable $w$ defined by
$$\frac{n - i}{i - k}\,z = \frac{w}{1 - w}.$$
Then the density of $w$ is the Beta density
$$\frac{\Gamma(\alpha + \beta + 2)}{\Gamma(\alpha + 1)\Gamma(\beta + 1)}w^{\alpha}(1 - w)^{\beta},$$
with $\alpha = -1 + (n - i)/2$ and $\beta = -1 + (i - k)/2$; note that $w = 1 - Q_i$. It follows that
$$E(1 - Q_i) = \frac{\alpha + 1}{\alpha + \beta + 2} = \frac{n - i}{n - k},$$
and
$$E(Q_i) = \frac{i - k}{n - k}. \tag{115}$$
Durbin (Biometrika, 1969, pp. 1–15) provides tables for constructing confidence bands for $Q_i$ of the form $E(Q_i) \pm c_0$.
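Theorem 39's mean, $E(Q_i) = (i - k)/(n - k)$, can be checked by Monte Carlo under stable coefficients. A NumPy sketch (all design choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k, reps = 25, 2, 500
i = 10                                    # arbitrary interior index, k < i < n

X = np.column_stack([np.ones(n), rng.standard_normal(n)])

def rec_residuals(Y):
    out = []
    for j in range(k, n):
        Xp = X[:j]
        b = np.linalg.solve(Xp.T @ Xp, Xp.T @ Y[:j])
        d2 = 1.0 + X[j] @ np.linalg.solve(Xp.T @ Xp, X[j])
        out.append((Y[j] - X[j] @ b) / np.sqrt(d2))
    return np.array(out)

Q = []
for _ in range(reps):
    Y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)
    u = rec_residuals(Y)
    Q.append(np.sum(u[:i - k] ** 2) / np.sum(u ** 2))
Q = np.array(Q)

# Theorem 39: E(Q_i) = (i - k)/(n - k)
print(Q.mean(), (i - k) / (n - k))
```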