
Chapter 3

Classical Least Squares Theory

In the field of economics, numerous hypotheses and theories have been proposed in or-
der to describe the behavior of economic agents and the relationships between economic
variables. Although these propositions may be theoretically appealing and logically cor-
rect, they may not be practically relevant unless they are supported by real world data.
A theory with supporting empirical evidence is of course more convincing. Therefore,
empirical analysis has become an indispensable ingredient of contemporary economic
research. By econometrics we mean the collection of statistical and mathematical meth-
ods that utilize data to analyze the relationships between economic variables.

A leading approach in econometrics is the regression analysis. For this analysis one
must first specify a regression model that characterizes the relationship of economic
variables; the simplest and most commonly used specification is the linear model. The
linear regression analysis then involves estimating unknown parameters of this specifi-
cation, testing various economic and econometric hypotheses, and drawing inferences
from the testing results. This chapter is concerned with one of the most important
estimation methods in linear regression, namely, the method of ordinary least squares
(OLS). We will analyze the OLS estimators of parameters and their properties. Testing
methods based on the OLS estimation results will also be presented. We will not discuss
asymptotic properties of the OLS estimators until Chapter 6. Readers can find related
topics in other econometrics textbooks, e.g., Davidson and MacKinnon (1993), Gold-
berger (1991), Greene (2000), Harvey (1990), Intriligator et al. (1996), Johnston (1984),
Judge et al. (1988), Maddala (1992), Ruud (2000), and Theil (1971), among many oth-
ers.


3.1 The Method of Ordinary Least Squares


Suppose that there is a variable, y, whose behavior over time (or across individual units)
is of interest to us. A theory may suggest that the behavior of y can be well characterized
by some function f of the variables x1 , . . . , xk . Then, f (x1 , . . . , xk ) may be viewed as a
“systematic” component of y provided that no other variables can further account for
the behavior of the residual y − f (x1 , . . . , xk ). In the context of linear regression, the
function f is specified as a linear function. The unknown linear weights (parameters)
of the linear specification can then be determined using the OLS method.

3.1.1 Simple Linear Regression

In simple linear regression, only one variable x is designated to describe the behavior of
the variable y. The linear specification is

α + βx,

where α and β are unknown parameters. We can then write

y = α + βx + e(α, β),

where e(α, β) = y − α − βx denotes the error resulting from this specification. Different
parameter values result in different errors. In what follows, y will be referred to as
the dependent variable (regressand) and x an explanatory variable (regressor). Note
that both the regressand and regressor may be a function of some other variables. For
example, when x = z²,

y = α + βz² + e(α, β).

This specification is not linear in the variable z but is linear in x (hence linear in
parameters). When y = log w and x = log z, we have

log w = α + β(log z) + e(α, β),

which is still linear in parameters. Such specifications can all be analyzed in the context
of linear regression.

Suppose that we have T observations of the variables y and x. Given the linear
specification above, our objective is to find suitable α and β such that the resulting
linear function “best” fits the data (yt , xt ), t = 1, . . . , T . Here, the generic subscript t
is used for both cross-section and time-series data. The OLS method suggests to find



a straight line whose sum of squared errors is as small as possible. This amounts to
finding α and β that minimize the following OLS criterion function:

Q(\alpha, \beta) := \frac{1}{T} \sum_{t=1}^{T} e_t(\alpha, \beta)^2 = \frac{1}{T} \sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2.

The solutions can be easily obtained by solving the first order conditions of this mini-
mization problem.

The first order conditions are:

\frac{\partial Q(\alpha, \beta)}{\partial \alpha} = -\frac{2}{T} \sum_{t=1}^{T} (y_t - \alpha - \beta x_t) = 0,

\frac{\partial Q(\alpha, \beta)}{\partial \beta} = -\frac{2}{T} \sum_{t=1}^{T} (y_t - \alpha - \beta x_t) x_t = 0.

Solving for α and β we have the following solutions:


\hat{\beta}_T = \frac{\sum_{t=1}^{T} (y_t - \bar{y})(x_t - \bar{x})}{\sum_{t=1}^{T} (x_t - \bar{x})^2},

\hat{\alpha}_T = \bar{y} - \hat{\beta}_T \bar{x},

where ȳ = ∑_{t=1}^{T} yt /T and x̄ = ∑_{t=1}^{T} xt /T . As α̂T and β̂T are obtained by minimizing the
OLS criterion function, they are known as the OLS estimators of α and β, respectively.
The subscript T of α̂T and β̂T signifies that these solutions are obtained from a sample
of T observations. Note that if xt is a constant c for every t, then x̄ = c, and hence β̂T
cannot be computed.

The function ŷ = α̂T + β̂T x is the estimated regression line with the intercept α̂T
and slope β̂T . We also say that this line is obtained by regressing y on (the constant one
and) the regressor x. The regression line so computed gives the “best” fit of data, in
the sense that any other linear function of x would yield a larger sum of squared errors.
For a given xt , the OLS fitted value is a point on the regression line:

ŷt = α̂T + β̂T xt .

The difference between yt and ŷt is the t th OLS residual:

êt := yt − ŷt ,

which corresponds to the error of the specification as

êt = et (α̂T , β̂T ).
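To make these formulas concrete, here is a minimal Python sketch (the data are hypothetical and serve only as an illustration) that computes α̂T , β̂T , the fitted values, and the residuals:

```python
import numpy as np

# Hypothetical sample of T = 6 observations (y_t, x_t).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 5.1, 5.8, 7.2])

# OLS estimators of the simple linear specification y = alpha + beta * x + e.
x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

y_hat = alpha_hat + beta_hat * x   # OLS fitted values
e_hat = y - y_hat                  # OLS residuals
print(alpha_hat, beta_hat, e_hat.sum())   # residual sum is numerically zero
```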



Note that regressing y on x and regressing x on y lead to different regression lines in general, except when all (yt , xt ) lie on the same line; see Exercise 3.9.

Remark: Different criterion functions would result in other estimators. For exam-
ple, the so-called least absolute deviation estimator can be obtained by minimizing the
average of the sum of absolute errors:

\frac{1}{T} \sum_{t=1}^{T} |y_t - \alpha - \beta x_t|,

which in turn determines a different regression line. We refer to Manski (1991) for a
comprehensive discussion of this topic.

3.1.2 Multiple Linear Regression

More generally, we may specify a linear function with k explanatory variables to describe
the behavior of y:

β1 x1 + β2 x2 + · · · + βk xk ,

so that

y = β1 x1 + β2 x2 + · · · + βk xk + e(β1 , . . . , βk ),

where e(β1 , . . . , βk ) again denotes the error of this specification. Given a sample of T
observations, this specification can also be expressed as

y = Xβ + e(β), (3.1)

where β = (β1 β2 · · · βk )' is the vector of unknown parameters, and y and X contain all the observations of the dependent and explanatory variables, i.e.,

y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \qquad
X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{T1} & x_{T2} & \cdots & x_{Tk} \end{bmatrix},

where each column of X contains T observations of an explanatory variable, and e(β) is the vector of errors. It is typical to set the first explanatory variable as the constant one, so that the first column of X is the T × 1 vector of ones, ℓ. For convenience, we also write e(β) as e and its element et (β) as et .



Our objective now is to find a k-dimensional regression hyperplane that “best” fits
the data (y, X). In the light of Section 3.1.1, we would like to minimize, with respect
to β, the average of the sum of squared errors:
Q(\beta) := \frac{1}{T} e(\beta)'e(\beta) = \frac{1}{T} (y - X\beta)'(y - X\beta).    (3.2)
This is a well-defined problem provided that the basic identification requirement below
holds for the specification (3.1).
[ID-1] The T × k data matrix X is of full column rank k.
Under [ID-1], the number of regressors, k, must be no greater than the number of
observations, T . This is so because if k > T , the rank of X must be less than or equal to
T , and hence X cannot have full column rank. Moreover, [ID-1] requires that the linear specification contain no “redundant” regressor; that is, no column vector of X can be written as a linear combination of the other column vectors. For example, X
contains a column of ones and a column of xt in simple linear regression. These two
columns would be linearly dependent if xt = c for every t. Thus, [ID-1] requires that xt
in simple linear regression is not a constant.
The first order condition of the OLS minimization problem is

∇β Q(β) = ∇β (y'y − 2y'Xβ + β'X'Xβ)/T = 0.

By the matrix differentiation results in Section 1.2, we have

∇β Q(β) = −2X'(y − Xβ)/T = 0.

Equivalently, we can write

X'Xβ = X'y.    (3.3)

These k equations, also known as the normal equations, contain exactly k unknowns.
Given [ID-1], X is of full column rank so that X'X is positive definite and hence
invertible by Lemma 1.13. It follows that the unique solution to the first order condition
is

β̂ T = (X'X)−1 X'y.    (3.4)

Moreover, the second order condition is also satisfied because

∇²β Q(β) = 2(X'X)/T

is a positive definite matrix under [ID-1]. Thus, β̂ T is the unique minimizer of the OLS
criterion function and hence known as the OLS estimator of β. This result is formally
stated below.



Theorem 3.1 Given the specification (3.1), suppose that [ID-1] holds. Then, the OLS
estimator β̂ T given by (3.4) uniquely minimizes the OLS criterion function (3.2).
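As an illustration of (3.4), the sketch below (with hypothetical data) computes β̂ T by solving the normal equations (3.3); in practice a least-squares solver is numerically preferable to forming (X'X)−1 explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
# Hypothetical data: a constant and two other regressors (k = 3), plus an arbitrary y.
X = np.column_stack([np.ones(T), rng.normal(size=T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T)

# OLS estimator from the normal equations X'X b = X'y, cf. (3.3) and (3.4).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically more stable equivalent: a dedicated least-squares routine.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

e_hat = y - X @ beta_hat
print(np.allclose(beta_hat, beta_ls))   # True: both give the same estimates
print(np.max(np.abs(X.T @ e_hat)))      # numerically zero: X'e = 0
```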

If X is not of full column rank, its column vectors are linearly dependent and there-
fore satisfy an exact linear relationship. This is the problem of exact multicollinearity.
In this case, X'X is not invertible, so that there exist infinitely many solutions to the normal equations X'Xβ = X'y. As such, the OLS estimator β̂ T cannot be uniquely determined. See Exercise 3.4 for a geometric interpretation of this result. Exact multicollinearity usually arises from inappropriate model specifications. For example, including total income, total wage income, and total non-wage income as regressors results in exact multicollinearity because total income is, by definition, the sum of wage
and non-wage income; see also Section 3.5.2 for another example. In what follows, the
identification requirement for the linear specification (3.1) is always assumed.

Remarks:

1. Theorem 3.1 does not depend on the “true” relationship between y and X. Thus, whether (3.1) agrees with the true relationship between y and X is irrelevant to the existence and uniqueness of the OLS estimator.

2. It is easy to verify that the magnitudes of the coefficient estimates β̂i , i = 1, . . . , k, are affected by the measurement units of dependent and explanatory variables; see
Exercise 3.7. As such, a larger coefficient estimate does not necessarily imply that
the associated regressor is more important in explaining the behavior of y. In fact,
the coefficient estimates are not directly comparable in general; cf. Exercise 3.5.

Once the OLS estimator β̂ T is obtained, we can plug it into the original linear
specification and obtain the vector of OLS fitted values:

ŷ = X β̂ T .

The vector of OLS residuals is then

ê = y − ŷ = e(β̂ T ).

From the normal equations (3.3) we can deduce the following algebraic results. First,
the OLS residual vector must satisfy the normal equations:

X'(y − X β̂ T ) = X'ê = 0,



so that X'ê = 0. When X contains a column of constants (i.e., a column of X is proportional to ℓ, the vector of ones), X'ê = 0 implies

\ell'\hat{e} = \sum_{t=1}^{T} \hat{e}_t = 0.

That is, the sum of OLS residuals must be zero. Second,

ŷ'ê = β̂ T'X'ê = 0.

These results are summarized below.

Theorem 3.2 Given the specification (3.1), suppose that [ID-1] holds. Then, the vector
of OLS fitted values ŷ and the vector of OLS residuals ê have the following properties.
(a) X'ê = 0; in particular, if X contains a column of constants, ∑_{t=1}^{T} êt = 0.

(b) ŷ'ê = 0.

Note that when ℓ'ê = ℓ'(y − ŷ) = 0, we have

\frac{1}{T} \sum_{t=1}^{T} y_t = \frac{1}{T} \sum_{t=1}^{T} \hat{y}_t.

That is, the sample average of the data yt is the same as the sample average of the fitted
values ŷt when X contains a column of constants.

3.1.3 Geometric Interpretations

The OLS estimation result has nice geometric interpretations. These interpretations
have nothing to do with the stochastic properties to be discussed in Section 3.2, and
they are valid as long as the OLS estimator exists.

In what follows, we write P = X(X'X)−1 X', which is the orthogonal projection matrix that projects vectors onto span(X) by Lemma 1.14. The vector of OLS fitted values can be written as

ŷ = X(X'X)−1 X'y = P y.

Hence, ŷ is the orthogonal projection of y onto span(X). The OLS residual vector is

ê = y − ŷ = (I T − P )y,



[Figure 3.1: The orthogonal projection of y onto span(x1, x2).]

which is the orthogonal projection of y onto span(X)⊥ and hence is orthogonal to ŷ and X; cf. Theorem 3.2. Consequently, ŷ is the “best approximation” of y, given the
information contained in X, as shown in Lemma 1.10. Figure 3.1 illustrates a simple
case where there are only two explanatory variables in the specification.

The following results are useful in many applications.

Theorem 3.3 (Frisch-Waugh-Lovell) Given the specification

y = X 1 β 1 + X 2 β 2 + e,

where X 1 is of full column rank k1 and X 2 is of full column rank k2 , and let β̂ T = (β̂ 1,T' β̂ 2,T')' denote the corresponding OLS estimators. Then,

β̂ 1,T = [X 1'(I − P 2 )X 1 ]−1 X 1'(I − P 2 )y,

β̂ 2,T = [X 2'(I − P 1 )X 2 ]−1 X 2'(I − P 1 )y,

where P 1 = X 1 (X 1'X 1 )−1 X 1' and P 2 = X 2 (X 2'X 2 )−1 X 2'.

Proof: These results can be directly verified from (3.4) using the matrix inversion
formula in Section 1.4. Alternatively, write

y = X 1 β̂ 1,T + X 2 β̂ 2,T + (I − P )y,



where P = X(X'X)−1 X' with X = [X 1 X 2 ]. Pre-multiplying both sides by X 1'(I − P 2 ), we have

X 1'(I − P 2 )y = X 1'(I − P 2 )X 1 β̂ 1,T + X 1'(I − P 2 )X 2 β̂ 2,T + X 1'(I − P 2 )(I − P )y.

The second term on the right-hand side vanishes because (I − P 2 )X 2 = 0. For the
third term, we know span(X 2 ) ⊆ span(X), so that span(X)⊥ ⊆ span(X 2 )⊥ . As each
column vector of I − P is in span(X)⊥ , I − P is not affected if it is projected onto
span(X 2 )⊥ . That is,

(I − P 2 )(I − P ) = I − P .

Similarly, X 1 is in span(X), and hence (I − P )X 1 = 0. It follows that

X 1'(I − P 2 )y = X 1'(I − P 2 )X 1 β̂ 1,T ,

from which we obtain the expression for β̂ 1,T . The proof for β̂ 2,T is similar. 2

Theorem 3.3 shows that β̂ 1,T can be computed from regressing (I − P 2 )y on (I − P 2 )X 1 , where (I − P 2 )y and (I − P 2 )X 1 are the residual vectors of the “purging”
regressions of y on X 2 and X 1 on X 2 , respectively. Similarly, β̂ 2,T can be obtained by
regressing (I − P 1 )y on (I − P 1 )X 2 , where (I − P 1 )y and (I − P 1 )X 2 are the residual
vectors of the regressions of y on X 1 and X 2 on X 1 , respectively.
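The Frisch-Waugh-Lovell result is easy to verify numerically. In the sketch below (hypothetical data), X 1 holds a constant and a linear trend and X 2 holds two further regressors, so the example also previews the detrending application discussed at the end of this subsection.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
X1 = np.column_stack([np.ones(T), np.arange(T, dtype=float)])   # constant and trend
X2 = rng.normal(size=(T, 2))                                    # two more regressors
y = rng.normal(size=T)                                          # arbitrary regressand

X = np.column_stack([X1, X2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)               # full regression

def purge(A, Z):
    """Residuals of regressing each column of Z on A, i.e. (I - P_A) Z."""
    coef, *_ = np.linalg.lstsq(A, Z, rcond=None)
    return Z - A @ coef

# Regress the purged y on the purged X2: this reproduces the X2 coefficients.
beta2_fwl, *_ = np.linalg.lstsq(purge(X1, X2), purge(X1, y), rcond=None)
print(np.allclose(beta_full[2:], beta2_fwl))    # True
```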

From Theorem 3.3 we can deduce the following results. Consider the regression of
(I − P 1 )y on (I − P 1 )X 2 . By Theorem 3.3 we have

(I − P 1 )y = (I − P 1 )X 2 β̂ 2,T + residual vector, (3.5)

where the residual vector is

(I − P 1 )(I − P )y = (I − P )y.

Thus, the residual vector of (3.5) is identical to the residual vector of regressing y on
X = [X 1 X 2 ]. Note that (I − P 1 )(I − P ) = I − P implies P 1 = P 1 P . That is, the
orthogonal projection of y directly on span(X 1 ) is equivalent to performing iterated
projections of y on span(X) and then on span(X 1 ). The orthogonal projection part of
(3.5) now can be expressed as

(I − P 1 )X 2 β̂ 2,T = (I − P 1 )P y = (P − P 1 )y.



[Figure 3.2: An illustration of the Frisch-Waugh-Lovell Theorem.]

These relationships are illustrated in Figure 3.2. Similarly, we have

(I − P 2 )y = (I − P 2 )X 1 β̂ 1,T + residual vector,

where the residual vector is also (I − P )y, and the orthogonal projection part of this
regression is (P − P 2 )y. See also Davidson and MacKinnon (1993) for more details.

Intuitively, Theorem 3.3 suggests that β̂ 1,T in effect describes how X 1 characterizes
y, after the effect of X 2 is excluded. Thus, β̂ 1,T is different from the OLS estimator of
regressing y on X 1 because the effect of X 2 is not controlled in the latter. These two
estimators would be the same if P 2 X 1 = 0, i.e., X 1 is orthogonal to X 2 . Also, β̂ 2,T
describes how X 2 characterizes y, after the effect of X 1 is excluded, and it is different
from the OLS estimator from regressing y on X 2 , unless X 1 and X 2 are orthogonal to
each other.

As an application, consider the specification with X = [X 1 X 2 ], where X 1 contains the constant term and a time trend variable t, and X 2 includes the other k − 2
explanatory variables. This specification is useful when the variables of interest exhibit
a trending behavior. Then, the OLS estimators of the coefficients of X 2 are the same
as those obtained from regressing (detrended) y on detrended X 2 , where detrended y
and X 2 are the residuals of regressing y and X 2 on X 1 , respectively. See Section 3.5.2
and Exercise 3.11 for other applications.



3.1.4 Measures of Goodness of Fit

We have learned from previous sections that, when the explanatory variables in a linear specification are given, the OLS method yields the best fit of the data. In practice, one may consider linear specifications with different sets of regressors and try to choose a particular one from them. It is therefore of interest to compare the performance across different specifications. In this section we discuss how to measure the goodness of fit of a specification. A natural goodness-of-fit measure is of course the sum of squared errors ê'ê. Unfortunately, this measure is not invariant with respect to measurement units of
the dependent variable and hence is not appropriate for model comparison. Instead, we
consider the following “relative” measures of goodness of fit.

Recall from Theorem 3.2(b) that ŷ'ê = 0. Then,

y'y = ŷ'ŷ + ê'ê + 2ŷ'ê = ŷ'ŷ + ê'ê.

This equation can be written in terms of sum of squares:


\underbrace{\sum_{t=1}^{T} y_t^2}_{\text{TSS}} = \underbrace{\sum_{t=1}^{T} \hat{y}_t^2}_{\text{RSS}} + \underbrace{\sum_{t=1}^{T} \hat{e}_t^2}_{\text{ESS}},

where TSS stands for total sum of squares and is a measure of the total squared variation of yt , RSS stands for regression sum of squares and is a measure of the squared variation of the fitted values, and ESS stands for error sum of squares and is a measure of the squared variation of the residuals. The non-centered coefficient of determination (or non-centered R2 ) is defined as the proportion of TSS that can be explained by the regression hyperplane:

R^2 = \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\text{ESS}}{\text{TSS}}.    (3.6)
Clearly, 0 ≤ R2 ≤ 1, and the larger the R2 , the better the model fits the data. In
particular, a model has a perfect fit if R2 = 1, and it does not account for any variation
of y if R2 = 0. It is also easy to verify that this measure does not depend on the
measurement units of the dependent and explanatory variables; see Exercise 3.7.

As ŷ'ŷ = ŷ'y, we can also write

R^2 = \frac{\hat{y}'\hat{y}}{y'y} = \frac{(\hat{y}'y)^2}{(y'y)(\hat{y}'\hat{y})}.

It follows from the discussion of inner product and Euclidean norm in Section 1.2 that
the right-hand side is just cos²θ, where θ is the angle between y and ŷ. Thus, R2 can be



interpreted as a measure of the linear association between these two vectors. A perfect
fit is equivalent to the fact that y and ŷ are collinear, so that y must be in span(X).
When R2 = 0, y is orthogonal to ŷ so that y is in span(X)⊥ .

It can be verified that when a constant is added to all observations of the depen-
dent variable, the resulting coefficient of determination also changes. This is clearly a
drawback because a sensible measure of fit should not be affected by the location of the
dependent variable. Another drawback of the coefficient of determination is that it is
non-decreasing in the number of variables in the specification. That is, adding more
variables to a linear specification will not reduce its R2 . To see this, consider a specifi-
cation with k1 regressors and a more complex one containing the same k1 regressors and
additional k2 regressors. In this case, the former specification is “nested” in the latter,
in the sense that the former can be obtained from the latter by setting the coefficients
of those additional regressors to zero. Since the OLS method searches for the best fit of
data without any constraint, the more complex model cannot have a worse fit than the
specifications nested in it. See also Exercise 3.8.

A measure that is invariant with respect to constant addition is the centered co-
efficient of determination (or centered R2 ). When a specification contains a constant
term,


\underbrace{\sum_{t=1}^{T} (y_t - \bar{y})^2}_{\text{Centered TSS}} = \underbrace{\sum_{t=1}^{T} (\hat{y}_t - \bar{\hat{y}})^2}_{\text{Centered RSS}} + \underbrace{\sum_{t=1}^{T} \hat{e}_t^2}_{\text{ESS}},

where \bar{\hat{y}} = ȳ = ∑_{t=1}^{T} yt /T . Analogous to (3.6), the centered R2 is defined as

\text{Centered } R^2 = \frac{\text{Centered RSS}}{\text{Centered TSS}} = 1 - \frac{\text{ESS}}{\text{Centered TSS}}.    (3.7)
Centered R2 also takes on values between 0 and 1 and is non-decreasing in the number
of variables in the specification. In contrast with non-centered R2 , this measure excludes
the effect of the constant term and hence is invariant with respect to constant addition.

When a specification contains a constant term, we have

\sum_{t=1}^{T} (y_t - \bar{y})(\hat{y}_t - \bar{y}) = \sum_{t=1}^{T} (\hat{y}_t - \bar{y} + \hat{e}_t)(\hat{y}_t - \bar{y}) = \sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2,

because ∑_{t=1}^{T} ŷt êt = ∑_{t=1}^{T} êt = 0 by Theorem 3.2. It follows that

R^2 = \frac{\sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2} = \frac{\left[\sum_{t=1}^{T} (y_t - \bar{y})(\hat{y}_t - \bar{y})\right]^2}{\left[\sum_{t=1}^{T} (y_t - \bar{y})^2\right]\left[\sum_{t=1}^{T} (\hat{y}_t - \bar{y})^2\right]}.



That is, the centered R2 is also the squared sample correlation coefficient of yt and ŷt ,
also known as the squared multiple correlation coefficient. If a specification does not
contain a constant term, the centered R2 may be negative; see Exercise 3.10.

Both centered and non-centered R2 are still non-decreasing in the number of regres-
sors. This property implies that a more complex model would be preferred if R2 is the
only criterion for choosing a specification. A modified measure is the adjusted R2 , R̄2 ,
which is the centered R2 adjusted for the degrees of freedom:

\bar{R}^2 = 1 - \frac{\hat{e}'\hat{e}/(T - k)}{(y'y - T\bar{y}^2)/(T - 1)}.

This measure can also be expressed in different forms:

\bar{R}^2 = 1 - \frac{T - 1}{T - k}(1 - R^2) = R^2 - \frac{k - 1}{T - k}(1 - R^2).
That is, R̄2 is the centered R2 with a penalty term depending on model complexity
and explanatory ability. Observe that when k increases, (k − 1)/(T − k) increases but
1 − R2 decreases. Whether the penalty term is larger or smaller depends on the trade-
off between these two terms. Thus, R̄2 need not be increasing with the number of
explanatory variables. Clearly, R̄2 < R2 except for k = 1 or R2 = 1. It can also be
verified that R̄2 < 0 when R2 < (k − 1)/(T − 1).
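The measures above are straightforward to compute from an OLS fit; a sketch with hypothetical data (the specification contains a constant term, so the centered R2 is well defined):

```python
import numpy as np

rng = np.random.default_rng(2)
T, k = 40, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat

ess = e_hat @ e_hat
r2_noncentered = 1.0 - ess / (y @ y)                    # (3.6)
tss_centered = np.sum((y - y.mean()) ** 2)
r2_centered = 1.0 - ess / tss_centered                  # (3.7)
r2_adjusted = 1.0 - (ess / (T - k)) / (tss_centered / (T - 1))

print(r2_noncentered, r2_centered, r2_adjusted)
```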

Remark: As different dependent variables have different TSS, the associated speci-
fications are therefore not comparable in terms of their R2 . For example, R2 of the
specifications with y and log y as dependent variables are not comparable.

3.2 Statistical Properties of the OLS Estimators


Readers should have noticed that the previous results, which are either algebraic or
geometric, hold regardless of the random nature of data. To derive the statistical
properties of the OLS estimator, some probabilistic conditions must be imposed.

3.2.1 Classical Conditions

The following conditions on data are usually known as the classical conditions.

[A1] X is non-stochastic.

[A2] y is a random vector such that

(i) IE(y) = Xβ o for some β o ;



(ii) var(y) = σo2 I T for some σo2 > 0.

[A3] y is a random vector such that y ∼ N (Xβ o , σo2 I T ) for some β o and σo2 > 0.

Condition [A1] is not crucial, but, as we will see below, it is quite convenient for
subsequent analysis. Concerning [A2](i), we first note that IE(y) is the “averaging”
behavior of y and may be interpreted as a systematic component of y. [A2](i) is thus
a condition ensuring that the postulated linear function Xβ is a specification of this
systematic component, correct up to unknown parameters. Condition [A2](ii) stipulates that the variance-covariance matrix of y depends only on one parameter σo2 ; such a
matrix is also known as a scalar covariance matrix. Under [A2](ii), yt , t = 1, . . . , T ,
have the constant variance σo2 and are pairwise uncorrelated (but not necessarily inde-
pendent). Although conditions [A2] and [A3] impose the same structures on the mean
and variance of y, the latter is much stronger because it also specifies the distribution
of y. We have seen in Section 2.3 that uncorrelated normal random variables are also
independent. Therefore, yt , t = 1, . . . , T , are i.i.d. (independently and identically dis-
tributed) normal random variables under [A3]. The linear specification (3.1) with [A1]
and [A2] is known as the classical linear model, and (3.1) with [A1] and [A3] is also
known as the classical normal linear model. The limitations of these conditions will be
discussed in Section 3.6.

In addition to β̂ T , the new unknown parameter var(yt ) = σo2 in [A2](ii) and [A3]
should be estimated as well. The OLS estimator for σo2 is

\hat{\sigma}_T^2 = \frac{\hat{e}'\hat{e}}{T - k} = \frac{1}{T - k} \sum_{t=1}^{T} \hat{e}_t^2,    (3.8)

where k is the number of regressors. While β̂ T is a linear estimator in the sense that
it is a linear transformation of y, σ̂T2 is not. In the sections below we will derive the
properties of the OLS estimators β̂ T and σ̂T2 under these classical conditions.
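A sketch of how (3.8) is computed from an OLS fit, together with the estimator σ̂T2 (X'X)−1 of var(β̂ T ) that appears later in this section (the data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
T, k = 60, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([0.5, 1.0, -2.0]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat

sigma2_hat = e_hat @ e_hat / (T - k)              # (3.8)
cov_hat = sigma2_hat * np.linalg.inv(X.T @ X)     # estimate of var(beta_hat)
std_err = np.sqrt(np.diag(cov_hat))               # standard errors of the estimates
print(sigma2_hat, std_err)
```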

3.2.2 Without the Normality Condition

Under the imposed classical conditions, the OLS estimators have the following statistical
properties.

Theorem 3.4 Consider the linear specification (3.1).

(a) Given [A1] and [A2](i), β̂ T is unbiased for β o .



(b) Given [A1] and [A2], σ̂T2 is unbiased for σo2 .

(c) Given [A1] and [A2], var(β̂ T ) = σo2 (X'X)−1 .

Proof: Given [A1] and [A2](i), β̂ T is unbiased because

IE(β̂ T ) = (X'X)−1 X'IE(y) = (X'X)−1 X'Xβ o = β o .

To prove (b), recall that (I T − P )X = 0 so that the OLS residual vector can be written
as

ê = (I T − P )y = (I T − P )(y − Xβ o ).

Then, ê'ê = (y − Xβ o )'(I T − P )(y − Xβ o ), which is a scalar, and


  
IE(ê'ê) = IE{trace[(y − Xβ o )'(I T − P )(y − Xβ o )]}

        = IE{trace[(y − Xβ o )(y − Xβ o )'(I T − P )]}.

By interchanging the trace and expectation operators, we have from [A2](ii) that
 
IE(ê'ê) = trace{IE[(y − Xβ o )(y − Xβ o )'(I T − P )]}

        = trace{IE[(y − Xβ o )(y − Xβ o )'](I T − P )}

        = trace{σo2 I T (I T − P )}

        = σo2 trace(I T − P ).

By Lemmas 1.12 and 1.14, trace(I T − P ) = rank(I T − P ) = T − k. Consequently,

IE(ê'ê) = σo2 (T − k),

so that

IE(σ̂T2 ) = IE(ê'ê)/(T − k) = σo2 .

This proves the unbiasedness of σ̂T2 . Given that β̂ T is a linear transformation of y, we have from Lemma 2.4 that

var(β̂ T ) = var[(X'X)−1 X'y]

          = (X'X)−1 X'(σo2 I T )X(X'X)−1

          = σo2 (X'X)−1 .

This establishes (c). 2



It can be seen that the unbiasedness of β̂ T does not depend on [A2](ii), the variance
property of y. It is also clear that when σ̂T2 is unbiased, the estimator

\widehat{\mathrm{var}}(\hat{\beta}_T) = \hat{\sigma}_T^2 (X'X)^{-1}

is also unbiased for var(β̂ T ). The result below, known as the Gauss-Markov theorem,
indicates that when [A1] and [A2] hold, β̂ T is not only unbiased but also the best (most
efficient) among all linear unbiased estimators for β o .

Theorem 3.5 (Gauss-Markov) Given the linear specification (3.1), suppose that [A1]
and [A2] hold. Then the OLS estimator β̂ T is the best linear unbiased estimator (BLUE)
for β o .

Proof: Consider an arbitrary linear estimator β̌ T = Ay, where A is non-stochastic. Writing A = (X'X)−1 X' + C, β̌ T = β̂ T + Cy. Then,

var(β̌ T ) = var(β̂ T ) + var(Cy) + 2 cov(β̂ T , Cy).

By [A1] and [A2](i),

IE(β̌ T ) = β o + CXβ o .

Since β o is arbitrary, this estimator would be unbiased if, and only if, CX = 0. This
property further implies that

cov(β̂ T , Cy) = IE[(X'X)−1 X'(y − Xβ o )y'C']

              = (X'X)−1 X'IE[(y − Xβ o )y']C'

              = (X'X)−1 X'(σo2 I T )C'

              = 0.

Thus,

var(β̌ T ) = var(β̂ T ) + var(Cy) = var(β̂ T ) + σo2 CC',

where σo2 CC' is clearly a positive semi-definite matrix. This shows that for any linear
unbiased estimator β̌ T , var(β̌ T ) − var(β̂ T ) is positive semi-definite, so that β̂ T is more
efficient. 2



Example 3.6 Consider the data [y X], where X is a non-stochastic matrix that can be partitioned as [X 1 X 2 ]. Suppose that IE(y) = X 1 b1 for some b1 and var(y) = σo2 I T
for some σo2 > 0. Consider first the specification that contains only X 1 but not X 2 :

y = X 1 β 1 + e.

Let b̂1,T denote the resulting OLS estimator. It is clear that b̂1,T is still a linear estimator
and unbiased for b1 by Theorem 3.4(a). Moreover, it is the BLUE for b1 by Theorem 3.5
with the variance-covariance matrix

var(b̂1,T ) = σo2 (X 1'X 1 )−1 ,

by Theorem 3.4(c).

Consider now the linear specification that involves both X 1 and irrelevant regressors
X 2:

y = Xβ + e = X 1 β 1 + X 2 β 2 + e.

This specification would be a correct specification if some of the parameters (β 2 ) are restricted to zero. Let β̂ T = (β̂ 1,T' β̂ 2,T')' be the OLS estimator of β. Using Theorem 3.3, we find

IE(β̂ 1,T ) = IE{[X 1'(I T − P 2 )X 1 ]−1 X 1'(I T − P 2 )y} = b1 ,

IE(β̂ 2,T ) = IE{[X 2'(I T − P 1 )X 2 ]−1 X 2'(I T − P 1 )y} = 0,

where P 1 = X 1 (X 1'X 1 )−1 X 1' and P 2 = X 2 (X 2'X 2 )−1 X 2'. This shows that β̂ T is unbiased for (b1' 0')'. Also,

var(β̂ 1,T ) = var([X 1'(I T − P 2 )X 1 ]−1 X 1'(I T − P 2 )y)

            = σo2 [X 1'(I T − P 2 )X 1 ]−1 .

Given that P 2 is a positive semi-definite matrix,

X 1'X 1 − X 1'(I T − P 2 )X 1 = X 1'P 2 X 1

must also be positive semi-definite. It follows from Lemma 1.9 that

[X 1'(I T − P 2 )X 1 ]−1 − (X 1'X 1 )−1

is a positive semi-definite matrix. This shows that b̂1,T is more efficient than β̂ 1,T , as it ought to be. When X 1'X 2 = 0, i.e., the columns of X 1 are orthogonal to the columns of X 2 , we immediately have (I T − P 2 )X 1 = X 1 , so that β̂ 1,T = b̂1,T . In this case,
estimating a more complex specification does not result in efficiency loss. 2



Remark: This example shows that for the specification y = X 1 β 1 + X 2 β 2 + e, the OLS estimator of β 1 is not the most efficient when IE(y) = X 1 b1 . The failure of the Gauss-Markov theorem in this example is because [A2](i) does not hold in general; instead, [A2](i) holds only for β o = (b1' 0')', where b1 is arbitrary but the remaining elements are not. This result thus suggests that when the restrictions on parameters are not taken into account, the resulting OLS estimator would suffer from efficiency loss.

3.2.3 With the Normality Condition

We have learned that the normality condition [A3] is much stronger than [A2]. With
this stronger condition, more can be said about the OLS estimators.

Theorem 3.7 Given the linear specification (3.1), suppose that [A1] and [A3] hold.

(a) β̂ T ∼ N(β o , σo2 (X'X)−1 ).

(b) (T − k)σ̂T2 /σo2 ∼ χ2 (T − k).

(c) σ̂T2 has mean σo2 and variance 2σo4 /(T − k).

Proof: As β̂ T is a linear transformation of y, it is also normally distributed as

β̂ T ∼ N(β o , σo2 (X'X)−1 ),

by Lemma 2.6, where its mean and variance-covariance matrix are as in Theorem 3.4(a)
and (c). To prove the assertion (b), we again write ê = (I T − P )(y − Xβ o ) and deduce

(T − k)σ̂T2 /σo2 = ê'ê/σo2 = y∗'(I T − P )y∗ ,

where y∗ = (y − Xβ o )/σo . Let C be the orthogonal matrix that diagonalizes the symmetric and idempotent matrix I T − P . Then, C'(I T − P )C = Λ. Since rank(I T − P ) = T − k, Λ contains T − k eigenvalues equal to one and k eigenvalues equal to zero by Lemma 1.11. Without loss of generality we can write

y^{*\prime}(I_T - P)y^* = y^{*\prime} C [C'(I_T - P)C] C' y^* = \eta' \begin{bmatrix} I_{T-k} & 0 \\ 0 & 0 \end{bmatrix} \eta,

where η = C'y∗ . Again by Lemma 2.6, y∗ ∼ N (0, I T ) under [A3]. Hence, η ∼ N (0, I T ), so that the ηi are independent, standard normal random variables. Consequently,

y^{*\prime}(I_T - P)y^* = \sum_{i=1}^{T-k} \eta_i^2 \sim \chi^2(T - k).



This proves (b). Noting that the mean of χ2 (T − k) is T − k and variance is 2(T − k),
the assertion (c) is just a direct consequence of (b). 2

Suppose that we believe that [A3] is true and specify the log-likelihood function of
y as:
\log L(\beta, \sigma^2) = -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log \sigma^2 - \frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta).
The first order conditions of maximizing this log-likelihood are
\nabla_{\beta} \log L(\beta, \sigma^2) = \frac{1}{\sigma^2} X'(y - X\beta) = 0,

\nabla_{\sigma^2} \log L(\beta, \sigma^2) = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4} (y - X\beta)'(y - X\beta) = 0,
and their solutions are the MLEs β̃ T and σ̃T2 . The first k equations above are equivalent
to the OLS normal equations (3.3). It follows that the OLS estimator β̂ T is also the
MLE β̃ T . Plugging β̂ T into the first order conditions we can solve for σ 2 and obtain

\tilde{\sigma}_T^2 = \frac{(y - X\hat{\beta}_T)'(y - X\hat{\beta}_T)}{T} = \frac{\hat{e}'\hat{e}}{T},    (3.9)
which is different from the OLS variance estimator (3.8).

The conclusion below is stronger than the Gauss-Markov theorem (Theorem 3.5).

Theorem 3.8 Given the linear specification (3.1), suppose that [A1] and [A3] hold.
Then the OLS estimators β̂ T and σ̂T2 are the best unbiased estimators for β o and σo2 ,
respectively.

Proof: The score vector is

s(\beta, \sigma^2) = \begin{bmatrix} \frac{1}{\sigma^2} X'(y - X\beta) \\[4pt] -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4} (y - X\beta)'(y - X\beta) \end{bmatrix},

and the Hessian matrix of the log-likelihood function is

H(\beta, \sigma^2) = \begin{bmatrix} -\frac{1}{\sigma^2} X'X & -\frac{1}{\sigma^4} X'(y - X\beta) \\[4pt] -\frac{1}{\sigma^4} (y - X\beta)'X & \frac{T}{2\sigma^4} - \frac{1}{\sigma^6} (y - X\beta)'(y - X\beta) \end{bmatrix}.

It is easily verified that when [A3] is true, IE[s(β o , σo2 )] = 0 and

IE[H(\beta_o, \sigma_o^2)] = \begin{bmatrix} -\frac{1}{\sigma_o^2} X'X & 0 \\[4pt] 0 & -\frac{T}{2\sigma_o^4} \end{bmatrix}.



The information matrix equality (Lemma 2.9) ensures that the negative of IE[H(β o , σo2 )]
equals the information matrix. The inverse of the information matrix is then
\begin{bmatrix} \sigma_o^2 (X'X)^{-1} & 0 \\[4pt] 0 & \frac{2\sigma_o^4}{T} \end{bmatrix},

which is the Cramér-Rao lower bound by Lemma 2.10. Clearly, var(β̂ T ) achieves this
lower bound so that β̂ T must be the best unbiased estimator for β o . Although the
variance of σ̂T2 is greater than the lower bound, it can be shown that σ̂T2 is still the best
unbiased estimator for σo2 ; see, e.g., Rao (1973, p. 319) for a proof. 2

Remark: Compared with the Gauss-Markov theorem, Theorem 3.8 gives a stronger result at the expense of a stronger condition (the normality condition [A3]). The OLS
estimators now are the best (most efficient) in a much larger class of estimators, namely,
the class of unbiased estimators. Note also that Theorem 3.8 covers σ̂T2 , whereas the
Gauss-Markov theorem does not.

3.3 Hypotheses Testing


After a specification is estimated, it is often desirable to test various economic and
econometric hypotheses. Given the classical conditions [A1] and [A3], we consider the
linear hypothesis

Rβ o = r, (3.10)

where R is a q × k non-stochastic matrix with rank q < k, and r is a vector of pre-specified, hypothetical values.

3.3.1 Tests for Linear Hypotheses

If the null hypothesis (3.10) is true, it is reasonable to expect that Rβ̂ T is “close” to
the hypothetical value r; otherwise, they should be quite different. Here, the closeness
between Rβ̂ T and r must be justified by the null distribution of the test statistics.

If there is only a single hypothesis, the null hypothesis (3.10) is such that R is a row
vector (q = 1) and r is a scalar. Note that a single hypothesis may involve two or more
parameters. Consider the following statistic:

\frac{R\hat{\beta}_T - r}{\sigma_o [R(X'X)^{-1}R']^{1/2}}.



By Theorem 3.7(a), β̂ T ∼ N (β o , σo2 (X'X)−1 ), and hence

Rβ̂ T ∼ N (Rβ o , σo2 R(X'X)−1 R').

Under the null hypothesis, we have

\frac{R\hat{\beta}_T - r}{\sigma_o [R(X'X)^{-1}R']^{1/2}} = \frac{R(\hat{\beta}_T - \beta_o)}{\sigma_o [R(X'X)^{-1}R']^{1/2}} \sim N(0, 1).    (3.11)

Although the left-hand side has a known distribution, it cannot be used as a test statistic
because σo is unknown. Replacing σo by its OLS estimator σ̂T yields an operational
statistic:

\tau = \frac{R\hat{\beta}_T - r}{\hat{\sigma}_T [R(X'X)^{-1}R']^{1/2}}.    (3.12)

The null distribution of τ is given in the result below.

Theorem 3.9 Given the linear specification (3.1), suppose that [A1] and [A3] hold.
Then under the null hypothesis (3.10) with R a 1 × k vector,

τ ∼ t(T − k),

where τ is given by (3.12).

Proof: We first write the statistic τ as

\tau = \frac{R\hat{\beta}_T - r}{\sigma_o [R(X'X)^{-1}R']^{1/2}} \Bigg/ \sqrt{\frac{(T - k)\hat{\sigma}_T^2 / \sigma_o^2}{T - k}},

where the numerator is distributed as N (0, 1) by (3.11), and (T −k)σ̂T2 /σo2 is distributed
as χ2 (T − k) by Theorem 3.7(b). Hence, the square of the denominator is a central χ2
random variable divided by its degrees of freedom T − k. The assertion follows if we
can show that the numerator and denominator are independent. Note that the random
components of the numerator and denominator are, respectively, β̂ T and ê'ê, where β̂ T and ê are two normally distributed random vectors with the covariance matrix

cov(ê, β̂ T ) = IE[(I T − P )(y − Xβ o )y'X(X'X)−1 ]

             = (I T − P ) IE[(y − Xβ o )y']X(X'X)−1

             = σo2 (I T − P )X(X'X)−1

             = 0.



Since uncorrelated normal random vectors are also independent, β̂ T is independent of ê. By Lemma 2.1, we conclude that β̂ T is also independent of ê'ê. 2

As the null distribution of the statistic τ is t(T − k) by Theorem 3.9, τ is known as the t statistic. When the alternative hypothesis is Rβ o ≠ r, this is a two-sided test; when the alternative hypothesis is Rβ o > r (or Rβ o < r), this is a one-sided test. For
each test, we first choose a small significance level α and then determine the critical
region Cα . For the two-sided t test, we can find the values ±tα/2 (T − k) from the table
of t distributions such that
α = IP{τ < −tα/2 (T − k) or τ > tα/2 (T − k)}

= 1 − IP{−tα/2 (T − k) ≤ τ ≤ tα/2 (T − k)}.


The critical region is then

Cα = (−∞, −tα/2 (T − k)) ∪ (tα/2 (T − k), ∞),

and ±tα/2 (T − k) are the critical values at the significance level α. For the alternative
hypothesis Rβ o > r, the critical region is (tα (T − k), ∞), where tα (T − k) is the critical
value such that

α = IP{τ > tα (T − k)}.

Similarly, for the alternative Rβ o < r, the critical region is (−∞, −tα (T − k)).

The null hypothesis is rejected at the significance level α when τ falls in the critical
region. As α is small, the event {τ ∈ Cα } is unlikely under the null hypothesis. When
τ does take an extreme value relative to the critical values, it is an evidence against the
null hypothesis. The decision of rejecting the null hypothesis could be wrong, but the
probability of the type I error will not exceed α. When τ takes a “reasonable” value
in the sense that it falls in the complement of the critical region, the null hypothesis is
not rejected.

Example 3.10 To test a single coefficient equal to zero: βi = 0, we choose R as the transpose of the i th Cartesian unit vector:

R = [ 0 ··· 0 1 0 ··· 0 ].

Let m^{ii} be the i th diagonal element of M −1 = (X'X)−1 . Then, R(X'X)−1 R' = m^{ii} . The t statistic for this hypothesis, also known as the t ratio, is

\tau = \frac{\hat{\beta}_{i,T}}{\hat{\sigma}_T \sqrt{m^{ii}}} \sim t(T - k).



When a t ratio rejects the null hypothesis, it is said that the corresponding estimated co-
efficient is significantly different from zero; econometrics and statistics packages usually
report t ratios along with the coefficient estimates. 2
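A sketch of the t ratio of Example 3.10 (hypothetical data; SciPy is used only for the t(T − k) critical value and p-value):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.0, 0.8]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (T - k)
M_inv = np.linalg.inv(X.T @ X)                    # (X'X)^{-1}

i = 1                                             # test beta_i = 0 for the second regressor
tau = beta_hat[i] / np.sqrt(sigma2_hat * M_inv[i, i])
crit = stats.t.ppf(0.975, df=T - k)               # two-sided 5% critical value
p_value = 2 * stats.t.sf(abs(tau), df=T - k)
print(tau, crit, p_value)
```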

Example 3.11 To test the single hypothesis βi + βj = 0, we set R as

R = [ 0 ··· 0 1 0 ··· 0 1 0 ··· 0 ].

Hence, R(X'X)−1 R' = m^{ii} + 2m^{ij} + m^{jj} , where m^{ij} is the (i, j) th element of M −1 = (X'X)−1 . The t statistic is

\tau = \frac{\hat{\beta}_{i,T} + \hat{\beta}_{j,T}}{\hat{\sigma}_T (m^{ii} + 2m^{ij} + m^{jj})^{1/2}} \sim t(T - k). 2

Several hypotheses can also be tested jointly. Consider the null hypothesis Rβ o = r,
where R is now a q × k matrix (q ≥ 2) and r is a vector. This hypothesis involves q
single hypotheses. Similar to (3.11), we have under the null hypothesis that

[R(X'X)−1 R']−1/2 (Rβ̂ T − r)/σo ∼ N (0, I q ).

Therefore,

(Rβ̂ T − r)'[R(X'X)−1 R']−1 (Rβ̂ T − r)/σo2 ∼ χ2 (q).    (3.13)

Again, we can replace σo2 by its OLS estimator σ̂T2 to obtain an operational statistic:

\varphi = \frac{(R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r)}{\hat{\sigma}_T^2 \, q}.    (3.14)

The next result gives the null distribution of ϕ.

Theorem 3.12 Given the linear specification (3.1), suppose that [A1] and [A3] hold.
Then under the null hypothesis (3.10) with R a q × k matrix with rank q < k, we have

ϕ ∼ F (q, T − k),

where ϕ is given by (3.14).

Proof: Note that

\varphi = \frac{(R\hat{\beta}_T - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta}_T - r)/(\sigma_o^2 q)}{\left[(T - k)\,\hat{\sigma}_T^2/\sigma_o^2\right]/(T - k)}.



In view of (3.13) and the proof of Theorem 3.9, the numerator and denominator terms
are two independent χ2 random variables, each divided by its degrees of freedom. The
assertion follows from the definition of F random variable. 2

The statistic ϕ is known as the F statistic. We reject the null hypothesis at the
significance level α when ϕ is too large relative to the critical value Fα (q, T − k) from
the table of F distributions, where Fα (q, T − k) is such that

α = IP{ϕ > Fα (q, T − k)}.

If there is only a single hypothesis, the F statistic is just the square of the corresponding
t statistic. When ϕ rejects the null hypothesis, it simply suggests that there is evidence
against at least one single hypothesis. The inference of a joint test is, however, not necessarily the same as the inference of individual tests; see also Section 3.4.

Example 3.13 Joint null hypothesis: Ho : β1 = b1 and β2 = b2 . The F statistic is

\varphi = \frac{1}{2\hat{\sigma}_T^2}
\begin{bmatrix} \hat{\beta}_{1,T} - b_1 \\ \hat{\beta}_{2,T} - b_2 \end{bmatrix}'
\begin{bmatrix} m^{11} & m^{12} \\ m^{21} & m^{22} \end{bmatrix}^{-1}
\begin{bmatrix} \hat{\beta}_{1,T} - b_1 \\ \hat{\beta}_{2,T} - b_2 \end{bmatrix}
\sim F(2, T - k),

where m^{ij} is as defined in Example 3.11. 2
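A sketch of the F statistic (3.14) for a general linear hypothesis Rβ o = r (the data and the hypothesis below are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
T, k = 60, 4
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (T - k)

# Joint hypothesis: the last two coefficients are zero (q = 2 restrictions).
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.zeros(2)
q = R.shape[0]

d = R @ beta_hat - r
middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)
phi = d @ middle @ d / (sigma2_hat * q)           # statistic (3.14)
print(phi, stats.f.ppf(0.95, q, T - k))           # compare with the 5% critical value
```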

Remark: For the null hypothesis of s coefficients being zero, if the corresponding F
statistic ϕ > 1 (ϕ < 1), dropping these s regressors will reduce (increase) R̄2 ; see
Exercise 3.12.

3.3.2 Power of the Tests

Recall that the power of a test is the probability of rejecting the null hypothesis when the
null hypothesis is indeed false. In this section, we consider the hypothesis Rβ o = r + δ,
where δ characterizes the deviation from the null hypothesis, and analyze the power
performance of the t and F tests.

Theorem 3.14 Given the linear specification (3.1), suppose that [A1] and [A3] hold.
Then under the hypothesis that Rβ o = r + δ, where R is a q × k matrix with rank q < k,
we have

ϕ ∼ F (q, T − k; δ'D−1 δ, 0),

where ϕ is given by (3.14), D = σo2 [R(X'X)−1 R'], and δ'D−1 δ is the non-centrality parameter of the numerator term.



Proof: When Rβ o = r + δ,

[R(X'X)−1 R']−1/2 (Rβ̂ T − r)/σo = [R(X'X)−1 R']−1/2 [R(β̂ T − β o ) + δ]/σo .

Given [A3],

[R(X'X)−1 R']−1/2 R(β̂ T − β o )/σo ∼ N (0, I q ),

and hence

[R(X'X)−1 R']−1/2 (Rβ̂ T − r)/σo ∼ N (D−1/2 δ, I q ).

It follows from Lemma 2.7 that

(Rβ̂ T − r)'[R(X'X)−1 R']−1 (Rβ̂ T − r)/σo2 ∼ χ2 (q; δ'D−1 δ),

which is the non-central χ2 distribution with q degrees of freedom and the non-centrality parameter δ'D−1 δ. This is in contrast with (3.13), which has a central χ2 distribution
under the null hypothesis. As (T − k)σ̂T2 /σo2 is still distributed as χ2 (T − k) by The-
orem 3.7(b), the assertion follows because the numerator and denominator of ϕ are
independent. 2

Clearly, when the null hypothesis is correct, we have δ = 0, so that ϕ ∼ F (q, T − k).
Theorem 3.14 thus includes Theorem 3.12 as a special case. In particular, for testing a
single hypothesis, we have

τ ∼ t(T − k; D−1/2 δ),

which reduces to t(T − k) when δ = 0, as in Theorem 3.9.

Theorem 3.14 implies that when Rβ o deviates farther from the hypothetical value
r, the non-centrality parameter δ'D−1 δ increases, and so does the power. We illustrate this point using the following two examples, where the power is computed using the
GAUSS program. For the null distribution F (2, 20), the critical value at 5% level is 3.49.
Then for F (2, 20; ν1 , 0) with the non-centrality parameter ν1 = 1, 3, 5, the probabilities
that ϕ exceeds 3.49 are approximately 12.1%, 28.2%, and 44.3%, respectively. For the
null distribution F (5, 60), the critical value at 5% level is 2.37. Then for F (5, 60; ν1 , 0)
with ν1 = 1, 3, 5, the probabilities that ϕ exceeds 2.37 are approximately 9.4%, 20.5%,
and 33.2%, respectively. In both cases, the power increases with the non-centrality
parameter.
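These power figures were computed with a GAUSS program; any routine for the non-central F distribution gives the same tail probabilities. A SciPy sketch:

```python
from scipy import stats

# Power of the F test: IP{phi > critical value} when phi ~ F(q, T-k; nu1).
for q, dof in [(2, 20), (5, 60)]:
    crit = stats.f.ppf(0.95, q, dof)               # 5% critical value of the central F
    for nu1 in (1, 3, 5):
        power = stats.ncf.sf(crit, q, dof, nu1)    # upper tail of the non-central F
        print(q, dof, nu1, round(crit, 2), round(power, 3))
```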



3.3.3 An Alternative Approach

Given the specification (3.1), we may take the constraint Rβ o = r into account and
consider the constrained OLS estimation that finds the saddle point of the Lagrangian:
\min_{\beta, \lambda} \; \frac{1}{T} (y - X\beta)'(y - X\beta) + (R\beta - r)'\lambda,
where λ is the q × 1 vector of Lagrangian multipliers. It is straightforward to show that
the solutions are
λ̈T = 2[R(X'X/T )−1 R']−1 (Rβ̂ T − r),
β̈ T = β̂ T − (X'X/T )−1 R'λ̈T /2,    (3.15)
which will be referred to as the constrained OLS estimators.
Given β̈ T , the vector of constrained OLS residuals is

ë = y − X β̈ T = y − X β̂ T + X(β̂ T − β̈ T ) = ê + X(β̂ T − β̈ T ).

It follows from (3.15) that

β̂ T − β̈ T = (X'X/T )−1 R'λ̈T /2

           = (X'X)−1 R'[R(X'X)−1 R']−1 (Rβ̂ T − r).


The inner product of ë is then

ë'ë = ê'ê + (β̂ T − β̈ T )'X'X(β̂ T − β̈ T )

     = ê'ê + (Rβ̂ T − r)'[R(X'X)−1 R']−1 (Rβ̂ T − r).


Note that the second term on the right-hand side is nothing but the numerator of the
F statistic (3.14). The F statistic now can be written as

\varphi = \frac{\ddot{e}'\ddot{e} - \hat{e}'\hat{e}}{q\hat{\sigma}_T^2} = \frac{(\text{ESS}_c - \text{ESS}_u)/q}{\text{ESS}_u/(T - k)},    (3.16)

where ESSc = ë'ë and ESSu = ê'ê denote, respectively, the ESS resulting from the constrained and unconstrained estimations. Dividing the numerator and denominator of (3.16) by the centered TSS (y'y − T ȳ2 ) yields another equivalent expression for ϕ:
\varphi = \frac{(R_u^2 - R_c^2)/q}{(1 - R_u^2)/(T - k)},    (3.17)
where Rc2 and Ru2 are, respectively, the centered coefficient of determination of con-
strained and unconstrained estimations. As the numerator of (3.17), Ru2 − Rc2 , can be
interpreted as the loss of fit due to the imposed constraint, the F test is in effect a
loss-of-fit test. The null hypothesis is rejected when the constrained specification fits
data much worse.



Example 3.15 Consider the specification: yt = β1 + β2 xt2 + β3 xt3 + et . Given the hypothesis (constraint) β2 = β3 , the resulting constrained specification is

yt = β1 + β2 (xt2 + xt3 ) + et .

By estimating these two specifications separately, we obtain ESSu and ESSc , from which
the F statistic can be easily computed. 2

Example 3.16 Test the null hypothesis that all the coefficients (except the constant
term) equal zero. The resulting constrained specification is yt = β1 + et , so that Rc2 = 0.
Then, (3.17) becomes
\varphi = \frac{R_u^2/(k - 1)}{(1 - R_u^2)/(T - k)} \sim F(k - 1, T - k),

which requires only estimation of the unconstrained specification. This test statistic is also routinely reported by most econometrics and statistics packages and is known as the “regression F test.” 2
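A sketch of the loss-of-fit form (3.16), using the constrained and unconstrained specifications of Example 3.15 with hypothetical data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
T = 80
x2, x3 = rng.normal(size=T), rng.normal(size=T)
y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=T)

def ess(X, y):
    """Error sum of squares of the OLS regression of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

X_u = np.column_stack([np.ones(T), x2, x3])    # unconstrained specification
X_c = np.column_stack([np.ones(T), x2 + x3])   # constrained by beta2 = beta3
q, k = 1, X_u.shape[1]

phi = ((ess(X_c, y) - ess(X_u, y)) / q) / (ess(X_u, y) / (T - k))   # statistic (3.16)
print(phi, stats.f.ppf(0.95, q, T - k))
```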

3.4 Confidence Regions


In addition to point estimators for parameters, we may also be interested in finding
confidence intervals for parameters. A confidence interval for βi,o with the confidence
coefficient (1 − α) is the interval (g_α , ḡ_α ) that satisfies

IP{ g_α ≤ βi,o ≤ ḡ_α } = 1 − α.

That is, we are (1 − α) × 100 percent sure that such an interval would include the true
parameter βi,o .

From Theorem 3.9, we know

IP\left\{ -t_{\alpha/2}(T - k) \le \frac{\hat{\beta}_{i,T} - \beta_{i,o}}{\hat{\sigma}_T \sqrt{m^{ii}}} \le t_{\alpha/2}(T - k) \right\} = 1 - \alpha,

where m^{ii} is the i th diagonal element of (X'X)−1 , and tα/2 (T − k) is the critical value of the (two-sided) t test at the significance level α. Equivalently, we have

IP\left\{ \hat{\beta}_{i,T} - t_{\alpha/2}(T - k)\hat{\sigma}_T\sqrt{m^{ii}} \le \beta_{i,o} \le \hat{\beta}_{i,T} + t_{\alpha/2}(T - k)\hat{\sigma}_T\sqrt{m^{ii}} \right\} = 1 - \alpha.

This shows that the confidence interval for βi,o can be constructed by setting

g_α = β̂i,T − tα/2 (T − k)σ̂T √(m^{ii}) ,

ḡ_α = β̂i,T + tα/2 (T − k)σ̂T √(m^{ii}) .



It should be clear that the greater the confidence coefficient (i.e., α smaller), the larger
is the magnitude of the critical values ±tα/2 (T − k) and hence the resulting confidence
interval.
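A sketch of the interval construction (hypothetical data; the interval is for the coefficient of the second regressor):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
T, k = 50, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=T)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_hat
sigma_hat = np.sqrt(e_hat @ e_hat / (T - k))
M_inv = np.linalg.inv(X.T @ X)

i, alpha = 1, 0.05
half_width = stats.t.ppf(1 - alpha / 2, T - k) * sigma_hat * np.sqrt(M_inv[i, i])
print(beta_hat[i] - half_width, beta_hat[i] + half_width)   # the 95% confidence interval
```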

The confidence region for Rβ o with the confidence coefficient (1 − α) satisfies

IP{(β̂ T − β o )'R'[R(X'X)−1 R']−1 R(β̂ T − β o )/(qσ̂T2 ) ≤ Fα (q, T − k)} = 1 − α,

where Fα (q, T − k) is the critical value of the F test at the significance level α.

Example 3.17 The confidence region for (β1,o = b1 , β2,o = b2 ). Suppose T − k = 30 and α = 0.05; then F0.05 (2, 30) = 3.32. In view of Example 3.13,

IP\left\{ \frac{1}{2\hat{\sigma}_T^2}
\begin{bmatrix} \hat{\beta}_{1,T} - b_1 \\ \hat{\beta}_{2,T} - b_2 \end{bmatrix}'
\begin{bmatrix} m^{11} & m^{12} \\ m^{21} & m^{22} \end{bmatrix}^{-1}
\begin{bmatrix} \hat{\beta}_{1,T} - b_1 \\ \hat{\beta}_{2,T} - b_2 \end{bmatrix}
\le 3.32 \right\} = 0.95,

which results in an ellipse with the center (β̂1,T , β̂2,T ). 2

Remark: A point (β1,o , β2,o ) may be outside the joint confidence ellipse but inside
the confidence box formed by individual confidence intervals. Hence, each t ratio may
show that the corresponding coefficient is insignificantly different from zero, while the F
test indicates that both coefficients are not jointly insignificant. It is also possible that
(β1 , β2 ) is outside the confidence box but inside the joint confidence ellipse. That is,
each t ratio may show that the corresponding coefficient is significantly different from
zero, while the F test indicates that both coefficients are jointly insignificant. See also
an illustrative example in Goldberger (1991, Chap. 19).

3.5 Multicollinearity
In Section 3.1.2 we have seen that a linear specification suffers from the problem of
exact multicollinearity if the basic identifiability requirement (i.e., X is of full column
rank) is not satisfied. In this case, the OLS estimator cannot be computed as (3.4).
This problem may be avoided by modifying the postulated specifications.

3.5.1 Near Multicollinearity

In practice, it is more common that explanatory variables are related to some extent but
do not satisfy an exact linear relationship. This is usually referred to as the problem of



near multicollinearity. But as long as there is no exact multicollinearity, parameters can still be estimated by the OLS method, and the resulting estimator remains the BLUE
under [A1] and [A2].

Nevertheless, there are still complaints about near multicollinearity in empirical studies. In some applications, parameter estimates are very sensitive to small changes
in data. It is also possible that individual t ratios are all insignificant, but the regres-
sion F statistic is highly significant. These symptoms are usually attributed to near
multicollinearity. This is not entirely correct, however. Write X = [xi X i ], where X i
is the submatrix of X excluding the i th column xi . By the result of Theorem 3.3, the
variance of β̂i,T can be expressed as

 
var(β̂i,T ) = var{[xi'(I − P i )xi ]−1 xi'(I − P i )y} = σo2 [xi'(I − P i )xi ]−1 ,

where P i = X i (X i'X i )−1 X i'. It can also be verified that

var(\hat{\beta}_{i,T}) = \frac{\sigma_o^2}{\sum_{t=1}^{T} (x_{ti} - \bar{x}_i)^2 \, (1 - R^2(i))},

where R2 (i) is the centered coefficient of determination from the auxiliary regression of
xi on X i . When xi is closely related to other explanatory variables, R2 (i) is high so
that var(β̂i,T ) would be large. This explains why β̂i,T are sensitive to data changes and
why corresponding t ratios are likely to be insignificant. Near multicollinearity is not a
necessary condition for these problems, however. Large var(β̂i,T ) may also arise due to
small variations of xti and/or large σo2 .
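The decomposition above is easy to inspect numerically: the auxiliary R2 (i) determines the factor 1/(1 − R2 (i)), usually called the variance inflation factor. A sketch with two deliberately near-collinear regressors (hypothetical data):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 200
x2 = rng.normal(size=T)
x3 = 0.95 * x2 + 0.1 * rng.normal(size=T)        # nearly collinear with x2
X = np.column_stack([np.ones(T), x2, x3])
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=T)

def r2_centered(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)

# Auxiliary regression of x_i on the remaining regressors (constant included).
i = 1
X_others = np.delete(X, i, axis=1)
r2_i = r2_centered(X_others, X[:, i])
print(r2_i, 1.0 / (1.0 - r2_i))    # high R2(i), hence an inflated var(beta_i_hat)
```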

Even when a large value of var(β̂i,T ) does result from a high R2 (i), there is nothing wrong statistically. It is often claimed that “severe multicollinearity can make
an important variable look insignificant.” As Goldberger (1991) correctly pointed out,
this statement simply confuses statistical significance with economic importance. These
large variances merely reflect the fact that parameters cannot be precisely estimated
from the given data set.

Near multicollinearity is in fact a problem related to data and model specification. If it does cause problems in estimation and hypothesis testing, one may try to break the
approximate linear relationship by, e.g., adding more observations to the data set (if
plausible) or dropping some variables from the current specification. More sophisticated
statistical methods, such as the ridge estimator and principal component regressions,
may also be used; details of these methods can be found in other econometrics textbooks.



3.5.2 Digress: Dummy Variables

A linear specification may include some qualitative variables to indicate the presence or
absence of certain attributes of the dependent variable. These qualitative variables are
typically represented by dummy variables which classify data into different categories.

For example, let yt denote the annual salary of college teacher t and xt the years
of teaching experience of t. Consider the dummy variable: Dt = 1 if t is a male and
Dt = 0 if t is a female. Then, the specification

yt = α0 + α1 Dt + βxt + et

yields two regression lines with different intercepts. The “male” regression line has the
intercept α0 + α1 , and the “female” regression line has the intercept α0 . We may test
the hypothesis α1 = 0 to see if there is a difference between the starting salaries of male
and female teachers.

This specification can be expanded to incorporate an interaction term between D and x:

yt = α0 + α1 Dt + β0 xt + β1 (Dt xt ) + et ,

which yields two regression lines with different intercepts and slopes. The slope of the “male” regression line is now β0 + β1 , whereas the slope of the “female” regression line
is β0 . By testing β1 = 0, we can check whether teaching experience is treated the same
in determining salaries for male and female teachers.
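A sketch of how such a specification is set up as a design matrix (the salary data here are entirely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(9)
T = 100
D = (rng.random(T) < 0.5).astype(float)          # 1 for male, 0 for female
x = rng.uniform(0, 30, size=T)                   # years of teaching experience
y = 30 + 2 * D + 1.5 * x + 0.3 * D * x + rng.normal(size=T)

# Intercept dummy and interaction term: y = a0 + a1*D + b0*x + b1*(D*x) + e.
X = np.column_stack([np.ones(T), D, x, D * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a0, a1, b0, b1 = coef
print("female line: intercept", a0, "slope", b0)
print("male line:   intercept", a0 + a1, "slope", b0 + b1)
```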

In the analysis of quarterly data, it is also common to include the seasonal dummy
variables D1t , D2t and D3t , where for i = 1, 2, 3, Dit = 1 if t is the observation of
the i th quarter and Dit = 0 otherwise. Similar to the previous example, the following
specification,

yt = α0 + α1 D1t + α2 D2t + α3 D3t + βxt + et ,

yields four regression lines. The regression line for the data of the i th quarter has
the intercept α0 + αi , i = 1, 2, 3, and the regression line for the fourth quarter has
the intercept α0 . Including seasonal dummies allows us to classify the levels of yt into
four seasonal patterns. Various interesting hypotheses can be tested based on this
specification. For example, one may test the hypotheses that α1 = α2 and α1 = α2 =
α3 = 0. By the Frisch-Waugh-Lovell theorem we know that the OLS estimate of β
can also be obtained from regressing yt∗ on x∗t , where yt∗ and x∗t are the residuals of



regressing, respectively, yt and xt on seasonal dummies. Although some seasonal effects


of yt and xt may be eliminated by the regressions on seasonal dummies, the residuals
yt∗ and x∗t are not the so-called “seasonally adjusted data,” which are usually computed
using different methods or algorithms.
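The Frisch-Waugh-Lovell result mentioned here can be verified directly. The sketch below is illustrative only (simulated quarterly data, NumPy assumed): it regresses yt and xt on a constant and three seasonal dummies and then shows that regressing the residuals on each other reproduces the coefficient of xt from the full regression.

    import numpy as np

    rng = np.random.default_rng(2)
    T = 120                                   # 30 years of quarterly data
    q = np.arange(T) % 4                      # quarter index 0, 1, 2, 3
    D = np.column_stack([(q == i).astype(float) for i in range(3)])  # D1, D2, D3
    x = rng.normal(size=T) + 0.5 * (q == 0)   # regressor with some seasonality
    y = 1.0 + 0.8 * x + D @ np.array([0.3, -0.2, 0.1]) + rng.normal(size=T)

    # Full regression: y on constant, D1, D2, D3 and x.
    Z = np.column_stack([np.ones(T), D, x])
    beta_full = np.linalg.lstsq(Z, y, rcond=None)[0][-1]

    # FWL: residuals of y and x from the constant-plus-dummies regression.
    S = np.column_stack([np.ones(T), D])
    M = np.eye(T) - S @ np.linalg.inv(S.T @ S) @ S.T    # annihilator of S
    y_star, x_star = M @ y, M @ x
    beta_fwl = (x_star @ y_star) / (x_star @ x_star)
    print(beta_full, beta_fwl)                # identical up to rounding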

Remark: The preceding examples show that, when a specification contains a constant
term, the number of dummy variables must be one less than the number of categories
the dummy variables are intended to classify. Otherwise, the specification suffers from
exact multicollinearity; this is known as the “dummy variable trap.”
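The trap is easy to demonstrate: with a constant term and all four seasonal dummies, the design matrix loses full column rank because the dummies sum to the constant. A small check (illustrative only, NumPy assumed):

    import numpy as np

    T = 12
    q = np.arange(T) % 4
    D_all = np.column_stack([(q == i).astype(float) for i in range(4)])   # D1..D4
    X_trap = np.column_stack([np.ones(T), D_all])       # constant + 4 dummies
    X_ok = np.column_stack([np.ones(T), D_all[:, :3]])  # constant + 3 dummies

    print(np.linalg.matrix_rank(X_trap))   # 4 < 5 columns: exact multicollinearity
    print(np.linalg.matrix_rank(X_ok))     # 4 = number of columns: full rank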

3.6 Limitations of the Classical Conditions


The previous estimation and testing results are based on the classical conditions. As
these conditions may be violated in practice, it is important to understand their limi-
tations.

Condition [A1] postulates that explanatory variables are non-stochastic. Although


this condition is quite convenient and facilitates our analysis, it is not practical. When
the dependent variable and regressors are economic variables, it does not make too
much sense to treat only the dependent variable as a random variable. This condition
may also be violated when a lagged dependent variable is included as a regressor, as in
many time-series analyses. Hence, it would be more reasonable to allow the regressors to be
random as well.

In [A2](i), the linear specification Xβ is assumed to be correct up to some unknown


parameters. It is possible that the systematic component IE(y) is in fact a non-linear
function of X. If so, the estimated regression hyperplane could be very misleading. For
example, an economic relation may change from one regime to another at some time
point, so that IE(y) is better characterized by a piecewise linear function. This is known
as the problem of structural change; see e.g., Exercise 3.14. Even when IE(y) is a linear
function, the specified X may include some irrelevant variables or omit some important
variables. Example 3.6 shows that in the former case, the OLS estimator β̂ T remains
unbiased but is less efficient. In the latter case, it can be shown that β̂ T is biased but
with a smaller variance-covariance matrix; see Exercise 3.6.

Condition [A2](ii) may also easily break down in many applications. For example,
when yt is the consumption of the t th household, it is likely that yt has smaller variation
for low-income families than for high-income families. When yt denotes the GDP growth
rate of the t th year, it is also likely that yt are correlated over time. In both cases, the



variance-covariance matrix of y cannot be expressed as σo2 I T . A consequence of the


failure of [A2](ii) is that the OLS estimator for var(β̂ T ), σ̂T2 (X ′ X)−1 , is biased, which
in turn renders the tests discussed in Section 3.3 invalid.

Condition [A3] may fail when yt have non-normal distributions. Although the BLUE
property of the OLS estimator does not depend on normality, [A3] is crucial for deriving
the distribution results in Section 3.3. When [A3] is not satisfied, the usual t and F
tests do not have the desired t and F distributions, and their exact distributions are
typically unknown. This causes serious problems for hypothesis testing.

Our discussion thus far suggests that the classical conditions are quite restrictive.
In subsequent chapters, we will try to relax these conditions and discuss more generally
applicable methods. These methods play an important role in contemporary empirical
studies.

Exercises
3.1 Construct a linear regression model for each equation below:
y = α xβ ,    y = α eβx ,    y = x/(αx − β),    y = eα+βx /(1 + eα+βx ).

3.2 Use the general formula (3.4) to find the OLS estimators from the specifications
below:
yt = α + βxt + e, t = 1, . . . , T,

yt = α + β(xt − x̄) + e, t = 1, . . . , T,

yt = βxt + e, t = 1, . . . , T.

Compare the resulting regression lines.

3.3 Given the specification yt = α + βxt + e, t = 1, . . . , T , assume that the classical


conditions hold. Let α̂T and β̂T be the OLS estimators for α and β, respectively.

(a) Apply the general formula of Theorem 3.4(c) to show that


var(α̂T ) = σo2 Σ_{t=1}^T x_t^2 / [ T Σ_{t=1}^T (xt − x̄)2 ],

var(β̂T ) = σo2 / Σ_{t=1}^T (xt − x̄)2 ,

cov(α̂T , β̂T ) = −σo2 x̄ / Σ_{t=1}^T (xt − x̄)2 .



What kind of data can make the variances of the OLS estimators smaller?
(b) Suppose that a prediction ŷT +1 = α̂T + β̂T xT +1 is made based on the new
observation xT +1 . Show that

IE(ŷT +1 − yT +1 ) = 0,

var(ŷT +1 − yT +1 ) = σo2 [ 1 + 1/T + (xT +1 − x̄)2 / Σ_{t=1}^T (xt − x̄)2 ].

What kind of xT +1 can make the variance of prediction error smaller?
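As a side check (not part of the exercise), the formulas in (a) can be verified numerically; the sketch assumes NumPy and uses arbitrary simulated regressor values.

    import numpy as np

    rng = np.random.default_rng(3)
    T, sigma_o = 50, 1.0
    x = rng.uniform(0, 10, size=T)
    X = np.column_stack([np.ones(T), x])

    # Exact variance-covariance matrix sigma_o^2 (X'X)^{-1} ...
    V = sigma_o**2 * np.linalg.inv(X.T @ X)

    # ... against the closed-form expressions in part (a).
    sxx = np.sum((x - x.mean()) ** 2)
    print(V[0, 0], sigma_o**2 * np.sum(x**2) / (T * sxx))   # var(alpha_hat)
    print(V[1, 1], sigma_o**2 / sxx)                        # var(beta_hat)
    print(V[0, 1], -sigma_o**2 * x.mean() / sxx)            # cov(alpha_hat, beta_hat)

The three pairs agree exactly; in particular, greater variation in xt (a larger sxx) reduces var(β̂T ).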

3.4 Given the specification (3.1), suppose that X is not of full column rank. Does
there exist a unique ŷ ∈ span(X) that minimizes (y − ŷ)′ (y − ŷ)? If yes, is there
a unique β̂ T such that ŷ = X β̂ T ? Why or why not?

3.5 Given the estimated model

yt = β̂1,T + β̂2,T xt2 + · · · + β̂k,T xtk + êt ,

consider the standardized regression:

yt∗ = β̂∗2,T x∗t2 + · · · + β̂∗k,T x∗tk + ê∗t ,

where the β̂∗i,T are known as the beta coefficients, and

yt∗ = (yt − ȳ)/sy ,    x∗ti = (xti − x̄i )/sxi ,    ê∗t = êt /sy ,

with s2y = (T − 1)−1 Σ_{t=1}^T (yt − ȳ)2 the sample variance of yt and, for each
i, s2xi = (T − 1)−1 Σ_{t=1}^T (xti − x̄i )2 the sample variance of xti . What is the
relationship between β̂∗i,T and β̂i,T ? Give an interpretation of the beta coefficients.

3.6 Given the following specification

y = X 1 β 1 + e,

where X 1 (T × k1 ) is a non-stochastic matrix, let b̂1,T denote the resulting OLS


estimator. Suppose that IE(y) = X 1 b1 + X 2 b2 for some b1 and b2 , where X 2
(T × k2 ) is also a non-stochastic matrix, b2 ≠ 0, and var(y) = σo2 I.

(a) Is b̂1,T unbiased?


(b) Is σ̂T2 unbiased?
(c) What is var(b̂1,T )?



 
(d) Let β̂ T = (β̂ ′1,T β̂ ′2,T )′ denote the OLS estimator obtained from estimating
the specification y = X 1 β 1 + X 2 β 2 + e. Compare var(β̂ 1,T ) and var(b̂1,T ).
(e) Does your result in (d) change when X ′1 X 2 = 0?

3.7 Given the specification (3.1), will the changes below affect the resulting OLS
estimator β̂ T , t ratios, and R2 ?

(a) y ∗ = 1000 × y and X are used as the dependent and explanatory variables.
(b) y and X ∗ = 1000 × X are used as the dependent and explanatory variables.
(c) y ∗ and X ∗ are used as the dependent and explanatory variables.

3.8 Let Rk2 denote the centered R2 obtained from the model with k explanatory vari-
ables.

(a) Show that

Rk2 = [ Σ_{i=1}^k β̂i,T Σ_{t=1}^T (xti − x̄i )yt ] / [ Σ_{t=1}^T (yt − ȳ)2 ],

where β̂i,T is the i th element of β̂ T , x̄i = Σ_{t=1}^T xti /T , and ȳ = Σ_{t=1}^T yt /T .

(b) Show that Rk2 ≥ R2_{k−1} .

3.9 Consider the following two regression lines: ŷ = α̂ + β̂x and x̂ = γ̂ + δ̂y. At
which point do these two lines intersect? Use the result in Exercise 3.8 to show
that these two regression lines coincide if and only if the centered R2 s for both
regressions are one.

3.10 Given the specification (3.1), suppose that X does not contain the constant term.
Show that the centered R2 need not be bounded between zero and one if it is
computed as (3.7).

3.11 Rearrange the matrix X as [xi X i ], where xi is the i th column of X. Let ui and
v i denote the residual vectors of regressing y on X i and xi on X i , respectively.
Define the partial correlation coefficient of y and xi as

ri = u′i v i / [ (u′i ui )1/2 (v ′i v i )1/2 ].

Let Ri2 and R2 be obtained from the regressions of y on X i and y on X, respec-


tively.



(a) Apply the Frisch-Waugh-Lovell Theorem to show

I − P = (I − P i ) − (I − P i )xi x′i (I − P i ) / [ x′i (I − P i )xi ],

where P = X(X ′ X)−1 X ′ and P i = X i (X ′i X i )−1 X ′i . This result can also
be derived using the matrix inversion formula; see, e.g., Greene (1993, p. 27).

(b) Show that (1 − R2 )/(1 − Ri2 ) = 1 − ri2 , and use this result to verify

R2 − Ri2 = ri2 (1 − Ri2 ).

What does this result tell you?

(c) Let τi denote the t ratio of β̂i,T , the i th element of β̂ T obtained from regress-
ing y on X. First show that τi2 = (T − k)ri2 /(1 − ri2 ), and use this result to
verify

ri2 = τi2 /(τi2 + T − k).

(d) Combine the results in (b) and (c) to show

R2 − Ri2 = τi2 (1 − R2 )/(T − k).

What does this result tell you?
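The identities in (b) and (c) can be checked numerically. The sketch below is an illustration only, not a solution of the exercise; it assumes NumPy and simulated data with a constant term and k = 4 regressors.

    import numpy as np

    rng = np.random.default_rng(4)
    T, k = 100, 4
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
    y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=T)

    def resid(Z, w):                       # residuals from regressing w on Z
        b, *_ = np.linalg.lstsq(Z, w, rcond=None)
        return w - Z @ b

    def centered_R2(Z, w):
        e = resid(Z, w)
        return 1.0 - (e @ e) / np.sum((w - w.mean()) ** 2)

    i = 2                                  # pick one non-constant regressor
    xi, X_i = X[:, i], np.delete(X, i, axis=1)

    u, v = resid(X_i, y), resid(X_i, xi)   # partial correlation r_i
    r_i = (u @ v) / np.sqrt((u @ u) * (v @ v))

    R2, R2_i = centered_R2(X, y), centered_R2(X_i, y)
    print(R2 - R2_i, r_i**2 * (1.0 - R2_i))        # identity in (b)

    b = np.linalg.lstsq(X, y, rcond=None)[0]       # t ratio of the i-th coefficient
    s2 = resid(X, y) @ resid(X, y) / (T - k)
    tau_i = b[i] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[i, i])
    print(r_i**2, tau_i**2 / (tau_i**2 + T - k))   # identity in (c)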

3.12 Suppose that a linear model with k explanatory variables has been estimated.

(a) Show that σ̂T2 = (Centered TSS)(1 − R̄2 )/(T − 1). What does this result tell
you?

(b) Suppose that we want to test the hypothesis that s coefficients are zero. Show
that the F statistic can be written as

ϕ = [ (T − k + s)σ̂c2 − (T − k)σ̂u2 ] / (s σ̂u2 ),

where σ̂c2 and σ̂u2 are the variance estimates of the constrained and uncon-
strained models, respectively. Let a = (T − k)/s. Show that

σ̂c2 / σ̂u2 = (a + ϕ) / (a + 1).

(c) Based on the results in (a) and (b), what can you say when ϕ > 1 and ϕ < 1?



3.13 For the linear specification y = Xβ + e, a set of k − m linear restrictions on β
can alternatively be expressed as β = Sθ + d, where θ is an m-dimensional
vector of unknown parameters, S is a k × m matrix of pre-specified constants with
full column rank, and d is a vector of pre-specified constants.

(a) By incorporating this restriction into the specification, find the OLS estima-
tor θ̂ of θ.
(b) The constrained least squares estimator of β is β̂ d = S θ̂ + d. Show that

β̂ d = QS β̂ + (I − QS )d,

where QS = S(S ′ X ′ XS)−1 S ′ X ′ X. Is this decomposition orthogonal?


(c) Show that

X β̂ d = P XS y + (I − P XS )Xd,

where P XS = XS(S ′ X ′ XS)−1 S ′ X ′ . Use a graph to illustrate this result.

3.14 (The Chow Test) Consider the model of a one-time structural change at a known
change point:

[ y 1 ]   [ X 1    0  ] [ β o ]   [ e1 ]
[ y 2 ] = [ X 2   X 2 ] [ δ o ] + [ e2 ] ,

where y 1 and y 2 are T1 × 1 and T2 × 1, X 1 and X 2 are T1 × k and T2 × k,


respectively. The null hypothesis is δ o = 0. How would you test this hypothesis
based on the constrained and unconstrained models?
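As an illustration of how such a test could be set up (a sketch only, not the intended answer; it assumes NumPy and simulated data), one can compare the constrained pooled regression with the unconstrained stacked regression through the usual F statistic:

    import numpy as np

    rng = np.random.default_rng(5)
    T1, T2, k = 60, 60, 2
    X1 = np.column_stack([np.ones(T1), rng.normal(size=T1)])
    X2 = np.column_stack([np.ones(T2), rng.normal(size=T2)])
    beta_o, delta_o = np.array([1.0, 0.5]), np.array([0.0, 0.0])   # no break here
    y1 = X1 @ beta_o + rng.normal(size=T1)
    y2 = X2 @ (beta_o + delta_o) + rng.normal(size=T2)

    def ssr(X, y):                         # sum of squared OLS residuals
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        return e @ e

    y = np.concatenate([y1, y2])
    X_c = np.vstack([X1, X2])                              # constrained: delta_o = 0
    X_u = np.block([[X1, np.zeros((T1, k))], [X2, X2]])    # unconstrained stacking

    F = ((ssr(X_c, y) - ssr(X_u, y)) / k) / (ssr(X_u, y) / (T1 + T2 - 2 * k))
    print(F)    # compare with an F(k, T1 + T2 - 2k) critical value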

References

Davidson, Russell and James G. MacKinnon (1993). Estimation and Inference in Econo-
metrics, New York, NY: Oxford University Press.

Goldberger, Arthur S. (1991). A Course in Econometrics, Cambridge, MA: Harvard


University Press.

Greene, William H. (2000). Econometric Analysis, Fourth edition, Upper Saddle River, NJ:
Prentice Hall.

Harvey, Andrew C. (1990). The Econometric Analysis of Time Series, Second edition,
Cambridge, MA: MIT Press.



Intriligator, Michael D., Ronald G. Bodkin, and Cheng Hsiao (1996). Econometric
Models, Techniques, and Applications, Second edition, Upper Saddle River, NJ:
Prentice Hall.

Johnston, J. (1984). Econometric Methods, Third edition, New York, NY: McGraw-Hill.

Judge, George G., R. Carter Hill, William E. Griffiths, Helmut Lütkepohl, and Tsoung-
Chao Lee (1988). Introduction to the Theory and Practice of Econometrics, Sec-
ond edition, New York, NY: Wiley.

Maddala, G. S. (1992). Introduction to Econometrics, Second edition, New York, NY:


Macmillan.

Manski, Charles F. (1991). Regression, Journal of Economic Literature, 29, 34–50.

Rao, C. Radhakrishna (1973). Linear Statistical Inference and Its Applications, Second
edition, New York, NY: Wiley.

Ruud, Paul A. (2000). An Introduction to Classical Econometric Theory, New York,


NY: Oxford University Press.

Theil, Henri (1971). Principles of Econometrics, New York, NY: Wiley.

