Introduction To Econometrics
CHAPTER 1
Introduction
The economic theories you have learnt in various economics courses suggest
many relationships among economic variables. For instance, in microeconomics
we learn demand and supply models in which the quantities demanded and
supplied of a good depend on its price. In macroeconomics, we study ‘investment
function’ to explain the amount of aggregate investment in the economy as the
rate of interest changes; and ‘consumption function’ that relates aggregate
consumption to the level of aggregate disposable income.
What is Econometrics?
“Econometrics is the science which integrates economic theory, economic statistics, and
mathematical economics to investigate the empirical support of the general schematic law
established by economic theory. It is a special type of economic analysis and research in
which the general economic theories, formulated in mathematical terms, are combined with
empirical measurements of economic phenomena. Starting from the relationships of
economic theory, we express them in mathematical terms so that they can be measured.
We then use specific methods, called econometric methods in order to obtain numerical
estimates of the coefficients of the economic relationships.”
I) Economic Models:
Any economic theory is an abstraction from the real world. For one reason, the
immense complexity of the real world economy makes it impossible for us to
understand all interrelationships at once. Another reason is that not all the
interrelationships are equally important for the understanding of the
economic phenomenon under study. The sensible procedure is, therefore, to pick
out the important factors and relationships relevant to our problem and to focus
our attention on these alone. Such a deliberately simplified analytical framework
is called an economic model. A model typically consists of:
1. A set of variables
Example 1.1: Economic theory postulates that the demand for a commodity
depends on its price, on the prices of other related commodities, on consumers’
income and on tastes. This is an exact relationship which can be written
mathematically as:
Q = b0 + b1P + b2P0 + b3Y + b4t
The above demand equation is exact. However, many more factors may affect
demand. In econometrics the influence of these 'other' factors is taken into
account by introducing into the economic relationship a random variable.
In our example, the demand function studied with the tools of econometrics
would be of the stochastic form:

Q = b0 + b1P + b2P0 + b3Y + b4t + u

where u stands for the random factors which affect the quantity demanded.
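The role of the random term u can be illustrated with a small simulation. The sketch below is a minimal example, assuming hypothetical coefficient values and Python with NumPy; it generates quantities demanded that scatter around the exact demand line only because of u.

```python
import numpy as np

# Hypothetical coefficients for the demand function (illustration only)
b0, b1, b2, b3, b4 = 100.0, -2.5, 1.2, 0.01, 0.5

rng = np.random.default_rng(0)
n = 50
P  = rng.uniform(5, 15, n)       # own price
P0 = rng.uniform(5, 15, n)       # price of a related commodity
Y  = rng.uniform(500, 1500, n)   # consumers' income
t  = rng.uniform(0, 10, n)       # taste index
u  = rng.normal(0, 5, n)         # random disturbance

Q_exact      = b0 + b1*P + b2*P0 + b3*Y + b4*t   # deterministic part
Q_stochastic = Q_exact + u                        # observed quantity demanded

print(Q_stochastic[:5] - Q_exact[:5])   # the deviations are exactly the u's
```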
In this step the econometrician has to express the relationships between economic
variables in mathematical form. This step involves the determination of three
important tasks:
a) The dependent and independent (explanatory) variables to be included in the model.
b) The a priori theoretical expectations about the size and sign of the
parameters of the function.
c) The mathematical form of the model (number of equations, linear or non-linear form of the equations).
Specification of the model is the most important and the most difficult stage of any
econometric research. It is often the weakest point of most econometric
applications. In this stage there exists an enormous likelihood of committing
errors or of incorrectly specifying the model. Some of the common
reasons for incorrect specification of econometric models are:
a. The imperfections and looseness of statements in economic theories.
b. The limitation of our knowledge of the factors which are operative in any
particular case.
c. The formidable obstacles presented by data requirements in the estimation of large models.
This stage consists of deciding whether the estimates of the parameters are
theoretically meaningful and statistically satisfactory. This stage enables the
econometrician to evaluate the results of calculations and determine the reliability
of the results. For this purpose we use various criteria which may be classified
into three groups:
I. Theoretical plausibility. The model should be compatible with the postulates of
economic theory and adequately describe the economic phenomena to which it relates.
II. Explanatory ability. The model should be able to explain the observations of
the actual world. It must be consistent with the observed behaviour of the
economic variables whose relationship it determines.
III. Accuracy of the estimates of the parameters. The estimates of the coefficients
should be accurate in the sense that they should approximate as best as
possible the true parameters of the structural model. The estimates should
if possible possess the desirable properties of unbiasedness, consistency
and efficiency.
Economic data sets come in a variety of types. While some econometric methods
can be applied with little or no modification to many different kinds of data sets,
the special features of some data sets must be accounted for or should be
exploited. There are broadly three types of data that can be employed in
quantitative analysis of financial problems: We next describe the most important
data structures encountered in applied work.
Cross-sectional data are data on one or more variables collected at a single point
in time. It consists of a sample of individuals, households, firms, cities, states,
countries, or a variety of other units, taken at a given point in time. Sometimes
the data on all units do not correspond to precisely the same time period. For
example, several families may be surveyed during different weeks within a year.
In a pure cross section analysis we would ignore any minor timing differences in
collecting the data. If a set of families was surveyed during different weeks of the
same year, we would still view this as a cross-sectional data set.
An important feature of cross-sectional data is that we can often assume that they
have been obtained by random sampling from the underlying population.
Sometimes random sampling is not appropriate as an assumption for analyzing
cross-sectional data. For example, suppose we are interested in studying factors
that influence the accumulation of family wealth. We could survey a random
sample of families, but some families might refuse to report their wealth. If, for
example, wealthier families are less likely to disclose their wealth, then the
resulting sample on wealth is not a random sample from the population of all
families. Another violation of random sampling occurs when we sample from
units that are large relative to the population, particularly geographical units.
The potential problem in such cases is that the population is not large enough to
reasonably assume the observations are independent draws.
Cross-sectional data are widely used in economics and other social sciences. In
economics, the analysis of cross-sectional data is closely aligned with the applied
microeconomics fields, such as labor economics, state and local public finance,
industrial organization, urban economics, demography, and health economics.
Data on individuals, households, firms, and cities at a given point in time are
important for testing microeconomic hypotheses and evaluating economic
policies.
Time series data, as the name suggests, are data that have been collected over a
period of time on one or more variables. Time series data have associated with
them a particular frequency of observation or collection of data points. The
frequency is simply a measure of the interval over, or the regularity with which, the
data are collected or recorded. Because past events can influence future events
and lags in behavior are prevalent in the social sciences, time is an important
dimension in a time series data set. Unlike the arrangement of cross-sectional
data, the chronological ordering of observations in a time series conveys
potentially important information.
A panel data (or longitudinal data) set consists of a time series for each cross-
sectional member in the data set. It has the dimensions of both time series and
cross-sections. As an example, suppose we have wage, education, and
employment history for a set of individuals followed over a ten-year period. Or
we might collect information, such as investment and financial data, about the
same set of firms over a five-year time period. Panel data can also be collected on
geographical units. For example, we can collect data for the same set of counties
in Ethiopia on immigration flows, tax rates, wage rates, government
expenditures, etc., for the years 1995, 2000, and 2005.
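A minimal sketch of how the three data structures differ in shape, using pandas; the variable names and numbers below are made up purely for illustration.

```python
import pandas as pd

# Cross-sectional data: many units observed at a single point in time
cross_section = pd.DataFrame({
    "household":   [1, 2, 3],
    "income":      [5200, 3400, 7800],
    "consumption": [4100, 3000, 6200],
})

# Time series data: one unit observed over many periods (annual frequency here)
time_series = pd.DataFrame({
    "year": [2003, 2004, 2005],
    "gdp":  [73.4, 81.8, 92.6],
}).set_index("year")

# Panel data: the same units observed over several periods
panel = pd.DataFrame({
    "region":   ["A", "A", "B", "B"],
    "year":     [1995, 2000, 1995, 2000],
    "tax_rate": [0.12, 0.15, 0.10, 0.11],
}).set_index(["region", "year"])

print(cross_section.shape, time_series.shape, panel.index.nlevels)
```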
CHAPTER 2
Economic theories are mainly concerned with the relationships among various
economic variables. These relationships, when phrased in mathematical terms, can
predict the effect of one variable on another. The functional relationships of these
variables define the dependence of one variable upon the other variable (s) in the
specific form. The specific functional forms may be linear, quadratic, logarithmic,
exponential, hyperbolic, or any other form.
In such a case, for any given value of X, the dependent variable Y assumes some
specific value only with some probability. Let’s illustrate the distinction between
stochastic and non-stochastic relationships with the help of a supply function.
Assuming that the supply for a certain commodity depends on its price (other
determinants taken to be constant) and the function being linear, the relationship
can be put as:
Q = f(P) = α + βP ……………………………………………………. (2.1)
The above relationship between P and Q is such that for a particular value of P,
there is only one corresponding value of Q. This is, therefore, a deterministic
(non-stochastic) relationship since for each price there is always only one
corresponding quantity supplied. This implies that all the variation in Y is due
solely to changes in X, and that there are no other factors affecting the dependent
variable.
If this were true all the points of price-quantity pairs, if plotted on a two-
dimensional plane, would fall on a straight line. However, if we gather
observations on the quantity actually supplied in the market at various prices and
we plot them on a diagram we see that they do not fall on a straight line.
The deviation of the observations from the line may be attributed to several factors:
a. Omission of variables from the function
b. Random behaviour of human beings
c. Imperfect specification of the mathematical form of the model
d. Error of aggregation
e. Error of measurement

In order to take these sources of error into account, we introduce into the function a random variable u, so that the relationship becomes stochastic:

Yi = α + βXi + ui ………………………………………………………. (2.2)
Thus a stochastic model is a model in which the dependent variable is not only
determined by the explanatory variable(s) included in the model but also by
others which are not included in the model.
The above stochastic relationship (2.2) with one explanatory variable is called
simple linear regression model. The true relationship which connects the variables
involved is split into two parts: a part represented by a line and a part represented
by the random term ‘u’.
The scatter of observations represents the true relationship between Y and X. The
line represents the exact part of the relationship and the deviation of the
observation from the line represents the random component of the relationship.
Were it not for the errors in the model, we would observe all the points on the line
Y1', Y2', ..., Yn' corresponding to X1, X2, ..., Xn. However, because of the random
disturbance, we observe

Yi = (α + βXi) + ui
(dependent variable = regression line + random variable)
The first component in the bracket is the part of Y explained by the changes in X
and the second is the part of Y not explained by X, that is to say the change in Y is
due to the random influence of ui .
The classical econometricians assumed that the model should be linear in the parameters
regardless of whether the explanatory and the dependent variables are linear or
not. This is because if the parameters are non-linear it is difficult to estimate them,
since their values are not known and you are only given data on the dependent
and independent variables.
Example: Y = α + βX + u is linear in both the parameters and the variables, while
Y = α + βlnX + u is linear in the parameters only; both forms satisfy the assumption.

u is a random real variable.
This means that the value which u may assume in any one period depends on
chance; it may be positive, negative or zero. Every value has a certain probability
of being assumed by u in any particular instance.
3. The mean value of the random variable(U) in any particular period is zero
This means that for each value of x, the random variable(u) may assume various
values, some greater than zero and some smaller than zero, but if we considered
all the positive and negative values of u, for any given value of X, they would
have on average value equal to zero. In other words the positive and negative
values of u cancel each other.
For all values of X, the u’s will show the same dispersion around their mean. In
Fig. 2.c this assumption is denoted by the fact that the values that u can assume lie
within the same limits, irrespective of the value of X. For X1, u can assume any
value within the range AB; for X2, u can assume any value within the range CD,
which is equal to AB, and so on.
Graphically, this is shown in Fig. 2.c. Mathematically, Var(ui) = E[ui − E(ui)]² = E(ui²) = σ² (constant for all i).

The values of u are also assumed to be normally distributed: for each value of X, the values of u have a bell-shaped symmetrical distribution about their zero mean and constant variance σ², i.e.

Ui ~ N(0, σ²) ………………………………………..……2.4
The random terms of different observations are independent (the assumption of no autocorrelation).
This means the value which the random term assumed in one period does not
depend on the value which it assumed in any other period.
Algebraically,

Cov(ui, uj) = E{[ui − E(ui)][uj − E(uj)]} = E(ui uj) = 0 …………………………..…. (2.5)
The Xi values are fixed in repeated sampling. This means that, in taking a large number of samples on Y and X, the Xi values are
the same in all samples, but the ui values differ from sample to sample, and so
therefore do the values of Yi.

The random variable u is independent of the explanatory variables. This means there is no correlation between the random variable and the
explanatory variable. If two variables are unrelated, their covariance is zero.
Proof:
Cov(Xi, ui) = E{[Xi − E(Xi)][ui − E(ui)]} = E(Xi ui) − E(Xi)E(ui) = E(Xi ui) = Xi E(ui) = 0
(since Xi is non-stochastic and E(ui) = 0).
We can now use the above assumptions to derive the following basic concepts.
1. The dependent variable Yi is normally distributed with mean E(Yi) = α + βXi and variance σ².

Proof: E(Yi) = E(α + βXi + ui) = α + βXi, since E(ui) = 0.
Var(Yi) = E[Yi − E(Yi)]² = E(ui²) = σ².

The Xi values are a set of fixed values by assumption 5 and therefore do not affect the shape of
the distribution of Yi, so

Yi ~ N(α + βXi, σ²)

2. Successive values of the dependent variable are independent, i.e.

Cov(Yi, Yj) = 0

Proof: Cov(Yi, Yj) = E{[Yi − E(Yi)][Yj − E(Yj)]} = E(ui uj) = 0
(since Yi = α + βXi + ui and Yj = α + βXj + uj, and E(ui uj) = 0).
Specifying the model and stating its underlying assumptions are the first stage of
any econometric application. The next step is the estimation of the numerical
values of the parameters. The parameters may be estimated by several methods; here we will
deal with the ordinary least squares (OLS) and the maximum likelihood (ML) methods of estimation.

In the model Yi = α + βXi + ui, Y and X represent their respective population values, and α and β are
called the true parameters, since they would be estimated from the population values of Y
and X. But it is difficult to obtain the population values of Y and X because of
technical or economic reasons, so we are forced to take sample values of Y and
X. The parameters estimated from the sample values of Y and X are called the
estimates of the true parameters and are symbolized as α̂ and β̂. The sample counterpart of the model is

Yi = α̂ + β̂Xi + ei

where α̂ and β̂ are estimated from the sample of Y and X, and ei represents the sample
counterpart of the random disturbance ui. The method of ordinary least squares
involves finding values for the estimates α̂ and β̂ which will minimize the sum of the squared residuals:

Σei² = Σ(Yi − α̂ − β̂Xi)² …………………………………………. (2.7)
To find the values of α̂ and β̂ that minimize this sum, we have to partially
differentiate Σei² with respect to α̂ and β̂ and set the partial derivatives equal to
zero.

1. ∂Σei²/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) = 0 .......................................................(2.8)

which gives the first normal equation:
ΣYi = nα̂ + β̂ΣXi ..........................................................................(2.9)
so that α̂ = Ȳ − β̂X̄ ......................................................................(2.10)

2. ∂Σei²/∂β̂ = −2ΣXi(Yi − α̂ − β̂Xi) = 0 ..................................................(2.11)

Note at this point that the term in parentheses in equations (2.8) and (2.11) is the
residual, ei = Yi − α̂ − β̂Xi. Hence it is possible to rewrite (2.8) and (2.11) as

Σei = 0  and  ΣXiei = 0 ............................................(2.12)

Rearranging (2.11) gives the second normal equation:
ΣXiYi = α̂ΣXi + β̂ΣXi² ....................................................................(2.13)
Equations (2.9) and (2.13) are called the Normal Equations. Substituting the value
of α̂ from (2.10) into (2.13), we get:

ΣYiXi = (Ȳ − β̂X̄)ΣXi + β̂ΣXi²
      = ȲΣXi − β̂X̄ΣXi + β̂ΣXi²
ΣYiXi − ȲΣXi = β̂(ΣXi² − X̄ΣXi)
ΣXiYi − nX̄Ȳ = β̂(ΣXi² − nX̄²)

β̂ = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²) …………………. (2.14)

Equation (2.14) can be rewritten in a somewhat different way, using the identities

Σ(X − X̄)(Y − Ȳ) = Σ(XY − X̄Y − XȲ + X̄Ȳ) = ΣXY − nX̄Ȳ ………… (2.15)
Σ(X − X̄)² = ΣX² − nX̄² ………………………………………………… (2.16)

Substituting (2.15) and (2.16) into (2.14), we get

β̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

Now, denoting (Xi − X̄) as xi and (Yi − Ȳ) as yi (deviation form), we get

β̂ = Σxiyi / Σxi² ……………………………………… (2.17)
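Formulas (2.10) and (2.17) can be computed directly. The following minimal sketch, assuming small made-up data arrays and Python with NumPy, estimates β̂ = Σxy/Σx² and α̂ = Ȳ − β̂X̄.

```python
import numpy as np

# Hypothetical sample observations on X and Y (illustration only)
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x = X - X.mean()              # deviations from the mean
y = Y - Y.mean()

beta_hat  = np.sum(x * y) / np.sum(x**2)    # equation (2.17)
alpha_hat = Y.mean() - beta_hat * X.mean()  # equation (2.10)

print(alpha_hat, beta_hat)
```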
Estimation of a function with zero intercept: suppose it is desired to fit the line
Yi = α + βXi + ui subject to the restriction α = 0. To estimate β̂, the problem is put in the
form of a restricted minimization problem and the Lagrange method is applied. We minimize

Z = Σ(Yi − α̂ − β̂Xi)² − λα̂ ,  subject to α̂ = 0

∂Z/∂α̂ = −2Σ(Yi − α̂ − β̂Xi) − λ = 0 …………………… (i)
∂Z/∂β̂ = −2Σ(Yi − α̂ − β̂Xi)(Xi) = 0 …………………… (ii)
∂Z/∂λ = α̂ = 0 …………………………………………… (iii)

Substituting (iii) into (ii) and rearranging:

ΣXi(Yi − β̂Xi) = 0
ΣYiXi − β̂ΣXi² = 0

β̂ = ΣXiYi / ΣXi² ……………………………………..(2.18)
This formula involves the actual values (observations) of the variables and not their
deviations from their means.

The ideal or optimum properties that the OLS estimates possess may be
summarized by a well-known theorem known as the Gauss-Markov Theorem.
Statement of the theorem: "Given the assumptions of the classical linear regression model, the OLS
estimators, in the class of linear and unbiased estimators, have the minimum variance, i.e. the OLS
estimators are BLUE."

According to this theorem, under the basic assumptions of the classical linear
regression model, the least squares estimators are linear, unbiased and have
minimum variance (i.e. are best of all linear unbiased estimators). Sometimes the
theorem is referred to as the BLUE theorem, i.e. Best, Linear, Unbiased Estimator. An
estimator is called BLUE if it is:
a. Linear: a linear function of the random variable, such as the dependent variable Y.
b. Unbiased: its average or expected value is equal to the true population parameter.
c. Minimum variance: it has a minimum variance in the class of linear and unbiased estimators
(an unbiased estimator with the least variance is known as an efficient estimator).

According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE
properties. The detailed proofs of these properties are presented below.
a) Linearity: (for β̂)

From (2.17): β̂ = Σxiyi/Σxi² = Σxi(Y − Ȳ)/Σxi² = (ΣxiY − ȲΣxi)/Σxi²

(but Σxi = Σ(X − X̄) = ΣX − nX̄ = nX̄ − nX̄ = 0)

β̂ = ΣxiY/Σxi² ;  now, let ki = xi/Σxi²  (i = 1, 2, ..., n)

β̂ = ΣkiYi ……………………………………………………………… (2.19)

⇒ β̂ is linear in Y.
b) Unbiasedness:

Proposition: α̂ and β̂ are the unbiased estimators of the true parameters α and β.

From your statistics course, you may recall that if θ̂ is an estimator of θ, then
E(θ̂) − θ is the amount of bias, and if θ̂ is an unbiased estimator of θ then the bias = 0, i.e.
E(θ̂) − θ = 0 ⇒ E(θ̂) = θ.

In our case, α̂ and β̂ are estimators of the true parameters α and β. To show that they
are the unbiased estimators of their respective parameters means to prove that:

E(β̂) = β  and  E(α̂) = α
Proof (for β̂): β̂ = ΣkiYi = Σki(α + βXi + ui) = αΣki + βΣkiXi + Σkiui

But Σki = Σxi/Σxi² = Σ(X − X̄)/Σxi² = (ΣX − nX̄)/Σxi² = (nX̄ − nX̄)/Σxi² = 0

⇒ Σki = 0 ………………………………………………………………… (2.20)

ΣkiXi = ΣxiXi/Σxi² = Σ(X − X̄)Xi/Σxi² = (ΣX² − X̄ΣX)/(ΣX² − nX̄²) = (ΣX² − nX̄²)/(ΣX² − nX̄²) = 1

⇒ ΣkiXi = 1 .............................……………………………………………(2.21)

Therefore, β̂ = β + Σkiui  ⇒  β̂ − β = Σkiui ……………………………(2.22)

E(β̂) = β + ΣkiE(ui) = β, since E(ui) = 0

⇒ β̂ is an unbiased estimator of β.
From the proof of the linearity property under (a), we know that:

α̂ = Ȳ − β̂X̄ = (1/n)ΣYi − X̄ΣkiYi = Σ(1/n − X̄ki)Yi

Substituting Yi = α + βXi + ui:

α̂ = αΣ(1/n − X̄ki) + βΣ(1/n − X̄ki)Xi + Σ(1/n − X̄ki)ui

Since Σki = 0 and ΣkiXi = 1, we have Σ(1/n − X̄ki) = 1 and Σ(1/n − X̄ki)Xi = X̄ − X̄ = 0, so that

α̂ = α + Σ(1/n − X̄ki)ui
α̂ − α = Σ(1/n − X̄ki)ui ……………………(2.23)

E(α̂) = α, since E(ui) = 0 ……………………(2.24)

⇒ α̂ is an unbiased estimator of α.
Now, we have to establish that, out of the class of linear and unbiased estimators
of α and β, α̂ and β̂ possess the smallest sampling variances. For this, we shall
first obtain the variances of β̂ and α̂ and then establish that each has the minimum
variance in comparison with the variances of other linear and unbiased estimators
obtained by any econometric method other than OLS.
a. Variance of β̂

var(β̂) = E[β̂ − E(β̂)]² = E(β̂ − β)² = E(Σkiui)² ……………………………(2.25)
       = Σki²E(ui²) + 2ΣΣkikjE(uiuj),  i ≠ j
       = σ²Σki²,  since E(ui²) = σ² and E(uiuj) = 0

But Σki² = Σxi²/(Σxi²)² = 1/Σxi²

Therefore, var(β̂) = σ²Σki² = σ²/Σxi² ……………………………………………..(2.26)
b. Variance of α̂

var(α̂) = E[α̂ − E(α̂)]² = E(α̂ − α)² ……………………………………………(2.27)
       = E[Σ(1/n − X̄ki)ui]²
       = Σ(1/n − X̄ki)²E(ui²)
       = σ²Σ(1/n − X̄ki)²
       = σ²Σ(1/n² − (2/n)X̄ki + X̄²ki²)
       = σ²(1/n − (2/n)X̄Σki + X̄²Σki²)
       = σ²(1/n + X̄²/Σxi²),  since Σki = 0 and Σki² = 1/Σxi²

Again, 1/n + X̄²/Σxi² = (Σxi² + nX̄²)/(nΣxi²) = ΣXi²/(nΣxi²)

Therefore, var(α̂) = σ²(1/n + X̄²/Σxi²) = σ²ΣXi²/(nΣxi²) …………………………(2.28)
We have computed the variances of the OLS estimators. Now it is time to check whether
these variances of the OLS estimators do possess the minimum variance property,
compared with the variances of other estimators of the true α and β, other than
α̂ and β̂.
1. Minimum variance of β̂

Suppose β* is another linear and unbiased estimator of β, given by β* = ΣwiYi,
where wi = ki + ci and ci is an arbitrary set of constants.

β* = Σwi(α + βXi + ui),  since Yi = α + βXi + ui
   = αΣwi + βΣwiXi + Σwiui

For β* to be unbiased we require Σwi = 0 and ΣwiXi = 1. Since Σki = 0 and ΣkiXi = 1,
these conditions imply Σci = 0 and ΣciXi = 0, and hence Σkici = Σcixi/Σxi² = 0.

Thus, from the above results:

var(β*) = Σwi²var(Yi) = σ²Σwi² = σ²Σ(ki + ci)² = σ²(Σki² + Σci²),  since Σkici = 0
        = σ²/Σxi² + σ²Σci²
        = var(β̂) + σ²Σci²

Given that ci is an arbitrary constant, σ²Σci² is positive, i.e. it is greater than zero.
Thus var(β*) ≥ var(β̂). This proves that β̂ possesses the minimum variance property.
In a similar way we can prove that the least squares estimate of the constant
intercept (α̂) possesses minimum variance.
2. Minimum Variance of α̂

From the linearity proof, α̂ = Σ(1/n − X̄ki)Yi. By analogy with the proof of the minimum
variance property of β̂, let's use the weights wi = ki + ci. Consequently,

α* = Σ(1/n − X̄wi)Yi

Since Yi = α + βXi + ui,

α* = Σ(1/n − X̄wi)(α + βXi + ui)
   = αΣ(1/n − X̄wi) + βΣ(1/n − X̄wi)Xi + Σ(1/n − X̄wi)ui

For α* to be an unbiased estimator of α we require Σwi = 0 and ΣwiXi = 1; these
conditions imply Σci = 0 and ΣciXi = 0. Then:

var(α*) = Σ(1/n − X̄wi)²var(Yi)
        = σ²Σ(1/n − X̄wi)²
        = σ²Σ(1/n² − (2/n)X̄wi + X̄²wi²)
        = σ²(1/n + X̄²Σwi²),  since Σwi = 0
        = σ²[1/n + X̄²(Σki² + Σci²)],  since Σwi² = Σki² + Σci²
        = σ²(1/n + X̄²/Σxi²) + σ²X̄²Σci²
        = σ²ΣXi²/(nΣxi²) + σ²X̄²Σci²

⇒ var(α*) = var(α̂) + σ²X̄²Σci² > var(α̂)

Therefore, we have proved that the least squares estimators of the linear regression
model are best, linear and unbiased (BLU) estimators.
You may observe that the variances of the OLS estimates involve σ², which is the
population variance of the random disturbance term. But it is difficult to obtain
the population data of the disturbance term because of technical and economic
reasons. Hence it is difficult to compute σ²; this implies that the variances of the OLS
estimates are also difficult to compute. But we can compute these variances if we
take the unbiased estimate of σ², which is σ̂² computed from the sample residuals ei
from the expression:

σ̂u² = Σei²/(n − 2) …………………………………..2.30

To use σ̂² = Σei²/(n − 2) in the variance formulas we have to prove that it is an unbiased
estimator of σ², i.e. that E[Σei²/(n − 2)] = σ². To do this we first establish the relationships
between yi, ŷi and ei.
Proof:

Yi = α̂ + β̂Xi + ei = Ŷi + ei …………………………………………………………… (2.31)

⇒ ei = Yi − Ŷi …………………………………………………………… (2.32)

Summing (2.31) over the sample and dividing by n (and noting that Σei = 0):

Ȳ = Ŷ̄ (the mean of the fitted values equals the mean of Y) ………………… (2.33)

Subtracting (2.33) from (2.31) gives the relation in deviation form:

(Yi − Ȳ) = (Ŷi − Ŷ̄) + ei
yi = ŷi + ei ……………………………………………… (2.34)

From (2.34):

ei = yi − ŷi ……………………………………………….. (2.35)
From Yi = α + βXi + ui and its sample average Ȳ = α + βX̄ + Ū, we get, by subtraction,

yi = Yi − Ȳ = β(Xi − X̄) + (ui − Ū) = βxi + (ui − Ū)

yi = βxi + (ui − Ū) ……………………………………………………. (2.36)

Note that we assumed earlier that E(u) = 0, i.e. in taking a very large number of
samples we expect Ū to have a mean value of zero, but in any particular single
sample Ū is not necessarily zero.
Similarly, from Ŷi = α̂ + β̂Xi and Ȳ = α̂ + β̂X̄, we get, by subtraction,

ŷi = β̂xi ……………………………………………………………. (2.37)

From (2.35), (2.36) and (2.37):

ei = yi − ŷi = βxi + (ui − ū) − β̂xi = (ui − ū) − (β̂ − β)xi
The summation over the n sample values of the squares of the residuals yields:

Σei² = Σ[(ui − ū) − (β̂ − β)xi]²
     = Σ(ui − ū)² + (β̂ − β)²Σxi² − 2(β̂ − β)Σxi(ui − ū)

Taking expected values we have:

E(Σei²) = E[Σ(ui − ū)²] + E[(β̂ − β)²]Σxi² − 2E[(β̂ − β)Σxi(ui − ū)] …………(2.38)

The right-hand-side terms are evaluated one by one.

A. E[Σ(ui − ū)²] = E(Σui² − nū²) = ΣE(ui²) − nE(ū²)
   = nσ² − (1/n)E[(Σui)²]
   = nσ² − (1/n)E(Σui² + 2ΣΣuiuj),  i ≠ j
   = nσ² − (1/n)(nσ²),  given E(uiuj) = 0
   = σ²(n − 1) …………………………………………….. (2.39)
B. E[(β̂ − β)²]Σxi²: given that the X's are fixed in all samples, and we know that
E(β̂ − β)² = var(β̂) = σ²/Σxi², hence

Σxi²·E(β̂ − β)² = Σxi²·σ²/Σxi² = σ² ……………………………………(2.40)

C. −2E[(β̂ − β)Σxi(ui − ū)]: from (2.22), β̂ − β = Σkiui = Σxiui/Σxi², and
Σxi(ui − ū) = Σxiui (since ūΣxi = 0). Substituting these in the above expression gives:

−2E[(β̂ − β)Σxi(ui − ū)] = −2E[(Σxiui)²/Σxi²]
   = −2E(Σxi²ui² + 2ΣΣxixjuiuj)/Σxi²,  i ≠ j
   = −2Σxi²E(ui²)/Σxi²,  given E(uiuj) = 0
   = −2σ²Σxi²/Σxi²
   = −2σ² ……………………………………(2.41)
Consequently, equation (2.38) can be written in terms of (2.39), (2.40) and (2.41) as
follows:

E(Σei²) = (n − 1)σ² + σ² − 2σ² = (n − 2)σ² ………………………….(2.42)

From which we have:

E[Σei²/(n − 2)] = E(σ̂u²) = σ² ………………………………………………..(2.43)

Since σ̂² = Σei²/(n − 2), σ̂² is an unbiased estimate of the true variance of the error term (σ²).

The conclusion that we can draw from the above proof is that we can substitute
σ̂² = Σei²/(n − 2) for σ² in the variance expressions of α̂ and β̂, since E(σ̂²) = σ².
Hence the estimated variances are:

var(β̂) = σ̂²/Σxi² = Σei²/[(n − 2)Σxi²] …………………………………… (2.44)

var(α̂) = σ̂²ΣXi²/(nΣxi²) = Σei²ΣXi²/[n(n − 2)Σxi²] …………………………… (2.45)

Note: Σei² can be computed as Σei² = Σyi² − β̂Σxiyi.

Do not worry about the derivation of this expression! We will perform the
derivation of it in our subsequent subtopic.
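A short continuation of the earlier sketch, computing σ̂² = Σe²/(n − 2) and the estimated variances (2.44) and (2.45); the data are the same hypothetical arrays used before.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(Y)

x = X - X.mean()
y = Y - Y.mean()
beta_hat  = np.sum(x * y) / np.sum(x**2)
alpha_hat = Y.mean() - beta_hat * X.mean()

e = Y - (alpha_hat + beta_hat * X)              # residuals
sigma2_hat = np.sum(e**2) / (n - 2)             # equation (2.30)

var_beta  = sigma2_hat / np.sum(x**2)                        # equation (2.44)
var_alpha = sigma2_hat * np.sum(X**2) / (n * np.sum(x**2))   # equation (2.45)

print(sigma2_hat, var_beta**0.5, var_alpha**0.5)   # SEs are the square roots
```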
After the estimation of the parameters and the determination of the least square
regression line, we need to know how ‘good’ is the fit of this line to the sample
observation of Y and X, that is to say we need to measure the dispersion of
observations around the regression line. This knowledge is essential because the
closer the observation to the line, the better the goodness of fit, i.e. the better is the
explanation of the variations of Y by the changes in the explanatory variables.
We divide the available criteria into three groups: the theoretical a priori criteria, the
statistical criteria, and the econometric criteria. Under this section, our focus is on
statistical criteria (first order tests). The two most commonly used first order tests
in econometric analysis are:
I. The coefficient of determination (the square of the correlation coefficient i.e. R2). This
test is used for judging the explanatory power of the independent variable(s).
II. The standard error tests of the estimators. This test is used for judging the statistical
reliability of the estimates of the regression coefficients.
R2 shows the percentage of total variation of the dependent variable that can be
explained by the changes in the explanatory variable(s) included in the model. To
elaborate this let’s draw a horizontal line corresponding to the mean value of the
dependent variable Ȳ (see figure 'd' below). By fitting the line Ŷ = β̂0 + β̂1X we try
to explain the variation of the dependent variable Y around its mean.

[Figure 'd': the total deviation of an observed Y from its mean, Y − Ȳ, is split into the part explained by the regression line, Ŷ − Ȳ, and the residual, e = Y − Ŷ.]
As can be seen from fig. (d) above, Y − Ȳ measures the variation of the
sample observation value of the dependent variable around the mean. However,
the variation in Y that can be attributed to the influence of X (i.e. the regression line)
is given by the vertical distance Ŷ − Ȳ. The part of the total variation in Y about Ȳ
that can't be attributed to X is equal to e = Y − Ŷ, which is referred to as the residual
variation.
In summary:
e = Y − Ŷ = the deviation of the observation from the regression line;
Y − Ȳ = the total deviation;
Ŷ − Ȳ = the deviation explained by the regression line.

Now, we may write the observed Y as the sum of the predicted value (Ŷi) and the
residual term (ei):

Yi = Ŷi + ei
(observed Yi = predicted Ŷi + residual)
From equation (2.34) we have the same relation in deviation form, yi = ŷi + ei.
Squaring and summing both sides gives:

Σyi² = Σ(ŷi + ei)² = Σŷi² + Σei² + 2Σŷiei

But Σŷiei = Σ(β̂xi)ei = β̂Σxiei = 0 ………………………………………………(2.46)
(since Σxiei = 0 by the normal equations)

Therefore:

Σyi² = Σŷi² + Σei² ………………………………...(2.47)
Total variation = Explained variation + Unexplained variation

OR, Total sum of squares (TSS) = Explained sum of squares (ESS) + Residual sum of squares (RSS) ……(2.48)

i.e. the proportion of the total variation explained by the regression is

R² = ESS/TSS = Σŷ²/Σy² ………………………………………. (2.49)
From equation (2.37) we have ŷ = β̂x. Squaring and summing both sides gives us

Σŷ² = β̂²Σx² ………………………………………………………… (2.50)

ESS/TSS = β̂²Σx²/Σy² …………………………………(2.51)

Since β̂ = Σxiyi/Σxi², this becomes

ESS/TSS = (Σxy/Σx²)²·(Σx²/Σy²) = (Σxy)²/(Σx²·Σy²)

R² = ESS/TSS = (Σxy)²/(Σx²·Σy²) ……………………………………… (2.52)

Comparing (2.52) with the formula for the simple correlation coefficient,
r = Σxy/√(Σx²·Σy²), we see that ESS/TSS = r², i.e. R² is the square of the correlation coefficient.
The limits of R²: the value of R² falls between zero and one, i.e. 0 ≤ R² ≤ 1.
Interpretation of R2
Suppose R² = 0.9; this means that the regression line gives a good fit to the
observed data, since this line explains 90% of the total variation of the Y values
around their mean. The remaining 10% of the total variation in Y is unaccounted
for by the regression line and is attributed to the factors included in the
disturbance variable u i .
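The decomposition (2.47) and the resulting R² of (2.49)/(2.52) can be verified numerically. A minimal sketch with the same hypothetical data used earlier:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x, y = X - X.mean(), Y - Y.mean()
beta_hat = np.sum(x * y) / np.sum(x**2)
y_hat = beta_hat * x                  # fitted values in deviation form, equation (2.37)
e = y - y_hat

TSS = np.sum(y**2)
ESS = np.sum(y_hat**2)
RSS = np.sum(e**2)

R2 = ESS / TSS                                            # equation (2.49)
r2 = np.sum(x * y)**2 / (np.sum(x**2) * np.sum(y**2))     # equation (2.52)

print(abs(TSS - (ESS + RSS)) < 1e-9, R2, r2)   # R2 equals the squared correlation
```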
To test the significance of the OLS parameter estimators we need the following:

Unbiased estimator of σ²: σ̂² = Σe²/(n − 2) = RSS/(n − 2)

var(β̂) = σ̂²/Σx²
var(α̂) = σ̂²ΣX²/(nΣx²)
For the purpose of estimation of the parameters the assumption of normality is not
used, but we use this assumption to test the significance of the parameter
estimators, because the testing methods or procedures are based on the
assumption of normality of the disturbance term. Hence, before we discuss the
various testing methods, it is useful to establish whether the parameter estimators
themselves are normally distributed.

We have already assumed that the error term is normally distributed with mean
zero and variance σ², i.e. Ui ~ N(0, σ²). Similarly, it can be shown that

1. β̂ ~ N(β, σ²/Σx²)
2. α̂ ~ N(α, σ²ΣX²/(nΣx²))

To show whether α̂ and β̂ are normally distributed or not, we make use of
one property of the normal distribution: "any linear function of a normally
distributed variable is itself normally distributed." Since α̂ and β̂ are linear
functions of Y (and hence of u), they are themselves normally distributed with the
means and variances given above.
The OLS estimates ˆ and ˆ are obtained from a sample of observations on Y and
X. Since sampling errors are inevitable in all estimates, it is necessary to apply test
of significance in order to measure the size of the error and determine the degree
of confidence in order to measure the validity of these estimates. This can be done
by using various tests. The most common ones are the standard error test, the
Student's t-test and the confidence interval test.
All of these testing procedures reach the same conclusion. Let us now see these
testing methods one by one.
This test helps us decide whether the estimates α̂ and β̂ are significantly different
from zero, i.e. whether the sample from which they have been estimated might
have come from a population whose true parameters are zero (β = 0 and/or α = 0).

First: compute the standard errors of the parameters:
SE(β̂) = √var(β̂)
SE(α̂) = √var(α̂)

Second: compare the standard errors with the numerical values of α̂ and β̂.

Decision rule:
If SE(β̂i) > ½β̂i, accept the null hypothesis and reject the alternative hypothesis;
we conclude that β̂i is statistically insignificant.
If SE(β̂i) < ½β̂i, reject the null hypothesis and accept the alternative hypothesis;
we conclude that β̂i is statistically significant.

The acceptance or rejection of the null hypothesis has definite economic meaning.
Namely, the acceptance of the null hypothesis β = 0 (the slope parameter is zero)
implies that the explanatory variable to which this estimate relates does not in fact
influence the dependent variable Y and should not be included in the function,
since the conducted test provided evidence that changes in X leave Y unaffected.
In other words, acceptance of H0 implies that the relationship between Y and X is
in fact Y = α + (0)X, i.e. there is no relationship between X and Y.
Numerical example: Suppose that from a sample of size n = 30, we estimate the
following supply function:

Q = 120 + 0.6P + ei
SE:        (1.7)   (0.025)

Test the significance of the slope parameter at the 5% level of significance using the
standard error test.

SE(β̂) = 0.025
β̂ = 0.6, so ½β̂ = 0.3

This implies that SE(β̂i) < ½β̂i. The implication is that β̂ is statistically significant at the 5%
level of significance.
Note: The standard error test is an approximated test (which is approximated from the z-
test and t-test) and implies a two tail test conducted at 5% level of significance.
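The standard error test of the numerical example can be written out in a couple of lines; a sketch using the figures reported above:

```python
beta_hat = 0.6
se_beta  = 0.025

if se_beta < 0.5 * beta_hat:
    print("SE < (1/2)*beta_hat: reject H0, beta_hat is statistically significant")
else:
    print("SE > (1/2)*beta_hat: accept H0, beta_hat is statistically insignificant")
```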
Like the standard error test, this test is also important for testing the significance of the
parameters. From your statistics course, recall that any (normal) variable X can be transformed into t using
the general formula:

t = (X̄ − μ)/sX ,  with n − 1 degrees of freedom,

where μ is the hypothesized population mean, sX = √[Σ(X − X̄)²/(n − 1)] is the sample
estimate of the standard deviation, and n is the sample size.

Similarly, for the regression coefficients we can write:

t = (β̂ − β)/SE(β̂) ,  with n − k degrees of freedom
t = (α̂ − α)/SE(α̂)

Where:
SE = standard error
k = number of parameters in the model.
Since we have two parameters in simple linear regression with an intercept different
from zero, our degrees of freedom are n − 2. Like the standard error test, we formally
test the hypothesis H0: βi = 0 against the alternative H1: βi ≠ 0 for the slope
parameter, and H0: α = 0 against H1: α ≠ 0 for the intercept. The steps are as follows.
Step 1: Compute t*, which is called the computed value of t, by taking the value of
β in the null hypothesis. In our case β = 0, so t* becomes:

t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂)

Step 2: Choose the level of significance. It is customary in econometric research to
choose the 5% or the 1% level of significance.

Step 3: Check whether it is a one-tail or a two-tail test. If the inequality sign
in the alternative hypothesis is ≠, then it implies a two-tail test: divide the
chosen level of significance by two and find the critical value of t,
called tc. But if the inequality sign is either > or <, then it indicates a one-tail test and
there is no need to divide the chosen level of significance by two to obtain the
critical value tc from the t-table.
Example:
If we have H0: βi = 0 against H1: βi ≠ 0, then this is a two-tail test. If the level of
significance is 5%, divide it by two to obtain the critical value of t from the t-table.

Step 4: Obtain the critical value tc from the t-table and compare it with the computed value t*.
If |t*| > tc, reject H0 and accept H1 (the estimate is statistically significant);
if |t*| < tc, accept H0 (the estimate is statistically insignificant).
Numerical Example:
Suppose that from a sample size n=20 we estimate the following consumption
function:
C = 100 + 0.70Y + e
SE:   (75.5)   (0.21)
The values in the brackets are standard errors. We want to test the null
hypothesis: H 0 : i 0 against the alternative H1 : i 0 using the t-test at 5% level
of significance.
t* = (β̂ − 0)/SE(β̂) = β̂/SE(β̂) = 0.70/0.21 = 3.3

Since the alternative hypothesis is ≠, this is a two-tail test; we look for tc
at α/2 = 0.025 and 18 degrees of freedom (df), i.e. n − 2 = 20 − 2. From the t-table,
'tc' at the 0.025 level of significance and 18 df is 2.10.

Since t* = 3.3 > tc = 2.10, we reject the null hypothesis: β̂ is statistically significant.
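The same t-test can be reproduced with SciPy, which also supplies the critical value used above (about 2.10 at the 0.025 level with 18 degrees of freedom); a minimal sketch using the reported estimate and standard error:

```python
from scipy import stats

n = 20
beta_hat, se_beta = 0.70, 0.21

t_star = beta_hat / se_beta                    # computed t under H0: beta = 0
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)   # two-tail critical value at 5%

print(round(t_star, 2), round(t_crit, 2), abs(t_star) > t_crit)  # approximately 3.33, 2.10, True
```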
Rejection of the null hypothesis doesn't mean that our estimates α̂ and β̂ are the
correct estimates of the true population parameters α and β. It simply means that
our estimate comes from a sample drawn from a population whose parameter β
is different from zero.
In order to define how close the estimate to the true parameter, we must construct
confidence interval for the true parameter, in other words we must establish
limiting values around the estimate with in which the true parameter is expected
to lie within a certain “degree of confidence”. In this respect we say that with a
given probability the population parameter will be within the defined confidence
interval (confidence limits).
The construction of the confidence interval is based on the sampling distribution of
t = (β̂ − β)/SE(β̂), which has n − 2 degrees of freedom. The probability that this value of t lies
between −tc (the critical value of t at α/2) and +tc is 1 − α, where α is the level of significance.

Pr{−tc < t < tc} = 1 − α

but t* = (β̂ − β)/SE(β̂) …………………………………………………….(2.58)

Substituting:

Pr{−tc < (β̂ − β)/SE(β̂) < tc} = 1 − α ………………………………………..(2.59)
Pr{−SE(β̂)tc < β̂ − β < SE(β̂)tc} = 1 − α     (multiplying by SE(β̂))
Pr{−β̂ − SE(β̂)tc < −β < −β̂ + SE(β̂)tc} = 1 − α     (subtracting β̂)
Pr{β̂ + SE(β̂)tc > β > β̂ − SE(β̂)tc} = 1 − α     (multiplying by −1)
Pr{β̂ − SE(β̂)tc < β < β̂ + SE(β̂)tc} = 1 − α     (interchanging)

The limit within which the true β lies at the (1 − α)100% degree of confidence is:

[β̂ − SE(β̂)tc ,  β̂ + SE(β̂)tc],  where tc is the critical value of t at α/2 with n − 2 degrees of freedom.
H0: β = 0
H1: β ≠ 0

Decision rule: If the hypothesized value of β in the null hypothesis is within the
confidence interval, accept H0 and reject H1; the implication is that β̂ is statistically
insignificant. If the hypothesized value of β in the null hypothesis is outside the limit,
reject H0 and accept H1. This indicates that β̂ is statistically significant.
Numerical Example: Suppose we have estimated the following regression line:

Y = 128.5 + 2.88X + e
SE:   (38.2)   (0.85)

1) Construct the 95% confidence interval for the slope parameter.
2) Test the significance of the slope parameter using the constructed confidence interval.

Solution:
1) The limit within which the true β lies at the 95% confidence interval is β̂ ± SE(β̂)tc, with

β̂ = 2.88
SE(β̂) = 0.85
tc = 2.10 (the critical value at the 0.025 level of significance with n − 2 degrees of freedom)

β̂ ± SE(β̂)tc = 2.88 ± (0.85)(2.10) = 2.88 ± 1.79, giving the interval (1.09, 4.67).

2) The value of β in the null hypothesis is zero, which lies outside the
confidence interval. Hence β̂ is statistically significant.
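The 95% confidence interval of the example can likewise be reproduced in a few lines; a small sketch using the reported estimate, standard error and the critical value tc = 2.10:

```python
beta_hat, se_beta, t_c = 2.88, 0.85, 2.10

lower = beta_hat - se_beta * t_c
upper = beta_hat + se_beta * t_c

print(round(lower, 2), round(upper, 2))   # approximately (1.09, 4.67)
print(not (lower <= 0 <= upper))          # True: zero lies outside, so beta_hat is significant
```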
The results of a regression are usually reported together with such measures of reliability, for example:

Y = 128.5 + 2.88X ,   R² = 0.93
     (38.2)   (0.85)

The numbers in the parentheses below the parameter estimates are the standard errors. Some
econometricians report the t-values of the estimated coefficients in place of the
standard errors.
CHAPTER 3
THE CLASSICAL REGRESSION ANALYSIS
3.1. Introduction
In simple regression we relate the dependent variable to a single explanatory variable.
In reality, however, a dependent variable is usually influenced by several variables.
For instance, the demand for a commodity may be written as

Yi = β0 + β1P1 + β2P2 + β3Xi + ui --------------------------------------------------------- (3.1)

where Yi is the quantity demanded, P1 and P2 are the prices of the commodity and of a related
commodity, Xi is consumer's income, the β's are unknown parameters and ui is the disturbance.

Equation (3.1) is a multiple regression with three explanatory variables. In general, for k − 1
explanatory variables we can write the model as follows:

Yi = β0 + β1X1i + β2X2i + ... + β(k−1)X(k−1)i + ui ------------------------------------------ (3.2)

where the βj (j = 0, 1, 2, ..., (k − 1)) are unknown parameters and ui is the disturbance term.
In this chapter we will first start our discussion with the assumptions of the multiple
regression model, proceed with our analysis for the case of two explanatory variables,
and then generalize the multiple regression model to the case of k explanatory
variables using matrix algebra.
In order to specify our multiple linear regression model and proceed with our analysis,
some assumptions are required. These assumptions are the same as in the single
explanatory variable model developed earlier, except for the additional
assumption of no perfect multicollinearity. These assumptions are:
1. Randomness of ui: the variable ui is a real random variable.
2. Zero mean of ui: E(ui) = 0.
3. Homoscedasticity: the variance of each ui is the same for all the Xi values, i.e.
E(ui²) = σu² (constant).
4. Normality of ui: the values of each ui are normally distributed, i.e. ui ~ N(0, σu²).
5. No autocorrelation: the values of ui (corresponding to Xi) are independent
from the values of any other uj (corresponding to Xj) for i ≠ j, i.e. E(uiuj) = 0 for i ≠ j.
6. Independence of ui and the X's: every disturbance term ui is independent of the
explanatory variables, i.e. E(uiX1i) = E(uiX2i) = 0. This condition is automatically
fulfilled if we assume that the values of the X's are a set of fixed numbers in all
(hypothetical) samples.
7. No perfect multicollinearity: the explanatory variables are not perfectly linearly
correlated.
We can't exhaustively list all the assumptions, but the above assumptions are some of the
basic assumptions that enable us to proceed with our analysis.
In order to understand the nature of the multiple regression model easily, we start our
analysis with the case of two explanatory variables, then extend this to the case of k
explanatory variables. The model

Yi = β0 + β1X1i + β2X2i + ui ........................................................... (3.3)

is a multiple regression with two explanatory variables. The expected value of the above
model is called the population regression equation, i.e.

E(Yi) = β0 + β1X1i + β2X2i ,  since E(ui) = 0 ........................................ (3.4)

β1 and β2 are the partial regression coefficients; they are also sometimes known as the
regression slopes. Note that β2, for example, measures the effect on E(Y) of a unit change
in X2 when X1 is held constant.

Since the population regression equation is unknown to any investigator, it has to be
estimated from sample data. Let us suppose that the sample data have been used to
estimate the population regression equation. We leave the method of estimation
unspecified for the present and merely assume that equation (3.4) has been estimated by a
sample regression equation, which we write as:

Ŷ = β̂0 + β̂1X1 + β̂2X2

where the β̂j are estimates of the βj and Ŷ is the predicted value of Y.
Now it is time to state how (3.3) is estimated. Given sample observations on Y, X1 and X2,
we estimate (3.3) using the method of least squares (OLS). The sample relation is

Yi = β̂0 + β̂1X1i + β̂2X2i + ei

and we choose the values of β̂0, β̂1 and β̂2 that minimize Σei² = Σ(Yi − β̂0 − β̂1X1i − β̂2X2i)².
To do so we partially differentiate Σei² with respect to β̂0, β̂1 and β̂2 and set the partial
derivatives equal to zero.

∂Σei²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………………………. (3.8)
∂Σei²/∂β̂1 = −2ΣX1i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ……………………. (3.9)
∂Σei²/∂β̂2 = −2ΣX2i(Yi − β̂0 − β̂1X1i − β̂2X2i) = 0 ………… ……….. (3.10)

These give the normal equations:

ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i
ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i
ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ……………………………… (3.12)
We know that, in deviation form,

ΣX1iYi − nX̄1Ȳ = Σx1iyi ,  ΣX2iYi − nX̄2Ȳ = Σx2iyi
ΣX1i² − nX̄1² = Σx1i² ,  ΣX2i² − nX̄2² = Σx2i² ,  ΣX1iX2i − nX̄1X̄2 = Σx1ix2i

where the lower-case letters denote deviations from the means. Substituting these
expressions, the normal equations (3.12) can be written in deviation form as follows:

Σx1y = β̂1Σx1² + β̂2Σx1x2
Σx2y = β̂1Σx1x2 + β̂2Σx2² …………. (3.20)

Solving this system for β̂1 and β̂2 gives:

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / [Σx1²·Σx2² − (Σx1x2)²] ………………….……… (3.21)

β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / [Σx1²·Σx2² − (Σx1x2)²] ………………….……… (3.22)

and the intercept is obtained from β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2.
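Formulas (3.21) and (3.22) translate directly into code. A minimal sketch, assuming two hypothetical regressors, computes the slope estimates from the deviation sums:

```python
import numpy as np

# Hypothetical data with two explanatory variables (illustration only)
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
Y  = np.array([5.0, 9.0, 11.0, 16.0, 17.0, 22.0])

x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

s_x1x1, s_x2x2 = np.sum(x1**2), np.sum(x2**2)
s_x1x2 = np.sum(x1 * x2)
s_x1y, s_x2y = np.sum(x1 * y), np.sum(x2 * y)

den = s_x1x1 * s_x2x2 - s_x1x2**2
b1 = (s_x1y * s_x2x2 - s_x2y * s_x1x2) / den     # equation (3.21)
b2 = (s_x2y * s_x1x1 - s_x1y * s_x1x2) / den     # equation (3.22)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()  # intercept

print(b0, b1, b2)
```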
We can also express β̂1 and β̂2 in terms of the covariances and variances of Y, X1 and X2.

As in simple regression, R² measures the proportion of the variation in Y explained jointly by X1 and X2:

R² = ESS/TSS = 1 − RSS/TSS = 1 − Σei²/Σyi² ------------------------------------- (3.25)

Since ESS = β̂1Σx1iyi + β̂2Σx2iyi, we can also write

R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi)/Σy² ---------------------------------- (3.27)

As before, R² indicates the strength of the association between Ŷ and Yt. Since the sample correlation
coefficient measures the linear association between two variables, if R² is high there is a close association
between the values of Yt and the values predicted by the model, Ŷt. In this case, the model is said to
"fit" the data well. If R² is low, there is little association between the values of Yt and the
values predicted by the model, Ŷt, and the model does not fit the data well.
One difficulty with R 2 is that it can be made large by adding more and more variables,
even if the variables added have no economic justification. Algebraically, it is the fact
that as the variables are added the sum of squared errors (RSS) goes down (it can remain
unchanged, but this is rare) and thus R 2 goes up. If the model contains n-1 variables then
R 2 =1. The manipulation of model just to obtain a high R 2 is not wise. An alternative
measure of goodness of fit, called the adjusted R 2 and often symbolized as R 2 , is usually
reported by regression programs. It is computed as:
R̄² = 1 − [Σei²/(n − k)] / [Σy²/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k) -------------------------------- (3.28)
This measure does not always go up when a variable is added, because of the degrees-of-freedom
term n − k: as the number of variables k increases, RSS goes down, but so does n − k. The effect
on R̄² depends on whether the fall in RSS is large enough to offset the loss of degrees of freedom. While
solving one problem, this corrected measure of goodness of fit unfortunately introduces
another one: it loses its interpretation, since R̄² is no longer the percentage of variation explained.
This modified R² is sometimes used and misused as a device for selecting the
appropriate set of explanatory variables.
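A short sketch of the adjustment in (3.28), showing how R̄² penalizes an added variable through the degrees-of-freedom term; n, k and the R² values here are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Equation (3.28): k counts all estimated parameters including the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

n = 25
print(adjusted_r2(0.900, n, k=3))   # two regressors
print(adjusted_r2(0.902, n, k=4))   # a third regressor raises R2 only a little,
                                    # yet the adjusted R2 falls
```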
So far we have discussed the regression models containing one or two explanatory
variables. Let us now generalize the model assuming that it contains k variables. It will
be of the form:
Y = β0 + β1X1 + β2X2 + ... + βkXk + U

There are k + 1 parameters to be estimated. The system of normal equations consists of k + 1
equations, in which the unknowns are the parameters β0, β1, β2, ..., βk and the known
terms are the sums of squares and the sums of products of all the variables in the
structural equation.
Least squares estimators of the unknown parameters are obtained by minimizing the sum
of the squared residuals

Σei² = Σ(Yi − β̂0 − β̂1X1 − β̂2X2 − ... − β̂kXk)²

with respect to β̂0, β̂1, ..., β̂k:

∂Σei²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1X1 − β̂2X2 − ... − β̂kXk) = 0
∂Σei²/∂β̂1 = −2Σ(Yi − β̂0 − β̂1X1 − β̂2X2 − ... − β̂kXk)(X1i) = 0
……………………………………………………..
∂Σei²/∂β̂k = −2Σ(Yi − β̂0 − β̂1X1 − β̂2X2 − ... − β̂kXk)(Xki) = 0
The general form of the above equations (except the first) may be written as:

∂Σei²/∂β̂j = −2Σ(Yi − β̂0 − β̂1X1i − ... − β̂kXki)(Xji) = 0 ,  where j = 1, 2, ..., k.
Solving the above normal equations simultaneously involves considerable algebraic complexity. But we can solve
them easily using matrix algebra. Hence in the next section we will discuss the matrix approach to the
linear regression model.
The general linear regression model with k explanatory variables is written in the form:

Yi = β0 + β1X1i + β2X2i + ... + βkXki + Ui

where i = 1, 2, ..., n (the ith observation, 'n' being the sample size), the β's are the parameters
and U is the stochastic disturbance term. Since i represents the ith observation, we shall have
'n' equations with 'n' observations on each variable:

Y1 = β0 + β1X11 + β2X21 + β3X31 + ... + βkXk1 + U1
Y2 = β0 + β1X12 + β2X22 + β3X32 + ... + βkXk2 + U2
Y3 = β0 + β1X13 + β2X23 + β3X33 + ... + βkXk3 + U3
…………………………………………………...
Yn = β0 + β1X1n + β2X2n + β3X3n + ... + βkXkn + Un

These equations can be written compactly in matrix form as Y = Xβ + U, where Y is the (n × 1)
vector of observations on the dependent variable, X is the matrix of observations on the
explanatory variables (with a column of ones for the intercept), β is the vector of parameters
and U is the (n × 1) vector of disturbances.
To derive the OLS estimators of β, under the usual (classical) assumptions mentioned
earlier, we define the vector of estimators β̂ and the vector of residuals e as

β̂ = (β̂0, β̂1, β̂2, ..., β̂k)'   and   e = (e1, e2, ..., en)'

so that e = Y − Xβ̂. We have to minimize:

Σei² = e1² + e2² + e3² + ... + en² = [e1, e2, ..., en](e1, e2, ..., en)' = e'e
Σei² = e'e = (Y − Xβ̂)'(Y − Xβ̂)
     = Y'Y − β̂'X'Y − Y'Xβ̂ + β̂'X'Xβ̂ ………………….… (3.30)
     = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂ ------------------------------------- (3.31)

(since β̂'X'Y = Y'Xβ̂, both being scalars). Minimizing e'e with respect to the elements of β̂:

∂Σei²/∂β̂ = ∂(e'e)/∂β̂ = −2X'Y + 2X'Xβ̂

since ∂(X'AX)/∂X = 2AX. Equating the expression to the null vector, we obtain

X'Xβ̂ = X'Y   ⇒   β̂ = (X'X)⁻¹X'Y ………………………………… (3.32)

Hence β̂ is the vector of required least squares estimators, β̂0, β̂1, β̂2, ..., β̂k.
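Equation (3.32) is a one-liner in matrix form. A minimal sketch with NumPy and hypothetical data, which also shows the variance-covariance matrix σ̂u²(X'X)⁻¹ of (3.37) discussed below:

```python
import numpy as np

# Hypothetical data: n = 6 observations, two explanatory variables
X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
Y  = np.array([5.0, 9.0, 11.0, 16.0, 17.0, 22.0])

X = np.column_stack([np.ones_like(X1), X1, X2])  # include a column of ones for the intercept
n, k = X.shape                                   # k = number of parameters

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y                     # equation (3.32)

e = Y - X @ beta_hat
sigma2_hat = (e @ e) / (n - k)                   # unbiased estimate of the error variance
var_cov = sigma2_hat * XtX_inv                   # estimated version of equation (3.37)

print(beta_hat)
print(np.sqrt(np.diag(var_cov)))                 # standard errors of the estimates
```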
We have seen, in simple linear regression that the OLS estimators (ˆ & ˆ ) satisfy the
small sample property of an estimator i.e. BLUE property. In multiple regression, the
OLS estimators also satisfy the BLUE property. Now we proceed to examine the desired
properties of the estimators in matrix notations:
1. Linearity

Since β̂ = (X'X)⁻¹X'Y, let C = (X'X)⁻¹X'. Then

β̂ = CY ……………………………………………. (3.33)

i.e. β̂ is a linear function of Y, since C is a matrix of fixed (non-stochastic) elements.

2. Unbiasedness

β̂ = (X'X)⁻¹X'Y = (X'X)⁻¹X'(Xβ + U)
  = β + (X'X)⁻¹X'U ,  since (X'X)⁻¹X'X = I ………………… (3.34)

⇒ β̂ − β = (X'X)⁻¹X'U

E(β̂) = β + (X'X)⁻¹X'E(U) = β ,  since E(U) = 0

Thus, the least squares estimators are unbiased.
3. Minimum variance
Before showing all the OLS estimators are best (possess the minimum variance property),
it is important to derive their variance.
We know that var(β̂) = E[(β̂ − E(β̂))(β̂ − E(β̂))'] = E[(β̂ − β)(β̂ − β)'].

Written out in full, E[(β̂ − β)(β̂ − β)'] is the matrix whose (i, j)th element is
E[(β̂i − βi)(β̂j − βj)]: the diagonal elements are E(β̂1 − β1)², E(β̂2 − β2)², ..., E(β̂k − βk)²,
and the off-diagonal elements are the cross-products E[(β̂i − βi)(β̂j − βj)].

The above matrix is a symmetric matrix containing variances along its main diagonal and
covariances of the estimators everywhere else. This matrix is, therefore, called the
variance-covariance matrix of the least squares estimators of the regression coefficients. Thus,

var(β̂) = E[(β̂ − β)(β̂ − β)'] ……………………………………………(3.35)
From (3.34), β̂ − β = (X'X)⁻¹X'U ………………………………………………(3.36)

var(β̂) = E{[(X'X)⁻¹X'U][(X'X)⁻¹X'U]'}
       = E[(X'X)⁻¹X'UU'X(X'X)⁻¹]
       = (X'X)⁻¹X'E(UU')X(X'X)⁻¹
       = σu²(X'X)⁻¹X'InX(X'X)⁻¹ ,  since E(UU') = σu²In
       = σu²(X'X)⁻¹ ………………………………………………(3.37)

Note: σu², being a scalar, can be moved in front of or behind a matrix, while the identity matrix In can
be suppressed.

Here X'X is the matrix of sums of squares and cross-products of the explanatory variables,
with n in the leading position and the sums ΣX1i, ..., ΣXki, ΣX1i², ..., ΣXki² and the
cross-products ΣXjiXli as its other elements.
We can, therefore, obtain the variance of any estimator, say β̂1, by taking the corresponding
term from the principal diagonal of (X'X)⁻¹ and multiplying it by σu².

The above applies when the X's are in their absolute form. When the x's are in deviation form we can write
the multiple regression in matrix form as

β̂ = (x'x)⁻¹x'y

where β̂ is the column vector (β̂1, β̂2, ..., β̂k)' and (x'x) is the matrix of sums of squares and
cross-products of the explanatory variables in deviation form. The above column vector β̂
doesn't include the constant term β̂0. Under such conditions the variances of the slope
estimators are given by the diagonal elements of σu²(x'x)⁻¹
(the proof is the same as (3.37) above). In general, we can illustrate the variance of the
parameters by taking the case of two explanatory variables.
The multiple regression with two explanatory variables, written in deviation form, is

yi = β̂1x1 + β̂2x2 + ei

In this model

var(β̂) = E[(β̂ − β)(β̂ − β)'] ,  where (β̂ − β) = [(β̂1 − β1), (β̂2 − β2)]'

so that

E[(β̂ − β)(β̂ − β)'] = | E(β̂1 − β1)²              E[(β̂1 − β1)(β̂2 − β2)] |
                      | E[(β̂1 − β1)(β̂2 − β2)]    E(β̂2 − β2)²            |

In deviation form the data matrices are

x = | x11  x21 |          x' = | x11  x12 ... x1n |
    | x12  x22 |               | x21  x22 ... x2n |
    |  :    :  |
    | x1n  x2n |

x'x = | Σx1²    Σx1x2 |      (x'x)⁻¹ = 1/[Σx1²Σx2² − (Σx1x2)²] · |  Σx2²    −Σx1x2 |
      | Σx1x2   Σx2²  |                                          | −Σx1x2    Σx1²  |

and var(β̂) = σu²(x'x)⁻¹. Hence:

var(β̂1) = σu²Σx2² / [Σx1²Σx2² − (Σx1x2)²] ……………………………………(3.39)

var(β̂2) = σu²Σx1² / [Σx1²Σx2² − (Σx1x2)²] ………………. …….…….(3.40)

cov(β̂1, β̂2) = −σu²Σx1x2 / [Σx1²Σx2² − (Σx1x2)²] …………………………………….(3.41)
As we have seen in the simple regression model, σ̂² = Σei²/(n − 2). For k parameters (including
the constant parameter),

σ̂² = Σei²/(n − k)

In the above model we have three parameters including the constant term, so

σ̂² = Σei²/(n − 3)

where

Σei² = Σyi² − β̂1Σx1y − β̂2Σx2y ………………………………………...(3.43)
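Formulas (3.39)-(3.41) and (3.43) can be checked against the matrix result in the sketch above; a short version in deviation form with the same hypothetical data:

```python
import numpy as np

X1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
X2 = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
Y  = np.array([5.0, 9.0, 11.0, 16.0, 17.0, 22.0])
n = len(Y)

x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
den = np.sum(x1**2) * np.sum(x2**2) - np.sum(x1 * x2)**2

b1 = (np.sum(x1*y) * np.sum(x2**2) - np.sum(x2*y) * np.sum(x1*x2)) / den
b2 = (np.sum(x2*y) * np.sum(x1**2) - np.sum(x1*y) * np.sum(x1*x2)) / den

rss = np.sum(y**2) - b1 * np.sum(x1*y) - b2 * np.sum(x2*y)   # equation (3.43)
sigma2_hat = rss / (n - 3)

var_b1 = sigma2_hat * np.sum(x2**2) / den                    # equation (3.39)
var_b2 = sigma2_hat * np.sum(x1**2) / den                    # equation (3.40)
cov_b1b2 = -sigma2_hat * np.sum(x1*x2) / den                 # equation (3.41)

print(np.sqrt(var_b1), np.sqrt(var_b2), cov_b1b2)
```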
This is all about the variance covariance of the parameters. Now it is time to see the
minimum variance property.
Minimum variance of β̂

To show that all the β̂i's in the β̂ vector are best estimators, we have also to prove that
the variances obtained in (3.37) are the smallest amongst all other possible linear
unbiased estimators. We follow the same procedure as in the single
explanatory variable model: we first assume an alternative linear unbiased
estimator and then establish that its variance is greater than that of the OLS estimator.

Assume that β̂* is an alternative unbiased and linear estimator of β. Suppose that

β̂* = [(X'X)⁻¹X' + B]Y

where B is a matrix of known constants.

β̂* = [(X'X)⁻¹X' + B](Xβ + U)

E(β̂*) = E{[(X'X)⁻¹X' + B](Xβ + U)}
      = E[(X'X)⁻¹X'Xβ + (X'X)⁻¹X'U + BXβ + BU]
      = β + BXβ ,  [since E(U) = 0] ……………………………….(3.44)

Since our assumption regarding the alternative β̂* is that it is to be an unbiased estimator
of β, E(β̂*) should be equal to β; in other words, BXβ should be a null vector.
Thus we say BX should be 0 if β̂* = [(X'X)⁻¹X' + B]Y is to be an unbiased estimator. Let
us now find the variance of this alternative estimator.

var(β̂*) = E[(β̂* − β)(β̂* − β)']
        = E{[(X'X)⁻¹X' + B](Xβ + U) − β}{[(X'X)⁻¹X' + B](Xβ + U) − β}'
        = E{[(X'X)⁻¹X'U + BU][(X'X)⁻¹X'U + BU]'} ,  (since BX = 0)
        = E{[(X'X)⁻¹X'U + BU][U'X(X'X)⁻¹ + U'B']}
        = E[(X'X)⁻¹X'UU'X(X'X)⁻¹ + (X'X)⁻¹X'UU'B' + BUU'X(X'X)⁻¹ + BUU'B']
        = σu²(X'X)⁻¹X'X(X'X)⁻¹ + σu²(X'X)⁻¹X'B' + σu²BX(X'X)⁻¹ + σu²BB'
        = σu²(X'X)⁻¹ + σu²BB' ,  (since BX = 0 and X'B' = (BX)' = 0)

var(β̂*) = σu²(X'X)⁻¹ + σu²BB' ……………………………………….(3.45)

Or, in other words, var(β̂*) is greater than var(β̂) by the expression σu²BB', and this proves
that β̂ possesses minimum variance (BB' has non-negative diagonal elements).
We know that Σei² = e'e = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂. Since (X'X)β̂ = X'Y, this simplifies to

e'e = Y'Y − β̂'X'Y

We also know that yi = Yi − Ȳ, so

Σyi² = ΣYi² − (1/n)(ΣYi)²

In matrix notation,

Σyi² = Y'Y − (1/n)(ΣYi)² …………………………………………… (3.48)

Equation (3.48) gives the total sum of squares (TSS) of the model. The explained sum of squares is then

ESS = TSS − RSS = [Y'Y − (1/n)(ΣYi)²] − e'e
    = β̂'X'Y − (1/n)(ΣYi)² ……………………….(3.49)

Hence

R² = ESS/TSS = [β̂'X'Y − (1/n)(ΣYi)²] / [Y'Y − (1/n)(ΣYi)²]
   = (β̂'X'Y − nȲ²) / (Y'Y − nȲ²) …………………… (3.50)
i. Model: Y = β0 + β1X1 + β2X2 + U

If we invoke the assumption that Ui ~ N(0, σ²), then we can use either the t-test or the
standard error test to test a hypothesis about any individual partial regression coefficient.
To illustrate, consider the following hypotheses:

A. H0: β1 = 0  against  H1: β1 ≠ 0
B. H0: β2 = 0  against  H1: β2 ≠ 0

The null hypothesis (A) states that, holding X2 constant, X1 has no (linear) influence on Y.
Similarly, hypothesis (B) states that, holding X1 constant, X2 has no influence on the
dependent variable Yi. To test these null hypotheses we will use the following tests:
i. Standard error test: under this and the following testing methods we test only for β̂1;
the test for β̂2 is carried out in exactly the same way.

SE(β̂1) = √var(β̂1) = √{σ̂²Σx2² / [Σx1²Σx2² − (Σx1x2)²]} ,  where σ̂² = Σei²/(n − 3)

If SE(β̂1) > ½β̂1, we accept the null hypothesis; that is, we conclude that the estimate
β̂1 is not statistically significant.

If SE(β̂1) < ½β̂1, we reject the null hypothesis; that is, we conclude that the estimate β̂1
is statistically significant.

Note: The smaller the standard errors, the stronger the evidence that the estimates are
statistically reliable.
ii. The student's t-test: we compute

t* = β̂i/SE(β̂i) ~ t(n−k) ,  where n is the number of observations and k is the number of parameters.

If we have a three-parameter model (two explanatory variables), the degrees of freedom are n − 3.
For example, to test β2 we compute

t* = (β̂2 − β2)/SE(β̂2) ,  with n − 3 degrees of freedom,

and under H0: β2 = 0 this reduces to

t* = β̂2/SE(β̂2)
If t*<t (tabulated), we accept the null hypothesis, i.e. we can conclude that ˆ 2 is not
significant and hence the regressor does not appear to contribute to the explanation of
the variations in Y.
If t*>t (tabulated), we reject the null hypothesis and we accept the alternative one; ˆ 2 is
statistically significant. Thus, the greater the value of t* the stronger the evidence that i
is statistically significant.
Throughout the previous section we were concerned with testing the significance of the
estimated partial regression coefficients individually, i.e. under the separate hypothesis
that each of the true population partial regression coefficients was zero. In this section
we extend this idea to joint test of the relevance of all the included explanatory variables.
Now consider the following model:

Y = β0 + β1X1 + β2X2 + ... + βkXk + Ui

H0: β1 = β2 = β3 = ... = βk = 0
H1: at least one of the βj is different from zero

This null hypothesis is a joint hypothesis that β1, β2, ..., βk are jointly or simultaneously
equal to zero. A test of such a hypothesis is called a test of the overall significance of the
observed or estimated regression line, that is, of whether Y is linearly related to
X1, X2, ..., Xk.

Can the joint hypothesis be tested by testing the significance of the individual β̂i's as
above? The answer is no. The reason is that in testing the significance of β̂2 under the
hypothesis that β2 = 0, it was assumed tacitly that the testing was based on a different sample
from the one used in testing the significance of β̂3 under the null hypothesis that β3 = 0. But to
test the joint hypothesis above with individual tests on the same sample, we would be violating
the assumption underlying the test procedure (see Gujarati, Basic Econometrics, 3rd ed.).
The test procedure for any set of hypothesis can be based on a comparison of the sum of
squared errors from the original, the unrestricted multiple regression model to the sum
of squared errors from a regression model in which the null hypothesis is assumed to be
true. When a null hypothesis is assumed to be true, we in effect place conditions or
constraints, on the values that the parameters can take, and the sum of squared errors
increases. The idea of the test is that if these sum of squared errors are substantially
different, then the assumption that the joint null hypothesis is true has significantly
reduced the ability of the model to fit the data, and the data do not support the null
hypothesis.
If the null hypothesis is true, we expect the data to be compatible with the conditions
placed on the parameters. Thus, there would be little change in the sum of squared
errors when the null hypothesis is assumed to be true.
Let the Restricted Residual Sum of Squares (RRSS) be the sum of squared errors in the
model obtained by assuming that the null hypothesis is true, and let URSS be the sum of
squared errors of the original unrestricted model, i.e. the unrestricted residual sum of
squares. It is always true that RRSS − URSS ≥ 0.
Consider the null hypothesis

H0: β1 = β2 = β3 = ... = βk = 0

In the unrestricted model, Yi = Ŷi + ei, so ei = Yi − Ŷi and the sum of squared errors
Σei² = Σ(Yi − Ŷi)² is called the unrestricted residual sum of squares (URSS). This is
the case when the null hypothesis is not assumed to be true. If the null hypothesis is assumed to be true,
i.e. when all the slope coefficients are zero, the model reduces to

Y = β̂0 + ei

Applying OLS, β̂0 = ΣYi/n = Ȳ …………………………….(3.52)

so that e = Y − β̂0 = Y − Ȳ.

The sum of squared errors when the null hypothesis is assumed to be true, Σ(Y − Ȳ)², is called the
Restricted Residual Sum of Squares (RRSS), and this is equal to the total sum of squares (TSS).
The ratio:

F = [(RRSS − URSS)/(k − 1)] / [URSS/(n − k)] ~ F(k−1, n−k) ……………………… (3.53)

has an F-distribution with k − 1 and n − k degrees of freedom for the numerator and
denominator respectively.

Since RRSS = TSS and URSS = RSS,

F = [(TSS − RSS)/(k − 1)] / [RSS/(n − k)]
  = [ESS/(k − 1)] / [RSS/(n − k)] ………………………………………………. (3.54)

Dividing both the numerator and the denominator by TSS gives

F = [(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)]
  = [R²/(k − 1)] / [(1 − R²)/(n − k)] …………………………………………..(3.55)
This implies that the computed value of F can be calculated either as a ratio of ESS and RSS or of
R² and 1 − R². If the null hypothesis is not true, then the difference between RRSS and URSS
(TSS and RSS) becomes large, implying that the constraints placed on the model by the null
hypothesis have a large effect on the ability of the model to fit the data, and the value of F
tends to be large. Thus, we reject the null hypothesis if the F test statistic becomes too large.
This value is compared with the critical value of F, which leaves a probability of α in
the upper tail of the F-distribution with k − 1 and n − k degrees of freedom.

If the computed value of F is greater than the critical value F(k−1, n−k), then the
parameters of the model are jointly significant, i.e. the dependent variable Y is linearly
related to the independent variables included in the model.
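The overall F-test of (3.55) is easy to automate. A minimal sketch that computes F* from R² and looks up the critical value with SciPy; the values of R², n and k passed in below are hypothetical.

```python
from scipy import stats

def overall_f_test(r2, n, k, alpha=0.05):
    """F-test of H0: all slope coefficients are zero (equation 3.55).
    k counts all parameters including the intercept."""
    f_star = (r2 / (k - 1)) / ((1 - r2) / (n - k))
    f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n - k)
    return f_star, f_crit, f_star > f_crit

print(overall_f_test(r2=0.24, n=25, k=3))   # roughly (3.47, 3.44, True): reject H0
```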
In order to help you understand the working of matrix algebra in the estimation of the
regression coefficient, variance of the coefficients and testing of the parameters and the
model, consider the following numerical example.
Example 1. Consider the data given in Table 2.1 below to fit a linear function:
Y = β1X1 + β2X2 + β3X3 + U
Table 2.1. Numerical example for the computation of the OLS estimators
(all variables are in deviation form, so Σyi = Σx1 = Σx2 = Σx3 = 0):

Σx1x2 = 240     Σx1² = 270     Σx2² = 630     Σx3² = 750
Σyi² = 594      Σx2x3 = −420   Σx1x3 = −330
Σx1yi = 319     Σx2yi = 492    Σx3yi = −625

From the table, the means of the variables are computed and given below:
Based on the above table and model answer the following questions.
Solution:
In matrix notation, β̂ = (x'x)⁻¹x'y (when we use the data in deviation form), where
β̂ = (β̂1, β̂2, β̂3)' and x is the n×3 matrix of observations on x1, x2 and x3 in deviation form.

Note: the calculations may be made easier by taking 30 as a common factor from all the
elements of the matrix (x'x). This will not affect the final results.
(i) The OLS estimates are obtained from β̂ = (x'x)⁻¹x'y.

(ii) The elements in the principal diagonal of (x'x)⁻¹, when multiplied by σ̂u², give the variances of the
regression coefficients.

(iii) R² = ESS/TSS = [β̂'X'Y − (1/n)(ΣYi)²] / [Y'Y − (1/n)(ΣYi)²]
        = (β̂1Σx1y + β̂2Σx2y + β̂3Σx3y)/Σyi² = 575.98/594 = 0.97
We can test the significance of the individual parameters using the student's t-test.
The computed values of 't' are given above as t*; these values indicate that only β̂1 is
insignificant.
Example 2. The following matrix gives the sums of squares and cross-products (in deviation form) of
three variables:

          y        x1        x2
y       7.59     3.12     26.99
x1               29.16    30.80
x2                       133.00

The first row and first column of the above matrix give Σy², the first row and second column give Σyx1,
and so on. The data come from the model

Y1 = A·Y2^β1·Y3^β2·e^vi

where Y1 is food consumption per capita, Y2 is food price and Y3 is income, and where

Y = lnY1 , X1 = lnY2 and X2 = lnY3
y = Y − Ȳ , x1 = X1 − X̄1 , and x2 = X2 − X̄2

a. Estimate β1 and β2
b. Compute the variances of β̂1 and β̂2
c. Compute the coefficient of determination
d. Report the regression result.
Solution:

It is difficult to estimate the above model as it stands, so let's take the natural log of the model:

lnY1 = lnA + β1lnY2 + β2lnY3 + Vi

which, in the notation above, is

Y = β0 + β1X1 + β2X2 + Vi

The given matrix is based on this transformed model. Using the values in the matrix,

x'x = | Σx1²    Σx1x2 | = | 29.16   30.80 |        x'y = | Σx1y | = |  3.12 |
      | Σx1x2   Σx2²  |   | 30.80  133.00 |              | Σx2y |   | 26.99 |

|x'x| = (29.16)(133.00) − (30.80)² = 2929.64
(a) β̂ = (x'x)⁻¹x'y:

β̂1 = [(133.00)(3.12) − (30.80)(26.99)] / 2929.64 = −0.1421
β̂2 = [(29.16)(26.99) − (30.80)(3.12)] / 2929.64 = 0.2358

(b) The elements in the principal diagonal of (x'x)⁻¹, when multiplied by σ̂u², give the
variances of the estimates. With Σei² = Σy² − β̂1Σx1y − β̂2Σx2y = 1.6680 and n − 3 = 17
degrees of freedom,

σ̂u² = 1.6680/17 = 0.0981

var(β̂1) = σ̂u²Σx2²/|x'x| = (0.0981)(133.00)/2929.64 = 0.00445 ,  SE(β̂1) = 0.0667
var(β̂2) = σ̂u²Σx1²/|x'x| = (0.0981)(29.16)/2929.64 = 0.00098 ,  SE(β̂2) = 0.0312

(c) R² = (β̂1Σx1y + β̂2Σx2y)/Σyi² = 5.92/7.59 = 0.78

(d) The estimated regression (in log form) is:

lnŶ1 = lnA − 0.1421 lnY2 + 0.2358 lnY3 ,   R² = 0.78
SE:            (0.0667)       (0.0312)
t*:            (−2.13)        (7.55)
The (constant) food price elasticity is negative but the income elasticity is positive. Also,
the income elasticity is highly significant. About 78 percent of the variation in the
consumption of food is explained by its price and the income of the consumer.
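Example 2 can be reproduced from the moment matrix alone; a sketch with NumPy using the sums of squares and cross-products given above (and n − 3 = 17 degrees of freedom):

```python
import numpy as np

# Sums of squares and cross-products from the example (deviation form)
Syy = 7.59
xty = np.array([3.12, 26.99])                  # [sum(x1*y), sum(x2*y)]
xtx = np.array([[29.16, 30.80],
                [30.80, 133.00]])

xtx_inv = np.linalg.inv(xtx)
beta = xtx_inv @ xty                           # about (-0.1421, 0.2358)

rss = Syy - beta @ xty                         # about 1.668
sigma2 = rss / 17                              # about 0.0981
se = np.sqrt(sigma2 * np.diag(xtx_inv))        # about (0.0667, 0.0312)
r2 = (beta @ xty) / Syy                        # about 0.78

print(beta, se, round(r2, 2))
```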
Example 3:
On the basis of the information given below about a model with two explanatory variables,
Y = β0 + β1X1 + β2X2 + U, estimated from a sample of n = 25 observations, answer the following questions.
Solution:

a. Since the above model is a two-explanatory-variable model, we can estimate β̂1 and β̂2 using the
formulas in equations (3.21) and (3.22), i.e.

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / [Σx1²·Σx2² − (Σx1x2)²]
Since the x's and y's in the above formula are in deviation form, we first have to find the
corresponding deviation forms of the given values. We know that:

Σx1x2 = ΣX1X2 − nX̄1X̄2
Σx1y = ΣX1Y − nX̄1Ȳ
Σx2y = ΣX2Y − nX̄2Ȳ = 13500 − 25(16)(32) = 700

Σx1² = ΣX1² − nX̄1² = 3200 − 25(10)² = 700
Σx2² = ΣX2² − nX̄2² = 7300 − 25(16)² = 900

Similarly, Σx1y = 400 and Σx1x2 = 300 are computed in the same way from the given sums, and Σy² = 2400.

Substituting into (3.21) and (3.22):

β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / [Σx1²·Σx2² − (Σx1x2)²]
   = [(400)(900) − (700)(300)] / [(700)(900) − (300)²]
   = 150000/540000 = 0.278

β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / [Σx1²·Σx2² − (Σx1x2)²]
   = [(700)(700) − (400)(300)] / 540000
   = 370000/540000 = 0.685

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 = 32 − (0.278)(10) − (0.685)(16) = 18.26
b. var(β̂1) = σ̂²Σx2² / [Σx1²·Σx2² − (Σx1x2)²]

where σ̂² = Σei²/(n − k), k being the number of parameters. In our case k = 3, so

σ̂² = Σei²/(n − 3)

Σei² = Σy² − β̂1Σx1y − β̂2Σx2y = 1809.3

σ̂² = 1809.3 / (25 − 3) = 82.24

var(β̂1) = (82.24)(900) / 540000 = 0.137

var(β̂2) = σ̂²Σx1² / [Σx1²·Σx2² − (Σx1x2)²] = (82.24)(700) / 540000 = 0.1067
c. The significance of the parameters is tested by comparing the computed value of t with the critical
value of t obtained from the table at the α/2 level of significance and n − k degrees of freedom.

For β̂1:

t* = β̂1/SE(β̂1) = 0.278/0.370 = 0.751

tc (at 0.025 and 22 df) = 2.074

Hence t* < tc. The decision rule when t* < tc is to reject the alternative hypothesis that says β is different
from zero and to accept the null hypothesis that says β is equal to zero. The conclusion
is that β̂1 is statistically insignificant, i.e. the sample we used to estimate β̂1 is drawn from a
population of Y and X1 in which there is no relationship between Y and X1 (i.e. β1 = 0).
d. R² = ESS/TSS = 1 − RSS/TSS

where TSS = Σy² and ESS = Σŷ² = β̂1Σx1y + β̂2Σx2y + ... + β̂kΣxky.

R² = 1 − RSS/TSS = 1 − 1809.3/2400 = 0.24

Adjusted R² = 1 − [Σei²/(n − k)] / [Σy²/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k)
            = 1 − (1 − 0.24)(24)/22
            = 0.178
The joint (overall) significance of the model is tested with

H0: β1 = β2 = 0
H1: at least one of β1 and β2 is different from zero

using the F-test given below:

F*(k−1, n−k) = [ESS/(k − 1)] / [RSS/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)]

F*(2, 22) = 3.47 (this is the computed value of F). Let's compare it with the critical value from the F-table:

Fc(2, 22) = 3.44

Since F* > Fc, the decision rule is to reject H0 and accept H1. We can say that the model is
significant, i.e. the dependent variable is, at least, linearly related to one of the
explanatory variables.