Econometrics Module
Contents
1.0 Aims and Objectives
1.1 Definition of Econometrics
1.2 Goals of Econometrics
1.3 Division of Econometrics
1.4 Methodology of Econometrics
1.5 The Nature and Sources of Data for Econometrics Analysis
1.5.1 Types of Data
1.5.2 The Sources of Data
1.6 Summary
1.7 Answers to Check Your Progress
1.8 References
1.9 Model Examination Questions
The purpose of this unit is to let you know what econometrics is all about and to discuss the scope, goals, division and methodology of econometric analysis.
A. Economic theory makes statements or hypotheses that are mostly qualitative in nature.
Ex. Microeconomic theory states that, other things remaining the same, a reduction in the price of a commodity is expected to increase the quantity demanded of that commodity. But the theory itself does not provide any numerical measure of the relationship between the two; that is, it does not tell by how much the quantity will go up or down as a result of a certain change in the price of the commodity. It is the job of the econometrician to provide such numerical statements.
The econometrician often needs special methods since the data are not generated as the result of
a controlled experiment. This creates special problems not normally dealt with in mathematical
statistics. Moreover, such data are likely to contain errors of measurement, and the econometrician may be called upon to develop special methods of analysis to deal with such errors of measurement.
2. Policy-Making
In many cases we apply the various econometric techniques in order to obtain reliable estimates
of the individual coefficients of the economic relationships from which we may evaluate
elasticities or other parameters of economic theory (multipliers, technical coefficients of
production, marginal costs, marginal revenues, etc.) The knowledge of the numerical value of
these coefficients is very important for the decisions of firms as well as for the formulation of
the economic policy of the government. It helps to compare the effects of alternative policy
decisions.
3. Forecasting
In formulating policy decisions it is essential to be able to forecast the value of the economic
magnitudes. Such forecasts will enable the policy-maker to judge whether it is necessary to
take any measures in order to influence the relevant economic variables.
Econometrics may be divided into two broad categories: theoretical econometrics and applied econometrics.
iii) the mathematical form of the model (number of equations, linear or non-linear form of these equations, etc.).
The specification of the econometric model will be based on economic theory and on any available information relating to the phenomenon being studied. The econometrician must know the general laws of economic theory, and furthermore he must gather any other information relevant to the particular characteristics of the relationship, as well as all studies already published on the subject by other research workers.
The most common errors of specification are:
- the omission of some variables from the functions
- the omission of some equations
- the mistaken mathematical form of the functions.
Evaluation of Estimates
After the estimation of the model, the econometrician must proceed with the evaluation of the results of the calculations, that is, with the determination of the reliability of these results. The evaluation consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory. Various criteria may be used.
- Economic a priori criteria: – These are determined by the principles of economic
theory and refer to the sign and the size of the parameters of economic relationships. In
econometric jargon we say that economic theory imposes restrictions on the signs and
values of the parameters of economic relationships.
- Statistical criteria: – These are determined by statistical theory and aim at the evaluation of the statistical reliability of the estimates of the parameters of the model. The most widely used statistical criteria are the correlation coefficient and the standard deviation (or the standard error) of the estimates. These concepts will be discussed in the subsequent units. Note that the statistical criteria are secondary to the a priori theoretical criteria: the estimates of the parameters should in general be rejected if they have the wrong sign or size, even though they pass the statistical criteria.
- Econometric criteria: – These are determined by econometric theory. They aim at investigating whether the assumptions of the econometric method employed are satisfied in any particular case. When the assumptions of an econometric technique are not satisfied, it is customary to re-specify the model.
Therefore, the final stage of any applied econometric research is the investigation of the stability of the estimates, that is, their sensitivity to changes in the size of the sample.
One way of establishing the forecasting power of a model is to use the estimates of the model
for a period not included in the sample. The estimated value (forecast value) is compared with
the actual (realized) magnitude of the relevant dependent variable. Usually there will be a
difference between the actual and the forecast value of the variable, which is tested with the
aim of establishing whether it is (statistically) significant. If after conducting the relevant test of
significance, we find that the difference between the realized value of the dependent variable
and that estimated from the model is statistically significant, we conclude that the forecasting
power of the model, its extra – sample performance, is poor.
Another way of establishing the stability of the estimates and the performance of the model
outside the sample of data from which it has been estimated, is to re-estimate the function with
an expanded sample, that is a sample including additional observations. The original estimates
will normally differ from the new estimates. The difference is tested for statistical significance
with appropriate methods.
b) The estimates of the coefficients ( β ' s ) may be poor, due to deficiencies of the sample
data.
c) The estimates are 'good' for the period of the sample, but the structural background conditions of the model may have changed from the period that was used as the basis for the estimation of the model, and therefore the old estimates are not 'good' for forecasting. The whole model needs re-estimation before it can be used for prediction.
Example. Suppose that we estimate the demand function for a given commodity with a single-equation model, using time-series data for the period 1950-68, as follows:
Q̂t = 100 + 5Yt − 30Pt
This equation is then used for 'forecasting' the demand for the commodity in the year 1970, a period outside the sample data. Given Y1970 = 1000 and P1970 = 5,
Q̂t = 100 + 5(1000) − 30(5) = 4,950 units.
If the actual demand for this commodity in 1970 is 4,500, there is a difference of 450 between the demand estimated from the model and the actual market demand for the product. The difference can be tested for significance by various methods. If it is found significant, we try to find out the sources of the error in the forecast, in order to improve the forecasting power of our model.
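The arithmetic of this forecast check can be sketched in a few lines of Python (the coefficients and 1970 values are those of the example above; the formal significance test is omitted):

```python
# Forecast from the estimated demand function Q = 100 + 5Y - 30P
b0, b1, b2 = 100, 5, -30      # coefficients estimated from the 1950-68 sample
Y_1970, P_1970 = 1000, 5      # given values for the forecast year

q_forecast = b0 + b1 * Y_1970 + b2 * P_1970   # 4,950 units
q_actual = 4500                               # realized market demand in 1970
forecast_error = q_forecast - q_actual        # 450 units
print(q_forecast, forecast_error)
```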
The success of any econometric analysis ultimately depends on the availability of the
appropriate data. Let us first discuss the types of data and then we will see the sources and
limitations of the data.
b) Cross-Section data
These data give information on the variables concerning individual agents (consumers or
producers) at a given point of time.
Example:
- the census of population conducted by the CSA
- the survey of consumer expenditure conducted by Addis Ababa University
Note that due to heterogeneity, cross-sectional data have their own problems.
c) Pooled Data
These are repeated surveys of a single (cross-section) sample in different periods of time. They record the behavior of the same set of individual microeconomic units over time, and thus contain elements of both time-series and cross-sectional data.
Panel or longitudinal data (also called micropanel data) are a special type of pooled data in which the same cross-sectional unit is surveyed over time.
The individual researcher may also collect data himself, through interviews or questionnaires.
In the social sciences the data that one generally obtains are non-experimental in nature; that is, not subject to the control of the researcher. For example, data on GNP, unemployment, stock prices, etc. are not directly under the control of the investigator. This often creates special problems for the researcher in pinning down the exact cause or causes affecting a particular situation.
Limitations
Although there is plenty of data available for economic research, the quality of the data is often
not that good. Reasons are:
- Since most social science data are not experimental in nature, there is the possibility of
observational errors.
- Errors of measurement arising from approximations and round offs.
- In questionnaire type surveys, there is the problem of non-response
- Respondents may not answer all the questions correctly
- Sampling methods used in obtaining data
- Economic data is generally available at a highly aggregate level. For example most macro
data like GNP, unemployment, inflation etc are available for the economy as a whole.
- Because of confidentiality, certain data can be published only in highly aggregate form. For example, data on individual tax, production, employment, etc. at the firm level are usually available only in aggregate form.
Because of all these and many other problems, the researcher should always keep in mind that the results of research are only as good as the quality of the data. The results of a study may therefore be unsatisfactory because of the poor quality of the available data rather than because of a wrong model.
1.6. SUMMARY
Definition of Econometrics
Economic theory, mathematical economics and statistics
Methodology of econometrics:
C) Evaluation of Estimates
Criteria for evaluation of the estimates
- Economic a priori criteria: – These are determined by the principles of economic
theory and refer to the sign and the size of the parameters of economic relationships.
- Statistical criteria: – These are determined by statistical theory; the most widely used are the correlation coefficient and the standard deviation (or the standard error) of the estimates.
- Econometric criteria: – are determined by econometric theory.
Types of Data
There are three types of data:
A) Time series data: qualitative or quantitative data (qualitative attributes enter through dummy or categorical variables)
B) Cross-section data
C) Pooled data, of which panel data are a special case
The Sources of Data
Qd = β0 + β1P + β2Y
3. The results of research are only as good as the quality of the data. Explain it.
4. Mention some of the reasons for the poor forecasting power of the estimated model.
Content
2.0 Aims and Objectives
2.1 The Concept of Regression Analysis
2.2 Population Regression Function Vs Sample Regression Function
2.3 The Method of Ordinary Least Squares
2.4 Statistical Test of Significance and Goodness of Fit
2.5 Confidence Interval and Prediction
2.6 Summary
2.7 Answers to Check Your Progress
2.8 Model Examination
2.9 References
This unit introduces the key idea behind regression analysis. The objective of such analysis is
to estimate and/or predict the mean or average value of the dependent variable on the basis of
the known or fixed values of the explanatory variables.
be able to apply ordinary least squares method in a two variable regression analysis and
interpret the results.
Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating and/or predicting the (population) mean or average value of the dependent variable on the basis of the known or fixed values of the explanatory variables.
Basically, the existence of the disturbance term is justified in three main ways.
i) Omission of other variables: although income might be the major determinant of the level of consumption, it is not the only determinant. Other variables, such as the interest rate or liquid asset holdings, may have a systematic influence on consumption. Their omission constitutes one type of specification error, and the disturbance term is often viewed as capturing the combined influence of such omitted variables.
ii) Measurement error: it may be the case that the variable being explained cannot be measured accurately, either because of data collection difficulties or because it is inherently unmeasurable and a proxy variable must be used instead. The disturbance term can in these circumstances be thought of as representing this measurement error [of the variable(s)].
iii) Randomness in human behavior: humans are not machines that will do as instructed, so there is an unpredictable element. Example: owing to some unexplained cause, an increase in income may not influence consumption. The disturbance term captures such human behavior that is left unexplained by the economic model.
iv) Imperfect specification of the model. Example: we may have linearized a non-linear function; if so, the random term may reflect the wrong specification.
Generally speaking regression analysis is concerned with the study of the dependency of one
dependent variable on one or more other variables called the explanatory variable(s) or the
independent variable(s). Moreover, the true relationship that connects the variables involved is
split in to two. They are systematic (or explained variation and random or (unexplained)
variation. Using (2.2) we can disaggregate the two components as follows
Y = β0 + β1X + U
That is,
[variation in Y] = [systematic variation] + [random variation]
In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear model like (2.3). In this
regard it is essential to know what the term linear really means, for it can be interpreted in two
different ways. These are,
a) Linearity in the variables
b) Linearity in parameters
The following discussion stress that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the basis
of the known or fixed values of the explanatory variable(s).
[Figure 2.1: the population regression line (PRF), consumption expenditure plotted against income]
Figure 2.1(the above line) is known as the population regression line or, more generally, the
population regression curve. Geometrically, a population regression curve is simply the locus
of the conditional means or expectations of the dependent variable for the fixed value of the
explanatory variables.
From the preceding discussion it is clear that each conditional mean E(Y/Xi) is a function of Xi:
E(Y/Xi) = β1 + β2Xi
where β1 and β2 are unknown but fixed parameters known as the regression coefficients (the intercept and slope coefficients respectively). The above equation is known as the linear population regression function. But since consumption expenditure does not necessarily increase as income level increases, we incorporate the error term. That is,
Yi = E(Y/Xi) + Ui
= β1 + β2Xi + Ui .......................................................(2.4)
Note that in table 2.1 we observe that for the same value of X (e.g. 100) we have different values of Y (65, 70, 75 and 80). Thus the value of Y is also affected by other factors, which can be captured by the error term U.
If we take the expected value of (2.4) we obtain
E(Yi/Xi) = E(Y/Xi) + E(Ui/Xi) .................................(2.5)
Since E(Yi/Xi) = E(Y/Xi), it implies that
E(Ui/Xi) = 0 ...............................(2.6)
Thus, the assumption that the regression line passes through the conditional means of Y implies
that the conditional mean values of Ui (conditional upon the given X's) are zero.
[Figure: the sample regression line (SRF), consumption expenditure plotted against income]
Hence, analogous to PRF that underlines the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line. The
sample regression function (which is a counterpart of the PRF stated earlier) may be written as:
Yi = β̂1 + β̂2Xi + Ûi ...............................(2.7)
where Ŷ (read as "Y-hat" or "Y-cap") = estimator of E(Y/Xi), β̂1 = estimator of β1, β̂2 = estimator of β2, and Ûi = an estimator of Ui.
To sum up, because our analysis is based on a single sample from some population our primary
objective in regression analysis is to estimate the PRF given by
Yi = β1 + β2Xi + Ui
on the basis of
Yi = β̂1 + β̂2Xi + Ûi
We employ SRF because in most of the cases our analysis is based upon a single sample from
some population. But because of sampling fluctuations our estimate of the PRF based on the
SRF is at best an approximate one.
Note that Ŷi overestimates the true E(Y/Xi) for the Xi shown therein. By the same token, for any Xi to the left of point A, the SRF will underestimate the true PRF. Such over- and underestimation is inevitable because of sampling fluctuations.
Note that there are several methods of constructing the SRF, but as far as regression analysis is concerned, the method used most extensively is that of ordinary least squares (OLS).
In other words, how should the SRF be constructed so that β̂1 is as "close" as possible to the true β1 and β̂2 is as "close" as possible to the true β2, even though we never know the true β1 and β2? We can develop procedures that tell us how to construct the SRF to mirror the PRF as faithfully as possible. This can be done even though we never actually determine the PRF itself.
The method of ordinary least squares has some very attractive statistical properties that have made it one of the most powerful and popular methods of regression analysis.
Thus β0 + β1X represents the systematic (explained) variation, and Ui the random (unexplained) variation.
However, the PRF is not directly observable. Hence, we estimate it from the SRF. That is,
Yi = β̂0 + β̂1Xi + Ûi
= Ŷi + Ûi
where Ŷi is the estimated (conditional mean) value of Yi.
Note that
Ûi = Yi − Ŷi
= Yi − β̂0 − β̂1Xi
Choose the SRF in such a way that the sum of the residuals ∑Ûi = ∑(Yi − Ŷi) is as small as possible. However, ∑Ûi is zero (refer to (2.6)) even though the Ûi may be widely scattered about the SRF. We can avoid this problem if we consider the sum of the squared errors, which is the least squares criterion. That is, minimize
∑Ûi² = ∑(Yi − Ŷi)²
= ∑(Yi − β̂0 − β̂1Xi)² ..................................(2.8)
Thus, the least squares method requires the sum of the squared residuals to be as small as possible. In other words, the least squares method allows us to choose β̂0 and β̂1 as estimators of β0 and β1 respectively, so that
∑(Yi − β̂0 − β̂1Xi)²
is minimum.
If the deviation of the actual values from the estimated ones is at a minimum, then our estimate from the collected sample provides a very good approximation of the true relationship between the variables.
Note that to estimate the coefficients β0 and β1 we need observations on X, Y and U. Yet U is never observed like the other variables, and therefore in order to estimate the function Yi = β0 + β1Xi + Ui we should make some reasonable (plausible) assumptions about the shape of the distribution of each Ui (i.e., its mean, variance and covariances).
Note that the PRF Yi = β0 + β1Xi + Ui shows that Yi depends on both Xi and Ui. Therefore, unless we are specific about how Xi and Ui are created or generated, there is no way we can make any statistical inference about Yi, nor, as we shall see, about β0 and β1.
Thus the linear regression model is based on certain assumptions, some of which refer to the
distribution of the random variable Ui, some to the relationship between U i and the explanatory
variables, and finally some refer to the relationship between the explanatory variables
themselves. The following are the assumptions underlying the method of least squares.
Assumption 1: Linear regression model: - the regression model is linear in the parameters.
Assumption 2: X (explanatory) values are fixed in repeated sampling. Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be non-stochastic. In other words, our regression analysis is conditional regression analysis, that is, conditional on the given values of the regressor(s) X.
E.g. Recall that for a fixed X value of 100 we have Y values of 65, 70, 75 and 80. Hence X is assumed to be non-stochastic.
Assumption 3: Ui is a random real variable. The value which U may assume in any one period
depends on chance. It may be positive, negative or zero. Each value has a certain probability of
being assumed by U in any particular instance.
Assumption 4: Zero mean of the disturbance term. This means that for each value of X, U may
assume various values, some greater than zero and some smaller than zero, but if we consider
all the possible values of U, for any given value of X, they would have an average value equal
to zero. Hence the mean or expected value of the random disturbance term U i is zero.
Symbolically, we have:
E(Ui/Xi) = 0
That is, the mean value of Ui conditional upon the given Xi is zero.
Assumption 5: Homoscedasticity, or equal variance of Ui. The variance of Ui is the same for each Xi:
Var(Ui/Xi) = E[Ui − E(Ui/Xi)]²
= E(Ui²/Xi)
= σ² ..........................................(2.9)
Recall that this holds because of assumption 4
Equation (2.9) states that the variance of U i.for each Xi is some positive constant number equal
to 2, equal variance. This means that the Y population corresponding to various X values have
the same variance. Consider the following figures.
[Figure 2.5: Variance of the error term for each Xi; panel (a) shows constant variance, panel (b) shows variance that changes with Xi]
Note that in both cases the distribution of the error term is normal: the values of U (for each Xi) have a bell-shaped, symmetrical distribution about their zero mean.
In panel (b), by contrast,
Var(Ui/Xi) = σi²
where the subscript i on σ² indicates that the variance of the Y population is no longer constant. To understand the rationale behind this assumption, refer to figure (b), where Var(U/X1) < Var(U/X3).
Therefore, the likelihood is that the Y observations coming from the population with X = X1 would be closer to the PRF than those coming from the population corresponding to X = X3. In short, all Y values corresponding to the various X's would not be equally reliable, reliability being judged by how closely or distantly the Y values are distributed around their means.
Stated differently, this assumption says that all Y values corresponding to the various X's are equally important, since they have the same variance. Thus assumption 5 implies that the conditional variances of Yi are also homoscedastic. That is,
Var(Yi/Xi) = σ²
Notice that, combining the above assumptions with normality, the variable Ui has a normal distribution: Ui ~ N(0, σ²). That is, the random term Ui has zero mean and constant variance σ².
Assumption 6: No autocorrelation between the disturbances. Given any two X values, Xi and Xj (i ≠ j), the correlation between any two Ui and Uj (i ≠ j) is zero. This implies that the error term committed for the ith observation is independent of the error term committed for the jth observation. This is also known as the assumption of no serial correlation.
Symbolically,
cov(Ui, Uj/Xi, Xj) = E{[Ui − E(Ui)]/Xi}{[Uj − E(Uj)]/Xj}
= E(Ui/Xi)E(Uj/Xj)
= 0
Figures (a) and (b) imply that because Ui is dependent on Uj, Yt = β0 + β1Xt + Ut depends not only on Xt but also on Ut−1, since Ut−1 to some extent determines Ut. Note that figure (c) shows no systematic pattern in the U's, thus indicating zero correlation.
Assumption 7: Zero covariance between Ui and Xi, or E(UiXi) = 0. That is, the error term is independent of the explanatory variable(s). If the two are uncorrelated, X and U have separate influences on Y; but if X and U are correlated, it is not possible to assess their individual effects on Y. Since we have assumed that the X values are fixed (or non-random) in repeated samples, there is no way for X to co-vary with the error term. Thus, assumption 7 is not very crucial.
That is, the residuals (i.e., Yi − Ŷi) should be as small as possible. This method provides us with unique estimates of β̂0 and β̂1: the sum of squared residual deviations is to be minimized with respect to β̂0 and β̂1.
Thus, using partial derivatives, we minimize ∑Ûi² by setting each derivative equal to zero (recall that the necessary condition in a minimization or maximization process is that the first derivatives are set to zero). Hence,
∂(∑Ûi²)/∂β̂0 = 0 .....................................(2.10)
and
∂(∑Ûi²)/∂β̂1 = 0 ..................................... (2.11)
Recall from (2.8) the formula of ∑Ûi². The partial differentiation of (2.8) with respect to β̂0 gives
∂(∑Ûi²)/∂β̂0 = −2∑(Yi − β̂0 − β̂1Xi) = 0 .................... (2.12)
In the same way, the partial differentiation of (2.8) with respect to β̂1 gives
∂(∑Ûi²)/∂β̂1 = −2∑Xi(Yi − β̂0 − β̂1Xi) = 0 ........................ (2.13)
Simplifying (2.12) and (2.13) we generate the following normal equations:
∑Yi = nβ̂0 + β̂1∑Xi ......................................... (2.14)
∑XiYi = β̂0∑Xi + β̂1∑Xi² ......................................... (2.15)
Solving for β̂1 from (2.15) we obtain:
β̂1 = (∑XiYi − β̂0∑Xi)/∑Xi² .......................................... (2.16)
β̂0 = [∑Xi²∑Yi − ∑Xi∑XiYi]/[n∑Xi² − (∑Xi)²] .......................................... (2.17)
Alternatively, solving for β̂0 from (2.14) gives
β̂0 = (∑Yi − β̂1∑Xi)/n
= ∑Y/n − β̂1∑X/n ....................................... (2.18)
= Ȳ − β̂1X̄ ....................................... (2.19)
In deviation form, where xi = Xi − X̄ and yi = Yi − Ȳ,
β̂1 = ∑(Xi − X̄)(Yi − Ȳ)/∑(Xi − X̄)² ............................................ (2.22)
Expanding the numerator and denominator and simplifying, this can also be written in terms of the raw data:
β̂1 = [∑XiYi − ∑Xi∑Yi/n]/[∑Xi² − (∑Xi)²/n]
= [n∑XiYi − ∑Xi∑Yi]/[n∑Xi² − (∑Xi)²]
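As an illustration, the deviation formulas (2.19) and (2.22) can be coded directly. A minimal sketch in Python, using made-up sample data (the X and Y values below are assumptions, not figures from this module):

```python
# OLS for Y = b0 + b1*X + U via the deviation formulas (2.19) and (2.22)
xs = [100, 140, 180, 220, 260]      # hypothetical income observations
ys = [65, 95, 120, 140, 155]        # hypothetical consumption observations

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# slope: b1 = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar             # (2.19): the line passes through the means
print(b0, b1)
```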
In this event we should estimate the function Y = β0 + β1X + U by imposing the restriction β0 = 0.
This is a restricted minimization problem: we minimize
∑Ûi² = ∑(Y − β̂0 − β̂1X)²
subject to β̂0 = 0 .......................................... (2.23)
Note that estimation of elasticities is possible from an estimated regression line. Recall that in the SRF, Ŷi = β̂0 + β̂1Xi is the equation of a line whose intercept is β̂0 and whose slope is β̂1. The coefficient β̂1 is the derivative of Y with respect to X (i.e., dY/dX). This implies that for a linear function the coefficient β̂1 is a component of the elasticity, which is defined by the formula
η = (dY/Y)/(dX/X) = (dY/dX)(X/Y) ............................................... (2.24)
Substituting β̂1 in place of dY/dX, we obtain an average elasticity of the form
η = β̂1(X̄/Ȳ) ............................................... (2.25)
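A short sketch of (2.25) in Python; the slope is the β̂1 = 0.51 estimated later in this unit, while the sample means are assumed values used here for illustration only:

```python
# Average elasticity from a fitted line, eq. (2.25): eta = b1 * (Xbar / Ybar)
b1 = 0.51                 # slope of the consumption function estimated below
x_bar, y_bar = 170, 111   # assumed sample means of income and consumption

eta = b1 * x_bar / y_bar  # elasticity evaluated at the means
print(round(eta, 2))
```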
In passing, note that the least squares estimators β̂0 and β̂1 are point estimators; that is, given the sample, each estimator provides only a single (point) value of the relevant population parameter.
In conclusion, the regression line obtained using the least squares estimators has the following properties:
i. It passes through the sample means of Y and X. Recall that we got β̂0 = Ȳ − β̂1X̄, which can be written as Ȳ = β̂0 + β̂1X̄.
ii. The mean value of the estimated Ŷ is equal to the mean value of the actual Y.
iii. The mean value of the residuals is equal to zero.
Note that columns 2 to 7 in the above table are constructed using the information given in columns 1 and 2.
We can compute β̂0 for the tabulated figures by applying the formula given in (2.17).
Similarly, we can compute β̂1 by using the formula given in (2.20) or (2.27). Using (2.20), we obtain:
β̂1 = 16,800/33,000 = 0.51
Notice that once we compute β̂1, we can very easily calculate β̂0 using (2.19).
Interpretation of (2.26) reveals that when family income increases by 1 Birr, estimated consumption expenditure increases by β̂1 = 0.51 Birr, i.e., by about 51 cents.
The value of β̂0 = 24.4 (the intercept) indicates the average level of consumption expenditure when family income is zero.
2. State and explain the assumptions underlying the method of (ordinary) least squares.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
3. What is the importance of assuming homoscedastic variance of the error term and no
autocorrelation between the disturbances?
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
4. The following results have been obtained from a sample of 11 observations on the value of
sales (Y) of a firm and corresponding prices (X).
X̄ = 519.18, Ȳ = 217.82
∑Xi² = 3,134,543, ∑XiYi = 1,296,836
a) Estimate the regression line (function) and interpret the results.
b) Compute the price elasticity of sales using average values of X and Y.
As noted in the previous discussion, given the assumptions of the classical linear regression model, the least squares estimators possess some ideal or optimum properties. These properties are contained in the well-known Gauss-Markov theorem.
To understand this theorem, we need to consider the best linear unbiasedness property of an estimator. That is, an estimator, say the OLS estimator β̂i, is said to be the best linear unbiased estimator (BLUE) of βi if the following hold:
i. Linear Estimator. It is linear, that is, a linear function of a random variable, such as the dependent variable Y in the regression model. Thus, an estimator is linear if it is a linear function of the sample observations, that is, if it is determined by a linear combination of the sample data. Given the sample observations Y1, Y2, …, Yn, a linear estimator will have the form
K1Y1 + K2Y2 + … + KnYn ............................................. (2.27)
where the Ki's are some constants.
ii. Unbiased Estimator. Its average or expected value, E(β̂), is equal to the true value of the parameter, β.
iii. Minimum Variance Estimator (or best estimator). An estimator is best when it has the smallest variance as compared with any other estimator obtained from other econometric methods. Symbolically, β̂ is best if
Var(β̂) < Var(β̃)
for any other linear unbiased estimator β̃.
An unbiased estimator with the least variance is known as an efficient estimator.
The following figure shows the sampling distributions of two alternative estimators, β̂ and β̃.
We can prove that the least squares estimators are BLUE provided that the random term U satisfies some general assumptions, namely that U has zero mean and constant variance. This proposition, together with the set of conditions under which it holds, is known as the Gauss-Markov least squares theorem.
[Figure: nested classes of estimators; the set of all estimators contains all linear estimators, which in turn contain the linear unbiased estimators]
The figure above reveals that not all estimators are linear, and not all linear estimators are unbiased. The unbiased linear estimators are a subset of the linear estimators. In the group of linear unbiased estimators, the OLS estimator β̂ has the smallest variance. Hence, OLS possesses three properties, namely linearity, unbiasedness and minimum variance.
To see linearity, write β̂1 in deviation form:
β̂1 = ∑xi(Yi − Ȳ)/∑xi²
= (∑xiYi − Ȳ∑xi)/∑xi²
= ∑xiYi/∑xi² (since ∑xi = 0)
Hence
β̂1 = ∑kiYi, where ki = xi/∑xi²
so that β̂1 is a linear function of the Yi values. Its variance is
Var(β̂1) = E[β̂1 − E(β̂1)]²
Notice that since E(β̂1) = β1, it follows that
Var(β̂1) = σu²/∑xi² .................................................... (2.31)
Thus, the standard error (s.e.) of β̂1 is given by
s.e.(β̂1) = σu/√(∑xi²) ................................................... (2.32)
^
It follows that the variance (and s.e.) of β 0 can be obtained following the same line of reasoning
as above.
^ Ȳ − β1 X̄ .
Recall from (2.19) that β 0 =
^
Moreover, remember that from the PRF we can compute Ȳ = β1 + β 2 X̄ + Ū . Substituting this
This means
β^ 0 - 0 = 1 X̄ + Ū − β 1 X̄
^
= -( β1 − β 1 ) X̄ + Ū
^
^
Now, since Var ( β 0 ) = E(0 - 0)2 it follows that
^ ^
Var ( β 0 ) = E( β 0 - 0)
[−( β^ −β ) X̄ +Ū ]
2
=E 1 1
=E
[ 2
X̄ 2 ( β^ 1 −β 1 ) + Ū 2 −2 X̄ ( β^ 1 −β 1 ) Ū ] ................................ (2.33)
Introduction to Econometrics Page 41
X̄ 2 E ( β^ 1−β 1 ) + E ( Ū 2 ) +2 X ( β^ 1 −β 1 ) E Ū
2
=
1
2
nσ 2
= n
σ2
= n
Therefore, using this information we can adjust (2.32) to obtain
Var(β̂0) = σu²X̄²/∑xi² + σu²/n + 0
= σu²(1/n + X̄²/∑xi²)
Var(β̂0) = σu²∑Xi²/(n∑xi²) .................................................. (2.34)
Hence, the standard error of β̂0 is given by
s.e.(β̂0) = σu√[∑Xi²/(n∑xi²)] ................................................. (2.35)
Moreover, the covariance (cov) between β̂0 and β̂1 describes how β̂0 and β̂1 are related:
Cov(β̂0, β̂1) = E[β̂0 − E(β̂0)][β̂1 − E(β̂1)]
= E(β̂0 − β0)(β̂1 − β1)
Using the information given about (β̂0 − β0) above, we can rewrite this result as follows.
σ̂u² = ∑Ûi²/(n − k) ................................................. (2.37)
where k (which is 2 in this case) stands for the number of parameters and hence n-k represents
the degree of freedom.
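Equations (2.31), (2.34) and (2.37) translate into code as follows. A minimal sketch; the summary statistics below are assumptions chosen to echo the worked consumption example later in this unit (n = 10, ∑xi² = 33,000, ∑Xi² = 322,000):

```python
import math

# Standard errors of the OLS estimates, eqs. (2.31), (2.34) and (2.37)
n, k = 10, 2
rss = 337.3             # assumed residual sum of squares, giving sigma^2 ~ 42.16
sum_x_dev_sq = 33_000   # sum of squared deviations, sum(x_i^2)
sum_X_raw_sq = 322_000  # sum of raw squares, sum(X_i^2)

sigma2_hat = rss / (n - k)                               # (2.37)
var_b1 = sigma2_hat / sum_x_dev_sq                       # (2.31)
var_b0 = sigma2_hat * sum_X_raw_sq / (n * sum_x_dev_sq)  # (2.34)
print(math.sqrt(var_b0), math.sqrt(var_b1))              # the two standard errors
```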
Remember that:
b) In the same way we define the deviation of the regressed values (i.e., the estimates from the line) Ŷi from their mean value, ŷi = Ŷi − Ȳ. This is the part of the total variation of Yi which is explained by the regression line. Thus, the sum of the squares of these deviations is the total variation explained by the regression line:
[Explained variation] = ∑ŷi² = ∑(Ŷi − Ȳ)² ............................... (2.40)
Dividing the total variation of Y through by the total sum of squares gives
1 = ∑(Ŷ − Ȳ)²/∑(Y − Ȳ)² + ∑Ûi²/∑(Y − Ȳ)² ........................................... (2.46)
We now define r² as
r² = ∑(Ŷ − Ȳ)²/∑(Y − Ȳ)² = ∑ŷi²/∑yi² ..................................................... (2.47)
Notice that (2.47) is nothing but ESS/TSS. Thus r² is the square of the correlation coefficient r; it determines the proportion of the variation of Y which is explained by variations in X. For this reason r² is also called the coefficient of determination: it measures the proportion of the total variation in Y explained by the regression model.
Equivalently,
r² = 1 − RSS/TSS ................................................... (2.48)
In deviation form,
r² = β̂1∑xy/∑y² ................................................... (2.49)
= β̂1²∑x²/∑y² ................................................... (2.50)
Note that if we are working with cross-sectional data, an r² value of 0.5 may be a good fit, whereas for time-series data 0.5 may be too low. There is thus no hard-and-fast rule as to how high r² should be. Generally, however, the higher the r², the better the fit.
Recall from (2.36) that
Var(β̂0) = σ̂u²∑Xi²/(n∑xi²)
= 42.16(322,000)/[10(33,000)] = 41.13
Var(β̂1) = σ̂u²/∑xi²
= 42.16/33,000 = 0.0013
We can calculate r² by using (2.47), (2.48), (2.49) or (2.50). For this example we use (2.49) and (2.50):
Using (2.49), r² = 0.51(16,800)/8,890 = 0.96
Using (2.50), r² = (0.51)²(33,000)/8,890 = 0.96
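The same computation in Python, using the sums quoted above (the small gap between the two formulas comes only from rounding β̂1 to 0.51):

```python
# r^2 two ways, eqs. (2.49) and (2.50), with the sums from the example
b1 = 0.51
sum_xy, sum_x2, sum_y2 = 16_800, 33_000, 8_890

r2_a = b1 * sum_xy / sum_y2        # (2.49): ~0.96
r2_b = b1 ** 2 * sum_x2 / sum_y2   # (2.50): ~0.97 with the rounded slope
print(r2_a, r2_b)
```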
Under the normality assumption the OLS estimators are themselves normally distributed:
β̂0 ~ N(β0, σu²∑Xi²/(n∑xi²))
and
β̂1 ~ N(β1, σu²/∑xi²)
Among a number of tests in this regard, we will examine the standard error test. This test helps us to decide whether the estimates β̂0 and β̂1 are significantly different from zero, i.e., whether the sample from which they have been estimated might have come from a population whose true parameters are zero (β0 = 0 and/or β1 = 0). Formally, we test the null hypothesis
H0: βi = 0 (i.e., X and Y have no relationship)
against the alternative hypothesis:
H1: βi ≠ 0 (i.e., Y and X have a relationship)
This is a two-tailed (or two sided) hypothesis. Very often such a two-sided alternative
hypothesis reflects the fact that we do not have a strong a priori or theoretical expectation about
the direction in which the alternative hypothesis should move from the null hypothesis.
In statistics, when we reject the null hypothesis, we say that our finding is statistically
significant. On the other hand, when we do not reject the null hypothesis, we say that our
finding is not statistically significant.
Some times we have a strong a priori or theoretical expectation (or expectations based on some
previous empirical work) that the alternative hypothesis is one sided or unidirectional rather
than two-sided, as just discussed.
For instance, in a consumption-income function C = β0 + β1Y one could postulate that:
H0: β1 ≤ 0.3
H1: β1 > 0.3
That is, perhaps economic theory or prior empirical work suggests that the marginal propensity to consume (β1) is greater than 0.3. [Note: students are strongly advised to refer to and grasp the discussion in units 7 and 8 of the course Statistics for Economics.]
Recall that in our Statistics for Economics course we learned the formula which transforms the value of any variable X into t units, as shown below:
t = (Xi − μ)/SX
Accordingly, the variable
t = (β̂i − βi)/√Var(β̂i) = (β̂i − βi)/s.e.(β̂i) ............................................... (2.51)
follows the t-distribution with n − k degrees of freedom, where
β̂i = least squares estimate of βi
βi = hypothesized value of βi
Var(β̂i) = estimated variance of β̂i (from the regression)
n = sample size
[Figure: two-tailed t test; the acceptance region lies between −tα/2 and tα/2, with a rejection region in each tail]
Recall that if it is a one-tailed test, the rejection region is found only on one side. Hence, we reject H0 if t* > tα or t* < −tα.
Given (2.51), the sample value of t* would be greater than 2 if the relevant estimate (β̂0 or β̂1) is at least twice its standard deviation. In other words, we reject the null hypothesis if
t* > 2, i.e., if β̂i > 2 s.e.(β̂i) ........................................................... (2.53)
or s.e.(β̂i) < β̂i/2
Example: Suppose that from a sample of size n = 20 we estimate the following consumption function:
Ĉ = 100 + 0.70Y
(75.5) (0.21)
where the figures in brackets are the standard errors of the coefficients β̂0 = 100 and β̂1 = 0.70. Are the estimates significant?
For β̂0: t* = β̂0/s.e.(β̂0) = 100/75.5 = 1.32
For β̂1: t* = β̂1/s.e.(β̂1) = 0.70/0.21 = 3.3
Note that for β0, since the calculated value (1.32) is less than the table value (2.10), we cannot reject H0: β0 = 0; the estimate β̂0 is insignificant. For β1, the calculated value (3.3) exceeds 2.10, so we reject H0: β1 = 0; the estimate β̂1 is statistically significant.
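The rule-of-thumb check above is easy to mechanize; a minimal sketch, using the estimates and standard errors of this consumption function:

```python
# t* = estimate / standard error, compared with the 5% critical value
estimates = {"b0": (100.0, 75.5), "b1": (0.70, 0.21)}
t_crit = 2.10   # two-tailed t(0.025) with n - k = 18 degrees of freedom

for name, (beta_hat, se) in estimates.items():
    t_star = beta_hat / se
    verdict = "significant" if abs(t_star) > t_crit else "insignificant"
    print(f"{name}: t* = {t_star:.2f} -> {verdict}")
```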
In conclusion, note that if a researcher obtains a high r² value and the estimates have low standard errors, then the result is good. In practice, however, such an ideal situation is rare: we may have a low r² with low standard errors, or a high r² with high standard errors. There is no agreement among econometricians in this case; the main issue is whether to aim for a high r² or for low standard errors of the parameter estimates.
In general, r² is more important if the model is to be used for forecasting. The standard errors become more important when the purpose of the exercise is the explanation or analysis of economic phenomena and the estimation of reliable values of the economic relationship.
Recall what we have said about constructing confidence interval in the course “Statistics for
Economics”. We said that in confidence interval analysis first we determine the probability
level. This is referred to as the confidence level (or confidence coefficient). Usually the 95%
confidence level is chosen. This means that in repeated sampling the confidence limits,
computed from the sample, would include the true population parameter in 95 percent of the
cases. In the other 5 percent of the cases the population parameter will fall outside the
confidence limit.
The Z-statistic for a regression parameter β̂i is given by
Z = (β̂i − βi)/s.e.(β̂i) .................................................. (2.54)
where s.e = standard error
Our first task is to choose a confidence coefficient, say 95 percent. We next look at the standard normal table and find that the probability of the value of Z lying between −1.96 and 1.96 is 0.95. This may be written as follows:
P(−1.96 < Z < 1.96) = 0.95
Rearranging this result we obtain
P[β̂i − 1.96 s.e.(β̂i) < βi < β̂i + 1.96 s.e.(β̂i)] = 0.95
Thus, the 95 percent confidence interval for βi is
β̂i − 1.96 s.e.(β̂i) < βi < β̂i + 1.96 s.e.(β̂i)
or βi = β̂i ± 1.96 s.e.(β̂i)
Example: given β̂i = 9 and s.e.(β̂i) = 2, and choosing a value of 95 percent for the confidence coefficient,
Solution: we find the confidence interval to be
βi = 9 ± 1.96(2), i.e., 5.08 < βi < 12.92
Exercise: if β̂i = 8.4 and s.e.(β̂i) = 2.2, construct the interval for the 95 percent confidence coefficient.
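Both the worked example and the exercise reduce to two lines of arithmetic; a sketch:

```python
# 95% confidence interval: beta_hat +/- 1.96 * s.e.(beta_hat)
def conf_interval_95(beta_hat, se):
    return beta_hat - 1.96 * se, beta_hat + 1.96 * se

print(conf_interval_95(9.0, 2.0))   # the example: (5.08, 12.92)
print(conf_interval_95(8.4, 2.2))   # check your answer to the exercise
```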
(38.2) (0.85)
where the figures in parentheses are standard errors. Construct the 95% confidence interval for the intercept and slope.
where Ŷi is the estimator of the true E(Yi) corresponding to a given X. Note that there are two kinds of predictions: mean prediction and individual prediction.
a) Mean Prediction
If our interest lies in predicting the conditional mean of Y corresponding to a given X value, say X0, the point estimator is Ŷ0 = β̂0 + β̂1X0,
where Ŷ0 = estimator of E(Y/X0), and
E(Ŷ0) = E(β̂0 + β̂1X0) = β0 + β1X0
Var(Ŷ0) = Var(β̂0) + X0²Var(β̂1) + 2X0Cov(β̂0, β̂1)
= σu²[1/n + (X0 − X̄)²/∑xi²] ............................................. (2.56)
By replacing the unknown σu² by its unbiased estimator σ̂u², (2.56) can be rewritten as
Var(Ŷ0) = σ̂u²[1/n + (X0 − X̄)²/∑xi²] .............................................. (2.57)
Recall that σ̂² = RSS/(n − k) = ∑Ûi²/(n − k).
Note that the variance of Ŷ0 increases the further away the value of X0 is from X̄. Therefore, the variable
t = [Ŷ0 − (β0 + β1X0)]/s.e.(Ŷ0)
follows the t-distribution, and the confidence interval for the mean prediction is
Ŷ0 ± tα/2 s.e.(Ŷ0) ............................................................... (2.59)
where s.e.(Ŷ0) = √Var(Ŷ0)
b) Individual Prediction
If our interest lies in predicting an individual Y value, Y 0, corresponding to a given X value,
say X0, then the application in forecasting is called individual prediction
Consider the following example:
Ŷ0 = 24.45 + 0.509X0
For X0 = 100,
Ŷ0 = 24.45 + 0.509(100) ≈ 75.36
In order to see the reliability of the above result, we have to obtain the prediction error which is
given by the predicted value less the actual value.
i.e., Ŷ0 − Y0 = (β̂0 + β̂1X0) − (β0 + β1X0 + U0)
= (β̂0 − β0) + (β̂1 − β1)X0 − U0
Note that E(Ŷ0 − Y0) = E[(β̂0 − β0) + (β̂1 − β1)X0 − U0] = 0
Var(Ŷ0 − Y0) = E[Ŷ0 − Y0]²
= E[(β̂0 − β0) + (β̂1 − β1)X0 − U0]²
= σ²[1 + 1/n + (X0 − X̄)²/∑(Xi − X̄)²] ................................................ (2.60)
By replacing the unknown σ² by its unbiased estimator σ̂², we get:
var(Ŷ0 − Y0) = σ̂²[1 + 1/n + (X0 − X̄)²/∑(Xi − X̄)²] ............................................... (2.61)
Note that the variance increases the further away the value of X0 is from X̄
t = (Ŷ0 − Y0)/s.e.(Ŷ0 − Y0) ......................................................... (2.62)
follows a t-distribution with n − 2 degrees of freedom. Therefore, the t-distribution can be used to draw inferences about the true Y0. Continuing with the above example, the 95% interval for Y0 given X0 = 100 works out to
(58.63, 92.09)
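A sketch of the individual-prediction interval, eqs. (2.61)-(2.62), in Python. The inputs below are partly assumptions: σ̂² = 42.16, n = 10 and ∑xi² = 33,000 are figures used earlier in this unit, X̄ = 170 is an assumed sample mean, and 2.31 approximates t0.025 with 8 df; with these values the function reproduces an interval close to the (58.63, 92.09) quoted above.

```python
import math

# Individual prediction interval around y0_hat = b0 + b1*x0, eqs. (2.61)-(2.62)
def prediction_interval(x0, b0, b1, sigma2_hat, n, x_bar, sum_x_dev_sq, t_crit):
    y0_hat = b0 + b1 * x0
    var_pred = sigma2_hat * (1 + 1 / n + (x0 - x_bar) ** 2 / sum_x_dev_sq)
    half_width = t_crit * math.sqrt(var_pred)
    return y0_hat - half_width, y0_hat + half_width

print(prediction_interval(100, 24.45, 0.509, 42.16, 10, 170, 33_000, 2.31))
```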
2.6 SUMMARY
The overall goodness of fit of the regression model is measured by the coefficient of determination, r². It tells what proportion of the variation in the dependent variable (the regressand) is explained by the explanatory variable (the regressor). The r² lies between 0 and 1; the closer it is to 1, the better the fit.
Hypothesis testing answers the question of whether a given finding is compatible with a
stated hypothesis or not. In hypothesis testing the Z-test and t test are used, among others.
If the model is deemed practically adequate, it may be used for forecasting (predicting)
purpose. In this regard we have mean predication and interval prediction.
4. a) Ŷi = 0.45 + 0.41Xi
b) 0.98
L = ∑(Y − β̂0 − β̂1X)² − λβ̂0
where λ is the Lagrangean multiplier. We minimize the function with respect to β̂0, β̂1 and λ:
∂L/∂β̂0 = −2∑(Y − β̂0 − β̂1X) − λ = 0 ------------------------- (a)
∂L/∂β̂1 = −2∑X(Y − β̂0 − β̂1X) = 0 ------------------------ (b)
∂L/∂λ = −β̂0 = 0 ------------------------- (c)
Substituting (c) into (b) and re-arranging we obtain
−2∑X(Y − β̂1X) = 0
β̂1 = ∑XY/∑X²
Answer to check your progress 3
a) Ŷi = 2.69 - 0.48 Xi
b) Var(β̂0) = 0.01, s.e.(β̂0) = 0.12; Var(β̂1) = 0.013, s.e.(β̂1) = 0.01
c) r2 = 0.66
For no. 1
For no.2
This is an individual prediction problem.
i) For X0 = 850
Y^ 0 = 31.76 + 0.71(850)
= 635.2
The value of t0.025 for 10 degrees of freedom is 2.23. Hence, the 95% confidence interval is
given by
Y0 = 635.2 ± 2.23(5.68)
i.e., 622.3 < Y0 < 647.67
So we are 95% confident that the forecasted value of Y(= Y 0) will lie between 622.3 and
647.67.
The following table gives GDP (X) and the demand for food (Y) for a certain country over a ten-year period.
Year 1980 81 82 83 84 85 86 87 88 89
Y 6 7 8 10 8 9 10 9 11 10
X 50 52 55 59 57 58 62 65 68 70
1. Estimate the food function Y = β0+ β1X+ u and interpret your result.
2. Calculate elasticity of Y with respect to X at their mean value and interpret your result.
3. Compute r2 and find the explained and unexplained variation in the food expenditure.
4. Compute the standard error of the regression estimates and conduct tests of significance at
the 5% significant level.
5. Find the 95% confidence interval for the population parameter (β0 and β1)
Ŷi = 31.76 + 0.71Xi
r² = 0.99, σ̂u² = 285.61
7. Construct a 95% confidence interval for the result you obtained in (6). [Hint: use individual
prediction approach]
where Ei is normal with zero mean and unknown variance σu². The sample gave the following data:
∑Yi = 21.9, ∑(Yi − Ȳ)² = 86.9, ∑(Xi − X̄)(Yi − Ȳ) = 106.4, ∑Xi = 186.2, ∑(Xi − X̄)² = 215.4
Contents
3.0 Aims and Objectives
3.1 Introduction
3.2 Specification of the Model
3.3 Assumptions
3.4 Estimation
3.5 The Coefficient of Multiple Determination
3.6 Test of Significance in Multiple Regression
3.7 Forecasting Based on Multiple Regression
3.8 The Method of Maximum Likelihood (ML)
3.9 Summary
3.10 Answers to Check Your Progress
3.11 References
3.12 Model Examination Question
The purpose of this unit is to introduce you to the concept of the multiple linear regression model and to show how the method of OLS can be extended to estimate the parameters of such models.
3.1 INTRODUCTION
We studied the two-variable model extensively in the previous unit. But in economics one hardly finds that a variable is affected by only one explanatory variable. For example, the demand for a commodity depends on the price of the commodity itself, the prices of competing or complementary goods, the income of the consumer, the number of consumers in the market, etc. Hence the two-variable model is often inadequate in practical work. Therefore, we need to discuss multiple regression models. Multiple linear regression is concerned with the relationship between a dependent variable (Y) and two or more explanatory variables (X1, X2, …, Xn).
Let us start our discussion with the simplest multiple regression model i.e., model with two
explanatory variables.
Y = f(X1, X2)
Example: Demand for a commodity may be influenced not only by the price of the commodity but also by the consumer's income.
Since the theory does not specify the mathematical form of the demand function, we assume that the relationship between Y, X1 and X2 is linear. Hence we may write the three-variable population regression function (PRF) as follows:
Yi = β0 + β1X1i + β2X2i + Ui
3.3 ASSUMPTIONS
To complete the specification of our simple model we need some assumptions about the
random variable U. These assumptions are the same as those assumptions already explained in
the two-variables model in unit 2.
3.4 ESTIMATION
We specified our model in the previous subsection and stated the required assumptions in subsection 3.3. Now let us take sample observations on Y, X1i and X2i and obtain estimates of the true parameters β0, β1 and β2:
Yi X1i X2i
Y1 X11 X21
Y2 X12 X22
Y3 X13 X23
Yn X1n X2n
The sample regression function (SRF) can be written as
Yi = β̂0 + β̂1X1i + β̂2X2i + Ûi
where β̂0, β̂1 and β̂2 are estimates of the true parameters β0, β1 and β2, and Ûi is the residual term.
To obtain the least squares estimates we minimize ∑Ûi² = ∑(Yi − β̂0 − β̂1X1i − β̂2X2i)²; the partial derivatives of this expression with respect to the unknowns (i.e., β̂0, β̂1 and β̂2) should be set to zero:
∂[∑(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂0 = 0
∂[∑(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂1 = 0
∂[∑(Yi − β̂0 − β̂1X1i − β̂2X2i)²]/∂β̂2 = 0
After differentiating, we get the following normal equations:
∑Yi = nβ̂0 + β̂1∑X1i + β̂2∑X2i
∑X1iYi = β̂0∑X1i + β̂1∑X1i² + β̂2∑X1iX2i
∑X2iYi = β̂0∑X2i + β̂1∑X1iX2i + β̂2∑X2i²
After solving the above normal equations we can obtain values for β̂0, β̂1 and β̂2:
β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2
β̂1 = [(∑x1iyi)(∑x2i²) − (∑x2iyi)(∑x1ix2i)] / [(∑x1i²)(∑x2i²) − (∑x1ix2i)²]
β̂2 = [(∑x2iyi)(∑x1i²) − (∑x1iyi)(∑x1ix2i)] / [(∑x1i²)(∑x2i²) − (∑x1ix2i)²]
where the variables x and y are in deviation form.
Note: the values of the parameter estimates (β̂0, β̂1 and β̂2) can also be obtained by other methods (e.g., Cramer's rule).
The variances of the estimates are:
Var(β̂0) = σ̂u²[1/n + (X̄1²∑x2² + X̄2²∑x1² − 2X̄1X̄2∑x1x2) / (∑x1²∑x2² − (∑x1x2)²)]
Var(β̂1) = σ̂u²∑x2² / [∑x1²∑x2² − (∑x1x2)²]
Var(β̂2) = σ̂u²∑x1² / [∑x1²∑x2² − (∑x1x2)²]
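The estimation formulas above translate directly into Python; a minimal sketch, with variable names of my own choosing:

```python
# Three-variable OLS, Y = b0 + b1*X1 + b2*X2 + U, via the deviation formulas
def ols_two_regressors(ys, x1s, x2s):
    n = len(ys)
    my, m1, m2 = sum(ys) / n, sum(x1s) / n, sum(x2s) / n
    y = [v - my for v in ys]                    # deviations from the means
    x1 = [v - m1 for v in x1s]
    x2 = [v - m2 for v in x2s]

    s11 = sum(a * a for a in x1)                # sum x1^2
    s22 = sum(a * a for a in x2)                # sum x2^2
    s12 = sum(a * b for a, b in zip(x1, x2))    # sum x1*x2
    s1y = sum(a * b for a, b in zip(x1, y))     # sum x1*y
    s2y = sum(a * b for a, b in zip(x2, y))     # sum x2*y

    d = s11 * s22 - s12 ** 2                    # common denominator
    b1 = (s1y * s22 - s2y * s12) / d
    b2 = (s2y * s11 - s1y * s12) / d
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2
```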
In unit 2 we saw the coefficient of determination (r 2) that measures the goodness of fit of the
regression equation. This notion of r 2 can be easily extended to regression models containing
more than two variables.
In the three-variable model we would like to know the proportion of the variation in Y
explained by the variables X1 and X2 jointly.
The quantity that gives this information is known as the coefficient of multiple determination. It is denoted by R², with subscripts indicating the variables whose relationship is being studied.
Example: R²y.x1x2 shows the percentage of the total variation of Y explained by the regression plane, that is, by changes in X1 and X2.
R²y.x1x2 = ∑ŷi²/∑yi² = ∑(Ŷi − Ȳ)²/∑(Yi − Ȳ)²
= 1 − ∑Ûi²/∑yi² = 1 − RSS/TSS
Since yi = ŷi + Ûi,
∑Ûi² = ∑(yi − ŷi)² = ∑(yi − β̂1x1i − β̂2x2i)²
= ∑(yi − ŷi)yi (since Ûi = yi − ŷi)
= ∑yi² − β̂1∑x1iyi − β̂2∑x2iyi
Hence
R²y.x1x2 = 1 − [∑yi² − β̂1∑x1iyi − β̂2∑x2iyi]/∑yi²
= (β̂1∑x1iyi + β̂2∑x2iyi)/∑yi²
where x1i, x2i and yi are in their deviation forms.
The value of R2 lies between 0 and 1. The higher R2 the greater the percentage of the variation
of Y explained by the regression plane, that is, the better the goodness of fit of the regression
plane to the sample observations. The closer R2 to zero, the worse the fit.
The Adjusted R2
Note that as the number of regressors (explanatory variables) increases, the coefficient of multiple determination will usually increase. To see this, recall the definition of R²:
R² = 1 − ∑Ûi²/∑yi²
Now ∑yi² is independent of the number of X variables in the model because it is simply ∑(Yi − Ȳ)². The residual sum of squares (RSS), ∑Ûi², however, depends on the number of explanatory variables present in the model. It is clear that as the number of X variables increases, ∑Ûi² is bound to decrease (at least it will not increase); hence R² will increase. Therefore, in comparing two regression models with the same dependent variable but differing numbers of X variables, one should be wary of simply choosing the model with the highest R².
Therefore, to correct for this defect we adjust R² by taking into account the degrees of freedom, which clearly decrease as new regressors are introduced into the function:
R̄² = 1 − [∑Ûi²/(n − k)] / [∑yi²/(n − 1)]
or R̄² = 1 − (1 − R²)(n − 1)/(n − k)
where k = the number of parameters in the model (including the intercept term)
n = the number of sample observations
R² = the unadjusted multiple coefficient of determination
As the number of explanatory variables increases, the adjusted R² becomes increasingly smaller than the unadjusted R². The adjusted R² (R̄²) can be negative, although R² is necessarily non-negative; in that case its value is taken as zero.
If n is large, R̄² and R² will not differ much. But with small samples, if the number of regressors (X's) is large in relation to the sample observations, R̄² will be much smaller than R².
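A one-line implementation of the adjustment formula (the n = 7, k = 3 values in the usage line are illustrative):

```python
# Adjusted R^2: R2_adj = 1 - (1 - R2)(n - 1)/(n - k)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(adjusted_r2(0.98, 7, 3))   # 0.97: the penalty grows with extra regressors
```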
The principle involved in testing a multiple regression is identical with that of simple regression. We can test whether a particular variable, X1 or X2, is significant, holding the other variable constant. The t test is used to test a hypothesis about any individual partial regression coefficient. A partial regression coefficient measures the change in the mean value of Y, E(Y/X2, X3), per unit change in X2, holding X3 constant.
t = (β̂i − βi)/S(β̂i) ~ t(n − k), (i = 0, 1, 2, …, k)
The theoretical values of t (at the chosen level of significance) are the critical values that define
the critical region in a two-tail test, with n – k degrees of freedom.
H0: βi = 0
H1: βi ≠ 0, or one-sided (βi > 0 or βi < 0)
The null hypothesis states that, holding X2 constant, X1 has no (linear) influence on Y. If the computed t value exceeds the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may accept it (β̂1 is not significant at the chosen level of significance, and hence the corresponding regressor does not appear to contribute to the explanation of the variations in Y).
Look at the following figure. Assume α = 0.05; then tα/2 = 2.179 for 12 df.
[Figure: t distribution with the 95% acceptance region in the middle and a 2.5% critical region in each tail]
Note that the greater the calculated value of t, the stronger the evidence that βi is significant. For degrees of freedom higher than 8, the critical value of t (at the 5% level of significance) for the rejection of the null hypothesis is approximately 2.
The above joint hypothesis can be tested by the analysis of variance (AOV) technique. The
following table summarizes the idea.
Therefore to undertake the test first find the calculated value of F and compare it with the F
tabulated. The calculated value of F can be obtained by using the following formula.
F = [∑ŷi²/(k − 1)] / [∑Ûi²/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)]
follows the F distribution with k – 1 and n – k df.
where k – 1 refers to degrees of freedom of the numerator
n – k refers to degrees of freedom of the denominator
k – number of parameters estimated
[Figure: the F distribution]
When R2 = 0, F is zero. The larger the R 2, the greater the F value. In the limit, when R 2 = 1, F is
infinite. Thus the F test, which is a measure of the overall significance of the estimated
regression, is also a test of significance of R2. Testing the null hypothesis is equivalent to
testing the null hypothesis that (the population) R2 is zero.
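Since F and R² are tied together in this way, F can be computed either from the sums of squares or directly from R²; a sketch of both routes (the numbers in the usage lines are illustrative):

```python
# Overall F test: F = [ESS/(k-1)] / [RSS/(n-k)], or equivalently from R^2
def f_from_sums(ess, rss, n, k):
    return (ess / (k - 1)) / (rss / (n - k))

def f_from_r2(r2, n, k):
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))

print(f_from_sums(ess=1128.6, rss=21.4, n=7, k=3))  # ~105
print(f_from_r2(0.98, 7, 3))                        # 98 with the rounded R^2
```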
This has been discussed in unit 2. The (1 − α)100% confidence interval for βi is given by
β̂i ± tα/2 S(β̂i), (i = 0, 1, 2, 3, …, k)
Example: Suppose we have data on wheat yield (y), amount of rainfall (x 2), and amount of
fertilizer applied (X1). It is assumed that the fluctuations in yield can be explained by varying
levels of rainfall and fertilizer.
Table 3.6.1
(1) (2) (3) (4) (5) (6) (7) (8) (9)
Yield Fertilizer Rain fall yi x1i x2i x1i yi x2i yi x1x2
(Y) (X1) (X2)
40 100 10
50 200 20
50 300 10
Ȳ = 60 X̄ 1 = 400 X̄ 2 = 20 (means)
Now find the deviations of the observations from their mean values (columns 4 to 11 in the above table). The next step is to insert the following values (in deviations) into the above formulas:
∑x1iyi = 16,500, ∑x2i² = 400, ∑x2iyi = 600, ∑x1ix2i = 7,000, ∑x1i² = 280,000
β̂1 = [(16,500)(400) − (600)(7,000)] / [(280,000)(400) − (7,000)²] = 0.0381
β̂2 = [(600)(280,000) − (16,500)(7,000)] / [(280,000)(400) − (7,000)²] = 0.833
Now β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2
= 60 − (0.0381)(400) − (0.833)(20)
= 28.1
Hence the estimated function is written as follows
Ŷi = 28.1 + 0.0381X1 + 0.833X2
Solution:
Var(β̂1) = σ̂u²∑x2²/[∑x1²∑x2² − (∑x1x2)²], Var(β̂2) = σ̂u²∑x1²/[∑x1²∑x2² − (∑x1x2)²]
In order to use the above formulas we need to find σ̂u²:
σ̂u² = ∑Ûi²/(n − k), where Ûi = Yi − Ŷi
Therefore ∑Ûi² = ∑(Yi − Ŷi)² = 21.4286
Hence σ̂u² = 21.4286/(7 − 3) = 5.3572
Var(β̂1) = (5.3572)(400)/[(280,000)(400) − (7,000)²] = (5.3572)(400)/63,000,000 = 0.000034
S(β̂1) = √0.000034 = 0.0058
Var(β̂2) = (5.3572)(280,000)/63,000,000 = 0.02381
S(β̂2) = √0.02381 = 0.1543
R² = (β̂1∑x1iyi + β̂2∑x2iyi)/∑yi²
= [(0.0381)(16,500) + (0.833)(600)]/1,150 = 0.98
Interpretation: 98% of the variation in yield is due to the regression plane (i.e., because of
variation in the amount of fertilizer and rainfall). The model is a good fit.
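The whole chain of calculations for this example can be verified from the deviation sums quoted above; a sketch:

```python
# Reproducing the wheat-yield estimates from the deviation sums in the text
s1y, s2y = 16_500, 600                 # sum x1*y, sum x2*y
s11, s22, s12 = 280_000, 400, 7_000    # sum x1^2, sum x2^2, sum x1*x2

d = s11 * s22 - s12 ** 2               # 63,000,000
b1 = (s1y * s22 - s2y * s12) / d       # 0.0381
b2 = (s2y * s11 - s1y * s12) / d       # 0.833
b0 = 60 - b1 * 400 - b2 * 20           # 28.1, using the sample means
r2 = (b1 * s1y + b2 * s2y) / 1_150     # 0.98, with sum y^2 = 1,150
print(b0, b1, b2, r2)
```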
ttabulated = t0.025(7 − 3) = 2.78, found from the statistical table (t-distribution)
Decision: since tcalculated > ttabulated, we reject H0.
That is, β̂1 is statistically significant: the variable X1 (fertilizer) significantly affects yield.
(b) H0: β2 = 0
H1: β2 ≠ 0
α = 0.05
t* = (β̂2 − β2)/S(β̂2) = 0.833/0.1543 = 5.4
ttabulated = t0.025(7 − 3) = 2.78
0.0219 < β1 < 0.0542
Interpretation: the value of the true population parameter β1 will lie between 0.0219 and 0.0542 in 95 out of 100 cases.
Note: the coefficients of X1 and X2 (β̂1 and β̂2) measure partial effects. For example, β̂1 measures the rate of change of Y with respect to X1 while X2 is held constant.
Decision: we reject H0 since Fcal > Ftab. We accept that the regression is significant: not all βi's are zero.
Let us now turn our attention to the problem of forecasting the value of the dependent variable
for a given set of values of the explanatory variables. Suppose the given values of the
explanatory variables be X01, X02, X03,…, X0k, and let the corresponding value of the dependent
variable be Y0. Now we are interested in forecasting Y0.
For the three-variable case, the point forecast can be found as follows:

Ŷ₀ = β̂₀ + β̂₁X₀₁ + β̂₂X₀₂
Example 1. Consider the example in section 3.6. (Table 3.6.1)
The estimated regression equation is
Ŷᵢ = 28.1 + 0.0381X₁ + 0.833X₂
P(Ŷ₀ − t_{α/2}S_{Ŷ₀} < Y₀ < Ŷ₀ + t_{α/2}S_{Ŷ₀}) = 1 − α

i.e., the interval forecast is Ŷ₀ ± t_{α/2}S_{Ŷ₀},

where S_{Ŷ₀} is the standard error of the forecast value, and it can be found by using the following formula:

S_{Ŷ₀} = S√(1 + X₀ᵀ(XᵀX)⁻¹X₀)

where S = √(ΣÛᵢ²/(n − k)) = √(ÛᵀÛ/(n − k)) = √((YᵀY − β̂ᵀXᵀY)/(n − k))
X₀ᵀ = [1, X₀₁, X₀₂, …, X₀ₖ]

XᵀX =
| n    ΣX₁     ΣX₂    |
| ΣX₁  ΣX₁²    ΣX₁X₂  |
| ΣX₂  ΣX₁X₂   ΣX₂²   |   (in raw data form, for the two-regressor case)
Note: Students need to know some basic concepts on matrix algebra. It is necessary for the
analysis of general multiple linear regression models.
A method of point estimation with some stronger theoretical properties than the method of OLS is the method of maximum likelihood (ML).
where f(Yᵢ) = [1/(σ√(2π))] exp{−(1/2)(Yᵢ − β₀ − β₁Xᵢ)²/σ²} ………………(2)

which is the density function of a normally distributed variable with the given mean and variance.
LF(β₀, β₁, σ²) = [1/(σⁿ(√(2π))ⁿ)] exp{−(1/2)Σ(Yᵢ − β₀ − β₁Xᵢ)²/σ²} ……………………...(4)
The method of maximum likelihood, as the name indicates, consists in estimating the unknown
parameters in such a manner that the probability of observing the given Y’s is as high (or
maximum) as possible. Therefore, we have to find the maximum of the function (4).
Using your knowledge of differential calculus, take the log of (4):

ln LF = −n ln σ − (n/2) ln(2π) − (1/2)Σ(Yᵢ − β₀ − β₁Xᵢ)²/σ² ……………(5)

= −(n/2) ln σ² − (n/2) ln(2π) − (1/2)Σ(Yᵢ − β₀ − β₁Xᵢ)²/σ² ………………(6)
Differentiating partially with respect to β₀, β₁, and σ², and setting the results to zero, we obtain

(1/σ̃²)Σ(Yᵢ − β̃₀ − β̃₁Xᵢ) = 0

(1/σ̃²)Σ(Yᵢ − β̃₀ − β̃₁Xᵢ)Xᵢ = 0 …………………………………(11)

−n/(2σ̃²) + (1/(2σ̃⁴))Σ(Yᵢ − β̃₀ − β̃₁Xᵢ)² = 0 ………………………….(12)

Solving (12) gives the ML estimator of σ²:

σ̃² = (1/n)ΣÛᵢ²
It is obvious that the ML estimator σ̃² differs from the OLS estimator

σ̂² = [1/(n − 2)]ΣÛᵢ²,

which was shown to be an unbiased estimator of σ². Thus, the ML estimator of σ² is biased. The magnitude of this bias can be easily determined as follows:
E(σ̃²) = (1/n)E(ΣÛᵢ²) = ((n − 2)/n)σ² = σ² − (2/n)σ²

But notice that as n, the sample size, increases indefinitely, the second term, (2/n)σ², the bias factor, tends to zero. Therefore, asymptotically (i.e., in a very large sample), σ̃² is unbiased too.
3.9 SUMMARY
The model: Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + Uᵢ

Assumptions of the model
1. Zero mean value of Uᵢ: E(Uᵢ|X₁ᵢ, X₂ᵢ) = 0 for each i
2. Homoscedasticity: Var(Uᵢ) = E(Uᵢ²) = σᵤ²
3. Normality: Uᵢ ~ N(0, σᵤ²)
4. No serial correlation (serial independence of the U's): Cov(Uᵢ, Uⱼ) = 0 for i ≠ j
5. Independence of Uᵢ and Xᵢ: Cov(Uᵢ, X₁ᵢ) = Cov(Uᵢ, X₂ᵢ) = 0
6. No collinearity between the X variables (no multicollinearity)
7. Correct specification of the model
Formulas for the parameters

β̂₀ = Ȳ − β̂₁X̄₁ − β̂₂X̄₂

β̂₁ = [(Σx₁ᵢyᵢ)(Σx₂ᵢ²) − (Σx₂ᵢyᵢ)(Σx₁ᵢx₂ᵢ)] / [(Σx₁ᵢ²)(Σx₂ᵢ²) − (Σx₁ᵢx₂ᵢ)²]

β̂₂ = [(Σx₂ᵢyᵢ)(Σx₁ᵢ²) − (Σx₁ᵢyᵢ)(Σx₁ᵢx₂ᵢ)] / [(Σx₁ᵢ²)(Σx₂ᵢ²) − (Σx₁ᵢx₂ᵢ)²]
where the variables x and y are in deviation forms
Var(β̂₀) = σ̂ᵤ²[1/n + (X̄₁²Σx₂² + X̄₂²Σx₁² − 2X̄₁X̄₂Σx₁x₂) / (Σx₁²Σx₂² − (Σx₁x₂)²)]

Var(β̂₁) = σ̂ᵤ²Σx₂² / [Σx₁²Σx₂² − (Σx₁x₂)²]

Var(β̂₂) = σ̂ᵤ²Σx₁² / [Σx₁²Σx₂² − (Σx₁x₂)²]

where σ̂ᵤ² = ΣÛ²/(n − k), k being the total number of parameters that are estimated, and x₁ and x₂ are in deviation form.
The multiple coefficient of determination (R 2): measures the proportion of the variation in Y
explained by the variables X1 and X2 jointly.
R²_{y·X₁X₂} = Σŷᵢ²/Σyᵢ² = Σ(Ŷᵢ − Ȳ)²/Σ(Yᵢ − Ȳ)² = 1 − ΣÛᵢ²/Σyᵢ² = 1 − RSS/TSS
The adjusted R²

R̄² = 1 − [ΣÛᵢ²/(n − k)] / [Σyᵢ²/(n − 1)]  or  R̄² = 1 − (1 − R²)(n − 1)/(n − k)
The partial regression coefficient measures the change in the mean value of Y, E(Y|X₂, X₃), per unit change in X₂, holding X₃ constant.
Hypothesis testing about individual partial regression coefficients

t = (β̂ᵢ − βᵢ)/S(β̂ᵢ) ~ t(n − k)  (i = 0, 1, 2, …, k)

H₀: βᵢ = 0;  H₁: βᵢ ≠ 0, or one-sided (βᵢ > 0, βᵢ < 0)
Forecasting

Point forecast vs. interval forecast (the forecast value will lie in an interval (a, b)).

(Ŷ₀ − Y₀)/S_{Ŷ₀} ~ t(n − k)

The 95% confidence interval for Y₀ (the forecast value) can be given by making use of P(−t_{α/2} < t < t_{α/2}) = 1 − α.
S(β̂₁) = 2.55, S(β̂₂) = 0.01
1. The following table shows observations on quantity of oranges sold (Y), price in cents (X₁), and advertising expenditure (X₂).

Quantity (Y)   Price (X₁)   Advertising expenditure (X₂)
55             100          5.5
70             90           6.3
90             80           7.2
100            70           7.0
2. The following results were obtained from a sample of 12 firms on their output (Y), labor
input (X1) and capital input (X2), measured in arbitrary units.
ΣY = 753    ΣY² = 48,139    ΣYX₁ = 40,830
ΣX₁ = 643   ΣX₁² = 34,843   ΣYX₂ = 6,796
ΣX₂ = 106   ΣX₂² = 976      ΣX₁X₂ = 5,779
a) Find the least squares equation of Y on X 1 and X2. What is the economic meaning of
your coefficients?
b) Given the following sample values of output (Y), compute the standard errors of the
estimates and test their statistical significance.
Firms:  A  B  C  D  E  F  G  H  I  J  K  L
Output: 64 71 53 67 55 58 77 57 56 51 76 68
c) Find the multiple correlation coefficient and the unexplained variation in output
d) Construct 99 percent confidence intervals for the population parameters.
3. From the following data estimate the partial regression coefficients, their standard errors, and
the adjusted and unadjusted R2 values.
n =15
4. The following represents the true relationship between the independent variables
X1, X2, X3, and the dependent variable Y
Yi= bo+b1X1i+b2X2i+b3X3i+Ui
5. There are occasions when the two-variable linear regression model assumes the following form:

Yᵢ = βXᵢ + Eᵢ

where β is the parameter and E is the disturbance term. In this model the intercept term is zero; the model is therefore known as regression through the origin.
Show that for this model

β̂ = ΣXᵢYᵢ/ΣXᵢ²  and  Var(β̂) = σᵤ²/ΣXᵢ², with σ̂ᵤ² = Σeᵢ²/(n − 1)
where Eᵢ is normal with zero mean and unknown variance σᵤ². A sample gave the following data: ΣYᵢ = 21.9, Σ(Yᵢ − Ȳ)² = 86.9.
8. The quantity demanded of a commodity is assumed to be a linear function of its price X. The following results have been obtained from a sample of 10 observations.

Price in Birr (X):   15  13  12  12   9   7   7   4   6   3
Quantity in kg (Y): 760 775 780 785 790 795 800 810 830 840
Making use of the above information,
i) Show that Cov(b₀, b₁) = −X̄σᵤ²/Σxᵢ² under the basic assumptions of the linear regression model.
ii) Show that the estimated regression line passes through the mean values of X and Y.
iii) Assume that the above relationship reduces to Yᵢ = b₀ + Uᵢ. Under the basic assumptions of the linear regression model, obtain the unbiased estimators of the mean and variance of b₀.
10. The following results have been obtained from a sample of 13 observations on the quantity demanded (Y) of a particular commodity and its corresponding price (X).
Y(Kg) 780 785 790 795 810 810 820 821 830 835 840 849 870
X(birr) 20 18 16 14 13 12 11 11 10 9 8 7 5
i) Estimate the linear demand function for the commodity and interpret your result.
ii) Compute the standard error of the regression line and of the coefficients.
iii) Compute the price elasticity of demand at mean values.
iv) What is the part of variation in demand for the commodity that remains unexplained
by the regression line?
v) What is the explained proportion? What does it show?
vi) Forecast the quantity demand at price level of 13 Birr. Compare your result with the
value in the data.
11. There is a two-way causation in correlation analysis, whereas there is a one-way causation in regression analysis. Explain.
12. Assume that X and Y are perfectly and linearly related in such a manner that all the points in the scatter diagram would lie exactly on a regression line given by Y = b₀ + b₁X. Show that the correlation coefficient (r) is either 1 or −1.
13. Assume that the quantity supplied of a commodity Y is a linear function of its price (X₁) and of the wage rate of labor (X₂) used in the production of the commodity. The sample values are summarized as follows:
ΣY = 1,282   ΣY² = 132,670   ΣX₁Y = 53,666
ΣX₁ = 545    ΣX₁² = 22,922   ΣX₂Y = 5,707
14. The following table shows the levels of output (Y), labour input (X₁) and capital input (X₂) of 12 firms measured in arbitrary units.

ΣX₂ = 110   ΣX₂² = 980      ΣYX₁ = 40,834
ΣX₁ = 647   ΣX₁² = 34,843   ΣYX₂ = 6,710
ΣY = 757    ΣY² = 48,143    ΣX₁X₂ = 5,783
i) Estimate the output function Y=bo+b1X1+b2X2+U
ii) What is the economic meaning of the coefficients
iii) Compute the standard errors of the coefficients
iv) Run tests of significance of the coefficients
v) Compute the coefficient of multiple determination and interpret it
vi) Conduct the overall significance test and interpret your result.
15. What are some of the problems in using R2 as a measure of goodness of fit? Compare and
contrast R2 and the corrected R2 on the basis of sample size and parameters to be estimated
from a particular model.
The aim of this unit is to show the reader what is meant by violation of the basic econometric assumptions that form the basis of the classical linear regression model. After the student has completed this unit he/she will understand:
the sources of the violations
the consequences of each problem
the various ways of detecting each problem
the alternative approaches to solving each problem
4.1 INTRODUCTION
It can be shown that by using these observations we would get a bad estimate of the true line. If the true line lies below or above the observations, the estimated line would be biased: the estimated line Ŷ is not a good approximation to the true line, E(Y).
Note that there is no test for the verification of this assumption because the assumption E(U) =
0 is forced upon us if we are to establish the true relationship. That is, we set E(U) = 0 at the
outset of our estimation procedure. Its plausibility should be examined in each particular case
on a priori grounds. In any econometric application we must be sure that the following things
are fulfilled so as to be safe from violating the assumption of E(U) = 0
In figure (b) we picture the case of (monotonically) increasing variance of the Uᵢ's: as X increases, so does the variance of U. This is a common form of heteroscedasticity assumed in econometric applications.

(Figure: consumption plotted against income, with the dispersion of the observations widening from low-income to high-income households.)
Note, however, that heteroscedasticity is typically a problem of cross-sectional rather than time-series data; that is, the problem is more serious in cross-sectional data.
B) Causes of Heteroscedasticity
Heteroscedasticity can arise for several reasons. The first is the presence of outliers (i.e., values that are extreme compared to the majority of the observations on a variable). The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results.
Another source of heteroscedasticity is violation of the assumption that the regression model is correctly specified. Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model. In such a situation the residuals obtained from the regression may give the distinct impression that the error variance is not constant. But if the omitted variables are included in the model, the impression may disappear.
In summary, we may say that on a priori grounds there are reasons to believe that the assumption of homoscedasticity may often be violated in practice. It is therefore important to examine the consequences of heteroscedasticity.
i) If U is heteroscedastic, the OLS estimates do not have the minimum-variance property in the class of unbiased estimators; that is, they are inefficient in small samples. Furthermore, they are inefficient in large samples.
ii) The coefficient estimates would still be statistically unbiased; that is, the expected value of each β̂ᵢ equals the true βᵢ.
Figure 4.4: Relationship between Ûᵢ² and X
In figure (a) we see that there is no systematic relationship between the two variables, suggesting that perhaps no heteroscedasticity is present in the data. Figures (b) and (c), however, suggest a linear relationship between the two variables; in particular, figure (c) suggests that the heteroscedastic variance may be proportional to the value of Ŷ or X. Figures (d) and (e) indicate a quadratic relationship between Ûᵢ² and Ŷᵢ or X. This knowledge may help us in transforming our data in such a manner that, in the regression on the transformed data, the variance of the disturbance is homoscedastic. Note that this visual inspection method is also known as the informal method. The following tests follow the formal method.
Yᵢ = β₀ + β₁Xᵢ + Uᵢ .......................................(4.4)

Ŷᵢ = 1992.34 + 0.23Xᵢ
s.e. = (936.48) (0.09)
t = (2.13) (2.33),  r² = 0.44
Suppose that the residuals obtained from the above regression, were regressed on Xi as
suggested in (4.3), giving the following results.
As shown in the above result (the t value), the coefficient of lnXᵢ is not significant; that is, there is no statistically significant relationship between the two variables. Following the Park test, one may conclude that there is no heteroscedasticity in the error variance.
Although empirically appealing, the Park test has some problems. For instance, the error term Vᵢ entering into (4.3) may not satisfy the OLS assumptions and may itself be heteroscedastic. Nonetheless, as a strictly exploratory method, one may use the Park test.
r_s = 1 − 6[Σdᵢ²/(n(n² − 1))] .............................................(4.5)
where dᵢ = the difference in the ranks assigned to two different characteristics of the i-th individual or phenomenon, and n = the number of individuals or phenomena ranked. The steps required in this test are as follows.
Assume Yᵢ = β₀ + β₁Xᵢ + Uᵢ
Step 1: Fit the regression to the data on Y and X and obtain the residuals Ûᵢ.
Step 2: Ignoring the sign of Ûᵢ, that is, taking their absolute value |Ûᵢ|, rank both |Ûᵢ| and Xᵢ (or Ŷᵢ) according to an ascending or descending order and compute the Spearman rank correlation coefficient given previously, (4.5).
Step 3: Assuming that the population rank correlation coefficient ρ_s is zero and n > 8, the significance of the sample r_s can be tested by the t test:

t = r_s√(n − 2)/√(1 − r_s²), with n − 2 degrees of freedom.

If the computed t value exceeds the critical t value, we may accept the hypothesis of heteroscedasticity; otherwise we may reject it. If the regression model involves more than one X variable, r_s can be computed between |Ûᵢ| and each of the X variables separately and tested in the same way.
Example: To illustrate the rank correlation test, consider the regression Yᵢ = β₀ + β₁Xᵢ, estimated from 10 observations.

(Table columns: observation, Y, X, Ŷ, Û = Y − Ŷ, rank of |Ûᵢ|, rank of Xᵢ, d = difference between the two rankings, and d²; columns 6 and 7 rank |Ûᵢ| and Xᵢ in ascending order. Totals: Σd = 0, Σd² = 110.)
r_s = 1 − 6[110/(10(100 − 1))] = 0.33
Step I: The observations are ordered according to the magnitude of the independent variable thought to be related to the variance of the disturbances.
Step II: A certain number of central observations (denoted c) are omitted, leaving two equal-sized groups of observations, one corresponding to low values of the chosen independent variable and the other to high values. The central observations are omitted in order to sharpen, or accentuate, the difference between the small-variance and the large-variance groups.
Step III: We fit a separate regression to each sub-sample, obtain the sum of squared residuals from each, and form the ratio of the two sums of squared residuals. That is,
F* = [ΣÛ₂²/({(n − c)/2} − k)] / [ΣÛ₁²/({(n − c)/2} − k)] .........................................(4.7)

where ΣÛ₁² is the residual sum of squares from the sub-sample of low values of X, with [(n − c)/2] − k degrees of freedom, k being the total number of parameters in the model (ΣÛ₂² is defined analogously for the high-value sub-sample).
F* has an F distribution with numerator and denominator degrees of freedom each equal to [(n − c − 2k)/2], where n = total number of observations, c = central observations omitted, and k = number of parameters estimated in each regression. If the two variances are the same (that is, if the Û's are homoscedastic), the value of F* will tend to one. If the variances differ, F* will be large (given that, by the design of the test, ΣÛ₂² > ΣÛ₁²). Generally, the observed F* is compared with the theoretical value of F with (n − c − 2k)/2 degrees of freedom at a chosen level of significance. The theoretical value of F (obtained from the F tables) is the value that defines the critical region of the test.
If F* > F we accept that there is heteroscedasticity (that is, we reject the null hypothesis of no difference between the variances of the U's in the two sub-samples). If F* < F, we accept that the U's are homoscedastic (in other words, we accept the null hypothesis). The higher the observed F* ratio, the stronger the heteroscedasticity of the U's.
Example: Suppose that we have data on consumption expenditure in relation to income for a cross-section of 30 families. Suppose we postulate that consumption expenditure is linearly related to income but that heteroscedasticity is present in the data. Suppose further that the middle 4 observations are dropped after the necessary reordering of the data, and that separate regressions based on the two sub-samples of 13 observations each yield

F* = (1536.8/11) / (377.17/11) = 4.07

Since this exceeds the 5% critical value of F with (11, 11) degrees of freedom (≈ 2.82), we conclude that heteroscedasticity is present.
Note, however, that the ability of the Goldfeld-Quandt test to perform successfully depends on how c is chosen. Moreover, its success depends on identifying the correct X (i.e., independent) variable with which to order the observations. These limitations can be avoided if we consider the Breusch-Pagan-Godfrey (BPG) test.
that is, σᵢ² is assumed to be some function of the non-stochastic variables Z; some or all of the X's can serve as Z's. Specifically, assume that

σᵢ² = α₀ + α₁Z₁ᵢ + … + αₘZₘᵢ ..........................................(4.10)

Step 1: Estimate Yᵢ = β₀ + β₁X₁ᵢ + … + βₖXₖᵢ + Uᵢ by OLS and obtain the residuals Û₁, Û₂, …, Ûₙ.
Step 2: Obtain σ̃² = ΣÛᵢ²/n. Note that this is the maximum likelihood estimator of σ².
Example: Suppose we have 30 observations on Y and X that gave us a regression result in Step 1.

Step 2: σ̃² = ΣÛᵢ²/30 = 2361.15/30 = 78.71

Step 3: pᵢ = Ûᵢ²/σ̃², that is, divide the squared residuals Ûᵢ² obtained from the regression in Step 1 by 78.71 to construct the variable pᵢ.
From the Chi-square table we find that for 1 df the 5% critical Chi-square value is 3.84. Thus the observed Chi-square value is significant at the 5% level of significance.
Note that the BPG test is asymptotic, that is, a large-sample test. In small samples the test is sensitive to the assumption that the disturbances Vᵢ are normally distributed.
Assumption one: If it is believed that the variance of Uᵢ is proportional to the square of the explanatory variable X, one may transform the original model as follows. Divide the original model through by Xᵢ to obtain

Yᵢ/Xᵢ = β₀/Xᵢ + β₁ + Uᵢ/Xᵢ = β₀(1/Xᵢ) + β₁ + Vᵢ ............................................... (4.11)
where Vᵢ is the transformed disturbance term, equal to Uᵢ/Xᵢ. Now it is easy to verify that

E(Vᵢ²) = E(Uᵢ/Xᵢ)² = (1/Xᵢ²)E(Uᵢ²)

Given the assumption E(Uᵢ²) = σ²Xᵢ², it can be concluded that

E(Vᵢ²) = (1/Xᵢ²)σ²Xᵢ² = σ²
Assumption two: Given the model Yᵢ = β₀ + β₁Xᵢ + Uᵢ, suppose that we assume the error variance to be proportional to Xᵢ; that is,

E(Uᵢ²) = σ²Xᵢ

In this case the original model can be transformed by dividing it by √Xᵢ. That is,

Yᵢ/√Xᵢ = β₀/√Xᵢ + β₁√Xᵢ + Uᵢ/√Xᵢ = β₀(1/√Xᵢ) + β₁√Xᵢ + Vᵢ ............................(4.12)

where Vᵢ = Uᵢ/√Xᵢ and Xᵢ > 0.
Given assumption two, one can readily verify that E(Vᵢ²) = σ², a homoscedastic situation. That is,

Var(Vᵢ) = E(Vᵢ²) = E(Uᵢ/√Xᵢ)² = (1/Xᵢ)E(Uᵢ²)

Since by assumption E(Uᵢ²) = σ²Xᵢ, it implies that

Var(Vᵢ) = (1/Xᵢ)σ²Xᵢ = σ²
Therefore, one may proceed to apply OLS to the transformed equation. Note an important feature of the transformed model: it has no intercept term, so one will have to use the regression-through-the-origin model to estimate β₀ and β₁. Having run the regression on the transformed model (4.12), one can get back to the original model simply by multiplying it by √Xᵢ.
Assumption three: A log transformation such as

lnYᵢ = β₀ + β₁lnXᵢ + Uᵢ

very often reduces heteroscedasticity when compared with the regression Yᵢ = β₀ + β₁Xᵢ + Uᵢ. This result arises because the log transformation compresses the scales in which the variables are measured. For example, it reduces a ten-fold difference between two values (such as between 8 and 80) into a roughly two-fold difference (because ln 80 = 4.38 and ln 8 = 2.08).
To conclude, the remedial measures explained above through transformation show that we are essentially speculating about the nature of σᵢ². Note also that the OLS estimators obtained from the transformed equation are BLUE. Which of the transformations discussed will work depends on the nature of the problem and the severity of the heteroscedasticity. Moreover, in a multiple regression model we may not know a priori which of the X variables should be chosen for transforming the data. In addition, the log transformation is not applicable if some of the Y and X values are zero or negative. Besides, the t tests, F tests, etc. are valid only in large samples when the regression is conducted on transformed variables.
1. State with brief reasons whether the following statements are true, false, or uncertain:
a) In the presence of heteroscedasticity, OLS estimators are biased as well as inefficient.
b) If heteroscedasticity is present, the conventional t and F tests are invalid.
2. State three consequences of heteroscedasticity.
3. List and explain the steps of the BPG test.
4. Suppose that you have data on personal saving and personal income of Ethiopia for a 31-year period. Assume that graphical inspection suggests that the Uᵢ's are heteroscedastic, so that you want to employ the Goldfeld-Quandt test. Suppose you ordered the observations and obtained:
a) For sub-set I: ΣÛ₁² = 144,771.5
b) For sub-set II: Ŝ = 1141.07 + 0.029I, with ΣÛ₂² = 769,899.2
Is there any evidence of heteroscedasticity?
4.4 AUTOCORRELATION
A. The Nature of Autocorrelation
An important assumption of the classical linear model is that there is no autocorrelation or
serial correlation among the disturbances U i entering into the population regression function.
This assumption implies that the covariance of Uᵢ and Uⱼ is equal to zero. That is,

Cov(Uᵢ, Uⱼ) = E{[Uᵢ − E(Uᵢ)][Uⱼ − E(Uⱼ)]} = E(UᵢUⱼ) = 0 (for i ≠ j)
If this assumption is violated, the disturbances are said to be autocorrelated. This could arise for several reasons.
i) Spatial autocorrelation: In regional cross-section data, a random shock affecting
economic activity in one region may cause economic activity in an adjacent region to
change because of close economic ties between the regions. Shocks due to weather
similarities might also tend to cause the error terms between adjacent regions to be related.
ii) Prolonged influence of shocks: In time-series data, random shocks (disturbances) have effects that often persist over more than one time period. An earthquake, flood, strike, or war, for example, will probably affect the economy's operation in subsequent periods.
iii) Inertia: Past actions often have a strong effect on current actions, so that a positive disturbance in one period is likely to influence activity in succeeding periods.
Since autocorrelated errors arise most frequently in time-series models, the discussion in the rest of this unit is couched in terms of time-series data.
There are a number of time-series patterns, or processes, that can be used to model correlated errors. The most common is what is known as the first-order autoregressive, or AR(1), process. Consider

Yₜ = β₀ + β₁Xₜ + Uₜ

where t denotes the observation at time t (i.e., time-series data). One can assume that the disturbances are generated as follows:

Uₜ = ρUₜ₋₁ + εₜ

where ρ is known as the coefficient of autocovariance and εₜ is a stochastic disturbance satisfying the standard OLS assumptions, namely

E(εₜ) = 0,  Var(εₜ) = σε²,  Cov(εₜ, εₜ₊ₛ) = 0 for s ≠ 0

where the subscript s represents the length of the lag.
The above specification is of first order because Uₜ is regressed on itself lagged one period (ρ being the first-order coefficient of autocorrelation). Note that the specification postulates that the movement, or shift, in Uₜ consists of two parts: the part ρUₜ₋₁, which accounts for the systematic shift, and the part εₜ, which is purely random.
Relationships between the Uₜ's

Cov(Uₜ, Uₜ₋₁) = E{[Uₜ − E(Uₜ)][Uₜ₋₁ − E(Uₜ₋₁)]} = E[UₜUₜ₋₁]

By substituting Uₜ = ρUₜ₋₁ + εₜ we obtain

E[(ρUₜ₋₁ + εₜ)Uₜ₋₁] = ρE[U²ₜ₋₁] + E[εₜUₜ₋₁] = ρσᵤ²

(note that E(εₜ) = 0, thus E(εₜUₜ₋₁) = 0). Since with the assumption of homoscedasticity (i.e., constant variance) Var(Uₜ) = Var(Uₜ₋₁) = σᵤ², the result is

Corr(Uₜ, Uₜ₋₁) = ρσᵤ²/σᵤ² = ρ,  where −1 < ρ < 1
Hence, ρ (rho) is the simple correlation of the successive errors of the original model. Note that when ρ > 0 successive errors are positively correlated, and when ρ < 0 they are negatively correlated. It can be shown that corr(Uₜ, Uₜ₋ₛ) = ρˢ (where s represents the length of the lag). This implies that the correlation (be it negative or positive) between any two disturbances diminishes as the lag s increases.
b) Consequences of Autocorrelation
When the disturbance term exhibits serial correlation the value as well as the standard errors of
the parameter estimates are affected.
i) If the disturbances are correlated, the previous values of the disturbances carry some information about the current disturbance. If this information is ignored, the sample data are clearly not being used with maximum efficiency. However, the parameter estimates themselves are not statistically biased even when the residuals are serially correlated; that is, the OLS parameter estimates are statistically unbiased in the sense that their expected values equal the true parameters.
ii) The variance of the random term U may be seriously underestimated. In particular, the underestimation of the variance of U will be more serious in the case of positive autocorrelation of the error term Uₜ. With positive first-order autocorrelated errors, fitting an OLS line to a given sample can give an estimate quite wide of the mark, and the high variation among such estimates makes the variance of the OLS estimators greater than it would have been had the errors been distributed randomly. The following figure illustrates positively autocorrelated errors.
Notice from the diagram that the OLS estimating line gives a better fit to the data than the true relationship. This reveals why, in this context, r² is overestimated and σᵤ² (and the variance of the OLS estimators) is underestimated. When the standard errors of the β̂'s are biased downwards, the confidence intervals are narrower than they should be. Moreover, the parameter estimate of an irrelevant explanatory variable may appear highly significant; in other words, the apparent precision of the fit is spurious.
iii) Predictions based on the ordinary least squares estimates will be inefficient with autocorrelated errors; they have a larger variance than predictions based on estimates obtained from other econometric techniques. Recall that the variance of the forecast depends on the variances of the coefficient estimates and the variance of U. Since these variances are not minimal compared with other techniques, the standard error of the forecast (from OLS) will not have the least value when the U's are autocorrelated.
c) Detecting Autocorrelation
Detection is based on the residuals Ûₜ, which can be obtained from the usual OLS procedure. The examination of Ûₜ can provide useful information not only about autocorrelation but also about heteroscedasticity, model inadequacy, and specification bias.
i) Graphical Method
Some rough idea about the existence of autocorrelation may be gained by plotting the residuals
either against time or against their own lagged variables.
For instance, suppose plotting the residual against its lagged variable bring about the following
relationship.
Figure 4.9: Plot of Ûₜ against Ûₜ₋₁
As the above figure reveals, most of the residuals are bunched in the first and third quadrants, suggesting very strongly that there is positive correlation in the residuals. However, the graphical method we have just discussed is essentially subjective, or qualitative, in nature. There are, though, quantitative tests that can be used to supplement the purely qualitative approach.
The Durbin-Watson test tests the hypothesis H₀: ρ = 0 (implying that the error terms are not autocorrelated with a first-order scheme) against the alternative. However, the sampling distribution of the d statistic depends on the sample size n, the number of explanatory variables k, and also on the actual sample values of the explanatory variables; thus, the critical values cannot be tabulated exactly, and only lower and upper bounds are available.
The Durbin-Watson test procedure in testing the null hypothesis of ρ = 0 against the alternative hypothesis of positive autocorrelation is illustrated in the figure below.
Note that under the null hypothesis the actual sampling distribution of d, for the given n and k
and for the given sample X values is shown by the unbroken curve. It is such that 5 percent of
the area beneath it lies to the left of the point d *, i.e., P(d < d*) = 0.05. If d* were known we
would reject the null hypothesis at the 5 percent level of significance if for our sample d < d *.
Unfortunately, for the reason given above, d* is unknown. The broken curves labelled dL and dU represent, for given values of n and k, the lower and upper limits of the sampling distribution of d, within which the actual sampling distribution must lie whatever the sample X values.
(Figure: the sampling distributions of d, showing the lower-limit curve dL, the actual distribution around d*, and the upper-limit curve dU, on a scale running from 0 to 4.)
Note that tables for dU and dL are constructed to facilitate the use of one-tail rather than two-tail tests. The following representation explains the actual test procedure better and shows that the limits of d are 0 and 4.
Note:
H0: No positive autocorrelation
H0*: No Negative autocorrelation
d) Remedial Measure
Since in the presence of serial correlation the OLS estimators are inefficient, it is essential to
seek remedial measure.
If the source of the problem is suspected to be due to omission of important variables, the
solution is to include those omitted variables. Besides if the source of the problem is believed
to be the result of misspecification of the model, then the solution is to determine the
appropriate mathematical form.
If the above approaches are ruled out, the appropriate procedure will be to transform the original data so that we can come up with a new form (or model) which satisfies the assumption of no serial correlation. Of course, the transformation depends on the nature of the serial correlation. Suppose the serial correlation follows the first-order autoregressive scheme, namely

Uₜ = ρUₜ₋₁ + εₜ ....................................................(4.15)

In this case the serial correlation problem can be satisfactorily resolved if ρ, the coefficient of autocorrelation, is known.
Consider the following two-variable model:

Yₜ = β₀ + β₁Xₜ + Uₜ .......................................................(4.16)

For time t − 1 the model is

Yₜ₋₁ = β₀ + β₁Xₜ₋₁ + Uₜ₋₁
Step 4: Since a priori it is not known that the ρ̂ obtained from the regression in Step 2 is the best estimate of ρ, substitute the values of β̂₀* and β̂₁* obtained from the regression in Step 3 into the original regression (4.21) and obtain the new residuals, say Ûₜ**, as

Ûₜ** = Yₜ − β̂₀* − β̂₁*Xₜ

Note that this can be easily computed since Yₜ, Xₜ, β̂₀*, and β̂₁* are all known.

Step 5: Now estimate the regression

Ûₜ** = ρ̂̂ Ûₜ₋₁** + Wₜ, where ρ̂̂ is the second-round estimate of ρ.
Example: M̂ = −2461 + 0.28G, with ΣÛₜ² = 573,069
4.5 MULTICOLLINEARITY
a) The nature of the problem
One of the assumptions of the classical linear regression model (CLRM) is that there is no perfect multicollinearity among the regressors included in the regression model. Note that although the assumption is strictly violated only in the case of exact multicollinearity (i.e., an exact linear relationship among some of the regressors), the presence of multicollinearity (an approximate linear relationship among some of the regressors) leads to estimating problems important enough to warrant our treating it as a violation of the classical linear regression model.
Multicollinearity does not depend on any theoretical or actual linear relationship among any of
the regressors; it depends on the existence of an approximate linear relationship in the data set
at hand. Unlike most other estimating problems, this problem is caused by the particular sample
available. Multicollinearity in the data could arise for several reasons. For example, the independent variables may all share a common time trend, one independent variable might be the lagged value of another that follows a trend, some independent variables may have varied together because the data were not collected from a wide enough base, or there could in fact exist some kind of approximate relationship among some of the regressors.
Note that the existence of multicollinearity will seriously affect the parameter estimates. Intuitively, when any two explanatory variables are changing in nearly the same way, it becomes extremely difficult to establish the influence of each regressor on the dependent variable separately. That is, if two explanatory variables change by the same proportion, the influence on the dependent variable of one of them may be erroneously attributed to the other. Their effects cannot be sensibly investigated, due to the high intercorrelation.
b) Consequences of Multicollinearity
i) In the presence of high (though imperfect) multicollinearity, the OLS estimators have large variances and standard errors, making precise estimation difficult.
ii) Because of consequence (i), the confidence intervals tend to be much wider, leading to the acceptance of the "zero null hypothesis" (i.e., that the true population coefficient is zero).
iii) Because of consequence (i), the t-ratios of one or more coefficients tend to be statistically insignificant.
iv) Although the t-ratios of one or more coefficients are statistically insignificant, R², the overall measure of goodness of fit, can be very high. This is the basic symptom of the problem.
v) The OLS estimators and their standard errors can be sensitive to small changes in the data. That is, when a few observations are added or removed, the pattern of relationships may change and affect the results.
vi) Forecasting is still possible if the nature of the collinearity remains the same within the
new (future) sample observation. That is, if collinearity exists on the data of the past 15
years sample, and if collinearity is expected to be the same for the future sample period,
then forecasting will not be a problem.
c) Detecting Multicollinearity
Note that multicollinearity is a question of degree and not of kind. The meaningful distinction is not between the presence and absence of multicollinearity, but between its various degrees. Multicollinearity is a feature of the sample and not of the population. Therefore, we do not "test for" multicollinearity; rather, we measure its degree in a particular sample. Some indicators:
i) High R² but few significant t-ratios: If R² is high, say in excess of 0.8, the F test will in most cases reject the hypothesis that the partial slope coefficients are simultaneously equal to zero, yet the individual t tests will show that none, or very few, of the partial slope coefficients are statistically different from zero.
ii) High pair-wise correlations among regressors: If the pair-wise correlation coefficient between two regressors is high, say in excess of 0.8, then multicollinearity is a serious problem.
iii) Auxiliary regressions: Since multicollinearity arises because one or more of the regressors are exact or approximate linear combinations of the other regressors, one way of finding out which X variable is related to the other X variables is to regress each Xᵢ on the remaining X variables and compute the corresponding R², which will help to decide about the problem. For example, consider the following auxiliary regression (see the sketch below):

Xₖ = α₁X₁ + α₂X₂ + … + αₖ₋₁Xₖ₋₁ + V

If the R² of this regression is high, it implies that Xₖ is highly correlated with the rest of the explanatory variables, and one may consider dropping Xₖ from the model.
d) Remedial Measures
The existence of multicollinearity in a data set does not necessarily mean that the coefficient estimators in which the researcher is interested have unacceptably high variances. Thus, the econometrician need not worry about multicollinearity if the R² from the regression exceeds the R² of any individual independent variable regressed on the other independent variables, nor if the t-statistics are all greater than 2. Because multicollinearity is essentially a sample problem, there are no infallible guides; however, one can try the following rules of thumb, their success depending on the severity of the collinearity problem.
a) Increase the sample size: as the sample size increases, Σx₁ᵢ² will generally increase. Thus, for any given r₁₂, the variance of β̂₁ will decrease, decreasing its standard error and enabling us to estimate β₁ more precisely.
b) Drop a variable: When faced with severe multicollinearity, one of the "simplest" things to do is to drop one of the collinear variables. But note that in dropping a variable from the model we may be committing a specification bias, or specification error: bias arising from incorrect specification of the model used in the analysis. If economic theory requires a variable to be included in the model, dropping it because of a multicollinearity problem constitutes specification bias, because we would be dropping a variable whose true coefficient in the equation being estimated is not zero.
c) Transformation of variables: In time-series analysis, one reason for high multicollinearity between two variables is that over time both tend to move in the same direction. One way of minimizing this dependence is to transform the variables. Suppose Yₜ = β₀ + β₁X₁ₜ + β₂X₂ₜ + Uₜ. This relation must also hold at time t − 1, because the origin of time is arbitrary; therefore we have

Yₜ₋₁ = β₀ + β₁X₁ₜ₋₁ + β₂X₂ₜ₋₁ + Uₜ₋₁

Subtracting this from the original equation gives

Yₜ − Yₜ₋₁ = β₁(X₁ₜ − X₁ₜ₋₁) + β₂(X₂ₜ − X₂ₜ₋₁) + Vₜ

This is known as the first-difference form, because we run the regression not on the original variables but on the differences of successive values of the variables. The first-difference regression often reduces the severity of multicollinearity because, although the levels of X₁ and X₂ may be highly correlated, there is no a priori reason to believe that their differences will also be highly correlated.
1. State with reasons whether the following statements are true, false or uncertain
a) Despite perfect multicollinearity, OLS estimators are BLUE
b) If an auxiliary regression shows that a particular R 2 is high, there is definite
evidence of high collinearity.
2. In data involving economic time series such as GDP, income, prices, unemployment, etc.
multicollinearity is usually suspected. Why?
3. State three remedial measures if multicollinearity is detected.
4.6 SUMMARY
- In the presence of heteroscedasticity, the variances of the OLS estimators are not given by the usual OLS formulas. If we persist in using the usual formulas, the t and F tests based on them can be highly misleading, resulting in erroneous conclusions.
- Autocorrelation can arise for several reasons, and it makes the OLS estimators inefficient. The remedy depends on the nature of the interdependence among the disturbances Uₜ.
- Multicollinearity is a question of degree and not of kind. Although there are no sure methods of detecting collinearity, there are several indicators of it.
1 a) False. Although OLS estimates are inefficient in the presence of heteroscedasticity, they are still statistically unbiased.
b) True, because the OLS estimates do not have the minimum variance.
From the Durbin-Watson table, at the 5 percent level of significance with n = 20 and k = 1, we find dL = 1.20 and dU = 1.41. Since d* = 0.937 is less than dL = 1.20, we conclude that there is positive autocorrelation in the import function.
Answer To Check Your Progress 3
2. This is because the variables are highly interrelated. For example, an increase in income
brings about an increase in GDP. Moreover, an increase in unemployment usually brings
about a decline in prices.
3. Refer to the text for the answer.

4.8 MODEL EXAMINATION QUESTIONS
Carry out the Goldfeld-Quandt test of heteroscedasticity at the 5% level of significance.
Contents
5.0 Aims and Objectives
5.1 Introduction
5.2 Models with Binary Regressors
5.3 Non-Linear Regression Models
5.3.1 Non-Linear Relationships in Economics
5.3.2 Specification and Estimation of Non-Linear Models
5.3.2.1 Polynomials
5.3.2.2 Log-log Models
5.3.2.3 Semi-log Models
5.3.2.4 Reciprocal Model
5.4 Summary
5.5 Answers to Check Your Progress
5.6 References
5.7 Model Examination Questions
This unit aims at introducing models with binary explanatory variable(s) and specification and
estimation of non-linear models.
5.1 INTRODUCTION
As mentioned in the previous section, this unit deals with the role of qualitative explanatory variables in regression analysis and with the functional forms of some non-linear regression models. It will be shown that the introduction of qualitative variables, often called dummy variables, extends the flexibility of the regression model.
Now let us take some examples with a single quantitative explanatory variable and two or more qualitative explanatory variables.
Example 1: Suppose a researcher wants to find out whether sex makes any difference in a college teacher's salary, assuming that all other variables such as age, education level, and experience are held constant.

Yᵢ = β₀ + β₁Dᵢ + Uᵢ
Consider the following hypothetical data on starting salaries of college teachers by sex:

Starting salary (Y)   Sex (1 = male, 0 = female)
22,000 1
19,000 0
18,000 0
21,700 1
18,500 0
21,000 1
20,500 1
17,000 0
17,500 0
21,200 1
Ŷᵢ = 18,000 + 3,280Dᵢ
se = (0.32) (0.44)
t = (57.74) (7.439)
R² = 0.8737
The above results show that the estimated mean salary of female college teachers is birr 18,000 and that of male teachers is birr 21,280 (= 18,000 + 3,280). Since β̂₁ is statistically significant, the results indicate that the mean salaries of the two categories are different; the female teacher's average salary is actually lower than her male counterpart's. If all other variables are held constant, there is sex discrimination in the salaries of the two sexes.
(Figure: mean salary by sex, with β̂₀ = 18,000 for the base (female) category, a differential of β̂₁ = 3,280, and β̂₀ + β̂₁ = 21,280 for males.)
Yᵢ = β₀ + β₁Dᵢ + β₂Xᵢ + Uᵢ
The female teacher category is known as the base category since it is assigned the value 0. Note that the assignment of the values 1 and 0 to two categories, such as male and female, is arbitrary: in our example we could have assigned D = 1 for female and D = 0 for male. But in interpreting the results of models which use dummy variables, it is critical to know how the 1 and 0 values were assigned.
The coefficient β₀ (intercept) is the intercept term for the base category. The coefficient β₁ attached to the dummy variable D can be called the differential intercept coefficient, because it tells by how much the intercept of the category that receives the value 1 differs from the intercept of the base category.
The other important point concerns the number of dummy variables to be included in the model: if a qualitative variable has m categories, introduce only m − 1 dummy variables (a sketch follows below). In the above example, sex has two categories, and hence we introduced only a single dummy variable. If this rule is not followed, we shall fall into what might be called the dummy-variable trap, that is, a situation of perfect multicollinearity.
Example 3: Let us take an example on regression on one quantitative variable and one
qualitative variable with more than two classes. Suppose we want to regress the annual
expenditure on health care by an individual on the income and education of the individual. Now
the variable education is qualitative in nature. We can have, as an example, three mutually
exclusive levels of education.
- Less than high school
- High school
- College
The number of dummies = 3 − 1 = 2. (Note the rule.)

Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃Xᵢ + Uᵢ

E(Yᵢ|D₁ = 0, D₂ = 0, Xᵢ) = β₀ + β₃Xᵢ, for less than high school education
E(Yᵢ|D₁ = 1, D₂ = 0, Xᵢ) = (β₀ + β₁) + β₃Xᵢ, for high school education
E(Yᵢ|D₁ = 0, D₂ = 1, Xᵢ) = (β₀ + β₂) + β₃Xᵢ, for college education
Figure 5.2: Expenditure on health care in relation to income for three levels of education (three parallel lines, with intercepts β₀, β₀ + β₁, and β₀ + β₂ for the less-than-high-school, high-school, and college categories respectively)
The intercept β₀ is the intercept of the base category. The differential intercepts β₁ and β₂ tell by how much the intercepts of the other two categories differ from the intercept of the base category.
The purpose of this section is to introduce you to models that are linear in the parameters but non-linear in the variables.
The assumption of linear relationship between the dependent and the explanatory variables may
not be acceptable for many economic relationships. Given the complexity of the real world we
expect non-linearities in most economic relationships.
(Figures: a U-shaped average total cost (ATC) curve against the quantity of output, and a total product (TP) curve against input.)
Other economic functions like demand, supply, income-consumption curves, etc can also be
non-linear.
Example 1: Y = β₀ + β₁X₁ + β₂X₁² + β₃X₁³ + … + U

C = β₀ + β₁X − β₂X² + β₃X³ + U, where C = total cost and X = output.

To fit this model we need to transform some of the variables. Let X² = Z and X³ = W, with U the error term. Then the above model becomes

C = β₀ + β₁X − β₂Z + β₃W + U

and we can proceed with the application of OLS to this linear relationship.
Example 2: Suppose we have data on the yield of wheat and the amount of fertilizer applied. Assume that increased amounts of fertilizer begin to burn the crop, causing the yield to decline.
Y X X2
55 1 1
70 2 4
75 3 9
65 4 16
60 5 25
We want to fit the second-degree equation

Yᵢ = β₀ + β₁X₁ᵢ + β₂X₁ᵢ² + Uᵢ

Let X₁ᵢ² = Wᵢ. Then Yᵢ = β₀ + β₁X₁ᵢ + β₂Wᵢ + Uᵢ, which is linear both in the parameters and in the variables, so we apply OLS. The results are as follows:

Ŷᵢ = 36 + 24.07Xᵢ − 3.9Xᵢ²
It is possible to test the significance of Xᵢ²:

H₀: β₂ = 0
H₁: β₂ < 0

t = (β̂₂ − β₂)/S(β̂₂) = −3.90/1.059 = −3.71

t₀.₀₅(5 − 3) = 2.92

Decision: We reject H₀. Since β̂₂ is significant, Xᵢ² should be retained in the model. This implies that the relationship between yield and the amount of fertilizer has to be estimated by a second-degree equation.
lnYᵢ = lnβ₀ + β₁lnX₁ᵢ + β₂lnX₂ᵢ + Uᵢ

Since both the dependent and the explanatory variables are expressed in logarithms, the model is known as a double-log (or log-log, or log-linear) model. It is linear in the parameters and can be estimated by OLS if the assumptions of the classical linear regression model are fulfilled.
Letting Y* = lnY, X₁* = lnX₁, and X₂* = lnX₂, we can write

Y* = β₀* + β₁X₁* + β₂X₂* + Uᵢ, which is linear both in the parameters and in the variables.
Example 3: The following table shows the yearly output of an industry and the amounts of inputs (labor and capital) used by eight firms. The attraction of the double-log form in applied work is that the coefficients β₁ and β₂ measure the elasticity of output with respect to L and K (labor and capital).

β̂₁ = 0.4349 implies that a one percent increase in labor input will result in a 0.4349 percent increase in the output level, assuming that capital is held constant.

β̂₂ = 0.3395 implies that a one percent increase in the amount of capital will increase the level of output by 0.3395 percent, assuming that labor is constant.

Note that the sum of the elasticities (β̂₁ + β̂₂) indicates the type of returns to scale. Returns to scale show the responsiveness of output when all inputs are changed proportionately.
Check Your Progress: The following data give the quantity demanded of a commodity (Y) at various prices (X):

Y:  695  724  812  887  991  1186  1940
X:   43   38   36   28   23    19    10

a) Fit the model lnY = lnβ₀ + β₁lnX + U
b) Interpret β̂₁
Example:
1. lnYᵢ = β₀ + β₁Xᵢ + Uᵢ
2. Yᵢ = β₀ + β₁lnXᵢ + Uᵢ

The above models are called semilog models. The first is a log-lin model and the second a lin-log model; the names reflect whether the dependent variable or the explanatory variable is in log form.

In the log-lin model lnYᵢ = β₀ + β₁Xᵢ + Uᵢ,

β₁ = (relative change in Y)/(absolute change in X)

Multiplying the relative change in Y by 100 gives the percentage change in Y for an absolute change in X.
Example: ln(GNP̂ₜ) = 6.96 + 0.027T
se = (0.015) (0.012)
r² = 0.95, F₁,₁₃ = 260.34
where GNP = real gross national product and T = time (in years).
The above result shows that the real GNP of the country was growing at the rate of 2.7 percent per year over the sample period. It is also possible to estimate a linear trend model:

GNP̂ₜ = 1040.11 + 35T
se = (18.9) (2.07)
r² = 0.95, F₁,₁₃ = 284.7

This model implies that over the sample period real GNP was growing at the constant absolute amount of about $35 billion a year. The choice between the log-lin and the linear model depends on whether one is interested in the relative or the absolute change in GNP.
The reciprocal model is Yᵢ = β₀ + β₁(1/Xᵢ) + Uᵢ. Letting Z = 1/X,

Y = β₀ + β₁Z + U, which is linear both in the parameters and in the variables.

The model shows that as X increases indefinitely, the term β₁(1/X) approaches zero and Y approaches the limiting (asymptotic) value β₀.

Figure 5.4: the reciprocal model Yᵢ = β₀ + β₁(1/X); panels (a), (b), and (c) show the possible shapes of the curve depending on the signs of β₀ and β₁.
We can give examples for each of the above functions (figures a, b and c):
1. The average fixed cost (AFC) curve relates the average fixed cost of production to the level of output. As indicated in figure (a), AFC declines continuously as output increases.
2. The Phillips curve, which relates the unemployment rate to the rate of inflation, is a good example of figure (b).
3. The reciprocal model of figure (c) is an appropriate Engel expenditure curve, relating a consumer's expenditure on a commodity to his total expenditure or income.
The slopes and elasticities of the various functional forms:

Model        Equation             Slope (dY/dX)    Elasticity
Linear       Y = β₀ + β₁X         β₁               β₁(x/y)
Log-linear   lnY = β₀ + β₁lnX     β₁(y/x)          β₁
Log-lin      lnY = β₀ + β₁X       β₁(y)            β₁(x)
Lin-log      Y = β₀ + β₁lnX       β₁(1/x)          β₁(1/y)
Reciprocal   Y = β₀ + β₁(1/x)     −β₁(1/x²)        −β₁(1/(xy))
Note that if the values of x and y are not given, the elasticity is often calculated at the mean values x̄ and ȳ.
5.4 SUMMARY
Dummy variables: variables which assume 0 and 1 values are called dummy variables; they are also known as binary, qualitative, categorical, or dichotomous variables.
- Polynomials: some of the most common forms of non-linear economic relationships can be expressed by polynomials. Example: Y = β₀ + β₁X₁ + β₂X₁² + β₃X₁³ + … + U
- The double-log model: lnYᵢ = lnβ₀ + β₁lnX₁ + β₂lnX₂ + U
- The semi-log (log-lin) model: lnYᵢ = β₀ + β₁Xᵢ + Uᵢ
- Reciprocal models: a function of the form Yᵢ = β₀ + β₁(1/Xᵢ) + Uᵢ is known as a reciprocal model.
5.2.1
1 a) Yᵢ = β₀ + β₁Xᵢ + Uᵢ
b) E(Yᵢ|Xᵢ = 0) = β₀;  E(Yᵢ|Xᵢ = 1) = β₀ + β₁
2 a) Number of dummy variables = number of categories − 1 = 3 − 1 = 2
b) Yᵢ = β₀ + β₁D₁ᵢ + β₂D₂ᵢ + β₃Xᵢ + Uᵢ
c) E(Yᵢ|D₁ = 1, D₂ = 0, Xᵢ) = β₀ + β₁ + β₃Xᵢ
   E(Yᵢ|D₁ = 0, D₂ = 0, Xᵢ) = β₀ + β₃Xᵢ
   E(Yᵢ|D₁ = 0, D₂ = 1, Xᵢ) = (β₀ + β₂) + β₃Xᵢ
5.3.2
a) lnY = 9.121 − 0.69lnX
   se = (10.07) (0.02),  R² = 0.992
b) An increase in price by one percent will decrease the demand for the commodity by
0.69 percent.
5.3.3
(a) fig (b)
(b) –1.43 is the wage floor. It shows that as X increases indefinitely the percentage decrease
in wages will not be more than 1.43 percent per year.
5.7 MODEL EXAMINATION QUESTIONS
3. The following table gives data on annual percentage change in wage rates(Y) and the
unemployment rate (X) for a country for the period 1950 – 1966.
Percentage increase Unemployment (%)
Fit the model Yₜ = β₀ + β₁(1/Xₜ) + Uₜ
4. Consider the results of the log-lin trend model

ln(GNP̂ₜ) = 6.96 + 0.027T,  se = (0.015) (0.0017),  r² = 0.95, F₁,₁₃ = 260.34

and of the linear trend model

GNP̂ₜ = 1040.11 + 35T,  se = (18.86) (2.07),  r² = 0.95, F₁,₁₃ = 284.74

Which model do you prefer? Why?
5. The demand function for coffee is estimated as follows:

Ŷₜ = 2.69 − 0.4795Xₜ,  se = (0.1216) (0.1140),  r² = 0.6628

where Yₜ = cups per person per day and Xₜ = average retail price of coffee. Find the price elasticity of demand.
Contents
6.0 Aims and Objective
6.1 Introduction
6.2. Simultaneous Dependence of Economic Variables
6.3 Identification Problem
6.4 Test of Simultaneity
6.5 Approaches to Estimation
6.6 Summary
6.7 Answers to Check Your Progress
6.8 Model Examination
6.9 Summary
The purpose of this unit is to introduce the student, very briefly, to the concept of simultaneous dependence of economic variables. When the student has completed this unit he/she will:
understand the concept of simultaneous equation
distinguish between endogenous and exogenous variables in a model
be able to derive reduced form equation from structural equations
understand the concept of under identified, identified and over identified equations
be able to conduct test of simultaneity
The application of least squares to a single equation assumes, among other things, that the explanatory variables are truly exogenous, i.e., that there is one-way causation between the dependent variable (Y) and the explanatory variables (X). When this is not the case, the function cannot be treated in isolation as a single-equation model; it belongs to a wider system of equations which describes the relationships among all the relevant variables. In such cases we must use a multi-equation model which includes separate equations in which Y and X appear as endogenous variables. A system describing the joint dependence of variables is called a system of simultaneous equations.
In the single-equation models discussed in the previous units, the cause-and-effect relationship is unidirectional: the explanatory variables are the cause and the dependent variable is the effect. However, there are situations with a two-way flow of influence among economic variables; that is, one economic variable affects another economic variable(s) and is, in turn, affected by it (them). In such cases we need to consider more than one equation, and we thus arrive at simultaneous-equation models, in which there is one equation for each jointly determined (endogenous) variable.
The first question we need to answer is: what happens if the parameters of each equation are estimated by applying, say, the method of OLS, disregarding the other equations in the system? Recall that one of the crucial assumptions of the method of OLS is that the explanatory X variables are either non-stochastic or, if stochastic (random), distributed independently of the stochastic disturbance term. If neither of these conditions is met, the least-squares estimators are not only biased but also inconsistent; that is, as the sample size increases indefinitely, the estimators do not converge to their true (population) values.
Example. Recall that price of a commodity and the quantity (bought and sold) are determined
by the intersection of the demand and supply curves for that commodity. Consider the
following linear demand and supply models.
Demand function:  Qₜᵈ = α₀ + α₁Pₜ + U₁ₜ ……………………………...(6.3)
Supply function:  Qₜˢ = β₀ + β₁Pₜ + U₂ₜ …………………………………(6.4)
Equilibrium condition:  Qₜᵈ = Qₜˢ ……………….………………………..(6.5)

where Qₜᵈ = quantity demanded, Qₜˢ = quantity supplied, P = price, and t = time.
Note that P and Q are jointly dependent variables. If U1t changes because of changes in other
variables affecting Qt^d (such as income and tastes), the demand curve shifts. Recall that such a
shift in demand changes both P and Q. Similarly, a change in U2t (because of changes in
weather and the like) will shift the supply curve, again affecting both P and Q. Because of this
simultaneous dependence between Q and P, U1t and Pt in (6.3) and U2t and Pt in (6.4) cannot
be independent. Therefore a regression of Q on P as in (6.3) would violate an important
assumption of the classical linear regression model, namely the assumption of no correlation
between the explanatory variable(s) and the disturbance term. In summary, the above
discussion reveals that, in contrast to single equation models, in simultaneous equation models
more than one dependent, or endogenous, variable is involved, necessitating as many equations
as the number of endogenous variables. As a consequence such an endogenous explanatory
variable becomes stochastic and is usually correlated with the disturbance term of the equation
in which it appears as an explanatory variable.
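To make this inconsistency concrete, here is a minimal simulation sketch (not part of the module; all parameter values are illustrative assumptions) in which OLS applied to the demand equation (6.3) fails to recover the demand slope α1, because the equilibrium price is correlated with the demand disturbance:

    # Simulate market equilibrium from (6.3)-(6.5) and apply OLS to Q on P.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    alpha0, alpha1 = 10.0, -1.0   # demand: Q = alpha0 + alpha1*P + U1 (assumed)
    beta0, beta1 = 2.0, 1.0       # supply: Q = beta0 + beta1*P + U2 (assumed)
    U1 = rng.normal(0, 1, n)
    U2 = rng.normal(0, 1, n)

    # Market clearing gives the reduced form for the equilibrium price
    P = (alpha0 - beta0 + U1 - U2) / (beta1 - alpha1)
    Q = alpha0 + alpha1 * P + U1

    slope = np.polyfit(P, Q, 1)[0]   # OLS slope of Q on P
    print(f"true demand slope: {alpha1:.2f}, OLS estimate: {slope:.2f}")
    # The estimate does not converge to alpha1 however large n is:
    # OLS is inconsistent because P and U1 are correlated.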
Both equations (6.6) and (6.7) are structural or behavioral equations because they portray the
structure of an economy, equation (6.7) being an identity. The β's are known as the structural
parameters or coefficients. From the structural equations one can solve for the endogenous
variables and derive the reduced-form equations and the associated reduced-form coefficients.
If equation (6.6) is substituted into equation (6.7) and we solve for Yt, we obtain

Yt = β0/(1 − β1) + 1/(1 − β1) It + Ut/(1 − β1)
   = π0 + π1It + Wt ……………………………..(6.8)

where π0 = β0/(1 − β1), π1 = 1/(1 − β1) and Wt = Ut/(1 − β1).
Notice an interesting feature of the reduced-form equations. Since only the predetermined
variables and stochastic disturbances appear on the right side of these equations, and since the
predetermined variables are assumed to be uncorrelated with the disturbance terms, the OLS
method can be applied to estimate the coefficients of the reduced-form equations (the π's). This
is all that is needed if a researcher is only interested in predicting the endogenous variables or
only wishes to estimate the size of the multipliers (i.e., the π's).
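As a small illustration, the following sketch (simulated data; the β values are assumptions) estimates the reduced-form equation (6.8) by OLS and recovers the investment multiplier π1 = 1/(1 − β1):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 5_000
    beta0, beta1 = 50.0, 0.8            # assumed structural coefficients
    I = rng.uniform(10, 20, n)          # predetermined investment
    U = rng.normal(0, 2, n)
    # Reduced form (6.8): Y depends only on the predetermined I and the error
    Y = beta0 / (1 - beta1) + I / (1 - beta1) + U / (1 - beta1)

    X = np.column_stack([np.ones(n), I])
    pi0_hat, pi1_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
    print(f"estimated multiplier: {pi1_hat:.2f} (true: {1 / (1 - beta1):.2f})")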
Note that the identification problem is a mathematical (as opposed to statistical) problem
associated with simultaneous equation systems. It is concerned with the question of the
possibility or impossibility of obtaining meaningful estimates of the structural parameters.
An identified equation may be either exactly (or fully or just) identified or over identified. It is
said to be exactly identified if unique numerical values of the structural parameters can be
obtained, and over identified if more than one numerical value can be obtained for some of the
parameters of the structural equations. The circumstances under which each of these cases
occurs will be shown in the following discussion.
a) Under Identification
Consider the demand-and-supply model (6.3) and (6.4), together with the market clearing or
equilibrium condition (6.5) that demand is equal to supply. By the equilibrium condition (i.e.,
Qt^d = Qt^s) we obtain the reduced form

Pt = π0 + Vt and Qt = π1 + Wt

where

π0 = (β0 − α0)/(α1 − β1), Vt = (U2t − U1t)/(α1 − β1)
π1 = (α1β0 − α0β1)/(α1 − β1), Wt = (α1U2t − β1U1t)/(α1 − β1)
Note that π0 and π1 (the reduced-form coefficients) contain all four structural parameters: α0,
α1, β0 and β1. But there is no way in which the four structural unknowns can be estimated from
only two reduced-form coefficients. Recall from high school algebra that to estimate four
unknowns we must have four (independent) equations and, in general, to estimate k unknowns
we must have k (independent) equations. What all this means is that, given time series data on
P (price) and Q (quantity) and no other information, there is no way the researcher can be sure
whether he/she is estimating the demand function or the supply function. That is, a given Pt and
Qt represent simply the point of intersection of the appropriate demand and supply curves
because of the equilibrium condition that demand is equal to supply.
b) Just (or Exact) Identification

Suppose the demand function is modified to include income (I) and the supply function to
include the one-period lagged price, giving

Demand function: Qt = α0 + α1Pt + α2It + U1t ………………………….(6.13)
Supply function: Qt = β0 + β1Pt + β2Pt−1 + U2t ……………………….(6.14)

Equating demand to supply (Qt^d = Qt^s, equation (6.15)) and solving for the equilibrium price
gives

Pt = π0 + π1It + π2Pt−1 + Vt …..........................................……..(6.16)

where the reduced-form coefficients are

π0 = (β0 − α0)/(α1 − β1), π1 = −α2/(α1 − β1)
π2 = β2/(α1 − β1), Vt = (U2t − U1t)/(α1 − β1)

Substituting the equilibrium price (6.16) into the demand or supply equation of (6.13) or (6.14),
we obtain the corresponding equilibrium quantity:

Qt = π3 + π4It + π5Pt−1 + Wt …......................................……..(6.17)

where the reduced-form coefficients are

π3 = (α1β0 − α0β1)/(α1 − β1), π4 = −α2β1/(α1 − β1)
π5 = α1β2/(α1 − β1), Wt = (α1U2t − β1U1t)/(α1 − β1)
The demand-and-supply model given in equations (6.13) and (6.14) contains six structural
coefficients – α0, α1, α2, β0, β1 and β2 – and there are six reduced-form coefficients – π0, π1,
π2, π3, π4 and π5 – with which to estimate them. Thus we have six equations in six unknowns,
and normally we should be able to obtain unique estimates. Therefore the parameters of both
the demand and supply equations can be identified, and the system as a whole is exactly
identified.
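To see how the structural parameters would actually be recovered (indirect least squares), note that the formulas above imply β1 = π4/π1 and α1 = π5/π2, with the remaining coefficients following by substitution. A tiny sketch, using invented reduced-form estimates purely for illustration:

    # Hypothetical OLS estimates of the reduced-form coefficients in (6.16)-(6.17)
    pi1, pi2 = 0.50, -0.25    # coefficients of I_t and P_{t-1} in the P equation
    pi4, pi5 = 0.30, 0.20     # coefficients of I_t and P_{t-1} in the Q equation

    beta1 = pi4 / pi1                   # supply price slope
    alpha1 = pi5 / pi2                  # demand price slope
    alpha2 = -pi1 * (alpha1 - beta1)    # from pi1 = -alpha2/(alpha1 - beta1)
    beta2 = pi2 * (alpha1 - beta1)      # from pi2 =  beta2/(alpha1 - beta1)
    print(alpha1, beta1, alpha2, beta2)  # -0.8, 0.6, 0.7, 0.35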
c) Over identification
Note that for certain goods and services, wealth of the consumer is another important
determinant of demand. Therefore, the demand function (6.13) can be modified as follows,
keeping the supply function as before:
Demand function: Qt = α0 + α1Pt + α2It + α3Rt + U1t ……………….(6.18)
Supply function: Qt = β0 + β1Pt + β2Pt−1 + U2t ……………………….(6.19)
where R represents wealth
Equating demand to supply, we obtain the following equilibrium price and quantity:

Pt = π0 + π1It + π2Rt + π3Pt−1 + Vt ……………………………..….. (6.20)
Qt = π4 + π5It + π6Rt + π7Pt−1 + Wt ……………………………….... (6.21)

The system now has eight reduced-form coefficients but only seven structural coefficients (α0,
α1, α2, α3, β0, β1 and β2): the number of equations exceeds the number of unknowns, so more
than one numerical value can be obtained for some of the structural parameters, and the supply
equation is over identified.
In a simple example such as the foregoing it is easy to check for identification; in more
complicated systems, however, it is not so easy, and this time consuming procedure can be
avoided by resorting to either the order condition or the rank condition of identification.
Although the order condition is easy to apply, it provides only a necessary condition for
identification; the rank condition, on the other hand, is both a necessary and sufficient condition
for identification. [Note: the order and rank conditions are not discussed here, since the
objective of this unit is only to introduce the reader briefly to simultaneous equations. For a
detailed and advanced discussion readers can refer to the reference list stated at the end of this
unit.]
Therefore Pt = P̂t + V̂t …………………………….….(6.27)

where P̂t is the estimated Pt and V̂t is the estimated residual. Substituting (6.27) into (6.23) we
get:

Qt = β0 + β1P̂t + β1V̂t + U2t ……………………(6.28)

Now, under the null hypothesis that there is no simultaneity, the correlation between V̂t and
U2t should be zero, asymptotically. Thus if we run the regression (6.28) and find that the
coefficient of V̂t in (6.28) is statistically zero, we can conclude that there is no simultaneity
problem.
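A minimal sketch of this test with simulated data and statsmodels (the predetermined variables income and lag_price are hypothetical stand-ins; since Pt = P̂t + V̂t, regressing Q on Pt and V̂t spans the same space as (6.28), and the t-test on V̂t is the simultaneity test):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 500
    income = rng.normal(100, 10, n)
    lag_price = rng.normal(50, 5, n)
    P = 5 + 0.3 * income + 0.2 * lag_price + rng.normal(0, 1, n)
    Q = 10 + 0.8 * P + rng.normal(0, 1, n)   # here P is in fact exogenous

    # First step: reduced-form regression of P on the predetermined variables
    Z = sm.add_constant(np.column_stack([income, lag_price]))
    v_hat = sm.OLS(P, Z).fit().resid

    # Second step: structural equation augmented with the residual v_hat
    X = sm.add_constant(np.column_stack([P, v_hat]))
    print(sm.OLS(Q, X).fit().tvalues[2])   # insignificant: no simultaneity here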
At the outset it may be noted that the estimation problem is rather complex because there are a
variety of estimation techniques with varying statistical properties. In view of the introductory
nature of this unit we shall consider very briefly the following techniques.
This method is applied in estimating an over identified equation. Theoretically, two-stage least
squares may be considered an extension of the ILS method. The 2SLS method boils down to
the application of ordinary least squares in two stages: in the first stage we apply least squares
to the reduced-form equations in order to obtain estimates of the exact and random components
of the endogenous variables; we then replace the endogenous variables appearing on the
right-hand side of the equation with their estimated values, and in the second stage apply OLS
to the transformed original equation to obtain estimates of the structural parameters.
Note, however, that since 2SLS is equivalent to ILS in the just-identified case, it is usually
applied uniformly to all identified equations in the system. [For a detailed discussion of this
method readers may refer to the reference list stated at the end of this unit.]
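A hand-rolled sketch of the two stages on simulated data follows (a dedicated routine, such as IV2SLS in the linearmodels package, should be preferred in applied work, because the second-stage standard errors computed this way are not the correct 2SLS ones):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 1_000
    z1, z2 = rng.normal(size=(2, n))       # predetermined variables
    common = rng.normal(size=n)            # shock creating simultaneity
    P = 1 + z1 + z2 + common + rng.normal(size=n)     # endogenous regressor
    Q = 2 + 0.5 * P - common + rng.normal(size=n)     # structural equation

    # Stage 1: regress the endogenous variable on the predetermined variables
    Z = sm.add_constant(np.column_stack([z1, z2]))
    P_hat = sm.OLS(P, Z).fit().fittedvalues

    # Stage 2: replace P by its stage-1 fitted values
    X = sm.add_constant(P_hat)
    print(sm.OLS(Q, X).fit().params)       # slope close to the true 0.5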
Check Your Progress
4. Given the model
It = α0 + α1Yt + Ut
Yt = Qt + It
write the reduced form equations for Yt and It.
6.6 SUMMARY
We have seen that a unique feature of a simultaneous equation model is that the endogenous
variable in one equation may appear as an explanatory variable in another equation of the
system, so that the OLS method may not be applied. The identification problem in this regard
asks whether one can obtain unique numerical estimates of the structural coefficients from the
estimated reduced-form coefficients. This leads to the issue of just identified, under identified
and over identified equations. Note also that in the presence of simultaneity OLS is generally
not applicable; it is therefore imperative to test for simultaneity explicitly, and for this purpose
the Hausman specification test can be used. There are several methods of estimating a
simultaneous equation model.
4. Yt = α0/(1 − α1) + 1/(1 − α1) Qt + 1/(1 − α1) Ut
   It = α0/(1 − α1) + α1/(1 − α1) Qt + 1/(1 − α1) Ut
1. What is the economic meaning of the imposition of "zero restrictions" on the parameters of a
model?
2. Given the following reduced form equations:
Y1t = 4 + 8X1t
Y2t = 2 + 12X1t
a) Which structural coefficients, if any, can be estimated from the reduced form
coefficients?
b) Show that the reduced form parameters measure the total effect of a change in the
exogenous variables.
Contents
7.0 Aims and Objective
7.1 Introduction
7.2 Qualitative Response Models
7.2.1 Categories of Qualitative Response Model
7.3 The Linear Probability Model (LPM)
7.4 The Logit Model
7.5 The Probit Model
7.6 The Tobit Model
7.7 Summary
7.8 Answers to Check Your Progress Questions
7.9 References
7.10 Model Examination Questions
The purpose of this unit is to familiarize students with the concept of qualitative dependent
variable in a regression model and the estimating problems associated with such models.
Binary dependent variables are extremely common in the social sciences. Suppose we want to
study the labor-force participation of adult males as a function of the unemployment rate,
average wage rate, family income, education, etc. A person either is in the labor force or not.
Hence, the dependent variable, labor-force participation, can take only two values: 1 if the
person is in the labor force and 0 if he or she is not. Consider another example: a family may or
may not own a house. If it owns a house the variable takes the value 1, and 0 if it does not.
There are several such examples where the dependent variable is dichotomous. A unique
feature of all the examples is that the dependent variable is of the type that elicits a yes or no
response; that is, it is dichotomous in nature. Now before we discuss the estimation of models
involving dichotomous response variables, let us briefly discuss the concept of qualitative
response models:
i. Dichotomous (binary) variables: these are variables that take only two values, as in the
labor-force and home-ownership examples above.
ii. Ordinal variables: these are variables that have categories that can be ranked.
Example: – Rank to indicate political orientation
Y = 1, radical
= 2, liberal
= 3, conservative
- Rank according to educational attainment
Y = 1, primary education
= 2, secondary education
= 3, university education
iii. Nominal variables: These variables occur when there are multiple outcomes that cannot be
ordered.
Example: Occupation can be grouped as farming, fishing, carpentry, etc.
Y = 1, farming
  = 2, fishing
  = 3, carpentry
  = 4, livestock
(Note that the numbers are assigned arbitrarily.)
iv. Count variables: these variables indicate the number of times some event has occurred.
Example: how many strikes have occurred.
Now let us turn our attention to the four most commonly used approaches to estimating binary
response models (types of binomial models):
1. Linear probability models
2. The logit model
3. The probit model
4. The tobit (censored regression) model.
The linear probability model is the regression model applied to a binary dependent variable. To
fix ideas, consider the following simple model:
Yi = β0 + β1Xi + Ui ……………………………(1)
The above model expresses the dichotomous Yi as a linear function of the explanatory variable
Xi. Such models are called linear probability models (LPM) since E(Yi/Xi), the conditional
expectation of Yi given Xi, can be interpreted as the conditional probability that the event will
occur given Xi; that is, Pr(Yi = 1/Xi). Thus, in the preceding case, E(Yi/Xi) gives the
probability of a family owning a house whose income is the given amount Xi. The justification
of the name LPM can be seen as follows.

E(Yi/Xi) = β0 + β1Xi …………………………………….(2)
Now, letting Pi = probability that Yi = 1 (that is, that the event occurs) and 1 − Pi = probability
that Yi = 0 (that is, that the event does not occur), the variable Yi has the following
distribution:

Yi      Probability
0       1 − Pi
1       Pi
Total   1
Therefore, by the definition of mathematical expectation, we obtain

E(Yi) = 0(1 − Pi) + 1(Pi) = Pi ……………………………………..(3)

Comparing (2) with (3), we can equate

E(Yi/Xi) = β0 + β1Xi = Pi ……………………………………(4)

That is, the conditional expectation of model (1) can in fact be interpreted as the conditional
probability of Yi. Since the probability Pi must lie between 0 and 1, we have the restriction
0 ≤ E(Yi/Xi) ≤ 1; that is, the conditional expectation, or conditional probability, must lie
between 0 and 1.
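Before turning to the problems of the LPM, here is a brief sketch (simulated data; the income coefficient is an assumption) of fitting an LPM by OLS with statsmodels. Note how some fitted "probabilities" fall outside [0, 1], anticipating problem 3 below:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 200
    income = rng.uniform(0, 30, n)
    p_true = np.clip(0.04 * income, 0, 1)     # assumed true probabilities
    owns_house = rng.binomial(1, p_true)      # observed 0-1 outcome

    lpm = sm.OLS(owns_house, sm.add_constant(income)).fit()
    fitted = lpm.fittedvalues
    print(lpm.params)
    print("fitted values outside [0, 1]:", np.sum((fitted < 0) | (fitted > 1)))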
1. Heteroscedasticity
The variance of the disturbance term depends on the X's and is thus not constant. To see this,
consider the following probability distribution of Ui:

Yi      Ui                  Probability
0       −β0 − β1Xi          1 − Pi
1       1 − β0 − β1Xi       Pi

Now by definition Var(Ui) = E[Ui − E(Ui)]² = E(Ui²), since E(Ui) = 0 by assumption.
Therefore, using the preceding probability distribution of Ui, we obtain

Var(Ui) = E(Ui²) = (−β0 − β1Xi)²(1 − Pi) + (1 − β0 − β1Xi)²(Pi)
        = (−β0 − β1Xi)²(1 − β0 − β1Xi) + (1 − β0 − β1Xi)²(β0 + β1Xi)
        = (β0 + β1Xi)(1 − β0 − β1Xi)
        = Pi(1 − Pi)

where the last step uses (4). Hence the variance of Ui is heteroscedastic, since it depends on Xi.
As a consequence, the OLS estimator of β is inefficient and the standard errors are biased,
resulting in incorrect tests.
2. Non-normality of Ui
Although OLS does not require the disturbances (U's) to be normally distributed, we assumed
them to be so distributed for the purpose of statistical inference, that is, hypothesis testing, etc.
But that assumption is untenable for the LPM because, like Yi, Ui takes on only two values:

Ui = Yi − β0 − β1Xi

so that when Yi = 1, Ui = 1 − β0 − β1Xi, and when Yi = 0, Ui = −β0 − β1Xi. Obviously Ui
cannot be normally distributed; it follows the same two-point distribution as Yi.
3. Nonsensical Predictions
The LPM produces predicted values outside the normal range of probabilities (0, 1): it predicts
values of Y that are negative or greater than 1. This is a real problem with OLS estimation of
the LPM.
4. Functional Form:
Since the model is linear, a unit increase in X results in a constant change of β in the
probability of an event, holding all other variables constant. The increase is the same regardless
of the current value of X. In many applications, this is unrealistic. When the outcome is a
probability, it is often substantively reasonable that the effects of independent variables will
have diminishing returns as the predicted probability approaches 0 or 1.
Remark: Because of the above mentioned problems the LPM model is not recommended for
empirical works.
We have seen that the LPM has many problems, such as non-normality of Ui, heteroscedasticity
of Ui, the possibility of Ŷi lying outside the 0-1 range, and generally lower R² values. But these
problems are surmountable; the fundamental weakness of the LPM, illustrated below, is its
assumption that Pi is linearly related to Xi.
Example: The LPM estimated by OLS (on home ownership) is given as follows:

Ŷi = −0.9457 + 0.1021Xi
     (0.1228)  (0.0082)
t = (−7.6984)  (12.515)
R² = 0.8048
The above regression is interpreted as follows
- The intercept of –0.9457 gives the “probability” that a family with zero income will
own a house. Since this value is negative, and since probability cannot be negative, we
treat this value as zero.
- The slope value of 0.1021 means that for a unit change in income, on average the probability
of owning a house increases by 0.1021, or about 10 percent. This is so regardless of the level of
income, which seems patently unrealistic; in reality one would expect Pi to be non-linearly
related to Xi.
Therefore, what we need is a (probability) model that has the following two features:
1. As Xi increases, Pi = E(Y = 1/X) increases but never steps outside the 0-1 interval.
2. The relationship between Pi and Xi is non-linear, that is, “ one which approaches zero at
slower and slower rates as Xi gets small and approaches one at slower and slower rates
as Xi gets very large”
Geometrically, the model we want would look something like fig 7.1 below.
[Figure 7.1: an S-shaped curve labeled CDF, bounded below by 0 and above by 1, rising in X.]
The above S-shaped curve closely resembles the cumulative distribution function (CDF) of a
random variable. (Note that the CDF of a random variable X is simply the probability that it
takes a value less than or equal to x0, where x0 is some specified numerical value of X. In
short, F(X), the CDF of X, is F(X = x0) = P(X ≤ x0). Please refer to your statistics for
economists text.)
Therefore, one can easily use the CDF to model regressions where the response variable is
dichotomous, taking 0-1 values.
The CDFs commonly chosen to represent the 0-1 response models are.
a) the logistic – which gives rise to the logit model
b) the normal – which gives rise to the probit (or normit) model
Now let us see how one can estimate and interpret the logit model.
Recall the linear (LPM) representation of home ownership:

Pi = E(Y = 1/Xi) = β0 + β1Xi

where X is income and Y = 1 means the family owns a house. Now consider instead the
following representation of home ownership:

Pi = E(Y = 1/Xi) = 1/(1 + e^−(β0 + β1Xi))

or, writing Zi = β0 + β1Xi,

Pi = 1/(1 + e^−Zi)

This equation represents what is known as the (cumulative) logistic distribution function. It is
non-linear in both X and the β's, which means we cannot use the familiar OLS procedure to
estimate the parameters. It can, however, be linearized as follows.
First,

1 − Pi = 1/(1 + e^Zi)

Therefore,

Pi/(1 − Pi) = (1 + e^Zi)/(1 + e^−Zi) = e^Zi

Now Pi/(1 − Pi) is simply the odds ratio in favor of owning a house: the ratio of the probability
that a family will own a house to the probability that it will not. Taking the natural log of the
odds ratio we obtain

Li = ln[Pi/(1 − Pi)] = Zi = β0 + β1Xi
L (the log of the odds ratio) is linear in X as well as in the β's (the parameters). L is called the
logit, and hence the name logit model. β0, the intercept, tells the value of the log-odds in favor
of owning a house if income is zero; like most interpretations of intercepts, this one may not
have any physical meaning.
For estimation we add a disturbance term and write

Li = ln[Pi/(1 − Pi)] = β0 + β1Xi + Ui

To estimate this model we need values of Xi and Li. Standard OLS cannot be applied since, for
individual observations, the values of L are meaningless (e.g., L = ln(1/0) when Yi = 1 and
L = ln(0/1) when Yi = 0). Therefore estimation is by the maximum likelihood method (because
of its mathematical complexity we will not discuss the method here).
Example: Logit estimates. Assume that Y is linearly related to the variables Xi’s as follows:
Yi = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5 + Ui
Note: Parameters of the model are not the same as the marginal effects we are used to when
analyzing OLS.
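A sketch of logit estimation on simulated data with statsmodels (the coefficients in the data-generating process are assumptions), including average marginal effects, since the raw logit coefficients measure effects on the log-odds rather than on the probability itself:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    n = 1_000
    x = rng.normal(size=(n, 2))                  # two assumed regressors
    z = -0.5 + 1.0 * x[:, 0] - 0.7 * x[:, 1]     # linear index b0 + b'x
    y = rng.binomial(1, 1 / (1 + np.exp(-z)))    # logistic probabilities

    X = sm.add_constant(x)
    fit = sm.Logit(y, X).fit(disp=0)             # maximum likelihood
    print(fit.params)                            # log-odds coefficients
    print(fit.get_margeff().summary())           # average marginal effects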
The estimating model that emerges from the normal CDF is popularly known as the probit
model.
Here the observed dependent variable Y takes on one of the values 0 and 1 using the following
criteria. Define a latent variable Y* such that

Y*i = β0 + β1Xi + εi

with

Yi = 1 if Y*i > 0
Yi = 0 if Y*i ≤ 0
The latent variable Y* is continuous (−∞ < Y* < ∞). It generates the observed binary variable
Y, which can be observed in two states:
i) if an event occurs it takes a value of 1
ii) if an event does not occur it takes a value of 0
The latent variable is assumed to be a linear function of the observed X's through the structural
model.
Example:
Let Y measure whether one is employed or not; it is a binary variable taking values 0 and 1.
Y* measures the willingness to participate in the labor market; it changes continuously and is
unobserved. If X is the wage rate, then as X increases the willingness to participate in the labor
market increases (Y*, the willingness to participate, cannot be observed). The individual's
decision changes (Y becomes zero) if the wage rate falls below a critical point.
However, since the latent dependent variable is unobserved the model cannot be estimated
using OLS. Maximum likelihood can be used instead.
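A compact sketch of this latent-variable setup on simulated data, estimated with statsmodels' Probit (the parameter values are assumptions for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 2_000
    wage = rng.normal(10, 2, n)
    y_star = -8 + 0.9 * wage + rng.normal(size=n)   # latent willingness Y*
    works = (y_star > 0).astype(int)                # only the sign is observed

    fit = sm.Probit(works, sm.add_constant(wage)).fit(disp=0)
    print(fit.params)                               # close to (-8, 0.9)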
Most often, the choice is between normal errors and logistic errors, resulting in the probit
(normit) and logit models, respectively. The coefficients derived from the maximum likelihood
(ML) function will be the coefficients for the probit model, if we assume a normal distribution.
If we assume that the appropriate distribution of the error term is a logistic distribution, the
coefficients that we get from the ML function will be the coefficients of the logit model. In both
cases, as with the LPM, it is assumed that E(εi/Xi) = 0. In the probit model it is assumed that
Var(εi/Xi) = 1, while in the logit model it is assumed that Var(εi/Xi) = π²/3. Hence the estimates
of the parameters (β's) from the two models are not directly comparable.
But as Amemiya suggests, a logit estimate of a parameter multiplied by 0.625 gives a fairly
good approximation of the probit estimate of the same parameter. Similarly the coefficients of
LPM and logit models are related as follows:
βLPM = 0.25 βLogit, except for the intercept
βLPM = 0.25 βLogit + 0.5 for the intercept
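These conversions are rough rules of thumb rather than exact identities; a quick simulated check (assumed logistic data-generating process) illustrates how approximate they are:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 5_000
    x = rng.normal(size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * x))))

    X = sm.add_constant(x)
    b_logit = sm.Logit(y, X).fit(disp=0).params[1]
    b_probit = sm.Probit(y, X).fit(disp=0).params[1]
    b_lpm = sm.OLS(y, X).fit().params[1]
    print(b_probit, 0.625 * b_logit)   # roughly equal
    print(b_lpm, 0.25 * b_logit)       # roughly equal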
7.6 The Tobit Model
An extension of the probit model is the tobit model developed by James Tobin. To explain this
model, let us consider the home ownership example.
Suppose we want to find out the amount of money the consumer spends in buying a house in
relation to his or her income and other economic variables. Now we have a problem. If a
consumer does not purchase a house, obviously we have no data on housing expenditure for
such consumers; we have such data only on consumers who actually purchase a house.
Thus consumers are divided into two groups: one consisting of, say, N1 consumers about whom
we have information on the regressors (say income, interest rate, etc.) as well as the regressand
(amount of expenditure on housing), and another consisting of, say, N2 consumers about whom
we have information only on the regressors but not on the regressand. A sample in which
information on the regressand is available only for some observations is known as a censored
sample, and the tobit model for it can be written as

Yi = β0 + β1X1i + Ui   if RHS > 0
   = 0                 otherwise

where RHS = right-hand side. The method of maximum likelihood can be used to estimate the
parameters of such models.
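Widely used Python libraries do not ship a ready-made tobit routine, so the following is a minimal maximum-likelihood sketch of the censored model above (simulated data, assumed parameters; a serious implementation would add convergence checks and proper standard errors):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(8)
    n = 2_000
    income = rng.normal(20, 5, n)
    y_star = -10 + 0.8 * income + rng.normal(0, 3, n)   # latent expenditure
    y = np.maximum(y_star, 0)                           # observed, censored at 0

    def neg_loglik(theta):
        b0, b1, log_sigma = theta
        sigma = np.exp(log_sigma)        # keeps sigma positive
        mu = b0 + b1 * income
        uncens = y > 0
        ll = np.sum(norm.logpdf(y[uncens], mu[uncens], sigma))   # density part
        ll += np.sum(norm.logcdf(-mu[~uncens] / sigma))          # P(y* <= 0)
        return -ll

    res = minimize(neg_loglik, x0=np.array([0.0, 0.0, 1.0]), method="BFGS")
    print(res.x[:2], np.exp(res.x[2]))   # close to (-10, 0.8) and 3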
7.7 SUMMARY
- Logit function: P(Y = 1/X) = e^(α + βXi)/(1 + e^(α + βXi)) = 1/(1 + e^−(α + βXi))
(obtained by dividing both numerator and denominator by e^(α + βXi))
- Probit function: P(Y = 1/X) = Φ(α + βXi), where Φ(·) is the normal CDF with density
φ(X) = (1/(σ√(2π))) exp(−½((X − μ)/σ)²)
- Latent variable: Y = Y*i if Y*i > 0, and Y = 0 if Y*i ≤ 0
- Similarities and differences between logit and probit models
7.8 ANSWERS TO CHECK YOUR PROGRESS QUESTIONS
Answers to check your progress questions in this unit are already discussed in the text.
7.9 REFERENCES
Contents
8.0 Aims and Objective
8.1 Introduction
8.2 Stationarity and Unit Roots
8.3 Cointegration Analysis and Error Correction Mechanism
8.4 Summary
8.5 Answers to Check Your Progress
8.6 Model Examination
The aim of this unit is to extend the discussion of regression analysis by incorporating a brief
discussion of time series econometrics.
8.1 INTRODUCTION
Recall from our unit one discussion that one of the two important types of data used in empirical
analysis is time series data. Time series data have become so frequently and intensively used in
empirical research that econometricians have recently begun to pay very careful attention to
such data.
In this very brief discussion we first define the concept of stationary time series and then
develop tests to find out whether a time series is stationary. In this connection we introduce
some related concepts, such as unit roots. We then distinguish between trend stationary and
difference stationary time series. A common problem in regressions involving time series data
is the phenomenon of spurious regression; an introduction to this concept will therefore be
made. Finally, the concept of cointegration will be stated and its importance in empirical
research pointed out.
Any time series data can be thought of as being generated by a stochastic or random process. A
type of stochastic process that has received a great deal of attention from time series analysts is
the so-called stationary stochastic process.
Broadly speaking, a stochastic process is said to be stationary if its mean and variance are
constant over time and the value of the covariance between two time periods depends only on
the distance or lag between the two periods and not on the actual time at which the covariance
is computed. A non-stationary series, on the other hand, has no long-run mean to which the
variable returns, and its variance grows without bound as time goes by.
For many time series, however, stationarity is unlikely to hold. If this is the case, the
conventional hypothesis testing procedures based on t, F, chi-square and other tests may be
unreliable.
Studies have developed different mechanisms that enable non-stationary variables to attain
stationarity. It has been argued that if a variable has a deterministic trend (i.e., if the trend is
perfectly predictable rather than variable or stochastic), including a trend variable in the
regression removes the trend component and makes the series stationary. For example, in the
regression of personal consumption expenditure (PCE) on personal disposable income (PDI), if
we observe a very high R², which is typically the case, it may reflect not the true degree of
association between the two variables but simply the common trend present in them; that is,
with time the two variables move together. To avoid such spurious association, the common
practice is to regress PCE on PDI and t (time), the trend variable. The coefficient of PDI
obtained from this regression now represents the net influence of PDI on PCE, having removed
the trend effect. In other words, the explicit introduction of the trend variable in the regression
has the effect of detrending (i.e., removing the influence of trend from) both PCE and PDI.
Such a process is called trend stationary since the deviation from the trend is stationary.
However, most time series data have the characteristic of a stochastic trend (that is, the trend is
variable and therefore cannot be predicted with certainty). In such cases, in order to avoid the
problems associated with spurious regression, pre-testing the variables for the existence of unit
roots (i.e., non-stationarity) becomes compulsory. In general, if a variable has a stochastic trend
it needs to be differenced in order to obtain stationarity. Such a process is called a difference
stationary process.
In this regard, the Dickey-Fuller (DF) test enables us to assess the existence of stationarity. The
simplest DF test starts with the following first order autoregressive model:

Yt = ρYt−1 + Ut ………………………………………..(8.1)

Subtracting Yt−1 from both sides gives

Yt − Yt−1 = ΔYt = ρYt−1 − Yt−1 + Ut
ΔYt = (ρ − 1)Yt−1 + Ut = δYt−1 + Ut

Here the parameter δ = ρ − 1 is used in testing for stationarity. Rejecting the null hypothesis
(H0: δ = 0) implies that the series is stationary: ΔYt is influenced by Yt−1 in addition to Ut, so
Yt does not follow a random walk. Accepting the null hypothesis, by contrast, suggests the
existence of a unit root (non-stationarity).
The DF test has a serious limitation in that it suffers from residual autocorrelation, and it is
inappropriate to use the DF distribution in the presence of autocorrelated errors. To remedy this
weakness, the DF model is augmented with additional lagged first differences of the dependent
variable; the result is the Augmented Dickey-Fuller (ADF) test, whose regression avoids
autocorrelation among the residuals. Incorporating lagged first differences of Yt in (8.3) gives
the following ADF model:

ΔYt = α + λT + δYt−1 + Σ γiΔYt−i + Ut (i = 1, …, k) ………........................….(8.4)
where k is the lag length
Example: Let us illustrate the ADF test using the personal consumption expenditure (PCE) data
of Ethiopia. Suppose that the regression for PCE corresponding to (8.4) gave the following
results:

ΔPCEt = 233.08 + 1.64t − 0.06PCEt−1 + γ1ΔPCEt−1 …………………………(8.5)

For our purpose the important statistic is the τ (tau) statistic of the PCEt−1 variable, whose
critical values are tabulated and are used to test the hypothesis stated earlier. Suppose the
calculated τ value does not exceed the critical table value; in this case we fail to reject the null
hypothesis, which indicates that the PCE time series is not stationary. If it is not stationary,
using the variable at levels will lead to spurious regression results. As has been stated earlier, if
a variable is not stationary at levels we need to conduct the test on the variable in its difference
form. If a variable that is not stationary in levels appears to be stationary after differencing n
times, the variable is said to be integrated of order n, written I(n). Suppose we repeat the
preceding exercise using the first difference of PCE (i.e., ΔPCEt = PCEt − PCEt−1) as the
variable of interest. If the test result allows us to reject the null hypothesis, we conclude that
PCE is integrated of order one, I(1). Note from our discussion that applying OLS to stationary
variables brings about non-spurious results. Therefore, before a regression that makes use of
time series variables is performed, the stationarity of all variables must first be checked.
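A sketch of this testing sequence using statsmodels' adfuller on a simulated random walk (standing in for the PCE series, whose data are not reproduced here): test the level, fail to reject the unit root, then test the first difference:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(9)
    pce = np.cumsum(rng.normal(1.0, 2.0, 200))   # random walk with drift, I(1)

    stat_lvl, p_lvl = adfuller(pce, regression="ct")[:2]         # constant + trend
    stat_dif, p_dif = adfuller(np.diff(pce), regression="c")[:2]
    print(f"levels: p = {p_lvl:.3f} (unit root not rejected)")
    print(f"first difference: p = {p_dif:.3f} (stationary, so the series is I(1))")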
Note that taking the variables in difference form captures only the dynamic interaction among
the variables and carries no information about the long run relationship. However, if variables
that are individually non-stationary share the same trend, this indicates that they have a
stationary linear combination, which in turn implies that the variables are cointegrated, i.e.,
there exists a long run equilibrium relationship among them.
1. Distinguish between a trend stationary process (TSP) and a difference stationary process
(DSP).
Cointegration among the variables reflects the presence of long run relationship in the system.
We need to test for cointegration because differencing the variables to attain stationarity
generates a model that does not show the long run behavior of the variables. Hence, testing for
cointegration is the same as testing for long run relationship.
There are two approaches used in testing for cointegration: i) the Engle-Granger two-step
procedure and ii) the Johansen approach.
Example: Suppose we regress PCE on PDI and obtain the following estimated relationship
between the two:

PCEt = β0 + β1PDIt + Ut ………………………………………(8.6)

To identify whether PCE and PDI are cointegrated (i.e., have a stationary linear combination)
or not, we write (8.6) as follows:

Ut = PCEt − β0 − β1PDIt ....……………………………………(8.7)

The purpose of (8.7) is to find out whether Ut [i.e., the linear combination (PCEt − β0 −
β1PDIt)] is I(0), or stationary. Using the procedure stated in the earlier subunit for testing
stationarity, if we reject the null hypothesis then we say that the variables PCE and PDI are
cointegrated.
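A sketch of the two steps on simulated series (statsmodels' coint wraps the same residual-based test, using the appropriate non-standard critical values):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(10)
    n = 300
    pdi = np.cumsum(rng.normal(1, 1, n))        # I(1) income series
    pce = 5 + 0.9 * pdi + rng.normal(0, 1, n)   # cointegrated by construction

    # Step 1: levels regression (8.6); the residuals are the candidate I(0) series
    u_hat = sm.OLS(pce, sm.add_constant(pdi)).fit().resid

    # Step 2: test those residuals for stationarity; coint() performs both steps
    stat, pvalue, _ = coint(pce, pdi)
    print(f"Engle-Granger p-value: {pvalue:.4f}")   # small: cointegration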
If variables are cointegrated, the regression on the levels of the two variables as in (8.6) is
meaningful (i.e., not spurious), and we do not lose any valuable long term information, as we
would if we used their first differences instead.
In short, provided we check that the residuals are stationary, the traditional regression
methodology that we have learned so far (including t and F tests) is applicable to data involving
time series.
We just showed that PCE and PDI are cointegrated, that is there is a long-term equilibrium
relationship between the two. Of course, in the short run there may be disequilibrium.
Therefore, one can treat the error term in (8.7) as the "equilibrium error". We can use this error
term to tie the short-run behavior of PCE to its long run value. In other words, the presence of
cointegration makes it possible to model the variables (in first differences) through the error
correction model (ECM), in which the one-period lagged value of the residual serves as the
error correction term and its coefficient captures the speed of adjustment to the long run
equilibrium:

ΔPCEt = α0 + α1ΔPDIt + α2Ût−1 + εt ……………………………(8.8)

where Ût−1 is the one period lagged value of the residual from regression (8.6) and εt is the
error term. In (8.8) ΔPDIt captures the short run disturbances in PDI whereas the error
correction term Ût−1 captures the adjustment toward the long run equilibrium. If α2 is
statistically significant (it must lie between 0 and −1), it tells us what proportion of the
disequilibrium in PCE in one period is corrected in the next period.
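A sketch of estimating an ECM such as (8.8) by OLS on simulated cointegrated series (all values illustrative):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    n = 300
    pdi = np.cumsum(rng.normal(1, 1, n))
    pce = 5 + 0.9 * pdi + rng.normal(0, 1, n)

    u_hat = sm.OLS(pce, sm.add_constant(pdi)).fit().resid   # equilibrium error
    d_pce, d_pdi = np.diff(pce), np.diff(pdi)
    X = sm.add_constant(np.column_stack([d_pdi, u_hat[:-1]]))
    print(sm.OLS(d_pce, X).fit().params)
    # The last coefficient (on the lagged residual) should be negative: it is
    # the speed-of-adjustment coefficient alpha2 in (8.8).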
However, the Engle-Granger method is criticized for its failure on some issues that are
addressed by the Johansen approach. Interested readers can find a detailed discussion of this
advanced approach in Harris (1995).
8.4 SUMMARY
In this very brief unit we discussed time series regression analysis. The explanation showed
that most economic time series are non-stationary. Stationarity can be checked by using the
ADF test. Regression of one time series variable on one or more time series variables often can
give spurious results. This phenomenon is known as spurious regression. One way to guard
against it is to find out if the time series are cointegrated. Cointegration of two (or more) time
series suggests that there is a long run or equilibrium relationship between them. The Engle-
Granger or Johansen approach can be used to find out if two or more time series are
cointegrated.

8.5 ANSWERS TO CHECK YOUR PROGRESS
The answers to all questions are found in the discussions under subunits 8.2 and 8.3.