Study e Material
Dr. M. Chitra
Title: BASIC ECONOMETRICS
ISBN: 978-93-95422-76-5
Pages: 115
Price: ₹330
BASIC ECONOMETRICS
Prepared by
Dr. M. CHITRA¹
¹ Dr. M. Chitra is a faculty member in the Department of Econometrics. Qualifications: B.Sc. Mathematics, M.Sc. Mathematical Economics, M.A. Economics, M.Phil. with Applied Econometrics, and Ph.D. in Economics, with national and international experience in teaching and lecturing.
Syllabus- M.A Economics
Basic Econometrics
S.No   Unit     Heading
1      Unit-1   Introduction to Basic Econometrics
2      Unit-2   Simple Linear Regression Model
3      Unit-3   Multiple Linear Regression Model
4      Unit-4   Problems of Single Equation Models
5      Unit-5   Dynamic & Qualitative Regression Models
6      Unit-6   Additional Unit: Gretl Software (not in syllabus)
                Glossary
                Sample Question Papers
BASIC ECONOMETRICS STUDY E MATERIAL
UNIT I
Structure
1.1 Objectives
1.2 Introduction
1.3 Meaning and Definition
1.4 Nature of Econometrics
1.4.1 Objectives of Econometrics
1.4.2 Features of Econometric equations
1.4.3 Econometrics is a separate discipline. Why?
1.4.4 Tools of Econometrics
1.4.5 Raw materials of Econometrics
1.4.6 Methodology of Econometrics
1.4.7 Economic Model Vs Econometric Model
1.4.8 Types of Econometrics
1.5 Scope of Econometrics
1.6 Goals of Econometrics
1.7 Let us sum up
1.8 Unit End Exercise
1.9 Reference Books
1.1 Objectives
After reading the unit you will be able to:
gain insights into the nature of econometrics
understand and be able to articulate, both orally and in writing, the scope of
econometrics in reality and in day-to-day life
1.2 Introduction
Econometric methods are widely used in economic research. Research requires a
variety of techniques, which vary from one subject to another. In recent
decades, increased emphasis has been laid on the development and use of
statistical techniques for the analysis of economic problems. Prof. Ragnar Frisch, a
Norwegian economist and statistician, first named this science “Econometrics” in
1926. Econometrics emerged as an independent discipline studying economic phenomena,
but it gained recognition and attention only after the world war. By 1930, the
necessity of econometric work had become so evident that the “Econometric
Society” was formed. This international association includes practically all the workers in the field. The
society publishes a periodical called “Econometrica”, which disseminates the results of
econometric research. Electronic tools such as computers have stimulated the
use of econometrics in recent times.
assume that economic relationships are exact. On the contrary, econometrics
assumes that economic relationships are not exact but stochastic. Econometric
methods are designed to take into account the random disturbances which create
deviations from the exact behavioural patterns suggested by economic theory and
mathematical economics.
d. Econometrics differs from both mathematical statistics and economic statistics. An
economic statistician gathers empirical data, records or charts them, and then
attempts to describe the pattern of their development over time and to detect
relationships between various economic magnitudes. Economic statistics is mainly
the descriptive aspect of economics. It provides neither explanations of the
development of the various variables nor measurements of the parameters of
economic relationships.
e. Mathematical statistics, by contrast, deals with methods of measurement which
are developed on the basis of controlled experiments in laboratories. Such statistical
methods of measurement are not appropriate for economic relationships, which
cannot be measured on the basis of evidence provided by controlled experiments,
because such experiments cannot be designed for economic phenomena.
Econometrics uses statistical methods after adapting them to the problems of economic
life. These adapted statistical methods are called econometric methods. In particular,
econometric methods are adjusted so that they become appropriate for the measurement of
economic relationships which are stochastic, that is, which include random elements. Hence,
econometrics is a separate discipline.
over a period of time is called time series data. That is, when values of one or more variables
are given for several time periods pertaining to a single economic entity, the data set is called
time series data.
Cross-sectional data are data on one or more variables collected at the same point in
time. These data give information on the variables concerning individual agents at a given
point in time.
Pooled data combine time series and cross-sectional data. That is, pooled data
contain elements of both time series and cross-sectional data.
Forecasting or prediction
Hypothesis testing
To illustrate the preceding steps, let us consider the well-known psychological law of
consumption.
1) Statement of theory or hypothesis
Keynes stated “the fundamental psychological law......is that men (women) are
disposed, as a rule and on average, to increase their consumption as their income increases,
but not as much as the increase in their income”. In short, Keynes postulated that the
marginal propensity to consume (MPC), that is, the rate of change in consumption as a
result of change in income, is greater than zero, but less than one. That is 0<MPC<1.
[Figure: flow diagram of the ESTIMATE step. Inputs: theories, model, assumptions, data, statistical methods. Outputs: estimation of parameters, confidence regions, tests of hypotheses, graphical displays.]
2. Unavailability of data: Even if we know what some of the excluded variables are
and therefore consider a multiple regression rather than a simple regression, we
may not have quantitative information about these variables. It is a common
experience in empirical analysis that the data we would ideally like to have often
are not available. For example, in principle we could introduce family wealth as an
explanatory variable in addition to the income variable to explain family
consumption expenditure. But unfortunately, information on family wealth
generally is not available. Therefore, we may be forced to omit the wealth variable
from our model despite its great theoretical relevance in explaining consumption
expenditure.
3. Core variables versus peripheral variables: Assume in our consumption-income
example that besides income X1, the number of children per family X2, sex X3,
religion X4, education X5, and geographical region X6 also affect consumption
expenditure. But it is quite possible that the joint influence of all or some of these
variables may be so small, and at best nonsystematic or random, that as a practical
matter and for cost considerations it does not pay to introduce them into the model
explicitly. One hopes that their combined effect can be treated as a random variable
u_i.
4. Intrinsic randomness in human behavior: Even if we succeed in introducing all
the relevant variables into the model, there is bound to be some "intrinsic"
randomness in individual Y's that cannot be explained no matter how hard we try.
The disturbances, the u_i's, may very well reflect this intrinsic randomness.
5. Poor proxy variables: Although the classical regression model assumes that the
variables Y and X are measured accurately, in practice the data may be plagued by
errors of measurement. Consider, for example, Keynes's well-known psychological
law of consumption, which regards consumption expenditure (Y^p)
as a function of income (X^p). But since data on these variables are not directly
observable, in practice we use proxy variables, such as current
consumption expenditure (Y) and current income (X), which are observable. Since
the observed Y and X may not equal Y^p and X^p, there is the problem of errors of
measurement. The disturbance term u may in this case then also represent the
errors of measurement. As we will see in a later chapter, if there are such errors of
measurement, they can have serious implications for estimating the regression
coefficients, the β's.
6. Principle of parsimony: Following this principle, we would like to keep our regression model as
simple as possible. If we can explain the behavior of Y "substantially" with two or
three explanatory variables and if our theory is not strong enough to suggest what
other variables might be included, why introduce more variables? Let u_i represent
all other variables. Of course, we should not exclude relevant and important
variables just to keep the regression model simple.
UNIT-II
2.1 Objectives
2.2. Introduction
2.3 Meaning of Simple Regression
2.4 The concept of Population Regression Function (PRF)
2.5 Estimation of parameters using Ordinary Least Square (OLS) method
2.6 The Classical Linear Regression model (CLRM): The assumptions
2.7 Properties of OLS estimators
2.7.1 Property:1 Estimators are linear in parameters
2.7.2 Property: 2 Estimators are unbiased
2.7.3 Property:3 Estimators have a minimum variance
2.7.4 Gauss-Markov theorem
2.8. Goodness of Fit
2.9 Tests of Hypotheses
2.10 Simple solved problems
2.11 Let us sum up
2.12 Unit -End Exercises
2.13 Reference Books
2.1 Objectives
To learn the procedure of estimation by the Ordinary Least Squares method
To explore the valid interpretation of the regression estimates under the assumptions on
the independent variables and the error term
To know the properties of the estimators, with proofs
2.2 Introduction
Economics is the study of allocating scarce resources to meet the
unlimited needs and wants of human beings. In the allocation process of a mixed
economy such as India, the demand for and supply of goods
and services are determined, directly and indirectly, by a few factors. If policy makers and
planners know the significant determinants influencing the demand for
and supply of a particular good or service, it becomes easier to allocate resources
appropriately to satisfy consumers and suppliers, and to study the impact of
policy implementation on the economy. Further, the estimated elasticities of supply and demand
are helpful in understanding the state of an economy. Regression is a tool that supports
micro and macro analysis by identifying the influencing factors, the elasticities and the impact of any
programme, event, policy, etc. Hence, this chapter attempts to explain the
estimation procedure, the assumptions made about the independent variable and the error term,
and the properties of the estimators.
Figure 2.1 shows that for each X there is a population of Y values, which are spread
around the mean of those values. From this diagrammatic explanation, it is clear that each
conditional mean $E(Y \mid X_i)$ is a function of $X_i$, where $X_i$ is a given value of X.
Symbolically, $E(Y \mid X_i) = f(X_i)$
This function is called the Conditional Expectation
Function or Population Regression Function (PRF). If $E(Y \mid X_i)$ is a linear function of $X_i$,
say of the type $E(Y \mid X_i) = \alpha + \beta X_i$, then α and β are unknown parameters, where α is
known as the intercept and β is known as the slope coefficient. The terms regression, regression
equation and regression model will be used synonymously.
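$$\sum e_i^2 = \sum (Y_i - \hat\alpha - \hat\beta X_i)^2 \ \text{must be minimum} \qquad (1)$$
$$\frac{\partial \sum e_i^2}{\partial \hat\alpha} = \frac{\partial \sum e_i^2}{\partial \hat\beta} = 0$$
$$\sum Y_i X_i - \hat\alpha \sum X_i - \hat\beta \sum X_i^2 = 0$$
$$\Rightarrow \sum Y_i X_i = \hat\alpha \sum X_i + \hat\beta \sum X_i^2 \qquad (B)$$
A and B are called the normal equations:
$$\sum Y_i = n\hat\alpha + \hat\beta \sum X_i \qquad (A)$$
$$\sum Y_i X_i = \hat\alpha \sum X_i + \hat\beta \sum X_i^2 \qquad (B)$$
Solving these two equations by the elimination method, one can obtain the values of $\hat\alpha$ and $\hat\beta$. A simpler alternative is as follows.
From (A), $\sum Y_i = n\hat\alpha + \hat\beta \sum X_i$. Dividing both sides by n:
$$\frac{\sum Y_i}{n} = \hat\alpha + \hat\beta \frac{\sum X_i}{n}$$
$$\bar Y = \hat\alpha + \hat\beta \bar X \qquad (2) \quad \{\text{since } \textstyle\sum Y_i/n = \bar Y \text{ and } \sum X_i/n = \bar X\}$$
$$\hat\beta = \frac{\sum (X_i - \bar X)(Y_i - \bar Y)}{\sum (X_i - \bar X)^2} \qquad \text{(Result R1)}$$
i.e., writing $x_i = X_i - \bar X$ and $y_i = Y_i - \bar Y$ for the deviations from the means,
$$\hat\beta = \frac{\sum x_i y_i}{\sum x_i^2}$$
and substituting this value of $\hat\beta$ in (2) we get
$$\bar Y = \hat\alpha + \frac{\sum x_i y_i}{\sum x_i^2}\,\bar X$$
$$\text{i.e., } \hat\alpha = \bar Y - \bar X\,\frac{\sum x_i y_i}{\sum x_i^2} \qquad \text{(Result R2)}$$
Hence, $\hat\alpha = \bar Y - \bar X \dfrac{\sum x_i y_i}{\sum x_i^2}$ and $\hat\beta = \dfrac{\sum x_i y_i}{\sum x_i^2}$.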
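The two results R1 and R2 translate directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `ols_simple` and the income and consumption figures are hypothetical and serve only to illustrate the deviation-form formulas.

```python
def ols_simple(X, Y):
    """Return (alpha_hat, beta_hat) for Y = alpha + beta*X + u, using deviation form."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Deviations from the means: x_i = X_i - X_bar, y_i = Y_i - Y_bar
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sum_xx = sum((xi - x_bar) ** 2 for xi in X)
    beta_hat = sum_xy / sum_xx            # Result R1
    alpha_hat = y_bar - beta_hat * x_bar  # Result R2
    return alpha_hat, beta_hat


# Hypothetical income (X) and consumption (Y) data, for illustration only
X = [80, 100, 120, 140, 160]
Y = [70, 85, 98, 115, 125]
alpha_hat, beta_hat = ols_simple(X, Y)
print(f"alpha_hat = {alpha_hat:.2f}, beta_hat = {beta_hat:.2f}")
```

With these made-up figures the slope works out to 0.7 and the intercept to 14.6, which can be checked by hand from the sums of deviations.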
2.6 The Classical linear Regression model: The assumptions underlying the Method of
Ordinary Least Squares.
Our intention is to study the method of estimation, obtain the values of the unknown
parameters and draw inferences about the true parameters. In constructing the econometric
model, it is essential to specify how the independent variables and the error
term are generated, so that the regression estimates can be validly interpreted.
There are ten assumptions in the context of the two-variable regression model, or simple
regression model. The assumptions are as follows:
1. The regression model is linear in the parameters. That is, $Y_i = \alpha + \beta X_i + u_i$.
2. X is assumed to be nonstochastic; that is, the X values are fixed in repeated sampling.
3. Given the value of $X_i$, the mean or expected value of the random disturbance term $u_i$
is zero. Technically, the conditional mean value of $u_i$ is zero.
4. Given the value of $X_i$, the variance of $u_i$ is the same for all observations. That is, the
conditional variances of $u_i$ are identical. Technically, this is the assumption of
homoscedasticity.
5. Given any two X values, $X_i$ and $X_j$ (i ≠ j), the correlation between $u_i$ and $u_j$
is zero, where i and j are two different observations. Technically, this
is the assumption of no serial correlation or no autocorrelation.
6. The disturbance term $u_i$ and the explanatory variable $X_i$ are uncorrelated. Technically,
there is zero covariance between $u_i$ and $X_i$.
7. The number of observations n must be greater than the number of parameters to be
estimated. Alternatively, the number of observations n must be greater than the
number of explanatory variables.
8. The X values in a given sample must not all be the same. Technically, var(X) must
be a finite positive number.
9. The regression model is correctly specified. In other words, there is no specification bias
or error.
10. There is no perfect linear relationship among the explanatory variables. Technically,
there is no multicollinearity.
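$$\hat\beta = \frac{\sum x_i (Y_i - \bar Y)}{\sum x_i^2} = \frac{\sum x_i Y_i - \bar Y \sum x_i}{\sum x_i^2} = \frac{\sum x_i Y_i}{\sum x_i^2} - \bar Y\,\frac{\sum x_i}{\sum x_i^2}$$
$$= \frac{\sum x_i Y_i}{\sum x_i^2} - 0 \qquad (\because \textstyle\sum x_i = 0)$$
$$\hat\beta = \frac{\sum x_i Y_i}{\sum x_i^2} = \sum w_i Y_i, \quad \text{where } w_i = \frac{x_i}{\sum x_i^2}$$
so $\hat\beta$ is linear in the $Y_i$. The weights satisfy
$$\sum w_i = \frac{\sum x_i}{\sum x_i^2} = 0, \qquad \sum w_i^2 = \frac{\sum x_i^2}{(\sum x_i^2)^2} = \frac{1}{\sum x_i^2}, \qquad \sum w_i x_i = \frac{\sum x_i^2}{\sum x_i^2} = 1$$
Similarly,
$$\hat\alpha = \bar Y - \hat\beta \bar X = \sum Y_i \left(\frac{1}{n} - w_i \bar X\right) = \sum z_i Y_i, \quad \text{where } z_i = \frac{1}{n} - w_i \bar X$$
which is linear in the $Y_i$.
Therefore $\hat\alpha$ and $\hat\beta$ are linear.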
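The linearity property can be checked numerically by building the weights explicitly. The sketch below is illustrative only: it uses the ages of husbands and wives from Illustration 7 of this unit, and the variable names are our own.

```python
# Ages of husbands (X) and wives (Y), taken from Illustration 7 of this unit
X = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
Y = [17, 17, 18, 18, 18, 19, 19, 20, 21, 22]

n = len(X)
x_bar = sum(X) / n
x = [xi - x_bar for xi in X]               # deviations x_i = X_i - X_bar
sxx = sum(v * v for v in x)                # sum of squared deviations

w = [v / sxx for v in x]                   # weights of beta_hat: w_i = x_i / sum(x_i^2)
z = [1 / n - x_bar * wi for wi in w]       # weights of alpha_hat: z_i = 1/n - w_i * X_bar

# Both estimators are weighted sums (linear functions) of the Y_i
beta_hat = sum(wi * yi for wi, yi in zip(w, Y))
alpha_hat = sum(zi * yi for zi, yi in zip(z, Y))

# Properties of the weights used in the proofs
sum_w = sum(w)                                      # should be 0
sum_w2 = sum(wi * wi for wi in w)                   # should be 1 / sxx
sum_wX = sum(wi * Xi for wi, Xi in zip(w, X))       # should be 1
print(beta_hat, alpha_hat, sum_w, sum_w2, sum_wX)
```

The weights depend only on the X values, which is why the estimators are linear in Y once X is treated as fixed.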
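$$E(\hat\beta) = \beta$$
so $\hat\beta$ is an unbiased estimator of β.
Let us take the value of $\hat\alpha$ as
$$\hat\alpha = \bar Y - \hat\beta \bar X$$
Taking expectation on both sides we get
$$E(\hat\alpha) = E(\bar Y - \hat\beta \bar X) = E(\bar Y) - \bar X\,E(\hat\beta) \qquad [\because E(\bar Y) = \alpha + \beta \bar X]$$
$$= \alpha + \beta \bar X - \beta \bar X$$
$$E(\hat\alpha) = \alpha$$
Thus $\hat\alpha$ is an unbiased estimator of α.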
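Unbiasedness is a statement about the average of the estimates over repeated samples, so it can be illustrated by simulation. The sketch below is only an illustration under assumed values: the true parameters `ALPHA`, `BETA`, `SIGMA`, the X values and the number of replications are all hypothetical choices.

```python
import random


def ols_simple(X, Y):
    """Return (alpha_hat, beta_hat) using the deviation-form OLS formulas."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    beta_hat = sxy / sxx
    return y_bar - beta_hat * x_bar, beta_hat


random.seed(0)                        # fixed seed so the run is reproducible
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0    # assumed "true" values, for illustration only
X = list(range(1, 21))                # X fixed in repeated sampling
reps = 5000

sum_alpha = 0.0
sum_beta = 0.0
for _ in range(reps):
    # Draw a fresh sample of disturbances with mean zero and constant variance
    Y = [ALPHA + BETA * xi + random.gauss(0, SIGMA) for xi in X]
    a, b = ols_simple(X, Y)
    sum_alpha += a
    sum_beta += b

mean_alpha = sum_alpha / reps
mean_beta = sum_beta / reps
print(mean_alpha, mean_beta)          # averages should lie close to ALPHA and BETA
```

Individual estimates scatter around the true values, but their averages over many samples settle near α and β, which is what E(α̂) = α and E(β̂) = β assert.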
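$$\operatorname{Var}(\hat\alpha) = \sigma_u^2 \sum \left(\frac{1}{n^2} + \bar X^2 w_i^2 - \frac{2}{n}\bar X w_i\right)$$
$$= \sigma_u^2 \left(\frac{n}{n^2} + \bar X^2 \sum w_i^2 - \frac{2}{n}\bar X \sum w_i\right) \qquad \left(\because \sum w_i = 0,\ \sum w_i^2 = \frac{1}{\sum x_i^2}\right)$$
$$= \sigma_u^2 \left(\frac{1}{n} + \frac{\bar X^2}{\sum x_i^2}\right) = \sigma_u^2\,\frac{\sum x_i^2 + n\bar X^2}{n \sum x_i^2} = \sigma_u^2\,\frac{\sum (X_i - \bar X)^2 + n\bar X^2}{n \sum x_i^2}$$
$$= \sigma_u^2\,\frac{\sum X_i^2 + \sum \bar X^2 - 2\bar X \sum X_i + n\bar X^2}{n \sum x_i^2} \qquad (\because \textstyle\sum \bar X^2 = n\bar X^2)$$
$$= \sigma_u^2\,\frac{\sum X_i^2 - 2n\bar X^2 + 2n\bar X^2}{n \sum x_i^2} \qquad (\text{the second and third terms cancel})$$
$$= \sigma_u^2\,\frac{\sum X_i^2}{n \sum x_i^2} = \frac{\sigma_u^2}{\sum x_i^2} \times \frac{\sum X_i^2}{n}$$
$$\operatorname{Var}(\hat\alpha) = \operatorname{Var}(\hat\beta)\cdot\frac{\sum X_i^2}{n} \qquad \left(\because \frac{\sigma_u^2}{\sum x_i^2} = \operatorname{Var}(\hat\beta)\right)$$
The covariance between the two estimators is
$$\operatorname{Cov}(\hat\alpha, \hat\beta) = E[(\hat\alpha - \alpha)(\hat\beta - \beta)] = E[\bar X(\beta - \hat\beta)(\hat\beta - \beta)]$$
since $\hat\alpha - \alpha = (\bar Y - \hat\beta \bar X) - (\bar Y - \beta \bar X) = -\bar X(\hat\beta - \beta)$ (both $\bar Y$ terms cancel; the mean disturbance $\bar u$ is ignored here because $E[\bar u(\hat\beta - \beta)] = \sigma_u^2 \sum w_i / n = 0$).
$$= -\bar X\,E(\hat\beta - \beta)^2 = -\bar X \operatorname{Var}(\hat\beta) = -\bar X\,\frac{\sigma_u^2}{\sum x_i^2} \qquad \left(\because \operatorname{Var}(\hat\beta) = \frac{\sigma_u^2}{\sum x_i^2}\right)$$
The variance of $\hat\beta$ is obtained as
$$\operatorname{Var}(\hat\beta) = \operatorname{Var}\left(\sum w_i Y_i\right) = \sum w_i^2 \operatorname{Var}(Y_i) \qquad (\because \operatorname{Var}(Y_i) = \operatorname{Var}(u_i) = \sigma_u^2)$$
$$= \sigma_u^2 \sum w_i^2 = \frac{\sigma_u^2}{\sum x_i^2}$$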
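The three formulas derived above can be evaluated for any given set of X values. A minimal sketch follows; the function name `ols_variances`, the X values and the disturbance variance `sigma2` are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def ols_variances(X, sigma2):
    """Return (Var(alpha_hat), Var(beta_hat), Cov(alpha_hat, beta_hat))
    for fixed X values and a known disturbance variance sigma2."""
    n = len(X)
    x_bar = sum(X) / n
    sxx = sum((xi - x_bar) ** 2 for xi in X)          # sum of squared deviations
    var_beta = sigma2 / sxx                            # sigma^2 / sum(x_i^2)
    var_alpha = var_beta * sum(xi ** 2 for xi in X) / n
    cov_ab = -x_bar * var_beta
    return var_alpha, var_beta, cov_ab


# Hypothetical inputs, for illustration only
X = [1, 2, 3, 4, 5]
sigma2 = 4.0
var_alpha, var_beta, cov_ab = ols_variances(X, sigma2)

# Cross-check against the intermediate form sigma^2 * (1/n + X_bar^2 / sum(x_i^2))
check_alpha = sigma2 * (1 / 5 + 3 ** 2 / 10)
print(var_alpha, var_beta, cov_ab, check_alpha)
```

Here Σx² = 10 and ΣX² = 55, so Var(β̂) = 0.4, Var(α̂) = 4.4 and Cov(α̂, β̂) = −1.2; the cross-check shows that the two expressions for Var(α̂) agree.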
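Let $\beta^* = \sum w_i Y_i$ denote any linear unbiased estimator of β, with weights $w_i$ satisfying $\sum w_i = 0$ and $\sum w_i X_i = 1$. Its variance can be written as
$$\operatorname{Var}(\beta^*) = \sigma^2 \sum w_i^2 = \sigma^2 \sum \left[\left(w_i - \frac{x_i}{\sum x_i^2}\right) + \frac{x_i}{\sum x_i^2}\right]^2$$
The bracketed term is of the form $(a+b)^2$, so
$$= \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \sigma^2\,\frac{\sum x_i^2}{(\sum x_i^2)^2} + 2\sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)\frac{x_i}{\sum x_i^2}$$
$$= \sigma^2 \sum \left(w_i - \frac{x_i}{\sum x_i^2}\right)^2 + \frac{\sigma^2}{\sum x_i^2} \qquad (\text{the cross-product term vanishes because } \textstyle\sum w_i = 0 \text{ and } \sum w_i X_i = 1)$$
$$= \frac{\sigma^2}{\sum x_i^2} \quad \text{when } w_i = \frac{x_i}{\sum x_i^2}$$
$$= \operatorname{Var}(\hat\beta)$$
With these weights, the variance of the linear estimator $\beta^*$ is equal to the variance of the least squares estimator
$\hat\beta$. Otherwise $\operatorname{Var}(\beta^*) > \operatorname{Var}(\hat\beta)$. Hence, $\hat\beta$ is the minimum-variance linear unbiased
estimator of β.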
Hence the OLS estimators $\hat\alpha$ and $\hat\beta$ are linear functions of the observations $Y_i$
and are also unbiased estimators of α and β respectively. To prove that $\hat\alpha$ and $\hat\beta$ are
also the best estimators, it is essential to show that among all linear unbiased estimators the
variance of the OLS estimators is the least; the OLS estimators are then BLUE.
Let $\hat{\hat\beta} = \sum c_i Y_i$ be any other linear estimator of β, where $c_i = w_i + d_i$, the $d_i$ being
arbitrary constants, not all zero. For it to be unbiased we require
$$E(\hat{\hat\beta}) = \beta$$
$$E(\hat{\hat\beta}) = E\left(\sum c_i Y_i\right) = E\left[\sum c_i(\alpha + \beta X_i + u_i)\right]$$
$$= E\left[\alpha \sum c_i + \beta \sum c_i X_i + \sum c_i u_i\right]$$
$$= \alpha \sum c_i + \beta \sum c_i X_i + \sum c_i E(u_i)$$
$$E(\hat{\hat\beta}) = \beta \qquad \text{(unbiasedness holds only if } \textstyle\sum c_i = 0,\ \sum c_i X_i = 1 \text{ and } E(u_i) = 0)$$
Rough work: i.e., if $\sum (w_i + d_i) = 0$ and $\sum (w_i + d_i) X_i = 1$
$\Rightarrow \sum d_i = 0$ and $\sum w_i X_i + \sum d_i X_i = 1$, so $\sum d_i X_i = 0$ (since $\sum w_i = 0$ and $\sum w_i X_i = 1$).
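$$\operatorname{Var}(\hat{\hat\beta}) = \sigma_u^2 \sum c_i^2 = \sigma_u^2 \sum (w_i + d_i)^2 = \sigma_u^2 \sum w_i^2 + \sigma_u^2 \sum d_i^2 \qquad (\text{the cross term vanishes because } \textstyle\sum w_i d_i = 0)$$
$$= \operatorname{Var}(\hat\beta) + \sigma_u^2 \sum d_i^2$$
Hence $\operatorname{Var}(\hat{\hat\beta}) = \operatorname{Var}(\hat\beta)$ + a positive quantity
$$\therefore \operatorname{Var}(\hat{\hat\beta}) - \operatorname{Var}(\hat\beta) > 0 \quad \text{(or)} \quad \operatorname{Var}(\hat{\hat\beta}) > \operatorname{Var}(\hat\beta)$$
Thus, the variance of the OLS estimators is the least among all linear unbiased
estimators.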
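The Gauss-Markov argument can be illustrated numerically: take the OLS weights, add any perturbation that keeps the estimator unbiased, and compare the sums of squared weights, which are proportional to the variances. The X values and the perturbation `d` below are hypothetical; `d` was picked so that Σd = 0 and ΣdX = 0.

```python
# Hypothetical regressor values, for illustration only
X = [1, 2, 3, 4, 5]
n = len(X)
x_bar = sum(X) / n
x = [xi - x_bar for xi in X]
sxx = sum(v * v for v in x)

w = [v / sxx for v in x]               # OLS weights w_i = x_i / sum(x_i^2)
d = [0.05, -0.10, 0.05, 0.0, 0.0]      # perturbation with sum(d) = 0 and sum(d*X) = 0
c = [wi + di for wi, di in zip(w, d)]  # weights of the alternative linear estimator

# Unbiasedness conditions for the alternative estimator
sum_c = sum(c)                                     # should be 0
sum_cX = sum(ci * Xi for ci, Xi in zip(c, X))      # should be 1

# Variances are sigma^2 times these sums of squared weights
sum_w2 = sum(wi * wi for wi in w)
sum_d2 = sum(di * di for di in d)
sum_c2 = sum(ci * ci for ci in c)
print(sum_w2, sum_c2)                  # the alternative estimator has the larger value
```

The difference between the two sums equals Σd², so any non-zero perturbation that preserves unbiasedness can only increase the variance.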
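Similarly, for $\hat\alpha$:
Let $\hat{\hat\alpha} = \sum c_i^* Y_i$ be any other linear unbiased estimator of α, where $c_i^* = w_i^* + d_i^*$, the $w_i^* = \frac{1}{n} - w_i \bar X$ being the OLS weights of $\hat\alpha$ and the $d_i^*$ being
arbitrary constants, not all zero.
Then $\hat{\hat\alpha} = \sum c_i^* Y_i$ is linear. To prove unbiasedness, let us take
$$\hat{\hat\alpha} = \sum c_i^*(\alpha + \beta X_i + u_i) = \alpha \sum c_i^* + \beta \sum c_i^* X_i + \sum c_i^* u_i$$
Taking expectation on both sides, with $\sum c_i^* = 1$ and $\sum c_i^* X_i = 0$, we get
$$E(\hat{\hat\alpha}) = \alpha + \sum c_i^* E(u_i) = \alpha + 0 \qquad (\text{since } E(u_i) = 0)$$
$$E(\hat{\hat\alpha}) = \alpha$$
Rough work: $\sum c_i^* = \sum (w_i^* + d_i^*) = 1 + \sum d_i^* = 1 \Rightarrow \sum d_i^* = 0$, where $\sum w_i^* = 1$.
$\sum c_i^* X_i = \sum (w_i^* + d_i^*) X_i = \sum w_i^* X_i + \sum d_i^* X_i = 0 \Rightarrow \sum d_i^* X_i = 0$, since $\sum w_i^* X_i = 0$.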
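$$\operatorname{Var}(\hat{\hat\alpha}) = E\left[\sum c_i^* u_i\right]^2 = E\left[\sum c_i^{*2} u_i^2\right] \qquad (\text{cross-product terms vanish since } E(u_i u_j) = 0 \text{ for } i \ne j)$$
$$= \sigma_u^2 \sum c_i^{*2} = \sigma_u^2 \sum (w_i^* + d_i^*)^2$$
$$= \sigma_u^2 \sum w_i^{*2} + \sigma_u^2 \sum d_i^{*2} + 2\sigma_u^2 \sum w_i^* d_i^*$$
$$= \operatorname{Var}(\hat\alpha) + \text{a positive quantity} \qquad (\because \textstyle\sum w_i^* d_i^* = 0)$$
$$\operatorname{Var}(\hat{\hat\alpha}) > \operatorname{Var}(\hat\alpha)$$
$\hat\alpha$ is a linear unbiased estimator, and the OLS estimators have the least variance; that is, they are
the best linear unbiased estimators (BLUE).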
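$$\hat\beta = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{4747 - 4485}{1879 - 1125} = \frac{262}{754} = 0.35 \text{ (approx.)}$$
and also $\bar X = 15$ and $\bar Y = 59.8$, with $n = 5$.
Since the regression line $\hat Y = \hat\alpha + \hat\beta X$ passes through the means,
$$\bar Y = \hat\alpha + \hat\beta \bar X$$
$$59.8 = \hat\alpha + 0.35 \times 15$$
$$\hat\alpha = 59.8 - 5.25 = 54.55$$
The regression equation will be
$$\hat Y = 54.55 + 0.35X$$
On putting the given values of X, the corresponding values of $\hat Y$ can be calculated as
shown in the table. Now we will test the hypothesis.
Suppose the null hypothesis is β = 0. The formula of t is
$$t = \frac{\hat\beta - \beta}{\sqrt{\sum e_i^2/(n-2)}} \times \sqrt{\sum x_i^2}. \quad \text{Thus, } t = \frac{0.35 \times \sqrt{754}}{\sqrt{23.51/3}} = 0.35 \times 9.81 = 3.433$$
The tabulated value of t at 3 degrees of freedom is 2.353. Since the tabulated value is less than the
calculated value of t, the null hypothesis is rejected and the alternative hypothesis is
accepted. Thus β is different from zero.
$$S_{\hat\beta} = \sqrt{\frac{\sum e_i^2}{(n-2)\sum x_i^2}} = \sqrt{\frac{23.51}{3 \times 754}} = 0.102$$
The confidence interval at the 95% level is $0.35 \pm (3.182 \times 0.102)$, i.e. $0.35 \pm 0.325$.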
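The standard error, the t ratio and the confidence interval of this illustration can be reproduced in a few lines. The sketch below only re-does the arithmetic; the function name `slope_t_and_ci` is our own, and the critical value 3.182 is the one quoted in the text for 3 degrees of freedom.

```python
import math


def slope_t_and_ci(beta_hat, rss, sum_xx, n, t_crit, beta_null=0.0):
    """Standard error, t ratio and confidence interval for the slope
    in a two-variable regression (n - 2 degrees of freedom)."""
    se = math.sqrt(rss / ((n - 2) * sum_xx))      # sqrt(sum e^2 / ((n-2) * sum x^2))
    t_stat = (beta_hat - beta_null) / se
    ci = (beta_hat - t_crit * se, beta_hat + t_crit * se)
    return se, t_stat, ci


# Figures from the illustration: beta_hat = 0.35, sum e^2 = 23.51, sum x^2 = 754, n = 5
se, t_stat, ci = slope_t_and_ci(0.35, 23.51, 754, 5, t_crit=3.182)
print(f"se = {se:.3f}, t = {t_stat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The t ratio is simply the estimate divided by its standard error, so both ways of writing the formula give the same value.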
Illustration: 2 The following table gives the production of steel in different years at a steel
factory. Find the equation $y = a e^{bx}$ expressing the relationship between production and
year.
Year                     2011  2012  2013  2014  2015  2016  2017  2018  2019  2020  2021
Production ('000 tons)   10.2  12.0  13.9  15.9  17.9  20.1  22.7  26.0  29.0  32.5  36.1
Solution: The given equation is $y = a e^{bx}$, where a and b are constants and e is the exponential
constant. Taking logarithms to the base e we have
$\log_e y = \log_e a + bx$. On putting $Y = \log_e y$ and $a_0 = \log_e a$, our equation becomes
$Y = a_0 + bx$. Now the least squares method will be applied to estimate $a_0$ and b.
Year    x    Production y    Y = log_e y ¹ = log10 y × 2.3025    x²     xY
2011   -5    10.2            1.0086 × 2.3025 = 2.3223            25    -11.6115
2012   -4    12.0            1.0792 × 2.3025 = 2.4848            16     -9.9392
2013   -3    13.9            1.1430 × 2.3025 = 2.6318             9     -7.8954
2014   -2    15.9            1.2014 × 2.3025 = 2.7662             4     -5.5324
2015   -1    17.9            1.2529 × 2.3025 = 2.8848             1     -2.8848
2016    0    20.1            1.3032 × 2.3025 = 3.0006             0      0
2017    1    22.7            1.3560 × 2.3025 = 3.1222             1      3.1223
2018    2    26.0            1.4150 × 2.3025 = 3.2580             4      6.5160
2019    3    29.0            1.4624 × 2.3025 = 3.3672             9     10.1016
2020    4    32.5            1.5119 × 2.3025 = 3.4811            16     13.9244
2021    5    36.1            1.5575 × 2.3025 = 3.5861            25     17.9305
Total                        32.9015                            110     13.7315
The two normal equations are
$$\sum Y = n a_0 + b \sum x \qquad \text{(i)}$$
$$\sum xY = a_0 \sum x + b \sum x^2 \qquad \text{(ii)}$$
¹ Since we are given a log table to base 10, to change the base to e we have to multiply the
base-10 logarithm by $\log_e 10$, which is approximately 2.3026 (taken as 2.3025 here).
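The log-linear fit set up above can be carried out directly with natural logarithms, which avoids the conversion from base-10 tables. The following sketch uses the production data of this illustration; because x is measured from the middle year (2016), Σx = 0 and the two normal equations can be solved separately. Variable names are our own, and small differences from the hand-computed totals are due to rounding in the table.

```python
import math

# Data from Illustration 2
years = list(range(2011, 2022))
production = [10.2, 12.0, 13.9, 15.9, 17.9, 20.1, 22.7, 26.0, 29.0, 32.5, 36.1]

x = [yr - 2016 for yr in years]           # time measured from the middle year, so sum(x) = 0
Y = [math.log(p) for p in production]     # Y = log_e y
n = len(x)

# With sum(x) = 0 the normal equations reduce to:
#   sum(Y)  = n * a0        ->  a0 = sum(Y) / n
#   sum(xY) = b * sum(x^2)  ->  b  = sum(xY) / sum(x^2)
a0 = sum(Y) / n
b = sum(xi * Yi for xi, Yi in zip(x, Y)) / sum(xi * xi for xi in x)
a = math.exp(a0)                          # back-transform: a = e^(a0)

print(f"a0 = {a0:.4f}, b = {b:.4f}, a = {a:.2f}")
print(f"Fitted curve: y = {a:.2f} * exp({b:.4f} * x), with x = year - 2016")
```

The slope b is the estimated continuous growth rate of production per year, roughly 12.5 per cent for these data.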
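The estimated relationship is $\hat Y = \hat\alpha + \hat\beta X^*$,
where $X^*$ is the reciprocal of X. By formula
$$\hat\beta = \frac{\sum X^* Y - \frac{\sum X^* \sum Y}{n}}{\sum X^{*2} - \frac{(\sum X^*)^2}{n}} = \frac{3.702 - \frac{1.28 \times 11.3}{4}}{0.4614 - \frac{(1.28)^2}{4}} = \frac{3.702 - 3.616}{0.4614 - 0.4096} = \frac{0.086}{0.0518} = 1.66$$
$$\bar Y = \hat\alpha + 1.66\,\bar X^* \quad \text{or} \quad \hat\alpha = \bar Y - 1.66\,\bar X^*$$
$$\hat\alpha = 2.825 - (1.66 \times 0.32) = 2.825 - 0.531 = 2.294$$
The equation is $Y = 2.294 + 1.66X^*$ or $Y = 2.29 + 1.66\,\dfrac{1}{X}$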
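$$\hat\beta_{yx} = \frac{\sum (X - \bar X)(Y - \bar Y)}{\sum (X - \bar X)^2} = \frac{106.4}{215.4} = 0.49$$
$$\hat\beta_{xy} = \frac{\sum (X - \bar X)(Y - \bar Y)}{\sum (Y - \bar Y)^2} = \frac{106.4}{86.9} = 1.22$$
$$\hat\alpha_{xy} = \bar X - \hat\beta_{xy} \bar Y = 9.31 - (1.22 \times 1.09) = 7.98$$
Thus the estimated regression line of X on Y is
X = 7.98 + 1.22Y
c) When X = 10,
then Y = -3.47 + (0.49 × 10) = 1.43
d) When Y = 1.5,
then X = 7.98 + (1.22 × 1.5) = 9.81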
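where
$$\hat\beta = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{364 - \frac{56 \times 40}{20}}{524 - \frac{56 \times 56}{20}} = \frac{252}{367.2} = 0.686$$
$$\bar X = \frac{\sum X}{n} = \frac{56}{20} = 2.8, \qquad \bar Y = \frac{\sum Y}{n} = \frac{40}{20} = 2$$
$$\hat\alpha = \bar Y - \hat\beta \bar X = 2 - 0.686 \times 2.8 = 2 - 1.921 = 0.079$$
Thus the estimated regression line becomes Y = 0.079 + 0.686X
b) The estimated regression line is $X = \hat\alpha + \hat\beta Y$, where
$$\hat\beta = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum Y^2 - \frac{(\sum Y)^2}{n}} = \frac{364 - \frac{56 \times 40}{20}}{256 - \frac{40 \times 40}{20}} = \frac{252}{256 - 80} = \frac{252}{176} = 1.43$$
$$\hat\alpha = \bar X - \hat\beta \bar Y = 2.8 - 1.43 \times 2 = 2.8 - 2.86 = -0.06$$
Now the estimated regression line becomes
X = -0.06 + 1.43Y
c) When X = 7,
then Y = 0.079 + 0.686 × 7 = 4.881
d) When Y = 3,
then X = -0.06 + 1.43 × 3 = 4.23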
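When only the totals are available, as in this illustration, both regression lines follow from the same raw-sum formula with the roles of the variables swapped. A minimal sketch, with a hypothetical helper name `regression_from_sums`:

```python
def regression_from_sums(n, sum_x, sum_y, sum_xy, sum_xx):
    """Intercept and slope of the regression of y on x, computed from raw sums."""
    slope = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Totals used in the illustration: n = 20, sum X = 56, sum Y = 40,
# sum XY = 364, sum X^2 = 524, sum Y^2 = 256
n, sum_X, sum_Y, sum_XY, sum_XX, sum_YY = 20, 56, 40, 364, 524, 256

# a) Regression of Y on X
a_yx, b_yx = regression_from_sums(n, sum_X, sum_Y, sum_XY, sum_XX)
# b) Regression of X on Y: swap the roles of X and Y
a_xy, b_xy = regression_from_sums(n, sum_Y, sum_X, sum_XY, sum_YY)

print(f"Y = {a_yx:.3f} + {b_yx:.3f} X")
print(f"X = {a_xy:.3f} + {b_xy:.3f} Y")
print(f"Y at X = 7: {a_yx + b_yx * 7:.3f};  X at Y = 3: {a_xy + b_xy * 3:.3f}")
```

Unrounded intermediate values give figures that differ from the hand calculation only in the third decimal place.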
Illustration: 7 The following table gives the ages in years of 10 husbands and their wives:
Age of husband (X) 18 19 20 21 22 23 24 25 26 27
Age of wife (Y) 17 17 18 18 18 19 19 20 21 22
a) Estimate the linear regression of the ages of wives (Y) on the ages of husbands (X).
b) Plot the regression line on the scatter diagram.
c) Are the ages of wives dependent on the ages of their husbands? Use the 5% level of
significance.
d) Estimate the age of the wife whose husband is 28 years old.
where
$$\hat\beta = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \quad \text{and} \quad \hat\alpha = \bar Y - \hat\beta \bar X$$
To solve for $\hat\alpha$ and $\hat\beta$ we shall construct the following table, taking deviations from assumed means:
X     Y     x = X - a (a = 23)    x²     y = Y - a (a = 19)    xy     y²
18    17    -5                    25     -2                    10     4
19    17    -4                    16     -2                     8     4
20    18    -3                     9     -1                     3     1
21    18    -2                     4     -1                     2     1
22    18    -1                     1     -1                     1     1
23    19     0                     0      0                     0     0
24    19    +1                     1      0                     0     0
25    20    +2                     4     +1                     2     1
26    21    +3                     9     +2                     6     4
27    22    +4                    16     +3                    12     9
ΣX = 225, ΣY = 189, Σx = -5, Σx² = 85, Σy = -1, Σxy = 44, Σy² = 25
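$$\bar X = \frac{\sum X}{n} = \frac{225}{10} = 22.5, \qquad \bar Y = \frac{\sum Y}{n} = \frac{189}{10} = 18.9$$
$$\text{Again, } \hat\beta = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{44 - \frac{(-5) \times (-1)}{10}}{85 - \frac{(-5)^2}{10}} = \frac{43.5}{82.5} = 0.527$$
$$\hat\alpha = \bar Y - \hat\beta \bar X = 18.9 - (0.527 \times 22.5) = 7.044$$
The regression line becomes Y = 7.044 + 0.527X
d) When X = 28, putting the value of X in the regression line,
Y = 7.044 + 0.527 × 28 = 21.8. Thus when the age of the husband is 28 years, the estimated age of the wife is
22 years (approx.).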
(c) To test the hypothesis we shall apply the 't' test:
$$t = \frac{(\hat\beta - \beta)\sqrt{\sum x^2}}{\sqrt{\sum e^2/(n-2)}}$$
To determine e we shall construct the following table:
X     Y     Ŷ         e = Y - Ŷ    e²
18    17    16.529     0.471       0.2197
19    17    17.053    -0.053       0.0028
20    18    17.583     0.417       0.1738
21    18    18.110    -0.110       0.0121
22    18    18.637    -0.637       0.4056
23    19    19.164    -0.164       0.0268
24    19    19.691    -0.691       0.4764
25    20    20.218    -0.218       0.0475
26    21    20.745     0.255       0.0650
27    22    21.272     0.728       0.5299
Σe² = 1.9596
Let β = 0
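The whole of Illustration 7, including the t ratio under the null hypothesis β = 0, can be reproduced with a short script. This is a sketch for checking the arithmetic, not the author's worked solution; the critical value 2.306 is the standard two-tailed 5% value of t for 8 degrees of freedom, taken from t tables.

```python
import math

# Ages of husbands (X) and wives (Y) from Illustration 7
X = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
Y = [17, 17, 18, 18, 18, 19, 19, 20, 21, 22]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((xi - x_bar) ** 2 for xi in X)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))

# a) OLS estimates
beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar

# c) Residual sum of squares, standard error and t ratio for H0: beta = 0
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(X, Y)]
rss = sum(e * e for e in residuals)
se_beta = math.sqrt(rss / ((n - 2) * sxx))
t_stat = beta_hat / se_beta
T_CRIT = 2.306   # two-tailed 5% critical value of t with n - 2 = 8 degrees of freedom

# d) Predicted age of the wife when the husband is 28
pred_28 = alpha_hat + beta_hat * 28

print(f"Y = {alpha_hat:.3f} + {beta_hat:.3f} X")
print(f"sum e^2 = {rss:.4f}, t = {t_stat:.2f}, reject H0: {t_stat > T_CRIT}")
print(f"Predicted age of wife at X = 28: {pred_28:.1f}")
```

Because the script keeps full precision, its intercept and residual sum of squares differ slightly from the rounded hand calculation, but the conclusion of the test is the same: the calculated t far exceeds the tabulated value.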
Illustration: 8 The following data were collected from 5 different plants in a certain
industry.
Total cost (Y) 80 44 51 70 61
Production (X) 12 4 6 11 8
Answer the following questions
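a) Estimate a linear total cost function $Y = \alpha + \beta X$ for the industry.
b) What is the economic significance of the estimates of α and β?
c) Estimate the total cost for a level of production of 10.
Solution: Our regression line is $Y = \alpha + \beta X$, where
$$\hat\beta = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} \quad \text{and} \quad \hat\alpha = \bar Y - \hat\beta \bar X$$
$$\bar X = \frac{\sum X}{n} = \frac{41}{5} = 8.2, \qquad \bar Y = \frac{\sum Y}{n} = \frac{306}{5} = 61.2$$
$$\hat\alpha = \bar Y - \hat\beta \bar X = 61.2 - 4.25 \times 8.2 = 26.35$$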
knowing α and β, the entrepreneur can estimate his total cost of production. The accuracy of $\hat\alpha$ and
$\hat\beta$ is necessary for good results.
c) When X = 10, then Y = 26.35 + 4.25 × 10 = 68.85
UNIT III
3.1 Objectives
3.2 Introduction
3.3 Meaning of Multiple Regression
3.4 Assumptions underlying the Method of OLS
3.5 Estimation of Multiple Linear Regression Model
3.6 Properties of OLS estimators
3.6.1 Estimators are linear in parameters
3.6.2 Estimators are unbiased
3.6.3. Estimators have a minimum variance
3.7. Goodness of Fit-R2 and the adjusted R2
3.8 Solved numerical Problems
3.9 Let us sum up
3.10 Unit -End Exercises
3.11 Reference Books
3.1 Objectives
The specific objectives of this chapter for the student are as follows:
To become familiar with multiple regression and its properties
To understand multiple regression for use in practice as a researcher, a field
surveyor or a desk researcher.
3.2 Introduction
This unit attempts to explain the Multiple Regression Model and its properties.
An economic activity is rarely determined by a single factor; it is usually determined by more than one factor.
For example, the everyday demand for early-morning tea or coffee is determined by factors such as
the input cost of its preparation, tastes and preferences, and the prices of competing beverages.
From micro to macro, every economic activity is determined by more than one variable.
Hence it is essential to study this chapter in order to deal with such situations.
Let us start with the theoretical proposition that changes in one variable can be
explained by changes in several other variables. Such a relationship is described in a simple
way by a multiple linear regression equation of the form
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i \qquad (1)$$
where Y denotes the dependent variable, the X's are explanatory variables and u is a
stochastic disturbance term.
c. $E(u_i^2) = \sigma_u^2$
d. $E(u_i u_j) = 0$ for $i \ne j$
e. Each of the explanatory variables is non-stochastic with fixed values in repeated
samples and such that, for any sample size, $\sum_{i=1}^{n} (X_{ki} - \bar X_k)^2 / n$ is a finite number.
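$$\sum_{i=1}^{n} e_i^2 = (Y - X\hat\beta)'(Y - X\hat\beta)$$
or
$$\frac{\partial \sum e_i^2}{\partial \hat\beta} = -2X'Y + 2X'X\hat\beta$$
Setting this derivative equal to zero,
$$2X'X\hat\beta - 2X'Y = 0$$
$$X'X\hat\beta = X'Y$$
Premultiplying both sides by $(X'X)^{-1}$ we have
$$\hat\beta = (X'X)^{-1}X'Y$$
Since $Y = X\beta + u$,
$$\hat\beta = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
Taking expectations,
$$E(\hat\beta) = \beta + (X'X)^{-1}X'E(u)$$
Since $E(u) = 0$,
$$E(\hat\beta) = \beta$$
Thus $\hat\beta$ are unbiased estimators of β.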
3.6.2 ii) The least squares estimators are linear estimators: since the least squares estimators
are linear functions of Y, they are linear.
38
BASIC ECONOMETRICS STUDY E MATERIAL
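3.6.3 iii) The least squares estimators are best estimators: we will now prove that our
estimators are best among all linear unbiased estimators. Since
$$\operatorname{Var}(\hat\beta) = E\left[(\hat\beta - \beta)(\hat\beta - \beta)'\right] = E\left[(X'X)^{-1}X'u\,u'X(X'X)^{-1}\right] \qquad \left(\text{since } \left[(X'X)^{-1}\right]' = (X'X)^{-1}\right)$$
$$= (X'X)^{-1}X'\,E(uu')\,X(X'X)^{-1} = \sigma_u^2 (X'X)^{-1} \qquad (\text{using } E(uu') = \sigma_u^2 I \text{ from assumptions c and d})$$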
Now we shall prove a more general result, of which this is a special case. The more general
result also has applications in prediction problems. Let us consider the relation
$b = (A + B)Y$, where $Y = X\beta + u$, $A = (X'X)^{-1}X'$ and B is a constant matrix.
$$b = (A + B)(X\beta + u) = (A + B)X\beta + (A + B)u$$
Taking expectation of both sides we have
$$E(b) = (A + B)X\beta \qquad \text{since } E(u) = 0$$
$$= AX\beta + BX\beta = \beta + BX\beta \qquad \text{since } AX = I$$
$E(b) = \beta$ only if $BX = 0$.
Thus b is an unbiased estimator when $BX = 0$.
Again, $b = \beta + (A + B)u$, so
$$b - \beta = (A + B)u$$
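$$\operatorname{Var}(b) = E\left[(b - \beta)(b - \beta)'\right] = \sigma_u^2\left[(X'X)^{-1}X'X(X'X)^{-1} + BX(X'X)^{-1} + (X'X)^{-1}X'B' + BB'\right]$$
Since $(X'X)^{-1}X'X = I$ and $BX = 0$,
$$\operatorname{Var}(b) = \sigma_u^2 (X'X)^{-1} + \sigma_u^2 BB' = \operatorname{Var}(\hat\beta) + \sigma_u^2 BB'$$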
$$R^2 = 1 - \frac{\sum \hat u_i^2}{\sum y_i^2}$$
present in the model. Intuitively, it is clear that as the number of X variables increases,
$\sum \hat u_i^2$ is likely to decrease (at least it will not increase); hence R² as defined above will increase.
In view of this, in comparing two regression models with the same dependent variable but
differing number of X variables, one should be very wary of choosing the model with the
highest R2.
To compare two R2 terms, one must take into account the number of X variables
present in the model. This can be done readily if we consider an alternative coefficient of
determination, which is as follows:
$$\bar R^2 = 1 - \frac{\sum \hat u_i^2 / (n - k)}{\sum y_i^2 / (n - 1)}$$
where k = the number of parameters in the model including the
intercept term. (In the three-variable regression, k = 3. Why?) The R² thus defined is known
as the adjusted R², denoted by $\bar R^2$. The term adjusted means adjusted for the degrees of
freedom associated with the sums of squares entering into the formula: $\sum \hat u_i^2$ has n-k degrees of
freedom in a model involving k parameters, which include the intercept term, and $\sum y_i^2$
has n-1 degrees of freedom. (Why?) For the three-variable case, we know that $\sum \hat u_i^2$ has n-3
degrees of freedom. Equivalently,
$$\bar R^2 = 1 - \frac{\hat\sigma^2}{S_Y^2}$$
where $\hat\sigma^2$ is the residual variance, an unbiased estimator of the
true $\sigma^2$, and $S_Y^2$ is the sample variance of Y. It is easy to see that $\bar R^2$ and R² are related
because, substituting the value of R², we obtain
$$\bar R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k}$$
It implies that as the
number of X variables increases, the adjusted R² increases less than the unadjusted R²; and
$\bar R^2$ can be negative, although R² is necessarily nonnegative. In case $\bar R^2$ turns out to be
negative in an application, its value is taken as zero. Which R² should one use in practice?
As Theil notes: ….it is good practice to use $\bar R^2$ rather than R² because R² tends to give an
overly optimistic picture of the fit of the regression, particularly when the number of
explanatory variables is not very small compared with the number of observations.
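The relation between R² and the adjusted R² is easy to check numerically. The sketch below uses hypothetical residual and total sums of squares; the function names are our own, and the figures are chosen only to show the two cases discussed in the text (a moderate penalty, and a negative adjusted R²).

```python
def r_squared(rss, tss):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares about the mean)."""
    return 1 - rss / tss


def adjusted_r_squared(rss, tss, n, k):
    """Adjusted R^2: each sum of squares is divided by its degrees of freedom.
    k counts all parameters, including the intercept."""
    return 1 - (rss / (n - k)) / (tss / (n - 1))


# Hypothetical case 1: a reasonably good fit with n = 12 observations, k = 3 parameters
r2_a = r_squared(20.0, 100.0)
adj_a = adjusted_r_squared(20.0, 100.0, n=12, k=3)
# The same value obtained from the relation 1 - (1 - R^2)(n - 1)/(n - k)
adj_a_check = 1 - (1 - r2_a) * (12 - 1) / (12 - 3)

# Hypothetical case 2: a poor fit with few observations gives a negative adjusted R^2
r2_b = r_squared(95.0, 100.0)
adj_b = adjusted_r_squared(95.0, 100.0, n=6, k=3)

print(f"Case 1: R2 = {r2_a:.3f}, adjusted R2 = {adj_a:.3f}")
print(f"Case 2: R2 = {r2_b:.3f}, adjusted R2 = {adj_b:.3f}")
```

In the first case the adjustment lowers 0.80 to about 0.756; in the second, R² is small but positive while the adjusted value falls below zero.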
$$\sum SY = \hat\beta_0 \sum Y + \hat\beta_1 \sum Y^2 + \hat\beta_2 \sum NY$$
$$\sum SN = \hat\beta_0 \sum N + \hat\beta_1 \sum NY + \hat\beta_2 \sum N^2$$
Family    S     Y     N     Y²     N²    SY     SN    NY
A         6     8     5     64     25    48     30    40
B        12    11     2    121      4   132     24    22
C        10     9     1     81      1    90     10     9
D         7     6     3     36      9    42     21    18
E         3     6     4     36     16    18     12    24
Total    38    40    15    338     55   330     97   113
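On putting values in the normal equations,
$$38 = 5\hat\beta_0 + 40\hat\beta_1 + 15\hat\beta_2 \qquad (1)$$
$$330 = 40\hat\beta_0 + 338\hat\beta_1 + 113\hat\beta_2 \qquad (2)$$
$$97 = 15\hat\beta_0 + 113\hat\beta_1 + 55\hat\beta_2 \qquad (3)$$
On multiplying equation (1) by 8 and subtracting it from equation (2),
$$18\hat\beta_1 - 7\hat\beta_2 = 26 \qquad (4)$$
On multiplying equation (1) by 3 and subtracting it from equation (3), we get $-7\hat\beta_1 + 10\hat\beta_2 = -17$, i.e.
$$7\hat\beta_1 - 10\hat\beta_2 = 17 \qquad (5)$$
On multiplying equation (4) by 10 and (5) by 7,
$$180\hat\beta_1 - 70\hat\beta_2 = 260$$
$$49\hat\beta_1 - 70\hat\beta_2 = 119$$
On subtracting,
$$131\hat\beta_1 = 141 \quad \text{or} \quad \hat\beta_1 = \frac{141}{131} = 1.076$$
On putting the value of $\hat\beta_1$ in equation (5),
$$7 \times 1.076 - 10\hat\beta_2 = 17$$
$$-10\hat\beta_2 = 9.466 \quad \text{or} \quad \hat\beta_2 = -0.9466 \approx -0.947$$
Now the equation will pass through the mean values. So, $\bar S = \hat\beta_0 + \hat\beta_1 \bar Y + \hat\beta_2 \bar N$
$$7.6 = \hat\beta_0 + (8 \times 1.076) + 3 \times (-0.947), \quad \text{so } \hat\beta_0 = 1.833$$
Our estimated relation will be
S = 1.833 + 1.076Y - 0.947N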
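The elimination carried out by hand above is Gaussian elimination on the three normal equations, and it can be automated. The sketch below builds the normal equations from the family data of this illustration and solves them in plain Python; the helper `solve_linear_system` is our own and is only one of several ways to solve such a system.

```python
def solve_linear_system(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]      # augmented matrix
    for col in range(n):
        # Bring the row with the largest pivot into position
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the current variable from the rows below
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


# Family data from the illustration (S regressed on Y and N)
S = [6, 12, 10, 7, 3]
Y = [8, 11, 9, 6, 6]
N = [5, 2, 1, 3, 4]
n = len(S)

sum_YN = sum(y * k for y, k in zip(Y, N))
A = [
    [n,      sum(Y),                 sum(N)],
    [sum(Y), sum(y * y for y in Y),  sum_YN],
    [sum(N), sum_YN,                 sum(k * k for k in N)],
]
rhs = [sum(S), sum(s * y for s, y in zip(S, Y)), sum(s * k for s, k in zip(S, N))]

b0, b1, b2 = solve_linear_system(A, rhs)
print(f"S = {b0:.3f} + {b1:.3f} Y + {b2:.3f} N")
```

The slopes agree with the hand calculation; the intercept comes out as about 1.829 rather than 1.833 because the hand calculation uses slopes rounded to three decimals.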
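Illustration: 2 The following matrix gives the variances and covariances of three variables:
X1 = log food consumption per capita
X2 = log food price
X3 = log disposable income per capita
        X1       X2       X3
X1      7.59     3.12     26.99
X2               29.16    30.80
X3                        133.0
$$(X'X)^{-1} = \frac{1}{2929.64}\begin{bmatrix} 133.0 & -30.80 \\ -30.80 & 29.16 \end{bmatrix} = \begin{bmatrix} \frac{133}{2929.64} & \frac{-30.80}{2929.64} \\ \frac{-30.80}{2929.64} & \frac{29.16}{2929.64} \end{bmatrix} = \begin{bmatrix} 0.045 & -0.010 \\ -0.010 & 0.010 \end{bmatrix}$$
$$\text{Now } X'X_1 = \begin{bmatrix} X_2' \\ X_3' \end{bmatrix} X_1 = \begin{bmatrix} \sum X_1 X_2 \\ \sum X_1 X_3 \end{bmatrix} = \begin{bmatrix} 3.12 \\ 26.99 \end{bmatrix}$$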
Illustration: 3 Three related variates X1, X2, X3 take the following sets of values:-
X1 1 2 3 4 5
X2 2 1 5 4 3
X3 3 1 4 5 2
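a) Show that the regression plane of X1 on X2 and X3 is
18X1 - 17X2 + 10X3 = 33
b) Also test the null hypothesis H0 (β2 = 0) against the alternative hypothesis H1 (β2 ≠ 0) at the 5%
level of significance.
Solution: Let the regression line be
$X_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3$.
We have $\hat\beta = (X'X)^{-1}X'X_1$, where $\hat\beta = \begin{bmatrix} \hat\beta_2 \\ \hat\beta_3 \end{bmatrix}$, $X = \begin{bmatrix} X_2 & X_3 \end{bmatrix}$ and $X' = \begin{bmatrix} X_2' \\ X_3' \end{bmatrix}$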
44
BASIC ECONOMETRICS STUDY E MATERIAL
X1
X 1
15
3 X 2 3, X3 3
n 5
Now we shall construct the following quantities in terms of deviation around the
means
X
x X n
2
15X15
2
2
2
2
2
55 10
5
15X15
x 2
3 55
5
10, x12 10
x1x 3 47 45 2, x 2 x 3 53 45 8
X x2 x 2 x 3 10 8
XX 2 X 2 X3 2
X3 x 2 x 3 x 3 8 10
10 8
XX 100 64 36
8 10
Cofactor of 10=10, cofactor of 8=-8, cofactor of 8=-8, Cofactor of 10=10.
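$(X'X)^{-1} = \frac{1}{36}\begin{bmatrix} 10 & -8 \\ -8 & 10 \end{bmatrix} = \begin{bmatrix} 0.278 & -0.222 \\ -0.222 & 0.278 \end{bmatrix}$
$\hat{\beta} = (X'X)^{-1} X'X_1 = \begin{bmatrix} 0.278 \times 5 - 0.222 \times 2 \\ -0.222 \times 5 + 0.278 \times 2 \end{bmatrix} = \begin{bmatrix} 0.946 \\ -0.554 \end{bmatrix}$
$\hat{\beta}_2 = 0.946$, $\hat{\beta}_3 = -0.554$
$\hat{\beta}_1 = \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$
$\hat{\beta}_1 = 3 - (0.946 \times 3) + (0.554 \times 3) = 1.824$
Regression line of X1 on X2 and X3 is
X1 = 1.824 + 0.946X2 - 0.554X3.
Or we can write it in another way; on multiplying both sides of the equation by 18,
18X1 = 32.83 + 17.028X2 - 9.97X3,
or, allowing for rounding (the unrounded coefficients are 33/18, 17/18 and -10/18), 18X1 = 33 + 17X2 - 10X3,
or 18X1 - 17X2 + 10X3 = 33. (proved)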
b) Test of significance
X1    Fitted X1    e = X1 - Fitted X1    e^2
1      2.054         -1.054              1.111
2      2.216         -0.216              0.047
3      4.338         -1.338              1.790
4      2.838         +1.162              1.350
5      3.554         +1.446              2.091
$\sum e^2 = 6.389$
Applying the 't' test,
$t = \frac{\hat{\beta}_2 - \beta_2}{\sqrt{\frac{\sum e^2}{n-k} \, a_{ii}}}$
where $\beta_2$ = hypothetical parameter, e = error term, k = number of parameters, and
$a_{ii}$ = ith diagonal element in $(X'X)^{-1}$.
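The test in part (b) can be carried through numerically. The following is a minimal sketch, assuming NumPy; it fits the model with an explicit intercept column rather than in deviation form, and all variable names are illustrative.

```python
import numpy as np

X1 = np.array([1, 2, 3, 4, 5], dtype=float)
X2 = np.array([2, 1, 5, 4, 3], dtype=float)
X3 = np.array([3, 1, 4, 5, 2], dtype=float)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(X1), X2, X3])
beta = np.linalg.solve(X.T @ X, X.T @ X1)

resid = X1 - X @ beta
n, k = X.shape                          # n = 5 observations, k = 3 parameters
s2 = resid @ resid / (n - k)            # estimate of the error variance
cov = s2 * np.linalg.inv(X.T @ X)       # variance-covariance matrix of beta
t_beta2 = beta[1] / np.sqrt(cov[1, 1])  # t statistic for H0: beta2 = 0
print(np.round(18 * beta, 3), round(t_beta2, 3))
```

With n - k = 2 degrees of freedom the tabulated two-sided 5% value of t is 4.303. The computed statistic is close to 1, so the null hypothesis that $\beta_2 = 0$ is not rejected on these five observations.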
Illustration:4 The following table shows the weights (X1) to the nearest pound, heights (X2)
to the nearest inch and ages (X3) to the nearest year of 12 boys:-
Weight (X1) Height (X2) Age (X3)
64 57 8
71 59 10
53 49 6
67 62 11
55 51 8
58 50 7
77 55 10
57 48 9
56 52 10
51 42 6
76 61 12
68 57 9
Estimate the least squares regression line to predict the weight of a boy of given height
and age.
Solution: Let the regression line of X1 on X2 and X3 be $X_1 = \beta_1 + \beta_2 X_2 + \beta_3 X_3$.
To determine the values of the parameters, i.e. $\beta_1$, $\beta_2$ and $\beta_3$, we shall construct the following
table:
X1    X2    X3    X1X2    X1X3    X2X3    X1^2    X2^2    X3^2
64    57    8     3648    512     456     4096    3249    64
71    59    10    4189    710     590     5041    3481    100
53    49    6     2597    318     294     2809    2401    36
67    62    11    4154    737     682     4489    3844    121
55    51    8     2805    440     408     3025    2601    64
58    50    7     2900    406     350     3364    2500    49
77    55    10    4235    770     550     5929    3025    100
57    48    9     2736    513     432     3249    2304    81
56    52    10    2912    560     520     3136    2704    100
51    42    6     2142    306     252     2601    1764    36
76    61    12    4636    912     732     5776    3721    144
68    57    9     3876    612     513     4624    3249    81
753   643   106   40830   6796    5779    48139   34843   976   (Totals)
$\bar{X}_1 = \frac{753}{12} = 62.75$, $\bar{X}_2 = \frac{643}{12} = 53.58$, $\bar{X}_3 = \frac{106}{12} = 8.83$
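Expressing the values as deviations from the actual means:
$\sum x_2^2 = \sum X_2^2 - (\sum X_2)^2 / n = 34843 - (643)^2 / 12 = 388.92$
$\sum x_3^2 = \sum X_3^2 - (\sum X_3)^2 / n = 976 - (106)^2 / 12 = 39.67$
$\hat{\beta}_2 = \frac{\sum x_1 x_2 \cdot \sum x_3^2 - \sum x_1 x_3 \cdot \sum x_2 x_3}{\sum x_2^2 \cdot \sum x_3^2 - (\sum x_2 x_3)^2} = \frac{(481.75 \times 39.67) - (144.5 \times 99.17)}{(388.92 \times 39.67) - (99.17)^2} = 0.85$
$\hat{\beta}_3 = \frac{\sum x_1 x_3 \cdot \sum x_2^2 - \sum x_1 x_2 \cdot \sum x_2 x_3}{\sum x_2^2 \cdot \sum x_3^2 - (\sum x_2 x_3)^2} = \frac{(144.5 \times 388.92) - (481.75 \times 99.17)}{(388.92 \times 39.67) - (99.17)^2} = 1.51$
$\hat{\beta}_1 = \bar{X}_1 - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3 = 62.75 - (0.85 \times 53.58) - (1.51 \times 8.83)$
$= 62.75 - 45.54 - 13.33 = 3.88$
Thus the regression line is X1 = 3.88 + 0.85X2 + 1.51X3.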
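The same regression can be fitted directly from the raw data. The sketch below assumes NumPy and uses `numpy.linalg.lstsq`; the array names `weight`, `height` and `age` are illustrative labels for X1, X2 and X3.

```python
import numpy as np

weight = np.array([64, 71, 53, 67, 55, 58, 77, 57, 56, 51, 76, 68], dtype=float)
height = np.array([57, 59, 49, 62, 51, 50, 55, 48, 52, 42, 61, 57], dtype=float)
age = np.array([8, 10, 6, 11, 8, 7, 10, 9, 10, 6, 12, 9], dtype=float)

# Columns: intercept, height (X2), age (X3)
X = np.column_stack([np.ones_like(weight), height, age])
beta, *_ = np.linalg.lstsq(X, weight, rcond=None)
print(np.round(beta, 3))
```

Carried at full precision, the slopes agree with 0.85 and 1.51 to two decimals, while the intercept comes out near 3.65 rather than 3.88. The difference arises only because the hand calculation rounds the slopes before computing the intercept.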
Illustration: 5 From the following data compute the regression line of X1 on X2 and X3.
Year 2011 2012 2013 2014 2015 2016 2017 2018 2019
X1 100 106 107 120 110 116 123 133 137
X2 100 104 106 111 111 115 120 124 126
X3 100 99 110 126 113 103 102 103 98
Where X1= Index of imports of goods and services to U.S.A at constant (2000) prices.
X2=Index of gross U.S.A product at 2000 prices.
X3= Ratio of indices of prices of imports and general U.S.A output respectively.
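Solution: Let the estimated regression line of X1 on X2 and X3 be
$X_1 = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \hat{\beta}_3 X_3$. From the above table we shall first compute the means:
n = 9, $\sum X_1 = 1052$, $\sum X_2 = 1017$, $\sum X_3 = 954$
$\bar{X}_1 = \frac{\sum X_1}{n} = \frac{1052}{9} = 116.9$; $\bar{X}_2 = \frac{1017}{9} = 113$; $\bar{X}_3 = \frac{954}{9} = 106$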
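$\sum X_1 X_2 = 119750$, $\sum X_1 X_3 = 111433$, $\sum X_2 X_3 = 107690$
$\sum X_1^2 = 124228$, $\sum X_2^2 = 115571$, $\sum X_3^2 = 101772$
Now we shall compute these in terms of deviations from the actual means:
$\sum x_1 x_2 = \sum X_1 X_2 - \frac{\sum X_1 \cdot \sum X_2}{n} = 119750 - \frac{1052 \times 1017}{9} = 874$
$\sum x_1 x_3 = \sum X_1 X_3 - \frac{\sum X_1 \cdot \sum X_3}{n} = 111433 - \frac{1052 \times 954}{9} = -79$
$\sum x_2 x_3 = \sum X_2 X_3 - \frac{\sum X_2 \cdot \sum X_3}{n} = 107690 - \frac{1017 \times 954}{9} = -112$
$\sum x_1^2 = \sum X_1^2 - \frac{(\sum X_1)^2}{n} = 124228 - \frac{1052 \times 1052}{9} = 1260.89$
$\sum x_2^2 = \sum X_2^2 - \frac{(\sum X_2)^2}{n} = 115571 - \frac{1017 \times 1017}{9} = 650$
$\sum x_3^2 = \sum X_3^2 - \frac{(\sum X_3)^2}{n} = 101772 - \frac{954 \times 954}{9} = 648$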
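We know that $\hat{\beta} = (X'X)^{-1} X'X_1$, where $X = [x_2 \; x_3]$ and $X' = \begin{bmatrix} x_2 \\ x_3 \end{bmatrix}$.
Now $X'X = \begin{bmatrix} x_2 \\ x_3 \end{bmatrix} [x_2 \; x_3] = \begin{bmatrix} \sum x_2^2 & \sum x_2 x_3 \\ \sum x_2 x_3 & \sum x_3^2 \end{bmatrix} = \begin{bmatrix} 650 & -112 \\ -112 & 648 \end{bmatrix}$
$|X'X| = (650 \times 648) - (112 \times 112) = 408656$
Cofactor of 650 = 648
Cofactor of -112 = 112
Cofactor of -112 = 112
Cofactor of 648 = 650
(the sign for a cofactor is $(-1)^{i+j}$, where i = row number and j = column number)
Adjoint of $X'X = \begin{bmatrix} 648 & 112 \\ 112 & 650 \end{bmatrix}$
Since Inverse = Adjoint / Determinant,
$(X'X)^{-1} = \frac{1}{408656}\begin{bmatrix} 648 & 112 \\ 112 & 650 \end{bmatrix} = \begin{bmatrix} \frac{648}{408656} & \frac{112}{408656} \\ \frac{112}{408656} & \frac{650}{408656} \end{bmatrix}$
(since $\lambda[A] = [\lambda A]$, where A is a matrix and $\lambda$ is a scalar)
$(X'X)^{-1} = \begin{bmatrix} 0.00158 & 0.00027 \\ 0.00027 & 0.00159 \end{bmatrix}$, $X'X_1 = \begin{bmatrix} \sum x_2 x_1 \\ \sum x_3 x_1 \end{bmatrix} = \begin{bmatrix} 874 \\ -79 \end{bmatrix}$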
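The final multiplication $\hat{\beta} = (X'X)^{-1} X'X_1$ and the intercept can be obtained as in the following sketch. It assumes NumPy, works in deviation form as the solution does, and uses illustrative variable names.

```python
import numpy as np

X1 = np.array([100, 106, 107, 120, 110, 116, 123, 133, 137], dtype=float)
X2 = np.array([100, 104, 106, 111, 111, 115, 120, 124, 126], dtype=float)
X3 = np.array([100, 99, 110, 126, 113, 103, 102, 103, 98], dtype=float)

# Deviations from the means
x1, x2, x3 = X1 - X1.mean(), X2 - X2.mean(), X3 - X3.mean()

XtX = np.array([[x2 @ x2, x2 @ x3],
                [x2 @ x3, x3 @ x3]])
Xty = np.array([x2 @ x1, x3 @ x1])

slopes = np.linalg.solve(XtX, Xty)     # estimates of beta2 and beta3
# The fitted plane passes through the point of means
intercept = X1.mean() - slopes @ np.array([X2.mean(), X3.mean()])
print(np.round(slopes, 4), round(intercept, 2))
```

On these data the sketch gives slopes of roughly 1.364 on X2 and 0.114 on X3, with an intercept of about -49.3.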
ˆ X X X X
1
B.
ˆ X Y X X
1
C.
ˆ X Y Y 'Y
1
D.
UNIT 4
Structure
4.1 Objectives
4.2 Introduction
4.3 Violation of OLS assumptions
4.4 Multicollinearity
4.4.1 Meaning and types
4.4.2 Causes, Consequences
4.4.3 Detection and Remedial Measures
4.5 Heteroscedasticity
4.5.1 Meaning
4.5.2 Causes, Consequences
4.5.3 Detection and Remedial Measures
4.6 Autocorrelation
4.6.1 Meaning
4.6.2 Causes, Consequences
4.6.3 Detection and Remedial Measures
4.7 Specifications
4.7.1 Meaning, Reasons and types
4.7.2 Causes, Consequences, Tests
4.8 Let us sum up
4.9 Unit End Exercise
4.10 Reference Books
4.1 Objectives
After going through the unit you will be able to:
Understand the concepts and issues of violation of assumptions
Sense the causes and consequences of violation of assumptions.
Know the method of detection and remedies for violation of assumptions.
4.2 Introduction
Econometric models are constructed by introducing the random variable 'Ui' to take
into account the influence of various errors, such as (a) errors of omitted variables, (b)
errors in the mathematical form of the model, (c) errors of measurement of the dependent
variable and (d) the effects of the erratic element which is inherent in human behaviour.
We studied the role of the random variable 'Ui' and the reasons for introducing it into the
model in Unit 1 and Unit 2, under assumptions. Further, to get valid and representative
results one must be familiar with the expected consequences of the non-fulfilment of an
assumption. Hence, in this unit we discuss the causes, consequences, detection and
remedies that apply when any one of the basic assumptions is violated.
4.4 Multicollinearity
The classical linear regression model (CLRM) assumes that there is no multicollinearity
among the regressors included in the regression model.
4.4.1. Meaning:
Multicollinearity refers to the existence of more than one exact linear relationship, and
collinearity refers to the existence of a single linear relationship. Originally,
multicollinearity meant the existence of a “perfect”, or exact, linear relationship among
some or all explanatory variables of a regression model.
The classical linear regression model assumes that there is no multicollinearity among
the explanatory variables (Xs). The reasoning is this: if multicollinearity is perfect, the
regression coefficients of the explanatory variables are indeterminate and their standard
errors are infinite. If multicollinearity is less than perfect, the regression coefficients
possess large standard errors, which means the coefficients cannot be estimated with
great precision or accuracy.
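The two cases can be illustrated with a small numerical sketch, assuming NumPy. The data are made up for the purpose, and the variance inflation factor (VIF) used here is a common diagnostic that this section does not itself introduce; for two regressors it equals 1 / (1 - r^2), where r is their correlation.

```python
import numpy as np

x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x3_perfect = 2.0 * x2                      # exact linear function of x2
x3_near = x3_perfect + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])

def vif(a, b):
    """Variance inflation factor 1 / (1 - r^2) for a pair of regressors."""
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 / (1.0 - r ** 2)

# Perfect case: the design matrix loses rank, so X'X cannot be inverted
X_perfect = np.column_stack([np.ones_like(x2), x2, x3_perfect])
rank_perfect = np.linalg.matrix_rank(X_perfect)

# Near-perfect case: X'X is invertible but coefficient variances are inflated
vif_near = vif(x2, x3_near)
print(rank_perfect, round(vif_near, 1))
```

A rank of 2 for a matrix with three columns corresponds to the indeterminate coefficients of the perfect case, while a very large VIF corresponds to the large standard errors of the less-than-perfect case.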
Types of Multicollinearity
There are four types of multicollinearity:
1. High Multicollinearity: It signifies a high or strong correlation between two or more
independent variables, but not a perfect one.
2. Perfect Multicollinearity: This degree of collinearity indicates an exact linear
relationship between two or more independent variables.
3. Data-based Multicollinearity: The possibility of collinearity, in this case, arises out
of the selected dataset.
4. Structural Multicollinearity: This issue arises when researchers have a poorly
designed framework for the regression analysis.
4.5 Heteroscedasticity
The classical linear regression model assumes that the variance of each disturbance
term Ui is the same for all observations. That is, the conditional variances of Ui are identical;
symbolically, $Var(U_i \mid X_i) = \sigma^2$, where Var stands for variance. This is the assumption of
homoscedasticity, or equal variance. If this assumption is violated, heteroscedasticity
arises.
4.5.1 Meaning
Heteroscedasticity means that, for given values of the X's, the variance of each disturbance
term Ui is not a constant number equal to $\sigma^2$; instead, $Var(U_i \mid X_i) = \sigma_i^2$.
Consequences of Heteroscedasticity
If the assumption of homoscedastic disturbances is not fulfilled, we have the following
consequences:
1) We cannot apply the formulae of the variances of the coefficients to conduct tests of
significance and construct confidence intervals. The tests are inapplicable.
where $v_i$ is the stochastic disturbance term. Since $\sigma_i^2$ is generally not known, Park suggests
using $\hat{U}_i^2$ as a proxy and running the following regression: $\log \hat{U}_i^2 = \log \sigma^2 + \beta \log X_i + v_i$.
If $\beta$ turns out to be statistically significant, it would suggest that heteroscedasticity is
present in the data.
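The two steps of the Park test can be sketched as follows. The example assumes NumPy and uses simulated data in which the standard deviation of the disturbance is proportional to X, so the slope of the Park regression should be near 2; the helper `ols` and all other names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1.0, 10.0, size=n)
u = rng.normal(0.0, 1.0, size=n) * x       # sd of u is proportional to x
y = 2.0 + 0.5 * x + u

def ols(X, y):
    """Return OLS coefficients, their standard errors and the residuals."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid

# Step 1: run the original regression and keep the residuals
X = np.column_stack([np.ones(n), x])
_, _, resid = ols(X, y)

# Step 2: regress log(residual^2) on log(x) and inspect the slope
Z = np.column_stack([np.ones(n), np.log(x)])
park_beta, park_se, _ = ols(Z, np.log(resid ** 2))
t_slope = park_beta[1] / park_se[1]
print(round(park_beta[1], 2), round(t_slope, 1))
```

A slope that is large relative to its standard error points to heteroscedasticity, as the text describes. With homoscedastic data the slope would be expected to be close to zero.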
Remedial measures
The presence of heteroscedasticity does not destroy the unbiasedness and consistency
properties of the OLS estimators, but they are no longer efficient, not even asymptotically.
Therefore remedial measures are needed to solve the problem of heteroscedasticity.
There are two approaches to remediation: when $\sigma_i^2$ is known and when $\sigma_i^2$ is not
known.
(a) When $\sigma_i^2$ is known, the problem of heteroscedasticity is corrected by means of
weighted least squares, for the estimators thus obtained are BLUE.
(b) When $\sigma_i^2$ is not known, then use the data transformation method (1) based on the
The remedial measures are based on the knowledge one has about the nature of
interdependence among the disturbances, that is, knowledge about the structure of
autocorrelation. The remedial measures can be grouped into two cases: when $\rho$ is known
and when $\rho$ is not known. The structure usually assumed is the Markov first-order
autoregressive scheme, known as the AR(1) scheme. This scheme assumes that the
disturbance in the current time period is linearly related to the disturbance term in the
previous time period, the coefficient of autocorrelation $\rho$ providing the extent of the
interdependence.
If the value of $\rho$ is known, the problem of autocorrelation can be removed by transforming
the data with it (generalized differencing). If $\rho$ is not known, it can be estimated using the
Durbin-Watson d statistic, the Theil-Nagar modified d, or the Cochrane-Orcutt (C-O) iterative procedure.
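The Cochrane-Orcutt idea of alternating between estimating $\rho$ from the residuals and re-estimating the coefficients on quasi-differenced data can be sketched as below. This is a simplified illustration on simulated data, assuming NumPy; the true values (intercept 1, slope 2, $\rho$ = 0.7) and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho_true = 500, 0.7
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho_true * u[t - 1] + e[t]      # AR(1) disturbance
y = 1.0 + 2.0 * x + u

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.column_stack([np.ones(n), x])
beta = ols(X, y)                            # start from plain OLS
rho = 0.0
for _ in range(20):
    resid = y - X @ beta
    # Regress e_t on e_{t-1} (no intercept) to update rho
    rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
    # Quasi-difference the data and re-estimate the coefficients;
    # the transformed intercept column equals (1 - rho), so beta[0]
    # remains the intercept of the original model
    y_star = y[1:] - rho_new * y[:-1]
    X_star = X[1:] - rho_new * X[:-1]
    beta = ols(X_star, y_star)
    converged = abs(rho_new - rho) < 1e-6
    rho = rho_new
    if converged:
        break
print(round(rho, 2), np.round(beta, 2))
```

When $\rho$ is known in advance, the loop is unnecessary: the same quasi-differencing is applied once with the known value.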
4.7.1 Meaning
In simple words, specification error means the error that occurs because of a mistake in
the inclusion or exclusion of variables or in the assumptions of the model.
4.7.2 Consequences
The inclusion of irrelevant variable(s) does not affect the relationship between the other
variables and the dependent variable, because the expected value of the coefficient of such
a variable is zero. The estimates from a model that includes an irrelevant variable are
unbiased and consistent. However, the estimates are not efficient, because their variances
are larger than they would have been in the model excluding the irrelevant variable. The
estimators therefore no longer possess the 'BLUE' property, because they are inefficient.
If the specification error is due to a qualitative change in one or more explanatory
variables, the estimates will also be biased.
Another sort of specification error arises when the functional relationship is incorrect.
The magnitude of the bias will depend upon the size of the coefficients. Thus the estimated
parameters will be biased if we calculate them without taking the errors committed into
account.
(b) there is no linear relationship among the explanatory variables X's (multicollinearity),
are violated. The causes of violation, what happens to the estimates, how to identify the
factors behind the violation of an assumption, and the remedial measures that resolve it
were studied in a descriptive rather than an empirical way.
UNIT V
5.1 Objectives
5.2 Introduction
5.3 Lag and Reasons for introducing Lag
5.4 DL, AR, MA
5.5. Adhoc Estimation drawbacks
5.6 Koyck approach and feature
5.7 Dummy variable
5.7.1 Meaning of Dummy variable
5.7.2 Nature of Dummy variable
5.7.3 Types of Dummy variable
5.7.4. Caution in use of Dummy variable
5.8 ANOVA and ANCOVA
5.8.1 Meaning of ANOVA
5.8.2. Types of ANOVA
5.8.3 Advantages and Disadvantages of ANOVA
5.8.4 Meaning of ANCOVA
5.8.5. Assumptions of ANCOVA
5.8.6 Advantages and Disadvantages of ANCOVA
5.8.7 Comparison of ANOVA and ANCOVA
5.9 Regression on qualitative dependent variables
5.10 Let us sum up
5.11 Unit End Exercise
5.12 Reference Books
5.1. Objectives
To learn and apply the knowledge in real data set by construction of econometric
modeling with dummies
To understand the lag and its reason for introduction in analysis
To study the ANOVA and ANCOVA
To learn regression on qualitative independent variables and qualitative
dependent variables
5.2 Introduction
Econometric research incorporates many economic variables. Some of them are
quantifiable or measurable, while some variables are qualitative and hence are not
measurable directly. In general, the explanatory variables in any regression analysis are
assumed to be quantitative in nature. For example, variables like temperature, distance
and age are quantitative in the sense that they are recorded on a well-defined scale. But in
reality not all variables in an economic activity are measurable. For such qualitative
variables, this unit shows how to find their influence on quantitative and qualitative
variables. In this unit, readers will also learn how lagged variables are used to study the
implications of past periods and shocks. Further, readers learn how qualitative variables
are used as independent and dependent variables in a regression analysis, as dummies.
brands. Moreover, they may hesitate to buy in the expectation of further decline in
price or innovations.
3. Institutional reasons. These reasons also contribute to lags. For example,
contractual obligations may prevent firms from switching from one source of labor
or raw material to another. As another example, those who have placed funds in
long-term savings accounts for fixed durations such as one year, three years, or
seven years are essentially “locked in” even though money market conditions may
be such that higher yields are available elsewhere. Similarly, employers often give
their employees a choice among several health insurance plans, but once a choice is
made, an employee may not switch to another plan for at least one year. Although
this may be done for administrative convenience, the employee is locked in for one
year.
For psychological, technological, and institutional reasons, a regressand may respond to a
regressor(s) with a time lag. Regression models that take into account time lags are known
as dynamic or lagged regression models. There are two types of lagged models:
distributed-lag and autoregressive. In the former, the current and lagged values of
regressors are explanatory variables. In the latter, the lagged value(s) of the regressand
appears as an explanatory variable(s).
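The two types of lagged model described above can be written out as follows. This is a sketch in conventional textbook notation; the symbols $\alpha$, $\beta_j$, $\gamma$ and the lag length $k$ are assumptions made for illustration and are not defined in this section.

```latex
% Distributed-lag model: current and lagged values of the regressor X
Y_t = \alpha + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \dots + \beta_k X_{t-k} + u_t

% Autoregressive model: a lagged value of the regressand Y among the regressors
Y_t = \alpha + \beta X_t + \gamma Y_{t-1} + u_t
```

In the first equation $\beta_0$ measures the immediate response of Y to a unit change in X, and the sum of the $\beta_j$ measures the total response once all lags have worked through.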
2. The appearance of Yt−1 is likely to create some statistical problems. Yt−1, like Yt, is
stochastic, which means that we have a stochastic explanatory variable in the
model. Recall that the classical least-squares theory is predicated on the assumption
that the explanatory variables either are nonstochastic or, if stochastic, are
distributed independently of the stochastic disturbance term. Hence, we must find
out if Yt−1 satisfies this assumption.
3. In the original model the disturbance term was ut, whereas in the transformed
model it is vt = (ut − λut−1). The statistical properties of vt depend on what is
assumed about the statistical properties of ut, for, as shown later, if the original ut's
are serially uncorrelated, the vt's are serially correlated. Therefore, we may have to
face up to the serial correlation problem in addition to the stochastic explanatory
variable Yt−1.
4. The presence of lagged Y violates one of the assumptions underlying the Durbin–
Watson d test. Therefore, we will have to develop an alternative to test for serial
correlation in the presence of lagged Y. One alternative is the Durbin h test.
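The claim in point 3, that serially uncorrelated ut's produce serially correlated vt's, can be checked by simulation. The sketch below assumes NumPy and an illustrative value λ = 0.5. Since vt = ut − λut−1 is a first-order moving average, its first-order autocorrelation should be close to −λ/(1 + λ²).

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.5                                  # hypothetical Koyck decline rate
u = rng.normal(size=100_000)               # serially uncorrelated disturbances
v = u[1:] - lam * u[:-1]                   # transformed disturbance v_t

# Sample first-order autocorrelation of v
r1 = np.corrcoef(v[1:], v[:-1])[0, 1]
theory = -lam / (1 + lam ** 2)             # value implied by the MA(1) structure
print(round(r1, 3), round(theory, 3))
```

The sample autocorrelation of v is clearly non-zero even though u itself has none, which is the serial correlation problem referred to above.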
Autoregressiveness poses estimation challenges; if the lagged regressand is correlated
with the error term, OLS estimators of such models are not only biased but also
inconsistent. Bias and inconsistency are the case with the Koyck and the adaptive
expectations models; the partial adjustment model is different in that it can be consistently
estimated by OLS despite the presence of the lagged regressand.
To estimate the Koyck and adaptive expectations models consistently, the most
popular method is the method of instrumental variables. The instrumental variable is
a proxy variable for the lagged regressand but with the property that it is uncorrelated
with the error term.
An alternative to the lagged regression models just discussed is the Almon polynomial
distributed-lag model, which avoids the estimation problems associated with
the autoregressive models. The major problem with the Almon approach, however, is
that one must prespecify both the lag length and the degree of the polynomial. There are
both formal and informal methods of resolving the choice of the lag length and the degree
of the polynomial.
should be included among the explanatory variables, or the regressors. Since such variables
usually indicate the presence or absence of a “quality” or an attribute.
during war period; consumption expenditure might also change during war
period. Similarly, there might be a temporary change in the relation during
different seasons, periods or even during different political regimes.
(ii) Spatial Effects: Sometimes economic functions change with a change in
country, economic structure or other regional differences. For example, a
consumption function for the U.S.A includes some variables, but when this
consumption function is applied to the Indian population, necessary
corrections should be made beforehand. The reason is that the behaviour
pattern of American consumers will certainly be different from that of
their Indian counterparts. They would also be facing an environment
different from that of Indian consumers. Thus a consumption function for
India may include the effects of a different economic setting.
(iii) Qualitative Variable's Effects: Economic behaviour is also influenced by
qualitative phenomena such as sex, occupation, social status, marital status
etc. For example, the consumption pattern of a newly married couple is bound
to be different from that of an elderly couple. Similarly, the expenditure of
white-collar workers might be different from that of manual labourers.
Thus these effects must be incorporated in the estimation process.
The effect of all the above causes can be incorporated into our regression model by the
specification of appropriate dummy variables. In practice we find several types of models
containing dummy variables.
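As an illustration of how such an effect enters the regression, the sketch below adds a wartime dummy to a consumption function. It assumes NumPy; the data are constructed without noise from hypothetical coefficients (intercept 10, marginal propensity to consume 0.8, wartime shift of -5) so that the estimates can be compared with known values.

```python
import numpy as np

income = np.array([100, 110, 120, 130, 140, 150, 160, 170], dtype=float)
war = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=float)   # 1 = war year, 0 = peace year

# Hypothetical consumption: the intercept shifts down by 5 in war years
consumption = 10.0 + 0.8 * income - 5.0 * war

# Columns: intercept, income, wartime dummy
X = np.column_stack([np.ones_like(income), income, war])
coef, *_ = np.linalg.lstsq(X, consumption, rcond=None)
print(np.round(coef, 3))
```

The coefficient on the dummy measures how far the intercept for war years lies from that of the benchmark (peace) years, with the slope on income common to both periods.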
5. If a qualitative variable has more than one category, the choice of the benchmark
category is strictly up to the researcher. Sometimes the choice of the benchmark is
dictated by the particular problem at hand.
6. If a model has several qualitative variables with several classes, introduction of
dummy variables can consume a large number of degrees of freedom. Therefore,
one should always weigh the number of dummy variables to be introduced against
the total number of observations available for analysis.
Disadvantages of ANOVA
1. It often happens that the parent populations do not follow the normal distribution.
For example, the lifetimes of products generally follow the Weibull distribution. In
such cases the ANOVA method cannot be used. For instance, we may not be able to
use the ANOVA technique to compare the mean life of bulbs produced by three
companies.
2. If there are two or more dependent variables then the ANOVA technique cannot be
applied. The MANOVA test must be used in such cases.
3. It rarely happens that all the population variances are equal. If the assumption of
homoscedasticity is violated then the use of ANOVA cannot be justified.
4. If the null hypothesis is rejected we can only conclude that some population means
are unequal. The ANOVA test does not tell us anything about which of them are
unequal. Some post hoc tests must be carried out in order to know about that.
5. Checking all the background assumptions such as independence, normality,
homoscedasticity, etc. is in and of itself a difficult task.
6. Although the calculations involved are elementary, they are still tedious to perform
by hand. But ANOVA tests are usually carried out using statistical software, so this is
not a huge barrier.
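The arithmetic behind a one-way ANOVA is short enough to sketch directly. The example assumes NumPy and three small made-up groups; it computes the between-group and within-group sums of squares and the resulting F ratio.

```python
import numpy as np

# Three hypothetical groups of three observations each
groups = [np.array([1.0, 2.0, 3.0]),
          np.array([2.0, 3.0, 4.0]),
          np.array([5.0, 6.0, 7.0])]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k, n = len(groups), len(all_obs)

# Variation of the group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(ss_between, ss_within, f_stat)
```

The F ratio is then compared with the tabulated F value for (k - 1, n - k) degrees of freedom. As point 4 notes, a significant F only says that some means differ, not which ones.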
b. ANCOVA is more precise than blocking if the correlation between the covariate
and the criterion is greater than .6. Remember, ANCOVA not only reduces bias, but
it also improves sensitivity.
Disadvantages of ANCOVA
The main disadvantage of ANCOVA lies in its underlying assumptions: no difference
across groups or treatment arms in terms of the covariate used in the analysis, and
homogeneity of regression slopes.
a. There are more assumptions that can be violated with ANCOVA, and the effects of
violating those assumptions are not always clear.
b. The computations are laborious and demand skill if done by hand.
c. Blocking is more precise than ANCOVA when the correlation between the covariate
and the criterion is less than .4.
Limitation of LPM
1. The error term is not normally distributed; like the dependent variable, it follows
the Bernoulli distribution.
2. The variance of the error term is heteroskedastic. The variance for the Bernoulli
distribution is p(1-p), where p is the probability of a success.
3. The value of the R-squared statistic is limited, given the distribution of the LPMs.
4. Possibly the most problematic aspect of the LPM is the non-fulfilment of the
requirement that the estimated value of the dependent variable y lies between 0
and 1.
5. One way around the problem is to assume that all values below 0 and above 1 are
actually 0 or 1 respectively.
6. An alternative and much better remedy to the problem is to use an alternative
technique such as the Logit or Probit models.
7. The final problem with the LPM is that it is a linear model and assumes that the
probability of the dependent variable equalling 1 is linearly related to the
explanatory variable.
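Points 4 and 5 can be seen in a small sketch, assuming NumPy and a made-up data set in which the outcome switches from 0 to 1 halfway along the range of x.

```python
import numpy as np

x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)

# Linear probability model: OLS of the binary outcome on x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta                      # estimated "probabilities"

# Ad hoc fix from point 5: treat values below 0 as 0 and above 1 as 1
clipped = np.clip(fitted, 0.0, 1.0)
print(np.round(fitted, 3))
```

The fitted values at the two ends of the range fall below 0 and above 1, which cannot be probabilities. Clipping repairs the range but not the linearity assumption in point 7, which is why the logit and probit models are preferred.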
For example, consider a model in which the dependent variable takes the value of 1 if a
student has extension contact and 0 otherwise, regressed on the student's education level.
The probability of contacting an extension employer will rise as the education level rises.
Logit Model
In the logit model the dependent variable is the log of the odds ratio, which is a
linear function of the regressors. The probability function that underlies the logit model is
the logistic distribution. If the data are available in grouped form, we can use OLS
to estimate the parameters of the logit model, provided we take into account explicitly
the heteroscedastic nature of the error term. If the data are available at the individual,
or micro, level, nonlinear-in-the-parameter estimating procedures are called for.
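The relationship between the probability, the odds and the linear index can be sketched as follows, assuming NumPy. The coefficients `b1` and `b2` and the values of `x` are hypothetical and chosen only to illustrate the transformation.

```python
import numpy as np

def logistic(z):
    """Logistic distribution function: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

b1, b2 = -4.0, 0.5                    # hypothetical logit coefficients
x = np.array([2.0, 8.0, 14.0])
z = b1 + b2 * x                       # linear function of the regressor
p = logistic(z)                       # probabilities, always between 0 and 1
log_odds = np.log(p / (1.0 - p))      # log of the odds ratio
print(np.round(p, 4), np.round(log_odds, 4))
```

The log of the odds reproduces the linear index exactly, while the probabilities themselves stay inside the unit interval. This is how the logit model avoids the main weakness of the linear probability model.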
PART B
Short answer and Essay type questions
Structure
6.1 Objectives
6.2 Introduction to Gretl
6.3 Features of Gretl Software
6.4 Installation of Gretl Software (Downloading Gretl from the Internet for Free)
6.5 Creating Data Sets and Reading them into Gretl
6.6 Simple Descriptive Statistics in Gretl
6.7 Let us sum up
6.8 Unit -End Exercises
6.9 Answer to Check Your Progress
6.10 Suggested Readings
Econometrics requires skills in (licensed) software packages for testing existing
theory, creating new theory, evaluating policy implications in an economy, examining
the accessibility, utilization and public impact of government-funded programmes in a
nation, and studying the cause and effect of a fact.
6.1 Objectives
This Unit is designed to provide students with the basic tools to work with data using
the open source package gretl. In this Chapter students will learn how
To write a script file.
To Import native-type data sets and several other types of data sets.
To explore your data.
To Run basic statistical tests and to Run OLS regressions.
To create graphs and plots.
GRETL is the first complete econometric software package released under the GNU
software license. The software consists of a shared library, a command-line client program,
and a graphical client program. It comes with many sample data files from Greene (2000)
and Ramanathan (2002), which are immediately accessible from the menu. It supports
several least-squares based statistical estimators (including two-stage least squares and
panel data methods), time series models (including the Cochrane–Orcutt procedure and
VARs), and some maximum likelihood methods (logit and probit). It also has built-in
commands for several econometric tests (including the Chow, Hausman, and Dickey–
Fuller tests). A copy of Gretl can be downloaded from the Internet at
http://www.sourceforge.net. It is approximately 7.5MB in size. An important item that
can be found on the Gretl window is the option for defining a new variable. Often new
variables must be created.
a. What is Gretl?
b. Is Gretl paid software?
c. Who wrote the code for Gretl?
d. Can Gretl be run on MS Windows?
gre
Software?
6.4 Installation of Gretl Software
1. Downloading Gretl from the Internet for Free
GRETL econometric software can be downloaded at the following site:
http://gretl.sourceforge.net/
1. On the left hand side of this web site, double left-click on “Gretl for Windows”.
You will be directed to another web page that looks like the following:
You can double left-click on the gretl icon and it will start the software package. A
main gretl window will open and looks like the window on the following page. The
options which run along the top of the window are
File Utilities Session Data Sample Variable Model and Help
The blank part of the window will be filled when you open your data file (to be
explained shortly). It will show a list of the variables in your data set. You can then choose
the type of analysis that you wish to perform. You can run Gretl by simply clicking on
options or you can write a gretl program and run it either interactively or in batch modes.
Unless you wish to explore the software on your own, the first step in a Gretl session is
the creation of a data set which Gretl can understand. The simplest way this can be done is
to use Notepad (or Wordpad) and the Gretl Command editor. You don't need Excel to
create this kind of data set. Let's take an example.
Data Set Type I: Let's suppose that you have 5 observations on two variables X1 and
X2, which you can write as
Date X1 X2
1970 Q1 1 3
1970 Q2 3 7
1970 Q3 3 6
1970 Q4 5 8
1971 Q1 6 12
You may have simply keyed these in yourself, or you may have found the data on the
Internet and used a cut and paste function to create a file. Let's suppose that you use
Notepad (or Wordpad) to create a data file called mydata.txt and which you have saved on
your desktop.
Note that this quarterly data does not have any names, like X1 or X2, in the Notepad
(or Wordpad) file. It also does not have any dates, like 1970 Q3 or 1971 Q1.
Gretl will not read this data file as it is. You must do the following.
(1) You save the file again (using Notepad (or Wordpad)) as mydat.gdt
You should save the file to c:\userdata\gretl\user
(2) Next, you start Gretl (by clicking on the icon) and select File then
choose
New Command File and then choose Regular Script
(3) A new Gretl window will open and you proceed to create a Data
Header File
(4) You type into the window a description of the data and save it as
mydat.hdr .You should save the file to c:\userdata\gretl\user
There are several things to note.
First, the data file mydat.gdt and the header file mydat.hdr BOTH use the base name
“mydat”
Second, the data file must use the file suffix ___.gdt and the header file must use the file
suffix ___.hdr
Third, you should save the files in the c:\userdata\gretl\user location.
Fourth, the data is arranged in columns by observation within the data file mydat.gdt
Fifth, the header file has comment lines ( (* _____*) ), a list of names for the variables (x1
and x2 ending in ; ) and a description of the time series nature of the data (4=quarterly,
1970.1 1971.1 byobs).
Sixth, the variable names are case sensitive, so X1 is different from x1.
After you have created and saved both files mydat.gdt and mydat.hdr to the folder
c:\userdata\gretl\user you can start Gretl and begin your analysis.
When Gretl starts (after you click on the Gretl icon), the main Gretl window will open
and you can read your data into the window. To do this, you choose FILE and then OPEN
DATA. You, then choose USER FILE and proceed to double left-click on the file mydat
shown. Gretl will then read in your data into the main Gretl window and you will see the
window shown on the following page. Note that both x1 and x2 have been read into Gretl.
A constant has also been automatically generated. We can now carry out many types of
statistical analyses on x1 and x2.
Note, also that at the bottom the data frequency and sample range is shown: “Quarterly:
Full range 1970:1 – 1971:1; current sample 1970:1 – 1971:1” Also, at the bottom there are
several convenient shortcuts
These are, respectively: (1) calculator, (2) editor (may not work), (3) interactive Gretl
console, (4) icon view (must have Gretl sessions saved first), (5) Gretl website link, (6)
Gretl Manual (in ___.pdf form, must have free Acrobat Reader from Adobe.com), (7) Gretl
Help, (8) X-Y Graphics, (9) Open Data. There are other ways to read data into Gretl, but
for now, this should be enough since it can be done on virtually any computer.
Note that this command gives a number of different summary statistics, not just the
sample mean. S.D. = standard deviation, C.V. = coefficient of variation =
(S.D./Mean), SKEW = measure of skewness, EXCSKURT = measure of excess kurtosis.
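To see what these summary statistics measure, they can be reproduced by hand for the variable x1 of the example data set (1, 3, 3, 5, 6). The sketch below assumes NumPy. The skewness and excess kurtosis use simple moment-based formulas, which is an assumption made here; Gretl's own small-sample conventions may differ, so the last two figures need not match its output exactly.

```python
import numpy as np

x1 = np.array([1, 3, 3, 5, 6], dtype=float)

mean = x1.mean()
sd = x1.std(ddof=1)                    # sample standard deviation (n - 1 divisor)
cv = sd / mean                         # coefficient of variation = S.D. / Mean

# Central moments with divisor n (assumed convention for this sketch)
dev = x1 - mean
m2, m3, m4 = (dev ** 2).mean(), (dev ** 3).mean(), (dev ** 4).mean()
skew = m3 / m2 ** 1.5                  # moment-based skewness
ex_kurt = m4 / m2 ** 2 - 3.0           # excess kurtosis (normal distribution = 0)
print(round(mean, 3), round(sd, 3), round(cv, 3))
```

The same steps applied to x2 give its summary statistics.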
We can also make a time series graph for x1 and x2. Choose DATA, GRAPH
SPECIFIED VARS, TIME SERIES PLOT…
Note that x1 and x2 are quarterly variables defined over the sample range 1970:1 –
1971:1. You can copy and paste this graph to a word document. Click on the graph and
follow the options given. This is known as a time series graph. Gretl can also produce X-Y
graphs. Just choose DATA, GRAPH SPECIFIED VARS, X-Y SCATTER…
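The quarterly sample range 1970:1 – 1971:1 noted above covers five observations. As a small illustration, the Python sketch below lists the period labels in the year:quarter notation; the function name quarterly_labels is hypothetical and is not part of Gretl.

```python
def quarterly_labels(start_year, start_quarter, n_obs):
    """Return n_obs period labels of the form 'year:quarter'."""
    labels = []
    year, quarter = start_year, start_quarter
    for _ in range(n_obs):
        labels.append(f"{year}:{quarter}")
        quarter += 1
        if quarter > 4:  # roll over to the first quarter of the next year
            quarter = 1
            year += 1
    return labels


labels = quarterly_labels(1970, 1, 5)
print(labels)
```

These labels are the values that would appear along the horizontal axis of the time series graph.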
17. Autoregressive Process: A stochastic process in which the current value of the
disturbance is a function of past values with a random variable superimposed.
18. Average effect: A measure of the effect of a binary explanatory variable, x, on the
outcome of interest; based on comparing the outcome when x equals 1 with the
outcome when x equals 0.
19. Average treatment effect (ATE): a measure commonly used in the policy
evaluation literature that gives the expected difference in outcomes between those
who receive a treatment and those who do not, across the whole study population.
Related to the average treatment effect on the treated (ATET) which is the expected
difference for those who would opt for treatment.
20. Backward Elimination: A computational routine in which one variable is dropped
at a time starting from a model which includes all the independent variables.
21. Behavioral Equation: An algebraic relationship based on some assumption about the
behaviour of economic agents.
22. Best Estimator: Within a given class of estimators, the one with the minimum
variance.
23. Best Linear Unbiased Forecast: Within the class of linear unbiased forecasts the
one with minimum error variance.
24. Bias: The difference between the expected value of an estimator and the true value
of the parameter being estimated. A measure of how well the estimator performs
on average.
25. Binary variable: A variable that takes only two values, usually coded as zero and
one.
26. Bivariate probit model: A model that combines two binary probit models to deal
with a system of two binary dependent variables.
27. Box-Cox Transformation: A generalised functional form which has as limiting
cases several forms often used in practice, e.g. linear, double log, semi-log, etc.
28. Conditional logit: A model for unordered multinomial outcomes in which the
regressors vary across the alternatives (see mixed logit and multinomial logit).
29. Conditional Probability: Probability of an event given that some other specified
event has already occurred.
30. Consistency: The estimator approaches the true parameter as sample size increases.
31. Consistent estimate: An estimate that converges on the true parameter value as the
sample size increases (towards infinity).
32. Contemporaneous Covariance: Covariance across equations between the
disturbances referring to the same time period.
33. Continuous variable: A variable that can take the value of any real
number within an interval.
34. Cox proportional hazard model: A semi parametric model for duration analysis.
35. Cross-section data: Survey data in which each respondent is observed only once,
giving a “snapshot” view of the population at a point in time.
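The Box-Cox transformation (item 27 above) can be made concrete with a short calculation. The Python sketch below assumes the usual form (y^λ − 1)/λ, which is not written out in this glossary; the function name box_cox and the sample value 4.0 are illustrative only.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("y must be positive")
    if lam == 0:
        return math.log(y)  # limiting case as lam tends to 0
    return (y ** lam - 1.0) / lam


# lam = 1 leaves y linear (shifted by 1); lam = 0 gives the logarithm.
for lam in (1.0, 0.5, 0.0):
    print(lam, box_cox(4.0, lam))
```

With λ = 1 the variable enters linearly, while λ = 0 gives its logarithm; transforming both sides of a regression with λ = 0 yields the double-log form, and transforming only one side yields a semi-log form.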
57. Ex-Post Forecasts: Utilising part of the sample data to forecast values for which
actual values are available.
58. Extraneous Estimators: Estimators of parameters in a model obtained from a
different body of data, possibly in a different context.
59. Exogenous Variables: Variables which appear in the system but are determined
outside the system.
60. Exogeneity: In the context of regression analysis, the assumption that the
regressors, x, are independent of the error term.
61. FIML: Full-information maximum likelihood (FIML) estimates multiple equation
models using the joint distribution for the equations rather than estimating each
equation separately.
62. Final Form of the Equation System: Each endogenous variable is expressed
exclusively in terms of current and lagged exogenous variables. Lagged
endogenous variables eliminated.
63. First Order Regression: An estimation procedure for missing observations in which
they are estimated from some auxiliary regressions.
64. Fixed effects: The fixed effects specification treats the individual effects in panel
data models as parameters to be estimated. This is appropriate when inferences are
to be confined to the effects in the sample only, and the effects themselves are of
substantive interest. With individual-level survey data, fixed effects are best
interpreted as random individual effects that are correlated with the explanatory
variables. This contrasts with random effects, which are assumed to be independent
of the regressors (see random effects).
65. Forward Selection: A computational routine in which one variable is added at a
time, starting from a model with one independent variable.
66. Gamma distribution: Probability distribution often used to model individual
heterogeneity, especially in count data regression and duration analysis.
67. Gauss-Markov Theorem: The result that, for the standard linear model with a scalar
covariance matrix of disturbances, the ordinary least squares estimators are BLUE
(best linear unbiased estimators).
68. Geometric Lags: Lag coefficients are generated from a geometric distribution and
exhibit exponential decline.
69. Gibbs sampling: a method for drawing samples from a distribution that is used in
MCMC algorithms.
70. GMM: Many of the estimators discussed in this book fall within the unifying
framework of generalised method of moments (GMM) estimation. This replaces
population moment conditions (e.g. based on expected values) with their sample
analogues (e.g. based on sample means).
71. Generalized least squares: A generalization of ordinary least squares which relaxes
the assumption that the error terms are independently and identically distributed
across observations.
72. Hausman test: Tests whether there is a significant difference between two sets of
coefficients: one set that is efficient under the null but inconsistent under the
alternative, and another set that is inefficient under the null but still consistent
under the alternative. Commonly used to test the IIA assumption in multinomial
choice models and as a test of exogeneity (comparing OLS and IV estimates).
73. Hazard function: Defined as the ratio of the density function to the survivor
function for a random variable. The hazard function plays a key role in duration
analysis where it is interpreted as the probability of failing now given survival up
to now.
74. Heckit model: A two-step estimator designed to deal with the sample selection
problem.
75. Heteroskedasticity: When the variance of the error term is not constant across
observations.
76. Heteroskedastic Linear Model: A regression model in which the disturbance
variance can change from observation to observation.
77. Homoskedasticity: The property of constant variance of the random disturbance, i.e.
the variance of the error term is constant across observations.
78. Identification: The process of distinguishing a particular structure from a set of
competing structures using data and prior information.
79. Indirect Least Squares: Estimating reduced form parameters by OLS and solving
for structural parameters from these.
80. Independence of Explanatory Variables and Random Disturbances: The
assumption that the explanatory variables, if they are to be treated as stochastic, are
statistically independent of the random disturbance.
81. Influential Observation: An observation whose inclusion or exclusion substantially
changes the estimated regression results.
82. Interval Forecast: Analogous to interval estimation; provides an interval which
will bracket the true value with a stated probability.
83. Interaction: The joint effect of two attributes, incorporated by including products of
dummy variables.
84. Interpolation: A procedure to estimate a missing value lying between two known
values.
85. Incremental Contribution: The increase in R² as a result of adding a variable to
the model.
86. Instrumental Variable: A variable which is uncorrelated with the disturbance but
highly correlated with the explanatory variable for which it acts as the instrument.
87. Intersection of Two Events: The joint occurrence of two events. For more than two
events, the definition is the same.
88. Instability of Coefficients: Extreme sensitivity of coefficient magnitudes and signs
to small perturbations of the data and/or the addition of variables; a consequence of
multicollinearity.
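The hazard function (item 73 above) is defined as the ratio of the density function to the survivor function, and a short calculation can illustrate it. The Python sketch below uses an exponential duration distribution with a hypothetical rate of 0.5; the function names are illustrative only.

```python
import math

RATE = 0.5  # hypothetical failure rate of an exponential duration


def exp_density(t):
    """Density function of the exponential distribution."""
    return RATE * math.exp(-RATE * t)


def exp_survivor(t):
    """Survivor function: probability of surviving beyond time t."""
    return math.exp(-RATE * t)


def hazard(density, survivor, t):
    """Hazard at time t: density divided by survivor function."""
    return density(t) / survivor(t)


for t in (1.0, 3.0, 10.0):
    print(t, hazard(exp_density, exp_survivor, t))
```

For the exponential distribution the hazard equals the rate at every duration, so the chance of failing now does not depend on how long the spell has already lasted.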
161. Reduced Form: A set of relationships, derived from the structural equations, in
which each endogenous variable is expressed as a function of all exogenous variables.
162. RESET: A general test for misspecification of the functional form of a regression
model.
163. Retransformation problem: Highlights the need to use an appropriate
transformation back to the y-scale when regression models are run on transformed
data such as log(y).
164. Ridge Regression: A procedure which attempts to reduce the influence of
multicollinearity through the introduction of a biasing constant.
165. Ridge Trace: A graphical procedure used for choosing the value of the biasing
constant in ridge regression.
166. Right censoring: Occurs when values in the right hand tail of a distribution are cut-
off at some threshold and only the threshold value is known. This often arises in
duration analysis where some spells are incomplete at the time the data are
collected.
167. Risk function: A decision-theoretic concept: the expected cost of using an
estimator to estimate a parameter. The cost arises from wrong decisions, i.e. from using
an estimate different from the true value.
168. Sample: A finite collection of values of the random variable actually observed.
169. Sampling Distribution: Probability distribution of an estimator.
170. Sample selection bias: The bias created when non-responders are systematically
different from responders.
171. Sample Space: The collection of all the possible elementary outcomes.
172. Scalar Covariance Matrix: When the covariance matrix of the random
disturbances is diagonal with identical diagonal elements. A consequence of
homoskedasticity and serial independence.
173. Serial Correlation: Successive disturbance terms in the regression model are
correlated.
174. Serial Independence: The property of mutual stochastic independence of random
disturbances.
175. Semi parametric: A method that mixes parametric assumptions (e.g. that the
relationship between y and X is linear) and nonparametric assumptions (e.g. that
the distribution of the error term is unknown).
176. Singular Covariance Matrix: Contemporaneous disturbances are linearly
dependent, resulting in a singular covariance matrix of disturbances.
177. Specification Error Test: A test designed to detect departure from one or more
assumptions behind a proposed model.
178. Splicing: A procedure to obtain a consistent index series from two series with
different bases but at least one point of overlap.
179. Standardisation: Re-expressing variables with a change of origin and units of
measurement.
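Splicing (item 178 above) can be illustrated with two short index series that share one point of overlap. The Python sketch below is one simple way to do it, assuming the last value of the old series and the first value of the new series refer to the same period; all index values are hypothetical.

```python
def splice(old_series, new_series):
    """Join two index series that overlap at old_series[-1] and new_series[0].

    The old series is rescaled to the base of the new series.
    """
    factor = new_series[0] / old_series[-1]
    rescaled_old = [value * factor for value in old_series[:-1]]
    return rescaled_old + list(new_series)


old_index = [100.0, 110.0, 120.0]  # hypothetical series on the old base
new_index = [100.0, 105.0, 112.0]  # hypothetical series on the new base
spliced = splice(old_index, new_index)
print([round(value, 2) for value in spliced])
```

The ratio of the two values at the overlap period converts every earlier observation to the new base, so the spliced series is on a single consistent base throughout.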
199. Variables: Entities whose behaviour is being studied. Generally they represent
some measurable and observable economic construct.
200. Variance Inflation Factors: Diagonal elements of the inverse of the correlation
matrix of the explanatory variables. They indicate the impact of multicollinearity on
the variances of the OLS estimators.
201. Variance of Random Variable: A measure of dispersion around the mean in the
values of a random variable, defined with respect to its density function.
202. Von Neumann Ratio: A statistic for testing for serial correlation in a series of
random variables.
203. Weibull model: A parametric model for duration analysis.
204. Weighted least squares: Weights (wi) are attached to the values of the dependent
variable (yi) and independent variables (xi) before using least squares regression.
This method can be used to correct for heteroskedasticity.
205. Zero Order Regression: An estimation procedure for missing observations in
which they are replaced by averages of the available observations.
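Weighted least squares (item 204 above) can be sketched for a simple regression with one explanatory variable. The Python example below minimises the weighted sum of squared residuals, which is equivalent to multiplying each observation (including the constant) by the square root of its weight before applying ordinary least squares. The data and the choice of weights 1/x² are hypothetical; they correspond to assuming that the error variance is proportional to x².

```python
def weighted_least_squares(x, y, w):
    """Fit y = a + b*x by minimising sum of w_i * (y_i - a - b*x_i)**2."""
    total_w = sum(w)
    # Weighted means of x and y.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-product and sum of squares about the weighted means.
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data in which the scatter grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.3]
weights = [1.0 / xi ** 2 for xi in x]  # assumed: error variance proportional to x**2
intercept, slope = weighted_least_squares(x, y, weights)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

Observations with a larger assumed error variance receive smaller weights, so they influence the fitted line less; with equal weights the procedure reduces to ordinary least squares.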
***********
9. If in our regression model, one of the explanatory variables included is the lagged
value of the dependent variable, then the model is referred to as
A. Best fit model B. Dynamic model
C. Autoregressive model D. First-difference form
10. In binary logistic regression:
A. The dependent variable is continuous.
B. The dependent variable is divided into two equal subcategories.
C. The dependent variable consists of two categories.
D. There is no dependent variable.
PART-B
Answer the following questions, choosing either (a) or (b) (5X7=35)
X2: 2 1 5 4 3
X3: 3 1 4 5 2
(Or)
(b) Illustrate the application of Multiple Regression in day-to-day life.
14. (a) Write a note on Park test.
(Or)
(b) What are the consequences and remedies of Autocorrelation?
15. (a) What are the reasons for lags in econometrics?
(Or)
(b) Differentiate between ANOVA and ANCOVA.
PART-C
Answer any three of the following questions (3X10=30)
16. Illustrate the methodology of Econometrics.
17. What are the assumptions of Classical Linear Regression Model?
18. Derive the formula for the estimator of 𝛽 in multiple regression in matrix form.
19. Enumerate the causes, consequences and remedies of Multicollinearity.
20. State the consequences of Model Specification Error.
******************
PART-B (5x7=35)
11. a) Describe the objectives and features of Econometrics.
(OR)
b) Explain the Methodology of Econometrics.
15. a) What are the reasons for the introduction of Lag in Regression?
(OR)
b) Find the difference between ANOVA and ANCOVA.
PART-C (3x10=30)
Answer any three questions:
16. Illuminate the scope of Econometrics.
17. Enumerate the assumptions of the Classical Linear Regression model.
18. Three related variables take the following sets of values. Estimate the regression of
X1 on X2 and X3.
X1: 1 2 3 4 5
X2: 2 1 5 4 3
X3: 3 1 4 5 2
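An answer to a question of this kind can be checked numerically. The Python sketch below is one possible approach rather than a prescribed solution method: it solves the two normal equations in deviation form for the regression of X1 on X2 and X3. The function name ols_two_regressors is illustrative only.

```python
def ols_two_regressors(y, x2, x3):
    """Least squares fit of y = b1 + b2*x2 + b3*x3 via the normal equations."""
    n = len(y)
    y_bar, x2_bar, x3_bar = sum(y) / n, sum(x2) / n, sum(x3) / n
    # Deviations from the sample means.
    dy = [v - y_bar for v in y]
    d2 = [v - x2_bar for v in x2]
    d3 = [v - x3_bar for v in x3]
    # Sums of squares and cross-products of the deviations.
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    # Solve the 2 x 2 system of normal equations by Cramer's rule.
    det = s22 * s33 - s23 ** 2
    b2 = (s2y * s33 - s3y * s23) / det
    b3 = (s3y * s22 - s2y * s23) / det
    b1 = y_bar - b2 * x2_bar - b3 * x3_bar
    return b1, b2, b3


X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 5, 4, 3]
X3 = [3, 1, 4, 5, 2]
b1, b2, b3 = ols_two_regressors(X1, X2, X3)
print(f"X1 = {b1:.3f} + {b2:.3f} X2 + {b3:.3f} X3")
```

For the data above this gives approximately X1 = 1.833 + 0.944 X2 − 0.556 X3. The same coefficients follow from the matrix formula for the least squares estimator; the deviation form is used here only because it keeps the arithmetic to a 2 × 2 system.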
******************
****$$$***