
Econometrics II: (ECON 324)

1. Programme : Bachelor of Economics


2. Department : Economics
3. Course Title : Econometrics II
4. Course Code : ECON 324
5. Course Duration : Semester
6. Lecture hours per week : 3
7. Tutorial hours per week : 1
8. Course Credits : 3
9. Method of Assessment : Course work 40%
: Final Examination 60%

10. Course Description

This course builds and expands on the knowledge acquired in Econometrics I. As such, it
emphasizes both the theoretical and the practical aspects of statistical analysis, focusing on
techniques for estimating econometric models of various kinds and for conducting tests of
hypotheses of interest to economists. It is designed to help students develop a solid
theoretical background in introductory-level econometrics, the ability to implement the
techniques, and the ability to critique empirical studies in economics.

11. Aim of the Course


The aim of this course is to provide the basic knowledge of econometrics that is essential
equipment for any serious economist or social scientist, to a level where the participant would be
competent to continue with the study of the subject in a graduate programme. While the course is
ambitious in terms of its coverage of technical topics, equal importance is attached to
developing an intuitive understanding of the material that will allow these skills to be
used effectively and creatively, and to giving participants the foundation to pursue
specialized applications through self-study with confidence when needed.

12. Learning Outcomes


By the end of this course, students should be able to:
a. Develop both simple and multiple regression models;
b. Outline the properties of OLS estimators;
c. Derive OLS estimators;
d. Use statistical packages to test hypotheses; and
e. Report on regression results.

13. Topics of Study

13.1 Simple Regression Analysis


a) Definition of Simple Regression Model
b) Deriving the Ordinary Least Squares (OLS) Estimators
c) Properties of OLS Estimators
d) Units of Measurement and Functional Form
e) Unbiasedness and Variances of OLS Estimators

f) Regression Through the Origin
13.2 Multiple Regression Analysis
a) Motivation for Multiple Regression
b) Mechanics and Interpretation of Ordinary Least Squares
c) Expected Value of the OLS Estimator
d) Variance of OLS Estimator
e) Efficiency of OLS: The Gauss-Markov Theorem
f) The Language of Multiple Regression Analysis

13.3 Multiple Regression Analysis: Inference


a) Sampling Distribution of OLS Estimators
b) Testing Hypothesis About a Single Population Parameter: The t Test
c) Confidence Intervals
d) Testing Hypotheses About a Single Linear Combination of Parameters
e) Testing Multiple Linear Restrictions: The F Test
f) Reporting Regression Results

13.4 Statistical Packages for Econometricians


a) Microsoft Excel
b) Eviews
c) STATA
d) SPSS

14. Prescribed Text(s)


Gujarati, D. (2014) Econometrics by Example, 2nd edition, Palgrave Macmillan.
Stock, J. H. and Watson, M. W. (2014) Introduction to Econometrics, 3rd edition, Pearson.
Wooldridge, J. M. (2013) Introductory Econometrics: A Modern Approach, 5th edition, Cengage
Learning.

15. Recommended Textbooks


Pedace, R. (2013) Econometrics for Dummies, Wiley.
Verbeek, M. (2012) A Guide to Modern Econometrics, 4th edition, Wiley.
Studenmund, A. H. (2016) Using Econometrics: A Practical Guide, 7th edition, Pearson.
Westhoff, F. (2013) An Introduction to Econometrics: A Self-Contained Approach, MIT Press.
Hilmer, C. (2013) Practical Econometrics: Data Collection, Analysis, and Application, 1st
edition, McGraw-Hill Education.

THE REGRESSION MODEL
Chapter 1
WHAT IS ECONOMETRICS?
Literally interpreted, econometrics means “economic measurement.” Although
measurement is an important part of econometrics, the scope of econometrics is much
broader, as can be seen from the following quotations:
1. Econometrics, the result of a certain outlook on the role of economics, consists of
the application of mathematical statistics to economic data to lend empirical support
to the models constructed by mathematical economics and to obtain
numerical results.
2. Econometrics may be defined as the quantitative analysis of actual economic
phenomena based on the concurrent development of theory and observation, related
by appropriate methods of inference.
3. Econometrics may be defined as the social science in which the tools of economic
theory, mathematics, and statistical inference are applied to the analysis of economic
phenomena.
4. Econometrics is concerned with the empirical determination of economic laws.

METHODOLOGY OF ECONOMETRICS
How do econometricians proceed in their analysis of an economic problem?
That is, what is their methodology? Although there are several schools of thought on
econometric methodology, we present here the traditional or classical methodology, which still
dominates empirical research in economics and other social and behavioral sciences.
Broadly speaking, traditional econometric methodology proceeds along
the following lines:
1. Statement of theory or hypothesis.
2. Specification of the mathematical model of the theory.
3. Specification of the statistical, or econometric, model.
4. Obtaining the data.
5. Estimation of the parameters of the econometric model.
6. Hypothesis testing.
7. Forecasting or prediction.
8. Using the model for control or policy purposes.

TYPES OF DATA
The success of any econometric analysis ultimately depends on the availability of the
appropriate data. It is therefore essential that we spend some time discussing the nature,
sources, and limitations of the data that one may encounter in empirical analysis.
Three types of data may be available for empirical analysis: time series, cross-section, and
pooled (i.e., combination of time series and cross-section) data.

1.Time Series Data: A time series is a set of observations on the values that a variable takes at
different times. Such data may be collected at regular time intervals, such as daily (e.g., stock
prices, weather reports), weekly (e.g., money supply figures), monthly [e.g., the unemployment
rate, the Consumer Price Index (CPI)], quarterly (e.g., GDP), annually (e.g., government
budgets), quinquennially, that is, every 5 years (e.g., the census of manufactures), or
decennially (e.g., the census of population). Although time series data are used heavily in
econometric studies, they present special problems for econometricians.

2. Cross-section Data: Cross-section data are data on one or more variables collected at the
same point in time, such as the census of population conducted by the Census Bureau every 10
years. Just as time series data create their own special problems (because of the stationarity
issue), cross-sectional data too have their own problems, specifically the problem of
heterogeneity.

3. Pooled Data: Pooled, or combined, data contain elements of both time series and cross-
section data.
Panel, Longitudinal, or Micro-Panel Data is a special type of pooled data in which the same
cross-sectional unit (say, a family or a firm) is surveyed over time. For example, the U.S.
Department of Commerce carries out a census of housing at periodic intervals. At each periodic
survey the same household (or the people living at the same address) is interviewed to
find out if there has been any change in the housing and financial conditions of that household
since the last survey. By interviewing the same household periodically, panel data provide
very useful information on the dynamics of household behavior.
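To make the distinction concrete, here is a minimal, hypothetical sketch in Python (the households, years, and income figures are invented for illustration):

```python
# A hypothetical illustration of panel data: the same cross-sectional
# units (households) observed at two points in time. The figures are invented.
import pandas as pd

panel = pd.DataFrame({
    "household": ["A", "A", "B", "B"],
    "year":      [2020, 2021, 2020, 2021],
    "income":    [30000, 32000, 45000, 46500],
})

# Keeping a single year gives a cross-section; keeping a single
# household gives a time series.
print(panel.set_index(["household", "year"]))
```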

Simple Regression
The simple regression model can be used to study the relationship between two variables.
For reasons we will see, the simple regression model has limitations as a
general tool for empirical analysis. Nevertheless, it is sometimes appropriate as an
empirical tool. Learning how to interpret the simple regression model is good practice for
studying multiple regression, which we will do in subsequent chapters.
Much of applied econometric analysis begins with the following premise: $y$ and $x$ are two
variables, representing some population, and we are interested in "explaining $y$ in terms
of $x$," or in "studying how $y$ varies with changes in $x$." For example, $y$ is soybean crop yield and $x$
is the amount of fertilizer; $y$ is hourly wage and $x$ is years of education; and $y$ is a community crime
rate and $x$ is the number of police officers.
In writing down a model that will “explain y in terms of x,” we must confront three
issues.
First, since there is never an exact relationship between two variables, how do we
allow for other factors to affect y? Second, what is the functional relationship between
y and x? And third, how can we be sure we are capturing a ceteris paribus relationship
between y and x (if that is a desired goal)?

We can resolve these ambiguities by writing down an equation relating y to x. A simple
equation is

$y = \beta_0 + \beta_1 x + u$

In this section we will show that we are only able to get reliable estimators of $\beta_0$ and $\beta_1$ from
a random sample of data when we make an assumption restricting how the unobservable
$u$ is related to the explanatory variable $x$. Without such a restriction, we will not be able
to estimate the ceteris paribus effect, $\beta_1$. Because $u$ and $x$ are random variables, we need a
concept grounded in probability. Before we state the key assumption about how $x$ and $u$ are
related, we can always make one assumption about $u$. As long as the intercept $\beta_0$ is included
in the equation, nothing is lost by assuming that the average value of $u$ in the population is
zero.
Assumptions
1. The error $u$ has zero expected value: $E(u) = 0$.
2. The covariance between $x$ and $u$ is zero: $\mathrm{Cov}(x, u) = 0$, or equivalently $E(xu) = 0$.

Rewriting the assumptions:

1. $E(u) = E(y - \beta_0 - \beta_1 x) = 0$
2. $E[x(y - \beta_0 - \beta_1 x)] = 0$

$\beta_0, \beta_1$ are the population parameters; $\hat{\beta}_0, \hat{\beta}_1$ are the sample estimates.

For the Sample


1 and 2 can be re-expressed as:

1. $n^{-1}\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$
2. $n^{-1}\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

Solving for $\hat{\beta}_0$ and $\hat{\beta}_1$ by minimizing the residual sum of squares:

$y_i = \beta_0 + \beta_1 x_i + u_i$
$u_i = y_i - \beta_0 - \beta_1 x_i$
$\sum u_i^2 = \sum (y_i - \beta_0 - \beta_1 x_i)^2$
Solutions

1. $n^{-1}\sum_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

$\frac{\sum y_i}{n} - \frac{n\hat{\beta}_0}{n} - \hat{\beta}_1 \frac{\sum x_i}{n} = 0$

$\bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

2. $n^{-1}\sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0$

Substituting $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$:

$\sum x_i [y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i] = 0$

$\sum x_i [(y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})] = 0$

$\sum x_i (y_i - \bar{y}) - \hat{\beta}_1 \sum x_i (x_i - \bar{x}) = 0$

$\hat{\beta}_1 \sum x_i (x_i - \bar{x}) = \sum x_i (y_i - \bar{y})$

$\hat{\beta}_1 = \frac{\sum x_i (y_i - \bar{y})}{\sum x_i (x_i - \bar{x})}$

Following the properties of summation notation,

$\sum x_i (y_i - \bar{y}) = \sum (x_i - \bar{x})(y_i - \bar{y})$
$\sum x_i (x_i - \bar{x}) = \sum (x_i - \bar{x})^2$

so we can rewrite $\hat{\beta}_1$ as

$\hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$\hat{\beta}_0$ and $\hat{\beta}_1$ are called the Ordinary Least Squares (OLS) estimators of $\beta_0$ and $\beta_1$.
They minimize the residual sum of squares:

$\sum \hat{u}^2 = \sum (y - \hat{\beta}_0 - \hat{\beta}_1 x)^2$

$\hat{y}$ = the fitted value of $y$: $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$

$u$ = error term / disturbance term

$\hat{u}$ = residual: $\hat{u} = y - \hat{y} = y - \hat{\beta}_0 - \hat{\beta}_1 x$

$\hat{u}$ is an estimate of the disturbance term.
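As a minimal sketch of these formulas in Python (the sample data below are invented for illustration, not taken from the notes):

```python
# Computing the OLS estimators from the derived formulas on a small
# invented sample.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical regressor values
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # hypothetical outcome values

# beta1_hat = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# beta0_hat = ybar - beta1_hat * xbar
beta0_hat = y.mean() - beta1_hat * x.mean()

y_hat = beta0_hat + beta1_hat * x   # fitted values
u_hat = y - y_hat                   # residuals, estimates of the disturbances

print(beta0_hat, beta1_hat)
```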

PROPERTIES OF ORDINARY LEAST SQUARES (OLS) ON ANY SAMPLE OF DATA


Fitted values and residuals.

Algebraic properties of OLS statistics.
There are several useful algebraic properties of OLS estimates and their
associated statistics; we shall now cover the three most important of these.
1. The sum, and therefore the sample average, of the OLS residuals is zero: $\sum \hat{u}_i = 0$.
2. The sample covariance between the regressor and the OLS residuals is zero: $\sum_{i=1}^{n} x_i \hat{u}_i = 0$.

3. The point $(\bar{x}, \bar{y})$ is always on the OLS regression line, where $\bar{y} = \frac{\sum y_i}{n}$ and $\bar{x} = \frac{\sum x_i}{n}$.

In conclusion, $y = \hat{y} + \hat{u}$.
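These three properties can be verified numerically. A short sketch, recomputing the hand-derived estimators on the same invented sample used above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - (b0 + b1 * x)

print(np.isclose(u_hat.sum(), 0.0))              # property 1: residuals sum to zero
print(np.isclose(np.sum(x * u_hat), 0.0))        # property 2: sum of x_i * u_hat_i is zero
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # property 3: (xbar, ybar) lies on the line
```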

MOTIVATING THE R-SQUARE


Define the total sum of squares (SST), the explained sum of squares (SSE) and the
residual sum of squares (SSR) as follows:

$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
$SSR = \sum_{i=1}^{n} \hat{u}_i^2$

 SST is a measure of the total sample variation in the $y_i$. If we divide SST by $n - 1$ we obtain the sample variance of $y$.

 SSE measures the sample variation in the $\hat{y}_i$.

 SSR measures the sample variation in the $\hat{u}_i$.

Thus, $SST = SSE + SSR$.

Proving that $SST = SSE + SSR$:

$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i + \hat{y}_i - \bar{y})^2$

$= \sum [(y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})]^2$

$= \sum [\hat{u}_i + (\hat{y}_i - \bar{y})]^2$

$= \sum \hat{u}_i^2 + 2\sum \hat{u}_i(\hat{y}_i - \bar{y}) + \sum (\hat{y}_i - \bar{y})^2$

$= SSR + 2\sum \hat{u}_i(\hat{y}_i - \bar{y}) + SSE$

Now, $\sum \hat{u}_i(\hat{y}_i - \bar{y}) = 0$, because the sample covariance between the residuals and the fitted values is zero.

Therefore $SST = \sum (y_i - \bar{y})^2 = SSR + SSE$.

GOODNESS OF FIT

This is where we actually use the $R^2$.

$SST = SSR + SSE$

$R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$

$R^2$ ranges between 0 and 1. As a rough rule of thumb, an $R^2$ below 0.5 indicates a poor fit, while an $R^2$ above 0.5 indicates a good fit.

$R^2$ is the ratio of the explained variation to the total variation; it is therefore interpreted
as the fraction of the sample variation in $y$ explained by $x$.

Adjusted $R^2$

This is a stricter version of the $R^2$:

$\bar{R}^2 = 1 - \frac{SSR/(n-k)}{SST/(n-1)}$

where $n - k$ is the degrees of freedom of the SSR.
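A short sketch of these decompositions, continuing with the same invented sample (taking $k = 2$ estimated parameters for the $n - k$ term, an assumption consistent with the formula above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)      # total sample variation in y
sse = np.sum((y_hat - y.mean()) ** 2)  # explained variation
ssr = np.sum(u_hat ** 2)               # residual variation

print(np.isclose(sst, sse + ssr))      # SST = SSE + SSR

r2 = sse / sst                         # equals 1 - ssr / sst
n, k = len(y), 2                       # k = number of estimated parameters (assumed)
r2_adj = 1 - (ssr / (n - k)) / (sst / (n - 1))
print(r2, r2_adj)
```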

MULTIPLE REGRESSION MODEL ANALYSIS: ESTIMATION


1. Motivation for Multiple Regression
e.g. $Salary = \beta_0 + \beta_1 Edu + \beta_2 Attendance + u$
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$

where $\beta_0$ = intercept,
$\beta_1$ = measures the $\Delta$ in $y$ due to the $\Delta$ in $x_1$, ceteris paribus,
$\beta_2$ = measures the $\Delta$ in $y$ due to the $\Delta$ in $x_2$, ceteris paribus.

Multiple regression analysis is also useful for generalizing functional relationships between
variables,
e.g. $Con = \beta_0 + \beta_1 inc + \beta_2 inc^2 + u$.

The model with $k$ independent variables
The general multiple linear regression model, also called the multiple regression model, can be
written in the population as follows:

1. $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$

where $\beta_0$ is the intercept, $\beta_1$ is the parameter associated with $x_1$, $\beta_2$ is the parameter associated with $x_2$, and so on; $k$ is open ended.

Since there are $k$ independent variables and an intercept, equation 1 contains $k + 1$ population
parameters.
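Although the notes derive the estimators summation by summation, the same normal equations can be solved compactly in matrix form, $\hat{\beta} = (X'X)^{-1}X'y$. A sketch on simulated data (the true coefficients below are chosen arbitrarily for illustration):

```python
# OLS with k = 2 regressors plus an intercept, solved via the normal
# equations X'X b = X'y on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u   # arbitrary true parameters

X = np.column_stack([np.ones(n), x1, x2])     # k + 1 columns
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # (intercept, b1, b2)
print(beta_hat)
```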
DERIVING THE OLS ESTIMATORS OF THE MULTIPLE REGRESSION MODEL

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$
$\hat{u} = y - \hat{y} = y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2$

Minimizing the residual ($\hat{u}$) sum of squares:

$\sum \hat{u}^2 = \sum (y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2)^2$
First Order Conditions

i. $\frac{\partial \sum \hat{u}^2}{\partial \hat{\beta}_0} = -2\sum (y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2) = 0$

ii. $\frac{\partial \sum \hat{u}^2}{\partial \hat{\beta}_1} = -2\sum x_1 (y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2) = 0$

iii. $\frac{\partial \sum \hat{u}^2}{\partial \hat{\beta}_2} = -2\sum x_2 (y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2) = 0$

Dividing each condition by $-2$ gives the normal equations below.

THE NORMAL EQUATIONS

i. $\sum (y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2) = 0$

$\sum y - n\hat{\beta}_0 - \hat{\beta}_1 \sum x_1 - \hat{\beta}_2 \sum x_2 = 0$

$n\hat{\beta}_0 = \sum y - \hat{\beta}_1 \sum x_1 - \hat{\beta}_2 \sum x_2$

$\hat{\beta}_0 = \frac{\sum y}{n} - \hat{\beta}_1 \frac{\sum x_1}{n} - \hat{\beta}_2 \frac{\sum x_2}{n}$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}_1 - \hat{\beta}_2 \bar{x}_2$

Solving for $\hat{\beta}_1$.
Remember $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$.
We introduce the auxiliary regression
$x_1 = \alpha_0 + \alpha_1 x_2 + r_1$
with fitted value $\hat{x}_1$ and residual $\hat{r}_1 = x_1 - \hat{x}_1$, which we can rearrange as $x_1 = \hat{x}_1 + \hat{r}_1$.
Remember

$\sum \hat{u} x_1 = \sum \hat{u}(\hat{x}_1 + \hat{r}_1) = \sum \hat{u}\hat{x}_1 + \sum \hat{u}\hat{r}_1$

$\sum \hat{u}\hat{r}_1 = \sum \hat{r}_1 (y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2)$
$= \sum y\hat{r}_1 - \hat{\beta}_0 \sum \hat{r}_1 - \hat{\beta}_1 \sum x_1 \hat{r}_1 - \hat{\beta}_2 \sum x_2 \hat{r}_1$
$= \sum y\hat{r}_1 - \hat{\beta}_1 \sum x_1 \hat{r}_1$

since $\sum \hat{r}_1 = 0$ and $\sum x_2 \hat{r}_1 = 0$ (algebraic properties of the OLS residuals from the auxiliary regression). Setting this expression to zero,

$\hat{\beta}_1 \sum x_1 \hat{r}_1 = \sum y\hat{r}_1$

$\hat{\beta}_1 = \frac{\sum y\hat{r}_1}{\sum x_1 \hat{r}_1}$
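This "partialling out" result can be checked numerically: regress $x_1$ on $x_2$, keep the residual $\hat{r}_1$, and the formula above reproduces the multiple-regression slope on $x_1$. A sketch on simulated data (all parameters invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)                 # x1 correlated with x2
y = 2.0 + 1.5 * x1 - 1.0 * x2 + rng.normal(size=n)

# Auxiliary regression x1 = a0 + a1*x2 + r1; keep the residual r1_hat.
a1 = np.sum((x2 - x2.mean()) * (x1 - x1.mean())) / np.sum((x2 - x2.mean()) ** 2)
a0 = x1.mean() - a1 * x2.mean()
r1_hat = x1 - (a0 + a1 * x2)

# beta1_hat = sum(y * r1_hat) / sum(x1 * r1_hat)
b1_partial = np.sum(y * r1_hat) / np.sum(x1 * r1_hat)

# Compare with the slope on x1 from the full multiple regression.
X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.solve(X.T @ X, X.T @ y)
print(b1_partial, b_full[1])   # the two values coincide
```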

EXPECTED VALUE OF OLS ESTIMATORS

 We now turn to the statistical properties of OLS for estimating the parameters in an
underlying population model.
 In this section we derive the expected value of the OLS estimators.
 We state and discuss four assumptions (which are direct extensions of the simple
regression model assumptions) under which the OLS estimators are unbiased for the
population parameters.
 We shall also explicitly obtain the bias in OLS when an important variable has been
omitted from the regression.

THE OLS ASSUMPTIONS


1. Linearity in parameters
2. Random sampling
3. No perfect collinearity
4. Zero conditional mean
1. Linearity in Parameters
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
The model is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$.
2. Random Sampling
We have a random sample of $n$ observations $\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i): i = 1, 2, \ldots, n\}$,
following the population model in assumption 1:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + u_i$
where $i$ refers to the observation and the second subscript on $x$ is the variable
number.
3. No Perfect Collinearity
In the sample (and therefore in the population), none of the independent variables is
constant and there are no exact linear relationships among the independent variables.
If an independent variable is an exact linear combination of the others, the model
suffers from perfect collinearity and it cannot be estimated using OLS. It is important
to note that assumption three does allow the independent variables to be correlated;
they just cannot be perfectly correlated.
4. Zero Conditional Mean
The error $u$ has an expected value of zero, given any values of the independent variables; in
other words, $E(u \mid x_1, x_2, \ldots, x_k) = 0$.

Possible Violations
 Assumption four can fail if the functional relationship between the explained
and explanatory variables is misspecified.
 Omitting an important factor that is correlated with any of the $x$ variables also causes the
assumption to fail.

When these four assumptions hold, we can state the unbiasedness of OLS theorem.
Theorem 1: The Unbiasedness of OLS. Under assumptions MLR.1 through MLR.4,

$E(\hat{\beta}_j) = \beta_j, \quad j = 0, 1, \ldots, k$

for any values of the population parameter $\beta_j$. In other words, the OLS estimators are
unbiased estimators of the population parameters.

Including Irrelevant Variables in a Regression Model
This means that one or more of the independent variables is included in the model even
though it has no partial effect on $y$ in the population (that is, its population coefficient is zero).
To illustrate this, suppose we specify the model as follows:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$

and this model satisfies assumptions MLR.1 through MLR.4. However, $x_3$ has no effect on $y$
after $x_1$ and $x_2$ have been controlled for, which means that $\beta_3$ is equal to zero. The variable
$x_3$ may or may not be correlated with $x_1$ and $x_2$; all that matters is that, once $x_1$ and $x_2$ are
controlled for, $x_3$ has no effect on $y$. In terms of conditional expectations,

$E(y \mid x_1, x_2, x_3) = E(y \mid x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

Because we do not know that $\beta_3$ is equal to zero, we are inclined to estimate the equation
including $x_3$:

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3$
What is the effect of including the irrelevant variable when its coefficient in the population model is zero?
In terms of the unbiasedness of $\hat{\beta}_1$ and $\hat{\beta}_2$, there is no effect.
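Both claims, unbiasedness under MLR.1 through MLR.4 and the harmlessness of an irrelevant regressor, can be illustrated by a Monte Carlo simulation. A sketch with invented parameters, where $x_3$ is irrelevant ($\beta_3 = 0$):

```python
# Monte Carlo check: average OLS estimates across many samples are close
# to the true parameters, including a zero coefficient on the irrelevant x3.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 2000
beta = np.array([1.0, 0.5, -0.3, 0.0])   # invented; beta3 = 0 makes x3 irrelevant

estimates = np.zeros((reps, 4))
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
    y = X @ beta + rng.normal(size=n)    # MLR.1-MLR.4 hold by construction
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))  # approximately (1.0, 0.5, -0.3, 0.0)
```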
The Omitted Variable Bias
Now suppose that, rather than including an irrelevant variable, we omit a variable that
actually belongs in the true population model. This is often called the problem of excluding a
relevant variable, or underspecifying the model.
Let us begin with the case where the true population model has two explanatory variables and
an error,

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$

and we assume that this model satisfies assumptions MLR.1 through MLR.4. However, due to
ignorance or data unavailability, we estimate the model by excluding $x_2$; in other words, we
perform a simple regression of $y$ on $x_1$ only, to obtain the equation

$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$

We use the symbol 'tilde' rather than the 'hat' to emphasize that $\tilde{\beta}_1$ comes from an
underspecified model.
Example

$Wage = \beta_0 + \beta_1 edu + \beta_2 exper + u$

We instead estimate the model

$Wage = \beta_0 + \beta_1 edu + v$

where $v$ is equal to $\beta_2 exper + u$. As we derived $\hat{\beta}_1$ under simple regression, a similar
procedure can be used to derive $\tilde{\beta}_1$ in this case. However, $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the
slope from the simple regression of $x_2$ on $x_1$. When assumptions MLR.1 through MLR.4 hold,
we know that $\hat{\beta}_1$ and $\hat{\beta}_2$ would be unbiased for $\beta_1$ and $\beta_2$ respectively. Therefore,

$E(\tilde{\beta}_1) = E(\hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1) = E(\hat{\beta}_1) + E(\hat{\beta}_2)\tilde{\delta}_1 = \beta_1 + \beta_2 \tilde{\delta}_1$

which implies that the bias in $\tilde{\beta}_1$ is $E(\tilde{\beta}_1) - \beta_1 = \beta_2 \tilde{\delta}_1$.

$\beta_2 \tilde{\delta}_1$ is called the omitted variable bias. So $\tilde{\beta}_1$ can be unbiased only under two possibilities:
1. If $\beta_2 = 0$, since then $\beta_2 \tilde{\delta}_1$ will be equal to zero.
2. If $\tilde{\delta}_1 = 0$, which holds exactly when the sample covariance between $x_1$ and $x_2$ is zero.
Note: Overspecification does not affect the unbiasedness of the OLS estimators when the four
assumptions hold, but underspecification can lead to bias in the OLS estimators. Even so, we can
still have unbiasedness under the two conditions stated above, that is, when $x_2$ is an irrelevant
variable ($\beta_2 = 0$) or when $x_1$ and $x_2$ are uncorrelated ($\tilde{\delta}_1 = 0$).
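The bias formula $\beta_2 \tilde{\delta}_1$ can likewise be illustrated by simulation. In the sketch below (all parameters invented), $x_2$ is generated as $0.6 x_1$ plus noise, so $\tilde{\delta}_1 \approx 0.6$ and the expected bias is roughly $\beta_2 \times 0.6$:

```python
# Simulating the omitted variable bias: the short regression of y on x1
# alone over-estimates beta1 by approximately beta2 * delta1.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
beta1, beta2 = 1.5, 2.0                  # invented true parameters

b1_short = np.zeros(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)   # slope of x2 on x1 is about 0.6
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Simple (underspecified) regression of y on x1 only.
    b1_short[r] = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)

print(b1_short.mean() - beta1)           # close to beta2 * 0.6 = 1.2
```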
