
MBA: Econometrics: Theory and Application (Econ 612)
By Dagmawe T. ([email protected])
Course Contents
 Chapter 1: Introduction
 Chapter 2: Simple Linear Regression Model
 Chapter 3: Multiple Linear Regression Model
 Chapter 4: Violations of the Assumptions of the Classical Model
 Chapter 5: Regression Analysis with Qualitative Information: Binary (Dummy) Variables
 Chapter 6: Introduction to Basic Regression Analysis with Time Series Data
 Chapter 7: Introduction to Panel Data Regression Models
Chapter One: Introduction
1.1. Definition and Scope of Econometrics
1.2. Models: Economic Models and Econometric Models
1.3. Aim and Methodology of Econometrics
1.4. The Sources and Types of Data
1.1. Definition & Scope of Econometrics
What is Econometrics?
Econo-metrics: from the Greek oikonomia (economy) and metron (measure).
 Econometrics means economic measurement: measuring unknown values of theoretically defined parameters.
 The term was coined in the 1930s, following the foundation of the Econometric Society.
 Ragnar Frisch and Jan Tinbergen are regarded as the founders of econometrics.
 The scope of econometrics is broader than measurement.
 It is the social science in which the tools of economic theory, mathematics, and statistical inference are applied to the analysis of economic phenomena.
 Econometrics converts qualitative statements into quantitative statements.
 It is a systematic study of economic phenomena using observed data.
Econometrics is:
 A conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier (T. Haavelmo, 1944)
 The application of mathematical statistics to economic data in order to lend empirical support to economic-mathematical models and obtain numerical results (Gerhard Tintner, 1968)
 The quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference (P. A. Samuelson, T. C. Koopmans and J. R. N. Stone, 1954)
 A social science which applies economics, mathematics and statistical inference to the analysis of economic phenomena (Arthur S. Goldberger, 1964)
 Econometrics is a field of knowledge which helps us evaluate economic theories in empirical terms.
 It is defined as a science that deals with the measurement of economic relationships through statistical techniques.
Example:
 Relationship b/n demand for a good & its price
 Relationship b/n job performance & leadership style
 Relationship b/n tuition fees & number of students enrolled
 Relationship b/n advertising expenditure & market share
 Relationship b/n education and earnings
 Relationship b/n CGPA and hours of studying
[Venn diagram: econometrics lies at the intersection of economic theory, mathematical economics, and statistics; the pairwise overlaps give mathematical economics, economic statistics, and mathematical statistics.]
Econometrics is an integration of economic theory, statistical inference, and mathematical economics.
 Economic theory: makes statements that are qualitative in nature (verbal exposition).
 Mathematical economics: expresses economic theory in mathematical forms and symbols without verification of the theory (without empirical testing).
 Statistics: economic and mathematical (inferential)
 deals with collecting, processing, and presenting data.
None of these, on its own, provides numerical values (measurements) for economic relationships.
 Economic theory, mathematical economics and statistics are necessary, but not by themselves sufficient, conditions for understanding real situations.
 It is the unification of the three that is powerful and that constitutes econometrics (R. Frisch, 1933).
Differences
 There is no essential difference between economic theory and mathematical economics: only the way of expression.
 Both state economic relationships in exact form.
 Neither allows for random elements.
Econometrics differs from mathematical economics in that it assumes random (stochastic) relationships among economic variables.
 Econometric methods provide numerical values of the coefficients of economic phenomena.
Econometrics differs from economic statistics in that the latter is not concerned with testing economic theories.
 Econometrics differs from mathematical statistics in that the latter deals with statistical methods of measurement based on controlled experiments: experimental data.
 Econometrics uses adapted statistical methods, adjusted to be appropriate for economic relationships.
 Econometrics involves non-experimental (observational) data.
 Econometrics is a set of research tools employed in the various business disciplines and social sciences.
 Studying econometrics fills the gap between being "a student of economics" and being "a practicing economist".
 Econometrics is about how we can use economic, business or social science theory, data and tools from statistics to answer "how much" type questions.
It is the application of statistical and mathematical techniques to the analysis of empirical data with the purpose of verifying or refuting theories.
1.2. Economic Models vs. Econometric Models
 Models are simplified representations of the real world.
Economic Model
 is an organized set of relationships that describes the functioning of an economic entity under a set of simplifying assumptions.
 consists of three structural elements:
1. A set of variables
2. A list of fundamental relationships
3. A number of coefficients
 postulates exact (deterministic) relationships among variables.
An Econometric Model:
 consists of behavioral equations derived from economic models and a specification of the probability distribution of the errors.
 contains a random element which is ignored by economic models.
 has two parts: observed variables and disturbances.
 postulates stochastic (random) relationships among variables.
 Example: Economic theory postulates that demand for a good depends on its price, on the prices of other related commodities, on consumers' income and on tastes.
Economic Model: Q = β0 + β1P + β2Pr + β3Y + β4T
Econometric Model: Q = β0 + β1P + β2Pr + β3Y + β4T + u
where Q is quantity demanded, P its price, Pr the prices of related commodities, Y consumers' income, T tastes, and u stands for the error term.
 The error term u makes the distinction between economic and econometric models.
 The main difference between economic modeling and econometric modeling is that economic modeling is exact in nature, whereas the latter also contains a stochastic term.
Why do we include the error term in the model?
1.3. Aims and Methodology of Econometrics
Three main goals of econometrics:
Analysis: testing the implications of a theory
 verifying how well economic theories explain the observed behavior of economic units.
Policy making: obtaining numerical estimates of the coefficients of economic relationships for policy simulations.
Forecasting: using the numerical estimates of the coefficients to forecast future values of economic magnitudes, for the planning of economic development.
Methodology of Econometrics
Econometric research is concerned with the measurement of the parameters of economic relationships and the prediction of values of variables.
 Starting from postulated theoretical relationships among economic variables, the following steps/stages are involved in econometric research methodology:
• Specification of the model
• Estimation of the model
• Evaluation of the estimates
• Evaluation of the forecasting power of the model
1. Specification of the Model
• Formulation of the maintained hypothesis.
• It is about expressing the relationships between economic variables in mathematical form.
• It involves the determination of:
 the variables included in the model: dependent and independent variables
 the size and sign of the parameters of the model (a priori theoretical expectations)
 the mathematical form of the model: the number of equations and the specific functional form of the equations (linear or non-linear)
• It presupposes knowledge of theory and familiarity with the particular phenomenon being studied.
• It should be theory-inspired and data-centered.
• It is the most important and difficult stage of any econometric research.
• It is often the weakest point of most econometric applications.
• Almost all econometric models are sensitive to specification errors. Why?
• Common reasons:
 imperfections and loose statements in theories
 limitations of our knowledge
 obstacles in data requirements
• The most common errors of model specification are:
a) Omission of important variables from the model.
b) Inclusion of irrelevant variables in the model.
c) Wrong mathematical form & measurement errors.
Note: our econometric models are mis-specified to some extent. Hence, we seek models that are reasonably well specified, keeping our errors relatively modest.
2. Estimation of the Model: testing the maintained hypothesis
• It is about providing numerical estimates of the parameters of the model.
• It is a purely technical stage, which requires knowledge of:
 the various econometric methods and their assumptions,
 the economic implications of the estimates.
• This stage involves:
 Data collection: gathering data on the variables included in the model
 Examining the aggregation problems of the model: examining the possibility of aggregation errors in the estimates of the coefficients
 Examining the identification conditions of the model: checking whether the parameters are the true coefficients of the estimated model, and determining whether a relationship can be statistically estimated or not.
 Examining the degree of multicollinearity: the degree of correlation between the explanatory variables.
 Choosing an appropriate econometric technique for estimation: OLS, logit, probit, VEC, ARDL, etc.
Single-equation techniques: applied to one equation at a time.
Simultaneous-equation techniques: applied to all equations of a system.
 Some important criteria for the choice of an appropriate estimation technique:
– the nature of the relationship and its identification condition
– the desirable properties of the estimates obtained: unbiasedness, efficiency, consistency and sufficiency
– the purpose of the econometric research: analysis, forecasting, policy making
– the simplicity of the technique: easy computation and lower data requirements
– time and cost requirements
3. Evaluation of Estimates
• It is about determining the reliability of the estimates (the results of the model).
• It consists of deciding whether the estimates are theoretically meaningful, statistically satisfactory and econometrically correct.
• Evaluation involves three sets of criteria: economic criteria, statistical criteria, and econometric criteria.
I. Economic a priori criteria
- refer to the size (magnitude) & sign of the parameters
- are determined by economic theory
- Estimates with wrong signs or sizes should be rejected unless there is a good reason to believe the result.
II. Statistical criteria: first-order tests
- aim at evaluating the statistical reliability of the estimates
- are determined by statistical theory
- The correlation coefficient test, standard error test, t-test, F-test, & R²-test are some of the most commonly used statistical tests.
III. Econometric criteria: second-order (post-estimation) tests
– aim at detecting the violation or validity of the assumptions of the econometric technique employed
– are determined by econometric theory
– determine the reliability of the statistical criteria
– help us establish whether the estimates have the desirable properties.
4. Evaluation of the forecasting power of the model
• This stage involves investigating the stability of the estimates and their sensitivity to changes in the size of the sample: the ability of the model to predict future values of the dependent variable.
• Extra-sample performance of the model: outside-sample data.
• Some ways of establishing the forecasting power of the model are:
 using estimates of the model for a period not included in the sample;
 re-estimating the model with an expanded sample (a sample including additional observations).
• Conduct a test of statistical significance for the difference between the actual (original) and new (forecast) values to check the forecasting power of the model.
Summary of Econometric Modeling
1. Economic theory
2. Mathematical model of the theory
3. Specification of the econometric model
4. Collecting data
5. Estimation of the econometric model
6. Hypothesis testing
7. Forecasting or prediction
8. Using the model for policy purposes
Desirable properties of an econometric model
• The 'goodness' of an econometric model is judged according to the following desirable properties:
1. Theoretical plausibility: the model should be compatible (consistent) with the postulates of economic theory.
2. Explanatory ability: the model should be able to explain the observations of the actual world.
3. Accuracy of the estimates of the parameters: the estimates should approximate the true parameters of the model as closely as possible.
4. Forecasting ability: the model should produce satisfactory predictions of future values of the dependent variable.
5. Simplicity: the fewer the equations and the simpler their mathematical form, the better the model is considered, ceteris paribus.
1.4. The Sources and Types of Data
Econometrics would not exist in the absence of data.
Different types of datasets have their own advantages and limitations.
Data can be qualitative or quantitative.
There are three types of datasets:
1. Cross-sectional data
2. Time series data
3. Panel data
1. Cross-sectional data: data collected from different parties or entities at a given point in time.
e.g. a survey of consumer expenditures in 2010
2. Time series data: a set of observations on a variable or several variables over time.
• collected at regular intervals: daily, weekly, monthly, quarterly, annually, quinquennially, decennially
e.g. data on weather reports, money supply, inflation, GDP, government budget, population census
3. Panel or longitudinal data: data in which the same cross-sectional units (individuals, households, firms or countries) are surveyed over time.
• a combination of time series and cross-sectional data
Example of cross-sectional data

individual  consumption  income  age  sex
1           6000         7000    38   Male
2           2000         4000    32   Male
3           3000         4000    32   Female
4           520          700     70   Male
5           400          200     35   Female
6           1100         1500    24   Female
7           3500         4000    36   Male
8           2300         5000    24   Male
9           2000         3000    38   Female
10          1000         2000    25   Female
Example of time series data

year  RGDPpc    Inflation  FD     Investment  Govt. Exp  GDS
2007  4215.8    15.10      32.69  45441.75    21077.95   8810.00
2008  4548.02   55.20      27.25  64007.59    27432.23   12671.81
2009  4819.093  2.70       24.41  86908.68    33205.86   22640.61
2010  5282.513  7.30       27.06  106170.11   36081.07   29291.39
2011  5720.123  38.00      28.22  165380.00   53147.10   88842.90
2012  6053.615  20.80      25.34  277243.70   62044.50   143745.70
2013  6521.836  7.40       27.14  295456.40   77636.90   152382.90
2014  7007.728  8.50       28.07  403042.90   98115.20   217848.20
2015  7541.87   10.40      29.00  485888.40   111301.10  269229.70
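A dataset like this must be declared as a time series before Stata's time-series operators can be used. A minimal sketch, assuming the data are loaded with a numeric variable named year:
* declare yearly time-series data
tsset year
* lag and difference operators such as L.Inflation or D.RGDPpc then work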
Example of Panel data
year country HDI EC FDI FD URB
2011 RWANDA 0.458 396.772 2.06155 21.9664 42.256
2012 RWANDA 0.466 399.68 3.4537 20.6267 42.667
2013 RWANDA 0.475 406.254 3.93416 21.469 43.086
2014 RWANDA 0.481 416.796 4.1741 21.5041 43.514
2015 RWANDA 0.485 428.171 1.80488 21.0418 43.95
2011 ETHIOPIA 0.422 489.682 1.96074 17.7101 17.735
2012 ETHIOPIA 0.427 490.69 0.64317 17.7101 18.16
2013 ETHIOPIA 0.435 494.728 2.82041 17.7101 18.59
2014 ETHIOPIA 0.441 496.815 3.33569 17.7101 19.028
2015 ETHIOPIA 0.448 498.31 4.07436 17.7101 19.472
2011 KENYA 0.536 468.976 3.45734 30.5726 23.967
2012 KENYA 0.541 462.097 2.73775 29.5362 24.37
2013 KENYA 0.546 475.883 2.03063 31.7127 24.78
2014 KENYA 0.55 513.426 1.33605 34.1352 25.197
2015 KENYA 0.555 521.52 0.97185 34.3752 25.622
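Panel data must be declared with both a cross-section and a time identifier. A minimal sketch, assuming the string variable country and numeric year shown above:
* create a numeric id from the string country variable, then declare the panel
encode country, gen(cid)
xtset cid year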
Chapter 2: Simple Linear Regression
2.1. Concept of Regression
2.2. Population Regression Function (PRF) and Sample Regression Function (SRF)
2.3. Assumptions of the Classical Linear Regression Model (CLRM)
2.4. Estimating the Simple Linear Regression Model
2.5. Coefficient of Determination in SLRM
2.6. Hypothesis Testing in SLRM
2.7. Predictions Using SLRM


• Economic theory can give us the direction of a change: qualitative information.
• But what if we want to know just "how?" and "how much?"
• How can we model the relationship b/n variables?
• Then we need:
– a sample of data
– a way/technique to estimate the relationship
Regression analysis is one of the most important such techniques.
The Concept of Regression
Historical Origin of Regression
 The term 'regression' was coined by Sir Francis Galton (1822-1911), who studied the relationship between the height of children and the height of their parents.
 Galton found that, although there was a tendency for tall parents to have tall children and for short parents to have short children, the average height of children tended to converge or "regress" toward the average height of the population.
Modern Definition of Regression
 Regression analysis refers to estimating functions showing the relationship between two or more variables, together with the corresponding tests.
 It is a technique for studying the statistical dependence of one LHS variable (the dependent variable) on one or more RHS variables (the independent variables), with a view to estimating and/or predicting the average value of the former on the basis of fixed values of the latter.
 Regression does not necessarily imply causation: a statistical relationship in itself cannot establish a causal connection.
 Causation must come from outside statistics, ultimately from some theory or common sense.
Different Terminologies of Variables

Dependent Variable      Explanatory Variable(s)
Explained Variable      Independent Variable(s)
Predictand              Predictor(s)
Regressand              Regressor(s)
Response                Stimulus or control variable(s)
Endogenous              Exogenous(es)

Regression analysis assumes that the dependent variable is stochastic and the independent variables are fixed (non-stochastic).
Regression analysis has the following objectives & uses:
 to show the relationship among variables;
 to estimate the average value (mean) of the dependent variable given the value of the independent variable(s);
 to test hypotheses about the sign and magnitude of relationships;
 to forecast future value(s) of the dependent variable.
 It explains the variation in the dependent variable based on the variation in one or more independent variables.
 It indicates how variables are related, or to what extent variables are associated with each other.
Stochastic and non-stochastic relationships
 Non-stochastic relationship: Y = f(X)
– also called a deterministic, mathematical or exact relationship
– A relationship between X and Y is said to be non-stochastic if for each value of the independent variable (X) there is one and only one corresponding value of the dependent variable (Y).
– All observations fall on the line of the relationship.
– Exact relationships are rarely encountered in business environments.
 Stochastic relationship: Y = f(X) + u
– also called a statistical, random or inexact relationship
– A relationship between X and Y is said to be stochastic if for each value of X there are several possible values of Y, each with some probability.
– Y cannot be entirely determined by X.
– Regression analysis is concerned with such relationships.
[Figures: a non-stochastic relationship, with all points on the line, versus a stochastic relationship, with points scattered around the line.]
Simple Linear Regression
– The term 'simple' refers to the fact that we use only two variables (one dependent and one independent variable).
– 'Linear' refers to linearity in the parameters; the model may or may not be linear in the variables. Each parameter appears with a power of one and is not multiplied or divided by another parameter.
 Simple linear regression is the simplest form of regression analysis, having a single explanatory variable related in linear form.
Yi = β0 + β1Xi + ui
where β0 + β1Xi is the deterministic component and ui is the random (stochastic) component.
Where: Y is the dependent variable
X is the independent variable
β0 & β1 are the regression coefficients (parameters)
ui is the error (disturbance) term
i indexes the cases or observations
[Figure: scatter plot with the line Yi = β0 + β1Xi, showing the intercept β0, the slope β1, and a vertical deviation ui.]
Error (disturbance) term (ui): a proxy for all variables that are not included in the regression model but may collectively affect Y.
 Why do we include u in the model? Sources of the error term:
1. It captures the effect of omitted variables.
Why omitted?
 Lack of data and limited knowledge
 Vagueness of theory
 Difficulty in measuring some factors
 Poor proxy variables
 Principle of parsimony: keeping our model simple and manageable
2. Random behavior of human beings: human reactions are unpredictable.
3. Measurement errors: variables may be measured inaccurately.
4. Wrong model specification, due to:
 wrong mathematical form
 exclusion of important variables & inclusion of irrelevant ones
5. Aggregation error: errors in aggregation over time or space.
 The error term is a term added to a regression model to capture all the variation in Y that can't be explained by the Xs.
Population Regression Function and Sample Regression Function
Population Regression Function (PRF)
 It is the function that relates the conditional mean of the dependent variable to the independent variable(s).
 It measures how the mean value of Y varies with X.
E(Y | Xi) = β0 + β1Xi
Yi = E(Y | Xi) + ui = β0 + β1Xi + ui (stochastic version)
Conditional Expectation
Weekly family consumption expenditure (Y) at each level of X:

X:        80   100  120  140  160  180  200  220  240  260
Y values: 55   65   79   80   102  110  120  135  137  150
          60   70   84   93   107  115  136  137  145  152
          65   74   90   95   110  120  140  140  155  175
          70   80   94   103  116  130  144  152  165  178
          75   85   98   108  118  135  145  157  175  180
          --   88   --   113  125  140  --   160  189  185
          --   --   --   115  --   --   --   162  --   191
Total:    325  462  445  707  678  750  685  1043 966  1211
E(Y|X):   65   77   89   101  113  125  137  149  161  173
Sample Regression Function (SRF)
 It is the sample counterpart of the PRF.
 It is an estimator of the PRF on the basis of sample information.
 SRFs are approximations of the true PRF, due to sampling fluctuations.
Yi = β̂0 + β̂1Xi + ei (SRF, stochastic form)
Ŷi = β̂0 + β̂1Xi (SRF)
where Ŷi is the estimator of E(Y | Xi) and ei is the residual term.
[Figure: sample points scattered around the SRF line, with a residual ei marked.]
Notes on PRF and SRF
 The PRF is an idealized concept (unobservable), whereas the SRF is observable.
 For a given PRF, there are many SRFs.
 The primary objective of regression analysis is to estimate the PRF using the SRF.
 The disturbance term ui plays a critical role in estimating the PRF.
 For empirical purposes, it is the stochastic PRF that matters.
Assumptions of the Classical Linear Regression Model
 The CLRM is the basic framework of regression analysis and rests on a set of assumptions.
 Statistical inferences are valid if the model assumptions are reasonably satisfied in practice.
1. Linearity assumption: the model must be linear in the parameters, but not necessarily linear in the variables.
 Yi = β0 + β1Xi + ui must be linear in β0 and β1.
2. X values are fixed in repeated sampling:
 X is assumed to be non-stochastic.
 The X's are the same in all samples (across samples).
3. Variability in X values: the X-values in a given sample must not all be the same (within a sample).
 Var(X) must be a finite positive number.
 If Var(X) = 0, it is impossible to estimate the parameters.
4. Randomness of ui: the error term is assumed to be a random variable.
5. Zero mean assumption: E(ui | Xi) = 0 for all i.
 The error terms have zero conditional mean.
 On average, factors which are not included in the model do not systematically affect the mean value of Y: positive and negative effects of ui tend to cancel each other out.
 Parameter estimation requires a known relationship between the data and the regression function.
6. The assumption of homoscedasticity: var(ui | Xi) = σ²
 constant (equal) conditional variance of the error term.
 The variance of the error term is the same for all observations (the same across X values).
 Its violation (heteroscedasticity) is common in cross-sectional data.
Question: prove that var(Yi | Xi) = σ².
[Figure: Y exhibits similar amounts of variance across the range of values for X.]
7. The assumption of no autocorrelation (no serial correlation): cov(ui, uj) = 0 for i ≠ j
 Error terms of different observations are independent.
 The error terms across observations are NOT correlated with each other.
 The error term in one time period never affects the error term in the next.
 Its violation is very common in time-series data.
Question: prove that cov(Yi, Yj) = 0 for i ≠ j.
Distinguish between correlation and autocorrelation.
8. The assumption of zero covariance between ui and Xi: cov(ui, Xi) = 0
 No correlation between the regressors and the error terms.
 X and u are assumed to have separate and additive effects on Y.
What if u and X are correlated?
Prove that cov(ui, Xi) = 0 when X is fixed.
9. Normality assumption: ui ~ N(0, σ²)
 ui is assumed to have a normal distribution with zero mean and constant variance.
10. No model specification error: the model is correctly specified.
11. The number of observations must be greater than the number of explanatory variables.
12. The assumption of no perfect multicollinearity (Chapter 3).
 These are assumptions about the model, about the error term (4-9), and about the data. How realistic are all these assumptions?
Basic concepts derived from the above assumptions:
1. Y is normally distributed: Yi ~ N(β0 + β1Xi, σ²).
2. Successive values of Y are independent: cov(Yi, Yj) = 0.
Estimating the Simple Linear Regression Model
 Estimation is about finding numerical values for the population parameters.
 The parameters in the SLRM can be estimated by different techniques of estimation.
 The most commonly used estimation techniques are:
1. Method of Moments (MOM): the oldest method
2. Ordinary Least Squares (OLS)
3. Maximum Likelihood (ML)
Method of Moments: a method of providing numerical values of the population parameters by replacing the population moment conditions with their sample counterparts.
Ordinary Least Squares (OLS)
 Developed by Carl Friedrich Gauss.
 It is the most widely used estimation method. Why?
 It is a method which involves finding the values of the estimates for which the sum of squared residuals is minimized.
 Recall:
PRF: Yi = β0 + β1Xi + ui
SRF: Yi = β̂0 + β̂1Xi + ei, with Ŷi = β̂0 + β̂1Xi
ei = Yi − Ŷi
ei = Yi − Ŷi = Yi − β̂0 − β̂1Xi
Question: how is the SRF determined in such a way that it is as close as possible to the PRF?
– How do we make sure that β̂0 and β̂1 are as close as possible to β0 and β1, respectively?
This calls for minimization.
Criteria
1. Minimizing Σei: choosing the SRF in such a manner that the sum of the residuals is as small as possible.
However, this is not a good criterion, as it gives equal weight to all residuals regardless of size, and large positive and negative residuals cancel out.
2. Least squares criterion:
 The SRF can be fixed in such a way that the residual sum of squares (RSS) is as small as possible.
It is an important criterion. Why?
a. It gives a fair weight to each residual.
b. The estimators obtained have some attractive statistical properties.
Hence (squaring both sides and summing):
Σei² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1Xi)²
 We estimate β̂0 and β̂1 in such a way that Σei² is a minimum.
 Thus we minimize Σei² = Σ(Yi − β̂0 − β̂1Xi)² with respect to β̂0 and β̂1.
 For minimization, take the partial derivatives with respect to β̂0 and β̂1 and set them to zero:
∂Σei²/∂β̂0 = −2Σ(Yi − β̂0 − β̂1Xi) = 0
∂Σei²/∂β̂1 = −2ΣXi(Yi − β̂0 − β̂1Xi) = 0
 These yield the normal equations:
ΣYi = nβ̂0 + β̂1ΣXi
ΣXiYi = β̂0ΣXi + β̂1ΣXi²
 Finally, solving the normal equations yields:
β̂1 = Σxiyi / Σxi² (in deviation form, with xi = Xi − X̄ and yi = Yi − Ȳ)
β̂0 = Ȳ − β̂1X̄
Numerical properties of the OLS estimators (Reading assignment: Gujarati, pp. 62-65)
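These formulas are what Stata's regress command computes. A minimal sketch using the built-in auto dataset (illustrative data only, not the course example):
* OLS of price on mpg; _b[] holds the estimated coefficients
sysuse auto, clear
regress price mpg
display "slope = " _b[mpg] ", intercept = " _b[_cons]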
Statistical Properties of OLS Estimators and the Gauss-Markov Theorem
 Gauss-Markov theorem: given the assumptions of the CLRM, the OLS estimators are BLUE: Best Linear Unbiased Estimators.
 Linear: the OLS estimators are linear functions of Y.
 Unbiased: their expected values equal the true population parameters: E(β̂0) = β0, E(β̂1) = β1.
 Best (efficient): the OLS estimators have minimum variance among the class of linear unbiased estimators.
Gauss-Markov Theorem
 summarizes the statistical properties of the OLS estimators.
 states that the OLS estimator has the least (minimum) variance of all estimators in the class of linear & unbiased estimators.
Question: prove that the OLS estimators are BLUE.
Prove that β̂0 and β̂1 are linear in Y.
Prove that β̂0 and β̂1 are unbiased estimators of β0 and β1.
Prove that β̂0 and β̂1 have the minimum variance property.
Statistical Tests of Significance of the OLS Estimators
 First-order tests for evaluating the parameters.
 The two most commonly used first-order tests in econometric analysis:
1. The coefficient of determination (R²): used for judging the explanatory power of the regressors.
2. Testing the significance of the estimates: used for judging the statistical reliability of the estimates.
1. The coefficient of determination (R²)
 It is a measure of goodness of fit.
 It shows the percentage of the total variation of the dependent variable that can be explained by changes in the regressors included in the model.
 It tells us how "well" the sample regression line fits the data.
 What portion of the total variation in Y is explained by the model?
[Figure: the total variation in Y split into explained variation (due to the regression line) and unexplained variation (due to the residuals).]
 The total variation in Y has two parts: that due to the regression line and that due to the residual.
- Deviation of Y from its mean: yi = Yi − Ȳ ....... (1)
- Deviation of Ŷ from its mean: ŷi = Ŷi − Ȳ ....... (2)
- Deviation of Y from the regression line: ei = Yi − Ŷi ....... (3)
yi = ŷi + ei (square both sides and take the summation)
Σyi² = Σŷi² + Σei² + 2Σŷiei, but Σŷiei = 0. Why?
Σyi² = Σŷi² + Σei²
TSS = ESS + RSS
(total variation) = (explained variation) + (unexplained variation)
Thus, R² = ESS/TSS = 1 − RSS/TSS
 If RSS = 0 (so ESS = TSS), then R² = 1 and the fit is perfect.
 If ESS = 0, then R² = 0: there is no linear relationship between Y and X (β̂1 = 0).
 R² is a measure of the proportion of the total variation in Y which is explained by the variation in the regressors.
 Properties of R²:
1. R² is non-negative: 0 ≤ R² ≤ 1.
 If R² is close to 1, X is a 'good' explanatory variable.
 If R² is close to 0, X explains very little of the variation.
2. Testing the significance of the OLS estimators
 The interest of econometricians is not only in obtaining the estimator (point estimation) but also in using it to make inferences about the true parameter (interval estimation).
 For this purpose, we need:
 the variances var(β̂0) and var(β̂1) of the OLS estimators;
 an unbiased estimator of σ²;
 the normality assumption on ui: ui ~ N(0, σ²).
 Why the normality assumption?
 Then, β̂0 ~ N(β0, var(β̂0)) and β̂1 ~ N(β1, var(β̂1)). Why?
 One property of the normal distribution states that "any linear function of a normally distributed variable is itself normally distributed."
 Since the OLS estimators are linear functions of Y, they are also normally distributed.
 σ² is unknown in practice, since it is difficult to get population data, so we need an unbiased estimator of σ²: σ̂² = Σei²/(n − 2). Prove it.
 With the normality assumption, the OLS estimators have the following statistical properties:
1. They are unbiased.
2. They are efficient: precise estimation.
3. They are consistent: as the sample size increases indefinitely, the estimators converge to their true population values.
 Since the OLS estimator β̂1 is normally distributed:
1. Z = (β̂1 − β1)/SE(β̂1) ~ N(0, 1), if σ² is known: standard normal distribution.
2. t = (β̂1 − β1)/SE(β̂1) ~ t(n − 2), if σ² is estimated by σ̂²: t-distribution.
NB: the standard error (SE) is a measure of the precision (reliability) of the estimators in representing the true parameters.
 We need a test of significance of the estimators to measure the validity of the estimates:
 to measure the size of the sampling error;
 to determine the degree of confidence.
 Hypothesis: a statement about a population parameter. It may or may not be true.
 Null hypothesis (H0): the hypothesis that is initially assumed to be true (the status quo).
 Alternative hypothesis (H1): the hypothesis that we need to prove using data.
 We can't be 100% confident in statistical inferences from samples.
 There are two types of errors in hypothesis testing:
 Type I error: the wrong decision of rejecting a true H0.
 Type II error: the wrong decision of accepting a false H0.
 We cannot completely eliminate these errors, and Type I errors are generally considered more serious.
 The probability of a Type I error is called the level of significance and is denoted by α (alpha).
 Common levels of significance: 1%, 5%, and 10%.
 Hypothesis:
H0: β1 = 0 --- X is statistically insignificant (no relationship between X and Y)
H1: β1 ≠ 0 --- X is statistically significant (a relationship between X and Y)
 How do we make a decision about whether to reject the null hypothesis or not?
Common tests of significance (decision approaches):
1. Standard error test
2. Test statistic approach (Student's t-test)
3. Confidence interval approach
1. Standard error test
Steps
I. Compute the standard errors of the parameters: SE(β̂1), SE(β̂0).
II. Compare the SEs with the numerical values of the parameters.
Decision rule:
 If SE(β̂1) < β̂1/2, reject H0 (accept H1).
 If SE(β̂1) > β̂1/2, fail to reject H0.
2. t-statistic approach (Student's t-test)
Steps
I. Compute the t-statistic: t = β̂1 / SE(β̂1).
II. Choose the level of significance (α): 1%, 5% or 10%.
III. Find the critical value of t: t_(α/2)(n − 2).
Decision rule:
 If |t| > t_(α/2), reject H0 (accept H1).
 If |t| < t_(α/2), fail to reject H0 (reject H1).
Rule of thumb: reject H0 if |t| > 2.
3. Confidence interval approach
 We need to construct a CI to define how close the estimate is to the true population parameter.
 We must establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence".
 Given the level of significance α, the interval is β̂1 ± t_(α/2)·SE(β̂1).
Decision rule:
 If the (1 − α) CI contains the hypothesized value, we fail to reject H0.
4. P-value approach
 P-value: the exact level of significance.
 It is the lowest level of significance at which the null hypothesis can be rejected.
Decision rule:
 If the p-value < α, reject H0.
 Reading assignment: statistical significance vs practical significance.
Predictions with Simple Linear Regression
 One of the most important applications of regression analysis is the prediction of the dependent variable for a given value of the regressors.
 Once the estimated parameters have been shown to be significant and valid, the model can be used to forecast/predict future values of the dependent variable.
 There are two kinds of predictions:
 Mean prediction: the prediction of the conditional mean value of Y corresponding to a chosen X.
 Individual prediction: the prediction of an individual Y value corresponding to a given X.
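Stata's predict command returns both kinds of prediction standard errors after regress. A minimal sketch, assuming a model of y on x has just been fitted:
predict yhat, xb       // point prediction (the same for both kinds)
predict semean, stdp   // standard error of the mean prediction
predict seind, stdf    // standard error of the individual (forecast) prediction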

Example: Consider a SLRM that relates the consumption (Y) and income (X) of households. The following data were obtained from a sample of 10 observations.

X  10  7   10  5  8   8  6  7   9   10
Y  11  10  12  6  10  7  9  10  11  10

Questions
1. Estimate the consumption function and interpret the results.
2. Estimate the standard errors of the regression coefficients.
3. What percent of the variation is explained by the regression line (R²)?
4. Test the hypothesis that income influences consumption at the 95% confidence level.
5. Predict consumption at an income of 20.
X    Y    X²   XY   Ŷ      e=Y−Ŷ  e²     y=Y−Ȳ  y²     x=X−X̄  x²  xy    ŷ=Ŷ−Ȳ  ŷ²
10   11   100  110  11.10  −0.10  0.01   1.4    1.96   2      4   2.8   1.50   2.25
7    10   49   70   8.85   1.15   1.32   0.4    0.16   −1     1   −0.4  −0.75  0.5625
10   12   100  120  11.10  0.90   0.81   2.4    5.76   2      4   4.8   1.50   2.25
5    6    25   30   7.35   −1.35  1.82   −3.6   12.96  −3     9   10.8  −2.25  5.0625
8    10   64   80   9.60   0.40   0.16   0.4    0.16   0      0   0     0      0
8    7    64   56   9.60   −2.60  6.76   −2.6   6.76   0      0   0     0      0
6    9    36   54   8.10   0.90   0.81   −0.6   0.36   −2     4   1.2   −1.50  2.25
7    10   49   70   8.85   1.15   1.32   0.4    0.16   −1     1   −0.4  −0.75  0.5625
9    11   81   99   10.35  0.65   0.42   1.4    1.96   1      1   1.4   0.75   0.5625
10   10   100  100  11.10  −1.10  1.21   0.4    0.16   2      4   0.8   1.50   2.25
Σ    80   96: ΣX = 80, ΣY = 96, ΣX² = 668, ΣXY = 789, ΣŶ = 96, Σe = 0, Σe² = 14.65, Σy = 0, Σy² = 30.4, Σx = 0, Σx² = 28, Σxy = 21, Σŷ = 0, Σŷ² = 15.75
β̂1 = Σxy/Σx² = 21/28 = 0.75 and β̂0 = Ȳ − β̂1X̄ = 9.6 − 0.75(8) = 3.6
Ŷi = β̂0 + β̂1Xi = 3.6 + 0.75Xi
 β̂1 = 0.75: a one-unit increase in income leads to an increase in consumption of 0.75 units, on average.
 β̂0 = 3.6: the amount of consumption at zero income (the minimum level of consumption).
2. σ̂² = Σe²/(n − 2) = 14.65/8 ≈ 1.83
Then, SE(β̂1) = √(σ̂²/Σx²) = √(1.83/28) ≈ 0.26
SE(β̂0) = √(σ̂²·ΣX²/(n·Σx²)) = √(1.83 × 668/280) ≈ 2.09
 52% of the total variation in consumption is explained
by variation in income.

4. Hypothesis:

Decision: reject since . Thus, income is statistically


significance in affecting consumption.
100
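A minimal Stata sketch that reproduces these hand calculations from the ten observations above:
clear
input x y
10 11
7 10
10 12
5 6
8 10
8 7
6 9
7 10
9 11
10 10
end
regress y x
* expect _b[x] = 0.75, _b[_cons] = 3.6, R-squared = 0.52
display "t on income = " _b[x]/_se[x]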
Chapter Three: Multiple Linear Regression Model
– Introduction
– A model with two regressors
– The method of ordinary least squares revisited
– Coefficient of multiple determination
– Statistical properties of the least squares estimators
– Hypothesis testing in multiple linear regression
 So far, we have discussed the simple linear regression model.
 The SLRM involves the relationship between the dependent variable and a single explanatory variable.
 Simple linear regression is a rare case, however.
 It is inadequate for many applications.
Many economic theories suggest relationships involving more than one regressor.
Example
 Demand = f(price, income, preferences, expectations, ...)
 Consumption = f(income, age, wealth, sex, ...)
 Work performance = f(salary, leadership style, work overload, working environment)
 Students' CGPA = f(studying hours, previous GPA, sex, PC ownership, ...)
A multiple regression model simultaneously considers the influence of several explanatory variables on a dependent variable.
The aim is to look at the individual effect of each variable while "adjusting out" the influence of the other variables.
 The multiple linear regression model (MLRM) is:
Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui
where Y is the dependent variable (regressand)
the X's are the explanatory variables
the β's are the partial regression coefficients (parameters)
u is the error (disturbance) term. Why?
NB: no matter how many regressors are included in our regression model, there will still be factors we could not include; these are collectively captured by the error term.
Assumptions of the MLRM
 The assumptions and their implications (rationale) are the same as those discussed for the SLRM:
1. Randomness of the error term
2. Zero mean of the error term: E(ui) = 0
3. Homoscedasticity: var(ui) = σ²
4. No autocorrelation: cov(ui, uj) = 0 for i ≠ j
5. Zero covariance: cov(ui, Xi) = 0
6. Normality: ui ~ N(0, σ²)
7. No specification bias
8. Linear in parameters
9. The assumption of no perfect multicollinearity
The assumption of no perfect multicollinearity
 It states that the regressors (X1, X2, ..., Xk) are not perfectly correlated.
 There is no exact linear relationship between/among the X's.
 None of the regressors can be written as an exact linear combination of the remaining regressors in the model.
 If two variables are perfectly correlated, it would be difficult to disentangle the separate effects of the two variables.
 It does not rule out non-linear relationships.
 Correlation among economic variables is common; the problem arises when there is perfect collinearity among them.
A model with two explanatory variables
 The three-variable regression model: the simplest case of the MLRM.
PRF: E(Y | X1i, X2i) = β0 + β1X1i + β2X2i
Yi = β0 + β1X1i + β2X2i + ui
 β0: the average value of Y when both regressors are zero; it also absorbs the average influence of regressors excluded from the model.
 β1: measures the change in the mean value of Y per unit change in X1, holding the value of X2 constant.
 β2: measures the net effect of a unit change in X2, keeping X1 constant.
 However, since the PRF is unknown to us, it has to be estimated from sample data.
SRF: Yi = β̂0 + β̂1X1i + β̂2X2i + ei
where the β̂'s are estimates of the β's and Ŷi = β̂0 + β̂1X1i + β̂2X2i is the predicted value of Y.
Then, ei = Yi − Ŷi.
Estimation Problem: the OLS Method Revisited
 The OLS estimators are derived by minimizing RSS (Σei²).
 To obtain the OLS estimators, we partially differentiate Σei² with respect to β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero. This gives:
Σei = 0
Σx1iei = 0
Σx2iei = 0
Rearranging these expressions produces the following three normal equations:
ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i ....... (1)
ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i ....... (2)
ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i² ....... (3)
From equation (1), we obtain: β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2
From (2) and (3), in deviation form, the equations become:
Σx1y = β̂1Σx1² + β̂2Σx1x2
Σx2y = β̂1Σx1x2 + β̂2Σx2²
Finally, we obtain the OLS estimators as:
β̂1 = (Σx1y·Σx2² − Σx2y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
β̂2 = (Σx2y·Σx1² − Σx1y·Σx1x2) / (Σx1²·Σx2² − (Σx1x2)²)
 If X1 and X2 are uncorrelated (Σx1x2 = 0), the multiple regression coefficient of X1 reduces to the simple regression coefficient of Y on X1.
Variances of the β̂'s:
var(β̂1) = σ²·Σx2² / (Σx1²·Σx2² − (Σx1x2)²)
var(β̂2) = σ²·Σx1² / (Σx1²·Σx2² − (Σx1x2)²)
where σ̂² = Σei²/(n − 3) is the unbiased estimator of σ².
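In Stata, these variances sit on the diagonal of the estimated variance-covariance matrix stored after estimation. A minimal sketch on the built-in auto data (illustrative only):
sysuse auto, clear
regress price mpg weight
matrix list e(V)   // variance-covariance matrix of the coefficient estimates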
Coefficient of Determination (R²)
 In a regression model with two regressors:
Σy² = β̂1Σx1y + β̂2Σx2y + Σe²
TSS = ESS + RSS
Hence, R² = ESS/TSS = (β̂1Σx1y + β̂2Σx2y)/Σy²
 As in simple regression, R² is a measure of the proportion of the total variation in Y which is explained by the regressors in the model (a measure of goodness of fit).
 If R² is high, the model is said to fit the data well.
 One problem with R² is that it always increases, and never decreases, with every regressor added to the model (i.e., even if the added variables have no economic justification).
 So, in order to incorporate the impact of changes in the number of regressors in the model, it is necessary to adjust R² for degrees of freedom.
 This is done by using an alternative measure of goodness of fit called the adjusted R² (R̄²).
Adjusted coefficient of determination (R̄²)
 R̄² is a corrected measure of goodness of fit.
 It does not always increase as a new variable is added.
 It is no longer the percentage of variation explained.
 It reflects the percentage of variation explained by the independent variables that actually affect the dependent variable.
 The addition of independent variables that do not fit the model is penalized by R̄².
 R̄² is computed as:
R̄² = 1 − (1 − R²)·(n − 1)/(n − k)
where k is the number of estimated parameters (including the intercept).
Note
1. R̄² = R² if k = 1 (or if R² = 1).
2. R̄² can be negative if R² ≈ 0 and k > 1.
3. R̄² will decrease if an irrelevant variable is added.
R̄² is sometimes used as a device for selecting the appropriate set of regressors.
The main difference between R² and the adjusted R²:
 R² implicitly credits every included variable with explaining variation in the dependent variable.
 The adjusted R² penalizes additional regressors, so it better reflects the variation explained by the independent variables that actually matter for the dependent variable.
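After regress, Stata stores both measures, so the formula can be checked directly. A minimal sketch, assuming a model has just been fitted with regress:
display "adjusted R2 (Stata): " e(r2_a)
display "by hand: " 1 - (1 - e(r2))*(e(N) - 1)/(e(N) - e(df_m) - 1)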
General Linear Regression Model (Matrix Approach)
 It is an MLRM with k explanatory variables:
Yi = β0 + β1X1i + β2X2i + ... + βkXki + ui, i = 1, 2, ..., n
 Here, we have n equations, one for each of the n observations.
 The equations can be put in matrix form as:
Y = Xβ + U
(n×1) (n×(k+1)) ((k+1)×1) (n×1)
PRF: Y = Xβ + U (unknown to us. Why?)
SRF: Y = Xβ̂ + e
 To derive the OLS estimators of β under the usual assumptions, we need to minimize
e'e = (Y − Xβ̂)'(Y − Xβ̂) = Y'Y − 2β̂'X'Y + β̂'X'Xβ̂. Why?
First-order condition (FOC):
∂(e'e)/∂β̂ = −2X'Y + 2X'Xβ̂ = 0
Two important theorems (matrix differentiation):
1. ∂(a'b)/∂b = a   2. ∂(b'Ab)/∂b = 2Ab (for symmetric A)
Then, X'Xβ̂ = X'Y
(multiplying both sides by (X'X)⁻¹): β̂ = (X'X)⁻¹X'Y
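A minimal Mata sketch of β̂ = (X'X)⁻¹X'Y, using the consumption example data from Chapter 2:
mata:
y = (11\10\12\6\10\7\9\10\11\10)
X = J(10,1,1), (10\7\10\5\8\8\6\7\9\10)   // column of ones, then income
invsym(cross(X,X))*cross(X,y)             // (X'X)^-1 X'y; expect (3.6 \ .75)
end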
Statistical Properties of the OLS Estimators
 As in the two-variable case, the OLS estimators satisfy the BLUE property:
E(β̂) = β and var(β̂) = σ²(X'X)⁻¹
Coefficient of Determination (R²)
 R² can be derived in matrix form as:
R² = ESS/TSS = (β̂'X'Y − nȲ²)/(Y'Y − nȲ²)
Hypothesis Testing in the MLRM
 In the MLRM, there are two tests of significance:
I. Tests of individual significance
II. Test of the overall significance of the model
Tests of Individual Significance
 Testing the significance of the individual parameters of the model.
 The t-test is used to check the significance of individual regression coefficients.
Hypotheses
H0: β1 = 0 --- X1 is statistically insignificant
H1: β1 ≠ 0 --- X1 is statistically significant
H0: β2 = 0 --- X2 is statistically insignificant
H1: β2 ≠ 0 --- X2 is statistically significant
 The null hypothesis states that Xi has no (linear) influence on Y.
Common testing approaches
1. Standard error test
 If SE(β̂i) < β̂i/2, we reject H0.
 If SE(β̂i) > β̂i/2, we fail to reject H0.
NB: the smaller the SEs, the stronger the evidence that the estimates are statistically reliable.
2. Student's t-test
 If |t| = |β̂i/SE(β̂i)| > t_(α/2)(n − k), we reject H0.
NB: the greater the value of |t|, the stronger the evidence that the estimate is statistically significant.
II. Test of overall (joint) significance
 It is a joint test of the relevance of all the regressors included in the model.
 Hypotheses:
H0: β1 = β2 = ... = βk = 0
H1: not all βj's are simultaneously zero
 Here, the null hypothesis is the joint hypothesis that β1, β2, ..., βk are jointly or simultaneously equal to zero (all slope coefficients are zero).
 The alternative hypothesis states that not all βj's are simultaneously zero.
 When H0 is assumed NOT to be true, the RSS is the unrestricted RSS (URSS).
 When H0 is assumed to be TRUE, the RSS is the restricted RSS (RRSS).
 We reject H0 when RRSS is large relative to URSS.
 If we fail to reject H0 (if H0 is assumed to be true), all slope coefficients are zero and the restricted model contains only the intercept; hence RRSS = TSS.
 Thus, in the joint significance test, the testing procedure is based on comparing the RSS from the original (unrestricted) regression model with the RSS from a regression model where H0 is assumed to be true:
F = [(RRSS − URSS)/(k − 1)] / [URSS/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] ~ F(k − 1, n − k)
The F-test is a measure of the overall significance of the estimated regression (model adequacy).
It is also a test of the significance of R²:
F = [R²/(k − 1)] / [(1 − R²)/(n − k)]
The calculated value of F is compared with the critical value of F which leaves a probability of α in the upper tail of the F-distribution with k − 1 and n − k degrees of freedom.
If F_cal > F_crit, we reject H0: all explanatory variables are jointly significant (Y is linearly related to the regressors).
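Stata reports this overall F automatically with every regress; the same joint hypothesis can also be tested explicitly. A minimal sketch on the built-in auto data (illustrative only):
sysuse auto, clear
regress price mpg weight
test mpg weight   // F-test of H0: both slope coefficients are jointly zero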
Overall significance: rejection rule
 Reject H0 in favor of Ha if F_calc falls in the upper-tail rejection region, i.e. if F_calc > F(k − 1, n − k, 1 − α); otherwise do not reject H0.
 Equivalently, reject H0 in favor of Ha if the p-value = P(F > F_calc) < α.
Chapter Five: Regression Analysis with Qualitative Information
The Nature of Qualitative Information
 Sometimes we cannot obtain a set of numerical values for all the variables we want to use in a regression model.
 This is because some variables cannot be quantified easily.
Examples:
(a) Gender may play a role in determining salary levels.
(b) Different ethnic groups may follow different consumption patterns.
So far, we have been concerned with regression analysis on variables which are quantitative in nature.
- They are recorded on a well-defined scale.
Example: consumption, price, income, experience, ...
 However, some variables are essentially qualitative in nature: they are defined on a nominal scale.
Example: sex, colour, nationality, region, ...
Such variables do not have any natural scale of measurement.
They usually indicate the presence or absence of a quality or an attribute.
Question: how can we quantify qualitative information?
Dummy Variables
 aka indicator or categorical variables
 are artificial variables constructed to quantify nominal-scale variables, taking on the values '1' or '0':
1 indicates the presence of the qualitative attribute;
0 indicates the absence of that attribute.
 They classify the data into mutually exclusive categories.
 They are proxy variables for qualitative characteristics in a regression model.
We can define n dummy variables for n categories.
 Dummy variables can be used as independent or dependent variables.
1. Dummies as independent variables
 Regression with dummy regressors.
 Dummy regressors are incorporated in regression models in the same way as quantitative variables.
 Such models can be estimated using the standard OLS methodology.
Some notes on using dummy regressors
1. When we have a dummy variable for each category and an intercept in our model, we face a perfect multicollinearity problem.
Dummy variable trap:
 a situation of perfect multicollinearity that arises if a dummy variable for every category and the intercept term are both included in the model.
Solution
a) Introduce n − 1 dummies for n categories in the model (the number of dummies should be one less than the number of categories).
Stata command: reg y i.x
b) Drop the intercept term if as many dummy variables are introduced as there are categories.
Stata command: reg Y D1 D2, noconstant
2. The category for which no dummy variable is assigned is called the base/benchmark/reference/omitted category.
 All comparisons are made in relation to the base category.
3. The intercept term represents the mean value of the base category.
4. The coefficients attached to the dummy variables are called differential intercept coefficients.
- They are interpreted as the change in the value of the dependent variable compared with the base category.
- They must always be interpreted in relation to the base category.
5. The choice of base category is up to the researcher.
 The choice of which of the two outcomes is assigned the value 1 does not alter the substantive results.
Stata commands:
To generate dummies:
gen D1=(varname==1)
gen D2=(varname==2)
or
xi i.varname, noomit
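An equivalent route that makes the trap easy to avoid is tabulate with the generate() option. A minimal sketch for a hypothetical three-category variable education:
tabulate education, generate(ed)   // creates dummies ed1, ed2, ed3
regress wage ed2 ed3               // ed1 is the omitted base category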
i) Regression with a single dummy regressor
 Consider Yi = β0 + β1Di + ui, where Y is wage and Di = 1 if female, 0 if male.
E(Yi | Di = 0) = β0 ------ mean wage for males
E(Yi | Di = 1) = β0 + β1 ----- mean wage for females
 β1 = E(Yi | Di = 1) − E(Yi | Di = 0) ----- the differential intercept coefficient
 It measures the mean wage difference b/n males and females.
 Whether the average wage of females is greater than that of their male counterparts depends on the sign and significance of β1.
 A positive and significant β1 implies that the mean wage of female workers is greater than that of males by the amount β1.
 How can we know whether sex significantly affects wage?
Hypothesis
H0: β1 = 0 -------- no sex discrimination
H1: β1 ≠ 0 ----- a salary difference b/n M & F
[Figure: two horizontal wage levels, β0 for Di = 0 and β0 + β1 for Di = 1, with β1 the gap between them.]
Example (using wage data):
Stata command: reg wage i.sex
a) Find the mean wage for males and females.
b) Interpret the slope coefficient.
c) Test the hypothesis that there is no sex discrimination.
ii) Regression with one quantitative and one dummy regressor
Yi = β0 + β1Di + β2Xi + ui, where X is years of work experience.
E(Yi | Di = 0, Xi) = β0 + β2Xi ------ mean wage for males
E(Yi | Di = 1, Xi) = (β0 + β1) + β2Xi ----- mean wage for females, at a given experience
 β1 is the mean salary difference between males and females with the same years of experience.
 If β1 > 0, females are paid more (by β1), on average, than their male counterparts with the same years of experience.
[Figure: two parallel wage-experience lines, (β0 + β1) + β2Xi for females and β0 + β2Xi for males, separated vertically by β1.]
 The rate of change in the mean wage with years of experience is the same for both sexes (a common slope β2).
Example:
Stata command: reg wage sex exp
a) Find the mean wage for males and females.
b) Find the mean wage differential b/n males and females with the same years of experience.
c) Comment on the significance of the slope coefficients.
iii) Multiple dummies in the regression model
 the case of more than two categories of a variable, e.g. educational level.
Yi = β0 + β1D1i + β2D2i + β3Xi + ui, where D1 = 1 for MSc and D2 = 1 for PhD (BA is the base category).
 β0 ----- mean salary for BA holders
 β1 ------ the mean salary differential b/n BA and MSc
 β2 ----- the mean salary differential b/n BA and PhD
Stata command: reg wage i.education exp
or: xi i.education, noomit
    reg wage _Ieducation_1 _Ieducation_2 exp
a) Find the mean wage for BA holders.
b) Find the mean wage for MSc holders.
c) Find the mean wage for PhD holders.
d) Interpret the coefficients.
e) Is there a significant mean wage difference across the categories?
iv) Regression with two qualitative variables
Yi = β0 + β1Di + β2D1i + β3D2i + β4Xi + ui, with D for sex (female = 1) and D1, D2 for education (MSc, PhD).
Base category: male BA workers.
E(Y | D = 0, D1 = 0, D2 = 0, X) = β0 + β4X ------ mean wage for male BA workers, at a given experience.
Stata command: reg wage sex i.education exp
or: xi i.education, noomit
    reg wage sex _Ieducation_1 _Ieducation_2 exp
. reg wage i.sex i.education experience

Source SS df MS Number of obs = 60


F(4, 55) = 67.60
Model 3646.86729 4 911.716823 Prob > F = 0.0000
Residual 741.803849 55 13.4873427 R-squared = 0.8310
Adj R-squared = 0.8187
Total 4388.67114 59 74.3842566 Root MSE = 3.6725

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

sex
female -.9461648 .957455 -0.99 0.327 -2.864948 .972618

education
MSc 6.262634 1.253719 5.00 0.000 3.750125 8.775143
PhD 12.80867 1.875075 6.83 0.000 9.050939 16.56641

experience 1.840992 .3825059 4.81 0.000 1.074433 2.607551


_cons 14.98184 1.403567 10.67 0.000 12.16903 17.79465
Base category: males with a BA
1. Find the mean wage for male BA holders.
2. Find the mean wage for female BA holders.
3. Find the mean wage for male MSc holders.
4. Find the mean wage for female MSc holders.
5. Find the mean wage for male PhD holders.
6. Find the mean wage for female PhD holders.
7. Comment on the significance of each coefficient and give your conclusions.
 Note that, in the above case, the effect of one dummy variable is independent of the other dummy variable (an additive effect: no interaction).
 For example, if the mean wage for male workers is higher than for their female counterparts, this is so whether they hold a BA, an MSc or a PhD.
 However, assuming that the differential effect of one dummy is constant across the other dummy is unrealistic in many cases.
 Thus, there may be an interaction between the dummy variables (additive and multiplicative effects).
v) Interaction between dummy variables
 An interaction term is an independent variable which is the product of two other regressors. The regressors can be dummy or continuous.
 An interaction dummy is the product of two dummy variables.
 It modifies the effects of the two attributes considered individually.
 If the interaction dummy is statistically significant, the presence of the two attributes reinforces (or offsets) their individual effects.
 If the coefficients of the interaction terms (sex × MSc and sex × PhD) are significant, the effect of education level on wage earnings depends on the sex of the worker.
 Hypothesis:
H0: the education differential does not depend on sex.
 Stata commands: gen se = sex*education
reg wage sex i.education i.se experience
. reg wage i.sex i.education i.se experience

Source SS df MS Number of obs = 60


F(6, 53) = 44.28
Model 3658.80087 6 609.800145 Prob > F = 0.0000
Residual 729.870269 53 13.7711371 R-squared = 0.8337
Adj R-squared = 0.8149
Total 4388.67114 59 74.3842566 Root MSE = 3.7109

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

sex
female -1.297905 1.7052 -0.76 0.450 -4.718102 2.122292

education
MSc 6.393786 1.767796 3.62 0.001 2.848036 9.939536
PhD 11.7254 2.427275 4.83 0.000 6.856904 16.5939

se
female Msc -.3059176 2.268282 -0.13 0.893 -4.855516 4.24368
female PhD 1.836186 2.541282 0.72 0.473 -3.26098 6.933352

experience 1.858286 .388374 4.78 0.000 1.079306 2.637265


_cons 15.12419 1.559297 9.70 0.000 11.99664 18.25175
a) Interpret the results.
b) Comment on the significance of the coefficients.
c) Does sex have the same impact on wage at each level of education?
d) Is there an interaction between sex and education?
 Since the interaction variables are insignificant, sex has the same impact on wage across the three education levels. Hence, there is no interaction between sex and education.
 Interaction occurs when the effect of one of the independent variables on the dependent variable depends on the other independent variable.
vi) Interaction between a dummy and a quantitative regressor
Yi = β0 + β1Di + β2Xi + β3(Di·Xi) + ui, where Y is wage, X is experience and D = 1 for female.
E(Y | D = 0, X) = β0 + β2X ---- mean wage for males
E(Y | D = 1, X) = (β0 + β1) + (β2 + β3)X ---- mean wage for females
β1 --- the intercept differential --- the wage difference b/n males and females with the same years of experience.
β3 ---- the slope differential ---- the differential effect of experience.
 Stata commands:
gen sexexp = sex*experience
reg wage sex experience sexexp
. reg wage sex experience sexexp

Source SS df MS Number of obs = 60


F(3, 56) = 40.70
Model 3008.69466 3 1002.89822 Prob > F = 0.0000
Residual 1379.97648 56 24.6424371 R-squared = 0.6856
Adj R-squared = 0.6687
Total 4388.67114 59 74.3842566 Root MSE = 4.9641

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

sex -.3727282 3.07909 -0.12 0.904 -6.540887 5.79543


experience 3.770985 .486521 7.75 0.000 2.796366 4.745603
sexexp .0231804 .6866312 0.03 0.973 -1.352307 1.398668
_cons 12.80339 2.238978 5.72 0.000 8.318173 17.2886
xi i.education*experience
. reg wage experience i.education _IeduXexper_1 _IeduXexper_2

Source SS df MS Number of obs = 60


F(5, 54) = 54.73
Model 3665.38648 5 733.077295 Prob > F = 0.0000
Residual 723.284662 54 13.3941604 R-squared = 0.8352
Adj R-squared = 0.8199
Total 4388.67114 59 74.3842566 Root MSE = 3.6598

wage Coef. Std. Err. t P>|t| [95% Conf. Interval]

experience 3.052143 .9533569 3.20 0.002 1.140777 4.963509

education
MSc 8.556469 3.579159 2.39 0.020 1.380686 15.73225
PhD 18.56934 4.311517 4.31 0.000 9.925272 27.21341

_IeduXexper_1 -1.032326 1.13667 -0.91 0.368 -3.311211 1.246559


_IeduXexper_2 -1.668114 1.102289 -1.51 0.136 -3.878071 .5418425
_cons 11.48786 2.503309 4.59 0.000 6.469025 16.50669
a) Interpret the results.
b) Check whether there is an interaction between sex and years of experience.
c) Check whether there is an interaction between education and years of experience.
vii) Dummy variables in semi-log models
 In semi-log models, the coefficient of a dummy variable, when multiplied by 100, is interpreted as the approximate percentage difference in Y.
 Exact percentage difference: 100·(e^β̂ − 1).
 Stata commands:
gen lnwage = ln(wage)
reg lnwage sex i.education experience
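The exact percentage difference can be computed from the stored coefficient. A minimal sketch, assuming the semi-log model above has just been run and sex is the dummy:
display 100*(exp(_b[sex]) - 1)   // exact % wage difference relative to the base category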
. reg lnwage sex i.education experience sexexp _IeduXexper_1 _IeduXexper_2

Source SS df MS Number of obs = 60


F(7, 52) = 46.12
Model 5.10124577 7 .728749395 Prob > F = 0.0000
Residual .821692063 52 .01580177 R-squared = 0.8613
Adj R-squared = 0.8426
Total 5.92293783 59 .100388777 Root MSE = .12571

lnwage Coef. Std. Err. t P>|t| [95% Conf. Interval]

sex -.0534635 .0784367 -0.68 0.499 -.2108582 .1039312

education
MSc .5249681 .1237604 4.24 0.000 .2766247 .7733115
PhD .8948986 .1486981 6.02 0.000 .596514 1.193283

experience .1600806 .0341374 4.69 0.000 .0915789 .2285822


sexexp .0031754 .0175443 0.18 0.857 -.0320298 .0383807
_IeduXexper_1 -.093619 .0392727 -2.38 0.021 -.1724253 -.0148126
_IeduXexper_2 -.1262159 .0379258 -3.33 0.002 -.2023196 -.0501122
_cons 2.557795 .0954203 26.81 0.000 2.36632 2.74927
1) Interpret the results.
2) Find the mean wage for males and females.
3) Find the mean wage for each level of education.
4) Does sex matter for wage?
5) Test the hypothesis that the effect of experience on wage is the same for males and females.
6) Test the hypothesis that the effect of experience on wage is the same across each level of education.
 A one-year increase in experience for BA workers leads to a 16% increase in wage, on average.
 A one-year increase in experience for MSc and PhD workers leads to a 6.65% and a 3.39% increase in wage, respectively, on average.
2. Dummy Dependent Variables (Qualitative Response Models)
What if the dependent variable is a dummy variable (qualitative in nature)?
Example: labor force participation decision, vote choice, house ownership, willingness-to-pay decision.
Can we still use the OLS methodology?
 What is the fundamental difference between regression models with a quantitative dependent variable & those with a qualitative dependent variable?
 When the dependent variable takes
 two responses --- binary / dichotomous variable
 more than two responses --- polychotomous variable
 Qualitative response regression models (probability
models) can be binary or multinomial.
 A binary response model is one where the dependent
variable takes two values/responses.
 The main binary response models are:
1. Linear Probability Model (LPM)
2. Logit Model
3. Probit Model
4. Tobit (reading assignment)
Linear Probability Model (LPM)

• LPM is a linear regression model with a binary
regressand.
• It is an extension of the OLS methodology to a dummy
dependent variable.
• Consider Yi = β1 + β2Xi + ui, where Yi = 1 if the event
occurs and Yi = 0 otherwise.
• E(Yi|Xi) is the conditional probability that the event will
occur, given Xi.
• E(Yi|Xi) = P(Yi = 1|Xi) = β1 + β2Xi ----- PRF
• 0 ≤ E(Yi|Xi) ≤ 1
• Example:
• E(Yi|Xi) is the probability of an individual deciding to work at
a given amount of wage (Xi, in thousands of birr).
• Stata command: reg Y X
• Ŷi = -0.9456861 + 0.102131Xi

 The intercept of −0.946 gives the "probability" that
an individual decides to work with a zero wage.
 Since this value is negative, and since probability cannot
be negative, we treat this value as zero, which is sensible
in the present instance.
 The slope value of 0.102 implies that for a unit change in
wage, on average, the probability of an individual
deciding to work increases by 0.1021, or about 10%.
 Of course, given a particular level of wage, say X = 12
thousand, the estimated probability of deciding to work
is Ŷ = −0.9457 + 12(0.1021) = 0.2795.
 That is, the probability that an individual with a wage of
birr 12,000 will decide to work is about 28%.
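 The fitted probability can be reproduced with a one-line check in
Stata, using the rounded coefficients above (it returns 0.2795):
di -0.9457 + 12*0.1021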
Limitations of LPM
1. Non-normality of the error term
 The error term follows a Bernoulli distribution.
 A Bernoulli distribution is an experiment with two
outcomes: success with probability P and failure
with probability (1-P), with mean P and variance P(1-P).
 As the sample size increases, statistical inference with
the LPM will follow the usual OLS procedure.
2. Heteroscedastic variance of the error term
 var(ui) = Pi(1 - Pi), which varies with Xi -------- prove it.
 Consequence: LPM estimates are inefficient
 Solution: weighted least squares (WLS)
• To avoid the heteroscedastic variance:
 predict p
 replace p=0 if p<0
 replace p=1 if p>1
 gen pf=1- p
 gen w= p* pf
 gen sqrtw= sqrt(w)
 gen ystar= y/ sqrtw
 gen x0star=1/ sqrtw
 gen x1star= income/ sqrtw
• Fit the following model: Y*i = β1X0*i + β2X1*i + u*i
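• The transformed model can then be fitted by OLS without a constant
term, since x0star plays the role of the intercept; a minimal sketch,
assuming the binary regressand is stored as y and using the transformed
variables generated above:
* WLS fit of the transformed LPM (no constant: x0star is the intercept term)
reg ystar x0star x1star, noconstant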
3. Non-fulfillment of 0 ≤ E(Yi|Xi) ≤ 1
 Fitted probabilities may lie outside the 0-1 range.
 There is no guarantee that Ŷi will necessarily lie
between 0 and 1.
Solution
 If the estimate of E(Yi|Xi) < 0, treat the probability as 0.
 If the estimate of E(Yi|Xi) > 1, treat the probability as 1.
 Use logit or probit models.
4. Questionable value of R² as a measure of
goodness of fit
 The conventional coefficient of determination is of
limited value in judging the goodness of fit of the
model.
5. Functional form
• E(Yi|Xi) = β1 + β2Xi is linear: LPM assumes that E(Yi|Xi)
increases linearly with Xi (the marginal effect of X is
constant), which may be unrealistic.
• In practice, E(Yi|Xi) is non-linearly related to Xi.
• Solution: use the logit or probit model
Alternative Models to LPM:
(Logit and Probit Models)
 They are probability models with two characteristics.
1. 0 ≤ P(Yi = 1|Xi) ≤ 1
- as Xi increases, Pi increases but never steps
outside the 0-1 interval.
2. Non-linear relationship between Pi and Xi.
• The cumulative distribution function (CDF) can be used to
model such qualitative response models.
• The CDF gives the probability that a random variable takes a
value less than or equal to some specified numerical value:
F(x) = P(X ≤ x).
• All CDFs are S-shaped.
• The CDFs commonly chosen to represent qualitative
response models:
 The logistic CDF --- which gives rise to the logit model.
 The normal CDF --- which gives rise to the probit
(normit) model.
Logit Model
• For the binary choice model Yi = β1 + β2Xi + ui, where Yi = 1 if the family
owns a house (0 otherwise) and Xi = income, let us represent E(Yi|Xi) = Pi
by the logistic CDF: Pi = 1/(1 + e^(-Zi)), where Zi = β1 + β2Xi.
• Multiplying the numerator and denominator by e^Zi gives: Pi = e^Zi/(1 + e^Zi).
• As Zi approaches -∞, Pi approaches 0.
• As Zi approaches +∞, Pi approaches 1.
• Pi is non-linearly related to Xi:
• Marginal effect: ∂Pi/∂Xi = β2·e^(-Zi)/(1 + e^(-Zi))² = β2·Pi(1 - Pi).
• Writing Pi = λ(Zi) for the logistic function, ∂Pi/∂Xi = β2·λ(1 - λ) =
marginal effect (mfx).
• Estimation problem: Pi is non-linear in the parameters (the β's).
• Let us define the odds: the ratio of the probability that the event will
happen (a family will own a house) to the probability that the event will
not happen (it will not own a house).
• Odds = Pi/(1 - Pi) = e^Zi.
• Range: (0, +∞)
• Let us define the log of the odds ratio (the logit):
• Li = ln(Pi/(1 - Pi)) = Zi = β1 + β2Xi.
• For estimation purposes, we write the logit model as follows:
Li = ln(Pi/(1 - Pi)) = β1 + β2Xi + ui.
Features of the Logit model
• As P goes from 0 to 1 (i.e., as Z varies from −∞ to +∞), the logit L goes
from −∞ to +∞; Range: (-∞, +∞)
• L is linear in X.

• As many regressors can be added as may be dictated by the underlying
theory.
• If L, the Logit, is positive, it means that when the value of the
regressor(s) increases, the odds that the regressand equals 1 (meaning
some event of interest happens) increases or L is positive when the
odds is > 1.
• If L is negative, the odds that the regressand equals 1 decreases as the
value of X increases or L is negative when the odds is between 0 and 1.
• The slope measures the change in L for a unit change in X, that is, it
tells how the log-odds in favor of owning a house change as income
changes by a unit, say, $1000.
• The intercept is the value of the log-odds in favor of owning a house if
income is zero.
• The logit model assumes that the log of the odds is linearly related to Xi.
• Given a certain level of income, say X*, the probability of owning a house
can be estimated, but this requires estimating β1 and β2.
• To estimate the logit model Li = ln(Pi/(1 - Pi)) = β1 + β2Xi + ui, we need the
values of the regressor(s) Xi and of the regressand Yi, or the logit Li.
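• As a sketch of how this looks in Stata once the model is estimated
(hypothetical variable names own and income; the at() value X* = 40 is
purely illustrative):
* estimate the logit and evaluate the predicted probability at income = X*
logit own income
margins, at(income=40)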
Estimation of the Logit model

• If a family owns a house, Pi = 1, so Li = ln(1/0), which is undefined.
• If a family does not own a house, Pi = 0, so Li = ln(0/1), which is also
undefined.
• In this situation we may have to resort to the maximum likelihood (ML)
method to estimate the parameters, since standard OLS is not applicable.
• In the ML method, the objective is to obtain the values of the unknown
parameters in such a manner that the probability of observing the given
Y's is as high as possible.
• The joint probability of observing n Y values is given as follows:
f(Y1, Y2, ..., Yn) = f(Y1)*f(Y2)*...*f(Yn)
• Each Yi is drawn independently.
• It is known as the likelihood function (LF).
• LF = ∏ f(Yi)
• LF = ∏ Pi^Yi (1 - Pi)^(1-Yi), since Yi is a Bernoulli random variable.
• Taking the natural logarithm of the LF gives the log-likelihood
function:
• lnLF = Σ [Yi·lnPi + (1 - Yi)·ln(1 - Pi)]
• lnLF = Σ [Yi·lnPi - Yi·ln(1 - Pi)] + Σ ln(1 - Pi)
• lnLF = Σ Yi·ln(Pi/(1 - Pi)) + Σ ln(1 - Pi)
• lnLF = Σ Yi(β1 + β2Xi) - Σ ln(1 + e^(β1+β2Xi));
since ln(Pi/(1 - Pi)) = β1 + β2Xi and (1 - Pi) = 1/(1 + e^(β1+β2Xi))
• ML: maximize the lnLF with respect to the β's.
• First order conditions: ∂lnLF/∂β1 = 0 and ∂lnLF/∂β2 = 0.
• ∂lnLF/∂β1 = Σ (Yi - Pi) = 0; first normal equation.
• ∂lnLF/∂β2 = Σ (Yi - Pi)Xi = 0; second normal equation.
• Thus, the normal equations are non-linear, so no explicit
solutions can be obtained and an iterative method is applied.
Interpretation of the Logit model
• Testing is based on the standard normal (Z) statistic because the model
uses asymptotic standard errors.
• Overall significance of the model is based on the likelihood ratio test
(LR statistic):
 LR = 2(ULLF - RLLF), with df = number of restrictions, where ULLF =
unrestricted log-likelihood (full model) and RLLF = restricted
log-likelihood (intercept-only model).
• Pseudo R² is a measure of goodness of fit: Pseudo R² = 1 - (ULLF/RLLF)
• The odds ratio is defined as follows:
 Pi/(1 - Pi) = e^Zi; for a unit increase in the jth regressor, the odds are
multiplied by e^βj.
• In general, if you take the antilog of the jth slope coefficient, subtract 1
from it, and multiply the result by 100, you will get the percent change in
the odds for a unit increase in the jth regressor.


• Consider the logit2 data, where =grade = , =gpa , = score on an examination
given at the beginning of the term(diagnostic exam) and = methodology = .
• Fit the following model:= =+++ +

Stat command: logit grade gpa score methodology


• = + + +

= -13.02135 + 2.826113 + 0.0951577+ 2.378688

sd. (4.931325 ) (1.262941 ) (0.1415542 ) (1.064564 )

z (-2.64) (2.24) (0.67) (2.23 )

p-value (0.008 ) (0.025 ) (0.501 ) (0.025 )

LR chi2(3)= 15.40
. logit grade gpa score methodology, nolog

Logistic regression                               Number of obs =        32
                                                  LR chi2(3)    =     15.40
                                                  Prob > chi2   =    0.0015
Log likelihood = -12.889633                       Pseudo R2     =    0.3740

       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         gpa |   2.826113   1.262941     2.24   0.025     .3507938    5.301432
       score |   .0951577   .1415542     0.67   0.501    -.1822835    .3725988
 methodology |   2.378688   1.064564     2.23   0.025       .29218    4.465195
       _cons |  -13.02135   4.931325    -2.64   0.008    -22.68657    -3.35613
• To predict the logit, use the following command after estimation:
predict Logit, xb
• At gpa = 2.66, score = 20 and methodology = 0, the logit can be calculated
manually as:
di -13.02135 + 2.826113*2.66 + 0.0951577*20 + 2.378688*0  -3.6007354
• To predict the odds, use the following command after defining Logit:
gen odds=exp(Logit)
• To predict the odds individually: di exp(-3.6007354)  0.02730364
• To predict the probability, use the following command after estimation:
predict probability or predict probability, pr
• To calculate the probability manually, Pi = odds/(1 + odds):
dis 0.02730364/(1 + 0.02730364)  0.02657796
• That means when gpa equals 2.66, the diagnostic exam score is 20 out of
100 and the teaching method is not new, the probability of scoring an A
grade is 2.658 percent.
• In order to report the odds ratios, use the following command:
logistic grade gpa score methodology or logit grade gpa score methodology, or
• To calculate the odds ratio for a one-unit increase in gpa manually, that is, e^β2:
di exp(2.826113)  16.879722
• To calculate the percentage change in the odds for a one-unit increase in
gpa, score and methodology, respectively:
di (16.879722-1)*100  1587.9722%
di (1.099832-1)*100  9.9832%
di (10.79073-1)*100  979.073%
• The last one suggests that students who are exposed to the new method of
teaching are more than 10 times (over 900 percent) more likely to get an A
than students who are not exposed to it, other things remaining the same.
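• The same percentage changes can be pulled directly from the stored
estimates, since Stata keeps the coefficients in _b[] after logit (or
logistic); a convenience sketch:
di (exp(_b[gpa])-1)*100
di (exp(_b[score])-1)*100
di (exp(_b[methodology])-1)*100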
. logistic grade gpa score methodology

Logistic regression                               Number of obs =        32
                                                  LR chi2(3)    =     15.40
                                                  Prob > chi2   =    0.0015
Log likelihood = -12.889633                       Pseudo R2     =    0.3740

       grade | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
         gpa |   16.87972   21.31809     2.24   0.025     1.420194    200.6239
       score |   1.099832   .1556859     0.67   0.501     .8333651    1.451502
 methodology |   10.79073   11.48743     2.23   0.025     1.339344    86.93802
       _cons |   2.21e-06   .0000109    -2.64   0.008     1.40e-10      .03487
• To calculate the marginal effects at the means, ∂P/∂Xj = P(1 - P)βj: mfx
• To calculate the marginal effects at representative values: mfx, at(2.66 20 0)
• Manually, at gpa = 2.66, score = 20, methodology = 0 (where P̂ = 0.02657796):
• ∂P/∂gpa = P(1 - P)β2  dis (0.02657796)*(1-0.02657796)*2.826113  0.07311599
• ∂P/∂score = P(1 - P)β3  dis (0.02657796)*(1-0.02657796)*0.0951577  0.00246188
• For the dummy regressor, the effect is the change in probability when
methodology switches from 0 to 1:
 Logit  di -13.02135+2.826113*2.66+0.0951577*20+2.378688*1  -1.2220474
 Odds  di exp(-1.2220474)  0.29462633
 Probability  dis 0.29462633/(1+0.29462633)  0.22757635
• ΔP = P(methodology = 1) - P(methodology = 0)
dis (0.22757635-0.02657796)  0.20099839
Marginal effect
• The marginal effect is the effect of a change in Xi on Pr(Y=1|X).
• It tells us how a change in X will affect the probability
that Y = 1.
. mfx

Marginal effects after logit
      y  = Pr(grade) (predict)
         =  .25282025

 variable |     dy/dx    Std. Err.     z    P>|z|  [    95% C.I.   ]        X
      gpa |  .5338589      .23704    2.25   0.024   .069273  .998445   3.11719
    score |  .0179755      .02624    0.69   0.493  -.033448  .069399   21.9375
method~y* |  .4564984      .18105    2.52   0.012    .10164  .811357     .4375

(*) dy/dx is for discrete change of dummy variable from 0 to 1
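• Note: mfx is an older command; in recent Stata releases the same
marginal effects at the means can be obtained with margins (an equivalent
sketch; mfx treats the 0/1 regressor as a discrete change, which margins
does only if methodology is entered as i.methodology):
margins, dydx(*) atmeans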


• The gpa coefficient of 2.8261 means, with other variables held constant,
that if gpa increases by a unit, on average the estimated logit increases
by about 2.83 units, suggesting a positive relationship between the two.
• As you can see, all the other regressors have a positive effect on the logit,
although statistically the effect of score is not significant.
• However, together all the regressors have a significant impact on the
final grade, as the LR statistic is 15.40, whose p-value is about 0.0015,
which is very small.
• ULLF  last iteration: log likelihood = -12.889633
• RLLF  iteration 0 (intercept only): log likelihood = -20.59173
• Likelihood ratio (LR) statistic = 2(ULLF - RLLF)
di 2*(-12.889633-(-20.59173))  15.404194
• P-value = 0.00150485: di chi2tail(3, 15.40)
 Upper critical boundary = 9.3484036: di invchi2tail(3, 0.025)
 Lower critical boundary = 0.21579528: di invchi2tail(3, 0.975)
 Since the p-value is less than α = 5%, or the LR statistic (calculated
chi-square) = 15.404194 is greater than the upper critical value
9.3484036, we reject the null hypothesis that β2 = β3 = β4 = 0. Thus, gpa,
score and methodology jointly affect the grade and the model is
overall significant.
 To calculate Pseudo R² = 1 - (ULLF/RLLF) = 0.37403836:
di 1-(-12.889633/-20.59173)
• To conduct the LR test formally, use the following
commands:

logit grade gpa score methodology

estimates store ULLF

logit grade

estimates store RLLF

lrtest ULLF RLLF


• The same result!
The Probit Model
• It is a model that emerges from the normal CDF.
• It is also known as the normit model.
• The probability density function of the normal distribution and its CDF are
given, respectively, as: f(x) = (1/√(2πσ²))·exp[-(x-μ)²/(2σ²)] and, for a
specified value x*, F(x*) = P(X ≤ x*).
• It is based on the theory of utility or rational choice.
• Let us assume that the decision of the ith family to own a house or not
depends on an unobservable utility index Ii (also known as a latent variable)
that is determined by one or more explanatory variables, say income Xi, in
such a way that the larger the value of the index Ii, the greater the
probability of the family owning a house.
• We express the index as Ii = β1 + β2Xi, where Xi is the income of the ith family.
• Let Yi = 1 if the family owns a house and Yi = 0 otherwise.
• There is a critical or threshold level of the index, call it Ii*, such that
if Ii exceeds Ii*, the family will own a house, otherwise it will not.
• The threshold Ii*, like Ii, is not observable, but if we assume that it is
normally distributed with the same mean and variance, it is possible not
only to estimate the parameters of the utility index given above but also to
get some information about the unobservable index itself.
• Given the assumption of normality, the probability that Ii* is less than or
equal to Ii can be computed from the standardized normal CDF as:
Pi = P(Yi = 1|Xi) = P(Ii* ≤ Ii) = P(Zi ≤ β1 + β2Xi) = F(β1 + β2Xi),
where Zi ~ N(0,1).
• The PDF of the standardized normal distribution: f(Z) = (1/√(2π))·exp(-Z²/2).
• The CDF of the standardized normal distribution: F(Ii) = P(Z ≤ Ii) = the
area under the standard normal curve from −∞ to Ii, with Ii = β1 + β2Xi.
• Thus, since Pi represents the probability that an event will occur, here
the probability of owning a house, measured by the area of the standard
normal curve from −∞ to Ii as Pi = F(Ii) = F(β1 + β2Xi), the utility index can
be obtained as Ii = F⁻¹(Pi) = the inverse of the normal CDF = the critical
value from the standardized normal distribution.
• Marginal effect: ∂Pi/∂Xi = f(β1 + β2Xi)·β2, where f(·) is the standard
normal PDF.
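• Compactly, by the chain rule (in LaTeX notation, with φ the standard
normal PDF):
\frac{\partial P_i}{\partial X_i}
  = \frac{dF(Z_i)}{dZ_i}\cdot\frac{\partial Z_i}{\partial X_i}
  = \phi(\beta_1+\beta_2 X_i)\,\beta_2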
• Fit the following using the probit model:
Pi = F(β1 + β2gpa + β3score + β4methodology)

probit grade gpa score methodology

• Îi = -7.45232 + 1.62581gpa + 0.0517289score + 1.426332methodology
sd. (2.542472) (0.6938825) (0.0838903) (0.5950379)
z (-2.93) (2.34) (0.62) (2.40)
p-value (0.003) (0.019) (0.537) (0.017)
LR chi2(3) = 15.55
. probit grade gpa score methodology, nolog

Probit regression                                 Number of obs =        32
                                                  LR chi2(3)    =     15.55
                                                  Prob > chi2   =    0.0014
Log likelihood = -12.818803                       Pseudo R2     =    0.3775

       grade |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
         gpa |    1.62581   .6938825     2.34   0.019     .2658255    2.985795
       score |   .0517289   .0838903     0.62   0.537    -.1126929    .2161508
 methodology |   1.426332   .5950379     2.40   0.017     .2600795    2.592585
       _cons |   -7.45232   2.542472    -2.93   0.003    -12.43547   -2.469166
• To predict the probit index, use the following command after estimation:
predict probit, xb
• At gpa = 2.66, score = 20 and methodology = 0, the probit can be
calculated manually as:
di -7.45232 + 1.62581*2.66 + 0.0517289*20 + 1.426332*0  -2.0930874
• To predict the probability, use the following command after estimation:
predict probability or predict probability, pr
• To calculate the probability manually: di normal(-2.0930874)
[Pi = F(Z) = 0.01817068]
• That means when gpa equals 2.66, the diagnostic exam score is 20 out of
100 and the teaching method is not new, the probability of scoring an A
grade is 1.817 percent.
• To calculate the marginal effect at the mean, ∂Pi/∂Xj = f(Z̄)·βj: mfx
• Manually:
 sum gpa score methodology
 di -7.45232+1.62581*3.11719+0.0517289*21.9375+1.426332*0.4375
(Z̄ = -0.62553833)
 di normalden(-0.62553833)  (f(Z̄) = 0.32805053)
 di normal(-0.62553833)  (F(Z̄) ≈ 0.2658 = the probability of scoring
an A at the mean)
 di 0.32805053*1.62581  (∂P/∂gpa = f(Z̄)·β2 = 0.53334783)
 di 0.32805053*0.0517289  (∂P/∂score = f(Z̄)·β3 = 0.01696969)
 For a dummy independent variable:
ΔP = P(methodology = 1) - P(methodology = 0)
• di -7.45232+1.62581*3.11719+0.0517289*21.9375+1.426332  (0.17677342)
• di -7.45232+1.62581*3.11719+0.0517289*21.9375  (-1.2495586)
• di normal(0.17677342)  (0.57015682)
• di normal(-1.2495586)  (0.10573042)
• di 0.57015682-0.10573042  (0.4644264)
• The probability of scoring an A at the means increases by about 46.44
percentage points when the teaching methodology changes from the old
to the new.
• To calculate the marginal effects at representative values:
mfx, at(2.66 20 0)
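• With recent Stata releases, the same representative-value effects can be
requested with margins (an equivalent sketch; the same caveat about
i. notation for the dummy applies):
margins, dydx(*) at(gpa=2.66 score=20 methodology=0)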
• While the gpa and the methodology coefficients are individually
significant, the score coefficient is insignificant.
• In addition, together all the regressors have a significant impact on the
final grade, as the LR statistic is 15.55, whose p-value is about 0.0014,
which is very small.
• To convert the probit coefficients to logit, multiply each probit estimate
by 1.6:
di 1.6*-7.45232  (-11.923712)
di 1.6*1.62581  (2.601296)
di 1.6*0.0517289  (0.08276624)
di 1.6*1.426332  (2.2821312)
L̂ = -11.923712 + 2.601296gpa + 0.08276624score + 2.2821312methodology
(logit from probit)
L̂ = -13.02135 + 2.826113gpa + 0.0951577score + 2.378688methodology
(actual logit)
• To convert the logit coefficients to probit, multiply each logit estimate
by 0.625.
• To convert the logit coefficients to LPM:
LPM slope coefficients = 0.25*(logit slope coefficients)
LPM intercept = 0.25*(logit intercept) + 0.5
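• Applying these rules to the logit estimates above gives rough LPM
counterparts, about 0.707 for the gpa slope and about -2.755 for the
intercept (a quick arithmetic check):
di 0.25*2.826113
di 0.25*(-13.02135) + 0.5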

• The standard logistic (the basis of logit) and the standard normal distributions
(the basis of probit) both have a mean value of zero, their variances are

different; 1 for the standard normal and for the logistic distribution, where .
• To compare them graphically, use the following commands:
• set obs 600
• egen x=fill(-300 -299)
• replace x=x/100
• gen probit=1/sqrt(2*3.1415)*exp(-((x^2)/2))
• gen logit=(exp(x))/[[1+exp(x)]^2]
• twoway (connected probit x) (connected logit x)
• gen cumul_logit=sum(logit)
• gen cumul_probit=sum(probit)
• twoway (connected cumul_probit x) (connected cumul_logit
[Figure: cumul_probit and cumul_logit plotted against x, both S-shaped
cumulative curves rising from 0 to their maximum over x = -4 to 4.]
Logit vs Probit Model
 Both models give qualitatively similar results.
 The parameter estimates of the two models are not
directly comparable.
 Both give statistically sound results, as 0 ≤ P(Yi = 1|Xi) ≤ 1.
 Both models show a non-linear relationship between Pi and Xi.
• Both models arise from a cumulative distribution
function (CDF) and have S-shaped curves.
• Both models produce asymptotically consistent, efficient
and normal estimates.
 Both logit and probit models are estimated using the
maximum likelihood (ML) method, as standard OLS is
not applicable.
parameters in such a manner that the probability of observing a
given Y is as high as possible.
 Measure of fit:

 the overall significance of both logit and probit models is


based on likelihood ratio test (LR statistic).
 Pseudo is a measure of goodness of fit in both models.
Differences
 Logit model arises from Logistic CDF while Probit Model
from Normal CDF
 The logit model has slightly fatter tails: the conditional
probability approaches zero or one at a slower rate in Logit
than in Probit.
 The probit curve approaches more quickly than the logistic
curve.
 Logit model is relatively simple to analyze and interpret
compared to probit model.
