Econometrics: All Chapters
The term "econometrics" comes from the Greek words oikonomia (economy) and metron (measure): literally, "economic measurement".
Econometrics is:
A conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier (T. Haavelmo, 1944).
The application of mathematical statistics to economic data in order to lend empirical support to economic-mathematical models and to obtain numerical results (Gerhard Tintner, 1968).
The quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference (P. A. Samuelson, T. C. Koopmans and J. R. N. Stone, 1954).
Example:
Relationship b/n demand for a good & price of a good
Relationship b/n job performance & leadership style
Relationship b/n tuition fee & number of students enrolled
Relationship b/n advertising expenditure & market share
Relationship b/n education and earnings
Relationship b/n CGPA and hours of studying
Scope of Econometrics
[Diagram: econometrics lies at the intersection of economic theory, mathematical economics (economics and mathematics), and statistics.]
Differences
There is no essential difference between economic theory and mathematical economics except the way of expression:
both state economic relationships in exact form.
Economic Model
is an organized set of relationships that describes the
functioning of an economic entity under a set of
simplifying assumptions.
consists of three structural elements:
1. A set of variables
2. A list of fundamental relationships
3. A number of coefficients
postulates exact (deterministic) relationships among variables.
An Econometric model:
consists of behavioral equation derived from economic
models and specification of probability distribution of errors.
contains a random element which is ignored by economic
models.
has two parts: observed variables and disturbances
postulates stochastic (random) relationships among variables.
Econometric model: Y = f(X) + u,
where u stands for the error term.
The error term u makes the distinction between economic and econometric models.
The main difference between economic modeling and econometric modeling is that the former is exact in nature, whereas the latter also contains a stochastic term.
Why do we include the error term in the model?
1.3. Aims and Methodology of Econometrics
Three main goals of econometrics:
Analysis - testing the implications of a theory:
verifying how well economic theories explain the observed behavior of economic units.
Policy making - obtaining numerical estimates of the coefficients of economic relationships for policy simulations.
Forecasting - using the estimated coefficients to predict future values of economic variables.
Methodology of Econometrics
Econometric research is concerned with the measurement of the parameters of economic relationships and with the prediction of the values of economic variables. It proceeds in the following stages.
1. Specification of the model: formulation of the maintained hypothesis.
2. Estimation of the Model: Testing maintained hypothesis
• It is about providing numerical estimates of parameters of
the model.
• It is purely a technical stage, which requires knowledge of:
the various econometric methods and their
assumptions,
the economic implications for estimates
• This stage involves:
Data collection: Gathering of the data on the variables
included in the model
Examining the aggregation problems of the model:
examining the possibility of aggregation errors in the
estimates of the coefficients.
Examining of the identification conditions of the model:
checking whether the parameters are the true coefficients
of the estimated model and to determine whether a
relationship can be statistically estimated or not.
Examining the degree of multicollinearity: the degree of correlation between the explanatory variables.
Choice of the appropriate econometric technique for estimation: OLS, logit, probit, VEC, ARDL, etc.
Single-equation techniques are applied to one equation at a time;
simultaneous-equation techniques are applied to all the equations of a system at once.
Some important criteria for choice of appropriate
estimation technique:
– the nature of relationship and its identification condition
– desirable properties of estimates obtained: unbiasedness,
efficiency, consistency and sufficiency
– the purpose of econometric research: analysis,
forecasting, policy making
– simplicity of the technique: easy computation and less-
data requirements
– time and cost requirements
3. Evaluation of Estimates
• It is about the determining of the reliability of estimates
(results of the model)
• It consists of deciding whether the estimates are
theoretically meaningful, statistically satisfactory and
econometrically correct.
• Evaluation criteria involve three types: economic a priori criteria, statistical criteria, and econometric criteria.
I. Economic a priori criteria:
- refer to the size (magnitude) & sign of the parameters
- are determined by economic theory
- Estimates with wrong signs or size should be rejected
unless there is a good reason to believe the result.
II. Statistical criteria: first-order tests
- aim at evaluating the statistical reliability of the estimates
- are determined by statistical theory
- the correlation coefficient test, standard error test, t-test, F-test and R²-test are some of the most commonly used statistical tests.
III. Econometric criteria: second-order (post-estimation) tests
- aim at detecting the violation or validity of the assumptions of the econometric technique employed
- are determined by econometric theory
- determine the reliability of the statistical criteria
- help us establish whether the estimates have the desirable properties (unbiasedness, efficiency, etc.)
4. Evaluation of the forecasting power of the model
• This stage concerns the ability of the model to predict future values of the dependent variable. It involves investigating the stability of the estimates and their sensitivity to changes in the size of the sample.
• Extra-sample performance of the model: how it performs on data outside the sample.
• Some ways of establishing the forecasting power of the model are:
Using the estimates of the model for a period not included in the sample.
Re-estimating the model with an expanded sample (a sample including additional observations).
• Conducting a test of statistical significance for the difference between the actual (original) and new (forecast) values to check the forecasting power of the model.
Summary of Econometric Modeling
[Flow diagram: mathematical model of the theory, collecting data, estimation of the econometric model, forecasting or prediction.]
Desirable properties of an econometric model
• The ‘goodness’ of an econometric model is judged according to the
following desirable properties.
1. Theoretical plausibility: The model should be compatible
(consistent) with the postulates of economic theory.
The Classical Linear Regression Model (CLRM)
Modern Definition of Regression
Regression analysis refers to estimating functions showing
the relationship between two or more variables and
corresponding tests.
It is a technique for studying the statistical dependence of one LHS variable (the dependent variable) on one or more RHS variables (the independent variables), with a view to estimating and/or predicting the average value of the former on the basis of fixed values of the latter.
Regression does not necessarily imply causation: a statistical relationship in itself cannot establish a causal connection.
Causation must come from outside statistics, ultimately from some theory or from common sense.
Different Terminologies of Variables
Dependent variable - Independent variable(s)
Explained variable - Explanatory variable(s)
Predictand - Predictor(s)
Regressand - Regressor(s)
Response - Stimulus or control variable(s)
Endogenous - Exogenous
Regression analysis assumes that the dependent variable is stochastic and the independent variables are fixed (non-stochastic).
Regression analysis has following objectives & uses
to show the relationship among variables.
to estimate average value (mean) of the dependent
variable given the value of independent variable(s);
to test hypothesis about sign and magnitude of
relationship
to forecast future value(s) of the dependent variable
Stochastic relationship
Simple Linear Regression
– The term ‘simple’ refers to the fact that we use only two
variables (one dependent and one independent variable).
– Linear refers to linearity in the parameters; the model may or may not be linear in the variables. The parameters appear with a power of one only.
Yi = β0 + β1Xi + ui
β0 + β1Xi is the deterministic component and ui is the random (stochastic) component.
β0 is the intercept and β1 is the slope coefficient.
Error (disturbance) term (ui): it is a proxy for all variables that are not included in the regression model but may collectively affect Y.
Why do we include ui in the model? Sources of the error term:
1. It captures the effect of omitted variables.
Why omitted?
Lack of data and limited knowledge
Vagueness of theory
Difficulty in measuring some factors
Poor proxy variables
Principle of parsimony: keeping our model simple and
manageable
2. Random behavior of human beings: human reactions are
unpredictable.
3. Measurement Errors: variables may be measured
inaccurately
4. Wrong model Specification due to
wrong mathematical form
Exclusion of important variables & inclusion of irrelevant ones
PRF: E(Y | Xi) = β0 + β1Xi; stochastic version: Yi = E(Y | Xi) + ui = β0 + β1Xi + ui, with E(ui | Xi) = 0.
Conditional Expectation
Weekly family consumption expenditure (Y) for each level of X:
X:     80   100   120   140   160   180   200   220   240   260
Y:     55    65    79    80   102   110   120   135   137   150
       60    70    84    93   107   115   136   137   145   152
       65    74    90    95   110   120   140   140   155   175
       70    80    94   103   116   130   144   152   165   178
       75    85    98   108   118   135   145   157   175   180
       --    88    --   113   125   140    --   160   189   185
       --    --    --   115    --    --    --   162    --   191
Total 325   462   445   707   678   750   685  1043   966  1211
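The conditional means E(Y | X) follow directly from the column totals above (each total divided by the number of observations in that column): E(Y | X = 80) = 325/5 = 65, E(Y | X = 100) = 462/6 = 77, E(Y | X = 120) = 445/5 = 89, and so on up to E(Y | X = 260) = 1211/7 = 173.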
SRF: Yi = β̂0 + β̂1Xi + ei, and the fitted line Ŷi = β̂0 + β̂1Xi.
Assumptions of Classical Linear Regression Model
CLRM is a basic framework of regression analysis and
based on a set of assumptions.
Statistical inferences are valid if model assumptions are
reasonably satisfied in practice.
6. The assumption of homoscedasticity (constant variance): Y exhibits similar amounts of variance across the range of values for X, i.e., Var(ui | Xi) = σ² for all i.
7. The assumption of no autocorrelation (no serial correlation): Cov(ui, uj) = 0 for all i ≠ j.
8. The assumption of zero covariance between the error term and the explanatory variable: Cov(ui, Xi) = 0.
9. Normality assumption: ui is assumed to have a normal distribution with zero mean and constant variance, ui ~ N(0, σ²).
10. No model specification error: the model is correctly specified.
11. The number of observations must be greater than the
number of explanatory variables
12. The assumption of no perfect multicollinearity (Ch-3)
SRF: Ŷi = β̂0 + β̂1Xi, with residuals
ei = Yi - Ŷi
ei = Yi - Ŷi = Yi - β̂0 - β̂1Xi
Question: how is the SRF determined in such a way that it is as close as possible to the PRF?
– How do we make sure that β̂0 and β̂1 are close to β0 and β1, respectively?
It calls for minimization.
Criteria
1. Minimizing Σei: choosing the SRF in such a manner that the sum of the residuals is as small as possible.
2. Least squares criterion:
The SRF can be fixed in such a way that the residual sum of squares (RSS) is as small as possible.
It is an important criterion. Why?
a. It gives fair weight to each residual.
b. The estimators obtained have some attractive statistical properties.
Hence, we minimize RSS = Σei² = Σ(Yi - β̂0 - β̂1Xi)².
Thus we minimize Σei² = Σ(Yi - β̂0 - β̂1Xi)² with respect to β̂0 and β̂1.
For the minimization, take the partial derivatives with respect to β̂0 and β̂1 and set them to zero at the minimum of Σei²:
∂Σei²/∂β̂0 = -2Σ(Yi - β̂0 - β̂1Xi) = 0
∂Σei²/∂β̂1 = -2ΣXi(Yi - β̂0 - β̂1Xi) = 0
Finally, the process of differentiation yields the following equations (the normal equations) for estimating β̂0 and β̂1:
ΣYi = nβ̂0 + β̂1ΣXi and ΣXiYi = β̂0ΣXi + β̂1ΣXi²,
whose solution is β̂1 = Σxiyi/Σxi² and β̂0 = Ȳ - β̂1X̄, where xi = Xi - X̄ and yi = Yi - Ȳ. (A Stata sketch of these formulas follows.)
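A hedged Stata sketch of these formulas (placeholder variables y and x, not from the slides); the manual estimates should match what regress reports:
summarize x, meanonly
scalar xbar = r(mean)
summarize y, meanonly
scalar ybar = r(mean)
gen double xy = (x - xbar)*(y - ybar)
gen double xx = (x - xbar)^2
quietly summarize xy
scalar num = r(sum)              // sum of deviation cross-products
quietly summarize xx
scalar b1 = num/r(sum)           // slope: sum(xy)/sum(x^2)
scalar b0 = ybar - b1*xbar       // intercept: Ybar - b1*Xbar
display "b0 = " b0 "   b1 = " b1
regress y x                      // built-in OLS for comparison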
Gauss-Markov Theorem
Summarizes the statistical properties of the OLS estimators.
It states that the OLS estimator has the least (minimum) variance of all estimators in the class of linear unbiased estimators; that is, the OLS estimators are BLUE.
Statistical Tests of Significance of OLS estimators:
First-order tests for evaluating the parameter estimates.
The two most commonly used first-order tests in econometric analysis are:
1. the coefficient of determination (R²), and
2. tests of the significance of the individual OLS estimators (standard error test and t-test).
1. The coefficient of determination (R²)
[Diagram: total variation in Y = explained variation + unexplained variation.]
The total variation in Y has two parts: variation due to the regression line and variation due to the residual.
- Deviation of Y from its mean: yi = Yi - Ȳ ……. (1)
- Deviation of Ŷ from its mean: ŷi = Ŷi - Ȳ ……… (2)
- Deviation of Y from the regression line: ei = Yi - Ŷi …. (3)
so that yi = ŷi + ei and Σyi² = Σŷi² + Σei² (the cross-product term Σŷiei vanishes, but why?).
TSS = ESS + RSS
(total variation) (explained variation) (unexplained variation)
Thus, R² = ESS/TSS = 1 - RSS/TSS.
If RSS = 0 (so ESS/TSS = 1), R² = 1 and the model fits the data perfectly.
If RSS/TSS = 1 (so ESS = 0), R² = 0 and there is no linear relationship between Y and X (β̂1 = 0).
2. Testing the significance of OLS estimators
The interest of econometricians is not only in obtaining
the estimator, (point estimation) but also in using it to
make inferences about the true parameter, (interval
estimation).
For this purpose, we need:
the variance (and standard error) of the OLS estimators,
an unbiased estimator of σ², and
the normality assumption on ui.
Why?
One property of Normal distribution states that “ any
linear function of a normally distributed variable is itself
normally distributed.”
Since the OLS estimators are linear functions of Y, then
they are also normally distributed.
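For completeness, the implied sampling distributions (standard CLRM results, consistent with the variance formulas used later in these notes) are:
β̂1 ~ N(β1, σ²/Σxi²) and β̂0 ~ N(β0, σ²ΣXi²/(nΣxi²)), where xi = Xi - X̄.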
σ² is unknown in practice, since it is difficult to get population data, so we need an unbiased estimator of σ²: σ̂² = Σei²/(n - 2) ------ prove it.
With the normality assumption, the OLS estimators have the following statistical properties:
1.They are unbiased
2. They are efficient ------ Precise estimation.
3.They are consistent: as the sample size increases
indefinitely, the estimators converge to their true
population values.
Since the OLS estimator β̂1 is normally distributed:
1. Z = (β̂1 - β1)/SE(β̂1) follows the standard normal distribution if σ² is known;
2. t = (β̂1 - β1)/SE(β̂1) follows the t-distribution with n - 2 degrees of freedom if σ² is unknown and replaced by σ̂².
We need a test of significance of estimators to
measure the validity of estimates:
to measure the size of sampling error
to determine the degree of confidence
Hypothesis
H0: β1 = 0 --- X is statistically insignificant (no relationship between X and Y)
H1: β1 ≠ 0 --- X is statistically significant (relationship between X and Y)
1. Standard error test - decision rule:
if SE(β̂1) < (1/2)|β̂1|, reject H0 (accept H1);
if SE(β̂1) > (1/2)|β̂1|, fail to reject H0.
2. t-statistic approach (Student's t-test)
Steps
I. Compute the t-statistic: t_cal = β̂1/SE(β̂1) (under H0: β1 = 0).
II. Choose the level of significance (α): 1%, 5% or 10%.
III. Find the critical value of t: t_(α/2, n-2).
Decision rule:
If |t_cal| > t_(α/2, n-2), reject H0 (accept H1);
if |t_cal| < t_(α/2, n-2), fail to reject H0 (reject H1).
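A minimal Stata illustration (placeholder variables y and x, not from the slides): regress reports β̂, SE(β̂), t and the p-value for each coefficient, and test recomputes the test of H0: β1 = 0:
regress y x
test x          // Wald test of H0: coefficient on x = 0 (F = t² for a single restriction)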
Predictions with simple Linear Regression
One of the most important applications of regression
analysis is prediction of the dependent variable based on a
given value of the regressors.
Once the estimated parameters are proved to be
significant and valid, then it can be used to
forecast/predict future values of dependent variable.
There are two kinds of predictions
Mean prediction: it is the prediction of the conditional
mean value of Y corresponding to a chosen X.
Individual Prediction: it is prediction of individual Y value
corresponding to a given X.
Example: Consider a SLRM that relates the consumption (Y) and income (X) of households. The following data were obtained from a sample of 10 observations.
X 10 7 10 5 8 8 6 7 9 10
Y 11 10 12 6 10 7 9 10 11 10
Question
1. Estimate the consumption function and interpret the results.
2. Estimate the standard errors of the regression coefficients.
3. What percentage of the variation is explained by the regression line (R²)?
4. Test the hypothesis that income influences consumption at the 95% confidence level.
5. Predict consumption at an income of 20.
Computation table (excerpt; columns: X, Y, X², XY, Ŷ, e = Y - Ŷ, e², y = Y - Ȳ, y², x = X - X̄, x², xy, ŷ = Ŷ - Ȳ, ŷ²):
X   Y   X²   XY   Ŷ     e     e²    y    y²    x   x²  xy   ŷ    ŷ²
10  11  100  110  11.1  -0.1  0.01  1.4  1.96  2   4   2.8  1.5  2.25
10  12  100  120  11.1   0.9  0.81  2.4  5.76  2   4   4.8  1.5  2.25
10  10  100  100  11.1  -1.1  1.21  0.4  0.16  2   4   0.8  1.5  2.25

1. Ŷi = β̂0 + β̂1Xi = 3.6 + 0.75Xi
β̂1 = 0.75: a one-unit increase in income leads to an increase in consumption of 0.75 units, on average.
β̂0 = 3.6: the amount of consumption at zero income (the minimum level of consumption).
2. σ̂² = Σei²/(n - 2).
Then, SE(β̂1) = sqrt(σ̂²/Σxi²)
SE(β̂0) = sqrt(σ̂²ΣXi²/(nΣxi²))
3. R² ≈ 0.52: 52% of the total variation in consumption is explained by variation in income.
4. Hypothesis: H0: β1 = 0 versus H1: β1 ≠ 0; compare t_cal = β̂1/SE(β̂1) with the critical value t_(0.025, 8) at the 95% confidence level.
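A hedged Stata check of this worked example, entering the ten observations from the data table above; the last line answers question 5 (prediction at an income of 20):
clear
input X Y
10 11
7 10
10 12
5 6
8 10
8 7
6 9
7 10
9 11
10 10
end
regress Y X                      // should give Y-hat = 3.6 + 0.75X, R-squared about 0.52
display _b[_cons] + _b[X]*20     // predicted consumption at income 20 (3.6 + 0.75*20 = 18.6)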
The multiple linear regression model (MLRM) is:
PRF: Yi = β0 + β1X1i + β2X2i + ui, so that E(Yi | X1i, X2i) = β0 + β1X1i + β2X2i is the conditional mean value of Yi.
β0 -- the average value of Yi when X1 and X2 are zero; it also picks up the average effect of all regressors excluded from the model.
β1 --- measures the change in the mean value of Yi per unit change in X1, holding the value of X2 constant.
β2 --- measures the net effect of a unit change in X2 on the mean of Yi, keeping X1 constant.
The error term satisfies the usual assumptions, E(ui) = 0 and Var(ui) = σ².
However, since the PRF is unknown to us, it has to be estimated from sample data.
SRF: Yi = β̂0 + β̂1X1i + β̂2X2i + ei, with fitted values Ŷi = β̂0 + β̂1X1i + β̂2X2i.
Estimation Problem: The OLS Method Revisited
The OLS estimators are derived by minimizing RSS (= Σei²).
To obtain the OLS estimators, we partially differentiate Σei² = Σ(Yi - β̂0 - β̂1X1i - β̂2X2i)² with respect to β̂0, β̂1 and β̂2 and set the partial derivatives equal to zero, which implies
Σei = 0
Σx1iei = 0
Σx2iei = 0
After rearranging the above expressions, we obtain the following three normal equations:
(1) ΣYi = nβ̂0 + β̂1ΣX1i + β̂2ΣX2i
(2) ΣX1iYi = β̂0ΣX1i + β̂1ΣX1i² + β̂2ΣX1iX2i
(3) ΣX2iYi = β̂0ΣX2i + β̂1ΣX1iX2i + β̂2ΣX2i²
From (2) and (3), in deviation form (xi = Xi - X̄, yi = Yi - Ȳ):
Σx1iyi = β̂1Σx1i² + β̂2Σx1ix2i
Σx2iyi = β̂1Σx1ix2i + β̂2Σx2i²
Finally, we obtain the OLS estimators as:
β̂1 = [(Σx1iyi)(Σx2i²) - (Σx2iyi)(Σx1ix2i)] / [(Σx1i²)(Σx2i²) - (Σx1ix2i)²]
β̂2 = [(Σx2iyi)(Σx1i²) - (Σx1iyi)(Σx1ix2i)] / [(Σx1i²)(Σx2i²) - (Σx1ix2i)²]
β̂0 = Ȳ - β̂1X̄1 - β̂2X̄2
Note: if X1 and X2 are uncorrelated (Σx1ix2i = 0), the multiple regression coefficient of X1 reduces to the simple regression coefficient of Y on X1.
Variances of the β̂'s:
Var(β̂1) = σ² Σx2i² / [Σx1i² Σx2i² - (Σx1ix2i)²]
Var(β̂2) = σ² Σx1i² / [Σx1i² Σx2i² - (Σx1ix2i)²]
where σ̂² = Σei²/(n - 3) is the unbiased estimator of σ² (three parameters are estimated).
Coefficient of Determination (R²)
TSS = ESS + RSS, i.e., Σyi² = (β̂1Σx1iyi + β̂2Σx2iyi) + Σei².
Hence, R² = ESS/TSS = (β̂1Σx1iyi + β̂2Σx2iyi)/Σyi² = 1 - RSS/TSS.
As in simple regression, R² is a measure of the proportion of the total variation in Y that is explained by the regressors in the model (a measure of goodness of fit).
If R² is high, the model is said to fit the data well.
One problem with R² is that it always increases, and never decreases, with every regressor added to the model (i.e., even if the added variables have no economic justification).
So, in order to incorporate the impact of changes in the number of regressors in our model, it is necessary to adjust R² for degrees of freedom.
This is done by using an alternative measure of goodness of fit called the adjusted R²:
adjusted R² = 1 - (1 - R²)(n - 1)/(n - k), where k is the number of parameters estimated (including the intercept).
Note
1. Adjusted R² = R² if k = 1.
2. Adjusted R² can be negative if R² = 0 and k > 1.
3. Adjusted R² will decrease if an irrelevant variable is added.
It is sometimes used as a device for selecting the appropriate set of regressors.
The main difference between R² and the adjusted R²:
R² assumes that every regressor in the model helps explain the variation in the dependent variable, whereas the adjusted R² gives the percentage of variation explained after penalizing for the number of regressors, so only variables that actually affect the dependent variable raise it. (A numerical illustration follows.)
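A quick arithmetic illustration using the simple-regression example from earlier in these notes (n = 10, R² ≈ 0.52, k = 2 estimated parameters): adjusted R² = 1 - (1 - 0.52)(10 - 1)/(10 - 2) = 1 - 0.48 × 9/8 = 0.46, slightly below R², as expected.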
General Linear Regression Model (using the Matrix Approach)
It is an MLRM with k explanatory variables:
Yi = β0 + β1X1i + β2X2i + … + βkXki + ui, i = 1, …, n.
The n equations can be put in matrix form as
Y = Xβ + U,
where Y is (n×1), X is (n×(k+1)), β is ((k+1)×1) and U is (n×1).
PRF: Y = Xβ + U (unknown to us - why?)
SRF: Y = Xβ̂ + e, with Ŷ = Xβ̂.
To derive the OLS estimator of β under the usual assumptions, we need to minimize
e'e = (Y - Xβ̂)'(Y - Xβ̂) = Y'Y - 2β̂'X'Y + β̂'X'Xβ̂.   (Why -2β̂'X'Y? Because Y'Xβ̂ is a scalar and equals β̂'X'Y.)
First-order condition (FOC): ∂(e'e)/∂β̂ = -2X'Y + 2X'Xβ̂ = 0, so X'Xβ̂ = X'Y.
Multiplying both sides by (X'X)⁻¹ gives the OLS estimator: β̂ = (X'X)⁻¹X'Y.
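A minimal Stata/Mata sketch of the matrix formula (hypothetical dataset variables y, x1 and x2); it should reproduce what regress reports:
mata:
y = st_data(., "y")
X = st_data(., ("x1", "x2"))
X = (J(rows(X), 1, 1), X)              // prepend a column of ones for the intercept
b = invsym(X'X)*X'y                    // OLS: b = (X'X)^-1 X'y
e = y - X*b                            // residual vector
s2 = (e'e)[1,1]/(rows(X) - cols(X))    // sigma-hat^2 = e'e/(n - k - 1)
(b, sqrt(diagonal(s2*invsym(X'X))))    // coefficients and their standard errors
end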
Statistical Properties of the OLS estimators
As in the two-variable case, the OLS estimators β̂ = (X'X)⁻¹X'Y satisfy the BLUE property.
Coefficient of Determination (R²)
In matrix form, R² can be derived as:
R² = (β̂'X'Y - nȲ²)/(Y'Y - nȲ²) = 1 - e'e/(Y'Y - nȲ²).
Hypothesis Testing in the MLRM
In the MLRM, there are two tests of significance:
I. Tests of individual significance
II. Test of overall significance of the model
I. Tests of individual significance - hypotheses:
H0: β1 = 0 (X1 is statistically insignificant) vs H1: β1 ≠ 0 (X1 is statistically significant)
H0: β2 = 0 (X2 is statistically insignificant) vs H1: β2 ≠ 0 (X2 is statistically significant)
The null hypothesis states that Xj has no (linear) influence on Y.
Common testing approaches
1. Standard error test
If SE(β̂j) < (1/2)|β̂j|, we reject H0;
if SE(β̂j) > (1/2)|β̂j|, we fail to reject H0.
NB: the smaller the SEs, the stronger the evidence that the estimates are statistically reliable.
2. Student's t-test
If |t_cal| = |β̂j/SE(β̂j)| > t_(α/2, n-k-1), we reject H0.
II. Test of overall (joint) significance
It is a joint test of the relevance of all the regressors included in the model.
Hypothesis: H0: β1 = β2 = … = βk = 0 vs H1: not all the βj are zero.
Here, the null hypothesis is that β1, β2, …, βk are jointly (simultaneously) equal to zero - all slope coefficients are zero.
The alternative hypothesis states that not all βj's are simultaneously zero.
When H0 is assumed to be NOT true, the RSS of the model is the unrestricted RSS (URSS).
When H0 is assumed to be TRUE, the RSS is the restricted RSS (RRSS).
When we reject H0, URSS < RRSS.
If we fail to reject H0 (if H0 is assumed to be true), or when all the slope coefficients are zero, the restricted model contains only the intercept, so that RRSS = TSS.
Thus, in the joint significance test the testing procedure is based on the comparison of the RSS from the original (unrestricted) regression model with the RSS from a regression model in which H0 is assumed to be true (the restricted model).
Now, F_cal = [(RRSS - URSS)/k] / [URSS/(n - k - 1)] = [R²/k] / [(1 - R²)/(n - k - 1)], where k is the number of explanatory variables; under H0 this follows the F-distribution with k and (n - k - 1) degrees of freedom.
The F-test is a measure of the overall significance of the estimated regression (model adequacy); it is also the test of the significance of R².
Here, the calculated value of F is compared with the critical value of F that leaves a probability of α in the upper tail of the F-distribution with k and (n - k - 1) degrees of freedom.
If F_cal > F_(α; k, n-k-1), we reject H0 ---- all the explanatory variables are jointly significant (Y is linearly related to the regressors). (A Stata sketch follows.)
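A hedged Stata sketch of the joint test (hypothetical variables y, x1 and x2): regress stores the overall F statistic in e(F), and test reproduces the joint restriction:
regress y x1 x2
display e(F)          // overall F statistic reported by regress
test x1 x2            // joint H0: both slope coefficients are zero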
Overall significance - rejection rule: reject H0 in favor of Ha if F_cal falls in the upper-tail rejection region, i.e., if F_cal > F_(k, n-k-1, 1-α); equivalently, reject H0 if the p-value = P(F > F_cal) < α. [Figure: F-distribution with the upper-tail rejection region shaded.]
Chapter Five: Regression Analysis with Qualitative Information
The Nature of Qualitative Information
Sometimes we can not obtain a set of numerical
values for all the variables we want to use in a
regression model.
This is because some variables can not be
quantified easily.
Examples:
(a) Gender may play a role in determining salary levels
(b) Different ethnic groups may follow different
consumption patterns
So far, we have been concerned with regression analysis involving variables that are quantitative in nature.
- They are recorded on a well defined scale
Example: consumption, price, income, experience….
However, some variables are essentially qualitative
in nature. ---- defined on nominal scale
Example: sex, colour, nationality, region………
Such variables do not have any natural scale of
measurement.
They usually indicate the presence of a quality or an
attribute
Question: How can we quantify qualitative information?
Dummy Variables
aka indicator or categorical variables
Are artificial variables constructed to quantify nominal
scale variables taking on values of ‘1’ or ‘0’
1- indicates the presence of qualitative attribute
0- indicates the absence of that attribute
Classify the data into mutually exclusive categories
Are proxy variables for a qualitative characteristic in a regression model.
We can define n dummy variables for n categories.
Some Notes while using dummy regressors
1. When we have a dummy variable for each category
and an intercept in our model, we face a perfect
multicollinearity problem.
Dummy variable Trap:
it is a situation of perfect multicollinearity if the
dummy variables for each category and intercept term
are included in the model.
Solution
a) introduce n-1 dummies for n categories in our model (the number of dummies should be one less than the number of categories).
Stata command: reg y i.x
b) Drop the intercept term if as many dummy variables
are introduced as the number of categories.
Stata command: reg Y D1 D2 , noconstant
2. The category for which no dummy variable is
assigned is called base/bench mark/reference/
omitted category.
All comparisons are made in relation to the base
category
3. The intercept term represents the mean value of the
base category
4. Coefficients attached to dummy variables are called
differential intercept coefficients.
- They are interpreted as the change in the value of the
dependent variable compared with the base category.
- They must always be interpreted in relation to the
base category.
5. The choice of base category is up to the researcher
The choice of which of the two different outcomes is
to be assigned the value of 1 does not alter the
results.
Stata command:
To generate dummy:
gen D1=(varname==1)
gen D2=(varname==2)
or
xi i.varname, noomit
i) Regression with a single dummy regressor
Consider Yi = β0 + β1Di + ui, where Yi is wage and Di = 1 if the worker is female, 0 if male.
E(Yi | Di = 0) = β0 ------ mean wage for males
E(Yi | Di = 1) = β0 + β1 ----- mean wage for females
E(Yi | Di = 1) - E(Yi | Di = 0) = β1 ----- the differential intercept coefficient
It measures the mean wage difference b/n males and females.
Whether the average wage of females is greater than that of their male counterparts depends on the sign and significance of β1.
A positive and significant β1 implies that the mean wage of female workers is greater than that of males by the amount β1.
How can we know whether sex significantly affects wage?
Hypothesis
H0: β1 = 0 -------- no sex discrimination
H1: β1 ≠ 0 ----- salary difference b/n M & F
[Figure: wage plotted against Di; the mean wage is β0 for Di = 0 and β0 + β1 for Di = 1, so β1 is the vertical distance between the two.]
Example (using wage data): wagei = β0 + β1sexi + ui.
Stata command: reg wage i.sex
ii) Regression with one quantitative and one dummy regressor
Yi = β0 + β1Di + β2Xi + ui, where Yi is salary, Di is the sex dummy and Xi is years of experience.
If β1 > 0 and significant, females are paid more (by β1), on average, than their male counterparts with the same years of experience.
[Figure: two parallel salary-experience lines, (β0 + β1) + β2Xi and β0 + β2Xi; the vertical distance between them is β1.]
Dummy regressor with more than two categories (education: BA, MSc, PhD; BA is the base category):
E(Yi) = β0 + β1D1i + β2D2i + β3Xi, where D1 = 1 for MSc holders, D2 = 1 for PhD holders and X is experience.
β0 ----- mean salary for BA holders
β1 ------ the mean salary differential b/n BA and MSc holders
β2 ----- the mean salary differential b/n BA and PhD holders
Stata command: reg wage i.education exp
or: xi i.education, noomit
    reg wage _Ieducation_1 _Ieducation_2 exp   (p-values for the education dummies: 0.000 and 0.000)
Stata command: reg wage sex i.education exp
or: xi i.education, noomit
    reg wage sex _Ieducation_1 _Ieducation_2 exp

. reg wage i.sex i.education experience   (excerpt of output)
                  Coef.      Std. Err.   t      P>|t|   [95% Conf. Interval]
sex: female      -.9461648   .957455    -0.99   0.327   -2.864948   .972618
education: MSc    6.262634   1.253719    5.00   0.000    3.750125   8.775143
education: PhD   12.80867    1.875075    6.83   0.000    9.050939  16.56641
Note that, in the above case, the effect of one dummy
variable is independent of the other dummy variable
(additive effect--- no interaction).
For example, if the mean wage for male workers is higher than that for female workers, this is so whether they hold a BA, an MSc or a PhD.
Interaction effects: to allow the sex wage differential to vary by education level, include sex*education interaction dummies.
Stata command: gen se = sex*education
reg wage sex i.education i.se experience

. reg wage i.sex i.education i.se experience   (excerpt of output)
                  Coef.       Std. Err.   t      P>|t|   [95% Conf. Interval]
sex: female      -1.297905    1.7052     -0.76   0.450   -4.718102   2.122292
education: MSc    6.393786    1.767796    3.62   0.001    2.848036   9.939536
education: PhD   11.7254      2.427275    4.83   0.000    6.856904  16.5939
se: female MSc    -.3059176   2.268282   -0.13   0.893   -4.855516   4.24368
se: female PhD    1.836186    2.541282    0.72   0.473   -3.26098    6.933352
Interaction between the dummy and a quantitative regressor (allowing different slopes):
Stata command:
gen sexexp = sex*experience
reg wage sex experience sexexp

(excerpt of regression output)
                  Coef.      Std. Err.   t      P>|t|   [95% Conf. Interval]
education: MSc    8.556469   3.579159    2.39   0.020   1.380686  15.73225
education: PhD   18.56934    4.311517    4.31   0.000   9.925272  27.21341
vii) Dummy variables in semi-log models
In semi-log models, the coefficient of a dummy variable, when multiplied by 100, is interpreted as the approximate percentage difference in Y relative to the base category.
Exact percentage difference: 100 × [exp(β̂) - 1]. (A worked calculation follows the output below.)
Stata command:
gen lnwage = ln(wage)
reg lnwage sex i.education experience

. reg lnwage sex i.education experience sexexp _IeduXexper_1 _IeduXexper_2   (excerpt)
                  Coef.      Std. Err.   t      P>|t|   [95% Conf. Interval]
education: MSc    .5249681   .1237604    4.24   0.000   .2766247   .7733115
education: PhD    .8948986   .1486981    6.02   0.000   .596514    1.193283
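Applying the exact-percentage formula to the MSc coefficient reported above (an arithmetic check, rounded): 100 × [exp(0.5249681) - 1] ≈ 69 percent, noticeably larger than the approximate 52.5 percent obtained by simply multiplying the coefficient by 100. In Stata: di (exp(.5249681) - 1)*100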
The Linear Probability Model (LPM)
The LPM applies OLS to a binary (0/1) dependent variable: Yi = β0 + β1Xi + ui, where E(Yi | Xi) = P(Yi = 1 | Xi), so the fitted value is interpreted as a probability and should satisfy 0 ≤ E(Yi | Xi) ≤ 1.
• Example: P(Yi = 1 | Xi) is the probability of an individual deciding to work at a given amount of wage.
• Stata command: reg Y X
• Fitted model: Ŷi = -0.9456861 + 0.102131 Xi.
Limitation of LPM
1. Non-normality of error term
The error term follows a Bernoulli distribution.
A Bernoulli distribution is an experiment with two
outcomes: success with probability P and failure
with probability (1-P) where it has mean of P and
variance P(1-P) .
2. Heteroscedastic variance of the error term: Var(ui | Xi) = Pi(1 - Pi), which varies with Xi -------- prove it (see the derivation below).
Consequence: LPM estimates are inefficient.
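A short derivation of the claimed variance, using Pi = E(Yi | Xi) from the LPM above: since Yi takes only the values 1 and 0, the error ui = Yi - β0 - β1Xi equals 1 - Pi with probability Pi and -Pi with probability 1 - Pi. Hence E(ui) = Pi(1 - Pi) - Pi(1 - Pi) = 0, and Var(ui) = E(ui²) = (1 - Pi)²Pi + (-Pi)²(1 - Pi) = Pi(1 - Pi)[(1 - Pi) + Pi] = Pi(1 - Pi), which changes with Xi, so the error variance is heteroscedastic.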
• To correct for the heteroscedastic variance, estimate the LPM, compute the weights w = P̂(1 - P̂), and re-estimate by weighted least squares (a sketch follows below):
predict p
gen pf = 1 - p
gen w = p*pf
3. Non-fulfilment of 0 ≤ E(Y | X) ≤ 1: the LPM can produce fitted probabilities below 0 or above 1.
Solution: if the estimate of E(Y | X) < 0, treat the probability as 0; if the estimate of E(Y | X) > 1, treat the probability as 1.
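A hedged sketch of the complete correction in Stata (the weighting step is implied rather than shown on the slide; Y and X are the slide's own variable names):
reg Y X                              // first-stage LPM
predict p                            // fitted probabilities
replace p = . if p <= 0 | p >= 1     // set aside fitted values outside (0,1), as in point 3
gen pf = 1 - p
gen w = p*pf                         // estimated error variance P(1-P)
reg Y X [aw = 1/w]                   // weighted least squares via analytic weights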
4. Questionable value of R² as a measure of goodness of fit
The conventional coefficient of determination is of limited value for judging the goodness of fit of the model.
5. Functional form
• E(Yi | Xi) = β0 + β1Xi is linear: the LPM assumes that E(Yi | Xi) increases linearly with Xi (the marginal effect of Xi is constant), which may be unrealistic.
• In practice, E(Yi | Xi) is non-linearly (S-shaped) related to Xi.
These limitations motivate the logit model and the probit (normit) model.
Logit Model
• For the binary choice model Yi = β0 + β1Xi + ui, where Yi = 1 if the individual works (0 otherwise) and Xi = income, let us represent E(Yi | Xi) = P(Yi = 1 | Xi) = Pi as
Pi = 1/(1 + e^(-Zi)), where Zi = β0 + β1Xi.
• As Zi approaches +∞, Pi approaches 1 (and as Zi approaches -∞, Pi approaches 0).
• Pi is non-linearly related to Zi:
1 - Pi = 1/(1 + e^(Zi)),
so the odds are Pi/(1 - Pi) = e^(Zi).
• Taking natural logs, Li = ln[Pi/(1 - Pi)] = Zi.
• For estimation purposes, we write the logit model as follows:
Li = ln[Pi/(1 - Pi)] = β0 + β1Xi + ui.
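A hedged Stata sketch of these formulas (hypothetical variable names work and income, matching the verbal example above):
logit work income
predict double z, xb              // Z-hat = b0 + b1*income
gen double p = 1/(1 + exp(-z))    // P-hat = 1/(1 + e^-Z)
gen double odds = p/(1 - p)       // equals exp(z), the estimated odds of working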
Features of the logit model
• As P goes from 0 to 1 (i.e., as Z varies from -∞ to +∞), the logit L goes from -∞ to +∞; its range is (-∞, +∞).
• L is linear in X.
• In general, if you take the antilog of the jth slope coefficient, subtract 1 from it, and multiply the result by 100, you get the percentage change in the odds for a one-unit increase in the jth regressor.
. logit grade gpa score methodology, nolog   (output excerpt: LR chi2(3) = 15.40)
• To predict the odds, use the following command after defining the logit index (e.g., predict Logit, xb): gen odds = exp(Logit)
• That means when gpa equals 2.66, the diagnostic exam score is 20 out of 100 and the teaching method is not new, the predicted probability of scoring an A grade is 2.658 percent.
• In order to report the odds ratios, use one of the following commands:
logistic grade gpa score methodology   or   logit grade gpa score methodology, or
• To calculate the odds ratio for a one-unit increase in gpa manually:
di exp(2.826113)   (= 16.879722)
• To calculate the percentage change in the odds for a one-unit increase in gpa, score and methodology, respectively:
di (16.879722 - 1)*100   (= 1587.9722 percent)
di (1.099832 - 1)*100   (= 9.9832 percent)
di (10.79073 - 1)*100   (= 979.073 percent)
• The last one suggests that students who are exposed to the new method of teaching are more than 10 times as likely (an increase of over 900 percent in the odds) to get an A as students who are not exposed to it, other things remaining the same.
. logistic grade gpa score methodology
• LR statistic = 2[ln L(unrestricted) - ln L(restricted)]:
di 2*(-12.889633 - (-20.59173))   (= 15.404194)
• P-value = 0.00150485: di chi2tail(3, 15.40)
Upper critical boundary = 9.3484036: di invchi2tail(3, 0.025)
Lower critical boundary = 0.21579528: di invchi2tail(3, 0.975)
Since the p-value is less than α = 5%, or equivalently the LR statistic (calculated chi-square) = 15.404194 exceeds the upper critical value 9.3484036, we reject the null hypothesis that β1 = β2 = β3 = 0. Thus gpa, score and methodology jointly affect the grade and the model is overall significant.
To calculate the pseudo-R² = 1 - (ln L_UR / ln L_R) = 0.37403836:
di 1 - (-12.889633/-20.59173)
• To conduct the LR test formally, use the following commands:
logit grade   (the restricted, intercept-only model, whose log likelihood is the ln L_R used above)
• That means when gpa equals 2.66, the diagnostic exam score is 20 out of 100 and the teaching method is not new, the predicted probability of scoring an A grade is 1.817 percent.
• To calculate the marginal effect at the mean, dP/dXj = f(X̄'β̂)*β̂j, use: mfx
• Manually:
sum gpa score methodology
ΔP = P(Y = 1 | gpa and score at their means, methodology = 1) - P(Y = 1 | gpa and score at their means, methodology = 0)
• di -7.45232 + 1.62581*3.11719 + 0.0517289*21.9375 + 1.426332   (= 0.17677342)
• di -7.45232 + 1.62581*3.11719 + 0.0517289*21.9375   (= -1.2495586)
• di normal(0.17677342)   (= 0.57015682)
• di normal(-1.2495586)   (= 0.10573042)
• di 0.57015682 - 0.10573042   (= 0.4644264)
• The probability of scoring an A at the mean increases by 46.44 percentage points when the teaching methodology changes from the old to the new.
mfx, at(2.66 20 0)
• While the gpa and the methodology coefficients are individually significant, the score coefficient is insignificant.
• In addition, together all the regressors have a significant impact on the final grade, as the LR statistic is 15.55, whose p-value is about 0.0014, which is very small.
• To convert the probit coefficients to (approximate) logit coefficients, multiply each probit estimate by 1.6:
di 1.6*-7.45232   (= -11.923712)
di 1.6*1.62581   (= 2.601296)
di 1.6*0.0517289   (= 0.08276624)
di 1.6*1.426332   (= 2.2821312)
So the implied logit index is L̂ = -11.923712 + 2.601296 gpa + 0.08276624 score + 2.2821312 methodology (logit from probit).
• The standard logistic distribution (the basis of the logit) and the standard normal distribution (the basis of the probit) both have a mean of zero, but their variances differ: 1 for the standard normal and π²/3 for the standard logistic, where π ≈ 3.14.
• To compare them graphically, use the following commands:
• set obs 600
• egen x = fill(-300 -299)
• replace x = x/100
• gen probit = 1/sqrt(2*_pi)*exp(-(x^2)/2)
• gen logit = exp(x)/((1 + exp(x))^2)
• twoway (connected probit x) (connected logit x)
• gen cumul_logit = sum(logit)
• gen cumul_probit = sum(probit)
• twoway (connected cumul_probit x) (connected cumul_logit x)
[Figure: cumulative sums of the probit and logit densities (cumul_probit, cumul_logit) plotted against x.]
Logit vs Probit Model
Both models give qualitatively similar results.
The parameter estimates of the two models are not directly comparable.
Both give statistically sound results, since the predicted probabilities satisfy 0 ≤ P(Yi = 1 | Xi) ≤ 1.