• The labor market status of a person. The variable takes the value
1 if the person is employed and 0 if the person is unemployed. The values 1
and 0 can be assigned arbitrarily.
The linear regression model

$$y_i = \beta_0 + \beta_1 x_i + v_i, \qquad E(v_i) = 0$$

is called the linear probability model in this context. This linear model is not
an adequate statistical model, as the expected value $E(y_i|x_i) = \beta_0 + \beta_1 x_i$
can lie outside [0,1] and then does not represent a probability. In addition,
the error term is heteroscedastic, as

$$V(v_i|x_i) = (\beta_0 + \beta_1 x_i)(1 - \beta_0 - \beta_1 x_i)$$

depends on $x_i$.
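Both problems are easy to see numerically. The following minimal Python sketch evaluates a linear probability model at a few values of x. The coefficients B0 = −0.3 and B1 = 0.1 are hypothetical and chosen only for illustration; the sketch simply applies the two formulas above.

```python
# Linear probability model (LPM): E(y|x) = b0 + b1*x and V(v|x) = p * (1 - p).
# The coefficients are hypothetical values chosen only for illustration.
B0, B1 = -0.3, 0.1


def lpm_prob(x: float) -> float:
    """Fitted value E(y|x) = b0 + b1*x of the linear probability model."""
    return B0 + B1 * x


def lpm_error_variance(x: float) -> float:
    """Conditional error variance (b0 + b1*x) * (1 - b0 - b1*x)."""
    p = lpm_prob(x)
    return p * (1.0 - p)


for x in (0.0, 5.0, 8.0, 15.0):
    p = lpm_prob(x)
    print(f"x={x:4.1f}  E(y|x)={p:+.2f}  in [0,1]: {0.0 <= p <= 1.0}  "
          f"V(v|x)={lpm_error_variance(x):+.3f}")
```

With these values the fitted "probability" is negative at x = 0 and exceeds one at x = 15. The implied error variance changes with x, and it even turns negative where the fitted value leaves [0,1].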
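Binary response models specify the response probability as a function of a linear index:

$$P(y_i = 1|x_{i1},...,x_{iK}) = F(\beta_0 + \beta_1 x_{i1} + ... + \beta_K x_{iK}) = F(z_i)$$

where $z_i = \beta_0 + \beta_1 x_{i1} + ... + \beta_K x_{iK}$ is a single linear index. The function F maps the single index into [0,1]
and satisfies in general

$$F(-\infty) = 0, \qquad F(+\infty) = 1, \qquad f(z) = \frac{dF(z)}{dz} > 0.$$

In the probit model, F is the cdf of the standard normal distribution, so that

$$P(y_i = 1|x_{i1},...,x_{iK}) = \Phi(z_i) = \int_{-\infty}^{z_i} \phi(v)\,dv$$

where φ(.) is the pdf and Φ(.) the cdf of the standard normal distribution.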
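The following minimal Python sketch implements the probit transformation function using only the standard library. Φ is computed from `math.erf` via Φ(z) = (1 + erf(z/√2))/2, and a finite-difference check shows that its derivative is the density φ. The helper names are illustrative and not part of any package.

```python
import math


def std_normal_pdf(v: float) -> float:
    """phi(v): density of the standard normal distribution."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Phi(z): standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_prob(z: float) -> float:
    """Probit response probability P(y = 1 | x) = Phi(z) for index z."""
    return std_normal_cdf(z)


# F maps any index into [0, 1]: close to 0 far left, 1/2 at z = 0, close to 1 far right.
for z in (-6.0, -1.0, 0.0, 1.0, 6.0):
    print(f"z={z:+.1f}  Phi(z)={probit_prob(z):.6f}")

# f = dF/dz > 0: a central finite difference of Phi reproduces phi.
h = 1e-5
z0 = 0.7
numeric = (std_normal_cdf(z0 + h) - std_normal_cdf(z0 - h)) / (2 * h)
print(abs(numeric - std_normal_pdf(z0)) < 1e-8)
```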
In the logit model, the transformation function F is the logistic function. The response probabilities are then

$$P(y_i = 1|x_{i1},...,x_{iK}) = \frac{e^{z_i}}{1 + e^{z_i}} = \frac{1}{1 + e^{-z_i}}$$
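The two expressions for the logistic function are algebraically identical. The following minimal Python sketch checks this numerically. Its helper `logistic` is illustrative: it uses the second form for z ≥ 0 and the first for z < 0, a common way to avoid overflow in `exp` for large |z|.

```python
import math


def logistic(z: float) -> float:
    """Logit response probability Lambda(z) = 1 / (1 + exp(-z)).

    Branching on the sign of z avoids overflow in exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Both algebraic forms e^z / (1 + e^z) and 1 / (1 + e^-z) give the same value.
for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    form_a = math.exp(z) / (1.0 + math.exp(z))
    form_b = 1.0 / (1.0 + math.exp(-z))
    print(f"z={z:+.1f}  {form_a:.6f}  {form_b:.6f}  {logistic(z):.6f}")

# The branched version still works where exp(-z) alone would overflow.
print(logistic(-1000.0), logistic(1000.0))
```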
Figure 1 shows the transformation function F for the two models.
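Note: The logit and probit models are almost identical, and the choice between them
is usually arbitrary. However, the parameters β of the two models are scaled differently:
logit coefficients are roughly 1.6 times the corresponding probit coefficients.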
3 Short Guides to Microeconometrics
[Plot: P(y_i = 1|z_i) = F(z_i) against z_i from −5 to 5; curves labeled Logit, Probit and Rescaled Logit]
Figure 1: Mapping of the linear index $z_i$ in the probit model, the logit
model and the rescaled logit model (factor 1.6).
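The factor 1.6 in Figure 1 can be checked numerically. This Python sketch assumes that the comparison is made on a grid of z between −5 and 5. It computes the largest gap between the probit function Φ(z) and the logit function evaluated at z and at 1.6z.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z): probit transformation function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(z: float) -> float:
    """Lambda(z): logit transformation function (overflow-safe)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


SCALE = 1.6  # rescaling factor quoted in the caption of Figure 1
grid = [i / 100.0 for i in range(-500, 501)]  # z from -5 to 5 in steps of 0.01

gap_raw = max(abs(logistic(z) - std_normal_cdf(z)) for z in grid)
gap_rescaled = max(abs(logistic(SCALE * z) - std_normal_cdf(z)) for z in grid)

print(f"max |Lambda(z)     - Phi(z)| = {gap_raw:.4f}")
print(f"max |Lambda(1.6 z) - Phi(z)| = {gap_rescaled:.4f}")
```

On this grid, the unscaled logit differs from the probit by more than 0.1 at some points, while the rescaled logit stays below 0.02. For this reason, a logit model with coefficients 1.6β gives response probabilities similar to a probit model with coefficients β.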
[Plot: latent variable y* and observed outcome y against x, with lines for E(y*), E(y) and the OLS fit]
The latent variable $y_i^*$ can be interpreted as the utility difference between
choosing $y_i = 1$ and $y_i = 0$. The model is then called a random utility model.
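$$\begin{aligned}
P(y_i = 1|x_{i1},...,x_{iK}) &= P(y_i^* > 0|x_{i1},...,x_{iK}) = P(z_i + u_i > 0|x_{i1},...,x_{iK}) \\
&= P(u_i > -z_i|x_{i1},...,x_{iK}) = 1 - \Phi(-z_i/\sigma) \\
&= \Phi\left(\frac{z_i}{\sigma}\right) = \Phi\left(\frac{\beta_0}{\sigma} + \frac{\beta_1}{\sigma} x_{i1} + ... + \frac{\beta_K}{\sigma} x_{iK}\right).
\end{aligned}$$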
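A short Monte Carlo sketch in Python illustrates this derivation. It assumes z_i = 0.5, σ = 2 and normally distributed u_i. Under these assumptions, the simulated share of y_i = 1 should be close to Φ(z_i/σ). The final loop shows that multiplying all β (and hence z_i) and σ by the same constant leaves the probability unchanged, so only β/σ can be recovered from the data.

```python
import math
import random


def std_normal_cdf(z: float) -> float:
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate_share_of_ones(z: float, sigma: float, n: int, seed: int = 1) -> float:
    """Share of y = 1 when y* = z + u, u ~ N(0, sigma^2), and y = 1 if y* > 0."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n) if z + rng.gauss(0.0, sigma) > 0.0)
    return ones / n


# Hypothetical index value and error standard deviation.
z, sigma = 0.5, 2.0
share = simulate_share_of_ones(z, sigma, n=200_000)
print(f"simulated share = {share:.4f},  Phi(z/sigma) = {std_normal_cdf(z / sigma):.4f}")

# Only beta/sigma matters: scaling all beta (hence z) and sigma by c
# leaves P(y = 1 | x) = Phi(z / sigma) unchanged.
for c in (0.5, 1.0, 10.0):
    print(c, std_normal_cdf((c * z) / (c * sigma)))
```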
Unlike in the linear regression model, the parameters β cannot be interpreted
directly as marginal effects on the dependent variable $y_i$. In
some situations, the index function $z_i = \beta_0 + \beta_1 x_{i1} + ... + \beta_K x_{iK}$ has a
clear interpretation in a theoretical model, and the marginal effect $\beta_k$ of a
change in the independent variable $x_{ik}$ on $y_i^*$ is meaningful. Even then,
this marginal effect is only identified if there is reason to set $\sigma^2$ to unity,
because the data only reveal the ratios $\beta_k/\sigma$.
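In general, we are interested in the marginal effect of a change in $x_{ik}$
on the response probability, which with $f = dF/dz$ is

$$\frac{\partial P(y_i = 1|x_{i1},...,x_{iK})}{\partial x_{ik}} = f(z_i)\,\beta_k$$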
Binary Response Models 6
This marginal effect depends on the values of all explanatory variables $x_{i1},...,x_{iK}$
for observation i. Therefore, each individual has a different marginal effect.
There are several ways to summarize and report this information. A first
possibility is to present the marginal effects for the "mean type", i.e.
$x_{ik} = \bar{x}_k$ for all k, the "median type", or some interesting extreme types.
A second approach is to calculate the marginal effects for all observations
in the sample and report the mean of these effects.
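For the probit model with σ = 1, the marginal effect of $x_{ik}$ on the response probability is φ(z_i)β_k. The Python sketch below uses hypothetical coefficients and five made-up observations. It computes the individual marginal effects and both summaries described above: the effect for the mean type and the average of the individual effects. With a nonlinear F, the two summaries generally differ.

```python
import math


def std_normal_pdf(v: float) -> float:
    """phi(v): standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


# Hypothetical probit coefficients (beta0, beta1, beta2) and a tiny made-up
# sample of regressors (x1, x2); none of these numbers come from real data.
beta = (-1.0, 0.8, -0.3)
X = [(0.5, 1.0), (1.0, 2.0), (1.5, 0.0), (2.0, 3.0), (3.0, 1.0)]


def index(x1: float, x2: float) -> float:
    """Linear index z = beta0 + beta1*x1 + beta2*x2."""
    return beta[0] + beta[1] * x1 + beta[2] * x2


def marginal_effects(x1: float, x2: float) -> tuple:
    """Probit marginal effects dP(y=1|x)/dx_k = phi(z) * beta_k for k = 1, 2."""
    dens = std_normal_pdf(index(x1, x2))
    return dens * beta[1], dens * beta[2]


# Each observation has its own marginal effect because phi(z_i) differs.
individual = [marginal_effects(x1, x2) for x1, x2 in X]

# (1) Marginal effects for the "mean type": evaluate at the sample means.
mean_x1 = sum(x1 for x1, _ in X) / len(X)
mean_x2 = sum(x2 for _, x2 in X) / len(X)
mem = marginal_effects(mean_x1, mean_x2)

# (2) Mean of the individual marginal effects over the sample.
ame = tuple(sum(me[k] for me in individual) / len(X) for k in range(2))

print("individual:", [tuple(round(v, 4) for v in me) for me in individual])
print("mean type: ", tuple(round(v, 4) for v in mem))
print("average:   ", tuple(round(v, 4) for v in ame))
```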
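The estimated model can also be used to predict the response probability of each observation:

$$\hat{P}(y_i = 1|x_{i1},...,x_{iK}) = F(\hat\beta_0 + \hat\beta_1 x_{i1} + ... + \hat\beta_K x_{iK})$$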
The probit and logit models are estimated by maximum likelihood (ML).
Assuming independence across observations, the likelihood function is
$$\begin{aligned}
L &= \prod_{\{i|y_i=0\}} P(y_i = 0|x_i) \prod_{\{i|y_i=1\}} P(y_i = 1|x_i) \\
&= \prod_{i=1}^{N} [1 - F(z_i)]^{1-y_i} F(z_i)^{y_i}
\end{aligned}$$
where $P(y_i = 1|x_{i1},...,x_{iK}) = F(z_i) = \Phi(z_i)$ in the probit model and
$P(y_i = 1|x_{i1},...,x_{iK}) = F(z_i) = e^{z_i}/(1 + e^{z_i})$ in the logit model. The
corresponding log likelihood function is
$$\log L = \sum_{i=1}^{N} \left[(1 - y_i)\log(1 - F(z_i)) + y_i \log F(z_i)\right]$$
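The following Python sketch translates the log likelihood directly. F can be either the probit function Φ or the logit function, and both are written here using only the standard library. The loop adds log F(z_i) for observations with y_i = 1 and log(1 − F(z_i)) for y_i = 0. This equals the formula above but never evaluates the log of the unused term, which could be log(0).

```python
import math


def std_normal_cdf(z: float) -> float:
    """Probit transformation function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic(z: float) -> float:
    """Logit transformation function, overflow-safe."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(beta, X, y, F) -> float:
    """log L = sum_i [(1 - y_i) log(1 - F(z_i)) + y_i log F(z_i)].

    beta = (beta0, ..., betaK); each row of X holds (x_i1, ..., x_iK);
    y holds 0/1 outcomes; F is the transformation function.
    """
    total = 0.0
    for xi, yi in zip(X, y):
        # Linear index z_i = beta_0 + beta_1 x_i1 + ... + beta_K x_iK.
        z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
        p = F(z)
        # y_i = 1 contributes log F(z_i); y_i = 0 contributes log(1 - F(z_i)).
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


# Hypothetical data with one regressor.
X = [(1.0,), (2.0,), (3.0,), (4.0,)]
y = [0, 1, 0, 1]
print(log_likelihood((0.0, 0.0), X, y, logistic), 4 * math.log(0.5))
print(log_likelihood((-1.0, 0.4), X, y, std_normal_cdf))
```

With β = 0, both models give F(z_i) = 1/2 for every observation, so log L = N log(1/2). This is a quick sanity check.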
The first order conditions for an optimum are in general, for all k including
the constant $x_{i0} = 1$,

$$\frac{\partial \log L}{\partial \beta_k} = \sum_{i=1}^{N}\left[(1 - y_i)\frac{-f(z_i)}{1 - F(z_i)} + y_i \frac{f(z_i)}{F(z_i)}\right] x_{ik} = 0$$
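For the logit model, f(z) = F(z)(1 − F(z)), so the first order conditions simplify to Σ_i (y_i − F(z_i)) x_ik = 0. The text does not say which algorithm Stata or R uses to solve them. The Python sketch below uses Newton-Raphson, a standard choice for this concave problem, for a hypothetical model with a constant, a single regressor and made-up data.

```python
import math


def logistic(z: float) -> float:
    """Logit transformation function, overflow-safe."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logit(x, y, max_iter: int = 50, tol: float = 1e-10):
    """Newton-Raphson for a logit with a constant (x_i0 = 1) and one regressor."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Score (gradient of log L) and negative Hessian entries.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve the 2x2 system (-H) * step = g by Cramer's rule.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    return b0, b1


def score(b0, b1, x, y):
    """Left-hand side of the first order conditions for k = 0 and k = 1."""
    g0 = sum(yi - logistic(b0 + b1 * xi) for xi, yi in zip(x, y))
    g1 = sum((yi - logistic(b0 + b1 * xi)) * xi for xi, yi in zip(x, y))
    return g0, g1


# Hypothetical, non-separable data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logit(x, y)
print(b0_hat, b1_hat, score(b0_hat, b1_hat, x, y))
```

At the estimate, the score (the left-hand side of the first order conditions) is numerically zero. If x separated the zeros from the ones perfectly, the likelihood would keep increasing as β_1 grows, and no finite maximum would exist.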
7 Implementation in Stata 17
The probit and logit models are estimated with the probit and logit commands,
respectively. For example, load the data
webuse auto.dta
and estimate the effect of the explanatory variables weight and mpg on
the dependent dummy variable foreign with the probit model
probit foreign weight mpg
Response probabilities are estimated for each observation with the post-
estimation command predict:
predict p_foreign, pr
Marginal effects for specific types are calculated with the post-estimation
command margins. For example, the marginal effects for a car with a
weight of 2000 lbs. and 40 mpg are reported by
margins, dydx(*) at(weight = 2000 mpg = 40)
The marginal effects for the mean type, i.e. a car with average weight
and mpg, are calculated with
margins, dydx(*) atmeans
8 Implementation in R 4.3.1
The probit and logit models are estimated with the function glm, which
fits generalized linear models. For example, load the data
library(haven)
auto <- read_dta("https://www.stata-press.com/data/r17/auto.dta")
and estimate the effect of the explanatory variables weight and mpg on
the dependent dummy variable foreign with the probit model
probit <- glm(foreign~weight+mpg, family=binomial(link=probit),
data=auto)
summary(probit)
Tests and confidence bounds for marginal effects are reported with the margins package:
library(margins)
mfx <- margins(probit, at = list(weight=2000, mpg=40))
summary(mfx)