Short Guides to Microeconometrics Kurt Schmidheiny

Fall 2023 University of Basel

Binary Response Models


matrix-free
1 Introduction

Many dependent variables of interest in economics and other social sciences
can only take two values. The two possible outcomes are usually
denoted by 0 and 1. Such variables are called dummy variables or dichotomous
variables. Some examples:

• The labor market status of a person. The variable takes the value
1 if a person is employed and 0 if the person is unemployed. The values 1
and 0 can be assigned arbitrarily.

• Voting behavior of a person. The variable takes the value 1 if the person votes
in favor of a new policy and 0 otherwise. Again the values 1 and 0
are arbitrary.

The expected value of a dichotomous variable yi ∈ {0, 1} is the probability
that it takes the value 1:

E(yi ) = 0 · P (yi = 0) + 1 · P (yi = 1) = P (yi = 1) .

The linear regression model, e.g. for one explanatory variable

yi = β0 + β1 xi + vi , E(vi ) = 0

is called the linear probability model in this context. This linear model is not
an adequate statistical model as the expected value E(yi |xi ) = β0 + β1 xi
can lie outside [0,1] and then does not represent a probability. In addition,
the error term is heteroscedastic as V (vi |xi ) = (β0 + β1 xi )(1 − β0 − β1 xi )
depends on xi .
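
Both problems can be made concrete with a few lines of arithmetic. The sketch below is only an illustration: the coefficients β0 = −0.2 and β1 = 0.15 and the grid of x values are made up, and the function names are hypothetical.

```python
# Hypothetical coefficients chosen only for illustration (not from the guide).
beta0, beta1 = -0.2, 0.15

def lpm_mean(x):
    """Linear probability model: E(y|x) = beta0 + beta1 * x."""
    return beta0 + beta1 * x

def lpm_error_variance(x):
    """V(v|x) = p(x) * (1 - p(x)) with p(x) = E(y|x)."""
    p = lpm_mean(x)
    return p * (1.0 - p)

for x in [0, 2, 4, 6, 10]:
    p = lpm_mean(x)
    valid = 0.0 <= p <= 1.0
    print(f"x={x:2d}  E(y|x)={p:5.2f}  valid probability: {valid}  "
          f"V(v|x)={lpm_error_variance(x):6.3f}")
```

For small and large x the fitted "probability" leaves the unit interval (and the implied variance even turns negative), while inside the interval the error variance changes with x.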

Version: 3-12-2023, 21:00



2 The Econometric Model: Probit and Logit

Binary response models directly describe the response probabilities
P (yi = 1) of the dependent variable yi .
Consider a sample of N independently and identically distributed
(i.i.d.) observations i = 1, ... , N of the dependent dummy variable yi
and K explanatory variables xi1 , ..., xiK . The probability that the depen-
dent variable takes value 1 is modeled as

P (yi = 1|xi1 , ..., xiK ) = F (zi ) = F (β0 + β1 xi1 + ... + βK xiK )

where β0 to βK are K + 1 parameters and

zi = β0 + β1 xi1 + ... + βK xiK

is a single linear index. The function F maps the single index into [0,1]
and satisfies in general

F (−∞) = 0, F (∞) = 1, ∂F (z)/∂z > 0.

The probit model assumes that the transformation function F is the
cumulative distribution function (cdf) of the standard normal distribution.
The response probabilities are then

$$P(y_i = 1|x_{i1}, ..., x_{iK}) = \Phi(z_i) = \int_{-\infty}^{z_i} \phi(t)\, dt = \int_{-\infty}^{z_i} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} t^2}\, dt$$

where φ(.) is the pdf and Φ(.) the cdf of the standard normal distribution.
In the logit model, the transformation function F is the logistic function.
The response probabilities are then

$$P(y_i = 1|x_{i1}, ..., x_{iK}) = \frac{e^{z_i}}{1 + e^{z_i}} = \frac{1}{1 + e^{-z_i}}$$

Figure 1 shows the transformation function F for the two models.
Note: The logit and probit models are almost identical and the choice
of the model is usually arbitrary. However, the parameters β of the two
models are scaled differently. The parameters of the probit model
multiplied by 1.6 are approximately equal to the logit estimates.1

[Figure 1: Mapping of the linear index zi in the probit model, the logit
model and the rescaled logit model (factor 1.6). Horizontal axis: zi from −5 to 5;
vertical axis: P [yi = 1|zi ] = F (zi ).]
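
The closeness of the probit curve and the rescaled logit curve can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the evaluation grid are illustrative assumptions, and the normal cdf is written with the error function.

```python
import math

def probit_cdf(z):
    """Standard normal cdf Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic function e^z / (1 + e^z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

SCALE = 1.6  # rescaling factor quoted in the text

# Largest vertical gap between the curves on a grid from -5 to 5.
grid = [i / 100.0 for i in range(-500, 501)]
max_gap_rescaled = max(abs(probit_cdf(z) - logit_cdf(SCALE * z)) for z in grid)
max_gap_unscaled = max(abs(probit_cdf(z) - logit_cdf(z)) for z in grid)

# Slopes at z = 0: phi(0) for the probit, SCALE * 1/4 for the rescaled logit.
slope_probit = 1.0 / math.sqrt(2.0 * math.pi)
slope_rescaled_logit = SCALE * 0.25

print(f"max gap, rescaled logit vs probit: {max_gap_rescaled:.4f}")
print(f"max gap, unscaled logit vs probit: {max_gap_unscaled:.4f}")
print(f"slopes at zero: {slope_probit:.4f} vs {slope_rescaled_logit:.4f}")
```

The slopes at zero nearly coincide, which is the sense in which multiplying the index by 1.6 lines up the two models.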

3 Latent Variable Model

There is an alternative interpretation that gives rise to the probit (and
analogously the logit) model. Consider a latent variable which is not
observed by the researcher and linearly depends on xi1 , ..., xiK

yi∗ = β0 + β1 xi1 + ... + βK xiK + ui , E(ui |xi1 , ..., xiK ) = 0


1 The factor 1.6 is derived from equating the first derivative of F in the probit
model with the one in the rescaled logit model. This is the appropriate rescaling for
the marginal effect of the average type. An alternative approach is to equate the
standard deviation of the distribution for which F is the cdf. For the probit model the
standard deviation is 1 and for the logit model π/√3 ≈ 1.81.

[Figure 2: The probit model with a latent variable. N = 30, K = 2,
β0 = −2 and β1 = 0.5. Horizontal axis: x from 0 to 10; vertical axis: y.
Plotted: E(y*), E(y), the OLS fit, and the latent and observed data points.]

The latent variable yi∗ can be interpreted as the utility difference between
choosing yi = 1 and 0. It is then called a random utility model.

Only the choice yi is observed by the researcher. An individual chooses
yi = 1 if the latent variable is positive and 0 otherwise, hence the observed
variable is

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$

Furthermore, assume that the individual observations (xi1 , ..., xiK , yi ) are
i.i.d., that the explanatory variables are exogenous and that the error term
is normally distributed and homoscedastic

ui |xi1 , ..., xiK ∼ N (0, σ 2 )

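The probability that individual i chooses yi = 1 can now be derived
from the latent variable and the decision rule, i.e.

$$\begin{aligned}
P(y_i = 1|x_{i1}, ..., x_{iK}) &= P(y_i^* > 0|x_{i1}, ..., x_{iK}) = P(z_i + u_i > 0|x_{i1}, ..., x_{iK}) \\
&= P(u_i > -z_i|x_{i1}, ..., x_{iK}) = 1 - \Phi(-z_i/\sigma) \\
&= \Phi\left(\frac{z_i}{\sigma}\right) = \Phi\left(\frac{\beta_0}{\sigma} + \frac{\beta_1}{\sigma} x_{i1} + ... + \frac{\beta_K}{\sigma} x_{iK}\right).
\end{aligned}$$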

The probit model arises when σ² is set to unity.
Note: βk and σ are not separately identified as only the ratio βk /σ can
be estimated. Figure 2 visualizes the latent variable model.
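
The identification problem can be illustrated numerically. The sketch below uses β0 = −2 and β1 = 0.5 from Figure 2; the evaluation point x = 6, the doubled parameter set, the seed and the number of draws are arbitrary assumptions made for the illustration.

```python
import math
import random

def normal_cdf(z):
    """Standard normal cdf Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def response_probability(x, beta0, beta1, sigma):
    """P(y=1|x) = Phi((beta0 + beta1*x) / sigma) from the latent variable model."""
    return normal_cdf((beta0 + beta1 * x) / sigma)

# beta0 = -2 and beta1 = 0.5 as in Figure 2; sigma = 1 gives the probit model.
p_base = response_probability(6.0, -2.0, 0.5, 1.0)
# Doubling beta0, beta1 and sigma leaves the ratios beta/sigma unchanged.
p_scaled = response_probability(6.0, -4.0, 1.0, 2.0)

# Simulate the decision rule y = 1 if y* > 0 at x = 6 (seed chosen arbitrarily).
rng = random.Random(0)
draws = 20000
share_ones = sum(
    (-2.0 + 0.5 * 6.0 + rng.gauss(0.0, 1.0)) > 0 for _ in range(draws)
) / draws

print(f"P(y=1|x=6), sigma=1:         {p_base:.4f}")
print(f"P(y=1|x=6), all doubled:     {p_scaled:.4f}")
print(f"simulated share of y=1:      {share_ones:.4f}")
```

Both parameter sets imply the same response probability, so data on yi alone cannot distinguish between them.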

4 Interpretation of the Parameters

Unlike in the linear regression model, the parameters β cannot directly
be interpreted as marginal effects on the dependent variable yi . In
some situations, the index function zi = β0 + β1 xi1 + ... + βK xiK has a
clear interpretation in a theoretical model and the marginal effect βk of a
change in the independent variable xik on yi∗ is meaningful. Even then,
the marginal effect is only identified if there is reason to set σ 2 to unity.
In general, we are interested in the marginal effect of a change in xik
on the expected value of the observed variable yi , i.e.

$$\text{Probit:} \quad \frac{\partial E(y_i|x_{i1}, ..., x_{iK})}{\partial x_{ik}} = \frac{\partial P(y_i = 1|x_{i1}, ..., x_{iK})}{\partial x_{ik}} = \phi(z_i)\, \beta_k$$

$$\text{Logit:} \quad \frac{\partial E(y_i|x_{i1}, ..., x_{iK})}{\partial x_{ik}} = \frac{\partial P(y_i = 1|x_{i1}, ..., x_{iK})}{\partial x_{ik}} = \frac{e^{z_i}}{(1 + e^{z_i})^2}\, \beta_k$$

This marginal effect depends on the values of all explanatory variables xik
for observation i. Therefore, each individual has a different marginal effect.
There are several ways to summarize and report the information in the
model. A first possibility is to present the marginal effects for the “mean
type”, i.e. xik = x̄k for all k, the “median type”, or some interesting
extreme types. A second approach is to calculate the marginal effects for
all observations in the sample and report the mean of the effects.
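
The two summary measures generally differ. A minimal sketch for the probit model with one regressor follows; it takes β0 = −2 and β1 = 0.5 (the Figure 2 values) as if they were estimates, and the regressor values are made up. A finite-difference check confirms the formula φ(zi )βk.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta0, beta1 = -2.0, 0.5   # Figure 2 values, treated as estimates here
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]   # made-up regressor values

def probit_marginal_effect(xi):
    """dP(y=1|x)/dx = phi(z) * beta1 in the probit model."""
    return phi(beta0 + beta1 * xi) * beta1

# Marginal effect for the "mean type" versus the mean of individual effects.
x_bar = sum(x) / len(x)
effect_at_mean = probit_marginal_effect(x_bar)
average_effect = sum(probit_marginal_effect(xi) for xi in x) / len(x)

# Central finite difference of Phi as a check on the analytical derivative.
h = 1e-6
numeric = (Phi(beta0 + beta1 * (x_bar + h)) - Phi(beta0 + beta1 * (x_bar - h))) / (2 * h)

print(f"marginal effect at the mean: {effect_at_mean:.4f}")
print(f"average marginal effect:     {average_effect:.4f}")
print(f"finite-difference check:     {numeric:.4f}")
```

The logit case works the same way with e^z/(1 + e^z)² in place of φ(z).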
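The estimated model can also be used for predictions

$$\text{Probit:} \quad \hat{P}(y_i = 1|x_{i1}, ..., x_{iK}) = \Phi(\hat{z}_i)$$

$$\text{Logit:} \quad \hat{P}(y_i = 1|x_{i1}, ..., x_{iK}) = \frac{e^{\hat{z}_i}}{1 + e^{\hat{z}_i}}$$

where $\hat{z}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + ... + \hat{\beta}_K x_{iK}$.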
For a discrete explanatory variable xik it is more accurate to report the
effect of a discrete change ∆xik . The discrete effect of a dummy variable
xik changing from 0 to 1 is estimated as

$$\widehat{\Delta P} = \hat{P}(y_i = 1|..., x_{ik} = 1, ...) - \hat{P}(y_i = 1|..., x_{ik} = 0, ...)$$

and depends on the values of all other explanatory variables $x_{i\ell}$, $\ell \neq k$.
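
As a hedged illustration of the discrete effect, the sketch below evaluates a probit model with one continuous regressor x and one dummy d. All three coefficient values are hypothetical and the function names are made up for the example.

```python
import math

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical probit estimates; b2 is the coefficient on the dummy d.
b0, b1, b2 = -2.0, 0.5, 0.8

def predicted_prob(x, d):
    """Predicted P(y=1 | x, d) = Phi(b0 + b1*x + b2*d)."""
    return Phi(b0 + b1 * x + b2 * d)

def discrete_effect(x):
    """Change in the predicted probability when d switches from 0 to 1."""
    return predicted_prob(x, 1) - predicted_prob(x, 0)

# The same dummy has a different effect at different values of x.
effects = {x: discrete_effect(x) for x in (0.0, 4.0, 8.0)}
for x, effect in effects.items():
    print(f"x={x:3.0f}  discrete effect of d: {effect:.4f}")
```

The effect is largest where the index is close to zero and shrinks toward the tails, which is why the values of the other regressors have to be stated when reporting it.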


Predictions can also be aggregated to, for example, the predicted number
of observations with yi = 1. There are two prediction methods for
this aggregate: (1) assume $\hat{y}_i = 1$ if $\hat{P}_i > 0.5$ and calculate $\sum_i \hat{y}_i$, or (2)
sum the predicted choice probabilities $\sum_i \hat{P}(y_i = 1|x_{i1}, ..., x_{iK})$. The two
measures can be contrasted with the actual numbers. Method 1 also makes it
possible to compare actual and predicted outcomes for any observation. It is
also often interesting to report and contrast predicted numbers for certain
types of individuals.
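
The two aggregation methods can give different totals. A small sketch with made-up predicted probabilities and outcomes (none of these numbers come from the guide):

```python
# Made-up predicted probabilities and observed outcomes, for illustration only.
p_hat = [0.20, 0.30, 0.40, 0.45, 0.60, 0.95]
y = [0, 0, 1, 0, 1, 1]

# Method 1: classify each observation with the 0.5 threshold, then count.
y_hat = [1 if p > 0.5 else 0 for p in p_hat]
predicted_count = sum(y_hat)

# Method 2: sum the predicted probabilities.
expected_count = sum(p_hat)

# Actual number of ones, and the observation-by-observation comparison
# that only method 1 allows.
actual_count = sum(y)
correctly_predicted = sum(1 for yh, yi in zip(y_hat, y) if yh == yi)

print(f"method 1 (threshold): {predicted_count}")
print(f"method 2 (sum of probabilities): {expected_count:.2f}")
print(f"actual: {actual_count}, correctly classified: {correctly_predicted} of {len(y)}")
```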

5 Estimation with Maximum Likelihood

The probit and logit models are estimated by maximum likelihood (ML).
Assuming independence across observations, the likelihood function is
$$\begin{aligned}
L &= \prod_{\{i|y_i = 0\}} P(y_i = 0|x_i) \prod_{\{i|y_i = 1\}} P(y_i = 1|x_i) \\
&= \prod_{i=1}^{N} [1 - F(z_i)]^{1 - y_i}\, F(z_i)^{y_i}
\end{aligned}$$

where P (yi = 1|xi1 , ..., xiK ) = F (zi ) = Φ(zi ) in the probit model and
P (yi = 1|xi1 , ..., xiK ) = F (zi ) = ezi /(1 + ezi ) in the logit model. The
corresponding log likelihood function is
$$\log L = \sum_{i=1}^{N} \left[ (1 - y_i) \log(1 - F(z_i)) + y_i \log F(z_i) \right]$$
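
The log likelihood is straightforward to evaluate for given parameter values. The sketch below does so for one regressor plus a constant; the eight data points and the trial parameter values are made up, and the function names are hypothetical.

```python
import math

def logistic(z):
    """Logit transformation F(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Probit transformation F(z) = Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(beta0, beta1, x, y, F):
    """Sum over i of (1 - y_i) log(1 - F(z_i)) + y_i log F(z_i)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = F(beta0 + beta1 * xi)
        total += (1 - yi) * math.log(1.0 - p) + yi * math.log(p)
    return total

# Made-up data: one regressor and a binary outcome.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 0, 1, 1, 1]

ll_null = log_likelihood(0.0, 0.0, x, y, logistic)     # every F(z_i) equals 0.5
ll_logit = log_likelihood(-2.0, 0.5, x, y, logistic)   # trial values, not estimates
ll_probit = log_likelihood(-2.0, 0.5, x, y, normal_cdf)

print(f"logit, all parameters zero: {ll_null:.4f}")
print(f"logit at (-2, 0.5):         {ll_logit:.4f}")
print(f"probit at (-2, 0.5):        {ll_probit:.4f}")
```

Maximum likelihood searches for the parameter values that make this sum as large as possible.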

The first order conditions for an optimum are in general, for all k including
a constant xi0 = 1

$$\frac{\partial \log L}{\partial \beta_k} = \sum_{i=1}^{N} \left[ (1 - y_i) \frac{-f(z_i)}{1 - F(z_i)} + y_i \frac{f(z_i)}{F(z_i)} \right] x_{ik} = 0$$

where f (z) ≡ ∂F (z)/∂z. This simplifies in the probit model to

$$\frac{\partial \log L}{\partial \beta_k} = \sum_{\{i|y_i = 0\}} \frac{-\phi(z_i)}{1 - \Phi(z_i)}\, x_{ik} + \sum_{\{i|y_i = 1\}} \frac{\phi(z_i)}{\Phi(z_i)}\, x_{ik} = 0$$

and in the logit model to

$$\frac{\partial \log L}{\partial \beta_k} = \sum_{i=1}^{N} \left[ y_i - \frac{e^{z_i}}{1 + e^{z_i}} \right] x_{ik} = 0.$$

There is no analytical solution to these FOCs and numerical optimization
routines are used. The log likelihood function can be shown to be globally
concave for both models and numerical routines converge well to the
unique global maximum.
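
One common numerical routine is Newton-Raphson; the guide does not say which routine a given package uses, so the following is only a sketch of the idea for a logit model with a constant and one regressor. The data are made up, and the 2×2 linear system is solved by hand to keep the example self-contained.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def score_and_hessian(b0, b1, x, y):
    """Logit score sum (y_i - p_i) x_ik and Hessian -sum p_i (1 - p_i) x_ik x_il."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        g0 += yi - p
        g1 += (yi - p) * xi
        w = p * (1.0 - p)
        h00 -= w
        h01 -= w * xi
        h11 -= w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logit(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson iterations: beta_new = beta - H^{-1} g, starting at zero."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_hessian(b0, b1, x, y)
        if max(abs(g0), abs(g1)) < tol:
            break
        det = h00 * h11 - h01 * h01
        b0 -= (h11 * g0 - h01 * g1) / det
        b1 -= (-h01 * g0 + h00 * g1) / det
    return b0, b1

# Made-up data (not perfectly separable, so a finite maximum exists).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 0, 1, 1, 1]

b0_hat, b1_hat = fit_logit(x, y)
final_score, _ = score_and_hessian(b0_hat, b1_hat, x, y)
print(f"estimates: b0={b0_hat:.4f}, b1={b1_hat:.4f}")
print(f"score at the estimates: {final_score[0]:.2e}, {final_score[1]:.2e}")
```

At the returned values both first order conditions hold up to numerical precision.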

The ML estimator of β is consistent and asymptotically normally distributed.
The approximate distribution in large samples is

$$\hat{\beta} \overset{A}{\sim} N(\beta, \mathrm{Avar}(\beta))$$

where Avar(β) is estimated by one of the standard ML procedures (inverse
expected Hessian, inverse Hessian, BHHH, or Eicker-Huber-White sandwich).
Asymptotic hypothesis tests are performed as Wald, likelihood ratio or
Lagrange multiplier tests.
The ML estimation of the probit model (and analogously the logit
model) rests on the strong assumption that the latent error term is nor-
mally distributed and homoscedastic. The ML estimator is inconsistent
in the presence of heteroscedasticity and robust (sandwich) covariance es-
timators cannot solve this. Several semi-parametric estimation strategies
have been proposed that relax the distributional assumption about the
error term. See Horowitz and Savin (2001) for an introduction and Gerfin
(1996) for a nice comparison of different estimators.

6 Estimation with OLS

Despite the logical inconsistency of the linear probability model, OLS
can be used to estimate binary choice models. OLS is then called the
linear probability model (LPM). The estimated OLS slope coefficients are
estimates for the average marginal effects of the true non-linear model.
In practice, the OLS slope coefficients will be very similar to the average
marginal effects calculated after probit or logit estimation. However, it
is very important to report robust (Eicker-Huber-White) standard errors
because of the intrinsic heteroscedasticity of the linear probability model.
The linear probability model has in practice several advantages over
probit or logit estimation: it is easier to calculate, the parameters are
directly interpretable, fixed effects and instrumental variables estimators
can easily be implemented. Note that adding fixed effects as dummy
variables in the probit or logit model will yield biased estimates.
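
The relation between the OLS slope and the average marginal effect can be explored by simulation. The sketch below draws data from a probit model with β0 = −2 and β1 = 0.5 (the Figure 2 values), fits the linear probability model by OLS, and computes a heteroscedasticity-robust (HC0) standard error for the slope. Several choices are assumptions made for the illustration: the regressor is drawn from a normal distribution, which is a favorable case for the approximation, the average marginal effect is evaluated at the true parameters rather than at probit estimates, and the seed and sample size are arbitrary.

```python
import math
import random

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

rng = random.Random(1)
beta0, beta1 = -2.0, 0.5     # "true" probit parameters (Figure 2 values)
n = 20000

# Simulate the latent variable model with a normally distributed regressor.
x = [rng.gauss(4.0, 2.0) for _ in range(n)]
y = [1 if beta0 + beta1 * xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]

# OLS of y on a constant and x (linear probability model).
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Eicker-Huber-White (HC0) standard error of the slope.
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
robust_se = math.sqrt(sum(((xi - x_bar) * ei) ** 2 for xi, ei in zip(x, resid))) / sxx

# Average marginal effect of the probit model at the true parameters.
ame = sum(beta1 * phi(beta0 + beta1 * xi) for xi in x) / n

print(f"OLS slope: {slope:.4f} (robust s.e. {robust_se:.4f})")
print(f"average marginal effect: {ame:.4f}")
```

With other regressor distributions the two numbers can be further apart, so the comparison is worth repeating on the data at hand.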

7 Implementation in Stata 17

The probit and logit models are estimated with the probit and logit
commands, respectively. For example, load data
webuse auto.dta

and estimate the effect of the explanatory variables weight and mpg on
the dependent dummy variable foreign with the probit model
probit foreign weight mpg

or the logit model
logit foreign weight mpg

Stata reports the inverse Hessian matrix as the default covariance estimator.
The sandwich covariance estimator is reported with
probit foreign weight mpg, vce(robust)

Response probabilities are estimated for each observation with the post-
estimation command predict:
predict p_foreign, pr

Marginal effects for specific types are calculated with the post-estimation
command margins. For example, the marginal effects for a car with a
weight of 2000 lbs. and 40 mpg are reported by
margins, dydx(*) at(weight = 2000 mpg = 40)

The marginal effects for the mean type, e.g. a car with average weight
and mpg, are calculated with
margins, dydx(*) atmeans

If explanatory dummy variables are defined as factor variables, Stata reports
the exact discrete effect.
Average marginal effects in the estimation sample are calculated by
margins, dydx(*)

8 Implementation in R 4.3.1

The probit and logit models are estimated with the command glm, which
fits generalized linear models. For example, load data
library(haven)
auto <- read_dta("https://www.stata-press.com/data/r17/auto.dta")

and estimate the effect of the explanatory variables weight and mpg on
the dependent dummy variable foreign with the probit model
probit <- glm(foreign ~ weight + mpg, family = binomial(link = "probit"),
              data = auto)
summary(probit)

or the logit model
logit <- glm(foreign ~ weight + mpg, family = binomial(link = "logit"),
             data = auto)
summary(logit)

R reports the inverse Hessian matrix as the default covariance estimator. The
sandwich covariance estimator is reported with
library(sandwich)
library(lmtest)   # provides coeftest()
coeftest(probit, vcov = sandwich)

Response probabilities are estimated for each observation with predict:
p_foreign <- predict(probit, type = "response")

The R package margins offers a convenient way to calculate marginal
effects. For example, the marginal effects for a car with a weight of 2000
lbs. and 40 mpg are calculated with
library(margins)
margins(probit, at = list(weight=2000, mpg=40))

Tests and confidence bounds for marginal effects are reported with
mfx <- margins(probit, at = list(weight=2000, mpg=40))
summary(mfx)

Average marginal effects in the estimation sample are calculated by
margins(probit)

References

Introductory textbooks

Stock, James H. and Mark W. Watson (2020), Introduction to Econometrics,
4th Global ed., Pearson. Chapter 11.
Wooldridge, Jeffrey M. (2009), Introductory Econometrics: A Modern
Approach, 4th ed., Cengage Learning. Chapter 17.1.
Aldrich, John and Forrest D. Nelson (1984), Linear Probability, Logit and
Probit Models, Sage University Press.

Advanced textbooks

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics:
Methods and Applications, Cambridge University Press. Chapter 14.
Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and
Panel Data, MIT Press. Chapter 15.
Maddala, G.S. (1983), Limited-Dependent and Qualitative Variables in
Econometrics, Cambridge: Cambridge University Press. Chapter 2.

Articles

Gerfin, Michael (1996), Parametric and Semi-Parametric Estimation of
the Binary Response Model of Labour Market Participation, Journal
of Applied Econometrics, 11, 321-339.
Horowitz, Joel and N. Savin (2001), Binary Response Models: Logits, Probits
and Semiparametrics, Journal of Economic Perspectives, 15(4),
43-56.
