Week 12: LPM, Logit, Probit
University of Colorado
Anschutz Medical Campus
Binary outcomes
Linear Probability Models
The problem is that we know that this model is not entirely
correct. Recall that in the linear model we assume
Y ∼ N(β0 + β1X1 + · · · + βpXp, σ²) or, equivalently, εi ∼ N(0, σ²)
That is, Y distributes normal conditional on the Xs, or the error
distributes normal with mean 0
Obviously, a 1/0 variable can't distribute normal, and εi can't be
normally distributed either
We also know that we needed the normality assumption for
inference, not to get best betas
The big picture: using the linear model for a 1/0 outcome is
mostly wrong in the sense that the SEs are not right
Yet the estimated effects of covariates on the probability of the
outcome are more often than not fine
So the LPM is the wrong but super useful model because changes can
be interpreted in the probability scale
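One way to see the "not entirely correct" part: the fitted line is unbounded, so predicted probabilities can escape [0, 1]. A minimal Python sketch using the LPM coefficients estimated from the NHANES example below (intercept -.1914229, age slope .0086562); the evaluation ages are just for illustration:

```python
# LPM fitted values: htn_hat = b0 + b1 * age, using the slide's
# estimates (b0 = -0.1914229, b1 = 0.0086562 from "reg htn age")
b0, b1 = -0.1914229, 0.0086562

def lpm_predict(age):
    """Predicted 'probability' of hypertension from the LPM."""
    return b0 + b1 * age

for age in (20, 45, 80):  # illustrative ages
    print(age, round(lpm_predict(age), 4))

# At age 20 the prediction is negative -- not a valid probability,
# even though the slope is still a sensible average effect
```

This is exactly the sense in which the LPM is "wrong but useful": the fitted line can leave the unit interval, yet the slope remains easy to read as a change in probability.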
Example
Data from the National Health and Nutrition Examination Survey
(NHANES)
We model the probability of hypertension given age
reg htn age
------------------------------------------------------------------------------
htn | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0086562 .0004161 20.80 0.000 .0078404 .009472
_cons | -.1914229 .0193583 -9.89 0.000 -.2293775 -.1534682
------------------------------------------------------------------------------
Linear Probability Models
Lowess
Lowess can show you what the relationship between the indicator
variable and the explanatory variable looks like
* lhtn holds the lowess-smoothed values; this line is implied but not shown on the slide
lowess htn age, gen(lhtn) nograph
scatter htn age, jitter(5) msize(vsmall) || line lhtn age, sort ///
legend(off) color(red) saving(hlowess.gph, replace) ///
title("Lowess")
LPM, more issues
So why do we use LPMs?
Not long ago, maybe 10 or 15 years ago (it's always 10 to 15 to 20
years ago), you couldn't use the alternatives (logistic, probit) with
large datasets
The models would take too long to run, or wouldn't run at all;
researchers would take a sample and run logit or probit as a sensitivity
analysis
The practice still lingers in HSR and health economics
The main reason to keep using the LPM as a first step in modeling is
that the coefficients are easy to interpret
In my experience, if the average of the outcome is not close to 0 or 1,
there is not much difference between the LPM and logit/probit (SEs
can change, although not by a lot)
But not a lot of good reasons to present LPM results in papers
anymore, except maybe in difference-in-difference models
One more time
Logistic or logit model
Logistic models can be derived in several ways, which makes learning
confusing since you can read different versions
In the MLE lecture we derived the model assuming that the 1/0
outcome distributes Bernoulli and that observations are iid. We will
extend that example today
An (almost) identical way is to assume that the outcome comes
from a Binomial distribution since the Binomial is the sum of iid
Bernoulli random variables
A third way is to assume that there is a latent and continuous
variable that distributes logistic (yes, there is also a logistic pdf), or
probit, but we only get to observe a 1 or 0 when the latent variable
crosses a threshold
You get to the same model, but the latent interpretation has a
bunch of applications in economics (for example, random utility
models) and psychometrics (the latent variable is "ability," but we
only observe whether a person answers a question correctly, a 1/0)
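A quick way to convince yourself of the latent-variable story: if y* = β0 + β1x + ε with ε standard logistic, and we observe y = 1 whenever y* crosses 0, then P(y = 1 | x) is exactly the logistic CDF at β0 + β1x. A small Python simulation (the coefficients and the x value are made up for illustration):

```python
import math
import random

random.seed(42)
b0, b1 = -1.0, 0.5          # made-up latent-model coefficients
x = 1.2                     # evaluate at one fixed covariate value
n = 50_000

def logistic_draw():
    """Standard logistic draw via the inverse CDF of a uniform."""
    u = random.random()
    return math.log(u / (1 - u))

# Latent variable crosses zero -> we observe a 1
ones = sum((b0 + b1 * x + logistic_draw()) > 0 for _ in range(n))
frac = ones / n

# Closed form: P(y = 1 | x) = Lambda(b0 + b1*x)
p = 1 / (1 + math.exp(-(b0 + b1 * x)))
print(frac, p)  # the two should be close
```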
Recall the Logit MLE from the MLE lecture
Adding covariates
Logistic response function
If we constrain the response to be between 0 and 1, it can’t be linear
with respect to X
twoway function y=exp(x) / (1+ exp(x)), range(-10 10) saving(l1.gph, replace)
twoway function y=exp(-x) / (1+ exp(-x)), range(-10 10) saving(l2.gph, replace)
graph combine l1.gph l2.gph, xsize(20) ysize(10)
graph export lboth.png, replace
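The two curves plotted by the Stata code are mirror images: exp(x)/(1+exp(x)) rises from 0 to 1 while exp(-x)/(1+exp(-x)) falls, and the pair always sums to 1. A quick Python check of those properties:

```python
import math

def lam(x):
    """Logistic response exp(x) / (1 + exp(x)), bounded in (0, 1)."""
    return math.exp(x) / (1 + math.exp(x))

for x in range(-10, 11):
    assert 0 < lam(x) < 1                      # never leaves (0, 1)
    assert abs(lam(x) + lam(-x) - 1) < 1e-12   # mirror image
assert lam(0) == 0.5                           # crosses 1/2 at x = 0
print("logistic response checks pass")
```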
Logistic or logit model
Logistic MLE
Yet another way
Compare logistic distribution and normal
Stata doesn't have a logistic random-number generator, but you can
simulate any probability distribution by plugging uniform draws into
the inverse of the CDF (the quantile function) of the distribution you want
clear
set seed 123456
set obs 5000
gen u = runiform()
* Quantile transforms; these lines are implied but missing from the slide
gen l = logit(u)      // logistic quantile: ln(u/(1-u))
gen n = invnormal(u)  // standard normal quantile
* Plot
kdensity l, bw(0.3) gen(xl dl)
kdensity n, bw(0.3) gen(xn dn)
line dl xl, sort color(red) || line dn xn, sort ///
title("Logistic (red) vs normal distribution") ytitle("Density") ///
xtitle("x") legend(off)
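The same inverse-CDF trick in Python, if you want to check the two densities numerically rather than by eye: the standard logistic has mean 0 but standard deviation π/√3 ≈ 1.81, so it is more spread out than the standard normal, which is what the plot shows. `NormalDist().inv_cdf` plays the role of Stata's `invnormal`:

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(123456)
n = 20_000
# Uniform draws; drop an (astronomically unlikely) exact 0
u = [ui for ui in (random.random() for _ in range(n)) if ui > 0.0]

# Inverse-CDF transforms, as in the Stata code
l = [math.log(ui / (1 - ui)) for ui in u]      # standard logistic
nrm = [NormalDist().inv_cdf(ui) for ui in u]   # standard normal

print(mean(l), stdev(l))      # ~0 and ~pi/sqrt(3) = 1.8138
print(mean(nrm), stdev(nrm))  # ~0 and ~1
```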
Standard logistic vs standard normal
graph export logvsnorm.png, replace
Logistic vs normal
Assuming either one as the latent distribution makes little difference
Big picture
Not a big difference in the probability scale between probit and logit
If you are an economist you run probit models; for the rest of the
world, there is the logistic model
IMPORTANT: There is a big difference in terms of interpreting a
regression output because the coefficients are estimated in different
scales
In the logistic model the effect of a covariate is linear in the
log-odds, so it can be summarized by a single odds ratio; you can't
do the same in probit models
We will see examples of both
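What "linear in terms of the odds ratio" means concretely: in the logit model the log-odds is exactly β0 + β1·age, so one extra year always multiplies the odds by exp(β1), at any age. A check in Python using the logit coefficients reported in the comparison table later in these slides (age .0623793, constant -4.4563995):

```python
import math

b0, b1 = -4.4563995, 0.0623793  # logit coefficients from the slides

def p(age):
    """P(htn = 1 | age) under the logit model."""
    return 1 / (1 + math.exp(-(b0 + b1 * age)))

def log_odds(age):
    return math.log(p(age) / (1 - p(age)))

# The log-odds change for one extra year is b1, no matter the age...
print(log_odds(26) - log_odds(25))  # = b1
print(log_odds(66) - log_odds(65))  # = b1
# ...so the odds ratio per year of age is constant: exp(b1)
print(math.exp(b1))
```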
Digression: How to simulate a logistic model
Simulation
Note that the estimated parameters match the ones I used in the gen
step on the previous slide
logit y x1
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | .0998315 .0145838 6.85 0.000 .0712478 .1284153
_cons | -1.004065 .0493122 -20.36 0.000 -1.100715 -.9074147
------------------------------------------------------------------------------
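The same digression can be reproduced end to end in Python: simulate y from a logistic model with β0 = -1 and β1 = 0.1 (the values the slide's output recovers), then re-estimate by maximum likelihood. This is a hand-rolled Newton-Raphson sketch, not a library call, and the covariate distribution is made up:

```python
import math
import random

random.seed(1)
n = 20_000
b0_true, b1_true = -1.0, 0.1   # parameters from the slide's simulation

# Simulate: x uniform on (0, 10), y ~ Bernoulli(Lambda(b0 + b1*x))
x = [random.uniform(0, 10) for _ in range(n)]
y = [1 if random.random() < 1 / (1 + math.exp(-(b0_true + b1_true * xi))) else 0
     for xi in x]

# Newton-Raphson for the 2-parameter logit MLE:
# beta <- beta + (X'WX)^{-1} X'(y - p), solving the 2x2 system by hand
b0, b1 = 0.0, 0.0
for _ in range(25):
    p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
    g0 = sum(yi - pi for yi, pi in zip(y, p))               # score, constant
    g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x)) # score, slope
    w = [pi * (1 - pi) for pi in p]                         # information weights
    h00 = sum(w)
    h01 = sum(wi * xi for wi, xi in zip(w, x))
    h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

print(b0, b1)  # should land near -1 and 0.1
```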
Big picture
Comparing LPM, logit, and probit models
* LPM; needed below for hatlpm but not shown on the slide
qui reg htn age
est sto lpm
predict hatlpm
* Logit
qui logit htn age
est sto logit
predict hatlog
* Probit
qui probit htn age
est sto prob
predict hatprob
line hatlpm age, sort color(blue) || line hatlog age, sort || ///
line hatprob age, sort legend(off) saving(probs.gph, replace)
graph export prob.png, replace
Predicted probabilities
Note that the predicted probabilities are not linear with probit and
logit even though we wrote the models in the same way
Note that there is practically no difference between the logit and
probit models in this example
But wait, look closely... What about effects?
In the linear model the effect of X (age in this case) is always the
same regardless of age: it's the slope, both for small changes and when
we use marginal effects
Look at the graph again: around age 45 the effects will be nearly
identical (similar slopes), but not around 25 or 65
Effect at different points
Effect at different points
But for logit the effect is different at each age. Note that the
differences are quite large
qui logit htn age
margins, dydx(age) at(age=(25 45 65)) vsquish
Conditional marginal effects Number of obs = 3,529
Model VCE : OIM
Expression : Pr(htn), predict()
dy/dx w.r.t. : age
1._at : age = 25
2._at : age = 45
3._at : age = 65
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
_at |
1 | .0030922 .0001522 20.32 0.000 .0027939 .0033904
2 | .0084346 .0004166 20.25 0.000 .007618 .0092511
3 | .0149821 .0009574 15.65 0.000 .0131057 .0168585
------------------------------------------------------------------------------
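The margins numbers above are just the logit derivative formula evaluated at each age: dP/d(age) = β1·p·(1-p). A Python sketch reproduces them from the (rounded) coefficients reported on the comparison-table slide:

```python
import math

b0, b1 = -4.4563995, 0.0623793  # logit coefficients from the slides

def p(age):
    """P(htn = 1 | age) under the fitted logit model."""
    return 1 / (1 + math.exp(-(b0 + b1 * age)))

def dydx(age):
    """Marginal effect of age on P(htn): b1 * p * (1 - p)."""
    return b1 * p(age) * (1 - p(age))

for age in (25, 45, 65):
    print(age, round(dydx(age), 7))
# Matches margins: .0030922, .0084346, .0149821 (up to rounding)
```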
But, does this matter?
Coefficients in the estimation scale?
-----------------------------------------------------
    Variable |       lpm        logit         prob
-------------+---------------------------------------
         age |  .00865616     .0623793     .0352797
             |    0.0000       0.0000       0.0000
       _cons | -.19142286   -4.4563995   -2.5542014
             |    0.0000       0.0000       0.0000
-----------------------------------------------------
                                       legend: b/p
Similar in probability scale
We will learn how to make them comparable using the probability
scale, which is what we really care about. The margins command
computes the average effect across the observed values of age
. * LPM
qui reg htn age
margins, dydx(age)
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0086562 .0004161 20.80 0.000 .0078404 .009472
------------------------------------------------------------------------------
* Logit
qui logit htn age
margins, dydx(age)
------------+----------------------------------------------------------------
age | .0084869 .0004066 20.87 0.000 .00769 .0092839
------------------------------------------------------------------------------
* Probit
qui probit htn age
margins, dydx(age)
-------------+----------------------------------------------------------------
age | .0084109 .0003917 21.47 0.000 .0076432 .0091787
------------------------------------------------------------------------------
Summary
To make life easier, logistic models are often interpreted using odds
ratios, but odds ratios can be misleading
In the probit model we interpret parameters as shifts in the
cumulative normal, which is even less intuitive
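One reason odds ratios mislead: readers treat them as risk ratios, which is only safe when the outcome is rare. A quick numeric check with made-up probabilities:

```python
def odds(p):
    return p / (1 - p)

# Rare outcome: odds ratio and risk ratio nearly agree
p0, p1 = 0.01, 0.02
print(odds(p1) / odds(p0), p1 / p0)   # ~2.02 vs 2.0

# Common outcome: odds ratio badly overstates the risk ratio
p0, p1 = 0.5, 0.7
print(odds(p1) / odds(p0), p1 / p0)   # ~2.33 vs 1.4
```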