Propensity Score Matching in Stata Using Teffects
use http://ssc.wisc.edu/sscc/pubs/files/psm
The example data set consists of four variables: a treatment indicator t, covariates x1 and x2, and an outcome y. This is constructed data, and the true effect of
the treatment is a one-unit increase in y. However, the probability of treatment is positively correlated with x1 and x2, and both x1
and x2 are positively correlated with y. Thus simply comparing the mean value of y for the treated and untreated groups badly
overestimates the effect of treatment:
ttest y, by(t)
(Regressing y on t, x1, and x2 will give you a pretty good picture of the situation.)
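The regression mentioned in the parenthetical can be run as follows. This is a minimal sketch that assumes the example data set is already in memory; the coefficient on t is the regression estimate of the treatment effect.

```stata
* Assumes the example data set (t, x1, x2, y) is already loaded
regress y t x1 x2
```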
The psmatch2 command will give you a much better estimate of the treatment effect:
psmatch2 t x1 x2, out(y)
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
y Unmatched | 1.8910736 -.423243358 2.31431696 .109094342 21.21
ATT | 1.8910736 .871388246 1.01968536 .173034999 5.89
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
The teffects psmatch command can fit the same model, but its default behavior is not the same as that of psmatch2, so we'll need to use some options to get the same results.
First, psmatch2 by default reports the average treatment effect on the treated (which it refers to as ATT). The teffects command by
default reports the average treatment effect (ATE) but will calculate the average treatment effect on the treated (which it refers to as ATET)
if given the atet option. Second, psmatch2 by default uses a probit model for the probability of treatment. The teffects command
uses a logit model by default, but will use probit if the probit option is applied to the treatment equation. So to run the same model
using teffects type:
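Following the description above, the command would look something like the sketch below: the probit option goes inside the treatment equation and atet is given as an overall option. It assumes the example data set is in memory.

```stata
* probit: use a probit treatment model, as psmatch2 does by default
* atet:   report the average treatment effect on the treated
teffects psmatch (y) (t x1 x2, probit), atet
```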
https://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm
15/10/2020 Propensity Score Matching in Stata using teffects
Running teffects with the default options gives the following:
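teffects psmatch (y) (t x1 x2)
For comparison, the output below is in psmatch2's format and corresponds to the same model fit with psmatch2 using its logit and ate options: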
----------------------------------------------------------------------------------------
Variable Sample | Treated Controls Difference S.E. T-stat
----------------------------+-----------------------------------------------------------
y Unmatched | 1.8910736 -.423243358 2.31431696 .109094342 21.21
ATT | 1.8910736 .930722886 .960350715 .168252917 5.71
ATU |-.423243358 .625587554 1.04883091 . .
ATE | 1.01936701 . .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
The ATE from this model is very similar to the ATT/ATET from the previous model. But note that psmatch2 is reporting a somewhat
different ATT in this model. The teffects command reports the same ATET if asked:
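Asking teffects for the ATET from the default (logit) model only requires adding the atet option. A minimal sketch, assuming the example data set is in memory:

```stata
* Default logit treatment model, but report the ATET instead of the ATE
teffects psmatch (y) (t x1 x2), atet
```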
Standard Errors
The output of psmatch2 includes the following caveat:
Note: S.E. does not take into account that the propensity score is estimated.
A recent paper by Abadie and Imbens (2012. Matching on the estimated propensity score. Harvard University and National Bureau of
Economic Research) established how to take into account that propensity scores are estimated, and teffects psmatch relies on their
work. Interestingly, the adjustment for ATE is always negative, leading to smaller standard errors: matching based on estimated propensity
scores turns out to be more efficient than matching based on true propensity scores. However, for ATET the adjustment can be positive or
negative, so the standard errors reported by psmatch2 may be too large or too small.
Handling Ties
Thus far we've used psmatch2 and teffects psmatch to do simple nearest-neighbor matching with one neighbor (and no caliper).
However, this raises the question of what to do when two observations have the same propensity score and are thus tied for "nearest
neighbor." Ties are common if the covariates in the treatment model are categorical or even integers.
The psmatch2 command by default matches with one of the tied observations, but with the ties option it matches with all tied
observations. The teffects psmatch command always matches with all ties. If your data set has multiple observations with the same
propensity score, you won't get exactly the same results from teffects psmatch as you were getting from psmatch2 unless you
go back and add the ties option to your psmatch2 commands. (At this time we are not aware of any clear guidance as to whether it is
better to match with ties or not.)
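As an illustration of the point about ties, the sketch below adds the ties option to a psmatch2 call so that it matches with all tied observations, as teffects psmatch does. The logit option is an assumption made here so that the treatment model also matches the teffects default; psmatch2 is a user-written command and must be installed (for example from SSC) before this will run.

```stata
* logit: same treatment model as the teffects psmatch default
* ties:  match with every control tied for nearest neighbor
psmatch2 t x1 x2, out(y) logit ties
```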
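To match each observation with more than one neighbor, use the nneighbor() option (abbreviated nn()). For example, to use the three nearest neighbors:
teffects psmatch (y) (t x1 x2), nn(3)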
Postestimation
By default teffects psmatch does not add any new variables to the data set. However, there are a variety of useful variables that
can be created with options and post-estimation predict commands. The following table lists the 1st and 467th observations of the
example data set after some of these variables have been created. We'll refer to it as we explain the commands that created the new
variables. Reviewing these variables is also a good way to make sure you understand exactly how propensity score matching works.
+-------------------------------------------------------------------------------------------------------+
| x1 x2 t y match1 ps0 ps1 y0 y1 te |
|-------------------------------------------------------------------------------------------------------|
1. | .0152526 -1.793022 0 -1.79457 467 .9081651 .0918349 -1.79457 2.231719 4.026289 |
467. | -2.057838 .5360286 1 2.231719 781 .907606 .092394 -.6012772 2.231719 2.832996 |
+-------------------------------------------------------------------------------------------------------+
The gen() option tells teffects psmatch to create a new variable (or variables). For each observation, this new variable will
contain the number of the observation that observation was matched with. If there are ties or you told teffects psmatch to use
multiple neighbors, then gen() will need to create multiple variables. Thus you supply the stem of the variable name, and teffects
psmatch will add suffixes as needed.
teffects psmatch (y) (t x1 x2), gen(match)
In this case each observation is only matched with one other, so gen(match) only creates match1. Referring to the example output,
the match of observation 1 is observation 467 (which is why those two are listed).
Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed. If
necessary, run:
gen ob=_n
and then:
sort ob
The predict command with the ps option creates two variables containing the propensity scores, or that observation's predicted
probability of being in either the control group or the treated group:
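A minimal sketch of that command, run after teffects psmatch. The variable names ps0 and ps1 are taken from the table above:

```stata
* Creates ps0 and ps1, the predicted probabilities of t=0 and t=1
predict ps0 ps1, ps
```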
Here ps0 is the predicted probability of being in the control group (t=0) and ps1 is the predicted probability of being in the treated group
(t=1). Observations 1 and 467 were matched because their propensity scores are very similar.
The po option creates variables containing the potential outcomes for each observation:
predict y0 y1, po
Because observation 1 is in the control group, y0 contains its observed value of y. y1 is the observed value of y for observation 1's match,
observation 467. The propensity score matching estimator assumes that if observation 1 had been in the treated group its value of y would
have been that of the observation in the treated group most similar to it (where "similarity" is measured by the difference in their
propensity scores).
Observation 467 is in the treated group, so its value for y1 is its observed value of y while its value for y0 is the observed value of y for
its match, observation 781.
Running the predict command with no options gives the treatment effect itself:
predict te
The treatment effect is simply the difference between y1 and y0. You could calculate the ATE yourself (but emphatically not its standard
error) with:
sum te
and the ATET with:
sum te if t
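A table like the one shown earlier in this section can be produced with list once all of the variables exist. This is a sketch that assumes match1, ps0, ps1, y0, y1, and te were created with the names used above and that the data are still in their original sort order:

```stata
* Show observation 1 and its match, observation 467
list x1 x2 t y match1 ps0 ps1 y0 y1 te if inlist(_n, 1, 467)
```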
... matches the controls. Mathematically this is all equivalent to using matching to estimate what an observation's outcome would have been if
it had been in the other group, as described above.
Sometimes researchers then want to run regressions on the "matched sample," defined as the observations in the treated group plus the
observations in the control group which were matched to them. The problem with this approach is that the matched sample is based on
propensity scores which are estimated, not known. Thus the matching scheme is an estimate as well. Running regressions after matching is
essentially a two stage regression model, and the standard errors from the second stage must take the first stage into account, something
standard regression commands do not do. This is an area of ongoing research.
We will discuss how to run regressions on a matched sample because it remains a popular technique, but we cannot recommend it.
psmatch2 makes it easy by creating a _weight variable automatically. For observations in the treated group, _weight is 1. For
observations in the control group it is the number of observations from the treated group for which the observation is a match. If the
observation is not a match, _weight is missing. _weight thus acts as a frequency weight (fweight) and can be used with Stata's
standard weighting syntax. For example (starting with a clean slate again):
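A sketch of what such a run might look like. The data set URL is the one used elsewhere in this article, and the logit option is an assumption made here so that the matches agree with the teffects psmatch default; neither is confirmed by the surrounding text.

```stata
clear all
use http://www.ssc.wisc.edu/sscc/pubs/files/psm

* psmatch2 creates _weight automatically (logit is an assumed choice)
psmatch2 t x1 x2, out(y) logit

* Use _weight as a frequency weight; unmatched controls have missing
* _weight and are dropped from the regression
reg y x1 x2 t [fweight=_weight]
```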
Observations with a missing value for _weight are omitted from the regression, so it is automatically limited to the matched sample.
Again, keep in mind that the standard errors given by the reg command are incorrect because they do not take into account the matching
stage.
teffects psmatch does not create a _weight variable, but it is possible to create one based on the match1 variable. Here is
example code, with comments:
merge 1:m ob using fulldata // merge back into the full data
replace weight=1 if t // set weight to 1 for treated observations
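The two lines above are only the end of the procedure. One way the whole sequence could be written is sketched below, under several assumptions: the data set has just been loaded (so ob, match1, and weight do not yet exist), there are no ties (so gen(match) creates only match1), and fulldata is simply a scratch file name.

```stata
gen ob=_n                          // store the current observation numbers
save fulldata, replace             // save the complete data set
teffects psmatch (y) (t x1 x2), atet gen(match)
keep if t                          // keep just the treated group
keep match1                        // keep only the observation number of each match
bysort match1: gen weight=_N       // times each control observation is used as a match
by match1: keep if _n==1           // one row per matched control observation
rename match1 ob                   // rename for merging
merge 1:m ob using fulldata        // merge back into the full data
replace weight=1 if t              // set weight to 1 for treated observations
```

Controls that were never used as a match end up with a missing weight, which is what drops them from a regression that uses [fweight=weight].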
The resulting weight variable will be identical to the _weight variable created by psmatch2, as can be verified with:
assert weight==_weight
It is used in the same way and will give exactly the same results:
reg y x1 x2 t [fweight=weight]
Obviously this is a good bit more work than using psmatch2. If your propensity score matching model can be done using both
teffects psmatch and psmatch2, you may want to run teffects psmatch to get the correct standard error and then
psmatch2 if you need a _weight variable.
This regression has an N of 666, 333 from the treated group and 333 from the control group. However, it only uses 189 different
observations from the control group. About 1/3 of them are the matches for more than one observation from the treated group and are thus
duplicated in the regression (run tab weight if !t for details). Researchers sometimes use the norepl (no replacement) option
in psmatch2 to ensure each observation is used just once, even though this generally makes the matching worse. To the best of our
knowledge there is no equivalent with teffects psmatch.
The results of this regression leave somewhat to be desired:
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | 1.11891 .0440323 25.41 0.000 1.03245 1.205369
x2 | 1.05594 .0417253 25.31 0.000 .97401 1.13787
t | .9563751 .0802273 11.92 0.000 .7988445 1.113906
_cons | .0180986 .0632538 0.29 0.775 -.1061036 .1423008
------------------------------------------------------------------------------
By construction all the coefficients should be 1. Regression using all the observations (reg y x1 x2 t rather than reg y x1 x2 t
[fweight=weight]) does better in this case:
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x1 | 1.031167 .0346941 29.72 0.000 .9630853 1.099249
x2 | .9927759 .0333297 29.79 0.000 .9273715 1.05818
t | .9791484 .0769067 12.73 0.000 .8282306 1.130066
_cons | .0591595 .0416008 1.42 0.155 -.0224758 .1407948
------------------------------------------------------------------------------
clear all
use http://www.ssc.wisc.edu/sscc/pubs/files/psm
ttest y, by(t)
reg y x1 x2 t
gen ob=_n
save fulldata,replace
assert weight==_weight
reg y x1 x2 t [fweight=weight]
reg y x1 x2 t
©2009-2019 UW Board of Regents, University of Wisconsin - Madison