Coefplot Manual
Social Sciences
Department of Social Sciences
http://ideas.repec.org/p/bss/wpaper/1.html
http://econpapers.repec.org/paper/bsswpaper/1.htm
Abstract
Graphical presentation of regression results has become increasingly popular in the scientific
literature, as graphs are much easier to read than tables in many cases. In Stata such plots can be
produced by the marginsplot command ([R] marginsplot). However, while marginsplot is very
versatile and flexible, it has two major limitations: it can only process results left behind by margins
([R] margins), and it can only handle one set of results at a time. In this article I introduce a new
command called coefplot that overcomes these limitations. It plots results from any estimation
command and combines results from several models into a single graph. The default behavior of
coefplot is to plot markers for coefficients and horizontal spikes for confidence intervals. However,
coefplot can also produce various other types of graphs. The capabilities of coefplot are illustrated
in this article using a series of examples.
Keywords: coefplot, marginsplot, margins, regression plot, coefficients plot
Contents
1 Introduction
2 Syntax
2.1 Model options
2.2 Plot options
2.3 Subgraph options
2.4 Global options
3 Examples
3.1 Plotting a single model
3.2 Plotting multiple models
3.2.1 Models as plots
3.2.2 Subgraphs
3.2.3 Appending models
3.2.4 How coefficients and equations are matched
3.2.5 How coefficients are ordered
3.3 Labeling the categorical axis
3.3.1 Custom coefficient labels
3.3.2 Headings and groups
3.3.3 Equation labels
3.3.4 Labels on opposite side
3.4 Confidence intervals
3.5 Alternate plot types and advanced examples
3.5.1 Vertical mode
3.5.2 Using the recast() option
3.5.3 Adding marker labels
3.5.4 Arranging subgraphs by coefficients
3.5.5 Using a continuous axis
3.5.6 Plotting results from matrices
1 Introduction
Tabulating regression coefficients has long been the preferred way of communicating results from
statistical models. However, researchers increasingly use graphs to present regression results.
There are several reasons for this. On the one hand, interpreting regression tables can be very challenging,
especially if there are interaction effects, categorical variables, or nonlinear functional forms. Moreover,
in nonlinear models, the original regression coefficients are often not the primary interest of researchers.
For example, in logistic regression the raw coefficients represent effects on log odds. However, most peo-
ple would be more comfortable with effects expressed on the probability scale. Since probability effects
are not constant in such a model, it can be helpful, for example, to plot effect functions. On the other
hand, and more fundamentally, it has been recognized that presenting results in the form of graphs can
be much more effective than tabulation.
While graphics have always played a prominent role in science, one type of plot has become increasingly
popular. In such a plot, regression coefficients or other statistics of interest are displayed as markers
accompanied by spikes indicating confidence intervals. See Kastellec and Leoni (2007) for some examples.
Creating such graphs in Stata is tedious (although see Newson, 2003). The coefficients and variances
have to be gathered from the e()-returns, confidence intervals have to be computed, and the results have
to be appropriately stored as variables in the data set. Then a suitable variable for the category axis
has to be generated and coefficient labels have to be defined. Finally, a complicated graph command
has to be issued to plot the coefficients and confidence intervals. With the introduction of marginsplot
([R] marginsplot) in Stata 12 this task has been greatly simplified. It is now possible to plot coefficients
and confidence intervals with just a few lines of code. For example, consider the following linear regression
model ([R] regress):
. sysuse auto, clear
(1978 Automobile Data)
. regress price mpg trunk length turn
      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    69) =    5.79
       Model |   159570047     4  39892511.8           Prob > F      =  0.0004
    Residual |   475495349    69  6891236.94           R-squared     =  0.2513
-------------+------------------------------           Adj R-squared =  0.2079
       Total |   635065396    73  8699525.97           Root MSE      =  2625.1
To plot the regression coefficients (which, in this case, are equal to the average marginal effects), we
could type:
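The marginsplot command itself did not survive in this copy of the text. The following is a sketch of what such a call might look like after the regression above; the particular marginsplot options (horizontal, xline(0), recast(scatter), yscale(reverse)) are our assumption, chosen to make the result resemble a coefficient plot, and are not necessarily the original command.

```stata
* Fit the model from above
sysuse auto, clear
quietly regress price mpg trunk length turn

* Average marginal effects of all regressors (equal to the slopes here)
margins, dydx(*)

* Horizontal layout with a zero reference line; recast(scatter) draws
* markers without connecting lines, yscale(reverse) flips the category axis
marginsplot, horizontal xline(0) recast(scatter) yscale(reverse)
```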
[Figure: marginsplot of the effects of mpg, trunk, length, and turn; x axis: Effects on Linear Prediction]
marginsplot is a very versatile command that can do much more than what is shown above, espe-
cially when plotting predictive margins, the area of application marginsplot was primarily designed for.
However, two main drawbacks prevent marginsplot from being easily employed as a general tool for
plotting coefficients or other estimation results. First, marginsplot can only process results left behind
by margins ([R] margins). Second, marginsplot can only deal with one set of results at a time (i.e.,
the results from a single call of margins).
I therefore wrote a command that can be applied to any estimation results and can combine results
from several estimation sets into one graph. The new coefplot command can be seen as a graphical
equivalent to popular tabulation programs such as outreg (Gallup, 2012) or estout (Jann, 2007).
To install the coefplot package, type
. ssc install coefplot, replace
2 Syntax
The basic syntax of coefplot is:
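coefplot subgraph [ || subgraph ... ] [ , globalopts ]
where subgraph is defined as
( plot [ plot ... ] [ , subgropts ] )
plot is defined as
( model [ \ model ... ] [ , plotopts ] )
and model is either
name [ , modelopts ]
where name is the name of a stored model (see [R] estimates; type . or leave blank to refer to the
active model), or
matrix(mspec) [ , modelopts ]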
to plot results from a matrix (see [P] matrix) where mspec may be:
name to use the first row of matrix name
name[#,.] to use row # of matrix name; may also type name[#,] or name[#]
name[.,#] to use column # of matrix name; may also type name[,#]
Parentheses around plot can be omitted if plot does not contain spaces.
coefplot has four levels of options: (1) modelopts are options that apply to a single model (or
matrix). They specify the information to be collected and displayed. (2) plotopts are options that apply
to a single plot, possibly containing results from multiple models. They affect the rendition of markers
and confidence intervals and provide a label for the plot. (3) subgropts are options that apply to a single
subgraph, possibly containing multiple plots. (4) globalopts are options that apply to the overall graph.
The option levels are nested in the sense that upper-level options include all lower-level options. That
is, globalopts includes subgropts, plotopts, and modelopts; subgropts includes plotopts and modelopts;
plotopts includes modelopts. If lower-level options are specified at an upper level, they serve as defaults
for all included lower-level elements. These defaults, however, are overridden by options specified at a
lower level.
The following sections give a brief overview of the available options. For a more detailed description,
see the online help.
eqlist = newname [ eqlist = newname ... ]
and eqlist is a space-separated list of equation names; type equation names as prefix* to replace a
prefix or *suffix to replace a suffix.
asequation(string) sets the equation for all coefficients to string.
eform[(coeflist)] plots exponentiated coefficients, where coeflist is as above for keep().
rescale(spec) rescales coefficients; spec is either # or
coeflist = # [ coeflist = # ... ]
where coeflist is as above for keep().
swapnames swaps coefficient names and equation names.
Confidence intervals
marker_options change the look of markers (color, size, etc.); see [G] marker_options.
mlabel adds coefficient values as marker labels.
marker_label_options change the look and position of marker labels; see [G] marker_label_options.
recast(plottype) plots the markers using plottype; supported plot types are scatter (default), line,
connected, area, bar, spike, dropline, and dot.
Rendition of confidence intervals
modelopts, plotopts, and subgropts are global model, plot, and subgraph options as described above.
horizontal places coefficient values on the X axis; this is the general default.
vertical places coefficient values on the Y axis; this is the default with at().
order(coeflist) orders coefficients, where coeflist is as above for keep(); may specify . instead of coef
to introduce gaps; not allowed with at().
relocate(spec) repositions coefficients, where spec is
[eq:]coef = # [ [eq:]coef = # ... ]
bycoefs arranges subgraphs by coefficients; not allowed with at().
norecycle increments plot styles across subgraphs.
grid(spec) determines where grid lines are placed; not allowed with at(); spec may be between (grid
lines between coefficients), within (grid lines within coefficients), or none (omit grid lines).
nooffsets suppresses automatic offsets of plot positions.
format(format) sets the display format for coefficient values; format may be a numeric format or a date
format as described in [D] format.
Labels
noeqlabels suppresses equation labels.
eqlabels(spec) specifies custom labels for equations; not allowed with at(); spec is
[ label [ label ... ] ] [ , [no]gap[(#)] asheadings offset(#) truncate(#) wrap(#) nobreak suboptions ]
gap() specifies the gap between equations; default is gap(1). asheadings treats equation labels
as headings; see headings(). offset(#) offsets the labels by # (only allowed with asheadings).
truncate(), wrap(), and nobreak are as described above under coeflabels(). suboptions are axis
label suboptions as described in [G] axis_label_options.
eqstrict specifies to be strict about equations.
headings(spec) adds headings between coefficients; not allowed with at(); spec is
coeflist = label [ coeflist = label ... ] [ , [no]gap[(#)] offset(#) truncate(#) wrap(#) nobreak suboptions ]
where coeflist is as above for keep(). gap() specifies the gap before headings; default is gap(1).
offset(#) offsets the headings by #. truncate(), wrap(), and nobreak are as described above
under coeflabels(). suboptions are axis label suboptions as described in [G] axis_label_options.
groups(spec) adds labels for groups of coefficients; not allowed with at(); spec is
coeflist = label [ coeflist = label ... ] [ , [no]gap[(#)] truncate(#) wrap(#) nobreak suboptions ]
where coeflist is as above for keep(). gap() specifies the size of the gaps before and after the groups;
default is gap(1). truncate(), wrap(), and nobreak are as described above under coeflabels().
suboptions are axis label suboptions as described in [G] axis_label_options.
plotlabels(spec) specifies labels for the plots to be used in the legend; spec is
[ label [ label ... ] ] [ , truncate(#) wrap(#) nobreak ]
where truncate(), wrap(), and nobreak are as described above under coeflabels().
bylabels(spec) specifies labels for the subgraphs; spec is
[ label [ label ... ] ] [ , truncate(#) wrap(#) nobreak ]
where truncate(), wrap(), and nobreak are as described above under coeflabels().
Save results
generate[(prefix)] generates variables containing the graph data; default prefix is coefplot_.
replace overwrites existing variables.
Add plots
twoway_options are twoway options, other than by(), as documented in [G] twoway_options.
byopts(byopts) determines how subgraphs are combined; byopts are as described in [G] by_option.
3 Examples
3.1 Plotting a single model
The syntax to produce a plot of the coefficients of a single model is
coefplot [ name ] [ , options ]
where name is the name of a stored model (see [R] estimates), or . or empty string for the active
model.
For example, to plot coefficients and 95% confidence intervals for the most recent model, type:
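The command is missing from this copy. Because coefplot uses the active estimation results when no model name is given, it is presumably just coefplot typed after the price regression from section 1; a minimal sketch:

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn

* No model name given: coefplot plots the active estimation results
coefplot
```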
[Figure: coefficient plot of the price regression, including Length (in.) and _cons]
We may not be interested in the constant, so we can add drop(_cons) to remove it. Furthermore,
xline(0) will add a reference line at zero so we can better see which coefficients are significantly different
from zero:
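A sketch of the corresponding command, again based on the price regression from section 1 and using only the two options named in the text:

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn

* Remove the constant and draw a vertical reference line at zero
coefplot, drop(_cons) xline(0)
```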
[Figure: the same coefficient plot without _cons and with a reference line at zero]
coefplot can graph results from almost any estimation command. For example, to plot coefficients
from a logit model ([R] logit), type:
. sysuse auto, clear
(1978 Automobile Data)
. logit foreign mpg trunk length turn
(output omitted)
. coefplot, drop(_cons) xline(0)
> xtitle(Log odds)
[Figure: logit coefficients for Mileage (mpg), Trunk space (cu. ft.), Length (in.), and Turn Circle (ft.); x axis: Log odds]
With logit models one is often interested in odds ratios instead of the raw coefficients. To plot
odds ratios instead of log odds, use the eform option, which causes coefplot to exponentiate the
coefficients and the confidence limits:
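The odds-ratio command is not reproduced in this copy. A plausible sketch: since exp(0) = 1, the reference line moves from 0 to 1 (our choice), and the axis title matches the Odds ratio label of the figure.

```stata
sysuse auto, clear
quietly logit foreign mpg trunk length turn

* eform exponentiates coefficients and confidence limits;
* "no effect" then corresponds to an odds ratio of 1
coefplot, drop(_cons) eform xline(1) xtitle(Odds ratio)
```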
[Figure: odds ratios for the logit model; x axis: Odds ratio]
Furthermore, if you want to plot average marginal effects instead of log odds or odds ratios, you can
apply margins (see [R] margins):
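A sketch of this two-step approach for the logit model above; the xtitle() text is our own addition:

```stata
sysuse auto, clear
quietly logit foreign mpg trunk length turn

* post replaces e() with the margins results so that coefplot can find them
margins, dydx(*) post

* coefplot now reads the average marginal effects from e()
coefplot, xline(0) xtitle(Average marginal effects)
```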
It is essential to specify the post option with margins so that it posts its results in e(), from where
coefplot collects the results to be displayed. If you do not specify the post option, margins leaves
e() unchanged and coefplot uses the raw coefficients from the logit model that still reside in e().
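3.2 Plotting multiple models
3.2.1 Models as plots
To plot several models, specify one plot per model, each consisting of name optionally followed by plotopts. Here, name is again the name of a stored model, or . or empty string for the active model. plotopts are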
options that apply to a single plot. They specify the information to be collected, affect the rendition of
the plot, and provide a label for the plot in the legend. globalopts are options that apply to the overall
graph, such as titles or axis labels, but may also contain any options allowed as plot options to provide
defaults for the single plots.
A basic example is as follows:
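The commands that created the stored models D and F are not shown in this copy. Judging from the later labels Domestic Cars and Foreign Cars, a plausible setup (an assumption on our part) fits the price regression separately for domestic and foreign cars:

```stata
sysuse auto, clear
* Hypothetical setup: separate price regressions by car origin
quietly regress price mpg trunk length turn if foreign==0
estimates store D
quietly regress price mpg trunk length turn if foreign==1
estimates store F

* Each stored model becomes a separate plot
coefplot D F, drop(_cons) xline(0)
```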
[Figure: coefficients of models D and F; legend: D, F]
To specify separate options for an individual model, enclose the model and its options in parentheses.
For example, to add a label for each plot in the legend, to use alternative plot styles, and to change the
marker symbol, you could type:
. coefplot (D, label(Domestic Cars)
> pstyle(p3))
> (F, label(Foreign Cars)
> pstyle(p4))
> , msymbol(S) drop(_cons) xline(0)
[Figure: coefficients for Mileage (mpg), Trunk space (cu. ft.), Length (in.), and further regressors; legend: Domestic Cars, Foreign Cars]
I specified msymbol() as a global option so that the same symbol is used in both plots. To use
different symbols, include an individual msymbol() option for each plot.
coefplot offsets the plot positions of the coefficients so that the confidence spikes do not overlap. To
deactivate the automatic offsets, you can specify global option nooffsets. Alternatively, custom offsets
may be specified by the offset() option (if offset() is specified for at least one model, automatic
offsets are disabled). The spacing between coefficients is one unit, so offsets between -0.5 and
0.5 usually make sense. For example, if you want to use smaller offsets than the default, you could type:
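A sketch of custom offsets. The models D and F are assumed to be the domestic and foreign price regressions, and the offset of 0.15 units is an arbitrary illustrative value:

```stata
sysuse auto, clear
* Hypothetical D and F models, as in the basic example
quietly regress price mpg trunk length turn if foreign==0
estimates store D
quietly regress price mpg trunk length turn if foreign==1
estimates store F

* Shift the two plots by 0.15 units in opposite directions;
* specifying offset() disables the automatic offsets
coefplot (D, offset(0.15)) (F, offset(-0.15)), drop(_cons) xline(0)
```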
[Figure: coefficients of D and F with custom offsets]
If the dependent variables of the models you want to include in the graph have different scales, it
can be useful to employ the axis() plot option to assign specific axes to the models. For example, to
include a regression on price and a regression on weight in the same graph, type:
. sysuse auto, clear
(1978 Automobile Data)
. regress price mpg trunk length turn
(output omitted)
. estimates store Price
. regress weight mpg trunk length turn
(output omitted)
. estimates store Weight
. coefplot Price (Weight, axis(2)),
> drop(_cons)
> xtitle(Price) xtitle(Weight, axis(2))
[Figure: coefficients for Mileage (mpg), Trunk space (cu. ft.), Length (in.), and Turn Circle (ft.) with two x axes titled Price and Weight; legend: Price, Weight]
3.2.2 Subgraphs
Subgraphs are separated by ||. Each subgraph consists of a plotlist, that is, a list of plots as in section 3.2.1, optionally followed by subgropts, which are options that apply to a single subgraph.
An example with one model per subgraph is:
. regress weight mpg trunk length turn
> if foreign==0
(output omitted)
. estimates store wD
. regress weight mpg trunk length turn
> if foreign==1
(output omitted)
. estimates store wF
. coefplot
> (D, label(Domestic))
> (F, label(Foreign)), bylabel(Price)
> || wD wF, bylabel(Weight)
> ||, drop(_cons) xline(0)
> byopts(xrescale)
[Figure: two subgraphs, Price and Weight, each showing Mileage (mpg), Trunk space (cu. ft.), Length (in.), and Turn Circle (ft.); legend: Domestic, Foreign]
Option byopts(xrescale) was specified so that each subgraph can have its own scale.
In the example above, plot labels for the legend were set within the first subgraph. They could also
have been specified within the second subgraph, as plot styles are recycled with each new subgraph
and plot options are collected across subgraphs. To prevent recycling of plot styles, add the norecycle
option:
. coefplot D F, bylabel(A)
> || wD wF, bylabel(B)
> ||, drop(_cons) xline(0)
> norecycle byopts(xrescale)
> legend(rows(1))
[Figure: two subgraphs, A and B, showing Mileage (mpg), Trunk space (cu. ft.), Length (in.), and further regressors; legend: D, F, wD, wF]
Furthermore, to leave a plot position empty in one of the subgraphs, you can specify _skip in place
of a plot:
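A sketch of _skip, assuming that D and F (price) and wD and wF (weight) are the domestic and foreign regressions from above. The /// line continuation requires running the code as a do-file:

```stata
sysuse auto, clear
* Hypothetical models: price and weight regressions by car origin
quietly regress price mpg trunk length turn if foreign==0
estimates store D
quietly regress price mpg trunk length turn if foreign==1
estimates store F
quietly regress weight mpg trunk length turn if foreign==0
estimates store wD
quietly regress weight mpg trunk length turn if foreign==1
estimates store wF

* _skip leaves the first plot position of the Weight subgraph empty
coefplot (D, label(Domestic)) (F, label(Foreign)), bylabel(Price) ///
    || _skip wF, bylabel(Weight)                                  ///
    ||, drop(_cons) xline(0) byopts(xrescale)
```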
[Figure: subgraphs as above, with one plot position left empty; legend: Domestic, Foreign]
3.2.3 Appending models
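Several models can be appended into a single plot by separating them with \ inside the parentheses of the plot. Here, name is again the name of a stored model, or . or empty string for the active model, and modelopts
are options that apply to a single model.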
For example, if you want to draw a graph comparing bivariate and multivariate effects, you could
type:
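The models for this comparison are not shown in this copy. A sketch under the assumption that the bivariate models are separate regressions of price on each regressor; the stored-model names are our own choice, and the code uses /// continuations, so run it as a do-file:

```stata
sysuse auto, clear
* Hypothetical bivariate models, one per regressor
foreach v in mpg trunk length turn {
    quietly regress price `v'
    estimates store b_`v'
}
* Multivariate model
quietly regress price mpg trunk length turn
estimates store multivariate

* Models separated by \ are appended into a single plot
coefplot (b_mpg \ b_trunk \ b_length \ b_turn, label(bivariate)) ///
    (multivariate), drop(_cons) xline(0)
```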
[Figure: bivariate and multivariate effects; legend: bivariate, multivariate]
3.2.4 How coefficients and equations are matched
The default for coefplot is to use the first (nonzero) equation from each model and match coefficients
across models by their names (ignoring equation names). For example, regress returns one (unnamed)
equation containing the regression coefficients whereas tobit ([R] tobit) returns two equations, equation
“model” containing the regression coefficients and equation “sigma” containing the standard error of the
regression. Hence, the default for coefplot is to match the regression coefficients from the two models
and ignore equation “sigma” from the Tobit model:
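The stored models regress and tobit are not defined in this copy of the text. A sketch of one way to create them; the censoring point ll(4000) is purely hypothetical, and the name of tobit's ancillary equation (sigma in the text) may differ across Stata versions:

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn
estimates store regress
* Hypothetical censoring: prices at or below 4000 treated as left-censored
quietly tobit price mpg trunk length turn, ll(4000)
estimates store tobit

* By default, only the first equation of each model is collected and
* coefficients are matched by name across models
coefplot regress tobit, xline(0)
```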
Even though the collected results from regress and tobit have different equation names (“_” and
“model”, respectively), coefplot matches their coefficients, that is, the equation names are ignored.
This is the default if only one equation per model is collected. If you want to take equation names into
account nonetheless, you can specify the eqstrict option:
. coefplot regress tobit, xline(0)
> eqstrict
[Figure: with eqstrict, the regress coefficients appear under equation _ and the tobit coefficients under equation model (Mileage (mpg), Trunk space (cu. ft.), Length (in.), Turn Circle (ft.), _cons); legend: regress, tobit]
Although eqstrict causes equation names to be relevant, the second equation from the tobit model
is still ignored. To include all equations, type:
[Figure: all equations included, with equation labels such as model and sigma; legend: regress, tobit]
Furthermore, to match the coefficients from regress with the first equation from tobit and also
print the second equation from tobit, you can use asequation() to set the equation name of regress
to “model”:
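A sketch of the asequation() variant, repeating the hypothetical setup from above. It assumes, as stated in the text, that the first tobit equation is named model; in your Stata version, matrix list e(b) after tobit shows the actual equation names.

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn
estimates store regress
* Hypothetical censoring point
quietly tobit price mpg trunk length turn, ll(4000)
estimates store tobit

* Give all regress coefficients the equation name model so that they
* match tobit's first equation; keep(*:) also collects the second equation
coefplot (regress, asequation(model)) tobit, keep(*:) xline(0)
```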
[Figure: regress coefficients matched with tobit equation model, followed by tobit equation sigma; legend: regress, tobit]
Alternatively, you could also use eqrename(_ = model) to rename equation “_” to “model” or
eqrename(model = _) to rename equation “model” to “_”.
Another application of the asequation() option is when you want to assign equations to results from
margins. The following example shows how to plot log odds of a multinomial logit ([R] mlogit) along
with average marginal effects:
. label variable mpp "Miles per pint"
. forvalues i = 3/5 {
      quietly margins, dydx(*) predict(outcome(`i')) post
      estimates store ame`i'
      quietly estimates restore mlogit
  }
. coefplot mlogit, keep(*:) drop(_cons)
[Figure: log odds and average marginal effects for Miles per pint and Car type by outcome (3, 4, 5)]
Finally, if you want to match coefficients that have different names in the input models, you can apply
the rename() option. Here is an example that illustrates the effect of measurement error in regression
models:
. drop _all
. matrix C = ( 1, .5, 0 \ .5, 1, .3
> \ 0, .3, 1 )
. drawnorm x1 x2 x3, n(10000) corr(C)
(obs 10000)
. generate y = 1 + x1 + x2 + x3 +
> 5 * invnorm(uniform())
. regress y x1 x2 x3
(output omitted)
. estimates store m1
. generate x1err = x1 +
> 2 * invnorm(uniform())
. regress y x1err x2 x3
(output omitted)
. estimates store m2
. coefplot (m1, label(Without error))
> (m2, label(With error)), xline(1) rename(x1err = x1)
[Figure: coefficients of x1, x2, x3, and _cons; legend: Without error, With error]
We can see how measurement error on x1 distorts all slope coefficients in the model, even the coefficient of
x3, which is uncorrelated with x1 (the distortion is transmitted through the correlation between x2 and x3).
3.2.5 How coefficients are ordered
In general, coefficients are plotted in the same order (from top to bottom) as they appear in the input
models. However, coefficients appearing only in later models are placed after coefficients from earlier
models (with the exception of _cons, which is always placed last). Have a look at the following example:
. sysuse auto, clear
(1978 Automobile Data)
. label variable mpg "1. mpg"
. estimate store m3
. coefplot m1 || m2 || m3, xline(0) drop(_cons) byopts(row(1))
[Figure: three subgraphs, m1, m2, and m3, with coefficient labels such as 1. mpg]
Even though in the full model (m3) trunk comes before length, the order of the two coefficients is
reversed in the plot. This is because length but not trunk is part of the first model. That is, because
trunk only appears in the later models, it is placed after length that appears already in the first model.
To establish an order as in model m3, you can use the order() option:
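The commands that created m1 and m2 are not shown in this copy. A hypothetical reconstruction consistent with the description (length, but not trunk, in the first model), followed by the order() option; run it as a do-file because of the /// continuation:

```stata
sysuse auto, clear
* Hypothetical nested models; trunk enters only in m2 and m3
quietly regress price mpg length
estimates store m1
quietly regress price mpg trunk length
estimates store m2
quietly regress price mpg trunk length turn
estimates store m3

* order() imposes the ordering of the full model m3 in all subgraphs
coefplot m1 || m2 || m3, xline(0) drop(_cons) byopts(row(1)) ///
    order(mpg trunk length turn)
```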
[Figure: coefficients ordered 1. mpg, 2. trunk, 3. length, 4. turn in all three subgraphs]
Within order(), you can use the * (any string) and ? (any single character) wildcards. Furthermore,
you can type . to insert gaps (but also see the section on headings and groups below). Example:
(output omitted)
. coefplot, xline(0) keep(*:)
> order(foreign gear* head*)
[Figure: coefficients of equations 4 and 5 ordered as Car type, Gear Ratio, Headroom (in.), _cons]
To reorder equations, to apply different orderings within equations, or to break equations apart,
specify equation names within order(), as in the following examples:
[Figure: equations reordered so that equation 5 (Car type, Gear Ratio, Headroom (in.), _cons) precedes equation 4]
. coefplot, xline(0) keep(*:)
> order(5:foreign 4:foreign gear*
> 5:_cons *)
[Figure: Car type from equations 5 and 4 placed first, followed by Gear Ratio, _cons from equation 5, and the remaining coefficients]
In the second example, headroom and _cons from equation 4 are placed last because they are re-
maining coefficients that have not been listed in order().
Note that equation names have to be specified either for all elements in order() or for none. Hence,
for example, typing order(foreign 4:_cons) would be invalid.
By default, coefficients (and gaps, if specified) are placed at integer values on the categorical axis
(starting with 1 from top to bottom). If you want to place coefficients at nonstandard values, you can
apply the relocate() option. relocate() is an end-of-pipe option, that is, after the categorical axis
has been set up in the usual way, relocate() moves the specified coefficients, leaving their original
positions empty. Here is an example:
To illustrate the effect of relocate(), the original plot positions have been marked with labels “1”,
“2”, “3”, and “4” in the example.
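The relocate() example is not reproduced in this copy. A sketch using the price regression, with hypothetical target positions and the [eq:]coef = # syntax from section 2:

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn

* Default positions: mpg = 1, trunk = 2, length = 3, turn = 4 (_cons dropped).
* Move trunk to 2.5 and turn to 5; positions 2 and 4 are left empty.
coefplot, drop(_cons) xline(0) relocate(trunk = 2.5 turn = 5)
```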
3.3 Labeling the categorical axis
. sysuse auto, clear
(1978 Automobile Data)
. keep if rep78>=3
(10 observations deleted)
. regress mpg headroom i.rep##i.foreign
(output omitted)
. coefplot, xline(0)
[Figure: coefficients labeled Headroom (in.), Repair Record 1978=4, Repair Record 1978=5, Foreign, the interaction terms, and _cons]
To use coefficient names instead of variable labels, specify the nolabels option:
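A sketch, using the mpg regression from the example above:

```stata
sysuse auto, clear
keep if rep78>=3
quietly regress mpg headroom i.rep78##i.foreign

* nolabels shows coefficient names (e.g., 4.rep78#1.foreign)
* instead of variable and value labels
coefplot, xline(0) nolabels
```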
[Figure: the same plot labeled by coefficient names: headroom, 4.rep78, 5.rep78, 1.foreign, 4.rep78#1.foreign, 5.rep78#1.foreign, _cons]
3.3.1 Custom coefficient labels
An easy way to provide labels for the coefficients is to define appropriate variable and value labels
before applying coefplot (see [R] label). However, not all coefficients have corresponding variables
(e.g., _cons). To provide labels for such coefficients or to assign custom labels to coefficients without
manipulating variable labels, use the coeflabels() option:
. sysuse auto, clear
(1978 Automobile Data)
. keep if rep78>=3
(10 observations deleted)
. regress mpg headroom i.rep##i.foreign
(output omitted)
. coefplot, xline(0)
> coeflabel(1.foreign = "Foreign Car"
> _cons = "Constant")
[Figure: the same plot with the custom labels "Foreign Car" and "Constant"; other labels, such as "Repair Record 1978=4 # Foreign", are generated automatically]
coeflabels() has a wrap() and a truncate() suboption to deal with long labels. These sub-
options apply to all coefficient labels, whether they are automatically generated or provided within
coeflabels(). For example, to limit the line width to 20 characters and wrap long labels to multiple
lines, type:
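A sketch, assuming (as the text implies) that the suboptions of coeflabels() may be given without any custom labels:

```stata
sysuse auto, clear
keep if rep78>=3
quietly regress mpg headroom i.rep78##i.foreign

* Empty list of labels; wrap(20) applies to all coefficient labels
coefplot, xline(0) coeflabels(, wrap(20))
```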
[Figure: coefficient labels wrapped at 20 characters]
Multiline labels can also be created explicitly using compound double quotes within coeflabels().
Such labels will not be altered by wrap() or truncate():
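A sketch with a two-line label for 1.foreign; the label text itself is hypothetical. Run it as a do-file because of the /// continuation:

```stata
sysuse auto, clear
keep if rep78>=3
quietly regress mpg headroom i.rep78##i.foreign

* Compound double quotes enclose several quoted strings,
* each of which becomes one line of the label
coefplot, xline(0) ///
    coeflabels(1.foreign = `""Foreign" "Car""' _cons = "Constant")
```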
[Figure: coefficient plot with multiline labels]
3.3.2 Headings and groups
Sometimes it is useful to add headings between coefficients to better arrange a graph. This can be
achieved by the headings() option:
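The headings() example itself is not reproduced in this copy. A sketch using the mpg regression from above, with hypothetical heading texts; the options omitted and baselevels, discussed in the next paragraph, are included so that the base categories under each heading are shown:

```stata
sysuse auto, clear
keep if rep78>=3
quietly regress mpg headroom i.rep78##i.foreign

* A heading is placed before the first coefficient listed in each
* element; {bf:...} renders the heading text in bold
coefplot, xline(0) omitted baselevels drop(_cons) ///
    headings(3.rep78 = "{bf:Repair record}" 0.foreign = "{bf:Car type}")
```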
In this example, omit requests to plot omitted coefficients and baselevels requests to plot base level
coefficients. Omitted coefficients and base level coefficients are always equal to zero, but in some cases
it can be helpful to include them in a graph for reasons of clarity. The {bf} tag changes text to bold;
see [G] text for details on text in graphs.
Instead of adding headings you can also define groups of coefficients and add group labels using the
groups() option:
. coefplot, xline(0) omitted base
> headings(3.rep78 = "{it:Repair record:}"
> 0.foreign = "{it:Car type:}", nogap)
> groups(headroom 1.foreign =
> "{bf:Main Effects}"
> ?.rep78#?.foreign =
> "{bf:Interaction Effects}")
> drop(_cons)
[Figure: group "Main Effects" containing Headroom (in.), Repair Record 1978=3/4/5 under the heading "Repair record:", and Domestic and Foreign under the heading "Car type:"; group "Interaction Effects" containing Repair Record 1978=3 # Foreign through Repair Record 1978=5 # Foreign]
In this example, the nogap suboption was specified within headings() to prevent adding extra space
before the headings.
3.3.3 Equation labels
Equation labels provide yet another layer of labels. The default is to place the equation labels on the
right-hand side, similar to group labels:
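The model behind this and the next figure is not shown in this copy. As a stand-in, the sketch below uses a hypothetical multinomial logit of the repair record with three outcomes, so that there are three equations to label; run it as a do-file because of the /// continuation:

```stata
sysuse auto, clear
* Hypothetical multi-equation model (outcomes 3, 4, 5 of rep78)
quietly mlogit rep78 mpg i.foreign if rep78>=3

* keep(*:) collects all equations; omitted also shows the zero
* coefficients of the base outcome
coefplot, omitted keep(*:) xline(0) ///
    eqlabels("Equation 1" "Equation 2" "Equation 3")
```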
[Figure: equation labels such as "Equation 3" displayed beside the coefficient labels]
However, you can also set the equation labels as headings between equations using the asheadings
suboption:
. coefplot, omitted keep(*:)
> coeflabels(mpg = "Mileage")
> eqlabels("{bf:Equation 1}"
> "{bf:Equation 2}"
> "{bf:Equation 3}", asheadings)
[Figure: equation labels Equation 1, Equation 2, and Equation 3 shown as headings above their coefficients]
3.3.4 Labels on opposite side
The default is to plot all labels on the left of the plot region. Use option yscale(alt) to move labels to
the right (see [G] twoway_options):
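A minimal sketch using the price regression; yscale(alt) is a standard twoway axis option:

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn

* Move the coefficient labels (the categorical Y axis) to the right
coefplot, drop(_cons) xline(0) yscale(alt)
```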
Group labels and equation labels are rendered as additional axes (axis 2 for group labels; axis 2 or
3 for equation labels, depending on whether groups were specified), so you have to employ the axis()
suboption to move these:
. coefplot, xline(0) omitted base
> groups(?.rep78 =
> `""{bf:Repair}" "{bf:Record}""'
> ?.foreign = "{bf:Car Type}"
> ?.rep78#?.foreign =
> "{bf:Interaction Effects}",
> angle(rvertical))
> drop(_cons) yscale(alt axis(2))
[Figure: group labels "Repair Record" (on two lines), "Car Type", and "Interaction Effects" on the opposite side from the coefficient labels, which run from Headroom (in.) to Repair Record 1978=5 # Foreign]
Moving group labels to the right can also be useful if you want to add an extra set of individual
coefficient labels, without actually forming groups. Here is an example in which groups() is used to add
information on the sample sizes of factor levels:
(output omitted)
. coefplot, xline(0) omitted baselevels
> groups(3.rep78 = "N = 30"
> 4.rep78 = "N = 18"
> 5.rep78 = "N = 11"
> 0.foreign = "N = 38"
> 1.foreign = "N = 21"
> , nogap angle(horizontal))
> drop(_cons) yscale(alt axis(2))
[Figure: sample sizes displayed next to the factor levels, e.g., "N = 11" for Repair Record 1978=5, "N = 38" for Domestic, and "N = 21" for Foreign]
3.4 Confidence intervals
. sysuse auto, clear
(1978 Automobile Data)
. regress price mpg trunk length turn
(output omitted)
. coefplot, drop(_cons) xline(0)
> msymbol(s) mfcolor(white)
> levels(99.9 99 95)
> legend(order(1 "99.9" 2 "99" 3 "95")
> row(1))
[Figure: 99.9%, 99%, and 95% confidence intervals for Mileage (mpg), Trunk space (cu. ft.), Length (in.), and further regressors; legend: 99.9, 99, 95]
Line widths are (logarithmically) increased across the confidence intervals. To use different line widths
specify the lwidth() suboption within ciopts():
[Figure: 99.9%, 99%, and 95% confidence intervals drawn with custom line widths; legend: 99.9, 99, 95]
To compute confidence intervals, coefplot collects the variances of the coefficients from the diag-
onal of e(V) and then, depending on whether degrees of freedom are available in scalar e(df_r) (or,
for estimates from [MI] mi, in matrix e(df_mi)), applies the standard formulas for confidence inter-
vals based on the t-distribution or the normal distribution, respectively. If e(V) is not available, then
coefplot looks for standard errors in vector e(se) and uses these for confidence interval computation.
If a model does not provide degrees of freedom but you want to compute confidence intervals based on
the t-distribution, you can provide the degrees of freedom through option df() (see the online help). If
variances are stored in a matrix other than e(V), use the v() option to provide the appropriate matrix
name or, alternatively, use option se() to provide custom standard errors (in which case variances from
e(V) will be ignored). Likewise, if your estimation command provides precomputed confidence intervals,
use the ci() option to include them in the plot. For example, to plot the normal-approximation, per-
centile, and bias-corrected confidence intervals that are provided in e(ci_normal), e(ci_percentile),
and e(ci_bc) by the bootstrap method, you could type:
. regress price mpg trunk length turn,
> vce(bootstrap)
(output omitted)
. coefplot
> (, ci(ci_normal) label(normal))
> (, ci(ci_percentile) label(percent))
> (, ci(ci_bc) label(bc))
> , drop(_cons) xline(0) legend(row(1))
[Figure: normal-approximation, percentile, and bias-corrected bootstrap confidence intervals for Mileage (mpg), Trunk space (cu. ft.), Length (in.), and further regressors; legend: normal, percent, bc]
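To make the default confidence interval computation described above concrete, the following sketch recomputes by hand the 95% interval for mpg in the price regression: the standard error comes from the diagonal of e(V), and because regress stores e(df_r), the t rather than the normal critical value is used. This mirrors the description in the text; it is not coefplot's internal code.

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn

* Standard error of mpg from the diagonal of e(V)
matrix V = e(V)
local j = colnumb(V, "mpg")
scalar se_mpg = sqrt(el(V, `j', `j'))

* e(df_r) is available, so use the t distribution
* (without degrees of freedom, invnormal(0.975) would be used instead)
scalar crit = invttail(e(df_r), 0.025)
scalar lo_mpg = _b[mpg] - scalar(crit)*scalar(se_mpg)
scalar hi_mpg = _b[mpg] + scalar(crit)*scalar(se_mpg)
display "95% CI for mpg: [" scalar(lo_mpg) ", " scalar(hi_mpg) "]"
```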
In addition to levels() and ci(), you can also use option cismooth to add smoothed confidence
intervals.1 By default, cismooth generates confidence intervals for 50 equally spaced levels (1, 3, ...,
99) with graduated color intensities and varying line widths, as illustrated in the following example:
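A sketch with the default cismooth settings, using the price regression:

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn

* Smoothed confidence intervals with default levels and appearance
coefplot, drop(_cons) xline(0) cismooth
```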
Length (in.)
The smoothed confidence intervals are produced independently from levels() and ci() and are not
affected by ciopts(). Their appearance, however, can be set by a number of suboptions (see the online
help). If cismooth is specified together with levels() or ci(), then the smoothed confidence intervals
are placed behind the confidence intervals from levels() or ci().
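A minimal sketch, assuming the same price regression as in the other examples (the name base is hypothetical). The numlist lines merely confirm that the default levels 1, 3, ..., 99 amount to 50 intervals; the second coefplot command combines cismooth with a regular 95% interval, which is then drawn in front of the smoothed ones.

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn
estimates store base
* default grid of levels used by cismooth: 1, 3, ..., 99
numlist "1(2)99"
local nlevels : word count `r(numlist)'
display "number of smoothed levels: `nlevels'"
* smoothed confidence intervals only
coefplot base, drop(_cons) xline(0) cismooth
* smoothed intervals placed behind a regular 95% interval
coefplot base, drop(_cons) xline(0) cismooth levels(95)
```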
¹ The cismooth option has been inspired by code from David B. Sparks to produce smoothed confidence interval plots
in R (see http://dsparks.wordpress.com/2011/02/21/choropleth-tutorial-and-regression-coefficient-plots/).
By default, coefplot produces a horizontal graph with labels on the Y axis and values on the X axis.
To flip the axes, specify the vertical option:
. sysuse auto, clear
(1978 Automobile Data)
. regress price mpg trunk length turn
(output omitted)
. coefplot, drop(_cons) vertical yline(0)
When changing from horizontal to vertical mode, options referring to specific axes need to be adjusted.
This is why yline(0) was used in the example instead of xline(0) to draw the zero line.
To change the plot types used for coefficient markers and confidence intervals, you can use the recast()
option. Available plot types for markers are standard twoway plots such as scatter (the default), line,
dot, or bar. For confidence intervals use range plots such as rspike (the default), rline, rcap, or rbar.
Capped spikes for confidence intervals. For example, to display confidence intervals using capped
spikes, specify recast(rcap) within ciopts().
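A minimal sketch of the full command, assuming the price regression on the auto data used above (the stored-estimates name base is hypothetical):

```stata
sysuse auto, clear
quietly regress price mpg trunk length turn
estimates store base
* draw the confidence intervals as capped spikes instead of plain spikes
coefplot base, drop(_cons) xline(0) ciopts(recast(rcap))
```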
Bar charts of proportions. Furthermore, a bar chart of proportions with capped confidence spikes
can be produced as follows:
. sysuse auto, clear
(1978 Automobile Data)
. proportion rep if foreign==0
(output omitted)
. estimates store domestic
. proportion rep if foreign==1
(output omitted)
. estimates store foreign
. coefplot domestic foreign,
> vertical recast(bar)
> barwidth(0.25) fcolor(*.5)
> ciopts(recast(rcap)) citop
> xtitle(Repair Record 1978)
> ytitle(Proportion)
In this example the citop option was used to prevent the lower limits of the confidence intervals from
being hidden behind the bars.
Bars and lines. Different plot types can be mixed, as the following example illustrates:
. proportion rep
(output omitted)
. estimates store total
. coefplot
> (domestic, offset(-.15) recast(bar)
> lpattern(dash)))
> , xtitle(Repair Record 1978)
> ytitle(Proportion) vertical
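One way to write such a mixed-type command in full is sketched below: bars for the domestic and foreign proportions, shifted left and right with offset(), and a dashed line for the total. The offsets, bar widths, and colors are illustrative assumptions, not necessarily the settings behind the original figure.

```stata
sysuse auto, clear
quietly proportion rep78 if foreign==0
estimates store domestic
quietly proportion rep78 if foreign==1
estimates store foreign
quietly proportion rep78
estimates store total
* bars for the two subgroups (shifted left and right), dashed line for the total
coefplot (domestic, offset(-.15) recast(bar) barwidth(0.3) fcolor(*.5))  ///
         (foreign,  offset(.15)  recast(bar) barwidth(0.3) fcolor(*.5))  ///
         (total, recast(line) lpattern(dash)),                           ///
         ciopts(recast(rcap)) citop vertical                             ///
         xtitle(Repair Record 1978) ytitle(Proportion)
```

The /// continuation comments work in do-files; when typing interactively, enter the command on one line.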
To add the values of the coefficients as marker labels, use the mlabel option, possibly together with
format() to set the display format:
. sysuse auto, clear
(1978 Automobile Data)
. keep if rep78>=3
(10 observations deleted)
. regress mpg headroom i.rep##i.foreign
(output omitted)
. coefplot, xline(0) mlabel format(%9.2g)
Stata graphs do not support background colors for marker labels, which makes labels unreadable if
you place them on top of the markers using mlabposition(0). There is, however, a workaround. The
trick is to add a second “confidence interval” that is drawn as a bar of fixed width (the dot in the
suboptions within ciopts() specifies the “default” style; see [G] stylelists).
A drawback of this approach is that the box hides the exact location of the coefficient in the graph.
One remedy is to add a vertical spike that marks the point estimates.
Bars of close-to-zero width can be used to produce such vertical spikes. Zero-width bars would be
invisible, but by adding tiny offsets of ±10⁻⁹ the bars become visible.
The plot might still not be optimal since, for the first coefficient, the confidence interval is hidden
behind the marker label box. Plotting the confidence intervals as bars can solve this problem (“..” in
recast() specifies to repeat style rbar until the end; see [G] stylelists).
In some situations it is sensible to have a separate subgraph for each coefficient. This can be achieved
by the bycoefs option. Technically, bycoefs flips coefficients and subgraphs, that is, the coefficients
are treated as “subgraphs” and what was specified as subgraphs is treated as “coefficients”. This may
sound confusing but should become clear in the following example:
. sysuse auto, clear
(1978 Automobile Data)
. forv i = 3/5 {
  2. quietly regress price mpg
> headroom weight turn
> if rep78==`i'
  3. estimates store rep78_`i'
  4. }
. coefplot rep78_3 || rep78_4 || rep78_5,
> drop(_cons) xline(0)
> bycoefs byopts(xrescale)
In the example, option byopts(xrescale) was specified so that each coefficient can have its own
scale. As some people prefer vertical mode for such a graph, you might want to specify the vertical
option:
. coefplot rep78_3 || rep78_4 || rep78_5,
> drop(_cons) yline(0) vertical
> bycoefs byopts(yrescale)
If bycoefs is specified, options relocate(), headings(), and groups() apply to the elements on the
categorical axis (instead of to the coefficients). To address the elements, use integer numbers (1, 2, 3,
etc.), as in the following example:
. sysuse auto, clear
(1978 Automobile Data)
. regress price mpg headroom weight turn
(output omitted)
. estimates store Total
. regress price mpg headroom weight turn
> if foreign==0
(output omitted)
. estimates store Domestic
. regress price mpg headroom weight turn
> if foreign==1
(output omitted)
. estimates store Foreign
. coefplot Domestic || Foreign || Total,
> drop(_cons) yline(0) vertical
> bycoefs byopts(yrescale)
> group(1 2 = "Subgroup results", nogap) ylabel(0, add)
Option ylabel(0, add) has been added to ensure that zero is included in each subgraph.
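Following the same logic, headings() can address the elements of the categorical axis by their position when bycoefs is specified. The sketch below re-creates the three models from above and adds headings in horizontal mode; the heading texts are illustrative assumptions.

```stata
sysuse auto, clear
quietly regress price mpg headroom weight turn
estimates store Total
quietly regress price mpg headroom weight turn if foreign==0
estimates store Domestic
quietly regress price mpg headroom weight turn if foreign==1
estimates store Foreign
* with bycoefs, 1, 2, 3 refer to Domestic, Foreign, Total on the categorical axis
coefplot Domestic || Foreign || Total, drop(_cons) xline(0) ///
    bycoefs byopts(xrescale)                                 ///
    headings(1 = "Subgroups" 3 = "Pooled")
```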
Coefficients provided to coefplot may represent estimates along a continuous dimension. Examples are
predictive margins or marginal effects computed over values of a continuous variable. In such a case, use
the at() option to provide the plot positions to coefplot. Here is an example where predictive margins
of foreign are computed by level of mpg, once from a bivariate model and once from a multivariate
model:
(output omitted)
. margins, at(mpg=(10(2)40)) post
(output omitted)
. estimates store bivariate
at() causes coefplot to use a continuous axis with default labeling for the plotted estimates instead
of compiling a categorical axis. It also causes coefplot to switch to vertical mode, as this is the more
common way to display such results. As no categorical axis is constructed if at() is specified, options
order(), relocate(), grid(), coeflabels(), eqlabels(), headings(), groups(), and bycoefs are
not allowed. Furthermore, note that continuous and categorical mode cannot be mixed. That is, at()
has to be specified for all models or for none. In the example above, at was used without an argument.
This is suitable for results provided by margins, as coefplot contains some special code to retrieve the
plot positions in this case. See the online help for alternative applications of at().
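A self-contained sketch of such a comparison, assuming a logit model for foreign; the control variable weight in the multivariate model is an illustrative choice, not necessarily the specification behind the original figure. Each margins call evaluates Pr(foreign=1) at 16 values of mpg (10, 12, ..., 40) and posts the results so that they can be stored.

```stata
sysuse auto, clear
* bivariate model
quietly logit foreign mpg
quietly margins, at(mpg=(10(2)40)) post
estimates store bivariate
* multivariate model (control variable chosen for illustration)
quietly logit foreign mpg weight
quietly margins, at(mpg=(10(2)40)) post
estimates store multivariate
* at: take the plot positions from the margins results
coefplot bivariate multivariate, at ytitle(Pr(foreign=1)) xtitle(Miles per Gallon)
```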
When at() is specified, coefplot does not change the plot types for markers and confidence intervals
and hence still draws dots and spikes. Use the recast() option to change this, e.g., as follows:
. coefplot bivariate multivariate, at
> ytitle(Pr(foreign=1))
> xtitle(Miles per Gallon)
> recast(line) lwidth(*2)
> ciopts(recast(rline) lpattern(dash))
Finally, to plot results from a matrix ([P] matrix) instead of the e()-returns, use syntax
coefplot [(]matrix(mspec)[, modelopts] [...][)] [...]
A single coefplot command can contain both regular syntax and matrix() syntax. For example, to
add means to the graph above you could proceed as follows:
. mean mpg trunk turn
(output omitted)
. estimates store mean
. coefplot (matrix(res[,1]), label(median)
> ci((res[,2] res[,3])))
> (mean)
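The commands above assume that a matrix res already exists, with the medians in its first column and the lower and upper confidence limits in columns 2 and 3. A hypothetical way to build such a matrix with centile is sketched below; it is not necessarily the code behind the original figure. The row names matter because coefplot uses them as coefficient names, which lets the medians line up with the results from mean.

```stata
sysuse auto, clear
capture matrix drop res
foreach v of varlist mpg trunk turn {
    quietly centile `v', centile(50)
    * append one row per variable: median, lower limit, upper limit
    matrix res = nullmat(res) \ (r(c_1), r(lb_1), r(ub_1))
}
* row names become the coefficient names in coefplot
matrix rownames res = mpg trunk turn
matrix colnames res = median ll ul
matrix list res
```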
References
Gallup, J. L. 2012. A new system for formatting estimation tables. The Stata Journal 12(1): 3–28.
Jann, B. 2007. Making regression tables simplified. The Stata Journal 7(2): 227–244.
Kastellec, J. P., and E. L. Leoni. 2007. Using Graphs Instead of Tables in Political Science. Perspectives
on Politics 5(4): 755–771.
Newson, R. 2003. Confidence intervals and p-values for delivery to the end user. The Stata Journal 3(3):
245–269.