tabstat — Compact table of summary statistics
Description
tabstat displays summary statistics for a series of numeric variables in one table. You specify
the list of statistics to be displayed, and the statistics can be calculated conditional on the values
of another variable. tabstat allows substantial flexibility in the choice of statistics and in the
format of the table.
Quick start
Mean of v1 displayed using v1’s display format
tabstat v1, format
Same as above, but use a format with two decimal places and comma separators
tabstat v1, format(%9.2fc)
Number of nonmissing observations, mean, standard error of the mean, and coefficient of variation for v1
tabstat v1, statistics(n mean semean cv)
Quartiles and interquartile range of v1 and v2
tabstat v1 v2, statistics(q iqr)
Same as above, but report statistics separately for each level of catvar
tabstat v1 v2, by(catvar) statistics(q iqr)
Same as above, but display a separate column for each statistic
tabstat v1 v2, by(catvar) statistics(q iqr) columns(statistics)
Menu
Statistics > Summaries, tables, and tests > Other tables > Compact table of summary statistics
Syntax
tabstat varlist [if] [in] [weight] [, options]
options                     Description
Main
  by(varname)               group statistics by variable
  statistics(statname ...)  report specified statistics
Options
  labelwidth(#)             width for by() variable labels; default is labelwidth(16)
  varwidth(#)               variable width; default is varwidth(12)
  columns(variables)        display variables in table columns; the default
  columns(statistics)       display statistics in table columns
  format[(%fmt)]            display format for statistics; default format is %9.0g
  casewise                  perform casewise deletion of observations
  nototal                   do not report overall statistics; use with by()
  missing                   report statistics for missing values of by() variable
  noseparator               do not use separator line between by() categories
  longstub                  make left table stub wider
  save                      store summary statistics in r()
by is allowed; see [D] by.
aweights and fweights are allowed; see [U] 11.1.6 weight.
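As a rough illustration of how these syntax elements combine, the sketch below uses the auto dataset shipped with Stata (loaded with sysuse auto). The variables, the if condition, the observation range, and the weight variable are assumptions chosen only for illustration; they are not prescribed by this entry.

```stata
* Illustrative sketch; dataset and variable choices are assumptions
sysuse auto, clear

* if qualifier and analytic weights, with options after the comma
tabstat price mpg if price < 10000 [aweight = weight], statistics(mean sd)

* in qualifier: restrict the sample to the first 20 observations
tabstat price mpg in 1/20, statistics(n mean)

* by prefix: tabstat is run once per group, producing one table per group
by foreign, sort: tabstat price mpg, statistics(mean sd)
```

The by prefix in the last command repeats the whole command for each group, whereas the by() option produces a single table with one block per group.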
Options
Main
by(varname) specifies that the statistics be displayed separately for each unique value of varname;
varname may be numeric or string. For instance, tabstat height would present the overall mean
of height. tabstat height, by(sex) would present the mean height of males, the mean height of
females, and the overall mean height. Do not confuse the by() option with the by prefix (see
[D] by); both may be specified.
statistics(statname . . . ) specifies the statistics to be displayed; the default is equivalent to
specifying statistics(mean). (stats() is a synonym for statistics().) Multiple statistics
may be specified and are separated by white space, such as statistics(mean sd). Available
statistics are
Options
labelwidth(#) specifies the maximum width to be used within the stub to display the labels of the
by() variable. The default is labelwidth(16). 8 ≤ # ≤ 32.
varwidth(#) specifies the maximum width to be used within the stub to display the names of the
variables. The default is varwidth(12). varwidth() is effective only with columns(statistics).
Setting varwidth() implies longstub. 8 ≤ # ≤ 32.
columns(variables | statistics) specifies whether to display variables or statistics in the columns
of the table. columns(variables) is the default when more than one variable is specified.
format and format(%fmt) specify how the statistics are to be formatted. The default is to use a
%9.0g format.
format specifies that each variable’s statistics be formatted with the variable’s display format; see
[D] format.
format(%fmt) specifies the format to be used for all statistics.
The column width is the maximum width of these formats. The minimum column width is nine
display characters.
casewise specifies casewise deletion of observations. Statistics are to be computed for the sample
that is not missing for any of the variables in varlist. The default is to use all the nonmissing
values for each variable.
nototal is for use with by(); it specifies that the overall statistics not be reported.
missing specifies that missing values of the by() variable be treated just like any other value and
that statistics should be displayed for them. The default is not to report the statistics for the by()==
missing group. If the by() variable is a string variable, by()=="" is considered to mean missing.
noseparator specifies that a separator line between the by() categories not be displayed.
longstub specifies that the left stub of the table be made wider so that it can include names of the
statistics or variables in addition to the categories of by(varname). The default is to describe the
statistics or variables in a header. longstub is ignored if by(varname) is not specified.
save specifies that the summary statistics be returned in r(). The overall (unconditional) statistics
are returned in matrix r(StatTotal) (rows are statistics, columns are variables). The conditional
statistics are returned in the matrices r(Stat1), r(Stat2), . . . , and the names of the corresponding
variables are returned in the macros r(name1), r(name2), . . . .
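The following sketch shows one way the casewise and save options might be used together with the returned results. It assumes the auto dataset shipped with Stata; the names T, S1, and g1 are hypothetical names introduced here for illustration.

```stata
* Illustrative sketch; dataset, variables, and the names T, S1, g1 are assumptions
sysuse auto, clear

* save: keep the displayed statistics in r()
tabstat price rep78, by(foreign) statistics(n mean) save

* Copy the results before running other commands that may overwrite r()
return list
matrix T  = r(StatTotal)     // overall statistics
matrix S1 = r(Stat1)         // statistics for the first by() group
local  g1 `"`r(name1)'"'     // name stored for the first by() group

matrix list T
matrix list S1
display `"first group: `g1'"'

* casewise: use only observations that are nonmissing on every variable in varlist
tabstat price rep78, statistics(n mean) casewise
```

Comparing the n row of the two tables shows whether casewise deletion dropped observations for a variable that has no missing values of its own.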
More summary statistics can be requested via the statistics() option. The group totals can be
suppressed with the nototal option.
. tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) nototal
Summary statistics: Mean, SD, Min, Max
Group variable: foreign (Car origin)
foreign price weight mpg rep78
Although the header of the table describes the statistics running vertically in the “cells”, the table
may become hard to read, especially with many variables or statistics. The longstub option specifies
that a column be added describing the contents of the cells. The format option can be issued to
specify that tabstat display the statistics by using the display format of the variables rather than
the overall default %9.0g.
. tabstat price weight mpg rep78, by(foreign) stat(mean sd min max) long format
foreign Stats price weight mpg rep78
We can specify a layout of the table in which the statistics run horizontally and the variables run
vertically by specifying the col(statistics) option.
. tabstat price weight mpg rep78, by(foreign) stat(min mean max) col(stat) long
foreign Variable Min Mean Max
Finally, tabstat can be used as an enhanced version of summarize in which we specify the
statistics to be displayed. For instance, we can display the number of observations, the mean, the
coefficient of variation, and the 25%, 50%, and 75% quantiles for a list of variables.
. tabstat price weight mpg rep78, stat(n mean cv q) col(stat)
variable N mean cv p25 p50 p75
Because we did not specify the by() option, the statistics are reported only for the full sample
and not for subgroups of the data.
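To make the connection with summarize concrete, the sketch below reproduces two of the statistics by hand from the results that summarize stores. It assumes the auto dataset shipped with Stata and relies on the usual definitions: the standard error of the mean is sd/sqrt(n), and the coefficient of variation is sd/mean.

```stata
* Illustrative sketch; dataset and variable choice are assumptions
sysuse auto, clear

* Statistics as reported by tabstat
tabstat price, statistics(n mean sd semean cv) columns(statistics)

* The same quantities computed from summarize's stored results
quietly summarize price
display "semean = " r(sd)/sqrt(r(N))
display "cv     = " r(sd)/r(mean)
```

The two displayed values should agree with the semean and cv columns of the tabstat table, up to the display format.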
Video example
Descriptive statistics in Stata
Acknowledgments
The tabstat command was written by Jeroen Weesie and Vincent Buskens, both of the Department
of Sociology at Utrecht University, The Netherlands.
Also see
[R] summarize — Summary statistics
[R] table — Table of frequencies, summaries, and command results
[R] table summary — Table of summary statistics
[R] tabulate, summarize() — One- and two-way tables of summary statistics
[D] collapse — Make dataset of summary statistics
Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and
Stata Press are registered trademarks with the World Intellectual Property Organization
of the United Nations. StataNow and NetCourseNow are trademarks of StataCorp
LLC. Other brand and product names are registered trademarks or trademarks of their
respective companies. Copyright © 1985–2023 StataCorp LLC, College Station, TX,
USA. All rights reserved.
For suggested citations, see the FAQ on citing Stata documentation.