bayesstats summary — Bayesian summary statistics
Description
bayesstats summary calculates and reports posterior summary statistics for model parameters and
functions of model parameters using current Bayesian estimation results. Posterior summary statistics
include posterior means, posterior standard deviations, MCMC standard errors (MCSE), posterior
medians, and equal-tailed credible intervals or highest posterior density (HPD) credible intervals.
Quick start
Posterior summaries for all model parameters after a Bayesian regression model
bayesstats summary
Same as above, but only for parameters {y:x1} and {y:x2}
bayesstats summary {y:x1} {y:x2}
Same as above
bayesstats summary {y:x1 x2}
Posterior summaries for elements 1,1 and 2,1 of matrix parameter {S}
bayesstats summary {S_1_1 S_2_1}
Posterior summaries for all elements of matrix parameter {S}
bayesstats summary {S}
Posterior summaries with HPD instead of equal-tailed credible intervals and with credible level of
90%
bayesstats summary, hpd clevel(90)
Posterior summaries with MCSE calculated using batch means
bayesstats summary, batch(100)
Posterior summaries for functions of scalar model parameters
bayesstats summary ({y:x1}-{y:_cons}) (sd:sqrt({var}))
Posterior summaries for the log-likelihood and log-posterior functions
bayesstats summary _loglikelihood _logposterior
Posterior summaries for selected model parameters and functions of model parameters and for
log-likelihood and log-posterior functions using abbreviated syntax
bayesstats summary {var} ({y:x1}-{y:_cons}) _ll _lp
Posterior summaries of the simulated outcome
bayespredict {_ysim}, saving(predres)
bayesstats summary {_ysim} using predres
Posterior summaries of the mean across observations of the simulated outcome labeled as mymean
bayesstats summary (mymean: @mean({_ysim})) using predres
Menu
Statistics > Bayesian analysis > Summary statistics
Syntax
Syntax is presented under the following headings:
    Summary statistics for model parameters
    Summary statistics for predictions

Summary statistics for model parameters

Full syntax
    bayesstats summary [spec [spec ...]] [, options]

Summary statistics for predictions

Summary statistics for Mata functions of simulated outcomes, residuals, and more
    bayesstats summary (funcspec) [(funcspec) ...] using predfile [, options]

Full syntax
    bayesstats summary [predspec [predspec ...]] using predfile [, options]
predfile is the name of the dataset created by bayespredict that contains prediction results.
yspec is {ysimspec | residspec | muspec | label}.

ysimspec is {_ysim#} or {_ysim#[numlist]}, where {_ysim#} refers to all observations of the #th
simulated outcome and {_ysim#[numlist]} refers to the selected observations, numlist, of the #th
simulated outcome. {_ysim} is a synonym for {_ysim1}.

residspec is {_resid#} or {_resid#[numlist]}, where {_resid#} refers to all residuals of the
#th simulated outcome and {_resid#[numlist]} refers to the selected residuals, numlist, of the
#th simulated outcome. {_resid} is a synonym for {_resid1}.

muspec is {_mu#} or {_mu#[numlist]}, where {_mu#} refers to all expected values of the #th
outcome and {_mu#[numlist]} refers to the selected expected values, numlist, of the #th outcome.
{_mu} is a synonym for {_mu1}.

label is the name of the function simulated using bayespredict.

With large datasets, specifications {_ysim#}, {_resid#}, and {_mu#} may use a lot of time and
memory and should be avoided. See Generating and saving simulated outcomes in
[BAYES] bayespredict.

yexprspec is exprlabel: yexpr, where exprlabel is a valid Stata name and yexpr is a scalar expression
that may contain individual observations of simulated outcomes, {_ysim#[#]}; individual expected
outcome values, {_mu#[#]}; individual simulated residuals, {_resid#[#]}; and other scalar
predictions, {label}.
funcspec is label: @func(arg1[, arg2]), where label is a valid Stata name; func is an official or
user-defined Mata function that operates on column vectors and returns a real scalar; and arg1 and
arg2 are one of {_ysim[#]}, {_resid[#]}, or {_mu[#]}. arg2 is primarily for use with user-defined
Mata functions; see Defining test statistics using Mata functions in [BAYES] bayespredict.
predspec is one of yspec, (yexprspec), or (funcspec). See Different ways of specifying predictions
and their functions in [BAYES] Bayesian postestimation.
  options                  Description
  ---------------------------------------------------------------------------
  Main
    clevel(#)              set credible interval level; default is clevel(95)
    hpd                    display HPD credible intervals instead of the
                             default equal-tailed credible intervals
    batch(#)               specify length of block for batch-means
                             calculations; default is batch(0)
  * chains(_all | numlist) specify which chains to use for computation;
                             default is chains(_all)
  * sepchains              compute results separately for each chain
    showreffects[(reref)]  include all or a list reref of random-effects
                             parameters in the output; relevant after
                             multilevel models
    skip(#)                skip every # observations from the MCMC sample;
                             default is skip(0)
    nolegend               suppress table legend
    display_options        control spacing, line width, and base and empty
                             cells
  Advanced
    corrlag(#)             specify maximum autocorrelation lag; default varies
    corrtol(#)             specify autocorrelation tolerance; default is
                             corrtol(0.01)
  ---------------------------------------------------------------------------
  * Options chains() and sepchains are relevant only when option nchains() is
    used during Bayesian estimation.
  collect is allowed; see [U] 11.1.10 Prefix commands.
Options
Main
clevel(#) specifies the credible level, as a percentage, for equal-tailed and HPD credible intervals.
The default is clevel(95) or as set by [BAYES] set clevel.
hpd displays the HPD credible intervals instead of the default equal-tailed credible intervals.
batch(#) specifies the length of the block for calculating batch means and an MCSE using batch
means. The default is batch(0), which means no batch calculations. When batch() is not
specified, the MCSE is computed using effective sample sizes instead of batch means. batch()
may not be combined with corrlag() or corrtol().
chains(_all | numlist) specifies which chains from the MCMC sample to use for computation. The
default is chains(_all) or to use all simulated chains. Using multiple chains, provided the chains
have converged, generally improves MCMC summary statistics. Option chains() is relevant only
when option nchains() is used during Bayesian estimation.
sepchains specifies that the results be computed separately for each chain. The default is to compute
results using all chains as determined by option chains(). Option sepchains is relevant only
when option nchains() is used during Bayesian estimation.
showreffects and showreffects(reref ) are for use after multilevel models, and they specify that
the results for all or a list reref of random-effects parameters be provided in addition to other model
parameters. By default, all random-effects parameters are excluded from the results to conserve
computation time.
skip(#) specifies that every # observations from the MCMC sample not be used for computation.
The default is skip(0) or to use all observations in the MCMC sample. Option skip() can be
used to subsample or thin the chain. skip(#) is equivalent to a thinning interval of #+1. For
example, if you specify skip(1), corresponding to the thinning interval of 2, the command will
skip every other observation in the sample and will use only observations 1, 3, 5, and so on in the
computation. If you specify skip(2), corresponding to the thinning interval of 3, the command
will skip every 2 observations in the sample and will use only observations 1, 4, 7, and so on in
the computation. skip() does not thin the chain in the sense of physically removing observations
from the sample, as is done by, for example, bayesmh’s thinning() option. It only discards
selected observations from the computation and leaves the original sample unmodified.
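For intuition, the selection rule behind skip() can be sketched as follows. This is an illustrative Python sketch of the rule described above, not Stata code; the helper name is ours.

```python
# Sketch of the skip(#) selection rule: skip(k) keeps draws
# 1, k+2, 2k+3, ... (that is, a thinning interval of k+1) while
# leaving the stored MCMC sample itself unmodified.
def select_draws(n_draws, skip):
    """Return the 1-based indices of the draws used in the computation."""
    return list(range(1, n_draws + 1, skip + 1))

# skip(0): all draws; skip(1): 1, 3, 5, ...; skip(2): 1, 4, 7, ...
print(select_draws(9, 1))  # [1, 3, 5, 7, 9]
print(select_draws(9, 2))  # [1, 4, 7]
```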
nolegend suppresses the display of the table legend, which identifies the rows of the table with the
expressions they represent.
display_options: vsquish, noemptycells, baselevels, allbaselevels, nofvlabel,
fvwrap(#), fvwrapon(style), and nolstretch; see [R] Estimation options.
Advanced
corrlag(#) specifies the maximum autocorrelation lag used for calculating effective sample sizes. The
default is min{500, mcmcsize()/2}. The total autocorrelation is computed as the sum of all lag-k
autocorrelation values for k from 0 to either corrlag() or the index at which the autocorrelation
becomes less than corrtol() if the latter is less than corrlag(). Options corrlag() and
batch() may not be combined.
corrtol(#) specifies the autocorrelation tolerance used for calculating effective sample sizes. The
default is corrtol(0.01). For a given model parameter, if the absolute value of the lag-k
autocorrelation is less than corrtol(), then all autocorrelation lags beyond the k th lag are
discarded. Options corrtol() and batch() may not be combined.
Introduction
bayesstats summary reports posterior summary statistics for model parameters and their functions
using the current Bayesian estimation results. When typed without arguments, the command displays
results for all model parameters. Alternatively, you can specify a subset of model parameters following
the command name; see Different ways of specifying model parameters in [BAYES] Bayesian
postestimation. You can also obtain results for scalar functions of model parameters; see Specifying
functions of model parameters in [BAYES] Bayesian postestimation.
Sometimes, it may be useful to obtain posterior summaries of log-likelihood and log-posterior
functions. This can be done by specifying _loglikelihood and _logposterior (or the respective
synonyms _ll and _lp) following the command name.
You can also obtain the posterior summaries for prediction quantities when you specify the prediction
dataset in the using specification; see Different ways of specifying predictions and their functions in
[BAYES] Bayesian postestimation for how to specify prediction quantities with bayesstats summary.
bayesstats summary reports the following posterior summary statistics: posterior mean, posterior
standard deviation, MCMC standard error, posterior median, and equal-tailed credible intervals or, if
the hpd option is specified, HPD credible intervals. The default credible level is set to 95%, but you
can change this by specifying the clevel() option. Equal-tailed and HPD intervals may produce very
different results for asymmetric or highly skewed marginal posterior distributions. The HPD intervals
are preferable in this situation.
You should not confuse the term “HPD interval” with the term “HPD region”. A {100×(1−α)}% HPD
interval is defined such that it contains {100×(1−α)}% of the posterior density. A {100×(1−α)}%
HPD region also satisfies the condition that the density inside the region is never lower than that
outside the region. For multimodal univariate marginal posterior distributions, the HPD regions may
include unions of nonintersecting HPD intervals. For unimodal univariate marginal posterior
distributions, HPD regions are indeed simply HPD intervals. The bayesstats summary command thus
calculates HPD intervals assuming unimodal marginal posterior distributions (Chen and Shao 1999).
Some authors use the term “posterior intervals” instead of “credible intervals” and the term “central
posterior intervals” instead of “equal-tailed credible intervals” (for example, Gelman et al. [2014]).
Likelihood:
  mpg ~ normal({mpg:_cons},{var})

Priors:
  {mpg:_cons} ~ 1 (flat)
        {var} ~ jeffreys

             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+-----------------------------------------------------------------
mpg          |
       _cons |  21.29222    .6828864  .021906   21.27898    19.99152   22.61904
The posterior mean of {mpg:_cons} is 21.29 and of {var} is 34.8. They are close to their respective
frequentist analogs (the sample mean of mpg is 21.297, and the sample variance is 33.47), because
we used a noninformative prior. Posterior standard deviations are 0.68 for {mpg:_cons} and 5.92
for {var}, and they are comparable to frequentist standard errors under this noninformative prior.
The standard error estimates of the posterior means, MCSEs, are low. For example, MCSE is 0.022
for {mpg:_cons}. This means that the precision of our estimate is, up to one decimal point, 21.3,
provided that MCMC converged. The posterior means and medians of {mpg:_cons} are close, which
suggests that the posterior distribution for {mpg:_cons} may be symmetric. According to the credible
intervals, we are 95% certain that the posterior mean of {mpg:_cons} is roughly between 20 and
23 and that the posterior mean of {var} is roughly between 25 and 48. We can infer from this that
{mpg:_cons} is greater than, say, 15, and that {var} is greater than, say, 20, with a very high
probability. (We can use [BAYES] bayestest interval to compute the actual probabilities.)
The above is also equivalent to typing
. bayesstats summary {mpg:_cons} {var}
(output omitted )
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [90% cred. interval]
-------------+-----------------------------------------------------------------
mpg          |
       _cons |  21.29222    .6828864  .021906   21.27898    20.18807   22.44172

             |                                                     HPD
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+-----------------------------------------------------------------
mpg          |
       _cons |  21.29222    .6828864  .021906   21.27898    19.94985   22.54917
The posterior distribution of {mpg:_cons} is symmetric about the posterior mean; thus there is
little difference between the 95% equal-tailed credible interval from example 1 and this 95% HPD
credible interval for {mpg:_cons}. The 95% HPD interval for {var} has a smaller width than the
corresponding equal-tailed interval in example 1.
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+-----------------------------------------------------------------
mpg          |
       _cons |  21.29222    .6828864  .015315   21.27898    19.99152   22.61904
The batch-means MCSE estimates are somewhat smaller than those obtained by default using effective
sample sizes.
Use caution when choosing the batch size for the batch-means method. For example, if you use
the batch size of 1, you will obtain MCSE estimates under the assumption that the draws in the MCMC
sample are independent, which is not true.
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+-----------------------------------------------------------------
mpg          |
       _cons |  21.29554    .6813796  .029517   21.27907    19.98813   22.58582
We chose to skip every 9 observations, which led to a significant reduction of the MCMC sample
size and thus increased the MCSEs of our estimates. In some cases, with larger MCMC sample sizes,
subsampling may decrease MCSEs because of the decreased autocorrelation in the reduced
MCMC sample.
Expressions can also be used for calculating posterior probabilities, although this can be done more
easily using bayestest interval (see [BAYES] bayestest interval). For illustration, let's verify
that the probability that {var} lies within the endpoints of the reported credible interval is
indeed 0.95.
. bayesstats summary (prob:{var}>24.913 & {var}<47.613)
Posterior summary statistics MCMC sample size = 10,000
prob : {var}>24.913 & {var}<47.613
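The computation behind such an expression is simply the Monte Carlo estimate of an interval probability: the posterior mean of the indicator {var}>24.913 & {var}<47.613 over the MCMC draws. A minimal Python sketch, using hypothetical normal draws in place of the manual's actual MCMC sample of {var}:

```python
import random

# Hypothetical posterior draws standing in for the MCMC sample of {var};
# the mean 34.8 and sd 5.92 mirror the summaries quoted in the text.
random.seed(1)
draws = [random.gauss(34.8, 5.92) for _ in range(10_000)]

# P(24.913 < var < 47.613) estimated as the mean of the indicator
prob = sum(24.913 < d < 47.613 for d in draws) / len(draws)
print(round(prob, 2))
```

With the manual's actual (non-normal) posterior draws, this fraction is 0.95 by construction of the credible interval.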
We can now summarize the prediction results by using bayesstats summary. We specify the
prediction quantity we wish to summarize, the simulated outcome {_ysim} in our example, and the
prediction dataset, mpgreps.dta, which contains the prediction quantity, in the using specification.
bayesstats summary reports posterior summaries for all simulated outcomes in the prediction dataset,
mpgreps.dta. Estimated posterior means and standard deviations are similar to the corresponding
observed values for mpg, 21.30 and 5.79, respectively.
We can specifically examine the first observation of the replicated sample, {_ysim_1}, and
compare it with the observed value, mpg[1], of 22.
. bayesstats summary ({_ysim_1}>=‘=mpg[1]’) using mpgreps
Posterior summary statistics MCMC sample size = 10,000
expr1 : _ysim1_1>=22
We find that 45% of the replicates of mpg[1] are greater than 22. The reported probability of
0.45 is known as the posterior predictive p-value and is used for goodness-of-fit checking; see
[BAYES] bayesstats ppvalues.
Stored results
bayesstats summary stores the following in r():
Scalars
r(mcmcsize) MCMC sample size used in the computation
r(clevel) credible interval level
r(hpd) 1 if hpd is specified, 0 otherwise
r(batch) batch length for batch-means calculations
r(skip) number of MCMC observations to skip in the computation; every r(skip) observations
are skipped
r(corrlag) maximum autocorrelation lag
r(corrtol) autocorrelation tolerance
r(nchains) number of chains used in the computation
Macros
r(names) names of model parameters and expressions
r(expr_#) #th expression
r(exprnames) expression labels
r(chains) chains used in the computation, if chains() is specified
12 bayesstats summary — Bayesian summary statistics
Matrices
r(summary) matrix with posterior summary statistics for parameters in r(names)
r(summary_chain#) matrix with posterior summary statistics for chain #, if sepchains is specified
Methods and formulas

Most of the summary statistics employed in Bayesian analysis are based on the marginal posterior
distributions of individual model parameters or functions of model parameters.

Let θ be a scalar model parameter and {θ_t}, t = 1, ..., T, be an MCMC chain of size T drawn from
the marginal posterior distribution of θ. For a function g(θ), substitute {θ_t} with {g(θ_t)} in the
formulas below. If θ is a covariance matrix model parameter, the formulas below are applied to each
element of the lower-diagonal portion of θ.
Point estimates

Marginal posterior moments are approximated using Monte Carlo integration applied to the
simulated samples {θ_t}. The sample posterior mean and sample posterior standard deviation are
defined as follows:

    \hat{\theta} = \frac{1}{T} \sum_{t=1}^{T} \theta_t, \qquad
    \hat{s}^2 = \frac{1}{T-1} \sum_{t=1}^{T} (\theta_t - \hat{\theta})^2

where \hat{\theta} and \hat{s}^2 are sample estimators of the population posterior mean E(θ_t) and
posterior variance Var(θ_t).
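The point estimates above can be sketched directly in Python; this is an illustrative computation on a toy chain, not Stata code, and the variable names are ours.

```python
import statistics

# Sketch of the point estimates: theta_hat is the sample mean of the draws,
# and s2_hat is the sample posterior variance with 1/(T-1) normalization.
chain = [2.0, 2.5, 1.5, 3.0, 2.0]  # toy MCMC draws

theta_hat = sum(chain) / len(chain)
s2_hat = sum((t - theta_hat) ** 2 for t in chain) / (len(chain) - 1)

print(theta_hat)          # 2.2
print(round(s2_hat, 3))   # 0.325

# Cross-check against the standard-library estimator
assert abs(s2_hat - statistics.variance(chain)) < 1e-9
```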
With multiple chains, the posterior mean and standard deviation are estimated using the combined
sample of all chains, or of those requested in the chains() option, as follows. Let {θ_{jt}},
t = 1, ..., T, be the jth Markov chain, j = 1, ..., M, with sample mean \hat{\theta}_j and variance
\hat{s}_j^2. The overall sample posterior mean is

    \hat{\theta} = \frac{1}{MT} \sum_{j=1}^{M} \sum_{t=1}^{T} \theta_{jt}

and equals the average of the sample means of the individual chains. Let B and W be the respective
between-chains and within-chain variances,

    B = \frac{T}{M-1} \sum_{j=1}^{M} (\hat{\theta}_j - \hat{\theta})^2, \qquad
    W = \frac{1}{M} \sum_{j=1}^{M} \hat{s}_j^2

The pooled estimate of the posterior variance is then

    \hat{s}^2 = \frac{T-1}{T} W + \frac{1}{T} B    (1)
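The between-chains/within-chain decomposition and the pooled variance (1) can be sketched as follows. This is an illustrative Python helper with toy chains; the function name is ours.

```python
# Sketch of (1): pooled posterior variance from M chains of length T.
# B is the between-chains variance of the chain means; W is the average
# within-chain variance.
def pooled_variance(chains):
    M, T = len(chains), len(chains[0])
    means = [sum(c) / T for c in chains]
    s2 = [sum((x - m) ** 2 for x in c) / (T - 1) for c, m in zip(chains, means)]
    grand = sum(means) / M
    B = T / (M - 1) * sum((m - grand) ** 2 for m in means)
    W = sum(s2) / M
    return (T - 1) / T * W + B / T

chains = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
print(round(pooled_variance(chains), 6))
```

When the chains agree (B near 0), the pooled estimate is slightly below the average within-chain variance; chains with differing means inflate it through B.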
When the chains are strongly stationary, \hat{s}^2 is an unbiased estimator of the marginal posterior
variance of θ (Gelman et al. 2014, sec. 11.4).

The precision of the sample posterior mean is evaluated by its standard error, also known as the
Monte Carlo standard error (MCSE). Note that MCSE cannot be estimated using the classical formula
for the standard error, \hat{s}/\sqrt{T}, because of the dependence between the θ_t's. Let

    \sigma^2 = \mathrm{Var}(\theta_t) + 2 \sum_{k=1}^{\infty} \mathrm{Cov}(\theta_t, \theta_{t+k})

Then, \sqrt{T} \times MCSE approaches \sigma asymptotically in T.
bayesstats summary provides two different approaches for estimating MCSE. Both approaches
try to adjust for the existing autocorrelation in the MCMC sample. The first one uses the so-called
effective sample size (ESS), and the second one uses batch means (Roberts 1996; Jones et al. 2006).
The ESS-based estimator for MCSE, the default in bayesstats summary, is given by

    \mathrm{MCSE}(\hat{\theta}) = \hat{s}/\sqrt{\mathrm{ESS}}

ESS is defined as

    \mathrm{ESS} = T \Big/ \Big(1 + 2 \sum_{k=1}^{\mathit{max\_lags}} \rho_k\Big)

where \rho_k is the lag-k autocorrelation, and max_lags is the maximum number of lags less than or
equal to \rho_{lag} such that for all k = 1, ..., max_lags, |\rho_k| > \rho_{tol}, where \rho_{lag}
and \rho_{tol} are specified in options corrlag() and corrtol() with the respective default values
of 500 and 0.01. \rho_k is estimated as \gamma_k/\gamma_0, where

    \gamma_k = \frac{1}{T} \sum_{t=1}^{T-k} (\theta_t - \hat{\theta})(\theta_{t+k} - \hat{\theta})

is the lag-k empirical autocovariance.
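The ESS-based MCSE can be sketched as follows. This is an illustrative Python helper following the formulas above, not Stata's implementation; the function name and the cap min(corrlag, T/2), mirroring the default described for corrlag(), are our assumptions.

```python
import math

# Sketch of the ESS-based MCSE: lag-k autocorrelations are summed until
# they fall below corrtol (default 0.01) or the corrlag cap is reached.
def ess_mcse(chain, corrlag=500, corrtol=0.01):
    T = len(chain)
    mean = sum(chain) / T
    gamma0 = sum((x - mean) ** 2 for x in chain) / T
    rho_sum = 0.0
    for k in range(1, min(corrlag, T // 2) + 1):
        gamma_k = sum((chain[t] - mean) * (chain[t + k] - mean)
                      for t in range(T - k)) / T
        rho_k = gamma_k / gamma0
        if abs(rho_k) < corrtol:
            break               # discard all lags beyond the kth
        rho_sum += rho_k
    ess = T / (1 + 2 * rho_sum)
    s = math.sqrt(sum((x - mean) ** 2 for x in chain) / (T - 1))
    return ess, s / math.sqrt(ess)

# A strongly autocorrelated chain (each value repeated 10 times) has an
# ESS far below its nominal size T = 1,000.
chain = [float(i) for i in range(100) for _ in range(10)]
ess, mcse = ess_mcse(chain)
```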
With multiple chains, the overall ESS is given by the sum of the effective sample sizes of the
individual chains. The MCSE is then calculated using the formula

    \mathrm{MCSE}(\hat{\theta}) = \hat{s} \Big/ \sqrt{\sum_{j=1}^{M} \mathrm{ESS}_j}

where \hat{s} is computed using (1) and ESS_j is the effective sample size of the jth chain.
The batch-means estimator of MCSE is obtained as follows. For a given batch length b, the
initial MCMC chain is split into m batches of size b,

    \{\theta_{j_0+1}, \ldots, \theta_{j_0+b}\},\ \{\theta_{j_0+b+1}, \ldots, \theta_{j_0+2b}\},\
    \ldots,\ \{\theta_{T-b+1}, \ldots, \theta_T\}

where j_0 = T - m \times b, and m batch means \hat{\mu}_1, ..., \hat{\mu}_m are calculated as the
sample means of each batch. m is chosen as the largest integer such that m \times b \le T. If b is
not a divisor of T, the first T - m \times b observations of the sample are not used in the
batch-means computation. The batch-means estimator of the posterior variance,
\hat{s}^2_{\mathrm{batch}}, is based on the assumption that the \hat{\mu}_j's are much less
correlated than the original sample draws.

The batch-means estimator of the posterior mean is

    \hat{\theta}_{\mathrm{batch}} = \frac{1}{m} \sum_{j=1}^{m} \hat{\mu}_j
We have \hat{\theta}_{\mathrm{batch}} = \hat{\theta} whenever m \times b = T. Under the assumption
that the batch means are uncorrelated,

    \hat{s}^2_{\mathrm{batch}} = \frac{1}{m-1} \sum_{j=1}^{m}
        (\hat{\mu}_j - \hat{\theta}_{\mathrm{batch}})^2

can be used as an estimator of \sigma^2/b. This fact justifies the batch-means estimator of MCSE
given by

    \mathrm{MCSE}_{\mathrm{batch}}(\hat{\theta}) = \frac{\hat{s}_{\mathrm{batch}}}{\sqrt{m}}

The accuracy of the batch-means estimator depends on the choice of the batch length b. The higher
the autocorrelation in the original MCMC sample, the larger the batch length b should be, provided
that the number of batches m does not become too small; \sqrt{T} is typically used as the maximum
value for b. The batch length is commonly determined by inspecting the autocorrelation plot for θ.
Under certain assumptions, Flegal and Jones (2010) establish that an asymptotically optimal batch
size is of order T^{1/3}.
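The batch-means construction above can be sketched as follows. This is an illustrative Python helper, not Stata's implementation; the function name is ours, and it requires at least two batches.

```python
import math

# Sketch of the batch-means MCSE: split the chain into m = floor(T/b)
# batches of length b (dropping the first T - m*b draws, as described),
# and scale the spread of the batch means by sqrt(m).
def batch_mcse(chain, b):
    T = len(chain)
    m = T // b                      # largest m with m*b <= T
    start = T - m * b               # leading draws left out of the computation
    means = [sum(chain[start + j*b : start + (j+1)*b]) / b for j in range(m)]
    theta_batch = sum(means) / m
    s2_batch = sum((mu - theta_batch) ** 2 for mu in means) / (m - 1)
    return math.sqrt(s2_batch / m)
```

For a chain whose batches all share the same mean, the estimate collapses to zero, which is why very small batch lengths understate the MCSE when the draws are autocorrelated.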
With multiple chains, the batch-means estimator is calculated using the combined sample of all
chains or of those that are requested in the chains() option.
Credible intervals
Let θ_(1), ..., θ_(T) be an MCMC sample ordered from smallest to largest, and let (1 − α) be a
credible level. Then, a {100 × (1 − α)}% equal-tailed credibleible interval is

    (\theta_{([T\alpha/2])},\ \theta_{([T(1-\alpha/2)])})

where θ_(j) denotes the jth order statistic and [·] in the subscripts denotes rounding to the
nearest integer.
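Both interval types can be sketched from the ordered draws. This is an illustrative Python helper; the function name and the exact index rounding are our assumptions (Stata's rounding may differ), and the HPD search follows the unimodal shortest-interval idea of Chen and Shao (1999).

```python
# Sketch of both interval types from the sorted draws: the equal-tailed
# interval takes the alpha/2 and 1-alpha/2 empirical quantiles; the HPD
# interval is the shortest interval containing a (1-alpha) share of draws.
def cred_intervals(draws, alpha=0.05):
    s = sorted(draws)
    T = len(s)
    equal_tailed = (s[int(T * alpha / 2)], s[int(T * (1 - alpha / 2)) - 1])
    keep = int((1 - alpha) * T)     # number of draws inside the HPD interval
    width, j = min((s[j + keep - 1] - s[j], j) for j in range(T - keep + 1))
    return equal_tailed, (s[j], s[j + keep - 1])

draws = [float(i) for i in range(100)]
et, hpd = cred_intervals(draws, alpha=0.1)
```

For skewed draws, the shortest-interval search shifts the HPD interval toward the mode, which is why HPD intervals are narrower than equal-tailed intervals for asymmetric posteriors.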
References
Brooks, S. P., and A. Gelman. 1998. General methods for monitoring convergence of iterative simulations. Journal
of Computational and Graphical Statistics 7: 434–455. https://doi.org/10.1080/10618600.1998.10474787.
Chen, M.-H., and Q.-M. Shao. 1999. Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of
Computational and Graphical Statistics 8: 69–92. https://doi.org/10.2307/1390921.
Flegal, J. M., and G. L. Jones. 2010. Batch means and spectral variance estimators in Markov chain Monte Carlo.
Annals of Statistics 38: 1034–1070. https://doi.org/10.1214/09-AOS735.
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis.
3rd ed. Boca Raton, FL: Chapman and Hall/CRC.
Jones, G. L., M. Haran, B. S. Caffo, and R. Neath. 2006. Fixed-width output analysis for Markov chain Monte Carlo.
Journal of the American Statistical Association 101: 1537–1547. https://doi.org/10.1198/016214506000000492.
Roberts, G. O. 1996. Markov chain concepts related to sampling algorithms. In Markov Chain Monte Carlo in Practice,
ed. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, 45–57. Boca Raton, FL: Chapman and Hall.
Also see
[BAYES] bayes — Bayesian regression models using the bayes prefix+
[BAYES] bayesmh — Bayesian models using Metropolis–Hastings algorithm+
[BAYES] bayesselect — Bayesian variable selection for linear regression+
[BAYES] Bayesian estimation — Bayesian estimation commands
[BAYES] Bayesian postestimation — Postestimation tools after Bayesian estimation
[BAYES] bayesgraph — Graphical summaries and convergence diagnostics
[BAYES] bayespredict — Bayesian predictions
[BAYES] bayesstats ess — Effective sample sizes and related statistics
[BAYES] bayesstats ppvalues — Bayesian predictive p-values and other predictive summaries
[BAYES] bayestest interval — Interval hypothesis testing
Stata, Stata Press, and Mata are registered trademarks of StataCorp LLC. Stata and
Stata Press are registered trademarks with the World Intellectual Property Organization
of the United Nations. StataNow and NetCourseNow are trademarks of StataCorp
LLC. Other brand and product names are registered trademarks or trademarks of their
respective companies. Copyright © 1985–2023 StataCorp LLC, College Station, TX,
USA. All rights reserved.
For suggested citations, see the FAQ on citing Stata documentation.