R-squared for Bayesian regression models
Abstract
The usual definition of R2 (variance of the predicted values divided by the variance of the
data) has a problem for Bayesian fits, as the numerator can be larger than the denominator.
We propose an alternative definition similar to one that has appeared in the survival analysis
literature: the variance of the predicted values divided by the variance of predicted values plus
the expected variance of the errors.
1. The problem
Consider a regression model of outcomes y and predictors X with predicted values E(y|X, θ), fit
to data (X, y)n , n = 1, . . . , N . Ordinary least squares yields an estimated parameter vector θ̂ with
predicted values ŷ_n = E(y|X_n, θ̂) and residual variance V_{n=1}^N (y_n − ŷ_n), where we are using the notation,

V_{n=1}^{N} z_n = \frac{1}{N-1} \sum_{n=1}^{N} (z_n - \bar{z})^2, \quad \text{for any vector } z.

The proportion of variance explained,

\text{classical } R^2 = \frac{V_{n=1}^{N} \hat{y}_n}{V_{n=1}^{N} y_n}, \qquad (1)
is a commonly used measure of model fit, and there is a long literature on interpreting it, adjusting
it for degrees of freedom used in fitting the model, and generalizing it to other settings such as
hierarchical models; see, for example, Xu (2003) and Gelman and Pardoe (2006).
Two challenges arise in defining R2 in a Bayesian context. The first is the desire to reflect
posterior uncertainty in the coefficients, which should remove or at least reduce the overfitting
problem of least squares. Second, in the presence of strong prior information and weak data, it
is possible for the fitted variance, V_{n=1}^N ŷ_n, to be higher than the total variance, V_{n=1}^N y_n, so that the
classical formula (1) can yield an R2 greater than 1 (Tjur, 2009). In the present paper we propose a
generalization that has a Bayesian interpretation as a variance decomposition.
∗ To appear in The American Statistician. We thank Frank Harrell and Daniel Jeske for helpful comments and the National Science Foundation, Office of Naval Research, Institute for Education Sciences, Defense Advanced Research Projects Agency, and Sloan Foundation for partial support of this work.
† Department of Statistics and Department of Political Science, Columbia University.
‡ Institute for Social and Economic Research and Policy, Columbia University.
§ Department of Computer Science, Aalto University.
[Figure 1 graphic: two panels, "Least squares and Bayes fits" (left, with the least-squares fit and the prior regression line labeled) and "Bayes posterior simulations" (right); both panels plot y against x over the range −2 to 2.]
Figure 1: Simple example showing the challenge of defining R2 for a fitted Bayesian model. Left
plot: data, least-squares regression line, and fitted Bayes line, which is a compromise between the
prior and the least-squares fit. The standard deviation of the fitted values from the Bayes model
(the blue dots on the line) is greater than the standard deviation of the data, so the usual definition
of R2 will not work. Right plot: posterior mean fitted regression line along with 20 draws of the
line from the posterior distribution. To define the posterior distribution of Bayesian R2 we compute
equation (3) for each posterior simulation draw.
Our first thought for Bayesian R2 is simply to use the posterior mean estimate of θ to create
Bayesian predictions ŷn and then plug these into the classical formula (1). This has two problems:
first, using a point estimate in a Bayesian analysis discards posterior uncertainty; and, second, the
ratio as thus defined can be greater than 1. When θ̂ is estimated using ordinary least squares, and
assuming the regression model includes a constant term, the numerator of (1) is less than or equal
to the denominator by definition; for general estimates, though, there is no requirement that this
be the case, and it would be awkward to say that a fitted model explains more than 100% of the
variance.
To see an example where the simple R2 would be inappropriate, consider the model y =
α + βx + error with a strong prior on (α, β) and only a few data points. Figure 1a shows data
and the least-squares regression line, with R2 of 0.77. We then do a Bayes fit with informative
priors α ∼ N(0, 0.2²) and β ∼ N(1, 0.2²). The standard deviation of the fitted values from the Bayes model is 1.3, while the standard deviation of the data is only 1.08, so the square of this ratio, which is R2 as defined in (1), is (1.3/1.08)² ≈ 1.45, greater than 1. Figure 1b shows the posterior mean fitted regression line along
with 20 draws of the line y = α + βx from the fitted posterior distribution of (α, β).
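For reference, a fit of this form might be specified in rstanarm as in the following sketch; the simulated dataset and the object names fake, fit_ls, and fit_bayes are ours, standing in for the small dataset behind Figure 1, not the data used in the paper.

library(rstanarm)

# Simulated stand-in data (for illustration only; not the data behind Figure 1).
set.seed(1)
fake <- data.frame(x = seq(-2, 2, length.out = 5))
fake$y <- rnorm(nrow(fake), mean = fake$x, sd = 1)

# Least-squares fit and Bayes fit with the informative priors stated above.
fit_ls <- lm(y ~ x, data = fake)
fit_bayes <- stan_glm(y ~ x, data = fake, refresh = 0,
                      prior_intercept = normal(0, 0.2),   # alpha ~ N(0, 0.2^2)
                      prior = normal(1, 0.2))             # beta  ~ N(1, 0.2^2)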
Here is our proposal. First, instead of using point predictions ŷ_n, we use expected values conditional on the unknown parameters,

y_n^{\mathrm{pred}} = \mathrm{E}(\tilde{y}_n \mid X_n, \theta),

where ỹ_n represents a future observation from the model with predictors X_n. For a linear model, y_n^pred is simply the linear predictor, X_n β; for a generalized linear model it is the linear predictor transformed to the data scale. The posterior distribution of θ induces a posterior predictive distribution for y^pred.
Second, instead of working with (1) directly, we define R2 explicitly based on the distribution of
future data ỹ, using the following variance decomposition for the denominator:
\text{alternative } R^2 = \frac{\text{Explained variance}}{\text{Explained variance} + \text{Residual variance}} = \frac{\mathrm{var}_{\mathrm{fit}}}{\mathrm{var}_{\mathrm{fit}} + \mathrm{var}_{\mathrm{res}}}, \qquad (2)
where
var_fit = V_{n=1}^N E(ỹ_n | θ) = V_{n=1}^N y_n^pred is the variance of the modeled predictive means, and
var_res = E(V_{n=1}^N (ỹ_n − y_n^pred) | θ) is the modeled residual variance.
The first of these quantities is the variance among the expectations of the new data; the second term is the expected variance of the new residuals, in both cases assuming the same predictors X as in the observed data. We are following the usual practice in regression of modeling the outcomes y but not the predictors X. Both var_fit and var_res are defined conditional on the model parameters θ, and so our Bayesian R2, the ratio (2), depends on θ as well.
Both variance terms can be computed using posterior quantities from the fitted model: var_fit is computed from y^pred, which is a function of the model parameters (for example, y_n^pred = X_n β for linear regression and y_n^pred = logit^{−1}(X_n β) for logistic regression), and var_res depends on the modeled probability distribution; for example, var_res = σ² for simple linear regression and var_res = (1/N) Σ_{n=1}^{N} π_n(1 − π_n) for logistic regression.
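To make these formulas concrete, here is a small R sketch that evaluates the ratio (2) for one hypothetical parameter draw; the design matrix X and the draws beta and sigma below are made-up placeholders, not quantities from the paper.

# Ratio (2) for a single hypothetical draw of theta; all inputs here are illustrative.
set.seed(1)
N <- 100
X <- cbind(1, rnorm(N))       # design matrix with an intercept column
beta <- c(0.2, 1.0)           # one hypothetical draw of the coefficients
sigma <- 0.8                  # one hypothetical draw of the residual sd

# Linear regression: y_n^pred = X_n beta, var_res = sigma^2.
y_pred <- as.vector(X %*% beta)
var_fit <- var(y_pred)
var_res <- sigma^2
R2_linear <- var_fit / (var_fit + var_res)

# Logistic regression: y_n^pred = logit^-1(X_n beta), var_res = mean of pi_n (1 - pi_n).
p_pred <- plogis(as.vector(X %*% beta))
var_fit_logit <- var(p_pred)
var_res_logit <- mean(p_pred * (1 - p_pred))
R2_logistic <- var_fit_logit / (var_fit_logit + var_res_logit)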
By construction, the ratio (2) is always between 0 and 1, no matter what procedure is used to construct the estimate y^pred. Versions of (2) have appeared in the survival analysis literature (Kent and O'Quigley, 1988; Choodari-Oskooei et al., 2010), where it makes sense to use expected
rather than observed data variance in the denominator, as this allows one to compute a measure of
explained variance that is completely independent of the censoring distribution in time-to-event
models. Our motivation is slightly different but the same mathematical principles apply, and our
measure could also be extended to nonlinear models.
In Bayesian inference, instead of a point estimate θ̂, we have a set of posterior simulation draws,
θ^s, s = 1, . . . , S. For each θ^s, we can compute the vector of predicted values y_n^{pred,s} = E(ỹ_n | X_n, θ^s) and the expected residual variance var_res^s, and thus the proportion of variance explained is,

\text{Bayesian } R^2_s = \frac{V_{n=1}^{N} y_n^{\mathrm{pred},s}}{V_{n=1}^{N} y_n^{\mathrm{pred},s} + \mathrm{var}^{s}_{\mathrm{res}}}. \qquad (3)
[Figure 2 graphic: "Bayesian R squared posterior and median".]
Figure 2: The posterior distribution of Bayesian R2 for the simple example shown in Figure 1
computed using equation (3) for each posterior simulation draw.
3. Discussion
R2 has well-known problems as a measure of model fit, but it can be a handy quick summary for
linear regressions and generalized linear models (see, for example, Hu et al., 2006), and we would
like to produce it by default when fitting Bayesian regressions. Our preferred solution is to use (3):
predicted variance divided by predicted variance plus error variance. This measure is model based:
all variance terms come from the model, and not directly from the data.
A new issue then arises, though, when fitting a set of models to a single dataset. Now that the
denominator of R2 is no longer fixed, we can no longer interpret an increase in R2 as an improved
fit to a fixed target. We think this particular loss of interpretation is necessary: from a Bayesian
perspective, a concept such as “explained variance” can ultimately only be interpreted in the context
of a model. The denominator of (3) can be interpreted as an estimate of the expected variance of
predicted future data from the model under the assumption that the predictors X are held fixed;
alternatively the predictors can be taken as random, as suggested by Helland (1987) and Tjur
(2009). In either case, we can consider our Bayesian R2 as a data-based estimate of the proportion
of variance explained for new data. If the goal is to see continual progress of the fit to existing data,
one can simply track the decline in the expected error variance, σ².
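For a model fit with rstanarm, one might monitor this quantity from the posterior draws of σ; the call below assumes a stan_glm fit named fit_bayes whose residual standard deviation parameter is named sigma.

# Posterior draws of the expected error variance sigma^2 (assumes an rstanarm fit).
sigma_draws <- as.matrix(fit_bayes, pars = "sigma")
summary(as.vector(sigma_draws)^2)   # compare this summary across candidate models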
Another issue that arises when using R2 to evaluate and compare models is overfitting. As with
other measures of predictive model fit, overfitting should be less of an issue with Bayesian inference
because averaging over the posterior distribution is more conservative than taking a least-squares
or maximum likelihood fit, but predictive accuracy for new data will still on average be lower, in
expectation, than for the data used to fit the model (Gelman et al., 2014). One could construct an
overfitting-corrected R2 in the same way that is done for log-score measures via cross-validation
(Vehtari et al., 2017). In the present paper we are trying to stay close to the spirit of the original R2
in quantifying the model’s fit to the data at hand.
References
Gelman, A., J. Hwang, and A. Vehtari (2014). Understanding predictive information criteria for
Bayesian models. Statistics and Computing 24, 997–1016.
Gelman, A. and I. Pardoe (2006). Bayesian measures of explained variance and pooling in multilevel
(hierarchical) models. Technometrics 48, 241–251.
Helland, I. S. (1987). On the interpretation and use of R2 in regression analysis. Biometrics 43,
61–69.
Hu, B., M. Palta, and J. Shao (2006). Properties of R2 statistics for logistic regression. Statistics in
Medicine 25, 1383–1395.
Kent, J. T. and J. O’Quigley (1988). Measures of dependence for censored survival data.
Biometrika 75, 525–534.
Tjur, T. (2009). Coefficient of determination in logistic regression models—A new proposal: The
coefficient of discrimination. American Statistician 63, 366–372.
Vehtari, A., A. Gelman, and J. Gabry (2017). Practical Bayesian model evaluation using leave-one-out
cross-validation and WAIC. Statistics and Computing 27, 1413–1432.
Xu, R. (2003). Measuring explained variation in linear mixed-effects models. Statistics in Medicine 22,
3527–3541.
Appendix
This simple version of the bayes_R2 function works with Bayesian linear regressions fit using the
stan_glm function in the rstanarm package.
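A minimal sketch of such a function, assuming a Gaussian stan_glm fit whose residual standard deviation parameter is named sigma, is:

## Minimal sketch of bayes_R2 for Gaussian linear regressions fit with stan_glm.
bayes_R2 <- function(fit) {
  y_pred <- rstanarm::posterior_linpred(fit)              # S x N matrix of linear predictors
  var_fit <- apply(y_pred, 1, var)                        # variance of predicted values, per draw
  var_res <- as.vector(as.matrix(fit, pars = "sigma"))^2  # modeled residual variance, per draw
  var_fit / (var_fit + var_res)                           # equation (3), one value per draw
}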
## Compute Bayesian R2
rsq_bayes <- bayes_R2(fit_bayes)
hist(rsq_bayes)
print(c(median(rsq_bayes), mean(rsq_bayes), sd(rsq_bayes)))
Expanding the code to work for other generalized linear models requires some additional steps,
including setting transform=TRUE in the call to posterior_linpred (to apply the inverse-link
function to the linear predictor), the specification of the formula for var_res for each distribution class, and code to accommodate multilevel models fit using stan_glmer.
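As one example of these steps, a logistic-regression version might look like the following sketch; the function name bayes_R2_logit is ours, and it assumes a binomial stan_glm fit.

## Sketch of a logistic-regression version, following the steps above.
bayes_R2_logit <- function(fit) {
  # transform=TRUE applies the inverse-link, giving fitted probabilities pi_n.
  p_pred <- rstanarm::posterior_linpred(fit, transform = TRUE)
  var_fit <- apply(p_pred, 1, var)              # variance of the predictive means, per draw
  var_res <- rowMeans(p_pred * (1 - p_pred))    # (1/N) sum_n pi_n (1 - pi_n), per draw
  var_fit / (var_fit + var_res)
}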