How to Calculate R-Squared for glm in R
Last Updated: 18 Jun, 2024
R-squared (R²) is a goodness-of-fit measure that quantifies the proportion of variance in the dependent variable explained by the independent variables in a regression model. While it is most commonly reported for linear regression, an analogous quantity can be calculated for generalized linear models (GLMs), which extend regression beyond the linear framework. In this guide, we'll walk through calculating R-squared for a GLM in the R Programming Language.
Understanding R-Squared in GLM
For a GLM there is no single, universally agreed R-squared. Instead, a deviance-based pseudo R-squared is typically used: it represents the proportion of the null deviance explained by the model, computed as 1 - (residual deviance / null deviance). Deviance is a measure of model fit analogous to the residual sum of squares in linear regression. The pseudo R-squared ranges from 0 to 1, with higher values indicating that the model explains more of the deviance.
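As a quick, self-contained illustration of this formula (separate from the walkthrough below), the following sketch uses the built-in mtcars dataset and models the binary column vs from mpg; the object names demo_fit and demo_rsq are chosen purely for demonstration.
R
# Minimal sketch: deviance-based pseudo R-squared on the built-in mtcars data
demo_fit <- glm(vs ~ mpg, data = mtcars, family = binomial)

# Proportion of the null deviance explained by the model
demo_rsq <- 1 - demo_fit$deviance / demo_fit$null.deviance
demo_rsq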
Calculating R-Squared for GLM
To calculate R-squared for a GLM in R, you first fit the model with the glm() function. Once the model is fitted, you can extract the residual deviance and the null deviance from the fitted object (or its summary) and plug them into the formula above.
Below, we walk through the calculation step by step in R.
Step 1: Create Dataset
Let's generate synthetic data for a logistic regression analysis. We'll create a dataset with three predictor variables and a binary outcome variable, fit a logistic regression model to predict the outcome from the predictors, and then calculate the R-squared value.
R
# Set seed for reproducibility
set.seed(123)
# Generate synthetic data
n <- 1000 # Number of observations
x1 <- rnorm(n) # Predictor variable 1
x2 <- rnorm(n) # Predictor variable 2
x3 <- rnorm(n) # Predictor variable 3
epsilon <- rnorm(n, mean = 0, sd = 0.5) # Error term
beta0 <- -1 # Intercept
beta1 <- 0.5 # Coefficient for x1
beta2 <- 0.3 # Coefficient for x2
beta3 <- 0.2 # Coefficient for x3
prob <- plogis(beta0 + beta1*x1 + beta2*x2 + beta3*x3 + epsilon)
y <- rbinom(n, 1, prob) # Binary response variable
# Create dataset
data <- data.frame(y, x1, x2, x3)
head(data)
Output:
y x1 x2 x3
1 0 -0.56047565 -0.99579872 -0.5116037
2 0 -0.23017749 -1.03995504 0.2369379
3 0 1.55870831 -0.01798024 -0.5415892
4 0 0.07050839 -0.13217513 1.2192276
5 1 0.12928774 -2.54934277 0.1741359
6 0 1.71506499 1.04057346 -0.6152683
Step 2: Fit the GLM Model
Next, fit the GLM to the data using the glm() function, specifying the family (and, optionally, the link function) appropriate for the response variable, e.g., gaussian for a continuous outcome, poisson for counts, or binomial for a binary outcome. Since our response is binary, we use family = binomial.
R
# Fit the GLM
model <- glm(y ~ x1 + x2 + x3, data = data, family = binomial)
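The same fitting pattern works for other response types. The sketch below is illustrative only: it assumes the simulated predictors x1, x2, x3 and the sample size n from Step 1 are still in the workspace, and the outcomes count_y and cont_y are simulated here purely to show the syntax; they are not part of the dataset above.
R
# Hypothetical Poisson fit for a simulated count outcome
count_y <- rpois(n, lambda = exp(0.2 * x1))
pois_model <- glm(count_y ~ x1 + x2 + x3, family = poisson(link = "log"))

# Hypothetical Gaussian fit for a simulated continuous outcome
cont_y <- 2 + 0.5 * x1 + rnorm(n)
gauss_model <- glm(cont_y ~ x1 + x2 + x3, family = gaussian(link = "identity"))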
Step 3: Calculate R-Squared
To calculate R-squared for the GLM, compute one minus the ratio of the residual deviance to the null deviance. Both quantities are stored on the fitted model object (model$deviance and model$null.deviance) and are also available from summary(model).
R
# Calculate R-squared
deviance <- summary(model)$deviance
null_deviance <- summary(model)$null.deviance
rsquared <- 1 - (deviance / null_deviance)
# Print the R-squared value
print(rsquared)
Output:
[1] 0.7721
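For a binary logistic regression fitted to ungrouped data, the deviance-based value above coincides with McFadden's pseudo R-squared, which can also be computed from log-likelihoods as a cross-check. The following is a minimal sketch that reuses the model object from Step 2 and fits an intercept-only null model.
R
# Cross-check: McFadden's pseudo R-squared from log-likelihoods
null_model <- glm(y ~ 1, data = data, family = binomial)  # intercept-only model
mcfadden <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
mcfadden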
Conclusion
In this guide, we demonstrated how to calculate R-squared for a GLM in R using a step-by-step approach. The deviance-based pseudo R-squared provides a useful measure of goodness-of-fit for GLMs, enabling researchers to assess the extent to which the predictors explain variability in the response. By understanding and calculating this quantity, analysts can evaluate the performance of their GLMs and make informed decisions about model selection and interpretation.