
How to Calculate R-Squared for glm in R

Last Updated : 18 Jun, 2024

R-squared (R²) is a goodness-of-fit measure that quantifies the proportion of variance in the dependent variable explained by the independent variables in a regression model. It is defined for ordinary linear regression. Generalized linear models (GLMs), which extend regression to binary, count, and other non-normal responses, have no single agreed-upon R-squared, so several pseudo-R-squared measures are used instead. In this guide, we'll walk through calculating a widely used deviance-based version for a GLM in the R Programming Language.
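As a sanity check on the idea, a Gaussian GLM with the identity link has a deviance equal to the residual sum of squares. The deviance-based ratio should therefore reproduce the ordinary R-squared from lm(). The sketch below is a minimal check. It uses the built-in mtcars data with the predictors wt and hp, which are purely an illustrative choice.

```r
# Fit the same linear model two ways
fit_glm <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian)
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)

# Deviance-based R-squared: for the Gaussian family, deviance = residual sum of squares
r2_glm <- 1 - fit_glm$deviance / fit_glm$null.deviance

# Classical R-squared reported by lm()
r2_lm <- summary(fit_lm)$r.squared

print(c(glm = r2_glm, lm = r2_lm))
print(all.equal(r2_glm, r2_lm))
```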

Understanding R-Squared in GLM

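In a GLM, a common analogue of R-squared is the proportion of the null deviance explained by the model. Deviance measures lack of fit and plays the role that the residual sum of squares plays in linear regression. The null deviance is the deviance of an intercept-only model. Because that intercept-only model is nested within the fitted model, this pseudo-R-squared ranges from 0 to 1, and a higher value indicates a better fit of the model to the data. For ungrouped binary (0/1) outcomes it is identical to McFadden's pseudo-R-squared. Its values tend to be much lower than the R-squared values typical of linear regression, so the two should not be judged on the same scale.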
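In symbols, let ℓ denote a maximized log-likelihood (for the saturated, fitted, and intercept-only models) and let φ denote the dispersion. This notation is introduced here for illustration. R reports the unscaled deviance, and φ cancels in the ratio. Since ℓ_null ≤ ℓ_model ≤ ℓ_sat for nested models, the ratio lies between 0 and 1.

```latex
D_{\text{model}} = 2\phi\left(\ell_{\text{sat}} - \ell_{\text{model}}\right), \qquad
D_{\text{null}} = 2\phi\left(\ell_{\text{sat}} - \ell_{\text{null}}\right)

R^2_{D} = 1 - \frac{D_{\text{model}}}{D_{\text{null}}}

% Ungrouped 0/1 outcomes (binomial, \phi = 1): \ell_{\text{sat}} = 0, so
R^2_{D} = 1 - \frac{\ell_{\text{model}}}{\ell_{\text{null}}} \quad \text{(McFadden's pseudo-}R^2\text{)}
```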

Calculating R-Squared for GLM

To calculate R-squared for a GLM in R, first fit the model with the glm() function. The fitted object stores the residual deviance (deviance) and the null deviance (null.deviance), and summary() reports the same two values.
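The sketch below shows where these values live. It assumes a logistic model on the built-in mtcars data; am ~ wt is an illustrative choice, not part of this guide's example. For ungrouped 0/1 outcomes the residual deviance also equals -2 times the log-likelihood, and the sketch checks that too.

```r
# Logistic regression on built-in data (illustrative choice)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Residual and null deviance stored directly on the fitted object
dev_resid <- fit$deviance
dev_null  <- fit$null.deviance

# The same values via the deviance() generic and via summary()
dev_generic  <- deviance(fit)
dev_summary  <- summary(fit)$deviance
null_summary <- summary(fit)$null.deviance

# For ungrouped 0/1 data, residual deviance = -2 * log-likelihood
dev_from_loglik <- -2 * as.numeric(logLik(fit))

print(c(residual = dev_resid, null = dev_null, from_loglik = dev_from_loglik))
```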

The steps below walk through the calculation in R.

Step 1: Create Dataset

Let's generate synthetic data for a logistic regression analysis. We'll create a dataset with three predictor variables and a binary outcome variable. Next, we'll fit a logistic regression model that predicts the outcome from the predictors, and then we'll calculate the R-squared value.

R
# Set seed for reproducibility
set.seed(123)

# Generate synthetic data
n <- 1000  # Number of observations
x1 <- rnorm(n)  # Predictor variable 1
x2 <- rnorm(n)  # Predictor variable 2
x3 <- rnorm(n)  # Predictor variable 3
epsilon <- rnorm(n, mean = 0, sd = 0.5)  # Error term
beta0 <- -1  # Intercept
beta1 <- 0.5  # Coefficient for x1
beta2 <- 0.3  # Coefficient for x2
beta3 <- 0.2  # Coefficient for x3
prob <- plogis(beta0 + beta1*x1 + beta2*x2 + beta3*x3 + epsilon) 
y <- rbinom(n, 1, prob)  # Binary response variable

# Create dataset
data <- data.frame(y, x1, x2, x3)
head(data)

Output:

  y          x1          x2         x3
1 0 -0.56047565 -0.99579872 -0.5116037
2 0 -0.23017749 -1.03995504 0.2369379
3 0 1.55870831 -0.01798024 -0.5415892
4 0 0.07050839 -0.13217513 1.2192276
5 1 0.12928774 -2.54934277 0.1741359
6 0 1.71506499 1.04057346 -0.6152683
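Before fitting, it can help to confirm that the outcome is binary and to see how balanced it is. With an intercept of -1, events (y = 1) should be the minority class. This sketch re-creates the Step 1 data so that it runs on its own.

```r
# Re-create the Step 1 data (same seed and same order of random draws)
set.seed(123)
n <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
epsilon <- rnorm(n, mean = 0, sd = 0.5)
prob <- plogis(-1 + 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + epsilon)
y <- rbinom(n, 1, prob)
data <- data.frame(y, x1, x2, x3)

# Counts and proportions of 0s and 1s
print(table(data$y))
print(prop.table(table(data$y)))

# Observed event rate vs. the average of the true probabilities
print(c(observed = mean(data$y), expected = mean(prob)))
```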

Step 2: Fit the GLM Model

First, fit a GLM to the data using the glm() function. Choose the family (e.g., gaussian, binomial, poisson) and, if needed, the link function to match the type of response variable. Our outcome is binary, so we use family = binomial, whose default link is the logit.
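The family object sets both the error distribution and the link. To illustrate another family, the sketch below fits a Poisson model with an explicit log link to the built-in warpbreaks count data (an illustrative dataset choice) and computes the same deviance ratio. It also prints the default links that binomial and poisson use when none is given.

```r
# Default links used when a family is given without one
print(binomial()$link)  # "logit"
print(poisson()$link)   # "log"

# Poisson GLM with an explicit log link on built-in count data (illustrative)
fit_pois <- glm(breaks ~ wool + tension, data = warpbreaks,
                family = poisson(link = "log"))

# The same deviance-based R-squared as for the logistic model
r2_pois <- 1 - fit_pois$deviance / fit_pois$null.deviance
print(r2_pois)
```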

R
# Fit the GLM
model <- glm(y ~ x1 + x2 + x3, data = data, family = binomial)
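Before computing R-squared, it is worth confirming that the iterative fitting algorithm converged and comparing the estimates with the values used in the simulation. The following minimal sketch re-creates the Step 1 data; the name true_beta is introduced here for illustration. The epsilon term adds noise on the logit scale that the model does not see. As a result, the estimates may be slightly shrunk toward zero in addition to ordinary sampling error.

```r
# Re-create the Step 1 data so this block runs on its own
set.seed(123)
n <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
epsilon <- rnorm(n, mean = 0, sd = 0.5)
prob <- plogis(-1 + 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + epsilon)
y <- rbinom(n, 1, prob)
data <- data.frame(y, x1, x2, x3)

model <- glm(y ~ x1 + x2 + x3, data = data, family = binomial)

# TRUE if the iteratively reweighted least squares fit converged
print(model$converged)

# Estimates next to the values used to simulate the data
true_beta <- c("(Intercept)" = -1, x1 = 0.5, x2 = 0.3, x3 = 0.2)
print(cbind(estimate = coef(model), true = true_beta[names(coef(model))]))
```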

Step 3: Calculate R-Squared

The deviance-based R-squared is the share of the null deviance explained by the model, which equals 1 minus the ratio of the residual deviance to the null deviance. Both values are stored in the deviance and null.deviance components of the fitted glm object.

R
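# Calculate the deviance-based (McFadden-type) R-squared
# (use the fitted object from Step 2; the name avoids masking the deviance() function)
resid_deviance <- model$deviance        # residual deviance of the fitted model
null_deviance  <- model$null.deviance   # deviance of the intercept-only model
rsquared <- 1 - (resid_deviance / null_deviance)

# Print the R-squared value
print(rsquared)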

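The exact value printed (R shows it as [1] followed by the number) depends on the simulated data. Because the true coefficients are small relative to the randomness of a binary outcome, expect a modest value well below 1 rather than a value close to it.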
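To reuse the calculation, it can be wrapped in small helper functions. deviance_r2() and deviance_r2_adj() below are hypothetical names, not functions from base R or any package. The adjusted version divides each deviance by its degrees of freedom, mirroring the adjusted R-squared of linear regression, and is only one of several conventions. The sketch also checks that, for 0/1 outcomes, the result matches McFadden's log-likelihood form.

```r
# Hypothetical helper: deviance-based pseudo R-squared for a fitted glm
deviance_r2 <- function(fit) {
  1 - fit$deviance / fit$null.deviance
}

# Hypothetical helper: degrees-of-freedom adjusted variant
# (mirrors adjusted R-squared in linear regression; one convention among several)
deviance_r2_adj <- function(fit) {
  1 - (fit$deviance / fit$df.residual) / (fit$null.deviance / fit$df.null)
}

# Re-create the Step 1 data and model so this block runs on its own
set.seed(123)
n <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
epsilon <- rnorm(n, mean = 0, sd = 0.5)
prob <- plogis(-1 + 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + epsilon)
y <- rbinom(n, 1, prob)
data <- data.frame(y, x1, x2, x3)
model <- glm(y ~ x1 + x2 + x3, data = data, family = binomial)

# McFadden's form, using an explicitly fitted intercept-only model
null_model <- glm(y ~ 1, data = data, family = binomial)
mcfadden <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))

print(c(r2 = deviance_r2(model), r2_adj = deviance_r2_adj(model), mcfadden = mcfadden))
```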

Conclusion

In this guide, we calculated a deviance-based R-squared for a GLM in R step by step. This pseudo-R-squared summarizes how much of the null deviance the predictors explain. That helps researchers judge how well a GLM fits and compare competing models. It is not the same quantity as the R-squared of linear regression. For model selection, it is best used alongside criteria that penalize model complexity, such as AIC.
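Adding a term to a nested GLM can only lower the maximized deviance or leave it unchanged. As a result, the raw deviance-based R-squared never penalizes extra predictors. The sketch below adds a pure-noise predictor to the Step 1 model; noise is a hypothetical column, not part of the original data. It then compares the R-squared values with AIC and a likelihood-ratio test.

```r
# Re-create the Step 1 data so this block runs on its own
set.seed(123)
n <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
epsilon <- rnorm(n, mean = 0, sd = 0.5)
prob <- plogis(-1 + 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + epsilon)
y <- rbinom(n, 1, prob)
data <- data.frame(y, x1, x2, x3)

# Hypothetical extra column: pure noise, unrelated to y
data$noise <- rnorm(n)

r2 <- function(fit) 1 - fit$deviance / fit$null.deviance

m_small <- glm(y ~ x1 + x2 + x3, data = data, family = binomial)
m_big   <- glm(y ~ x1 + x2 + x3 + noise, data = data, family = binomial)

# The larger model's R-squared cannot be lower (up to numerical tolerance)
print(c(small = r2(m_small), big = r2(m_big)))

# AIC adds 2 per estimated parameter; lower is better
print(AIC(m_small, m_big))

# Likelihood-ratio (chi-squared) test of the added term
print(anova(m_small, m_big, test = "Chisq"))
```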

