How to Calculate R-Squared for glm in R
Last Updated: 18 Jun, 2024
R-squared (R²) is a measure of goodness-of-fit that quantifies the proportion of variance in the dependent variable explained by the independent variables in a regression model. While it is most commonly used in linear regression, an analogous quantity can be calculated for generalized linear models (GLMs), which cover many regression models beyond the linear framework. In this guide, we'll walk through the process of calculating R-squared for a GLM in the R programming language.
Understanding R-Squared in GLM
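In a GLM, R-squared is usually defined as the proportion of the null deviance that the fitted model explains: R² = 1 - (residual deviance / null deviance). Deviance is a measure of model fit analogous to the residual sum of squares in linear regression, and the null deviance is the deviance of an intercept-only model. Because this is a pseudo R-squared rather than a literal share of variance, it is best read as a relative measure of fit. It ranges from 0 to 1, where a higher value indicates a better fit of the model to the data.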
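To make the analogy with linear regression concrete, here is a minimal sketch, assuming the built-in mtcars data and an arbitrary choice of predictors (wt and hp). For the Gaussian family the residual deviance is the residual sum of squares and the null deviance is the total sum of squares, so the deviance-based value should match the R-squared reported by lm().

```r
# Gaussian GLM and ordinary linear model on the built-in mtcars data
glm_fit <- glm(mpg ~ wt + hp, data = mtcars, family = gaussian)
lm_fit  <- lm(mpg ~ wt + hp, data = mtcars)

# Deviance-based R-squared: 1 - residual deviance / null deviance
r2_deviance <- 1 - glm_fit$deviance / glm_fit$null.deviance

# Classical R-squared reported by lm()
r2_lm <- summary(lm_fit)$r.squared

# For the Gaussian family the two agree up to floating-point error
all.equal(r2_deviance, r2_lm)
```

For non-Gaussian families there is no such exact match, which is why the deviance-based value is treated as a pseudo R-squared.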
Calculating R-Squared for GLM
To calculate R-squared for GLMs in R, you first need to fit the GLM using the glm() function. Once the model is fitted, you can extract the deviance and null deviance from the model summary.
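As a rough illustration of where these two numbers live, the sketch below fits a small Poisson model to the built-in warpbreaks data (an arbitrary choice made only for this example) and reads the residual and null deviance both from the fitted object and from its summary.

```r
# Small Poisson example on the built-in warpbreaks data
fit <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Residual deviance: stored on the fit and also returned by deviance()
fit$deviance
deviance(fit)

# Null deviance: deviance of the intercept-only model
fit$null.deviance

# summary() carries the same two numbers
s <- summary(fit)
c(residual = s$deviance, null = s$null.deviance)
```

Reading the components from the fitted object avoids recomputing the summary when only the two deviances are needed.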
The steps below walk through how to calculate R-squared for a glm in R.
Step 1: Create Dataset
Let's generate synthetic data for a logistic regression analysis. We'll create a dataset with three predictor variables and a binary outcome variable. Then, we'll fit a logistic regression model to predict the binary outcome from the predictors and calculate the R-squared value.
R
# Set seed for reproducibility
set.seed(123)
# Generate synthetic data
n <- 1000 # Number of observations
x1 <- rnorm(n) # Predictor variable 1
x2 <- rnorm(n) # Predictor variable 2
x3 <- rnorm(n) # Predictor variable 3
epsilon <- rnorm(n, mean = 0, sd = 0.5) # Error term
beta0 <- -1 # Intercept
beta1 <- 0.5 # Coefficient for x1
beta2 <- 0.3 # Coefficient for x2
beta3 <- 0.2 # Coefficient for x3
prob <- plogis(beta0 + beta1*x1 + beta2*x2 + beta3*x3 + epsilon)
y <- rbinom(n, 1, prob) # Binary response variable
# Create dataset
data <- data.frame(y, x1, x2, x3)
head(data)
Output:
y x1 x2 x3
1 0 -0.56047565 -0.99579872 -0.5116037
2 0 -0.23017749 -1.03995504 0.2369379
3 0 1.55870831 -0.01798024 -0.5415892
4 0 0.07050839 -0.13217513 1.2192276
5 1 0.12928774 -2.54934277 0.1741359
6 0 1.71506499 1.04057346 -0.6152683
Step 2: Fit the GLM Model
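Next, fit a GLM to the data using the glm() function in R. Specify the family and link function that match the type of response variable (for example, Gaussian for a continuous response, binomial for a binary response, Poisson for counts). The outcome here is binary, so we use the binomial family, whose default link is the logit.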
R
# Fit the GLM
model <- glm(y ~ x1 + x2 + x3, data = data, family = binomial)
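The same call pattern applies to other response types. The following sketch, which uses the built-in mtcars and InsectSprays datasets purely for illustration, fits one model per family and spells out each default link explicitly.

```r
# Continuous response: Gaussian family with identity link (same fit as lm)
fit_gaussian <- glm(mpg ~ wt, data = mtcars,
                    family = gaussian(link = "identity"))

# Binary response: binomial family with logit link (logistic regression)
fit_binomial <- glm(am ~ wt, data = mtcars,
                    family = binomial(link = "logit"))

# Count response: Poisson family with log link
fit_poisson <- glm(count ~ spray, data = InsectSprays,
                   family = poisson(link = "log"))

# Confirm which family and link each fit used
sapply(list(fit_gaussian, fit_binomial, fit_poisson),
       function(f) paste(f$family$family, f$family$link))
```

Every one of these fitted objects stores deviance and null.deviance, so the R-squared calculation in the next step carries over unchanged.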
Step 3: Calculate R-Squared
To calculate R-squared for the GLM, we compute the share of the null deviance that the model explains, which is one minus the ratio of the residual deviance to the null deviance. Both quantities are available as the deviance and null.deviance components of the fitted model.
R
# Calculate R-squared
# Use the model fitted in Step 2 (named `model`)
resid_deviance <- summary(model)$deviance
null_deviance <- summary(model)$null.deviance
rsquared <- 1 - (resid_deviance / null_deviance)
# Print the R-squared value
print(rsquared)
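Output:
The printed value depends on the simulated data. Because the coefficients used to generate the data are modest (0.5, 0.3 and 0.2), expect a small value, nearer 0 than 1.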
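If the calculation is needed for several models, it can be wrapped in a small function. The sketch below is one possible way to do that: deviance_r2() is a hypothetical helper name, not part of base R, and the simulated data (seed, sample size, coefficients) are assumptions chosen only to keep the example self-contained.

```r
# Hypothetical helper (not part of base R): deviance-based pseudo R-squared
deviance_r2 <- function(fit) {
  stopifnot(inherits(fit, "glm"))
  1 - fit$deviance / fit$null.deviance
}

# Self-contained demonstration on simulated binary data
set.seed(42)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))
fit <- glm(y ~ x, family = binomial)

# Share of the null deviance explained by x
r2 <- deviance_r2(fit)
r2

# An intercept-only model explains essentially none of the null deviance
fit0 <- glm(y ~ 1, family = binomial)
deviance_r2(fit0)
```

The intercept-only check is a quick sanity test: its residual deviance matches the null deviance up to numerical tolerance, so the helper should return a value at or very near 0.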
Conclusion
In this guide, we have demonstrated how to calculate R-squared for a GLM in R using a step-by-step approach. The deviance-based R-squared provides a useful measure of goodness-of-fit for GLMs, showing how much of the null deviance the predictors account for. By understanding and calculating it, analysts can compare candidate GLMs fitted to the same data and make informed decisions about model selection and interpretation.