How to Extract the Residuals and Predicted Values from a Linear Model in R?
Last Updated: 13 Sep, 2024
Extracting residuals and predicted (fitted) values from a linear model is essential to understanding the model's performance. The lm() function fits linear models in R, and the built-in fitted() and residuals() functions extract these values from the fitted model. This article walks through the steps and the theory behind residuals and predicted values.
Linear Model
A linear model assumes that the relationship between the dependent variable Y and the independent variables X1, X2,…, Xn can be modeled as:
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n + \epsilon
- β0 is the intercept.
- β1,…,βn are the coefficients.
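- ϵ is the error term, the part of Y that the predictors do not explain. It is not observed directly; the residuals of a fitted model are its sample estimates.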
Predicted Values
Predicted (or fitted) values are the estimates of the dependent variable Y produced by the fitted model. They represent the expected value of Y for given values of the independent variables, computed with the estimated coefficients (written with a hat) in place of the unknown true ones:
\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_n X_n
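As a minimal sketch of this formula, the block below fits the same mpg ~ wt model used later in the article and rebuilds the fitted values by hand from the estimated coefficients. The variable names b and manual_fitted are illustrative only.

```r
# Fit the same simple regression used later in this article
model <- lm(mpg ~ wt, data = mtcars)

# Estimated intercept and slope
b <- coef(model)

# Fitted values computed by hand: intercept + slope * wt
manual_fitted <- b[["(Intercept)"]] + b[["wt"]] * mtcars$wt

# Compare with fitted(), dropping the car names that fitted() attaches
all.equal(manual_fitted, unname(fitted(model)))
```

The comparison should return TRUE, showing that fitted() is just the regression equation evaluated at the observed predictor values.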
Residuals
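Residuals represent the difference between the observed values of Y and the predicted values Ŷ. They indicate how far the model's predictions deviate from the actual values: a positive residual means the model under-predicted, and a negative residual means it over-predicted.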
\text{Residual} = Y - \hat{Y}
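The definition can be checked directly in R. This is a small sketch on the mtcars model used in this article; manual_residuals is an illustrative name.

```r
# Fit the model used throughout this article
model <- lm(mpg ~ wt, data = mtcars)

# Residual = observed value minus fitted value
manual_residuals <- mtcars$mpg - fitted(model)

# The hand-computed values should agree with residuals()
all.equal(manual_residuals, residuals(model))
```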
Extracting Residuals and Predicted Values
We will use the built-in mtcars dataset for this example, fitting a linear model with mpg as the dependent variable and wt (weight) as the independent variable.
Step 1: Fit the Linear Model
First, fit the linear model with lm() and print its summary.
R
# Fit a linear model
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
Output:
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
Step 2: Extract the Predicted Values
To extract the predicted values, use the fitted() function.
R
# Extract predicted values
predicted_values <- fitted(model)
head(predicted_values)
Output:
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant
         23.28261          21.91977          24.88595          20.10265          18.90014          18.79325
The fitted() function returns the predicted (fitted) values for every observation that was used to fit the model.
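fitted() only covers the rows used for fitting. For predictions at other predictor values, base R provides predict(). The sketch below assumes three hypothetical car weights (in the dataset's units of 1000 lbs); new_cars is an illustrative name, not part of the original example.

```r
# Fit the model used throughout this article
model <- lm(mpg ~ wt, data = mtcars)

# With no new data, predict() gives the same values as fitted()
all.equal(predict(model), fitted(model))

# Hypothetical new cars; the column name must match the predictor in the formula
new_cars <- data.frame(wt = c(2.5, 3.0, 3.5))

# Predicted mpg for the hypothetical weights
new_predictions <- predict(model, newdata = new_cars)
new_predictions
```

Because the estimated slope for wt is negative, the predicted mpg decreases as the assumed weight increases.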
Step 3: Extract the Residuals
To extract the residuals, use the residuals() function.
R
# Extract residuals
residuals_values <- residuals(model)
head(residuals_values)
Output:
        Mazda RX4     Mazda RX4 Wag        Datsun 710    Hornet 4 Drive Hornet Sportabout           Valiant
       -2.2826106        -0.9197704        -2.0859521         1.2973499        -0.2001440        -0.6932545
The residuals() function returns the difference between the observed and predicted values for every observation.
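Two quick sanity checks on the extracted residuals are sketched below. They rely on standard least-squares properties: with an intercept in the model the residuals sum to zero (up to floating-point rounding), and the residual standard error reported by summary() is the square root of the residual sum of squares divided by the residual degrees of freedom. The names res and rse are illustrative.

```r
# Fit the model used throughout this article
model <- lm(mpg ~ wt, data = mtcars)
res <- residuals(model)

# Residuals of a model with an intercept sum to (numerically) zero
sum(res)

# Residual standard error computed by hand: sqrt(RSS / residual df)
rse <- sqrt(sum(res^2) / df.residual(model))

# sigma() returns the same quantity from the fitted model
all.equal(rse, sigma(model))
```

The value of rse should round to 3.046, matching the "Residual standard error" line in the summary output from Step 1.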
Conclusion
Extracting the residuals and predicted values from a linear model in R is simple using the fitted() and residuals() functions. These values are crucial for evaluating model performance and identifying where the model fits the data poorly. Plotting the residuals against the fitted values gives a clearer picture of the model's fit.
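One possible way to bring the pieces together is sketched below: the observed values, fitted values and residuals are collected in a single data frame and then drawn as a basic residuals-versus-fitted plot with base graphics. The data frame name diagnostics and the plot labels are illustrative choices.

```r
# Fit the model used throughout this article
model <- lm(mpg ~ wt, data = mtcars)

# Collect observed values, fitted values and residuals side by side
diagnostics <- data.frame(
  observed = mtcars$mpg,
  fitted   = fitted(model),
  residual = residuals(model)
)
head(diagnostics)

# Residuals vs fitted values; points should scatter around zero without a pattern
plot(diagnostics$fitted, diagnostics$residual,
     xlab = "Fitted mpg", ylab = "Residual",
     main = "Residuals vs fitted values")
abline(h = 0, lty = 2)  # dashed reference line at zero
```

A visible curve or funnel shape in this plot would suggest that a straight line in wt does not fully describe mpg.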