Comparing Two Linear Models with anova() in R
Last Updated: 13 Sep, 2024
Comparing two linear models is a fundamental task in statistical analysis, especially when deciding whether a more complex model fits the data significantly better than a simpler one. In R, the anova() function performs an Analysis of Variance (ANOVA) to compare nested models.
What is a Linear Model?
A linear model describes the relationship between a response variable (dependent variable) and one or more explanatory variables (independent variables) using a linear equation.
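In symbols, a linear model with p explanatory variables and n observations is usually written as below (standard textbook notation, not taken from the original article):

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n
```

Here y_i is the response for observation i, the x_{ij} are the explanatory variables, the β coefficients are estimated by least squares (which is what lm() does), and ε_i is the random error. In R's formula syntax, mpg ~ wt corresponds to p = 1 with wt as the single explanatory variable.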
ANOVA for Comparing Models
The Analysis of Variance (ANOVA) technique compares two nested models (the terms of the simpler model are a subset of the terms of the more complex one) to determine whether the more complex model provides a significantly better fit to the data. The anova() function in R performs this comparison by calculating an F-statistic and a p-value. The null hypothesis is that the simpler model is adequate, meaning the extra coefficients in the complex model are all zero. The alternative hypothesis is that at least one of them is nonzero, so the complex model fits better. If the p-value is small (typically less than 0.05), we reject the null hypothesis and conclude that the complex model provides a significantly better fit.
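For two nested models fitted by least squares, the F-statistic compares the drop in residual sum of squares with the residual variance of the larger model. Writing RSS_1 and df_1 for the residual sum of squares and residual degrees of freedom of the simpler model, and RSS_2 and df_2 for the more complex one:

```latex
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2) / (\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2 / \mathrm{df}_2}
```

Under the null hypothesis, F follows an F distribution with df_1 - df_2 and df_2 degrees of freedom, and the p-value is the upper-tail probability of that distribution beyond the observed F.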
Syntax:
anova(model1, model2)
where:
- model1: The simpler model.
- model2: The more complex model (it must contain all the terms of model1).
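anova() is not limited to two models: it also accepts a sequence of nested models, listed from simplest to most complex, and each row tests the change from the model in the row above. A minimal sketch using the mtcars data introduced below; the third model (adding qsec) is purely illustrative and not part of the original example:

```r
# Three nested models, each adding one predictor to the previous one
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
m3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Row 2 tests m2 against m1; row 3 tests m3 against m2
multi_result <- anova(m1, m2, m3)
print(multi_result)
```

By default, every F-test in such a table uses the residual variance of the largest model (m3 here) in the denominator, so row 2 will not exactly match a two-model call such as anova(m1, m2).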
Let’s walk through comparing two linear models using the mtcars dataset in R Programming Language. The mtcars dataset contains data about fuel consumption and other aspects of automobile design and performance for 32 cars.
Step 1: Loading the Data
We will use the mtcars dataset, which is preloaded in R. First, inspect the data:
# Load the dataset
data(mtcars)
# Display the first few rows of the dataset
head(mtcars)
Output:
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
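Since the models below use only mpg (miles per US gallon), wt (weight in 1000 lbs), and hp (gross horsepower), a quick check of those three columns before fitting can be useful. A small sketch using base R functions only:

```r
# mtcars ships with base R (datasets package), so nothing needs installing
dim(mtcars)  # 32 rows (cars), 11 columns (variables)

# Summary statistics for the three variables used in the models
vars <- c("mpg", "wt", "hp")
summary(mtcars[, vars])

# Count missing values per column (lm() would silently drop such rows)
colSums(is.na(mtcars[, vars]))
```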
Step 2: Building the Models
We will build two models:
- Model 1 (Simpler): This model predicts the miles per gallon (mpg) using only the weight (wt) of the car.
- Model 2 (Complex): This model predicts mpg using both weight (wt) and horsepower (hp).
# Build Model 1: mpg as a function of weight (wt)
model1 <- lm(mpg ~ wt, data = mtcars)
# Build Model 2: mpg as a function of weight (wt) and horsepower (hp)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
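The comparison is only valid when the models are nested (every term of the simpler model also appears in the complex one) and both are fitted to the same observations. A minimal sketch of a sanity check for the two models above; the variable names is_nested and same_n are illustrative, and running such a check is a convenience rather than something anova() requires:

```r
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)

# Term labels of each model: "wt" and c("wt", "hp")
terms1 <- attr(terms(model1), "term.labels")
terms2 <- attr(terms(model2), "term.labels")

# Nested: every term of model1 appears in model2
is_nested <- all(terms1 %in% terms2)

# Same data: both models were fitted to the same number of observations
same_n <- nobs(model1) == nobs(model2)

print(c(nested = is_nested, same_n = same_n))
```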
Step 3: Comparing the Models with anova()
Now, we use the anova() function to compare the two models.
# Compare the two models using ANOVA
anova_result <- anova(model1, model2)
print(anova_result)
Output:
Analysis of Variance Table
Model 1: mpg ~ wt
Model 2: mpg ~ wt + hp
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     30 278.32
2     29 195.05  1    83.274 12.381 0.001451 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
- Model 1 (Simpler) has 30 residual degrees of freedom (Res.Df) and a residual sum of squares (RSS) of 278.32.
- Model 2 (Complex) has 29 residual degrees of freedom and an RSS of 195.05.
- The difference in degrees of freedom (Df) between the two models is 1 (the extra hp coefficient), and the reduction in the residual sum of squares (Sum of Sq) is 83.274.
- The F-statistic is 12.381, and the p-value is 0.001451.
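As a check on the table, plugging its values into the F-statistic formula F = [(RSS_1 - RSS_2) / (df_1 - df_2)] / (RSS_2 / df_2), with subscript 1 for Model 1 and 2 for Model 2, reproduces the reported F up to rounding:

```latex
F = \frac{(278.32 - 195.05) / (30 - 29)}{195.05 / 29}
  = \frac{83.27}{6.726}
  \approx 12.38
```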
Since the p-value (0.001451) is well below 0.05, we reject the null hypothesis and conclude that Model 2 (which includes both wt and hp) provides a significantly better fit than Model 1 (which includes only wt).
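The object returned by anova() is a data frame, so the F-statistic and p-value can be read from it directly, for example to automate the decision at the 0.05 level. A minimal sketch; the threshold alpha and the variable names are illustrative choices, not part of the original example:

```r
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
anova_result <- anova(model1, model2)

# Row 2 holds the comparison of model2 against model1
f_stat  <- anova_result$F[2]
p_value <- anova_result[["Pr(>F)"]][2]

alpha <- 0.05  # significance level (illustrative choice)
if (p_value < alpha) {
  message("Model 2 fits significantly better (F = ", round(f_stat, 3),
          ", p = ", signif(p_value, 4), ")")
} else {
  message("No significant improvement from adding hp")
}
```

Reading the values by column name ("F", "Pr(>F)") rather than printing the table keeps the decision rule reproducible in scripts.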
Conclusion
In this article, we have explored how to compare two nested linear models using the anova() function in R. The ANOVA F-test provides a formal way to determine whether a more complex model provides a significantly better fit than a simpler model. This technique is particularly useful in model selection, stepwise regression, and hypothesis testing.