
Comparing Two Linear Models with anova() in R

Last Updated : 13 Sep, 2024

Comparing two linear models is a fundamental task in statistical analysis, especially when determining whether a more complex model fits the data significantly better than a simpler one. In R, the anova() function performs an Analysis of Variance (ANOVA) that compares nested models.

What is a Linear Model?

A linear model describes the relationship between a response variable (dependent variable) and one or more explanatory variables (independent variables) using a linear equation.
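In symbols, a linear model for observations i = 1, ..., n can be written as below, where y_i is the response, x_{i1}, ..., x_{ip} are the p explanatory variables, the β coefficients are estimated by least squares, and ε_i is a random error term. The second and third lines are the two models used in the example later in this article; their coefficients are estimated separately, so β_0 and β_1 generally take different values in each model.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n

\text{Model 1: } \mathrm{mpg}_i = \beta_0 + \beta_1\,\mathrm{wt}_i + \varepsilon_i
\text{Model 2: } \mathrm{mpg}_i = \beta_0 + \beta_1\,\mathrm{wt}_i + \beta_2\,\mathrm{hp}_i + \varepsilon_i
```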

ANOVA for Comparing Models

When comparing two nested models (the simpler model's terms are a subset of the complex model's terms, and both are fitted to the same observations), ANOVA tests whether the extra terms significantly improve the fit. The anova() function in R performs this comparison by calculating an F-statistic and a p-value. The null hypothesis is that the simpler model is adequate, that is, the coefficients of the extra terms are all zero; the alternative hypothesis is that at least one of them is nonzero, so the more complex model fits better. If the p-value is small (typically less than 0.05), we reject the null hypothesis and conclude that the complex model provides a significantly better fit.
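For two nested models, the F-statistic compares the drop in the residual sum of squares per extra parameter with the residual mean square of the larger model. Here RSS_1 and df_1 are the residual sum of squares and residual degrees of freedom of the simpler model, and RSS_2 and df_2 are those of the complex model. Assuming the usual linear-model conditions (independent, normally distributed errors with constant variance), F follows an F distribution with df_1 - df_2 and df_2 degrees of freedom under the null hypothesis, and the p-value is its upper-tail probability.

```latex
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2) / (\mathrm{df}_1 - \mathrm{df}_2)}{\mathrm{RSS}_2 / \mathrm{df}_2}
\qquad
p = \Pr\left( F_{\mathrm{df}_1 - \mathrm{df}_2,\ \mathrm{df}_2} > F \right)
```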

anova(model1, model2)

  • model1: The simpler model.
  • model2: The more complex model.
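anova() does not itself verify that the two models are nested, so it is worth checking before interpreting the result. The sketch below uses a hypothetical helper, is_nested_by_terms() (not part of R), for a rough check: every term of the smaller model must appear in the larger one, the larger model must have extra terms, and both models must use the same number of observations. It compares term labels only, so nesting obtained by reparameterization (for example, a transformed variable) is not detected.

```r
# Hypothetical helper, not part of base R: rough check that `small` is nested in `large`
is_nested_by_terms <- function(small, large) {
  small_terms <- attr(terms(small), "term.labels")
  large_terms <- attr(terms(large), "term.labels")
  all(small_terms %in% large_terms) &&           # every smaller-model term is in the larger model
    length(small_terms) < length(large_terms) && # the larger model has extra terms
    nobs(small) == nobs(large)                   # both fitted to the same number of rows
}

model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)

is_nested_by_terms(model1, model2)  # TRUE
is_nested_by_terms(model2, model1)  # FALSE: hp is not a term of model1

# Only run the comparison when the rough check passes
if (is_nested_by_terms(model1, model2)) print(anova(model1, model2))
```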

Let’s walk through comparing two linear models using the mtcars dataset in the R programming language. The mtcars dataset contains data on fuel consumption and other aspects of automobile design and performance for 32 cars.

Step 1: Loading the Data

We will use the mtcars dataset, which is preloaded in R. First, inspect the data:

R
# Load the dataset
data(mtcars)

# Display the first few rows of the dataset
head(mtcars)

Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
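Before fitting, it can help to confirm the size of the data and check the three columns the models use. This is an optional sketch; the missing-value check matters because lm() drops incomplete rows by default, and the models being compared should be fitted to the same observations.

```r
# mtcars ships with base R in the 'datasets' package
data(mtcars)

# Size of the data: 32 cars (rows) and 11 variables (columns)
dim(mtcars)  # [1] 32 11

# Summary statistics for the three columns used by the models
summary(mtcars[, c("mpg", "wt", "hp")])

# Any missing values in those columns? (FALSE for mtcars)
anyNA(mtcars[, c("mpg", "wt", "hp")])
```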

Step 2: Building the Models

We will build two models:

  • Model 1 (Simpler): This model predicts the miles per gallon (mpg) using only the weight (wt) of the car.
  • Model 2 (Complex): This model predicts mpg using both weight (wt) and horsepower (hp).
R
# Build Model 1: mpg as a function of weight (wt)
model1 <- lm(mpg ~ wt, data = mtcars)

# Build Model 2: mpg as a function of weight (wt) and horsepower (hp)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
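The quantities that anova() compares can be read directly from the fitted models: df.residual() gives the residual degrees of freedom (the number of observations minus the number of estimated coefficients), and for lm objects deviance() returns the residual sum of squares. This optional sketch prints both, along with the estimated coefficients.

```r
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)

# Residual degrees of freedom: n minus the number of estimated coefficients
df.residual(model1)  # 32 - 2 = 30
df.residual(model2)  # 32 - 3 = 29

# For lm objects, deviance() is the residual sum of squares (RSS)
deviance(model1)
deviance(model2)

# Estimated coefficients of each model
coef(model1)
coef(model2)
```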

Step 3: Comparing the Models with anova()

Now, we use the anova() function to compare the two models.

R
# Compare the two models using ANOVA
anova_result <- anova(model1, model2)
print(anova_result)

Output:

Analysis of Variance Table

Model 1: mpg ~ wt
Model 2: mpg ~ wt + hp
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     30 278.32                                
2     29 195.05  1    83.274 12.381 0.001451 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  • Model 1 (Simpler) has 30 residual degrees of freedom (Res.Df) and a residual sum of squares (RSS) of 278.32.
  • Model 2 (Complex) has 29 residual degrees of freedom and an RSS of 195.05.
  • The models differ by 1 degree of freedom (Df), and adding hp reduces the RSS by 83.274 (Sum of Sq).
  • The F-statistic is 12.381, and the p-value is 0.001451.
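The values in the table can be reproduced from the two fitted models: the F-statistic is the drop in RSS per extra degree of freedom divided by the residual mean square of the larger model, and the p-value is the upper tail of the F distribution, obtained here with pf(). This sketch is for illustration only; anova() performs the same calculation for you.

```r
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)

rss1 <- deviance(model1); df1 <- df.residual(model1)  # 278.32, 30
rss2 <- deviance(model2); df2 <- df.residual(model2)  # 195.05, 29

# Drop in RSS per extra degree of freedom (the "Sum of Sq" divided by "Df")
numerator <- (rss1 - rss2) / (df1 - df2)
# Residual mean square of the larger model
denominator <- rss2 / df2
f_manual <- numerator / denominator

# Upper-tail probability of the F distribution with (df1 - df2, df2) degrees of freedom
p_manual <- pf(f_manual, df1 - df2, df2, lower.tail = FALSE)

# Compare with the values reported by anova()
anova_result <- anova(model1, model2)
all.equal(f_manual, anova_result$F[2])
all.equal(p_manual, anova_result[["Pr(>F)"]][2])
```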

Since the p-value (0.001451) is well below 0.05, we reject the null hypothesis and conclude that Model 2 (which includes both wt and hp) provides a significantly better fit than Model 1 (which includes only wt).
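The same decision can be made programmatically by pulling the F-statistic and p-value out of the anova() table, whose second row holds the comparison. Because Model 2 adds exactly one coefficient (hp), this F-test is also equivalent to the t-test for hp in summary(model2): F equals the squared t statistic and the p-values agree. That equivalence holds only when the models differ by a single parameter. The 0.05 threshold below is the conventional choice, not a requirement.

```r
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
anova_result <- anova(model1, model2)

# Row 2 of the table holds the comparison; row 1 only describes Model 1
f_stat  <- unname(anova_result$F[2])
p_value <- unname(anova_result[["Pr(>F)"]][2])

# Conventional significance level
alpha <- 0.05
if (p_value < alpha) {
  cat("Reject H0: adding hp gives a significantly better fit\n")
} else {
  cat("Fail to reject H0: the simpler model is adequate\n")
}

# One added coefficient: F equals t^2 for hp, and the p-values match
hp_row <- summary(model2)$coefficients["hp", ]
all.equal(unname(hp_row["t value"])^2, f_stat)
all.equal(unname(hp_row["Pr(>|t|)"]), p_value)
```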

Conclusion

In this article, we have explored how to compare two linear models using the anova() function in R. The ANOVA test provides a formal way to determine whether a more complex model provides a significantly better fit than a simpler model. This technique is particularly useful in model selection, stepwise regression, and hypothesis testing.
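For model selection, anova() also accepts more than two nested models. In the sketch below, model3 (adding qsec, the quarter-mile time) is a hypothetical extra candidate chosen only for illustration, and its results are not discussed in this article. Each row after the first compares a model with the one listed before it; as described in R's help page for anova.lm, the F statistic for each row uses the residual mean square of the largest model in the list, so these F values can differ from separate pairwise comparisons.

```r
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
# Hypothetical third candidate, for illustration only
model3 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Sequence of nested models, listed from simplest to most complex
seq_result <- anova(model1, model2, model3)
print(seq_result)
```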

