Hierarchical linear regression using R
Last Updated: 24 Apr, 2025
A linear regression model establishes a relationship between two or more variables, where one variable is dependent and the others are independent. In R Programming Language, linear regression is used to make predictions from given data about a particular topic, helping us gain valuable insights and draw conclusions that support future decisions. Hierarchical linear regression is an extension of standard linear regression that allows for the analysis of hierarchical data (grouped data, or data at different levels).
Understanding Hierarchical Linear Regression
Hierarchical Linear Regression is used when researchers want to study the relationship between variables at different levels. Each level has its own coefficients, which can vary across groups. In R Programming Language, this approach is used to understand the trend of the data both overall and at each individual level.
A hierarchical model is generally used when the data is divided into different groups at different levels, and it helps in understanding the hierarchy across those levels. Fields like education, psychology, and sociology use this model because their data is collected in a multilevel structure.
To implement hierarchical linear regression in R, we can use various packages such as:
- lme4: The lme4 package stands for "Linear Mixed-Effects Models". It uses efficient linear algebra to fit models to hierarchical data and handles large datasets with complex random-effects structures, where observations are not independent and are influenced by multiple sources of variation. It also provides model-fitting and model-diagnostic functions that help users check the adequacy of a model, along with a formula interface for specifying models in R.
- nlme: It stands for "Nonlinear Mixed-Effects Models". This is one of the fundamental R packages for fitting linear and nonlinear mixed-effects models, including data where the relationship between variables is nonlinear. It supports the inclusion of both fixed and random effects, can handle missing values using different techniques, and provides tools for checking model accuracy and for visualization.
- tidyverse: This is a collection of packages providing a set of tools for data manipulation, visualization, and analysis. It includes packages such as ggplot2, dplyr, and readr.
- ggplot2: The ggplot2 library implements the grammar of graphics and is popular for its declarative syntax, used to visualize and plot our data as graphs for better understanding.
Syntax to install and load these packages:
R
#installing important packages
install.packages("lme4")
install.packages("nlme")
install.packages("tidyverse")
install.packages("ggplot2")
#load necessary library
library(lme4)
library(nlme)
library(tidyverse)
library(ggplot2)
Implementing Hierarchical Linear Regression in R
In this article, we will use a real-world example to understand our topic in detail, using different packages for analysis and visualization. We can perform this analysis in the following steps:
- Step 1: Loading the Dataset
- Step 2: Model Specification
- Step 3: Running an Analysis of Variance (ANOVA) and Assessing Model Fit
- Step 4: Making Predictions
- Step 5: Interpretation and Visualization
Hierarchical linear regression using mtcars dataset in R
In this example, we will use the built-in R dataset "mtcars". This dataset has information such as weight, horsepower, quarter-mile time, and miles per gallon (mpg) for different types of cars. We will try to understand how these attributes affect each other at different levels.
Step 1: Loading the Dataset
R
# Loading the Dataset
data(mtcars)
We can also view our data using functions such as head(), tail(), or summary(). The head() function returns the first 6 rows of our dataset; similarly, tail() returns the last 6 rows.
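As a quick sketch, these inspection functions can be applied to mtcars directly:

```r
# View the first six rows of the dataset
head(mtcars)

# View the last six rows
tail(mtcars)

# Summary statistics (min, quartiles, mean, max) for every column
summary(mtcars)

# Dimensions of the dataset: 32 cars and 11 variables
dim(mtcars)
```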
Step 2: Model Specification
In this step, we'll specify the regression models. We'll consider two levels of the analysis: general car features, and make-related characteristics that tend to differ across manufacturers. We will try to understand the influence of each set of attributes on mpg.
R
# Car features level
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)
# Car make level
model_by_make <- lm(mpg ~ disp + drat + wt, data = mtcars)
# summary of the model
summary(model)
summary(model_by_make)
Output:
Call:
lm(formula = mpg ~ wt + hp + qsec, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.8591 -1.6418 -0.4636 1.1940 5.6092
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.61053 8.41993 3.279 0.00278 **
wt -4.35880 0.75270 -5.791 3.22e-06 ***
hp -0.01782 0.01498 -1.190 0.24418
qsec 0.51083 0.43922 1.163 0.25463
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.578 on 28 degrees of freedom
Multiple R-squared: 0.8348, Adjusted R-squared: 0.8171
F-statistic: 47.15 on 3 and 28 DF, p-value: 4.506e-11
summary(model_by_make)
Call:
lm(formula = mpg ~ disp + drat + wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.2342 -2.3719 -0.3148 1.6315 6.2820
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 31.043257 7.099792 4.372 0.000154 ***
disp -0.016389 0.009578 -1.711 0.098127 .
drat 0.843965 1.455051 0.580 0.566537
wt -3.172482 1.217157 -2.606 0.014495 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.951 on 28 degrees of freedom
Multiple R-squared: 0.7835, Adjusted R-squared: 0.7603
F-statistic: 33.78 on 3 and 28 DF, p-value: 1.92e-09
In the above model specification, one model, "model", examines how general car features (weight, horsepower, and quarter-mile time) influence the miles per gallon (mpg) of the cars. The other model, "model_by_make", focuses on how make-related characteristics (displacement, rear axle ratio, and weight) affect mpg. Fitting the two models side by side gives us a two-level view of the data.
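Note that both fits above are ordinary single-level lm() models, since mtcars does not record car make as a grouping variable. As a sketch of a genuinely multilevel (mixed-effects) fit, the nlme package loaded earlier can add a random intercept per group; here the number of cylinders serves as an assumed stand-in for a grouping level:

```r
library(nlme)

# Use cylinder count as an assumed level-2 grouping variable
mtcars$cyl_group <- factor(mtcars$cyl)

# Fixed effects for wt and hp, plus a random intercept for each group
mixed_model <- lme(mpg ~ wt + hp, random = ~ 1 | cyl_group, data = mtcars)

# Fixed-effect estimates and variance components
summary(mixed_model)
VarCorr(mixed_model)
```

The random-intercept term lets the baseline mpg shift by group while the wt and hp slopes are shared across groups, which is the defining feature of a hierarchical model.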
Step 3: Running an Analysis of Variance (ANOVA) and Assessing Model Fit
This step is important for understanding the significance of the model fit.
R
# Conducting Analysis of Variance (ANOVA) for both models
anova_model <- anova(model)
anova_model_by_make <- anova(model_by_make)
# Printing ANOVA results for both models
cat("ANOVA for model:\n")
print(anova_model)
cat("\nANOVA for model by make:\n")
print(anova_model_by_make)
Output:
ANOVA for model:
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
wt 1 847.73 847.73 127.5739 6.131e-12 ***
hp 1 83.27 83.27 12.5319 0.00142 **
qsec 1 8.99 8.99 1.3527 0.25463
Residuals 28 186.06 6.64
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
ANOVA for model by make:
Analysis of Variance Table
Response: mpg
Df Sum Sq Mean Sq F value Pr(>F)
disp 1 808.89 808.89 92.9171 2.152e-10 ***
drat 1 14.26 14.26 1.6383 0.2111
wt 1 59.14 59.14 6.7937 0.0145 *
Residuals 28 243.75 8.71
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The incremental R-squared of 0.05123624 indicates that "model" (weight, horsepower, and quarter-mile time) explains about 5.12 percentage points more of the variance in mpg than "model_by_make" (displacement, rear axle ratio, and weight).
Overall, the ANOVA tables show that in both models most of the explained variance comes from the first predictor entered (weight and displacement, respectively), while the later predictors contribute comparatively little.
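Since "model" and "model_by_make" share only wt, they are not nested, and anova() cannot test one against the other directly. A hedged sketch of a classical hierarchical (blockwise) comparison instead enters predictors in stages into nested models:

```r
# Block 1: weight only
m1 <- lm(mpg ~ wt, data = mtcars)

# Block 2: add horsepower and quarter-mile time
m2 <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# F-test for the joint contribution of the second block
anova(m1, m2)

# Incremental R-squared of the second block over the first
summary(m2)$r.squared - summary(m1)$r.squared
```

Because m1's predictors are a subset of m2's, the anova() F-test here is a valid significance test of the added block.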
Assessing Sums of Squares for both models
R
# Extracting sequential (Type I) sums of squares from the ANOVA tables
# (note: despite the names, these are the per-predictor rows of the table,
# not the classical total/regression/error decomposition)
SST <- anova_model$`Sum Sq`[1]                        # wt row
SSR <- anova_model$`Sum Sq`[2]                        # hp row
SSE <- anova_model$`Sum Sq`[3]                        # qsec row
SST_model_by_make <- anova_model_by_make$`Sum Sq`[1]  # disp row
SSR_model_by_make <- anova_model_by_make$`Sum Sq`[2]  # drat row
SSE_model_by_make <- anova_model_by_make$`Sum Sq`[3]  # wt row
cat("\nSums of Squares:\n")
cat("SST:", SST, "\n")
cat("SSR:", SSR, "\n")
cat("SSE:", SSE, "\n")
Output:
Sums of Squares:
SST: 847.7252
SSR: 83.27418
SSE: 8.988458
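The three numbers printed above are the sequential (Type I) sums of squares for wt, hp, and qsec individually, not the classical decomposition the variable names suggest. As a sketch, the total, regression, and error sums of squares can be computed directly from the fitted model:

```r
model <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Total sum of squares: variability of mpg around its mean
SS_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Error sum of squares: variability left in the residuals
SS_error <- sum(residuals(model)^2)

# Regression sum of squares: variability explained by the model
SS_regression <- SS_total - SS_error

# Their ratio recovers the model's R-squared
SS_regression / SS_total
```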
Calculating Difference in Sum of Squares for both models
R
# Calculating Difference in Sum of Squares for both models
SSR_diff <- SST - SSE
SSR_diff_model_by_make <- SST_model_by_make - SSE_model_by_make
# Deriving F-statistics and p-values for both models (first predictor row)
F_statistic <- anova_model$`F value`[1]
p_value <- anova_model$`Pr(>F)`[1]
F_statistic_model_by_make <- anova_model_by_make$`F value`[1]
p_value_model_by_make <- anova_model_by_make$`Pr(>F)`[1]
cat("\nDifference in Sum of Squares for model by make (SSR_diff_model_by_make):",
SSR_diff_model_by_make, "\n")
cat("\nF-Statistic for model by make:", F_statistic_model_by_make, "\n")
cat("P-Value for model by make:", p_value_model_by_make, "\n")
Output:
Difference in Sum of Squares for model by make (SSR_diff_model_by_make): 749.7461
F-Statistic for model by make: 92.91705
P-Value for model by make: 2.151799e-10
Computing Incremental R-Squared
R
# Computing Incremental R-Squared
incremental_R_squared <- summary(model)$r.squared - summary(model_by_make)$r.squared
cat("\nIncremental R-Squared:", incremental_R_squared, "\n")
Output:
Incremental R-Squared: 0.05123624
Step 4: Making Predictions
We'll predict the mpg for a new, hypothetical car. Here we assign values for the car's displacement, rear axle ratio, and weight (in thousands of pounds).
R
new_data <- data.frame(disp = 120, drat = 3.9, wt = 2.8)
predicted_mpg <- predict(model_by_make, newdata = new_data)
cat("Predicted MPG:", predicted_mpg, "\n")
Output:
Predicted MPG: 23.48507
The value 23.48507 is the estimated miles per gallon (mpg) for a car with these features.
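predict() can also return interval estimates around the point prediction; a sketch for the same hypothetical car:

```r
model_by_make <- lm(mpg ~ disp + drat + wt, data = mtcars)
new_data <- data.frame(disp = 120, drat = 3.9, wt = 2.8)

# 95% prediction interval for a single new car with these features
predict(model_by_make, newdata = new_data, interval = "prediction", level = 0.95)

# 95% confidence interval for the mean mpg of such cars
predict(model_by_make, newdata = new_data, interval = "confidence", level = 0.95)
```

The prediction interval is wider than the confidence interval because it accounts for the residual variability of an individual car, not just the uncertainty in the fitted mean.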
Step 5: Interpretation and Visualization
R
library(ggplot2)
# Adding a new variable 'make' derived from row names of mtcars dataset
mtcars$make <- rownames(mtcars)
#'make' on the x-axis and 'mpg' on the y-axis
ggplot(mtcars, aes(x = make, y = mpg)) +
geom_bar(stat = "identity", fill = "Red") +
labs(title = "MPG by Car Make", x = "Car Make", y = "MPG") +
# Adjust the x-axis text angle for better readability
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Output:
The barplot shows the mpg (miles per gallon) of each car make (brand). This graph provides a clear visualization of the mpg performance across various car makes, allowing for easy comparison.
In this example, we used the built-in R dataset "mtcars" to perform hierarchical regression. We specified two models for different levels of our dataset and predicted mpg values with the predict() function, which draws on the patterns and trends in the data. We also plotted mpg by car make (brand) for better understanding.
Conclusion:
This article focused on a real-life example. With its help, we understood how hierarchical linear regression works, how it uses multilevel data to study the relationship between dependent and independent variables, and how those relationships influence prediction. We also saw the role that the underlying patterns and trends of historical data play in prediction.