Linear Regression on Group Data in R

Last Updated : 11 Apr, 2025

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. In R, it is performed with the lm() function, whose name stands for "linear model". Sometimes analysts need to fit a separate regression to each subset of the data defined by a grouping variable. This is where grouped regression comes in.
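As a minimal illustration of lm() on a single, ungrouped dataset, the sketch below fits a simple linear model. It uses R's built-in mtcars data, which is an assumption made here for illustration and is not part of this article's example.

```r
# Fit a simple linear model on R's built-in mtcars data
# (mtcars is used here only for illustration)
fit <- lm(mpg ~ wt, data = mtcars)

# Named vector holding the intercept and the slope for wt
print(coef(fit))
```

The same formula interface (response ~ predictor) is what gets applied within each group in the steps that follow.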

Implementing Grouped Regression in R

Grouping data in R allows you to analyse subsets of your data that share a common attribute. For example, you might want to fit a linear regression separately for each group defined by a categorical variable. Below is the step-by-step implementation:

1. Install and Load Necessary Packages

First, install (if necessary) and load the dplyr package.

R
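# Install dplyr only if it is missing, then load it
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)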

2. Prepare Your Data

Assume you have a dataset and want to fit a linear regression within each group defined by a categorical variable, here called group. We will generate a sample dataset.

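  • set.seed(123) fixes the random seed so the generated values are reproducible.
  • data.frame() creates a dataset with a group column (10 rows each for A, B and C) and numeric columns x and y.
  • rnorm(30) draws 30 random values from a standard normal distribution (mean 0, standard deviation 1).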
R
# Sample data
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)
print(data)

Output:

A data frame with 30 rows and three columns (group, x and y), with 10 rows in each of the groups A, B and C.
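Before fitting, it can help to confirm that the grouping column looks as expected. The sketch below rebuilds the same sample data and checks the group sizes with base R; group_sizes is a name introduced here for illustration. Note that x and y are drawn independently, so any slope fitted to this sample reflects sampling noise rather than a real relationship.

```r
# Recreate the sample data from this step
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

# Count the rows in each group
group_sizes <- table(data$group)
print(group_sizes)

# Show the column types and the first few values
str(data)
```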

3. Group Data and Apply Linear Regression

Using the group_by() function, you can group the data by a categorical variable and then apply the lm() function to each group using do(). Note that do() is superseded as of dplyr 1.0.0, although it still works.

  • group_by(group) groups the dataset by the categorical variable group, so that subsequent operations are applied separately to each subset.
  • do(model = lm(y ~ x, data = .)) fits a linear regression of y on x within each group, where . stands for the rows of the current group, and stores the fitted models in a list-column named model.
R
# Perform linear regression by group
models <- data %>%
  group_by(group) %>%
  do(model = lm(y ~ x, data = .))

print(models)

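Output:

A table with one row per group (A, B and C) and two columns: group and the list-column model, which holds the fitted lm objects.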

Here, models is a data frame with one row per group; its model column is a list-column that holds the fitted linear regression model for that group.
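To look at one group's fit in detail, an individual model can be pulled out of the list-column. The following is a minimal sketch that assumes the data and models objects are built as in the steps above; model_a is a name introduced here for illustration.

```r
library(dplyr)

# Recreate the sample data and the per-group models
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

models <- data %>%
  group_by(group) %>%
  do(model = lm(y ~ x, data = .))

# The model column is a list, so [[ ]] returns a single lm object
model_a <- models$model[[1]]

# Group label stored in the same row as that model
print(models$group[1])

# Full regression summary (coefficients, standard errors, R-squared)
print(summary(model_a))
```

Any function that accepts an lm object, such as coef(), residuals() or predict(), can be applied to model_a in the same way.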

4. Extract and Summarise Results

Here we extract the coefficients from each model and collect them into a summary table. Other statistics can be extracted in the same way.

  • summarise() evaluates its expressions once per row here, because do() returns a row-wise data frame with one row per group.
  • coef(model)[1] extracts the intercept from the linear regression model for each group.
  • coef(model)[2] extracts the slope (coefficient of x) from the model for each group.
R
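# Summarize coefficients by group
# (models is row-wise, so each expression runs once per fitted model)
coefficients_summary <- models %>%
  summarise(
    group = group,              # keep the group label next to its coefficients
    intercept = coef(model)[1],
    slope = coef(model)[2]
  )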

print(coefficients_summary)

Output:

A three-row table, one row per fitted model, with the estimated intercept and slope.

This produces a summary table of the intercept and slope for each group, allowing you to compare how the relationship between x and y differs across groups. This approach is useful in scenarios where relationships between variables differ across categories, such as analyzing customer behavior, medical outcomes or vehicle performance.
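Since do() is superseded in recent dplyr releases, the same per-group fit can also be sketched with nest_by(). This is an alternative, not the method used above. It assumes dplyr 1.0.0 or later, and the names nested_models and coefficients_by_group are introduced here for illustration.

```r
library(dplyr)

# Recreate the sample data
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

# nest_by() returns one row per group, with that group's rows stored
# in a list-column named data. The result is row-wise, so mutate()
# fits one model per row; inside mutate(), data refers to that column.
nested_models <- data %>%
  nest_by(group) %>%
  mutate(model = list(lm(y ~ x, data = data)))

# The group label is carried along with the coefficients
coefficients_by_group <- nested_models %>%
  summarise(
    intercept = coef(model)[[1]],
    slope = coef(model)[[2]],
    .groups = "drop"
  )

print(coefficients_by_group)
```

Because nest_by() keeps group as a grouping variable, the resulting table includes the group column without it having to be requested explicitly.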

