Linear Regression on Group Data in R

Last Updated : 11 Apr, 2025

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. In the R programming language, it can be performed with the lm() function, which stands for "linear model". Sometimes analysts need to fit a linear regression separately to each subset of the data defined by a grouping variable. This is where the concept of "grouped regression" comes in.
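As a minimal sketch of both ideas, the example below uses R's built-in mtcars dataset. It is chosen here only for illustration and is not part of this article's example. It fits one model to all rows, then one model per cylinder count using base R's split() and lapply(). The object names are illustrative.

```r
# One model on the full built-in mtcars dataset: mpg as a function of weight (wt)
overall_fit <- lm(mpg ~ wt, data = mtcars)
print(coef(overall_fit))

# Grouped regression in base R: split the rows by cylinder count,
# then fit a separate model to each subset
by_cyl <- split(mtcars, mtcars$cyl)
cyl_fits <- lapply(by_cyl, function(d) lm(mpg ~ wt, data = d))

# Collect each group's intercept and slope into a matrix (one row per group)
cyl_coefs <- t(sapply(cyl_fits, coef))
print(cyl_coefs)
```

Each row of cyl_coefs holds the intercept and slope for one cylinder count, which is the same kind of table the dplyr steps below build.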

Implementing Grouped Regression in R

Grouping data in R allows you to analyse subsets of your data that share a common attribute. For example, you might want to fit a separate linear regression for each group defined by a categorical variable. Below is the step-by-step implementation:

1. Install and Load Necessary Packages

First, we install (once) and load the dplyr package.

install.packages("dplyr")  # only needed once per R installation
library(dplyr)             # needed in every new R session
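If the same script runs on machines where dplyr may or may not be installed, a common pattern is to install it only when it is missing. This is a sketch; the CRAN mirror URL is an assumption, and any CRAN mirror works.

```r
# Install dplyr only if it is not already available.
# requireNamespace() returns FALSE, without raising an error, when a package is missing.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  # Assumed mirror; non-interactive sessions need an explicit repository
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}
library(dplyr)

# Show which dplyr version is attached
print(packageVersion("dplyr"))
```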

2. Prepare Your Data

Assume you have a dataset and want to perform a linear regression on each group defined by a categorical variable, here called group. We will generate a sample dataset.

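set.seed(123) makes the random numbers, and therefore the results, reproducible.
data.frame() creates a dataset with group, x and y variables.
rep(c("A", "B", "C"), each = 10) assigns the first 10 rows to group A, the next 10 to B and the last 10 to C.
rnorm(30) draws 30 random values from a standard normal distribution; it is called separately for x and y.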

# Sample data
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)
print(data)

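Because x and y in the sample above are generated independently, there is no real group-level pattern for the regression to recover. To check that a grouped regression works, it can help to simulate data with a known answer. The sketch below is hypothetical: sim_data, true_params and the parameter values are invented for illustration. It gives each group its own true intercept and slope, adds a little noise, and then fits each group with base R to confirm the slopes are recovered.

```r
# Hypothetical example: three groups, each with a known intercept and slope
set.seed(42)
true_params <- data.frame(
  group     = c("A", "B", "C"),
  intercept = c(1, 0, -1),
  slope     = c(2, -1, 0.5)
)

n_per_group <- 50
sim_data <- data.frame(
  group = rep(true_params$group, each = n_per_group),
  x     = rnorm(3 * n_per_group)
)

# Look up each row's true parameters from its group label, then add small noise
idx <- match(sim_data$group, true_params$group)
sim_data$y <- true_params$intercept[idx] +
  true_params$slope[idx] * sim_data$x +
  rnorm(nrow(sim_data), sd = 0.1)

# Quick base R check: fit each group separately and compare with the true slopes
fits <- lapply(split(sim_data, sim_data$group), function(d) lm(y ~ x, data = d))
slope_hat <- sapply(fits, function(m) coef(m)[["x"]])
print(round(rbind(true = true_params$slope, estimated = slope_hat), 2))
```

With 50 rows per group and noise standard deviation 0.1, the estimated slopes should sit very close to 2, -1 and 0.5. sim_data can be passed through the dplyr steps below in place of data.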

3. Group Data and Apply Linear Regression

Using the group_by() function, you can group the data by a categorical variable and then fit lm() to each group with do(). Note that do() still works but is marked as superseded in current versions of dplyr.

group_by(group) groups the dataset by the categorical variable group, so that subsequent operations are applied separately to each subset.
do(model = lm(y ~ x, data = .)) fits a linear regression of y on x within each group. The dot (.) stands for the current group's rows, and each fitted model is stored in a list-column named model.

# Perform linear regression by group
models <- data %>%
  group_by(group) %>%
  do(model = lm(y ~ x, data = .))

print(models)


Here models is a rowwise data frame with one row per group: the group column holds the group label and the model list-column holds the corresponding lm object.
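A newer alternative to do() is nest_by(). The following is a minimal sketch, assuming dplyr 1.0.0 or later; models_nb is a name chosen for illustration. nest_by() stores each group's rows in a list-column called data and returns a rowwise data frame, so mutate() can fit one model per row.

```r
library(dplyr)

# Same sample data as in step 2
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

# One row per group; that group's rows are nested in the `data` list-column
models_nb <- data %>%
  nest_by(group) %>%
  # In a rowwise mutate(), `data` is the current row's nested tibble.
  # list() wraps the lm object so it can be stored in a list-column.
  mutate(model = list(lm(y ~ x, data = data)))

print(models_nb)
```

Inside the rowwise mutate(), the column name data takes precedence over the global data object, so each model is fitted only to its own group's rows.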

4. Extract and Summarise Results

Next, we extract and summarise the coefficients (or other statistics) from each group's model.

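summarise() evaluates its expressions once per row of the rowwise data frame, that is, once per group's model. group = group carries the group label into the result, because summarise() would otherwise drop it here.
coef(model)[1] extracts the intercept of each group's model.
coef(model)[2] extracts the slope (the coefficient of x) of each group's model.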

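# Summarize coefficients by group
coefficients_summary <- models %>%
  summarise(
    group = group,               # keep the group label in the result
    intercept = coef(model)[1],  # first coefficient: intercept
    slope = coef(model)[2]       # second coefficient: slope of x
  )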

print(coefficients_summary)

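coef() is only one thing you can pull out of each model. As a sketch, summary() on an lm object also exposes the R-squared value and a coefficient table that includes p-values. fit_stats, r_squared and slope_p are illustrative names. Here mutate() followed by select() is used so the group column is kept alongside the new statistics.

```r
library(dplyr)

# Same sample data and per-group models as in steps 2 and 3
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

models <- data %>%
  group_by(group) %>%
  do(model = lm(y ~ x, data = .))

# models is rowwise, so each expression sees one lm object at a time
fit_stats <- models %>%
  mutate(
    r_squared = summary(model)$r.squared,
    # Row "x", column "Pr(>|t|)" of the coefficient table: p-value of the slope
    slope_p   = summary(model)$coefficients["x", "Pr(>|t|)"]
  ) %>%
  select(group, r_squared, slope_p)

print(fit_stats)
```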

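This produces a table with one row per group containing the intercept and slope of that group's model, so you can compare how the relationship between x and y differs across groups. In this sample, x and y are generated independently, so the true slope in every group is zero and the fitted slopes reflect only sampling noise. With real data, the same approach is useful whenever relationships between variables may differ across categories, for example when analyzing customer behavior, medical outcomes or vehicle performance.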
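Fitting separate models is not the only option. As an alternative sketch, a single lm() with a group-by-x interaction gives each group its own intercept and slope. Its point estimates match the separate fits exactly, but its standard errors use a residual variance pooled across groups. Comparing it with a common-slope model through anova() gives an F-test of whether the slopes differ at all. The object names are illustrative.

```r
# Same sample data as in step 2
set.seed(123)
data <- data.frame(
  group = rep(c("A", "B", "C"), each = 10),
  x = rnorm(30),
  y = rnorm(30)
)

# `group / x - 1` expands to group + group:x with no common intercept,
# so the coefficients are each group's intercept and slope directly
pooled <- lm(y ~ group / x - 1, data = data)
print(coef(pooled))

# The same slopes from fitting each group on its own
separate_slopes <- sapply(split(data, data$group),
                          function(d) coef(lm(y ~ x, data = d))[["x"]])
print(separate_slopes)

# F-test: does allowing a different slope per group improve on one shared slope?
common_slope <- lm(y ~ x + group, data = data)
print(anova(common_slope, pooled))
```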

nyadavxenc
