How to find group-wise summary statistics for R dataframe?
Last Updated: 21 Apr, 2021
Group-wise summary statistics are very useful for understanding a dataframe. A summary typically includes the mean, median, min, max, and quartiles, and it can be computed for a single column (variable) or for the entire dataframe. In this article, we are going to see how to find group-wise summary statistics for a dataframe in the R programming language.
The first examples use a built-in dataset: the iris flower dataset. We can inspect a dataset with the head() or tail() function, which print the first and last rows of the dataframe, respectively, for example its first 10 rows.
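The inspection step described above might look like the following minimal sketch. The variable names df and top10 are illustrative choices; head() and tail() are base R functions, and the second argument sets how many rows to return.

```r
# Load the built-in iris dataset into a working variable
df <- iris

# First 10 rows of the dataframe
top10 <- head(df, 10)
top10

# Last 10 rows of the dataframe
tail(df, 10)
```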
Summary of single variable or column
Our dataframe is stored in the “df” variable. To print the summary of the Sepal.Length column, we pass “df$Sepal.Length” as an argument to the summary() function. Column names in R are case-sensitive, so the capital “L” matters.
Syntax: summary(dataframe$column_name)
For a numeric dataframe column, the summary() function returns:
- Central tendency -> mean and median,
- Quartiles -> 1st and 3rd quartiles (the 25th and 75th percentiles, which bound the interquartile range),
- Range -> min and max values for that single column.
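As a cross-check of the list above, the six values can be recomputed one by one with base R functions. This is only a sketch; the names x, s, and manual are illustrative, and the comparison uses a small tolerance because older R versions round the values stored by summary().

```r
x <- iris$Sepal.Length

# Six-number summary as returned by summary()
s <- summary(x)

# The same six values computed individually
manual <- c(min(x),
            quantile(x, 0.25, names = FALSE),  # 1st quartile
            median(x),
            mean(x),
            quantile(x, 0.75, names = FALSE),  # 3rd quartile
            max(x))
names(manual) <- names(s)

s
manual

# TRUE when both agree (up to rounding)
all.equal(as.numeric(s), unname(manual), tolerance = 1e-3)
```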
Example 1:
R
df <- iris
summary(df$Sepal.Length)
Example 2: We can also pass “digits” as an argument, which specifies the number of significant digits (not decimal places) used for the output values.
Syntax: summary(dataframe$column_name, digits = number_of_significant_digits)
R
df <- iris
summary(df$Sepal.Width, digits = 3)
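To see what the digits argument means in practice, the short sketch below compares significant digits with decimal places on the mean of Sepal.Width. The names sig3 and dec3 are illustrative; signif() and round() are base R functions.

```r
x <- iris$Sepal.Width

mean(x)                    # full-precision mean

sig3 <- signif(mean(x), 3) # 3 significant digits
dec3 <- round(mean(x), 3)  # 3 decimal places
sig3
dec3

# summary() with digits = 3 reports values to 3 significant digits
summary(x, digits = 3)
```

The two results differ in the last place, which is why digits should be read as a count of significant digits.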
Summary of entire dataframe
To summarise the entire dataframe, we pass the dataframe itself as the argument to the summary() function, so it computes a summary of every column (variable).
Syntax: summary(dataframe_name)
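A minimal sketch of this call on the iris dataset is shown here. The name overall is an illustrative choice. Numeric columns get the six-number summary, while the factor column Species is summarised as a count per level.

```r
df <- iris

# Summary of every column in the dataframe
overall <- summary(df)
overall
```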
Group-wise summary of data
Let’s create a sample dataframe first:
R
df <- data.frame(
  Weekday = factor(rep(c("Mon", "Tues", "Wed", "Thurs", "Fri"), each = 4),
                   levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  Quarter = paste0("Q", rep(1:4, each = 5)),
  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8,
            11.1, 10.2, 9.3, 12.2, 10.2, 9.2, 9.7, 12.2,
            8.1, 7.9, 5.6))
df
Summarising group-wise data of Single Variable
Our dataframe consists of 3 variables: Weekday, Quarter, and Delay. The variable we will summarise is Delay, grouped by Weekday; in the process, the Quarter variable is collapsed.
The code below uses the dplyr package. dplyr provides a grammar of data manipulation: a consistent set of verbs that cover the most common data manipulation tasks. We perform the grouping with the group_by() function and the summary with the summarize() function, calculating 2 summary statistics for each weekday: the minimum delay time and the maximum delay time.
Syntax: group_by(variable_name)
R
library(dplyr)

df <- data.frame(
  Weekday = factor(rep(c("Mon", "Tues", "Wed", "Thurs", "Fri"), each = 4),
                   levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  Quarter = paste0("Q", rep(1:4, each = 5)),
  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8,
            11.1, 10.2, 9.3, 12.2, 10.2, 9.2, 9.7, 12.2,
            8.1, 7.9, 5.6))

# Group rows by Weekday, then compute the min and max Delay in each group
df %>%
  group_by(Weekday) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))
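For readers without dplyr installed, roughly the same table can be produced with base R. The sketch below swaps in tapply() for group_by() and summarize(); it is an alternative technique, not the approach used in this article, and the names min_delay, max_delay, and by_day are illustrative.

```r
# Same sample data as in the example above
df <- data.frame(
  Weekday = factor(rep(c("Mon", "Tues", "Wed", "Thurs", "Fri"), each = 4),
                   levels = c("Mon", "Tues", "Wed", "Thurs", "Fri")),
  Quarter = paste0("Q", rep(1:4, each = 5)),
  Delay = c(9.9, 5.4, 8.8, 6.9, 4.9, 9.7, 7.9, 5, 8.8,
            11.1, 10.2, 9.3, 12.2, 10.2, 9.2, 9.7, 12.2,
            8.1, 7.9, 5.6))

# tapply() applies a function to Delay within each level of Weekday
min_delay <- tapply(df$Delay, df$Weekday, min)
max_delay <- tapply(df$Delay, df$Weekday, max)

# Combine both results into one table, one row per weekday
by_day <- data.frame(Weekday = names(min_delay),
                     min_delay = as.vector(min_delay),
                     max_delay = as.vector(max_delay))
by_day
```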
Summarising group-wise data of Multiple Variables
Let’s create another sample dataframe, df2:
R
df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5,
            3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))
df2
Summarizing data group-wise:
In this case, our dataframe has 4 variables: Quarter, Week, Direction, and Delay. In the code below, we group by Quarter and Week and summarise Delay within each combination; in the process, the Direction variable is collapsed.
Syntax: group_by(variable_name1,variable_name2 )
R
library(dplyr)

df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5,
            3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))

# Group by each Quarter/Week combination, then compute min and max Delay
df2 %>%
  group_by(Quarter, Week) %>%
  summarize(min_delay = min(Delay), max_delay = max(Delay))
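The same grouping can carry the other statistics mentioned at the start of this article (mean, median, and quartiles). The following is a sketch under a few assumptions: the column names and delay_stats are illustrative, and the .groups = "drop" argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

df2 <- data.frame(
  Quarter = paste0("Q", rep(1:4, each = 4)),
  Week = rep(c("Weekday", "Weekend"), each = 2, times = 4),
  Direction = rep(c("Inbound", "Outbound"), times = 8),
  Delay = c(10.8, 9.7, 15.5, 10.4, 11.8, 8.9, 5.5,
            3.3, 10.6, 8.8, 6.6, 5.2, 9.1, 7.3, 5.3, 4.4))

delay_stats <- df2 %>%
  group_by(Quarter, Week) %>%
  summarize(
    n = n(),                                          # rows per group
    mean_delay = mean(Delay),
    median_delay = median(Delay),
    q1_delay = quantile(Delay, 0.25, names = FALSE),  # 1st quartile
    q3_delay = quantile(Delay, 0.75, names = FALSE),  # 3rd quartile
    .groups = "drop"                                  # return an ungrouped result
  )
delay_stats
```

Each Quarter and Week combination contains only two rows here (one per Direction), so the quartiles are mainly illustrative for this small sample.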