Group by one or more variables using dplyr in R

Last Updated: 16 Dec, 2021

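The group_by() function divides data into groups based on the values in one or more columns. The columns to group by are passed as arguments to the function, and several column names may be given at once. group_by() does not change the rows or values of the data. It returns a grouped tibble that records the grouping, and later verbs such as summarise() or mutate() use that grouping to operate on each group separately.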
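As a minimal sketch of what "recording the grouping" means, the following uses a small hypothetical data frame named df (not part of the original examples). The rows are unchanged, and the grouping information can be inspected with group_vars() and n_groups():

```r
library(dplyr)

# hypothetical fixed data so the result is reproducible
df <- data.frame(col1 = c(6, 7, 7, 6),
                 col2 = c("a", "b", "c", "a"))

grouped <- df %>% group_by(col1)

group_vars(grouped)   # names of the grouping columns: "col1"
n_groups(grouped)     # number of distinct col1 values: 2
nrow(grouped)         # the rows themselves are unchanged: 4
```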

Syntax:

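group_by(.data, col1, col2, ...)

Here .data is the data frame (or tibble) to group. When the pipe operator %>% is used, the data frame on its left is passed in as .data, so the call is usually written as data %>% group_by(col1, col2, ...).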
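A minimal check, using a small hypothetical data frame df, that calling group_by() directly and calling it through the %>% pipe produce the same grouped result:

```r
library(dplyr)

# hypothetical fixed data
df <- data.frame(col1 = c(6, 7, 7, 6),
                 col3 = c(1, 4, 5, 1))

# the pipe passes df as the first argument (.data) of group_by()
piped  <- df %>% group_by(col1)
direct <- group_by(df, col1)

identical(piped, direct)   # TRUE
```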

Example 1: Group by one variable

# loading the required library
library("dplyr")

# creating a data frame
# col1 is sampled at random, so your values may differ from the output below
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")

# grouping the rows by the values of col1
data_frame %>% group_by(col1)

Output

[1] "Original DataFrame" 
  col1 col2 col3
1    6    a    1 
2    7    b    4 
3    7    c    5 
4    6    a    1 
5    7    b   NA 
6    6    c   NA 
7    6    a    2 
8    6    b   NA 
9    7    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 3 
# Groups:   col1 [2]    
   col1 col2   col3
  <int> <chr> <dbl>
1     6 a         1 
2     7 b         4 
3     7 c         5 
4     6 a         1 
5     7 b        NA 
6     6 c        NA 
7     6 a         2 
8     6 b        NA 
9     7 c         2
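Grouping by itself does not change any values, as the output above shows. Its effect appears when the grouped data is passed to another verb. As a sketch, the following rebuilds the printed data with fixed col1 values (copied from the output above, because the original code uses sample()). It then uses summarise() to count the rows and average col3 within each col1 group, ignoring NA values. The column names rows and col3_mean are illustrative choices:

```r
library(dplyr)

# fixed copy of the data printed above (col1 values taken from that output)
data_frame <- data.frame(col1 = c(6, 7, 7, 6, 7, 6, 6, 6, 7),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# summarise() collapses each col1 group into a single row
summary_df <- data_frame %>%
  group_by(col1) %>%
  summarise(rows = n(),
            col3_mean = mean(col3, na.rm = TRUE))

print(summary_df)
```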

Grouping can also be done on multiple columns of the data frame. To do this, pass the names of all the grouping columns to group_by(). Each distinct combination of values across those columns then forms one group.

Example 2: Group by multiple columns

# loading the required library
library("dplyr")

# creating a data frame
# col1 is sampled at random, so your values may differ from the output below
data_frame <- data.frame(col1 = sample(6:7, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")

# grouping the rows by each combination of col1 and col2
data_frame %>% group_by(col1, col2)

Output

[1] "Original DataFrame" 
  col1 col2 col3
1    7    a    1
2    7    b    4 
3    7    c    5 
4    6    a    1 
5    6    b   NA 
6    6    c   NA 
7    7    a    2 
8    6    b   NA 
9    6    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 3 
# Groups:   col1, col2 [6]    
   col1 col2   col3
  <int> <chr> <dbl>
1     7 a         1 
2     7 b         4 
3     7 c         5 
4     6 a         1 
5     6 b        NA 
6     6 c        NA 
7     7 a         2 
8     6 b        NA 
9     6 c         2
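When the grouping columns are only known as strings (for example, stored in a variable), the names can be turned into symbols and spliced into group_by(). This is a minimal sketch, assuming the rlang package, which is installed as a dependency of dplyr. The vector group_cols is hypothetical, and the col1 values are copied from the Example 2 output:

```r
library(dplyr)

# fixed copy of the Example 2 data (col1 values taken from its printed output)
df <- data.frame(col1 = c(7, 7, 7, 6, 6, 6, 7, 6, 6),
                 col2 = letters[1:3],
                 col3 = c(1, 4, 5, 1, NA, NA, 2, NA, 2))

# hypothetical vector of column names, e.g. chosen at run time
group_cols <- c("col1", "col2")

# rlang::syms() converts the strings to symbols; !!! splices them into group_by()
grouped <- df %>% group_by(!!!rlang::syms(group_cols))

group_vars(grouped)   # "col1" "col2"
n_groups(grouped)     # 6 distinct (col1, col2) combinations
```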


yippeee25
