Open In App

Group by one or more variables using Dplyr in R

Last Updated : 16 Dec, 2021
Comments
Improve
Suggest changes
Like Article
Like
Report

The group_by() method is used to divide and segregate date based on groups contained within the specific columns. The required column to group by is specified as an argument of this function. It may contain multiple column names.

Syntax:

group_by(col1, col2, ...)

Example 1: Group by one variable

R
# installing required libraries
library("dplyr")

# creating a data frame
data_frame <- data.frame(col1 = sample(6:7, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,NA,NA,2,NA,2))

print ("Original DataFrame")
print (data_frame)

print ("Modified DataFrame")

# computing difference of each group
data_frame%>%group_by(col1)

Output

[1] "Original DataFrame" 
col1 col2 col3 
1    6    a    1 
2    7    b    4 
3    7    c    5 
4    6    a    1 
5    7    b   NA 
6    6    c   NA 
7    6    a    2 
8    6    b   NA 
9    7    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 3 
# Groups:   col1 [2]    
col1 col2   col3   
<int> <chr> <dbl> 
1     6 a         1 
2     7 b         4 
3     7 c         5 
4     6 a         1 
5     7 b        NA 
6     6 c        NA 
7     6 a         2 
8     6 b        NA 
9     7 c         2

Grouping can be also done using multiple columns belonging to the data frame for this just the names of the columns have to be passed to the function.

Example 2: Group by multiple columns

R
# installing required libraries
library("dplyr")

# creating a data frame
data_frame <- data.frame(col1 = sample(6:7, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,NA,NA,2,NA,2))

print ("Original DataFrame")
print (data_frame)

print ("Modified DataFrame")

# computing difference of each group
data_frame%>%group_by(col1,col2)

Output

[1] "Original DataFrame" 
col1 col2 col3
 1    7    a    1 
2    7    b    4 
3    7    c    5 
4    6    a    1 
5    6    b   NA 
6    6    c   NA 
7    7    a    2 
8    6    b   NA 
9    6    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 3 
# Groups:   col1, col2 [6]    
col1 col2   col3   
<int> <chr> <dbl> 
1     7 a         1 
2     7 b         4 
3     7 c         5 
4     6 a         1 
5     6 b        NA 
6     6 c        NA 
7     7 a         2 
8     6 b        NA 
9     6 c         2

Next Article
Article Tags :

Similar Reads