Summarise multiple columns using dplyr in R
Last Updated :
24 Oct, 2021
In this article, we will discuss how to summarise multiple columns using the dplyr package in the R programming language.
Method 1: Using summarise_all() method
The summarise_all() method applies a function to every column of a data frame. The output data frame contains every column of the input, each reduced to the value that the specified function returns for it.
summarise_all(data, function)
Arguments :
- data - The data frame to summarise the columns of
- function - The function to apply on all the data frame columns.
R
library("dplyr")
# creating a data frame
df <- data.frame(col1 = sample(rep(c(1:5), each = 3)),
                 col2 = 1:15)
print("original dataframe")
print(df)
# summarising the data: mean of every column
print("summarised dataframe")
summarise_all(df, mean)
Output
[1] "original dataframe"
   col1 col2
1     2    1
2     3    2
3     4    3
4     2    4
5     2    5
6     4    6
7     1    7
8     1    8
9     5    9
10    3   10
11    5   11
12    1   12
13    4   13
14    5   14
15    3   15
[1] "summarised dataframe"
  col1 col2
1    3    8
Explanation: The mean is calculated column-wise, that is, the values of col1 are summed and divided by the number of rows. The same computation is done for col2. All the columns are returned in the final output. Because col1 is shuffled with sample(), the row order of col1 differs between runs, but its mean stays the same.
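The column-wise calculation described above can be checked by hand. The following is a minimal sketch, not part of the original example: it assumes a fixed data frame (rep() without sample(), so the result is reproducible), and the names res_all, res_manual and res_across are illustrative only. It also shows the across() form, which is available from dplyr 1.0.0 onwards and which the dplyr documentation recommends in place of the scoped verbs such as summarise_all().

```r
library(dplyr)

# Fixed data (no sampling) so the result is reproducible
df <- data.frame(col1 = rep(1:5, each = 3),
                 col2 = 1:15)

# Scoped verb used in this article
res_all <- summarise_all(df, mean)

# Manual check: sum of each column divided by the number of rows
res_manual <- data.frame(col1 = sum(df$col1) / nrow(df),
                         col2 = sum(df$col2) / nrow(df))

# across() form (assumes dplyr >= 1.0.0)
res_across <- summarise(df, across(everything(), mean))

print(res_all)
print(res_manual)
print(res_across)
```

All three results should hold the same two values, 3 for col1 and 8 for col2.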
Method 2: Using summarise_at() method
The summarise_at() method affects only the variables that are selected with a character vector of column names or with vars(). It applies the specified function to each of the selected columns. The output data frame contains only the columns that are specified in the summarise_at() call. If all the columns of the data frame are selected, this method behaves the same as summarise_all().
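data %>%
  summarise_at(vars(cols), function, ...)
Arguments :
- data - The data frame to summarise the columns of
- cols - The columns to summarise, given inside vars() or as a character vector of column names
- function - The function to apply to the selected columns
- ... - Additional arguments passed on to the function, such as na.rm = TRUE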
R
library("dplyr")
# creating a data frame
df <- data.frame(col1 = sample(rep(c(1:5), each = 3)),
                 col2 = 1:15,
                 col3 = letters[1:15])
print("original dataframe")
print(df)
# summarising the data: mean of col1 and col2 only
print("summarised dataframe")
df %>%
  summarise_at(c("col1", "col2"), mean, na.rm = TRUE)
Output
[1] "original dataframe"
   col1 col2 col3
1     3    1    a
2     5    2    b
3     4    3    c
4     4    4    d
5     5    5    e
6     3    6    f
7     2    7    g
8     2    8    h
9     1    9    i
10    4   10    j
11    2   11    k
12    5   12    l
13    1   13    m
14    3   14    n
15    1   15    o
[1] "summarised dataframe"
  col1 col2
1    3    8
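The two ways of selecting columns mentioned above (a character vector and vars()) can be compared side by side. This is a minimal sketch under stated assumptions: the data frame is fixed rather than sampled, one NA is placed in col2 purely to show the effect of na.rm = TRUE, and the names by_name, by_vars and by_across are illustrative. The across() form at the end assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Fixed data with one missing value in col2 (an assumption for illustration)
df <- data.frame(col1 = rep(1:5, each = 3),
                 col2 = c(1:14, NA),
                 col3 = letters[1:15])

# Selection by a character vector of column names
by_name <- df %>%
  summarise_at(c("col1", "col2"), mean, na.rm = TRUE)

# Selection with vars(): keep every column except col3
by_vars <- df %>%
  summarise_at(vars(-col3), mean, na.rm = TRUE)

# across() form of the same summary (assumes dplyr >= 1.0.0)
by_across <- df %>%
  summarise(across(all_of(c("col1", "col2")), ~ mean(.x, na.rm = TRUE)))

print(by_name)
print(by_vars)
print(by_across)
```

In each case only col1 and col2 appear in the result, and the NA in col2 is dropped before the mean is taken because na.rm = TRUE is passed through to mean().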