How to Calculate the Mean by Group in an R DataFrame?
Last Updated: 25 Sep, 2023
Calculating the mean by group in an R DataFrame involves splitting the data into subsets based on a specific grouping variable and then computing the mean of a numeric variable within each subgroup.
In this article, we will see how to calculate the mean by group in an R DataFrame using the R programming language.
It can be done with three approaches: the base R aggregate() function, the dplyr package, and the data.table package.
Dataset creation: First, we create a dataset so that we can later apply these approaches and find the mean by group.
# Create a sample data frame with a grouping column and a numeric column
GFG <- data.frame(
  Category = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)
print(GFG)
Output:
Category Frequency
1 A 9
2 B 5
3 C 0
4 B 2
5 C 7
6 A 8
7 C 1
8 A 3
9 B 7
The code above creates a data frame named “GFG” with two columns, Category and Frequency, and prints it; running it in an R session produces the output shown.
Before we discuss those approaches, let us first work out by hand the group means they should return:
- In Table 1 (the data frame printed above), we have two columns named Category and Frequency.
- The Category column contains the repeated values A, B, and C.
- The Frequency values for group A are (9, 8, 3), for group B (5, 2, 7), and for group C (0, 7, 1).
- To find the mean, we use the formula
Mean = Sum of terms / Number of terms
- Hence, the mean of each group (A, B, C) is calculated as follows:
Sum:
- A=9+8+3=20
- B=5+2+7=14
- C=0+7+1=8
Number of terms:
- A appears 3 times
- B appears 3 times
- C appears 3 times
Mean by group (A, B, C):
- A(mean) = Sum/Number of terms = 20/3 ≈ 6.67
- B(mean) = Sum/Number of terms = 14/3 ≈ 4.67
- C(mean) = Sum/Number of terms = 8/3 ≈ 2.67
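The hand calculation above can be reproduced in a few lines of base R. This is a minimal sketch, assuming the GFG data frame from the dataset-creation step: split() forms one vector of Frequency values per Category, and sapply() applies sum() and length() to each, mirroring the "Sum of terms / Number of terms" formula. The variable names groups, group_sums, group_counts and group_means are illustrative only.

```r
# Same example data as in the dataset-creation step
GFG <- data.frame(
  Category = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Illustrative names (not from the article): one vector of values per group
groups <- split(GFG$Frequency, GFG$Category)

# Sum of terms and number of terms for each group
group_sums   <- sapply(groups, sum)     # A = 20, B = 14, C = 8
group_counts <- sapply(groups, length)  # each group has 3 values

# Mean = Sum of terms / Number of terms
group_means <- group_sums / group_counts
print(round(group_means, 2))
```

The values 20/3, 14/3 and 8/3 match the hand calculation; the methods below reach the same numbers without the manual bookkeeping.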
Code Implementations
Method 1: Using the aggregate() function
The aggregate() function splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
Syntax: aggregate(x = data, by = list_of_grouping_variables, FUN = function_to_apply)
Now, let’s compute the mean Frequency of each Category using aggregate():
# Mean of Frequency for each level of Category
group_mean <- aggregate(x = GFG$Frequency,
                        by = list(GFG$Category),
                        FUN = mean)
print(group_mean)
Output:
Group.1 x
1 A 6.666667
2 B 4.666667
3 C 2.666667
In the call above, aggregate() takes three arguments:
- x is the data to summarize; in our case, the Frequency column of the GFG data frame (GFG$Frequency).
- by is a list of grouping variables; here, list(GFG$Category) splits the values into three groups (A, B, C). Because the list element is unnamed, the grouping column in the result is called Group.1, and the summarized values appear in column x.
- FUN is the function to apply to each group (e.g. mean or sum); here it is mean.
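aggregate() also has a formula interface. The sketch below assumes the same GFG data: Frequency ~ Category reads as "Frequency grouped by Category", and the result keeps the original column names instead of Group.1 and x. Naming the element of the by list has a similar effect on the grouping column. The object names group_mean_f and group_mean_named are illustrative.

```r
# Same example data as in the dataset-creation step
GFG <- data.frame(
  Category = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Formula interface: output columns are named Category and Frequency
group_mean_f <- aggregate(Frequency ~ Category, data = GFG, FUN = mean)
print(group_mean_f)

# Naming the element of `by` names the grouping column (values stay in x)
group_mean_named <- aggregate(x = GFG$Frequency,
                              by = list(Category = GFG$Category),
                              FUN = mean)
print(group_mean_named)
```

The formula form tends to be convenient when the data already sits in a data frame, since the column names carry through to the result.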
Method 2: Using the dplyr package
dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.
Commonly used functions (verbs) in the dplyr package:
- mutate() adds new variables that are functions of existing variables
- select() picks variables based on their names.
- filter() picks cases based on their values.
- summarise() reduces multiple values to a single summary.
- arrange() changes the ordering of the rows.
Install the package (only needed once):
install.packages("dplyr")
Load the package in your session:
library("dplyr")
library("dplyr")
# Group rows by Category, then summarise Frequency with mean()
group_mean <- GFG %>%
  group_by(Category) %>%
  summarise_at(vars(Frequency),
               list(Mean_Frequency = mean))
print(group_mean)
Output:
# A tibble: 3 × 2
Category Mean_Frequency
<chr> <dbl>
1 A 6.67
2 B 4.67
3 C 2.67
Code Steps:
- The pipe operator %>% passes the result of each step to the next one, so the operations are performed one after another.
- group_by(Category) groups the data by the “Category” column. This means that subsequent operations will be performed separately for each unique value in the “Category” column.
- summarise_at() takes two arguments here: the column(s) to summarize, selected with vars(Frequency), and the function(s) to apply, given as list(Mean_Frequency = mean). The name in the list becomes the name of the output column.
- The result is a new tibble (a kind of data frame) called group_mean, which contains one row for each unique category and a column “Mean_Frequency” that holds the calculated means.
Finally, group_mean is printed to the console to display the summary statistics for each category.
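summarise_at() still works, but dplyr 1.0.0 and later mark it as superseded. As a hedged sketch, assuming dplyr 1.0.0 or newer and the same GFG data, the same result can be written with summarise() plus across(), or with a plain named expression. Swapping summarise() for mutate() instead keeps all nine rows and attaches each row's group mean. The names by_across, by_named, with_mean and Group_Mean are illustrative.

```r
library(dplyr)

# Same example data as in the dataset-creation step
GFG <- data.frame(
  Category = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# across() equivalent of summarise_at(vars(Frequency), list(Mean_Frequency = mean))
by_across <- GFG %>%
  group_by(Category) %>%
  summarise(across(Frequency, mean, .names = "Mean_{.col}"))
print(by_across)

# Named expression; .groups = "drop" returns an ungrouped tibble
by_named <- GFG %>%
  group_by(Category) %>%
  summarise(Mean_Frequency = mean(Frequency), .groups = "drop")
print(by_named)

# mutate() keeps every row and adds the mean of that row's group
with_mean <- GFG %>%
  group_by(Category) %>%
  mutate(Group_Mean = mean(Frequency)) %>%
  ungroup()
print(with_mean)
```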
Method 3: Using the data.table package
The data.table package provides a concise and efficient way to calculate summary statistics by group. In this case, we calculate the mean of the “Frequency” column for each group defined by the “Category” column.
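library(data.table)
# Convert the data frame to a data.table
gfg <- data.table(GFG)
# DT[i, j, by]: compute mean(Frequency) in j, separately for each Category
mean_by_category <- gfg[, .(Mean_Frequency = mean(Frequency)), by = Category]
print(mean_by_category)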
Output:
Category Mean_Frequency
1: A 6.666667
2: B 4.666667
3: C 2.666667
Code Steps:
- The first line loads the data.table library in R. The data.table package is used for efficient data manipulation.
- Then we convert the existing data frame GFG into a data.table named gfg.
- The mean by “Category” group is then calculated with data.table's general DT[i, j, by] form:
- In the j argument, .(Mean_Frequency = mean(Frequency)) computes the mean of the Frequency column within each group and stores it in a new column named Mean_Frequency (.() is data.table's shorthand for list()).
- The `by` argument specifies the grouping variable. It tells R to group the data by the “Category” column before applying the calculation.
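data.table can also compute several summaries in the same j expression. The sketch below assumes the data.table package is installed and rebuilds gfg directly. It adds a group total and the special symbol .N (the number of rows in each group), and uses keyby instead of by so the result is sorted by Category. It then uses := to add each row's group mean as a new column by reference, roughly the data.table counterpart of dplyr's mutate(). The names summary_by_category, Total, N and Group_Mean are illustrative.

```r
library(data.table)

# Rebuild the example data directly as a data.table
gfg <- data.table(
  Category = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Several summaries per group; .N is the row count of each group.
# keyby sorts the result by Category (by keeps first-appearance order).
summary_by_category <- gfg[, .(Mean_Frequency = mean(Frequency),
                               Total = sum(Frequency),
                               N = .N),
                           keyby = Category]
print(summary_by_category)

# := adds a column in place: every row receives the mean of its own group
gfg[, Group_Mean := mean(Frequency), by = Category]
print(gfg)
```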