How to Calculate the Mean by Group in R DataFrame?

Last Updated : 25 Sep, 2023

Calculating the mean by group in an R DataFrame involves splitting the data into subsets based on a specific grouping variable and then computing the mean of a numeric variable within each subgroup.

In this article, we will see how to calculate the mean by group in a DataFrame in the R Programming Language.

It can be done with three approaches:

Using aggregate function
Using dplyr Package
Using data.table Package

Dataset creation: First, we create a dataset so that we can later apply the approaches listed above and find the mean by group.

R

# Create a dataset named GFG
GFG <- data.frame(
   Category  = c("A","B","C","B","C","A","C","A","B"),
   Frequency = c(9,5,0,2,7,8,1,3,7)
)

# Print the dataset
print(GFG)

Output:

  Category Frequency
1        A         9
2        B         5
3        C         0
4        B         2
5        C         7
6        A         8
7        C         1
8        A         3
9        B         7

The code above creates a data frame named “GFG”.

It has two columns named Category and Frequency. Running the code in an R session prints the table shown above.

Before we discuss those approaches, let us first see how the expected output values are obtained by hand:

The dataset has two columns named Category and Frequency.
In Category, the values A, B, and C each repeat.
The group A values (9, 8, 3), group B values (5, 2, 7), and group C values (0, 7, 1) are taken from the Frequency column.
To find the mean, we use the formula

MEAN = Sum of terms / Number of terms

Hence, the Mean by Group of each group (A, B, C) would be

Sum:

A=9+8+3=20
B=5+2+7=14
C=0+7+1=8

Number of terms:

A is repeated 3 times
B is repeated 3 times
C is repeated 3 times

Mean by group (A, B, C):

A(mean) = Sum/Number of terms = 20/3 = 6.67
B(mean) = Sum/Number of terms = 14/3 = 4.67
C(mean) = Sum/Number of terms = 8/3 = 2.67
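
The hand calculation above can be checked in base R. The following is a minimal sketch that rebuilds the same GFG data frame and divides each group's sum by its number of terms using tapply(). The names group_sum, group_n, and manual_mean are introduced here for illustration only.

```r
# Recreate the GFG dataset from the article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Sum of terms in each group
group_sum <- tapply(GFG$Frequency, GFG$Category, sum)

# Number of terms in each group
group_n <- tapply(GFG$Frequency, GFG$Category, length)

# Mean = sum of terms / number of terms
manual_mean <- group_sum / group_n
print(round(manual_mean, 2))
```

Rounded to two decimals, the values for A, B, and C are 6.67, 4.67, and 2.67, which agrees with the manual calculation.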

Code Implementations

Method 1: Using aggregate function

Aggregate function: Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

Syntax: aggregate(x = dataset_Name , by = group_list, FUN = any_function)

Now, let’s compute the mean of each group using the aggregate function:

R

group_mean <- aggregate(
                      # Specify data column
                      x = GFG$Frequency,
                      # Specify group indicator
                      by = list(GFG$Category),
                      # Specify function (i.e. mean)
                      FUN = mean)
print(group_mean)

Output:

  Group.1        x
1       A 6.666667
2       B 4.666667
3       C 2.666667

The aggregate() call above takes three arguments:

x is the data to summarise, in our case the Frequency column (GFG$Frequency).
by is a list of grouping variables, in our case the Category column, which splits the data into three groups (A, B, C).
FUN is the function (e.g. mean, sum, etc.) applied to each group (A, B, C).

Because the list passed to by is unnamed, the output columns get the default names Group.1 and x.
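
aggregate() also has a formula interface, which can be easier to read because the output keeps the original column names. The following is a small sketch of that alternative on the same data; the name group_mean_formula is chosen here only for illustration.

```r
# Recreate the GFG dataset from the article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Formula interface: summarise Frequency, grouped by Category
group_mean_formula <- aggregate(Frequency ~ Category, data = GFG, FUN = mean)

# The result has columns named Category and Frequency
print(group_mean_formula)
```

Here the Frequency column of the result holds the group means, so no renaming of Group.1 and x is needed afterwards.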

Method 2: Using dplyr Package

dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.

Functions in dplyr package:

mutate() adds new variables that are functions of existing variables.
select() picks variables based on their names.
filter() picks cases based on their values.
summarise() reduces multiple values to a single summary.
arrange() changes the ordering of the rows.

Install this library:

install.packages("dplyr")

Load this library:

library("dplyr")

R

# load dplyr library
library("dplyr")                             
 
# Specify data frame
group_mean <- GFG %>%
    # Specify group indicator, column, function
    group_by(Category) %>%
    # Calculate the mean of the "Frequency" column for each group
    summarise_at(vars(Frequency),
                 list(Mean_Frequency = mean))
 
 
# Print the resulting summary data frame
print(group_mean)

Output:

# A tibble: 3 × 2
  Category Mean_Frequency
  <chr>             <dbl>
1 A                  6.67
2 B                  4.67
3 C                  2.67

Code Steps:

The %>% operator passes the result of the expression on its left as the first argument of the function on its right, so the operations run one after another.
group_by(Category) groups the data by the “Category” column. This means that subsequent operations will be performed separately for each unique value in the “Category” column.
summarise_at() takes two arguments here: the first, vars(Frequency), selects the column to summarise, and the second, list(Mean_Frequency = mean), names the function to apply and the resulting column.
The result is a new data frame called group_mean, which contains one row for each unique category and a column “Mean_Frequency” that holds the calculated means.

Finally, group_mean is printed to the console to display the summary statistics for each category.
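
The dplyr documentation marks the scoped verbs such as summarise_at() as superseded, so the same result can also be written with plain summarise(). The following is a minimal sketch of that variant, assuming the dplyr package is installed; it is an alternative spelling of the step above, not a different method.

```r
# Load dplyr (assumed to be installed)
library(dplyr)

# Recreate the GFG dataset from the article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Group by Category, then compute the mean of Frequency in each group
group_mean <- GFG %>%
  group_by(Category) %>%
  summarise(Mean_Frequency = mean(Frequency))

# One row per category, with the mean in Mean_Frequency
print(group_mean)
```

Naming the column directly inside summarise() avoids the vars() and list() helpers when only one column and one function are involved.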

Method 3: Using data.table Package

The data.table package provides a concise and efficient way to calculate summary statistics by group. In this case, we calculate the mean of the “Frequency” column for each group defined by the “Category” column.

R

# Load the data.table library
library(data.table)
 
# Convert data.frame to data.table
gfg <- data.table(GFG)
 
# Calculate the mean by "Category" group
mean_by_category <- gfg[, .(Mean_Frequency = mean(Frequency)), by = Category]
 
# Print the result
print(mean_by_category)

Output:

   Category Mean_Frequency
1:        A       6.666667
2:        B       4.666667
3:        C       2.666667

Code Steps:

The first line loads the data.table library. The data.table package is used for efficient data manipulation.
Then we convert the existing data frame GFG into a data.table named gfg.
The mean by “Category” group is calculated as follows:
- Inside the square brackets of gfg, .(Mean_Frequency = mean(Frequency)) computes the mean of the Frequency column for each group and stores it in a new column named Mean_Frequency.
- The `by` argument specifies the grouping variable. It tells data.table to group the rows by the “Category” column before applying the calculation.
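
The same grouped expression can return several summaries at once. The sketch below, which assumes the data.table package is installed, adds the number of terms (via the special symbol .N) and the sum next to the mean, so each row can be compared with the manual calculation earlier in the article. The column names Terms and Sum and the object name stats_by_category are chosen here for illustration.

```r
# Load data.table (assumed to be installed)
library(data.table)

# Recreate the GFG dataset from the article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Convert data.frame to data.table
gfg <- as.data.table(GFG)

# For each Category: number of rows, sum, and mean of Frequency
stats_by_category <- gfg[, .(Terms = .N,
                             Sum = sum(Frequency),
                             Mean_Frequency = mean(Frequency)),
                         by = Category]

print(stats_by_category)
```

In every row, Mean_Frequency equals Sum divided by Terms, matching the formula MEAN = Sum of terms / Number of terms.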

code_blooded7
