
How to Use Aggregate and Not Drop Rows with NA in R

Last Updated : 16 May, 2025

In the R programming language, the aggregate() function computes summary statistics by group. By default, the formula interface of aggregate() uses na.action = na.omit, which drops every row that has a missing value (NA) in any variable named in the formula. Specifying na.action = na.pass keeps rows whose aggregated values are NA and passes them to the summary function. Rows whose grouping value is NA are still left out of the result.

Let us look in detail at how to use aggregate() without dropping rows that contain NA values in R.

Syntax:

aggregate(formula, data, FUN, na.action = na.pass)

Where:

  • formula: A formula specifying the variables to be aggregated and the grouping variable(s).
  • data: The data frame containing the variables.
  • FUN: The function to be applied for aggregation (e.g., mean, sum, max, etc.).
  • na.action: A function that specifies how to handle NA values. The default for the formula interface is na.omit; setting na.action = na.pass passes rows with NA values through to FUN unchanged.
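
To see what the two na.action choices do before any aggregation happens, the functions can be applied directly to a data frame. This is a minimal sketch; the data frame `demo` and its columns `g` and `v` are hypothetical and used only for illustration.

```r
# Hypothetical data: one complete row, one NA in the value, one NA in the group.
demo <- data.frame(g = c("A", "B", NA),
                   v = c(1, NA, 3))

rows_after_omit <- nrow(na.omit(demo))  # na.omit drops rows containing any NA
rows_after_pass <- nrow(na.pass(demo))  # na.pass returns the data unchanged

print(rows_after_omit)
print(rows_after_pass)
```

Here na.omit() keeps only the single complete row, while na.pass() keeps all three rows.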

1. Aggregating with Sum

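In this example, the dataset has two columns, "Group" and "Value". We sum "Value" by "Group" with na.action = na.pass, so rows whose "Value" is NA are kept. Because sum() returns NA when any input is NA, group A yields NA and group B yields 6. The row whose "Group" is NA does not appear in the result.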

R
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))

result1 <- aggregate(Value ~ Group, data = df1, FUN = sum, na.action = na.pass)

print(result1)

Output:

[Output image: Aggregate Function With Sum]
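
If the row with a missing "Group" should also show up in the result, one possible workaround is to give the missing group an explicit label before aggregating. The sketch below assumes the same data as above; the names `df1_labeled` and `result_labeled` and the label "(missing)" are arbitrary choices for illustration, not part of aggregate().

```r
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))

# Replace the NA group with an explicit label (arbitrary, hypothetical name)
# so that it is treated like any other group.
df1_labeled <- df1
df1_labeled$Group[is.na(df1_labeled$Group)] <- "(missing)"

result_labeled <- aggregate(Value ~ Group, data = df1_labeled, FUN = sum,
                            na.action = na.pass)

print(result_labeled)
```

The result now has three groups: the labeled group sums to 5, group A is NA because its values are NA, and group B sums to 6.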

2. Aggregating with Custom Function

In this example, we find the median of "Rating" within each "Group" in a data frame df4 with two columns: "Group" and "Rating". We apply a custom function that calls median() with na.rm = TRUE. The NA ratings are passed through by na.action = na.pass and then ignored inside each group's calculation.

R
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))

median_custom <- function(x) {
  median(x, na.rm = TRUE)
}

result4 <- aggregate(Rating ~ Group, data = df4, FUN = median_custom, 
                     na.action = na.pass)

print(result4)

Output:

[Output image: Aggregate With Custom Function]
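
As an alternative to a wrapper function, extra arguments given to aggregate() are forwarded to FUN, so na.rm = TRUE can be supplied directly. The following sketch compares both approaches on the same data; the names `result_wrapper` and `result_dots` are hypothetical.

```r
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))

# Approach 1: wrapper function that removes NA values itself.
result_wrapper <- aggregate(Rating ~ Group, data = df4,
                            FUN = function(x) median(x, na.rm = TRUE),
                            na.action = na.pass)

# Approach 2: pass na.rm = TRUE through aggregate() to median().
result_dots <- aggregate(Rating ~ Group, data = df4, FUN = median,
                         na.rm = TRUE, na.action = na.pass)

print(result_dots)
print(isTRUE(all.equal(result_wrapper, result_dots)))
```

Both calls should give the same medians: 3.5 for group A and 4 for group B.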

3. Aggregating with Count

In this example, we count how many non-missing "Purchases" and "Returns" values are recorded for each customer, ensuring that rows with NA values are retained during aggregation.

R
customer_data <- data.frame(
  Customer = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Purchases = c(5, 8, NA, 12, NA),
  Returns = c(NA, 2, 1, NA, 3)
)

aggregate(. ~ Customer, data = customer_data, FUN = function(x) sum(!is.na(x)),
          na.action = na.pass)

Output:

[Output image: Aggregate Function with count]
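
To illustrate why na.action = na.pass matters here, the sketch below runs the same count with the default na.action and with na.pass. The helper `count_non_na` and the result names are hypothetical and introduced only for this comparison.

```r
customer_data <- data.frame(
  Customer = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Purchases = c(5, 8, NA, 12, NA),
  Returns = c(NA, 2, 1, NA, 3)
)

# Hypothetical helper: number of non-missing entries in a vector.
count_non_na <- function(x) sum(!is.na(x))

# Default na.action (na.omit): rows with any NA are removed first.
counts_default <- aggregate(. ~ Customer, data = customer_data,
                            FUN = count_non_na)

# na.pass: every customer is kept and NA values reach count_non_na.
counts_pass <- aggregate(. ~ Customer, data = customer_data,
                         FUN = count_non_na, na.action = na.pass)

print(counts_default)
print(counts_pass)
```

With the default, only Anurag remains, because that is the only row without a missing value. With na.pass, all five customers appear.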

4. Aggregating with Mean

In this example, we calculate the mean score in each subject for each student while retaining rows with NA values. Each student has a single row, so the mean equals that student's score. The na.action = na.pass argument keeps students with a missing score in the result. Because mean() is called without na.rm = TRUE, those missing scores appear as NA in the output.

R
student_scores <- data.frame(
  Student = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Math = c(80, NA, 75, 90, 85),
  Science = c(NA, 70, 85, 88, 92),
  English = c(78, 85, 82, NA, 90)
)

aggregate(. ~ Student, data = student_scores, FUN = mean, na.action = na.pass)

Output:

[Output image: Aggregate Function With Mean]
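
One detail worth checking is what happens if na.rm = TRUE is added to this call. When a group's only value is NA, mean() has nothing left to average and returns NaN rather than NA. The sketch below shows this under the same data; the names `means_na` and `means_nan` are hypothetical.

```r
student_scores <- data.frame(
  Student = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Math = c(80, NA, 75, 90, 85),
  Science = c(NA, 70, 85, 88, 92),
  English = c(78, 85, 82, NA, 90)
)

# Without na.rm: a missing score stays NA in the result.
means_na <- aggregate(. ~ Student, data = student_scores, FUN = mean,
                      na.action = na.pass)

# With na.rm = TRUE: the mean of zero remaining values is NaN.
means_nan <- aggregate(. ~ Student, data = student_scores, FUN = mean,
                       na.rm = TRUE, na.action = na.pass)

print(means_na)
print(means_nan)
```

In both results all five students are present; only the marker for the missing score differs.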

In this article, we saw that aggregate() computes summary statistics by group. By default, its formula interface drops any row containing a missing value (NA) in a variable used in the formula, which can silently exclude data from the analysis. Specifying na.action = na.pass retains rows with NA in the aggregated columns, so the summary function decides how to treat them. Rows with NA in the grouping column are still omitted from the result.

