How to Use Aggregate and Not Drop Rows with NA in R
Last Updated: 16 May, 2025
In the R programming language, the aggregate() function computes summary statistics by group. By default, the formula method of aggregate() uses na.action = na.omit, which drops every row that has a missing value (NA) in any variable named in the formula. Specifying na.action = na.pass instead passes those rows through to the aggregation function. Rows whose grouping value is itself NA are still left out of the result.
Let us look in detail at how to use aggregate() without dropping rows with NA in R.
Syntax:
aggregate(formula, data, FUN, na.action = na.pass)
Where:
formula: A formula specifying the variables to be aggregated and the grouping variable(s).
data: The data frame containing the variables.
FUN: The function to be applied for aggregation (e.g., mean, sum, max, etc.).
na.action: Specifies how to handle NA values. The default, na.omit, removes incomplete rows before grouping; setting na.action = na.pass retains rows with NA values so that they reach FUN.
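To make the effect of na.action concrete, here is a minimal sketch comparing the default with na.pass. The data frame toy and its columns g and v are hypothetical and exist only for illustration.

```r
# Hypothetical toy data: group "z" has only a missing value
toy <- data.frame(g = c("x", "x", "y", "z"),
                  v = c(1, NA, 3, NA))

# Default na.action (na.omit): rows with NA in v are removed before grouping,
# so group "z" disappears and group "x" is summed from its one complete row
default_res <- aggregate(v ~ g, data = toy, FUN = sum)

# na.pass: every row reaches sum(), which returns NA for groups containing NA
pass_res <- aggregate(v ~ g, data = toy, FUN = sum, na.action = na.pass)

print(default_res)
print(pass_res)
```

The default result has two groups while the na.pass result has three, which shows that a group can vanish silently when its rows are incomplete.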
1. Aggregating with Sum
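In this example, we have a dataset with two columns, "Group" and "Value". We aggregate the sum of "Value" by "Group" and retain rows with NA values during aggregation. Because sum() returns NA when any input is NA, group A is kept with a sum of NA. The row whose Group is NA is still omitted, because aggregate() does not form a group for a missing grouping value.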
R
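# Sample data: group A has only missing values, and one row has no group
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))
# na.pass keeps the rows with NA in Value so that they reach sum()
result1 <- aggregate(Value ~ Group, data = df1, FUN = sum, na.action = na.pass)
print(result1)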
Output:
  Group Value
1     A    NA
2     B     6
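If the rows with a missing group should also appear in the result, one possible approach is to turn NA into an explicit factor level with addNA() before aggregating. The sketch below reuses the same data under the hypothetical name df1_keep; it is one option, not the only way to do this.

```r
# Same data as in the example above
df1_keep <- data.frame(Group = c("A", "B", "A", "B", NA),
                       Value = c(NA, 2, NA, 4, 5))

# addNA() converts Group to a factor and adds NA as a regular level,
# so the missing group is treated like any other group
df1_keep$Group <- addNA(df1_keep$Group)

res_keep <- aggregate(Value ~ Group, data = df1_keep, FUN = sum,
                      na.action = na.pass)
print(res_keep)
```

With this change the result should contain three groups, including one for the missing Group value.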
2. Aggregating with Custom Function
In this example, we find the median of "Rating" within each "Group" in a dataset df4 with two columns, "Group" and "Rating". We apply a custom function that computes the median after removing missing ratings, while na.action = na.pass ensures that rows with NA in "Rating" are not dropped before the function is applied.
R
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))
# Custom function: median of the non-missing values in a group
median_custom <- function(x) {
  median(x, na.rm = TRUE)
}
result4 <- aggregate(Rating ~ Group, data = df4, FUN = median_custom,
                     na.action = na.pass)
print(result4)
Output:
  Group Rating
1     A    3.5
2     B    4.0
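A wrapper function is not strictly required here, because aggregate() forwards additional arguments to FUN. The following sketch passes na.rm = TRUE straight to median(); the name via_dots is hypothetical.

```r
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))

# na.rm = TRUE is not an argument of aggregate() itself;
# it is handed on to median() for every group
via_dots <- aggregate(Rating ~ Group, data = df4, FUN = median,
                      na.rm = TRUE, na.action = na.pass)
print(via_dots)
```

Both forms should give the same medians; the wrapper is mainly useful when the custom logic goes beyond a single extra argument.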
3. Aggregating with Count
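In this example, we count the non-missing entries in each column for every customer, ensuring that rows with NA values are retained during aggregation. Note that the function counts how many recorded (non-NA) values each customer has, not the total number of purchases. Without na.action = na.pass, only customers with no missing value in any column would remain in the result.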
R
customer_data <- data.frame(
  Customer = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Purchases = c(5, 8, NA, 12, NA),
  Returns = c(NA, 2, 1, NA, 3)
)
aggregate(. ~ Customer, data = customer_data, FUN = function(x) sum(!is.na(x)),
          na.action = na.pass)
Output:
  Customer Purchases Returns
1   Anurag         1       1
2   Jayesh         1       0
3  Pratham         0       1
4  Shivang         1       0
5    Vipul         0       1
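To see why na.pass matters with the `.` shorthand, the sketch below runs the same count with and without it. The helper count_non_na and the result names dropped and kept are hypothetical.

```r
customer_data <- data.frame(
  Customer = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Purchases = c(5, 8, NA, 12, NA),
  Returns = c(NA, 2, 1, NA, 3)
)

# Counts the non-missing entries of a vector
count_non_na <- function(x) sum(!is.na(x))

# Default na.omit: a row with NA in either column is removed before grouping
dropped <- aggregate(. ~ Customer, data = customer_data, FUN = count_non_na)

# na.pass: every customer is kept and the NAs are counted per column
kept <- aggregate(. ~ Customer, data = customer_data, FUN = count_non_na,
                  na.action = na.pass)

print(dropped)
print(kept)
```

Because the formula covers both Purchases and Returns, the default removes any customer with a gap in either column, whereas na.pass keeps all five.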
4. Aggregating with Mean
In this example, we calculate the mean score for each student in each subject while ensuring that rows with NA values are retained during aggregation. The na.action = na.pass argument keeps every student in the result. Because mean() returns NA when its input contains NA, a subject with a missing score shows NA for that student.
R
student_scores <- data.frame(
  Student = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Math = c(80, NA, 75, 90, 85),
  Science = c(NA, 70, 85, 88, 92),
  English = c(78, 85, 82, NA, 90)
)
aggregate(. ~ Student, data = student_scores, FUN = mean, na.action = na.pass)
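Output:
  Student Math Science English
1  Anurag   NA      70      85
2  Jayesh   80      NA      78
3 Pratham   85      92      90
4 Shivang   90      88      NA
5   Vipul   75      85      82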
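When a group has several rows, na.pass can be combined with na.rm = TRUE so that the mean is taken over the available scores only. The sketch below uses a hypothetical data frame scores_long with two rows per student. A group with no scores at all then yields NaN rather than NA.

```r
# Hypothetical data: Jayesh has one recorded score, Anurag has none
scores_long <- data.frame(
  Student = c('Jayesh', 'Jayesh', 'Anurag', 'Anurag'),
  Math = c(80, NA, NA, NA)
)

# na.pass keeps all rows; na.rm = TRUE is forwarded to mean()
mean_available <- aggregate(Math ~ Student, data = scores_long, FUN = mean,
                            na.rm = TRUE, na.action = na.pass)
print(mean_available)
```

Whether NA, NaN, or a dropped group is the most useful outcome depends on the analysis, so it is worth checking the result for groups with no valid observations.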
In this article, we saw that the aggregate() function is a tool for computing summary statistics by group. By default, the formula method of aggregate() drops any row containing a missing value (NA) in the variables used, which can silently remove observations or whole groups from the result. By specifying na.action = na.pass, we retain rows with NA values during aggregation and can decide inside FUN how missing values are handled.