
Pandas Groupby Average

Last Updated : 13 Jan, 2025

GroupBy operations are powerful tools for summarizing and aggregating data. One common operation is calculating the average (mean) of each group within a DataFrame. Whether you're analyzing sales data by region, customer behavior by age group, or any other grouped data, the groupby() method combined with an aggregation function like mean() makes it easy to compute the average for each group.

Let's understand with a simple example:

Python
import pandas as pd

data = {'Name': ['Emma', 'Hasan', 'Rob', 'Emma', 'Hasan'],
        'Marks': [85, 70, 65, 90, 65]}
df = pd.DataFrame(data)
average_marks = df.groupby('Name')['Marks'].mean()
print(average_marks)

Output
Name
Emma     87.5
Hasan    67.5
Rob      65.0
Name: Marks, dtype: float64

A groupby() operation involves three key steps, often called split-apply-combine:

  1. Splitting: The data is divided into groups based on specified criteria.
  2. Applying: A function (like mean, sum, etc.) is applied to each group.
  3. Combining: The results are combined back into a DataFrame or Series.

This pattern is useful because it reduces a large dataset to one summary value per group without writing explicit loops over the rows.
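The three steps listed above can be traced by hand. The following is a minimal sketch that reuses the marks data from the first example; the variable names pieces, means, manual, and builtin are illustrative only and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Emma', 'Hasan', 'Rob', 'Emma', 'Hasan'],
    'Marks': [85, 70, 65, 90, 65],
})

# Splitting: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
pieces = {name: group for name, group in df.groupby('Name')}

# Applying: compute the mean of 'Marks' within each piece
means = {name: group['Marks'].mean() for name, group in pieces.items()}

# Combining: assemble the per-group results into a single Series
manual = pd.Series(means, name='Marks')
manual.index.name = 'Name'
print(manual)

# The one-line form performs the same three steps internally
builtin = df.groupby('Name')['Marks'].mean()
print(builtin)
```

Both printed Series should show the same averages. In practice the one-line form is preferable, since it is shorter and avoids the Python-level loop.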

Method 1: Grouping by a Single Column

The most basic way to calculate a grouped average is to group the data by a single column and then apply the mean() function to the result.

Python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Age': [25, 30, 35, 28, 22],
    'Salary': [50000, 60000, 70000, 55000, 48000]
}

df = pd.DataFrame(data)

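# Grouping by 'Gender' and calculating the mean of the numeric columns.
# numeric_only=True skips the text column 'Name', which cannot be averaged
# (recent pandas versions raise a TypeError without it).
grouped_data = df.groupby('Gender').mean(numeric_only=True)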
print(grouped_data)

Output
         Age   Salary
Gender               
Female  25.0  51000.0
Male    32.5  65000.0
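When only one column's average is needed, or when the group key should stay a regular column rather than become the index, two small variations help. The sketch below reuses the Method 1 data; the names avg_salary and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Age': [25, 30, 35, 28, 22],
    'Salary': [50000, 60000, 70000, 55000, 48000],
})

# Selecting a single column before aggregating returns a Series of averages
avg_salary = df.groupby('Gender')['Salary'].mean()
print(avg_salary)

# as_index=False keeps 'Gender' as an ordinary column instead of the index
flat = df.groupby('Gender', as_index=False)[['Age', 'Salary']].mean()
print(flat)
```

Selecting the columns explicitly also documents which values are being averaged, which can be easier to read than relying on pandas to skip non-numeric columns.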

Method 2: Grouping by Multiple Columns

You can also group by multiple columns to calculate averages for more specific subgroups. This is helpful when you want to segment your data into more detailed categories, such as grouping by both Gender and Age.

Python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 35, 28, 22, 40, 29],
    'Salary': [50000, 60000, 70000, 55000, 48000, 72000, 53000]
}

df = pd.DataFrame(data)

# Grouping by 'Gender' and 'Age', then calculating the mean.
# numeric_only=True skips the text column 'Name', which cannot be averaged.
grouped_data = df.groupby(['Gender', 'Age']).mean(numeric_only=True)
print(grouped_data)

Output
             Salary
Gender Age         
Female 22   48000.0
       25   50000.0
       28   55000.0
       29   53000.0
Male   30   60000.0
       35   70000.0
       40   72000.0
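Grouping by two columns produces a result indexed by both keys (a MultiIndex). The sketch below, using the same data as Method 2, shows one way to read values out of that result and to flatten it back into ordinary columns; the names grouped and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 35, 28, 22, 40, 29],
    'Salary': [50000, 60000, 70000, 55000, 48000, 72000, 53000],
})

# Average salary per (Gender, Age) pair, returned as a Series with a MultiIndex
grouped = df.groupby(['Gender', 'Age'])['Salary'].mean()

# Select every 'Female' entry; the result is indexed by Age alone
print(grouped.loc['Female'])

# Select a single (Gender, Age) combination
print(grouped.loc[('Male', 35)])

# reset_index() turns both index levels back into regular columns
flat = grouped.reset_index()
print(flat)
```

In this small dataset every (Gender, Age) pair contains exactly one row, so each average equals that row's salary. With larger data, several rows would typically fall into each pair.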

Method 3: Grouping with Multiple Aggregation Functions

Sometimes you may want to calculate not just the average but several statistics (such as count, sum, or median) for each group. Pandas lets you apply multiple aggregation functions at once using agg().

Python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Salary': [50000, 60000, 70000, 55000, 52000]
}

df = pd.DataFrame(data)

# Group by 'Gender' and calculate statistics for 'Salary'
grouped_df = df.groupby('Gender')['Salary'].agg(['mean', 'sum', 'count'])

print(grouped_df)

Output
                mean     sum  count
Gender                             
Female  52333.333333  157000      3
Male    65000.000000  130000      2
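The output columns of agg() can also be given descriptive names by passing keyword arguments of the form new_name=(column, function), a feature pandas calls named aggregation. A minimal sketch on the same data follows; the column names avg_salary, total_salary, and headcount are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Salary': [50000, 60000, 70000, 55000, 52000],
})

# Each keyword names an output column; the tuple is (source column, function)
summary = df.groupby('Gender').agg(
    avg_salary=('Salary', 'mean'),
    total_salary=('Salary', 'sum'),
    headcount=('Salary', 'count'),
)
print(summary)
```

The values should match the mean, sum, and count columns shown above; only the column labels differ.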
