Pandas Groupby Average

Last Updated : 13 Jan, 2025

GroupBy operations summarize and aggregate data. One common operation is calculating the average (mean) of each group within a DataFrame. Whether you're analyzing sales data by region, customer behavior by age group, or any other grouped data, the groupby() method combined with an aggregation function like mean() makes it easy to compute the average for each group.

Let's understand with a simple example:

Python

import pandas as pd

data = {'Name': ['Emma', 'Hasan', 'Rob', 'Emma', 'Hasan'],
        'Marks': [85, 70, 65, 90, 65]}
df = pd.DataFrame(data)
average_marks = df.groupby('Name')['Marks'].mean()
print(average_marks)

Output

Name
Emma     87.5
Hasan    67.5
Rob      65.0
Name: Marks, dtype: float64

The groupby function involves three key steps:

Splitting: The data is divided into groups based on specified criteria.
Applying: A function (like mean, sum, etc.) is applied to each group.
Combining: The results are combined back into a DataFrame or Series.

This split-apply-combine pattern is useful because it condenses a large dataset into one summary value per group in a single chained call.
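To make the three steps concrete, here is a minimal sketch that performs them by hand on the marks data from the first example and compares the result with the one-line groupby call. The variable names (groups, means, manual_result, builtin_result) are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Emma', 'Hasan', 'Rob', 'Emma', 'Hasan'],
    'Marks': [85, 70, 65, 90, 65],
})

# Splitting: iterating over a GroupBy object yields (key, sub-DataFrame) pairs
groups = {name: group for name, group in df.groupby('Name')}

# Applying: compute the mean of 'Marks' within each group
means = {name: group['Marks'].mean() for name, group in groups.items()}

# Combining: assemble the per-group results into a single Series
manual_result = pd.Series(means, name='Marks').rename_axis('Name')

# The one-line equivalent used throughout this article
builtin_result = df.groupby('Name')['Marks'].mean()

print(manual_result)
print(builtin_result)
```

Both printed Series should contain the same per-name averages. In practice the built-in call is preferable, since pandas handles all three steps internally.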

Method 1: Grouping by a Single Column

The most basic way to calculate the average for grouped data is to group by a single column and then apply the mean() function to the result.

Python

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Age': [25, 30, 35, 28, 22],
    'Salary': [50000, 60000, 70000, 55000, 48000]
}

df = pd.DataFrame(data)

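# Grouping by 'Gender' and calculating the mean for each group.
# numeric_only=True skips the non-numeric 'Name' column; without it,
# pandas 2.0+ raises a TypeError when it tries to average strings.
grouped_data = df.groupby('Gender').mean(numeric_only=True)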
print(grouped_data)

Output

         Age   Salary
Gender               
Female  25.0  51000.0
Male    32.5  65000.0
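If only some columns matter, another option is to select them from the grouped object before calling mean(). The sketch below reuses the same sample data; the names avg_salary and avg_age_salary are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Age': [25, 30, 35, 28, 22],
    'Salary': [50000, 60000, 70000, 55000, 48000],
})

# Selecting a single column returns a Series indexed by 'Gender'
avg_salary = df.groupby('Gender')['Salary'].mean()
print(avg_salary)

# Selecting a list of columns returns a DataFrame with just those columns
avg_age_salary = df.groupby('Gender')[['Age', 'Salary']].mean()
print(avg_age_salary)
```

Selecting columns explicitly also keeps the non-numeric 'Name' column out of the calculation, so no numeric_only argument is needed here.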

Method 2: Grouping by Multiple Columns

You can also group by multiple columns to calculate averages for more specific subgroups. This is helpful when you want to segment your data into more detailed categories, such as grouping by both Gender and Age.

Python

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 35, 28, 22, 40, 29],
    'Salary': [50000, 60000, 70000, 55000, 48000, 72000, 53000]
}

df = pd.DataFrame(data)

# Grouping by 'Gender' and 'Age', then calculating the mean.
# numeric_only=True skips the non-numeric 'Name' column.
grouped_data = df.groupby(['Gender', 'Age']).mean(numeric_only=True)
print(grouped_data)

Output

             Salary
Gender Age         
Female 22   48000.0
       25   50000.0
       28   55000.0
       29   53000.0
Male   30   60000.0
       35   70000.0
       40   72000.0
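Grouping by several columns produces a MultiIndex on the result. If a flat table is easier to work with, one possible approach is to pass as_index=False so the group keys stay as ordinary columns. The sketch below assumes the same seven-row sample data; the name flat is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank', 'Grace'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 35, 28, 22, 40, 29],
    'Salary': [50000, 60000, 70000, 55000, 48000, 72000, 53000],
})

# as_index=False keeps 'Gender' and 'Age' as regular columns
# instead of moving them into a MultiIndex
flat = df.groupby(['Gender', 'Age'], as_index=False)['Salary'].mean()
print(flat)
```

In this sample every (Gender, Age) pair occurs exactly once, so each group mean equals that single row's salary. With repeated pairs, the same call would average the salaries within each pair.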

Method 3: Grouping with Multiple Aggregation Functions

Sometimes you may want to calculate not just the average but several statistics (such as count, sum, or median) for each group. Pandas lets you apply multiple aggregation functions at once using agg().

Python

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Salary': [50000, 60000, 70000, 55000, 52000]
}

df = pd.DataFrame(data)

# Group by 'Gender' and calculate statistics for 'Salary'
grouped_df = df.groupby('Gender')['Salary'].agg(['mean', 'sum', 'count'])

print(grouped_df)

Output

                mean     sum  count
Gender                             
Female  52333.333333  157000      3
Male    65000.000000  130000      2
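When the default column names (mean, sum, count) are not descriptive enough, agg() also accepts named aggregations, where each keyword becomes an output column and maps to a (column, function) pair. The following is a minimal sketch on the same data; the output names avg_salary, total_salary, and headcount are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'Salary': [50000, 60000, 70000, 55000, 52000],
})

# Each keyword names an output column; the tuple gives
# (source column, aggregation function)
summary = df.groupby('Gender').agg(
    avg_salary=('Salary', 'mean'),
    total_salary=('Salary', 'sum'),
    headcount=('Salary', 'count'),
)
print(summary)
```

The values match the earlier output; only the column labels differ.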

abhirajksingh
