PARTITION BY Clause in PostgreSQL

Last Updated : 11 Oct, 2024

In PostgreSQL, the PARTITION BY clause divides the rows of a query result into partitions so that a window function is computed separately within each partition.

In this guide, we will cover the syntax, examples, and advantages of using the PARTITION BY clause with window functions. We will also look briefly at the PARTITION BY clause of CREATE TABLE, which is a separate feature used for PostgreSQL table partitioning.

PARTITION BY Clause in PostgreSQL

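The PARTITION BY clause is useful when we need to calculate row numbers, rank employees based on salary, or calculate cumulative totals within groups. PARTITION BY allows us to perform these operations on subsets of data while keeping every row in the result, unlike GROUP BY, which collapses each group into a single row. Note that PARTITION BY inside OVER() only affects how a window function is calculated. It does not change how the table is stored. Storing a large table in separate pieces is the job of table partitioning, which is declared with PARTITION BY in CREATE TABLE.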

Syntax:

window_function() OVER (PARTITION BY column_name ORDER BY column_name)

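Key Terms:

window_function(): Any window function such as ROW_NUMBER() or RANK(), or an aggregate function such as SUM() or AVG() used with OVER.
PARTITION BY column_name: Defines the column(s) or expression(s) by which the rows are partitioned. If it is omitted, the whole result set is treated as a single partition.
ORDER BY column_name: Specifies the order in which rows within each partition are processed by the window function. It is optional, but ranking functions and running totals depend on it for a well-defined result.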
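
As a minimal sketch of the syntax, the query below uses PARTITION BY without ORDER BY, so the aggregate is computed over the whole partition. The staff data, its column names, and the values are hypothetical and exist only to make the query runnable on its own.

```sql
-- Hypothetical inline data, so the query runs without any existing table.
WITH staff (name, team, pay) AS (
    VALUES ('Ann', 'A', 100),
           ('Bob', 'A', 300),
           ('Cid', 'B', 500)
)
SELECT name, team, pay,
       -- No ORDER BY inside OVER: the average covers every row of the team.
       AVG(pay) OVER (PARTITION BY team) AS team_avg_pay
FROM staff
ORDER BY team, name;
```

Ann and Bob both receive the team A average of 200, while Cid receives the team B average of 500. All three input rows remain in the result.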

Why Use `PARTITION BY` in PostgreSQL?

Partitioning data with the PARTITION BY clause is beneficial in various scenarios:

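Data Analysis: When we need per-group results from window functions like ROW_NUMBER(), RANK(), or SUM(), PARTITION BY is essential to analyze each group or partition separately while still returning every row.
Simpler Queries: A window function with PARTITION BY can often replace a self-join or correlated subquery, computing per-group values in a single query. The clause does not by itself make a query faster, since PostgreSQL still has to read and usually sort the rows.
Efficient Querying (table partitioning): In its separate CREATE TABLE form, PARTITION BY splits a large table into smaller physical partitions. PostgreSQL can then skip partitions that cannot contain matching rows, which can reduce query time and load on the database.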
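
The difference between grouping rows and partitioning them for a window function can be sketched with two queries over the same data. The inline emp data below is a hypothetical stand-in so that the queries run on their own.

```sql
-- GROUP BY collapses each department into a single output row.
WITH emp (department, salary) AS (
    VALUES ('HR', 50000), ('HR', 60000), ('IT', 70000)
)
SELECT department, SUM(salary) AS total_salary
FROM emp
GROUP BY department
ORDER BY department;

-- PARTITION BY keeps every input row and attaches the department total to it.
WITH emp (department, salary) AS (
    VALUES ('HR', 50000), ('HR', 60000), ('IT', 70000)
)
SELECT department, salary,
       SUM(salary) OVER (PARTITION BY department) AS total_salary
FROM emp
ORDER BY department, salary;
```

The first query returns two rows (HR with 110000 and IT with 70000). The second returns all three rows, with the HR rows each carrying 110000 and the IT row carrying 70000.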

Examples of PARTITION BY Clause in PostgreSQL

Let's look at a few examples to demonstrate how the PARTITION BY clause works in PostgreSQL. Suppose we have a table called employees with the following data. This table shows the employee_id (automatically generated by the SERIAL type), their department, and their salary.

Query:

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    department VARCHAR(50),
    salary NUMERIC
);


INSERT INTO employees (department, salary) VALUES
('HR', 50000),
('HR', 60000),
('IT', 70000),
('IT', 80000),
('Sales', 55000),
('Sales', 65000);

Output:

employee_id	department	salary
1	HR	50000
2	HR	60000
3	IT	70000
4	IT	80000
5	Sales	55000
6	Sales	65000

Example 1: Using `PARTITION BY` with ROW_NUMBER()

In this example, we will partition employees by department and assign each employee a row number within their department, ordered by salary from highest to lowest.

Query:

SELECT employee_id, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_number
FROM employees;

Output:

employee_id	department	salary	row_number
2	HR	60000	1
1	HR	50000	2
4	IT	80000	1
3	IT	70000	2
6	Sales	65000	1
5	Sales	55000	2

Explanation:

In this query, the PARTITION BY department groups the data by department, and within each partition, the rows are assigned a unique row number based on salary in descending order. This allows for separate row numbering in each department.
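
A common follow-up is to keep only the highest-paid employee of each department. Window functions cannot be used directly in a WHERE clause, so one possible approach, sketched here, is to compute the row number in a subquery and filter on it in the outer query. The sketch assumes the employees table created above. The alias names rn and ranked are illustrative.

```sql
SELECT employee_id, department, salary
FROM (
    -- Number the rows of each department from highest to lowest salary.
    SELECT employee_id, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) AS ranked
WHERE rn = 1          -- keep only the first row of every partition
ORDER BY department;
```

With the sample data, this returns employees 2, 4, and 6. If two employees in a department share the top salary, ROW_NUMBER() picks one of them arbitrarily. RANK() could be used instead to keep both.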

Example 2: Using `PARTITION BY` with SUM() for Cumulative Totals

In this example, we will calculate the cumulative salary for employees within each department. The running total starts again at zero for every department and grows as rows are added in order of increasing salary.

Query:

SELECT employee_id, department, salary,
       SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM employees;

Output:

employee_id	department	salary	cumulative_salary
1	HR	50000	50000
2	HR	60000	110000
3	IT	70000	70000
4	IT	80000	150000
5	Sales	55000	55000
6	Sales	65000	120000

Explanation:

In this query, the SUM() function calculates the cumulative salary for employees within each department. The PARTITION BY department ensures that the cumulative sum is calculated within each department, while the ORDER BY salary sorts the employees by their salaries before applying the cumulative sum.
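
One detail worth illustrating is how ties are handled. When ORDER BY is present and no frame is specified, PostgreSQL uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ORDER BY values as peers and adds them together. The sketch below compares that default with an explicit ROWS frame. The inline pay data is hypothetical and contains a deliberate tie.

```sql
-- Hypothetical inline data with two equal salaries in the same department.
WITH pay (id, department, salary) AS (
    VALUES (1, 'HR', 50000),
           (2, 'HR', 50000),
           (3, 'HR', 60000)
)
SELECT id, salary,
       -- Default RANGE frame: tied rows are summed together.
       SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS range_total,
       -- Explicit ROWS frame with id as a tie-breaker: one row at a time.
       SUM(salary) OVER (PARTITION BY department ORDER BY salary, id
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM pay
ORDER BY salary, id;
```

Here range_total is 100000, 100000, 160000 for ids 1 to 3, while rows_total is 50000, 100000, 160000. The sample employees table has no tied salaries within a department, so both forms would give the same result there.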

Example 3: Using `PARTITION BY HASH` in PostgreSQL

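PARTITION BY also appears in CREATE TABLE, where it has a different meaning from the window function clause used in the previous examples: it declares how the rows of a table are physically divided among partitions. Using PARTITION BY HASH, we can split data based on hash values. This method is useful when we want to distribute rows roughly evenly across partitions. For example, we could partition employee records based on the hash value of their department: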

Query:

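-- Parent table: rows are routed to partitions by the hash of department.
CREATE TABLE employees_partitioned_by_hash (
    employee_id SERIAL,
    department VARCHAR(50),
    salary NUMERIC,
    -- A primary key on a partitioned table must include the partition key column(s).
    PRIMARY KEY (employee_id, department)
)
PARTITION BY HASH (department);

-- One partition per possible remainder of the hash value divided by 3.
-- The partitions are named by remainder because the hash, not the name, decides where a row is stored.
CREATE TABLE employees_hash_p0 PARTITION OF employees_partitioned_by_hash
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);

CREATE TABLE employees_hash_p1 PARTITION OF employees_partitioned_by_hash
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);

CREATE TABLE employees_hash_p2 PARTITION OF employees_partitioned_by_hash
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);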

Explanation:

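In this case, the table is partitioned by HASH on department. PostgreSQL hashes each row's department value and stores the row in the partition whose REMAINDER matches that hash value divided by the MODULUS (3 here). All rows of one department land in the same partition, but which partition that is depends on the hash, not on the partition's name, and two departments may share a partition. Note also that PostgreSQL requires a primary key on a partitioned table to include every partition key column, so a key on employee_id alone would be rejected. This approach helps in distributing data more evenly across partitions when the partition key has many distinct values.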
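
To see where rows actually end up, we can insert some data and ask PostgreSQL which partition holds each row. The following is a self-contained sketch for PostgreSQL 11 or later. The table and partition names (emp_hash_demo and its _p0 to _p2 partitions) are hypothetical and chosen only for this illustration.

```sql
-- Hypothetical demo table, separate from the tables defined above.
CREATE TABLE emp_hash_demo (
    employee_id SERIAL,
    department  VARCHAR(50) NOT NULL,
    salary      NUMERIC,
    PRIMARY KEY (employee_id, department)  -- must include the partition key
) PARTITION BY HASH (department);

CREATE TABLE emp_hash_demo_p0 PARTITION OF emp_hash_demo
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE emp_hash_demo_p1 PARTITION OF emp_hash_demo
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE emp_hash_demo_p2 PARTITION OF emp_hash_demo
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);

-- Rows are inserted through the parent table and routed automatically.
INSERT INTO emp_hash_demo (department, salary) VALUES
('HR', 50000), ('HR', 60000),
('IT', 70000), ('IT', 80000),
('Sales', 55000), ('Sales', 65000);

-- tableoid identifies the partition that physically stores each row.
SELECT tableoid::regclass AS partition_name, department, COUNT(*) AS row_count
FROM emp_hash_demo
GROUP BY tableoid, department
ORDER BY department;
```

Each department appears with exactly one partition name, because equal department values always hash to the same partition. Which partition that is depends on PostgreSQL's hash function, so with only three distinct values the distribution may be uneven and a partition may stay empty.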

Conclusion

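The PARTITION BY clause is an important feature in PostgreSQL when using window functions: it divides a result set into partitions so that calculations such as row numbers, rankings, and cumulative totals are carried out on individual subsets of data. With this clause, we can obtain per-group analytical results while keeping every row in the output. The separate PARTITION BY clause of CREATE TABLE, shown in Example 3, controls how a table's rows are stored across partitions.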


savita8z3a3
