PARTITION BY Clause in PostgreSQL

Last Updated : 11 Oct, 2024

In PostgreSQL, the PARTITION BY clause divides the rows of a query result into partitions so that a window function can be computed separately within each partition.

In this guide, we will cover the syntax, examples, and advantages of using the PARTITION BY clause with window functions. We will also look briefly at the PARTITION BY option of CREATE TABLE, which is a separate feature used for PostgreSQL table partitioning.

PARTITION BY Clause in PostgreSQL

The PARTITION BY clause is useful when we need to calculate row numbers, rank employees based on salary, or calculate cumulative totals within groups. It lets us perform these operations on subsets of the data without collapsing rows: every input row still appears in the result. This clause should not be confused with PostgreSQL table partitioning, which splits a large table into separate physical tables and is declared in CREATE TABLE.

Syntax:

window_function() OVER (PARTITION BY column_name ORDER BY column_name)

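Key Terms:

  • window_function(): This can be any window function like ROW_NUMBER() or RANK(), or an aggregate such as SUM() used as a window function.
  • PARTITION BY column_name: Defines the column(s) or expression(s) by which the rows are divided into partitions. If it is omitted, the whole result set is treated as a single partition.
  • ORDER BY column_name: Specifies the order in which rows within each partition are processed by the window function. It is optional, but ranking functions and running totals depend on it.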
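
To make the two optional parts concrete, here is a minimal sketch that runs without any table, using an inline VALUES list. The alias t and the column names dept_size and salary_position are made up for illustration. The first window omits ORDER BY, so the count covers the whole partition; the second adds ORDER BY, so rows are numbered in a defined sequence.

```sql
SELECT department,
       salary,
       -- No ORDER BY: every row in a partition gets the same partition-wide count.
       COUNT(*)     OVER (PARTITION BY department)                      AS dept_size,
       -- With ORDER BY: rows are numbered within each partition, highest salary first.
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_position
FROM (VALUES ('HR', 50000),
             ('HR', 60000),
             ('IT', 70000)) AS t (department, salary);
```

Both window columns are computed in the same query, which shows that each OVER clause defines its own window independently.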

Why Use PARTITION BY in PostgreSQL?

Using the PARTITION BY clause is beneficial in various scenarios:

  • Data Analysis: When we need per-group results from window functions like ROW_NUMBER(), RANK(), or SUM(), PARTITION BY lets us analyze each group or partition separately.
  • No Loss of Detail: Unlike GROUP BY, PARTITION BY does not collapse rows, so each row keeps its own columns alongside the value computed for its partition.
  • Concise Querying: A window function with PARTITION BY can often replace a self-join or correlated subquery that computes the same per-group value. The clause does not make a query faster by itself; PostgreSQL typically has to sort rows by the partition and ordering columns, so an index on those columns may help.
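
The idea of analyzing each group separately while keeping every row is easiest to see next to GROUP BY. The sketch below uses a throwaway CTE named sample (a hypothetical name, not part of this article's schema) so it can be run on its own.

```sql
-- GROUP BY: one output row per department (2 rows for this sample).
WITH sample (department, salary) AS (
    VALUES ('HR', 50000), ('HR', 60000), ('IT', 70000)
)
SELECT department, SUM(salary) AS dept_total
FROM sample
GROUP BY department;

-- PARTITION BY: every input row is kept (3 rows), each carrying its department's total.
WITH sample (department, salary) AS (
    VALUES ('HR', 50000), ('HR', 60000), ('IT', 70000)
)
SELECT department, salary,
       SUM(salary) OVER (PARTITION BY department) AS dept_total
FROM sample;
```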

Examples of PARTITION BY Clause in PostgreSQL

Let's look at a few examples to demonstrate how the PARTITION BY clause works in PostgreSQL. Suppose we have a table called employees with the following data. This table shows the employee_id (automatically generated by the SERIAL type), their department, and their salary.

Query:

CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
department VARCHAR(50),
salary NUMERIC
);


INSERT INTO employees (department, salary) VALUES
('HR', 50000),
('HR', 60000),
('IT', 70000),
('IT', 80000),
('Sales', 55000),
('Sales', 65000);

Output:

 employee_id | department | salary
-------------+------------+--------
           1 | HR         |  50000
           2 | HR         |  60000
           3 | IT         |  70000
           4 | IT         |  80000
           5 | Sales      |  55000
           6 | Sales      |  65000

Example 1: Using PARTITION BY with ROW_NUMBER()

In this example, we will partition employees by department and assign each employee a row number within their department, ordered by salary from highest to lowest.

Query:

SELECT employee_id, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_number
FROM employees;

Output:

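 employee_id | department | salary | row_number
-------------+------------+--------+------------
           2 | HR         |  60000 |          1
           1 | HR         |  50000 |          2
           4 | IT         |  80000 |          1
           3 | IT         |  70000 |          2
           6 | Sales      |  65000 |          1
           5 | Sales      |  55000 |          2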

Explanation:

In this query, the PARTITION BY department groups the data by department, and within each partition, the rows are assigned a unique row number based on salary in descending order. This allows for separate row numbering in each department.
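
A common follow-up is to keep only the top row of each partition. Window functions cannot be referenced in a WHERE clause of the same query level, so one possible approach is to compute the row number in a subquery and filter outside it. This sketch assumes the employees table created earlier in this article; the aliases ranked and rn are illustrative.

```sql
-- Assumes the employees table and sample rows defined earlier in this article.
SELECT employee_id, department, salary
FROM (
    SELECT employee_id, department, salary,
           ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
) AS ranked
WHERE rn = 1          -- keep only the highest-paid employee of each department
ORDER BY department;
```

On the sample data this returns employees 2, 4, and 6. If two employees tie for the top salary, ROW_NUMBER() picks one of them arbitrarily; RANK() would keep both.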

Example 2: Using PARTITION BY with SUM() for Cumulative Totals

In this example, we will calculate the cumulative salary for employees within each department. This operation helps in determining cumulative data within a partitioned subset.

Query:

SELECT employee_id, department, salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM employees;

Output:

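 employee_id | department | salary | cumulative_salary
-------------+------------+--------+-------------------
           1 | HR         |  50000 |             50000
           2 | HR         |  60000 |            110000
           3 | IT         |  70000 |             70000
           4 | IT         |  80000 |            150000
           5 | Sales      |  55000 |             55000
           6 | Sales      |  65000 |            120000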

Explanation:

In this query, the SUM() function calculates a running total of salaries within each department. The PARTITION BY department restarts the total for each department. Because the window has an ORDER BY salary, the default frame runs from the first row of the partition through the current row (including any rows that tie with it on salary), which is what turns SUM() from a partition-wide total into a cumulative one.
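
The window frame can also be written out explicitly. The sketch below, which assumes the employees table from this article, puts the default frame next to an explicit ROWS frame; the column aliases running_range and running_rows are made up for illustration.

```sql
-- Assumes the employees table and sample rows defined earlier in this article.
SELECT employee_id, department, salary,
       -- Default frame when ORDER BY is present:
       -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (ties are summed together).
       SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS running_range,
       -- Explicit ROWS frame: adds exactly one row at a time, even when salaries tie.
       SUM(salary) OVER (PARTITION BY department ORDER BY salary
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_rows
FROM employees
ORDER BY department, salary;
```

On the sample data the two columns agree, because no two employees in a department share a salary. They would differ only for rows that tie on the ORDER BY column.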

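Example 3: Using PARTITION BY HASH for Table Partitioning

PARTITION BY also appears in CREATE TABLE, where it has a different meaning: it declares how a table is split into physical partitions (by RANGE, LIST, or HASH). This is table partitioning and is unrelated to window functions. With PARTITION BY HASH (available since PostgreSQL 11), each row is routed to a partition based on the hash value of the partition key. This method is useful when we want to spread rows across partitions and there is no natural range or list to split on. For example, we could partition employee records based on their department's hash value: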

Query:

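CREATE TABLE employees_partitioned_by_hash (
employee_id SERIAL,
department VARCHAR(50),
salary NUMERIC,
-- A primary key on a partitioned table must include the partition key column.
PRIMARY KEY (employee_id, department)
)
PARTITION BY HASH (department);

-- Three partitions: a row is stored in the partition whose REMAINDER matches
-- the hash of its department modulo 3. The partitions are numbered rather than
-- named after departments, because the hash decides where each department lands.
CREATE TABLE employees_part_0 PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 0);

CREATE TABLE employees_part_1 PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 1);

CREATE TABLE employees_part_2 PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 2);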

Explanation:

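In this case, the table is partitioned by hash on department: each row is stored in the partition selected by the hash value of its department. All rows of one department land in the same partition, so with only a few distinct departments the distribution can be uneven. Hash partitioning spreads data more evenly when the partition key has many distinct values.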
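
To see where rows actually end up, one option is to inspect the tableoid system column, which identifies the partition that physically stores each row. The sketch below is self-contained and uses a hypothetical table emp_hash_demo with two partitions (these names are not part of the example above). It makes no claim about which department maps to which partition, since that depends on the hash values.

```sql
-- Hypothetical demo table; all names here are illustrative only.
CREATE TABLE emp_hash_demo (
    department VARCHAR(50) NOT NULL,
    salary     NUMERIC
) PARTITION BY HASH (department);

CREATE TABLE emp_hash_demo_p0 PARTITION OF emp_hash_demo
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE emp_hash_demo_p1 PARTITION OF emp_hash_demo
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

INSERT INTO emp_hash_demo (department, salary) VALUES
    ('HR', 50000), ('HR', 60000), ('IT', 70000), ('Sales', 55000);

-- tableoid::regclass shows the name of the partition each row was routed to.
SELECT tableoid::regclass AS partition_name,
       department,
       COUNT(*) AS row_count
FROM emp_hash_demo
GROUP BY tableoid, department
ORDER BY department;
```

Each department appears under exactly one partition name, which reflects that rows with the same partition key value always hash to the same partition.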

Conclusion

The PARTITION BY clause is an important feature of window functions in PostgreSQL: it splits a result set into partitions so that calculations such as row numbers, ranks, and running totals are carried out on individual subsets of the data. With this clause, we can add analytical detail to each row without collapsing the result set. The PARTITION BY option of CREATE TABLE is a separate feature that controls how a table's rows are physically stored.
