PARTITION BY Clause in PostgreSQL
Last Updated :
11 Oct, 2024
In PostgreSQL, the PARTITION BY
clause plays an important role in dividing datasets into partitions so that various window functions can efficiently operate on those partitions.
In this guide, we will cover the syntax, examples, and the advantages of using the PARTITION BY
clause, making it a handy tool for working with PostgreSQL table partitioning.
PARTITION BY Clause in PostgreSQL
Using the PARTITION BY Clause we need to calculate row numbers, rank employees based on salary, or calculate cumulative totals. PARTITION BY
allows us to perform these operations on subsets of data without losing the dataset's integrity. It is particularly useful for PostgreSQL partitioning of large datasets and making queries faster.
Syntax:
window_function() OVER (PARTITION BY column_name ORDER BY column_name)
key terms
- window_function(): This can be any window function like ROW_NUMBER(), RANK(), SUM(), etc.
- PARTITION BY column_name: Defines the column(s) by which the data will be partitioned.
- ORDER BY column_name: Specifies the order in which rows within each partition are processed by the window function.
Why Use PARTITION BY
in PostgreSQL?
Partitioning data with the PARTITION BY
clause is beneficial in various scenarios:
- Data Analysis: When we need to group data for window functions like
ROW_NUMBER()
, RANK
()
, or SUM
()
, PARTITION BY
is essential to analyze each group or partition separately.
- Performance Optimization: It enhances query performance by breaking down large datasets into smaller, more manageable chunks.
- Efficient Querying: Helps PostgreSQL efficiently retrieve and process only relevant data within each partition, reducing query time and load on the database.
Examples of PARTITION BY Clause in PostgreSQL
Let's look at a few examples to demonstrate how the PARTITION BY clause works in PostgreSQL. Suppose we have a table called employees with the following data. This table shows the employee_id
(automatically generated by the SERIAL
type), their department, and their salary.
Query:
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
department VARCHAR(50),
salary NUMERIC
);
INSERT INTO employees (department, salary) VALUES
('HR', 50000),
('HR', 60000),
('IT', 70000),
('IT', 80000),
('Sales', 55000),
('Sales', 65000);
Output:
employee_id | department | salary |
---|
1 | HR | 50000 |
2 | HR | 60000 |
3 | IT | 70000 |
4 | IT | 80000 |
5 | Sales | 55000 |
6 | Sales | 65000 |
Example 1: Using PARTITION BY RANGE
in PostgreSQL
In this example, we will partition employees by department and assign row numbers based on salary in descending order. We want to assign a row number to each employee within their respective department.
Query:
SELECT employee_id, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_number
FROM employees;
Output:
employee_id | department | salary | row_number |
---|
2 | HR | 60000 | 1 |
1 | HR | 50000 | 2 |
4 | IT | 80000 | 1 |
3 | IT | 70000 | 2 |
6 | Sales | 65000 | 1 |
5 | Sales | 55000 | 2 |
Explanation:
In this query, the PARTITION BY department groups the data by department, and within each partition, the rows are assigned a unique row number based on salary in descending order. This allows for separate row numbering in each department.
Example 2: Using PARTITION BY LIST
for Cumulative Totals
In this example, we will calculate the cumulative salary for employees within each department. This operation helps in determining cumulative data within a partitioned subset.
Query:
SELECT employee_id, department, salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS cumulative_salary
FROM employees;
Output:
employee_id | department | salary | cumulative_salary |
---|
1 | HR | 50000 | 50000 |
2 | HR | 60000 | 110000 |
3 | IT | 70000 | 70000 |
4 | IT | 80000 | 150000 |
5 | Sales | 55000 | 55000 |
6 | Sales | 65000 | 120000 |
Explanation:
In this query, the SUM() function calculates the cumulative salary for employees within each department. The PARTITION BY department ensures that the cumulative sum is calculated within each department, while the ORDER BY salary sorts the employees by their salaries before applying the cumulative sum.
Example 3: Using PARTITION BY HASH
in PostgreSQL
Using the PARTITION BY HASH, we can split data based on hash values. This method is useful when we want to evenly distribute rows across partitions. For example, we could partition employee records based on their department's hash value:
Query:
CREATE TABLE employees_partitioned_by_hash (
employee_id SERIAL PRIMARY KEY,
department VARCHAR(50),
salary NUMERIC
)
PARTITION BY HASH (department);
CREATE TABLE employees_hr PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 0);
CREATE TABLE employees_it PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 1);
CREATE TABLE employees_sales PARTITION OF employees_partitioned_by_hash
FOR VALUES WITH (MODULUS 3, REMAINDER 2);
Explanation:
In this case, the table is PARTITION BY HASH
, with each partition based on the hash value of the department. This approach helps in distributing data more evenly across partitions.
Conclusion
PARTITION BY clause is another important feature in PostgreSQL when using window functions as it creates partitions from a dataset to carry out operations and analysis on personal subsets of data. With this clause, we are in a position to get analytical details of the results without affecting the whole database.
Similar Reads
PostgreSQL - GROUP BY clause
The GROUP BY clause in PostgreSQL is an essential tool that allows us to group rows that share the same values in one or more columns. This powerful functionality is commonly used to perform aggregate calculations such as SUM(), COUNT(), AVG(), and more, enabling us to summarize data efficiently. In
4 min read
Parameterize an PostgreSQL IN clause
In PostgreSQL, the IN clause is a powerful tool for filtering data based on a set of specified values. However, when dealing with dynamic values or user input, it's essential to parameterize the IN clause to prevent SQL injection vulnerabilities and improve query performance. In this article, we'll
4 min read
PostgreSQL - LIMIT clause
The PostgreSQL LIMIT clause is a handy tool used to fetch a specific subset of rows returned by a query. This clause is optional and can be a powerful way to control the amount of data your query returns, especially when working with large datasets. Let us better understand the LIMIT Clause in Postg
2 min read
PostgreSQL - HAVING clause
The HAVING clause in PostgreSQL is an essential feature for filtering grouped data that has been aggregated using functions like SUM(), COUNT(), AVG(), and others. Unlike the WHERE clause, which filters rows before aggregation, the HAVING clause is used to filter results after the grouping and aggre
4 min read
PostgreSQL - CUME_DIST Function
The PostgreSQL CUME_DIST() function is a powerful analytical tool used to determine the relative position of a value within a set of given values. This function helps compute the cumulative distribution of values in a result set, which can be particularly useful in statistical analysis and reporting
3 min read
PostgreSQL DATE_PART Function
Handling dates and times efficiently is essential for data-driven applications, and PostgreSQL provides powerful built-in functions for managing and manipulating time-based data. One such function is the DATE_PART() function, which allows us to extract specific subfields from date and timestamp valu
5 min read
PostgreSQL FETCH Clause
The PostgreSQL FETCH clause is an essential feature for controlling and managing the number of rows returned in our SQL queries. It provides a standardized approach for limiting results, similar to the LIMIT clause but with more flexibility and compatibility across different database systems. This a
4 min read
PostgreSQL - AGE Function
In PostgreSQL, the AGE() function is a powerful tool for calculating the difference between two TIMESTAMP values. This function is especially useful for determining the age of individuals or the duration between events. Let us better understand the AGE() Function in PostgreSQL from this article.Synt
2 min read
partition_copy in C++ STL
partition_copy is the inbuilt function defined in <algorithm> library in STL. partition_copy function duplicates the partitioned elements into the various containers given in its parameters. It requires 5 arguments which are the beginning and ending position of the container, the beginning pos
4 min read
PostgreSQL - Partial Index
In PostgreSQL, partial indexes are a powerful feature designed to improve query performance while minimizing index size. By allowing you to specify the rows of a table that should be indexed, partial indexes can significantly speed up queries that use common WHERE conditions with constant values.Let
2 min read