PostgreSQL - DISTINCT ON expression
Last Updated: 11 Oct, 2024
The DISTINCT ON clause in PostgreSQL returns one row for each distinct value (or combination of values) of the columns we list, which makes it more flexible than the standard DISTINCT clause. Combined with an ORDER BY clause, DISTINCT ON lets us specify which row to keep for each distinct value.
This is particularly useful for selecting the most recent or highest value within each group of rows. In this article, we’ll explore the PostgreSQL DISTINCT ON syntax and work through two examples.
What is the PostgreSQL DISTINCT ON Clause?
- The DISTINCT ON clause in PostgreSQL allows us to retrieve one row for each distinct value of one or more columns in a table.
- Unlike the standard DISTINCT clause, which removes a row only when it duplicates another row in every selected column, DISTINCT ON compares only the listed columns and so gives us more control.
- It enables us to determine which row to retain by arranging the rows in a particular order through the ORDER BY clause.
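The difference between the two clauses can be sketched with a small inline data set. The orders CTE and its columns (customer, item, amount) are hypothetical and exist only for this illustration.

```sql
-- Plain DISTINCT: a row is dropped only when every selected column matches
-- another row, so three rows remain (the repeated Ann/pen/5 row collapses).
WITH orders (customer, item, amount) AS (
    VALUES
        ('Ann', 'pen',   5),
        ('Ann', 'pen',   5),   -- exact duplicate of the row above
        ('Ann', 'book', 20),
        ('Ben', 'pen',   5)
)
SELECT DISTINCT customer, item, amount
FROM orders
ORDER BY customer, item;

-- DISTINCT ON (customer): only the customer column is compared, so one row
-- per customer remains; ORDER BY ... amount DESC keeps the largest order.
WITH orders (customer, item, amount) AS (
    VALUES
        ('Ann', 'pen',   5),
        ('Ann', 'pen',   5),
        ('Ann', 'book', 20),
        ('Ben', 'pen',   5)
)
SELECT DISTINCT ON (customer) customer, item, amount
FROM orders
ORDER BY customer, amount DESC;
```

The first query should return three rows and the second two (Ann with her 20-unit book order, Ben with his only order).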
Syntax
SELECT DISTINCT ON (column1, column2, ...) column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...;
Explanation:
- DISTINCT ON (column1, column2, ...): This part tells PostgreSQL to return the first row for each unique combination of the specified columns.
- ORDER BY: The ORDER BY clause is crucial because it determines which row from each group will be kept. It must begin with the DISTINCT ON columns, and any further columns decide which row comes first within each group. Without ORDER BY, the row returned for each group is unpredictable.
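As a minimal sketch of the multi-column form, the query below keeps one row per (sensor, day) pair. The readings CTE and its columns are hypothetical and are defined inline so the statement runs on its own.

```sql
WITH readings (sensor, day, hour, value) AS (
    VALUES
        ('s1', DATE '2024-01-01',  9, 20.5),
        ('s1', DATE '2024-01-01', 17, 22.0),
        ('s1', DATE '2024-01-02',  9, 19.0),
        ('s2', DATE '2024-01-01', 12, 30.1)
)
-- The ORDER BY starts with the DISTINCT ON columns (sensor, day);
-- the extra key, hour DESC, makes the latest reading of each day come first.
SELECT DISTINCT ON (sensor, day) sensor, day, hour, value
FROM readings
ORDER BY sensor, day, hour DESC;
```

For sensor s1 on 2024-01-01 this should keep the 17:00 reading rather than the 09:00 one. Switching hour DESC to hour ASC would keep the earliest reading instead.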
Key Features of PostgreSQL DISTINCT ON
- Returns the first row for each distinct value of the specified columns.
- Works with the ORDER BY clause to determine which row to keep in each group.
- Enables retrieving data in a more controlled manner than the standard DISTINCT.
Examples of Using PostgreSQL DISTINCT ON
Let’s explore some examples to understand how DISTINCT ON works in real-world scenarios.
Example 1: Retrieve Highest Score for Each Student
First, create a table student_scores to store students' scores in various subjects.
CREATE TABLE student_scores (
    id SERIAL PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    subject VARCHAR(50) NOT NULL,
    score INTEGER NOT NULL
);
Next, insert some sample data:
INSERT INTO student_scores (name, subject, score)
VALUES
('Alice', 'Math', 90),
('Bob', 'Math', 85),
('Alice', 'Physics', 92),
('Bob', 'Physics', 88),
('Charlie', 'Math', 95),
('Charlie', 'Physics', 90);
Now, let’s retrieve each student's highest score across all subjects:
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC;
Output:
| name | subject | score |
|---|---|---|
| Alice | Physics | 92 |
| Bob | Physics | 88 |
| Charlie | Math | 95 |
Explanation: In this query, the DISTINCT ON (name) clause ensures that we get one row for each student. The ORDER BY clause sorts the rows by name and then by score in descending order, so the first row in each student's group, and therefore the one returned, is the row with that student's highest score.
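For comparison, roughly the same result can be written with the standard ROW_NUMBER() window function. This is a sketch against the student_scores table defined above; the alias ranked and the column rn are names chosen here for illustration.

```sql
-- Number each student's rows from highest to lowest score,
-- then keep only the top-ranked row per student.
SELECT name, subject, score
FROM (
    SELECT name, subject, score,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY score DESC) AS rn
    FROM student_scores
) AS ranked
WHERE rn = 1
ORDER BY name;
```

With the sample data this should return the same three rows as the DISTINCT ON query. The DISTINCT ON form is shorter but PostgreSQL-specific, while the window-function form is portable to other databases.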
Example 2: Log Data – Latest Request by URL
Suppose we have a log table that records URLs and the duration of each request:
CREATE TABLE logs (
    id SERIAL PRIMARY KEY,
    url VARCHAR(255) NOT NULL,
    request_duration INTEGER NOT NULL,
    timestamp TIMESTAMP NOT NULL
);
Insert some data:
INSERT INTO logs (url, request_duration, timestamp)
VALUES
('/home', 120, '2024-01-01 10:00:00'),
('/about', 95, '2024-01-01 11:00:00'),
('/home', 110, '2024-01-01 12:00:00'),
('/contact', 105, '2024-01-01 10:30:00'),
('/about', 100, '2024-01-01 12:30:00');
To retrieve the most recent request duration for each URL, use:
SELECT DISTINCT ON (url) url, request_duration, timestamp
FROM logs
ORDER BY url, timestamp DESC;
Output:
| url | request_duration | timestamp |
|---|---|---|
| /about | 100 | 2024-01-01 12:30:00 |
| /contact | 105 | 2024-01-01 10:30:00 |
| /home | 110 | 2024-01-01 12:00:00 |
Explanation: Here, DISTINCT ON (url) returns one row per URL. Because the ORDER BY url, timestamp DESC clause places the newest row first within each URL, that row is the most recent request.
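Because the ORDER BY must begin with url, the result above comes back sorted by URL. If a different final order is wanted, one possible approach is to wrap the query in a subquery and sort again outside it. The sketch below uses the logs table from this example; the alias latest is chosen for illustration.

```sql
-- Inner query: pick the most recent row per URL.
-- Outer query: re-sort those rows so the newest request comes first.
SELECT url, request_duration, timestamp
FROM (
    SELECT DISTINCT ON (url) url, request_duration, timestamp
    FROM logs
    ORDER BY url, timestamp DESC
) AS latest
ORDER BY timestamp DESC;
```

With the sample data, the rows should now appear in the order /about, /home, /contact.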
Important Points about PostgreSQL DISTINCT ON expression
- The PostgreSQL DISTINCT ON expression returns only the first row of each set of rows in which the given expressions have the same value, effectively removing duplicates based on the specified columns.
- Which row counts as the "first row" of each group depends on the ordering specified in the ORDER BY clause. Without an ORDER BY, the row chosen for each group is unpredictable.
- The DISTINCT ON expressions must match the leftmost expressions in the ORDER BY clause. If they do not, PostgreSQL rejects the query with an error.
- Unlike the DISTINCT clause, which removes only rows that are identical in every selected column, DISTINCT ON allows for more fine-grained control by specifying which row of each group to keep.
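The ORDER BY rule can be illustrated with the student_scores table from Example 1. The first statement is left commented out because PostgreSQL refuses to run it. The extra subject sort key in the second statement is an addition made here for illustration, not part of the original example.

```sql
-- Rejected: the ORDER BY does not start with the DISTINCT ON expression.
-- SELECT DISTINCT ON (name) name, subject, score
-- FROM student_scores
-- ORDER BY score DESC;
-- ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

-- Accepted: name leads the ORDER BY. The trailing subject key acts as a
-- tie-breaker, so the result stays deterministic if a student has the
-- same top score in two subjects.
SELECT DISTINCT ON (name) name, subject, score
FROM student_scores
ORDER BY name, score DESC, subject;
```

When two rows tie on every ORDER BY key, either of them may be returned, so adding a tie-breaker column is a reasonable habit whenever ties are possible.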
Conclusion
Overall, the PostgreSQL DISTINCT ON clause helps you get one row per distinct value of specific columns while giving you control over which row to keep. By using the ORDER BY clause, you can decide which entry, such as the highest score or the most recent log, should be shown. This makes it a concise tool for "one row per group" queries in PostgreSQL.