Introduction to Data Management

Aggregates

April 3, 2024 Aggregates 1


Announcements
§ Homework 2
• Posted
• Due on Friday
• Sqlite

§ Homework 3: coming up soon (SQL Azure)

Today’s lecture is more challenging!


Please study the slides carefully at home



Aggregates
SELECT count(*) as C
FROM Payroll
WHERE Job = 'TA';

May use an alias (here C).

C
2

SELECT count(*) as C, avg(Salary) as A
FROM Payroll
WHERE Job = 'TA';

We may compute several aggregates.

C A
2 55000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

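The aliased aggregates can be tried out with a small script. This is a minimal sketch, assuming an in-memory SQLite database loaded with the Payroll rows from the slide; the Python variable names are illustrative and not part of the lecture.

```python
import sqlite3

# Assumption: an in-memory SQLite database standing in for the lecture's Payroll table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

# Two aggregates computed over the same filtered rows, each with an alias.
cur = conn.execute(
    "SELECT count(*) AS C, avg(Salary) AS A FROM Payroll WHERE Job = 'TA'"
)
columns = [d[0] for d in cur.description]  # column names come from the aliases
row = cur.fetchone()
print(columns, row)
```

The query returns a single row because there is no GROUP BY: the whole filtered table is treated as one group.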
Today: GROUP BY



Group By
§ So far, a single aggregate, or a tuple of aggregates

count(*) avg(Salary) count(distinct Job)


… … …

§ Next: compute a set of aggregates, one per group:


… count(*)
… …
… …
… …



Group By Basics
SELECT Job, avg(Salary)
FROM Payroll
GROUP BY Job;

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

Result
Job avg(Salary)
TA 55000
Prof 95000

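A quick way to check the per-job averages is to run the query against a throwaway database. The sketch below assumes an in-memory SQLite database; the name `avg_by_job` is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Payroll table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

# One output row per distinct Job value.
rows = conn.execute("SELECT Job, avg(Salary) FROM Payroll GROUP BY Job").fetchall()
avg_by_job = dict(rows)  # row order is not guaranteed, so compare as a dict
print(avg_by_job)
```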
Group By Basics
Find total revenue for each product.

SELECT Product, sum(Price*Quant) as Rev
FROM Sales
GROUP BY Product;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

Result: one row for each product
Product Rev
Bagel 140 (60+50+30)
Banana 75 (25+50)
Apple 40 (40)

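The revenue-per-product query can be reproduced as follows. This is a sketch under the assumption that Price is stored as REAL and Quant as INT in an in-memory SQLite table; the column types are not stated on the slides.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Sales table; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# The expression Price*Quant is evaluated per row, then summed within each group.
rows = conn.execute(
    "SELECT Product, sum(Price*Quant) AS Rev FROM Sales GROUP BY Product"
).fetchall()
rev_by_product = dict(rows)
print(rev_by_product)
```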
Group By Basics
Find total revenue for each month.

SELECT Month, sum(Price*Quant) as Rev
FROM Sales
GROUP BY Month;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

Sales after GROUP BY Month
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Banana 0.5 50 Feb
Banana 5 10 Feb
Bagel 1.50 20 March
Apple 4 10 March

Result: one row for each month
Month Rev
Jan 110 (60+50)
Feb 75 (25+50)
March 70 (30+40)

Group By Basics
Find total revenue per month, for sales with price over 2.50

SELECT Month, sum(Price*Quant) as Rev
FROM Sales
WHERE Price > 2.5
GROUP BY Month;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

Not interested in these sales (Price is not over 2.50):
Bagel 1.50 20 March
Banana 0.5 50 Feb

Sales after WHERE Price > 2.5 and GROUP BY Month
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Banana 5 10 Feb
Apple 4 10 March

Result: one row for each month
Month Rev
Jan 110 (60+50)
Feb 50 (50)
March 40 (40)

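Running the per-month query with and without the WHERE clause shows how the filter changes the sums. A minimal sketch, assuming an in-memory SQLite table with guessed column types; `all_sales` and `over_2_50` are illustrative names.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Sales table; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# Revenue per month over all sales.
all_sales = dict(conn.execute(
    "SELECT Month, sum(Price*Quant) AS Rev FROM Sales GROUP BY Month"
).fetchall())

# Same grouping, but rows with Price <= 2.5 are removed before grouping.
over_2_50 = dict(conn.execute(
    "SELECT Month, sum(Price*Quant) AS Rev FROM Sales WHERE Price > 2.5 GROUP BY Month"
).fetchall())

print(all_sales)
print(over_2_50)
```

The filter drops the cheap Bagel sale in March and the cheap Banana sale in Feb, so only those two months change.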
Group By Basics
Find total revenue for each product and each month.

SELECT Product, Month, sum(Price*Quant) as Rev
FROM Sales
GROUP BY Product, Month;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

Result
Product Month Rev
Bagel Jan 110
Bagel March 30
Banana Feb 75
Apple March 40

A Source of Errors
What does this query return?
SELECT Product, Price, sum(Price*Quant) as Rev
FROM Sales
GROUP BY Product;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

One row for each product, but no unique price for the group:
Product Price Rev
Bagel ?? 140
Banana ?? 75
Apple ?? 40

Rule: every attribute in SELECT that is not inside an aggregate must also occur in GROUP BY

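Engines differ in how they treat this query. PostgreSQL and SQL Server reject it, while SQLite accepts it and fills Price with the value from an arbitrary row of each group. The sketch below, assuming an in-memory SQLite table with guessed column types, shows that behavior and one possible repair that replaces the bare Price with aggregates; `fixed` is an illustrative name.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Sales table; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# SQLite runs the query, but which Price it reports per group is arbitrary.
rows = conn.execute(
    "SELECT Product, Price, sum(Price*Quant) AS Rev FROM Sales GROUP BY Product"
).fetchall()
print(rows)

# One possible repair: say explicitly which price is wanted via aggregates.
fixed = {
    product: (lo, hi, rev)
    for product, lo, hi, rev in conn.execute(
        "SELECT Product, min(Price), max(Price), sum(Price*Quant) "
        "FROM Sales GROUP BY Product"
    )
}
print(fixed)
```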
Discussion so far

§ GROUP BY: list of attributes

§ SELECT: some group-by attrs, and aggregates

§ One output tuple for each group



Semantics



Semantics
SELECT attr1, attr2,.., agg1(..), agg2(..),..
FROM Tables
WHERE Condition
GROUP BY attr1, attr2,..;

§ Step 1: compute SELECT * FROM .. WHERE..

§ Step 2: GROUP BY

§ Step 3: for each group emit 1 output
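
The three steps can be mimicked in plain Python to make the order of evaluation concrete. This is only an illustration of the semantics, not how a database engine executes the query; it uses the Sales rows from the lecture, with `Price < 4.5` as the WHERE condition and `sum(Quant)` per Month as the aggregate.

```python
from collections import defaultdict

# Sales rows as (Product, Price, Quant, Month) tuples.
sales = [
    ("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
    ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March"),
]

# Step 1: SELECT * FROM Sales WHERE Price < 4.5
filtered = [row for row in sales if row[1] < 4.5]

# Step 2: GROUP BY Month (collect the Quant values of each group)
groups = defaultdict(list)
for product, price, quant, month in filtered:
    groups[month].append(quant)

# Step 3: emit one output per group, here Month and sum(Quant)
result = {month: sum(quants) for month, quants in groups.items()}
print(result)
```

A group exists only if at least one row survives Step 1, which is why a group can never be empty.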



Example
SELECT Month, sum(Quant)
FROM Sales
WHERE Price < 4.5
GROUP BY Month;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

Step 1: SELECT * FROM Sales WHERE Price < 4.5;
Product Price Quant Month
Bagel 3 20 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Apple 4 10 March

Step 2: Group-by
Product Price Quant Month
Bagel 3 20 Jan
Banana 0.5 50 Feb
Bagel 1.50 20 March
Apple 4 10 March

Step 3: each group, one output
Month Quant
Jan 20
Feb 50
March 30

Multiple Aggregates
SELECT Product, count(*), sum(Quant)
FROM Sales
GROUP BY Product;

Product count… sum…
Bagel 3 50
Banana 2 60
Apple 1 10

SELECT Product, count(*)
FROM Sales
GROUP BY Product;

Product count…
Bagel 3
Banana 2
Apple 1

SELECT Product
FROM Sales
GROUP BY Product;

Product
Bagel
Banana
Apple

Same as
SELECT DISTINCT Product
FROM Sales;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

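The equivalence between a GROUP BY with no aggregates and SELECT DISTINCT can be checked directly. A minimal sketch, assuming an in-memory SQLite table with guessed column types; results are sorted in Python because neither query guarantees an order.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Sales table; column types are guesses.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# GROUP BY with no aggregate: one row per group, carrying only the group key.
grouped = sorted(r[0] for r in conn.execute(
    "SELECT Product FROM Sales GROUP BY Product"))

# DISTINCT: duplicate elimination on the same column.
distinct = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT Product FROM Sales"))

print(grouped, distinct)
```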
Coping with Empty Groups



Coping with Empty Groups

§ A group is never empty, by definition!

§ Therefore count(*) ≥ 1

§ Sometimes we want answers with count(*)=0

§ Then we use outer-joins



Coping with Empty Groups
Count people per job:
SELECT Job, count(*)
FROM Payroll
GROUP BY Job;

Job Count(*)
TA 2
Prof 2

SELECT Job, count(*)
FROM Payroll
WHERE Salary > 55000
GROUP BY Job;

Job Count(*)
TA 1
Prof 2

SELECT Job, count(*)
FROM Payroll
WHERE Salary > 75000
GROUP BY Job;

Job Count(*)
Prof 2

TA group no longer exists.
Can never have count(*)=0
If we want them: outer joins!

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

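The disappearing group is easy to observe by raising the salary threshold step by step. In this sketch `count_per_job` is a hypothetical helper (not from the lecture) that runs the slide's query with the threshold passed as a parameter against an in-memory SQLite database.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Payroll table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

def count_per_job(min_salary):
    """Hypothetical helper: people per job among those earning more than min_salary."""
    rows = conn.execute(
        "SELECT Job, count(*) FROM Payroll WHERE Salary > ? GROUP BY Job",
        (min_salary,),
    ).fetchall()
    return dict(rows)

print(count_per_job(0))      # every row passes the filter
print(count_per_job(55000))  # the TA group shrinks
print(count_per_job(75000))  # the TA group is gone, not reported with a 0
```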
Coping with Empty Groups
How many cars does each person drive? Let’s start with a simpler example.

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

Regist
UserID Car
123 Charger
567 Civic
567 Pinto

We want this:
Name count
Jack 1
Magda 2

SELECT P.Name, count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
GROUP BY P.UserID;

Incorrect! Why? P.Name must occur in GROUP BY.

SELECT P.Name, count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
GROUP BY P.Name, P.UserID;

Now it’s correct.

Steps 1,2:
P.UserID P.Name P.Job P.Salary R.UserID R.Car
123 Jack TA 50000 123 Charger
567 Magda Prof 90000 567 Civic
567 Magda Prof 90000 567 Pinto

Result
Name count
Jack 1
Magda 2

To also include Allison, Dan, we will use outer joins.

Coping with Empty Groups
How many cars does each person drive?
To also include Allison, Dan, we will use outer joins.

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

Regist
UserID Car
123 Charger
567 Civic
567 Pinto

SELECT P.Name, count(*)
FROM Payroll P LEFT OUTER JOIN Regist R ON P.UserID = R.UserID
GROUP BY P.Name, P.UserID;

Steps 1,2:
P.UserID P.Name P.Job P.Salary R.UserID R.Car
123 Jack TA 50000 123 Charger
345 Allison TA 60000 NULL NULL
567 Magda Prof 90000 567 Civic
567 Magda Prof 90000 567 Pinto
789 Dan Prof 100000 NULL NULL

Name count
Jack 1
Allison 1
Magda 2
Dan 1

The counts for Allison and Dan should be 0! How to fix?

SELECT P.Name, count(R.Car)
FROM Payroll P LEFT OUTER JOIN Regist R ON P.UserID = R.UserID
GROUP BY P.Name, P.UserID;

Count ignores NULLs.

Name count
Jack 1
Allison 0
Magda 2
Dan 0

Now it’s correct.

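The difference between count(*) and count(R.Car) after a left outer join can be seen side by side. A minimal sketch, assuming in-memory SQLite copies of Payroll and Regist; `star_counts` and `car_counts` are illustrative names.

```python
import sqlite3

# Assumption: in-memory SQLite copies of the Payroll and Regist tables from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.execute("CREATE TABLE Regist (UserID INT, Car TEXT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)
conn.executemany(
    "INSERT INTO Regist VALUES (?, ?)",
    [(123, "Charger"), (567, "Civic"), (567, "Pinto")],
)

join = "FROM Payroll P LEFT OUTER JOIN Regist R ON P.UserID = R.UserID"
group = "GROUP BY P.Name, P.UserID"

# count(*) counts rows, including the NULL-padded row of a person with no car.
star_counts = dict(conn.execute(f"SELECT P.Name, count(*) {join} {group}"))

# count(R.Car) skips rows where R.Car is NULL, so carless people get 0.
car_counts = dict(conn.execute(f"SELECT P.Name, count(R.Car) {join} {group}"))

print(star_counts)
print(car_counts)
```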
Coping with Empty Groups
For each job, how many people earn more than 75000?

SELECT Job, count(*)
FROM Payroll
WHERE Salary > 75000
GROUP BY Job;

Job Count(*)
Prof 2

To include jobs where count(*)=0, we will use a self-outer-join.
We want to include all jobs, even when the count is 0. Need an outer join with the Jobs.

Want this:
Job …
TA 0
Prof 2

SELECT P1.Job, count(DISTINCT P2.UserID)
FROM Payroll P1 LEFT OUTER JOIN Payroll P2
ON P1.Job = P2.Job and P2.Salary > 75000
GROUP BY P1.Job;

Left Outer Join
P1.UserID P1.Name P1.Job P1.Salary P2.UserID P2.Name P2.Job P2.Salary
123 Jack TA 50000 NULL NULL NULL NULL
345 Allison TA 60000 NULL NULL NULL NULL
567 Magda Prof 90000 567 Magda Prof 90000
789 Dan Prof 100000 567 Magda Prof 90000
567 Magda Prof 90000 789 Dan Prof 100000
789 Dan Prof 100000 789 Dan Prof 100000

Group by P1.Job. What do we write inside count(…)? Count the distinct P2.UserID values: each high earner appears once per P1 row with the same job, and NULLs are ignored.

Job Count(…)
TA 0
Prof 2

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

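Trying the three candidate count expressions on the self-outer-join makes the choice of count(DISTINCT P2.UserID) clearer. A minimal sketch, assuming an in-memory SQLite copy of Payroll; `counts_for` is a hypothetical helper that only swaps the aggregate expression.

```python
import sqlite3

# Assumption: in-memory SQLite copy of the Payroll table from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

def counts_for(aggregate):
    """Hypothetical helper: run the self-outer-join with a given count expression."""
    sql = (
        f"SELECT P1.Job, {aggregate} "
        "FROM Payroll P1 LEFT OUTER JOIN Payroll P2 "
        "ON P1.Job = P2.Job AND P2.Salary > 75000 "
        "GROUP BY P1.Job"
    )
    return dict(conn.execute(sql))

rows_per_job = counts_for("count(*)")                    # counts joined rows
matches_per_job = counts_for("count(P2.UserID)")         # skips NULLs, keeps duplicates
people_per_job = counts_for("count(DISTINCT P2.UserID)") # the query from the slide

print(rows_per_job, matches_per_job, people_per_job)
```

Only the DISTINCT version gives the intended answer, because each Prof with a high salary is matched once for every Prof row on the left side.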
Discussion

Coping with empty groups requires some creativity

§ Use Left-outer-join

§ Sometimes, you need a self-left-outer-join



The HAVING Clause



The HAVING Clause

§ WHERE:
• Applies a predicate to a single tuple*
• Cannot use any aggregate operation

§ HAVING:
• Applies a predicate to an entire group
• May use aggregate operations
• Can only check attributes occurring in GROUP-BY

* Actually, to one tuple from each relation in the FROM clause


The HAVING Clause
Find the total quantity of products that were sold ≥ 2 times.

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

April 3, 2024 Aggregates 79


The HAVING Clause
Find the total quantity of products that were sold ≥ 2 times.

SELECT Product, sum(Quant)


FROM Sales
GROUP BY Product
HAVING count(*) >= 2;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

April 3, 2024 Aggregates 80


The HAVING Clause
Find the total quantity of products that were sold ≥ 2 times.

SELECT Product, sum(Quant)


FROM Sales
GROUP BY Product
HAVING count(*) >= 2;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March

April 3, 2024 Aggregates 81


The HAVING Clause
Find the total quantity of products that were sold ≥ 2 times.

SELECT Product, sum(Quant)


FROM Sales
GROUP BY Product
HAVING count(*) >= 2;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan count(*)=3
Bagel 1.50 20 March
Banana 0.5 50 Feb
count(*)=2
Banana 5 10 Feb
Apple 4 10 March count(*)=1 NOT included

April 3, 2024 Aggregates 82


The HAVING Clause
Find the total quantity of products that were sold ≥ 2 times.

SELECT Product, sum(Quant)
FROM Sales
GROUP BY Product
HAVING count(*) >= 2;

Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
    count(*)=3
Banana 0.5 50 Feb
Banana 5 10 Feb
    count(*)=2
Apple 4 10 March
    count(*)=1 NOT included

Result
Product sum
Bagel 50
Banana 60
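
To check the answer by hand, the query can be run against the Sales rows from the slide. This is a small sketch using Python's sqlite3 module; the column types and the ORDER BY (added only to make the output order deterministic) are assumptions, not part of the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT);
    INSERT INTO Sales VALUES
        ('Bagel', 3, 20, 'Jan'), ('Bagel', 5, 10, 'Jan'),
        ('Bagel', 1.50, 20, 'March'), ('Banana', 0.5, 50, 'Feb'),
        ('Banana', 5, 10, 'Feb'), ('Apple', 4, 10, 'March');
""")

# Groups with fewer than 2 rows (here: Apple) are dropped by HAVING.
rows = conn.execute("""
    SELECT Product, sum(Quant)
    FROM Sales
    GROUP BY Product
    HAVING count(*) >= 2
    ORDER BY Product
""").fetchall()

print(rows)
```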


SQL Query Summary
SELECT A
FROM R1, …, Rn
WHERE C1
GROUP BY a1, …, ak
HAVING C2
ORDER BY T

A = any attributes from a1, …, ak and/or any aggregates

C1 = any condition on the attributes in R1, …, Rn

C2 = any condition on a1, …, ak and/or any aggregates

T = any attributes from a1, …, ak and/or any aggregates
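
One possible instance of this template on the Sales table, as a sketch run through Python's sqlite3 module. The particular conditions (Price >= 1, count(*) >= 2) and the choice to group by Month are made up for illustration; the SQL comments map each clause back to the symbols above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT);
    INSERT INTO Sales VALUES
        ('Bagel', 3, 20, 'Jan'), ('Bagel', 5, 10, 'Jan'),
        ('Bagel', 1.50, 20, 'March'), ('Banana', 0.5, 50, 'Feb'),
        ('Banana', 5, 10, 'Feb'), ('Apple', 4, 10, 'March');
""")

rows = conn.execute("""
    SELECT Month, sum(Quant)   -- A:  a grouping attribute and an aggregate
    FROM Sales                 -- R1
    WHERE Price >= 1           -- C1: checked on each tuple
    GROUP BY Month             -- a1
    HAVING count(*) >= 2       -- C2: checked on each group
    ORDER BY Month             -- T
""").fetchall()

print(rows)
```

Here WHERE removes the Banana sale priced at 0.5, so the Feb group has a single row left and HAVING drops it.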


Discussion: WHERE vs. HAVING
§ WHERE:
• Applies to a single tuple from each table
• May decrease the size of groups, even make them empty
• Cannot use aggregates (e.g., count(*)=5, sum(…) > 10)

§ HAVING:
• Applies to an entire group: keep it or drop it
• May use aggregates (e.g., count(*)=5, sum(…) > 10)
• May only use attributes in GROUP-BY
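
A short sketch of both points on the Sales table, using Python's sqlite3 module. The specific predicates are chosen only for illustration. The first statement puts an aggregate in WHERE, which Sqlite rejects; the second shows WHERE shrinking a group before HAVING decides whether to keep it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT);
    INSERT INTO Sales VALUES
        ('Bagel', 3, 20, 'Jan'), ('Bagel', 5, 10, 'Jan'),
        ('Bagel', 1.50, 20, 'March'), ('Banana', 0.5, 50, 'Feb'),
        ('Banana', 5, 10, 'Feb'), ('Apple', 4, 10, 'March');
""")

# An aggregate in WHERE is an error: WHERE sees one tuple at a time.
try:
    conn.execute("SELECT Product FROM Sales WHERE count(*) >= 2")
    where_error = None
except sqlite3.OperationalError as err:
    where_error = str(err)

# WHERE removes the Banana row priced at 0.5, so the Banana group
# shrinks to one row; HAVING then drops the whole group.
kept = conn.execute("""
    SELECT Product, count(*)
    FROM Sales
    WHERE Price >= 1
    GROUP BY Product
    HAVING count(*) >= 2
""").fetchall()

print(where_error)
print(kept)
```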



The Witness
§ SQL provides the aggregate operators min, max

§ SQL does not have argmin or argmax

§ Often we want to find the record that achieves that minimum or maximum: we call it The Witness

§ One way to compute it is using the HAVING clause


The Witnessing Problem
Find the person with highest salary for each job

Desired answer:
Job Name Salary
TA Allison 60000
Prof Dan 100000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT Job, MAX(Salary)
FROM Payroll
GROUP BY Job

Job Salary
TA 60000
Prof 100000

Finding max is easy.
But we want argmax. How do we find the witness?

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT Job, Name, MAX(Salary)
FROM Payroll
GROUP BY Job

Does this work?

WRONG!
Name not in GROUP BY

Sqlite does not return an error, but the Name it picks is not defined by standard SQL; other systems (e.g., SQL Server) reject the query.
Don’t use this.

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

Plan:
1. Compute the max(Salary) for each Job
2. Join back with Payroll on Job
3. Return the users where Salary = max(Salary)
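
The three steps can be followed literally with a subquery in the FROM clause. This is a sketch of an alternative formulation run through Python's sqlite3 module, not the self-join and HAVING query that the lecture builds; the alias M and the column name MaxSalary are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

rows = conn.execute("""
    SELECT P.Job, P.Name, P.Salary
    FROM Payroll AS P,
         (SELECT Job, max(Salary) AS MaxSalary   -- step 1: max per Job
          FROM Payroll
          GROUP BY Job) AS M
    WHERE P.Job = M.Job                          -- step 2: join back on Job
      AND P.Salary = M.MaxSalary                 -- step 3: keep the witnesses
    ORDER BY P.Job
""").fetchall()

print(rows)
```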

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

Plan:
1. Compute the max(Salary) for each Job
2. Join back with Payroll on Job
3. Return the users where Salary = max(Salary)

We first join. The check in step 3 goes in HAVING.

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT P1.Job, MAX(P1.Salary)
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING P2.Salary = MAX(P1.Salary)

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING P2.Salary = MAX(P1.Salary)

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING P2.Salary = MAX(P1.Salary)

Incorrect!

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING P2.Salary = MAX(P1.Salary)

Correct; but not done!

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job

SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING P2.Salary = MAX(P1.Salary)

Which P2 should we return for each Job?

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000


The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

Payroll join with Payroll:

P1                           P2
UserID Name    Job  Salary   UserID Name    Job  Salary
123    Jack    TA   50000    123    Jack    TA   50000
345    Allison TA   60000    123    Jack    TA   50000
123    Jack    TA   50000    345    Allison TA   60000
345    Allison TA   60000    345    Allison TA   60000
567    Magda   Prof 90000    567    Magda   Prof 90000
789    Dan     Prof 100000   567    Magda   Prof 90000
567    Magda   Prof 90000    789    Dan     Prof 100000
789    Dan     Prof 100000   789    Dan     Prof 100000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000


The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

Group by:

P1                           P2
UserID Name    Job  Salary   UserID Name    Job  Salary
123    Jack    TA   50000    123    Jack    TA   50000
345    Allison TA   60000    123    Jack    TA   50000
123    Jack    TA   50000    345    Allison TA   60000
345    Allison TA   60000    345    Allison TA   60000
567    Magda   Prof 90000    567    Magda   Prof 90000
789    Dan     Prof 100000   567    Magda   Prof 90000
567    Magda   Prof 90000    789    Dan     Prof 100000
789    Dan     Prof 100000   789    Dan     Prof 100000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000


The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

Compute max(P1.Salary):

P1                           P2
UserID Name    Job  Salary   UserID Name    Job  Salary
123    Jack    TA   50000    123    Jack    TA   50000
345    Allison TA   60000    123    Jack    TA   50000
    max(salary)=60000
123    Jack    TA   50000    345    Allison TA   60000
345    Allison TA   60000    345    Allison TA   60000
    max(salary)=60000
567    Magda   Prof 90000    567    Magda   Prof 90000
789    Dan     Prof 100000   567    Magda   Prof 90000
    max(salary)=100000
567    Magda   Prof 90000    789    Dan     Prof 100000
789    Dan     Prof 100000   789    Dan     Prof 100000
    max(salary)=100000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
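
The intermediate steps can be inspected directly. The sketch below, using Python's sqlite3 module, counts the rows of the self-join and then lists each group with its max(P1.Salary) before HAVING is applied; the extra output column and the ORDER BY are additions for inspection only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# Join step: every person is paired with everyone holding the same job.
joined_rows = conn.execute("""
    SELECT count(*)
    FROM Payroll AS P1, Payroll AS P2
    WHERE P1.Job = P2.Job
""").fetchone()[0]

# Group step: one group per P2 person; max(P1.Salary) is the best
# salary among that person's same-job colleagues (including themselves).
groups = conn.execute("""
    SELECT P1.Job, P2.Name, P2.Salary, max(P1.Salary)
    FROM Payroll AS P1, Payroll AS P2
    WHERE P1.Job = P2.Job
    GROUP BY P1.Job, P2.Name, P2.Salary
    ORDER BY P2.Salary
""").fetchall()

print(joined_rows)
for group in groups:
    print(group)
```

Only the groups where the third and fourth columns agree pass the HAVING check.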


The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

Check HAVING:

P1                           P2
UserID Name    Job  Salary   UserID Name    Job  Salary
123    Jack    TA   50000    123    Jack    TA   50000
345    Allison TA   60000    123    Jack    TA   50000
    max(salary)=60000
123    Jack    TA   50000    345    Allison TA   60000
345    Allison TA   60000    345    Allison TA   60000
    max(salary)=60000
567    Magda   Prof 90000    567    Magda   Prof 90000
789    Dan     Prof 100000   567    Magda   Prof 90000
    max(salary)=100000
567    Magda   Prof 90000    789    Dan     Prof 100000
789    Dan     Prof 100000   789    Dan     Prof 100000
    max(salary)=100000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000


The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

P1                           P2
UserID Name    Job  Salary   UserID Name    Job  Salary
123    Jack    TA   50000    123    Jack    TA   50000
345    Allison TA   60000    123    Jack    TA   50000
    max(salary)=60000
123    Jack    TA   50000    345    Allison TA   60000
345    Allison TA   60000    345    Allison TA   60000
    max(salary)=60000
567    Magda   Prof 90000    567    Magda   Prof 90000
789    Dan     Prof 100000   567    Magda   Prof 90000
    max(salary)=100000
567    Magda   Prof 90000    789    Dan     Prof 100000
789    Dan     Prof 100000   789    Dan     Prof 100000
    max(salary)=100000

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

P1.Job P2.Name P2.Salary
TA Allison 60000
Prof Dan 100000


The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;

Final output has the witnesses

Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000

P1.Job P2.Name P2.Salary
TA Allison 60000
Prof Dan 100000
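
The witness query can be run end to end with Python's sqlite3 module, as sketched below. The ORDER BY is added only to make the output order deterministic, and the extra row for 'Eve' is hypothetical, inserted to show what happens when two people tie for the highest salary.

```python
import sqlite3

WITNESS_QUERY = """
    SELECT P1.Job, P2.Name, P2.Salary
    FROM Payroll AS P1, Payroll AS P2
    WHERE P1.Job = P2.Job
    GROUP BY P1.Job, P2.Name, P2.Salary
    HAVING MAX(P1.Salary) = P2.Salary
    ORDER BY P1.Job, P2.Name
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT);
    INSERT INTO Payroll VALUES
        (123, 'Jack', 'TA', 50000), (345, 'Allison', 'TA', 60000),
        (567, 'Magda', 'Prof', 90000), (789, 'Dan', 'Prof', 100000);
""")

# Slide data: one witness per job.
witnesses = conn.execute(WITNESS_QUERY).fetchall()

# Hypothetical tie: a second TA with the same top salary.
conn.execute("INSERT INTO Payroll VALUES (999, 'Eve', 'TA', 60000)")
witnesses_with_tie = conn.execute(WITNESS_QUERY).fetchall()

print(witnesses)
print(witnesses_with_tie)
```

With a tie, the query returns every person who achieves the maximum, so a job can have more than one witness.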


Summary
Group-by can be subtle!

§ Empty groups

§ Having clause

§ Finding the witness
