lec05_aggregates
Aggregates
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
April 3, 2024 Aggregates 3
Aggregates
May use alias:
SELECT count(*) as C
FROM Payroll
WHERE Job = 'TA';
Result
C
2
We may compute several aggregates.
Today: GROUP BY
Group By Basics
SELECT Job, avg(Salary)
FROM Payroll
GROUP BY Job;
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
Result
Job avg(Salary)
TA 55000
Prof 95000
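The grouping above can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory copy of the Payroll table from the slide. The alias avg_salary and the ORDER BY are additions for readable, deterministic output and are not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

# One output row per distinct Job; avg() is computed inside each group.
avg_by_job = conn.execute(
    "SELECT Job, avg(Salary) AS avg_salary FROM Payroll GROUP BY Job ORDER BY Job"
).fetchall()
print(avg_by_job)  # [('Prof', 95000.0), ('TA', 55000.0)]
```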
Group By Basics
Find total revenue for each product.
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March
Result: one row for each product
Product Rev
Bagel 140 (60+50+30)
Banana 75 (25+50)
Apple 40 (40)
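The slide shows the expected result but not a query for it. One query that produces it is sketched below, assuming Python's built-in sqlite3 module and an in-memory copy of the Sales table. The column types are assumptions, and the query text is a reconstruction modeled on the per-month query used elsewhere in this lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# Group the rows by Product, then sum Price*Quant inside each group.
rev_by_product = dict(conn.execute(
    "SELECT Product, sum(Price*Quant) AS Rev FROM Sales GROUP BY Product"
).fetchall())
print(rev_by_product)
```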
Group By Basics
Find total revenue for each month.
SELECT Month, sum(Price*Quant) as Rev
FROM Sales
GROUP BY Month;
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March
Sales, after GROUP BY Month
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Banana 0.5 50 Feb
Banana 5 10 Feb
Bagel 1.50 20 March
Apple 4 10 March
Result: one row for each month
Month Rev
Jan 110 (60+50)
Feb 75 (25+50)
March 70 (30+40)
Group By Basics
Find total revenue per month, for sales over 2.50
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March (not interested in this sale)
Banana 0.5 50 Feb (not interested in this sale)
Banana 5 10 Feb
Apple 4 10 March
Group By Basics
Find total revenue per month, for sales over 2.50
SELECT Month, sum(Price*Quant) as Rev
FROM Sales
WHERE Price > 2.5
GROUP BY Month;
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March
Sales, after WHERE Price > 2.5 and GROUP BY Month
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Banana 5 10 Feb
Apple 4 10 March
Result: one row for each month
Month Rev
Jan 110 (60+50)
Feb 50 (50)
March 40 (40)
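To see that WHERE removes rows before the groups are formed, the per-month query can be run with and without the filter. This is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory copy of the Sales table; the column types and the variable names all_sales and over_250 are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# No filter: every sale contributes to its month's group.
all_sales = dict(conn.execute(
    "SELECT Month, sum(Price*Quant) AS Rev FROM Sales GROUP BY Month"
).fetchall())

# WHERE is applied to single rows first; only surviving rows are grouped.
over_250 = dict(conn.execute(
    "SELECT Month, sum(Price*Quant) AS Rev FROM Sales "
    "WHERE Price > 2.5 GROUP BY Month"
).fetchall())

print(all_sales)
print(over_250)
```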
Group By Basics
Find total revenue for each product and each month.
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March
Result
Product Month Rev
Bagel Jan 110
Bagel March 30
Banana Feb 75
Apple March 40
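Grouping by two attributes creates one group per distinct (Product, Month) pair. The slide does not show the query, so the one below is a reconstruction under that assumption, run with Python's built-in sqlite3 module on an in-memory copy of the Sales table (column types assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

rows = conn.execute(
    "SELECT Product, Month, sum(Price*Quant) AS Rev "
    "FROM Sales GROUP BY Product, Month"
).fetchall()

# Key the result by the (Product, Month) pair that identifies each group.
rev_by_product_month = {(product, month): rev for product, month, rev in rows}
print(rev_by_product_month)
```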
A Source of Errors
What does this query return?
SELECT Product, Price, sum(Price*Quant) as Rev
FROM Sales
GROUP BY Product;
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March
One row for each product, but no unique price for the group:
Product Price Rev
Bagel ?? 140
Banana ?? 75
Apple ?? 40
Rule: every attribute in SELECT that is not inside an aggregate must also occur in GROUP BY
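There are two usual ways to make such a query well defined: add Price to GROUP BY, which changes the groups, or wrap Price in an aggregate. Both are sketched below, assuming Python's built-in sqlite3 module and an in-memory copy of Sales. The choice of max(Price) is only an illustration; the slide does not say which price is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# Option 1: Price becomes part of the group key, one row per (Product, Price).
by_product_price = conn.execute(
    "SELECT Product, Price, sum(Price*Quant) AS Rev "
    "FROM Sales GROUP BY Product, Price"
).fetchall()

# Option 2: keep one row per Product and say which price is meant.
with_max_price = conn.execute(
    "SELECT Product, max(Price) AS MaxPrice, sum(Price*Quant) AS Rev "
    "FROM Sales GROUP BY Product ORDER BY Product"
).fetchall()

print(len(by_product_price))  # 6
print(with_max_price)
```

Option 1 returns six rows because every (Product, Price) pair in this table is distinct, so it no longer answers "revenue per product"; Option 2 keeps three rows.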
Discussion so far
§ Step 2: GROUP BY
Step 2: group-by
Product Price Quant Month
Bagel 3 20 Jan
Banana 0.5 50 Feb
Bagel 1.50 20 March
Apple 4 10 March
Step 3
Month Quant
Jan 20
Feb 50
March 30
Multiple Aggregates
SELECT Product, count(*), sum(Quant)
Product count… sum…
§ Therefore count(*) ≥ 1
Coping with Empty Groups
SELECT Job, count(*)
FROM Payroll
GROUP BY Job;
Job Count(*)
TA 2
Prof 2
SELECT Job, count(*)
FROM Payroll
WHERE Salary > 75000
GROUP BY Job;
Job Count(*)
Prof 2
The TA group no longer exists.
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
Can never have count(*)=0
If we want them: outer joins!
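The disappearing group is easy to reproduce. This is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory copy of Payroll; the variable name high_earners is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

# WHERE drops both TA rows, so no TA group is ever formed:
# the result has no ('TA', 0) row at all.
high_earners = conn.execute(
    "SELECT Job, count(*) FROM Payroll WHERE Salary > 75000 GROUP BY Job"
).fetchall()
print(high_earners)  # [('Prof', 2)]
```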
Coping with Empty Groups
How many cars does each person drive? Let’s start with a simpler example.
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
Regist
UserID Car
123 Charger
567 Civic
567 Pinto
SELECT P.Name, count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
GROUP BY P.UserID;
Incorrect! Why? P.Name must occur in GROUP BY.
We want this:
Name count
Jack 1
Magda 2
Coping with Empty Groups
How many cars does each person drive?
SELECT P.Name, count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
GROUP BY P.Name, P.UserID;
Now it’s correct:
Name count
Jack 1
Magda 2
Coping with Empty Groups
How many cars does each person drive?
SELECT P.Name, count(*)
FROM Payroll P, Regist R
WHERE P.UserID = R.UserID
GROUP BY P.Name, P.UserID;
To also include Allison and Dan, we will use outer joins.
Coping with Empty Groups
How many cars does each person drive?
SELECT P.Name, count(*)
FROM Payroll P LEFT OUTER JOIN Regist R ON P.UserID = R.UserID
GROUP BY P.Name, P.UserID;
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
Regist
UserID Car
123 Charger
567 Civic
567 Pinto
count(*) also counts the NULL-padded row, so Allison and Dan get 1. Should be 0! How to fix?
Coping with Empty Groups
How many cars does each person drive?
SELECT P.Name, count(R.Car)
FROM Payroll P LEFT OUTER JOIN Regist R ON P.UserID = R.UserID
GROUP BY P.Name, P.UserID;
count(R.Car) ignores NULLs. Now it’s correct.
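The difference between count(*) and count(R.Car) shows up when both are computed in the same outer-join query. This is a minimal sketch, assuming Python's built-in sqlite3 module and in-memory copies of Payroll and Regist; the ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.execute("CREATE TABLE Regist (UserID INT, Car TEXT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)
conn.executemany(
    "INSERT INTO Regist VALUES (?, ?)",
    [(123, "Charger"), (567, "Civic"), (567, "Pinto")],
)

# count(*) counts rows, including the NULL-padded row of a person with no car;
# count(R.Car) counts only rows where R.Car is not NULL.
cars = conn.execute(
    "SELECT P.Name, count(*), count(R.Car) "
    "FROM Payroll P LEFT OUTER JOIN Regist R ON P.UserID = R.UserID "
    "GROUP BY P.Name, P.UserID ORDER BY P.UserID"
).fetchall()
print(cars)
# [('Jack', 1, 1), ('Allison', 1, 0), ('Magda', 2, 2), ('Dan', 1, 0)]
```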
Coping with Empty Groups
For each job, how many people earn more than 75000?
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
Coping with Empty Groups
For each job, how many people earn more than 75000?
We want to include all jobs, even when the count is 0. Need an outer join with the Jobs.
SELECT P1.Job, count(DISTINCT P2.UserID)
FROM Payroll P1 LEFT OUTER JOIN Payroll P2
ON P1.Job = P2.Job and P2.Salary > 75000
GROUP BY P1.Job;
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
Left outer join (group by P1.Job, count the distinct P2.UserID values)
P1.UserID P1.Name P1.Job P1.Salary P2.UserID P2.Name P2.Job P2.Salary
123 Jack TA 50000 NULL NULL NULL NULL
345 Allison TA 60000 NULL NULL NULL NULL
567 Magda Prof 90000 567 Magda Prof 90000
789 Dan Prof 100000 567 Magda Prof 90000
567 Magda Prof 90000 789 Dan Prof 100000
789 Dan Prof 100000 789 Dan Prof 100000
Result
Job Count(…)
TA 0
Prof 2
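The DISTINCT in this query matters: each high earner appears once for every P1 row with the same job. The sketch below, assuming Python's built-in sqlite3 module and an in-memory copy of Payroll, runs the slide's query with one extra column, count(P2.UserID) without DISTINCT, added purely for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

# The salary condition sits in ON, not WHERE, so TA rows survive the join
# with NULLs on the P2 side and still form a group.
per_job = conn.execute(
    "SELECT P1.Job, count(DISTINCT P2.UserID), count(P2.UserID) "
    "FROM Payroll P1 LEFT OUTER JOIN Payroll P2 "
    "ON P1.Job = P2.Job AND P2.Salary > 75000 "
    "GROUP BY P1.Job ORDER BY P1.Job"
).fetchall()
print(per_job)  # [('Prof', 2, 4), ('TA', 0, 0)]
```

Without DISTINCT the Prof group would report 4, because the join produces four Prof rows for two people.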
Discussion
§ Use Left-outer-join
§ WHERE:
• Applies a predicate to a single tuple
• Cannot use any aggregate operation
§ HAVING:
• Applies a predicate to an entire group
• May use aggregate operations
• Can only check attributes occurring in GROUP-BY
Sales
Product Price Quant Month
Bagel 3 20 Jan
Bagel 5 10 Jan
Bagel 1.50 20 March
Banana 0.5 50 Feb
Banana 5 10 Feb
Apple 4 10 March
Groups
Bagel: count(*)=3
Banana: count(*)=2
Apple: count(*)=1, NOT included
Result
Product sum
Bagel 50
Banana 60
§ HAVING:
• Applies to entire group: keep it or drop it
• May use aggregates (count(*)=5, sum(…) > 10)
• May only use attributes in GROUP-BY
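A HAVING clause of this kind can be sketched as follows, assuming Python's built-in sqlite3 module and an in-memory copy of Sales. The query itself is an assumption: it sums Quant per product and keeps only products with more than one sale, which is one query consistent with the rules listed above, not necessarily the one used in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Price REAL, Quant INT, Month TEXT)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [("Bagel", 3, 20, "Jan"), ("Bagel", 5, 10, "Jan"), ("Bagel", 1.5, 20, "March"),
     ("Banana", 0.5, 50, "Feb"), ("Banana", 5, 10, "Feb"), ("Apple", 4, 10, "March")],
)

# HAVING is evaluated once per group, after grouping: the Apple group
# has count(*) = 1 and is dropped as a whole.
multi_sale = conn.execute(
    "SELECT Product, sum(Quant) FROM Sales "
    "GROUP BY Product HAVING count(*) > 1 ORDER BY Product"
).fetchall()
print(multi_sale)  # [('Bagel', 50), ('Banana', 60)]
```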
The Witnessing Problem
Find the person with highest salary for each job
Payroll
UserID Name Job Salary
123 Jack TA 50000
345 Allison TA 60000
567 Magda Prof 90000
789 Dan Prof 100000
But we want argmax. How do we find the witness?
The Witnessing Problem
Find the person with highest salary for each job
WRONG! Name not in GROUP BY.
SQLite does not return an error, but the Name it returns is not guaranteed to be meaningful. Don’t use this.
The Witnessing Problem
Find the person with highest salary for each job
Plan:
1. Compute the max(Salary) for each Job
2. Join back with Payroll on Job
3. Return the users where Salary = max(Salary)
In a single query: we first join (step 2), and the condition of step 3 goes in HAVING.
The Witnessing Problem
Find the person with highest salary for each job
SELECT P1.Job, P2.Name, P2.Salary
FROM Payroll AS P1, Payroll AS P2
WHERE P1.Job = P2.Job
GROUP BY P1.Job, P2.Name, P2.Salary
HAVING MAX(P1.Salary) = P2.Salary;
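The query above can be checked against the Payroll data. This is a minimal sketch, assuming Python's built-in sqlite3 module and an in-memory copy of the table; the ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Payroll (UserID INT, Name TEXT, Job TEXT, Salary INT)")
conn.executemany(
    "INSERT INTO Payroll VALUES (?, ?, ?, ?)",
    [(123, "Jack", "TA", 50000), (345, "Allison", "TA", 60000),
     (567, "Magda", "Prof", 90000), (789, "Dan", "Prof", 100000)],
)

# Each group is one candidate P2 paired with all P1 rows of the same job.
# The candidate is kept only if their salary equals the job's maximum.
witnesses = conn.execute(
    "SELECT P1.Job, P2.Name, P2.Salary "
    "FROM Payroll AS P1, Payroll AS P2 "
    "WHERE P1.Job = P2.Job "
    "GROUP BY P1.Job, P2.Name, P2.Salary "
    "HAVING MAX(P1.Salary) = P2.Salary "
    "ORDER BY P1.Job"
).fetchall()
print(witnesses)  # [('Prof', 'Dan', 100000), ('TA', 'Allison', 60000)]
```

If two people tie for the highest salary in a job, both satisfy the HAVING condition and both are returned.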
§ Empty groups
§ Having clause