Pivoting

POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS

Michel Semaan
Data Scientist
Transforming tables
Before: Gold medals awarded to China, Russia, and the USA

| Country | Year | Awards |
|---------|------|--------|
| CHN | 2008 | 74 |
| CHN | 2012 | 56 |
| RUS | 2008 | 43 |
| RUS | 2012 | 47 |
| USA | 2008 | 125 |
| USA | 2012 | 147 |

After: Pivoted by Year

| Country | 2008 | 2012 |
|---------|------|------|
| CHN | 74 | 56 |
| RUS | 43 | 47 |
| USA | 125 | 147 |

Easier to scan, especially if pivoted by a chronologically ordered column



Enter CROSSTAB
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
source_sql TEXT
$$) AS ct (column_1 DATA_TYPE_1,
column_2 DATA_TYPE_2,
...,
column_n DATA_TYPE_N);
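To try CROSSTAB without the course's Summer_Medals table, the sketch below uses an inline VALUES list holding the gold-medal counts from the Transforming tables slide. The alias gold is hypothetical. According to the tablefunc documentation, the one-argument form expects source_sql to return exactly three columns (row name, category, value). Rows that share a row name must be adjacent, so the sketch sorts by columns 1 and 2. The row-name and value types also need to match the types declared in AS ct (...), so Country is declared TEXT to match the string literals.

```sql
-- tablefunc ships with PostgreSQL's contrib modules; creating it may require extra privileges
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM CROSSTAB($$
  -- Exactly three columns: row name (Country), category (Year), value (Awards)
  SELECT Country, Year, Awards
  FROM (VALUES
    ('CHN', 2008, 74), ('CHN', 2012, 56),
    ('RUS', 2008, 43), ('RUS', 2012, 47),
    ('USA', 2008, 125), ('USA', 2012, 147)
  ) AS gold (Country, Year, Awards)
  -- Keep each country's rows together, with years in output-column order
  ORDER BY 1, 2
$$) AS ct (Country TEXT, "2008" INTEGER, "2012" INTEGER)
ORDER BY Country;
```

The output should match the After table on the Transforming tables slide.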



Queries
Before

SELECT
Country, Year, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Country IN ('CHN', 'RUS', 'USA')
AND Year IN (2008, 2012)
AND Medal = 'Gold'
GROUP BY Country, Year
ORDER BY Country ASC, Year ASC;

After

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
SELECT
Country, Year, COUNT(*) :: INTEGER AS Awards
FROM Summer_Medals
WHERE
Country IN ('CHN', 'RUS', 'USA')
AND Year IN (2008, 2012)
AND Medal = 'Gold'
GROUP BY Country, Year
ORDER BY Country ASC, Year ASC;
$$) AS ct (Country VARCHAR, "2008" INTEGER, "2012" INTEGER)

ORDER BY Country ASC;
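The one-argument form has a caveat. According to the tablefunc documentation, it fills the output value columns from left to right for each row-name group. If a category is missing, later values therefore shift into the wrong column. The two-argument form, CROSSTAB(source_sql, category_sql), instead places each value by matching its category against the rows returned by category_sql, and leaves absent cells NULL.

The sketch below uses a hypothetical variant of the gold-medal data in which RUS has no 2008 row. All other values are copied from the Transforming tables slide, and the aliases gold and c are illustrative.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM CROSSTAB(
  $$
    SELECT Country, Year, Awards
    FROM (VALUES
      ('CHN', 2008, 74), ('CHN', 2012, 56),
      -- Hypothetical gap: no RUS row for 2008
      ('RUS', 2012, 47),
      ('USA', 2008, 125), ('USA', 2012, 147)
    ) AS gold (Country, Year, Awards)
    -- Only the row names need to be grouped together here
    ORDER BY 1
  $$,
  -- Category query: one row per output value column, in column order
  $$ SELECT y FROM (VALUES (2008), (2012)) AS c (y) ORDER BY 1 $$
) AS ct (Country TEXT, "2008" INTEGER, "2012" INTEGER)
ORDER BY Country;
```

Here RUS gets NULL under "2008" and 47 under "2012". The one-argument form would instead put 47 under "2008" and leave "2012" NULL.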



Source query
WITH Country_Awards AS (
SELECT
Country, Year, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Country IN ('CHN', 'RUS', 'USA')
AND Year IN (2004, 2008, 2012)
AND Medal = 'Gold' AND Sport = 'Gymnastics'
GROUP BY Country, Year
ORDER BY Country ASC, Year ASC)

SELECT
Country, Year,
RANK() OVER
(PARTITION BY Year ORDER BY Awards DESC) :: INTEGER
AS rank
FROM Country_Awards
ORDER BY Country ASC, Year ASC;



Source result
| Country | Year |Rank |
|---------|------|-----|
| CHN | 2004 | 3 |
| CHN | 2008 | 1 |
| CHN | 2012 | 1 |
| RUS | 2004 | 1 |
| RUS | 2008 | 2 |
| RUS | 2012 | 2 |
| USA | 2004 | 2 |
| USA | 2008 | 3 |
| USA | 2012 | 3 |



Pivot query
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM CROSSTAB($$
...
$$) AS ct (Country VARCHAR,
"2004" INTEGER,
"2008" INTEGER,
"2012" INTEGER)

ORDER BY Country ASC;



Pivot result
| Country | 2004 | 2008 | 2012 |
|---------|------|------|------|
| CHN | 3 | 1 | 1 |
| RUS | 1 | 2 | 2 |
| USA | 2 | 3 | 3 |



Let's practice!
ROLLUP and CUBE
Group-level totals
Chinese and Russian medals in the 2008 Summer Olympics per medal class

| Country | Medal | Awards |
|---------|--------|--------|
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| CHN | Total | 184 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
| RUS | Total | 143 |



The old way
-- Per-medal counts for each country
SELECT
Country, Medal, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY Country, Medal

UNION ALL

-- Country totals, labelled 'Total' in the Medal column
SELECT
Country, 'Total', COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY Country

-- A single ORDER BY after the last SELECT sorts the combined result
ORDER BY Country ASC, Medal ASC;



Enter ROLLUP
SELECT
Country, Medal, COUNT(*) AS Awards
FROM Summer_Medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY Country, ROLLUP(Medal)
ORDER BY Country ASC, Medal ASC;

ROLLUP is a GROUP BY subclause that includes extra rows for group-level aggregations

GROUP BY Country, ROLLUP(Medal) will count awards for every Country and Medal pair, then count only Country-level totals and fill in Medal with nulls for these rows
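ROLLUP is shorthand for a list of grouping sets. GROUP BY Country, ROLLUP(Medal) expands to GROUPING SETS ((Country, Medal), (Country)). The sketch below spells out that expansion. It assumes a hypothetical medal_counts stand-in that holds the per-medal counts from the Group-level totals slide, so SUM(Awards) replaces COUNT(*) over the raw Summer_Medals rows.

```sql
-- Hypothetical stand-in: pre-aggregated 2008 counts copied from the slides
WITH medal_counts (Country, Medal, Awards) AS (
  VALUES ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
         ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT Country, Medal, SUM(Awards) AS Awards
FROM medal_counts
-- Same rows as GROUP BY Country, ROLLUP(Medal):
-- one group per (Country, Medal) pair, plus one per Country with Medal = NULL
GROUP BY GROUPING SETS ((Country, Medal), (Country))
ORDER BY Country ASC, Medal ASC;
```

The Country-level rows should come out as 184 for CHN and 143 for RUS, matching the Total rows on the Group-level totals slide.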



ROLLUP - Query
SELECT
Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

ROLLUP is hierarchical, de-aggregating from the leftmost provided column to the rightmost: ROLLUP(a, b) groups by (a, b), then by (a) alone, then by nothing at all (the grand total)

ROLLUP(Country, Medal) includes Country-level totals

ROLLUP(Medal, Country) includes Medal-level totals

Both include grand totals
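To see the hierarchy from the other side, the same hypothetical medal_counts stand-in can be rolled up as ROLLUP(Medal, Country). This expands to the grouping sets (Medal, Country), (Medal), and (). The result contains Medal-level totals and a grand total, but no Country-level totals.

```sql
-- Hypothetical stand-in: pre-aggregated 2008 counts copied from the slides
WITH medal_counts (Country, Medal, Awards) AS (
  VALUES ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
         ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT Medal, Country, SUM(Awards) AS Awards
FROM medal_counts
-- Medal comes first, so totals are kept per Medal (Country = NULL), never per Country
GROUP BY ROLLUP(Medal, Country)
ORDER BY Medal ASC, Country ASC;
```

This should return 10 rows: the six detail rows, Medal totals of 113 (Bronze), 117 (Gold), and 97 (Silver), and the grand total of 327.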



ROLLUP - Result
| Country | Medal | Awards |
|---------|--------|--------|
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| CHN | null | 184 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
| RUS | null | 143 |
| null | null | 327 |

Group-level totals contain nulls; the row with all nulls is the grand total

Notice that it didn't include Medal-level totals, since it's ROLLUP(Country, Medal) and not ROLLUP(Medal, Country)



Enter CUBE
SELECT
Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY CUBE(Country, Medal)
ORDER BY Country ASC, Medal ASC;

CUBE is a non-hierarchical ROLLUP

It generates all possible group-level aggregations

CUBE(Country, Medal) counts Country-level, Medal-level, and grand totals
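CUBE can likewise be written out as grouping sets. For two columns it produces every subset of {Country, Medal}. The sketch below runs over the same hypothetical medal_counts stand-in (per-medal counts copied from the slides and summed rather than counted). It should return the same 12 rows as the CUBE result table.

```sql
-- Hypothetical stand-in: pre-aggregated 2008 counts copied from the slides
WITH medal_counts (Country, Medal, Awards) AS (
  VALUES ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
         ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT Country, Medal, SUM(Awards) AS Awards
FROM medal_counts
-- The four grouping sets that CUBE(Country, Medal) generates
GROUP BY GROUPING SETS ((Country, Medal), (Country), (Medal), ())
ORDER BY Country ASC, Medal ASC;
```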



CUBE - Result
| Country | Medal | Awards |
|---------|--------|--------|
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| CHN | null | 184 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
| RUS | null | 143 |
| null | Bronze | 113 |
| null | Gold | 117 |
| null | Silver | 97 |
| null | null | 327 |

Notice that Medal-level totals are included



ROLLUP vs CUBE
Source

| Year | Quarter | Sales |
|------|---------|-------|
| 2008 | Q1 | 12 |
| 2008 | Q2 | 15 |
| 2009 | Q1 | 21 |
| 2009 | Q2 | 27 |

ROLLUP(Year, Quarter): the source rows plus the following

| Year | Quarter | Sales |
|------|---------|-------|
| 2008 | null | 27 |
| 2009 | null | 48 |
| null | null | 75 |

CUBE(Year, Quarter): all of the ROLLUP rows above plus the following

| Year | Quarter | Sales |
|------|---------|-------|
| null | Q1 | 33 |
| null | Q2 | 42 |

Use ROLLUP when you have hierarchical data (e.g., date parts) and don't want all possible group-level aggregations

Use CUBE when you want all possible group-level aggregations
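The sketch below reproduces the Year and Quarter comparison. It uses a hypothetical CTE named quarterly that copies the four source rows. As written it runs CUBE; replacing CUBE with ROLLUP drops the two Quarter-only rows.

```sql
-- Hypothetical inline copy of the Source table
WITH quarterly (Year, Quarter, Sales) AS (
  VALUES (2008, 'Q1', 12), (2008, 'Q2', 15),
         (2009, 'Q1', 21), (2009, 'Q2', 27)
)
SELECT Year, Quarter, SUM(Sales) AS Sales
FROM quarterly
-- CUBE adds Quarter-only totals (Q1 = 12 + 21 = 33, Q2 = 15 + 27 = 42)
-- on top of the Year totals and the grand total that ROLLUP(Year, Quarter) returns
GROUP BY CUBE(Year, Quarter)
ORDER BY Year ASC, Quarter ASC;
```

The result has 9 rows: the 4 source rows, 2 Year totals, 2 Quarter totals, and the grand total of 75.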



Let's practice!
A survey of useful functions
Nulls ahoy
Query

SELECT
Country, Medal, COUNT(*) AS Awards
FROM summer_medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

Result

| Country | Medal | Awards |
|---------|--------|--------|
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| CHN | null | 184 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
| RUS | null | 143 |
| null | null | 327 |

nulls signify group totals



Enter COALESCE
COALESCE() takes a list of values and returns the first non-null value, going from left to right

COALESCE(null, null, 1, null, 2) → 1

Useful when using SQL operations that return nulls:

- ROLLUP and CUBE
- Pivoting
- LAG and LEAD
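The sketch below shows the COALESCE example from the slide, followed by a LAG case. LAG returns NULL on the first row because there is no previous row. The inline USA gold counts (125 in 2008, 147 in 2012) come from the pivoting slides, and the alias usa_gold and the fallback value 0 are illustrative choices. LAG also accepts a default as its third argument, for example LAG(Awards, 1, 0), which would avoid COALESCE in this particular case.

```sql
-- Returns 1: the first argument that is not NULL, scanning left to right
SELECT COALESCE(NULL, NULL, 1, NULL, 2) AS First_Non_Null;

-- LAG yields NULL for the first row; COALESCE replaces it with 0
SELECT
Year, Awards,
COALESCE(LAG(Awards) OVER (ORDER BY Year ASC), 0) AS Previous_Awards
FROM (VALUES (2008, 125), (2012, 147)) AS usa_gold (Year, Awards)
ORDER BY Year ASC;
```

The second query should return Previous_Awards values of 0 for 2008 and 125 for 2012.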



Annihilating nulls
Query

SELECT
COALESCE(Country, 'Both countries') AS Country,
COALESCE(Medal, 'All medals') AS Medal,
COUNT(*) AS Awards
FROM summer_medals
WHERE
Year = 2008 AND Country IN ('CHN', 'RUS')
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;

Result

| Country | Medal | Awards |
|----------------|------------|--------|
| Both countries | All medals | 327 |
| CHN | All medals | 184 |
| CHN | Bronze | 57 |
| CHN | Gold | 74 |
| CHN | Silver | 53 |
| RUS | All medals | 143 |
| RUS | Bronze | 56 |
| RUS | Gold | 43 |
| RUS | Silver | 44 |
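COALESCE cannot tell a subtotal NULL apart from a NULL that was already present in the data. When that distinction matters, PostgreSQL's GROUPING() function (available since 9.5) is an alternative. It returns 1 when a column has been aggregated away in the current row, so it can drive the labels instead. The sketch below runs over a hypothetical medal_counts stand-in that holds the per-medal counts from the slides.

```sql
-- Hypothetical stand-in for summer_medals: pre-aggregated 2008 counts from the slides
WITH medal_counts (Country, Medal, Awards) AS (
  VALUES ('CHN', 'Bronze', 57), ('CHN', 'Gold', 74), ('CHN', 'Silver', 53),
         ('RUS', 'Bronze', 56), ('RUS', 'Gold', 43), ('RUS', 'Silver', 44)
)
SELECT
-- GROUPING(col) = 1 means col was rolled up in this row, not that the data held a NULL
CASE WHEN GROUPING(Country) = 1 THEN 'Both countries' ELSE Country END AS Country,
CASE WHEN GROUPING(Medal) = 1 THEN 'All medals' ELSE Medal END AS Medal,
SUM(Awards) AS Awards
FROM medal_counts
GROUP BY ROLLUP(Country, Medal)
ORDER BY Country ASC, Medal ASC;
```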



Compressing data
Before

| Country | Rank |
|---------|------|
| CHN | 1 |
| RUS | 2 |
| USA | 3 |

Rank is redundant because the ranking is implied

After

CHN, RUS, USA

Succinct and provides all information needed because the ranking is implied



Enter STRING_AGG
STRING_AGG(column, separator) takes all the values of a column and concatenates them,
with separator in between each value

STRING_AGG(Letter, ', ') transforms this...

| Letter |
|--------|
| A |
| B |
| C |

...into this

A, B, C
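The letter example can be run as written by using an inline VALUES list as a stand-in table (the alias t is illustrative). STRING_AGG does not guarantee any particular order unless an ORDER BY appears inside the aggregate call, so the sketch adds one.

```sql
-- ORDER BY inside the aggregate fixes the concatenation order
SELECT STRING_AGG(Letter, ', ' ORDER BY Letter) AS Letters
FROM (VALUES ('A'), ('B'), ('C')) AS t (Letter);
-- → A, B, C
```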



Query and result
Before

WITH Country_Medals AS (
SELECT
Country, COUNT(*) AS Medals
FROM Summer_Medals
WHERE Year = 2012
AND Country IN ('CHN', 'RUS', 'USA')
AND Medal = 'Gold'
AND Sport = 'Gymnastics'
GROUP BY Country)

SELECT
Country,
RANK() OVER (ORDER BY Medals DESC) AS Rank
FROM Country_Medals
ORDER BY Rank ASC;

After

WITH Country_Medals AS (...),

Country_Ranks AS (...)

SELECT STRING_AGG(Country, ', ')
FROM Country_Ranks;

Result

CHN, RUS, USA
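The order in which rows reach an aggregate is not guaranteed, even when a CTE ends with ORDER BY. The safest way to preserve the ranking is therefore to state it inside STRING_AGG itself. The minimal sketch below uses a hypothetical Country_Ranks CTE built from the ranks in the Compressing data table, deliberately listed out of order.

```sql
-- Hypothetical Country_Ranks built from the Compressing data table, rows shuffled on purpose
WITH Country_Ranks (Country, Rank) AS (
  VALUES ('RUS', 2), ('USA', 3), ('CHN', 1)
)
-- ORDER BY Rank inside the aggregate restores the ranking in the output string
SELECT STRING_AGG(Country, ', ' ORDER BY Rank ASC) AS Countries
FROM Country_Ranks;
-- → CHN, RUS, USA
```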



Let's practice!