All-sql-cheat-sheet-a4
All-sql-cheat-sheet-a4
CITY The simplest subquery returns exactly one column and exactly one
id name country_id row. It can be used with comparison operators =, <, <=, >, or >=. CYCLING SKATING
1 Paris 1 id name country id name country
101 Marseille 1 CITY
This 102
query findsLyon 1 the same
cities with rating as country_id
Paris:
1
count
3
1 YK DE 1 YK DE
2 Berlin 2 2 ZG DE 2 DF DE
103 Hamburg 2 2 3 SELECT name FROM city 3 AK PL
4 2 3 WT PL
104 Munich 2 WHERE rating = ( ... ... ... ... ... ...
3 Warsaw 4
SELECT
105 Cracow 4
rating FROM
city UNION
WHERE name = 'Paris'
AGGREGATE FUNCTIONS ); UNION combines the results of two result sets and removes duplicates.
• avg(expr) − average value for rows within the group UNION ALL doesn't remove duplicate rows.
MULTIPLE VALUES
• count(expr) − count of values for rows within the group This query displays German cyclists together with German skaters:
A subquery can also return multiple columns or multiple rows. Such
• max(expr) − maximum value within the group SELECT name
subqueries can be used with operators IN, EXISTS, ALL, or ANY.
FROM cycling
• min(expr) − minimum value within the group This query finds cities in countries that have a population above 20M: WHERE country = 'DE'
• sum(expr) − sum of values within the group SELECT name UNION / UNION ALL
FROM city SELECT name
EXAMPLE QUERIES WHERE country_id IN ( FROM skating
Find out the number of cities: SELECT country_id WHERE country = 'DE';
SELECT COUNT(*) FROM country
FROM city; WHERE population > 20000000 INTERSECT
);
INTERSECT returns only rows that appear in both result sets.
Find out the number of cities with non-null ratings:
CORRELATED This query displays German cyclists who are also German skaters at the
SELECT COUNT(rating)
A correlated subquery refers to the tables introduced in the outer query. A same time:
FROM city;
correlated subquery depends on the outer query. It cannot be run SELECT name
Find out the number of distinctive country values: independently from the outer query. FROM cycling
SELECT COUNT(DISTINCT country_id) This query finds cities with a population greater than the average WHERE country = 'DE'
FROM city; population in the country: INTERSECT
SELECT * SELECT name
Find out the smallest and the greatest country populations: FROM city main_city FROM skating
WHERE population > ( WHERE country = 'DE';
SELECT MIN(population), MAX(population)
FROM country; SELECT
AVG(population) FROM EXCEPT
Find out the total population of cities in respective countries: city average_city
EXCEPT returns only the rows that appear in the first result set but
SELECT country_id, SUM(population) WHERE average_city.country_id = main_city.country_id do not appear in the second result set.
);
FROM city This query displays German cyclists unless they are also German
GROUP BY country_id; skaters at the same time:
This query finds countries that have at least one city:
SELECT name SELECT name
Find out the average rating for cities in respective countries if the
FROM country FROM cycling
average is above 3.0:
WHERE EXISTS ( WHERE country = 'DE'
SELECT country_id, AVG(rating) EXCEPT / MINUS
SELECT *
FROM city FROM city SELECT name
GROUP BY country_id WHERE country_id = country.id FROM skating
HAVING AVG(rating) > ); WHERE country = 'DE';
3.0;
WINDOW FUNCTIONS PARTITION BY ORDER BY
compute their result based on a sliding AGGREGATE FUNCTIONS VS. WINDOW
FUNCTIONS divides rows into multiple groups, called specifies the order of rows in each partition to
window frame, a set of rows that are
partitions, to which the window function is which the window function is applied.
somehow related to the current row. unlike aggregate functions, window functions do not
applied. PARTITION BY city ORDER BY month
collapse rows.
PARTITION BY city sold city month sold city month
month city sold month city sold 200 Rome 1 300 Paris 1
1 Rome 200 500 Paris 2 500 Paris 2
Aggregate Window 1 Paris
100 Londo 1 200 Rome 1
current row 2 Paris 500 2 Paris sum
Functions Functions 1 Londo 100 n 300 Rome 2
∑ 1 Rome 300 800 300 Paris 1
n 400 Rome 3
2 Rome 500 800 300 Rome 2 100 Londo 1
1 Paris 300
∑ 3 Rome 200 900 400 Londo 2 n
2 Rome 300
2 Londo 400 1 Londo 300 900 n 400 Londo 2
∑ ∑ n n 400 900 400 Rome 3 n
2
3 Rome 400 100 500
SYNTAX Default ORDER BY: with no ORDER BY clause, the
Default Partition: with no PARTITION BY clause, the order of rows within each partition is arbitrary.
entire result set is the partition.
SELECT city, month,
WINDOW FRAME
SELECT <column_1>, <column_2>,
sum(sold) OVER ( ⏴window_function>() OVER ( is a set of rows that are somehow related to the current row. The window frame is evaluated separately
PARTITION BY city PARTITION BY <...> within each partition.
ORDER BY month ORDER BY <...>
RANGE UNBOUNDED PRECEDING) ⏴window_frame>) ROWS | RANGE | GROUPS BETWEEN lower_bound AND upper_bound
total <window_column_alias>
PARTITION UNBOUNDED
FROM sales; FROM <table_name>; PRECEDING The bounds can be any of the five options:
N
PRECEDING ∙ UNBOUNDED PRECEDING
N ROWS
Named Window Definition ∙ n PRECEDING
CURRENT ∙ CURRENT ROW
M ROWS
ROW ∙ n FOLLOWING
SELECT country, city, SELECT <column_1>, <column_2>, ∙ UNBOUNDED FOLLOWING
rank() OVER ⏴window_function>() OVER M UNBOUNDE
country_sold_avg FOLLOWING D
⏴window_name> FOLLOWIN The lower_bound must be BEFORE the upper_bound
FROM sales FROM <table_name> G
WHERE month BETWEEN 1 AND 6 WHERE <...>
GROUP BY country, city GROUP BY <...>
HAVING sum(sold) > 10000 ROWS BETWEEN 1 PRECEDING RANGE BETWEEN 1 PRECEDING GROUPS BETWEEN 1 PRECEDING
HAVING <...> AND 1 FOLLOWING AND 1 FOLLOWING AND 1 FOLLOWING
WINDOW country_sold_avg WINDOW ⏴window_name> AS
AS ( (
PARTITION BY country city sold month city sold month city sold month
PARTITION BY <...> Paris 300 1 Paris 300 1 Paris 300 1
ORDER BY avg(sold) DESC) ORDER BY <...> Rome 200 1 Rome 200 1 Rome 200 1
ORDER BY country, city; ⏴window_frame>) Paris 500 2 Paris 500 2 Paris 500 2
curren Rome 100 4 curren Rome 100 4 curren Rome 100 4
ORDER BY <...>; t
Paris 200 4
t
Paris 200 4
t
Paris 200 4
ro ro ro
w Paris 300 5 w Paris 300 5 w Paris 300 5
Rome 200 5 Rome 200 5 Rome 200 5
Londo 200 5 Londo 200 5 Londo 200 5
PARTITION BY, ORDER BY, and window frame definition are all optional. n n n
London 100 6 London 100 6 London 100 6
Rome 300 6 Rome 300 6 Rome 300 6
LOGICAL ORDER OF OPERATIONS IN 1 row before the current values in the range between 3 and 5 1 group before the current row and 1
SQL row and 1 row after the
current row
ORDER BY must contain a single group after the current row
regardless of the value
expression
5. HAVING 10. ORDER BY
1. FROM, JOIN 6. window functions 7. SELECT 11. OFFSET
2. WHERE 8. DISTINCT 12. LIMIT/FETCH/TOP
3. GROUP BY 9. UNION/INTERSECT/EXCEPT
4. aggregate functions
If ORDER BY is specified, then the frame is
RANGE BETWEEN UNBOUNDED PRECEDING
As of 2020, GROUPS is
AND CURRENT ROW.
only supported in n PRECEDING
PostgreSQL 11 and up. CURRENT ROW BETWEEN CURRENT ROW AND CURRENT ROW
BETWEEN n PRECEDING AND DEFAULT WINDOW FRAME
CURRENT ROW
ABBREVIATIONS
You can use window functions in SELECT and ORDER BY. However, you can’t put window functions Without ORDER BY, the frame specification
anywhere in the FROM, WHERE, GROUP BY, or HAVING clauses. nAbbreviation
FOLLOWING Meaning
BETWEEN AND CURRENT ROW AND n FOLLOWING is ROWS BETWEEN UNBOUNDED PRECEDING
UNBOUNDED PRECEDING BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
UNBOUNDED FOLLOWING BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING AND UNBOUNDED FOLLOWING.
LIST OF WINDOW
RANKING FUNCTIONS DISTRIBUTION FUNCTIONS
FUNCTIONS
∙ row_number() − unique number for each row within partition, with ∙ percent_rank() − the percentile ranking number of a row—a value in
Aggregate Functions different numbers for tied values [0, 1] interval: (rank - 1) / (total number of rows - 1)
∙ rank() − ranking within partition, with gaps and same ranking for tied ∙ cume_dist() − the cumulative distribution of a value within a group of values, i.e.,
∙ avg()
values the number of rows with values less than or equal to the current row’s value
∙ count() divided by the total number of rows; a value in (0, 1] interval
∙ dense_rank() − ranking within partition, with no gaps and same ranking for
∙ max() tied values
∙ min() row_number rank dense_rank
city price over(order by price) percent_rank() OVER(ORDER BY sold) cume_dist() OVER(ORDER BY sold)
∙ sum() Paris 7 1 1 1 city sold cume_dist
Rome 7 2 1 1 city sold percent_ran Paris 100 0.2
Ranking k
London 8.5 3 3 2 Berlin 150 0.4
Paris 100 0
Functions Berlin 8.5 4 3 2 Rome 200 0.8
Berlin 150 0.25
Moscow 9 5 5 3 Moscow 200 0.8
∙ row_number() Rome 200 0.5
Madrid 10 6 6 4 without this row 50% of values are less than this row’s London 300 1 80% of values are
Moscow 200 0.5
Oslo 10 7 6 4 value less than or
∙ rank() London 300 1
equal to this
∙ dense_rank() one
Distribution
ORDER BY and Window Frame: rank() and dense_rank() require ORDER BY and Window Frame: Distribution functions require ORDER BY. They do not
Functions ORDER BY, but row_number() does not require ORDER BY. Ranking accept window frame definition (ROWS, RANGE, GROUPS).
∙ percent_rank() functions do not accept window frame definition (ROWS, RANGE, GROUPS).
∙ cume_dist()
Analytic Functions ANALYTIC FUNCTIONS ∙ first_value(expr) − the value for the first row within the window frame
∙ lead() ∙ lead(expr, offset, default) − the value for the row offset rows after the current; offset ∙ last_value(expr) − the value for the last row within the window frame
∙ lag() and
∙ ntile() default are optional; default values: offset = 1, default = NULL
∙ first_value() ∙ lag(expr, offset, default) − the value for the row offset rows before the current; offset first_value(sold) OVER last_value(sold) OVER
∙ last_value() and (PARTITION BY city ORDER BY month) (PARTITION BY city ORDER BY month
RANGE BETWEEN UNBOUNDED
default are optional; default values: offset = 1, default = NULL PRECEDING
∙ nth_value() lead(sold) OVER(ORDER BY month) lag(sold) OVER(ORDER BY month) city month sold first_value AND UNBOUNDED FOLLOWING)
Paris 1 500 500
month sold month sold Paris 2 300 500 city month sold last_value
1 500 300 1 500 Paris 3 400 500 Paris 1 500 400
NULL
2 300 400 2 300 Rome 2 200 200 Paris 2 300 400
order by
order by
500
AGGREGATE 3 400 Rome 3 300 200 Paris 3 400 400
month
3 400 100
month
300
4 100 500 4 100 Rome 4 500 200 Rome 2 200 500
FUNCTIONS 5 500 NULL 5 500
400
Rome 3 300 500
100
Rome 4 500 500
∙ avg(expr) − average value
for rows within the lead(sold, 2, 0) OVER(ORDER BY lag(sold, 2, 0) OVER(ORDER BY
window frame month) month) Note: You usually want to use RANGE BETWEEN
month sold month sold
UNBOUNDED PRECEDING AND UNBOUNDED
offset=
1 500 1 500
∙ count(expr) − count of 2 300 400 2 300 0 FOLLOWING with last_value(). With the
order by
order by
3 400 2 default
month
3 400 100 0
month
∙ max(expr) − maximum
value within the window
frame
∙ min(expr) − minimum ∙ ntile(n) − divide rows within a partition as equally as possible into n groups, and assign
value within the each row its group number.
window frame ORDER BY and Window Frame: Aggregate functions do not require an ORDER
BY. They accept window frame definition (ROWS, RANGE, GROUPS). ntile(3)
city sold
∙ sum(expr) − sum of values Rome 100 1
within the window frame Paris 1001 1 1
London 200 1
Moscow 200 2
Berlin 200 2
Madrid 300 2
Oslo 300 3
2 ORDER BY ∙ nth_value(expr, n) 2− the value for the n-th row within the window frame; n must be
and an integer
Window nth_value(sold, 2) OVER
Frame: (PARTITION BY city 3ORDER BY
ntile(),
month RANGE BETWEEN UNBOUNDED
lead(),
PRECEDING AND UNBOUNDED city month sold nth_value
and lag()
FOLLOWING) Paris 1 500 300
require an
ORDER BY. Paris 2 300 300
They do Paris 3 400 300
Rome 2 200 300
not accept
Rome 3 300 300
window Rome 4 500 300
frame
definition ORDER BY and Window Rome 5 300 300
Frame: first_value(), London 1 100 NULL
3
last_value(), and
RANGE, nth_value() do not require
GROUPS). an ORDER BY. They accept
window frame definition
(ROWS, RANGE, GROUPS).
JOINING TABLES LEFT JOIN
JOIN combines data from two tables.
LEFT JOIN returns all rows from the left table with matching rows from the right table. Rows without a
TOY CAT match are filled with NULLs. LEFT JOIN is also called LEFT OUTER JOIN.
toy_id toy_name cat_id cat_id cat_name SELECT * toy_id toy_name cat_id cat_id cat_name
1 ball 3 1 Kitty 5 ball 1 1 Kitty
2 spring NULL 2 Hugo FROM toy
3 mouse 1 1 Kitty
3 mouse 1 3 Sam LEFT JOIN cat 1 ball 3 3 Sam
4 mouse 4 4 Misty ON toy.cat_id = cat.cat_id; 4 mouse 4 4 Misty
5 ball 1 2 spring NULL NULL NULL
whole left
JOIN typically combines rows with equal values for the specified columns. Usually, one table contains table
a primary key, which is a column or columns that uniquely identify rows in the table (the cat_id
column in the cat table).
The other table has a column or columns that refer to the primary key columns in the first table (the
cat_id column in the toy table). Such columns are foreign keys. The JOIN condition is the equality RIGHT JOIN
between the primary key columns in one table and columns referring to them in the other table. RIGHT JOIN returns all rows from the right table with matching rows from the left table. Rows
without a match are filled with NULLs. RIGHT JOIN is also called RIGHT OUTER JOIN.
JOIN SELECT * toy_id toy_name cat_id cat_id cat_name
FROM toy 5 ball 1 1 Kitty
JOIN returns all rows that match the ON condition. JOIN is also called INNER 3 mouse 1 1 Kitty
RIGHT JOIN cat NULL NULL NULL 2 Hugo
JOIN. SELECT * toy_id toy_name cat_id cat_id cat_name ON toy.cat_id = cat.cat_id; 1 ball 3 3 Sam
FROM toy 5 ball 1 1 Kitty 4 mouse 4 4 Misty
3 mouse 1 1 Kitty whole ht table
JOIN cat 1 ball 3 3 Sam rig
ON toy.cat_id = cat.cat_id; 4 mouse 4 4 Misty
TROUBLESHOOTING
Integer division
When you don't see the decimal places you expect, it
means that you are dividing between two integers.
Cast one to decimal CAST(123 AS decimal) / 2
Division by 0
To avoid this error, make sure that the denominator is
not equal to 0. You can use the NULLIF() function
to replace 0 with a NULL, which will result in a NULL
for the whole expression: count /
NULLIF(count_all, 0)
Inexact calculations
If you do calculations using real (floating point)
numbers, you'll end up with some inaccuracies. This is
because this type is meant for scientific calculations
such as calculating the velocity.
Whenever you need accuracy (such as dealing with
monetary values), use the decimal / numeric type
(or money if available).
Errors when rounding with a specified precision
Most databases won't complain, but do check the
AGGREGATION AND GROUPING DATE AND TIME INTERVALs TIME ZONES
COUNT(expr) – the count of values for the rows There are 3 main time-related types: date time, and Note: In SQL Server, intervals aren't implemented – use In the SQL Standard, the date type can't have an
within the group timestamp. Time is expressed using a 24-hour clock, the associated time zone, but the time and timestamp
SUM(expr) – the sum of values within the group and it can be as vague as just hour and minutes (e.g., DATEADD() and DATEDIFF() functions. types can. In the real world, time zones have little
AVG(expr) – the average value for the rows within 15:30 – 3:30 p.m.) or as precise as microseconds meaning without the date, as the offset can vary
the group and time zone (as shown below): through the year because of daylight saving time.
To get the simplest interval, subtract one time
MIN(expr) – the minimum value within the group value from another: So, it's best to work with the timestamp values.
2021-12-31 14:39:53.662522-05
MAX(expr) – the maximum value within the group date time SELECT CAST('2021-12-31 23:59:59' AS When working with the type timestamp with time
To get the number of rows in the table: timestamp) CAST( 2021-06-01 12:00:00' AS zone (abbr. timestamptz), you can type in the value
timestamp
SELECT COUNT( ) timestamp) in your local time zone, and it'll get converted to the
YYYY-mm-dd HH:MM:SS.ssssss±TZ -- result: 213 days 11:59:59 UTC time zone as it is inserted into the table. Later
FROM city;
when you select from the table it gets converted
4:39:53.662522-05 is almost 2:40 p.m. CDT (e.g.,
To get the number of non-NULL values in a column: back to your local time zone. This is immune to time
in Chicago; in UTC it'd be 7:40 p.m.). The letters in To define an interval: INTERVAL '1' DAY
SELECT COUNT(rating) zone changes.
the above example represent: This syntax consists of three elements: the INTERVAL
ROM city;
keyword, a
To get the count of unique values in a In the date In the time part: quoted value, and a time part keyword (in singular AT TIME ZONE
column: SELECT COUNT(DISTINCT part: YYYY – form.) You can use the following time parts: YEAR To operate between different time zones, use the AT
HH – the zero-padded hour in a
country_id) FROM city; the 4-digit 24- hour clock. MONTH WEEK DAY HOUR MINUTE, and SECOND. In TIME ZONE keyword.
year. MySQL, omit the quotes. You can join many different
MM – the minutes. If you use this format: {timestamp without time
GROUP BY mm – the zero- SS – the seconds. Omissible
INTERVALs using the + operator:
INTERVAL '1' YEAR + INTERVAL '3' MONTH zone} AT TIME ZONE {time zone}, then the
CITY padded month ssssss – the smaller parts database will read the time stamp in the specified
(01—January
country_id of a second – they can be time zone and convert it to the time zone local to the
through 12 In some databases, there's an easier way to get the
Paris 1 expressed using 1 to 6 display. It returns the time in the format timestamp
December). above value. And it accepts plural forms! INTERVAL
digits. Omissible with time zone
Marseille 1 dd – the zero- '1 year 3 months
±TZ – the timezone. It must If you use this format: {timestamp with time zone}
Lyon 1 padded day. There are two more syntaxes in the Standard SQL:
AT TIME ZONE {time zone}, then the database will
start with either or , and
Berlin 2 use two digits relative to Syntax What it does convert the time in one time zone to the target time
zone specified by AT TIME ZONE. It returns the time in
Hamburg 2 UTC. Omissible INTERVAL 'x-y' YEAR TO INTERVAL 'x year y
the format timestamp without time zone, in the
Munich 2 MONTH month
What time is it? target time zone.
Warsaw 4 project after the JOIN. To get the correct count (0 for INTERVAL
To 'x-y'
answer that DAY TO
question INTERVAL
in SQL, 'x day y
you can use: You might skip casting in simple conditions – the
the clients you've never worked for), count the values SECOND second database will know what you mean.
Cracow 4 CURRENT_TIME – to find what time it is.
in a column of the other table, e.g., SELECT airline, flight_number, departure_time
CURRENT_DATE – to get today's date. (GETDATE()
COUNT(project_name). Check out this exercise to see FROM airport_schedule
an example. in SQL Server.)
WHERE departure_time < '12:00';
CURRENT_TIMESTAMP – to get the timestamp with
The example above – the count of cities in each the two above.
country:
SELECT name, COUNT(country_id)
Creating values
FROM city
To create a date time, or timestamp, simply write
GROUP BY name;
the value as a string and cast it to the proper type.
The average rating for the city: SELECT CAST( 2021-12-31' AS
SELECT city_id, AVG(rating) date) SELECT CAST( 15:31' AS
ROM ratings time)
GROUP BY city_id; SELECT CAST( 2021-12-31 23:59:29+02' AS
timestamp)
Common mistake: COUNT(*) and LEFT JOIN
SELECT CAST( 15:31.124769' AS time)
When you join the tables like this: client LEFT
Be careful with the last example – it will be
JOIN project, and you want to get the number of
interpreted as 15 minutes 31 seconds and 124769
projects for every client you know, COUNT(*) will
microseconds! It is always a good idea to write 00
return 1 for each client even if you've never worked
explicitly for hours: '00:15:31.124769
for them. This is because, they're still present in the
list but with the NULL in the fields related to the
CITY
country_id count
1 3
2 3
4 2 You can define the time zone with popular shortcuts
like UTC MST, or GMT, or by continent/city such as:
America/New_York Europe/London, and Asia/Tokyo
In MySQL, write year_month instead of YEAR TO
MONTH and Examples
day_second instead of DAY TO SECOND We set the local time zone to 'America/New_York'
SELECT TIMESTAMP '2021-07-16 21:00:00' AT
To get the last day of a month, add one month and TIME ZONE 'America/Los_Angeles';
subtract one day: -- result: 2021-07-17 00:00:00-04
SELECT CAST('2021-02-01' AS
Here, the database takes a timestamp without a time
date INTERVAL '1'
zone and it's told it's in Los Angeles time, which is
MONTH
then converted to the local time – New York for
INTERVAL '1' DAY
displaying. This answers the question "At what time
should I turn on the TV if the show starts at 9 PM in
To get all events for next three months from today: Los Angeles?"
SELECT event_date, event_name
FROM calendar
SELECT TIMESTAMP WITH TIME ZONE '2021-06-20
WHERE event_date BETWEEN CURRENT_DATE AND
19:30:00' AT TIME ZONE 'Australia/Sydney';
CURRENT_DATE + INTERVAL '3' MONTH
-- result: 2021-06-21 09:30:00
Here, the database gets a timestamp specified in the
To get part of the date:
local time zone and converts it to the time in Sydney
SELECT EXTRACT(YEAR FROM birthday) (note that it didn't return a time zone.) This answers the
FROM artists; question "What time is it in Sydney if it's 7:30 PM
One of possible returned values: 1946. In SQL Server, use here?"
the
DATEPART(part, date) function.