
How to Add or Subtract Time Units - StrataScratch 2

Uploaded by Girma Gessesse
Copyright © All Rights Reserved

How to Add or Subtract Time Units?


Modules
How to Add or Subtract Time Units?
How to Calculate Duration from Two Timestamps?
How to Count Events per Hour of the Day?
How to Aggregate Data by the Day of the Week or Month?
How to Filter Data in a Specific Timeframe?
How to Find the First or Last Occurrence in a Dataset?
How to Detect Peaks in Time Series?
How to Find a Rolling Metric?

Main Topics

Theory and Application in Industry
General Approach in SQL
PostgreSQL: ‘interval’ Keyword
MySQL: DATE_ADD() and DATE_SUB()
Alternative: DATEDIFF()
Guided Practice
1. Interview Question from Meta (PostgreSQL)
2. Interview Question from Noom (MySQL)
3. Interview Question from Asana (DATEDIFF)
Conclusion

This module will introduce you to the concept of adding or subtracting time units in SQL. Since the syntax of this operation is often quite different from one SQL engine to another, we will focus on methods specific to PostgreSQL and MySQL.

Knowing how to work with temporal data is a must-have skill for any data scientist. Data scientists often face problems that require some date-time arithmetic, for example:

- You have to create a sales report that contains figures for different periods, e.g., year-to-date, month-to-date, previous month, etc. You’d have to create a solution that automatically finds the relevant dates relative to the date when the report is generated.
- You have a marketing campaign that has been running for some time and you want to measure its effectiveness. You will need to work around the dates when the campaign started and ended, and compute some key performance indicators (KPIs). If there were important events during the campaign, they may have to be modeled explicitly.

In this module, we will see methods for doing arithmetic with different time units that can be used to filter data in very flexible ways.

Theory and Application in Industry

Temporal (i.e., timestamped) data is omnipresent in every industry.

- A restaurant observes the number of orders throughout the day and the week, so that the owner can plan the workload in advance.
- A financial institution, like a bank, can be interested in seeing the transaction turnover for different time periods: weekly, bi-weekly, year-to-date, month-to-date, year-over-year, etc.
- A tourist agency offers different trips throughout the year and is interested in keeping track of its finances for different periods of time. Its customers often pay for their trips in installments spread out over some period before the actual trip takes place. To remain liquid, the agency must have a certain amount of money available at any given time, so it might want to run some analyses or projections on its money inflow to ensure that it can operate without any problem.

Problems like these and many others involve heavy filtering and aggregation along different time periods.

In the next section, we will see how to do time arithmetic in SQL. It is worth noting that we are working with the Gregorian calendar, which is used in most parts of the world, like the USA, Canada, Mexico, the EU, the UK, etc. Other calendars are in use in other parts of the world, like in China or Ethiopia. We won’t be dealing with them in this module.

General Approach in SQL

In SQL, the main difficulty when it comes to time arithmetic is encoding the portion of time that we want to add to or subtract from a timestamp. Each time, we need to select which units we are manipulating (seconds, minutes, hours, days, months…) and, of course, the number of these units to be added or subtracted.

All SQL engines have methods for encoding this information. However, and this is another major issue, the syntax typically differs from one SQL engine to another. In this module, we will cover the most notable examples of how time arithmetic works in PostgreSQL and MySQL:

1. The ‘interval’ keyword from PostgreSQL;
2. The DATE_ADD() and DATE_SUB() functions from MySQL;
3. An alternative approach based on the DATEDIFF() function, which only fits some use cases (DATEDIFF() is a MySQL function; standard PostgreSQL has no built-in DATEDIFF(), but subtracting one date from another gives the same day count).

PostgreSQL: ‘interval’ Keyword

To add time to or subtract time from a timestamp in PostgreSQL, we start with the timestamp, then use a plus or minus operator, followed by the keyword ‘interval’ and the quantity and type of time units. To summarize, the syntax is the following:

datetime + interval 'X units'

The ‘X units’ placeholder can be replaced by a wide variety of intuitive expressions such as ‘1 hour’, ‘5 hours’, ‘10 days’, ‘23 minutes’, ‘47 seconds’, ‘8 weeks’, ‘11 years’, etc. For example:

SELECT timestamp '2001-09-28 01:00' + interval '7 months'
→ 2002-04-28 01:00:00

Note that the quantity can also be specified before the ‘interval’ keyword by multiplying a one-unit interval:

SELECT timestamp '2001-09-28 01:00' + 7 * interval '1 month'
→ 2002-04-28 01:00:00
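Here are a few more PostgreSQL variations on the same syntax, as a small sketch. They cover subtracting an interval, combining several units in one interval literal, scaling a smaller interval by multiplication, and adding a month to 31 January. The timestamps are arbitrary illustrative values. The results in the comments assume timestamp without time zone.

```sql
-- PostgreSQL: more ways to write the interval (illustrative values)
SELECT timestamp '2001-09-28 01:00' - interval '7 months';        -- 2001-02-28 01:00:00
SELECT timestamp '2001-09-28 01:00' + interval '1 day 2 hours';   -- 2001-09-29 03:00:00
SELECT timestamp '2001-09-28 01:00' + 3 * interval '15 minutes';  -- 2001-09-28 01:45:00
-- February 2001 has no 31st, so the day is clamped to the month's last day
SELECT timestamp '2001-01-31 00:00' + interval '1 month';         -- 2001-02-28 00:00:00
```

The month-end behavior is worth remembering when building "previous month" style filters. Adding a month does not always land on the same day number.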

Note also that when adding time to or subtracting time from dates, there is no need to use the ‘interval’ keyword. We can add or subtract an integer, and it will be interpreted as a number of days. But whenever we’re performing arithmetic on timestamps, we need the ‘interval’ keyword.

SELECT date '2001-09-28' + 7
→ 2001-10-05
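The day-based arithmetic on dates can be sketched in PostgreSQL as follows:

- Subtracting an integer moves a date back by that many days.
- Subtracting one date from another returns an integer number of days.
- Adding an interval to a date yields a timestamp rather than a date.

By contrast, adding a bare integer to a timestamp (e.g. timestamp '2001-09-28 01:00' + 7) is rejected, because PostgreSQL defines no timestamp + integer operator. The dates below are arbitrary examples.

```sql
-- PostgreSQL: integer arithmetic on plain dates is measured in days
SELECT date '2001-09-28' - 7;                  -- 2001-09-21 (a date)
SELECT date '2001-10-05' - date '2001-09-28';  -- 7 (an integer number of days)
-- Mixing a date with an interval promotes the result to a timestamp
SELECT date '2001-09-28' + interval '7 days';  -- 2001-10-05 00:00:00
```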

You can find more examples of time arithmetic operations in the PostgreSQL documentation: https://www.postgresql.org/docs/current/functions-datetime.html

MySQL: DATE_ADD() and DATE_SUB()

MySQL implements a very similar method for time arithmetic. Instead of the plus or minus operator, however, the usual approach is to pass all the different elements as parameters to a function: DATE_ADD for adding and DATE_SUB for subtracting. These functions have the following syntax:

DATE_ADD(timestamp, INTERVAL X UNIT)

Similar to PostgreSQL, the unit can be selected from a long list that includes, for example, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, and YEAR.

SELECT DATE_ADD('2001-09-28 01:00', INTERVAL 7 MONTH)
→ 2002-04-28 01:00:00

The main difference from PostgreSQL is that in MySQL the time unit is not enclosed in apostrophes. Instead, it’s a built-in keyword, just like ‘INTERVAL’. These keywords are usually written in uppercase, but this is only a convention and not a requirement. It’s also important that these keywords are always singular, no matter the quantity. So while in PostgreSQL we’d write «interval '7 months'», in MySQL it’ll always be «INTERVAL 7 MONTH», with ‘MONTH’ singular despite there being 7 of them.
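The rules above can be illustrated with a few MySQL queries. The dates are arbitrary examples. Each unit keyword is written unquoted and in singular form. Month arithmetic that would land past the end of a shorter month is clamped to that month's last day. The last query shows that MySQL also accepts the same INTERVAL expression with the + operator, as an alternative to DATE_ADD().

```sql
-- MySQL: the unit is an unquoted, singular keyword whatever the quantity
SELECT DATE_ADD('2001-09-28 01:00:00', INTERVAL 90 MINUTE);  -- 2001-09-28 02:30:00
SELECT DATE_ADD('2001-09-28', INTERVAL 2 WEEK);              -- 2001-10-12
-- February 2001 has no 31st, so the day is clamped to the 28th
SELECT DATE_ADD('2001-01-31', INTERVAL 1 MONTH);             -- 2001-02-28
-- Operator form: date + INTERVAL expr unit
SELECT '2001-09-28' + INTERVAL 7 DAY;                        -- 2001-10-05
```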

The DATE_SUB() function has the same syntax and parameters as DATE_ADD(), except that it subtracts the specified time units.

SELECT DATE_SUB('2001-09-28 01:00', INTERVAL 7 MONTH)
→ 2001-02-28 01:00:00

Interestingly, we can technically use DATE_ADD() for subtraction and DATE_SUB() for addition by passing a negative quantity of time units.

SELECT DATE_ADD('2001-09-28 01:00', INTERVAL -7 MONTH)
→ 2001-02-28 01:00:00
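Here is a quick MySQL check of the same sign trick from the other direction, as a sketch with arbitrary values. A negative quantity makes DATE_SUB() move forward in time. DATE_ADD() with a negative quantity compares equal to DATE_SUB() with the positive one.

```sql
-- MySQL: a negative quantity reverses the direction of DATE_SUB()
SELECT DATE_SUB('2001-09-28 01:00:00', INTERVAL -7 MONTH);   -- 2002-04-28 01:00:00
-- Both spellings of "7 months earlier" agree (MySQL returns 1 for true)
SELECT DATE_ADD('2001-09-28 01:00:00', INTERVAL -7 MONTH)
     = DATE_SUB('2001-09-28 01:00:00', INTERVAL 7 MONTH);    -- 1
```

In practice, reaching for the function whose name matches the direction usually keeps queries easier to read.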

Alternative: DATEDIFF()

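This approach can only be applied in very specific cases. The DATEDIFF() function simply returns the difference, in days, between two given dates. DATEDIFF() is a MySQL function: standard PostgreSQL has no built-in DATEDIFF(), but subtracting one date from another, as in date '2001-09-28' - date '2001-09-14', returns the same number of days.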

SELECT DATEDIFF('2001-09-28', '2001-09-14')
→ 14

Because of this, we can only use it for time arithmetic when we are dealing with dates and the task is to filter or search the data based on a difference in days between two dates. Nevertheless, as will become clear in the next section of this module, this very specific case is a frequent use case of time arithmetic in data science.
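Two details of MySQL's DATEDIFF() matter when it is used as a filter. First, the result is the first argument minus the second, so swapping the arguments flips the sign. Second, only the date parts of the arguments are used, so any time of day is ignored. The queries below illustrate both with arbitrary example values.

```sql
-- MySQL: DATEDIFF(expr1, expr2) returns expr1 - expr2 in whole days
SELECT DATEDIFF('2001-09-14', '2001-09-28');                    -- -14: swapping the arguments flips the sign
SELECT DATEDIFF('2001-09-28 23:59:59', '2001-09-28 00:00:01');  -- 0: only the date parts are compared
```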

Guided Practice

The notion of time arithmetic is frequently tested in data science technical interviews, so we will use some of these problems to practice. In this module, we will solve three problems: the first one using PostgreSQL, the second one using MySQL, and for the third one we will guide you through a solution that exploits the DATEDIFF() function.

1. Interview Question from Meta (PostgreSQL)

Number of Comments Per User in 30 days before 2020-02-10
Interview Question Date: December 2020
Meta/Facebook Easy ID 2004
Data Engineer, Data Scientist, BI Analyst, Data Analyst, ML Engineer

Return the total number of comments received for each user in the 30 or less days before 2020-02-10. Don't output users who haven't received any comment in the defined time period.

Table: fb_comments_count

user_id  created_at  number_of_comments
18       2019-12-29  1
25       2019-12-21  1
78       2020-01-04  1
37       2020-02-01  1
41       2019-12-23  1


To solve this interview question, we can follow these two steps:

1. Filter the data to only count comments within the 30 days prior to 2020-02-10;
2. Aggregate the remaining data by user_id and sum the number of comments.

Before we actually implement the filter, let’s take a moment to understand what it means. We have a certain timeframe, and the task is to verify whether another date falls within this interval. We are only given the final date and the length of this timeframe. Let’s first see how we can encode this timeframe in PostgreSQL, starting with the user_id, the number of comments, the date of interest (created_at), and the end date of our timeframe:

SELECT user_id,
number_of_comments,
created_at,
timestamp '2020-02-10' AS end_date
FROM fb_comments_count

All required columns and the first 5 rows of the solution are shown

user_id  number_of_comments  created_at
18       1                   2019-12-29
25       1                   2019-12-21
78       1                   2020-01-04
37       1                   2020-02-01
41       1                   2019-12-23

We have the end date of the timeframe since it was given in the question text, but we also need to know when it begins. We know that its length is 30 days, so we can simply subtract these 30 days from the end date of 2020-02-10 using the syntax we learned.

timestamp '2020-02-10' - 30 * INTERVAL '1 day' AS start_date

All required columns and the first 5 rows of the solution are shown

user_id  number_of_comments  created_at
18       1                   2019-12-29
25       1                   2019-12-21
78       1                   2020-01-04
37       1                   2020-02-01
41       1                   2019-12-23

The start_date and end_date are the same in each row because it’s always the same timeframe. But hopefully, this visualizes the problem better: in each case, we need to check whether the created_at date falls between the start_date and end_date. And now we know how to encode them in PostgreSQL. So let’s convert all of this into a date filter using the intuitive BETWEEN operator.

WHERE created_at BETWEEN timestamp '2020-02-10' - 30 * INTERVAL '1 day' AND timestamp '2020-02-10'

All required columns and the first 5 rows of the solution are shown

user_id  number_of_comments  created_at
37       1                   2020-02-01
99       1                   2020-02-02
18       1                   2020-01-31
58       1                   2020-01-26
24       1                   2020-02-03
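To make the boundaries of this filter concrete, the PostgreSQL queries below compute where the 30-day window starts. They then test the first day inside the window and the day just before it. BETWEEN includes both endpoints, so it behaves like created_at >= start AND created_at <= end.

```sql
-- PostgreSQL: first day of the 30-day window ending on 2020-02-10
SELECT timestamp '2020-02-10' - 30 * interval '1 day';  -- 2020-01-11 00:00:00

-- BETWEEN is inclusive on both ends
SELECT date '2020-01-11'
       BETWEEN timestamp '2020-02-10' - 30 * interval '1 day'
           AND timestamp '2020-02-10';                  -- true: first day of the window
SELECT date '2020-01-10'
       BETWEEN timestamp '2020-02-10' - 30 * interval '1 day'
           AND timestamp '2020-02-10';                  -- false: one day too early
```

The upper bound timestamp '2020-02-10' means midnight. If created_at stored a time of day, a comment posted later on 2020-02-10 would fall outside the window. The sample rows shown for this table contain no time of day.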

Now that all the rows represent comments from the correct timespan, we can move on to the second step, which is about aggregating the data by user_id and summing the number of comments.

SELECT user_id,
SUM(number_of_comments) AS number_of_comments
FROM fb_comments_count
WHERE created_at BETWEEN timestamp '2020-02-10' - 30 * INTERVAL '1 day'
AND timestamp '2020-02-10'
GROUP BY user_id

All required columns and the first 5 rows of the solution are shown

user_id  number_of_comments
5        1
8        4
9        2
16       1
18       2

In MySQL, the solution would be:

SELECT user_id,
SUM(number_of_comments) AS number_of_comments
FROM fb_comments_count
WHERE created_at BETWEEN DATE_SUB('2020-02-10', INTERVAL 30 DAY) AND '2020-02-10'
GROUP BY user_id
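Because the filter only depends on how many days separate created_at from 2020-02-10, it can also be written with DATEDIFF(), the approach introduced in the DATEDIFF() section above. The sketch below needs MySQL 8.0 or later for the WITH clause. It runs on a hypothetical sample_comments CTE built from the five sample rows of fb_comments_count shown at the start of this question. Its output therefore covers only those rows, not the full answer. On the platform, you would query fb_comments_count directly.

```sql
-- MySQL 8.0+ sketch: the 30-day filter expressed as a difference in days.
-- sample_comments is a hypothetical stand-in for fb_comments_count,
-- holding only the five sample rows displayed for that table.
WITH sample_comments AS (
    SELECT 18 AS user_id, DATE '2019-12-29' AS created_at, 1 AS number_of_comments
    UNION ALL SELECT 25, DATE '2019-12-21', 1
    UNION ALL SELECT 78, DATE '2020-01-04', 1
    UNION ALL SELECT 37, DATE '2020-02-01', 1
    UNION ALL SELECT 41, DATE '2019-12-23', 1
)
SELECT user_id,
       SUM(number_of_comments) AS number_of_comments
FROM sample_comments
-- 0 keeps 2020-02-10 itself; 30 keeps 2020-01-11, the first day of the window
WHERE DATEDIFF('2020-02-10', created_at) BETWEEN 0 AND 30
GROUP BY user_id;
-- On these five rows, only user 37 (created 2020-02-01, 9 days earlier) remains.
```

DATEDIFF() ignores the time of day. So unlike the timestamp-based BETWEEN, this version would also keep comments made later in the day on 2020-02-10.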


2. Interview Question from Noom (MySQL)

Transactions By Billing Method and Signup ID
Interview Question Date: March 2021
Asana Noom Medium ID 2031
