30 SQL and Database Design Questions From Data Science Interviews at Top Tech Companies
Nick Singh
Data Science Interview.
SQL Interview Questions on DataLemur even come with hints!

Data Science Interview Topics

Overview of SQL Interview Questions
details involved for powerful SQL workflows. For example, subqueries are important because they let you manipulate subsets of data on which later operations can be performed, while window functions let you cut data without explicitly combining rows using a GROUP BY. The questions asked within SQL are usually quite practical to the company at hand: a company like Facebook might ask various user or app analytics questions, whereas a company like Amazon will ask about products and transactions.
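
To make the GROUP BY versus window function contrast concrete, here is a minimal sketch that runs both against an in-memory SQLite database through Python's standard sqlite3 module. The purchases table, its columns, and its rows are invented for illustration, and window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0)],
)

# GROUP BY collapses each user's rows into a single summary row.
grouped = conn.execute(
    "SELECT user_id, SUM(amount) FROM purchases "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()

# A window function keeps every row and attaches the per-user total to it.
windowed = conn.execute(
    """
    SELECT user_id, amount,
           SUM(amount) OVER (PARTITION BY user_id) AS user_total
    FROM purchases
    ORDER BY user_id, amount
    """
).fetchall()

print(grouped)
print(windowed)
```

The grouped result has one row per user, while the windowed result still has all three purchase rows, each carrying its user's total.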
Overview of Database Design Questions
Although it isn’t strictly necessary to know the inner workings of databases (which is typically more data engineering oriented), it helps to have a high-level understanding of basic concepts in Databases and Systems. “Databases” refers not to specific ones but more so to how they operate at a high level and what design decisions and trade-offs are made during construction and querying. “Systems” is a broad term but refers to any set of frameworks or tools on which analysis of large volumes of data relies. For example, a common interview topic is the MapReduce framework, which is heavily utilized at many companies for parallel processing of large datasets.

Nick Singh
Amazon Best-Selling Author of Ace the Data Science Interview and Course Creator of Ace the Data Job Hunt. Founder of DataLemur and Previously Software Engineer at Facebook & Google.
Join my free 9-day Data Interview Crash Course!

20 SQL Data Science Interview Questions
1. [Robinhood - Easy] Assume you are given the below tables for trades and users. Write a query to list the top 3 cities that had the highest number of completed orders.

3. [Uber - Easy] Assume you are given the below table for spending activity by product type. Write a query to calculate the cumulative spend for each product over time in chronological order.

4. [Snapchat - Easy] Assume you have the below tables on sessions that users have, and a users table.

6. [Amazon - Easy] Assume you are given the below table on purchases from users. Write a query to get the number of people who purchased at least one product on multiple days.

7. [Opendoor - Easy] Assume you are given the below table on house prices from various zip

8. [Etsy - Easy] Assume you are given the below table on transactions from users for purchases. Write a query to get the list of customers whose earliest purchase was at least $50.

9. [Disney - Easy] Assume you are given the below table on watch times (in minutes) for all users, where each user is based in a given city. Write a query to return all pairs of cities that have total watch times within 10000 minutes of one another.

10. [Twitter - Easy] Assume you are given the below table on tweets by each user over a period of time. Calculate the 7-day rolling average of tweets by each user for every date.
mean you'll need a window function! In case you need a refresher, check out my 30-day learn SQL roadmap, which has my list of favorite FREE SQL resources and how I'd study them to go from SQL zero to SQL HERO!
12. [Amazon - Easy] Assume you are given the below table on customer spend amounts on products in various categories. Calculate the top three most bought items within each category in 2020.

13. [DoorDash - Easy] Assume you are given the below table on transactions on delivery locations and times for meals - a start location, an end location, and timestamp for a given meal_id. Certain locations are aggregation locations - where meals get sent to, and where meals then go to their final destination. Calculate the delivery time per meal to each final destination from a particular aggregation location, loc_id = 4.

14. [Facebook - Medium] Assume you have the below tables on user actions. Write a query to get the active user retention by month.

15. [Twitter - Medium] Assume you are given the below tables for the session activity of users. Write a query to assign ranks to users by the total session duration for the different session types they have had between a start date (2020-01-01) and an end date (2020-02-01).

16. [Snapchat - Medium] Assume you are given the below tables on users and their time spent on sending and opening Snaps. Write a query to get the breakdown for each age breakdown of

19. [Google - Medium] Assume you are given the below table of measurement values from a sensor for several days. Each measurement can happen several times in a given day. Write a query to output the sum of values for every odd measurement and the sum of values for every even measurement by date.

DataLemur has SQL Interview Questions with Solutions!

20. [Etsy - Medium] Assume you are given the below
10 Database And Systems Design Interview Questions

21. [MongoDB - Easy] For each of the ACID

7 Real Amazon SQL Interview Questions

For additional practice, I put together some insights on the Amazon SQL interview process for Data Analysts & Data Scientists, and curated 7 real Amazon SQL interview questions in the blog below:

SQL And Database Interview Solutions
Problem #4 Solution:
By definition, daily cohorts are active users from a particular day. First, we can use a subquery to get the sessions of new users by day using an inner join with users. This is to filter for only active users by a particular join date for the cohort. Then we can get a distinct count to return the active user count:
WITH new_users_by_date
Problem #8 Solution:
WITH purchase_rank AS
SELECT user_id, sp
RANK() OVER
(PARTITION
FROM user_tran
)
SELECT
user_id
FROM
purchase_rank
WHERE rank = 1 AND spe
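
Since the query for the earliest-purchase question (Problem #8) is cut off in this copy, here is a hedged, self-contained sketch of the same ranking idea, run through Python's sqlite3 module. The table name user_transactions, the columns spend and transaction_date, and the sample rows are all assumptions made for illustration, not the original schema. The alias purchase_order is used instead of rank to avoid clashing with a keyword in some SQL dialects.

```python
import sqlite3

# Hypothetical schema and data; the real table layout is not shown in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_transactions "
    "(user_id INTEGER, spend REAL, transaction_date TEXT)"
)
conn.executemany(
    "INSERT INTO user_transactions VALUES (?, ?, ?)",
    [
        (1, 60.0, "2020-01-01"),  # user 1: earliest purchase is $60
        (1, 20.0, "2020-02-01"),
        (2, 10.0, "2020-01-05"),  # user 2: earliest purchase is $10
        (2, 90.0, "2020-03-01"),
        (3, 50.0, "2020-01-10"),  # user 3: earliest purchase is exactly $50
    ],
)

query = """
WITH purchase_rank AS (
    SELECT user_id,
           spend,
           RANK() OVER (
               PARTITION BY user_id
               ORDER BY transaction_date ASC
           ) AS purchase_order
    FROM user_transactions
)
SELECT user_id
FROM purchase_rank
WHERE purchase_order = 1   -- keep only each user's earliest purchase
  AND spend >= 50          -- and require it to be at least $50
ORDER BY user_id
"""

early_big_spenders = [row[0] for row in conn.execute(query)]
print(early_big_spenders)
```

If a user could have several purchases on the same earliest date, RANK() would return all of them, so a tie-breaking column or DISTINCT may be needed in practice.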
Problem #11 Solution:
WITH latest_date AS (
SELECT user_id,
COUNT(DISTINCT
MAX(transactio
FROM user_transact
GROUP BY )
SELECT curr_date,
COUNT(user_id) AS
SUM(num_products)
FROM
latest_date
GROUP BY 1
spent on each activity by each user by filtering on the activity_type and taking the sum of time spent. In doing this, we want to do an outer join with the age bucket to get the total time by age bucket for both activity types. This results in the two subqueries below. Then, we can use these two subqueries to sum them by joining on the appropriate age bucket and take the proportion for send time and the proportion for open time per age bucket:
WITH send_timespent AS
SELECT age_breakdo
FROM age_breakdown
LEFT JOIN activities
WHERE activities.t
GROUP BY 1
),
open_timespent AS (
SELECT age_breakdo
FROM age_breakdown
LEFT JOIN activities
WHERE activities.t
GROUP BY 1
)
SELECT a.age_bucket,
s.send_timespent /
o.open_timespent /
FROM age_breakdown a
LEFT JOIN send_timespe
LEFT JOIN open_timespe
GROUP BY 1
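
Because the query text is truncated in this copy, the following is only a sketch of the send/open proportion idea, not the original solution. It swaps the two subqueries for conditional aggregation in a single pass and uses an inner join for brevity. The column names user_id, activity_type, time_spent, and age_bucket, along with the sample rows, are assumptions.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activities (user_id INTEGER, activity_type TEXT, time_spent REAL)"
)
conn.execute("CREATE TABLE age_breakdown (user_id INTEGER, age_bucket TEXT)")
conn.executemany(
    "INSERT INTO age_breakdown VALUES (?, ?)",
    [(1, "21-25"), (2, "26-30")],
)
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [
        (1, "send", 2.0),
        (1, "open", 6.0),
        (2, "send", 3.0),
        (2, "open", 1.0),
        (2, "chat", 10.0),  # ignored: neither send nor open
    ],
)

query = """
SELECT a.age_bucket,
       SUM(CASE WHEN t.activity_type = 'send' THEN t.time_spent ELSE 0 END)
           * 1.0 / SUM(t.time_spent) AS send_share,
       SUM(CASE WHEN t.activity_type = 'open' THEN t.time_spent ELSE 0 END)
           * 1.0 / SUM(t.time_spent) AS open_share
FROM age_breakdown a
JOIN activities t ON t.user_id = a.user_id
WHERE t.activity_type IN ('send', 'open')
GROUP BY a.age_bucket
ORDER BY a.age_bucket
"""

shares_by_bucket = conn.execute(query).fetchall()
print(shares_by_bucket)
```

With an inner join, age buckets that have no send or open activity simply drop out of the result; an outer join, as the write-up suggests, would keep them with NULL shares.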
reviews are all 4 or 5 stars. We can do this using a HAVING clause instead of a WHERE clause, since the reviews need to all be 4 stars or above. For the HAVING condition, we can use a CASE expression that flags 4 or 5 stars and then take a SUM over it. This can then be compared with the total row count of the particular business_id reviews to ensure that the count of top reviews matches the total review count. With the relevant businesses, we can then do an outer join with the original table on business_id to get a COUNT of distinct business_id matches, and then the percentage by comparing the COUNT from the top places with the overall COUNT of business_id:
WITH top_places AS (
SELECT business_id
FROM user_transact
GROUP BY 1
HAVING
SUM(CASE WHEN
)
SELECT
COUNT(DISTINCT t.b
COUNT(DISTINCT t.b
FROM reviews r
LEFT JOIN top_places t
ON r.business_id =
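
The query above is cut off in this copy, so here is a hedged, runnable sketch of the HAVING pattern it describes, executed through Python's sqlite3 module. The reviews table layout, the review_stars column name, and the sample rows are assumptions for illustration only.

```python
import sqlite3

# Hypothetical schema and data; only business_id appears in the original text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (business_id INTEGER, review_stars INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [
        (1, 5), (1, 4),  # all reviews are 4 or 5 stars
        (2, 5), (2, 2),  # one low review disqualifies this business
        (3, 4),          # single top review
        (4, 3),          # single low review
    ],
)

query = """
WITH top_places AS (
    SELECT business_id
    FROM reviews
    GROUP BY business_id
    -- the count of 4/5-star reviews must equal the total review count
    HAVING SUM(CASE WHEN review_stars >= 4 THEN 1 ELSE 0 END) = COUNT(*)
)
SELECT COUNT(DISTINCT t.business_id) * 100.0
       / COUNT(DISTINCT r.business_id) AS pct_top_places
FROM reviews r
LEFT JOIN top_places t
    ON r.business_id = t.business_id
"""

pct_top_places = conn.execute(query).fetchone()[0]
print(pct_top_places)
```

Here businesses 1 and 3 qualify out of four, so the query reports half of the businesses as top places.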
Problem #21 Solution:
C: Consistency, meaning that there are integrity constraints such that the database is consistent before and after a given transaction. Essentially, if I search for what is in Row 3 and then do so again without any modifications to the database (no deletes or inserts), I should get the same result. Any referential integrity is handled by appropriate checks for the primary and foreign keys.
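
As a small illustration of a referential integrity check keeping the database consistent, the sketch below uses SQLite through Python's sqlite3 module. The users and orders tables are hypothetical. Note that SQLite only enforces foreign keys once PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical tables, invented for illustration.
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "user_id INTEGER NOT NULL REFERENCES users(user_id))"
)
conn.execute("INSERT INTO users VALUES (1)")
conn.commit()

rejected = False
try:
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute("INSERT INTO orders VALUES (100, 1)")    # valid parent row
        conn.execute("INSERT INTO orders VALUES (101, 999)")  # no such user
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```

The second insert violates the foreign key, so the whole transaction is rolled back and the orders table is left exactly as it was before the transaction started.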
MapReduce is a framework that is heavily used in processing large datasets across a cluster of many machines. Within the cluster, there are worker nodes (which carry out the computations) and master nodes (which delegate tasks to each worker node). The three steps are generally as follows.
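
Assuming the three steps are the usual map, shuffle, and reduce phases, a single-machine word count can sketch the flow. This is a toy illustration in plain Python, not a distributed implementation; the function names and input documents are invented.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key so each key lands in one place."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


# In a real cluster each document would be mapped on a different worker node.
documents = ["big data big ideas", "data beats ideas"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_counts)
```

Because each map call only sees its own document and each reduce call only sees one key, both phases can be spread across workers independently, which is what makes the model parallelize well.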
A trigger is like a CHECK condition, but every time there is an update to the database, the trigger condition will be checked to see if it has been violated. This allows you to implement some level of control and assurance that all your data entries meet a certain condition. For instance, a trigger that states that all ID values must be > 0 will ensure that you get no null values or negative values. When someone tries to enter such a value, the entry will not go through.
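
The ID example can be sketched with a SQLite trigger driven from Python's sqlite3 module. The entries table and the trigger name are hypothetical, and the sketch only guards inserts; a matching BEFORE UPDATE trigger would be needed to cover changes to existing rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, invented for illustration.
conn.execute("CREATE TABLE entries (id INTEGER, label TEXT)")

# Reject any new row whose id is NULL, zero, or negative.
conn.execute(
    """
    CREATE TRIGGER entries_positive_id
    BEFORE INSERT ON entries
    WHEN NEW.id IS NULL OR NEW.id <= 0
    BEGIN
        SELECT RAISE(ABORT, 'id must be a positive, non-null value');
    END
    """
)


def try_insert(entry_id, label):
    """Return True if the row was stored, False if the trigger blocked it."""
    try:
        with conn:
            conn.execute("INSERT INTO entries VALUES (?, ?)", (entry_id, label))
        return True
    except sqlite3.DatabaseError:
        return False


results = [try_insert(1, "ok"), try_insert(-5, "negative"), try_insert(None, "null")]
stored_ids = [row[0] for row in conn.execute("SELECT id FROM entries")]
print(results, stored_ids)
```

The NULL case is spelled out in the WHEN clause on purpose: a bare comparison such as NEW.id <= 0 evaluates to NULL for a NULL id, which would let the row through.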
How To Get More Data Science Interview Questions

Want more like this? Buy our 301-page data science interview prep book on Amazon! And if you want an interactive SQL interview platform, DataLemur has got you covered.
Real SQL Data Science Interview Questions on DataLemur!
25+ Video Lessons in Ace the Data Job Hunt
Read the 40 prob & stat data science interview questions asked by Facebook, Microsoft, Two Sigma, and Bloomberg.