DataLemur 🐵 (Ace the SQL & Data Interview)

Information Services

San Francisco, California 45,715 followers

Practice 200+ FAANG SQL & Data Interview questions! Made by Nick Singh (Ex-FB & Author of Ace the Data Interview 📕)

About us

Hi, I'm Nick Singh, the creator of DataLemur! I used to write SQL queries and run A/B tests all day at Facebook, and before that, wrote Python on Google Nest's Data Infrastructure Team. I know first-hand how difficult Data Science, Data Analyst, & Data Engineering interviews can be. While my best-selling book, Ace the Data Science Interview, has successfully helped 16,000+ readers prepare for the Statistics, ML, and Business-Sense portions of data interviews, readers kept asking for a more interactive way to practice the SQL questions from the book. That's why I made DataLemur, a SQL & Data Analytics interview platform for the data community! Happy practicing: https://datalemur.com/

Website
http://datalemur.com/
Industry
Information Services
Company size
2-10 employees
Headquarters
San Francisco, California
Type
Privately Held


Updates

  • SQL Interviews LOVE to test you on Window Functions. Here are the 6 most popular window functions, plus some real SQL interview questions to practice these commands 👇

    𝟔 𝐌𝐨𝐬𝐭 𝐓𝐞𝐬𝐭𝐞𝐝 𝐖𝐢𝐧𝐝𝐨𝐰 𝐅𝐮𝐧𝐜𝐭𝐢𝐨𝐧𝐬
    * RANK() - assigns a rank to each row in a partition based on a specified column or expression; tied rows share the same rank, and the ranks after a tie are skipped
    * DENSE_RANK() - assigns a rank to each row like RANK(), but DOESN'T skip rank values after ties
    * ROW_NUMBER() - assigns a unique sequential integer to each row in a partition based on the order of the rows
    * LEAD() - retrieves a value from a subsequent row in a partition based on a specified column or expression
    * LAG() - retrieves a value from a previous row in a partition based on a specified column or expression
    * NTH_VALUE() - retrieves the nth value in a partition

    Now, let’s put these commands into practice:

    𝐔𝐛𝐞𝐫 𝐒𝐐𝐋 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧 - uses ROW_NUMBER() to find the 3rd ride booked: https://lnkd.in/gf4UDx4d
    𝐆𝐨𝐨𝐠𝐥𝐞 𝐒𝐐𝐋 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧 - uses ROW_NUMBER() to find odd & even measurements from a sensor: https://lnkd.in/gBUCxxih
    𝐒𝐩𝐨𝐭𝐢𝐟𝐲 𝐒𝐐𝐋 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧 - uses DENSE_RANK() to find the top 5 artists on Spotify: https://lnkd.in/gDJ_paEY
    𝐖𝐚𝐲𝐟𝐚𝐢𝐫 𝐒𝐐𝐋 𝐈𝐧𝐭𝐞𝐫𝐯𝐢𝐞𝐰 𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧 - uses LAG() to find the Year-over-Year Growth: https://lnkd.in/g2WAe2BK
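    To see how the three ranking functions differ on ties, here is a minimal runnable sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). The scores table, players, and points are all made up for the demo; ROW_NUMBER() gets a player tiebreaker in its window so the output is deterministic:

    ```python
    import sqlite3

    # Hypothetical leaderboard data, invented purely for this demo.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("ana", 90), ("bob", 90), ("cat", 80), ("dan", 70)],
    )

    rows = conn.execute("""
        SELECT player,
               points,
               RANK()       OVER (ORDER BY points DESC) AS rnk,        -- ties share a rank; next rank skipped
               DENSE_RANK() OVER (ORDER BY points DESC) AS dense_rnk,  -- ties share a rank; no gaps
               ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num  -- unique; tie broken by name
        FROM scores
        ORDER BY points DESC, player
    """).fetchall()

    for row in rows:
        print(row)
    # ('ana', 90, 1, 1, 1)
    # ('bob', 90, 1, 1, 2)
    # ('cat', 80, 3, 2, 3)
    # ('dan', 70, 4, 3, 4)
    ```

    Note how after the two-way tie at 90, RANK() jumps straight to 3 while DENSE_RANK() continues with 2.
    
    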

  • Twitter SQL Interview Question: Given a table of tweet data over a specified time period, calculate the 3-day rolling average of tweets for each user. Output the user ID, tweet date, and rolling averages rounded to 2 decimal places. Attempt the SQL question here: https://lnkd.in/ewFBqDVr

    𝐓𝐡𝐞 𝐃𝐚𝐭𝐚: tweets table:
    • user_id (integer)
    • tweet_date (timestamp)
    • tweet_count (integer)

    𝐒𝐭𝐞𝐩 𝟏: 𝐂𝐚𝐥𝐜𝐮𝐥𝐚𝐭𝐞 𝐭𝐡𝐞 𝐚𝐯𝐞𝐫𝐚𝐠𝐞 𝐭𝐰𝐞𝐞𝐭 𝐜𝐨𝐮𝐧𝐭 𝐟𝐨𝐫 𝐞𝐚𝐜𝐡 𝐮𝐬𝐞𝐫
    To start, we calculate an average tweet count for each user ID and date using the AVG() window function. (Without an explicit frame clause, the window runs from the start of the partition to the current row, so this is still a cumulative average, not yet a 3-day one.)

    SELECT user_id, tweet_date,
      AVG(tweet_count) OVER (
        PARTITION BY user_id
        ORDER BY tweet_date) AS rolling_avg_3d
    FROM tweets;

    𝐒𝐭𝐞𝐩 𝟐: 𝐂𝐚𝐥𝐜𝐮𝐥𝐚𝐭𝐞 𝐭𝐡𝐞 𝐫𝐨𝐥𝐥𝐢𝐧𝐠 𝐚𝐯𝐞𝐫𝐚𝐠𝐞 𝐭𝐰𝐞𝐞𝐭 𝐜𝐨𝐮𝐧𝐭 𝐟𝐨𝐫 𝐞𝐚𝐜𝐡 𝐮𝐬𝐞𝐫 𝐨𝐯𝐞𝐫 𝐚 𝟑-𝐝𝐚𝐲 𝐩𝐞𝐫𝐢𝐨𝐝
    To restrict the average to a 3-day period, we modify the previous query by adding the ROWS BETWEEN 2 PRECEDING AND CURRENT ROW expression to the window function.

    SELECT user_id, tweet_date, tweet_count,
      AVG(tweet_count) OVER (
        PARTITION BY user_id
        ORDER BY tweet_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg
    FROM tweets;

    𝐒𝐭𝐞𝐩 𝟑: 𝐑𝐨𝐮𝐧𝐝 𝐭𝐡𝐞 𝐫𝐨𝐥𝐥𝐢𝐧𝐠 𝐚𝐯𝐞𝐫𝐚𝐠𝐞 𝐭𝐰𝐞𝐞𝐭 𝐜𝐨𝐮𝐧𝐭 𝐭𝐨 𝟐 𝐝𝐞𝐜𝐢𝐦𝐚𝐥 𝐩𝐨𝐢𝐧𝐭𝐬
    Finally, we round the rolling averages to 2 decimal places by incorporating the ROUND() function into the previous query.

    SELECT user_id, tweet_date,
      ROUND(AVG(tweet_count) OVER (
        PARTITION BY user_id
        ORDER BY tweet_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS rolling_avg_3d
    FROM tweets;

    Test your query here: https://lnkd.in/ewFBqDVr
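    The final query can be exercised end to end with Python's sqlite3 module (SQLite 3.25+) against a tiny made-up tweets table. One caveat worth knowing: ROWS BETWEEN 2 PRECEDING AND CURRENT ROW counts rows, not calendar days, so it only matches a true 3-day window when each user has one row per consecutive day, as in this demo data:

    ```python
    import sqlite3

    # Tiny, invented tweets table: one row per user per day for simplicity.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (user_id INTEGER, tweet_date TEXT, tweet_count INTEGER)")
    conn.executemany("INSERT INTO tweets VALUES (?, ?, ?)", [
        (111, "2022-06-01", 2),
        (111, "2022-06-02", 1),
        (111, "2022-06-03", 3),
        (111, "2022-06-04", 4),
    ])

    rows = conn.execute("""
        SELECT user_id,
               tweet_date,
               ROUND(AVG(tweet_count) OVER (
                   PARTITION BY user_id
                   ORDER BY tweet_date
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW), 2) AS rolling_avg_3d
        FROM tweets
        ORDER BY user_id, tweet_date
    """).fetchall()

    for row in rows:
        print(row)
    # (111, '2022-06-01', 2.0)   frame holds only the first row
    # (111, '2022-06-02', 1.5)   avg(2, 1)
    # (111, '2022-06-03', 2.0)   avg(2, 1, 3)
    # (111, '2022-06-04', 2.67)  avg(1, 3, 4), rounded
    ```
    
    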

  • Can you solve this Microsoft SQL Interview question? Write a query to identify the top 2 Power Users who sent the highest number of messages on Microsoft Teams in August 2022. Display the IDs of these 2 users along with the total number of messages they sent. Output the results in descending order based on the count of the messages.

    𝐓𝐡𝐞 𝐃𝐚𝐭𝐚: messages table:
    • message_id (integer)
    • sender_id (integer)
    • receiver_id (integer)
    • content (varchar)
    • sent_date (datetime)

    𝐇𝐢𝐧𝐭 𝟏: First, we must ensure that all the messages were sent in August 2022. We can pull the month and year information using the EXTRACT function:

    SELECT sender_id, message_id
    FROM messages
    WHERE EXTRACT(MONTH FROM sent_date) = 8
      AND EXTRACT(YEAR FROM sent_date) = 2022;

    𝐁𝐞𝐟𝐨𝐫𝐞 𝐈 𝐫𝐞𝐯𝐞𝐚𝐥 𝐭𝐡𝐞 𝐟𝐮𝐥𝐥 𝐬𝐨𝐥𝐮𝐭𝐢𝐨𝐧, 𝐩𝐥𝐞𝐚𝐬𝐞 𝐭𝐞𝐬𝐭 & 𝐫𝐮𝐧 𝐲𝐨𝐮𝐫 𝐒𝐐𝐋 𝐪𝐮𝐞𝐫𝐲 𝐟𝐨𝐫 𝐟𝐫𝐞𝐞 𝐡𝐞𝐫𝐞: https://lnkd.in/gkM3AmQX

    𝐓𝐡𝐞 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧: To find the top 2 Power Users who sent the most messages on Microsoft Teams in August 2022, we need to first determine the count of messages sent by each user, which we'll refer to as "senders". We start by extracting the month and year from the sent_date field and filtering the results to only include messages sent in August 2022. We then use the GROUP BY clause to group the messages by sender_id and count each sender's messages with the COUNT() function:

    SELECT sender_id, COUNT(message_id) AS count_messages
    FROM messages
    WHERE EXTRACT(MONTH FROM sent_date) = 8
      AND EXTRACT(YEAR FROM sent_date) = 2022
    GROUP BY sender_id;

    Assuming that no two users sent the same number of messages in August 2022, we can simply use an ORDER BY clause in descending order to sort the results based on the count of messages. Finally, we use a LIMIT clause to restrict the results to the top 2 senders, giving us the desired outcome.

    SELECT sender_id, COUNT(message_id) AS count_messages
    FROM messages
    WHERE EXTRACT(MONTH FROM sent_date) = 8
      AND EXTRACT(YEAR FROM sent_date) = 2022
    GROUP BY sender_id
    ORDER BY count_messages DESC
    LIMIT 2;
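    The solution can be sanity-checked with Python's sqlite3 module. SQLite has no EXTRACT, so strftime() stands in for the month/year filter; the message rows and sender IDs below are invented for the demo:

    ```python
    import sqlite3

    # Hypothetical Teams messages; sender 7890's message falls outside August 2022.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (message_id INTEGER, sender_id INTEGER, sent_date TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", [
        (1, 3601, "2022-08-03 16:43:00"),
        (2, 3601, "2022-08-04 10:00:00"),
        (3, 4500, "2022-08-05 11:00:00"),
        (4, 4500, "2022-08-06 12:00:00"),
        (5, 4500, "2022-08-07 13:00:00"),
        (6, 7890, "2022-07-01 09:00:00"),  # July, so it must be excluded
    ])

    rows = conn.execute("""
        SELECT sender_id, COUNT(message_id) AS count_messages
        FROM messages
        WHERE strftime('%Y-%m', sent_date) = '2022-08'
        GROUP BY sender_id
        ORDER BY count_messages DESC
        LIMIT 2
    """).fetchall()

    print(rows)  # [(4500, 3), (3601, 2)]
    ```
    
    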

  • Can you solve this real Amazon SQL interview question, about finding the highest-grossing items in each category?

    𝐏𝐫𝐨𝐛𝐥𝐞𝐦 𝐒𝐭𝐚𝐭𝐞𝐦𝐞𝐧𝐭
    Assume you're given a table containing data on Amazon customers and their spending on products in different categories. Write a query to identify the top two highest-grossing products within each category in the year 2022.

    𝐈𝐧𝐩𝐮𝐭 𝐃𝐚𝐭𝐚 (𝐩𝐫𝐨𝐝𝐮𝐜𝐭_𝐬𝐩𝐞𝐧𝐝 𝐭𝐚𝐛𝐥𝐞):
    • category (string)
    • product (string)
    • user_id (integer)
    • spend (decimal)
    • transaction_date (timestamp)

    p.s. you can explore and SELECT * all the input data on DataLemur: https://lnkd.in/gu2DES78

    𝐇𝐢𝐧𝐭𝐬:
    Hint 1: Find the highest-grossing products by category and product. To identify the highest-grossing products, we need to calculate the total spend for each category and product, filtering the transactions to only include those from the year 2022.

    SELECT category, product, SUM(spend) AS total_spend
    FROM product_spend
    WHERE EXTRACT(YEAR FROM transaction_date) = 2022
    GROUP BY category, product;

    Hint 2: Rank the products by total spend within each category using a RANK() window function.

    𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧: Because the full solution is a bit long, and I don't want to provide spoilers, you can check it out on the DataLemur site (and test + run your own SQL query solution on there too): https://lnkd.in/gu2DES78
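    One way to follow Hint 2 through to the end is to rank each category's products by total spend and keep ranks 1 and 2. This is a sketch of that approach, not necessarily DataLemur's official solution, run with Python's sqlite3 module on made-up data (strftime() replaces EXTRACT, which SQLite lacks):

    ```python
    import sqlite3

    # Invented product_spend rows; the 2021 laptop purchase must be filtered out.
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE product_spend
                    (category TEXT, product TEXT, user_id INTEGER,
                     spend REAL, transaction_date TEXT)""")
    conn.executemany("INSERT INTO product_spend VALUES (?, ?, ?, ?, ?)", [
        ("appliance",   "washer",     123, 219.80, "2022-03-02"),
        ("appliance",   "vacuum",     178, 152.00, "2022-04-05"),
        ("appliance",   "vacuum",     201,  38.00, "2022-06-09"),
        ("appliance",   "dishwasher", 156, 100.00, "2022-05-26"),
        ("electronics", "laptop",     145, 300.00, "2022-01-15"),
        ("electronics", "monitor",    189, 250.00, "2022-02-20"),
        ("electronics", "keyboard",   167,  50.00, "2022-03-10"),
        ("electronics", "laptop",     199, 900.00, "2021-12-31"),  # not 2022
    ])

    rows = conn.execute("""
        WITH totals AS (
            SELECT category, product, SUM(spend) AS total_spend
            FROM product_spend
            WHERE strftime('%Y', transaction_date) = '2022'
            GROUP BY category, product
        ), ranked AS (
            SELECT category, product, total_spend,
                   RANK() OVER (PARTITION BY category
                                ORDER BY total_spend DESC) AS rnk
            FROM totals
        )
        SELECT category, product, total_spend
        FROM ranked
        WHERE rnk <= 2
        ORDER BY category, total_spend DESC
    """).fetchall()

    for row in rows:
        print(row)
    # ('appliance', 'washer', 219.8)
    # ('appliance', 'vacuum', 190.0)
    # ('electronics', 'laptop', 300.0)
    # ('electronics', 'monitor', 250.0)
    ```

    Using RANK() rather than ROW_NUMBER() means that if two products tie for second place in a category, both are returned.
    
    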

  • DataLemur 🐵 (Ace the SQL & Data Interview) reposted this

    View profile for Shreya Mittal

    Senior Data Engineer @ Visa | Masters in Data Science from LJMU, UK | Hive | Spark | NLP | ETL | Data Analytics

    I've had more help in interviews from the book written by Nick Singh 📕🐒 than I can express. The SQL & DB design chapter is by far the most valuable resource; I keep referring to it again and again. In a recent interview I was asked whether I had ever used MINUS in practical SQL work, and I kept thinking that I hadn't. After the interview, I glanced at one of the questions (8.22) Nick mentions in his book, which was asked at Twitter (now X): obtain all users existing on 2021-01-01 that did not follow any page in the 100 most popular topics for that day.

    Solution: First we need to obtain the top 100 most popular topics for the given date. Then we need to find all users who followed no topic in this list — or equivalently, find the users who did follow those topics and filter them out of the set of users that existed on 2021-01-01. Two ways to do this:
    * use the MINUS (or EXCEPT) operator and subtract those following the top 100 topics (via an inner join) from the entire user universe
    * use a WHERE NOT EXISTS clause in similar fashion

    WITH top_topics AS (
      SELECT * FROM topic_ranking
      WHERE ranking_date = '2021-01-01' AND rank <= 100)
    SELECT DISTINCT user_id
    FROM user_topics
    WHERE follow_date <= '2021-01-01'
    MINUS
    SELECT user_id
    FROM user_topics u
    JOIN top_topics t ON u.topic_id = t.topic_id;

    DataLemur 🐵 (Ace the SQL & Data Interview): I would highly recommend this platform for all aspiring data engineers #sql #interview
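    This query can be verified end to end with Python's sqlite3 module, swapping MINUS for its ANSI-standard equivalent EXCEPT (MINUS is Oracle syntax; SQLite and PostgreSQL only accept EXCEPT). All table contents and IDs below are invented for the check:

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topic_ranking (topic_id INTEGER, rank INTEGER, ranking_date TEXT)")
    conn.execute("CREATE TABLE user_topics (user_id INTEGER, topic_id INTEGER, follow_date TEXT)")
    conn.executemany("INSERT INTO topic_ranking VALUES (?, ?, ?)", [
        (1, 5,   "2021-01-01"),   # in the top 100
        (2, 150, "2021-01-01"),   # outside the top 100
    ])
    conn.executemany("INSERT INTO user_topics VALUES (?, ?, ?)", [
        (10, 1, "2020-12-01"),  # existing user who follows a top topic
        (20, 2, "2020-12-15"),  # existing user who follows only a non-top topic
        (30, 1, "2021-05-01"),  # followed after the cutoff, not in the user universe
    ])

    rows = conn.execute("""
        WITH top_topics AS (
            SELECT * FROM topic_ranking
            WHERE ranking_date = '2021-01-01' AND rank <= 100)
        SELECT DISTINCT user_id
        FROM user_topics
        WHERE follow_date <= '2021-01-01'
        EXCEPT
        SELECT u.user_id
        FROM user_topics u
        JOIN top_topics t ON u.topic_id = t.topic_id
    """).fetchall()

    print(rows)  # [(20,)] — only user 20 existed on the date and avoided all top topics
    ```
    
    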

  • 🚨 URGENT NEWS ABOUT DATALEMUR PLEASE READ 🚨

    View profile for Nick Singh 📕🐒
    Nick Singh 📕🐒 is an Influencer

    Founder of DataLemur.com (Ace SQL Interviews) • Best-Selling Author of Ace the Data Science Interview • Ex-Facebook • LinkedIn Top Voice • 175,000+ Nerds Follow Me on LinkedIn for Data/SQL/Career tips!

    Elon & DOGE cut $300K of USAID funding to DataLemur last week – and I just found out WHY (it's SO stupid) 😡

    For context, I'm the CEO of DataLemur 🐵 (Ace the SQL & Data Interview) and we relied on $300K of US Government funding to help 200,000+ people across 106 countries:
    - Learn SQL for free: https://lnkd.in/eeW-C9gQ
    - Ace SQL/Data interviews for free: https://lnkd.in/epfEqDis
    - Play SQL Squid Games for free: https://lnkd.in/eSAee8DR

    𝐖𝐞 𝐣𝐮𝐬𝐭 𝐟𝐨𝐮𝐧𝐝 𝐨𝐮𝐭 𝐖𝐇𝐘 𝐄𝐥𝐨𝐧 & 𝐃𝐎𝐆𝐄 𝐜𝐮𝐭 𝐟𝐮𝐧𝐝𝐬. It's because our non-profit DataLemur:
    - Favored LEFT JOINs
    - Biased against RIGHT JOINs
    - Promoted UNIONs (not just for some, but UNION ALL)

    Trump took to Truth Social and posted the following about Foreign Keys:

    "𝘛𝘩𝘦𝘳𝘦’𝘴 𝘢 𝘸𝘦𝘣𝘴𝘪𝘵𝘦— 𝘛𝘌𝘙𝘙𝘐𝘉𝘓𝘌 𝘸𝘦𝘣𝘴𝘪𝘵𝘦—𝘨𝘪𝘷𝘪𝘯𝘨 𝘢𝘸𝘢𝘺 𝘚𝘘𝘓 𝘢𝘯𝘥 𝘋𝘢𝘵𝘢 𝘪𝘯𝘵𝘦𝘳𝘷𝘪𝘦𝘸 𝘳𝘦𝘴𝘰𝘶𝘳𝘤𝘦𝘴 𝘧𝘰𝘳 𝘍𝘙𝘌𝘌 𝘵𝘰 𝘵𝘩𝘦 𝘸𝘩𝘰𝘭𝘦 𝘞𝘖𝘙𝘓𝘋 𝘶𝘴𝘪𝘯𝘨 𝘠𝘖𝘜𝘙 𝘈𝘮𝘦𝘳𝘪𝘤𝘢𝘯 𝘵𝘢𝘹𝘱𝘢𝘺𝘦𝘳 𝘥𝘰𝘭𝘭𝘢𝘳𝘴. 𝘛𝘩𝘪𝘴 𝘴𝘪𝘵𝘦 𝘪𝘴 𝘴𝘰𝘤𝘪𝘢𝘭𝘪𝘴𝘮, 𝘱𝘭𝘢𝘪𝘯 𝘢𝘯𝘥 𝘴𝘪𝘮𝘱𝘭𝘦. 𝘐𝘧 𝘴𝘰𝘮𝘦𝘵𝘩𝘪𝘯𝘨 𝘪𝘴 𝘨𝘰𝘰𝘥, 𝘺𝘰𝘶 𝘗𝘈𝘠 𝘧𝘰𝘳 𝘪𝘵! 𝘌𝘷𝘦𝘳𝘺𝘣𝘰𝘥𝘺 𝘬𝘯𝘰𝘸𝘴 𝘵𝘩𝘢𝘵! 𝘉𝘶𝘵 𝘯𝘰, 𝘵𝘩𝘦 𝘨𝘰𝘷𝘦𝘳𝘯𝘮𝘦𝘯𝘵 𝘸𝘢𝘯𝘵𝘴 𝘵𝘰 𝘨𝘪𝘷𝘦 𝘢𝘸𝘢𝘺 𝘔𝘐𝘓𝘓𝘐𝘖𝘕𝘚 𝘴𝘰 𝘴𝘰𝘮𝘦 𝘴𝘪𝘵𝘦 𝘤𝘢𝘯 𝘫𝘶𝘴𝘵 𝘵𝘦𝘢𝘤𝘩 𝘺𝘰𝘶𝘳 𝘬𝘪𝘥𝘴 𝘳𝘢𝘥𝘪𝘤𝘢𝘭 𝘭𝘦𝘧𝘵𝘪𝘴𝘵 𝘚𝘘𝘓 𝘵𝘦𝘤𝘩𝘯𝘰𝘭𝘰𝘨𝘪𝘦𝘴. 𝘈𝘯𝘥 𝘸𝘰𝘳𝘴𝘦 𝘵𝘩𝘦𝘺 𝘶𝘴𝘦 𝘈𝘔𝘌𝘙𝘐𝘊𝘈𝘕 𝘥𝘰𝘭𝘭𝘢𝘳𝘴 𝘰𝘯 𝘍𝘖𝘙𝘌𝘐𝘎𝘕 𝘬𝘦𝘺𝘴. 𝘞𝘦 𝘩𝘢𝘷𝘦 𝘵𝘩𝘦 𝘣𝘦𝘴𝘵 𝘬𝘦𝘺𝘴, 𝘈𝘮𝘦𝘳𝘪𝘤𝘢𝘯 𝘬𝘦𝘺𝘴. 𝘋𝘰𝘯’𝘵 𝘸𝘦 𝘧𝘰𝘭𝘬𝘴? 𝘛𝘩𝘦 𝘉𝘌𝘚𝘛 𝘬𝘦𝘺𝘴 – 𝘣𝘶𝘵 𝘋𝘢𝘵𝘢𝘓𝘦𝘮𝘶𝘳 𝘬𝘦𝘦𝘱𝘴 𝘶𝘴𝘪𝘯𝘨 𝘍𝘖𝘙𝘌𝘐𝘎𝘕 𝘒𝘌𝘠𝘚 𝘪𝘯 𝘥𝘢𝘵𝘢𝘣𝘢𝘴𝘦𝘴. 𝘞𝘩𝘢𝘵 𝘢 𝘮𝘦𝘴𝘴, 𝘌𝘓𝘖𝘕 – 𝘉𝘙𝘐𝘓𝘓𝘐𝘈𝘕𝘛 𝘌𝘓𝘖𝘕 – 𝘸𝘪𝘭𝘭 𝘤𝘭𝘦𝘢𝘯 𝘪𝘵 𝘶𝘱 𝘢𝘯𝘥 𝘴𝘦𝘯𝘥 𝘵𝘩𝘦𝘴𝘦 𝘋𝘢𝘵𝘢 𝘓𝘦𝘮𝘶𝘳𝘴 𝘱𝘢𝘤𝘬𝘪𝘯𝘨 𝘪𝘵 𝘣𝘢𝘤𝘬 𝘵𝘰 𝘵𝘩𝘦𝘪𝘳 𝘤𝘳𝘰𝘰𝘬𝘦𝘥 𝘫𝘶𝘯𝘨𝘭𝘦."

    Please like/share this post to support us in this trying time 🙏

  • 21 Falsehoods Programmers Believe About Names:
    1. People have exactly one full name which they go by.
    2. People have, at this point in time, exactly one canonical full name.
    3. People’s names fit within a certain defined amount of space.
    4. People’s names do not change.
    5. People’s names change, but only at a certain enumerated set of events.
    6. People’s names are written in ASCII.
    7. People’s names are written in any single character set.
    8. People’s names are all mapped in Unicode code points.
    9. People’s names are case sensitive.
    10. People’s names are case insensitive.
    11. People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
    12. People’s names do not contain numbers.
    13. People’s names are not written in ALL CAPS.
    14. People’s names have an order to them. Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
    15. People’s first names and last names are, by necessity, different.
    16. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
    17. People’s names are unique....or surely diverse enough such that no million people share the same name?
    18. My system will never have to deal with names from China. Or Japan. Or Korea. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
    19. People’s names are assigned at birth. OK, maybe not at birth, but at least pretty close to birth. Alright, alright, within a year or so of birth. Five years?
    20. Two different systems containing data about the same person will use the same name for that person.
    21. I can safely assume that this dictionary of bad words contains no people’s names in it.

    Credit to Patrick McKenzie (patio11)'s fantastic blog post "Falsehoods Programmers Believe About Names"!

