Classic SparkSQL/Hive-SQL/MySQL Interview Practice Questions (Part 1)

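This article presents nine classic database interview questions covering SparkSQL, Hive-SQL, and MySQL. Each question comes with its requirements and a solution, making it hands-on practice for big-data and SQL enthusiasts who want to sharpen their skills.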


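Question 1

Requirements:

A table order has the fields date_time, order_id, user_id, and amount.
Sample row: 2020-10-10,1003003981,00000001,1000. Use SQL to compute:
(1) the number of orders, the number of users, and the total transaction amount for each month of 2019;
(2) the number of new customers in October 2020 (users whose first order was placed in October 2020).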

Solution:

(1)
SELECT t1.year_month,
       count(t1.order_id) AS order_cnt,
       count(DISTINCT t1.user_id) AS user_cnt,
       sum(t1.amount) AS total_amount
FROM
  (SELECT order_id,
          user_id,
          amount,
          date_format(date_time,'yyyy-MM') AS year_month
   FROM test_db.test3
   WHERE date_format(date_time,'yyyy') = '2019') t1
GROUP BY t1.year_month;

(2)
-- The inner query keeps one row per user whose earliest order falls in 2020-10;
-- the outer count turns those rows into a single number of new customers.
SELECT count(*) AS new_user_cnt
FROM
  (SELECT user_id
   FROM test_db.test3
   GROUP BY user_id
   HAVING date_format(min(date_time),'yyyy-MM') = '2020-10') t;
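The sketch below checks both queries end to end by running them against SQLite through Python's built-in sqlite3 module. It makes three assumptions: SQLite's strftime stands in for Hive's date_format, the table is literally named order, and the rows are hypothetical. It also runs the same HAVING query without an outer count, to show that this shape returns one row per new user rather than a single total.

```python
import sqlite3

# Hypothetical sample rows (date_time, order_id, user_id, amount); not from the article.
ORDERS = [
    ("2019-01-05", "1001", "00000001", 100),
    ("2019-01-20", "1002", "00000002", 200),
    ("2019-01-25", "1003", "00000001", 50),
    ("2019-02-03", "1004", "00000003", 300),
    ("2020-10-10", "1005", "00000004", 1000),
    ("2020-10-12", "1006", "00000004", 20),
    ("2020-10-15", "1007", "00000001", 80),
]

con = sqlite3.connect(":memory:")
# "order" is a reserved word, so the table name is double-quoted for SQLite.
con.execute('CREATE TABLE "order" (date_time TEXT, order_id TEXT, user_id TEXT, amount INTEGER)')
con.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', ORDERS)

# (1) strftime('%Y-%m', ...) plays the role of date_format(date_time, 'yyyy-MM').
monthly = con.execute("""
    SELECT strftime('%Y-%m', date_time) AS ym,
           count(order_id), count(DISTINCT user_id), sum(amount)
    FROM "order"
    WHERE strftime('%Y', date_time) = '2019'
    GROUP BY ym
    ORDER BY ym
""").fetchall()

# (2) Without an outer count: one row per new user, holding that user's order count.
per_user = con.execute("""
    SELECT count(user_id) FROM "order"
    GROUP BY user_id
    HAVING strftime('%Y-%m', min(date_time)) = '2020-10'
""").fetchall()

# (2) With the outer count: a single number of new customers.
new_users = con.execute("""
    SELECT count(*) FROM (
        SELECT user_id FROM "order"
        GROUP BY user_id
        HAVING strftime('%Y-%m', min(date_time)) = '2020-10'
    ) t
""").fetchone()[0]

print(monthly)
print(per_user)
print(new_users)
```

User 00000004 is the only customer whose first order falls in October 2020. The query without the outer count therefore reports that user's two orders, not the customer total of one.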

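Question 2

Requirements:

The following log records customers visiting shops. The log table is named user_visit, the visitor's user ID is user_id, and the name of the visited shop is shop_name. Use SQL to compute:
(1) the UV (number of distinct visitors) of each shop;
(2) the top 3 visitors of each shop by visit count, outputting the shop name, visitor ID, and visit count.
Data: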
+-------+-----------+
|user_id|  shop_name|
+-------+-----------+
|     u1|beautiful_a|
|     u2|beautiful_b|
|     u1|beautiful_b|
|     u3|beautiful_c|
|     u4|beautiful_b|
|     u1|beautiful_a|
|     u5|beautiful_b|
|     u4|beautiful_b|
|     u6|beautiful_c|
|     u1|beautiful_b|
|     u2|beautiful_a|
|     u5|beautiful_a|
+-------+-----------+

Solution:

(1)
SELECT shop_name,
       count(*) AS uv
FROM
  (SELECT user_id,
          shop_name
   FROM test_db.user_visit
   GROUP BY user_id, shop_name) t
GROUP BY shop_name;

(2)
-- rank is a reserved word in MySQL 8.0+, so the row number is aliased as rn.
SELECT t2.shop_name,
       t2.user_id,
       t2.cnt
FROM
  (SELECT t1.*,
          row_number() over(PARTITION BY t1.shop_name ORDER BY t1.cnt DESC) AS rn
   FROM
     (SELECT user_id,
             shop_name,
             count(*) AS cnt
      FROM test_db.user_visit
      GROUP BY user_id, shop_name) t1
  ) t2
WHERE rn <= 3;
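The sketch below replays both queries on the twelve sample rows above using Python's built-in sqlite3 module. Window functions require SQLite 3.25 or later. The table name user_visit follows the question. Adding user_id as a tiebreaker in the window ORDER BY is an assumption of this sketch, made so the result is deterministic.

```python
import sqlite3

# The twelve sample rows from the question, as (user_id, shop_name).
VISITS = [
    ("u1", "beautiful_a"), ("u2", "beautiful_b"), ("u1", "beautiful_b"),
    ("u3", "beautiful_c"), ("u4", "beautiful_b"), ("u1", "beautiful_a"),
    ("u5", "beautiful_b"), ("u4", "beautiful_b"), ("u6", "beautiful_c"),
    ("u1", "beautiful_b"), ("u2", "beautiful_a"), ("u5", "beautiful_a"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE user_visit (user_id TEXT, shop_name TEXT)")
con.executemany("INSERT INTO user_visit VALUES (?, ?)", VISITS)

# (1) Deduplicate (user, shop) pairs, then count the remaining rows per shop.
uv = con.execute("""
    SELECT shop_name, count(*) AS uv
    FROM (SELECT user_id, shop_name FROM user_visit
          GROUP BY user_id, shop_name) t
    GROUP BY shop_name
    ORDER BY shop_name
""").fetchall()

# (2) Count visits per (user, shop), number them within each shop, keep rn <= 3.
#     user_id is an added tiebreaker (not in the original) for deterministic output.
top3 = con.execute("""
    SELECT shop_name, user_id, cnt
    FROM (SELECT t1.*,
                 row_number() OVER (PARTITION BY shop_name
                                    ORDER BY cnt DESC, user_id) AS rn
          FROM (SELECT user_id, shop_name, count(*) AS cnt
                FROM user_visit GROUP BY user_id, shop_name) t1) t2
    WHERE rn <= 3
    ORDER BY shop_name, rn
""").fetchall()

print(uv)
for row in top3:
    print(row)
```

On this data, u2 and u5 tie for third place in beautiful_b with one visit each. Without the tiebreaker, row_number() may return either of them.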

Question 3

Requirements:

The following user visit data is given:
+-------+----------+-----------+
|user_id|visit_date|visit_count|
+-------+----------+-----------+
|    u01| 2017/1/21|          5|
|    u02| 2017/1/23|          6|
|    u03| 2017/1/22|          8|
|    u04| 2017/1/20|          3|
|    u01| 2017/1/23|          6|
|    u01| 2017/2/21|          8|
|    u02| 2017/1/23|          6|
|    u01| 2017/2/22|          4|
+-------+----------+-----------+

Use SQL to compute each user's monthly and cumulative visit counts, as shown below:
+-------+-----------+---------------------+---------------+
|user_id|visit_month|month_total_visit_cnt|total_visit_cnt|
+-------+-----------+---------------------+---------------+
|    u01|    2017-01|                   11|             11|
|    u01|    2017-02|                   12|             23|
|    u02|    2017-01|                   12|             12|
|    u03|    2017-01|                    8|              8|
|    u04|    2017-01|                    3|              3|
+-------+-----------+---------------------+---------------+

Solution:

SELECT t2.user_id,
       t2.visit_month,
       t2.month_total_visit_cnt,
       sum(t2.month_total_visit_cnt) over (PARTITION BY t2.user_id ORDER BY t2.visit_month) AS total_visit_cnt
FROM
  (SELECT user_id,
          visit_month,
          sum(visit_count) AS month_total_visit_cnt
   FROM
     (SELECT user_id,
             -- normalize '2017/1/21' to '2017-1-21' so it can be formatted as yyyy-MM
             date_format(regexp_replace(visit_date,'/','-'),'yyyy-MM') AS visit_month,
             visit_count
      FROM test_db.test1) t1
   GROUP BY user_id, visit_month) t2
ORDER BY t2.user_id, t2.visit_month;
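The sketch below reproduces the expected table from the sample rows using Python's built-in sqlite3 module. The windowed sum requires SQLite 3.25 or later. SQLite has no date_format, so a small Python function stands in for the regexp_replace plus date_format conversion. That function is a hypothetical helper named to_month, registered with create_function. The table name visits is also illustrative.

```python
import sqlite3

# The eight sample rows from the question, as (user_id, visit_date, visit_count).
VISITS = [
    ("u01", "2017/1/21", 5), ("u02", "2017/1/23", 6), ("u03", "2017/1/22", 8),
    ("u04", "2017/1/20", 3), ("u01", "2017/1/23", 6), ("u01", "2017/2/21", 8),
    ("u02", "2017/1/23", 6), ("u01", "2017/2/22", 4),
]

def to_month(visit_date):
    """Hypothetical stand-in for date_format(regexp_replace(d,'/','-'),'yyyy-MM')."""
    year, month, _day = visit_date.split("/")
    return f"{int(year):04d}-{int(month):02d}"

con = sqlite3.connect(":memory:")
con.create_function("to_month", 1, to_month)
con.execute("CREATE TABLE visits (user_id TEXT, visit_date TEXT, visit_count INTEGER)")
con.executemany("INSERT INTO visits VALUES (?, ?, ?)", VISITS)

# Monthly totals first, then a running sum over months within each user.
rows = con.execute("""
    SELECT user_id, visit_month, month_total_visit_cnt,
           sum(month_total_visit_cnt) OVER (PARTITION BY user_id
                                            ORDER BY visit_month) AS total_visit_cnt
    FROM (SELECT user_id, visit_month, sum(visit_count) AS month_total_visit_cnt
          FROM (SELECT user_id, to_month(visit_date) AS visit_month, visit_count
                FROM visits) t1
          GROUP BY user_id, visit_month) t2
    ORDER BY user_id, visit_month
""").fetchall()

for row in rows:
    print(row)
```

The output matches the expected table in the question row for row, including u02's two identical rows for 2017/1/23 summing to 12.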

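Question 4

Requirements:

Table user(user_id, name, age) stores user information, and table view_record(user_id, movie_name) stores users' movie-viewing records. Group users into age brackets (one bracket per 10 years, with everyone over 70 in a bracket of their own) and sort the brackets by the number of movies viewed.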

Solution:

SELECT t2.age_group,
       sum(t1.cnt) AS view_cnt
FROM
  (SELECT user_id,
          count(*) AS cnt
   FROM test_db.view_record
   GROUP BY user_id) t1
JOIN
  (SELECT user_id,
          CASE WHEN age > 0  AND age <= 10 THEN '0-10'
               WHEN age > 10 AND age <= 20 THEN '10-20'
               WHEN age > 20 AND age <= 30 THEN '20-30'
               WHEN age > 30 AND age <= 40 THEN '30-40'
               WHEN age > 40 AND age <= 50 THEN '40-50'
               WHEN age > 50 AND age <= 60 THEN '50-60'
               WHEN age > 60 AND age <= 70 THEN '60-70'
               WHEN age > 70 THEN '70+'
          END AS age_group   -- NULL or non-positive ages yield NULL instead of '70+'
   FROM test_db.user) t2 ON t1.user_id = t2.user_id
GROUP BY t2.age_group
-- The question asks for brackets ranked by view count, not by bracket label.
ORDER BY view_cnt DESC;
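The sketch below shows the count, bracket, and sort logic in SQLite via Python's sqlite3 module, using hypothetical users and view records. It sorts by total view count, as the question asks, and breaks ties by bracket label. The user table is renamed users for the demo.

```python
import sqlite3

# Hypothetical users (user_id, name, age) and view records (user_id, movie_name).
USERS = [(1, "a", 8), (2, "b", 25), (3, "c", 27), (4, "d", 75), (5, "e", 70)]
VIEWS = [(1, "m1"), (2, "m1"), (2, "m2"), (2, "m3"), (3, "m1"), (3, "m4"),
         (4, "m1"), (4, "m2"), (4, "m3"), (4, "m5"), (5, "m2")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (user_id INTEGER, name TEXT, age INTEGER)")
con.execute("CREATE TABLE view_record (user_id INTEGER, movie_name TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
con.executemany("INSERT INTO view_record VALUES (?, ?)", VIEWS)

# Views per user, joined to each user's bracket, then summed per bracket.
rows = con.execute("""
    SELECT t2.age_group, sum(t1.cnt) AS view_cnt
    FROM (SELECT user_id, count(*) AS cnt
          FROM view_record GROUP BY user_id) t1
    JOIN (SELECT user_id,
                 CASE WHEN age > 0  AND age <= 10 THEN '0-10'
                      WHEN age > 10 AND age <= 20 THEN '10-20'
                      WHEN age > 20 AND age <= 30 THEN '20-30'
                      WHEN age > 30 AND age <= 40 THEN '30-40'
                      WHEN age > 40 AND age <= 50 THEN '40-50'
                      WHEN age > 50 AND age <= 60 THEN '50-60'
                      WHEN age > 60 AND age <= 70 THEN '60-70'
                      WHEN age > 70 THEN '70+'
                 END AS age_group
          FROM users) t2 ON t1.user_id = t2.user_id
    GROUP BY t2.age_group
    ORDER BY view_cnt DESC, t2.age_group
""").fetchall()

print(rows)
```

Age 70 lands in '60-70' because each bracket's upper bound is inclusive, matching the CASE boundaries in the solution.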

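Question 5

Requirements:

Given the log below, use SQL to compute the total number and the average age of all users, and of active users (an active user is one with visit records on two consecutive days).
Columns: date, user, age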
+----------+-------+---+
| date_time|user_id|age|
+----------+-------+---+
|2019-02-12|      2| 19|
|2019-02-11|      1| 23|
|2019-02-11|      3| 39|
|2019-02-11|      1| 23|
|2019-02-11|      3| 39|
|2019-02-13|      1| 23|
|2019-02-15|      2| 19|
|2019-02-11|      2| 19|
|2019-02-11|      1| 23|
|2019-02-16|      2| 19|
+----------+-------+---+

Solution:

SELECT sum(total_user_cnt) total_user_cnt,
       sum(total_user_avg_age) total_user_avg_age,
       sum(two_days_cnt) two_days_cnt,
       sum(avg_age) avg_age
FROM
  (SELECT 0 total_user_cnt,
          0 total_user_avg_age,
          count(*) AS two_days_cnt,
          cast(sum(age) / count(*) AS decimal(5,2)) AS avg_age
   FROM
     (SELECT user_id,
             max(age) age
      FROM
        (SELECT user_id,
                max(age) age
         FROM
           (SELECT user_id,
                   age,
                   date_sub(date_time,rn) flag  -- consecutive visit days yield the same flag
            FROM
              (SELECT date_time,
                      user_id,
                      max(age) age,
                      row_number() over(PARTITION BY user_id ORDER BY date_time) rn
               FROM test_db.test5
               GROUP BY date_time,user_id) t1 ) t2
         GROUP BY user_id, flag
         HAVING count(*) >=2) t3
      GROUP BY user_id) t4
   UNION ALL 
   SELECT count(*) total_user_cnt,
                    cast(sum(age) /count(*) AS decimal(5,2)) total_user_avg_age,
                    0 two_days_cnt,
                    0 avg_age
   FROM
     (SELECT user_id,
             max(age) age
      FROM test_db.test5
      GROUP BY user_id) t5) t6;
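The key step is the date_sub flag. Number one user's distinct visit dates 1, 2, 3, ... in order. Within a consecutive run, the date minus that number of days is the same value for every day. Grouping by that value and keeping groups of size 2 or more therefore finds the active users.

The sketch below runs this on the ten sample rows using Python's sqlite3 module. It substitutes a few SQLite features:
- SQLite's date(date_time, '-N days') stands in for date_sub.
- CTEs replace the nested subqueries and the UNION ALL.
- round(avg(age), 2) approximates the decimal(5,2) cast.

```python
import sqlite3

# The ten sample rows from the question, as (date_time, user_id, age).
LOG = [
    ("2019-02-12", 2, 19), ("2019-02-11", 1, 23), ("2019-02-11", 3, 39),
    ("2019-02-11", 1, 23), ("2019-02-11", 3, 39), ("2019-02-13", 1, 23),
    ("2019-02-15", 2, 19), ("2019-02-11", 2, 19), ("2019-02-11", 1, 23),
    ("2019-02-16", 2, 19),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logs (date_time TEXT, user_id INTEGER, age INTEGER)")
con.executemany("INSERT INTO logs VALUES (?, ?, ?)", LOG)

result = con.execute("""
    WITH days AS (      -- one row per user per visit day
        SELECT user_id, date_time, max(age) AS age
        FROM logs GROUP BY user_id, date_time
    ),
    numbered AS (       -- rn = 1, 2, 3, ... over each user's visit days
        SELECT user_id, date_time, age,
               row_number() OVER (PARTITION BY user_id ORDER BY date_time) AS rn
        FROM days
    ),
    runs AS (           -- date_time minus rn days is constant within a run
        SELECT user_id, max(age) AS age
        FROM numbered
        GROUP BY user_id, date(date_time, '-' || rn || ' days')
        HAVING count(*) >= 2
    ),
    active AS (         -- a user may have several runs; keep one row each
        SELECT user_id, max(age) AS age FROM runs GROUP BY user_id
    ),
    all_users AS (
        SELECT user_id, max(age) AS age FROM logs GROUP BY user_id
    )
    SELECT (SELECT count(*) FROM all_users),
           (SELECT round(avg(age), 2) FROM all_users),
           (SELECT count(*) FROM active),
           (SELECT round(avg(age), 2) FROM active)
""").fetchone()

print(result)
```

On the sample data, only user 2 qualifies as active, with two runs: 2019-02-11/12 and 2019-02-15/16. The result is therefore 3 users averaging 27.0 and 1 active user aged 19.0.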

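Question 6

Requirements:

Use SQL to find, for every user, the amount of their first purchase in October 2020. Table order has the fields:
buyer: user_id, amount: money, purchase time: pay_time (format: 2017-10-01), order ID: order_id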

Solution:

SELECT user_id, pay_time, money, order_id
FROM
  (SELECT user_id, money, pay_time, order_id,
          row_number() over (PARTITION BY user_id ORDER BY pay_time) AS rn
   FROM test_db.`order`   -- order is a reserved word, so the table name is quoted
   WHERE date_format(pay_time,'yyyy-MM') = '2020-10') t
WHERE rn = 1;
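The sketch below demonstrates the filter-then-rank pattern on hypothetical orders. It uses Python's sqlite3 module, with strftime in place of date_format. Because the month filter runs before row_number(), each user's earliest payment within October 2020 is picked, even if that user also bought something earlier.

```python
import sqlite3

# Hypothetical orders as (user_id, money, pay_time, order_id).
ORDERS = [
    ("u1", 50, "2020-09-30", "o1"),
    ("u1", 100, "2020-10-05", "o2"),
    ("u1", 30, "2020-10-20", "o3"),
    ("u2", 70, "2020-10-02", "o4"),
    ("u3", 40, "2020-11-01", "o5"),
]

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "order" (user_id TEXT, money INTEGER, pay_time TEXT, order_id TEXT)')
con.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', ORDERS)

# Filter to October 2020 first, then keep each user's earliest payment in that month.
rows = con.execute("""
    SELECT user_id, pay_time, money, order_id
    FROM (SELECT user_id, money, pay_time, order_id,
                 row_number() OVER (PARTITION BY user_id ORDER BY pay_time) AS rn
          FROM "order"
          WHERE strftime('%Y-%m', pay_time) = '2020-10') t
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(rows)
```

Here u1 is reported with its 2020-10-05 order even though it also had a September order. If the question meant customers whose first-ever purchase fell in October 2020, the ranking would have to run over all orders before the month filter.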

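Question 7

Requirements:

An account table is defined as below. Write a SQL statement that returns the top 3 accounts by money (the gold_coin column) in each district group.
dist_id string  'district group ID',
account string  'account',
gold_coin     int    'gold coins'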

Solution:

SELECT dist_id,
       account,
       gold_coin
FROM
  (SELECT dist_id,
          account,
          gold_coin,
          row_number() over (PARTITION BY dist_id ORDER BY gold_coin DESC) AS rn
   FROM test_db.test9) t
WHERE rn <= 3;
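When several accounts share a gold_coin value, row_number() cuts the top 3 arbitrarily, while rank() keeps everyone tied at the boundary. The sketch below compares the two on hypothetical rows in SQLite via Python's sqlite3 module. The account tiebreaker in the row_number() ordering is an added assumption that makes its output deterministic.

```python
import sqlite3

# Hypothetical rows (dist_id, account, gold_coin); district "1" has a three-way tie at 90.
ACCOUNTS = [
    ("1", "a", 100), ("1", "b", 90), ("1", "c", 90), ("1", "d", 90),
    ("2", "e", 50),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (dist_id TEXT, account TEXT, gold_coin INTEGER)")
con.executemany("INSERT INTO accounts VALUES (?, ?, ?)", ACCOUNTS)

def top3(window_expr):
    """Keep rows whose window value is <= 3 within each dist_id."""
    sql = f"""
        SELECT dist_id, account, gold_coin
        FROM (SELECT dist_id, account, gold_coin, {window_expr} AS rn
              FROM accounts) t
        WHERE rn <= 3
        ORDER BY dist_id, gold_coin DESC, account
    """
    return con.execute(sql).fetchall()

# row_number(): at most 3 rows per district (account breaks the tie).
by_row_number = top3(
    "row_number() OVER (PARTITION BY dist_id ORDER BY gold_coin DESC, account)")
# rank(): b, c and d all get rank 2, so district "1" returns 4 rows.
by_rank = top3("rank() OVER (PARTITION BY dist_id ORDER BY gold_coin DESC)")

print(by_row_number)
print(by_rank)
```

Which function is right depends on whether "top 3" means exactly three accounts or every account ranked third or better.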

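Question 8

Requirements:

The recharge log table credit_log has the following fields:
`dist_id` int  'district group ID',
`account` string  'account',
`money` int   'recharge amount',
`create_time` string  'order time'

Write a SQL statement that finds, in the recharge log for 2020-08-08, the account with the largest recharge amount in each district group. Required output:
district group ID, account, amount, recharge time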

Solution:

WITH temp AS
  (SELECT dist_id,
          account,
          sum(`money`) AS sum_money
   FROM test_db.test8
   WHERE date_format(create_time,'yyyy-MM-dd') = '2020-08-08'
   GROUP BY dist_id, account)
SELECT t1.dist_id,
       t1.account,
       t1.sum_money
FROM
  (SELECT temp.dist_id,
          temp.account,
          temp.sum_money,
          -- rank() (not row_number) keeps every account tied for the top amount
          rank() over (PARTITION BY temp.dist_id ORDER BY temp.sum_money DESC) AS ranks
   FROM temp) t1
WHERE ranks = 1;
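The sketch below shows the aggregate-then-rank approach on hypothetical recharge rows, run in SQLite through Python's sqlite3 module. The CTE is named daily_sum rather than temp, because TEMP is a keyword in SQLite. strftime replaces date_format. District 2 contains a deliberate tie to show that rank() = 1 returns both accounts.

```python
import sqlite3

# Hypothetical recharge rows as (dist_id, account, money, create_time).
CREDITS = [
    (1, "a", 100, "2020-08-08 10:00:00"),
    (1, "a", 50, "2020-08-08 12:00:00"),
    (1, "b", 120, "2020-08-08 11:00:00"),
    (1, "b", 500, "2020-08-09 09:00:00"),  # another day, filtered out
    (2, "c", 80, "2020-08-08 08:00:00"),
    (2, "d", 80, "2020-08-08 09:30:00"),   # ties with c
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE credit_log (dist_id INTEGER, account TEXT, money INTEGER, create_time TEXT)")
con.executemany("INSERT INTO credit_log VALUES (?, ?, ?, ?)", CREDITS)

# Sum each account's recharges on the day, then keep rank 1 within each district.
rows = con.execute("""
    WITH daily_sum AS (
        SELECT dist_id, account, sum(money) AS sum_money
        FROM credit_log
        WHERE strftime('%Y-%m-%d', create_time) = '2020-08-08'
        GROUP BY dist_id, account
    )
    SELECT dist_id, account, sum_money
    FROM (SELECT dist_id, account, sum_money,
                 rank() OVER (PARTITION BY dist_id ORDER BY sum_money DESC) AS ranks
          FROM daily_sum) t1
    WHERE ranks = 1
    ORDER BY dist_id, account
""").fetchall()

print(rows)
```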

Question 9

Requirements:

An online server's access log has the following format (answer with SQL):
Columns: time, interface, IP
+-------------------+--------------------+------------+
|date_time          |interface           |ip          |
+-------------------+--------------------+------------+
|2016-11-09 15:22:05|/request/user/logout| 110.32.5.23|
|2020-09-28 14:23:1 |/api_v1/user/detail | 57.2.1.16  |
|2020-09-28 14:59:40|/api_v2/read/buy    | 172.6.5.166|
+-------------------+--------------------+------------+

Find the top 10 IP addresses by number of requests to the /api_v1/user/detail interface on 2020-09-28 between 14:00 and 15:00.

Solution:

SELECT ip,
       count(*) AS cnt
FROM test_db.test7
WHERE date_format(date_time,'yyyy-MM-dd HH') >= '2020-09-28 14'
  AND date_format(date_time,'yyyy-MM-dd HH') < '2020-09-28 15'
  AND interface = '/api_v1/user/detail'
GROUP BY ip
ORDER BY cnt DESC
LIMIT 10;
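Both bounds compare the same 'yyyy-MM-dd HH' string, so the range filter is equivalent to an equality on '2020-09-28 14'. The sketch below uses hypothetical log rows in SQLite via Python's sqlite3 module, with strftime('%Y-%m-%d %H', ...) standing in for date_format. The ip tiebreaker in ORDER BY is an added assumption for deterministic output.

```python
import sqlite3

# Hypothetical access-log rows as (date_time, interface, ip).
LOG = [
    ("2020-09-28 14:01:00", "/api_v1/user/detail", "10.0.0.1"),
    ("2020-09-28 14:10:00", "/api_v1/user/detail", "10.0.0.1"),
    ("2020-09-28 14:59:59", "/api_v1/user/detail", "10.0.0.1"),
    ("2020-09-28 14:20:00", "/api_v1/user/detail", "10.0.0.2"),
    ("2020-09-28 14:30:00", "/api_v1/user/detail", "10.0.0.2"),
    ("2020-09-28 14:40:00", "/api_v1/user/detail", "10.0.0.3"),
    ("2020-09-28 15:00:00", "/api_v1/user/detail", "10.0.0.3"),  # outside the hour
    ("2020-09-28 13:59:59", "/api_v1/user/detail", "10.0.0.4"),  # outside the hour
    ("2020-09-28 14:50:00", "/api_v2/read/buy", "10.0.0.4"),     # other interface
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE access_log (date_time TEXT, interface TEXT, ip TEXT)")
con.executemany("INSERT INTO access_log VALUES (?, ?, ?)", LOG)

# Keep the 14:00-14:59:59 hour for one interface, count per IP, take the top 10.
rows = con.execute("""
    SELECT ip, count(*) AS cnt
    FROM access_log
    WHERE strftime('%Y-%m-%d %H', date_time) = '2020-09-28 14'
      AND interface = '/api_v1/user/detail'
    GROUP BY ip
    ORDER BY cnt DESC, ip
    LIMIT 10
""").fetchall()

print(rows)
```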