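Question 1
Requirement:
A table order has the following fields: date_time, order_id, user_id, amount.
Sample row: 2020-10-10,1003003981,00000001,1000. Use SQL to compute:
(1) The number of orders, the number of users, and the total transaction amount for each month of 2019.
(2) The number of new customers in October 2020 (customers whose first order was placed in October 2020).
Solution: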
(1)
SELECT t1.year_month,
       count(t1.order_id) AS order_cnt,
       count(DISTINCT t1.user_id) AS user_cnt,
       sum(t1.amount) AS total_amount
FROM
  (SELECT order_id,
          user_id,
          amount,
          date_format(date_time, 'yyyy-MM') AS year_month
   FROM test_db.test3
   WHERE date_format(date_time, 'yyyy') = '2019') t1
GROUP BY t1.year_month;
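To try the monthly aggregation without a Hive cluster, here is a minimal sketch that runs an equivalent query through Python's built-in sqlite3 module. It assumes SQLite stands in for Hive, so strftime('%Y-%m', ...) replaces date_format(date_time, 'yyyy-MM'). Only the first row is the sample from the question. The other rows are hypothetical.

```python
import sqlite3

# First row is the sample from the question; the others are hypothetical.
ROWS = [
    ("2020-10-10", "1003003981", "00000001", 1000),
    ("2019-01-05", "1003003982", "00000001", 1000),
    ("2019-01-20", "1003003983", "00000001", 500),
    ("2019-01-21", "1003003984", "00000002", 300),
    ("2019-02-03", "1003003985", "00000002", 200),
]


def monthly_stats_2019(rows):
    conn = sqlite3.connect(":memory:")
    # "order" is a keyword in SQLite too, hence the double quotes
    conn.execute('CREATE TABLE "order" (date_time TEXT, order_id TEXT, user_id TEXT, amount INTEGER)')
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', rows)
    # strftime('%Y-%m', ...) plays the role of Hive's date_format(date_time, 'yyyy-MM')
    result = conn.execute("""
        SELECT strftime('%Y-%m', date_time) AS year_month,
               count(order_id)              AS order_cnt,
               count(DISTINCT user_id)      AS user_cnt,
               sum(amount)                  AS total_amount
        FROM "order"
        WHERE strftime('%Y', date_time) = '2019'
        GROUP BY year_month
        ORDER BY year_month
    """).fetchall()
    conn.close()
    return result


print(monthly_stats_2019(ROWS))  # [('2019-01', 3, 2, 1800), ('2019-02', 1, 1, 200)]
```

The 2020 sample row is filtered out, and count(DISTINCT user_id) counts a user once per month no matter how many orders they place.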
(2)
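-- The inner query keeps one row per user whose earliest order falls in October 2020;
-- the outer count(*) turns those rows into a single number of new customers.
SELECT count(*) AS new_user_cnt
FROM
  (SELECT user_id
   FROM test_db.test3
   GROUP BY user_id
   HAVING date_format(min(date_time), 'yyyy-MM') = '2020-10') t;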
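A per-user GROUP BY with HAVING yields one row per qualifying user, so an outer count is needed to get a single number. The sketch below checks that logic with Python's sqlite3 module standing in for Hive. All rows are hypothetical.

```python
import sqlite3

# Hypothetical orders: (date_time, order_id, user_id, amount)
ROWS = [
    ("2020-09-15", "o1", "u1", 100),   # u1 already bought in September...
    ("2020-10-03", "o2", "u1", 200),   # ...so u1 is not new in October
    ("2020-10-10", "o3", "u2", 1000),  # u2: first order in October
    ("2020-10-12", "o4", "u2", 50),    # second October order, still one user
    ("2020-10-20", "o5", "u3", 70),    # u3: first order in October
    ("2020-11-01", "o6", "u4", 30),    # u4 starts in November
]


def new_customers_2020_10(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "order" (date_time TEXT, order_id TEXT, user_id TEXT, amount INTEGER)')
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', rows)
    # Inner query: one row per user whose earliest order is in 2020-10.
    # Outer count(*): the number of such users.
    (count,) = conn.execute("""
        SELECT count(*)
        FROM (SELECT user_id
              FROM "order"
              GROUP BY user_id
              HAVING strftime('%Y-%m', min(date_time)) = '2020-10') t
    """).fetchone()
    conn.close()
    return count


print(new_customers_2020_10(ROWS))  # 2
```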
Question 2
Requirement:
The table user_visit logs customers visiting shops. user_id is the visitor's user ID and shop_name is the name of the visited shop.
The data is as follows:
+--------+-----------+
|user_id | shop_name|
+--------+-----------+
| u1|beautiful_a|
| u2|beautiful_b|
| u1|beautiful_b|
| u3|beautiful_c|
| u4|beautiful_b|
| u1|beautiful_a|
| u5|beautiful_b|
| u4|beautiful_b|
| u6|beautiful_c|
| u1|beautiful_b|
| u2|beautiful_a|
| u5|beautiful_a|
+--------+-----------+
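Use SQL to compute:
(1) The UV (number of distinct visitors) of each shop.
(2) The top 3 visitors of each shop by visit count, showing shop name, visitor ID, and visit count.
Solution: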
(1)
SELECT shop_name,
       count(*) AS uv
FROM
  (SELECT user_id,
          shop_name
   FROM test_db.user_visit
   GROUP BY user_id, shop_name) t
GROUP BY shop_name;
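The query first de-duplicates (user, shop) pairs and then counts pairs per shop. The sketch below runs the same query on the user_visit rows listed above, using Python's sqlite3 module as a stand-in for Hive.

```python
import sqlite3

# The user_visit rows listed in Question 2.
VISITS = [
    ("u1", "beautiful_a"), ("u2", "beautiful_b"), ("u1", "beautiful_b"),
    ("u3", "beautiful_c"), ("u4", "beautiful_b"), ("u1", "beautiful_a"),
    ("u5", "beautiful_b"), ("u4", "beautiful_b"), ("u6", "beautiful_c"),
    ("u1", "beautiful_b"), ("u2", "beautiful_a"), ("u5", "beautiful_a"),
]


def shop_uv(visits):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_visit (user_id TEXT, shop_name TEXT)")
    conn.executemany("INSERT INTO user_visit VALUES (?, ?)", visits)
    # De-duplicate (user, shop) pairs first, then count the pairs per shop.
    result = conn.execute("""
        SELECT shop_name, count(*) AS uv
        FROM (SELECT user_id, shop_name
              FROM user_visit
              GROUP BY user_id, shop_name) t
        GROUP BY shop_name
        ORDER BY shop_name
    """).fetchall()
    conn.close()
    return result


print(shop_uv(VISITS))  # [('beautiful_a', 3), ('beautiful_b', 4), ('beautiful_c', 2)]
```

An equivalent single-level form is count(DISTINCT user_id) grouped by shop_name. The two-step form makes the de-duplication explicit.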
(2)
SELECT t2.shop_name,
       t2.user_id,
       t2.cnt
FROM
  (SELECT t1.*,
          -- rn rather than rank, to avoid clashing with the rank() function name
          row_number() over(PARTITION BY t1.shop_name ORDER BY t1.cnt DESC) AS rn
   FROM
     (SELECT user_id,
             shop_name,
             count(*) AS cnt
      FROM test_db.user_visit
      GROUP BY user_id, shop_name) t1
  ) t2
WHERE t2.rn <= 3;
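row_number() breaks ties arbitrarily, so the top 3 can differ between runs when visitors have equal counts. The sketch below runs the same logic in SQLite through Python's sqlite3 module. It requires SQLite 3.25+ for window functions. It also adds user_id as a tie-breaker so the output is deterministic; that tie-breaker is an assumption, not part of the original query.

```python
import sqlite3

# The user_visit rows listed in Question 2.
VISITS = [
    ("u1", "beautiful_a"), ("u2", "beautiful_b"), ("u1", "beautiful_b"),
    ("u3", "beautiful_c"), ("u4", "beautiful_b"), ("u1", "beautiful_a"),
    ("u5", "beautiful_b"), ("u4", "beautiful_b"), ("u6", "beautiful_c"),
    ("u1", "beautiful_b"), ("u2", "beautiful_a"), ("u5", "beautiful_a"),
]


def top3_visitors(visits):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_visit (user_id TEXT, shop_name TEXT)")
    conn.executemany("INSERT INTO user_visit VALUES (?, ?)", visits)
    result = conn.execute("""
        SELECT shop_name, user_id, cnt
        FROM (SELECT t1.*,
                     -- user_id breaks ties so the ranking is reproducible
                     row_number() OVER (PARTITION BY shop_name
                                        ORDER BY cnt DESC, user_id) AS rn
              FROM (SELECT user_id, shop_name, count(*) AS cnt
                    FROM user_visit
                    GROUP BY user_id, shop_name) t1) t2
        WHERE rn <= 3
        ORDER BY shop_name, rn
    """).fetchall()
    conn.close()
    return result


for row in top3_visitors(VISITS):
    print(row)
```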
Question 3
Requirement:
The following user visit data is given:
+-------+----------+-----------+
|user_id|visit_date|visit_count|
+-------+----------+-----------+
| u01| 2017/1/21| 5|
| u02| 2017/1/23| 6|
| u03| 2017/1/22| 8|
| u04| 2017/1/20| 3|
| u01| 2017/1/23| 6|
| u01| 2017/2/21| 8|
| u02| 2017/1/23| 6|
| u01| 2017/2/22| 4|
+-------+----------+-----------+
Use SQL to compute each user's monthly and cumulative visit counts, as shown in the table below:
+-------+-----------+---------------------+---------------+
|user_id|visit_month|month_total_visit_cnt|total_visit_cnt|
+-------+-----------+---------------------+---------------+
|    u01|    2017-01|                   11|             11|
|    u01|    2017-02|                   12|             23|
|    u02|    2017-01|                   12|             12|
|    u03|    2017-01|                    8|              8|
|    u04|    2017-01|                    3|              3|
+-------+-----------+---------------------+---------------+
Solution:
SELECT t2.user_id,
       t2.visit_month,
       t2.month_total_visit_cnt,
       sum(t2.month_total_visit_cnt) over (PARTITION BY t2.user_id ORDER BY t2.visit_month) AS total_visit_cnt
FROM
  (SELECT t1.user_id,
          t1.visit_month,
          sum(t1.visit_count) AS month_total_visit_cnt
   FROM
     -- visit_date looks like 2017/1/21, so turn the slashes into dashes before date_format
     (SELECT user_id,
             date_format(regexp_replace(visit_date, '/', '-'), 'yyyy-MM') AS visit_month,
             visit_count
      FROM test_db.test1) t1
   GROUP BY t1.user_id, t1.visit_month) t2
ORDER BY t2.user_id, t2.visit_month;
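The query has two steps: aggregate per user and month, then take a running sum with a window ordered by month. Here is a minimal sketch of the same two steps in SQLite via Python's sqlite3 module, run on the data above. SQLite has no date_format, so a small Python function registered with create_function converts '2017/1/21' to '2017-01'. The to_month helper and the visit table name are hypothetical stand-ins.

```python
import sqlite3
from datetime import datetime

# The visit rows listed in Question 3.
VISITS = [
    ("u01", "2017/1/21", 5), ("u02", "2017/1/23", 6), ("u03", "2017/1/22", 8),
    ("u04", "2017/1/20", 3), ("u01", "2017/1/23", 6), ("u01", "2017/2/21", 8),
    ("u02", "2017/1/23", 6), ("u01", "2017/2/22", 4),
]


def to_month(visit_date):
    # '2017/1/21' -> '2017-01'; strptime accepts the unpadded month and day
    return datetime.strptime(visit_date, "%Y/%m/%d").strftime("%Y-%m")


def cumulative_visits(visits):
    conn = sqlite3.connect(":memory:")
    # Hypothetical helper standing in for date_format(regexp_replace(...)) in Hive
    conn.create_function("to_month", 1, to_month)
    conn.execute("CREATE TABLE visit (user_id TEXT, visit_date TEXT, visit_count INTEGER)")
    conn.executemany("INSERT INTO visit VALUES (?, ?, ?)", visits)
    result = conn.execute("""
        SELECT user_id,
               visit_month,
               month_total_visit_cnt,
               sum(month_total_visit_cnt) OVER (PARTITION BY user_id
                                                ORDER BY visit_month) AS total_visit_cnt
        FROM (SELECT user_id,
                     to_month(visit_date) AS visit_month,
                     sum(visit_count) AS month_total_visit_cnt
              FROM visit
              GROUP BY user_id, visit_month) t
        ORDER BY user_id, visit_month
    """).fetchall()
    conn.close()
    return result


for row in cumulative_visits(VISITS):
    print(row)
```

The rows printed should match the expected-output table in the question.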
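Question 4
Requirement:
Table user(user_id, name, age) stores user information and table view_record(user_id, movie_name) stores users' movie-viewing records. Group users into age brackets (one bracket per 10 years, with everyone over 70 in a single bracket) and sort the brackets by the number of movies watched.
Solution: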
SELECT t2.age_group,
       sum(t1.cnt) AS view_cnt
FROM
  (SELECT user_id,
          count(*) AS cnt
   FROM test_db.view_record
   GROUP BY user_id) t1
JOIN
  (SELECT user_id,
          CASE WHEN age > 0 AND age <= 10 THEN '0-10'
               WHEN age > 10 AND age <= 20 THEN '10-20'
               WHEN age > 20 AND age <= 30 THEN '20-30'
               WHEN age > 30 AND age <= 40 THEN '30-40'
               WHEN age > 40 AND age <= 50 THEN '40-50'
               WHEN age > 50 AND age <= 60 THEN '50-60'
               WHEN age > 60 AND age <= 70 THEN '60-70'
               WHEN age > 70 THEN '70+'
          END AS age_group
   -- user is a reserved word in Hive, so quote it
   FROM test_db.`user`) t2 ON t1.user_id = t2.user_id
GROUP BY t2.age_group
-- sort by the number of views, as required, not by the bracket label
ORDER BY view_cnt DESC;
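The requirement asks for brackets ranked by viewing count, so the final sort key is the summed count rather than the bracket label. The sketch below checks the join, CASE bucketing, and sort in SQLite through Python's sqlite3 module. The users and viewing records are hypothetical. The bracket label is added as a secondary sort key only so that ties print in a stable order.

```python
import sqlite3

# Hypothetical data: (user_id, name, age) and one row per movie viewing.
USERS = [(1, "a", 8), (2, "b", 25), (3, "c", 27), (4, "d", 75)]
VIEWS = [(1, "m1"), (2, "m1"), (2, "m2"), (3, "m1"), (3, "m2"), (3, "m3"), (4, "m4")]


def views_by_age_group(users, views):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "user" (user_id INTEGER, name TEXT, age INTEGER)')
    conn.execute("CREATE TABLE view_record (user_id INTEGER, movie_name TEXT)")
    conn.executemany('INSERT INTO "user" VALUES (?, ?, ?)', users)
    conn.executemany("INSERT INTO view_record VALUES (?, ?)", views)
    result = conn.execute("""
        SELECT t2.age_group, sum(t1.cnt) AS view_cnt
        FROM (SELECT user_id, count(*) AS cnt
              FROM view_record
              GROUP BY user_id) t1
        JOIN (SELECT user_id,
                     CASE WHEN age > 0  AND age <= 10 THEN '0-10'
                          WHEN age > 10 AND age <= 20 THEN '10-20'
                          WHEN age > 20 AND age <= 30 THEN '20-30'
                          WHEN age > 30 AND age <= 40 THEN '30-40'
                          WHEN age > 40 AND age <= 50 THEN '40-50'
                          WHEN age > 50 AND age <= 60 THEN '50-60'
                          WHEN age > 60 AND age <= 70 THEN '60-70'
                          WHEN age > 70 THEN '70+'
                     END AS age_group
              FROM "user") t2 ON t1.user_id = t2.user_id
        GROUP BY t2.age_group
        -- most-watched bracket first; the label breaks ties
        ORDER BY view_cnt DESC, t2.age_group
    """).fetchall()
    conn.close()
    return result


print(views_by_age_group(USERS, VIEWS))  # [('20-30', 5), ('0-10', 1), ('70+', 1)]
```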
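Question 5
Requirement:
Given the log below, use SQL to find the number and the average age of all users and of active users (an active user is one who has visit records on two consecutive days).
(date, user ID, age)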
+----------+-------+---+
| date_time|user_id|age|
+----------+-------+---+
|2019-02-12| 2| 19|
|2019-02-11| 1| 23|
|2019-02-11| 3| 39|
|2019-02-11| 1| 23|
|2019-02-11| 3| 39|
|2019-02-13| 1| 23|
|2019-02-15| 2| 19|
|2019-02-11| 2| 19|
|2019-02-11| 1| 23|
|2019-02-16| 2| 19|
+----------+-------+---+
Solution:
SELECT sum(total_user_cnt) AS total_user_cnt,
       sum(total_user_avg_age) AS total_user_avg_age,
       sum(two_days_cnt) AS two_days_cnt,
       sum(avg_age) AS avg_age
FROM
  (SELECT 0 AS total_user_cnt,
          0 AS total_user_avg_age,
          count(*) AS two_days_cnt,
          cast(sum(age) / count(*) AS decimal(5,2)) AS avg_age
   FROM
     (SELECT user_id,
             max(age) AS age
      FROM
        (SELECT user_id,
                max(age) AS age
         FROM
           (SELECT user_id,
                   age,
                   -- consecutive dates minus their row number give the same flag
                   date_sub(date_time, rn) AS flag
            FROM
              (SELECT date_time,
                      user_id,
                      max(age) AS age,
                      row_number() over(PARTITION BY user_id ORDER BY date_time) AS rn
               FROM test_db.test5
               GROUP BY date_time, user_id) t1) t2
         GROUP BY user_id, flag
         HAVING count(*) >= 2) t3
      GROUP BY user_id) t4
   UNION ALL
   SELECT count(*) AS total_user_cnt,
          cast(sum(age) / count(*) AS decimal(5,2)) AS total_user_avg_age,
          0 AS two_days_cnt,
          0 AS avg_age
   FROM
     (SELECT user_id,
             max(age) AS age
      FROM test_db.test5
      GROUP BY user_id) t5) t6;
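The core trick is that, for one user's distinct visit dates sorted ascending, date minus row number is constant within a run of consecutive days. Grouping by that value therefore finds the runs. The sketch below checks this on the log above using SQLite via Python's sqlite3 module. SQLite's date(d, '-N days') stands in for Hive's date_sub. For readability it returns the two results as separate tuples instead of merging them with UNION ALL.

```python
import sqlite3

# The log rows listed in Question 5: (date_time, user_id, age)
LOG = [
    ("2019-02-12", 2, 19), ("2019-02-11", 1, 23), ("2019-02-11", 3, 39),
    ("2019-02-11", 1, 23), ("2019-02-11", 3, 39), ("2019-02-13", 1, 23),
    ("2019-02-15", 2, 19), ("2019-02-11", 2, 19), ("2019-02-11", 1, 23),
    ("2019-02-16", 2, 19),
]


def user_stats(log):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (date_time TEXT, user_id INTEGER, age INTEGER)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", log)
    all_users = conn.execute("""
        SELECT count(*), round(avg(age), 2)
        FROM (SELECT user_id, max(age) AS age FROM log GROUP BY user_id) t
    """).fetchone()
    active_users = conn.execute("""
        WITH days AS (            -- one row per user per visit day
            SELECT date_time, user_id, max(age) AS age
            FROM log
            GROUP BY date_time, user_id
        ), ranked AS (
            SELECT user_id, age, date_time,
                   row_number() OVER (PARTITION BY user_id ORDER BY date_time) AS rn
            FROM days
        ), runs AS (              -- date minus rn is constant within a run
            SELECT user_id, max(age) AS age
            FROM ranked
            GROUP BY user_id, date(date_time, '-' || rn || ' days')
            HAVING count(*) >= 2
        )
        SELECT count(*), round(avg(age), 2)
        FROM (SELECT user_id, max(age) AS age FROM runs GROUP BY user_id) t
    """).fetchone()
    conn.close()
    return all_users, active_users


print(user_stats(LOG))  # ((3, 27.0), (1, 19.0))
```

Only user 2 (2019-02-11/12 and 2019-02-15/16) qualifies as active. User 1's visits on 02-11 and 02-13 are not consecutive.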
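Question 6
Requirement:
Write SQL to find, for every user, the amount of their first purchase in October 2020. The order table has these fields:
buyer: user_id; amount: money; purchase time: pay_time (format: 2017-10-01); order ID: order_id
Solution: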
SELECT user_id, pay_time, money, order_id
FROM
  (SELECT user_id, money, pay_time, order_id,
          row_number() over (PARTITION BY user_id ORDER BY pay_time) AS rn
   -- order is a reserved word in Hive, so quote it
   FROM test_db.`order`
   WHERE date_format(pay_time, 'yyyy-MM') = '2020-10') t
WHERE rn = 1;
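The month filter runs before the window, so row_number() = 1 picks each user's earliest purchase within October rather than their earliest purchase overall. A minimal SQLite check via Python's sqlite3 module follows. The rows are hypothetical, and strftime replaces Hive's date_format.

```python
import sqlite3

# Hypothetical rows: (user_id, money, pay_time, order_id)
ORDERS = [
    (1, 100, "2020-09-30", "o1"),  # September: filtered out
    (1, 200, "2020-10-05", "o2"),  # user 1's first October purchase
    (1, 300, "2020-10-20", "o3"),
    (2, 50, "2020-10-02", "o4"),   # user 2's only October purchase
]


def first_october_purchase(orders):
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE "order" (user_id INTEGER, money INTEGER, pay_time TEXT, order_id TEXT)')
    conn.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', orders)
    result = conn.execute("""
        SELECT user_id, pay_time, money, order_id
        FROM (SELECT user_id, money, pay_time, order_id,
                     row_number() OVER (PARTITION BY user_id ORDER BY pay_time) AS rn
              FROM "order"
              WHERE strftime('%Y-%m', pay_time) = '2020-10') t
        WHERE rn = 1
        ORDER BY user_id
    """).fetchall()
    conn.close()
    return result


print(first_october_purchase(ORDERS))
```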
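Question 7
Requirement:
Given the account table below, write SQL to find the top 3 accounts by gold coins (gold_coin) in each zone.
dist_id string 'zone ID',
account string 'account',
gold_coin int 'gold coins'
Solution: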
SELECT dist_id,
       account,
       gold_coin
FROM
  (SELECT dist_id,
          account,
          gold_coin,
          row_number() over (PARTITION BY dist_id ORDER BY gold_coin DESC) AS rn
   FROM test_db.test9) t
WHERE rn <= 3;
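This is the standard per-group top-N pattern: number the rows within each partition, then keep the first N. A quick SQLite check via Python's sqlite3 module is shown below. The zones, accounts, and amounts are hypothetical.

```python
import sqlite3

# Hypothetical rows: (dist_id, account, gold_coin)
ACCOUNTS = [
    ("d1", "a1", 100), ("d1", "a2", 300), ("d1", "a3", 200), ("d1", "a4", 50),
    ("d2", "b1", 10), ("d2", "b2", 20),
]


def top3_per_zone(accounts):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (dist_id TEXT, account TEXT, gold_coin INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", accounts)
    result = conn.execute("""
        SELECT dist_id, account, gold_coin
        FROM (SELECT dist_id, account, gold_coin,
                     row_number() OVER (PARTITION BY dist_id
                                        ORDER BY gold_coin DESC) AS rn
              FROM accounts) t
        WHERE rn <= 3
        ORDER BY dist_id, rn
    """).fetchall()
    conn.close()
    return result


print(top3_per_zone(ACCOUNTS))
```

A zone with fewer than three accounts (d2 here) simply returns all of them. If tied accounts should all be kept, rank() or dense_rank() can replace row_number().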
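Question 8
Requirement:
The recharge log table credit_log has the following fields:
`dist_id` int 'zone ID',
`account` string 'account',
`money` int 'recharge amount',
`create_time` string 'order time'
Write SQL to find, in the recharge log for August 8, 2020, the account with the largest recharge amount in each zone. Required output:
zone ID, account, amount, recharge time
Solution: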
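WITH temp AS
  (SELECT dist_id,
          account,
          sum(`money`) AS sum_money,
          -- amounts are summed per account, so report the account's latest recharge time that day
          max(create_time) AS last_create_time
   FROM test_db.test8
   WHERE date_format(create_time, 'yyyy-MM-dd') = '2020-08-08'
   GROUP BY dist_id, account)
SELECT t1.dist_id,
       t1.account,
       t1.sum_money,
       t1.last_create_time
FROM
  (SELECT temp.dist_id,
          temp.account,
          temp.sum_money,
          temp.last_create_time,
          -- rank() keeps every account tied for first place
          rank() over(PARTITION BY temp.dist_id ORDER BY temp.sum_money DESC) AS ranks
   FROM temp) t1
WHERE t1.ranks = 1;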
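Using rank() rather than row_number() means that accounts tied for the top total in a zone are all returned. The sketch below shows that behaviour in SQLite via Python's sqlite3 module. It covers only the sum-and-rank core (zone, account, total). The rows are hypothetical, and date(create_time) replaces Hive's date_format(create_time, 'yyyy-MM-dd').

```python
import sqlite3

# Hypothetical rows: (dist_id, account, money, create_time)
CREDIT_LOG = [
    (1, "a", 100, "2020-08-08 10:00:00"),
    (1, "a", 50, "2020-08-08 12:00:00"),
    (1, "b", 150, "2020-08-08 11:00:00"),
    (1, "c", 120, "2020-08-08 09:00:00"),
    (2, "d", 80, "2020-08-08 13:00:00"),
    (2, "d", 500, "2020-08-09 08:00:00"),  # next day: filtered out
    (2, "e", 70, "2020-08-08 14:00:00"),
]


def top_account_per_zone(log):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE credit_log (dist_id INTEGER, account TEXT, money INTEGER, create_time TEXT)")
    conn.executemany("INSERT INTO credit_log VALUES (?, ?, ?, ?)", log)
    result = conn.execute("""
        WITH temp AS (
            SELECT dist_id, account, sum(money) AS sum_money
            FROM credit_log
            WHERE date(create_time) = '2020-08-08'
            GROUP BY dist_id, account
        )
        SELECT dist_id, account, sum_money
        FROM (SELECT dist_id, account, sum_money,
                     rank() OVER (PARTITION BY dist_id ORDER BY sum_money DESC) AS ranks
              FROM temp) t1
        WHERE ranks = 1
        ORDER BY dist_id, account
    """).fetchall()
    conn.close()
    return result


print(top_account_per_zone(CREDIT_LOG))  # [(1, 'a', 150), (1, 'b', 150), (2, 'd', 80)]
```

In zone 1, accounts a (100 + 50) and b (150) tie, so both are returned.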
Question 9
Requirement:
An online server access log has the format below (answer in SQL):
(time, interface, IP)
+-------------------+--------------------+------------+
|date_time          |interface           |ip          |
+-------------------+--------------------+------------+
|2016-11-09 15:22:05|/request/user/logout| 110.32.5.23|
|2020-09-28 14:23:1 |/api_v1/user/detail | 57.2.1.16  |
|2020-09-28 14:59:40|/api_v2/read/buy    | 172.6.5.166|
+-------------------+--------------------+------------+
Find the top 10 IP addresses that accessed the /api_v1/user/detail interface on September 28, 2020 between 14:00 and 15:00.
Solution:
SELECT ip,
       count(*) AS cnt
FROM test_db.test7
-- the hour bucket '2020-09-28 14' covers 14:00:00 up to (not including) 15:00:00
WHERE date_format(date_time, 'yyyy-MM-dd HH') = '2020-09-28 14'
  AND interface = '/api_v1/user/detail'
GROUP BY ip
ORDER BY cnt DESC
LIMIT 10;
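Truncating the timestamp to the hour selects exactly the 14:00 to 14:59:59 window, so 15:00:00 falls outside it. Here is a minimal SQLite check via Python's sqlite3 module, with strftime('%Y-%m-%d %H', ...) in place of Hive's date_format. The access_log table name and all rows are hypothetical. The ip tie-breaker in ORDER BY is an addition so that the output order is stable.

```python
import sqlite3

# Hypothetical rows: (date_time, interface, ip)
ACCESS_LOG = [
    ("2020-09-28 14:05:00", "/api_v1/user/detail", "10.0.0.1"),
    ("2020-09-28 14:30:00", "/api_v1/user/detail", "10.0.0.1"),
    ("2020-09-28 14:59:59", "/api_v1/user/detail", "10.0.0.1"),
    ("2020-09-28 14:10:00", "/api_v1/user/detail", "10.0.0.2"),
    ("2020-09-28 15:00:00", "/api_v1/user/detail", "10.0.0.2"),  # outside the window
    ("2020-09-28 13:59:59", "/api_v1/user/detail", "10.0.0.3"),  # before the window
    ("2020-09-28 14:20:00", "/api_v2/read/buy", "10.0.0.3"),     # other interface
]


def top_ips(log):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access_log (date_time TEXT, interface TEXT, ip TEXT)")
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", log)
    result = conn.execute("""
        SELECT ip, count(*) AS cnt
        FROM access_log
        WHERE strftime('%Y-%m-%d %H', date_time) = '2020-09-28 14'
          AND interface = '/api_v1/user/detail'
        GROUP BY ip
        ORDER BY cnt DESC, ip
        LIMIT 10
    """).fetchall()
    conn.close()
    return result


print(top_ips(ACCESS_LOG))  # [('10.0.0.1', 3), ('10.0.0.2', 1)]
```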