数据仓库笔试题-pdd

最新推荐文章于 2024-07-17 19:00:57 发布

浩海紫冰

最新推荐文章于 2024-07-17 19:00:57 发布

阅读量1.6k

点赞数 1

CC 4.0 BY-SA版权

分类专栏： spark sql 基本功大数据

本文链接：https://blog.csdn.net/stay_forever/article/details/109192123

大数据同时被 3 个专栏收录

14 篇文章

订阅专栏

基本功

9 篇文章

订阅专栏

spark sql

2 篇文章

订阅专栏

题目：商品活动表 goods_activity，字段id主键、goods_id 商品id,act_id 活动id,create_time 活动创建时间、status 上线状态
2 上线，3 下线

问题一、每个活动的每个商品的开始时间和结束时间？

问题二、假设数据中有脏数据，同一个活动可能两次上线，也可能两次结束，需要把活动上线重复的状态按照时间取第一条，结束上线状态重复的按时间取最后一条。

-- 第一题：
-- id,goods_id,act_id,create_time
-- asdf345,123,bybt123,2020-05-01,2
-- asdf346,123,bybt123,2020-05-10,3
-- asdf347,123,bybt123,2020-06-15,2
-- asdf348,123,bybt123,2020-06-20,3

-- ->
-- goods_id,act_id,online_dt,offline_dt
-- 123,bybt123,2020-05-01,2020-05-10
-- 123,bybt123,2020-06-15,2020-06-20

select
goods_id,act_id,online_dt,offline_dt
from 
(
    select goods_id,act_id,create_time as online_dt
    ,lead(create_time,1,'9999-12-31')  over(partition by goods_id,act_id order by create_time) as offline_dt
    ,status
    from goods_activity
) a
where status=2

-- 第二题：
-- id,goods_id,act_id,create_time lag lead
-- asdf341,123,bybt123,2020-05-01,2 0 2
-- asdf343,123,bybt123,2020-05-02,2 2 3
-- asdf34q,123,bybt123,2020-05-10,3 2 2
-- asdf346,123,bybt123,2020-06-15,2 3 3
-- asdf348,123,bybt123,2020-06-20,3 2 3
-- asdf350,123,bybt123,2020-06-21,3 3 0

-- ->
-- goods_id,act_id,online_dt,offline_dt
-- 123,bybt123,2020-05-01,2020-05-10
-- 123,bybt123,2020-06-15,2020-06-21

select
  goods_id,act_id,online_dt,offline_dt
from 
(
  select goods_id,act_id,create_time as online_dt
      ,lead(create_time,1,'9999-12-31')  over(partition by goods_id,act_id order by create_time) as offline_dt
      ,status
  from 
  (
    select
      goods_id,act_id,create_time,status,last_status,lag_status
    from 
    (
      select 
        goods_id,act_id,create_time,status,
        lead(status,1,0)  over(partition by goods_id,act_id order by create_time) as last_status,
        -- row_number() over(partition by goods_id,act_id,status order by create_time)  as 
        lag(status,1,0)   over(partition by goods_id,act_id order by create_time) as lag_status
      from (
        select 'asdf341' as id ,123 as goods_id,'bybt123' as act_id, '2020-05-01' as create_time, 2 as status UNiOn all
        select 'asdf343' as id ,123 as goods_id,'bybt123' as act_id, '2020-05-02' as create_time, 2 as status UNiOn all
        select 'asdf34q' as id ,123 as goods_id,'bybt123' as act_id, '2020-05-10' as create_time, 3 as status UNiOn all
        select 'asdf346' as id ,123 as goods_id,'bybt123' as act_id, '2020-06-15' as create_time, 2 as status UNiOn all
        select 'asdf348' as id ,123 as goods_id,'bybt123' as act_id, '2020-06-20' as create_time, 3 as status UNiOn all
        select 'asdf350' as id ,123 as goods_id,'bybt123' as act_id, '2020-06-21' as create_time, 3 as status 
        )goods_activity
    ) a
    where case when status=2 and status-lag_status <>0  then true 
               when status=3 and status-last_status <>0  then true end 
  ) a
) a
where status=2