Pyspark Learning Hub
Pyspark Learning Hub
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
PYSPARK LEARNING HUB : DAY - 9
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
PYSPARK LEARNING HUB : DAY - 9
INPUT
PLAYER_ID DEVICE_ID EVENT_DATE GAMES_PLAYED
1 2 2016-03-01 5
1 2 2016-05-02 6
2 3 2017-06-25 1
3 1 2016-03-02 0
3 4 2018-07-03 5
OUTPUT
OUTPUT
PLAYER_ID DEVICE_ID
1 2
2 3
3 1
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
PYSPARK LEARNING HUB : DAY - 9
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
PYSPARK LEARNING HUB : DAY - 9
rank_df=df.withColumn("rk",rank().over(Window.partitionBy(df["pla
yer_id"]).orderBy(df["event_date"])))
rank_df.show()
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
PYSPARK LEARNING HUB : DAY - 9
rank_df.filter(rank_df["rk"] ==
1).select("player_id","device_id").show()
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR
PYSPARK LEARNING HUB : DAY - 9
WWW.LINKEDIN.COM/IN/AKASHMAHINDRAKAR