Home Assignment Data Engineer
Assignment description
One of our major customers, TheGoodCorp (TGC), has suffered a massive breach.
Attackers from unknown locations have penetrated some of their Microsoft 365
accounts.
We sampled a few users from TGC and found that some of them were compromised.
(HINT: more than 10 users).
You have been assigned the important task of analyzing the breach and creating
automated logic to identify compromised users for TGC.
Attached is a raw data set in Excel format. Column descriptions are in the appendix of
this document.
Please provide Python code that generates the following data, along with any code used to
determine it (no need for any graphics or fancy formatting):
- A list of compromised users; for each user, specify the attack start and end times.
- A distribution of the attackers' country locations.
- IOCs (indicators of compromise) of the attack.
- Optional: any other information you think is relevant for TGC to know.
*Try researching data you don’t know or understand (look up unfamiliar terms) so that
you go beyond analyzing the data statistically or by model.
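To illustrate the shape of the first two deliverables, here is a minimal sketch that uses only the Python standard library. The field names (`user`, `timestamp`, `country`, `is_suspicious`) are hypothetical placeholders; the real column names are the ones described in the appendix. The sketch also assumes that each event has already been flagged as suspicious by some detection logic, which is the actual task and is not implemented here.

```python
from collections import Counter
from datetime import datetime


def summarize_suspicious_events(events):
    """Return (attack window per user, attacker country counts).

    `events` is an iterable of dicts with the hypothetical keys
    `user`, `timestamp`, `country` and `is_suspicious`.
    Runs in a single pass, so the cost is linear in the number of events.
    """
    windows = {}           # user -> (first suspicious time, last suspicious time)
    countries = Counter()  # country -> number of suspicious events

    for event in events:
        if not event["is_suspicious"]:
            continue
        user, ts = event["user"], event["timestamp"]
        start, end = windows.get(user, (ts, ts))
        windows[user] = (min(start, ts), max(end, ts))
        countries[event["country"]] += 1

    return windows, countries


# Made-up sample rows, for illustration only (not taken from the data set).
sample_events = [
    {"user": "alice", "timestamp": datetime(2021, 1, 1, 10, 0), "country": "US", "is_suspicious": False},
    {"user": "alice", "timestamp": datetime(2021, 1, 2, 3, 0), "country": "RU", "is_suspicious": True},
    {"user": "alice", "timestamp": datetime(2021, 1, 2, 5, 30), "country": "RU", "is_suspicious": True},
    {"user": "bob", "timestamp": datetime(2021, 1, 3, 12, 0), "country": "CN", "is_suspicious": True},
    {"user": "bob", "timestamp": datetime(2021, 1, 1, 9, 0), "country": "IL", "is_suspicious": False},
]

windows, countries = summarize_suspicious_events(sample_events)
for user, (start, end) in sorted(windows.items()):
    print(user, start.isoformat(), end.isoformat())
print(countries.most_common())
```

Counting suspicious events per country is only one possible reading of "distribution"; counting distinct attacker IP addresses or distinct compromised users per country would be equally reasonable.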
Submission Notes:
- Please do not share this task with anyone.
- You have 7 days to submit.
- Include your thought process during the task, document as much as you see fit.
- Make sure that the code is as generic as possible and written in Python.
- Focus on:
  - Correctness of the data.
  - Time complexity of the code.
  - Clean, readable, and well-documented code.
Appendix A