An end-to-end data engineering project built with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP, and much more!
Event data is generated with Eventsim, which simulates a fake music streaming service (think Spotify), and a data pipeline consumes this data in real time. Each event resembles a user action such as listening to a song, navigating the website, or authenticating. The events are processed in real time and written to a data lake periodically (every two minutes in this case). An hourly batch job then consumes this data, applies transformations, and creates the tables our dashboard needs to generate visualizations. From there we analyze metrics such as popular songs, active users, and user demographics.
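To make the streaming leg concrete, here is a minimal PySpark Structured Streaming sketch that consumes events from Kafka and flushes them to the GCS data lake on a two-minute trigger. The broker address, topic name, bucket path, and trimmed-down schema are illustrative assumptions, not the project's actual values.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import LongType, StringType, StructField, StructType

# Hypothetical values -- the real brokers, topic, and bucket will differ.
KAFKA_BROKERS = "localhost:9092"
TOPIC = "listen_events"
GCS_PATH = "gs://streamify-data-lake/listen_events"

# Needs the Kafka source package (e.g. spark-sql-kafka-0-10) and the
# GCS connector on the classpath when submitting the job.
spark = SparkSession.builder.appName("eventsim-stream-to-gcs").getOrCreate()

# A trimmed-down schema for a "listen" event; Eventsim emits more fields.
schema = StructType([
    StructField("ts", LongType()),
    StructField("userId", StringType()),
    StructField("song", StringType()),
    StructField("artist", StringType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", KAFKA_BROKERS)
    .option("subscribe", TOPIC)
    .load()
    # Kafka delivers raw bytes; parse the JSON payload into columns.
    .select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
)

# Micro-batch every two minutes, appending Parquet files to the data lake.
query = (
    events.writeStream
    .format("parquet")
    .option("path", GCS_PATH)
    .option("checkpointLocation", GCS_PATH + "/_checkpoints")
    .trigger(processingTime="2 minutes")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```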
Eventsim is a program written in Scala that generates event data simulating page requests for a fake music website. The results look like real usage data, but are entirely fake. The Docker image is borrowed from viirya's fork, as the original project has gone unmaintained for a few years now.
Eventsim uses song data from the Million Song Dataset to generate events. I have used a subset of 10,000 songs.
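For a sense of what Eventsim produces, a single "listen" event looks roughly like the record below. Field names follow typical Eventsim output, but the exact fields vary between forks, so treat this as an illustrative shape rather than the definitive schema.

```python
# Illustrative shape of one Eventsim "page request" event (fields vary by fork).
listen_event = {
    "ts": 1640995200000,        # event timestamp in epoch milliseconds
    "userId": "1042",
    "sessionId": 512,
    "page": "NextSong",         # other pages include "Home", "Login", "Logout"
    "auth": "Logged In",
    "level": "paid",            # free vs. paid tier
    "artist": "Daft Punk",
    "song": "Harder, Better, Faster, Stronger",
    "duration": 224.0,
    "city": "San Francisco",
    "state": "CA",
    "userAgent": "Mozilla/5.0 ...",
}
```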
- Cloud - Google Cloud Platform
- Infrastructure as Code - Terraform
- Containerization - Docker, Docker Compose
- Stream Processing - Kafka, Spark Streaming
- Orchestration - Airflow (a minimal DAG is sketched after this list)
- Data Lake - Google Cloud Storage
- Languages - Python, SQL
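To tie the batch side together, below is a minimal Airflow DAG sketch for the hourly job described above. The operator choices, script paths, and commands are illustrative assumptions, not the project's actual DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal hourly DAG; the real project's tasks and commands will differ.
with DAG(
    dag_id="hourly_batch_transform",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:

    # Submit the Spark batch job that reads the raw events from the data
    # lake and writes the transformed tables. The script path is a placeholder.
    transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/jobs/transform_events.py",
    )

    # Run the dbt models that build the dashboard tables.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt && dbt run",
    )

    transform >> dbt_run
```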
