From the course: Data Pipeline Automation with GitHub Actions Using R and Python
Data pipeline maintenance
- [Instructor] Congratulations. We now have a deployed data pipeline running on GitHub Actions. In this chapter, we will focus on maintaining the data pipeline. Let's start by discussing when and why you need to maintain it. Typically, software upgrades, new features, and data integrity issues will force you to make changes to the code or the structure of the data pipeline. Software upgrades and new features typically trigger changes in the environment settings. Generally, it is recommended to have a clear deployment strategy for new features or changes in the environment. A classic setting is to have three environments: dev, stage, and prod. Dev is where you first roll out and test new software updates before pushing them to stage and prod, and stage is where you test a new feature before pushing the changes to the prod environment. This will ensure that when you update your Docker image or change a feature in the data pipeline, the…
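To make the dev, stage, and prod promotion order concrete, here is a minimal Python sketch. It is not taken from the course: the names `ENTRY_POINT`, `promotion_path`, and `deployed_to` are hypothetical, and the checks are plain booleans standing in for whatever tests you actually run in each environment. The sketch encodes two rules from the discussion above: software updates enter at dev, new features enter at stage, and a change stops being promoted as soon as an environment's checks fail, so it never reaches prod untested.

```python
from typing import Dict, List

# The three environments, ordered from first rollout to production.
ENVIRONMENTS: List[str] = ["dev", "stage", "prod"]

# Hypothetical mapping: where each kind of change enters the pipeline.
# Software updates (e.g., a new Docker image) start in dev;
# new pipeline features start in stage.
ENTRY_POINT: Dict[str, str] = {
    "software_update": "dev",
    "new_feature": "stage",
}


def promotion_path(change_type: str) -> List[str]:
    """Return the ordered environments a change moves through, ending in prod."""
    if change_type not in ENTRY_POINT:
        raise ValueError(f"unknown change type: {change_type!r}")
    start = ENVIRONMENTS.index(ENTRY_POINT[change_type])
    return ENVIRONMENTS[start:]


def deployed_to(change_type: str, checks_passed: Dict[str, bool]) -> List[str]:
    """Deploy environment by environment, stopping at the first failed check.

    `checks_passed` is a hypothetical stand-in for real test results;
    a missing entry is treated as a failure.
    """
    reached: List[str] = []
    for env in promotion_path(change_type):
        reached.append(env)
        if not checks_passed.get(env, False):
            break  # do not promote further, so prod stays untouched
    return reached


if __name__ == "__main__":
    print(promotion_path("software_update"))  # ['dev', 'stage', 'prod']
    print(promotion_path("new_feature"))      # ['stage', 'prod']
    # A software update that fails its stage checks never reaches prod.
    print(deployed_to("software_update", {"dev": True, "stage": False}))  # ['dev', 'stage']
```

In a real setup, each environment would more likely map to a separate workflow, branch, or GitHub Actions environment; the sketch only models the ordering and the gating rule.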