GCP Data Engineer Curriculum
GCP Interfaces
o Console
• Navigating the GCP Console
• Configuring the GCP Console for Efficiency
• Using the GCP Console for Service Management
o Shell
• Introduction to Cloud Shell
• Command-line Interface (CLI) Basics
• Cloud Shell Commands for Service Deployment and Management
o SDK
• Overview of GCP Software Development Kits (SDKs)
• Installing and Configuring SDKs
• Writing and Executing GCP SDK Commands
GCP Locations
o Regions
• Understanding GCP Regions
• Selecting Regions for Service Deployment
• Impact of Region on Service Performance
o Zones
• Exploring GCP Zones
• Distributing Resources Across Zones
• High Availability and Disaster Recovery Considerations
o Importance
• Significance of Choosing the Right Location
• Global vs. Regional Resources
• Factors Influencing Location Decisions
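As a small illustration of the Regions and Zones topics above, the sketch below derives a region from a zone name (zone names take the form region plus a letter suffix, such as us-central1-a) and spreads resources across zones round-robin. The helper names `region_of` and `spread_across_zones` and the resource names are hypothetical, chosen only for this example; real placement decisions also weigh latency, cost, and service availability.

```python
from itertools import cycle


def region_of(zone):
    """Return the region part of a zone name, e.g. 'us-central1-a' -> 'us-central1'."""
    return zone.rsplit("-", 1)[0]


def spread_across_zones(resources, zones):
    """Assign resources to zones round-robin so no single zone holds them all."""
    placement = {zone: [] for zone in zones}
    for resource, zone in zip(resources, cycle(zones)):
        placement[zone].append(resource)
    return placement


if __name__ == "__main__":
    zones = ["us-central1-a", "us-central1-b", "us-central1-c"]
    # Hypothetical VM names; losing one zone would take out at most two of the four.
    print(spread_across_zones(["vm-1", "vm-2", "vm-3", "vm-4"], zones))
    print(region_of(zones[0]))
```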
Cloud SQL
o Introduction to Cloud SQL
o Creating and Managing Cloud SQL Instances
o Configuring database settings, users, and access controls.
o Connecting to Cloud SQL instances using Cloud SQL Studio, Shell, and Workbenches
o Importing and exporting data in Cloud SQL.
o Backups and High Availability
o Integration with Other GCP Services
o Managing database user roles and permissions.
o Introduction to Database Migration Service (DMS)
o End-to-End Database Migration Project
• Offline: Export and Import method
• Online: DMS method
Databricks on GCP
o What is the Databricks Lakehouse Platform
o Databricks architecture and components
o Setting up and Administering a Databricks workspace
o Managing data with Delta Lake
o Databricks Unity Catalog
o Notebooks and clusters
o ELT with Spark SQL and Python
o Optimizing performance within Databricks
o Incremental Data Processing
o Delta Live Tables
o Case study: creating end-to-end workflows
Cloud Pub/Sub
o Introduction to Pub/Sub
o Understanding the role of Pub/Sub in event-driven architectures.
o Key Pub/Sub concepts: topics, subscriptions, messages, and acknowledgments.
o Creating and Managing Topics and Subscriptions
• Using the GCP Console to create Pub/Sub topics and subscriptions.
• Configuring message retention policies and acknowledgment settings.
o Publishing and Consuming Messages
• Writing and deploying code to publish messages to a topic.
• Implementing subscribers to consume and process messages from subscriptions.
o Integration with Other GCP Services
• Connecting Pub/Sub with Cloud Functions for serverless event-driven computing.
• Integrating Pub/Sub with Dataflow for real-time stream processing.
o Streaming use case using Dataflow
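To make the key Pub/Sub concepts in this module concrete (topics, subscriptions, messages, and acknowledgments), here is a minimal in-memory sketch. It is a toy model, not the Pub/Sub client library: the `Topic` and `Subscription` classes and the `expire_unacked` method are hypothetical names for this illustration, and the message ID doubles as the ack ID for simplicity. It shows two behaviors worth knowing before the hands-on labs: every subscription receives its own copy of each published message, and a message that is pulled but not acknowledged becomes available for redelivery.

```python
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Subscription:
    """Toy pull subscription holding its own copy of each message on the topic."""
    name: str
    pending: deque = field(default_factory=deque)    # not yet delivered
    outstanding: dict = field(default_factory=dict)  # delivered, awaiting ack

    def pull(self):
        """Deliver the next pending message, or None if nothing is pending."""
        if not self.pending:
            return None
        ack_id, data = self.pending.popleft()
        self.outstanding[ack_id] = data
        return ack_id, data

    def ack(self, ack_id):
        """Acknowledge a delivered message so it is not redelivered."""
        self.outstanding.pop(ack_id, None)

    def expire_unacked(self):
        """Simulate the ack deadline passing: unacked messages become pending again."""
        for ack_id, data in self.outstanding.items():
            self.pending.append((ack_id, data))
        self.outstanding.clear()


class Topic:
    """Toy topic that fans each published message out to all subscriptions."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = []
        self._ids = count(1)

    def subscribe(self, name):
        sub = Subscription(name)
        self.subscriptions.append(sub)
        return sub

    def publish(self, data):
        msg_id = next(self._ids)
        for sub in self.subscriptions:
            sub.pending.append((msg_id, data))
        return msg_id
```

In this model, a subscriber that pulls a message and crashes before calling `ack` will see the same message again after `expire_unacked`, which is why subscriber code in the labs should be written to tolerate duplicate deliveries.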
Data Fusion
o Introduction to Data Fusion
• Overview of Data Fusion as a fully managed data integration service.
• Use cases for Data Fusion in ETL and data migration.
o Building Data Integration Pipelines
• Creating ETL pipelines using the visual interface.
• Configuring data sources, transformations, and sinks.
• Using pre-built templates for common integration scenarios.
o Integration with GCP and External Services
• Integrating Data Fusion with BigQuery, Cloud Storage, and other GCP services.
o End-to-end pipeline using Data Fusion with Wrangler, GCS, and BigQuery
Cloud Functions
o Cloud Functions Introduction
o Setting up Cloud Functions in GCP
o Event-driven architecture and use cases
o Writing and deploying Cloud Functions
o Triggering Cloud Functions:
• HTTP triggers
• Pub/Sub triggers
• Cloud Storage triggers
o Monitoring and logging Cloud Functions
o Use case 1: Loading files from GCS into BigQuery as soon as they are uploaded.
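The GCS-to-BigQuery use case above could look roughly like the sketch below. It assumes a Cloud Storage-triggered background function whose event payload carries `bucket` and `name` keys, CSV files with a header row, and the `google-cloud-bigquery` client library available at runtime. The function names and the default `table_id` are placeholders for this illustration, not values from the course.

```python
def gcs_uri_from_event(event):
    """Build the gs:// URI of the uploaded object from a Cloud Storage event payload."""
    return f"gs://{event['bucket']}/{event['name']}"


def load_to_bigquery(event, context=None, table_id="my-project.my_dataset.my_table"):
    """Entry point sketch: load the newly uploaded CSV file into a BigQuery table.

    table_id is a placeholder; replace it with a real project.dataset.table.
    """
    # Imported lazily so the URI helper can be used without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the file has a header row
        autodetect=True,      # let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    uri = gcs_uri_from_event(event)
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load job to finish; raises on failure
    return uri
```

Appending rather than overwriting is a design choice for this sketch: each uploaded file adds rows to the same table. A production version would also need to handle non-CSV uploads and repeated events for the same object.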
Terraform
o Terraform Introduction
o Installing and configuring Terraform.
o Infrastructure Provisioning
o Terraform basic commands
• init, plan, apply, destroy
o Create Resources in Google Cloud Platform
• GCS buckets
• Dataproc cluster
• BigQuery Datasets and tables
• And more resources as needed
Architecture Planning
o Building proficiency in architecting end-to-end data solutions on GCP.
o Understanding the principles of designing scalable, reliable, and cost-effective data
architectures.
Certification Readiness
o Prepare for the Google Cloud Professional Data Engineer (PDE) and Associate Cloud Engineer (ACE)
certifications through a combination of theoretical knowledge and hands-on experience.
The course equips students with practical skills in SQL, PySpark, Apache Beam, DAG creation,
and architecture planning, preparing them to tackle real-world data engineering
challenges and to earn GCP certifications.