
Google Cloud Data Engineering Training

with Real-world Projects and Case Studies

Role: GCP Data Engineer


Course duration: 2.5 months
Mode: Online
Teaching Language: English
Trainer: Shaik Saidhul
Contact: 7305101711
GCP Cloud Basics
GCP Introduction
o The need for cloud computing in modern businesses.
o Key features and offerings of Google Cloud Platform (GCP).
o Overview of core GCP services and products.
o Benefits and advantages of using cloud infrastructure.
o Step-by-step guide to creating a free-tier account on GCP.

GCP Interfaces
o Console
• Navigating the GCP Console
• Configuring the GCP Console for Efficiency
• Using the GCP Console for Service Management
o Shell
• Introduction to Cloud Shell
• Command-line Interface (CLI) Basics
• Cloud Shell Commands for Service Deployment and Management
o SDK
• Overview of GCP Software Development Kits (SDKs)
• Installing and Configuring SDKs
• Writing and Executing GCP SDK Commands

GCP Locations
o Regions
• Understanding GCP Regions
• Selecting Regions for Service Deployment
• Impact of Region on Service Performance
o Zones
• Exploring GCP Zones
• Distributing Resources Across Zones
• High Availability and Disaster Recovery Considerations
o Importance
• Significance of Choosing the Right Location
• Global vs. Regional Resources
• Factors Influencing Location Decisions

GCP IAM & Admin


o Identities
• Introduction to Identity and Access Management (IAM)
• Users, Groups, and Service Accounts
• Best Practices for Identity Management
o Roles
• GCP IAM Roles Overview
• Defining Custom Roles
• Role-Based Access Control (RBAC) Implementation
o Policy
• Resource-based Policies
• Understanding and Implementing Organization Policies
• Auditing and Monitoring Policies
o Resource Hierarchy
• GCP Resource Hierarchy Structure
• Managing Resources in a Hierarchy
• Organizational Structure Best Practices

Linux Basics on Cloud Shell


o Getting started with Linux
o Linux Installation
o Basic Linux Commands
• Cloud shell tips
• File and Directory Operations (ls, cd, pwd, mkdir, rmdir, cp, mv, touch, rm, nano)
• File Content Manipulation (cat, less, head, tail, grep)
• Text Processing (awk, sed, cut, sort, uniq)
• User and Permission related (whoami, id, su, sudo, chmod, chown)

Python for Data Engineer


o Data Types
• Strings
• Operators
• Numbers (Int, Float)
• Booleans
o Data Structures
• Lists
• Tuples
• Dictionaries
• Sets
o Python Programming Constructs
• if, elif, else statements
• for loops, while loops
• Exception Handling
• File I/O operations
o Modular Programming in Python
• Functions & Lambda Functions
• Classes (a short example combining these constructs follows this list)
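
A minimal example pulling the constructs above together (dictionaries, loops, exception handling, file I/O, and a function); the file name and fields are made up for illustration:

def summarize_sales(path):
    """Read 'region,amount' lines and total the amount per region."""
    totals = {}  # dictionary: region -> running total
    try:
        with open(path) as f:               # file I/O
            for line in f:                  # for loop
                region, amount = line.strip().split(",")
                totals[region] = totals.get(region, 0) + float(amount)
    except FileNotFoundError:               # exception handling
        print(f"No such file: {path}")
    return totals

# Usage: build a tiny sample file, then summarize it.
with open("sales.csv", "w") as f:
    f.write("east,100.5\nwest,200\neast,50\n")
print(summarize_sales("sales.csv"))  # {'east': 150.5, 'west': 200.0}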
GCP Data Engineering Tools
Google Cloud Storage
o Overview of Cloud Storage as a scalable and durable object storage service.
o Understanding buckets and objects in Cloud Storage.
o Use cases for Cloud Storage, such as data backup, multimedia storage, and website content hosting.
o Creating and managing Cloud Storage buckets.
o Uploading and downloading objects to and from Cloud Storage (see the client sketch after this list).
o Setting access controls and permissions for buckets and objects.
o Data Transfer and Lifecycle Management
o Object Versioning
o Integration with Other GCP Services
o Implementing best practices for optimizing Cloud Storage performance.
o Securing data in Cloud Storage with encryption and access controls.
o Monitoring and logging for Cloud Storage operations.
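
As a taste of the bucket and object operations above, a minimal sketch using the google-cloud-storage Python client; the bucket and file names are placeholders, and credentials are assumed to come from the environment (for example, Application Default Credentials in Cloud Shell):

from google.cloud import storage

client = storage.Client()

# Create a regional bucket (bucket names must be globally unique).
bucket = client.create_bucket("example-training-bucket", location="us-central1")

# Upload a local file as an object, then download it back.
blob = bucket.blob("raw/sales.csv")
blob.upload_from_filename("sales.csv")
blob.download_to_filename("sales_copy.csv")

# List objects under a prefix.
for b in client.list_blobs("example-training-bucket", prefix="raw/"):
    print(b.name)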

Cloud SQL
o Introduction to Cloud SQL
o Creating and Managing Cloud SQL Instances
o Configuring database settings, users, and access controls.
o Connecting to Cloud SQL instances using Cloud SQL Studio, Cloud Shell, and workbenches (see the connector sketch after this list)
o Importing and exporting data in Cloud SQL.
o Backups and High Availability
o Integration with Other GCP Services
o Managing database user roles and permissions.
o Introduction to Database Migration Service (DMS)
o End-to-end database migration project
• Offline: Export and Import method
• Online: DMS method
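
For the connection topic above, a minimal sketch using the Cloud SQL Python Connector with SQLAlchemy; the instance connection name, user, password, and database are placeholders:

import sqlalchemy
from google.cloud.sql.connector import Connector

connector = Connector()

def getconn():
    # "project:region:instance" is the instance connection name.
    return connector.connect(
        "my-project:us-central1:training-instance",
        "pymysql",
        user="trainee",
        password="change-me",
        db="salesdb",
    )

engine = sqlalchemy.create_engine("mysql+pymysql://", creator=getconn)
with engine.connect() as conn:
    for row in conn.execute(sqlalchemy.text("SELECT NOW()")):
        print(row)
connector.close()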

BigQuery (SQL development)


o Introduction to BigQuery
o BigQuery Architecture
o Use cases for BigQuery in business intelligence and analytics.
o Various methods of creating tables in BigQuery
o BigQuery Data Sources and File Formats
o Native tables and external tables
o SQL Queries and Performance Optimization
• Writing and optimizing SQL queries in BigQuery (see the sketch after this list).
• Understanding query execution plans and best practices.
• Partitioning and clustering tables for performance.
o Data Integration and Export
• Loading data into BigQuery from Cloud Storage, Cloud SQL, and other sources.
• Exporting data from BigQuery to various formats.
• Real-time data streaming into BigQuery.
o Configuring access controls and permissions in BigQuery.
o BigQuery Views:
• Views
• Materialized Views
• Authorized Views
o Integration with Other GCP Services
• Integrating BigQuery with Dataflow for ETL processes.
• Building data pipelines with BigQuery and Composer.
o Case Study-1: Spotify
o Case Study-2: Social Media
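
A small sketch of the partitioning and clustering ideas above, using the google-cloud-bigquery client; it assumes a dataset named demo_ds already exists, and the table and columns are invented for illustration:

from google.cloud import bigquery

client = bigquery.Client()

# DDL: daily partitioning plus clustering for partition pruning and locality.
client.query("""
    CREATE TABLE IF NOT EXISTS demo_ds.events
    (event_date DATE, user_id STRING, amount FLOAT64)
    PARTITION BY event_date
    CLUSTER BY user_id
""").result()

# This query benefits from pruning: only one day's partition is scanned.
sql = """
    SELECT user_id, SUM(amount) AS total
    FROM demo_ds.events
    WHERE event_date = DATE "2024-01-01"
    GROUP BY user_id
    ORDER BY total DESC
"""
for row in client.query(sql).result():
    print(row.user_id, row.total)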

Dataproc (PySpark Development)


o Introduction to Hadoop and Apache Spark
o Understanding the difference between Spark and MapReduce
o What are Spark and PySpark?
o Understanding Spark framework and its functionalities
o Overview of Dataproc as a fully managed Apache Spark and Hadoop service.
o Use cases for Dataproc in data processing and analytics.
o Cluster Creation and Configuration
• Creating and managing Dataproc clusters.
• Configuring cluster properties for performance and scalability.
• Preemptible instances and cost optimization.
o Running Jobs on Dataproc
• Submitting and monitoring Spark and Hadoop jobs on Dataproc.
• Use of initialization actions and custom scripts.
• Job debugging and troubleshooting.
o Integration with Storage and BigQuery
• Reading and writing data from/to Cloud Storage and BigQuery.
• Integrating Dataproc with other storage solutions.
• Performance optimization for data access.
o Automation and scheduling of recurring jobs.
o Case Study-1: Data Cleaning of Employee Travel Records
o End-to-end batch PySpark pipeline using Dataproc, BigQuery, and GCS (sketched below)
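
A minimal sketch of such a batch job: a PySpark script that reads raw CSV records from GCS, cleans them, and writes to BigQuery through the spark-bigquery connector. The bucket, dataset, and column names are placeholders; the script would typically be submitted with gcloud dataproc jobs submit pyspark:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("travel-records-cleaning").getOrCreate()

# Read raw records from Cloud Storage.
df = (spark.read.option("header", True)
      .csv("gs://example-bucket/raw/travel_records.csv"))

# Basic cleaning: drop duplicates and rows missing key fields,
# and normalize the employee name column.
clean = (df.dropDuplicates()
           .dropna(subset=["employee_id", "travel_date"])
           .withColumn("employee_name", F.trim(F.initcap("employee_name"))))

# Write to BigQuery; the connector stages data through a temporary GCS bucket.
(clean.write.format("bigquery")
      .option("table", "demo_ds.travel_records_clean")
      .option("temporaryGcsBucket", "example-temp-bucket")
      .mode("overwrite")
      .save())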

Databricks on GCP
o What is the Databricks Lakehouse Platform?
o Databricks architecture and components
o Setting up and Administering a Databricks workspace
o Managing data with Delta Lake
o Databricks Unity Catalog
o Notebooks and clusters
o ELT with Spark SQL and Python
o Optimizing performance within Databricks
o Incremental Data Processing
o Delta Live Tables
o Case study: creating end-to-end workflows (a Delta Lake sketch follows this list)
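
A minimal Delta Lake sketch in the spirit of the topics above, written for a Databricks notebook where spark is predefined; the schema and table names are placeholders:

# Write a DataFrame as a Delta table.
spark.sql("CREATE SCHEMA IF NOT EXISTS demo")
df = spark.createDataFrame([(1, "open"), (2, "closed")], ["ticket_id", "status"])
df.write.format("delta").mode("overwrite").saveAsTable("demo.tickets")

# Incremental processing: upsert new records with MERGE.
updates = spark.createDataFrame([(2, "reopened"), (3, "open")],
                                ["ticket_id", "status"])
updates.createOrReplaceTempView("updates")
spark.sql("""
    MERGE INTO demo.tickets AS t
    USING updates AS u
    ON t.ticket_id = u.ticket_id
    WHEN MATCHED THEN UPDATE SET t.status = u.status
    WHEN NOT MATCHED THEN INSERT *
""")

# Time travel: read the table as of an earlier version.
spark.read.format("delta").option("versionAsOf", 0).table("demo.tickets").show()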

Dataflow (Apache Beam Development)


o Introduction to Dataflow
o Use cases for Dataflow in real-time analytics and ETL.
o Understanding the difference between Apache Spark and Apache Beam
o How Dataflow is different from Dataproc
o Building Data Pipelines with Apache Beam
• Writing Apache Beam pipelines for batch and stream processing.
• Custom Pipelines and Pre-defined pipelines
• Transformations and windowing concepts.
o Integration with Other GCP Services
• Integrating Dataflow with BigQuery, Pub/Sub, and other GCP services.
• Real-time analytics and visualization using Dataflow and BigQuery.
• Workflow orchestration with Composer.
o End-to-end streaming pipeline using Apache Beam with Dataflow, a Python app, Pub/Sub,
BigQuery, and GCS (see the sketch after this list)
o Template method of creating pipelines
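
A condensed sketch of that streaming pipeline in Apache Beam: read JSON events from Pub/Sub, window them, aggregate, and write to BigQuery. The topic, table, and field names are placeholders; running it on Dataflow would additionally require the DataflowRunner and project/region options:

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (p
     | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
     | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
     | "KeyByUser" >> beam.Map(lambda e: (e["user_id"], 1))
     | "Window" >> beam.WindowInto(window.FixedWindows(60))  # 1-minute windows
     | "CountPerUser" >> beam.CombinePerKey(sum)
     | "ToRow" >> beam.Map(lambda kv: {"user_id": kv[0], "events": kv[1]})
     | "Write" >> beam.io.WriteToBigQuery(
           "my-project:demo_ds.user_counts",
           schema="user_id:STRING,events:INTEGER",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))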

Cloud Pub/Sub
o Introduction to Pub/Sub
o Understanding the role of Pub/Sub in event-driven architectures.
o Key Pub/Sub concepts: topics, subscriptions, messages, and acknowledgments.
o Creating and Managing Topics and Subscriptions
• Using the GCP Console to create Pub/Sub topics and subscriptions.
• Configuring message retention policies and acknowledgment settings.
o Publishing and Consuming Messages
• Writing and deploying code to publish messages to a topic.
• Implementing subscribers to consume and process messages from subscriptions (see the sketch after this list).
o Integration with Other GCP Services
• Connecting Pub/Sub with Cloud Functions for serverless event-driven computing.
• Integrating Pub/Sub with Dataflow for real-time stream processing.
o Streaming use case using Dataflow
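
A minimal publish-and-consume sketch with the Pub/Sub client library; the project, topic, and subscription IDs are placeholders:

from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

project = "my-project"

# Publish a message; publish() returns a future, result() blocks until sent.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project, "events")
future = publisher.publish(topic_path, b'{"user_id": "u1"}', origin="demo")
print("published message id:", future.result())

# Consume messages: the callback acknowledges each one after processing.
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(project, "events-sub")

def callback(message):
    print("received:", message.data, message.attributes)
    message.ack()

streaming_pull = subscriber.subscribe(sub_path, callback=callback)
try:
    streaming_pull.result(timeout=30)  # listen for 30 seconds
except TimeoutError:
    streaming_pull.cancel()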

Cloud Composer (DAG Creations)


o Introduction to Composer/Airflow
o Overview of Airflow Architecture
o Use cases for Composer in managing and scheduling workflows.
o Creating and Managing Workflows
• Creating and configuring Composer environments.
• Defining and scheduling workflows using Apache Airflow.
• Monitoring and managing workflow executions.
o Integration with Data Engineering Services
• Orchestrating workflows involving BigQuery, DataFlow, and other services.
• Coordinating ETL processes with Composer.
• Integrating with external systems and APIs.
o Error Handling and Troubleshooting
• Handling errors and retries in Composer workflows.
• Debugging and troubleshooting failed workflow executions.
• Logging and monitoring for Composer workflows.
o Level-1-DAG: Orchestrating the BigQuery pipelines (see the DAG sketch after this list)
o Level-2-DAG: Orchestrating the Dataproc pipelines
o Level-3-DAG: Orchestrating the Dataflow pipelines
o Implementing CI/CD in Composer Using Cloud Build and GitHub
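
A minimal Level-1-style DAG sketch: a daily Composer/Airflow workflow that runs one BigQuery job via BigQueryInsertJobOperator, with retries as simple error handling. The project, dataset, and schedule are placeholders:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryInsertJobOperator,
)

with DAG(
    dag_id="level1_bigquery_pipeline",
    schedule_interval="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"retries": 2},  # retry failed tasks twice
) as dag:
    aggregate = BigQueryInsertJobOperator(
        task_id="aggregate_daily_events",
        configuration={
            "query": {
                "query": """
                    SELECT user_id, COUNT(*) AS events
                    FROM demo_ds.events
                    WHERE event_date = DATE "{{ ds }}"
                    GROUP BY user_id
                """,
                "useLegacySql": False,
                "destinationTable": {
                    "projectId": "my-project",
                    "datasetId": "demo_ds",
                    "tableId": "daily_user_counts",
                },
                "writeDisposition": "WRITE_TRUNCATE",
            }
        },
    )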

Data Fusion
o Introduction to Data Fusion
• Overview of Data Fusion as a fully managed data integration service.
• Use cases for Data Fusion in ETL and data migration.
o Building Data Integration Pipelines
• Creating ETL pipelines using the visual interface.
• Configuring data sources, transformations, and sinks.
• Using pre-built templates for common integration scenarios.
o Integration with GCP and External Services
• Integrating Data Fusion with BigQuery, Cloud Storage, and other GCP services.
o End-to-end pipeline using Data Fusion with Wrangler, GCS, and BigQuery

Cloud Functions
o Cloud Functions Introduction
o Setting up Cloud Functions in GCP
o Event-driven architecture and use cases
o Writing and deploying Cloud Functions
o Triggering Cloud Functions:
• HTTP triggers
• Pub/Sub triggers
• Cloud Storage triggers
o Monitoring and logging Cloud Functions
o Use case 1: Loading files from GCS into BigQuery as soon as they are uploaded (see the sketch below).
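
A sketch of that use case as a first-generation, background Cloud Function deployed with a google.storage.object.finalize trigger; the destination dataset and table are placeholders:

from google.cloud import bigquery

def gcs_to_bigquery(event, context):
    """Triggered when an object is finalized in the watched bucket."""
    uri = f"gs://{event['bucket']}/{event['name']}"

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,                    # infer schema from the file
        write_disposition="WRITE_APPEND",
    )
    load_job = client.load_table_from_uri(
        uri, "demo_ds.uploads", job_config=job_config
    )
    load_job.result()  # wait for the load job to finish
    print(f"Loaded {uri} into demo_ds.uploads")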

Terraform
o Terraform Introduction
o Installing and configuring Terraform.
o Infrastructure Provisioning
o Terraform basic commands
• init, plan, apply, destroy
o Create Resources in Google Cloud Platform
• GCS buckets
• Dataproc cluster
• BigQuery Datasets and tables
• And more resources as needed

What Students Can Expect by the End of the Course


Proficient in SQL Development:
o Mastering SQL for querying and manipulating data within Google BigQuery and Cloud SQL.
o Writing complex queries and optimizing performance for large-scale datasets.
o Understanding schema design and best practices for efficient data storage.
PySpark Development Skills:
o Proficiency in using PySpark for large-scale data processing on Google Cloud.
o Developing and optimizing Spark jobs for distributed data processing.
o Understanding Spark's RDDs, DataFrames, and transformations for data manipulation.

Apache Beam Development Mastery:


o Creating data processing pipelines using Apache Beam.
o Understanding the concepts of parallel processing and data parallelism.
o Implementing transformations and integrating with other GCP services.

DAG Creations with Cloud Composer:


o Designing and implementing Directed Acyclic Graphs (DAGs) for orchestrating workflows.
o Using Cloud Composer for workflow automation and managing dependencies.
o Developing DAGs that integrate various GCP services for end-to-end data processing.

Notebooks and Workflows with Databricks:


o Understand how to build and manage data pipelines using Databricks and Delta Lake.
o Efficiently query and analyze large datasets with Databricks SQL and Apache Spark.
o Implement scalable workflows and optimize performance within Databricks.

Architecture Planning:
o Proficient in architecting end-to-end data solutions on GCP.
o Understanding the principles of designing scalable, reliable, and cost-effective data
architectures.

Certification Readiness
o Prepare for the Google Cloud Professional Data Engineer (PDE) and Associate Cloud
Engineer (ACE) certifications through a combination of theoretical knowledge and
hands-on experience.

The course will empower students with practical skills in SQL, PySpark, Apache Beam, DAG creation,
and architecture planning, ensuring they are well prepared to tackle real-world data engineering
challenges and successfully obtain GCP certifications.

Thank You.
