Data Engineering With Databricks


Welcome to Data Engineering with Databricks

©2022 Databricks Inc. — All rights reserved 1


Course Objectives
1. Use the Databricks Data Science and Engineering Workspace to perform
common code development tasks in a data engineering workflow.
2. Use Spark to extract data from a variety of sources, apply common cleaning
transformations, and manipulate complex data with advanced functions.
3. Define and schedule data pipelines that incrementally ingest and process
data through multiple tables in the lakehouse using Delta Live Tables.
4. Orchestrate data pipelines with Databricks Workflow Jobs and schedule
dashboard updates to keep analytics up to date.
5. Configure permissions in Unity Catalog to ensure that users have proper
access to databases for analytics and dashboarding.



Course Overview
Module 0: Get Started with PySpark Programming (OPTIONAL)
Module 1: Get Started with Databricks Data Science and Engineering Workspace
Module 2: Transform Data with Spark (SQL or PySpark)
Module 3: Manage Data with Delta Lake
Module 4: Build Data Pipelines with Delta Live Tables (SQL or PySpark)
Module 5: Deploy Workloads with Databricks Workflows
Module 6: Manage Data Access for Analytics with Unity Catalog



Module Agendas



Module Agenda
Get Started with PySpark Programming

Spark SQL Overview
DE 0.1 - Spark SQL
DE 0.2L - Spark SQL Lab
DE 0.3 - DataFrame & Column
DE 0.4L - Purchase Revenues Lab
DE 0.5 - Aggregation
DE 0.6L - Revenue by Traffic Lab



Module Agenda
Get Started with Databricks Data Science and Engineering Workspace

Introduction to the Databricks Lakehouse Platform
Databricks Architecture and Services
Demo - Navigating the Workspace
DE 1.1 - Create and Manage Clusters Interactively
DE 1.2 - Notebook Basics
Git Versioning with Databricks Repos
Demo - Using Databricks Repos
DE 1.3L - Getting Started with the Databricks Lakehouse Platform Lab



Module Agenda
Transform Data with Spark SQL

DE 2.1 - Querying Files Directly
DE 2.2 - Options for External Sources
DE 2.3L - Extract Data Lab
DE 2.4 - Cleaning Data
DE 2.5 - Complex Transformations
DE 2.6 - UDFs and Control Flow
DE 2.7L - Reshape Data Lab

Transform Data with PySpark

DE 3.1 - Querying Files Directly
DE 3.2 - Reader & Writer
DE 3.3L - Extract Data Lab
DE 3.4 - Cleaning Data
DE 3.5 - Complex Transformations
DE 3.6 - UDFs
DE 3.7L - Reshape Data Lab



Module Agenda
Manage Data with Delta Lake

What is Delta Lake
DE 4.1 - Schemas and Tables
DE 4.2 - Version and Optimize Delta Tables
DE 4.3L - Manipulate Delta Tables Lab
DE 4.4 - Set Up Delta Tables
DE 4.5 - Load Data into Delta Lake
DE 4.6 - Load Data Lab



Module Agenda
Build Data Pipelines with Delta Live Tables

Introduction to Delta Live Tables
DE 5.1 - DLT UI Walkthrough
DE 5.1A - SQL Pipelines
DE 5.1B - Python Pipelines
DE 5.2 - Python vs SQL
DE 5.3 - Pipeline Results
DE 5.4 - Pipeline Event Logs



Module Agenda
Deploy Workloads with Databricks Workflows

Introduction to Workflows
Building and Monitoring Workflow Jobs
DE 6.1 - Scheduling Tasks with the Jobs UI
DE 6.2L - Jobs Lab
DE 6.3 - Navigating Databricks SQL and Attaching to Endpoints
DE 6.4 - Last Mile ETL with DBSQL



Module Agenda
Manage Data Access for Analytics with Unity Catalog

Introduction to Unity Catalog
DE 7.1 - Managing principals in Unity Catalog
DE 7.2 - Managing Unity Catalog metastores
DE 7.3 - Creating compute resources for Unity Catalog access
DE 7.4 - Creating and governing data objects with Unity Catalog
DE 7.5 - Create and Share Tables in Unity Catalog
DE 7.6 - Create external tables in Unity Catalog
DE 7.7 - Upgrade a table to Unity Catalog
DE 7.8 - Create views and limit table access
