DE Mod 5: Deploy Workloads with Databricks Workflows

This document provides an overview of Databricks Workflows, a fully managed, cloud-based task orchestration service. It describes how Workflows can be used to orchestrate jobs, machine learning tasks, and arbitrary code; outlines common workflow patterns such as sequence, funnel, and fan-out; and discusses features such as deep platform integration, proven reliability, and simple authoring. It also covers monitoring and debugging workflows through scheduling, alerts, access control, and job run histories.


Deploy Workloads with Databricks Workflows



Module Agenda
Deploy Workloads with Databricks Workflows

Introduction to Workflows
Building and Monitoring Workflow Jobs
DE 6.1 - Scheduling Tasks with the Jobs UI
DE 6.2L - Jobs Lab
DE 6.3 - OPTIONAL Navigating Databricks SQL
DE 6.4 - OPTIONAL Last Mile ETL with DBSQL



Introduction to Workflows



Course Objectives

1. Describe the main features and use cases of Databricks Workflows
2. Create a task orchestration workflow composed of various task types
3. Utilize the monitoring and debugging features of Databricks Workflows
4. Describe workflow best practices



Introduction to Workflows
Databricks Workflows

Workflows is a fully-managed, cloud-based, general-purpose task orchestration service for the entire Lakehouse.

Workflows is a service for data engineers, data scientists, and analysts to build reliable data, analytics, and AI workflows on any cloud.

[Diagram: the Lakehouse Platform, with Data Warehousing, Data Engineering, Data Streaming, and Data Science and ML layered on Unity Catalog (fine-grained governance for data and AI), Delta Lake (data reliability and performance), and the Cloud Data Lake (all structured and unstructured data).]



Introduction to Workflows

Databricks Workflows has two main task orchestration services:

• Workflow Jobs (Workflows): workflows for every job.
• Delta Live Tables (DLT): automated data pipelines for Delta Lake.

A DLT pipeline can also run as a task within a Workflow job.
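For concreteness, here is a minimal sketch of creating such a job through the Jobs API 2.1, where a DLT pipeline runs as one task alongside a notebook task. The workspace URL, cluster ID, notebook path, and pipeline ID are placeholders, not values from this course:

    import os
    import requests

    HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
    TOKEN = os.environ["DATABRICKS_TOKEN"]

    job_spec = {
        "name": "demo-workflow-with-dlt",
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": "/Repos/demo/ingest"},
                "existing_cluster_id": "1234-567890-abcde123",  # placeholder cluster
            },
            {
                # The DLT pipeline, referenced by ID, becomes a single task in the job.
                "task_key": "dlt_pipeline",
                "depends_on": [{"task_key": "ingest"}],
                "pipeline_task": {"pipeline_id": "<your-dlt-pipeline-id>"},
            },
        ],
    }

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
    )
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])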



Introduction to Workflows
Use Cases

• Orchestration of Dependent Jobs (Workflow Jobs): jobs running on a schedule, and jobs containing dependent tasks/steps.
• Machine Learning Tasks (Workflow Jobs): run an MLflow notebook task in a job.
• Arbitrary Code, External API Calls, Custom Tasks (Workflow Jobs): run job tasks that can contain a JAR file, Spark Submit, a Python script, a SQL task, or dbt.
• Data Ingestion and Transformation (Delta Live Tables): ETL jobs with support for batch and streaming, built-in data quality constraints, and monitoring and logging.



Introduction to Workflows
Features

Orchestrate Anything, Anywhere
Run diverse workloads for the full data and AI lifecycle, on any cloud. Orchestrate:
• Notebooks
• Delta Live Tables
• Jobs for SQL
• ML models, and more

Fully Managed
Remove operational overhead with a fully managed orchestration service, enabling you to focus on your workflows, not on managing your infrastructure.

Simple Workflow Authoring
An easy point-and-click authoring experience for all your data teams, not just those with specialized skills.



Introduction to Workflows
Features

Deep Platform Integration
Designed and built into your lakehouse platform, giving you deep monitoring capabilities and centralized observability across all your workflows.

Proven Reliability
Have full confidence in your workflows, leveraging our proven experience running tens of millions of production workloads daily across AWS, Azure, and GCP.



Introduction to Workflows
How to Leverage Workflows

• Allows you to build simple ETL/ML task orchestration
• Reduces infrastructure overhead
• Integrates easily with external tools
• Enables non-engineers to build their own workflows using a simple UI
• Works independently of the cloud provider
• Enables cluster reuse to reduce cost and startup time



Introduction to Workflows
Common Workflow Patterns

Sequence
• Data transformation/processing/cleaning
• Bronze/silver/gold tables

Funnel
• Multiple data sources
• Data collection

Fan-out
• Star pattern
• Single data source
• Data ingestion and distribution
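As a rough illustration of how these patterns map onto task dependencies, the sketch below (task names and notebook paths are hypothetical) encodes a funnel feeding a sequence via the depends_on field of a Jobs API 2.1 task list; a fan-out is simply the mirror image, with several tasks all depending on the same source task:

    # Funnel: two source tasks converge on one merge task.
    # Sequence: merge -> silver -> gold runs strictly in order.
    tasks = [
        {"task_key": "source_a", "notebook_task": {"notebook_path": "/Repos/demo/source_a"}},
        {"task_key": "source_b", "notebook_task": {"notebook_path": "/Repos/demo/source_b"}},
        {
            "task_key": "merge",
            "depends_on": [{"task_key": "source_a"}, {"task_key": "source_b"}],
            "notebook_task": {"notebook_path": "/Repos/demo/merge"},
        },
        {
            "task_key": "silver",
            "depends_on": [{"task_key": "merge"}],
            "notebook_task": {"notebook_path": "/Repos/demo/silver"},
        },
        {
            "task_key": "gold",
            "depends_on": [{"task_key": "silver"}],
            "notebook_task": {"notebook_path": "/Repos/demo/gold"},
        },
    ]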



Introduction to Workflows
Example Workflow

1. Data ingestion funnel, e.g. with Auto Loader or DLT
2. Data filtering, quality assurance, and transformation, e.g. with DLT, SQL, or Python
3. ML feature extraction, e.g. with MLflow
4. Persisting features and training the prediction model
Building and Monitoring Workflow Jobs



Building and Monitoring Workflow Jobs
Workflow Components

A Workflows Job consists of three components:

• Tasks: what to run
• Schedule: when to run it
• Cluster: how to run it



Creating a Workflow
Task Definition

When creating a task:

• Define the task type.
• Choose the cluster type.
  • Both job clusters and all-purpose clusters can be used.
  • A cluster can be shared by multiple tasks, which reduces cost and startup time.
  • To create a new cluster, you must have the required permissions.
• Define a task dependency if the task depends on another task, as in the sketch below.
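The sketch below shows what a shared job cluster can look like in a job specification posted to the same /api/2.1/jobs/create endpoint as earlier; the runtime version, node type, and notebook paths are illustrative assumptions, not the course's exact configuration:

    job_spec = {
        "name": "shared-cluster-demo",
        # One job cluster, defined once...
        "job_clusters": [
            {
                "job_cluster_key": "shared",
                "new_cluster": {
                    "spark_version": "11.3.x-scala2.12",  # example runtime
                    "node_type_id": "i3.xlarge",          # example AWS node type
                    "num_workers": 2,
                },
            }
        ],
        # ...and reused by both tasks, reducing cost and startup time.
        "tasks": [
            {
                "task_key": "clean",
                "job_cluster_key": "shared",
                "notebook_task": {"notebook_path": "/Repos/demo/clean"},
            },
            {
                "task_key": "aggregate",
                "depends_on": [{"task_key": "clean"}],
                "job_cluster_key": "shared",
                "notebook_task": {"notebook_path": "/Repos/demo/aggregate"},
            },
        ],
    }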



Monitoring and Debugging
Scheduling and Alerts

You can run your jobs immediately or periodically through an easy-to-use scheduling system.

You can specify alerts to be notified when runs of a job begin, complete, or fail. Notifications can be sent via email, Slack, or AWS SNS.
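For example, a job specification can carry both a schedule and notification settings. Extending the job_spec sketch above (the cron expression and email addresses are placeholders; Slack and SNS destinations are configured as workspace notification destinations rather than inline):

    job_spec.update({
        # Quartz cron syntax: here, every day at 02:30 UTC (illustrative).
        "schedule": {
            "quartz_cron_expression": "0 30 2 * * ?",
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
        # Email alerts when runs start, succeed, or fail (addresses are placeholders).
        "email_notifications": {
            "on_start": ["data-team@example.com"],
            "on_success": ["data-team@example.com"],
            "on_failure": ["oncall@example.com"],
        },
    })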



Monitoring and Debugging
Access Control

Workflows integrates with existing resource access controls, enabling you to easily manage access across different teams.
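As a sketch, job permissions can also be managed programmatically through the workspace Permissions API; the group name and job ID below are placeholders:

    import os
    import requests

    HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
    TOKEN = os.environ["DATABRICKS_TOKEN"]
    JOB_ID = 123  # placeholder job ID

    # Let a group trigger runs of the job without being able to edit it.
    resp = requests.patch(
        f"{HOST}/api/2.0/permissions/jobs/{JOB_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "access_control_list": [
                {"group_name": "analysts", "permission_level": "CAN_MANAGE_RUN"}
            ]
        },
    )
    resp.raise_for_status()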



Monitoring and Debugging
Job Run History

Workflows keeps track of job runs and saves information about the success or failure of each task in the job run.

[Screenshot: the job run history view, showing run duration, job tasks, and individual job runs.]
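That history is also queryable. A sketch using the Jobs API runs list endpoint, reusing the HOST, TOKEN, and JOB_ID placeholders from the previous sketch:

    resp = requests.get(
        f"{HOST}/api/2.1/jobs/runs/list",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"job_id": JOB_ID, "limit": 25},
    )
    resp.raise_for_status()
    for run in resp.json().get("runs", []):
        # result_state (e.g. SUCCESS, FAILED) is set once a run finishes;
        # life_cycle_state covers runs that are still in flight.
        state = run["state"]
        print(run["run_id"], state.get("result_state", state.get("life_cycle_state")))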



Monitoring and Debugging
Repair a Failed Job Run

The repair feature allows you to re-run only the failed tasks and their sub-tasks, which reduces the time and resources required to recover from unsuccessful job runs.
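In the UI this is the "Repair run" action on a failed run; the Jobs API exposes the same operation. A sketch, with the run ID and task key as placeholders and reusing the HOST and TOKEN placeholders from earlier:

    resp = requests.post(
        f"{HOST}/api/2.1/jobs/runs/repair",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "run_id": 456,                 # placeholder: ID of the failed job run
            "rerun_tasks": ["aggregate"],  # re-run only the task(s) that failed
        },
    )
    resp.raise_for_status()
    print("Repair ID:", resp.json()["repair_id"])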

