Introduction to ADF - LwTN

Introduction to Azure Data Factory

Manuel Quintana
Agenda

• What is Azure Data Factory and how to provision it


• Integration Runtimes
• Linked Services
• Datasets
• Pipelines
• Data Flows
• Synapse Pipelines and Dataflows
Working with Azure Data Factory
Provisioning Azure Data Factory
Prerequisites

Azure Subscription
Must have an existing Azure Subscription

Azure Roles
Member of the Contributor or Owner role, or
an administrator of the Azure subscription
Resource Groups

What is a resource group?


Container that holds related resources
Can hold all resources for solution or selective resources
Deploy, update, and delete them as a group
Stores metadata about the resources
Azure Storage account
What is an Azure Storage account?
General-purpose storage account

What services are available?


Tables
Queues
Files
Blobs
Azure VM Disks
Azure SQL DB

What is Azure SQL DB?


General-purpose relational database

What structures are supported?


Relational data
JSON
Spatial
XML
Provisioning Azure Data Factory

Data Factory
The name must be globally unique.
Subscription
Resource Group
Version (V1 vs V2)
Location
Version Control
Create Resources
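
Global uniqueness can only be confirmed by Azure at creation time, but the format of a candidate name can be checked locally first. The following is a minimal Python sketch, assuming the commonly documented naming rules (3 to 63 characters; letters, digits, and hyphens only; every hyphen sits between two alphanumeric characters). The helper `is_valid_factory_name` is hypothetical and not part of any Azure SDK.

```python
import re

# Assumed format rule: alphanumeric groups separated by single hyphens.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_factory_name(name: str) -> bool:
    """Local format check only; it cannot tell whether the name is already taken."""
    return 3 <= len(name) <= 63 and _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["adf-demo-01", "-adf", "adf--demo", "ad"]:
        print(candidate, is_valid_factory_name(candidate))
```

A name that passes this check can still be rejected during provisioning if another data factory already uses it.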
Demo
Data Factory Navigation
Let’s get started – Home Hub
Actions
Ingest (Copy Data Activity Wizard)

Orchestrate (Create Pipeline)

Transform Data (Create Data flow)

Configure SSIS Runtime

Other Areas
Discover More
Recent Resources
Feature showcase
Resources
Author Hub

Design Area
Pipelines
Datasets
Data flows
Power Query
Monitoring Hub

Monitoring Options
Dashboards
Pipeline Runs
Trigger Runs
Integration Runtimes
Data flow debug
Manage Hub

Admin Options
Connections
Source Control
Author
Security
Data Factory Resources
Integration Runtimes
Linked Services
Datasets
Integration Runtimes

Integration Runtimes (Manage Hub)


The Integration Runtime is the compute infrastructure used by ADF to provide
the following data integration capabilities:

1. Data Movement (Azure IR, Self-Hosted IR)


2. SSIS package execution (Azure-SSIS IR)

Self-hosted integration runtime


Capable of running copy activities between cloud data stores and private data
stores
Linked Services and Datasets

Linked Services (Manage Hub)


Defines connection information so that Data Factory can connect to the data
source.
Can be reused among pipelines in a Data Factory

Datasets (Author Hub)


Named view of data that points to or references the data
Data Stores: Tables, Files, Folders, and Documents
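
Behind the designer, linked services and datasets are stored as JSON definitions. As a rough sketch of how the two relate, the Python below builds one of each as plain dictionaries. All names (`DemoBlobStorage`, `ProductsCsv`, the container and file) are hypothetical, and the connection string is a placeholder rather than a real secret.

```python
import json

# Linked service: holds the connection information (placeholder value here).
linked_service = {
    "name": "DemoBlobStorage",
    "properties": {
        "type": "AzureBlobStorage",
        "typeProperties": {"connectionString": "<storage-connection-string>"},
    },
}

# Dataset: a named view of data that refers to the linked service by name.
dataset = {
    "name": "ProductsCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": linked_service["name"],
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "demo",
                "fileName": "products.csv",
            },
            "firstRowAsHeader": True,
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(dataset, indent=2))
```

Because the dataset only stores a reference, several datasets can share one linked service, which is what makes linked services reusable across pipelines.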
Resource Organization

Folders
Used to group pipeline resources together
Used to group dataset resources together
Used to group data flows together
Create Linked Service
Demo
Copy Activity Wizard
Task cadence or schedule
Run once now
Run regularly on a schedule (creates a trigger)
Source Data Store
Choose existing data set
Create new data set
Destination data store
Choose existing data set
Create new data set
Settings
Data Integration Unit
Degree of copy parallelism
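
The two settings above end up as properties on the Copy activity that the wizard generates. The sketch below shows roughly what such an activity definition could look like, written as a Python dictionary. The activity and dataset names are hypothetical, and the values 4 and 2 are arbitrary examples rather than recommendations.

```python
import json

copy_activity = {
    "name": "CopyProducts",  # hypothetical activity name
    "type": "Copy",
    "inputs": [{"referenceName": "SourceProducts", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "SinkProducts", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "AzureSqlSink"},
        "dataIntegrationUnits": 4,  # "Data Integration Unit" setting
        "parallelCopies": 2,        # "Degree of copy parallelism" setting
    },
}

if __name__ == "__main__":
    print(json.dumps(copy_activity, indent=2))
```

If both properties are omitted, the service chooses values on its own, so setting them explicitly is mainly useful when tuning cost or throughput.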
Copy Activity Wizard
Demo
Pipeline Basics
Demo
Get Metadata Activity
Get Metadata activity

Purpose
Retrieve metadata information of data

Metadata options
Item Name
Item Type
Size
Created
Last Modified
Child Items
Content MD5
Structure
Column Count
Exists
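
In the activity's JSON definition, the metadata options chosen in the designer appear as a `fieldList`. The Python sketch below maps the option labels listed above to the field names as I understand them and builds a Get Metadata activity definition. The function `get_metadata_activity` and the activity and dataset names are hypothetical.

```python
# Designer label -> field name used in the activity's fieldList.
FIELD_NAMES = {
    "Item Name": "itemName",
    "Item Type": "itemType",
    "Size": "size",
    "Created": "created",
    "Last Modified": "lastModified",
    "Child Items": "childItems",
    "Content MD5": "contentMD5",
    "Structure": "structure",
    "Column Count": "columnCount",
    "Exists": "exists",
}


def get_metadata_activity(name, dataset_name, options):
    """Build a Get Metadata activity definition for the chosen options."""
    return {
        "name": name,
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "fieldList": [FIELD_NAMES[option] for option in options],
        },
    }


if __name__ == "__main__":
    print(get_metadata_activity("Get Metadata1", "ProductsCsv",
                                ["Item Name", "Last Modified"]))
```

Not every option applies to every data store; for example, Child Items is only meaningful for folders.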
Output Parameters

Output Parameters
Outputs can be used in other activities

Output parameter names


Add dynamic content
Debug results (activity output)
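
Outputs are referenced from later activities through an expression of the form `@activity('<name>').output.<property>`, which is what the "Add dynamic content" pane inserts. The small Python helper below only assembles that string; `activity_output` is a hypothetical name and `Get Metadata1` is an assumed activity name.

```python
def activity_output(activity_name: str, *path: str) -> str:
    """Build an ADF expression that reads a property from an activity's output."""
    expression = f"@activity('{activity_name}').output"
    return expression + "".join(f".{part}" for part in path)


if __name__ == "__main__":
    print(activity_output("Get Metadata1", "itemName"))
    print(activity_output("Get Metadata1", "lastModified"))
```

The property names after `.output` are the ones shown in the activity's debug output, so a debug run is the quickest way to confirm them.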
Pipeline Design

Metadata Activity → Stored Procedure Activity


Get Metadata Activity
Demo
Stored Procedure Activity
Purpose
Invoke a stored procedure
Utilize outputs from other activities

Supports
Azure SQL Database
Azure Synapse Analytics (formerly Azure SQL Data Warehouse)
SQL Server Database

Limitations
No output parameters to ADF
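
To make the "Metadata Activity → Stored Procedure Activity" pattern concrete, here is a sketch of a Stored Procedure activity definition that passes a Get Metadata output into a procedure parameter. It is written as a Python dictionary; the procedure `dbo.usp_LogFile`, its `FileName` parameter, the linked service `DemoAzureSqlDb`, and the upstream activity name are all hypothetical.

```python
import json

stored_procedure_activity = {
    "name": "LogFileMetadata",
    "type": "SqlServerStoredProcedure",
    "linkedServiceName": {
        "referenceName": "DemoAzureSqlDb",
        "type": "LinkedServiceReference",
    },
    # Run only after the upstream activity succeeds.
    "dependsOn": [
        {"activity": "Get Metadata1", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        "storedProcedureName": "dbo.usp_LogFile",
        "storedProcedureParameters": {
            "FileName": {
                # Input comes from the output of another activity.
                "value": {
                    "value": "@activity('Get Metadata1').output.itemName",
                    "type": "Expression",
                },
                "type": "String",
            }
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(stored_procedure_activity, indent=2))
```

Data flows in one direction only here: values go into the procedure, but nothing the procedure returns is available to later activities, which is the limitation noted above.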
Stored Procedure Activity
Demo
Lookup Activity
Pipeline Design

Design Pattern
Lookup Activity
Purpose
Retrieve a dataset

Supports
Any Azure Data Factory data source
Executing Stored Procedures
Executing SQL Scripts
Output parameters

Outputs
Single Value
Array / Object
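
The shape of the Lookup output depends on whether the activity returns only the first row. The Python sketch below builds a Lookup activity definition and the matching output expression for each case. The helper names, the dataset `ProductTable`, and the query are hypothetical, and an Azure SQL source is assumed.

```python
def lookup_activity(name, dataset_name, query, first_row_only=True):
    """Build a Lookup activity definition (assumes an Azure SQL source)."""
    return {
        "name": name,
        "type": "Lookup",
        "typeProperties": {
            "source": {"type": "AzureSqlSource", "sqlReaderQuery": query},
            "dataset": {"referenceName": dataset_name, "type": "DatasetReference"},
            "firstRowOnly": first_row_only,
        },
    }


def lookup_output_expression(activity, column=None):
    """Return the expression that reads this Lookup's result downstream."""
    base = f"@activity('{activity['name']}').output"
    if activity["typeProperties"]["firstRowOnly"]:
        # Single row: an object, optionally narrowed to one column.
        return f"{base}.firstRow.{column}" if column else f"{base}.firstRow"
    # Multiple rows: an array of objects.
    return f"{base}.value"


if __name__ == "__main__":
    single = lookup_activity("LookupMaxDate", "ProductTable",
                             "SELECT MAX(ModifiedDate) AS LastModified FROM dbo.Product")
    print(lookup_output_expression(single, "LastModified"))
```

The array form is typically what a ForEach activity iterates over, while the single-value form suits conditions and parameters.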
Lookup Activity
Demo
If Condition Activity
Pipeline Design
If Condition Activity

Purpose
If statement functionality
Boolean expression (True/False)

Supports
ADF Expressions and Functions
If True Activities
If False Activities
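
As an illustration, an If Condition activity definition has one Boolean expression plus two activity lists. The Python sketch below assumes an upstream Get Metadata activity named `Get Metadata1` that requested the Exists field; the Wait activities are placeholders standing in for real work, and every name is hypothetical.

```python
import json


def wait_activity(name, seconds):
    """Placeholder inner activity; a real pipeline would do actual work here."""
    return {"name": name, "type": "Wait",
            "typeProperties": {"waitTimeInSeconds": seconds}}


if_condition_activity = {
    "name": "IfFileExists",
    "type": "IfCondition",
    "dependsOn": [
        {"activity": "Get Metadata1", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        # Must evaluate to true or false.
        "expression": {
            "value": "@equals(activity('Get Metadata1').output.exists, true)",
            "type": "Expression",
        },
        "ifTrueActivities": [wait_activity("FileFound", 1)],
        "ifFalseActivities": [wait_activity("FileMissing", 1)],
    },
}

if __name__ == "__main__":
    print(json.dumps(if_condition_activity, indent=2))
```

Either list may be left empty when only one branch needs to do something.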
If Condition Activity
Demo
Data Flows
Overview
What are Data Flows
Purpose
Allows for data transformations

Items
Source
Transformations
Sink

How to Execute
Debug
Data Flow Activity

ADF code converted to Scala


Data Flows are executed on Apache Spark clusters managed by Data Factory
Automatic scaling-out as needed
What is Parquet?

File Format
Column-oriented data storage format (vs. row-oriented)

Benefits
Storage
Performance
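
The difference between row-oriented and column-oriented layouts can be illustrated without Parquet itself. The Python sketch below uses made-up sample data and only shows the layout idea; it does not read or write the actual Parquet file format.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"Name": "Road Frame", "ListPrice": 1431, "Weight": 2},
    {"Name": "Helmet", "ListPrice": 35, "Weight": 1},
    {"Name": "Sticker", "ListPrice": 0, "Weight": 0},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    return {key: [record[key] for record in records] for key in records[0]}


columns = to_columns(rows)

# A query on one column reads only that column's values,
# instead of scanning every field of every row.
total_list_price = sum(columns["ListPrice"])

if __name__ == "__main__":
    print(columns)
    print(total_list_price)
```

Storing similar values next to each other is also what tends to make columnar files compress well, which is the storage benefit listed above.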
Source

Available Options
Azure SQL Data Warehouse
Azure SQL Database
Cosmos DB
Azure Blob
ADLS Gen1/2
Synapse Analytics

Items
Minimum of 1 Source
Transformations

Available Options
New Branch
Join
Conditional Split
Derived Column
Lookup
Select
Sort
Filter
Etc…
Expressions

Visual Expression Builder


Certain transformations require the usage of
the ADF expression language

Debug
Shows a live, in-progress preview of the results
of the expression you are building
Sink

Available Options
Azure SQL Data Warehouse
Azure SQL Database
Cosmos DB
Azure Blob
ADLS Gen1/2
Synapse Analytics

Items
Minimum of 1 Sink
Setup

Business Scenario
• The business has requested a file that lists all of the products our
company sells. (Source)
• They also want the model description of each product, which comes from a
different table. (Lookup & Select)
• The shipping weight must be included, calculated by padding the actual
weight by 10% to account for packing. (Derived Column)
• Products with a list price of $0.00 should be excluded. (Filter)
• Finally, the data in the file must be ordered by list price, descending. (Sort &
Sink)
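
The scenario can be walked through in plain Python to show what each transformation contributes. This is only an analogy for the data flow, not ADF code: the sample rows, the column names, and the function `build_product_file` are made up for illustration.

```python
products = [
    {"Name": "Road Frame", "ModelID": 10, "Weight": 2.0, "ListPrice": 1431.50},
    {"Name": "Free Sticker", "ModelID": 11, "Weight": 0.1, "ListPrice": 0.00},
    {"Name": "Helmet", "ModelID": 12, "Weight": 0.5, "ListPrice": 34.99},
]
model_descriptions = {10: "Lightweight frame", 11: "Promo item", 12: "Sport helmet"}


def build_product_file(products, model_descriptions):
    # Lookup & Select: attach the description and keep only the needed columns.
    rows = [
        {
            "Name": p["Name"],
            "ModelDescription": model_descriptions.get(p["ModelID"]),
            "Weight": p["Weight"],
            "ListPrice": p["ListPrice"],
        }
        for p in products
    ]
    # Derived Column: pad the actual weight by 10% for packing.
    for row in rows:
        row["ShippingWeight"] = round(row["Weight"] * 1.10, 2)
    # Filter: drop products with a list price of 0.00.
    rows = [row for row in rows if row["ListPrice"] != 0]
    # Sort: highest list price first, ready to be written by the sink.
    return sorted(rows, key=lambda row: row["ListPrice"], reverse=True)


if __name__ == "__main__":
    for row in build_product_file(products, model_descriptions):
        print(row)
```

In the data flow itself, each of these steps is a separate transformation on the canvas between the source and the sink.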
Data Flow Overview
Demo
Scheduling a Pipeline
Triggers

Schedule trigger
Invokes pipeline on a wall-clock schedule

Tumbling window trigger


Operates on a periodic interval while also retaining state

Event-based trigger
Responds to an event
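
What sets a tumbling window trigger apart is that time is cut into fixed-size, contiguous, non-overlapping windows, each of which is tracked as its own run. The Python sketch below only illustrates how such windows are laid out; `tumbling_windows` is a hypothetical helper, not an ADF API.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, interval):
    """Return the complete (window_start, window_end) pairs between start and end."""
    windows = []
    window_start = start
    while window_start + interval <= end:
        windows.append((window_start, window_start + interval))
        window_start += interval
    return windows


if __name__ == "__main__":
    for begin, finish in tumbling_windows(
        datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 0), timedelta(hours=1)
    ):
        print(begin.isoformat(), "->", finish.isoformat())
```

Because each window has a defined start and end, a pipeline can use those two values to process exactly one slice of data per run, including slices in the past.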
Schedule Trigger

Schedule Recurrence:
Every Minute
Hourly
Daily
Weekly
Monthly

Pipeline Assignment
Multiple pipelines to single trigger
Assignment performed from pipeline
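
A schedule trigger definition combines a recurrence with the list of pipelines it starts. The Python sketch below builds one as a dictionary; `schedule_trigger`, the trigger name, and the pipeline names are hypothetical, and the frequency values are the ones I understand to correspond to the recurrence options listed above.

```python
import json

# Recurrence options from the slide, as frequency values.
FREQUENCIES = {"Minute", "Hour", "Day", "Week", "Month"}


def schedule_trigger(name, frequency, interval, start_time, pipeline_names):
    """Build a schedule trigger definition that starts one or more pipelines."""
    if frequency not in FREQUENCIES:
        raise ValueError(f"Unsupported frequency: {frequency}")
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,
                    "interval": interval,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            # One trigger can be attached to several pipelines.
            "pipelines": [
                {"pipelineReference": {"referenceName": p, "type": "PipelineReference"}}
                for p in pipeline_names
            ],
        },
    }


if __name__ == "__main__":
    trigger = schedule_trigger("DailyLoad", "Day", 1, "2024-01-01T00:00:00Z",
                               ["LoadProducts", "LoadSales"])
    print(json.dumps(trigger, indent=2))
```

A trigger created this way still has to be started (published and activated) before it fires.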
Schedule Triggers
Demo
Other ADF Features
Triggers

Lift and Shift


Executing SSIS packages stored in Azure
Using Azure resources, not on-prem resources

Power Query
Can use the online Power Query editor to
transform data in a pipeline

Flowlet
Store re-usable code

Data flow libraries (preview)


Custom functions using the expression builder for re-use
