Data Platform and Analytics Foundational Training: (Speaker Notes)

Azure Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. It lets users create data pipelines that move and transform data between data stores on a specified schedule. Pipelines contain data movement and data transformation activities, which can run on compute services such as Azure HDInsight (Hadoop), Azure SQL, and Azure Data Lake Analytics.

Uploaded by

Kathalina Suarez

Microsoft C+E Technology Training

Data Platform and Analytics Foundational Training
Solution Area: Data Analytics
Solution: Advanced Analytics
Technology: Data Factory

[Speaker Notes]
Azure Data Factory Service
A data integration service in the cloud

Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data.
Azure Data Factory Service
A data integration service in the cloud

The Data Factory service allows you to create data pipelines that move and transform data, and then run the pipelines on a specified schedule (hourly, daily, weekly, etc.).
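As a rough illustration of what running on a specified schedule means, the sketch below splits a time range into fixed-length run windows. The helper `schedule_windows` and the `INTERVALS` table are hypothetical names invented for this example and are not part of any Data Factory API.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from a schedule name to a window length.
INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def schedule_windows(start, end, frequency):
    """Return the full (window_start, window_end) pairs that fit in [start, end)."""
    step = INTERVALS[frequency]
    windows = []
    current = start
    while current + step <= end:
        windows.append((current, current + step))
        current += step
    return windows


if __name__ == "__main__":
    # One day split into hourly run windows.
    for w_start, w_end in schedule_windows(
        datetime(2016, 1, 1), datetime(2016, 1, 2), "hourly"
    ):
        print(w_start.isoformat(), "->", w_end.isoformat())
```

Each window stands for one scheduled run of a pipeline over the data belonging to that period.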
Azure Data Factory Service
Pipelines and Activities

In a Data Factory solution, you create one or more data pipelines.
Activities define the actions to perform on your data:
Data movement
Data transformation
Azure Data Factory Service
Activities » Data Movement

The Copy Activity in Data Factory copies data from a source data store to a sink data store.
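A Copy Activity is typically declared rather than coded. The sketch below builds such a declaration as a Python dictionary and prints it as JSON. The field layout is an assumption modeled on Data Factory's JSON pipeline definitions, and the names `CopyBlobToSql`, `InputBlobDataset`, and `OutputSqlDataset` are made up for illustration. Check the official reference before relying on any field.

```python
import json

# Illustrative only: names are hypothetical and the field layout is an
# assumption modeled on Data Factory's JSON pipeline definitions.
copy_activity = {
    "name": "CopyBlobToSql",
    "type": "Copy",
    "inputs": [{"name": "InputBlobDataset"}],    # dataset read from the source store
    "outputs": [{"name": "OutputSqlDataset"}],   # dataset written to the sink store
    "typeProperties": {
        "source": {"type": "BlobSource"},
        "sink": {"type": "SqlSink"},
    },
}

print(json.dumps(copy_activity, indent=2))
```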
Azure Data Factory Service
Activities » Data Movement » Azure
Data Store Supported as Source Supported as Sink
Azure Blob storage   
Azure Data Lake Store   
Azure SQL Database   
Azure SQL Data Warehouse   
Azure Table storage   
Azure DocumentDB   
Azure Search Index  
Azure Data Factory Service
Activities » Data Movement » Databases
Data Store Supported as Source Supported as Sink
SQL Server *  
Oracle *  
MySQL *  
DB2 *  
Teradata *   
PostgreSQL *   
Sybase *   
Cassandra *
MongoDB *
Amazon Redshift

* Data store can be on-premises or on Azure IaaS, and requires you to install Data Management Gateway
Azure Data Factory Service
Activities » Data Movement » File
Data Store Supported as Source Supported as Sink
File System *   
HDFS *   
Amazon S3   
FTP  

* Data store can be on-premises or on Azure IaaS, and requires you to install Data Management Gateway
Azure Data Factory Service
Activities » Data Movement » Others
Data Store Supported as Source Supported as Sink
Salesforce  
Generic ODBC *   
Generic OData   
Web Table (HTML)   
GE Historian *  

* Data store can be on-premises or on Azure IaaS, and requires you to install Data Management Gateway
Azure Data Factory Service
Activities » Data Transformation

Data Factory supports numerous transformation activities that can be added to pipelines either individually or chained with another activity.

Data Transformation Activity | Compute Environment
Hive | HDInsight [Hadoop]
Pig | HDInsight [Hadoop]
MapReduce | HDInsight [Hadoop]
Hadoop Streaming | HDInsight [Hadoop]
Machine Learning activities: Batch Execution and Update Resource | Azure VM
Stored Procedure | Azure SQL, Azure SQL Data Warehouse, or SQL Server
Data Lake Analytics U-SQL | Azure Data Lake Analytics
.NET (custom activity) | HDInsight [Hadoop] or Azure Batch
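One way to picture chaining is as a dependency between activities. This is a minimal sketch, assuming an activity is chained to another when it consumes a dataset the other produces. The activity and dataset names are hypothetical, and the ordering logic is ordinary Python (`graphlib`, Python 3.9+), not Data Factory's own scheduler.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical activities: each lists the datasets it reads and writes.
activities = {
    "HiveCleanLogs": {"inputs": ["RawLogs"], "outputs": ["CleanLogs"]},
    "ScoreChurn": {"inputs": ["CleanLogs"], "outputs": ["ChurnScores"]},
    "CopyToWarehouse": {"inputs": ["ChurnScores"], "outputs": ["WarehouseTable"]},
}


def execution_order(acts):
    """Order activities so each runs after the activities that produce its inputs."""
    # Map every output dataset to the activity that produces it.
    producer = {ds: name for name, a in acts.items() for ds in a["outputs"]}
    # For each activity, collect the activities it depends on.
    graph = {
        name: {producer[ds] for ds in a["inputs"] if ds in producer}
        for name, a in acts.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(execution_order(activities))
```

An activity whose inputs are not produced by any other activity has no predecessors, which matches the case of an activity added to a pipeline individually.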
Azure Data Factory Service
Linked Data Services

Linked services define the information needed for Data Factory to connect to external resources.
Linked services are used for two purposes:
To represent a data store including, but not limited to, an on-premises SQL Server database, Oracle database, file share, or an Azure Blob Storage account
To represent a compute resource that can host the execution of an activity
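The two purposes can be sketched as two declarations, one for a data store and one for a compute resource. The names, placeholder values, and the helper `purpose` are hypothetical, and the field layout is an assumption modeled on Data Factory's JSON definitions rather than a verified schema.

```python
# Illustrative only: names and placeholder values are hypothetical, and the
# field layout is an assumption modeled on Data Factory's JSON definitions.
storage_linked_service = {
    "name": "StorageLinkedService",
    "properties": {
        "type": "AzureStorage",  # represents a data store
        "typeProperties": {"connectionString": "<storage connection string>"},
    },
}

compute_linked_service = {
    "name": "HDInsightLinkedService",
    "properties": {
        "type": "HDInsight",  # represents a compute resource
        "typeProperties": {"clusterUri": "<cluster uri>"},
    },
}

# Hypothetical classification used only for this example.
DATA_STORE_TYPES = {"AzureStorage"}
COMPUTE_TYPES = {"HDInsight"}


def purpose(linked_service):
    """Return which of the two purposes a linked service serves."""
    kind = linked_service["properties"]["type"]
    if kind in DATA_STORE_TYPES:
        return "data store"
    if kind in COMPUTE_TYPES:
        return "compute"
    return "unknown"


print(purpose(storage_linked_service), purpose(compute_linked_service))
```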
Azure Data Factory Service
Datasets

Linked services link data stores to a data factory.
Datasets represent data structures within the data stores.
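The relationship can be sketched as a dataset that names the linked service it lives in. The dataset name, folder path, and the helper `unresolved_linked_service` are hypothetical, and the field layout is again an assumption modeled on Data Factory's JSON definitions.

```python
# Hypothetical set of linked service names already defined in the data factory.
linked_services = {"StorageLinkedService"}

# Illustrative dataset: a folder of files inside the linked data store.
dataset = {
    "name": "CallLogFiles",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {"folderPath": "calllogs/"},
    },
}


def unresolved_linked_service(ds, known):
    """Return the referenced linked service name if it is not defined, else None."""
    name = ds["properties"]["linkedServiceName"]
    return None if name in known else name


print(unresolved_linked_service(dataset, linked_services))
```

A check like this catches a dataset that points at a data store the data factory has not been linked to.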
Azure Data Factory Service
Relationship Between Data Factory entities

Data Factory has a few key entities that work together to define:
Input and output data
Processing events, and
The schedule and resources required to execute the desired data flow
Azure Data Factory Service
Data Flow Concepts
Datasets: collection of files, database table, etc.
Activity: a processing step (Hadoop job, custom code, ML model, etc.)
Pipeline: a sequence of activities (logical group)

[Diagram: Data Sources, Ingest, Transform & Analyze, Publish. Labels: Call Log Files; Customer Table (On Premises); Transform, Combine, etc.; Customer Call Details; Analyze; Customers Likely to Churn Table; Move; Churn Data Mart; Visualize; Azure Data Lake; Azure DW]
Azure Data Factory Service
Supported Regions

Currently, you can create data factories in the following three regions:
West US
East US, and
North Europe

However, a data factory can access data stores and compute services in other Azure regions to move data between data stores or process data using compute services.
Azure Data Factory Service
Supported Regions

Data Factory itself does not store any data.
It lets you create data-driven flows to orchestrate movement of data between supported data stores and processing of data using compute services in other regions or in an on-premises environment.
It also allows you to monitor and manage workflows using both programmatic and UI mechanisms.
Azure Data Factory Service
Supported Regions

Even though Azure Data Factory is available in only three regions, the service powering the data movement in Data Factory is available globally in several regions.
If a data store sits behind a firewall, a Data Management Gateway (DMG) installed in the on-premises environment moves the data instead.
Azure Data Factory Service
Resources

Introduction to Azure Data Factory Service, a data integration service in the cloud
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-introduction

Tutorial: Build your first pipeline to process data using Hadoop cluster
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-build-your-first-pipeline
© 2016 Microsoft Corporation. All rights reserved. Microsoft, Windows, Microsoft Azure, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The
information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO
WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION
