Introduction to Azure Data Engineer
o Importance of ETL
o What is OLTP
o Limitation of OLTP Systems
o What is Data warehouse
o Limitations of traditional data warehouse
o What is ETL(Extract Transformation and Loading)
o What is ELT(Extract Loading and Transformation)
o ETL Tools in the Market
o Importance of Transformation
o What is ADF
o Why ADF
o Advantages compared to traditional ETL tools
o Different Modern Data warehouse systems in Market
Introduction to the Azure Data Engineering Products
o Azure SQL
o Azure Synapse
o Azure Data Lake
o Azure Data Factory
o Azure Databricks
o Azure Keyvault
o Azure Cosmos DB
Roles and Responsibilities of Azure Data Factory and Data Engineer
o Different types of data
▪ Structured
▪ Unstructured
▪ Semi-Structured
o Different Formats of Data
▪ CSV
▪ JSON
▪ Parquet
▪ AVRO
▪ ORC
o Different types of Triggers
▪ Scheduled Triggers
▪ Tumbling Window Triggers
▪ Event based Triggers
o Connectors overview
o Data Transformations using Mapping Data flows
Introduction to Azure Data Factory
o Difference between ADF version1 and version2
o Building Blocks of ADF
▪ Pipelines
▪ Activities
▪ Datasets
▪ Linked Services
▪ Integration Runtimes
● Azure Auto Integration Runtime
● Self Hosted Integration Runtime
● SSIS Integration Runtime
Introduction to Azure Portal
o How to create Azure Account
o Azure subscriptions
o What are resources and Resource Groups
o Why multiple resource groups are needed
o What are regions in azure
o Region Advantages
Azure Storage Introduction
o Blob Storage
o Data Lake Storage
o Differences between Data Lake Gen1 and Gen2
o Storage Redundancy options
▪ LRS(Locally Redundant Storage)
▪ ZRS(Zone Redundant Storage)
▪ GRS(Geo Redundant Storage)
▪ GZRS(Geo Zone Redundant Storage)
o Difference between blob storage and data lake gen2
o File Shares
o Queues
o Table Storage
o Load the data from Azure blob storage to Power BI
o Load the data from Azure Data Lake to Power BI
o Different Tiers in Azure Storage
▪ Hot
▪ Cool
▪ Achieve
o Storage Life Cycle Management
o RBAC (Role Based Access Control)
o ACL(Access Control Lists)
o Shared Access Signatures
o Access Keys
Azure Data Factory Practical's
o Create Azure Data Factory
o Create Azure SQL Resource
o Connect Azure SQL to SSMS
o Create first ADF pipeline
o Adding Client IP to Firewall settings
o Allow Azure Services in Firewall settings
Connectors overview
o Different data source connectors
o How to check connectors options
Copy Activity
o Genera Options
o Source Dataset in copy activity
o Sink Dataset in copy activity
o Mapping
o Settings
o User Properties
Scenario1
o Copy the file from one container to another container from same storage
account
o Copy the file from one storage account container to another storage account
container
Scenario2
o How to copy the data from Azure SQL to Azure Data Lake Gen2
o How to load the data into Azure SQL Table from Azure Data Lake Gen2
o How to delete the files on successful copy of entire folder contents
▪ Delete Activity Overview
● General
● Source
● Logging Settings
● User Properties
Scenario3
o How copy the files with some pattern to another container
▪ Get Metadata Activity Configure to the container
● Field List Options
● Child Items
● Exists
● Item Name
● Item Type
● Last Modified
o How to Get the File Size and delete the files with size greater than 4mb
▪ Configure Get Meta Data to the file
● Field List Options
▪ AND OR Functions in Expression Language
Scenario4
o Conditional Split with if condition
▪ Used activities
● Get Metadata
● Filter
● Foreach
● If Condition
Scenario5
o Send an email on Pipeline Failure
▪ Web Activity
● Settings
▪ Logic Apps Integration
Scenario6
o Migrate all tables of database into datalake with single pipeline
▪ Lookup Activity --> Foreach Activity --> Copy Activity --> with
Parameters
Scenario7
o How to apply more than one conditional split in ADF Activities
▪ Switch Activity
Scenario8
o How to run the pipeline until file arrives in Datalake Container
▪ Until Activity
Scenario9
o How to load the filtered data to the azure sql table/Synapse table
▪ Stored Procedure activity
● Pass output of previous activity to the stored procedure
parameters
Scenario10
o Perform Transformation using script activity
▪ Script Activity
Scenario11
o Use cases of Execute pipeline activity
o Use cases of Validation Activity
o Use cases of Wait Activity
o Use cases of Set and Append Variable Activity
Recap of all the activities
o Copy Data, Append Variable, Delete, Execute Pipeline, Get Metadata,
Lookup, Stored Procedure, Script, Set Variable, Validation, Web Activity, Wait,
Filter Activity, For Each Activity, If Condition, Switch, Until
Parameterization
o Parameterize datasets, Linked Services
o Bulk load of data from Azure SQL to Data Lake Storage
Data Flows Introduction
o ELT(Extract Load and Transform)
o Two DataFlows in ADF
o DataFlow Activity in Pipeline
o Mapping Data Flows
o Configuring Mapping Dataflows as Dataflow Activity in PipeLine
o DataFlow Debug Explanation
o Introduction to Transformations
▪ Source
▪ Sink
▪ Union
▪ Filter
▪ Select
▪ Derived Column
▪ Join, etc.
o Difference between source , sink of “Copy Data” activity And source , sink of
“Mapping Data Flows”.
Dataflow Scenario1
o Source as (datalake) Sink as (SQL table) --> by using dataflow source and
sink transformation
Extract data from RDBMS to Datalake --> Apply Filter
o Filter Transformation (Source --> Filter --> Sink) Execute with Dataflow
Activity
Data Flow Scenario on Select Transformation
o Select Transformation
▪ Renaming the columns
▪ Droping the columns
▪ Reordering the columns
▪ Flow
● Source --> Select --> Sink (Execute with Mapping Dataflow)
Derived Column use case
o Derived column Transformation
▪ Clean existing column data
▪ Generate new column by using expression language
▪ Generate Hierarchal Columns
▪ Type casting using derived column transformation
▪ Columns condition
▪ Column pattern
Union Transformation use cases
o Combining one or more datasets with same schema and write output to single
file
o Combining one or more datasets with different schema and write output to
single file
o Union by
▪ Name
▪ Position
o Select with union
Branching Transformation and its use cases
Exists Transformation
o Exists transformation and its use cases
▪ Exists with equality condition
▪ Exists with non-equality condition
● Inconsistent results handling with non-equality conditions
● Fixed broadcast settings
o Exists with Custom Expression
Lookup transformation and its use cases
o Overview of lookup transformation
o Lookup using
▪ Any row
▪ First row
▪ Last row
▪ Match Multiple rows
o Lookup with non-equality conditions
Conditional split transformation and its use cases
o Distribute data into to multiple datasets based on given condition
o Split datasets on
▪ First Matching Condition
▪ All Matching Condition
Join Transformation and its use cases
o Full outer
o Inner
o Left Outer
o Right Outer
o Custom Cross
Aggregate Transformation and its use cases
o Removing duplicates using aggregate
o Single grouping with single aggregation
o Single grouping with multiple aggregation
o Multi groping with multiple aggregation
Pivot Transformation
o What is pivot transformation
o Difference between Aggregate transformation and Pivot Transformation
o Implement pivot transformation in dataflow
o How to clean pivot output
o How to call multiple dataflows in a single pipeline
o Why we used multiple dataflows in one single pipeline
Unpivot Transformation
o Unpivot transformation
o What should be the input for Unpivot (pivoted output file).
o 3 configurations :
▪ Ungrouping column
▪ Unpivoted column (column names to as column values)
▪ aggregated column expression (which row aggregated values to be
turned as column values)
o Difference between output of “aggregate” and “unpivot” transformation
o What additionally unpivot produces
o Use case of Unpivot
Surrogate Key Transformation
o Surrogate Key transformation
o Why we should use Surrogate key
o Configuring Starting Integer Number for Surrogate Key
o Scenario of Surrogate Key
Rank Transformation
o Rank Transformation in dataflows
o Why sorting data is required for Rank transformation
o Sorting options
▪ Ascending
▪ Descending”
o When to use “Ascending” option for Ranking
o When to use “Descending” option for Ranking
o Dense Rank and How it Works
o Non Dense Rank (Normal Rank) and How it Works
o Why we should not use “surrogate key” for Ranking
o What is difference between Dense Rank and Non Dense Rank(Normal Rank)
Window Transformation
o Over
o Sort
o Range by
o Window columns
o Cumulative Average
o Cumulative Sum
o Cumulative Max
o Dense Rank for each partition.
o Making all rows as Single Window and apply Cumulative Aggregation
Stringify Transformation
o Reading Json data
o Reading a single Json record
o Reading array of documents
o Converting complex types into String using Stringify and derived column
transformations
Parse Transformation
o Parse transformation in Mapping DataFlows
o How to handle and parse string collection( delimited string values )
o How to handle and parse xml data
o How to handle and parse json data
Flatten Transformation
o Converting Complex Json nested structures into CSV/Text file.
o Complex data processing (transformations).
o How parse nested Json records
o How to flatten array of values into multiple rows.
o If data has complex structures, what are supportive data store formats .
o How to write into json format.
o How to write into flatten file format( csv).
o Transformations used to process data.
▪ Parse
▪ Flatten
▪ Derived Column
▪ Select
▪ Sink
Implementation of Incremental data loading using ADF Activities
Implementation of SCD (Slowly Changing Dimensions)
o SCD Type 1
o SCD Type 2
o SCD Type 3
o SCD Type 4
o SCD Type 6
Data Loading to Synapse from Azure Data Lake
o Copy Command
o Polybase
o Bulk Insert
o Upsert
ADD ON:
● Real-time Project Explanation
● Resume Preparation
● Interview Questions Discussions