Dynamics data in Synapse Link for Dataverse
- Transitioning from Export to Azure Data Lake to Synapse Link
Draft shared with customers under NDA, Ver 0.7, Aug-2023
Agenda
1. What is Synapse Link for Dataverse
2. What is Link to Microsoft Fabric
3. Transitioning from Export to Data Lake
4. Architecture patterns & transition guidance
5. Additional resources
Synapse Link for Dataverse: Unlock Dynamics data for analytics

Microsoft Dataverse
Enterprise-grade low-code data platform that manages your data and business logic across the cloud: Business Logic, AI + Analytics, Integration, App + Data Lifecycle, Data Storage, Security + Governance.

Synapse Link for Dataverse
• One-click export of Dataverse data into Synapse for analytics
• Continuous replication of standard and custom tables to Azure Synapse Analytics
• Now supports F&O tables & entities

(Diagram: Dynamics 365 Sales, Customer Service, Field Service, Marketing, and Finance and Operations feed Dataverse, which links to Synapse.)
Dynamics 365 and Dataverse: No-copy data integration with Microsoft Fabric

Microsoft Fabric
Data analytics for the era of AI
(Diagram: Fabric workloads on OneLake: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehouse, Synapse Real-Time Analytics, Power BI, Data Activator.)
In public preview
Power Platform & Power Apps with Microsoft Fabric
Dynamics & Dataverse data is available in Microsoft Fabric immediately: no ETL, no data copy.
(Diagram: Dynamics 365 Sales, Customer Service, Field Service, Marketing, and Finance & Supply Chain feed Dataverse, which links to Microsoft OneLake and Microsoft Fabric.)
In public preview
Power Platform & Power Apps with Microsoft Fabric
Dynamics & Dataverse data is available in Microsoft Fabric immediately, and makers can build apps with insights from Microsoft OneLake: no ETL, no data copy.
(Diagram: Power Automate, Power Pages, Power Virtual Agents, Power Fx, and AI Builder connect through Dataverse to Microsoft OneLake and Microsoft Fabric.)
What's changing
• Why transition?
• Transition business benefits
• Expected effort

Previously: Synapse Link & Export to Data Lake
Option 1: Microsoft OneLake with Fabric (Link to Microsoft Fabric)
Option 2: Synapse Link – export to your data lake (BYOL)
(Option 2 diagram: live data in Delta/Parquet and incremental data in CSV land in your data lake, consumed by optimized Apache Spark, enterprise data warehousing, Fabric near-real-time reporting, Azure Machine Learning, data integration and orchestration, Synapse workspace serverless SQL query, and more; Azure Data Factory can sink data to any destination with 90+ connectors.)
Demo: Dataverse Link to Synapse & Fabric
Comparison

| Criteria | (Legacy) Export to data lake | (Option 2) Synapse Link BYOL (CSV) | (Option 2) Synapse Link BYOL (Delta) | (Option 1) Microsoft OneLake with Fabric |
|---|---|---|---|---|
| Ease of administration | Customer-managed PaaS resources: Key Vault, storage account, Synapse & CDMUtil | Customer-managed PaaS resources: storage account & Synapse | Customer-managed PaaS resources: storage account & Synapse | Managed by Microsoft |
| Security | No firewall support on storage account and Synapse workspace | Firewall support on storage account | Firewall support on storage account | Managed by Microsoft |
| Query performance | Data format: CSV. Potential read-write contention when reading CSV directly; high read cost | Data format: CSV. No read-write contention when reading completed incremental update folders | Data format: Delta. No read-write contention, as Delta supports ACID (atomicity, consistency, isolation, durability); low cost and better read performance | Data format: Delta |
| End-to-end data freshness | Data freshness 10-15 min; higher data conversion time | Data freshness (configurable) 15 minutes to 24 hours | Data freshness (configurable) 15 minutes to 24 hours; data ready for reports | Data freshness ~1 hour; data ready for reports |
| Data write and storage costs | Data lake storage + transactions | Data lake storage + transactions | Data lake storage + transactions + Delta Lake conversion/maintenance | Dataverse capacity (entitlement + add-on) |
| Compute cost | Query cost + data conversion + maintenance of the pipeline as PaaS solutions | Query cost + data conversion + maintenance of the pipeline as PaaS solutions | Query cost + maintenance of the pipeline as PaaS solutions (simpler PaaS) | Fabric capacity cost |
Cost considerations

| Cost consideration | Export to data lake (CSV) | Synapse Link BYOL (CSV) | Synapse Link BYOL (Delta) | Synapse Link OneLake (Delta) |
|---|---|---|---|---|
| Storage + transaction cost | Storage cost ($) | Storage cost ($) | Less storage cost ($) with Delta | Dataverse capacity + add-on |
| Data conversion and transformation cost | Materialize Parquet, SQL Server or Synapse DW ($$$) + maintenance cost of PaaS ($) | Materialize Parquet, SQL Server or Synapse DW ($$$) + maintenance cost of PaaS ($) | Delta conversion via Spark pool ($$$) | – |
| Data read cost (example: query via Synapse serverless, Excel, or Power BI refresh) | Directly reading CSV: high read cost ($$$); reading converted data ($) | Reading converted data ($) | Delta: lower read cost ($) / Fabric capacity | Fabric capacity |
• Fabric Capacities – Everything you need to know about Microsoft Fabric capacities for purchase
• Azure Data Lake Storage cost
• Azure Synapse Analytics cost
• Microsoft Fabric Licensing: An Ultimate Guide
Benefits of transitioning from Export to Data Lake
1. Data from all Dynamics apps available in a simple experience
2. Configures & connects Synapse and Microsoft Fabric for you
3. Saves data as Parquet & Delta Lake (faster queries, smaller files, no update conflicts)
4. You can set up firewalled storage accounts or use the built-in OneLake
5. Materializes F&O entities
6. Table-count limitations go away (choose as much data as you like)
7. Available in all regions (including local regions)
8. Support for memo and large string fields
9. Translated F&O enums
What is changing?

| Area | What is changing | Effort | Value add |
|---|---|---|---|
| Admin and setup experience | LCS to Power Platform maker portal for setup and adding tables | Admin user training | Data from all Dynamics apps available in a simple experience |
| Tables available to use | Tables must have row version change tracking enabled; common OOB tables are already enabled | One-time small X++ effort to extend missing OOB or custom tables | Table-count limit no longer applies; default support for large memo fields |
| Change tracking mechanisms | CDC to row version change tracking | No effort | Lower impact on the operational database; no full refresh needed to recover from failures, database failovers, or movements |
| Integration with Synapse and Microsoft Fabric | No need for CDMUtil | Saved effort | Configures & connects the Synapse workspace and Microsoft Fabric |
| Data format | CSV format for incremental data, Delta Lake format for final data; BYOL: customer Synapse Spark for Delta conversion | No effort | Faster queries, smaller files, no update conflicts; integrated with Microsoft Fabric, the data platform for the age of AI |
| Data transformation/ETL | Adapt to any small audit-field changes in ETL and data locations as applicable | Low effort to adapt | |
| Incremental data integration | Adapt to the Synapse Link folder structure, audit fields, and incremental logic | Low effort: adjust the incremental pipelines to the new structure; FastTrack guidance and samples available | Simpler logic to process full and incremental data with a single pipeline |
| BI & reporting | No change expected | | |
What are the risks of transitioning?
• Synapse Link for Dataverse has been GA for a long time and is used by thousands of D365 CE and Power Platform customers
• F&O entity and table support is GA (Sept 15, 2023)
• With Synapse Link you can still use your data lake + Synapse; that does not change
• Transitioning tables from Export to Data Lake to Synapse Link F&O tables requires minimal effort; only a small property change is required to export custom tables with Synapse Link
• Follow the established architecture patterns to transition
Microsoft Product Availability and Roadmap
Generally available
• Synapse Link for Dataverse – GA
• F&O data in Synapse Link for Dataverse (GA Sept 2023)
Public previews
• Microsoft Fabric link
Coming soon
• Enable managed lake for F&O tables
• Enum numeric values
• Spark 3.3 support, reducing Delta conversion cost
• One-click transition from Export to Data Lake tables to Synapse Link
Deprecation
• Finance and Operations Export to Data Lake (deprecated Oct 2023, supported until Oct 2024): new customers start with Synapse Link; existing customers can easily transition to Synapse Link.
Architecture patterns and transition guidance
Common analytics architecture patterns and transition strategy from Export to Data Lake to Synapse Link and Microsoft Fabric
Overview of common analytics architecture patterns
#1 Virtual data warehousing: reading data from the data lake using T-SQL views and stored procedures
#2 Lakehouse: reading and transforming data using Spark notebooks
#3 Cloud data warehousing: ingesting data from the lake into a cloud MPP data warehouse, transforming data using T-SQL
#4 Integration using T-SQL: incremental ingestion of data into a relational DB (SQL Server) or third-party DW
#1 Virtual data warehousing: Synapse Link & Export to Data Lake, with Synapse Serverless (old pattern)
#1 Virtual data warehousing: Synapse Link (BYOL), with Synapse Spark & Serverless
#1 Virtual data warehousing: Synapse Link (BYOL), with Synapse Spark & Microsoft Fabric
#1 Virtual data warehousing: Microsoft OneLake, with Microsoft Fabric (Link to Microsoft Fabric)
#2 Lakehouse: Synapse Link/Export to Data Lake, with Synapse Spark & Serverless (old pattern)
#2 Lakehouse: Synapse Link (BYOL), with Synapse Spark & Serverless
#2 Lakehouse: Synapse Link (BYOL), with Synapse Spark & Microsoft Fabric
#2 Lakehouse: Microsoft OneLake, with Fabric (Link to Microsoft Fabric)
#3 Cloud data warehousing: Synapse Link/Export to Data Lake, with Synapse DW (old pattern)
#3 Cloud data warehousing: Synapse Link (BYOL), with Synapse Spark & Microsoft Fabric
#3 Cloud data warehousing: Microsoft OneLake, with Fabric (Link to Microsoft Fabric)
#4 Integration using SQL: Synapse Link/Export to Data Lake with Azure SQL, on-prem SQL, or third party, using Synapse
#4 Integration using SQL: Synapse Link – export data to your lake with Azure SQL, on-prem SQL, or third party, using Synapse
#5 BYOD replacement: Synapse Link, bring your own data lake, and Microsoft Fabric
#5 BYOD replacement: Synapse Link direct link with Microsoft Fabric
Appendix
• Links to Tech Talks
• Product screenshots and details
• Documentation

Resources
Documentation
• Synapse Link
  • Azure Synapse Link - Power Apps | Microsoft Learn
  • Create an Azure Synapse Link for Dataverse with your Azure Synapse Workspace - Power Apps | Microsoft Learn
  • Export Microsoft Dataverse data in Delta Lake format - Power Apps | Microsoft Learn
  • Choose finance and operations data in Azure Synapse Link for Dataverse - Power Apps | Microsoft Learn
  • Use managed identities for Azure with your Azure data lake storage - Power Apps | Microsoft Learn
• Microsoft Fabric
  • Dataverse direct integration with Microsoft Fabric - Power Apps | Microsoft Learn
  • Microsoft Fabric documentation - Microsoft Fabric | Microsoft Learn
• Yammer group
  • Viva Engage - Synapse Link for Dynamics
Videos
• Dataverse
  • Dynamics 365 Bites - Synapse Link for finance and operations apps - Getting started (preview) | August 2023
  • Build 2023: Dataverse integration for Microsoft Fabric - YouTube
• Microsoft Fabric
  • Microsoft Fabric: Satya Nadella at Microsoft Build 2023 - YouTube
  • Microsoft Fabric Launch Digital Event (Day 1) - YouTube
Allow row version change tracking
• The row version change tracking property must be enabled on tables
• 1600+ OOB tables already have it enabled (the tables most commonly used with Export to Data Lake)
• The property is extensible: a developer can extend it for any table that is not enabled OOB
• Custom and ISV tables need the property enabled to make them available (one-time small dev work)
CSV vs Delta Lake

| Factor | CSV | Delta Lake |
|---|---|---|
| File size & storage efficiency | Text format, larger file size, no built-in compression | Columnar format, smaller file size, built-in compression |
| ACID transactions | Not supported; may lead to data corruption during write failures | Supported; ensures data integrity |
| Performance | Slower read/write due to row-oriented storage | Faster read/write due to columnar storage and statistics |
| Data virtualization | Less efficient due to performance and schema evolution limitations | More efficient due to performance optimizations and schema evolution support |
| Data processing (transformation) | Not optimized: longer processing times, higher resource usage | Optimized for efficient data processing |
| Schema evolution | Schema changes may require rewriting data (Export to Data Lake supports schema drift on new columns) | Built-in support for schema evolution |
| Metadata handling | No built-in metadata handling; managed separately (Export to Data Lake manages metadata using cdm.json) | Built-in metadata handling alongside the data |
| Compatibility | Widely supported, easy to use with various tools and platforms | Open-source data format, widely used for BI and analytics workloads |
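To make the contrast above concrete, here is a minimal PySpark sketch reading the same table in both formats; the paths are illustrative assumptions, not locations defined by the product.

```python
# Minimal PySpark sketch contrasting the two formats above.
# Paths are illustrative assumptions, not product-defined locations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# CSV: row-oriented text; the schema must be supplied or inferred on every read.
df_csv = (spark.read
          .option("header", True)
          .option("inferSchema", True)  # extra pass over the data
          .csv("/lake/incremental/account/*.csv"))

# Delta: columnar Parquet plus a transaction log; schema and statistics are
# stored with the table, and readers always see a consistent snapshot (ACID).
df_delta = spark.read.format("delta").load("/lake/deltalake/account")
```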
Comparing incremental change technologies

| Feature/aspect | CDC (Export to data lake) | Change tracking (BYOD) | Row version CT (Synapse Link) |
|---|---|---|---|
| What it captures | Changes (inserts, updates, deletes) with historical data | Which rows have changed (insert, update, delete), without historical data | Changes via a version column; doesn't provide historical data |
| Overhead | Low to moderate | Low to moderate | Low; only the version number increments |
| Storage | Stored in change tables separate from the original table | Metadata only, so less storage overhead | An additional integer column in the table for the version |
| Query mechanism | System functions and change tables | CHANGETABLE function | Standard SQL queries using the version column |
| System impact | Log and I/O impact due to the logging of changes | Querying with CHANGETABLE on large datasets over extended time ranges can cause blocking | Minimal impact; only an increment operation |
| Recovery from failure/failover/DR | Full export if the CDC retention period is missed, or on failover or disaster recovery | Full export if CT retention is missed | Changes are retained in the database; no full export needed |
Data sync state column differences

| Synapse Link column | Description | Equivalent columns in Export to data lake |
|---|---|---|
| Id | Unique value for each row, like RecId, generated for compatibility between the F&O and CE platforms | Not applicable |
| SinkCreatedOn | Datetime stamp when the data is written to the data lake | DataLakeModified_DateTime |
| SinkModifiedOn | Datetime stamp when the data was updated in the delta lake | DataLakeModified_DateTime, LastProcessedChange_DateTime |
| versionnumber | An automatically incremented binary number that updates every time a row is inserted or modified in a table; can be used as a marker for incremental data processing (see the sketch below) | LSN |
| isDelete | Rows that are hard deleted in the source are marked isDelete = true in the incremental CSV file | NA |
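As a hedged illustration of using versionnumber and isDelete this way, the PySpark snippet below filters one incremental load against a stored watermark. The toy DataFrame, its schema, and the watermark value are assumptions of this sketch, not part of the product.

```python
# Hedged PySpark sketch: versionnumber as an incremental watermark, isDelete
# to split deletes from upserts. The toy data stands in for one incremental
# load with column names already applied (assumed schema).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("A1", 101, False), ("A2", 102, True), ("A3", 99, False)],
    ["Id", "versionnumber", "isDelete"],
)

last_mark = 100  # assumed: watermark persisted by your pipeline

changed = df.filter(F.col("versionnumber") > last_mark)
deletes = changed.filter(F.col("isDelete")).select("Id")  # hard-deleted rows
upserts = changed.filter(~F.col("isDelete"))              # rows to insert/update

# Advance the watermark to the largest version processed in this run.
new_mark = changed.agg(F.max("versionnumber")).first()[0]
```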
Understanding the incremental folder

Incremental folder feature
• An incremental profile is created in the maker portal with a storage account only.
• When a table is added to the incremental folder profile, the system starts an initial export; when it completes, the table is marked Active.
• In the Active state, incremental changes to the tables are copied to the data lake in near real time.
• Initial data as well as incremental data follow the same pattern and folder structure in the data lake.
• Folders are named with their creation timestamp, in the format "yyyy-MM-dd'T'HH.mm.ss.SSZ".
• Inside the timestamp folder, child folders named "{tablename}" contain the CSV data.
• When the data export starts, a model.json file is created with 0 bytes. You should not read data from the folder until the model.json file is updated (see the readiness-check sketch below).
• Once the data write completes or the update frequency is reached (for example, every 15 minutes), model.json is updated with the tables' metadata under the timestamp folder to mark the folder as complete.
• Once the previous folder is complete, a new folder is created and the process continues.
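A minimal sketch of that readiness check, assuming the azure-storage-file-datalake Python SDK and placeholder account, container, and credential values: a timestamp folder is treated as complete only once its model.json is non-empty.

```python
# Minimal sketch of the model.json readiness check described above, using the
# azure-storage-file-datalake SDK. Account, container, and credential are
# placeholders; adapt to your environment.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential="<account-key-or-credential>",
)
fs = service.get_file_system_client("<container>")

def completed_timestamp_folders():
    """Yield timestamp folder names whose model.json has been updated (size > 0)."""
    for path in fs.get_paths(recursive=False):
        if not path.is_directory:
            continue
        try:
            props = fs.get_file_client(f"{path.name}/model.json").get_file_properties()
        except Exception:
            continue  # no model.json yet: folder is still being written
        if props.size > 0:
            yield path.name
```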
Incremental folder data processing – ideas

| Type | Trigger point | Pre-processing | Data read | Data copy | Post-processing | Examples |
|---|---|---|---|---|---|---|
| Event based | Storage event: when the model.json file under a timestamp folder is updated and its size is not 0 | Pass the {timestampfolder1} | Read each table's {timestampfolder1}/{tablename} CSV data | De-duplicate the data on VersionNumber and Id; upsert the rows matching on Id | | Coming soon |
| Batch mode | Specified time interval | Create a high-watermark marker (folder name); identify the list of folders completed since the last watermark {folder1, folder2} | Read each folder's {folder1}/{tablename}, {folder2}/{tablename} CSV folders | Delete the rows where IsDelete = true | Update the marker with the largest processed timestamp folder name | See the PySpark merge sketch below |
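A hedged PySpark sketch of the batch-mode idea, assuming Delta Lake is available and using placeholder paths, a placeholder table name, and the assumption that column names are already applied (in practice the incremental CSV files are headerless and take their schema from model.json):

```python
# Hedged PySpark sketch of the batch-mode idea above: read the completed
# timestamp folders since the last watermark, keep the latest change per Id,
# then merge into a Delta target. Paths, the table name, and the header
# assumption are placeholders, not the product's implementation.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.getOrCreate()

folders = ["2023-08-01T00.00.00.00Z", "2023-08-01T00.15.00.00Z"]  # since last watermark
paths = [f"abfss://<container>@<account>.dfs.core.windows.net/{f}/salesorder/*.csv"
         for f in folders]
raw = spark.read.option("header", True).csv(paths)  # assumed: column names applied

# De-duplicate: keep only the latest change per Id (largest versionnumber).
w = Window.partitionBy("Id").orderBy(F.col("versionnumber").desc())
latest = raw.withColumn("rn", F.row_number().over(w)).filter("rn = 1").drop("rn")

target = DeltaTable.forPath(spark, "/lake/silver/salesorder")  # assumed target
(target.alias("t")
    .merge(latest.alias("s"), "t.Id = s.Id")
    .whenMatchedDelete(condition="s.isDelete = true")  # hard deletes from source
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
# Post-processing: persist the largest processed folder name as the new watermark.
```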