DP-203 Resources

The document provides guidance on designing and implementing data storage and logical data structures in Azure. It includes designing storage solutions like Data Lake, partitioning strategies, serving layers with star schemas and slowly changing dimensions, and implementing physical structures like compression, sharding, and archiving. It also covers building temporal data solutions and external tables.

Uploaded by Forjohna Shaik

1. Design and Implement Data Storage (40-45%)


    1. Design a data storage structure
        1. design an Azure Data Lake solution
            1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
            2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-data-scenarios
        2. recommend file types for storage &
        3. recommend file types for analytical queries
            1. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage#dataset-properties
        4. design for efficient querying
            1. https://docs.microsoft.com/en-us/azure/data-explorer/data-lake-query-data#optimize-your-query-performance
            2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration
            3. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration-how-to?tabs=azure-powershell%2Cpowershell
        5. design for data pruning
            1. https://en.wikipedia.org/wiki/Decision_tree_pruning
            2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
            3. https://docs.microsoft.com/bs-cyrl-ba/azure/databricks//delta/optimizations/dynamic-file-pruning
            4. https://databricks.com/blog/2020/04/30/faster-sql-queries-on-delta-lake-with-dynamic-file-pruning.html
            5. https://docs.microsoft.com/en-ca/azure/databricks//delta/optimizations/dynamic-file-pruning
        6. design a folder structure that represents the levels of data transformation
            1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#directory-layout-considerations
            2. https://techcommunity.microsoft.com/t5/data-architecture-blog/how-to-organize-your-data-lake/ba-p/1182562
            3. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace
        7. design a distribution strategy
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
        8. design a data archiving solution
            1. https://azure.microsoft.com/en-ca/updates/archive-tier-for-azure-data-lake-storage-now-generally-available/
            2. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers?tabs=azure-portal#archive-access-tier
    2. Design a partition strategy
        1. design a partition strategy for files
        2. design a partition strategy for analytical workloads
        3. design a partition strategy for efficiency/performance
        4. design a partition strategy for Azure Synapse Analytics
        5. identify when partitioning is needed in Azure Data Lake Storage Gen2
            1. https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning
            2. https://docs.microsoft.com/en-us/azure/architecture/best-practices/data-partitioning-strategies
    3. Design the serving layer
        1. design star schemas
            1. https://docs.microsoft.com/en-us/power-bi/guidance/star-schema
            2. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
        2. design slowly changing dimensions
            1. https://en.wikipedia.org/wiki/Slowly_changing_dimension
            2. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/
            3. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
            4. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/2-describe
            5. https://www.youtube.com/watch?v=Sg2AAk1vwEs
        3. design a dimensional hierarchy
            1. https://docs.microsoft.com/en-us/power-bi/guidance/star-schema#snowflake-dimensions
            2. https://en.wikipedia.org/wiki/Snowflake_schema
            3. https://docs.microsoft.com/en-us/azure/data-factory/connector-snowflake
        4. design a solution for temporal data
            1. https://docs.microsoft.com/en-us/azure/azure-sql/temporal-tables
            2. https://en.wikipedia.org/wiki/Temporal_database
        5. design for incremental loading
            1. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
            2. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-change-tracking-feature-portal
            3. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-portal
            4. https://www.youtube.com/watch?v=F9cBFnxaSGI
        6. design analytical stores
            1. https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/analytical-data-stores
            2. https://docs.microsoft.com/en-us/azure/architecture/data-guide/big-data/#lambda-architecture
        7. design metastores in Azure Synapse Analytics and Azure Databricks
            1. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-use-external-metadata-stores
            2. https://docs.microsoft.com/en-us/azure/databricks/data/metastore/
            3. https://docs.microsoft.com/en-us/azure/synapse-analytics/metadata/overview
            4. https://docs.microsoft.com/en-us/azure/databricks/data/metastores/external-hive-metastore
            5. https://www.youtube.com/watch?v=pBB5zFnhgyE&list=PL7_h0bRfL52oZqAfV_kumYLUH5dbcWm9q
    4. Implement physical data storage structures
        1. implement compression
            1. https://docs.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs
            2. https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
            3. https://databricks.com/glossary/what-is-parquet
            4. https://docs.informatica.com/data-integration/powerexchange-adapters-for-informatica/10-5/powerexchange-for-microsoft-azure-blob-storage-user-guide/microsoft-azure-blob-storage-data-objects/data-compression-in-microsoft-azure-blob-storage-sources-and-tar.html
        2. implement partitioning
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition
        3. implement sharding
            1. https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding
            2. https://docs.microsoft.com/en-us/azure/azure-sql/database/elastic-scale-introduction
            3. https://docs.microsoft.com/en-us/azure/azure-sql/database/elastic-scale-shard-map-management
        4. implement different table geometries with Azure Synapse Analytics pools
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-sql-pool
            2. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-sql-on-demand
            3. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-analyze-spark
        5. implement data redundancy
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/backup-and-restore
            2. https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/migrate/azure-best-practices/analytics/azure-synapse
            3. https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
            4. https://docs.microsoft.com/en-us/azure/databricks/scenarios/howto-regional-disaster-recovery
        6. implement distributions
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
        7. implement data archiving
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/backup-and-restore
            2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-supported-blob-storage-features
                a. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers
    5. Implement logical data structures
        1. build a temporal data solution
            1. https://docs.microsoft.com/en-us/azure/azure-sql/temporal-tables
            2. https://docs.microsoft.com/en-us/azure/architecture/
        2. build external tables
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables?tabs=hadoop
        3. implement file and folder structures for efficient querying and data pruning
            1. https://docs.microsoft.com/en-us/azure/data-explorer/data-lake-query-data
            2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-performance-tuning-guidance
    6. Implement the serving layer
        1. deliver data in a relational star schema
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-overview
            2. https://en.wikipedia.org/wiki/Star_schema
        2. deliver data in Parquet files
            1. https://databricks.com/glossary/what-is-parquet
            2. https://docs.microsoft.com/en-us/azure/data-factory/format-parquet
        3. implement a dimensional hierarchy
            1. https://docs.microsoft.com/en-us/power-bi/guidance/star-schema#snowflake-dimensions
            2. https://en.wikipedia.org/wiki/Snowflake_schema
            3. https://docs.microsoft.com/en-us/azure/data-factory/connector-snowflake
2. Design and Develop Data Processing (25-30%)
    1. Ingest and transform data
        1. transform data by using Apache Spark
            1. https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse
        2. transform data by using Transact-SQL
            1. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse
        3. transform data by using Data Factory
            1. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
            2. https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-spark
        4. transform data by using Azure Synapse Pipelines
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-pipelines
            2. https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json
        5. transform data by using Stream Analytics
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
        6. cleanse data
            1. https://en.wikipedia.org/wiki/Data_cleansing
            2. https://www.sqlshack.com/data-cleansing-in-azure-machine-learning/
            3. https://app.pluralsight.com/guides/cleaning-data-with-azure-ml-studio
            4. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/clean-missing-data
        7. split data
            1. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/split-data
        8. shred JSON
            1. https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15
            2. https://docs.microsoft.com/en-us/sql/t-sql/functions/openjson-transact-sql?view=sql-server-ver15
        9. encode and decode data
            1. https://docs.microsoft.com/en-us/answers/questions/129474/azure-data-factory-base64-encoded-secrets.html
        10. configure error handling for the transformation
            1. https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
            2. https://techcommunity.microsoft.com/t5/azure-data-factory/understanding-pipeline-failures-and-error-handling/ba-p/1630459
            3. https://docs.microsoft.com/en-us/azure/data-factory/data-factory-ux-troubleshoot-guide
            4. https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
        11. normalize and denormalize values
            1. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/normalize-data
        12. transform data by using Scala
            1. https://docs.microsoft.com/en-us/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse
        13. perform data exploratory analysis
            1. https://azure.microsoft.com/en-us/resources/videos/perform-exploratory-analytics-over-your-data-lake/
            2. https://docs.microsoft.com/en-us/learn/modules/perform-machine-learning-with-azure-databricks/
    2. Design and develop a batch processing solution
        1. develop batch processing solutions by using Data Factory, Data Lake, Spark, Azure
            1. https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-data-processing-using-batch
            2. https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
        2. Synapse Pipelines, PolyBase, and Azure Databricks &
        3. create data pipelines
            1. https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-versioned-feature-summary?view=sql-server-ver15
            2. https://docs.microsoft.com/en-us/azure/databricks/clusters/configure
            3. https://www.youtube.com/watch?v=JUQXx0R0RfE
        4. design and implement incremental data loads
            1. https://docs.microsoft.com/en-us/azure/data-factory/tutorial-incremental-copy-overview
        5. design and develop slowly changing dimensions
            1. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/
            2. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
            3. https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/2-describe
        6. handle security and compliance requirements
            1. https://azure.microsoft.com/en-ca/overview/trusted-cloud/compliance/
            2. https://docs.microsoft.com/en-ca/azure/compliance/
        7. scale resources
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/quickstart-scale-compute-portal
            2. https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance
        8. configure the batch size
            1. https://docs.microsoft.com/en-us/azure/batch/batch-automatic-scaling
            2. https://docs.microsoft.com/en-us/azure/databricks/delta/delta-batch
        9. design and create tests for data pipelines
            1. https://docs.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
        10. integrate Jupyter/IPython notebooks into a data pipeline
            1. https://docs.microsoft.com/en-us/azure/databricks/notebooks/
            2. https://docs.microsoft.com/en-us/azure/databricks/notebooks/notebooks-use
            3. https://docs.microsoft.com/en-us/azure/databricks/notebooks/notebooks-manage
        11. handle duplicate data
            1. https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-dedupe-nulls-snippets
        12. handle missing data &
        13. handle late-arriving data
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
            2. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-solution-patterns
            3. https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-module-reference/clean-missing-data
            4. https://learning.oreilly.com/library/view/stream-analytics-with/9781788395908/0b61b6d7-d805-42e2-a1cf-24148ce07f47.xhtml
            5. https://docs.microsoft.com/en-us/azure/stream-analytics/event-ordering
        14. upsert data
            1. https://docs.microsoft.com/en-us/azure/data-factory/data-flow-alter-row
        15. regress to a previous state
            1. https://docs.microsoft.com/en-us/answers/questions/31313/transactions-in-adf.html
            2. https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-sql-data-warehouse
        16. design and configure exception handling
            1. https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
        17. configure batch retention
            1. Configure a simple Azure Batch Job with Azure Data Factory - Microsoft Tech Community
        18. design a batch processing solution
            1. https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-data-processing-using-batch
        19. debug Spark jobs by using the Spark UI
            1. https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-job-debugging
    3. Design and develop a stream processing solution
        1. develop a stream processing solution by using Stream Analytics, Azure Databricks, and Azure Event Hubs
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-introduction
            2. https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/
            3. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-databricks
        2. process data by using Spark structured streaming
            1. https://docs.microsoft.com/en-us/azure/databricks/spark/latest/structured-streaming/
        3. monitor for performance and functional regressions
            1. https://docs.microsoft.com/en-us/azure/databricks/kb/jobs/job-run-dash
            2. https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-monitoring
        4. design and create windowed aggregates
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
        5. handle schema drift
            1. https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-schema-drift
        6. process time series data
            1. https://azure-samples.github.io/azureiotlabs/timeseriesinsights/#:~:text=Azure%20Time%20Series%20Insights%20is,over%20the%20world%20in%20seconds.
            2. https://docs.microsoft.com/en-ca/azure/time-series-insights/
        7. process within one partition
        8. process across partitions
            1. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/event-hubs/partitioning-in-event-hubs-and-kafka
            2. https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-features#partitions
            3. https://docs.microsoft.com/en-us/azure/stream-analytics/repartition
            4. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization
        9. configure checkpoints/watermarking during processing
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
        10. scale resources
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-scale-jobs
        11. handle interruptions
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-job-reliability
            2. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-time-handling
        12. design and configure exception handling
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-output-error-policy
            2. https://docs.microsoft.com/en-us/azure/stream-analytics/configuration-error-codes
        13. upsert data
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-documentdb-output
        14. replay archived stream data
            1. https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-concepts-checkpoint-replay
        15. design a stream processing solution
            1. https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/data/stream-processing-stream-analytics
    4. Manage batches and pipelines
        1. trigger batches
        2. handle failed batch loads
            1. https://docs.microsoft.com/en-us/azure/batch/error-handling
            2. https://docs.microsoft.com/en-us/azure/batch/batch-job-task-error-checking
            3. https://docs.microsoft.com/en-us/azure/batch/batch-pool-node-error-checking
            4. https://docs.microsoft.com/en-us/azure/batch/best-practices
        3. validate batch loads
            1. https://docs.microsoft.com/en-us/azure/batch/batch-job-task-error-checking
        4. manage data pipelines in Data Factory/Synapse Pipelines
        5. schedule data pipelines in Data Factory/Synapse Pipelines
            1. https://docs.microsoft.com/en-us/azure/synapse-analytics/get-started-pipelines
            2. https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities
        6. implement version control for pipeline artifacts
            1. https://docs.microsoft.com/en-us/azure/data-factory/source-control
        7. manage Spark jobs in a pipeline
            1. https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-spark
3. Design and Implement Data Security (10-15%)
    1. Design security for data policies and standards
        1. design data encryption for data at rest and in transit
            1. https://docs.microsoft.com/en-us/azure/storage/common/storage-service-encryption
            2. https://docs.microsoft.com/en-us/azure/cosmos-db/database-encryption-at-rest
            3. https://docs.microsoft.com/en-us/azure/synapse-analytics/security/workspaces-encryption
            4. https://docs.microsoft.com/en-us/azure/security/fundamentals/encryption-atrest
        2. design a data auditing strategy
            1. https://docs.microsoft.com/en-us/azure/azure-sql/database/auditing-overview
            2. https://docs.microsoft.com/en-us/azure/cosmos-db/audit-control-plane-logs
        3. design a data masking strategy, design for data privacy
            1. https://docs.microsoft.com/en-us/azure/security/fundamentals/protection-customer-data
            2. https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview
        4. design a data retention policy
            1. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifecycle-management-concepts?tabs=azure-portal
            2. https://docs.microsoft.com/en-us/azure/azure-monitor/logs/manage-cost-storage
            3. https://docs.microsoft.com/en-us/azure/azure-monitor/app/data-retention-privacy
            4. https://azure.microsoft.com/en-ca/updates/retention-by-type/
        5. design to purge data based on business requirements
            1. https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-overview
            2. https://docs.microsoft.com/en-us/rest/api/keyvault/purgedeletedstorageaccount/purgedeletedstorageaccount
            3. https://docs.microsoft.com/en-us/azure/data-explorer/kusto/concepts/data-purge
            4. https://docs.microsoft.com/en-us/azure/storage/blobs/soft-delete-blob-enable
        6. design Azure role-based access control (Azure RBAC) and POSIX-like Access Control List
            1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control-model
        7. (ACL) for Data Lake Storage Gen2
            1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
        8. Design and implement row-level and column-level security
            1. https://docs.microsoft.com/en-us/sql/relational-databases/security/row-level-security?view=sql-server-ver15
            2. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/column-level-security
2. Implement data security
1. implement data masking
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/dynami
c-data-masking-overview
2. implement Azure RBAC
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-st
orage-access-control-model
3. implement POSIX-like ACLs for Data Lake Storage Gen2
1. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-st
orage-access-control
4. implement a data retention policy
1. https://azure.microsoft.com/en-ca/updates/lifecycle-management-f
or-azure-data-lake-storage-is-now-generally-available/
2. https://docs.microsoft.com/en-us/azure/storage/blobs/storage-lifec
ycle-management-concepts?tabs=azure-portal
5. implement a data auditing strategy
1. https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-la
ke-analytics-diagnostic-logs
6. manage identities, keys, and secrets across different data platform
technologies
1. https://docs.microsoft.com/en-us/rest/api/storageservices/authoriz
e-with-shared-key
2. https://docs.microsoft.com/en-us/azure/storage/common/storage-s
as-overview?toc=/azure/storage/blobs/toc.json
3. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-st
orage-access-control-model
7. implement secure endpoints (private and public)
1. https://docs.microsoft.com/en-us/azure/private-link/private-endpoint-overview
2. https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
3. https://docs.microsoft.com/en-us/azure/data-factory/data-movement-security-considerations
8. implement resource tokens in Azure Databricks
1. https://docs.microsoft.com/en-us/azure/databricks/administration-guide/access-control/tokens
2. https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token
9. load a DataFrame with sensitive information &
10. write encrypted data to tables or Parquet files &
11. manage sensitive information
1. https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html
2. https://databricks.com/session_na20/encryption-and-masking-for-sensitive-apache-spark-analytics-addressing-ccpa-and-governance
4. Monitor and Optimize Data Storage and Data Processing (10-15%)
1. Monitor data storage and data processing
1. implement logging used by Azure Monitor
1. https://docs.microsoft.com/en-us/azure/azure-monitor/logs/data-platform-logs
2. configure monitoring services
1. https://docs.microsoft.com/en-us/azure/azure-monitor/deploy
3. measure performance of data movement
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/monitoring-with-dmvs
4. monitor and update statistics about data across a system
5. monitor data pipeline performance
1. https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
6. measure query performance
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/query-performance-insight-use
7. monitor cluster performance
1. https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-key-scenarios-to-monitor
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/monitoring/how-to-monitor-using-azure-monitor
3. https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/
8. understand custom logging options
1. https://docs.microsoft.com/en-us/azure/azure-monitor/agents/data-sources-custom-logs
9. schedule and monitor pipeline tests
10. interpret Azure Monitor metrics and logs
1. https://docs.microsoft.com/en-us/azure/azure-monitor/essentials/data-platform-metrics
11. interpret a Spark directed acyclic graph (DAG)
2. Optimize and troubleshoot data storage and data processing
1. compact small files
2. rewrite user-defined functions (UDFs)
3. handle skew in data
1. https://en.wikipedia.org/wiki/Skewness
2. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choose-a-distribution-column-with-data-that-distributes-evenly
3. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#determine-if-the-table-has-data-skew
4. handle data spill
1. https://en.wikipedia.org/wiki/Data_breach
2. https://docs.microsoft.com/en-us/compliance/regulatory/gdpr-breach-notification
3. https://docs.microsoft.com/en-us/compliance/regulatory/gdpr-breach-azure-dynamics
5. tune shuffle partitions
1. https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/performance-troubleshooting
6. find shuffling in a pipeline
7. optimize resource management
8. tune queries by using indexers
1. https://docs.microsoft.com/en-us/azure/azure-sql/database/automatic-tuning-overview
2. https://docs.microsoft.com/en-us/sql/relational-databases/automatic-tuning/automatic-tuning?view=sql-server-ver15
9. tune queries by using cache
1. https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/performance-tuning-result-set-caching
10. optimize pipelines for analytical or transactional purposes
11. optimize pipeline for descriptive versus analytical workloads
12. troubleshoot a failed Spark job
1. https://docs.microsoft.com/en-us/azure/databricks/kb/jobs/
2. https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-known-issues
3. https://docs.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide
13. troubleshoot a failed pipeline run
1. https://docs.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide
