HowToCrackInterview Udemy
AZURE DATA ENGINEERING
CRACK THE INTERVIEW
By Karthik J
https://www.udemy.com/course/azure-data-engineering-interview-questions/?referralCode=41FC8A6544E97A2F4AB2
BigDataAzure.com
[email protected]
WHAT IS THIS COURSE ABOUT?
This course will give you:
• Discussion points
• Demos
• Hints
• Reference material to gain knowledge
• A quiz per topic

This course will help you to:
• Focus on relevant Azure services, not all you need to learn
• Self-learn and practice

What you should NOT expect / before you buy this course:
• Azure Data Services knowledge is required.
• The primary audience of this course is developers and architects.
• This course is available only in English, with no caption text.
If you do not have project experience, then I recommend the course below on Udemy to gain project development experience.
Focus on:
• What was your role? – developer, senior developer, architect, …
• How many projects have you worked on?
• Were you part of designing the solution?
• What was the data size and volume?
• Batch mode and/or real-time data ingestion?
• Structured / semi-structured / unstructured data
• Mention whether you faced challenges or not; don't describe them at this stage.
• List the Azure data services you have experience with or are confident about.
Reference architecture (diagram; stages include clean and transform, and aggregate)
EXPLAIN YOUR PROJECT ARCHITECTURE
Focus on:
• Your understanding of the architecture
• How effectively Azure data services are used and for what purpose, e.g. in the reference architecture Databricks is used only for compute
• What data is loaded, e.g. historical data load, data load from heterogeneous sources, etc.
• Know the consumers of your data – visualization tools, data analysts, downstream applications
Candidate A:
It copies data from source to target. We have to specify where the data has to be copied from, e.g. SFTP, and its format (e.g. csv). On the target we also have to specify the type and format of the data, with additional settings like header = true.

Candidate B:
It copies data from source to target using ADF connectors. It supports dynamic values for paths, table names, etc. Configurable DIUs and parallel connections can be specified for optimum performance.
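To make Candidate B's points concrete, here is a minimal sketch (my own illustration, not course material) of a parameterized Copy activity, written as a Python dict that mirrors the pipeline JSON; all dataset, parameter, and activity names are hypothetical:

```python
# Illustrative sketch of a parameterized ADF Copy activity, expressed as a
# Python dict mirroring the pipeline JSON. All names and values are placeholders.
copy_activity = {
    "name": "CopySourceToTarget",
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "DelimitedTextSource"},
        "sink": {"type": "AzureSqlSink"},
        "dataIntegrationUnits": 8,   # DIUs allocated to the copy run
        "parallelCopies": 4,         # parallel connections to the source/sink
    },
    "inputs": [{
        "referenceName": "SourceCsvDataset",
        "type": "DatasetReference",
        # dynamic content: folder path supplied as a pipeline parameter
        "parameters": {"folderPath": "@pipeline().parameters.sourceFolder"},
    }],
    "outputs": [{
        "referenceName": "TargetSqlDataset",
        "type": "DatasetReference",
        # dynamic content: table name supplied as a pipeline parameter
        "parameters": {"tableName": "@pipeline().parameters.targetTable"},
    }],
}
```

The point of the comparison: Candidate B names concrete, tunable features (connectors, parameterization, DIUs, parallelism) instead of just restating what the activity does.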
Tricky questions
• Can you not store media files on ADLS?
  • Yes, you can.
• Then why not?
  • The cost of ingress and egress is high on ADLS; the performance benefit of ADLS applies to analytics workloads.
• A third-party team (outside the Azure environment) wants to send data files to Azure. How?
  • Create a container on Blob storage and share a SAS URL. The team can then upload data over HTTPS from any technology / programming language (see the sketch after these questions).
• Once I create Blob storage, can I change it to ADLS later by changing its properties?
  • No.
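A rough sketch of that SAS-based hand-off, assuming the container already exists and the SAS token grants write/create permission; the URL and file name are placeholders:

```python
# Illustrative sketch, not from the course: uploading a file to a Blob container
# using only a SAS URL and plain HTTPS. The SAS URL and file name are placeholders.
import requests

container_sas_url = "https://mystorageacct.blob.core.windows.net/landing?sv=...&sig=..."  # shared by the Azure team
file_name = "daily_extract.csv"

# Build the blob URL by inserting the blob name before the SAS query string.
base, query = container_sas_url.split("?", 1)
blob_url = f"{base}/{file_name}?{query}"

with open(file_name, "rb") as f:
    # The Blob REST "Put Blob" operation is an HTTP PUT; the x-ms-blob-type
    # header is required when creating a block blob this way.
    resp = requests.put(
        blob_url,
        data=f,
        headers={"x-ms-blob-type": "BlockBlob"},
    )
resp.raise_for_status()  # 201 Created on success
```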
Tip: If you keep data in a small number of large files instead of a large number of small files, analytics performance improves many times over.
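In this stack, one common way to follow that tip is to compact small files when writing from Spark; a minimal sketch, assuming a Databricks/PySpark session and placeholder ADLS Gen2 paths:

```python
# Minimal compaction sketch (assumes a Databricks/PySpark session and
# placeholder ADLS Gen2 paths; not taken from the course material).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

source_path = "abfss://[email protected]/events/small_files/"
target_path = "abfss://[email protected]/events/compacted/"

df = spark.read.parquet(source_path)

# Rewrite many small files as a handful of larger ones; the target file count
# would normally be sized from the data volume (e.g. ~128 MB-1 GB per file).
df.coalesce(8).write.mode("overwrite").parquet(target_path)
```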
• Data protection –
  • Column-level security
  • Row-level security
  • Data masking (a sketch follows this list)
  • Manual encryption
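To make the data-masking item concrete, here is a rough illustration (not from the course) of enabling dynamic data masking on an Azure SQL column from Python via pyodbc; the connection string, table, and column are hypothetical:

```python
# Illustrative only: enabling dynamic data masking on an Azure SQL column.
# Connection string, table and column names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=admin_user;PWD=<password>"
)
cursor = conn.cursor()

# Mask the email column so non-privileged users see only a masked value
cursor.execute("""
    ALTER TABLE dbo.Customers
    ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
""")
conn.commit()
```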
• Azure Delta Lake – adoption of the open-source Delta Lake storage layer (similar to Databricks Delta); a minimal write/read sketch follows this list
• Azure Data Explorer – to query and analyze real-time data being ingested through Event Hub, IoT Hub, Azure Queue, etc.
• Azure Data Studio – a SQL client plus the capability of running notebooks using Python and R on a Spark cluster
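The Delta Lake sketch referenced above, assuming a Delta-enabled Spark session (e.g. on Databricks) and a placeholder ADLS Gen2 path:

```python
# Minimal Delta Lake sketch (assumes a Delta-enabled Spark session, e.g. on
# Databricks, and a placeholder ADLS Gen2 path).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

delta_path = "abfss://[email protected]/sales/orders_delta/"

# Write a DataFrame in Delta format (ACID transactions, time travel, etc.)
orders_df = spark.createDataFrame(
    [(1, "2024-01-01", 250.0), (2, "2024-01-02", 99.5)],
    ["order_id", "order_date", "amount"],
)
orders_df.write.format("delta").mode("overwrite").save(delta_path)

# Read it back; MERGE/UPDATE/DELETE and versioned reads are also supported.
spark.read.format("delta").load(delta_path).show()
```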
• What are the various ways to copy data from Databricks to Azure SQL/DW? - From Spark via pyodbc/JDBC, the ADF Copy activity, SQL DW PolyBase if we have created external tables from Databricks, or the Azure SQL library for Python (a JDBC write sketch follows).
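Of those options, the Spark JDBC write is the quickest to sketch; a rough example assuming a Databricks session, with placeholder server, database, table, and credentials:

```python
# Rough sketch (placeholders throughout): writing a Spark DataFrame from
# Databricks to Azure SQL over JDBC.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("2024-01-01", 1250.0)], ["sales_date", "total"])

jdbc_url = (
    "jdbc:sqlserver://myserver.database.windows.net:1433;"
    "database=mydb;encrypt=true;loginTimeout=30"
)

(df.write
   .format("jdbc")
   .option("url", jdbc_url)
   .option("dbtable", "dbo.SalesAggregates")
   .option("user", "etl_user")
   .option("password", "<password>")
   .mode("append")          # or "overwrite" to replace the table contents
   .save())
```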
• How do you read a file from Blob storage and write it back, treating it as a file I/O operation, from Azure Databricks? – Using the Azure Storage SDK for Python (a sketch follows).
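A minimal sketch of that pattern with the azure-storage-blob SDK; the account URL, credential, container, and blob names are placeholders, and the transformation step is just a stand-in:

```python
# Sketch using the azure-storage-blob SDK (pip install azure-storage-blob).
# Account URL, container, blob names, and credential are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://mystorageacct.blob.core.windows.net",
    credential="<account-key-or-sas-token>",
)
container = service.get_container_client("landing")

# Read the blob's bytes, as if it were a local file read
data = container.download_blob("inbound/daily_extract.csv").readall()

# ... process the bytes in plain Python (stand-in transformation) ...
processed = data.upper()

# Write the result back as a new blob
container.upload_blob("outbound/daily_extract_processed.csv", processed, overwrite=True)
```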