Slide Deck Data Analysis With Databricks
Databricks Academy
July 2024
©2024 Databricks Inc. — All rights reserved
Welcome and Get Settled
While you are getting settled, share with us a little about yourself in the chat:
• Where are you joining from today (city, country)?
• How long have you been working with Databricks SQL?
• What data analysis tools have you worked with in the past?
• What are you hoping to get out of this class?
Course Agenda
01. Databricks SQL Services and Capabilities
Demonstrations:
- Setting up a Catalog and Schema (20 mins)
- Data Importing (20 mins)
- A Quick Query and Visualization (15 mins)
Demonstration:
- Integrations* (10 mins)
Demonstrations:
- Delta Lake in Databricks SQL (30 mins)
- Data Security (20 mins)
Databricks SQL Services and Capabilities
LECTURE
[Figure: platform components including Mosaic AI, Delta Live Tables, Workflows, and Databricks SQL]
Databricks automatically determines instance types and configuration for the best price/performance.
Public Preview!
● Ask questions of your data in natural language.
● Have a follow-up conversation with your data.
● Find answers to questions not answered by your dashboards.
● Leverage data in Unity Catalog to your business's advantage.
Setting Up a Catalog and Schema
Data lineage
● End-to-end table & column lineage
● Leverage the common permission model from Unity Catalog
Data Importing
● Query data
● Create a visualization
Unity Catalog in Databricks SQL
[Figure: Unity Catalog object hierarchy. Compute resources connect to a metastore; a metastore contains catalogs, a catalog contains schemas, and a schema contains tables, which are either managed tables or external tables.]
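Unity Catalog addresses a table by a three-level name, catalog.schema.table, under a metastore. The minimal Python sketch below models that hierarchy and shows one way a partially qualified name could be expanded using a current catalog and schema (in Databricks SQL those defaults are set with USE CATALOG and USE SCHEMA). The classes, the resolve helper, and the example object names are hypothetical illustrations, not a Databricks API, and the sketch ignores quoted identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """A schema holds tables; each table name maps to "managed" or "external"."""
    name: str
    tables: dict[str, str] = field(default_factory=dict)


@dataclass
class Catalog:
    """A catalog groups schemas."""
    name: str
    schemas: dict[str, Schema] = field(default_factory=dict)


@dataclass
class Metastore:
    """Top-level container for catalogs (hypothetical model, not a Databricks API)."""
    catalogs: dict[str, Catalog] = field(default_factory=dict)

    def resolve(self, name: str, current_catalog: str, current_schema: str) -> tuple[str, str, str]:
        """Expand a 1-, 2-, or 3-part table name to (catalog, schema, table).

        Missing leading parts come from the current catalog and schema,
        mirroring the effect of USE CATALOG / USE SCHEMA.
        """
        parts = name.split(".")
        if len(parts) == 1:
            catalog, schema, table = current_catalog, current_schema, parts[0]
        elif len(parts) == 2:
            catalog, schema, table = current_catalog, parts[0], parts[1]
        elif len(parts) == 3:
            catalog, schema, table = parts
        else:
            raise ValueError(f"invalid table name: {name!r}")

        # Walk metastore -> catalog -> schema -> table.
        cat = self.catalogs.get(catalog)
        sch = cat.schemas.get(schema) if cat is not None else None
        if sch is None or table not in sch.tables:
            raise LookupError(f"table not found: {catalog}.{schema}.{table}")
        return catalog, schema, table


# Hypothetical objects: one catalog, one schema, a managed and an external table.
metastore = Metastore(catalogs={
    "analytics": Catalog("analytics", schemas={
        "gold": Schema("gold", tables={"daily_sales": "managed", "vendor_feed": "external"}),
    }),
})

print(metastore.resolve("daily_sales", "analytics", "gold"))       # ('analytics', 'gold', 'daily_sales')
print(metastore.resolve("gold.vendor_feed", "analytics", "gold"))  # ('analytics', 'gold', 'vendor_feed')
```

Fully qualified three-part names resolve the same way regardless of the current catalog and schema, which is why they are the safer choice in shared queries.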
Lakehouse Architecture
1. Describe the benefits of using Databricks SQL for in-platform data processing.
2. Describe the medallion architecture as a sequential data organization and pipeline system of progressively cleaner data.
3. Identify that data in the bronze and silver layers requires additional processing and cleaning.
4. Describe the data in the gold layer of the medallion architecture.
5. Describe last-mile ETL workflows that run fully within the gold layer for specific use cases.
6. Identify the gold layer as the most common layer for data analysts using Databricks SQL.
7. Describe the benefits of working with streaming data.
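To make the medallion flow concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Databricks SQL: raw records land in a bronze table as-is, a silver table keeps only cleaned and typed rows, and a gold table holds an aggregate shaped for one use case. The table names, columns, sample rows, cleaning rules, and the is_number helper are all hypothetical. Databricks SQL also supports CREATE TABLE ... AS SELECT, but there the tables would be Delta tables and the dialect differs from SQLite's.

```python
import sqlite3

# In-memory SQLite database standing in for the lakehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")


def is_number(value):
    """Return 1 if value parses as a float, else 0 (registered as a SQL function)."""
    try:
        float(value)
        return 1
    except (TypeError, ValueError):
        return 0


conn.create_function("is_number", 1, is_number)

# Bronze: raw records loaded as-is, including malformed rows (hypothetical data).
conn.execute("CREATE TABLE bronze_orders (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?)",
    [
        ("1", "east", "100.0"),
        ("2", " EAST ", "50.5"),
        ("3", "west", "not-a-number"),  # malformed amount
        ("3", "west", "20.0"),
        ("4", None, "75.0"),            # missing region
    ],
)

# Silver: cleaned and typed rows; invalid records are filtered out.
conn.execute("""
    CREATE TABLE silver_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(region))       AS region,
           CAST(amount AS REAL)      AS amount
    FROM bronze_orders
    WHERE region IS NOT NULL AND is_number(amount) = 1
""")

# Gold: an aggregate ready for one specific use case (sales by region).
conn.execute("""
    CREATE TABLE gold_sales_by_region AS
    SELECT region, COUNT(*) AS orders, SUM(amount) AS total_amount
    FROM silver_orders
    GROUP BY region
""")

gold_rows = conn.execute(
    "SELECT region, orders, total_amount FROM gold_sales_by_region ORDER BY region"
).fetchall()
print(gold_rows)  # [('east', 2, 150.5), ('west', 1, 20.0)]
```

Each layer is derived only from the one before it, so a bug in the silver cleaning rules can be fixed and the gold table rebuilt without re-reading the original sources.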
The Lakehouse Architecture
● Full ACID transactions: focus on your data flow instead of worrying about failures.
● Open standards, open source: store petabytes of data without worries of lock-in. Growing community including Presto, Spark, and more.
● Powered by Data Lake
● Unifies streaming and batch: convert existing jobs with minimal modifications.
[Figure: data stream sources and batch sources (CSV, JSON, TXT…) flow through successive stages of increasing quality to streaming analytics and to AI & reporting.]
Knowledge check
Think about this question and volunteer an answer.
Which of the following describes the data quality of the gold layer of data in the lakehouse medallion architecture? Select one response.
A. The gold layer brings the data from different sources into an Enterprise view.
B. The gold layer consists of clean, aggregated data, ready to use in production for a specific use case.
C. The table structures in the gold layer correspond to the source system table structures "as-is".
D. The focus of the gold layer is quick Change Data Capture and the ability to provide a historical archive if needed without rereading the data from the source system.
Integrations
This helps data analysts get useful data into their lakehouse faster, without the need to manually configure each product, so they can get data-driven insights.
Data Management in Databricks SQL
LECTURE
Databricks SQL Warehouses
[Figure: warehouse infrastructure over all the data, spanning the data lake and the data warehouse]
Problems with Managing Infrastructure
[Figure: users, admins, and finance, with clusters and cost; finance needs to reduce costs]
Benefits:
● Robust security foundation: data isolation and encryption
[Figure: the Databricks account holds the Databricks control plane and Databricks serverless compute (VPC/VNET); the customer account holds customer storage.]
Warehouse Configuration
[Figure: warehouse configuration on AWS and Azure]
Delta Lake in Databricks SQL
Data Security
1. Describe the different levels of data object access available with Unity Catalog.
2. Identify that catalogs, schemas, and tables can all have unique owners.
3. Describe how to organize owned data objects for the purposes of security.
4. Identify that the creator of a data object becomes the owner of that data object.
5. Identify the responsibilities of data object ownership.
6. Update data object permissions to address user access needs in a variety of common scenarios.
7. Identify PII data objects as needing additional, organization-specific considerations.
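The ownership and permission objectives above can be illustrated with a simplified, hypothetical model in Python: the creator of an object becomes its owner, and reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table itself. Those three privilege names match Unity Catalog's; everything else here (the Securable class, create, can_select, the principals, and the simplification that an owner holds every privilege on the object it owns) is an assumption for illustration, not how Unity Catalog is implemented. In Databricks SQL the grants themselves are issued with statements such as GRANT SELECT ON TABLE ... TO ....

```python
from dataclasses import dataclass, field


@dataclass
class Securable:
    """A catalog, schema, or table with an owner and explicit grants (hypothetical model)."""
    kind: str   # "CATALOG", "SCHEMA", or "TABLE"
    name: str
    owner: str
    grants: dict[str, set[str]] = field(default_factory=dict)  # principal -> privileges

    def grant(self, principal: str, privilege: str) -> None:
        self.grants.setdefault(principal, set()).add(privilege)

    def has(self, principal: str, privilege: str) -> bool:
        # Simplification: the owner holds every privilege on the object it owns.
        return principal == self.owner or privilege in self.grants.get(principal, set())


def create(kind: str, name: str, creator: str) -> Securable:
    # The creator of a data object becomes its owner.
    return Securable(kind, name, owner=creator)


def can_select(user: str, catalog: Securable, schema: Securable, table: Securable) -> bool:
    # Reading a table needs USE CATALOG, USE SCHEMA, and SELECT together.
    return (
        catalog.has(user, "USE CATALOG")
        and schema.has(user, "USE SCHEMA")
        and table.has(user, "SELECT")
    )


# Hypothetical principals and objects.
catalog = create("CATALOG", "analytics", creator="admin")
schema = create("SCHEMA", "analytics.gold", creator="admin")
table = create("TABLE", "analytics.gold.daily_sales", creator="data_engineer")

print(can_select("analyst", catalog, schema, table))  # False: no grants yet
catalog.grant("analyst", "USE CATALOG")
schema.grant("analyst", "USE SCHEMA")
print(can_select("analyst", catalog, schema, table))  # False: still missing SELECT
table.grant("analyst", "SELECT")
print(can_select("analyst", catalog, schema, table))  # True
```

The point of checking all three levels is that a grant on the table alone is not enough; access is only usable when the path through the catalog and schema is also granted.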
The Life of a Query (Without Unity Catalog)
Per workspace:
1. A user sends the query SELECT * FROM Sales2020; to a cluster or SQL warehouse.
2. Check grants against the table ACL.
3. Look up the table's location in the Hive metastore.
4. The metastore returns the path to the table: s3://sales/sales2020.
5. The cluster reads the data using its instance profile / service principal / service account.
6. The cluster filters out unauthorized data.
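The query flow above (steps 1 to 6) can be traced in a minimal Python simulation. Everything here is hypothetical: the dictionaries stand in for the per-workspace table ACLs and the Hive metastore, the stored rows are made up, and the row filter is a placeholder for whatever rule the cluster enforces. What it illustrates is that the cluster reads the whole path with its own credential, not the user's, and is itself responsible for filtering out data the user may not see.

```python
# Hypothetical per-workspace state (not a Databricks API).
TABLE_ACLS = {"alice": {"sales2020"}}                   # step 2: user -> readable tables
HIVE_METASTORE = {"sales2020": "s3://sales/sales2020"}  # step 3: table -> storage path
STORAGE = {                                             # files readable with the cluster's credential
    "s3://sales/sales2020": [
        {"region": "east", "amount": 100},
        {"region": "west", "amount": 40},
    ],
}
ROW_FILTERS = {"alice": lambda row: row["region"] == "east"}  # step 6: rule the cluster enforces


def run_query(user: str, table: str) -> list[dict]:
    """Trace SELECT * FROM <table> through the per-workspace flow."""
    name = table.lower()  # Hive metastore table names are case-insensitive
    # Step 2: check grants in the workspace's table ACLs.
    if name not in TABLE_ACLS.get(user, set()):
        raise PermissionError(f"{user} is not granted access to {table}")
    # Steps 3-4: look up the table and get back its storage path.
    path = HIVE_METASTORE[name]
    # Step 5: read everything at the path with the cluster's own credential
    # (instance profile / service principal / service account).
    rows = STORAGE[path]
    # Step 6: the cluster itself drops rows the user may not see.
    allowed = ROW_FILTERS.get(user, lambda row: True)
    return [row for row in rows if allowed(row)]


# Step 1: a user submits SELECT * FROM Sales2020; to a cluster or SQL warehouse.
print(run_query("alice", "Sales2020"))  # [{'region': 'east', 'amount': 100}]
```

Because every grant and lookup table in this flow lives per workspace, the same table must be secured separately in each workspace that uses it.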
[Figure: Unity Catalog (cross-workspace) sits between a cluster or SQL warehouse, which passes through the user identity, and the data: managed data sources, external tables accessed with defined credentials, and other existing data sources.]
Databricks Unity Catalog
[Figure: users reach data through Unity Catalog, which writes an audit log and governs tables (table1, table2), views (view1, view2, view3), models, SQL databases, and objects such as iot_key, credentials, and external tables; tables map to data files on S3/ADLS/GCS such as /dataset/pages/part-001 and /dataset/users/uk/part-001.]
AI/BI Dashboards vs. Legacy Dashboards
Data Visualizations and Dashboards
Create Interactive Dashboards
Databricks SQL in Production