
Data Analysis with Databricks

Databricks Academy
July 2024
Welcome and Get Settled
While you are getting settled, share with us a little about yourself in the chat:
• Where are you joining from today (city, country)?
• How long have you been working with Databricks SQL?
• What data analysis tools have you worked with in the past?
• What are you hoping to get out of this class?


Welcome and Agenda

Databricks Academy
July 2024
Agenda
01. Databricks SQL Services and Capabilities

- Get Started with Databricks SQL: 15 mins (Lecture)
- Demonstrations:
  - Setting Up a Catalog and Schema: 20 mins
  - Data Importing: 20 mins
  - A Quick Query and Visualization: 15 mins
- Unity Catalog in Databricks SQL: 20 mins (Lecture)
- Lakehouse Architecture: 15 mins (Lecture)
- Demonstration:
  - Integrations*: 10 mins

*Integrations cannot be executed in Vocareum environments.


Agenda
02. Data Management in Databricks SQL

- Databricks SQL Warehouses: 15 mins (Lecture)
- Demonstration:
  - Delta Lake in Databricks SQL: 30 mins (Demo, Lab)
- Data Security: 20 mins (Lecture)

03. Data Visualization and Dashboarding

- Demonstrations and Lab Exercises:
  - Data Visualizations and Dashboards: 50 mins (Demo, Lab)
  - Create Interactive Dashboards: 20 mins (Demo, Lab)
  - Databricks SQL in Production: 30 mins (Demo, Lab)

04. Summary and Next Steps

- Closing Statements: 10 mins (Lecture)


Vocareum Lab Environment
Accessible through Databricks Academy

Notes about the lab environment used in today's session:

• Everyone is in the same workspace.
• Everyone will create their own catalog.
• Everyone is using the same SQL Warehouse for compute.
• All demonstrations and labs can be completed in this workspace.*
• Everyone has the necessary privileges to complete all tasks demonstrated.

*The Integrations demo cannot be completed in this environment.
Course Objectives:
• Describe fundamental concepts about Databricks SQL.
• Define the terms metastore, catalog, schema, table, and view in the context of the Databricks DI Platform.
• Use the Databricks UI to create a catalog, schema, table, and view.
• Use the Databricks UI to upload data and create a managed table.
• Use the SQL Editor to complete multiple data analytics tasks.
• Create a data visualization associated with a query.
• Create an interactive dashboard.
• Create a refresh schedule and alert.
• Share data assets in the Databricks DI Platform with others.


Databricks SQL Services and Capabilities

Databricks Academy
July 2024
Databricks SQL Services and Capabilities
LECTURE

Get Started with Databricks SQL


Learning Objectives
By the end of this lesson, you should be able to:

1. Describe what Databricks SQL is.
2. Describe the benefits of Databricks SQL.


Databricks SQL
Delivering analytics on the freshest data with data warehouse performance and data lake economics

■ Better price/performance than other cloud data warehouses
■ Simplify discovery and sharing of new insights
■ Connect to familiar BI tools, like Tableau or Power BI
■ Simplified administration and governance

[Platform diagram: Data Science & AI (Mosaic AI), ETL & Orchestration (Delta Live Tables, Workflows), Real-time Analytics, and Data Warehousing (Databricks SQL) sit on the Data Intelligence Engine, which uses generative AI to understand the semantics of your data and lets you securely get insights in natural language; beneath it, Unity Catalog and Delta Lake (data layout automatically optimized based on usage patterns) rest on an open data lake holding all raw data: logs, texts, audio, video, images.]


Built on an Open Foundation
Easily integrate with the entire data and AI ecosystem



Better Together | Broad Integration with BI Tools
Connect your existing BI tools with optimized connectors that provide fast performance, low latency, and high user concurrency to your data lake.


Partner Connect Makes it Easy
"How do I get the data from SFDC into Delta Lake?"
"What tools can I use to ingest data into Delta?"
"I heard Fivetran is great! How do I connect it to Databricks?"

DATABRICKS PARTNER CONNECT
■ Many partner integrations take as few as 6 clicks
■ No context or page switches required
■ Automatically launches a cluster and calls the Partner API to pass on the PAT token and cluster configuration details
■ Sets up all the necessary configs for an optimized user experience
■ Creates a trial account in the partner product if an account doesn't exist


A New Home for Data Analysts
Enable data analysts to quickly perform ad-hoc and exploratory data analysis, with a SQL query editor, visualizations, and dashboards.

Automatic alerts can be triggered for critical changes, allowing teams to respond to business needs faster.


Simple Administration and Governance
Quickly set up SQL/BI-optimized compute with SQL warehouses.

Databricks automatically determines instance types and configuration for the best price/performance.

Then, easily manage usage and perform quick auditing and troubleshooting with query history.


Use Cases

● Connect existing BI tools to one source of truth for all your data
● Collaboratively explore the latest and freshest data
● Build data-enhanced applications


AI/BI Genie Spaces
In Public Preview!

● Ask questions of your data in natural language.
● Have a follow-up conversation with your data.
● Find answers to questions not answered by your dashboards.
● Leverages data in Unity Catalog to the advantage of your business.


Databricks SQL Services and Capabilities
DEMONSTRATION

Setting Up a Catalog and Schema


Catalog Explorer UI
Single pane of glass for all of your data

● UI-driven access control to simplify secure data permissioning
● Browse and understand data assets stored in your Lakehouse
● Data lineage: end-to-end table and column lineage


Data Lineage
Mapping the flow of data in the lakehouse

● Auto-capture runtime data lineage across all languages
● Track lineage down to the table and column level
● Leverage the common permission model from Unity Catalog


Follow Along Demo

Setting Up a Catalog and Schema

● Get your username
● Create a catalog
● Create a schema

The equivalent SQL is sketched below.
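A minimal sketch of these steps, assuming hypothetical catalog and schema names (the demo ties the names to your username):

-- Look up your username
SELECT current_user();

-- Create a catalog, then a schema inside it (names are placeholders)
CREATE CATALOG IF NOT EXISTS analysis_catalog;
CREATE SCHEMA IF NOT EXISTS analysis_catalog.demo_schema;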



Databricks SQL Services and Capabilities
DEMONSTRATION

Data Importing


Follow Along Demo

Data Importing

● Upload a .csv file
● Use the Catalog Explorer
● Create a table with data from object store

A SQL sketch of the object-store step follows.
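One way to script the object-store step is COPY INTO; a sketch assuming a hypothetical table name and bucket path:

-- Create an empty Delta table; with mergeSchema, COPY INTO infers the schema on load
CREATE TABLE IF NOT EXISTS analysis_catalog.demo_schema.sales;

-- Load CSV files from object storage (the path is a placeholder)
COPY INTO analysis_catalog.demo_schema.sales
FROM 's3://my-bucket/landing/sales/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');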



Databricks SQL Services and Capabilities
DEMONSTRATION

A Quick Query and Visualization


Follow Along Demo

A Quick Query and Visualization

● Query data
● Create a visualization

An example query is sketched below.
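An aggregate query like the following (table and columns are hypothetical) returns results that chart naturally as a bar chart:

-- Total sales per region; each region becomes one bar
SELECT region, SUM(amount) AS total_sales
FROM analysis_catalog.demo_schema.sales
GROUP BY region
ORDER BY total_sales DESC;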



Knowledge Check



Knowledge check
Think about this question and volunteer an answer

Which of the following features is used by Databricks SQL to ensure your data is secure? Select one response.
A. Built-in data governance
B. Delta Sharing
C. Integration with 3rd-party tools
D. Automatically scalable cloud infrastructure


Knowledge check
Think about this question and volunteer an answer

Which of the following features of Databricks is used for running queries in Databricks SQL? Select one response.
A. Dashboards
B. Job scheduler
C. SQL Editor
D. SQL warehouses


Knowledge check
Think about this question and volunteer an answer

What is the primary purpose of Databricks SQL?
A. To provide better price/performance and simplify discovery for BI tools.
B. To manage administration and governance of data warehouses.
C. To support a broad set of BI tools, including Tableau and Power BI.
D. All of the above.
Knowledge check
Think about this question and volunteer an answer

Which feature of the platform provides users with the ability to quickly connect to third-party tools with simple-to-implement integrations? Select one response.
A. SQL Editor
B. Partner Connect
C. Workflows
D. Features


Databricks SQL Services and Capabilities
LECTURE

Unity Catalog in Databricks SQL


Learning Objectives
By the end of this lesson, you should be able to:

1. Describe the three-level namespacing system provided by Unity Catalog.
2. Describe the persistence and scope of catalogs, schemas (databases), tables, and views on Databricks.
3. Compare and contrast the behavior of managed and unmanaged tables.
4. Identify that the legacy hive_metastore appears as the default catalog for compatibility with Unity Catalog.
5. Describe how and where Unity Catalog stores the data behind its catalogs, schemas, and granular data objects.
6. Explain the impact of Unity Catalog on existing external storage locations.


Unity Catalog
Architecture

● Implements access control on data
● Access control is always enabled
● Works across multiple workspaces
● Grants permissions to users at the account level

[Diagram: an identity provider feeds user/group management in the Databricks account; the account-level Unity Catalog metastore serves multiple workspaces, each with its own compute resources.]


Unity Catalog
Unity Catalog security model

Workspace security model:
● Object/privilege/principal access control model
● Open by default
● Local to workspace
● Grants privileges to workspace-level principals

Unity Catalog security model:
● Object/privilege/principal access control model
● Secure by default
● Works across multiple workspaces
● Grants privileges to account-level principals


Catalog Explorer and Data Lineage


Key Concepts
Catalogs

Hierarchy: Metastore > Catalog > Schema > Table | View | Volume | Function | Model




Key Concepts
Schemas/Databases

Hierarchy: Metastore > Catalog > Schema > Table | View | Volume | Function | Model


Key Concepts
Tables

Hierarchy: Metastore > Catalog > Schema > Table | View | Volume | Function | Model


Key Concepts
Tables

Hierarchy: Metastore > Catalog > Schema > Table (Managed or External)


Three-Level Namespace Notation
• Data objects are specified with up to three elements, depending on the granularity required: catalog, schema, and table.
• Example:
CREATE TABLE main.default.department
(
  deptcode INT,
  deptname STRING,
  location STRING
);
• Or, with a USE statement:
USE main.default;
SELECT * FROM department;
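• Databricks SQL can also set each level separately; the following is equivalent to the USE statement above:
USE CATALOG main;
USE SCHEMA default;
SELECT * FROM department;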



Databricks SQL Services and Capabilities
LECTURE

Lakehouse Architecture


Learning Objectives
By the end of this lesson, you should be able to:

1. Describe the benefits of using Databricks SQL for in-platform data processing.
2. Describe the medallion architecture as a sequential data organization and pipeline system of progressively cleaner data.
3. Identify that data in the bronze and silver layers requires additional processing and cleaning.
4. Describe the data in the gold layer of the medallion architecture.
5. Describe last-mile ETL workflows fully within the gold layer for specific use cases.
6. Identify the gold layer as the most common layer for data analysts using Databricks SQL.
7. Describe the benefits of working with streaming data.
The Lakehouse Architecture
● Full ACID transactions: focus on your data flow instead of worrying about failures.
● Open standards, open source: store petabytes of data without worries of lock-in, backed by a growing community including Presto, Spark, and more.
● Powered by your data lake.
● Unifies streaming and batch: convert existing jobs with minimal modifications.

[Diagram: streaming and batch sources (CSV, JSON, TXT, ...) land in the data lake and feed streaming analytics and AI & reporting.]




The Delta Lake Architecture
Data Quality Levels

CSV, JSON, TXT, ... -> Bronze (raw ingestion) -> Silver (filtered, cleaned, augmented) -> Gold (business-level aggregates) -> Streaming Analytics / AI & Reporting

Data quality increases from bronze to gold, and tables at every level are maintained with standard SQL operations: INSERT, UPDATE, DELETE, MERGE, and OVERWRITE. A SQL sketch of these refinements follows.
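A sketch of layer-to-layer refinement in SQL (table names, columns, and the source path are hypothetical):

-- Bronze: ingest raw files as-is
CREATE OR REPLACE TABLE bronze_orders AS
SELECT * FROM read_files('s3://my-bucket/raw/orders/', format => 'csv');

-- Silver: filter, clean, and augment
CREATE OR REPLACE TABLE silver_orders AS
SELECT order_id, CAST(order_date AS DATE) AS order_date, customer_id, amount
FROM bronze_orders
WHERE order_id IS NOT NULL;

-- Gold: business-level aggregates, ready for analysts
CREATE OR REPLACE TABLE gold_daily_sales AS
SELECT order_date, SUM(amount) AS total_sales
FROM silver_orders
GROUP BY order_date;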


Knowledge Check



Knowledge check
Think about this question and volunteer an answer

Which of the following statements about the lakehouse medallion architecture is true? Select one response.
A. The data in a single upstream table could be used to generate multiple downstream tables.
B. The silver layer is for reporting and uses more de-normalized and read-optimized data models with fewer joins.
C. The gold layer provides a broad view of all key business entities, concepts, and transactions.
D. Only minimal or "just-enough" transformations and data cleansing rules are applied to each layer in the medallion architecture.
Knowledge check
Think about this question and volunteer an answer

Which of the following describes the data quality of the gold layer of data in the lakehouse medallion architecture? Select one response.
A. The gold layer brings the data from different sources into an Enterprise view.
B. The gold layer is comprised of clean, aggregated data, ready to use in production for a specific use case.
C. The table structures in the gold layer correspond to the source system table structures "as-is".
D. The focus of the gold layer is quick Change Data Capture and the ability to provide a historical archive if needed, without rereading the data from the source system.
Knowledge check
Think about this question and volunteer an answer

What is the primary purpose of the bronze layer in the "bronze-silver-gold medallion" paradigm in Delta Lake?
A. To store data in a format suitable for individual business projects or reports.
B. To perform data cleansing, joining, and enrichment on raw data.
C. To provide a "single source of truth" for the enterprise across various projects.
D. To ingest raw data quickly, keeping it in its original format for both current and future projects.


Knowledge check
Think about this question and volunteer an answer

Which of the following statements describes the relationship between the silver and gold layers of data? Select one response.
A. The gold layer has less clean data than the silver layer.
B. Project-specific business rules are applied from the silver to the gold layer.
C. Self-service analytics are enabled for the gold layer for ad-hoc reporting in the silver layer.
D. The gold layer is where we land all the data from external source systems, which are represented by the silver layer.


Databricks SQL Services and Capabilities
LECTURE

Integrations


Learning Objectives
By the end of this lesson, you should be able to:

1. Identify Databricks SQL as a complementary tool for BI partner tool workflows.
2. Identify Databricks SQL as a quick opportunity to create queries, visualizations, and dashboards from within the Lakehouse.
3. Identify Partner Connect as a tool for implementing simple integrations with a number of other data products.


Databricks Partner Connect

Databricks Partner Connect is a dedicated ecosystem of integrations that allows users to easily connect with popular data ingestion, transformation, and BI partner products.

This helps data analysts get useful data into their lakehouse faster, without the need to manually configure each product, so they can get data-driven insights.


Partner Connect Makes it Easy
"How do I get the data from SFDC into Delta Lake?"
"What tools can I use to ingest data into Delta?"
"I heard Fivetran is great! How do I connect it to Databricks?"

DATABRICKS PARTNER CONNECT
■ Many partner integrations take as few as 6 clicks
■ No context or page switches required
■ Automatically launches a cluster and calls the Partner API to pass on the PAT token and cluster configuration details
■ Sets up all the necessary configs for an optimized user experience
■ Creates a trial account in the partner product if an account doesn't exist


Databricks Partner Connect

● Data Ingestion
● BI and Visualization
● Data Prep and Transformation
● Machine Learning


Databricks Partner Connect

● Data Governance
● Data Quality
● Reverse ETL
● Semantic Layer


Built on an Open Foundation
Easily integrate with the entire data and AI ecosystem



Databricks SQL Services and Capabilities
DEMONSTRATION

Integrations


Follow Along Demo
Integrations
• Connecting to outside data
• Connecting to BI tools
• Partner Connect



Data Management in Databricks SQL

Databricks Academy
July 2024
Data Management in Databricks SQL
LECTURE

Databricks SQL Warehouses


Learning Objectives
By the end of this lesson, you should be able to:

1. Describe the purpose of Databricks SQL warehouses.
2. Compare and contrast Classic, Pro, and Serverless Databricks SQL warehouses.
3. Identify Serverless Databricks SQL warehouses as a quick-starting option.
4. Describe basic Databricks SQL warehouse sizing and scaling guidelines in response to slow-running single-user queries and multi-user environments.
5. Describe the impact of Databricks SQL warehouse permissions on query history availability.
Successful data management platforms rely on efficient infrastructure

[Diagram: all the users (data engineers, ML engineers, data scientists, data analysts) share one infrastructure layer over all the data (data lake and data warehouse).]
Problems with Managing Infrastructure

● Users: lost productivity waiting for results while clusters start up and scale up.
● Admins: manual effort to configure versions, cluster sizes, and instance types.
● Finance: pressure to reduce costs.


Industry-leading security architecture supporting production workloads

1. Container Isolation
● Hardened container images per industry best practice
● Privileged access disabled in the container

2. VM Isolation
● Workloads separated by VM boundaries
● Reusing VMs among customers is blocked

3. Network Isolation
● All node egress is blocked except to nodes in the same cluster
● Federated access through temporary security tokens
● Ingress traffic from other customers is blocked


Databricks SQL Serverless Benefits

Higher user productivity
● User queries start instantly, with no waiting for cluster start-up
● Add more concurrent users with instant cluster scaling

Zero management
● No configuration
● No performance tuning
● No capacity management
● Automatic upgrades and patching

Lower cost
● Pay for what you consume; eliminate idle cluster time
● No over-provisioning of resources
● Idle capacity removed 10 minutes after the last query


Serverless Compute Architecture

[Diagram: the Databricks account hosts the control plane and the serverless compute plane (VPC/VNet managed by Databricks); workspaces attach to serverless compute, while data stays in customer storage in the customer's own account.]

Benefits:
● Production-ready environment
● Robust security foundation: data isolation and encryption
Warehouse Configuration
[Screenshots: warehouse configuration UI on AWS, Azure, and GCP]


Warehouse Configuration
In this course, SQL warehouses have the following settings:
• Cluster size: 2X-Small
• Scaling: Min 1, Max 1
• Auto-stop: after ten minutes


Data Management in Databricks SQL
DEMONSTRATION

Delta Lake in Databricks SQL


Follow Along Demo
Delta Lake in Databricks SQL
• Create a schema
• Create views that withhold data from unauthorized groups
• Optimize Delta tables

A sketch of the view and optimization steps follows.
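A minimal sketch, assuming hypothetical table, column, and group names; is_account_group_member() is the built-in predicate for branching on group membership:

-- A view that withholds a sensitive column from users outside the 'hr' group
CREATE OR REPLACE VIEW analysis_catalog.demo_schema.customers_safe AS
SELECT
  customer_id,
  CASE WHEN is_account_group_member('hr') THEN salary ELSE NULL END AS salary
FROM analysis_catalog.demo_schema.customers;

-- Compact small files and co-locate rows by a frequently filtered column
OPTIMIZE analysis_catalog.demo_schema.customers ZORDER BY (customer_id);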



Knowledge Check



Knowledge check
Think about this question and volunteer an answer

Which of the following statements describes the purpose of Databricks SQL warehouses? Select one response.
A. SQL warehouses enable data analysts to find and share dashboards.
B. SQL warehouses are a declarative framework for building data processing pipelines.
C. SQL warehouses provide data discovery capabilities across Databricks workspaces.
D. SQL warehouses allow users to run SQL commands on data objects within Databricks SQL.


Knowledge check
Think about this question and volunteer an answer

What are the benefits of Delta Lake within the Lakehouse Architecture?
A. Real-time data processing with low latency
B. Exclusive support for batch processing
C. ACID transactions, metadata scalability, and storage improvement
D. Data isolation for multiple software development environments
Knowledge check
Think about this question and volunteer an answer

Which of the following statements about SQL warehouse sizing and scaling is true? Select two responses.
A. Increasing maximum scaling allows for multiple users to use the same warehouse at the same time.
B. Scaling is set to a minimum of 1 and a maximum of 1 by default.
C. The higher the cluster size, the higher the latency in your queries.
D. The auto-stop feature will restart the warehouse if it remains idle during the auto-stop period.




Data Management in Databricks SQL
LECTURE

Data Security


Learning Objectives
By the end of this lesson, you should be able to:

1. Describe the different levels of data object access available with Unity Catalog.
2. Identify that catalogs, schemas, and tables can all have unique owners.
3. Describe how to organize owned data objects for the purposes of security.
4. Identify that the creator of a data object becomes the owner of that data object.
5. Identify the responsibilities of data object ownership.
6. Update data object permissions to address user access needs in a variety of common scenarios (a SQL sketch follows this list).
7. Identify PII data objects as needing additional, organization-specific considerations.
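For objective 6, ownership and permission updates are plain SQL statements; a sketch with hypothetical object and principal names:

-- Transfer ownership of a schema
ALTER SCHEMA main.default OWNER TO `data_stewards`;

-- Grant and revoke access on a table
GRANT SELECT ON TABLE main.default.customers TO `analysts`;
REVOKE SELECT ON TABLE main.default.customers FROM `interns`;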
The Life of a Query (Without Unity Catalog)
Per workspace

1. A user submits a query to a cluster or SQL warehouse: SELECT * FROM Sales2020;
2. The compute checks grants against the Table ACL.
3. The compute looks up the table's location in the Hive Metastore.
4. The metastore returns the path to the table (s3://sales/sales2020).
5. An instance profile / service principal / service account reads the data from cloud storage.
6. The cluster filters out unauthorized data and returns the results.


Unity Catalog Overview

[Diagram: a cluster or SQL warehouse routes requests through Unity Catalog (cross-workspace), which applies SQL access controls and writes an audit log; it reaches managed data sources directly, external tables through defined credentials, and other existing data sources through user identity passthrough.]
Databricks Unity Catalog

● Fine-grained permissions on tables, fields, and views: not files
● Industry-standard interface: ANSI SQL grants (sketched below)
● Uniform permission model for all data assets
● Centrally audited

[Diagram: users access tables, views, and models through Unity Catalog, which maps them to data files on S3/ADLS/GCS, SQL databases, ML models, and Delta Shares, with a central audit log.]
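Reading a table requires a grant at each level of the namespace; a sketch of the ANSI SQL grants involved (the group name is hypothetical):

GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.default TO `analysts`;
GRANT SELECT ON TABLE main.default.department TO `analysts`;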



Unity Catalog: External Table with Defined Credentials

CREATE CREDENTIAL iot_role TYPE AWS_ROLE ...

CREATE TABLE iot_data LOCATION s3:/...
WITH CREDENTIAL iot_role

[Diagram: Unity Catalog holds the credentials (iot_key) used to access the external tables.]

A current-syntax sketch follows.
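The statements above are conceptual slide syntax; in current Unity Catalog the same idea is expressed with storage credentials and external locations. A sketch with hypothetical names (the storage credential itself is typically created by an admin first):

CREATE EXTERNAL LOCATION iot_landing
URL 's3://my-bucket/iot/'
WITH (STORAGE CREDENTIAL iot_role);

CREATE TABLE iot_data
LOCATION 's3://my-bucket/iot/data';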



Unity Catalog: External Files with Passthrough

SELECT * FROM csv.`adls:/.../myfolder`

If a direct file path is specified, passthrough is performed with the user's cloud credentials. The same applies when a view references a path:

CREATE VIEW v AS SELECT * FROM csv.`adls:/.../myfolder`

[Diagram: the user's ADLS credentials are used directly to read the external files.]


Data Visualization and Dashboarding

Databricks Academy
July 2024
AI/BI Dashboards vs. Legacy Dashboards

AI/BI Dashboards (formerly Lakeview):
● Generally available since April 2024
● Feature streamlined sharing settings
● Can be shared with users without workspace access
● Assist in creating visualizations by interpreting natural language prompts

Legacy Dashboards:
● Generally available since 2020
● Require sharing of individual components to grant full access
● Can be shared only with users registered to the workspace
● No built-in AI functionality


Which SQL Editor Should I Use?
Draft dashboards include a Data tab where authors can write and edit SQL queries whose results can be used in visualizations.

For this course:
- Use context clues. Unless you're explicitly instructed to navigate to a new part of the platform, keep working with the same toolset.

In general:
- Use the Databricks SQL query editor to create and modify tables and run queries that you want to use with alerts.
- Use the dashboard query editor to write queries that produce result sets you want to include in dashboard visualizations.


Migrating Legacy Dashboards to New Dashboards
Databricks offers the following tools and guidance to help you migrate your legacy dashboards to the latest dashboard tooling:

- A Clone to Lakeview button available in the UI
- API tools for migrating and managing dashboards
- Step-by-step tutorials for guidance


Data Visualization and Dashboarding
DEMONSTRATION

Data Visualizations and Dashboards


Follow Along Demo
Data Visualizations
• Create common visualizations
• Add visualizations to a dashboard



Knowledge Check



Knowledge check
Think about this question and volunteer an answer

How can you enable aggregation in a Databricks SQL visualization?

A. Modify the underlying SQL query to add an aggregation column.
B. Select the aggregation type directly in the visualization editor.
C. Use the Aggregation drop-down menu in the Visualization Type options.
D. Aggregation is not supported in Databricks SQL visualizations.


Data Visualization and Dashboarding
LAB EXERCISE

Data Visualizations and Dashboards


Data Visualization and Dashboarding
DEMONSTRATION

Create Interactive Dashboards


Follow Along Demo
Dashboarding Basics
• Create a dashboard
• Add a filter to the dashboard
• Organize Databricks SQL assets


Knowledge Check



Knowledge check
Think about this question and volunteer an answer

A data analyst needs to create a visualization out of the following query:

SELECT order_date
FROM sales
WHERE order_date >= to_date('2020-01-01')
AND order_date <= to_date('2021-01-01');

Which of the following visualization types is best suited to depict the results of this query? Select one response.
A. Funnel
B. Stacked bar chart
C. Bar chart
D. Boxplot


Knowledge check
Think about this question and volunteer an answer

Which of the following data visualizations displays a single number by default? Select one response.
A. Bar chart
B. Counter
C. Map - markers
D. Funnel


Data Visualization and Dashboarding
DEMONSTRATION

Databricks SQL in Production


Follow Along Demo
Databricks SQL in Production
• Automation in Databricks SQL
• Sharing Databricks SQL assets



Knowledge Check



Knowledge check
Think about this question and volunteer an answer

Which of the following automations are available in Databricks SQL? Select one response.
A. Query refresh schedules
B. Dashboard refresh schedules
C. Alerts
D. All of the above


Knowledge check
Think about this question and volunteer an answer

What is the purpose of Alerts in Databricks SQL?
A. To automatically execute SQL queries.
B. To organize queries within a folder structure.
C. To trigger notifications based on specific conditions in scheduled queries.
D. To share dashboards with other team members.


Knowledge check
Think about this question and volunteer an answer

What is the purpose of configuring a refresh schedule for a query in Databricks SQL?
A. To automatically pull new data into a table.
B. To create a new table based on specified criteria.
C. To manually execute queries on-demand.
D. To edit existing data in the database.


Knowledge check
Think about this question and volunteer an answer

What level of permissions is the owner of a query granted on their query? Select one response.
A. Can View
B. Can Run
C. Can Edit
D. Can Manage


Knowledge check
Think about this question and volunteer an answer

A team of stakeholders needs to be notified of changes in a dashboard's statistics on a daily basis. Which of the following actions can be taken to ensure they always have the newest information? Select one response.
A. A refresh schedule can be configured and stakeholders can be subscribed to the dashboard's output.
B. A trigger alert can be created for the dashboard and stakeholders can be added to the alert notification list.
C. A webhook can be created and shared with stakeholders.
D. None of the above
Knowledge check
Think about this question and volunteer an answer

What is the benefit of setting a refresh schedule for a Databricks dashboard?

A. To change the color palette of visualizations.
B. To organize and label workspace objects.
C. To keep the data underlying visualizations up-to-date.
D. To create query parameters for customization.


Summary and Next Steps


Earn a Databricks certification!
Certification helps you gain industry recognition, competitive differentiation, greater productivity, and results.

• This course helps you prepare for the Databricks Certified Data Analyst Associate exam.
• Recommended self-paced courses:
  • Ingesting Data for Databricks SQL
  • Integrating BI Tools with Databricks SQL
• Please see the Databricks Academy for additional prep materials.

For more information visit: databricks.com/learn/certification

