UNITY CATALOG IN DATABRICKS
WHAT IS UNITY CATALOG
Unity Catalog is a unified governance solution for managing data and AI
assets across Databricks workspaces.
It offers a centralized platform for controlling access, auditing, tracking
lineage, and discovering data assets.
[Diagram: WITHOUT UNITY CATALOG, each Databricks workspace has its own
user management, metastore, and compute. WITH UNITY CATALOG, user
management and the metastore sit in Unity Catalog and are shared by all
Databricks workspaces; each workspace keeps its own compute.]
KEY FEATURES:
Centralized Access Control
Define data access policies in one place, enforce them across all
workspaces.
Standards-Compliant Security Model
The security model is based on ANSI SQL; permissions are granted at the
catalog, schema, table, and view levels using familiar syntax.
Built-In Auditing and Lineage
Automatic logging of user access and tracking of data lineage, showing the
creation and usage of data assets.
Data Discovery
Tag and document data assets, with a search interface for easy discovery.
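The SQL-style permission model above can be sketched as a small Python helper that builds GRANT statements for each level. The catalog, schema, table, and group names (hr_prod, payroll, salaries, hr-analysts) are hypothetical, and the helper only produces text; running the statements in a workspace is left out.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, securable_type: str,
                    securable_name: str, principal: str) -> str:
    """Build one GRANT statement for a dotted securable name."""
    qualified = ".".join(quote_ident(p) for p in securable_name.split("."))
    return (f"GRANT {privilege} ON {securable_type} {qualified} "
            f"TO {quote_ident(principal)}")


# Hypothetical names: a reader needs access at each level down to the table.
statements = [
    grant_statement("USE CATALOG", "CATALOG", "hr_prod", "hr-analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "hr_prod.payroll", "hr-analysts"),
    grant_statement("SELECT", "TABLE", "hr_prod.payroll.salaries", "hr-analysts"),
]

for statement in statements:
    print(statement)
```

Granting to a group rather than to individual users keeps the policy in one place as team membership changes.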
UNITY CATALOG OBJECT MODEL AND HIERARCHY
Metastore:
The top-level container for metadata, managing data and AI assets and
their access permissions.
Hierarchy:
Metastore level: Catalog, Storage Credential, External Location, Share,
Recipient, Provider, Connection, Clean Room
Catalog level: Schema
Schema level: Table, View, Volume, Model, Function
Catalogs:
Organize data assets; often reflect organizational units or development
scopes.
Non-data securable objects like storage credentials and external
locations also live at this level.
Schema:
Also known as databases; contain tables, views, volumes, AI models, and
functions.
Organize assets into logical categories, usually representing a specific
use case, project, or team.
Data & AI Objects:
Volumes: Logical storage for unstructured, non-tabular data.
Tables: Collections of data, either managed or external.
Views: Saved queries against tables.
Functions: Saved logic that returns values or sets of rows.
Models: AI models with MLflow.
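Because data and AI objects sit under a schema, which sits under a catalog, each object is addressed by a three-part name of the form catalog.schema.object. A minimal sketch of that naming rule follows; the class ObjectName and the example names are hypothetical, and the parser assumes plain identifiers without dots or backticks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectName:
    """A three-level name: catalog, schema, then the object itself."""
    catalog: str
    schema: str
    name: str

    @classmethod
    def parse(cls, full_name: str) -> "ObjectName":
        parts = full_name.split(".")
        # Exactly three non-empty parts are required.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.object, got {full_name!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.name}"


table = ObjectName.parse("hr_prod.payroll.salaries")
print(table.catalog, table.schema, table.name)
```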
UNITY CATALOG GOVERNANCE MODELS
Centralized Governance:
Governance administrators own the metastore.
Administrators can take ownership of any object and
manage permissions.
Distributed Governance:
Data domains are managed at the catalog level.
Catalog owners manage all assets and governance
within their domain.
Best Practice:
Set a group as the metastore admin or catalog owner
for consistent management.
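As an illustration of the best practice above, the following sketch generates one ownership statement per catalog so that each data domain is owned by a group. The catalog and group names are hypothetical, and the code only builds the SQL text.

```python
# Hypothetical mapping of data domain (catalog) to its owning group.
DOMAIN_OWNERS = {
    "hr_prod": "hr-data-owners",
    "finance_prod": "finance-data-owners",
}


def owner_statement(catalog: str, group: str) -> str:
    """Build an ALTER CATALOG statement that hands ownership to a group."""
    return f"ALTER CATALOG `{catalog}` OWNER TO `{group}`"


# Sorted so the output order is stable.
statements = [owner_statement(c, g) for c, g in sorted(DOMAIN_OWNERS.items())]

for statement in statements:
    print(statement)
```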
STORAGE SEPARATION AND HIERARCHY
Data Separation:
Store specific data types in designated cloud accounts or
buckets.
Example: HR production data stored in
s3://mycompany-hr-prod/unity-catalog.
Storage Hierarchy:
Storage locations can be configured at the metastore, catalog,
or schema level.
Hierarchical Evaluation:
The most specific configured location is used, checked in this order:
a. Schema-level location
b. Catalog-level location
c. Metastore-level location
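The evaluation order above can be sketched as a lookup that returns the first configured location. This is a simplified model, not Databricks code; the function name and the metastore bucket are hypothetical.

```python
from typing import Optional, Tuple


def resolve_managed_location(schema_location: Optional[str],
                             catalog_location: Optional[str],
                             metastore_location: Optional[str]) -> Tuple[str, str]:
    """Return (level, location) for the first level that has a location set."""
    candidates = (
        ("schema", schema_location),
        ("catalog", catalog_location),
        ("metastore", metastore_location),
    )
    for level, location in candidates:
        if location:
            return level, location
    raise ValueError("no managed storage location configured at any level")


# No schema-level location is set, so the catalog-level one is used.
level, location = resolve_managed_location(
    None,
    "s3://mycompany-hr-prod/unity-catalog",
    "s3://hypothetical-metastore-root",  # assumed bucket, for illustration only
)
print(level, location)
```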
DATA ACCESS CONTROL IN DESIGNATED ENVIRONMENTS
Environment-Specific Access:
Workspaces are the primary data processing
environments.
Catalogs are the primary data domains.
Metastore admins and catalog owners can bind
catalogs to specific workspaces.
Use Cases:
Isolate production data from development
environments.
Ensure data compliance by restricting access to
specific environments.
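The effect of binding catalogs to workspaces can be modeled with a small in-memory check. This is a simplified sketch, not the Databricks API: the catalog and workspace names are hypothetical, and a catalog with no binding entry is assumed to be open to every workspace.

```python
from typing import Dict, Set

# Hypothetical bindings: catalog name -> workspaces allowed to use it.
BINDINGS: Dict[str, Set[str]] = {
    "hr_prod": {"prod-workspace"},
    "hr_dev": {"dev-workspace", "staging-workspace"},
}


def can_access(catalog: str, workspace: str,
               bindings: Dict[str, Set[str]] = BINDINGS) -> bool:
    """Return True if the workspace may use the catalog under this model."""
    allowed = bindings.get(catalog)
    if allowed is None:
        # Assumption: an unbound catalog is reachable from any workspace.
        return True
    return workspace in allowed


print(can_access("hr_prod", "prod-workspace"))
print(can_access("hr_prod", "dev-workspace"))
```

Under this model, production data in hr_prod stays out of reach of the development workspace even for users who hold privileges on the catalog.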
CONFIGURING UNITY CATALOG METASTORE
Metastore Overview:
Top-level container managing data assets (tables,
views, volumes).
Configure one metastore per region for Databricks
workspaces.
Best Practices:
Use a dedicated bucket for metastore-managed
storage.
Avoid giving direct access to the managed storage
location.
Prefer catalog-level managed storage over
metastore-level.
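To illustrate catalog-level managed storage, the sketch below builds a CREATE CATALOG statement with a MANAGED LOCATION clause. The catalog name hr_prod is hypothetical; the path reuses the HR example from the storage separation slide. The code only produces the SQL text.

```python
def create_catalog_statement(catalog: str, managed_location: str) -> str:
    """Build a CREATE CATALOG statement with its own managed storage path."""
    if "'" in managed_location:
        # Keep the sketch simple: refuse paths that would break the SQL literal.
        raise ValueError("managed_location must not contain a single quote")
    return (f"CREATE CATALOG IF NOT EXISTS `{catalog}` "
            f"MANAGED LOCATION '{managed_location}'")


statement = create_catalog_statement(
    "hr_prod", "s3://mycompany-hr-prod/unity-catalog"
)
print(statement)
```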
EXTERNAL LOCATIONS AND STORAGE CREDENTIALS
External Locations:
Combine a storage credential with a cloud storage
path.
Use them to register external tables and volumes.
Best Practices:
Limit direct access to external locations.
Avoid using external locations for path-based
access outside of registered tables or volumes.
Use volumes for SQL-based file management and
access.
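A minimal sketch of the two ideas above: an external location pairs a storage credential with a path, and files in a volume are addressed through a /Volumes/ path rather than a raw cloud URL. The location, credential, bucket, catalog, schema, and volume names are all hypothetical, and the helpers only build strings.

```python
def create_external_location_statement(name: str, url: str,
                                        credential: str) -> str:
    """Build a CREATE EXTERNAL LOCATION statement for a path and credential."""
    return (f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
            f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)")


def volume_path(catalog: str, schema: str, volume: str,
                relative_path: str = "") -> str:
    """Build the /Volumes/ path for a file inside a volume."""
    base = f"/Volumes/{catalog}/{schema}/{volume}"
    if not relative_path:
        return base
    return f"{base}/{relative_path.lstrip('/')}"


location_sql = create_external_location_statement(
    "hr_landing", "s3://hypothetical-hr-landing/raw", "hr_storage_credential"
)
file_path = volume_path("hr_prod", "payroll", "raw_files", "2024/jan.csv")
print(location_sql)
print(file_path)
```

Going through the volume path keeps file access under the same permissions as the rest of the catalog, which is the reason to prefer it over direct path-based access.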