Unity Catalog

Unity Catalog is a governance solution for managing data and AI assets across Databricks workspaces, providing centralized access control, auditing, and data discovery. It features a hierarchical object model that organizes data assets into catalogs, schemas, tables, and views, and supports both centralized and distributed governance models. Best practices include configuring a dedicated metastore per region, managing storage locations effectively, and limiting direct access to external locations.

UNITY CATALOG IN DATABRICKS

WHAT IS UNITY CATALOG
Unity Catalog is a unified governance solution for managing data and AI
assets across Databricks workspaces.
It offers a centralized platform for controlling access, auditing, tracking
lineage, and discovering data assets.

[Figure: Without Unity Catalog, each Databricks workspace has its own
user management, metastore, and compute. With Unity Catalog, user
management and the metastore are shared across workspaces, and each
workspace keeps its own compute.]

KEY FEATURES:
Centralized Access Control
Define data access policies in one place, enforce them across all
workspaces.
Standards-Compliant Security Model
Based on ANSI SQL, permissions are managed at catalog, schema, table,
and view levels using familiar syntax.
Built-In Auditing and Lineage
Automatic logging of user access and tracking of data lineage, showing the
creation and usage of data assets.
Data Discovery
Tag and document data assets, with a search interface for easy discovery.
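
The ANSI SQL style of permission management mentioned above can be sketched as a small helper that builds GRANT statements at the catalog, schema, and table levels. This is only an illustration: the helper `grant_statement` and the names `main`, `sales`, `orders`, and `analysts` are hypothetical and do not come from the slides.

```python
def grant_statement(privilege: str, securable_type: str, name: str, principal: str) -> str:
    """Build one GRANT statement of the form GRANT <privilege> ON <type> <name> TO <principal>."""
    return f"GRANT {privilege} ON {securable_type} {name} TO `{principal}`;"


# Hypothetical objects: catalog "main", schema "main.sales",
# table "main.sales.orders", and a group named "analysts".
statements = [
    grant_statement("USE CATALOG", "CATALOG", "main", "analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "main.sales", "analysts"),
    grant_statement("SELECT", "TABLE", "main.sales.orders", "analysts"),
]

for statement in statements:
    print(statement)
```

Granting to a group rather than to individual users keeps the policy in one place, which is the point of centralized access control.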
UNITY CATALOG OBJECT MODEL AND HIERARCHY
Metastore: The top-level container for metadata, managing data and AI
assets and their access permissions.

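[Figure: object hierarchy. The metastore contains catalogs, storage
credentials, external locations, shares, recipients, providers,
connections, and clean rooms. Each catalog contains schemas, and each
schema contains the data and AI objects: tables, views, volumes, models,
and functions.]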

Catalogs:
Organize data assets; often reflect organizational units or
development scopes.
Non-data securable objects like storage credentials and external
locations also live at this level.
Schemas:
Also known as databases; contain tables, views, volumes, AI models,
and functions.
Organize assets into logical categories, usually representing a
specific use case, project, or team.
Volumes: Logical storage for unstructured, non-tabular data.
Tables: Collections of data, either managed or external.
Views: Saved queries against tables.
Functions: Saved logic that returns values or sets of rows.
Models: AI models packaged with MLflow.
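
Because every table, view, volume, model, or function sits in a schema inside a catalog, each object is addressed by a three-part name, catalog.schema.object. The sketch below models that naming under simplifying assumptions: the class `ObjectName` is hypothetical, and it does not handle backtick-quoted identifiers that contain dots.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectName:
    """A three-level name: catalog, then schema, then the object itself."""
    catalog: str
    schema: str
    name: str

    @classmethod
    def parse(cls, full_name: str) -> "ObjectName":
        # Split on dots; quoted identifiers containing dots are not supported here.
        parts = full_name.split(".")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected catalog.schema.object, got {full_name!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.name}"


# Hypothetical table name used only for illustration.
orders = ObjectName.parse("main.sales.orders")
print(orders.catalog, orders.schema, orders.name)
```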
UNITY CATALOG GOVERNANCE MODELS

Centralized Governance:
Governance administrators own the metastore.
Administrators can take ownership of any object and
manage permissions.
Distributed Governance:
Data domains are managed at the catalog level.
Catalog owners manage all assets and governance
within their domain.
Best Practice:
Set a group as the metastore admin or catalog owner
for consistent management.
STORAGE SEPARATION AND HIERARCHY
Data Separation:
Store specific data types in designated cloud accounts or buckets.
Example: HR production data stored in
s3://mycompany-hr-prod/unity-catalog.
Storage Hierarchy:
Storage locations can be configured at the metastore, catalog,
or schema level.
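Hierarchical Evaluation:
The managed storage location is resolved in this order, and the first
level that has a location configured is used:
a. Schema-level location
b. Catalog-level location
c. Metastore-level location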
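
The evaluation order above amounts to a first-match lookup. A minimal sketch, assuming each level either has a location string or has none (`None`); the function name and the metastore path in the usage example are hypothetical.

```python
from typing import Optional


def resolve_managed_location(
    schema_location: Optional[str],
    catalog_location: Optional[str],
    metastore_location: Optional[str],
) -> str:
    """Return the most specific configured location: schema, then catalog, then metastore."""
    for location in (schema_location, catalog_location, metastore_location):
        if location:
            return location
    raise ValueError("no managed storage location configured at any level")


# The schema has no location of its own, so the catalog-level location is used.
resolved = resolve_managed_location(
    None,
    "s3://mycompany-hr-prod/unity-catalog",
    "s3://hypothetical-metastore-root",  # assumed path, not from the slides
)
print(resolved)
```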
DATA ACCESS CONTROL IN DESIGNATED ENVIRONMENTS
Environment-Specific Access:
Workspaces are the primary data processing environments.
Catalogs are the primary data domains.
Metastore admins and catalog owners can bind
catalogs to specific workspaces.
Use Cases:
Isolate production data from development
environments.
Ensure data compliance by restricting access to
specific environments.
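
The effect of binding catalogs to workspaces can be pictured as a lookup that runs before any privilege check. The sketch below is a toy model, not a Databricks API: the catalog and workspace names are hypothetical, and it assumes a catalog with no binding is reachable from every workspace.

```python
from typing import Dict, Set

# Hypothetical bindings: catalog name -> workspaces allowed to reach it.
bindings: Dict[str, Set[str]] = {
    "prod_catalog": {"prod-workspace"},
    "dev_catalog": {"dev-workspace", "test-workspace"},
}


def is_reachable(catalog: str, workspace: str, bindings: Dict[str, Set[str]]) -> bool:
    """Return True if the workspace may reach the catalog at all.

    Assumption: a catalog missing from `bindings` is unbound and reachable
    from any workspace. User privileges would still apply on top of this check.
    """
    allowed = bindings.get(catalog)
    return allowed is None or workspace in allowed


# Production data is not reachable from the development workspace.
print(is_reachable("prod_catalog", "dev-workspace", bindings))
print(is_reachable("prod_catalog", "prod-workspace", bindings))
```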
CONFIGURING UNITY CATALOG METASTORE
Metastore Overview:
Top-level container managing data assets (tables,
views, volumes).
Configure one metastore per region for Databricks
workspaces.
Best Practices:
Use a dedicated bucket for metastore-managed
storage.
Avoid giving direct access to the managed storage
location.
Prefer catalog-level managed storage over
metastore-level.
EXTERNAL LOCATIONS AND STORAGE CREDENTIALS
External Locations:
Each external location combines a storage credential with a
cloud storage path.
Use them to register external tables and volumes.
Best Practices:
Limit direct access to external locations.
Avoid using external locations for path-based
access outside of registered tables or volumes.
Use volumes for SQL-based file management and
access.
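
An external location, as described above, pairs a storage credential with a cloud storage path. The sketch below models that pairing and checks whether a given path falls under the location. The class, the location name `hr_landing`, the credential name `hr_cred`, and the `landing` prefix are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalLocation:
    """A named pairing of a storage credential with a cloud storage path."""
    name: str
    url: str
    credential_name: str

    def covers(self, path: str) -> bool:
        # Match on whole path segments so "landing" does not cover "landing2".
        base = self.url.rstrip("/")
        return path == base or path.startswith(base + "/")


# Hypothetical location under the HR bucket used as an example earlier.
location = ExternalLocation(
    name="hr_landing",
    url="s3://mycompany-hr-prod/landing",
    credential_name="hr_cred",
)

print(location.covers("s3://mycompany-hr-prod/landing/2024/file.csv"))
print(location.covers("s3://mycompany-hr-prod/landing2/file.csv"))
```

Matching on whole path segments is a deliberate choice in this sketch: a plain string-prefix test would wrongly treat a sibling directory with a similar name as part of the location.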
