DataCamp Databricks
Databricks Lakehouse
DATABRICKS CONCEPTS
Kevin Barlow
Data Analytics Practitioner
The Data Warehouse
Data Warehouse
Pros
Highly performant
Cons
Very expensive
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
The Data Lake
Data Lake
Pros
Very flexible
Cost effective
Cons
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
Birth of the Lakehouse
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
The Databricks Lakehouse
The Databricks Lakehouse Platform
Simplified architecture
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
Databricks Architecture Benefits
Unification: benefits of data warehouse and data lake
Multi-Cloud: no lock-in to a specific cloud platform
Databricks Development Benefits
Collaborative: ability to work in the same platform in real time
Open-Source: support for the most popular languages (Python, R, Scala, SQL)
Let's practice!
Core features of the Databricks Lakehouse Platform
Kevin Barlow
Data Practitioner
Apache Spark
Apache Spark is an open-source data processing framework and is the engine underneath
Databricks.
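To make "data processing framework" concrete, here is a minimal sketch of the split, process, and merge pattern that distributed engines rely on. It is plain Python, not Spark: the function names are hypothetical, and threads merely stand in for the worker machines of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Deal the records round-robin into num_partitions lists."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_words(partition):
    """Work done independently on one partition (the 'map' side)."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def word_count(records, num_partitions=3):
    """Count words by processing partitions in parallel, then merging."""
    partitions = split_into_partitions(records, num_partitions)
    # Assumption: threads play the role of cluster workers in this toy model.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge the per-partition results (the 'reduce' side).
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return dict(total)


lines = ["spark runs on clusters", "databricks runs spark", "spark scales out"]
result = word_count(lines)
```

The result does not depend on how many partitions are used, which is what allows an engine to add workers without changing the answer.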
DataCamp Courses
Introduction to PySpark
Benefits of Spark
Key Benefits:
4. Databricks optimizations
1 https://spark.apache.org/docs/latest/cluster-overview.html
Cloud computing basics
Databricks Compute
Clusters
SQL Warehouses
SQL only
BI use cases
Photon
Cloud data storage
Delta
Delta is an open-source data storage
format that provides:
ACID transactions
Schema evolution
Table history
Time-travel
1 delta.io
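The four features listed for Delta can be illustrated with a toy versioned table. This is a sketch under stated assumptions, not the Delta Lake API: the class ToyVersionedTable and its methods are hypothetical, and everything lives in memory. Each commit publishes a new snapshot (table history), old snapshots stay readable (time travel), a bad batch is rejected as a whole (a simplified take on atomic commits), and new columns may appear in later versions (schema evolution).

```python
import copy


class ToyVersionedTable:
    """Hypothetical in-memory stand-in for a versioned table."""

    def __init__(self):
        self._snapshots = [[]]      # version 0 is the empty table
        self._history = ["CREATE"]  # one entry per committed version

    def commit(self, operation, rows):
        """Validate the whole batch first, then publish a new snapshot."""
        if not all(isinstance(row, dict) for row in rows):
            raise ValueError("commit rejected; table left unchanged")
        new_snapshot = copy.deepcopy(self._snapshots[-1]) + copy.deepcopy(rows)
        self._snapshots.append(new_snapshot)
        self._history.append(operation)
        return len(self._snapshots) - 1  # the new version number

    def read(self, version=None):
        """Read the latest snapshot, or an older one (time travel)."""
        index = len(self._snapshots) - 1 if version is None else version
        return copy.deepcopy(self._snapshots[index])

    def columns(self, version=None):
        """Column names present at a given version (schema evolution)."""
        return sorted({key for row in self.read(version) for key in row})

    def history(self):
        """List of (version, operation) pairs."""
        return list(enumerate(self._history))


table = ToyVersionedTable()
v1 = table.commit("APPEND", [{"id": 1}])
v2 = table.commit("APPEND", [{"id": 2, "country": "NL"}])  # adds a column
```

Reading with table.read(version=v1) returns the table as it was before the second append, while table.read() returns the latest state.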
Unity Catalog
Unity Catalog is an open data governance
strategy that controls access to all data
assets in the Databricks Lakehouse platform.
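As a rough illustration of centrally controlled access, the sketch below models grants on a three-level name of the form catalog.schema.table. It is a toy model, not Unity Catalog itself: the class ToyCatalog is hypothetical, it assumes that a grant on a catalog or schema also covers the tables beneath it, and it ignores every other check a real governance layer performs.

```python
class ToyCatalog:
    """Hypothetical access-control model over catalog.schema.table names."""

    def __init__(self):
        # Each grant is a (principal, privilege, securable) triple.
        self._grants = set()

    def grant(self, principal, privilege, securable):
        """Record that a principal holds a privilege on a securable."""
        self._grants.add((principal, privilege, securable))

    def is_allowed(self, principal, privilege, table):
        """Check the table itself, then its schema, then its catalog."""
        parts = table.split(".")
        if len(parts) != 3:
            raise ValueError("expected a name like catalog.schema.table")
        # Assumption: grants made higher in the hierarchy apply to tables below.
        scopes = [".".join(parts[:depth]) for depth in (3, 2, 1)]
        return any((principal, privilege, scope) in self._grants
                   for scope in scopes)


catalog = ToyCatalog()
catalog.grant("analysts", "SELECT", "sales.eu")

can_read_eu = catalog.is_allowed("analysts", "SELECT", "sales.eu.orders")
can_read_us = catalog.is_allowed("analysts", "SELECT", "sales.us.orders")
can_modify_eu = catalog.is_allowed("analysts", "MODIFY", "sales.eu.orders")
```

Here the analysts group can read any table in the hypothetical sales.eu schema, but cannot read tables in sales.us or modify anything.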
Databricks UI
Designed for easier access to capabilities
based on your data workload.
Let's review!
Administering a Databricks workspace
Kevin Barlow
Data Practitioner
Account Admin
Key Responsibilities:
Account Console
https://accounts.cloud.databricks.com/
Account Console - Workspaces
https://accounts.cloud.databricks.com/
Account Console - Data
https://accounts.cloud.databricks.com/
Account Console - Users & Groups
https://accounts.cloud.databricks.com/
Account Console - Settings
https://accounts.cloud.databricks.com/
Workspace Admin
Key Responsibilities:
Data Plane
Contains all of the customer's assets needed for computation with Databricks.
Control Plane
The portion of the platform that is managed and hosted by Databricks.
Databricks Platform Architecture
Each cloud will have the same general
options to create a workspace:
Account Console
1 https://docs.databricks.com/getting-started/overview.html
Let's review!
Setting up a Databricks workspace example
Kevin Barlow
Data Practitioner
Let's practice!