Querying Files

Learning Objectives

- Querying data files directly
- Extract files as raw contents
- Configure options of external sources
- Use CTAS statements to create Delta Lake tables

Derar Alhussein © Udemy | Databricks Certified Data Engineer Associate - Preparation


Querying Files Directly

SELECT * FROM file_format.`/path/to/file`

file_format:
  Self-describing formats:      json, parquet, …
  Non self-describing formats:  CSV, TSV, …

/path/to/file:
  Single file:         file_2022.json
  Multiple files:      file_*.json
  Complete directory:  /path/dir
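The three path forms above can be sketched as follows. This is a minimal illustration, assuming the JSON files live together under the directory `/path/dir`; combining the slide's file names with that directory is an assumption made for the example.

```sql
-- Hypothetical paths, for illustration only.

-- Single file
SELECT * FROM json.`/path/dir/file_2022.json`;

-- Multiple files matched by a wildcard
SELECT * FROM json.`/path/dir/file_*.json`;

-- Every file in the directory (assumes all files share one format and schema)
SELECT * FROM json.`/path/dir`;
```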

Example: JSON

SELECT * FROM json.`/path/file_name.json`

Raw data

- Extract text files as raw strings
  - Text-based files (JSON, CSV, TSV, and TXT formats)
  - SELECT * FROM text.`/path/to/file`

- Extract files as raw bytes
  - Images or unstructured data
  - SELECT * FROM binaryFile.`/path/to/file`
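As a rough sketch of what these two queries return: the `text` source yields one row per line in a single string column named `value`, and the `binaryFile` source yields one row per file with the columns `path`, `modificationTime`, `length`, and `content`. The path below is the slide's placeholder, not a real location.

```sql
-- Raw strings: one row per line, in a single column named `value`
SELECT value FROM text.`/path/to/file`;

-- Raw bytes: one row per file, with file metadata next to the binary content
SELECT path, modificationTime, length, content
FROM binaryFile.`/path/to/file`;
```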

CTAS: Registering Tables from Files

CREATE TABLE table_name
AS SELECT * FROM file_format.`/path/to/file`

- Automatically infers schema information from the query results
  - Does not support manual schema declaration
  - Useful for ingesting external data from sources with a well-defined schema

- Does not support file options
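A minimal sketch of a CTAS statement over a self-describing format. The table name `customers` and the Parquet directory are hypothetical; the example assumes it runs on Databricks, where tables created this way are Delta tables by default.

```sql
-- Hypothetical table name and path, for illustration only.
-- The schema is taken from the Parquet files; none is declared here.
CREATE TABLE customers
AS SELECT * FROM parquet.`/path/to/customers`;

-- Inspect the result: the inferred columns and the table's provider
DESCRIBE EXTENDED customers;
```

Because the statement has no place for file options, a source such as CSV that needs a header or delimiter setting is a poor fit for this form.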

Registering Tables on External Data Sources

CREATE TABLE table_name
  (col_name1 col_type1, ...)
USING data_source
OPTIONS (key1 = val1, key2 = val2, ...)
LOCATION path

- External table
- Non-Delta table!
Example: CSV

CREATE TABLE table_name
  (col_name1 col_type1, ...)
USING CSV
OPTIONS (header = "true",
         delimiter = ";")
LOCATION path
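A concrete instance of this template might look like the sketch below. The table name `books_csv`, its columns, and the directory path are all hypothetical, chosen only to make the statement complete.

```sql
-- Hypothetical table, columns, and path, for illustration only.
CREATE TABLE books_csv
  (book_id STRING, title STRING, price DOUBLE)
USING CSV
OPTIONS (header = "true",
         delimiter = ";")
LOCATION '/path/to/books-csv';

-- The table details should report an external table with a CSV provider
DESCRIBE EXTENDED books_csv;
```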

Example: Database

CREATE TABLE table_name
  (col_name1 col_type1, ...)
USING JDBC
OPTIONS (url = "jdbc:sqlite://hostname:port",
         dbtable = "database.table",
         user = "username",
         password = "pwd")
Limitation

- It's not a Delta table!

- We cannot expect the performance guarantees associated with Delta Lake and the Lakehouse

- This matters most when the source is large, e.g. a huge database table

Solution

CREATE TEMP VIEW temp_view_name (col_name1 col_type1, ...)
USING data_source
OPTIONS (key1 = "val1", key2 = "val2", ..., path = "/path/to/files")

CREATE TABLE table_name
AS SELECT * FROM temp_view_name
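Applied to a semicolon-delimited CSV source, the two steps could be sketched as follows. The view name `books_tmp_vw`, the table name `books`, the columns, and the path are hypothetical. The temporary view carries the schema and file options, and the CTAS statement then copies the parsed rows into a new table, which on Databricks is a Delta table by default.

```sql
-- Hypothetical names, columns, and path, for illustration only.

-- Step 1: a temporary view that declares the schema and the file options
CREATE TEMP VIEW books_tmp_vw
  (book_id STRING, title STRING, price DOUBLE)
USING CSV
OPTIONS (header = "true",
         delimiter = ";",
         path = "/path/to/books-csv");

-- Step 2: CTAS from the view, so the data lands in a new table
CREATE TABLE books
AS SELECT * FROM books_tmp_vw;
```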
