Ultimate SnowPro Core Certification Course Slides by Tom Bailey

Snowflake is a cloud data platform that allows users to store, process, and analyze data in the cloud. It utilizes a multi-cluster architecture with decoupled storage, compute, and management services to provide scalability. Data is stored in a proprietary columnar format across multiple micro-partitions for high performance. Users can create virtual warehouses to query the data using standard SQL, with the warehouses auto-scaling based on usage.

What is Snowflake?

Snowflake is a Cloud Native Data Platform delivered as Software as a Service (SaaS).
Data Platform

Snowflake supports six core workloads:

Data Warehouse
• Structured & relational data
• ANSI standard SQL
• ACID compliant transactions
• Data stored in databases, schemas & tables

Data Lake
• Scalable storage and compute
• Separate compute clusters
• Schema does not need to be defined upfront
• Native processing of semi-structured data formats

Data Engineering
• COPY INTO & Snowpipe
• Tasks and Streams
• All data encrypted at rest and in transit

Data Science
• Remove data management roadblocks with centralised storage
• Partner eco-system includes data science tooling: Amazon SageMaker, DataRobot, Dataiku

Data Sharing
• Secure Data Sharing
• Data Marketplace
• Data Exchange
• BI with the Snowflake partner ecosystem tools

Data Applications
• Connectors and Drivers
• UDFs and Stored Procedures
• External UDFs
• Preview features such as Snowpark
Cloud Native

Snowflake’s software is purpose built for the Cloud.

All Snowflake infrastructure runs on the Cloud in either AWS, GCP or Azure.

Snowflake makes use of the Cloud’s elasticity, scalability, high availability, cost-efficiency & durability.
Software as a Service (SaaS)

• No management of hardware
• Transparent updates and patches
• Subscription payment model
• Ease of access
• Automatic optimisation
Multi-cluster Shared Data Architecture

Distributed Architectures

Traditional distributed architectures come in two designs: Shared-Disk, where every node accesses the same central storage, and Shared-Nothing, where each node has its own storage.
Multi-cluster Shared Data Architecture

Snowflake decouples storage, compute and management services into three infinitely scalable layers:

Cloud Services Layer
• Authentication & Access Control
• Infrastructure Management
• Query Optimiser
• Transaction Manager
• Security
• Metadata

Query Processing Layer
• Virtual Warehouses, providing workload isolation

Data Storage Layer

A Cloud Agnostic Layer underpins all three.
Storage Layer
Persistent and infinitely scalable cloud storage residing in the cloud provider’s blob storage service, such as AWS S3.

Snowflake users by proxy get the availability & durability guarantees of the cloud provider’s blob storage (e.g. AWS S3 offers 99.999999999% durability).

Data loaded into Snowflake is organized by databases and schemas, and is accessible primarily as tables.

Both structured and semi-structured data files (CSV, JSON, Avro, ORC, Parquet, XML) can be loaded and stored in Snowflake.
When data files are loaded or rows inserted into a table, Snowflake reorganizes the data into its proprietary compressed, columnar table file format.

The data that is loaded or inserted is also partitioned into what Snowflake calls micro-partitions (P1, P2, P3, …).

Storage is billed by how much is stored, based on a flat rate per TB calculated monthly (e.g. $42.00/TB per month in AWS Europe (London)).

Data is not directly accessible in the underlying blob storage, only via SQL commands:

SELECT * FROM <table>;
Query Processing Layer
The query processing layer consists of “Virtual Warehouses” that execute the processing tasks required to return results for most SQL statements.

A Virtual Warehouse is a named abstraction for a cluster of cloud-based compute instances that Snowflake manages:

CREATE WAREHOUSE MY_WH WAREHOUSE_SIZE=LARGE;

The underlying nodes of a Virtual Warehouse cooperate in a similar way to a shared-nothing compute cluster, making use of local caching.
• Virtual warehouses can be created or removed instantly.
• Virtual warehouses can be paused or resumed.
• A virtually unlimited number of virtual warehouses can be created, each with its own configuration.
• Virtual warehouses come in multiple “t-shirt” sizes (e.g. Small, Medium, Large) indicating their relative compute power.
• All running virtual warehouses have consistent access to the same data in the storage layer.
Services Layer
The services layer is a collection of highly available and scalable services that coordinate activities such as authentication and query optimization across all Snowflake accounts.

Similar to the underlying virtual warehouse resources, the services layer also runs on cloud compute instances.

Services managed by this layer include:
• Authentication & Access Control
• Infrastructure Management
• Transaction Management
• Metadata Management
• Query parsing and optimisation
• Security
Snowflake Editions & Key Features

Four editions: Standard, Enterprise, Business Critical and Virtual Private Snowflake (VPS).

Feature areas: SQL Support · Security, Governance & Data Protection · Compute Resource Management · Interface & Tools · Releases · Data Import & Export · Data Replication & Failover.
SQL Support

Available in all editions (Standard, Enterprise, Business Critical, VPS):

⇒ Standard SQL defined in SQL:1999
⇒ Advanced DML defined in SQL:2003
⇒ Standard data types: VARCHAR, NUMBER, TIMESTAMP etc.
⇒ Semi-structured data types: VARIANT, OBJECT & ARRAY
⇒ Multi-statement transactions
⇒ User Defined Functions (UDFs): all editions
⇒ Zero-copy Cloning: all editions
⇒ Automatic Clustering: Enterprise edition and above
⇒ Search Optimization Service: Enterprise edition and above
⇒ Materialized Views: Enterprise edition and above
Security, Governance & Data Protection

Available in all editions:

⇒ Federated authentication and SSO
⇒ Multi-factor authentication (MFA)
⇒ Time Travel & Fail-safe (Continuous Data Protection)
⇒ Encryption at-rest and in-transit
⇒ Network Policies
⇒ Access Control Framework
⇒ Column and row access policies: Enterprise edition and above
⇒ Tri-Secret Secure: Business Critical and above
⇒ Private Connectivity: Business Critical and above
⇒ Support for compliance regulations (PCI DSS, HIPAA, HITRUST CSF, IRAP, FedRAMP): Business Critical and above
⇒ Dedicated metastore and pool of compute resources: VPS only
Compute Resource Management

⇒ Separate compute clusters (Virtual Warehouses): all editions
⇒ Resource Monitors: all editions
⇒ Multi-cluster Virtual Warehouses: Enterprise edition and above
Interface & Tools

Available in all editions:

⇒ Classic UI & Snowsight UI
⇒ SnowSQL & SnowCD
⇒ Native programming interfaces
⇒ Third-party tools ecosystem
⇒ Snowflake Partner Connect
Data Import & Export

Available in all editions:

⇒ Bulk Loading
⇒ Bulk Unloading
⇒ Continuous Data Loading with Snowpipe
⇒ Snowflake Connector for Kafka
Data Replication & Failover

⇒ Database replication: all editions
⇒ Database failover and failback: Business Critical and above
Snowflake’s Catalogue and Objects
Snowflake Object Model

Organisation
  Account (account-level parameters can be set, e.g. ALTER ACCOUNT SET SSO_LOGIN_PAGE = TRUE;)
    User · Role · Network Policy · Resource Monitor · Warehouse · Share · Database
      Schema
        Stage · Pipe · Procedure · Function · Table · View · Task · Stream
Organisation, Account, Database & Schema
Organisation Overview

An organisation is used to:
• Manage one or more Snowflake accounts.
• Set up and administer Snowflake features which make use of multiple accounts.
• Monitor usage across accounts.
Organisation Setup

1. Contact Snowflake support.
2. Provide an organisation name and nominate an account.
3. The ORGADMIN role is added to the nominated account.
ORGADMIN Role

Account management:

CREATE ACCOUNT MYACCOUNT1
  ADMIN_NAME = admin
  ADMIN_PASSWORD = 'Password123'
  FIRST_NAME = jane
  LAST_NAME = smith
  EMAIL = '[email protected]'
  EDITION = enterprise
  REGION = aws_us_west_2;

SHOW ORGANIZATION ACCOUNTS;

SHOW REGIONS;

Enable cross-account features:

SELECT system$global_account_set_parameter(
  'UT677AA',
  'ENABLE_ACCOUNT_DATABASE_REPLICATION',
  'true');

Monitoring account usage:

SELECT ROUND(SUM(AVERAGE_BYTES) / POWER(1024,4), 2)
FROM ORGANIZATION_USAGE.STORAGE_DAILY_HISTORY
WHERE USAGE_DATE = CURRENT_DATE();
Account Overview

An account is the administrative name for a collection of storage, compute and cloud services deployed and managed entirely on a selected cloud platform.

Each account is hosted on a single cloud provider:
• Amazon Web Services (AWS)
• Google Cloud Platform (GCP)
• Microsoft Azure (Azure)

Each account is provisioned in a single geographic region.

Each account is created as a single Snowflake edition.

An account is created with the system-defined role ACCOUNTADMIN.
Account Regions

Example region identifiers:
• aws.us-west-2: US West (Oregon)
• aws.ca-central-1: Canada (Central)
• azure.westus2: West US 2 (Washington)
Account URL

Using an account locator as an identifier:

xy12345.us-east-2.aws.snowflakecomputing.com
(account locator: xy12345, region ID: us-east-2, cloud provider: aws)

Using an organization and account name as an identifier:

acme-marketing-test-account.snowflakecomputing.com
(organization name: acme, account name: marketing-test-account)
Database & Schemas

A database contains one or more schemas (e.g. SCHEMA_1, SCHEMA_2, SCHEMA_3), and each schema contains database objects such as tables and views.
DATABASE
• Databases must have a unique identifier in an account.
• A database identifier must start with an alphabetic character and cannot contain spaces or special characters unless enclosed in double quotes.

CREATE DATABASE MY_DATABASE;

CREATE DATABASE MY_DB_CLONE CLONE MYTESTDB;

CREATE DATABASE MYDB1
  AS REPLICA OF MYORG.ACCOUNT1.MYDB1
  DATA_RETENTION_TIME_IN_DAYS = 10;

CREATE DATABASE SHARED_DB FROM SHARE UTT783.SHARE;

SCHEMA
• Schemas must have a unique identifier in a database.
• A schema identifier must start with an alphabetic character and cannot contain spaces or special characters unless enclosed in double quotes.

CREATE SCHEMA MY_SCHEMA;

CREATE SCHEMA MY_SCHEMA_CLONE CLONE MY_SCHEMA;

Together, a database and schema form a namespace: MY_DATABASE.MY_SCHEMA
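A short sketch of working with the namespace above (MY_TABLE is a hypothetical table name):

USE DATABASE MY_DATABASE;
USE SCHEMA MY_SCHEMA;

-- or reference an object by its fully qualified name
SELECT * FROM MY_DATABASE.MY_SCHEMA.MY_TABLE;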
Table and View Types
Table Types

• Permanent: the default table type. Exists until explicitly dropped. Time Travel up to 90 days. Fail-safe period applies.
• Temporary: used for transitory data. Persists for the duration of a session. Time Travel up to 1 day. No Fail-safe period.
• Transient: exists until explicitly dropped. Time Travel up to 1 day. No Fail-safe period.
• External: queries data outside Snowflake. Read-only table. No Time Travel or Fail-safe.
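A minimal sketch of how each type is declared (table, stage and column names here are illustrative, not from the slides):

-- Permanent (default)
CREATE TABLE MY_PERMANENT_TABLE (ID INTEGER);

-- Temporary: visible only within the current session
CREATE TEMPORARY TABLE MY_TEMP_TABLE (ID INTEGER);

-- Transient: persists until dropped, but has no Fail-safe period
CREATE TRANSIENT TABLE MY_TRANSIENT_TABLE (ID INTEGER);

-- External: reads files from an external stage (assumes @MY_EXT_STAGE exists)
CREATE EXTERNAL TABLE MY_EXT_TABLE
  LOCATION = @MY_EXT_STAGE
  FILE_FORMAT = (TYPE = PARQUET);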
View Types

Standard:
CREATE VIEW MY_VIEW AS SELECT COL1, COL2 FROM MY_TABLE;
• Does not contribute to storage cost.
• If the source table is dropped, querying the view returns an error.
• Used to restrict the contents of a table.

Materialized:
CREATE MATERIALIZED VIEW MY_VIEW AS SELECT COL1, COL2 FROM MY_TABLE;
• Stores the results of a query definition and periodically refreshes it.
• Incurs cost as a serverless feature.
• Used to boost performance of external tables.

Secure:
CREATE SECURE VIEW MY_VIEW AS SELECT COL1, COL2 FROM MY_TABLE;
• Both standard and materialized views can be secure.
• The underlying query definition is only visible to authorized users.
• Some query optimizations are bypassed to improve security.
UDFs and Stored Procedures
User Defined Functions (UDFs)

User defined functions (UDFs) are schema-level objects that enable users to write their own functions in different languages: SQL, JavaScript, Python, Java.

CREATE FUNCTION AREA_OF_CIRCLE(radius FLOAT)
RETURNS FLOAT
AS
$$
  pi() * radius * radius
$$;

• UDFs accept 0 or more parameters.
• UDFs can return scalar or tabular results (UDTF), e.g. RETURNS TABLE (area NUMBER).
• UDFs can be called as part of a SQL statement: SELECT AREA_OF_CIRCLE(col1) FROM MY_TABLE;
• UDFs can be overloaded.
JavaScript UDF

CREATE FUNCTION JS_FACTORIAL(d double)
RETURNS DOUBLE
LANGUAGE JAVASCRIPT
AS
$$
  if (D <= 0) {
    return 1;
  } else {
    var result = 1;
    for (var i = 2; i <= D; i++) {
      result = result * i;
    }
    return result;
  }
$$;

• JavaScript is specified with the LANGUAGE parameter.
• Enables use of high-level programming language features.
• JavaScript UDFs can refer to themselves recursively.
• Snowflake data types are mapped to JavaScript data types.
Java UDF

CREATE FUNCTION DOUBLE_IT(X INTEGER)
RETURNS INTEGER
LANGUAGE JAVA
HANDLER = 'TestDoubleFunc.doubleIt'
TARGET_PATH = '@~/TestDoubleFunc.jar'
AS
$$
  class TestDoubleFunc {
    public static int doubleIt(int x) {
      return x * 2;
    }
  }
$$;

• Snowflake boots up a JVM to execute a function written in Java.
• Snowflake currently supports writing UDFs in Java versions 8.x, 9.x, 10.x, and 11.x.
• Java UDFs can specify their definition as in-line code or a pre-compiled JAR file.
• Java UDFs cannot be designated as secure.
External Functions

CREATE OR REPLACE EXTERNAL FUNCTION CALC_SENTIMENT(STRING_COL VARCHAR)  -- function name and parameters
RETURNS VARIANT                                                         -- return type
API_INTEGRATION = AWS_API_INTEGRATION                                   -- integration object
AS 'https://ttu.execute-api.eu-west-2.amazonaws.com/';                   -- proxy service URL

CREATE OR REPLACE API INTEGRATION AWS_API_INTEGRATION
  API_PROVIDER = AWS_API_GATEWAY
  API_AWS_ROLE_ARN = 'ARN:AWS:IAM::123456789012:ROLE/MY_CLOUD_ACCOUNT_ROLE'
  API_ALLOWED_PREFIXES = ('HTTPS://XYZ.EXECUTE-API.US-WEST-2.AMAZONAWS.COM/PRODUCTION')
  ENABLED = TRUE;
External Function Call Lifecycle

1. A client program (Snowflake UI, SnowSQL, Python, …) issues a query such as SELECT CALC_SENTIMENT('x') FROM MY_TABLE;
2. Snowflake sends an HTTP POST request through the API integration to the proxy service (e.g. AWS API Gateway).
3. The proxy service forwards the request to the remote service implementing the external function (e.g. AWS Lambda).
4. The HTTP response travels back through the proxy service to Snowflake, and the results are returned to the client.
External Function Limitations

• Slower
• Scalar only
• Not sharable
• Less secure
• Egress charges
Stored Procedures

In Relational Database Management Systems (RDBMS), stored procedures were named collections of SQL statements often containing procedural logic.

A Database Admin (DBA) creates the procedure (SQL example from Microsoft SQL Server):

CREATE PROCEDURE CLEAR_EMP_TABLES
AS
BEGIN
  DELETE FROM EMP01 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
  DELETE FROM EMP02 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
  DELETE FROM EMP03 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
  DELETE FROM EMP04 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
  DELETE FROM EMP05 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
END

A Data Engineer (DE) then calls it:

EXECUTE CLEAR_EMP_TABLES;
Snowflake Stored Procedures

Stored procedures can be written in JavaScript, Snowflake Scripting (SQL), and Snowpark (Python, Java, Scala).
Stored Procedure: JavaScript

CREATE PROCEDURE EXAMPLE_STORED_PROCEDURE(PARAM1 STRING)  -- identifier and input parameters
RETURNS STRING                                            -- RETURNS clause is mandatory
LANGUAGE JAVASCRIPT                                       -- JAVASCRIPT, SQL, PYTHON, JAVA & SCALA
EXECUTE AS OWNER                                          -- execute with owner's rights or caller's rights
AS
$$
  var param1 = PARAM1;
  var sql_command = "SELECT * FROM " + param1;
  snowflake.execute({sqlText: sql_command});
  return "Succeeded.";
$$;

Stored procedures mix JavaScript and SQL in their definition using Snowflake’s JavaScript API.

CALL EXAMPLE_STORED_PROCEDURE('EMP01');
Stored Procedures & UDFs

Feature comparison:
• Called as part of a SQL statement: UDF yes; stored procedure no (invoked standalone with CALL).
• Ability to overload: both.
• 0 or more input parameters: both.
• Use of the JavaScript API: stored procedure only.
• Return of a value optional: stored procedure only (a UDF must return a value).
• Values returned usable directly in SQL: UDF only.
• Can call itself recursively: UDF.

Functions calculate something and return a value to the user. Stored procedures perform actions rather than return values.
Sequences
CREATE SEQUENCE DEFAULT_SEQUENCE
  START = 1
  INCREMENT = 1;

SELECT DEFAULT_SEQUENCE.NEXTVAL;  -- 1
SELECT DEFAULT_SEQUENCE.NEXTVAL;  -- 2
SELECT DEFAULT_SEQUENCE.NEXTVAL;  -- 3

Values generated by a sequence are globally unique.
CREATE SEQUENCE INCREMENT_SEQUENCE
  START = 0
  INCREMENT = 5;

SELECT INCREMENT_SEQUENCE.NEXTVAL, INCREMENT_SEQUENCE.NEXTVAL, INCREMENT_SEQUENCE.NEXTVAL, INCREMENT_SEQUENCE.NEXTVAL;

NEXTVAL  NEXTVAL_1  NEXTVAL_2  NEXTVAL_3
0        5          10         15
35       40         45         50

Sequences cannot guarantee their values will be gap free.
Using a sequence in an INSERT INTO a table:

CREATE SEQUENCE TRANSACTION_SEQ
  START = 1001
  INCREMENT = 1;

INSERT INTO TRANSACTIONS (ID)
VALUES (TRANSACTION_SEQ.NEXTVAL);

SELECT ID FROM TRANSACTIONS;

ID
1001
Using a sequence as the default value for a column:

CREATE TABLE TRANSACTIONS
  (ID INTEGER DEFAULT TRANSACTION_SEQ.NEXTVAL,
   AMOUNT DOUBLE);

INSERT INTO TRANSACTIONS (AMOUNT) VALUES (756.00);

SELECT ID, AMOUNT FROM TRANSACTIONS;

ID     AMOUNT
1002   756.00
Tasks & Streams
Tasks

A task is an object used to schedule the execution of a SQL command or a stored procedure.

Creating a task requires the ACCOUNTADMIN role or the CREATE TASK privilege.

CREATE TASK T1                        -- task name
  WAREHOUSE = MYWH                    -- warehouse definition
  SCHEDULE = '30 MINUTE'              -- triggering mechanism
AS
  COPY INTO MY_TABLE FROM @MY_STAGE;  -- query definition

ALTER TASK T1 RESUME;                 -- start the task triggering mechanism

Tree of Tasks
• A root task must define a schedule; child tasks cannot define a schedule.
• A tree can contain up to 1000 child tasks.

CREATE TASK T2
  WAREHOUSE = MYWH
  AFTER T1                            -- child task
AS
  COPY INTO MY_TABLE FROM @MY_STAGE;
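Tasks can also be inspected and triggered manually; a small sketch, assuming the T1 task above and the OPERATE privilege on it:

SHOW TASKS;

EXECUTE TASK T1;  -- manually trigger a single run outside the schedule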
Streams

A stream is an object created to view & track DML changes to a source table: inserts, updates & deletes.

Create a stream:
CREATE STREAM MY_STREAM ON TABLE MY_TABLE;

Query a stream:
SELECT * FROM MY_STREAM;
Walkthrough of a stream progressing through table versions:

1. INSERT 10 ROWS into MYTABLE (table version V1), then CREATE STREAM MYSTREAM ON TABLE MYTABLE;
   SELECT * FROM MYSTREAM;  -- empty stream: the offset starts at the current table version
2. UPDATE 2 ROWS (table version V2).
   SELECT * FROM MYSTREAM;  -- shows 2 updates
3. DELETE 1 ROW (table version V3).
   SELECT * FROM MYSTREAM;  -- shows 2 updates and 1 delete

Consuming the stream in a DML statement (e.g. INSERT INTO MYTABLE2 SELECT * FROM MYSTREAM;) progresses the stream offset.

Tasks & Streams combined:

CREATE TASK MYTASK1
  WAREHOUSE = MYWH
  SCHEDULE = '5 MINUTE'
WHEN
  SYSTEM$STREAM_HAS_DATA('MYSTREAM')
AS
  INSERT INTO MYTABLE1(ID, NAME)
  SELECT ID, NAME FROM MYSTREAM WHERE METADATA$ACTION = 'INSERT';
Billing Overview

Two payment models:
• On-demand: pay for usage as you go.
• Capacity: pay for usage upfront.

Five billable service categories:
• Virtual Warehouse Services
• Cloud Services
• Serverless Services
• Storage
• Data Transfer

Usage is billed either in Snowflake credits (compute) or as a dollar value (storage and data transfer).
Compute Billing Overview

Credits are Snowflake’s billing unit of measure for compute resource consumption.

Virtual Warehouse Services
• Credit consumption is calculated based on the size of the virtual warehouse.
• Credits are calculated on a per-second basis while a virtual warehouse is in the ‘started’ state.
• Credits are calculated with a minimum of 60 seconds.

Cloud Services
• Credits are calculated at a rate of 4.4 credits per compute-hour.
• Only cloud services usage that exceeds 10% of the daily usage of the compute resources is billed. This is called the Cloud Services Adjustment.

Serverless Services
• Each serverless feature has its own credit rate per compute-hour.
• Serverless features are composed of both compute services and cloud services.
• The Cloud Services Adjustment does not apply to cloud services usage when used by serverless features.
Data Storage & Transfer Billing Overview

Storage and data transfer are billed as a dollar value (in currency), not credits.

Data Storage
• Calculated monthly based on the average number of on-disk bytes per day in database tables and internal stages.
• Costs are calculated based on a flat dollar rate per terabyte (TB), which depends on capacity or on-demand billing, the cloud provider, and the region.

Data Transfer
• Data transfer charges apply when moving data from one region to another or from one cloud platform to another.
• Unloading data from Snowflake using the COPY INTO <location> command.
• Replicating data to a Snowflake account in a different region or cloud platform.
• External functions transferring data out of and into Snowflake.
SnowCD

SnowCD (Snowflake Connectivity Diagnostic tool) checks the network connection between a client machine and Snowflake.

SELECT SYSTEM$WHITELIST();  -- returns the hostnames and port numbers Snowflake uses

Save the output (e.g. to whitelist.json) and run the tool against it:

snowcd ~\whitelist.json

Performing 30 checks on 12 hosts
All checks passed.
Connectivity: Connectors, Drivers and Partnered Tools
Connectors and Drivers

Snowflake provides connectors and drivers to connect third-party tools and custom applications: Python, Spark and Kafka connectors, plus JDBC, ODBC, Go, PHP, .NET and NodeJS drivers.
Python Connector Example

pip install snowflake-connector-python==2.6.2
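A minimal connect-and-query sketch using the connector installed above; the account and credential values are placeholders to substitute with your own:

import snowflake.connector

# Placeholder credentials: use your own account identifier and login
conn = snowflake.connector.connect(
    account="xy12345.us-east-2.aws",
    user="MY_USER",
    password="MY_PASSWORD",
    warehouse="MY_WH",
)

# Run a simple query and fetch the result
cur = conn.cursor()
cur.execute("SELECT CURRENT_VERSION()")
print(cur.fetchone())
conn.close()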
Snowflake Partner Tools

Partner categories: Business Intelligence · Data Integration · Security & Governance · SQL Development & Management · Machine Learning & Data Science.

Snowflake Partner Connect is a feature to expedite connectivity with partnered tools.
Snowflake Scripting

Snowflake Scripting is an extension to Snowflake SQL that adds support for procedural logic.

It’s used to write stored procedures and procedural code outside of a stored procedure.

Block structure:

DECLARE
  (variable declarations, cursor declarations, etc.)
BEGIN
  (Snowflake Scripting and SQL statements)
EXCEPTION
  (statements for handling exceptions)
END;
declare
  leg_a number(38, 2);
  hypotenuse number(38, 5);
begin
  leg_a := 2;
  let leg_b := 5;
  hypotenuse := sqrt(square(leg_a) + square(leg_b));
  return hypotenuse;
end;

Result (anonymous block): 5.38516

Variables can only be used within the scope of the block.

Variables can also be declared and assigned in the BEGIN section using the LET keyword.
The same block wrapped in a stored procedure:

CREATE PROCEDURE pythagoras()
RETURNS float
LANGUAGE sql
AS
declare
  leg_a number(38, 2);
  hypotenuse number(38, 5);
begin
  leg_a := 2;
  let leg_b := 5;
  hypotenuse := sqrt(square(leg_a) + square(leg_b));
  return hypotenuse;
end;

SnowSQL and the Classic Console do not correctly parse Snowflake Scripting blocks; they need to be wrapped in string constant delimiters such as dollar signs.
Branching Constructs

begin
  let count := 4;
  if (count % 2 = 0) then
    return 'even value';
  else
    return 'odd value';
  end if;
end;

Result (anonymous block): even value

The DECLARE and EXCEPTION sections of a block are optional.
Looping Constructs

declare
  total integer default 0;
  max_num integer default 10;
begin
  for i in 1 to max_num do
    total := i + total;
  end for;
  return total;
end;

Result (anonymous block): 55
Cursor

declare
  total_amount float;
  c1 cursor for select amount from transactions;
begin
  total_amount := 0.0;
  for record in c1 do
    total_amount := total_amount + record.amount;
  end for;
  return total_amount;
end;

Result (anonymous block): 136.78
RESULTSET

declare
  res resultset;
begin
  res := (select amount from transactions);
  return table(res);
end;

Result:

amount
101.01
24.78
10.99
RESULTSET with a Cursor

declare
  total_amount float;
  res resultset default (select amount from transactions);
  c1 cursor for res;
begin
  total_amount := 0.0;
  for record in c1 do
    total_amount := total_amount + record.amount;
  end for;
  return total_amount;
end;

Result (anonymous block): 136.78
Snowpark

The Snowpark API provides a DataFrame abstraction (collections of Row objects) for Java, Scala and Python, with methods such as .select(), .join(), .group_by(), .distinct(), .drop(), .union() and .sort().

DataFrames are lazily evaluated/executed, and computation is pushed down into Snowflake.
Snowpark API: Python

import os
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": os.environ["snowflake_account"],
    "user": os.environ["snowflake_user"],
    "password": os.environ["snowflake_password"],
    "role": os.environ["snowflake_user_role"],
    "warehouse": os.environ["snowflake_warehouse"],
    "database": os.environ["snowflake_database"],
    "schema": os.environ["snowflake_schema"]
}
Snowpark API: Python

14 session = Session.builder.configs(connection_parameters).create()

15 transactions_df = session.table("transactions")

16 print(transactions_df.collect())

Console output:

[Row(ACCOUNT_ID=8764442, AMOUNT=12.99),
Row(ACCOUNT_ID=8764442, AMOUNT=50.0),
Row(ACCOUNT_ID=8764442, AMOUNT=1100.0),
Row(ACCOUNT_ID=8764443, AMOUNT=110.0),
Row(ACCOUNT_ID=8764443, AMOUNT=2766.0),
Row(ACCOUNT_ID=8764443, AMOUNT=1010.0),
Row(ACCOUNT_ID=8764443, AMOUNT=3022.23),
Row(ACCOUNT_ID=8764444, AMOUNT=6986.0),
Row(ACCOUNT_ID=8764444, AMOUNT=1500.0)]

tombaileycourses.com
transactions_df_filtered = transactions_df.filter(col("amount") >= 1000.00)

transaction_counts_df = transactions_df_filtered.group_by("account_id").count()

flagged_transactions_df = transaction_counts_df.filter(col("count") >= 2).rename(col("count"), "flagged_count")

flagged_transactions_df.write.save_as_table("flagged_transactions", mode="append")

print(flagged_transactions_df.show())

session.close()

Console output:

----------------------------------
|"ACCOUNT_ID"  |"FLAGGED_COUNT"  |
----------------------------------
|8764443       |3                |
|8764444       |2                |
----------------------------------
Access Control Overview

Role-based Access Control (RBAC) is an access control framework in which access privileges are assigned to roles, which are in turn assigned to users.

Snowflake combines RBAC with Discretionary Access Control (DAC), in which each object has an owner, who can in turn grant access to that object.
Securable Objects

Every securable object is owned by a single role, which can be found by executing a SHOW <object> command.

The owning role:
• Has all privileges on the object by default.
• Can grant or revoke privileges on the object to other roles.
• Can transfer ownership to another role.
• Shares control of the object if the owning role is shared.

Access to objects is also defined by privileges granted to roles, for example:
• Ability to create a warehouse.
• Ability to list tables contained in a schema.
• Ability to add data to a table.

Unless allowed by a grant, access to a securable object will be denied.
Roles

A role is an entity to which privileges on securable objects can be granted or revoked.

GRANT USAGE ON DATABASE TEST_DB TO ROLE TEST_ROLE;
GRANT SELECT ON TABLE TEST_TABLE TO ROLE TEST_ROLE;
GRANT USAGE ON SCHEMA TEST_SCHEMA TO ROLE TEST_ROLE;

Roles are assigned to users to give them the authorization to perform actions:

GRANT ROLE TEST_ROLE TO USER ADMIN;

A user can have multiple roles and switch between them within a Snowflake session (see the sketch below).

Roles can be granted to other roles, creating a role hierarchy:

GRANT ROLE ROLE_3 TO ROLE ROLE_2;
GRANT ROLE ROLE_2 TO ROLE ROLE_1;

Privileges of child roles are inherited by parent roles (here ROLE_1 inherits the privileges of ROLE_2 and ROLE_3).
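Switching the active role within a session, using the role created above:

USE ROLE TEST_ROLE;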
System-defined Roles

ORGADMIN
• Manages operations at organization level.
• Can create accounts in an organization.
• Can view all accounts in an organization.
• Can view usage information across an organization.

ACCOUNTADMIN
• Top-level and most powerful role for an account.
• Encapsulates SYSADMIN & SECURITYADMIN.
• Responsible for configuring account-level parameters.
• Can view and operate on all objects in an account.
• Can view and manage Snowflake billing and credit data.
• Can stop any running SQL statements.

SYSADMIN
• Can create warehouses, databases, schemas and other objects in an account.
System-defined Roles

SECURITYADMIN
• Manage grants globally via the MANAGE GRANTS
privilege.
• Create, monitor and manage users and roles.

USERADMIN
• User and Role management via CREATE USER and
CREATE ROLE security privileges.
• Can create users and roles in an account.

PUBLIC
• Automatically granted to every user and every role in an
account.
• Can own securable objects, however objects owned by
PUBLIC role are available to every other user and role in
an account.

tombaileycourses.com
Custom Roles

Custom roles allow you to create a role with custom and fine-grained security privileges defined.

Custom roles allow administrators working with the system-defined roles to exercise the security principle of least privilege.

Custom roles can be created by the SECURITYADMIN & USERADMIN roles, as well as by any role to which the CREATE ROLE privilege has been granted.

It is recommended to create a hierarchy of custom roles, with the top-most custom role assigned to the SYSADMIN role.

If custom roles are not assigned to the SYSADMIN role, system admins will not be able to manage the objects owned by the custom role.
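A minimal sketch of the recommended hierarchy, using hypothetical role names:

-- Create custom roles (requires USERADMIN, SECURITYADMIN, or the CREATE ROLE privilege)
USE ROLE USERADMIN;
CREATE ROLE SALES_ANALYST;
CREATE ROLE SALES_ADMIN;

-- Build the hierarchy: SALES_ADMIN inherits SALES_ANALYST's privileges
GRANT ROLE SALES_ANALYST TO ROLE SALES_ADMIN;

-- Assign the top-most custom role to SYSADMIN so system admins
-- can manage the objects owned by the custom roles
GRANT ROLE SALES_ADMIN TO ROLE SYSADMIN;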
Privileges

A security privilege defines a level of access to an object (e.g. MODIFY, MONITOR, USAGE, OWNERSHIP).

For each object there is a set of security privileges that can be granted on it.

There are 4 categories of security privileges:
• Global privileges
• Privileges for account objects
• Privileges for schemas
• Privileges for schema objects

Privileges are managed using the GRANT and REVOKE commands:

GRANT USAGE ON DATABASE MY_DB TO ROLE MY_ROLE;

REVOKE USAGE ON DATABASE MY_DB FROM ROLE MY_ROLE;

Future grants allow privileges to be defined for objects not yet created:

GRANT SELECT ON FUTURE TABLES IN SCHEMA MY_SCHEMA TO ROLE MY_ROLE;
User Authentication

User authentication is the process of authenticating with Snowflake via user-provided username and password credentials.

User authentication is the default method of authentication.

Users with the USERADMIN role can create additional Snowflake users, which makes use of the CREATE USER privilege:

CREATE USER USER1
  PASSWORD='ABC123'
  DEFAULT_ROLE = MYROLE
  MUST_CHANGE_PASSWORD = TRUE;

Password requirements (e.g. 'q@-*DaC2yjZoq3Re4JYX'):
• A password can be any case-sensitive string up to 256 characters.
• Must be at least 8 characters long.
• Must contain at least 1 digit.
• Must contain at least 1 uppercase letter and 1 lowercase letter.
Multi-factor Authentication (MFA)

MFA is an additional layer of security, requiring the user to prove their identity not only with a password but with an additional piece of information (or factor).

MFA in Snowflake is powered by a service called Duo Security.

MFA is enabled on a per-user basis & only via the UI.

Snowflake recommends that all users with the ACCOUNTADMIN role be required to use MFA.
Multi-factor Authentication Flow

1. Enter Snowflake credentials in the interface.
2. Verify with Duo on your device: approve a Duo Push notification, click “Call Me” and follow the instructions on the phone call, or click “Enter a Passcode” and enter the passcode.
3. Login successful.
MFA Properties

MINS_TO_BYPASS_MFA
ALTER USER USER1 SET MINS_TO_BYPASS_MFA=10;
Specifies the number of minutes to temporarily disable MFA for the user so that they can log in.

DISABLE_MFA
ALTER USER USER1 SET DISABLE_MFA=TRUE;
Disables MFA for the user, effectively cancelling their enrolment. To use MFA again, the user must re-enrol.

ALLOW_CLIENT_MFA_CACHING
ALTER ACCOUNT SET ALLOW_CLIENT_MFA_CACHING=TRUE;
MFA token caching reduces the number of prompts that must be acknowledged while connecting and authenticating to Snowflake.
Federated Authentication (SSO)

Federated authentication enables users to connect to Snowflake using secure SSO (single sign-on).

Snowflake can delegate authentication responsibility to a SAML 2.0 compliant external identity provider (IdP), with native support for Okta and ADFS IdPs.

An IdP is an independent service responsible for creating and maintaining user credentials as well as authenticating users for SSO access to Snowflake.

In a federated environment, Snowflake is referred to as a Service Provider (SP).
Federated Authentication Login Flow

Snowflake-initiated: the user clicks the SSO button in the Snowflake interface, enters their IdP credentials, and on success is logged in to Snowflake.

IdP-initiated: the user logs in to the IdP, clicks on the Snowflake application, and is logged in to Snowflake.
Federated Authentication Properties

SAML_IDENTITY_PROVIDER specifies an IdP during the Snowflake setup of federated authentication:

ALTER ACCOUNT SET SAML_IDENTITY_PROVIDER =
'{
  "certificate": "XXXXXXXXXXXXXXXXXXX",
  "ssoUrl": "https://abccorp.testmachine.com/adfs/ls",
  "type" : "ADFS",
  "label" : "ADFSSingleSignOn"
}';

SSO_LOGIN_PAGE enables the button for Snowflake-initiated SSO for your identity provider (as specified in SAML_IDENTITY_PROVIDER) on the Snowflake main login page:

ALTER ACCOUNT SET SSO_LOGIN_PAGE = TRUE;
Key Pair Authentication, OAuth & SCIM
Key Pair Authentication

1. Generate a key pair using OpenSSL (2048-bit RSA).
2. Assign the public key to a Snowflake user:
   ALTER USER USER1 SET RSA_PUBLIC_KEY='MIIB%JA...';
3. Configure the Snowflake client to use the private key. Supported clients include SnowSQL, the Python, Spark and Kafka connectors, and the Go, JDBC, ODBC, .NET and Node.js drivers.
4. Configure key-pair rotation:
   ALTER USER USER1 SET RSA_PUBLIC_KEY_2='JER£E...';
   ALTER USER USER1 UNSET RSA_PUBLIC_KEY;
OAuth & SCIM

OAuth
• Snowflake supports the OAuth 2.0 protocol.
• OAuth is an open-standard protocol that allows supported clients authorized access to Snowflake without sharing or storing user login credentials.
• Snowflake offers two OAuth pathways: Snowflake OAuth and External OAuth.

SCIM
• System for Cross-domain Identity Management (SCIM) can be used to manage users and groups (Snowflake roles) in cloud applications using RESTful APIs, e.g. an IdP such as ADFS issuing a CREATE USER request to Snowflake (the SP).
Network Policies

Network policies provide the user with the ability to allow or deny access to their Snowflake account (e.g. us47171.eu-west-2.aws.snowflakecomputing.com) based on a single IP address or a list of addresses.

Network policies are composed of an allowed IP range and optionally a blocked IP range. Blocked IP ranges are applied first.

CREATE NETWORK POLICY MY_POLICY
  ALLOWED_IP_LIST=('192.168.1.0/24')
  BLOCKED_IP_LIST=('192.168.1.99');

Network policies currently support only IPv4 addresses.

Network policies use CIDR notation to express an IP subnet range.

Network policies can be applied at the account level or to individual users.

If a user is associated with both an account-level and a user-level network policy, the user-level policy takes precedence.
SHOW NETWORK POLICIES;

Account level
• Only one network policy can be associated with an account at any one time.
• ALTER ACCOUNT SET NETWORK_POLICY = MYPOLICY;
• The SECURITYADMIN or ACCOUNTADMIN system roles can apply policies, or a custom role with the ATTACH POLICY global privilege.
• SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN ACCOUNT;

User level
• Only one network policy can be associated with a user at any one time.
• ALTER USER USER1 SET NETWORK_POLICY = MYPOLICY;
• The SECURITYADMIN or ACCOUNTADMIN system roles can apply policies, or a custom role with the ATTACH POLICY global privilege.
• SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER USER1;
Data Encryption

Encryption at rest
• Table data and internal stage data are encrypted with AES-256 strong encryption.
• Virtual warehouse and query result caches are also encrypted.

Encryption in transit
• Connections via ODBC, JDBC, the Web UI and SnowSQL use HTTPS secured with TLS 1.2.
E2EE Encryption Flows

Internal stage: the client uploads files with PUT into an internal stage, then loads them with COPY INTO <table>.

External stage: files are placed in an external stage using cloud utilities, then loaded with COPY INTO <table>.
Hierarchical Key Model

The key hierarchy, from top to bottom:
• Root key (stored in AWS CloudHSM)
• Account master keys (one per account)
• Table master keys
• File keys (encrypting individual data files)
Key Rotation

Key rotation is the practice of transparently replacing existing account and table encryption keys every 30 days with a new key. For example, TMK v1 encrypts the files written in January, TMK v2 those written in February, and TMK v3 those written in March; retired keys remain available to decrypt the data they protected.
Re-Keying

Once a retired key exceeds 1 year, Snowflake automatically creates a new encryption key and re-encrypts all data previously protected by the retired key using the new key (e.g. TMK v1 Gen1 from January 2021 is re-keyed to TMK v1 Gen2 in January 2022).

ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE;
Tri-Secret Secure and Customer Managed Keys

With Tri-Secret Secure, the root of the key hierarchy becomes a composite key: the Snowflake-maintained account master key (AMK-S, held in an HSM) is combined with a customer-managed key (AMK-C, held in the customer's KMS) to form the account master composite key, which protects the table master keys and file keys beneath it.
Column Level Security
Dynamic Data Masking

Sensitive data in plain text is loaded into Snowflake, and it is dynamically masked at the time of query for unauthorized users.

An unauthorized user's SELECT against the table/view passes through the masking policy and returns masked values:

ID   Email
101  ******@gmail.com
102  ******@gmail.com
103  ******@gmail.com
An authorized user's SELECT against the same table/view returns the values unmasked:

ID   Email
101  [email protected]
102  [email protected]
103  [email protected]
Masking Policies

CREATE MASKING POLICY EMAIL_MASK AS (VAL STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('SUPPORT') THEN VAL                                          -- unmasked
    WHEN CURRENT_ROLE() IN ('ANALYST') THEN REGEXP_REPLACE(VAL,'.+\@','*****@')          -- partially masked
    WHEN CURRENT_ROLE() IN ('HR') THEN SHA2(VAL)                                         -- masked with a system function
    WHEN CURRENT_ROLE() IN ('SALES') THEN MASK_UDF(VAL)                                  -- masked with a user defined function
    WHEN CURRENT_ROLE() IN ('FINANCE') THEN OBJECT_INSERT(VAL, 'USER_EMAIL', '*', TRUE)  -- semi-structured masking
    ELSE '********'                                                                      -- fully masked
  END;

ALTER TABLE IF EXISTS EMP_INFO MODIFY COLUMN USER_EMAIL SET MASKING POLICY EMAIL_MASK;
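To detach the policy from the column later, a sketch of the corresponding UNSET:

ALTER TABLE IF EXISTS EMP_INFO MODIFY COLUMN USER_EMAIL UNSET MASKING POLICY;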
• Data masking policies are schema-level objects, like tables & views.
• Creating and applying data masking policies can be done independently of object owners.
• Masking policies can be nested, existing in tables and in views that reference those tables.
• A masking policy is applied no matter where the column is referenced in a SQL statement.
• A data masking policy can be applied either when the object is created or after the object is created.
External Tokenization

Tokenized data is loaded into Snowflake, and it is detokenized at query run-time for authorized users via masking policies that call an external tokenization service using external functions.

An authorized user's SELECT passes through the policy, which calls the external tokenization service over its REST API and returns detokenized values:

ID   DOB
101  01/02/1978
102  10/12/1960
103  10/09/2000
Row Level Security
Row Access Policies

Row access policies enable a security team to restrict which rows are returned in a query.

For an unauthorized user, the policy filters the rows out of the result (no rows returned). For an authorized user, the rows are returned unfiltered:

ID   Email
101  [email protected]
102  [email protected]
103  [email protected]
CREATE OR REPLACE ROW ACCESS POLICY RAP_ID AS (ACC_ID VARCHAR) RETURNS BOOLEAN ->
  CASE
    WHEN 'ADMIN' = CURRENT_ROLE() THEN TRUE
    ELSE FALSE
  END;

ALTER TABLE ACCOUNTS ADD ROW ACCESS POLICY RAP_ID ON (ACC_ID);

Similarities with masking policies:
• Schema-level object
• Segregation of duties
• Creation and applying workflow
• Nesting policies

Interactions with masking policies:
• Adding a masking policy to a column fails if the column is referenced by a row access policy.
• Row access policies are evaluated before data masking policies.
Secure Views

Secure views are a type of view designed to limit access to the underlying tables or internal structural details of a view.

Both standard and materialized views can be designated as secure.

A secure view is created by adding the keyword SECURE in the view DDL:

CREATE OR REPLACE SECURE VIEW MY_SEC_VIEW AS
SELECT COL1, COL2, COL3 FROM MY_TABLE;

The definition of a secure view is only available to the object owner (e.g. in the output of SHOW VIEWS, GET_DDL(), the Information Schema and Account Usage views).

Secure views bypass query optimizations which may inadvertently expose data in the underlying table; the query optimizer applies the view's authorization filter before user-supplied filters.
Account Usage and Information Schema
Account Usage

Snowflake provides a shared read-only database called SNOWFLAKE, imported using a Share object called ACCOUNT_USAGE.

It is comprised of 6 schemas, which contain many views providing fine-grained usage metrics at the account and object level.

By default, only users with the ACCOUNTADMIN role can access the SNOWFLAKE database.

SELECT * FROM "SNOWFLAKE"."ACCOUNT_USAGE"."TABLES";

Account usage views record dropped objects, not just those that are currently active (e.g. the TABLES view includes a DELETED timestamp column, such as 2022-12-03 09:08:35.765 -0800 for TABLE_ID 4).

There is latency between an event and when that event is recorded in an account usage view (around 2 hours).

Certain account usage views provide historical usage metrics. The retention period for these views is 1 year (365 days).
Information Schema

Each database created in an account automatically includes a built-in, read-only schema named INFORMATION_SCHEMA, based on the SQL-92 ANSI Information Schema.

Each INFORMATION_SCHEMA contains:
• Object metadata views displaying metadata for all objects contained in the database (e.g. TABLES, STAGES, PIPES, FUNCTIONS, …).
• Account metadata views displaying metadata for account-level objects, i.e. non-database objects such as roles, warehouses and databases (e.g. DATABASES, LOAD_HISTORY, ENABLED_ROLES, APPLICABLE_ROLES, …).
• Table functions displaying metadata for historical and usage data across an account (e.g. TASK_HISTORY, LOGIN_HISTORY, COPY_HISTORY, TAG_REFERENCES, …).

The output of a view or table function depends on the privileges granted to the user’s current role.
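A sketch of both access paths; the database and schema names are illustrative:

-- Object metadata view: tables in a given schema of the current database
SELECT TABLE_NAME, ROW_COUNT
FROM MY_DB.INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'MY_SCHEMA';

-- Table function: login history for the last day
SELECT *
FROM TABLE(MY_DB.INFORMATION_SCHEMA.LOGIN_HISTORY(
  TIME_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP())));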
Account Usage vs. Information Schema

Includes dropped objects
• Account Usage: yes
• Information Schema: no

Latency of data
• Account Usage: from 45 minutes to 3 hours (varies by view)
• Information Schema: none

Retention of historical data
• Account Usage: 1 year
• Information Schema: from 7 days to 6 months (varies by view/table function)
What is a Virtual Warehouse?
Virtual Warehouse Overview

A Virtual Warehouse is a named abstraction for a Massively Parallel Processing (MPP) compute cluster.

Virtual Warehouses execute:
• DQL operations (SELECT)
• DML operations (UPDATE)
• Data loading operations (COPY INTO)

As a user you only interact with the named warehouse object, not the underlying compute resources.
• Spin up and shut down a virtually unlimited number of warehouses without resource contention.
• Virtual Warehouse configuration can be changed on-the-fly.
• Virtual Warehouses contain local SSD storage used to store raw data retrieved from the storage layer.
• Virtual Warehouses are created via the Snowflake UI or through SQL commands.
Example warehouse commands:

CREATE WAREHOUSE MY_MED_WH
  WAREHOUSE_SIZE='MEDIUM';

ALTER WAREHOUSE MY_WH_2 SET
  WAREHOUSE_SIZE=MEDIUM;

ALTER WAREHOUSE MY_WH SUSPEND;

CREATE WAREHOUSE MY_WH_3
  MIN_CLUSTER_COUNT=1
  MAX_CLUSTER_COUNT=3
  SCALING_POLICY=STANDARD;

DROP WAREHOUSE MY_WAREHOUSE;
Sizing and Billing
Virtual Warehouse Sizes

Virtual Warehouses can be created in 10 t-shirt sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, 3X-Large, 4X-Large, 5X-Large and 6X-Large.

• Underlying compute power approximately doubles with each size.
• In general, the larger the Virtual Warehouse the better the query performance.
• Choosing a size is typically done by experimenting with a representative query of a workload.
• Data loading does not typically require large Virtual Warehouses, and sizing up does not guarantee increased data loading performance.
Virtual Warehouse Billing

Size       Credits/Hour   Credits/Second
X-Small    1              0.0003
Small      2              0.0006
Medium     4              0.0011
Large      8              0.0022
X-Large    16             0.0044
2X-Large   32             0.0089
3X-Large   64             0.0178
4X-Large   128            0.0356
5X-Large   256            0.0711
6X-Large   512            0.1422
Credit Calculation

Example for an X-Small warehouse (1 credit/hour):

Running Time   Credits
0-60 seconds   0.017
61 seconds     0.017
2 minutes      0.033
10 minutes     0.167
1 hour         1.000

The first 60 seconds after a virtual warehouse is provisioned and running are always charged.
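The same calculation scales linearly with warehouse size. For example, a Medium warehouse (4 credits/hour) running for 10 minutes consumes 4 × (600 / 3600) ≈ 0.667 credits, and a 90-second run is billed as 4 × (90 / 3600) = 0.1 credits, since past the 60-second minimum billing is per second.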
Credit Pricing

Credit price is determined by region & Snowflake edition.
If an XS Virtual Warehouse (1 credit/hour) is active for 1 hour on the Standard Edition of Snowflake deployed in the AWS Europe (London) region, it will consume 1 Snowflake credit, costing $2.70 (Nov 2021).

If an L Virtual Warehouse (8 credits/hour) is active for 3 hours on the Enterprise Edition of Snowflake deployed in the AWS AP Northeast 1 (Tokyo) region, it will consume 8 × 3 = 24 Snowflake credits, costing $72.00 (Nov 2021).
Virtual Warehouse State and Properties
Virtual Warehouse State

A warehouse can be in the STARTED, SUSPENDED or RESIZING state.
• By default, when a Virtual Warehouse is created it is in the STARTED state:
  CREATE WAREHOUSE MY_MED_WH WITH WAREHOUSE_SIZE='MEDIUM';

• Suspending a Virtual Warehouse puts it in the SUSPENDED state, removing the compute nodes from the warehouse:
  ALTER WAREHOUSE MY_WH SUSPEND;

• Resuming a Virtual Warehouse puts it back into the STARTED state so it can execute queries:
  ALTER WAREHOUSE MY_WH RESUME;
Virtual Warehouse State Properties

AUTO_SUSPEND
CREATE WAREHOUSE MY_MED_WH AUTO_SUSPEND=300;
Specifies the number of seconds of inactivity after which a warehouse is automatically suspended.

AUTO_RESUME
CREATE WAREHOUSE MY_MED_WH AUTO_RESUME=TRUE;
Specifies whether to automatically resume a warehouse when a SQL statement is submitted to it.

INITIALLY_SUSPENDED
CREATE WAREHOUSE MY_MED_WH INITIALLY_SUSPENDED=TRUE;
Specifies whether the warehouse is created initially in the ‘Suspended’ state.
Resource Monitors

Resource Monitors are objects allowing users to set credit limits on user-managed warehouses (e.g. send a notification when 75% of a weekly limit of 10 credits has been consumed).

• Resource Monitors can be set at either the account or individual warehouse level.
• Limits can be set for a specified interval or date range.
• When limits are reached, an action can be triggered, such as notifying the user or suspending the warehouse.
• Resource Monitors can only be created by account administrators.
CREATE RESOURCE MONITOR ANALYSIS_RM
  WITH CREDIT_QUOTA=100                    -- credits allocated per frequency interval
  FREQUENCY=MONTHLY                        -- DAILY, WEEKLY, MONTHLY, YEARLY or NEVER
  START_TIMESTAMP='2023-01-04 00:00 GMT'   -- when the monitor starts once applied; the frequency is relative to this
  TRIGGERS ON 50 PERCENT DO NOTIFY         -- triggers determine the condition for an action to take place
           ON 75 PERCENT DO NOTIFY
           ON 95 PERCENT DO SUSPEND
           ON 100 PERCENT DO SUSPEND_IMMEDIATE;

ALTER ACCOUNT SET RESOURCE_MONITOR = ANALYSIS_RM;
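A monitor can likewise be attached to a single warehouse; a sketch using a hypothetical warehouse name:

ALTER WAREHOUSE MY_WH SET RESOURCE_MONITOR = ANALYSIS_RM;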
Virtual Warehouse Concurrency & Query Complexity
Scaling Up: Resizing Virtual Warehouses

Scaling up a Virtual Warehouse is intended to improve query performance.

Virtual Warehouses can be manually resized via the Snowflake UI or SQL commands:

ALTER WAREHOUSE MY_WH SET WAREHOUSE_SIZE=LARGE;

Resizing a running warehouse does not impact running queries. The additional compute resources are used for queued and new queries.

Decreasing the size of a running warehouse removes compute resources from the warehouse and clears the warehouse cache.
Scaling Out: Multi-cluster Warehouses

A multi-cluster warehouse is a named group of virtual warehouses which can automatically scale in and out based on the number of concurrent users/queries.

• MIN_CLUSTER_COUNT specifies the minimum number of warehouses for a multi-cluster warehouse.
• MAX_CLUSTER_COUNT specifies the maximum number of warehouses for a multi-cluster warehouse.
• Setting these two values the same (e.g. MIN_CLUSTER_COUNT=4 & MAX_CLUSTER_COUNT=4) puts the multi-cluster warehouse in MAXIMIZED mode.
• Setting these two values differently (e.g. MIN_CLUSTER_COUNT=1 & MAX_CLUSTER_COUNT=4) puts the multi-cluster warehouse in AUTO-SCALE mode.
Standard Scaling Policy

CREATE WAREHOUSE MY_MCW_1
  MIN_CLUSTER_COUNT=1
  MAX_CLUSTER_COUNT=4
  SCALING_POLICY=STANDARD;

Scaling out: when a query is queued, a new warehouse is added to the group immediately.

Scaling in: every minute, a background process checks whether the load on the least busy warehouse can be redistributed to another warehouse. If this condition is met for 2 consecutive minutes, the warehouse is marked for shutdown.
Economy Scaling Policy

CREATE WAREHOUSE MY_MCW_2
  MIN_CLUSTER_COUNT=1
  MAX_CLUSTER_COUNT=4
  SCALING_POLICY=ECONOMY;

Scaling out: when a query is queued, the system estimates whether there is enough query load to keep a new warehouse busy for 6 minutes before adding it.

Scaling in: every minute, a background process checks whether the load on the least busy warehouse can be redistributed to another warehouse. If this condition is met for 6 consecutive minutes, the warehouse is marked for shutdown.
Multi-cluster Warehouse Billing

CREATE WAREHOUSE MY_MCW
  MIN_CLUSTER_COUNT=1
  MAX_CLUSTER_COUNT=3
  WAREHOUSE_SIZE='MEDIUM';

The total credit cost of a multi-cluster warehouse is the sum of all the individual running warehouses that make up that cluster.

The maximum number of credits a multi-cluster warehouse can consume is the number of warehouses multiplied by the hourly credit rate of the size of the warehouses, e.g. 3 clusters × 4 credits (Medium) = 12 credits/hour.

Because multi-cluster warehouses scale in and out based on demand, it’s typical to get some fraction of the maximum credit consumption.
Concurrency Behaviour Properties

MAX_CONCURRENCY_LEVEL
CREATE WAREHOUSE MY_MED_WH MAX_CONCURRENCY_LEVEL=6;
Specifies the number of concurrent SQL statements that can be executed against a warehouse before either it is queued or additional compute power is provided.

STATEMENT_QUEUED_TIMEOUT_IN_SECONDS
CREATE WAREHOUSE MY_MED_WH STATEMENT_QUEUED_TIMEOUT_IN_SECONDS=60;
Specifies the time, in seconds, a SQL statement can be queued on a warehouse before it is aborted.

STATEMENT_TIMEOUT_IN_SECONDS
CREATE WAREHOUSE MY_MED_WH STATEMENT_TIMEOUT_IN_SECONDS=600;
Specifies the time, in seconds, after which any running SQL statement on a warehouse is aborted.
Performance and Tuning Overview
Query Performance Analysis Tools

Tools: the History tab, the QUERY_HISTORY view/table function, and the Query Profile.

The History tab displays query history for the last 14 days.

Users can view other users’ queries but cannot view their query results.
SQL Tuning
Database Order of Execution

Rows:    FROM → JOIN → WHERE
Groups:  GROUP BY → HAVING
Result:  SELECT → DISTINCT → ORDER BY → LIMIT
Join Explosion

Orders:
ORDER_DATE   PRODUCT_NAME         CUSTOMER_NAME   ORDER_AMOUNT
01/12/2022   Apple MacBook Air    Arun            1
13/12/2021   Sony Playstation 5   Ana             1
01/01/2022   LG Blu-ray Player    Pawel           2
21/02/2020   Sony Playstation 5   Susan           1

Products:
PRODUCT_NAME         PRODUCT_PRICE   ORDER_DATE
Apple MacBook Air    899.99          01/12/2022
LG Blu-ray Player    110.00          01/01/2022
Sony Playstation 5   449.00          12/11/2020
Sony Playstation 5   429.00          10/06/2021

SELECT *, (O.ORDER_AMOUNT * P.PRODUCT_PRICE)
FROM ORDERS O
LEFT JOIN PRODUCTS P ON O.PRODUCT_NAME = P.PRODUCT_NAME;

Because "Sony Playstation 5" appears twice in PRODUCTS, each matching order row is duplicated in the result:

ORDER_DATE   PRODUCT_NAME         CUSTOMER_NAME   ORDER_AMOUNT   ORDER_TOTAL
01/12/2022   Apple MacBook Air    Arun            1              899.99
13/12/2021   Sony Playstation 5   Ana             1              449.00
13/12/2021   Sony Playstation 5   Ana             1              429.00
01/01/2022   LG Blu-ray Player    Pawel           2              220.00
21/02/2020   Sony Playstation 5   Susan           1              449.00
21/02/2020   Sony Playstation 5   Susan           1              429.00
Join Explosion

Join on columns with unique values.

Understand table relationships.

tombaileycourses.com
Limit & Order By
SELECT * FROM CUSTOMER
ORDER BY C_ACCTBAL;

SELECT * FROM CUSTOMER
ORDER BY C_ACCTBAL
LIMIT 10;

tombaileycourses.com
Spilling to Disk

"Bytes spilled to local storage"

Volume of data spilled to virtual warehouse local disk.

"Bytes spilled to remote storage"

Volume of data spilled to remote disk.

Remediations: process less data, or increase the virtual warehouse size.
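Spilling can also be spotted after the fact; a minimal sketch, assuming access to the SNOWFLAKE.ACCOUNT_USAGE share:

-- find queries that spilled to remote storage
SELECT QUERY_ID,
       BYTES_SPILLED_TO_LOCAL_STORAGE,
       BYTES_SPILLED_TO_REMOTE_STORAGE
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE BYTES_SPILLED_TO_REMOTE_STORAGE > 0
ORDER BY BYTES_SPILLED_TO_REMOTE_STORAGE DESC;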

tombaileycourses.com
Order By Position

SELECT C_NATIONKEY, R.R_NAME, TOTAL_BAL FROM
(
  SELECT
    C_NATIONKEY,
    COUNT(C_ACCTBAL) AS TOTAL_BAL
  FROM CUSTOMER
  GROUP BY C_NATIONKEY
  ORDER BY C_NATIONKEY -- Redundant
) C JOIN REGION R ON (C.C_NATIONKEY = R.R_REGIONKEY)
ORDER BY TOTAL_BAL; -- Top-level

ORDER BY in the top-level SELECT only.

tombaileycourses.com
Group By
SELECT C_NATIONKEY, COUNT(C_ACCTBAL) SELECT C_CUSTKEY, COUNT(C_ACCTBAL)
FROM CUSTOMER FROM CUSTOMER
GROUP BY C_NATIONKEY; -- Low Cardinality GROUP BY C_CUSTKEY; -- High Cardinality

tombaileycourses.com
Caching

tombaileycourses.com
Caching

Services Layer: Metadata Cache, Results Cache

Warehouse Layer: Virtual Warehouses, each with a Local Disk Cache

Storage Layer: Remote Disk

tombaileycourses.com
Caching

Metadata Cache

Snowflake has a high availability metadata store which maintains metadata object information and statistics.

Some queries can be completed purely using this metadata, not requiring a running virtual warehouse:

SELECT COUNT(*) FROM MY_TABLE;
SELECT SYSTEM$WHITELIST();
SELECT CURRENT_DATABASE();
DESCRIBE TABLE MY_TABLE;
SHOW TABLES;

tombaileycourses.com
Caching

Result Cache

Results are retained in the result cache for 24 hours; each reuse resets the retention, up to a maximum of 31 days.

To reuse a result:
• The new query exactly matches the previous query.
• The underlying table data has not changed.
• The same role is used as the previous query.

If time context functions are used, such as CURRENT_TIME(), the result cache will not be used.

Result reuse can be disabled using the session parameter USE_CACHED_RESULT.
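For example, result reuse can be switched off for the current session:

ALTER SESSION SET USE_CACHED_RESULT = FALSE;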

tombaileycourses.com
Caching

Warehouse Cache

Virtual warehouses have local SSD storage which maintains raw table data used for processing a query.

The larger the virtual warehouse, the greater the local cache.

It is purged when the virtual warehouse is resized, suspended or dropped.

It can be used partially, retrieving the rest of the data required for a query from remote storage.

tombaileycourses.com
Materialized Views

tombaileycourses.com
Materialized Views

"A Materialized View is a pre-computed & persisted data set derived from a SELECT query."

MVs are updated via a background process, ensuring data is current and consistent with the base table.

MVs improve query performance by making complex queries that are commonly executed readily available.

MVs are an Enterprise edition and above serverless feature.
tombaileycourses.com
Materialized Views

MVs use compute resources to perform automatic background maintenance; refresh activity can be monitored with the MATERIALIZED_VIEW_REFRESH_HISTORY table function.

MVs use storage to store query results, adding to the monthly storage usage for an account.

MVs can be created on top of External Tables to improve their query performance.

MVs are limited in the following ways: they can query only a single table (no joins), and cannot contain UDFs, HAVING, ORDER BY, LIMIT or window functions.

CREATE OR REPLACE MATERIALIZED VIEW MV1 AS
SELECT COL1, COL2 FROM T1;

ALTER MATERIALIZED VIEW MV1 SUSPEND;

ALTER MATERIALIZED VIEW MV1 RESUME;

SHOW MATERIALIZED VIEWS LIKE 'MV1%';
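A minimal sketch of inspecting refresh activity with the Information Schema table function named above (assuming the view MV1 from this slide):

SELECT *
FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY(
  MATERIALIZED_VIEW_NAME => 'MV1'));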

tombaileycourses.com
Clustering

tombaileycourses.com
Natural Clustering

[Diagram: with natural ordering, micro-partitions hold narrow, non-overlapping value ranges (A-C, D-F, G-G, X-Z); with unordered data, the same rows produce wide, overlapping ranges (B-X, C-Z, D-Z, A-Y).]

tombaileycourses.com
Clustering

Products.csv

ORDER_ID PRODUCT_ID ORDER_DATE

1 PROD-YVN1VO 2022-06-16

2 PROD-Y5TTKB 2022-06-16

3 PROD-T9ISFR 2022-06-16

4 PROD-HK2USO 2022-06-16

5 PROD-YVN1VO 2022-06-17

6 PROD-BKMWB 2022-06-17

7 PROD-IPM6HU 2022-06-18

8 PROD-IPM6HU 2022-06-18

9 PROD-YVN1VO 2022-06-19

… … …

tombaileycourses.com
Clustering

ORDER_ID PRODUCT_ID ORDER_DATE

MP 1:
1 PROD-YVN1VO 2022-06-16
2 PROD-Y5TTKB 2022-06-16
3 PROD-T9ISFR 2022-06-16

MP 2:
4 PROD-HK2USO 2022-06-16
5 PROD-YVN1VO 2022-06-17
6 PROD-BKMWB 2022-06-17

MP 3:
7 PROD-IPM6HU 2022-06-18
8 PROD-IPM6HU 2022-06-18
9 PROD-YVN1VO 2022-06-19

tombaileycourses.com
Clustering Metadata

Snowflake maintains the following clustering metadata for micro-partitions in a table:

• Total number of micro-partitions
• Number of overlapping micro-partitions
• Depth of overlapping micro-partitions

SYSTEM$CLUSTERING_INFORMATION

SELECT SYSTEM$CLUSTERING_INFORMATION('table', '(col1, col3)');

tombaileycourses.com
Clustering Depth

[Diagram: the same rows with progressively better clustering. As value ranges overlap less across micro-partitions, the number of overlapping micro-partitions falls (3, 3, 0) and the overlap depth falls (3, 2, 1); a depth of 1 means no overlap.]
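A minimal sketch of retrieving the average clustering depth for a table (the column list is optional if a clustering key is already defined):

SELECT SYSTEM$CLUSTERING_DEPTH('MY_TABLE', '(C1)');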

tombaileycourses.com
Automatic Clustering

tombaileycourses.com
Automatic Clustering

Snowflake supports specifying one or more table columns/expressions as a clustering key for a table, e.g. clustering on PRODUCT_ID co-locates rows with the same product id in the same micro-partitions.

Clustering aims to co-locate data of the clustering key in the same micro-partitions.

Clustering improves the performance of queries that frequently filter or sort on the clustered keys (WHERE, JOIN, ORDER BY, GROUP BY).

Clustering should be reserved for large tables in the multi-terabyte (>1TB) range.

tombaileycourses.com
Choosing a Clustering Key

Snowflake recommends a maximum of 3 or 4 columns (or expressions) per key.

CREATE TABLE T1 (C1 DATE, C2 STRING, C3 NUMBER)
CLUSTER BY (C1, C2);

Choose columns used in common queries which perform filtering and sorting operations.

CREATE TABLE T1 (C1 DATE, C2 STRING, C3 NUMBER)
CLUSTER BY (MONTH(C1), SUBSTRING(C2, 0, 10));

Consider the cardinality of the clustering key: a key with very high cardinality (e.g. a unique ID) or very low cardinality (e.g. a column with only a handful of distinct values) makes a poor clustering key.

ALTER TABLE T1 CLUSTER BY (C1, C3);

ALTER TABLE T2 CLUSTER BY (SUBSTRING(C2, 5, 10), MONTH(C1));

tombaileycourses.com
Reclustering and Clustering Cost

As DML operations are performed on a clustered table, the data in the table might become less clustered.

Clustered: A A A | B B B | C C C → Insert: B C A
Reclustered: A A A A | B B B B | C C C C

Reclustering is a background process which transparently reorganizes data in the micro-partitions by the clustering key.

Initial clustering and subsequent reclustering operations consume compute & storage credits.

Clustering is recommended for large tables which do not frequently change and are frequently queried.

tombaileycourses.com
Search Optimization

tombaileycourses.com
Search Optimization Service
Search optimization service is a table level property aimed at
improving the performance of selective point lookup queries.

SELECT NAME, ADDRESS FROM USERS
WHERE USER_EMAIL = 'semper.google.edu';

SELECT NAME, ADDRESS FROM USERS
WHERE USER_ID IN (4, 5);

USER_ID | USER_NAME     | USER_ADDRESS           | USER_EMAIL
1       | Duff Joisce   | 81 Mandrake Center     | [email protected]
2       | Ira Downing   | 33214 Barnett Junction | [email protected]
3       | Alis Litel    | 9259 Russell Point     | semper.google.edu
4       | Cory Calderon | 9266 New Castle Hill   | [email protected]
5       | Pearl Denyuk  | 499 Thierer Hill       | [email protected]

The search optimization service is an Enterprise edition feature.


tombaileycourses.com
Search Optimization Service
A background process creates and maintains a search access path to
enable search optimization.

ALTER TABLE MY_TABLE ADD SEARCH OPTIMIZATION;

ALTER TABLE MY_TABLE DROP SEARCH OPTIMIZATION;

SHOW TABLES LIKE ‘%MY_TABLE%';

The access path data structure requires space for each table on which search optimization is enabled. The larger the table, the larger the access path storage costs.

Compute costs: 10 Snowflake credits per Snowflake-managed compute hour, plus 5 Snowflake credits per Cloud Services compute hour.
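Before enabling the feature, the likely cost can be previewed; a minimal sketch:

SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('MY_TABLE');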

tombaileycourses.com
Data Loading Simple Methods

tombaileycourses.com
Data Movement

Ways of moving data into a table: stage → table (COPY INTO), stage → pipe → table (Snowpipe), INSERT statements, and upload via the UI.
tombaileycourses.com
INSERT

INSERT INTO MY_TABLE SELECT '001', 'John Doughnut', '10/10/1976';

Insert a row into a table from the results of a SELECT query.

INSERT INTO MY_TABLE (ID, NAME) SELECT '001', 'John Doughnut';

To load specific columns, individual columns can be specified.

INSERT INTO MY_TABLE (ID, NAME, DOB) VALUES
('001', 'John Doughnut', '10/10/1976'),
('002', 'Lisa Snowflake', '21/01/1934'),
('003', 'Oggle Berry', '01/01/2001');

The VALUES keyword can be used to insert multiple rows into a table.

INSERT INTO MY_TABLE SELECT * FROM MY_TABLE_2;

Another table can be used to insert rows into a table.

INSERT OVERWRITE INTO MY_TABLE SELECT * FROM MY_TABLE_2;

The keyword OVERWRITE will truncate a table before new values are inserted into it.

tombaileycourses.com
Stages

tombaileycourses.com
Stages


tombaileycourses.com
Stages
Stages are temporary storage locations for data files used in the
data loading and unloading process.

Internal External

User Stage Table Stage Named Stage Named Stage

tombaileycourses.com
Internal Stages

User Stage: automatically allocated when a user is created; referenced with @~ (e.g. ls @~;); cannot be altered or dropped; not appropriate if multiple users need access to the stage.

Table Stage: automatically allocated when a table is created; referenced with @% (e.g. ls @%MY_TABLE;); cannot be altered or dropped; the user must have ownership privileges on the table.

Named Stage: a user-created database object; referenced by name (e.g. ls @MY_STAGE;); a securable object; supports copy transformations and applying file formats.

Files are uploaded to all three with the PUT command.
tombaileycourses.com
External Stages and Storage Integrations
External stages reference data files stored in a location outside of Snowflake.

An external named stage is a user-created database object.

CREATE STAGE MY_EXT_STAGE
URL='S3://MY_BUCKET/PATH/'
STORAGE_INTEGRATION=MY_INT;
-- alternatively, embed credentials directly:
-- CREDENTIALS=(AWS_KEY_ID='' AWS_SECRET_KEY='')
-- ENCRYPTION=(MASTER_KEY='')

ls @MY_EXT_STAGE;

The storage location can be private or public.

Copy options such as ON_ERROR and PURGE can be set on stages.

CREATE STORAGE INTEGRATION MY_INT
TYPE=EXTERNAL_STAGE
STORAGE_PROVIDER=S3
STORAGE_AWS_ROLE_ARN='ARN:AWS:IAM::98765:ROLE/MY_ROLE'
ENABLED=TRUE
STORAGE_ALLOWED_LOCATIONS=('S3://MY_BUCKET/PATH/');

A storage integration is a reusable and securable Snowflake object which can be applied across stages and is recommended to avoid having to explicitly set sensitive information for each stage definition.

tombaileycourses.com
Stage Helper Commands

LIST

LIST/ls @MY_STAGE;
LIST/ls @~;
LIST/ls @%MY_TABLE;

Lists the contents of a stage: path of staged file, size of staged file, MD5 hash of staged file, last updated timestamp.

Can optionally specify a path for specific folders or files. Named and internal table stages can optionally include a database and schema global pointer.

SELECT

SELECT
metadata$filename,
metadata$file_row_number,
$1,
$2
FROM @MY_STAGE
(FILE_FORMAT => 'MY_FORMAT');

Query the contents of staged files directly using standard SQL for both internal and external stages.

Useful for inspecting files prior to data loading/unloading. Reference metadata columns such as filename and row number for a staged file.

REMOVE

REMOVE/rm @MY_STAGE;
REMOVE/rm @~;
REMOVE/rm @%MY_TABLE;

Remove files from either an external or internal stage.

Can optionally specify a path for specific folders or files. Named and internal table stages can optionally include a database and schema global pointer.

tombaileycourses.com
PUT

The PUT command uploads data files from a local directory


on a client machine to any of the three types of internal
stage.
PUT FILE:///FOLDER/MY_DATA.CSV @MY_INT_STAGE;

PUT cannot be executed from within worksheets. PUT FILE:///FOLDER/MY_DATA.CSV @~; macOS / Linux

PUT FILE:///FOLDER/MY_DATA.CSV @%MY_TABLE;

Duplicate files uploaded to a stage via PUT are


ignored.

PUT FILE://c:\\FOLDER\\MY_DATA.CSV @MY_INT_STAGE; Windows

Uploaded files are automatically encrypted with a 128-bit


key with optional support for a 256-bit key.

tombaileycourses.com
Bulk Loading with
COPY INTO <table>

tombaileycourses.com
Data Movement

COPY INTO <table>

File Formats
Stage Table

Stage Pipe Table Stage

Table

tombaileycourses.com
COPY INTO <table>

The COPY INTO <table> statement copies the contents of an


internal or external stage or external location directly into a COPY INTO MY_TABLE FROM @MY_INT_STAGE;
table.

The following file formats can be uploaded to Snowflake:
• Delimited files (CSV, TSV, etc.)
• JSON
• Avro
• ORC
• Parquet
• XML

COPY INTO <table> requires a user created virtual


warehouse to execute.

Load history is stored in the metadata of the target table for 64 days, which ensures files are not loaded twice.

tombaileycourses.com
COPY INTO <table>

COPY INTO MY_TABLE FROM @MY_INT_STAGE; Copy all the contents of a stage into a table.

COPY INTO MY_TABLE FROM @MY_INT_STAGE/folder1;


Copy contents of a stage from a specific folder/file
path.
COPY INTO MY_TABLE FROM @MY_INT_STAGE/folder1/file1.csv;

COPY INTO MY_TABLE FROM @MY_INT_STAGE
FILES=('folder1/file1.csv', 'folder2/file2.csv');

COPY INTO <table> has an option to provide a list of one or more files to copy.

COPY INTO MY_TABLE FROM @MY_INT_STAGE
PATTERN='people/.*[.]csv';

COPY INTO <table> has an option to provide a regular expression to select files to load.

tombaileycourses.com
COPY INTO <table> Load Transformations

Snowflake allows users to perform simple transformations


on data as it’s loaded into a table.

COPY INTO MY_TABLE FROM (
SELECT
TO_DOUBLE(T.$1),
T.$2,
T.$3,
TO_TIMESTAMP(T.$4)
FROM @MY_INT_STAGE T);

Load transformations allow the user to perform:
• Column reordering.
• Column omission.
• Casting.
• Truncating text strings that exceed the target column length.

Users can specify a set of fields to load from the staged


data files using a standard SQL query.

tombaileycourses.com
COPY External Stage/Location

Files can be loaded from external stages in the same way


as internal stages.

COPY INTO MY_TABLE FROM @MY_EXTERNAL_STAGE;


Data transfer billing charges may apply when loading data
from files in a cloud storage service in a different region
or cloud platform from your Snowflake account.

COPY INTO MY_TABLE FROM 'S3://MY_BUCKET/'
STORAGE_INTEGRATION=MY_INTEGRATION
ENCRYPTION=(MASTER_KEY='');

Files can be copied directly from a cloud storage service location.

Snowflake recommends encapsulating the cloud storage service in an external stage.

tombaileycourses.com
Copy Options
ON_ERROR — error handling for the load operation: CONTINUE, SKIP_FILE, SKIP_FILE_<num>, SKIP_FILE_<num>%, ABORT_STATEMENT. Default: 'ABORT_STATEMENT'.

SIZE_LIMIT — number that specifies the maximum size of data loaded by a COPY statement. Default: null (no size limit).

PURGE — Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully. Default: FALSE.

RETURN_FAILED_ONLY — Boolean that specifies whether to return only files that have failed to load in the statement result. Default: FALSE.

MATCH_BY_COLUMN_NAME — string that specifies whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data. Default: NONE.

ENFORCE_LENGTH — Boolean that specifies whether to truncate text strings that exceed the target column length. Default: TRUE.

TRUNCATECOLUMNS — Boolean that specifies whether to truncate text strings that exceed the target column length. Default: FALSE.

FORCE — Boolean that specifies to load all files, regardless of whether they've been loaded previously and have not changed since they were loaded. Default: FALSE.

LOAD_UNCERTAIN_FILES — Boolean that specifies to load files for which the load status is unknown; the COPY command skips these files by default. Default: FALSE.
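A minimal sketch combining a few of these options on a single load:

COPY INTO MY_TABLE FROM @MY_INT_STAGE
ON_ERROR='SKIP_FILE'
PURGE=TRUE
SIZE_LIMIT=100000000;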
tombaileycourses.com
COPY INTO <table> Output

Column Name Data Type Description


FILE TEXT Name of source file and relative path to the file.

STATUS TEXT Status: loaded, load failed or partially loaded.

ROWS_PARSED NUMBER Number of rows parsed from the source file.

ROWS_LOADED NUMBER Number of rows loaded from the source file.

ERROR_LIMIT NUMBER If the number of errors reaches this limit, then


abort.
ERRORS_SEEN NUMBER Number of error rows in the source file.

FIRST_ERROR TEXT First error of the source file.

FIRST_ERROR_LINE NUMBER Line number of the first error.

FIRST_ERROR_CHARACTER NUMBER Position of the first error character.

FIRST_ERROR_COLUMN_NAME TEXT Column name of the first error.

tombaileycourses.com
COPY INTO <table> Validation

VALIDATION_MODE VALIDATE

Optional parameter allows you to perform a dry- Validate is a table function to view all errors
run of load process to expose errors. encountered during a previous COPY INTO
execution.

• RETURN_N_ROWS
Validate accepts a job id of a previous query or
• RETURN_ERRORS
the last load operation executed.
• RETURN_ALL_ERRORS

COPY INTO MY_TABLE
FROM @MY_INT_STAGE
VALIDATION_MODE = 'RETURN_ERRORS';

SELECT * FROM TABLE(VALIDATE(MY_TABLE,
JOB_ID=>'5415FA1E-59C9-4DDA-B652-533DE02FDCF1'));

tombaileycourses.com
File Formats

tombaileycourses.com
File Formats

File format options can be set on a named stage or COPY CREATE STAGE MY_STAGE
INTO statement. FILE_FORMAT=(TYPE='CSV' SKIP_HEADER=1);

CREATE FILE FORMAT MY_CSV_FF


Explicitly declared file format options can all be rolled up
TYPE='CSV’
into independent File Format Snowflake objects.
SKIP_HEADER=1;

File Formats can be applied to both named stages and COPY CREATE OR REPLACE STAGE MY_STAGE
INTO statements. If set on both COPY INTO will take
precedence.
FILE_FORMAT=MY_CSV_FF;

tombaileycourses.com
File Formats

In the File Format object the file format you're expecting to load is set via the 'type' property with one of the following values: CSV, JSON, AVRO, ORC, PARQUET or XML.

Each 'type' has its own set of properties related to parsing that specific file format, e.g. SKIP_HEADER (the number of lines at the start of the file to skip) and COMPRESSION (the compression algorithm used for the data file).

CREATE FILE FORMAT MY_CSV_FF
TYPE='CSV';

If a File Format object or options are not provided to either the stage or COPY statement, the default behaviour will be to try and interpret the contents of a stage as a CSV with UTF-8 encoding.

tombaileycourses.com
Snowpipe and Loading
Best Practises

tombaileycourses.com
Snowpipe


tombaileycourses.com
Snowpipe

There are two methods for detecting when a new file has
been uploaded to a stage:
CREATE PIPE MY_PIPE
• Automating Snowpipe using cloud messaging
AUTO_INGEST=TRUE (external stages only)
AS • Calling Snowpipe REST endpoints
(internal and external stages)
COPY INTO MY_TABLE

FROM @MY_STAGE
The Pipe object defines a COPY INTO <table>
FILE_FORMAT = (TYPE = 'CSV'); statement that will execute in response to a file
being uploaded to a stage.

tombaileycourses.com
Snowpipe: Cloud Messaging

[Diagram: a cloud utility puts an object into an S3 bucket (external stage); an event notification lands on an SQS queue, which triggers the pipe to run COPY INTO <table>.]

tombaileycourses.com
Snowpipe: REST Endpoint

[Diagram: a client calls the insertFiles REST endpoint with a file list and pipe name; the pipe then runs COPY INTO <table> from the S3 bucket / external stage.]

tombaileycourses.com
Snowpipe

Snowpipe is designed to load new data typically within a minute after a file notification is sent.

Snowpipe is a serverless feature, using Snowflake-managed compute resources to load data files, not a user-managed virtual warehouse.

Snowpipe load history is stored in the metadata of the pipe for 14 days, used to prevent reloading the same files in a table.

When a pipe is paused, event messages received for the pipe enter a limited retention period. The period is 14 days by default.

tombaileycourses.com
Bulk Loading vs. Snowpipe

Feature Bulk Loading Snowpipe

Authentication Relies on the security When calling the REST endpoints: Requires key pair
options supported by the authentication with JSON Web Token (JWT). JWTs are
client for authenticating signed using a public/private key pair with RSA encryption.
and initiating a user
session.
Load History Stored in the metadata of Stored in the metadata of the pipe for 14 days.
the target table for 64
days.
Compute Requires a user-specified Uses Snowflake-supplied compute resources.
Resources warehouse to execute
COPY statements.
Billing Billed for the amount of Snowflake tracks the resource consumption of loads for all
time each virtual pipes in an account, with per-second/per-core granularity, as
warehouse is active. Snowpipe actively queues and processes data files.
In addition to resource consumption, an overhead is included
in the utilization costs charged for Snowpipe: 0.06 credits
per 1000 files notified or listed via event notifications or
REST API calls.

tombaileycourses.com
Data Loading Best Practises

• Aim for data files of 100-250 MB compressed.
• Organize data by path, e.g. 2022/07/10/05/ or 2022/06/01/11/.
• Separate workloads for loading and querying.
• Pre-sort data.
• Stage and load files no more than once per minute.

tombaileycourses.com
Data Unloading Overview

tombaileycourses.com
Data Unloading

COPY INTO <location>

GET

tombaileycourses.com
Data Unloading

Table data can be unloaded to a stage via the


COPY INTO <location> command.

COPY INTO @MY_STAGE
FROM MY_TABLE;

Supported unload formats include CSV, JSON and Parquet.

The GET command is used to download a staged


file to the local file system.

GET @MY_STAGE
file:///folder/files/;


tombaileycourses.com
Data Unloading

By default, results unloaded to a stage using the COPY INTO <location> command are split into multiple files, written as CSV, gzip-compressed, with UTF-8 encoding.

All data files unloaded to internal stages are automatically encrypted using 128-bit keys.

tombaileycourses.com
COPY INTO <location> Examples

COPY INTO @MY_STAGE/RESULT/DATA_


FROM (SELECT * FROM T1)
Output files can be prefixed by specifying a string at the
FILE_FORMAT = MY_CSV_FILE_FORMAT; end of a stage path.

COPY INTO @%T1


FROM T1
COPY INTO <location> includes a PARTITION BY copy
PARTITION BY ('DATE=' || TO_VARCHAR(DT)) option to partition unloaded data into a directory
FILE_FORMAT=MY_CSV_FILE_FORMAT; structure.

COPY INTO 'S3://MYBUCKET/UNLOAD/'


FROM T1 COPY INTO <location> can copy table records directly to
STORAGE_INTEGRATION = MY_INT external cloud provider’s blob storage.
FILE_FORMAT=MY_CSV_FILE_FORMAT;

tombaileycourses.com
COPY INTO <location> Copy Options

OVERWRITE — Boolean that specifies whether the COPY command overwrites existing files with matching names, if any, in the location where files are stored. Default: FALSE.

SINGLE — Boolean that specifies whether to generate a single file or multiple files. Default: FALSE.

MAX_FILE_SIZE — number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread. Default: 16777216 (16 MB).

INCLUDE_QUERY_ID — Boolean that specifies whether to uniquely identify unloaded files by including a universally unique identifier (UUID) in the filenames of unloaded data files. Default: FALSE.

tombaileycourses.com
GET

GET is the reverse of PUT. It allows users to specify a source


stage and a target local directory to download files to. GET @MY_STAGE FILE:///TMP/DATA/;

GET cannot be used for external stages.

GET cannot be executed from within worksheets.

Downloaded files are automatically decrypted.

Parallel optional parameter specifies the number of threads to


GET @MY_STAGE FILE:///TMP/DATA/
use for downloading files. Increasing this number can improve PARALLEL=99;
parallelisation with downloading large files.

Pattern optional parameter specifies a regular expression pattern for filtering files to download.

GET @MY_STAGE FILE:///TMP/DATA/
PATTERN='.*\\.(csv)';

tombaileycourses.com
Semi-structured Overview

tombaileycourses.com
Semi-structured Data Types

VARIANT

• VARIANT is the universal semi-structured data type of Snowflake for loading data in semi-structured data formats.

• VARIANTs are used to represent arbitrary data structures.

• Snowflake stores the VARIANT type internally in an efficient compressed columnar binary representation.

• Snowflake extracts as much of the data as possible to a columnar form, based on certain rules.

• The VARIANT data type can hold up to 16MB of compressed data per row.

• A VARIANT column can contain both SQL NULL and VARIANT NULL; a VARIANT NULL is stored as a string containing the word "null".
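A minimal sketch of landing a raw JSON document in a VARIANT column with PARSE_JSON (RAW_EVENTS is a hypothetical table name):

-- RAW_EVENTS is a hypothetical table used for illustration
CREATE TABLE RAW_EVENTS (V VARIANT);

INSERT INTO RAW_EVENTS
SELECT PARSE_JSON('{"event": "signup", "user_id": 42, "tags": ["new", "trial"]}');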

tombaileycourses.com
Semi-structured Data Overview

{"widget": {
"debug": "on", <widget>
"window": { <debug>on</debug>
"title": "Sample Konfabulator Widget", <window title="Sample Konfabulator Widget">
"name": "main_window", <name>main_window</name>
"width": 500, <width>500</width>
"height": 500 <height>500</height>
}, </window>
"image": { <image src="Images/Sun.png" name="sun1">
"src": "Images/Sun.png", <hOffset>250</hOffset>
"name": "sun1", <hOffset>300</hOffset>
"hOffset": [250, 300, 850], <hOffset>850</hOffset>
"alignment": "center" <alignment>center</alignment>
}, </image>
"text": { <text data="Click Here" size="36" style="bold">
"data": "Click Here", <name>text1</name>
"size": 36, <hOffset>250</hOffset>
"style": "bold", <vOffset>100</vOffset>
"name": "text1", <alignment>center</alignment>
"hOffset": 250, <onMouseUp>
"vOffset": 100, sun1.opacity = 90;
"alignment": "center", </onMouseUp>
"onMouseUp": "sun1.opacity = 90;" </text>
}} </widget>

JSON XML

tombaileycourses.com
Semi-structured Data Types

ARRAY

Contains 0 or more elements of data. Each element


is accessed by its position in the array.

CREATE TABLE MY_ARRAY_TABLE (


NAME VARCHAR,
HOBBIES ARRAY
);

INSERT INTO MY_ARRAY_TABLE


SELECT 'Alina Nowak’, ARRAY_CONSTRUCT('Writing', 'Tennis', 'Baking');

tombaileycourses.com
Semi-structured Data Types

OBJECT

Represent collections of key-value pairs.

CREATE TABLE MY_OBJECT_TABLE (


NAME VARCHAR,
ADDRESS OBJECT
);

INSERT INTO MY_OBJECT_TABLE


SELECT 'Alina Nowak', OBJECT_CONSTRUCT('postcode', 'TY5 7NN', 'first_line', '67 Southway Road');

tombaileycourses.com
Semi-structured Data Types

VARIANT
VARIANT data type can hold up to
Universal Semi-structured data type used to 16MB compressed data per row
represent arbitrary data structures.

CREATE TABLE MY_VARIANT_TABLE (


NAME VARIANT,
ADDRESS VARIANT,
HOBBIES VARIANT
);

INSERT INTO MY_VARIANT_TABLE


SELECT
'Alina Nowak'::VARIANT,
OBJECT_CONSTRUCT('postcode', 'TY5 7NN', 'first_line', '67 Southway Road'),
ARRAY_CONSTRUCT('Writing', 'Tennis', 'Baking');

tombaileycourses.com
Semi-structured Data Formats

JSON — plain-text data-interchange format based on a subset of the JavaScript programming language. Load & Unload.

AVRO — binary row-based storage format originally developed for use with Apache Hadoop. Load only.

ORC — highly efficient binary format used to store Hive data. Load only.

PARQUET — binary format designed for projects in the Hadoop ecosystem. Load & Unload.

XML — consists primarily of tags <> and elements. Load only.

tombaileycourses.com
Loading and Unloading
Semi-structured Data

tombaileycourses.com
Loading Semi-structured Data

Semi-structured data files are uploaded to a stage with PUT and loaded into a table with COPY INTO.

CREATE FILE FORMAT <name>


TYPE = { JSON | AVRO | ORC | PARQUET | XML }
[<FORMAT OPTIONS>];

tombaileycourses.com
JSON File Format options
DESC FILE FORMAT FF_JSON;

DATE_FORMAT — used only for loading JSON data into separate columns. Defines the format of date string values in the data files.

TIME_FORMAT — used only for loading JSON data into separate columns. Defines the format of time string values in the data files.

COMPRESSION — supported algorithms: GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONE. If BROTLI, cannot use AUTO.

ALLOW_DUPLICATE — only used for loading. If TRUE, allows duplicate object field names (only the last one will be preserved).

STRIP_OUTER_ARRAY — only used for loading. If TRUE, the JSON parser will remove the outer brackets [].

STRIP_NULL_VALUES — only used for loading. If TRUE, the JSON parser will remove object fields or array elements containing NULL.
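A minimal sketch combining several of these options into one file format object:

CREATE OR REPLACE FILE FORMAT FF_JSON
TYPE='JSON'
COMPRESSION='GZIP'
STRIP_OUTER_ARRAY=TRUE
STRIP_NULL_VALUES=TRUE;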

tombaileycourses.com
Semi-structured Data Loading Approaches

ELT

CREATE TABLE MY_TABLE (
V VARIANT
);

COPY INTO MY_TABLE
FROM @MY_STAGE/FILE1.JSON
FILE_FORMAT = FF_JSON;

ETL

CREATE TABLE MY_TABLE (
NAME STRING,
AGE NUMBER,
DOB DATE
);

COPY INTO MY_TABLE
FROM (SELECT
V:name,
V:age,
V:dob
FROM @MY_STAGE/FILE1.JSON)
FILE_FORMAT = FF_JSON;

Automatic Schema Detection

INFER_SCHEMA

CREATE TABLE MY_TABLE
USING TEMPLATE (
SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
FROM TABLE(
INFER_SCHEMA(
LOCATION=>'@MYSTAGE',
FILE_FORMAT=>'FF_PARQUET'
)
));

MATCH_BY_COLUMN_NAME

COPY INTO MY_TABLE
FROM @MY_STAGE/FILE1.JSON
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;

tombaileycourses.com
Unloading Semi-structured Data

Table data is unloaded to a stage with COPY INTO <location> and downloaded as semi-structured data files with GET.

CREATE FILE FORMAT <name>


TYPE = { JSON | AVRO | ORC | PARQUET | XML }
[<FORMAT OPTIONS>];

tombaileycourses.com
Accessing Semi-structured Data

tombaileycourses.com
Accessing Semi-structured Data

{
  "employee": {
    "name": "Aiko Tanaka",
    "_id": "UNX789544",
    "age": 42
  },
  "joined_on": "2019-01-01",
  "skills": ["Java", "Kotlin", "Android"],
  "is_manager": true,
  "base_location": null
}

The document is loaded via COPY INTO a table with a single VARIANT column:

CREATE TABLE EMPLOYEE (
  src VARIANT
);

tombaileycourses.com
Accessing Semi-structured Data

Dot Notation

SELECT src:employee.name FROM EMPLOYEES;

VARIANT column, first level element, subsequent elements.

SELECT src:Employee.name FROM EMPLOYEES;

The column name is case insensitive but key names are case sensitive, so the above query would result in an error.

Bracket Notation

SELECT SRC['employee']['name'] FROM EMPLOYEES;

VARIANT column, first level element, subsequent elements.

SELECT SRC['Employee']['name'] FROM EMPLOYEES;

The column name is case insensitive but key names are case sensitive, so the above query would result in an error.

Repeating Element

SELECT SRC:skills[0] FROM EMPLOYEES;

Array elements are accessed by index.

SELECT GET(SRC, 'employee')
FROM EMPLOYEE;

SELECT GET(SRC, 'skills')[0]
FROM EMPLOYEE;

tombaileycourses.com
Casting Semi-structured Data

SELECT src:employee.name, src:joined_on, src:employee.age, src:is_manager, src:base_location FROM EMPLOYEE;

Double colon TO_<datatype>() AS_<datatype>()

SELECT src:employee.joined_on::DATE SELECT TO_DATE(src:employee.joined_on) SELECT AS_VARCHAR(src:employee.name)


FROM EMPLOYEE; FROM EMPLOYEE; FROM EMPLOYEE;

tombaileycourses.com
Semi-structured Functions

tombaileycourses.com
Semi-structured Functions
JSON and XML Parsing Array/Object Creation and Manipulation Extraction Conversion/Casting Type Predicates
CHECK_JSON ARRAY_AGG FLATTEN AS_<object> IS_<object>

CHECK_XML ARRAY_APPEND GET AS_ARRAY IS_ARRAY

JSON_EXTRACT_PATH_TEXT ARRAY_CAT GET_IGNORE_CASE AS_BINARY IS_BOOLEAN

PARSE_JSON ARRAY_COMPACT GET_PATH AS_CHAR , AS_VARCHAR IS_BINARY

PARSE_XML ARRAY_CONSTRUCT OBJECT_KEYS AS_DATE IS_CHAR ,

STRIP_NULL_VALUE ARRAY_CONSTRUCT_COMPACT XMLGET AS_DECIMAL , AS_NUMBER IS_VARCHAR

ARRAY_CONTAINS AS_DOUBLE , AS_REAL IS_DATE ,

ARRAY_INSERT AS_INTEGER IS_DATE_VALUE

ARRAY_INTERSECTION AS_OBJECT IS_DECIMAL

ARRAY_POSITION AS_TIME IS_DOUBLE ,

ARRAY_PREPEND AS_TIMESTAMP_* IS_REAL

ARRAY_SIZE STRTOK_TO_ARRAY IS_INTEGER

ARRAY_SLICE TO_ARRAY IS_NULL_VALUE

ARRAY_TO_STRING TO_JSON IS_OBJECT

ARRAYS_OVERLAP TO_OBJECT IS_TIME

OBJECT_AGG TO_VARIANT IS_TIMESTAMP_*

OBJECT_CONSTRUCT TO_XML TYPEOF

OBJECT_CONSTRUCT_KEEP_NULL

OBJECT_DELETE

OBJECT_INSERT

OBJECT_PICK

tombaileycourses.com
FLATTEN Table Function
Flatten is a table function that accepts compound values
(VARIANT, OBJECT & ARRAY) and produces a row for each item.

SELECT VALUE FROM TABLE(FLATTEN(INPUT => (SELECT src:skills FROM EMPLOYEE)));

Path Recursive

SELECT VALUE FROM TABLE(FLATTEN( SELECT VALUE FROM TABLE(FLATTEN(


INPUT => PARSE_JSON('{"A":1, "B":[77,88]}’), INPUT => PARSE_JSON('{"A":1, "B":[77,88]}’),
PATH => 'B')); RECURSIVE => true));

Specify a path inside object to flatten. Flattening is performed for all sub-elements
recursively.

tombaileycourses.com
FLATTEN Output

SELECT * FROM TABLE(FLATTEN(INPUT => (ARRAY_CONSTRUCT(1,45,34))));

SEQ KEY PATH INDEX VALUE THIS

A unique sequence For maps or objects, The path to the The index of the element, The value of the element The element being flattened
number associated this column contains element within a data if it is an array; otherwise of the flattened (useful in recursive
with the input record. the key to the structure which needs NULL. array/object. flattening).
exploded value. to be flattened.

tombaileycourses.com
LATERAL FLATTEN

SELECT src:employee.name::varchar, src:employee._id::varchar, src:skills FROM EMPLOYEE;

Output (one row): Aiko Tanaka, UNX789544, [ "Java", "Kotlin", "Android" ]

SELECT
  src:employee.name,
  src:employee._id,
  f.value
FROM EMPLOYEE e,
LATERAL FLATTEN(INPUT => e.src:skills) f;

Output (three rows):
Aiko Tanaka, UNX789544, "Java"
Aiko Tanaka, UNX789544, "Kotlin"
Aiko Tanaka, UNX789544, "Android"

tombaileycourses.com
Summary of
Snowflake Functions

tombaileycourses.com
Supported Function Types


Scalar Aggregate Window Table

System User-defined External

tombaileycourses.com
Scalar Functions
A scalar function is a function that returns one value per invocation; these are mostly used for returning one value per row.

Categories: Bitwise Expression, Conditional Expression, Context, Conversion, Data Generation, Date & Time, Encryption, File, Geospatial, Hash, Metadata, Numeric, Regular Expressions, Semi-structured Data, String & Binary.

SELECT UUID_STRING();

Output:

--------------------------------------
|"UUID_STRING()"                     |
--------------------------------------
|d29d4bfa-40cb-4159-9186-e10f5d59f031|
--------------------------------------

tombaileycourses.com
Aggregate Functions

Aggregate functions operate on values across rows to perform mathematical calculations such as sum, average & counting.

Categories: General, Bitwise, Boolean, Hash, Semi-structured, Linear Regression, Stats and Probability, Distinct Values, Cardinality Estimation, Similarity Estimation, Frequency Estimation, Percentile Estimation.

INSERT INTO ACCOUNT VALUES ('001', 10.00), ('001', 23.78), ('002', 67.78);

SELECT MAX(AMOUNT) FROM ACCOUNT;

Output:

----------------
|"MAX(AMOUNT)" |
----------------
|67.78         |
----------------

tombaileycourses.com
Window Functions

Window functions are a subset of aggregate


functions, allowing us to aggregate on a subset
of rows used as input to a function.

SELECT ACCOUNT_ID, AMOUNT, MAX(AMOUNT) OVER (PARTITION BY ACCOUNT_ID) FROM ACCOUNT;

Output:

-----------------------------------------
|“ACCOUNT_ID” |“AMOUNT” |“MAX(AMOUNT)” |
-----------------------------------------
|001 |10.00 |23.78 |
|001 |23.78 |23.78 |
|002 |67.78 |67.78 |
-----------------------------------------

tombaileycourses.com
Table Functions
Table functions return a set of rows for each
input row. The returned set can contain zero,
one, or more rows. Each row can contain one or
more columns.

Data Loading Data Generation Data Conversion Object Modelling

Semi-structured Query Results Usage Information

SELECT RANDSTR(5, RANDOM()), RANDOM() FROM TABLE(GENERATOR(ROWCOUNT => 3));

Output:

--------------------------------------------
|“RANDSTR(5,RANDOM())” |“RANDOM()” |
--------------------------------------------
|My4FU |574440610751796211 |
|YiPSS |1779357660907745898|
|cu2Hw |6562320827285185330|
--------------------------------------------
tombaileycourses.com
System Functions

System functions provide a way


to execute actions in the system.

SELECT system$cancel_query('01a65819-0000-2547-0000-94850008c1ee');

Output:

---------------------------------------------------------------
|“SYSTEM$CANCEL_QUERY('01A65819-0000-2547-0000-94850008C1EE’)”|
---------------------------------------------------------------
|query [01a65819-0000-2547-0000-94850008c1ee] terminated. |
---------------------------------------------------------------

tombaileycourses.com
System Functions

System functions provide


information about the system.

SELECT system$pipe_status('my_pipe');

Output:

----------------------------------------------------
|“SYSTEM$PIPE_STATUS('MYPIPE’)” |
---------------------------------------------------
|{"executionState":"RUNNING","pendingFileCount":0} |
----------------------------------------------------

tombaileycourses.com
System Functions

System functions provide


information about queries.

SELECT system$explain_plan_json('SELECT AMOUNT FROM ACCOUNT');

Output:

-----------------------------------------------------------
|“SYSTEM$EXPLAIN_PLAN_JSON('SELECT AMOUNT FROM ACCOUNT')” |
-----------------------------------------------------------
|{ |
|"GlobalStats": { |
| "partitionsTotal": 1, |
| "partitionsAssigned": 1, |
| "bytesAssigned": 1024 |
| }[…] |
-----------------------------------------------------------

tombaileycourses.com
Estimation
Functions

tombaileycourses.com
Estimation Functions

Cardinality Estimation — estimate the number of distinct values.

Similarity Estimation — estimate the similarity of two or more sets.

Frequency Estimation — estimate frequency of values in a set.

Percentile Estimation — estimate percentile of values in a set.

tombaileycourses.com
Cardinality Estimation
Snowflake implements the HyperLogLog cardinality estimation algorithm, which returns an approximation of the distinct number of values of a column.

HLL() HLL_ACCUMULATE() HLL_COMBINE()

HLL_ESTIMATE() HLL_EXPORT() HLL_IMPORT()

Output: 1,491,111,415
SELECT APPROX_COUNT_DISTINCT(L_ORDERKEY) FROM LINEITEM;
Execution Time: 44 Seconds

Output: 1,500,000,000
SELECT COUNT(DISTINCT L_ORDERKEY) FROM LINEITEM;
Execution Time: 4 Minutes 20 Seconds

tombaileycourses.com
Similarity Estimation

Snowflake have implemented a two-step process


to estimate similarity, without the need to
compute the intersection or union of two sets.

SELECT MINHASH(5, C_CUSTKEY) FROM CUSTOMER;

-------------------------
|“MINHASH(5, C_CUSTKEY) |
-------------------------
|{ |
| "state": [ |
| 557181968304, |
| 67530801241, |
Output:
| 1909814111197, |
| 8406483771, |
| 34962958513
|
| ], |
| "type": "minhash", |
| "version": 1 |
|} |
-------------------------
tombaileycourses.com
Similarity Estimation

Snowflake have implemented a two-step process


to estimate similarity, without the need to
compute the intersection or union of two sets.

SELECT APPROXIMATE_SIMILARITY(MH) FROM


(
(SELECT MINHASH(5, C_CUSTKEY) MH FROM CUSTOMER)
UNION
(SELECT MINHASH(5, O_CUSTKEY) MH FROM ORDERS)
);

-------------------------------
|“APPROXIMATE_SIMILARITY(MH)” |
Output: -------------------------------
|0.8 |
-------------------------------

tombaileycourses.com
Frequency Estimation

Snowflake have implemented a family of functions


using the Space-Saving algorithm to produce an
estimation of values and their frequencies.

APPROX_TOP_K APPROX_TOP_K_ACCUMULATE

APPROX_TOP_K_COMBINE APPROX_TOP_K_ESTIMATE

SELECT APPROX_TOP_K(P_SIZE, 3, 100000) FROM PART;

Output:

----------------------------------------
|“APPROX_TOP_K(P_SIZE, 3, 100000)” |
----------------------------------------
|[[13,401087],[38,401074],[35,401033]] |
----------------------------------------

tombaileycourses.com
Frequency Estimation

Snowflake have implemented a family of functions


using the Space-Saving algorithm to produce an
estimation of values and their frequencies.

APPROX_TOP_K

SELECT P_SIZE, COUNT(P_SIZE) AS C FROM PART


GROUP BY P_SIZE
ORDER BY C DESC
LIMIT 3;

Output:

--------------------
|“P_SIZE” | “C” |
--------------------
|13 |401,087 |
|38 |401,074 |
|35 |401,033 |
--------------------
tombaileycourses.com
Percentile Estimation

Snowflake have implemented the t-Digest


algorithm as an efficient way of estimating
approximate percentile values in data sets.

APPROX_PERCENTILE APPROX_PERCENTILE_ACCUMULATE

APPROX_PERCENTILE_COMBINE APPROX_PERCENTILE_ESTIMATE

INSERT OVERWRITE INTO TEST_SCORES VALUES (23),(67),(2),(3),(9),(19),(45),(81),(90),(11);

SELECT APPROX_PERCENTILE(score, 0.8) FROM TEST_SCORES;

Output:

---------------------------------
|“APPROX_PERCENTILE(score,0.8)” |
---------------------------------
|74 |
---------------------------------

tombaileycourses.com
Table Sampling

tombaileycourses.com
Table Sampling

Table sampling is a convenient way to read


a random subset of rows from a table.

Fraction-based

SELECT * FROM LINEITEM TABLESAMPLE/SAMPLE [samplingMethod] (<probability>);

Row sampling (BERNOULLI/ROW): each row has a p/100 chance of being included, returning approximately p/100 * n rows.

SELECT * FROM LINEITEM SAMPLE ROW (50);

Block sampling (SYSTEM/BLOCK): sampling is performed per block/micro-partition.

SELECT * FROM LINEITEM SAMPLE SYSTEM (50);

A seed (REPEATABLE/SEED, valid values 0 to 2147483647) makes the sample deterministic:

SELECT * FROM LINEITEM SAMPLE (50) REPEATABLE (765);

tombaileycourses.com
Table Sampling

Table sampling is a convenient way to read


a random subset of rows from a table.

Fixed-size

SELECT * FROM LINEITEM TABLESAMPLE/SAMPLE (<num> ROWS);

SELECT L_TAX, L_SHIPMODE FROM LINEITEM SAMPLE BERNOULLI/ROW (3 rows);

Output:

-------------------------
|“L_TAX” | “L_SHIPMODE” |
------------------------- Fixed-size sampling does not support block
|0.02 |REG AIR | sampling and use of seed. Adding these will
|0.02 |TRUCK | result in an error.
|0.06 |TRUCK |
-------------------------

tombaileycourses.com
Unstructured Data
File Functions

tombaileycourses.com
Unstructured Data

Unstructured
Data

Multi-media Documents

Image Audio Video PDF Spreadsheet Text

tombaileycourses.com
File Functions

BUILD_SCOPED_FILE_URL BUILD_STAGE_FILE_URL GET_PRESIGNED_URL

GET_STAGE_LOCATION GET_RELATIVE_PATH GET_ABSOLUTE_PATH

Unstructured data files are uploaded to a stage; these functions then generate URLs for accessing them:

PUT file://image.jpg @images_stage;

SELECT GET_PRESIGNED_URL(@images_stage, 'image.jpg');

tombaileycourses.com
BUILD_SCOPED_FILE_URL

build_scoped_file_url( @<stage_name> , '<relative_file_path>' )

The URL is valid for 24 hours.

SELECT build_scoped_file_url(@images_stage, 'prod_z1c.jpg');

Output:

-----------------------------------------------------------------------
|“BUILD_SCOPED_FILE_URL(@images_stage, ‘prod_z1c.jpg’);” |
-----------------------------------------------------------------------
|https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/ |
|01a691df-0000-277e-0000-9485000bc022/163298951696390/5fGgfDJX6kvA |
|qZx6tUJNjWDXEu%2f8%2b7a%2fqQ5HFPCKKMs81o1MC5NSLKPzC6p2hy670VChIC7o |
|Po2JwrY8%2fAQ13fVjwXtxs4OUf76eUDVH7G1UzOf5ugveSR6qAQF60EV7y2F9e9cn |
|RWHBMncTyGuyCxd4gxtVSyXRQuQ7s2qBsh6%2bt0Yj4LNsOhjQFmD3EPgfGQ7P81gY |
|z2p%2fFyRcFX4V |
-----------------------------------------------------------------------

When this function is called in a query the role must have USAGE privileges
on an external named stage and READ privileges on an internal named stage.
tombaileycourses.com
BUILD_SCOPED_FILE_URL

When this function is called in a UDF, Stored Procedure or View the


calling role does not require privileges on the underlying stage.

CREATE VIEW PRODUCT_SCOPED_URL_VIEW AS


SELECT build_scoped_file_url(@images_stage, 'prod_z1c.jpg') AS scoped_file_url;

SELECT * FROM PRODUCT_SCOPED_URL_VIEW;

Output:

-----------------------------------------------------------------------
|SCOPED_FILE_URL |
-----------------------------------------------------------------------
|https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/ |
|01a691df-0000-277e-0000-9485000bc022/163298951696390/5fGgfDJX6kvA |
|qZx6tUJNjWDXEu%2f8%2b7a%2fqQ5HFPCKKMs81o1MC5NSLKPzC6p2hy670VChIC7[…] |
-----------------------------------------------------------------------

tombaileycourses.com
BUILD_STAGE_FILE_URL

build_stage_file_url( @<stage_name> , '<relative_file_path>' )

The file URL does not expire.

SELECT build_stage_file_url(@images_stage, 'prod_z1c.jpg');

Output:

---------------------------------------------------------------------------------------------------------------
| “BUILD_STAGE_FILE_URL(@images_stage, 'prod_z1c.jpg’)” |
---------------------------------------------------------------------------------------------------------------
|https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/DEMO_DB/DEMO_SCHEMA/IMAGES_STAGE/prod_z1c.jpg |
---------------------------------------------------------------------------------------------------------------

Calling this function whether it’s part of a query, UDF, Stored Procedure or
View requires privileges on the underlying stage, that is USAGE for external
stages and READ for internal stages.

CREATE STAGE MY_STAGE
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

ALTER STAGE MY_STAGE SET
ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

tombaileycourses.com
GET_PRESIGNED_URL

get_presigned_url( @<stage_name> , '<relative_file_path>' , [ <expiration_time> ] )

SELECT get_presigned_url(@images_stage, 'prod_z1c.jpg’, 600);

Output:

-----------------------------------------------------------------------------------------
|GET_PRESIGNED_URL(@images_stage, 'prod_z1c.jpg’,600) |
-----------------------------------------------------------------------------------------
|https://sfc-uk-ds1-6-customer-stage.s3.eu-west-2.amazonaws.com/vml0-s- |
|ukst8973/stages/42763db5-31fa-47ef-bdb1-729d81b645c2/tree.jpg?X-Amz-Algorithm=AWS4- |
|HMAC-SHA256&X-Amz-Date=20220827T162316Z&X-Amz-SignedHeaders=host&X-Amz-Expires=599&X- |
|Amz-Credential=AKIA4ANG2XQCHAHQKBKS%2F20220827%2Feu-west-2%2Fs3%2Faws4_request&X-Amz- |
|Signature=0e8edf89acc24d9bd23b5387f8938671f54212be1c50c1bfe26c974ea151af55 |
-----------------------------------------------------------------------------------------

Calling this function whether it’s part of a query, UDF, Stored Procedure or
View requires privileges on the underlying stage, that is USAGE for external
stages and READ for internal stages.

tombaileycourses.com
GET_PRESIGNED_URL

@DOCUMENTS_STAGE

@documents_stage/document.pdf

@documents_stage/document_metadata.json

{
  "relative_path": "document.pdf",
  "author": "Corado Fernandez",
  "published_on": "2022-01-23",
  "topics": [
    "nutrition",
    "health",
    "science"
  ]
}

tombaileycourses.com
GET_PRESIGNED_URL

CREATE TABLE document_metadata


(
relative_path string,
author string,
published_on date,
topics array
);

COPY INTO document_metadata


FROM
(SELECT
$1:relative_path,
$1:author,
$1:published_on::date,
$1:topics
FROM
@documents_stage/document_metadata.json)
FILE_FORMAT = (type = json);

tombaileycourses.com
GET_PRESIGNED_URL

CREATE VIEW document_catalog AS


(
SELECT
author,
published_on,
get_presigned_url(@documents_stage, relative_path) as presigned_url,
topics
FROM
document_metadata
);

SELECT * FROM document_catalog;

Output:

------------------------------------------------------------------------------------------------------------
|AUTHOR |PUBLISHED_ON |PRESIGNED_URL |TOPICS |
------------------------------------------------------------------------------------------------------------
|Corado Fernandez |2022-01-23 |https://sfc-uk-ds1-6-customer-stage.[…] |["nutrition","health","science"] |
------------------------------------------------------------------------------------------------------------

tombaileycourses.com
Directory Tables

tombaileycourses.com
Directory Tables

Internal

CREATE STAGE INT_STAGE
DIRECTORY = (ENABLE = TRUE);

External

ALTER STAGE EXT_STAGE SET
DIRECTORY = (ENABLE = TRUE);

SELECT * FROM DIRECTORY(@INT_STAGE)

--------------------------------------------------------------------------------------------------------------
|RELATIVE_PATH |SIZE |LAST_MODIFIED |MD5 |ETAG |FILE_URL |
--------------------------------------------------------------------------------------------------------------
|document.pdf |250,838 |55:42.0 |ba247312[…] |f76b4327[…] |https://go44755.eu-west-2.aws.snowflake[…] |
--------------------------------------------------------------------------------------------------------------

tombaileycourses.com
Refreshing Directory Tables

Directory Tables must be refreshed to reflect the most up-to-date


changes made to stage contents. This includes new files uploaded,
removed files and changes to files in the path.

Internal

ALTER STAGE INT_STAGE REFRESH;

External

tombaileycourses.com
Refreshing Directory Tables

Directory Tables must be refreshed to reflect the most up-to-date


changes made to stage contents. This includes new files uploaded,
removed files and changes to files in the path.

Enable a directory table on an external stage.

CREATE STAGE EXT_STAGE
DIRECTORY = (ENABLE = TRUE);

Describe stage and retrieve ARN for Snowflake managed


SQS Queue in field directory_notification_channel.
External
DESCRIBE STAGE EXT_STAGE;

Configure event notifications for a S3 bucket to notify


Snowflake managed SQS queue associated with the stage
when new or updated data is available to read into the
directory table metadata.

tombaileycourses.com
File Support REST API

tombaileycourses.com
File Support REST API

GET /api/files

Scoped URL (BUILD_SCOPED_FILE_URL()): only the user who generated the scoped URL can download the staged file.

File URL (BUILD_STAGE_FILE_URL()): any role that has privileges on the underlying stage can access the file.

tombaileycourses.com
File Support REST API

import requests

response = requests.get(
    "https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/DB/SCHEMA/STAGE/img.jpg",
    headers={
        "User-Agent": "reg-tests",
        "Accept": "*/*",
        "X-Snowflake-Authorization-Token-Type": "OAUTH",
        "Authorization": """Bearer {}""".format(token),
    },
    allow_redirects=True,
)

print(response.status_code)
print(response.content)

tombaileycourses.com
Storage Layer Overview

tombaileycourses.com
Storage Summary

[Diagram: files in formats such as CSV, JSON, ORC, Parquet, Avro and XML are loaded into the storage layer as micro-partitions P1, P2, P3.]

tombaileycourses.com
Micro-partitions

tombaileycourses.com
Micro-partitions

Micro-partition 1

order_id item_id order_date


001 2456ASTT 01/06/2022

002 098098SS 01/06/2022

003 TT778GH2 01/06/2022

004 JX098FJ32 02/06/2022


Micro-partition 2
005 TF098SD32 02/06/2022

006 CC098FJ32 02/06/2022

COPY INTO MY_CSV_TABLE FROM


@MY_STAGE/orders.csv

tombaileycourses.com
Micro-partitions

Micro-partition 1
Snowflake partitions along the natural ordering of the input data
as it is inserted/loaded.

Micro-partitions are the physical files stored in blob storage; they range in size from 50 to 500 MB of uncompressed data.

Micro-partitions undergo a reorganisation process into the Micro-partition 2


Snowflake columnar data format.

Micro-partitions are immutable, they are write once and read


many.

tombaileycourses.com
Micro-partition Metadata

Micro-partition 1 Partition Metadata

MIN: 1 MAX: 3

MIN: 098098SS MAX: TT778GH2


MIN: 01/06/2022 MAX: 01/06/2022

tombaileycourses.com
Micro-partition Pruning
MY_CSV_TABLE

Micro-partition 1 001-100
Micro-partition metadata allows Snowflake to optimize a query
by first checking the min-max metadata of a column and
discarding micro-partitions from the query plan that are not
Micro-partition 2 101-200
required.

Micro-partition 3 201-300
SELECT ORDER_ID, ITEM_ID
FROM MY_CSV_TABLE
Micro-partition 4 301-400 WHERE ORDER_ID > 360 AND ORDER_ID < 460;

Micro-partition 5 401-500
The metadata is typically considerably smaller than the actual
data, speeding up query time.
Micro-partition 6 501-600

tombaileycourses.com
Time Travel & Fail-Safe

tombaileycourses.com
Data Lifecycle

Current Data Storage → Time Travel Retention → Fail-Safe

tombaileycourses.com
Time Travel

Current Data Storage → Time Travel Retention → Fail-Safe

Time Travel enables users to restore objects such as


UNDROP DATABASE MY_DATABASE;
tables, schemas and databases that have been removed.

Time Travel enables users to analyse historical data by SELECT * FROM MY_TABLE AT(TIMESTAMP =>
querying it at points in the past. TO_TIMESTAMP('2021-01-01'));

Time Travel enables users to create clones of objects CREATE TABLE MY_TABLE_CLONE CLONE MY_TABLE
from a point in the past. AT (TIMESTAMP => TO_TIMESTAMP('2021-01-01'));

tombaileycourses.com
Time Travel Retention Period

The Time Travel retention period is configured with the parameter DATA_RETENTION_TIME_IN_DAYS:

ALTER DATABASE MY_DB
SET DATA_RETENTION_TIME_IN_DAYS=90;

The default retention period at the account, database, schema and table level is 1 day.

On the Standard edition of Snowflake the minimum value is 0 and the maximum is 1 day; on Enterprise edition and higher the maximum can be increased from 1 to 90 days.

Temporary and transient objects can have a maximum retention period of 1 day across all editions.
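Retention can likewise be set per table; a minimal sketch:

ALTER TABLE MY_TABLE SET DATA_RETENTION_TIME_IN_DAYS=30;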
tombaileycourses.com
Accessing Data In Time Travel

AT

SELECT * FROM MY_TABLE
AT(STATEMENT => '01a00686-0000-0c47');

The AT keyword allows you to capture historical data inclusive of all changes made by a statement or transaction up until that point. Three parameters are available to specify a point in the past:
• TIMESTAMP
• OFFSET
• STATEMENT

BEFORE

SELECT * FROM MY_TABLE
BEFORE(STATEMENT => '01a00686-0000-0c47');

The BEFORE keyword allows you to select historical data from a table up to, but not including, any changes made by a specified statement or transaction. One parameter is available to specify a point in the past:
• STATEMENT

UNDROP

UNDROP TABLE MY_TABLE;
UNDROP SCHEMA MY_SCHEMA;
UNDROP DATABASE MY_DATABASE;

The UNDROP keyword can be used to restore the most recent version of a dropped table, schema or database. If an object of the same name already exists, an error is returned. To view dropped objects you can use:
SHOW TABLES HISTORY;
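A minimal sketch of the OFFSET parameter, querying the table as it looked 5 minutes (300 seconds) ago:

SELECT * FROM MY_TABLE AT(OFFSET => -60*5);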

tombaileycourses.com
Fail-safe

Current Data Storage → Time Travel Retention → Fail-Safe

Fail-safe is a non-configurable period of 7 days in which historical


data can be recovered by contacting Snowflake support.


It could take several hours or days for Snowflake to complete


recovery.

Fail-safe is only enabled for permanent objects, not temporary


or transient.

tombaileycourses.com
Cloning

tombaileycourses.com
Cloning

Cloning is the process of creating a copy of an existing object within a Snowflake account.

CREATE TABLE MY_TABLE (
COL_1 NUMBER COMMENT 'COLUMN ONE',
COL_2 STRING COMMENT 'COLUMN TWO'
);
• DATABASES
• SCHEMAS
CREATE TABLE MY_TABLE_CLONE CLONE MY_TABLE;

• TABLES
• STREAMS
CREATE STAGE MY_EXT_STAGE • STAGES
URL='S3://RAW/FILES/' • FILE FORMATS
CREDENTIALS=();
• SEQUENCES
CREATE STAGE MY_EXT_STAGE_CLONE CLONE • TASKS
MY_EXT_STAGE;
• PIPES (reference external stage only)

CREATE FILE FORMAT MY_FF
TYPE=JSON;

CREATE FILE FORMAT MY_FF_CLONE CLONE MY_FF;

Cloning is a metadata-only operation, copying the properties, structure and configuration of its source.

Cloning does not contribute to storage costs until data is modified or new data is added to the clone.

tombaileycourses.com
Zero-copy Cloning

CREATE TABLE MY_CLONE CLONE MY_TABLE;

MY_TABLE P1 P2 P3 P4 P5
Changes made after the point of cloning then start to create
additional micro-partitions.

Changes made to the source or the clone are not reflected between
each other, they are completely independent.

Clones can be cloned with nearly no limits.


P1 P2 P3 P4 P5

Because cloning is a meta-data only operation it’s very quick,


enabling interesting use-case, such as rapid integration testing.

MY_CLONE
tombaileycourses.com
Cloning Rules

A cloned object does not retain the privileges of the source


object, with the exception of tables.

Cloning is recursive for databases and schemas.

External tables and internal named stages are never cloned.

A cloned table does not contain the load history of the source table.

Temporary and transient tables can only be cloned as temporary


or transient tables, not permanent tables.
tombaileycourses.com
Cloning With Time Travel

Time Travel and Cloning features can be combined to create a clone of an existing database, schema, non-temporary table or stream at a point within their retention period.

CREATE TABLE MY_TABLE_CLONE CLONE MY_TABLE
AT (TIMESTAMP => TO_TIMESTAMP('2022-01-01'));

CREATE TABLE MY_TABLE_CLONE CLONE MY_TABLE
AT (OFFSET => -60*30);

If the source object did not exist at the time specified in the AT | BEFORE parameter, an error is thrown.

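The BEFORE keyword can be used in the same way; a sketch reusing the
illustrative statement ID from earlier:

CREATE TABLE MY_TABLE_CLONE CLONE MY_TABLE
BEFORE (STATEMENT => '01a00686-0000-0c47');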
Replication
[Diagram: organization ORG_1 with primary database DB_1 in account1.eu-west-2.aws and secondary database DB_1_REPLICA in account2.eu-central-1.aws]

Replication is a feature in Snowflake which enables replicating databases
between Snowflake accounts within an organization.

A database is selected to serve as the primary database, from which
secondary databases can be created in other accounts:

ALTER DATABASE DB_1
ENABLE REPLICATION TO ACCOUNTS ORG_1.account2;

When a primary database is replicated, a snapshot of its database objects
and data is transferred to the secondary database:

CREATE DATABASE DB_1_REPLICA
AS REPLICA OF ORG_1.account1.DB_1;

The secondary database can be periodically refreshed:

ALTER DATABASE DB_1_REPLICA REFRESH;
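Primary and secondary databases across the organization can be listed
with a minimal sketch like the following (the output indicates, among
other things, whether each database is a primary):

SHOW REPLICATION DATABASES;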
External Tables, Stages, Pipes, Streams and Tasks are not currently
replicated.

Refreshing a secondary database can be automated by configuring a task
object to run the refresh command on a schedule (see the sketch below).

Only databases and some of their child objects can be replicated, not
users, roles, warehouses, resource monitors or shares.

Privileges granted to database objects are not replicated to the
secondary database.

Billing for database replication is based on data transfer and compute
resources.
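A minimal sketch of that automated refresh, run in the account holding
the secondary database and assuming a hypothetical warehouse MY_WH:

CREATE TASK REFRESH_DB_1_REPLICA
WAREHOUSE = MY_WH
SCHEDULE = '60 MINUTE'
AS
ALTER DATABASE DB_1_REPLICA REFRESH;

-- Tasks are created suspended and must be resumed to start running
ALTER TASK REFRESH_DB_1_REPLICA RESUME;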
Storage Billing

Data storage cost is calculated monthly, based on the average number of
on-disk bytes for all data stored each day in a Snowflake account. This
covers database tables (current data plus Time Travel and Fail-safe
storage) and files in internal stages.

The monthly cost for storing data in Snowflake is based on a flat rate
per terabyte; for example, $42 per terabyte per month for customers
deployed in AWS – EU (London).
Data Storage Pricing

Storage cost is determined by Cloud Provider, Region & Pricing Plan.

Data Storage Billing Scenarios

If an average of 15 TB is stored during a month on an account deployed in
the AWS Europe (London) region, it will cost $630.00
(15 TB × $42 per TB, Nov 2021 pricing).

If an average of 14 TB is stored during a month on an account deployed in
the AWS AP Mumbai region, it will cost $644.00
(14 TB × $46 per TB, Nov 2021 pricing).
Storage Monitoring

Data storage usage can be monitored from the Classic and Snowsight user
interfaces.

Equivalent functionality can be achieved via the account usage views and
the information schema views and table functions.

Account Usage Views:

• DATABASE_STORAGE_USAGE_HISTORY
• STAGE_STORAGE_USAGE_HISTORY
• TABLE_STORAGE_METRICS
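A minimal sketch of querying the first of these views (the database name
is hypothetical; AVERAGE_DATABASE_BYTES and AVERAGE_FAILSAFE_BYTES are
the view's documented columns):

SELECT USAGE_DATE,
       AVERAGE_DATABASE_BYTES / POWER(1024, 4) AS AVG_DATABASE_TB,
       AVERAGE_FAILSAFE_BYTES / POWER(1024, 4) AS AVG_FAILSAFE_TB
FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
WHERE DATABASE_NAME = 'MY_DATABASE'
ORDER BY USAGE_DATE DESC;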
Data Sharing

Secure Data Sharing

Secure Data Sharing allows an account to provide read-only access to
selected database objects to other Snowflake accounts without
transferring data.

[Diagram: Account A (data provider) shares MY_TABLE in MY_DATABASE with Account B (data consumer), where it appears in MY_SHARED_DATABASE]

Sharing is enabled with an account-level SHARE object. It is created by
a data provider and consists of two key configurations:

• Grants on database objects
• Consumer account definitions

An account can share the following database objects:

• Tables
• External tables
• Secure views
• Secure materialized views
• Secure UDFs

Data consumers create a database from a SHARE, which contains the
read-only database objects granted by the data provider.

Sharing is not available on the VPS edition of Snowflake.

Secure Data Sharing Example

Provider account (https://GYU889T):

-- Create empty share (CREATE SHARE privilege required)
CREATE SHARE MY_SHARE;

-- Grant privileges on the shared objects to the share
GRANT USAGE ON DATABASE MY_DATABASE TO SHARE MY_SHARE;
GRANT USAGE ON SCHEMA MY_DATABASE.MY_SCHEMA TO SHARE MY_SHARE;
GRANT SELECT ON TABLE MY_DATABASE.MY_SCHEMA.MY_TABLE TO SHARE MY_SHARE;

-- Add consumer accounts
ALTER SHARE MY_SHARE ADD ACCOUNTS=TT67B77;

Consumer account (https://TT67B77):

-- List available shares (IMPORT SHARE privilege required)
SHOW SHARES;

-- An ACCOUNTADMIN lists the contents of an available share
DESC SHARE GYU889T.MY_SHARE;

-- Create a database from the share (IMPORT SHARE privilege required)
CREATE DATABASE SHARED_DATABASE FROM SHARE GYU889T.MY_SHARE;

-- Query the shared table
SELECT * FROM SHARED_DATABASE.MY_SCHEMA.MY_TABLE;
Data Provider

Database objects added to a share become immediately available to all
consumers.

Only one database can be added per share; attempting to grant objects
from a second database produces an error such as:
SQL compilation error: Database 'TEST_DB' does not belong to the
database that is being shared.

There are no hard limits on the number of shares you can create or the
number of accounts you can add to a share.
To execute a CREATE SHARE command a user must have a role with the
CREATE SHARE privilege granted.

Access to a share, or to database objects in a share, can be revoked at
any time.

A share can only be granted to accounts in the same region and cloud
provider as the data provider account. To share with an account in a
different region or cloud provider (for example, from AWS Europe (London)
to Azure Japan East (Tokyo)), the database is first replicated to a
provider-owned account in the target region and shared from there (see
the sketch below).
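A rough sketch of that cross-region pattern, assuming hypothetical
account names within one organization (the replication commands mirror
the Replication section above):

-- In the provider's London account: enable replication to the provider's Tokyo account
ALTER DATABASE MY_DATABASE
ENABLE REPLICATION TO ACCOUNTS ORG_1.TOKYO_ACCOUNT;

-- In the provider's Tokyo account: create and refresh the secondary database
CREATE DATABASE MY_DATABASE_REPLICA
AS REPLICA OF ORG_1.LONDON_ACCOUNT.MY_DATABASE;
ALTER DATABASE MY_DATABASE_REPLICA REFRESH;

-- Then create a share in the Tokyo account for consumers in that region
CREATE SHARE TOKYO_SHARE;
GRANT USAGE ON DATABASE MY_DATABASE_REPLICA TO SHARE TOKYO_SHARE;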
Data Consumer

Only one database can be created per share; importing more than once is
not supported and produces an error such as:
Importing more than once is not supported. Database is already imported
as 'SNOWFLAKE_SAMPLE_DATA'.

A data consumer cannot use the Time Travel feature on shared database
objects.

A data consumer cannot create a clone of a shared database or of shared
database objects.
To create a database from a share a user must have a role with the
IMPORT SHARE privilege granted.

Data consumers cannot create objects inside the shared database.

Data consumers cannot re-share shared database objects with other
accounts.
Reader Account

A reader account provides the ability for non-Snowflake customers to
gain access to a provider's data.

Reader accounts can't insert data or copy data into the account.

A reader account can only consume data from the provider account that
created it.

Provider accounts assume responsibility for the reader accounts they
create and are billed for their usage.

Reader Account

Provider account (https://GYU889T):

-- CREATE MANAGED ACCOUNT privilege required
CREATE MANAGED ACCOUNT READER_ACCT1
ADMIN_NAME=user1,
ADMIN_PASSWORD='Sdfed43da!44',
TYPE=READER;

-- Create empty share (CREATE SHARE privilege required)
CREATE SHARE MY_SHARE;

-- Grant privileges on the shared objects to the share
GRANT USAGE ON DATABASE MY_DATABASE TO SHARE MY_SHARE;
GRANT USAGE ON SCHEMA MY_DATABASE.MY_SCHEMA TO SHARE MY_SHARE;
GRANT SELECT ON TABLE MY_DATABASE.MY_SCHEMA.MY_TABLE TO SHARE MY_SHARE;

-- Add the reader account to the share
ALTER SHARE MY_SHARE ADD ACCOUNTS=X88TYUS;

Reader account (https://X88TYUS):

-- Create a database from the share (IMPORT SHARE privilege required)
CREATE DATABASE SHARED_DATABASE FROM SHARE GYU889T.MY_SHARE;

-- Query the shared table
SELECT * FROM SHARED_DATABASE.MY_SCHEMA.MY_TABLE;
Data Marketplace
Portal for browsing publicly available third-party shares.

Via the Provider Studio, data providers can list standard or
personalised shares.

For data consumers, third-party datasets are immediately available to
query from their account without transformation.

Data Exchange

A Data Exchange is a private version of the Data Marketplace for
accounts to provide and consume data.

A Data Exchange for a Snowflake account is set up with Snowflake support
by providing a name and description.

The Snowflake account hosting a Data Exchange is the Data Exchange
Admin, and can manage members, designate members as providers or
consumers, and manage listing requests.
