Ultimate SnowPro Core Certification Course Slides by Tom Bailey
tombaileycourses.com
What is Snowflake?
Data Platform
• Structured & relational data
• Scalable storage and compute
• COPY INTO & Snowpipe
Cloud Native
Software as a service (SaaS)
Multi-cluster Shared Data Architecture
Distributed Architectures
Shared Disk Architecture
Shared Nothing Architecture
Multi-cluster Shared Data Architecture
Decouple storage, compute and management services.
Storage Layer
Query Processing Layer
Services Layer
• Infrastructure Management
• Transaction Management
• Metadata Management
• Query parsing and optimisation
• Security
Editions & Key Features
Snowflake Editions & Key Features
Editions: Standard, Enterprise, Business Critical and Virtual Private Snowflake.
Feature categories:
• SQL Support
• Security, Governance & Data Protection
• Compute Resource Management
• Interface & Tools
• Releases
• Data Import & Export
• Data Replication & Failover
SQL Support
⇒ Multi-statement transactions
⇒ Automatic Clustering
⇒ Zero-copy Cloning
⇒ Materialized Views
Security, Governance & Data Protection
⇒ Network Policies
⇒ Tri-secret secure
⇒ Private Connectivity
Compute Resource Management
⇒ Resource Monitors
Interface & Tools
Data Import & Export
⇒ Bulk Loading
⇒ Bulk Unloading
Data Replication & Failover
⇒ Database replication
Snowflake’s Catalogue and Objects
Snowflake Object Model
Organisation ⇒ Account
Account-level objects: User, Role, Database, Warehouse, Share, Network Policy, Resource Monitor
Database ⇒ Schema
Organisation, Account, Database & Schema
Organisation Overview
Organisation Setup
• Contact Snowflake support.
• Provide an organisation name and nominate an account.
• The ORGADMIN role is added to the nominated account.
ORGADMIN Role
SHOW REGIONS;
Account Overview
Account Regions
aws.us-west-2: US West (Oregon)
aws.ca-central-1: Canada (Central)
azure.westus2: West US 2 (Washington)
Account URL
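xy12345.us-east-2.aws.snowflakecomputing.com
Account identifier: account locator (xy12345), region (us-east-2) and cloud platform (aws).
acme-marketing-test-account.snowflakecomputing.com
Account identifier: organization name (acme) and account name (marketing-test-account).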
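As a minimal sketch, the two URL formats above can be assembled in Python as follows. The function names are hypothetical, the account name is assumed to have been created as marketing_test_account (underscores are written as hyphens in the URL, as Snowflake's account identifier documentation describes), and region-specific exceptions to the locator format are ignored.

```python
def locator_url(locator: str, region: str, cloud: str) -> str:
    """Legacy format: <account_locator>.<region>.<cloud>.snowflakecomputing.com"""
    return f"{locator}.{region}.{cloud}.snowflakecomputing.com".lower()


def org_account_url(org_name: str, account_name: str) -> str:
    """Format: <org_name>-<account_name>.snowflakecomputing.com.

    Assumption: underscores in the account name are written as hyphens in the URL.
    """
    return f"{org_name}-{account_name.replace('_', '-')}.snowflakecomputing.com".lower()


print(locator_url("xy12345", "us-east-2", "aws"))
print(org_account_url("acme", "marketing_test_account"))
```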
Database & Schemas
Databases must have a unique identifier in an account; schemas must have a unique identifier in a database.
A database or schema name must start with an alphabetic character (an underscore is also permitted) and cannot contain spaces or special characters unless enclosed in double quotes.
Namespace: a database and schema together form the namespace for the objects they contain (DATABASE.SCHEMA).
CREATE DATABASE MY_DB_CLONE CLONE MYTESTDB;
CREATE DATABASE SHARED_DB FROM SHARE UTT783.SHARE;
CREATE SCHEMA MY_SCHEMA_CLONE CLONE MY_SCHEMA;
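A rough Python check of the naming rule above. It assumes that, after the first character, an unquoted name may contain letters, digits, underscores and dollar signs (as Snowflake's identifier documentation describes) and it ignores length limits; the function name is hypothetical.

```python
import re

# Unquoted identifiers: a letter or underscore first, then letters, digits, underscores or $.
UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def is_valid_identifier(name: str) -> bool:
    """Rough check of a database or schema name (length limits are ignored)."""
    if len(name) > 2 and name.startswith('"') and name.endswith('"'):
        # Enclosed in double quotes: spaces and special characters are allowed.
        return True
    return UNQUOTED_IDENTIFIER.fullmatch(name) is not None


for candidate in ["MY_DB_CLONE", "1DB", "MY DB", '"MY DB"', "MY-SCHEMA"]:
    print(candidate, is_valid_identifier(candidate))
```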
Table and View Types
Table Types
• Permanent: default table type; exists until explicitly dropped; has a Fail-safe period.
• Temporary: used for transitory data; persists for the duration of a session.
• Transient: exists until explicitly dropped; no Fail-safe period.
• External: used to query data outside Snowflake; read-only table.
View Types
Standard view: does not contribute to storage cost.
CREATE VIEW MY_VIEW AS
SELECT COL1, COL2 FROM MY_TABLE;
Materialized view: stores the results of a query definition and periodically refreshes it.
CREATE MATERIALIZED VIEW MY_VIEW AS
SELECT COL1, COL2 FROM MY_TABLE;
Secure view: both standard and materialized views can be secure.
CREATE SECURE VIEW MY_VIEW AS
SELECT COL1, COL2 FROM MY_TABLE;
UDFs and Stored Procedures
User Defined Functions (UDFs)
UDFs can be written in SQL, JavaScript, Python and Java.
CREATE FUNCTION AREA_OF_CIRCLE(RADIUS FLOAT)
RETURNS FLOAT
AS
$$
pi() * radius * radius
$$;
UDFs accept 0 or more parameters.
UDFs can be called as part of a SQL statement:
SELECT AREA_OF_CIRCLE(col1) FROM MY_TABLE;
JavaScript UDF
Java UDF
Snowflake boots up a JVM to execute a function written in Java.
Snowflake currently supports writing UDFs in Java versions 8.x, 9.x, 10.x, and 11.x.
Java UDFs can specify their definition as in-line code or a pre-compiled jar file.
Java UDFs cannot be designated as secure.
CREATE FUNCTION DOUBLE_VALUE(X INTEGER)
RETURNS INTEGER
LANGUAGE JAVA
HANDLER='TestDoubleFunc.doubleValue'
TARGET_PATH='@~/TestDoubleFunc.jar'
AS
$$
// 'double' is a reserved word in Java, so the handler method needs a different name.
class TestDoubleFunc {
    public static int doubleValue(int x) {
        return x * 2;
    }
}
$$;
External Functions
CREATE OR REPLACE EXTERNAL FUNCTION CALC_SENTIMENT(STRING_COL VARCHAR)  -- function name and parameters
RETURNS VARIANT                                                         -- return type
API_INTEGRATION = AWS_API_INTEGRATION                                   -- integration object
AS 'https://ttu.execute-api.eu-west-2.amazonaws.com/';                    -- URL of the proxy service

CREATE OR REPLACE API INTEGRATION AWS_API_INTEGRATION
API_PROVIDER = AWS_API_GATEWAY
API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_cloud_account_role'
API_ALLOWED_PREFIXES = ('https://xyz.execute-api.us-west-2.amazonaws.com/production')
ENABLED = TRUE;
External Function Call Lifecycle
SELECT CALC_SENTIMENT('x')
FROM MY_TABLE;
Figure: the query calls the external function, which uses the API integration to send an HTTP request to the proxy service; the HTTP response is passed back to Snowflake.
External Function Limitations
Stored Procedures
In Relational Database Management Systems (RDBMS) stored procedures
were named collections of SQL statements often containing procedural logic.
CREATE PROCEDURE CLEAR_EMP_TABLES
AS
BEGIN
DELETE FROM EMP01 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
DELETE FROM EMP02 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
DELETE FROM EMP03 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
DELETE FROM EMP04 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
DELETE FROM EMP05 WHERE EMP_DATE < DATEADD(MONTH, -1, GETDATE())
END
EXECUTE CLEAR_EMP_TABLES;
Stored Procedure: JavaScript
Stored procedure identifier and input parameters:
CREATE PROCEDURE EXAMPLE_STORED_PROCEDURE(PARAM1 STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
// Arguments are referenced in upper case inside the JavaScript body.
var sql_command = "SELECT * FROM " + PARAM1;
snowflake.execute({sqlText: sql_command});
return "Succeeded.";
$$;
Stored procedures mix JavaScript and SQL in their definition using Snowflake’s JavaScript API.
CALL EXAMPLE_STORED_PROCEDURE('EMP01');
Stored Procedures & UDFs
Ability to overload
Sequences
START = 1
INCREMENT = 1;

START = 0
INCREMENT = 5;
Values generated: 0, 5, 10, 15, …, 35, 40, 45, 50

INCREMENT = 1;
NEXTVAL returns 1001.

ID AMOUNT
1002 756.00
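A toy Python model of a sequence with START and INCREMENT, reproducing the START = 0, INCREMENT = 5 values. It is only an illustration: the class is hypothetical and hands out gap-free values, whereas Snowflake sequences guarantee unique values but not gap-free ones.

```python
import itertools


class Sequence:
    """Toy model of a sequence defined with START and INCREMENT."""

    def __init__(self, start: int = 1, increment: int = 1) -> None:
        self._counter = itertools.count(start, increment)

    def nextval(self) -> int:
        # Each call hands out the next value; values are never reused.
        return next(self._counter)


seq = Sequence(start=0, increment=5)
print([seq.nextval() for _ in range(11)])  # 0, 5, 10, ..., 50
```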
Tasks & Streams
Tasks
A task is an object used to schedule the execution of a SQL command or a stored procedure.
WAREHOUSE = MYWH
Streams
A stream is an object created to view & track DML changes to a source table – inserts, updates & deletes.
Streams
CREATE STREAM MYSTREAM
ON TABLE MYTABLE;
Empty stream.
V1: INSERT 10 ROWS
SELECT * FROM MYSTREAM;
INSERT INTO MYTABLE2 SELECT * FROM MYSTREAM;
(Consuming the stream in a DML statement advances its offset, leaving it empty.)
V2: UPDATE 2 ROWS
SELECT * FROM MYSTREAM; ⇒ 2 Updates
V3: DELETE 1 ROW
SELECT * FROM MYSTREAM; ⇒ 2 Updates, 1 Delete

Tasks & Streams
CREATE TASK MYTASK1
WAREHOUSE = MYWH
SCHEDULE = '5 MINUTE'
WHEN
SYSTEM$STREAM_HAS_DATA('MYSTREAM')
AS
INSERT INTO MYTABLE1(ID,NAME) SELECT ID, NAME
FROM MYSTREAM WHERE METADATA$ACTION = 'INSERT';
Tasks are created in a suspended state; ALTER TASK MYTASK1 RESUME; starts the schedule.
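A toy Python model of the stream walkthrough above: the stream compares the table against a snapshot taken at its offset, and consuming it advances the offset. The class and method names are hypothetical; has_data() stands in for SYSTEM$STREAM_HAS_DATA, and where a real stream reports an update as a DELETE/INSERT pair flagged by METADATA$ISUPDATE, this sketch collapses it into a single UPDATE to match the slide.

```python
class ToyStream:
    """Toy model of a stream: net row changes since the last consumed offset."""

    def __init__(self, table: dict) -> None:
        self.table = table              # live table rows, keyed by ID (shared reference)
        self._offset = dict(table)      # snapshot taken at the current offset

    def changes(self) -> list:
        """Return (ID, action) pairs, like SELECT * FROM MYSTREAM."""
        result = []
        for key in sorted(set(self._offset) | set(self.table)):
            if key not in self._offset:
                result.append((key, "INSERT"))
            elif key not in self.table:
                result.append((key, "DELETE"))
            elif self._offset[key] != self.table[key]:
                result.append((key, "UPDATE"))
        return result

    def has_data(self) -> bool:
        """Analogue of SYSTEM$STREAM_HAS_DATA('MYSTREAM')."""
        return bool(self.changes())

    def consume(self) -> list:
        """Using the stream in a DML statement advances the offset, emptying it."""
        result = self.changes()
        self._offset = dict(self.table)
        return result


mytable = {}
stream = ToyStream(mytable)
for i in range(1, 11):              # V1: INSERT 10 ROWS
    mytable[i] = f"name {i}"
print(stream.has_data(), len(stream.changes()))  # True 10
stream.consume()                    # INSERT INTO MYTABLE2 SELECT * FROM MYSTREAM;
mytable[1] = "updated"              # V2: UPDATE 2 ROWS
mytable[2] = "updated"
del mytable[3]                      # V3: DELETE 1 ROW
print(stream.changes())  # [(1, 'UPDATE'), (2, 'UPDATE'), (3, 'DELETE')]
```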
Billing
Billing Overview
Pricing options: On-demand and Capacity.
Billable services: Virtual Warehouse Services, Cloud Services and Serverless Services.
Compute is billed in credits, while data storage and data transfer are billed as a dollar value.
Compute Billing Overview
Credits: Snowflake's billing unit of measure for compute resource consumption.
Virtual Warehouse Services
• Credits are calculated based on the size of the virtual warehouse.
• Credits are calculated on a per-second basis while a virtual warehouse is in the 'started' state.
• Credits are calculated with a minimum of 60 seconds.
Cloud Services
• Credits are calculated at a rate of 4.4 credits per compute-hour.
• Only cloud services usage that exceeds 10% of the daily usage of virtual warehouses is billed.
• This is called the Cloud Services Adjustment.
Serverless Services
• Each serverless feature has its own credit rate per compute-hour.
• Serverless features are composed of both compute services and cloud services.
• The Cloud Services Adjustment does not apply to cloud services usage when used by serverless features.
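A small Python sketch of the Cloud Services Adjustment for a single day, with hypothetical credit figures. It assumes the adjustment is computed per day against virtual warehouse usage, and it ignores serverless features, whose cloud services usage is excluded from the adjustment.

```python
def billed_cloud_services(daily_warehouse_credits: float,
                          daily_cloud_services_credits: float) -> float:
    """Cloud Services Adjustment: usage up to 10% of daily warehouse usage is free."""
    adjustment = min(daily_cloud_services_credits, 0.10 * daily_warehouse_credits)
    return daily_cloud_services_credits - adjustment


print(billed_cloud_services(100.0, 8.0))   # within 10%: nothing billed
print(billed_cloud_services(100.0, 15.0))  # only the 5 credits above 10% are billed
```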
Data Storage & Transfer Billing Overview
Data Storage
• Data storage is calculated monthly based on the average number of on-disk bytes per day in the following locations:
o Database tables.
o Internal stages.
• Costs are calculated based on a flat dollar value rate per terabyte (TB) based on:
o Capacity or On-demand.
o Cloud provider.
o Region.
Data Transfer
• Data transfer charges apply when moving data from one region to another or from one cloud platform to another, for example when:
o Unloading data from Snowflake using the COPY INTO <location> command.
o Replicating data to a Snowflake account in a different region or cloud platform.
o External functions transfer data out of and into Snowflake.
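The storage rule averages daily usage over the month rather than taking the peak. A minimal Python sketch, working in TB per day to sidestep unit conversions; the function name, the $23/TB rate and the usage pattern are hypothetical, not Snowflake prices.

```python
def monthly_storage_cost(daily_tb: list, usd_per_tb_month: float) -> float:
    """Average the daily on-disk TB over the month, then apply a flat $/TB rate."""
    average_tb = sum(daily_tb) / len(daily_tb)
    return average_tb * usd_per_tb_month


# Hypothetical month: 15 days at 1 TB, then 15 days at 3 TB, at a hypothetical $23/TB rate.
usage = [1.0] * 15 + [3.0] * 15
print(monthly_storage_cost(usage, 23.0))  # average of 2 TB, so 46.0
```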
SnowCD
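SnowCD (Snowflake Connectivity Diagnostic Tool) checks the network connection to Snowflake using the hostnames and ports listed in whitelist.json, a file holding the output of the SYSTEM$WHITELIST function:
snowcd ~\whitelist.json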
Connectivity: Connectors, Drivers and Partnered Tools
Connectors and Drivers
ODBC
Snowflake Partner Tools
Snowflake Scripting
DECLARE
(variable declarations, cursor declarations, etc.)
BEGIN
(Snowflake Scripting and SQL statements)
EXCEPTION
(statements for handling exceptions)
END;
declare
leg_a number(38, 2);
hypotenuse number(38, 5);
begin
leg_a := 2;
let leg_b := 5;
hypotenuse := sqrt(square(leg_a) + square(leg_b));
return hypotenuse;
end;
anonymous block
5.38516
Variables can also be declared and assigned in the BEGIN section using the LET keyword.
Branching Constructs
begin
let count := 4;
if (count % 2 = 0) then
return 'even value';
else
return 'odd value';
end if;
end;
anonymous block
even value
DECLARE or EXCEPTION sections of a block are optional.
Looping Constructs
declare
total integer default 0;
max_num integer default 10;
begin
for i in 1 to max_num do
total := i + total;
end for;
return total;
end;
anonymous block
55
Cursor
declare
total_amount float;
c1 cursor for select amount from transactions;
begin
total_amount := 0.0;
for record in c1 do
total_amount := total_amount + record.amount;
end for;
return total_amount;
end;
anonymous block
136.78
RESULTSET
TABLE()
declare
res resultset;
begin
res := (select amount from transactions);
return table(res);
end;
amount
101.01
24.78
10.99
RESULTSET
Cursor
declare
total_amount float;
res resultset default (select amount from transactions);
c1 cursor for res;
begin
total_amount := 0.0;
for record in c1 do
total_amount := total_amount + record.amount;
end for;
return total_amount;
end;
anonymous block
136.78
Snowpark
Snowpark API languages: Java, Scala and Python.
DataFrame operations such as .select(), .join(), .group_by(), .distinct(), .drop(), .union() and .sort() are lazily evaluated/executed.
A DataFrame is made up of Row objects.
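To illustrate lazy evaluation, here is a toy Python class (not Snowpark) whose filter() and select() only record steps, with all the work happening when collect() is called. In Snowpark the recorded operations are turned into SQL and executed in Snowflake when an action such as collect() or show() runs; this sketch simply runs them locally on a list of dictionaries.

```python
class LazyFrame:
    """Toy DataFrame: transformations only record steps; collect() executes them."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    def filter(self, predicate):
        # Returns a new frame with one more recorded step; nothing runs yet.
        return LazyFrame(self._rows, self._steps + (("filter", predicate),))

    def select(self, *columns):
        return LazyFrame(self._rows, self._steps + (("select", columns),))

    def collect(self):
        # Execution happens only here, in the order the steps were recorded.
        rows = [dict(r) for r in self._rows]
        for kind, arg in self._steps:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            else:
                rows = [{c: r[c] for c in arg} for r in rows]
        return rows


transactions = [{"ACCOUNT_ID": 8764442, "AMOUNT": 12.99},
                {"ACCOUNT_ID": 8764443, "AMOUNT": 2766.0}]
large = LazyFrame(transactions).filter(lambda r: r["AMOUNT"] > 1000).select("ACCOUNT_ID")
print(large.collect())  # [{'ACCOUNT_ID': 8764443}]
```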
Snowpark API: Python
import os
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

connection_parameters = {
    "account": os.environ["snowflake_account"],
    "user": os.environ["snowflake_user"],
    "password": os.environ["snowflake_password"],
    "role": os.environ["snowflake_user_role"],
    "warehouse": os.environ["snowflake_warehouse"],
    "database": os.environ["snowflake_database"],
    "schema": os.environ["snowflake_schema"]
}

session = Session.builder.configs(connection_parameters).create()
transactions_df = session.table("transactions")
print(transactions_df.collect())

Console output:
[Row(ACCOUNT_ID=8764442, AMOUNT=12.99),
Row(ACCOUNT_ID=8764442, AMOUNT=50.0),
Row(ACCOUNT_ID=8764442, AMOUNT=1100.0),
Row(ACCOUNT_ID=8764443, AMOUNT=110.0),
Row(ACCOUNT_ID=8764443, AMOUNT=2766.0),
Row(ACCOUNT_ID=8764443, AMOUNT=1010.0),
Row(ACCOUNT_ID=8764443, AMOUNT=3022.23),
Row(ACCOUNT_ID=8764444, AMOUNT=6986.0),
Row(ACCOUNT_ID=8764444, AMOUNT=1500.0)]

# The slide omits the steps that define transactions_df_filtered and flagged_transactions_df.
transaction_counts_df = transactions_df_filtered.group_by("account_id").count()
flagged_transactions_df.write.save_as_table("flagged_transactions", mode="append")
flagged_transactions_df.show()  # show() prints the rows itself and returns None
session.close()

Console output:
----------------------------------
|"ACCOUNT_ID" |"FLAGGED_COUNT" |
----------------------------------
|8764443 |3 |
|8764444 |2 |
----------------------------------
Access Control Overview
Figure: Role A OWNS an object and GRANTS privileges on it (e.g. SELECT, MODIFY) to Role B; roles are in turn granted to users.
Role-based access control (RBAC) is an access control framework in which access privileges are assigned to roles, and roles are in turn assigned to users.
Snowflake combines RBAC with Discretionary Access Control (DAC), in which each object has an owner, who can in turn grant access to that object.
Securable Objects
Roles
A role is an entity to which privileges on securable objects can be granted or revoked.
GRANT USAGE ON DATABASE TEST_DB TO ROLE TEST_ROLE;
GRANT USAGE ON SCHEMA TEST_SCHEMA TO ROLE TEST_ROLE;
Roles are assigned to users to give them the authorization to perform actions.
GRANT ROLE TEST_ROLE TO USER ADMIN;
System-defined Roles
ORGADMIN
• Manages operations at organization level.
• Can create accounts in an organization.
• Can view all accounts in an organization.
• Can view usage information across an organization.
ACCOUNTADMIN
• Top-level and most powerful role for an account.
• Encapsulates SYSADMIN & SECURITYADMIN.
• Responsible for configuring account-level parameters.
• View and operate on all objects in an account.
• View and manage Snowflake billing and credit data.
• Stop any running SQL statements.
SYSADMIN
• Can create warehouses, databases, schemas and other
objects in an account.
SECURITYADMIN
• Manage grants globally via the MANAGE GRANTS
privilege.
• Create, monitor and manage users and roles.
USERADMIN
• User and Role management via CREATE USER and
CREATE ROLE security privileges.
• Can create users and roles in an account.
PUBLIC
• Automatically granted to every user and every role in an
account.
• Can own securable objects; however, objects owned by the PUBLIC role are available to every other user and role in the account.
Custom Roles
Custom roles allow you to create a role with custom, fine-grained security privileges.
It is recommended to create a hierarchy of custom roles, with the top-most custom role assigned to the SYSADMIN role.
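A minimal Python sketch of the access control model described in this section: privileges are granted to roles, roles can be granted to other roles (so a parent such as SYSADMIN inherits the privileges of the custom roles beneath it), and roles are granted to users. The class, its methods and the user OPS_USER are hypothetical; secondary roles and ownership transfer are not modelled.

```python
from collections import defaultdict


class AccessControl:
    """Toy RBAC model: privileges are granted to roles, roles to roles and users."""

    def __init__(self):
        self.role_privileges = defaultdict(set)  # role -> {(privilege, object)}
        self.granted_roles = defaultdict(set)    # role -> roles granted to it
        self.user_roles = defaultdict(set)       # user -> roles granted to the user

    def grant_privilege(self, privilege, obj, role):
        self.role_privileges[role].add((privilege, obj))

    def grant_role(self, role, to_role):
        self.granted_roles[to_role].add(role)

    def grant_role_to_user(self, role, user):
        self.user_roles[user].add(role)

    def effective_privileges(self, role):
        """A role holds its own privileges plus those of every role granted to it."""
        seen, stack, privileges = set(), [role], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            privileges |= self.role_privileges[current]
            stack.extend(self.granted_roles[current])
        return privileges

    def is_authorized(self, user, active_role, privilege, obj):
        # The user must hold the active role, and the role must hold the privilege.
        return (active_role in self.user_roles[user]
                and (privilege, obj) in self.effective_privileges(active_role))


ac = AccessControl()
ac.grant_privilege("USAGE", "TEST_DB", "TEST_ROLE")
ac.grant_role("TEST_ROLE", "SYSADMIN")          # custom role hierarchy rolls up to SYSADMIN
ac.grant_role_to_user("TEST_ROLE", "ADMIN")
ac.grant_role_to_user("SYSADMIN", "OPS_USER")   # hypothetical user
print(ac.is_authorized("OPS_USER", "SYSADMIN", "USAGE", "TEST_DB"))  # True, inherited
```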
Privileges
A security privilege defines a level of access to an object, for example MODIFY or MONITOR.
Future grants allow privileges to be defined for objects not yet created:
GRANT SELECT ON FUTURE TABLES IN SCHEMA MY_SCHEMA TO ROLE MY_ROLE;
User Authentication
MFA
Multi-factor Authentication (MFA)
Multi-factor Authentication Flow
Enter Snowflake interface credentials, then complete the second factor on the enrolled device:
• Click Call Me and follow the instructions on the phone call, or
• Click Enter A Passcode and enter the passcode.
Login successful.
MFA Properties
ALTER USER USER1 SET MINS_TO_BYPASS_MFA=10;
Specifies the number of minutes to temporarily disable MFA for the user so that they can log in.
ALTER USER USER1 SET DISABLE_MFA=TRUE;
Disables MFA for the user, effectively cancelling their enrolment. To use MFA again, the user must re-enrol.
ALTER ACCOUNT SET ALLOW_CLIENT_MFA_CACHING=TRUE;
MFA token caching reduces the number of prompts that must be acknowledged while connecting and authenticating to Snowflake.
Federated Authentication
Federated Authentication (SSO)
Federated Authentication Login Flow
The user enters their IdP credentials, then clicks on the Snowflake application in the IdP to log in to Snowflake.
Federated Authentication Properties
SAML_IDENTITY_PROVIDER: how to specify an IdP during the setup of Federated Authentication.
SSO_LOGIN_PAGE: enables a button for Snowflake-initiated SSO for your identity provider (as specified in SAML_IDENTITY_PROVIDER) on the Snowflake main login page.
Key Pair Authentication, OAuth & SCIM
Key Pair Authentication
OAuth & SCIM
OAuth: Snowflake supports the OAuth 2.0 protocol.
SCIM: System for Cross-domain Identity Management (SCIM) can be used to manage users and groups (Snowflake roles) in cloud applications using RESTful APIs.
Network Policies
Network policies provide the user with the ability to allow or deny access to their Snowflake account based on a single IP address or a list of addresses.
Figure: the network policy MY_NETWORK_POLICY controlling access to the account us47171.eu-west-2.aws.snowflakecomputing.com.
Account
• Only one network policy can be associated with an account at any one time.
• ALTER ACCOUNT SET NETWORK_POLICY = MYPOLICY;
• The SECURITYADMIN or ACCOUNTADMIN system roles can apply policies, as can a custom role with the ATTACH POLICY global privilege.
• SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN ACCOUNT;
User
• Only one network policy can be associated with a user at any one time.
• ALTER USER USER1 SET NETWORK_POLICY = MYPOLICY;
• The SECURITYADMIN or ACCOUNTADMIN system roles can apply policies, as can a custom role with the ATTACH POLICY global privilege.
• SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER USER1;
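A rough Python sketch of how a client IP could be checked against a network policy's allowed and blocked lists (Snowflake's ALLOWED_IP_LIST and BLOCKED_IP_LIST properties). The function is hypothetical and the addresses are made up; it assumes the blocked list takes precedence when an address matches both lists, which is how Snowflake's documentation describes the interaction, but worth confirming for your account.

```python
import ipaddress
from collections.abc import Iterable


def is_allowed(client_ip: str, allowed_ip_list: Iterable[str],
               blocked_ip_list: Iterable[str] = ()) -> bool:
    """Evaluate a client IP against a network policy's allowed and blocked lists."""
    ip = ipaddress.ip_address(client_ip)

    def matches(entries):
        # Each entry is a single address or a CIDR range, e.g. "192.168.1.0/24".
        return any(ip in ipaddress.ip_network(entry, strict=False) for entry in entries)

    if matches(blocked_ip_list):  # assumption: the blocked list takes precedence
        return False
    return matches(allowed_ip_list)


policy_allowed = ["192.168.1.0/24"]
policy_blocked = ["192.168.1.99"]
print(is_allowed("192.168.1.10", policy_allowed, policy_blocked))  # True
print(is_allowed("192.168.1.99", policy_allowed, policy_blocked))  # False
```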
Data Encryption
E2EE Encryption Flows
Internal stage: PUT ⇒ internal stage ⇒ COPY INTO <table>
External stage: cloud utilities ⇒ external stage ⇒ COPY INTO <table>
Hierarchical Key Model
Figure: root keys ⇒ account master keys ⇒ table master keys ⇒ file keys, with each file key encrypting a data file.
Key Rotation
Key rotation is the practice of transparently replacing existing account and table encryption keys every 30 days with a new key.
Re-Keying
Figure: re-keying TMK v1 Gen1 produces TMK v1 Gen2.
Once a retired key exceeds 1 year, Snowflake automatically creates a new encryption key and re-encrypts all data previously protected by the retired key using the new key.
Figure labels: KMS, HSM, root keys, file keys, data files.
Column Level Security
Dynamic Data Masking
Unauthorized role: SELECT ⇒ Table/View ⇒ Policy ⇒ masked result
ID Email
101 ******@gmail.com
102 ******@gmail.com
103 ******@gmail.com
Authorized role: SELECT ⇒ Table/View ⇒ Policy ⇒ unmasked result
ID Email
101 (unmasked email address)
Masking Policies
ALTER TABLE IF EXISTS EMP_INFO MODIFY COLUMN USER_EMAIL SET MASKING POLICY EMAIL_MASK;
Data masking policies are schema-level objects, like tables & views.
Creating and applying data masking policies can be done independently of object owners.
Masking policies can be nested, existing in tables and views that reference those tables.
A masking policy is applied no matter where the column is referenced in a SQL statement.
A data masking policy can be applied either when the object is created or after the object is created.
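A Python analogue of a role-based email masking policy, applied to every value of the column wherever it is selected. In Snowflake the policy body would be a SQL expression (typically a CASE on CURRENT_ROLE()) inside CREATE MASKING POLICY; here the authorized role ANALYST, the function names and the sample addresses are hypothetical.

```python
AUTHORIZED_ROLES = {"ANALYST"}  # hypothetical role allowed to see raw emails


def email_mask(value: str, current_role: str) -> str:
    """Toy masking policy body for an email column."""
    if current_role in AUTHORIZED_ROLES:
        return value  # authorized: unmasked
    _, _, domain = value.partition("@")
    return "******@" + domain  # unauthorized: local part hidden


def select_emails(rows, current_role):
    # The policy is applied wherever the column is referenced, e.g. in a SELECT.
    return [(row_id, email_mask(email, current_role)) for row_id, email in rows]


emp_info = [(101, "alice@gmail.com"), (102, "bob@gmail.com")]
print(select_emails(emp_info, "PUBLIC"))   # [(101, '******@gmail.com'), (102, '******@gmail.com')]
print(select_emails(emp_info, "ANALYST"))  # [(101, 'alice@gmail.com'), (102, 'bob@gmail.com')]
```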
External Tokenization
Figure: data is stored tokenized in a table/view. When an authorized role runs a SELECT, the masking policy calls an external function, which sends the tokenized values to an external tokenization service through a REST API and returns them detokenized:
ID DOB
101 01/02/1978
102 10/12/1960
103 10/09/2000
Row Level Security
Row Access Policies
Row access policies enable a security team to restrict which rows are returned in a query.
Unauthorized role: SELECT ⇒ Table/View ⇒ Policy ⇒ rows filtered.
Authorized role: SELECT ⇒ Table/View ⇒ Policy ⇒ rows unfiltered.
CREATE OR REPLACE ROW ACCESS POLICY RAP_ID AS (ACC_ID VARCHAR) RETURNS BOOLEAN ->
CASE
WHEN 'ADMIN' = CURRENT_ROLE() THEN TRUE
ELSE FALSE
END;
Similarities with masking policies:
• Schema-level objects
• Segregation of duties
• Creation and applying workflow
• Nesting policies
Adding a masking policy to a column fails if the column is referenced by a row access policy.
Row access policies are evaluated before data masking policies.
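A Python analogue of the RAP_ID policy above, plus a hypothetical variant that actually uses the ACC_ID argument through a role-to-account mapping (a common pattern, but the mapping, role SALES_EU and helper names are invented for illustration). Rows for which the policy returns False simply disappear from the result.

```python
def rap_id(acc_id: str, current_role: str) -> bool:
    """Mirrors RAP_ID: returns TRUE only for the ADMIN role, whatever the ACC_ID."""
    return current_role == "ADMIN"


ROLE_ACCOUNTS = {"SALES_EU": {"ACC-1"}}  # hypothetical mapping table: role -> visible ACC_IDs


def rap_by_mapping(acc_id: str, current_role: str) -> bool:
    """Hypothetical variant that also uses the ACC_ID argument."""
    return current_role == "ADMIN" or acc_id in ROLE_ACCOUNTS.get(current_role, set())


def select_rows(rows, policy, current_role):
    # Rows for which the policy returns False are silently filtered out.
    return [row for row in rows if policy(row["ACC_ID"], current_role)]


rows = [{"ACC_ID": "ACC-1"}, {"ACC_ID": "ACC-2"}]
print(len(select_rows(rows, rap_id, "ADMIN")), len(select_rows(rows, rap_id, "PUBLIC")))  # 2 0
```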
Secure Views
A secure view is created by adding the keyword SECURE in the view DDL:
CREATE OR REPLACE SECURE VIEW MY_SEC_VIEW AS
SELECT COL1, COL2, COL3 FROM MY_TABLE;
The definition of a secure view is only available to the object owner; other roles cannot see it through SHOW VIEWS, GET_DDL(), the Information Schema or Account Usage.
Secure views bypass some query optimizer optimizations that could expose the underlying data, so they can be slower than standard views.
Account Usage and Information Schema
Account Usage
By default, only users with the ACCOUNTADMIN role can access the SNOWFLAKE database.
SELECT * FROM "SNOWFLAKE"."ACCOUNT_USAGE"."TABLES";
Account usage views record dropped objects, not just those that are currently active, e.g.:
TABLE_ID DELETED
4 2022-12-03 09:08:35.765 -0800
There is latency between an event and when that event is recorded in an account usage view (~2 hours).
Certain account usage views provide historical usage metrics. The retention period for these views is 1 year (365 days).
Information Schema
Each database created in an account automatically includes a
built-in, read-only schema named INFORMATION_SCHEMA
based on the SQL-92 ANSI Information Schema.
Account Usage vs. Information Schema
Includes dropped objects: Yes (Account Usage); No (Information Schema).
Latency of data: from 45 minutes to 3 hours, varying by view (Account Usage); none (Information Schema).
What is a Virtual Warehouse?
Virtual Warehouse Overview
Sizing and Billing
Virtual Warehouse Sizes
Virtual warehouse sizes: X-Small, Small, Medium, Large, X-Large, 2X-Large, 3X-Large, 4X-Large, 5X-Large and 6X-Large.
Choosing a size is typically done by experimenting with a representative query of a workload.
Virtual Warehouse Billing
Warehouse size | Credits / hour | Credits / second
X-Small | 1 | 0.0003
Small | 2 | 0.0006
Medium | 4 | 0.0011
Large | 8 | 0.0022
X-Large | 16 | 0.0044
2X-Large | 32 | 0.0089
3X-Large | 64 | 0.0178
4X-Large | 128 | 0.0356
5X-Large | 256 | 0.0711
6X-Large | 512 | 0.1422
Credit Calculation
Credits consumed by an X-Small warehouse (1 credit / hour):
61 Seconds: 0.017
2 Minutes: 0.033
10 Minutes: 0.167
1 Hour: 1.000
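The credit table and per-second examples follow from two rules: each size doubles the hourly rate of the one below it, and a warehouse is billed per second with a 60-second minimum each time it starts. A minimal Python sketch reproducing the X-Small figures above; the function name is hypothetical.

```python
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16,
    "2X-Large": 32, "3X-Large": 64, "4X-Large": 128, "5X-Large": 256, "6X-Large": 512,
}


def credits_for_run(size: str, seconds_running: int) -> float:
    """Per-second billing with a 60-second minimum each time the warehouse starts."""
    billable_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billable_seconds / 3600


for label, seconds in [("61 seconds", 61), ("2 minutes", 120), ("10 minutes", 600), ("1 hour", 3600)]:
    print(label, f"{credits_for_run('X-Small', seconds):.3f}")
```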
Credit Pricing
Virtual Warehouse State and Properties
Virtual Warehouse State
Virtual Warehouse State Properties
AUTO_SUSPEND: specifies the number of seconds of inactivity after which a warehouse is automatically suspended.
AUTO_RESUME: specifies whether to automatically resume a warehouse when a SQL statement is submitted to it.
INITIALLY_SUSPENDED: specifies whether the warehouse is created initially in the 'Suspended' state.
Resource Monitors
FREQUENCY=MONTHLY
START_TIMESTAMP='2023-01-04 00:00 GMT'
TRIGGERS ON 50 PERCENT DO NOTIFY
ON 75 PERCENT DO NOTIFY
ON 95 PERCENT DO SUSPEND
ON 100 PERCENT DO SUSPEND_IMMEDIATE;
Frequency: DAILY, WEEKLY, MONTHLY, YEARLY or NEVER.
Start timestamp determines when a resource monitor will start once applied to a warehouse or account. The frequency is relative to the start timestamp.
Triggers determine the condition for a certain action to take place.
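A minimal Python sketch of how the triggers above fire as credit usage grows. The CREDIT_QUOTA is elided on the slide, so the 100-credit quota is hypothetical, and the reset of usage at each FREQUENCY interval is not modelled. In Snowflake, SUSPEND lets running queries finish while SUSPEND_IMMEDIATE cancels them.

```python
TRIGGERS = [(50, "NOTIFY"), (75, "NOTIFY"), (95, "SUSPEND"), (100, "SUSPEND_IMMEDIATE")]


def fired_triggers(credit_quota: float, credits_used: float, triggers=TRIGGERS) -> list:
    """Return the (threshold, action) pairs whose percentage of the quota has been reached."""
    percent_used = 100 * credits_used / credit_quota
    return [(threshold, action) for threshold, action in triggers if percent_used >= threshold]


# Hypothetical monitor with a 100-credit quota for the current MONTHLY interval.
print(fired_triggers(100, 80))   # [(50, 'NOTIFY'), (75, 'NOTIFY')]
print(fired_triggers(100, 96))   # adds (95, 'SUSPEND')
```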
Scaling Up: Resizing Virtual Warehouses
Scaling Out: Multi-cluster Warehouses
Setting MIN_CLUSTER_COUNT and MAX_CLUSTER_COUNT to the same value puts the multi-cluster warehouse in MAXIMIZED mode.
Standard Scaling Policy
Economy Scaling Policy
Multi-cluster Warehousing Billing
CREATE WAREHOUSE MY_MCW
MIN_CLUSTER_COUNT=1
MAX_CLUSTER_COUNT=3
WAREHOUSE_SIZE='MEDIUM';
The total credit cost of a multi-cluster warehouse is the sum of the credits consumed by each individual running cluster that makes it up.
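A small Python sketch of the billing rule for MY_MCW, using the Medium rate of 4 credits per hour from the billing table. The running times are hypothetical, and the sketch ignores per-second rounding and the 60-second minimum.

```python
MEDIUM_CREDITS_PER_HOUR = 4  # Medium warehouse rate from the billing table


def multi_cluster_credits(cluster_running_hours: list,
                          credits_per_hour: float = MEDIUM_CREDITS_PER_HOUR) -> float:
    """Each running cluster is billed like a standalone warehouse of the same size."""
    return sum(credits_per_hour * hours for hours in cluster_running_hours)


# Hypothetical 2-hour window: cluster 1 runs throughout, clusters 2 and 3 run for 1 hour each.
print(multi_cluster_credits([2, 1, 1]))  # 4 * (2 + 1 + 1) = 16
```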
Concurrency Behaviour Properties
Performance and Tuning Overview
Query Performance Analysis Tools
Users can view other users' queries but cannot view their query results.
SQL Tuning
Database Order Of Execution
FROM (including joins) ⇒ WHERE ⇒ GROUP BY ⇒ HAVING ⇒ SELECT ⇒ ORDER BY ⇒ LIMIT
Join Explosion
Orders
ORDER_DATE | PRODUCT_NAME | CUSTOMER_NAME | ORDER_AMOUNT
01/12/2022 | Apple MacBook Air | Arun | 1
Products
PRODUCT_NAME | PRODUCT_PRICE | ORDER_DATE
Apple MacBook Air | 899.99 | 01/12/2022
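Join explosion happens when the join key is not unique on either side, so each row matches several rows on the other side. A small Python sketch with a nested-loop join over the Orders and Products columns above; the second order (Bea) and the 849.99 price row are hypothetical additions to the slide's data.

```python
def inner_join(left, right, keys):
    """Nested-loop inner join on equality of the given key columns."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if all(left_row[k] == right_row[k] for k in keys)
    ]


orders = [
    {"ORDER_DATE": "01/12/2022", "PRODUCT_NAME": "Apple MacBook Air", "CUSTOMER_NAME": "Arun", "ORDER_AMOUNT": 1},
    {"ORDER_DATE": "02/12/2022", "PRODUCT_NAME": "Apple MacBook Air", "CUSTOMER_NAME": "Bea", "ORDER_AMOUNT": 1},
]
products = [
    {"PRODUCT_NAME": "Apple MacBook Air", "PRODUCT_PRICE": 899.99, "ORDER_DATE": "01/12/2022"},
    {"PRODUCT_NAME": "Apple MacBook Air", "PRODUCT_PRICE": 849.99, "ORDER_DATE": "02/12/2022"},
]
print(len(inner_join(orders, products, ["PRODUCT_NAME"])))                # 4: every order matches every price row
print(len(inner_join(orders, products, ["PRODUCT_NAME", "ORDER_DATE"])))  # 2: one row per order
```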
Limit & Order By
SELECT * FROM CUSTOMER
ORDER BY C_ACCTBAL;

SELECT * FROM CUSTOMER
ORDER BY C_ACCTBAL
LIMIT 10;
Spilling to Disk
Order By Position
Group By
SELECT C_NATIONKEY, COUNT(C_ACCTBAL)
FROM CUSTOMER
GROUP BY C_NATIONKEY; -- Low Cardinality

SELECT C_CUSTKEY, COUNT(C_ACCTBAL)
FROM CUSTOMER
GROUP BY C_CUSTKEY; -- High Cardinality
Caching
Metadata Cache
Snowflake has a high availability metadata store which maintains metadata object information and statistics. It sits in the services layer, so queries such as the following can be answered from metadata without reading remote disk:
SELECT CURRENT_DATABASE();
DESCRIBE TABLE MY_TABLE;
Result Cache
Warehouse Cache
Each virtual warehouse has a local disk cache in the warehouse layer, in front of remote disk in the storage layer.
• The larger the virtual warehouse, the greater the local cache.
• It is purged when the virtual warehouse is resized, suspended or dropped.
• It can be used partially, retrieving the rest of the data required for a query from remote storage.
Materialized Views
Clustering
Natural Clustering
Figure: the same values stored in sorted order (A A B C D E F F G G G G X Y Z Z) versus unsorted order (G F B X C G F Z D A G Z E A Y G).
Clustering
Products.csv
ORDER_ID PRODUCT_ID ORDER_DATE
1 PROD-YVN1VO 2022-06-16
2 PROD-Y5TTKB 2022-06-16
3 PROD-T9ISFR 2022-06-16
4 PROD-HK2USO 2022-06-16
5 PROD-YVN1VO 2022-06-17
6 PROD-BKMWB 2022-06-17
7 PROD-IPM6HU 2022-06-18
8 PROD-IPM6HU 2022-06-18
9 PROD-YVN1VO 2022-06-19
… … …
Clustering
Figure: the rows of Products.csv (ORDER_ID, PRODUCT_ID, ORDER_DATE) divided, in load order, across micro-partitions MP 1, MP 2, MP 3, …
Clustering Metadata
• Number of overlapping micro-partitions
• Depth of overlapping micro-partitions
Clustering Depth
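Three example tables, each with three micro-partitions (value ranges of the clustering column):
• All three micro-partitions span A to Z: 3 overlapping micro-partitions, overlap depth 3.
• A to N, L to S, Q to Z: 3 overlapping micro-partitions, overlap depth 2.
• A to D, E to J, K to Z: 0 overlapping micro-partitions, overlap depth 1.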
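A toy Python computation of the two clustering metrics for the three examples above, given each micro-partition's (min, max) range. Snowflake itself reports averages across a table's micro-partitions through system functions such as SYSTEM$CLUSTERING_INFORMATION; this sketch only reproduces the per-example figures, and the function name is hypothetical.

```python
def overlap_stats(partitions):
    """Given (min, max) ranges of the clustering column per micro-partition,
    return (number of overlapping micro-partitions, overlap depth)."""
    def overlaps(a, b):
        return a[0] <= b[1] and b[0] <= a[1]

    overlapping = sum(
        1 for i, p in enumerate(partitions)
        if any(overlaps(p, q) for j, q in enumerate(partitions) if i != j)
    )
    # Depth: the most micro-partitions covering any single value. The maximum is
    # always reached at some partition's lower bound, so only those are checked.
    depth = max(
        (sum(1 for lo, hi in partitions if lo <= start <= hi) for start, _ in partitions),
        default=0,
    )
    return overlapping, depth


print(overlap_stats([("A", "Z"), ("A", "Z"), ("A", "Z")]))  # (3, 3)
print(overlap_stats([("A", "N"), ("L", "S"), ("Q", "Z")]))  # (3, 2)
print(overlap_stats([("A", "D"), ("E", "J"), ("K", "Z")]))  # (0, 1)
```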
Automatic Clustering
Snowflake supports specifying one or more table columns/expressions as a clustering key for a table.
Clustering aims to co-locate data of the clustering key in the same micro-partitions.
Figure: after clustering on PRODUCT_ID, rows with PRODUCT_ID PROD-YVN1VO that were spread across micro-partitions (MP 1, MP 2) alongside PROD-Y5TTKB and PROD-T9ISFR are co-located in the same micro-partition.
Choosing a Clustering Key
ALTER TABLE T1 CLUSTER BY (C1, C3);
ALTER TABLE T2 CLUSTER BY (SUBSTRING(C2,5,10), MONTH(C1));
Figure: a high-cardinality (HC) column has many distinct values spread across micro-partitions P1, P2 and P3, whereas a low-cardinality (LC) column holds only a few distinct values (F and M) across the same micro-partitions.
Reclustering and Clustering Cost
Search Optimization
Search Optimization Service
Search optimization service is a table level property aimed at
improving the performance of selective point lookup queries.
SELECT NAME, ADDRESS FROM USERS
WHERE USER_EMAIL = 'semper.google.edu';

SELECT NAME, ADDRESS FROM USERS
WHERE USER_ID IN (4,5);
Data Loading Simple Methods
Data Movement
Figure: data reaches a table via a stage, an INSERT statement, or an upload via the UI.
INSERT
INSERT INTO MY_TABLE SELECT '001', 'John Doughnut', '10/10/1976';
Insert a row into a table from the results of a select query.
INSERT INTO MY_TABLE SELECT * FROM MY_TABLE_2;
Another table can be used to insert rows into a table.
Stages
Stages are temporary storage locations for data files used in the
data loading and unloading process.
Stages are either internal or external.
Internal Stages
User stage (@~): automatically allocated when a user is created. Not appropriate if multiple users need access to the stage.
Table stage (@%<table>): automatically allocated when a table is created. The user must have ownership privileges on the table.
Named stage (@<stage>): user-created database object. Supports copy transformations and applying file formats.
External Stages and Storage Integrations
External stages reference data files stored in a location outside of Snowflake.
Storage location can be private or public.
Files are placed in the external location using cloud utilities; ls @MY_STAGE; lists the files in a stage.
Copy options such as ON_ERROR and PURGE can be set on stages.
A storage integration is a reusable and securable Snowflake object which can be applied across stages and is recommended to avoid having to explicitly set sensitive information for each stage definition:
CREATE STORAGE INTEGRATION MY_INT
TYPE=EXTERNAL_STAGE
STORAGE_PROVIDER=S3
STORAGE_AWS_ROLE_ARN='arn:aws:iam::98765:role/MY_ROLE'
ENABLED=TRUE
STORAGE_ALLOWED_LOCATIONS=('s3://MY_BUCKET/PATH/');
Stage Helper Commands
Named and internal table stages can optionally include a database and schema global pointer (e.g. @MY_DB.MY_SCHEMA.MY_STAGE).
Reference metadata columns such as the filename and row number for a staged file (METADATA$FILENAME, METADATA$FILE_ROW_NUMBER).
PUT
PUT cannot be executed from within worksheets.
PUT FILE:///FOLDER/MY_DATA.CSV @~; (macOS / Linux)
Bulk Loading with COPY INTO <table>
Data Movement
COPY INTO <table>
COPY INTO <table>
COPY INTO MY_TABLE FROM @MY_INT_STAGE;
Copy all the contents of a stage into a table.
COPY INTO MY_TABLE FROM @MY_INT_STAGE
FILES=('folder1/file1.csv', 'folder2/file2.csv');
COPY INTO <table> has an option to provide a list of one or more files to copy.
COPY INTO MY_TABLE FROM @MY_INT_STAGE
PATTERN='people/.*[.]csv';
COPY INTO <table> has an option to provide a regular expression to select the files to load.
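To see which files a PATTERN regular expression would pick up, the selection can be simulated locally. This is a minimal sketch, assuming the regex has to match the entire file path string (hence patterns usually start with `.*` or a folder name); the staged paths are hypothetical, and the exact path Snowflake matches against differs between bulk loads and Snowpipe, so check the COPY INTO documentation before relying on it.

```python
import re

# Hypothetical files sitting in @MY_INT_STAGE (not from the slide).
staged_files = [
    "people/2022/jan.csv",
    "people/readme.txt",
    "orders/jan.csv",
    "people/feb.CSV",
]


def files_matching(pattern: str, paths: list[str]) -> list[str]:
    """Return the paths a COPY INTO ... PATTERN='<regex>' would select,
    assuming the regex must match the whole path (re.fullmatch)."""
    compiled = re.compile(pattern)
    return [p for p in paths if compiled.fullmatch(p)]


print(files_matching(r"people/.*[.]csv", staged_files))
```

Note that the match is case-sensitive, so `people/feb.CSV` is not selected by `[.]csv`.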
COPY INTO <table> Load Transformations
COPY External Stage/Location
Copy Options
ON_ERROR (default 'ABORT_STATEMENT'): Value that specifies the error handling for the load operation:
• CONTINUE
• SKIP_FILE
• SKIP_FILE_<num>
• SKIP_FILE_<num>%
• ABORT_STATEMENT
SIZE_LIMIT (default null, no size limit): Number that specifies the maximum size of data loaded by a COPY statement.
PURGE (default FALSE): Boolean that specifies whether to remove the data files from the stage automatically after the data is loaded successfully.
RETURN_FAILED_ONLY (default FALSE): Boolean that specifies whether to return only files that have failed to load in the statement result.
MATCH_BY_COLUMN_NAME (default NONE): String that specifies whether to load semi-structured data into columns in the target table that match corresponding columns represented in the data.
ENFORCE_LENGTH (default TRUE): Boolean that specifies whether to return an error (TRUE) rather than truncate (FALSE) text strings that exceed the target column length; it is the inverse of TRUNCATECOLUMNS.
TRUNCATECOLUMNS (default FALSE): Boolean that specifies whether to truncate text strings that exceed the target column length.
FORCE (default FALSE): Boolean that specifies to load all files, regardless of whether they've been loaded previously and have not changed since they were loaded.
LOAD_UNCERTAIN_FILES (default FALSE): Boolean that specifies to load files for which the load status is unknown. The COPY command skips these files by default.
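The ON_ERROR values differ mainly in what happens to a file once it contains bad rows. The sketch below is a simplified model, assuming each file is summarised as (total rows, error rows); the outcome labels are illustrative rather than Snowflake's exact load statuses, and ABORT_STATEMENT is modelled as an exception because the whole statement fails.

```python
def load_outcomes(files: dict[str, tuple[int, int]], on_error: str) -> dict[str, str]:
    """Simplified model of COPY INTO ... ON_ERROR = <on_error>.

    files maps a file name to (total_rows, error_rows)."""
    prefix = "SKIP_FILE_"
    outcomes = {}
    for name, (total, errors) in files.items():
        # Outcome when the file is not skipped: good rows load, bad rows are dropped.
        kept = "loaded" if errors == 0 else "partially loaded"
        if on_error == "CONTINUE":
            outcomes[name] = kept
        elif on_error == "SKIP_FILE":
            outcomes[name] = "loaded" if errors == 0 else "skipped"
        elif on_error.startswith(prefix) and on_error.endswith("%"):
            limit_pct = float(on_error[len(prefix):-1])
            error_pct = 100.0 * errors / total if total else 0.0
            outcomes[name] = "skipped" if error_pct > limit_pct else kept
        elif on_error.startswith(prefix):
            limit = int(on_error[len(prefix):])
            outcomes[name] = "skipped" if errors >= limit else kept
        elif on_error == "ABORT_STATEMENT":
            if errors:
                # The whole statement fails, so nothing from this COPY is loaded.
                raise RuntimeError(f"COPY aborted: {errors} error row(s) in {name}")
            outcomes[name] = "loaded"
        else:
            raise ValueError(f"unsupported ON_ERROR value: {on_error}")
    return outcomes


files = {"a.csv": (100, 0), "b.csv": (100, 3), "c.csv": (100, 20)}
print(load_outcomes(files, "SKIP_FILE_10"))
```

With SKIP_FILE_10, b.csv (3 bad rows) loads its good rows while c.csv (20 bad rows) is skipped entirely.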
COPY INTO <table> Output
COPY INTO <table> Validation
VALIDATION_MODE: Optional parameter that allows you to perform a dry run of the load process to expose errors. Values:
• RETURN_<n>_ROWS
• RETURN_ERRORS
• RETURN_ALL_ERRORS
VALIDATE: A table function to view all errors encountered during a previous COPY INTO execution. VALIDATE accepts the job ID of a previous query or the last load operation executed ('_last').
File Formats
File Formats
File format options can be set on a named stage or a COPY INTO statement:
CREATE STAGE MY_STAGE
FILE_FORMAT=(TYPE='CSV' SKIP_HEADER=1);
File format objects can be applied to both named stages and COPY INTO statements. If set on both, the COPY INTO statement takes precedence:
CREATE OR REPLACE STAGE MY_STAGE
FILE_FORMAT=MY_CSV_FF;
File Formats
In the file format object, the file format you're expecting to load is set via the 'type' property with one of the following values: CSV, JSON, AVRO, ORC, PARQUET or XML.
Each 'type' has its own set of properties related to parsing that specific file format, e.g. SKIP_HEADER for CSV: the number of lines at the start of the file to skip.
Snowpipe and Loading
Best Practices
Snowpipe
Snowpipe
There are two methods for detecting when a new file has been uploaded to a stage:
• Automating Snowpipe using cloud messaging (external stages only)
• Calling Snowpipe REST endpoints (internal and external stages)
CREATE PIPE MY_PIPE
AUTO_INGEST=TRUE
AS
COPY INTO MY_TABLE
FROM @MY_STAGE
FILE_FORMAT = (TYPE = 'CSV');
The pipe object defines a COPY INTO <table> statement that will execute in response to a file being uploaded to a stage.
Snowpipe: Cloud Messaging
Snowpipe: REST Endpoint
insertFiles REST endpoint
Snowpipe
Bulk Loading vs. Snowpipe
Authentication
• Bulk loading: Relies on the security options supported by the client for authenticating and initiating a user session.
• Snowpipe: When calling the REST endpoints, requires key pair authentication with JSON Web Token (JWT). JWTs are signed using a public/private key pair with RSA encryption.
Load History
• Bulk loading: Stored in the metadata of the target table for 64 days.
• Snowpipe: Stored in the metadata of the pipe for 14 days.
Compute Resources
• Bulk loading: Requires a user-specified warehouse to execute COPY statements.
• Snowpipe: Uses Snowflake-supplied compute resources.
Billing
• Bulk loading: Billed for the amount of time each virtual warehouse is active.
• Snowpipe: Snowflake tracks the resource consumption of loads for all pipes in an account, with per-second/per-core granularity, as Snowpipe actively queues and processes data files. In addition to resource consumption, an overhead is included in the utilization costs charged for Snowpipe: 0.06 credits per 1,000 files notified or listed via event notifications or REST API calls.
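The per-file overhead is easy to estimate up front. A minimal sketch, treating the 0.06 credits per 1,000 files quoted on this slide as a constant (the rate may have changed since); it covers only the file overhead, not the compute used to load the data.

```python
from decimal import Decimal

# Rate as quoted on the slide: 0.06 credits per 1,000 files notified/listed.
OVERHEAD_CREDITS_PER_1000_FILES = Decimal("0.06")


def snowpipe_file_overhead_credits(files_notified: int) -> Decimal:
    """Overhead credits for a number of file notifications (compute billed separately)."""
    return OVERHEAD_CREDITS_PER_1000_FILES * files_notified / 1000


print(snowpipe_file_overhead_credits(250_000))
```

Because the overhead is charged per file, delivering the same data in fewer, larger files reduces this part of the bill.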
Data Loading Best Practices
• 2022/07/10/05/
• 2022/06/01/11/
1
Minute
Data Unloading Overview
Data Unloading
Data Unloading
GET @MY_STAGE file:///folder/files/;
Data Unloading
COPY INTO <location> Examples
COPY INTO <location> Copy Options
SINGLE (default FALSE): Boolean that specifies whether to generate a single file or multiple files.
MAX_FILE_SIZE (default 16777216, i.e. 16 MB): Number (> 0) that specifies the upper size limit (in bytes) of each file to be generated in parallel per thread.
GET
Semi-structured Overview
Semi-structured Data Types
VARIANT
• VARIANT is the universal semi-structured data type in Snowflake, used for loading data in semi-structured data formats.
• Snowflake stores the VARIANT type internally in an efficient compressed columnar binary representation.
• Snowflake extracts as much of the data as possible into a columnar form, based on certain rules.
• The VARIANT data type can hold up to 16MB of compressed data per row.
• A VARIANT column can contain both SQL NULLs and VARIANT nulls; VARIANT nulls are stored as a string containing the word "null".
Semi-structured Data Overview
JSON:
{"widget": {
    "debug": "on",
    "window": {
        "title": "Sample Konfabulator Widget",
        "name": "main_window",
        "width": 500,
        "height": 500
    },
    "image": {
        "src": "Images/Sun.png",
        "name": "sun1",
        "hOffset": [250, 300, 850],
        "alignment": "center"
    },
    "text": {
        "data": "Click Here",
        "size": 36,
        "style": "bold",
        "name": "text1",
        "hOffset": 250,
        "vOffset": 100,
        "alignment": "center",
        "onMouseUp": "sun1.opacity = 90;"
    }
}}
XML:
<widget>
    <debug>on</debug>
    <window title="Sample Konfabulator Widget">
        <name>main_window</name>
        <width>500</width>
        <height>500</height>
    </window>
    <image src="Images/Sun.png" name="sun1">
        <hOffset>250</hOffset>
        <hOffset>300</hOffset>
        <hOffset>850</hOffset>
        <alignment>center</alignment>
    </image>
    <text data="Click Here" size="36" style="bold">
        <name>text1</name>
        <hOffset>250</hOffset>
        <vOffset>100</vOffset>
        <alignment>center</alignment>
        <onMouseUp>
            sun1.opacity = 90;
        </onMouseUp>
    </text>
</widget>
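The same widget data reads quite differently in the two formats: JSON has typed numbers and real arrays, while XML carries strings, attributes and repeated elements. A small sketch using Python's standard-library parsers on a trimmed copy of the sample (only a few fields kept) shows both yielding the same values.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copies of the JSON and XML samples.
JSON_DOC = """{"widget": {
  "window": {"title": "Sample Konfabulator Widget", "width": 500},
  "image": {"src": "Images/Sun.png", "hOffset": [250, 300, 850]}}}"""

XML_DOC = """<widget>
  <window title="Sample Konfabulator Widget"><width>500</width></window>
  <image src="Images/Sun.png" name="sun1">
    <hOffset>250</hOffset><hOffset>300</hOffset><hOffset>850</hOffset>
  </image>
</widget>"""


def from_json(doc: str):
    widget = json.loads(doc)["widget"]
    # Numbers and arrays arrive already typed.
    return widget["window"]["title"], widget["window"]["width"], widget["image"]["hOffset"]


def from_xml(doc: str):
    root = ET.fromstring(doc)                      # the <widget> element
    window = root.find("window")
    title = window.get("title")                    # stored as an attribute
    width = int(window.findtext("width"))          # element text is always a string
    # XML has no array type: the list is a run of repeated <hOffset> elements.
    offsets = [int(e.text) for e in root.find("image").findall("hOffset")]
    return title, width, offsets


print(from_json(JSON_DOC) == from_xml(XML_DOC))
```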
Semi-structured Data Types
ARRAY
Semi-structured Data Types
OBJECT
Semi-structured Data Types
VARIANT
Universal semi-structured data type used to represent arbitrary data structures.
VARIANT data type can hold up to 16MB compressed data per row.
Semi-structured Data Formats
PARQUET: Binary format designed for projects in the Hadoop ecosystem.
XML: Consists primarily of tags <> and elements.
Loading and Unloading
Semi-structured Data
Loading Semi-structured Data
Semi-structured data files → PUT → Stage → COPY INTO → Table
JSON File Format options
DESC FILE FORMAT FF_JSON;
DATE_FORMAT: Used only for loading JSON data into separate columns. Defines the format of date string values in the data files.
TIME_FORMAT: Used only for loading JSON data into separate columns. Defines the format of time string values in the data files.
ALLOW_DUPLICATE: Only used for loading. If TRUE, allows duplicate object field names (only the last one will be preserved).
STRIP_OUTER_ARRAY: Only used for loading. If TRUE, the JSON parser will remove the outer brackets [].
STRIP_NULL_VALUES: Only used for loading. If TRUE, the JSON parser will remove object fields or array elements containing NULL.
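The effect of STRIP_OUTER_ARRAY, STRIP_NULL_VALUES and ALLOW_DUPLICATE on what ends up as rows can be sketched with Python's `json` module. This is a simplified model, assuming duplicate field names are rejected when ALLOW_DUPLICATE is FALSE and that null stripping applies at every nesting level; it is not Snowflake's parser.

```python
import json


def _duplicate_hook(allow_duplicate: bool):
    """object_pairs_hook emulating ALLOW_DUPLICATE."""
    def hook(pairs):
        obj = {}
        for key, value in pairs:
            if key in obj and not allow_duplicate:
                raise ValueError(f"duplicate field name: {key}")
            obj[key] = value  # a later duplicate overwrites the earlier one
        return obj
    return hook


def _strip_nulls(value):
    """Drop object fields and array elements whose value is null (recursively)."""
    if isinstance(value, dict):
        return {k: _strip_nulls(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_strip_nulls(v) for v in value if v is not None]
    return value


def json_rows(text: str, strip_outer_array=False, strip_null_values=False, allow_duplicate=False):
    """Rows (one VARIANT value each) a JSON load would produce under these options."""
    doc = json.loads(text, object_pairs_hook=_duplicate_hook(allow_duplicate))
    # Without STRIP_OUTER_ARRAY the whole array lands in a single row.
    rows = doc if strip_outer_array and isinstance(doc, list) else [doc]
    return [_strip_nulls(r) for r in rows] if strip_null_values else rows


print(json_rows('[{"a": 1, "b": null}, {"a": 2, "b": 3}]', strip_outer_array=True))
```

Splitting the outer array into one row per element also keeps each row well under the 16MB VARIANT limit.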
Semi-structured Data Loading Approaches
Load the raw document into a VARIANT column:
COPY INTO MY_TABLE
FROM @MY_STAGE/FILE1.JSON
FILE_FORMAT = FF_JSON;
Extract elements into separate columns during the load:
COPY INTO MY_TABLE
FROM ( SELECT
$1:name,
$1:age,
$1:dob
FROM @MY_STAGE/FILE1.JSON)
FILE_FORMAT = FF_JSON;
Automatic schema detection (INFER_SCHEMA, MATCH_BY_COLUMN_NAME):
COPY INTO MY_TABLE
FROM @MY_STAGE/FILE1.JSON
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME = CASE_SENSITIVE;
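MATCH_BY_COLUMN_NAME pairs top-level JSON keys with table columns. A minimal sketch of that matching, assuming unmatched columns are left NULL; the record and column names are hypothetical. It highlights a common trap: unquoted Snowflake identifiers are stored in upper case, so CASE_SENSITIVE will not match a lower-case `name` key to a column created as `name`.

```python
def match_by_column_name(record: dict, columns: list[str], mode: str) -> dict:
    """Map the top-level keys of one JSON record onto table columns."""
    if mode == "CASE_SENSITIVE":
        return {col: record.get(col) for col in columns}
    if mode == "CASE_INSENSITIVE":
        lowered = {key.lower(): value for key, value in record.items()}
        return {col: lowered.get(col.lower()) for col in columns}
    raise ValueError("mode must be CASE_SENSITIVE or CASE_INSENSITIVE")


# Hypothetical record; columns as stored for CREATE TABLE MY_TABLE (name ..., age ..., dob ...).
record = {"name": "Aiko Tanaka", "age": 42, "dob": "1980-01-01"}
print(match_by_column_name(record, ["NAME", "AGE", "DOB"], "CASE_SENSITIVE"))
print(match_by_column_name(record, ["NAME", "AGE", "DOB"], "CASE_INSENSITIVE"))
```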
Unloading Semi-structured Data
Accessing Semi-structured Data
Accessing Semi-structured Data
{
  "employee":{
    "name":"Aiko Tanaka",
    "_id":"UNX789544",
    "age":42
  },
  "joined_on":"2019-01-01",
  "skills":["Java", "Kotlin", "Android"],
  "is_manager":true,
  "base_location":null
}
COPY INTO ⇒
CREATE TABLE EMPLOYEE (
src VARIANT
);
Accessing Semi-structured Data
Dot notation (VARIANT column : first-level element . subsequent elements):
SELECT src:employee.name FROM EMPLOYEE;
Bracket notation (VARIANT column [first-level element] [subsequent elements]):
SELECT SRC['employee']['name'] FROM EMPLOYEE;
Array element by index:
SELECT SRC:skills[0] FROM EMPLOYEE;
GET function:
SELECT GET(SRC, 'employee') FROM EMPLOYEE;
SELECT GET(SRC, 'skills')[0] FROM EMPLOYEE;
Key names are case-sensitive:
SELECT SRC:Employee.name FROM EMPLOYEE;
SELECT SRC['Employee']['name'] FROM EMPLOYEE;
The column name is case-insensitive but key names are case-sensitive, so the two queries above do not match the "employee" key and return NULL instead of the employee's name.
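Path access into a VARIANT behaves like walking nested dictionaries and lists. The sketch below models `src:employee.name` and `src:skills[0]` on the EMPLOYEE document, returning None wherever (as we understand it) Snowflake would return NULL: a missing key, a key in the wrong case, or an index out of range. The helper name `get_path` is hypothetical.

```python
import json

EMPLOYEE_SRC = json.loads("""{
  "employee": {"name": "Aiko Tanaka", "_id": "UNX789544", "age": 42},
  "joined_on": "2019-01-01",
  "skills": ["Java", "Kotlin", "Android"],
  "is_manager": true,
  "base_location": null
}""")


def get_path(value, *steps):
    """Follow string keys and integer indexes; None stands in for NULL."""
    for step in steps:
        if isinstance(step, str) and isinstance(value, dict):
            value = value.get(step)                  # keys are case-sensitive
        elif isinstance(step, int) and isinstance(value, list) and 0 <= step < len(value):
            value = value[step]
        else:
            return None
    return value


print(get_path(EMPLOYEE_SRC, "employee", "name"))   # like src:employee.name
print(get_path(EMPLOYEE_SRC, "Employee", "name"))   # wrong case, no match
print(get_path(EMPLOYEE_SRC, "skills", 0))          # like src:skills[0]
```

One limitation of the model: Python's None cannot tell the JSON null in `base_location` apart from a missing key, whereas Snowflake keeps a VARIANT null distinct from SQL NULL.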
Casting Semi-structured Data
Semi-structured Functions
Semi-structured Functions
JSON and XML Parsing: CHECK_JSON
Array/Object Creation and Manipulation: ARRAY_AGG, OBJECT_CONSTRUCT_KEEP_NULL, OBJECT_DELETE, OBJECT_INSERT, OBJECT_PICK
Extraction: FLATTEN
Conversion/Casting: AS_<object>
Type Predicates: IS_<object>
FLATTEN Table Function
FLATTEN is a table function that accepts compound values (VARIANT, OBJECT & ARRAY) and produces a row for each item.
PATH: Specify a path inside the object to flatten.
RECURSIVE: Flattening is performed for all sub-elements recursively.
FLATTEN Output
SEQ: A unique sequence number associated with the input record.
KEY: For maps or objects, this column contains the key to the exploded value.
PATH: The path to the element within a data structure which needs to be flattened.
INDEX: The index of the element, if it is an array; otherwise NULL.
VALUE: The value of the element of the flattened array/object.
THIS: The element being flattened (useful in recursive flattening).
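A generator can mimic FLATTEN's output columns, including the RECURSIVE behaviour. This is a sketch, assuming paths are written as `key`, `parent.key` and `parent[index]`; Snowflake's exact PATH formatting may differ in edge cases.

```python
def flatten(value, path: str = "", recursive: bool = False, seq: int = 1):
    """Yield rows shaped like FLATTEN's SEQ, KEY, PATH, INDEX, VALUE and THIS columns."""
    if isinstance(value, dict):
        items = [(key, None, child) for key, child in value.items()]
    elif isinstance(value, list):
        items = [(None, index, child) for index, child in enumerate(value)]
    else:
        return  # scalars produce no rows
    for key, index, child in items:
        if key is None:
            child_path = f"{path}[{index}]"
        else:
            child_path = f"{path}.{key}" if path else key
        yield {"SEQ": seq, "KEY": key, "PATH": child_path,
               "INDEX": index, "VALUE": child, "THIS": value}
        if recursive:
            yield from flatten(child, child_path, recursive=True, seq=seq)


for row in flatten({"employee": {"name": "Aiko Tanaka"}, "skills": ["Java"]}, recursive=True):
    print(row["PATH"], row["KEY"], row["INDEX"], row["VALUE"])
```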
LATERAL FLATTEN
SELECT
src:employee.name,
src:employee._id,
f.value
FROM EMPLOYEE e,
LATERAL FLATTEN(INPUT => e.src:skills) f;
Source row: Aiko Tanaka, UNX789544, ["Java", "Kotlin", "Android"]
Result:
Aiko Tanaka, UNX789544, "Java"
Aiko Tanaka, UNX789544, "Kotlin"
Aiko Tanaka, UNX789544, "Android"
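LATERAL lets FLATTEN see each row of EMPLOYEE in turn, so the result is every row joined to the elements of its own array. A minimal sketch of that join in plain Python, with the VARIANT column already parsed into dictionaries; as with FLATTEN's default (OUTER => FALSE), a row whose array is empty produces no output.

```python
EMPLOYEES = [
    {"src": {"employee": {"name": "Aiko Tanaka", "_id": "UNX789544"},
             "skills": ["Java", "Kotlin", "Android"]}},
]


def lateral_flatten_skills(rows):
    """FROM EMPLOYEE e, LATERAL FLATTEN(INPUT => e.src:skills) f"""
    result = []
    for e in rows:
        employee = e["src"]["employee"]
        for value in e["src"].get("skills") or []:   # one output row per array element
            result.append((employee["name"], employee["_id"], value))
    return result


for row in lateral_flatten_skills(EMPLOYEES):
    print(row)
```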
Summary of
Snowflake Functions
Supported Function Types
Scalar Functions
A scalar function is a function that returns one value per invocation; these are mostly used for returning one value per row.
SELECT UUID_STRING();
Categories: Bitwise Expression, Semi-structured Data, String & Binary, Conditional Expression, Regular Expressions, Context, Hash, Conversion, Metadata.
Aggregate Functions
Window Functions
Output:
-----------------------------------------
|“ACCOUNT_ID” |“AMOUNT” |“MAX(AMOUNT)” |
-----------------------------------------
|001 |10.00 |23.78 |
|001 |23.78 |23.78 |
|002 |67.78 |67.78 |
-----------------------------------------
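The slide shows only the output, which looks like a query such as `SELECT ACCOUNT_ID, AMOUNT, MAX(AMOUNT) OVER (PARTITION BY ACCOUNT_ID) FROM ACCOUNT;` (the table name is an assumption). A two-pass sketch reproduces it and shows the key difference from an aggregate with GROUP BY: every input row is kept.

```python
from decimal import Decimal

ROWS = [("001", Decimal("10.00")), ("001", Decimal("23.78")), ("002", Decimal("67.78"))]


def max_over_partition(rows):
    """Emulate MAX(AMOUNT) OVER (PARTITION BY ACCOUNT_ID)."""
    partition_max = {}
    for account_id, amount in rows:            # pass 1: maximum per partition
        current = partition_max.get(account_id)
        partition_max[account_id] = amount if current is None else max(current, amount)
    # pass 2: every input row survives and gains its partition's maximum
    return [(account_id, amount, partition_max[account_id]) for account_id, amount in rows]


for row in max_over_partition(ROWS):
    print(row)
```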
Table Functions
Table functions return a set of rows for each
input row. The returned set can contain zero,
one, or more rows. Each row can contain one or
more columns.
Output:
--------------------------------------------
|“RANDSTR(5,RANDOM())” |“RANDOM()” |
--------------------------------------------
|My4FU |574440610751796211 |
|YiPSS |1779357660907745898|
|cu2Hw |6562320827285185330|
--------------------------------------------
System Functions
SELECT system$cancel_query('01a65819-0000-2547-0000-94850008c1ee');
Output:
---------------------------------------------------------------
|“SYSTEM$CANCEL_QUERY('01A65819-0000-2547-0000-94850008C1EE')”|
---------------------------------------------------------------
|query [01a65819-0000-2547-0000-94850008c1ee] terminated. |
---------------------------------------------------------------
System Functions
SELECT system$pipe_status('my_pipe');
Output:
----------------------------------------------------
|“SYSTEM$PIPE_STATUS('MY_PIPE')”                    |
---------------------------------------------------
|{"executionState":"RUNNING","pendingFileCount":0} |
----------------------------------------------------
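Because SYSTEM$PIPE_STATUS returns a JSON string, a monitoring script can parse it directly. A minimal sketch reading the two fields shown in the output above; treating anything other than RUNNING as unhealthy is an assumption, and the status object can contain further fields not used here.

```python
import json


def pipe_health(status_json: str) -> tuple[bool, int]:
    """Return (is_running, pending_file_count) from SYSTEM$PIPE_STATUS output."""
    status = json.loads(status_json)
    return status.get("executionState") == "RUNNING", status.get("pendingFileCount", 0)


print(pipe_health('{"executionState":"RUNNING","pendingFileCount":0}'))
```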
System Functions
Output:
-----------------------------------------------------------
|“SYSTEM$EXPLAIN_PLAN_JSON('SELECT AMOUNT FROM ACCOUNT')” |
-----------------------------------------------------------
|{ |
|"GlobalStats": { |
| "partitionsTotal": 1, |
| "partitionsAssigned": 1, |
| "bytesAssigned": 1024 |
| }[…] |
-----------------------------------------------------------
Estimation
Functions
Estimation Functions
Cardinality: Estimate the number of distinct values.
Similarity: Estimate the similarity of two or more sets.
Frequency: Estimate the frequency of values in a set.
Percentile: Estimate the percentile of values in a set.
Cardinality Estimation
Snowflake uses the HyperLogLog cardinality estimation algorithm, which returns an approximation of the number of distinct values in a column.
SELECT APPROX_COUNT_DISTINCT(L_ORDERKEY) FROM LINEITEM;
Output: 1,491,111,415
Execution Time: 44 Seconds
SELECT COUNT(DISTINCT L_ORDERKEY) FROM LINEITEM;
Output: 1,500,000,000
Execution Time: 4 Minutes 20 Seconds
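The speed-up comes from never storing the distinct values themselves: HyperLogLog keeps a fixed array of small registers. Below is a textbook HyperLogLog sketch (not Snowflake's internal implementation), assuming a 64-bit BLAKE2b hash and 2^12 registers, which gives a typical error of roughly 1.6%.

```python
import hashlib
import math


def _hash64(value) -> int:
    """Stable 64-bit hash of the value's string form."""
    digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class HyperLogLog:
    def __init__(self, p: int = 12):
        self.p = p                      # 2**p registers; more registers, lower error
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value) -> None:
        x = _hash64(value)
        idx = x >> (64 - self.p)                       # first p bits choose a register
        rest = x & ((1 << (64 - self.p)) - 1)          # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)             # small-range correction
        return raw


hll = HyperLogLog()
for i in range(100_000):
    hll.add(i % 50_000)      # 50,000 distinct values, each seen twice
print(round(hll.estimate()))
```

Re-adding a value never changes the registers, which is why duplicates cost nothing and memory stays at 4,096 small integers regardless of table size.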
Similarity Estimation
Output:
-------------------------
|“MINHASH(5, C_CUSTKEY)”|
-------------------------
|{                      |
|  "state": [           |
|    557181968304,      |
|    67530801241,       |
|    1909814111197,     |
|    8406483771,        |
|    34962958513        |
|  ],                   |
|  "type": "minhash",   |
|  "version": 1         |
|}                      |
-------------------------
Similarity Estimation
Output:
-------------------------------
|“APPROXIMATE_SIMILARITY(MH)” |
-------------------------------
|0.8                          |
-------------------------------
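MINHASH builds a compact signature per set, and APPROXIMATE_SIMILARITY compares signatures to estimate Jaccard similarity. The sketch below is the textbook MinHash construction, assuming k seeded BLAKE2b hashes and similarity computed as the share of agreeing signature positions; Snowflake's internals may differ. If Snowflake compares signatures the same way, then with k = 5, as in MINHASH(5, C_CUSTKEY), the estimate can only be a multiple of 0.2, consistent with the 0.8 shown.

```python
import hashlib


def _seeded_hash(seed: int, value) -> int:
    """One member of a family of hash functions, selected by seed."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(values, k: int) -> list[int]:
    """Signature: for each of k hash functions, the smallest hash over the set."""
    items = set(values)
    return [min(_seeded_hash(seed, v) for v in items) for seed in range(k)]


def approximate_similarity(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of positions where the signatures agree, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same k")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


a, b = range(0, 500), range(50, 550)   # exact Jaccard = 450 / 550, about 0.82
print(approximate_similarity(minhash(a, 128), minhash(b, 128)))
```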
Frequency Estimation
APPROX_TOP_K APPROX_TOP_K_ACCUMULATE
APPROX_TOP_K_COMBINE APPROX_TOP_K_ESTIMATE
Output:
----------------------------------------
|“APPROX_TOP_K(P_SIZE, 3, 100000)” |
----------------------------------------
|[[13,401087],[38,401074],[35,401033]] |
----------------------------------------
Frequency Estimation
APPROX_TOP_K
Output:
--------------------
|“P_SIZE” | “C” |
--------------------
|13 |401,087 |
|38 |401,074 |
|35 |401,033 |
--------------------
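The third argument of APPROX_TOP_K(P_SIZE, 3, 100000) is the number of counters the estimate may use. A common algorithm for this is Space-Saving, which keeps a fixed number of counters and, when full, evicts the smallest one. The sketch below is that textbook algorithm, assuming it reflects the general approach rather than Snowflake's exact code; counts can be overestimated but never underestimated.

```python
class SpaceSaving:
    def __init__(self, counters: int):
        self.capacity = counters
        self.counts: dict = {}

    def add(self, item) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            self.counts[item] = floor + 1

    def top_k(self, k: int) -> list:
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [[item, count] for item, count in ranked[:k]]


sketch = SpaceSaving(counters=20)
stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"u{i}" for i in range(100)]
for value in stream:
    sketch.add(value)
print(sketch.top_k(3))
```

The output has the same `[[value, count], ...]` shape as the APPROX_TOP_K result above.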
Percentile Estimation
APPROX_PERCENTILE APPROX_PERCENTILE_ACCUMULATE
APPROX_PERCENTILE_COMBINE APPROX_PERCENTILE_ESTIMATE
Output:
---------------------------------
|“APPROX_PERCENTILE(score,0.8)” |
---------------------------------
|74 |
---------------------------------
Table Sampling
Table Sampling
Fraction-based
BERNOULLI/ROW sampling (the default method):
SELECT * FROM LINEITEM SAMPLE (50);
SELECT * FROM LINEITEM SAMPLE BERNOULLI (50);
SYSTEM/BLOCK sampling:
SELECT * FROM LINEITEM SAMPLE SYSTEM (50);
Table Sampling
Fixed-size
Output:
-------------------------
|“L_TAX” | “L_SHIPMODE” |
-------------------------
|0.02    |REG AIR       |
|0.02    |TRUCK         |
|0.06    |TRUCK         |
-------------------------
Fixed-size sampling does not support block sampling or the use of a seed. Adding either will result in an error.
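The three sampling styles differ in what is kept or dropped. A sketch with Python's `random` module, assuming blocks stand in for micro-partitions and that fixed-size sampling returns every row when the table is smaller than requested: BERNOULLI decides per row, SYSTEM decides per block (faster but lumpier), and fixed-size returns an exact row count.

```python
import random


def bernoulli_sample(rows, percent, rng):
    """BERNOULLI/ROW: each row is kept independently with probability percent/100."""
    return [row for row in rows if rng.random() < percent / 100]


def system_sample(blocks, percent, rng):
    """SYSTEM/BLOCK: each block is kept or dropped as a whole."""
    kept = []
    for block in blocks:
        if rng.random() < percent / 100:
            kept.extend(block)
    return kept


def fixed_size_sample(rows, n, rng):
    """Fixed-size: exactly n rows, or every row if the table is smaller."""
    return rng.sample(rows, min(n, len(rows)))


rng = random.Random(7)
rows = list(range(10_000))
blocks = [rows[i:i + 100] for i in range(0, len(rows), 100)]
print(len(bernoulli_sample(rows, 50, rng)), len(system_sample(blocks, 50, rng)))
print(fixed_size_sample(rows, 3, rng))
```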
Unstructured Data
File Functions
Unstructured Data
Unstructured data: multi-media, documents.
File Functions
Unstructured data files in a stage can be accessed through URLs generated by:
• BUILD_SCOPED_FILE_URL
• BUILD_STAGE_FILE_URL
• GET_PRESIGNED_URL
BUILD_SCOPED_FILE_URL
build_scoped_file_url( @<stage_name> , '<relative_file_path>' )
URL valid for 24 hours.
Output:
-----------------------------------------------------------------------
|“BUILD_SCOPED_FILE_URL(@images_stage, 'prod_z1c.jpg')”                |
-----------------------------------------------------------------------
|https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/      |
|01a691df-0000-277e-0000-9485000bc022/163298951696390/5fGgfDJX6kvA |
|qZx6tUJNjWDXEu%2f8%2b7a%2fqQ5HFPCKKMs81o1MC5NSLKPzC6p2hy670VChIC7o |
|Po2JwrY8%2fAQ13fVjwXtxs4OUf76eUDVH7G1UzOf5ugveSR6qAQF60EV7y2F9e9cn |
|RWHBMncTyGuyCxd4gxtVSyXRQuQ7s2qBsh6%2bt0Yj4LNsOhjQFmD3EPgfGQ7P81gY |
|z2p%2fFyRcFX4V |
-----------------------------------------------------------------------
When this function is called in a query, the role must have USAGE privileges on an external named stage and READ privileges on an internal named stage.
BUILD_SCOPED_FILE_URL
Output:
-----------------------------------------------------------------------
|SCOPED_FILE_URL |
-----------------------------------------------------------------------
|https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/      |
|01a691df-0000-277e-0000-9485000bc022/163298951696390/5fGgfDJX6kvA |
|qZx6tUJNjWDXEu%2f8%2b7a%2fqQ5HFPCKKMs81o1MC5NSLKPzC6p2hy670VChIC7[…] |
-----------------------------------------------------------------------
BUILD_STAGE_FILE_URL
Output:
---------------------------------------------------------------------------------------------------------------
| “BUILD_STAGE_FILE_URL(@images_stage, 'prod_z1c.jpg')”                                                        |
---------------------------------------------------------------------------------------------------------------
|https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/DEMO_DB/DEMO_SCHEMA/IMAGES_STAGE/prod_z1c.jpg |
---------------------------------------------------------------------------------------------------------------
Calling this function whether it’s part of a query, UDF, Stored Procedure or
View requires privileges on the underlying stage, that is USAGE for external
stages and READ for internal stages.
GET_PRESIGNED_URL
Output:
-----------------------------------------------------------------------------------------
|GET_PRESIGNED_URL(@images_stage, 'prod_z1c.jpg', 600)                                   |
-----------------------------------------------------------------------------------------
|https://sfc-uk-ds1-6-customer-stage.s3.eu-west-2.amazonaws.com/vml0-s-                 |
|ukst8973/stages/42763db5-31fa-47ef-bdb1-729d81b645c2/tree.jpg?X-Amz-Algorithm=AWS4- |
|HMAC-SHA256&X-Amz-Date=20220827T162316Z&X-Amz-SignedHeaders=host&X-Amz-Expires=599&X- |
|Amz-Credential=AKIA4ANG2XQCHAHQKBKS%2F20220827%2Feu-west-2%2Fs3%2Faws4_request&X-Amz- |
|Signature=0e8edf89acc24d9bd23b5387f8938671f54212be1c50c1bfe26c974ea151af55 |
-----------------------------------------------------------------------------------------
Calling this function whether it’s part of a query, UDF, Stored Procedure or
View requires privileges on the underlying stage, that is USAGE for external
stages and READ for internal stages.
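The presigned URL above carries its own validity window in the AWS Signature V4 query parameters: `X-Amz-Date` is the signing time and `X-Amz-Expires` the lifetime in seconds (599 here, for the 600 requested). A sketch that reads them with the standard library; the URL is a shortened copy of the one above with the middle of the path removed.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Expiry of an S3 SigV4 presigned URL: X-Amz-Date plus X-Amz-Expires seconds."""
    params = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))


# Shortened copy of the URL in the output above.
URL = ("https://sfc-uk-ds1-6-customer-stage.s3.eu-west-2.amazonaws.com/tree.jpg"
       "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20220827T162316Z"
       "&X-Amz-SignedHeaders=host&X-Amz-Expires=599")
print(presigned_url_expiry(URL))
```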
GET_PRESIGNED_URL
@DOCUMENTS_STAGE
@images_stage/document.pdf
@images_stage/document_metadata.json
{
"relative_path": "document.pdf",
"author": "Corado Fernandez",
"published_on": "2022-01-23",
"topics":[
"nutrition",
"health",
"science"
]
}
GET_PRESIGNED_URL
GET_PRESIGNED_URL
Output:
------------------------------------------------------------------------------------------------------------
|AUTHOR |PUBLISHED_ON |PRESIGNED_URL |TOPICS |
------------------------------------------------------------------------------------------------------------
|Corado Fernandez |2022-01-23   |https://sfc-uk-ds1-6-customer-stage.[…] |["nutrition","health","science"] |
------------------------------------------------------------------------------------------------------------
Directory Tables
Directory Tables
Internal External
--------------------------------------------------------------------------------------------------------------
|RELATIVE_PATH |SIZE |LAST_MODIFIED |MD5 |ETAG |FILE_URL |
--------------------------------------------------------------------------------------------------------------
|document.pdf  |250,838 |55:42.0       |ba247312[…] |f76b4327[…] |https://go44755.eu-west-2.aws.snowflake[…] |
--------------------------------------------------------------------------------------------------------------
Refreshing Directory Tables
Internal
External
Refreshing Directory Tables
File Support REST API
File Support REST API
GET /api/files
BUILD_SCOPED_FILE_URL(); BUILD_STAGE_FILE_URL();
File Support REST API
import os
import requests
# OAuth access token for the Snowflake account (hypothetical environment variable name).
token = os.environ["SNOWFLAKE_OAUTH_TOKEN"]
response = requests.get("https://go44755.eu-west-2.aws.snowflakecomputing.com/api/files/DB/SCHEMA/STAGE/img.jpg",
    headers={
        "User-Agent": "reg-tests",
        "Accept": "*/*",
        "X-Snowflake-Authorization-Token-Type": "OAUTH",
        "Authorization": f"Bearer {token}",
    },
    allow_redirects=True)
print(response.status_code)
print(response.content)
Storage Layer Overview
Storage Summary
CSV JSON
ORC Parquet
Avro XML
P1
P2
P3
Micro-partitions
Micro-partitions
Micro-partition 1
Micro-partitions
Micro-partition 1
Snowflake partitions along the natural ordering of the input data
as it is inserted/loaded.
Micro-partition Metadata
MIN: 1 MAX: 3
Micro-partition Pruning
Micro-partition metadata allows Snowflake to optimize a query by first checking the min-max metadata of a column and discarding micro-partitions from the query plan that are not required.
SELECT ORDER_ID, ITEM_ID
FROM MY_CSV_TABLE
WHERE ORDER_ID > 360 AND ORDER_ID < 460;
MY_CSV_TABLE (ORDER_ID range per micro-partition):
Micro-partition 1: 001-100
Micro-partition 2: 101-200
Micro-partition 3: 201-300
Micro-partition 4: 301-400
Micro-partition 5: 401-500
Micro-partition 6: 501-600
Only micro-partitions 4 and 5 can contain matching rows; the other four are pruned.
The metadata is typically considerably smaller than the actual data, speeding up query time.
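The pruning decision only needs each micro-partition's min/max metadata, never the rows. A sketch of that check for the query above, using the six ORDER_ID ranges from the slide: a partition is kept only if its [min, max] range overlaps the open interval 360 < ORDER_ID < 460.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    name: str
    min_order_id: int   # per-column min/max held in the metadata
    max_order_id: int


# MY_CSV_TABLE's six micro-partitions: 001-100, 101-200, ..., 501-600.
PARTITIONS = [MicroPartition(f"Micro-partition {i}", (i - 1) * 100 + 1, i * 100) for i in range(1, 7)]


def prune(partitions, lower_exclusive: int, upper_exclusive: int):
    """Keep partitions whose range could satisfy lower < ORDER_ID < upper; the rest are never read."""
    return [p for p in partitions
            if p.max_order_id > lower_exclusive and p.min_order_id < upper_exclusive]


print([p.name for p in prune(PARTITIONS, 360, 460)])
```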
Time Travel & Fail-Safe
Data Lifecycle
Time Travel
Time Travel enables users to analyse historical data by querying it at points in the past:
SELECT * FROM MY_TABLE AT(TIMESTAMP => TO_TIMESTAMP('2021-01-01'));
Time Travel enables users to create clones of objects from a point in the past:
CREATE TABLE MY_TABLE_CLONE CLONE MY_TABLE
AT (TIMESTAMP => TO_TIMESTAMP('2021-01-01'));
Time Travel Retention Period
The Time Travel retention period is configured with the parameter DATA_RETENTION_TIME_IN_DAYS:
ALTER DATABASE MY_DB
SET DATA_RETENTION_TIME_IN_DAYS=90;
The default retention period on the account, database, schema and table level is 1 day.
Maximum retention period for permanent objects:
• Standard Edition: 0 or 1 day
• Enterprise Edition (and higher): 0 to 90 days
Temporary and transient objects can have a max retention period of 1 day (0 or 1) across all editions.
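The retention limits amount to a small lookup by edition and object type. A sketch of a validator for DATA_RETENTION_TIME_IN_DAYS, assuming the edition and table-type labels below (they are illustrative keys, not Snowflake identifiers).

```python
# Maximum retention for permanent objects, by edition.
PERMANENT_MAX_DAYS = {"STANDARD": 1, "ENTERPRISE": 90, "BUSINESS_CRITICAL": 90, "VPS": 90}


def max_retention_days(edition: str, table_type: str = "PERMANENT") -> int:
    if table_type in ("TEMPORARY", "TRANSIENT"):
        return 1                                  # same cap in every edition
    if table_type != "PERMANENT":
        raise ValueError(f"unknown table type: {table_type}")
    return PERMANENT_MAX_DAYS[edition]


def check_retention(days: int, edition: str, table_type: str = "PERMANENT") -> int:
    limit = max_retention_days(edition, table_type)
    if not 0 <= days <= limit:
        raise ValueError(f"DATA_RETENTION_TIME_IN_DAYS must be 0-{limit} for {table_type} on {edition}")
    return days


print(check_retention(90, "ENTERPRISE"))
```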
Accessing Data In Time Travel
AT: The AT keyword allows you to capture historical data inclusive of all changes made by a statement or transaction up until that point. Three parameters are available to specify a point in the past:
• TIMESTAMP
• OFFSET
• STATEMENT
BEFORE: The BEFORE keyword allows you to select historical data from a table up to, but not including, any changes made by a specified statement or transaction. One parameter is available to specify a point in the past:
• STATEMENT
UNDROP: The UNDROP keyword can be used to restore the most recent version of a dropped table, schema or database. If an object of the same name already exists, an error is returned. To view dropped objects you can use:
SHOW TABLES HISTORY;
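The only difference between AT and BEFORE for a given statement is whether that statement's own changes are included. A sketch over a hypothetical change history (statement IDs q1 to q3 are made up, and the table is assumed to start empty).

```python
def state_at(history, statement_id, before=False):
    """history: ordered (statement_id, rows_after_statement) pairs, oldest first.

    AT returns the table including the named statement's changes;
    BEFORE returns it as it was just prior to that statement."""
    state = frozenset()                     # assume the table started empty
    for sid, rows_after in history:
        if sid == statement_id:
            return state if before else frozenset(rows_after)
        state = frozenset(rows_after)
    raise KeyError(f"statement {statement_id} not in history")


# Hypothetical history: q1 inserts A, q2 inserts B, q3 deletes A.
HISTORY = [("q1", {"A"}), ("q2", {"A", "B"}), ("q3", {"B"})]
print(sorted(state_at(HISTORY, "q2")))               # AT(STATEMENT => 'q2')
print(sorted(state_at(HISTORY, "q2", before=True)))  # BEFORE(STATEMENT => 'q2')
```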
Fail-safe
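Time Travel retention (0 to 1 day, or up to 90 days on Enterprise Edition) is followed by a 7-day Fail-safe period, during which data can be recovered only by Snowflake.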
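Putting the two periods together gives the recovery timeline for a change or drop. A sketch computing when data leaves Time Travel and when it leaves Fail-safe, assuming a fixed 7-day Fail-safe for permanent tables and none for transient or temporary tables.

```python
from datetime import datetime, timedelta

FAIL_SAFE_DAYS = 7


def recovery_windows(changed_at: datetime, retention_days: int, table_type: str = "PERMANENT"):
    """Return (time_travel_end, fail_safe_end) for data changed or dropped at changed_at."""
    time_travel_end = changed_at + timedelta(days=retention_days)
    # Transient and temporary tables have no Fail-safe period.
    fail_safe_days = FAIL_SAFE_DAYS if table_type == "PERMANENT" else 0
    return time_travel_end, time_travel_end + timedelta(days=fail_safe_days)


print(recovery_windows(datetime(2022, 1, 1), 1))
```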
Cloning
Cloning
Objects that can be cloned include:
• TABLES
• STREAMS
• STAGES
• FILE FORMATS
• SEQUENCES
• TASKS
• PIPES (reference external stage only)
CREATE STAGE MY_EXT_STAGE
URL='S3://RAW/FILES/'
CREDENTIALS=();
CREATE STAGE MY_EXT_STAGE_CLONE CLONE MY_EXT_STAGE;
Zero-copy Cloning
(Diagram: MY_TABLE and MY_CLONE initially reference the same micro-partitions, P1 to P5.)
Changes made after the point of cloning then start to create additional micro-partitions.
Changes made to the source or the clone are not reflected in each other; they are completely independent.
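Zero-copy cloning works because micro-partitions are immutable: a clone copies only the list of partition references, and a later change writes a new partition for the table being changed. A simplified sketch of that copy-on-write behaviour, with partitions represented by their labels.

```python
class Table:
    """A table as an ordered list of references to immutable micro-partitions."""

    def __init__(self, partitions):
        self.partitions = list(partitions)

    def clone(self) -> "Table":
        # Zero-copy: the clone gets its own list of references; no data is copied.
        return Table(self.partitions)

    def rewrite(self, old_partition, new_partition) -> None:
        # Partitions are never modified in place: a change writes a new partition
        # and repoints only this table's reference.
        self.partitions = [new_partition if p == old_partition else p for p in self.partitions]


my_table = Table(["P1", "P2", "P3", "P4", "P5"])
my_clone = my_table.clone()
my_table.rewrite("P3", "P6")
print(my_table.partitions, my_clone.partitions)
```

After the update the two tables still share P1, P2, P4 and P5, so only the changed data consumes extra storage.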
Cloning Rules
A cloned table does not contain the load history of the source table.
Replication
Replication
Replication is a feature in Snowflake which enables replicating databases between Snowflake accounts within an organization.
Replication
(Diagram: in organization ORG_1, a database in account1.eu-west-2.aws is replicated to account2.eu-central-1.aws as DB_1_REPLICA.)
External Tables, Stages, Pipes, Streams and Tasks are not currently replicated.
Only databases and some of their child objects can be replicated, not users, roles, warehouses, resource monitors or shares.
Billing for database replication is based on data transfer and compute resources.
Storage Billing
Storage Billing
The monthly cost of storing data in Snowflake is based on a flat rate per terabyte, e.g. $42 per terabyte per month for customers deployed in AWS – EU (London).
Data Storage Pricing
Data Storage Billing Scenarios
If an average of 15TB is stored during a month on an account deployed in the AWS Europe (London) region, it will cost $630.00 (Nov 2021).
If an average of 14TB is stored during a month on an account deployed in the AWS AP Mumbai region, it will cost $644.00 (Nov 2021).
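Both scenarios are average terabytes multiplied by a regional flat rate. A sketch of the calculation, assuming the monthly average is a simple mean of daily storage figures; the London rate is the $42 quoted on the slide, and the Mumbai rate of $46 is derived from $644.00 / 14 TB. Region keys are illustrative labels.

```python
from decimal import Decimal

# USD per TB per month (Nov 2021): London as quoted; Mumbai derived as 644.00 / 14.
RATE_PER_TB_MONTH = {"aws-eu-london": Decimal("42"), "aws-ap-mumbai": Decimal("46")}


def average_tb(daily_tb: list[Decimal]) -> Decimal:
    """Mean storage over the days of the month."""
    return sum(daily_tb, Decimal(0)) / len(daily_tb)


def monthly_storage_cost(avg_tb: Decimal, region: str) -> Decimal:
    return (avg_tb * RATE_PER_TB_MONTH[region]).quantize(Decimal("0.01"))


print(monthly_storage_cost(Decimal("15"), "aws-eu-london"))
print(monthly_storage_cost(Decimal("14"), "aws-ap-mumbai"))
```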
Storage Monitoring
Data Sharing
Secure Data Sharing
(Diagram: MY_DATABASE.MY_TABLE in Account A is shared with Account B, where it appears as MY_SHARED_DATABASE.MY_TABLE.)
Secure Data Sharing
An account can share the following database objects:
• Tables
• External tables
• Secure views
• Secure materialized views
• Secure UDFs
Data consumers create a database from a SHARE which contains the read-only database objects granted by the data provider.
(Diagram: the data provider adds ACCOUNT B and ACCOUNT C to a share; each data consumer creates a database from it.)
Secure Data Sharing Example
SHOW SHARES;
-- Create database from share (IMPORT SHARE privilege required)
CREATE DATABASE SHARED_DATABASE FROM SHARE GYU889T.MY_SHARE;
(Diagram: Account A grants database and schema privileges to a share, and Account B creates a database from it.)
Database objects added to a share become immediately available to all consumers.
Data Provider
Access to a share or database objects in a share can be revoked at any time.
A share can only be granted to accounts in the same region and cloud provider as the data provider account; to share with an account elsewhere, the database is first replicated to an account in that region.
Data Consumer
Only one database can be created per share. A second import fails with an error such as:
Importing more than once is not supported. Database is already imported as 'SNOWFLAKE_SAMPLE_DATA'.
Data Consumer
shared database.
objects.
Reader Account
A reader account can only consume data from the provider account that created it.
Provider accounts assume responsibility for the reader accounts they create and are billed for their usage.
Reader Account
(Diagram: the provider grants schema and table privileges to a share and adds the reader account X88TYUS.)
-- IMPORT SHARE privilege required
CREATE DATABASE SHARED_DATABASE FROM SHARE GYU889T.MY_SHARE;
Data Consumer ⇒ Third-party datasets are immediately available to query from the account without transformation.
Data Provider ⇒ Via the Provider Studio, data providers can list standard or personalised shares.
Video 22: Data Exchange
Data Exchange