snowflake_notes
1. OVERVIEW
- Data Sharing – Direct Share:
o Objects included in a share:
Tables (temporary tables cannot be shared; unsure whether transient tables can).
External tables.
Dynamic tables.
Secure views.
Secure materialized views.
Secure UDFs.
o Each share consists of:
Privileges that grant access to the DB and the schema containing the objects
shared. At least USAGE.
Privileges that grant access to the specific objects in the DB. At least SELECT.
The list of consumer accounts.
o Types of Secure Data Sharing:
Direct Shares (seen as inbound shares by the consumer): objects shared directly with
another account in the same region.
Listings: objects + metadata.
Data Exchange: group created by the provider with different consumers.
o Reader account: created and paid for by the provider, so a consumer without its own Snowflake account can query the shared data.
o Cannot share a share.
o ACCOUNTADMIN, or a role with the IMPORT SHARE privilege, must create a DB from the share
before its objects can be queried.
o GRANT IMPORTED PRIVILEGES / REVOKE IMPORTED PRIVILEGES on that DB controls which other roles can use the shared objects.
o Every account has two inbound shares: ACCOUNT_USAGE and SAMPLE_DATA.
o A secure view can combine data from different DBs and then be shared with a consumer account.
o Cross-region sharing requires data replication. You replicate once per region, the number
of consumers in the region doesn’t matter.
o An object added by the data provider is instantly accessible by the consumer.
o You can’t create a table in a shared DB. Read-only DBs.
o Actions performed by the consumer on a share:
Query tables and join them with existing tables of their account.
Copy shared data into another table in their account.
Time travel NOT available.
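o Example of the provider/consumer flow (a minimal sketch; sales_db, sales_share, account and role names are hypothetical):
-- Provider side
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = consumer_account;
-- Consumer side
CREATE DATABASE sales_from_share FROM SHARE provider_account.sales_share;
GRANT IMPORTED PRIVILEGES ON DATABASE sales_from_share TO ROLE analyst;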
- Costs:
o Storage: billed on the average daily amount of compressed data stored (staged files, tables,
Time Travel and Fail-safe).
o Account & Usage page: where storage costs are shown in the UI.
o Cloud Services compute: only the portion exceeding 10% of the daily WH compute usage is billed.
- Architecture:
o Cloud services layer:
Query compilation
The cloud services layer does NOT manage availability zones (that is handled by the cloud provider).
Snowflake meets ACID (Atomicity, Consistency, Isolation, and Durability)
compliance.
- Editions:
o Enterprise:
Search Optimization Service + Query Acceleration Service.
Materialized views.
Row/column access policies.
Multi-cluster WHs.
ACCOUNT_USAGE.ACCESS_HISTORY view.
o Business Critical:
Made for organizations with extremely sensitive data.
Particularly for PHI data that must comply with HIPAA and HITRUST CSF
regulations.
Supports private connectivity to Snowflake service through:
AWS PrivateLink.
Azure Private Link.
Google Cloud Private Service Connect.
Supports encrypted communication between the Snowflake VPC and other VPCs (in
the same region).
Tri-Secret Secure encryption:
Requires Snowflake support to activate.
Composite master key: Snowflake + customer managed key.
DB failover and failback support between Snowflake accounts.
o Virtual Private Snowflake (VPS): highest-isolation edition with its own metadata store and
compute resources; data sharing and the Data Marketplace are not allowed.
- Context functions:
o select current_region() / current_account() / …
o current_client() returns the client's version (e.g. the version of the JDBC driver).
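o Quick illustration:
select current_region(), current_account(), current_role(), current_warehouse(), current_client();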
- Comments: -- or //
- Availability zones: each cloud region usually has 3. They are physically separated data centers.
- URL: https://account_locator.region_id.cloud.snowflakecomputing.com
- ALTER VIEW <view_name> SET SECURE;
- Data Marketplace:
o Two types of listings:
Standard, usually publicly available and free.
Personalized, usually require a request and a payment.
o Data providers must share fresh, real and legally shareable data.
o Can be browsed by non-Snowflake users.
o ACCOUNTADMIN or a role that has the IMPORT SHARE privilege, as with other shares.
- Snowpark: develop, deploy and run non-SQL code against Snowflake data without moving it; your
code is pushed down and executed as SQL inside Snowflake.
o Python – Java – Scala.
o The code is lazily executed.
- Snowflake Scripting:
o Extension of Snowflake SQL to support procedural logic.
o Typically used to write stored procedures.
o DECLARE, BEGIN/END, EXCEPTION.
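o Minimal sketch of an anonymous Snowflake Scripting block (the counter logic is just illustrative):
EXECUTE IMMEDIATE $$
DECLARE
counter INTEGER DEFAULT 0;
BEGIN
counter := counter + 1;
RETURN counter;
EXCEPTION
WHEN OTHER THEN
RETURN 'error: ' || SQLERRM;
END;
$$;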
- Snowsight: the web UI.
o Each worksheet is an independent session.
o It allows you to:
Share worksheets between users in the same account.
Run ad-hoc queries and DDL/DML operations.
Export results of a SELECT statement.
o Set default Role and WH for a user.
- SnowSQL: CLI available for Windows, Linux and macOS.
- Drivers: write applications that perform operations in Snowflake using the driver’s supported
language.
o Go – JDBC – ODBC – .NET – Node.js – PHP – Python.
o Kafka – Spark Connectors.
- SNOWFLAKE.ACCOUNT_USAGE:
o It’s a Snowflake share with metadata.
o Key differences with INFORMATION_SCHEMA table functions:
Data Latency (45’ to 3h).
Longer retention period (1 year).
Includes dropped objects.
o Some views require Enterprise Edition.
o Views examples:
QUERY_HISTORY view: 45’ latency. Per-query history, useful for analyzing WH load and performance.
STORAGE_USAGE view: average daily data storage.
LOGIN_HISTORY view: login attempts.
METERING_HISTORY view: hourly credit usage for the whole account (WHs, serverless features and cloud services).
WAREHOUSE_METERING_HISTORY view: hourly credit usage at WH level.
DATABASE_STORAGE_USAGE_HISTORY: includes time travel and fail-safe.
COPY_HISTORY: both COPY INTO and continuous data loading with Snowpipe.
LOAD_HISTORY view: data load with COPY INTO <table>.
PIPE_USAGE_HISTORY: data loading history using Snowpipe.
ACCESS_HISTORY (Enterprise): information about access to tables and columns.
SQL read statements.
DML operations such as INSERT, UPDATE, DELETE.
Variations of the COPY command.
READER_ACCOUNT_USAGE:
Views for all reader accounts created.
These views are a subset of the ACCOUNT_USAGE views, with the addition of the
RESOURCE_MONITORS view.
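o Example query (assumes a role with access to the SNOWFLAKE database):
select warehouse_name, sum(credits_used) as credits
from snowflake.account_usage.warehouse_metering_history
where start_time >= dateadd(day, -7, current_timestamp())
group by warehouse_name;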
- INFORMATION_SCHEMA:
o Metadata of DB objects and some non-DB objects common across all DBs, such as roles,
WHs and DBs.
o From 7 days to 6 months of metadata depending on the view/table function.
o A query that is not selective enough fails with an error because it returned too much data.
o LOGIN_HISTORY_BY_USER():
These table functions have no latency (see the example after this list).
o AUTO_REFRESH_REGISTRATION_HISTORY:
History of data files registered in the metadata of specified objects.
Credits billed for these operations.
14 days of billing history.
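o Example of calling an INFORMATION_SCHEMA table function (user name and limit are illustrative; needs a database in context):
select *
from table(information_schema.login_history_by_user(user_name => 'JSMITH', result_limit => 100));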
- SnowCD (Snowflake Connectivity Diagnostic Tool): troubleshooting network connection.
- SELECT LAST_QUERY_ID(-1) returns the most recent query ID (the default). LAST_QUERY_ID(1) returns the first query of the current session.
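o Example: feed it to RESULT_SCAN to re-read the previous result set (illustrative):
select * from table(result_scan(last_query_id()));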
- Sampling:
o SYSTEM | BLOCK: samples micro-partitions (blocks).
o BERNOULLI | ROW: samples individual rows (the default method).
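o Example (hypothetical table name):
select * from orders sample bernoulli (10); -- ~10% of rows
select * from orders sample system (10); -- ~10% of micro-partitions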
- INSERT OVERWRITE truncates the target table and then inserts.
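o Example (hypothetical tables): insert overwrite into target_table select * from staging_table;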
- TIMESTAMP_NTZ (no time zone) is the default data type for TIMESTAMP columns.
- Fail-safe:
o The 7-day Fail-safe period can't be altered at account, database, schema or table level. Use
temporary/transient tables instead (they have no Fail-safe).
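o Sketch (hypothetical table name; transient tables skip Fail-safe and allow at most 1 day of Time Travel):
create transient table staging_events (id int, payload variant) data_retention_time_in_days = 1;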
4. DATA LOADING/UNLOADING
- Storage Integration: object that stores the identity and access configuration for an external
cloud provider, so stages can be created and data loaded/unloaded there without passing credentials (see the sketch below).
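o Minimal sketch (the role ARN, bucket and object names are placeholders):
CREATE STORAGE INTEGRATION s3_int
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my_snowflake_role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/load/');
CREATE STAGE my_ext_stage
URL = 's3://my-bucket/load/'
STORAGE_INTEGRATION = s3_int;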
- Snowpipe:
o 14 days of load history.
o REST APIs: endpoints to interact with pipes. For internal and external stages.
insertFiles: files to be ingested into a table.
insertReport: report of files submitted through insertFiles and ingested into a
table.
loadHistoryScan: similar to insertReport but for a specified time range. Up to 10,000
items returned. Prefer insertReport to avoid errors from excessive calls.
o AUTO_INGEST = TRUE: enables automatic data loading from external stages.
o AUTO_INGEST = FALSE: requires making calls to the REST APIs.
o Event notifications:
Snowflake accounts hosted on AWS support auto-ingest notifications from all cloud platforms.
Accounts on GCP and Azure only support notifications from their own platform.
o File sizing recommendations:
100 – 250 MB of compressed data per file.
Files larger than 100 GB are not recommended.
Maximum allowed data load duration: 24 hours.
o Basic transformations allowed: column ordering, omitting, casting or truncation.
o It can't reload a file with the same name twice (load metadata is kept for 14 days).
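o Minimal auto-ingest pipe sketch (database, schema, stage and table names are hypothetical):
CREATE PIPE my_db.public.events_pipe
AUTO_INGEST = TRUE
AS COPY INTO my_db.public.events
FROM @my_db.public.events_stage
FILE_FORMAT = (TYPE = 'JSON');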
- Bulk load – COPY INTO statement: from on-premises or cloud storage.
o Requires user-managed WH.
o 64 days of load history.
o copy into <table> from @~/<file>.xml
file_format = (type = 'XML' strip_outer_element = true); strip_outer_array (JSON)
o File size recommendations and transformations are the same as for Snowpipe.
o VALIDATE:
Validates the files loaded in the last execution of the COPY INTO statement and
returns all errors encountered.
Does not support COPY statements with transformations.
o VALIDATION_MODE:
Validates data files instead of loading them.
Does not support COPY statements with transformations.
Types:
RETURN_<n>_ROWS: validates the first n rows and fails at the first error found.
RETURN_ERRORS: returns all errors across the files specified in the statement.
RETURN_ALL_ERRORS: same, plus errors from files partially loaded in an earlier run with ON_ERROR = CONTINUE.
o ON_ERROR:
CONTINUE.
SKIP_FILE.
ABORT_STATEMENT. Default behavior.
o COPY options:
LOAD_UNCERTAIN_FILES = TRUE
Checks load metadata to avoid duplication and also loads files whose load
status is unknown (e.g. metadata older than 64 days).
FORCE = TRUE
Loads every file regardless of load metadata (may create duplicates).
OBJECT_CONSTRUCT (a function used in COPY transformations, not a COPY option)
Transforms structured data into the VARIANT data type.
o Select the files to load by:
List of specific files.
Pattern matching.
Path (internal stage) or prefix (Amazon S3 bucket).
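o Sketch of a bulk load plus a dry-run validation (stage, table, pattern and format details are hypothetical):
COPY INTO my_db.public.orders
FROM @my_stage/orders/
PATTERN = '.*orders_.*[.]csv[.]gz'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
ON_ERROR = 'SKIP_FILE';
-- validate without loading
COPY INTO my_db.public.orders
FROM @my_stage/orders/
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
VALIDATION_MODE = 'RETURN_ERRORS';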
- Load with UI wizard:
o background = PUT + COPY INTO.
o Designed for a few small files (<50MB).
- Concurrent workload processing is managed in both COPY INTO and Snowpipe.
- Avoid data loading into a table from Snowpipe and bulk load at the same time.
- Unload data:
o By default:
Parallelizing.
16MB each file.
Compressed format.
Automatically decrypted when downloaded from an internal stage with GET.
o SINGLE = TRUE allows up to 5 GB in a single file (together with MAX_FILE_SIZE).
o Supports PARTITION BY <expr> for partitioned data unload into the stage.
o The source can be a table or a SELECT statement.
o Allowed compression algorithms:
CSV and JSON: GZIP | BZ2 | BROTLI | ZSTD | DEFLATE | RAW_DEFLATE
Parquet: LZO | SNAPPY
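o Unload sketch (stage and table names are hypothetical):
COPY INTO @my_stage/unload/orders_
FROM my_db.public.orders
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
OVERWRITE = TRUE;
-- single file of up to 5 GB
COPY INTO @my_stage/unload/orders.csv.gz
FROM my_db.public.orders
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = 'GZIP')
SINGLE = TRUE MAX_FILE_SIZE = 5368709120;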
- Cache:
o Resultset cache:
No changes on the underlying data.
The new query must syntactically match the previous one (case sensitive; adding or removing an alias prevents reuse).
The query must not use runtime-evaluated functions, UDFs or external functions.
Queries with functions like CURRENT_DATE are still eligible for result caching.
Retention of 24 hours, renewed on each reuse, up to 31 days after the first execution.
Can be turned off at session, user or account level with the
USE_CACHED_RESULT parameter.
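Example: ALTER SESSION SET USE_CACHED_RESULT = FALSE; -- disable result cache reuse for the session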
- Query optimizer: examines metadata cache 1st, result cache 2nd and warehouse cache 3rd.
- Query profile:
o Only for completed queries (to be verified).
o Available for 14 days for any user for every query.
o Typical performance issues:
Exploding JOINS.
UNION without ALL.
Queries that don’t fit in memory.
Partition pruning issues.
o Execution time:
Processing — time spent on data processing by the CPU.
Local Disk IO — time when the processing was blocked by local disk access.
Remote Disk IO — time when the processing was blocked by remote disk access.
Network Communication — time when the processing was waiting for the
network data transfer.
Synchronization — various synchronization activities between participating
processes.
Initialization — time spent setting up the query processing.
o Statistics:
IO – input-output operations:
Scan progress: percentage of data scanned for a table so far.
Bytes scanned: # of bytes scanned so far.
Percentage scanned from cache: from local (WH) cache.
Bytes written: written when loading into a table.
Bytes written to result: size of the result set.
Bytes read from result.
External bytes scanned: from an external object such as a stage.
Pruning.
Spilling: disk usage when intermediate results don’t fit in memory. Slower
performance because it requires more IO operations and disk access is slower
than memory access.
Bytes spilled to local storage: to local disk.
Bytes spilled to remote storage: to remote disk.
Network: network communication with other applications. BI tools, for example.
Bytes sent over the network.
- Optimizing Query Performance:
o Clustering the table.
o Materialized Views (Enterprise).
o Search optimization service (Enterprise): lookup queries.
It uses a search access path. Persistent metadata about column values in each
micro-partition. May be similar to an index in relational DBs.
Queries that do not benefit:
External tables.
Dynamic tables.
Materialized views.
COLLATE columns.
Column concatenation.
Analytical expressions.
Cast on columns. Except for numeric column cast to string.
Queries that do benefit:
Equality searches.
Substring and regular expression searches.
Searches in a VARIANT column.
Searches in GEOGRAPHY column with geospatial functions.
ALTER TABLE t1 ADD SEARCH OPTIMIZATION [ON EQUALITY(c1), SUBSTRING(c2)];
o Query Acceleration Service (Enterprise):
Offloads parts of the query processing work to shared compute resources.
Server availability allows more parallelization, but performance may fluctuate.
Might benefit:
Ad-hoc analysis.
Workloads with unpredictable data volume per query.
Queries with large scans and selective filters.
Not eligible queries:
Not enough partitions to scan.
No filters or aggregations.
Not selective enough filters or high cardinality aggregations.
LIMIT without ORDER BY.
Queries with random functions.
Detect eligible queries:
SYSTEM$ESTIMATE_QUERY_ACCELERATION Function
ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE view
Enable the service at WH level.
Scale factor:
Cost control mechanism to limit the compute resources used.
8 by default: the service can use up to 8 times the WH's compute resources.
There are 3 QUERY_ACCELERATION_* columns in the QUERY_HISTORY view to see the effects of the service.
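Example (warehouse name is hypothetical):
SELECT SYSTEM$ESTIMATE_QUERY_ACCELERATION('<query_id>');
ALTER WAREHOUSE analytics_wh SET ENABLE_QUERY_ACCELERATION = TRUE QUERY_ACCELERATION_MAX_SCALE_FACTOR = 8;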
- Optimize WH performance:
o Reduce queuing.
o Resolve memory spillage.
o Increase WH size.
o Try query acceleration.
o Optimize WH cache.
o Limit concurrent queries.
- Algorithms:
o Estimate count(distinct): HyperLogLog algorithm.
o Estimate percentiles: t-Digest.
o Estimate approximate frequent values: Space-Saving.
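o Example (hypothetical table and columns):
select approx_count_distinct(user_id), -- HyperLogLog
approx_percentile(amount, 0.95), -- t-Digest
approx_top_k(product_id, 10) -- Space-Saving
from sales;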
- EXPLAIN plan: useful to evaluate query efficiency.
o Compiles the SQL but does not execute it -> no WH required.
o Gives info about:
Partition pruning.
Join ordering.
Join types.
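o Example (hypothetical tables):
explain using text
select o.* from orders o join customers c on o.customer_id = c.customer_id;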