
SNOWPRO® ADVANCED:

DATA ENGINEER
EXAM STUDY GUIDE
Last Updated: August 1, 2023
SNOWPRO ADVANCED: DATA ENGINEER STUDY GUIDE OVERVIEW

This study guide highlights concepts that may be covered on Snowflake’s SnowPro Advanced:
Data Engineer Certification exam.

This study guide does not guarantee certification success.

Holding the SnowPro Core certification in good standing is a prerequisite for taking the
SnowPro Advanced: Data Engineer certification exam.

For an overview and more information on the SnowPro Core Certification exam or SnowPro
Advanced Certification series, please navigate here.

RECOMMENDATIONS FOR USING THE GUIDE

This guide will show the Snowflake topics and subtopics covered on the exam. Following the
topics will be additional resources consisting of videos, documentation, blogs, and/or exercises
to help you understand data engineering with Snowflake.

Estimated length of study guide: 10 – 13 hours

Some links may have more value than others, depending on your experience, so you need not
spend the same amount of time on each link. Some links may appear in more than one
domain.

TABLE OF CONTENTS
SNOWPRO ADVANCED: DATA ENGINEER STUDY GUIDE OVERVIEW
RECOMMENDATIONS FOR USING THE GUIDE
SNOWPRO ADVANCED: DATA ENGINEER CERTIFICATION OVERVIEW
SNOWPRO ADVANCED: DATA ENGINEER PREREQUISITE
SNOWPRO ADVANCED: DATA ENGINEER SUBJECT AREA BREAKDOWN
SNOWPRO ADVANCED: DATA ENGINEER DOMAINS & OBJECTIVES
Domain 1.0: Data Movement
Domain 1.0: Data Movement Study Resources
Domain 2.0: Performance Optimization
Domain 2.0: Performance Optimization Study Resources
Domain 3.0: Storage & Data Protection
Domain 3.0: Storage & Data Protection Study Resources
Domain 4.0: Security
Domain 4.0: Security Study Resources
Domain 5.0: Data Transformation
Domain 5.0: Data Transformation Study Resources
SNOWPRO ADVANCED: DATA ENGINEER SAMPLE QUESTIONS

SNOWPRO ADVANCED: DATA ENGINEER CERTIFICATION OVERVIEW

The SnowPro Advanced: Data Engineer Certification exam tests advanced knowledge and skills
used to apply comprehensive data engineering principles using Snowflake. The exam assesses
these skills through scenario-based questions and real-world examples.

This certification will test the ability to:


● Source data from Data Lakes, APIs, and on-premises systems
● Transform, replicate, and share data across cloud platforms
● Design end-to-end near real-time streams
● Design scalable compute solutions for data engineering workloads
● Evaluate performance metrics

Target Audience:
2+ years of data engineering experience, including practical experience using Snowflake for
data engineering tasks. In addition, successful candidates may have:
● A working knowledge of RESTful APIs, SQL, semi-structured datasets, and cloud-native
concepts.

Programming experience is a plus.

This exam is designed for:
● Data Engineers
● Software Engineers

SNOWPRO ADVANCED: DATA ENGINEER PREREQUISITE

Eligible individuals must hold an active SnowPro Core Certified credential. If you feel you need
more guidance on the fundamentals, please see the SnowPro Core Exam Study Guide.

STEPS TO SUCCESS
1. Review the Data Engineer Exam Guide
2. Attend Snowflake’s Instructor-Led Data Engineering Course
3. Review and study applicable white papers and documentation
4. Get hands-on practical experience with relevant business requirements using Snowflake
5. Attend Snowflake Webinars
6. Attend Snowflake Virtual Hands-on Labs for more hands-on practical experience
7. Schedule your exam
8. Take your exam!

An additional Snowflake asset to check out for data engineering:

Cloud Data Engineering for Dummies

SNOWPRO ADVANCED: DATA ENGINEER SUBJECT AREA BREAKDOWN

This exam guide includes test domains, weightings, and objectives. It is not a comprehensive
listing of all the content that will be presented on this examination. The table below lists the main
content domains and their weightings.

Domain                            Weighting on Exam

1.0 Data Movement                 28%
2.0 Performance Optimization      22%
3.0 Storage and Data Protection   10%
4.0 Security                      10%
5.0 Data Transformation           30%

SNOWPRO ADVANCED: DATA ENGINEER DOMAINS & OBJECTIVES

Domain 1.0: Data Movement

1.1 Given a data set, load data into Snowflake.


● Outline considerations for data loading
● Define data loading features and potential impact

1.2 Ingest data of various formats through the mechanics of Snowflake.


● Required data formats
● Outline stages
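
For illustration, a minimal loading sketch covering 1.1 and 1.2 might look like the following.
The file format, stage, and table names (my_csv_format, my_stage, orders) are hypothetical
examples, not objects referenced by the exam:

CREATE OR REPLACE FILE FORMAT my_csv_format
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1;

CREATE OR REPLACE STAGE my_stage
  FILE_FORMAT = my_csv_format;

-- Upload a local file to the internal stage (run from SnowSQL):
-- PUT file:///tmp/orders.csv @my_stage;

COPY INTO orders
  FROM @my_stage
  ON_ERROR = 'CONTINUE';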

1.3 Troubleshoot data ingestion.


● Identify causes of ingestion errors
● Determine resolutions for ingestion errors
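
For example, a common first troubleshooting step is to review recent load errors with the
COPY_HISTORY table function (the ORDERS table name is a hypothetical example):

SELECT file_name, status, error_count, first_error_message
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'ORDERS',
    START_TIME => DATEADD(hour, -24, CURRENT_TIMESTAMP())));

Running COPY INTO with VALIDATION_MODE = 'RETURN_ERRORS' is another way to surface problem
rows without loading any data.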

1.4 Design, build and troubleshoot continuous data pipelines.


● Stages
● Tasks
● Streams
● Snowpipe (for example, auto-ingest as compared to the REST API)
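
As a minimal sketch of how these objects can combine into a continuous pipeline (raw_orders,
curated_orders, and transform_wh are hypothetical names):

-- Capture changes on a landing table:
CREATE OR REPLACE STREAM orders_stream ON TABLE raw_orders;

-- Process new rows on a schedule; the WHEN clause skips runs
-- while the stream is empty:
CREATE OR REPLACE TASK process_orders
  WAREHOUSE = transform_wh
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO curated_orders (order_id, order_date)
  SELECT order_id, order_date
  FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended and must be resumed:
ALTER TASK process_orders RESUME;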

1.5 Analyze and differentiate types of data pipelines.


● Create User-Defined Functions (UDFs) and stored procedures including
Snowpark
● Design and use the Snowflake SQL API

1.6 Install, configure, and use connectors to connect to Snowflake.

1.7 Design and build data sharing solutions.
● Implement a data share
● Create a secure view
● Implement row level filtering
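
A sketch of a share built on a secure view with row-level filtering follows; all object names,
and the entitlements-table approach, are hypothetical:

-- Filter rows by the consuming account through a mapping table:
CREATE OR REPLACE SECURE VIEW sales_db.public.shared_orders_v AS
  SELECT o.order_id, o.region, o.amount
  FROM sales_db.public.orders o
  JOIN sales_db.public.share_entitlements e
    ON o.region = e.region
  WHERE e.consumer_account = CURRENT_ACCOUNT();

CREATE OR REPLACE SHARE orders_share;
GRANT USAGE ON DATABASE sales_db TO SHARE orders_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE orders_share;
GRANT SELECT ON VIEW sales_db.public.shared_orders_v TO SHARE orders_share;
ALTER SHARE orders_share ADD ACCOUNTS = xy12345;  -- consumer account locator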

1.8 Outline when to use external tables and define how they work.
● Partitioning external tables
● Materialized views
● Partitioned data unloading
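
For example, an external table can derive a partition column from the file path so queries
prune files they do not need. The stage name and path layout below are hypothetical, and
AUTO_REFRESH assumes cloud event notifications are configured:

-- Assumes files staged as orders/YYYY-MM-DD/<file>.parquet:
CREATE OR REPLACE EXTERNAL TABLE ext_orders (
    order_date DATE AS TO_DATE(SPLIT_PART(METADATA$FILENAME, '/', 2)),
    amount NUMBER AS (VALUE:amount::NUMBER))
  PARTITION BY (order_date)
  LOCATION = @my_ext_stage/orders/
  AUTO_REFRESH = TRUE
  FILE_FORMAT = (TYPE = PARQUET);

-- A materialized view over an external table can speed up hot queries:
CREATE MATERIALIZED VIEW ext_orders_daily AS
  SELECT order_date, SUM(amount) AS total_amount
  FROM ext_orders
  GROUP BY order_date;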

Domain 1.0: Data Movement Study Resources

Lab Guides
Accelerating Data Engineering with Snowflake & dbt
Auto-Ingest Twitter Data into Snowflake
Automating Data Pipelines to Drive Marketing Analytics with Snowflake & Fivetran

Additional Assets
Support for Calling External functions via Google Cloud API Gateway Now in Public
Preview (blog)
Snowflake and Spark, Part 2: Pushing Spark Query (blog)
Fetching Query Results From Snowflake (blog)
Moving from On-Premises ETL to Cloud-Driven ELT (white paper)

Snowflake Documentation Links


COPY INTO
Connectors & Drivers
Continuous Data Pipelines
COPY_HISTORY
CREATE EXTERNAL TABLE
CREATE FILE FORMAT
CREATE STREAM
CREATE TASK
Data Loading Tutorials
Databases, Tables & Views
DESCRIBE STAGE
Loading Data into Snowflake
Sharing Data Securely in Snowflake
VALIDATE_PIPE_LOAD

Domain 2.0: Performance Optimization

2.1 Troubleshoot underperforming queries.


● Identify underperforming queries
● Outline telemetry around the operation
● Increase efficiency
● Identify the root cause
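
One concrete starting point is to rank recent queries by elapsed time and check for disk
spilling, a common root cause of slow queries. This uses the SNOWFLAKE.ACCOUNT_USAGE share,
which can lag by up to 45 minutes:

SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;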

2.2 Given a scenario, configure a solution for the best performance.


● Scale out as compared to scale up
● Virtual warehouse properties (for example, size, multi-cluster)
● Query complexity
● Micro-partitions and the impact of clustering
● Materialized views
● Search optimization service
● Query acceleration service
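
A few of these levers expressed as DDL; the warehouse and table names are hypothetical, and
multi-cluster warehouses, search optimization, and query acceleration require Enterprise
Edition or higher:

-- Scale up for individual complex queries:
ALTER WAREHOUSE transform_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Scale out for concurrency:
ALTER WAREHOUSE transform_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';

-- Speed up selective point lookups on a large table:
ALTER TABLE orders ADD SEARCH OPTIMIZATION;

-- Offload eligible large scans to the query acceleration service:
ALTER WAREHOUSE transform_wh SET ENABLE_QUERY_ACCELERATION = TRUE;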

2.3 Outline and use caching features.

2.4 Monitor continuous data pipelines.


● Snowpipe
● Tasks
● Streams
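
Each component has its own monitoring surface. For example (the pipe, task, and stream names
are hypothetical):

-- Snowpipe load volume over the past week:
SELECT *
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
    DATE_RANGE_START => DATEADD(day, -7, CURRENT_TIMESTAMP()),
    PIPE_NAME => 'MY_DB.PUBLIC.ORDERS_PIPE'));

-- Recent task failures:
SELECT name, state, error_message, scheduled_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY())
WHERE state = 'FAILED';

-- Check whether a stream still holds unconsumed change records:
SELECT SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM');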

Domain 2.0: Performance Optimization Study Resources

Lab Guides
Resource Optimization: Performance
Resource Optimization: Usage Monitoring
Building a Data Application

Additional Assets
Performance Impact from Local and Remote Disk Spilling (blog)
Snowflake: Visualizing Warehouse Performance (blog)
Caching in Snowflake Data Warehouse (blog)

Snowflake Documentation Links


Account Usage
Analyzing Queries Using Query Profile
COPY_HISTORY
COPY_HISTORY View
Databases, Tables & Views
LOAD_HISTORY View
PIPE_USAGE_HISTORY View
Queries
QUERY_HISTORY, QUERY_HISTORY_BY_*

SHOW STREAMS
System Functions
TASK_HISTORY
Virtual Warehouses

Domain 3.0: Storage & Data Protection

3.1 Implement data recovery features in Snowflake.


● Time Travel
● Fail-safe
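
A brief sketch of Time Travel in practice (orders is a hypothetical table; Fail-safe, by
contrast, is not queryable and is accessible only through Snowflake Support):

-- Query the table as it existed one hour ago:
SELECT * FROM orders AT(OFFSET => -3600);

-- Query it as of just before a specific statement ran:
SELECT * FROM orders BEFORE(STATEMENT => '<query_id>');

-- Recover a dropped table within its retention period:
UNDROP TABLE orders;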

3.2 Outline the impact of streams on Time Travel.

3.3 Use system functions to analyze micro-partitions.


● Clustering depth
● Cluster keys

3.4 Use Time Travel and cloning to create new development environments.
● Clone objects
● Validate changes before promoting
● Rollback changes
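
For example (the database and table names are hypothetical):

-- Zero-copy clone of production as of 24 hours ago:
CREATE OR REPLACE DATABASE dev_db CLONE prod_db
  AT(OFFSET => -86400);

-- After validating changes in dev_db, promote by swapping tables:
ALTER TABLE prod_db.public.orders SWAP WITH dev_db.public.orders;

-- Roll back a bad change: clone the pre-change state, then swap:
CREATE TABLE prod_db.public.orders_restored
  CLONE prod_db.public.orders BEFORE(STATEMENT => '<query_id>');
ALTER TABLE prod_db.public.orders SWAP WITH prod_db.public.orders_restored;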

Domain 3.0: Storage & Data Protection Study Resources

Lab Guides
Getting Started with Time Travel

Snowflake Documentation Links


Snowflake Time Travel & Fail-safe
Databases, Tables & Views
Parameter Hierarchy and Types
Database Replication and Failover/Failback
Continuous Data Pipelines
SYSTEM$CLUSTERING_INFORMATION
SYSTEM$CLUSTERING_DEPTH

Domain 4.0: Security

4.1 Outline Snowflake security principles.


● Authentication methods (Single Sign-On (SSO), key pair authentication,
username/password, Multi-Factor Authentication (MFA))
● Role Based Access Control (RBAC)
● Column level security and how data masking works with RBAC to secure
sensitive data

4.2 Outline the system defined roles and when they should be applied.
● The purpose of each of the system defined roles including best practices usage
in each case
● The primary differences between SECURITYADMIN and USERADMIN
roles
● The difference between the purpose and usage of the USERADMIN/
SECURITYADMIN roles and SYSADMIN

4.3 Manage data governance.


● Explain the options available to support column level security including
Dynamic Data Masking and external tokenization
● Explain the options available to support row level security using Snowflake
row access policies
● Use DDL required to manage Dynamic Data Masking and row access policies
● Use methods and best practices for creating and applying masking policies on
data
● Use methods and best practices for object tagging
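
A compact sketch of these governance features; the policy, table, and role names are
hypothetical:

-- Dynamic Data Masking:
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING)
  RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_READER') THEN val
       ELSE '*** MASKED ***'
  END;

ALTER TABLE customers MODIFY COLUMN email
  SET MASKING POLICY email_mask;

-- Row access policy for row-level security:
CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (region STRING)
  RETURNS BOOLEAN ->
  CURRENT_ROLE() = 'ADMIN' OR region = 'EMEA';

ALTER TABLE customers ADD ROW ACCESS POLICY region_policy ON (region);

-- Object tagging:
CREATE TAG IF NOT EXISTS pii_level;
ALTER TABLE customers SET TAG pii_level = 'high';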

Domain 4.0: Security Study Resources

Additional Assets
Snowflake RBAC Security Prefers Role Inheritance to Role Composition (blog)

Snowflake Documentation Links


CREATE MATERIALIZED VIEW
GRANT <privileges>…TO ROLE
Managing Governance in Snowflake
Managing Security in Snowflake
Managing Your User Preferences
Stored Procedures

Domain 5.0: Data Transformation

5.1 Define User-Defined Functions (UDFs) and outline how to use them.
● Snowpark UDFs (for example, Java, Python, Scala)
● Secure UDFs
● SQL UDFs
● JavaScript UDFs
● User-Defined Table Functions (UDTFs)
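
For instance, a scalar SQL UDF and a SQL UDTF might look like this (the function and table
names are hypothetical):

CREATE OR REPLACE FUNCTION area_of_circle(radius FLOAT)
  RETURNS FLOAT
  AS $$ PI() * radius * radius $$;

CREATE OR REPLACE FUNCTION orders_for_region(r STRING)
  RETURNS TABLE (order_id NUMBER, amount NUMBER)
  AS $$ SELECT order_id, amount FROM orders WHERE region = r $$;

SELECT area_of_circle(2.0);
SELECT * FROM TABLE(orders_for_region('EMEA'));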

5.2 Define and create external functions.


● Secure external functions
● Integration requirements

5.3 Design, build, and leverage stored procedures.
● Snowpark stored procedures (for example, Java, Python, Scala)
● SQL Scripting stored procedures
● JavaScript stored procedures
● Transaction management
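
As an illustration, here is a Snowflake Scripting (SQL) stored procedure with explicit
transaction handling; the table names are hypothetical:

CREATE OR REPLACE PROCEDURE merge_orders()
  RETURNS STRING
  LANGUAGE SQL
AS
$$
BEGIN
  BEGIN TRANSACTION;
  INSERT INTO curated_orders (order_id, order_date)
    SELECT order_id, order_date FROM raw_orders;
  DELETE FROM raw_orders;
  COMMIT;
  RETURN 'ok';
EXCEPTION
  WHEN OTHER THEN
    ROLLBACK;
    RETURN 'failed: ' || SQLERRM;
END;
$$;

CALL merge_orders();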

5.4 Handle and transform semi-structured data.


● Traverse and transform semi-structured data to structured data
● Transform structured data to semi-structured data
● Understand how to work with unstructured data
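
For example, in both directions (the payload structure and column names are hypothetical):

-- Semi-structured to structured: flatten a JSON array of line items:
SELECT o.order_id,
       li.value:sku::STRING AS sku,
       li.value:qty::NUMBER AS qty
FROM raw_orders o,
     LATERAL FLATTEN(input => o.payload:line_items) li;

-- Structured to semi-structured: build JSON objects from columns:
SELECT OBJECT_CONSTRUCT('order_id', order_id, 'amount', amount) AS order_json
FROM orders;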

5.5 Use Snowpark for data transformation.


● Understand Snowpark architecture
● Query and filter data using the Snowpark library
● Perform data transformations using Snowpark (for example, aggregations)
● Manipulate Snowpark DataFrames

Domain 5.0: Data Transformation Study Resources

Additional Assets
Snowflake For Data Engineering – Easily Ingest, Transform and Deliver Data for
Up-To-The Moment Insight (white paper)
Bringing Extensibility to Data Pipelines: What’s New with Snowflake External Functions
(blog)
Generating a JSON Dataset Using Relational Data in Snowflake (blog)
Best Practices for Managing Unstructured Data (white paper)

Snowflake Documentation Links


CREATE API INTEGRATION
CREATE EXTERNAL FUNCTION
Databases, Tables & Views
External Functions
Queries
Semi-Structured Data
Snowpark
Stored Procedures
Transactions
TRY_PARSE_JSON
UDFs (User-Defined Functions)

Ready to register for an exam? Navigate here to get started.

SNOWPRO ADVANCED: DATA ENGINEER SAMPLE QUESTIONS

1. Running the below clustering information analysis function on TABLE1, which is not
clustered, will return which of the following?

SELECT SYSTEM$CLUSTERING_INFORMATION('table1', '(col1, col2)');

a. An error: this function works only on clustered tables.
b. Clustering information on all tables: this function clusters all tables by default.
c. Clustering information: the information will be presented as if the table were
clustered by col1, col2.
d. An error: this function does not accept lists of columns as a second parameter.

2. A Data Engineer has inherited a database and is monitoring a table with the below query
every 30 days:

SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(o_orderdate)');

The Engineer gets the following two results, taken on Day 0 and Day 30:

-- DAY 0 -------
{
"cluster_by_keys" : "LINEAR(o_orderdate)",
"total_partition_count" : 3218,
"total_constant_partition_count" : 0,
"average_overlaps" : 20.4133,
"average_depth" : 11.4326,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 993,
"00011" : 841,
"00012" : 748,
"00013" : 413,
"00014" : 121,
"00015" : 74,
"00016" : 16,
"00032" : 12

}
}

-- DAY 30 -------
{
"cluster_by_keys" : "LINEAR(o_orderdate)",
"total_partition_count" : 3240,
"total_constant_partition_count" : 0,
"average_overlaps" : 64.1185,
"average_depth" : 33.4704,
"partition_depth_histogram" : {
"00000" : 0,
"00001" : 0,
"00002" : 0,
"00003" : 0,
"00004" : 0,
"00005" : 0,
"00006" : 0,
"00007" : 0,
"00008" : 0,
"00009" : 0,
"00010" : 0,
"00011" : 0,
"00012" : 0,
"00013" : 0,
"00014" : 0,
"00015" : 0,
"00016" : 0,
"00032" : 993,
"00064" : 2247
}
}

How should the Engineer interpret these results?

a. The table is well organized for queries that range over column o_orderdate.
Over time, this organization is degrading.
b. The table was initially well organized for queries that range over column
o_orderdate. Over time, this organization has improved further.
c. The table was initially not organized for queries that range over column
o_orderdate. Over time, this organization has changed.
d. The table was initially poorly organized for queries that range over column
o_orderdate. Over time, this organization has improved.

3. A Data Engineer is preparing to load staged data from an external stage using a
task object.

Which of the following practices will provide the MOST efficient load performance?

a. Store the files on the external stage to ensure caching is maintained
b. PUT all files in a single directory
c. Limit file names to under 30 characters
d. Organize files into logical paths that reflect a scheduling pattern

4. A Data Engineer is working on a project that requires data to be moved directly from an
internal stage to an external stage.

Which of the following is the QUICKEST way to accomplish this task?

a. COPY INTO @myExtStage FROM (SELECT $1, $2, ... FROM @myInternalStage);
b. Copy the data from the internal stage to a table and then unload the data to an
external stage
c. COPY INTO @myExtStage FROM @myInternalStage;
d. Write a custom script to move the data

5. The S1 schema contains two permanent tables that were created as shown below:

CREATE TABLE table_a (c1 INT)
  DATA_RETENTION_TIME_IN_DAYS = 10;

CREATE TABLE table_b (c1 INT);

What will be the impact of running the following command?

ALTER SCHEMA S1 SET DATA_RETENTION_TIME_IN_DAYS = 20;

a. The retention time on table_a does not change; table_b is set to 20 days.
b. An error will be generated; a data retention time on a schema cannot be set.
c. The retention time on both tables will be set to 20 days.
d. The retention time will not change on either table.

Correct responses for sample questions:
1: b, 2: a, 3: d, 4: a, 5: a

The information provided in this study guide is provided for your purposes only and may not be provided
to third parties.

IN ADDITION, THIS STUDY GUIDE IS PROVIDED “AS IS”. NEITHER SNOWFLAKE NOR ITS
SUPPLIERS MAKES ANY OTHER WARRANTIES, EXPRESS OR IMPLIED, STATUTORY OR
OTHERWISE, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY,
TITLE, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.
