Snowflake Data e Book
by Joe Kraynak
These materials are © 2017 John Wiley & Sons, Inc. Any dissemination, distribution, or unauthorized use is strictly prohibited.
Cloud Data Warehousing For Dummies, Snowflake Special Edition
Published by
John Wiley & Sons, Inc.
111 River St.
Hoboken, NJ 07030-5774
www.wiley.com
Copyright 2017 by John Wiley & Sons, Inc., Hoboken, New Jersey
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without
the prior written permission of the Publisher. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken,
NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, The Dummies Way, Dummies.com,
Making Everything Easier, and related trade dress are trademarks or registered trademarks of
John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries, and may
not be used without written permission. All other trademarks are the property of their respective
owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in
this book.
For general information on our other products and services, or how to create a custom For
Dummies book for your business or organization, please contact our Business Development
Department in the U.S. at 877-409-4177, contact [email protected], or visit www.wiley.com/go/
custompub. For information about licensing the For Dummies brand for products or services,
contact BrandedRights&[email protected].
ISBN 978-1-119-35192-4 (pbk); ISBN 978-1-119-35190-0 (ebk)
10 9 8 7 6 5 4 3 2 1
Publisher's Acknowledgments
Introduction
As an executive, manager, or analyst, you're well aware that
knowledge is power and that data properly analyzed on a
timely basis provides the insight necessary to make well-informed
decisions and achieve a competitive advantage.
More data opens the door to more and bigger opportunities, which
are almost always accompanied by equally big challenges. To take
advantage of these big opportunities, you need to find and implement
a data warehouse solution that can store and organize data
in diverse formats, provide convenient access to it, and improve
the speed at which you can analyze that data. And it must be done
as cost-effectively as possible. This book shows you how.
Foolish Assumptions
We surmise that you grasp the concept of data warehousing and
the challenges and opportunities it presents.
Throughout this book are case studies that reveal how various
companies applied cloud data warehousing in real-world situations.
They significantly improved the speed and performance of
their data storage and analytics systems and saved money in the
process.
IN THIS CHAPTER
Data warehousing: past to present
Chapter 1
Getting Up to Speed on
Cloud Data Warehousing
In one form or another, cloud computing and software-as-a-service
(SaaS) have been around for decades. But cloud data
warehouse-as-a-service (DWaaS) has only recently emerged as an alternative
to conventional, on-premises data warehousing and similar types of
solutions that have appeared in recent years. Why? Why now? What's
changed? In this chapter, we answer these questions, and more.
emerged when companies realized that analyzing data directly
from those internal systems competed with the day-to-day activities
of business users such as data entry and operational reporting.
Over the years, data sources have expanded beyond internal business
operations and external transactions. They now include
exponentially greater volumes of data and more complex data
from websites, mobile phones, online games, online banking apps,
and even machines. Most recently, companies are capturing huge
amounts of data from Internet of Things (IoT)-enabled devices.
Recognizing the limitations of
conventional data warehousing
Conventional data warehouse solutions were not designed to
handle the volume, variety, and complexity of today's data. And
newer systems designed to address these shortcomings struggle
to accommodate the data access and analysis that organizations
now require. Today's challenges reveal:
But all is not lost! Like all great things, technology evolves. New
ideas and new methods emerge to address the significant business
problems of today and the aspirations of tomorrow.
Massively parallel processing (MPP): MPP, which emerged in the
previous decade, involves dividing a single computing
operation to execute simultaneously across a large number
of separate computer processors. This division of labor
facilitates faster storage and analysis of data when software
is built to capitalize on this approach.
Columnar storage: Traditionally, databases stored records
in rows, similar to how a spreadsheet appears. For example,
this could include all information about a customer or a retail
transaction. Retrieving data the traditional way required the
system to read the entire row to get one element. This is
laborious and time-consuming. With columnar storage, each
data element of a record is stored in a column. With this
approach, a user can query just one data element, such as
gym members who have paid their dues, without having to
read everything else in that entire record, which may include
each member's ID number, name, age, address, city, state,
payment info, and so on. The approach can provide a much
faster response to these kinds of analytic queries.
Vectorized processing: This form of data processing for
data analytics (the science of examining data to draw
conclusions) takes advantage of recent, revolutionary
computer chip designs. This approach delivers much faster
performance versus older data warehouse solutions built
decades ago for older, slower hardware technology.
Solid state drives (SSDs): Unlike hard disk drives (HDDs),
SSDs store data on flash memory chips, which accelerates
data storage, retrieval, and analysis. A solution that
takes advantage of SSDs can deliver significantly better
performance.
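The difference between row and columnar layouts described in the list above can be sketched in a few lines of Python. This is only an illustration of the two layouts, not how any particular data warehouse stores data; the gym member records and the function names are hypothetical.

```python
# Row-oriented layout: all fields of one record are kept together.
# (Hypothetical gym member records, for illustration only.)
rows = [
    {"member_id": 1, "name": "Ana", "age": 34, "city": "Austin", "dues_paid": True},
    {"member_id": 2, "name": "Ben", "age": 41, "city": "Boston", "dues_paid": False},
    {"member_id": 3, "name": "Cy", "age": 29, "city": "Chicago", "dues_paid": True},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


# Column-oriented layout: all values of one field are kept together.
columns = to_columns(rows)


def paid_member_ids_row_store(records):
    # A row store hands back whole records, although only two fields are needed.
    return [r["member_id"] for r in records if r["dues_paid"]]


def paid_member_ids_column_store(cols):
    # A column store can read just the two columns the query refers to.
    return [m for m, paid in zip(cols["member_id"], cols["dues_paid"]) if paid]
```

Both functions return the same member IDs. The point of the sketch is that the columnar version never looks at names, ages, or cities, which is where the speedup for analytic queries comes from on real storage.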
Traditional data warehouse software deployed on cloud
infrastructure: This option is very similar to a conventional
data warehouse, as it reuses the original code base. So you
still need IT expertise to build and manage the data warehouse.
While you do not have to purchase and install the
hardware and software, you may still have to do significant
configuration and tuning, and perform operations such as
regular backups.
Traditional data warehouse hosted and managed in the
cloud by a third party as a managed service: With this
option, the third-party provider supplies the IT expertise, but
you're still likely to experience many of the same limitations
of a conventional data warehouse. The data warehouse is
hosted on hardware installed in a data center managed by
the vendor. This is similar to what the industry referred to as
an application service provider (ASP). The customer still has
to specify in advance how much disk space and compute
resources (CPUs and memory) they expect to use.
A true SaaS data warehouse: With this option, often
referred to as data-warehousing-as-a-service (DWaaS), the
vendor delivers a complete cloud data warehouse solution
that includes all hardware and software and the IT and
database administration (DBA) expertise required. Clients
typically pay only for the storage and computing resources
they use, when they use them. This option should scale up
and down on demand.
Here are a few areas where cutting-edge cloud data warehouse
technology can significantly improve a company's operations:
IN THIS CHAPTER
Adapting to increasing demands for data
access and analytics
Chapter 2
Why the Modern Data
Warehouse Emerged
Cloud data warehousing emerged from the convergence of
three major trends: changes in data sources, volume, and
complexity; increased demand for data access and analytics;
and technology improvements that significantly increased the
efficiency of data storage, access, and analytics.
In this section, we focus on changes in data, and data use, that
have led to demand for cloud data warehousing.
The use cases that cloud data warehousing has sparked continue
to emerge. For example, SaaS-born companies and big enterprises
that use the cloud to store their data are monetizing (selling) that
data. They package it as a service and sell it to other organizations
keen to make better business decisions.
Data born in the cloud
The business world has experienced a rapid adoption of SaaS,
including customer relationship management (CRM) software,
business management (ERP) software suites, advertising buying
platforms, and online marketing tools, just to name a few. Thanks
to the cloud, new SaaS companies can set up shop with just the
price of a laptop or two. These products can create huge amounts
of valuable data stored in the cloud.
Machine-generated data
Machine-generated data is a key topic related to the Internet of
Things (IoT). It's an endless collection of devices that communicate
data via the Internet, including smartphones, thermostats,
refrigerators, oil rigs, home security systems, smart meters, and
much more. Data collected and analyzed from IoT devices can
enhance products and processes, monitor equipment, and predict
needed maintenance to avoid failure.
Data exploration
Analyzing data starts with data exploration: identifying interesting
and valuable connections and serving them up to data users
in the form of reports and analytics. Although data exploration
isn't a new concept, the growth in data volume makes it a more
resource-intensive exercise.
Data exploration often involves large data sets. It's also often
experimental in nature, which complicates the ROI assessment
needed to support the significant upfront cost of deploying a
traditional, on-premises data warehouse. In response, the cloud can
enable a data warehouse to scale up and down as needed, and
offers a pay-for-use model that avoids the dilemma of whether
to make an expensive, upfront commitment.
Data lakes
The growing need to have massive amounts of raw data in different
formats, all in a single location, spawned the data lake. But companies
quickly realized that transforming that data and extracting
valuable insight was too cost-prohibitive and labor-intensive
to even attempt.
But the original interest in data lakes reveals that companies want
to store all of their data in one location at a reasonable cost. With
a modern data warehouse, the cloud supports cost-effective
methods to store and transform data, using on-demand resources
to ease this resource-intensive process.
Elasticity to enable analytics
Here are a few scenarios where true, cloud-based elastic data
warehousing can make it possible to do more with data:
Embedded analytics
For many companies, analytics operate as a separate and distinct
business process. But a growing trend is to build analytics into
business applications, which are increasingly built in the cloud.
These applications handle significant variability in the number
of users that query the applications and the number of queries
(workloads) users run to analyze that data. The cloud facilitates
data transfers from cloud-based applications to the organization's
cloud data warehouse, where its scalability and elasticity
can better support fluctuations in users and workloads.
As Jana and its data grew, the company's initial analytics architecture
could no longer efficiently serve its business. Queries slowed and
table scans became unfeasible. Adding capacity and backup systems
and administering its open-sourced data repository required more
and more administration time.
Technology Musts for Any Modern Data
Warehouse
Technology innovations can improve data warehousing and analytics
with regard to availability, simplicity, cost, and performance.
In this section, we focus on the key technologies that
should be part of any modern data warehouse.
Cloud
The properties of cloud make it particularly well-suited for data
warehousing. We've mentioned these in other contexts, but it's
also important to know that they came from cloud:
noSQL
noSQL, short for not only structured query language (SQL), describes
a technology that enables storing and analyzing newer forms
of data, such as data generated from machines and from social
media, to enrich and expand an organization's data analytics.
Traditional data warehouses don't accommodate these data types
very well. Therefore, newer systems have emerged in recent years
to handle these semi-structured data forms, such as JSON,
Avro, and XML.
IN THIS CHAPTER
Choosing the right data warehouse
solution
Chapter 3
The Criteria for Selecting a
Modern Data Warehouse
The trends discussed in Chapter 2 have led to a need and
an opportunity for a new kind of data warehouse: one built
for the volume, diversity, and velocity of today's data,
and for the new ways organizations use their data. Such a solution
must take advantage of key technology innovations, including
the cloud.
Accommodates and Integrates All Data
in One Place
Non-traditional, or semi-structured, data, as discussed in previous
chapters, can enrich the insight of data analytics beyond
the limits of traditional data. But this requires a new approach to
loading and transforming these new data types before an organization
can analyze that data. Most traditional data warehouses
sacrifice performance or flexibility to handle these data types. A
modern data warehouse should eliminate the need to design and
model rigid, traditional structures upfront that would require
transforming semi-structured data before loading. It should also
optimize query performance against semi-structured data while
still in its native form. Overall, the data warehouse should support
diverse data with flexibility and avoid performance issues.
Efficiently loading all of your data into one location is crucial. But
integrating all of those diverse data types for more precise analytics
is something else. A modern data warehouse should automatically
integrate your semi-structured data, once confined to noSQL
systems, with structured data inherent to a traditional, corporate
relational database. There should be nothing to install and configure,
with tuning and performance built in. Most importantly,
you shouldn't have to maintain and pay for two separate systems
to manage all of your data.
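To make the idea of integrating semi-structured and structured data concrete, here is a minimal Python sketch. It is not how a data warehouse does this internally. It only shows JSON events, whose fields vary from record to record, being read in their native form and combined with a fixed-column customer table. All names and values are hypothetical.

```python
import json

# Structured data: every row has the same fixed columns (hypothetical values).
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]

# Semi-structured data: JSON events whose fields differ from one record to the next.
raw_events = [
    '{"customer_id": 1, "event": "login", "device": {"os": "iOS"}}',
    '{"customer_id": 2, "event": "purchase", "amount": 19.99}',
    '{"customer_id": 1, "event": "purchase", "amount": 5.00, "coupon": "WELCOME"}',
]


def total_purchases_by_customer(customer_rows, event_lines):
    """Join JSON events to customer rows and sum purchase amounts per customer name."""
    names = {c["customer_id"]: c["name"] for c in customer_rows}
    totals = {name: 0.0 for name in names.values()}
    for line in event_lines:
        event = json.loads(line)  # parse the event as-is, with no upfront schema
        if event.get("event") == "purchase":
            totals[names[event["customer_id"]]] += event.get("amount", 0.0)
    return totals
```

Calling `total_purchases_by_customer(customers, raw_events)` combines both kinds of data in one pass. Fields that appear only in some events, such as `device` or `coupon`, are simply ignored rather than forcing a schema change.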
ANALYZING DISPARATE
DATA AT CHIME
Chime (chimecard.com) is smarter banking for the mobile generation.
Chime gathers and analyzes data across mobile, web, and back-end
server platforms to enhance its members' experiences while delivering
value to its business.
Chime satisfied the following requirements with its new, cloud data
warehouse:
knowledge and skills that aren't broadly available and may not
support SQL. A modern data warehouse should be architected
with leading technology but built on inclusive and established
standards (such as SQL) compatible with skills and tools commonly
available in the industry.
Saves Your Organization Money
A conventional data warehouse can cost millions of dollars in
licensing fees, hardware, and services; in the time and expertise
required to set up, manage, deploy, and tune the warehouse; and
in the costs to secure and back up data. In addition, building a data
warehouse that meets the business requirements and takes full
advantage of the volume and variety of today's data is often
cost-prohibitive for any organization.
Secures Data at Rest and in Transit
Data security covers the following two main areas:
In addition, the solution provider of a modern cloud data warehouse
must perform periodic security testing, known as penetration
testing, to proactively check for vulnerabilities. The vendor
must administer these measures consistently and automatically
without impacting performance.
IN THIS CHAPTER
Compressing the time-to-value gap
Chapter 4
On-Premises versus
Cloud Data Warehousing
When you're in the market for a new data warehouse, the
first choice to consider is where you want your data
warehouse located: your organization's data center or in
the cloud and provided as software-as-a-service. Traditional
on-premises data warehousing is a mature, well-established technology
designed well before cloud became a viable platform. With the
rapid adoption of cloud, there's a need for data warehouse solutions
that can take full advantage of what the cloud offers.
revenue shortfalls, and the risk of never implementing the project
due to scope creep.
A cloud data warehouse replaces the upfront and ongoing cost of
an on-premises system with simple usage-based pricing. You
pay a monthly fee based on the storage and computing
resources you use. Conservatively speaking, the annualized cost
for a cloud data warehouse solution can be one-tenth that of a similar,
on-premises system.
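The arithmetic behind usage-based pricing can be sketched as follows. The rates and quantities are made-up placeholders chosen to keep the numbers simple, not any vendor's actual prices.

```python
def monthly_cost(storage_tb, compute_hours, storage_rate_per_tb, compute_rate_per_hour):
    """Usage-based monthly fee: pay for the storage held plus the compute hours used."""
    return storage_tb * storage_rate_per_tb + compute_hours * compute_rate_per_hour


# Hypothetical rates: 25 per terabyte per month, 2 per compute hour.
busy_month = monthly_cost(storage_tb=10, compute_hours=100,
                          storage_rate_per_tb=25, compute_rate_per_hour=2)
quiet_month = monthly_cost(storage_tb=10, compute_hours=0,
                           storage_rate_per_tb=25, compute_rate_per_hour=2)
```

With these placeholder rates, the busy month costs 450 and the quiet month costs 250. A month with no queries is billed for storage only, which is the contrast with paying up front for peak capacity.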
Considering Data Preparation
and ETL Costs
An on-premises data warehouse must extract data from all of
your data sources. Then it must transform that data to adhere to
the often rigid data structure inside the system before loading it
into the warehouse. A key challenge is working within a finite
and expensive amount of processing capacity and storage. As a
result, data transformation must happen outside normal business
hours to avoid competing with other data processing jobs. This
is expensive. In addition, semi-structured data doesn't arrive in
the neatly organized and consistent rows and columns inherent to
traditional data structures. The data is also high-volume and
high-velocity.
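The extract, transform, and load sequence described above can be sketched with Python's standard library, using an in-memory SQLite database as a stand-in for the warehouse. The records, table name, and column names are hypothetical, and a real pipeline would be far more involved.

```python
import json
import sqlite3

# Raw source records (hypothetical). The second one carries an extra field
# that the fixed warehouse schema has no column for.
raw_records = [
    '{"user_name": "ana", "clicks": "12", "day": "2017-01-05"}',
    '{"user_name": "ben", "clicks": "7", "day": "2017-01-05", "referrer": "ad"}',
]


def extract(lines):
    """Extract: parse each raw JSON line into a dict."""
    return [json.loads(line) for line in lines]


def transform(records):
    """Transform: keep only the columns the rigid schema defines and cast types."""
    return [(r["user_name"], int(r["clicks"]), r["day"]) for r in records]


def load(conn, rows):
    """Load: insert the conformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks "
        "(user_name TEXT, clicks INTEGER, day TEXT)"
    )
    conn.executemany("INSERT INTO daily_clicks VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw_records)))
total_clicks, row_count = conn.execute(
    "SELECT SUM(clicks), COUNT(*) FROM daily_clicks"
).fetchone()
```

Note that the `referrer` field is dropped during the transform step because the table has no place for it. That loss is one reason the rigid, transform-before-load approach fits semi-structured data poorly.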
OPTIMIZING DOUBLEDOWN'S
DATA PIPELINE
DoubleDown, an online gaming studio, added a noSQL system to
their data pipeline to prepare data for loading into their data warehouse.
But this approach meant DoubleDown's daily event log (user
clicks and other data generated by gamers' activities) required long
processing times. The company couldn't access the previous day's
data until 3 p.m. the next day. Even worse, if one of its data computing
clusters went down, the company actually lost data.
When you're in the market for a new data warehouse, consider the
cost and availability of the skills and expertise required to manage
the data warehouse, and the many analytics and other tools used
in conjunction with a data warehouse.
Newer, cloud-built data warehouse solutions provide virtually
unlimited storage and compute; however, consider a data warehouse
that scales storage separately from compute (see Figure 4-1).
Ideally, the cloud data warehouse should scale in three ways:
Deliberating Over Delays and Downtime
Many companies with on-premises solutions have two main
complaints. They must wait hours or more than a day before data
collected the previous day is in the warehouse and available. They
must wait the same time for a complex query to run on a large
data set. In some cases, multiple, concurrent processes can freeze
or crash the system, extending delays and downtime.
As you evaluate your options, look for solutions that address all
these types of performance issues. How quickly you can access
your data and analytics can significantly impact your operations
and your ability to maintain a competitive edge.
Firewall protection
Security protocols
Data encryption, at rest and in transit
User roles and privileges
Monitoring and adapting to emerging security threats
Effective data security is complex and costly to implement, especially
in terms of human resources. Poorly implemented security
measures expose you to even more costs if breached.
The cloud provides an ideal solution for data protection and recovery.
By its nature, it stores data off premises. Some, but certainly not
all, cloud-based solutions automatically back up data to two or more
separate physical locations. If the data centers are geographically
isolated, then they also provide built-in disaster recovery. Cloud data
centers are equipped with redundant power supplies so they remain
up and running even during lengthy power outages. Cloud data
warehousing providers can deliver these protections at a much lower
cost than you could on your own by distributing the cost over thousands of clients.
IN THIS CHAPTER
Evaluating differences between cloud
data warehouse options
Chapter 5
Comparing Cloud Data
Warehouse Solutions
The growing adoption of cloud has caused legacy on-premises
vendors and recent market entrants to offer cloud versions
of their data warehouse products. Like any product or
service, no two cloud data warehouse solutions are the same. In
this chapter, we explain some of the differences and what to look
for among cloud data warehouse solutions.
data warehouse are identical to the same software deployed
using on-premises hardware.
Platform-as-a-service (PaaS): With this hybrid approach,
the data warehouse vendor provides the hardware and
software as a cloud service. The vendor manages the
hardware deployment, software installation, and software
configuration. However, the customer manages, tunes, and
optimizes the data warehouse software.
Software-as-a-service (SaaS): With the SaaS approach, the
data warehouse vendor provides all hardware and software
as part of its service, including all aspects of managing the
hardware and software. Typically included in the service are
software and hardware upgrades, security, availability, data
protection, and optimization.
Comparing Architectures
Many vendors offer a cloud data warehouse originally designed
and deployed for on-premises environments. These traditional
architectures were created long before the cloud and its benefits
emerged as a viable option. Alternatively, any data warehouse
solution built for the cloud should capitalize on the benefits of the
cloud (see Figure 5-1).
Illustration provided by Snowflake.
FIGURE 5-1: How a cloud-optimized architecture streamlines performance.
With this greater volume and variety of data, the cloud has
become a natural integration point. An ideal way to address this
issue is with a cloud data warehouse that can handle both relational
and non-relational data, without having to transform
the non-relational data or compromise performance during the
data loading or subsequent query process.
Loading and querying can happen concurrently. The cloud
data warehouse solution should enable separate compute
resources (nodes or even clusters) for different workloads.
This approach enables simultaneous loading and querying of
data without contention by assigning the conflicting processes
to independent compute clusters.
copy of the data warehouse is available in case of a failure. At the
other end of the spectrum, the vendor provides monitoring, replication,
and automatic failover as part of the service.
Gauging Performance
One of the great promises of the cloud is the ability to have huge
amounts of resources available that you can pay for only when
you need them. Imagine renting a Ferrari for that one important
occasion but using a fuel-efficient hybrid for your everyday
drive to work. Look for a cloud data warehouse solution that can
optimize performance on demand and eliminate the administrative
effort to incorporate new resources.
The most basic cloud data warehouse offerings provide only
basic security capabilities, leaving things like encryption, access
control, and security monitoring to the customer.
Other solutions offer features such as encryption and access
controls, which customers can choose to turn on, but they
leave the system vulnerable if not enabled.
Cloud data warehouse offerings that are more service-oriented
incorporate features for security and also assume
the burden of security management by providing encryption,
encryption key management, key rotation, intrusion detection,
and more, as part of the service.
A cloud data warehouse built on an older architecture is likely to
behave the same. To be effective, the cloud data warehouse should
easily configure multiple pools of compute resources (of varying
sizes) to separate the workloads of users and processes that need to
run concurrently. This approach eliminates contention and provides
resources sized to each workload. Ideally, these separate workloads
should access the same data simultaneously, and turn on and off
easily, based on need.
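As a rough analogy for separate, independently sized pools of compute reading the same data, consider the Python sketch below. Thread pools in a single process are only a stand-in: they do not provide the isolation of separate compute clusters, and the pool sizes, data, and query functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Shared data that every workload reads (a stand-in for shared storage).
sales = list(range(1, 101))


def dashboard_query(data):
    """A light reporting workload: total sales."""
    return sum(data)


def data_science_query(data):
    """A heavier exploratory workload: spread between largest and smallest sale."""
    return max(data) - min(data)


# Two pools of different sizes, one per workload, so neither queues behind the other.
with ThreadPoolExecutor(max_workers=2) as bi_pool, \
        ThreadPoolExecutor(max_workers=8) as ds_pool:
    bi_future = bi_pool.submit(dashboard_query, sales)
    ds_future = ds_pool.submit(data_science_query, sales)
    bi_result = bi_future.result()
    ds_result = ds_future.result()
```

Each pool is created when the work starts and released when the `with` block ends, which mirrors the idea of turning compute on and off based on need while the underlying data stays in one place.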
IN THIS CHAPTER
Listing your data warehouse needs and
success criteria
Chapter 6
Six Steps to Getting
Started with Cloud Data
Warehousing
Now that you know data warehousing basics, the differences
between on-premises and cloud data warehousing, and the
differences between various cloud data warehouses, you're
probably wondering how to apply your newly acquired knowledge
to choose the right cloud data warehouse for your organization.
In this chapter, we guide you through six key steps to
choosing a cloud data warehouse. The process starts with evaluating your
data warehouse needs and ends with testing your
top choice. By the end of this chapter, you'll have a plan to help
you choose your next solution with confidence.
Step 2: Migrate or Start Fresh
Every cloud data warehouse project should start with assessing
how much of your existing environment should migrate to the
new system, and what should be built new for a cloud data warehouse.
These decisions may address everything from design of
the extract, transform, and load (ETL) processes to data models
and software development lifecycle methods. Here are a few key
considerations:
Step 3: Establish Success Criteria
How will you measure the success of moving to a new cloud data
warehouse? Choose important business and technical requirements.
Criteria should focus on the performance, concurrency,
simplicity, and total cost of ownership.
White Ops had previously relied on noSQL systems to store and process
that data. However, that approach required a developer to build
a custom query. The latency for results was at least 24 hours, depending
on the workload. The more requests, the longer the delays.
If your new cloud data warehouse has capabilities that weren't
available in your previous system, and those capabilities are relevant
to evaluating the business and technical success of your new
solution, be sure to include them.
Isolates workloads, so workloads do not compete for limited
resources
Scales compute and storage independently and automatically,
and scales concurrency without impacting performance
opt for an infrastructure-as-a-service (IaaS) or platform-as-a-service
(PaaS) solution (see Chapter 5), you need to add the costs
of whatever software, administration, and services the solution
doesn't include.
This is also your chance to think outside the box. Consider what
else you could do above and beyond what you do today. If you had
this cloud data warehouse solution in place, what additional business
value could this system deliver?
When setting up your proof of concept (POC), list all requirements and success criteria,
not just the issues you're trying to resolve. For example, if
your primary complaint about your current data warehouse is that
queries take too long to run, don't focus solely on that issue. Your
POC should cover everything, including ease of migrating your
data to the new warehouse, loading new structured and
semi-structured data, running queries and multiple workloads, and
using your existing business intelligence tools.
Develop a comprehensive checklist. Use your list of data warehousing
needs and your criteria for success as a starting point.
But don't overlook positive qualities of your existing data warehouse
that are non-issues for you now. In other words, make
sure the new data warehouse can do everything your current data
warehouse does, but better, and that it overcomes the drawbacks
of your current warehouse.
If you do a POC with multiple vendors, try to use the same
checklist for each.
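One way to apply the same list to every vendor is to record it once and score each vendor against it. The Python sketch below is only an illustration: the criteria are drawn from this chapter, but the weights, vendor names, and pass/fail results are hypothetical.

```python
# A single shared checklist: criterion -> weight (weights are hypothetical).
CHECKLIST = {
    "query performance": 3,
    "concurrency": 2,
    "semi-structured data loading": 2,
    "works with existing BI tools": 1,
}


def score_vendor(results):
    """Sum the weights of the criteria a vendor passed; missing criteria count as failed."""
    return sum(weight for criterion, weight in CHECKLIST.items()
               if results.get(criterion, False))


# Hypothetical POC outcomes recorded against the same checklist.
vendor_a = {"query performance": True, "concurrency": False,
            "semi-structured data loading": True, "works with existing BI tools": True}
vendor_b = {"query performance": True, "concurrency": True}

scores = {"Vendor A": score_vendor(vendor_a), "Vendor B": score_vendor(vendor_b)}
```

Because a criterion that was never tested counts as a failure, an incomplete POC shows up as a lower score rather than silently skipping a requirement.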
WILEY END USER LICENSE AGREEMENT
Go to www.wiley.com/go/eula to access Wiley's ebook EULA.