
Beyond "Lift and Shift"

Rearchitecting BI, ETL, and Data


Warehousing for the Cloud

By David Stodder

Sponsored by:
TDWI PULSE REPORT

 Q4 2020

Beyond "Lift and Shift" Table of Contents


Rearchitecting BI, ETL, and Data The Pulse: Moving Beyond Limited Practices and Mindsets . . 3

Warehousing for the Cloud Leading Drivers, Trends, and Challenges . . . . . . . . . . . . 5

Three Critical Areas of Focus . . . . . . . . . . . . . . . . . . 8

The Data Analytics Platform . . . . . . . . . . . . . . . . 8


By David Stodder Data Transformation, Preparation, and Pipelines . . . . .10

BI and Analytics Systems . . . . . . . . . . . . . . . . .12

Recommendations . . . . . . . . . . . . . . . . . . . . . . . 15

A Final Point: Achieving Business Advantages with Data . . . 17

About Our Sponsors . . . . . . . . . . . . . . . . . . . . . . .18

© 2020 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions
in whole or in part are prohibited except by written permission. Email requests or
feedback to [email protected].

Product and company names mentioned herein may be trademarks and/or registered trademarks of
their respective companies. Inclusion of a vendor,
product, or service in TDWI research does not constitute an endorsement by TDWI
or its management. Sponsorship of a publication should not be construed as an
endorsement of the sponsor organization or validation of its claims.

This report is based on independent research and represents TDWI’s findings; reader experience
may differ. The information contained in this report was
obtained from sources believed to be reliable at the time of publication. Features
and specifications can and do change frequently; readers are encouraged to
visit vendor websites for updated information. TDWI shall not be liable for any
omissions or errors in the information in this report.


About the Author


DAVID STODDER is senior director of TDWI Research for business intelligence. He focuses
on providing research-based insight and best practices for organizations implementing BI,
analytics, performance management, data discovery, data visualization, and related technologies
and methods. He is the author of TDWI Best Practices Reports and Checklist Reports on data-
driven decision-making, increasing the value of BI and analytics, customer intelligence, visual
analytics, BI/DW agility, mobile BI, big data, and information management. He has chaired
TDWI conferences on BI agility and big data analytics. Stodder has provided thought leadership
on BI, information management, and IT management for over two decades. He has served as vice
president and research director with Ventana Research, and he was the founding chief editor of
Intelligent Enterprise, where he served as editorial director for nine years. You can reach him at
[email protected], @dstodder on Twitter, and on LinkedIn at linkedin.com/in/davidstodder.

About TDWI Research


TDWI Research provides industry-leading research and advice for data and analytics
professionals worldwide. TDWI Research focuses on modern data management, analytics,
and data science approaches and teams up with industry thought leaders and practitioners to
deliver both broad and deep understanding of business and technical challenges surrounding
the deployment and use of data and analytics. TDWI Research offers in-depth research reports,
commentary, assessments, inquiry services, and topical conferences as well as strategic planning
services to user and vendor organizations.

About TDWI Pulse Reports


This series offers focused research and analysis of trending analytics, business intelligence, and
data management issues facing organizations. The reports are designed to educate technical and
business professionals and aid them in developing strategies for improvement. Research for the
reports is conducted through surveys of professionals. To suggest a topic, please contact TDWI
senior research directors Fern Halper ([email protected]), Philip Russom ([email protected]),
and David Stodder ([email protected]).

Acknowledgments
TDWI would like to thank many people who contributed to this report, including our report
sponsors, who reviewed outlines and report drafts. Finally, we would like to recognize TDWI’s
production team: James Powell, Lindsay Stares, Rich Seeley, and Rod Gosser.

Sponsors
Amazon Web Services, Matillion, and Sisense sponsored the writing of this report.


The Pulse: Moving Beyond Limited Practices and Mindsets

Most organizations could list a variety of reasons why migrating business intelligence, analytics,
and data management to the cloud is a good idea. However, tying them all together is one big
reason: the opportunity to fuel business innovation with unlimited data.

Cloud migration can accelerate the use of data and analytics to accomplish familiar but
essential business goals such as optimizing processes, driving stronger financial performance,
proactively responding to risk and fraud incidents, and building greater customer loyalty and
satisfaction. Timely data insights can create competitive differentiation in each of these areas.
Yet, organizations also need to seize the opportunity to unleash business creativity with data to
intelligently expand into new markets and develop services that monetize data and analytics to
increase return on investment.

Cloud computing offers scalability, flexibility, cost elasticity, and speed to deployment—all
highly attractive attributes essential to data-fueled innovation. TDWI research finds that for
many organizations, adoption of cloud computing goes hand-in-hand with the growing business
stakeholder role in the direction of BI, analytics, and data management; 54% of organizations
surveyed anticipate that the business side’s role in solution and service selection and investment
will increase in the next year.1 This suggests that for many organizations, the impetus for expanding
into the cloud is to make BI, analytics, and data management easier, faster, and more flexible so that
all types of business users can answer critical questions and solve problems.

Executives, managers, and frontline personnel need continuous access to larger and more diverse
data sets so they can gain a complete and contextual understanding of trends, patterns, and events in
their markets, supply chains, and manufacturing processes as well as stay on top of regulatory and
data governance responsibilities. The importance of access to all relevant data is never higher than
during times of unanticipated change. Executives and line-of-business (LOB) managers are under
pressure to respond; they need to make informed, fact-based decisions about personnel, inventory,
marketing strategies, logistics, and other concerns. These users want as little friction as possible in
accessing data so they can quickly visualize and analyze integrated sets of historical and real-time
data from different perspectives. They need predictive insights and prescriptive recommendations.

Organizations are pushing beyond the limits of traditional, on-premises systems. To adapt
to change, often the first consideration is to augment on-premises systems with pay-as-you-go,
cloud-based BI, analytics, and data management. Organizations with investments in existing,
on-premises systems find they have reached their limits and it would be too expensive and time-
consuming to expand capacity or add new instances. The inability of these systems to respond to
immediate and changing business needs is often what drives migration to the cloud.

For example, a marketing function might need to analyze new and diverse data about customers
to support an upcoming campaign—data that pushes beyond the limits of what the organization’s
current extract, transform, and load (ETL) and data warehouse systems can handle.

1. 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, page 12. Online at tdwi.org/bpreports.

The marketing
managers find that managed BI reports and dashboards no longer fit their requirements because
use cases have changed; they need more personalization and the flexibility to pose unanticipated,
ad hoc, and often more complex queries. Marketing’s data scientists want to explore data beyond
what is in the data warehouse; they want to add new variables to predictive models and develop
machine-learning algorithms. Managers need data-driven prescriptive recommendations for
marketing campaigns and omnichannel customer engagement.

Such organizations often discover that working with new cloud-based systems frees them from
not only the physical and economic constraints of on-premises systems but also legacy practices
that have been shaped by those constraints. These practices have been honed by administrators to
deliver results within the limits of traditional BI, ETL, and data warehouse capacities. They were
developed based on what was feasible for achieving planned service levels of speed, availability,
and concurrency with existing on-premises platforms—and no more.

If organizations simply impose these same practices in the new terrain of cloud-based systems,
they become a source of friction that slows down integration of new data and support for new
and different workloads. Thus, when moving to the cloud, it is important to revise mindsets about
how to manage, transform, and analyze data. In the cloud, many of the boundaries associated with
traditional on-premises systems no longer exist.

Data-driven business objectives demand more than “lift and shift.” To realize the potential
of cloud-based BI, analytics, transformation, and data management, organizations need to
do more than just “lift and shift” existing systems and practices into the cloud. To be sure, a
simple lift-and-shift project can reduce costs and improve performance and resiliency, especially
for applications you may not otherwise touch again or you plan to retire soon. However,
organizations that embark on modernizing their data architecture will have the opportunity to
upgrade performance, scalability, and security. A modern architecture that takes full advantage
of the cloud will allow organizations to add new business capabilities, support a culture of
innovation, and attain a higher ROI by leveraging data as a strategic asset.

One key reason to upgrade is to support democratization: that is, expansion in the number and
diversity of people who want to access and interact with data. As noted, business executives and
managers are now important stakeholders in cloud BI and analytics services adoption. Business
users want more self-service capabilities so they can depend on IT less. Cloud-based services can
empower users to answer questions and solve problems with less IT intervention. Organizations,
therefore, need to evaluate how they can rearchitect BI, analytics, and data management. They must
rethink existing practices to realize the potential of the cloud for supporting growth in self-service
data interaction.


Leading Drivers, Trends, and Challenges


Together, cloud-based BI, analytics, data transformation, and data management offer organizations
opportunities to accelerate business-critical projects that languish with traditional on-premises
systems and practices because these are not adequately scalable and are slow and expensive to
modernize. An overwhelming percentage of organizations surveyed by TDWI (86%) say that cloud
data management is important to the success of their data strategy.2 Today, newer companies do not
wait until they have built an internal IT and data management infrastructure; they commonly adopt
a cloud-first strategy. Increasingly, more established firms are following suit and making the cloud
their first choice for new applications and data platforms.

Leading drivers identified by TDWI research for cloud-first or cloud-migration strategies include:

• Lower up-front investment for expanding data use; being able to afford newer ways of
achieving scale and speed

• Support for expansion of self-service BI and analytics to new types of users (beyond traditional
business analysts) and new types of projects; fostering easier data collaboration and connectivity
between people and between applications

• Shorter data preparation delays, especially for ETL or extract, load, and transform (ELT)
processes, by using scalable, cloud-based processing

• Better alignment of data storage and management with business users’ access patterns (i.e., hot,
warm, and cold classifications for data storage)

• More scalable and performant data architecture to support growth in prototyping and
deployment of artificial intelligence (AI) techniques such as machine learning (ML)

• Development of new product lines, services, and embedded application functionality based on
data and analytics (i.e., data monetization)

Data security has been a concern for organizations moving to the cloud, but TDWI research
finds that although data security is always a priority, organizations are increasingly confident in the
security of cloud applications and databases. With modern practices and technologies, cloud-based
systems can be more secure than on-premises systems.

This level of assurance is reflected in Figure 1, which displays TDWI research into which BI,
analytics, and data management systems organizations are running on cloud-based (i.e., platform-
or software-as-a-service [PaaS or SaaS]) platforms. More than half of respondents say they
have enterprise BI, reporting, and dashboards in the cloud (54%), which indicates that many
organizations are confident in using the cloud to support secure, mainstream data consumption
critical to driving daily business decisions. Large organizations are tapping the cloud’s scalability
and processing performance advantages to support the hundreds (if not thousands) of users of
enterprise BI, reporting, and dashboards.

2. 2020 TDWI Best Practices Report: Cloud Data Management, online at tdwi.org/bpreports.


Nearly as many respondents are running business-driven, self-service BI and analytics in the
cloud (51%). This aligns with our research noted earlier that the majority of organizations
surveyed see the business side’s role increasing in the acquisition of BI, analytics, and data
management cloud services. These users are interested in self-service capabilities.

In your organization, which of the following BI, analytics, and data management systems are
currently running on cloud-based (e.g., PaaS) or SaaS platforms?

Enterprise BI, reporting, and dashboards 54%
Business-driven, self-service BI and analytics 51%
Data warehouse 41%
Data lake 39%
Data transformation (ETL) and preparation 38%
Predictive analytics and AI/ML 32%
Mobile BI and analytics 31%
Operational alerting and notification 25%
Data pipelines 25%
Data quality, profiling, and validation processes 22%
Forecasting, budgeting, and financial planning 17%
Data virtualization and federation 16%
Data catalog, glossary, and/or metadata management 16%
OLAP cube 14%

Figure 1. Based on answers from 138 respondents. Respondents could select all answers that applied.
(Source: 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern
Business Analytics, page 29.)

Scalability, flexibility, and speed draw data analytics platforms to the cloud. About two
in five respondents have a data warehouse in the cloud (41%) plus related data transformation,
quality, profiling, and validation processes. A data warehouse is vital for providing users with
properly prepared, structured, and transformed data for BI, reporting, dashboards, and many
forms of analytics. A standard data warehouse centralizes data from the organization’s business
applications, transaction systems, and various external data sources such as third-party credit
reporting or customer segmentation services.

Using the scalability, flexibility, and speed available, a cloud data analytics platform such as
a cloud data warehouse can support all types of BI and analytics including data monetization
projects aimed at creating new, data-rich products and services for business partners, suppliers, and
customers. Related TDWI research finds that getting more business value from data, whether in
operations or analytics, is the top priority for cloud data management.

Figure 1 shows that a significant percentage of organizations are running predictive analytics
and AI/ML in the cloud (32%). New business-driven products and services increasingly require
advanced analytics that can deliver unique insights and prescriptive recommendations that partners,
suppliers, and customers (not to mention employees) cannot get anywhere else. Organizations
surveyed in related TDWI research say that capturing and leveraging emerging data types and
sources is a key driver behind cloud data management. New analytics-driven products and services
tap these often semistructured and unstructured data sources to provide advanced business insights
and to feed AI/ML algorithms.

A common place to store and manage emerging data types is a data lake, which 39% indicate
they are running in the cloud, typically in a scalable and cost-elastic cloud storage system. A
data lake provides a central repository for collecting a variety of structured, semistructured, and
raw, unstructured data. Some organizations use a cloud data lake as a place to put data they plan
to prepare and transform for the cloud data warehouse. However, many organizations use data
lakes to support data science projects involving exploratory, predictive, real-time analytics and
AI/ML. These projects often need access to hundreds of terabytes if not petabytes of data from
log files, social media, mobile device data, and other types of raw data that may be streamed into
the lake in real time.

Technologies are evolving to enable “citizen data professionals”—data-savvy business users
and analysts who do not have formal data science or IT expertise—to do more of their own data
ingestion, refinement, analytics, and visualization. Cloud analytics platforms support this trend,
which is critical if organizations are to keep up with business user demands for data-driven insights
without having to hire legions of data scientists and engineers, who are in high demand.

Cloud analytics platforms enable users to accelerate data preparation, transformation,
and pipelines. As organizations position more data sources, data warehouses, and data lakes
in the cloud, it makes sense to bring related data preparation, transformation, and pipeline
development there as well. In Figure 1, we can see that 38% of organizations surveyed are
running data transformation (ETL) and preparation in the cloud and 25% are running cloud-
based data pipelines.

Pipelines are designed to move data from sources to data lakes, data warehouses, or other target
data platforms to support visualization, analytics, and AI/ML. Some transport raw, streaming data
to target data lakes while others incorporate processes for data profiling, cleansing, transformation,
enrichment, and governance. Today, some solutions offer easy-to-use workspaces that enable
citizen data professionals to do more of their own data preparation, transformation, and pipeline
development rather than depend on IT.


Three Critical Areas of Focus


With these cloud deployment research trends as context, we can look more closely at key
challenges and rearchitecture strategies in three important areas: the data analytics platform;
transformation, preparation, and pipelines; and the BI and analytics platform.

The Data Analytics Platform


Technology advances continue to expand the boundaries of cloud data analytics platforms such as
data warehouses and data lakes, enabling organizations to move beyond the limits of on-premises
systems. Traditional data warehouses were designed primarily to support repeatable and auditable
managed reporting. These systems are less suited for today’s demands that include significant ad
hoc querying, flexible exploration of new data, and complex analytics workloads that require high-
performance processing and access to granular and unstructured data. Traditional, on-premises data
warehouses are notorious for taking more than a year (or multiple years) to develop and deploy,
with slow processes for incorporating new data. Cloud data warehouses can be set up faster.

Traditional data warehouse limitations have driven many line-of-business and departmental users
to set up their own data marts and/or BI platforms, both on premises and in the cloud. This has
exacerbated the proliferation of data silos that makes it difficult to gain a single integrated view of
all relevant data and to govern and secure data use.

Cloud data warehouses operate in a modern technology environment, which enables organizations
to update their strategy and address problems that have held back traditional on-premises data
warehouses. Cloud data warehouses can support standard managed enterprise reporting and
operational dashboards, but they have additional flexibility to stand up to greater numbers
of concurrent users and workloads and can be optimized as needed for faster performance.
Organizations can manage the volume of data needed for complex analytics for developing
predictive insights, exploring new data, and creating prescriptive recommendations.

Because cloud data warehouses can tap as much cloud storage as they need and run on massively
parallel processing (MPP) database engines, they have virtually unlimited scalability; resources can
be added or removed to provide cost elasticity. Some cloud data warehouses offer columnar data
storage, an approach that optimizes query performance by reducing the data needed from disk (I/O)
to answer queries. Columnar databases are a good fit for analytics workloads, which often need to
access a small number of deep columns to return query results.
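
As a rough illustration of why a columnar layout reduces I/O for such queries, the following Python sketch (purely illustrative; the table shape, column names, and sizes are invented) contrasts how many values a row-oriented scan versus a column-oriented scan must touch to answer a single-column aggregate.

```python
# Illustrative only: contrast row-oriented and column-oriented access for one
# aggregate query (e.g., "average sale_amount") over a wide, 50-column table.
import random

NUM_ROWS, NUM_COLS = 100_000, 50
COLUMNS = [f"col_{i}" for i in range(NUM_COLS)]
COLUMNS[7] = "sale_amount"  # the only column the query actually needs

# Row store: each row holds all 50 values together.
row_store = [{c: random.random() for c in COLUMNS} for _ in range(NUM_ROWS)]

# Column store: one contiguous list of values per column.
column_store = {c: [row[c] for row in row_store] for c in COLUMNS}

# Row-oriented scan: every row (all 50 values) must be read to use one field.
avg_from_rows = sum(r["sale_amount"] for r in row_store) / NUM_ROWS
values_read_row_scan = NUM_ROWS * NUM_COLS

# Column-oriented scan: only the single needed column is read.
avg_from_columns = sum(column_store["sale_amount"]) / NUM_ROWS
values_read_column_scan = NUM_ROWS

print(f"row scan reads {values_read_row_scan:,} values; "
      f"column scan reads {values_read_column_scan:,} values")
```

In a real columnar warehouse the savings come from reading fewer blocks from disk, but the proportion is the same: queries that touch a handful of columns avoid scanning the rest of the table.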


Cloud data analytics platforms can benefit from a flexible, loosely coupled architecture.
Cloud data analytics platforms such as a cloud data warehouse can take advantage of the trend
away from tightly coupled data computation and storage toward a loosely coupled separation
of the two. Remote, network-attached object storage is much less expensive than local storage,
which is better reserved for transient data. Loosely coupling storage and computation increases
flexibility over where to store data and where to perform computational processing.

A loosely coupled architecture also improves scalability, availability, and recovery time. It eliminates
data management chores such as shipping data across MPP nodes and rebalancing the entire system
when new nodes are added, or recovering locally stored persistent data when a node fails.

With the explosion in diverse, high-volume, and fast big data, many organizations have set up
data lakes as a landing place for rapid batch-loading or streaming of primarily raw and granular
data that does not conform to the structure and format of a data warehouse. As noted, TDWI
research finds that many organizations are taking advantage of inexpensive cloud storage to
establish cloud data lakes.

Most data lakes use a schema-on-read approach in which users (such as data scientists or
engineers) and applications transform, format, and organize the data schema as needed after
landing data in the cloud data lake. This contrasts with the schema-on-write approach of traditional
data warehouses, which involves long ETL processes to format and transform data before it lands in
the data warehouse. Schema-on-read allows the data lake to support diverse and dynamic analytics
and AI/ML workloads.
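
The contrast can be illustrated with a minimal, standard-library Python sketch of schema-on-read; the event records, field names, and types below are hypothetical. Raw records land untouched, and a schema (field selection and type casts) is applied only when a consumer reads them.

```python
import json
from datetime import datetime

# Raw, untyped events as they might land in a data lake. With schema-on-read,
# nothing is validated or reshaped at load time.
raw_events = [
    '{"ts": "2020-10-05T14:01:22", "device_id": "sensor-17", "temp_c": "21.4"}',
    '{"ts": "2020-10-05T14:01:30", "device_id": "sensor-09", "temp_c": "19.8", "extra": "x"}',
]

def read_with_schema(lines):
    """Apply the schema only at read time: select fields and cast types."""
    for line in lines:
        record = json.loads(line)
        yield {
            "ts": datetime.fromisoformat(record["ts"]),
            "device_id": record["device_id"],
            "temp_c": float(record["temp_c"]),
        }

for event in read_with_schema(raw_events):
    print(event)
```

A schema-on-write pipeline would instead perform these casts (and reject malformed records) before the data ever reached the warehouse tables.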

Cloud data lakes and data warehouses can become integrated “data lakehouses.” Just as data
scientists and analysts want to access not only the data lake but also the contents of the data
warehouse, many data-savvy business users (that is, citizen data professionals) also want access
to the data lake. To enable broader accessibility for all
users along with more efficient and comprehensive data management and governance, some
organizations are examining how to integrate the data warehouse and data lake into a single,
seamless architecture. Some experts in the technology industry call this a “data lakehouse.” This
unified approach can serve as part of a larger strategy to consolidate distributed data mart silos
into one centralized cloud data warehouse and then integrate the data warehouse more tightly
with the cloud data lake.

Offering a more complete data architecture, a cloud data lakehouse could continue to support
classic data warehouse workloads for BI reporting, dashboards, and online analytical processing
(OLAP) on scalable and elastic cloud storage and processing. Users would be able to set up
transformation, data cleansing, and other preparation processes to run inside the data lakehouse
rather than in a separate staging area. The cloud data lakehouse could also provide BI and search
access to the cloud data lake. The data lake would continue to be available for advanced analytics
and AI/ML that requires access to diverse, raw, and granular data. All users, analytics, and AI/ML
programs would have access to different types of data, including semi- and unstructured data.

Better integration between the cloud data warehouse and data lake would enable organizations to
think holistically about data availability. With visibility into data use across the entire architecture,
organizations could set up cloud storage based on assessments of what data is “hot” (frequently
accessed), “warm” (less-frequently accessed), and “cold” (rarely accessed). Organizations could
then monitor data use over time across the entire architecture and adjust the positioning of data in
cloud storage based on usage patterns.
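
One simple way to operationalize that monitoring is a periodic job that classifies data sets by observed access recency and maps each class to a storage tier; in practice the tier change would be applied through a cloud provider's storage-lifecycle policies. The thresholds, data set names, and tiers in this Python sketch are invented for illustration.

```python
# Hypothetical access statistics: data set name -> days since it was last queried.
days_since_last_access = {
    "sales_fact_current_quarter": 0,
    "customer_segments_2019": 45,
    "clickstream_raw_2017": 400,
}

def storage_tier(days_idle: int) -> str:
    """Map access recency to a storage tier (illustrative thresholds only)."""
    if days_idle <= 7:
        return "hot"   # frequently accessed: fast, more expensive storage
    if days_idle <= 90:
        return "warm"  # occasionally accessed: standard storage
    return "cold"      # rarely accessed: low-cost archival storage

for dataset, idle_days in days_since_last_access.items():
    print(f"{dataset}: move to {storage_tier(idle_days)} tier")
```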

Data Transformation, Preparation, and Pipelines


Traditional ETL involves moving data to a specialized staging area where it is then cleansed and
transformed before being moved to a target system, typically a data warehouse. ETL programming
usually requires IT-level expertise. IT is also needed to manage often numerous (possibly hundreds
or thousands of) ETL processes of varying complexity. As a result, ETL is often slowed by the need
for manual IT work and cannot be performed by nontechnical users on their own.

TDWI research finds that a significant percentage of organizations surveyed (38%) regard data
loading, movement, and integration steps associated with ETL as among their biggest challenges
in trying to augment or replace on-premises systems with new cloud-based services. Only 22% are
very satisfied with self-service data loading and movement capabilities in their current systems.3

As an alternative, many organizations, led by their data scientists and citizen data professionals,
have chosen to develop data pipelines that rapidly load data into a data lake as the target platform
and then perform transformations after loading. This approach, known as ELT, has the advantage
of being able to use powerful and scalable MPP database engines for faster and more reliable
performance of complex data transformation and cleansing. Solutions today allow data pipelines
to be reusable, which enables less-technical, self-service users to load data on their own without
intensive development. Data scientists as well as citizen data professionals can be more productive
and access usable data sooner. Figure 2 shows the difference between traditional ETL and ELT.

3. 2020 TDWI Best Practices Report: Faster Insights from Faster Data, online at tdwi.org/bpreports.


Figure 2. Comparing ETL and ELT. (Diagram: in the ETL flow, data is extracted from sources, transformed in staging tables, and loaded into final tables in the target; in the ELT flow, data is extracted from sources and loaded directly into the target MPP database, where it is then transformed.)
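
To make the ELT pattern concrete, here is a minimal, hedged Python sketch that uses the standard library's sqlite3 module as a stand-in for a cloud MPP warehouse; the table names, columns, and transformation are invented. Raw rows are extracted and loaded first, and the transformation then runs as SQL inside the target database rather than in a separate staging engine.

```python
import sqlite3

# sqlite3 stands in for the target cloud data warehouse in this sketch.
conn = sqlite3.connect(":memory:")

# Extract & Load: land the raw source rows in the target with no reshaping.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
source_rows = [("o-1", "19.99", "us"), ("o-2", "5.00", "US"), ("o-3", "n/a", "de")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", source_rows)

# Transform: run inside the warehouse, letting its (scalable) SQL engine cast
# types, standardize values, and filter out unusable records.
conn.execute("""
    CREATE TABLE orders AS
    SELECT order_id,
           CAST(amount AS REAL) AS amount,
           UPPER(country)       AS country
    FROM raw_orders
    WHERE amount GLOB '[0-9]*'
""")

for row in conn.execute("SELECT * FROM orders"):
    print(row)
```

In a real deployment the load step would bulk-copy files from cloud storage into the warehouse and the transformation SQL would be generated and orchestrated by an ELT tool; the ordering of the steps is what matters.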

ELT can function within a data lakehouse to address broad needs. Organizations that
choose to rearchitect their data platforms into a cloud data lakehouse, which as described
earlier integrates the data warehouse more tightly with the data lake, can use ELT to serve
both traditional, schema-enforced BI data warehouse workloads (that predominantly use SQL
and advanced analytics) and AI/ML workloads (that use Python, R, Apache Spark, or machine
learning libraries).

With an integrated cloud data lakehouse, raw data in a cloud storage-based data lake would be
available for transformation routines as well as related preparation processes for data enrichment,
quality, and refinement. Data for advanced analytics and AI/ML often requires some cleansing
and transformation; using available solutions that have self-service capabilities, data scientists and
engineers can determine what is needed. Finally, with the data lakehouse, organizations can govern
data transformations and track data lineage more effectively than is possible across disparate data
warehouses, data marts, and data lakes.

Automation reduces the need for manual transformation and connectivity. TDWI
research finds that most organizations struggle to increase self-service data preparation and
transformation and reduce dependence on IT specialists. Automation and easier-to-use graphical
user interfaces are critical to relieving users of the manual data wrangling necessary to transform
and prepare diverse data, especially raw data contained in data lakes. This manual work typically
requires technology experts. Specialized projects will always demand close, manual work by
data scientists and engineers, but as ELT pipelines become more numerous and common, it is
important to apply automated solutions that increase standardization, repeatability, and speed in
development and deployment across projects.

Automation and better user interfaces can reduce the complexity and inconsistency that often
make data transformation and pipeline development problematic for nontechnical users and lead
to delays in developing data insights that business decision makers need immediately. Solutions
offer prebuilt connectors that reduce the workload for data scientists, data engineers, and
developers who traditionally must hand-code connectivity to extract data. Solutions can provide
monitoring visibility so users can examine the data during pipeline extraction, loading, and
transformation processes.

Monitoring enhanced with AI-driven automation can surface anomalies and data quality issues
in large and varied data volumes, enabling users to address them sooner. This is important as
organizations stream or rapid batch-load data from newer sources such as social media and Internet
of Things (IoT) sensors into cloud data lakes or data lakehouses. Users need to be made aware of
anomalies and inconsistencies so they can determine what transformation, cleansing, enrichment, or
other preparation processes are necessary.
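
As one hedged illustration of what such automated monitoring can do, the sketch below flags an incoming batch whose null rate for a field deviates sharply from recent history; the statistic, threshold, and field names are invented, and production tools apply far richer ML-driven profiling.

```python
from statistics import mean, stdev

# Historical null rates (fraction of records missing "customer_id") in recent batches.
historical_null_rates = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010, 0.008]

def is_anomalous(new_rate, history, z_threshold=3.0):
    """Flag a batch whose null rate falls far outside the historical spread."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_rate != mu
    return abs(new_rate - mu) / sigma > z_threshold

incoming_batch_null_rate = 0.18  # e.g., an upstream source silently dropped a field
if is_anomalous(incoming_batch_null_rate, historical_null_rates):
    print("Alert: customer_id null rate is anomalous; hold the batch for review.")
```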

BI and Analytics Platforms


To further goals for informed decisions and actions, organizations need a data architecture that can
support a broad range of user skill levels for performing operations on data. These range from self-
service BI and dashboards to predictive analytics and AI/ML. As business users move beyond static
dashboards and reporting, they need more than just the confined set of data in their traditional data
mart or data warehouse. With access to more data, business users will ask more varied questions to
gain uncommon insights. This deeper analytics engagement will lead to a higher number of ad hoc
queries to, for example, understand performance metrics in context and deliver answers.

Natural language search and query capabilities are important to bridge skills gaps and enable users
to uncover deep insights. Providing natural language interaction and better ease of use for all skill
levels is key to increasing adoption and productivity because these capabilities reduce the need for
coding expertise. Users working with standalone tools as well as those using business applications
or web services with embedded dashboards and analytics need natural language interaction and
ease-of-use features.

Even as organizations focus on enabling nontechnical business users to interact with data
effectively, they must also provide tools that help data scientists and advanced citizen data
professionals push the envelope and drive innovation. This involves providing access to integrated
volumes of data from diverse sources. Organizations need advanced and flexible BI and analytics
tools to keep pace with these more advanced users’ needs.


Modern tools can play a key role in ensuring that as workloads become more advanced,
organizations take advantage of automation and scalability so users do not experience delays or
become mired in manual coding. Modern tools can augment BI and analytics user experiences
by providing AI-driven recommendations and AI-led automation of complex activities such as
predictive forecasting and real-time analytics.

Four issues are important to address when rearchitecting BI and analytics to take advantage of new
data transformation capabilities and modern data analytics platforms in the cloud:

Issue #1: Data democratization. This is an essential first area of focus. Organizations want
technologies that enable data-driven decisions for the full range of users, from nontechnical
data consumers to citizen data professionals to expert data scientists and data analysts. To
support embedded BI and analytics within the organization as well as the creation of data-driven
products and services for external customers and partners, organizations need flexible tools for
creative, business-driven analytics app development that do not require constant IT developer
handholding. “Citizen” developers on the business side will want to tailor user experiences to
different personas so that BI and analytics fit each user’s context as determined by the type
of decisions they make, their daily information needs, security and authentication levels, and
responsibilities that come with their positions.

Issue #2: Access to a fuller range of data. Digital transformation of business processes and
applications drives more data online, where it may be ingested into a cloud data warehouse, data
lake, or data lakehouse. Businesses need to tap this data to fuel innovation in how they optimize
processes, improve customer and partner relationships, and monitor trends and patterns for
predictive insights. Thus, it is imperative that organizations enable all types of users to access
the full variety of data, wherever it is located.

Many users today need access to live, real-time data. Too often, they are limited to historical data
that is not updated frequently enough to deliver timely insights. TDWI research finds that only 9%
of organizations surveyed currently have very satisfactory capabilities for accessing live or real-
time data.4 Modern BI and analytics tools can offer improved capabilities for analyzing live or real-
time data alongside historical sources.

Ease of use is key to satisfaction, especially as less-technical business users explore data outside
the carefully prepared contents of the data warehouse. BI and analytics tools that provide intuitive
user interfaces with drag-and-drop functionality can reduce frustration and improve productivity.
Tools integrated with underlying semantic layers and data models can help users by automating
processes for resolving inconsistencies and quality issues discovered as users blend and mash
data from different sources. The semantic layer is also key to formalizing data definitions,
formats, and lineage.
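
A semantic layer can be thought of as a shared mapping from business terms to physical columns and agreed formulas, so every tool derives a metric the same way; commercial products add far more (security, lineage, caching). The metric names, table, and generated SQL in this Python sketch are purely hypothetical.

```python
# A toy semantic layer: business-friendly metric names mapped to physical
# columns and agreed calculations, so every tool computes them identically.
SEMANTIC_LAYER = {
    "Revenue":       {"table": "orders", "expression": "SUM(amount)"},
    "Order Count":   {"table": "orders", "expression": "COUNT(order_id)"},
    "Average Order": {"table": "orders", "expression": "SUM(amount) / COUNT(order_id)"},
}

def metric_sql(metric_name, group_by):
    """Generate consistent SQL for a governed metric definition."""
    metric = SEMANTIC_LAYER[metric_name]
    return (f'SELECT {group_by}, {metric["expression"]} AS "{metric_name}" '
            f'FROM {metric["table"]} GROUP BY {group_by}')

print(metric_sql("Revenue", group_by="country"))
```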

Issue #3: Scalability and cost elasticity. Organizations require a highly scalable data architecture
to accommodate growing needs to access high volumes of diverse data and perform complex,
less-predictable analytics. They need an architecture that can align with changing requirements.
This raises the importance of cost elasticity; organizations need flexibility so they can move
beyond having to pay for peak resource levels and data availability at all times, as is the case with
traditional on-premises systems that have fixed capacity.

4. 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, online at tdwi.org/bpreports.

A cloud data analytics platform such
as a cloud data lakehouse can enable organizations to manage cloud data storage and processing
scalability as a central resource. More holistic management of the entire cloud data architecture can
give organizations control of cloud elasticity to address their scalability needs—more so than is
possible in an environment of disparate data marts, data warehouses, and data lakes.

Issue #4: Agility and flexibility for embedding BI and analytics in applications and services.
Standalone BI and analytics tools should not be the only means for data-driven business
innovation. Another is through embedded visualizations and analytics inside applications
and services. This mode increases adoption of analytics by infusing intelligence into existing
workflows and processes. TDWI research finds that organizations embed BI and analytics
in a range of systems, notably performance management KPIs and scorecards (51% of those
surveyed), operational systems (28%), and CRM, SFA, and marketing management applications
(23%).5 Organizations can also leverage embedded functionality in services built to create
additional value for partners and customers.

Applications and cloud-based services developed by third parties for specific industries or business
operations could deliver significant additional value if embedded with modern BI and analytics.
Examples include a retail supply chain platform that offers real-time insights into millions of
industry-specific data points to analyze different categories, SKUs, and stores; and a corporate
management solution that uses embedded BI and analytics to provide single-view visibility
and analysis of hundreds of specialized performance and financial metrics that depend on large
volumes of varied data.

Embedded BI and analytics need to take advantage of current technologies and standards so
functionality is agile, flexible, and portable. Containers and container orchestration standards
such as open source Kubernetes are critical to portability. The use of standard RESTful or
JavaScript application programming interfaces (APIs) along with data pipelines enables easier
and, if needed, real-time flow of data for visualization and analytics. APIs make it simpler to
request data and then (with permission) automatically populate embedded BI and analytics with
data from other applications. APIs also let developers create data models and schemas and share
them among developers and users so embedded visualizations and analytics can be produced
faster and more consistently.

APIs, pipelines, and portability are critical to increasing external developers’ agility in creating
data-driven applications and services to meet clients’ evolving data needs. In the same way,
the technologies and standards enable internal developers to innovate beyond the proprietary
constraints of traditionally static and monolithic vertical applications.
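
As a hedged sketch of what API-driven embedding looks like from the developer's side, the following Python snippet builds a request to a hypothetical analytics service's REST endpoint and reshapes a sample response for an embedded chart. The URL, endpoint, token, and JSON field names are invented; a real BI vendor's API will differ.

```python
import json
from urllib.parse import urlencode

# Hypothetical embedded-analytics REST endpoint and query parameters; a real
# BI vendor's API will have its own paths, auth scheme, and response shape.
BASE_URL = "https://analytics.example.com/api/v1/query"
params = {"metric": "revenue", "group_by": "region", "last_n_days": 30}
request_url = f"{BASE_URL}?{urlencode(params)}"
headers = {"Authorization": "Bearer <api-token>"}  # token issued to the host application

print("GET", request_url)

# Simulated JSON response (in a real integration this would come back from the
# HTTP call above, made with urllib.request or another HTTP client).
response_body = json.loads(
    '{"rows": [{"region": "EMEA", "revenue": 125000.0},'
    ' {"region": "Americas", "revenue": 210500.5}]}'
)

# Reshape into the series an embedded chart component would consume.
chart_series = [(row["region"], row["revenue"]) for row in response_body["rows"]]
print(chart_series)
```

The same request pattern works whether the visualization is embedded in an internal portal or in a customer-facing application; permissioning is handled by the token the host application presents.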

5. 2020 TDWI Best Practices Report: Faster Insights from Faster Data, online at tdwi.org/bpreports.


Recommendations
In this section, we provide nine recommendations for successfully rearchitecting BI, data
transformation, and data management to take advantage of the cloud.

1. Solve business pain points with data-driven innovation. Migrating to the cloud delivers
a rare opportunity to bring the focus of BI, analytics, data preparation, transformation, and
management back to the fundamental purpose: to enable the organization to solve problems
with innovative data insights. With on-premises systems, it can be easy to lose the focus on this
mission when it is a struggle just to maintain systems and control costs. Organizations should
align their new architecture with key business objectives for data-driven innovation such as
improving customer experiences, optimizing operations and processes, using data and analytics
to create new products and services, and protecting the firm from undue risk and regulatory
exposure.

2. Elevate self-service capabilities. TDWI research finds that expanding and enhancing self-
service capabilities continue to be top priorities. However, the built-in scalability limitations of
on-premises systems plus traditionally constrained practices for data access, transformation, and
preparation limit democratization and lead to lower self-service satisfaction. When organizations
migrate to the cloud, they should improve self-service experiences by enabling users to organize,
search, and query more types of data, including from cloud data lakes and storage. Augment
decision-making and alert notification with recommendations based on AI/ML-driven discovery
of actionable insights.

3. Support development of data-rich products, services, and embedded functionality.


Migration to the cloud and deploying modern BI and analytics solutions can open up data assets to
third-party developers, giving them the opportunity to deliver unique and substantial value through
data- and analytics-rich OEM products and services. Using standards for open data connectivity
and containers, APIs, and agile BI visualization and analytics technologies, developers can embed
BI and analytics services that interact with large volumes of diverse data beyond what is possible
with traditional on-premises applications. Use of standards will make data integration between
cloud apps easier and more streamlined.
Developers at user organizations can apply the same combination of cloud data analytics platforms,
data transformation, connectivity through open APIs, and modern BI and analytics solutions to
create new, revenue-generating data and analytics services. These monetize data assets and provide
capabilities that enhance existing partner or customer relationships. Such services can also be
valuable to internal employees and drive higher satisfaction and productivity.

4. Deliver a faster, broader, and more agile data platform for analytics and AI/ML.
Analytics and AI/ML grind to a halt without appropriately provisioned and prepared data. New
and evolving user requirements for data from sources such as IoT, social media, video, and
customer behavior log files make it important to position your cloud data platform’s scalable
storage and processing power to support analytics and AI/ML growth.

Data pipeline development can take advantage of a cloud data analytics platform for flexibility
and power to support a broader range of workloads and use cases, from technical data scientists
performing real-time analytics to citizen data professionals enhancing forecasting with predictive
analytics. Modern solutions can reduce manual development through automation and prebuilt
connectivity. These solutions enable users to connect to data sources using standard REST APIs and
thereby avoid having to develop and manage custom data connectivity.

5. Make full use of metadata, semantic layers, and data catalogs. Easily accessible and
up-to-date knowledge about data sources, including data definitions, lineage, and location,
is important for fast and complete data integration. Data catalogs, semantic layers, and other
metadata repositories can accelerate the discovery, preparation, transformation, and use of data. They
also reduce confusion about data quality and consistency and play a vital role in governance.

TDWI research finds that most organizations are looking for improvement. Only 9% are currently
very satisfied with how well their data catalogs, metadata, and semantic layers support self-service
BI, analytics, AI/ML, and related data preparation, transformation, and integration; 49% say they
need a major upgrade.6 Ensure that data catalogs, semantic layers, and metadata repositories are
central to your new architecture. Evaluate self-service BI solutions that provide a semantic layer
and offer integration with data catalogs through APIs.

6. Take advantage of decoupled storage and computation. Organizations should rearchitect
their data analytics to take advantage of technology advances in the cloud. These include MPP
and layered architectures that allow separation between open-format data storage services and
computation resources. This looser coupling exploits fast, low-latency networks (which will only
continue to get faster) to make it easier to scale storage and compute independently, on demand.
Organizations can use this flexibility to move or replicate data appropriately for workloads and
manage hot, warm, and cold data-access requirements.

7. Tighten data warehouse and data lake integration; consider a data lakehouse in the
cloud. With different types of users and workloads needing access to diverse data, organizations
need to tighten integration between their cloud data warehouse and data lake. One approach is to
establish a holistic cloud data lakehouse that makes it easier for users, applications, and services
to realize benefits. With a cloud data lakehouse, data scientists and analysts can explore data,
test models, and run AI/ML algorithms on prepared and transformed data in the data warehouse
and raw data in the data lake. Citizen data professionals and less-technical business users can
search, query, and analyze a fuller range of data.

8. Increase the business value of data by improving transformation and preparation.


Transformation is how users and applications get data into the right format and join data from
different sources together for reports, interactive dashboards, and analytics. Most users and
applications need transformed and prepared data so they can move faster to apply it to business
decisions and collaborate with data more confidently. Easier-to-use self-service interfaces and AI
augmentation in data transformation solutions enable users to do their own projects without IT
intervention. TDWI research finds room for improvement and growth; only 17% of organizations
surveyed have self-service data preparation, transformation, and pipeline development currently,
but 31% would like to have it in the near future.7

Organizations should evaluate how they can rearchitect data transformation by utilizing powerful
cloud database processing, including by switching from traditional ETL to ELT. This can improve
speed and efficiency.

6. 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, online at tdwi.org/bpreports.
7. Ibid.


9. Address governance and security. Governance must be central to your new architecture, but
it becomes challenging as users and workloads increase. Most organizations need to respond to
data privacy and internal data-use regulations as part of governance. A critical step is to bring
business and IT stakeholders together to interpret data privacy regulations, set rules and policies,
and monitor observance. Organizations should use resources such as data catalogs and semantic
layers to improve data lineage tracking. A cloud data lakehouse can give organizations holistic
governance of more of their data. Organizations should update data stewardship practices for
guiding users to trusted and governed data across all data platforms.

A Final Point: Achieving Business Advantages with Data and Analytics

Growth in cloud computing gives organizations an opportunity to think differently about how
to position their architecture to support data-driven business innovation. Agility is needed
because business requirements change quickly. This TDWI Pulse Report has discussed why
organizations need self-service flexibility and scalable and elastic processing power to deliver
what decision makers need to solve business problems and grow revenue through new, data-rich
products and services.
Doing more than a simple “lift and shift” of traditional BI and data management practices into the
cloud will help organizations future-proof their data architecture. Analytics and AI/ML workloads
are growing; most organizations want to increase democratization of self-service BI, analytics,
data transformation, and preparation. To meet the demands of citizen data professionals in LOB
and business functions, organizations need to move past the constraints of traditional on-premises
systems and embrace the open, elastic, and scalable environment of the cloud.


About Our Sponsors


Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud
platform, offering over 175 fully featured services globally. Millions of customers—including
the fastest-growing startups, largest enterprises, and leading government agencies—trust AWS to
power their infrastructure, become more agile, and lower costs.

Matillion’s software empowers customers to extract data from a wide number of sources, load it
into their chosen cloud data warehouse (CDW)—including Amazon Redshift, Snowflake, Microsoft
Azure Synapse, and Google BigQuery—and transform that data from its siloed source state
into analytics-ready insights—prepared for advanced analytics, machine learning, and artificial
intelligence use cases. Matillion does this at scale, delivering fast time to value and high performance
with pay-as-you-go economics. Matillion’s technology reimagines the traditional ETL (extract-
transform-load) approach to take advantage of the flexibility and scalability of their customers’
cloud data warehouses. Matillion’s extract-load-transform (ELT) approach offers increased
performance and value by extracting and loading data in one move, then leveraging the power of
the cloud data warehouse to handle transformations. Companies can start out with Matillion’s latest
product, Matillion Data Loader, for data ingestion, and transition to Matillion ETL to perform more
complex data transformation. Matillion stands out in the crowded data technology field by helping
companies of all sizes worldwide meet their diverse data needs wherever they are on their
data journey. For more information, visit www.matillion.com.

As a leader in analytics infusion, Sisense powers businesses across the globe to go beyond just
making sense of data to unlocking untapped business potential. Sisense infuses analytics at the
right place and the right time to transform businesses from data-driven to intelligence-infused. Join
our community of intelligence-infused organizations such as Expedia, Philips, UIPath, and The
Salvation Army. Learn how you, too, can infuse analytics everywhere at www.sisense.com.

TDWI Research provides research and advice for data
professionals worldwide. TDWI Research focuses
exclusively on business intelligence, data warehousing,
and analytics issues and teams up with industry
thought leaders and practitioners to deliver both broad
and deep understanding of the business and technical
challenges surrounding the deployment and use of
business intelligence, data warehousing, and analytics
solutions. TDWI Research offers in-depth research reports,
commentary, inquiry services, and topical conferences
as well as strategic planning services to user and vendor
organizations.

555 S. Renton Village Place, Ste. 700, Renton, WA 98057-3295
T 425.277.9126  F 425.687.2842  E [email protected]
tdwi.org
