Beyond "Lift and Shift": Rearchitecting BI, ETL, and Data Warehousing for the Cloud
By David Stodder
TDWI PULSE REPORT
Q4 2020
© 2020 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions
in whole or in part are prohibited except by written permission. Email requests or
feedback to [email protected].
Acknowledgments
TDWI would like to thank many people who contributed to this report, including our report
sponsors, who reviewed outlines and report drafts. Finally, we would like to recognize TDWI’s
production team: James Powell, Lindsay Stares, Rich Seeley, and Rod Gosser.
Sponsor
Amazon Web Services, Matillion, and Sisense sponsored the writing of this report.
Cloud migration can accelerate the use of data and analytics to accomplish familiar but
essential business goals such as optimizing processes, driving stronger financial performance,
proactively responding to risk and fraud incidents, and building greater customer loyalty and
satisfaction. Timely data insights can create competitive differentiation in each of these areas.
Yet, organizations also need to seize the opportunity to unleash business creativity with data to
intelligently expand into new markets and develop services that monetize data and analytics to
increase return on investment.
Cloud computing offers scalability, flexibility, cost elasticity, and speed to deployment—all
highly attractive attributes essential to data-fueled innovation. TDWI research finds that for
many organizations, adoption of cloud computing goes hand-in-hand with the growing business
stakeholder role in the direction of BI, analytics, and data management; 54% of organizations
surveyed anticipate that the business side’s role in solution and service selection and investment
will increase in the next year.¹ This suggests that for many organizations, the impetus for expanding
into the cloud is to make BI, analytics, and data management easier, faster, and more flexible so that
all types of business users can answer critical questions and solve problems.
Executives, managers, and frontline personnel need continuous access to larger and more diverse
data sets so they can gain a complete and contextual understanding of trends, patterns, and events in
their markets, supply chains, and manufacturing processes as well as stay on top of regulatory and
data governance responsibilities. The importance of access to all relevant data is never higher than
during times of unanticipated change. Executives and line-of-business (LOB) managers are under
pressure to respond; they need to make informed, fact-based decisions about personnel, inventory,
marketing strategies, logistics, and other concerns. These users want as little friction as possible in
accessing data so they can quickly visualize and analyze integrated sets of historical and real-time
data from different perspectives. They need predictive insights and prescriptive recommendations.
Organizations are pushing beyond the limits of traditional, on-premises systems. To adapt
to change, often the first consideration is to augment on-premises systems with pay-as-you-go,
cloud-based BI, analytics, and data management. Organizations with investments in existing,
on-premises systems find they have reached their limits and it would be too expensive and time-
consuming to expand capacity or add new instances. The inability of these systems to respond to
immediate and changing business needs is often what drives migration to the cloud.
For example, a marketing function might need to analyze new and diverse data about customers
to support an upcoming campaign—data that pushes beyond the limits of what the organization’s
current extract, transform, and load (ETL) and data warehouse systems can handle. The marketing
managers find that managed BI reports and dashboards no longer fit their requirements because
use cases have changed; they need more personalization and the flexibility to pose unanticipated,
ad hoc, and often more complex queries. Marketing’s data scientists want to explore data beyond
what is in the data warehouse; they want to add new variables to predictive models and develop
machine-learning algorithms. Managers need data-driven prescriptive recommendations for
marketing campaigns and omnichannel customer engagement.
¹ 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, page 12. Online at tdwi.org/bpreports.
Such organizations often discover that working with new cloud-based systems frees them from
not only the physical and economic constraints of on-premises systems but also legacy practices
that have been shaped by those constraints. These practices have been honed by administrators to
deliver results within the limits of traditional BI, ETL, and data warehouse capacities. They were
developed based on what was feasible for achieving planned service levels of speed, availability,
and concurrency with existing on-premises platforms—and no more.
If organizations simply impose these same practices in the new terrain of cloud-based systems,
they become a source of friction that slows down integration of new data and support for new
and different workloads. Thus, when moving to the cloud, it is important to revise mindsets about
how to manage, transform, and analyze data. In the cloud, many of the boundaries associated with
traditional on-premises systems no longer exist.
Data-driven business objectives demand more than “lift and shift.” To realize the potential
of cloud-based BI, analytics, transformation, and data management, organizations need to
do more than just “lift and shift” existing systems and practices into the cloud. To be sure, a
simple lift-and-shift project can reduce costs and improve performance and resiliency, especially
for applications you may not otherwise touch again or you plan to retire soon. However,
organizations that embark on modernizing their data architecture will have the opportunity to
upgrade performance, scalability, and security. A modern architecture that takes full advantage
of the cloud will allow organizations to add new business capabilities, support a culture of
innovation, and attain a higher ROI by leveraging data as a strategic asset.
One key reason to upgrade is to support democratization: that is, expansion in the number and
diversity of people who want to access and interact with data. As noted, business executives and
managers are now important stakeholders in cloud BI and analytics services adoption. Business
users want more self-service capabilities so they can depend on IT less. Cloud-based services can
empower users to answer questions and solve problems with less IT intervention. Organizations,
therefore, need to evaluate how they can rearchitect BI, analytics, and data management. They must
rethink existing practices to realize the potential of the cloud for supporting growth in self-service
data interaction.
Leading drivers identified by TDWI research for cloud-first or cloud-migration strategies include:
• Lower up-front investment for expanding data use; being able to afford newer ways of
achieving scale and speed
• Support for expansion of self-service BI and analytics to new types of users (beyond traditional
business analysts) and new types of projects; foster easier data collaboration and connectivity
between people and between applications
• Shorter data preparation delays, especially ETL or extraction, loading, and transformation
(ELT) by using scalable, cloud-based processing
• Better alignment of data storage and management with business users’ access patterns (i.e., hot,
warm, and cold classifications for data storage)
• More scalable and performant data architecture to support growth in prototyping and
deployment of artificial intelligence (AI) techniques such as machine learning (ML)
• Development of new product lines, services, and embedded application functionality based on
data and analytics (i.e., data monetization)
Data security has been a concern of organizations about moving to the cloud, but TDWI research
finds that although data security is always a priority, organizations are increasingly confident in the
security of cloud applications and databases. With modern practices and technologies, cloud-based
systems can be more secure than on-premises systems.
This level of assurance is reflected in Figure 1, which displays TDWI research into which BI,
analytics, and data management systems organizations are running on cloud-based (i.e., platform-
or software-as-a-service [PaaS or SaaS]) platforms. More than half of respondents say they
have enterprise BI, reporting, and dashboards in the cloud (54%), which indicates that many
organizations are confident in using the cloud to support secure, mainstream data consumption
critical to driving daily business decisions. Large organizations are tapping the cloud’s scalability
and processing performance advantages to support the hundreds (if not thousands) of users of
enterprise BI, reporting, and dashboards.
² 2020 TDWI Best Practices Report: Cloud Data Management, online at tdwi.org/bpreports.
Nearly as many respondents are running business-driven, self-service BI and analytics
in the cloud (51%). This aligns with our research noted earlier, which found that the majority
of organizations surveyed see the business side’s role increasing in the acquisition of BI,
analytics, and data management cloud services. These users are interested in self-service capabilities.
In your organization, which of the following BI, analytics, and data management systems are
currently running on cloud-based (e.g., PaaS) or SaaS platforms?
Figure 1. Based on answers from 138 respondents. Respondents could select all answers that applied.
(Source: 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern
Business Analytics, page 29.)
Scalability, flexibility, and speed draw data analytics platforms to the cloud. About two
in five respondents have a data warehouse in the cloud (41%) plus related data transformation,
quality, profiling, and validation processes. A data warehouse is vital for providing users with
properly prepared, structured, and transformed data for BI, reporting, dashboards, and many
forms of analytics. A standard data warehouse centralizes data from the organization’s business
applications, transaction systems, and various external data sources such as third-party credit
reporting or customer segmentation services.
Using the scalability, flexibility, and speed available, a cloud data analytics platform such as
a cloud data warehouse can support all types of BI and analytics including data monetization
projects aimed at creating new, data-rich products and services for business partners, suppliers, and
customers. Related TDWI research finds that getting more business value from data, whether in
operations or analytics, is the top priority for cloud data management.
Figure 1 shows that a significant percentage of organizations are running predictive analytics
and AI/ML in the cloud (32%). New business-driven products and services increasingly require
advanced analytics that can deliver unique insights and prescriptive recommendations that partners,
suppliers, and customers (not to mention employees) cannot get anywhere else. Organizations
surveyed in related TDWI research say that capturing and leveraging emerging data types and
sources is a key driver behind cloud data management. New analytics-driven products and services
tap these often semistructured and unstructured data sources to provide advanced business insights
and to feed AI/ML algorithms.
A common place to store and manage emerging data types is a data lake, which 39% indicate
they are running in the cloud, typically in a scalable and cost-elastic cloud storage system. A
data lake provides a central repository for collecting a variety of structured, semistructured, and
raw, unstructured data. Some organizations use a cloud data lake as a place to put data they plan
to prepare and transform for the cloud data warehouse. However, many organizations use data
lakes to support data science projects involving exploratory, predictive, real-time analytics and
AI/ML. These projects often need access to hundreds of terabytes if not petabytes of data from
log files, social media, mobile device data, and other types of raw data that may be streamed into
the lake in real time.
Pipelines are designed to move data from sources to data lakes, data warehouses, or other target
data platforms to support visualization, analytics, and AI/ML. Some transport raw, streaming data
to target data lakes while others incorporate processes for data profiling, cleansing, transformation,
enrichment, and governance. Today, some solutions offer easy-to-use workspaces that enable
citizen data professionals to do more of their own data preparation, transformation, and pipeline
development rather than depend on IT.
Traditional, on-premises data warehouses are notorious for taking more than a year
(or multiple years) to develop and deploy, with slow processes for incorporating
new data. Cloud data warehouses can be set up faster.
Traditional data warehouse limitations have driven many line-of-business and departmental users
to set up their own data marts and/or BI platforms, both on premises and in the cloud. This has
exacerbated the proliferation of data silos that makes it difficult to gain a single integrated view of
all relevant data and to govern and secure data use.
Cloud data warehouses operate in a modern technology environment, which enables organizations
to update their strategy and address problems that have held back traditional on-premises data
warehouses. Cloud data warehouses can support standard managed enterprise reporting and
operational dashboards, but they have additional flexibility to stand up to greater numbers
of concurrent users and workloads and can be optimized as needed for faster performance.
Organizations can manage the volume of data needed for complex analytics for developing
predictive insights, exploring new data, and creating prescriptive recommendations.
Because cloud data warehouses can tap as much cloud storage as they need and run on massively
parallel processing (MPP) database engines, they have virtually unlimited scalability; resources can
be added or removed to provide cost elasticity. Some cloud data warehouses offer columnar data
storage, an approach that optimizes query performance by reducing the data needed from disk (I/O)
to answer queries. Columnar databases are a good fit for analytics workloads, which often need to
access a small number of deep columns to return query results.
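The I/O advantage of columnar storage can be illustrated with a minimal Python sketch. This is not any vendor's implementation: the table, its column names, and the "values read" counter are hypothetical, and the counter is only a crude stand-in for disk I/O.

```python
# Hypothetical sales table with 4 columns; names and values are illustrative only.
ROWS = [
    {"order_id": 1, "region": "east", "amount": 120.0, "notes": "gift wrap"},
    {"order_id": 2, "region": "west", "amount": 80.0, "notes": ""},
    {"order_id": 3, "region": "east", "amount": 50.0, "notes": "expedite"},
]


def to_columnar(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows, column):
    """Row layout: every field of every row is read to reach one column."""
    values_read = sum(len(row) for row in rows)
    return sum(row[column] for row in rows), values_read


def sum_column_store(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


COLUMNS = to_columnar(ROWS)
row_total, row_reads = sum_row_store(ROWS, "amount")
col_total, col_reads = sum_column_store(COLUMNS, "amount")
```

Both layouts return the same total, but the row layout touches all four fields of each record while the column layout touches only the one column the query needs. The gap widens as tables gain more columns.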
Cloud data analytics platforms can benefit from a flexible, loosely coupled architecture.
Cloud data analytics platforms such as a cloud data warehouse can take advantage of the trend
away from tightly coupled data computation and storage toward a loosely coupled separation
of the two. Remote, network-attached object storage is much less expensive than local storage,
which is better reserved for transient data. Loosely coupling storage and computation increases
flexibility over where to store data and where to perform computational processing.
A loosely coupled architecture also improves scalability, availability, and recovery time. It eliminates
the data management process of having to ship data across MPP nodes when new nodes are added,
rebalance the entire system, and, if a node fails, recover persistent data that was stored locally.
With the explosion in diverse, high-volume, and fast big data, many organizations have set up
data lakes as a landing place for rapid batch-loading or streaming of primarily raw and granular
data that does not conform to the structure and format of a data warehouse. As noted, TDWI
research finds that many organizations are taking advantage of inexpensive cloud storage to
establish cloud data lakes.
Most data lakes use a schema-on-read approach in which users (such as data scientists or
engineers) and applications transform, format, and organize the data schema as needed after
landing data in the cloud data lake. In contrast, the schema-on-write approach of traditional data
warehouses involves long ETL processes to format and transform data before it lands in the
data warehouse. Schema-on-read allows the data lake to support diverse and dynamic analytics
and AI/ML workloads.
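The difference between the two approaches can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the raw records, field names, and both helper functions are hypothetical, and a real data lake or warehouse would apply schemas through its own engine rather than hand-written code.

```python
import json

# Raw events as they might land in a data lake: one JSON document per line,
# with inconsistent fields. All field names here are hypothetical.
RAW_LINES = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": 5, "device": "mobile"}',
    '{"user": "c3"}',
]


def write_with_schema(line, required=("user", "amount")):
    """Schema-on-write: validate and shape a record before it is stored."""
    record = json.loads(line)
    missing = [f for f in required if f not in record]
    if missing:
        raise ValueError(f"rejected at load time, missing: {missing}")
    return {"user": str(record["user"]), "amount": float(record["amount"])}


def read_with_schema(lines, fields):
    """Schema-on-read: store lines untouched; project fields at query time."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


# The lake keeps all three raw lines; each reader chooses its own view.
spend_view = list(read_with_schema(RAW_LINES, ["user", "amount"]))
channel_view = list(read_with_schema(RAW_LINES, ["user", "channel"]))
```

In this sketch the schema-on-write path rejects the third record outright, while the schema-on-read path keeps it and lets each workload decide how to treat the missing fields.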
Cloud data lakes and data warehouses can become integrated “data lakehouses.” However,
just as data scientists and analysts are interested in accessing not just the data lake but also the
contents of a data warehouse, many data-savvy business users (that is, citizen data professionals)
are interested in additionally accessing the data lake. To enable broader accessibility for all
users along with more efficient and comprehensive data management and governance, some
organizations are examining how to integrate the data warehouse and data lake into a single,
seamless architecture. Some experts in the technology industry call this a “data lakehouse.” This
unified approach can serve as part of a larger strategy to consolidate distributed data mart silos
into one centralized cloud data warehouse and then integrate the data warehouse more tightly
with the cloud data lake.
Offering a more complete data architecture, a cloud data lakehouse could continue to support
classic data warehouse workloads for BI reporting, dashboards, and online analytical processing
(OLAP) on scalable and elastic cloud storage and processing. Users would be able to set up
transformation, data cleansing, and other preparation processes to run inside the data lakehouse
rather than in a separate staging area. The cloud data lakehouse could also provide BI and search
access to the cloud data lake. The data lake would continue to be available for advanced analytics
and AI/ML that requires access to diverse, raw, and granular data. All users, analytics, and AI/ML
programs would have access to different types of data, including semi- and unstructured data.
Better integration between the cloud data warehouse and data lake would enable organizations to
think holistically about data availability. With visibility into data use across the entire architecture,
organizations could set up cloud storage based on assessments of what data is “hot” (frequently
accessed), “warm” (less-frequently accessed), and “cold” (rarely accessed). Organizations could
then monitor data use over time across the entire architecture and adjust the positioning of data in
cloud storage based on usage patterns.
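A hot, warm, and cold assessment of this kind could be expressed as a simple rule over usage statistics. The sketch below is hypothetical throughout: the function name, the thresholds, and the sample data sets are assumptions chosen for illustration, not recommended values.

```python
from datetime import date


def classify_tier(last_access, reads_last_30_days, today,
                  hot_reads=100, cold_after_days=90):
    """Classify a data set as hot, warm, or cold from simple usage statistics.

    The thresholds are illustrative defaults, not recommendations.
    """
    idle_days = (today - last_access).days
    if idle_days >= cold_after_days:
        return "cold"
    if reads_last_30_days >= hot_reads:
        return "hot"
    return "warm"


# Hypothetical usage statistics: (last access date, reads in last 30 days).
usage = {
    "daily_sales": (date(2020, 10, 30), 450),
    "q2_campaign": (date(2020, 10, 1), 12),
    "logs_2018": (date(2020, 3, 15), 0),
}
today = date(2020, 11, 1)
tiers = {name: classify_tier(last, reads, today)
         for name, (last, reads) in usage.items()}
```

Rerunning a rule like this on a schedule is one way to adjust data placement as usage patterns shift over time.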
TDWI research finds that a significant percentage of organizations surveyed (38%) regard data
loading, movement, and integration steps associated with ETL as among their biggest challenges
in trying to augment or replace on-premises systems with new cloud-based services. Only 22% are
very satisfied with self-service data loading and movement capabilities in their current systems.³
As an alternative, many organizations, led by their data scientists and citizen data professionals,
have chosen to develop data pipelines that rapidly load data into a data lake as the target platform
and then perform transformations after loading. This approach, known as ELT, has the advantage
of being able to use powerful and scalable MPP database engines for faster and more reliable
performance of complex data transformation and cleansing. Solutions today allow data pipelines
to be reusable, which enables less-technical, self-service users to load data on their own without
intensive development. Data scientists as well as citizen data professionals can be more productive
and access usable data sooner. Figure 2 shows the difference between traditional ETL and ELT.
³ 2020 TDWI Best Practices Report: Faster Insights from Faster Data, online at tdwi.org/bpreports.
[Figure 2. Traditional ETL versus ELT. Labels recoverable from the diagram: Source 2, Source 3, Target, Staging tables, Final tables.]
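The ELT pattern described above can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a small stand-in for a scalable MPP engine, and the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical raw rows from two sources; amounts arrive as untrimmed text.
SOURCE_ROWS = [
    ("crm", "alice", " 120.50 "),
    ("web", "ALICE", "30"),
    ("web", "bob", "n/a"),  # a bad value that cleansing must handle
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the data unchanged in a staging table.
conn.execute(
    "CREATE TABLE staging_orders (source TEXT, customer TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", SOURCE_ROWS)

# Transform: cleanse and aggregate inside the target engine using SQL.
conn.execute("""
    CREATE TABLE final_customer_totals AS
    SELECT LOWER(customer) AS customer,
           SUM(CAST(TRIM(amount) AS REAL)) AS total_amount
    FROM staging_orders
    WHERE TRIM(amount) GLOB '[0-9]*'
    GROUP BY LOWER(customer)
""")

totals = dict(conn.execute(
    "SELECT customer, total_amount FROM final_customer_totals"
))
```

Because the raw rows stay in the staging table, the transformation can be revised and rerun without extracting from the sources again, which is part of what makes ELT pipelines reusable.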
ELT can function within a data lakehouse to address broad needs. Organizations that
choose to rearchitect their data platforms into a cloud data lakehouse, which as described
earlier integrates the data warehouse more tightly with the data lake, can use ELT to serve
both traditional, schema-enforced BI data warehouse workloads (that predominantly use SQL
and advanced analytics) and AI/ML workloads (that use Python, R, Apache Spark, or machine
learning libraries).
With an integrated cloud data lakehouse, raw data in a cloud storage-based data lake would be
available for transformation routines as well as related preparation processes for data enrichment,
quality, and refinement. Data for advanced analytics and AI/ML often requires some cleansing
and transformation; using available solutions that have self-service capabilities, data scientists and
engineers can determine what is needed. Finally, with the data lakehouse, organizations can govern
data transformations and track data lineage more effectively than is possible across disparate data
warehouses, data marts, and data lakes.
Automation reduces the need for manual transformation and connectivity. TDWI
research finds that most organizations struggle to increase self-service data preparation and
transformation and reduce dependence on IT specialists. Automation and easier-to-use graphical
user interfaces are critical to relieving users of the manual data wrangling necessary to transform
and prepare diverse data, especially raw data contained in data lakes. This manual work typically
requires technology experts. Specialized projects will always demand close, manual work by
data scientists and engineers, but as ELT pipelines become more numerous and common, it is
important to apply automated solutions that increase standardization, repeatability, and speed in
development and deployment across projects.
Automation and better user interfaces can reduce the complexity and inconsistency that often
make data transformation and pipeline development problematic for nontechnical users and lead
to delays in developing data insights that business decision makers need immediately. Solutions
offer prebuilt connectors that reduce the workload for data scientists, data engineers, and
developers who traditionally must hand-code connectivity to extract data. Solutions can provide
monitoring visibility so users can examine the data during pipeline extraction, loading, and
transformation processes.
Monitoring enhanced with AI-driven automation can surface anomalies and data quality issues
in large and varied data volumes, enabling users to address them sooner. This is important as
organizations stream or rapid batch-load data from newer sources such as social media and Internet
of Things (IoT) sensors into cloud data lakes or data lakehouses. Users need to be made aware of
anomalies and inconsistencies so they can determine what transformation, cleansing, enrichment, or
other preparation processes are necessary.
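As a rough illustration of surfacing anomalies in incoming data, the sketch below flags outliers with a simple statistical rule. It is a deliberately basic stand-in, not the AI-driven monitoring that commercial solutions provide; the function name, the threshold, and the sensor readings are assumptions.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return indexes of values whose modified z-score exceeds the threshold.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by outliers than the mean and standard deviation.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical temperature readings from an IoT sensor; one is clearly off.
readings = [21.0, 21.4, 20.9, 21.2, 85.0, 21.1, 20.8]
suspect = flag_anomalies(readings)
```

A flagged index does not say whether the value is an error or a real event; it only tells the user where to look before deciding which cleansing or enrichment step applies.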
Natural language search and query capabilities are important to bridge skills gaps and enable users
to uncover deep insights. Providing natural language interaction and better ease of use for all skill
levels is key to increasing adoption and productivity because these capabilities reduce the need for
coding expertise. Users working with standalone tools as well as those using business applications
or web services with embedded dashboards and analytics need natural language interaction and
ease-of-use features.
Even as organizations focus on enabling nontechnical business users to interact with data
effectively, they must also provide tools that help data scientists and advanced citizen data
professionals push the envelope and drive innovation. This involves providing access to integrated
volumes of data from diverse sources. Organizations need advanced and flexible BI and analytics
tools to keep pace with these more advanced users’ needs.
Modern tools can play a key role in ensuring that as workloads become more advanced,
organizations take advantage of automation and scalability so users do not experience delays or
become mired in manual coding. Modern tools can augment BI and analytics user experiences
by providing AI-driven recommendations and AI-led automation of complex activities such as
predictive forecasting and real-time analytics.
Four issues are important to address when rearchitecting BI and analytics to take advantage of new
data transformation capabilities and modern data analytics platforms in the cloud:
Issue #1: Data democratization. This is an essential first area of focus. Organizations want
technologies that enable data-driven decisions for the full range of users, from nontechnical
data consumers to citizen data professionals to expert data scientists and data analysts. To
support embedded BI and analytics within the organization as well as the creation of data-driven
products and services for external customers and partners, organizations need flexible tools for
creative, business-driven analytics app development that do not require constant IT developer
handholding. “Citizen” developers on the business side will want to tailor user experiences to
different personas so that BI and analytics fit each user’s context as determined by the type
of decisions they make, their daily information needs, security and authentication levels, and
responsibilities that come with their positions.
Issue #2: Access to a fuller range of data. Digital transformation of business processes and
applications drives more data online, where it may be ingested into a cloud data warehouse, data
lake, or data lakehouse. Businesses need to tap this data to fuel innovation in how they optimize
processes, improve customer and partner relationships, and monitor trends and patterns for
predictive insights. Thus, it is imperative that organizations enable all types of users to access
the full variety of data, wherever it is located.
Many users today need access to live, real-time data. Too often, they are limited to historical data
that is not updated frequently enough to deliver timely insights. TDWI research finds that only 9%
of organizations surveyed currently have very satisfactory capabilities for accessing live or real-time
data.⁴ Modern BI and analytics tools can offer improved capabilities for analyzing live or real-time
data alongside historical sources.
Ease of use is key to satisfaction, especially as less-technical business users explore data outside
the carefully prepared contents of the data warehouse. BI and analytics tools that provide intuitive
user interfaces with drag-and-drop functionality can reduce frustration and improve productivity.
Tools integrated with underlying semantic layers and data models can help users by automating
processes for resolving inconsistencies and quality issues discovered as users blend and mash
data from different sources. The semantic layer is also key to formalizing data definitions,
formats, and lineage.
Issue #3: Scalability and cost elasticity. Organizations require a highly scalable data architecture
to accommodate growing needs to access high volumes of diverse data and perform complex,
less-predictable analytics. They need an architecture that can align with changing requirements.
This raises the importance of cost elasticity; organizations need flexibility so they can move
beyond having to pay for peak resource levels and data availability at all times, as is the case with
traditional on-premises systems that have fixed capacity. A cloud data analytics platform such
as a cloud data lakehouse can enable organizations to manage cloud data storage and processing
scalability as a central resource. More holistic management of the entire cloud data architecture can
give organizations control of cloud elasticity to address their scalability needs—more so than is
possible in an environment of disparate data marts, data warehouses, and data lakes.
⁴ 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics, online at tdwi.org/bpreports.
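The cost elasticity argument in Issue #3 comes down to simple arithmetic, sketched below in Python. Every number here is hypothetical, and the sketch assumes the same unit price for both models, which real on-premises and cloud pricing will not match.

```python
def fixed_capacity_cost(hourly_demand, unit_cost):
    """Provision for the peak hour and pay for that capacity every hour."""
    return max(hourly_demand) * unit_cost * len(hourly_demand)


def elastic_cost(hourly_demand, unit_cost):
    """Pay only for the capacity units actually used in each hour."""
    return sum(hourly_demand) * unit_cost


# Hypothetical demand in capacity units for one day: quiet overnight,
# a short morning peak, moderate use otherwise.
demand = [2] * 8 + [10] * 2 + [4] * 14
UNIT_COST = 1.0  # hypothetical price per capacity unit per hour

fixed = fixed_capacity_cost(demand, UNIT_COST)
elastic = elastic_cost(demand, UNIT_COST)
```

The sharper and shorter the peaks, the larger the gap between the two figures; with flat demand the two models cost the same in this simplified view.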
Issue #4: Agility and flexibility for embedding BI and analytics in applications and services.
Standalone BI and analytics tools should not be the only means for data-driven business
innovation. Another is through embedded visualizations and analytics inside applications
and services. This mode increases adoption of analytics by infusing intelligence into existing
workflows and processes. TDWI research finds that organizations embed BI and analytics
in a range of systems, notably performance management KPIs and scorecards (51% of those
surveyed), operational systems (28%), and CRM, SFA, and marketing management applications
(23%).⁵ Organizations can also leverage embedded functionality in services built to create
additional value for partners and customers.
Applications and cloud-based services developed by third parties for specific industries or business
operations could deliver significant additional value if embedded with modern BI and analytics.
Examples include a retail supply chain platform that offers real-time insights into millions of
industry-specific data points to analyze different categories, SKUs, and stores; and a corporate
management solution that uses embedded BI and analytics to provide single-view visibility
and analysis of hundreds of specialized performance and financial metrics that depend on large
volumes of varied data.
Embedded BI and analytics need to take advantage of current technologies and standards so
functionality is agile, flexible, and portable. Containers and container orchestration standards
such as open source Kubernetes are critical to portability. The use of standard RESTful or
JavaScript application programming interfaces (APIs) along with data pipelines enables easier
and, if needed, real-time flow of data for visualization and analytics. APIs make it simpler to
request data and then (with permission) automatically populate embedded BI and analytics with
data from other applications. APIs also let developers create data models and schemas and share
them among developers and users so embedded visualizations and analytics can be produced
faster and more consistently.
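As a minimal sketch of the API-driven population described above, the Python below turns a JSON response into the labels and values an embedded chart could render. The endpoint path, field names, and sample records are hypothetical and used only for illustration. No network call is made, so the example runs on its own.

```python
import json
from collections import defaultdict

# Hypothetical JSON body, as a REST endpoint such as GET /api/sales might return it.
SAMPLE_RESPONSE = """
[
  {"store": "North", "sku": "A-100", "revenue": 1200.0},
  {"store": "North", "sku": "B-200", "revenue": 800.0},
  {"store": "South", "sku": "A-100", "revenue": 500.0}
]
"""


def to_chart_series(response_body, category_key, value_key):
    """Aggregate API records into the labels and values an embedded chart needs."""
    totals = defaultdict(float)
    for record in json.loads(response_body):
        totals[record[category_key]] += record[value_key]
    labels = sorted(totals)
    return {"labels": labels, "values": [totals[label] for label in labels]}


series = to_chart_series(SAMPLE_RESPONSE, "store", "revenue")
print(series)
```

In a real deployment the response body would come from an authenticated API request, and the resulting series would be handed to whatever visualization component the host application embeds.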
APIs, pipelines, and portability are critical to increasing external developers’ agility in creating
data-driven applications and services to meet clients’ evolving data needs. In the same way,
the technologies and standards enable internal developers to innovate beyond the proprietary
constraints of traditionally static and monolithic vertical applications.
5 2020 TDWI Best Practices Report: Faster Insights from Faster Data, online at tdwi.org/bpreports.
Recommendations
In this section, we provide nine recommendations for successfully rearchitecting BI, data
transformation, and data management to take advantage of the cloud.
1. Solve business pain points with data-driven innovation. Migrating to the cloud delivers
a rare opportunity to bring the focus of BI, analytics, data preparation, transformation, and
management back to the fundamental purpose: to enable the organization to solve problems
with innovative data insights. With on-premises systems, it can be easy to lose the focus on this
mission when it is a struggle just to maintain systems and control costs. Organizations should
align their new architecture with key business objectives for data-driven innovation such as
improving customer experiences, optimizing operations and processes, using data and analytics
to create new products and services, and protecting the firm from undue risk and regulatory
exposure.
2. Elevate self-service capabilities. TDWI research finds that expanding and enhancing self-
service capabilities continue to be top priorities. However, the built-in scalability limitations of
on-premises systems plus traditionally constrained practices for data access, transformation, and
preparation limit democratization and lead to lower self-service satisfaction. When organizations
migrate to the cloud, they should improve self-service experiences by enabling users to organize,
search, and query more types of data, including from cloud data lakes and storage. Augment
decision-making and alert notification with recommendations based on AI/ML-driven discovery
of actionable insights.
4. Deliver a faster, broader, and more agile data platform for analytics and AI/ML.
Analytics and AI/ML grind to a halt without appropriately provisioned and prepared data. New
and evolving user requirements for data from sources such as IoT, social media, video, and
customer behavior log files make it important to position your cloud data platform’s scalable
storage and processing power to support analytics and AI/ML growth.
Data pipeline development can take advantage of a cloud data analytics platform for flexibility
and power to support a broader range of workloads and use cases, from technical data scientists
performing real-time analytics to citizen data professionals enhancing forecasting with predictive
analytics. Modern solutions can reduce manual development through automation and prebuilt
connectivity. These solutions enable users to connect to data sources using standard REST APIs and
thereby avoid having to develop and manage custom data connectivity.
5. Make full use of metadata, semantic layers, and data catalogs. Easily accessible and
up-to-date knowledge about data sources, including data definitions, lineage, and location,
is important for fast and complete data integration. Data catalogs, semantic layers, and other
metadata repositories can accelerate discovery, preparation, transformation, and use of data. They
also reduce confusion about data quality and consistency and play a vital role in governance.
TDWI research finds that most organizations are looking for improvement. Only 9% are currently
very satisfied with how well their data catalogs, metadata, and semantic layers support self-service
BI, analytics, AI/ML, and related data preparation, transformation, and integration; 49% say they
need a major upgrade.6 Ensure that data catalogs, semantic layers, and metadata repositories are
central to your new architecture. Evaluate self-service BI solutions that provide a semantic layer
and offer integration with data catalogs through APIs.
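To make the idea of catalog metadata concrete, here is a small sketch of catalog entries that record a definition, a location, and upstream sources, plus helpers for keyword search and lineage lookup. The class, the function names, the data set names, and the storage paths are all assumptions made for illustration and do not represent any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    definition: str
    location: str
    upstream: list = field(default_factory=list)  # names of direct source data sets


def lineage(catalog, name):
    """Return every upstream data set of `name`, nearest first, without duplicates."""
    seen = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:
            queue.extend(catalog[current].upstream)
    return seen


def search(catalog, term):
    """Return names of entries whose name or definition mentions `term`."""
    term = term.lower()
    return sorted(
        entry.name
        for entry in catalog.values()
        if term in entry.name.lower() or term in entry.definition.lower()
    )


# Hypothetical entries; the locations are placeholders, not real storage paths.
entries = [
    CatalogEntry("raw_orders", "Order events as landed", "lake://raw/orders"),
    CatalogEntry("dim_region", "Sales region reference data", "warehouse.dim_region"),
    CatalogEntry("clean_orders", "Deduplicated order records", "warehouse.clean_orders",
                 ["raw_orders"]),
    CatalogEntry("revenue_by_region", "Revenue totals per sales region",
                 "warehouse.revenue_by_region", ["clean_orders", "dim_region"]),
]
catalog = {entry.name: entry for entry in entries}

print(lineage(catalog, "revenue_by_region"))
print(search(catalog, "orders"))
```

Keeping definitions, locations, and lineage in one structure is what lets a self-service tool answer both "where does this number come from?" and "which data sets mention orders?" from the same metadata.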
7. Tighten data warehouse and data lake integration; consider a data lakehouse in the
cloud. With different types of users and workloads needing access to diverse data, organizations
need to tighten integration between their cloud data warehouse and data lake. One approach is to
establish a holistic cloud data lakehouse that makes it easier for users, applications, and services
to realize benefits. With a cloud data lakehouse, data scientists and analysts can explore data,
test models, and run AI/ML algorithms on prepared and transformed data in the data warehouse
and raw data in the data lake. Citizen data professionals and less-technical business users can
search, query, and analyze a fuller range of data.
Organizations should evaluate how they can rearchitect data transformation by utilizing powerful
cloud database processing, including by switching from traditional ETL to ELT. This can improve
speed and efficiency.
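The ELT pattern can be sketched in a few lines: source rows are loaded unchanged into a staging table, and the transformation then runs as SQL inside the database. In this sketch an in-memory SQLite database stands in for a cloud data warehouse, and the table names and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a cloud data warehouse (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

# Extract and load: land the source rows unchanged in a raw staging table.
raw_orders = [
    ("1001", "north", "19.50"),
    ("1002", "NORTH", "5.50"),
    ("1003", "south", "42.00"),
]
conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: executed by the database engine itself, not by a separate ETL server.
conn.execute(
    """
    CREATE TABLE orders_by_region AS
    SELECT UPPER(region) AS region,
           COUNT(*) AS order_count,
           SUM(CAST(amount AS REAL)) AS total_amount
    FROM raw_orders
    GROUP BY UPPER(region)
    """
)

result = conn.execute(
    "SELECT region, order_count, total_amount FROM orders_by_region ORDER BY region"
).fetchall()
print(result)
```

Because the raw table is kept, the transformation can be revised and rerun without extracting from the source again, which is one reason the pattern suits scalable cloud database processing.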
6 2020 TDWI Best Practices Report: Evolving from Traditional Business Intelligence to Modern Business Analytics,
online at tdwi.org/bpreports.
7 Ibid.
9. Address governance and security. Governance must be central to your new architecture, but
it becomes challenging as users and workloads increase. Most organizations need to respond to
data privacy and internal data-use regulations as part of governance. A critical step is to bring
business and IT stakeholders together to interpret data privacy regulations, set rules and policies,
and monitor observance. Organizations should use resources such as data catalogs and semantic
layers to improve data lineage tracking. A cloud data lakehouse can give organizations holistic
governance of more of their data. Organizations should update data stewardship practices for
guiding users to trusted and governed data across all data platforms.
Matillion’s software empowers customers to extract data from a wide number of sources, load it
into their chosen cloud data warehouse (CDW)—including Amazon Redshift, Snowflake, Microsoft
Azure Synapse, and Google BigQuery—and transform that data from its siloed source state
into analytics-ready insights—prepared for advanced analytics, machine learning, and artificial
intelligence use cases. Matillion does this at scale, delivering fast time to value and high performance
with pay-as-you-go economics. Matillion’s technology reimagines the traditional ETL (extract-
transform-load) approach to take advantage of the flexibility and scalability of its customers’
cloud data warehouses. Matillion’s extract-load-transform (ELT) approach offers increased
performance and value by extracting and loading data in one move, then leveraging the power of
the cloud data warehouse to handle transformations. Companies can start out with Matillion’s latest
product, Matillion Data Loader, for data ingestion, and transition to Matillion ETL to perform more
complex data transformation. Matillion stands out in the crowded data technology field by helping
companies of all sizes worldwide meet their diverse data needs, wherever
they are on their data journey. For more information, visit www.matillion.com.
As a leader in analytics infusion, Sisense powers businesses across the globe to go beyond just
making sense of data to unlocking untapped business potential. Sisense infuses analytics at the
right place and the right time to transform businesses from data-driven to intelligence-infused. Join
our community of intelligence-infused organizations such as Expedia, Philips, UIPath, and The
Salvation Army. Learn how you, too, can infuse analytics everywhere at www.sisense.com.
TDWI Research provides research and advice for data
professionals worldwide. TDWI Research focuses
exclusively on business intelligence, data warehousing,
and analytics issues and teams up with industry
thought leaders and practitioners to deliver both broad
and deep understanding of the business and technical
challenges surrounding the deployment and use of
business intelligence, data warehousing, and analytics
solutions. TDWI Research offers in-depth research reports,
commentary, inquiry services, and topical conferences
as well as strategic planning services to user and vendor
organizations.
555 S. Renton Village Place, Ste. 700
Renton, WA 98057-3295
T 425.277.9126
F 425.687.2842
E [email protected]
tdwi.org