
Data Strategy: Only in God may we trust; the rest bring data

Let me tell you four truths from the multiverse:

1. Data Fabric and Data Lakes killed the Data Warehouse concept.

2. Data Mesh comes as an evolution of Data Fabric.

3. Since GPT, dashboard generation is on demand; rigid no more.

4. SAP gets the point and releases BDC with Fabric, Mesh, Dashboards, and Gen AI.

Traditionally, analytics in SAP was built primarily around back-end and presentation layers: Business Warehouse (BW) and BusinessObjects (BO) delivered enterprise reporting and data analysis for many organizations. SAP BW provided the data warehousing platform, centralizing data from mostly SAP sources into a structured environment optimized for reporting. On the presentation layer, BusinessObjects offered a suite of tools for creating reports, dashboards, and ad-hoc queries, giving business users self-service analytics.

These technologies brought significant benefits; however, that analytical layer is long gone and no longer considered the technology of the future. Its on-premise nature, complexity, and rigidity struggled to keep pace with the demands of modern, agile businesses.

The rise of cloud computing, big data, and real-time analytics brought more flexible, scalable architectures. The focus shifted towards cloud-based platforms like SAP Analytics Cloud and embedded analytics within S/4HANA. Meanwhile, cloud hyperscalers and modern analytics platforms, first Snowflake and then Databricks, got the idea perfectly and leveraged a nice combination of technological improvements and platform modernity. The medallion architecture was born.

In this blog, I will go through the new offering from SAP, Business Data Cloud (BDC from now on), and its vision to unify applications, data, and AI, with a core focus on Data Products.

The Evolution of Data Architectures

The Medallion Architecture

In the world of data management, the Medallion architecture, also known as
multi-hop architecture, is an approach to data model design that encourages
the logical organisation of data within a data lakehouse.

The Medallion architecture structures data in a multi-tier approach (bronze, silver, and gold tiers), taking into account and encouraging data quality as it moves through the transformation process, from raw data to valuable business insights. This architecture protects data integrity by passing data through several stages of validation and transformation that ensure atomicity, consistency, and durability. Once the data has passed through these validations and transformations, it is stored in an optimal layout for effective analysis, ready to be used for strategic decision-making.

By Author: Medallion architecture diagram showing Bronze, Silver, and Gold tiers with data sources
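To make the multi-hop idea concrete, here is a minimal PySpark sketch of the Bronze/Silver/Gold flow described above. The paths, table names, and columns are illustrative assumptions, not from any specific system:

```python
# A minimal sketch of the Medallion flow, assuming hypothetical orders data.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: land raw data as-is, keeping load metadata for traceability.
bronze = (spark.read.json("/landing/orders/")          # hypothetical landing zone
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.format("delta").mode("append").saveAsTable("bronze_orders")

# Silver: validate and deduplicate so downstream layers can trust the data.
silver = (spark.table("bronze_orders")
          .filter(F.col("order_id").isNotNull())
          .dropDuplicates(["order_id"]))
silver.write.format("delta").mode("overwrite").saveAsTable("silver_orders")

# Gold: aggregate into a reporting-ready layout for analysis.
gold = (spark.table("silver_orders")
        .groupBy("customer_id")
        .agg(F.sum("amount").alias("total_revenue")))
gold.write.format("delta").mode("overwrite").saveAsTable("gold_revenue_by_customer")
```

Note how business meaning only appears in the final step; that is exactly the criticism discussed next.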

Initially, the Medallion Architecture emerged as a response to the growing complexity of data management, offering a seemingly manageable way to organize expanding business data by breaking the problem down into smaller, quality-focused stages. For some, though, Medallion falls short.

A primary criticism is that it operates as a "pull mechanism," which inadvertently shifts the burden of complex data transformations onto the data consumers. These consumers, often business analysts or downstream applications, are forced to handle intricate data manipulations and wait for data to be fully curated in the Gold layer before they can derive meaningful insights. This creates inefficiencies and delays, hindering agile decision-making.
The Medallion Architecture has also been criticized for data quality issues. Because the approach is so rigid, errors and inconsistencies introduced in the initial Bronze layer can propagate through subsequent layers, becoming increasingly difficult to rectify downstream. This layered approach can lead to a fragile data foundation, where each layer's integrity is heavily dependent on the preceding one.

In Medallion, data is repeatedly moved and transformed across layers, adding to computational costs and processing backlogs without necessarily adding commensurate business value in the earlier stages.

A further criticism is the Medallion Architecture's lack of business context in its upstream tiers. This linear, assembly-line style of data transformation treats data as a technical artifact rather than a business product. Bronze and Silver layers often lack the necessary business context to be readily usable for decision-making, relegating true business value realization to the final Gold layer and causing delays in accessing actionable insights.

This also limits data consumption options and creates bottlenecks: downstream consumers must wait for the data to reach the Gold layer, facing long queues and restricted access to data in its rawer or intermediate forms. It is not agile; it is still rigid, and the risk of errors is high.

The Evolution to Data Products and Data Mesh

The Data Product Architecture adopts a "push mechanism," where data is proactively shaped and refined based on clearly defined analytical and operational use cases. The Data Product Architecture prioritizes pushing business context to the forefront, right from the initial stages of data processing. This ensures that data is treated as software, as a product from the outset, designed and engineered to meet specific business needs and deliver value at every stage of its lifecycle.

This concept, called Data Mesh, is perfect for LLM consumption, yet it came before LLMs; that is an accidental feature. Introduced in 2020, it describes how a data monolith (be it a data lake or a data warehouse) often becomes a bottleneck as organizations grow and data complexity increases. Data Lakes, coming from the old days of Big Data, lack agility and scalability, because data needs to be moved over and offered to the next phase of "curation".
By Author: Data Mesh conceptual diagram showing decentralized data ownership

The Data Mesh is founded on four key principles:

1. Firstly, Domain-oriented decentralized data ownership shifts responsibility to domain teams, who are closest to the data and its context.

2. Secondly, Data as a product emphasizes treating data as a valuable product, making it discoverable, understandable, trustworthy, and natively accessible. This requires domain teams to not just own the data but also serve it to consumers effectively.

3. Thirdly, a Self-serve data platform provides domain teams with the necessary infrastructure and tools to build, deploy, and operate their data products independently, without relying on a central data team as a bottleneck.

4. Fourthly, Federated governance addresses the need for standardization and interoperability across decentralized domains. This involves establishing global standards and policies, while allowing domains autonomy within those boundaries.

Moving to a Data Mesh is an evolutionary journey from a Data Fabric, not a revolutionary overhaul. Organizations can start with their existing data monolith and incrementally transition towards a mesh architecture; this is what SAP proposes as well. Key steps include identifying business domains and their corresponding data products, building self-serve data platform capabilities, and implementing federated governance.

But what is a Data Product?

In Data Mesh speak, a Data Product is an architectural quantum, which is the "smallest unit of architecture that can be independently deployed with high functional cohesion and includes all the structural elements required for its function."

More practically, a Data Product is a self-contained package encompassing not just data, but also metadata, code for transformation, and potentially infrastructure. It embodies the principle of "data as a product": data owners treat their data with a product mindset, just as if it were software, focusing on consumer needs and usability.
By Author: Concentric circles diagram showing Data Product components - Data, Metadata, Infrastructure, Code

A more nuanced view defines a Data Product as an autonomous logical entity describing data meant for consumption, with relationships to the underlying technology. This logical entity includes a dataset name, description, owner, and references to physical data assets, making it technology-agnostic.
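As a rough illustration of this logical-entity view, here is a minimal sketch in Python; the field names follow the description above, while the concrete values and asset URIs are hypothetical:

```python
# A minimal sketch of a Data Product as an autonomous logical entity,
# assuming illustrative field names and invented asset references.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                                  # dataset name
    description: str                           # business meaning for consumers
    owner: str                                 # domain team accountable for it
    physical_assets: list[str] = field(default_factory=list)  # underlying storage refs

product = DataProduct(
    name="cash_flow",
    description="Unified actual and forecasted cash positions.",
    owner="finance-domain-team",
    physical_assets=["hana://FIN/CASHFLOW_VIEW", "s3://lake/finance/cash_flow/"],
)
```

Because the entity only references its physical assets, the same product description survives a change of underlying technology.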

Regardless of the definition, the goal is to create data that is discoverable, addressable, understandable, trustworthy, and natively accessible. Data Products are the fundamental building blocks of a Data Mesh, designed to serve analytical data and facilitate data-driven decision-making within specific business domains. They should be simple and cohesive, focusing on a single, well-defined function to maximize reusability. Data becomes a product when it is effectively packaged, delivered, and consumed in a way that solves a specific user problem or fulfills a business need.

Data becomes a DaaS product at the point of consumption, not merely at the point of collection or storage. It is like preparing a meal: the ingredients are not consumed until you have a final dish, and you collect the ingredients you want, not the other way around, being handed a pile of ingredients and wondering "what can I do with all this?"

Several key factors are identified as crucial in this data-to-product transformation. Firstly, understanding the user and their problem, just as we would when building any DaaS offering. A data product must be designed with a specific user in mind and must address a tangible problem they face. Without this user-centric approach, data risks remaining an abstract asset with limited practical value. This necessitates close collaboration with potential users to deeply understand their workflows, pain points, and information requirements.

Secondly, packaging and presentation play a vital role. Transforming data into easily digestible formats, such as reports, dashboards, or APIs, is essential for making it accessible and actionable. This involves not only technical processing but also thoughtful design of interfaces and visualizations that facilitate intuitive interaction and interpretation. The form in which data is delivered is as important as the data itself in determining its product value.

Finally, the lifecycle of the data is critical to its design, which demands a product-centric mindset within data teams. Data teams should possess a strong understanding of business needs and user workflows. This requires a shift in perspective: viewing data not just as a technical asset but as a product that needs to be carefully crafted, marketed, and supported to ensure user adoption and satisfaction.

SAP's Journey in Data Productization

SAP has long been a player in data since the introduction of BW in 1998. This was the dawn of an era in which organizations sought to consolidate and analyze their business data. SAP BW evolved through capabilities in ETL, data modeling, and integration, ultimately moving to the in-memory HANA platform around 2015.

The introduction of SAP Data Warehouse Cloud around 2019, renamed SAP Datasphere in 2023 and built on the BTP, was the first business data fabric architecture, because it aimed at consolidating critical data from various sources on HANA Cloud. This architecture, with its focus on integrating diverse data landscapes, aligns with the decentralized and interconnected nature of a Data Mesh. SAP Datasphere provides a centralized catalog for data discovery and governance across these interconnected sources. The Catalog is important, and I will discuss it later on.

In 2025, the introduction of SAP Business Data Cloud (BDC) marks the future of SAP's data and analytics strategy. It integrates the strengths of SAP BW, SAP Datasphere, and SAP Analytics Cloud (SAC) on a single platform.

SAP BDC ecosystem diagram showing integration of components

What is really new is the concept of data products, central to SAP BDC, with
SAP aiming to deliver out-of-the-box data products following a harmonized
data model. This strong emphasis on data products as fundamental building
blocks clearly echoes the core tenets of a Data Mesh.

LLMs and the Necessity of Data Products


Let's first understand how LLMs function and why they struggle so frequently with enterprise data.

LLMs are trained on vast amounts of unstructured data; they learn about the data and store this information as weights and biases in their neural network. This information includes language understanding and the general knowledge provided in the training data.

To date, off-the-shelf LLMs are not prepared with structured, relevant data for enterprise use cases. The most popular enterprise use case is querying an extensive set of tables and data lakes using LLMs. Here are two broad ways in which LLMs are being used in enterprises today:

Scenario 1: Unorganized Data Pools

A common misconception is that LLMs can seamlessly process unorganized data to deliver accurate responses, leading organizations to provide them with such sources. However, this approach is flawed. LLMs struggle to create accurate and optimized queries without a structured data framework, resulting in inefficient SQL, suboptimal performance, and elevated computational costs. Supplying just the database schema isn't enough; LLMs need detailed contextual information about metrics, measures, dimensions, entities, and their relationships to generate effective SQL queries.
By Author: LLM with unorganized data diagram showing consumption layer, LLM, query engine, and data lakes

Scenario 2: Organized Data Catalogues

Organizations may choose to organize their data with defined schemas and
entities in catalogs before using LLMs, which helps the LLMs understand the
data and improves accuracy and efficiency. However, this method requires
ongoing updates, involves data movement, and has high upfront costs for
organizing and cataloging large datasets. Additionally, even with this
structure, LLMs may still not fully comprehend the data's context and
semantics, which results in inaccuracies.
By Author: LLM with data catalogs diagram showing the addition of a data catalog between query engine and data lakes

The Solution: Building LLMs Powered by Data Products

Enter the data product era! A data product is a comprehensive solution that integrates Data, Metadata (including semantics), Code (transformations, workflows, policies, and SLAs), and Infrastructure (storage and compute). It is specifically designed to address various data and analytics (D&A) and AI scenarios, such as data sharing, LLM training, data monetization, analytics, and application integration. Across industries, organizations are increasingly turning to sophisticated data products to transform raw data into actionable insights, enhancing decision-making and operational efficiency. Data products are not merely a solution; they are transformative tools that prepare and present data effectively for AI consumption: trusted by their users, up-to-date, and governed appropriately.
By Author: LLM with data product layer diagram showing how data products sit between LLM and data lakes

Integrating a Data Product Layer with our existing data infrastructure represents a significant advancement in leveraging Large Language Models (LLMs) for enterprises. It enhances LLMs' contextual understanding and query precision.
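To illustrate the difference this makes, here is a hedged sketch of how the context handed to an LLM changes between Scenario 1 and a data product layer; the metadata text and the build_prompt helper are illustrative assumptions, not a BDC API:

```python
# Hedged sketch: contrast a bare-schema prompt (Scenario 1) with a prompt
# enriched from a data product's metadata. All names are invented.

SCHEMA_ONLY = "Tables: SALES(ID, CUST, AMT, DT)"  # the LLM must guess all semantics

DATA_PRODUCT_CONTEXT = """\
Data product: sales_revenue (owner: sales-domain-team)
Measure:   AMT = net revenue in EUR, after discounts
Dimension: DT  = posting date (YYYY-MM-DD); CUST joins to CUSTOMER.ID
"""

def build_prompt(context: str, question: str) -> str:
    # The richer the context, the more precise the generated SQL tends to be.
    return f"{context}\n\nWrite a SQL query to answer: {question}"

# Same question, two very different chances of correct SQL.
print(build_prompt(SCHEMA_ONLY, "Total net revenue per customer in 2024"))
print(build_prompt(DATA_PRODUCT_CONTEXT, "Total net revenue per customer in 2024"))
```

With the first prompt, the model cannot know whether AMT is gross or net, or what currency it is in; with the second, the semantics travel with the data.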

SAP BDC: Data Products in Action

At the heart of SAP BDC lies the concept of Data Products. SAP recognizes
that data is only valuable if it's accessible, understandable, and trustworthy.
Data Products in BDC are not just raw data; they are curated, enriched, and
contextualized data assets designed for specific business purposes. They are
the core components of the Business Data Cloud.

Within SAP, a Data Product is a data set made available for use outside its original application through APIs. It comes with detailed, high-quality descriptions accessible via a Data Product Catalog. It's important to note that "Data Product" doesn't mean something you purchase in isolation; it simply refers to data that is "packaged" for straightforward use.

Cash Flow Data Product example showing interface elements and properties

There are two variants of Data Products available for flexible access:

• SAP Data Products, based on a canonical / standard SAP One Domain Model definition

• Customer Data Products, based on customer-individual configurations such as S/4HANA Z-CDS views, BW DSO objects, IBP datasets, HANA Cloud, etc.

Features of Data Products:

• Business Data Sets: consisting of one or more business object entities, related objects, analytical data (measures, dimensions), documents, graph data, spatial data, ...

• Consumable: via APIs or via events. Supported API types are SQL (incl. SQL interface views), Delta Sharing, Event, REST, and OData (a small consumption sketch follows this list).

• Described: with high-quality metadata provided via Open Resource Discovery (ORD), following the ORD schema for Data Products (ORD will be explained in a minute).

• Discoverable: via the Data Product Directory, a service of UCL that aggregates metadata of all Data Products to make them discoverable in a landscape.
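As promised, here is a hedged sketch of consuming a data product through an OData interface. The service URL and entity set are hypothetical; only the OData query options ($select, $top) and the v4 response envelope are standard:

```python
# Hedged sketch of reading a data product over OData; URL and entity set
# are placeholders, not real BDC endpoints.
import requests

BASE = "https://example.mycompany.com/odata/v4/CashFlowDataProduct"  # hypothetical

resp = requests.get(
    f"{BASE}/CashPositions",
    params={"$select": "CompanyCode,Amount,Currency", "$top": "100"},
    headers={"Authorization": "Bearer <token>"},  # token flow sketched later
    timeout=30,
)
resp.raise_for_status()
for row in resp.json()["value"]:  # OData v4 wraps result rows in "value"
    print(row)
```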

Data catalog interface showing filtering capabilities and available data products
The data catalog plays a pivotal role in modern data management by
serving as a centralized and organized inventory of metadata. It is crucial for
data discovery, allowing users to easily find the data they need and
understand its purpose and contents. A data catalog typically stores
metadata such as business terms, owners, origins, lineage, labels, and
classifications. This enables data analysts and other users to evaluate the
fitness of data for intended use cases. Furthermore, the catalog is
fundamental for data governance by providing a system to manage and
oversee data assets, track ownership, understand data flows (lineage), and
enforce policies.

Requirements for Utilizing Data Products

What is needed to use Data Products effectively:

1. Discoverability: Users need a way to find and understand the available data products, often through a Data Mesh Marketplace or a data product catalog. This requires metadata and documentation to be readily accessible.

2. Accessibility: Once a data product is found, users need to be able to access it through defined interfaces. This might involve APIs, SQL interfaces, file-based endpoints, or other methods depending on the data product's design and intended use cases.

3. Understanding Data Contracts and Policies: Users need to be aware of and adhere to the data contracts that define the structure, quality, service levels, security, and privacy policies associated with the data product. They need to understand the access rights and licenses governing the use of the artifacts within the data product.

4. Authentication and Authorization: Secure access requires authentication to verify the user's identity and authorization to ensure they have the necessary permissions to consume the data product and its artifacts (see the sketch after this list).

5. Appropriate Tools and Skills: Depending on the access method, users might need specific tools (e.g., SQL clients, API clients, data science workbenches) and the skills to interact with the data in the provided format. Self-service capabilities and user-friendly interfaces aim to minimize the need for complex IT skills.

6. Network Connectivity and Endpoints: Users need to be able to connect to the addressable endpoints (URLs, URIs) of the data products. This technical foundation ensures that data consumers can reliably access the data products regardless of their location.

7. Awareness of Service Level Objectives (SLOs): Consumers should be aware of the SLOs to understand the expected reliability and performance of the data product. This includes metrics like uptime, response time, and data freshness guarantees.

8. Data Preparation (Potentially): While data products aim to provide processed and user-friendly data, consumers might still need to perform lightweight integration or transformation tasks based on mappings between similar data elements. Self-service data preparation tools can assist with this.
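Requirements 2, 4, and 6 can be pictured with a short sketch: obtain an OAuth2 token via the client-credentials grant, then call the data product's endpoint. All URLs and credentials below are placeholders, not real BDC endpoints:

```python
# Hedged sketch of authenticated access to a data product endpoint,
# assuming a generic OAuth2 client-credentials flow and invented URLs.
import requests

token = requests.post(
    "https://auth.example.com/oauth/token",            # hypothetical token URL
    data={"grant_type": "client_credentials"},
    auth=("my-client-id", "my-client-secret"),         # placeholder credentials
    timeout=30,
).json()["access_token"]

data = requests.get(
    "https://example.com/data-products/cash-flow/v1",  # hypothetical endpoint
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
data.raise_for_status()
print(data.json())
```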

SAP BDC Architecture: Enabling Data Products

SAP BDC architecture diagram showing layers and components


The architecture of SAP Business Data Cloud is designed to support the
creation, management, and consumption of Data Products through a unified
and coherent framework. This architecture forms the foundation that enables
data to be transformed from raw information into valuable, business-ready
data products.

The core components of SAP BDC architecture work together to support the
Data Product lifecycle:

SAP Datasphere

Datasphere serves as the central hub for integrating data from various
sources, both SAP and non-SAP. It plays a crucial role in harmonizing data
across different formats and structures, ensuring consistency and
compatibility. Through its data modeling and transformation capabilities,
Datasphere creates the foundation for high-quality Data Products by
providing a unified semantic layer that bridges technical data storage and
business meaning.

SAP Analytics Cloud

SAC functions as the primary consumption layer for Data Products. It leverages the well-structured Data Products to deliver analytics, reports, and dashboards that enable business users to gain insights without complex data manipulation. This component translates the technical excellence of Data Products into business value through visualization and analysis.

SAP Databricks

SAP Databricks (remember, they go together; it is called SAP Databricks because it is an OEM deal, a single contract) provides advanced data processing, machine learning, and AI capabilities within the BDC ecosystem. It enables the creation of sophisticated Data Products that incorporate predictive analytics, complex transformations, and AI-driven insights. Through its serverless computing and unified analytics platform, SAP Databricks helps extend the value of Data Products beyond traditional reporting.
Foundation Services

The underlying Foundation Services provide essential capabilities for data acquisition, transformation, and storage. These services ensure that Data Products have a reliable infrastructure foundation, addressing needs for performance, scalability, and security.
BW integration diagram showing how BW connects with BDC components

Data Product Creation and Consumption in SAP BDC

The journey of creating and consuming Data Products in SAP BDC follows a
structured process that ensures quality, governance, and accessibility. This
process encompasses multiple steps, from initial data package activation to
the creation of business value through Data Product consumption.

SAP applications produce data products that can be consumed in SAP Datasphere via SAP data packages. Data products provide the data for consumption, while the Business Analyst reviews the information about the installed Data Products.
By Author: Data activation workflow diagram showing the 5-step process for data products

Integration with SAP Business Warehouse

SAP BDC offers seamless integration with SAP Business Warehouse, allowing
organizations to leverage their existing BW investments while moving toward
a modern data product approach. With the introduction of SAP BDC,
customer-managed BW data products can be transitioned to SAP-managed
data products.

Customers will first work with BW data products based on their BW data
models from SAP BW, starting use cases in SAP Databricks and SAP
Datasphere, with exposure via Delta Share. As they begin exploring what's
possible with SAP-managed data products and insight apps, customers can
gradually replace BW Data Products with SAP-managed Data Products as
they move into a clean data approach.

BW can publish "BW data products" into the SAP BDC object store. We could use SAP Databricks Delta Sharing capabilities to expose those data products to third-party data lakes. We could also access third-party data lakes from BDC via Delta Sharing and, for example, join data from BW with data from Azure Data Lake in a Datasphere data layer.
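A sketch of that scenario using the open-source delta-sharing Python client follows; the profile file and the share/schema/table names are assumptions for illustration:

```python
# Hedged sketch of the Delta Sharing scenario above, using the open-source
# delta-sharing client. Share, schema, and table names are invented.
import delta_sharing

profile = "bdc_share.json"  # credentials file issued by the data provider

# Discover what the provider has shared with us.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load a BW data product and a third-party lake table as pandas DataFrames.
bw_cash = delta_sharing.load_as_pandas(f"{profile}#finance.bw.cash_flow")
adl_fx = delta_sharing.load_as_pandas(f"{profile}#external.azure.fx_rates")

# Join them, e.g. to convert cash positions into a common currency.
joined = bw_cash.merge(adl_fx, on="currency", how="left")
```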
The Role of Open Resource Discovery (ORD)

Open Resource Discovery (ORD) is a protocol that allows applications and services to self-describe their exposed resources and capabilities. It is critical for enabling consistent technical documentation and facilitating the discovery of Data Products.

ORD provides several benefits in the context of Data Products:

• It enables automated discovery and aggregation of metadata.

• It ensures a high degree of automation and helps keep systems in sync with reality.

• It provides a bigger context, with shared high-level information, taxonomy, and relations between the described resources.

• SAP Business Data Cloud uses ORD to provide high-quality metadata for its Data Products via the Data Product Directory.
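For illustration, a hedged sketch of ORD-based discovery: the ORD specification defines a well-known entry point that returns a configuration pointing to ORD documents. The host below is hypothetical, and the exact response fields may vary by ORD version:

```python
# Hedged sketch of ORD discovery; host is invented and the response
# structure is an assumption based on the ORD specification.
import requests

host = "https://app.example.com"  # hypothetical ORD provider

config = requests.get(f"{host}/.well-known/open-resource-discovery", timeout=30).json()

# Follow the referenced ORD documents and collect the described data products.
for doc_ref in config.get("openResourceDiscoveryV1", {}).get("documents", []):
    doc = requests.get(host + doc_ref["url"], timeout=30).json()
    for dp in doc.get("dataProducts", []):
        print(dp.get("title"), "-", dp.get("shortDescription"))
```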

Business Use Case: Data Products Driving Value

To see the real-world impact of Data Products within SAP BDC, let's consider a finance department struggling with cash flow visibility and forecasting accuracy.

Before implementing a Data Product approach, the finance team dealt with
fragmented data sources: bank transaction data in SAP S/4HANA, accounts
receivable in an Ariba system, and forecasting data in various Excel
spreadsheets. Analysts spent days each month reconciling and consolidating
this data, often discovering discrepancies too late to inform decision-making.

Old relationship between stakeholders

With SAP BDC's Data Product approach, the organization implemented a "Cash Flow" Data Product that provides:

• A unified view of actual cash positions from banking systems

• Confirmed transactions from accounts receivable

• Forecasted cash flows from planning systems

Finance analysts now access this Data Product through SAP Analytics Cloud,
where they can immediately visualize current cash positions, analyze trends,
and generate accurate forecasts. The Data Product ensures that all data is
current, consistent, and properly contextualized with business meaning.
By Author
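In spirit, the Data Product does for the finance team what this small pandas sketch shows: three differently sourced feeds become one consistent weekly view (the figures and column names are invented for illustration):

```python
# A minimal pandas sketch of the unified cash flow view; all data is invented.
import pandas as pd

actuals = pd.DataFrame({"week": ["W1", "W2"], "amount": [120.0, 95.0], "source": "bank"})
receivables = pd.DataFrame({"week": ["W2", "W3"], "amount": [40.0, 60.0], "source": "AR"})
forecast = pd.DataFrame({"week": ["W3", "W4"], "amount": [80.0, 110.0], "source": "plan"})

cash_flow = pd.concat([actuals, receivables, forecast], ignore_index=True)

# One number per week, regardless of which system the data came from.
print(cash_flow.groupby("week")["amount"].sum())
```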

Benefits of a Data Product-Centric Approach in SAP BDC

The Data Product approach within SAP Business Data Cloud delivers several
key advantages that collectively transform how organizations leverage their
data assets:

1. Improved data accessibility and discoverability: Well-defined and cataloged Data Products make it significantly easier for business users to find relevant data. With rich metadata, clear ownership, and intuitive cataloging, users spend less time searching and more time analyzing.

2. Enhanced data trustworthiness and reliability: The focus on quality and governance within Data Products builds confidence in the data. Users can trust that the data they access is accurate, current, and compliant with relevant policies and standards.

3. Increased business agility: By empowering users to access and utilize Data Products without complex data wrangling, organizations can respond more quickly to changing business conditions. Business teams can independently explore data and generate insights without heavy IT involvement.

4. Accelerated innovation: A solid foundation of Data Products provides the reliable data needed for advanced analytics, machine learning, and AI initiatives. This accelerates the path from data to innovation, allowing organizations to develop new capabilities and offerings faster.

5. Reduced duplication and increased efficiency: By creating reusable Data Products instead of one-off data extracts or reports, organizations reduce redundant work and establish a shared understanding of key business data.

6. Improved data literacy across the organization: The business context embedded in Data Products helps users understand the meaning and proper use of data, elevating overall data literacy.

TL;DR

In the age of AI, the ability to link business understanding with strategic
choices is vital. As every company becomes an AI data creator, strong
business context becomes necessary to get AI ready for real-world use.
Building a solid base for Business AI means bringing together data systems
and business operations.

The data product approach fundamentally shifts how we think about data,
moving from a technical resource to be managed to a strategic asset to be
leveraged. It brings business context to the forefront, ensuring that data is
not just stored but is truly useful and actionable.

In SAP, the new Business Data Cloud, with Data Products, changes how organizations manage and leverage their data in this new paradigm by treating data as a product: discoverable, accessible, trustworthy, and natively usable.

By adopting a data product mindset, organizations can bridge the gap between data and business value and, at the same time, establish the foundation for AI accessibility of data.
