Collibra - Enterprise-Data-Catalogs-Whitepaper
Executive summary
“Data management and governance is at the center of every emerging
and digital technology challenge.”
- Forrester Research, An Advanced Course in Data Governance:
Ambient Data Governance Makes Data Work, July 2019
Data lies at the heart of all digital transformation initiatives. It helps organizations
better understand their customers, improve products and services, drive
operational efficiencies and reduce risk. But to achieve those goals, data needs to
be managed as an asset. Just like physical assets (property, plant and equipment),
data needs to be properly maintained. Organizations need processes in place
to assure data quality and protect data from misuse. They need to maintain a
deeper and richer understanding of how data flows through an organization.
Data catalogs serve that purpose. They offer the means to manage metadata and
curate the necessary information to make data assets easier to discover, manage
and consume. In doing so, data catalogs have become an essential component
of enterprise data architectures.
However, making data easy to discover and consume is just the first step. To
get the most out of data, it needs to be trusted, and the right data needs to be
shared with the right users. That requires proper governance to assure data
quality and define usage policies that can prevent data from being misused or
misappropriated.
Implementing a data catalog with embedded data governance helps to drive agile
data operations. It enables business intelligence teams to be truly self-service,
safe in the knowledge that analysts are accessing trusted data under permitted
circumstances.
Whitepaper
• How data governance addresses those challenges to ensure trusted data is readily
available across the organization
• How Collibra empowers business users to quickly discover, understand and trust data and
reports to drive impactful business decisions
Catalog of catalogs
• User group (e.g. data science / BI team)
• Business unit
Greater requirements for:
• Consistency: standardizing business terminology and nomenclature across sources
• Compliance: setting usage policies and administering access control
• Data quality: improving source data through greater accountability / stewardship
Governance requirements
Some data catalog implementations are relatively narrow in scope. For example,
many of the leading cloud service providers offer their own data catalogs, which
are well-suited to aiding discovery of data on those platforms, but lack capabilities
to support enterprise implementations (particularly for hybrid, multi-cloud
architectures). Equally, some specialist data catalog vendors cater squarely to
the needs of data scientists, helping them to curate technical information on a
variety of data assets, but with less consideration of issues such as business
context, data quality, consistency and compliance.
Given these added complexities, it is important that enterprise data catalogs can
offer built-in data governance capabilities, helping to bring in expertise from
a range of different roles to drive more detailed understanding, promote data
quality and consistency, and ensure compliance.
Key takeaway
While some organizations may begin implementing a data catalog with a narrow remit,
most will ultimately discover requirements to expand that approach. It is important to note
that enterprise capabilities are not only needed by large corporations. Organizations of
all sizes can benefit from proper data governance. Managing the business imperative
to extract more value from data in tandem with evolving regulatory responsibilities is a
challenge faced by all kinds of companies.
sets can be harvested automatically. There are many data catalogs, particularly
those deployed to serve a relatively narrow function, that collate metadata
in such a way, primarily to aid data discovery. Yet looking to manage data as a
strategic asset requires expert insights and human judgments. Data catalogs
without built-in, proper governance will invariably run into challenges relating
to data quality and consistency. Perhaps even more significantly, they can leave
organizations exposed to significant liabilities by failing to comply with relevant
rules and regulations.
80% of the participants believe data and analytics governance is important
in enabling business outcomes. Yet, from the same survey, we know that four in ten
participating organizations do not assess, monitor or measure data governance.
- Gartner’s Data & Analytics Governance Survey, June 2019
Data quality
While any business decision can be data driven, the right decisions will be based
on the right data. That is why data quality remains such a pressing priority
and an integral component of digital transformation strategies. Poor data
quality results in a lack of trust, not only in the underlying data, but also in the
conclusions garnered from analyzing that data. And a lack of trust will blur an
organization’s decision making processes.
The act of cataloging data does not, in itself, do anything to improve data quality,
but can promote transparency into data quality metrics. There are several ways
that an enterprise data catalog can go about that task, including data profiling,
detailed technical lineage, collating user feedback and certification.
However, truly improving the quality of source data requires expert knowledge
and clear accountability. This is what governance brings to the table. By assigning
owners and stewards, organizations can begin putting in place the right processes
to identify, track and remediate issues relating to data quality. Ultimately,
governance ensures there is a foundation of trust, which is what organizations
need to drive agile data-driven operations.
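
As a rough illustration of the data profiling mentioned above, the sketch below computes a few basic quality metrics for a single column. It is not a description of how any particular catalog product works; the function name `profile_column`, the chosen metrics and the sample values are all assumptions made for the example.

```python
from collections import Counter


def profile_column(values):
    """Return simple quality metrics for one column of raw values (illustrative only)."""
    total = len(values)
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "row_count": total,
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample column: country codes with gaps and inconsistent casing.
country = ["US", "us", None, "GB", "", "US"]
profile = profile_column(country)
```

Metrics like these only make quality issues visible. Deciding whether "US" and "us" should be merged, and who fixes the source system, remains a task for the owners and stewards that governance assigns.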
Consistency
Even small and medium-sized organizations run into issues relating to data
consistency. It is all too easy for data to proliferate without upfront consideration
of standards. When new systems are developed or procured, there is often little
thought given to standardizing nomenclature or ensuring KPIs and other
analytics are calculated consistently. Data consistency, therefore, has to be
achieved after the horse has bolted, which requires significant
amounts of effort and coordination. Given the amount of technical debt faced
by most organizations, the task of achieving data consistency can be even more
daunting for large enterprises. Without a common language and standardized
definitions, it is impossible to ensure coherent analysis of data gathered from
multiple source systems or business units. That means organizations risk
jeopardizing their analyses by failing to make apples-to-apples comparisons
and ultimately reaching invalid conclusions.
Compliance
Compliance with rules and regulations that govern the use of data is a challenge
that is growing not only in importance, but also in complexity. Privacy regulations
around the globe have evolved significantly in recent years. The European
Union’s General Data Protection Regulation (GDPR) started what has become a
global trend, with comprehensive privacy regulations being enacted that carry
significant penalties for non-compliance. Examples of other jurisdictions to
follow suit include the recently introduced California Consumer Privacy
Act (CCPA) and Brazilian General Data Protection Law, as well as India’s
Personal Data Protection Bill, which is still working its way through parliament.
To further complicate matters, these new regional, national and supranational rules have
to be managed in conjunction with industry-specific regulations (such as HIPAA
and BCBS 239), along with rules from tax authorities that require data retention
for audit purposes and a variety of internal policies (for example, prohibiting
salary information being shared outside of HR).
Key takeaway
The combination of these three challenges (data quality, consistency and compliance)
is what ultimately prevents an organization from being agile and data driven in its
decision making. Poor data quality and inconsistency in data definitions lead to a lack
of trust in underlying data and analysis. The complexity of compliance requirements can
also act as an impediment to agile operations. The only way to enable true self-service
business intelligence is for organizations to codify their data usage policies into granular
controls that allow access to data only under permitted circumstances.
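
To make the idea of codifying usage policies into granular controls more concrete, the following minimal sketch encodes the internal policy cited earlier (salary information must not be shared outside of HR) as a rule that can be checked before access is granted. The `UsagePolicy` class, the role and purpose names and the `hr.salaries` identifier are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical usage policy: who may use a data set, and for what purpose."""
    data_set: str
    allowed_roles: frozenset
    allowed_purposes: frozenset


def is_access_permitted(policy, role, purpose):
    """Grant access only when both the role and the stated purpose are permitted."""
    return role in policy.allowed_roles and purpose in policy.allowed_purposes


# Assumed example policy: salary data stays within HR.
salary_policy = UsagePolicy(
    data_set="hr.salaries",
    allowed_roles=frozenset({"hr_analyst"}),
    allowed_purposes=frozenset({"compensation_review"}),
)
```

Under these assumptions, `is_access_permitted(salary_policy, "hr_analyst", "compensation_review")` returns `True`, while a request from a role outside HR, or for a purpose the policy does not list, is refused.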
This section details aspects of data governance that are crucial to successful data
catalog implementations of all kinds, but particularly for strategic, enterprise
initiatives.
Data Quality
Certification: Leveraging insights from data experts to highlight trusted data sets
Consistency
Data Lineage: Can have positive impact on data quality (helping to track errors),
consistency (ensure reports pointed to right sources) and compliance (e.g. BCBS 239)
Organizations invariably find that different systems will name the same concept
in different ways. Business terminology can be made even more confusing in an
environment of constant change. Any ambiguity in definitions causes confusion
in analytical processes and can lead to inaccurate conclusions. Expert insights
are required to clear up those discrepancies with the help of tools such as a
business glossary, data dictionary and reference data crosswalks.
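
A business glossary and a crosswalk can be pictured as two simple lookup tables, as in the sketch below. The system names, column names and definitions are invented for the example and do not reflect any specific product's data model.

```python
# Hypothetical business glossary: agreed term -> agreed definition.
GLOSSARY = {
    "Customer ID": "Unique identifier assigned to a customer.",
    "Revenue": "Income recognized from sales, net of returns.",
}

# Hypothetical crosswalk: (source system, physical column) -> business term.
CROSSWALK = {
    ("crm", "cust_id"): "Customer ID",
    ("billing", "customer_number"): "Customer ID",
    ("crm", "rev_amt"): "Revenue",
    ("erp", "net_sales"): "Revenue",
}


def to_business_term(system, column):
    """Resolve a physical column to its business term, or None if unmapped."""
    return CROSSWALK.get((system, column))


def columns_for_term(term):
    """List every (system, column) pair that represents the given business term."""
    return sorted(key for key, value in CROSSWALK.items() if value == term)
```

With a mapping like this, an analyst comparing revenue across systems can see that `rev_amt` and `net_sales` are meant to represent the same concept. An unmapped column returns `None`, which flags it for review by a data steward.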
Data lineage
makes data meaningful and helps to assure trust, mitigate risks (particularly
from change management) and demonstrate compliance. In fact, end-to-end
lineage is a necessary and crucial foundation for all data-driven initiatives.
Accurate data lineage helps data consumers ensure reports are pointed at the
correct sources, and easily trace any potential errors identified through their
analysis. From the perspective of compliance, data lineage can also be a useful
tool to track data processing activities and monitor where sensitive information
resides and how it is processed.
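
Tracing a report back to its sources, as described above, amounts to walking a graph of dependencies. The sketch below assumes lineage is stored as a mapping from each asset to its direct inputs. The asset names are hypothetical, and a real catalog would harvest this information rather than hard-code it.

```python
# Hypothetical lineage graph: each asset maps to the assets it is built from.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["crm_export"],
}


def upstream_sources(asset, lineage):
    """Return every asset that feeds into `asset`, directly or indirectly."""
    seen = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def root_sources(asset, lineage):
    """Return only the original sources, i.e. upstream assets with no inputs of their own."""
    return {a for a in upstream_sources(asset, lineage) if a not in lineage}
```

If an error is spotted in `sales_dashboard`, `upstream_sources` lists every asset that could have introduced it, and `root_sources` narrows that list to the originating systems.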
Collaborative workflows
Key takeaway
Most companies have to deal with technical debt — including legacy information
systems and siloed data architectures — that makes the challenge of maintaining data
quality, consistency and compliance all the more complex. Tackling these challenges
on a piecemeal basis is simply not feasible. It means organizations need to create an
abstraction layer that can harmonize their disparate physical data infrastructure. Data
catalogs play a key role in that abstraction, particularly if they offer embedded governance
capabilities. They enable organizations to establish centralized definitions, mappings,
policies and controls that can be applied consistently across disparate physical data
stores. They also help to clear up confusion arising from complex data infrastructures by
promoting certification and enabling effective and compliant information sharing.
Business users gain rapid access to trusted data for meaningful analysis, while
the enterprise can also overlay appropriate policies to mitigate potential liabilities
from data misuse.
An open architecture ensures Collibra Data Catalog can sit on top of tactical
solutions designed for narrow use cases, allowing firms to manage data across
disparate enterprise architectures and business silos.
Insights into the way data flows from source through to consumer, including
views of both technical and business lineage, help drive data quality, mitigate
risk and demonstrate compliance.
Embedded governance within a data catalog ensures that access to data can
be granted enterprise-wide without risk. Defining appropriate usage policies
allows organizations to administer appropriate access controls.
Increase productivity
Access data that you trust, understand, and can use right away.
For additional questions, contact us at:
United States: +1 646 893 3042
United Kingdom: +44 203 695 6965
All other locations: +32 2 894 79 60
By email: [email protected]