
The What, Why and How of Data Quality

This document discusses data quality, including what it is, why it is important, and best practices for improving data quality. Specifically: 1) Data quality refers to how well data accurately represents real-world entities and is fit for its intended uses. Poor data quality can negatively impact business operations, information quality, and decision making. 2) Common causes of poor data quality include human errors, lack of communication between departments, and inadequate data strategies. Improving data quality requires addressing people, processes, and technology. 3) Examples of how poor data quality can negatively impact businesses include overspending in marketing due to duplicate records, inability to present complete product data online, and inconsistent financial reporting. Better data

Uploaded by rlwersal
Copyright © All Rights Reserved

DATA QUALITY – WHAT, WHY, HOW, 10 BEST PRACTICES & MORE!

As data is becoming a core part of every business operation, the quality of the data that is gathered, stored and consumed during business processes will determine the success achieved in doing business today and tomorrow.

This article covers the following topics about Data Quality:

• Ask the question: What is Data Quality?
• Examine the importance of data quality
• Go through how to improve data quality
• Outline the data quality best practices
• List some good data quality resources


What is Data Quality?

You can regard data as the foundation for a hierarchy where data is the bottom level. On top of data you have information, being data in context. Further up we have knowledge, seen as actionable information, and on the top level wisdom, as the applied knowledge.

If you have bad data quality, you will not have good information quality. With bad information quality you will lack actionable knowledge in business operations and will either be unable to apply that knowledge or apply it wrongly, with risky business outcomes as the result.

There are many definitions of data quality. The two predominant ones are:

1 Data is of high quality if the data is fit for the intended purpose of use
2 Data is of high quality if the data correctly represents the real-world construct that the data describes

These two possible definitions may contradict each other. If, for example, a customer master data record is fit for issuing an invoice and receiving a payment, it may be fit for that purpose. But if the customer master data record at the same time is incomplete or incorrect for doing customer service, because the data describes the who, what and where of the real-world entity having the customer role in that business operation incompletely or incorrectly, we have a business problem.

Not least, master data must often be fit for multiple purposes. You can achieve that by ensuring the real-world alignment. On the other hand, it might not be profitable and proportionate to strive for perfect real-world alignment in order to have data fit for the intended purpose of use within the business objective where a data quality initiative is funded. Thus, in practice, it is about striking a balance between these two definitions.

Figure 1.

In research commissioned by Experian Data Quality in 2013, the top reason for data inaccuracy was found to be human error, with 59% of cases assessed as stemming from that cause. Avoiding or eventually correcting low quality data caused by human error requires a comprehensive effort with the right mix of remedies involving people, processes and technology.

Other top reasons for data inaccuracy found in the mentioned research are lack of communication between departments (31%) and inadequate data strategy (24%). Solving such issues calls for passionate top-level management involvement.

Importance of Data Quality

Usually it is not hard to get everyone in a business, including the top-level management, to agree that having good data quality is good for business. In the current era of digital transformation, the support for focussing on data quality is even better than it was before.

However, when it comes to the essential questions about who is responsible for data quality, who must do something about it and who will fund the necessary activities, then the going gets tough.

Data quality resembles human health. Accurately testing how any one element of our diet and exercise may affect our health is fiendishly difficult. In the same way, accurately testing how any one element of our data may affect our business is fiendishly difficult too.

Nevertheless, numerous experiences tell us that bad data quality is not very healthy for business.

The classic examples are:

• In marketing you overspend, and annoy your prospects, by sending the same material more than once to the same person, with the name and address spelled a bit differently. The problem here is duplicates within the same database and across several internal and external sources.
• In online sales you cannot present sufficient product data to support a self-service buying decision. The issues here are completeness of product data within your databases and how product data is syndicated between trading partners.
• In supply chain you cannot automate processes based on reliable location information. The challenges here are using the same standards and having the necessary precision within the location data.

• In financial reporting you get different answers for the same question. This is due to inconsistent data, varying freshness of data and unclear data definitions.

On a corporate level, data quality issues have a drastic impact on meeting core business objectives, such as:

• Inability to react in time to new market opportunities, thus hindering profit and growth achievements. Often this is due to not being ready for repurposing existing data that was only fit for yesterday's requirements.
• Obstacles in implementing cost reduction programs, as the data that must support the ongoing business processes needs too much manual inspection and correction. Automation will only work on complete and consistent data.
• Shortcomings in meeting increasing compliance requirements. These requirements span from privacy and data protection regulations such as GDPR, through health and safety requirements in various industries, to financial restrictions, requirements and guidelines. Better data quality is most often a must in order to meet those compliance objectives.
• Difficulties in exploiting predictive analysis on corporate data assets, resulting in more risk than necessary when making both short-term and long-term decisions. These challenges stem from issues around duplication of data, data incompleteness, data inconsistency and data inaccuracy.

HOW TO IMPROVE DATA QUALITY

Improving data quality takes a balanced mix of medicine encompassing people, processes and technology as well as a good portion of top-level management involvement.

Figure 2.

Data Quality Dimensions

When improving data quality, the aim will be to measure and improve a range of data quality dimensions.

Uniqueness is the most addressed data quality dimension when it comes to customer master data. Customer master data is often marred by duplicates, meaning two or more database rows describing the same real-world entity. There are several remedies around to cure that pain, going from intercepting the duplicates at the onboarding point to bulk deduplication of records already stored in one or several databases.

With product master data, uniqueness is a less frequent issue. However, completeness is often a big pain. One reason is that completeness means different requirements for different categories of products.

When working with location master data, consistency can be a challenge. Addressing, so to speak, the different postal address formats around the world is certainly not a walkover.

In the intersection between the location domain and the customer domain, the data quality dimension called precision can be hard to manage, as different use cases require different precision for a location, whether that is a postal address and/or a geographic position.

What is relevant to know about your customers and what is relevant to tell about your products are essential questions in the intersection of the customer and product master data domains.
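
As a rough illustration of how the uniqueness and completeness dimensions could be measured, the sketch below scores a handful of invented records. The field names, the sample rows and the rule that required attributes depend on the product category are all assumptions made for the example, and the uniqueness key only catches duplicates that differ in case or surrounding whitespace.

```python
from collections import Counter

# Hypothetical customer rows; field names and values are invented for illustration.
customers = [
    {"id": 1, "name": "Anna Smith", "postal_code": "90210"},
    {"id": 2, "name": "anna smith ", "postal_code": "90210"},
    {"id": 3, "name": "Bob Jones", "postal_code": "10001"},
    {"id": 4, "name": "Carla Diaz", "postal_code": "60601"},
]

# Hypothetical rule: the required attributes depend on the product category.
REQUIRED_BY_CATEGORY = {
    "apparel": ["name", "size", "colour"],
    "power_tool": ["name", "voltage", "weight_kg"],
}

products = [
    {"category": "apparel", "name": "T-shirt", "size": "M", "colour": "blue"},
    {"category": "apparel", "name": "Scarf", "size": "", "colour": "red"},
    {"category": "power_tool", "name": "Drill", "voltage": "18", "weight_kg": None},
]


def uniqueness(rows, key_fields):
    """Share of rows that remain after collapsing rows with the same normalized key."""
    keys = {
        tuple(str(row[f]).strip().lower() for f in key_fields) for row in rows
    }
    return len(keys) / len(rows)


def completeness(rows, required_by_category):
    """Share of required attributes that are filled in, reported per category."""
    filled = Counter()
    expected = Counter()
    for row in rows:
        for field_name in required_by_category[row["category"]]:
            expected[row["category"]] += 1
            if row.get(field_name) not in (None, ""):
                filled[row["category"]] += 1
    return {cat: filled[cat] / expected[cat] for cat in expected}


customer_uniqueness = uniqueness(customers, ["name", "postal_code"])
product_completeness = completeness(products, REQUIRED_BY_CATEGORY)
print(customer_uniqueness)
print(product_completeness)
```

Scores like these only become useful once they are tracked over time and compared with a target agreed with the business.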

Conformity of product data is related to locations. Take unit measurement. In the United States the length of a small thing will be in inches. In most of the rest of the world it will be in centimetres. In the UK you will never know.

Timeliness, meaning whether the data is available at the time needed, is the everlasting data quality dimension across all domains.

Other data quality dimensions to measure and improve are data accuracy, being about the real-world alignment or alignment with a verifiable source, data validity, being about whether data is within the specified business requirements, and data integrity, being about whether the relations between entities and attributes are technically consistent.

Data Quality Management

In data quality management the goal is to exploit a balanced set of remedies in order to prevent future data quality issues and to cleanse (or ultimately purge) data that does not meet the data quality Key Performance Indicators (KPIs) needed to achieve the business objectives of today and tomorrow.

The data quality KPIs will typically be measured on the core business data assets within data quality dimensions such as data uniqueness, data completeness, data consistency, data conformity, data precision, data relevance, data timeliness, data accuracy, data validity and data integrity.

The data quality KPIs must relate to the KPIs used to measure the business performance in general.

The remedies used to prevent data quality issues and to carry out eventual data cleansing include these disciplines:

• Data Governance
• Data Profiling
• Data Matching
• Data Quality Reporting
• Master Data Management (MDM)
• Customer Data Integration (CDI)
• Product Information Management (PIM)
• Digital Asset Management (DAM)
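
The conformity dimension mentioned earlier for unit measurement lends itself to a small sketch: lengths arriving in inches, centimetres or millimetres are brought to one agreed unit before they are stored. The function name, the unit codes and the sample values are assumptions for illustration only; the one hard fact used is that an inch is defined as exactly 2.54 cm.

```python
# Conversion factors to centimetres. The inch is defined as exactly 2.54 cm.
CM_PER_UNIT = {"cm": 1.0, "mm": 0.1, "in": 2.54}


def to_centimetres(value, unit):
    """Convert a length to centimetres, or raise if the unit is not recognised."""
    key = unit.strip().lower()
    if key not in CM_PER_UNIT:
        raise ValueError(f"unknown length unit: {unit!r}")
    return round(value * CM_PER_UNIT[key], 2)


# Hypothetical product lengths as received from different markets.
raw_lengths = [(4.0, "in"), (10.0, "cm"), (25.0, "MM")]
conformed = [to_centimetres(value, unit) for value, unit in raw_lengths]
print(conformed)
```

Rejecting an unknown unit instead of guessing keeps the conformity problem visible at the onboarding point, which is where the cause can be fixed.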

Data Governance

A data governance framework must lay out the data policies and data standards that set the bar for which data quality KPIs are needed and which data elements should be addressed. This includes which business rules must be adhered to and underpinned by data quality measures.

Furthermore, the data governance framework must encompass the organizational structures needed to achieve the required level of data quality. This includes fora such as a data governance committee or similar, and roles such as data owners, data stewards, data custodians or similar, in balance with what makes sense in a given organization.

A business glossary is another valuable outcome from data governance used in data quality management. The business glossary is a primer to establish the metadata used to achieve common data definitions within an organization and eventually in the business ecosystem where the organization operates.

Data Profiling

It is essential that the people who are appointed to be responsible for data quality and those who are tasked with preventing data quality issues and with data cleansing have a deep understanding of the data at hand.

Data profiling is a method, often supported by dedicated technology, used to understand the data assets involved in data quality management. These data assets have most often been populated over the years by different people operating under varying business rules and gathered for bespoke business objectives.

In data profiling the frequency and distribution of data values is counted on relevant structural levels. Data profiling can also be used to discover the keys that relate data entities across different databases and, to the degree that this is not already done, within single databases.

Data profiling can be used to directly measure data integrity and can be used as input to set up the measurement of other data quality dimensions.

Data Matching

When it comes to real-world alignment, using exact keys in databases is not enough.

The classic example is how we spell the name of a person differently due to misunderstandings, typos, use of nicknames and more. With company names the issues just pile up with funny mnemonics and inclusion of legal forms. When we place these persons and organizations at locations using a postal address, the ways of writing the address have numerous outcomes too.

Data matching is a technology based on match codes (for example Soundex), fuzzy logic and increasingly also machine learning, used to determine if two or more data records describe the same real-world entity (typically a person, a household or an organization).

This method can be used in deduplicating a single database and in finding matching entities across several data sources.

Often data matching is based on data parsing, where names, addresses and other data elements are split into discrete data elements. For example, an envelope type address is split into building name, unit, house number, street, postal code, city, state/province and country. This may be supplemented by data standardization, for example using the same value for street, str and st.
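
To make the match code and standardization ideas concrete, here is a minimal sketch that combines an American Soundex code for the last name with a fuzzy comparison of standardized addresses. The record layout, the synonym table, the 0.85 threshold and the function names are assumptions for illustration, and the sketch skips the parsing step described above. Production matching engines use far richer rules; note, for instance, that "st" can also stand for "saint".

```python
from difflib import SequenceMatcher

# American Soundex digit groups.
SOUNDEX_CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        SOUNDEX_CODES[letter] = digit


def soundex(name):
    """Return the four-character American Soundex match code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "hw":  # h and w do not separate letters with equal codes
            previous = digit
    return (code + "000")[:4]


# Hypothetical standardization table: one value for street, str and st.
STREET_SYNONYMS = {"street": "street", "str": "street", "st": "street"}


def standardize_address(address):
    """Lower-case, drop punctuation and replace street abbreviations."""
    tokens = address.lower().replace(".", "").replace(",", "").split()
    return " ".join(STREET_SYNONYMS.get(token, token) for token in tokens)


def is_probable_match(a, b, threshold=0.85):
    """Same Soundex code on last name and similar standardized address."""
    same_code = soundex(a["last_name"]) == soundex(b["last_name"])
    similarity = SequenceMatcher(
        None, standardize_address(a["address"]), standardize_address(b["address"])
    ).ratio()
    return same_code and similarity >= threshold


rec_a = {"last_name": "Smith", "address": "12 Main St."}
rec_b = {"last_name": "Smyth", "address": "12 Main Street"}
rec_c = {"last_name": "Jones", "address": "12 Main Street"}
print(is_probable_match(rec_a, rec_b), is_probable_match(rec_a, rec_c))
```

Requiring both signals keeps false positives down at the cost of missing some true duplicates; where to set that balance is a business decision rather than a technical one.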

Data Quality Reporting

The findings from data profiling can be used as input to measure data quality KPIs based on the data quality dimensions relevant to a given organization. The findings from data matching are especially useful for measuring data uniqueness.

In addition, it is helpful to operate a data quality issue log, where known data quality issues are documented, and the preventive and data cleansing activities are followed up.

Organizations focussing on data quality find it useful to operate a data quality dashboard highlighting the data quality KPIs and the trend in their measurements as well as the trend in issues going through the data quality issue log.

Master Data Management (MDM)

Most data quality issues, and the most difficult ones, are related to master data such as party master data (customer roles, supplier roles, employee roles and more), product master data and location master data.

Preventing data quality issues in a sustainable way, and not being forced to launch data cleansing activities over and over again, will for most organizations mean that an MDM framework must be in place.

Master Data Management and Data Quality Management (DQM) are tightly coupled disciplines. MDM and DQM will be a part of the same data governance framework and share the same roles, such as data owners, data stewards and data custodians. Data profiling activities will most often be done with master data assets. When doing data matching, the results must be kept in master data assets controlling the merged and purged records and the survivorship of data attributes relating to those records.

Customer Data Integration (CDI)

Not least, customer master data is in many organizations sourced from a range of applications. These are self-service registration sites, Customer Relationship Management (CRM) applications, ERP applications, customer service applications and perhaps many more.

Besides setting up the technical platform for compiling the customer master data from these sources into one source of truth, there is a huge effort in ensuring the data quality of that source of truth. This involves data matching and a sustainable way of ensuring the right data completeness, the best data consistency and adequate data accuracy.
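
Survivorship of data attributes, as mentioned for merged records, can be sketched as follows: once data matching has linked several source records to one customer, a rule decides which value of each attribute survives in the single source of truth. The rule used here, "most recently updated non-empty value wins", is only one possible choice, and the source names, fields and sample values are invented for the example.

```python
from datetime import date

# Hypothetical records for one customer, already linked by data matching.
matched_records = [
    {"source": "crm", "updated": date(2023, 5, 1),
     "name": "Anna Smith", "email": "", "phone": "555-0100"},
    {"source": "erp", "updated": date(2024, 1, 15),
     "name": "Anna Smith", "email": "anna@example.com", "phone": ""},
    {"source": "web", "updated": date(2022, 11, 3),
     "name": "A. Smith", "email": "a.smith@example.com", "phone": "555-0199"},
]


def build_golden_record(records, attributes):
    """For each attribute, keep the most recently updated non-empty value."""
    newest_first = sorted(records, key=lambda r: r["updated"], reverse=True)
    golden = {}
    lineage = {}  # which source each surviving value came from
    for attr in attributes:
        golden[attr] = None
        lineage[attr] = None
        for record in newest_first:
            if record.get(attr):
                golden[attr] = record[attr]
                lineage[attr] = record["source"]
                break
    return golden, lineage


golden, lineage = build_golden_record(matched_records, ["name", "email", "phone"])
print(golden)
print(lineage)
```

Keeping the lineage next to the surviving values makes it possible to explain, and if needed reverse, a merge later on.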

Product Information Management (PIM)

As a manufacturer of goods, you need to align your internal data quality KPIs with those of your distributors and merchants in order to make your products the ones that will be chosen by end customers wherever they have a touchpoint in the supply chain. This must be done by ensuring data completeness and other data quality dimensions within the product data syndication processes.

As a merchant of goods, you will collect product information from many suppliers, each having their own data quality KPIs (or not having them yet). Merchants must therefore work closely with their suppliers and strive to have a uniform way of receiving product data in the best quality according to the data quality KPIs on the merchant side.

Digital Asset Management (DAM)

Digital assets are images, text documents, videos and other files often used in conjunction with product data. Through the data quality lens, the challenges for this kind of data are around correct and relevant tagging (metadata) as well as the quality of the assets as such, for example whether a product image shows only the product clearly and not a lot of other things too.

Data Quality Best Practices

In the following we will, based on the reasoning provided above in this post, list a collection of 10 highly important data quality best practices. These are:

1. Ensure top-level management involvement. Quite a lot of data quality issues are only solved by having a cross-departmental view.

2. Manage data quality activities as a part of a data governance framework. This framework should set the data policies and data standards, define the roles needed and provide a business glossary.

3. Fill the roles of data owners and data stewards from the business side of the organization, and fill data custodian roles from business or IT where it makes most sense.

4. Use a business glossary as the foundation for metadata management. Metadata is data about data, and metadata management must be used to have common data definitions and link those to current and future business applications.

5. Operate a data quality issue log with an entry for each issue, with information about the assigned data owner and the involved data steward(s), the impact of the issue, the resolution and the timing of the necessary proceedings.

6. For each data quality issue raised, start with a root cause analysis. The data quality problems will only go away if the solution addresses the root cause.

7. When finding solutions, strive to implement processes and technology that prevent the issues from occurring as close to the data onboarding point as possible, rather than relying on downstream data cleansing.

8. Define data quality KPIs that are linked to the general KPIs for business performance. Data quality KPIs, sometimes also called Data Quality Indicators (DQIs), can be related to data quality dimensions such as data uniqueness, data completeness and data consistency.

9. Use anecdotes about data quality train wrecks to raise awareness of the importance of data quality. However, use fact-based impact and risk analysis to justify the solutions and the needed funding.

10. Today a lot of data is already digitalized. Therefore, avoid typing in data where possible. Instead, try to find cost-effective solutions for data onboarding that utilize third party data sources for publicly available data, for example locations in general and names, addresses and IDs for companies and in some cases individual persons. For product data, utilize second party data from trading partners where possible.
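
The data quality issue log from practice 5, together with the root cause from practice 6, could be modelled along the lines below. This is a sketch under stated assumptions: the class name, the field names and the sample issues are invented, and a real log would normally live in a shared tool, not in code.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataQualityIssue:
    """One entry in a hypothetical data quality issue log."""
    issue_id: int
    description: str
    data_owner: str
    data_stewards: List[str]
    impact: str
    root_cause: Optional[str] = None
    resolution: Optional[str] = None
    raised_on: date = field(default_factory=date.today)
    due_on: Optional[date] = None
    resolved_on: Optional[date] = None

    @property
    def is_open(self) -> bool:
        return self.resolved_on is None


def open_issues_by_owner(log):
    """Count unresolved issues per data owner, e.g. for a dashboard."""
    counts = {}
    for issue in log:
        if issue.is_open:
            counts[issue.data_owner] = counts.get(issue.data_owner, 0) + 1
    return counts


issue_log = [
    DataQualityIssue(1, "Duplicate customer records after CRM import",
                     "Head of Sales", ["J. Doe"], "Double mailings to prospects",
                     raised_on=date(2024, 3, 1), due_on=date(2024, 4, 1)),
    DataQualityIssue(2, "Missing weight on power tools",
                     "Head of Product", ["A. Roe"], "Shipping cost cannot be computed",
                     root_cause="Supplier template lacks the field",
                     resolution="Template updated",
                     raised_on=date(2024, 2, 1), resolved_on=date(2024, 2, 20)),
    DataQualityIssue(3, "Inconsistent country codes on addresses",
                     "Head of Sales", ["J. Doe", "K. Poe"], "Failed deliveries",
                     raised_on=date(2024, 3, 10)),
]
open_counts = open_issues_by_owner(issue_log)
print(open_counts)
```

Counting open issues per owner is one simple way to feed the trend figures that a data quality dashboard would show.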

Data Quality Resources

There are many resources out there where you can learn more about data quality. Please find below a list of some of the resources that may be very useful when framing a data quality strategy and addressing specific data quality issues:

• Larry P. English is the father of data and information quality management. His thoughts are still available here: https://www.information-management.com/author/larry-english-im30029

• Thomas C. Redman, aka the Data Doc, writes about data quality and data in general in Harvard Business Review. His articles are found here: https://hbr.org/search?term=thomas%20c.%20redman

• David Loshin has written a book with the title The Practitioner's Guide to Data Quality Improvement: http://dataqualitybook.com/?page_id=2

• Gartner, the analyst firm, has a glossary with definitions of data quality terms here: https://www.gartner.com/it-glossary/?s=data+quality

• Massachusetts Institute of Technology (MIT) has a Total Data Quality Management (TDQM) Program: http://web.mit.edu/tdqm/www/index.shtml

• Knowledgent, a part of Accenture, provides a white paper on Data Quality Management here: https://knowledgent.com/whitepaper/building-successful-data-quality-management-program/

• Deloitte has published a case study called Data quality driven, customer insights enabled: https://www2.deloitte.com/us/en/pages/deloitte-analytics/articles/data-quality-driven-customer-insights-enabled.html

• An article on bi-survey examines why data quality is essential in Business Intelligence: https://bi-survey.com/data-quality-master-data-management

• The University of Leipzig has a page on data matching in big data environments (they call it dedoop): https://dbs.uni-leipzig.de/dedoop

• A Toolbox article by Steve Jones goes through How to Achieve Quality Data in a Big Data context: https://it.toolbox.com/blogs/stevejones/how-to-achieve-quality-data-111618

• An Information Week article points to 8 Ways To Ensure Data Quality: https://www.informationweek.com/big-data/big-data-analytics/8-ways-to-ensure-data-quality/d/d-id/1322239?image_number=1

• Data Quality Pro is a site, managed by Dylan Jones, with a lot of information about data quality: https://www.dataqualitypro.com/

• Obsessive-Compulsive Data Quality (OCDQ) by Jim Harris is an inspiring blog about data quality and its related disciplines: http://www.ocdqblog.com/

• Nicola Askham runs a blog about data governance: https://www.nicolaaskham.com/blog One of the posts in this blog is about what to include in a data quality issue log: https://www.nicolaaskham.com/blog/2018-21-02what-do-you-include-in-data-quality-issue-log

• Henrik Liliendahl has a long-running blog with over 1,000 blog posts about data quality and Master Data Management: https://liliendahl.com/

• A blog called Viqtor Davis Data Craftmanship provides some useful insights on data management: https://www.viqtordavis.com/blog/

Fast Track Data Management

Profisee is a leading enterprise data management company that makes it easy and affordable for any size organization to ensure a trusted data foundation. Our unique, Fast Track Your Data Management approach allows companies to accelerate their business digital strategies with enterprise data management capability.

We serve the 90% of companies yet to adopt an enterprise MDM (master data management) platform by offering the first “Fast, Affordable, and Scalable” solution. Customers no longer need to choose between cost, performance and speed. No matter where an organization is on their data management journey, we help them become strategic. Our customers have the freedom to choose their deployment, with the flexibility to deliver on premise, in the cloud, or via a hybrid model.

Visit Profisee.com to learn more or contact us to get a conversation started.

Profisee Headquarters
+1 678 202 8990

www.profisee.com
Documents_110_01_07
