Data Management Fundamentals
Data Management is the development, execution, and supervision of plans, policies, programs, and
practices that deliver, control, protect, and enhance the value of data and information assets
throughout their life cycles.
A Data Management Professional is any person who works in any facet of data management
(from technical management of data throughout its lifecycle to ensuring that data is properly
utilized and leveraged) to meet strategic organizational goals.
Data management professionals are known by many names in the industry and fill numerous roles,
from the highly technical (for example, database administrators, network administrators, and
programmers) to the strategic and business-oriented (such as Data Stewards, Data Strategists, and
Chief Data Officers).
Data management activities range from making consistent decisions about how to get strategic
value from data to the technical deployment and performance of databases.
Thus, data management requires both technical and non-technical (in other words ‘business’) skills.
Responsibility for managing data must be shared between business and information technology roles,
and people in both areas must be able to collaborate to ensure an organization has high-quality data
that meets its strategic needs.
The framework of the Analytics Association of the Philippines defines five main career paths for a
data professional:
Data Steward
Data Engineer
Data Scientist
Functional Analyst
Analytics Manager
Data management is involved in each of these career paths, but it most directly affects Data
Stewards and Data Engineers.
Data Stewards develop, enforce, and maintain an organization’s data governance process, data
usage, and data security policies to ensure that data assets provide the organization with high-quality
data.
Data Engineers design, construct, test, and maintain data infrastructures including applications
that extract, clean, transform, and load data from the data sources to centralized data
repositories.
These two roles benefit tremendously from understanding data management concepts, and
they are usually the main practitioners of data management in any organization.
A central goal of data management is understanding and supporting the information needs of the
enterprise and its stakeholders, including customers, employees, and business partners.
In relation to information technology, data is also understood as information that has been stored
in digital form (though data is not limited to information that has been digitized and data
management principles apply to data captured on paper as well as in databases).
Still, because today we can capture so much information electronically, we call many things ‘data’
that would not have been called ‘data’ in earlier times – things like names, addresses, birthdates,
what one ate for dinner on Saturday, the most recent book one purchased.
Most people assume that, because data represents facts, it is a form of truth about the world and
that the facts will fit together.
But data is both an interpretation of the objects it represents and an object that must itself be
interpreted.
This is another way of saying that we need context for data to be meaningful.
Context can be thought of as data’s representational system; such a system includes a common
vocabulary and a set of relationships between components.
If we know the conventions of such a system, then we can interpret the data within it.
These conventions are often documented in a specific kind of data referred to as Metadata, or data
about data.
Think of the range of ways we have to represent calendar dates, a concept about which there is an
agreed-to definition.
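The calendar-date case can be made concrete with a small sketch. The raw value and the two format
conventions below are illustrative assumptions, not taken from any particular system; the point is
only that the same stored string yields different dates depending on the documented convention.

```python
from datetime import datetime, date

def interpret(raw: str, convention: str) -> date:
    """Interpret a raw date string using a documented format convention."""
    return datetime.strptime(raw, convention).date()

# Hypothetical value as it might arrive from a source system.
raw_value = "03/04/2021"

# The same string read under two different conventions (the Metadata).
month_first = interpret(raw_value, "%m/%d/%Y")  # month/day/year
day_first = interpret(raw_value, "%d/%m/%Y")    # day/month/year

print(month_first.isoformat())  # 2021-03-04
print(day_first.isoformat())    # 2021-04-03
```

Without the convention recorded somewhere, a consumer of this value has no way to know which of
the two dates was intended.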
Now consider more complex concepts (such as customer or product), where the granularity and
level of detail of what needs to be represented is not always self-evident.
Within a single organization, there are often multiple ways of representing the same idea.
Hence the need for Data Architecture, modeling, governance, and stewardship, and Metadata and
Data Quality management, all of which help people understand and use data.
Across organizations, the problem of multiplicity multiplies. Hence the need for industry-level
data standards that can bring more consistency to data.
[Figure: the Data, Information, Knowledge, Wisdom pyramid, with Data at the base and Wisdom at the
top]
While the pyramid can be helpful in describing why data needs to be well-managed, this
representation presents several challenges for data management.
By describing a linear sequence from data through wisdom, it fails to recognize that it takes
knowledge to create data in the first place.
It implies that data and information are separate things, when in reality, the two concepts are
intertwined with and dependent on each other.
Within an organization, it may be helpful to draw a line between information and data for purposes
of clear communication about the requirements and expectations of different uses by different
stakeholders.
(“Here is a sales report for the last quarter [information]. It is based on data from our data
warehouse [data]. Next quarter these results [data] will be used to generate our quarter-over-
quarter performance measures [information]”).
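The sales report example can be sketched in a few lines. The figures, quarter labels, and function
names here are hypothetical and chosen only for illustration: transaction rows (data) are summarized
into quarterly totals (information), and those totals then serve as data for a quarter-over-quarter
measure (information again).

```python
from collections import defaultdict

# Hypothetical warehouse rows: (quarter, sale amount).
sales_rows = [
    ("2023-Q3", 120.0),
    ("2023-Q3", 80.0),
    ("2023-Q4", 150.0),
    ("2023-Q4", 100.0),
]

def quarterly_totals(rows):
    """Summarize raw sales rows into a total per quarter (the 'sales report')."""
    totals = defaultdict(float)
    for quarter, amount in rows:
        totals[quarter] += amount
    return dict(totals)

def quarter_over_quarter(previous: float, current: float) -> float:
    """Relative change from the previous quarter to the current one."""
    return (current - previous) / previous

totals = quarterly_totals(sales_rows)
change = quarter_over_quarter(totals["2023-Q3"], totals["2023-Q4"])

print(totals)           # {'2023-Q3': 200.0, '2023-Q4': 250.0}
print(f"{change:.0%}")  # 25%
```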
Recognizing data and information need to be prepared for different purposes drives home a central
tenet of data management: Both data and information need to be managed.
Both will be of higher quality if they are managed together with uses and customer requirements in
mind.
Data as an Asset
An asset is an economic resource that can be owned or controlled and that holds or produces
value.
In the early 1990s, some organizations questioned whether goodwill should be given a monetary
value.
Now, the ‘value of goodwill’ commonly shows up as an item on the Profit and Loss Statement (P&L).
Similarly, while not universally adopted, monetization of data is becoming increasingly common.
Today’s organizations rely on their data assets to make more effective decisions and to operate more
efficiently.
Government agencies, educational institutions, and not-for-profit organizations also need high-
quality data to guide their operational, tactical, and strategic activities.
As organizations increasingly depend on data, the value of data assets can be more clearly
established.
Businesses aiming to stay competitive must stop making decisions based on gut feelings or
instincts, and instead, use event triggers and apply analytics to gain actionable insight.
Being data-driven includes the recognition that data must be managed efficiently and with
professional discipline, through a partnership of business leadership and technical expertise.
Furthermore, the pace of business today means that change is no longer optional; digital
disruption is the norm.
To react to this, business must co-create information solutions with technical data professionals
working alongside line-of-business counterparts.
They must plan for how to obtain and manage data that they know they need to support business
strategy.
They must also position themselves to take advantage of opportunities to leverage data in new
ways.
Like other management processes, data management must balance strategic and operational needs.
This balance can best be struck by following a set of principles that recognize salient features of
data management and guide data management practice.
First, Data is an asset with unique properties: Data is an asset, but it differs from other assets in
important ways that influence how it is managed. The most obvious of these properties is that data
is not consumed when it is used, as are financial and physical assets.
Second, The value of data can and should be expressed in economic terms: Calling data an
asset implies that it has value. While there are techniques for measuring data’s qualitative and
quantitative value, there are not yet standards for doing so. Organizations that want to make better
decisions about their data should develop consistent ways to quantify that value. They should also
measure both the costs of low-quality data and the benefits of high-quality data.
Third, Managing data means managing the quality of data: Ensuring that data is fit for purpose
is a primary goal of data management. To manage quality, organizations must ensure they
understand stakeholders’ requirements for quality and measure data against these requirements.
Fourth, It takes Metadata to manage data: Managing any asset requires having data about that
asset (number of employees, accounting codes, etc.). The data used to manage and use data is called
Metadata. Because data cannot be held or touched, to understand what it is and how to use it
requires definition and knowledge in the form of Metadata.
Metadata originates from a range of processes related to data creation, processing, and use,
including architecture, modeling, stewardship, governance, Data Quality management, systems
development, IT and business operations, and analytics.
Fifth, It takes planning to manage data: Even small organizations can have complex technical and
business process landscapes. Data is created in many places and is moved between places for use.
To coordinate work and keep the end results aligned requires planning from an architectural and
process perspective.
Sixth, Data management is cross-functional; it requires a range of skills and expertise: A data
team cannot manage all of an organization’s data. Data management requires both technical and
non-technical skills and the ability to collaborate.
Seventh, Data management requires an enterprise perspective: Data management has local
applications, but it must be applied across the enterprise to be as effective as possible. This is one
reason why data management and data governance are intertwined.
Eighth, Data management must account for a range of perspectives: Data is fluid. Data
management must constantly evolve to keep up with the ways data is created and used and the data
consumers who use it.
Ninth, Data management is lifecycle management: Data has a lifecycle and managing data
requires managing its lifecycle. Because data begets more data, the data lifecycle itself can be very
complex. Data management practices need to account for the data lifecycle.
Tenth, Different types of data have different lifecycle characteristics: And for this reason, they
have different management requirements. Data management practices have to recognize these
differences and be flexible enough to meet different kinds of data lifecycle requirements.
Eleventh, Managing data includes managing the risks associated with data: In addition to being
an asset, data also represents risk to an organization. Data can be lost, stolen, or misused.
Organizations must consider the ethical implications of their uses of data. Data-related risks must be
managed as part of the data lifecycle.
Twelfth, Data management requirements must drive Information Technology decisions: Data
and data management are deeply intertwined with information technology and information
technology management. Managing data requires an approach that ensures technology serves,
rather than drives, an organization’s strategic data needs.
Physical assets can be pointed to, touched, and moved around. They can be in only one place at a
time. Financial assets must be accounted for on a balance sheet.
However, data is different. Data is not tangible. Yet it is durable; it does not wear out, though the
value of data often changes as it ages.
Data is easy to copy and transport. But it is not easy to reproduce if it is lost or destroyed. Because it is
not consumed when used, it can even be stolen without being gone.
Data is dynamic and can be used for multiple purposes. The same data can even be used by multiple
people at the same time – something that is impossible with physical or financial assets.
Many uses of data beget more data. Most organizations must manage increasing volumes of data
and the relation between data sets.
Ensuring that data is of high quality is central to data management. Organizations manage their data
because they want to use it. If they cannot rely on it to meet business needs, then the effort to
collect, store, secure, and enable access to it is wasted.
To ensure data meets business needs, they must work with data consumers to define these needs,
including characteristics that make data of high quality.
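One way to picture measuring data against consumer-defined requirements is sketched below. The two
rules, the field names, and the sample records are assumptions made up for this illustration; real
requirements would come from the data consumers themselves.

```python
import re

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def name_present(record) -> bool:
    """Completeness check: the record carries a non-blank name."""
    return bool((record.get("name") or "").strip())

def birthdate_is_iso(record) -> bool:
    """Format check: the birthdate looks like YYYY-MM-DD."""
    return ISO_DATE.fullmatch(record.get("birthdate") or "") is not None

# Hypothetical requirements agreed with data consumers.
RULES = {
    "name is present": name_present,
    "birthdate is ISO formatted": birthdate_is_iso,
}

def measure(records, rules):
    """Return the share of records (0.0 to 1.0) that pass each rule."""
    if not records:
        return {}
    return {
        label: sum(1 for r in records if check(r)) / len(records)
        for label, check in rules.items()
    }

customers = [
    {"name": "Ana Cruz", "birthdate": "1990-05-17"},
    {"name": "", "birthdate": "1985-11-02"},
    {"name": "Ben Reyes", "birthdate": "17/05/1990"},
    {"name": "Carla Diaz", "birthdate": "1979-01-30"},
]

scores = measure(customers, RULES)
print(scores)  # {'name is present': 0.75, 'birthdate is ISO formatted': 0.75}
```

Scores like these give stakeholders a shared, repeatable way to say whether data is fit for a given
purpose, rather than relying on impressions.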
Largely because data has been associated so closely with information technology, managing Data
Quality has historically been treated as an afterthought.
IT teams are often dismissive of the data that the systems they create are supposed to store.
It was probably a programmer who first observed ‘garbage in, garbage out’ – and who no doubt
wanted to let it go at that.
But the people who want to use the data cannot afford to be dismissive of quality.
They generally assume data is reliable and trustworthy, until they have a reason to doubt these
things.
Organizations require reliable Metadata to manage data as an asset. Metadata in this sense should
be understood comprehensively. It includes not only the business, technical, and operational
Metadata but also the Metadata embedded in Data Architecture, data models, data security
requirements, data integration standards, and data operational processes.
Metadata describes what data an organization has, what it represents, how it is classified, where it
came from, how it moves within the organization, how it evolves through use, who can and cannot use
it, and whether it is of high quality.
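As a rough sketch, the questions Metadata answers can be captured in a simple record. The class
name, field names, and sample values below are hypothetical and do not correspond to any specific
Metadata standard or tool.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Illustrative Metadata record for one data set."""
    name: str                 # what data the organization has
    represents: str           # what it represents
    classification: str       # how it is classified
    source: str               # where it came from
    moves_to: list = field(default_factory=list)      # how it moves within the organization
    allowed_roles: set = field(default_factory=set)   # who can use it
    quality_checked: bool = False                      # whether its quality has been verified

    def can_use(self, role: str) -> bool:
        """Return True if the given role is allowed to use this data set."""
        return role in self.allowed_roles

customers_meta = DatasetMetadata(
    name="customers",
    represents="People or organizations that have purchased at least once",
    classification="confidential",
    source="CRM system",
    moves_to=["data warehouse", "marketing reports"],
    allowed_roles={"data_steward", "sales_analyst"},
    quality_checked=True,
)

print(customers_meta.can_use("sales_analyst"))  # True
print(customers_meta.can_use("intern"))         # False
```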
Data is abstract. Definitions and other descriptions of context enable it to be understood. They
make data, the data lifecycle, and the complex systems that contain data comprehensible.
Cross-functional nature
Data is managed in different places within an organization by teams that have responsibility for
different phases of the data lifecycle.
Data management requires design skills to plan for systems, highly technical skills to administer
hardware and build software, data analysis skills to understand issues and problems, analytic skills
to interpret data, language skills to bring consensus to definitions and models, as well as strategic
thinking to see opportunities to serve customers and meet goals.
The challenge is getting people with this range of skills and perspectives to recognize how the pieces
fit together so that they collaborate well as they work toward common goals.
To effectively manage data assets, organizations need to understand and plan for the data lifecycle.
Well-managed data is managed strategically, with a vision of how the organization will use its data.
A strategic organization will define not only its data content requirements, but also its data
management requirements.
These include policies and expectations for use, quality, controls, and security; an enterprise approach
to architecture and design; and a sustainable approach to both infrastructure and software
development.
Data Risk
Data not only represents value, it also represents risk. Low-quality data (inaccurate, incomplete, or
out-of-date) obviously represents risk because the information it provides is not right. But data is
also risky because it can be misunderstood and misused.
Organizations get the most value from the highest-quality data – available, relevant, complete,
accurate, consistent, timely, usable, meaningful, and understood. Yet, for many important decisions,
we have information gaps – the difference between what we know and what we need to know to
make an effective decision.
Information gaps represent enterprise liabilities with potentially profound impacts on operational
effectiveness and profitability.
Organizations that recognize the value of high-quality data can take concrete, proactive steps to
improve the quality and usability of data and information within regulatory and ethical cultural
frameworks.
From its inception, the concept of data management has been deeply intertwined with
management of technology.
In many organizations, there is ongoing tension between the drive to build new technology and the
desire to have more reliable data – as if the two were opposed to each other instead of necessary to
each other.
Successful data management requires sound decisions about technology, but managing
technology is not the same as managing data.
Instead, data requirements aligned with business strategy should drive decisions about technology.
In the game of chess, a strategy is a sequenced set of moves to win by checkmate or to survive by
stalemate.
A data strategy should include business plans to use information to competitive advantage and
support enterprise goals.
Data strategy must come from an understanding of the data needs inherent in the business
strategy: what data the organization needs, how it will get the data, how it will manage it and ensure
its reliability over time, and how it will utilize it.
In many organizations, the data management strategy is owned and maintained by the Chief Data
Officer (CDO) and enacted through a data governance team, supported by a Data Governance
Council.
Often, the CDO will draft an initial data strategy and data management strategy even before a Data
Governance Council is formed, in order to gain senior management’s commitment to establishing
data stewardship and governance.
A Data Management Charter: Overall vision, business case, goals, guiding principles, measures of
success, critical success factors, recognized risks, operating model, etc.
A Data Management Scope Statement: Goals and objectives for some planning horizon (usually 3
years) and the roles, organizations, and individual leaders accountable for achieving these
objectives.