
Licensed for Distribution

How to Succeed With Data Classification Using Modern Approaches
Published 25 March 2022 - ID G00764590 - 6 min read

By Ravisha Chugh, Bart Willemsen, and 1 more

Data classification is critical because most of an organization's data is unstructured, and
classifying it manually is cumbersome. Security and risk management leaders need to
understand the alternatives to traditional classification approaches and address data security
governance.

Overview
Key Findings
Manual data classification approaches can result in misclassification of data due to human
error or a lack of user awareness training.

Even when users label or tag their data, these labels remain one-dimensional, serving a single
purpose, and do not provide sufficient context for increasingly demanding regulatory data controls.

Recommendations
To implement an effective data classification program, security and risk management leaders
tasked with data security must:

Establish a data classification program by shifting focus from user awareness and training
toward automation and the enrichment tools that generate metadata.

Increase depth and dimensionality in the data classification approach by segmenting it into a
discovery phase, a data enrichment phase and a control phase.

Introduction
Data classification is vital because it supports data security and governance controls such as
data loss prevention (DLP), data access governance and enterprise digital rights management
(EDRM). It also helps organizations understand data in the context of its usage and risk levels.
However, Gartner observes that unstructured data is becoming increasingly difficult to manage.
As a result, the individuals or systems tasked with processing information rarely classify, label
or enforce controls on every piece of data. This inconsistency makes classification unreliable as
a driver and enabler of data security and compliance efforts. Organizations need a practical data
classification approach that gives the business a foundation for understanding its data and
applying the necessary mitigating measures.

Two types of data classification tools are available on the market:

1. User-driven/manual tools: These tools enforce the classification of data at the time of creation
or use. They rely on user education and awareness; without these, data ends up inconsistently
classified or misclassified.

2. Automated tools: These tools use out-of-the-box policies and templates provided by vendors to
identify and classify sensitive data. Beyond analyzing content, leading tools also leverage
context such as location, access groups and adjacent documents. Automated tools get the best
results with well-known, standardized data types (such as driving license information, proper
names and social security numbers). If your intellectual property data is consistently
well-formatted (such as with an account number or project coding system), automated systems
will succeed there as well.
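As a rough illustration of why well-formatted data types suit automated tools, the Python sketch below applies pattern-based content inspection to text. The pattern set, including the `PRJ-NNNN-NNNN` project code format, is a hypothetical stand-in for vendor policy templates, and the sketch ignores the contextual signals (location, access groups, adjacent documents) that leading tools also use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical patterns standing in for vendor-supplied policy templates.
PATTERNS = {
    # US social security number in NNN-NN-NNNN form, excluding some
    # never-issued prefixes (000, 666, 900-999) and all-zero groups.
    "ssn": re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b"),
    # Hypothetical internal project code such as "PRJ-2021-0042".
    "project_code": re.compile(r"\bPRJ-\d{4}-\d{4}\b"),
}


@dataclass(frozen=True)
class Finding:
    data_type: str
    value: str
    start: int  # character offset of the match in the inspected text


def inspect_content(text: str) -> list[Finding]:
    """Return every pattern match in the text, ordered by position."""
    findings = [
        Finding(name, match.group(), match.start())
        for name, pattern in PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)
```

For example, `inspect_content("SSN 123-45-6789 on PRJ-2021-0042")` yields one `ssn` finding and one `project_code` finding. Free-form proprietary content has no such stable shape, which is where this approach breaks down.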

The introduction of machine learning to automated data classification tools has proven
beneficial, especially as some of these tools now support dynamic feedback. They learn from the
responses provided by security analysts and administrators, which helps quickly address false
positives. For most tools, however, implementing and tuning them to reliably identify sensitive
internal or proprietary data in detail is prohibitively expensive. For those use cases,
user-driven classification should be considered instead (or, preferably, as well).
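Dynamic feedback can be approximated, very loosely, by tracking analyst verdicts. The sketch below is a deliberately simple stand-in for machine learning: instead of retraining a model, it records whether each hit from a detection rule was confirmed or dismissed and mutes rules whose confirmed precision falls below a threshold. The `FeedbackTuner` class and its `min_precision` and `min_reviews` parameters are hypothetical.

```python
from __future__ import annotations

from collections import Counter


class FeedbackTuner:
    """Hypothetical helper: mute detection rules that analysts keep dismissing.

    A simple precision tracker standing in for ML-based feedback; it does not
    retrain anything, it only decides whether a rule should keep firing.
    """

    def __init__(self, min_precision: float = 0.5, min_reviews: int = 5) -> None:
        self.min_precision = min_precision  # share of hits that must be confirmed
        self.min_reviews = min_reviews      # verdicts needed before muting a rule
        self.confirmed: Counter[str] = Counter()
        self.dismissed: Counter[str] = Counter()

    def record(self, rule: str, is_true_positive: bool) -> None:
        """Store one analyst verdict for a hit produced by `rule`."""
        if is_true_positive:
            self.confirmed[rule] += 1
        else:
            self.dismissed[rule] += 1

    def is_enabled(self, rule: str) -> bool:
        """Keep a rule active unless enough feedback shows it is mostly noise."""
        reviewed = self.confirmed[rule] + self.dismissed[rule]
        if reviewed < self.min_reviews:
            return True  # too little feedback so far; keep the rule active
        return self.confirmed[rule] / reviewed >= self.min_precision
```

The `min_reviews` guard keeps a rule from being muted on the strength of one or two early dismissals.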
Analysis
Enrich, Don’t Just Classify Data
Traditional data classification approaches have always relied on users. Data owners and data
creators were responsible for classifying any file or document they created or owned. This
approach has prerequisites, including user awareness training, educating users about the
importance of data classification, and established data classification policies.

To accommodate users, sensitivity classification schemes are often simplified into “buckets.” The
four levels of classification that are often used are:

Restricted

Confidential

Internal

Public

This approach depends on the understanding (and often the risk appetite) of the users who are
classifying the information. It is therefore prone to human error, which can lead to
misclassification of data.

Misclassification comes in two flavors. Data can be:

Underclassified (either through error or because users realize that a lower classification will
make their job easier).

Overclassified (a common mistake when users are risk-averse or uncomfortable with the
scheme, leading to overspending and difficulty in accessing and handling the data).
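The four buckets form an ordered scale, which makes it straightforward to flag both kinds of misclassification when a user-applied label can be compared with a level suggested by automated inspection. The sketch below assumes such a detected level is available; `check_label` is a hypothetical helper, not part of any product.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four common buckets, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def check_label(user_label: Sensitivity, detected: Sensitivity) -> str:
    """Compare a user's label with a level suggested by automated inspection."""
    if user_label < detected:
        return "underclassified"  # label is less strict than the content warrants
    if user_label > detected:
        return "overclassified"   # label is stricter than needed; hinders access
    return "consistent"
```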

Security and risk management (SRM) leaders who currently use a traditional classification scheme,
and find that it does not support the increased detail demanded by modern data governance laws,
should take steps to evolve toward metadata enrichment. Because metadata is, in general terms,
data about data, this approach adds information to the data that can also be embedded directly
into the files. This approach is called “descriptive classification.” Here, data is classified not
according to control requirements, but according to a semantic description of the data. Figure 1
shows an example of descriptive classification.

Figure 1. Descriptive Classification

Here, users set the description of the data (such as customer records, financial data and HR
data), which is mapped to the control requirement so that the description itself yields metadata.
The benefits of this method are a reduced need for user awareness and a reduction in human error
and misclassification. This approach also provides a good transition from control-based
classifiers, as each descriptive classifier maps to a control. The organization also gains the
benefit of inferred metadata associated with the descriptive classifier; for example, “HR data” is
taken to contain both “personal” and “personal sensitive” data. In addition, given the high risk of
data exfiltration, this approach helps organizations classify information easily and ensure that
only the right people have access to sensitive data. The one downside is that the list of
descriptive classifiers is far longer than a handful of sensitivity buckets.
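A descriptive classification scheme can be modeled as a lookup from each description to the controls it implies and the metadata that can be inferred from it. In the sketch below, the three descriptions and the “personal”/“personal sensitive” inference for HR data come from the text above; the control names and the other inferred tags are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptiveClassifier:
    controls: frozenset[str]           # controls the description maps to
    inferred_metadata: frozenset[str]  # metadata implied by the description


# Descriptions follow the examples in the text; control names are hypothetical.
SCHEME = {
    "HR data": DescriptiveClassifier(
        controls=frozenset({"restrict_to_hr_group", "dlp_block_external"}),
        inferred_metadata=frozenset({"personal", "personal sensitive"}),
    ),
    "customer records": DescriptiveClassifier(
        controls=frozenset({"dlp_block_external"}),
        inferred_metadata=frozenset({"personal"}),
    ),
    "financial data": DescriptiveClassifier(
        controls=frozenset({"restrict_to_finance_group"}),
        inferred_metadata=frozenset({"financial"}),
    ),
}


def label_document(description: str) -> dict[str, list[str]]:
    """Turn the user's chosen description into controls plus inferred metadata."""
    classifier = SCHEME[description]  # KeyError for descriptions not in the scheme
    return {
        "description": [description],
        "controls": sorted(classifier.controls),
        "metadata": sorted(classifier.inferred_metadata),
    }
```

The user only picks a description they understand; the controls and inferred metadata follow from the scheme, which is where the reduction in human error comes from.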

Adopt Governance Agility by Breaking Classification Into Discovery, Enrichment and Control
Because traditional manual data classification methods have many limitations, many tools now
provide automated data classification techniques. This approach is called “governance agility,”
and it involves three phases.

The first phase is discovery, which involves locating information. This may seem trivial, but the
nature of our digital world means that information is everywhere, and much of it is unknown to IT
teams. Most of the work carried out by automated data classification tools is data discovery.
Next comes enrichment, which takes the results of discovery and applies tags or labels to data
objects. Many tools automate this step using content inspection capabilities as well as AI-driven
methods, including machine learning, natural language processing (NLP) and computer vision.
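The discovery and enrichment phases can be sketched as two separate functions: one that locates candidate files and one that inspects their content and attaches tags. This minimal Python version assumes plain-text files and uses a couple of hypothetical regular-expression rules in place of the machine learning, NLP and computer vision methods that commercial tools apply.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical rules standing in for content inspection, ML, NLP and vision.
TAG_RULES = {
    "HR": re.compile(r"\b(r[eé]sum[eé]|curriculum vitae|employment history)\b", re.IGNORECASE),
    "Personal": re.compile(r"\b(date of birth|DOB)\b", re.IGNORECASE),
}


def discover(root: Path, suffixes: tuple[str, ...] = (".txt", ".md")) -> list[Path]:
    """Discovery phase: locate candidate files anywhere under `root`."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in suffixes
    )


def enrich(path: Path) -> dict[str, object]:
    """Enrichment phase: inspect the file's content and attach tags as metadata."""
    text = path.read_text(encoding="utf-8", errors="replace")
    tags = sorted(tag for tag, rule in TAG_RULES.items() if rule.search(text))
    return {"path": str(path), "tags": tags}
```

Keeping the phases separate means discovery can run broadly and often, while more expensive enrichment is applied only to what discovery returns.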

For example, the tags associated with a résumé document might include “Personal,” “Sensitive,”
“HR,” “CV,” “DOB: 19760822,” “Last Edit: 20190326” and “Region: India.” The last step is applying
controls, in which these tags provide the critical metadata that control tools (such as data
retention tools, DLP tools or content collaboration platforms) need to properly handle the files in
question (see Figure 2).

Figure 2. Governance Agility


In this example, simply detecting personal data in unstructured objects does not give an
organization much context with which to mitigate any risk. Associating metadata tags or labels
with the personal data in the objects gives the organization actionable outcomes, allowing
multiple control tools to automate risk mitigation. Metadata enrichment is an important step that
helps to develop a rich understanding of data and allows further controls to be applied. Some of
the vendors moving in this direction include NOW Privacy, MinerEye, Dathena (since acquired by
Proofpoint) and Securiti.AI.
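The control phase then consumes those tags. The sketch below, using the résumé tags from the example above, turns tags into per-tool actions for DLP, retention and collaboration tools. The policy table and action names are hypothetical; real control tools each define their own policy formats, and tags with no matching policy (such as “Region: India” here) are simply ignored.

```python
from __future__ import annotations

# Hypothetical policy: which action each control tool should take per tag.
CONTROL_POLICY: dict[str, dict[str, str]] = {
    "Personal": {"dlp": "block_external_share", "retention": "apply_personal_data_schedule"},
    "Sensitive": {"dlp": "encrypt_in_transit", "collaboration": "internal_only"},
    "HR": {"collaboration": "hr_group_only"},
}


def plan_controls(tags: list[str]) -> dict[str, list[str]]:
    """Control phase: translate a file's tags into actions for each control tool."""
    plan: dict[str, list[str]] = {}
    for tag in tags:
        for tool, action in CONTROL_POLICY.get(tag, {}).items():
            actions = plan.setdefault(tool, [])
            if action not in actions:  # several tags may imply the same action
                actions.append(action)
    return plan
```

Because one set of tags feeds every tool, adding a new control only means extending the policy table, not reclassifying the data.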
Evidence
This research is based on a large volume of inquiries on the topic of data classification policies
and technologies between March 2021 and March 2022.
 
© 2023 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc.
and its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior
written permission. It consists of the opinions of Gartner's research organization, which should not be
construed as statements of fact. While the information contained in this publication has been obtained from
sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy
of such information. Although Gartner research may address legal and financial issues, Gartner does not
provide legal or investment advice and its research should not be construed or used as such. Your access and
use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for
independence and objectivity. Its research is produced independently by its research organization without input
or influence from any third party. For further information, see "Guiding Principles on Independence and
Objectivity."
