Data Classification Workflow For Data Stewards

The document discusses the data steward workflow which involves ingesting new data sources, classifying the physical data within them, and connecting the physical and logical data layers. The challenges are that classifying data takes too long and is too manual. The approach is to ingest, classify, and connect data using classifications. The outcome is shorter classification time and connecting physical and logical data.

Uploaded by

Host Mom
Copyright: © All Rights Reserved

Data Classification Workflow for Data Stewards
Zooming in on the data steward workflow, we're going to focus on onboarding, classifying, and arranging data.
The specific steps that we're going to take you through are ingesting and onboarding new sets of data, identifying
the physical data within them, classifying that physical data, and then using those classifications to connect the
physical and logical data layers within your application.

As we go through the data steward workflow, we're going to focus on the challenge, approach, and outcome from
the data steward's point of view. The challenges that we hope to address with data classification are that it takes
too long to add context to registered data, that the process is too manual in general, and that the data steward
doesn't have enough data to quickly connect physical and logical assets. The goal is to make it easy to connect
the physical and logical layers.
The approach, as outlined in the previous slide, is to ingest new data, classify that data, optionally give feedback
on that classification to make our recommendations stronger in the future, and then connect the logical and physical
data assets using those classifications as guideposts. The expected outcome is that classification will shorten the
time required to define, understand, and contextualize physical data and connect it to the logical operating model.

After we've logged into our data steward account, our first objective is going to be to load some new data sources.
Within the Catalog pane, we'll head to the create button and register a new data source. In this example, we're
going to be working with a simple CSV file. We'll select which Jobserver to process that data on and give a short
description of the file. In order to fully utilize data classification, we'll want to store both the data profile and the
sample data of the information being ingested.
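
To make "data profile" and "sample data" concrete, here is a minimal sketch in plain Python of what those two artifacts might contain for a small CSV file. It is not Collibra's implementation or API: the function profile_csv, the statistics it collects, and the example columns are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def profile_csv(text, sample_size=5):
    """Hypothetical helper: return (profile, sample) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        # Treat empty strings and missing fields as empty values.
        non_empty = [v for v in values if v not in ("", None)]
        counts = Counter(non_empty)
        profile[column] = {
            "row_count": len(values),
            "empty_count": len(values) - len(non_empty),
            "distinct_count": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    # The sample is simply the first few rows, kept alongside the profile.
    return profile, rows[:sample_size]


# Illustrative data only; not taken from the video.
EXAMPLE_CSV = (
    "customer_id,email,country\n"
    "1,ann@example.com,US\n"
    "2,bob@example.com,US\n"
    "3,,BE\n"
)

profile, sample = profile_csv(EXAMPLE_CSV, sample_size=2)
for column, stats in profile.items():
    print(column, stats)
```

A classifier can only suggest classes for a column from what it has seen of that column's values, which is presumably why the walkthrough recommends storing both the profile and the sample at registration time.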

Once the ingestion process is completed, we can see the results within our Data Catalog. As you can see, our table
has been successfully ingested, with sample data and profiling information available. To carry out data
classification at the table level, we'll open the More dropdown, go to the bottom, and click Classify. This
will kick off the classification workflow, which should complete in a matter of seconds. Once classification is
completed, refresh the page and you should see the classifications load. As you can see here, we have a number of
different data classifications suggested, with confidence intervals associated with each. These confidence intervals
indicate how confident the Collibra matching algorithm is in the class that it has suggested. At this point, we can
optionally go through each one of these columns and either accept or reject the classes that have been suggested.

Collibra Inc. (TIN 80-0924168) Phone Email Visit


61 Broadway, 31st Floor, New York, NY 10006 – USA +1 646 893 3042 [email protected] university.collibra.com
Even in the absence of feedback from the data steward, the classifications that have been suggested for
individual columns are searchable by data consumers.
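
When a table has many columns, a data steward may want to decide which suggestions to look at first. The sketch below shows one possible way to sort suggestions into accept, review, and reject buckets by confidence. It is an illustration only, not a Collibra feature: the Suggestion type, the triage function, the thresholds, and the assumption that confidence is a single score between 0.0 and 1.0 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical record of one suggested class for one column."""
    column: str
    data_class: str
    confidence: float  # assumption: a single score between 0.0 and 1.0


def triage(suggestions, accept_at=0.9, reject_below=0.5):
    """Sort suggestions into buckets; the thresholds are illustrative defaults."""
    decisions = {"accept": [], "review": [], "reject": []}
    for s in suggestions:
        if s.confidence >= accept_at:
            decisions["accept"].append(s)
        elif s.confidence < reject_below:
            decisions["reject"].append(s)
        else:
            # Middle band: leave for the data steward to inspect by hand.
            decisions["review"].append(s)
    return decisions


# Illustrative suggestions only; not taken from the video.
suggestions = [
    Suggestion("email", "Email Address", 0.97),
    Suggestion("country", "Country Code", 0.72),
    Suggestion("customer_id", "Phone Number", 0.18),
]

decisions = triage(suggestions)
for bucket, items in decisions.items():
    print(bucket, [s.column for s in items])
```

The middle "review" band reflects the point made in the walkthrough that feedback is optional: the steward's attention goes to the uncertain suggestions, while the rest can be handled quickly.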

At this point in the data steward workflow, we'd like to complete the process by going into individual physical
columns and connecting them to the logical data layer. By creating this connection, we'll give other data
consumers a useful data point to find the information they need quickly. And with that, the data steward workflow
is complete.
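
The idea of using classifications as guideposts for connecting the two layers can be sketched as a simple lookup: each accepted class points to a logical asset, and every column carrying that class gets linked to it. This is a conceptual illustration under stated assumptions, not how Collibra stores or creates these relations; connect_columns and the example class and attribute names are hypothetical.

```python
def connect_columns(accepted_classes, logical_attributes):
    """Link physical columns to logical attributes via their accepted class.

    accepted_classes: {physical column name: accepted data class}
    logical_attributes: {data class: logical attribute name}
    Returns (links, unlinked), where unlinked lists columns whose class
    has no logical counterpart yet.
    """
    links = {}
    unlinked = []
    for column, data_class in accepted_classes.items():
        logical = logical_attributes.get(data_class)
        if logical is None:
            unlinked.append(column)
        else:
            links[column] = logical
    return links, unlinked


# Illustrative names only; not taken from the video.
accepted_classes = {
    "email": "Email Address",
    "country": "Country Code",
    "signup_ts": "Timestamp",
}
logical_attributes = {
    "Email Address": "Customer Email",
    "Country Code": "Customer Country",
}

links, unlinked = connect_columns(accepted_classes, logical_attributes)
print(links)
print(unlinked)
```

Columns that end up in the unlinked list are the ones where the logical model has a gap, which is useful information for the data steward in its own right.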

To review, classification for the data steward consisted of ingesting new physical data, identifying that physical
data, classifying the data, and then connecting that physical data to assets in the logical and conceptual layer. By
carrying out these steps, we've laid the foundation for simple Catalog use by our data consumers and data scientists.
