Data Profiling Overview: What Is Data Profiling, and How Can It Help With Data Quality?
Data profiling is a systematic analysis of the content of a data source (Ralph Kimball).
You must look at the data; you can't trust copybooks, data models, or source-system experts.
It is systematic in the sense that it's thorough and looks in all the nooks and crannies of the data.
You have to know your data before you can fix it.
Completeness Analysis
o How often is a given attribute populated, versus blank or null?
Uniqueness Analysis
o How many unique (distinct) values are found for a given
attribute across all records? Are there duplicates? Should
there be?
Values Distribution Analysis
o What is the distribution of records across different values for a
given attribute?
Range Analysis
o What are the minimum, maximum, average and median
values found for a given attribute?
Pattern Analysis
o What formats were found for a given attribute, and what is the
distribution of records across these formats?
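The five analyses above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular profiling tool works: records are held in memory as a list of dictionaries, None and blank strings count as "unpopulated", and the sample `customers` data and all function names (`completeness`, `uniqueness`, `value_distribution`, `range_stats`, `pattern_distribution`) are hypothetical, chosen only for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample records; field names and values are illustrative only.
customers = [
    {"id": 1, "zip": "02134", "age": 34},
    {"id": 2, "zip": "02134-1001", "age": 41},
    {"id": 3, "zip": "", "age": None},
    {"id": 4, "zip": "K1A 0B1", "age": 29},
    {"id": 5, "zip": "02134", "age": 41},
]


def is_populated(value):
    """Assumption: None and blank/whitespace-only strings count as unpopulated."""
    if value is None:
        return False
    return not (isinstance(value, str) and not value.strip())


def populated_values(records, attr):
    """Values of attr from every record where it is populated."""
    return [r.get(attr) for r in records if is_populated(r.get(attr))]


def completeness(records, attr):
    """Completeness: share of records in which attr is populated."""
    return len(populated_values(records, attr)) / len(records) if records else 0.0


def uniqueness(records, attr):
    """Uniqueness: number of distinct values, plus values that occur more than once."""
    counts = Counter(populated_values(records, attr))
    duplicates = {value: n for value, n in counts.items() if n > 1}
    return len(counts), duplicates


def value_distribution(records, attr):
    """Values distribution: record count per distinct value, most frequent first."""
    return Counter(populated_values(records, attr)).most_common()


def range_stats(records, attr):
    """Range: min, max, mean and median over the numeric values of attr."""
    nums = [
        v for v in populated_values(records, attr)
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if not nums:
        return None
    return {"min": min(nums), "max": max(nums), "mean": mean(nums), "median": median(nums)}


def to_pattern(value):
    """Mask a value: each digit becomes '9', each letter 'A', other characters are kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value))


def pattern_distribution(records, attr):
    """Pattern: record count per format mask, most frequent first."""
    return Counter(to_pattern(v) for v in populated_values(records, attr)).most_common()


if __name__ == "__main__":
    print("zip completeness:", completeness(customers, "zip"))
    print("zip uniqueness:", uniqueness(customers, "zip"))
    print("zip distribution:", value_distribution(customers, "zip"))
    print("age range:", range_stats(customers, "age"))
    print("zip patterns:", pattern_distribution(customers, "zip"))
```

For this sample, `zip` is populated in four of five records (completeness 0.8), "02134" is the only duplicated value, and three formats appear: 99999 (twice), 99999-9999 and A9A 9A9. Masking digits as '9' and letters as 'A' is one common convention for pattern analysis rather than a standard; a real profiler might also distinguish upper and lower case or collapse repeated characters.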
Available Tools
A variety of options exist in the marketplace to help ease the challenge of data
profiling, and they range widely in capabilities and price. Tools like Datiris
Profiler and Informatica Data Quality have been successfully deployed by many
organizations. Implemented well, such tools can reshape how data profiling is
done by reducing effort, broadening scope, and improving consistency across all
data quality initiatives.