1. What Are the 5 Dimensions of Data Quality? Explain.
o Accuracy
Accuracy is the correctness of the data. It is the degree to which the data represents
a real-world situation or reflects the actual event, such as whether the product name is
spelled correctly, whether the sales numbers reflect customers’ demand, or whether the data
o Completeness
relevant are captured and recorded. A complete dataset should not be missing any
o Uniqueness
Uniqueness means that each piece of data is recorded only once in the dataset.
This dimension ensures there is no duplicate data in the dataset and
o Timeliness
Timeliness refers to whether the data is up to date. Quality data should not be outdated and
o Consistency
words, it means whether the data makes sense and is in sync with the operating system.
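Some of the dimensions above can be expressed as simple ratios. The following is a minimal sketch in Python, not a standard implementation: the function names, the sample `products` records, and the 30-day freshness threshold are all assumptions made for illustration. It scores completeness (no missing values), uniqueness (no repeated records), and timeliness (data is not outdated).

```python
from datetime import date, timedelta

def completeness(records, fields):
    """Share of required field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0
    filled = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return filled / total

def uniqueness(records, key):
    """Share of records whose key value is distinct."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)

def timeliness(records, date_field, today, max_age_days):
    """Share of records updated within the last max_age_days."""
    if not records:
        return 1.0
    cutoff = today - timedelta(days=max_age_days)
    fresh = sum(1 for r in records if r[date_field] >= cutoff)
    return fresh / len(records)

# Hypothetical sample data, invented for this example.
products = [
    {"id": 1, "name": "Desk", "updated": date(2024, 1, 10)},
    {"id": 2, "name": "", "updated": date(2023, 1, 5)},       # missing name, stale
    {"id": 2, "name": "Chair", "updated": date(2024, 1, 12)}, # repeated id
    {"id": 3, "name": "Lamp", "updated": date(2024, 1, 2)},
]
today = date(2024, 1, 15)

completeness_score = completeness(products, ["id", "name"])
uniqueness_score = uniqueness(products, "id")
timeliness_score = timeliness(products, "updated", today, max_age_days=30)
```

A score of 1.0 means no problem was found for that dimension. Accuracy and consistency are harder to score this way because they usually require comparing the data against a trusted reference or another system.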
2. List at least 5 dirty data problems and briefly describe each.
When a field is too long and its information gets truncated, a record or data
set can be incomplete because it lacks the key fields needed to process the incoming
o Different representations
This refers to inconsistent data, meaning data that looks different but represents
the same thing. For example, if you input the data “1”, data that has different
o Redundant Records
This happens when there is duplicate data in the dataset, which results in
redundant actions and records. It can be caused by data exchanges through
integrations, third-party connectors, manual entry, and batch imports
(Anonymous, n.d.).
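As a rough illustration of how redundant records might be removed, the sketch below keeps the first occurrence of each record and drops later ones that match on a set of key fields. The `deduplicate` function, the `contacts` list, and the choice to compare values after trimming whitespace and lowercasing are assumptions for this example, not a method taken from the cited source.

```python
def deduplicate(records, key_fields):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for record in records:
        # Compare on a normalized key so "ANA@example.com" matches "ana@example.com".
        key = tuple(str(record.get(f, "")).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical contact list with one duplicate created by manual entry.
contacts = [
    {"name": "Ana Lee", "email": "ana@example.com"},
    {"name": "ana lee ", "email": "ANA@example.com"},
    {"name": "Bo Chen", "email": "bo@example.com"},
]
cleaned = deduplicate(contacts, ["name", "email"])
```

Here `cleaned` holds two records, because the second entry matches the first once case and surrounding spaces are ignored.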
o Insecurity/Privacy
Insecure data that violates individuals’ privacy has become one of the most
dangerous types of dirty data. Without proper database hygiene, remain within
o Naming conventions/formats
Any minor differences in naming conventions and formats, such as spelling, date
formatting, and currency, can be regarded as dirty data (Anonymous, n.d.). For
instance, consider New York, NYC, NEW YORK, and NY. Even though they all refer to
the same area, the inconsistency might cause analysis tools to read them as separate
items. Another example is date formatting, such as data entered as MM/DD/YYYY when
the data set is formatted as DD-MM-YYYY.
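The two examples above could be cleaned with a small normalization step. This is only a sketch under stated assumptions: the `CITY_ALIASES` table, the helper names, and the sample date are invented for illustration, and the date helper assumes every incoming value really is in MM/DD/YYYY form.

```python
from datetime import datetime

# Hypothetical alias table mapping known variants to one canonical spelling.
CITY_ALIASES = {"new york": "New York", "nyc": "New York", "ny": "New York"}

def normalize_city(value):
    """Return the canonical city name, or the trimmed input if it is unknown."""
    cleaned = value.strip().lower()
    return CITY_ALIASES.get(cleaned, value.strip())

def to_dataset_date(value, source_format="%m/%d/%Y", target_format="%d-%m-%Y"):
    """Convert a MM/DD/YYYY string to the data set's DD-MM-YYYY format."""
    return datetime.strptime(value.strip(), source_format).strftime(target_format)

cities = [normalize_city(c) for c in ["New York", "NYC", "NEW YORK", "NY"]]
converted = to_dataset_date("03/25/2021")
```

After this step all four city variants become "New York", and "03/25/2021" becomes "25-03-2021". A value such as "03/04/2021" is ambiguous without knowing its source format, so the format should be confirmed before converting.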
Thatipamula, S. (2020). Data Done Right: 6 Dimensions of Data Quality. Smartbridge. Retrieved from
https://smartbridge.com/data-done-right-6-dimensions-of-data-quality/
Anonymous. (n.d.). The 7 Most Common Types of Dirty Data (and how to clean them). RingLead. Retrieved from
https://www.ringlead.com/blog/the-7-most-common-types-of-dirty-data-and-how-to-clean-them/