
What Are The 5 Dimensions of Data Quality? Explain.

The document discusses the 5 dimensions of data quality and 5 common dirty data problems. The 5 dimensions of data quality are accuracy, completeness, uniqueness, timeliness, and consistency. Accuracy refers to correctness of data. Completeness means all relevant data is captured. Uniqueness ensures no duplicate data. Timeliness means data is up to date. Consistency means data is consistent across systems. The 5 dirty data problems described are fields too long, different representations, redundant records, insecure/private data, and issues with naming conventions/formats.

Uploaded by

Doris Liu
Copyright
© All Rights Reserved

1. What are the 5 dimensions of data quality? Explain.

o Accuracy

Accuracy is the correctness of the data. It is the degree to which the data represents the real-world situation or reflects the actual event, such as whether a product name is spelled correctly, whether the sales figures reflect customers’ demand, or whether the data even exists in the real world (Thatipamula, 2020).

o Completeness

Completeness is the comprehensiveness of the data. It is the degree to which all relevant data are captured and recorded. A complete dataset should not be missing any information, even information that is not required.

o Uniqueness

Uniqueness means that each piece of data is recorded only once in the dataset. This dimension ensures that the dataset contains no repeated data and that every piece of information is meaningful.

o Timeliness

Timeliness refers to whether the data is up to date. Quality data should not be outdated and should be timely enough to be applied in real life (Thatipamula, 2020).

o Consistency

Consistency means that the data reflects the same information across different datasets and systems, rather than conflicting information (Thatipamula, 2020). In other words, the data makes sense and is in sync across the systems that use it.
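
Several of these dimensions can be turned into simple ratios. The following is a minimal Python sketch, not a standard implementation: the sample records, the field names (`id`, `price`, `updated`), and the function names are all hypothetical and chosen only for illustration. Accuracy is left out because checking it requires a trusted real-world reference to compare against.

```python
from datetime import date

# Hypothetical sample records; the field names are illustrative assumptions.
records = [
    {"id": 1, "product": "Laptop", "price": 999.0, "updated": date(2024, 5, 1)},
    {"id": 2, "product": "Mouse", "price": None, "updated": date(2024, 5, 3)},
    {"id": 2, "product": "Mouse", "price": None, "updated": date(2024, 5, 3)},
    {"id": 3, "product": "Monitor", "price": 199.0, "updated": date(2021, 1, 15)},
]


def completeness(rows, field):
    """Share of rows where `field` is present and not None."""
    filled = sum(1 for r in rows if r.get(field) is not None)
    return filled / len(rows)


def uniqueness(rows, key):
    """Number of distinct `key` values divided by the number of rows."""
    return len({r[key] for r in rows}) / len(rows)


def timeliness(rows, field, as_of, max_age_days):
    """Share of rows whose `field` date is at most `max_age_days` before `as_of`."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows)


def consistency(system_a, system_b):
    """Share of keys present in both systems whose values agree."""
    shared = system_a.keys() & system_b.keys()
    if not shared:
        return 1.0  # nothing to compare, so nothing conflicts
    return sum(1 for k in shared if system_a[k] == system_b[k]) / len(shared)


print(completeness(records, "price"))
print(uniqueness(records, "id"))
print(timeliness(records, "updated", as_of=date(2024, 6, 1), max_age_days=365))
print(consistency({1: "Laptop", 2: "Mouse"}, {1: "Laptop", 2: "Mice"}))
```

A score of 1.0 means no problem was detected for that dimension. The 365-day freshness threshold is an arbitrary assumption and would depend on how the data is used.
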
2. List at least 5 dirty data problems and briefly describe each.

o Fields too long

When a field is too long and the information gets truncated, the record or dataset can become incomplete because it lacks the key fields needed to process the incoming information (Anonymous, n.d.).

o Different representations

This refers to inconsistent data: values that look different but represent the same thing. For example, the value 1 might be recorded as “1”, “one”, or “I”.

o Redundant Records

This happens when the dataset contains duplicate data, which results in redundant actions and records. Duplicates can be caused by data exchanges through integrations, third-party connectors, manual entry, and batch imports (Anonymous, n.d.).

o Insecurity/Privacy

Insecure data that violates individuals’ privacy has become one of the most dangerous types of dirty data. Without proper database hygiene, remaining within stringent data privacy regulations becomes nearly impossible (Anonymous, n.d.).

o Naming conventions/formats

Any minor differences in naming conventions and formats, such as spelling, date formatting, and currency, can be regarded as dirty data (Anonymous, n.d.). For instance, consider New York, NYC, NEW YORK, and NY. Even though they all refer to the same area, the inconsistency might cause analysis tools to read them as separate items. Another example is date formatting, such as expecting MM/DD/YYYY when the dataset is formatted as DD-MM-YYYY.
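
Three of the problems above (different representations, redundant records, and naming conventions/formats) lend themselves to a short cleaning sketch in Python. Everything named here is an assumption made for illustration: the alias tables, the function names, and the choice of ISO 8601 as the target date format are hypothetical, and a real project would draw its mappings from reference data.

```python
from datetime import datetime

# Hypothetical alias tables; real mappings would come from reference data.
CITY_ALIASES = {"new york": "New York", "nyc": "New York", "ny": "New York"}
NUMBER_ALIASES = {"1": 1, "one": 1, "i": 1}


def normalize_city(value):
    """Map known spellings of a city to one canonical name."""
    cleaned = value.strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)


def normalize_number(value):
    """Map "1", "one", or "I" to the integer 1; return None if unknown."""
    return NUMBER_ALIASES.get(value.strip().lower())


def normalize_date(value, formats=("%m/%d/%Y", "%d-%m-%Y")):
    """Try each known format in turn and return the date as YYYY-MM-DD."""
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


cities = [normalize_city(c) for c in ["New York", "NYC", "NEW YORK", "NY"]]
print(cities)
print(normalize_date("03/04/2021"), normalize_date("04-03-2021"))
print(drop_duplicates([{"city": c} for c in cities + ["Boston"]]))
```

Normalizing before deduplicating matters: the four spellings of New York only collapse into one record once they share a canonical form. The date helper works here because the two formats use different separators. If both used the same separator, a value such as 03/04/2021 would be ambiguous and the format would have to be known from the data source.
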

Thatipamula, S. (2020). Data Done Right: 6 Dimensions of Data Quality. Smartbridge. Retrieved from https://smartbridge.com/data-done-right-6-dimensions-of-data-quality/

Anonymous. (n.d.). The 7 Most Common Types of Dirty Data (and how to clean them). RingLead. Retrieved from https://www.ringlead.com/blog/the-7-most-common-types-of-dirty-data-and-how-to-clean-them/
