Data Integration Concepts, Processes, and Techniques

This document discusses change data concepts for data integration. It defines change data as data derived from internal and external sources to populate and refresh a data warehouse. It describes four types of change data: cooperative, which uses triggers; logged, which records audit information; queryable, stored in recent event tables; and snapshot, which identifies differences between files. The document also notes common data quality problems encountered during integration, such as multiple identifiers and missing values, especially from legacy systems.

Uploaded by

duc
Copyright
© All Rights Reserved

Information Systems Program

Module 4
Data Integration Concepts, Processes, and Techniques
Lesson 2: Change Data Concepts
Lesson Objectives
• Explain the types of data sources involved in data integration
• Provide examples of typical data quality problems encountered during data integration
• Reflect on the relationship between type of change data and data quality

Basics of Change Data
• Derived from internal and external data sources
• Used to populate and refresh a data warehouse
– Insert rows in fact and dimension tables
– Update rows in dimension tables
• Challenges
– Difficult to change source systems, especially external systems
– Lack of SQL access and descriptive data (metadata), especially for legacy data

Change Data Classification

[Figure: the four types of change data (logged, snapshot, queryable, cooperative) classified along two axes, processing level and source system requirements]

Cooperative Change Data

[Figure: applications issue UPDATE …, INSERT …, and DELETE … statements against a source table; an UPDATE trigger, an INSERT trigger, and a DELETE trigger on the table fire in response]
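
The trigger-based capture pictured on this slide can be sketched as follows. This is a minimal illustration, assuming a SQLite source system driven from Python; the tables `customer` and `customer_changes`, the trigger names, and the function `capture_changes` are hypothetical and not part of the lesson. A real source system would use the trigger syntax of its own DBMS.

```python
import sqlite3

# Hypothetical schema: a source table plus a change table filled by triggers.
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer_changes (
    change_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    op          TEXT,
    customer_id INTEGER,
    name        TEXT
);
CREATE TRIGGER customer_ins AFTER INSERT ON customer
BEGIN
    INSERT INTO customer_changes (op, customer_id, name)
    VALUES ('INSERT', NEW.id, NEW.name);
END;
CREATE TRIGGER customer_upd AFTER UPDATE ON customer
BEGIN
    INSERT INTO customer_changes (op, customer_id, name)
    VALUES ('UPDATE', NEW.id, NEW.name);
END;
CREATE TRIGGER customer_del AFTER DELETE ON customer
BEGIN
    INSERT INTO customer_changes (op, customer_id, name)
    VALUES ('DELETE', OLD.id, OLD.name);
END;
"""


def capture_changes():
    """Run one INSERT, UPDATE, and DELETE and return the captured change rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The application only touches the source table; the triggers record changes.
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ann')")
    conn.execute("UPDATE customer SET name = 'Anne' WHERE id = 1")
    conn.execute("DELETE FROM customer WHERE id = 1")
    rows = conn.execute(
        "SELECT op, customer_id, name FROM customer_changes ORDER BY change_id"
    ).fetchall()
    conn.close()
    return rows


changes = capture_changes()
```

The sketch shows why this type is called cooperative: the source system itself must accept the triggers and the change table, which is often not possible for external or legacy systems.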

Logged Change Data

IP Address 111.111.111.111
Remote user -
Authenticated user -
Timestamp [08/Oct/2014:11:17:55 -0400]
Access request "GET / HTTP/1.1"
Status 200
Bytes 10801
Referrer URL "http://www.google.com/search?q=log+analyzer&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a"
User agent "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"
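
Logged change data such as the web server entry above has to be parsed before it can be loaded. The following is a rough sketch, assuming the fields appear on one line in the order shown on the slide (the layout commonly called the combined log format); the names `LOG_PATTERN` and `parse_log_line` are hypothetical, and the sample line replaces the referrer with "-" for brevity.

```python
import re
from datetime import datetime

# Assumed layout: ip, remote user, authenticated user, [timestamp],
# "request", status, bytes, "referrer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<remote_user>\S+) (?P<auth_user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" in the bytes field means no body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return record


# Sample line built from the slide's values (referrer shortened to "-").
sample = (
    '111.111.111.111 - - [08/Oct/2014:11:17:55 -0400] "GET / HTTP/1.1" '
    '200 10801 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; '
    'rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7"'
)
record = parse_log_line(sample)
```

Lines that do not match return `None` rather than raising, so a load job can count and inspect rejected entries, which is one place data quality problems with logged data tend to surface.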

Queryable Change Data

Event table with date columns
Recent events table

SELECT …
FROM <EventTable>
WHERE <event-cond>
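
One way to fill in the query template above is to filter an event table on a date column, keeping only rows newer than the last extraction. This is a sketch under assumptions: the `orders` table, its columns, and the function `extract_recent_events` are hypothetical, and SQLite stands in for whatever DBMS the source system uses.

```python
import sqlite3


def extract_recent_events(conn, last_extract_time):
    """Return events whose date column is later than the last extraction time."""
    return conn.execute(
        "SELECT order_id, amount, order_date "
        "FROM orders "
        "WHERE order_date > ? "
        "ORDER BY order_date",
        (last_extract_time,),
    ).fetchall()


# Hypothetical event table with a date column stored as ISO 8601 text,
# so that string comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 20.0, "2014-10-06 09:00:00"),
        (2, 35.5, "2014-10-08 11:17:55"),
        (3, 12.0, "2014-10-09 08:30:00"),
    ],
)

recent = extract_recent_events(conn, "2014-10-07 00:00:00")
```

Queryable change data needs only read access and a reliable date column, so it asks less of the source system than triggers do.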

Snapshot Change Data
• Inputs: Previous Source File and Current Source File
• Difference operation compares the two files
• Delta
– New rows
– Changed rows
– Deleted rows
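
The difference operation on this slide can be sketched as a comparison of two snapshots keyed by a row identifier. This is an illustration only: it assumes each source file has already been read into a dictionary mapping a key to the row's values, and the function name `snapshot_delta` and the sample data are hypothetical.

```python
def snapshot_delta(previous, current):
    """Compare two snapshots (dicts keyed by row identifier).

    Returns three dicts: new rows, changed rows, and deleted rows.
    """
    new_keys = current.keys() - previous.keys()
    deleted_keys = previous.keys() - current.keys()
    shared_keys = current.keys() & previous.keys()

    new_rows = {k: current[k] for k in new_keys}
    deleted_rows = {k: previous[k] for k in deleted_keys}
    # A shared key counts as changed only if its values differ.
    changed_rows = {
        k: current[k] for k in shared_keys if current[k] != previous[k]
    }
    return new_rows, changed_rows, deleted_rows


# Hypothetical snapshots: key -> (name, city).
previous = {
    101: ("Ann", "Denver"),
    102: ("Bob", "Boulder"),
    103: ("Cara", "Aurora"),
}
current = {
    101: ("Ann", "Denver"),    # unchanged
    102: ("Bob", "Golden"),    # changed
    104: ("Dev", "Littleton"), # new
}

new_rows, changed_rows, deleted_rows = snapshot_delta(previous, current)
```

Snapshot change data needs nothing from the source system beyond periodic file dumps, but all of the comparison work falls on the integration process.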

Data Quality Problems

• Multiple identifiers
• Different units
• Missing values
• Text data with different components and formats
• Conflicting data
• Different update times

Summary
• Change data used in data integration
• Understand source system requirements and processing level for each type of change data
• Data quality problems more prevalent with legacy systems
