Data Warehouse Architecture
Overview
Architecture: a technical blueprint stage
Must support 3 major driving forces:
Populating the warehouse
Data extraction, cleaning and loading
Load
Takes extracted data and loads it into the data warehouse
Data in operational systems is held in a form suitable for that system
Before loading the data into the DW, its information content must be reconstructed
Data must become value-added business information
Extract & load process must take data and add context and meaning
When to extract?
Data must be in a consistent state
Start extracting data from a data source only when it represents the same snapshot of time as all other data sources
Eg. Customer database
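The snapshot rule above can be sketched as a simple gate that runs before extraction starts. This is a minimal illustration, not a prescribed method: the source names, the `snapshots` mapping and both function names are hypothetical, and it assumes each source can report the point in time its data represents.

```python
from datetime import datetime

def ready_to_extract(snapshot_times, expected):
    """Return True only if every source represents the expected snapshot time."""
    return all(t == expected for t in snapshot_times.values())

def lagging_sources(snapshot_times, expected):
    """Return the names of sources whose data is from a different snapshot."""
    return sorted(name for name, t in snapshot_times.items() if t != expected)

# Hypothetical sources and the snapshot time each one currently represents.
expected = datetime(2024, 1, 31, 23, 59)
snapshots = {
    "customer_db": datetime(2024, 1, 31, 23, 59),
    "billing": datetime(2024, 1, 31, 23, 59),
    "orders": datetime(2024, 1, 30, 23, 59),  # one day behind the others
}

if not ready_to_extract(snapshots, expected):
    print("Hold extraction; waiting on:", lagging_sources(snapshots, expected))
```

Extraction would be held until `lagging_sources` returns an empty list.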
Issues
Loading the data
Extracted data are loaded into a temporary data store to perform clean up and check for consistency
Do not execute consistency checks until all the data sources have been loaded into the temporary data store
Eg. Customer canceling subscriptions
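As a rough sketch of the ordering rule above, the check below refuses to run until every required source is in the temporary store, and only then looks for cancellations that refer to unknown customers. The in-memory `staging` dictionary, the source names and the `customer_id` field are assumptions made for illustration; a real temporary data store would normally be a set of database tables.

```python
REQUIRED_SOURCES = ("customers", "cancellations")  # hypothetical source names

def load_to_staging(staging, source_name, rows):
    """Copy extracted rows into the temporary data store."""
    staging[source_name] = list(rows)

def check_consistency(staging, required_sources=REQUIRED_SOURCES):
    """Return cancellations with no matching customer.

    Raises RuntimeError if any required source has not been loaded yet,
    because a partial check would report false inconsistencies.
    """
    missing = [s for s in required_sources if s not in staging]
    if missing:
        raise RuntimeError(f"sources not yet loaded: {missing}")
    known = {c["customer_id"] for c in staging["customers"]}
    return [r for r in staging["cancellations"] if r["customer_id"] not in known]

staging = {}
load_to_staging(staging, "customers", [{"customer_id": 1}, {"customer_id": 2}])
load_to_staging(staging, "cancellations", [{"customer_id": 2}, {"customer_id": 7}])
orphans = check_consistency(staging)
print(orphans)
```

Here the cancellation for customer 7 is flagged because no customer record with that identifier was loaded.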
Error recovery must be an integral part of the design
The effort required to clean up the source systems increases exponentially with the number of overlapping data sources
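One common way to build error recovery into a load step is to make each batch all-or-nothing, so a failed batch leaves the temporary store unchanged and can simply be retried. The sketch below uses Python's standard `sqlite3` module as a stand-in for the temporary data store; the `staging_customer` table and `load_batch` function are hypothetical names chosen for this example.

```python
import sqlite3

def load_batch(conn, rows):
    """Load one batch atomically; return True on success, False on failure."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if it raises.
        with conn:
            conn.executemany(
                "INSERT INTO staging_customer (customer_id, name) VALUES (?, ?)",
                rows,
            )
        return True
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staging_customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

ok = load_batch(conn, [(1, "Ada"), (2, "Grace")])
# The second batch repeats customer_id 1, so the whole batch is rolled back.
failed = load_batch(conn, [(3, "Linus"), (1, "Duplicate")])
count = conn.execute("SELECT COUNT(*) FROM staging_customer").fetchone()[0]
print(ok, failed, count)
```

Because the second batch is rolled back as a unit, the row for customer 3 is not left behind as a partial load.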
Issues
Copy Management tools and clean up
Eg. IBM's Information Warehouse Framework
Data Refresher & Data Hub
Most copy management tools cannot perform consistency checks directly (the user must write the logic and code it)
Perform a cost-benefit analysis before purchasing a copy management tool
If source systems do not overlap, then consistency checks are very simple
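Whether source systems overlap can be estimated before any consistency logic is written. The sketch below assumes that overlap means two sources holding records for the same business key; the source names and the `overlapping_keys` function are hypothetical.

```python
from itertools import combinations

def overlapping_keys(sources):
    """Map each pair of source names to the keys they share.

    Pairs with no shared keys are omitted, so an empty result means
    the sources do not overlap at all.
    """
    overlaps = {}
    for (a, keys_a), (b, keys_b) in combinations(sorted(sources.items()), 2):
        shared = set(keys_a) & set(keys_b)
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

# Hypothetical customer keys held by each source system.
sources = {
    "billing": {101, 102},
    "crm": {102, 103},
    "web_logs": {900},
}
overlaps = overlapping_keys(sources)
print(overlaps)
```

An empty result corresponds to the simple case; each reported pair marks a place where cross-source consistency logic has to be written.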
Once data is cleaned, convert source data into a structure that is designed to balance query performance and operational cost
The structure must be suitable for long term storage