Data Warehousing: Defined and Its Applications
Pete Johnson
April 2002
Introduction
• The Past and The Problem
• What is a Data Warehouse?
• Components of a Data Warehouse
• OLAP, Metadata, Data Mining
• Getting the Data In
• Benefits vs. Costs
• Conclusion & Questions
The Past and The Problem
• Organizations had only scattered transactional
systems, with data spread among different
systems
• Transactional systems were not designed for
decision support analysis
• Data constantly changes on transactional systems
• Lack of historical data
• Resources were often strained by serving both
transactional and analytical needs on the same systems
The Past and The Problem
• Operational databases are designed to record
transactions from daily operations. They are
optimized to efficiently update or create
individual records
• A database for analysis, on the other hand,
needs to be geared toward flexible requests
or queries (ad hoc, statistical analysis)
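To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the sample rows are hypothetical and chosen only for illustration. The first function touches one record by key, as a transactional system would; the second scans and aggregates many records, as an analysis query would.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory table of hypothetical order records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (order_id, region, amount) VALUES (?, ?, ?)",
        [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0)],
    )
    conn.commit()
    return conn

def update_order_amount(conn, order_id, new_amount):
    """Transactional-style access: update exactly one record by its key."""
    conn.execute(
        "UPDATE orders SET amount = ? WHERE order_id = ?", (new_amount, order_id)
    )
    conn.commit()

def sales_by_region(conn):
    """Analysis-style access: read and aggregate many records at once."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)

conn = build_demo_db()
update_order_amount(conn, 3, 75.0)   # single-record update
totals = sales_by_region(conn)       # whole-table aggregate
```

On a real system the two workloads compete for the same resources, which is one reason for moving analysis to a separate database.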
What is a Data Warehouse?
Data warehousing is an architectural model
designed to gather data from various
sources into a single unified data model for
analysis purposes.
What Is a Data Warehouse?
The term was introduced in 1990 by William
Inmon
A managed database in which the data is:
• Subject Oriented
• Integrated
• Time Variant
• Non Volatile
Subject Oriented
• Organized around major subject areas in the
enterprise (Sales, Inventory, Financial, etc.)
• Only includes data which is used in the
decision making processes
• Elements used for transactional processing are
removed
Integrated
• Data from different sources are brought
together and consolidated
• The data is cleaned and made consistent
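As a rough illustration of integration, the sketch below merges customer records from two hypothetical source systems into one schema. The field names (cust_no, name, CustomerID, FullName, Email) and the sample records are assumptions made for this example; real sources will differ.

```python
def integrate_customers(billing_rows, crm_rows):
    """Merge customer records from two hypothetical sources into one schema."""
    unified = {}
    # Assumed billing export: fields 'cust_no' and 'name'.
    for row in billing_rows:
        key = str(row["cust_no"]).strip()
        unified[key] = {
            "customer_id": key,
            "name": row["name"].strip().title(),
            "email": None,
        }
    # Assumed CRM export: fields 'CustomerID', 'FullName', 'Email'.
    for row in crm_rows:
        key = str(row["CustomerID"]).strip()
        record = unified.setdefault(
            key,
            {"customer_id": key, "name": row["FullName"].strip().title(), "email": None},
        )
        # Make the e-mail format consistent across sources.
        record["email"] = row["Email"].strip().lower()
    return sorted(unified.values(), key=lambda r: r["customer_id"])

billing = [{"cust_no": 101, "name": "ACME CORP"}, {"cust_no": 102, "name": "globex "}]
crm = [
    {"CustomerID": "101", "FullName": "Acme Corp", "Email": "Sales@Acme.example"},
    {"CustomerID": "103", "FullName": "initech", "Email": "info@initech.example"},
]
customers = integrate_customers(billing, crm)
```

Customer 101 appears in both sources and ends up as a single consolidated record, while 102 and 103 are carried over from the one source that knows them.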
[Diagram: Relational Warehouse, SQL reporting, Exploration / Analysis, Information]
Getting the Data In
Three Steps:
1. Extraction Phase
2. Transformation Phase
3. Loading Phase
Getting the Data In
Extraction Phase:
• Source systems export data via files, or
populate the warehouse directly when the
databases can “talk” to each other
• The data is transferred to the Data Warehouse
server and placed in a staging area
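A minimal sketch of the file-based extraction path, assuming the source system exports a CSV file with customer_name and state columns. The staging table name, the column names, and the "billing" source label are hypothetical. The rows are copied into the staging area unchanged; cleaning is left to the transformation phase.

```python
import csv
import io
import sqlite3

def extract_to_staging(conn, export_file, source_name):
    """Copy rows from a source-system CSV export into a staging table, unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging_customers "
        "(source TEXT, customer_name TEXT, state TEXT)"
    )
    reader = csv.DictReader(export_file)
    rows = [(source_name, r["customer_name"], r["state"]) for r in reader]
    conn.executemany("INSERT INTO staging_customers VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

# An in-memory string stands in for a file exported by a source system.
export = io.StringIO("customer_name,state\nAcme Corp,PA\nGlobex,Penna\n")
conn = sqlite3.connect(":memory:")
staged = extract_to_staging(conn, export, "billing")
```

Recording the source name with each staged row keeps it possible to trace a record back to the system it came from.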
Getting the Data In
Transformation Phase:
• Takes data and turns it into a form that is suitable
for insertion into the warehouse
• Combines related data
• Removes redundancies
• Common Codes (Commercial Customer)
• Spelling Mistakes (Lozenges)
• Consistency (PA,Pa,Penna,Pennsylvania)
• Formatting (Addresses)
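The consistency and redundancy steps above might look roughly like the following sketch. The STATE_CODES lookup and the sample rows are hypothetical; a real warehouse would maintain much larger reference tables.

```python
# Hypothetical lookup mapping the variants seen in source data to one code.
STATE_CODES = {"pa": "PA", "penna": "PA", "pennsylvania": "PA"}

def clean_state(value):
    """Map known spellings of a state to a single code; leave unknowns as-is."""
    key = value.strip().lower().rstrip(".")
    return STATE_CODES.get(key, value.strip())

def transform(rows):
    """Standardize name and state formatting, then drop resulting duplicates."""
    seen = set()
    cleaned = []
    for name, state in rows:
        # Collapse repeated whitespace and use consistent capitalization.
        record = (" ".join(name.split()).title(), clean_state(state))
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

staged = [("acme  corp", "Pa"), ("Acme Corp", "Pennsylvania"), ("Globex", "Penna")]
cleaned = transform(staged)
```

The first two staged rows only become recognizable as the same customer after their formatting is made consistent, which is why the duplicate check runs last.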
Getting the Data In
Loading Phase:
• Places the cleaned data into the DBMS in
its final, useable form
• Compare data from source systems and the
Data Warehouse
• Document the load information for the
users
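The loading steps can be sketched as follows, again with sqlite3 standing in for the warehouse DBMS. The dim_customer and load_log tables and the row-count comparison are assumptions for illustration; a production load would typically compare more than counts (totals, checksums, and so on).

```python
import sqlite3
from datetime import datetime, timezone

def load(conn, cleaned_rows, expected_rows):
    """Insert cleaned rows, compare counts with the source, and log the result."""
    conn.execute("CREATE TABLE IF NOT EXISTS dim_customer (name TEXT, state TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_log "
        "(loaded_at TEXT, expected_rows INTEGER, loaded_rows INTEGER, status TEXT)"
    )
    count_sql = "SELECT COUNT(*) FROM dim_customer"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", cleaned_rows)
    loaded = conn.execute(count_sql).fetchone()[0] - before
    # Compare what arrived in the warehouse with what the source side expected.
    status = "ok" if loaded == expected_rows else "mismatch"
    # Document the load so users can see when and how the data was refreshed.
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), expected_rows, loaded, status),
    )
    conn.commit()
    return status

conn = sqlite3.connect(":memory:")
status = load(conn, [("Acme Corp", "PA"), ("Globex", "PA")], expected_rows=2)
```

Users can then query load_log to see when the warehouse was last refreshed and whether the load reconciled with the source.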
Benefits vs. Costs
Benefits
• Creates a single point for all data
• System is optimized and designed
specifically for analysis
• Access data without impacting the
operational systems
• Users can access the data directly, without
direct help from the IT department
Costs
• Cost of implementation & maintenance
(hardware, software, and staffing)
• Lack of compatibility between components
• Data from many sources is hard to
combine, which leads to data integrity issues
• Bad designs and practices can lead to costly
failures
Conclusion
• What is a Data Warehouse?
• Components of a Data Warehouse
• How the Data Gets In
• OLAP, Metadata, and Data Mining
• Benefits vs. Costs
Questions