The Data Warehouse ETL Toolkit - Chapter 06
VSV Training
Chapter 6: Delivering Fact Tables
Prepared by: Tho HOANG Hien
NGUYEN
Date: 09/02/2008
4. Fundamental Grains
1. Transaction Grain Fact Tables
2. Periodic Snapshot Fact Tables
3. Accumulating Snapshot Fact Tables
Managing Indexes
Managing Partitions
Outwitting the Rollback Log
Loading the Data
Design Requirement #1
Design Requirement #2
Design Requirement #3
Design Requirement #4
Administering Aggregations, Including Materialized Views
6.0 Introduction
Fact tables hold the measurements
of an enterprise.
Fact tables contain measurements,
and dimension tables contain the
context surrounding measurements.
6.1 (cont.)
6.2 (cont.)
Referential integrity can be enforced at three main
places in the ETL pipeline:
1. Careful bookkeeping and data preparation just
before loading the fact table records into the final
tables, coupled with careful bookkeeping before
deleting any dimension records.
2. Enforcement of referential integrity in the
database itself at the moment of every fact table
insertion and every dimension table deletion.
3. Discovery and correction of referential integrity
violations after loading has occurred, by regularly
scanning the fact table and looking for bad foreign
keys.
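As an illustration of the second option, here is a minimal sketch that lets the database itself reject a bad fact insertion and a bad dimension deletion. It uses Python's built-in sqlite3 module purely for convenience. The sales_amount column, the sample rows, and the try_statement helper are assumptions made for the example and do not come from the chapter.

```python
import sqlite3

# Hypothetical two-table schema; column names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute(
    "CREATE TABLE product_dimension (product_key INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE fact_table ("
    " product_key INTEGER NOT NULL REFERENCES product_dimension (product_key),"
    " sales_amount REAL)"
)
conn.execute("INSERT INTO product_dimension VALUES (1, 'widget')")
conn.execute("INSERT INTO fact_table VALUES (1, 9.99)")  # valid key: accepted


def try_statement(sql):
    """Run one statement; return False if an integrity constraint rejected it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A fact row pointing at a product key that does not exist in the dimension.
bad_insert_ok = try_statement("INSERT INTO fact_table VALUES (99, 5.00)")
# Deleting a dimension row that a fact row still references.
dim_delete_ok = try_statement("DELETE FROM product_dimension WHERE product_key = 1")

print(bad_insert_ok, dim_delete_ok)
```

Both statements are refused, so the check happens on every insertion and deletion. That per-row checking is also the cost of this option during large loads.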
6.2 (cont.)
The queries checking referential
integrity must be of the form:
    select f.product_key
    from fact_table f
    where f.product_key not in
          (select p.product_key from product_dimension p)
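The query above can be tried end to end with a small sketch, here using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the slide. The sales_amount column and the sample rows are assumptions made only for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dimension (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_table (product_key INTEGER, sales_amount REAL);
    INSERT INTO product_dimension VALUES (1, 'widget'), (2, 'gadget');
    -- Product key 7 has no matching dimension row.
    INSERT INTO fact_table VALUES (1, 9.99), (2, 4.50), (7, 1.25), (7, 3.00);
""")

# The referential integrity scan from the slide, unchanged.
ORPHAN_SCAN = """
    SELECT f.product_key
    FROM fact_table f
    WHERE f.product_key NOT IN (SELECT p.product_key FROM product_dimension p)
"""

# One entry per offending fact row.
bad_keys = [row[0] for row in conn.execute(ORPHAN_SCAN)]
print(bad_keys)
```

The scan returns one row per offending fact record, so a key that is missing from the dimension shows up as many times as it is used.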
6.3 (cont.)
6.4.1. (cont.)
6.4.3. (cont.)
6.12. Aggregations
The single most dramatic way to
affect performance in a large data
warehouse is to provide a proper set
of aggregate (summary) records that
coexist with the primary base records.
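To make the idea concrete, here is a minimal sketch of an aggregate table built from base fact records and stored beside them, again using Python's built-in sqlite3 module. The sales_fact and sales_fact_monthly tables, their columns, and the sample rows are all hypothetical. The chapter does not prescribe this schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base fact table at daily grain (illustrative schema).
    CREATE TABLE sales_fact (date_key TEXT, product_key INTEGER, sales_amount REAL);
    INSERT INTO sales_fact VALUES
        ('2008-01-05', 1, 10.0),
        ('2008-01-20', 1, 15.0),
        ('2008-02-03', 1, 7.0),
        ('2008-01-11', 2, 4.0);

    -- Aggregate table: one row per month and product, kept alongside the base rows.
    CREATE TABLE sales_fact_monthly AS
    SELECT substr(date_key, 1, 7) AS month_key,
           product_key,
           SUM(sales_amount)      AS sales_amount
    FROM sales_fact
    GROUP BY substr(date_key, 1, 7), product_key;
""")

monthly = conn.execute(
    "SELECT month_key, product_key, sales_amount "
    "FROM sales_fact_monthly ORDER BY month_key, product_key"
).fetchall()
print(monthly)
```

A monthly query can then read the three aggregate rows instead of scanning every base record, while the base table remains available for daily detail.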
Aggregate navigation is a standard
data warehouse topic that has been
discussed extensively in the literature.
6.12.5. Administering Aggregations, Including Materialized Views
6.12.5. Administering Aggregations, Including Materialized Views (cont.)
Summary
We defined the fact table as the
vessel that holds all numeric
measurements of the enterprise.
We saw that referential integrity is
hugely important to the proper
functioning of a dimensional schema,
and we proposed three places where
referential integrity can be enforced.
Summary (cont.)
We showed how to build a surrogate
key pipeline for data warehouses that
accurately track the historical
changes in their dimensional entities.
We described the structure of the
three kinds of fact tables: transaction
grain, periodic snapshot grain, and
accumulating snapshot grain.
Summary (cont.)
We then proposed a number of specific
techniques for handling graceful
modifications to fact and dimension
tables, multiple units of measurement,
late-arriving fact data, and building
aggregations.
We finished with a specialized section
on loading OLAP cubes.