Data Warehouse Schemas For Decision Support
Data Warehousing
Integrated data spanning long time periods, often
augmented with summary information.
Several gigabytes to terabytes common.
Interactive response times expected for complex
queries; ad-hoc updates uncommon.
[Figure: data is EXTRACTed, TRANSFORMed, LOADed, and
REFRESHed into the DATA WAREHOUSE, which has a Metadata
Repository and SUPPORTS DATA MINING and OLAP.]
Database Management Systems, 2nd Edition. R. Ramakrishnan and J. Gehrke
Warehousing Issues
Semantic Integration: When getting data from
multiple sources, must eliminate mismatches, e.g.,
different currencies, schemas.
Heterogeneous Sources: Must access data from a
variety of source formats and repositories.
Replication capabilities can be exploited here.
Load, Refresh, Purge: Must load data, periodically
refresh it, and purge too-old data.
Metadata Management: Must keep track of source,
loading time, and other information for all data in the
warehouse.
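The issues above can be sketched in a small loader. This is only an illustration under stated assumptions: the two sources ("eu_store", "us_store"), their field names, the exchange rate, and the retention window are all hypothetical and do not come from the slides.

```python
from datetime import date, timedelta

# Hypothetical exchange rates to a common currency (USD); illustrative values only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1}

def normalize(record, source):
    """Map a source-specific record onto one warehouse schema (semantic integration)."""
    if source == "eu_store":          # hypothetical source with its own field names
        amount, currency = record["betrag"], "EUR"
        pid, day = record["produkt_id"], record["datum"]
    else:                             # hypothetical source "us_store"
        amount, currency = record["amount"], "USD"
        pid, day = record["pid"], record["date"]
    return {
        "pid": pid,
        "date": day,
        "sales_usd": round(amount * RATES_TO_USD[currency], 2),
        # Metadata management: remember where and when each row was loaded.
        "source": source,
        "loaded_on": date.today(),
    }

def purge(rows, today, keep_days):
    """Drop rows older than the retention window (purge too-old data)."""
    cutoff = today - timedelta(days=keep_days)
    return [r for r in rows if r["date"] >= cutoff]

warehouse = [
    normalize({"betrag": 100.0, "produkt_id": 11, "datum": date(2024, 1, 5)}, "eu_store"),
    normalize({"amount": 50.0, "pid": 12, "date": date(2020, 1, 5)}, "us_store"),
]
recent = purge(warehouse, today=date(2024, 2, 1), keep_days=365)
```

Keeping the source and load time on every row is what makes a later refresh or purge possible without going back to the original systems.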
Data Model
Collection of numeric measures,
which depend on a set of dimensions.
E.g., measure Sales, dimensions
Product (key: pid), Location (locid),
and Time (timeid).
[Figure: sales cube with axes pid, timeid, and locid.
Slice locid=1 is shown; cell rows as printed are
8 10 10 / 30 20 50 / 25 8 15, with axis ticks
11 12 13 (pid) and 1 2 3 (timeid).]
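A minimal sketch of this data model in plain Python, with the measure stored in a dictionary keyed by the three dimension keys. The locid=1 values are the ones printed in the slide's figure, but the assignment of rows to pid values is an assumption, and the locid=2 cells are invented so that the slice has something to exclude. The function names are hypothetical.

```python
# Cube cells keyed by (pid, timeid, locid). The locid=1 values are the ones
# printed on the slide; which row belongs to which pid is an assumption.
sales = {}
slide_rows = {11: [25, 8, 15], 12: [30, 20, 50], 13: [8, 10, 10]}
for pid, row in slide_rows.items():
    for timeid, value in zip([1, 2, 3], row):
        sales[(pid, timeid, 1)] = value

# Invented cells for a second location, for illustration only.
sales[(11, 1, 2)] = 35
sales[(12, 1, 2)] = 26

def slice_by_locid(cube, locid):
    """Fix the Location dimension and return a 2-D view keyed by (pid, timeid)."""
    return {(p, t): v for (p, t, l), v in cube.items() if l == locid}

def total_by_pid(cube):
    """Aggregate the measure over every dimension except Product."""
    totals = {}
    for (pid, _timeid, _locid), value in cube.items():
        totals[pid] = totals.get(pid, 0) + value
    return totals

slice1 = slice_by_locid(sales, 1)
```

Fixing one dimension to a single value, as `slice_by_locid` does, yields the two-dimensional slice that the figure shows for locid=1.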
MOLAP vs ROLAP
Multidimensional data can be stored physically in a
(disk-resident, persistent) array; systems that do so
are called MOLAP systems. Alternatively, the data can
be stored as a relation; such systems are called
ROLAP systems.
The main relation, which relates dimensions to a
measure, is called the fact table. Each dimension can
have additional attributes and an associated
dimension table.
E.g., Products(pid, pname, category, price)
Fact tables are much larger than dimension tables.
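The ROLAP layout described above can be sketched with Python's built-in sqlite3 module. The Products schema is the one given in the slide; the Sales fact table columns follow the dimensions named earlier (pid, timeid, locid) and are an assumption about its exact shape. All rows, product names, categories, and prices are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: one row per product, with descriptive attributes.
    CREATE TABLE Products (
        pid INTEGER PRIMARY KEY, pname TEXT, category TEXT, price REAL
    );
    -- Fact table: one row per (product, time, location) combination.
    CREATE TABLE Sales (
        pid INTEGER REFERENCES Products(pid),
        timeid INTEGER,
        locid INTEGER,
        sales REAL,
        PRIMARY KEY (pid, timeid, locid)
    );
""")

# Invented dimension rows.
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)", [
    (11, "widget", "hardware", 25.0),
    (12, "gadget", "hardware", 18.0),
    (13, "notebook", "stationery", 2.0),
])
# Invented fact rows.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    (11, 1, 1, 25), (11, 2, 1, 8), (12, 1, 1, 30), (13, 1, 1, 8),
])

# Join the fact table to a dimension table to aggregate by a dimension attribute.
rows = conn.execute("""
    SELECT P.category, SUM(S.sales)
    FROM Sales S JOIN Products P ON S.pid = P.pid
    GROUP BY P.category
    ORDER BY P.category
""").fetchall()
```

The fact table holds only keys and the measure, so descriptive attributes such as category are reached by joining to the dimension table.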
[Figure residue: labels year, quarter, country; table
rows 1995: 63 81 144 and 1996: 38 107 145]
Slicing and Dicing: Equality