EEE Data Warehousing and Data Mining Reference Note
Unit-4
Data Cube Technology
Efficient Method for Data Cube Computation
Data cube computation is an essential task in data warehouse implementation. The pre-
computation of all or part of a data cube can greatly reduce the response time and enhance the
performance of on-line analytical processing. At the core of multidimensional data analysis is
the efficient computation of aggregations across many sets of dimensions. In SQL terms, these
aggregations are referred to as group-by’s. Each group-by can be represented by a cuboid,
where the set of group-by’s forms a lattice of cuboids defining a data cube.
A major challenge of pre-computation is the time and storage space required if all the cuboids
in the data cube are computed, especially when the cube has many dimensions.
The Compute Cube Operator
The compute cube operator computes aggregates over all subsets of the dimensions specified
in the operation.
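As a minimal sketch of what the compute cube operator does, the Python snippet below enumerates every subset of the dimensions and computes one group-by (cuboid) per subset. The fact table `rows`, the city and item values, the function name `compute_cube`, and the choice of SUM as the aggregate are all assumptions made for illustration; they are not taken from this note.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical fact table: (city, item, year, measure M). Values are made up.
rows = [
    ("C1", "TV", 2020, 10),
    ("C1", "PC", 2020, 5),
    ("C2", "TV", 2021, 7),
    ("C2", "TV", 2020, 3),
]
dims = ("city", "item", "year")


def compute_cube(rows, dims):
    """Return {group-by dimensions: {group key: SUM(M)}} for every subset of dims."""
    cube = {}
    for k in range(len(dims) + 1):
        # Each subset of dimension positions defines one group-by (one cuboid).
        for subset in combinations(range(len(dims)), k):
            agg = defaultdict(int)
            for row in rows:
                key = tuple(row[i] for i in subset)
                agg[key] += row[-1]  # the measure M is the last column
            cube[tuple(dims[i] for i in subset)] = dict(agg)
    return cube


cube = compute_cube(rows, dims)
print(len(cube))         # 8 cuboids for 3 dimensions (2^3)
print(cube[()])          # apex cuboid: {(): 25}
print(cube[("city",)])   # {('C1',): 15, ('C2',): 10}
```

This naive version scans the fact table once per cuboid, which is exactly the time and storage cost that grows quickly as the number of dimensions increases.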
The following figure shows a 3-D data cube for the dimensions city, item, and year, and an
aggregate measure, M. A data cube is a lattice of cuboids.
[Figure: lattice of cuboids for the 3-D data cube, showing the 0-D (apex) cuboid, the 1-D cuboids, and the 2-D cuboids. Total no. of cuboids = 2^3]
Each cuboid represents a group-by. (city, item, year) is the base cuboid, containing all three of
the dimensions. The base cuboid is the least generalized of all of the cuboids in the data cube.
The most generalized cuboid is the apex cuboid, commonly represented as all. It contains one
value. To drill down in the data cube, we move from the apex cuboid, downward in the lattice.
To roll up, we move from the base cuboid, upward.
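The movement through the lattice can be sketched as follows, assuming each cuboid is represented as a tuple of dimension names. The helper names `roll_up` and `drill_down` are hypothetical and chosen only for this illustration: rolling up drops one dimension (one step toward the apex), and drilling down adds one missing dimension (one step toward the base).

```python
dims = ("city", "item", "year")


def roll_up(cuboid):
    """Cuboids one level up (more generalized): drop one dimension."""
    return [tuple(d for d in cuboid if d != dropped) for dropped in cuboid]


def drill_down(cuboid, dims=dims):
    """Cuboids one level down (less generalized): add one missing dimension."""
    return [
        tuple(d for d in dims if d in cuboid or d == added)
        for added in dims
        if added not in cuboid
    ]


base = ("city", "item", "year")
apex = ()

print(roll_up(base))     # [('item', 'year'), ('city', 'year'), ('city', 'item')]
print(drill_down(apex))  # [('city',), ('item',), ('year',)]
print(roll_up(apex))     # [] : nothing is more generalized than the apex
print(drill_down(base))  # [] : nothing is less generalized than the base
```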
Consider the following 2 cases for an n-dimensional cube:
- Case 1: Dimensions have no hierarchies
  - Then the total number of cuboids computed for an n-dimensional cube = 2^n
- Case 2: Dimensions have hierarchies (E.g. Hierarchy “day