DATA CUBES
Online Analytical Processing (OLAP)
OLAP
• OLAP: Online Analytical Processing
• OLAP queries are complex queries that
– Touch large amounts of data
– Discover patterns and trends in the data
– Are typically expensive and take a long time
• Also called decision-support queries
• In contrast to OLAP:
– OLTP: Online Transaction Processing
– OLTP queries are simple queries, e.g., over banking or airline systems
– OLTP queries touch small amounts of data for fast transactions
– Example: Select salary From Emp Where ID = 100;
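The contrast between the two query styles can be sketched with Python's built-in sqlite3 module. This is only an illustration: the Emp table layout, the dept column, and all rows are assumptions made for the example, not data from the slides.

```python
import sqlite3

# Hypothetical Emp table; the dept column and all rows are made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (ID INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(100, "sales", 50000.0), (101, "sales", 70000.0), (102, "ops", 60000.0)],
)

# OLTP-style query: a point lookup that touches a single row.
oltp_result = conn.execute("SELECT salary FROM Emp WHERE ID = 100").fetchone()

# OLAP-style query: an aggregate that scans every row to summarize the data.
olap_result = conn.execute(
    "SELECT dept, AVG(salary) FROM Emp GROUP BY dept ORDER BY dept"
).fetchall()

print(oltp_result)
print(olap_result)
```

On a three-row table both queries are instant; the difference only matters at warehouse scale, where the aggregate has to read far more data than the lookup.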
OLTP vs. OLAP
On-Line Transaction Processing (OLTP):
– technology used to perform updates on operational
or transactional systems (e.g., point of sale systems)
On-Line Analytical Processing (OLAP):
– technology used to perform complex analysis of the
data in a data warehouse
OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into
data through fast, consistent, interactive access to a
wide variety of possible views of information that has
been transformed from raw data to reflect the
dimensionality of the enterprise
as understood by the user.
[source: OLAP Council: www.olapcouncil.org]
OLAP AND DATA WAREHOUSE
[Figure: data warehouse architecture. Labels: internal sources, operational DBs, external sources; data integration component; data warehouse; metadata; OLAP server; query and analysis component; client tools (reports, OLAP, data mining).]
OLAP AND DATA WAREHOUSE
• Typically, OLAP queries are executed over a separate copy of the working data
– Over a data warehouse
• The data warehouse is periodically updated, e.g., overnight
– OLAP queries tolerate the resulting staleness
• Why run OLAP queries over a data warehouse?
– The warehouse collects and combines data from multiple sources
– The warehouse may organize the data in formats that support OLAP queries
– OLAP queries are complex and touch large amounts of data
– Run against the operational database, they may lock it for long periods of time
– This negatively affects all other OLTP transactions
OLAP ARCHITECTURE
EXAMPLE OLAP APPLICATIONS
• Market Analysis
– Which items are frequently sold over the summer but not over the winter?
• Credit Card Companies
– Given a new applicant, is the applicant creditworthy?
– Need to check other similar applicants (age, gender, income, etc.), observe how they performed, then make a prediction for the new applicant
OLAP queries are also called "decision-support" queries
MULTI-DIMENSIONAL VIEW
• Data is typically viewed as points in multi-dimensional space
• Raw data cubes (raw level without aggregation)
• Typical OLAP applications have many dimensions
[Figure: data cube with axes Location (NY, MA, CA), Items (bread, orange juice, milk 2% fat, milk 1% fat), and Time (3/1, 3/2, 3/3, 3/4); sample cell values shown: 10, 47, 30, 12.]
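One way to picture "points in multi-dimensional space" is a sparse mapping from coordinates to a value. The sketch below assumes a three-dimensional cube over item, location, and date; the specific coordinate and quantity pairings are illustrative guesses, since the original figure does not survive in full.

```python
# A raw (unaggregated) cube stored sparsely: each key is a point in
# (item, location, date) space and each value is the measure at that point.
# The coordinate/quantity pairings are illustrative assumptions.
cube = {
    ("bread", "NY", "3/1"): 10,
    ("orange juice", "MA", "3/2"): 47,
    ("milk 2% fat", "CA", "3/3"): 30,
    ("milk 1% fat", "NY", "3/4"): 12,
}

def cell(item, location, date):
    """Return the measure at one point, or 0 if nothing was recorded there."""
    return cube.get((item, location, date), 0)

bread_ny = cell("bread", "NY", "3/1")
empty = cell("bread", "CA", "3/1")
```

Adding a dimension only lengthens the key tuple, which is why the same idea extends to cubes with many dimensions.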
ANOTHER EXAMPLE
[Figure: data cube with axes labeled gender, age, and accidents.]
DATA CUBES
• A data cube is a structure that enables OLAP to achieve its multidimensional functionality.
• The data cube is used to represent data along some measure of interest.
• Data cubes are an easy way to look at the data (they allow us to look at complex data in a simple format).
• Although called a "cube", it can be 2-dimensional, 3-dimensional, or higher-dimensional.
DATA CUBES
• Database design is for OLTP and efficiency in data storage.
• Data cube design is for efficiency in data retrieval (ensures report optimization).
• The cube is comparable to a table in a relational database.
Dimensions, Measures, and Hierarchies
• Data cubes have categories of data called dimensions and measures.
• Measure
– represents some fact (or number) such as cost or units of service.
• Dimension
– represents descriptive categories of data such as time or location.
Hierarchy
Some dimensions can have multiple levels forming a hierarchy.
For example, dates have year, month, day; geography has country, region, city; a product might have category, subcategory, and the product itself.
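A hierarchy lets the same data be aggregated at coarser levels. The sketch below assumes a date hierarchy of year, month, day and uses made-up daily values; roll_up is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by (year, month, day).
daily_sales = {
    (2024, 3, 1): 10,
    (2024, 3, 2): 47,
    (2024, 4, 1): 30,
    (2025, 1, 5): 12,
}

def roll_up(facts, depth):
    """Aggregate to a coarser level by keeping only the first `depth` key parts."""
    totals = defaultdict(int)
    for key, value in facts.items():
        totals[key[:depth]] += value
    return dict(totals)

by_month = roll_up(daily_sales, 2)  # (year, month) level
by_year = roll_up(daily_sales, 1)   # (year,) level
```

Because the key is ordered from the coarsest level to the finest, moving up the hierarchy amounts to truncating the key and summing.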
Dimensions and Measures
Data Cube Concepts
• Three important concepts associated with data cubes:
1. Slicing.
2. Dicing.
3. Rotating.
Slicing
• The term slice most often refers to a two-dimensional page selected from the cube.
• A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
[Figure: Slicing-Wireless Mouse]
[Figure: Slicing-Asia]
Dicing
• An operation related to slicing.
• In the case of dicing, we define a subcube of the original space.
• Dicing provides you the smallest available slice.
Dicing
EXAMPLE:
SELECT PRODUCT, SUM(REVENUE) FROM SALES
WHERE PRODUCT = 'OPV' GROUP BY PRODUCT; -- Slicing
SELECT PRODUCT, SUM(REVENUE) FROM SALES
WHERE PRODUCT = 'EL' AND LOCATION = 'EUROPE'
GROUP BY PRODUCT; -- Dicing
Usage
Slice fixes a value along one particular dimension of a given cube and provides a new subcube.
Dice fixes values along two or more dimensions of a given cube and provides a new subcube.
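Slice and dice can both be sketched as filters over a cube stored as a mapping. The example below assumes a cube with product, location, and quarter dimensions; the select helper, the quarter dimension, and all revenue figures are hypothetical.

```python
# Hypothetical cube: (product, location, quarter) -> revenue.
DIMS = ("product", "location", "quarter")
cube = {
    ("EL", "EUROPE", "Q1"): 120,
    ("EL", "ASIA", "Q1"): 80,
    ("OPV", "EUROPE", "Q1"): 40,
    ("OPV", "EUROPE", "Q2"): 60,
}

def select(cube, **fixed):
    """Keep the cells whose coordinates match every dimension=value pair given."""
    positions = {DIMS.index(name): value for name, value in fixed.items()}
    return {
        key: revenue
        for key, revenue in cube.items()
        if all(key[i] == value for i, value in positions.items())
    }

slice_opv = select(cube, product="OPV")                     # one dimension fixed
dice_el_eu = select(cube, product="EL", location="EUROPE")  # two dimensions fixed

slice_total = sum(slice_opv.values())
dice_total = sum(dice_el_eu.values())
```

The two operations share one implementation here; they differ only in how many dimensions are constrained.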
Rotating
• Sometimes called pivoting.
• Rotating changes the dimensional orientation of the report from the cube data.
• For example, rotating may consist of
– swapping the rows and columns, or moving one of the row dimensions into the column dimension
– swapping an off-spreadsheet dimension with one of the dimensions in the page display
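The simplest rotation, swapping rows and columns, can be sketched on a two-dimensional report held as a nested dictionary. The report contents and the rotate helper are assumptions made for illustration.

```python
# Hypothetical 2-D report: rows are products, columns are locations.
report = {
    "EL": {"EUROPE": 120, "ASIA": 80},
    "OPV": {"EUROPE": 40, "ASIA": 25},
}

def rotate(table):
    """Swap the row and column dimensions of a nested-dict report."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_location = rotate(report)
```

Rotating changes only the orientation; every value is preserved, so rotating twice gives back the original report.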
Dimensions
• A dimension represents descriptive categories of data such as time or location.
• Each dimension includes different levels of categories.
Categories
• A category is an item that matches a specific description or classification, such as years in a time dimension.
• Categories can be at different levels of information within a dimension.
Categories
• Parent category
– is the next higher level of another category in a drill-up path.
• Child category
– is the next lower level category in a drill-down path.
Measures
• The measures are the actual data values that occupy the cells as defined by the dimensions selected.
• Measures include facts or variables typically stored as numerical fields.
Computed versus Stored Data Cubes
• The goal is to retrieve the information
from the data cube in the most efficient
way possible.
• Three possible solutions are:
– Pre-compute all cells in the cube.
– Pre-compute no cells.
– Pre-compute some of the cells.
Computed versus Stored Data Cubes
• If the whole cube is pre-computed
– Advantage
• the queries run on the cube will be very
fast.
– Disadvantage
• pre-computed cube requires a lot of
memory.
Computed versus Stored Data Cubes
• To minimize memory requirements, we can pre-compute none of the cells in the cube.
• But the queries on the cube will run more slowly.
• As a compromise between these two, we can pre-compute only those cells in the cube which will most likely be used for decision support queries.
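The compromise can be sketched as a small store of pre-computed aggregates with a fallback that scans the raw rows. Everything here is an assumption made for illustration: the fact rows, the SUM aggregate, the helper names, and the choice of which groupings to store.

```python
from itertools import combinations

DIMS = ("product", "location", "quarter")
facts = [  # hypothetical raw rows: (product, location, quarter, revenue)
    ("EL", "EUROPE", "Q1", 120),
    ("EL", "ASIA", "Q1", 80),
    ("OPV", "EUROPE", "Q1", 40),
    ("OPV", "EUROPE", "Q2", 60),
]

def group_by(dims):
    """Compute one aggregate table (SUM of revenue) grouped by `dims`."""
    idx = [DIMS.index(d) for d in dims]
    table = {}
    for row in facts:
        key = tuple(row[i] for i in idx)
        table[key] = table.get(key, 0) + row[-1]
    return table

# All 2**3 = 8 possible groupings; storing every one is the "all cells" option.
all_groupings = [c for r in range(len(DIMS) + 1) for c in combinations(DIMS, r)]

# Compromise: store only the groupings expected to be queried often
# (this choice is an assumption) and compute anything else on demand.
materialized = {dims: group_by(dims) for dims in [("product",), ("location",)]}

def query(dims):
    dims = tuple(dims)
    if dims in materialized:
        return materialized[dims]  # fast path: stored result
    return group_by(dims)          # slow path: scan the raw rows

by_product = query(["product"])
by_quarter = query(["quarter"])
```

The number of possible groupings doubles with each added dimension, which is why storing all of them quickly becomes expensive in memory.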
Representation of Totals
• A simple data cube does not contain totals.
• Storing totals increases the size of the data cube but can also decrease the time needed to answer total-based queries.
• A simple way to represent totals is to add an additional layer on n sides of the n-dimensional data cube.
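The "additional layer" idea can be sketched by giving each dimension one extra position that holds the total over that dimension. The ALL marker, the with_totals helper, and the sample cube are assumptions made for this example.

```python
from itertools import product

ALL = "ALL"  # marker for the extra "total" position on each dimension

# Hypothetical 2-D cube: (product, location) -> revenue.
cube = {
    ("EL", "EUROPE"): 120,
    ("EL", "ASIA"): 80,
    ("OPV", "EUROPE"): 40,
}

def with_totals(cube):
    """Return a copy of the cube extended with total cells on every dimension."""
    extended = {}
    for key, value in cube.items():
        # Each cell contributes to every combination of "own value" or ALL.
        for target in product(*[(part, ALL) for part in key]):
            extended[target] = extended.get(target, 0) + value
    return extended

totals = with_totals(cube)
```

The extended cube holds 8 cells instead of 3. That is the size cost noted above; in return, a total such as totals[("EL", ALL)] becomes a single lookup.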