OLAP QUERIES
Online Analytic Processing (OLAP)
OLAP
• OLAP: Online Analytic Processing
• OLAP queries are complex queries that
• Touch large amounts of data
• Discover patterns and trends in the data
• Are typically expensive and take a long time
• Are also called decision-support queries
• In contrast to OLAP:
• OLTP: Online Transaction Processing
• OLTP queries are simple queries, e.g., over banking or airline systems
• OLTP queries touch small amounts of data for fast transactions
• Example OLTP query:
Select salary
From Emp
Where ID = 100;
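To make the OLTP/OLAP contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The Emp table with ID and salary comes from the slide; the dept column and the sample rows are assumptions added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Emp(ID, salary) is from the slide; dept is a hypothetical extra column.
conn.execute("CREATE TABLE Emp (ID INTEGER PRIMARY KEY, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [(100, "Sales", 50000.0), (101, "Sales", 70000.0), (102, "IT", 90000.0)],
)

# OLTP-style query: a point lookup that touches a single row.
oltp_row = conn.execute("SELECT salary FROM Emp WHERE ID = 100").fetchone()

# OLAP-style query: scans every row and aggregates per department.
olap_rows = conn.execute(
    "SELECT dept, AVG(salary) FROM Emp GROUP BY dept ORDER BY dept"
).fetchall()

print(oltp_row)
print(olap_rows)
```

On a three-row table both queries are instant; the difference only matters at warehouse scale, where the second query has to read every row.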
OLTP vs. OLAP
§ On-Line Transaction Processing (OLTP):
– technology used to perform updates on operational or
transactional systems (e.g., point of sale systems)
§ On-Line Analytical Processing (OLAP):
– technology used to perform complex analysis of the data
in a data warehouse
OLAP is a category of software technology that enables
analysts, managers, and executives to gain insight into data
through fast, consistent, interactive access to a wide variety
of possible views of information that has been transformed
from raw data to reflect the dimensionality of the enterprise
as understood by the user.
[source: OLAP Council: www.olapcouncil.org]
OLAP AND DATA WAREHOUSE
[Figure: data warehouse architecture. Components shown: internal sources, operational DBs, and external sources; data integration component; data warehouse with metadata; query and analysis component; OLAP server; client tools for OLAP, reports, and data mining.]
OLAP AND DATA WAREHOUSE
• Typically, OLAP queries are executed over a separate copy of the working data
• Over a data warehouse
• The data warehouse is periodically updated, e.g., overnight
• OLAP queries tolerate such out-of-date gaps
• Why run OLAP queries over a data warehouse?
• The warehouse collects and combines data from multiple sources
• The warehouse may organize the data in certain formats to support OLAP queries
• OLAP queries are complex and touch large amounts of data
• They may lock the database for long periods of time
• This negatively affects all other OLTP transactions
OLAP ARCHITECTURE
EXAMPLE OLAP APPLICATIONS
• Market Analysis
• Find which items are frequently sold over the summer but not over the winter
• Credit Card Companies
• Given a new applicant, is (s)he credit-worthy?
• Need to check other similar applicants (age, gender, income, etc.), observe how they performed, then make a prediction for the new applicant
OLAP queries are also called “decision-support” queries
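The market-analysis question above can be sketched as a single aggregate query. This is only an illustration: the Sales table, its columns, the sample rows, and the threshold of 50 units for "frequently sold" are all assumptions, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (item, season) with units sold.
conn.execute("CREATE TABLE Sales (item TEXT, season TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("sunscreen", "summer", 120),
        ("sunscreen", "winter", 3),
        ("bread", "summer", 80),
        ("bread", "winter", 85),
        ("ice cream", "summer", 200),
    ],
)

threshold = 50  # assumed cut-off for "frequently sold"

# Items whose summer total reaches the threshold while the winter total does not.
query = """
SELECT item
FROM Sales
GROUP BY item
HAVING SUM(CASE WHEN season = 'summer' THEN units ELSE 0 END) >= ?
   AND SUM(CASE WHEN season = 'winter' THEN units ELSE 0 END) < ?
ORDER BY item
"""
seasonal_items = [row[0] for row in conn.execute(query, (threshold, threshold))]
print(seasonal_items)
```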
MULTI-DIMENSIONAL VIEW
• Data is typically viewed as points in multi-dimensional space
• Raw data cubes (raw level without aggregation)
• Typical OLAP applications have many dimensions
[Figure: data cube with dimensions Items (bread, orange juice, milk 2% fat, milk 1% fat), Location (NY, MA, CA), and Time (3/1, 3/2, 3/3, 3/4). Sample cell values: bread 10, orange juice 47, milk 2% fat 30, milk 1% fat 12.]
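One way to picture "points in multi-dimensional space" is a mapping from coordinates to a measure. The sketch below uses the item names and values from the figure, but the location and date assigned to each value are assumptions, since the figure's exact cell positions are not recoverable.

```python
from typing import Dict, Tuple

# A cell is addressed by one coordinate per dimension: (item, location, date).
Cell = Tuple[str, str, str]

# Values come from the slide; the (location, date) placement is assumed.
cube: Dict[Cell, int] = {
    ("bread", "NY", "3/1"): 10,
    ("orange juice", "MA", "3/2"): 47,
    ("milk 2% fat", "CA", "3/3"): 30,
    ("milk 1% fat", "NY", "3/4"): 12,
}

def total_where(cube: Dict[Cell, int], axis: int, value: str) -> int:
    """Sum the measure over all cells whose coordinate on `axis` equals `value`."""
    return sum(units for cell, units in cube.items() if cell[axis] == value)

ny_total = total_where(cube, axis=1, value="NY")
print(cube[("bread", "NY", "3/1")], ny_total)
```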
ANOTHER EXAMPLE
APPROACHES FOR OLAP
• Relational OLAP (ROLAP)
• Multi-dimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP) = ROLAP + MOLAP
RELATIONAL OLAP: ROLAP
• Data are stored in relational model (tables)
• Special schema called Star Schema
• One relation is the fact table, all the others are dimension tables
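• The fact table is large; the dimension tables are small
[Figure: star schema]
Facts (fact table): Product, Region, Time, Channel, Revenue, Expenses, Units
Product (dimension): Model, Type, Color
Region (dimension): Nation, District, Dealer
Time (dimension): Week, Year
Channel (dimension)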
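A star schema like the one in the figure could be declared roughly as follows. This is a sketch under assumptions: the table and attribute names follow the figure, but the surrogate key columns (product_id, region_id, time_id, channel_id), the Channel name column, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (small); the *_id keys are assumed, not from the slide.
CREATE TABLE Product (product_id INTEGER PRIMARY KEY, model TEXT, type TEXT, color TEXT);
CREATE TABLE Region  (region_id  INTEGER PRIMARY KEY, nation TEXT, district TEXT, dealer TEXT);
CREATE TABLE Time    (time_id    INTEGER PRIMARY KEY, week INTEGER, year INTEGER);
CREATE TABLE Channel (channel_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table (large): one foreign key per dimension plus the measures.
CREATE TABLE Facts (
    product_id INTEGER REFERENCES Product(product_id),
    region_id  INTEGER REFERENCES Region(region_id),
    time_id    INTEGER REFERENCES Time(time_id),
    channel_id INTEGER REFERENCES Channel(channel_id),
    revenue REAL, expenses REAL, units INTEGER
);

INSERT INTO Product VALUES (1, 'M1', 'sedan', 'red'), (2, 'M2', 'truck', 'blue');
INSERT INTO Region  VALUES (1, 'USA', 'East', 'D1'), (2, 'Japan', 'Kanto', 'D2');
INSERT INTO Time    VALUES (1, 1, 1996), (2, 1, 1997);
INSERT INTO Channel VALUES (1, 'retail');
INSERT INTO Facts   VALUES (1, 1, 1, 1, 100.0, 60.0, 2),
                           (2, 1, 1, 1, 200.0, 150.0, 1),
                           (1, 2, 2, 1, 50.0, 20.0, 1);
""")

# A typical star join: facts joined to two dimensions, then aggregated.
revenue_by_nation_year = conn.execute("""
    SELECT r.nation, t.year, SUM(f.revenue)
    FROM Facts f
    JOIN Region r ON f.region_id = r.region_id
    JOIN Time   t ON f.time_id   = t.time_id
    GROUP BY r.nation, t.year
    ORDER BY r.nation, t.year
""").fetchall()
print(revenue_by_nation_year)
```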
CUBE vs. STAR SCHEMA
• Dimension tables describe the dimensions
• Data inside the cube are the fact records
[Figure: the star schema (Facts with the Product, Region, Time, and Channel dimension tables) shown next to the Items, Location, Time data cube.]
ROLAP: EXTENSIONS TO DBMS
• Schema design
• Specialized scan, indexing and join techniques
• Handling of aggregate views (querying and materialization)
• Supporting query language extensions beyond SQL
• Complex query processing and optimization
• Data partitioning and parallelism
SLICING & DICING
• Dicing: how each dimension in the cube is divided
• Different granularities (in the figure: Location diced by state, Time diced by day)
• Decided when building the data cube
• Slicing: selecting slices of the data cube to answer the OLAP query
• Done when answering a query
[Figure: Items, Location, Time data cube with Location diced by state (NY, MA, CA) and Time diced by day (3/1 to 3/4).]
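Dicing at different granularities can be sketched as bucketing the same raw records with a different function per dimension. The records, the year 2024, and the helper name build_cube below are assumptions for illustration only.

```python
from collections import Counter
from datetime import date

# Hypothetical raw records: (day, state, units sold).
records = [
    (date(2024, 3, 1), "MA", 5),
    (date(2024, 3, 2), "MA", 7),
    (date(2024, 3, 4), "NY", 4),
    (date(2024, 3, 4), "MA", 1),
]

def build_cube(records, time_bucket):
    """Aggregate units into cells keyed by (time bucket, state)."""
    cube = Counter()
    for day, state, units in records:
        cube[(time_bucket(day), state)] += units
    return cube

# Time diced by day: one cell per calendar day and state.
by_day = build_cube(records, lambda d: d.isoformat())
# Time diced by ISO week: coarser cells, so fewer of them.
by_week = build_cube(records, lambda d: d.isocalendar()[1])

print(len(by_day), len(by_week))
```

The coarser dicing produces a smaller cube but can no longer answer per-day questions, which is why the choice is made when the cube is built.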
SLICING & DICING: EXAMPLE 1
[Figure: example cube illustrating dicing and slicing.]
Slicing operation in ROLAP is basically:
-- Selection conditions on some attributes (WHERE clause) +
-- Group by and aggregation
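The "WHERE clause plus group by and aggregation" recipe can be sketched with sqlite3 as below. The Sales table, its columns, and the sample rows are assumptions loosely modeled on the cube figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table with one row per (item, location, day).
conn.execute("CREATE TABLE Sales (item TEXT, location TEXT, day TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("bread", "NY", "3/1", 10),
        ("bread", "NY", "3/2", 5),
        ("orange juice", "MA", "3/2", 47),
        ("milk 2% fat", "CA", "3/3", 30),
        ("milk 1% fat", "NY", "3/4", 12),
    ],
)

# Slice: fix Location = NY (selection), then aggregate units per item (group by).
ny_slice = conn.execute("""
    SELECT item, SUM(units)
    FROM Sales
    WHERE location = 'NY'
    GROUP BY item
    ORDER BY item
""").fetchall()
print(ny_slice)
```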
SLICING & DICING: EXAMPLE 2
SLICING & DICING: EXAMPLE 3
DRILL-DOWN & ROLL-UP
Roll-up (group by Region):
Region          Sales variance
Africa          105%
Asia            57%
Europe          122%
North America   97%
Pacific         85%
South America   163%
Drill-down (group by Nation):
Nation          Sales variance
China           123%
Japan           52%
India           87%
Singapore       95%
ROLAP: DRILL-DOWN & ROLL-UP
[Figure: drill-down and roll-up in ROLAP.]
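In ROLAP, rolling up and drilling down amount to grouping by a coarser or finer attribute of the same dimension. The sketch below assumes a denormalized Sales table with region, nation, and revenue columns; the table and the numbers are invented for illustration and are not the variance figures from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each sale carries both levels of the location hierarchy.
conn.execute("CREATE TABLE Sales (region TEXT, nation TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("Asia", "China", 120.0),
        ("Asia", "Japan", 80.0),
        ("Asia", "China", 30.0),
        ("Europe", "France", 60.0),
    ],
)

# Roll-up: aggregate at the coarser Region level.
roll_up = conn.execute(
    "SELECT region, SUM(revenue) FROM Sales GROUP BY region ORDER BY region"
).fetchall()

# Drill-down: open one region and aggregate at the finer Nation level.
drill_down = conn.execute(
    "SELECT nation, SUM(revenue) FROM Sales "
    "WHERE region = 'Asia' GROUP BY nation ORDER BY nation"
).fetchall()

print(roll_up)
print(drill_down)
```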
MOLAP
• Unlike ROLAP, in MOLAP data are stored in special structures called “Data Cubes” (array-based storage)
• Data cubes pre-compute and aggregate the data
• Possibly several data cubes with different granularities, e.g.:
• Every day, every item, every city
• Every week, every item category, every city
• Data cubes are aggregated materialized views over the data
• As long as the data does not change frequently, the overhead of data cubes is manageable
[Figure: example Sales cube by color (Red, Blue) and year (1996, 1997).]
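Array-based storage with pre-computed aggregates can be sketched as a dense array that reserves one extra slot per dimension for the total. The colors and years follow the figure; the fact records and the lookup helper are assumptions for illustration.

```python
colors = ["Red", "Blue"]
years = [1996, 1997]

# Hypothetical raw facts: (color, year, units sold).
facts = [("Red", 1996, 5), ("Red", 1997, 7), ("Blue", 1996, 3), ("Red", 1996, 2)]

# Dense 2-D array; the last row and last column hold pre-computed totals.
cube = [[0] * (len(years) + 1) for _ in range(len(colors) + 1)]
for color, year, units in facts:
    i, j = colors.index(color), years.index(year)
    cube[i][j] += units    # base cell
    cube[i][-1] += units   # total for this color over all years
    cube[-1][j] += units   # total for this year over all colors
    cube[-1][-1] += units  # grand total

def lookup(color=None, year=None):
    """Read a cell; omitting a dimension reads its pre-computed total."""
    i = colors.index(color) if color is not None else -1
    j = years.index(year) if year is not None else -1
    return cube[i][j]

print(lookup("Red", 1996), lookup(color="Red"), lookup(year=1996), lookup())
```

Every query becomes a single array read, at the cost of updating several cells per new fact, which matches the point that the overhead is manageable only when the data changes infrequently.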
MOLAP: CUBE OPERATOR
[Figure: CUBE operator applied to a three-dimensional cube, showing:]
• Raw data (fact table)
• Aggregation over the X axis
• Aggregation over the Y axis
• Aggregation over the Z axis
• Aggregation over the X,Y axes
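The idea behind the CUBE operator, aggregating the fact table over every subset of its dimensions, can be sketched in plain Python. The dimension names x, y, z, the sample facts, and the function name cube_aggregates are assumptions; real systems compute these group-bys far more efficiently than this brute-force loop.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("x", "y", "z")

# Hypothetical fact records with three dimensions and one measure.
facts = [
    {"x": "a", "y": "p", "z": "u", "value": 1},
    {"x": "a", "y": "q", "z": "u", "value": 2},
    {"x": "b", "y": "p", "z": "v", "value": 4},
]

def cube_aggregates(facts, dims):
    """Return {kept_dims: {group_key: sum}} for every subset of dims."""
    result = {}
    for k in range(len(dims), -1, -1):
        for kept in combinations(dims, k):
            groups = defaultdict(int)
            for fact in facts:
                # Dimensions not in `kept` are aggregated away.
                groups[tuple(fact[d] for d in kept)] += fact["value"]
            result[kept] = dict(groups)
    return result

agg = cube_aggregates(facts, DIMS)
print(len(agg))          # number of group-bys produced
print(agg[("x", "y")])   # aggregation over the Z axis
print(agg[()])           # grand total
```

With three dimensions this yields 2^3 = 8 group-bys, from the raw level (all dimensions kept) down to the grand total (none kept).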
MOLAP & ROLAP
• Commercial offerings of both types are available
• In general, MOLAP is good for smaller warehouses and is
optimized for canned queries
• In general, ROLAP is more flexible and leverages relational
technology
• ROLAP may pay a performance penalty to realize that flexibility
OLTP vs. OLAP
                     OLTP                                OLAP
User                 Clerk, IT professional              Knowledge worker
Function             Day-to-day operations               Decision support
DB design            Application-oriented (E-R based)    Subject-oriented (star, snowflake)
Data                 Current, isolated                   Historical, consolidated
View                 Detailed, flat relational           Summarized, multidimensional
Usage                Structured, repetitive              Ad hoc
Unit of work         Short, simple transaction           Complex query
Access               Read/write                          Read mostly
Operations           Index/hash on primary key           Lots of scans
# Records accessed   Tens                                Millions
# Users              Thousands                           Hundreds
DB size              100 MB-GB                           100 GB-TB
Metric               Transaction throughput              Query throughput, response time
Source: Datta, GT
OLAP: SUMMARY
• OLAP stands for Online Analytic Processing and is used in decision support systems
• Usually runs on a data warehouse
• In contrast to OLTP, OLAP queries are complex, touch large amounts of data, and try to discover patterns or trends in the data
• OLAP Models
• Relational (ROLAP): uses relational star schema
• Multidimensional (MOLAP): uses data cubes