
db2 Olap

The document discusses OLAP and data analysis techniques in DB2. It describes how to perform cross tabulation, pivoting, slicing, dicing, roll-ups, drill downs and other OLAP operations. It also covers extended aggregation using GROUP BY, ROLLUP, CUBE and GROUPING SETS. Ranking queries using RANK and DENSE_RANK functions are also discussed.

OLAP Data Analysis with DB2

Data Analysis and OLAP

Aggregate functions summarize large volumes of data.


Online Analytical Processing (OLAP):
. Interactive analysis of data.
. Allows data to be summarized and viewed in different ways in an online fashion
(with negligible delay).

OLAP data is modeled multi-dimensionally.

It can be modeled as dimension attributes and measure attributes:
. Given a relation used for data analysis, we can identify some of its attributes as
measure attributes, since they measure some value and can be aggregated upon
(e.g., number of sales, inhabitants, passengers, etc.).
. Some of the other attributes of the relation are identified as dimension attributes,
since they define the dimensions on which measure attributes and summaries of
measure attributes are viewed.



© 2005 Jens Teubner, Andre Seifert, University of Konstanz


1.1 Cross Tabulation and its Relational Representation

A cross tabulation, also referred to as a pivot table, is a table where
. values of one of the dimension attributes form the row headers,
. values of another dimension attribute form the column headers, and
. values in individual cells are (aggregates of) the values of the measure attribute
for the combination of dimension values that specifies the cell.

The table below is an example of a cross tab:

                                Sex
Country                 Male        Female         Total
Australia          9,913,658     9,999,486    19,913,144
Denmark            2,676,377     2,737,015     5,413,392
Germany           40,413,132    42,011,477    82,424,609
Netherlands        8,079,392     8,238,807    16,318,199
United States    143,957,558   149,070,013   293,027,571
Total            205,040,117   212,056,798   417,096,915

In relational DBMSs, cross tabs are represented as relations:
. The value all is used to represent aggregates.
. The SQL:1999 standard uses null values in place of all.
. DB2 displays such null values as a minus sign (-).

Country          Sex        Population
Australia        male        9,913,658
Australia        female      9,999,486
Australia        all        19,913,144
Denmark          male        2,676,377
Denmark          female      2,737,015
Denmark          all         5,413,392
Germany          male       40,413,132
Germany          female     42,011,477
Germany          all        82,424,609
Netherlands      male        8,079,392
Netherlands      female      8,238,807
Netherlands      all        16,318,199
United States    male      143,957,558
United States    female    149,070,013
United States    all       293,027,571
all              male      205,040,117
all              female    212,056,798
all              all       417,096,915
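
The relational representation above can be derived mechanically from the ten base cells of the cross tab. The following Python sketch is only an illustration of that idea, not DB2 code: the function name cross_tab_relation is hypothetical, and the string "all" plays the role of the aggregate marker.

```python
from collections import defaultdict

# Base data: the male/female cells of the population cross tab.
BASE = [
    ("Australia", "male", 9_913_658), ("Australia", "female", 9_999_486),
    ("Denmark", "male", 2_676_377), ("Denmark", "female", 2_737_015),
    ("Germany", "male", 40_413_132), ("Germany", "female", 42_011_477),
    ("Netherlands", "male", 8_079_392), ("Netherlands", "female", 8_238_807),
    ("United States", "male", 143_957_558),
    ("United States", "female", 149_070_013),
]


def cross_tab_relation(rows):
    """Return {(country, sex): population}, including the 'all' aggregates."""
    totals = defaultdict(int)
    for country, sex, population in rows:
        # Each base row feeds its own cell and the three aggregates above it.
        for key in ((country, sex), (country, "all"),
                    ("all", sex), ("all", "all")):
            totals[key] += population
    return dict(totals)


relation = cross_tab_relation(BASE)
print(relation[("Germany", "all")])  # total population of Germany
print(relation[("all", "all")])      # grand total
```

The result holds 18 tuples: the 10 base cells, 5 per-country totals, 2 per-sex totals, and 1 grand total.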

1.2 OLAP Terminology

The operation of changing the dimensions in a cross tab is called pivoting.

Suppose an analyst wishes to see a population cross tab on countries and sex for a fixed
value of the size of the states of the respective countries, for example, 10,000 km²,
instead of the sum across all states:
. Such an operation is referred to as slicing.
. If values from multiple dimensions are fixed, the operation is called dicing.

The operation of moving from finer-granularity data to a coarser granularity is called
a roll-up.

The opposite operation, that of moving from coarser-granularity data to finer-granularity
data, is called a drill down.
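
The four operations can be sketched with plain GROUP BY and WHERE clauses. The example below is a rough illustration under several assumptions: it uses SQLite through Python's sqlite3 module instead of DB2, the year column is a hypothetical third dimension standing in for the state size mentioned above, and all values are invented toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population "
             "(country TEXT, sex TEXT, year INTEGER, population INTEGER)")
# Toy values, invented for illustration only.
conn.executemany("INSERT INTO population VALUES (?, ?, ?, ?)", [
    ("A", "male", 2004, 10), ("A", "female", 2004, 12),
    ("A", "male", 2005, 11), ("A", "female", 2005, 13),
    ("B", "male", 2004, 20), ("B", "female", 2004, 21),
    ("B", "male", 2005, 22), ("B", "female", 2005, 23),
])

# Slice: cross tab on (country, sex) for one fixed value of a third dimension.
slice_2005 = conn.execute(
    "SELECT country, sex, SUM(population) FROM population "
    "WHERE year = 2005 GROUP BY country, sex "
    "ORDER BY country, sex").fetchall()

# Dice: values of several dimensions are fixed at once.
dice = conn.execute(
    "SELECT country, SUM(population) FROM population "
    "WHERE year = 2005 AND sex = 'female' GROUP BY country "
    "ORDER BY country").fetchall()

# Roll-up: from (country, sex) granularity to the coarser (country).
rolled_up = conn.execute(
    "SELECT country, SUM(population) FROM population "
    "GROUP BY country ORDER BY country").fetchall()

# Drill down: back to the finest (country, sex, year) granularity.
drilled_down = conn.execute(
    "SELECT country, sex, year, SUM(population) FROM population "
    "GROUP BY country, sex, year "
    "ORDER BY country, sex, year").fetchall()
```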



Extended Aggregation

2.1 SQL-92 vs. SQL-99

SQL-92 aggregation functionality is quite limited.
. Very useful aggregates are either very hard or impossible to specify:
  . Data cube operations,
  . Complex aggregates (e.g., median, variance),
  . Binary aggregates (e.g., correlation, regression curves),
  . Ranking queries (e.g., assign each football team a rank based on the total number
    of points, goal difference, goals scored).

SQL-99 OLAP extensions provide a variety of aggregation functions to address the
above limitations.
. Supported by DB2 version 6.



2.2 Extended Aggregation in DB2

GROUP BY and GROUPING SETS clauses are used to group individual rows into
combined sets based on the value in one or more columns.

ROLLUP and CUBE are short-hand forms of particular types of GROUPING SETS clauses.

2.2.1 Cube Operation

The CUBE operation computes the union of GROUP BYs on every subset of the specified attributes.

Consider the following example query:


SELECT country, sex, sum(population)
FROM population
GROUP BY CUBE(country, sex);



This computes the union of 2^n = 4 groupings (n = 2 attributes) of the population relation:

{(country, sex), (country), (sex), ()},

where () denotes an empty group by list.

For each grouping, the result contains the null value for attributes not present in
the grouping.

The query above computes the relational representation of the population cross tab that
we saw earlier.
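
To make the "union of GROUP BYs" reading concrete, the following Python sketch emulates the semantics of GROUP BY CUBE on a few invented rows. It is an illustration only, not how DB2 evaluates the query; the helper names cube_sets and group_by_union are hypothetical, and None stands for the SQL null value.

```python
from itertools import combinations


def cube_sets(columns):
    """All 2^n subsets of the columns, largest first, in column order."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def group_by_union(rows, columns, grouping_sets):
    """Union of SUM(population) GROUP BY g over every grouping set g.

    Columns that are not part of a grouping set are reported as None.
    """
    result = []
    for gset in grouping_sets:
        totals = {}
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in columns)
            totals[key] = totals.get(key, 0) + row["population"]
        result.extend(key + (total,) for key, total in totals.items())
    return result


# Toy rows, invented for illustration.
rows = [
    {"country": "A", "sex": "male", "population": 10},
    {"country": "A", "sex": "female", "population": 12},
    {"country": "B", "sex": "male", "population": 20},
    {"country": "B", "sex": "female", "population": 21},
]
sets = cube_sets(("country", "sex"))
cube = group_by_union(rows, ("country", "sex"), sets)
```

Here sets contains the four groupings listed above, and cube contains 4 + 2 + 2 + 1 = 9 result rows, one block per grouping.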

The function grouping() can be used to identify which rows come from which grouping set.
. A value of 1 indicates that the corresponding data field is null because the row is
from a grouping set that does not involve this column.
. Otherwise, the value is 0.



Example:
SELECT country, sex, sum(population),
grouping(country) AS country_flag,
grouping(sex) AS sex_flag
FROM population
GROUP BY CUBE(country, sex);

You can use a CASE expression in the SELECT clause to replace such nulls (presented
as -) by a value such as all.

For example, replace country in the previous query by:

CASE WHEN grouping(country) = 1 THEN 'all' ELSE country END AS country
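
The interplay of the grouping() flags and the CASE expression can be emulated in a few lines of Python. This is a sketch on invented toy rows; cube_with_flags and label are hypothetical helpers that mimic the two queries above rather than call DB2.

```python
from itertools import combinations

COLUMNS = ("country", "sex")
# Toy rows (country, sex, population), invented for illustration.
ROWS = [("A", "male", 10), ("A", "female", 12), ("B", "male", 20)]


def cube_with_flags(rows):
    """Emulate sum(population) GROUP BY CUBE(country, sex) plus grouping()."""
    out = {}
    for size in range(len(COLUMNS), -1, -1):
        for gset in combinations(range(len(COLUMNS)), size):
            # A flag is 1 when the column is NOT part of this grouping set.
            flags = tuple(0 if i in gset else 1 for i in range(len(COLUMNS)))
            for row in rows:
                key = tuple(None if f else row[i]
                            for i, f in enumerate(flags))
                out[(key, flags)] = out.get((key, flags), 0) + row[2]
    # Result rows: (country, sex, total, country_flag, sex_flag)
    return [(key[0], key[1], total, flags[0], flags[1])
            for (key, flags), total in out.items()]


def label(value, flag):
    """CASE WHEN grouping(col) = 1 THEN 'all' ELSE col END"""
    return "all" if flag == 1 else value


flagged = cube_with_flags(ROWS)
labelled = [(label(c, cf), label(s, sf), total)
            for c, s, total, cf, sf in flagged]
```

Testing the flag rather than the value itself keeps a null that is genuinely stored in the data distinct from a null that marks an aggregate row.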



2.2.2 Rollup Operation

The ROLLUP operation generates the union of GROUP BYs on every prefix of the specified list of attributes.

The following example query:


SELECT country, sex, sum(population)
FROM population
GROUP BY ROLLUP(country, sex);

generates the union of three groupings:


{(country, sex), (country), ()}


Rollup can be used to generate aggregates at multiple levels of a hierarchy.

Suppose there exists a dimension stretch in the population relation, which can be
used to aggregate by town, state, and country.



Then the query


SELECT country, state, town, sex, sum(population)
FROM population
GROUP BY ROLLUP(country, state, town, sex);

would give a hierarchical summary by sex and by stretch.
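
The "every prefix" rule is easy to state in code. The short Python sketch below only enumerates the groupings that a ROLLUP expands to; rollup_sets is a hypothetical helper, not a DB2 function.

```python
def rollup_sets(columns):
    """Every prefix of the column list, longest first, ending with ()."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


two_level = rollup_sets(["country", "sex"])
hierarchy = rollup_sets(["country", "state", "town", "sex"])

for grouping in hierarchy:
    print(grouping)
```

For n attributes ROLLUP yields n + 1 groupings, compared with 2^n for CUBE, so the four-attribute hierarchy produces five levels of summary.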




Multiple roll-ups and cubes can be used in a single group by clause.
. Each generates a set of group by lists.
. The cross product of these sets gives the overall set of group by lists.

The following example query:


SELECT year, country, sex, sum(population)
FROM population
GROUP BY ROLLUP (year), ROLLUP(country, sex);

generates the groupings:


{(year), ()} X {(country,sex), (country), ()}
= {(year,country,sex), (year,country), (year), (country,sex), (country), ()}



Having multiple CUBE statements is allowed, but not always useful:

The following example query:


SELECT year, country, sex, sum(population)
FROM population
GROUP BY CUBE (year), CUBE(country, sex);

would generate the groupings:


{(year), ()} X {(country,sex), (country), (sex), ()}
= {(year,country,sex), (year,country), (year,sex), (year), (country,sex),
(country), (sex), ()}
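
The cross-product rule behind both expansions can be checked with a small Python sketch. The helper names rollup_sets, cube_sets and combine are hypothetical and merely enumerate grouping sets; no query is executed.

```python
from itertools import combinations, product


def rollup_sets(columns):
    """Prefixes of the column list, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """All subsets of the column list, largest first."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def combine(*set_lists):
    """Cross product: concatenate one grouping set from each list."""
    return [sum(choice, ()) for choice in product(*set_lists)]


rollups = combine(rollup_sets(("year",)), rollup_sets(("country", "sex")))
cubes = combine(cube_sets(("year",)), cube_sets(("country", "sex")))
```

rollups has 2 x 3 = 6 groupings and cubes has 2 x 4 = 8, in the same order as the lists shown above.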


2.2.3 Grouping Sets Operation

The GROUPING SETS clause enables us to get multiple GROUP BY result sets using a
single statement.

Nested (i.e., enclosed in a second pair of parentheses) and non-nested GROUPING SETS
sub-phrases can be distinguished:
. A nested list of columns works as a simple GROUP BY.
. A non-nested list of columns works as separate simple GROUP BY statements, which
are then combined in an implied UNION ALL.


Example:

GROUP BY GROUPING SETS ((year,country,sex))
is equivalent to
GROUP BY year, country, sex

GROUP BY GROUPING SETS (year,country,sex)
is equivalent to
GROUP BY year
UNION ALL
GROUP BY country
UNION ALL
GROUP BY sex

GROUP BY GROUPING SETS (year,(country,sex))
is equivalent to
GROUP BY year
UNION ALL
GROUP BY country, sex

Multiple GROUPING SETS in the same GROUP BY are combined together as if they
were simple fields in a GROUP BY list.

Example:

GROUP BY GROUPING SETS (year), GROUPING SETS (country), GROUPING SETS (sex)
is equivalent to
GROUP BY year, country, sex

GROUP BY GROUPING SETS (year), GROUPING SETS ((country,sex))
is equivalent to
GROUP BY year, country, sex

GROUP BY GROUPING SETS (year), GROUPING SETS (country,sex)
is equivalent to
GROUP BY year, country
UNION ALL
GROUP BY year, sex
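
The nested and non-nested cases, and the way several GROUPING SETS clauses combine, can be reproduced with the following Python sketch. It only models the expansion into simple GROUP BY lists; grouping_sets and group_by are hypothetical helpers, where a bare string stands for a non-nested column and a tuple for a nested column list.

```python
from itertools import product


def grouping_sets(*elements):
    """One GROUPING SETS (...) clause.

    A bare column name forms a grouping set of its own; a tuple of names
    (a nested list) forms a single grouping set.
    """
    return [(e,) if isinstance(e, str) else tuple(e) for e in elements]


def group_by(*clauses):
    """Several GROUPING SETS clauses in one GROUP BY: their cross product."""
    return [sum(choice, ()) for choice in product(*clauses)]


nested = grouping_sets(("year", "country", "sex"))
non_nested = grouping_sets("year", "country", "sex")
mixed = grouping_sets("year", ("country", "sex"))

three_clauses = group_by(grouping_sets("year"), grouping_sets("country"),
                         grouping_sets("sex"))
split = group_by(grouping_sets("year"), grouping_sets("country", "sex"))
```

Each element of a returned list corresponds to one simple GROUP BY, and the elements are combined by the implied UNION ALL.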


ROLLUP and CUBE are short-hand forms of particular types of GROUPING SETS clauses.

A ROLLUP expression displays sub-totals for the specified fields. Example:

GROUP BY ROLLUP(year, country, sex)
is equivalent to
GROUP BY GROUPING SETS((year,country,sex),
(year,country),
(year),
())

A CUBE expression displays a cross tab of the sub-totals for any specified fields. Example:

GROUP BY CUBE(year, country, sex)
is equivalent to
GROUP BY GROUPING SETS((year,country,sex),
(year,country),
(year,sex),
(country,sex),
(year),
(country),
(sex),
())

Ranking

Ranking is done in conjunction with an ORDER BY clause.

Given the relation population(country, number), find the rank of each country:
SELECT country, rank() OVER (ORDER BY number DESC) AS n_rank
FROM population

An additional ORDER BY clause is required to return the query results in sorted order:


SELECT country, rank() OVER (ORDER BY number DESC) AS n_rank
FROM population
ORDER BY n_rank

Ranking may leave gaps:
. If multiple rows have equal values, they all get the same rank.
. If 2 countries have the same top population number, both have rank 1, and the
next has rank 3.
. The function dense_rank() does not leave such gaps, i.e., the next dense rank would be 2.
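
The difference between the two functions shows up as soon as there is a tie. The sketch below runs both on invented toy data; it assumes SQLite (version 3.25 or later, accessed through Python's sqlite3 module) as a stand-in for DB2, which is enough here because both systems implement these standard window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (country TEXT, number INTEGER)")
# Toy values, invented so that two countries tie for first place.
conn.executemany("INSERT INTO population VALUES (?, ?)",
                 [("A", 500), ("B", 500), ("C", 300), ("D", 100)])

ranks = conn.execute("""
    SELECT country,
           rank()       OVER (ORDER BY number DESC) AS n_rank,
           dense_rank() OVER (ORDER BY number DESC) AS n_dense_rank
    FROM population
    ORDER BY n_rank, country
""").fetchall()

for row in ranks:
    print(row)
```

Countries A and B share rank 1; C then receives rank 3 but dense rank 2.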


Example query: Find the rank of the countries within each sex in terms of their
population size.
SELECT country, rank() OVER (PARTITION BY sex ORDER BY number DESC) AS n_rank
FROM population
ORDER BY sex, n_rank
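
A small run of the partitioned ranking illustrates that the rank restarts at 1 in every partition. As before, this is a sketch on invented toy data, with SQLite (3.25 or later) via Python's sqlite3 module assumed as a stand-in for DB2; the sex column is added to the select list only to make the output readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population "
             "(country TEXT, sex TEXT, number INTEGER)")
# Toy values, invented for illustration only.
conn.executemany("INSERT INTO population VALUES (?, ?, ?)", [
    ("A", "male", 10), ("A", "female", 12),
    ("B", "male", 20), ("B", "female", 11),
    ("C", "male", 15), ("C", "female", 30),
])

per_sex = conn.execute("""
    SELECT country, sex,
           rank() OVER (PARTITION BY sex ORDER BY number DESC) AS n_rank
    FROM population
    ORDER BY sex, n_rank
""").fetchall()
```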

Multiple independent rankings can be specified in the same query:


. Example:
SELECT country,
rank() OVER (ORDER BY number DESC) AS n_rank_desc,
rank() OVER (ORDER BY number ASC) AS n_rank_asc,
rank() OVER (ORDER BY year ASC) AS y_rank_asc
FROM population
ORDER BY n_rank_desc


DB2 provides special syntax for top-n (rank) queries.


. Example query: Find the five most populous countries.
SELECT country, rank() OVER (ORDER BY number DESC) AS n_rank
FROM population
ORDER BY n_rank
FETCH FIRST 5 ROWS ONLY

When writing the ORDER BY clause, one can specify whether null values are counted as
high or low.
. By default, null values come last for an ascending field (NULLS LAST) and first for a
descending field (NULLS FIRST).
. Example (overriding the default for a descending field):
SELECT country, rank() OVER (ORDER BY number DESC NULLS LAST)
AS n_rank
FROM population
ORDER BY n_rank


Windowing

Windowing constructs allow us to do things like get cumulative totals or running
averages.

For example:
. Given population values for each country and year, calculate the average population
for each country and year on the basis of the current, previous, and next year.
. Query in SQL:
SELECT country, year,
avg(population) OVER (PARTITION BY country ORDER BY year
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS p_avg
FROM population
ORDER BY country, year, p_avg;
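
The sketch below computes the three-year moving average twice on invented toy data: once with a single window ordered by country and year, and once with PARTITION BY country. Without the partition, the frame of a country's first or last year reaches into the neighbouring country. SQLite (3.25 or later) via Python's sqlite3 module is assumed as a stand-in for DB2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population "
             "(country TEXT, year INTEGER, population INTEGER)")
# Toy values, invented for illustration only.
conn.executemany("INSERT INTO population VALUES (?, ?, ?)", [
    ("A", 2003, 10), ("A", 2004, 20), ("A", 2005, 30),
    ("B", 2003, 100), ("B", 2004, 200), ("B", 2005, 600),
])

# One window over the whole table: frames cross the A/B boundary.
unpartitioned = conn.execute("""
    SELECT country, year,
           avg(population) OVER (ORDER BY country, year
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS p_avg
    FROM population
    ORDER BY country, year
""").fetchall()

# One window per country: frames stay inside a single country.
partitioned = conn.execute("""
    SELECT country, year,
           avg(population) OVER (PARTITION BY country ORDER BY year
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS p_avg
    FROM population
    ORDER BY country, year
""").fetchall()
```

At the edge of a partition the frame simply shrinks, so the first and last year of each country are averaged over two rows instead of three.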


Examples of other window clause specifications:
. ROWS UNBOUNDED PRECEDING,
. ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
. ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING,
. RANGE BETWEEN 10 PRECEDING AND CURRENT ROW,
. RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING,
. RANGE BETWEEN CURRENT ROW AND CURRENT ROW.
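
Two of these specifications are enough to produce a cumulative total and a grand total side by side. The sketch uses invented toy data and assumes SQLite (3.25 or later) via Python's sqlite3 module as a stand-in for DB2; the column names running_total and grand_total are chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population (year INTEGER, population INTEGER)")
# Toy values, invented for illustration only.
conn.executemany("INSERT INTO population VALUES (?, ?)",
                 [(2003, 10), (2004, 20), (2005, 30)])

totals = conn.execute("""
    SELECT year,
           sum(population) OVER (ORDER BY year
               ROWS UNBOUNDED PRECEDING) AS running_total,
           sum(population) OVER (ORDER BY year
               ROWS BETWEEN UNBOUNDED PRECEDING
                        AND UNBOUNDED FOLLOWING) AS grand_total
    FROM population
    ORDER BY year
""").fetchall()
```

ROWS UNBOUNDED PRECEDING is the short form of a frame that ends at the current row, which is why the first column accumulates while the second stays constant.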

We can do windowing within partitions.

For example:
. Find the average male and female population for each country and year on
the basis of the current, previous, and next year.
. Query in SQL:
SELECT country, sex, year,
avg(population) OVER (PARTITION BY country, sex ORDER BY year ROWS
BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS p_avg
FROM population
ORDER BY country, sex, year, p_avg;
