OLAP Functions Part 1
Patrice Bérubé
Technical Solution Architect
Teradata Canada
OLAP Analytics - Agenda
• Business view
• Academic examples
• Summary
Typical Business Questions
1. How much revenue did each market have in February, what
percent of total revenue?
4. What is the average revenue and MOU for each decile of the
consumer base in February?
6. What price plan is the final one for a given day when there are
several changes over the course of the day?
Business needs beyond SQL (1 of 2)
Business needs beyond SQL (2 of 2)
Any options… solutions…
• So!
• Any suggestions?
Business question #1 (1 of 3)
Aggregation
SELECT
   bs.bill_mkt_id
  ,DT.CumRev
  ,SUM(bs.tot_amt) AS MktRev
  ,MktRev / (DT.CumRev / 1.0000) AS MktRevPerc
FROM bl_stmnt bs
CROSS JOIN
  (
   SELECT
     SUM(bs.tot_amt) AS CumRev
   FROM bl_stmnt bs
   WHERE bs.bl_cyc_dt BETWEEN '2006-02-01' AND '2006-02-28'
  ) DT
WHERE bs.bl_cyc_dt BETWEEN '2006-02-01' AND '2006-02-28'
GROUP BY 1,2
ORDER BY MktRev DESC
OLAP
SELECT
   bs.bill_mkt_id
  ,SUM(bs.tot_amt) AS MktRev
  ,SUM(MktRev)
     OVER (ROWS BETWEEN UNBOUNDED PRECEDING
           AND UNBOUNDED FOLLOWING ) AS CumRev
  ,MktRev / (CumRev / 1.0000) AS MktRevPerc
FROM bl_stmnt bs
WHERE bs.bl_cyc_dt BETWEEN '2006-02-01' AND '2006-02-28'
GROUP BY 1
ORDER BY MktRev DESC
OLAP functions available in Teradata
Group Window -
Aggregates based on a grouping of rows.
Cumulative Window -
Aggregates based on a cumulation of rows.
Moving Window -
Aggregates based on a moving window of rows.
Remaining Window -
Aggregates based on the rows remaining outside of a defined window.
• AVG • SUM
• COUNT • RANK
• MAX • PERCENT_RANK
• MIN • ROW_NUMBER
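The four window types can be compared side by side on one small data set. The sketch below is an illustration under stated assumptions, not Teradata code: it uses Python's standard sqlite3 module (SQLite 3.25 or later), which accepts the same ANSI ROWS BETWEEN syntax, and a hypothetical table sales(day, qty). The Remaining window is taken here to mean the current row plus everything after it.

```python
import sqlite3

def window_types():
    """Return (day, qty, group_sum, cum_sum, moving_sum, remaining_sum) rows."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table and data, used only for illustration.
    con.execute("CREATE TABLE sales (day INTEGER, qty INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)",
                    [(1, 10), (2, 20), (3, 30), (4, 40)])
    rows = con.execute("""
        SELECT day, qty,
               -- Group window: every row of the group
               SUM(qty) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING
                                               AND UNBOUNDED FOLLOWING) AS group_sum,
               -- Cumulative window: first row up to the current row
               SUM(qty) OVER (ORDER BY day ROWS BETWEEN UNBOUNDED PRECEDING
                                               AND CURRENT ROW) AS cum_sum,
               -- Moving window: current row and 1 preceding row
               SUM(qty) OVER (ORDER BY day ROWS BETWEEN 1 PRECEDING
                                               AND CURRENT ROW) AS moving_sum,
               -- Remaining window: current row through the last row
               SUM(qty) OVER (ORDER BY day ROWS BETWEEN CURRENT ROW
                                               AND UNBOUNDED FOLLOWING) AS remaining_sum
        FROM sales
        ORDER BY day
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in window_types():
        print(row)
```

Only the ROWS BETWEEN bounds change between the four columns; the aggregate and the ordering stay the same.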
OLAP - Diagram
SUM (qty) over (...) as cum_qty
   partition by month
   order by day
   rows between unbounded preceding
month   day  qty  cum_qty
200401    1    2        2
200401   10    3        5
200401   15    1        6
200402    5    3        3
200403    6    7        7
200403    7    2        9
200403    8    1       10
200404    4    5        5
200404   16    4        9
200404   27    5       14
Coding
• Options and Syntax
• Example:
SUM(qty)
OVER(PARTITION BY month ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) as Cumulative_Quantity
• Window defined by
   • PARTITION BY clause defining the “grouping” of data
   • ORDER BY clause defining the sequence of data
   • ROWS BETWEEN clause defining the window used for calculation,
     e.g.: following/preceding/current row, unbounded or relative row numbers
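The Cumulative_Quantity example can be run against the rows of the diagram to confirm the cum_qty column. This is a sketch, assuming SQLite 3.25 or later through Python's sqlite3 module rather than Teradata; the table name daily_qty is hypothetical, while the data values are the ten rows of the diagram.

```python
import sqlite3

# (month, day, qty) rows taken from the OLAP diagram.
DIAGRAM_ROWS = [
    (200401, 1, 2), (200401, 10, 3), (200401, 15, 1),
    (200402, 5, 3),
    (200403, 6, 7), (200403, 7, 2), (200403, 8, 1),
    (200404, 4, 5), (200404, 16, 4), (200404, 27, 5),
]

def cumulative_quantity():
    """Return (month, day, qty, cum_qty) with cum_qty restarting each month."""
    con = sqlite3.connect(":memory:")
    # daily_qty is a hypothetical table name.
    con.execute("CREATE TABLE daily_qty (month INTEGER, day INTEGER, qty INTEGER)")
    con.executemany("INSERT INTO daily_qty VALUES (?, ?, ?)", DIAGRAM_ROWS)
    rows = con.execute("""
        SELECT month, day, qty,
               SUM(qty) OVER (PARTITION BY month ORDER BY day
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW) AS cum_qty
        FROM daily_qty
        ORDER BY month, day
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in cumulative_quantity():
        print(row)
```

PARTITION BY month is what resets the running total at each new month; without it the sum would keep accumulating across all ten rows.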
Note that the Group Sum reflects the total for each product.
• Each row computes a moving sum based on itself and 2 preceding rows.
• The 1st and 2nd rows compute their sums based on one and two rows respectively.
• The default title of the last column indicates this is a Moving function.
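The short windows at the start of a moving sum can be made visible by counting the rows in each window. A minimal sketch, assuming SQLite 3.25 or later via Python's sqlite3 module; the table daily_sales and its values are hypothetical and are not the data of the original slide.

```python
import sqlite3

def moving_sum():
    """Return (day, qty, rows_in_window, moving_sum) for a 3-row moving window."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table and data, used only for illustration.
    con.execute("CREATE TABLE daily_sales (day INTEGER, qty INTEGER)")
    con.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                    [(1, 5), (2, 3), (3, 8), (4, 1), (5, 4)])
    rows = con.execute("""
        SELECT day, qty,
               -- number of rows that actually fall inside the window
               COUNT(*) OVER (ORDER BY day
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rows_in_window,
               -- current row plus the 2 preceding rows
               SUM(qty) OVER (ORDER BY day
                              ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_sum
        FROM daily_sales
        ORDER BY day
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in moving_sum():
        print(row)
```

The first two rows report window sizes of 1 and 2 because fewer than two preceding rows exist; from the third row on, the window always holds 3 rows.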
Rank vs Row_Number (1 of 2)
Rank positions that result in ties are not reused: after a tie, the rank
numbers covered by the tied rows are skipped. For example, the seventh row in
this list still has a rank of seven, even though the previous row has a rank of four.
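The difference between RANK and ROW_NUMBER on tied values can be sketched as follows. This assumes SQLite 3.25 or later via Python's sqlite3 module; the table item_sales and its seven rows are hypothetical, chosen so that three rows tie at rank four and the seventh row gets rank seven.

```python
import sqlite3

def rank_vs_row_number():
    """Return (item, sales, rank, row_number) for data with a three-way tie."""
    con = sqlite3.connect(":memory:")
    # Hypothetical table and data, used only for illustration.
    con.execute("CREATE TABLE item_sales (item TEXT, sales INTEGER)")
    con.executemany("INSERT INTO item_sales VALUES (?, ?)",
                    [("a", 100), ("b", 90), ("c", 80),
                     ("d", 70), ("e", 70), ("f", 70), ("g", 60)])
    rows = con.execute("""
        SELECT item, sales,
               -- tied rows share a rank; the following ranks are skipped
               RANK() OVER (ORDER BY sales DESC) AS rnk,
               -- every row gets its own number; item breaks ties deterministically
               ROW_NUMBER() OVER (ORDER BY sales DESC, item) AS row_nbr
        FROM item_sales
        ORDER BY sales DESC, item
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in rank_vs_row_number():
        print(row)
```

Rows d, e and f all receive rank 4 and row g receives rank 7, while ROW_NUMBER simply counts 1 through 7.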
Rank vs Row_Number (2 of 2)
Percent_Rank
SELECT storeid, prodid, sales,
       RANK() OVER (ORDER BY sales DESC) AS Rank_Sales,
       PERCENT_RANK() OVER (ORDER BY sales DESC) AS Pct_Rank_Sales
FROM salestbl ;
storeid  prodid      sales  Rank_Sales  Pct_Rank_Sales
-------  ------  ---------  ----------  --------------
   1001  F       150000.00           1        0.000000
   1001  A       100000.00           2        0.100000
   1003  B        65000.00           3        0.200000
   1001  C        60000.00           4        0.300000
   1003  D        50000.00           5        0.400000
   1002  A        40000.00           6        0.500000
   1001  D        35000.00           7        0.600000
   1002  C        35000.00           7        0.600000
   1003  A        30000.00           9        0.800000
   1002  D        25000.00          10        0.900000
   1003  C        20000.00          11        1.000000
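The Pct_Rank_Sales column follows the formula (rank - 1) / (number of rows - 1), which with 11 rows gives (rank - 1) / 10. The sketch below re-runs the query on the same eleven rows and returns both the built-in value and the hand-computed one. It assumes SQLite 3.25 or later via Python's sqlite3 module, not Teradata; the final ORDER BY is added here only to make the output order deterministic.

```python
import sqlite3

# (storeid, prodid, sales) rows taken from the result table above.
SALES = [
    (1001, "F", 150000.00), (1001, "A", 100000.00), (1003, "B", 65000.00),
    (1001, "C", 60000.00), (1003, "D", 50000.00), (1002, "A", 40000.00),
    (1001, "D", 35000.00), (1002, "C", 35000.00), (1003, "A", 30000.00),
    (1002, "D", 25000.00), (1003, "C", 20000.00),
]

def percent_rank_rows():
    """Return (storeid, prodid, rank, pct_rank, manual_pct_rank) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE salestbl (storeid INTEGER, prodid TEXT, sales REAL)")
    con.executemany("INSERT INTO salestbl VALUES (?, ?, ?)", SALES)
    rows = con.execute("""
        SELECT storeid, prodid,
               RANK()         OVER (ORDER BY sales DESC) AS rank_sales,
               PERCENT_RANK() OVER (ORDER BY sales DESC) AS pct_rank_sales
        FROM salestbl
        ORDER BY sales DESC, storeid
    """).fetchall()
    con.close()
    n = len(rows)
    # Append the hand-computed value (rank - 1) / (n - 1) for comparison.
    return [(s, p, rnk, pct, (rnk - 1) / (n - 1)) for s, p, rnk, pct in rows]

if __name__ == "__main__":
    for row in percent_rank_rows():
        print(row)
```

The two tied rows share rank 7 and therefore the same percent rank of 0.6; rank 8 is skipped, so the next value is 0.8.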
Business question #1
How much revenue did each market have in February,
what percent of total revenue?
OLAP Way
SELECT
   bs.bill_mkt_id
  ,SUM(bs.tot_amt) AS MktRev
  ,SUM(MktRev) OVER (ROWS BETWEEN
                     UNBOUNDED PRECEDING
                     AND UNBOUNDED FOLLOWING
                    ) AS CumRev
  ,MktRev / (CumRev / 1.0000) AS MktRevPerc
FROM bl_stmnt bs
GROUP BY 1
ORDER BY MktRev DESC
Market               MktRev            GrpRev             Perc of Total  Cum Perc
North Pole           $ 42,806,310.55   $ 231,945,644.49   18.5%          18.5%
South Pole           $ 14,017,714.23   $ 231,945,644.49    6.0%          24.5%
Fairyland            $ 12,427,672.94   $ 231,945,644.49    5.4%          29.9%
Tir-na-nog           $ 11,116,845.45   $ 231,945,644.49    4.8%          34.7%
Shan-gri-la          $ 10,770,807.94   $ 231,945,644.49    4.6%          39.3%
Land Far-Far Away    $ 10,726,915.78   $ 231,945,644.49    4.6%          43.9%
Isle of Misfit Toys  $  7,410,941.03   $ 231,945,644.49    3.2%          47.1%
Soder Island         $  7,224,536.83   $ 231,945,644.49    3.1%          50.2%
Birdwell Island      $  7,210,494.65   $ 231,945,644.49    3.1%          53.3%
Hazzard County       $  6,814,012.59   $ 231,945,644.49    2.9%          56.3%
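The percent-of-total and cumulative-percent columns can be sketched on a toy data set. This is an illustration under assumptions, not the Teradata query itself: it runs on SQLite 3.25 or later via Python's sqlite3 module, the four bl_stmnt rows are hypothetical, and because SQLite does not let a window function refer to the MktRev alias in the same SELECT (as the Teradata query does), the per-market sum is computed in a derived table first.

```python
import sqlite3

def market_revenue_share():
    """Return (market, mkt_rev, grp_rev, perc_of_total, cum_perc) rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bl_stmnt (bill_mkt_id TEXT, tot_amt REAL)")
    # Hypothetical amounts, used only for illustration.
    con.executemany("INSERT INTO bl_stmnt VALUES (?, ?)",
                    [("North", 50.0), ("North", 50.0),
                     ("South", 60.0), ("East", 40.0)])
    rows = con.execute("""
        SELECT bill_mkt_id,
               mkt_rev,
               -- total over all markets (group window)
               SUM(mkt_rev) OVER () AS grp_rev,
               mkt_rev / SUM(mkt_rev) OVER () AS perc_of_total,
               -- running total from the largest market down (cumulative window)
               SUM(mkt_rev) OVER (ORDER BY mkt_rev DESC
                                  ROWS BETWEEN UNBOUNDED PRECEDING
                                           AND CURRENT ROW)
                 / SUM(mkt_rev) OVER () AS cum_perc
        FROM (SELECT bill_mkt_id, SUM(tot_amt) AS mkt_rev
              FROM bl_stmnt
              GROUP BY bill_mkt_id) AS m
        ORDER BY mkt_rev DESC
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in market_revenue_share():
        print(row)
```

A single pass over the grouped rows yields both the total and the running total, which is what the cross join to a second aggregate query had to provide in the non-OLAP version.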
Business question #2
What are the top 5 and bottom 5 Called Countries by market in February?
SELECT
calld_cntry_nam
,SUM(toll_dur_min) AS TollMin
,RANK() OVER (ORDER BY TollMin ) AS Cntry_Rnk_Asc
,RANK() OVER (ORDER BY TollMin DESC ) AS Cntry_Rnk_Desc
FROM bl_dtl_usge
WHERE bl_cyc_dt BETWEEN '2006-02-01' AND '2006-02-28'
AND calld_cntry_nam <> ''
GROUP BY 1
ORDER BY 3 DESC
QUALIFY Cntry_Rnk_Asc <= 5 OR Cntry_Rnk_Desc <= 5
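QUALIFY filters on the result of a window function, much as HAVING filters on an aggregate. SQLite has no QUALIFY clause, so the sketch below emulates it by computing both ranks in a derived table and filtering in an outer WHERE. It assumes SQLite 3.25 or later via Python's sqlite3 module; the twelve country rows (C01 to C12) are hypothetical, and the date and empty-name filters of the original query are left out for brevity.

```python
import sqlite3

def top_and_bottom_countries(n=5):
    """Return (country, toll_min, rank_asc, rank_desc) for the top n and bottom n."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bl_dtl_usge (calld_cntry_nam TEXT, toll_dur_min INTEGER)")
    # Hypothetical data: C01 has 10 minutes, C02 has 20, ... C12 has 120.
    con.executemany("INSERT INTO bl_dtl_usge VALUES (?, ?)",
                    [("C%02d" % i, 10 * i) for i in range(1, 13)])
    rows = con.execute("""
        SELECT calld_cntry_nam, toll_min, cntry_rnk_asc, cntry_rnk_desc
        FROM (SELECT calld_cntry_nam,
                     toll_min,
                     RANK() OVER (ORDER BY toll_min)      AS cntry_rnk_asc,
                     RANK() OVER (ORDER BY toll_min DESC) AS cntry_rnk_desc
              FROM (SELECT calld_cntry_nam, SUM(toll_dur_min) AS toll_min
                    FROM bl_dtl_usge
                    GROUP BY calld_cntry_nam) AS g) AS r
        -- stands in for: QUALIFY Cntry_Rnk_Asc <= 5 OR Cntry_Rnk_Desc <= 5
        WHERE cntry_rnk_asc <= ? OR cntry_rnk_desc <= ?
        ORDER BY toll_min DESC
    """, (n, n)).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in top_and_bottom_countries():
        print(row)
```

Ranking the same measure in both directions lets one filter return the top and the bottom of the list together; the two middle countries (C06 and C07) are dropped.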
Business question #4 (2 of 2)
What is the average revenue and MOU for each decile of the consumer
base in February?
SELECT
DT.Acct_Decile
,DT.BillMonth
,COUNT(DT.id) AS Occur_Cnt
,SUM(DT.tot_amt) AS Dec_Chg_Amt
,AVG(DT.tot_amt) AS Avg_Chg_Amt
,MIN(DT.tot_amt) AS Min_Chg_Amt
,MAX(DT.tot_amt) AS Max_Chg_Amt
,SUM (Dec_Chg_Amt) OVER () AS Grp_Amt
,Dec_Chg_Amt / (Grp_Amt / 1.0000) AS Amt_Perc
FROM
(SELECT
bs.bill_id
,bs.id
,bs.bl_cyc_id
,bs.bl_cyc_dt
,bs.tot_amt
,ROW_NUMBER () OVER (ORDER BY bs.tot_amt) AS Acct_Rnk
,(( (Acct_Rnk - 1) * 10 ) / COUNT(*) OVER() ) + 1 AS Acct_Decile
FROM bl_stmnt bs
WHERE bs.bl_ind = 'Y'
AND bs.bl_cyc_dt BETWEEN '2006-02-01' AND '2006-02-28'
AND bs.bill_id = 9999
) DT
GROUP BY 1,2
ORDER BY 1,2;
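The decile expression ((Acct_Rnk - 1) * 10) / COUNT(*) + 1 relies on integer division to map row numbers 1..N onto deciles 1..10. A minimal sketch, assuming SQLite 3.25 or later via Python's sqlite3 module: the 20 statement rows with amounts 1 to 20 are hypothetical, and nested derived tables replace the alias reuse that Teradata allows within one SELECT.

```python
import sqlite3

def decile_summary():
    """Return (decile, occur_cnt, dec_chg_amt, avg_chg_amt) per decile."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE bl_stmnt (id INTEGER, tot_amt INTEGER)")
    # Hypothetical data: 20 accounts with amounts 1..20.
    con.executemany("INSERT INTO bl_stmnt VALUES (?, ?)",
                    [(i, i) for i in range(1, 21)])
    rows = con.execute("""
        SELECT acct_decile,
               COUNT(id)    AS occur_cnt,
               SUM(tot_amt) AS dec_chg_amt,
               AVG(tot_amt) AS avg_chg_amt
        FROM (SELECT id, tot_amt,
                     -- integer division: ranks 1-2 -> 1, 3-4 -> 2, ... 19-20 -> 10
                     ((acct_rnk - 1) * 10) / acct_cnt + 1 AS acct_decile
              FROM (SELECT id, tot_amt,
                           ROW_NUMBER() OVER (ORDER BY tot_amt) AS acct_rnk,
                           COUNT(*)     OVER ()                 AS acct_cnt
                    FROM bl_stmnt) AS r) AS dt
        GROUP BY acct_decile
        ORDER BY acct_decile
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in decile_summary():
        print(row)
```

ROW_NUMBER is used rather than RANK so that tied amounts still spread evenly across deciles.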
Business Question #5 (1 of 2)
Business question #5 (2 of 2)
Business question #6 (1 of 2)
What price plan is the final one for a given day when there are several
changes over the course of the day?
srv_accs_id client_prc_pln_eff_dt client_prc_pln_end_dt client_prc_pln_seq_nbr prd_id Row_Nbr
808090 02/12/2003 02/12/2003 1 538042 6
808090 02/12/2003 02/12/2003 2 546478 5
808090 02/12/2003 02/12/2003 3 509834 4
808090 02/12/2003 02/12/2003 4 547262 3
808090 02/12/2003 02/12/2003 5 538042 2
808090 02/12/2003 02/12/2003 6 538038 1
808096 26/02/2003 26/02/2003 1 269338 2
808096 26/02/2003 26/02/2003 2 270708 1
809186 01/05/2003 01/05/2003 1 478658 2
809186 01/05/2003 01/05/2003 2 269332 1
809186 05/12/2003 05/12/2003 1 478658 3
809186 05/12/2003 05/12/2003 2 486812 2
809186 05/12/2003 05/12/2003 3 546478 1
813436 11/11/2003 11/11/2003 1 547310 2
813436 11/11/2003 11/11/2003 2 547308 1
813502 17/02/2005 17/02/2005 1853936 836964 2
813502 17/02/2005 17/02/2005 2093328 836956 1
Business question #6 (2 of 2)
What price plan is the final one for a given day when there are several
changes over the course of the day?
SELECT
vpph.srv_accs_id
,vpph.client_prc_pln_eff_dt
,vpph.client_prc_pln_end_dt
,vpph.client_prc_pln_seq_nbr
,vpph.prd_id
,ROW_NUMBER () OVER (PARTITION BY vpph.srv_accs_id
,vpph.client_prc_pln_eff_dt
,vpph.client_prc_pln_end_dt
ORDER BY vpph.client_prc_pln_seq_nbr DESC
) AS Row_Nbr
ORDER BY 1,2,3,4,5
QUALIFY COUNT(vpph.srv_accs_id)
OVER (PARTITION BY vpph.srv_accs_id, vpph.client_prc_pln_eff_dt, vpph.client_prc_pln_end_dt) > 1
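Because Row_Nbr counts down from the highest sequence number, the final price plan of each day is the row with Row_Nbr = 1. The sketch below keeps only that row. It assumes SQLite 3.25 or later via Python's sqlite3 module; the table name prc_pln_hist is hypothetical (the source query does not show its FROM clause), the rows are a subset of the sample output with dates rewritten in ISO format, and the outer WHERE stands in for a QUALIFY Row_Nbr = 1 condition.

```python
import sqlite3

# (srv_accs_id, eff_dt, end_dt, seq_nbr, prd_id): subset of the sample rows.
PLAN_ROWS = [
    (808090, "2003-12-02", "2003-12-02", 1, 538042),
    (808090, "2003-12-02", "2003-12-02", 2, 546478),
    (808090, "2003-12-02", "2003-12-02", 3, 509834),
    (808090, "2003-12-02", "2003-12-02", 4, 547262),
    (808090, "2003-12-02", "2003-12-02", 5, 538042),
    (808090, "2003-12-02", "2003-12-02", 6, 538038),
    (808096, "2003-02-26", "2003-02-26", 1, 269338),
    (808096, "2003-02-26", "2003-02-26", 2, 270708),
    (809186, "2003-05-01", "2003-05-01", 1, 478658),
    (809186, "2003-05-01", "2003-05-01", 2, 269332),
    (809186, "2003-12-05", "2003-12-05", 1, 478658),
    (809186, "2003-12-05", "2003-12-05", 2, 486812),
    (809186, "2003-12-05", "2003-12-05", 3, 546478),
]

def final_price_plans():
    """Return (srv_accs_id, eff_dt, prd_id) of the last plan change per day."""
    con = sqlite3.connect(":memory:")
    # prc_pln_hist is a hypothetical table name.
    con.execute("""CREATE TABLE prc_pln_hist (
                       srv_accs_id INTEGER,
                       client_prc_pln_eff_dt TEXT,
                       client_prc_pln_end_dt TEXT,
                       client_prc_pln_seq_nbr INTEGER,
                       prd_id INTEGER)""")
    con.executemany("INSERT INTO prc_pln_hist VALUES (?, ?, ?, ?, ?)", PLAN_ROWS)
    rows = con.execute("""
        SELECT srv_accs_id, client_prc_pln_eff_dt, prd_id
        FROM (SELECT vpph.*,
                     ROW_NUMBER() OVER (PARTITION BY vpph.srv_accs_id,
                                                     vpph.client_prc_pln_eff_dt,
                                                     vpph.client_prc_pln_end_dt
                                        ORDER BY vpph.client_prc_pln_seq_nbr DESC
                                       ) AS row_nbr
              FROM prc_pln_hist vpph) AS numbered
        WHERE row_nbr = 1          -- highest sequence number = final plan of the day
        ORDER BY 1, 2
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in final_price_plans():
        print(row)
```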
Real Life – Assign phone_no Attributes (1 of 3)
• Assign Country Code and Area Code to the phone number using the ‘best match’
• RANK (window function), QUALIFY clause
phone_ID phone_no
100001 3723567890
100002 37250867890
100003 37255334234
100004 3725512543
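One plausible reading of ‘best match’ is the longest prefix that matches the start of the phone number. The sketch below ranks all matching prefixes per phone by length and keeps rank 1. Everything beyond the four phone rows is an assumption: the table names phone and prefix, the columns cntry_cd and area_cd, and the prefix values are hypothetical, and the code runs on SQLite 3.25 or later via Python's sqlite3 module, with an outer WHERE in place of Teradata's QUALIFY.

```python
import sqlite3

def assign_phone_attributes():
    """Return (phone_id, phone_no, cntry_cd, area_cd) using the longest matching prefix."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE phone (phone_id INTEGER, phone_no TEXT)")
    con.executemany("INSERT INTO phone VALUES (?, ?)",
                    [(100001, "3723567890"), (100002, "37250867890"),
                     (100003, "37255334234"), (100004, "3725512543")])
    # Hypothetical prefix table: country code plus an optional area code.
    con.execute("CREATE TABLE prefix (cntry_cd TEXT, area_cd TEXT)")
    con.executemany("INSERT INTO prefix VALUES (?, ?)",
                    [("372", ""), ("372", "5"), ("372", "50"), ("372", "55")])
    rows = con.execute("""
        SELECT phone_id, phone_no, cntry_cd, area_cd
        FROM (SELECT p.phone_id, p.phone_no, x.cntry_cd, x.area_cd,
                     -- longest matching prefix gets rank 1
                     RANK() OVER (PARTITION BY p.phone_id
                                  ORDER BY LENGTH(x.cntry_cd || x.area_cd) DESC
                                 ) AS match_rnk
              FROM phone p
              JOIN prefix x
                ON p.phone_no LIKE (x.cntry_cd || x.area_cd || '%')) AS m
        WHERE match_rnk = 1        -- stands in for QUALIFY match_rnk = 1
        ORDER BY phone_id
    """).fetchall()
    con.close()
    return rows

if __name__ == "__main__":
    for row in assign_phone_attributes():
        print(row)
```

The join deliberately produces several candidate prefixes per phone number; the window function then picks one per phone without a second pass over the data.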
Summary
• Functionality of Ordered Analytical Functions
• Supports a large subset of the SQL-99 Window Functions
• All combinations of (cumulative, moving, running) x
(sum, count, min, max, avg)
• Any physical row based window definition: preceding,
following, current row, unbounded
• ANSI Row_Number, ANSI Rank
• Benefit
• Exceed Application Limits, get better Performance
• Process costly OLAP Functions within Teradata
• ANSI standard makes SQL support easier for application
developers
• Tidy SQL
• fewer subselects
• replace self joins or multipass SQL
Remember, coding is one thing,
what you want to achieve is another!
Thanks and Questions
• Questions???
• I can be reached at [email protected]
• Thanks
• Patrick R. McHugh