Teradata SQL Performance Tuning Case Study Part II
Overview
Case 9: Derived Table versus Volatile Table
Case 10: Pre-aggregation First
Case 11: Cross left outer join skew on NULL
Case 12: Join column skew on default value
Case 13: QUALIFY & ROW_NUMBER() Function
Case 14: Avoid spool PI skew
Case 15: Pre-aggregate then join back with the duplicated dataset
APPENDIX 1: IMD Teradata Performance wiki
APPENDIX 2: IMD Support Team Page
APPENDIX 3: Key Performance Metrics 1 - Skew
APPENDIX 4: Key Performance Metrics 2 - CPU/IO Efficiency Ratio
Case 9: Derived Table versus Volatile Table

The Teradata Optimizer cannot look inside a derived table and has no confidence in how many rows it will return. On a volatile table, however, Teradata automatically collects sampled statistics. If the result set is small, the Optimizer can then choose a better plan (duplicating the small table to all AMPs) than it would for the derived table (a merge join).
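A minimal sketch of the pattern, with hypothetical table and column names (only the technique comes from this case): materialize the derived table as a volatile table so the Optimizer gets sampled statistics and can duplicate the small row set to all AMPs.

    -- Hypothetical rewrite: the derived table
    --   JOIN (SELECT item_id FROM small_source WHERE flag = 'Y') dt
    -- becomes a volatile table with automatic sampled statistics.
    CREATE VOLATILE TABLE small_items AS (
        SELECT item_id
        FROM small_source
        WHERE flag = 'Y'
    ) WITH DATA
    PRIMARY INDEX ( item_id )
    ON COMMIT PRESERVE ROWS;

    SELECT b.item_id
         , b.price
    FROM big_table b
    JOIN small_items s
        ON b.item_id = s.item_id;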
[Chart: myEffectiveCPU, mySkewOverhead, and myParallelEfficiency, 2006-12-04 through 2006-12-20]
Example: o_srch.ods_item_aisle_clssfctn_w.del_ins.sql
select
    a.item_id
    , syslib.udf_utf8to16( prdct_aspct_nm )
    , syslib.udf_utf8to16( aspct_vlu_nm )
    , a.last_clsfn_date
from ods_batch_views.stg_item_aspct_clssfctn_w a, ......
[Chart: mySkewOverhead and myParallelEfficiency, mid-to-late March 2007]
Case 10: Pre-aggregation First

If there are duplicate records in spool that need to join with a lookup table, pre-aggregating first to compress the data set is more effective. Because the join then runs on aggregated rows rather than detail rows, it should also produce less skew. The example uses pre-aggregation to resolve a skew issue and downsize the spool when joining with DW_EXCHANGE_RATE.
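A hedged sketch of the idea; DW_EXCHANGE_RATE is the lookup table named in this case, while the detail table and all column names are assumptions:

    -- Instead of joining every detail row to DW_EXCHANGE_RATE (many
    -- duplicate join keys, a large skewed spool), compress the detail
    -- rows to one per (currency, date) first, then join the small spool.
    SELECT d.curncy_id
         , d.txn_date
         , d.gross_amt * r.exchng_rate AS gross_amt_usd
    FROM (
        SELECT curncy_id
             , txn_date
             , SUM(gross_amt) AS gross_amt   -- pre-aggregation step
        FROM sales_detail                    -- hypothetical detail table
        GROUP BY 1, 2
    ) d
    JOIN DW_EXCHANGE_RATE r
        ON d.curncy_id = r.curncy_id
        AND d.txn_date = r.rate_date;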
Case 11: Cross left outer join skew on NULL

When one query contains multiple left outer joins, the joins run as several steps. Depending on the earlier result sets, a later join column can be skewed on NULL or on some other particular value. Change that value to one that redistributes evenly, such as a value derived from ITEM_ID: this gives a balanced distribution while the substituted value still cannot match the join condition.
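A minimal sketch of the substitution (all names hypothetical): NULL join keys all hash to the same AMP, so replace them with values derived from the evenly distributed ITEM_ID, negated so they can never match a real key.

    SELECT i.item_id
         , o.order_status
    FROM item i
    LEFT OUTER JOIN orders o
        -- -i.item_id spreads the former NULLs across all AMPs, and a
        -- negative surrogate can never equal a real (positive) order_id,
        -- so the outer join still returns NULLs for those rows.
        ON COALESCE(i.order_id, -i.item_id) = o.order_id;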
[Chart: myTotalCPUTime and myParallelEfficiency, 2006-12-26 through 2007-01-14]
Before change:
......
    MIP_TRANS.ORDER_ID = MIP_ORDER.ORDER_ID
    AND MIP_ORDER.ORDER_STATUS = 3
GROUP BY 1,2,3,5,6,7,8,9,10,11,12,13;
Rewrite SQL
() ITEM
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER_TRANS_MAP MIP_TRANS
    ON ITEM.ITEM_ID = MIP_TRANS.ITEM_ID
    AND MIP_TRANS.ORDER_ID IS NOT NULL
LEFT OUTER JOIN ${readDB}.DW_MIP_ORDER MIP_ORDER
Case 12: Join column skew on default value

When joining on a nullable column, the default value (-999, NULL, etc.) may be the root cause of bad parallel efficiency. There are two ways to avoid joining on the default value:
1. Split the query into two queries, with and without the skewed data (see the sketch below).
2. Filter the skewed data in the join condition (needs confirmation with the SAE).
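A sketch of the first approach, with hypothetical table and column names, and -999 assumed as the skewed default:

    -- Rows with a real join key: redistribute and join normally.
    SELECT t.txn_id
         , d.dim_name
    FROM sales_txn t
    JOIN dim_lookup d
        ON t.dim_id = d.dim_id
    WHERE t.dim_id <> -999

    UNION ALL

    -- Skewed default rows: skip the join entirely, so -999 is never
    -- redistributed onto a single AMP.
    SELECT t.txn_id
         , NULL AS dim_name
    FROM sales_txn t
    WHERE t.dim_id = -999;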
Example: dw_myebay.dw_myebay_sav_search.ups.sql
Before change:
UPDATE DW_SAV_SEARCH
FROM ${gdwDB}.DW_MYEBAY_SAV_SEARCH DW_SAV_SEARCH
   , DW_MYEBAY_SAV_SEARCH_V SAV_SEARCH_V
SET DELTD_YN_ID = 1
  , DELTD_DT = DATE
  , UPD_DATE = CURRENT_TIMESTAMP(0)
  , UPD_USER = 'DW_BATCH'
WHERE DW_SAV_SEARCH.SAV_SRCH_ID = SAV_SEARCH_V.ID;
Rewrite SQL
UPDATE DW_SAV_SEARCH
FROM ${gdwDB}.DW_MYEBAY_SAV_SEARCH DW_SAV_SEARCH
   , DW_MYEBAY_SAV_SEARCH_V SAV_SEARCH_V
SET DELTD_YN_ID = 1
  , DELTD_DT = DATE
  , UPD_DATE = CURRENT_TIMESTAMP(0)
  , UPD_USER = 'DW_BATCH'
WHERE DW_SAV_SEARCH.SAV_SRCH_ID = SAV_SEARCH_V.ID
AND ......
Case 13: QUALIFY & ROW_NUMBER() Function

When joining with a derived table whose aggregation exists only to filter out unwanted data, we can use QUALIFY with ROW_NUMBER() to remove the derived table. QUALIFY selects exactly the rows we need and is often used with the ordered analytic functions ROW_NUMBER(), RANK(), and SUM(1). These functions generally perform better than aggregate functions in a GROUP BY clause.
Before change:
left join (
    select to_email_address, max(to_email_addr_id)
    from batch_views.dw_se_to_email_addr
    group by 1
) c ( to_email_address, to_email_addr_id )
    ON a.to_email_address = c.to_email_address
group by 1,2,3,4,5,6,7,8,9
) sebw;
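The derived table above can usually be written without the aggregation; a minimal sketch of the QUALIFY form of the same deduplication, assuming the goal is simply the highest to_email_addr_id per address (how it folds into the surrounding query depends on the parts elided here):

    SELECT to_email_address
         , to_email_addr_id
    FROM batch_views.dw_se_to_email_addr
    -- Keep one row per address: the one with the largest id.
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY to_email_address
        ORDER BY to_email_addr_id DESC ) = 1;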
Case 14: Avoid spool PI skew

Spool is an intermediate table and, like a normal table, it has a primary index (PI). Skew on a spool's PI can sometimes cause a performance issue. To avoid spool PI skew, rewrite the query to cut out the join that creates the skewed spool.
Before change:
FROM fndng_working.STG_FNDNG_TD_RESULT_SET_W STG
JOIN fndng_tables.DW_FNDNG_RULE_SET DW
    ON DW.RULE_ID_STRING_TXT = STG.RULE_ID_STRING_TXT
    AND DW.LABEL_ID = STG.LABEL_ID
JOIN fndng_tables.DW_FNDNG_RULE_MAP DW_MAP
    ON DW.RULE_SET_KEY = DW_MAP.RULE_SET_KEY
JOIN fndng_tables.DW_FNDNG_TD_RULE DW_RULE
    ON DW_RULE.RULE_ID = DW_MAP.RULE_ID
    AND DW_RULE.LABEL_ID = DW_MAP.LABEL_ID
JOIN batch_views.DW_KWDM_CNSTRNT_VAL_CFG KW
    ON KW.ITEM_SITE_ID = DW_RULE.SITE_ID
GROUP BY 1,2,3;
[Diagram: explain-plan spool tree for the query above (spools S1, S3, S7, S8, S9, S11); STG contributes 1,418,765 rows; the joins redistribute on (LABEL_ID, RULE_ID_STRING_TXT), RULE_SET_KEY, and (RULE_ID, LABEL_ID), and one spool is marked SKEW]
Rewrite SQL

[Chart: myEffectiveCPU, mySkewOverhead, and myParallelEfficiency, 2007-07-24 through 2007-07-27]

FROM fndng_working.STG_FNDNG_TD_RESULT_SET_W STG
JOIN (
    select DW.RULE_SET_KEY
         , DW.RULE_ID_STRING_TXT
         , DW.LABEL_ID
         , MAX(DW_RULE.SITE_ID) SITE_ID
         , MAX(KW.BID_VAL) BID_VAL
         , MAX(KW.BIN_VAL) BIN_VAL
    from fndng_tables.DW_FNDNG_RULE_SET DW
    JOIN fndng_tables.DW_FNDNG_RULE_MAP DW_MAP
    ......
    group by 1,2,3
) tmp
    ON tmp.RULE_ID_STRING_TXT = STG.RULE_ID_STRING_TXT
    AND tmp.LABEL_ID = STG.LABEL_ID
GROUP BY 1,2,3;
Case 15: Pre-aggregate then join back with the duplicated dataset

When a large dataset in spool needs to join with a large lookup table but has only a few distinct values over the join column, it is more effective to pre-aggregate first to compress the dataset, join the compressed set with the lookup table, and then join the result back with the original (duplicated) dataset.
Rewrite SQL
First, pre-aggregate the dataset into a volatile table:
[Chart: myEffectiveCPU, mySkewOverhead, and myParallelEfficiency]
CREATE VOLATILE TABLE pre_distinct_v AS (
    SELECT SMPL_LKP.USER_SMPL_ID
         , PR_MTRC.TEST_VARIANT
    FROM support_scratch.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
    JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
        ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
        AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
    GROUP BY 1,2
) WITH DATA
UNIQUE PRIMARY INDEX ( USER_SMPL_ID, TEST_VARIANT )
ON COMMIT PRESERVE ROWS;
Then join the compressed volatile table with the large lookup table, and join the result back with the original dataset:

FROM DDM_UM_W.STG_UM_USER_SMPL_PR_MTRC_P1_W PR_MTRC
JOIN DDM_UM_T.DW_UM_USER_SMPL_LKP SMPL_LKP
    ON PR_MTRC.TEST_GRP = SMPL_LKP.USER_SMPL_CMPGN_ID
    AND PR_MTRC.SOJ_SITE_ID = SMPL_LKP.SITE_ID
JOIN (
    -- join the compressed volatile table with the large lookup table
    select SMPL_VRNT_PRE.USER_SMPL_PRNT_ID
         , SMPL_VRNT_PRE.USER_SMPL_VRNT
         , USER_SMPL_VRNT_ID
    From pre_distinct_v
    JOIN DDM_UM_T.DW_UM_USER_SMPL_VRNT SMPL_VRNT_PRE
        ON SMPL_VRNT_PRE.USER_SMPL_PRNT_ID = pre_distinct_v.USER_SMPL_ID
        AND SMPL_VRNT_PRE.USER_SMPL_VRNT = pre_distinct_v.TEST_VARIANT
) SMPL_VRNT
    ON SMPL_VRNT.USER_SMPL_PRNT_ID = SMPL_LKP.USER_SMPL_ID
    AND SMPL_VRNT.USER_SMPL_VRNT = PR_MTRC.TEST_VARIANT
OLAP functions in the current version (V2R6) have performance issues because they use large spool. In some cases (not all), we can avoid OLAP functions and choose alternative methods. For de-duplication, try the MAX() aggregate function instead of an OLAP function (ROW_NUMBER() or SUM() with a QUALIFY clause).
Before change: o_odw_itm.itm_dmx_cpu_w.ins.sql
Insert Into odw_itm_w.ITM_DMX_CPU_W ( ...... )
Select ......
From odw_itm_w.ITM_DENORM_W w
Where w.RESOURCE_MODEL = 'DMXCpu'
qualify sum(1) over (
    partition by W.SERVER_NAME, W.PROFILE_NAME, W.RESOURCE_MODEL, W.CONTEXT_TYPE,
                 W.INSTANCE_NAME, W.ITM_TYPE, W.TRANS_TS
    order by W.TRANS_TS desc
    ROWS UNBOUNDED PRECEDING ) = 1;
Rewrite SQL
[Chart: myEffectiveCPU, mySkewOverhead, and myParallelEfficiency, 2007-09-23 through 2007-09-24]

Insert Into odw_itm_w.ITM_DMX_CPU_W ( ...... )
Select ......
From odw_itm_w.ITM_DENORM_W w
Join
(
    Select
        SERVER_NAME
        , PROFILE_NAME
        , RESOURCE_MODEL
        , CONTEXT_TYPE
        , INSTANCE_NAME
        , ITM_TYPE
        , TRANS_TS
        , MAX(COALESCE(FILE_TIME,0)||COALESCE(DATA_VALUE1,0)||COALESCE(DATA_VALUE2,0)||COALESCE(DATA_VALUE3,0)||COALESCE(DATA_VALUE4,0)||COALESCE(DATA_VALUE5,0)||COALESCE(DATA_VALUE6,0)||COALESCE(DATA_VALUE7,0)||COALESCE(DATA_VALUE8,0)||COALESCE(INSTANCE1,0)) AS MAX_VALUE
    From odw_itm_w.ITM_DENORM_W
    Where RESOURCE_MODEL = 'DMXCpu'
    Group by 1,2,3,4,5,6,7
) w1
    on  w.SERVER_NAME = w1.SERVER_NAME
    and w.PROFILE_NAME = w1.PROFILE_NAME
    and w.RESOURCE_MODEL = w1.RESOURCE_MODEL
    and w.CONTEXT_TYPE = w1.CONTEXT_TYPE
    and w.INSTANCE_NAME = w1.INSTANCE_NAME
    and w.ITM_TYPE = w1.ITM_TYPE
    and w.TRANS_TS = w1.TRANS_TS
    and (COALESCE(FILE_TIME,0)||COALESCE(DATA_VALUE1,0)||COALESCE(DATA_VALUE2,0)||COALESCE(DATA_VALUE3,0)||COALESCE(DATA_VALUE4,0)||COALESCE(DATA_VALUE5,0)||COALESCE(DATA_VALUE6,0)||COALESCE(DATA_VALUE7,0)||COALESCE(DATA_VALUE8,0)||COALESCE(INSTANCE1,0)) = w1.MAX_VALUE
Where w.RESOURCE_MODEL = 'DMXCpu';
APPENDIX 1: IMD Teradata Performance wiki

This wiki is maintained by the IMD performance analysts. It contains tips and techniques to improve your query performance and to learn about Teradata and the MPP (Massively Parallel Processing) capabilities of the system. Its contents cover CSUM, skew facts, the Teradata architecture, and developer guidelines. It also includes the TD Performance Quick Tips.
http://portal.corp.ebay.com/wiki/tikiindex.php?page=IMD+Teradata+Performance
http://portal.corp.ebay.com/wiki/tiki-index.php?page=TD Performance Quick Tips
APPENDIX 2: IMD Support Team Page

This Teamworks page is community-owned by the IMD support team. It contains most of the hot support issues, with explanations of various aspects such as performance tuning. It also hosts a BBS where you can leave your messages and responses.
http://teamworks/sites/10320/default.aspx
APPENDIX 3: Key Performance Metrics 1 - Skew

Skew: uneven resource consumption across units of parallelism.
Example (%Max/Avg): Skew = (max - avg) / max * 100. With Avg = 1.8 and Max = 5: Skew = (5 - 1.8) / 5 * 100 = 64%.
APPENDIX 4: Key Performance Metrics 2 - CPU/IO Efficiency Ratio

Teradata has high IO capacity, so Teradata performance monitoring focuses on CPU more than IO.
Parallel efficiency is a calculation that determines how much a given query impacts the system overall.
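The DBQL query below computes these metrics, but its SELECT head did not survive in this copy. A hedged reconstruction of the head, consistent with the GROUP BY 1,2,3,12 and the aliases that follow (the three leading columns and the myEffectiveCPU formula are assumptions; MaxAmpCPUTime is the standard DBQL column, and vproc_count is assumed to be the AMP count carried in dw_vproc_hist):

    SELECT q.AcctStringDate
         , q.UserName
         , q.AcctString
         -- Effective CPU: cost as if every AMP were as busy as the
         -- hottest AMP (assumed formulation).
         , SUM(q.MaxAmpCPUTime * v.vproc_count) AS myEffectiveCPU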
         , SUM(TotalCPUTime) AS myTotalCPUTime
         , myEffectiveCPU - myTotalCPUTime AS mySkewOverhead
         , myTotalCPUTime / (myEffectiveCPU + 1) * 100 AS myParallelEfficiency
         , COUNT(*) AS myNumberOfExecutions
         , AVG(actual_mins) AS AvgActualRunTimeMins
         , SUM(TotalIOCount) AS myTotalIOCount
         , (myTotalCPUTime / myTotalIOCount * 1000) AS myCPUIORate
         , querytext
    FROM dw_monitor_views.QryLog_dba q
    INNER JOIN dw_monitor_views.dw_vproc_hist v
        ON v.thedate = q.acctstringdate
    WHERE AcctStringDate >= DATE - 20
    GROUP BY 1,2,3,12
Document the improvement approaches and help SAEs understand the proposal.
Checklist for Performance Tuning

Data Collection
1. DBQL provides the primary tool for isolating poorly performing queries
2.

Identify Poorly Performing Queries
1.
2.
3.
4.

Collect statistics
1.

Explain Plan Diagnostic (verbose explain on for session; V2R5.1+)
1. Redistributions of large tables and spools (on fields)
2. Product joins on large numbers of rows
3. Aggregates early in a plan on large numbers of rows
4. Poor confidence
5. Large estimates (hours or days)
6. Large numbers of rows in a step (billions)
7. Index check: see if proper Primary Indexes (PI) and Secondary Indexes (SI) are being used
8. Check for column data type mismatch (Translate)
9. PPI filter enabled

Indices (Primary, Secondary, Join, etc.)
1. Identify physical model changes (indexes, primary indexes)

Rewrites
1. Quantify impact at each step
2. Generate the Explain plan and look for improvement
3. Run the query and monitor spool and CPU usage
4. Explain the corrections to help the SAE understand the proposal
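Verbose explain output, referenced in the checklist, is switched on per session with a standard Teradata diagnostic statement; a small usage sketch (the query being explained is arbitrary):

    DIAGNOSTIC VERBOSEEXPLAIN ON FOR SESSION;

    EXPLAIN
    SELECT item_id
    FROM ods_batch_views.stg_item_aspct_clssfctn_w;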
Possibility: how possibly one thing can be done, denoted P, with P in [0..1]. P = 1 means possible and P = 0 means impossible.
Satisfaction: how satisfied people are with one thing, denoted Q, with Q in [0..1]. Q = 1 means satisfied and Q = 0 means unsatisfied.
Possibility-Satisfaction Score: whether an approach can possibly be done with satisfaction; it is the combination of P and Q.
Rating levels: Top Important, Very Important, Important, Consider, Can ignore

Metric                  Score   Weight
Effective CPU -%        104     39%
Parallel Efficiency +   96      36%
                        48      18%
Code change +           19      7%
Total                   267     100%
                        A Value   B Value
Effective CPU -%        5%   50%   10%   5%   50%   50%
Parallel Efficiency +   50   11.72   50   5%   50%   43%   40%   5%   45%
Code change +           10
Metric                  A       B       C       D
Effective CPU -%        0.11    0.00    1.00    1.00
Parallel Efficiency +   0.23    0.10    0.00    1.00
                        0.84    0.78    0.00    0.89
Code change +           0.67    0.89    0.56    1.00
Total                   0.33    0.24    0.43    0.98
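The Total row is the weighted sum of each column's normalized scores, using the Weight column from the scoring table above (the A-D column labels are assumed from the "A Value / B Value" header of the raw-value table). For example, for column B: 0.39 * 0.00 + 0.36 * 0.10 + 0.18 * 0.78 + 0.07 * 0.89 = 0.24 (rounded).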