Informix Performance Optimization
Table of Contents
Informix Performance Optimization
Overview:
Steps for Optimizing
Optimization Goal: Increase Performance
Setting up a Test Environment
Optimizing the Query: Understand the Requirements
Optimizing the Query: Examine the Schema
Optimizing the Query: Examine the Data
Optimizing the Query: Run, Examine and Modify
Set Explain Output
Set Explain: Example 1
Set Explain: Example 2
Set Explain: Example 3
Set Explain: Example 4
Set Explain: Example 5
Set Explain: Example 6
Set Explain: Example 7
Set Explain: Example 8
Set Explain: Example 8 cont.
Indexing Strategies
Indexing Strategies: B+ Trees
Indexing Strategies: Types of Indexes
Indexing Strategies: Leading Portion of an Index
Indexing Strategies: Guidelines
Indexing Strategies: Benefits vs. Cost
by Kevin Fennimore
Overview:
Discuss steps for optimizing
Discuss the output of the Set Explain command
Discuss Indexing Strategies
New SQL in OnLine Dynamic Server (IDS 7.3)
Table Scans & Table Joins
Optimizer Directives (IDS 7.3)
Discuss optimization techniques and examples
XTREE command
Correlated Sub-Queries (IDS 7.3)
Reduce I/O
o reduce I/O performed by the engine
o reduce I/O between the back-end and the front-end
Identify Problem Queries
Simplify Queries
Test on a machine with minimal system activity
What is the object of the query?
What is the information required?
What is the order criteria?
Identify the data types and indexes on the columns being:
o o o o
Consider the number of rows examined vs. the number of rows returned
Determine the distribution of filter columns
Look at the relationship of joined tables:
o o o
Examine the Set Explain output
Modify the query and/or schema (use directives to test various paths)
Run the query again
Estimated # of Rows Returned: 15
Temporary Files Required For: Order By
1) informix.stock: SEQUENTIAL SCAN

Estimated Cost: 9
Estimated # of Rows Returned: 22
1) informix.stock: SEQUENTIAL SCAN
2) informix.items: INDEX PATH
    Filters: informix.items.quantity > 1
    (1) Index Keys: stock_num manu_code
        Lower Index Filter: informix.items.stock_num = informix.stock.stock_num
WHERE a = 1 ORDER BY a, b, c
Index is not used for:
SELECT * FROM XYZ WHERE b = 2 AND c = 3
SELECT * FROM XYZ WHERE b = 2
SELECT * FROM XYZ WHERE c = 3 ORDER BY b, c
Columns used in joining tables
Columns used as filters
Columns used in ORDER BYs and GROUP BYs
Avoid highly duplicate columns
Keep key size small
Limit indexes on highly volatile tables
Use the FILL FACTOR option (version 7)
Cost
o Maintenance of indexes on Inserts, Updates & Deletes
o Extra Disk Space
New SQL for Informix Dynamic Server 7.3
New SQL in IDS 7.3
order by any_column;
This causes the first 10 rows from the result set to be returned.
order by name
This could also be: CASE WHEN address2 is NULL then else address2
QUERY:
select * from stock where stock_num>=99 and stock_num<=190
Estimated Cost: 1
Estimated # of Rows Returned: 1
1) informix.stock: INDEX PATH
    (1) Index Keys: stock_num manu_code
        Lower Index Filter: informix.stock.stock_num >= 99
        Upper Index Filter: informix.stock.stock_num <= 190
Index Scans: Upper and Lower Index Filters
Create indexes on columns that are the most selective. For example:
SELECT * FROM CUSTOMER
WHERE ACCOUNT BETWEEN 100 AND 1000
AND STATUS = "A"
AND STATE = "MD"
Estimated Cost: 4
Estimated # of Rows Returned: 1
1) informix.mytable: INDEX PATH
    Filters: informix.mytable.d = 'Y'
    (1) Index Keys: a b c d (Key-First) (Serial, fragments: ALL)
        Lower Index Filter: (informix.mytable.a = 1 AND informix.mytable.b = 1 )
Read from A then find matching rows in B
Read from B then find matching rows in A
Table A - 1,000 rows
Table B - 50,000 rows
scan into A (3 reads)
Total reads: 200,000 (50,000 for B + 50,000*3 for A)
This is a difference of 195,000 reads!
Table A - 1,000 rows
Table B - 50,000 rows
Select * from A, B
where A.join_col = B.join_col
and B.filter_col = 1
Assume 10 rows in B meet this condition
scan into A (3 reads)
Total reads: 43 (13 for B + 10*3 for A)
Total Rows Returned: 10
General Rule: The table which returns the fewest rows, either through a filter or the row count, should be first.
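The read totals quoted for the B-first join order can be reproduced with a small calculation. This is a minimal sketch of the slide arithmetic only; the function name and parameters are illustrative and are not part of Informix.

```python
def nested_loop_reads(outer_reads, outer_rows, reads_per_probe):
    """Reads to fetch the outer rows plus one index probe into the inner table per outer row."""
    return outer_reads + outer_rows * reads_per_probe


# Unfiltered join, B first: read all 50,000 rows of B, 3 reads per probe into A.
b_first_unfiltered = nested_loop_reads(50_000, 50_000, 3)

# Filtered join, B first: 13 reads find the 10 qualifying rows of B, 3 reads per probe into A.
b_first_filtered = nested_loop_reads(13, 10, 3)

print(b_first_unfiltered)  # 200000
print(b_first_filtered)    # 43
```

The filter on B shrinks the number of probes from 50,000 to 10, which is why the table returning the fewest rows is the better choice for the outer table.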
Optimizer Directives
Changes the generated query plan by removing paths from consideration
Similar to Oracle's hints
Better than hints
o o o o
A then B
    Seq A, Seq B    Cost: 100
    Seq A, Idx B    Cost: 50
    Idx A, Idx B    Cost: 20
    etc.
B then A
    Seq B, Seq A    Cost: 100
    Seq B, Idx A    Cost: 50
    Idx B, Idx A    Cost: 10
    etc.
Select --+ ORDERED
* from A, B where A.join_col = B.join_col
With the ORDERED directive, the optimizer only considers paths that read from A then B. The lowest cost is then chosen from those paths.
Normally, the lowest-cost path overall would be chosen (Idx B, Idx A, Cost: 10).
With the directive, the lowest-cost A-then-B path would be chosen (Idx A, Idx B, Cost: 20).
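The effect of ORDERED on path selection can be modeled as filtering the candidate list before taking the minimum cost. This is a toy sketch using the costs from the slide; `PATHS` and `choose_path` are hypothetical names, and the real optimizer is far more involved.

```python
# Candidate paths as listed on the slide: (join order, access methods, cost).
PATHS = [
    ("A then B", "Seq A, Seq B", 100),
    ("A then B", "Seq A, Idx B", 50),
    ("A then B", "Idx A, Idx B", 20),
    ("B then A", "Seq B, Seq A", 100),
    ("B then A", "Seq B, Idx A", 50),
    ("B then A", "Idx B, Idx A", 10),
]


def choose_path(paths, ordered=False, from_order="A then B"):
    """Pick the cheapest path; with ordered=True only the FROM-clause join order is considered."""
    candidates = [p for p in paths if not ordered or p[0] == from_order]
    return min(candidates, key=lambda p: p[2])


normal = choose_path(PATHS)
with_directive = choose_path(PATHS, ordered=True)

print(normal)          # ('B then A', 'Idx B, Idx A', 10)
print(with_directive)  # ('A then B', 'Idx A, Idx B', 20)
```

The directive does not change any cost; it only removes the B-then-A paths from consideration.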
SELECT --+ directive text
SELECT {+ directive text }
UPDATE --+ directive text
UPDATE {+ directive text }
DELETE --+ directive text
DELETE {+ directive text }
C-style comments are also valid as in: SELECT /*+directive*/
Types of Directives
index - forces use of a subset of specified indexes
index_one - forces use of one of the specified indexes
index_all - forces use of all of the specified indexes
avoid_index - avoids use of specified indexes
full - forces sequential scan of specified table
avoid_full - avoids sequential scan of specified table

Optimization Goal
first_rows (N) - tells the optimizer to choose a plan optimized to return the first N rows of the result set
all_rows - tells the optimizer to choose a plan optimized to return all rows

use_nl - forces nested loop join on specified tables
use_merge - forces sort merge join on specified tables
use_hash - forces hash join on specified tables
avoid_nl - avoids nested loop join on specified tables
avoid_merge - avoids sort merge join on specified tables
avoid_hash - avoids hash join on specified tables
DIRECTIVES NOT FOLLOWED:
1) customer: SEQUENTIAL SCAN
2) orders: INDEX PATH
    (1) Index Keys: customer_num
        Lower Index Filter: orders.customer_num = customer.customer_num
NESTED LOOP JOIN
3) items: INDEX PATH
    Filters: items.order_num = orders.order_num
    (1) Index Keys: stock_num manu_code
        Lower Index Filter: (items.stock_num = 6 AND items.manu_code = 'SMT' )
NESTED LOOP JOIN

customer.lname, orders.order_num, items.total_price
from customer c, orders o, items i
where c.customer_num = o.customer_num
and o.order_num = i.order_num
and stock_num = 6
and manu_code = "SMT"

    Filters: i.order_num = o.order_num
    (1) Index Keys: stock_num manu_code
        Lower Index Filter: (i.stock_num = 6 AND i.manu_code = 'SMT' )
NESTED LOOP JOIN
Use Composite Indexes
Use Index Filters
Create indexes for Key-Only scans
Perform indexed reads for sorting
Use temporary tables
Simplify queries by using Unions
Avoid sequential scans of large tables
Drop and recreate indexes for large modifications
Avoid Correlated Sub-queries (pre IDS 7.3)
Select needed columns vs. Select *
Use OUTER JOINS
Composite indexes are ones built on more than one column
The optimizer uses the leading portions of a composite index for filters, join conditions and sorts
A composite index on columns a, b and c will be used for selects involving:
o o o
It will not be used for selects involving only columns b and/or c, since those columns are not at the beginning of the index (i.e. the leading portion).
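The leading-portion rule can be expressed as a prefix check. This is a simplified sketch, assuming equality filters only and ignoring cost; `usable_prefix` is a hypothetical helper, not an Informix function.

```python
def usable_prefix(index_cols, filter_cols):
    """Return the leading index columns covered by the query's filter columns."""
    prefix = []
    for col in index_cols:
        if col not in filter_cols:
            break  # the first gap ends the usable leading portion
        prefix.append(col)
    return prefix


index_abc = ("a", "b", "c")

print(usable_prefix(index_abc, {"a"}))            # ['a']
print(usable_prefix(index_abc, {"a", "b"}))       # ['a', 'b']
print(usable_prefix(index_abc, {"a", "b", "c"}))  # ['a', 'b', 'c']
print(usable_prefix(index_abc, {"b", "c"}))       # []
```

An empty result corresponds to the case where the index is not used: the filters on b and c cannot be applied without a filter on the leading column a.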
AND STATE = "MD"
Which column is the most selective? Account, status or state?
Optimization Techniques: Use Index Filters
If we can change the query to include an upper bound on begin_idx as follows:
SELECT * FROM xyz WHERE begin_idx >= 99
Data for the select list is read from the index key -- No read of the data page is needed
Useful for inner tables of nested-loop joins
Useful for creating a sub-table for very wide tables
Indexed reads cause rows to be read in the order of the indexed columns
Higher priority is given to indexes on columns used as filters
Reasons why an index will not be used to perform a sort:
o Columns in the sort criteria are not in the index
o Columns in the sort criteria are in a different order than the index
o Columns in the sort criteria are from different tables
With IDS v7.3, an optimizer directive could be used. Assuming the index name on inv_sum was i_inv_sum1, the select would be:
The ORDERED directive might have done the same thing.
Useful for batch reporting
Avoid selecting a subset of data repetitively from a larger table
Create summary information that can be joined to other tables
Disadvantage: the data in the temporary table is a copy of the real data and therefore does not change when the original data is modified.
create index i1 on tmp1( sku )
select tot_qty from tmp1 where sku = ?

select sum(b.sz_qty)
from ctn a, ctn_detail b
where a.carton_stat = "Q"
and a.ctn_id = b.ctn_id
and b.sku = ?
The ctn table contains 300,000 records and very few records have a status of Q.
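The temporary-table technique, including its staleness disadvantage, can be tried end to end with Python's built-in sqlite3 module. This is a sketch under loud assumptions: SQLite stands in for Informix, the sample rows are invented, and the statement that builds tmp1 is a hypothetical reconstruction because the deck does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ctn (ctn_id INTEGER PRIMARY KEY, carton_stat TEXT);
    CREATE TABLE ctn_detail (ctn_id INTEGER, sku TEXT, sz_qty INTEGER);
    INSERT INTO ctn VALUES (1, 'Q'), (2, 'S'), (3, 'Q');
    INSERT INTO ctn_detail VALUES (1, 'AAA', 5), (1, 'BBB', 2), (2, 'AAA', 7), (3, 'AAA', 4);

    -- Hypothetical definition of tmp1: one summary row per sku for status 'Q'.
    CREATE TEMP TABLE tmp1 AS
        SELECT b.sku AS sku, SUM(b.sz_qty) AS tot_qty
        FROM ctn a, ctn_detail b
        WHERE a.carton_stat = 'Q' AND a.ctn_id = b.ctn_id
        GROUP BY b.sku;
    CREATE INDEX i1 ON tmp1 (sku);
""")


def qty_from_join(sku):
    """Original query: join ctn to ctn_detail on every lookup."""
    row = conn.execute(
        "SELECT SUM(b.sz_qty) FROM ctn a, ctn_detail b "
        "WHERE a.carton_stat = 'Q' AND a.ctn_id = b.ctn_id AND b.sku = ?",
        (sku,),
    ).fetchone()
    return row[0]


def qty_from_tmp(sku):
    """Rewritten query: single indexed lookup in the summary table."""
    row = conn.execute("SELECT tot_qty FROM tmp1 WHERE sku = ?", (sku,)).fetchone()
    return row[0] if row else None


fresh = (qty_from_join("AAA"), qty_from_tmp("AAA"))

# Modify the real data after the summary was built: tmp1 is a copy and does not follow.
conn.execute("INSERT INTO ctn_detail VALUES (3, 'AAA', 10)")
stale = (qty_from_join("AAA"), qty_from_tmp("AAA"))

print(fresh)  # (9, 9)
print(stale)  # (19, 9)
```

Both lookups agree until the underlying data changes, which is why the technique suits batch reporting over a stable snapshot.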
ORs can cause the optimizer to not use indexes
Complex where conditions can cause the optimizer to use the wrong index
Note: Informix Dynamic Server v7.3 allows UNIONs in views
and date_time > ?
UNION
...
select sum(qty) from log
where trans_id = 8
and sku = ?
and date_time > ?

select sum(qty) from log
where sku = ?
and trans_id in ( 1, 2, 3, 4, 5, 6, 7, 8 )
and date_time > ?
The log table has an index on date_time and a composite index on trans_id, sku and date_time.
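A quick way to check that the per-trans_id rewrite returns the same total as the IN-list query is to run both against sample data. This is a sketch using sqlite3 as a stand-in for Informix, with invented rows and date_time simplified to an integer. Note one deliberate swap: the sketch uses UNION ALL rather than UNION, because plain UNION removes duplicate rows and would collapse two branches that happen to return the same sum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (trans_id INTEGER, sku TEXT, qty INTEGER, date_time INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?, ?)", [
    (1, "X", 5, 10), (2, "X", 5, 11), (3, "X", 1, 12),
    (9, "X", 100, 13),  # trans_id outside 1..8
    (1, "Y", 7, 14),    # different sku
    (2, "X", 3, 1),     # older than the cutoff
])

sku, since = "X", 5
trans_ids = range(1, 9)

# Single query with an IN list.
in_total = conn.execute(
    "SELECT SUM(qty) FROM log WHERE sku = ? "
    "AND trans_id IN (1, 2, 3, 4, 5, 6, 7, 8) AND date_time > ?",
    (sku, since),
).fetchone()[0]

# One branch per trans_id, each able to use the leading column of the composite index.
branch = "SELECT SUM(qty) FROM log WHERE trans_id = {} AND sku = ? AND date_time > ?"
union_sql = " UNION ALL ".join(branch.format(t) for t in trans_ids)
rows = conn.execute(union_sql, (sku, since) * len(trans_ids)).fetchall()

# Each branch returns one row; branches with no matching rows return NULL.
union_total = sum(r[0] for r in rows if r[0] is not None)

print(in_total, union_total)  # 11 11
```

The union form returns one sum per branch, so the caller (or an outer query) still has to add the branch sums together.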
Sequential scans of large tables are resource intensive
Sequential scans of small tables are not harmful
Consider using permanent indexes to avoid sequential scans when possible
Create temporary indexes for batch reporting

Eliminates overhead of maintaining indexes during modification
Indexes are recreated more efficiently
END FOR
PREPARE p1 FROM "INSERT INTO some_table VALUES ( ?, 10 )"
FOR x = 1 TO 1000
    EXECUTE p1 USING x
END FOR
The statement is executed by executing the sub-query, on orders, for every row retrieved from customers.
If the customers table had 100,000 rows, the sub-query would be executed 100,000 times. However, if orders had only 20 rows with stat = OPEN, the database would be doing a lot of extra work.
Correlated Sub-queries
update customers set stat = "A"
where exists ( select "X" from orders o
    where o.custid = customers.custid
    and o.cmpny = customers.cmpny
    and o.stat = "OPEN" )
and custid in ( select custid from orders o
    where o.stat = "OPEN" )
update customers set stat = "A"
where exists ( select "X" from orders o
    where o.custid = customers.custid
    and o.cmpny = customers.cmpny
    and o.stat = "OPEN" )
If orders has only 20 rows meeting the filter, the second version of the update runs much faster, assuming that customers has an index on the column custid. The original CSQ is left in place since it joins on more than one column.
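Adding the non-correlated IN filter must not change which rows are updated. The sketch below checks that on a small invented data set, using sqlite3 as a stand-in for Informix; it demonstrates equivalence of results only, not the performance difference, and the helper `run` is hypothetical.

```python
import sqlite3

ORIGINAL = """
    UPDATE customers SET stat = 'A'
    WHERE EXISTS (SELECT 'X' FROM orders o
                  WHERE o.custid = customers.custid
                    AND o.cmpny = customers.cmpny
                    AND o.stat = 'OPEN')
"""

# Same statement with the extra non-correlated filter appended.
REWRITTEN = ORIGINAL + """
      AND custid IN (SELECT custid FROM orders o WHERE o.stat = 'OPEN')
"""


def run(update_sql):
    """Apply one UPDATE to a fresh copy of the sample data and return the customers rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (custid INTEGER, cmpny TEXT, stat TEXT);
        CREATE TABLE orders (custid INTEGER, cmpny TEXT, stat TEXT);
        INSERT INTO customers VALUES (1,'c1','N'), (1,'c2','N'), (2,'c1','N'), (3,'c2','N');
        INSERT INTO orders VALUES (1,'c1','OPEN'), (2,'c1','CLOSED'), (3,'c9','OPEN');
    """)
    conn.execute(update_sql)
    return conn.execute(
        "SELECT custid, cmpny, stat FROM customers ORDER BY custid, cmpny"
    ).fetchall()


original_result = run(ORIGINAL)
rewritten_result = run(REWRITTEN)

print(original_result == rewritten_result)  # True
print(rewritten_result)
```

Customer 3 shows why the correlated sub-query has to stay: its custid appears in an open order, but for a different cmpny, so only the two-column check rejects it.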
QUERY:
update orders set ship_charge = 0
where exists ( select "X" from customer c
    where c.customer_num = orders.customer_num
    and c.state = "MD" )
1) informix.c: SEQUENTIAL SCAN
    Filters: informix.c.state = 'MD'
2) informix.orders: INDEX PATH
    (1) Index Keys: customer_num
        Lower Index Filter: orders.customer_num = c.customer_num
NESTED LOOP JOIN
An index could be created on state to avoid the sequential scan.
where business_unit = 'ABC'
and process_instance = 5960
and not exists ( select "X" from PS_SP_BU_GL_NONVW P
    where P.business_unit = ps_jrnl_ln.business_unit )

Non-Correlated Version
select * from ps_jrnl_ln
where business_unit = 'ABC'
and process_instance = 5960
and not exists ( select "X" from PS_SP_BU_GL_NONVW P
    where P.business_unit = 'ABC' )
1) ps_bus_unit_tbl_gl: INDEX PATH
    (1) Index Keys: business_unit (Key-Only)
        Lower Index Filter: ps_bus_unit_tbl_gl.business_unit = 'ABC'
2) ps_bus_unit_tbl_fs: INDEX PATH
    (1) Index Keys: business_unit descr (Key-Only)
        Lower Index Filter: ps_bus_unit_tbl_fs.business_unit = ps_bus_unit_tbl_gl.business_unit
NESTED LOOP JOIN
Constant Subquery Optimization
When this filter is checked for the first row, the query can stop immediately if:
o it's a NOT EXISTS and a row is found
o it's an EXISTS and no rows are found
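The early-exit rule for a constant (non-correlated) sub-query can be modeled in a few lines. This is a toy illustration of the logic only, assuming the EXISTS filter is the sole filter on the outer query; `apply_constant_exists` is a hypothetical name and says nothing about how the engine implements it.

```python
def apply_constant_exists(outer_rows, subquery_rows, negate=False):
    """Filter outer_rows by EXISTS (or NOT EXISTS when negate=True) on a constant sub-query."""
    # The sub-query does not depend on the outer row, so one check decides every row.
    found = any(True for _ in subquery_rows)  # stops at the first sub-query row
    passes = (not found) if negate else found
    # If the filter fails once, it fails for all rows: return nothing without reading them.
    return list(outer_rows) if passes else []


outer = [1, 2, 3]

print(apply_constant_exists(outer, ["hit"], negate=True))   # NOT EXISTS, row found: []
print(apply_constant_exists(outer, [], negate=False))       # EXISTS, no rows: []
print(apply_constant_exists(outer, [], negate=True))        # NOT EXISTS, no rows: [1, 2, 3]
```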
AND EXISTS ( SELECT 'X' FROM PS_COMBO_SEL_06 A
    WHERE A.SETID='ABC'
    AND A.COMBINATION='OVERHEAD'
    AND A.CHARTFIELD='ACCOUNT'
    AND PS_JRNL_LN.ACCOUNT BETWEEN A.RANGE_FROM_06 AND A.RANGE_TO_06)
Estimated Cost: 79
Estimated # of Rows Returned: 1

    Filters: (informix.a.range_to_06 >= ps_jrnl_ln.account AND a.tree_effdt = <subquery> )
    (1) Index Keys: setid chartfield combination range_from_06 range_to_06
        Lower Index Filter: (a.setid = 'ABC' AND (a.combination = 'OVERHEAD' AND a.chartfield = 'ACCOUNT' ) )
        Upper Index Filter: a.range_from_06 <= ps_jrnl_ln.account
NESTED LOOP JOIN (Semi Join)

AND items.manu_code='SMT' )
    (1) Index Keys: order_num
2) informix.orders: INDEX PATH
    (1) Index Keys: order_num
        Lower Index Filter: orders.order_num = items.order_num
NESTED LOOP JOIN