Partitioning Oracle Sources in PowerCenter
2012 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means
Abstract
You can partition Oracle database table reads to increase performance. This article explains some techniques for
partitioning Oracle source data.
Supported Versions
Informatica PowerCenter 9.x
Table of Contents
Overview
Database Partitioning
Key Range Partitioning
Configuring Key Range Partitioning
Pass-Through Partitioning
MOD Function
Configuring the Filter Condition with the MOD Function
Function-Based Index
Filter on ROWID
Filter on ROWID Example
CreatePartitionInfo Mapping
DimReadTest Mapping
Overview
When an Oracle source is the bottleneck for PowerCenter session performance, you can increase performance by
configuring source partitioning in the session properties. With source partitioning, the PowerCenter Integration Service reads
multiple Oracle rows in parallel.
The PowerCenter Integration Service creates a reader thread for each pipeline partition. For a relational database source,
each partition issues an SQL statement to access the source data. To optimize performance, the SQL statements should
create efficient and fairly equal-sized data sets.
You can use the following approaches for partitioning Oracle source data:
Database partitioning
Creates a pipeline for each physical table partition in the database.
Key range partitioning
Distributes source rows into partitions based on the values of a port or set of ports.
Pass-through partitioning
Passes rows into static partitions based on a filter condition for each partition.
Database Partitioning
You can optimize session performance by using the database partitioning partition type for Oracle sources. Use database
partitioning for Oracle sources whenever possible.
Database partitioning creates a pipeline for each physical table partition in the Oracle database. When you use database
partitioning, the PowerCenter Integration Service queries the database system for table partition information and fetches
data into the session partitions. You can use any number of session partitions and any number of database partitions. When
the number of pipeline partitions does not equal the number of database partitions, the PowerCenter Integration Service
generates SQL queries for each database partition and distributes the data equally among the session partitions. Performance
is typically best, however, when the number of pipeline partitions equals the number of database partitions.
For Oracle sources that use composite partitioning, you can increase performance when the number of pipeline partitions
equals the number of database subpartitions. For example, if an Oracle source contains three partitions and two
subpartitions for each partition, set the number of pipeline partitions at the source to six.
You can use other partitioning methods when the Oracle table is not partitioned or when the table is partitioned in a way that
is not useful for extracting data sets of equal size.
2. Select the Source Qualifier partition point on the Partitions view of the Mapping tab.
Pass-Through Partitioning
With pass-through partitioning, the PowerCenter Integration Service passes all rows from one partition point to the next
partition point without redistributing data across partitions. All rows in a partition stay in that partition after crossing a partition
point. Pass-through partitioning is the default partitioning method.
When the session has pass-through partitioning, you can configure a filter condition for each static partition. You can use the
following filters to partition data:
MOD function
Use a MOD function to filter data into different partitions based on a value of a numeric column.
ROWID
Partition data by ROWID. Oracle can perform direct reads on rows of data by ROWID.
MOD Function
You can create a filter condition to partition data with the Oracle MOD function. The Oracle MOD function receives two
numeric input values and returns the remainder of the first value divided by the second. For example, MOD(4,2) = 0 and
MOD(4,3) = 1.
To use the MOD function to filter rows, define the Source Qualifier partition type as pass-through. The PowerCenter
Integration Service generates a WHERE clause that includes any filter condition you enter in the session properties.
Enter the filter condition for each partition on the Transformations view of the Mapping tab. The filter overrides any filter
condition that you set in the Designer when you configure the Source Qualifier transformation.
For example, if the session has two partitions, you might configure the following filter condition for the first partition:
MOD(columnName,2)=0
When the value of the column is an even number, the first partition receives the row. With the complementary condition
MOD(columnName,2)=1 on the second partition, the second partition receives the row when the column value is odd.
When you configure the MOD function, choose a numeric column that has an even distribution of values. You can use a key
column. Do not use a column that has few distinct values, because the partitions will be unequal in size. For example, if a
column can contain only zero or one, you cannot distribute the rows across more than two partitions.
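As a minimal sketch of the filter pattern described above, the following Python helper builds one MOD filter condition per partition and counts how sample keys would spread across the partitions. The function names and the sample key values are illustrative assumptions and are not part of PowerCenter. Python's % operator agrees with the Oracle MOD function only for non-negative values, so the sketch assumes non-negative keys.

```python
from collections import Counter


def mod_filters(column, partitions):
    """Return one Oracle filter condition per partition: MOD(column,N)=k."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return [f"MOD({column},{partitions})={k}" for k in range(partitions)]


def partition_sizes(keys, partitions):
    """Count how many non-negative integer keys each MOD filter would select."""
    counts = Counter(key % partitions for key in keys)
    return [counts.get(k, 0) for k in range(partitions)]


if __name__ == "__main__":
    for condition in mod_filters("InvoiceID", 4):
        print(condition)
    # Sequential keys spread evenly across the four partitions.
    print(partition_sizes(range(1, 1001), 4))
    # A column that holds only 0 or 1 leaves two of the four partitions empty.
    print(partition_sizes([0, 1] * 500, 4))
```

Running a count like this against a sample of the candidate column is one way to check whether its values are distributed evenly enough before you configure the session.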
3. On the Source Filter attribute, enter the following filter conditions for each partition:
Partition #1: MOD(InvoiceID,4)=0
Partition #2: MOD(InvoiceID,4)=1
Partition #3: MOD(InvoiceID,4)=2
Partition #4: MOD(InvoiceID,4)=3
The following figure shows where to configure the MOD functions in session properties:
Function-Based Index
You can define a function-based index on a MOD function to increase performance and eliminate full table scans. With a
function-based index, Oracle performs fast index range scans. Using a function-based index can affect all SQL statements
that have the matching predicate.
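To create a function-based index, use SQL syntax similar to the following. The example assumes that the InvoiceID column
belongs to the Invoices table. Oracle does not allow hyphens in unquoted identifiers, so the index name uses underscores:
CREATE INDEX invoiceid_mod4_idx ON Invoices (MOD(InvoiceID,4))
If you change the number of partitions in the session, you need to rebuild the function-based index so that its expression
matches the MOD expression in the new filter conditions.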
Note: Without a function-based index, MOD partitioning performance is similar to range-based partitioning. Each query
typically requires a table scan.
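Because the index expression has to match the MOD expression in the session filters, it can help to derive both from the same partition count. The following is a hedged sketch only: the helper name is hypothetical, and it assumes that the InvoiceID column belongs to a table named Invoices. It builds the index statement as text and does not connect to a database.

```python
import re


def mod_index_ddl(table, column, partitions):
    """Build a CREATE INDEX statement on MOD(column, partitions)."""
    for name in (table, column):
        # Accept only simple unquoted identifiers to keep the statement valid.
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_$#]*", name):
            raise ValueError(f"not a simple identifier: {name!r}")
    if partitions < 2:
        raise ValueError("partitions must be at least 2")
    # Underscores keep the index name a legal unquoted Oracle identifier.
    index_name = f"{column}_mod{partitions}_idx".lower()
    return f"CREATE INDEX {index_name} ON {table} (MOD({column},{partitions}))"


if __name__ == "__main__":
    print(mod_index_ddl("Invoices", "InvoiceID", 4))
```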
Filter on ROWID
When a session reads all the rows in a table, you can configure the session partitions to read a table by the Oracle ROWID.
The ROWID is the physical address of a row in the table. Oracle performs direct reads on a row using ROWID.
To filter with ROWID, you need to determine the ROWID values in the database table. You can configure a SQL statement
that returns the ROWID for specific rows in a table. For example, the following SELECT statement returns the ROWID and
last name of each customer in department 20:
SELECT ROWID, last_name FROM Invoices WHERE Dept = 20
To partition the table read using ROWID, run a SQL query that returns a minimum and maximum ROWID for each partition
you plan to have in the session. After you determine the minimum and maximum ROWID, configure the Source Read
partition filters using the minimum and maximum ROWID for each partition.
For example, a session that reads the Invoices table has four partitions. Configure the following SQL statement to return the
minimum and maximum ROWID for each partition:
SELECT min(ROWID), max(ROWID), tile
from (select ROWID, ntile(4) over (order by ROWID) as tile from Invoices)
group by tile
order by 1
MAX ROWID            TILE
AAATtYAA9AAAK5kAAJ   1
AAATtYAA+AAAAZzAAA   2
AAATtYAA+AAAK9DAAI   3
AAATtYAA+AAAlMIAAJ   4
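To illustrate what the NTILE(4) query computes, the following Python sketch emulates the bucketing on a small in-memory list. It is an illustration under stated assumptions: integers stand in for ROWID values, and the function name is hypothetical. NTILE divides the ordered rows into the requested number of buckets, and when the rows do not divide evenly, the first buckets each receive one extra row.

```python
def ntile_ranges(sorted_values, tiles):
    """Split sorted values into `tiles` buckets the way NTILE does and
    return (min, max, tile) for each non-empty bucket."""
    total = len(sorted_values)
    base, extra = divmod(total, tiles)
    ranges = []
    start = 0
    for tile in range(1, tiles + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if tile <= extra else 0)
        if size == 0:
            break
        bucket = sorted_values[start:start + size]
        ranges.append((bucket[0], bucket[-1], tile))
        start += size
    return ranges


if __name__ == "__main__":
    # Integers stand in for ROWID values here.
    for low, high, tile in ntile_ranges(list(range(1, 11)), 4):
        print(low, high, tile)
```

Each returned pair plays the role of the minimum and maximum ROWID for one session partition.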
In the session properties, configure the filter condition for each of the partitions. Configure the filter conditions with the
minimum and maximum ROWID values from the SQL query.
rowid between chartorowid('AAATtYAA9AAAAALAAA') and chartorowid('AAATtYAA9AAAK5kAAJ')
rowid between chartorowid('AAATtYAA9AAAK5kAAK') and chartorowid('AAATtYAA+AAAAZzAAA')
rowid between chartorowid('AAATtYAA+AAAAZzAAB') and chartorowid('AAATtYAA+AAAK9DAAI')
rowid between chartorowid('AAATtYAA+AAAK9DAAJ') and chartorowid('AAATtYAA+AAAlMIAAJ')
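The filter conditions follow a fixed pattern, so they can be generated from the minimum and maximum values instead of typed by hand. The following is a minimal sketch, assuming the ROWID pairs have already been retrieved from the database. The helper name is hypothetical, and the values are the ones used in this example.

```python
def rowid_filters(ranges):
    """Build one filter condition per (min_rowid, max_rowid) pair."""
    return [
        f"rowid between chartorowid('{low}') and chartorowid('{high}')"
        for low, high in ranges
    ]


if __name__ == "__main__":
    # ROWID boundaries taken from the example in this article.
    ranges = [
        ("AAATtYAA9AAAAALAAA", "AAATtYAA9AAAK5kAAJ"),
        ("AAATtYAA9AAAK5kAAK", "AAATtYAA+AAAAZzAAA"),
        ("AAATtYAA+AAAAZzAAB", "AAATtYAA+AAAK9DAAI"),
        ("AAATtYAA+AAAK9DAAJ", "AAATtYAA+AAAlMIAAJ"),
    ]
    for condition in rowid_filters(ranges):
        print(condition)
```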
A disadvantage to partitioning with ROWID is that you must maintain the minimum and maximum values for the filter
conditions. When the table contains new rows, and the filter conditions do not contain the new ROWID values, the session
does not select the rows. You can automate the process to maintain the ROWID values in the filter conditions.
CreatePartitionInfo Mapping
Create a mapping that returns a parameter file containing the ROWID-based WHERE clauses.
Target
Configure the target session properties and select the Use Header Command Output Header option. The PowerCenter
Integration Service adds a header to the target. It appends the contents of the partition_where_header.txt file.
Configure the Header Command field to generate a header row. The Header Command contains the following text:
cat /u01/app/infa_shared/presales/sdorcey/TgtFiles/partition_where_header.txt
DimReadTest Mapping
The DimReadTest mapping reads the parameter file from the CreatePartitionInfo mapping to determine how to partition
the source data and perform fast parallel reads. The DimReadTest mapping reads source rows in 16 partitions. The mapping
passes each row through a Filter transformation.
Note: The Filter transformation returns no rows in the target. You can change the example to include different transformations.
DIM_COM_ACCOUNT_TERM: Target that receives no rows.
The following figure shows the parameter file path in the General Options section of the Properties tab:
If you change the number of partitions in the DimReadTest session, change the $$number_of_partitions parameter in the
CreatePartitionInfo mapping to match the number of partitions in the session.
Authors
Ellen Chandler
Principal Technical Writer
Stan Dorcey
Sr. Product Specialist