Informatica PowerCenter
Performance Tuning Tips
Pittsburgh Informatica User Group
March 22nd 2016
Gregory Reynolds [Link]@[Link]
Topics
General Tips
Source Qualifiers
Lookups
Sorters / Aggregators / Joiners
Targets
Other Options
General Tips
Remove Dead Weight
Filter unneeded records in the source qualifier
Turn off or remove unused ports
Filter and aggregate as soon as possible
Trim strings immediately
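A rough Python sketch of these tips (conceptual only, not PowerCenter syntax; the field names are invented): filter and trim at the front of the pipeline so every later step handles fewer, smaller rows.

    # Conceptual sketch only -- illustrates "filter early, trim early",
    # not actual PowerCenter behavior. Field names are hypothetical.
    rows = [
        {"status": "A", "name": "  Smith  ", "notes": "x" * 500},
        {"status": "X", "name": "  Jones  ", "notes": "y" * 500},
    ]

    def early_pipeline(rows):
        for row in rows:
            if row["status"] != "A":          # filter unneeded records first
                continue
            yield {
                "status": row["status"],
                "name": row["name"].strip(),  # trim strings immediately
                # the unused "notes" port is dropped instead of carried along
            }

    for row in early_pipeline(rows):
        print(row)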
Set Proper Attributes on All Ports
Improper data types require behind-the-scenes conversions or cause loss of data
Incorrect data sizes require conversions and may truncate data unintentionally
Watch Your Logs
Don't ignore warnings and non-critical errors; writing them to the log file consumes a
significant amount of resources
Choose terse-level logging in production to save space and processing
Source Qualifier
Balance Work Effort
Weigh the time it would take the source database to do the work against doing it in PowerCenter
PowerCenter is faster at many things / the DBMS is faster at others
Local Files
When reading from files it is almost always faster to ship the file to the local server
before reading it than it is to try to read it over a remote connection
Large data files are likely better off staged locally before attempting any
complex transformations
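A minimal Python sketch of the staging pattern; the "remote" file is simulated here so the example runs standalone, and all paths are hypothetical.

    import os
    import shutil
    import tempfile

    # Conceptual sketch: copy the file to local storage once, then read it
    # with local I/O. The "remote" file is simulated so this runs standalone.
    remote_dir = tempfile.mkdtemp()
    remote_path = os.path.join(remote_dir, "big_extract.csv")   # stands in for a remote share
    with open(remote_path, "w") as f:
        f.write("id,amount\n1,10\n2,20\n")

    local_path = os.path.join(tempfile.gettempdir(), "staged_extract.csv")
    shutil.copy(remote_path, local_path)      # one sequential transfer

    with open(local_path) as f:               # all subsequent reads are local
        print(f.read())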
Use Built-In Options
A sort is usually faster here than anywhere else in the mapping
The same is true for selecting distinct rows
Choose Deterministic Output if possible
Lookup
Unconnected Lookups
Calling the same unconnected lookup from multiple places builds the cache only once;
every call after the first reuses it.
Cache Type
Static cache is the default for a reason. It is the best choice for a large number of lookup records
Disabling the cache sends every lookup to the database as an individual request. This works well for a
very small number of lookup requests against a large table.
Persistent cache allows you to reuse a cache across multiple sessions. Warning: it may require a lot of
drive space
Dynamic cache keeps records in the lookup up to date automatically as new data is sent to the target
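A loose Python analogy for the uncached versus static-cache trade-off; the lookup table and helper functions below are invented for illustration, not PowerCenter internals.

    # Conceptual sketch: "uncached" = one query per input row,
    # "static cache" = read the lookup source once, then probe it in memory.
    LOOKUP_TABLE = {1: "East", 2: "West", 3: "North"}   # stands in for a database table

    def query_database(key):
        # pretend this is a round trip to the database
        return LOOKUP_TABLE.get(key)

    def uncached_lookup(keys):
        # good when very few rows are looked up against a huge table
        return [query_database(k) for k in keys]

    def static_cached_lookup(keys):
        # good when the lookup is hit many times: pay the build cost once
        cache = dict(LOOKUP_TABLE)            # one "SELECT *" up front
        return [cache.get(k) for k in keys]

    print(uncached_lookup([1, 2, 2, 3]))
    print(static_cached_lookup([1, 2, 2, 3]))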
Size Matters
Too small a cache allotment will force large lookups to go to disk
Too large a cache will waste available memory that other transformations could use
Filter The Data
Lookup source data can be filtered
without using a SQL override
The more records filtered out, the less time it takes
to retrieve them and the smaller and more efficient the cache
Sorter, Aggregator, and Joiner
Maximum Cost
No data will pass through until every record has been read and sorted
Large cache usage
The session will fail if the cache grows beyond the available capacity
Waste Not Want Not
Filter out as many unneeded records as possible beforehand
Sorted Input
Whenever possible, use sorted input to save most of the processing and cache cost of
joiners and aggregators. With sorted input, data flows through in incremental chunks
instead of waiting until the end, releasing memory and cache as it goes.
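As a plain-Python analogy (not PowerCenter internals, and the rows are invented): with sorted input each group can be emitted as soon as its key changes, while unsorted input forces every group to be held until all rows have been read.

    from itertools import groupby
    from operator import itemgetter

    # Hypothetical (customer_id, amount) rows, already sorted by customer_id.
    sorted_rows = [(1, 10.0), (1, 5.0), (2, 7.5), (2, 2.5), (3, 1.0)]

    # Sorted input: emit each group as soon as the key changes -- small memory footprint.
    for customer_id, group in groupby(sorted_rows, key=itemgetter(0)):
        print(customer_id, sum(amount for _, amount in group))

    # Unsorted input: every group must be buffered until all rows are read.
    unsorted_rows = [(2, 7.5), (1, 10.0), (3, 1.0), (1, 5.0), (2, 2.5)]
    totals = {}
    for customer_id, amount in unsorted_rows:
        totals[customer_id] = totals.get(customer_id, 0) + amount
    print(totals)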
Simplified Group Bys
Grouping on a simple port (a number rather than a long string or a computed expression)
reduces the work the aggregator must do before it can finalize any group.
Smaller Master Set
Designate the side that is expected to have fewer rows as the master input of the joiner.
Two large data flows will not join as quickly
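A minimal hash-join sketch in Python (invented data, not the Joiner's actual implementation) showing why the smaller side should be the one held in memory:

    # Conceptual sketch: build the in-memory (master) side from the smaller
    # input, then stream the larger (detail) side past it.
    master = [(1, "East"), (2, "West")]                   # few rows -> cache this side
    detail = [(1, 100), (1, 250), (2, 75), (3, 10)]       # many rows -> stream this side

    master_index = {key: region for key, region in master}   # small cache to build and probe

    joined = [
        (key, amount, master_index[key])
        for key, amount in detail
        if key in master_index                            # inner-join semantics
    ]
    print(joined)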
Targets
Rejected Rows
Rejecting rows from the target database will slow down normal operations by
potentially several seconds per record! Look out for unexpected NULL values, bad
dates, and invalid keys.
Update Else Insert
With this option selected, every insert is approximately twice as slow as an update
An alternative is to use a lookup to determine the appropriate action beforehand
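A rough sketch of the "look it up first" alternative, with an in-memory set standing in for a real lookup; the keys and values are hypothetical.

    # Conceptual sketch: decide insert vs. update up front instead of
    # attempting an update on every row and falling back to an insert.
    existing_keys = {101, 102}          # stands in for a lookup on the target table

    incoming = [(101, "updated value"), (103, "new value")]

    updates = [(k, v) for k, v in incoming if k in existing_keys]
    inserts = [(k, v) for k, v in incoming if k not in existing_keys]

    print("route to UPDATE:", updates)
    print("route to INSERT:", inserts)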
Commit Interval
Setting too low a commit interval forces transactions to wait more often while commits
complete. Setting it too high causes the database to use an increasing amount of temp
space and may slow down transaction speeds.
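A sketch of the commit-interval trade-off using sqlite3 as a stand-in for the real target database; the interval value is an arbitrary example.

    import sqlite3

    # Conceptual sketch: commit every N rows instead of every row (too chatty)
    # or only once at the very end (one very large open transaction).
    COMMIT_INTERVAL = 1000   # arbitrary example value; tune per target

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

    for i in range(5000):
        conn.execute("INSERT INTO t VALUES (?, ?)", (i, "row"))
        if (i + 1) % COMMIT_INTERVAL == 0:
            conn.commit()            # flush a batch of work
    conn.commit()                    # final partial batch
    print(conn.execute("SELECT COUNT(*) FROM t").fetchone()[0])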
Bulk Loading
Bulk loading is a fast way to push large amounts of data into some types of
databases at high speed, but there cannot be any indices on the target table
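A sqlite3 sketch of the same pattern (the table, index, and row counts are invented): load with no indices in place, then build the index once at the end.

    import sqlite3

    # Conceptual sketch of the bulk-load pattern: load without indices,
    # then create the index once instead of maintaining it row by row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")

    rows = [(i, i * 1.5) for i in range(10000)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)   # bulk path
    conn.commit()

    conn.execute("CREATE INDEX idx_sales_id ON sales (id)")     # index built after the load
    conn.commit()
    print(conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0])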
Other Tuning Options
Workflow Parallelization
Session Partitioning
Push-down Optimization
Data Partitions
Buffer Size
Indices and Primary Keys
System Competition
Redesign slow mappings to find a better way