IBM WebSphere DataStage 7.5.x Parallel Jobs Development: Basic Guidelines
24/02/2009
TCS Public
General
Create file definitions for all delimited files.
Avoid varchar data types for numeric data.
Specify the length when defining varchar fields.
Align the data types with the table definition.
When reading input in the Sequential File stage, do not define fields as Varchar(1). Define them as Char(1) instead.
Integer data types should not have a precision.
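One common reason for keeping numeric data out of varchar fields is that text values compare character by character rather than by magnitude. The short Python sketch below illustrates that effect only; the sample values are hypothetical and it does not reproduce DataStage behaviour.

```python
# Hypothetical amounts that arrived as text (for example from a varchar column).
amounts_as_text = ["9", "10", "100", "25"]

# The same values converted to a numeric type.
amounts_as_int = [int(a) for a in amounts_as_text]

# Text is ordered character by character, numbers by magnitude.
text_sorted = sorted(amounts_as_text)
int_sorted = sorted(amounts_as_int)

print(text_sorted)  # ['10', '100', '25', '9']
print(int_sorted)   # [9, 10, 25, 100]
```

Sorting, comparing, or taking a maximum over the text form gives a different answer from the numeric form, which is why the column type should match the data.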
Stage variables
Define the data types for stage variables. Do not use the default data type (varchar(255)).
Do not combine operations whose intermediate function values have a different data type from that of the stage variable.
Input assignment: the field specification should be INPUTRECORD[Start_Position, Field_length].
Save all project-level common variables in DataStage Administrator and use them from there.
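The INPUTRECORD[Start_Position, Field_length] notation can be read as "take Field_length characters beginning at Start_Position". The Python sketch below mimics that reading, assuming positions are counted from 1. The helper name `field`, the record layout, and the sample record are hypothetical and are not part of DataStage.

```python
def field(record: str, start_position: int, field_length: int) -> str:
    """Return field_length characters starting at a 1-based start_position."""
    if start_position < 1 or field_length < 0:
        raise ValueError("start_position must be >= 1 and field_length >= 0")
    begin = start_position - 1  # convert the 1-based position to a 0-based index
    return record[begin:begin + field_length]


# Hypothetical fixed-width layout: 5-character id, 10-character name, 1-character flag.
input_record = "00042JOHN SMITHY"

customer_id = field(input_record, 1, 5)
customer_name = field(input_record, 6, 10)
active_flag = field(input_record, 16, 1)

print(customer_id, customer_name, active_flag)
```

Writing each field as an explicit start position and length keeps the layout visible in one place and avoids depending on delimiters inside the record.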
Partitioning
When using the Aggregator stage, hash-partition the input data on all grouping keys.
When using the Remove Duplicates stage, hash-partition the input data on all keys.
When using the Join stage, hash-partition the input data on the join keys on both input links.
If a stage has to use the previous stage's partitioning method, use SAME partitioning.
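The reason for hash-partitioning on the grouping keys is that every record of a given group then lands in the same partition, so each partition can be aggregated independently. The Python sketch below illustrates the idea only. It is not how DataStage implements partitioning, and the function names, the CRC32 hash, and the sample rows are assumptions made for the example.

```python
from collections import defaultdict
from zlib import crc32


def hash_partition(rows, keys, num_partitions):
    """Assign each row to a partition based on a hash of its key values."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key_bytes = "|".join(str(row[k]) for k in keys).encode("utf-8")
        partitions[crc32(key_bytes) % num_partitions].append(row)
    return partitions


def aggregate(rows, keys, value):
    """Sum the value column per combination of grouping keys."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return dict(totals)


# Hypothetical input rows.
rows = [
    {"region": "EU", "product": "A", "qty": 3},
    {"region": "US", "product": "A", "qty": 5},
    {"region": "EU", "product": "A", "qty": 4},
    {"region": "US", "product": "B", "qty": 1},
    {"region": "EU", "product": "B", "qty": 2},
]
keys = ["region", "product"]

partitions = hash_partition(rows, keys, num_partitions=3)
per_partition = [aggregate(p, keys, "qty") for p in partitions]

# Because rows with equal keys share a partition, no group is split:
# the per-partition results can simply be concatenated.
merged = {}
for result in per_partition:
    for group, total in result.items():
        assert group not in merged, "a group was split across partitions"
        merged[group] = total

print(merged == aggregate(rows, keys, "qty"))  # True
```

With a partitioning method that ignores the keys, the same group could appear in several partitions and the per-partition totals would be partial results.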
Design Suggestions
The input data to the Join stage must be sorted on all join keys.
In the Aggregator stage, use the Hash method if the input volume is low and the Sort method if the input volume is high. With the Sort method, the data must be sorted on all grouping keys.
Transfer a column only if it is required in subsequent stages.
When using a Lookup stage, all columns from the reference link are transferred to the output, so drop the unnecessary columns after the Lookup stage.
When two lookups are used consecutively, they can be combined into one lookup.
If an operation can be done in the database, do it on the database side.
If a stage has multiple output links, make sure metadata is defined for all output links.
When a job performs operations that are common to multiple jobs, build a reusable component.
If a character field can be defined with a fixed-length data type, define it as a fixed-length field.
Do not specify a length for non-floating numeric fields such as Integer and BigInt. If a length is specified, maintain it throughout the job.
Keep the lengths of all fields consistent throughout the job.
Specify all table lookups as Uncommitted Read wherever possible.
Avoid hard-coding values in the Transformer stage. If multiple values have to be hard-coded, load them into a table and use a lookup instead.
Use the Transformer stage sparingly. Try to combine multiple operations in a single Transformer.
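The trade-off between the Hash and Sort methods of the Aggregator stage can be sketched in Python. A hash-style aggregation keeps one running total per group in memory, which suits a small input. A sort-style aggregation needs input sorted on the grouping keys but only works on one group at a time. The function names and sample rows below are hypothetical, and the sketch illustrates the general technique rather than DataStage's internal implementation.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows, key, value):
    """Hash method: one in-memory total per group; input order does not matter."""
    totals = {}
    for row in rows:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals


def sort_aggregate(sorted_rows, key, value):
    """Sort method: input must be sorted on the key; yields one group at a time."""
    for group_key, group in groupby(sorted_rows, key=itemgetter(key)):
        yield group_key, sum(row[value] for row in group)


# Hypothetical input rows.
rows = [
    {"dept": "B", "amount": 10},
    {"dept": "A", "amount": 5},
    {"dept": "B", "amount": 7},
    {"dept": "A", "amount": 1},
]

hash_result = hash_aggregate(rows, "dept", "amount")

# The sort method only gives correct totals on sorted input.
rows_sorted = sorted(rows, key=itemgetter("dept"))
sort_result = dict(sort_aggregate(rows_sorted, "dept", "amount"))

# Feeding unsorted input to the sort method emits the same group more than once.
unsorted_output = list(sort_aggregate(rows, "dept", "amount"))

print(hash_result == sort_result)  # True
print(len(unsorted_output))        # 4
```

The unsorted case shows why the sorting requirement matters: each change of key is treated as the start of a new group, so the totals are fragmented.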