Informatica PowerMart / PowerCenter 8.6
Introduction
PowerMart and PowerCenter provide an environment that allows you to load data into a
centralized location, such as a datamart, data warehouse, or operational data store
(ODS).
You can extract data from multiple sources, transform the data according to business
logic you build in the client application, and load the transformed data into file and
relational targets.
Informatica provides the following integrated components:
Informatica Client. Use the Informatica Client to manage users, define sources
and targets, build mappings and mapplets with the transformation logic, and create
sessions to run the mapping logic. The Informatica Client has three client
applications: Repository Manager, Designer, and Server Manager.
Informatica Server. The Informatica Server extracts the source data, performs the
data transformation, and loads the transformed data into the targets.
PowerMart/PowerCenter Architecture
Sources
PowerMart and PowerCenter access the following sources:
Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
File. Fixed and delimited flat file, COBOL file, and XML.
Targets
PowerMart and PowerCenter can load data into the following targets:
Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL
Server, and Teradata.
You can load data into targets using ODBC or native drivers, FTP, or
external loaders.
Repository
The Informatica repository is a set of tables that stores the metadata you create using the
Informatica Client tools. You create a database for the repository, and then use the
Repository Manager to create the metadata tables in the database.
You add metadata to the repository tables when you perform tasks in the Informatica Client
application such as creating users, analyzing sources, developing mappings or mapplets,
or creating sessions. The Informatica Server reads metadata created in the Client
application when you run a session. The Informatica Server also creates metadata such as
start and finish times of a session or session status.
When you use PowerCenter, you can develop global and local repositories to share
metadata:
Global repository. The global repository is the hub of the domain. Use the global
repository to store common objects that multiple developers can use through
shortcuts. These objects may include operational or application source definitions,
reusable transformations, mapplets, and mappings.
Local repositories. A local repository is within a domain that is not the global
repository. Use local repositories for development. From a local repository, you can
create shortcuts to objects in shared folders in the global repository. These objects
typically include source definitions, common dimensions and lookups, and enterprise
standard transformations. You can also create copies of objects in non-shared
folders.
Informatica Client
The Informatica Client comprises three applications that you use to manage the
repository, design mappings and mapplets, and create sessions to load the data.
Repository Manager. Use the Repository Manager to create and administer the metadata
repository. You can create repository users and groups, assign privileges and permissions,
manage folders and locks, and print Crystal Reports containing repository data.
Designer. Use the Designer to create mappings that contain transformation instructions for
the Informatica Server. Before you can create mappings, you must add source and target
definitions to the repository. The Designer has five tools that you use to analyze sources,
design target schemas, and build source-to-target mappings: the Source Analyzer,
Warehouse Designer, Transformation Developer, Mapplet Designer, and Mapping Designer.
Server Manager. Use the Server Manager to create, schedule, execute, and monitor
sessions. You create a session based on a mapping in the repository and schedule it to run
against an Informatica Server. You can view scheduled and running sessions for each
Informatica Server in the domain. You can also access details about those sessions.
Informatica Server
The Informatica Server reads mapping and session information from the
repository. It extracts data from the mapping sources and stores the data
in memory while it applies the transformation rules that you configure in
the mapping. The Informatica Server loads the transformed data into the
mapping targets.
You can install the Informatica Server on a Windows NT/2000 or UNIX
server machine.
You can communicate with the Informatica Server using pmcmd, a
command line program.
Connectivity
PowerMart and PowerCenter use the following types of connectivity:
Network Protocol
Native Drivers
ODBC
Connectivity Overview
Metadata Reporter
The Metadata Reporter is a web-based application that allows you to run reports
against repository metadata.
Perform repository maintenance. You can create, copy, restore, upgrade, back up,
and delete repositories. With a global repository, you can register and unregister local
repositories. You can import and export repository connection information in the
registry and edit repository connection information.
Implement repository security. You can create, edit, and delete repository users
and user groups. You can assign and revoke repository privileges and folder
permissions.
Perform folder functions. You can create, edit, copy, and delete folders. All the work
you perform in the Designer is stored in folders. If you want to share metadata, you
can configure a folder to be shared.
View metadata. You can analyze sources, targets, mappings, and shortcut
dependencies, search by keyword, and view the properties of repository objects.
Customize the Repository Manager. You can add, edit, and remove repositories in
the Navigator, and view or hide windows.
Run repository reports. You can run repository reports such as the Source to Target
Dependency report or the Session report. You can also add and remove customized
reports.
Navigator. Displays all objects that you create in the Repository Manager,
the Designer, and the Server Manager. It is organized first by repository,
then by folder and folder version. Viewable objects include sources, targets,
dimensions, cubes, mappings, mapplets, transformations, sessions, and
batches. You can also view folder versions and business components.
Repository Objects
You create repository objects using the Repository Manager, Designer, and Server
Manager client tools. You can view the following objects in the Navigator window of the
Repository Manager:
Source definitions. Definitions of database objects (tables, views, synonyms) or files
that provide source data.
Target definitions. Definitions of database objects or files that contain the target
data.
Sessions and batches. Sessions and batches store information about how and when
the Informatica Server moves data. Each session corresponds to a single mapping.
You can group several sessions together in a batch.
Design Process
The goal of the design process is to create mappings that depict the flow of
data between sources and targets, including changes made to the data before it
reaches the targets. However, before you can create a mapping, you must first
create or import source and target definitions. You might also want to create
reusable objects such as reusable transformations or mapplets.
Perform the following design tasks in the Designer:
1. Create source definitions. Import or create source definitions that describe the
structure of each source.
2. Create target definitions. Import or create target definitions that describe the
structure of each target.
3. Create the target tables. If you add a target definition to the repository that
does not exist in a relational database, you need to create target tables in
your target database. You do this by generating and executing the
necessary SQL code within the Warehouse Designer.
Design Process
4. Design mappings. Once you have source and target definitions in the
repository, you can create mappings in the Mapping Designer. A mapping is a
set of source and target definitions linked by transformation objects that define
the rules for data transformation. A transformation is an object that performs a
specific function in a mapping, such as looking up data or performing
aggregation.
5. Create mapping objects. Optionally, you can create reusable objects for
use in multiple mappings. Use the Transformation Developer to create reusable
transformations. Use the Mapplet Designer to create mapplets. A mapplet is a
set of transformations that may contain source definitions and other transformations.
6. Debug mappings. Use the Mapping Designer to debug a valid mapping to
gain troubleshooting information about data and error conditions.
7. Import and export repository objects. You can import and export
repository objects, such as sources, targets, transformations, mapplets, and
mappings to archive or share metadata.
Designer Windows
You can display the following windows in the Designer: the Navigator, the workspace,
the Output window, the Overview window, and the Instance Data and Target Data
windows used by the Debugger.
Debugger Window
Server Manager
Use the Server Manager to create, schedule, monitor, edit, copy, and abort
sessions. You can group multiple sessions to run as a single unit, known as a
batch. When you create a session, you select a valid mapping and configure
other settings such as connections, error handling, and scheduling. You may
also be able to override some transformation properties.
When you monitor sessions, the Server Manager displays status such as
scheduled, completed, and failed sessions. It also displays some errors
encountered while running the session. You can find a complete log of errors in
the session log and server log files. Before you create a session, you must
configure connection information, such as database, FTP, and external loader
connections.
Session Properties
You can set the following properties when you create a session:
Source and target location. Select a connection or specify a path for the
source and target data.
Pre- and post-session scripts. Run shell commands before or after the
session.
Creating a Repository
Repository Privileges
Use Designer: Can edit metadata, import and export objects in the Designer, with
read and write permission at the folder level.
Browse Repository: Can browse repository contents through the Repository Manager.
Create Sessions and Batches: Can create, import, export, modify, start, stop, and
delete sessions and batches through the Server Manager with folder-level read, write,
and execute permissions. Can configure some connections used by the Informatica
Server.
Session Operator: Can use the command line program (pmcmd) to start sessions and
batches. Can start, view, monitor, and stop sessions or batches with folder-level read
permission and the Create Sessions and Batches privilege using the Server Manager.
Administer Repository: Can perform repository maintenance tasks, such as managing
folders, users, and groups.
Administer Server: Can configure and stop the Informatica Server.
Super User: Can perform all the tasks across all folders in the repository, including
unlocking locks and managing global object permissions.
Folders
Folders provide a way to organize and store all metadata in the repository,
including mappings, schemas, and sessions. Folders are designed to be
flexible, to help you organize your data warehouse logically. Each folder has a
set of properties you can configure to define how users access the folder. For
example, you can create a folder that allows all repository users to see objects
within the folder, but not to edit them. Or you can create a folder that allows
users to share objects within the folder.
Shared Folders
When you create a folder, you can configure it as a shared folder. Shared
folders allow users to create shortcuts to objects in the folder. If you have a
reusable transformation that you want to use in several mappings or across
multiple folders, you can place the object in a shared folder.
For example, you may have a reusable Expression transformation that
calculates sales commissions. You can then use the object in other folders by
creating a shortcut to the object.
Folder Permissions
Permissions allow repository users to perform tasks within a folder. With folder
permissions, you can control user access to the folder, and the tasks you
permit them to perform.
Folder permissions work closely with repository privileges. Privileges grant
access to specific tasks while permissions grant access to specific folders with
read, write, and execute qualifiers.
However, any user with the Super User privilege can perform all tasks across
all folders in the repository. Folders have the following types of permissions:
Read permission. Allows you to view the folder as well as objects in the
folder.
Write permission. Allows you to create or edit objects in the folder.
Execute permission. Allows you to execute or schedule a session or batch in the
folder.
Creating Folders
To Create a New Folder:
Choose Folder-Create
Importing Sources
Use the Source Analyzer to import or create source definitions for flat file, XML, COBOL,
ERP, and relational sources.
Double-click the title bar of the source definition for the table.
The Edit Tables dialog box opens and displays all the properties of this
source definition. The Table tab shows the name of the table, business name,
owner name, and the database type. You can add a comment in the Description section.
Note: To change the source table name, click Rename.
Click the Columns tab.
The Columns tab displays the column descriptions for the source. You
can modify the source definition, change or delete columns. Any changes
you make in this dialog box affect the source definition, not the source.
Creating Targets
You can create target definitions in the Warehouse Designer for file and relational
targets. Create definitions in the following ways:
Import the definition for an existing target. Import the target definition
from a relational target.
Create a target definition based on a source definition. Drag a source definition
into the Warehouse Designer to make a target definition.
Manually create a target definition.
Click the icon representing the EMPLOYEES source and drag it into the
workspace.
Double-click the Targets icon in the Navigator to open the list of all target
definitions.
Click and drag the icon for the T_EMPLOYEES target into the
workspace.
The target definition appears. The final step is connecting the Source
Qualifier to this target definition.
Transformations
A transformation is any part of a mapping that generates or modifies data. Every mapping
includes a Source Qualifier transformation, representing all the columns of information read
from a source and temporarily stored by the Informatica Server. In addition, you can add
transformations such as calculating a sum, looking up a value, or generating a unique ID that
modify information before it reaches the target.
When you build a mapping, you add transformations and configure them to handle data
according to your business purpose. Perform the following tasks to incorporate a
transformation into a mapping:
Create the transformation in the Designer.
Configure the transformation. Each type of transformation has a unique set of options
that you can configure.
Connect the transformation to other transformations and target definitions.
Transformation Descriptions
Aggregator (Active/Connected): Performs aggregate calculations.
Expression (Passive/Connected): Calculates a value.
External Procedure (Passive/Connected or Unconnected): Calls a procedure in a shared
library or in the COM layer of Windows.
Filter (Active/Connected): Filters records.
Input (Passive/Connected): Defines mapplet input rows.
Joiner (Active/Connected): Joins records from different databases or flat file systems.
Lookup (Passive/Connected or Unconnected): Looks up values.
Normalizer (Active/Connected): Normalizes records, including those read from COBOL
sources.
Output (Passive/Connected): Defines mapplet output rows.
Rank (Active/Connected): Limits records to a top or bottom range.
Sequence Generator (Passive/Connected): Generates primary keys.
Source Qualifier (Active/Connected): Represents the rows that the Informatica Server
reads from a relational or flat file source.
Router (Active/Connected): Routes data into multiple transformations based on group
conditions.
Stored Procedure (Passive/Connected or Unconnected): Calls a stored procedure.
Update Strategy (Active/Connected): Determines whether to insert, delete, update, or
reject records.
XML Source Qualifier (Passive/Connected): Represents the rows that the Informatica
Server reads from an XML source.
Transformations Toolbar
Aggregator Transformation
The Aggregator transformation allows you to perform aggregate
calculations, such as averages and sums. The Aggregator transformation
is unlike the Expression transformation, in that you can use the Aggregator
transformation to perform calculations on groups. The Expression
transformation permits you to perform calculations on a row-by-row basis
only.
When using the transformation language to create aggregate expressions,
you can use conditional clauses to filter records, providing more flexibility
than the SQL language.
The Informatica Server performs aggregate calculations as it reads, and
stores the necessary group and row data in an aggregate cache.
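For example, a conditional clause restricts which rows an aggregate function
processes. A minimal sketch, assuming hypothetical SALARY and COMMISSION ports:

    -- Sum commissions only for rows where the salary exceeds 40000
    SUM( COMMISSION, SALARY > 40000 )

The optional second argument acts as a filter: only rows where the condition
evaluates to TRUE are included in the aggregation.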
52
Sorted input. To increase session performance, you can use the sorted input option.
To use it, you must pass data to the Aggregator transformation sorted by group by port,
in ascending or descending order.
Aggregate cache. The Informatica Server stores data in the aggregate cache until it
completes aggregate calculations. It stores group values in an index cache and row
data in the data cache.
Aggregate Cache
When you run a session that uses an Aggregator transformation, the Informatica
Server creates index and data caches in memory to process the transformation. If
the Informatica Server requires more space, it stores overflow values in cache files.
You configure the cache parameters in the session properties.
To create an Aggregator transformation:
2. Enter a name for the Aggregator, click Create. Then click Done. The Designer
creates the Aggregator transformation.
4. Double-click the title bar of the transformation to open the Edit Transformations
dialog box.
5. Select the Group By option for each column you want the Aggregator to use in
creating groups. You can optionally enter a default value to replace null groups.
7. If you want to use a non-aggregate expression to modify groups, click the Add
button and enter a name and datatype for the port. Make the port an output port by
clearing Input (I). Click in the right corner of the Expression field, enter the
non-aggregate expression using one of the input ports, then click OK. Select Group By.
8. Click Add and enter a name and datatype for the aggregate expression port. Make
the port an output port by clearing Input (I). Click in the right corner of the
Expression field to open the Expression Editor. Enter the aggregate expression,
click Validate, then click OK. Make sure the expression validates before closing the
Expression Editor.
9. Add default values for specific ports as necessary. If certain ports are likely to
contain null values, you might specify a default value if the target database does
not handle null values.
The Aggregator transformation has the following properties:
Cache Directory: Local directory where the Informatica Server creates the index and data
caches and, if necessary, index and data files. By default, the Informatica Server uses
the directory entered in the Server Manager for the server variable $PMCacheDir. If you
enter a new directory, make sure the directory exists and contains enough disk space for
the aggregate caches.
Tracing Level: Amount of detail included in the session log when you run a session
containing this transformation.
Sorted Input: Indicates input data is presorted by groups. Select this option only if the
mapping passes data to the Aggregator that is sorted by the Aggregator group by ports
and by the same sort order configured for the session. Note: Use the Source Qualifier
Number of Sorted Ports option to sort relational sources.
Expression Transformation
You can use the Expression transformation to calculate values in a single row before
you write to the target.
For example, you might need to adjust employee salaries, concatenate first and last
names, or convert strings to numbers.
You can use the Expression transformation to perform any non-aggregate calculations.
You can also use the Expression transformation to test conditional statements before
you output the results to target tables or other transformations.
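For example, a single-row expression for an output port might concatenate two input
ports; FIRST_NAME and LAST_NAME here are hypothetical port names:

    -- Combine first and last names into one output value
    FIRST_NAME || ' ' || LAST_NAME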
Expression Transformation
Calculating Values
To use the Expression transformation to calculate values for a single row, you must include
the following ports:
Input or input/output ports. Provide one port for each value used in the calculation.
Output port. Create an output port for the expression. The return value for the output
port needs to match the return value of the expression.
Expression Transformation
Adding Multiple Calculations
You can enter multiple expressions in a single Expression transformation. As long as you
enter only one expression for each output port, you can create any number of output ports in
the transformation. In this way, you can use one Expression transformation rather than
creating separate transformations for each calculation that requires the same set of data.
For example, you might want to calculate several types of withholding taxes from each
employee paycheck, such as local and federal income tax, Social Security and Medicare.
Since all of these calculations require the employee salary, the withholding category, and/or
the corresponding tax rate, you can create one Expression transformation with the salary and
withholding category as input/output ports and a separate output port for each necessary
calculation.
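As a sketch, the transformation described above would carry one expression per output
port, all sharing the same input ports. The port and rate names are hypothetical:

    -- Output port LOCAL_TAX
    SALARY * LOCAL_TAX_RATE
    -- Output port FED_TAX
    SALARY * FED_TAX_RATE
    -- Output port FICA
    SALARY * FICA_RATE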
To create an Expression transformation:
1. Create the Expression transformation and add it to the mapping. Enter a name for it
(the convention is EXP_TransformationName) and click OK.
2. Create the input ports. If you have the input transformation available, you can
select Link Columns from the Layout menu and then click and drag each port used in
the calculation into the Expression transformation. With this method, the Designer
copies the port into the new transformation and creates a connection between the two
ports. Or, you can open the Edit dialog box and create each port manually.
Note: If you want to make this transformation reusable, you must create each port
manually within the transformation.
3. Repeat the previous step for each input port you want to add to the expression.
4. Create the output ports (O) you need, making sure to assign a port datatype that
matches the expression return value. The naming convention for output ports is
OUT_PORTNAME.
5. Click the small button that appears in the Expression section of the dialog box and
enter the expression in the Expression Editor, using the listed port names and
functions where possible.
If you select a port name that is not connected to the transformation, the Designer
copies the port into the new transformation and creates a connection between the two
ports.
Port names follow stricter rules than other expression components: a port name must
begin with a single- or double-byte letter or single- or double-byte underscore (_).
6. Check the expression syntax by clicking Validate. If necessary, make corrections to the
expression and check the syntax again. Then save the expression and exit the
Expression Editor.
Lookup Transformation
Use a Lookup transformation in your mapping to look up data in a relational table, view,
or synonym. Import a lookup definition from any relational database to which both the
Informatica Client and Server can connect. You can use multiple Lookup transformations
in a mapping.
The Informatica Server queries the lookup table based on the lookup ports in the
transformation. It compares Lookup transformation port values to lookup table column values
based on the lookup condition. Use the result of the lookup to pass to other transformations
and the target.
You can use the Lookup transformation to perform many tasks, including:
Get a related value. For example, if your source table includes employee ID, but
you want to include the employee name in your target table to make your summary
data easier to read.
Perform a calculation. Many normalized tables include values used in a calculation,
such as gross sales per invoice or sales tax, but not the calculated value (such as net
sales).
Update slowly changing dimension tables. You can use a Lookup transformation to
determine whether records already exist in the target.
Lookup Transformation
You can configure the Lookup transformation to perform different types of lookups. You can
configure the transformation to be connected or unconnected, cached or uncached:
Connected or unconnected. Connected and unconnected transformations receive input
and send output in different ways.
Cached or uncached. You can improve session performance by caching the lookup table.
If you cache the lookup table, you can choose to use a dynamic or static cache. By
default, the lookup cache remains static and does not change during the session. With a
dynamic cache, the Informatica Server inserts rows into the cache during the session.
This enables you to look up values in the target and insert rows into the cache when
they do not exist.
When you run a session with a connected Lookup transformation, the Informatica Server
queries the lookup table or cache for each input row, based on the lookup ports and the
condition in the transformation, and returns values from the lookup query. If the
transformation uses a dynamic cache, the Informatica Server inserts the row into the
cache when the lookup query does not find the row in the cache. It flags the row as new
or existing, based on the result of the lookup query.
The Lookup transformation passes return values from the query to the next
transformation. If the transformation uses a dynamic cache, you can pass rows to a
Filter or Router transformation to filter new rows to the target.
With an unconnected Lookup transformation, the Informatica Server queries the lookup
table or cache based on the lookup ports and condition in the transformation, and
returns one value into the return port of the Lookup transformation. The Lookup
transformation passes the return value into the :LKP expression.
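For example, an expression in another transformation might call an unconnected Lookup
transformation with the :LKP reference qualifier. LKP_GET_NAME and EMPLOYEE_ID are
hypothetical names:

    -- Call the unconnected lookup and capture its return value
    :LKP.LKP_GET_NAME( EMPLOYEE_ID )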
Connected Lookup
Unconnected Lookup
Lookup Components
When you configure a Lookup transformation in a mapping, you define the
following components:
Lookup table
Ports
Properties
Condition
Lookup Table
You can import a lookup table from the mapping source or target database, or
you can import a lookup table from any database that both the Informatica
Server and Client machine can connect to. If your mapping includes
heterogeneous joins, you can use any of the mapping sources or mapping
targets as the lookup table.
The lookup table can be a single table, or you can join multiple tables in the
same database using a lookup query override. The Informatica Server queries
the lookup table or an in-memory cache of the table for all incoming rows into
the Lookup transformation.
Connect to the database to import the lookup table definition. The Informatica
Server can connect to a lookup table using a native database driver or an ODBC
driver. However, native database drivers improve session performance.
Lookup Table
Indexes and a Lookup Table
If you have privileges to modify the database containing a lookup table, you can
improve lookup initialization time by adding an index to the lookup table. This is
important for very large lookup tables. Since the Informatica Server needs to
query, sort, and compare values in these columns, the index needs to include
every column used in a lookup condition.
You can improve performance by adding indexes for the following lookups:
Cached lookups. You can improve performance by indexing the columns in the
lookup ORDER BY. The session log contains the ORDER BY statement.
Uncached lookups. Because the Informatica Server issues a SELECT statement for
each row passing into the Lookup transformation, you can improve performance by
indexing the columns in the lookup condition.
Lookup Ports
The Ports tab contains options similar to other transformations, such as port
name, datatype, and scale. In addition to input and output ports, the Lookup
transformation includes a lookup port type that represents columns of data in
the lookup table. An unconnected Lookup transformation also includes a return
port type that represents the return value.
Input port (connected or unconnected lookups; minimum of one): Create an input port
for each lookup port you want to use in the lookup condition. You must have at least
one input or input/output port in each Lookup transformation.
Output port (connected or unconnected lookups; minimum of one): Create an output port
for each lookup port you want to link to another transformation. You can designate
both input and lookup ports as output ports. For connected lookups, you must have at
least one output port. For unconnected lookups, use the return port (R) to designate
a return value.
Lookup port (connected or unconnected lookups; minimum of one): The Designer
designates each column in the lookup table as a lookup (L) and output (O) port.
Return port (unconnected lookups only; exactly one): Designates the column of data
you want to return based on the lookup condition.
The Lookup transformation has the following properties:
Lookup SQL Override: Overrides the default SQL statement the Informatica Server uses to
query lookup values. Use only with the lookup cache enabled.
Lookup Table Name: Specifies the name of the table from which the transformation looks up
and caches values. You can import a table, view, or synonym from another database by
selecting the Import button on the dialog box that displays when you first create a Lookup
transformation. If you enter a lookup SQL override, you do not need to add an entry for
this option.
Lookup Caching Enabled: Indicates whether the Lookup transformation caches lookup values
during the session. When lookup caching is enabled, the Informatica Server queries the
lookup table once, caches the values, and looks up values in the cache during the session.
This can improve session performance. When you disable caching, each time a row passes
into the transformation, the Informatica Server issues a select statement to the lookup
table for lookup values.
Lookup Policy on Multiple Match: Available for Lookup transformations that are uncached or
use a static cache. Determines what happens when the Lookup transformation finds multiple
rows that match the lookup condition. You can select the first or last record returned
from the cache or lookup table, or report an error. The Informatica Server fails a session
when it encounters a multiple match while processing a Lookup transformation with a
dynamic cache.
Lookup Condition: Displays the lookup condition you set on the Condition tab.
Location Information: Specifies the database containing the lookup table. You can select
the exact database or you can use the $Source or $Target variable. If you use one of these
variables, the lookup table must reside in the source or target database you specify when
you configure the session. When you have more than one relational source in the mapping,
the session fails if you use $Source.
Source Type: Indicates that the Lookup transformation reads values from a relational
database.
Recache if Stale: The Recache from Database option replaces the Recache if Stale and
Lookup Cache Initialize options.
Tracing Level: Sets the amount of detail included in the session log when you run a
session containing this transformation.
Lookup Cache Directory Name: Specifies the directory used to build the lookup cache files
when the Lookup transformation is configured to cache the lookup table. Also used to save
the persistent lookup cache files when the Lookup Persistent option is selected. By
default, the Informatica Server uses the $PMCacheDir directory configured for the
Informatica Server.
Lookup Cache Initialize: The Recache from Database option replaces the Lookup Cache
Initialize and Recache if Stale options.
Lookup Cache Persistent: Indicates whether the Informatica Server uses a persistent lookup
cache, which consists of at least two cache files. If a Lookup transformation is configured
for a persistent lookup cache and persistent lookup cache files do not exist, the
Informatica Server creates the files during the session. You can use this only when you
enable lookup caching.
Lookup Data Cache Size: Indicates the maximum size the Informatica Server allocates to the
data cache in memory. If the Informatica Server cannot allocate the configured amount of
memory when initializing the session, it fails the session. When the Informatica Server
cannot store all the data cache data in memory, it pages to disk as necessary. The Lookup
Data Cache Size is 2,000,000 bytes by default. The minimum size is 1,024 bytes. Use only
with the lookup cache enabled.
Lookup Index Cache Size: Indicates the maximum size the Informatica Server allocates to the
index cache in memory. If the Informatica Server cannot allocate the configured amount of
memory when initializing the session, it fails the session. When the Informatica Server
cannot store all the index cache data in memory, it pages to disk as necessary. The Lookup
Index Cache Size is 1,000,000 bytes by default. The minimum size is 1,024 bytes. Use only
with the lookup cache enabled.
Dynamic Lookup Cache: Indicates to use a dynamic lookup cache. Inserts new rows into the
lookup cache as it passes rows to the target table. You can use this only when you enable
lookup caching.
Cache File Name Prefix: Specifies the file name prefix to use with persistent lookup cache
files. The Informatica Server uses the file name prefix as the file name for the persistent
cache files it saves to disk. Only enter the prefix. Do not enter .idx or .dat. If the
named persistent cache files exist, the Informatica Server builds the memory cache from
the files. If the named persistent cache files do not exist, the Informatica Server
rebuilds the persistent cache files. Use only with persistent lookup cache.
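As a hedged sketch of the Lookup SQL Override property described above, the override is
simply a SELECT statement that replaces the default lookup query; the table and column
names here are hypothetical:

    -- Join two tables in the lookup source instead of caching one table
    SELECT e.EMPLOYEE_ID, e.NAME, d.DEPT_NAME
    FROM EMPLOYEES e, DEPARTMENTS d
    WHERE e.DEPT_ID = d.DEPT_ID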
Lookup Condition
The Informatica Server uses the lookup condition to test incoming values. It is similar to the
WHERE clause in an SQL query. When you configure a lookup condition for the
transformation, you compare transformation input values with values in the lookup table or
cache, represented by lookup ports. When you run a session, the Informatica Server queries
the lookup table or cache for all incoming values based on the condition.
You must enter a lookup condition in all Lookup transformations. Some guidelines for the
lookup condition apply for all Lookup transformations, and some guidelines vary depending
on how you configure the transformation.
Lookup Condition
Use the following guidelines when you enter a condition for any Lookup transformation:
The datatypes in a condition must match.
Use one input port for each lookup port used in the condition. You can use the same
input port in more than one condition in a transformation.
The Informatica Server matches null values. If an input lookup condition column is
NULL, the Informatica Server evaluates the NULL equal to a NULL in the lookup table.
The lookup condition guidelines and the way the Informatica Server processes matches
varies depending on whether you configure the transformation for a dynamic cache or for an
uncached or static cache.
When you configure a Lookup transformation to use a static cache, or not to cache, you
can use the following operators when you create the lookup condition: =, >, <, >=, <=, !=
If you include more than one lookup condition, place the conditions with an equal sign
first to optimize lookup performance. The input value must meet all conditions for the
lookup to return a value.
The condition can match equivalent values or supply a threshold condition. For example, you
might look for customers who do not live in California, or employees whose salary is greater
than $30,000. Depending on the nature of the source and condition, the Lookup might return
multiple values.
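For example, a lookup condition pairing lookup ports with transformation input ports might
look like the following sketch, with the equality condition listed first; the port names
are hypothetical:

    -- Equality condition first, threshold condition second
    EMPLOYEE_ID = IN_EMPLOYEE_ID
    SALARY >= IN_MIN_SALARY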
When the lookup condition matches multiple rows, you can configure the transformation
either to return the first matching value or the last matching value. The first and last
values are the first values and last values found in the lookup cache that match the lookup
condition. When you cache the lookup table, the Informatica Server determines which record
is first and which is last by generating an ORDER BY clause for each column in the lookup
cache. It then sorts each lookup source column in the lookup condition in ascending order:
numeric columns in ascending numeric order (such as 0 to 10), and date/time columns from
January to December and from the first of the month to the end of the month.
You can also configure the transformation to report an error. The Informatica Server
returns the default value for the output ports.
Note: The Informatica Server fails the session when it encounters multiple keys for a Lookup
transformation configured to use a dynamic cache.
Lookup Caches
The Informatica Server builds a cache in memory when it processes the first row of data in a
cached Lookup transformation. It allocates memory for the cache based on the amount you
configure in the transformation or session properties. The Informatica Server stores condition
values in the index cache and output values in the data cache. The Informatica Server
queries the cache for each row that enters the transformation.
The Informatica Server also creates cache files by default in the $PMCacheDir. If the data
does not fit in the memory cache, the Informatica Server stores the overflow values in the
cache files. When the session completes, the Informatica Server releases cache memory and
deletes the cache files unless you configure the Lookup transformation to use a persistent
cache.
When configuring a lookup cache, you can specify any of the following options:
Persistent cache. You can save the lookup cache files and reuse them the next time
the Informatica Server processes a Lookup transformation configured to use the cache.
Recache from Database. If the persistent cache is not synchronized with the lookup
table, you can configure the Lookup transformation to rebuild the lookup cache.
Lookup Caches
Static cache. You can configure a static, or read-only, cache for any lookup table.
By default, the Informatica Server creates a static cache. It caches the lookup table
and looks up values in the cache for each row that comes into the transformation.
When the lookup condition is true, the Informatica Server returns a value from the
lookup cache. The Informatica Server does not update the cache while it processes
the Lookup transformation.
Dynamic cache. If you want to cache the target table and insert new rows into the
cache and the target, you can create a Lookup transformation to use a dynamic
cache. The Informatica Server dynamically inserts data into the lookup cache and
passes data to the target table.
Shared cache. You can share the lookup cache between multiple transformations.
You can share an unnamed cache between transformations in the same mapping.
You can share a named cache between transformations in the same or different
mappings.
To create a Lookup transformation:
2. In the Select Lookup Table dialog box, you can choose the lookup table. Click
the Import button if the lookup table is not in the source or target database.
3. If you want to manually define the Lookup transformation, click the Skip button.
4. Define input ports for each lookup condition you want to define.
5. For an unconnected Lookup transformation, create a return port for the value
you want to return from the lookup.
6. Define output ports for the values you want to pass to another transformation.
7. Add the lookup conditions. If you include more than one condition, place the
conditions using equal signs first to optimize lookup performance.
8. On the Properties tab, set the properties for the lookup. Click OK.
To create a Sequence Generator transformation:
2. Enter a name for the Sequence Generator, click Create. Then click Done. The
Designer creates the Sequence Generator transformation.
3. Double-click the title bar of the transformation to open the Edit Transformations
dialog box.
Stored Procedure Transformation
You might use a stored procedure to perform a task such as checking the status of a
target database before moving records into it.
A Stored Procedure transformation can be used in two ways:
Connected
Unconnected
The type you use depends on what your stored procedure does, and how often
the stored procedure should run in a mapping.
Connected
The flow of data through a mapping in connected mode also passes through the Stored
Procedure transformation. All data entering the transformation through the input ports
affects the stored procedure. You should use a connected stored procedure when you
need data from an input port sent as an input parameter to the stored procedure, or the
results of a stored procedure sent as an output parameter to another transformation.
Unconnected
The unconnected Stored Procedure transformation is not connected directly to
the flow of the mapping. It either runs before or after the session, or is called by
an expression in another transformation in the mapping.
Join data originating from the same source database. You can join two or more tables
with primary-foreign key relationships by linking the sources to one Source Qualifier.
Filter records when the Informatica Server reads source data. If you include a filter
condition, the Informatica Server adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join. If you include a user-defined
join, the Informatica Server replaces the join information specified by the metadata in the
SQL query.
Specify sorted ports. If you specify a number for sorted ports, the Informatica Server adds
an ORDER BY clause to the default SQL query.
Select only distinct values from the source. If you choose Select Distinct, the
Informatica Server adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Informatica
Server to read source data. For example, you might use a custom query to perform
aggregate calculations or execute a stored procedure.
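For instance, with a source filter and two sorted ports configured, the default query
might be modified roughly as follows; the table and column names are hypothetical:

    -- Default query plus a WHERE clause (source filter) and ORDER BY (sorted ports)
    SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.NAME, CUSTOMERS.STATE
    FROM CUSTOMERS
    WHERE CUSTOMERS.STATE = 'CA'
    ORDER BY CUSTOMERS.CUSTOMER_ID, CUSTOMERS.NAME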
The Source Qualifier transformation has the following options:
SQL Query: Defines a custom query that replaces the default query the Informatica Server
uses to read data from sources represented in this Source Qualifier.
User-Defined Join: Specifies the condition used to join data from multiple sources
represented in the same Source Qualifier transformation.
Source Filter: Specifies the filter condition the Informatica Server applies when querying
records.
Number of Sorted Ports: Indicates the number of columns used when sorting records queried
from relational sources. If you select this option, the Informatica Server adds an ORDER BY
to the default query when it reads source records. The ORDER BY includes the number of
ports specified, starting from the top of the Source Qualifier. When selected, the database
sort order must match the session sort order.
Tracing Level: Sets the amount of detail included in the session log when you run a session
containing this transformation.
Select Distinct: Specifies if you want to select only unique records. The Informatica
Server includes a SELECT DISTINCT statement if you choose this option.
Filter Transformation
The Filter transformation provides the means for filtering rows in a mapping. You
pass all the rows from a source transformation through the Filter transformation,
and then enter a filter condition for the transformation. All ports in a Filter
transformation are input/output, and only rows that meet the condition pass
through the Filter transformation.
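A filter condition is any expression that evaluates to TRUE or FALSE for each row. For
example, assuming a hypothetical SALARY port:

    -- Pass only rows where the salary exceeds 30000
    SALARY > 30000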
Select and drag all the desired ports from a source qualifier or other transformation
to add them to the Filter transformation. After you select and drag ports, copies of
these ports appear in the Filter transformation. Each column has both an input and
an output port.
Click the Properties tab. A default condition appears in the list of conditions. The
default condition is TRUE (a constant with a numeric value of 1).
Joiner Transformation
While a Source Qualifier transformation can join data originating from a common source
database, the Joiner transformation joins two related heterogeneous sources residing in
different locations or file systems. The combination of sources can be varied: for example,
two relational tables in separate databases, or a relational table and a flat file.
If two relational sources contain keys, then a Source Qualifier transformation can easily join
the sources on those keys. Joiner transformations typically combine information from two
different sources that do not have matching keys, such as flat file sources.
The Joiner transformation does not allow you to join sources that contain binary data.
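The join itself is defined by a condition that matches ports from the master and detail
sources, for example (the port names are hypothetical):

    -- Match the detail source's DEPT_ID against the master's DEPT_ID1
    DEPT_ID1 = DEPT_ID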
The Joiner transformation has the following properties:
Case-Sensitive String Comparison: If selected, the Informatica Server uses case-sensitive
string comparisons when performing joins on string columns.
Cache Directory: Specifies the directory used to cache master records and the index to
these records. By default, the caches are created in a directory specified by the server
variable $PMCacheDir. If you override the directory, be sure there is enough disk space on
the file system. The directory can be a mapped or mounted drive.
Join Type: Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
Null Ordering in Master: Specifies how the Informatica Server sorts null values in the
master source.
Tracing Level: Amount of detail displayed in the session log for this transformation. The
options are Terse, Normal, Verbose Data, and Verbose Initialization.
Rank Transformation
The Rank transformation allows you to select only the top or bottom rank of data. You can
use a Rank transformation to return the largest or smallest numeric value in a port or
group. You can also use a Rank transformation to return the strings at the top or the bottom
of a session sort order. During the session, the Informatica Server caches input data until it
can perform the rank calculations.
The Rank transformation differs from the transformation functions MAX and MIN, in that it
allows you to select a group of top or bottom values, not just one value. For example, you
can use Rank to select the top 10 salespersons in a given territory. Or, to generate a
financial report, you might also use a Rank transformation to identify the three departments
with the lowest expenses in salaries and overhead. While the SQL language provides
many functions designed to handle groups of data, identifying top or bottom strata within a
set of rows is not possible using standard SQL functions.
Click OK, and then click Done. The Designer creates the Rank transformation.
Click the Ports tab, and then select the Rank (R) option for the port used to
measure ranks.
The Rank transformation has the following properties:
Cache Directory: Local directory where the Informatica Server creates the index and data
caches and, if necessary, index and data files. By default, the Informatica Server uses the
directory entered in the Server Manager for the server variable $PMCacheDir. If you enter a
new directory, make sure the directory exists and contains enough disk space for the rank
caches.
Top/Bottom: Specifies whether you want the top or bottom ranking for a column.
Number of Ranks: Number of records you want to rank.
Case-Sensitive String Comparison: When running in Unicode mode, the Informatica Server
ranks strings based on the sort order selected for the session. If the session sort order
is case-sensitive, select this option to enable case-sensitive string comparisons, and
clear this option to have the Informatica Server ignore case for strings. If the sort order
is not case-sensitive, the Informatica Server ignores this setting. By default, this option
is selected.
Tracing Level: Determines the amount of information the Informatica Server writes to the
session log about data passing through this transformation during a session.
Router Transformation
A Router transformation is similar to a Filter transformation because both transformations
allow you to use a condition to test data. A Filter transformation tests data for one condition
and drops the rows of data that do not meet the condition. However, a Router
transformation tests data for one or more conditions and gives you the option to route rows
of data that do not meet any of the conditions to a default output group.
If you need to test the same input data based on multiple conditions, use a Router
Transformation in a mapping instead of creating multiple Filter transformations to perform
the same task. The Router transformation is more efficient when you design a mapping and
when you run a session. For example, to test data based on three conditions, you only
need one Router transformation instead of three Filter transformations to perform this task.
Likewise, when you use a Router transformation in a mapping, the Informatica Server
processes the incoming data only once. When you use multiple Filter transformations in a
mapping, the Informatica Server processes the incoming data for each transformation.
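Each user-defined output group gets its own group filter condition. For three conditions,
you might define something like the following sketch (DEPTNAME is a hypothetical port):

    -- One group filter condition per user-defined group
    DEPTNAME = 'SALES'
    DEPTNAME = 'FINANCE'
    DEPTNAME = 'HR'
    -- Rows matching none of the conditions route to the default group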
A Router transformation has the following types of groups: input and output.
Input Group
The Designer copies property information from the input ports of the input group to
create a set of output ports for each output group.
Output Groups
There are two types of output groups:
User-defined groups
Default group
To create a Router transformation:
2. Choose Transformation-Create. Select Router transformation, and enter the name of the
new transformation. The naming convention for the Router transformation is
RTR_TransformationName. Click Create, and then click Done.
3. Select and drag all the desired ports from a transformation to add them to the Router
transformation, or you can manually create input ports on the Ports tab.
4. Double-click the title bar of the Router transformation to edit transformation properties.
7. Click the Groups tab, and then click the Add button to create a user-defined group. The
Designer creates the default group when you create the first user-defined group.
8. Click the Group Filter Condition field to open the Expression Editor.
Within a session. When you configure a session, you can instruct the Informatica Server
to either treat all records in the same way (for example, treat all records as inserts), or use
instructions coded into the session mapping to flag records for different database
operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag
records for insert, delete, update, or reject.
When you configure a session, you can select one of the following settings:
Insert: Treat all records as inserts. If inserting the record violates a primary or foreign
key constraint in the database, the Informatica Server rejects the record.
Delete: Treat all records as deletes. For each record, if the Informatica Server finds a
corresponding record in the target table (based on the primary key value), the Informatica
Server deletes it. Note that the primary key constraint must exist in the target definition
in the repository.
Update: Treat all records as updates. For each record, the Informatica Server looks for a
matching primary key value in the target table. If it exists, the Informatica Server
updates the record. Again, the primary key constraint must exist in the target definition.
Data Driven: The Informatica Server follows instructions coded into Update Strategy
transformations within the session mapping to determine how to flag records for insert,
delete, update, or reject. If the mapping for the session contains an Update Strategy
transformation, this field is marked Data Driven by default. If you do not choose the Data
Driven setting, the Informatica Server ignores all Update Strategy transformations in the
mapping.
Choose a setting based on how you want to handle existing target data:
Insert: Populate the target tables for the first time, or maintain a historical data
warehouse. In the latter case, you must set this strategy for the entire data warehouse,
not just a select group of target tables.
Delete: Clear target tables.
Update: Update target tables. You might choose this setting whether your data warehouse
contains historical data or a snapshot. Later, when you configure how to update individual
target tables, you can determine whether to insert updated records as new records or use
the updated information to modify existing records in the target.
Data Driven: Exert finer control over how you flag records for insert, delete, update, or
reject. Choose this setting if records destined for the same table need to be flagged on
occasion for one operation (for example, update), or for a different operation (for
example, reject). In addition, this setting provides the only way you can flag records for
reject.
When you configure a session, you can also choose how the Informatica Server handles rows
flagged for update:
Update as Update: Update each row flagged for update if it exists in the target table.
Update as Insert: Insert each row flagged for update.
Update else Insert: Update the row if it exists in the target. Otherwise, insert it.
To flag records in an Update Strategy expression, use one of the following constants
(numeric value in parentheses):
Insert: DD_INSERT (0)
Update: DD_UPDATE (1)
Delete: DD_DELETE (2)
Reject: DD_REJECT (3)
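Within the transformation, the update strategy expression uses these constants to flag each
row. A minimal sketch, where the condition and port names are hypothetical:

    -- Flag changed rows for update, everything else for insert
    IIF( ENTRY_DATE > APPLY_DATE, DD_UPDATE, DD_INSERT )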
To create an Update Strategy transformation:
3. Click and drag across the area where you want the transformation to appear. When you
release the mouse button, a new Update Strategy transformation appears.
5. Click and drag all the ports from another transformation representing data you want to
pass through the Update Strategy transformation. In the Update Strategy transformation,
the Designer creates a copy of each port you click and drag. The Designer also connects
the new port to the original port. Each port in the Update Strategy transformation is a
combination input/output port. Normally, you would select all of the columns destined for
a particular target. After they pass through the Update Strategy transformation, this
information is flagged for update, insert, delete, or reject.
7. Click Rename, enter a descriptive name, and click OK. The naming convention for Update
Strategy transformations is UPD_TransformationName.
9. Click the button in the Update Strategy Expression field. The Expression Editor appears.
10. Enter an update strategy expression to flag records as inserts, deletes, updates, or
rejects.
13. Connect the ports in the Update Strategy transformation to another transformation or a
target instance.
14. Choose Repository-Save.
Normalizer Transformation
The Normalizer transformation normalizes records from COBOL and relational sources, allowing you to
organize the data according to your own needs. A Normalizer transformation can appear anywhere in a
data flow when you normalize a relational source. Use a Normalizer transformation instead of the
Source Qualifier transformation when you normalize a COBOL source. When you drag a COBOL
source into the Mapping Designer workspace, the Normalizer transformation automatically appears,
creating input and output ports for every column in the source.
You primarily use the Normalizer transformation with COBOL sources, which are often stored in a
denormalized format. The OCCURS statement in a COBOL file nests multiple records of information in
a single record. Using the Normalizer transformation, you break out repeated data within a record into
separate records. For each new record it creates, the Normalizer transformation generates a unique
identifier. You can use this key value to join the normalized records.
You can also use the Normalizer transformation with relational sources to create multiple rows from a
single row of data.
External Procedure
Single return value: one row in, one row out. Each input row produces one or zero output
rows.
Mapplets
A mapplet is a reusable object that represents a set of transformations. It allows you to reuse
transformation logic and can contain as many transformations as you need. You create mapplets in the
Mapplet Designer.
Create a mapplet when you want to use a standardized set of transformation logic in several mappings.
For example, if you have several fact tables that require a series of dimension keys, you can create a
mapplet containing a series of Lookup transformations to find each dimension key. You can then use
the mapplet in each fact table mapping, rather than recreate the same lookup logic in each mapping.
To create a mapplet, you add, connect, and configure transformations to complete the desired
transformation logic.
After you save a mapplet, you can use it in a mapping to represent the transformations within the
mapplet. When you use a mapplet in a mapping, you use an instance of the mapplet. Like a reusable
transformation, any changes made to the mapplet are automatically inherited by all instances of the
mapplet.
Mapplet Input
Data passing through a mapplet comes from a source. Source data for a mapplet can
originate from one of two places:
Sources within the mapplet. Mapplet input can originate from within the mapplet if you
include one or more source definitions in the mapplet. When you use more than one
source definition in a mapplet, you must connect the sources to a single Source Qualifier or
ERP Source Qualifier transformation. When you use the mapplet in a mapping, the
mapplet provides source data for the mapping.
Sources outside the mapplet. Mapplet input can originate from outside a mapplet if you
include an Input transformation to define mapplet input ports. When you use the mapplet in
a mapping, data passes through the mapplet as part of the mapping pipeline.
Source Qualifier. When you include source definitions in a mapplet, you must connect them
to a Source Qualifier or ERP Source Qualifier transformation. You cannot connect sources
to a Normalizer transformation. You cannot use COBOL, MQ, or XML source definitions in a
mapplet.
Mapplet Output
To pass data out of a mapplet, you create mapplet output ports. To create mapplet output ports, you add
Output transformations to the mapplet. Each port in an Output transformation connected to another
transformation in the mapplet becomes a mapplet output port. Each mapplet must contain at least one
Output transformation, and at least one port in the Output transformation must be connected within the
mapplet.
Each Output transformation in a mapplet represents a group of mapplet output ports, or output group.
Each output group can pass data to a single pipeline in the mapping. To pass data from a mapplet to
more than one pipeline, create an Output transformation for each pipeline.
When you use a mapplet in a mapping, you connect ports in each output group to different pipelines.
You do not have to use all mapplet output ports in a mapping, but you must use at least one.
Creating a Dimension
Before you can create a cube, you need to create dimensions. Complete each of the
following steps to create a dimension:
1. Enter a dimension description.
2. Add levels to the dimension.
3. Add hierarchies to the dimension.
4. Add level instances to the hierarchies.
When you create the dimension, enter the following information:
Name and description of the dimension.
Database type. The database type of a dimension must match the database type of the
cube. Note: You cannot change the database type once you create the dimension.
Click OK.
After you create the dimension, add as many levels as needed. Levels hold the properties necessary to
create target tables.
1. In the Dimension Editor, select Levels and click Add Level.
5. Select a source table from which you want to copy columns to the level. The columns
display in the Source Fields section.
2. Database type. The database type for the cube must match the database type for
the dimensions in the cube.
3. Click Next.
You can view the metadata for cubes and dimensions in the Repository Manager.
Mapping Wizards
The Designer provides two mapping wizards to help you create mappings quickly and easily. Both
wizards are designed to create mappings for loading and maintaining star schemas, a series of
dimensions related to a central fact table. You can, however, use the generated mappings to load
other types of targets.
You choose a different wizard and different options in each wizard based on the type of target you
want to load and the way you want to handle historical data in the target:
Getting Started Wizard. Creates mappings to load static fact and dimension tables, as
well as slowly growing dimension tables.
Slowly Changing Dimensions Wizard. Creates mappings to load slowly changing dimension
tables based on the amount of historical dimension data you want to keep and the method
you choose to handle historical dimension data.
After using a mapping wizard, you can edit the generated mapping to further customize it.
The Getting Started Wizard creates mappings to load static fact and dimension tables, as well as
slowly growing dimension tables.
The Getting Started Wizard can create two types of mappings:
Simple Pass Through. Loads a static fact or dimension table by inserting all rows. Use
this mapping when you want to drop all existing data from your table before loading new
data.
Slowly Growing Target. Loads a slowly growing fact or dimension table by inserting new
rows. Use this mapping to load new data when existing data does not require updates.
The Slowly Changing Dimensions Wizard can create the following types of mappings:
Type 1 Dimension mapping. Loads a slowly changing dimension table by inserting new
dimensions and overwriting existing dimensions. Use this mapping when you do not want a
history of previous dimension data.
Type 2 Dimension/Effective Date Range mapping. Loads a slowly changing dimension table
by inserting new and changed dimensions using a date range to define current dimension data.
Use this mapping when you want to keep a full history of dimension data, tracking changes with
an exact effective date range.
Type 3 Dimension mapping. Loads a slowly changing dimension table by inserting new
dimensions and updating values in existing dimensions. Use this mapping when you want to
keep the current and previous dimension values in your dimension table.
Enter a name for the mapping target table. Click Next. The naming convention for target
definitions is T_TARGET_NAME.
Select the column or columns from the Target Table Fields list that you want the
Informatica Server to use to look up data in the target table. Click Add.
The wizard adds selected columns to the Logical Key Fields list.
Tip: The columns you select should be key columns in the source.
When you run the session, the Informatica Server performs a lookup on existing target
data. The Informatica Server returns target data when Logical Key Fields columns match
corresponding target columns.
To remove a column from Logical Key Fields, select the column and click Remove.
Note: The Fields to Compare for Changes field is disabled for the Slowly Growing
Targets mapping.
The Slowly Growing Target mapping flags new source rows, and then inserts them into the target with a new primary key. The mapping uses an Update Strategy transformation to indicate that new rows must be inserted. Therefore, when you create a session for the mapping, configure the session as follows:
For the source, set Treat Rows As to Data Driven and select the source database.
To ensure rows are inserted into the target properly, click the Target Options button to access the Targets dialog box and select Insert.
The wizard creates the following transformations in the Slowly Growing Target mapping:
SQ_SourceName (Source Qualifier or ERP Source Qualifier). Selects all rows from the source you choose in the Mapping Wizard.
LKP_GetData (Lookup). Performs a lookup on existing target data. Returns target data when Logical Key Fields columns match corresponding target columns.
EXP_DetectChanges (Expression). Uses the following expression to flag source rows that have no matching key in the target (indicating they are new): IIF(ISNULL(PM_PRIMARYKEY),TRUE,FALSE). Populates the NewFlag field with the result and passes all rows to FIL_InsertNewRecord.
FIL_InsertNewRecord (Filter). Uses the filter condition NewFlag to filter out any rows from EXP_DetectChanges that are not marked new (TRUE). Passes new rows to UPD_ForceInserts.
UPD_ForceInserts (Update Strategy). Flags rows for insert so that new rows are written to the target.
SEQ_GenerateKeys (Sequence Generator). Generates a value for each new row written to the target, incrementing values by 1. Passes values to the target to populate the PM_PRIMARYKEY column.
T_TargetName (Target Definition). Instance of the target definition for new rows to be inserted into the target.
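To illustrate (a hypothetical walk-through using only the objects named above): when the Informatica Server reads a source row whose key does not yet exist in the target, LKP_GetData finds no match, so PM_PRIMARYKEY is NULL and EXP_DetectChanges sets NewFlag to TRUE. FIL_InsertNewRecord passes the row, UPD_ForceInserts marks it for insert, SEQ_GenerateKeys supplies the next key value, and the row is written to T_TargetName. A row whose key already exists receives NewFlag = FALSE and is filtered out, leaving the existing target row unchanged.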
The Type 1 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions into the target. Rows containing changes to
existing dimensions are updated in the target by overwriting the existing dimension. In the
Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you
do not need to keep any previous versions of dimensions in the table.
Select a source definition to be used by the mapping. All available source definitions appear in the Select Source Table list. This list includes shortcuts, flat file, relational, and ERP sources.
Enter a name for the mapping target table. Click Next. The naming convention for target
definitions is T_TARGET_NAME.
Select the column or columns you want to use as a lookup condition from the Target Table
Fields list and click Add.
The wizard adds selected columns to the Logical Key Fields list.
Tip: The columns you select should be key columns in the source.
When you run the session, the Informatica Server performs a lookup on existing target data.
The Informatica Server returns target data when Logical Key Fields columns match
corresponding target columns.
To remove a column from Logical Key Fields, select the column and click Remove.
The Type 1 Dimension mapping inserts new rows with a new primary key and updates
existing rows. When you create a session for the mapping, configure the session as
follows:
For the source, set Treat Rows As to Data Driven and select the source
database.
Select the target database. Then to ensure the Informatica Server loads
rows to the target properly, click the Target Options button. Select Insert and
Update (as Update).
The Type 2 Dimension/Version Data mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are
tracked in the target table by versioning the primary key and creating a version number for
each dimension in the table. In the Type 2 Dimension/Version Data target, the current
version of a dimension has the highest version number and the highest incremented primary
key of the dimension.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension
table when you want to keep a full history of dimension data in the table. Version numbers
and versioned primary keys track the order of changes to each dimension.
When you use this option, the Designer creates two additional fields in the target:
PM_PRIMARYKEY. The Informatica Server generates a primary key for each row written to the target.
PM_VERSION_NUMBER. The Informatica Server generates a version number for each row written to the target, so you can track the order of changes to each dimension.
Handling Keys
When you use the Version Data option, the Informatica Server generates a new primary key value for each new dimension it inserts. For example, suppose the Informatica Server inserts the item Sandal with a key of 65,000. The next time you run the session, the same item has a different number of styles. The Informatica Server creates a new row with updated style information and increases the existing key by 1 to create a new key of 65,001. Both rows exist in the target, but the row with the higher key version contains current dimension data.

PM_PRIMARYKEY  ITEM    STYLES
65000          Sandal  5
65001          Sandal  14
Numbering Versions
In addition to incrementing the primary key, the Informatica Server generates a version number for each new version of a dimension it writes to the target. For example, successive versions of the Sandal dimension with 5, 14, and 17 styles receive increasing version numbers, and the current version holds the highest version number.
The Type 2 Dimension/Version Data mapping performs the following tasks:
Compares logical key columns in the source against corresponding columns in the target lookup table
Creates two data flows: one for new rows, one for changed rows
Increments the primary key and version number for changed rows
Enter a mapping name and select Type 2 Dimension. Click Next. The naming
convention for mappings is mMappingName.
Select the column or columns you want to use as a lookup condition from the Target Table Fields
list and click Add.
The Type 2 Dimension/Version Data mapping inserts both new and updated rows with
a unique primary key. When you create a session for the mapping, configure the
session as follows:
For the source, set Treat Rows As to Data Driven and select the source
database.
To ensure rows are inserted into the target properly, click the Target Options
button to access the Targets dialog box and select Insert.
The Type 2 Dimension/Flag Current mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are tracked
in the target table by flagging the current version of each dimension and versioning the primary
key. In the Type 2 Dimension/Flag Current target, the current version of a dimension has a current
flag set to 1 and the highest incremented primary key.
Use the Type 2 Dimension/Flag Current mapping to update a slowly changing dimension table
when you want to keep a full history of dimension data in the table, with the most current data
flagged. Versioned primary keys track the order of changes to each dimension.
When you use this option, the Designer creates two additional fields in the target:
PM_CURRENT_FLAG. The Informatica Server flags the current row 1 and all
previous versions 0.
PM_PRIMARYKEY. The Informatica Server generates a primary key for each row
written to the target.
The Type 2 Dimension/Flag Current mapping performs the following tasks:
Compares logical key columns in the source against corresponding columns in the target lookup table
Creates two data flows: one for new rows, one for changed rows
Increments the existing primary key and sets the current flag for changed rows
Updates existing versions of the changed rows in the target, resetting the current flag to indicate the row is no longer current
The Type 2 Dimension/Effective Date Range mapping filters source rows based on user-defined
comparisons and inserts both new and changed dimensions into the target. Changes are tracked
in the target table by maintaining an effective date range for each version of each dimension in the
target. In the Type 2 Dimension/Effective Date Range target, the current version of a dimension
has a begin date with no corresponding end date.
Use the Type 2 Dimension/Effective Date Range mapping to update a slowly changing dimension
table when you want to keep a full history of dimension data in the table. An effective date range
tracks the chronological history of changes for each dimension.
When you use this option, the Designer creates three additional fields in the target:
PM_BEGIN_DATE. For each new and changed dimension written to the target, the
Informatica Server uses the system date to indicate the start of the effective date range
for the dimension.
PM_END_DATE. For each dimension being updated, the Informatica Server uses the
system date to indicate the end of the effective date range for the dimension.
PM_PRIMARYKEY. The Informatica Server generates a primary key for each row
written to the target.
The Type 2 Dimension/Effective Date Range mapping performs the following tasks:
Compares logical key columns in the source against corresponding columns in the target lookup table
Compares source columns against corresponding target columns if key columns match
Creates three data flows: one for new rows, one for changed rows, one for updating existing rows
Generates a primary key and beginning of the effective date range for new rows
Generates a primary key and beginning of the effective date range for changed rows
Updates existing versions of the changed rows in the target, generating the end of the effective date range to indicate the row is no longer current
Select Mark the Dimension Records with their Effective Date Range.
The Type 3 Dimension mapping filters source rows based on user-defined comparisons and
inserts only those found to be new dimensions into the target. Rows containing changes to existing
dimensions are updated in the target. When updating an existing dimension, the Informatica
Server saves existing data in different columns of the same row and replaces the existing data
with the updates. The Informatica Server optionally enters the system date as a timestamp for
each row it inserts or updates. In the Type 3 Dimension target, each dimension contains current
dimension data.
Use the Type 3 Dimension mapping to update a slowly changing dimension table when you want
to keep only current and previous versions of column data in the table. Both versions of the
specified column or columns are saved in the same row.
When you use this option, the Designer creates additional fields in the target:
PM_PRIMARYKEY. The Informatica Server generates a primary key for each row
written to the target.
PM_EFFECT_DATE. An optional field. The Informatica Server uses the system date to
indicate when it creates or updates a dimension.
The Type 3 Dimension mapping performs the following tasks:
Compares logical key columns in the source against corresponding columns in the target lookup table
Compares source columns against corresponding target columns if key columns match
Creates two data flows: one for new rows, one for updating changed rows
Generates a primary key and optionally notes the effective date for new rows
Writes previous values for each changed row into previous columns and replaces previous values with updated values
Optionally uses the system date to note the effective date for inserted and updated values
If you want the Informatica Server to timestamp new and changed rows, select Effective Date.
The wizard displays the columns the Informatica Server compares and the name of the column to
hold historic values.
Server Architecture
The Informatica Server moves data from sources to targets based on mapping and session
metadata stored in a repository.
Session Process
The Informatica Server uses both process memory and system shared memory to perform these
tasks. It runs as a daemon on UNIX and as a service on Windows NT/2000. The Informatica
Server uses the following processes to run a session:
The Load Manager process. Starts the session, creates the DTM process, and sends
post-session email when the session completes.
The DTM process. Creates threads to initialize the session, read, write, and transform
data, and handle pre- and post-session operations.
The Load Manager is the primary Informatica Server process. It starts the session, creates the Data Transformation Manager (DTM) process that executes the session, and sends post-session email when the session completes.
The DTM process is the second process associated with a session run. The primary purpose of
the DTM process is to create and manage threads that carry out the session tasks.
The DTM allocates process memory for the session and divides it into buffers. This is also known
as buffer memory. The default memory allocation is 12,000,000 bytes. It creates the main thread,
which is called the master thread. The master thread creates and manages all other threads.
The DTM creates the following types of threads:
Master Thread. Main thread of the DTM process. Creates and manages all other threads. Handles stop and abort requests from the Load Manager.
Mapping Thread. One thread for each session. Fetches session and mapping information. Compiles the mapping. Cleans up after session execution.
Reader Thread. One thread for each partition for each source pipeline. Reads sources. Relational sources use relational threads, and file sources use file threads.
Writer Thread. One thread for each partition, if a target exists in the source pipeline. Writes to targets.
Transformation Thread. One or more threads for each partition. Transforms data according to the transformation logic in the mapping.
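As an illustration (counts inferred from the thread descriptions above, not stated in the original): a session with one source pipeline, one target, and two partitions would run one master thread, one mapping thread, two reader threads, two writer threads, and at least two transformation threads.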
Running a Session
When the Informatica Server runs a session, it performs a series of tasks as configured in the session properties, such as running pre-session shell commands, reading data from sources, transforming it, writing and committing it to targets, and sending post-session email.
System Resources
The Informatica Server uses the following system resources to run a session: CPU, shared memory, buffer memory, and cache memory.
Cache Memory
The DTM process creates in-memory index and data caches to temporarily store data used by the following transformations:
Aggregator transformation
Rank transformation
Joiner transformation
Lookup transformation
In addition to the session log, the Informatica Server can create the following output when it runs a session:
Reject files
Control file
Post-session email
Output file
Cache files
Cache Files
The Informatica Server writes to the index and data cache files during the session in the following
cases:
The mapping contains one or more Aggregator transformations, and the session is
configured for incremental aggregation.
The DTM runs out of cache memory and pages to the local cache files. The DTM may
create multiple files when processing large amounts of data. The session fails if the
local directory runs out of disk space.
After the session completes, the DTM generally deletes the overflow index and data files. However, it does not delete the cache files in certain circumstances, such as when the session performs incremental aggregation or a Lookup transformation is configured to use a persistent cache.
Before you create and run sessions, you can perform the following configuration tasks in the Server Manager:
Configure Server Manager display options. You can configure display options such as grouping sessions or docking and undocking windows.
Register Informatica Servers. Before you can start an Informatica Server, you must
register it with the repository.
Create FTP connections. After you create FTP connections, you can configure a session to use FTP to access source or target files.
Create external loader connections. Create connections to Oracle, Sybase IQ, and
Teradata external loaders. You must create these connections before you can
configure a session to use an external loader.
Server Variables
$PMRootDir. Required. Root directory used to define the default locations of the other server variables.
$PMSessionLogDir. Required. Default directory for session log files.
$PMBadFileDir. Required. Default directory for reject files.
$PMCacheDir. Required. Default directory for the lookup cache, index and data caches, and index and data files. Defaults to $PMRootDir/Cache. To avoid performance problems, always use a drive local to the Informatica Server for the cache directory. Do not use a mapped or mounted drive for cache files.
$PMTargetFileDir. Required. Default directory for target files.
$PMSourceFileDir. Required. Default directory for source files.
$PMExtProcDir. Required. Default directory for external procedures.
$PMTempDir. Required. Default directory for temporary files.
$PMSuccessEmailUser. Optional. Email recipient for post-session email when the session completes successfully.
$PMFailureEmailUser. Optional. Email recipient for post-session email when the session fails.
$PMSessionLogCount. Optional. Number of session logs the Informatica Server archives.
$PMSessionErrorThreshhold. Optional. Number of non-fatal errors the Informatica Server allows before failing the session.
A session is a set of instructions that tells the Informatica Server how and when to
move data from sources to targets.
When you create a session, you enter general information such as the session
name, session schedule, and the Informatica Server to run the session.
You can also select options to execute pre-session shell commands, send post-session email, and use FTP to transfer source and target files.
Using session properties, you can also override parameters established in the
mapping, such as source and target location, source and target type, error tracing
levels, and transformation attributes.
Creating a Session
When you create a session, you specify the following information:
Session name, which must be unique among all sessions in a given folder
Source type
Target type
Use the Session Wizard to create a session for a valid mapping. The Session Wizard has the
following pages, and each of those pages has multiple dialog boxes where you enter session
properties:
General page. Enter source and target information and performance configuration.
Log Files page. Enter log file and error handling information.
Starting a Session
You can start a session in the following ways:
Server Manager
pmcmd command line program
Monitoring a Session
The Server Manager allows you to monitor sessions on an Informatica Server. When monitoring a
session, you can use information provided through the Server Manager to troubleshoot sessions
and improve session performance.
When you poll the Informatica Server, it displays the status of each session in the Monitor window.
The Monitor window displays the following information for each session:
Session Name. Session name.
Server Name. Name of the Informatica Server running the session.
Top Level Batch. Top or outermost batch containing the session. If the session is not part of a nested batch, this field displays the folder name.
Batch. Batch containing the session, if the session is batched. If the session is a standalone session, this field displays the folder name.
Status. Session status.
Start Time. Session start time.
Completion Time. Session completion time.
First Error. First error message, if any, for the session run.
Mapping Name. Name of the mapping the session runs.
Session Run Mode. Session run mode.
User Name. Name of the user who ran the session.
When you run a session, the Server Manager creates session details that provide load statistics
for each target in the mapping. You can view session details during the session or after the
session completes.
Session details include the following information:
Table Name. Name of the target table. If you have multiple instances of a target, this field shows both the target instance name and the table name, in the format Table Name:Instance Name.
Loaded. Number of rows loaded into the target.
Failed. Number of rows the writer or target rejected.
Read Throughput. Rate at which the Informatica Server read data from the source (bytes/sec).
Write Throughput. Rate at which the Informatica Server wrote data into the target (rows/sec).
Current Message. The most recent error message written to the session log. If you view details after the session completes, this field displays the last error message.
Stopping a Session
To stop a session:
In the Server Manager Navigator, select the session you want to stop.
To stop a session running against the Informatica Server configured in the session properties, choose Server Requests-Stop or use the Stop button on the toolbar.
To stop a session running against an Informatica Server other than the one configured in the session properties, use the Stop button on the toolbar to select the Informatica Server running the session.
To abort a session:
In the Server Manager Navigator, select the session you want to abort.
To abort a session running against the Informatica Server configured in the session properties, choose Server Requests-Abort.
Managing Batches
Batches provide a way to group sessions for either serial or parallel execution by the Informatica Server. There are two types of batches:
Sequential. Runs sessions one after the other.
Concurrent. Runs sessions at the same time.
Nesting Batches
Each batch can contain any number of sessions or other batches. You can nest batches several
levels deep, defining batches within batches. Nested batches are useful when you want to control
a complex series of sessions that must run sequentially or concurrently.
Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by
default. However, you can configure a batched session to run on its own schedule by selecting the
Use Absolute Time session option.
Recovering a Batch
When a session or sessions in a batch fail, you can perform recovery to complete the batch. The
steps you take vary depending on the type of batch:
Sequential batch. If the batch is sequential, you can recover data from the session
that failed and run the remaining sessions in the batch.
Concurrent batch. If a session within a concurrent batch fails, but the rest of the
sessions complete successfully, you can recover data from the failed session targets to
complete the batch. However, if all sessions in a concurrent batch fail, you might want
to truncate all targets and run the batch again.
Using PMCMD
You can use the command line program pmcmd to communicate with the Informatica Server. This
does not replace the Server Manager, since there are many tasks that you can perform only with
the Server Manager.
You can perform the following actions with pmcmd:
Start sessions and batches.
Stop sessions and batches.
Recover sessions.
Stop the Informatica Server.
To use pmcmd, you need the following information:
Connection type. The type of connection from the client machine to the Informatica Server (TCP/IP or IPX/SPX).
Port or connection. The TCP/IP port number or IPX/SPX connection (Windows NT/2000
only) to the Informatica Server.
Host name. The machine hosting the Informatica Server (if running pmcmd from a remote
machine through a TCP/IP connection).
Session or batch name. The names of any sessions or batches you want to start or stop.
Folder name. The folder names for those sessions or batches (if their names are not
unique in the repository).
Parameter file. The directory and name of the parameter file you want the Informatica
Server to use with the session or batch.
Return Values
pmcmd returns a value or message that indicates the result of the request:
The Informatica Server is down, or pmcmd cannot connect to the Informatica Server.
The TCP/IP host name or port number, or IPX/SPX address (if applicable) may be
incorrect, or a network problem occurred.
Informatica Server timed out while waiting for the request. Try sending it again.
Use the following syntax to start a session or batch on a Windows NT/2000 system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var } {[TCP/IP:]
[hostname:]portno | IPX/SPX: ipx/spx_address} [folder_name:] {session_name |
batch_name} [:pf=param_file] session_flag wait_flag
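For example, the following command (user name, password, port, folder, and session names are illustrative only) starts the session s_monthly_sales in the folder Sales on an Informatica Server listening on TCP/IP port 4001 of the local machine:
pmcmd start admin mypassword 4001 Sales:s_monthly_sales 1 1
Here the first 1 is the session flag, indicating the name refers to a session rather than a batch (an assumption based on the syntax above), and the second 1 is the wait flag, so pmcmd waits for the session to complete before returning.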
pmcmd start returns one of the following values:
If pmcmd was called in wait mode (wait flag = 1), 0 indicates the session or batch ran successfully.
If pmcmd was not called in wait mode (wait flag = 0), 0 indicates the request to start the session was
successfully transmitted to the Informatica Server, and it acknowledged the request.
The Informatica Server is down, or pmcmd cannot connect to the Informatica Server. The TCP/IP host name
or port number, or IPX/SPX address (if applicable) may be incorrect, or a network problem occurred.
The specified session or batch name does not exist. Or, if you specified a folder name, the folder does not
contain the specified session or batch.
You do not have the appropriate permissions or privileges to perform this action.
Informatica Server timed out while waiting for the request. Try sending it again.
The Informatica Server found the parameter file, but experienced errors expanding the start values for the
session parameters. The parameter file may not have the start values for the session parameters.
Use the following syntax to stop a session or batch on a Windows NT/2000 system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var } {[TCP/IP:]
[hostname:]portno | IPX/SPX: ipx/spx_address} [folder_name:] {session_name |
batch_name} [:pf=param_file] session_flag wait_flag
Use pmcmd to recover a standalone session. You cannot use pmcmd to recover a
session in a batch.
Use the following syntax to recover a standalone session on a Windows NT/2000
system:
pmcmd startrecovery {user_name | %user_env_var}
{password | %password_env_var}
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address}
[folder_name:]session_name [:pf=param_file] wait_flag
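For example (names illustrative), the following command recovers the standalone session s_monthly_sales in the folder Sales and waits for it to complete:
pmcmd startrecovery admin mypassword 4001 Sales:s_monthly_sales 1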
Use the following syntax to stop Informatica Server on a Windows NT/2000 system:
pmcmd stopserver {user_name | %user_env_var}
{password | %password_env_var}
{[TCP/IP:][hostname:]portno | IPX/SPX:ipx/spx_address}
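For example (names illustrative), the following command stops the Informatica Server listening on port 4001 of the host etlserver:
pmcmd stopserver admin mypassword etlserver:4001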
pmcmd stopserver returns one of the following values:
The Informatica Server is down, or pmcmd cannot connect to the Informatica Server. The TCP/IP host name
or port number, or IPX/SPX address (if applicable) may be incorrect, or a network problem occurred.
An error occurred while stopping the Informatica Server. Contact Informatica Technical Support.
You do not have the appropriate permissions or privileges to perform this action.
Server timed out while waiting for the request. Try sending it again.
Reject Loading
During a session, the Informatica Server creates a reject file for each target instance in the
mapping. If the writer or the target rejects data, the Informatica Server writes the rejected row into
the reject file.
The reject file and session log contain information that helps you determine the cause of the
reject.
You can correct reject files and load them to relational targets using the Informatica reject loader utility. The reject loader also creates another reject file for any data that the writer or target rejects during reject loading.
Each time you run a session, the Informatica Server appends rejected data to the reject file. To load reject data into the target, locate and correct the reject file, rename the corrected file to reject_file.in, and run the reject loader utility.
Reject files contain rows of data rejected by the writer or the target database. Though the
Informatica Server writes the entire row in the reject file, the problem generally centers on one
column within the row. To help you determine which column caused the row to be rejected, the
Informatica Server adds row and column indicators to give you more information about each
column:
Row indicator. The first column in each row of the reject file is the row indicator. The
numeric indicator tells whether the row was marked for insert, update, delete, or reject.
Column indicator. Column indicators appear after every column of data. The alphabetical
character indicators tell whether the data was valid, overflow, null, or truncated.
The following sample reject file shows the row and column indicators:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
0,D,1,D,April,D,1997,D,1,D,-1364.22,D,-1364.22,D
0,D,1,D,April,D,2000,D,1,D,2560974.96,D,2560974.96,D
3,D,1,D,April,D,2000,D,0,D,0.00,D,0.00,D
0,D,1,D,August,D,1997,D,2,D,2283.76,D,4567.53,D
0,D,3,D,December,D,1999,D,1,D,273825.03,D,273825.03,D
0,D,1,D,September,D,1997,D,1,D,0.00,D,0.00,D
Row Indicators
The first column in the reject file is the row indicator. The number listed as the row indicator tells the writer what to do with the row of data.

Row Indicator  Meaning  Rejected By
0              Insert   Writer or target
1              Update   Writer or target
2              Delete   Writer or target
3              Reject   Writer
Column Indicators
After the row indicator is a column indicator, followed by the first column of data, and another column indicator. Column indicators appear after every column of data and define the type of the data preceding it.

Column Indicator  Type of Data
D                 Valid data
O                 Overflowed numeric data
N                 Null value
T                 Truncated string data
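For example, consider the first row of the sample reject file shown earlier:
3,D,1,D,,D,0,D,1094945255,D,0.00,D,-0.00,D
The leading 3 is the row indicator, meaning the writer rejected the row. Each D following a value is a column indicator marking that value as valid data, so the row was rejected for a reason other than bad column data, such as a constraint violation in the target database.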
After you correct the reject file and rename it to reject_file.in, you can use the reject loader to send
those files through the writer to the target database.
Use the reject loader utility from the command line to load rejected files into target tables. The
syntax for reject loading differs on UNIX and Windows NT/2000 platforms.
Use the following syntax for UNIX:
pmrejldr pmserver.cfg [folder_name:]session_name
Use the following syntax for Windows NT/2000:
pmrejldr [folder_name:]session_name
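For example (folder and session names illustrative), the following UNIX command loads the corrected reject data for the session s_monthly_sales in the folder Sales:
pmrejldr pmserver.cfg Sales:s_monthly_sales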
Commit Points
A commit interval is the interval at which the Informatica Server commits data to relational targets
during a session. You can choose between the following types of commit interval:
Target-based commit. The Informatica Server commits data based on the number of
target rows and the key constraints on the target table. The commit point also depends on
the buffer block size and the commit interval.
Source-based commit. The Informatica Server commits data based on the number of
source rows. The commit point is the commit interval you configure in the session
properties.
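For example (a hypothetical session; the buffer block capacity is assumed for illustration): with a target-based commit interval of 10,000 and a writer buffer block that holds 7,500 rows, the Informatica Server continues past the first block at 7,500 rows, reaches the commit interval partway through the second block, and issues the commit when that block fills at 15,000 rows. Target-based commits therefore occur at or shortly after the configured interval, depending on the buffer block size.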
In a source-based commit session, the Informatica Server generates commits at the active source of each pipeline. The following transformations can be active sources:
Source Qualifier
Normalizer
Aggregator
Joiner
Rank
Note: Although the Filter, Router, and Update Strategy transformations are active transformations,
the Informatica Server does not use them as active sources in a source-based commit session.
The Informatica Server generates a commit row from the active source at every commit interval.
When each target in the pipeline receives the commit row, the Informatica Server performs the
commit.
The number of rows held in the writer buffers does not affect the commit point for a source-based
commit session.
Performance Tuning
The most common performance bottleneck occurs when the Informatica Server writes to a target
database. You can identify performance bottlenecks by the following methods:
Running test sessions. You can configure a test session to read from a flat file source or
to write to a flat file target to identify source and target bottlenecks.
Studying performance details. You can create a set of information called performance
details to identify session bottlenecks. Performance details provide information such as
buffer input and output efficiency.
Monitoring system performance. You can use system monitoring tools to view percent
CPU usage, I/O waits, and paging to identify system bottlenecks.
Look for performance bottlenecks in the following order:
1. Target
2. Source
3. Mapping
4. Session
5. System
You can also identify source bottlenecks by using a Filter transformation in a test mapping or by running the read query directly against the source database.
Optimizing the Source Database
If your session reads from a relational source, review the following suggestions for improving performance:
Optimizing a Mapping
Generally, you optimize a mapping by reducing the number of transformations it contains and deleting unnecessary links between transformations. Configure the mapping with the fewest transformations and expressions needed to do the work, and minimize the amount of data moved between transformations.
For transformations that use data cache (such as Aggregator, Joiner, Rank, and Lookup
transformations), limit connected input/output or output ports. Limiting the number of connected
input/output or output ports reduces the amount of data the transformations store in the data cache.
You can also perform the following tasks to optimize the mapping:
Optimize transformations.
Optimize expressions.
Optimizing a Session
You can perform the following tasks to improve overall performance:
Partition sessions.
On Windows NT/2000, you can monitor the following performance counters:
Percent processor time. If you have several CPUs, monitor each CPU for percent processor time. If the processors are utilized at more than 80%, you may consider adding more processors.
Pages/second. If pages/second is greater than five, you may have excessive memory
pressure (thrashing). You may consider adding more physical memory.
Physical disks percent time. This is the percent time that the physical disk is busy
performing read or write requests. You may consider adding another disk device or
upgrading the disk device.
Physical disks queue length. This is the number of users waiting for access to the
same disk device. If physical disk queue length is greater than two, you may consider
adding another disk device or upgrading the disk device.
Server total bytes per second. This is the number of bytes the server has sent to and
received from the network. You can use this information to improve network bandwidth.
On UNIX, use the following tools to monitor system performance:
lsattr -E -l sys0. Use this tool to view current system settings. It shows maxuproc, the maximum number of user background processes. You may consider reducing the number of background processes on your system.
iostat. Use this tool to monitor loading operation for every disk attached to the database
server. Iostat displays the percentage of time that the disk was physically active. High
disk utilization suggests that you may need to add more disks. If you use disk arrays, use
utilities provided with the disk arrays instead of iostat.
vmstat or sar -w. Use this tool to monitor disk swapping actions. Swapping should not occur during the session. If swapping does occur, you may consider increasing your physical memory or reducing the number of memory-intensive applications running on the system.
sar -u. Use this tool to monitor CPU loading. This tool provides percent usage on user,
system, idle time, and waiting time. If the percent time spent waiting on I/O (%wio) is
high, you may consider using other under-utilized disks. For example, if your source
data, target data, lookup, rank, and aggregate cache files are all on the same disk,
consider putting them on different disks.
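For example, typical invocations of these tools (interval and count arguments are illustrative):
lsattr -E -l sys0
iostat 5
vmstat 5 10
sar -u 5 10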