Informatica Power Center 9
Introduction
At the end of this course you will -
Understand how to use all major PowerCenter components
Be able to build basic ETL Mappings and Mapplets
Understand the different Transformations and their basic attributes in PowerCenter
Be able to create, run and monitor Workflows
Be able to troubleshoot common development problems
ETL Basics
This section will include -
Concepts of ETL
PowerCenter Architecture
Connectivity between PowerCenter components
[email protected]
Extract, Transform, and Load
[Diagram: data is Extracted from Operational Systems (RDBMS, Mainframe, Other), Transformed, and Loaded into the Decision Support Data Warehouse]
PowerCenter Architecture
[email protected]
PowerCenter Components
[email protected]
PowerCenter Components
• PowerCenter Repository
• Application Services
  • Repository Service
  • Integration Service
• PowerCenter Client
  • Administration Console
  • Repository Manager
  • Designer
  • Workflow Manager
  • Workflow Monitor
• External Components
  • Sources
  • Targets
Introduction To PowerCenter Repository and Administration
This section includes -
The purpose of the Repository and Repository Service
Admin Console
The Repository Manager
Security and privileges
Object sharing, searching and locking
[email protected]
PowerCenter Repository
It is a relational database managed by the Repository Service
Stores metadata about objects (mappings, transformations etc.) in database tables known as the Repository Content
The Repository database can be hosted on Oracle, IBM DB2 UDB, MS SQL Server or Sybase ASE
To create a Repository Service one must have full privileges in the Administration Console and in the domain
The Integration Service uses repository objects to perform the ETL
Repository Service
A Repository Service process is a multi-threaded process that fetches, inserts and updates metadata in the repository
Manages connections to the Repository from client applications and the Integration Service
Maintains object consistency by controlling object locking
Each Repository Service manages a single repository database; however, multiple repositories can be connected and managed using a repository domain
It can run on multiple machines or nodes in the domain; each instance is called a Repository Service process
Administration Console
[email protected]
Introduction to PowerCenter Design Process
We will walk through -
Design Process
PowerCenter Designer Interface
Mapping Components
[email protected]
Design Process
1. Create Source definition(s)
2. Create Target definition(s)
3. Create a Mapping
4. Create a Session Task
5. Create a Workflow from Task components
6. Run the Workflow
7. Monitor the Workflow and verify the results
[email protected]
Designer - Interface
[Screenshot: Designer interface showing the client tools, Navigator, Workspace, Overview Window, Output window and Status Bar]
Mapping Components
[email protected]
Introduction To PowerCenter Designer Interface
Designer - Interface
[Screenshot: Designer interface highlighting the Folder List, Mapping List, Transformation Toolbar and an iconized Mapping]
Designer - Source Analyzer
[Screenshot: Source Analyzer displaying a source definition with a Foreign Key]
Designer - Target Designer
Transformation Developer
[email protected]
Mapplet Designer
[email protected]
Designer - Mapping Designer
EXTRACT – Source Object Definitions
This section introduces -
Different Source Types
Creation of ODBC Connections
Creation of Source Definitions
Source Definition properties
Data Preview option
[email protected]
Source Analyzer
[Screenshot: Source Analyzer showing the Navigation Window and the Analyzer Window]
Methods of Analyzing Sources
[email protected]
Analyzing Relational Sources
[Diagram: the Source Analyzer imports a relational source definition (table, view or synonym) over ODBC; the definition (DEF) travels via TCP/IP to the Repository Service, which writes it to the Repository over a native connection]
Analyzing Relational Sources
[Screenshot: editing Source Definition properties, including the Key Type of each column]
Flat File Wizard
Three-step wizard
Columns can be renamed within the wizard
Text, Numeric and Datetime datatypes are supported
The wizard 'guesses' the datatype
LOAD – Target Definitions
Target Object Definitions
By the end of this section you will:
Be familiar with Target Definition types
Know the supported methods of creating Target Definitions
Understand individual Target Definition properties
[email protected]
Target Designer
[email protected]
Creating Target Definitions
Methods of creating Target Definitions
Import from Database
Import from an XML file
Import from third party software like SAP, Siebel etc.
Manual Creation
Automatic Creation
[email protected]
Automatic Target Creation
Drag-and-drop a Source Definition into the Target Designer Workspace
Import Definition from Database
Can "reverse engineer" existing object definitions from a database system catalog or data dictionary
[Diagram: the Target Designer imports a table, view or synonym definition from the target database over ODBC; the definition (DEF) travels via TCP/IP to the Repository Service, which writes it to the Repository over a native connection]
Creating Physical Tables
[Diagram: LOGICAL repository target table definitions (DEF) are used to create the PHYSICAL tables in the target database]
TRANSFORM – Transformation Concepts
Transformation Concepts
By the end of this section you will be familiar with:
Transformation types
Data Flow Rules
Transformation Views
PowerCenter Functions
Expression Editor and Expression validation
Port Types
PowerCenter data types and Datatype Conversion
Connection and Mapping Validation
PowerCenter Basic Transformations – Source Qualifier, Filter, Joiner, Expression
[email protected]
Types of Transformations
Active/Passive
• Active: changes the number of rows as data passes through it
• Passive: passes all rows through it
Connected/Unconnected
• Connected: connected to other transformations through connectors
• Unconnected: not connected to any transformation; called from within another transformation
Transformation Types
PowerCenter provides 24 objects for data transformation
Aggregator: performs aggregate calculations
Application Source Qualifier: reads Application object sources such as ERP
Custom: calls a procedure in a shared library or DLL
Expression: performs row-level calculations
External Procedure (TX): calls compiled code for each row
Filter: drops rows conditionally
Mapplet Input: defines mapplet input rows; available in the Mapplet Designer
Java: executes Java code
Joiner: joins heterogeneous sources
Lookup: looks up values and passes them to other objects
Normalizer: reads data from VSAM sources and normalizes records containing repeating fields
Mapplet Output: defines mapplet output rows; available in the Mapplet Designer
Transformation Types
Rank: limits records to the top or bottom of a range
Router: splits rows conditionally
Sequence Generator: generates unique ID values
Sorter: sorts data
Source Qualifier: reads data from flat file and relational sources
Stored Procedure: calls a database stored procedure
Transaction Control: defines commit and rollback transactions
Union: merges data from different databases
Update Strategy: tags rows for insert, update, delete, reject
XML Generator: reads data from one or more input ports and outputs XML through a single output port
XML Parser: reads XML from one input port and outputs data through one or more output ports
XML Source Qualifier: reads XML data
Data Flow Rules
Each Source Qualifier starts a single data stream (a dataflow)
Transformations can send rows to more than one transformation (split one data flow into multiple pipelines)
Two or more data flows can meet together -- if (and only if) they originate from a common active transformation
Cannot add an active transformation at the point where the flows merge
[Diagram: ALLOWED - branches re-joining in a passive transformation; DISALLOWED - branches re-joining in an active transformation]
The example holds true with a Normalizer in lieu of the Source Qualifier. Exceptions are the Mapplet Input and Joiner transformations
Transformation Views
A transformation has three views:
Iconized - shows the transformation in relation to the rest of the mapping
Normal - shows the flow of data through the transformation
Edit - shows transformation ports and properties; allows editing
Edit Mode
Allows users with folder "write" permissions to change or create transformation ports and properties
Define port-level handling
Define transformation-level properties
Enter comments
Make reusable
Switch between transformations
Expression Editor
An expression formula is a calculation or conditional statement
Used in the Expression, Aggregator, Rank, Filter, Router and Update Strategy transformations
Performs calculations based on ports, functions, operators, variables, literals, constants and return values from other transformations
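For example, a hypothetical output-port expression (port names are illustrative) that bands orders by price:

    IIF(UNIT_PRICE > 100, 'PREMIUM', 'STANDARD')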
PowerCenter Data Types
NATIVE DATATYPES: specific to the source and target database types; displayed in source and target tables within the Mapping Designer
TRANSFORMATION DATATYPES: PowerCenter internal datatypes based on ANSI SQL-92; displayed in transformations within the Mapping Designer
Character Functions
Used to manipulate character data
ASCII, CHR, CHRCODE, CONCAT, INITCAP, INSTR, LENGTH, LOWER, LPAD, LTRIM, RPAD, RTRIM, SUBSTR, UPPER, REPLACESTR, REPLACECHR
CHRCODE returns the numeric value (ASCII or Unicode) of the first character of the string passed to this function
CONCAT is for backwards compatibility only - use || instead
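A hypothetical example combining character functions to standardize a name port (CUST_NAME is illustrative):

    INITCAP(LTRIM(RTRIM(CUST_NAME)))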
[email protected]
PowerCenter Functions
Conversion Functions
Used to convert datatypes
TO_CHAR (numeric), TO_DATE, TO_DECIMAL, TO_FLOAT, TO_INTEGER, TO_NUMBER
Date Functions
Used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date
ADD_TO_DATE, DATE_COMPARE, DATE_DIFF, GET_DATE_PART, LAST_DAY, ROUND (date), SET_DATE_PART, TO_CHAR (date), TRUNC (date)
To pass a string to a date function, first use the TO_DATE function to convert it to a date/time datatype
PowerCenter Functions
Test Functions
IIF, ISNULL, IS_DATE, IS_NUMBER, IS_SPACES
Used to test if a lookup result is null
Used to validate data
IIF syntax: IIF(Condition, True, False)
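A hypothetical validation expression (QTY_STR is illustrative) that converts a numeric string and substitutes zero otherwise:

    IIF(IS_NUMBER(QTY_STR), TO_DECIMAL(QTY_STR), 0)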
[email protected]
Expression Validation
The Validate or 'OK' button in the Expression Editor will:
Parse the current expression
• Remote port searching (resolves references to ports in other transformations)
Parse transformation attributes
• e.g. filter condition, lookup condition, SQL Query
Parse default values
Check spelling, correct number of arguments in functions, and other syntactical errors
Types of Ports
There are four basic types of ports
• Input
• Output
• Input/Output
• Variable
In addition, Lookup and Return ports are specific to the Lookup transformation
Variable and Output Ports
Use to simplify complex expressions
• e.g. create and store a depreciation formula to be referenced more than once
Use in another variable port or an output port expression
Local to the transformation (a variable port cannot also be an input or output port)
Available in the Expression, Aggregator and Rank transformations
Connection Validation
Examples of invalid connections in a Mapping:
Connecting ports with incompatible datatypes
Connecting output ports to a Source
Connecting a Source to anything but a Source Qualifier or Normalizer transformation
Connecting an output port to another output port, or an input port to another input port
Connecting more than one active transformation to another transformation (invalid dataflow)
Mapping Validation
Mappings must:
• Be valid for a Session to run
• Be end-to-end complete and contain valid expressions
• Pass all data flow rules
Mappings are always validated when saved; they can also be validated without being saved
The Output Window will always display the reason for invalidity
Source Qualifier Transformation
Reads data from the sources
Active & Connected Transformation
Applicable only to relational and flat file sources
Maps database/file-specific datatypes to PowerCenter native datatypes
• e.g. Number(24) becomes Decimal(24)
Determines how the source database binds data when the Integration Service reads it
If the source definition and Source Qualifier datatypes do not match, the mapping is invalid
All ports are Input/Output ports by default
Source Qualifier Transformation
Used as a
• Joiner for homogeneous tables, using a WHERE clause
• Filter, using a WHERE clause
• Sorter (ORDER BY)
• Select of distinct values (SELECT DISTINCT)
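A hypothetical SQL override (tables and columns are illustrative) that makes the Source Qualifier join, filter and sort in one pass:

    SELECT O.ORDER_ID, C.CUST_NAME
    FROM ORDERS O, CUSTOMERS C
    WHERE O.CUST_ID = C.CUST_ID
    AND O.STATUS = 'OPEN'
    ORDER BY O.ORDER_ID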
Pre-SQL and Post-SQL Rules
Can use any command that is valid for the database type; no nested comments
Can use Mapping Parameters and Variables in SQL executed against the source
Use a semi-colon (;) to separate multiple statements
The Integration Service ignores semi-colons within single quotes, double quotes or within /* ... */
To use a semi-colon outside of quotes or comments, 'escape' it with a backslash (\)
The Workflow Manager does not validate the SQL
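A hypothetical Pre-SQL entry (table names are illustrative); the first semi-colon separates two statements, while the one inside the quotes is ignored:

    DELETE FROM STG_ORDERS; UPDATE LOAD_AUDIT SET NOTE = 'phase 1; extract'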
[email protected]
Filter Transformation
Active Transformation
Connected
Ports
• All input/output
Specify a Filter condition
Usage
• Filter rows from flat file sources
• Single pass source(s) into multiple targets
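A hypothetical filter condition (ports are illustrative) that keeps only priced, non-null orders:

    NOT ISNULL(ORDER_ID) AND ORDER_AMT > 0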
[email protected]
Filter Transformation – Tips
A simple Boolean condition is always faster than a complex condition
Use the Filter transformation as early in the mapping as possible
The Source Qualifier filters rows only from relational sources, whereas the Filter transformation is source independent
Always validate a condition
Joiner Transformation
By the end of this sub-section you will be familiar with:
When to use a Joiner Transformation
Homogeneous Joins
Heterogeneous Joins
Joiner properties
Joiner Conditions
Nested joins
[email protected]
Homogeneous Joins
Joins that can be performed with a SQL SELECT statement:
Source Qualifier contains a SQL join
Tables on same database server (or are synonyms)
Database server does the join “work”
Multiple homogeneous tables can be joined
Heterogeneous Joins
Joins that cannot be done with a SQL statement:
An Oracle table and a Sybase table
Two Informix tables on different database servers
Two flat files
A flat file and a database table
[email protected]
Joiner Transformation
Performs heterogeneous joins on records from two tables on the same or different databases, or from flat file sources
Active Transformation
Connected
Ports
• All input or input/output
• "M" denotes a port that comes from the master source
Specify the Join condition
Usage
• Join two flat files
• Join two tables from different databases
• Join a flat file with a relational table
Joiner Conditions
Multiple join conditions are supported
Joiner Properties
Join types:
“Normal” (inner)
Master outer
Detail outer
Full outer
[email protected]
Sorted Input for Joiner
Using sorted input improves session performance by minimizing disk input and output
The prerequisites for using sorted input are:
• The database sort order must be the same as the session sort order
• The sort order must be established by using sorted sources (flat files/relational tables) or a Sorter transformation
• The flow of sorted data must be maintained by avoiding transformations such as Rank, Custom, Normalizer etc. which alter the sort order
• Enable the Sorted Input option in the Properties tab
• The order of the ports used in the join condition must match the order of the ports at the sort origin
• When joining the Joiner output with another pipeline, make sure that the data from the first Joiner is sorted
Mid-Mapping Join - Tips
The Joiner does not accept input in the following situations:
Both input pipelines begin with the same Source Qualifier
Both input pipelines begin with the same Normalizer
Both input pipelines begin with the same Joiner
Either input pipeline contains an Update Strategy
[email protected]
Expression Transformation
Passive Transformation
Connected
Ports
• Mixed
• Variables allowed
Create expressions in an output or variable port
Usage
• Perform the majority of data manipulation
[Screenshot: the Expression Editor is invoked from the port's expression field]
Router Transformation
Multiple filters in a single transformation
Active Transformation
Connected
Ports
• All input/output
Specify filter conditions for each Group
Usage
• Link source data in one pass to multiple filter conditions
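Hypothetical group filter conditions (group and port names are illustrative) for a Router with two user-defined groups; rows matching neither go to the default group:

    US_ORDERS: COUNTRY = 'US'
    EU_ORDERS: COUNTRY = 'DE' OR COUNTRY = 'FR'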
[email protected]
Router Transformation in a Mapping
[Screenshot: a Router in a mapping routing rows to TARGET_ROUTED_ORDER1 (Oracle)]
Comparison – Filter and Router
Filter: tests rows for only one condition; drops the rows which don't meet the filter condition
Router: tests rows for one or more conditions; routes the rows not meeting any filter condition to the default group

Sorter Transformation
Active transformation
Is always connected
Can sort data from relational tables or flat files, in ascending or descending order
Only Input/Output/Key ports are there
Sorting takes place on the Integration Service machine
Multiple sort keys are supported; the Integration Service sorts each port sequentially
The Sorter transformation is often more efficient than a sort performed on a database with an ORDER BY clause
Discard duplicate rows by selecting the 'Distinct' option

Aggregator Transformation
Active Transformation
Connected
Ports
• Mixed
• Variables allowed
• Group By allowed
Create expressions in output or variable ports
Usage
• Standard aggregations
PowerCenter Aggregate Functions
Aggregate Functions
AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
Return summary values for non-null data in selected ports
Used only in Aggregator transformations
Used in output ports only
Calculate a single value (and row) for all records in a group
Only one aggregate function can be nested within an aggregate function
Conditional statements can be used with these functions
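A hypothetical conditional aggregate (ports are illustrative) that sums only positive amounts within each group:

    SUM(ORDER_AMT, ORDER_AMT > 0)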
[email protected]
Aggregate Expressions
Aggregate functions are supported only in the Aggregator Transformation
Conditional aggregate expressions are supported
The Sorted Input option instructs the Aggregator to expect the data to be sorted
Set Aggregator cache sizes (on the Integration Service machine)
Why Sorted Input?
The Aggregator works efficiently with sorted input data
• Sorted data can be aggregated more efficiently, decreasing total processing time
The Integration Service will cache data from each group and release the cached data upon reaching the first record of the next group
Data must be sorted according to the order of the Aggregator "Group By" ports
The performance gain will depend upon varying factors
Lookup Transformation
By the end of this sub-section you will be familiar with:
Lookup principles
Lookup properties
Lookup conditions
Lookup techniques
Caching considerations
[email protected]
Lookup Transformation
Different types of configuration are possible:
Flat File or Relational
Pipeline Lookup
Connected or Unconnected lookup
Cached or Un-cached Lookup
[email protected]
How a Lookup Transformation Works
For each Mapping row, one or more port values are looked up in a database table
If a match is found, one or more table values are returned to the Mapping; if no match is found, NULL is returned
[Screenshot: a Lookup transformation (LKP_OrderID) between a Source Qualifier and a Target Definition, with look-up values flowing in and return values flowing out]
Lookup Transformation
Looks up values in a database table or flat file and provides data to downstream transformations in a Mapping
Passive Transformation
Connected / Unconnected
Ports
• Mixed
• "L" denotes Lookup port
• "R" denotes port used as a return value (unconnected Lookup only)
Specify the Lookup Condition
Usage
• Get related values
• Verify if a record exists or if data has changed
Lookup Properties
Override Lookup SQL option
Toggle caching
Native Database Connection Object name
Additional Lookup Properties
Set cache directory
Make cache persistent
Set Lookup cache sizes
Lookup Conditions
Multiple conditions are supported
[email protected]
Connected Lookup
[Screenshot: a connected Lookup (LKP_OrderID) wired between the Source Qualifier and the Target Definition]
A connected Lookup is part of the data flow pipeline
Unconnected Lookup
Will be physically "unconnected" from other transformations
• There can be NO data flow arrows leading to or from an unconnected Lookup
Lookup data is called from the point in the Mapping that needs it
Conditional Lookup Technique
Two requirements:
Must be an Unconnected (or "function mode") Lookup
The Lookup function is used within a conditional statement; the Lookup is called only when the condition is true (e.g. for 2 percent of all rows), and the row keys are passed to the Lookup

    IIF ( ISNULL(customer_id), 0, :lkp.MYLOOKUP(order_no) )
Unconnected Lookup - Return Port
The port designated as 'R' is the return port for the unconnected Lookup
There can be only one return port
A Lookup (L) / Output (O) port can be assigned as the Return (R) port
The unconnected Lookup can be called from any other transformation's Expression Editor using the syntax
:LKP.Lookup_Transformation(argument1, argument2, ...)
Connected vs. Unconnected Lookups
Connected Lookup:
• Part of the mapping data flow
• Returns multiple values (by linking output ports to another transformation)
• Executed for every record passing through the transformation
• More visible; shows where the lookup values are used
• Default values are used
Unconnected Lookup:
• Separate from the mapping data flow
• Returns one value (by checking the Return (R) port option for the output port that provides the return value)
• Only executed when the lookup function is called
• Less visible, as the lookup is called from an expression within another transformation
• Default values are ignored
To Cache or not to Cache?
Caching can significantly impact performance
Cached
• Lookup table data is cached locally on the machine
• Mapping rows are looked up against the cache
• Only one SQL SELECT is needed
Uncached
• Each Mapping row needs one SQL SELECT
Rule of thumb: cache if the number (and size) of records in the Lookup table is small relative to the number of mapping rows requiring lookup, or if large cache memory is available to the Integration Service
Additional Lookup Cache Options
[email protected]
Update Strategy Transformation
By the end of this section you will be familiar with:
Update Strategy functionality
Update Strategy expressions
Refresh strategies
Smart aggregation
[email protected]
Target Refresh Strategies
Single snapshot: target truncated, new records inserted
Sequential snapshot: new records inserted
Incremental: only new records are inserted; records already present in the target are ignored
Incremental with Update: only new records are inserted; records already present in the target are updated
Update Strategy Transformation
Used to specify how each individual row will be used to update target tables (insert, update, delete, reject)
• Active Transformation
• Connected
• Ports
  • All input/output
• Specify the Update Strategy Expression
• Usage
  • Updating Slowly Changing Dimensions
  • IIF or DECODE logic determines how to handle the record
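A hypothetical Update Strategy expression (flag ports are illustrative) using the built-in DD_ constants:

    IIF(CHANGE_FLAG = 'D', DD_DELETE, IIF(EXISTS_FLAG = 'Y', DD_UPDATE, DD_INSERT))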
[email protected]
Sequence Generator Transformation
• Passive Transformation
• Connected
• Ports
  • Two predefined output ports, NEXTVAL and CURRVAL
  • No input ports allowed
• Usage
  • Generate sequence numbers
  • Shareable across mappings
Sequence Generator Properties
Increment By value
Cycle option (to repeat values)
Number of Cached Values
Introduction To Workflows
This section will include -
Integration Service Concepts
The Workflow Manager GUI interface
Setting up Server Connections
• Relational
• FTP
• External Loader
• Application
Task Developer
Creating and configuring Tasks
Creating and Configuring Workflows
Workflow Schedules
[email protected]
Integration Service
An application service that runs data integration sessions and workflows
To access it, one must have permissions on the service in the domain
It is managed through the Administration Console
A repository must be assigned to it
A code page must be assigned to the Integration Service process, and it should be compatible with the Repository Service code page
Workflow Manager Interface
[Screenshot: Workflow Manager interface showing the Task Tool Bar, Workflow Designer tools, Navigator Window, Workspace, Output Window and Status Bar]
Workflow Manager Tools
Workflow Designer
• Maps the execution order and dependencies of Sessions, Tasks and Worklets for the Integration Service
Task Developer
• Create Session, Shell Command and Email tasks
• Tasks created in the Task Developer are reusable
Worklet Designer
• Creates objects that represent a set of tasks
• Worklet objects are reusable
Source & Target Connections
Configure Source & Target data access connections
• Used in Session Tasks
Configure: Relational, MQ Series, FTP, Application, Loader
Relational Connections (Native)
Create a relational (database) connection
• Instructions to the Integration Service to locate relational tables
• Used in Session Tasks
Relational Connection Properties
Define a native relational (database) connection:
• User Name/Password
• Database connectivity information
Task Developer
Create basic reusable "building blocks" to use in any Workflow
Reusable Tasks
• Session - set of instructions to execute Mapping logic
• Command - specify OS shell / script command(s) to run during the Workflow
• Email - send email at any point in the Workflow
Session Tasks
After this section, you will be familiar with:
How to create and configure Session Tasks
Session Task properties
Transformation property overrides
Reusable vs. non-reusable Sessions
Session partitions
[email protected]
Session Task
Instructs the Integration Service to run the logic of ONE specific Mapping
• e.g. source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
Becomes a component of a Workflow (or Worklet)
If configured in the Task Developer, the Session Task is reusable (optional)
Session Task
Created to execute the logic of a mapping (one mapping only)
Session Tasks can be created in the Task Developer (reusable) or the Workflow Designer (Workflow-specific)
Steps to create a Session Task
• Select the Session button from the Task Toolbar, or
• Select menu Tasks -> Create
Session Task - Sources
[email protected]
Session Task - Targets
[email protected]
Session Task - Transformations
Allows overrides of some transformation properties
Does not change the properties in the Mapping
Command Task
Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
Becomes a component of a Workflow (or Worklet)
If configured in the Task Developer, the Command Task is reusable (optional)
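A hypothetical Command Task entry (paths are illustrative) that archives a processed source file on a Unix host:

    mv /data/inbound/orders.dat /data/archive/orders_`date +%Y%m%d`.dat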
[email protected]
Workflow Structure
A Workflow is a set of instructions for the Integration Service to perform data transformation and load
Combines the logic of Session Tasks, other types of Tasks and Worklets
The simplest Workflow is composed of a Start Task, a Link and one other Task
[Diagram: Start Task -> Link -> Session Task]
Additional Workflow Components
Two additional components are Worklets and Links
Worklets are objects that contain a series of Tasks
Links are required to connect objects in a Workflow
[email protected]
Building Workflow Components
Add Sessions and other Tasks to the Workflow
Connect all Workflow components with Links
Save the Workflow
Assign the Workflow to the Integration Service
Start the Workflow
[email protected]
Workflow Properties
Workflow Administration
This section details -
The Workflow Monitor GUI interface
Monitoring views
Server monitoring modes
Filtering displayed items
Actions initiated from the Workflow Monitor
[email protected]
Workflow Monitor Interface
[email protected]
Monitoring Workflows
Perform operations in the Workflow Monitor:
Restart -- restart a Task, Workflow or Worklet
Stop -- stop a Task, Workflow, or Worklet
Abort -- abort a Task, Workflow, or Worklet
Recover -- recovers a suspended Workflow from the point of failure after the failed Task is corrected
View Session and Workflow logs
Abort has a 60-second timeout
• If the Integration Service has not completed processing and committing data during the timeout period, the threads and processes associated with the Session are killed
Monitor Workflows
The Workflow Monitor is the tool for monitoring Workflows and Tasks
Review details about a Workflow or Task in two views
• Gantt Chart view
• Task view
[email protected]
Monitoring Workflows
[Screenshot: Task view showing each Workflow's Start Time, Completion Time and Status, with the Status Bar below]
PowerCenter Designer
Other Transformations
This section introduces -
Rank
Normalizer
Stored Procedure
[email protected]
Rank Transformation
Active
Connected
Selects the top or bottom rank of the data
Different from the MAX and MIN functions, as we can choose a set of top or bottom values
You can designate only one Rank port in a Rank transformation, i.e. the rank is decided based on one column only
The Integration Service uses the Rank Index port to store the ranking position for each row
The Designer creates a RANKINDEX port for each Rank transformation
Normalizer Transformation
Active
Connected
Used to organize data to reduce redundancy, primarily with COBOL sources
A single long record with repeated data is converted into separate records
Stored Procedure Transformation
Passive
Connected / Unconnected
Used to run stored procedures already present in the database
A valid relational connection is required for the Stored Procedure transformation to connect to the database and run the stored procedure
Reusability
This section discusses -
Parameters and Variables
Transformations
Mapplets
Tasks
[email protected]
Parameters and Variables
System Variables
Creating Parameters and Variables
Features and advantages
Establishing values for Parameters and Variables
[email protected]
System Variables
SYSDATE - provides the current datetime on the Integration Service machine
• Not a static value
$$$SessStartTime - returns the system date value as a string when a session is initialized; uses the system clock on the machine hosting the Integration Service
• The format of the string is database type dependent
• Used in SQL overrides
• Has a constant value
SESSSTARTTIME - returns the system date value on the Integration Service
• Used with any function that accepts transformation date/time datatypes
• Not to be used in a SQL override
• Has a constant value
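A hypothetical expression (ORDER_DATE is illustrative) using SESSSTARTTIME to compute each order's age in days as of session start:

    DATE_DIFF(SESSSTARTTIME, ORDER_DATE, 'DD')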
[email protected]
Mapping Parameters and Variables
Apply to all transformations within one Mapping
Represent declared values
Variables can change in value during run-time
Parameters remain constant during run-time
Provide increased development flexibility
Defined in Mapping menu
Format is $$VariableName or $$ParameterName
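For example, a hypothetical filter condition (names are illustrative) referencing a mapping variable and a mapping parameter:

    ORDER_DATE > $$LastRunDate AND REGION_CODE = $$ParamRegion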
[email protected]
Mapping Parameters and Variables
Sample declarations
[Screenshot: user-defined names are declared with the appropriate aggregation type and an optional Initial Value]
Reusable Tasks
Tasks can be created in
• Task Developer (reusable)
• Workflow Designer (non-reusable)
Tasks can be made reusable by checking the 'Make Reusable' checkbox in the General tab of the session
The following tasks can be made reusable:
• Session
• Email
• Command
When a group of tasks is to be reused, use a Worklet (created in the Worklet Designer)
Queries???
Informatica version 8.6 vs 9.x
Architecture: no change
SQL transformation: introduction of Query mode (DML - Active) & Script mode (DDL - Passive)
XML Transformation: an XML error can be passed as an output to a target without session failure
Informatica version 9.x vs 9.5
Architecture: no change
Client Tools: IDQ - Developer, Analyst
Lookup Transformation: cache update feature; multiple-row return from a lookup, making it an Active transformation
SQL transformation: introduction of Query mode (DML - Active) & Script mode (DDL - Passive)
XML Transformation: an XML error can be passed as an output to a target without session failure
Resilience: DB deadlock resilience; auto recovery during deadlock
Monitoring: job monitoring can be done directly from the Admin Console
Deployments: deployment group creation feature is available
Workflow Execution Process
[Diagram: a Workflow is picked up by the Load Manager (1), which starts the DTM Process (2)]
Load Manager Process Steps
DTM Process Steps
• Fetches session and mapping metadata from the repository.
• Creates and expands session variables.
• Creates the session log file.
• Validates session code pages if data code page validation is enabled. Checks query conversions if data
code page validation is disabled.
• Verifies connection object permissions.
• Runs pre-session shell commands.
• Runs pre-session stored procedures and SQL.
• Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
• Runs post-session stored procedures and SQL.
• Runs post-session shell commands.
• Sends post-session email.
[email protected]
The DTM process creates the following threads:
• Master Thread
• Mapping Thread
• Pre- and Post-Session Thread
• Reader Thread
• Writer Thread
• Transformation Thread
Lookup cache – What is it?
• The Integration Service builds the cache in memory when the first row is processed. If the memory is
inadequate, the data is paged into a cache file.
• If you use a flat file lookup, the Integration Service always caches the lookup rows.
• Cache if the number (and size) of records in the Lookup table is small relative to the number of mapping
rows requiring the lookup.
Types of Lookup Cache
• Static Cache
• Dynamic cache
• Persistent Cache
• Re-cache from Source
• Shared Cache
• Un-named Cache
• Named Cache
[email protected]
Lookup cache: Static
• For each row that passes through the transformation, the cache is queried for the specified condition.
• If a match is not found, either the default value (for connected lookups only) or NULL is returned.
• If multiple matches are found, rows are returned based on the option specified in "Lookup policy on multiple match" in the lookup properties.
Static Cache
[Diagram: the Integration Service builds the cache once from the lookup source; source rows flow through the transformations to the target systems, querying the cache as they pass]
Persistent Cache
[Diagram: the cache built from the lookup source is saved to disk and reused by the Integration Service in subsequent session runs]
Dynamic Cache
[Diagram: the cache is kept in sync with the target; rows inserted into or updated in the target also update the cache]
Lookup cache: Dynamic
• Insert - inserts the row into the cache if it is not present and you have specified to insert rows. You can configure the transformation to insert rows into the cache based on input ports or generated sequence IDs.
• Update - updates the row in the cache if the row is already present and an update is specified in the properties.
• No change:
  • The row does not exist in the cache, but you have specified to insert new rows only
  • The row does not exist in the cache, but you have specified to update existing rows only
  • The row exists in the cache, but based on the lookup conditions nothing changes
Re-Cache from Source
[Diagram: the Integration Service rebuilds the cache from the lookup source before the rows flow through the transformations to the target systems]
Shared Cache – Unnamed Cache
• When two Lookup transformations share an unnamed cache, the Integration Service saves the cache for one Lookup transformation and uses it for subsequent Lookup transformations that have the same lookup cache structure.
• For example, if you have two instances of the same reusable Lookup transformation in one mapping and you use the same output ports for both instances, the Lookup transformations share the lookup cache by default.
• Shared transformations must use the same ports in the lookup condition. The conditions can use different operators, but the ports must be the same.
Shared Cache – Named Cache
• You can also share the cache between multiple Lookup transformations by using a persistent lookup cache and naming the cache files.
• When the Integration Service processes the first Lookup transformation, it searches the cache directory for cache files with the same file name prefix.
• If the Integration Service finds the cache files and you do not specify to recache from source, the Integration Service uses the saved cache files.
• If the Integration Service does not find the cache files, or if you specify to recache from source, the Integration Service builds the lookup cache.
• The Integration Service saves the cache files to disk after it processes each target load order.
Shared Cache – Named Cache
• The Integration Service fails the session if you configure subsequent Lookup transformations to re-cache from source, but not the first one in the same target load order group.
• If the cache structures do not match, the Integration Service fails the session.
• The Integration Service can process multiple sessions simultaneously when the Lookup transformations only need to read the cache files.
• The Integration Service fails the session if one session updates a cache file while another session attempts to read or update the cache file.
  • For example, Lookup transformations update the cache file if they are configured to use a dynamic cache or to re-cache from source.