
Informatica PowerCenter 9

Introduction
At the end of this course you will -
 Understand how to use all major PowerCenter components
 Be able to build basic ETL Mappings and Mapplets
 Understand the different Transformations and their basic attributes in PowerCenter
 Be able to create, run and monitor Workflows
 Be able to troubleshoot common development problems
ETL Basics
This section will include -

 Concepts of ETL
 PowerCenter Architecture
 Connectivity between PowerCenter components

Extract, Transform, and Load
[Diagram: data flows from the Operational Systems (RDBMS, Mainframe, Other) through Extract, Transform and Load into the Data Warehouse]
 Operational systems hold transaction-level, current data (normalized or de-normalized), optimized for transaction response time
 The Transform step cleanses data, applies business rules, aggregates, consolidates and de-normalizes data
 The Data Warehouse holds aggregated, historical data
PowerCenter Architecture
[Diagram: PowerCenter architecture]
PowerCenter Components
• PowerCenter Repository
• Application Services
  • Repository Service
  • Integration Service
• PowerCenter Client
  • Administration Console
  • Repository Manager
  • Designer
  • Workflow Manager
  • Workflow Monitor
• External Components
  • Sources
  • Targets
Introduction to PowerCenter Repository and Administration
This section includes -
 The purpose of the Repository and Repository Service
 Admin Console
 The Repository Manager
 Security and privileges
 Object sharing, searching and locking
PowerCenter Repository
 A relational database managed by the Repository Service
 Stores metadata about objects (mappings, transformations etc.) in database tables, collectively called the Repository Content
 The Repository database can be in Oracle, IBM DB2 UDB, MS SQL Server or Sybase ASE
 To create a Repository Service one must have full privileges in the Administration Console and in the domain
 The Integration Service uses repository objects to perform the ETL
Repository Service
 A Repository Service process is a multi-threaded process that fetches, inserts and updates metadata in the repository
 Manages connections to the Repository from client applications and the Integration Service
 Maintains object consistency by controlling object locking
 Each Repository Service manages a single repository database; however, multiple repositories can be connected and managed using a repository domain
 It can run on multiple machines or nodes in the domain; each instance is called a Repository Service process
Administration Console
 A web-based interface used to administer the PowerCenter domain
 The following tasks can be performed:
 Manage the domain
 Shut down and restart the domain and nodes
 Manage objects within a domain
 Create and manage Folders, Grids, Integration Services, Nodes, Repository Services, Web Services and Licenses
Administration Console
 Enable/disable various services such as the Integration Services, Repository Services etc.
 Upgrade Repositories and Integration Services
 View log events for the domain and the services
 View locks
 Add and manage users and their profiles
 Monitor user activity
 Manage Application Services
Repository Manager
Use the Repository Manager to navigate through multiple folders and repositories and to perform the following tasks:
 Add/edit Repository connections
 Implement Repository security (by changing your password only)
 Perform folder functions (create, edit, delete, compare)
 Compare Repository objects
 Manage workflow/session log entries
 View dependencies
 Exchange metadata with other BI tools
Introduction to PowerCenter Design Process
We will walk through -
 Design Process
 PowerCenter Designer Interface
 Mapping Components
Design Process
1. Create Source definition(s)
2. Create Target definition(s)
3. Create a Mapping
4. Create a Session Task
5. Create a Workflow from Task components
6. Run the Workflow
7. Monitor the Workflow and verify the results

Designer - Interface
[Screenshot: Designer interface showing the Client Tools, Navigator, Workspace, Overview Window, Output window and Status Bar]
Mapping Components
• Each PowerCenter mapping consists of one or more of the following mandatory components
• Sources
• Transformations
• Targets
• The components are arranged sequentially to form a valid data flow from Sources → Transformations → Targets
Introduction to PowerCenter Designer Interface
Designer - Interface
[Screenshot: Designer interface highlighting the Folder List, Mapping List, Transformation Toolbar and an iconized mapping]
Designer - Source Analyzer
[Screenshot: Source Analyzer displaying a source table with its foreign key; it also shows the dependencies of the tables]
Designer - Target Designer
Transformation Developer
The Transformation Developer is used only for creating reusable transformations
Mapplet Designer
Designer - Mapping Designer
EXTRACT – Source Object Definitions
This section introduces -
 Different Source Types
 Creation of ODBC Connections
 Creation of Source Definitions
 Source Definition properties
 Data Preview option

Source Analyzer
[Screenshot: Source Analyzer showing the Navigation Window and the Analyzer Window]
Methods of Analyzing Sources
 Import from Database
 Import from File
 Import from Cobol File
 Import from XML file
 Import from third party software like SAP, Siebel, PeopleSoft etc.
 Create manually
[Diagram: the Source Analyzer stores source definitions for Relational, XML file, Flat file and COBOL file sources in the Repository]
Analyzing Relational Sources
[Diagram: the Source Analyzer imports a relational source definition (table, view or synonym) via ODBC; the definition travels via TCP/IP to the Repository Service, which writes it natively to the Repository]
Analyzing Relational Sources
Editing Source Definition Properties
[Screenshot: source definition editor; the Columns tab shows the Key Type of each column]
Flat File Wizard
 Three-step wizard
 Columns can be renamed within the wizard
 Text, Numeric and Datetime datatypes are supported
 The wizard ‘guesses’ the datatype
LOAD – Target Definitions
Target Object Definitions
By the end of this section you will:
 Be familiar with Target Definition types
 Know the supported methods of creating Target Definitions
 Understand individual Target Definition properties

Target Designer

Creating Target Definitions
Methods of creating Target Definitions
 Import from Database
 Import from an XML file
 Import from third party software like SAP, Siebel etc.
 Manual Creation
 Automatic Creation

Automatic Target Creation
Drag-and-drop a Source Definition into the Target Designer Workspace
Import Definition from Database
Can “reverse engineer” existing object definitions from a database system catalog or data dictionary
[Diagram: the Target Designer reads a table, view or synonym definition from the database via ODBC; the definition travels via TCP/IP to the Repository Service, which writes it natively to the Repository]
Creating Physical Tables
[Diagram: LOGICAL repository target table definitions become PHYSICAL target database tables by executing SQL via the Designer]
TRANSFORM – Transformation Concepts
By the end of this section you will be familiar with:
 Transformation types
 Data Flow Rules
 Transformation Views
 PowerCenter Functions
 Expression Editor and Expression validation
 Port Types
 PowerCenter datatypes and datatype conversion
 Connection and Mapping Validation
 PowerCenter Basic Transformations – Source Qualifier, Filter, Joiner, Expression
Types of Transformations
 Active/Passive
• Active: changes the number of rows as data passes through it
• Passive: passes all the rows through it
 Connected/Unconnected
• Connected: connected to other transformations through connectors
• Unconnected: not connected to any transformation; called from within another transformation
Transformation Types
PowerCenter provides 24 objects for data transformation
 Aggregator: performs aggregate calculations
 Application Source Qualifier: reads Application object sources such as ERP
 Custom: calls a procedure in a shared library or DLL
 Expression: performs row-level calculations
 External Procedure (TX): calls compiled code for each row
 Filter: drops rows conditionally
 Mapplet Input: defines mapplet input rows; available in the Mapplet Designer
 Java: executes Java code
 Joiner: joins heterogeneous sources
 Lookup: looks up values and passes them to other objects
 Normalizer: reads data from VSAM and normalized sources
 Mapplet Output: defines mapplet output rows; available in the Mapplet Designer
Transformation Types
 Rank: limits records to the top or bottom of a range
 Router: splits rows conditionally
 Sequence Generator: generates unique ID values
 Sorter: sorts data
 Source Qualifier: reads data from flat file and relational sources
 Stored Procedure: calls a database stored procedure
 Transaction Control: defines commit and rollback transactions
 Union: merges data from different databases
 Update Strategy: tags rows for insert, update, delete, reject
 XML Generator: reads data from one or more input ports and outputs XML through a single output port
 XML Parser: reads XML from one or more input ports and outputs data through a single output port
 XML Source Qualifier: reads XML data
Data Flow Rules
 Each Source Qualifier starts a single data stream (a dataflow)
 Transformations can send rows to more than one transformation (split one data flow into multiple pipelines)
 Two or more data flows can meet together -- if (and only if) they originate from a common active transformation
 Cannot add an active transformation into the mix
[Diagram: merging branches through passive transformations is ALLOWED; merging through an active transformation is DISALLOWED]
The example holds true with a Normalizer in lieu of the Source Qualifier. Exceptions are the Mapplet Input and Joiner transformations
Transformation Views
A transformation has three views:
 Iconized - shows the transformation in relation to the rest of the mapping
 Normal - shows the flow of data through the transformation
 Edit - shows transformation ports and properties; allows editing
Edit Mode
Allows users with folder “write” permissions to change or create transformation ports and properties
[Screenshot: Edit mode, where you define port-level handling, define transformation-level properties, enter comments, make the transformation reusable and switch between transformations]
Expression Editor
 An expression formula is a calculation or conditional statement
 Used in Expression, Aggregator, Rank, Filter, Router, Update Strategy
 Performs calculations based on ports, functions, operators, variables, literals, constants and return values from other transformations
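A minimal sketch of such a formula, assuming hypothetical input ports PRICE, QTY and DISCOUNT feeding an output port TOTAL:

-- treat a NULL discount as zero, then compute the line total
PRICE * QTY * (1 - IIF(ISNULL(DISCOUNT), 0, DISCOUNT))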
PowerCenter Data Types
 Native datatypes: specific to the source and target database types; display in source and target tables within the Mapping Designer
 Transformation datatypes: PowerCenter internal datatypes based on ANSI SQL-92; display in transformations within the Mapping Designer
 Transformation datatypes allow mix and match of source and target database types
 When connecting ports, native and transformation datatypes must be compatible (or must be explicitly converted)
Datatype Conversions

                     Integer,    Decimal   Double,   String,   Date/   Binary
                     Small Int             Real      Text      Time
Integer, Small Int   X           X         X         X
Decimal              X           X         X         X
Double, Real         X           X         X         X
String, Text         X           X         X         X         X
Date/Time                                            X         X
Binary                                                                 X

 All numeric data can be converted to all other numeric datatypes, e.g. integer, double, and decimal
 All numeric data can be converted to string, and vice versa
 Date can be converted only to date and string, and vice versa
 Raw (binary) can only be linked to raw
 Other conversions not listed above are not supported
 These conversions are implicit; no function is necessary
PowerCenter Functions - Types
Character Functions
 Used to manipulate character data
 ASCII, CHR, CHRCODE, CONCAT, INITCAP, INSTR, LENGTH, LOWER, LPAD, LTRIM, RPAD, RTRIM, SUBSTR, UPPER, REPLACESTR, REPLACECHR
 CHRCODE returns the numeric value (ASCII or Unicode) of the first character of the string passed to this function
 CONCAT is for backwards compatibility only - use || instead
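A short sketch, assuming a hypothetical FULL_NAME port holding values like 'Smith, John':

-- everything before the comma, i.e. 'Smith'
SUBSTR(FULL_NAME, 1, INSTR(FULL_NAME, ',') - 1)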
PowerCenter Functions
Conversion Functions
 Used to convert datatypes
 TO_CHAR (numeric), TO_DATE, TO_DECIMAL, TO_FLOAT, TO_INTEGER, TO_NUMBER
Date Functions
 Used to round, truncate, or compare dates; extract one part of a date; or perform arithmetic on a date
 ADD_TO_DATE, DATE_COMPARE, DATE_DIFF, GET_DATE_PART, LAST_DAY, ROUND (date), SET_DATE_PART, TO_CHAR (date), TRUNC (date)
 To pass a string to a date function, first use the TO_DATE function to convert it to a date/time datatype
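A small sketch, assuming a hypothetical string port ORDER_DATE_STR in 'YYYY-MM-DD' format:

-- convert the string to a date/time value, then add 30 days
ADD_TO_DATE(TO_DATE(ORDER_DATE_STR, 'YYYY-MM-DD'), 'DD', 30)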
PowerCenter Functions
Special Functions
 Used to handle specific conditions within a session; search for certain values; test conditional statements
 ERROR, ABORT, DECODE, IIF
 IIF(Condition, True, False)
Test Functions
 Used to test if a lookup result is null; used to validate data
 ISNULL, IS_DATE, IS_NUMBER, IS_SPACES
Encoding Functions
 Used to encode string values
 SOUNDEX, METAPHONE
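A hedged example combining a test function with DECODE, assuming a hypothetical string port QTY_STR; the last argument is the default:

DECODE(TRUE,
       ISNULL(QTY_STR),    'MISSING',
       IS_NUMBER(QTY_STR), 'OK',
       'INVALID')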
Expression Validation
The Validate or ‘OK’ button in the Expression Editor will:
 Parse the current expression
• Remote port searching (resolves references to ports in other transformations)
 Parse transformation attributes
• e.g. - filter condition, lookup condition, SQL Query
 Parse default values
 Check spelling, correct number of arguments in functions, and other syntactical errors
Types of Ports
 There are four basic types of ports:
• Input
• Output
• Input/Output
• Variable
 Apart from these, Lookup and Return ports are specific to the Lookup transformation
Variable and Output Ports
 Use to simplify complex expressions
• e.g. - create and store a depreciation formula to be referenced more than once, as in the sketch below
 Use in another variable port or an output port expression
 Local to the transformation (a variable port cannot also be an input or output port)
 Available in the Expression, Aggregator and Rank transformations
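A minimal sketch of the depreciation idea, assuming hypothetical input ports COST and SALVAGE; variable ports are evaluated top to bottom, so V_DEPRECIATION must sit above the output port that uses it:

-- variable port V_DEPRECIATION: straight-line depreciation over 5 years
(COST - SALVAGE) / 5
-- output port NET_BOOK_VALUE references the variable port
COST - V_DEPRECIATION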
Connection Validation
Examples of invalid connections in a Mapping:
 Connecting ports with incompatible datatypes
 Connecting output ports to a Source
 Connecting a Source to anything but a Source Qualifier or Normalizer transformation
 Connecting an output port to an output port, or an input port to another input port
 Connecting more than one active transformation to another transformation (invalid dataflow)
Mapping Validation
 Mappings must:
• Be valid for a Session to run
• Be end-to-end complete and contain valid expressions
• Pass all data flow rules
 Mappings are always validated when saved; they can also be validated without being saved
 The Output Window will always display the reason for invalidity
Source Qualifier Transformation
 Reads data from the sources
 Active & Connected Transformation
 Applicable only to relational and flat file sources
 Maps database/file-specific datatypes to PowerCenter native datatypes, e.g. Number(24) becomes Decimal(24)
 Determines how the source database binds data when the Integration Service reads it
 If the source definition and Source Qualifier datatypes do not match, the mapping is invalid
 All ports are Input/Output ports by default
Source Qualifier Transformation
Used as
 Joiner for homogeneous tables, using a where clause
 Filter, using a where clause
 Sorter
 Select distinct values
Pre-SQL and Post-SQL Rules
 Can use any command that is valid for the database type; no nested comments
 Can use Mapping Parameters and Variables in SQL executed against the source
 Use a semi-colon (;) to separate multiple statements
 The Integration Service ignores semi-colons within single quotes, double quotes or within /* ... */ comments
 To use a semi-colon outside of quotes or comments, ‘escape’ it with a back slash (\), as in the example below
 The Workflow Manager does not validate the SQL
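A hedged illustration of these rules, assuming an Oracle source and a hypothetical audit table ETL_AUDIT. The first semi-colon separates the two statements; the semi-colons inside the PL/SQL block are escaped with a back slash so the block reaches the database as a single statement:

TRUNCATE TABLE ETL_AUDIT;
BEGIN INSERT INTO ETL_AUDIT (run_date) VALUES (SYSDATE)\; COMMIT\; END\;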
Filter Transformation
Drops rows conditionally
 Active Transformation
 Connected
 Ports
• All input / output
 Specify a Filter condition
 Usage
• Filter rows from flat file sources
• Single pass source(s) into multiple targets
Filter Transformation – Tips
 A simple Boolean condition is always faster than a complex condition
 Use the Filter transformation early in the mapping
 The Source Qualifier filters rows only from relational sources, but the Filter transformation is source independent
 Always validate a condition; a sketch follows
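A minimal filter condition, assuming hypothetical ports ORDER_STATUS and ORDER_AMOUNT; rows for which the condition evaluates to FALSE are dropped:

ORDER_STATUS = 'SHIPPED' AND ORDER_AMOUNT > 0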
Joiner Transformation
By the end of this sub-section you will be familiar with:
 When to use a Joiner Transformation
 Homogeneous Joins
 Heterogeneous Joins
 Joiner properties
 Joiner Conditions
 Nested joins

Homogeneous Joins
Joins that can be performed with a SQL SELECT statement:
 The Source Qualifier contains a SQL join
 Tables are on the same database server (or are synonyms)
 The database server does the join “work”
 Multiple homogeneous tables can be joined
Heterogeneous Joins
Joins that cannot be done with a SQL statement:
 An Oracle table and a Sybase table
 Two Informix tables on different database servers
 Two flat files
 A flat file and a database table

Joiner Transformation
Performs heterogeneous joins on records from two tables on the same or different databases, or from flat file sources
 Active Transformation
 Connected
 Ports
• All input or input / output
• “M” denotes a port that comes from the master source
 Specify the Join condition
 Usage
• Join two flat files
• Join two tables from different databases
• Join a flat file with a relational table
Joiner Conditions
[Screenshot: Condition tab; multiple join conditions are supported]
Joiner Properties
Join types:
 “Normal” (inner)
 Master outer
 Detail outer
 Full outer
[Screenshot: Properties tab, where the Joiner cache sizes are set]
The Joiner can accept sorted data
Sorted Input for Joiner
 Using sorted input improves session performance by minimizing disk input and output
 The pre-requisites for using sorted input are:
• The database sort order must be the same as the session sort order
• The sort order must be established through sorted sources (flat files/relational tables) or a Sorter transformation
• The flow of sorted data must be maintained by avoiding transformations like Rank, Custom, Normalizer etc. which alter the sort order
• Enable the Sorted Input option in the Properties tab
• The order of the ports used in the join condition must match the order of the ports at the sort origin
• When joining the Joiner output with another pipeline, make sure that the data from the first Joiner is sorted
Mid-Mapping Join - Tips
The Joiner does not accept input in the following situations:
 Both input pipelines begin with the same Source Qualifier
 Both input pipelines begin with the same Normalizer
 Both input pipelines begin with the same Joiner
 Either input pipeline contains an Update Strategy

Expression Transformation
 Passive Transformation
 Connected
 Ports
• Mixed
• Variables allowed
 Create expressions in an output or variable port
 Usage
• Perform the majority of data manipulation
[Screenshot: the arrow button in the Expression field invokes the Expression Editor]
Router Transformation
Multiple filters in a single transformation
 Active Transformation
 Connected
 Ports
• All input/output
 Specify filter conditions for each Group
 Usage
• Link source data in one pass to multiple filter conditions
[Screenshot: Groups tab, where a group is added for each filter condition]
Router Transformation in a Mapping
[Diagram: TARGET_ORDERS_COST (Oracle) feeds SQ_TARGET_ORDERS_COST, which feeds RTR_OrderCost; the Router routes rows to TARGET_ROUTED_ORDER1 (Oracle) and TARGET_ROUTED_ORDER2 (Oracle)]
Comparison – Filter and Router

Filter                                         Router
Tests rows for only one condition              Tests rows for one or more conditions
Drops the rows which don’t meet the filter     Routes the rows not meeting any filter
condition                                      condition to the default group

 With multiple Filter transformations the Integration Service processes the rows for each transformation, but with a Router the incoming rows are processed only once
Sorter Transformation
Sorts the data, selects distinct values
 Active transformation
 Is always connected
 Can sort data from relational tables or flat files, in ascending or descending order
 Has only Input/Output ports; ports used as sort keys are marked as Key
 The sort takes place on the Integration Service machine
 Multiple sort keys are supported; the Integration Service sorts each port sequentially
 The Sorter transformation is often more efficient than a sort performed on a database with an ORDER BY clause
Sorter Transformation
 Discard duplicate rows by selecting the ‘Distinct’ option
 Acts as an active transformation with the Distinct option, else as passive
Aggregator Transformation
Performs aggregate calculations
 Active Transformation
 Connected
 Ports
• Mixed
• Variables allowed
• Group By allowed
 Create expressions in output or variable ports
 Usage
• Standard aggregations
PowerCenter Aggregate Functions
Aggregate Functions: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
 Return summary values for non-null data in selected ports
 Used only in Aggregator transformations
 Used in output ports only
 Calculate a single value (and row) for all records in a group
 Only one aggregate function can be nested within an aggregate function
 Conditional statements can be used with these functions
Aggregate Expressions
 Aggregate functions are supported only in the Aggregator transformation
 Conditional aggregate expressions are supported
 Conditional SUM format: SUM(value, condition)
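A short sketch of a conditional aggregate, assuming a hypothetical port SALES_AMOUNT and a Group By port REGION:

-- sum only the rows where the condition holds
SUM(SALES_AMOUNT, SALES_AMOUNT > 0)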
Aggregator Properties
[Screenshot: Properties tab; the Sorted Input property instructs the Aggregator to expect the data to be sorted, and the Aggregator cache sizes (on the Integration Service machine) are set here]
Why Sorted Input?
 The Aggregator works efficiently with sorted input data
• Sorted data can be aggregated more efficiently, decreasing total processing time
 The Integration Service will cache data from each group and release the cached data upon reaching the first record of the next group
 Data must be sorted according to the order of the Aggregator “Group By” ports
 The performance gain will depend upon varying factors
Lookup Transformation
By the end of this sub-section you will be familiar with:
 Lookup principles
 Lookup properties
 Lookup conditions
 Lookup techniques
 Caching considerations

Lookup Transformation
 Get a related value
 Get multiple values
 Perform a calculation
 Update Slowly Changing Dimension tables
Lookup Transformation
 Different types of configuration are possible:
 Flat File or Relational
 Pipeline Lookup
 Connected or Unconnected Lookup
 Cached or Un-cached Lookup
How a Lookup Transformation Works
 For each Mapping row, one or more port values are looked up in a database table
 If a match is found, one or more table values are returned to the Mapping. If no match is found, NULL is returned
[Diagram: the Source Qualifier SQ_TARGET_ITEMS_OR... passes look-up values to LKP_OrderID (Lookup Procedure), which returns values to the target definition TARGET_ORDERS_COS...]
Lookup Transformation
Looks up values in a database table or flat file and provides data to downstream transformations in a Mapping
 Passive Transformation
 Connected / Unconnected
 Ports
• Mixed
• “L” denotes a Lookup port
• “R” denotes a port used as a return value (unconnected Lookup only)
 Specify the Lookup Condition
 Usage
• Get related values
• Verify if a record exists or if data has changed
Lookup Properties
[Screenshot: Properties tab showing the Lookup SQL Override option, the caching toggle and the native Database Connection Object name]
Additional Lookup Properties
[Screenshot: Properties tab where the cache directory is set, the cache is made persistent and the Lookup cache sizes are set]
Lookup Conditions
Multiple conditions are supported

Connected Lookup
A Connected Lookup is part of the data flow pipeline
[Diagram: SQ_TARGET_ITEMS_OR... (Source Qualifier) connects to LKP_OrderID (Lookup Procedure), which connects to the target definition TARGET_ORDERS_COS...]
Unconnected Lookup
 Physically “unconnected” from other transformations - there can be NO data flow arrows leading to or from an unconnected Lookup
 The Lookup function can be set within any transformation that supports expressions
 Lookup data is called from the point in the Mapping that needs it
[Screenshot: a function in the Aggregator calls the unconnected Lookup]
Conditional Lookup Technique
Two requirements:
 Must be an Unconnected (or “function mode”) Lookup
 The Lookup function is used within a conditional statement

IIF ( ISNULL(customer_id), 0, :lkp.MYLOOKUP(order_no))

Here the condition is the first argument and the row key (order_no) is passed to the Lookup function
 The conditional statement is evaluated for each row
 The Lookup function is called only under the pre-defined condition
Conditional Lookup Advantage
 Data lookup is performed only for those rows which require it; substantial performance can be gained
EXAMPLE: A Mapping will process 500,000 rows. For two percent of those rows (10,000) the item_id value is NULL. Item_ID can be derived from the SKU_NUMB.

IIF ( ISNULL(item_id), 0, :lkp.MYLOOKUP(sku_numb))

The condition is true for 2 percent of all rows, and the Lookup is called only when the condition is true.
Net savings = 490,000 lookups
Unconnected Lookup - Return Port
 The port designated as ‘R’ is the return port for the unconnected Lookup
 There can be only one return port
 A Lookup (L) / Output (O) port can be assigned as the Return (R) port
 The Unconnected Lookup can be called in any other transformation’s expression editor using the expression
:LKP.Lookup_Transformation(argument1, argument2, ...)
Connected vs. Unconnected Lookups

CONNECTED LOOKUP                             UNCONNECTED LOOKUP
Part of the mapping data flow                Separate from the mapping data flow
Returns multiple values (by linking          Returns one value (by checking the Return (R)
output ports to another transformation)      port option for the output port that provides
                                             the return value)
Executed for every record passing            Executed only when the lookup function
through the transformation                   is called
More visible - shows where the lookup        Less visible - the lookup is called from an
values are used                              expression within another transformation
Default values are used                      Default values are ignored
To Cache or not to Cache?
Caching can significantly impact performance
 Cached
• Lookup table data is cached locally on the machine
• Mapping rows are looked up against the cache
• Only one SQL SELECT is needed
 Uncached
• Each Mapping row needs one SQL SELECT
 Rule of thumb: cache if the number (and size) of records in the Lookup table is small relative to the number of mapping rows requiring lookup, or if large cache memory is available for the Integration Service
Additional Lookup Cache Options
 Make cache persistent
 Cache File Name Prefix
• Reuse a cache by name for another similar business purpose
 Recache from Source
• Overrides other settings, and the Lookup data is refreshed
 Dynamic Lookup Cache
• Allows a row to know about the handling of a previous row
Persistent Caches
 By default, Lookup caches are not persistent
 When the Session completes, the cache is erased
 The cache can be made persistent via the Lookup properties
 When the Session completes, the persistent cache is stored in files on the machine's hard disk
 The next time the Session runs, the cached data is loaded fully or partially into RAM and reused
 Can improve performance, but “stale” data may pose a problem
Update Strategy Transformation
By the end of this section you will be familiar with:
 Update Strategy functionality
 Update Strategy expressions
 Refresh strategies
 Smart aggregation

Target Refresh Strategies
 Single snapshot: target truncated, new records inserted
 Sequential snapshot: new records inserted
 Incremental: only new records are inserted; records already present in the target are ignored
 Incremental with Update: only new records are inserted; records already present in the target are updated
Update Strategy Transformation
Used to specify how each individual row will be used to update target tables (insert, update, delete, reject)
• Active Transformation
• Connected
• Ports
• All input / output
• Specify the Update Strategy Expression
• Usage
• Updating Slowly Changing Dimensions
• IIF or DECODE logic determines how to handle the record, as in the sketch below
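A minimal Update Strategy expression, assuming a hypothetical flag port TARGET_EXISTS set to 1 when the row is already in the target; DD_UPDATE and DD_INSERT are the built-in row-tagging constants:

-- tag existing rows for update and new rows for insert
IIF(TARGET_EXISTS = 1, DD_UPDATE, DD_INSERT)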
Sequence Generator Transformation
Generates unique keys for any port on a row
• Passive Transformation
• Connected
• Ports
• Two predefined output ports, NEXTVAL and CURRVAL
• No input ports allowed
• Usage
• Generate sequence numbers
• Shareable across mappings
Sequence Generator Properties
[Screenshot: Properties tab showing the Increment Value, the option to repeat values and the Number of Cached Values]
Introduction to Workflows
This section will include -
 Integration Service Concepts
 The Workflow Manager GUI interface
 Setting up Server Connections
• Relational
• FTP
• External Loader
• Application
 Task Developer
 Creating and configuring Tasks
 Creating and configuring Workflows
 Workflow Schedules
Integration Service
 The application service that runs data integration sessions and workflows
 To access it, one must have permissions on the service in the domain
 Is managed through the Administration Console
 A repository must be assigned to it
 A code page must be assigned to the Integration Service process, and it should be compatible with the repository service code page
Workflow Manager Interface
[Screenshot: Workflow Manager showing the Task Tool Bar, Workflow Designer Tools, Navigator Window, Workspace, Output Window and Status Bar]
Workflow Manager Tools
 Workflow Designer
• Maps the execution order and dependencies of Sessions, Tasks and Worklets for the Integration Service
 Task Developer
• Create Session, Shell Command and Email tasks
• Tasks created in the Task Developer are reusable
 Worklet Designer
• Creates objects that represent a set of tasks
• Worklet objects are reusable
Source & Target Connections
 Configure Source & Target data access connections
• Used in Session Tasks
 Configure:
 Relational
 MQ Series
 FTP
 Application
 Loader
Relational Connections (Native)
 Create a relational (database) connection
• Instructions for the Integration Service to locate relational tables
• Used in Session Tasks
Relational Connection Properties
 Define a native relational (database) connection
[Screenshot: connection properties showing the User Name/Password, database connectivity information, optional Environment SQL (executed with each use of the database connection) and optional Transaction Environment SQL (executed before the initiation of each transaction)]
Task Developer
 Create basic reusable “building blocks” to use in any Workflow
 Reusable Tasks
• Session - set of instructions to execute Mapping logic
• Command - specify OS shell / script command(s) to run during the Workflow
• Email - send email at any point in the Workflow
Session Tasks
After this section, you will be familiar with:
 How to create and configure Session Tasks
 Session Task properties
 Transformation property overrides
 Reusable vs. non-reusable Sessions
 Session partitions

Session Task
 Instructs the Integration Service to run the logic of ONE specific Mapping
• e.g. - source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
 Becomes a component of a Workflow (or Worklet)
 If configured in the Task Developer, the Session Task is reusable (optional)
Session Task
 Created to execute the logic of a mapping (one mapping only)
 Session Tasks can be created in the Task Developer (reusable) or the Workflow Designer (Workflow-specific)
 Steps to create a Session Task
• Select the Session button from the Task Toolbar, or
• Select menu Tasks -> Create
Session Task - Sources
[Screenshot: Session Task source connection settings]
Session Task - Targets
[Screenshot: Session Task target connection settings]
Session Task - Transformations
 Allows overrides of some transformation properties
 Does not change the properties in the Mapping
Command Task
 Specify one (or more) Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
 Becomes a component of a Workflow (or Worklet)
 If configured in the Task Developer, the Command Task is reusable (optional)
 Commands can also be referenced in a Session through the Session “Components” tab as Pre- or Post-Session commands
Email Task
 Sends email during a workflow
 Becomes a component of a Workflow (or Worklet)
 If configured in the Task Developer, the Email Task is reusable (optional)
 Email can also be sent by using the post-session email and suspension email options of the session (non-reusable)
Workflow Structure
 A Workflow is a set of instructions for the Integration Service to perform data transformation and load
 Combines the logic of Session Tasks, other types of Tasks and Worklets
 The simplest Workflow is composed of a Start Task, a Link and one other Task
[Diagram: a Start Task connected by a Link to a Session Task]
Additional Workflow Components
Two additional components are Worklets and Links
 Worklets are objects that contain a series of Tasks
 Links are required to connect objects in a Workflow

Building Workflow Components
 Add Sessions and other Tasks to the Workflow
 Connect all Workflow components with Links
 Save the Workflow
 Assign the Workflow to the Integration Service
 Start the Workflow
 Sessions in a Workflow can be independently executed
Workflow Properties
[Screenshot: customize Workflow properties, choose where the Workflow log displays, and select an optional Workflow Schedule, which may be reusable or non-reusable]
Workflow Properties
 Create a user-defined Event, which can later be used with the Raise Event Task
 Define Workflow Variables that can be used in later Task objects (example: Decision Task)
Workflow Administration
This section details -
 The Workflow Monitor GUI interface
 Monitoring views
 Server monitoring modes
 Filtering displayed items
 Actions initiated from the Workflow Monitor
Workflow Monitor Interface
[Screenshot: Workflow Monitor listing the available Integration Services]
Monitoring Workflows
 Perform operations in the Workflow Monitor
 Restart -- restart a Task, Workflow or Worklet
 Stop -- stop a Task, Workflow, or Worklet
 Abort -- abort a Task, Workflow, or Worklet
 Recover -- recovers a suspended Workflow, after a failed Task is corrected, from the point of failure
 View Session and Workflow logs
 Abort has a 60 second timeout
 If the Integration Service has not completed processing and committing data during the timeout period, the threads and processes associated with the Session are killed
 Stopping a Session Task means the Integration Service stops reading data
Monitor Workflows
 The Workflow Monitor is the tool for monitoring Workflows and Tasks
 Review details about a Workflow or Task in two views
• Gantt Chart view
• Task view

Monitoring Workflows
[Screenshot: Task View showing each Workflow's Start Time, Completion Time and Status, with the Status Bar below]
PowerCenter Designer
Other Transformations
This section introduces -
 Rank
 Normalizer
 Stored Procedure
Rank Transformation
 Active
 Connected
 Selects the top and bottom rank of the data
 Different from the MAX and MIN functions, as we can choose a set of top or bottom values
 You can designate only one Rank port in a Rank transformation, i.e. the rank is decided based on one column only
 The Integration Service uses the Rank Index port to store the ranking position for each row
 The Designer creates a RANKINDEX port for each Rank transformation
Normalizer Transformation
 Active
 Connected
 Used to organize data to reduce redundancy, primarily with COBOL sources
 A single long record with repeated data is converted into separate records
Stored Procedure Transformation
 Passive
 Connected / Unconnected
 Used to run Stored Procedures already present in the database
 A valid relational connection is needed for the Stored Procedure transformation to connect to the database and run the stored procedure
Reusability
This section discusses -
 Parameters and Variables
 Transformations
 Mapplets
 Tasks

Parameters and Variables
 System Variables
 Creating Parameters and Variables
 Features and advantages
 Establishing values for Parameters and Variables

System Variables
 SYSDATE
• Provides the current datetime on the Integration Service machine
• Not a static value
 $$$SessStartTime
• Returns the system date value as a string when a session is initialized; uses the system clock on the machine hosting the Integration Service
• The format of the string is database type dependent
• Used in SQL overrides
• Has a constant value
 SESSSTARTTIME
• Returns the system date value on the Integration Service
• Used with any function that accepts transformation date/time datatypes
• Not to be used in a SQL override
• Has a constant value
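A small sketch of the difference, assuming a hypothetical date/time port ORDER_DATE:

-- expression usage: SESSSTARTTIME is a date/time value
DATE_DIFF(SESSSTARTTIME, ORDER_DATE, 'DD')
-- SQL override usage: $$$SessStartTime expands to a string literal, e.g.
-- WHERE ORDER_DATE > '$$$SessStartTime'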
Mapping Parameters and Variables
 Apply to all transformations within one Mapping
 Represent declared values
 Variables can change in value during run-time
 Parameters remain constant during run-time
 Provide increased development flexibility
 Defined in the Mapping menu
 Format is $$VariableName or $$ParameterName, as in the example below
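A hedged example, assuming a hypothetical parameter $$REGION_ID declared in the Mappings menu; using it in a Filter condition keeps the mapping reusable across regions:

-- Filter condition; $$REGION_ID is resolved when the session runs
REGION_ID = $$REGION_ID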
Mapping Parameters and Variables
[Screenshot: sample declarations; set user-defined names, the appropriate aggregation type and an optional Initial Value]
 Declare Variables and Parameters in the Designer Mappings menu
Transformation Developer
 Transformations used in multiple mappings are called Reusable Transformations
 Two ways of building reusable transformations:
• Using the Transformation Developer
• Making the transformation reusable by checking the Reusable option in the Mapping Designer
 Changes made to a reusable transformation are inherited by all its instances (validate all the mappings that use the instances)
 Most transformations can be made non-reusable/reusable; the External Procedure transformation can be created as a reusable transformation only
Mapplet Designer
 When a group of transformations is to be reused in multiple mappings, we develop mapplets
 Input and/or Output can be defined for the mapplet
 Editing the mapplet changes all the instances of the mapplet used
Reusable Tasks
 Tasks can be created in
• Task Developer (reusable)
• Workflow Designer (non-reusable)
 Tasks can be made reusable by checking the ‘Make Reusable’ checkbox in the General tab of the session
 The following tasks can be made reusable:
• Session
• Email
• Command
 When a group of tasks is to be reused, use a Worklet (in the Worklet Designer)
Queries???
Informatica version 8.6 vs 9.x

Aspect                    9.x
Architecture              No change
Client Tools              IDQ – Developer, Analyst
Lookup Transformation     Cache update feature; multiple-row return from a lookup, making it an Active transformation
SQL Transformation        Introduction of Query mode (DML – Active) & Script mode (DDL – Passive)
XML Transformation        An XML error can be passed as an output to a target without session failure
Resilience                DB deadlock resilience; auto recovery during deadlock
Monitoring                Job monitoring can be done directly from the Admin Console
Deployments               Deployment group creation feature is available
Workflow Execution Process
[Diagram: the Workflow is picked up by the Load Manager (1), which starts the DTM Process (2)]
Load Manager Process Steps
• Locks the workflow and reads workflow properties
• Reads the parameter file and expands workflow variables
• Creates the workflow log file
• Runs workflow tasks
• Distributes sessions to worker servers
• Starts the DTM to run sessions
• Runs sessions from master servers
• Sends post-session email if the DTM terminates abnormally
DTM Process Steps
• Fetches session and mapping metadata from the repository
• Creates and expands session variables
• Creates the session log file
• Validates session code pages if data code page validation is enabled; checks query conversions if data code page validation is disabled
• Verifies connection object permissions
• Runs pre-session shell commands
• Runs pre-session stored procedures and SQL
• Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data
• Runs post-session stored procedures and SQL
• Runs post-session shell commands
• Sends post-session email
DTM Threads
• Master Thread
• Mapping Thread
• Pre- and Post-Session Thread
• Reader Thread
• Writer Thread
• Transformation Thread
Lookup cache – What is it?
• Lookup transformations can be configured to use cache files
• The Integration Service builds the cache in memory when the first row is processed. If the memory is inadequate, the data is paged into a cache file
• If you use a flat file lookup, the Integration Service always caches the lookup rows
• By default, the cache files are created under $PMCacheDir
• Cache if the number (and size) of records in the Lookup table is small relative to the number of mapping rows requiring the lookup
Types of Lookup Cache
• Static Cache
• Dynamic cache
• Persistent Cache
• Re-cache from Source
• Shared Cache
• Un-named Cache
• Named Cache

Lookup cache: Static
• This is the default type of cache
• The cache is built when the first lookup row is processed
• For each row that passes the transformation, the cache is queried for the specified condition
• If a match is available, the proper value is returned
• If a match is not available, either the default value (for connected lookups only) or NULL is returned
• If multiple matches are found, rows are returned based on the option specified in “Lookup policy on multiple match” in the lookup properties
Static Cache
[Diagram: the Integration Service builds a cache from the Lookup Source; source rows flow through the transformations, are looked up against the cache, and continue to the target systems]
Persistent Cache
[Diagram: same flow as the static cache, but the cache is retained for reuse across session runs]
Dynamic Cache
[Diagram: the cache is built from the Lookup Source and kept in sync with the Target as rows are processed]
Lookup cache: Dynamic
• The cache file is constantly updated by the following actions:
• Insert - inserts the row into the cache if it is not present and you specified to insert rows. You can configure to insert rows into the cache based on input ports or generated sequence IDs
• Update - updates the row in the cache if the row is already present and an update is specified in the properties
• No change:
• The row exists in the cache, but you have specified to insert new rows only
• The row does not exist in the cache, but you have specified to update existing rows only
• The row exists in the cache, but based on the lookup conditions nothing changes
Re-Cache from Source
[Diagram: the Integration Service rebuilds the cache from the Lookup Source the first time the Lookup is used; source rows then flow through the transformations to the target systems as in the static cache case]
Shared Cache – Unnamed Cache
• When two Lookup transformations share an unnamed cache, the Integration Service saves the cache for one Lookup transformation and uses it for subsequent Lookup transformations that have the same lookup cache structure
• For example, if you have two instances of the same reusable Lookup transformation in one mapping and you use the same output ports for both instances, the Lookup transformations share the lookup cache by default
• Shared transformations must use the same ports in the lookup condition. The conditions can use different operators, but the ports must be the same
Shared Cache – Named Cache
• You can also share the cache between multiple Lookup transformations by using a persistent lookup cache and naming the cache files
• When the Integration Service processes the first Lookup transformation, it searches the cache directory for cache files with the same file name prefix
• If the Integration Service finds the cache files and you do not specify to recache from source, the Integration Service uses the saved cache files
• If the Integration Service does not find the cache files, or if you specify to recache from source, the Integration Service rebuilds the lookup cache
• The Integration Service saves the cache files to disk after it processes each target load order
Shared Cache – Named Cache
• The Integration Service fails the session if you configure subsequent Lookup transformations to re-cache from source, but not the first one in the same target load order group
• If the cache structures do not match, the Integration Service fails the session
• The Integration Service can process multiple sessions simultaneously when the Lookup transformations only need to read the cache files
• The Integration Service fails the session if one session updates a cache file while another session attempts to read or update the cache file
• For example, Lookup transformations update the cache file if they are configured to use a dynamic cache or to re-cache from source