Informatica Interview Questions and Answers

The document discusses the candidate's work experience developing ETL mappings using Informatica. It describes their roles and responsibilities which include understanding data models, creating mapping specifications, developing mappings using transformations, unit testing, code reviews, and knowledge transfer activities. The current project involves loading data from an Oracle OLTP system into a data warehouse for reporting in Business Objects. Materialized views are used to optimize report performance.


I am <Name>. I completed my MBA from JNTU University in 2015. After that I got an opportunity to work for TCS from March 2016 to December 2018, where I started my career as an ETL developer, and then I moved to Omega Health Care Solutions, where I have been working from December 2018 to date.

In total I have 4.5 years of experience in DWH using the Informatica tool on development and enhancement projects. I have primarily worked in the healthcare and manufacturing domains.

In my current project my roles and responsibilities are basically:

⮚ I work in an onsite-offshore model, so we get our tasks from the onsite team.
⮚ As a developer I first need to understand the physical data model, i.e. the dimensions and facts and their relationships, and also the functional specification prepared by the Business Analyst that describes the business requirement.
⮚ I am involved in the preparation of the source-to-target mapping sheet (tech spec), which tells us what the source and target are, which source column maps to which target column, and what the business logic is. This document gives a clear picture for development.
⮚ Creating Informatica mappings, sessions and workflows using different transformations to implement the business logic.
⮚ Preparing unit test cases as per the business requirement is also one of my responsibilities.
⮚ I am also involved in unit testing of the mappings I develop.
⮚ I do source-code reviews for the mappings and workflows developed by my team members.
⮚ I am also involved in preparing the deployment plan, which contains the list of mappings and workflows to be migrated; based on this the deployment team can migrate the code from one environment to another.
⮚ Once the code is rolled out to production we work with the production support team for two weeks, during which we give knowledge transfer (KT) in parallel. We also prepare the KT document for the production team.
Manufacturing or Supply Chain

Coming to My Current Project:

Currently I am working on the XXX project for the YYY client. Generally, YYY does not have a manufacturing unit. What the BIZ (business) does here is, before the quarter ends, they call for quotations from the primary supply channels; this process is called an RFQ (Request For Quotation). Once BIZ creates an RFQ, a notification automatically goes to the supply channels. The supply channels send back their respective quoted values, which we call the response from the supply channel. After that the business starts negotiations with the supply channels for the best deal and then approves the RFQs.

All these activities (creating the RFQ, the supplier response, approving the RFQ, etc.) are performed in Oracle Apps, which is the source front-end application. This data gets stored in the OLTP, so the OLTP contains all the RFQ, supplier response and approval status data.

We have some Oracle jobs running between OLTP and ODS which replicate the OLTP data to
ODS. It is designed in such a way that any transaction entering into the OLTP is immediately
reflected into the ODS.

We have a staging area where we load the entire ODS data into staging tables. For this we have created some Informatica mappings; these mappings truncate and reload the staging tables on each session run. Before loading the staging tables we drop the indexes, and after loading the bulk data we recreate the indexes using stored procedures.
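As a rough illustration (the table and index names below are only hypothetical, not taken from the actual project), the pre-load and post-load stored procedures could look something like this:

CREATE OR REPLACE PROCEDURE drop_stg_indexes AS
BEGIN
  -- drop the index before the bulk load so the inserts run faster
  EXECUTE IMMEDIATE 'DROP INDEX stg_rfq_response_idx1';
END;
/

CREATE OR REPLACE PROCEDURE create_stg_indexes AS
BEGIN
  -- recreate the index once the bulk load completes
  EXECUTE IMMEDIATE 'CREATE INDEX stg_rfq_response_idx1 ON stg_rfq_response (rfq_id)';
END;
/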

Then we extract all this data from the staging area and load it into the dimensions and facts. On top of the dims and facts we have created some materialized views as per the report requirements. Finally the reports pull the data directly from the materialized views. The performance of these reports/dashboards is always good because we are not doing any calculation at the reporting level. These dashboards/reports can be used for analysis purposes, for example: how many RFQs were created, how many RFQs were approved, and how many RFQs got responses from the supply channels?

What is the budget? How much of the budget is approved? Who is the approving manager and with whom is the approval pending? What is the past feedback on the supply channels? And so on. (A sketch of such a materialized view follows below.)
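For illustration only (the table and column names below are assumed, not the real model), a reporting materialized view answering these questions could be sketched like this:

CREATE MATERIALIZED VIEW mv_rfq_summary
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT d.quarter_name,
       COUNT(*) AS rfqs_created,
       SUM(CASE WHEN f.rfq_status = 'APPROVED'  THEN 1 ELSE 0 END) AS rfqs_approved,
       SUM(CASE WHEN f.rfq_status = 'RESPONDED' THEN 1 ELSE 0 END) AS rfqs_responded
FROM   fact_rfq f
JOIN   dim_date d ON d.date_key = f.rfq_date_key
GROUP  BY d.quarter_name;
-- The report then selects directly from mv_rfq_summary, with no joins at run time.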

In the present system they do not have a BI design, so they use a manual process of exporting SQL query data to Excel sheets and preparing pie charts using macros. In the new system we are providing BI-style reports such as drill-down, drill-up, pie charts, graphs, detail reports and dashboards.

For Sales Project:

(Diagram in the original document: Biz / Reports; Production Unit → Delivery Centers → distributors Dist1, Dist2, Dist3.)

Coming to My Current Project:

The current system was designed in webMethods (a middleware tool). They found some issues in the existing system: it does not support BI capabilities like drill-down, drill-up and so on. That is why the business (client) decided to migrate to Informatica.

The as-is system was designed in webMethods, and the to-be system is designed with Informatica for ETL and BO (Business Objects) for reporting.

We are replacing the exact same functionality of webMethods using Informatica.

Generally, once production completes for any product it is sent to the delivery centers; from the DCs (delivery centers) it is shipped to the supply channels or distributors, and from there it goes to the end customers.

Before starting production of any product, business approval is essential for the production unit.

Before taking that decision the business has to do some analysis on the existing stock, previous sales history, future orders and so on. To do this they need reports in a BI style (drill-down and drill-up). These reports, created in BO, show what is in stock in each delivery center, the shipping status, the previous sales history and the customer orders for each product across all the delivery centers. The business buys these details from a third-party IMS company.

IMS collects the information from the different distributors and delivery centers: the on-hand stock, the shipping stock, how many orders are in hand for the next quarter, and the previous sales history for specific products.

We have a staging area where we load the entire IMS data into staging tables. For this we have created some Informatica mappings; these mappings truncate and reload the staging tables on each session run. Before loading the staging tables we drop the indexes, and after loading the bulk data we recreate the indexes using stored procedures. After completion of the staging load we load the data into our dims and facts. On top of our data model we have created some materialized views in which the complete reporting calculations are done; from the materialized views we pull the data into the BO reports with fewer joins and fewer aggregations, so the report performance is good.

ORACLE

How strong are you in SQL & PL/SQL?

1) I am good at SQL; I write the source qualifier queries for Informatica mappings as per the business requirement.
2) I am comfortable working with joins, correlated queries, sub-queries, analyzing tables, inline views and materialized views.
3) As an Informatica developer I did not get much opportunity to work on the PL/SQL side, but I worked on a PL/SQL-to-Informatica migration project, so I do have exposure to procedures, functions and triggers.

What is the difference between view and materialized view?

● A view has a logical existence; it does not contain data. A materialized view has a physical existence.
● A view does not store data itself (only its query definition is stored), whereas a materialized view stores data like a table.
● We cannot perform DML operations on a view, whereas we can perform DML operations on a materialized view.
● When we do SELECT * FROM a view it fetches the data from the base tables; when we do SELECT * FROM a materialized view it fetches the data from the materialized view itself.
● A view cannot be scheduled to refresh; a materialized view can be scheduled to refresh.
● We can keep aggregated data in a materialized view, and a materialized view can be created based on multiple tables.

Materialized View
A materialized view is very useful for reporting. If we do not have the materialized view, the report fetches the data directly from the dimensions and facts, which is slow since it involves multiple joins. If we put the same report logic into a materialized view, we can fetch the data directly from the materialized view for reporting purposes and avoid the multiple joins at report run time.
The materialized view needs to be refreshed regularly; the report then simply performs a select statement on the materialized view.
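As a minimal sketch (the object names are illustrative), a materialized view can be created with an automatic refresh schedule, or refreshed on demand:

-- Refresh automatically, here once a day starting now
CREATE MATERIALIZED VIEW mv_sales_summary
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH SYSDATE NEXT SYSDATE + 1
AS
SELECT deptno, SUM(sal) AS total_sal
FROM   emp
GROUP  BY deptno;

-- Or refresh on demand, e.g. after the nightly load ('C' = complete refresh)
EXEC DBMS_MVIEW.REFRESH('MV_SALES_SUMMARY', 'C');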
Difference between Trigger and Procedure

● A trigger does not need to be executed manually; triggers are fired automatically. A procedure needs to be executed (called) explicitly.
● Triggers run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.

Differences between sub-query and co-related sub-query

● A sub-query is executed once for the parent query, whereas a co-related sub-query is executed once for each row of the parent query.

Example (sub-query):

Select * from emp where deptno in (select deptno from dept);

Example (co-related sub-query):

Select e.* from emp e where e.sal >= (select avg(a.sal) from emp a where a.deptno = e.deptno);

Differences between where clause and having clause

● Both the WHERE clause and the HAVING clause can be used to filter data.
● A GROUP BY is not mandatory with the WHERE clause, but the HAVING clause is used together with GROUP BY.
● The WHERE clause applies to individual rows, whereas the HAVING clause tests a condition on the group rather than on individual rows.
● The WHERE clause is used to restrict rows; the HAVING clause is used to restrict groups.
● With WHERE, every record is filtered before grouping; with HAVING, the filter works on the aggregated (GROUP BY) records.
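A small example on the standard EMP table makes the difference concrete:

SELECT   deptno, AVG(sal) AS avg_sal
FROM     emp
WHERE    job <> 'CLERK'      -- WHERE filters individual rows before grouping
GROUP BY deptno
HAVING   AVG(sal) > 2000;    -- HAVING filters the groups after aggregation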

Differences between stored procedure and functions

● A stored procedure may or may not return values; a function must return a value through its RETURN clause (and can return additional values using OUT arguments).
● A stored procedure is typically used to implement business logic; a function is typically used for calculations.
● A stored procedure is stored in the database in compiled (pseudo-code) form; a function used inside SQL is evaluated at run time.
● A stored procedure can accept many IN and OUT arguments; a function accepts IN arguments and returns a single value.
● Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
● A stored procedure cannot be invoked from SQL statements (e.g. SELECT); a function can be invoked from SQL statements, e.g. in a SELECT.
● A stored procedure can affect the state of the database using COMMIT; a function called from SQL cannot affect the state of the database.
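A minimal sketch on the EMP table (the object names are illustrative):

-- Procedure: performs a task, executed explicitly, can COMMIT
CREATE OR REPLACE PROCEDURE raise_salary(p_empno IN NUMBER, p_pct IN NUMBER) AS
BEGIN
  UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;
  COMMIT;
END;
/

-- Function: returns a value and can be invoked from SQL
CREATE OR REPLACE FUNCTION annual_sal(p_empno IN NUMBER) RETURN NUMBER AS
  v_sal emp.sal%TYPE;
BEGIN
  SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
  RETURN v_sal * 12;
END;
/

-- The function can be called from a SELECT: SELECT ename, annual_sal(empno) FROM emp;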

Differences between rowid and rownum

● ROWID is an Oracle internal ID that is allocated every time a new record is inserted into a table; this ID is unique and cannot be changed by the user. ROWNUM is a row number returned by a SELECT statement.
● ROWID is permanent; ROWNUM is temporary.
● ROWID is a globally unique identifier for a row in the database; it is created when the row is inserted into the table and destroyed when the row is removed. The ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows.
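A quick illustration on EMP:

SELECT rowid, rownum, empno, ename
FROM   emp
WHERE  rownum <= 5;   -- ROWNUM is assigned as rows are returned; ROWID is the stored row address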

What is the difference between joiner and lookup

● In a joiner, on multiple matches it returns all matching records. In a lookup it returns either the first record, the last record, any value, or an error value.
● In a joiner we cannot configure a persistent cache, shared cache, uncached or dynamic cache, whereas in a lookup we can.
● We cannot override the query in a joiner; we can override the query in a lookup.
● We cannot apply any filter along with the join condition in a joiner transformation; in a lookup transformation we can apply filters along with the lookup condition using a lookup query override.
● We cannot use relational operators (i.e. <, >, <= and so on) in a joiner transformation, whereas in a lookup we can.

What is the difference between source qualifier and lookup

● A source qualifier will push all the matching records, whereas in a lookup we can restrict whether to return the first value, the last value, or any value.
● In a source qualifier there is no concept of a cache, whereas a lookup is built around the cache concept.
● When both the source and the lookup table are in the same database we can use a source qualifier; when the source and the lookup table exist in different databases we need to use a lookup.

What is the difference between source qualifier and Joiner

● We use a source qualifier to join tables when the tables are in the same database; we use a joiner to join tables that are in different databases.
● In a source qualifier we can use any type of join between two tables, whereas a joiner supports only four join types (normal, master outer, detail outer and full outer).
● We can join N sources in a single source qualifier using a SQL override, whereas a joiner can join only 2 sources; to join N sources we need N-1 joiners.

Difference between Stop and Abort?

Stop:

You choose to stop the workflow or task in the Workflow Monitor or through
pmcmd. The Integration Service stops processing the task and all other tasks in
its path. It continues running concurrent tasks such as backend stored procedures.

Abort:

You choose to abort the workflow or task in the Workflow Monitor or through
pmcmd. The Integration Service kills the DTM process and aborts the task.

How to find Nth highest salary?


Correlated sub-query (SQL Server syntax with a temp table; N is the rank you want):

SELECT name, salary
FROM #Employee e1
WHERE N-1 = (SELECT COUNT(DISTINCT salary) FROM #Employee e2 WHERE e2.salary > e1.salary)

Top keyword (SQL Server; shown here for the 3rd highest):

SELECT TOP 1 salary
FROM (SELECT DISTINCT TOP 3 salary FROM #Employee ORDER BY salary DESC) AS temp
ORDER BY salary

Limit keyword (MySQL):

SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1
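Another common approach, not shown above, uses the analytic DENSE_RANK function (here for N = 3; the table name Employee is assumed):

SELECT salary
FROM  (SELECT salary,
              DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
       FROM   Employee)
WHERE rnk = 3;    -- change 3 to the N you need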

How to find second highest salary for each department?


select t.deptno, max(t.salary) as second_highest
from employee t
where t.salary < (select max(salary)
                  from employee t2
                  where t2.deptno = t.deptno)
group by t.deptno;

How to find out duplicate records in table?

Select empno, count (*) from EMP group by empno having count (*)>1;

How to delete a duplicate records in a table?

Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);

What is your tuning approach if SQL query taking long time? Or how do u tune SQL query?

If a query is taking a long time, I first run the query through EXPLAIN PLAN; the explain plan process stores its data in the PLAN_TABLE.

It gives us the execution plan of the query, i.e. whether the query is using the relevant indexes on the joining columns or whether the indexes needed to support the query are missing.

If the joining columns do not have indexes, the query will do a full table scan and the cost will be high. In that case we create indexes on the joining columns and re-run the query; it should give better performance. We also need to analyze the tables if they were last analyzed long ago. The ANALYZE statement can be used to gather statistics for a specific table, index or cluster, for example:

ANALYZE TABLE employees COMPUTE STATISTICS;

If we still have a performance issue then we use HINTS; a hint is nothing but a clue to the optimizer. We can use hints like:

Page 10 of 78
● ALL_ROWS
One of the hints that invokes the cost-based optimizer. ALL_ROWS is usually used for batch processing or data warehousing systems.

(/*+ ALL_ROWS */)

● FIRST_ROWS
One of the hints that invokes the cost-based optimizer. FIRST_ROWS is usually used for OLTP systems.

(/*+ FIRST_ROWS */)

● CHOOSE
One of the hints that invokes the cost-based optimizer. This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.

● HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to find corresponding records. Therefore it is not suitable for < or > join conditions.

(/*+ USE_HASH */)

Hints are most useful to optimize the query performance.

DWH Concepts

Difference between OLTP and DWH/DS/OLAP

● OLTP maintains only current information, whereas OLAP contains the full history.
● OLTP is a normalized structure; OLAP is a de-normalized structure.
● OLTP is a volatile system; OLAP is a non-volatile system.
● OLTP cannot be used for reporting purposes; OLAP is a pure reporting system.
● Since OLTP is a normalized structure it requires multiple joins to fetch data, whereas OLAP does not require as many joins.
● OLTP is not time-variant; OLAP is time-variant.
● OLTP is a pure relational model; OLAP is a dimensional model.

What is Staging area why we need it in DWH?

If the target and source databases are different and the target table volume is high (it contains millions of records), then without a staging table we would need to design the Informatica mapping with a lookup to find out whether each record exists in the target table. Since the target has huge volumes, building the lookup cache is costly and hits performance.

If we create staging tables in the target database we can simply do an outer join in the source qualifier to determine insert/update; this approach gives good performance and avoids a full table scan on the target to determine inserts/updates. A sketch of such a source qualifier query is shown below.
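A rough sketch of such a source qualifier override (the table and column names are only illustrative):

-- Left outer join from the staging table to the target flags each row
-- as insert ('I') or update ('U') without building a lookup cache.
SELECT s.cust_id,
       s.cust_name,
       CASE WHEN t.cust_id IS NULL THEN 'I' ELSE 'U' END AS load_flag
FROM   stg_customer s
LEFT OUTER JOIN dim_customer t
       ON t.cust_id = s.cust_id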

We can also create indexes on the staging tables; since these tables are designed for a specific application, that does not impact any other schemas/users.

While processing flat files into the data warehouse we can also perform cleansing in the staging area.

Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data is correct and accurate. During data cleansing, records are checked for accuracy and consistency.

● Since it is a one-to-one mapping from ODS to staging, we truncate and reload the staging tables.
● We can create indexes in the staging area so that our source qualifier queries perform best.
● If we have a staging area there is no need to rely on Informatica transformations (lookups) to know whether a record already exists or not.

ODS:

My understanding of ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading the targets. Hence it is a mandatory requirement for every warehouse.

So do we transfer data from OLTP to the ODS every day to keep it up to date?

OLTP is a sensitive database; it should not be hit with multiple select statements, as that may impact performance, and if something goes wrong while fetching data from OLTP to the data warehouse it directly impacts the business.
The ODS is a replica of OLTP.
The ODS usually gets refreshed through some Oracle jobs.

What is the difference between a primary key and a surrogate key?

A primary key is a special constraint on a column or set of columns. A primary key constraint
ensures that the column(s) so designated have no NULL values, and that every value is
unique. Physically, a primary key is implemented by the database system using a unique index, and all the columns in the primary key must have been declared NOT NULL. A table
may have only one primary key, but it may be composite (consist of more than one column).

A surrogate key is any column or set of columns that can be declared as the primary key
instead of a "real" or natural key. Sometimes there can be several natural keys that could be
declared as the primary key, and these are all called candidate keys. So a surrogate is a
candidate key. A table could actually have more than one surrogate key, although this
would be unusual. The most common type of surrogate key is an incrementing integer, such
as an auto increment column in MySQL, or a sequence in Oracle, or an identity column in
SQL Server.
Have you done any Performance tuning in informatica?

1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging table. We did not have any transformation inside the mapping; it was a 1-to-1 mapping, so there was nothing to optimize at the mapping level. I created session partitions using key-range partitioning on the effective date column. It improved the performance a lot; rather than 4 hours, it ran in 30 minutes for the entire 40 million rows. With partitions the DTM creates multiple reader and writer threads.
2) There was one more scenario where I got very good performance at the mapping level. Rather than using a lookup transformation, if we can do an outer join in the source qualifier query override it gives good performance, provided both the lookup table and the source are in the same database. If the lookup table has huge volumes then building the cache is costly.
3) Also, optimizing the mapping to use fewer transformations always gives good performance.
4) If any mapping is taking a long time to execute, first we need to look at the source and target statistics in the Workflow Monitor for the throughput, and find out where exactly the bottleneck is by looking at the busy percentage in the session log; that tells us which transformation is taking more time. If the source query is the bottleneck, it shows at the end of the session log as "query issued to database", which means there is a performance issue in the source query and we need to tune it.

How strong are you in UNIX?

1) I have UNIX shell scripting knowledge for whatever Informatica requires, for example:

If we want to run workflows from UNIX we use pmcmd. Below is a script to run a workflow from UNIX:

cd /pmar/informatica/pc/pmserver/

/pmar/informatica/pc/pmserver/pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait $1 >> $LOG_PATH/$LOG_FILE

2) If we are supposed to process flat files using Informatica but those files exist on a remote server, then we have to write a script to FTP them to the Informatica server before we start processing those files.

3) Also file watching: if an indicator file is available in the specified location then we start our Informatica jobs; otherwise we send an email notification using the mailx command saying that the previous jobs did not complete successfully, something like that.

4) Using a shell script we update the parameter file with the session start time and end time.

This is the kind of scripting knowledge I have. If any new UNIX requirement comes, I can Google it, get the solution and implement it.

What is use of Shortcuts in informatica?

If we copy source definitions, target definitions or mapplets from a shared folder to any other folder, they become shortcuts.

Let's assume we have imported some source and target definitions into a shared folder, and we are using those source and target definitions in other folders as shortcuts in some mappings.

If any modification occurs in the backend (database) structure, like adding new columns or dropping existing columns in either the source or the target, and we re-import the definition into the shared folder, those changes are automatically reflected in all the folders/mappings wherever we used those source or target definitions.

How to Concat row data through informatica?

Source:

Ename EmpNo

stev 100

methew 100

john 101

tom 101

Target:

Ename EmpNo

Stev methew 100

John tom 101

Ans:

Using a dynamic lookup on the target table:

If the record does not exist, do an insert into the target. If it already exists, get the corresponding Ename value from the lookup, concatenate it with the current Ename value in an expression, and then update the target Ename column using an update strategy.

Using the variable-port approach:

Sort the data in the source qualifier based on the EmpNo column, then use an expression to store the previous record's information in variable ports. After that use a router to insert the record if it is the first occurrence; if it was already inserted, update Ename with the concatenated value of the previous name and the current name, and update it in the target.
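For reference only (this is plain Oracle SQL, not the Informatica solution described above), the same concatenation can be produced with LISTAGG:

SELECT empno,
       LISTAGG(ename, ' ') WITHIN GROUP (ORDER BY ename) AS enames
FROM   emp
GROUP  BY empno;   -- one row per EmpNo with the names concatenated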

How to send unique (distinct) records into one target and duplicates into another target?

Source:

Ename EmpNo

stev 100

Stev 100

john 101

Mathew 102

Output:

Target_1:

Ename EmpNo

Stev 100

John 101

Mathew 102

Target_2:

Ename EmpNo

Stev 100

Ans:

Using a dynamic lookup on the target table:

If the record does not exist, insert it into Target_1. If it already exists, send it to Target_2 using a router.

Using the variable-port approach:

Sort the data in the source qualifier based on the EmpNo column, then use an expression to store the previous record's information in variable ports. After that use a router to route the data into the targets: if it is the first occurrence, send it to the first target; if it was already inserted, send it to Target_2.

How to do dynamic file generation in Informatica?

I want to generate a separate file for every employee (the file should be generated as per the Name). It has to generate 5 flat files, and the name of each flat file is the corresponding employee name; that is the requirement.

Below is my mapping.

Source (Table) -> SQ -> Target (FF)

Source:

Dept Ename EmpNo

A S 22

A R 27

B P 29

B X 30

B U 34

This functionality was added in Informatica 8.5 onwards; in earlier versions it was not there.

We can achieve it with the use of a Transaction Control transformation and the special "FileName" port in the target file.

In order to generate the target file names from the mapping, we should make use of the
special "FileName" port in the target file. You can't create this special port from the usual
New port button. There is a special button with label "F" on it to the right most corner of the
target flat file when viewed in "Target Designer".

When you have different sets of input data for which different target files should be created, use the same target instance, but with a Transaction Control transformation that defines the boundary for the source sets.

In the target flat file there is an option in the columns tab to add the FileName column; when you click it, a non-editable column gets created in the metadata of the target.

In the Transaction Control transformation give a condition that returns TC_COMMIT_BEFORE whenever a new employee starts (for example by comparing the current Ename with the previous row's value) and TC_CONTINUE_TRANSACTION otherwise.

Map the Ename column to the target's FileName column.

The mapping will look like this:

source -> source qualifier -> transaction control -> target

Run it, and separate files will be created, named by Ename.

How do you populate the 1st record to the 1st target, the 2nd record to the 2nd target, the 3rd record to the 3rd target and the 4th record back to the 1st target through Informatica?

We can do it using a Sequence Generator by setting End Value = 3 and enabling the Cycle option. Then in the Router take 3 groups:

In the 1st group specify the condition as seq NEXTVAL = 1 and pass those records to the 1st target. Similarly,

in the 2nd group specify the condition as seq NEXTVAL = 2 and pass those records to the 2nd target, and

in the 3rd group specify the condition as seq NEXTVAL = 3 and pass those records to the 3rd target.

Since we have enabled the Cycle option, after reaching the end value the Sequence Generator starts again from 1, so for the 4th record NEXTVAL is 1 and it goes to the 1st target.

How do you perform incremental logic or Delta or CDC?

Incremental means: suppose today we processed 100 records; for tomorrow's run we need to extract only the records inserted or updated after the previous run, based on the last-updated timestamp. This process is called incremental or delta load.

Implementation for Incremental Load

Approach_1: Using SETMAXVARIABLE()

1) First create a mapping variable ($$INCREMENT_TS) and assign an initial value of an old date (01/01/1940).
2) Then override the source qualifier query to fetch only rows with LAST_UPD_DATE >= $$INCREMENT_TS (the mapping variable); see the sketch below.
3) In an expression, assign the maximum last_upd_date value to $$INCREMENT_TS using SETMAXVARIABLE.
4) Because it is a variable, it stores the max last_upd_date value in the repository, so in the next run the source qualifier query fetches only the records updated or inserted after the previous run.
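A sketch of the overridden source qualifier query (the table and column names are illustrative; the mapping variable is substituted by the Integration Service at run time):

SELECT *
FROM   src_orders
WHERE  last_upd_date >= TO_DATE('$$INCREMENT_TS', 'MM/DD/YYYY HH24:MI:SS')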

(Screenshots in the original document: the logic in the mapping variable, the logic in the SQ, the expression assigning the max last-update date to the variable using SETMAXVARIABLE, and the logic in the update strategy.)
Approach_2: Using a parameter file

1. First create a mapping parameter ($$LastUpdateDateTime) and assign an initial value of an old date (01/01/1940) in the parameter file.
2. Then override the source qualifier query to fetch only rows with LAST_UPD_DATE >= $$LastUpdateDateTime (the mapping parameter).
3. Update the mapping parameter ($$LastUpdateDateTime) value in the parameter file using a shell script or another mapping after the first session completes successfully.
4. Because it is a mapping parameter, every time we need to update the value in the parameter file after completion of the main session.

Parameter file format

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]

$DBConnection_Source=DMD2_GEMS_ETL

$DBConnection_Target=DMD2_GEMS_ETL

$$LastUpdateDateTime=01/01/1940

(Screenshots in the original document: updating the parameter file, the logic in the expression, the main mapping, the SQL override in the SQ transformation, and the workflow design.)
Parameter file

It is a text file; below is the format of a parameter file. We place this file on the UNIX box where the Informatica server is installed.

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]

$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921

$DBConnection_Target=DMD2_GEMS_ETL

$$CountryCode=AT

$$CustomerNumber=120165

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]

$DBConnection_Sourcet=DEVL1C1_GEMS_ETL

$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921

$$CountryCode=BE

$$CustomerNumber=101495

Approach_3: Using Oracle control tables

1. First create two control tables, cont_tbl_1 and cont_tbl_2, with the structure (session_st_time, wf_name).
2. Then insert one record into each table with session_st_time = 1/1/1940 and the workflow name.
3. Create two stored procedures. The first updates cont_tbl_1 with the session start time; set its stored procedure type property to Source Pre-load.
4. For the second stored procedure set the stored procedure type property to Target Post-load; this procedure updates the session_st_time in cont_tbl_2 from cont_tbl_1.
5. Then override the source qualifier query to fetch only rows with LAST_UPD_DATE >= (SELECT session_st_time FROM cont_tbl_2 WHERE wf_name = 'actual workflow name'); see the sketch below.
How to load cumulative salary into the target?

Solution:

Using var ports in expression we can load cumulative salary into target.
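For reference only (plain SQL, not the variable-port approach described above), the expected cumulative output can be produced with an analytic SUM:

SELECT empno, ename, sal,
       SUM(sal) OVER (ORDER BY empno) AS cumulative_sal
FROM   emp;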

Also, below is the logic for converting columns into rows without using a Normalizer transformation:

1) The source contains two columns: address and id.

2) Use a Sorter to arrange the rows in ascending order.

3) Then create an expression as shown in the screenshot in the original document.

4) Use an Aggregator transformation and check Group By on the id port only (as shown in the original screenshot).
Difference between dynamic lkp and static lkp cache?

1. In a dynamic lookup the cache gets refreshed as soon as a record is inserted or updated/deleted in the lookup table, whereas in a static lookup the cache does not get refreshed even though a record is inserted or updated in the lookup table; it refreshes only in the next session run.
2. The best example of where we need a dynamic cache: suppose the first record and the last record of the source are for the same key but there is a change in the address. What the Informatica mapping has to do here is insert the first record and update the target table with the last record.
3. If we use a static lookup, the first record goes to the lookup and is checked against the lookup cache; based on the condition it does not find a match, so it returns a null value, and the router sends that record to the insert flow.
4. But this record is still not available in the cache, so when the last record comes to the lookup it checks the cache, does not find a match, and returns null again; it goes to the insert flow through the router, although it was supposed to go to the update flow, because the cache was not refreshed when the first record was inserted into the target table. If we use a dynamic lookup we can achieve the requirement, because as soon as the first record is inserted the cache is also refreshed with the target data. When we process the last record it finds the match in the cache and returns the value, so the router routes that record to the update flow.
What is the difference between snow flake and star schema

● The star schema is the simplest data warehouse schema; a snowflake schema is a more complex data warehouse model than a star schema.
● In a star schema each of the dimensions is represented in a single table; there should not be any hierarchies between dimension tables. In a snowflake schema at least one hierarchy exists between dimension tables.
● Both contain a fact table surrounded by dimension tables. If the dimensions are de-normalized, we say it is a star schema design; if a dimension is normalized, we say it is a snowflaked design.
● In a star schema only one join establishes the relationship between the fact table and any one of the dimension tables. In a snowflake schema, since there are relationships between the dimension tables, many joins are needed to fetch the data.
● A star schema optimizes performance by keeping queries simple and providing fast response time; all the information about each level is stored in one row. Snowflake schemas normalize dimensions to eliminate redundancy; the result is more complex queries and reduced query performance.
● It is called a star schema because the diagram resembles a star; it is called a snowflake schema because the diagram resembles a snowflake.

Difference between data mart and data warehouse

● A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a data warehouse with a focused objective. A data warehouse is a "subject-oriented, integrated, time-variant, non-volatile collection of data in support of decision making".
● A data mart is used at a business division/department level, whereas a data warehouse is used at an enterprise level.
● A data mart is a subset of data from a data warehouse; data marts are built for specific user groups. A data warehouse is an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.
● By providing decision makers with only a subset of data from the data warehouse, privacy, performance and clarity objectives can be attained. The main objective of a data warehouse is to provide an integrated environment and coherent picture of the business at a point in time.

Differences between connected lookup and unconnected lookup

● A connected lookup is connected to the pipeline and receives input values directly from the pipeline. An unconnected lookup is not connected to the pipeline; it receives input values from the result of a :LKP expression in another transformation, via arguments.
● A connected lookup is wired into one pipeline flow, whereas an unconnected lookup can be called more than once within the mapping.
● A connected lookup can return multiple columns from the same row. An unconnected lookup has one designated return port (R) and returns one column from each row.
● A connected lookup can be configured to use a dynamic cache; an unconnected lookup cannot.
● A connected lookup passes multiple output values to other transformations (link the lookup/output ports to another transformation). An unconnected lookup passes one output value to another transformation; the lookup/output/return port passes the value to the transformation calling the :LKP expression.
● A connected lookup can use a dynamic or static cache; an unconnected lookup uses a static cache.
● A connected lookup supports user-defined default values; an unconnected lookup does not.
● In a connected lookup the cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports. In an unconnected lookup the cache includes all lookup/output ports in the lookup condition and the lookup/return port.

What is the difference between joiner and lookup

● In a joiner, on multiple matches it returns all matching records. In a lookup it returns either the first record, the last record, any value, or an error value.
● In a joiner we cannot configure a persistent cache, shared cache, uncached or dynamic cache, whereas in a lookup we can.
● We cannot override the query in a joiner, whereas we can override the query in a lookup to fetch data from multiple tables.
● We can perform an outer join in a joiner transformation; we cannot perform an outer join in a lookup transformation.
● We cannot use relational operators (i.e. <, >, <= and so on) in a joiner transformation, whereas in a lookup we can.

What is the difference between source qualifier and lookup

● A source qualifier will push all the matching records, whereas in a lookup we can restrict whether to return the first value, the last value, or any value.
● In a source qualifier there is no concept of a cache, whereas a lookup is built around the cache concept.
● When both the source and the lookup table are in the same database we can use a source qualifier; when the source and the lookup table exist in different databases we need to use a lookup.

Differences between dynamic lookup and static lookup

● In a dynamic lookup the cache gets refreshed as soon as a record is inserted or updated/deleted in the lookup table. In a static lookup the cache does not get refreshed even though a record is inserted or updated in the lookup table; it refreshes only in the next session run.
● When we configure a lookup transformation to use a dynamic lookup cache, we can only use the equality operator in the lookup condition, and the NewLookupRow port is enabled automatically. The static cache is the default cache.
● The best example of where we need a dynamic cache: suppose the first record and the last record of the source are for the same key but there is a change in the address. The mapping has to insert the first record and update the target table with the last record. If we use a static lookup, the first record goes to the lookup, does not find a match in the cache based on the condition, and returns a null value, so the router sends it to the insert flow. But this record is still not available in the cache, so when the last record comes to the lookup it again finds no match and returns null; it also goes to the insert flow through the router, although it was supposed to go to the update flow, because the cache was not refreshed when the first record was inserted into the target table.

How to process multiple flat files into a single target table through Informatica if all the files have the same structure?

We can process all the flat files through one mapping and one session using a list file.

First we create the list file using a UNIX script for all the flat files; the extension of the list file is .LST.

The list file contains only the flat file names.

At the session level we need to set:

the source file directory as the list file path,

the source file name as the list file name,

and the file type as Indirect.

How to populate the file name to the target while loading multiple files using the list file concept?

In Informatica 8.6, by selecting the "Add Currently Processed Flat File Name" option in the properties tab of the source definition (after importing the source file definition in the Source Analyzer), a new column called Currently Processed File Name is added. We can map this column to the target to populate the file name.

SCD Type-II Effective-Date Approach

● We have one dimension in the current project called the resource dimension. Here we maintain history to keep track of SCD changes.
● To maintain the history in this slowly changing dimension (resource dimension) we followed the SCD Type-II effective-date approach.
● My resource dimension structure has eff-start-date, eff-end-date, a surrogate key (s.k.) and the source columns.
● Whenever I do an insert into the dimension I populate eff-start-date with sysdate, eff-end-date with a future date, and the s.k. with a sequence number.
● If the record is already present in my dimension but there is a change in the source data, then what I need to do is:
● update the previous record's eff-end-date with sysdate and insert the changed source data as a new record.

Informatica design to implement the SCD Type-II effective-date approach

● Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in the target or not, based on the source primary key column.
● Once we find the match in the lookup, we take the SCD columns and the s.k. column from the lookup into an expression transformation.
● In the lookup transformation we override the lookup query to fetch only active records from the dimension while building the cache.
● In the expression transformation I compare the source with the lookup return data.
● If the source and target data are the same, I set the flag to 'S'.
● If the source and target data are different, I set the flag to 'U'.
● If the source data does not exist in the target, the lookup returns a null value and I flag it as 'I'.
● Based on the flag values, in the router I route the data into the insert and update flows.
● If flag = 'I' or 'U' I pass the record to the insert flow.
● If flag = 'U' I also pass the record to the eff-date update flow.
● When we do the insert we pass the sequence value to the s.k.
● Whenever we do the update we update the eff-end-date column based on the s.k. value returned by the lookup.

Complex Mapping

● We have an order file requirement: every day the source system places a file whose name includes a timestamp on the Informatica server.
● We have to process the same-day file through Informatica.
● The source file directory contains files older than 30 days, each with timestamps.
● For this requirement, if I hardcode the timestamped source file name it will process the same file every day.
● So what I did here is create $InputFilename for the source file name.
● Then I use a parameter file to supply the value to the session variable ($InputFilename).
● To update this parameter file I created one more mapping.
● This mapping updates the parameter file with the timestamp appended to the file name.
● I make sure to run this parameter-file-update mapping before my actual mapping.

How to handle errors in informatica?

● We have a source with numerator and denominator values, and we need to calculate num/deno when populating the target.
● If deno = 0 I should not load that record into the target table.
● We send those records to a flat file. After completion of the 1st session run, a shell script checks the file size.
● If the file size is greater than zero then it sends an email notification to the source system POC (point of contact) along with the deno-zero record file and an appropriate email subject and body.
● If the file size <= 0, that means there are no records in the flat file; in this case the shell script does not send any email notification.
● Or:
● We are expecting a not-null value for one of the source columns.
● If it is null, that means it is an error record.
● We can use the above approach for error handling.

Worklet

A worklet is a set of reusable sessions/tasks. We cannot run a worklet without a workflow.

If we want to run 2 workflows one after another:

● If both workflows exist in the same folder we can create 2 worklets rather than creating 2 workflows.
● Finally we can call these 2 worklets in one workflow.
● There we can set the dependency.
● If the two workflows exist in different folders or repositories then we cannot create worklets.
● We can set the dependency between these two workflows using a shell script, which is one approach.
● The other approach is Event Wait and Event Raise.

In the shell script approach:

● As soon as the first workflow completes, we create a zero-byte file (indicator file).
● If the indicator file is available in the particular location, we run the second workflow.
● If the indicator file is not available we wait for 5 minutes and check again for the indicator. We continue this loop 5 times, i.e. about 30 minutes.
● After 30 minutes, if the file still does not exist, we send out an email notification.

Event Wait and Event Raise approach:

In an Event Wait it will wait for an infinite time, until the indicator file is available.

Why do we need a source qualifier?
Simply put, it performs a select statement; the select statement fetches the data in the form of rows.
The source qualifier selects the data from the source table and identifies the records from the source.
It also converts the source data types into Informatica-native data types.

A parameter file supplies the values to session-level variables and mapping-level variables.

Variables are of two types:

● Session level variables


● Mapping level variables

Session level variables are of four types:


● $DBConnection_Source
● $DBConnection_Target
● $InputFile
● $OutputFile

Mapping level variables are of two types:


● Variable
● Parameter

What is the difference between mapping-level and session-level variables?
Mapping-level variables always start with $$.
Session-level variables always start with $.

Flat File
A flat file is a collection of data in a file in a specific format.

Informatica can support two types of flat files:

● Delimited
● Fixed width

For delimited files we need to specify the separator (delimiter).

For fixed-width files we need to know the format first, i.e. how many characters to read for each column.

For delimited files it is also necessary to know the structure of the file, for example whether it has a header.

If the file contains the header then in definition we need to skip the first row.

List file:

If you want to process multiple files with same structure. We don’t need multiple mapping
and multiple sessions.

We can use one mapping one session using list file option.

First we need to create the list file for all the files. Then we can use this file in the main
mapping.

Aggregator Transformation:

Transformation type:
Active
Connected
The Aggregator transformation performs aggregate calculations, such as averages and sums.
The Aggregator transformation is unlike the Expression transformation, in that you use the
Aggregator transformation to perform calculations on groups. The Expression
transformation permits you to perform calculations on a row-by-row basis only.

Components of the Aggregator Transformation:

The Aggregator is an active transformation, changing the number of rows in the pipeline.
The Aggregator transformation has the following components and options

Aggregate cache: The Integration Service stores data in the aggregate cache until it
completes aggregate calculations. It stores group values in an index cache and row data in
the data cache.

Aggregate expression: Enter an expression in an output port. The expression can include
non-aggregate expressions and conditional clauses.

Group by port: Indicate how to create groups. The port can be any input, input/output,
output, or variable port. When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.

Sorted input: Select this option to improve session performance. To use sorted input, you
must pass data to the Aggregator transformation sorted by group by port, in ascending or
descending order.

Aggregate Expressions:

The Designer allows aggregate expressions only in the Aggregator transformation. An


aggregate expression can include conditional clauses and non-aggregate functions. It can
also include one aggregate function nested within another aggregate function, such as:

MAX (COUNT (ITEM))

The result of an aggregate expression varies depending on the group by ports used in the
transformation

Aggregate Functions

Use the following aggregate functions within an Aggregator transformation. You can nest
one aggregate function within another aggregate function.

The transformation language includes the following aggregate functions:

AVG
COUNT

FIRST
LAST

MAX
MEDIAN

MIN
PERCENTILE

STDDEV
SUM

VARIANCE

When you use any of these functions, you must use them in an expression within an
Aggregator transformation.

Tips

Use sorted input to decrease the use of aggregate caches.

Sorted input reduces the amount of data cached during the session and improves session
performance. Use this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.

Limit connected input/output or output ports.

Limit the number of connected input/output or output ports to reduce the amount of data
the Aggregator transformation stores in the data cache.

Filter the data before aggregating it.

If you use a Filter transformation in the mapping, place the transformation before the
Aggregator transformation to reduce unnecessary aggregation.

Normalizer Transformation:

Transformation type:
Active
Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data.
The Normalizer transformation parses multiple-occurring columns from COBOL sources,
relational tables, or other sources. It can process multiple record types from a COBOL source
that contains a REDEFINES clause.

The Normalizer transformation generates a key for each source row. The Integration Service
increments the generated key sequence number each time it processes a source row. When
the source row contains a multiple-occurring column or a multiple-occurring group of
columns, the Normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
Transaction Control Transformation
Transformation type:
Active
Connected

PowerCenter lets you control commit and roll back transactions based on a set of rows that
pass through a Transaction Control transformation. A transaction is the set of rows bound
by commit or roll back rows. You can define a transaction based on a varying number of
input rows. You might want to define transactions based on a group of rows ordered on a
common key, such as employee ID or order entry date.

In PowerCenter, you define transaction control at the following levels:


Within a mapping. Within a mapping, you use the Transaction Control transformation to
define a transaction. You define transactions using an expression in a Transaction Control
transformation. Based on the return value of the expression, you can choose to commit, roll
back, or continue without any transaction changes.
Within a session. When you configure a session, you configure it for user-defined commit.
You can choose to commit or roll back a transaction if the Integration Service fails to
transform or write any row to the target.

When you run the session, the Integration Service evaluates the expression for each row
that enters the transformation. When it evaluates a commit row, it commits all rows in the
transaction to the target or targets. When the Integration Service evaluates a roll back row,
it rolls back all rows in the transaction from the target or targets.

If the mapping has a flat file target you can generate an output file each time the Integration
Service starts a new transaction. You can dynamically name each target flat file.

Union Transformation

1. What is a union transformation?

A union transformation is used to merge data from multiple sources, similar to the UNION ALL SQL statement that combines the results from two or more SQL statements.

2. As union transformation gives UNION ALL output, how you will get the UNION output?

Pass the output of union transformation to a sorter transformation. In the properties of sorter
transformation check the option select distinct. Alternatively you can pass the output of union
transformation to aggregator transformation and in the aggregator transformation specify all ports
as group by ports.

3. What are the guidelines to be followed while using union transformation?

The following rules and guidelines need to be taken care while working with union
transformation:

● You can create multiple input groups, but only one output group.
● All input groups and the output group must have matching ports. The precision, datatype, and
scale must be identical across all groups.
● The Union transformation does not remove duplicate rows. To remove duplicate rows, you must
add another transformation such as a Router or Filter transformation.
● You cannot use a Sequence Generator or Update Strategy transformation upstream from a
Union transformation.
● The Union transformation does not generate transactions.

4. Why union transformation is an active transformation?

Union is an active transformation because it combines two or more data streams into one.
Though the total number of rows passing into the Union is the same as the total number of rows
passing out of it, and the sequence of rows from any given input stream is preserved in the
output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might
not be row number 1 in the output stream. Union does not even guarantee that the output is
repeatable

Aggregator Transformation

1. What is aggregator transformation?


Aggregator transformation performs aggregate calculations like sum, average, count etc. It is an
active transformation, changes the number of rows in the pipeline. Unlike expression
transformation (performs calculations on a row-by-row basis), an aggregator transformation
performs calculations on group of rows.

2. What is aggregate cache?


The integration service creates index and data cache in memory to process the aggregator
transformation and stores the data group in index cache, row data in data cache. If the
integration service requires more space, it stores the overflow values in cache files.

3. How can we improve performance of aggregate transformation?

● Use sorted input: Sort the data before passing into aggregator. The integration service uses
memory to process the aggregator transformation and it does not use cache memory.

● Filter the unwanted data before aggregating.
● Limit the number of input/output or output ports to reduce the amount of data the aggregator
transformation stores in the data cache.

4. What are the different types of aggregate functions?

The different types of aggregate functions are listed below:

● AVG
● COUNT
● FIRST
● LAST
● MAX
● MEDIAN
● MIN
● PERCENTILE
● STDDEV
● SUM
● VARIANCE

5. Why cannot you use both single level and nested aggregate functions in a single aggregate
transformation?

The nested aggregate function returns only one output row, whereas the single level aggregate
function returns more than one row. Since the number of rows returned are not same, you cannot
use both single level and nested aggregate functions in the same transformation. If you include
both the single level and nested functions in the same aggregator, the designer marks the
mapping or mapplet as invalid. So, you need to create separate aggregator transformations.

6. Up to how many levels, you can nest the aggregate functions?

We can nest up to two levels only.


Example: MAX( SUM( ITEM ) )

7. What is incremental aggregation?

The integration service performs aggregate calculations and then stores the data in historical
cache. Next time when you run the session, the integration service reads only new data and uses
the historical cache to perform new aggregation calculations incrementally.

8. Why cannot we use sorted input option for incremental aggregation?

In incremental aggregation, the aggregate calculations are stored in historical cache on the
server. In this historical cache the data need not be in sorted order.  If you give sorted input, the
records come as presorted for that particular run but in the historical cache the data may not be
in the sorted order. That is why this option is not allowed.

9. How the NULL values are handled in Aggregator?

You can configure the integration service to treat null values in aggregator functions as NULL or
zero. By default the integration service treats null values as NULL in aggregate functions.

Normalizer Transformation

1. What is normalizer transformation?

The normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data. This means it converts column
data into row data. Normalizer is an active transformation.

2. Which transformation is required to process the cobol sources?

Since COBOL sources contain denormalized data, the normalizer transformation is used to
normalize the COBOL sources.

3. What is generated key and generated column id in a normalizer transformation?

● The integration service increments the generated key sequence number each time it processes a
source row. When the source row contains a multiple-occurring column or a multiple-occurring
group of columns, the normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
● The normalizer transformation has a generated column ID (GCID) port for each multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the normalizer returns a value of 1,2 or
3 in the generated column ID.

4. What is VSAM?

VSAM (Virtual Storage Access Method) is a file access method for IBM mainframe operating
systems. VSAM organizes records in indexed or sequential flat files.

5. What is VSAM normalizer transformation?

The VSAM normalizer transformation is the source qualifier transformation for a COBOL source
definition. A COBOL source is a flat file that can contain multiple-occurring data and multiple types
of records in the same file.

6. What is pipeline normalizer transformation?

Pipeline normalizer transformation processes multiple-occurring data from relational tables or flat
files.

7. What is occurs clause and redefines clause in normalizer transformation?

● An OCCURS clause is specified when the source row has multiple-occurring columns.
● A REDEFINES clause is specified when the source contains multiple record types that share the same storage area.

Rank Transformation

1. What is rank transformation?

A rank transformation is used to select top or bottom rank of data. This means, it
selects the largest or smallest numeric value in a port or group. Rank

transformation also selects the strings at the top or bottom of a session sort
order. Rank transformation is an active transformation.

2. What is rank cache?

The integration service compares input rows in the data cache, if the input row
out-ranks a cached row, the integration service replaces the cached row with the
input row. If you configure the rank transformation to rank across multiple groups,
the integration service ranks incrementally for each group it finds. The integration
service stores group information in index cache and row data in data cache.

3. What is RANKINDEX port?

The designer creates RANKINDEX port for each rank transformation. The
integration service uses the rank index port to store the ranking position for each
row in a group.

4. How do you specify the number of rows you want to rank in a rank
transformation?

In the rank transformation properties, there is an option 'Number of Ranks' for
specifying the number of rows you want to rank.

5. How to select either top or bottom ranking for a column?

In the rank transformation properties, there is an option 'Top/Bottom' for selecting
the top or bottom ranking for a column.

6. Can we specify ranking on more than one port?

No. We can specify to rank the data based on only one port. In the ports tab, you
have to check the R option for designating the port as a rank port and this option
can be checked only on one port.

Joiner Transformation

1. What is a joiner transformation?

A joiner transformation joins two heterogeneous sources. You can also join data from the
same source. The joiner transformation joins sources with at least one matching column, using
a condition that matches one or more pairs of columns between the two sources.

2. How many joiner transformations are required to join n sources?

To join n sources n-1 joiner transformations are required.

3. What are the limitations of joiner transformation?

● You cannot use a joiner transformation when either input pipeline contains an update strategy
transformation.

● You cannot use a joiner if you connect a sequence generator transformation directly before the
joiner.

4. What are the different types of joins?

● Normal join: In a normal join, the integration service discards all the rows from the master and
detail source that do not match the join condition.
● Master outer join: A master outer join keeps all the rows of data from the detail source and the
matching rows from the master source. It discards the unmatched rows from the master source.
● Detail outer join: A detail outer join keeps all the rows of data from the master source and the
matching rows from the detail source. It discards the unmatched rows from the detail source.
● Full outer join: A full outer join keeps all rows of data from both the master and detail rows.
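As a rough SQL analogy (illustrative ORDERS and CUSTOMERS tables joined on cust_id), assuming ORDERS is designated as the detail source and CUSTOMERS as the master:

-- Normal join
select o.order_id, c.cust_name from orders o join customers c on o.cust_id = c.cust_id;
-- Master outer join: all detail (orders) rows are kept
select o.order_id, c.cust_name from orders o left outer join customers c on o.cust_id = c.cust_id;
-- Detail outer join: all master (customers) rows are kept
select o.order_id, c.cust_name from orders o right outer join customers c on o.cust_id = c.cust_id;
-- Full outer join: all rows from both sources are kept
select o.order_id, c.cust_name from orders o full outer join customers c on o.cust_id = c.cust_id;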

5. What is joiner cache?

When the integration service processes a joiner transformation, it reads the rows from the master
source and builds the index and data caches. Then the integration service reads the detail
source and performs the join. In case of a sorted joiner, the integration service reads both sources
(master and detail) concurrently and builds the cache based on the master rows.

6. How to improve the performance of joiner transformation?

● Join sorted data whenever possible.


● For an unsorted Joiner transformation, designate the source with fewer rows as the master
source.
● For a sorted Joiner transformation, designate the source with fewer duplicate key values as the
master source.

7. Why joiner is a blocking transformation?

When the integration service processes an unsorted joiner transformation, it reads all master
rows before it reads the detail rows. To ensure it reads all master rows before the detail rows, the
integration service blocks all the details source while it caches rows from the master source. As it
blocks the detail source, the unsorted joiner is called a blocking transformation.

8. What are the settings used to configure the joiner transformation

● Master and detail source


● Type of join
● Join condition

Router Transformation

1. What is a router transformation?

A router is used to filter the rows in a mapping. Unlike filter transformation, you can specify one
or more conditions in a router transformation. Router is an active transformation.

2. How to improve the performance of a session using router transformation?

Use router transformation in a mapping instead of creating multiple filter transformations to


perform the same task. The router transformation is more efficient in this case. When you use a
router transformation in a mapping, the integration service processes the incoming data only
once. When you use multiple filter transformations, the integration service processes the

incoming data for each transformation.

3. What are the different groups in router transformation?

The router transformation has the following types of groups:

● Input
● Output

4. How many types of output groups are there?

There are two types of output groups:

● User-defined group
● Default group

5. Where you specify the filter conditions in the router transformation?

You can create the group filter conditions in the groups tab using the expression editor.

6. Can you connect ports of two output groups from router transformation to a single target?

No. You cannot connect more than one output group to one target or a single input group
transformation.

Stored Procedure Transformation

1. What is a stored procedure?

A stored procedure is a precompiled collection of database procedural statements. Stored


procedures are stored and run within the database.

2. Give some examples where a stored procedure is used?

The stored procedure can be used to do the following tasks

● Check the status of a target database before loading data into it.
● Determine if enough space exists in a database.
● Perform a specialized calculation.
● Drop and recreate indexes.

3. What is a connected stored procedure transformation?

The stored procedure transformation is connected to the other transformations in the mapping
pipeline.

4. In which scenarios a connected stored procedure transformation is used?

● Run a stored procedure every time a row passes through the mapping.

● Pass parameters to the stored procedure and receive multiple output parameters.

5. What is an unconnected stored procedure transformation?

The stored procedure transformation is not connected directly to the flow of the mapping. It either
runs before or after the session or is called by an expression in another transformation in the
mapping.

6. In which scenarios an unconnected stored procedure transformation is used?

● Run a stored procedure before or after a session


● Run a stored procedure once during a mapping, such as pre or post-session.
● Run a stored procedure based on data that passes through the mapping, such as when a
specific port does not contain a null value.
● Run nested stored procedures.
● Call multiple times within a mapping.

7. What are the options available to specify when the stored procedure transformation needs to
be run?

The following options describe when the stored procedure transformation runs:

● Normal: The stored procedure runs where the transformation exists in the mapping on a row-by-
row basis. This is useful for calling the stored procedure for each row of data that passes
through the mapping, such as running a calculation against an input port. Connected stored
procedures run only in normal mode.
● Pre-load of the Source: Before the session retrieves data from the source, the stored procedure
runs. This is useful for verifying the existence of tables or performing joins of data in a
temporary table.
● Post-load of the Source: After the session retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
● Pre-load of the Target: Before the session sends data to the target, the stored procedure runs.
This is useful for verifying target tables or disk space on the target system.
● Post-load of the Target: After the session sends data to the target, the stored procedure runs.
This is useful for re-creating indexes on the database.

A connected stored procedure transformation runs only in Normal mode. An unconnected stored
procedure transformation can run in all the above modes.

8. What is execution order in stored procedure transformation?

The order in which the Integration Service calls the stored procedure used in the transformation,
relative to any other stored procedures in the same mapping. Only used when the Stored
Procedure Type is set to anything except Normal and more than one stored procedure exists.

9. What is PROC_RESULT in stored procedure transformation?

PROC_RESULT is a system variable, where the output of an unconnected stored procedure


transformation is assigned by default.

10. What are the parameter types in a stored procedure?

There are three types of parameters exist in a stored procedure:

● IN: Input passed to the stored procedure
● OUT: Output returned from the stored procedure
● INOUT: Defines the parameter as both input and output. Only Oracle supports this parameter
type.
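A minimal Oracle PL/SQL sketch (the procedure, table and column names are made up) showing the three parameter types listed above:

create or replace procedure adjust_salary (
    p_empno   in     number,   -- IN: value passed into the procedure
    p_new_sal out    number,   -- OUT: value returned to the caller
    p_bonus   in out number    -- IN OUT: passed in, modified and returned
) as
begin
    select sal * 1.1 into p_new_sal from emp where empno = p_empno;
    p_bonus := nvl(p_bonus, 0) + 500;
end;
/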

Source Qualifier Transformation

1. What is a source qualifier transformation?

A source qualifier represents the rows that the integration service reads when it runs a session.
Source qualifier is an active transformation.

2. Why you need a source qualifier transformation?

The source qualifier transformation converts the source data types into informatica native data
types.

3. What are the different tasks a source qualifier can do?

● Join two or more tables originating from the same source (homogeneous sources) database.
● Filter the rows.
● Sort the data
● Selecting distinct values from the source
● Create custom query
● Specify a pre-sql and post-sql

4. What is the default join in source qualifier transformation?

The source qualifier transformation joins the tables based on the primary key-foreign key
relationship.

5. How to create a custom join in source qualifier transformation?

When there is no primary key-foreign key relationship between the tables, you can specify a
custom join using the 'user-defined join' option in the properties tab of source qualifier.
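As a hedged illustration with the familiar EMP and DEPT tables, entering EMP.DEPTNO = DEPT.DEPTNO as the user-defined join makes the generated source qualifier query behave roughly like:

select emp.empno, emp.ename, dept.dname
from emp, dept
where emp.deptno = dept.deptno;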

6. How to join heterogeneous sources and flat files?

Use joiner transformation to join heterogeneous sources and flat files

7. How do you configure a source qualifier transformation?

● SQL Query
● User-Defined Join
● Source Filter
● Number of Sorted Ports
● Select Distinct
● Pre-SQL
● Post-SQL

Sequence Generator Transformation

1. What is a sequence generator transformation?

A Sequence generator transformation generates numeric values. Sequence generator
transformation is a passive transformation.

2. What is the use of a sequence generator transformation?

A sequence generator is used to create unique primary key values, replace missing primary key
values or cycle through a sequential range of numbers.
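A rough database analogy (sequence and table names are made up): the transformation plays the same role as a database sequence used to populate a surrogate key.

create sequence customer_sk_seq start with 1 increment by 1;

insert into dim_customer (customer_sk, customer_name)
values (customer_sk_seq.nextval, 'ACME CORP');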

3. What are the ports in sequence generator transformation?

A sequence generator contains two output ports. They are CURRVAL and NEXTVAL.

4. What is the maximum number of sequence that a sequence generator can generate?

The maximum value is 9,223,372,036,854,775,807

5. When you connect both the NEXTVAL and CURRVAL ports to a target, what will be the output
values of these ports?

The output values are


NEXTVAL  CURRVAL
1        2
2        3
3        4
4        5
5        6

6. What will be the output value, if you connect only CURRVAL to the target without connecting
NEXTVAL?

The integration service passes a constant value for each row.

7. What will be the value of CURRVAL in a sequence generator transformation?

CURRVAL is the sum of "NEXTVAL" and "Increment By" Value.

8. What is the number of cached values set to default for a sequence generator transformation?

For non-reusable sequence generators, the number of cached values is set to zero.
For reusable sequence generators, the number of cached values is set to 1000.

9. How do you configure a sequence generator transformation?

The following properties need to be configured for a sequence generator transformation:

● Start Value
● Increment By
● End Value
● Current Value
● Cycle
● Number of Cached Values

Lookup Transformation

1. What is a lookup transformation?
A lookup transformation is used to look up data in a flat file, relational table, view, and synonym.

2. What are the tasks of a lookup transformation?


The lookup transformation is used to perform the following tasks:

● Get a related value: Retrieve a value from the lookup table based on a value in the source.
● Perform a calculation: Retrieve a value from a lookup table and use it in a calculation.
● Update slowly changing dimension tables: Determine whether rows exist in a target.

3. How do you configure a lookup transformation?


Configure the lookup transformation to perform the following types of lookups:

● Relational or flat file lookup


● Pipeline lookup
● Connected or unconnected lookup
● Cached or uncached lookup

4. What is a pipeline lookup transformation?


A pipeline lookup transformation is used to perform lookup on application sources such as JMS,
MSMQ or SAP. A pipeline lookup transformation has a source qualifier as the lookups source.

5. What is connected and unconnected lookup transformation?

● A connected lookup transformation is connected to the other transformations in the mapping pipeline. It
receives source data, performs a lookup and returns data to the pipeline.
● An unconnected lookup transformation is not connected to the other transformations in the
mapping pipeline. A transformation in the pipeline calls the unconnected lookup with a :LKP
expression.

6. What are the differences between connected and unconnected lookup transformation?

● Connected lookup transformation receives input values directly from the pipeline. Unconnected
lookup transformation receives input values from the result of a :LKP expression in another
transformation.
● Connected lookup transformation can be configured as dynamic or static cache. Unconnected
lookup transformation can be configured only as static cache.
● Connected lookup transformation can return multiple columns from the same row or insert into
the dynamic lookup cache. Unconnected lookup transformation can return one column from
each row.
● If there is no match for the lookup condition, connected lookup transformation returns default
value for all output ports. If you configure dynamic caching, the Integration Service inserts rows
into the cache or leaves it unchanged. If there is no match for the lookup condition, the
unconnected lookup transformation returns null.
● In a connected lookup transformation, the cache includes the lookup source columns in the
lookup condition and the lookup source columns that are output ports. In an unconnected
lookup transformation, the cache includes all lookup/output ports in the lookup condition and the
lookup/return port.
● Connected lookup transformation passes multiple output values to another transformation.
Unconnected lookup transformation passes one output value to another transformation.
● Connected lookup transformation supports user-defined default values. Unconnected lookup
transformation does not support user-defined default values.

7. How do you handle multiple matches in lookup transformation? or what is "Lookup Policy on
Multiple Match"?
"Lookup Policy on Multiple Match" option is used to determine which rows that the lookup
transformation returns when it finds multiple rows that match the lookup condition. You can select
lookup to return first or last row or any matching row or to report an error.

8. What is "Output Old Value on Update"?


This option is used when dynamic cache is enabled. When this option is enabled, the integration
service outputs old values out of the lookup/output ports. When the Integration Service updates a
row in the cache, it outputs the value that existed in the lookup cache before it updated the row
based on the input data. When the Integration Service inserts a new row in the cache, it outputs
null values. When you disable this property, the Integration Service outputs the same values out
of the lookup/output and input/output ports.

9. What is "Insert Else Update" and "Update Else Insert"?


These options are used when dynamic cache is enabled.

● Insert Else Update option applies to rows entering the lookup transformation with the row type of
insert. When this option is enabled, the integration service inserts new rows in the cache and
updates existing rows. When disabled, the Integration Service does not update existing rows.
● Update Else Insert option applies to rows entering the lookup transformation with the row type of
update. When this option is enabled, the Integration Service updates existing rows, and inserts
a new row if it is new. When disabled, the Integration Service does not insert new rows.

10. What are the options available to configure a lookup cache?


The following options can be used to configure a lookup cache:

● Persistent cache
● Recache from lookup source
● Static cache
● Dynamic cache
● Shared Cache
● Pre-build lookup cache

11. What is a cached lookup transformation and uncached lookup transformation?

● Cached lookup transformation: The Integration Service builds a cache in memory when it
processes the first row of data in a cached Lookup transformation. The Integration Service
stores condition values in the index cache and output values in the data cache. The Integration
Service queries the cache for each row that enters the transformation.
● Uncached lookup transformation: For each row that enters the lookup transformation, the
Integration Service queries the lookup source and returns a value. The integration service does
not build a cache.

12. How the integration service builds the caches for connected lookup transformation?
The Integration Service builds the lookup caches for connected lookup transformation in the
following ways:

● Sequential cache: The Integration Service builds lookup caches sequentially. The Integration
Service builds the cache in memory when it processes the first row of the data in a cached
lookup transformation.

● Concurrent caches: The Integration Service builds lookup caches concurrently. It does not need
to wait for data to reach the Lookup transformation.

13. How the integration service builds the caches for unconnected lookup transformation?
The Integration Service builds caches for unconnected Lookup transformations sequentially.

14. What is a dynamic cache?


The dynamic cache represents the data in the target. The Integration Service builds the cache
when it processes the first lookup request. It queries the cache based on the lookup condition for
each row that passes into the transformation. The Integration Service updates the lookup cache
as it passes rows to the target. The integration service either inserts the row in the cache or
updates the row in the cache or makes no change to the cache.

15. When you use a dynamic cache, do you need to associate each lookup port with the input
port?
Yes. You need to associate each lookup/output port with the input/output port or a sequence ID.
The Integration Service uses the data in the associated port to insert or update rows in the
lookup cache.

16. What are the different values returned by NewLookupRow port?


The different values are

● 0 - Integration Service does not update or insert the row in the cache.
● 1 - Integration Service inserts the row into the cache.
● 2 - Integration Service updates the row in the cache.

17. What is a persistent cache?


If the lookup source does not change between session runs, then you can improve the
performance by creating a persistent cache for the source. When a session runs for the first time,
the integration service creates the cache files and saves them to disk instead of deleting them.
The next time when the session runs, the integration service builds the memory from the cache
file.

18. What is a shared cache?


You can configure multiple Lookup transformations in a mapping to share a single lookup cache.
The Integration Service builds the cache when it processes the first Lookup transformation. It
uses the same cache to perform lookups for subsequent Lookup transformations that share the
cache.

19. What is unnamed cache and named cache?

● Unnamed cache: When Lookup transformations in a mapping have compatible caching


structures, the Integration Service shares the cache by default. You can only share static
unnamed caches.
● Named cache: Use a persistent named cache when you want to share a cache file across
mappings or share a dynamic and a static cache. The caching structures must match or be
compatible with a named cache. You can share static and dynamic named caches.

20. How do you improve the performance of lookup transformation?

● Create an index on the columns used in the lookup condition


● Place conditions with equality operator first
● Cache small lookup tables.

● Join tables in the database: If the source and the lookup table are in the same database, join the
tables in the database rather than using a lookup transformation.
● Use persistent cache for static lookups.
● Avoid ORDER BY on all columns in the lookup source. Specify explicitly the ORDER By clause
on the required columns.
● For flat file lookups, provide Sorted files as lookup source.
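For the last two points, a hedged sketch of a lookup SQL override (table and column names are illustrative): order only on the lookup condition column and end the override with "--" so that the ORDER BY the Integration Service normally appends is commented out.

select cust_id, cust_name, cust_status
from dim_customer
order by cust_id --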

Transaction Control Transformation

1. What is a transaction control transformation?

A transaction is a set of rows bound by a commit or rollback of rows. The transaction control
transformation is used to commit or rollback a group of rows.

2. What is the commit type if you have a transaction control transformation in the mapping?

The commit type is "user-defined".

3. What are the different transaction levels available in transaction control transformation?
The following are the transaction levels or built-in variables:

● TC_CONTINUE_TRANSACTION: The Integration Service does not perform any
transaction change for this row. This is the default value of the expression.
● TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new
transaction, and writes the current row to the target. The current row is in the new
transaction.
● TC_COMMIT_AFTER: The Integration Service writes the current row to the target,
commits the transaction, and begins a new transaction. The current row is in the
committed transaction.
● TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction,
begins a new transaction, and writes the current row to the target. The current row is in
the new transaction.
● TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target,
rolls back the transaction, and begins a new transaction. The current row is in the rolled
back transaction.
● Basic Commands
● 1. # system administrator prompt
$ user working prompt
● 2. $pwd: It displays present working directory. (If you login with dwhlabs)
● /home/dwhlabs
● 3. $logname: It displays the current username.
● dwhlabs
● 4. $clear: It clears the screen.
● 5. $date: It displays the current system date and time.
● Day Month date hour:min:sec standard name year
FRI MAY 23 10:50:30 IST 2008

● 6. $banner: It prints a message in large letters.


● $banner "tecno"
● 7. $exit: To logout from the current user

● 8. $finger: It displays complete information about all the users who are logged in
● 9. $who: To display information about all users who have logged into the system
currently (login name, terminal number, date and time).
● 10. $who am i: It displays username, terminal number, date and time at which you logged
into the system.
● 11. $cal: It displays a calendar (the current month by default; you can also pass a month and year).
● $cal year
$cal month year
● 12. $ls: This command is used to find list of files except hidden.
● Administrator Commands
● 1. # system administrator prompt
$ user working prompt
● 2. #useradd: To add the user terminals.
● #useradd user1
● 3. #passwd: To set the password for a particular user terminal.
● #passwd user1
Enter password:
Retype password:
● System Run Levels
● #init: To change the system run levels.
● #init 0: To shut down the system
#init 1 or s: To bring the system to single-user mode.
#init 2: To bring the system to multi-user mode with no resources shared.
#init 3: To bring the system to multi-user mode with resources shared.
#init 6: To reboot the system to the default run level.
● ls command options
●  $ls: This command is used to display list of files and directories.
●  $ls -x: It displays width wise
●  $ls | pg: It displays list of files and directories page wise & width wise.
●  $ls -a: It displays all files and directories, including hidden files and the . and .. entries
●  $ls -F: It displays files and directories with a type indicator appended to each name

● a1
a2
sample/
letter*
notes@

● / directory
* executable file
@ symbolic link file
●  $ls -r: It displays files and directories in reverse order (descending order)
●  $ls -R: It displays files and directories recursively.
●  $ls -t: It displays files and directories based on date and time of creation.
●  $ls -i: It displays files and directories along with in order number.
●  $ls -l: It displays files and directories in long list format.

file type  permissions  links  uname  group  size in bytes  date        filename
-          rw-r         1      tec    group  560            nov 3 1:30  sample
d          rwxr-wr      2      tec    group  430            mar 7 7:30  student

● Different types of files:


- ordinary file
d directory file
b special block files
c special character file
l symbolic link file

● Wild card characters or Meta characters


● Wild card characters * , ? , [] ,-, . ,^,$

wild card character    Description

*     It matches any number of characters (including none)
?     It matches any single character
[ ]   It matches any single character in the given list
-     It matches any character in the given range
.     It matches any single character except the newline character
^     It matches lines which start with the given pattern
$     It matches lines which end with the given pattern

● Examples:
● * wild card
● $ ls t* :  It displays all files starting with 't' character.
● $ ls *s: It displays all files ending with 's'
● $ ls b*k: It displays all files starting with b and ending with k

● ? wild card
● $ ls t??: It displays all files starting with 't' character and also file length must be 3 characters.
● $ ls ?ram: It displays all files starting with any character and also the length of the file must
be 4 characters.

● - wild card
● $ ls [a-z]ra: It displays all files starting with any character between a to z and ends with
'ra'.The length of the file must be 3 characters.

● [ ] wild card
● $ ls [aeiou]ra: It displays all files starting with 'a' or 'e' or 'i' or 'o' or 'u' character and ending
with 'ra'.
● $ ls [a-b]*: It displays all files starting with any character from a to b.

● . wild card
● $ ls t..u: It displays all files starting with 't' character and ending with 'u'. The length of the
file must be 4 characters. It never includes the enter key character.

● $ wild card
● $ ls *sta$: It displays all files which end with 'sta'. The length of the word can be any
number of characters.

● Filter Commands: grep, fgrep and egrep:
● AWK: Used to scan a file line by line.
● It splits each input line into fields.

● $ awk '{print}' employee.txt

● $ grep: Globally search a Regular Expression and print it.


● This command is used to search a particular pattern (single string) in a file or directory and
regular expression (pattern which uses wild card characters).

● Syntax - $grep pattern filename


● Ex. $grep dwhlabs dwh (It displays all the lines in the file which have the 'dwhlabs' pattern/string) -
Character Pattern
Ex: $grep "dwhlabs website" dwh - character pattern
eg. $grep '\<dwhlabs\>' dwh (It displays all the lines in the file which have the exact word 'dwhlabs') -
Word Pattern
eg. $grep ^UNIX$ language (It displays all the lines in the file which have the single word 'UNIX') -
Line Pattern
● $grep options
● $grep -i pattern filename (ignores case)
$grep -c pattern filename (It displays the total count of matching lines)
$grep -n pattern filename (It displays matching lines along with their line numbers)
$grep -l pattern filename (It displays only the names of the files that contain the pattern)
$grep -v pattern filename (It displays the unmatched lines)

● Q) How to remove blank lines in a file?


● A) $grep -v "^$" file1 > tmp; mv tmp file1

● Explanation: Here we first keep only the non-blank lines of file1 and redirect the result to tmp
(a temporary file). Then we rename tmp back to file1. Now check file1 - you will not find any blank lines.

● fgrep:
● This command is used to search multiple strings or one string but not regular expression.
● Syntax - $fgrep "pattern1
>pattern2
>....
>patternn " filename
Ex: $fgrep "unix
>c++
>Data Warehouse" stud

● egrep: Extended grep


● This command is used to search for single or multiple patterns and also regular expressions.

● Filter Commands: cut, sort and uniq:


● Filter Commands:
● $ cut: This command is used to retrieve the required fields or characters from a file.

100 Rakesh UNIX HYD

101 Ramani C++ CHE

102 Prabhu C BAN

103 Jyosh DWH CHE

● Syntax - $cut -f 1-3 filename


● Ex. $cut -f 1-3 employee (the first three fields of the above file)

100 Rakesh UNIX

101 Ramani C++

102 Prabhu C

103 Jyosh DWH

● Ex. $cut -c 1-8 employee (first 8 characters)

100 Rake

101 Rama

102 Prab

103 Jyos

● Delimiters: the default delimiter is Tab; other common delimiters are : , ; * _

● $cut options
● $cut -f 1-3 employee (fields 1 to 3, tab-delimited)
$cut -c 1-8 employee (characters 1 to 8)
$cut -d ',' -f 1-3 employee (fields 1 to 3, comma-delimited)

● $ sort: This command is used to sort the file records in ascending or descending order.

● $sort Options
● $sort -r filename (sorts in reverse order)
$sort -u filename (returns unique records)
$sort -n filename (sorts numerically)

● How to sort data field-wise?


● Syntax - $sort -k position filename
Ex: $sort -k 2 employee (sorts on the second field)

● $ uniq: This command is used to filter the duplicate lines from a file
Note: Always the file must be in sorting order.

● Syntax - $uniq filename
Ex: $uniq employee
● $uniq Options
● Ex: $uniq -u employee (displays non duplicated lines)
Ex: $uniq -d employee (displays duplicated lines)

● Question: How to remove duplicate lines in a file?


● Answer: $uniq employee>temp
$mv temp employee
● $ tr: Translates the characters in string1 from stdin into those in string2 on stdout.

$ sed: This command is used for editing files from a script or from the command line.
● $ head: Displays the first 'N' lines of a file.

$ tail: Displays the last 'N' lines of a file.

$ cmp: Compares two files and lists where differences occur.

$ diff: Compares the two files and shows the differences.

$ wc: Displays the word (or character or line) count for a file.

23. What is Dimensional Table? Explain the different dimensions.


Dimension table is the one that describes business entities of an enterprise,
represented as hierarchical, categorical information such as time, departments,
locations, products etc.

Types of dimensions in data warehouse

A dimension table consists of the attributes about the facts. Dimensions store the
textual descriptions of the business. Without the dimensions, we cannot measure the
facts. The different types of dimension tables are explained in detail below.

Conformed Dimension:
A conformed dimension is a dimension that has exactly the same
meaning and content when being referred from different fact tables. A
conformed dimension can refer to multiple tables in multiple data
marts within the same organization. For two dimension tables to be
considered as conformed, they must either be identical or one must
be a subset of another. There cannot be any other type of difference
between the two tables. For example, two dimension tables that are
exactly the same except for the primary key are not considered conformed.
Eg: The time dimension is a common conformed dimension in an
organization.

● Junk Dimension:
A junk dimension is a collection of random transactional codes, flags and/or text
attributes that are unrelated to any particular dimension. The junk dimension is
simply a structure that provides a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and a marital status dimension. In the
fact table we would need to maintain two keys referring to these dimensions. Instead,
create a junk dimension which has all the combinations of gender and marital status
(cross join the gender and marital status tables and create a junk table; a SQL sketch
follows this list). Now we can maintain only one key in the fact table.
● Degenerate Dimension:
A degenerate dimension is a dimension which is derived from the fact table and
doesn’t have its own dimension table.
Eg: A transactional code in a fact table.
● Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are
called role-playing dimensions. For example, a date dimension can be used for “date
of sale”, as well as “date of delivery”, or “date of hire”.
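A minimal SQL sketch of the junk dimension described above, built by cross joining the distinct gender and marital status values (table and column names are made up):

create table junk_demographics as
select row_number() over (order by g.gender, m.marital_status) as junk_key,
       g.gender,
       m.marital_status
from   (select distinct gender from customer) g
cross join
       (select distinct marital_status from customer) m;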

24. What is Fact Table? Explain the different kinds of Facts.


The centralized table in the star schema is called the Fact table. A Fact table
typically contains two types of columns. Columns which contains the measure called
facts and columns, which are foreign keys to the dimension tables. The Primary key
of the fact table is usually the composite key that is made up of the foreign keys of
the dimension tables.

Types of Facts in Data Warehouse

A fact table is the one which consists of the measurements, metrics or facts of
business process. These measurable facts are used to know the business value and
to forecast the future business. The different types of facts are explained in detail
below.

● Additive:
Additive facts are facts that can be summed up through all of the dimensions
in the fact table. A sales fact is a good example for additive fact.
● Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the
dimensions in the fact table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension
but not through the time dimension.
● Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the
dimensions present in the fact table.
Eg: Facts which have percentages, ratios calculated.

Factless Fact Table:

In the real world, it is possible to have a fact table that contains no measures or
facts. These tables are called “Factless Fact tables”.
E.g: A fact table which has only a product key and a date key is a factless fact. There
are no measures in this table, but you can still get the number of products sold over a
period of time.
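A minimal sketch (illustrative table and column names) of counting events from such a factless fact table, which holds only keys and no measures:

select d.calendar_month, count(*) as products_sold
from   factless_sales f
join   dim_date d on d.date_key = f.date_key
group by d.calendar_month;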

A fact table that contains aggregated facts is often called a summary table.

A Mapplet is a reusable object created in the Mapplet Designer which contains a set
of transformations and lets us reuse the transformation logic in multiple mappings.

A Mapplet can contain as many transformations as we need. Like a reusable


transformation when we use a mapplet in a mapping, we use an instance of the
mapplet and any change made to the mapplet is inherited by all instances of the
mapplet.

Session:

● A session is a task that executes a mapping.


● A session is created for each Mapping.
● A session is created to provide runtime properties.
● A session is a set of instructions that tells ETL server to move the data from source to destination.
Workflow:
A workflow is a set of instructions that tells the server how to run the session tasks and when to run
them.

Data Model:

The primary goal of using data model are:

● Ensures that all data objects required by the database are accurately
represented. Omission of data will lead to creation of faulty reports
and produce incorrect results.
● A data model helps design the database at the conceptual, physical
and logical levels.
● Data Model structure helps to define the relational tables, primary
and foreign keys and stored procedures.
● It provides a clear picture of the base data and can be used by
database developers to create a physical database.
● It is also helpful to identify missing and redundant data.
● Though the initial creation of a data model is labor- and time-intensive, in the long run
it makes upgrading and maintaining your IT infrastructure cheaper and faster.

Types of Data Models


Conceptual Model
The main aim of this model is to establish the entities, their attributes, and
their relationships. In this Data modeling level, there is hardly any detail
available of the actual Database structure.

The 3 basic tenants of Data Model are

Entity: A real-world thing

Attribute: Characteristics or properties of an entity

Relationship: Dependency or association between two entities

For example:

● Customer and Product are two entities. Customer number and name
are attributes of the Customer entity
● Product name and price are attributes of product entity
● Sale is the relationship between the customer and product

Logical Data Model


Logical data models add further information to the conceptual model
elements. It defines the structure of the data elements and set the
relationships between them.

The advantage of the Logical data model is to provide a foundation to form
the base for the Physical model. However, the modeling structure remains
generic.
At this data modeling level, no primary or secondary key is defined, and you need to
verify and adjust the connector details that were set earlier for relationships.

Physical Data Model


A Physical Data Model describes the database specific implementation of
the data model. It offers an abstraction of the database and helps generate
schema. This is because of the richness of meta-data offered by a Physical
Data Model.

This type of Data model also helps to visualize database structure. It helps
to model database columns keys, constraints, indexes, triggers, and other
RDBMS features.

How to load the first half of the records into one target?

select * from emp where rownum <= (select count(*)/2 from emp)

How to load the second half of the records into one target?

In the Source Qualifier, go to the properties and write the SQL query like:

select * from emp
minus
select * from emp where rownum <= (select count(*)/2 from emp)

Unique records into one target and duplicates into another target

Step 1: Drag the source to the mapping and connect it to an aggregator transformation.

Step 2: In the aggregator transformation, group by the key column and add a new
port, call it count_rec, to count the key column.
Step 3: Connect a router to the aggregator from the previous step. In the router make
two groups, one named "original" and another named "duplicate".
In the original group write count_rec=1 and in the duplicate group write count_rec>1.
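A SQL sketch (illustrative source table and key column) of the same split the mapping performs - keys that occur once go to the "original" target and the rest to the "duplicate" target:

select key_col, count(*) as count_rec
from   src_table
group by key_col
having count(*) = 1;   -- "original" group (count_rec = 1)

select key_col, count(*) as count_rec
from   src_table
group by key_col
having count(*) > 1;   -- "duplicate" group (count_rec > 1)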

Bottom-up design: In the bottom-up approach, data marts are first created to provide
reporting and analytical capabilities for specific business processes. These data marts can
then be integrated to create a comprehensive data warehouse. The data warehouse bus
architecture is primarily an implementation of "the bus", a collection of conformed
dimensions and conformed facts, which are dimensions that are shared (in a specific way)
between facts in two or more data marts.

Top-down design: The top-down approach is designed using a normalized enterprise data
model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data
warehouse. Dimensional data marts containing data needed for specific business processes
or specific departments are created from the data warehouse.

Hybrid design: Data warehouses (DW) often resemble the hub and spokes architecture.
Legacy systems feeding the warehouse often include customer relationship management
and enterprise resource planning, generating large amounts of data. To consolidate these
various data models, and facilitate the extract transform load process, data warehouses
often make use of an operational data store, the information from which is parsed into the
actual DW. To reduce data redundancy, larger systems often store the data in a normalized
way. Data marts for specific reports can then be built on top of the DW.

(The screenshot showing the router group names and filter conditions is not reproduced here.)
Step 4: Connect the two groups to the corresponding target tables.

● How to access UPDATE_OVERRIDE property?

1. Go to Mapping

2. Double Click on the concerned TARGET or edit the Target
3. Click on Properties Tab
4. The second Transformation Attribute is the property we are looking for

● Syntax for UPDATE_OVERRIDE SQL:

UPDATE <TARGET TABLE>
SET <COLUMN NAME to be updated> = :TU.<TARGET COLUMN PORT NAME (as in Designer)>
    , [other columns that need to be updated]
WHERE <COLUMN NAME to be treated as KEY> = :TU.<corresponding TARGET COLUMN PORT NAME (as in Designer)>
AND [other conditions]

● Example:

UPDATE      EMPL_POST_HIST
SET                      POST = :TU.POST
                              , UPDATE_DATE = :TU.UPDATE_DATE
WHERE EMPL = :TU.EMPL

Debug Process
To debug a mapping, complete the following steps:

1. Create breakpoints.
 Create breakpoints in a mapping where you want the Integration Service to evaluate data and error
conditions.
2. Configure the Debugger. 
Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the
Integration Service uses when it runs the Debugger. When you create a debug session, you configure a
subset of session properties within the Debugger Wizard, such as source and target location. You can
also choose to load or discard target data.
3. Run the Debugger.
 Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer
connects to the Integration Service. The Integration Service initializes the Debugger and runs the
debugging session and workflow. The Integration Service reads the breakpoints and pauses the
Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger.
 While you run the Debugger, you can monitor the target data, transformation and mapplet output data,
the debug log, and the session log. When you run the Debugger, the Designer displays the following
windows:
● Debug log.
 View messages from the Debugger.
● Target window. 
View target data.
● Instance window. 
View transformation data.
5. Modify data and breakpoints.
 When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and
targets as the data moves through the pipeline. You can also modify breakpoint information.

The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy
breakpoint information and the Debugger configuration to another mapping. If you want to run the
Debugger from another PowerCenter Client machine, you can copy the breakpoint information and the
Debugger configuration to the other PowerCenter Client machine.
The following figure shows the windows in the Mapping Designer that appear when you run the
Debugger:

1. Debugger log.
2. Session log.
3. Instance window.
4. Target window.

A target load plan is something you set at the mapping level, whereas
constraint based load ordering is set at the session level.
Whenever you have multiple pipelines in your
mapping, you can set the order of execution of those pipelines,
which one run first and which one next. Whenever you select the
target load plan, it displays the list of source qualifiers lying
in the different pipelines. You just need to arrange the source
qualifier as per your requirement and save your mapping.
Constraint based load ordering is an option in the
session properties where Informatica understands that
the parent table has to be loaded first and the child table after that,
based on the primary key and foreign key relationship.
You need not determine which table to load first and which next; all
you need to do is check the 'Constraint based load
ordering' option in the session properties. This applies when
you have multiple targets in the same pipeline (such as
multiple targets from the same transformation or multiple targets
from the same output group of a router, etc.).
Bulk and Normal Load
The main difference between normal and bulk load is that in
normal load the database writes each row to its transaction log,
while in bulk load the database log is bypassed. That is the
reason bulk load loads the data faster, but if anything goes
wrong the data cannot be recovered. In normal load, since
the log is written, the lost data can be recovered.
By configuring a session for bulk loading we cannot do session
recovery when it fails, whereas in normal loading we can
easily do session recovery.
Note: In normal load the database logs record by record, and in
bulk mode it does not create a detailed log. Avoiding that detailed
logging is what improves performance in bulk mode.
Use bulk mode when there are no constraints, but you can't roll back.
Use normal mode if there are constraints and if you want to be able to roll back.

When loading to Microsoft SQL Server and Oracle targets, you must specify a normal load if
you select data driven for the Treat Source Rows As session property. When you
specify bulk mode and data driven, the Informatica Server fails the session

Note: If there are indexes and constraints on the table, you
cannot use Bulk Load.
You can drop/disable those indexes/constraints (pre-session), then load
the data (using Bulk Load), and then re-create/enable the
indexes/constraints (post-session).
Bulk load configuration is applicable in:
1. DB2.
2. Sybase ASE.
3. Oracle.
4. Microsoft SQL Server databases.

Data driven Sessions in Informatica


Data driven is the property by which the Informatica Server decides the way the data needs to
be treated whenever a mapping contains Update Strategy Transformation.

For example, whenever we use an Update Strategy Transformation we need to mention whether
it is DD_UPDATE or DD_INSERT or DD_DELETE in the mapping. One mapping may contain two
or more Update Strategy Transformations. Therefore, in order for the session to execute
successfully, set the Treat Source Rows As property to Data Driven in the session properties for that particular
mapping.

DD_UPDATE - records need to be updated

DD_INSERT - records need to be newly inserted

DD_DELETE - records need to be deleted

Architecture:
Informatica ETL tool consists of following services & components

1. Repository Service – Responsible for maintaining Informatica metadata and providing access to it for other services.
2. Integration Service – Responsible for the movement of data from sources to targets.
3. Reporting Service – Enables the generation of reports.
4. Nodes – Computing platform where the above services are executed.

5. Informatica Designer – Used for the creation of mappings between source and target.
6. Workflow Manager – Used to create workflows and other tasks and their execution.
7. Workflow Monitor – Used to monitor the execution of workflows.
8. Repository Manager – Used to manage objects in the repository.

Informatica Domain
The overall architecture of Informatica is Service Oriented Architecture
(SOA).

A node is a logical representation of a machine in a domain

 A Domain is a collection of services.

Nth highest salary


select * from(
select ename, sal, dense_rank()
over(order by sal desc)r from Employee)
where r=&n;

Approaches for finding Nth highest salary:

1) Using Correlated subquery.

SQL Query:

SELECT name, salary
FROM #Employee e1
WHERE N-1 = (SELECT COUNT(DISTINCT salary) FROM #Employee e2
             WHERE e2.salary > e1.salary)

2) Using Top

SELECT TOP 1 salary
FROM (SELECT DISTINCT TOP N salary FROM #Employee ORDER BY salary DESC) AS temp
ORDER BY salary

3) Using Limit

SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1

4) Row Number

SELECT * FROM (SELECT e.*, ROW_NUMBER() OVER (ORDER BY salary DESC) rn FROM Employee e)
WHERE rn = N;

5) Using Dense_Rank()

select * from (select ename, sal, dense_rank() over (order by sal desc) r from Employee)
where r = &n;

The Java transformation can be reusable and can be defined as either an active or a passive
Informatica object.
The Java Transformation has four self-explanatory tabs: Transformation (general
options), Ports (inputs and outputs in separate groups), Properties (active/passive,
deterministic), and Java Code. Once the ports and properties are set, the java code
can be entered and compiled from within the designer window. The code window is
divided into tab windows which includes:
● Import Packages - import 3rd party java packages, built-in or custom Java packages
● Helper code - declare user-defined variables and methods for the Java
transformation class.
● On Input Row - the Java code is executed one time for each input row. Only on
this tab the input row can be accessed.
● On End of Data - defines the behavior after processing all the input data
● On Receiving transaction - code which is executed when a transaction is received by
the transformation
● Java expressions - used for defining and calling Java expressions
Then the code snippets get compiled into byte code behind the scenes and
Integration Service starts a JVM that executes it to process the data.

NVL(expr1, expr2): In SQL, NVL() converts a null value to an actual value.
Data types that can be used are date, character and number. The data types must
match, i.e. expr1 and expr2 must be of the same data type.
Ex:
expr1 is the source value or expression that may contain a null.
expr2 is the target value for converting the null.

NVL2(expr1, expr2, expr3) : The NVL2 function examines the first


expression. If the first expression is not null, then the NVL2 function returns
the second expression. If the first expression is null, then the third expression
is returned i.e. If expr1 is not null, NVL2 returns expr2. If expr1 is null, NVL2
returns expr3. The argument expr1 can have any data type.
Ex:
expr1 is the source value or expression that may contain null
expr2 is the value returned if expr1 is not null
expr3 is the value returned if expr1 is null.

COALESCE() : The COALESCE() function examines the first expression, if


the first expression is not null, it returns that expression; Otherwise, it does a
COALESCE of the remaining expressions.
The advantage of the COALESCE() function over the NVL() function is that
the COALESCE function can take multiple alternate values. In simple words
COALESCE() function returns the first non-null expression in the list.
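A few illustrative examples against the familiar EMP table (COMM is nullable; the extra arguments to COALESCE are just fallbacks for the example):

select ename,
       nvl(comm, 0)                as comm_or_zero,   -- null becomes 0
       nvl2(comm, sal + comm, sal) as total_pay,      -- comm not null -> sal+comm, else sal
       coalesce(comm, sal, 0)      as first_non_null  -- first non-null value in the list
from emp;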
Lookup Caches in Informatica: Static cache, Dynamic cache, Shared cache, Persistent cache.
Static cache:
A static cache is the same as a cached lookup: once the cache is created, the Integration
Service always queries the cache instead of the lookup table.

In Static Cache when the Lookup condition is set to true it returns the value from lookup table else
returns Null or Default value.
One of the key points to remember when using a static cache is that we cannot insert into or
update the cache.
Dynamic cache:
In Dynamic Cache we can insert or update rows in the cache when we pass the rows through the
transformation. The Integration Service dynamically does the inserts or updates of data in the lookup
cache and passes the data to the target. The dynamic cache is synchronized with the target to have
the latest of the key attribute values.
Shared cache:
For a shared cache, the Informatica server creates one cache for multiple lookup transformations in
the mapping; the cache built for the first lookup is then reused by the other lookup
transformations.
We can share the lookup cache between multiple transformations. Un-named cache is shared
between transformations in the same mapping and named cache between transformations in the
same or different mappings.
Persistent cache:

If we use Persistent cache Informatica server processes a lookup transformation and saves the
lookup cache files and reuses them the next time when the workflow is executed. The Integration
Service saves or deletes lookup cache files after a successful session run based on whether the
Lookup cache is checked as persistent or not.

In order to make a Lookup Cache as Persistent cache you need to make the following changes

● Lookup cache persistent: Needs to be checked


● Cache File Name Prefix: Enter the Named Persistent cache file name
● Re-cache from lookup source: Needs to be checked
Re-cache from database
If the persistent cache is not synchronized with the lookup table you can configure the lookup
transformation to rebuild the lookup cache.
