Informatica Interview Questions and Answers
I have 4.5 years of experience in DWH using the Informatica tool on development and
enhancement projects. I have worked primarily in the healthcare and manufacturing domains.
⮚ I work in an onsite-offshore model, so we receive our tasks from the onsite team.
⮚ As a developer, I first need to understand the physical data model, i.e. the dimensions and
facts and their relationships, as well as the functional specification prepared by the Business
Analyst, which describes the business requirement.
⮚ I am involved in preparing the source-to-target mapping sheet (tech spec), which tells us
what the source and target are, which source column maps to which target column, and
what the business logic is. This document gives a clear picture for development.
⮚ Creating Informatica mappings, sessions and workflows using different transformations to
implement the business logic.
⮚ Preparing unit test cases as per the business requirement is also one of my responsibilities.
⮚ I also perform unit testing for the mappings I develop.
⮚ I do source code reviews for the mappings and workflows developed by my team members.
⮚ I am also involved in preparing the deployment plan, which contains the list of mappings
and workflows to be migrated; based on this, the deployment team can migrate the code
from one environment to another.
⮚ Once the code is rolled out to production, we work with the production support team for
two weeks, during which we give knowledge transfer (KT) in parallel. We also prepare the KT
document for the production team.
Manufacturing or Supply Chain
Coming to My Current Project:
Currently I am working on the XXX project for the YYY client. YYY does not have its own
manufacturing unit. Before each quarter ends, the business (BIZ) calls for quotations from its
primary supply channels; this process is called an RFQ (Request for Quotation). Once the
business creates an RFQ, a notification automatically goes to the supply channels. The supply
channels send back their quoted values, which we call the response from the supply channel.
The business then negotiates with the supply channels for the best deal and approves the
RFQs.
All these activities (creating RFQs, capturing supplier responses, approving RFQs, etc.) are
performed in Oracle Apps, which is the front-end source application. This data gets stored in
the OLTP system, so the OLTP contains all the RFQ, supplier response and approval status
data.
We have Oracle jobs running between the OLTP and the ODS which replicate the OLTP data
to the ODS. They are designed so that any transaction entering the OLTP is immediately
reflected in the ODS.
We have a staging area where we load the entire ODS data into staging tables. For this we
have created ETL Informatica mappings that truncate and reload the staging tables on each
session run. Before loading the staging tables we drop the indexes, and after the bulk load we
recreate the indexes using stored procedures.
We then extract all this data from the stage and load it into the dimensions and facts. On top
of the dims and facts we have created materialized views as per the report requirements.
Finally, the reports pull data directly from the materialized views. The performance of these
reports/dashboards is always good because we are not doing any calculations at the reporting
level. The dashboards/reports are used for analysis: how many RFQs were created, how many
RFQs were approved, how many RFQs received responses from the supply channels, which
approval manager an RFQ is pending with, what the feedback from the supply channels has
been in the past, and so on.
In the present system they do not have a BI design, so they follow a manual process of
exporting SQL query output to Excel sheets and preparing pie charts using macros. In the new
system we are providing BI-style reports such as drill-downs, drill-ups, pie charts, graphs,
detail reports and dashboards.
Delivery Centers
The current system was designed in webMethods (a middleware tool). They have found issues
with the existing system: it does not support BI capabilities such as drill-down and drill-up.
That is why the reporting is being redesigned as a BI solution.
Generally, once production of a product is complete, it is sent to the delivery centers. From
the DCs (Delivery Centers) it is shipped to supply channels or distributors, and from there it
goes to end customers.
Before production of any product starts, business approval is essential for the production
unit. Before taking the decision, the business has to analyse the existing stock, previous sales
history, future orders and so on. To do this they need reports in BI style (drill-down and
drill-up). These reports, created in BO, show what is in stock in each delivery center, the
shipping status, previous sales history, and the customer orders for each product across all
the delivery centers. The business buys these details from a third-party IMS company.
IMS collects information from the different distributors and delivery centers, such as the
on-hand stock, shipping stock, how many orders are in hand for the next quarter, and the
previous sales history for specific products.
We have a staging area where we load the entire IMS data into staging tables. For this we
have created ETL Informatica mappings that truncate and reload the staging tables on each
session run. Before loading the staging tables we drop the indexes, and after the bulk load we
recreate the indexes using stored procedures. After the staging load completes we load the
data into our dims and facts. On top of our data model we have created materialized views
that contain the complete reporting calculations. From the materialized views we pull the
data into BO reports with fewer joins and fewer aggregations, so report performance is good.
ORACLE
1) I am good at SQL; I write the source qualifier queries for Informatica mappings as per
the business requirement.
2) I am comfortable working with joins, correlated queries, sub-queries, analyzing tables,
inline views and materialized views.
3) As an Informatica developer I have not had many opportunities to work on the PL/SQL
side, but I worked on a PL/SQL-to-Informatica migration project, so I do have exposure
to procedures, functions and triggers.
What is the difference between view and materialized view?
● A view has a logical existence; it does not contain data. A materialized view has a physical
existence.
● When we do select * from a view it fetches the data from the base tables. When we do
select * from a materialized view it fetches the data stored in the materialized view itself.
Materialized View
A materialized view is very useful for reporting. If we do not have the materialized view, the
report fetches the data directly from the dimensions and facts, which is slow because it
involves multiple joins. If we put the same report logic into a materialized view, the report
can fetch the data directly from the materialized view, so we avoid the multiple joins at
report run time.
The materialized view must be refreshed regularly; the report then simply performs a select
statement on the materialized view.
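A minimal sketch of this idea in Oracle SQL (the table and column names are assumptions for illustration only):
-- Pre-compute the report joins/aggregations once, instead of at report run time.
CREATE MATERIALIZED VIEW mv_rfq_summary
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
AS
SELECT d.supplier_name,
       COUNT(*) AS rfq_count,
       SUM(CASE WHEN f.status = 'APPROVED' THEN 1 ELSE 0 END) AS approved_count
FROM   rfq_fact f
JOIN   supplier_dim d ON d.supplier_sk = f.supplier_sk
GROUP BY d.supplier_name;

-- Refresh after each warehouse load; the report then just selects from the MV.
EXEC DBMS_MVIEW.REFRESH('MV_RFQ_SUMMARY', 'C');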
Difference between Trigger and Procedure
A trigger fires automatically when the DML event it is defined for (INSERT, UPDATE or
DELETE) occurs on its table; it cannot be invoked directly and cannot accept parameters. A
stored procedure runs only when it is explicitly called, and it can accept parameters.
Difference between Sub-query and Co-related sub-query
● A sub-query is executed once for the parent query, whereas a co-related sub-query is
executed once for each row of the parent query.
Example (sub-query):
Select * from emp where deptno in (select deptno from dept);
Example (co-related sub-query):
Select e.* from emp e where e.sal >= (select avg(a.sal) from emp a where
a.deptno = e.deptno);
Difference between WHERE and HAVING clause
Both the where clause and the having clause can be used to filter data.
● The where clause does not require a group by, whereas the having clause is used together
with the group by clause.
● The where clause applies to individual rows, whereas the having clause tests a condition
on the group rather than on individual rows.
● The where clause is used to restrict rows, whereas the having clause is used to restrict
groups.
● In the where clause every record is filtered before grouping, whereas the having clause
filters on aggregated records (group by functions).
Difference between Stored Procedure and Function
● A stored procedure may or may not return values, whereas a function must return a value;
a procedure can return more than one value through OUT arguments.
● A stored procedure is generally used to implement business logic, whereas a function is
generally used for calculations.
● A procedure can have IN, OUT and IN OUT parameters, whereas a function normally takes
only IN parameters and sends its result back through the RETURN value.
● Stored procedures are mainly used to process tasks; functions are mainly used to compute
values.
● A procedure cannot be invoked from SQL statements (e.g. SELECT), whereas a function can
be invoked from SQL statements such as SELECT.
● A procedure can affect the state of the database using commit; a function called from SQL
should not change the state of the database.
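A minimal sketch of the two in Oracle PL/SQL (the names and logic here are illustrative assumptions, not project code):
-- A procedure: performs a task, can commit, is called on its own.
CREATE OR REPLACE PROCEDURE raise_salary(p_empno IN NUMBER, p_pct IN NUMBER) AS
BEGIN
    UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;
    COMMIT;
END;
/

-- A function: computes and returns a single value, usable inside a SELECT.
CREATE OR REPLACE FUNCTION annual_sal(p_empno IN NUMBER) RETURN NUMBER AS
    v_sal emp.sal%TYPE;
BEGIN
    SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
    RETURN v_sal * 12;
END;
/

-- Usage: the function from SQL, the procedure from PL/SQL or SQL*Plus.
SELECT empno, annual_sal(empno) FROM emp;
EXEC raise_salary(7369, 10);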
Rowid vs Rownum
● Rowid is the physical address of a row in the database. It stays the same for the life of the
row and can be used to fetch that row directly.
● Rownum is a pseudo-column that assigns a sequential number (1, 2, 3 ...) to the rows
returned by a query. It is generated at query time, so it can change from one query to
another.
Joiner vs Lookup
● We cannot override the query in a joiner, whereas we can override the lookup query to
fetch data from multiple tables.
● We cannot apply any filter along with the join condition in a joiner transformation,
whereas in a lookup we can apply filters along with the lookup condition using the lookup
query override.
Source Qualifier vs Lookup
● A source qualifier returns all the matching records, whereas in a lookup we can restrict
whether to return the first value, the last value or any value.
● When both the source and the lookup table are in the same database we can use a source
qualifier; when the source and the lookup table are in different databases we need to use
a lookup.
Source Qualifier vs Joiner
● We use a source qualifier to join tables when the tables are in the same database; we use
a joiner to join tables from different databases.
● In a source qualifier we can use any type of join between the two tables, whereas a joiner
supports only its four join types.
Stopped:
You choose to stop the workflow or task in the Workflow Monitor or through pmcmd. The
Integration Service stops processing the task and all other tasks in its path. It continues
running concurrent tasks, such as backend stored procedures.
Abort:
You choose to abort the workflow or task in the Workflow Monitor or through
pmcmd. The Integration Service kills the DTM process and aborts the task.
Top Keyword (Nth highest salary):
The query below returns the 3rd highest salary using the TOP keyword (SQL Server syntax):
SELECT TOP 1 salary FROM ( SELECT DISTINCT TOP 3 salary FROM #Employee ORDER
BY salary DESC ) AS temp ORDER BY salary
Limit keyword (MySQL syntax, where N is the rank you want):
SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1
Find duplicate rows based on empno:
Select empno, count (*) from EMP group by empno having count (*)>1;
Delete the duplicates, keeping one row per empno:
Delete from EMP where rowid not in (select max (rowid) from EMP group by empno);
What is your tuning approach if a SQL query is taking a long time? Or how do you tune a SQL query?
If a query is taking a long time, first I will run the query through EXPLAIN PLAN; the explain
plan process stores its data in the PLAN_TABLE.
It gives us the execution plan of the query, i.e. whether the query is using the relevant
indexes on the joining columns or whether the indexes needed to support the query are
missing.
If the joining columns do not have indexes, the query does a full table scan, and a full table
scan makes the cost higher. In that case I will create indexes on the joining columns and run
the query again; it should give better performance. We also need to analyze the tables if the
statistics were gathered long ago. Statistics for a specific table, index or cluster can be
gathered using, for example, ANALYZE TABLE <table_name> COMPUTE STATISTICS.
If there is still a performance issue I will use hints; a hint is nothing but a clue to the
optimizer. We can use hints such as:
● ALL_ROWS
One of the hints that invokes the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data warehousing systems.
● FIRST_ROWS
One of the hints that invokes the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
● CHOOSE
This hint lets the optimizer choose between cost-based and rule-based optimization,
depending on whether statistics have been gathered.
● HASH (USE_HASH)
Hashes one table (full scan) and creates a hash index for that table, then hashes the other
table and uses the hash index to find the corresponding records. It is therefore not
suitable for < or > join conditions.
/*+ use_hash */
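A short example of how this tuning flow can look in practice (the table and index names are illustrative assumptions):
-- 1. Look at the execution plan.
EXPLAIN PLAN FOR
SELECT f.*
FROM   rfq_fact f
JOIN   supplier_dim d ON d.supplier_sk = f.supplier_sk;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- 2. If the join column has no index, create one and refresh the statistics.
CREATE INDEX idx_rfq_fact_supplier ON rfq_fact(supplier_sk);
ANALYZE TABLE rfq_fact COMPUTE STATISTICS;

-- 3. As a last resort, guide the optimizer with a hint.
SELECT /*+ use_hash(f d) */ f.*
FROM   rfq_fact f
JOIN   supplier_dim d ON d.supplier_sk = f.supplier_sk;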
DWH Concepts
OLTP vs DWH/DSS/OLAP
● OLTP has a normalized structure, so it requires multiple joins to fetch the data, whereas a
DWH does not require as many joins to fetch the data.
What is a staging area and why do we need it in DWH?
If the source and target databases are different and the target table volume is high (some
millions of records), then without a staging table we would have to design the Informatica
mapping with a lookup to find out whether a record already exists in the target table. Since
the target has huge volumes, building the lookup cache is costly and hurts performance.
If we create staging tables in the target database, we can simply do an outer join in the
source qualifier to determine insert versus update; this approach gives good performance.
It avoids a full table scan to determine inserts/updates on the target.
We can also create indexes on the staging tables; since these tables are designed for a
specific application, they do not impact any other schemas/users.
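A hedged sketch of the kind of source-qualifier override this describes (table and column names are assumptions):
-- Outer join the staging table to the target to flag each row as insert or update.
SELECT s.rfq_id,
       s.rfq_amount,
       CASE WHEN t.rfq_id IS NULL THEN 'I' ELSE 'U' END AS load_flag
FROM   stg_rfq s
LEFT OUTER JOIN rfq_target t
       ON t.rfq_id = s.rfq_id;
-- Rows flagged 'I' go to the insert flow, rows flagged 'U' to the update flow (via a Router).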
Data cleansing, also known as data scrubbing, is the process of ensuring that a set of data
is correct and accurate. During data cleansing, records are checked for accuracy and
consistency.
ODS:
My understanding of an ODS is that it is a replica of the OLTP system. The need for it is to
reduce the burden on the production system (OLTP) while fetching data for loading the
targets. Hence it is practically a mandatory requirement for every warehouse.
A primary key is a special constraint on a column or set of columns. A primary key constraint
ensures that the column(s) so designated have no NULL values, and that every value is
unique. Physically, a primary key is implemented by the database system using a unique
index, and all the columns in the primary key must have been declared NOT NULL. A table
may have only one primary key, but it may be composite (consist of more than one column).
A surrogate key is any column or set of columns that can be declared as the primary key
instead of a "real" or natural key. Sometimes there can be several natural keys that could be
declared as the primary key, and these are all called candidate keys. So a surrogate is a
candidate key. A table could actually have more than one surrogate key, although this
would be unusual. The most common type of surrogate key is an incrementing integer, such
as an auto increment column in MySQL, or a sequence in Oracle, or an identity column in
SQL Server.
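A minimal sketch of the most common surrogate key implementation, an Oracle sequence (names are illustrative):
-- The surrogate key has no business meaning; it is just the next number from a sequence.
CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (customer_sk, customer_id, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C-1001', 'Acme Corp');
-- customer_sk is the surrogate (primary) key; customer_id is the natural key from the source.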
Have you done any performance tuning in Informatica?
1) Yes. One of my mappings was taking 3-4 hours to process 40 million rows into a staging
table. There were no transformations inside the mapping; it was a 1-to-1 mapping, so
there was nothing to optimize at the mapping level. I created session partitions using
key range partitioning on the effective date column. It improved performance a lot:
rather than 4 hours it ran in about 30 minutes for the entire 40 million rows. With
partitions, the DTM creates multiple reader and writer threads.
2) There was one more scenario where I got very good performance at the mapping level.
Rather than using a lookup transformation, if we can do an outer join in the source
qualifier query override, it gives good performance, provided both the lookup table and
the source are in the same database. If the lookup table has huge volumes, building the
cache is costly.
3) Also, optimizing the mapping to use fewer transformations always gives better
performance.
4) If any mapping takes a long time to execute, first we need to look at the source and
target statistics in the Workflow Monitor for the throughput, and find out where exactly
the bottleneck is by looking at the busy percentage in the session log; that tells us which
transformation is taking more time. If the source query is the bottleneck, it shows at the
end of the session log as "query issued to database", which means there is a
performance issue in the source query; we then need to tune the query using the
explain plan, indexes and hints as described above.
1) I have the Unix shell scripting knowledge that Informatica work requires, for example
cd /pmar/informatica/pc/pmserver/
2) If we have to process flat files in Informatica but those files exist on a remote server,
we have to write a script to FTP them to the Informatica server before we start
processing those files.
3) We also use file-watch scripts: if an indicator file is available in the specified location we
start our Informatica jobs, otherwise we send an email notification using the mailx
command saying that the previous jobs did not complete successfully.
4) We also use shell scripts to update the parameter file with the session start time and
end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes, I can
Google it, work out the solution and implement it.
If we copy source definitions, target definitions or mapplets from a Shared folder to any
other folder, they become shortcuts.
Let us assume we have imported some source and target definitions into a shared folder and
we are using those definitions as shortcuts in mappings in other folders.
If any modification happens in the backend (database) structure, like adding new columns or
dropping existing columns in either the source or the target, and we re-import the definition
into the shared folder, those changes are automatically reflected in all folders/mappings
wherever we used those source or target definitions.
How to concatenate row data through Informatica?
Source:
Ename EmpNo
stev 100
methew 100
john 101
tom 101
Target:
Ename EmpNo
stev methew 100
john tom 101
Ans:
Approach 1: If the record does not exist, insert it into the target. If it already exists, get the
corresponding Ename value from a lookup on the target, concatenate it with the current
Ename value in an Expression, and then update the target Ename column using an Update
Strategy.
Approach 2: Sort the data in the source qualifier on the EmpNo column, then use an
Expression with variable ports to hold the previous record's information. Use a Router to
insert the record if it is the first occurrence; if it has already been inserted, update Ename
with the concatenation of the previous name and the current name.
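For comparison only, the same result can be produced on the database side with Oracle's LISTAGG function (assuming an EMP table with ENAME and EMPNO columns):
-- One row per EmpNo, with the names concatenated in order.
SELECT empno,
       LISTAGG(ename, ' ') WITHIN GROUP (ORDER BY ename) AS ename
FROM   emp
GROUP BY empno;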
How to send unique (distinct) records to one target and duplicates to another target?
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102
Output:
Target_1:
Ename EmpNo
Stev 100
John 101
Mathew 102
Target_2:
Ename EmpNo
Stev 100
Ans:
Approach 1: If the record does not exist, insert it into Target_1. If it already exists, send it to
Target_2 using a Router.
Approach 2: Sort the data in the source qualifier on the EmpNo column, then use an
Expression with variable ports to hold the previous record's information. After that, use a
Router to route the data: if the record is seen for the first time, send it to Target_1; if it has
already been inserted, send it to Target_2.
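For comparison only, the equivalent split on the database side can be done with an analytic function (assuming an EMP table with ENAME and EMPNO columns):
-- rn = 1 identifies the first (unique) occurrence; rn > 1 identifies the duplicates.
SELECT ename, empno,
       ROW_NUMBER() OVER (PARTITION BY empno ORDER BY ename) AS rn
FROM   emp;
-- Rows with rn = 1 would go to Target_1, rows with rn > 1 to Target_2.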
I want to generate a separate file for every employee (the file should be generated per
name). It has to generate 5 flat files, and the name of each flat file should be the
corresponding employee name. That is the requirement.
Below is my mapping.
Source:
A S 22
A R 27
B P 29
B X 30
B U 34
This functionality was added in Informatica 8.5 onwards; it was not available in earlier
versions.
We can achieve it using a Transaction Control transformation and the special "FileName"
port in the flat file target.
In order to generate the target file names from the mapping, we should make use of the
special "FileName" port in the target file. You can't create this special port from the usual
New port button; there is a special button labelled "F" at the right-most corner of the
target flat file when viewed in the Target Designer.
When you have different sets of input data that should go to different target files, use the
same target instance, but with a Transaction Control transformation that defines the
boundary between the source sets.
In the target flat file there is also an option on the Columns tab to add the file name as a
column; when you click it, a non-editable column gets created in the metadata of the target.
How do you populate the 1st record to the 1st target, the 2nd record to the 2nd target, the
3rd record to the 3rd target and the 4th record back to the 1st target through Informatica?
We can do it using a Sequence Generator with End Value = 3 and the Cycle option enabled;
then in the Router we take 3 groups:
In the 1st group specify the condition seq next value = 1 and pass those records to the 1st target.
In the 2nd group specify the condition seq next value = 2 and pass those records to the 2nd target.
In the 3rd group specify the condition seq next value = 3 and pass those records to the 3rd target.
Since we enabled the Cycle option, after reaching the end value the Sequence Generator
starts again from 1, so for the 4th record seq next value is 1 and it goes to the 1st target.
Incremental (delta) loading means that if we processed 100 records today, then for
tomorrow's run we need to extract only the records that were inserted or updated after the
previous run, based on the last-updated timestamp. This process is called an incremental or
delta load.
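A hedged sketch of the kind of delta filter used in the source-qualifier override for this (table, column and variable names are assumptions):
-- Pull only the rows changed since the previous successful run.
SELECT *
FROM   sales_ods
WHERE  last_upd_date > TO_DATE('$$LastRunTime', 'MM/DD/YYYY HH24:MI:SS');
-- $$LastRunTime is a mapping variable/parameter holding the previous run's timestamp.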
Approach_1: Using a mapping variable (screenshots omitted; they showed the following):
● Logic in the mapping variable
● Logic in the SQ override
● In the Expression, assign the max last-update date value to the variable using the
SETMAXVARIABLE function
● Logic in the Update Strategy
Approach_2: Using parameter file
Parameterfile format
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.
ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
Updating parameter File
Main mapping
Sql override in SQ Transformation
Workflow Design
Parameter file
It is a text file; below is the format for a parameter file. We place this file on the Unix box
where the Informatica server is installed.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.
ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.
ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]
$DBConnection_Sourcet=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
1 First we need to create two control tables, cont_tbl_1 and cont_tbl_2, with the
structure (session_st_time, wf_name).
2 Then insert one record into each table with session_st_time = 1/1/1940 and the
workflow name.
3 Create two stored procedures. The first one updates cont_tbl_1 with the session start
time; set its stored procedure type property to Source Pre-load.
4 For the 2nd stored procedure set the stored procedure type property to Target
Post-load. This procedure updates the session_st_time in cont_tbl_2 from cont_tbl_1.
5 Then override the source qualifier query to fetch only LAST_UPD_DATE >= (select
session_st_time from cont_tbl_2 where wf_name = 'actual workflow name'), as
sketched below.
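A minimal sketch of the control-table pieces described above (the DDL and override are assumptions based on the steps, not the exact project code):
-- Control tables: cont_tbl_1 holds the current run's start time, cont_tbl_2 the previous run's.
CREATE TABLE cont_tbl_1 (session_st_time DATE, wf_name VARCHAR2(100));
CREATE TABLE cont_tbl_2 (session_st_time DATE, wf_name VARCHAR2(100));
INSERT INTO cont_tbl_1 VALUES (TO_DATE('01/01/1940', 'MM/DD/YYYY'), 'wf_sales_load');
INSERT INTO cont_tbl_2 VALUES (TO_DATE('01/01/1940', 'MM/DD/YYYY'), 'wf_sales_load');

-- Source qualifier override: pull only rows changed since the previous successful run.
SELECT s.*
FROM   sales_ods s
WHERE  s.last_upd_date >= (SELECT c.session_st_time
                           FROM   cont_tbl_2 c
                           WHERE  c.wf_name = 'wf_sales_load');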
How to load cumulative salary into the target?
Solution:
Using variable ports in an Expression transformation we can load the cumulative salary into
the target.
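For comparison only, the same running total can be computed on the database side with an analytic function (assuming the standard EMP table):
-- Cumulative salary ordered by employee number.
SELECT empno, sal,
       SUM(sal) OVER (ORDER BY empno) AS cumulative_sal
FROM   emp;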
Below is the logic for converting columns into rows without using the Normalizer
transformation (screenshots omitted):
1) The source contains two columns, address and id.
2) Use an Aggregator transformation and check group by on the id port only.
Difference between dynamic lookup and static lookup cache?
1 In a dynamic lookup the cache gets refreshed as soon as a record is inserted, updated or
deleted in the lookup table, whereas in a static lookup the cache is not refreshed even
though records are inserted or updated in the lookup table; it is refreshed only in the
next session run.
2 The best example of where we need a dynamic cache: suppose the first record and the
last record in the source are for the same key but there is a change in the address. What
the Informatica mapping has to do is insert the first record and update the target table
with the last record.
3 If we use a static lookup, the first record goes to the lookup, checks the lookup cache
based on the condition, does not find a match, and so returns a null value; the router
then sends that record to the insert flow.
4 But this record is still not available in the cache, so when the last record comes to the
lookup it checks the cache, does not find a match, returns null again and goes to the
insert flow through the router, although it was supposed to go to the update flow,
because the cache was not refreshed when the first record was inserted into the target
table. If we use a dynamic lookup we can achieve the requirement: when the first
record is inserted, the cache is immediately refreshed with the target data, so when we
process the last record it finds a match in the cache, returns the value, and the router
routes that record to the update flow.
What is the difference between snowflake and star schema?
● The star schema is the simplest data warehouse schema, whereas the snowflake schema is
a more complex data warehouse model than a star schema.
● In a star schema each dimension is represented by a single table and there are no
hierarchies between dimension tables, whereas in a snowflake schema at least one
hierarchy exists between dimension tables.
● Both contain a fact table surrounded by dimension tables. If the dimensions are
de-normalized, we call it a star schema design; if a dimension is normalized, we call it a
snowflaked design.
● In a star schema only one join establishes the relationship between the fact table and any
one of the dimension tables, whereas in a snowflake schema, because there are
relationships between the dimension tables, many joins are needed to fetch the data.
● It is called a star schema because the diagram resembles a star; it is called a snowflake
schema because the diagram resembles a snowflake.
What is the difference between a Data Mart and a Data Warehouse?
● A Data Mart is a subset of data from a Data Warehouse, built for a specific user group,
whereas a Data Warehouse is an integrated consolidation of data from a variety of sources
that is specially designed to support strategic and tactical decision making.
● By providing decision makers with only a subset of the data from the Data Warehouse,
privacy, performance and clarity objectives can be attained. The main objective of a Data
Warehouse is to provide an integrated environment and a coherent picture of the business
at a point in time.
Connected Lookup vs Unconnected Lookup
● A connected lookup is connected to the pipeline and receives input values directly from
the pipeline, whereas an unconnected lookup is not connected to the pipeline; it receives
input values from the result of a :LKP expression in another transformation, via arguments.
● A connected lookup instance is used at one place in the mapping, whereas an unconnected
lookup can be called more than once within the mapping through :LKP expressions.
● A connected lookup can return multiple columns from the same row, whereas an
unconnected lookup has one designated return port (R) and returns one column from each
row.
● A connected lookup can be configured to use a dynamic cache; an unconnected lookup
cannot.
● A connected lookup passes multiple output values to another transformation by linking
lookup/output ports to it, whereas an unconnected lookup passes one output value: the
lookup/output/return port passes the value to the transformation calling the :LKP
expression.
● A connected lookup supports user-defined default values; an unconnected lookup does
not.
● In a connected lookup the cache includes the lookup source columns in the lookup
condition and the lookup source columns that are output ports, whereas in an
unconnected lookup the cache includes all lookup/output ports in the lookup condition
and the lookup/return port.
Joiner vs Lookup
● On multiple matches a joiner returns all matching records, whereas a lookup returns either
the first record, the last record, any value or an error value.
● We cannot override the query in a joiner, whereas we can override the lookup query to
fetch data from multiple tables.
● We can perform an outer join in a joiner transformation; we cannot perform an outer join
in a lookup transformation.
● We cannot use relational operators (i.e. <, >, <= and so on) in a joiner transformation,
whereas in a lookup we can use these relational operators.
Source Qualifier vs Lookup
● A source qualifier returns all the matching records, whereas in a lookup we can restrict
whether to return the first value, the last value or any value.
● When both the source and the lookup table are in the same database we can use a source
qualifier; when the source and the lookup table are in different databases we need to use
a lookup.
Dynamic Lookup vs Static Lookup
● In a dynamic lookup the cache gets refreshed as soon as a record is inserted, updated or
deleted in the lookup table, whereas in a static lookup the cache is not refreshed even
though records are inserted or updated in the lookup table; it is refreshed only in the next
session run.
● The best example of where we need a dynamic cache: suppose the first record and the
last record in the source are for the same key but there is a change in the address. The
mapping has to insert the first record and update the target table with the last record.
With a static lookup, the first record goes to the lookup, does not find a match in the
cache, returns null and is routed to the insert flow; this record is still not in the cache, so
the last record also fails to find a match, returns null and goes to the insert flow, although
it was supposed to go to the update flow, because the cache was not refreshed when the
first record was inserted into the target table.
How to process multiple flat files into a single target table through Informatica if all the files
have the same structure?
We can process all the flat files through one mapping and one session using a list file.
First we need to create the list file using a Unix script listing all the flat files; the extension of
the list file is .LST.
How to populate the file name to the target while loading multiple files using the list file concept?
In Informatica 8.6, select the "Add Currently Processed Flat File Name" option in the
properties tab of the source definition after importing the source file definition in the Source
Analyzer. It adds a new column, currently processed file name; we can map this column to
the target to populate the file name.
● We have a dimension in the current project called the resource dimension. Here we
maintain history to keep track of SCD changes.
● To maintain the history in this slowly changing dimension (the resource dimension) we
followed the SCD Type-II effective-date approach.
● My resource dimension structure has eff-start-date, eff-end-date, a surrogate key (s.k) and
the source columns.
● Whenever I insert into the dimension, I populate eff-start-date with sysdate, eff-end-date
with a future date and s.k with a sequence number.
● If the record is already present in the dimension but there is a change in the source data,
then what I need to do is:
● Update the previous record's eff-end-date with sysdate and insert the source data as a
new record.
● Once we fetch the record from the source qualifier, we send it to a lookup to find out
whether the record is present in the target or not, based on the source primary key
column.
● Once we find a match in the lookup, we take the SCD columns and the s.k column from the
lookup into an Expression transformation.
● In the lookup transformation we override the lookup query to fetch only the active
records from the dimension while building the cache.
● In the Expression transformation I compare the source with the lookup return data.
● If the source and target data are the same, I set a flag to 'S'.
● If the source and target data are different, I set the flag to 'U'.
● If the source data does not exist in the target, the lookup returns a null value and I set the
flag to 'I'.
● Based on the flag values, in the Router I route the data into the insert and update flows.
● If flag = 'I' or 'U' I pass the record to the insert flow.
● If flag = 'U' I also pass the record to the eff-end-date update flow.
● Whenever we insert, we pass the sequence value to s.k.
● Whenever we update, we update the eff-end-date column based on the s.k value returned
by the lookup. (A database-side sketch of the same effective-date logic follows this list.)
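For illustration only, the same SCD Type-II effective-date logic expressed as Oracle SQL rather than as the Informatica mapping. Table names (RESOURCE_DIM, RESOURCE_STG), column names and the 9999-12-31 "future date" are assumptions, and NULL handling is simplified:
-- Step 1: close the currently active row when the source attribute has changed.
UPDATE resource_dim d
SET    d.eff_end_date = SYSDATE
WHERE  d.eff_end_date = DATE '9999-12-31'
AND    EXISTS (SELECT 1
               FROM   resource_stg s
               WHERE  s.resource_id = d.resource_id
               AND    s.attribute_1 <> d.attribute_1);

-- Step 2: insert a new active row for new and changed records.
INSERT INTO resource_dim (resource_sk, resource_id, attribute_1, eff_start_date, eff_end_date)
SELECT resource_dim_seq.NEXTVAL, s.resource_id, s.attribute_1, SYSDATE, DATE '9999-12-31'
FROM   resource_stg s
WHERE  NOT EXISTS (SELECT 1
                   FROM   resource_dim d
                   WHERE  d.resource_id = s.resource_id
                   AND    d.eff_end_date = DATE '9999-12-31'
                   AND    d.attribute_1 = s.attribute_1);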
Complex Mapping
● We have an order-file requirement: every day the source system places a file whose name
includes a timestamp on the Informatica server.
● We have to process the current day's file through Informatica.
● The source file directory contains files with timestamps that are up to 30 days old.
● For this requirement, if I hardcode the timestamp in the source file name it will process
the same file every day.
● So what I did here is create $InputFilename as the session variable for the source file name.
● Then I use a parameter file to supply the value to the session variable ($InputFilename).
● To update this parameter file I created one more mapping.
● This mapping updates the parameter file with the timestamp appended to the file name.
● I make sure to run this parameter-file update mapping before my actual mapping.
How to handle errors in Informatica?
● We have a source with numerator and denominator values, and we need to calculate
num/deno when populating the target.
● If deno = 0, I should not load that record into the target table.
● We send those records to a flat file; after the first session run completes, a shell script
checks the file size.
● If the file size is greater than zero, it sends an email notification to the source system POC
(point of contact) along with the deno-zero record file and an appropriate email subject
and body.
● If the file size <= 0, there are no records in the flat file, and in that case the shell script does
not send any email notification.
● Or:
● We are expecting a not-null value for one of the source columns.
● If it is null, it is an error record.
● We can use the same approach as above for this error handling.
Worklet
A worklet is a reusable set of tasks (sessions). We cannot run a worklet without a workflow.
● If both sets of sessions exist in the same folder, we can create 2 worklets rather than
creating 2 workflows.
● Then we can call these 2 worklets in one workflow.
● There we can set the dependency.
● If the two workflows exist in different folders or repositories, then we cannot create a
worklet.
● One approach is to set the dependency between these two workflows using a shell script.
● The other approach is Event-Wait and Event-Raise.
● As soon as the first workflow completes, we create a zero-byte file (indicator file).
● If the indicator file is available in the particular location, we run the second workflow.
● If the indicator file is not available, we wait for 5 minutes and check for the indicator again.
We continue this loop 5 more times, i.e. about 30 minutes.
● After 30 minutes, if the file still does not exist, we send out an email notification.
With an Event-Wait task, it waits for an infinite time until the indicator file is available.
Why do we need a source qualifier?
Simply put, it performs the select statement.
The select statement fetches the data in the form of rows.
The source qualifier selects the data from the source table; it identifies the records to read
from the source.
It converts the source data types into Informatica native (understandable) data types.
A parameter file supplies the values to session-level variables and mapping-level variables.
What is the difference between mapping level and session level variables?
Mapping-level variables always start with $$.
Session-level variables start with a single $.
Flat File
A flat file is a collection of data in a file in a specific format.
● Delimited
● Fixed width
For fixed-width files we need to know the format first, i.e. how many characters to read for
each column.
For delimited files it is also necessary to know the structure, i.e. the delimiter and whether
there are headers.
If the file contains a header, then in the source definition we need to skip the first row.
List file:
If you want to process multiple files with the same structure, we do not need multiple
mappings and multiple sessions.
We can use one mapping and one session with the list file option.
First we need to create the list file for all the files; then we can use this list file in the main
mapping.
Aggregator Transformation:
Transformation type:
Active
Connected
The Aggregator transformation performs aggregate calculations, such as averages and sums.
The Aggregator transformation is unlike the Expression transformation, in that you use the
Aggregator transformation to perform calculations on groups. The Expression
transformation permits you to perform calculations on a row-by-row basis only.
The Aggregator is an active transformation, changing the number of rows in the pipeline.
The Aggregator transformation has the following components and options
Aggregate cache: The Integration Service stores data in the aggregate cache until it
completes aggregate calculations. It stores group values in an index cache and row data in
the data cache.
Aggregate expression: Enter an expression in an output port. The expression can include
non-aggregate expressions and conditional clauses.
Group by port: Indicate how to create groups. The port can be any input, input/output,
output, or variable port. When grouping data, the Aggregator transformation outputs the
last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted input, you
must pass data to the Aggregator transformation sorted by group by port, in ascending or
descending order.
Aggregate Expressions:
MAX (COUNT (ITEM))
The result of an aggregate expression varies depending on the group by ports used in the
transformation
Aggregate Functions
Use the following aggregate functions within an Aggregator transformation. You can nest
one aggregate function within another aggregate function.
AVG
COUNT
FIRST
LAST
MAX
MEDIAN
MIN
PERCENTILE
STDDEV
SUM
VARIANCE
When you use any of these functions, you must use them in an expression within an
Aggregator transformation.
Tips
Sorted input reduces the amount of data cached during the session and improves session
performance. Use this option with the Sorter transformation to pass sorted data to the
Aggregator transformation.
Limit the number of connected input/output or output ports to reduce the amount of data
the Aggregator transformation stores in the data cache.
If you use a Filter transformation in the mapping, place the transformation before the
Aggregator transformation to reduce unnecessary aggregation.
Normalizer Transformation:
Transformation type:
Active
Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data.
The Normalizer transformation parses multiple-occurring columns from COBOL sources,
relational tables, or other sources. It can process multiple record types from a COBOL source
that contains a REDEFINES clause.
The Normalizer transformation generates a key for each source row. The Integration Service
increments the generated key sequence number each time it processes a source row. When
the source row contains a multiple-occurring column or a multiple-occurring group of
columns, the Normalizer transformation returns a row for each occurrence. Each row
contains the same generated key value.
Transaction Control Transformation
Transformation type:
Active
Connected
PowerCenter lets you control commit and roll back transactions based on a set of rows that
pass through a Transaction Control transformation. A transaction is the set of rows bound
by commit or roll back rows. You can define a transaction based on a varying number of
input rows. You might want to define transactions based on a group of rows ordered on a
common key, such as employee ID or order entry date.
When you run the session, the Integration Service evaluates the expression for each row
that enters the transformation. When it evaluates a commit row, it commits all rows in the
transaction to the target or targets. When the Integration Service evaluates a roll back row,
it rolls back all rows in the transaction from the target or targets.
If the mapping has a flat file target you can generate an output file each time the Integration
Service starts a new transaction. You can dynamically name each target flat file.
Union Transformation
A Union transformation is used to merge data from multiple sources, similar to the UNION
ALL SQL statement that combines the results from two or more SQL statements.
2. As the union transformation gives UNION ALL output, how will you get the UNION output?
Pass the output of union transformation to a sorter transformation. In the properties of sorter
transformation check the option select distinct. Alternatively you can pass the output of union
transformation to aggregator transformation and in the aggregator transformation specify all ports
as group by ports.
The following rules and guidelines need to be taken care while working with union
transformation:
● You can create multiple input groups, but only one output group.
● All input groups and the output group must have matching ports. The precision, datatype, and
scale must be identical across all groups.
● The Union transformation does not remove duplicate rows. To remove duplicate rows, you must
add another transformation such as a Router or Filter transformation.
● You cannot use a Sequence Generator or Update Strategy transformation upstream from a
Union transformation.
● The Union transformation does not generate transactions.
Union is an active transformation because it combines two or more data streams into one.
Though the total number of rows passing into the Union is the same as the total number of rows
passing out of it, and the sequence of rows from any given input stream is preserved in the
output, the positions of the rows are not preserved, i.e. row number 1 from input stream 1 might
not be row number 1 in the output stream. Union does not even guarantee that the output is
repeatable
Aggregator Transformation
● Use sorted input: Sort the data before passing into aggregator. The integration service uses
memory to process the aggregator transformation and it does not use cache memory.
● Filter the unwanted data before aggregating.
● Limit the number of input/output or output ports to reduce the amount of data the aggregator
transformation stores in the data cache.
● AVG
● COUNT
● FIRST
● LAST
● MAX
● MEDIAN
● MIN
● PERCENTILE
● STDDEV
● SUM
● VARIANCE
5. Why cannot you use both single level and nested aggregate functions in a single aggregate
transformation?
The nested aggregate function returns only one output row, whereas the single level aggregate
function returns more than one row. Since the number of rows returned are not same, you cannot
use both single level and nested aggregate functions in the same transformation. If you include
both the single level and nested functions in the same aggregator, the designer marks the
mapping or mapplet as invalid. So, you need to create separate aggregator transformations.
The integration service performs aggregate calculations and then stores the data in historical
cache. Next time when you run the session, the integration service reads only new data and uses
the historical cache to perform new aggregation calculations incrementally.
In incremental aggregation, the aggregate calculations are stored in historical cache on the
server. In this historical cache the data need not be in sorted order. If you give sorted input, the
records come as presorted for that particular run but in the historical cache the data may not be
in the sorted order. That is why this option is not allowed.
You can configure the integration service to treat null values in aggregator functions as NULL or
zero. By default the integration service treats null values as NULL in aggregate functions.
Normalizer Transformation
The Normalizer transformation receives a row that contains multiple-occurring columns and
returns a row for each instance of the multiple-occurring data. This means it converts column
data into row data. Normalizer is an active transformation.
Since COBOL sources contain denormalized data, the Normalizer transformation is used to
normalize COBOL sources.
● The integration service increments the generated key sequence number each time it
processes a source row. When the source row contains a multiple-occurring column or a
multiple-occurring group of columns, the normalizer transformation returns a row for each
occurrence. Each row contains the same generated key value.
● The normalizer transformation has a generated column ID (GCID) port for each multiple-
occurring column. The GCID is an index for the instance of the multiple-occurring data. For
example, if a column occurs 3 times in a source record, the normalizer returns a value of 1,2 or
3 in the generated column ID.
4. What is VSAM?
VSAM (Virtual Storage Access Method) is a file access method for an IBM mainframe operating
system. VSAM organize records in indexed or sequential flat files.
The VSAM normalizer transformation is the source qualifier transformation for a COBOL source
definition. A COBOL source is flat file that can contain multiple-occurring data and multiple types
of records in the same file.
Pipeline normalizer transformation processes multiple-occurring data from relational tables or flat
files.
● An occurs clause is specified when the source row has a multiple-occurring column.
● A redefines clause is specified when the source contains multiple record types that reuse
(redefine) the same storage.
Rank Transformation
A rank transformation is used to select top or bottom rank of data. This means, it
selects the largest or smallest numeric value in a port or group. Rank
transformation also selects the strings at the top or bottom of a session sort
order. Rank transformation is an active transformation.
The integration service compares input rows in the data cache, if the input row
out-ranks a cached row, the integration service replaces the cached row with the
input row. If you configure the rank transformation to rank across multiple groups,
the integration service ranks incrementally for each group it finds. The integration
service stores group information in index cache and row data in data cache.
The designer creates RANKINDEX port for each rank transformation. The
integration service uses the rank index port to store the ranking position for each
row in a group.
4. How do you specify the number of rows you want to rank in a rank
transformation?
In the properties tab of the rank transformation, set the Number of Ranks option.
5. Can you designate more than one port as the rank port?
No. We can rank the data based on only one port. In the ports tab, you have to check the R
option to designate the port as the rank port, and this option can be checked on only one
port.
Joiner Transformation
A joiner transformation joins two heterogeneous sources. You can also join the data from the
same source. The joiner transformation joins sources with at least one matching column. The
joiner uses a condition that matches one or more joins of columns between the two sources.
● You cannot use a joiner transformation when input pipeline contains an update strategy
transformation.
● You cannot use a joiner if you connect a sequence generator transformation directly before the
joiner.
● Normal join: In a normal join, the integration service discards all the rows from the master and
detail source that do not match the join condition.
● Master outer join: A master outer join keeps all the rows of data from the detail source and the
matching rows from the master source. It discards the unmatched rows from the master source.
● Detail outer join: A detail outer join keeps all the rows of data from the master source and the
matching rows from the detail source. It discards the unmatched rows from the detail source.
● Full outer join: A full outer join keeps all rows of data from both the master and detail rows.
When the integration service processes a joiner transformation, it reads the rows from master
source and builds the index and data caches. Then the integration service reads the detail
source and performs the join. In case of sorted joiner, the integration service reads both sources
(master and detail) concurrently and builds the cache based on the master rows.
When the integration service processes an unsorted joiner transformation, it reads all master
rows before it reads the detail rows. To ensure it reads all master rows before the detail rows, the
integration service blocks all the details source while it caches rows from the master source. As it
blocks the detail source, the unsorted joiner is called a blocking transformation.
Router Transformation
A router is used to filter the rows in a mapping. Unlike filter transformation, you can specify one
or more conditions in a router transformation. Router is an active transformation.
Unlike using multiple Filter transformations, where the Integration Service processes the
incoming data for each transformation, a Router reads the incoming data only once. A Router
transformation has the following types of groups:
● Input
● Output
● User-defined group
● Default group
You can create the group filter conditions in the groups tab using the expression editor.
6. Can you connect ports of two output groups from router transformation to a single target?
No. You cannot connect more than one output group to one target or a single input group
transformation.
● Check the status of a target database before loading data into it.
● Determine if enough space exists in a database.
● Perform a specialized calculation.
● Drop and recreate indexes.
The stored procedure transformation is connected to the other transformations in the mapping
pipeline.
● Run a stored procedure every time a row passes through the mapping.
● Pass parameters to the stored procedure and receive multiple output parameters.
The stored procedure transformation is not connected directly to the flow of the mapping. It either
runs before or after the session or is called by an expression in another transformation in the
mapping.
7. What are the options available to specify when the stored procedure transformation needs to
be run?
The following options describe when the stored procedure transformation runs:
● Normal: The stored procedure runs where the transformation exists in the mapping on a row-by-
row basis. This is useful for calling the stored procedure for each row of data that passes
through the mapping, such as running a calculation against an input port. Connected stored
procedures run only in normal mode.
● Pre-load of the Source: Before the session retrieves data from the source, the stored procedure
runs. This is useful for verifying the existence of tables or performing joins of data in a
temporary table.
● Post-load of the Source: After the session retrieves data from the source, the stored procedure
runs. This is useful for removing temporary tables.
● Pre-load of the Target: Before the session sends data to the target, the stored procedure runs.
This is useful for verifying target tables or disk space on the target system.
● Post-load of the Target: After the session sends data to the target, the stored procedure runs.
This is useful for re-creating indexes on the database.
A connected stored procedure transformation runs only in Normal mode. A unconnected stored
procedure transformation runs in all the above modes.
The order in which the Integration Service calls the stored procedure used in the transformation,
relative to any other stored procedures in the same mapping. Only used when the Stored
Procedure Type is set to anything except Normal and more than one stored procedure exists.
● IN: Input passed to the stored procedure
● OUT: Output returned from the stored procedure
● INOUT: Defines the parameter as both input and output. Only Oracle supports this parameter
type.
A source qualifier represents the rows that the integration service reads when it runs a session.
Source qualifier is an active transformation.
The source qualifier transformation converts the source data types into informatica native data
types.
● Join two or more tables originating from the same source (homogeneous sources) database.
● Filter the rows.
● Sort the data
● Selecting distinct values from the source
● Create custom query
● Specify a pre-sql and post-sql
The source qualifier transformation joins the tables based on the primary key-foreign key
relationship.
When there is no primary key-foreign key relationship between the tables, you can specify a
custom join using the 'user-defined join' option in the properties tab of source qualifier.
● SQL Query
● User-Defined Join
● Source Filter
● Number of Sorted Ports
● Select Distinct
● Pre-SQL
● Post-SQL
A Sequence generator transformation generates numeric values. Sequence generator
transformation is a passive transformation.
A sequence generator is used to create unique primary key values, replace missing primary key
values or cycle through a sequential range of numbers.
A sequence generator contains two output ports. They are CURRVAL and NEXTVAL.
4. What is the maximum number of sequence values that a sequence generator can generate?
5. When you connect both the NEXTVAL and CURRVAL ports to a target, what will be the output
values of these ports?
NEXTVAL generates the sequence numbers, and CURRVAL is NEXTVAL plus the Increment By
value.
6. What will be the output value, if you connect only CURRVAL to the target without connecting
NEXTVAL?
The Integration Service passes a constant value for each row.
8. What is the number of cached values set to default for a sequence generator transformation?
For non-reusable sequence generators, the number of cached values is set to zero.
For reusable sequence generators, the number of cached values is set to 1000.
● Start Value
● Increment By
● End Value
● Current Value
● Cycle
● Number of Cached Values
Lookup Transformation
1. What is a lookup transformation?
A lookup transformation is used to look up data in a flat file, relational table, view, and synonym.
● Get a related value: Retrieve a value from the lookup table based on a value in the source.
● Perform a calculation: Retrieve a value from a lookup table and use it in a calculation.
● Update slowly changing dimension tables: Determine whether rows exist in a target.
6. What are the differences between connected and unconnected lookup transformation?
● Connected lookup transformation receives input values directly from the pipeline. Unconnected
lookup transformation receives input values from the result of a :LKP expression in another
transformation.
● Connected lookup transformation can be configured as dynamic or static cache. Unconnected
lookup transformation can be configured only as static cache.
● Connected lookup transformation can return multiple columns from the same row or insert into
the dynamic lookup cache. Unconnected lookup transformation can return one column from
each row.
● If there is no match for the lookup condition, connected lookup transformation returns default
value for all output ports. If you configure dynamic caching, the Integration Service inserts rows
into the cache or leaves it unchanged. If there is no match for the lookup condition, the
unconnected lookup transformation returns null.
● In a connected lookup transformation, the cache includes the lookup source columns in the
lookup condition and the lookup source columns that are output ports. In an unconnected
lookup transformation, the cache includes all lookup/output ports in the lookup condition and the
lookup/return port.
● Connected lookup transformation passes multiple output values to another transformation.
Unconnected lookup transformation passes one output value to another transformation.
● Connected lookup transformation supports user-defined default values. Unconnected
lookup transformation does not support user-defined default values.
7. How do you handle multiple matches in lookup transformation? or what is "Lookup Policy on
Multiple Match"?
"Lookup Policy on Multiple Match" option is used to determine which rows that the lookup
transformation returns when it finds multiple rows that match the lookup condition. You can select
lookup to return first or last row or any matching row or to report an error.
● Insert Else Update option applies to rows entering the lookup transformation with the row type of
insert. When this option is enabled the integration service inserts new rows in the cache and
updates existing rows when disabled, the Integration Service does not update existing rows.
● Update Else Insert option applies to rows entering the lookup transformation with the row type of
update. When this option is enabled, the Integration Service updates existing rows, and inserts
a new row if it is new. When disabled, the Integration Service does not insert new rows.
● Persistent cache
● Recache from lookup source
● Static cache
● Dynamic cache
● Shared Cache
● Pre-build lookup cache
● Cached lookup transformation: The Integration Service builds a cache in memory when it
processes the first row of data in a cached Lookup transformation. The Integration Service
stores condition values in the index cache and output values in the data cache. The Integration
Service queries the cache for each row that enters the transformation.
● Uncached lookup transformation: For each row that enters the lookup transformation, the
Integration Service queries the lookup source and returns a value. The integration service does
not build a cache.
12. How does the Integration Service build the caches for a connected lookup transformation?
The Integration Service builds the lookup caches for connected lookup transformation in the
following ways:
● Sequential cache: The Integration Service builds lookup caches sequentially. The Integration
Service builds the cache in memory when it processes the first row of the data in a cached
lookup transformation.
● Concurrent caches: The Integration Service builds lookup caches concurrently. It does not need
to wait for data to reach the Lookup transformation.
13. How does the Integration Service build the caches for an unconnected lookup transformation?
The Integration Service builds caches for unconnected Lookup transformations sequentially.
15. When you use a dynamic cache, do you need to associate each lookup port with the input
port?
Yes. You need to associate each lookup/output port with the input/output port or a sequence ID.
The Integration Service uses the data in the associated port to insert or update rows in the
lookup cache.
The NewLookupRow port indicates how the Integration Service changed the dynamic cache for each row:
● 0 - Integration Service does not update or insert the row in the cache.
● 1 - Integration Service inserts the row into the cache.
● 2 - Integration Service updates the row in the cache.
To improve the performance of a lookup transformation:
● Join tables in the database: If the source and the lookup table are in the same database, join the
tables in the database rather than using a lookup transformation.
● Use persistent cache for static lookups.
● Avoid ORDER BY on all columns in the lookup source. Specify the ORDER BY clause explicitly
on only the required columns.
● For flat file lookups, provide sorted files as the lookup source.
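As a sketch of the ORDER BY tip above (assuming a hypothetical DEPT lookup table whose condition column is DEPT_ID), a lookup SQL override can select only the required ports and order by the condition column; the trailing "--" comments out the ORDER BY that the Integration Service appends automatically:
SELECT DEPT.DEPT_ID   AS DEPT_ID,
       DEPT.DEPT_NAME AS DEPT_NAME
FROM   DEPT
ORDER BY DEPT.DEPT_ID --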
A transaction is a set of rows bound by a commit or rollback of rows. The transaction control
transformation is used to commit or rollback a group of rows.
2. What is the commit type if you have a transaction control transformation in the mapping?
The commit type is user-defined: if the mapping contains an effective transaction control transformation, the Integration Service ignores the session commit type and commits or rolls back based on the transaction control expression.
3. What are the different transaction levels available in transaction control transformation?
The following built-in variables can be used in the transaction control expression:
● TC_CONTINUE_TRANSACTION
● TC_COMMIT_BEFORE
● TC_COMMIT_AFTER
● TC_ROLLBACK_BEFORE
● TC_ROLLBACK_AFTER
● 8. $finger: It displays complete information about all the users who are logged in.
● 9. $who: It displays information about all users who are currently logged into the system
(login name, terminal number, date and time).
● 10. $who am i: It displays the username, terminal number, and the date and time at which you
logged into the system.
● 11. $cal: It displays the calendar (the current month by default).
● $cal year
$cal month year
● 12. $ls: It displays the list of files and directories, excluding hidden files.
● Administrator Commands
● 1. # system administrator prompt
$ normal user working prompt
● 2. #useradd: To add a user account.
● #useradd user1
● 3. #passwd: To set the password for a particular user account.
● #passwd user1
Enter password:
Retype password:
● System Run Levels
● #init: To change the system run level.
● #init 0: To shut down the system.
#init 1 or s: To bring the system to single-user mode.
#init 2: To bring the system to multi-user mode with no resources shared.
#init 3: To bring the system to multi-user mode with resources shared.
#init 6: To reboot the system to the default run level.
● ls command options
● $ls: This command is used to display the list of files and directories.
● $ls -x: It displays the listing width wise (in multiple columns).
● $ls | pg: It displays the list of files and directories page wise & width wise.
● $ls -a: It displays all files and directories, including the hidden entries . and ..
● $ls -F: It displays files and directories with a type indicator: / for directories, * for executable
files, @ for symbolic link files.
● Example:
a1
a2
sample/
letter*
notes@
● / directory
* executable file
@ symbolic link file
● $ls -r: It displays files and directories in reverse order (descending order).
● $ls -R: It displays files and directories recursively.
● $ls -t: It displays files and directories sorted by date and time of last modification.
● $ls -i: It displays files and directories along with their inode numbers.
● $ls -l: It displays files and directories in long list format.
● Example of long list format output:
file type | Permission | link | uname | group | size in bytes | Date       | filename
-         | rw-r       | 1    | tec   | group | 560           | Nov 3 1:30 | sample
d         | rwxr-wr    | 2    | tec   | group | 430           | Mar 7 7:30 | student
wild card character | Description
.                   | It matches any single character except the enter (newline) key character.
● Examples:
● * wild card
● $ ls t* : It displays all files starting with 't' character.
● $ ls *s: It displays all files ending with 's'
● $ ls b*k: It displays all files starting with b and ending with k
●
● ? wild card
● $ ls t??: It displays all files starting with 't' character and also file length must be 3 characters.
● $ ls ?ram: It displays all files starting with any character and also the length of the file must
be 4 characters.
●
● - wild card (a range, used inside [ ])
● $ ls [a-z]ra: It displays all files starting with any character between a and z and ending with
'ra'. The length of the file name must be 3 characters.
●
● [ ] wild card
● $ ls [aeiou]ra: It displays all files starting with 'a' or 'e' or 'i' or 'o' or 'u' character and ending
with 'ra'.
● $ ls [a-b]*: It displays all files starting with any character from a to b.
●
● . wild card (a regular expression metacharacter, used with grep rather than ls)
● $ grep 't..u' filename: It displays lines containing 't' followed by any two characters and then 'u'.
The '.' matches any single character but never the enter (newline) key character.
●
● ^ and $ anchors (regular expressions, used with grep)
● $ grep 'sta$' filename: It displays lines which end with 'sta'; similarly, '^pattern' matches lines
which begin with 'pattern'.
●
● Filter Commands: grep, fgrep and egrep:
● AWK: It scans a file line by line.
● It splits each input line into fields.
● Explanation: The piping concept is used to combine more than one command. Here we first
identify the non-blank lines in file1 and redirect the result to tmp (a temporary file), for example:
$grep -v "^$" file1 > tmp. This temporary file is the input to the next command, $mv tmp file1,
which renames tmp to file1. Now check the file1 result <You will not find any blank lines>
●
● fgrep:
● This command is used to search for multiple strings or a single string, but not regular expressions.
● Syntax - $fgrep "pattern1
>pattern2
>....
>patternn " filename
Ex: $fgrep "unix
>c++
>Data Warehouse" stud
●
● Ex. $cut -c 1-8 employee (first 8 characters)
100 Rake
101 Rama
102 Prab
103 Jyos
● Delimiters:
Default: Tab
Other delimiters: : , ; * _
● $cut options
● $cut -f 1-2 employee (first two fields)
$cut -c 1-8 employee (first eight characters)
$cut -d ',' -f 1-8 employee (first eight fields with ',' as the delimiter)
● $ sort: This command is used to sort the file records in ascending or descending order.
● $sort Options
● $sort -r filename (reverse order)
$sort -u filename (unique records)
$sort -n filename (numeric sort)
●
● $ uniq: This command is used to filter the duplicate lines from a file.
Note: The file must be in sorted order first.
● Syntax - $uniq filename
Ex: $uniq employee
● $uniq Options
● Ex: $uniq -u employee (displays non duplicated lines)
Ex: $uniq -d employee (displays duplicated lines)
●
$ sed: This command is used for editing files from a script or from the command line.
● $ head: It displays the first 'N' lines of a file (10 by default).
A dimension table consists of the attributes about the facts. Dimensions store the
textual descriptions of the business. Without the dimensions, we cannot measure the
facts. The different types of dimension tables are explained in detail below.
Conformed Dimension:
A conformed dimension is a dimension that has exactly the same meaning and content when
being referred from different fact tables. A conformed dimension can refer to multiple tables in
multiple data marts within the same organization. For two dimension tables to be considered
conformed, they must either be identical or one must be a subset of the other. There cannot be
any other type of difference between the two tables. For example, two dimension tables that are
exactly the same except for the primary key are not considered conformed.
Eg: The time dimension is a common conformed dimension in an
organization
● Junk Dimension:
A junk dimension is a collection of random transactional codes, flags and/or text
attributes that are unrelated to any particular dimension. The junk dimension is
simply a structure that provides a convenient place to store the junk attributes.
Eg: Assume that we have a gender dimension and marital status dimension. In the
fact table we need to maintain two keys referring to these dimensions. Instead of that
create a junk dimension which has all the combinations of gender and marital status
(cross join gender and marital status table and create a junk table). Now we can
maintain only one key in the fact table.
● Degenerate Dimension:
A degenerate dimension is a dimension which is derived from the fact table and
doesn’t have its own dimension table.
Eg: A transactional code in a fact table.
● Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are
called role-playing dimensions. For example, a date dimension can be used for “date
of sale”, as well as “date of delivery”, or “date of hire”.
A fact table is the one which consists of the measurements, metrics or facts of a
business process. These measurable facts are used to know the business value and
to forecast the future business. The different types of facts are explained in detail
below.
● Additive:
Additive facts are facts that can be summed up through all of the dimensions
in the fact table. A sales fact is a good example for additive fact.
● Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the
dimensions in the fact table, but not the others.
Eg: Daily balances fact can be summed up through the customers dimension
but not through the time dimension.
● Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the
dimensions present in the fact table.
Eg: Facts which contain percentages or ratios.
In the real world, it is possible to have a fact table that contains no measures or
facts. These tables are called “Factless Fact tables”.
E.g: A fact table which has only product key and date key is a factless fact. There
are no measures in this table. But still you can get the number of products sold over a
period of time.
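For instance, a sketch with hypothetical table and column names: counting the rows of such a factless fact still answers how many products were sold in a period.
SELECT d.calendar_month,
       COUNT(*) AS products_sold        -- an event count, not a stored measure
FROM   factless_sales f
JOIN   dim_date d ON d.date_key = f.date_key
GROUP BY d.calendar_month;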
A fact table that contains aggregated facts is often called a summary table.
A Mapplet is a reusable object created in the Mapplet Designer which contains a set
of transformations and lets us reuse the transformation logic in multiple mappings.
Session:
A session is a task that instructs the Integration Service how and when to move data from a source
to a target; a session is created for a single mapping and is executed as part of a workflow.
Data Model:
● Ensures that all data objects required by the database are accurately
represented. Omission of data will lead to creation of faulty reports
and produce incorrect results.
● A data model helps design the database at the conceptual, physical
and logical levels.
● Data Model structure helps to define the relational tables, primary
and foreign keys and stored procedures.
● It provides a clear picture of the base data and can be used by
database developers to create a physical database.
● It is also helpful to identify missing and redundant data.
● Though the initial creation of a data model is labor- and time-consuming, in the
long run it makes your IT infrastructure upgrades and maintenance cheaper
and faster.
For example:
● Customer and Product are two entities. Customer number and name
are attributes of the Customer entity
● Product name and price are attributes of product entity
● Sale is the relationship between the customer and product
The advantage of the Logical data model is to provide a foundation to form
the base for the Physical model. However, the modeling structure remains
generic.
At this Data Modeling level, no primary or secondary key is defined. At this
Data modeling level, you need to verify and adjust the connector details
that were set earlier for relationships.
The physical data model also helps to visualize the database structure. It helps
to model database columns, keys, constraints, indexes, triggers, and other
RDBMS features.
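As a minimal sketch (hypothetical names), the Customer/Product/Sale example above might be physicalized like this, with the keys and constraints that only appear at the physical level:
CREATE TABLE customer (
    customer_number  INTEGER PRIMARY KEY,            -- primary key defined at the physical level
    customer_name    VARCHAR(100)
);
CREATE TABLE product (
    product_id       INTEGER PRIMARY KEY,
    product_name     VARCHAR(100),
    price            DECIMAL(10,2)
);
CREATE TABLE sale (
    sale_id          INTEGER PRIMARY KEY,
    customer_number  INTEGER REFERENCES customer(customer_number),  -- relationships become foreign keys
    product_id       INTEGER REFERENCES product(product_id),
    sale_date        DATE,
    quantity         INTEGER
);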
select * from emp
minus
Unique records into one target and duplicates into another target:
Step 1: Drag the source into the mapping and connect it to an Aggregator transformation.
Step 2: In the Aggregator transformation, group by the key column and add a new
port, call it count_rec, to count the key column.
Step 3: Connect a Router transformation to the Aggregator from the previous step. In the Router make
two groups, one named "original" and another named "duplicate".
In "original" write count_rec=1 and in "duplicate" write count_rec>1.
Bottom-up design: In the bottom-up approach, data marts are first created to provide
reporting and analytical capabilities for specific business processes. These data marts can
then be integrated to create a comprehensive data warehouse. The data warehouse bus
architecture is primarily an implementation of "the bus", a collection of conformed
dimensions and conformed facts, which are dimensions that are shared (in a specific way)
between facts in two or more data marts.
Top-down design: The top-down approach is designed using a normalized enterprise data
model. "Atomic" data, that is, data at the lowest level of detail, are stored in the data
warehouse. Dimensional data marts containing data needed for specific business processes
or specific departments are created from the data warehouse.
Hybrid design: Data warehouses (DW) often resemble the hub and spokes architecture.
Legacy systems feeding the warehouse often include customer relationship management
and enterprise resource planning, generating large amounts of data. To consolidate these
various data models, and facilitate the extract, transform, load (ETL) process, data warehouses
often make use of an operational data store, the information from which is parsed into the
actual DW. To reduce data redundancy, larger systems often store the data in a normalized
way. Data marts for specific reports can then be built on top of the DW.
Step 4: Connect the two router groups to the corresponding target tables.
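If the same split is needed purely in SQL (a sketch assuming an emp table with key column empno), the aggregator/router logic above corresponds to a GROUP BY with a HAVING filter:
-- keys that occur exactly once (the "original" group)
SELECT empno
FROM   emp
GROUP BY empno
HAVING COUNT(*) = 1;
-- keys that occur more than once (the "duplicate" group)
SELECT empno
FROM   emp
GROUP BY empno
HAVING COUNT(*) > 1;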
1. Go to the Mapping.
2. Double click the concerned TARGET (or edit the Target).
3. Click on the Properties tab.
4. The second Transformation Attribute, Update Override, is the property we are looking for.
● Example:
UPDATE EMPL_POST_HIST
SET POST = :TU.POST
, UPDATE_DATE = :TU.UPDATE_DATE
WHERE EMPL = :TU.EMPL
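For illustration only, a hedged sketch on the same hypothetical target: the override can be edited, for example to restrict the update to rows where the incoming change is newer.
UPDATE EMPL_POST_HIST
SET POST = :TU.POST
, UPDATE_DATE = :TU.UPDATE_DATE
WHERE EMPL = :TU.EMPL
AND UPDATE_DATE < :TU.UPDATE_DATE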
Debug Process
To debug a mapping, complete the following steps:
1. Create breakpoints.
Create breakpoints in a mapping where you want the Integration Service to evaluate data and error
conditions.
2. Configure the Debugger.
Use the Debugger Wizard to configure the Debugger for the mapping. Select the session type the
Integration Service uses when it runs the Debugger. When you create a debug session, you configure a
subset of session properties within the Debugger Wizard, such as source and target location. You can
also choose to load or discard target data.
3. Run the Debugger.
Run the Debugger from within the Mapping Designer. When you run the Debugger, the Designer
connects to the Integration Service. The Integration Service initializes the Debugger and runs the
debugging session and workflow. The Integration Service reads the breakpoints and pauses the
Debugger when the breakpoints evaluate to true.
4. Monitor the Debugger.
While you run the Debugger, you can monitor the target data, transformation and mapplet output data,
the debug log, and the session log. When you run the Debugger, the Designer displays the following
windows:
● Debug log.
View messages from the Debugger.
● Target window.
View target data.
● Instance window.
View transformation data.
5. Modify data and breakpoints.
When the Debugger pauses, you can modify data and see the effect on transformations, mapplets, and
targets as the data moves through the pipeline. You can also modify breakpoint information.
The Designer saves mapping breakpoint and Debugger information in the workspace files. You can copy
breakpoint information and the Debugger configuration to another mapping. If you want to run the
Debugger from another PowerCenter Client machine, you can copy the breakpoint information and the
Debugger configuration to the other PowerCenter Client machine.
The following figure shows the windows in the Mapping Designer that appear when you run the
Debugger:
1. Debugger log.
2. Session log.
3. Instance window.
4. Target window.
In normal mode the database writes a detailed database log for every row; bulk mode bypasses the
database log, and avoiding that detail writing improves performance.
Use bulk mode when there are no constraints, but note that you cannot roll back the load.
Use normal mode if there are constraints and if you want to be able to roll back.
When loading to Microsoft SQL Server and Oracle targets, you must specify a normal load if
you select data driven for the Treat Source Rows As session property. When you
specify bulk mode and data driven, the Informatica Server fails the session.
For example, whenever we use an Update Strategy transformation we need to mention whether
it is DD_UPDATE, DD_INSERT or DD_DELETE in the mapping. One mapping may contain two
or more Update Strategy transformations. Therefore, in order for the session to get executed
successfully, mention the Data Driven property in the session properties for that particular
mapping.
Architecture:
The Informatica ETL tool consists of the following services & components:
1. Repository Service – Responsible for maintaining Informatica metadata & providing access to the same to other services.
2. Integration Service – Responsible for the movement of data from sources to targets.
3. Reporting Service – Enables the generation of reports.
4. Nodes – The computing platform where the above services are executed.
Informatica Domain
The overall architecture of Informatica is Service Oriented Architecture
(SOA).
1. Repository Service: It is responsible for maintaining Informatica metadata and
provides access to the same to other services.
2. Integration Service: This service helps in the movement of data from sources
to the targets.
3. Reporting Service: This service generates the reports.
4. Nodes: This is a computing platform to execute the above services.
5. Informatica Designer: It creates the mappings between source and target.
6. Workflow Manager: It is used to create workflows or other tasks and their
execution.
7. Workflow Monitor: It is used to monitor the execution of workflows.
8. Repository Manager: It is used to manage the objects in the repository.
SQL query to find the Nth highest salary:
2) Using TOP
3) Using LIMIT
SELECT salary FROM Employee ORDER BY salary DESC LIMIT N-1, 1
4) Using ROW_NUMBER()
5) Using DENSE_RANK()
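A sketch of the DENSE_RANK() approach, assuming an Employee table with a salary column (here N = 3, i.e. the third-highest salary):
SELECT salary
FROM (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk   -- rnk = 1 is the highest salary
    FROM   Employee
) ranked
WHERE  rnk = 3;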
Java transformation can be reusable and can be defined as either an active or a passive
Informatica object.
The Java Transformation has four self-explanatory tabs: Transformation (general
options), Ports (inputs and outputs in separate groups), Properties (active/passive,
deterministic), and Java Code. Once the ports and properties are set, the Java code
can be entered and compiled from within the Designer window. The code window is
divided into tab windows which include:
● Import Packages - import 3rd party java packages, built-in or custom Java packages
● Helper code - declare user-defined variables and methods for the Java
transformation class.
● On Input Row - the Java code is executed one time for each input row. Only on
this tab the input row can be accessed.
● On End of Data - defines the behavior after processing all the input data
● On Receiving transaction - code which is executed when a transaction is received by
the transformation
● Java expressions - used for defining and calling Java expressions
The code snippets then get compiled into byte code behind the scenes, and the
Integration Service starts a JVM that executes the byte code to process the data.
NVL(expr1, expr2): In SQL, NVL() converts a null value to an actual value.
expr1 is the source value or expression that may contain a null.
expr2 is the target value for converting the null.
The data types that can be used are date, character and number. The data types must
match, i.e. expr1 and expr2 must be of the same data type.
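Ex (assuming the classic Oracle emp table, where comm can be null):
SELECT ename,
       NVL(comm, 0) AS commission       -- null commissions are reported as 0
FROM   emp;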
Lookup Caches in Informatica
The different lookup caches are: Static cache, Dynamic cache, Shared cache and Persistent cache.
Static cache:
Static cache is the same as a cached lookup: once the cache is created, the Integration Service
always queries the cache instead of the lookup table.
In a static cache, when the lookup condition is true it returns the value from the lookup table, else it
returns Null or the default value.
One of the key points to remember when using the static cache is that we cannot insert into or
update the cache.
Dynamic cache:
In Dynamic Cache we can insert or update rows in the cache when we pass the rows through the
transformation. The Integration Service dynamically does the inserts or updates of data in the lookup
cache and passes the data to the target. The dynamic cache is synchronized with the target to have
the latest of the key attribute values.
Shared cache:
For a shared cache, the Informatica server builds the cache memory once for multiple lookup
transformations in the mapping, and the cache created for the first lookup is reused by the other
lookup transformations.
We can share the lookup cache between multiple transformations. Un-named cache is shared
between transformations in the same mapping and named cache between transformations in the
same or different mappings.
Persistent cache:
If we use a persistent cache, the Informatica server processes the lookup transformation, saves the
lookup cache files and reuses them the next time the workflow is executed. The Integration
Service saves or deletes lookup cache files after a successful session run based on whether the
lookup cache is configured as persistent or not.
In order to make a lookup cache persistent, enable lookup caching and set the 'Lookup cache
persistent' property on the Lookup transformation; optionally, a cache file name prefix can be
specified so that the named persistent cache can be shared across mappings.