Informatica Questions and Answers
Active transformations can change the number of rows passing through them
(e.g., Filter, Router). Passive transformations do not change the row count
(e.g., Lookup, Expression).
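To make the distinction concrete, here is a small Python sketch (illustrative only; the rows and field names are invented) contrasting a Filter-style step, which can drop rows, with an Expression-style step, which changes values but never the row count:

```python
# Illustrative active vs. passive behavior on a row pipeline.
rows = [{"id": 1, "amount": 50}, {"id": 2, "amount": 150}, {"id": 3, "amount": 250}]

# Active (Filter-like): the number of output rows can differ from the input.
filtered = [r for r in rows if r["amount"] > 100]              # 3 rows in, 2 rows out

# Passive (Expression-like): values change, but the row count is preserved.
adjusted = [{**r, "amount": r["amount"] * 1.1} for r in rows]  # 3 rows in, 3 rows out

print(len(rows), len(filtered), len(adjusted))                 # 3 2 3
```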
Informatica uses three main cache types: the Index Cache, the Data Cache, and the Persistent Cache. The Index Cache stores the condition (or group-by) key columns, the Data Cache stores the associated output data, and a Persistent Cache is saved to disk so it can be reused across session runs.
The Source Qualifier transformation represents the rows that the Integration Service reads from a relational or flat-file source and converts the source data types to Informatica native data types. It is the starting point of any mapping that uses such sources.
7. What is a Joiner Transformation?
The Joiner transformation joins data from two (possibly heterogeneous) sources based on a join condition. It supports four join types: Normal (inner) Join, Master Outer Join, Detail Outer Join, and Full Outer Join.
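As a rough illustration of the join types (the tables, keys, and names below are invented), here is a short Python sketch of a normal join versus a master outer join:

```python
# Tiny join sketch: customers is the master source, orders the detail source.
customers = {1: "Ada", 2: "Grace"}
orders = [{"order_id": 10, "cust_id": 1},
          {"order_id": 11, "cust_id": 3}]

# Normal (inner) join: only detail rows with a matching master row survive.
inner = [{**o, "name": customers[o["cust_id"]]}
         for o in orders if o["cust_id"] in customers]

# Master outer join: keep all detail rows, filling missing master data with None.
master_outer = [{**o, "name": customers.get(o["cust_id"])} for o in orders]

print(inner)
print(master_outer)
```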
Incremental loading fetches only the rows that are new or changed since the last load, typically identified by a timestamp or a monotonically increasing ID. In Informatica this is usually implemented with mapping variables or parameters combined with Filter and Lookup transformations.
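A minimal Python sketch of the idea, assuming a hypothetical updated_at column and a watermark saved by the previous run:

```python
from datetime import datetime

source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 2, 1)},
    {"id": 3, "updated_at": datetime(2024, 3, 1)},
]

last_load_ts = datetime(2024, 1, 15)   # watermark from the previous successful run

# Pull only rows created or changed since the last load.
delta = [r for r in source_rows if r["updated_at"] > last_load_ts]

# After a successful load, advance the watermark for the next run.
if delta:
    last_load_ts = max(r["updated_at"] for r in delta)

print(delta, last_load_ts)
```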
11. Scenario: Your session failed due to memory issues. How do you debug
and fix it?
Check the session log for memory-related errors and usage patterns. Tune the lookup and aggregator cache sizes, adjust the DTM buffer size and buffer block size, and verify that partitioning is configured appropriately for the available memory.
Export the mapping as XML files using Repository Manager and validate
connections, parameter files, and DB links before deployment using
Deployment Groups.
13. How do you schedule workflows in Informatica?
Use the built-in Informatica Scheduler, or external tools such as Control-M or cron. Event Wait tasks can also trigger workflow steps based on file dependencies.
14. What are session logs and how do you manage them?
Session logs contain details of session run status, errors, and performance
metrics. Manage logs by using Workflow Monitor and setting retention
policies.
17. How do you handle a scenario where you need to filter records based on
a range of values in Informatica?
Use a Filter transformation with a range condition (for example, AMOUNT >= 100 AND AMOUNT <= 500), or a Router transformation when records in different ranges must be routed to different targets.
20. How do you handle changes in source data schema (e.g., adding a
column) during an active ETL process?
You need to modify the mappings to reflect the changes. This could involve
adding new fields to the source definitions, updating the transformation
logic, and updating the target schema accordingly.
The Rank transformation is used to select the top or bottom N records from a
group of records. It’s typically used when you want to find the highest or
lowest values based on certain conditions.
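For intuition, a small Python sketch of top-N-per-group selection (the data and N are made up), mirroring a Rank transformation with a group-by port:

```python
from collections import defaultdict

sales = [
    {"region": "EU", "rep": "A", "amount": 900},
    {"region": "EU", "rep": "B", "amount": 700},
    {"region": "US", "rep": "C", "amount": 800},
    {"region": "US", "rep": "D", "amount": 600},
]

TOP_N = 1
groups = defaultdict(list)
for row in sales:
    groups[row["region"]].append(row)

# Rank within each group by amount and keep the top N rows per group.
top = [r for g in groups.values()
       for r in sorted(g, key=lambda x: x["amount"], reverse=True)[:TOP_N]]
print(top)  # the highest-amount rep in each region
```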
The Target Load Order (defined through the mapping's Target Load Plan) specifies the order in which targets are loaded. It helps manage dependencies between tables, for example ensuring that parent tables are loaded before their child tables.
29. What is the difference between SCD Type 1 and SCD Type 2?
SCD Type 1 overwrites old data with new data without preserving history,
while SCD Type 2 preserves historical data by adding a new record for each
change, typically with effective and expiry dates.
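A minimal sketch of the two behaviors, assuming an invented table layout with effective and expiry date columns:

```python
from datetime import date

dim = [{"cust_id": 1, "city": "Oslo", "eff_date": date(2023, 1, 1), "exp_date": None}]

def scd_type1(rows, cust_id, new_city):
    """Type 1: overwrite in place; no history is kept."""
    for row in rows:
        if row["cust_id"] == cust_id:
            row["city"] = new_city

def scd_type2(rows, cust_id, new_city, as_of):
    """Type 2: expire the current row and insert a new version; history is kept."""
    for row in rows:
        if row["cust_id"] == cust_id and row["exp_date"] is None:
            row["exp_date"] = as_of
    rows.append({"cust_id": cust_id, "city": new_city,
                 "eff_date": as_of, "exp_date": None})

scd_type2(dim, 1, "Bergen", date(2024, 6, 1))
print(dim)  # old row closed out with an expiry date, new current row added
```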
30. How do you manage performance issues in a large volume ETL process?
Identify the bottleneck from the session log, then tune accordingly: use session partitioning and pushdown optimization, size lookup and aggregator caches appropriately, sort data before aggregating or joining, use bulk loading on the target, and disable or drop indexes during the load where possible.
A connected lookup is part of the data flow, receives input directly from the
pipeline, and can return multiple columns. An unconnected lookup is called
within an expression and returns a single value.
32. What are some best practices for managing and organizing workflows in
Informatica?
Group related workflows into folders, use descriptive names for sessions and
workflows, manage dependencies using pre- and post-session commands,
and maintain consistent logging to help with troubleshooting.
Errors can be handled using error tables, custom error handling in the
transformation logic, post-session commands, or error logs. It’s important to
set up proper logging to capture error details for debugging.
41. What is the difference between the Connected and Unconnected Lookup
transformation?
A connected Lookup is part of the mapping pipeline: it receives input rows directly and can return multiple ports. An unconnected Lookup sits outside the pipeline, is called from an expression using the :LKP syntax, and returns a single port.
44. How do you handle situations where your source data has duplicate
records?
Remove duplicates with a Sorter transformation using the Distinct option, with an Aggregator that groups on the key columns, or by ranking rows on the key and keeping only the first occurrence.
Late-arriving data can be managed by delaying processing of the incoming data, or by adding custom logic in transformations such as Lookup or Update Strategy so that the late rows are applied correctly.
A Passive transformation does not alter the number of rows in the pipeline,
while an Active transformation can change the number of rows. For example,
a Filter (Active) can drop rows, but an Expression (Passive) can only modify
data within the same row.
49. How would you implement a Slowly Changing Dimension (SCD) Type 3 in
Informatica?
SCD Type 3 keeps limited history in additional columns (for example, a current value and a previous value) rather than in new rows. Use a Lookup to fetch the existing dimension row, an Expression to move the current value into the previous-value port when it changes, and an Update Strategy to mark the row for update.
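A minimal Python sketch of the Type 3 idea (the column names are assumptions): the prior value moves into a dedicated history column rather than a new row.

```python
dim_row = {"cust_id": 1, "city": "Oslo", "prev_city": None}

def scd_type3(row, new_city):
    if row["city"] != new_city:
        row["prev_city"] = row["city"]   # shift the current value into the history column
        row["city"] = new_city
    return row

print(scd_type3(dim_row, "Bergen"))  # {'cust_id': 1, 'city': 'Bergen', 'prev_city': 'Oslo'}
```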
You can create reusable objects like transformations, mapplets, and sessions
by defining them and saving them in the repository. These reusable
components can then be used in multiple mappings and workflows to ensure
consistency and save time.
54. How do you handle a scenario where the target database is not available
during the ETL process?
In such a scenario, you can configure your session to retry the connection,
use a database-specific failover mechanism, or write data to a temporary
staging area, which can be later reloaded when the target database becomes
available.
55. What is the difference between the Source Qualifier and the Lookup
transformation?
The Source Qualifier is used to filter, join, or aggregate data from source
tables directly, whereas the Lookup transformation is used to look up
additional data from a reference source or cache, typically used to enrich
source data with reference data.
57. What is the difference between a Target Definition and a Target Load
Plan?
A Target Definition represents the structure of the target data that is being
loaded, while a Target Load Plan defines the sequence in which the target
tables should be loaded, managing the dependencies between multiple
targets in a session.
The Dynamic Lookup Cache is used when you want to update existing
records in the target based on a lookup match and insert new records if no
match is found. It dynamically updates the cache during session execution to
handle both inserts and updates.
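A rough Python analogue of the dynamic cache's insert-or-update behavior (keys and fields are invented); the point is that the cache changes while rows flow through, so later rows see earlier changes:

```python
cache = {101: {"name": "Ada"}}          # preloaded from the lookup source

def process(row):
    key = row["cust_id"]
    if key in cache:
        if cache[key] != row["attrs"]:
            cache[key] = row["attrs"]   # refresh the cache; row flagged as UPDATE
            return ("update", row)
        return ("no_change", row)
    cache[key] = row["attrs"]           # add to the cache; row flagged as INSERT
    return ("insert", row)

for r in [{"cust_id": 101, "attrs": {"name": "Ada L."}},
          {"cust_id": 202, "attrs": {"name": "Grace"}}]:
    print(process(r))
```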
61. How do you handle the scenario where the data needs to be processed in
parallel for improved performance?
To process data in parallel, you can use partitioning in the session properties
or employ pipeline partitioning. You can also use parallel session execution to
divide and conquer large datasets by processing them in smaller chunks
simultaneously.
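As a loose analogue of pipeline partitioning (the chunk size and transform are invented), this Python sketch splits the data and processes the partitions concurrently:

```python
from multiprocessing import Pool

def transform(chunk):
    return [x * 2 for x in chunk]   # stand-in for per-partition transformation logic

if __name__ == "__main__":
    data = list(range(1_000))
    # Split into 4 partitions, process them in parallel, then merge the results.
    chunks = [data[i:i + 250] for i in range(0, len(data), 250)]
    with Pool(processes=4) as pool:
        results = pool.map(transform, chunks)
    merged = [x for part in results for x in part]
    print(len(merged))
```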
In the Lookup transformation, the Cache File stores the cached data that is
used for lookups. The cache allows the transformation to perform faster
lookups by keeping a local copy of the reference data instead of querying the
source database for each lookup.
65. What are the different types of errors that can occur during ETL
processing?
Errors during ETL processing can include data truncation, data type
mismatches, constraint violations, connection issues, transformation logic
errors, and external system errors such as unavailability of source or target
systems.
66. How do you configure a session to run a shell script after the session
completes?
You can configure a session to run a shell script by using the Post-Session
Command option in the session properties. This allows you to specify the
script or executable to run after the session finishes processing.
For handling large volumes of data, use parallel processing, partitioning, and
pushdown optimization. Also, ensure that your session and database
configurations are optimized to manage large datasets efficiently.
Data extraction from a flat file can be done using the Flat File Source
Definition in Informatica, where you define the structure of the flat file (fixed
width, delimited, etc.) and map the fields to the required targets.
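A self-contained Python sketch of reading a delimited flat file (the sample data and column names are invented); a real run would read from a file instead of the inline string:

```python
import csv, io

sample = "id,name,amount\n1,Ada,100\n2,Grace,200\n"

# DictReader parses the delimited structure; the comprehension maps fields
# to target-friendly types, much like a source definition maps ports.
reader = csv.DictReader(io.StringIO(sample))
rows = [{"id": int(r["id"]), "name": r["name"], "amount": float(r["amount"])}
        for r in reader]
print(rows)
```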
70. How do you handle a situation where you have to combine data from
multiple sources with different data formats?
Use the appropriate source definitions for each data format (e.g., Flat File,
Relational, XML), and then perform any necessary transformations to
harmonize and consolidate the data before loading it into the target.
72. How do you handle duplicate data in a source system using Informatica?
To handle duplicate data in a source system, you can use the Sorter
transformation with the "Distinct" option enabled, or use the Aggregator
transformation to group data and eliminate duplicates based on specific
fields.
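Two common deduplication strategies, sketched in Python (the key column is an assumption): dropping fully identical rows, versus keeping one row per business key.

```python
rows = [{"id": 1, "name": "Ada"}, {"id": 1, "name": "Ada"}, {"id": 1, "name": "Ada L."}]

# Sorter-with-Distinct analogue: drop rows that are identical in every column.
distinct, seen = [], set()
for r in rows:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        distinct.append(r)

# Aggregator analogue: group on the key column; the last row per group wins.
by_key = {}
for r in rows:
    by_key[r["id"]] = r

print(distinct)               # two rows: the exact duplicate is gone
print(list(by_key.values()))  # one row per id
```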
75. What are the main differences between Informatica PowerCenter and
Informatica Cloud?
PowerCenter is an on-premises ETL platform administered through client tools and a central repository, whereas Informatica Cloud (IICS) is a browser-based, cloud-hosted integration service with prebuilt connectors and subscription licensing, where Informatica manages the infrastructure and upgrades.
An ETL Mapping in Informatica defines the data flow from sources to targets.
It includes the transformations that will be applied to the data during
extraction, transformation, and loading.
79. What is the function of the XML Source and Target in Informatica?
The XML Source and Target are used to read and write data from and to XML
files. You can define an XML schema, which outlines the structure of the XML
file, and map the data to and from relational tables.
80. What is a Post-Session Command in Informatica?
A Post-Session Command is a shell command or script that the Integration Service runs after a session completes, configured in the session properties; it is commonly used for tasks such as archiving files or sending notifications.
The Sorter transformation is used to sort data based on one or more columns
in ascending or descending order. Sorting is often required before performing
certain operations like aggregation, joining, or partitioning.
85. How do you process and load unstructured data into a database using
Informatica?
Unstructured or semi-structured content is typically parsed with the Unstructured Data transformation (backed by Informatica B2B Data Transformation), which converts documents into a relational structure that can then be mapped to database targets.
87. How do you handle different file formats (CSV, XML, JSON) as sources in
Informatica?
Informatica provides different connectors and source definitions for each file
format (e.g., Flat File Source, XML Source, JSON Source). For CSV files, you
can define the file structure; for XML and JSON, you define schemas to map
the data.
88. How do you implement a process to load data from multiple sources into
a single target?
To load data from multiple sources into a single target, you can use multiple
Source Definitions, and then use transformations like Joiner, Union, or Lookup
to merge the data before loading it into the target.
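A small Python sketch of the consolidate-then-load idea (the source shapes and field names are invented): normalize each source to the target schema, then union the rows.

```python
crm_rows  = [{"CustomerId": 1, "FullName": "Ada Lovelace"}]
shop_rows = [{"cust_no": "2", "name": "Grace Hopper"}]

def from_crm(r):
    return {"cust_id": r["CustomerId"], "name": r["FullName"]}

def from_shop(r):
    return {"cust_id": int(r["cust_no"]), "name": r["name"]}

# Union-transformation analogue: identical columns, rows appended together.
target = [from_crm(r) for r in crm_rows] + [from_shop(r) for r in shop_rows]
print(target)
```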
File dependencies can be managed using the Event Wait and Event Raise
tasks in a workflow. These tasks help in controlling the execution sequence
by waiting for specific files to be available before proceeding with the ETL
processing.
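In spirit, an Event Wait on a file behaves like this Python sketch (the path, polling interval, and timeout are invented):

```python
import os
import time

def wait_for_file(path, poll_seconds=30, timeout_seconds=3600):
    """Poll until an indicator file appears, then let the load proceed."""
    waited = 0
    while not os.path.exists(path):
        if waited >= timeout_seconds:
            raise TimeoutError(f"{path} did not arrive within {timeout_seconds}s")
        time.sleep(poll_seconds)
        waited += poll_seconds
    return path

# Example (hypothetical path):
# wait_for_file("/landing/orders_20240601.done")  # then start the ETL load
```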
Version control can be managed through the use of the repository, where
different versions of objects (such as mappings, sessions, and workflows) are
maintained. You can check in and check out versions, and ensure that
changes are tracked across different environments.