Ab Initio Interview Question v1.0

This document contains an objective, organization, and question bank regarding Ab Initio interview questions. The objective is to support lateral recruitment activities across TCS for the Ab Initio ETL tool. The interview questions are classified by Ab Initio components, parallelism, performance, backend commands, and data warehousing concepts. The question bank contains over 30 questions related to these Ab Initio topics.

Uploaded by

Ravi Chythanya

Ab Initio Interview Question Bank version 1.0 Ab Initio Center of Excellence

Ab Initio Interview Question Bank


Version 1.0
Author: Ab Initio Center of Excellence

In Confidence 1 of 7 Tata Consultancy Services Ltd



1. Objective

Lateral recruitment is one of the key activities of the Ab Initio Center of Excellence. This document
is intended to support recruitment activities across TCS for the Ab Initio ETL tool.
Answers to the questions are included so that any BI resource can conduct the interview.
An experienced Ab Initio candidate is expected to answer all the questions correctly to become
technically qualified for a berth in TCS.

2. Organization

The interview questions have been classified according to Ab Initio features, along with a blend of
basic UNIX and data warehousing conceptual questions.

3. Question Bank

3.1 Ab Initio Components

Q. What’s the difference between a .dbc file and a .cfg file?
A. The .dbc file has the information required for Ab Initio to connect to the database, while the
.cfg file is the table configuration file created by db_config.

Q. How can a dbc file be tested from the command prompt?
A. m_db test <name of dbc file>

Q. What is the difference between a Phase and a Checkpoint?
A. Phases are used to break the graph into pieces. Temporary files created during a phase are
deleted after its completion. Phases let you manage the resource-consuming (memory, CPU, disk)
parts of the application separately.

Checkpoints are created for recovery purposes. These are points where everything is written to disk.
You can recover to the latest saved point and rerun from it. You can have phase breaks with or
without checkpoints.

Q. Which component automatically increases the phase in Ab Initio?
A. Intermediate File.

Q. What is the driving input of a Join component?
A. The driving parameter is the number of the port to which you connect the driving input. The
driving input is the largest input, as specified by the driving parameter; all other inputs are
read into memory. The driving parameter is available only when the sorted-input parameter is set
to "In memory: Input need not be sorted". For example, suppose the largest input to be
joined is on the in1 port. Specify a port number of 1 as the value of the driving
parameter. The component then reads all other inputs to the join (for example, in0 and in2)
into memory.

Default is 0, which specifies that the driving input is on port in0.

Q. How can an output file be used as a lookup file?
A. Add to Catalog.

Q. Where can we use the local lookup function?
A. In a multifile lookup, where the lookup is local to a particular partition depending on the
lookup key. It returns a data record from a partition of a lookup file. This function is similar
to lookup and takes the same arguments.

Q. Explain control partition and data partition.
A. The data partitions contain the data, and the control partition contains the URLs of the data
partitions. The first URL specifies the location of the root directory of the control partition of
the multifile system and gives the multifile system the name by which the Co>Operating
System recognizes it.

Q. What’s the function of the output_index parameter in the Reformat component?
A. It directs each transformed record to a particular output port. The transform function uses the
value of the input record to direct that input record to a particular output port. Suppose
there are 100 input records and two output ports. Each output port receives between 0 and
100 records. According to the transform function you specify for output-index, the split
can be 50/50, 60/40, 0/100, 99/1, or any other combination that adds up to 100. If an
index is out of range (less than zero or greater than the highest-numbered port), the
component ignores that index.
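The routing rule described above can be sketched outside Ab Initio with a small shell function. This is only an illustration, not Ab Initio code: the function name route_by_index and the rule "the first field of each record is the output index" are assumptions made for the example.

```shell
# route_by_index <number_of_ports> < input
# Hypothetical rule: the first field of each record is the output index.
route_by_index() {
    awk -v n="$1" '{
        idx = $1 + 0
        # An out-of-range index (below 0 or above the highest port) is ignored.
        if (idx < 0 || idx >= n) next
        print "out" idx ": " $0
    }'
}
```

For example, with two ports, records whose index is 0 or 1 are labelled with their port, and records with any other index are dropped.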

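Q. What is the difference between a graph parameter and a sandbox parameter?
A. A graph parameter is set for an individual graph at run time; it can be created through
Properties -> Parameter -> Create. A sandbox parameter, by contrast, is defined at the sandbox
level and is available to every graph in that sandbox.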

Q. The Rollup and Aggregate components in Ab Initio are both used to summarize groups of data
records. Which component is the better choice?
A. Rollup is more explicit than Aggregate, and it offers additional functionality such as
input selection and output selection.

Q. What is the use of packages?
A. A package is a high-level view of functions, local variables, and global variables.

Q. Which is faster to process: fixed-length DML or delimited DML?
A. Fixed-length DML is faster, because the data of the given length is read directly without
any comparison, whereas in delimited DML every character has to be compared with the delimiter.
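The difference can be illustrated with the standard cut utility. This is a rough analogy rather than Ab Initio DML, and the function names are hypothetical: a fixed-length field is taken by position alone, while a delimited field requires scanning the line for the delimiter.

```shell
# Fixed-length: the field sits at known offsets, so it is read by position.
first_field_fixed() {        # usage: first_field_fixed <width> < input
    cut -c "1-$1"
}

# Delimited: the line must be scanned until the delimiter is found.
first_field_delimited() {    # usage: first_field_delimited <delimiter> < input
    cut -d "$1" -f 1
}
```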

Q. How can a 4-way MFS be converted to an 8-way MFS?
A. It can be done by using a partition component and a departition component.

Q. Which Component in AbInitio can create Deadlock?


A. Departition components.

Q. What’s the difference between the Partition by Key and Partition by Round-robin components?
A. Partition by Key sends records with the same key value to the same partition. Partition by
Round-robin distributes records across the partitions in turn, regardless of their content.
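The two distribution rules can be sketched with awk. This is a simplified illustration, not how the Ab Initio components are implemented: the function names are hypothetical, the key is assumed to be a non-negative integer in the first field, and each output line is prefixed with the partition number the record would go to.

```shell
# Round-robin: record i goes to partition (i-1) mod n, whatever it contains.
partition_round_robin() {    # usage: partition_round_robin <n> < input
    awk -v n="$1" '{ p = (NR - 1) % n; print p, $0 }'
}

# By key: the partition is derived from the key, so equal keys stay together.
# Assumption: the key is a non-negative integer in the first field.
partition_by_key() {         # usage: partition_by_key <n> < input
    awk -v n="$1" '{ p = $1 % n; print p, $0 }'
}
```

With key partitioning, two records that share a key always get the same partition number; with round-robin they usually do not, which is why key-based components such as a join or rollup need key partitioning upstream.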

Q. Which component can be used to run a script?
A. Run Program.

Q. What’s the difference between API and Utility mode?
A. These are database interfaces: API mode uses SQL, while Utility mode uses the bulk loader that
the database vendor provides.

Q. How do you count the records in a file with Ab Initio?
A. Use Rollup with an empty key ({}) and count(in).

Q. What is the significance of a left outer/right outer join?
A. It takes all the data from one input file (record required set to True) and takes data
from the other input file only for calculation (record required set to False).

Q. Which component can perform the same function as the Dedup Sorted component?
A. Rollup.

3.2 Parallelism

Q. Explain different types of parallelism.


A. Component, Pipeline, Data.

Q. Which component cannot take part in pipeline parallelism?
A. Sort, because it must read all of its input before it can write any output.

Q. What’s the difference between Replicate and Broadcast?
A. Replicate supports component parallelism; Broadcast supports data parallelism.

3.3 Performance

Q. How do you improve the performance of a graph?
A. These things can be done to improve performance:
1. Use a limited number of components.
2. Minimize the number of Sort components.
3. Minimize sorted Join components; if possible, replace them with in-memory joins.
4. Use multifiles.
5. Minimize regular expression functions (like re_index) in the transform functions.

3.4 Backend Commands


Q. How do you calculate the total number of lines in a multifile?
A. m_wc <file name dml name>.

Q. Create a multifile
A. m_mkfs.

Q. Suppose a lookup name is changed in a graph. How do you implement the change in the graph?
A. The changes in the xfr files can be made with the grep and sed commands.
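A minimal sketch of the grep and sed approach, under stated assumptions: the function name rename_lookup is hypothetical, the transform files are assumed to end in .xfr and sit in one directory, and the lookup names are assumed to contain no characters that are special to sed (such as / or .).

```shell
# rename_lookup <old_name> <new_name> <directory>
# Finds the .xfr files that mention the old lookup name and rewrites them.
rename_lookup() {
    old="$1"; new="$2"; dir="$3"
    # grep -l lists only the files that contain the old name.
    grep -l -- "$old" "$dir"/*.xfr 2>/dev/null | while IFS= read -r f; do
        # Write to a temporary file first, then replace the original.
        sed "s/$old/$new/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
}
```

It is worth running the grep step alone first to review which files would be touched before letting sed rewrite them.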

Q. What’s the difference between m_rollback and m_cleanup?

Q. How can the records of a multifile and the contents of a multifile be seen?
A. With the m_dump and m_expand commands.

Q. What’s the significance of nohup?

Q. What is the pipe command?
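For the two UNIX questions above, a short sketch of standard behavior: nohup runs a command so that it ignores the hangup signal sent when the terminal session ends, and a pipe (|) feeds the standard output of one command into the standard input of the next. The function names and the log file are hypothetical.

```shell
# nohup: start a job that survives logout, with its output sent to a log file.
run_detached() {     # usage: run_detached <logfile> <command> [args...]
    _log="$1"; shift
    nohup "$@" > "$_log" 2>&1 &
    wait $!          # only so this demo finishes; a real job is left running
}

# pipe: grep writes the matching lines, wc -l counts them.
count_matches() {    # usage: count_matches <pattern> <file>
    grep -- "$1" "$2" | wc -l | tr -d ' '
}
```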

Q. How do you check whether a file is sorted?
A. sort -c <filename>
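As a small usage sketch: sort -c only checks the order and reports the result through its exit status, so it fits naturally into an if statement. The function name is_sorted is hypothetical.

```shell
# is_sorted <file>: prints "sorted" or "not sorted".
is_sorted() {
    # sort -c exits with 0 when the file is already in order, non-zero otherwise.
    if sort -c "$1" 2>/dev/null; then
        echo "sorted"
    else
        echo "not sorted"
    fi
}
```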

Q. How can blank lines be deleted from a file?
A. sed -e '/^ *$/d' filename > new_filename
mv new_filename filename

Q. Write a command in the start script to check whether the size of the input file is zero;
if it is zero, the graph should fail.
A. size=`du <filename> | awk '{ print $1 }'`
if [ "$size" -eq 0 ]; then
    exit 1
fi
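An alternative sketch uses the shell's built-in file test instead of du: -s is true only when the file exists and has a size greater than zero. The function name require_nonempty and the variable INPUT_FILE are hypothetical.

```shell
# require_nonempty <file>: returns 1 when the file is missing or empty.
require_nonempty() {
    if [ ! -s "$1" ]; then
        echo "Input file $1 is missing or empty" >&2
        return 1
    fi
}
```

In a start script this could be called as `require_nonempty "$INPUT_FILE" || exit 1`, so that a non-zero exit status makes the graph fail.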

3.5 Data Warehousing

Q. What are the characteristics of a data warehouse?
A. Subject-oriented, integrated, non-volatile, and time-variant.

Q. Explain the concept of a surrogate key. Which component in Ab Initio can generate a surrogate key?
A. The Assign Keys component.

Q. What are the types of Slowly Changing Dimension, and what is the difference between a Slowly
Changing Dimension and a Rapidly Changing Dimension?

A. Type 1, Type 2, Type 3.
Type 1 - Update a row.
Type 2 - Add a new row.
Type 3 - Add a new column.
Rapidly Changing Dimension (RCD) - To control the size, split off some attributes into mini-dimensions.

Q. What is a fact? Explain the different types of facts (additive, semi-additive, non-additive).
A. A fact table contains a composite key in which each component is a foreign key to a
dimension table.

Additive: Additive facts are facts that can be summed up through all of the dimensions in
the fact table. Example: sales amount.

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the
dimensions in the fact table, but not the others. Example: current balance, because it can be
added up across all accounts, but cannot be added up across days.

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the
dimensions present in the fact table. Example: profit margin, because it cannot be added up at
the day level or at the account level.

Q. Explain Conformed Dimension, Degenerate Dimension, and Junk Dimension.
A. Conformed Dimension (reusable) - a dimension that means the same thing with every possible fact
table to which it can be joined, so it can be shared by multiple fact tables. Generally it means
that a conformed dimension is identical in each data mart.
Degenerate Dimension - data that is dimensional in nature but stored in a fact table. For example,
if you have a dimension that only has Order Number and Order Line Number, you would have
a 1:1 relationship with the fact table.
Junk Dimension - a convenient grouping of flags and attributes (text descriptions, booleans, and
flags) to get them out of the fact table into a useful dimensional framework.
Dirty Dimension - a dimension table in which a record exists more than once with differences in
its non-key attributes.

Q. Explain Factless Fact.
A. A factless fact table contains a series of key values but does not contain any fact or measure.

Q. What is slice and dice of data?
A. Projection and selection of data.

Q. What is a Materialized View?
A. Query results that have been stored in advance, so that long-running calculations are not
necessary when you actually execute your SQL statement.

Q. What is Real-time Data Warehousing?
A. In real-time data warehousing there is a minimal delay between source data being generated
and being available in the data warehouse. The steps are:
1. Reduce or eliminate the time taken to get new and changed data from your source system.
2. Reduce the time required to cleanse the data.
3. Reduce the time required to apply your changes.

Q. What are the different methods of loading a dimension table?
A. Conventional Load - Before the data is loaded, all table constraints are checked against the data.
Direct Load - All the constraints are disabled during the load.

Q. What is a conformed fact?
A. Conformed facts can have the same name in separate tables and can be combined and compared
mathematically.

Q. What is the purpose of a staging area?
A. It is used to hold the data and to perform data cleansing and merging before the data is
loaded into the data warehouse.

Q. What is three-tier data warehouse?


A. 1. Data Tier – consists of Database.
2. Application tier – consists of the analytical server.
3. Presentation tier – tier that interacts with the end-user.

Q. Different Types of Schemas


A. Star Schema, snowflake schema, and fact constellation schema.

Q. What’s the difference between OLTP and OLAP?
A. OLTP / OLAP:
Current values / historical data
Application-oriented / subject-oriented across the enterprise
Dynamic / static
Insert, update, delete / load, read
Normalized / denormalized.

Q. What is an ODS?
A. An ODS (operational data store) contains near-real-time data. It is used for analytical
reporting as well as a source for the data warehouse.
