Ab Initio Interview Questions and Answers
Ab Initio, also written Abinitio, is a tool used to extract, transform, and load data. 'Ab initio' is a Latin phrase that means 'from the beginning'. The company was given this name because Sheryl Handler and her team started it
after the bankruptcy of their previous company. Sheryl Handler was the former CEO of Thinking Machines Corporation, and she decided to start this company as a new beginning when Thinking Machines
Corporation went bankrupt.
It is mainly used for data analysis, data manipulation, batch processing, and graphical user interface (GUI) based parallel processing for businesses.
Ab Initio Software is an American multinational private enterprise software corporation headquartered in Lexington, Massachusetts. Ab Initio Software specializes in high-volume data processing applications and
enterprise application integration. The Ab Initio software provides several products on a platform for parallel data processing applications.
Abinitio Software applications are widely used in business intelligence data processing platforms to build business applications ranging from operational systems, distributed application integration, and complex
event processing to data warehousing and data quality management systems.
The Ab Initio Software applications are mainly used for fourth-generation data analysis, batch processing, complex event processing, quantitative and qualitative data processing, data
manipulation, and graphical user interface (GUI)-based parallel processing, and are commonly used to extract, transform, and load (ETL) data.
Ab Initio Software was founded in 1995 by Sheryl Handler and several other employees of Thinking Machines Corporation after the company's bankruptcy. Sheryl Handler was the former CEO of Thinking
Machines Corporation, and she decided to start this company when Thinking Machines Corporation went bankrupt.
The most important components that the architecture of Abinitio includes are as follows:
o Co-operating System
o Conduct-IT
7) What is the most important role of Co-operating system in Abinitio?
The most important role of Co-operating system in Abinitio is to provide the following features:
o It manages and runs the Abinitio graph and controls the ETL processes.
o It is also responsible for meta-data management and interaction with the EME.
Yes, it is possible to run a graph infinitely in Ab Initio. To do so, the graph's end script should call the .ksh file of the graph. For example, if the graph name is xyz.mp, then the end script of the graph should call
xyz.ksh. By following these steps, we can run the graph indefinitely.
The roll-up component lets users collect or group records on certain field values. Its transform is called for each of the records in the group and consists of functions such as initialize and rollup.
o Set AB_AIR_ROOT
In Abinitio, a sandbox is a collection of graphs and related files that is stored in a single directory tree and behaves as a group for version control, navigation, migration, and relocation. It is a safe and controlled
environment to run graphs.
In Abinitio, data encoding is an approach used to keep data confidential. With this approach, we ensure that the information remains in a form that cannot be understood by anyone other than the
sender and the receiver.
15) What are the different types of file extensions used in Abinitio?
o .mp: This file extension is used to store Abinitio graph or graph components.
o .mdc: This file extension is used to specify data-set or custom data-set components.
o .dml: This file extension is used to specify data manipulation language file or record type definition.
o .dat: This file extension is used to specify data files (multifile or serial file).
16) What information does a .dbc file extension provide to connect to the database?
The .dbc file extension provides the following information to connect to the database:
o It provides the name and version number of the database you want to connect to.
o It also specifies the name of the computer on which the database instance or server you want to connect to runs, or on which the database remote access software is installed.
o It specifies the server's name, database instance, or provider you want to link.
In Abinitio, the lookup file is used to define one or more serial files (also known as flat files). It is a physical file that stores the data for the Lookup. It is a two-dimensional table of data that has been stored in a disk
file. It stores the name and display format for each column of data depending on the file format.
There are mainly three types of parallelism used in Abinitio. They are:
o Component parallelism: The component parallelism is used by a graph with multiple processes executing simultaneously on separate data.
o Data parallelism: The data parallelism is used by a graph that works with data divided into segments and operates on each segment simultaneously.
o Pipeline parallelism: The pipeline parallelism is used by a graph that deals with multiple components executing simultaneously on the same data. In this parallelism, each component in the
pipeline reads continuously from the upstream components, processes data, and writes to downstream components. This allows the components to operate in parallel.
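The data parallelism described above can be illustrated outside Ab Initio. The following is a minimal plain-Python sketch, not Ab Initio code: the function names (split_into_segments, process_segment, run_data_parallel) are hypothetical, and a thread pool stands in for the separate processes a real graph would use.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(records, n_segments):
    """Divide the records into n_segments lists of roughly equal size."""
    return [records[i::n_segments] for i in range(n_segments)]


def process_segment(segment):
    """Stand-in for the work one component instance does on its segment."""
    return [value * 2 for value in segment]


def run_data_parallel(records, n_segments=4):
    """Apply the same operation to every segment at the same time."""
    segments = split_into_segments(records, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        # pool.map returns the results in the same order as the segments
        return list(pool.map(process_segment, segments))


doubled = run_data_parallel(list(range(10)), n_segments=3)
print(doubled)
```

Each inner list is the output of one segment; the same operation ran on all three segments independently.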
19) What is the usage of dedup component and replicate component in Abinitio?
In Abinitio, the dedup component is used to eliminate duplicate records. On the other hand, the replicate component combines the data records from its inputs into a single flow and writes a copy of that flow to each
of its output ports.
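As a rough illustration of the two behaviours, here is a plain-Python sketch rather than Ab Initio code. The helper names dedup_sorted and replicate, and the keep option, are assumptions made for this example; the dedup helper assumes records with the same key arrive next to each other.

```python
def dedup_sorted(records, key, keep="first"):
    """Keep one record per key value.

    Assumes the input is already sorted (or at least grouped) on the key.
    """
    out = []
    for rec in records:
        if out and key(out[-1]) == key(rec):
            if keep == "last":
                out[-1] = rec  # replace the earlier record of the same key
        else:
            out.append(rec)
    return out


def replicate(records, n_outputs):
    """Return n_outputs independent copies of the same record flow."""
    records = list(records)
    return [list(records) for _ in range(n_outputs)]


rows = [
    {"id": 1, "v": "a"},
    {"id": 1, "v": "b"},
    {"id": 2, "v": "c"},
]
first_kept = dedup_sorted(rows, key=lambda r: r["id"])
last_kept = dedup_sorted(rows, key=lambda r: r["id"], keep="last")
copies = replicate(first_kept, 2)
```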
20) What do you understand by Partition? What are the different types of partition components in Abinitio?
Partition is a process used in Abinitio for dividing data sets into multiple small sets for further processing. Following is a list of different types of partition components in Abinitio:
o Partition by Round-Robin: The Round-Robin Partition is used for distributing data evenly, in block size chunks, across the output partitions.
o Partition by Range: The Partition by Range facilitates users to divide data evenly among nodes, according to the set of partitioning ranges and keys.
o Partition by Percentage: The Partition by Percentage is used to distribute data in a way that the output is proportional to fractions of 100.
o Partition by Load balance: The Partition by Load balance is used for dynamic load balancing.
o Partition by Expression: The Partition by Expression is used to divide data according to a DML expression.
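Three of the partitioning strategies listed above can be sketched in plain Python. This is only an illustration of the ideas, not Ab Initio code: the function names, the block_size and split_points parameters, and the use of a Python callable in place of a DML expression are all assumptions.

```python
import bisect


def partition_round_robin(records, n_partitions, block_size=1):
    """Deal records out to partitions in turn, block_size records at a time."""
    partitions = [[] for _ in range(n_partitions)]
    for i, rec in enumerate(records):
        partitions[(i // block_size) % n_partitions].append(rec)
    return partitions


def partition_by_range(records, key, split_points):
    """Send each record to the partition whose key range contains it.

    split_points must be sorted; keys <= split_points[0] go to partition 0,
    keys above the last split point go to the final partition.
    """
    partitions = [[] for _ in range(len(split_points) + 1)]
    for rec in records:
        partitions[bisect.bisect_left(split_points, key(rec))].append(rec)
    return partitions


def partition_by_expression(records, n_partitions, expression):
    """Use the integer value of an expression to pick the partition."""
    partitions = [[] for _ in range(n_partitions)]
    for rec in records:
        partitions[expression(rec) % n_partitions].append(rec)
    return partitions


rr = partition_round_robin(list(range(7)), 3, block_size=2)
by_range = partition_by_range([5, 12, 20, 30, 10], key=lambda r: r, split_points=[10, 20])
by_expr = partition_by_expression(list(range(6)), 2, expression=lambda r: r)
```

Round-robin ignores the record contents, while the range and expression variants decide the destination from the record itself.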
De-partition is used to read data from multiple flows or operations and re-join the data records from those flows. Several de-partition components are available in Abinitio, such as Gather, Merge, Interleave,
Concatenate, etc.
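To make the difference between some of these components concrete, here is a plain-Python sketch, not Ab Initio code. The function names are hypothetical. Gather is left out because it collects records in whatever order they arrive, which a single-threaded example cannot show meaningfully.

```python
import heapq
from itertools import chain, zip_longest

_MISSING = object()  # marks the end of a partition that ran out early


def concatenate(partitions):
    """Append the partitions one after another, in partition order."""
    return list(chain.from_iterable(partitions))


def interleave(partitions):
    """Take one record from each partition in turn."""
    out = []
    for group in zip_longest(*partitions, fillvalue=_MISSING):
        out.extend(rec for rec in group if rec is not _MISSING)
    return out


def merge(partitions, key=None):
    """Combine partitions that are each sorted into one sorted flow."""
    return list(heapq.merge(*partitions, key=key))


parts = [[1, 8, 9], [2, 3], [4, 5, 6]]
print(concatenate(parts))
print(interleave(parts))
print(merge(parts))
```

Only merge preserves a global sort order, and it relies on every input partition already being sorted on the same key.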
Overflow errors are errors that occur when the computer cannot process bulk data. While processing data, an overflow error occurs if a large calculation produces a value that exceeds the range of the memory allotted to it.
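As a small illustration of a value exceeding the range allotted to it, the sketch below checks whether a sum still fits in a signed 32-bit field. It is plain Python, not Ab Initio code; the 32-bit width and the helper names are assumptions chosen for the example.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def checked_add_int32(a, b):
    """Add two integers and fail if the result does not fit in 32 bits."""
    result = a + b
    if result < INT32_MIN or result > INT32_MAX:
        raise OverflowError(f"{a} + {b} does not fit in a signed 32-bit field")
    return result


def try_add(a, b):
    """Return the sum, or None when the addition would overflow."""
    try:
        return checked_add_int32(a, b)
    except OverflowError:
        return None


small = try_add(1, 2)
too_big = try_add(INT32_MAX, 1)
```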
o air object ls <EME path for the object-/Projects/edf/..>: This air command is used to list the objects in a directory inside the project.
o air object rm <EME path for the object-/Projects/edf/..>: This air command is used to remove an object from the repository.
o air object versions -verbose <EME path for the object-/Projects/edf/..>: This air command is used to show the object's version history.
Note: Apart from these, there are some other air commands for Abinitio, such as air object cat, air object modify, air lock show -user, etc.
In Abinitio, m_dump is used to view the data in a multifile from the UNIX prompt. Following are the commands for m_dump:
o m_dump a.dml a.dat: This command prints the data as it appears in the GDE when we view data as formatted text.
o m_dump a.dml a.dat > b.dat: This command redirects the output to b.dat, which then acts as a serial file that can be referred to when required.
In Abinitio, the Sort Component is used to re-order the data. It consists of two parameters, "Key" and "Max-core".
o Key: The key parameter is one of the parameters for the sort component. It is used to determine the collation order.
o Max-core: The max-core parameter controls how often the sort component dumps data from memory to disk.
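The effect of a memory limit on sorting can be sketched with a simple external sort. This is plain Python and not how the Sort component is implemented: the function name external_sort is hypothetical, and max_in_memory counts records, whereas the description above is about how much data is held in memory before it is written to disk.

```python
import heapq
import pickle
import tempfile


def external_sort(records, key, max_in_memory):
    """Sort records, spilling a sorted run to disk whenever the buffer fills.

    Returns (sorted_records, number_of_runs_written_to_disk).
    """
    runs = []
    buffer = []

    def spill():
        # Sort what is in memory and write it out as one run
        buffer.sort(key=key)
        run_file = tempfile.TemporaryFile()
        for rec in buffer:
            pickle.dump(rec, run_file)
        run_file.seek(0)
        runs.append(run_file)
        buffer.clear()

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_in_memory:
            spill()

    if not runs:
        return sorted(buffer, key=key), 0  # everything fit in memory
    if buffer:
        spill()

    def read_run(run_file):
        while True:
            try:
                yield pickle.load(run_file)
            except EOFError:
                return

    try:
        merged = list(heapq.merge(*(read_run(f) for f in runs), key=key))
    finally:
        for f in runs:
            f.close()
    return merged, len(runs)


data = [5, 3, 9, 1, 7, 2, 8, 6, 4, 0]
small_limit = external_sort(data, key=lambda x: x, max_in_memory=4)
large_limit = external_sort(data, key=lambda x: x, max_in_memory=100)
```

With the smaller limit the same input is written to disk in several runs and merged afterwards; with the larger limit nothing is spilled.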
26) What is the difference between a DB config (.dbc file) and a CFG (.cfg) file?
The DB config file (.dbc file) consists of the information required for Ab Initio to connect to the database to extract or load tables or views. On the other hand, the .cfg file is the table configuration file created by
db_config while using components like Load DB Table.
ETL is an acronym that stands for Extract, Transform and Load. The ETL tool is software that works with the client-server model.
Ab Initio works as an ETL tool. It is a fourth-generation data analysis, data manipulation, and batch-processing graphical user interface (GUI)-based parallel processing tool used to Extract, Transform and Load (ETL)
data.
A local lookup file contains data records that can be held in main memory. Because the data is in memory, transform functions can use the local lookup to retrieve records much faster than they could
read them from disk.
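The idea of an in-memory lookup used from a transform can be sketched in plain Python. This is an illustration only, not Ab Initio code; build_lookup and enrich are hypothetical helpers, and the sample records are made up for the example.

```python
def build_lookup(records, key_field):
    """Load lookup records into memory, indexed by key (first match wins)."""
    table = {}
    for rec in records:
        table.setdefault(rec[key_field], rec)
    return table


def enrich(transactions, table, key_field, value_field, default=None):
    """Transform step: add a field from the lookup to every transaction."""
    out = []
    for txn in transactions:
        match = table.get(txn[key_field])
        value = match[value_field] if match else default
        out.append({**txn, value_field: value})
    return out


customers = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ben"}]
transactions = [{"cust_id": 2, "amount": 40}, {"cust_id": 3, "amount": 15}]
enriched = enrich(
    transactions, build_lookup(customers, "cust_id"), "cust_id", "name", default="UNKNOWN"
)
```

Each lookup is a dictionary access in memory, so no file is read while the transactions are being processed.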
29) What is the difference between the sandbox and EME? Can we perform checkin and checkout through sandbox?
Sandbox is a work area used to develop, test, or run code associated with a given project. A specific sandbox is associated with only one project whereas a project can be checked out to several sandboxes. We can
hold only one version of the code within the sandbox at any time. On the other hand, the EME is a data store that contains all versions of the code checked into it.
Local and formal parameters are both graph-level parameters, but there is a key difference between them. A local parameter must be given its value at the time of declaration. On the other hand, there is
no need to initialize a formal parameter. Its value is supplied when the graph is run.
31) What is the difference between check point and phase in Ab Initio?
Check point:
o A check point is a recovery point that is created when a graph fails in the middle of the process.
o The rest of the process will be continued after the check point.
o Data from the check point is fetched and continues to execute after correction.
Phase:
o A graph consists of phases. If a graph is created with phases, each phase is assigned to some part of the memory.
o All the phases run one by one.
o In phase, the intermediate file will be deleted.
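The recovery behaviour described above can be sketched in plain Python. This is a generic illustration, not how Ab Initio stores its recovery information: run_with_checkpoints, the JSON checkpoint file, and the three sample phases are all assumptions made for the example.

```python
import json
import os
import tempfile


def run_with_checkpoints(phases, state, checkpoint_path):
    """Run phases one by one, saving a recovery point after each phase.

    If a checkpoint file exists, execution resumes after the saved phase.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        start, state = saved["next_phase"], saved["state"]
    for index in range(start, len(phases)):
        state = phases[index](state)  # may raise and stop the run
        with open(checkpoint_path, "w") as f:
            json.dump({"next_phase": index + 1, "state": state}, f)
    os.remove(checkpoint_path)  # clean up once the whole run succeeded
    return state


calls = []
attempts = {"count": 0}


def extract(state):
    calls.append("extract")
    return state + [1, 2, 3]


def transform(state):
    calls.append("transform")
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated failure in the middle of the run")
    return [value * 10 for value in state]


def load(state):
    calls.append("load")
    return state


path = os.path.join(tempfile.mkdtemp(), "graph.checkpoint.json")
try:
    run_with_checkpoints([extract, transform, load], [], path)
except RuntimeError:
    pass  # the first run fails after the extract phase was checkpointed
result = run_with_checkpoints([extract, transform, load], [], path)
```

On the second run the extract phase is not repeated; execution picks up at the phase that failed.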
32) What do you understand by the rollup component? How can you do it?
Rollup is a way to group the records on a particular field. If a user wants to group the records on particular field values, rollup is the best way. It is a multi-stage transform function that contains the following
mandatory functions.
o Initialise
o Rollup
o Finalise
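The three stages listed above can be sketched in plain Python. This is an illustration of the pattern, not Ab Initio DML: the rollup helper, the rollup_step name, and the function signatures are assumptions, and the sample sales records are made up.

```python
from itertools import groupby


def rollup(records, key, initialize, rollup_step, finalize):
    """Group records on a key and reduce each group in three stages."""
    results = []
    # groupby needs records with equal keys to be adjacent, so sort first
    for key_value, group in groupby(sorted(records, key=key), key=key):
        temp = initialize()                # stage 1: set up the temporary record
        for rec in group:
            temp = rollup_step(temp, rec)  # stage 2: called for each record
        results.append(finalize(key_value, temp))  # stage 3: build the output
    return results


sales = [("east", 10), ("west", 5), ("east", 7)]
totals = rollup(
    sales,
    key=lambda rec: rec[0],
    initialize=lambda: {"total": 0, "count": 0},
    rollup_step=lambda temp, rec: {
        "total": temp["total"] + rec[1],
        "count": temp["count"] + 1,
    },
    finalize=lambda region, temp: {"region": region, **temp},
)
```

One output record is produced per distinct key value, here one per region.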
33) What is the difference between scientific data processing and commercial data processing?
In scientific data processing, data is processed with a great amount of computation, i.e., arithmetic operations. A limited amount of data is provided as input in this processing, and a large amount of data is produced as
output. On the other hand, commercial data processing is completely different. In commercial data processing, the output is limited compared to the input data. The computational operations are also limited
in commercial data processing.