0% found this document useful (0 votes)
54 views50 pages

Best Data Warehouse Interview Quastions

Uploaded by

santoshdvg1997
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
54 views50 pages

Best Data Warehouse Interview Quastions

Uploaded by

santoshdvg1997
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 50

Data Warehouse Interview

Questions

To view the live version of the


page, click here.

© Copyright by Interviewbit
Contents

Data Warehouse Interview Questions for Freshers


1. What do you mean by data mining? Differentiate between data mining and data
warehousing.
2. What do you mean by OLAP in the context of data warehousing? What guidelines
should be followed while selecting an OLAP system?
3. What do you understand about a fact table in the context of a data warehouse?
What are the different types of fact tables?
4. What do you mean by dimension table in the context of data warehousing? What
are the advantages of using a dimension table?
5. What are the different types of dimension tables in the context of data
warehousing?
6. Differentiate between fact table and dimension table.
7. What are the advantages of a data warehouse?
8. What are the disadvantages of using a data warehouse?
9. What are the different types of data warehouse?
10. What are the different types of data marts in the context of data warehousing?
11. Differentiate between data warehouse and database.
12. What do you mean by a factless fact table in the context of data warehousing?
13. What do you mean by Real time data warehousing?
14. What do you mean by Active Data Warehousing?
15. What are the characteristics of a data warehouse?
16. What do you understand about metadata and why is it used for?
17. Enlist a few data warehouse solutions that are currently being used in the
industry.
18. Enlist some of the renowned ETL tools currently used in the industry.
19. Explain what you mean by a star schema in the context of data warehousing.
20. What do you mean by snowflake schema in the context of data warehousing?
Page 1 © Copyright by Interviewbit
Data Warehouse Interview Questions

Data Warehouse Interview Questions for


Freshers (.....Continued)

21. What do you understand about a data cube in the context of data warehousing?

Data Warehouse Questions for Experienced


22. Explain the architecture of a data warehouse.
23. What are the advantages and disadvantages of the top down approach of data
warehouse architecture?
24. What are the advantages and disadvantages of the bottom up approach of data
warehouse architecture?
25. Differentiate between a data warehouse and a data mart.
26. What do you mean by data purging in the context of data warehousing?
27. What do you mean by dimensional modelling in the context of data
warehousing?
28. What do you understand by data lake in the context of data warehousing?
Differentiate between data lake and data warehouse.
29. Differentiate between star schema and snowflake schema in the context of data
warehousing.
30. Differentiate between Agglomerative hierarchical clustering and Divisive
clustering.
31. What are the advantages of a cloud based data warehouse?

Page 2 © Copyright by Interviewbit


Let's get Started

What is Data Warehouse?

Data warehousing (DW) is a method of gathering and analysing data from many
sources in order to get useful business insights. Typically, a data warehouse is used to
integrate and analyse corporate data from many sources. The data warehouse is the
heart of the business intelligence (BI) system, which is designed to analyse and report
on data.

Page 3 © Copyright by Interviewbit


Data Warehouse Interview Questions

It is a collection of technology and components that help with data strategy. It refers
to a company's electronic storage of a huge volume of data that is intended for
inquiry and analysis rather than transaction processing. It is a method of converting
data into information and making it available to people in a timely manner so that it
can be used to make a difference. It is created by combining data from a variety of
disparate sources to provide analytical reporting, structured and/or ad hoc queries,
and decision-making. Cleaning, integrating, and consolidating data are all part of
data warehousing.
The Data Warehouse is kept distinct from the operational database of the company.
It is an environment rather than a product. It is an information system's architectural
design that gives users access to current and historical decision-support data that's
difficult to find or find in a standard operational data store.
For example, a report on current inventory information may have more than 12
connected conditions. This can cause the query and report to take a long time to
respond. A data warehouse introduces a novel design that can help to improve query
performance and minimise response time for reporting and analytics.
The following are some alternative names for the data warehouse system:

Page 4 © Copyright by Interviewbit


Data Warehouse Interview Questions

Decision Support System (DSS)


Management Information System
Executive Information System
Analytic Application
Business Intelligence Solution
Now, let’s dive deep into the plethora of the most commonly asked Data Warehouse
Interview Questions and Answers.

Data Warehouse Interview Questions for Freshers


1. What do you mean by data mining? Differentiate between
data mining and data warehousing.

Page 5 © Copyright by Interviewbit


Data Warehouse Interview Questions

Data mining is the process of collecting information in order to find patterns, trends,
and usable data that will help a company to make data-driven decisions from large
amounts of data. In other words, Data Mining is the method of analysing hidden
patterns of data from various perspectives for categorization into useful data, which
is gathered and assembled in specific areas such as data warehouses, efficient
analysis, data mining algorithm, assisting decision making, and other data
requirements, ultimately resulting in cost-cutting and revenue generation. Data
mining is the process of automatically examining enormous amounts of data for
patterns and trends that go beyond simple analysis. Data mining estimates the
probability of future events by utilising advanced mathematical algorithms for data
segments.
Following are the differences between data warehousing and data mining:-

Page 6 © Copyright by Interviewbit


Data Warehouse Interview Questions

Data Warehousing Data Mining

A data warehouse is a database The technique of


system that is intended for examining data
analytical rather than transactional patterns is known as
purposes. data mining.

In data mining, data is


In data warehousing, data is saved
evaluated on a regular
on a regular basis.
basis.

With the assistance of


Engineers are the only ones that do technologists, business
data warehousing. users conduct data
mining.

Data mining is the


Data warehousing is the process of process of extracting
bringing all relevant data together. information from big
datasets.

Data Mining can be


Data warehousing can be referred referred to as a super
to as a subset of data mining. set of data
warehousing.

2. What do you mean by OLAP in the context of data


warehousing? What guidelines should be followed while
selecting an OLAP system?

Page 7 © Copyright by Interviewbit


Data Warehouse Interview Questions

OLAP is an acronym for On-Line Analytical Processing. OLAP is a so ware


technology classification that allows analysts, managers, and executives to get
insight into information through quick, reliable, interactive access to data that has
been converted from raw data to reflect the true dimensionality of the company as
perceived by the clients. OLAP allows for multidimensional examination of corporate
data while also allowing for complex estimations, trend analysis, and advanced data
modelling. It's rapidly improving the foundation for Intelligent Solutions, which
includes Business Performance Management, Strategy, Budgeting, Predicting,
Financial Documentation, Analysis, Modeling, Knowledge Discovery, and Data
Warehouses Reporting. End-clients can use OLAP to perform ad hoc record analysis in
several dimensions, giving them the information and understanding they need to
make better choices.
Following guidelines must be followed while selecting an OLAP system:-

Page 8 © Copyright by Interviewbit


Data Warehouse Interview Questions

Multidimensional Conceptual View: This is one of an OLAP system's most


important capabilities. It is feasible to use methods like slice and dice that
require a multidimensional view.
Transparency: Make the technology, the underlying data repository, computing
operations, and the disparate nature of source data completely accessible to
consumers. Users' efficiency and productivity are improved as a result of this
transparency.
Accessibility: OLAP systems must only allow access to the data that is truly
needed to do the analysis, giving clients a single, coherent, and consistent
picture. The OLAP system must map its own logical schema to the disparate
physical data storage, as well as to conduct any required transformations.
Consistent Reporting Performance: As the number of dimensions or the size of
the database grows, users should not experience any substantial reduction in
documenting performance. That is, as the number of dimensions grows, OLAP
performance should not deteriorate.
Client/Server Architecture: Make the OLAP tool's server component clever
enough that the various clients can be connected with minimal effort and
integration code. The server should be able to map and consolidate data from
disparate databases.
Generic Dimensionality: Each dimension in an OLAP method should be seen as
equal in terms of structure and operational capabilities. Select dimensions may
be granted additional operational capabilities, although such duties should be
available to all dimensions.
Dynamic Sparse Matrix Handling: To optimise sparse matrix handling by
adapting the physical schema to the unique analytical model being built and
loaded. When confronted with a sparse matrix, the system must be able to
dynamically assume the information distribution and change storage and access
in order to achieve and maintain a constant level of performance.
Multiuser Support: OLAP technologies must allow several users to access data
at the same time while maintaining data integrity and security.
Unrestricted cross-dimensional Operations: It gives techniques the ability to
determine dimensional order and to perform roll-up and drill-down operations
within and across dimensions.
Intuitive Data Manipulation: Reorientation (pivoting), drill-down and roll-up,
and other manipulations can be done intuitively and precisely on the cells of the
scientific model using point-and-click and drag-and-drop methods. It does away
with the need for a menu or several visits to the user interface.
Fl ibl R ti It id ffi i t t li t b ll i th
Page 9 © Copyright by Interviewbit
Data Warehouse Interview Questions

3. What do you understand about a fact table in the context of a


data warehouse? What are the different types of fact tables?
In a Data Warehouse system, a Fact table is simply a table that holds all of the facts or
business information that can be exposed to reporting and analysis when needed.
Fields that reflect direct facts, as well as foreign fields that connect the fact table to
other dimension tables in the Data Warehouse system, are stored in these tables.
Depending on the model type used to construct the Data Warehouse, a Data
Warehouse system can have one or more fact tables.
Following are the three types of fact tables:-

Page 10 © Copyright by Interviewbit


Data Warehouse Interview Questions

Transactional Fact Table: This is a very basic and fundamental view of


corporate processes. It can be used to depict the occurrence of an event at any
given time. The facts measure are only valid at that specific time and for that
specific incident. "One row per line in a transaction," according to the grain
associated with the transaction table. It typically comprises data at the detailed
level, resulting in a huge number of dimensions linked with it. It captures the
smallest or atomic level of dimension measurement. This allows the table to
provide users with extensive dimensional grouping, roll-up, and drill-down
reporting features. It's packed yet sparse at the same time. It can also be big at
the same time, depending on the number of events (transactions) that have
occurred.
Snapshot Fact Table: The snapshot depicts the condition of things at a specific
point in time, sometimes known as a "picture of the moment." It usually
contains a greater number of non-additive and semi-additive information. It aids
in the examination of the company's overall performance at regular and
predictable times. Unlike the transaction fact table, which adds a new row for
each occurrence of an event, this represents the performance of an activity at
the end of each day, week, month, or any other time interval. However, to
retrieve the detailed data in the transaction fact table, snapshot fact tables or
periodic snapshots rely on the transaction fact table. The periodic snapshot
tables are typically large and take up a lot of space.
Accumulating Fact Table: These are used to depict the activity of any process
with a well-defined beginning and end. Multiple data stamps are commonly
found in accumulating snapshots, which reflect the predictable stages or events
that occur over the course of a lifespan. There is sometimes an extra column
with the date that indicates when the row was last updated.

4. What do you mean by dimension table in the context of data


warehousing? What are the advantages of using a dimension
table?

Page 11 © Copyright by Interviewbit


Data Warehouse Interview Questions

A table in a data warehouse's star schema is referred to as a dimension table.


Dimensional data models, which are made up of fact and dimension tables, are used
to create data warehouses. Dimension tables contain dimension keys, values, and
attributes and are used to describe dimensions. It is usually of a tiny size. The number
of rows might range from a few to thousands. It is a description of the objects in the
fact table. The term "dimension table" refers to a collection or group of data
pertaining to any quantifiable occurrence. They serve as the foundation for
dimensional modelling. It includes a column that serves as a primary key, allowing
each dimension row or record to be uniquely identified. Through this key, it is linked
to the fact tables. When it's constructed, a system-generated key called the surrogate
key is used to uniquely identify the rows in the dimension.
Following are the advantages of using a dimension table :
It features a straightforward design.
It is simple to study and comprehend.
It stores data that has been de-normalized.
It aids in the preservation of historical data for any dimension.
It's simple to get info from it.
It's simple to build and put into action.
It provides the context for any business operation.

5. What are the different types of dimension tables in the


context of data warehousing?
Following are the different types of dimension tables in the context of data
warehousing:-

Page 12 © Copyright by Interviewbit


Data Warehouse Interview Questions

Page 13 © Copyright by Interviewbit


Data Warehouse Interview Questions

Slowly Changing Dimensions (SCD): Slowly changing dimensions are


dimension attributes that tend to vary slowly over time rather than at a regular
period of time. For example, the address and phone number may change, but
not on a regular basis. Consider the case of a man who travels to several nations
and must change his address according to the place he is visiting. This can be
accomplished in one of three ways:
Type 1: Replaces the value that was previously entered. This strategy is
simple to implement and aids in the reduction of costs by saving space.
However, in this circumstance, history is lost.
Type 2: Insert a new row containing the new value. This method saves the
history and allows it to be accessed at any time. However, it takes up a lot of
space, which raises the price.
Type 3: Add a new column to the table. It is the ideal strategy because
history can be easily preserved.
Junk Dimension: A trash dimension is a collection of low-cardinality attributes.
It contains a number of varied or disparate features that are unrelated to one
another. These can be used to implement RCD (rapidly changing dimension)
features like flags and weights, among other things.
Conformed Dimension: Multiple subject areas or data marts share this
dimension. It can be utilised in a variety of projects without requiring any
changes. This is used to keep things in order. Dimensions that are exactly the
same as or a proper subset of any other dimension are known as conformed
dimensions.
Roleplay Dimension: Role-play dimension refers to the dimension table that
has many relationships with the fact table. In other words, it occurs when the
same dimension key and all of its associated attributes are linked to a large
number of foreign keys in the fact table. Within the same database, it might
serve several roles.
Degenerate Dimension: Degenerate dimension attributes are those that are
contained in the fact table itself rather than in a separate dimension table. For
instance, a ticket number, an invoice number, a transaction number, and so on.

6. Differentiate between fact table and dimension table.

Page 14 © Copyright by Interviewbit


Data Warehouse Interview Questions

The record of a reality or fact table could be made up of attributes from various
dimension tables. The Fact Table, also known as the Reality Table, assists the user in
investigating the business aspects that aid him in call taking in order to improve his
firm. Dimension Tables, on the other hand, make it easier for the reality table or fact
table to collect dimensions from which measurements must be taken.
The following table enlists the difference between a fact table and a dimension
table:-

Page 15 © Copyright by Interviewbit


Data Warehouse Interview Questions

Fact Table Dimension Table

It contains the attributes' It is the companion table that


measurements, facts, or has the attributes that the fact
metrics. table uses to derive the facts.

Data grain (the most


atomic level by which facts It is detailed, comprehensive,
may be defined) is what and lengthy.
defines it.

It is used for analysis and It contains information


decision-making and regarding a company's
contains measures. operations and procedures.

It contains information in
It only contains textual
both numeric and textual
information.
formats.

It has a primary key that It has a foreign key that is


works as a foreign key in linked to the fact table's
the dimension table. primary key.

It stores the filter domain


It organizes the atomic data
and reports labels in
into dimensional structures.
dimension tables.

It does not have a


It has a hierarchy.
hierarchy.

It has lesser attributes than It has more attributes than a


a dimension table. fact table.

It has more records as


It has fewer records than a fact
compared to a dimension
table.
table.

Page 16 © Copyright by Interviewbit


Data Warehouse Interview Questions

7. What are the advantages of a data warehouse?


Following are the advantages of using a data warehouse:

Page 17 © Copyright by Interviewbit


Data Warehouse Interview Questions

Helps you save time:


To stay ahead of your competitors in today's fast-paced world of cutthroat
competition, your company's ability to make smart judgments quickly is
critical.
A Data warehouse gives you instant access to all of your essential data, so
you and your staff don't have to worry about missing a deadline. All you
have to do now is deploy your data model to start collecting data in a
matter of seconds. You can do this with most warehousing solutions
without utilising a sophisticated query or machine learning.
With data warehousing, your company won't have to rely on a technical
professional to troubleshoot data retrieval issues 24 hours a day, seven days
a week. You will save a lot of time this way.
Enhances the quality of data:
The high-quality data ensures that your company's policies are founded on
accurate information about your operations.
You can turn data from numerous sources into a shared structure using
data warehousing. You can assure the consistency and integrity of your
company's data this way. This allows you to spot and eliminate duplicate
data, inaccurately reported data and disinformation.
For your firm, implementing a data quality management program may be
both costly and time-consuming. You can easily use a data warehouse to
reduce the number of these annoyances while saving money and increasing
the general productivity of your company.
Enhances Business Intelligence (BI):
Throughout your commercial endeavours, you can use a data warehouse to
gather, absorb, and derive data from any source. As a result of the capacity
to easily consolidate data from several sources, your BI will improve by
leaps and bounds.
Data standardization and Consistency are achieved:
The uniformity of huge data is another key benefit of having central data
repositories. In a similar manner, a data storage or data mart might benefit
your company. Because data warehousing stores data from various sources
in a consistent manner, such as a transactional system, each source will
produce results that are synchronized with other sources. This ensures that
data is of higher quality and homogeneous. As a result, you and your team
can rest assured that your data is accurate, resulting in more informed
d
Page 18 © Copyright by Interviewbit
Data Warehouse Interview Questions

8. What are the disadvantages of using a data warehouse?


Following are the disadvantages of using a data warehouse:-
Loading time of data resources is undervalued: We frequently underestimate
the time it will take to gather, sanitize, and post data to the warehouse.
Although some resources are in place to minimize the time and effort spent on
the process, it may require a significant amount of the overall production time.
Source system flaws that go unnoticed: A er years of non-discovery, hidden
flaws linked with the source networks that provide the data warehouse may be
discovered. Some fields, for example, may accept nulls when entering new
property information, resulting in workers inputting incomplete property data,
even if it was available and relevant.
Homogenization of data: Data warehousing also deals with data formats that
are comparable across diverse data sources. It's possible that some important
data will be lost as a result.

9. What are the different types of data warehouse?


Following are the different types of data warehouse:

Page 19 © Copyright by Interviewbit


Data Warehouse Interview Questions

Enterprise Data Warehouse:


An enterprise database is a database that brings together the various
functional areas of an organisation in a cohesive manner. It's a centralised
location where all corporate data from various sources and apps can be
accessed. They can be utilised for analytics and by everyone in the
organisation once they've been saved. The data can be categorised by
subject, and access is granted according to the necessary division. The tasks
of extracting, converting, and conforming are taken care of in an Enterprise
Datawarehouse.
Enterprise Datawarehouse's purpose is to provide a comprehensive
overview of any object in the data model. This is performed by finding and
wrangling the data from different systems. This is then loaded into a model
that is consistent and conformed. The data is acquired by Enterprise
Datawarehouse, which can provide access to a single site where various
tools can be used to execute analytical functions and generate various
predictions. New trends or patterns can be identified by research teams,
which can then be focused on to help the company expand.
Operational Data Store (ODS):
An operational data store is utilised instead of having an operational
decision support system application. It facilitates data access directly from
the database, as well as transaction processing. By checking the associated
business rules, the data in the Operational Data Store may be cleansed, and
any redundancy found can be checked and rectified. It also aids in the
integration of disparate data from many sources so that business activities,
analysis, and reporting may be carried out quickly and effectively while the
process is still ongoing.
The majority of current operations are stored here before being migrated
to the data warehouse for a longer period of time. It is particularly useful for
simple searches and little amounts of data. It functions as short-term or
temporary memory, storing recent data. The data warehouse keeps data
for a long time and also keeps information that is generally permanent.
Data Mart:
Data Mart is referred to as a pattern to get client data in a data warehouse
environment. It's a data warehouse-specific structure that's employed by
the team's business domain. Every company has its own data mart, which is
kept in the data warehouse repository. Dependent, independent, and
hybrid data marts are the three types of data marts Independent data
Page 20 © Copyright by Interviewbit
Data Warehouse Interview Questions

10. What are the different types of data marts in the context of
data warehousing?
Following are the different types of data mart in data warehousing:

Page 21 © Copyright by Interviewbit


Data Warehouse Interview Questions

Dependent Data Mart: A dependent data mart can be developed using data
from operational, external, or both sources. It enables the data of the source
company to be accessed from a single data warehouse. All data is centralized,
which can aid in the development of further data marts.
Independent Data Mart: There is no need for a central data warehouse with this
data mart. This is typically established for smaller groups that exist within a
company. It has no connection to Enterprise Data Warehouse or any other data
warehouse. Each piece of information is self-contained and can be used
independently. The analysis can also be carried out independently. It's critical to
maintain a consistent and centralized data repository that numerous users can
access.
Hybrid Data Mart: A hybrid data mart is utilized when a data warehouse
contains inputs from multiple sources, as the name implies. When a user
requires an ad hoc integration, this feature comes in handy. This solution can be
utilized if an organization requires various database environments and quick
implementation. It necessitates the least amount of data purification, and the
data mart may accommodate huge storage structures. When smaller data-
centric applications are employed, a data mart is most effective.

11. Differentiate between data warehouse and database.


Database: A database is a logically organized collection of structured data kept
electronically in a computer system. A database management system is usually in
charge of a database (DBMS). The data, the DBMS, and the applications that go with
them are referred to as a database system, which is commonly abbreviated to just a
database.
The following table enlists the difference between data warehouse and database:-

Page 22 © Copyright by Interviewbit


Data Warehouse Interview Questions

Page 23 © Copyright by Interviewbit


Data Warehouse Interview Questions

Data Warehouse Database

Database uses the


OnLine
Data Warehouse uses the OnLine
Transactional
Analytical Processing (OLAP).
Processing
(OLTP).

The database aids


Data Warehouse is mainly used for
in the execution
analyzing the historical data so as to
of basic business
make future decisions based on them.
procedures.

A database's
Because a data warehouse is tables and joins
denormalized, tables and joins are are complicated
straightforward. because they are
normalised.

It can be referred
to as an
It can be referred to as a subject-
application-
oriented collection of data.
oriented
collection of data.

In this, Entity-
Relationship (ER)
In this, data modelling techniques are modelling
used for designing. techniques are
used for
designing.

Data is generally
Data may not be up to date in this.
up to date in this.

Page 24 © Copyright by Interviewbit


Data Warehouse Interview Questions

12. What do you mean by a factless fact table in the context of


data warehousing?
A fact table with no measures is known as a factless fact table. It's essentially a
crossroads of dimensions (it contains nothing but dimensional keys). One form of
factless table is used to capture an event, while the other is used to describe
conditions.
In the first type of factless fact table, there is no measured value for an event, but it
develops the relationship among the dimension members from several dimensions.
The existence of the relationship is itself the fact. This type of fact table can be
utilised to create valuable reports on its own. Various criteria can be used to count
the number of occurrences.
The second type of factless fact table is a tool that's used to back up negative
analytical reports. Consider a store that did not sell a product for a period of time. To
create such a report, you'll need a factless fact table that captures all of the
conceivable product combinations that were on offer. By comparing the factless
table to the sales table for the list of things that did sell, you can figure out what's
missing.

13. What do you mean by Real time data warehousing?


A system that reflects the condition of the warehouse in real time is referred to as
real-time data warehousing. If you perform a query on the real-time data warehouse
to learn more about a specific aspect of the company or entity described by the
warehouse, the result reflects the status of that entity at the time the query was run.
Most data warehouses contain data that is highly latent — that is, data that reflects
the business at a specific point in time. A real-time data warehouse provides current
(or real-time) data with low latency.

14. What do you mean by Active Data Warehousing?

Page 25 © Copyright by Interviewbit


Data Warehouse Interview Questions

The technical capacity to collect transactions as they change and integrate them into
the warehouse, as well as maintaining batch or planned cycle refreshes, is known as
active data warehousing. Automating routine processes and choices is possible with
an active data warehouse. The active data warehouse sends decisions to the On-Line
Transaction Processing (OLTP) systems automatically. An active data warehouse is
designed to capture and distribute data in real time. They give you a unified view of
your customers across all of your business lines. Business Intelligence Systems are
linked to it.

15. What are the characteristics of a data warehouse?


Following are the characteristics of a data warehouse:-

Page 26 © Copyright by Interviewbit


Data Warehouse Interview Questions

Subject-oriented : Because it distributes information about a theme rather than


an organization's actual operations, a data warehouse is always subject-
oriented. It is possible to do so with a certain theme. That is to say, the data
warehousing procedure is intended to deal with a more defined theme. These
themes could include sales, distribution, and marketing, for example. The focus
of a data warehouse is never solely on present activities. Instead, it concentrates
on demonstrating and analyzing evidence in order to reach diverse conclusions.
It also provides a simple and precise demonstration around a specific theme by
removing info that isn't needed to make conclusions.
Integrated : It is similar to subject orientation in that it is created in a
dependable format. Integration entails the creation of a single entity to scale all
related data from several databases. The data has to be stored in several data
warehouses in a shared and widely accessible manner. A data warehouse is
created by combining information from a variety of sources, such as a
mainframe and a relational database. It must also have dependable naming
conventions, formats, and codes. The utilization of a data warehouse allows for
more effective data analysis. The consistency of name conventions, column
scaling, and encoding structure, among other things, should be validated. The
data warehouse integration handles a variety of subject-related warehouses.
Time-Variant : Data is kept in this system at various time intervals, such as
weekly, monthly, or annually. It discovers a number of time limits that are
structured between massive datasets and held in the online transaction process
(OLTP). Data warehouse time limitations are more flexible than those of
operational systems. The data in the data warehouse is predictable over a set
period of time and provides information from a historical standpoint. It contains
explicit or implicit time elements. Another property of time-variance is that data
cannot be edited, altered, or updated once it has been placed in the data
warehouse.
Non-volatile : The data in a data warehouse is permanent, as the name implies.
It also means that when new data is put, it is not erased or removed. It
incorporates a massive amount of data that is placed into logical business
alteration between the designated quantity. It assesses the analysis in the
context of warehousing technologies. Data is read-only and refreshed at
scheduled intervals. This is useful for analyzing historical data and
understanding how things work. It is not required to have a transaction process,
a recapture mechanism, or a concurrency control mechanism. In a data
warehouse environment, operations like delete, update, and insert that are
f di ti l li ti l t
Page 27 © Copyright by Interviewbit
Data Warehouse Interview Questions

16. What do you understand about metadata and why is it used


for?

Metadata is defined as information about data. Metadata is the context that provides
data a more complete identity and serves as the foundation for its interactions with
other data. It can also be a useful tool for saving time, staying organised, and getting
the most out of the files you're working with. Structural Metadata describes how an
object should be classified in order to fit into a wider system of things. Structural
Metadata makes a link with other files that allows them to be categorized and used in
a variety of ways. Administrative Metadata contains information about an object's
history, who owned it previously, and what it can be used for. Rights, licences, and
permissions are examples. This information is useful for persons who are in charge of
managing and caring for an asset.
When a piece of information is placed in the correct context, it takes on a whole new
meaning. Furthermore, better-organized Metadata will considerably reduce search
time.

17. Enlist a few data warehouse solutions that are currently


being used in the industry.
Some of the major data warehouse solutions currently being used in the industry are
as follows :

Page 28 © Copyright by Interviewbit


Data Warehouse Interview Questions

Snowflakes
Oracle Exadata
Apache Hadoop
SAP BW4HANA
Microfocus Vertica
Teradata
AWS Redshi
GCP Big Query

18. Enlist some of the renowned ETL tools currently used in the
industry.
Some of the renowned ETL tools currently used in the industry are as follows :
Informatica
Talend
Pentaho
Abnitio
Oracle Data Integrator
Xplenty
Skyvia
Microso – SQL Server Integrated Services (SSIS)

19. Explain what you mean by a star schema in the context of


data warehousing.
Star schema is a sort of multidimensional model and is used in a data warehouse.
The fact tables and dimension tables are both contained in the star schema. There
are fewer foreign-key joins in this design. With fact and dimension tables, this schema
forms a star.

Page 29 © Copyright by Interviewbit


Data Warehouse Interview Questions

20. What do you mean by snowflake schema in the context of


data warehousing?
Snowflake Schema is a multidimensional model that is also used in data warehouses.
The fact tables, dimension tables, and sub dimension tables are all contained in the
snowflake schema. With fact tables, dimension tables, and sub-dimension tables, this
schema forms a snowflake.

Page 30 © Copyright by Interviewbit


Data Warehouse Interview Questions

21. What do you understand about a data cube in the context of


data warehousing?
A data cube is a multidimensional data model that stores optimized, summarized, or
aggregated data for quick and easy analysis using OLAP technologies. The
precomputed data is stored in a data cube, which makes online analytical processing
easier. We all think of a cube as a three-dimensional structure, however in data
warehousing, an n-dimensional data cube can be implemented. A data cube stores
information in terms of dimensions and facts.

Page 31 © Copyright by Interviewbit


Data Warehouse Interview Questions

Data Cubes have two categories. They are as follows :


Multidimensional Data Cube : Data is stored in multidimensional arrays, which
allows for a multidimensional view of the data. A multidimensional data cube
aids in the storage of vast amounts of information. A multidimensional data
cube uses indexing to represent each dimension of the data cube, making it
easier to access, retrieve, and store data.
Relational Data Cube : The relational data cube can be thought of as an
"expanded version of relational DBMS." Data is stored in relational tables, and
each relational table represents a data cube's dimension. The relational data
cube uses SQL to produce aggregated data, although it is slower than the
multidimensional data cube in terms of performance. The relational data cube,
on the other hand, is scalable for data that grows over time.

Data Warehouse Questions for Experienced


22. Explain the architecture of a data warehouse.
A data warehouse is a single schema that organizes a heterogeneous collection of
multiple data sources. There are two techniques to building a data warehouse. They
are as follows:

Page 32 © Copyright by Interviewbit


Data Warehouse Interview Questions

Top-Down Approach in Data Warehouse:

Following are the major components :

Page 33 © Copyright by Interviewbit


Data Warehouse Interview Questions

External Sources - An external source is a location from which data is collected,


regardless of the data format. Structured, semi-structured, and unstructured
data are all possibilities.
Stage Area - Because the data gathered from external sources does not follow a
specific format, it must be validated before being loaded into the data
warehouse. ETL tool is used for this purpose in the stage area.
Data-warehouse - A er data has been cleansed, it is kept as a central repository
in the data warehouse. The meta data is saved here, while the real data is
housed in data marts. In this top-down approach, the data warehouse stores the
data in its purest form.
Data Marts - A data mart is a storage component as well. It maintains
information about a single organization's function that is managed by a single
authority. Depending on the functions, an organization can have as many data
marts as it wants.
Data Mining - Data mining is the process of analyzing large amounts of data in a
data warehouse. With the use of a data mining algorithm, it is used to discover
hidden patterns in databases and data warehouses.
Bottom Up Approach in Data Warehouse:

Following are the steps involved in the bottom up approach:

Page 34 © Copyright by Interviewbit


Data Warehouse Interview Questions

The data is first gathered from external sources (same as happens in top-down
approach).
The data is then imported into data marts rather than data warehouses a er
passing through the staging area (as stated above). The data marts are built first,
and they allow for reporting. It focuses on a specific industry.
A er that, the data marts are incorporated into the data warehouse.

23. What are the advantages and disadvantages of the top down
approach of data warehouse architecture?
Following are the advantages of the top down approach :
Because data marts are formed from data warehouses, they have a consistent
dimensional perspective.
This methodology is also thought to be the most effective for corporate reforms.
As a result, large corporations choose to take this method.
It is simple to create a data mart from a data warehouse.
The disadvantage of the top down approach is that the cost, time, and effort
required to design and maintain it are all very expensive.

24. What are the advantages and disadvantages of the bottom


up approach of data warehouse architecture?
Following are the advantages of the bottom up approach :
The reports are generated quickly since the data marts are created first.
We can fit a greater number of data marts here, allowing us to expand our data
warehouse.
In addition, the cost and effort required to build this model are quite minimal.
Because the dimensional view of data marts is not consistent as it is in the top-down
approach, this model is not as strong as the top-down approach and this is a
disadvantage of the bottom up approach.

25. Differentiate between a data warehouse and a data mart.


Following table enlists the difference between a data warehouse and a data mart:

Page 35 © Copyright by Interviewbit


Data Warehouse Interview Questions

Page 36 © Copyright by Interviewbit


Data Warehouse Interview Questions

Data Warehouse Data Mart

A data mart is a Data


A data warehouse is a huge
Warehouse's single
collection of data gathered from
subtype. It is created to
several departments or groups
fulfill the requirements
inside a company.
of a certain user group.

It aids in the making of


It aids in strategic decision-
tactical business
making.
decisions.

The process of designing a Data The Data Mart design


Warehouse is fairly challenging. procedure is simple.

Because they can only


Data warehousing involves a big handle tiny amounts of
portion of the company, which is data, data marts are
why it takes so long to process. simple to use, create,
and install.

The primary goal of a Data


A data mart is primarily
Warehouse is to create a unified
used at the department
environment and a consistent
level in a business
view of the business at any given
division.
moment in time.

Data Marts are designed


When opposed to data mart, the
for certain user groups.
data kept in the Data Warehouse
As a result, the data is
is always detailed.
brief and limited.

Data in Data Mart comes


Data is collected from a variety of
from a limited number of
sources in a data warehouse.
sources.

Page 37 © Copyright by Interviewbit


Data Warehouse Interview Questions

26. What do you mean by data purging in the context of data


warehousing?
Data purging is a term that describes techniques for permanently erasing and
removing data from a storage space. Data purging, which is typically contrasted with
data deletion, involves a variety of procedures and techniques.

Purging removes data permanently and frees up memory or storage space for other
purposes, whereas deletion is commonly thought of as a temporary preference.
Automatic data purging features are one of the methods for data cleansing in
database administration. Some Microso products, for example, feature an
automatic purge strategy that uses a circular buffer mechanism, in which older data
is purged to create room for fresh data. Administrators must manually remove data
from the database in other circumstances.

27. What do you mean by dimensional modelling in the context


of data warehousing?

Page 38 © Copyright by Interviewbit


Data Warehouse Interview Questions

Dimensional Modelling (DM) is a data structure technique that is specifically designed


for data storage in a data warehouse. The goal of dimensional modelling is to
optimise the database so that data can be retrieved more quickly. In a data
warehouse, a dimensional model is used to read, summarise, and analyse numeric
data such as values, balances, counts, weights, and so on. Relation models, on the
other hand, are designed for adding, modifying, and deleting data in a real-time
Online Transaction System.
Following are the steps that should be followed while creating a dimensional model:

Page 39 © Copyright by Interviewbit


Data Warehouse Interview Questions

Identifying the business process : The first step is to identify the specific
business processes that a data warehouse should address. This might be
Marketing, Sales, or Human Resources, depending on the organization's data
analytic needs. The quality of data available for that process is also a factor in
deciding which business process to use. It is the most crucial step in the Data
Modeling process, and a failure here would result in a cascade of irreversible
flaws.
Identifying the grain : The level of detail for the business problem/solution is
described by the grain. It's the procedure for determining the lowest level of
data in any table in your data warehouse. If a table contains sales data for each
day, the granularity should be daily. Monthly granularity is defined as a table that
contains total sales data for each month.
Identifying the dimension : Date, shop, inventory, and other nouns are
examples of dimensions. All of the data should be saved in these dimensions.
The date dimension, for example, could include information such as the year,
month, and weekday.
Identifying the fact : This stage is linked to the system's business users because
it is here that they gain access to data housed in the data warehouse. The
majority of the rows in the fact table are numerical values such as price or cost
per unit.
Building the schema : The Dimension Model is implemented in this step. The
database structure is referred to as a schema (arrangement of tables).

28. What do you understand by data lake in the context of data


warehousing? Differentiate between data lake and data
warehouse.
A Data Lake is a large-scale storage repository for structured, semi-structured, and
unstructured data. It's a location where you can save any type of data in its original
format, with no restrictions on account size or file size. It provides a significant
amount of data for improved analytical performance and native integration.

Page 40 © Copyright by Interviewbit


Data Warehouse Interview Questions

A data lake is a huge container that looks a lot like a lake or a river. Similar to how a
lake has various tributaries, a data lake has structured data, unstructured data,
machine-to-machine communication, and logs flowing through in real-time.
The following table enlists the differences between data lake and data warehouse:

Page 41 © Copyright by Interviewbit


Data Warehouse Interview Questions

Data Lake Data Warehouse

Data extracted from


All data is stored in the data transactional systems or
lake, regardless of its source data consisting of
or structure. The data is quantitative measures and
stored in its unprocessed their properties will be
state. When it is ready to be stored in a data warehouse.
used, it is converted. The information has been
cleansed and changed.

Captures semi-structured Captures structured data


and unstructured data in and organises it according to
their original form from defined standards for data
source systems. warehouse purposes.

The data lake is appropriate


for those that perform in-
Because it is highly
depth analysis. Data
structured, easy to use, and
scientists, for example,
understand, the data
require advanced analytical
warehouse is perfect for
techniques that include
operational users.
predictive modelling and
statistical analysis.

The cost of storing data in big


Data warehouse storage is
data technology is less than
more expensive and time-
that of storing data in a data
consuming.
warehouse.

The schema is usually


developed a er the data has Schema is usually defined
been stored. This provides a before data is saved. Work is
great level of flexibility and required at the start of the
convenience of data process, but performance,
collecting, but it necessitates security, and integration are
labour at the end of the all advantages
Page 42 © Copyright by Interviewbit
Data Warehouse Interview Questions

29. Differentiate between star schema and snowflake schema in


the context of data warehousing.
Following table enlists the difference between the star schema and the snowflake
schema:

Page 43 © Copyright by Interviewbit


Data Warehouse Interview Questions

Star Schema Snowflake Schema

The fact tables and The fact tables, dimension tables,


dimension tables are and sub dimension tables are all
both contained in the contained in the snowflake
star schema. schema.

It is a top-down model. It is, however, a bottom-up model.

The star schema takes


It takes up less space.
up more room.

Snowflake schema has a higher


Star schema has a low
query complexity than star
query complexity.
schema.

It is really simple to
It is tough to comprehend.
comprehend.

It contains less foreign It has a greater number of foreign


keys. keys.

It has a lot of It has a low level of data


redundancy in its data. redundancy.

The execution of The execution of queries takes


queries takes less time. longer than star schema.

Normalization is not
Both normalisation and
employed in the star
denormalization are used in this.
schema.

It has a pretty simple


It has a complicated design.
design.

Page 44 © Copyright by Interviewbit


Data Warehouse Interview Questions

30. Differentiate between Agglomerative hierarchical clustering


and Divisive clustering.
Agglomerative hierarchical clustering : Flat clustering returns an unstructured set
of clusters. On the other hand, this structure is more informative. We don't have to
define the number of clusters in advance with this clustering procedure. Bottom-up
algorithms start by treating each piece of data as a singleton cluster, then
agglomerate pairs of clusters until all of them are merged into a single cluster that
contains all of the data.
Divisive Clustering : This approach also eliminates the need to define the number of
clusters ahead of time. It necessitates a method for breaking a cluster that contains
all of the data and then recursively splitting clusters until all of the data has been split
into singletons.
Following are the differences between the two :
When compared to agglomerative clustering, divisive clustering is more
complicated since we require a flat clustering algorithm as a "subroutine" to
split each cluster until each data has its own singleton cluster.
If we don't create a complete hierarchy all the way down to individual data
leaves, divisive clustering is more efficient.
A divisive algorithm is also more precise. Without first examining the global
distribution of data, agglomerative clustering makes judgments based on local
patterns or neighbour points. These early decisions are irreversible. When
generating top-level dividing decisions, divisive clustering takes into account the
global distribution of data.

31. What are the advantages of a cloud based data warehouse?


Following are the advantages of a cloud-based data warehouse:

Page 45 © Copyright by Interviewbit


Data Warehouse Interview Questions

Page 46 © Copyright by Interviewbit


Data Warehouse Interview Questions

Total cost of ownership is low: The low cost of cloud data warehouses is one of
the reasons they are becoming more popular. On-premises data warehouses
necessitate high-cost technology, lengthy upgrades, ongoing maintenance, and
outage management.
Increased performance and speed: To keep up with the expanding number of
data sources, cloud data warehouses are crucial. Cloud data warehouses can
easily and quickly integrate with additional data sources as needed, and deploy
the updated solution to production. Cloud data warehouses significantly
improve speed and performance, allowing IT to focus on more innovative
projects.
Enhanced Security: Cloud security engineers can create and iterate on precise
data-protection measures. Furthermore, cloud encryption technologies such as
multi-factor authentication make data transfer between regions and resources
extremely safe.
Improved Disaster Recovery: Physical assets are not required to prepare cloud
data warehouses for disasters. Instead, almost all cloud data warehouses offer
asynchronous data duplication and execute automatic snapshots and backups.
This data is kept across multiple nodes, allowing duplicate data to be accessed
at any time without stopping present activity.

Conclusion:

In this article, we have covered the most frequently asked interview questions on
data warehousing. ETL tools are o en required in a data warehouse and so one can
expect interview questions on ETL tools as well in a data warehouse interview.
References and Resources:
Data Warehouse Concepts
Data Engineer
ETL Testing
Azure

Page 47 © Copyright by Interviewbit


Data Warehouse Interview Questions

DBMS

Page 48 © Copyright by Interviewbit


Links to More Interview
Questions

C Interview Questions Php Interview Questions C Sharp Interview Questions

Web Api Interview Hibernate Interview Node Js Interview Questions


Questions Questions

Cpp Interview Questions Oops Interview Questions Devops Interview Questions

Machine Learning Interview Docker Interview Questions Mysql Interview Questions


Questions

Css Interview Questions Laravel Interview Questions Asp Net Interview Questions

Django Interview Questions Dot Net Interview Questions Kubernetes Interview


Questions

Operating System Interview React Native Interview Aws Interview Questions


Questions Questions

Git Interview Questions Java 8 Interview Questions Mongodb Interview


Questions

Dbms Interview Questions Spring Boot Interview Power Bi Interview Questions


Questions

Pl Sql Interview Questions Tableau Interview Linux Interview Questions


Questions

Ansible Interview Questions Java Interview Questions Jenkins Interview Questions

Page 49 © Copyright by Interviewbit

You might also like