Best Data Warehouse Interview Questions
© Copyright by Interviewbit
Data warehousing (DW) is a method of gathering and analysing data from many
sources in order to get useful business insights. Typically, a data warehouse is used to
integrate and analyse corporate data from many sources. The data warehouse is the
heart of the business intelligence (BI) system, which is designed to analyse and report
on data.
It is a collection of technologies and components that support an organisation's data strategy. It refers to a company's electronic storage of a huge volume of data that is intended for querying and analysis rather than transaction processing. It is a method of converting data into information and making it available to people in a timely manner so that it can inform decisions. It is created by combining data from a variety of disparate sources to support analytical reporting, structured and/or ad hoc queries, and decision-making. Cleaning, integrating, and consolidating data are all part of data warehousing.
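The cleaning, integrating, and consolidating steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two source systems (a CRM export and a billing export), their field names, and the `consolidate` helper are all hypothetical, chosen only to show inconsistent formats being reconciled into one table.

```python
from datetime import datetime

# Hypothetical extracts from two source systems with different conventions.
crm_rows = [
    {"cust_id": "C1", "name": " alice smith ", "signup": "2023-01-05"},
    {"cust_id": "C2", "name": "BOB JONES", "signup": "2023-02-10"},
]
billing_rows = [
    {"customer": "C2", "full_name": "Bob Jones", "joined": "10/02/2023"},
    {"customer": "C3", "full_name": "carol white", "joined": "15/03/2023"},
]

def clean_name(raw: str) -> str:
    # Cleaning: collapse extra whitespace and normalise capitalisation.
    return " ".join(raw.split()).title()

def consolidate(crm, billing):
    """Integrate both sources into one customer table keyed by customer ID."""
    warehouse = {}
    for row in crm:
        warehouse[row["cust_id"]] = {
            "name": clean_name(row["name"]),
            # CRM dates are ISO formatted (YYYY-MM-DD).
            "signup_date": datetime.strptime(row["signup"], "%Y-%m-%d").date(),
        }
    for row in billing:
        # Billing dates use DD/MM/YYYY; convert them to the same date type.
        record = {
            "name": clean_name(row["full_name"]),
            "signup_date": datetime.strptime(row["joined"], "%d/%m/%Y").date(),
        }
        # Consolidation: keep the first version seen, so CRM wins on conflicts.
        warehouse.setdefault(row["customer"], record)
    return warehouse

customers = consolidate(crm_rows, billing_rows)
for cust_id, rec in sorted(customers.items()):
    print(cust_id, rec["name"], rec["signup_date"])
```

Customer C2 appears in both sources but ends up as a single consolidated record; the "CRM wins" rule is just one possible conflict policy.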
The data warehouse is kept separate from the company's operational database. It is an environment rather than a product: an architectural design for information systems that gives users access to current and historical decision-support data that is difficult to access or find in a standard operational data store.
For example, a report on current inventory information may involve more than 12 join conditions, which can make the query and report slow to respond. A data warehouse introduces a new design that can help improve query performance and minimise response time for reporting and analytics.
The following are some alternative names for the data warehouse system:
Data mining is the process of analysing large amounts of data to find patterns, trends, and useful information that help a company make data-driven decisions. In other words, data mining examines hidden patterns in data from various perspectives and categorises them into useful information. This data is gathered and assembled in common areas such as data warehouses, where it supports efficient analysis, data mining algorithms, decision-making, and other data requirements, ultimately helping to cut costs and generate revenue. Data mining automatically examines enormous amounts of data for patterns and trends that go beyond simple analysis, using advanced mathematical algorithms to segment the data and estimate the probability of future events.
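As a toy illustration of pattern discovery, the sketch below counts how often pairs of items are bought together and keeps the pairs whose support (the share of transactions containing the pair) meets a threshold. This is similar to the support-counting step in frequent-itemset algorithms such as Apriori, not a complete data mining system. The transactions and the 0.5 threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Invented point-of-sale transactions (each one is a set of purchased items).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support is at least min_support."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(transactions, min_support=0.5))
# {('bread', 'milk'): 0.5, ('butter', 'milk'): 0.5}
```

On a real dataset, the same idea runs over millions of rows, which is why data mining is usually done against a warehouse rather than an operational system.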
The following are the differences between data warehousing and data mining:
A record in a fact table (also called a reality table) can be made up of attributes from various dimension tables. The fact table helps users investigate the business aspects that support decision-making, in order to improve their firm. Dimension tables, on the other hand, supply the dimensions along which the measurements recorded in the fact table are analysed.
The following table lists the differences between a fact table and a dimension table:
Fact Table | Dimension Table
It contains information in both numeric and textual formats. | It contains only textual information.
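A small star schema makes the fact/dimension split concrete. The fact table holds foreign keys plus numeric measures, and the dimension tables hold the descriptive text. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`fact_sales`, `dim_product`, `dim_store`) are illustrative and do not come from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive, mostly textual attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    -- Fact table: foreign keys to the dimensions plus numeric measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        quantity   INTEGER,
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_store   VALUES (10, 'Pune'), (20, 'Delhi');
    INSERT INTO fact_sales  VALUES (1, 10, 2, 1200.0), (2, 10, 1, 150.0), (1, 20, 1, 600.0);
""")

# Typical analytical query: aggregate the measures, sliced by dimension attributes.
rows = conn.execute("""
    SELECT p.category, s.city, SUM(f.amount) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.category, s.city
    ORDER BY p.category, s.city
""").fetchall()
for row in rows:
    print(row)
```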
10. What are the different types of data marts in the context of data warehousing?
The following are the different types of data marts in data warehousing:
Dependent Data Mart: A dependent data mart draws its data from a central data warehouse, which is itself populated from operational sources, external sources, or both. It allows the organisation's data to be accessed from a single data warehouse. Because all data is centralized, this can aid in the development of further data marts.
Independent Data Mart: This type of data mart does not need a central data warehouse. It is typically established for smaller groups within a company and has no connection to the Enterprise Data Warehouse or any other data warehouse. Each piece of information is self-contained and can be used independently, and the analysis can also be carried out independently. It is still critical to maintain a consistent and centralized data repository that numerous users can access.
Hybrid Data Mart: As the name implies, a hybrid data mart combines input from a data warehouse with input from other sources. This comes in handy when a user requires ad hoc integration. This solution can be used if an organization requires various database environments and quick implementation. It requires the least amount of data cleansing, and the data mart can accommodate large storage structures. A data mart is most effective when smaller data-centric applications are employed.
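A dependent data mart can be pictured as a subset carved out of the central warehouse for one department. The sketch below uses Python's `sqlite3` module in memory to build a hypothetical `sales_mart` table from a hypothetical `warehouse_orders` table with `CREATE TABLE ... AS SELECT`. In practice, a mart would usually be refreshed by the ETL process rather than created once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Central warehouse table holding data for every department.
    CREATE TABLE warehouse_orders (order_id INTEGER, department TEXT, region TEXT, amount REAL);
    INSERT INTO warehouse_orders VALUES
        (1, 'Sales', 'North', 100.0),
        (2, 'Sales', 'South', 250.0),
        (3, 'HR',    'North',  40.0);

    -- Dependent data mart: a department-specific subset derived from the warehouse,
    -- so it inherits the warehouse's cleaned, consistent definitions.
    CREATE TABLE sales_mart AS
        SELECT order_id, region, amount
        FROM warehouse_orders
        WHERE department = 'Sales';
""")

mart_rows = conn.execute("SELECT * FROM sales_mart ORDER BY order_id").fetchall()
print(mart_rows)  # [(1, 'North', 100.0), (2, 'South', 250.0)]
```

An independent data mart would instead load `sales_mart` directly from the source systems, with no `warehouse_orders` table in between.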
Data Warehouse | Database
Because a data warehouse is denormalized, tables and joins are straightforward. | A database's tables and joins are complicated because they are normalised.
It can be referred to as a subject-oriented collection of data. | It can be referred to as an application-oriented collection of data.
In this, dimensional data modelling techniques are used for designing. | In this, Entity-Relationship (ER) modelling techniques are used for designing.
Data may not be up to date in this. | Data is generally up to date in this.
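The contrast between normalised database tables and a denormalised warehouse table can be illustrated by joining normalised tables once, at load time, to produce a wide reporting table. The sketch below uses Python's `sqlite3` module. The `customers`, `orders`, and `orders_wide` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalised (OLTP-style) tables: each fact is stored once, linked by keys.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'Mumbai'), (2, 'Ravi', 'Chennai');
    INSERT INTO orders VALUES (100, 1, 50.0), (101, 1, 75.0), (102, 2, 20.0);

    -- Denormalised (warehouse-style) table: the join is done once at load time,
    -- so customer attributes are repeated on every order row.
    CREATE TABLE orders_wide AS
        SELECT o.order_id, c.name, c.city, o.amount
        FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id;
""")

# Reporting now needs no join: just filter and aggregate the wide table.
total_mumbai = conn.execute(
    "SELECT SUM(amount) FROM orders_wide WHERE city = ?", ("Mumbai",)
).fetchone()[0]
print(total_mumbai)  # 125.0
```

The trade-off is redundancy. Asha's city is stored twice in `orders_wide`, which is acceptable in a read-mostly warehouse but would complicate updates in a transactional database.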
Active data warehousing is the technical capability to capture transactions as they change and integrate them into the warehouse, while also maintaining batch or scheduled refresh cycles. An active data warehouse makes it possible to automate routine processes and decisions, and it can feed decisions back to the On-Line Transaction Processing (OLTP) systems automatically. It is designed to capture and distribute data in real time, gives you a unified view of your customers across all of your business lines, and is linked to business intelligence systems.
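The core idea of an active data warehouse is to apply changes as they happen instead of waiting for the next batch load. It can be sketched as a stream of change events applied to a warehouse table. The event format below (`op`, `key`, `value`) and the `apply_changes` helper are assumptions made for illustration. Real systems typically obtain such events from change-data-capture tooling.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    op: str                       # "insert", "update" or "delete" (hypothetical format)
    key: str                      # business key of the affected record
    value: Optional[dict] = None  # new row contents; unused for deletes

def apply_changes(table: dict, events: list) -> dict:
    """Apply source-system changes to the warehouse copy in arrival order."""
    for event in events:
        if event.op in ("insert", "update"):
            # Upsert: create the row or overwrite it with the latest values.
            table[event.key] = event.value
        elif event.op == "delete":
            table.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return table

warehouse_inventory = {"SKU-1": {"qty": 10}}
stream = [
    ChangeEvent("update", "SKU-1", {"qty": 7}),
    ChangeEvent("insert", "SKU-2", {"qty": 3}),
    ChangeEvent("delete", "SKU-1"),
]
print(apply_changes(warehouse_inventory, stream))  # {'SKU-2': {'qty': 3}}
```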
Metadata is data about data. It is the context that gives data a more complete identity and serves as the foundation for its interactions with other data. It can also be a useful tool for saving time, staying organised, and getting the most out of the files you are working with. Structural metadata describes how an object should be classified in order to fit into a wider system of things. It links the object with other files so that they can be categorised and used in a variety of ways. Administrative metadata contains information about an object's history, who owned it previously, and what it can be used for. Rights, licences, and permissions are examples. This information is useful for the people who are in charge of managing and caring for an asset.
When a piece of information is placed in the correct context, it takes on a whole new meaning. Furthermore, well-organised metadata considerably reduces search time.
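Structural metadata is something most databases expose directly. For example, SQLite reports a table's column names, declared types, and key information through `PRAGMA table_info`. The sketch below reads that information with Python's `sqlite3` module. The `employees` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL
    )
""")

# Each row of PRAGMA table_info is (cid, name, type, notnull, dflt_value, pk):
# data about the table's structure rather than about any employee.
columns = [
    {"name": name, "type": col_type, "not_null": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, col_type, notnull, _default, pk in conn.execute("PRAGMA table_info(employees)")
]
for col in columns:
    print(col)
```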
Snowflake
Oracle Exadata
Apache Hadoop
SAP BW/4HANA
Micro Focus Vertica
Teradata
AWS Redshift
GCP BigQuery
18. List some of the renowned ETL tools currently used in the industry.
Some of the renowned ETL tools currently used in the industry are as follows:
Informatica
Talend
Pentaho
Ab Initio
Oracle Data Integrator
Xplenty
Skyvia
Microsoft SQL Server Integration Services (SSIS)
The data is first gathered from external sources (as in the top-down approach).
The data then passes through the staging area (as stated above) and is imported into data marts rather than into the data warehouse. The data marts are built first, and they provide reporting capability. Each data mart focuses on a specific business area.
After that, the data marts are integrated into the data warehouse.
23. What are the advantages and disadvantages of the top-down approach of data warehouse architecture?
The following are the advantages of the top-down approach:
Because data marts are formed from data warehouses, they have a consistent
dimensional perspective.
This methodology is also considered the most effective for accommodating business changes.
As a result, large corporations prefer this approach.
It is simple to create a data mart from a data warehouse.
The disadvantage of the top-down approach is that the cost, time, and effort required to design and maintain it are all very high.
Purging removes data permanently and frees up memory or storage space for other purposes, whereas deletion is commonly thought of as a temporary removal. Automatic data purging is one of the methods for data cleansing in database administration. Some Microsoft products, for example, feature an automatic purge strategy that uses a circular buffer mechanism, in which older data is purged to make room for new data. In other circumstances, administrators must manually remove data from the database.
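The circular-buffer purge strategy can be illustrated with Python's `collections.deque`. When a deque is created with a `maxlen`, it discards the oldest entry each time a new one is appended to a full buffer. This is only an analogy for how such a purge policy behaves. It is not how any particular Microsoft product implements it, and the log entries are invented.

```python
from collections import deque

# A fixed-capacity log: once 3 entries are stored, each append purges the oldest.
audit_log = deque(maxlen=3)
for entry in ["login", "query", "export", "logout", "login"]:
    audit_log.append(entry)

print(list(audit_log))  # ['export', 'logout', 'login']
```

The first two entries are gone for good, which matches purging. No separate cleanup step ever runs; capacity is reclaimed as a side effect of writing new data.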
Identifying the business process: The first step is to identify the specific business processes that the data warehouse should address. This might be marketing, sales, or human resources, depending on the organization's data analytics needs. The quality of the data available for a process is also a factor in deciding which business process to model. This is the most crucial step in the data modeling process, and a failure here results in a cascade of irreversible flaws.
Identifying the grain: The grain describes the level of detail for the business problem or solution. Identifying it means determining the lowest level of detail stored in each table of the data warehouse. If a table contains sales data for each day, its grain is daily. A table that contains total sales for each month has a monthly grain.
Identifying the dimensions: Date, shop, inventory, and other nouns are examples of dimensions, and the descriptive data should be stored in these dimensions. The date dimension, for example, could include attributes such as the year, month, and weekday.
Identifying the facts: This stage is linked to the system's business users because it is here that they gain access to the data housed in the data warehouse. Most of the values in a fact table are numerical measures, such as price or cost per unit.
Building the schema: The dimensional model is implemented in this step. The database structure (the arrangement of tables) is referred to as a schema.
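Changing the grain is essentially an aggregation. The sketch below rolls hypothetical daily sales rows (daily grain) up to one row per month (monthly grain). The reverse is impossible, because daily figures cannot be recovered from a monthly total. This is why the grain is usually chosen at the lowest level of detail the business needs.

```python
from collections import defaultdict
from datetime import date

# Daily grain: one row per day (invented figures).
daily_sales = [
    (date(2024, 1, 30), 120.0),
    (date(2024, 1, 31), 80.0),
    (date(2024, 2, 1), 200.0),
    (date(2024, 2, 2), 50.0),
]

def to_monthly_grain(rows):
    """Aggregate daily rows into monthly totals keyed by (year, month)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)

print(to_monthly_grain(daily_sales))  # {(2024, 1): 200.0, (2024, 2): 250.0}
```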
A data lake is a huge storage repository, named by analogy with a lake. Just as a lake is fed by various tributaries, a data lake has structured data, unstructured data, machine-to-machine communication, and logs flowing into it in real time.
The following table lists the differences between a data lake and a data warehouse:
Star Schema | Snowflake Schema
It is really simple to comprehend. | It is tough to comprehend.
Normalization is not employed in the star schema. | Both normalisation and denormalisation are used in this.
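The difference between star and snowflake schemas shows up in the dimension tables. In the sketch below, the star version stores the category name directly in the product dimension. The snowflake version normalises the category into a separate `dim_category` table, which costs one extra join at query time. The example uses hypothetical tables with Python's `sqlite3` module in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0), (1, 60.0);

    -- Star schema: one denormalised dimension table per dimension.
    CREATE TABLE dim_product_star (product_id INTEGER PRIMARY KEY, name TEXT, category_name TEXT);
    INSERT INTO dim_product_star VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');

    -- Snowflake schema: the dimension is normalised into product and category tables.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product_snow (product_id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Kitchen');
    INSERT INTO dim_product_snow VALUES (1, 'Pen', 1), (2, 'Mug', 2);
""")

star_query = """
    SELECT p.category_name, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_star p ON p.product_id = f.product_id
    GROUP BY p.category_name ORDER BY p.category_name
"""
snowflake_query = """
    SELECT c.category_name, SUM(f.amount) FROM fact_sales f
    JOIN dim_product_snow p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id  -- the extra join
    GROUP BY c.category_name ORDER BY c.category_name
"""
star_result = conn.execute(star_query).fetchall()
snowflake_result = conn.execute(snowflake_query).fetchall()
print(star_result)                      # [('Kitchen', 40.0), ('Stationery', 160.0)]
print(star_result == snowflake_result)  # True
```

Both designs answer the same question. The snowflake version avoids repeating category names, while the star version keeps queries simpler.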
Total cost of ownership is low: The low cost of cloud data warehouses is one of
the reasons they are becoming more popular. On-premises data warehouses
necessitate high-cost technology, lengthy upgrades, ongoing maintenance, and
outage management.
Increased performance and speed: To keep up with the expanding number of
data sources, cloud data warehouses are crucial. Cloud data warehouses can
easily and quickly integrate with additional data sources as needed, and deploy
the updated solution to production. Cloud data warehouses significantly
improve speed and performance, allowing IT to focus on more innovative
projects.
Enhanced security: Cloud security engineers can create and iterate on precise data-protection measures. Furthermore, security technologies such as encryption and multi-factor authentication make data transfer between regions and resources highly secure.
Improved disaster recovery: Preparing cloud data warehouses for disasters does not require physical assets. Instead, almost all cloud data warehouses offer asynchronous data replication and take automatic snapshots and backups. This data is stored across multiple nodes, so the replicated data can be accessed at any time without interrupting current activity.
Conclusion:
In this article, we have covered the most frequently asked interview questions on data warehousing. ETL tools are often required in a data warehouse, so one can also expect questions on ETL tools in a data warehouse interview.