
DataWarehouseInterview Part3

A snapshot refers to a complete visualization of data at a point in time and occupies less space than the source data. The granularity is the lowest level of detail stored in a fact table, such as year, month, or day. A junk dimension stores unrelated attributes like flags or Boolean values in a single dimension table.

Uploaded by

montosh
Copyright
© © All Rights Reserved

What is a snapshot with reference to Data Warehouse?

• A snapshot is a complete view of the data as it stood at the time of extraction. It occupies less
space than the source data and can be used to back up and restore data quickly.
• A snapshot can also mean a record of the activities performed. It is stored in report format
from a specific catalog, and the report is generated soon after the catalog is disconnected.

What is the level of granularity of a fact table?

Granularity is the lowest level of detail stored in the fact table; in other words, it is the depth
of the data. For a date dimension, the level could be year, quarter, month, period, week, or day.
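
As a rough illustration of grain, the sketch below keeps a few made-up sales facts at day level and rolls them up to month level. The rows and the names `daily_sales` and `roll_up_to_month` are assumptions invented for this example, not part of any real schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows stored at day grain: (sale_date, amount).
daily_sales = [
    (date(2015, 1, 5), 100.0),
    (date(2015, 1, 20), 50.0),
    (date(2015, 2, 3), 75.0),
]

def roll_up_to_month(rows):
    """Summarize day-grain rows to month grain, keyed by (year, month)."""
    totals = defaultdict(float)
    for sale_date, amount in rows:
        totals[(sale_date.year, sale_date.month)] += amount
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

Day-level rows can always be summarized to months, but month-level totals cannot be split back into days, which is why the grain is defined by the lowest level stored.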

What is a junk dimension?

• Attributes that do not fit naturally anywhere else in the schema can be stored in a junk
dimension. The data in a junk dimension usually consists of Boolean or flag values.
• A junk dimension is a single dimension formed by lumping together a number of small
dimensions, so its attributes are unrelated to each other. It is built by grouping random flags
and text attributes and moving them into a distinct sub-dimension.
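
One possible way to build such a dimension is to enumerate every combination of the flag values and give each combination its own key. This is only a sketch; the flag names in `flag_domains` and the helper `build_junk_dimension` are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality flags that have no natural home in the schema.
flag_domains = {
    "is_gift_wrapped": [False, True],
    "is_express": [False, True],
    "payment_type": ["cash", "card"],
}

def build_junk_dimension(domains):
    """Return one row per combination of flag values, each with its own key."""
    names = list(domains)
    rows = []
    for key, combo in enumerate(product(*domains.values()), start=1):
        row = {"junk_key": key}
        row.update(zip(names, combo))
        rows.append(row)
    return rows

junk_dimension = build_junk_dimension(flag_domains)
print(len(junk_dimension))  # 2 * 2 * 2 combinations
print(junk_dimension[0])
```

The fact table then needs only a single `junk_key` column instead of one column per flag.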

What are conformed dimensions?

Conformed dimensions are dimensions that keep the same meaning and content wherever they are
used, so they can be shared across multiple data marts and joined to multiple fact tables.
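
A small sketch of the idea, using Python's built-in `sqlite3` module and an in-memory database: two fact tables share one date dimension, so their results line up month by month. All table names, column names, and figures here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales (date_key INTEGER, units_sold INTEGER);
CREATE TABLE fact_inventory (date_key INTEGER, units_on_hand INTEGER);
INSERT INTO dim_date VALUES (1, '2015-01'), (2, '2015-02');
INSERT INTO fact_sales VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO fact_inventory VALUES (1, 40), (2, 33);
""")

# Both fact tables join to the same dim_date, so both results use the same month labels.
sales_by_month = dict(conn.execute("""
    SELECT d.month, SUM(f.units_sold) FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key GROUP BY d.month
"""))
inventory_by_month = dict(conn.execute("""
    SELECT d.month, SUM(f.units_on_hand) FROM fact_inventory f
    JOIN dim_date d ON d.date_key = f.date_key GROUP BY d.month
"""))
print(sales_by_month, inventory_by_month)
```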

What is the main difference between Inmon and Kimball philosophies of Data Warehousing?

The two differ in how the Data Warehouse is built. The process is as follows:
Kimball (bottom-up) > Data Marts are built first > they are then combined to form the Data Warehouse
Inmon (top-down) > the Data Warehouse is built first > Data Marts are derived from it

What are the data validation strategies for validating a data mart after the loading process?

What is meant by an aggregate fact table?

An aggregate fact table stores information that has been aggregated, or summarized, from a detail
fact table. Aggregate fact tables are useful for improving query performance.
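
The summarization step might look like the following sketch, which uses Python's built-in `sqlite3` module. The tables `fact_sales_detail` and `fact_sales_monthly` and their columns are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales_detail (sale_date TEXT, product_key INTEGER, amount REAL);
INSERT INTO fact_sales_detail VALUES
    ('2015-01-05', 1, 100.0),
    ('2015-01-20', 1, 50.0),
    ('2015-02-03', 2, 75.0);

-- Aggregate fact table: one row per month and product instead of one per sale.
CREATE TABLE fact_sales_monthly AS
SELECT substr(sale_date, 1, 7) AS sale_month,
       product_key,
       SUM(amount) AS total_amount,
       COUNT(*) AS sale_count
FROM fact_sales_detail
GROUP BY substr(sale_date, 1, 7), product_key;
""")

monthly_rows = conn.execute(
    "SELECT * FROM fact_sales_monthly ORDER BY sale_month, product_key"
).fetchall()
print(monthly_rows)
```

Queries that only need monthly totals can read the smaller table rather than scanning every detail row.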

What is a rapidly changing dimension?

Rapidly changing dimensions are dimensions whose attribute values change frequently, causing
the dimension to grow rapidly. This growth affects both maintenance and performance.

A rapidly changing dimension is often the result of poor decisions during the requirements
analysis and data modeling stages of the Data Warehousing project. If the data in a dimension
table is changing a lot, it is a hint that the design should be revisited.

What is a hybrid slowly changing dimension?

Hybrid SCDs are a combination of SCD Type 2 and SCD Type 3 (the hybrid commonly called Type 6
also includes a Type 1 overwrite). Every change made to a source record, whether it is an UPDATE
or an INSERT, produces a new entry on the target side.
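
One way to read this is sketched below: each change closes the current row and adds a new one (the Type 2 part), and the new row also carries the prior value in its own column (the Type 3 part). The row layout, the function `apply_hybrid_change`, and the sample customer are all assumptions made for illustration.

```python
from datetime import date

def apply_hybrid_change(history, customer_id, new_city, change_date):
    """Close the current row (Type 2) and append a new row that also keeps
    the prior value in a previous_city column (Type 3)."""
    current = next(
        row for row in history
        if row["customer_id"] == customer_id and row["is_current"]
    )
    current["is_current"] = False
    current["end_date"] = change_date
    history.append({
        "surrogate_key": max(row["surrogate_key"] for row in history) + 1,
        "customer_id": customer_id,
        "city": new_city,
        "previous_city": current["city"],
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    return history

# Hypothetical starting state: one current row for customer C100.
customer_history = [{
    "surrogate_key": 1, "customer_id": "C100", "city": "Pune",
    "previous_city": None, "start_date": date(2014, 1, 1),
    "end_date": None, "is_current": True,
}]
apply_hybrid_change(customer_history, "C100", "Mumbai", date(2015, 6, 1))
for row in customer_history:
    print(row["surrogate_key"], row["city"], row["previous_city"], row["is_current"])
```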

What is a degenerate dimension table?

A value stored in a fact table that is neither a dimension key nor a measure is called a
degenerate dimension. Example: invoice ID.

A degenerate dimension is a dimension that has only a single attribute. It is typically
represented as a single field in a fact table. Degenerate dimensions are a fast way to group
related transactions.

Sometimes a dimension is defined that has no content except for its primary key. For example, when
an invoice has multiple line items, the line item fact rows inherit all the descriptive dimension
foreign keys of the invoice, and the invoice is left with no unique content. But the invoice number
remains a valid dimension key for fact tables at the line item level. This degenerate dimension is
placed in the fact table with the explicit acknowledgment that there is no associated dimension
table. Degenerate dimensions are most common with transaction and accumulating snapshot fact
tables.

A degenerate dimension is a dimension key without a corresponding dimension table. Example:

In the Point of Sale Transaction fact table, we have:

Date Key (FK), Product Key (FK), Store Key (FK), Promotion Key (FK), and POS Transaction
Number

The Date Dimension corresponds to Date Key, and the Product Dimension corresponds to Product Key.
In a traditional parent-child database, the POS Transaction Number would be the key to the
transaction header record that contains all the information valid for the transaction as a whole,
such as the transaction date and store identifier. But in this dimensional model, we have already
extracted this information into other dimensions. As a result, the POS Transaction Number looks
like a dimension key in the fact table but does not have a corresponding dimension table.

Therefore, POS Transaction Number is a degenerate dimension.
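
The fact table described above could be sketched as follows with Python's built-in `sqlite3` module. The table name, the `sales_amount` measure, and the sample rows are assumptions; only the key columns come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_pos_transaction (
    date_key INTEGER,
    product_key INTEGER,
    store_key INTEGER,
    promotion_key INTEGER,
    pos_transaction_number TEXT,   -- degenerate dimension: no dimension table
    sales_amount REAL              -- hypothetical measure
);
INSERT INTO fact_pos_transaction VALUES
    (20150105, 1, 10, 0, 'T-1001', 4.50),
    (20150105, 2, 10, 0, 'T-1001', 2.25),
    (20150105, 1, 10, 3, 'T-1002', 4.00);
""")

# Group line items back into their transactions using only the fact table.
basket_totals = conn.execute("""
    SELECT pos_transaction_number, COUNT(*), SUM(sales_amount)
    FROM fact_pos_transaction
    GROUP BY pos_transaction_number
    ORDER BY pos_transaction_number
""").fetchall()
print(basket_totals)
```

No join is needed to group by the transaction number, because it lives directly in the fact table.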

What is a surrogate key?

A surrogate key (SK) is a sequentially generated, meaningless, unique number attached to each
record in a table in a Data Warehouse (DW).
It is UNIQUE because it is a sequentially generated integer for each record inserted into the table.
It is MEANINGLESS because it carries no business meaning about the record it is attached to.
It is SEQUENTIAL because it is assigned in order as new records are created in the table,
starting with one and going up to the highest number that is needed.
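
A minimal sketch of these three properties, assuming keys are handed out in memory (in practice a database sequence or identity column would usually do this job). The class `SurrogateKeyGenerator` and the sample natural keys are hypothetical.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Hand out sequential integer keys, one per distinct natural key."""

    def __init__(self):
        self._next_key = count(start=1)
        self._by_natural_key = {}

    def key_for(self, natural_key):
        # A natural key seen before gets its existing surrogate key back.
        if natural_key not in self._by_natural_key:
            self._by_natural_key[natural_key] = next(self._next_key)
        return self._by_natural_key[natural_key]

keys = SurrogateKeyGenerator()
first = keys.key_for("CUST-00017-IN")
second = keys.key_for("CUST-00042-US")
repeat = keys.key_for("CUST-00017-IN")
print(first, second, repeat)  # prints: 1 2 1
```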

Why should we use a surrogate key?

Basically, it is an artificial key used as a substitute for a natural key (NK). A table should
already have an NK defined as per the business requirement, and that key may be able to uniquely
identify any record. The SK, however, is just an integer attached to a record for the purpose of
joining tables in a Star or Snowflake schema based DW. An SK is most needed when the NK is very
long or its datatype is not suitable for indexing.

A surrogate key is a key which does not have any contextual or business meaning. It is manufactured
“artificially” and only for the purposes of data analysis. The most frequently used version of a
surrogate key is an increasing sequential integer or “counter” value (i.e. 1, 2, 3). Surrogate keys can
also include the current system date/time stamp, or a random alphanumeric string.

What does the level of granularity of a fact table signify?

The granularity is the lowest level of information stored in the fact table; it is the depth of
the data. In a date dimension, the level could be year, quarter, month, period, week, or day.

How do you load the time dimension?

Create a procedure to load data into the time dimension. The procedure needs to run only once to
populate all the data, for example with every date up to 2015. Adapt the procedure to suit the
fields in your table.
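
The original procedure is not reproduced here, so the following is only a sketch of what a one-off load might generate. The start date of 1 January 2015, the function `build_date_dimension`, and the column names are all assumptions; adjust them to your own table.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """Yield one row per calendar day from start to end, inclusive."""
    current = start
    while current <= end:
        yield {
            "date_key": int(current.strftime("%Y%m%d")),
            "full_date": current,
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "week": current.isocalendar()[1],  # ISO week number
            "day": current.day,
        }
        current += timedelta(days=1)

# Assumed range: the whole of 2015.
rows = list(build_date_dimension(date(2015, 1, 1), date(2015, 12, 31)))
print(len(rows), rows[0]["date_key"], rows[-1]["date_key"])
```

The generated rows could then be bulk-inserted into the dimension table in a single run.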

What is a lookup table?

If required data is not available in the source systems, it has to be obtained from reference
tables that are present in the database. These tables are called lookup tables.
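
As a simple illustration, the sketch below fills in a country name that the source rows lack by consulting a reference table held as a dictionary. The lookup contents, the source rows, and the function `enrich` are made up for this example.

```python
# Hypothetical reference (lookup) table: country code -> country name.
country_lookup = {"IN": "India", "US": "United States"}

# Hypothetical source rows that carry only the code.
source_rows = [
    {"order_id": 1, "country_code": "IN"},
    {"order_id": 2, "country_code": "ZZ"},  # code missing from the lookup
]

def enrich(rows, lookup, default="Unknown"):
    """Add country_name to each row by looking up its country_code."""
    return [
        {**row, "country_name": lookup.get(row["country_code"], default)}
        for row in rows
    ]

enriched_rows = enrich(source_rows, country_lookup)
print(enriched_rows)
```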

Differences between star and snowflake schemas?

A star schema has a single fact table joined to N dimension tables. When any of those dimensions
has extended dimension tables of its own, the design is known as a snowflake schema.
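
The contrast can be sketched with Python's built-in `sqlite3` module: the same product category is stored as a plain column in the star version and as a separate table in the snowflake version. Every table and column name here is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: category is a plain column on the product dimension.
CREATE TABLE star_dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);

-- Snowflake: category moves to its own table referenced by the dimension.
CREATE TABLE snow_dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT,
    category_key INTEGER REFERENCES snow_dim_category);

CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

INSERT INTO star_dim_product VALUES (1, 'Pen', 'Stationery');
INSERT INTO snow_dim_category VALUES (1, 'Stationery');
INSERT INTO snow_dim_product VALUES (1, 'Pen', 1);
INSERT INTO fact_sales VALUES (1, 2.0), (1, 3.0);
""")

# Star: one join from the fact table reaches the category.
star_result = conn.execute("""
    SELECT p.category, SUM(f.amount) FROM fact_sales f
    JOIN star_dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
""").fetchall()

# Snowflake: a second join is needed to reach the extended dimension.
snowflake_result = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM fact_sales f
    JOIN snow_dim_product p ON p.product_key = f.product_key
    JOIN snow_dim_category c ON c.category_key = p.category_key
    GROUP BY c.category
""").fetchall()
print(star_result, snowflake_result)
```

Both queries return the same totals; the snowflake form trades an extra join for less repeated data in the dimension.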
