0% found this document useful (0 votes)
53 views

DWH Tables

This document discusses data warehouse tables and schemas. It describes two main types of tables: dimension tables and fact tables. Dimension tables contain attributes that describe objects in fact tables, like keys and textual attributes. Fact tables contain numeric facts and measures that can be aggregated, like sales amounts. Schemas for organizing these tables include star schemas with one fact table at the center linked to multiple dimension tables, and snowflake schemas which further normalize dimension tables. The document provides an overview of dimensional modeling and different types of data warehouse tables and schemas.

Uploaded by

Toriq Al Fahrezy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
53 views

DWH Tables

This document discusses data warehouse tables and schemas. It describes two main types of tables: dimension tables and fact tables. Dimension tables contain attributes that describe objects in fact tables, like keys and textual attributes. Fact tables contain numeric facts and measures that can be aggregated, like sales amounts. Schemas for organizing these tables include star schemas with one fact table at the center linked to multiple dimension tables, and snowflake schemas which further normalize dimension tables. The document provides an overview of dimensional modeling and different types of data warehouse tables and schemas.

Uploaded by

Toriq Al Fahrezy
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/340546587

COMPARATIVE STUDY ON DATA WAREHOUSE TABLES AND SCHEMA-AN


OVERVIEW

Article  in  Adalya · April 2020


DOI: 10.37896/aj9.4/016

CITATIONS READS

0 1,296

3 authors, including:

K Rajasekaran Saravanan P.
D B jain college Dhanraj baid jain college(Autonomous), Chennai
7 PUBLICATIONS   9 CITATIONS    28 PUBLICATIONS   34 CITATIONS   

SEE PROFILE SEE PROFILE

Some of the authors of this publication are also working on these related projects:

PREDICTION OF PHARMA STOCKS USING DEEP LEARNING MODELS View project

Analysis of Various Contrast Improvement Techniques for Dehazing an Image View project

All content following this page was uploaded by Saravanan P. on 10 April 2020.

The user has requested enhancement of the downloaded file.


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

COMPARATIVE STUDY ON DATA


WAREHOUSE TABLES AND SCHEMA–AN
OVERVIEW

Dr.K.Rajasekaran
Associate Professor, P.G. and Research Dept of Computer Science,
D.B.Jain College, Chennai, India
[email protected]

P.Saravanan
Assistant Professor, P.G. and Research Dept of Computer Science,
D.B.Jain College, Chennai, India
[email protected]

P.Prabakaran
Assistant Professor, P.G. and Research Dept of Computer Science,
D.B.Jain College, Chennai, India
[email protected]

Abstract- An Aim of focusing various types of tables and Schema in Data Warehouse. A Data Warehouse (DWH) is a
subject-oriented, integrated, time-variant, non-volatile and large amount of data in support of management’s decision-
making process. It is gaining importance day by day. The words On-Line Analytical Processing (OLAP) bring together a
set of tools that use multidimensional modeling in the extraction of information from the Data Warehouse.
Multidimensional data model is an connected part of On-Line Analytical Processing, or OLAP. It contains a
multidimensional view of huge amounts of historical data from operational sources, thus supplying useful information for
decision makers to improve their business intelligence which has become an integral part of decision making strategy.
There are two forms for data management i.e. operational databases and data warehouse. The operational databases are
where the put in data. Users of this type almost handle with one record at a time and they usually perform the same tasks.
The data warehouse is where we find the data away. Users of this kind of type almost deal with set of row at a time and
their questions require that thousands of rows be fetched into an answer set. The data stored in the warehouse is upload
from the operational system such as marketing, sales, finance etc. There are two types of Tables first one is
Dimension table and second one is Fact table.
Keywords - Dimensional modeling, Dimension table, Slowly Changing, Rapidly Changing, Junk, Inferred, Conformed,
Degenerate, Role Playing, Shrunken and Static Dimension table, Fact Table , Schema, Star, Snowflake and Fact
constellation (or) Multi-Star.

1. INTRODUCTION

The term Data Warehouse (DWH) was first coined by Bill Inmon in 1990. According to Bill Inmon, a data
warehouse is a subject-oriented, integrated, time-variant, non-volatile and large amount of data. This data (or)
information helps analyst to take informed decision in an organization.
An operational database undergoes frequent change on a daily basis on account of the communication that take
place. Assume a business executive wants to analyze earlier comment on any data such as a supplier, a product, or any
consumer data, then the executive will have no data available to study because the previous or old data has been updated
due to transactions.
A data warehouses provide us comprehensive and consolidate data in multidimensional view. Along with
comprehensive and consolidate view of data, a data warehouses also provide us Online Analytical Process (OLAP) tools.
These tools help us in interactive and efficient analysis of data in a multidimensional space. This analysis outcome in

Volume 9, Issue 4, April 2020 126 http://adalyajournal.com/


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

data generalization and data mining.

1.1. Data Warehouse Features

The advantages of a data warehouse are given below:


• Subject Oriented - It provide data (or) information around a subject rather than the organization ongoing
operations. These subjects can be customers, product, suppliers, revenue, sales, etc. A data warehouse doesn’t focus on
the continuing operations; rather it focuses on model and study of data for decision making.
• Integrated – A data warehouse is construct by integrate data from heterogeneous sources such as relational
databases, flat files, etc. This integration enhances the effective analysis of data.
• Time Variant - The data collected in a data warehouse is standard with a particular time era. The data in a data
warehouse provide information from the historical data point of view.
• Non-volatile - Non-volatile means the before data is not erased when new data is added to it. A data warehouse is
kept split from the operational database and hence frequent change in operational database is not reflected in the data
warehouse.
Note: A data warehouse does not require transaction processing, concurrency controls, and recovery, because it is
physically stored and separate from the operational database.

Figure 1.1. Data warehouse Design

2. DATA WAREHOUSE DIMENSIONAL MODELING

Dimensional modeling (DM) is the design concept used by many data warehouse designers to build their data
warehouse. Dimensional model is the fundamental data model used by many of the commercial OLAP product available
today in the market. Dimensional model names a set of techniques and concepts used in design of data warehouse. It is
consider being different from entity-relationship modeling(ER). It is simpler, more communicative and easier to
understand than ER modeling. It is a technique for conceptualize and visualize data models as a set of measures that are
described by common aspects of the business. It is especially useful for summarizing and rearranging the data and
presenting views of the data to support data analysis.
Dimensional modeling focuses on numeric data such as values, counts, balances, weights, and occurrences. It does not
necessarily involve a relational database. The same model approach, at logical level, can be used for any physical form,
such as multidimensional database or even flat files. Dimensional model always uses the concept of facts (numeric
values) and dimensions (text). Facts are typically numeric values that can be aggregated and dimensions are groups of
hierarchies and descriptors that define the facts. For example, sales amount is a fact; register, timestamp, product, store,
etc. are elements of dimensions. Dimensional models are built by business process area, for example store sales,
inventory, claims, etc. Because the various business process areas share some but not all dimensions, efficiency in design,
operation, and consistency, is achieved using conformed dimensions, i.e. using one copy of the shared dimension across
subject areas. In this model, all data is contained in two types of tables called Dimension Table and Fact Table.

Volume 9, Issue 4, April 2020 127 http://adalyajournal.com/


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

3. TABLES

A fact table consists of the measurements, metrics or facts of a business process. Tables mainly classified into two
types. There are

 Dimensions Table

 Fact Table

3.1. Dimensions Table


A Dimension table contain primary key and textual message. It is a table in a star schema of a data warehouse. A
dimension table stores attribute or dimension that describes the objects in a fact table. From the above example Item_no
dimension, the attributes are Item_name, supplier, etc. Generally the Dimension Attributes are used in report labels, and
query constraints such as where Supplier='P.Saravanan'.

Item_no Item_name Supplier


210 LED P.Saravanan
572 Refrigerator Dr.K .Rajasekaran
335 Air Dr.S.Balaji
Conditioner
348 Washing Dr.B.Jagadhesan
Machine
Table 1.1. Dimension Table
The dimension attributes contain one or more hierarchical relationships. Before design your data warehouse, you
need to decide what this data warehouse contains. The measure is a numeric attribute in the fact table which illustrate the
behavior of the business of the dimension. If you need to build a data into data warehouse contain monthly sales numbers
across multiple store locations, across time and across products then your dimensions are:

1) Location
2) Time
3) Product
Each dimension table contains data for one dimension. In above example get all your store location information and
put that into one single table called Location. Your store location data may be spanned across multiple tables in your
OLTP system (unlike OLAP), but you need to de-normalize all that data into one single table. A data warehouse require
concise, subject oriented schema that make possible on-line analysis. The majority of data model for data warehouse is
multidimensional model. There are nine types are Slowly Changing, Rapidly Changing, Junk, Inferred, Conformed,
Degenerate, Role Playing, Shrunken and Static Dimension table.
Slowly Changing Dimension- Attributes of a dimension that would undertake change over time. It depend on the
business requirement whether particular attribute history of change should be sealed in the data warehouse. This is called
a slowly changing attribute and a dimension contain such an attribute is called a slowly changing dimension.
Rapidly Changing Dimension- A dimension attribute that changes regularly is a rapidly altering attribute. If you
don’t need to track the change, the rapidly changing attribute is no problem, but if you do need to track the changes,
using a standard slowly changing dimension can result in a enormous increase of the size of the dimension. One solution
is to move the attribute to its own dimension, with a separate reference key in the fact table.
Junk Dimension - A junk dimension is a single table with a combination of various and dissimilar attributes to avoid
having a large number of reference keys in the fact table. Junk dimensions are often created to manage the reference keys
created by rapidly changing dimension.
Inferred Dimension- While loading fact(numeric) records, a dimension record may not yet be ready. One solution is
to generate a surrogate key with empty for all the other attributes. This should technically be called an inferred member,
but is often called an inferred dimension.
Conformed Dimension is used in various locations is called a conformed dimension. A conformed dimension may be
used with many fact tables in a single database, or across multiple data marts or data warehouses.
Degenerate Dimension is when the dimension attribute is store as part of fact table, and not in a part dimension
table. These are essentially dimension keys for which there are no other attributes. In a data warehouse, these are often
used as the result of a drill through query to examine the source of an aggregate number in a report. It can use these

Volume 9, Issue 4, April 2020 128 http://adalyajournal.com/


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

values to outline back to communication in the OLTP system.


Role Playing Dimension is one where the same dimension key along with its associated attributes can be joined to
more than one reference key in the fact table. For example, a fact table may include reference keys for both ship date and
delivery date. But the same date dimension attributes apply to each reference key, so you can join the same dimension
table to both reference keys. Here the date dimension is taking many roles to map ship date as well as delivery date, and
therefore the name of role in performance dimension.
Shrunken Dimension is a subset of one more dimension. For example, the orders fact table may include a reference
key for product, but the target fact table may include a reference key only for product category, which is in the product
table, but much less granular. Creating a smaller dimension table, with product category as its primary key, is one way of
deali with this situation of heterogeneous grain. If the product dimension is snowflaked, there is probably already a
separate table for product category, which can serve as the shrunken dimension.
Static Dimensions are not extract from the original data source, created within the context of the data warehouse. A
static dimension can be loaded physically for example with status codes or it can be generate by a procedure, such as a
date or time dimension.
3.2. Fact Table
Fact table contains reference key and numerical value of business processes. For example business process is Sales
revenue, then a measurement of this business process such as "monthly sales number" is captured in the fact table. In
addition to the measurements, the only other things a fact table contains are reference keys for the dimension tables. The
fact is a set of related data, contains analytical context data and measures. It used to represent business items or business
transactions. A fact table is the central table in a star schema or multi-star schema of a data warehouse. A fact table stores
numerical information for analysis and is often denormalized. Suppose an electronic shop sells its product. Thus, every
sale is a fact that happens and the fact table is used to record these facts. For example:

Item_no. Branch_code Location Unit_Sold


210 1238 Chennai 15
572 4567 Salem 35
335 4892 Trichy 77
348 3296 Mumbai 48
Table 2.1. Fact Table

There are three types of facts:

Additive Facts- Additive facts can be used with any aggregation function like Sum( ), Avg( ) etc. Example is
Quantity, sales amount etc.
Semi Additive Facts- Semi-additive facts are those where only a few of aggregation function can be applied. For
example, Consider bank account details. You cannot apply the Sum( ) on the bank balance that does not give useful
results but min( ) and max( ) function may return useful information.
Non-Additive Facts- You cannot use numeric aggregation functions such as Sum( ), Avg( ) etc on Non-additive facts.
For example of non-additive fact is any kind of ratio or percentage. Non numerical values facts can also be a non-additive
facts.

Types of Fact-less Fact Tables

Fact-less Fact Tables- A fact table that does not have any measure is a fact-less fact table. For example, a fact table
which has only product_ID and date key is a fact-less fact table.
Centipede Fact Table- Centipede fact table is a normalized fact table. Modeler may decide to normalize the fact
instead of snow flaking dimensions tables.
Conformed Fact Tables- It can be measures reused across multiple dimension model. For example, KPI such as
profit, sales revenue etc

Incident and Snapshot Facts- A fact table store some kind of measurement and captured against a specific time.
Now it might so happen that the business might unable to capture all of its measure always for every point in time. Then
those unavailable measurement can be kept empty (or) can be filled up with the last available measurements. The first
one is the example of incident fact and the second one is the example of snapshot fact.
Cumulative Fact- This type of fact table describes what has happened over a period of time. For example, this fact
table may bring out the total sales by product by store by day.

Volume 9, Issue 4, April 2020 129 http://adalyajournal.com/


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

4. SCHEMA
Schema is a logical description of the entire database and Collection of Dimension and Fact table. It contain the
name and detail of records of all record types including all associated data-items and aggregate value. Similar to a database,
a data warehouse also require to maintain a schema. A database uses relational model, while a data warehouse uses Star,
Snowflake, and Fact Constellation schema or Multi-Star Schema.
4.1. Star Schema

Each dimension table directly connected to one central fact table is called. This dimension table contains the set of
attributes. The following diagram show the sales fact table data of a company with respect to the four dimension tables are
namely time, item, branch, and location. There is a fact table at the center. It contains the keys to each of four dimensions
tables like namely time_key, item_key, branh_key and location_key. The fact table also contain the attributes are namely
dollars_sold and units_sold.

Figure 4.1 Star Schema

4.2. Snowflake Schema


A Snowflake Schema is all the dimension table directly is not connected to fact table is called. A few dimension
table in the Snowflake schema are normalized. The normalization split up the data into added tables. Dissimilar Star
schema, the dimensions table in a snowflake schema are normalized. For example, the item dimension table in star schema
is normalized and split into two dimension tables, namely item and supplier table.

Figure 4.2 Snowflake Schema

Volume 9, Issue 4, April 2020 130 http://adalyajournal.com/


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

4.3 Fact Constellation Schema (or) Multi-Star Schema

Fact constellation has the entire dimension table directly connected to more than one central fact tables are called.
The following diagram show two fact tables, namely shipping and sales. Now the item dimension table contain the
attributes are item_key, item_name, brand, type, and supplier-key. The shipper_key is linked to the shipper dimension
table. The shipper dimension table contain the attributes are shipper _key , shipper _name and shipper _type.

Figure 4.3 Fact Constellation Schema (or) Multi-Star Schema

5. CONCLUSION

This paper discuss the data warehouse types of tables such as dimension tables are Slowly Changing, Rapidly
Changing, Junk, Inferred, Conformed, Degenerate, Role Playing, Shrunken and Static Dimensions table and fact tables
are Additive Fact , Semi- Additive Fact and Non-Additive Fact table ,Multidimensional schemas such as star schema,
snowflake schema and fact constellation or multi-star schema. The advantage of using these table and schema are simpler
and communicative, easy to read then E-R models. It is useful for terse and rearranging the data and presenting views of
the data to support data analysis. Data warehouses (DWH) separate study workload from transaction workload and allow
an organization to consolidate data from several sources. Dimensional model is a logical design technique for structure
data so that it is instinctive to business users and delivers fast query presentation. Data presented to the business
intelligence tools must be grounded in simplicity to stand any chance of success. Simplicity is a fundamental requirement
because it ensures that users can easily understand databases, as well as allows software to efficiently navigate databases.

References

[1] Mandeep Kaur Sandhu, Amanjot Kaur, Ramandeep Kaur: Data Warehouse Schemas, International Journal of
Innovative Research in Advanced Engineering (IJIRAE) ISSN: 2349-2163, Issue 4, Volume 2 (April 2015), pp-47-
51.
[2] Boris Vrdoljak, Marko Banek, and Stefano Rizzi: Designing Web Warehouses from XML Schemas Y. Kambayashi,
M. Mohania, W. Wöß (Eds.): DaWaK 2003, LNCS 2737, pp. 89-98, 2003. SpringerVerlag Berlin Heidelberg 2003
[3] Wolfgang Hummer, Andreas Bauer, Gunnar Harde: XCube – XML For Data Warehouses, DOLAP’03, November
7, 003, USA.
[4] M. Golfarelli, S. Rizzi, and B. Vrdoljak, Data warehouse design from XML sources., Proc. DOLAP’01, Atlanta, pp.
40-47, 2001.
[5] Jiawei Han and Micheline Kamber , “Data Mining Concepts and Technique” ,2nd Edition, , Morgan Kaufmann

Volume 9, Issue 4, April 2020 131 http://adalyajournal.com/


ADALYA JOURNAL https://doi.org/10.37896/aj9.4/016 ISSN NO: 1301-2746

Publisher.
[6] Ralph Kimball and Margy Ross, “The Data Warehouse Toolkit: The Complete Guide to Dimensional Modelling”
2nd Edition, R, John Willy & Sons.

[7] Soumya Sen, Ranak Ghosh, Debanjali Paul, Nabendu Chaki “Integrating related XML data into multiple data
warehouse schemas” research paper 2012-2013.
[8] R. Patel, MS.Alpa, “Data Modeling techniques for data warehouse” International Journal of Multidisciplinary
Research, Vol.2 Issue 2, February 2012, ISSN 2231 5780.
[9] Keshav Dev Gupta, Jyoti Gupta, Prakati Prasoon, “Novel Architecture with Dimensional Approach of Data
Warehouse” International Journal of Advanced Research in Computer Science and Software Engineering, Volume
3, Issue 3, March 2013.
[10] Anirban Sarkar “Data Warehouse Requirements Analysis Framework: Business-Object Based Approach”
International Journal of Advanced Computer Science and Applications, Vol. 3, No. 1, 2012.
[11] https://www.folkstalk.com/2010/01/types-of-dimensions.html
[12] http://dwgeek.com/types-of-fact-tables-data-warehouse.html/

Volume 9, Issue 4, April 2020 132 http://adalyajournal.com/


View publication stats

You might also like