Week5

The document provides an overview of Dimensional Modeling in data warehousing, detailing its significance, components, and types of schemas. It outlines the key concepts such as facts, dimensions, attributes, and the different types of dimensional designs like Star and Snowflake schemas. Additionally, it describes the steps involved in dimensional modeling and the advantages of using this approach for data organization and retrieval.
Dimensional Data Warehouse Models
Sonny Boy M. Sasis
IT Faculty
Cagayan State University
Learning Objectives
At the end of this unit, the student is expected to:
1) Distinguish Dimensional Modeling
2) Articulate the significance of dimensional modeling in data warehousing
3) Identify the key concepts and characteristics of Facts, Dimensions, Attributes, and Hierarchies
4) Differentiate between the various types of Dimensional Modeling designs and their components
5) Compare and contrast the advantages and disadvantages of each schema type
Learning Content
This unit covers concepts regarding:
1) Fundamentals of Dimensional Modeling
2) Types of Dimensional Modeling Design (i.e., Star Schema, Snowflake Schema, etc.)
3) Components of the types of Dimensional Modeling Design (Fact Table, Dimension Tables)
Dimensional Modeling
● Dimensional Modeling (DM) is a data structure technique optimized for data storage in a data warehouse.
● DM's purpose is to optimize the database for faster retrieval of data.
● It was developed by Ralph Kimball and consists of "fact" and "dimension" tables.

Dimensional Model vs. Relational Model
● A dimensional model in a data warehouse is designed to read, summarize, and analyze numeric information such as values, balances, counts, and weights. In contrast, relational models are optimized for adding, updating, and deleting data in a real-time Online Transaction System.
● In the relational model, normalization and ER models reduce redundancy in data. The dimensional model, on the other hand, arranges data so that it is easier to retrieve information and generate reports.
● Hence, dimensional models are used in data warehouse systems and are not a good fit for operational (transaction-processing) systems.
Components of Dimensional Modeling
● Facts are the measurements/metrics from your business process. For a Sales business process, a measurement would be the quarterly sales number.
● A Dimension provides the context surrounding a business process event. In simple terms, dimensions give the who, what, and where of a fact. In the Sales business process, for the fact quarterly sales number, the dimensions would be:
  ● Who – Customer Names
  ● Where – Location
  ● What – Product Name
  In other words, a dimension is a window to view information in the facts.
● Attributes are the various characteristics of a dimension in dimensional data modeling. In the Location dimension, the attributes can be:
  ● State
  ● Country
  ● Zipcode, etc.
  Attributes are used to search, filter, or classify facts.
● Measures are numeric data based on columns in a fact table. They are the primary data which end users are interested in.
  E.g. A sales fact table may contain a profit measure which represents the profit on each sale.
Components of Dimensional Modeling
● The Fact table is the primary table in dimensional modeling. It contains:
  ● Measurements/facts
  ● Foreign keys to the dimension tables
● A fact table can contain fact data at a detailed or aggregated level.
  E.g. Suppose a patient registers in Hospital XYZ and pays Rs. 1000 as registration fees. A fact record is then generated for that transaction, with PatientID, HospitalID, DepartmentID, and TransactionID as dimension keys and RegistrationFees as the measurement.
● A Dimension table contains the dimensions of a fact.
● Dimension tables are joined to the fact table via a foreign key.
● Dimension tables are de-normalized tables.
● The Dimension Attributes are the various columns in a dimension table.
● Dimensions offer descriptive characteristics of the facts with the help of their attributes.
● There is no set limit on the number of dimensions.
● A dimension can also contain one or more hierarchical relationships.
  E.g. Customer, Product, Location, Doctor, Patient, Ward, Test, etc.
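The hospital registration example can be sketched as a small fact table joined to its dimension tables. This is only an illustrative sketch using Python's built-in sqlite3 module: the table names (dim_patient, dim_hospital, dim_department, fact_registration), the descriptive columns, and the sample row values are assumptions, not part of the original example. Only the key and measure names (PatientID, HospitalID, DepartmentID, TransactionID, RegistrationFees) come from the slide.

```python
import sqlite3

# In-memory database; table layouts and sample values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient    (PatientID    INTEGER PRIMARY KEY, PatientName    TEXT);
CREATE TABLE dim_hospital   (HospitalID   INTEGER PRIMARY KEY, HospitalName   TEXT);
CREATE TABLE dim_department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT);

-- Fact table: foreign keys to the dimension tables plus one numeric measure.
CREATE TABLE fact_registration (
    TransactionID    INTEGER PRIMARY KEY,
    PatientID        INTEGER REFERENCES dim_patient(PatientID),
    HospitalID       INTEGER REFERENCES dim_hospital(HospitalID),
    DepartmentID     INTEGER REFERENCES dim_department(DepartmentID),
    RegistrationFees REAL
);
""")

# One patient registers in Hospital XYZ and pays 1000 as registration fees.
conn.execute("INSERT INTO dim_patient VALUES (1, 'Patient A')")
conn.execute("INSERT INTO dim_hospital VALUES (1, 'Hospital XYZ')")
conn.execute("INSERT INTO dim_department VALUES (1, 'Outpatient')")
conn.execute("INSERT INTO fact_registration VALUES (1, 1, 1, 1, 1000.0)")

# The dimensions give context (which hospital, which department) to the measure.
rows = conn.execute("""
    SELECT h.HospitalName, d.DepartmentName, SUM(f.RegistrationFees)
    FROM fact_registration AS f
    JOIN dim_hospital   AS h ON h.HospitalID   = f.HospitalID
    JOIN dim_department AS d ON d.DepartmentID = f.DepartmentID
    GROUP BY h.HospitalName, d.DepartmentName
""").fetchall()
print(rows)
```

The fact table stores only keys and the measure; every descriptive value a report needs is looked up in a dimension table through a join.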
Typical Presentation of
Dimensional Modeling
Components

Brand     Dollar Sales   Unit Sales
Axon      780            263
Framis    1044           509
Widget    213            444
Zapper    95             39
Types of Dimensions in Data Warehouse
● Conformed Dimensions are dimensions which, once built in the model, can be reused multiple times with different fact tables. Dimensions are conformed when they are the same dimension or when one dimension is a strict rollup of another. "Same dimension" means they have exactly the same set of primary keys and the same number of records.
● When one dimension is a strict rollup of another, conforming means combining the dimensions into a single logical dimension by creating a union of the attributes.
  E.g. Patient is a conformed dimension that is shared with multiple facts such as the diagnosis fact, the billing fact, and so on.
● A Junk Dimension/Dirty Dimension is simply a dimension that stores the junk attributes. It is a collection of flags, transactional codes, and text attributes that are not related to any other dimension.
  E.g. Transaction info dimension for a pharmacy:

  Transaction Info Key   Transaction Type   Payment Type
  1                      Regular            Cash
  2                      Regular            Check
  3                      Regular            Credit
  4                      Regular            Debit
  5                      Refund             Cash
  6                      Refund             Check
  7                      Refund             Credit
  8                      Refund             Debit
  9                      No Sale            Cash
  10                     No Sale            Check
  11                     No Sale            Credit
  12                     No Sale            Debit
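The pharmacy transaction info table is every combination of the three transaction types and four payment types, so a junk dimension like it can be generated rather than typed by hand. A minimal sketch, assuming plain Python dictionaries as rows; the names junk_dimension and key_for are hypothetical helpers introduced for illustration:

```python
from itertools import product

# Attribute values taken from the transaction info example.
transaction_types = ["Regular", "Refund", "No Sale"]
payment_types = ["Cash", "Check", "Credit", "Debit"]

# One row per combination, numbered from 1 in the same order as the table.
junk_dimension = [
    {"transaction_info_key": key, "transaction_type": t_type, "payment_type": p_type}
    for key, (t_type, p_type) in enumerate(product(transaction_types, payment_types), start=1)
]

def key_for(transaction_type, payment_type):
    """Return the junk-dimension key a fact row would store for this combination."""
    for row in junk_dimension:
        if (row["transaction_type"], row["payment_type"]) == (transaction_type, payment_type):
            return row["transaction_info_key"]
    raise KeyError((transaction_type, payment_type))

print(len(junk_dimension), key_for("Refund", "Credit"))
```

A fact row then carries a single transaction_info_key instead of two separate low-cardinality flag columns.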
Types of Dimensions in Data Warehouse
● A Degenerate Dimension is a dimension key stored in the fact table that doesn't have its own dimension table. It is directly related to an event stored in the fact table, but it is not eligible to be stored in a separate dimension table.
  E.g. The invoice number of the bills generated for a patient. When a patient pays a bill, an invoice number is generated for each transaction to keep track of it.
● Static Dimensions are dimensions whose data are not extracted from the data source but are created in the context of the DWH. A stored procedure is used to generate data for these dimensions.
  E.g. The time dimension, which contains day, week, month, quarter, year, decade, etc.
● Slowly Changing Dimensions (SCD) are dimensions that are subject to change over time. The concept of SCD is to deal with changes in attributes, depending on whether the business requires the history of changes for a particular attribute to be preserved in the DWH or not.

  SCD based on Status Flag
  Key   ID    Name   Designation   Salary   Status
  1     100   ABC    Apprentice    45000    0
  2     100   ABC    Surgeon       75000    1

  SCD based on Start Time and End Time
  Key   ID    Name   Designation   Salary   Start Time   End Time
  1     100   ABC    Apprentice    45000    01/22/2024   03/22/2025
  2     100   ABC    Surgeon       75000    01/22/2025   NULL

  SCD based on Previous and Current Values
  Key   ID    Name   Previous Designation   Current Designation   Previous Salary   Current Salary
  1     100   ABC    Apprentice             Surgeon               45000             75000

A Surrogate Key/Synthetic Key is a key that is used as the primary key in a dimension table in the data warehouse.
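The start-time/end-time approach to a slowly changing dimension can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the function name add_version and the dictionary row layout are hypothetical, and the sketch assumes the old row is closed on the same date the new row starts.

```python
from datetime import date

def add_version(rows, business_id, new_values, change_date):
    """Preserve history: close the currently open row and append a new current row."""
    current = next(r for r in rows if r["id"] == business_id and r["end_time"] is None)
    current["end_time"] = change_date  # close the old version
    new_row = {
        **current,
        **new_values,
        "key": max(r["key"] for r in rows) + 1,  # new surrogate key
        "start_time": change_date,
        "end_time": None,                        # open-ended: this is the current row
    }
    rows.append(new_row)
    return new_row

# Employee 100 starts as an Apprentice, then becomes a Surgeon.
employee_dim = [
    {"key": 1, "id": 100, "name": "ABC", "designation": "Apprentice",
     "salary": 45000, "start_time": date(2024, 1, 22), "end_time": None},
]
add_version(employee_dim, 100, {"designation": "Surgeon", "salary": 75000}, date(2025, 1, 22))

for row in employee_dim:
    print(row["key"], row["designation"], row["start_time"], row["end_time"])
```

The business ID (100) repeats across versions, which is why the dimension needs a surrogate key as its primary key.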
Types of Facts in DWH
● A fact is said to be Fully Additive if it can be summed up across all the dimensions.
  E.g. In a pharmaceutical company, the sales amount of a medicine is an additive fact, because it can be summed up along all the dimensions present in the fact table: time, store, and product. The sales amounts for all 7 days in a week add up to the total sales amount for that week.
● A fact is said to be Semi-Additive if it can be summed up across most of the dimensions but not all.
  E.g. Suppose a patient is admitted to a hospital named Chandryan for two months, and at the end of each day the total amount due from the patient for that day is recorded in the patient bill table. Here, the total amount due is a semi-additive fact measure.
● A fact is said to be Non-Additive if it cannot be summed up across any of the dimensions. All ratios are non-additive.
  E.g. Suppose a pharmaceutical company "SUN" manufactures ten types of drugs. The profit margin for each of the drugs is a non-additive fact.
● Factless Facts contain only dimensional keys and no facts, because they capture events for information rather than for computation.
  E.g. Suppose a medicine is not sold throughout the year in a location. By looking only at the sales fact table, we can't find which medicine was not sold in which location, because the fact table has no record for a medicine that was never sold there. A factless fact table identifies these scenarios. Capturing them helps the concerned team focus on the sale of the product through some business strategy, such as giving discounts on the product.
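The difference between additive, semi-additive, and non-additive facts shows up as soon as the values are aggregated. The sketch below uses small made-up numbers (all rows are hypothetical) to show which sums are meaningful and which are not.

```python
# Hypothetical daily sales facts: (day, store, sales_amount, profit)
sales = [
    ("Mon", "S1", 100.0, 20.0),
    ("Mon", "S2", 300.0, 30.0),
    ("Tue", "S1", 200.0, 50.0),
]

# Fully additive: sales amount can be summed across days and stores alike.
total_sales = sum(amount for _, _, amount, _ in sales)
total_profit = sum(profit for _, _, _, profit in sales)

# Semi-additive: amount due, recorded per (patient, day) at the end of each day.
amount_due = {
    ("P1", 1): 500.0, ("P1", 2): 800.0,
    ("P2", 1): 200.0, ("P2", 2): 200.0,
}
# Summing across patients for one day is meaningful...
due_on_day_2 = sum(v for (patient, day), v in amount_due.items() if day == 2)
# ...but summing one patient's balance across days is not; use the latest value instead.
last_day_p1 = max(day for (patient, day) in amount_due if patient == "P1")
latest_due_p1 = amount_due[("P1", last_day_p1)]

# Non-additive: profit margin is a ratio, so row-level margins cannot be summed.
row_margins = [profit / amount for _, _, amount, profit in sales]
overall_margin = total_profit / total_sales  # recompute from additive components

print(total_sales, due_on_day_2, latest_due_p1, round(overall_margin, 4))
```

The usual remedy for a non-additive ratio is the one shown here: store its additive components (profit and sales amount) and compute the ratio after aggregating.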
Steps of Dimensional Modeling
1) Identify the Business Process. Identify the actual business process the data warehouse should cover. This could be Marketing, Sales, HR, etc., as per the data analysis needs of the organization. The selection of the business process also depends on the quality of data available for that process. It is the most important step of the data modeling process, and a failure here would have cascading and irreparable defects.
2) Identify the Grain. The grain describes the level of detail for the business problem/solution. It is the process of identifying the lowest level of information for any table in your data warehouse. If a table contains sales data for every day, then it has daily granularity. If a table contains total sales data for each month, then it has monthly granularity.
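The relationship between daily and monthly grain can be illustrated by rolling daily rows up to months. This is a minimal sketch with hypothetical sample values; roll_up_to_month is a name introduced only for this example.

```python
from collections import defaultdict
from datetime import date

# Daily grain: one row per day (hypothetical values).
daily_sales = [
    (date(2025, 1, 30), 120.0),
    (date(2025, 1, 31), 80.0),
    (date(2025, 2, 1), 50.0),
]

def roll_up_to_month(rows):
    """Aggregate daily-grain rows to monthly grain, keyed by (year, month)."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)

monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

The roll-up only works in one direction: monthly totals can be derived from daily rows, but daily detail cannot be recovered from a table stored at monthly grain.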
Steps of Dimensional Modeling
3) Identify the Dimensions. Dimensions are nouns like date, store, inventory, etc. These dimensions are where the descriptive data should be stored. For example, the date dimension may contain data like year, month, and weekday.
   Example of Dimensions: The CEO at an MNC wants to find the sales for specific products in different locations on a daily basis.
   Dimensions: Product, Location, and Time
   Attributes: For Product: Product key (referenced as a foreign key from the fact table), Name, Type, Specifications
   Hierarchies: For Location: Country, State, City, Street Address, Name
4) Identify the Facts. This step is closely tied to the business users of the system, because this is where they get access to data stored in the data warehouse. Most fact table values are numeric, such as price or cost per unit.
   Example of Facts: The CEO at an MNC wants to find the sales for specific products in different locations on a daily basis.
   The fact here is Sum of Sales by product by location by time.
5) Build Schema. In this step, you implement the Dimension Model.
   A schema is nothing but the database structure (arrangement of tables).
   ● It is a logical description of the entire database.
   ● It includes the name and description of records of all record types, including all associated data items and aggregates.
Types of DWH Schema
● The Star Schema is the basic form of a dimensional model, in which data are organized into facts and dimensions. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of the fact table, and the points of the star are the dimension tables.
Advantages:
1. Simplest and easiest
2. It optimizes navigation through the database
3. Most suitable for query processing
4. Faster performance, as fewer joins are required
Disadvantage: Data redundancy. Values may be repeated in some instances; for example, city, province_or_state, and country would be repeated for two streets in the same city.
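A star schema for the "sales by product, by location, by day" question might look like the sketch below, written with Python's built-in sqlite3 module. Every table name, column name, and row value here is an assumption made for illustration; only the overall shape (one fact table in the center, one join per dimension) reflects the star schema described above.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
-- Dimension tables: the points of the star (de-normalized).
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                           province_or_state TEXT, country TEXT);
CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);

-- Fact table: the center of the star.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    time_key     INTEGER REFERENCES dim_time(time_key),
    sales_amount REAL
);

INSERT INTO dim_product  VALUES (1, 'Widget', 'Hardware'), (2, 'Zapper', 'Hardware');
-- Two streets in the same city: city, state and country are repeated (redundancy).
INSERT INTO dim_location VALUES (1, '1 Main St', 'City A', 'State A', 'Country A'),
                                (2, '2 Side St', 'City A', 'State A', 'Country A');
INSERT INTO dim_time     VALUES (1, '2025-01-01', '2025-01', 2025);
INSERT INTO fact_sales   VALUES (1, 1, 1, 10.0), (1, 2, 1, 15.0), (2, 1, 1, 7.0);
""")

# Sum of sales by product, by city, by day: one join per dimension, no deeper chains.
report = star.execute("""
    SELECT p.name, l.city, t.day, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p ON p.product_key  = f.product_key
    JOIN dim_location AS l ON l.location_key = f.location_key
    JOIN dim_time     AS t ON t.time_key     = f.time_key
    GROUP BY p.name, l.city, t.day
    ORDER BY p.name
""").fetchall()
print(report)
```

The two dim_location rows show the redundancy trade-off: the city, state, and country values are stored twice so that the query never needs more than one join to reach them.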
Types of DWH Schema
● The Snowflake Schema is an extension of the star schema. In a snowflake schema, each dimension is normalized and connected to more dimension tables. It is named snowflake because the decomposition of one de-normalized dimension into many normalized dimensions makes the diagram look like a snowflake.
Disadvantages:
1. Slow performance, because too many joins are required to form the result.
2. It is a complex schema.
Advantages:
1. Fewer redundancies due to normalization of the dimension tables.
2. Dimension tables are easier to update.
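Snowflaking can be pictured as splitting one de-normalized dimension into smaller normalized tables. The sketch below does this for a location dimension in plain Python; the row values and the names snowflake and city_of are hypothetical and chosen only to illustrate the idea.

```python
# De-normalized (star-style) location dimension: city details repeat per street.
denormalized_location = [
    {"location_key": 1, "street": "1 Main St", "city": "City A",
     "state": "State A", "country": "Country A"},
    {"location_key": 2, "street": "2 Side St", "city": "City A",
     "state": "State A", "country": "Country A"},
]

def snowflake(rows):
    """Split the location dimension into a location table and a city table."""
    city_keys = {}   # (city, state, country) -> city_key
    locations = []
    for row in rows:
        city_tuple = (row["city"], row["state"], row["country"])
        city_key = city_keys.setdefault(city_tuple, len(city_keys) + 1)
        locations.append({"location_key": row["location_key"],
                          "street": row["street"], "city_key": city_key})
    dim_city = [{"city_key": key, "city": c, "state": s, "country": n}
                for (c, s, n), key in city_keys.items()]
    return locations, dim_city

def city_of(location, dim_city):
    """The extra lookup (join) a snowflake schema needs to reach the city name."""
    return next(c for c in dim_city if c["city_key"] == location["city_key"])["city"]

dim_location, dim_city = snowflake(denormalized_location)
print(dim_location, dim_city)
```

The city details are now stored once, which is the reduced redundancy listed under Advantages, while city_of stands in for the additional join listed under Disadvantages.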
Types of DWH Schema
● The Galaxy Schema/Fact Constellation Schema is a collection of multiple star schemas in which multiple fact tables are connected to their respective dimensions. The collection of star schemas resembles a galaxy, which is why it is called a galaxy schema.
Disadvantages:
1. Complex due to multiple fact tables.
2. It is difficult to manage.
3. Dimension Tables are very large.
Advantages:
1. Ensures data reusability.
2. Guarantees referential integrity.
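A galaxy schema can be sketched as two fact tables that share one dimension. The example below reuses the Patient dimension with a diagnosis fact and a billing fact; all row values, measure names, and the helper total_by_patient are hypothetical.

```python
# Shared dimension: patient_key -> patient name (hypothetical rows).
dim_patient = {1: "Patient A", 2: "Patient B"}

# Two fact tables, each the center of its own star, both keyed by patient_key.
fact_diagnosis = [
    {"patient_key": 1, "diagnosis_count": 2},
    {"patient_key": 2, "diagnosis_count": 1},
]
fact_billing = [
    {"patient_key": 1, "billed_amount": 1000.0},
    {"patient_key": 1, "billed_amount": 500.0},
]

def total_by_patient(fact_rows, measure):
    """Sum one measure of a fact table per patient key."""
    totals = {}
    for row in fact_rows:
        totals[row["patient_key"]] = totals.get(row["patient_key"], 0) + row[measure]
    return totals

diagnoses = total_by_patient(fact_diagnosis, "diagnosis_count")
billing = total_by_patient(fact_billing, "billed_amount")

# Because both facts use the same patient keys, their results line up in one report.
combined = {
    name: (diagnoses.get(key, 0), billing.get(key, 0.0))
    for key, name in dim_patient.items()
}
print(combined)
```

Each fact table is aggregated separately and the results are combined through the shared dimension, which is how the data reusability listed under Advantages shows up in practice.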
Benefits of Dimensional Modeling
● Standardization of dimensions allows easy reporting across areas of the business.
● Dimension tables store the history of the dimensional information.
● It allows entirely new dimensions to be introduced without major disruptions to the fact table.
● Compared to the normalized model, dimensional tables are easier to understand.
● Information is grouped into clear and simple business categories.
● The dimensional model is very understandable by the business. This model is based on business terms, so the business knows what each fact, dimension, or attribute means.
● The dimensional model also helps boost query performance. It is more denormalized; therefore, it is optimized for querying.
● Dimensional models are denormalized and optimized for fast data querying. Many relational database platforms recognize this model and optimize query execution plans to aid in performance.
● Dimensional modeling in a data warehouse creates a schema which is optimized for high performance. It means fewer joins and helps with minimized data redundancy.
● Dimensional models can comfortably accommodate change. Dimension tables can have more columns added to them without affecting existing business intelligence applications using these tables.
Summary
● A dimensional model is a data structure technique optimized for data warehousing tools.
● Facts are the measurements/metrics from your business process.
● A dimension provides the context surrounding a business process event.
● Attributes are the various characteristics of a dimension in dimensional modeling.
● Measures are numeric data based on columns in a fact table.
● A fact table is the primary table in a dimensional model.
● A dimension table contains the dimensions of a fact.
● Types of Dimensions are Conformed, Junk/Dirty, Degenerate, Static, and Slowly Changing Dimensions.
● There are 4 types of facts: (1) Additive (2) Non-additive (3) Semi-additive (4) Factless Facts
● The five steps of Dimensional Modeling are (1) Identify Business Process (2) Identify Grain (level of detail) (3) Identify Dimensions (4) Identify Facts (5) Build Schema
● A schema is nothing but the database structure (arrangement of tables).
● The three (3) types of schema are: (1) Star (2) Snowflake (3) Fact Constellation/Galaxy Schema
Your Turn!
Let's do the "Two Truths and a Lie" learning activity.
● Individual Reflection: Write down three statements about the lesson, ensuring one is a false statement.
● Sharing: Share and read aloud the three statements before your classmates.
● Class Guessing: The class tries to determine which statement is the lie.

Thank you for participating!
Great works are performed not by strength, but by perseverance.
- Samuel Johnson
References
1) Deepika, K. et al. Data Warehousing. Bhumi Publishing. ISBN: 978-93-88901-23-9.
2) Inmon, W. Building the Data Warehouse. 4th Edition.
3) Kimball, R. The Data Warehouse Toolkit. 3rd Edition.
4) Ponniah, P. Data Warehousing Fundamentals for IT Professionals. 2nd Edition. John Wiley & Sons, Inc. ISBN 978-0-470-46207-2.
5) https://www.guru99.com/dimensional-model-data-warehouse.html
6) https://tinyurl.com/Dimensional-Modeling
