
Dimensional Modeling

Uploaded by

vanshikaedu0105


Dimensional Modeling

Nature of business data

Business Data
• Users tend to think in terms of business dimensions and analyze measurements along those dimensions.
• The business dimensions differ by industry and by the subject being analyzed.
• Time is a common dimension.
• Almost all business analyses are performed over time.
• Sometimes users are unable to describe fully what they expect.
• When requirements cannot be fully determined, a new and innovative concept is needed to gather and record them.
• The traditional methods are not adequate in this context.
• The new methodology for determining requirements for a data warehouse system is based on business dimensions.
• The new concept incorporates the basic measurements and the business dimensions along which users analyze those measurements.
• The result is what is known as an information package for a specific subject.
• The primary goal of the requirements definition phase is to compile information packages for all the subjects of the data warehouse.
Examples of subjects and their business dimensions:
• An automobile manufacturer analyzing sales: product, dealer, customer demographics, method of payment, and time.
• A hotel chain analyzing hotel occupancy: hotel, room type, and time.
Metrics for analyzing hotel occupancy
• Occupied rooms
• Vacant rooms
• Unavailable rooms
• Number of occupants
• Revenue
Information Subject: Hotel Occupancy

Dimensions:               Time            Hotel          Room Type
Hierarchies/Categories:   Year            Hotel Line     Room Type
                          Quarter         Branch Name    Room Size
                          Month           Branch Code    Number of Beds
                          Date            Region         Type of Bed
                          Day of Week     Address        Max. Occupants
                          Day of Month                   Suite
                          Holiday Flag                   Refrigerator
                                                         Kitchenette

Facts: Occupied Rooms, Vacant Rooms, Unavailable Rooms, Number of Occupants, Revenue
Design Decisions
• Choosing the process: selecting the subjects from the information packages for the first set of logical structures to be designed.
• Choosing the grain: determining the level of detail for the data in the data structures.
• Identifying and conforming the dimensions: choosing the business dimensions (such as product, market, and time) to be included in the first set of structures, and making sure that each data element in every business dimension is conformed, that is, consistent across the structures that use it.
• Choosing the facts: selecting the metrics or units of measurement (such as product sale units, dollar sales, and dollar revenue) to be included in the first set of structures.
• Choosing the duration of the database: determining how far back in time to go for historical data.
• Dimensional modeling gets its name from the business
dimensions we need to incorporate into the logical data
model.
• It is a logical design technique to structure the business
dimensions and the metrics that are analyzed along
these dimensions.
• This modeling technique is intuitive.
• The model provides high performance for queries and
analysis.
• The model consists of the specific data structures needed to represent the business dimensions. These data structures also contain the metrics or facts.
From the STAR schema, users can easily visualize answers to questions such as these:
• For a given dollar amount, what product was sold?
• Who was the customer?
• Which salesperson brought in the order?
• When was the order placed?
• Let us examine a typical query against the automaker
sales data. How much sales proceeds did the Jeep
Cherokee, Year 2000 Model with standard options,
generate in January 2000 at Big Sam Auto dealership
for buyers who own their homes and who took 3-year
leases, financed by Daimler-Chrysler Financing?
• The attributes in the dimension tables act as constraints
and filters in our queries. Any or all of the attributes of
each dimension table can participate in a query.
• Each dimension table has an equal chance to be part of
a query.
The marketing department wants the quantity sold and order dollars for product
bigpart-1, relating to customers in the state of Maine, obtained by salesperson
Jane Doe, during the month of June.
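The marketing request above can be sketched as a single query against a STAR schema. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are all assumptions made for the example and do not come from the original slides.

```python
import sqlite3

# Illustrative star schema; every table name, column name, and row is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product     (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE customer    (customer_key INTEGER PRIMARY KEY, customer_name TEXT, state TEXT);
CREATE TABLE salesperson (salesperson_key INTEGER PRIMARY KEY, salesperson_name TEXT);
CREATE TABLE order_date  (date_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE order_fact (
    product_key     INTEGER REFERENCES product(product_key),
    customer_key    INTEGER REFERENCES customer(customer_key),
    salesperson_key INTEGER REFERENCES salesperson(salesperson_key),
    date_key        INTEGER REFERENCES order_date(date_key),
    quantity_sold   INTEGER,
    order_dollars   REAL
);
INSERT INTO product     VALUES (1, 'bigpart-1'), (2, 'bigpart-2');
INSERT INTO customer    VALUES (1, 'Acme', 'Maine'), (2, 'Bolt', 'Ohio');
INSERT INTO salesperson VALUES (1, 'Jane Doe'), (2, 'John Roe');
INSERT INTO order_date  VALUES (1, 'June'), (2, 'July');
INSERT INTO order_fact  VALUES
    (1, 1, 1, 1, 10, 500.0),   -- matches every filter
    (1, 1, 1, 1,  5, 250.0),   -- matches every filter
    (1, 2, 1, 1,  7, 350.0),   -- customer in Ohio
    (2, 1, 1, 1,  3,  90.0),   -- different product
    (1, 1, 2, 1,  4, 200.0),   -- different salesperson
    (1, 1, 1, 2,  6, 300.0);   -- July, not June
""")

# Dimension attributes act as the filters; the fact table supplies the metrics.
quantity_sold, order_dollars = conn.execute("""
    SELECT SUM(f.quantity_sold), SUM(f.order_dollars)
    FROM order_fact f
    JOIN product     p ON p.product_key     = f.product_key
    JOIN customer    c ON c.customer_key    = f.customer_key
    JOIN salesperson s ON s.salesperson_key = f.salesperson_key
    JOIN order_date  d ON d.date_key        = f.date_key
    WHERE p.product_name = 'bigpart-1'
      AND c.state = 'Maine'
      AND s.salesperson_name = 'Jane Doe'
      AND d.month_name = 'June'
""").fetchone()

print(quantity_sold, order_dollars)
```

Each dimension table joins directly to the fact table, so adding or dropping a filter only changes the WHERE clause.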
Drill-down analysis from the STAR schema
Inside a Dimension Table
Inside the Fact Table
The Factless Fact Table
Data Granularity
• Fact tables at the lowest grain facilitate "graceful"
extensions.
• But we have to pay the price in terms of storage and
maintenance for the fact table at the lowest grain.
• In practice, however, we build aggregate fact tables to
support queries looking for summary numbers.
STAR SCHEMA KEYS
Primary Keys
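Suppose the product code in the operational system is an 8-position code in which:
• 2 positions give the code of the warehouse where the product is normally stored
• 2 positions give the product category
What if a product is now stored in a different warehouse of the company? Its operational product code changes, so if that code were the primary key of the product dimension, the same product would appear under two keys. This causes problems in aggregation.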
Foreign Keys
The primary key of each dimension table must be a foreign key in the fact table.
There are three options for the primary key of the fact table:
1) A single compound primary key whose length is the total length
of the keys of the individual dimension tables. Under this option,
in addition to the compound primary key, the foreign keys must
also be kept in the fact table as additional attributes. This option
increases the size of the fact table.
2) Concatenated primary key that is the concatenation of all the
primary keys of the dimension tables. Here you need not keep the
primary keys of the dimension tables as additional attributes to
serve as foreign keys. The individual parts of the primary keys
themselves will serve as the foreign keys.
3) A generated primary key independent of the keys of the
dimension tables. In addition to the generated primary key, the
foreign keys must also be kept in the fact table as additional
attributes. This option also increases the size of the fact table.
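To make options 2 and 3 concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names (sales_fact_concat, sales_fact_generated, product_key, store_key, date_key, sales_units) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 2: concatenated primary key; its parts double as the foreign keys.
CREATE TABLE sales_fact_concat (
    product_key INTEGER NOT NULL,
    store_key   INTEGER NOT NULL,
    date_key    INTEGER NOT NULL,
    sales_units INTEGER,
    PRIMARY KEY (product_key, store_key, date_key)
);
-- Option 3: generated primary key; the foreign keys are kept as extra attributes.
CREATE TABLE sales_fact_generated (
    sales_fact_id INTEGER PRIMARY KEY,
    product_key   INTEGER NOT NULL,
    store_key     INTEGER NOT NULL,
    date_key      INTEGER NOT NULL,
    sales_units   INTEGER
);
""")

# With the concatenated key, a second row for the same dimension combination is rejected.
conn.execute("INSERT INTO sales_fact_concat VALUES (1, 10, 20000101, 5)")
try:
    conn.execute("INSERT INTO sales_fact_concat VALUES (1, 10, 20000101, 7)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With the generated key, both rows are accepted because only sales_fact_id must be unique.
insert_sql = ("INSERT INTO sales_fact_generated "
              "(product_key, store_key, date_key, sales_units) VALUES (?, ?, ?, ?)")
conn.execute(insert_sql, (1, 10, 20000101, 5))
conn.execute(insert_sql, (1, 10, 20000101, 7))
generated_rows = conn.execute("SELECT COUNT(*) FROM sales_fact_generated").fetchone()[0]

print(duplicate_rejected, generated_rows)
```

In this sketch the concatenated key also enforces one fact row per combination of dimension keys, while the generated key would need a separate uniqueness constraint to do the same.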
• The STAR schema reflects exactly how users think and need data for querying and analysis.
• The STAR schema defines the join paths in exactly the same way users normally visualize the relationships.
• The STAR schema is intuitively understood by the users.
• It is easy to use as a vehicle for communicating with the users during the development of the data warehouse.
Irrespective of the number of dimensions that
participate in the query and irrespective of the
complexity of the query, every query is simply
executed first by selecting rows from the
dimension tables using the filters based on the
query parameters and then finding the
corresponding fact table rows.
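The two steps described above (filter the dimension tables, then find the corresponding fact rows) can be sketched in plain Python. The dictionaries and field names below are made up for illustration, and a real database engine chooses its own execution plan.

```python
# Hypothetical dimension tables keyed by their primary keys.
product_dim = {1: {"product_name": "bigpart-1"}, 2: {"product_name": "bigpart-2"}}
customer_dim = {1: {"state": "Maine"}, 2: {"state": "Ohio"}}

# Hypothetical fact rows carrying foreign keys and one metric.
fact_rows = [
    {"product_key": 1, "customer_key": 1, "quantity_sold": 10},
    {"product_key": 1, "customer_key": 2, "quantity_sold": 7},
    {"product_key": 2, "customer_key": 1, "quantity_sold": 3},
    {"product_key": 1, "customer_key": 1, "quantity_sold": 5},
]

# Step 1: select rows from each dimension table using the query filters.
product_keys = {k for k, row in product_dim.items() if row["product_name"] == "bigpart-1"}
customer_keys = {k for k, row in customer_dim.items() if row["state"] == "Maine"}

# Step 2: find the fact rows whose foreign keys point at the selected dimension rows.
matching = [
    f for f in fact_rows
    if f["product_key"] in product_keys and f["customer_key"] in customer_keys
]
total_quantity = sum(f["quantity_sold"] for f in matching)

print(len(matching), total_quantity)
```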
UPDATES TO THE DIMENSION TABLES
• The fact table continues to grow in the number of rows over time.
• Very rarely are the rows in a fact table updated with changes.
• Even adjustments to prior numbers are processed as additional adjustment rows and added to the fact table.
• Compared to the fact table, the dimension tables are more stable and less volatile. However, a dimension table changes not only by gaining rows but also through changes to the attributes themselves.
Slowly Changing Dimensions
Examples:
1) A customer's housing status changes from renting a home to owning a home.
2) The finance type changes for one of the payment methods.
• Most dimensions are generally constant over time.
• Many dimensions, though not constant over time, change slowly.
• The product key of the source record does not change.
• The description and other attributes change slowly over time.
• In the source OLTP systems, the new values overwrite the old ones.
• Overwriting dimension table attributes is not always the appropriate option in a data warehouse.
• The way changes are made to the dimension tables depends on the types of changes and on what information must be preserved in the data warehouse.
Type 1 Changes: Correction of Errors
Examples of changes to a customer record:
1) A spelling error in the customer name
2) Customer name is changed
3) The marital status changed from single to married
• Usually, the changes relate to the correction of errors in source systems.
• Sometimes the change in the source system has no significance.
• The old value in the source system needs to be discarded.
• The change in the source system need not be preserved in the data warehouse.
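A Type 1 change can be sketched as an in-place overwrite. The dimension layout, the helper name apply_type1, and the sample values below are assumptions for illustration only.

```python
# Hypothetical customer dimension keyed by surrogate key.
customer_dim = {
    101: {"customer_id": "C-1", "name": "Jon Smyth", "marital_status": "Single"},
}

def apply_type1(dim, surrogate_key, attribute, new_value):
    """Overwrite the attribute in place; the old value is discarded."""
    dim[surrogate_key][attribute] = new_value

# Correct a spelling error in the customer name.
apply_type1(customer_dim, 101, "name", "John Smith")

print(customer_dim[101]["name"], len(customer_dim))
```

No row is added and the key is untouched, so fact rows that reference key 101 now see the corrected name and no history is kept.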
Type 2 Changes: Preservation of History
Example: a change in marital status or customer address.
• They usually relate to true changes in source systems.
• There is a need to preserve history in the data warehouse.
• This type of change partitions the history in the data warehouse.
• Every change for the same attribute must be preserved.
Applying Type 2 changes:
• Add a new dimension table row with the new value of the changed attribute.
• An effective date field may be included in the dimension table.
• There are no changes to the original row in the dimension table.
• The key of the original row is not affected.
• The new row is inserted with a new surrogate key.
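The Type 2 steps listed above can be sketched as follows. The row layout, the helper name apply_type2, the way the next surrogate key is generated, and the sample dates are all assumptions for illustration.

```python
from datetime import date

# Hypothetical customer dimension: one row per version of a customer.
customer_dim = [
    {"customer_key": 101, "customer_id": "C-1", "marital_status": "Single",
     "effective_date": date(1999, 1, 1)},
]

def apply_type2(dim, customer_id, attribute, new_value, effective_date):
    """Add a new row with a new surrogate key; leave the original row untouched."""
    versions = [row for row in dim if row["customer_id"] == customer_id]
    current = max(versions, key=lambda row: row["effective_date"])
    new_row = dict(current)                      # copy the unchanged attributes
    new_row[attribute] = new_value
    new_row["effective_date"] = effective_date
    # Assumption: the next surrogate key is simply the highest existing key plus one.
    new_row["customer_key"] = max(row["customer_key"] for row in dim) + 1
    dim.append(new_row)
    return new_row

new_version = apply_type2(customer_dim, "C-1", "marital_status", "Married",
                          date(2000, 6, 15))

print(len(customer_dim), new_version["customer_key"])
```

Fact rows loaded before the effective date keep pointing at key 101, and later fact rows point at the new key, which is how the change partitions the history.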
Type 3 Changes: Tentative Soft Revisions
Type 1 changes are more common.
Type 2 changes preserve the history. When a Type 2 change is applied on a certain date, that date is a cut-off point.
Sometimes there is a need to track both the old and new values of changed attributes for a certain period, in both forward and backward directions.
These types of changes are Type 3 changes.
Type 3 changes are tentative or soft changes.
Example: realignment of the territorial assignments for salespersons.
Characteristics of Type 3 changes:
• They usually relate to "soft" or tentative changes in the source systems.
• There is a need to keep track of history with old and new values of the changed attribute.
• They are used to compare performances across the transition.
• They provide the ability to track forward and backward.
Applying Type 3 changes:
• Add an "old" field in the dimension table for the affected attribute.
• Push down the existing value of the attribute from the "current" field to the "old" field.
• Keep the new value of the attribute in the "current" field.
• You may also add a "current" effective date field for the attribute.
• The key of the row is not affected.
• No new dimension row is needed.
• The existing queries will seamlessly switch to the "current" value.
• Any queries that need to use the "old" value must be revised accordingly.
• The technique works best for one "soft" change at a time.
• If there is a succession of changes, more sophisticated techniques must be devised.
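A Type 3 change can be sketched with an "old" and a "current" field on the same row. The salesperson dimension, the field names, the helper name apply_type3, and the territory values are hypothetical.

```python
from datetime import date

# Hypothetical salesperson dimension with "current" and "old" territory fields.
salesperson_dim = {
    501: {"name": "Jane Doe", "current_territory": "North",
          "old_territory": None, "current_effective_date": None},
}

def apply_type3(dim, key, new_territory, effective_date):
    """Push the current value down to the old field and store the new value."""
    row = dim[key]
    row["old_territory"] = row["current_territory"]
    row["current_territory"] = new_territory
    row["current_effective_date"] = effective_date

apply_type3(salesperson_dim, 501, "East", date(2000, 7, 1))

row = salesperson_dim[501]
print(row["old_territory"], row["current_territory"])
```

Only one prior value fits in the "old" field, which is why the technique suits a single soft change at a time.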
Large Dimensions
A dimension can be large in two ways:
• Very deep: a very large number of rows.
• Very wide: a large number of attributes.
Examples: the customer and product dimensions.

Customer                          Product
Huge: 20 million rows             100,000 product variations
Up to 150 dimension attributes    100 dimension attributes
Can have multiple hierarchies     Can have multiple hierarchies

Issues with large dimensions:
• Data warehouse functions could be slow and inefficient.
• Fact table queries are inefficient when large dimensions need to be used.
• Additional rows are created to handle Type 2 slowly changing dimensions.
Multiple Hierarchies
Rapidly Changing Dimensions
Junk Dimensions
• Some fields like miscellaneous flags and textual
fields from source data structures may not be
included as significant fields in the major
dimensions, but cannot be discarded either.
• Keep only those flags and texts that are
meaningful; group all the useful flags into a
single "junk" dimension.
• "Junk" dimension attributes are useful for
constraining queries based on flag/text values.
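One common way to build a junk dimension, shown here only as a sketch, is to enumerate the combinations of the retained flag values and give each combination a key. The flag names and values below are invented for the example.

```python
from itertools import product

# Hypothetical flags kept from the source data structures.
flags = {
    "gift_wrap": ["Y", "N"],
    "rush_order": ["Y", "N"],
    "payment_note": ["none", "check", "wire"],
}

# One junk dimension row per combination of flag values, each with its own key.
junk_dim = [
    {"junk_key": key, **dict(zip(flags, combo))}
    for key, combo in enumerate(product(*flags.values()), start=1)
]

print(len(junk_dim))
print(junk_dim[0])
```

A fact row then carries a single junk_key instead of three separate flag columns, and queries can still constrain on any of the flags.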
THE SNOWFLAKE SCHEMA
Consider a product dimension table with 500,000 rows covering 500 product brands and 10 product categories, and a query constraining just on product category. Unless the table is indexed on category, the query has to search the 500,000 product rows; if category is moved to its own table, the initial search covers only 10 rows.
Options for normalizing the dimension tables:
1. Partially normalize only a few dimension tables, leaving the others intact
2. Partially or fully normalize only a few dimension tables, leaving the rest intact
3. Partially normalize every dimension table
4. Fully normalize every dimension table
Eliminating long text fields from the dimension tables can save storage space. For example, take a category name such as "men's furnishings" in a product dimension table of 500,000 rows.
• Snowflaking removes 500,000 20-byte category names from the dimension table.
• It adds a 4-byte artificial category key to each dimension table row.
• The net savings = 500,000 × 16 bytes = 8 MB.
• The 500,000-row product dimension table takes about 200 MB; the fact table takes about 20 GB.
• The savings are just 4% of the dimension table, and negligible compared with the fact table.
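The storage arithmetic above can be checked with a few lines of Python. The sketch assumes decimal units (1 MB = 10^6 bytes), which is what the 8 MB figure implies, and it ignores the small new category table.

```python
ROWS = 500_000                 # product dimension rows
CATEGORY_NAME_BYTES = 20       # text field removed by snowflaking
CATEGORY_KEY_BYTES = 4         # artificial key added in its place

net_savings_bytes = ROWS * (CATEGORY_NAME_BYTES - CATEGORY_KEY_BYTES)

DIMENSION_TABLE_BYTES = 200 * 10**6   # 200 MB
FACT_TABLE_BYTES = 20 * 10**9         # 20 GB

pct_of_dimension = 100 * net_savings_bytes / DIMENSION_TABLE_BYTES
pct_of_total = 100 * net_savings_bytes / (DIMENSION_TABLE_BYTES + FACT_TABLE_BYTES)

print(net_savings_bytes, pct_of_dimension, round(pct_of_total, 2))
```

Measured against the dimension table alone the saving is 4%; measured against the dimension and fact tables together it is a small fraction of one percent.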
Advantages
• Small savings in storage space
• Normalized structures are easier to update and maintain
Disadvantages
• The schema is less intuitive, and end users are put off by the complexity
• Browsing through the contents is difficult
• Query performance is degraded because of additional joins
Aggregate Fact Tables
Query 1: Total sales for customer number 12345678 during the first week of December 2000 for product Widget-1.
Query 2: Total sales for customer number 12345678 during the first three months of 2000 for product Widget-1.
Query 3: Total sales for all customers in the South-Central territory for the first two quarters of 2000 for product category Bigtools.

No. of rows-7,90,large
Assume that there is at least one sale per product per store per week.
1. Query involves 1 product, 1 store, 1 week: only 1 fact table row
2. Query involves 1 product, all stores, 1 week: 300 fact table rows
3. Query involves 1 brand, 1 store, 1 week: 500 fact table rows
4. Query involves 1 brand, all stores, 1 year: 7,800,000 fact table rows
If the totals are summarized in an aggregate table for each brand, per store, per week, the 3rd query reads only one row and the 4th query reads 15,600 rows.
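The row counts above follow from simple multiplication. The sketch below assumes 300 stores, 500 products per brand, and 52 weeks per year; these values are inferred from the row counts quoted in the slides rather than stated there directly.

```python
# Assumed sizes, inferred from the row counts quoted above.
STORES = 300
PRODUCTS_PER_BRAND = 500
WEEKS_PER_YEAR = 52

def rows_read(products, stores, weeks):
    """Rows read when there is one row per product per store per week."""
    return products * stores * weeks

# Base fact table: one row per product per store per week.
q1 = rows_read(1, 1, 1)
q2 = rows_read(1, STORES, 1)
q3 = rows_read(PRODUCTS_PER_BRAND, 1, 1)
q4 = rows_read(PRODUCTS_PER_BRAND, STORES, WEEKS_PER_YEAR)

# Aggregate table: one row per brand per store per week,
# so a brand counts as a single "product" row.
q3_aggregate = rows_read(1, 1, 1)
q4_aggregate = rows_read(1, STORES, WEEKS_PER_YEAR)

print(q1, q2, q3, q4)
print(q3_aggregate, q4_aggregate)
```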
When you rise to higher levels in the hierarchy of one dimension and keep the level at the lowest in the other dimensions, you create one-way aggregate tables.
Aggregates can also rise to higher levels in more than one dimension at once, for example:
• Product category by territory by month
• Product department by territory by month
• All products by territory by month
• Product category by region by month
• Product department by region by month
• All products by region by month
• Product category by all stores by month
• Product department by all stores by month
• Product category by territory by quarter
• Product department by territory by quarter
• All products by territory by quarter
• Product category by region by quarter
• Product department by region by quarter
• All products by region by quarter
• Product category by all stores by quarter
• Product department by all stores by quarter
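The list of aggregates above can be generated by combining hierarchy levels from the three dimensions. This sketch uses itertools.product and leaves out the "All products by all stores" combinations, which the list above also omits; the variable names are illustrative.

```python
from itertools import product

product_levels = ["Product category", "Product department", "All products"]
store_levels = ["territory", "region", "all stores"]
time_levels = ["month", "quarter"]

# Combine one level from each dimension; time varies slowest, product fastest.
aggregates = [
    f"{p} by {s} by {t}"
    for t, s, p in product(time_levels, store_levels, product_levels)
    if not (p == "All products" and s == "all stores")
]

for name in aggregates:
    print(name)
```

The number of candidate aggregate tables grows multiplicatively with the number of levels in each hierarchy, so in practice only a subset is usually built.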

