CSIS 3300 W3 Denormalization StarSchema

Uploaded by

rodrigoferraribr

DENORMALIZATION AND

STAR SCHEMA
NIKHIL BHARDWAJ
The 7Ws Framework
Lawrence Corr, Jim Stagnitto
p://www.dama-phila.org/JS20120509.pdf)

The one missing W in this one is how it happened.

This shows that at times, you might not have all 7Ws.
DIMENSIONS - REVIEW
• Dimension data is highly denormalized and precalculated
• Dimensions are usually identified by “By” words while
querying / generating reports. E.g. show sales amount By
region By month
• The quality of a DW is only as good as the richness of its
dimensions
• Some common dimensions: Date, Time, Product/Service,
Location, Customer
FACTS - REVIEW

• Primary table in a dimensional model, where the
numerical performance measurements of the business
are stored.
• Grain: the list/intersection of dimensions at which the fact
measurements are taken. All measurements in the fact
table must be at the same grain.
• It is not possible to drill down any further than the grain of fact table.
• Most useful facts are numerical and additive such as
FOUR STEPS IN DIMENSIONAL
MODELING
1. Identify the process being modeled.
2. Determine the grain at which facts will be
stored.
3. Choose the dimensions.
4. Identify the numeric measures for the facts.
SLOWLY CHANGING DIMENSIONS

• Most dimensions are generally constant over time
• Many dimensions change slowly
• Though the key does not change, descriptions and other
attributes change slowly over time
• Dimension table attributes are not overwritten
• The way changes are made to dimension tables depends on the
type of change and on what information must be preserved.
TYPE 1: CORRECTION OF ERRORS

• Usually relates to the correction of errors in the source systems.
• E.g., a spelling error in a customer name; a change of a
customer's name
• There is no need to preserve the old values here.
• The old value in the source system needs to be discarded.
• The changes made need not be preserved or noted.
TYPE 1.. CONTINUED
• Overwrite attribute value in the dimension table
row with new value
• No other changes are made to the dimension table
row.
• The key is not disturbed
• Easiest type of change to implement.
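
A Type 1 change can be sketched in a few lines. This is only an illustration: the in-memory `customer_dim` dictionary, the misspelled name, and the helper `apply_type1` are hypothetical and stand in for a real dimension table and an UPDATE statement.

```python
# Hypothetical in-memory dimension table: surrogate key -> row.
customer_dim = {
    1552: {"bk_cust_id": 31421, "cust_name": "Jane Ridr"},  # assumed misspelling
}


def apply_type1(dim, surrogate_key, attribute, new_value):
    """Overwrite one attribute in place; the key and other columns are untouched."""
    dim[surrogate_key][attribute] = new_value


apply_type1(customer_dim, 1552, "cust_name", "Jane Rider")
print(customer_dim[1552])
```

After the call the old spelling is gone, which is the point of Type 1: no history is kept.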
TYPE 2: PRESERVATION OF
HISTORY
• True changes in the source systems.
• E.g., change of marital status; change of address
• There is a need to preserve history
• This type of change partitions the history in the warehouse
• Every change for the same attribute must be preserved.
• Applying these changes:
• Add a new dimension table row with new value of the changed
attribute
• No changes are made to the existing row.
• New rows are inserted with a new surrogate key.
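
The Type 2 steps above might look like the following sketch. The list-of-dicts table, the key generator starting at 2387, and the helper `apply_type2` are assumptions made for illustration; the customer values are borrowed from the Jane Rider example in this deck.

```python
import itertools

# Hypothetical dimension table held as a list of rows.
customer_dim = [
    {"cust_key": 1552, "bk_cust_id": 31421, "cust_name": "Jane Rider", "comm_dist": 3},
]
next_key = itertools.count(2387)  # assumed surrogate key generator


def apply_type2(dim, bk_cust_id, attribute, new_value):
    """Insert a new row for the changed attribute; existing rows are not modified."""
    latest = [row for row in dim if row["bk_cust_id"] == bk_cust_id][-1]
    new_row = {**latest, attribute: new_value, "cust_key": next(next_key)}
    dim.append(new_row)
    return new_row["cust_key"]


new_key = apply_type2(customer_dim, 31421, "comm_dist", 31)
print(new_key, len(customer_dim))
```

The business key (`bk_cust_id`) is the same in both rows, while each version gets its own surrogate key, so fact rows loaded after the change can point at the new version.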
TYPE 2.. CONTINUED
TYPE 3: TENTATIVE SOFT REVISION

• Tentative changes in the source system
• E.g., an employee is posted to a different location for a short period
• Need to keep track of history with old and new values
• Used to compare performance across the transition
• Applying these changes
• An “old” field is added in the dimension table
• Push the existing value of the attribute from “current” to “old”
• Update the “current” field with the new value and record the effective date
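
A minimal sketch of the Type 3 steps, assuming a hypothetical employee dimension with `current_location`, `old_location`, and `effective_date` columns (the names, key, cities, and dates are all invented for illustration):

```python
from datetime import date

# Hypothetical employee dimension: surrogate key -> row.
employee_dim = {
    501: {
        "current_location": "Vancouver",
        "old_location": None,
        "effective_date": date(2015, 1, 1),
    },
}


def apply_type3(dim, key, new_value, effective):
    """Push the current value into the 'old' column, then store the new value."""
    row = dim[key]
    row["old_location"] = row["current_location"]
    row["current_location"] = new_value
    row["effective_date"] = effective


apply_type3(employee_dim, 501, "Toronto", date(2016, 9, 26))
print(employee_dim[501])
```

Only one previous value survives in this layout; a second change would overwrite `old_location`.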
TYPE 3.. CONTINUED
SLOWLY CHANGING DIMENSIONS

Attributes in a dimension that change more slowly than
the fact granularity
• Type 1: Current only
• Type 2: All history
• Type 3: Most recent few (rare)
Note: rapidly changing dimensions usually indicate
the presence of a business process that should be
tracked as a separate dimension or as a fact table
CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?
1552     31421     Jane Rider  3         F       N

Fact Table
Date       CustKey    ProdKey  Item Count  Amount
1/7/2014   1552       95       1           1,798.00
3/2/2014   1552       37       1           27.95
5/7/2015   1552       87       2           320.26
2/21/2016  1552 2387  42       1           19.95

Dimension with a slowly changing attribute

CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?  Eff       End
1552     31421     Jane Rider  3         F       N        1/7/2004  1/1/2006
2387     31421     Jane Rider  31        F       N        1/2/2006  12/31/9999
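
With Eff and End columns, the load process can pick the dimension row that was valid on a given date. The sketch below assumes the two Jane Rider rows from the table above; the helper names `parse` and `surrogate_key_for` are hypothetical.

```python
from datetime import datetime


def parse(text):
    """Parse a m/d/yyyy string, the date format used on the slide."""
    return datetime.strptime(text, "%m/%d/%Y").date()


customer_dim = [
    {"cust_key": 1552, "bk_cust_id": 31421, "comm_dist": 3,
     "eff": parse("1/7/2004"), "end": parse("1/1/2006")},
    {"cust_key": 2387, "bk_cust_id": 31421, "comm_dist": 31,
     "eff": parse("1/2/2006"), "end": parse("12/31/9999")},
]


def surrogate_key_for(dim, bk_cust_id, on_date):
    """Return the surrogate key of the row valid on on_date, or None if none matches."""
    for row in dim:
        if row["bk_cust_id"] == bk_cust_id and row["eff"] <= on_date <= row["end"]:
            return row["cust_key"]
    return None


print(surrogate_key_for(customer_dim, 31421, parse("2/21/2016")))
```

The far-future end date (12/31/9999) marks the current row, so the same range check works for both historical and current versions.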
RAPIDLY CHANGING DIMENSIONS
DATE DIMENSIONS

• One row for every day for which you expect to have
data for the fact table (perhaps generated in a
spreadsheet and imported)
• Usually use a meaningful integer surrogate key (such
as yyyymmdd 20160926 for Sep. 26, 2016). Note:
this order sorts correctly.
• Include rows for missing or future dates to be added
later.
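
Instead of a spreadsheet, the rows could also be generated with a short script. This is a sketch only: the column names and the function `build_date_dimension` are assumptions, and a real date dimension usually carries many more attributes (holidays, fiscal periods, and so on).

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # yyyymmdd integer key, e.g. 20160926 for Sep. 26, 2016
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "weekday": WEEKDAYS[day.weekday()],
        })
        day += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2016, 1, 1), date(2016, 12, 31))
print(len(date_dim), date_dim[0]["date_key"], date_dim[-1]["date_key"])
```

Because the key is built as yyyymmdd, sorting by `date_key` gives chronological order.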
DEGENERATE DIMENSIONS

• Dimensions without attributes (such as a transaction
number or order number).
• Put the attribute value into the fact table even though it is
not an additive fact.
FACTLESS FACT TABLE

• A fact table is said to be factless if it has no measures to be
displayed. Such a fact table represents events
• Contains no measures, only keys.
• How can you make it better?
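
Even without measures, a factless fact table can be analysed by counting its rows. The sketch below assumes a hypothetical class-attendance event table; the key names and values are invented for illustration.

```python
from collections import Counter

# Hypothetical factless fact rows: foreign keys only, no numeric measures.
attendance_fact = [
    {"date_key": 20160926, "student_key": 1, "course_key": 10},
    {"date_key": 20160926, "student_key": 2, "course_key": 10},
    {"date_key": 20160926, "student_key": 1, "course_key": 11},
    {"date_key": 20160927, "student_key": 2, "course_key": 10},
]

# Each row is one event, so counting rows per key answers "how many events By course".
attendance_by_course = Counter(row["course_key"] for row in attendance_fact)
print(attendance_by_course)
```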
DENORMALIZATION

Combine
DENORMALIZATION

Expand /
Calculate
1. Denormalize and add region (e.g. NA, EMEA)
2. Denormalize and add location data based on IP
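
The first task could be approached as in the sketch below. It assumes that each row already has a country code and that region is derived from it; the lookup `REGION_BY_COUNTRY`, the sample rows, and the function `denormalize_region` are hypothetical and not part of the lab material.

```python
# Assumed lookup from country code to sales region.
REGION_BY_COUNTRY = {"US": "NA", "CA": "NA", "DE": "EMEA", "FR": "EMEA"}


def denormalize_region(rows, lookup, default="UNKNOWN"):
    """Return copies of the rows with a 'region' column expanded from 'country'."""
    return [{**row, "region": lookup.get(row["country"], default)} for row in rows]


customers = [
    {"customer_id": 1, "country": "CA"},
    {"customer_id": 2, "country": "DE"},
    {"customer_id": 3, "country": "BR"},  # not in the lookup
]

wide_customers = denormalize_region(customers, REGION_BY_COUNTRY)
print(wide_customers)
```

Storing the region on every row repeats data, but reports grouped By region no longer need a join or a calculation.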

DENORMALIZATION - LAB
STAR SCHEMA

• The fact table (center of the star) contains FKs to all the
dimension tables.
• Sometimes people use a prefix or suffix such as dim
and fact to identify dimension and fact tables
• Dimension tables do not contain a huge amount of data
but do contain a larger number of columns (attributes)
• Aggregation is done by grouping on one or more attributes of the
dimension tables.
STAR SCHEMA BENEFITS

• Simpler queries
• Simplified business logic due to extra attributes
stored directly in the dimensions, e.g. instead of
storing just the date and time, attributes such as week,
weekday, month, quarter, and year are all stored, so a
business analyst can later query on any of these
attributes without the need to calculate these
derived attributes in the application.
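
To illustrate the "simpler queries" point, here is a sketch of a tiny star schema built in an in-memory SQLite database (SQLite is used here only so the example runs without a server). The table names, columns, and sample rows are assumptions for illustration and are not the Foodmart schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE dim_location (
    location_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    sales_amount REAL);
INSERT INTO dim_date VALUES (20160926, 9, 3, 2016), (20161003, 10, 4, 2016);
INSERT INTO dim_location VALUES (1, 'NA'), (2, 'EMEA');
INSERT INTO fact_sales VALUES
    (20160926, 1, 100.0), (20160926, 2, 50.0),
    (20161003, 1, 25.0), (20160926, 1, 10.0);
""")

# "Show sales amount By region By month": one join per dimension, then GROUP BY.
rows = conn.execute("""
SELECT l.region, d.month, SUM(f.sales_amount) AS total
FROM fact_sales AS f
JOIN dim_date AS d ON d.date_key = f.date_key
JOIN dim_location AS l ON l.location_key = f.location_key
GROUP BY l.region, d.month
ORDER BY l.region, d.month
""").fetchall()
print(rows)
```

The month is read straight from the date dimension, so the query does not need any date arithmetic.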
Star Schema for Foodmart
SNOWFLAKE SCHEMA
• For M:N relationships a snowflake schema is better; it
uses an intersection table to resolve the M:N relationship.
• In a snowflake schema we keep the main dimension
close to the fact table, and the second dimension is
joined to the main dimension via the intersection table
• E.g. if we design a schema for a library system in which
book and author have an M:N relationship, we might
connect the book dimension to the fact table and join the
author dimension to the book dimension via a
BookAuthor intersection table. More on this in the case study
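
The book/author example might be sketched as follows, again using an in-memory SQLite database so that it runs without a server. The table names, columns, and sample rows are hypothetical and are not the schema from the case study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_book (book_key INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dim_author (author_key INTEGER PRIMARY KEY, author_name TEXT);
-- Intersection table that resolves the M:N relationship between book and author.
CREATE TABLE book_author (
    book_key INTEGER REFERENCES dim_book(book_key),
    author_key INTEGER REFERENCES dim_author(author_key),
    PRIMARY KEY (book_key, author_key));
-- The fact table references only the main dimension (book).
CREATE TABLE fact_checkout (
    checkout_id INTEGER PRIMARY KEY,
    book_key INTEGER REFERENCES dim_book(book_key));
INSERT INTO dim_book VALUES (1, 'Book A'), (2, 'Book B');
INSERT INTO dim_author VALUES (10, 'Author X'), (11, 'Author Y');
INSERT INTO book_author VALUES (1, 10), (1, 11), (2, 10);
INSERT INTO fact_checkout VALUES (100, 1), (101, 1), (102, 2);
""")

checkouts_by_author = conn.execute("""
SELECT a.author_name, COUNT(*) AS checkouts
FROM fact_checkout AS f
JOIN book_author AS ba ON ba.book_key = f.book_key
JOIN dim_author AS a ON a.author_key = ba.author_key
GROUP BY a.author_name
ORDER BY a.author_name
""").fetchall()
print(checkouts_by_author)
```

A checkout of a co-authored book is counted once per author, so the per-author counts can add up to more than the number of checkouts.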
MULTIPLE FACT TABLES

• A business may have multiple fact tables for two reasons
• Different measurements at different grain levels or involving different dimensions
• Different types of fact tables to accommodate business needs.
• Types of fact tables
• Transactional (e.g. normal business transactions)
• Periodic Snapshot (e.g. inventory levels)
• Accumulating Snapshot (e.g. business process pipeline)
More details on this topic can be found on
http://www.kimballgroup.com/2014/06/design-tip-167-complementary-fact-table-types/
CASE STUDY
Creating star and snowflake schemas in MySQL workbench
We will analyse a library system which captures data about
patrons, which resources (books, DVDs, etc.) they check
out, the author(s) of each resource, when they check the
resource back in, branch information, mode of request
(online holds etc.), any relevant event, and the fine paid, if any.
A very simple dimensional model and the snowflake schema
are provided in the following slides.
