Agile Data Warehouse Design Sampler
Collaborative Dimensional Modeling,
from Whiteboard to Star Schema
Lawrence Corr
with Jim Stagnitto
Agile Data Warehouse Design is a step-by-step guide for capturing data warehousing/business
intelligence (DW/BI) requirements and turning them into high performance dimensional models in the
most direct way: by modelstorming (data modeling + brainstorming) with BI stakeholders.
This book describes BEAM✲, an agile approach to dimensional modeling, for improving communication
between data warehouse designers, BI stakeholders and the whole DW/BI development team. BEAM✲
provides tools and techniques that will encourage DW/BI designers and developers to move away from
their keyboards and entity relationship based tools and model interactively with their colleagues.
The result is everyone thinks dimensionally from the outset! Developers understand how to
efficiently implement dimensional modeling solutions. Business stakeholders feel ownership of the
data warehouse they have created, and can already imagine how they will use it to answer their
business questions.
Within this book, you will learn:
✲ Agile dimensional modeling using Business Event Analysis & Modeling (BEAM✲)
✲ Modelstorming: data modeling that is quicker, more inclusive, more productive, and frankly more fun!
✲ Telling dimensional data stories using the 7Ws (who, what, when, where, how many, why and how)
✲ Modeling by example not abstraction; using data story themes, not crow’s feet, to describe detail
✲ Storyboarding the data warehouse to discover conformed dimensions and plan iterative development
✲ Visual modeling: sketching timelines, charts and grids to model complex process measurement – simply
✲ Agile design documentation: enhancing star schemas with BEAM✲ dimensional shorthand notation
✲ Solving difficult DW/BI performance and usability problems with proven dimensional design patterns
Lawrence Corr is a data warehouse designer and educator. As Principal of DecisionOne
Consulting, he helps organizations to review and simplify their data warehouse designs, and
advises on visual data modeling techniques. He regularly holds agile modeling workshops
worldwide and has taught dimensional DW/BI skills to thousands of business/IT professionals.
Jim Stagnitto is a data warehouse and master data management architect specializing in
the healthcare, financial services, and information service industries. He is the founder of the
data warehousing and data mining consulting firm Llumino.
decisionone.co.uk
BEAM✲
modelstorming.com
Agile Data Warehouse Design
Collaborative Dimensional Modeling,
from Whiteboard to Star Schema
Lawrence Corr
with Jim Stagnitto
Agile Data Warehouse Design
by Lawrence Corr with Jim Stagnitto
No part of this book may be reproduced in any form or by any electronic or mechanical means including
information presentation, storage and retrieval systems, without permission in writing from the copyright
holder. The only exception is by a reviewer, who may quote short excerpts in a review.
This eBook is free of copy protection or functionality restrictions. You may view or print it for
personal use as you see fit. You may make copies for your own personal use (e.g., one for use while
traveling and one on a home computer or backup device) but you may not give the eBook file to
other people. The file is personalized with an email address on the cover and other identifying
information and belongs to that email user. Ownership cannot be transferred or sold. You may print the eBook but
it has been formatted specifically for on-screen viewing with no blank pages so margins and facing pages will not
print correctly. Generally it is cheaper and more efficient to order a paperback copy than print out the entire book.
Proofing: Laurence Hesketh, Geoff Hurrell
Illustrators: Gill Guile and Lawrence Corr
Cover Design: After Escher
Printing History:
November 2011: First Edition. January 2012: Revision. October 2012: Revision. May 2013: Revision.
Non-Printing History:
May 2013: eBook First Edition.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.
Where those designations appear in this book, and DecisionOne Press was aware of a trademark claim, the
designations have been printed in caps or initial caps.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with
respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties,
including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended
by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation.
Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or
Website is referred to in this work as a citation and/or a potential source of further information does not mean that
the author or the publisher endorses the information the organization or Website may provide or recommendations
it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or
disappeared between when this work was written and when it is read.
Lawrence has developed and reviewed data warehouses for healthcare, telecommunications, engineering,
broadcasting, financial services and public service organizations. He held the position of data warehouse
practice leader at Linc Systems Corporation, CT, USA and vice-president of data warehousing products
at Teleran Technologies, NJ, USA. Lawrence was also a Ralph Kimball Associate and has taught data
warehousing classes for Kimball University in Europe and South Africa and contributed to Kimball
articles and design tips. He lives in Yorkshire, England with his wife Mary and daughter Aimee.
Lawrence can be contacted at:
Jim Stagnitto is a data warehouse and master data management architect specializing in the healthcare,
financial services and information service industries. He is the founder of the data warehousing and data
mining consulting firm Llumino.
Jim has been a guest contributor for Ralph Kimball’s Intelligent Enterprise column, and a contributing
author to Ralph Kimball & Joe Caserta’s The Data Warehouse ETL Toolkit. He lives in Bucks County,
PA, USA with his wife Lori, and their happy brood of pets. Jim can be contacted at:
[email protected]
ACKNOWLEDGEMENTS
We would like to express our gratitude to everyone who made this book about BEAM✲ possible
(using BEAM✲ notation):
CONTENTS
INTRODUCTION
PART I: MODELSTORMING
CHAPTER 1
HOW TO MODEL A DATA WAREHOUSE
OLTP VS. DW/BI: TWO DIFFERENT WORLDS
The Case Against Entity-Relationship Modeling
Advantages of ER Modeling for OLTP
Disadvantages of ER Modeling for Data Warehousing
The Case For Dimensional Modeling
Star Schemas
Fact and Dimension Tables
Advantages of Dimensional Modeling for Data Warehousing
DATA WAREHOUSE ANALYSIS AND DESIGN
Data-Driven Analysis
Reporting-Driven Analysis
Proactive DW/BI Analysis and Design
Benefits of Proactive Design for Data Warehousing
Challenges of Proactive Analysis for Data Warehousing
Proactive Reporting-Driven Analysis Challenges
Proactive Data-Driven Analysis Challenges
Data then Requirements: a ‘Chicken or the egg’ Conundrum
Agile Data Warehouse Design
Agile Data Modeling
Agile Dimensional Modeling
Agile Dimensional Modeling and Traditional DW/BI Analysis
Agile Data-Driven Analysis
Agile Reporting-Driven Analysis
Requirements for Agile Dimensional Modeling
BEAM✲
Data Stories and the 7Ws Framework
Diagrams and Notation
BEAM✲ (Example Data) Tables
BEAM✲ Short Codes
Comparing BEAM✲ and Entity-Relationship Diagrams
Data Model Types
BEAM✲ Diagram Types
SUMMARY
CHAPTER 2
MODELING BUSINESS EVENTS
DATA STORIES
Story Types
Discrete Events
Evolving Events
CHAPTER 3
MODELING BUSINESS DIMENSIONS
DIMENSIONS
Dimension Stories
Discovering Dimensions
DOCUMENTING DIMENSIONS
Dimension Subject
Dimension Granularity and Business Keys
DIMENSIONAL ATTRIBUTES
Attribute Scope
CHAPTER 4
MODELING BUSINESS PROCESSES
MODELING MULTIPLE EVENTS WITH AGILITY
Conformed Dimensions
The Data Warehouse Bus
The Event Matrix
Event Sequences
Time/Value Sequence
Process Sequence
Modeling Process Sequences as Evolving Events
Using Process Sequences to Enrich Events
MODELSTORMING WITH AN EVENT MATRIX
CHAPTER 5
MODELING STAR SCHEMAS
AGILE DATA PROFILING
Identifying Candidate Data Sources
Data Profiling Techniques
Missing Values
Unique Values and Frequency
Data Ranges and Lengths
Automating Your Own Data Profiling Checks
No Source Yet: Proactive DW/BI Design
Annotating the Model with Data Profiling Results
Data Sources and Data Types
Additional Data
Unavailable Data
Nulls and Mismatched Attribute Descriptions
MODEL REVIEW AND SPRINT PLANNING
Team Estimating
Running a Model Review
Sprint Planning
STAR SCHEMA DESIGN
Adding Keys to a Dimensional Model
Choosing Primary Keys: Business Keys vs. Surrogate Keys
Benefits of Data Warehouse Surrogate Keys
Insulate the Data Warehouse from Business Key Change
Cope with Multiple Business Keys for a Dimension
Track Dimensional History Efficiently
Handle Missing Dimensional Values
Support Multi-Level Dimensions
Protect Sensitive Information
Reduce Fact Table Size
Improve Query Performance
Enforce Referential Integrity Efficiently
Slowly Changing Dimensions
Overwriting History: Type 1 Slowly Changing Dimensions
CHAPTER 7
WHEN AND WHERE: DESIGN PATTERNS FOR TIME AND LOCATION
TIME DIMENSIONS
Calendar Dimensions
Date Keys
ISO Date Keys
Epoch-Based Date Keys
Populating the Calendar
BI Tools and Calendar Dimensions
Period Calendars
Month Dimensions
Offset Calendars
Year-to-Date Comparisons
Fact-Specific Calendar Pattern
Using Fact State Information in Report Footers
Conformed Date Ranges
CLOCK DIMENSIONS
Day Clock Pattern - Date and Time Relationships
Time Keys
INTERNATIONAL TIME
Multinational Calendar Pattern
Date Version Keys
INTERNATIONAL TRAVEL
Time Dimensions or Time Facts?
NATIONAL LANGUAGE DIMENSIONS
National Language Calendars
Swappable National Language Dimensions Pattern
SUMMARY
CHAPTER 8
HOW MANY: DESIGN PATTERNS FOR HIGH PERFORMANCE FACT TABLES AND FLEXIBLE MEASURES
FACT TABLE TYPES
Transaction Fact Table
CHAPTER 9
WHY AND HOW: DESIGN PATTERNS FOR CAUSE AND EFFECT
WHY DIMENSIONS
Internal Why Dimensions
Unstructured Why Dimensions
External Why Dimensions
MULTI-VALUED DIMENSIONS
Weighting Factor Pattern
Modeling Multi-Valued Groups
Multi-Valued Bridge Pattern
Optional Bridge Pattern
Pivoted Dimension Pattern
HOW DIMENSIONS
Too Many Degenerate Dimensions?
Creating How Dimensions
Range Band Dimension Pattern
Agile techniques can help, but they must address data warehouse design, not just BI application development
Agile, with its mantra of creating business value through the early and frequent delivery of working
software and responding to change, has had just such a revolutionary effect on the world of
application development. Can it take on the challenges of DW/BI? Agile’s emphasis on collaboration
and incremental development, coupled with techniques such as Scrum and User Stories, will certainly
improve BI application development, once a data warehouse is in place. But to truly have an impact
on DW/BI, agile must also address data warehouse design itself. Unfortunately, the agile approaches
that have emerged so far are vague and non-prescriptive in this one key area. For agile BI to be
more than a marketing reboot of business-as-usual business intelligence, it must be agile DW/BI and
we, DW/BI professionals, must do what every true agilist would recommend: adapt agile to meet our
needs while still upholding its values and principles (see Appendix A: The Agile Manifesto). At the
same time, agilists coming afresh to DW/BI, for their part, must learn our hard-won data lessons.
This book is about BEAM✲: an agile approach to dimensional modeling
With that aim in mind, this book introduces BEAM✲ (Business Event Analysis & Modeling): a set of
collaborative techniques for modelstorming BI data requirements and translating them into
dimensional models on an agile timescale. We call the BEAM✲ approach “modelstorming” because it
combines data modeling and brainstorming techniques for rapidly creating inclusive, understandable
models that fully engage BI stakeholders.
BEAM✲ is used for modelstorming BI requirements directly with BI stakeholders
BEAM✲ modelers achieve this by asking stakeholders to tell data stories, using the 7W dimensional
types (who, what, when, where, how many, why, and how) to describe the business events they need to
measure. BEAM✲ models support modelstorming by differing radically from conventional
entity-relationship (ER) based models. BEAM✲ uses tabular notation and example data stories to
define business events in a format that is instantly recognizable to spreadsheet-literate BI
stakeholders, yet easily translated into atomic-detailed star schemas. By doing so, BEAM✲ bridges
the business-IT gap, creates consensus on data definitions and generates a sense of business
ownership and pride in the resulting database design.
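As a rough illustration of the idea above, the following Python sketch holds one business event as a
small example-data table whose columns are tagged with 7W types, then separates the "how many"
columns (candidate facts) from the remaining columns (candidate dimensions), as is usual when moving
toward a star schema. The event (a customer ordering a product), its column names, the example rows
and the helpers `as_records` and `split_event` are all hypothetical, invented for this sketch; it is
not BEAM✲ notation itself, which is introduced in the chapters that follow.

```python
from dataclasses import dataclass

# The 7W dimensional types named in the text.
SEVEN_WS = {"who", "what", "when", "where", "how many", "why", "how"}


@dataclass(frozen=True)
class Column:
    name: str    # column heading, in the stakeholders' own words
    w_type: str  # one of SEVEN_WS


# Hypothetical event "customer orders product"; all names are illustrative.
ORDER_EVENT = [
    Column("customer", "who"),
    Column("product", "what"),
    Column("order date", "when"),
    Column("store", "where"),
    Column("quantity", "how many"),
    Column("promotion", "why"),
    Column("order channel", "how"),
]

# Example data stories: one row per event instance, as in a spreadsheet.
EXAMPLE_ROWS = [
    ("J. Smith", "Laptop", "2011-05-18", "London", 1, "Spring sale", "Web"),
    ("A. Jones", "Tablet", "2011-06-02", "Leeds", 2, "None", "Store"),
]


def as_records(columns, rows):
    """Pair each example value with its column heading."""
    names = [c.name for c in columns]
    records = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row does not match the event's columns")
        records.append(dict(zip(names, row)))
    return records


def split_event(columns):
    """Return (dimension candidates, fact candidates) for an event.

    Assumption of this sketch: 'how many' columns become facts and
    every other 7W type becomes a dimension candidate.
    """
    for col in columns:
        if col.w_type not in SEVEN_WS:
            raise ValueError(f"{col.name!r} has unknown 7W type {col.w_type!r}")
    facts = [c.name for c in columns if c.w_type == "how many"]
    dimensions = [c.name for c in columns if c.w_type != "how many"]
    return dimensions, facts


records = as_records(ORDER_EVENT, EXAMPLE_ROWS)
dimensions, facts = split_event(ORDER_EVENT)
print("dimension candidates:", dimensions)
print("fact candidates:", facts)
```

Keeping the example rows next to the column definitions mirrors the modeling-by-example approach the
text describes: stakeholders can check the rows, while the 7W tags carry the structural information a
designer needs.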
It is aimed at both new and experienced DW/BI practitioners. It’s a quick-study guide to dimensional modeling and a source of new dimensional design patterns
For those new to data warehousing, this book provides a quick-study introduction to dimensional
modeling techniques. For those of you who would like more background on the techniques covered, the
later chapters and Appendix C provide references to case studies in other texts that will help you
gain additional business insight. Experienced data warehousing professionals will find that this
book offers a fresh perspective on familiar dimensional modeling patterns, covering many in more
detail than previously available, and adding several new ones. For all readers, this book offers a
radically new agile way of engaging with business users and kick-starting their next warehouse
development project.
The bright modeler, not surprisingly, has some bright ideas. His tips, techniques and
practical modeling advice, distilled from the current topic, will help you improve
your design.
The experienced dimensional modeler has seen it all before. He’s here to warn you
when an activity or decision can steal your time, sanity or agility. Later in the book
he follows the pattern users (see below) to tell you about the consequences or side
effects of using their recommended design patterns. He would still recommend
you use their patterns though—just with a little care.
The note takers are the members of the team who always read the full instructions
before they use that new gadget or technique. They’re always here to tell you to
“make a note of that” when there is extra information on the current topic.
The agilists will let you know when we're being particularly agile. They wave their
banner whenever a design technique supports a core value of the agile manifesto or
principle of agile software development. These are listed in Appendix A.
The scribe appears whenever we introduce new BEAM✲ diagrams, notation con-
ventions or short codes for rapidly documenting your designs. All the scribe’s short
codes are listed in Appendix B.
The agile modeler engages with stakeholders and facilitates modelstorming. She is
here to ask example BEAM✲ questions, using the 7Ws, to get stakeholders to tell
their data stories.
The stakeholders are the subject matter experts, operational IT staff, BI users and BI
consumers, who know the data sources, or know the data they want—anyone who
can help define the data warehouse who is not a member of the DW/BI develop-
ment team. They are here to provide example answers to the agile modeler’s ques-
tions, tell data stories and pose their own tricky BI questions.
The bookworm points you to further reading on the current topic. All her reading
recommendations are gathered in Appendix C.
The agile developer appears when we have some practical advice about using soft-
ware tools or there is something useful you can download.
The pattern users have a solution to the head scratcher’s problems. They’re going to
use tried and tested dimensional modeling design patterns, some new in print.
Part I: Modelstorming
Collaborative modeling with BI stakeholders
Part I describes how to modelstorm BI stakeholders’ data requirements, validate
these requirements using agile data profiling, review and prioritize them with
stakeholders, estimate their ETL tasks as a team, and convert them into star
schemas. It illustrates how agile data modeling can be used to replace traditional BI
requirements gathering with accelerated database design, followed by BI prototyp-
ing to capture the real reporting and analysis requirements. Chapter 1 provides an
introduction to dimensional modeling. Chapters 2 to 4 provide a step-by-step
guide for using BEAM✲ to model business events and dimensions. Chapter 5
describes how BEAM✲ models are validated and translated into physical dimen-
sional models and development sprint plans.
cause (why), and effect (how), we document new and established dimensional
techniques from a dimensional perspective for the first time.
Cross-process analysis: Combining the results from multiple fact tables using
drill-across processing and multi-pass queries. Building derived fact tables and
consolidated data marts to simplify query processing.
Companion Website
Visit modelstorming.com to download the BEAM✲Modelstormer spreadsheet
and other templates that accompany this book. On the site you will find example
models and code listings together with links to articles, books, and the worldwide
schedule of training courses and workshops on BEAM✲ and agile data warehouse
design. Register your paperback copy online to receive a discounted eBook version.
PART I: MODELSTORMING
AGILE DIMENSIONAL MODELING, FROM WHITEBOARD TO STAR SCHEMA
Dimensional modeling supports data warehouse design
In this first chapter we set out the motivation for adopting an agile approach to
data warehouse design. We start by summarizing the fundamental differences
between data warehouses and online transaction processing (OLTP) databases to
show why they need to be designed using very different data modeling techniques.
We then contrast entity-relationship and dimensional modeling and explain why
dimensional models are optimal for data warehousing/business intelligence
(DW/BI). While doing so we also describe how dimensional modeling enables
incremental design and delivery: key principles of agile software development.
Collaborative dimensional modeling supports agile data warehouse analysis and design
Readers who are familiar with the benefits of traditional dimensional modeling
may wish to skip to Data Warehouse Analysis and Design on Page 11 where we begin
the case for agile dimensional modeling. There, we take a step back in the DW/BI
development lifecycle and examine the traditional approaches to data requirements
analysis, and highlight their shortcomings in dealing with ever more complex data
sources and aggressive BI delivery schedules. We then describe how agile data
modeling can significantly improve matters by actively involving business
stakeholders in the analysis and design process. We finish by introducing BEAM✲
(Business Event Analysis and Modeling): the set of agile techniques for collabora-
tive dimensional modeling described throughout this book.
4 Chapter 1
Figure 1-1: Entity-Relationship diagram (ERD)
Entities become tables, attributes become columns
Within a relational database, entities are implemented as tables and their attributes
as columns. Relationships are implemented either as columns within existing
tables or as additional tables depending on their cardinality. One-to-one (1:1) and
many-to-one (M:1) relationships are implemented as columns, whereas many-to-
many (M:M) relationships are implemented using additional tables, creating
additional M:1 relationships.
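The implementation rules above can be sketched with Python's standard sqlite3 module.
This is only an illustration: the table and column names (customer, orders, product,
order_item) are assumptions made for the example and are not taken from Figure 1-1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- M:1: many orders belong to one customer, so the relationship is a column.
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);

CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- M:M: an order holds many products and a product appears on many orders,
-- so the relationship becomes its own table holding two M:1 relationships.
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# List the tables that were created (hypothetical names, see lead-in).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Two entities joined by an M:M relationship end up as three tables, which is one reason
normalized schemas grow quickly.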
ER models are typically in third normal form (3NF)
ER modeling is associated with normalization in general, and third normal form
(3NF) in particular. ER modeling and normalization have very specific technical
goals: to reduce data redundancy and make explicit the 1:1 and M:1 relationships
within the data that can be enforced by relational database management systems.
Higher forms of normalization are available, but most ER modelers are satisfied
when their models are in 3NF. There is even a mnemonic to remind everyone that
data in 3NF depends on “The key, the whole key, and nothing but the key, so help
me Codd”—in memory of Edgar (Ted) Codd, inventor of the relational model.
3NF models are difficult to understand
More importantly, queries will only produce the right answers if users navigate the
right join paths, i.e., ask the right questions in SQL terms. If the wrong joins are
used, they unknowingly get answers to some other (potentially meaningless)
questions. 3NF models are complex for both people and machines. Specialist
hardware (data warehouse appliances) is improving query/join performance all the
time, but the human problems are far more difficult to solve. Smart BI software can
hide database schema complexity behind a semantic layer, but that merely moves
the burden of understanding a 3NF model from BI users at query time to BI
developers at configuration time. That’s a good move but it’s not enough. 3NF
models remain too complex for business stakeholders to review and quality assure
(QA).
History further complicates 3NF
ER models are further complicated by data warehousing requirements to track
history in full to support valid ‘like-for-like’ comparisons over time. Providing a
true historical perspective of business events requires that many otherwise simple
descriptive attributes become time relationships, i.e., existing M:1 relationships
become M:M relationships that translate into even more physical tables and
complex join paths. Such temporal database designs can defeat even the smartest BI
tools and developers.
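To make the extra join path concrete, here is a small sketch using Python's sqlite3
module. The customer, segment, and customer_segment_history tables, their columns,
and the sample rows are all hypothetical. Without history, customer would simply
carry one segment_id column (M:1). Tracking history turns that into a dated M:M table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE segment  (segment_id  INTEGER PRIMARY KEY, segment TEXT);

-- History table: one customer can have many segments over time (M:M).
CREATE TABLE customer_segment_history (
    customer_id INTEGER REFERENCES customer(customer_id),
    segment_id  INTEGER REFERENCES segment(segment_id),
    valid_from  TEXT NOT NULL,   -- ISO dates compare correctly as text
    valid_to    TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);

INSERT INTO customer VALUES (1, 'Example Customer');
INSERT INTO segment  VALUES (1, 'Small Business'), (2, 'Enterprise');
INSERT INTO customer_segment_history VALUES
    (1, 1, '2010-01-01', '2010-12-31'),
    (1, 2, '2011-01-01', '9999-12-31');
INSERT INTO orders VALUES (100, 1, '2010-06-15'), (101, 1, '2011-03-01');
""")

# A like-for-like query must pick the segment that was valid on the order date.
segment_at_order_time = conn.execute("""
SELECT o.order_id, s.segment
FROM orders o
JOIN customer_segment_history h
  ON h.customer_id = o.customer_id
 AND o.order_date BETWEEN h.valid_from AND h.valid_to
JOIN segment s ON s.segment_id = h.segment_id
ORDER BY o.order_id
""").fetchall()
```

Every historically tracked attribute adds another date-range join like this one, and
forgetting the date condition silently returns a row for every segment the customer
has ever had.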
Large readable ER diagrams are difficult to draw: all those overlapping lines
Laying out a readable ERD for any non-trivial data model isn’t easy. The mnemonic
“dead crows fly east” encourages modelers to keep crows’ feet pointing up
or to the left. Theoretically this should keep the high-volume volatile entities
(transactions) top left and the low-volume stable entities (lookup tables) bottom
right. However, this layout seldom survives as modelers attempt to increase
readability by moving closely related or commonly used entities together. The task
rapidly descends into an exercise in trying to reduce overlapping lines. Most ERDs
are visually overwhelming for BI stakeholders and developers who need simpler,
human-scale diagrams to aid their communication and understanding.
Figure 1-2: Multidimensional analysis
Star Schemas
Star schemas are used to visualize dimensional models
Real-world dimensional models are used to measure far more complex business
processes (with more dimensions) in far greater detail than could be attempted
using spreadsheets. While it is difficult to envision models with more than three
dimensions as multi-dimensional cubes (they wouldn’t actually be cubes), they can
easily be represented using star schema diagrams. Figure 1-3 shows a classic star
schema for retail sales containing a fourth (causal) dimension: PROMOTION, in
addition to the dimensional attributes and facts from the previous cube example.
Figure 1-3: Sales star schema
Star schema is also the term used to describe the physical implementation of a
dimensional model as relational tables.
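As a rough sketch of such a physical implementation, the snippet below builds a small
retail sales star in SQLite from Python. The table names, column names, and sample rows
are assumptions made for illustration. Only the PROMOTION dimension and the retail
sales subject come from the text, and Figure 1-3 may name things differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes used to filter and group facts.
CREATE TABLE product_dim   (product_key   INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE store_dim     (store_key     INTEGER PRIMARY KEY, store TEXT, country TEXT);
CREATE TABLE date_dim      (date_key      INTEGER PRIMARY KEY, month TEXT, quarter TEXT);
CREATE TABLE promotion_dim (promotion_key INTEGER PRIMARY KEY, promotion TEXT);

-- Fact table at the centre of the star: one foreign key per dimension plus facts.
CREATE TABLE sales_fact (
    product_key   INTEGER REFERENCES product_dim(product_key),
    store_key     INTEGER REFERENCES store_dim(store_key),
    date_key      INTEGER REFERENCES date_dim(date_key),
    promotion_key INTEGER REFERENCES promotion_dim(promotion_key),
    quantity_sold INTEGER,
    revenue       REAL
);

INSERT INTO product_dim   VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO store_dim     VALUES (1, 'London', 'UK'), (2, 'Boston', 'USA');
INSERT INTO date_dim      VALUES (1, 'January', 'Q1'), (2, 'February', 'Q1');
INSERT INTO promotion_dim VALUES (1, 'None'), (2, 'Spring Sale');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 2, 20.0),
    (2, 1, 1, 2, 1, 15.0),
    (1, 2, 2, 2, 3, 30.0),
    (2, 2, 2, 1, 1, 15.0);
""")

# A typical star query: join the fact table to the dimensions of interest,
# group by their attributes, and aggregate the facts.
revenue_by_country_and_promotion = conn.execute("""
SELECT s.country, p.promotion, SUM(f.revenue)
FROM sales_fact f
JOIN store_dim s     ON s.store_key = f.store_key
JOIN promotion_dim p ON p.promotion_key = f.promotion_key
GROUP BY s.country, p.promotion
ORDER BY s.country, p.promotion
""").fetchall()
```

Every join runs directly from the fact table to one dimension, so the join paths stay
short and predictable however many dimensions are added.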
ER diagrams work best for viewing a small number of tables at one time. How
many tables? About as many as in a dimensional model: a star schema.
The term dimension in this book refers to a dimension table whereas dimensional
attribute refers to a column in a dimension table.
Dimensional hierarchies support drill-down analysis
Dimensions contain sets of descriptive (dimensional) attributes that are used to
filter data and group facts for aggregation. Their role is to provide good report row
headers and title/heading/footnote filter descriptions. Dimensional attributes often
have a hierarchical relationship that allows BI tools to provide drill-down analysis.
For example, drilling down from Quarter to Month, Country to Store, and Cate-
gory to Product.
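Drill-down can be pictured as re-aggregating the same facts with one more hierarchy
level added to the grouping. The plain-Python sketch below assumes a handful of
made-up sales rows and a hypothetical roll_up helper. It illustrates the idea and is
not how any particular BI tool implements it.

```python
from collections import defaultdict

# Hypothetical fact rows already joined to their date attributes.
sales = [
    {"quarter": "Q1", "month": "Jan", "revenue": 100},
    {"quarter": "Q1", "month": "Feb", "revenue": 150},
    {"quarter": "Q1", "month": "Jan", "revenue": 50},
    {"quarter": "Q2", "month": "Apr", "revenue": 200},
]

def roll_up(rows, *levels):
    """Sum revenue grouped by the given hierarchy levels."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[level] for level in levels)] += row["revenue"]
    return dict(totals)

by_quarter = roll_up(sales, "quarter")         # summary level
by_month = roll_up(sales, "quarter", "month")  # drilled down one level
```

Drilling down from Quarter to Month only changes the grouping columns. The fact rows
and the measure being summed stay the same.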
Dimensions are small, fact tables are large
Not all dimensional attributes are text. Dimensions can contain numbers and dates
too, but these are generally used like the textual attributes to filter and group the
facts rather than to calculate aggregate measures. Despite their width, dimensions
are tiny relative to fact tables. Most dimensions contain considerably fewer than a
million rows.
The most useful facts are additive measures that can be aggregated using any
combination of the available dimensions. The most useful dimensions provide
rich sets of descriptive attributes that are familiar to BI users.
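A quick way to see what "additive" means is to check that a measure gives the same
grand total whichever combination of dimensions it is grouped by. The sketch below
uses invented rows and a hypothetical aggregate helper.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows with three dimensions and one additive measure.
facts = [
    {"product": "A", "store": "X", "month": "Jan", "revenue": 10},
    {"product": "A", "store": "Y", "month": "Jan", "revenue": 20},
    {"product": "B", "store": "X", "month": "Feb", "revenue": 30},
]
dimensions = ["product", "store", "month"]

def aggregate(rows, group_by):
    """Sum revenue for each distinct combination of the group_by dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in group_by)] += row["revenue"]
    return dict(totals)

grand_total = sum(row["revenue"] for row in facts)

# Every subset of the dimensions, from no grouping at all to all three.
all_groupings = [
    combo
    for size in range(len(dimensions) + 1)
    for combo in combinations(dimensions, size)
]

# An additive measure sums to the same grand total under every grouping.
totals_agree = all(
    sum(aggregate(facts, combo).values()) == grand_total
    for combo in all_groupings
)
```

A non-additive value such as a unit price or a percentage would fail this check,
which is why such values need more careful handling when aggregated.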
Dimensional models are process-oriented. They represent business processes described
using the 7Ws framework
A deeper, less immediately obvious benefit of dimensional models is that they are
process-oriented. They are not just the result of some aggressive physical data
model optimization (that has denormalized a logical 3NF ER model into a smaller
number of tables) to overcome the limitations of databases to cope with join
intensive BI queries. Instead, the best dimensional models are the result of asking
questions to discover which business processes need to be measured, how they
should be described in business terms and how they should be measured. The
resulting dimensions and fact tables are not arbitrary collections of denormalized
data but the 7Ws that describe the full details of each individual business event
worth measuring.
7Ws Framework
Where did it take place?
HoW many or much was recorded – how can it be measured?
Why did it happen?
HoW did it happen – in what manner?
The 7Ws are interrogatives: question forming words
The 7Ws are an extension of the 5 or 6Ws that are often cited as the checklist in
essay writing and investigative journalism for getting the ‘full’ story. Each W is an
interrogative: a word or phrase used to make questions. The 7Ws are especially
useful for data warehouse data modeling because they focus the design on BI
activity: asking questions.
Fact tables represent verbs (they record business process activity). The facts they
contain and the dimensions that surround them are nouns, each classifiable as
one of the 7Ws. 6Ws: who, what, when, where, why, and how represent dimension
types. The 7th W: how many, represents facts. BEAM✲ data stories use the 7Ws
to discover these important verb and noun combinations.
Star schemas usually contain 8-20 dimensions
Detailed dimensional models usually contain more than 6 dimensions because any
of the 6Ws can appear multiple times. For example, an order fulfillment process
could be modeled with 3 who dimensions: CUSTOMER, EMPLOYEE, and
CARRIER, and 2 when dimensions: ORDER DATE and DELIVERY DATE.
Having said that, most dimensional models do not have many more than 10 or 12
dimensions. Even the most complex business events rarely have 20 dimensions.
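The order fulfillment example can be written down as a simple mapping from dimension
to W-type. The five who and when dimensions below come from the text. PRODUCT as a
what dimension is an added assumption, included only to show a third W-type.

```python
from collections import Counter

# Dimension name -> the W it answers for an order fulfillment event.
order_fulfillment_dimensions = {
    "CUSTOMER": "who",
    "EMPLOYEE": "who",
    "CARRIER": "who",
    "ORDER DATE": "when",
    "DELIVERY DATE": "when",
    "PRODUCT": "what",  # assumption: not named in the example above
}

# Count how many dimensions answer each W.
w_counts = Counter(order_fulfillment_dimensions.values())
```

Because a single W can be answered several times, the number of dimensions in a star
is not capped at six.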
Star schemas support agile, incremental BI
The deep benefit of process-oriented dimensional modeling is that it naturally
breaks data warehouse scope, design and development into manageable chunks
consisting of just the individual business processes that need to be measured next.
Modeling each business process as a separate star schema supports incremental
design, development and usage. Agile dimensional modelers and BI stakeholders
can concentrate on one business process at a time to fully understand how it
should be measured. Agile development teams can build and incrementally deliver
individual star schemas earlier than monolithic designs. Agile BI users can gain
early value by analyzing these business processes initially in isolation and then
grow into more valuable, sophisticated cross-process analysis. Why develop ten
stars when one or two can be delivered far sooner with less investment ‘at risk’?
Figure 1-4: Data warehouse analysis and design biases
Data-Driven Analysis
Pure data-driven analysis avoided early user involvement
Using a data-driven approach, data requirements are obtained by analyzing
operational data sources. This form of analysis was adopted by many early IT-led data
warehousing initiatives to the exclusion of all others. User involvement was
avoided as it was mistakenly felt that data warehouse design was simply a matter of
re-modeling multiple data sources using ER techniques to produce a single ‘per-
fect’ 3NF model. Only after that was built, would it then be time to approach the
users for their BI requirements.
Leading to DW designs that did not meet BI user needs
Unfortunately, without user input to prioritize data requirements and set a
manageable scope, these early data warehouse designs were time-consuming and
expensive to build. Also, being heavily influenced by the OLTP perspective of the
source data, they were difficult to query and rarely answered the most pressing
business questions. Pure data-driven analysis and design became known as the
“build it and they will come” or “field of dreams” approach, and eventually died
out to be replaced by hybrid methods that included user requirements analysis,
source data profiling, and dimensional modeling.
Packaged apps are especially challenging data sources to analyze
Data-driven analysis has benefited greatly from the use of modern data profiling
tools and methods but despite their availability, data-driven analysis has become
increasingly problematic as operational data models have grown in complexity. This
is especially true where the operational systems are packaged applications, such as
Enterprise Resource Planning (ERP) systems built on highly generic data models.
IT staff are comfortable with data-driven analysis
In spite of its problems, data-driven analysis continues to be a major source of data
requirements for many data warehousing projects because it falls well within the
technical comfort zone of IT staff who would rather not get too involved with
business stakeholders and BI users.
Reporting-Driven Analysis
Reporting requirements are gathered by interviewing potential BI users in small groups
Using a reporting-driven approach, data requirements are obtained by analyzing
the BI users’ reporting requirements. These requirements are gathered by
interviewing stakeholders one at a time or in small groups. Following rounds of
meetings, analysts’ interview notes and detailed report definitions (typically spreadsheet
or word processor mock-ups) are cross-referenced to produce a consolidated list of
data requirements that are verified against available data sources. The resulting
requirements documentation is then presented to the stakeholders for ratification.
After they have signed off the requirements, the documentation is eventually used
to drive the data modeling process and subsequent BI development.
User involvement helps to create more successful DWs
Reporting-driven analysis focuses the data warehouse design on efficiently
prioritizing the stakeholders’ most urgent reporting requirements and can lead to timely,
successful deployments when the scope is managed carefully.
future information needs beyond the ‘next reports’, because these needs are de-
pendent upon the answers the ‘next reports’ will provide, and the unexpected new
business initiatives those answers will trigger. The ensuing steps of collating re-
quirements, feeding them back to business stakeholders, gaining consensus on data
terms, and obtaining sign off can also be an extremely lengthy process.
Focusing too closely on current reports alone leads to inflexible dimensional models
Over-reliance on reporting requirements has led to many initially successful data
warehouse designs that fail to handle change in the longer-term. This typically
occurs when inexperienced dimensional modelers produce designs that match the
current report requests too closely, rather than treating these reports as clues to
discovering the underlying business processes that should be modeled in greater
detail to provide true BI flexibility. The problem is often exacerbated by initial
requirement analysis taking so long that there isn’t the budget or willpower to
swiftly iterate and discover the real BI requirements as they evolve. The resulting
inflexible designs have led some industry pundits to unfairly brand dimensional
modeling as too report-centric, suitable at the data mart level for satisfying the
current reporting needs of individual departments, but unsuitable for enterprise
data warehouse design. This is sadly misleading because dimensional modeling has
no such limitation when used correctly to iteratively and incrementally model
atomic-level detailed business processes rather than reverse engineer summary
report requests.
Figure 1-5: Reactive DW timeline
The lag between OLTP and DW roll-out is disappearing
Today, DW/BI has caught up and become proactive. The two different worlds of
OLTP and DW/BI have become parallel worlds where many new data warehouses
need to go live/be developed concurrently with their new operational source
systems, as shown on the Figure 1-6 timeline.
Figure 1-6: Proactive DW timeline
Proactive DW/BI addresses operational demands, avoids interim solutions and preempts BI
performance problems
DW/BI has steadily become proactive for a number of business-led reasons:
DW/BI itself has become more operational. The (largely technical) distinction
between operational and analytical reporting has blurred. Increasingly,
sophisticated operational processes are leveraging the power of (near real-time) BI
and stakeholders want a one-stop shop for all reporting needs: the data
warehouse.
Organizations (especially those that already have DW/BI success) now realize
that, sooner rather than later, each major new operational system will need its
own data mart or need to be integrated with an existing data warehouse.
BI stakeholders simply don’t want to support ‘less than perfect’ interim report-
ing solutions and suffer BI backlogs.
When source database schemas are not yet available, ETL development can still
proceed if ETL and OLTP designers can agree on flat file data extracts. Once
OLTP have committed to provide the specified extracts on a schedule to meet BI
needs, ETL transformation and load routines can be developed to match this
source to the proactive data warehouse design target.
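One way to picture an agreed flat file extract is as a fixed list of column names that
both teams commit to. In the sketch below the layout (AGREED_COLUMNS), the
orders_staging table, the load_extract helper, and the sample rows are all
hypothetical. It shows only that a load routine can be written and tested against the
agreed layout before any source schema exists.

```python
import csv
import io
import sqlite3

# Hypothetical extract layout agreed between the ETL and OLTP designers.
AGREED_COLUMNS = ["order_id", "customer_code", "product_code", "order_date", "quantity"]

# Stand-in for a delivered extract file (made-up rows).
sample_extract = io.StringIO(
    "order_id,customer_code,product_code,order_date,quantity\n"
    "1001,C01,P10,2011-05-18,2\n"
    "1002,C02,P11,2011-05-18,1\n"
)

def load_extract(extract_file, conn):
    """Check the extract header against the agreed layout, then load the rows."""
    reader = csv.DictReader(extract_file)
    if reader.fieldnames != AGREED_COLUMNS:
        raise ValueError(f"Extract layout changed: {reader.fieldnames}")
    rows = [tuple(row[col] for col in AGREED_COLUMNS) for row in reader]
    conn.executemany("INSERT INTO orders_staging VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE orders_staging (
    order_id INTEGER, customer_code TEXT, product_code TEXT,
    order_date TEXT, quantity INTEGER)
""")
rows_loaded = load_extract(sample_extract, conn)
total_quantity = conn.execute("SELECT SUM(quantity) FROM orders_staging").fetchone()[0]
```

Checking the header first means that a change to the extract layout is reported
immediately instead of loading misaligned data.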
Figure 1-7: Waterfall DW development timeline
Dimensional modeling enables incremental development
Dimensional modeling can help reduce the risks of pure waterfall by allowing
developers to release early incremental BI functionality one star schema at a time,
get feedback and make adjustments. But even dimensional modeling, like most
other forms of data modeling, takes a (near) serial approach to analysis and design
(with ‘Big Requirements Up Front’ (BRUF) preceding BDUF data modeling) that
is subject to the inherent limitations and initial delays described already.
Agile data warehousing is highly iterative and collaborative
Agile data warehousing seeks to further reduce the risks associated with upfront
analysis and provide even more timely BI value by taking a highly iterative,
incremental and collaborative approach to all aspects of DW design and development as
shown on the Figure 1-8 timeline.
Figure 1-8: Agile DW development timeline
How to Model a Data Warehouse 17
Agile focuses on the early and frequent delivery of working software that adds value
By avoiding the BDUF and instead doing ‘Just Enough Design Upfront’ (JEDUF)
in the initial iterations and ‘Just-In-Time’ (JIT) detailed design within each
iteration, agile development concentrates on the early and frequent delivery of working
software that adds value, rather than the production of exhaustive requirements
and design documentation that describes what will be done in the future to add
value.
For DW design, the minimum valuable working software is a star schema
For agile DW/BI, the working software that adds value is a combination of
queryable database schemas, ETL processes and BI reports/dashboards. The minimum
set of valuable working software that can be delivered per iteration is a star schema,
the ETL processes that populate it and a BI tool or application configured to
access it. The minimum amount of design is a star.
Agile database development needs agile data modeling
To design any type of significant database schema to match the early and frequent
delivery schedule of an agile timeline requires an equally agile alternative to the
traditionally serial tasks of data requirements analysis and data modeling.
Collaborative modeling combines analysis and design and actively involves stakeholders
Iterative, incremental and collaborative all have very specific meanings in an agile
development context that bring with them significant benefits:
Collaborative data modeling obtains data requirements by modeling directly
with stakeholders. It effectively combines analysis and design and ‘cuts to the
chase’ of producing a data model (working software and documentation)
rather than ‘the establishing shot’ of recording data requirements (only
documentation).
Evolutionary modeling supports incremental development by capturing requirements when
they grow and change
Incremental data modeling gives you more data requirements when they are
better understood/needed by stakeholders, and when you are ready to
implement them. Incremental modeling and development are scheduling strategies
that support early and frequent software delivery.
Iterative data modeling helps you to understand existing data requirements
better and improve existing database schemas through refactoring: correcting
mistakes and adding missing attributes which have now become available or
important. Iterative modeling and development are rework strategies that
increase software value.
Agile dimensional modeling focuses on business processes rather than reports
Agile modeling avoids the ‘analysis paralysis’ caused by trying to discover the
‘right’ reports amongst the large (potentially infinite?) number of volatile,
constantly re-prioritized requests in the BI backlog. Instead, agile dimensional
modeling gets everyone to focus on the far smaller (finite) number of relatively
stable business processes that stakeholders want to measure now or next.
Agile dimensional modeling creates flexible, report-neutral designs
Agile dimensional modeling avoids the need to decode detailed business events
from current summary report definitions. Modeling business processes without
the blinkers of specific report requests produces more flexible, report-neutral,
enterprise-wide data warehouse designs.
Agile modeling enables proactive DW/BI to influence operational system development
Agile data modeling can break the “data then requirements” stalemate that
exists for DW/BI just before a new operational system is implemented.
Proactive agile dimensional modeling enables BI stakeholders to define new business
processes from a measurement perspective and provide timely BI input to
operational application development or package configuration.
Evolutionary modeling supports accretive BI requirements
Agile modeling’s evolutionary approach matches the accretive nature of
genuine BI requirements. By following hands-on BI prototyping and/or real BI
usage, iterative and incremental dimensional modeling allows stakeholders to
(re)define their real data requirements.
Collaborative modeling teaches stakeholders to think dimensionally
Many of the stakeholders involved in collaborative modeling will become direct
users of the finished dimensional data models. Doing some form of
dimensional modeling with these future BI users is an opportunity to teach them to
think dimensionally about their data and define common, conformed
dimensions and facts from the outset.
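The difference between iterating and incrementing, as described in the list above, can
be sketched at the database level. In this SQLite example the customer_dim and
shipments_fact tables, their columns, and the 'Unknown' default are hypothetical.
Adding a missing attribute to an existing dimension is an iterative change, while
adding a new fact table is an incremental one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
INSERT INTO customer_dim VALUES (1, 'Example Customer');
""")

def column_names(connection, table):
    """Return the column names of a table in definition order."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

# Iterative: refactor an existing table by adding an attribute that has now
# become available. Existing rows receive the default value.
conn.execute("ALTER TABLE customer_dim ADD COLUMN customer_type TEXT DEFAULT 'Unknown'")

# Incremental: add a new business process (a new star) that reuses the dimension.
conn.execute("""
CREATE TABLE shipments_fact (
    customer_key     INTEGER REFERENCES customer_dim(customer_key),
    quantity_shipped INTEGER)
""")

customer_columns = column_names(conn, "customer_dim")
existing_customer_type = conn.execute(
    "SELECT customer_type FROM customer_dim WHERE customer_key = 1").fetchone()[0]
```

The incremental change leaves existing tables and queries untouched, whereas the
iterative change alters a table that reports may already depend on. That is why
iterative changes benefit more from impact analysis and automated testing.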
Never underestimate the affection stakeholders will have for data models that they
themselves (help) create.
Collaborative data modeling must use simple, inclusive notation and tools
Business stakeholders have little appetite for traditional data models, even
conceptual models (see Data Model Types, shortly) that are supposedly targeted
at them. They find the ER diagrams and notation favored by data modelers
(and generated by database modeling tools) too complex or too abstract. To
engage stakeholders, agile modelers need to create less abstract, more inclusive
data models using simple tools that are easy to use, and easy to share. These
inclusive models must easily translate into the more technically detailed,
logical and physical, star schemas used by database administrators (DBAs) and
ETL/BI developers.
Data modeling sessions (modelstorms) need to be quick: hours rather than days
To encourage collaboration and support iteration, agile data modeling needs to
be quick. If stakeholders are going to participate in multiple modeling sessions
they don’t want each one to take days or weeks. Agile modelers want speed too.
They don’t want to wear out their welcome with stakeholders. The best results
are obtained by modeling with groups of stakeholders who have the experience
and knowledge to define common business terms (conformed dimensions) and
prioritize requirements. It is hard enough to schedule long meetings with these
people individually let alone in groups. Agile data modeling techniques must
support modelstorming: impromptu stand up modeling that is quicker, simpler,
easier and more fun than traditional approaches.
Agile modelers must balance JIT and JEDUF modeling to reduce design rework
Stakeholders don’t want to feel that a design is constantly iterating (fixing what
they have already paid for) when they want to be incrementing (adding
functionality). They want to see obvious progress and visible results. Agile modelers
need techniques that support JIT modeling of current data requirements in detail
and JEDUF modeling of ‘the big picture’ to help anticipate future iterations and
reduce the amount of design rework.
Evolutionary DW development benefits from ETL/BI tools that support automated testing
Developers need to embrace database change. They are used to working with
(notionally) stable database designs, by-products of BDUF data modeling. It is
support staff who are more familiar with coding around the database changes
needed to match users’ real requirements. To respond efficiently to
evolutionary data warehouse design, agile ETL and BI developers need tools that support
database impact analysis and automated testing.
DW designers must embrace change and allow their models to evolve
Data warehouse designers also need to embrace data model change. They will
naturally want to limit the amount of disruptive database refactoring required
by evolutionary design, but they must avoid resorting to generic data model
patterns which reduce understandability and query performance, and can
alienate stakeholders. Agile data warehouse modelers need dimensional design
patterns that they can trust to represent tomorrow’s BI requirements tomorrow,
while they concentrate on today’s BI requirements now.
Agile dimensional modeling techniques exist for addressing these requirements
If agile dimensional modeling that is interactive, inclusive, quick, supports JIT and
JEDUF, and enables DW teams to embrace change seems like a tall order, don’t worry;
while there are no silver bullets that will make everyone or everything agile
overnight, there are proven tools and techniques that can address the majority of these
agile modeling prerequisites.
BEAM✲
BEAM✲ is an agile dimensional modeling method
BEAM✲ is an agile data modeling method for designing dimensional data
warehouses and data marts. BEAM stands for Business Event Analysis & Modeling. As
the name suggests it combines analysis techniques for gathering business event
related data requirements and data modeling techniques for database design. The
trailing ✲ (six point open centre asterisk) represents its dimensional deliverables:
star schemas and the dimensional position of each of the 7Ws it uses.
BEAM✲ is used to discover and document business event details
BEAM✲ consists of a set of repeatable, collaborative modeling techniques for
rapidly discovering business event details and an inclusive modeling notation for
documenting them in a tabular format that is easily understood by business
stakeholders and readily translated into logical/physical dimensional models by IT
developers.
visible signs of progress. Stakeholders can easily imagine sorting and filtering the
low-level detail columns of a business event using the higher-level dimensional
attributes that they subsequently model.
Figure 1-9: Customer Orders BEAM✲ table
BEAM✲ short codes act as dimensional modeling shorthand
BEAM✲ short codes act as dimensional modelers’ shorthand for documenting
generic data properties such as data type and nullability, and specific dimensional
properties such as slowly changing dimensions and fact additivity. Short codes can
be used to annotate any BEAM✲ diagram type for technical audiences but can
easily be hidden or ignored when modeling with stakeholders who are
uninterested in the more technical details. Short codes and other BEAM✲ notation
conventions will be highlighted in the text in bold. Appendix B provides a reference
list of short codes.
Figure 1-10: Order processing ER Diagram
Example data models capture more business information than ER models
By looking at the ERD you can tell that customers may place orders for multiple
products at a time. The BEAM✲ table records the same information, but the
example data also reveals the following:
Customers can be individuals, companies, and government bodies.
Products were sold yesterday.
Products have been sold for 10 years.
Products vary considerably in price.
Products can be bundles (made up of 2 products).
Customers can order the same product again on the same day.
Orders are processed in both dollars and pounds.
Orders can be for a single product or bulk quantities.
Discounts are recorded as percentages and money.
Example data speaks volumes!
Additionally, by scanning the BEAM✲ table you may have already guessed the type
of products that Pomegranate sells and come to some conclusions as to what sort
of company it is. Example data speaks volumes—wait until you hear what it says
about some of Pomegranate’s (fictional) staff!
BEAM✲ and ER notation are jointly used to create collaborative models for different
audiences
Based on the detail levels described in Table 1-2 the order processing ERD in
Figure 1-10 is a logical data model as it shows primary keys, foreign keys and
cardinality, while the BEAM✲ event in Figure 1-9 is a conceptual model (we prefer
“business model”) as this information is missing. With additional columns and
short codes it could be added to the BEAM✲ table but each diagram type suits its
target audience as is. BEAM✲ tables are more suitable for collaborative modeling
with stakeholders than traditional ERD based conceptual models, while other
BEAM✲ diagram types and short codes complement and enhance ERDs for
collaborating with developers on logical/physical star schema design.
BEAM✲ supports the core agile values: “Individuals and interactions over
processes and tools”, “Working software over comprehensive documentation” and
“Customer collaboration over contract negotiation”. BEAM✲ upholds these
values and the agile principle of “maximizing the amount of work not done” by
encouraging DW practitioners to work directly with stakeholders to produce
compilable data models rather than requirements documents, and working BI
prototypes of reports/dashboards rather than mockups.
DIAGRAM: BEAM✲ (Example Data) Table
USAGE: Modeling business events and dimensions one at a time using example data to
document their 7Ws details. Example data tables are also used to describe physical
dimension and fact tables and explain dimensional design patterns.
DATA MODEL TYPE: Business, Logical, Physical
PRINCIPAL AUDIENCE: Data Modelers, Business Analysts, Business Experts, Stakeholders, BI Users
CHAPTER: 2
Summary
Data warehouses and operational systems are fundamentally different. They have radically
different database requirements and should be modeled using very different techniques.
Star schemas record and describe the measurable events of business processes as fact tables and
dimensions. These are not arbitrary denormalized data structures. Instead they represent the
combination of the 7Ws (who, what, when, where, how many, why and how) that fully describe
the details of each business event. In doing so, fact tables represent verbs, while the facts
(measures) they contain and the dimensions they reference represent nouns.
Even with the right database design techniques there are numerous analysis challenges in
gathering detailed data warehousing requirements in a timely manner.
Both data-driven and reporting-driven analysis are problematic, increasingly so, with DW/BI
development becoming more proactive and taking place in parallel with agile operational
application development.
Iterative, incremental and collaborative data modeling techniques are agile alternatives to the
traditional BI data requirements gathering.
BEAM✲ is an agile data modeling method for engaging BI stakeholders in the design of their
own dimensional data warehouses.
BEAM✲ data stories use the 7Ws framework to discover, describe and document business
events dimensionally.
BEAM✲ modelers encourage collaboration by using simple modeling tools such as whiteboards
and spreadsheets to create inclusive data models.
BEAM✲ models use example data tables and alphanumeric short codes rather than ER data
abstractions and graphical notation to improve stakeholder communication. These models are
readily translated into star schemas.
Business events are the measurable atomic details of business processes
Business events are the individual actions performed by people or organizations
during the execution of business processes. When customers buy products or use
services, brokers trade stocks, and suppliers deliver components, they leave behind
a trail of business events within the operational databases of the organizations
involved. These business events contain the atomic-level measurable details of the
business processes that DW/BI systems are built to evaluate.
BEAM✲ modelers discover BI data requirements by telling data stories
BEAM✲ uses business events as incremental units of data discovery/data modeling.
By prompting business stakeholders to tell their event data stories, BEAM✲
modelers rapidly gather the clear and concise BI data requirements they need to
produce efficient dimensional designs.
This chapter is a step-by-step guide to using BEAM✲ tables and the 7Ws to describe event details
In this chapter we begin to describe the BEAM✲ collaborative approach to dimensional
modeling, and provide a step-by-step guide to discovering a business event
and documenting its data stories in a BEAM✲ table: a simple tabular format that is
easily translated into a star schema. By following each step you will learn how to
use the 7Ws (who, what, when, where, how many, why, and how) to get stakeholders
thinking dimensionally about their business processes, and describing the
information that will become the dimensions and facts of their data warehouse,
one that they themselves helped to design!
Chapter 2 Topics At a Glance
Data stories and story types: discrete, recurring and evolving
Discovering business events: asking “Who does what?”
Documenting events: using BEAM✲ Tables
Describing event details: using the 7Ws and story themes
Modelstorming with whiteboards: practical collaborative data modeling
3
MODELING BUSINESS DIMENSIONS
I keep six honest serving-men (They taught me all I knew);
Their names are What and Why and When And How and Where and Who.
— Rudyard Kipling, The Elephant’s Child
Business events need dimensions to fully describe them for reporting purposes
Business events and their numeric measurements are only part of the agile dimensional
modeling story. On their own, BEAM✲ event tables are not sufficient to
design a data warehouse or even a data mart, because they do not contain all the
descriptive attributes required for reporting purposes. For complete BI flexibility,
stakeholders need both the atomic-level event details modeled so far and higher-level
descriptions that allow those details to be analyzed in practical ways. The data
structures that provide these descriptive attributes are dimensions.
BEAM✲ modelers draw hierarchy charts and tell change stories to define dimensions
In addition to the 7Ws and example data tables, BEAM✲ uses hierarchy charts and
change stories to discover and define dimensional attributes. Hierarchy charts are
used to explore the hierarchical relationships between attributes that support BI
drill-down analysis, while change stories allow stakeholders to describe their
business rules for handling slowly changing dimensions.
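One widely used rule for a slowly changing attribute is to keep history by adding a new dimension row per change (commonly called a type 2 slowly changing dimension). The sketch below is a minimal illustration of that rule only, not the BEAM✲ notation for it; the `CustomerVersion` class, the `apply_change` function and all field names are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    customer_key: int                 # surrogate key: one per version of the customer
    customer_id: str                  # business key from the source system
    city: str                         # the slowly changing attribute
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_change(history: List[CustomerVersion], customer_id: str,
                 new_city: str, changed_on: date) -> None:
    """Preserve history: close the current version and append a new one."""
    current = next((v for v in history
                    if v.customer_id == customer_id and v.valid_to is None), None)
    if current is not None:
        if current.city == new_city:
            return                    # nothing changed, keep the current version
        current.valid_to = changed_on
    next_key = max((v.customer_key for v in history), default=0) + 1
    history.append(CustomerVersion(next_key, customer_id, new_city, changed_on))


history: List[CustomerVersion] = []
apply_change(history, "C1", "London", date(2010, 1, 1))
apply_change(history, "C1", "Leeds", date(2011, 6, 1))
apply_change(history, "C1", "Leeds", date(2011, 7, 1))   # no change, no new row
```

After these calls the customer has two versions: a closed London row and a current Leeds row, so facts recorded before the move can still be reported against the original city.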
This chapter shows you how to model dimensions from event story details
In this chapter we describe how these BEAM✲ tools and techniques are used to
model complete dimension definitions from individual event details. We will use
the CUSTOMER and PRODUCT event story details from Chapter 2 for our
example dimension modelstorming with stakeholders.
4
BI Stakeholders need multiple events for process measurement
Designing a data warehouse or data mart for business process measurement
demands that you quickly move beyond modeling single business events. All but
the simplest business processes are made up of multiple business events and BI
stakeholders invariably want to do cross-process analysis. When you modelstorm
these multi-event requirements you soon notice two crucial things:
Event sequences represent business processes and value chains
Stakeholders model events chronologically. As you complete one event,
stakeholders naturally think of related events that immediately follow or precede
it. These event sequences represent business processes and value chains
that need to be measured end-to-end.
Events share common dimensions that support cross-process analysis
Stakeholders describe different events using many of the same 7Ws. When
you define an event in terms of its 7Ws, stakeholders start thinking of other
events with the same details, especially events that share its subject or object.
These shared details, known to dimensional modelers as conformed dimensions,
are the basis for cross-process analysis.
The event matrix is an agile tool for modeling multiple events
In this chapter we describe how an event matrix, the single most powerful BEAM✲
artifact, is used to storyboard the data warehouse: rapidly model multiple events,
identify significant business processes and conformed dimensions, and prioritize
their development.
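To make the idea concrete, an event matrix can be pictured as a grid of events against the dimensions they use; dimensions ticked for more than one event are candidates for conforming. The sketch below is only an illustration: the event names, dimension names and both helper functions are hypothetical, and a real event matrix carries more information than this grid.

```python
# Hypothetical events and the dimensions each one uses (assumed for illustration).
event_matrix = {
    "CUSTOMER ORDERS": {"CUSTOMER", "PRODUCT", "DATE", "SALES LOCATION"},
    "PRODUCT SHIPMENTS": {"CUSTOMER", "PRODUCT", "DATE", "CARRIER"},
    "CUSTOMER COMPLAINTS": {"CUSTOMER", "PRODUCT", "DATE", "EMPLOYEE"},
}


def conformed_dimensions(matrix):
    """Return each dimension shared by two or more events, with those events."""
    usage = {}
    for event, dims in matrix.items():
        for dim in dims:
            usage.setdefault(dim, []).append(event)
    return {dim: sorted(events) for dim, events in usage.items() if len(events) > 1}


def render(matrix):
    """Lay the matrix out as rows: a header row, then one row per event."""
    dims = sorted({dim for ds in matrix.values() for dim in ds})
    header = ["EVENT"] + dims
    rows = [[event] + ["x" if dim in ds else "" for dim in dims]
            for event, ds in matrix.items()]
    return [header] + rows


shared = conformed_dimensions(event_matrix)
grid = render(event_matrix)
```

Here CUSTOMER, PRODUCT and DATE are shared by all three events, so they are the dimensions that would support cross-process analysis in this made-up example.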
This chapter is a guide to:
In this chapter we describe the star schema design process for converting
BEAM✲ models into flexible and efficient dimensional data warehouse models.
Verifying BEAM✲ models against available data sources
The agile approach that we take begins with test-first design, by using data profiling
techniques to verify the BEAM✲ model against the data available in source systems.
This results in an annotated model which documents source data characteristics
and issues. This is used for model review with stakeholders and development
sprint planning with the DW/BI team.
Converting BEAM✲ models into star schemas
Next, the revised BEAM✲ model is translated into a logical dimensional model by
adding surrogate keys. The resulting facts and dimensions are documented by
drawing enhanced star schemas using a combination of BEAM✲ and ER notation.
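A minimal sketch of the surrogate key step might look like the following, assuming simple sequential integer keys. The event rows, column names and the `build_dimension` helper are hypothetical; a real design would also carry descriptive attributes on each dimension row.

```python
# Hypothetical event rows described in business terms (assumed for illustration).
events = [
    {"customer": "Alice Smith", "product": "Laptop", "quantity": 1, "revenue": 1400.0},
    {"customer": "Acme Ltd", "product": "Laptop", "quantity": 100, "revenue": 95000.0},
    {"customer": "Alice Smith", "product": "Phone", "quantity": 2, "revenue": 900.0},
]


def build_dimension(rows, attribute):
    """Assign a sequential integer surrogate key to each distinct value."""
    dimension = {}
    for row in rows:
        # setdefault keeps the key already assigned to a value seen before.
        dimension.setdefault(row[attribute], len(dimension) + 1)
    return dimension


customer_dim = build_dimension(events, "customer")
product_dim = build_dimension(events, "product")

# Fact rows keep the numeric facts and replace descriptions with surrogate keys.
fact_rows = [
    {"customer_key": customer_dim[e["customer"]],
     "product_key": product_dim[e["product"]],
     "quantity": e["quantity"],
     "revenue": e["revenue"]}
    for e in events
]
```

The fact rows now hold only keys and measures, while the descriptive values live once in each dimension.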
Validating DW designs by prototyping
Finally, the star schemas are used to generate physical data warehouse schemas
which are validated by BI prototyping and documented by creating a physical
dimensional matrix.
PART II: DIMENSIONAL DESIGN PATTERNS
DIMENSIONAL MODELING TECHNIQUES FOR PERFORMANCE, FLEXIBILITY, AND USABILITY
Chapter 6: Who and What: People and Organizations, Products and Services
Who’s on first?
— Bud Abbott and Lou Costello
What’s next?
— President Jed Bartlet, “The West Wing”
Who and what are the most important dimensions
Who and what dimensions such as CUSTOMER, EMPLOYEE and PRODUCT
represent some of the most interesting, highly scrutinized, and complex dimensions
of a data warehouse. Modeling these dimensions and their inherent hierarchies
presents a number of challenges that can be addressed by design patterns.
This chapter describes design patterns for defining flexible, high performance who and what dimensions
In the first of our W-themed design pattern chapters we begin by describing
mini-dimensions and snowflaking for handling large, volatile customer dimensions,
swappable dimensions for mixed customer type models and hierarchy maps for
recursive customer relationships. We then move on to employee dimensions to
cover hybrid SCD views for current value/historic value (CV/HV) reporting
requirements and multi-valued hierarchy maps for multi-parent HR hierarchies with
dotted-line relationships. We finish by looking at product and service dimension
issues and introduce multi-level dimensions for variable detail facts and reverse
hierarchy maps for component analysis.
Time is the most frequently used dimension for BI analysis
Every business event happens at a point in time or represents an interval of time.
Time is the primary way that BI queries group (“show me monthly totals”), filter
(“show me sales for Financial Q1”), and compare business events (“How are we
doing year to date, versus last year?”). That is why every fact table has at least one
time (when) dimension.
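The grouping and filtering described above depend on calendar attributes being available for every date. The sketch below generates a few such rows; the column names, the `date_dimension` function and the fiscal year starting in April are all assumptions made for illustration.

```python
from datetime import date, timedelta


def date_dimension(start, end, fiscal_start_month=4):
    """Build one row per calendar day with attributes for grouping and filtering.

    fiscal_start_month is an assumed parameter: 4 means the fiscal year starts in April.
    """
    rows = []
    day = start
    while day <= end:
        # Position of this month within the fiscal year, from 1 to 12.
        fiscal_month = (day.month - fiscal_start_month) % 12 + 1
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),
            "date": day,
            "month": day.strftime("%Y-%m"),
            "fiscal_quarter": "Q" + str((fiscal_month - 1) // 3 + 1),
        })
        day += timedelta(days=1)
    return rows


dim = date_dimension(date(2011, 3, 30), date(2011, 4, 2))
```

With these attributes in place, "monthly totals" becomes a grouping on `month` and "Financial Q1" becomes a filter on `fiscal_quarter`, without any date arithmetic in the query itself.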
Location dimensions and attributes are frequently used too
Most business events occur at a specific geographical or online location. Many
interesting events represent changes of location. Hence, a large number of fact
tables have distinct where dimensions in addition to the location attributes that can
be found in who and what dimensions, such as customer and product.
Time and location are separate dimensions but can affect one another
Although when and where are separate dimensions, they can influence one another:
time zones, holidays and seasons are all examples of location-specific time
attributes that are affected by event geography. Similarly, analytically significant
locations such as the first and last locations in a sequence of events are
timing-specific location dimensions, affected by event chronology.
This chapter describes when and where patterns
In this chapter, we describe dimensional design patterns for efficiently handling
time and location, in particular, patterns for correctly analyzing year-to-date facts
and journeys: facts that represent changes in space and time, and are all about
where and when.
Chapter 7 Design Challenges At a Glance
Efficient date and time reporting
Correct year-to-date analysis
Time zones, international holidays and seasons
National language support
Trip and journey analysis
8
HOW MANY
Design Patterns for High Performance Fact Tables and Flexible Measures
This chapter covers techniques for incrementally designing and developing high-performance fact tables and flexible measures
In this chapter we describe how the three fact table patterns (transaction fact
tables, periodic snapshots, and accumulating snapshots) are implemented to
efficiently measure discrete, recurring and evolving business events. We particularly
focus on the agile design of accumulating snapshots, by describing how the
requirements for these powerful but complex fact tables can be visually modeled as
evolving events using event timelines, our final BEAM✲ modelstorming tool. We
also describe the BEAM✲ notation for capturing fact additivity and fully documenting
the limitations of semi-additive facts, such as balances. We conclude with
techniques for optimizing fact table performance and multi-fact table reporting by
concentrating on design patterns for aggregates and other derived fact tables that
accelerate and simplify BI queries.
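The limitation of a semi-additive fact such as a balance can be shown with a small worked example: balances can be summed across accounts within one period, but summing them across periods gives a meaningless figure, so an average (or a closing value) is used over time instead. The snapshot rows and function names below are hypothetical.

```python
from collections import defaultdict

# Hypothetical month-end balance snapshots (assumed for illustration).
balances = [
    {"account": "A", "month": "2011-01", "balance": 100.0},
    {"account": "B", "month": "2011-01", "balance": 50.0},
    {"account": "A", "month": "2011-02", "balance": 120.0},
    {"account": "B", "month": "2011-02", "balance": 30.0},
]


def total_by_month(rows):
    """Valid: balances are additive across accounts within a single month."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["balance"]
    return dict(totals)


def average_over_time(rows):
    """Valid over time: average the monthly totals rather than summing them."""
    monthly = total_by_month(rows)
    return sum(monthly.values()) / len(monthly)


monthly_totals = total_by_month(balances)
average_total = average_over_time(balances)
# Misleading: adding every row counts the same money once per month.
naive_sum = sum(row["balance"] for row in balances)
```

The total held is 150.0 in each month, and the average over the two months is also 150.0, whereas the naive sum of 300.0 describes money that never existed.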
9
WHY AND HOW
Dimensional Design Patterns for Cause and Effect
How am I doing?
— Ed Koch, Mayor of New York 1978–1989
Why and how dimensions are closely linked: they describe cause and effect
Some of the most valuable dimensions in a data warehouse attempt to explain why
and how events occur. Why dimensions are used to describe direct and indirect
causal factors. They are often closely linked to the how dimensions that provide all
the remaining event descriptions that are not related to the major who, what, when
and where dimension types. Together why and how represent cause and effect and
complete the 7W dimensional description of a business event.
This chapter describes why and how dimension design patterns
In our final chapter we cover dimensional design patterns for describing how
events occur and why facts vary. We focus particularly on bridge table patterns for
representing multiple causal factors and multi-valued dimensions in general. We
describe how bridge table weighting factors are used to preserve atomic fact
granularity and avoid ETL time fact allocations. We also describe how bridge tables can
be augmented with multi-level dimensions and pivoted dimensions to efficiently
handle barely multi-valued reporting and complex combination constraints. We
conclude with step, range band and audit dimension techniques for analyzing
sequential events, grouping by facts and handling ETL metadata.
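As a rough sketch of the weighting factor idea, the example below keeps one fact row per sale and applies the weights only when reporting by promotion, so nothing has to be split at ETL time. The table layouts, key names and the `revenue_by_promotion` function are assumptions for illustration, not the book's definitions.

```python
# Hypothetical fact rows: one row per sale, pointing at a group of promotions.
sales = [
    {"sale_id": 1, "promo_group_key": 10, "revenue": 100.0},
    {"sale_id": 2, "promo_group_key": 11, "revenue": 40.0},
]

# Hypothetical bridge rows: each group lists its promotions, weights summing to 1.
bridge = [
    {"promo_group_key": 10, "promotion": "TV Ad", "weight": 0.5},
    {"promo_group_key": 10, "promotion": "Coupon", "weight": 0.5},
    {"promo_group_key": 11, "promotion": "Coupon", "weight": 1.0},
]


def revenue_by_promotion(sales, bridge, weighted=True):
    """Join facts to the bridge and total revenue per promotion at query time."""
    totals = {}
    for sale in sales:
        for link in bridge:
            if link["promo_group_key"] == sale["promo_group_key"]:
                factor = link["weight"] if weighted else 1.0
                totals[link["promotion"]] = (
                    totals.get(link["promotion"], 0.0) + sale["revenue"] * factor
                )
    return totals


weighted_totals = revenue_by_promotion(sales, bridge)
unweighted_totals = revenue_by_promotion(sales, bridge, weighted=False)
```

The weighted totals add up to the true revenue of 140.0, while the unweighted totals add up to 240.0 because sale 1 is counted once for each of its two promotions. Both views can be useful, provided the report makes clear which one it shows.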
Websites
decisionone.co.uk : DecisionOne Consulting, Lawrence Corr’s training and consulting firm.
modelstorming.com : The companion website to this book where you can download the BEAM✲
Modelstormer spreadsheet, the BI Model Canvas (inspired by the Business Model Canvas) plus other
useful BEAM✲ tools and example models from the book and beyond. It also contains links to our
recommended books, articles, websites, and training courses.